Operations Overview

The Operations section provides real-time visibility and control over your layline.io deployments, from cluster health to individual engine states.

Purpose

Once you've designed and deployed your workflows, the Operations section becomes your mission control. This is where you monitor live systems, diagnose issues, and manage the day-to-day running of your data pipelines. Unlike the Assets section (where you build) or the Deployment section (where you configure), Operations is about observing and interacting with what's actually happening right now.

The Operations section is organized around three core concepts:

Cluster Management — The infrastructure view: nodes, deployments, and system health
Engine State — The runtime view: what's executing, what's connected, what's flowing
Audit Trail — The history view: who did what, when, and with what result

Who Uses Operations

Operations Engineers — Monitor cluster health, respond to alarms, manage deployments
Developers — Debug running workflows, inspect live state, trace data flow
Administrators — Manage user access, review audit logs, configure system settings

Main Areas

Cluster Management

The cluster is the foundation — a collection of nodes running layline.io engines. This section covers:

Cluster Login — How to connect to and authenticate with a cluster
Cluster Tab Overview — Navigating the cluster-level interface
Alarm Center — Real-time alerts, thresholds, and notification routing
Deployment Storage — Where deployment configurations live and how to manage them
Scheduler — Workflow scheduling and execution history
Stream Monitor — Controller to observe and manage data streams, throughput, and backpressure
Sniffer Directory — Controller to observe and manage message sniffing
Access Coordinator — Managing access to sources and resources
Operations User Storage — User- and role-specific operational data and preferences
Operations Secret Storage — Secure credential management for operations
AI Storage — Storage for AI/ML model artifacts and training data
Cluster Node Detail — Deep dive into individual node metrics and logs, and switch the debugging context to a specific node

Engine State

While Cluster Management shows you the infrastructure, Engine State shows you what's actually running on it. This is the live runtime view:

[Engine State Overview](./engine-state/index.md) — The main dashboard for runtime monitoring
Workflow State — Active workflows, their status, and execution context
Service State — Running services and their health
Connection State — Active connections to external systems
Source State — Input sources and their folders, read positions, etc.
Sink State — Output sinks and their write status
Format State — Format parsers and logs
Resource State — Resource status and detail configs

Engine State is particularly useful for debugging: you can see whether all Assets are running as expected and inspect the detailed state and configuration of each.

Audit Trail

The Audit Trail provides a comprehensive record of all workflow- and stream-related actions taken within the system:

Audit Trail Overview — Understanding the audit log structure and retention

Audit logs capture:

Workflow executions (start, completion, failure)
Stream events (data arrival, processing milestones)

Other logging for system events (alarms, node status changes, etc.) can be found in the respective sections of Cluster Controllers and Engine State.

Navigating the Operations UI

The Operations section uses a three-level navigation pattern:

Section Tabs — Switch between Cluster, Engine State, and Audit Trail
Category Sidebar — Within each section, navigate between specific tools (e.g., Alarm Center, Scheduler)
Detail Panels — Drill into specific entities (a node, a workflow, a log entry)

Most operational screens follow a similar layout:

Top bar — Context selector (cluster, environment, time range)
Main panel — Primary data (lists, graphs, diagrams)
Sidebar — Filters, quick actions, related links

Common Workflows

Investigating an Alarm

Alarm fires → Notification sent (email/Teams)
Open Alarm Center to see the alert details
Check Cluster Overview for node health
Drill into Engine State to find the affected workflow
Review Audit Trail for recent changes
Take corrective action (restart, redeploy, or escalate)

Tracing a Data Flow Issue

Start in Audit Trail Workflow to identify workflow instances with errors
Check Audit Trail Stream to confirm data is arriving and is being processed
Review Engine State to check workflow and service health
Use Cluster Node Detail to inspect logs and metrics on the node running the affected workflow
Identify bottlenecks or failures and take action (e.g., restart workflow, adjust resources, or fix configuration)

Key Concepts

Cluster vs. Engine

Cluster — The physical or virtual infrastructure (nodes, networks, storage)
Engine — The layline.io runtime process executing workflows

A cluster can run multiple engines. An engine belongs to one cluster.
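The one-to-many relationship described above can be sketched as a small data model. This is only an illustration: the class names, fields, and the `add_engine` helper are hypothetical and are not part of any layline.io API.

```python
from dataclasses import dataclass, field


@dataclass
class Engine:
    """A runtime process executing workflows (hypothetical model)."""
    name: str
    cluster_name: str  # an engine belongs to exactly one cluster


@dataclass
class Cluster:
    """The infrastructure that hosts engines (hypothetical model)."""
    name: str
    engines: list[Engine] = field(default_factory=list)

    def add_engine(self, engine_name: str) -> Engine:
        # Creating the engine through its cluster keeps the
        # one-cluster-per-engine rule true by construction.
        engine = Engine(name=engine_name, cluster_name=self.name)
        self.engines.append(engine)
        return engine


# Example names are placeholders.
cluster = Cluster(name="prod")
cluster.add_engine("engine-1")
cluster.add_engine("engine-2")
```

Storing the cluster name on the engine, rather than a list of clusters, is what encodes the "belongs to one cluster" rule in this sketch.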

Live State vs. Configuration

Configuration (Assets section) — What should be running (the blueprint)
Live State (Operations section) — What is running right now (the reality)

Operations shows live state. If you see a discrepancy (e.g., a workflow that is configured in a project but is not running), it usually means that a deployment has failed or is still pending, or that an error has occurred.
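Conceptually, spotting such a discrepancy is a set comparison between what is configured and what is running. The sketch below assumes you already have both lists of workflow names at hand; the function name, the workflow names, and the "unexpected" category are illustrative assumptions, not layline.io features.

```python
def find_discrepancies(configured: set[str], running: set[str]) -> dict[str, list[str]]:
    """Compare workflow names that should run against those that do."""
    return {
        # Configured but not running: failed or pending deployment, or an error.
        "missing": sorted(configured - running),
        # Running but not configured (assumption: e.g. left over from an
        # older deployment); not a case the text above describes.
        "unexpected": sorted(running - configured),
    }


# Hypothetical workflow names for illustration only.
configured = {"ingest-orders", "enrich-orders", "export-orders"}
running = {"ingest-orders", "export-orders"}

report = find_discrepancies(configured, running)
print(report)
```

Anything reported as missing is a candidate for the checks described above: look at the deployment status first, then at errors on the engine.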

Alarms vs. Logs

Alarms — Notifications about current problems requiring attention
Logs — Historical record of past events for analysis

Alarms are actionable now. Logs are searchable history.

Security Considerations

Operations provides powerful visibility into running systems. Access is typically restricted:

Read-only access — View metrics, logs, and state (typical for developers)
Operational access — Restart workflows, acknowledge alarms, trigger deployments (typical for ops engineers)
Administrative access — Full control including user management and audit log access (typical for admins)
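The three tiers above can be pictured as a simple permission lookup. This is a sketch under stated assumptions: the action names are invented from the tier descriptions, and the idea that each tier includes the one below it is an assumption rather than documented behavior. Actual permissions are managed in the product itself.

```python
from enum import Enum


class AccessLevel(Enum):
    READ_ONLY = "read-only"
    OPERATIONAL = "operational"
    ADMINISTRATIVE = "administrative"


# Hypothetical action names derived from the tier descriptions.
READ_ACTIONS = {"view_metrics", "view_logs", "view_state"}
OPERATIONAL_ACTIONS = {"restart_workflow", "acknowledge_alarm", "trigger_deployment"}
ADMIN_ACTIONS = {"manage_users", "access_audit_log"}

# Assumption: each tier includes everything the tier below it allows.
PERMISSIONS = {
    AccessLevel.READ_ONLY: READ_ACTIONS,
    AccessLevel.OPERATIONAL: READ_ACTIONS | OPERATIONAL_ACTIONS,
    AccessLevel.ADMINISTRATIVE: READ_ACTIONS | OPERATIONAL_ACTIONS | ADMIN_ACTIONS,
}


def is_allowed(level: AccessLevel, action: str) -> bool:
    """Return True if the given access level permits the action."""
    return action in PERMISSIONS[level]
```

For example, `is_allowed(AccessLevel.READ_ONLY, "restart_workflow")` evaluates to `False` in this model, while the same action is permitted at the operational tier.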

See Access Coordinator for details on permission management.
