Operations Overview
The Operations section provides real-time visibility and control over your layline.io deployments, from cluster health to individual engine states.
Purpose
Once you've designed and deployed your workflows, the Operations section becomes your mission control. This is where you monitor live systems, diagnose issues, and manage the day-to-day running of your data pipelines. Unlike the Assets section (where you build) or the Deployment section (where you configure), Operations is about observing and interacting with what's actually happening right now.
The Operations section is organized around three core concepts:
- Cluster Management — The infrastructure view: nodes, deployments, and system health
- Engine State — The runtime view: what's executing, what's connected, what's flowing
- Audit Trail — The history view: who did what, when, and with what result
Who Uses Operations
- Operations Engineers — Monitor cluster health, respond to alarms, manage deployments
- Developers — Debug running workflows, inspect live state, trace data flow
- Administrators — Manage user access, review audit logs, configure system settings
Main Areas
Cluster Management
The cluster is the foundation — a collection of nodes running layline.io engines. This section covers:
- Cluster Login — How to connect to and authenticate with a cluster
- Cluster Tab Overview — Navigating the cluster-level interface
- Alarm Center — Real-time alerts, thresholds, and notification routing
- Deployment Storage — Where deployment configurations live and how to manage them
- Scheduler — Workflow scheduling and execution history
- Stream Monitor — Controller to observe and manage data streams, throughput, and backpressure
- Sniffer Directory — Controller to observe and manage message sniffing
- Access Coordinator — Managing access to sources and resources
- Operations User Storage — User- and role-specific operational data and preferences
- Operations Secret Storage — Secure credential management for operations
- AI Storage — Storage for AI/ML model artifacts and training data
- Cluster Node Detail — Deep dive into individual node metrics and logs, and switch the debugging context to a specific node
Engine State
While Cluster Management shows you the infrastructure, Engine State shows you what's actually running on it. This is the live runtime view:
- [Engine State Overview](./engine-state/index.md) — The main dashboard for runtime monitoring
- Workflow State — Active workflows, their status, and execution context
- Service State — Running services and their health
- Connection State — Active connections to external systems
- Source State — Input sources and their folders, read positions, etc.
- Sink State — Output sinks and their write status
- Format State — Format parsers and logs
- Resource State — Resource status and detail configs
Engine State is particularly useful for debugging: you can see whether all Assets are running as expected, and inspect the detailed state and configuration of each one.
Audit Trail
The Audit Trail provides a comprehensive record of all workflow- and stream-related actions taken within the system:
- Audit Trail Overview — Understanding the audit log structure and retention
Audit logs capture:
- Workflow executions (start, completion, failure)
- Stream events (data arrival, processing milestones)
Other logging for system events (alarms, node status changes, etc.) can be found in the respective sections of Cluster Controllers and Engine State.
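To make the two audit categories concrete, here is a minimal Python sketch of how such records could be modeled and filtered. It is an illustration only: `AuditCategory`, `AuditRecord`, `failed_workflows`, all field names, and the sample values are hypothetical and do not reflect layline.io's actual audit log schema or any export format.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class AuditCategory(Enum):
    # The two categories the Audit Trail is described as capturing.
    WORKFLOW_EXECUTION = "workflow_execution"
    STREAM_EVENT = "stream_event"


@dataclass(frozen=True)
class AuditRecord:
    # Hypothetical record shape; field names are illustrative assumptions.
    timestamp: datetime
    category: AuditCategory
    event: str    # e.g. "start", "completion", "failure", "data_arrival"
    subject: str  # name of the workflow or stream the event refers to


def failed_workflows(records):
    """Return names of workflows with a 'failure' event, in first-seen order."""
    seen = []
    for record in records:
        if (record.category is AuditCategory.WORKFLOW_EXECUTION
                and record.event == "failure"
                and record.subject not in seen):
            seen.append(record.subject)
    return seen


# Made-up sample data, used only to exercise the filter.
records = [
    AuditRecord(datetime(2024, 1, 1, 9, 0), AuditCategory.WORKFLOW_EXECUTION, "start", "invoice-import"),
    AuditRecord(datetime(2024, 1, 1, 9, 1), AuditCategory.STREAM_EVENT, "data_arrival", "invoices.csv"),
    AuditRecord(datetime(2024, 1, 1, 9, 2), AuditCategory.WORKFLOW_EXECUTION, "failure", "invoice-import"),
]
print(failed_workflows(records))
```

Keeping workflow executions and stream events as separate categories mirrors the split described above, so a reviewer can narrow the history to one kind of activity before reading individual entries.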
Navigating the Operations UI
The Operations section uses a three-level navigation pattern:
- Section Tabs — Switch between Cluster, Engine State, and Audit Trail
- Category Sidebar — Within each section, navigate between specific tools (e.g., Alarm Center, Scheduler)
- Detail Panels — Drill into specific entities (a node, a workflow, a log entry)
Most operational screens follow a similar layout:
- Top bar — Context selector (cluster, environment, time range)
- Main panel — Primary data (lists, graphs, diagrams)
- Sidebar — Filters, quick actions, related links
Common Workflows
Investigating an Alarm
- Alarm fires → Notification sent (email/Teams)
- Open Alarm Center to see the alert details
- Check Cluster Overview for node health
- Drill into Engine State to find the affected workflow
- Review Audit Trail for recent changes
- Take corrective action (restart, redeploy, or escalate)
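The steps above can also be kept as an ordered checklist, which is handy when handing an investigation over to a colleague. The following Python sketch assumes nothing about layline.io's interfaces: `TRIAGE_STEPS` copies the step wording from this page, while `TriageLog` and its methods are hypothetical helpers that only track which step comes next and what was found so far.

```python
from dataclasses import dataclass, field

# Steps copied from the list above. The notification step is left out because
# it happens before anyone starts investigating.
TRIAGE_STEPS = (
    "Open Alarm Center to see the alert details",
    "Check Cluster Overview for node health",
    "Drill into Engine State to find the affected workflow",
    "Review Audit Trail for recent changes",
    "Take corrective action (restart, redeploy, or escalate)",
)


@dataclass
class TriageLog:
    """Hypothetical helper: pairs each completed step with a free-text finding."""
    alarm: str
    findings: list = field(default_factory=list)

    def next_step(self):
        # Returns None once every step has a recorded finding.
        done = len(self.findings)
        return TRIAGE_STEPS[done] if done < len(TRIAGE_STEPS) else None

    def record(self, finding: str) -> None:
        # Attach the finding to the next pending step, preserving the order.
        step = self.next_step()
        if step is None:
            raise ValueError("all triage steps already recorded")
        self.findings.append((step, finding))


log = TriageLog(alarm="example alarm")
log.record("alert details reviewed")
log.record("all nodes reporting healthy")
print(log.next_step())
```

After two recorded findings, the next pending step is the Engine State drill-down, matching the order in which the investigation moves from infrastructure to runtime to history.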