Sources
Sources define where your data comes from — the connection parameters and technical details for inbound data endpoints.
What are Sources?
A Source Asset specifies the technical parameters for connecting to an external system that provides data. This includes:
- Connection details — endpoints, credentials, and access parameters
- Location configuration — folders, buckets, queues, or topics to read from
- Polling behavior — how often to check for new data
- Failure handling — retry logic and error management
Sources are referenced by Input Processors, which handle the actual message ingestion and processing workflow.
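As a rough illustration of the four concerns listed above, a Source can be pictured as a small record. This is a minimal sketch, not the product's actual schema: the class names, field names, default values, and the example endpoint are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RetryPolicy:
    """Failure handling: how often to retry and how long to wait (hypothetical)."""
    max_attempts: int = 3
    delay_seconds: float = 5.0


@dataclass(frozen=True)
class SourceAsset:
    """Hypothetical model of a Source; every field name here is illustrative."""
    name: str
    connection: dict   # connection details: endpoint, credentials reference, access parameters
    locations: tuple   # folders, buckets, queues, or topics to read from
    polling_interval_seconds: float = 60.0                   # polling behavior
    retry: RetryPolicy = field(default_factory=RetryPolicy)  # failure handling


# Example values are made up; only the shape matters.
invoices = SourceAsset(
    name="invoice-bucket",
    connection={"endpoint": "https://storage.example.com", "credentials_ref": "invoice-reader"},
    locations=("incoming/invoices",),
    polling_interval_seconds=30.0,
)
```

The record deliberately contains nothing about parsing or routing; in the model described here, those belong to the Input Processor that references the Source.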
Sources vs Input Processors
Understanding the distinction helps you build maintainable workflows:
| Sources | Input Processors |
|---|---|
| Define where data comes from (connection/location) | Define how data is processed (workflow entry) |
| Reusable across multiple workflows | Workflow-specific configuration |
| Configure endpoints, folders, credentials | Configure format parsing, routing, initial processing |
| Example: S3 bucket name and region | Example: Which format to parse, where to route |
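The reuse argument in the table can be sketched with two hypothetical classes. The names `Source` and `InputProcessor`, their fields, and the bucket and workflow names are assumptions for illustration and do not reflect the real configuration model.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """Where data comes from (hypothetical fields, modeled on the S3 example)."""
    name: str
    bucket: str
    region: str


@dataclass
class InputProcessor:
    """How data enters one workflow; it refers to a Source instead of copying it."""
    workflow: str
    source: Source
    data_format: str
    route_to: str


shared = Source(name="orders-s3", bucket="orders-inbound", region="eu-central-1")

# Two workflows reuse the same Source but keep their own processing settings.
billing = InputProcessor("billing", shared, data_format="csv", route_to="invoice-mapper")
audit = InputProcessor("audit", shared, data_format="csv", route_to="audit-log")

# Changing the location once is visible to every workflow that references it.
shared.bucket = "orders-inbound-v2"
buckets = {p.workflow: p.source.bucket for p in (billing, audit)}
```

Because both processors hold a reference to the same object, the connection and location are maintained in one place while format and routing stay workflow-specific.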
When to define a Source vs using an Input Processor directly
Define a Source Asset when:
- The same connection/location is used across multiple workflows
- You need to centralize credential management
- You want to monitor and manage connections independently
- You're connecting to external systems (S3, Kafka, Email, FTP, etc.)
Input Processor configuration is sufficient when:
- The data source is internal to the workflow (Message Source, Timer)
- No external connection parameters are needed
- The configuration is truly workflow-specific
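The two checklists above can be condensed into a small rule-of-thumb helper. This is only a sketch of the decision logic as worded in this section; the function name and parameters are hypothetical and nothing here is enforced by the product.

```python
def source_asset_reasons(external: bool,
                         workflow_count: int = 1,
                         central_credentials: bool = False,
                         independent_monitoring: bool = False) -> list[str]:
    """Collect the reasons (if any) that speak for a dedicated Source Asset."""
    reasons = []
    if external:
        reasons.append("connects to an external system")
    if workflow_count > 1:
        reasons.append("same connection/location used across multiple workflows")
    if central_credentials:
        reasons.append("credentials should be managed centrally")
    if independent_monitoring:
        reasons.append("connection should be monitored independently")
    return reasons


# An internal, workflow-specific trigger: no reason found, so the
# Input Processor configuration alone is sufficient.
timer_reasons = source_asset_reasons(external=False)

# An S3 bucket shared by three workflows: two reasons apply.
s3_reasons = source_asset_reasons(external=True, workflow_count=3)
```

An empty result corresponds to the second checklist: nothing external, nothing shared, so the configuration can stay with the Input Processor.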
Sources Overview
| Source | Purpose | Common Use Case |
|---|---|---|
| Email | Connect to mail servers (IMAP, POP3, MS 365) | Process incoming emails and attachments |
| File System | Read from local or mounted file systems | Pick up files from local directories |
| FTP | Connect to FTP/SFTP servers | Retrieve files from FTP endpoints |
| Google Cloud Storage | Poll GCS buckets for objects | Cloud storage integration |
| HTTP | Define HTTP endpoint parameters | REST API data ingestion |
| Kafka | Connect to Apache Kafka topics | Stream processing from Kafka |
| Message | Define inline message sources | Static or template messages |
| NFS | Connect to NFS shares | Network file system access |
| OneDrive | Connect to Microsoft OneDrive | Cloud document processing |
| S3 | Poll Amazon S3 buckets | Object storage integration |
| Service | Use Service Assets as data sources | JDBC queries, HTTP calls as streams |
| SharePoint | Connect to Microsoft SharePoint | Enterprise document management |
| SMB | Connect to SMB/CIFS shares | Windows file sharing |
| SQS | Poll Amazon SQS queues | Message queue processing |
| TCP | Receive data over TCP connections | Network socket data ingestion |
| Timer | Generate scheduled message triggers | Cron-based workflow initiation |
| UDP | Receive data over UDP | Lightweight network messaging |
| Virtual File System | Poll VFS for files | Abstracted file system access |
| WebDAV | Connect to WebDAV servers | Web-based file management |
Common Configuration Patterns
Most Source Assets share these configuration sections:
- Name & Description — Asset identification
- Required Roles — Access control for the asset
- Connection — Reference to a Connection Asset (for external sources)
- Directories/Folders — Source location configuration
- Polling & Processing — How often to check for new data
- Throttling & Failure Handling — Rate limiting and error recovery
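To make the polling and failure-handling sections more concrete, the sketch below shows one common way such settings interact: a poll cycle that retries a failed fetch with exponential backoff. The backoff strategy, the function name `poll_with_retry`, and its parameters are assumptions; the actual retry behavior of a given Source type may differ.

```python
import time
from typing import Callable, List


def poll_with_retry(fetch: Callable[[], List[str]],
                    max_attempts: int = 3,
                    base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Run one poll cycle, retrying failed fetches with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up; the caller decides how to report the failure
            sleep(base_delay * 2 ** (attempt - 1))  # wait 1x, 2x, 4x, ... base_delay
    return []  # only reached when max_attempts < 1


# A stand-in endpoint that fails twice before returning a file name.
calls = {"n": 0}


def flaky_fetch() -> List[str]:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("endpoint unavailable")
    return ["report.csv"]


waits: List[float] = []  # record the delays instead of actually sleeping
files = poll_with_retry(flaky_fetch, sleep=waits.append)
```

Passing `sleep` as a parameter keeps the example testable without real delays; a scheduler would call `poll_with_retry` once per polling interval.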
See Also
- Input Processors — Process data from Sources
- Connections — Reusable connection definitions used by Sources
- Sinks — The counterpart to Sources, defining where data goes