File import and export integrations

File integrations move files between systems - importing CSVs from an SFTP server, exporting reports to S3, or processing spreadsheets dropped in a cloud storage bucket. The main challenge is handling large files without running into memory limits or execution timeouts.

Triggers

  • Scheduled trigger - poll an SFTP server or cloud storage bucket at a fixed interval.
  • Webhook trigger - receive a notification when a file is ready (some platforms push a file reference or upload event rather than the file itself).
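The scheduled-poll pattern above boils down to comparing a directory listing against a saved cursor. This is a minimal sketch, assuming the listing arrives as `(filename, modified_at)` tuples and the cursor is the timestamp of the newest file already processed; the function name and state shape are illustrative, not a platform API.

```python
from datetime import datetime, timezone

def find_new_files(listing, last_cursor):
    """Return files newer than the cursor, plus the advanced cursor.

    listing: iterable of (filename, modified_at) tuples from the
    SFTP/bucket directory listing; last_cursor: datetime saved by
    the previous poll.
    """
    new_files = sorted(
        (f for f in listing if f[1] > last_cursor),
        key=lambda f: f[1],
    )
    next_cursor = new_files[-1][1] if new_files else last_cursor
    return new_files, next_cursor

# Example: two files arrived since the last poll; one predates it.
cursor = datetime(2024, 1, 1, tzinfo=timezone.utc)
listing = [
    ("old.csv", datetime(2023, 12, 31, tzinfo=timezone.utc)),
    ("a.csv", datetime(2024, 1, 2, tzinfo=timezone.utc)),
    ("b.csv", datetime(2024, 1, 3, tzinfo=timezone.utc)),
]
new, cursor = find_new_files(listing, cursor)
```

Saving the cursor (rather than a list of seen filenames) keeps poll state small, but assumes modification timestamps are monotonic; if the source can backdate files, track filenames or checksums instead.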

Handling large files

Each flow execution is limited to 1 GB of memory and 15 minutes of runtime. The most important design decision for file integrations is how to avoid loading entire files into memory.

Pass references, not content. When moving files between storage systems (S3, Azure Blob, Dropbox), pass presigned URLs or file references from source to destination instead of downloading the file into the flow. See handling large files for patterns.
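A sketch of what passing a reference looks like in practice: the payload handed between flow steps is a small JSON document describing the file, not its bytes. The field names here are illustrative; in a real flow, the presigned URL would come from the storage SDK (for example, S3's `generate_presigned_url`).

```python
import json

def build_file_reference(bucket, key, presigned_url, size_bytes):
    """Build the small payload passed between flow steps instead of
    the file's content. The destination step fetches the URL directly,
    so the flow's memory footprint stays constant regardless of file
    size."""
    return json.dumps({
        "bucket": bucket,
        "key": key,
        "url": presigned_url,   # time-limited download URL
        "size": size_bytes,     # lets the destination pre-validate
    })

# The payload stays a few hundred bytes even for a multi-gigabyte file.
ref = build_file_reference(
    "reports-bucket", "2024/01/report.csv",
    "https://example.com/presigned", 5_000_000_000,
)
```

Because presigned URLs expire, the destination step should fetch promptly or request a fresh URL on retry.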

Stream or chunk large files. If you must process file content - parsing CSV rows, transforming records - read the file line-by-line or in fixed-size chunks. Use streaming in custom components for byte-level control.
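The chunked-parsing approach can be sketched with the standard library's `csv` module: rows are read one at a time from the stream and yielded in fixed-size batches, so memory use is bounded by the batch size rather than the file size. The in-memory `StringIO` stands in for the open download stream.

```python
import csv
import io

def process_csv_in_chunks(file_obj, chunk_size=500):
    """Parse CSV rows from a file-like object without loading the
    whole file, yielding fixed-size batches of row dicts."""
    reader = csv.DictReader(file_obj)
    batch = []
    for row in reader:
        batch.append(row)
        if len(batch) >= chunk_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch

# Demo with an in-memory stream; in a flow this would be the open
# download stream from SFTP or object storage.
data = io.StringIO("id,name\n1,a\n2,b\n3,c\n")
batches = list(process_csv_in_chunks(data, chunk_size=2))
```

Batching also gives natural checkpoints: after each batch is ingested, a cursor can be saved so a retry resumes rather than restarts.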

Use recursive flows for very large datasets. Process one page of records per execution, save a progress cursor to Flow State, and re-invoke the flow with the next cursor. See recursive flows.
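The recursive pattern can be sketched as a single-page worker that reads its cursor from state, processes one page, and signals whether to re-invoke. The state shape and `reinvoke` flag are illustrative; the actual re-invocation mechanism is platform-specific.

```python
def run_one_execution(records, state, page_size=2):
    """One flow execution: process a single page starting at the
    saved cursor, then return updated state. reinvoke=True signals
    the flow to trigger itself again with the new cursor."""
    cursor = state.get("cursor", 0)
    page = records[cursor:cursor + page_size]
    for record in page:
        pass  # transform/ingest one record here
    new_cursor = cursor + len(page)
    return {"cursor": new_cursor, "reinvoke": new_cursor < len(records)}

# Simulate the re-invocation loop the platform would drive: five
# records in pages of two take three executions.
records = ["r1", "r2", "r3", "r4", "r5"]
state = {"cursor": 0}
executions = 0
while True:
    state = run_one_execution(records, state)
    executions += 1
    if not state["reinvoke"]:
        break
```

Each execution stays well inside the memory and time limits because it touches only one page; the cursor in Flow State is the only thing carried across executions.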

Best practices

  • Validate before processing. Check file schema, headers, and encoding before ingesting. Move invalid files to a quarantine folder and emit a structured error rather than failing silently.
  • Idempotency. Use checksums, file IDs, or timestamps to detect files that have already been processed and skip them safely on retry.
  • Config-driven paths and credentials. Store SFTP hosts, bucket names, file paths, and credentials in config variables - never hardcode them. The same integration should deploy across many customers with different configurations.
  • Observability. Log records processed, skipped, and errored. Emit metrics for file counts, sizes, and processing duration.
  • Separate transfer from business logic. Keep the file retrieval step independent from transformation and ingestion so failures are easier to isolate.
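The idempotency bullet above can be sketched with a content checksum: compute a digest per file and skip anything already recorded, so a retry or re-delivered file is not ingested twice. The set is in-memory here for illustration; a real integration would persist the digests in Flow State or a datastore.

```python
import hashlib

def should_process(content: bytes, processed_checksums: set) -> bool:
    """Return True and record the file's checksum if it is new;
    return False for a file whose checksum was already seen."""
    digest = hashlib.sha256(content).hexdigest()
    if digest in processed_checksums:
        return False
    processed_checksums.add(digest)  # persist this set in real flows
    return True

seen = set()
first = should_process(b"id,name\n1,a\n", seen)   # new file
retry = should_process(b"id,name\n1,a\n", seen)   # duplicate delivery
```

Checksums catch byte-identical redeliveries; if the source rewrites files with trivial differences (e.g., fresh timestamps in a header row), key on a stable file ID instead.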