Data sync integrations

Data sync is the most common integration type built on Prismatic. A sync integration has two phases: an initial backfill that loads existing records, and ongoing incremental updates that keep data current as records change.

Initial sync

Use the Instance Deploy trigger to run a backfill when a customer enables the integration. Because this trigger also fires when upgrading integration versions, your flow must be idempotent: re-running it should not create duplicate records.
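The idempotency requirement usually comes down to upserting by a stable source identifier rather than blindly inserting. A minimal sketch, assuming a hypothetical record shape and an in-memory stand-in for the destination system (neither is part of Prismatic's API):

```typescript
// Hypothetical sketch of an idempotent backfill step. The record shape and
// the `destination` store are illustrative, not Prismatic APIs.
interface SourceRecord {
  id: string; // stable identifier assigned by the source system
  name: string;
}

// Simulated destination keyed by the source's stable ID, so re-running
// the backfill overwrites existing records rather than duplicating them.
const destination = new Map<string, SourceRecord>();

function upsertRecords(records: SourceRecord[]): void {
  for (const record of records) {
    destination.set(record.id, record); // upsert: same key, no duplicate
  }
}

const page = [
  { id: "acct-1", name: "Acme" },
  { id: "acct-2", name: "Globex" },
];

upsertRecords(page);
upsertRecords(page); // simulate the trigger firing again on a version upgrade
console.log(destination.size); // still 2 - the backfill is idempotent
```

Running the same page twice leaves the destination unchanged, which is exactly the property a version upgrade relies on.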

For large datasets that exceed the 15-minute execution limit, use recursive flows to process data a few pages at a time. Each execution processes a handful of pages, then saves a cursor to Cross-Flow State so the next execution resumes where the previous one left off. Configure the initial sync flow to run only one execution at a time to avoid concurrently processing the same set of pages.
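The recursive pattern can be sketched as follows. Here `fetchPage`, the page sizes, and the in-memory `crossFlowState` object are assumptions for illustration; in a real flow the cursor would live in Cross-Flow State and the flow would re-invoke itself instead of looping:

```typescript
// Sketch of cursor-based paging across recursive executions.
const ALL_ITEMS = Array.from({ length: 25 }, (_, i) => `record-${i}`);
const PAGE_SIZE = 4;
const PAGES_PER_EXECUTION = 3;

// Stand-in for Cross-Flow State: survives between executions.
const crossFlowState = { cursor: 0 };
const processed: string[] = [];

function fetchPage(cursor: number): string[] {
  return ALL_ITEMS.slice(cursor, cursor + PAGE_SIZE);
}

// One "execution": handle a few pages, checkpoint the cursor after each,
// and report whether a follow-up execution is still needed.
function runExecution(): boolean {
  for (let i = 0; i < PAGES_PER_EXECUTION; i++) {
    const page = fetchPage(crossFlowState.cursor);
    if (page.length === 0) return false; // backfill complete
    processed.push(...page);
    crossFlowState.cursor += page.length; // checkpoint after each page
  }
  return true; // more pages remain; schedule another execution
}

while (runExecution()) { /* each iteration simulates one recursive execution */ }
console.log(processed.length); // 25 - every record handled exactly once
```

Checkpointing after each page, rather than at the end of the execution, means a mid-execution failure re-processes at most one page.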

Incremental updates

Webhooks (preferred). Use webhook triggers for near-real-time updates. Register webhook subscriptions in lifecycle handlers so subscriptions are created and cleaned up automatically.

Polling (fallback). If webhooks aren't available, use polling triggers. Persist a cursor (last processed timestamp or ID) in Flow State so each execution fetches only records that changed since the previous run.
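Delta polling reduces to filtering on the cursor and advancing it past whatever was processed. A minimal sketch, where the record shape and the in-memory `flowState` object are illustrative assumptions:

```typescript
// Sketch of delta polling with a persisted timestamp cursor.
interface Item { id: string; updatedAt: number; }

const source: Item[] = [
  { id: "a", updatedAt: 100 },
  { id: "b", updatedAt: 200 },
  { id: "c", updatedAt: 300 },
];

// Stand-in for Flow State, persisted between poll runs.
const flowState = { lastSync: 0 };

function pollOnce(): Item[] {
  const changed = source.filter((item) => item.updatedAt > flowState.lastSync);
  if (changed.length > 0) {
    // Advance the cursor to the newest record we processed.
    flowState.lastSync = Math.max(...changed.map((i) => i.updatedAt));
  }
  return changed;
}

console.log(pollOnce().length); // 3 - the first poll sees everything
console.log(pollOnce().length); // 0 - nothing changed since the cursor
source.push({ id: "d", updatedAt: 400 });
console.log(pollOnce().length); // 1 - only the newly changed record
```

One caveat worth noting: if the source's timestamps have coarse resolution, records sharing the cursor's exact timestamp can be missed, so some implementations use `>=` plus ID-based deduplication instead.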

Best practices

  • Idempotency. Use stable record identifiers and track processed keys to avoid duplicates across retries or restarts.
  • Checkpoints. Persist cursors in Flow State or Cross-Flow State so the sync resumes cleanly after a failure.
  • Rate limiting. Use batching and concurrency controls to respect source and destination API quotas.
  • Error handling. Use retries with backoff for transient errors. Route persistent failures to a dead-letter flow for investigation.
  • Observability. Log records scanned, updated, skipped, and errored so sync behavior is visible and diagnosable.
  • Handle large files. Large files can consume significant memory; use streaming strategies to process them in chunks.
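The retry-with-backoff practice above can be sketched as a small wrapper. The function names, attempt count, and base delay are assumptions; delays are parameterized so the exponential schedule is visible without actually sleeping for seconds:

```typescript
// Minimal retry-with-backoff sketch for transient errors.
function backoffDelays(attempts: number, baseMs = 500): number[] {
  // Exponential backoff: 500ms, 1000ms, 2000ms, 4000ms, ...
  return Array.from({ length: attempts }, (_, i) => baseMs * 2 ** i);
}

async function withRetries<T>(
  operation: () => Promise<T>,
  maxAttempts = 4,
  baseMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (const delayMs of backoffDelays(maxAttempts, baseMs)) {
    try {
      return await operation();
    } catch (err) {
      lastError = err; // transient failure: wait, then try again
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  // Persistent failure: in a sync integration, this is where the record
  // would be routed to a dead-letter flow for investigation.
  throw lastError;
}
```

A call like `withRetries(() => fetchRecord(id))` would retry up to four times with growing delays before surfacing the error; distinguishing transient errors (timeouts, 429s, 5xx) from permanent ones (4xx validation failures) before retrying avoids wasting attempts on requests that can never succeed.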