Bulk Data Sync

Many integrations need to do an initial bulk import of data when first deployed and then keep that data up to date by processing incoming webhooks in real time. Others need to periodically re-sync data from a source system on a schedule.

The code-native batchFlowTrigger pattern handles both cases through a single, unified execution path.

Figure: Bulk data sync instance execution

How it works

A video walkthrough of this pattern, Get Started with Bulk Data Syncs, is listed under Additional resources.

A flow using batchFlowTrigger defines two trigger functions:

onDeploy - runs when the instance is deployed. Fetches one page of records and returns them, along with a paginationState cursor so the next page can pick up where this one left off. Prismatic calls onDeploy repeatedly until you return null for paginationState, signaling that the initial sync is complete.
onTrigger - runs each time the flow is triggered (when its webhook fires, or on its schedule for a scheduled flow). Like onDeploy, it can return items and a paginationState to page through data, which is useful when a single event should trigger a batched re-sync. For simple real-time events, it typically returns just the incoming records with no pagination state.

Both functions produce items (an array of records), which Prismatic splits into batches and hands to onExecution. Your onExecution function processes records the same way regardless of which function produced them.

Basic structure

flows.ts
import { batchFlowTrigger, flow } from "@prismatic-io/spectral";

// The shape of a single record
type Post = { id: number; title: string };

// The cursor carried between backfill pages to remember where we left off
type PostCursor = { startId: number };

export const importPosts = flow({
  name: "Import Posts",
  stableKey: "import-posts",

  // How many records onExecution receives at once, and how many
  // batches can run concurrently. Note - `onDeploy` can return
  // any number of records, and they'll be split into batches of
  // this size for `onExecution`.
  batchConfig: { batchSize: 5, concurrentBatchLimit: 3 },

  trigger: batchFlowTrigger<Post, PostCursor>({
    onDeploy: async (context, payload) => {
      // Get previous pagination state, or start at 0 if this is the first page
      const startId = payload.paginationState?.startId ?? 0;

      // Assume `fetchPage` returns `{ data: Post[] }` for the next page of posts
      const response = await fetchPage(startId, 20);

      return {
        items: response.data,
        // Return null when the page is empty - signals the sync is done
        paginationState:
          response.data.length > 0
            ? { startId: startId + response.data.length } // Increment the cursor for the next page
            : null,
      };
    },

    // Receive incoming webhook events and return them as a `Post[]` array to be processed by onExecution.
    onTrigger: async (context, payload) => {
      const post = payload.body.data as Post;
      return {
        items: [post],
        response: { statusCode: 200, contentType: "text/plain", body: "ok" },
      };
    },
  }),

  onExecution: async (context, params) => {
    // Both onDeploy and onTrigger deliver records here
    const posts = params.onTrigger.results.body.data as Post[];
    for (const post of posts) {
      context.logger.info(`Processing post ${post.id}: ${post.title}`);
    }
    return { data: null };
  },
});

Key concepts

`items` and `paginationState`

onDeploy returns an object with two fields:

items - the records fetched for this page. Prismatic splits them into batches and passes each batch to onExecution.
paginationState - any serializable value that represents your position in the dataset. Prismatic passes this back to onDeploy on the next call as payload.paginationState. Return null (or undefined) when there are no more pages.

You choose the shape of paginationState. A simple page offset, a last-seen ID, or an API-provided cursor token all work well.
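The three cursor styles mentioned above can be sketched as small TypeScript types, each with a helper that computes the next cursor or returns null to end pagination. The type and function names (OffsetCursor, nextOffset, TokenPage, and so on) are hypothetical illustrations, not part of any Prismatic API, and the last-seen ID variant assumes the source returns records sorted by ID.

```typescript
// Hypothetical cursor shapes; pick whichever matches your source API.
type OffsetCursor = { offset: number }; // simple page offset
type LastIdCursor = { lastId: string }; // last-seen ID (records sorted by ID)
type TokenCursor = { nextPageToken: string }; // API-provided cursor token

// Hypothetical response shape from a token-paginated API
type TokenPage<T> = { records: T[]; nextPageToken?: string };

// Advance the offset by the number of records received; null ends the sync.
function nextOffset(current: OffsetCursor | undefined, received: number): OffsetCursor | null {
  const offset = current?.offset ?? 0;
  return received > 0 ? { offset: offset + received } : null;
}

// Remember the last ID seen; assumes the page is sorted ascending by ID.
function nextLastId<T extends { id: string }>(records: T[]): LastIdCursor | null {
  return records.length > 0 ? { lastId: records[records.length - 1].id } : null;
}

// Pass the API's token through unchanged; no token means no more pages.
function nextToken<T>(page: TokenPage<T>): TokenCursor | null {
  return page.nextPageToken ? { nextPageToken: page.nextPageToken } : null;
}
```

Whatever shape you choose, it must be serializable, so plain objects of strings and numbers like these are a safe default.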

`onDeploy` runs repeatedly until pagination ends

Prismatic calls onDeploy in a loop:

First call: payload.paginationState is undefined. Fetch page 1 and return paginationState pointing to page 2.
Second call: payload.paginationState is whatever you returned previously. Fetch page 2, return paginationState pointing to page 3.
Continue until a page returns no data - return paginationState: null to stop.
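The loop above can be approximated locally, which is handy for checking that your own onDeploy logic terminates. This is a minimal sequential sketch, not Prismatic's runtime: runPagination, PageFn, and the fake data source are hypothetical, and unlike the real platform it does not overlap fetching with processing. The 1000-call cap mirrors the limit described in the FAQ.

```typescript
// Hypothetical shape mirroring what onDeploy returns in this pattern
type PageResult<T, S> = { items: T[]; paginationState?: S | null };

// Hypothetical stand-in for onDeploy; receives undefined on the first call
type PageFn<T, S> = (paginationState: S | undefined) => Promise<PageResult<T, S>>;

// Call fetchPage until it returns a null/undefined cursor or hits the cap.
async function runPagination<T, S>(
  fetchPage: PageFn<T, S>,
  onItems: (items: T[]) => void,
  maxIterations = 1000,
): Promise<number> {
  let state: S | undefined = undefined;
  let calls = 0;
  while (calls < maxIterations) {
    const result: PageResult<T, S> = await fetchPage(state);
    calls++;
    onItems(result.items); // hand this page's records to the processing step
    if (result.paginationState == null) break; // null or undefined ends the loop
    state = result.paginationState;
  }
  return calls;
}

// Fake source: 45 records served 20 at a time, using the startId cursor style
const source = Array.from({ length: 45 }, (_, i) => ({ id: i }));
const fakeOnDeploy: PageFn<{ id: number }, { startId: number }> = async (cursor) => {
  const startId = cursor?.startId ?? 0;
  const data = source.slice(startId, startId + 20);
  return {
    items: data,
    paginationState: data.length > 0 ? { startId: startId + data.length } : null,
  };
};
```

With this fake source the loop makes four calls: three pages of 20, 20, and 5 records, then an empty page that returns a null cursor.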

Each call to onDeploy produces a set of items that are immediately dispatched to onExecution, so processing starts while the next page is still being fetched.

`batchConfig` controls throughput

batchConfig: { batchSize: 5, concurrentBatchLimit: 3 }

batchSize - onExecution is called once for every batchSize records. If onDeploy fetches 20 records with batchSize: 5, onExecution is invoked four times.
concurrentBatchLimit - the number of onExecution calls that can run in parallel. Tune this against your downstream system's rate limits.
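Prismatic performs the batching for you; the sketch below only illustrates what batchSize and concurrentBatchLimit mean, for example if you want to reproduce the behavior in a local test. toBatches and runWithLimit are hypothetical helpers: the first slices records into batches, and the second is a simple worker pool that never has more than the given number of handler calls in flight.

```typescript
// Split records into consecutive batches of at most batchSize.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (batchSize < 1) throw new Error("batchSize must be >= 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Run handler over every batch with at most `limit` calls in flight.
// Returns the peak number of concurrent calls observed.
async function runWithLimit<T>(
  batches: T[][],
  limit: number,
  handler: (batch: T[]) => Promise<void>,
): Promise<number> {
  let next = 0;
  let active = 0;
  let peak = 0;
  // Each worker pulls the next unclaimed batch until none remain
  async function worker(): Promise<void> {
    while (next < batches.length) {
      const batch = batches[next++];
      active++;
      peak = Math.max(peak, active);
      try {
        await handler(batch);
      } finally {
        active--;
      }
    }
  }
  const workers = Array.from({ length: Math.min(limit, batches.length) }, () => worker());
  await Promise.all(workers);
  return peak;
}
```

With 20 records and batchSize 5, toBatches yields four batches, matching the four onExecution invocations described above; a limit of 3 lets three of them run at once while the fourth waits.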

`onTrigger` and `onDeploy` both feed `onExecution`

onTrigger runs whenever the flow's webhook fires, and it supports the same return shape as onDeploy: items, an optional paginationState, and an optional HTTP response. This means onTrigger can also page through data in batches - for example, if a webhook event signals that a new batch of records is available, onTrigger can fetch and page through them the same way onDeploy does.

For simple real-time events (a single record arrives in the webhook body), onTrigger typically returns just that record with no pagination state. Either way, the items flow into the same onExecution path.
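One way to structure an onTrigger body that handles both cases is sketched below as a plain function, so it can be exercised without the Spectral SDK. The webhook body shapes ("record" versus "batchReady"), BatchCursor, and the fetchBatchPage helper are all hypothetical; substitute whatever your source system actually sends. A single-record event returns no paginationState, while a batch notice pages until an empty page returns null.

```typescript
type Post = { id: number; title: string };

// Hypothetical webhook bodies: one record, or a notice that a batch is ready
type WebhookBody =
  | { type: "record"; data: Post }
  | { type: "batchReady"; batchId: string };

type BatchCursor = { batchId: string; offset: number };
type TriggerResult = { items: Post[]; paginationState?: BatchCursor | null };

// Hypothetical fetcher for one page of a batch
type FetchBatchPage = (batchId: string, offset: number, limit: number) => Promise<Post[]>;

async function handleTrigger(
  body: WebhookBody,
  paginationState: BatchCursor | undefined,
  fetchBatchPage: FetchBatchPage,
  pageSize = 50,
): Promise<TriggerResult> {
  if (body.type === "record") {
    // Simple real-time event: return the single record, no pagination state
    return { items: [body.data] };
  }
  // Batch notice: page through the batch, resuming from the previous offset
  const offset = paginationState?.offset ?? 0;
  const page = await fetchBatchPage(body.batchId, offset, pageSize);
  return {
    items: page,
    paginationState:
      page.length > 0 ? { batchId: body.batchId, offset: offset + page.length } : null,
  };
}
```

Inside a real batchFlowTrigger, the body and cursor would come from the trigger payload, and the HTTP response field shown in the basic example would be returned alongside items.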

Inside onExecution, records are always available at:

params.onTrigger.results.body.data;

This is an array regardless of whether they came from onDeploy or onTrigger.
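Because the same array arrives at that path whichever function produced it, a single validation helper can guard onExecution. The sketch below is illustrative only: ExecutionParams models just the path named above (the real params object has more fields), and readBatch and isPost are hypothetical helpers that keep well-formed records and count the rest.

```typescript
type Post = { id: number; title: string };

// Hypothetical, minimal view of params; only the documented path is modeled
type ExecutionParams = { onTrigger: { results: { body: { data: unknown } } } };

// Runtime check that a value looks like a Post
function isPost(value: unknown): value is Post {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as Post).id === "number" &&
    typeof (value as Post).title === "string"
  );
}

// Read the current batch and split it into valid posts and a skipped count
function readBatch(params: ExecutionParams): { posts: Post[]; skipped: number } {
  const data = params.onTrigger.results.body.data;
  const records: unknown[] = Array.isArray(data) ? data : [];
  const posts = records.filter(isPost);
  return { posts, skipped: records.length - posts.length };
}
```

Logging the skipped count, rather than throwing, keeps one malformed record from failing an entire batch; whether that trade-off is right depends on your downstream system.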

Regular data syncs

The onDeploy example above shows how to backfill a large dataset when the integration is first deployed.

If you want to run the data sync on a regular schedule instead of just on deploy, you can omit onDeploy and instead use the same items / paginationState pattern in onTrigger. For example, a scheduled flow could run every hour and page through a source API to fetch new records, returning them to onExecution in batches.
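A scheduled onTrigger could page through recently updated records roughly as sketched below. Everything here is an assumption for illustration: the hourly window, the Lead shape, the fetchUpdated helper, and the choice to store the window start in the cursor so every page of one run uses the same filter. This variant stops on a short page instead of an empty one, which saves one call.

```typescript
type Lead = { id: string; updatedAt: string };

// Cursor carries the fixed window start plus the page number
type ScheduleCursor = { since: string; page: number };

// Hypothetical fetcher: records updated since `sinceIso`, one page at a time
type FetchUpdated = (sinceIso: string, page: number, pageSize: number) => Promise<Lead[]>;

const WINDOW_MS = 60 * 60 * 1000; // assumes the flow runs hourly

async function scheduledPage(
  paginationState: ScheduleCursor | undefined,
  now: Date,
  fetchUpdated: FetchUpdated,
  pageSize = 100,
): Promise<{ items: Lead[]; paginationState: ScheduleCursor | null }> {
  // First call of a run: fix the window start so all pages share one filter
  const since = paginationState?.since ?? new Date(now.getTime() - WINDOW_MS).toISOString();
  const page = paginationState?.page ?? 0;
  const items = await fetchUpdated(since, page, pageSize);
  return {
    items,
    // A full page may have more after it; a short page means we are done
    paginationState: items.length === pageSize ? { since, page: page + 1 } : null,
  };
}
```

A fixed window can miss records if a run is delayed or fails; overlapping windows slightly and relying on idempotent processing in onExecution is one common mitigation.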

FAQ

Is there a limit to the number of records I can backfill?

To prevent runaway infinite loops, Prismatic calls onDeploy or onTrigger at most 1000 times in a single pagination loop. If you need to fetch more than 1000 pages of records, consider fetching multiple pages per call and returning them together in a single items array.
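Fetching several pages per call can be sketched as follows. fetchSeveralPages, pagesPerCallNeeded, and the fetchOnePage helper are hypothetical; the offset cursor matches the style of the basic example. For instance, 1,500,000 records at 200 per page is 7,500 pages, so fetching 8 pages per call needs about 938 calls and stays under the cap. Leave some headroom, since the final call that discovers the end of the data also counts.

```typescript
type Page<T> = { data: T[] };

// Hypothetical single-page fetcher for an offset-paginated API
type FetchOnePage<T> = (offset: number, limit: number) => Promise<Page<T>>;

// Fetch up to pagesPerCall pages in sequence and return them as one items array.
async function fetchSeveralPages<T>(
  startOffset: number,
  pageSize: number,
  pagesPerCall: number,
  fetchOnePage: FetchOnePage<T>,
): Promise<{ items: T[]; paginationState: { startOffset: number } | null }> {
  const items: T[] = [];
  let offset = startOffset;
  for (let i = 0; i < pagesPerCall; i++) {
    const page = await fetchOnePage(offset, pageSize);
    items.push(...page.data);
    offset += page.data.length;
    if (page.data.length < pageSize) {
      // Short or empty page: the source is exhausted, so stop paginating
      return { items, paginationState: null };
    }
  }
  // All pages were full; resume from the next offset on the following call
  return { items, paginationState: { startOffset: offset } };
}

// Smallest pages-per-call value that keeps total calls within maxIterations
function pagesPerCallNeeded(totalRecords: number, pageSize: number, maxIterations = 1000): number {
  const totalPages = Math.ceil(totalRecords / pageSize);
  return Math.max(1, Math.ceil(totalPages / maxIterations));
}
```

Larger items arrays are still split by batchSize before reaching onExecution, so this change affects only how many times the pagination function is called.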

I only see a few batches when testing - why?

It's easy to accidentally create an infinite loop in an onDeploy or onTrigger function if you handle pagination incorrectly. The test runner will stop after 2 iterations to give you an opportunity to see sample results of your batch processing, but will not execute a full backfill. If you want to test a full backfill, deploy the integration and watch the instance execution in real time.

Example integrations

Two reference integrations show this pattern end to end:

Simple initial data sync - pages through a public JSON API using a numeric offset cursor and processes records through a unified onExecution. A good starting point.
Salesforce initial data sync - uses a last-seen ID cursor to page through Salesforce leads via SOQL, and also sets up a Salesforce Outbound Message to receive new leads in real time.

Additional resources

Get Started with Bulk Data Syncs - a video walkthrough of this pattern
Handling Large Data Sets - webinar discussing large data syncs
