How to Design B2B SaaS Integrations that Don't Hit Rate Limits

Stop treating 429s as errors and start designing for them. Learn how to use rate-aware scheduling, distributed governors, and more for B2B SaaS integrations.
May 13, 2026
Michael Halka, Technical Account Manager
Get around those rate limits the right way

The previous post in this series covered what rate limits are, how third-party APIs implement them, and the two reactive patterns every integration needs: exponential backoff with jitter and circuit breaker. Those patterns handle rate limits after they're hit.

This post is about not hitting them in the first place.

Here are the three proactive patterns, plus the observability layer that lets you know everything is working as it should.

Proactive pattern 1 – Rate-aware scheduling

Rather than firing requests as fast as possible and recovering from 429s reactively, you pace requests proactively to stay below rate-limit thresholds.

Use this when you have predictable, high-volume operations and know the API rate limit. This includes bulk syncs, nightly data loads, and batch exports.

The low-code approach: Loop + Sleep

In Prismatic's low-code integration designer, the simplest governor is a Loop step wrapping your API call with a Sleep function at the bottom of the loop body:

[Loop: Repeat for Each record]
  HTTP: POST /contacts (the API call)
  → Sleep: 750ms (pacing delay)
[End Loop]

The Sleep component accepts milliseconds as input. If the target API allows 100 requests per minute, sleeping 600ms per iteration (60,000ms/100) keeps you right at the limit. Apply the 80% headroom principle and use 80% of the documented limit, leaving a margin for other processes sharing the same credentials and for minor timing variances. At 80% of 100 requests per minute, your sleep interval becomes 750ms.
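The arithmetic above can be captured in a tiny helper. This is just a sketch to make the math concrete; `pacingDelayMs` is a hypothetical name, and the constants come from the target API's documentation:

```typescript
// Pacing delay in ms for a per-minute rate limit, with a headroom factor.
function pacingDelayMs(requestsPerMinute: number, headroom = 0.8): number {
  return Math.floor(60_000 / (requestsPerMinute * headroom));
}

pacingDelayMs(100, 1.0); // 600ms: sits exactly at a 100 req/min limit
pacingDelayMs(100, 0.8); // 750ms: 80% headroom, as in the example above
```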

The CNI approach: createClient with proactive pacing

In a code native integration, use Prismatic's createClient from @prismatic-io/spectral/dist/clients/http. It wraps Axios with built-in retries and backoff. For proactive pacing, combine it with a deliberate sleep between iterations:

import { createClient } from "@prismatic-io/spectral/dist/clients/http";

const RATE_LIMIT = 100; // requests per minute, from API docs
const HEADROOM = 0.8; // use 80% of the limit
const DELAY_MS = Math.floor(60_000 / (RATE_LIMIT * HEADROOM));

const client = createClient({
  baseUrl: "https://api.example.com/v1",
  headers: { Authorization: `Bearer ${connection.fields.apiKey}` },
  retryConfig: {
    retries: 5,
    retryDelay: 1000,
    useExponentialBackoff: true,
    retryCondition: (error) => error.response?.status === 429,
  },
});

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

for (const record of records) {
  await client.post("/contacts", record);
  await sleep(DELAY_MS);
}

The sleep loop is the proactive governor. The retryConfig is the reactive safety net. If the pacing falls short and a 429 slips through, createClient handles the retry with exponential backoff, targeting only 429 responses so other errors (400, 403, and 500) still surface immediately.

So, why 80% and not 100%? Running at the exact limit leaves no margin for other processes sharing the same credentials, minor timing imprecision, or possible spikes. 80% is a practical default. But you should adjust it based on how many other processes share the same API key.

Replacing the distributed governor

A common pattern we see in custom-built integration infrastructure is to use Redis to share rate-limit state across worker processes (keyed by customer) so that one tenant's burst doesn't consume another's budget. With Prismatic, you don't need to build that.
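For context, here is a minimal sketch of what such a governor does: a token bucket keyed by tenant, refilled over time. All names are illustrative, and a real implementation would keep this state in Redis so every worker process shares it; this in-memory version only shows the shape of the logic Prismatic makes unnecessary.

```typescript
// Simplified per-tenant token bucket. Illustrative only; a production
// version would store buckets in Redis so all workers share the state.
type Bucket = { tokens: number; lastRefill: number };

class TenantGovernor {
  private buckets = new Map<string, Bucket>();
  constructor(private ratePerMinute: number) {}

  // Returns true if this tenant may send a request right now.
  tryAcquire(tenantId: string, now = Date.now()): boolean {
    const bucket = this.buckets.get(tenantId) ?? {
      tokens: this.ratePerMinute,
      lastRefill: now,
    };
    // Refill proportionally to elapsed time, capped at the full budget.
    const refill = ((now - bucket.lastRefill) / 60_000) * this.ratePerMinute;
    bucket.tokens = Math.min(this.ratePerMinute, bucket.tokens + refill);
    bucket.lastRefill = now;
    if (bucket.tokens < 1) {
      this.buckets.set(tenantId, bucket);
      return false;
    }
    bucket.tokens -= 1;
    this.buckets.set(tenantId, bucket);
    return true;
  }
}
```

One tenant draining its bucket leaves every other tenant's budget untouched — the same isolation property the flow concurrency settings provide without any custom code.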

Prismatic's flow concurrency controls serve the same purpose. Each flow can be set to one of three modes in the trigger settings:

  • Parallel (default) – Numerous concurrent executions, up to the global organization limit
  • Throttled – 2–15 concurrent executions, with overflow queued automatically
  • Sequential (FIFO) – One execution at a time, in order of arrival

For a high-volume bulk sync flow, Throttled or Sequential modes are equivalent to a distributed governor, but without Redis, a custom worker pool, or a shared mutable state to manage. Because each of your customers gets a separate instance of the integration, one customer's bulk export cannot consume another customer's concurrency budget. Per-customer isolation (as Redis provides) is built into Prismatic.

Reading rate limit headers dynamically

Hard-coding the rate limit only works until the API changes that limit. A better approach reads X-RateLimit-Remaining from each response and adjusts pacing in real time. In a CNI function, it looks like this:

function calculateDelay(remaining: number, resetAt: number): number {
  const windowMs = Math.max(resetAt - Date.now(), 1_000);
  const safeRequests = Math.floor(remaining * HEADROOM);
  return safeRequests > 0 ? windowMs / safeRequests : windowMs;
}

If the remaining count drops faster than expected, pacing slows down; if more headroom is available, it speeds up. The governor adapts to real-time API conditions rather than documented ones.
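A quick sanity check of the pacing this yields, with the function repeated so the snippet is self-contained (the sample numbers are illustrative):

```typescript
const HEADROOM = 0.8;

function calculateDelay(remaining: number, resetAt: number): number {
  const windowMs = Math.max(resetAt - Date.now(), 1_000);
  const safeRequests = Math.floor(remaining * HEADROOM);
  return safeRequests > 0 ? windowMs / safeRequests : windowMs;
}

// 40 requests left, window resets in 60s: pace at roughly 1,875ms per request
calculateDelay(40, Date.now() + 60_000);

// Budget exhausted: wait out the remainder of the window
calculateDelay(0, Date.now() + 30_000);
```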

Staggering schedules across customers

Even with per-customer concurrency controls, running all customer syncs at midnight creates a burst at the infrastructure level regardless of individual rate limits. Prismatic's Schedule trigger accepts standard cron expressions and supports per-customer schedules through a Schedule-type config variable, so that each customer instance can have its own timing.

Use a string-type config variable driven by a code data source to generate a staggered cron offset per customer:

// Code data source: generates a per-customer staggered cron offset
const minuteOffset = Math.floor(Math.random() * 5); // 0–4 minute offset
return `${minuteOffset}/5 * * * *`; // runs every 5 min, offset varies per customer

This spreads the load naturally across your customer base without any custom scheduler infrastructure.
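One caveat: Math.random() produces a different offset each time the data source is evaluated. If you want a customer's offset to stay stable across config refreshes, derive it from the customer identifier instead. A sketch, where `customerId` stands in for whatever identifier your data source has in scope:

```typescript
// Deterministic 0–4 minute offset derived from a customer identifier,
// so the same customer always lands in the same schedule slot.
function staggeredCron(customerId: string, slots = 5): string {
  let hash = 0;
  for (const ch of customerId) {
    hash = (hash * 31 + ch.charCodeAt(0)) % 100_000;
  }
  const minuteOffset = hash % slots;
  return `${minuteOffset}/${slots} * * * *`;
}

staggeredCron("customer-123"); // same input, same offset, every evaluation
```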

For scheduled flows, also enable Singleton Executions in flow control settings. This skips a scheduled run if the previous execution is still in progress, preventing a slow sync from stacking concurrent runs against an already-constrained API.

Proactive pattern 2 – Batching and bulk endpoints

Combine multiple operations into a single API call where the API supports it, reducing the total request count. This is your first line of rate-limit defense, since fewer requests mean fewer opportunities to hit limits.

Use this whenever you're syncing or updating collections of records and the API provides a batch or bulk endpoint. This should be the default approach for any operation that handles more than a few records at a time.

Here are a few examples:

  • HubSpot's batch contacts API – create or update up to 100 contacts in a single request.
  • Salesforce Bulk API 2.0 – designed for large data operations, with a more generous rate limit tier.
  • Stripe's batch operations – create multiple objects in one call.

Always check whether the API distinguishes between single-record and bulk endpoints in its rate-limiting documentation. In many cases, bulk endpoints are subject to separate, more generous rate limit tiers. Using them isn't just more efficient, it's the best path for high-volume operations.

Batching in Prismatic

In the low-code designer, use a Loop component to accumulate records into an array using each iteration's output, then pass that aggregated array to the HTTP component's batch endpoint after the loop completes. The loop step automatically collects the output of the last step in each iteration into a results array.

For code-native integrations, collect records in memory and chunk them before sending:

const BATCH_SIZE = 100;

for (let i = 0; i < records.length; i += BATCH_SIZE) {
  const batch = records.slice(i, i + BATCH_SIZE);
  await client.post("/contacts/batch", { inputs: batch });
  await sleep(DELAY_MS);
}

Deduplication and partial success handling

Pair batching with deduplication where needed. In webhook-triggered Prismatic flows, the FIFO Queue trigger includes built-in message deduplication within a 10-minute window, preventing duplicate events from stacking before they reach your batch queue.

Batching introduces a failure mode that single-record requests don't have: partial success. You send 100 records in a batch; 97 succeed and 3 fail. The API returns per-record error details in the response body.

Handling this correctly requires:

  • Parsing per-record results from the batch response, not just the top-level HTTP status
  • Retrying only the 3 failed records, not all 100
  • Logging the failures with enough context to debug: records, errors, and customers

In a code-native integration, generate structured log entries for partial failures via context.logger:

const failedRecords: unknown[] = [];

for (const result of batchResponse.data.results) {
  if (result.status === "error") {
    context.logger.error("Batch record failed", {
      recordId: result.id,
      error: result.message,
    });
    failedRecords.push(result);
  }
}

// retry only the failures
if (failedRecords.length > 0) {
  await client.post("/contacts/batch", { inputs: failedRecords });
}

Missing any of these steps leads either to data loss (ignoring the 3 failures) or to duplicate writes (retrying all 100 and re-creating the 97 that already succeeded).

Proactive pattern 3 – Priority lane segmentation

Route integration traffic into separate lanes by priority, ensuring that latency-sensitive, customer-visible operations are never delayed by high-volume background work.

Use this when you have a mix of real-time, user-initiated operations and bulk background syncs. This becomes critical as your integration catalog grows. A 35,000-record nightly import should never delay the webhook that fires when a customer's lead record changes status.

  • Lane A (real-time) – Webhooks, user-initiated actions, small record updates. Latency-sensitive. Should never queue behind bulk work.
  • Lane B (bulk/batch) – Scheduled syncs, historical data migrations, large exports. Throughput-oriented. Tolerates delays.

Priority lanes in Prismatic: separate flows with separate concurrency

In Prismatic, flows are the natural unit of lane separation. A single integration can have multiple flows, each with its own trigger type and concurrency configuration:

  • Flow 1 – Webhook trigger (parallel concurrency). This is Lane A. Webhook events are processed immediately on arrival. No queuing, no wait.
  • Flow 2 – Schedule trigger (throttled or sequential concurrency). And this is Lane B. The bulk sync runs at a controlled rate, concurrency is capped, and it can never starve Lane A's execution capacity.

This lane separation is configuration, not code. You don't need separate worker pools or custom infrastructure, just separate flows with different concurrency modes in the same integration.

FIFO queues for durable Lane B processing

For high-volume Lane B workloads that need durability, Prismatic's FIFO Queue trigger is the right tool. It queues inbound requests and processes them one at a time. When a Deduplication ID is configured in the Flow Control settings, Prismatic prevents the same event from being processed twice within a 10-minute window.

For workloads that need even more durable queuing (where you need messages to survive beyond a single execution window), Prismatic documents a two-flow pattern using Amazon SQS or Azure Service Bus:

  • Write flow – Receives parallel webhook requests and immediately enqueues them to SQS (fast acknowledgment to the sender).
  • Read flow – Runs on a schedule, retrieves one message at a time, processes it, and deletes it.

This is the production-grade equivalent of building custom queue infrastructure, but you are using managed AWS or Azure services rather than your own.
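The shape of the two-flow pattern can be sketched with an in-memory array standing in for SQS, purely for illustration. In production the queue client would be the AWS SDK (or Azure Service Bus), and each function would be its own Prismatic flow; `writeFlow` and `readFlow` are hypothetical names:

```typescript
// In-memory stand-in for SQS, for illustration only.
const queue: string[] = [];

// Write flow: enqueue the webhook payload, then acknowledge immediately.
function writeFlow(payload: unknown): { statusCode: number } {
  queue.push(JSON.stringify(payload));
  return { statusCode: 202 }; // fast acknowledgment to the sender
}

// Read flow: runs on a schedule; takes one message, processes it, deletes it.
function readFlow(process: (msg: unknown) => void): boolean {
  const raw = queue.shift(); // retrieve + delete in one step here;
  if (raw === undefined) return false; // SQS separates receive and delete
  process(JSON.parse(raw));
  return true;
}
```

The key property is the decoupling: the sender gets an immediate 202 regardless of how slowly the read side drains the queue.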

Handling 429s in a queued flow

When a queued flow hits a 429, Prismatic's automatic execution retry handles the re-queue. Configure this per-flow in the trigger settings:

  • Retry attempts – 3–5 (platform max: 10)
  • Minutes between attempts – 2–5
  • Exponential backoff – Enabled (doubles the interval: 2 → 4 → 8 → 16 min)
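With exponential backoff enabled, the wait before each attempt doubles. A quick sketch of the resulting schedule, assuming a 2-minute base interval (`backoffSchedule` is a hypothetical helper):

```typescript
// Minutes to wait before each retry attempt, doubling from a base interval.
function backoffSchedule(baseMinutes: number, attempts: number): number[] {
  return Array.from({ length: attempts }, (_, i) => baseMinutes * 2 ** i);
}

backoffSchedule(2, 4); // [2, 4, 8, 16]
```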

For APIs that return a Retry-After header, parse it in your code to honor the exact wait time the API requests:

try {
  return await client.post("/resource", data);
} catch (error) {
  if (error.response?.status === 429) {
    const retryAfter = parseInt(error.response.headers["retry-after"] ?? "60", 10);
    context.logger.warn(`Rate limited. Waiting ${retryAfter}s before retry.`, {
      endpoint: "/resource",
      retryAfter,
    });
    await new Promise((r) => setTimeout(r, retryAfter * 1_000));
    return await client.post("/resource", data);
  }
  throw error; // non-429 errors propagate normally
}

For longer Retry-After values that would push past Prismatic's 15-minute execution limit, skip the in-process wait and let automatic execution retry handle it instead.
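That decision can be made explicit with a small guard: wait in-process only when the requested delay fits comfortably inside the time the execution has left, otherwise rethrow and let execution retry pick it up. This is an illustrative sketch, not a Prismatic API — `shouldWaitInProcess` and the margin factor are assumptions:

```typescript
const EXECUTION_LIMIT_MS = 15 * 60 * 1_000; // Prismatic's 15-minute execution cap

// Illustrative guard: is there room to honor Retry-After inside this run?
function shouldWaitInProcess(
  retryAfterSeconds: number,
  executionStartedAt: number,
  now = Date.now()
): boolean {
  const remainingBudget = EXECUTION_LIMIT_MS - (now - executionStartedAt);
  // Leave a margin so the retried call itself has time to complete.
  return retryAfterSeconds * 1_000 < remainingBudget * 0.5;
}
```

When this returns false, rethrowing the 429 hands the wait to the platform's execution retry instead of burning the run's remaining time.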

Observability is the key to proper rate limit management

Every pattern in this post is only as effective as your ability to observe it working (or failing). Proactive rate limit handling without observability is engineering by assumption.

For every rate limit hit, at a minimum, log the following:

  • The endpoint that returned the 429
  • Full response headers (X-RateLimit-Remaining, X-RateLimit-Reset, Retry-After)
  • The customer or tenant whose request triggered it
  • The retry attempt number and calculated backoff duration
  • Whether the retry ultimately succeeded or exhausted all retries
context.logger.warn("Approaching rate limit", {
  remaining: headers["x-ratelimit-remaining"],
  resetAt: headers["x-ratelimit-reset"],
});

context.logger.error("Rate limit exceeded", {
  endpoint: url,
  retryAfter: headers["retry-after"],
  attempt: retryCount,
});

Prismatic captures step-level logs for every integration execution, visible in the instance monitor. Logs are retained for 14 days and searchable by severity, flow, and time range.

Streaming logs to your observability stack

With Prismatic's log streaming, these structured log entries flow to Datadog, New Relic, or any other platform that accepts a log stream. Track these metrics over time:

  • Rate limit hit frequency – Per endpoint, per customer, per time of day
  • Retry success rate – What percentage of retried requests eventually succeed vs exhaust the retry limit
  • Average backoff duration – A rising average signals sustained API pressure, not occasional bursts

These metrics enable you to quickly spot the difference between "we handle rate limits gracefully" and "we are structurally over capacity on this API, and it's getting worse." The second scenario calls for architectural changes and not just better retry logic. But, without the data, you can't tell which scenario you are facing.

Using Prismatic alert monitors

Configure Prismatic alert monitors to surface the right signals before your customers notice issues.

  • Execution Failed, Retry Pending – Execution failed for any reason and an automatic retry is scheduled. Use this to confirm rate limit pressure is being handled, not silently dropped.
  • Log Level Matched or Exceeded – Fires when a warn or error log is written. Catches rate limit pressure before it turns into execution failures.
  • Execution Failed – Execution failed with no further retries scheduled. Fires when retry is not configured, or when the final retry attempt fails.
  • Execution Duration Matched or Exceeded – Execution is taking unusually long, which may indicate retry loops are consuming too much execution time.

Alert on these thresholds:

  • Retry exhaustion rate exceeds N% – A reasonable starting point is when more than 1% of requests are failing all retries.
  • Sudden spike in rate limit hits – May indicate an API-side limit change or a customer running an unusually large ad hoc operation.
  • Execution duration trending upward over days – This is a structural capacity signal, not a transient event.

A rate-limiting decision framework

The patterns we've covered in this post (and the prior one) aren't mutually exclusive. It's not at all uncommon for production integrations running at scale to combine several patterns.

Here's a quick reference for handling some of the most common rate-limiting scenarios:

  • Occasional 429 on high-volume operations – Retry with exponential backoff + jitter
  • Predictable bulk sync with known limits – Rate-aware scheduling (proactive throttling)
  • Multiple flows or customers, shared API credentials – Flow concurrency controls (Throttled or Sequential) per flow
  • Syncing collections of records – Batching to reduce total request count
  • Mix of real-time and bulk operations – Priority lane segmentation
  • Multiple customers sharing API credentials – Per-customer OAuth isolation
  • Repeated 429s despite backoff – Automatic execution retry with exponential backoff + an Execution Failed alert monitor
  • Not sure which limit type you're hitting – Log everything first; identify the pattern before optimizing

The architectural reality

When you first build an integration, rate limiting is a code problem. So, you add a delay and a retry, and move on. But when you're executing integrations for hundreds of customers, rate limiting is an architectural problem. The per-request patterns from our prior post are necessary but not sufficient. Governors, queues, priority lanes, and top-level observability are essential for ensuring integrations function reliably at scale.

This is part of why some 80% of integration code ends up being infrastructure rather than the business logic specific to each integration. When you build that infrastructure layer yourself for every integration, the initial build cost and ongoing maintenance compound quickly. When the platform provides that infrastructure, you write the business logic once, trusting that the infrastructure will handle the rest.

In Prismatic, the infrastructure layer is already there:

  • Automatic execution retry with exponential backoff replaces custom retry loops
  • Flow concurrency controls (FIFO/throttled) replace governors and custom worker pools
  • Separate flows with independent triggers and concurrency modes replace custom priority lane infrastructure
  • Per-customer credential management isolates rate limit budgets by tenant automatically
  • Built-in alert monitors and log streaming replace custom monitoring instrumentation

For teams building on Prismatic, rate limiting has stopped being a constant source of support tickets. Rather, it has become a manageable configuration concern – because the platform handles the infrastructure layer, and the integration code handles the business logic.

Check out the demo video to see how Prismatic manages rate limits (and everything else) so you can focus on the business logic.

Get a Demo

Ready to ship integrations 10x faster?

Join teams from Fortune 500s to high-growth startups that have transformed integrations from a bottleneck into a growth driver.