What's a Paginated API and How to Loop Over One in an Integration

APIs allow us to access massive data stores. But it doesn't make sense to dump all that data on us at once. Generally, when an API needs to provide a large amount of data but in a manageable way, it uses pagination. That is, it provides the data in chunks or "pages."

For example, if you search for something on Google, the search engine may say it has 9,200,000,000 results. However, you can only see 10 results at a time. If you want to see more results than those immediately displayed, you may need to view them on separate pages (or keep scrolling down to view additional results as Google dynamically adds them to your browser).

Google could dump all the results on a single page at one time, but that would overwhelm your browser, if not your entire system (and take forever to display). And none of us want to scroll through 9 billion results to find something.

Is pagination available for every API?

Not every API supports pagination, but most do. Most APIs that aren't private (that is, partner, public, and open APIs) need pagination, along with rate limiting and throttling, to handle the volume of requests sent their way. If you have an API accessing a small, unchanging database, pagination may have no value. However, most databases aren't static, and their size drives the need to paginate to keep requests from overwhelming the API.

It is possible to set up pagination for just about any type of API, including REST, SOAP, XML-RPC, and GraphQL. The syntax varies a bit based on the type of API, but the underlying principle (chunking the data to make requests faster, lessening the server load, etc.) remains the same.

Do we have to use pagination if the API provides it?

Short answer: yes. Long answer: it depends. In some cases, it is possible to get all data from a paginated API without being restricted to one page at a time. However, most paginated APIs will limit the query results even if you don't pass in pagination parameters when you query the API. That's where looping (repeatedly querying) the API comes into play, as we'll cover later in this post.

The bottom line is that you don't usually determine if the results should be paginated, though you may have some control over specific pagination parameters.

How does pagination work with an API?

Paging is commonly used with APIs for integrations. The integration queries the API, and the API returns a subset (page 1, as it were) of the data. Either the integration client or the API keeps track of what has already been provided, and the API returns the next subset (page 2) upon an additional request. Different APIs use different parameters to keep track of which page we are on. These parameters include cursor, key, and offset. Using them, the integration can tell the API, "Last time I asked for page 17, now I'd like page 18."
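To make the cursor idea concrete, here is a minimal sketch in Python. It does not call a real API: fake_api, the next_cursor field, and the response shape are all hypothetical stand-ins for whatever a given API actually returns.

```python
from typing import Callable, Iterator, Optional

# Seven fake records standing in for a remote dataset (illustration only)
RECORDS = [f"record-{n}" for n in range(1, 8)]

def fake_api(cursor: Optional[int] = None, page_size: int = 3) -> dict:
    """Hypothetical stand-in for an HTTP call; the response shape is an assumption."""
    start = cursor or 0
    end = start + page_size
    # A cursor of None tells the client there is nothing left to fetch
    next_cursor = end if end < len(RECORDS) else None
    return {"data": RECORDS[start:end], "next_cursor": next_cursor}

def iter_pages(fetch: Callable[..., dict]) -> Iterator[list]:
    """Yield one page at a time until the API stops handing back a cursor."""
    cursor = None
    while True:
        response = fetch(cursor=cursor)
        yield response["data"]
        cursor = response["next_cursor"]
        if cursor is None:
            break

pages = list(iter_pages(fake_api))
```

With seven records and a page size of three, the client ends up with three pages, the last holding a single record. An offset-based API works the same way, except the client computes the next starting position itself instead of receiving it from the API.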

We can use the API at https://jsonplaceholder.typicode.com/comments for a real-world paging demonstration. Click the link to navigate to that URL. You should now see 500 sample comments.

If you don't want to view all 500 of them at once, add a _limit parameter to the end of the URL and specify how many you want. Here's what it looks like to view ten comments: https://jsonplaceholder.typicode.com/comments?_limit=10.

But this only gives you the first ten results. What if you want to see results 11 through 20? Add the _start parameter, which sets the offset of the first result to return. The resulting URL is https://jsonplaceholder.typicode.com/comments?_limit=10&_start=10.

From here, you can manually page through the results until there are no more:

  • https://jsonplaceholder.typicode.com/comments?_limit=10&_start=10
  • https://jsonplaceholder.typicode.com/comments?_limit=10&_start=20
  • https://jsonplaceholder.typicode.com/comments?_limit=10&_start=30
  • And so forth
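The same sequence of URLs can be generated rather than typed by hand. The sketch below only builds the URLs and makes no requests. The page_urls helper is a hypothetical name, the total of 500 comes from the comment count mentioned above, and the sketch assumes that _start=0 behaves the same as leaving _start out.

```python
from urllib.parse import urlencode

BASE_URL = "https://jsonplaceholder.typicode.com/comments"

def page_urls(total: int, limit: int = 10):
    """Yield one URL per page, stepping the _start offset by `limit` each time."""
    for start in range(0, total, limit):
        query = urlencode({"_limit": limit, "_start": start})
        yield f"{BASE_URL}?{query}"

# 500 comments at 10 per page works out to 50 URLs
urls = list(page_urls(total=500))
```

Each URL could then be passed to an HTTP client of your choice to fetch that page.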

How do you get the integration to loop over the API?

It was easy enough to manually change the parameters for the URLs above to get the next page of data, but what if we want to do that programmatically? That's where looping comes in. When we talk about looping over an API for an integration, we mean that the integration includes code that loops (repeats) with slight variations to pull all the needed data from the API – one page at a time.

As mentioned earlier, different APIs use different terms to define the beginning and end of a "page" of data. Let's suppose that, for the following example, the API returns the results under a data key, along with metadata under a page_info key that reads like { currentPage: 5, numPages: 10 }. Based on that, you could write a bit of Python code that loops over the API. It would look like this:

import requests

session = requests.Session()

# Fetch one page of results at a time
def get_item_pages():
  url = "https://example.com/api/items"

  # Fetch the first page and yield it
  response = session.get(url).json()
  yield response['data']

  # Loop over the remaining pages and yield one at a time
  num_pages = response['page_info']['numPages']
  for page in range(2, num_pages + 1):
    response = session.get(url, params={'page': page}).json()
    yield response['data']

# Loop over each page
for page_of_items in get_item_pages():
  # Loop over each item on the page
  for item in page_of_items:
    # Do something with each item (printing is just a placeholder)
    print(item)

Of course, you are not restricted to Python to loop over the API. You could write the looping function in any modern programming language (TypeScript, C#, etc.).

Where else can you see this looping pattern?

If you've run SQL database queries, you may have used a similar pattern, with LIMIT and OFFSET. In SQL, to select the first ten rows from a database table, you could write the following:

SELECT *
FROM   items
LIMIT  10;

That query is similar to what an API runs when it returns data for you. Now, if you wanted to get the next ten items (11 through 20) from the query, you'd write the following SQL:

SELECT *
FROM   items
LIMIT  10
OFFSET 10;
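To try the LIMIT and OFFSET pattern without setting up a database server, here is a small sketch using Python's built-in sqlite3 module. The items table, its columns, and the 25 sample rows are made up for illustration.

```python
import sqlite3

# In-memory database with 25 made-up rows
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (name) VALUES (?)",
    [(f"item-{n}",) for n in range(1, 26)],
)

def fetch_page(page: int, page_size: int = 10) -> list:
    """Return one page of rows; page numbers start at 1."""
    offset = (page - 1) * page_size
    cursor = conn.execute(
        # ORDER BY keeps the page boundaries stable between queries
        "SELECT id, name FROM items ORDER BY id LIMIT ? OFFSET ?",
        (page_size, offset),
    )
    return cursor.fetchall()

first_page = fetch_page(1)   # rows 1 through 10
second_page = fetch_page(2)  # rows 11 through 20
third_page = fetch_page(3)   # the remaining 5 rows
```

The ORDER BY clause matters here: without a defined order, the database is free to return rows in any order, so consecutive pages could overlap or skip rows.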

What are common issues when looping over an API?

  • Many datasets are dynamic. As a result, data that corresponds with your API query parameters may change while the integration is looping over the API. Some APIs handle this by keeping track of the data provided so far. In other cases, you might need the integration to handle any processing-time changes to the dataset.

  • Some APIs don't let you know how many pages there are. Some APIs report the number of available pages up front. Others don't say anything until you are on the last page and there's no data left. You might even get a number of results or pages, but it's evident that the number is an approximation. That's how Google gave us the number 9,200,000,000 in the first example we discussed. Google didn't run SELECT COUNT(*) FROM ... for the underlying database, or we would have received a non-rounded number.

  • An infinite loop is possible. If the API your integration calls doesn't inform you how many pages there are, you'll need to loop over the API until you run out of pages. If your code can't identify that it's reached the end of the data, your integration could find itself in an infinite loop.
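One way to guard against the last two issues is to loop until the API returns an empty page, with a hard cap on the number of requests. The sketch below assumes an API that signals the end of the data with an empty page. The names iter_all_items, fake_fetch, and broken_fetch are hypothetical, and the two fetch functions stand in for real HTTP calls.

```python
from typing import Callable, Iterator, List

def iter_all_items(fetch_page: Callable[[int], List], max_pages: int = 1000) -> Iterator:
    """Yield items page by page, stopping at the first empty page."""
    for page in range(1, max_pages + 1):
        items = fetch_page(page)
        if not items:
            return  # an empty page means we ran out of data
        yield from items
    # The cap was reached without ever seeing an empty page
    raise RuntimeError(f"Stopped after {max_pages} pages without reaching the end")

def fake_fetch(page: int, page_size: int = 10) -> List[int]:
    """Stand-in API with 25 items that returns an empty list past the end."""
    data = list(range(25))
    return data[(page - 1) * page_size : page * page_size]

def broken_fetch(page: int) -> List[str]:
    """Stand-in for a misbehaving API that never returns an empty page."""
    return ["same item"]

all_items = list(iter_all_items(fake_fetch))
```

Calling iter_all_items(broken_fetch, max_pages=5) raises a RuntimeError after five requests instead of looping forever. The right cap depends on how much data you expect, so treat the default of 1000 as a placeholder.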

Looping is critical for integrations

Looping over an API is critical to many integrations dealing with large datasets. Every API can have slightly different rules and terms, so it's good to know what you are working with before you start.

Our pre-built loop component makes it easy for devs to programmatically loop over an API to get the results they need for their B2B SaaS integrations. For more detail on looping over an API using Prismatic, check out Looping Over a Paginated API.

To learn how an embedded iPaaS can help with more than just looping over APIs, schedule a demo, or contact us.

About Prismatic

Prismatic, the world's most versatile embedded iPaaS, helps B2B SaaS teams launch powerful product integrations up to 8x faster. The industry-leading platform provides a comprehensive toolset so teams can build integrations fast, deploy and support them at scale, and embed them in their products so customers can self-serve. It encompasses both low-code and code-native building experiences, pre-built app connectors, deployment and support tooling, and an embedded integration marketplace. From startups to Fortune 100, B2B SaaS companies across a wide range of verticals and many countries rely on Prismatic to power their integrations.
