Platform
Connectors
Solutions
Developers
Resources
Pricing
Start For Free
Tech Topics

What's a Paginated API and How to Loop Over One in an Integration

What's a Paginated API and How to Loop Over One in an Integration

APIs allow us to access massive data stores. But it doesn't make sense to dump all that data on us at once. Generally, when an API needs to provide a large amount of data but in a manageable way, it uses pagination. That is, it provides the data in chunks or "pages."

For example, if you search for something on Google, the search engine may say that it has 9,200,000,000 results. However, you can only see ten results at a time. If you want to see more results than those immediately displayed, you need to view them on separate pages.

Google could dump all the results on a single page, but that would overwhelm your browser, if not your entire system (and take forever to display). And none of us want to scroll through 9 billion results to find something.

How does pagination work with an API?

Paging is used commonly with APIs for integrations as well. The integration queries the API and returns a subset (page 1, as it were) of the data. Either the integration client or the API keeps track of the provided data and returns the next subset (page 2) of the data upon request. Different APIs accept different terms to keep track of which page we are on. Some terms commonly used for this are cursor, start, or offset. The integration uses the term to "bookmark" the last page of data provided. That way, the integration can tell the API something like, "Last time I asked for page 17, now I'd like page 18."

We can use the API at https://jsonplaceholder.typicode.com/comments for a real-world paging demonstration. Click the link to navigate to that URL. You should now see 500 sample comments.

If you don't want to view all 500 of them at once, add a _limit parameter to the end of the URL and specify how many you want. Here's what it looks like to view ten comments: https://jsonplaceholder.typicode.com/comments?_limit=10.

But this only gives you the first ten results. What if you want to see results 11 through 20? Add the _start cursor. The resulting URL is https://jsonplaceholder.typicode.com/comments?_limit=10&_start=10.

From here, you can manually page through the results until there are no more:

  • https://jsonplaceholder.typicode.com/comments?_limit=10&_start=10
  • https://jsonplaceholder.typicode.com/comments?_limit=10&_start=20
  • https://jsonplaceholder.typicode.com/comments?_limit=10&_start=30
  • And so forth

How do you get the integration to loop over the API?

It's easy enough to change the parameters for the URLs above to get the next page of data, but what if we want to do that programmatically? That's where looping comes in. When we talk about looping over an API for an integration, we mean that the integration includes code that loops (repeats) with slight variations to pull all the needed data from the API, one page at a time.

As mentioned earlier, different APIs use different terms to define the beginning and end of a "page" of data. Let's suppose that, for the following example, the API returns metadata along with the result that reads like { currentPage: 5, numPages: 10 }. Based on that, you could write a bit of Python code that loops over the API. It would look like this:


import requests

session = requests.Session()

# Fetch one page of results at a time
def get_items_pages():
  url = "https://example.com/api/items"

  # Fetch the first page and return it
  response = session.get(url).json()
  yield response['data']

  # Loop over the remaining pages and return one at a time
  number_pages = response['page_info']['numPages']
  for page in range(2, num_pages+1):
    response = session.get(url, params={'page': page}).json()
    yield response['data']

# Loop over each page
for page_of_items in get_item_pages():
  # Loop over each item on the page
  for item in page_of_items:
    # Do something with each item

Where else can you see this looping pattern?

If you've run SQL database queries, you may have used a similar pattern, with LIMIT and OFFSET. In SQL, to select the first ten rows from a database table, you could write the following:

SELECT id,
       name,
       price,
       quantity
FROM   items
LIMIT  10

That query is similar to what an API runs when it returns data for you. Now, if you wanted to get the next ten items (11 through 20) from the query, you’d write the following SQL:

SELECT id,
       name,
       price,
       quantity
FROM   items
LIMIT  10
OFFSET 10

What are common issues when looping over an API?

  • Many datasets are dynamic. As a result, data that corresponds with your API query parameters may change while the integration is looping over the API. Some APIs handle this by keeping track of the data provided so far. In other cases, you might need the integration to handle any processing-time changes to the dataset.

  • Some APIs don't let you know how many pages there are. Some APIs will let you know how many pages are available. Others don't say anything until you are on the last page and there's no data left. You might even get a number of results or pages, but it's evident that the number is an approximation. That's how Google gave us the number 9,200,000,000 in the first example we discussed. Google didn't run SELECT COUNT(*) FROM ... for the underlying database, or we would have received a non-rounded number.

  • An infinite loop is possible. If the API your integration calls doesn't inform you how many pages there are, you'll need to loop over the API until you run out of pages. If your code can’t identify that it’s reached the end of the data, your integration could find itself in an infinite loop.

Conclusion

Looping over an API is critical to many integrations dealing with large datasets. Every API can have slightly different rules and terms, so it's good to know what you are working with before you start.

If you'd like to learn how an embedded iPaaS can simplify the process of looping over an API for your B2B SaaS integrations, please contact us.


About Prismatic

Prismatic is the integration platform for B2B software companies. It's the quickest way to build integrations to the other apps your customers use and to add a native integration marketplace to your product. A complete embedded iPaaS solution that empowers your whole organization, Prismatic encompasses an intuitive integration designer, embedded integration marketplace, integration deployment and support, and a purpose-built cloud infrastructure. Prismatic was built in a way developers love and provides the tools to make it perfectly fit the way you build software.

Get the latest from Prismatic

Subscribe to receive updates, product news, blog posts, and more.

Get the latest from Prismatic

© 2022 Prismatic LLC. All rights reserved.