
When working with AWS APIs through Python, developers often hit a frustrating wall: API responses that only return partial results. AWS limits the number of items returned in a single API call to prevent system overload, which means getting complete datasets requires multiple requests. Boto3’s paginator feature automatically handles these multiple requests, letting developers retrieve entire result sets without manually managing continuation tokens or markers.

The pagination process breaks down large datasets into manageable chunks. Instead of receiving all 10,000 S3 objects at once, AWS might return 1,000 objects per page with a token to fetch the next batch. This approach protects both AWS services and client applications from memory issues and network timeouts.

Understanding pagination becomes essential when building production applications that handle real-world data volumes. The difference between a script that crashes after retrieving partial results and one that reliably processes complete datasets often comes down to proper pagination implementation. Most AWS services use pagination, making this knowledge critical for any Python developer working with cloud resources.

AWS API Pagination with Boto3: A Step-by-Step Guide

When working with AWS APIs using Boto3, you often encounter responses that return partial results due to API limits on the number of items per request. To retrieve the full dataset, you need to handle pagination — making multiple requests to fetch all results seamlessly.

This guide explains how to use Boto3’s built-in pagination features effectively.


What Is Pagination in AWS APIs?

Many AWS API operations limit the number of items returned in a single response (for example, 1,000 S3 objects per list_objects_v2 call). To get the complete list, you must make additional requests, each continuing where the last one left off. This process is called pagination.
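
For context, here is roughly what that looks like when you manage the continuation token yourself (a minimal sketch; 'my-bucket' is a placeholder):

import boto3

s3_client = boto3.client('s3')
kwargs = {'Bucket': 'my-bucket'}

while True:
    response = s3_client.list_objects_v2(**kwargs)
    for obj in response.get('Contents', []):
        print(obj['Key'])
    # IsTruncated signals that more results remain
    if not response.get('IsTruncated'):
        break
    # Resume the listing where this response left off
    kwargs['ContinuationToken'] = response['NextContinuationToken']

Paginators exist to remove exactly this token-handling loop, as the next section shows.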


How Boto3 Helps with Pagination

Boto3 simplifies pagination with Paginator objects. Instead of manually handling tokens or markers, you can use paginators to automatically iterate through all pages of results.


Step-by-Step: Using Paginators in Boto3

1. Create a Client

Start by creating a Boto3 client for the AWS service you want to interact with.

import boto3

s3_client = boto3.client('s3')

2. Get a Paginator Object

Use the client’s get_paginator method, passing the name of the operation you want to paginate.

paginator = s3_client.get_paginator('list_objects_v2')

3. Use the Paginator to Iterate Through Pages

Call the paginator’s paginate method with any required parameters. It returns an iterable of response pages.

pages = paginator.paginate(Bucket='my-bucket')

for page in pages:
    for obj in page.get('Contents', []):
        print(obj['Key'])

This code will print all object keys in the specified S3 bucket, regardless of how many objects there are.


Example: Paginating EC2 Instances

ec2_client = boto3.client('ec2')
paginator = ec2_client.get_paginator('describe_instances')

for page in paginator.paginate():
    for reservation in page['Reservations']:
        for instance in reservation['Instances']:
            print(instance['InstanceId'], instance['State']['Name'])

Tips and Best Practices

  • Check AWS documentation for which operations support pagination.
  • Use paginators to avoid manually managing tokens like NextToken or Marker.
  • Combine paginators with filters or parameters to limit results if needed (see the sketch after this list).
  • For large datasets, paginators improve memory efficiency by loading one page at a time.
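
To illustrate the filtering and limiting tips above, here is a minimal sketch that combines a server-side Prefix parameter with a MaxItems cap (the bucket and prefix names are placeholders):

import boto3

s3_client = boto3.client('s3')
paginator = s3_client.get_paginator('list_objects_v2')

# Prefix filters on the server; MaxItems caps the total across all pages
pages = paginator.paginate(
    Bucket='my-bucket',
    Prefix='logs/2024/',
    PaginationConfig={'MaxItems': 500}
)

for page in pages:
    for obj in page.get('Contents', []):
        print(obj['Key'])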

Summary

Boto3 paginators provide an easy and reliable way to handle AWS API pagination:

  • Create a client.
  • Obtain a paginator for the operation.
  • Iterate over all pages using paginate().

This approach ensures you retrieve complete result sets without extra manual effort.

For more details, see the official Boto3 Pagination documentation.

Key Takeaways

  • Boto3 paginators automatically manage multiple API requests needed to retrieve complete datasets from AWS services
  • AWS APIs limit response sizes to prevent system overload, requiring pagination to access full result sets
  • Proper pagination implementation prevents common issues like incomplete data retrieval and application crashes

Frequently Asked Questions

Developers often encounter specific challenges when implementing pagination with Boto3. These common questions address practical implementation issues and optimization techniques.

How do I use the Boto3 Paginator for listing objects in an S3 bucket?

Create an S3 client first, then get the paginator for the list_objects_v2 operation. The paginator handles the token management automatically.

import boto3

client = boto3.client('s3', region_name='us-west-2')
paginator = client.get_paginator('list_objects_v2')

Call the paginate method with your bucket name to create a page iterator. Loop through each page to access the objects.

page_iterator = paginator.paginate(Bucket='your-bucket-name')

for page in page_iterator:
    if 'Contents' in page:
        for obj in page['Contents']:
            print(obj['Key'])

The paginator automatically requests new pages when needed. Each page contains up to 1,000 objects by default.

What is the default page size for a paginator in Boto3, and how can it be adjusted?

Default page sizes vary by AWS service. S3 returns up to 1,000 objects per page, while IAM list operations such as list_users return up to 100 items by default.

Use the PageSize parameter in PaginationConfig to control items per page. Set this when calling the paginate method.

page_iterator = paginator.paginate(
    Bucket='your-bucket',
    PaginationConfig={'PageSize': 100}
)

AWS services may return fewer items than requested based on their limits. Some services ignore the PageSize parameter entirely.

MaxItems limits the total number of items returned across all pages. This differs from PageSize, which controls the size of each individual page.
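
A short sketch contrasting the two, reusing the S3 paginator from above (placeholder bucket name): this requests pages of 100 objects but stops after 250 items in total.

page_iterator = paginator.paginate(
    Bucket='your-bucket-name',
    PaginationConfig={
        'PageSize': 100,  # items per API call (maps to MaxKeys for S3)
        'MaxItems': 250   # total items across all pages
    }
)

count = sum(1 for page in page_iterator for _ in page.get('Contents', []))
print(count)  # at most 250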

Can you provide an example of using paginator in Boto3 to retrieve items from a DynamoDB table?

Create a DynamoDB client and get the scan paginator. The scan operation retrieves all items from a table.

import boto3

client = boto3.client('dynamodb', region_name='us-east-1')
paginator = client.get_paginator('scan')

Use the paginate method with your table name. Process each page of results separately.

page_iterator = paginator.paginate(
    TableName='your-table-name',
    PaginationConfig={'PageSize': 25}
)

for page in page_iterator:
    items = page.get('Items', [])
    for item in items:
        print(item)

DynamoDB scan operations can consume many read capacity units. Consider using query operations when possible for better performance.
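
For comparison, a query paginator reads only the items under one partition key instead of scanning the whole table. A minimal sketch, assuming a table whose partition key is named pk (the table name, key name, and key value are hypothetical):

paginator = client.get_paginator('query')

page_iterator = paginator.paginate(
    TableName='your-table-name',
    KeyConditionExpression='pk = :pk',
    ExpressionAttributeValues={':pk': {'S': 'user#123'}}
)

for page in page_iterator:
    for item in page.get('Items', []):
        print(item)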

How does the Boto3 Paginator handle large sets of results when using AWS services?

Paginators split large result sets into smaller chunks called pages. This prevents memory issues and API timeouts with large datasets.

The paginator stores continuation tokens automatically. These tokens tell AWS where to start the next page of results.

Each API call retrieves one page of data. The paginator makes additional calls as you iterate through results.

Memory usage stays constant regardless of total result size. Only one page loads into memory at a time.

AWS APIs set maximum limits per page to protect their systems. Paginators work within these service-specific constraints.
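
Although the tokens are managed for you, the page iterator exposes the most recent one as resume_token, which you can save and later pass back as StartingToken. A minimal sketch reusing the S3 paginator from the first question (process is a hypothetical per-page handler):

page_iterator = paginator.paginate(Bucket='your-bucket-name')

for page in page_iterator:
    process(page)  # hypothetical handler for one page of results
    # Token for resuming after this page, usable as StartingToken later
    saved_token = page_iterator.resume_token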

Is it possible to filter or search results when using a Paginator in Boto3?

Server-side filtering passes parameters to the AWS API before pagination starts. This reduces data transfer and improves performance.

operation_parameters = {
    'Bucket': 'your-bucket',
    'Prefix': 'photos/'
}
page_iterator = paginator.paginate(**operation_parameters)

Client-side filtering uses JMESPath expressions on paginated results. Apply the search method to a page iterator for this approach.

page_iterator = paginator.paginate(Bucket='your-bucket')
filtered_iterator = page_iterator.search("Contents[?Size > `1000`][]")

for item in filtered_iterator:
    print(item)

JMESPath filtering processes each page individually. Large result sets may still transfer unnecessary data from AWS.

Combine both methods when possible. Use server-side filtering to reduce data transfer, then apply JMESPath for complex conditions.
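
Putting both together, a minimal sketch (placeholder bucket and prefix): the Prefix parameter narrows results on the server, and the JMESPath expression then keeps only objects larger than 1 MB on the client.

page_iterator = paginator.paginate(
    Bucket='your-bucket-name',
    Prefix='photos/'
)
filtered_iterator = page_iterator.search("Contents[?Size > `1048576`][]")

for item in filtered_iterator:
    print(item['Key'], item['Size'])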

What are the best practices for efficiently handling API pagination in Boto3 with AWS services?

Use server-side filtering parameters to reduce data transfer costs. Filter at the AWS API level before pagination begins.

Set appropriate PageSize values based on your use case. Smaller pages reduce memory usage but increase API calls.

Store pagination tokens when processing large datasets over time. Use StartingToken to resume from specific positions.

page_iterator = paginator.paginate(
    TableName='large-table',
    PaginationConfig={
        'PageSize': 100,
        'StartingToken': saved_token  # captured earlier via page_iterator.resume_token
    }
)

Handle API rate limits with exponential backoff. Paginators can trigger rate limiting with rapid successive calls.
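
Boto3's built-in retry modes can help here. A minimal sketch that enables adaptive retries, which add client-side rate limiting on top of standard retry behavior, on the client used for pagination:

import boto3
from botocore.config import Config

retry_config = Config(retries={'max_attempts': 10, 'mode': 'adaptive'})
client = boto3.client('dynamodb', config=retry_config)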

Process pages immediately rather than storing all results in memory. This approach scales better with large datasets.

Use MaxItems to limit total results when you only need a subset. This prevents unnecessary API calls and data transfer.