PDF Component

Search and extract data from PDF documents

Component key: pdf

Description

PDF (Portable Document Format) is a file format developed by Adobe for presenting documents independently of software, hardware, or operating systems. The pdf component searches PDF documents for text, lists page numbers, extracts individual pages, and extracts text and table data from a document.

This component does not support encrypted PDF documents.

Actions

Extract All Text

Extracts all text from the specified PDF document and returns it as an array of text strings. | key: extractAllText

Input | Notes | Example
PDF Data | The PDF file data to process. This can be a file reference from a previous step. |

Extract Page

Extracts the specified page from the PDF document and returns it as a new separate PDF document. | key: extractPage

Input | Notes | Example
Page Number | The page number to extract from the PDF. | 5
PDF Data | The PDF file data to process. This can be a file reference from a previous step. |

Extract Page Text

Extracts text from the specified page range in the PDF document. | key: extractPageText

Input | Notes | Example
Page End | The ending page number for extraction. If not provided, only the start page is extracted. | 5
Page Start | The starting page number for extraction. | 1
PDF Data | The PDF file data to process. This can be a file reference from a previous step. |
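
The way Page Start and Page End combine can be sketched in a few lines. This is an illustration only: the helper name `resolve_page_range` is hypothetical, and the sketch assumes that pages are numbered from 1 and that Page End is inclusive, neither of which is confirmed above.

```python
from typing import List, Optional


def resolve_page_range(page_start: int, page_end: Optional[int] = None) -> List[int]:
    """Return the 1-based page numbers a start/end pair would cover."""
    if page_start < 1:
        raise ValueError("page numbers start at 1")
    # Mirrors the Page End note: with no end page, only the start page is used.
    last = page_start if page_end is None else page_end
    if last < page_start:
        raise ValueError("Page End must not be before Page Start")
    # Assumption: the end page is included in the range.
    return list(range(page_start, last + 1))


print(resolve_page_range(1, 5))
print(resolve_page_range(3))
```

With the example inputs from the table (Page Start 1, Page End 5), the sketch covers pages 1 through 5; with Page End left empty, it covers only the start page.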

Extract Structured Text

Extracts all text items from the PDF with their position coordinates, dimensions, font metadata, and layout flags for custom parsing. | key: extractStructuredText

Input | Notes | Example
Page End | The ending page number for extraction. If not provided, extraction continues to the last page. | 5
Page Start | The starting page number for extraction. If not provided, extraction starts from the first page. | 1
PDF Data | The PDF file data to process. This can be a file reference from a previous step. |
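
As one example of the custom parsing this action is meant to support, the sketch below keeps only the text items that fall inside a rectangular region of a page. The item shape is an assumption: the keys `text`, `x`, `y`, `width`, and `height` are hypothetical stand-ins for the position and dimension fields, and the sample items are made up rather than real component output.

```python
from typing import Any, Dict, List

Item = Dict[str, Any]  # hypothetical shape: text, x, y, width, height


def items_in_region(items: List[Item], left: float, bottom: float,
                    right: float, top: float) -> List[Item]:
    """Keep the items whose anchor point (x, y) lies inside the rectangle."""
    return [
        item for item in items
        if left <= item["x"] <= right and bottom <= item["y"] <= top
    ]


# Made-up items for illustration only.
sample_items = [
    {"text": "Invoice", "x": 72.0, "y": 720.0, "width": 60.0, "height": 14.0},
    {"text": "Total", "x": 400.0, "y": 100.0, "width": 30.0, "height": 10.0},
    {"text": "$42.00", "x": 450.0, "y": 100.0, "width": 40.0, "height": 10.0},
]
footer = items_in_region(sample_items, left=350, bottom=90, right=560, top=110)
print([item["text"] for item in footer])
```

A region filter like this is useful when a value always appears in the same area of a form, such as a total in the lower right corner.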

Extract Table Data

Detects and extracts tabular structures from the PDF using coordinate-based row and column clustering, returning two-dimensional string arrays. | key: extractTableData

Input | Notes | Example
Column Tolerance | X-coordinate tolerance in PDF points for detecting table column boundaries. Default is 10 points. Decrease for dense tables, increase for tables with wider spacing. | 10
Page End | The ending page number for extraction. If not provided, extraction continues to the last page. | 5
Page Start | The starting page number for extraction. If not provided, extraction starts from the first page. | 1
PDF Data | The PDF file data to process. This can be a file reference from a previous step. |
Row Tolerance | Y-coordinate tolerance in PDF points for grouping text items into table rows. Default is 3 points. | 3
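
To give a feel for how the two tolerances interact, here is a simplified sketch of coordinate-based clustering. It is not the component's actual algorithm: the `(text, x, y)` item shape and all function names are hypothetical, and the sketch assumes y grows upward on the page.

```python
from bisect import bisect_right
from typing import List, Tuple

TextItem = Tuple[str, float, float]  # (text, x, y); a hypothetical item shape


def cluster(values: List[float], tolerance: float) -> List[float]:
    """Collapse coordinates into sorted cluster anchors.

    A value joins the current cluster when it lies within `tolerance`
    of that cluster's first (smallest) value; otherwise it starts a new one.
    """
    anchors: List[float] = []
    for value in sorted(values):
        if not anchors or value - anchors[-1] > tolerance:
            anchors.append(value)
    return anchors


def cluster_index(anchors: List[float], value: float) -> int:
    """Index of the cluster a value fell into: the last anchor <= value."""
    return bisect_right(anchors, value) - 1


def to_table(items: List[TextItem], row_tolerance: float = 3.0,
             column_tolerance: float = 10.0) -> List[List[str]]:
    """Arrange text items into a two-dimensional list of strings."""
    rows = cluster([y for _, _, y in items], row_tolerance)
    columns = cluster([x for _, x, _ in items], column_tolerance)
    table = [["" for _ in columns] for _ in rows]
    for text, x, y in items:
        r, c = cluster_index(rows, y), cluster_index(columns, x)
        # Items that share a cell are joined with a space.
        table[r][c] = f"{table[r][c]} {text}".strip()
    # Assumes y grows upward, so the largest y is the top row of the table.
    return table[::-1]


# Made-up items with slightly uneven coordinates.
sample = [
    ("Name", 50.0, 700.0), ("Qty", 200.0, 700.0),
    ("Widget", 50.0, 685.0), ("3", 201.5, 684.2),
    ("Gadget", 50.4, 670.0), ("7", 200.0, 670.5),
]
print(to_table(sample))
```

In this sketch a smaller tolerance splits items into more rows or columns and a larger one merges them, which matches the guidance to decrease Column Tolerance for dense tables and increase it for widely spaced ones.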

Extract Text by Pattern

Extracts text from the specified PDF document that matches the search text. | key: extractTextByPattern

Input | Notes | Example
Case Sensitive | When true, the search is case-sensitive. | true
Characters After | The number of characters to extract after the search pattern. If not provided, the entire page is returned. | 10
Search Pattern | The text to search for in the PDF document. | Some Text
PDF Data | The PDF file data to process. This can be a file reference from a previous step. |
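
The interplay between Search Pattern, Case Sensitive, and Characters After can be sketched on a single page of plain text. The function name `text_after_pattern` is hypothetical, and two behaviors are assumptions rather than documented facts: the sketch uses only the first match, and it returns the characters that follow the match without the match itself.

```python
import re
from typing import Optional


def text_after_pattern(page_text: str, pattern: str,
                       characters_after: Optional[int] = None,
                       case_sensitive: bool = True) -> Optional[str]:
    """Return text following the first match, or None when nothing matches."""
    flags = 0 if case_sensitive else re.IGNORECASE
    # The pattern is treated as literal text, so it is escaped before searching.
    match = re.search(re.escape(pattern), page_text, flags)
    if match is None:
        return None
    if characters_after is None:
        # Mirrors the Characters After note: without a count, the whole page comes back.
        return page_text
    return page_text[match.end():match.end() + characters_after]


print(text_after_pattern("Invoice No: 12345 Date: 2024", "Invoice No: ", 5))
```

A fixed character count works well for values with a known width, such as invoice numbers or dates.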

Extract Text with Layout

Extracts text from the PDF with line breaks and paragraph spacing preserved from the original document layout. | key: extractTextWithLayout

Input | Notes | Example
Line Tolerance | Y-coordinate tolerance in PDF points for grouping text items into lines. Items within this vertical distance are considered same-line. Default is 2 points. Increase for PDFs with inconsistent text positioning. | 2
Page End | The ending page number for extraction. If not provided, extraction continues to the last page. | 5
Page Start | The starting page number for extraction. If not provided, extraction starts from the first page. | 1
PDF Data | The PDF file data to process. This can be a file reference from a previous step. |
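
The effect of Line Tolerance can be illustrated with a small grouping routine. This is a sketch under stated assumptions, not the component's implementation: items are hypothetical `(text, x, y)` tuples, y is assumed to grow upward, and each item is compared against the first item of the current line.

```python
from typing import List, Tuple

TextItem = Tuple[str, float, float]  # (text, x, y); a hypothetical item shape


def group_into_lines(items: List[TextItem], line_tolerance: float = 2.0) -> List[str]:
    """Join items whose y values lie within `line_tolerance` into one line."""
    lines: List[List[TextItem]] = []
    # Assumes y grows upward, so descending y reads from top to bottom.
    for item in sorted(items, key=lambda it: -it[2]):
        if lines and abs(lines[-1][0][2] - item[2]) <= line_tolerance:
            lines[-1].append(item)
        else:
            lines.append([item])
    # Within a line, order items left to right by x.
    return [
        " ".join(text for text, _, _ in sorted(line, key=lambda it: it[1]))
        for line in lines
    ]


# Made-up items whose baselines differ by less than one point.
sample = [
    ("world", 120.0, 699.2), ("Hello", 72.0, 700.0),
    ("Second", 72.0, 686.0), ("line", 115.0, 685.1),
]
print(group_into_lines(sample))
print(group_into_lines(sample, line_tolerance=0.5))
```

With the default tolerance of 2 points the sample collapses into two lines; with a tolerance of 0.5 every item ends up on its own line, which is why a larger value helps with inconsistently positioned text.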

Find Pattern

Searches the PDF document and returns page numbers containing text that matches the search criteria. | key: findPattern

Input | Notes | Example
Case Sensitive | When true, the search is case-sensitive. | true
Contains | When true, returns pages containing the pattern; when false, returns pages without the pattern. | true
Search Pattern | The text pattern to search for in the PDF document. | Some Text
PDF Data | The PDF file data to process. This can be a file reference from a previous step. |
Use Regex | When true, treats the search pattern as a regular expression. | true
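
The three flags combine as shown in the sketch below, which works on a list of per-page text strings. The function name `find_pages` and the default values are assumptions made for illustration; the component's own defaults are not stated here.

```python
import re
from typing import List


def find_pages(pages: List[str], pattern: str, contains: bool = True,
               case_sensitive: bool = True, use_regex: bool = False) -> List[int]:
    """Return 1-based page numbers whose text does (or does not) match."""
    flags = 0 if case_sensitive else re.IGNORECASE
    # A plain-text pattern is escaped so characters like "." match literally.
    compiled = re.compile(pattern if use_regex else re.escape(pattern), flags)
    return [
        number
        for number, text in enumerate(pages, start=1)
        if (compiled.search(text) is not None) == contains
    ]


# Made-up page text for illustration only.
pages = ["Lease agreement", "Tenant: Jane Doe", "Signature page"]
print(find_pages(pages, "tenant", case_sensitive=False))
print(find_pages(pages, "Tenant", contains=False))
print(find_pages(pages, r"\bpage$", use_regex=True))
```

Setting Contains to false inverts the result, which is handy for finding pages that lack a required marker such as a signature line.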

Find Text Position

Searches the PDF document and returns the position coordinates of all occurrences of the specified text. | key: findTextPosition

Input | Notes | Example
Case Sensitive | When true, the search is case-sensitive. | true
Page Number | Limit the search to a specific page number. If not provided, all pages are searched. | 1
PDF Data | The PDF file data to process. This can be a file reference from a previous step. |
Search Text | The text to search for in the PDF document. | Tenant
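
A common use for a text position is reading the value printed next to a label. The sketch below assumes a label position and a list of text items that each carry hypothetical `page`, `x`, `y`, and `text` keys; the real output shape may differ, and the sample data is made up.

```python
from typing import Any, Dict, List, Optional


def value_right_of(label: Dict[str, Any], items: List[Dict[str, Any]],
                   line_tolerance: float = 2.0) -> Optional[str]:
    """Text of the closest item to the right of `label` on the same line."""
    candidates = [
        item for item in items
        if item["page"] == label["page"]
        and abs(item["y"] - label["y"]) <= line_tolerance
        and item["x"] > label["x"]
    ]
    if not candidates:
        return None
    # The smallest x among the candidates is the nearest neighbor on the right.
    return min(candidates, key=lambda item: item["x"])["text"]


# Made-up label position and items for illustration only.
label = {"page": 1, "x": 72.0, "y": 640.0}
items = [
    {"text": "Tenant:", "page": 1, "x": 72.0, "y": 640.0},
    {"text": "Jane Doe", "page": 1, "x": 130.0, "y": 640.5},
    {"text": "Landlord:", "page": 1, "x": 72.0, "y": 620.0},
    {"text": "Acme", "page": 1, "x": 140.0, "y": 620.0},
]
print(value_right_of(label, items))
```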

Page Numbers

Returns a sequence of page numbers for the PDF document, from 1 to the last page. | key: pageNumbers

Input | Notes | Example
PDF Data | The PDF file data to process. This can be a file reference from a previous step. |
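
The sequence this action describes is easy to picture with a short sketch. The helper `page_numbers` is hypothetical and takes a page count directly, whereas the action itself reads the count from the PDF.

```python
from typing import List


def page_numbers(page_count: int) -> List[int]:
    """1-based page numbers for a document with `page_count` pages."""
    return list(range(1, page_count + 1))


print(page_numbers(4))
```

A list like this can drive a loop that handles one page per iteration, for example by passing each number to Extract Page.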

Changelog

2026-03-30

Added PDF text and table extraction actions:

  • Find Text Position - Returns coordinate positions of text occurrences within a PDF document
  • Extract Structured Text - Returns all text items with position coordinates, dimensions, and font metadata for custom parsing
  • Extract Text with Layout - Preserves line breaks and paragraph spacing from the original PDF layout
  • Extract Table Data - Detects and extracts tabular structures as two-dimensional arrays using coordinate-based heuristics

2026-02-12

Improved error handling across PDF processing actions: error messages are now descriptive, and corrupted or invalid PDF structures are detected.