
PDF Component

Search and extract data from PDF documents

Component key: pdf

Description

The pdf component provides actions for working with PDF documents: finding the pages that contain specific text, listing a document's page numbers, and extracting a single page as a new document.

This component cannot currently handle PDF documents that are encrypted.
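Because encrypted documents are not supported, a caller may want to screen inputs before sending them to the component. The sketch below is a rough heuristic, not the component's own check. It relies on the fact that an encrypted PDF declares an /Encrypt entry in its trailer (or in the uncompressed dictionary of a cross-reference stream), and simply searches the raw bytes for that name. The function name `looks_encrypted` is hypothetical, and the search can give false positives if the literal bytes `/Encrypt` also appear uncompressed elsewhere in the file.

```python
import re

# A PDF name token ends at whitespace (including NUL) or a delimiter, so the
# lookahead rejects longer names such as /EncryptMetadata.
_ENCRYPT_KEY = re.compile(rb"/Encrypt(?=[\s\x00/<>\[\]()%{}])")


def looks_encrypted(pdf_bytes):
    """Heuristically report whether raw PDF bytes declare an /Encrypt entry.

    Hypothetical helper for illustration; not part of the pdf component.
    """
    return _ENCRYPT_KEY.search(pdf_bytes) is not None


plain = b"%PDF-1.4\ntrailer\n<< /Size 4 /Root 1 0 R >>\n%%EOF\n"
locked = b"%PDF-1.4\ntrailer\n<< /Size 5 /Root 1 0 R /Encrypt 4 0 R >>\n%%EOF\n"
print(looks_encrypted(plain))   # False
print(looks_encrypted(locked))  # True
```

A real check would parse the trailer with a PDF library; the byte search is only meant to show what marks a document as encrypted.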

Actions

Extract Page

Returns the specified page of the given PDF document as a new, separate PDF document.

Action key: extractPage

| Input | Type | Required | Key | Notes | Example |
| --- | --- | --- | --- | --- | --- |
| Page Number | string | Yes | pageNumber | Specifies a page in the PDF document by number. | 5 |
| PDF data | data | Yes | pdfData | Must be a Base64-encoded data URI or a buffer containing the raw bytes of a PDF. | |
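The PDF data input accepts either a Base64-encoded data URI or a buffer of raw bytes. The following minimal Python sketch shows one way a caller might normalize both forms into raw bytes before use. The function name `to_pdf_bytes` and the `%PDF-` header sanity check are assumptions for illustration; they are not part of the component.

```python
import base64


def to_pdf_bytes(pdf_data):
    """Normalize a Base64 data URI or raw bytes into raw PDF bytes.

    Hypothetical helper for illustration; not part of the pdf component.
    """
    if isinstance(pdf_data, (bytes, bytearray)):
        raw = bytes(pdf_data)
    elif isinstance(pdf_data, str) and pdf_data.startswith("data:"):
        # A data URI looks like: data:application/pdf;base64,<payload>
        header, sep, payload = pdf_data.partition(",")
        if not sep or not header.endswith(";base64"):
            raise ValueError("expected a Base64-encoded data URI")
        try:
            raw = base64.b64decode(payload, validate=True)
        except ValueError as exc:  # binascii.Error is a ValueError subclass
            raise ValueError("invalid Base64 payload") from exc
    else:
        raise TypeError("pdf_data must be bytes or a Base64 data URI string")

    # A well-formed PDF file begins with a "%PDF-" header line.
    if not raw.startswith(b"%PDF-"):
        raise ValueError("data does not look like a PDF document")
    return raw


sample = b"%PDF-1.7\n%%EOF\n"
uri = "data:application/pdf;base64," + base64.b64encode(sample).decode("ascii")
print(to_pdf_bytes(uri) == sample)  # True
```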

Output Example Payload

{
  "data": {
    "type": "Buffer",
    "data": [69, 120, 97, 109, 112, 108, 101]
  },
  "contentType": "application/pdf"
}
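The example payload shows the extracted page as a Node.js `Buffer` serialized to JSON (`{"type": "Buffer", "data": [...]}`). The byte values in the example spell the ASCII text "Example", so they are illustrative rather than a real PDF. Assuming the output reaches your code in this JSON form, the sketch below converts it back to bytes and writes it to a file. The helper names `buffer_payload_to_bytes` and `save_extracted_page` are hypothetical.

```python
import json
from pathlib import Path


def buffer_payload_to_bytes(payload):
    """Convert {"data": {"type": "Buffer", "data": [...]}} into raw bytes."""
    buf = payload["data"]
    if buf.get("type") != "Buffer":
        raise ValueError("expected a serialized Node.js Buffer")
    # bytes() raises ValueError if any value is outside the range 0-255.
    return bytes(buf["data"])


def save_extracted_page(payload, path):
    """Write the extracted page to disk after checking its content type."""
    if payload.get("contentType") != "application/pdf":
        raise ValueError("unexpected content type")
    Path(path).write_bytes(buffer_payload_to_bytes(payload))


example = json.loads(
    '{"data": {"type": "Buffer", "data": [69, 120, 97, 109, 112, 108, 101]},'
    ' "contentType": "application/pdf"}'
)
print(buffer_payload_to_bytes(example))  # b'Example'
```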

Find Pattern

Searches the PDF document and returns a list of the page numbers whose text matches the search criteria (or, when Contains? is false, whose text does not match them).

Action key: findPattern

| Input | Type | Required | Key | Notes | Example |
| --- | --- | --- | --- | --- | --- |
| Case Sensitive | boolean | Yes | caseSensitive | Specifies whether the search is case-sensitive (true or false). | true |
| Contains? | boolean | Yes | contains | If true, returns the pages that contain the search pattern; if false, returns the pages that do not contain it. | true |
| Search Pattern | string | Yes | pattern | The pattern to search for in the PDF document. | Some Text |
| PDF data | data | Yes | pdfData | Must be a Base64-encoded data URI or a buffer containing the raw bytes of a PDF. | |
| Use Regex | boolean | Yes | useRegex | Specifies whether the search pattern is a regular expression. | true |
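The sketch below illustrates how the four search flags could combine. It is not the component's implementation. It assumes the text of each page has already been extracted into a list (index 0 holds page 1), and it uses Python's `re` module as a stand-in for whatever regular-expression engine the component uses. The function name `find_pattern` is hypothetical.

```python
import re


def find_pattern(page_texts, pattern, case_sensitive, contains, use_regex):
    """Return 1-based page numbers whose text matches (or does not match) pattern.

    page_texts[i] is assumed to hold the extracted text of page i + 1.
    """
    flags = 0 if case_sensitive else re.IGNORECASE
    # When use_regex is false, escape metacharacters so the pattern is literal.
    regex = re.compile(pattern if use_regex else re.escape(pattern), flags)

    result = []
    for page_number, text in enumerate(page_texts, start=1):
        matched = regex.search(text) is not None
        # contains=True keeps matching pages; contains=False keeps the rest.
        if matched == contains:
            result.append(page_number)
    return result


pages = ["Some Text here", "nothing relevant", "SOME TEXT again"]
print(find_pattern(pages, "Some Text", True, True, False))      # [1]
print(find_pattern(pages, "Some Text", False, True, False))     # [1, 3]
print(find_pattern(pages, "Some Text", False, False, False))    # [2]
print(find_pattern(pages, r"some\s+text", False, True, True))   # [1, 3]
```

Under these assumptions, setting Contains? to false returns exactly the pages from the Page Numbers sequence that the same search with Contains? set to true leaves out.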

Output Example Payload

{
  "data": [1, 2, 3]
}

Page Numbers

Returns a sequence of page numbers for the specified PDF document, from 1 to the last page number.

Action key: pageNumbers

| Input | Type | Required | Key | Notes |
| --- | --- | --- | --- | --- |
| PDF data | data | Yes | pdfData | Must be a Base64-encoded data URI or a buffer containing the raw bytes of a PDF. |

Output Example Payload

{
  "data": [1, 2, 3, 4, 5]
}
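The Page Numbers output is simply the range from 1 to the page count. Extract Page takes its Page Number input as a string, so a caller might check that value against this sequence before calling Extract Page. The sketch below assumes the page count is already known. The helpers `page_numbers` and `validate_page_number` are hypothetical and only mirror the output shape shown above.

```python
def page_numbers(page_count):
    """Return [1, 2, ..., page_count], mirroring the Page Numbers output shape."""
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    return list(range(1, page_count + 1))


def validate_page_number(page_number, available_pages):
    """Parse a Page Number string such as "5" and check that the page exists."""
    try:
        number = int(page_number)  # int() tolerates surrounding whitespace
    except ValueError:
        raise ValueError(f"not an integer page number: {page_number!r}") from None
    if number not in available_pages:
        raise ValueError(f"page {number} is outside 1..{len(available_pages)}")
    return number


pages = page_numbers(5)
print({"data": pages})                   # {'data': [1, 2, 3, 4, 5]}
print(validate_page_number("5", pages))  # 5
```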