PDF Component

Search and extract data from PDF documents

Component key: pdf

Description

The pdf component provides actions for working with PDF documents: finding text in a document, listing its page numbers, and extracting a specific page.

This component cannot currently handle encrypted PDF documents.

Actions

Extract Page

Returns the specified page of the given PDF document as a new PDF document. | key: extractPage

Input: Page Number (string, required) | key: pageNumber
Notes: This specifies a page in a PDF document by number.
Example: 5

Input: PDF data (data, required) | key: pdfData
Notes: This must refer to a Base64 encoded data URI or a buffer containing the raw bytes of a PDF.

Output Example Payload

{
  "data": {
    "type": "Buffer",
    "data": [69, 120, 97, 109, 112, 108, 101]
  },
  "contentType": "application/pdf"
}
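The example payload carries the page as a JSON-serialized buffer: an object with `"type": "Buffer"` and a list of byte values. A minimal sketch of turning that shape back into raw bytes is shown here. The helper name `buffer_payload_to_bytes` is hypothetical, and the sketch assumes the payload arrives as JSON text in exactly the shape above.

```python
import json

# The example payload from this page, as JSON text.
payload_json = """
{
  "data": {"type": "Buffer", "data": [69, 120, 97, 109, 112, 108, 101]},
  "contentType": "application/pdf"
}
"""


def buffer_payload_to_bytes(payload: dict) -> bytes:
    """Convert a {"type": "Buffer", "data": [...]} object into bytes.

    Hypothetical helper: assumes the shape shown in the example payload.
    """
    buf = payload["data"]
    if buf.get("type") != "Buffer":
        raise ValueError("expected a serialized Buffer object")
    # Each list entry is one byte value in the range 0-255.
    return bytes(buf["data"])


payload = json.loads(payload_json)
pdf_bytes = buffer_payload_to_bytes(payload)
```

The sample byte values decode to the ASCII text `Example`, so they are a placeholder rather than a real PDF. A real response would hold the bytes of the extracted page, which could then be written to a file opened in binary mode.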

Find Pattern

Searches the PDF document and returns a list of page numbers containing text that satisfies the search criteria. | key: findPattern

Input: Case Sensitive (boolean, required) | key: caseSensitive
Notes: This specifies whether the search is case-sensitive. Options are true or false.
Example: true

Input: Contains? (boolean, required) | key: contains
Notes: This specifies whether to return the page numbers that contain the search pattern (true) or the page numbers that do not contain it (false).
Example: true

Input: Search Pattern (string, required) | key: pattern
Notes: This is the pattern to search for in the PDF document.
Example: Some Text

Input: PDF data (data, required) | key: pdfData
Notes: This must refer to a Base64 encoded data URI or a buffer containing the raw bytes of a PDF.

Input: Use Regex (boolean, required) | key: useRegex
Notes: This specifies whether the search pattern is a regular expression.
Example: true

Output Example Payload

{
  "data": [1, 2, 3]
}
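To make the interaction of the Case Sensitive, Contains?, and Use Regex inputs concrete, the following is a rough model of the described behavior. It is not the component's implementation: the function `find_pattern` and its `page_texts` argument (one already-extracted text string per page) are hypothetical, and it uses Python's regular expression syntax, which may differ from the syntax the component accepts.

```python
import re


def find_pattern(page_texts, pattern, *, case_sensitive, contains, use_regex):
    """Return 1-based page numbers selected by the search inputs.

    Hypothetical model: page_texts holds the text of each page in order.
    """
    flags = 0 if case_sensitive else re.IGNORECASE
    # A plain-text pattern is escaped so it matches literally.
    source = pattern if use_regex else re.escape(pattern)
    regex = re.compile(source, flags)
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        # contains=True keeps matching pages, contains=False keeps the rest.
        if (regex.search(text) is not None) == contains
    ]


pages = ["Invoice 001", "some text here", "Some Text again"]
matches = find_pattern(
    pages, "Some Text", case_sensitive=True, contains=True, use_regex=False
)
```

Under this model, setting Contains? to false returns the complement of the matching pages, so the two results together cover every page of the document.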

Page Numbers

Returns a sequence of page numbers for the specified PDF document, from 1 to the last page number. | key: pageNumbers

Input: PDF data (data, required) | key: pdfData
Notes: This must refer to a Base64 encoded data URI or a buffer containing the raw bytes of a PDF.
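The PDF data input accepts either raw bytes or a Base64 encoded data URI. As an illustration of the second form, the sketch below builds and parses such a URI. The helper names are hypothetical, and the `application/pdf` media type in the URI is an assumption taken from the `contentType` shown in the Extract Page example payload; this page does not state which media types the component accepts.

```python
import base64


def to_pdf_data_uri(pdf_bytes: bytes) -> str:
    """Encode raw PDF bytes as a Base64 data URI (hypothetical helper)."""
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    # Assumed media type; see the note above.
    return f"data:application/pdf;base64,{encoded}"


def from_pdf_data_uri(uri: str) -> bytes:
    """Decode a Base64 data URI back into raw bytes (hypothetical helper)."""
    header, _, encoded = uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a Base64 encoded data URI")
    return base64.b64decode(encoded)


# Every PDF file begins with the marker "%PDF"; used here as stand-in bytes.
uri = to_pdf_data_uri(b"%PDF")
round_tripped = from_pdf_data_uri(uri)
```

In practice the bytes would come from reading a PDF file in binary mode, or from the output of an earlier step such as Extract Page.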

Output Example Payload

{
  "data": [1, 2, 3, 4, 5]
}