PDF Component

Search and extract data from PDF documents
Component key: pdf

Description

The pdf component provides actions for working with PDF documents, such as finding text in a document, listing its page numbers, or extracting a specific page.

This component cannot currently handle encrypted PDF documents.
Actions
Extract Page

Returns the specified page of the given PDF document as a new, separate PDF document.
Action key: extractPage
Input | Key | Type | Notes | Example |
---|---|---|---|---|
Page Number | pageNumber | string / Required | Specifies a page in the PDF document by number. | 5 |
PDF data | pdfData | data / Required | Must be a Base64-encoded data URI or a buffer containing the raw bytes of a PDF. | |
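One way to produce a value for the PDF data input is to wrap the raw bytes of a file in a data URI. The sketch below assumes the component accepts the standard RFC 2397 form `data:application/pdf;base64,<payload>`; the media type and both helper names are assumptions for illustration, not part of the component.

```python
import base64


def to_pdf_data_uri(pdf_bytes: bytes) -> str:
    """Wrap raw PDF bytes in a Base64 data URI (RFC 2397 form).

    Hypothetical helper; the "application/pdf" media type is an assumption.
    """
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return f"data:application/pdf;base64,{encoded}"


def from_pdf_data_uri(uri: str) -> bytes:
    """Recover the raw bytes from a Base64 data URI."""
    header, _, payload = uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a Base64 data URI")
    return base64.b64decode(payload)


if __name__ == "__main__":
    # "%PDF-" is the header a PDF file starts with; these bytes are only a
    # stand-in, not a complete document.
    sample = b"%PDF-1.7\n"
    uri = to_pdf_data_uri(sample)
    print(uri)  # prints data:application/pdf;base64,JVBERi0xLjcK
    assert from_pdf_data_uri(uri) == sample
```

In a real flow the bytes would come from reading a file (for example `open("report.pdf", "rb").read()`, with a hypothetical file name).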
Output Example Payload
{ "data": { "type": "Buffer", "data": [ 69, 120, 97, 109, 112, 108, 101 ] }, "contentType": "application/pdf" }
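The data field in this payload has the shape of a Node.js Buffer serialized to JSON (`{"type": "Buffer", "data": [...]}`, the form `Buffer.prototype.toJSON` produces). Assuming a result arrives in that form, the minimal sketch below turns it back into raw bytes; the function name is hypothetical. Note that the example bytes spell the placeholder string "Example" rather than a real PDF.

```python
import json


def payload_to_pdf_bytes(payload: dict) -> bytes:
    """Convert an Extract Page result into raw bytes.

    Assumes "data" is a JSON-serialized Node.js Buffer:
    {"type": "Buffer", "data": [byte values 0-255]}.
    """
    buf = payload["data"]
    if buf.get("type") != "Buffer":
        raise ValueError("expected a JSON-serialized Buffer")
    return bytes(buf["data"])


if __name__ == "__main__":
    example = json.loads(
        '{ "data": { "type": "Buffer", "data": [ 69, 120, 97, 109, 112, 108, 101 ] },'
        ' "contentType": "application/pdf" }'
    )
    raw = payload_to_pdf_bytes(example)
    print(raw)  # prints b'Example'
    # A real result would hold a complete one-page PDF, which could be
    # written out with: open("page.pdf", "wb").write(raw)
```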
Find Pattern

Searches the PDF document and returns a list of the page numbers whose text satisfies the search criteria (or, when Contains? is false, does not satisfy them).
Action key: findPattern
Input | Key | Type | Notes | Example |
---|---|---|---|---|
Case Sensitive | caseSensitive | boolean / Required | Specifies whether the search is case-sensitive. Options are true or false. | true |
Contains? | contains | boolean / Required | If true, returns the pages that contain the search pattern; if false, returns the pages that do not contain it. | true |
Search Pattern | pattern | string / Required | The pattern to search for in the PDF document. | Some Text |
PDF data | pdfData | data / Required | Must be a Base64-encoded data URI or a buffer containing the raw bytes of a PDF. | |
Use Regex | useRegex | boolean / Required | Specifies whether the search pattern is treated as a regular expression. | true |
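To make the interplay of Case Sensitive, Contains?, Use Regex, and Search Pattern concrete, here is a hypothetical model of the filtering step. It assumes the text of each page has already been extracted (how the component does this is not described here) and uses Python's re module, whose regular-expression dialect may differ from the component's. Treat it as an illustration of the documented options, not the component's implementation.

```python
import re
from typing import List


def find_pattern(
    page_texts: List[str],
    pattern: str,
    case_sensitive: bool,
    contains: bool,
    use_regex: bool,
) -> List[int]:
    """Return 1-based page numbers whose text matches (or does not match) pattern.

    Hypothetical model; page_texts holds the extracted text of each page, in order.
    """
    flags = 0 if case_sensitive else re.IGNORECASE
    # With use_regex=False, escape the pattern so it is matched literally.
    source = pattern if use_regex else re.escape(pattern)
    regex = re.compile(source, flags)

    result = []
    for number, text in enumerate(page_texts, start=1):
        found = regex.search(text) is not None
        # contains=True keeps matching pages; contains=False keeps the others.
        if found == contains:
            result.append(number)
    return result


if __name__ == "__main__":
    pages = ["Some Text here", "nothing", "some text again"]
    print(find_pattern(pages, "Some Text", True, True, False))   # [1]
    print(find_pattern(pages, "Some Text", False, True, False))  # [1, 3]
    print(find_pattern(pages, r"s\w+ t", False, False, True))    # [2]
```

Escaping the literal pattern lets one code path serve both the plain-text and regex modes, so "a.b" only matches a literal dot unless Use Regex is true.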
Output Example Payload
{ "data": [ 1, 2, 3 ] }
Page Numbers

Returns a sequence of page numbers for the specified PDF document, from 1 to the last page number.
Action key: pageNumbers
Input | Key | Type | Notes |
---|---|---|---|
PDF data | pdfData | data / Required | Must be a Base64-encoded data URI or a buffer containing the raw bytes of a PDF. |
Output Example Payload
{ "data": [ 1, 2, 3, 4, 5 ] }
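Because Page Numbers lists every page from 1 to the last, running Find Pattern twice with otherwise identical inputs, once with Contains? true and once with it false, should split that list into two complementary parts. The sketch below illustrates the relationship with hypothetical helpers, and it assumes the two example payloads on this page come from the same five-page document, which the page does not state.

```python
from typing import List


def page_numbers(page_count: int) -> List[int]:
    """Mirror the Page Numbers output: 1 through the last page number."""
    return list(range(1, page_count + 1))


def complement(all_pages: List[int], matches: List[int]) -> List[int]:
    """Pages from all_pages that are not in matches, in original order."""
    matched = set(matches)
    return [p for p in all_pages if p not in matched]


if __name__ == "__main__":
    pages = page_numbers(5)
    print(pages)  # [1, 2, 3, 4, 5]
    # If Find Pattern with contains=true returned [1, 2, 3], the same search
    # with contains=false would be expected to return the remaining pages.
    print(complement(pages, [1, 2, 3]))  # [4, 5]
```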