PDF Component

Search and extract data from PDF documents

Component key: pdf

Description

The pdf component provides actions for working with PDF documents, such as finding text in a document, listing its page numbers, and extracting a specific page.

This component cannot currently handle PDF documents that are encrypted.

Actions

Extract All Text

Extracts all the text from the specified PDF document and returns it as an array of strings (one per page in the example below). | key: extractAllText

Input	Notes
PDF data data / Required pdfData	This must refer to a buffer containing the raw bytes of a PDF.

{
  "data": [
    "This is an example text extracted from page 1 of a PDF document.",
    "This is an example text extracted from page 2 of a PDF document.",
    "This is an example text extracted from page 3 of a PDF document."
  ]
}

Extract Page

Returns the specified page in the given PDF document as a new separate PDF document. | key: extractPage

Input	Notes	Example
Page Number string / Required pageNumber	This specifies a page in a PDF document by number.	5
PDF data data / Required pdfData	This must refer to a buffer containing the raw bytes of a PDF.

{
  "data": {
    "type": "Buffer",
    "data": [
      69,
      120,
      97,
      109,
      112,
      108,
      101
    ]
  },
  "contentType": "application/pdf"
}
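The `data` object above has the shape that Node.js produces when a Buffer is serialized to JSON: a `type` of "Buffer" and a `data` array of byte values. As a minimal sketch, assuming the step result has already been parsed into a Python dict of that shape, the raw bytes can be recovered as shown here. The helper name `buffer_json_to_bytes` is hypothetical and not part of the component. Note that the sample bytes spell the ASCII string "Example" and are not a real PDF.

```python
def buffer_json_to_bytes(payload: dict) -> bytes:
    """Convert a {"type": "Buffer", "data": [...]} object into raw bytes."""
    if payload.get("type") != "Buffer":
        raise ValueError("expected a Buffer-style JSON object")
    # Each entry in "data" is one byte value in the range 0-255.
    return bytes(payload["data"])


# Sample result copied from the example output above.
result = {
    "data": {"type": "Buffer", "data": [69, 120, 97, 109, 112, 108, 101]},
    "contentType": "application/pdf",
}

raw = buffer_json_to_bytes(result["data"])
print(raw)  # b'Example'
```

With a real result, `raw` would hold the bytes of the extracted single-page PDF and could be written to a file opened in binary mode.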

Extract Page Text

Extracts the text of the pages in the specified page range from the PDF document. | key: extractPageText

Input	Notes	Example
Page End string pageEnd	This specifies the last page to extract from the PDF document. If omitted, only the page given by pageStart is extracted.	5
Page Start string / Required pageStart	This specifies the starting page to extract from the PDF document.	1
PDF data data / Required pdfData	This must refer to a buffer containing the raw bytes of a PDF.

{
  "data": [
    "This is an example text extracted from page 1 of a PDF document."
  ]
}
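The page-range behavior described above can be sketched in a few lines. This is an illustration only, not the component's implementation: it assumes pages are numbered from 1, that pageEnd is inclusive, and that the per-page text is already available as a list of strings. The function name `select_page_texts` and the error handling are hypothetical.

```python
from typing import List, Optional


def select_page_texts(
    pages: List[str], page_start: str, page_end: Optional[str] = None
) -> List[str]:
    """Return the text of pages page_start..page_end (1-based, inclusive)."""
    # The inputs arrive as strings, mirroring the input types in the table above.
    start = int(page_start)
    # Assumption: a missing or empty pageEnd means "only the start page".
    end = int(page_end) if page_end else start
    if start < 1 or end < start or end > len(pages):
        raise ValueError("page range is outside the document")
    return pages[start - 1:end]


pages = ["page one text", "page two text", "page three text"]
only_first = select_page_texts(pages, "1")
two_to_three = select_page_texts(pages, "2", "3")
```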

Extract Text by Pattern

Extracts text from the specified PDF document that matches the search text. | key: extractTextByPattern

Input	Default	Notes	Example
Case Sensitive boolean / Required caseSensitive	false	This specifies whether searching should be case-sensitive. You can choose true or false.	true
Characters After string charactersAfter		This specifies the number of characters to extract from the PDF document after the matched search pattern. If not specified, the entire page is returned.	10
Search Pattern string / Required pattern		This is the text to search for in the PDF document.	Some Text
PDF data data / Required pdfData		This must refer to a buffer containing the raw bytes of a PDF.

{
  "data": [
    "This is an example text extracted from page 1 of a PDF document.",
    "This is an example text extracted from page 2 of a PDF document.",
    "This is an example text extracted from page 3 of a PDF document."
  ]
}
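To make the interaction between caseSensitive and charactersAfter concrete, here is a rough sketch over a list of per-page strings. Several details are assumptions rather than documented behavior: the pattern is treated as plain text, only the first match on each page is used, the excerpt starts immediately after the match, and pages without a match are skipped. The function name `extract_after_pattern` is hypothetical.

```python
import re
from typing import List, Optional


def extract_after_pattern(
    pages: List[str],
    pattern: str,
    case_sensitive: bool = False,
    characters_after: Optional[str] = None,
) -> List[str]:
    """Return text following the first match of pattern on each matching page."""
    flags = 0 if case_sensitive else re.IGNORECASE
    # re.escape makes the pattern match literally (assumption: plain-text search).
    regex = re.compile(re.escape(pattern), flags)
    results = []
    for text in pages:
        match = regex.search(text)
        if match is None:
            continue  # assumption: pages without a match contribute nothing
        if not characters_after:
            results.append(text)  # no limit given: return the entire page
        else:
            begin = match.end()
            results.append(text[begin:begin + int(characters_after)])
    return results


pages = ["Invoice Total: 42.00 USD", "No match here"]
limited = extract_after_pattern(pages, "total:", characters_after="6")
whole_page = extract_after_pattern(pages, "total:")
strict = extract_after_pattern(pages, "total:", case_sensitive=True)
```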

Find Pattern

Searches the PDF document and returns a list of page numbers containing text that satisfies the search criteria. | key: findPattern

Input	Default	Notes	Example
Case Sensitive boolean / Required caseSensitive	false	This specifies whether searching should be case-sensitive. You can choose true or false.	true
Contains? boolean / Required contains	true	This specifies whether to return the page numbers that contain the search pattern (true) or those that do not contain it (false).
Search Pattern string / Required pattern		This is the pattern to search for in the PDF document.	Some Text
PDF data data / Required pdfData		This must refer to a buffer containing the raw bytes of a PDF.
Use Regex boolean / Required useRegex	false	This specifies whether the search pattern is a regular expression.	true

{
  "data": [
    1,
    2,
    3,
    4,
    5
  ]
}
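The way the caseSensitive, contains, and useRegex inputs combine can be sketched as follows. This is a hedged illustration over a list of per-page strings, not the component's code: 1-based page numbering follows the Page Numbers action, Python's `re` syntax stands in for whatever regular expression dialect the component uses, and the function name `find_pages` is hypothetical.

```python
import re
from typing import List


def find_pages(
    pages: List[str],
    pattern: str,
    case_sensitive: bool = False,
    contains: bool = True,
    use_regex: bool = False,
) -> List[int]:
    """Return 1-based page numbers whose text does (or does not) match pattern."""
    flags = 0 if case_sensitive else re.IGNORECASE
    # Escape the pattern unless the caller asked for regex semantics.
    regex = re.compile(pattern if use_regex else re.escape(pattern), flags)
    return [
        number
        for number, text in enumerate(pages, start=1)
        if (regex.search(text) is not None) == contains
    ]


pages = ["Alpha report", "beta notes", "ALPHA summary"]
with_alpha = find_pages(pages, "alpha")
without_alpha = find_pages(pages, "alpha", contains=False)
regex_hits = find_pages(pages, r"^b\w+", use_regex=True)
```

Setting `contains=False` inverts the selection, so the two result lists together always cover every page exactly once.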

Page Numbers

Returns a sequence of page numbers for the specified PDF document, from 1 to the last page number. | key: pageNumbers

Input	Notes
PDF data data / Required pdfData	This must refer to a buffer containing the raw bytes of a PDF.

{
  "data": [
    1,
    2,
    3,
    4,
    5
  ]
}
