PDF Component
Search and extract data from PDF documents
Component key: pdf
Description
The PDF component provides actions for interacting with PDF documents, such as finding text in a document, listing its page numbers, or extracting a specific page.
This component cannot currently handle encrypted PDF documents.
Actions
Extract All Text
Extracts all the text from the specified PDF document and returns it as an array of strings. | key: extractAllText
Output Example Payload
{
  "data": [
    "This is an example text extracted from page 1 of a PDF document.",
    "This is an example text extracted from page 2 of a PDF document.",
    "This is an example text extracted from page 3 of a PDF document."
  ]
}
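A minimal sketch of how a downstream step might consume this payload. It assumes each array entry holds the text of one page, in document order, which the example suggests but the description does not state. The `pages_by_number` helper is hypothetical and not part of the component.

```python
import json

# Example payload copied from the output above.
payload = json.loads("""
{
  "data": [
    "This is an example text extracted from page 1 of a PDF document.",
    "This is an example text extracted from page 2 of a PDF document.",
    "This is an example text extracted from page 3 of a PDF document."
  ]
}
""")


def pages_by_number(result: dict) -> dict[int, str]:
    """Map 1-based page numbers to text (assumes one entry per page, in order)."""
    return {number: text for number, text in enumerate(result["data"], start=1)}


pages = pages_by_number(payload)

# Join all entries into a single string, one entry per line.
full_text = "\n".join(payload["data"])
```

Keying the text by page number makes it easier to combine this output with actions that return page numbers, such as Find Pattern.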
Extract Page
Returns the specified page of the given PDF document as a new, separate PDF document. | key: extractPage
Output Example Payload
{
  "data": {
    "type": "Buffer",
    "data": [
      69,
      120,
      97,
      109,
      112,
      108,
      101
    ]
  },
  "contentType": "application/pdf"
}
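The `data` object above has the shape Node.js produces when a `Buffer` is serialized to JSON: a `type` of `"Buffer"` plus an array of byte values. The sketch below shows one way to turn that shape back into raw bytes outside of Node.js. The `buffer_to_bytes` helper is hypothetical, and the sketch assumes the payload arrives exactly as shown.

```python
import json

# Example payload copied from the output above.
payload = json.loads("""
{
  "data": {"type": "Buffer", "data": [69, 120, 97, 109, 112, 108, 101]},
  "contentType": "application/pdf"
}
""")


def buffer_to_bytes(result: dict) -> bytes:
    """Convert a JSON-serialized Buffer ({"type": "Buffer", "data": [...]}) to bytes."""
    buffer = result["data"]
    if buffer.get("type") != "Buffer":
        raise ValueError("expected a serialized Buffer object")
    # Each list element is one byte value in the range 0..255.
    return bytes(buffer["data"])


pdf_bytes = buffer_to_bytes(payload)
```

The placeholder bytes in the example spell the ASCII string `Example`, so they are not a valid PDF; a real extracted page would begin with the `%PDF-` header. The resulting bytes could then be saved with something like `pathlib.Path("page.pdf").write_bytes(pdf_bytes)`.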
Extract Page Text
Extracts the text of the pages in the specified PDF document that fall within the specified page range. | key: extractPageText
Output Example Payload
{
  "data": [
    "This is an example text extracted from page 1 of a PDF document."
  ]
}
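To illustrate the idea of selecting text by page range, here is a small sketch. The source does not specify how the page range input is expressed, so the 1-based, inclusive `first` and `last` parameters and the `select_pages` helper are assumptions made for illustration only.

```python
def select_pages(page_texts: list[str], first: int, last: int) -> list[str]:
    """Return the text of pages first..last (1-based, inclusive; assumed semantics)."""
    if first < 1 or last < first:
        raise ValueError("invalid page range")
    # Slicing stops at the end of the list, so a range that runs past
    # the last page simply returns the pages that exist.
    return page_texts[first - 1:last]


page_texts = [
    "This is an example text extracted from page 1 of a PDF document.",
    "This is an example text extracted from page 2 of a PDF document.",
    "This is an example text extracted from page 3 of a PDF document.",
]

selected = select_pages(page_texts, 1, 1)
```

With a range covering only page 1, `selected` has the same single-entry shape as the example payload above.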
Extract Text by Pattern
Extracts text from the specified PDF document that matches the search text. | key: extractTextByPattern
Output Example Payload
{
  "data": [
    "This is an example text extracted from page 1 of a PDF document.",
    "This is an example text extracted from page 2 of a PDF document.",
    "This is an example text extracted from page 3 of a PDF document."
  ]
}
Find Pattern
Searches the PDF document and returns a list of page numbers containing text that satisfies the search criteria. | key: findPattern
Output Example Payload
{
  "data": [
    1,
    2,
    3,
    4,
    5
  ]
}
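Conceptually, this action maps a search over per-page text to a list of 1-based page numbers. The sketch below illustrates that idea using a regular expression; whether the component treats the search criteria as a regular expression or as plain text is not stated here, so that choice, the sample page texts, and the `find_pattern` helper are all assumptions.

```python
import re


def find_pattern(page_texts: list[str], pattern: str) -> list[int]:
    """Return the 1-based numbers of the pages whose text matches the pattern."""
    regex = re.compile(pattern)
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if regex.search(text)
    ]


# Hypothetical page texts, for illustration only.
page_texts = [
    "Invoice 1001 for Acme",
    "Terms and conditions",
    "Invoice 1002 for Globex",
]

matches = find_pattern(page_texts, r"Invoice \d+")  # pages 1 and 3 match
```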
Page Numbers
Returns a sequence of page numbers for the specified PDF document, from 1 to the last page number. | key: pageNumbers
Output Example Payload
{
  "data": [
    1,
    2,
    3,
    4,
    5
  ]
}
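The sequence described above is a 1-based range ending at the last page. A minimal sketch of the same sequence for a known page count follows; the `page_numbers` helper and its `page_count` parameter are hypothetical and not part of the component.

```python
def page_numbers(page_count: int) -> list[int]:
    """Return [1, 2, ..., page_count] for a document with page_count pages."""
    if page_count < 0:
        raise ValueError("page_count must not be negative")
    return list(range(1, page_count + 1))


numbers = page_numbers(5)  # same sequence as the example payload above
```

A list like this is convenient for looping over a document page by page, for example by passing each number to Extract Page.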