#include <DataExtractionModule.h>
Return type | Member |
---|---|
static bool | IsModuleAvailable (DataExtractionEngine engine) |
static UString | ExtractData (const UString &input_pdf_file, DataExtractionEngine engine, DataExtractionOptions *options=0) |
static void | ExtractData (const UString &input_pdf_file, const UString &output_json_file, DataExtractionEngine engine, DataExtractionOptions *options=0) |
static void | DetectAndAddFormFieldsToPDF (PDFDoc &doc, DataExtractionOptions *options=0) |
static void | ExtractToXLSX (const UString &input_pdf_file, const UString &output_xlsx_file, DataExtractionOptions *options=0) |
static void | ExtractToXLSX (const UString &input_pdf_file, Filters::Filter &output_xlsx_stream, DataExtractionOptions *options=0) |
The DataExtractionModule class is a static interface to the Apryse SDK's data extraction functionality.
Definition at line 23 of file DataExtractionModule.h.
Enumerator |
---|
e_Tabular |
Tabular Data engine. This engine identifies column and row structure and analyzes numeric columns. It is especially suited to documents that are table-based such as spreadsheets.
|
e_Form |
Form field extraction engine. This engine uses artificial intelligence and computer vision to detect form fields in documents that do not have any interactive field annotations embedded.
|
e_DocStructure |
Document structure engine. This engine discovers the full logical structure, including headers, footers, paragraphs, list items, table columns, cells, borders, images and graphics.
|
e_FormKeyValue |
Form field with key value extraction engine. This engine uses artificial intelligence and computer vision to detect form fields, including field name and values, in documents that do not have any interactive field annotations embedded. Note: This engine is experimental and subject to change.
|
Definition at line 29 of file DataExtractionModule.h.
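As an illustration of how the four enumerators might be selected at run time, here is a minimal sketch of a name-to-engine lookup. `ParseEngine`, the option strings, and `FakeModule` are all hypothetical and not part of the SDK. `FakeModule` only mirrors the enumerator names listed above so that the sketch compiles without the SDK headers, and it assumes the enum is nested in the class as an unscoped enum.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Stand-in (assumption): mirrors only the enumerator names documented above.
// The real enum is declared in DataExtractionModule.h.
struct FakeModule {
    enum DataExtractionEngine { e_Tabular, e_Form, e_DocStructure, e_FormKeyValue };
};

// Hypothetical helper: map a user-facing option string to an engine.
// Returns std::nullopt for names it does not recognise.
template <typename Module>
std::optional<typename Module::DataExtractionEngine> ParseEngine(std::string_view name) {
    if (name == "tabular") return Module::e_Tabular;
    if (name == "form") return Module::e_Form;
    if (name == "doc-structure") return Module::e_DocStructure;
    if (name == "form-key-value") return Module::e_FormKeyValue;  // experimental engine
    return std::nullopt;
}
```

Because the helper is templated on the module type, it could presumably be instantiated with the real class once the SDK header is included, provided the enum is declared as assumed here.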
Perform automatic form field detection, then insert the fields into the PDF. Note: The FormKeyValue engine is experimental and subject to change.
- Parameters
-
doc | – The PDF document in which form fields are detected and into which they are inserted. |
options | – Data extraction options (optional). |
Perform data extraction on a PDF file using the specified engine and return the resulting JSON string. Note: The FormKeyValue engine is experimental and subject to change.
- Parameters
-
input_pdf_file | – The source document filename. |
engine | – The extraction engine. |
options | – Data extraction options (optional). |
- Returns
- JSON string representing the extracted results.
Perform data extraction on a PDF file using the specified engine and write the resulting JSON to a file. Note: The FormKeyValue engine is experimental and subject to change.
- Parameters
-
input_pdf_file | – The source document filename. |
output_json_file | – The resulting JSON filename. |
engine | – The extraction engine. |
options | – Data extraction options (optional). |
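The file-to-file overloads in this class take the input and output filenames separately. The sketch below shows one way to derive the output name from the input name using only the standard library. `OutputPathFor` is a hypothetical helper, not an SDK function, and converting the resulting path to the SDK's string type is left to the caller.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

// Hypothetical helper: keep the directory and stem of the input file and
// swap the extension (for example ".json" or ".xlsx").
std::filesystem::path OutputPathFor(const std::filesystem::path& input_pdf_file,
                                    const std::string& extension) {
    std::filesystem::path out = input_pdf_file;
    out.replace_extension(extension);
    return out;
}
```

For example, `OutputPathFor("reports/q3.pdf", ".json")` yields a path equal to `reports/q3.json`, which could then be passed as the output filename.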
Perform data extraction on a PDF file and write the result in XLSX format to a file.
- Parameters
-
input_pdf_file | – The source document filename. |
output_xlsx_file | – The resulting XLSX filename. |
options | – Data extraction options (optional). |
Perform data extraction on a PDF file and write the result in XLSX format to an output filter.
- Parameters
-
input_pdf_file | – The source document filename. |
output_xlsx_stream | – The output filter that receives the XLSX data. |
options | – Data extraction options (optional). |
Find out whether the specified data extraction module is available (and licensed).
- Parameters
-
engine | – The extraction engine. |
- Returns
- True if data extraction operations can be performed with the specified engine.
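A common pattern is to check availability before attempting an extraction. The following is a minimal sketch of that guard, not the SDK's own code. `ExtractJsonIfAvailable` is a hypothetical helper, and `FakeModule` is a stand-in with the same static-call shape as the signatures documented above (using `std::string` in place of `UString`) so that the example compiles without the SDK.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Stand-in (assumption): same call shape as the documented static interface.
struct FakeModule {
    enum DataExtractionEngine { e_Tabular, e_Form, e_DocStructure, e_FormKeyValue };

    // Pretend that only the tabular engine is installed and licensed.
    static bool IsModuleAvailable(DataExtractionEngine engine) {
        return engine == e_Tabular;
    }

    // Return a placeholder JSON string instead of real extraction output.
    static std::string ExtractData(const std::string& input_pdf_file,
                                   DataExtractionEngine /*engine*/) {
        return "{\"source\":\"" + input_pdf_file + "\"}";
    }
};

// Hypothetical helper: run the extraction only if the engine is available,
// otherwise return std::nullopt so the caller can report the missing module.
template <typename Module, typename Str>
std::optional<Str> ExtractJsonIfAvailable(const Str& input_pdf_file,
                                          typename Module::DataExtractionEngine engine) {
    if (!Module::IsModuleAvailable(engine)) {
        return std::nullopt;
    }
    return Module::ExtractData(input_pdf_file, engine);
}
```

Templating on the module type keeps the guard independent of the SDK. With the SDK headers included, the same template could presumably be instantiated with the real class and a `UString` filename, assuming the signatures listed above.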