pdftron::PDF::DataExtractionModule Class Reference

#include <DataExtractionModule.h>

Public Types

enum  DataExtractionEngine { e_Tabular = 0, e_Form = 1, e_DocStructure = 2, e_FormKeyValue = 3 }
 

Static Public Member Functions

static bool IsModuleAvailable (DataExtractionEngine engine)
 
static UString ExtractData (const UString &input_pdf_file, DataExtractionEngine engine, DataExtractionOptions *options=0)
 
static void ExtractData (const UString &input_pdf_file, const UString &output_json_file, DataExtractionEngine engine, DataExtractionOptions *options=0)
 
static void DetectAndAddFormFieldsToPDF (PDFDoc &doc, DataExtractionOptions *options=0)
 
static void ExtractToXLSX (const UString &input_pdf_file, const UString &output_xlsx_file, DataExtractionOptions *options=0)
 
static void ExtractToXLSX (const UString &input_pdf_file, Filters::Filter &output_xlsx_stream, DataExtractionOptions *options=0)
 

Detailed Description

Static interface to the Apryse SDK's data extraction functionality.

Definition at line 23 of file DataExtractionModule.h.

Member Enumeration Documentation

Enumerator
e_Tabular 

Tabular Data engine. This engine identifies column and row structure and analyzes numeric columns. It is especially suited to documents that are table-based such as spreadsheets.

e_Form 

Form field extraction engine. This engine uses artificial intelligence and computer vision to detect form fields in documents that do not have any interactive field annotations embedded.

e_DocStructure 

Document structure engine. This engine discovers the full logical structure, including headers, footers, paragraphs, list items, table columns, cells, borders, images and graphics.

e_FormKeyValue 

Form field key-value extraction engine. This engine uses artificial intelligence and computer vision to detect form fields, including field names and values, in documents that do not have any interactive field annotations embedded. Note: This engine is experimental and subject to change.

Definition at line 29 of file DataExtractionModule.h.
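
As an illustration of the enumerator values listed above, the following minimal sketch maps each engine to a label for logging. It mirrors the enum locally so that it compiles without the SDK headers. EngineLabel and IsExperimental are hypothetical helpers and are not part of the library.

```cpp
#include <string>

// Local mirror of the documented enumerator values, so this sketch builds
// without the SDK. Real code would use
// pdftron::PDF::DataExtractionModule::DataExtractionEngine instead.
enum DataExtractionEngine { e_Tabular = 0, e_Form = 1, e_DocStructure = 2, e_FormKeyValue = 3 };

// Hypothetical helper (not an SDK function): readable engine name for logs.
inline std::string EngineLabel(DataExtractionEngine engine) {
    switch (engine) {
        case e_Tabular:      return "Tabular";
        case e_Form:         return "Form";
        case e_DocStructure: return "DocStructure";
        case e_FormKeyValue: return "FormKeyValue";
    }
    return "Unknown";
}

// Hypothetical helper: true for the engine this reference marks as experimental.
inline bool IsExperimental(DataExtractionEngine engine) {
    return engine == e_FormKeyValue;
}
```

A caller might use IsExperimental to log a warning before selecting e_FormKeyValue, since this reference notes that the engine is subject to change.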

Member Function Documentation

static void pdftron::PDF::DataExtractionModule::DetectAndAddFormFieldsToPDF ( PDFDoc & doc,
DataExtractionOptions * options = 0 
)
static

Perform automatic form field detection, then insert the fields into the PDF. Note: The FormKeyValue engine is experimental and subject to change.

Parameters
doc– The PDF document in which form fields are detected and into which they are inserted.
options– Data extraction options (optional).
static UString pdftron::PDF::DataExtractionModule::ExtractData ( const UString & input_pdf_file,
DataExtractionEngine  engine,
DataExtractionOptions * options = 0 
)
static

Perform data extraction on a PDF file using the specified engine and return the resulting JSON string. Note: The FormKeyValue engine is experimental and subject to change.

Parameters
input_pdf_file– The source document filename.
engine– The extraction engine.
options– Data extraction options (optional).
Returns
JSON string representing the extracted results.
static void pdftron::PDF::DataExtractionModule::ExtractData ( const UString & input_pdf_file,
const UString & output_json_file,
DataExtractionEngine  engine,
DataExtractionOptions * options = 0 
)
static

Perform data extraction on a PDF file using the specified engine. Note: The FormKeyValue engine is experimental and subject to change.

Parameters
input_pdf_file– The source document filename.
output_json_file– The resulting JSON filename.
engine– The extraction engine.
options– Data extraction options (optional).
static void pdftron::PDF::DataExtractionModule::ExtractToXLSX ( const UString & input_pdf_file,
const UString & output_xlsx_file,
DataExtractionOptions * options = 0 
)
static

Perform data extraction on a PDF file and write the results to a file in XLSX format.

Parameters
input_pdf_file– The source document filename.
output_xlsx_file– The resulting XLSX filename.
options– Data extraction options (optional).
static void pdftron::PDF::DataExtractionModule::ExtractToXLSX ( const UString & input_pdf_file,
Filters::Filter & output_xlsx_stream,
DataExtractionOptions * options = 0 
)
static

Perform data extraction on a PDF file and write the results in XLSX format to an output filter.

Parameters
input_pdf_file– The source document filename.
output_xlsx_stream– The output filter that receives the XLSX data.
options– Data extraction options (optional).
static bool pdftron::PDF::DataExtractionModule::IsModuleAvailable ( DataExtractionEngine  engine)
static

Find out whether the specified data extraction module is available (and licensed).

Parameters
engine– The extraction engine.
Returns
true if data extraction operations can be performed; false otherwise.
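
One way to use this check is to guard an extraction call so that it runs only when the engine is available. The sketch below is written as a template so that it compiles without the SDK. ExtractIfAvailable and FakeModule are hypothetical names introduced for illustration and are not part of the library. The Module parameter is assumed to expose static IsModuleAvailable and ExtractData functions with the shapes documented on this page.

```cpp
#include <string>

// Generic guard (hypothetical helper). Module is expected to provide the
// static functions documented above: IsModuleAvailable(engine) and
// ExtractData(input, output, engine), with options left at its default.
template <typename Module, typename Path, typename Engine>
bool ExtractIfAvailable(const Path& input_pdf_file,
                        const Path& output_json_file,
                        Engine engine) {
    if (!Module::IsModuleAvailable(engine)) {
        return false;  // module missing or not licensed: skip extraction
    }
    Module::ExtractData(input_pdf_file, output_json_file, engine);
    return true;
}

// Stand-in used only to exercise the guard without the SDK (an assumption,
// not library code). It counts how often ExtractData is reached and reports
// only e_Tabular as available.
struct FakeModule {
    enum Engine { e_Tabular = 0, e_Form = 1 };
    static int& Calls() { static int n = 0; return n; }
    static bool IsModuleAvailable(Engine e) { return e == e_Tabular; }
    static void ExtractData(const std::string&, const std::string&, Engine) { ++Calls(); }
};
```

With the SDK linked, the same guard would presumably be instantiated with pdftron::PDF::DataExtractionModule as Module and UString paths. That usage is untested here and depends on the SDK having been initialized beforehand.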

The documentation for this class was generated from the following file: