#include <DataExtractionModule.h>

Public Types
enum	DataExtractionEngine { e_Tabular = 0, e_Form = 1, e_DocStructure = 2, e_FormKeyValue = 3 }

Static Public Member Functions
static bool	IsModuleAvailable (DataExtractionEngine engine)

static UString	ExtractData (const UString &input_pdf_file, DataExtractionEngine engine, DataExtractionOptions *options=0)

static void	ExtractData (const UString &input_pdf_file, const UString &output_json_file, DataExtractionEngine engine, DataExtractionOptions *options=0)

static void	DetectAndAddFormFieldsToPDF (PDFDoc &doc, DataExtractionOptions *options=0)

static void	ExtractToXLSX (const UString &input_pdf_file, const UString &output_xlsx_file, DataExtractionOptions *options=0)

static void	ExtractToXLSX (const UString &input_pdf_file, Filters::Filter &output_xlsx_stream, DataExtractionOptions *options=0)

Detailed Description

The DataExtractionModule class is a static interface to the data extraction functionality of the Apryse SDK.

Definition at line 23 of file DataExtractionModule.h.

Member Enumeration Documentation

enum pdftron::PDF::DataExtractionModule::DataExtractionEngine

Enumerator
e_Tabular	Tabular data engine. This engine identifies column and row structure and analyzes numeric columns. It is especially suited to table-based documents such as spreadsheets.
e_Form	Form field extraction engine. This engine uses artificial intelligence and computer vision to detect form fields in documents that do not contain embedded interactive field annotations.
e_DocStructure	Document structure engine. This engine discovers the full logical structure of a document, including headers, footers, paragraphs, list items, table columns, cells, borders, images and graphics.
e_FormKeyValue	Form field key-value extraction engine. This engine uses artificial intelligence and computer vision to detect form fields, including field names and values, in documents that do not contain embedded interactive field annotations. Note: This engine is experimental and subject to change.

Definition at line 29 of file DataExtractionModule.h.

Member Function Documentation

static void pdftron::PDF::DataExtractionModule::DetectAndAddFormFieldsToPDF ( PDFDoc &doc, DataExtractionOptions *options = 0 )

Perform automatic form field detection, then insert the fields into the PDF. Note: The FormKeyValue engine is experimental and subject to change.

Parameters

doc	– The PDF document in which fields are detected; the detected fields are inserted into this same document.
options	– Data extraction options (optional).

static UString pdftron::PDF::DataExtractionModule::ExtractData ( const UString &input_pdf_file, DataExtractionEngine engine, DataExtractionOptions *options = 0 )

Perform data extraction on a PDF file using the specified engine and return the resulting JSON string. Note: The FormKeyValue engine is experimental and subject to change.

Parameters

input_pdf_file	– The source document filename.
engine	– The extraction engine.
options	– Data extraction options (optional).

Returns: JSON string representing the extracted results.

static void pdftron::PDF::DataExtractionModule::ExtractData ( const UString &input_pdf_file, const UString &output_json_file, DataExtractionEngine engine, DataExtractionOptions *options = 0 )

Perform data extraction on a PDF file using the specified engine and write the resulting JSON to a file. Note: The FormKeyValue engine is experimental and subject to change.

Parameters

input_pdf_file	– The source document filename.
output_json_file	– The resulting JSON filename.
engine	– The extraction engine.
options	– Data extraction options (optional).

static void pdftron::PDF::DataExtractionModule::ExtractToXLSX ( const UString &input_pdf_file, const UString &output_xlsx_file, DataExtractionOptions *options = 0 )

Perform data extraction on a PDF file and write the results to a file in XLSX format.

Parameters

input_pdf_file	– The source document filename.
output_xlsx_file	– The resulting XLSX filename.
options	– Data extraction options (optional).

static void pdftron::PDF::DataExtractionModule::ExtractToXLSX ( const UString &input_pdf_file, Filters::Filter &output_xlsx_stream, DataExtractionOptions *options = 0 )

Perform data extraction on a PDF file and write the results, in XLSX format, to the given filter (stream).

Parameters

input_pdf_file	– The source document filename.
output_xlsx_stream	– The filter (output stream) that receives the resulting XLSX data.
options	– Data extraction options (optional).

static bool pdftron::PDF::DataExtractionModule::IsModuleAvailable ( DataExtractionEngine engine )

Find out whether the specified data extraction module is available (and licensed).

Parameters

engine – The extraction engine.

Returns: true if data extraction operations can be performed with the specified engine.

The documentation for this class was generated from the following file:

PDF/DataExtractionModule.h
