Class: DataExtractionModule

PDFNet.DataExtractionModule


new DataExtractionModule()

The class DataExtractionModule. A static interface to the Apryse SDK's data extraction functionality.

Classes

DataExtractionOptions

Members


<static> DataExtractionEngine

Type:
  • number
Properties:
Name Type Description
e_Tabular number
e_Form number
e_DocStructure number
e_FormKeyValue number

Methods


<static> createDataExtractionOptions()

Create a DataExtractionOptions object.
Returns:
A promise that resolves to a PDFNet.DataExtractionModule.DataExtractionOptions.
Type
Promise.<PDFNet.DataExtractionModule.DataExtractionOptions>

<static> detectAndAddFormFieldsToPDF(doc [, options])

Perform automatic form field detection, then insert the fields into the PDF. Note: The FormKeyValue engine is experimental and subject to change.
Parameters:
Name Type Argument Description
doc PDFNet.PDFDoc | PDFNet.SDFDoc | PDFNet.FDFDoc - The PDF document in which fields are detected and into which they are inserted.
options PDFNet.DataExtractionModule.DataExtractionOptions <optional>
- Data extraction options (optional).
Returns:
Type
Promise.<void>
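
Example (a minimal sketch, not an official sample): the helper below checks for the Form engine before running field detection on an already opened document. The function name addDetectedFields is hypothetical, PDFNet is passed in as a parameter so the sketch makes no claim about how the SDK is loaded or initialized, and the choice of e_Form as the engine to check is an assumption.

```javascript
// Hypothetical helper: detect form fields and insert them into `doc`.
// `PDFNet` is an already initialized SDK namespace, `doc` an already opened PDFDoc.
async function addDetectedFields(PDFNet, doc) {
  const mod = PDFNet.DataExtractionModule;
  // Assumption: form field detection depends on the e_Form engine.
  const engine = mod.DataExtractionEngine.e_Form;
  if (!(await mod.isModuleAvailable(engine))) {
    return false; // module missing or not licensed, nothing was changed
  }
  // `options` is optional and omitted here, so defaults apply.
  await mod.detectAndAddFormFieldsToPDF(doc);
  return true;
}
```

The document is modified in place, so saving it afterwards is left to the caller.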

<static> extractData(input_pdf_file, output_json_file, engine [, options])

Perform data extraction on a PDF file using the specified engine. Note: The FormKeyValue engine is experimental and subject to change.
Parameters:
Name Type Argument Description
input_pdf_file string - The source document filename.
output_json_file string - The resulting JSON filename.
engine number
PDFNet.DataExtractionModule.DataExtractionEngine = {
	e_Tabular : 0
	e_Form : 1
	e_DocStructure : 2
	e_FormKeyValue : 3
}
- The extraction engine.
options PDFNet.DataExtractionModule.DataExtractionOptions <optional>
- Data extraction options (optional).
Returns:
Type
Promise.<void>
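
Example (a minimal sketch under stated assumptions): the helper below runs the Tabular engine on a file and writes the result to a JSON file, failing early if the module is not available. The function name extractTablesToJson and the file names in the usage line are hypothetical, and PDFNet is passed in as a parameter rather than imported.

```javascript
// Hypothetical helper: extract tabular data from a PDF into a JSON file.
// `PDFNet` is an already initialized SDK namespace.
async function extractTablesToJson(PDFNet, inputPdf, outputJson) {
  const mod = PDFNet.DataExtractionModule;
  const engine = mod.DataExtractionEngine.e_Tabular; // value 0 in the enum
  if (!(await mod.isModuleAvailable(engine))) {
    throw new Error('Tabular data extraction module is not available');
  }
  // The optional `options` argument is omitted here.
  await mod.extractData(inputPdf, outputJson, engine);
  return outputJson;
}
```

A call might look like `await extractTablesToJson(PDFNet, 'table.pdf', 'table.json')`. Swapping in e_Form, e_DocStructure, or e_FormKeyValue selects a different engine.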

<static> extractDataAsString(input_pdf_file, engine [, options])

Perform data extraction on a PDF file using the specified engine and return the resulting JSON string. Note: The FormKeyValue engine is experimental and subject to change.
Parameters:
Name Type Argument Description
input_pdf_file string - The source document filename.
engine number
PDFNet.DataExtractionModule.DataExtractionEngine = {
	e_Tabular : 0
	e_Form : 1
	e_DocStructure : 2
	e_FormKeyValue : 3
}
- The extraction engine.
options PDFNet.DataExtractionModule.DataExtractionOptions <optional>
- Data extraction options (optional).
Returns:
A promise that resolves to a JSON string representing the extracted results.
Type
Promise.<string>
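
Example (a minimal sketch): because the result is a JSON string, it can be parsed directly instead of being written to disk first. The function name extractAsObject is hypothetical, PDFNet is passed in as a parameter, and nothing is assumed here about the shape of the parsed object.

```javascript
// Hypothetical helper: run an extraction engine and return the parsed result.
// `PDFNet` is an already initialized SDK namespace.
async function extractAsObject(PDFNet, inputPdf, engine) {
  const json = await PDFNet.DataExtractionModule.extractDataAsString(inputPdf, engine);
  // JSON.parse throws a SyntaxError if the string is not valid JSON.
  return JSON.parse(json);
}
```

The engine argument is one of the PDFNet.DataExtractionModule.DataExtractionEngine values, for example e_DocStructure.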

<static> extractToXLSX(input_pdf_file, output_xlsx_file [, options])

Perform data extraction on a PDF in XLSX output format.
Parameters:
Name Type Argument Description
input_pdf_file string - The source document filename.
output_xlsx_file string - The resulting XLSX filename.
options PDFNet.DataExtractionModule.DataExtractionOptions <optional>
- Data extraction options (optional).
Returns:
Type
Promise.<void>
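
Example (a minimal sketch): the helper below creates an options object with createDataExtractionOptions, lets the caller adjust it, and passes it to extractToXLSX. The function name pdfToXlsx and the configure callback are hypothetical, and PDFNet is passed in as a parameter. The setters available on DataExtractionOptions are not listed on this page, so none are called here.

```javascript
// Hypothetical helper: convert a PDF to XLSX with caller-configured options.
// `PDFNet` is an already initialized SDK namespace.
async function pdfToXlsx(PDFNet, inputPdf, outputXlsx, configure) {
  const mod = PDFNet.DataExtractionModule;
  const options = await mod.createDataExtractionOptions();
  if (typeof configure === 'function') {
    // The caller adjusts `options` here using the DataExtractionOptions API.
    await configure(options);
  }
  await mod.extractToXLSX(inputPdf, outputXlsx, options);
}
```

Leaving out the callback runs the extraction with a freshly created options object.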

<static> extractToXLSXWithFilter(input_pdf_file, output_xlsx_stream [, options])

Perform data extraction on a PDF in XLSX output format, writing the result to a filter instead of a file.
Parameters:
Name Type Argument Description
input_pdf_file string - The source document filename.
output_xlsx_stream PDFNet.Filter - The resulting XLSX filter.
options PDFNet.DataExtractionModule.DataExtractionOptions <optional>
- Data extraction options (optional).
Returns:
Type
Promise.<void>

<static> isModuleAvailable(engine)

Find out whether the specified data extraction module is available (and licensed).
Parameters:
Name Type Description
engine number
PDFNet.DataExtractionModule.DataExtractionEngine = {
	e_Tabular : 0
	e_Form : 1
	e_DocStructure : 2
	e_FormKeyValue : 3
}
- The extraction engine.
Returns:
A promise that resolves to true if data extraction operations can be performed.
Type
Promise.<boolean>
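
Example (a minimal sketch): since availability is reported per engine, it can be useful to probe all four at start-up. The function name listAvailableEngines is hypothetical, and PDFNet is passed in as a parameter so the sketch makes no claim about how the SDK is loaded.

```javascript
// Hypothetical helper: return the names of the engines that can be used.
// `PDFNet` is an already initialized SDK namespace.
async function listAvailableEngines(PDFNet) {
  const mod = PDFNet.DataExtractionModule;
  const engines = mod.DataExtractionEngine;
  const names = ['e_Tabular', 'e_Form', 'e_DocStructure', 'e_FormKeyValue'];
  const available = [];
  for (const name of names) {
    // Each check resolves to a boolean for one engine value.
    if (await mod.isModuleAvailable(engines[name])) {
      available.push(name);
    }
  }
  return available;
}
```

An empty array means none of the engines are available (or licensed).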