public class DataExtractionModule extends Object

java.lang.Object
   ↳ com.pdftron.pdf.DataExtractionModule

Class Overview

DataExtractionModule is a static interface to the PDFTron SDK's data extraction functionality.

Summary

Nested Classes
enum DataExtractionModule.DataExtractionEngine  
Public Methods
static void detectAndAddFormFieldsToPDF(Doc doc)
Perform automatic form field detection, then insert the fields into the PDF.
static void detectAndAddFormFieldsToPDF(Doc doc, DataExtractionOptions options)
Perform automatic form field detection, then insert the fields into the PDF.
static void extractData(String input_pdf_file, String output_json_file, DataExtractionModule.DataExtractionEngine engine, DataExtractionOptions options)
Perform data extraction on a PDF file using the specified engine.
static String extractData(String input_pdf_file, DataExtractionModule.DataExtractionEngine engine)
Perform data extraction on a PDF file using the specified engine and return the resulting JSON string.
static String extractData(String input_pdf_file, DataExtractionModule.DataExtractionEngine engine, DataExtractionOptions options)
Perform data extraction on a PDF file using the specified engine and return the resulting JSON string.
static void extractData(String input_pdf_file, String output_json_file, DataExtractionModule.DataExtractionEngine engine)
Perform data extraction on a PDF file using the specified engine.
static void extractToXLSX(String input_pdf_file, Filter output_xlsx_stream, DataExtractionOptions options)
Perform data extraction on a PDF and write the results in XLSX format.
static void extractToXLSX(String input_pdf_file, String output_xlsx_file)
Perform data extraction on a PDF and write the results in XLSX format.
static void extractToXLSX(String input_pdf_file, String output_xlsx_file, DataExtractionOptions options)
Perform data extraction on a PDF and write the results in XLSX format.
static void extractToXLSX(String input_pdf_file, Filter output_xlsx_stream)
Perform data extraction on a PDF and write the results in XLSX format.
static boolean isModuleAvailable(DataExtractionModule.DataExtractionEngine engine)
Find out whether the specified data extraction module is available (and licensed).
Inherited Methods
From class java.lang.Object

Public Methods

public static void detectAndAddFormFieldsToPDF (Doc doc)

Perform automatic form field detection, then insert the fields into the PDF.

Parameters
doc -- The PDF document in which fields are detected and into which they are inserted.

public static void detectAndAddFormFieldsToPDF (Doc doc, DataExtractionOptions options)

Perform automatic form field detection, then insert the fields into the PDF. Note: The FormKeyValue engine is experimental and subject to change.

Parameters
doc -- The PDF document in which fields are detected and into which they are inserted.
options -- Data extraction options (optional).

public static void extractData (String input_pdf_file, String output_json_file, DataExtractionModule.DataExtractionEngine engine, DataExtractionOptions options)

Perform data extraction on a PDF file using the specified engine. Note: The FormKeyValue engine is experimental and subject to change.

Parameters
input_pdf_file -- The source document filename
output_json_file -- The resulting JSON filename
engine -- The extraction engine
options -- Data extraction options

public static String extractData (String input_pdf_file, DataExtractionModule.DataExtractionEngine engine)

Perform data extraction on a PDF file using the specified engine and return the resulting JSON string. Note: The FormKeyValue engine is experimental and subject to change.

Parameters
input_pdf_file -- The source document filename
engine -- The extraction engine
Returns
  • JSON string representing the extracted results

public static String extractData (String input_pdf_file, DataExtractionModule.DataExtractionEngine engine, DataExtractionOptions options)

Perform data extraction on a PDF file using the specified engine and return the resulting JSON string. Note: The FormKeyValue engine is experimental and subject to change.

Parameters
input_pdf_file -- The source document filename
engine -- The extraction engine
options -- Data extraction options
Returns
  • JSON string representing the extracted results
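
Since this overload returns the JSON as a String rather than writing a file, the caller decides what to do with it. As one possibility, the sketch below saves such a string to disk as UTF-8 using only the Java standard library. JsonResultWriter and save are hypothetical names for illustration and are not part of the SDK; the json argument stands in for the value returned by extractData.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the PDFTron SDK.
class JsonResultWriter {

    // Writes the JSON text to 'target' as UTF-8, creating the file or
    // overwriting an existing one, and returns the path that was written.
    static Path save(String json, Path target) {
        try {
            return Files.write(target, json.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            // Rethrow unchecked so callers are not forced to declare IOException.
            throw new UncheckedIOException(e);
        }
    }
}
```

A caller could pass the string returned by extractData(input_pdf_file, engine, options) straight to save, together with a target path of its choosing.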

public static void extractData (String input_pdf_file, String output_json_file, DataExtractionModule.DataExtractionEngine engine)

Perform data extraction on a PDF file using the specified engine. Note: The FormKeyValue engine is experimental and subject to change.

Parameters
input_pdf_file -- The source document filename
output_json_file -- The resulting JSON filename
engine -- The extraction engine
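
The file-based overloads take the output filename as a plain string. When many PDFs are processed in a batch, one option is to derive that name from the input filename. The following is a minimal sketch using only java.nio; OutputNames and withExtension are hypothetical names, not SDK API, and the naming scheme (same directory, same base name, new extension) is an assumption.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the PDFTron SDK.
class OutputNames {

    // Returns a sibling path of 'inputFile' whose extension is replaced by
    // 'extension' (given without a leading dot). If the file name has no
    // extension, the new one is appended.
    static String withExtension(String inputFile, String extension) {
        Path input = Paths.get(inputFile);
        String name = input.getFileName().toString();
        int dot = name.lastIndexOf('.');
        // dot > 0 keeps names such as ".hidden" intact instead of emptying them.
        String stem = dot > 0 ? name.substring(0, dot) : name;
        return input.resolveSibling(stem + "." + extension).toString();
    }
}
```

For example, withExtension("table.pdf", "json") yields "table.json", which could then be passed as output_json_file; the same helper with "xlsx" would serve the extractToXLSX overloads that take a filename.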

public static void extractToXLSX (String input_pdf_file, Filter output_xlsx_stream, DataExtractionOptions options)

Perform data extraction on a PDF and write the results in XLSX format.

Parameters
input_pdf_file -- The source document filename
output_xlsx_stream -- The output filter that receives the resulting XLSX data
options -- Data extraction options

public static void extractToXLSX (String input_pdf_file, String output_xlsx_file)

Perform data extraction on a PDF and write the results in XLSX format.

Parameters
input_pdf_file -- The source document filename
output_xlsx_file -- The resulting XLSX filename

public static void extractToXLSX (String input_pdf_file, String output_xlsx_file, DataExtractionOptions options)

Perform data extraction on a PDF and write the results in XLSX format.

Parameters
input_pdf_file -- The source document filename
output_xlsx_file -- The resulting XLSX filename
options -- Data extraction options

public static void extractToXLSX (String input_pdf_file, Filter output_xlsx_stream)

Perform data extraction on a PDF and write the results in XLSX format.

Parameters
input_pdf_file -- The source document filename
output_xlsx_stream -- The output filter that receives the resulting XLSX data

public static boolean isModuleAvailable (DataExtractionModule.DataExtractionEngine engine)

Find out whether the specified data extraction module is available (and licensed).

Parameters
engine -- The extraction engine
Returns
  • true if data extraction operations can be performed
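
Because isModuleAvailable reports whether an engine can be used, a caller may want to check it before attempting an extraction. The following is a minimal sketch of such a guard, not SDK code: ExtractionGuard and extractIfAvailable are hypothetical names, and the two Callable arguments stand in for the SDK calls so that the sketch compiles without the SDK on the classpath. Callable is used because it permits checked exceptions; whether the SDK methods declare any is not stated on this page.

```java
import java.util.Optional;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of the PDFTron SDK.
class ExtractionGuard {

    // Runs 'extraction' only if 'availabilityCheck' returns true.
    // Returns an empty Optional when the module is unavailable or the
    // extraction yields null; wraps any failure in an unchecked exception.
    static Optional<String> extractIfAvailable(Callable<Boolean> availabilityCheck,
                                               Callable<String> extraction) {
        try {
            if (!Boolean.TRUE.equals(availabilityCheck.call())) {
                return Optional.empty();
            }
            return Optional.ofNullable(extraction.call());
        } catch (Exception e) {
            throw new IllegalStateException("Data extraction failed", e);
        }
    }
}
```

With the SDK present, the call might look like ExtractionGuard.extractIfAvailable(() -> DataExtractionModule.isModuleAvailable(engine), () -> DataExtractionModule.extractData(input_pdf_file, engine)), where engine and input_pdf_file are supplied by the caller.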