Requirements
These packages are required to use these features in production. Trial keys have unlimited access to all features.
This guide walks you through installing and configuring Apryse’s Data Extraction Module so you can start extracting data from PDFs quickly and reliably.
When using Python on Windows or Linux, you can install the package via pip with the command below. Only x64 is supported; Arm and 32-bit are not.

    pip install --extra-index-url=https://pypi.apryse.com apryse-data-extraction
When using Node.js on Windows or Linux, you can install the package via npm with the command below. Only x64 is supported; Arm and 32-bit are not.

    npm install @pdftron/data-extraction
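The platform constraint stated for the pip and npm packages can be checked before installation. The following is a minimal sketch using only the Python standard library; the helper name `is_supported_platform` and the machine-name set are illustrative assumptions, not part of the SDK.

```python
import platform

# Machine names commonly reported for 64-bit x86 CPUs
# ("AMD64" on Windows, "x86_64" on Linux).
X64_MACHINES = {"amd64", "x86_64"}
# Operating systems named in the install instructions above.
SUPPORTED_SYSTEMS = {"windows", "linux"}


def is_supported_platform(system: str, machine: str) -> bool:
    """Return True when the OS/CPU pair matches the stated constraint (hypothetical helper)."""
    return system.lower() in SUPPORTED_SYSTEMS and machine.lower() in X64_MACHINES


if __name__ == "__main__":
    ok = is_supported_platform(platform.system(), platform.machine())
    print("supported" if ok else "unsupported", platform.system(), platform.machine())
```

Note that this only inspects the machine type reported by the operating system; it does not detect a 32-bit interpreter running on a 64-bit machine.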
For Windows, copy DataExtractionModuleWindows.zip into your PDFNetC folder, then extract it there. Only x64 is supported; Arm and 32-bit are not. You should have files like:

    Lib\Windows\StructuredOutput.exe
    Lib\Windows\OCRModule.exe
    Lib\Windows\TabularData\TabularData.dll
    Lib\Windows\AIPageObjectExtractor\AIPageObjectExtractor.dll

For Linux, copy DataExtractionModuleLinux.tar.gz into your PDFNetC directory, then extract it there. You should have files like:

    Lib/Linux/StructuredOutput
    Lib/Linux/OCRModule
    Lib/Linux/TabularData/TabularData
    Lib/Linux/AIPageObjectExtractor/AIPageObjectExtractor

Please refer to the specifications below to learn more about the output JSON format.
If you are using PIP or NPM, you may skip setting AddResourceSearchPath. Otherwise, follow the directions below.
The first thing to set up before the module can be used is the location of the Lib directory under which the external add-ons are installed, so that the SDK knows where to look for them. This is achieved via the PDFNet AddResourceSearchPath function. If a relative path is used, it is based on the end-user executable.
C#:
    PDFNet.AddResourceSearchPath("../../../../../Lib/");

C++:
    PDFNet::AddResourceSearchPath("../../../Lib/");

Go:
    PDFNetAddResourceSearchPath("../../../PDFNetC/Lib/")

Java:
    PDFNet.addResourceSearchPath("../../../Lib/");

JavaScript:
    await PDFNet.addResourceSearchPath('../../lib/');

PHP:
    PDFNet::AddResourceSearchPath("../../../PDFNetC/Lib/");

Python:
    PDFNet.AddResourceSearchPath("../../../PDFNetC/Lib/")

Ruby:
    PDFNet.AddResourceSearchPath("../../../PDFNetC/Lib/")

VB:
    PDFNet.AddResourceSearchPath("../../../../../Lib/")
Note: Do not specify the platform-specific directory (Windows, Linux, or MacOS) that contains the individual executables; specify its parent folder instead.
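To guard against passing a platform subfolder by mistake, a path can be normalized before it is handed to AddResourceSearchPath. This is a minimal sketch; `resource_search_dir` is a hypothetical helper, not an SDK function, and it only handles the three folder names mentioned in the note.

```python
from pathlib import PurePosixPath

# Platform subfolder names mentioned in the note above.
PLATFORM_DIRS = {"windows", "linux", "macos"}


def resource_search_dir(path: str) -> str:
    """Return the Lib directory to register, stepping up from a platform subfolder."""
    # Treat backslashes as separators so Windows-style paths are handled too.
    p = PurePosixPath(path.replace("\\", "/"))
    if p.name.lower() in PLATFORM_DIRS:
        p = p.parent
    return str(p)


print(resource_search_dir("../../../Lib/Linux"))  # ../../../Lib
```

The returned string would then be passed to PDFNet.AddResourceSearchPath as in the Python line shown earlier.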
For error handling purposes, it is generally advisable to test whether the module is available via the IsModuleAvailable function. Since the Data Extraction suite consists of multiple modules, an extra parameter is used to clarify the component to test.
C#:
    if (!DataExtractionModule.IsModuleAvailable(DataExtractionModule.DataExtractionEngine.e_tabular))
    {
        // Unable to run Data Extraction: PDFTron SDK Tabular Data module not available.
    }
    if (!DataExtractionModule.IsModuleAvailable(DataExtractionModule.DataExtractionEngine.e_doc_structure))
    {
        // Unable to run Data Extraction: PDFTron SDK Structured Output module not available.
    }
    if (!DataExtractionModule.IsModuleAvailable(DataExtractionModule.DataExtractionEngine.e_form))
    {
        // Unable to run Data Extraction: PDFTron SDK AIFormFieldExtractor module not available.
    }
    if (!DataExtractionModule.IsModuleAvailable(DataExtractionModule.DataExtractionEngine.e_generic_key_value))
    {
        // Unable to run Data Extraction: PDFTron SDK AIGenericKeyValue module not available.
    }
    if (!DataExtractionModule.IsModuleAvailable(DataExtractionModule.DataExtractionEngine.e_doc_classification))
    {
        // Unable to run Data Extraction: PDFTron SDK AIDocClassification module not available.
    }

C++:
    if (!DataExtractionModule::IsModuleAvailable(DataExtractionModule::e_Tabular))
    {
        // Unable to run Data Extraction: PDFTron SDK Tabular Data module not available.
    }
    if (!DataExtractionModule::IsModuleAvailable(DataExtractionModule::e_DocStructure))
    {
        // Unable to run Data Extraction: PDFTron SDK Structured Output module not available.
    }
    if (!DataExtractionModule::IsModuleAvailable(DataExtractionModule::e_Form))
    {
        // Unable to run Data Extraction: PDFTron SDK AIFormFieldExtractor module not available.
    }
    if (!DataExtractionModule::IsModuleAvailable(DataExtractionModule::e_GenericKeyValue))
    {
        // Unable to run Data Extraction: PDFTron SDK AIGenericKeyValue module not available.
    }
    if (!DataExtractionModule::IsModuleAvailable(DataExtractionModule::e_DocClassification))
    {
        // Unable to run Data Extraction: PDFTron SDK AIDocClassification module not available.
    }

Go:
    if !DataExtractionModuleIsModuleAvailable(DataExtractionModuleE_Tabular) {
        // Unable to run Data Extraction: PDFTron SDK Tabular Data module not available.
    }
    if !DataExtractionModuleIsModuleAvailable(DataExtractionModuleE_DocStructure) {
        // Unable to run Data Extraction: PDFTron SDK Structured Output module not available.
    }
    if !DataExtractionModuleIsModuleAvailable(DataExtractionModuleE_Form) {
        // Unable to run Data Extraction: PDFTron SDK AIFormFieldExtractor module not available.
    }
    if !DataExtractionModuleIsModuleAvailable(DataExtractionModuleE_GenericKeyValue) {
        // Unable to run Data Extraction: PDFTron SDK AIGenericKeyValue module not available.
    }
    if !DataExtractionModuleIsModuleAvailable(DataExtractionModuleE_DocClassification) {
        // Unable to run Data Extraction: PDFTron SDK AIDocClassification module not available.
    }

Java:
    if (!DataExtractionModule.isModuleAvailable(DataExtractionModule.DataExtractionEngine.e_tabular))
    {
        // Unable to run Data Extraction: PDFTron SDK Tabular Data module not available.
    }
    if (!DataExtractionModule.isModuleAvailable(DataExtractionModule.DataExtractionEngine.e_doc_structure))
    {
        // Unable to run Data Extraction: PDFTron SDK Structured Output module not available.
    }
    if (!DataExtractionModule.isModuleAvailable(DataExtractionModule.DataExtractionEngine.e_form))
    {
        // Unable to run Data Extraction: PDFTron SDK AIFormFieldExtractor module not available.
    }
    if (!DataExtractionModule.isModuleAvailable(DataExtractionModule.DataExtractionEngine.e_generic_key_value))
    {
        // Unable to run Data Extraction: PDFTron SDK AIGenericKeyValue module not available.
    }
    if (!DataExtractionModule.isModuleAvailable(DataExtractionModule.DataExtractionEngine.e_doc_classification))
    {
        // Unable to run Data Extraction: PDFTron SDK AIDocClassification module not available.
    }

JavaScript:
    if (!(await PDFNet.DataExtractionModule.isModuleAvailable(PDFNet.DataExtractionModule.DataExtractionEngine.e_Tabular))) {
        // Unable to run Data Extraction: PDFTron SDK Tabular Data module not available.
    }
    if (!(await PDFNet.DataExtractionModule.isModuleAvailable(PDFNet.DataExtractionModule.DataExtractionEngine.e_DocStructure))) {
        // Unable to run Data Extraction: PDFTron SDK Structured Output module not available.
    }
    if (!(await PDFNet.DataExtractionModule.isModuleAvailable(PDFNet.DataExtractionModule.DataExtractionEngine.e_Form))) {
        // Unable to run Data Extraction: PDFTron SDK AIFormFieldExtractor module not available.
    }
    if (!(await PDFNet.DataExtractionModule.isModuleAvailable(PDFNet.DataExtractionModule.DataExtractionEngine.e_GenericKeyValue))) {
        // Unable to run Data Extraction: PDFTron SDK AIGenericKeyValue module not available.
    }
    if (!(await PDFNet.DataExtractionModule.isModuleAvailable(PDFNet.DataExtractionModule.DataExtractionEngine.e_DocClassification))) {
        // Unable to run Data Extraction: PDFTron SDK AIDocClassification module not available.
    }

PHP:
    if (!DataExtractionModule::IsModuleAvailable(DataExtractionModule::e_Tabular)) {
        // Unable to run Data Extraction: PDFTron SDK Tabular Data module not available.
    }
    if (!DataExtractionModule::IsModuleAvailable(DataExtractionModule::e_DocStructure)) {
        // Unable to run Data Extraction: PDFTron SDK Structured Output module not available.
    }
    if (!DataExtractionModule::IsModuleAvailable(DataExtractionModule::e_Form)) {
        // Unable to run Data Extraction: PDFTron SDK AIFormFieldExtractor module not available.
    }
    if (!DataExtractionModule::IsModuleAvailable(DataExtractionModule::e_GenericKeyValue)) {
        // Unable to run Data Extraction: PDFTron SDK AIGenericKeyValue module not available.
    }
    if (!DataExtractionModule::IsModuleAvailable(DataExtractionModule::e_DocClassification)) {
        // Unable to run Data Extraction: PDFTron SDK AIDocClassification module not available.
    }

Python:
    if not DataExtractionModule.IsModuleAvailable(DataExtractionModule.e_Tabular):
        pass  # Unable to run Data Extraction: PDFTron SDK Tabular Data module not available.
    if not DataExtractionModule.IsModuleAvailable(DataExtractionModule.e_DocStructure):
        pass  # Unable to run Data Extraction: PDFTron SDK Structured Output module not available.
    if not DataExtractionModule.IsModuleAvailable(DataExtractionModule.e_Form):
        pass  # Unable to run Data Extraction: PDFTron SDK AIFormFieldExtractor module not available.
    if not DataExtractionModule.IsModuleAvailable(DataExtractionModule.e_GenericKeyValue):
        pass  # Unable to run Data Extraction: PDFTron SDK AIGenericKeyValue module not available.
    if not DataExtractionModule.IsModuleAvailable(DataExtractionModule.e_DocClassification):
        pass  # Unable to run Data Extraction: PDFTron SDK AIDocClassification module not available.

Ruby:
    if !DataExtractionModule.IsModuleAvailable(DataExtractionModule::E_Tabular) then
        # Unable to run Data Extraction: PDFTron SDK Tabular Data module not available.
    end
    if !DataExtractionModule.IsModuleAvailable(DataExtractionModule::E_DocStructure) then
        # Unable to run Data Extraction: PDFTron SDK Structured Output module not available.
    end
    if !DataExtractionModule.IsModuleAvailable(DataExtractionModule::E_Form) then
        # Unable to run Data Extraction: PDFTron SDK AIFormFieldExtractor module not available.
    end
    if !DataExtractionModule.IsModuleAvailable(DataExtractionModule::E_GenericKeyValue) then
        # Unable to run Data Extraction: PDFTron SDK AIGenericKeyValue module not available.
    end
    if !DataExtractionModule.IsModuleAvailable(DataExtractionModule::E_DocClassification) then
        # Unable to run Data Extraction: PDFTron SDK AIDocClassification module not available.
    end

VB:
    If Not DataExtractionModule.IsModuleAvailable(DataExtractionModule.DataExtractionEngine.e_tabular) Then
        ' Unable to run Data Extraction: PDFTron SDK Tabular Data module not available.
    End If
    If Not DataExtractionModule.IsModuleAvailable(DataExtractionModule.DataExtractionEngine.e_doc_structure) Then
        ' Unable to run Data Extraction: PDFTron SDK Structured Output module not available.
    End If
    If Not DataExtractionModule.IsModuleAvailable(DataExtractionModule.DataExtractionEngine.e_form) Then
        ' Unable to run Data Extraction: PDFTron SDK AIFormFieldExtractor module not available.
    End If
    If Not DataExtractionModule.IsModuleAvailable(DataExtractionModule.DataExtractionEngine.e_generic_key_value) Then
        ' Unable to run Data Extraction: PDFTron SDK AIGenericKeyValue module not available.
    End If
    If Not DataExtractionModule.IsModuleAvailable(DataExtractionModule.DataExtractionEngine.e_doc_classification) Then
        ' Unable to run Data Extraction: PDFTron SDK AIDocClassification module not available.
    End If
If you have the module installed but the function still returns false, please double check that the correct path was used in AddResourceSearchPath earlier.
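Since each component is tested separately, it can be convenient to collect every unavailable engine in one pass and report them together. The sketch below assumes a hypothetical helper, `missing_engines`, that takes the availability check as a callable; a stand-in check is used so the example runs without the SDK installed.

```python
from typing import Callable, Dict, List


def missing_engines(is_available: Callable[[object], bool],
                    engines: Dict[str, object]) -> List[str]:
    """Return the names of engines for which the availability check fails."""
    return [name for name, engine in engines.items() if not is_available(engine)]


# With the SDK imported, the call might look like this (not executed here):
#   engines = {"Tabular": DataExtractionModule.e_Tabular,
#              "DocStructure": DataExtractionModule.e_DocStructure}
#   missing = missing_engines(DataExtractionModule.IsModuleAvailable, engines)

# Stand-in check (an assumption for illustration only).
fake_installed = {"tabular"}
result = missing_engines(lambda e: e in fake_installed,
                         {"Tabular": "tabular", "Form": "form"})
print(result)  # ['Form']
```

A non-empty result is a natural place to log a single error that names every missing component and points at the resource search path.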
Although the default options will satisfy most common use cases, we offer several options to customize the extraction behavior and unlock lesser-used functionality.
The options object is passed as the last parameter to any extraction function, as shown below.
Use the Language option to set the preferred OCR language(s). If you work with scanned documents in languages other than English, specify one or more 3-letter ISO 639-2  language codes, separated by spaces. For example, "eng deu spa fra" for English, German, Spanish, French. You may also use comma or plus as a separator.
Supported languages:
eng: English
deu or ger: German
fra or fre: French
ita: Italian
rus: Russian
spa: Spanish
Note: Listing too many languages at once may hurt performance and accuracy. If you know the exact language, it is always best to use that single setting.
C#:
    DataExtractionOptions options = new DataExtractionOptions();
    options.SetLanguage("fra spa"); // French and Spanish
    DataExtractionModule.ExtractData("table.pdf", "table.json", DataExtractionModule.DataExtractionEngine.e_tabular, options);

C++:
    DataExtractionOptions options;
    options.SetLanguage("fra spa"); // French and Spanish
    DataExtractionModule::ExtractData("table.pdf", "table.json", DataExtractionModule::e_Tabular, &options);

Go:
    options := NewDataExtractionOptions()
    options.SetLanguage("fra spa") // French and Spanish
    DataExtractionModuleExtractData("table.pdf", "table.json", DataExtractionModuleE_Tabular, options)

Java:
    DataExtractionOptions options = new DataExtractionOptions();
    options.setLanguage("fra spa"); // French and Spanish
    DataExtractionModule.extractData("table.pdf", "table.json", DataExtractionModule.DataExtractionEngine.e_tabular, options);

JavaScript:
    const options = new PDFNet.DataExtractionModule.DataExtractionOptions();
    options.setLanguage("fra spa"); // French and Spanish
    await PDFNet.DataExtractionModule.extractData('table.pdf', 'table.json', PDFNet.DataExtractionModule.DataExtractionEngine.e_Tabular, options);

PHP:
    $options = new DataExtractionOptions();
    $options->SetLanguage("fra spa"); // French and Spanish
    DataExtractionModule::ExtractData("table.pdf", "table.json", DataExtractionModule::e_Tabular, $options);

Python:
    options = DataExtractionOptions()
    options.SetLanguage("fra spa")  # French and Spanish
    DataExtractionModule.ExtractData("table.pdf", "table.json", DataExtractionModule.e_Tabular, options)

Ruby:
    options = DataExtractionOptions.new()
    options.SetLanguage("fra spa") # French and Spanish
    DataExtractionModule.ExtractData("table.pdf", "table.json", DataExtractionModule::E_Tabular, options)

VB:
    Dim options As DataExtractionOptions = New DataExtractionOptions()
    options.SetLanguage("fra spa") ' French and Spanish
    DataExtractionModule.ExtractData("table.pdf", "table.json", DataExtractionModule.DataExtractionEngine.e_tabular, options)
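Because the Language option accepts space, comma, or plus as a separator, it can help to normalize and validate the string before passing it to SetLanguage. The sketch below is illustrative: `build_language_option` is a hypothetical helper, the supported set is copied from the list in this guide, and lowercasing the codes is an assumption rather than documented SDK behavior.

```python
import re

# Codes listed under "Supported languages" in this guide.
SUPPORTED = {"eng", "deu", "ger", "fra", "fre", "ita", "rus", "spa"}


def build_language_option(codes: str) -> str:
    """Normalize a space-, comma-, or plus-separated code list to space-separated form."""
    parts = [c.lower() for c in re.split(r"[ ,+]+", codes.strip()) if c]
    if not parts:
        raise ValueError("at least one language code is required")
    unknown = [c for c in parts if c not in SUPPORTED]
    if unknown:
        raise ValueError(f"unsupported language code(s): {', '.join(unknown)}")
    return " ".join(parts)


print(build_language_option("fra,spa"))  # fra spa
```

The result can be passed directly to options.SetLanguage(...) as in the Python example above.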
Use the PDFPassword option to specify a PDF password if one is required.
Encrypted PDF files that are protected by a password may only be opened when the password is specified in addition to the filename. No password is necessary for files that can be viewed without any authentication.
C#:
    DataExtractionOptions options = new DataExtractionOptions();
    options.SetPDFPassword("password123"); // password for input PDF
    DataExtractionModule.ExtractData("table.pdf", "table.json", DataExtractionModule.DataExtractionEngine.e_tabular, options);

C++:
    DataExtractionOptions options;
    options.SetPDFPassword("password123"); // password for input PDF
    DataExtractionModule::ExtractData("table.pdf", "table.json", DataExtractionModule::e_Tabular, &options);

Go:
    options := NewDataExtractionOptions()
    options.SetPDFPassword("password123") // password for input PDF
    DataExtractionModuleExtractData("table.pdf", "table.json", DataExtractionModuleE_Tabular, options)

Java:
    DataExtractionOptions options = new DataExtractionOptions();
    options.setPDFPassword("password123"); // password for input PDF
    DataExtractionModule.extractData("table.pdf", "table.json", DataExtractionModule.DataExtractionEngine.e_tabular, options);

JavaScript:
    const options = new PDFNet.DataExtractionModule.DataExtractionOptions();
    options.setPDFPassword("password123"); // password for input PDF
    await PDFNet.DataExtractionModule.extractData('table.pdf', 'table.json', PDFNet.DataExtractionModule.DataExtractionEngine.e_Tabular, options);

PHP:
    $options = new DataExtractionOptions();
    $options->SetPDFPassword("password123"); // password for input PDF
    DataExtractionModule::ExtractData("table.pdf", "table.json", DataExtractionModule::e_Tabular, $options);

Python:
    options = DataExtractionOptions()
    options.SetPDFPassword("password123")  # password for input PDF
    DataExtractionModule.ExtractData("table.pdf", "table.json", DataExtractionModule.e_Tabular, options)

Ruby:
    options = DataExtractionOptions.new()
    options.SetPDFPassword("password123") # password for input PDF
    DataExtractionModule.ExtractData("table.pdf", "table.json", DataExtractionModule::E_Tabular, options)

VB:
    Dim options As DataExtractionOptions = New DataExtractionOptions()
    options.SetPDFPassword("password123") ' password for input PDF
    DataExtractionModule.ExtractData("table.pdf", "table.json", DataExtractionModule.DataExtractionEngine.e_tabular, options)
Use the Pages option to restrict the extraction to a selected range of pages.
This can be a single page number (such as "1" for the first page), or a range separated by a dash (such as "1-5", or "7-" for 7 and beyond). An empty string means all pages are extracted.
C#:
    DataExtractionOptions options = new DataExtractionOptions();
    options.SetPages("1"); // extract page 1
    DataExtractionModule.ExtractData("table.pdf", "table.json", DataExtractionModule.DataExtractionEngine.e_tabular, options);

C++:
    DataExtractionOptions options;
    options.SetPages("1"); // extract page 1
    DataExtractionModule::ExtractData("table.pdf", "table.json", DataExtractionModule::e_Tabular, &options);

Go:
    options := NewDataExtractionOptions()
    options.SetPages("1") // page 1
    DataExtractionModuleExtractData("table.pdf", "table.json", DataExtractionModuleE_Tabular, options)

Java:
    DataExtractionOptions options = new DataExtractionOptions();
    options.setPages("1"); // extract page 1
    DataExtractionModule.extractData("table.pdf", "table.json", DataExtractionModule.DataExtractionEngine.e_tabular, options);

JavaScript:
    const options = new PDFNet.DataExtractionModule.DataExtractionOptions();
    options.setPages("1"); // page 1
    await PDFNet.DataExtractionModule.extractData('table.pdf', 'table.json', PDFNet.DataExtractionModule.DataExtractionEngine.e_Tabular, options);

PHP:
    $options = new DataExtractionOptions();
    $options->SetPages("1"); // page 1
    DataExtractionModule::ExtractData("table.pdf", "table.json", DataExtractionModule::e_Tabular, $options);

Python:
    options = DataExtractionOptions()
    options.SetPages("1")  # page 1
    DataExtractionModule.ExtractData("table.pdf", "table.json", DataExtractionModule.e_Tabular, options)

Ruby:
    options = DataExtractionOptions.new()
    options.SetPages("1") # page 1
    DataExtractionModule.ExtractData("table.pdf", "table.json", DataExtractionModule::E_Tabular, options)

VB:
    Dim options As DataExtractionOptions = New DataExtractionOptions()
    options.SetPages("1") ' extract page 1
    DataExtractionModule.ExtractData("table.pdf", "table.json", DataExtractionModule.DataExtractionEngine.e_tabular, options)
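The Pages syntax described above (a single page, a closed range such as "1-5", an open range such as "7-", or an empty string for all pages) can be validated on the caller's side before it reaches SetPages. This is a minimal sketch; `parse_pages` is a hypothetical helper and reflects only the forms listed in this guide, not how the SDK parses the string internally.

```python
from typing import Optional, Tuple


def parse_pages(spec: str) -> Tuple[int, Optional[int]]:
    """Parse a Pages string into (first, last); last is None for an open-ended range."""
    spec = spec.strip()
    if spec == "":
        return (1, None)              # empty string: all pages
    if "-" not in spec:
        page = int(spec)
        first, last = page, page      # single page
    else:
        start, _, end = spec.partition("-")
        first = int(start)            # raises ValueError if the start is missing
        last = int(end) if end.strip() else None
    if first < 1 or (last is not None and last < first):
        raise ValueError(f"invalid page range: {spec!r}")
    return (first, last)


print(parse_pages("7-"))  # (7, None)
```

Rejecting malformed input early gives a clearer error than discovering an unexpected page selection in the output JSON.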
You can specify regions to include or exclude from analysis for each page in a document using the Inclusion Zone and Exclusion Zone options for a page. These options specify rectangles in user-space coordinates that allow developers to either include or exclude a region from analysis. For example, if a document has a table that you don't want to analyze, you could specify its bounding box as an exclusion zone, or if a document has only one paragraph that you care about, you could use an inclusion zone. If no zones are specified for a page, the entire page is included in analysis.
Inclusion and exclusion zones can be combined to create complex regions of interest. Inclusion zones are combined by union, and exclusion zones are subtracted.
This option is only supported for the Form, FormKeyValue, and GenericKeyValue engines at this time.
Inclusion and Exclusion example

C#:
    DataExtractionOptions options = new DataExtractionOptions();

    RectCollection p4InclusionZones = new RectCollection();
    RectCollection p4ExclusionZones = new RectCollection();
    p4InclusionZones.AddRect(30, 432, 562, 684);
    p4ExclusionZones.AddRect(30, 657, 295, 684);
    options.AddInclusionZonesForPage(p4InclusionZones, 4);
    options.AddExclusionZonesForPage(p4ExclusionZones, 4);

    DataExtractionModule.ExtractData("newsletter.pdf", "newsletter_key_val_with_zones.json", DataExtractionModule.DataExtractionEngine.e_generic_key_value, options);

C++:
    DataExtractionOptions options;

    RectCollection p4_inclusion_zones, p4_exclusion_zones;
    p4_inclusion_zones.AddRect(30, 432, 562, 684);
    p4_exclusion_zones.AddRect(30, 657, 295, 684);
    options.AddInclusionZonesForPage(p4_inclusion_zones, 4);
    options.AddExclusionZonesForPage(p4_exclusion_zones, 4);

    DataExtractionModule::ExtractData("newsletter.pdf", "newsletter_key_val_with_zones.json", DataExtractionModule::e_GenericKeyValue, &options);

Go:
    options := NewDataExtractionOptions()

    p4InclusionZones := NewRectCollection()
    p4ExclusionZones := NewRectCollection()
    p4InclusionZones.AddRect(NewRect(30, 432, 562, 684))
    p4ExclusionZones.AddRect(NewRect(30, 657, 295, 684))
    options.AddInclusionZonesForPage(p4InclusionZones, 4)
    options.AddExclusionZonesForPage(p4ExclusionZones, 4)

    DataExtractionModuleExtractData("newsletter.pdf", "newsletter_key_val_with_zones.json", DataExtractionModuleE_GenericKeyValue, options)

Java:
    DataExtractionOptions options = new DataExtractionOptions();

    RectCollection p4InclusionZones = new RectCollection();
    RectCollection p4ExclusionZones = new RectCollection();
    p4InclusionZones.addRect(30, 432, 562, 684);
    p4ExclusionZones.addRect(30, 657, 295, 684);
    options.addInclusionZonesForPage(p4InclusionZones, 4);
    options.addExclusionZonesForPage(p4ExclusionZones, 4);

    DataExtractionModule.extractData("newsletter.pdf", "newsletter_key_val_with_zones.json", DataExtractionModule.DataExtractionEngine.e_generic_key_value, options);

JavaScript:
    const options = new PDFNet.DataExtractionModule.DataExtractionOptions();

    const p4InclusionZones = [];
    const p4ExclusionZones = [];
    p4InclusionZones.push(new PDFNet.Rect(30, 432, 562, 684));
    p4ExclusionZones.push(new PDFNet.Rect(30, 657, 295, 684));
    options.addInclusionZonesForPage(p4InclusionZones, 4);
    options.addExclusionZonesForPage(p4ExclusionZones, 4);

    await PDFNet.DataExtractionModule.extractData('newsletter.pdf', 'newsletter_key_val_with_zones.json', PDFNet.DataExtractionModule.DataExtractionEngine.e_GenericKeyValue, options);

PHP:
    $options = new DataExtractionOptions();

    $p4InclusionZones = new RectCollection();
    $p4ExclusionZones = new RectCollection();
    $p4InclusionZones->AddRect(new Rect(30.0, 432.0, 562.0, 684.0));
    $p4ExclusionZones->AddRect(new Rect(30.0, 657.0, 295.0, 684.0));
    $options->AddInclusionZonesForPage($p4InclusionZones, 4);
    $options->AddExclusionZonesForPage($p4ExclusionZones, 4);

    DataExtractionModule::ExtractData("newsletter.pdf", "newsletter_key_val_with_zones.json", DataExtractionModule::e_GenericKeyValue, $options);

Python:
    options = DataExtractionOptions()

    p4_inclusion_zones = RectCollection()
    p4_exclusion_zones = RectCollection()
    p4_inclusion_zones.AddRect(Rect(30, 432, 562, 684))
    p4_exclusion_zones.AddRect(Rect(30, 657, 295, 684))
    options.AddInclusionZonesForPage(p4_inclusion_zones, 4)
    options.AddExclusionZonesForPage(p4_exclusion_zones, 4)

    DataExtractionModule.ExtractData("newsletter.pdf", "newsletter_key_val_with_zones.json", DataExtractionModule.e_GenericKeyValue, options)

Ruby:
    options = DataExtractionOptions.new()

    p4_inclusion_zones = RectCollection.new()
    p4_exclusion_zones = RectCollection.new()
    p4_inclusion_zones.AddRect(Rect.new(30, 432, 562, 684))
    p4_exclusion_zones.AddRect(Rect.new(30, 657, 295, 684))
    options.AddInclusionZonesForPage(p4_inclusion_zones, 4)
    options.AddExclusionZonesForPage(p4_exclusion_zones, 4)

    DataExtractionModule.ExtractData("newsletter.pdf", "newsletter_key_val_with_zones.json", DataExtractionModule::E_GenericKeyValue, options)

VB:
    Dim options As New DataExtractionOptions()

    Dim p4InclusionZones As New RectCollection()
    Dim p4ExclusionZones As New RectCollection()
    p4InclusionZones.AddRect(30, 432, 562, 684)
    p4ExclusionZones.AddRect(30, 657, 295, 684)
    options.AddInclusionZonesForPage(p4InclusionZones, 4)
    options.AddExclusionZonesForPage(p4ExclusionZones, 4)

    DataExtractionModule.ExtractData("newsletter.pdf", "newsletter_key_val_with_zones.json", DataExtractionModule.DataExtractionEngine.e_generic_key_value, options)
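The rule that inclusion zones are combined by union and exclusion zones are subtracted can be illustrated with a small point test. This sketch is a model of that rule only, not the SDK's implementation: the names `Box`, `contains`, and `point_is_analyzed` are hypothetical, and treating a page with only exclusion zones as "whole page minus exclusions" is an assumption.

```python
from typing import Iterable, Tuple

# (x1, y1, x2, y2) in user-space coordinates; illustrative stand-in for an SDK rectangle.
Box = Tuple[float, float, float, float]


def contains(box: Box, x: float, y: float) -> bool:
    """Return True if the point lies inside the box (edges included)."""
    x1, y1, x2, y2 = box
    return min(x1, x2) <= x <= max(x1, x2) and min(y1, y2) <= y <= max(y1, y2)


def point_is_analyzed(x: float, y: float,
                      inclusions: Iterable[Box],
                      exclusions: Iterable[Box]) -> bool:
    """Union of inclusion zones, minus every exclusion zone."""
    inclusions = list(inclusions)
    # Assumption: with no inclusion zones, the whole page counts as included.
    included = True if not inclusions else any(contains(b, x, y) for b in inclusions)
    excluded = any(contains(b, x, y) for b in exclusions)
    return included and not excluded


# Zones taken from the page 4 example above.
inc = [(30, 432, 562, 684)]
exc = [(30, 657, 295, 684)]
print(point_is_analyzed(300, 500, inc, exc))  # True
print(point_is_analyzed(100, 670, inc, exc))  # False
```

The second point falls inside the inclusion zone but also inside the exclusion zone, so it is left out of the analyzed region.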
Use the DeepLearningAssist option to specify whether deep learning is used for table recognition in the DocStructure engine. Enabling it improves table recognition accuracy at the cost of increased processing time. This option only affects the DocStructure engine.
C#:
    DataExtractionOptions options = new DataExtractionOptions();
    options.SetDeepLearningAssist(true); // Enable Deep learning assistant
    DataExtractionModule.ExtractData("table.pdf", "table.json", DataExtractionModule.DataExtractionEngine.e_doc_structure, options);

C++:
    DataExtractionOptions options;
    options.SetDeepLearningAssist(true); // Enable Deep learning assistant
    DataExtractionModule::ExtractData("table.pdf", "table.json", DataExtractionModule::e_DocStructure, &options);

Go:
    options := NewDataExtractionOptions()
    options.SetDeepLearningAssist(true) // Enable Deep learning assistant
    DataExtractionModuleExtractData("table.pdf", "table.json", DataExtractionModuleE_DocStructure, options)

Java:
    DataExtractionOptions options = new DataExtractionOptions();
    options.setDeepLearningAssist(true); // Enable Deep learning assistant
    DataExtractionModule.extractData("table.pdf", "table.json", DataExtractionModule.DataExtractionEngine.e_doc_structure, options);

JavaScript:
    const options = new PDFNet.DataExtractionModule.DataExtractionOptions();
    options.setDeepLearningAssist(true); // Enable Deep learning assistant
    await PDFNet.DataExtractionModule.extractData('table.pdf', 'table.json', PDFNet.DataExtractionModule.DataExtractionEngine.e_DocStructure, options);

PHP:
    $options = new DataExtractionOptions();
    $options->SetDeepLearningAssist(true); // Enable Deep learning assistant
    DataExtractionModule::ExtractData("table.pdf", "table.json", DataExtractionModule::e_DocStructure, $options);

Python:
    options = DataExtractionOptions()
    options.SetDeepLearningAssist(True)  # Enable Deep learning assistant
    DataExtractionModule.ExtractData("table.pdf", "table.json", DataExtractionModule.e_DocStructure, options)

Ruby:
    options = DataExtractionOptions.new()
    options.SetDeepLearningAssist(true) # Enable Deep learning assistant
    DataExtractionModule.ExtractData("table.pdf", "table.json", DataExtractionModule::E_DocStructure, options)

VB:
    Dim options As DataExtractionOptions = New DataExtractionOptions()
    options.SetDeepLearningAssist(True) ' Enable Deep learning assistant
    DataExtractionModule.ExtractData("table.pdf", "table.json", DataExtractionModule.DataExtractionEngine.e_doc_structure, options)
When automatically detecting form fields and adding them to a document, you can force the module to preserve any existing form annotations that are already present in the document, only adding newly detected fields.
C#:
    PDFDoc doc = new PDFDoc("formfields.pdf");
    DataExtractionOptions options = new DataExtractionOptions();
    options.SetOverlappingFormFieldBehavior("KeepOld");
    DataExtractionModule.DetectAndAddFormFieldsToPDF(doc, options);

C++:
    PDFDoc doc("formfields.pdf");
    DataExtractionOptions options;
    options.SetOverlappingFormFieldBehavior("KeepOld");
    DataExtractionModule::DetectAndAddFormFieldsToPDF(doc, &options);

Go:
    doc := NewPDFDoc("formfields.pdf")
    options := NewDataExtractionOptions()
    options.SetOverlappingFormFieldBehavior("KeepOld")
    DataExtractionModuleDetectAndAddFormFieldsToPDF(doc, options)

Java:
    PDFDoc doc = new PDFDoc("formfields.pdf");
    DataExtractionOptions options = new DataExtractionOptions();
    options.setOverlappingFormFieldBehavior("KeepOld");
    DataExtractionModule.detectAndAddFormFieldsToPDF(doc, options);

JavaScript:
    const doc = await PDFNet.PDFDoc.createFromFilePath("formfields.pdf");
    const options = new PDFNet.DataExtractionModule.DataExtractionOptions();
    options.setOverlappingFormFieldBehavior('KeepOld');
    await PDFNet.DataExtractionModule.detectAndAddFormFieldsToPDF(doc, options);

PHP:
    $doc = new PDFDoc("formfields.pdf");
    $options = new DataExtractionOptions();
    $options->SetOverlappingFormFieldBehavior("KeepOld");
    DataExtractionModule::DetectAndAddFormFieldsToPDF($doc, $options);

Python:
    doc = PDFDoc("formfields.pdf")
    options = DataExtractionOptions()
    options.SetOverlappingFormFieldBehavior("KeepOld")
    DataExtractionModule.DetectAndAddFormFieldsToPDF(doc, options)

Ruby:
    doc = PDFDoc.new("formfields.pdf")
    options = DataExtractionOptions.new()
    options.SetOverlappingFormFieldBehavior("KeepOld")
    DataExtractionModule.DetectAndAddFormFieldsToPDF(doc, options)

VB:
    Dim doc As PDFDoc = New PDFDoc("formfields.pdf")
    Dim options = New DataExtractionOptions()
    options.SetOverlappingFormFieldBehavior("KeepOld")
    DataExtractionModule.DetectAndAddFormFieldsToPDF(doc, options)
Specifies if empty fields should be recognized in the GenericKeyValue engine. The default is true. Users who don't require empty fields could benefit from setting this option to false, thus reducing processing time.
This option only affects the GenericKeyValue engine.
C# C++ Go Java JavaScript PHP Python Ruby VB 
1 DataExtractionOptions  options  = new  DataExtractionOptions (); 
2 options. SetDetectEmptyFields ( false ); 
3 DataExtractionModule. ExtractData ( " newsletter.pdf " ,  " newsletter_key_val_with_zones.json " , DataExtractionModule.DataExtractionEngine.e_generic_key_value, options); 
DataExtractionOptions options;
options.SetDetectEmptyFields(false);
DataExtractionModule::ExtractData("newsletter.pdf", "newsletter_key_val_with_zones.json", DataExtractionModule::e_GenericKeyValue, &options);
options := NewDataExtractionOptions()
options.SetDetectEmptyFields(false)
DataExtractionModuleExtractData("newsletter.pdf", "newsletter_key_val_with_zones.json", DataExtractionModuleE_GenericKeyValue, options)
DataExtractionOptions options = new DataExtractionOptions();
options.setDetectEmptyFields(false);
DataExtractionModule.extractData("newsletter.pdf", "newsletter_key_val_with_zones.json", DataExtractionModule.DataExtractionEngine.e_generic_key_value, options);
const options = new PDFNet.DataExtractionModule.DataExtractionOptions();
options.setDetectEmptyFields(false);
await PDFNet.DataExtractionModule.extractData('newsletter.pdf', 'newsletter_key_val_with_zones.json', PDFNet.DataExtractionModule.DataExtractionEngine.e_GenericKeyValue, options);
$options = new DataExtractionOptions();
$options->SetDetectEmptyFields(false);
DataExtractionModule::ExtractData("newsletter.pdf", "newsletter_key_val_with_zones.json", DataExtractionModule::e_GenericKeyValue, $options);
options = DataExtractionOptions()
options.SetDetectEmptyFields(False)
DataExtractionModule.ExtractData("newsletter.pdf", "newsletter_key_val_with_zones.json", DataExtractionModule.e_GenericKeyValue, options)
options = DataExtractionOptions.new()
options.SetDetectEmptyFields(false)
DataExtractionModule.ExtractData("newsletter.pdf", "newsletter_key_val_with_zones.json", DataExtractionModule::E_GenericKeyValue, options)
Dim options = New DataExtractionOptions()
options.SetDetectEmptyFields(False)
DataExtractionModule.ExtractData("newsletter.pdf", "newsletter_key_val_with_zones.json", DataExtractionModule.DataExtractionEngine.e_generic_key_value, options)
Specifies the minimum confidence a class must reach to be accepted by the DocClassification engine. The default is 0.25. Classes that don't meet the threshold are omitted from the output JSON.
This option only affects the DocClassification engine.
C# C++ Go Java JavaScript PHP Python Ruby VB 
DataExtractionOptions options = new DataExtractionOptions();
options.SetMinimumConfidenceThreshold(0.7);
DataExtractionModule.ExtractData("Email.pdf", "Email_Classified.json", DataExtractionModule.DataExtractionEngine.e_doc_classification, options);
DataExtractionOptions options;
options.SetMinimumConfidenceThreshold(0.7);
DataExtractionModule::ExtractData("Email.pdf", "Email_Classified.json", DataExtractionModule::e_DocClassification, &options);
options := NewDataExtractionOptions()
options.SetMinimumConfidenceThreshold(0.7)
DataExtractionModuleExtractData("Email.pdf", "Email_Classified.json", DataExtractionModuleE_DocClassification, options)
DataExtractionOptions options = new DataExtractionOptions();
options.setMinimumConfidenceThreshold(0.7);
DataExtractionModule.extractData("Email.pdf", "Email_Classified.json", DataExtractionModule.DataExtractionEngine.e_doc_classification, options);
const options = new PDFNet.DataExtractionModule.DataExtractionOptions();
options.setMinimumConfidenceThreshold(0.7);
await PDFNet.DataExtractionModule.extractData('Email.pdf', 'Email_Classified.json', PDFNet.DataExtractionModule.DataExtractionEngine.e_DocClassification, options);
$options = new DataExtractionOptions();
$options->SetMinimumConfidenceThreshold(0.7);
DataExtractionModule::ExtractData("Email.pdf", "Email_Classified.json", DataExtractionModule::e_DocClassification, $options);
options = DataExtractionOptions()
options.SetMinimumConfidenceThreshold(0.7)
DataExtractionModule.ExtractData("Email.pdf", "Email_Classified.json", DataExtractionModule.e_DocClassification, options)
options = DataExtractionOptions.new()
options.SetMinimumConfidenceThreshold(0.7)
DataExtractionModule.ExtractData("Email.pdf", "Email_Classified.json", DataExtractionModule::E_DocClassification, options)
Dim options = New DataExtractionOptions()
options.SetMinimumConfidenceThreshold(0.7)
DataExtractionModule.ExtractData("Email.pdf", "Email_Classified.json", DataExtractionModule.DataExtractionEngine.e_doc_classification, options)
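To make the threshold semantics concrete, the following Python sketch mimics the filtering on a plain dictionary of class scores. It does not call the SDK: the `filter_classes` helper, the class names, and the scores are all hypothetical, and the real output JSON schema may differ. It also assumes that a score exactly equal to the threshold counts as meeting it.

```python
from typing import Dict

DEFAULT_MIN_CONFIDENCE = 0.25  # default value stated for the DocClassification engine


def filter_classes(scores: Dict[str, float],
                   min_confidence: float = DEFAULT_MIN_CONFIDENCE) -> Dict[str, float]:
    """Hypothetical helper: keep only classes whose score meets the threshold.

    Assumption: "meets" is treated as greater than or equal to the threshold.
    """
    return {name: score for name, score in scores.items() if score >= min_confidence}


# Made-up scores, for illustration only; not real engine output.
scores = {"email": 0.82, "letter": 0.31, "invoice": 0.05}

print(filter_classes(scores))       # default threshold of 0.25
print(filter_classes(scores, 0.7))  # stricter threshold, as in the snippets above
```

With the default threshold, only the lowest-scoring class is dropped. Raising the threshold to 0.7 leaves only the class the sketch scores highest, which is the kind of trimming to expect in the output JSON when a stricter value is set.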