Apryse's Form Field Identification engines help you extract structured data from PDFs designed as forms, whether they're interactive or scanned. Two engines are currently available: "Form Field Detection" and "Form Field Key-Value Extraction".
Both engines require GLIBC 2.31 or newer on Linux, which is available in Debian 11 or newer and Ubuntu 20.04 or newer.
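As a quick pre-flight check for the requirement above, the following sketch reads the C library version through Python's standard `platform.libc_ver()` and compares it with 2.31. The helper names `parse_version` and `glibc_is_supported` are illustrative and not part of the Apryse SDK.

```python
import platform
import re

MIN_GLIBC = (2, 31)  # minimum version stated in this guide


def parse_version(text):
    """Turn a version string such as '2.31' into a tuple of ints, e.g. (2, 31)."""
    return tuple(int(number) for number in re.findall(r"\d+", text))


def glibc_is_supported(version_text, minimum=MIN_GLIBC):
    """Return True when version_text is at least the minimum GLIBC version."""
    return parse_version(version_text) >= minimum


if __name__ == "__main__":
    lib, version = platform.libc_ver()
    if lib == "glibc" and version:
        status = "supported" if glibc_is_supported(version) else "too old"
        print(f"glibc {version}: {status}")
    else:
        # Non-Linux platforms or non-glibc systems end up here.
        print("glibc not detected on this platform")
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as treating "2.4" as newer than "2.31".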
Form Field Detection detects likely form fields in scanned or static PDFs based on layout and spacing. Supported field types include:
- Text fields
- Checkboxes
- Radio buttons (coming soon)
Each detected field includes:
- Field type (e.g., text, checkbox)
- Bounding box coordinates
- Confidence score
In addition to detecting field positions, Form Field Key-Value Extraction attempts to match each field with a corresponding key (label) and value (user entry). Each result includes:
- Field type
- Key text
- Value text
- Confidence
- Bounding box
Specify the name of the input PDF file and the name of the output JSON file, then select the Form engine:
C#:
    DataExtractionModule.ExtractData("formfields-scanned.pdf", "formfields-scanned.json", DataExtractionModule.DataExtractionEngine.e_form);
C++:
    DataExtractionModule::ExtractData("formfields-scanned.pdf", "formfields-scanned.json", DataExtractionModule::e_Form);
Go:
    DataExtractionModuleExtractData("formfields-scanned.pdf", "formfields-scanned.json", DataExtractionModuleE_Form)
Java:
    DataExtractionModule.extractData("formfields-scanned.pdf", "formfields-scanned.json", DataExtractionModule.DataExtractionEngine.e_form);
JavaScript:
    await PDFNet.DataExtractionModule.extractData('formfields-scanned.pdf', 'formfields-scanned.json', PDFNet.DataExtractionModule.DataExtractionEngine.e_Form);
PHP:
    DataExtractionModule::ExtractData("formfields-scanned.pdf", "formfields-scanned.json", DataExtractionModule::e_Form);
Python:
    DataExtractionModule.ExtractData("formfields-scanned.pdf", "formfields-scanned.json", DataExtractionModule.e_Form)
Ruby:
    DataExtractionModule.ExtractData("formfields-scanned.pdf", "formfields-scanned.json", DataExtractionModule::E_Form)
VB:
    DataExtractionModule.ExtractData("formfields-scanned.pdf", "formfields-scanned.json", DataExtractionModule.DataExtractionEngine.e_form)
Alternatively, you can select the Form Key-Value Extraction engine:
C#:
    DataExtractionModule.ExtractData("formfields-scanned.pdf", "formfields-scanned.json", DataExtractionModule.DataExtractionEngine.e_form_key_value);
C++:
    DataExtractionModule::ExtractData("formfields-scanned.pdf", "formfields-scanned.json", DataExtractionModule::e_FormKeyValue);
Go:
    DataExtractionModuleExtractData("formfields-scanned.pdf", "formfields-scanned.json", DataExtractionModuleE_FormKeyValue)
Java:
    DataExtractionModule.extractData("formfields-scanned.pdf", "formfields-scanned.json", DataExtractionModule.DataExtractionEngine.e_form_key_value);
JavaScript:
    await PDFNet.DataExtractionModule.extractData('formfields-scanned.pdf', 'formfields-scanned.json', PDFNet.DataExtractionModule.DataExtractionEngine.e_FormKeyValue);
PHP:
    DataExtractionModule::ExtractData("formfields-scanned.pdf", "formfields-scanned.json", DataExtractionModule::e_FormKeyValue);
Python:
    DataExtractionModule.ExtractData("formfields-scanned.pdf", "formfields-scanned.json", DataExtractionModule.e_FormKeyValue)
Ruby:
    DataExtractionModule.ExtractData("formfields-scanned.pdf", "formfields-scanned.json", DataExtractionModule::E_FormKeyValue)
VB:
    DataExtractionModule.ExtractData("formfields-scanned.pdf", "formfields-scanned.json", DataExtractionModule.DataExtractionEngine.e_form_key_value)
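Once the JSON file has been written, a short script can summarize what was detected. The sketch below counts fields by type and can skip low-confidence detections. The key names `pages`, `formElements`, `type`, and `confidence` are assumptions made for illustration; check them against the JSON your SDK version actually produces before relying on them.

```python
import json
from collections import Counter


def load_result(path):
    """Read an extraction result that was saved as a JSON file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def iter_fields(result):
    """Yield every field dict, assuming a pages -> formElements layout."""
    for page in result.get("pages", []):
        for field in page.get("formElements", []):
            yield field


def count_by_type(result, min_confidence=0.0):
    """Count fields per type, ignoring those below min_confidence."""
    return Counter(
        field.get("type", "unknown")
        for field in iter_fields(result)
        if field.get("confidence", 0.0) >= min_confidence
    )


if __name__ == "__main__":
    # Hand-written sample in the assumed layout; with real output you would
    # call load_result("formfields-scanned.json") instead.
    sample = {
        "pages": [
            {"formElements": [
                {"type": "text", "confidence": 0.9},
                {"type": "checkbox", "confidence": 0.4},
            ]}
        ]
    }
    print(count_by_type(sample, min_confidence=0.5))
```

Filtering on the confidence score before acting on a detection is a simple way to keep uncertain fields out of downstream processing; the right threshold depends on your documents.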
If you plan to parse the JSON right away, you can retrieve it as an in-memory string instead of writing it to an external file.
Specify the name of the input PDF file, then select the Form engine:
C#:
    string json = DataExtractionModule.ExtractData("formfields.pdf", DataExtractionModule.DataExtractionEngine.e_form);
C++:
    UString json = DataExtractionModule::ExtractData("formfields.pdf", DataExtractionModule::e_Form);
Go:
    json := DataExtractionModuleExtractData("formfields.pdf", DataExtractionModuleE_Form).(string)
Java:
    String json = DataExtractionModule.extractData("formfields.pdf", DataExtractionModule.DataExtractionEngine.e_form);
JavaScript:
    const json = await PDFNet.DataExtractionModule.extractDataAsString('formfields.pdf', PDFNet.DataExtractionModule.DataExtractionEngine.e_Form);
PHP:
    $json = DataExtractionModule::ExtractData("formfields.pdf", DataExtractionModule::e_Form);
Python:
    json = DataExtractionModule.ExtractData("formfields.pdf", DataExtractionModule.e_Form)
Ruby:
    json = DataExtractionModule.ExtractData("formfields.pdf", DataExtractionModule::E_Form)
VB:
    Dim json As String = DataExtractionModule.ExtractData("formfields.pdf", DataExtractionModule.DataExtractionEngine.e_form)
Alternatively, you can select the Form Key-Value Extraction engine:
C#:
    string json = DataExtractionModule.ExtractData("formfields.pdf", DataExtractionModule.DataExtractionEngine.e_form_key_value);
C++:
    UString json = DataExtractionModule::ExtractData("formfields.pdf", DataExtractionModule::e_FormKeyValue);
Go:
    json := DataExtractionModuleExtractData("formfields.pdf", DataExtractionModuleE_FormKeyValue).(string)
Java:
    String json = DataExtractionModule.extractData("formfields.pdf", DataExtractionModule.DataExtractionEngine.e_form_key_value);
JavaScript:
    const json = await PDFNet.DataExtractionModule.extractDataAsString('formfields.pdf', PDFNet.DataExtractionModule.DataExtractionEngine.e_FormKeyValue);
PHP:
    $json = DataExtractionModule::ExtractData("formfields.pdf", DataExtractionModule::e_FormKeyValue);
Python:
    json = DataExtractionModule.ExtractData("formfields.pdf", DataExtractionModule.e_FormKeyValue)
Ruby:
    json = DataExtractionModule.ExtractData("formfields.pdf", DataExtractionModule::E_FormKeyValue)
VB:
    Dim json As String = DataExtractionModule.ExtractData("formfields.pdf", DataExtractionModule.DataExtractionEngine.e_form_key_value)
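With the result held as a string, the key-value output can be turned into a plain dictionary for lookup. This is a minimal sketch under a loudly stated assumption: the top-level `keyValuePairs` list and the per-item `key`, `value`, and `confidence` names are hypothetical placeholders, not the documented schema, so adapt them to the JSON the engine returns.

```python
import json


def key_value_pairs(json_text, min_confidence=0.5):
    """Map key text to value text for items at or above min_confidence.

    Assumes a hypothetical layout: {"keyValuePairs": [{"key", "value", "confidence"}]}.
    """
    data = json.loads(json_text)
    pairs = {}
    for item in data.get("keyValuePairs", []):
        if item.get("confidence", 0.0) < min_confidence:
            continue  # skip uncertain matches
        key = item.get("key", "").strip()
        if key:
            pairs[key] = item.get("value", "").strip()
    return pairs


if __name__ == "__main__":
    # Hand-written sample in the assumed layout, standing in for the string
    # returned by the extraction call.
    sample_text = json.dumps({
        "keyValuePairs": [
            {"key": "Name", "value": "Jane Doe", "confidence": 0.93},
            {"key": "Date", "value": "2024-01-01", "confidence": 0.31},
        ]
    })
    print(key_value_pairs(sample_text))
```

If the same key text appears more than once on a form, a dictionary keeps only the last value; collect the items in a list instead when duplicates matter.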
You can automatically add detected form fields to a PDF in a single step.
C#:
    PDFDoc doc = new PDFDoc("formfields.pdf");
    DataExtractionModule.DetectAndAddFormFieldsToPDF(doc);
C++:
    PDFDoc doc("formfields.pdf");
    DataExtractionModule::DetectAndAddFormFieldsToPDF(doc);
Go:
    doc := NewPDFDoc("formfields.pdf")
    DataExtractionModuleDetectAndAddFormFieldsToPDF(doc)
Java:
    PDFDoc doc = new PDFDoc("formfields.pdf");
    DataExtractionModule.detectAndAddFormFieldsToPDF(doc);
JavaScript:
    const doc = await PDFNet.PDFDoc.createFromFilePath("formfields.pdf");
    await PDFNet.DataExtractionModule.detectAndAddFormFieldsToPDF(doc);
PHP:
    $doc = new PDFDoc("formfields.pdf");
    DataExtractionModule::DetectAndAddFormFieldsToPDF($doc);
Python:
    doc = PDFDoc("formfields.pdf")
    DataExtractionModule.DetectAndAddFormFieldsToPDF(doc)
Ruby:
    doc = PDFDoc.new("formfields.pdf")
    DataExtractionModule.DetectAndAddFormFieldsToPDF(doc)
VB:
    Dim doc As PDFDoc = New PDFDoc("formfields.pdf")
    DataExtractionModule.DetectAndAddFormFieldsToPDF(doc)
Select OCR Language
Password-Protected PDFs
Page Range
Region of Interest
Use Form Field Detection for basic layout-based detection. Use Form Field Key-Value Extraction when you need semantic mapping (label-to-input).