Apryse's Key-Value Extraction (KVE) engine automatically identifies and extracts meaningful key-value pairs from PDFs, even when the document is unstructured or contains no form fields. Whether you're processing invoices, resumes, or complex reports, KVE reduces manual tagging by turning documents into structured JSON.
The engine scans each page for likely key terms (e.g., labels, field names, categories) and maps them to associated values (e.g., specific data, answers, identifiers). This key-to-value mapping allows KVE to structure data from real-world documents without templates or prior annotation.
Specify the name of the input PDF file and the name of the output JSON file, then select the Generic Key Value engine:
C#
DataExtractionModule.ExtractData("newsletter.pdf", "newsletter.json", DataExtractionModule.DataExtractionEngine.e_generic_key_value);

C++
DataExtractionModule::ExtractData("newsletter.pdf", "newsletter.json", DataExtractionModule::e_GenericKeyValue);

Go
DataExtractionModuleExtractData("newsletter.pdf", "newsletter.json", DataExtractionModuleE_GenericKeyValue)

Java
DataExtractionModule.extractData("newsletter.pdf", "newsletter.json", DataExtractionModule.DataExtractionEngine.e_generic_key_value);

JavaScript
await PDFNet.DataExtractionModule.extractData("newsletter.pdf", "newsletter.json", PDFNet.DataExtractionModule.DataExtractionEngine.e_GenericKeyValue);

PHP
DataExtractionModule::ExtractData("newsletter.pdf", "newsletter.json", DataExtractionModule::e_GenericKeyValue);

Python
DataExtractionModule.ExtractData("newsletter.pdf", "newsletter.json", DataExtractionModule.e_GenericKeyValue)

Ruby
DataExtractionModule.ExtractData("newsletter.pdf", "newsletter.json", DataExtractionModule::E_GenericKeyValue)

VB
DataExtractionModule.ExtractData("newsletter.pdf", "newsletter.json", DataExtractionModule.DataExtractionEngine.e_generic_key_value)
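
When several PDFs need the same treatment, it can help to derive each JSON output name from its input name before calling the extractor. The following is a minimal Python sketch of that idea, not part of the Apryse API: `plan_extractions` and `run_batch` are hypothetical helper names, and the extractor is passed in as a plain callable so the sketch runs without the SDK installed.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


def plan_extractions(pdf_paths: Iterable[str], out_dir: str) -> List[Tuple[str, str]]:
    """Pair each input PDF with a JSON output path inside out_dir.

    Hypothetical helper: the output name is the input file's stem plus ".json".
    """
    out = Path(out_dir)
    return [(p, str(out / (Path(p).stem + ".json"))) for p in pdf_paths]


def run_batch(pairs: List[Tuple[str, str]], extract: Callable[[str, str], None]) -> None:
    """Call extract(input_pdf, output_json) once per planned pair."""
    for pdf_path, json_path in pairs:
        extract(pdf_path, json_path)


# Demonstration with a stand-in extractor that only records its arguments.
recorded = []
run_batch(
    plan_extractions(["newsletter.pdf"], "out"),
    lambda pdf, out: recorded.append((pdf, out)),
)
print(recorded)
```

With the SDK available, the stand-in lambda could wrap the Python call shown above, for example `lambda pdf, out: DataExtractionModule.ExtractData(pdf, out, DataExtractionModule.e_GenericKeyValue)`.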
Specify the name of the input PDF file, then select the Generic Key Value engine:
C#
string json = DataExtractionModule.ExtractData("newsletter.pdf", DataExtractionModule.DataExtractionEngine.e_generic_key_value);

C++
UString json = DataExtractionModule::ExtractData("newsletter.pdf", DataExtractionModule::e_GenericKeyValue);

Go
json := DataExtractionModuleExtractData("newsletter.pdf", DataExtractionModuleE_GenericKeyValue).(string)

Java
String json = DataExtractionModule.extractData("newsletter.pdf", DataExtractionModule.DataExtractionEngine.e_generic_key_value);

JavaScript
const json = await PDFNet.DataExtractionModule.extractDataAsString('newsletter.pdf', PDFNet.DataExtractionModule.DataExtractionEngine.e_GenericKeyValue);

PHP
$json = DataExtractionModule::ExtractData("newsletter.pdf", DataExtractionModule::e_GenericKeyValue);

Python
json = DataExtractionModule.ExtractData("newsletter.pdf", DataExtractionModule.e_GenericKeyValue)

Ruby
json = DataExtractionModule.ExtractData("newsletter.pdf", DataExtractionModule::E_GenericKeyValue)

VB
Dim json As String = DataExtractionModule.ExtractData("newsletter.pdf", DataExtractionModule.DataExtractionEngine.e_generic_key_value)
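
Once the JSON is in memory as a string, it can be parsed and walked with standard tooling. The sketch below is illustrative only: the output schema is not described here, so the field names `key` and `value` and the sample string are assumptions made for the example, and `iter_pairs` is a hypothetical helper. Inspect real output and adjust the field names before relying on it.

```python
import json
from typing import Any, Iterator, Tuple


def iter_pairs(node: Any, key_field: str = "key",
               value_field: str = "value") -> Iterator[Tuple[Any, Any]]:
    """Yield (key, value) from every object in the tree that has both fields.

    The default field names are assumptions, not the documented schema.
    """
    if isinstance(node, dict):
        if key_field in node and value_field in node:
            yield node[key_field], node[value_field]
        # Keep descending: pairs may be nested at any depth.
        for child in node.values():
            yield from iter_pairs(child, key_field, value_field)
    elif isinstance(node, list):
        for child in node:
            yield from iter_pairs(child, key_field, value_field)


# Made-up stand-in for the string returned by the extraction call.
sample = (
    '{"pages": [{"items": ['
    '{"key": "Invoice No", "value": "1042"}, '
    '{"key": "Date", "value": "2024-01-05"}]}]}'
)

pairs = list(iter_pairs(json.loads(sample)))
for key, value in pairs:
    print(f"{key}: {value}")
```

Walking the tree recursively keeps the helper independent of how deeply the pairs are nested, which is useful while the exact layout is still being confirmed.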
Select OCR Language
Password-Protected PDFs
Page Range
Region of Interest
By default, `DetectAndAddFormFieldsToPDF` uses the Form Field Detection engine. You can force the function to use the Form Field Key-Value Extraction engine using the "Form Extraction Engine" option.
C#
PDFDoc doc = new PDFDoc("formfields.pdf");
DataExtractionOptions options = new DataExtractionOptions();
options.SetFormExtractionEngine("FormKeyValue");
DataExtractionModule.DetectAndAddFormFieldsToPDF(doc, options);

C++
PDFDoc doc("formfields.pdf");
DataExtractionOptions options;
options.SetFormExtractionEngine("FormKeyValue");
DataExtractionModule::DetectAndAddFormFieldsToPDF(doc, &options);

Go
doc := NewPDFDoc("formfields.pdf")
options := NewDataExtractionOptions()
options.SetFormExtractionEngine("FormKeyValue")
DataExtractionModuleDetectAndAddFormFieldsToPDF(doc, options)

Java
PDFDoc doc = new PDFDoc("formfields.pdf");
DataExtractionOptions options = new DataExtractionOptions();
options.setFormExtractionEngine("FormKeyValue");
DataExtractionModule.detectAndAddFormFieldsToPDF(doc, options);

JavaScript
const doc = await PDFNet.PDFDoc.createFromFilePath("formfields.pdf");
const options = new PDFNet.DataExtractionModule.DataExtractionOptions();
options.setFormExtractionEngine('FormKeyValue');
await PDFNet.DataExtractionModule.detectAndAddFormFieldsToPDF(doc, options);

PHP
$doc = new PDFDoc("formfields.pdf");
$options = new DataExtractionOptions();
$options->SetFormExtractionEngine("FormKeyValue");
DataExtractionModule::DetectAndAddFormFieldsToPDF($doc, $options);

Python
doc = PDFDoc("formfields.pdf")
options = DataExtractionOptions()
options.SetFormExtractionEngine("FormKeyValue")
DataExtractionModule.DetectAndAddFormFieldsToPDF(doc, options)

Ruby
doc = PDFDoc.new("formfields.pdf")
options = DataExtractionOptions.new()
options.SetFormExtractionEngine("FormKeyValue")
DataExtractionModule.DetectAndAddFormFieldsToPDF(doc, options)

VB
Dim doc As PDFDoc = New PDFDoc("formfields.pdf")
Dim options = New DataExtractionOptions()
options.SetFormExtractionEngine("FormKeyValue")
DataExtractionModule.DetectAndAddFormFieldsToPDF(doc, options)
NOTE This option only has an effect on the `DetectAndAddFormFieldsToPDF` function. Passing this option to `ExtractData` will have no effect, as the `engine` parameter will take precedence.