Form Field Identification

Apryse's Form Field Identification feature helps you extract structured data from PDFs designed as forms, whether they are interactive or scanned. Two engines are currently available: "Form Field Detection" and "Form Field Key-Value Extraction".

Both engines require GLIBC 2.31 or newer on Linux, for example Debian 11 or Ubuntu 20.04 or newer.

Form Field Detection Engine

Detects likely form fields in scanned or static PDFs based on layout and spacing. Supported field types include:

  • Text fields
  • Checkboxes
  • Radio buttons (coming soon)

Output:

Each detected field includes:

  • Field type (e.g., text, checkbox)
  • Bounding box coordinates
  • Confidence score
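
As an illustration of how the confidence score might be used, the sketch below drops low-confidence detections before further processing. The `DetectedField` class, its field names, the bounding box layout, and the assumption that confidence lies between 0.0 and 1.0 are all hypothetical; they are not taken from the SDK's actual JSON schema, so adjust them to match the output you receive.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for one detected field. The names are illustrative
// and are NOT the SDK's actual JSON keys.
final class DetectedField {
    final String type;        // e.g. "text" or "checkbox"
    final double[] bbox;      // assumed layout: {x1, y1, x2, y2}
    final double confidence;  // assumed range: 0.0 to 1.0

    DetectedField(String type, double[] bbox, double confidence) {
        this.type = type;
        this.bbox = bbox;
        this.confidence = confidence;
    }
}

final class FieldFilter {
    // Keep only fields whose confidence is at or above the given threshold.
    static List<DetectedField> keepConfident(List<DetectedField> fields, double minConfidence) {
        List<DetectedField> kept = new ArrayList<>();
        for (DetectedField f : fields) {
            if (f.confidence >= minConfidence) {
                kept.add(f);
            }
        }
        return kept;
    }
}
```

A suitable threshold depends on the documents being processed; scanned forms with noisy layouts may need a lower cutoff plus manual review.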

Form Field Key-Value Extraction Engine

In addition to detecting field positions, this engine attempts to match each field with a corresponding key (label) and value (user entry).

Output:

  • Field type
  • Key text
  • Value text
  • Confidence
  • Bounding box
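
Once key-value results have been parsed, a common next step is a label-to-value lookup. The sketch below shows one way this could be done; the `KeyValueField` class and its field names are hypothetical stand-ins rather than the SDK's actual schema, and keeping the higher-confidence entry when a label repeats is a design choice of this example, not documented engine behavior.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical holder for one key-value result. The names are illustrative
// and are NOT the SDK's actual JSON keys.
final class KeyValueField {
    final String key;         // label text, e.g. "Name"
    final String value;       // user entry, e.g. "Ada Lovelace"
    final double confidence;  // assumed range: 0.0 to 1.0

    KeyValueField(String key, String value, double confidence) {
        this.key = key;
        this.value = value;
        this.confidence = confidence;
    }
}

final class KeyValueIndex {
    // Build a label -> value map. When the same label (after trimming
    // whitespace) appears more than once, keep the higher-confidence entry.
    static Map<String, String> byKey(List<KeyValueField> fields) {
        Map<String, KeyValueField> best = new LinkedHashMap<>();
        for (KeyValueField f : fields) {
            String label = f.key.trim();
            KeyValueField current = best.get(label);
            if (current == null || f.confidence > current.confidence) {
                best.put(label, f);
            }
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, KeyValueField> e : best.entrySet()) {
            result.put(e.getKey(), e.getValue().value);
        }
        return result;
    }
}
```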

Extract form fields as JSON file

Specify the name of the input PDF file and the name of the output JSON file, then select the Form engine:

DataExtractionModule.ExtractData("formfields-scanned.pdf", "formfields-scanned.json", DataExtractionModule.DataExtractionEngine.e_form);

Alternatively, you can select the Form Key-Value Extraction engine:

DataExtractionModule.ExtractData("formfields-scanned.pdf", "formfields-scanned.json", DataExtractionModule.DataExtractionEngine.e_form_key_value);

Extract form fields as JSON string

If you plan to parse the JSON right away, you can retrieve it as an in-memory string instead of writing it to an external file.

Specify the name of the input PDF file, then select the Form engine:

string json = DataExtractionModule.ExtractData("formfields.pdf", DataExtractionModule.DataExtractionEngine.e_form);

Alternatively, you can select the Form Key-Value Extraction engine:

string json = DataExtractionModule.ExtractData("formfields.pdf", DataExtractionModule.DataExtractionEngine.e_form_key_value);

Extract form fields and add to PDF

You can detect form fields and add them to a PDF in a single step.

Java

PDFDoc doc = new PDFDoc("formfields.pdf");
DataExtractionModule.detectAndAddFormFieldsToPDF(doc);

C#

PDFDoc doc = new PDFDoc("formfields.pdf");
DataExtractionModule.DetectAndAddFormFieldsToPDF(doc);

Additional Options:

  • Select OCR Language
  • Password-Protected PDFs
  • Page Range
  • Region of Interest

Best Practices

  • Use Form Field Detection for basic layout-based detection.
  • Use Form Field Key-Value Extraction when you need semantic mapping (label-to-input).
