Tabular Data Extraction

Apryse's Tabular Data Extraction engine transforms PDFs with tables into clean, structured outputs you can use in spreadsheets, analytics tools, or downstream systems. Whether you're processing invoices, reports, or research data, this engine helps you turn visual tables into machine-readable formats.

How It Works

The engine detects the row and column structure across pages and consolidates all text into a structured table. It's designed to handle both native and scanned PDFs with a strong focus on numerical and tabular data.

You can export the data as:

  • JSON (ideal for programmatic use)
  • Excel (XLSX) (ideal for business workflows)

Extract tabular data as JSON file

Specify the name of the input PDF file and the name of the output JSON file, then select the Tabular engine:

DataExtractionModule.ExtractData("table.pdf", "table.json", DataExtractionModule.DataExtractionEngine.e_tabular);

Extract tabular data as JSON string

If you plan to parse the JSON right away, retrieve it as an in-memory string instead of writing it to an external file.

Specify the name of the input PDF file, then select the Tabular engine:

string json = DataExtractionModule.ExtractData("financial.pdf", DataExtractionModule.DataExtractionEngine.e_tabular);
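
Once the JSON is in memory, a natural first step is to see what it contains before writing any table-specific parsing. The sketch below assumes only that the returned string is valid JSON. This page does not describe the layout of the tabular output, so the helper assumes no field names and simply lists the top-level properties with their JSON types. `JsonInspector` and `TopLevelSummary` are hypothetical names used for illustration, not part of the Apryse SDK. The parsing relies on `System.Text.Json` from .NET.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper (not part of the Apryse SDK): summarizes the
// top-level shape of a JSON string without assuming any schema.
public static class JsonInspector
{
    // Returns one "name: kind" entry per top-level property of a JSON object,
    // or a single "(root): kind" entry when the root is not an object.
    public static List<string> TopLevelSummary(string json)
    {
        var summary = new List<string>();

        // JsonDocument is disposable; the using declaration releases its buffers.
        using JsonDocument doc = JsonDocument.Parse(json);
        JsonElement root = doc.RootElement;

        if (root.ValueKind != JsonValueKind.Object)
        {
            summary.Add($"(root): {root.ValueKind}");
            return summary;
        }

        foreach (JsonProperty property in root.EnumerateObject())
        {
            summary.Add($"{property.Name}: {property.Value.ValueKind}");
        }

        return summary;
    }
}
```

Passing the `json` string from the call above to `JsonInspector.TopLevelSummary(json)` and printing each entry shows which properties the output actually has, which is a safer starting point than guessing at the schema.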

Extract tabular data as Excel file

Specify the name of the input PDF file and the name of the output XLSX file:

DataExtractionModule.ExtractToXLSX("table.pdf", "table.xlsx");

Extract tabular data as Excel stream

Specify the name of the input PDF file and an output filter, such as MemoryFilter:

MemoryFilter output_xlsx_stream = new MemoryFilter(0, false);
DataExtractionModule.ExtractToXLSX("financial.pdf", output_xlsx_stream);

Optional Configuration

The following settings can be configured:

  • OCR language
  • Password for protected PDFs
  • Page range

Best Use Cases

  • Financial statements
  • Invoices and billing reports
  • Research tables
  • Survey exports
  • Any document where tabular data is the core structure
