Apryse's Tabular Data Extraction engine transforms PDFs with tables into clean, structured outputs you can use in spreadsheets, analytics tools, or downstream systems. Whether you're processing invoices, reports, or research data, this engine helps you turn visual tables into machine-readable formats.
The engine detects the row and column structure across pages and consolidates all text into a structured table. It's designed to handle both native and scanned PDFs with a strong focus on numerical and tabular data.
You can export the data as:
- JSON (ideal for programmatic use)
- Excel (XLSX) (ideal for business workflows)

Specify the name of the input PDF file and the name of the output JSON file, then select the Tabular engine:
C#
    DataExtractionModule.ExtractData("table.pdf", "table.json", DataExtractionModule.DataExtractionEngine.e_tabular);

C++
    DataExtractionModule::ExtractData("table.pdf", "table.json", DataExtractionModule::e_Tabular);

Go
    DataExtractionModuleExtractData("table.pdf", "table.json", DataExtractionModuleE_Tabular)

Java
    DataExtractionModule.extractData("table.pdf", "table.json", DataExtractionModule.DataExtractionEngine.e_tabular);

JavaScript
    await PDFNet.DataExtractionModule.extractData('table.pdf', 'table.json', PDFNet.DataExtractionModule.DataExtractionEngine.e_Tabular);

PHP
    DataExtractionModule::ExtractData("table.pdf", "table.json", DataExtractionModule::e_Tabular);

Python
    DataExtractionModule.ExtractData("table.pdf", "table.json", DataExtractionModule.e_Tabular)

Ruby
    DataExtractionModule.ExtractData("table.pdf", "table.json", DataExtractionModule::E_Tabular)

VB
    DataExtractionModule.ExtractData("table.pdf", "table.json", DataExtractionModule.DataExtractionEngine.e_tabular)
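When several PDFs need the same treatment, the output name can be derived from the input name instead of being typed by hand. The following is a minimal sketch in Python, not part of the Apryse API: `json_output_path` and `extract_all` are hypothetical helpers, and `extract` stands for whatever callable wraps the SDK call shown above.

```python
from pathlib import Path


def json_output_path(pdf_path, out_dir=None):
    """Return the .json path that pairs with a PDF (same stem, new suffix)."""
    pdf_path = Path(pdf_path)
    target_dir = Path(out_dir) if out_dir is not None else pdf_path.parent
    return target_dir / (pdf_path.stem + ".json")


def extract_all(pdf_paths, extract, out_dir=None):
    """Call extract(input_name, output_name) once per PDF.

    `extract` is an assumption: any callable taking two file names,
    for example a thin wrapper around the SDK's extraction call.
    Returns the list of output paths in input order.
    """
    outputs = []
    for pdf in pdf_paths:
        out = json_output_path(pdf, out_dir)
        extract(str(pdf), str(out))
        outputs.append(out)
    return outputs


# Usage with a recording stub in place of the real SDK call.
calls = []
outputs = extract_all(
    ["reports/table.pdf", "reports/financial.pdf"],
    lambda src, dst: calls.append((src, dst)),
)
```

Passing the extraction call in as a parameter keeps the naming logic testable without the SDK installed.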
If you plan to parse the JSON immediately, retrieve it as an in-memory string rather than writing it to an external file first.
Specify the name of the input PDF file, then select the Tabular engine:
C#
    string json = DataExtractionModule.ExtractData("financial.pdf", DataExtractionModule.DataExtractionEngine.e_tabular);

C++
    UString json = DataExtractionModule::ExtractData("financial.pdf", DataExtractionModule::e_Tabular);

Go
    json := DataExtractionModuleExtractData("financial.pdf", DataExtractionModuleE_Tabular).(string)

Java
    String json = DataExtractionModule.extractData("financial.pdf", DataExtractionModule.DataExtractionEngine.e_tabular);

JavaScript
    const json = await PDFNet.DataExtractionModule.extractDataAsString('financial.pdf', PDFNet.DataExtractionModule.DataExtractionEngine.e_Tabular);

PHP
    $json = DataExtractionModule::ExtractData("financial.pdf", DataExtractionModule::e_Tabular);

Python
    json = DataExtractionModule.ExtractData("financial.pdf", DataExtractionModule.e_Tabular)

Ruby
    json = DataExtractionModule.ExtractData("financial.pdf", DataExtractionModule::E_Tabular)

VB
    Dim json As String = DataExtractionModule.ExtractData("financial.pdf", DataExtractionModule.DataExtractionEngine.e_tabular)
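Once the JSON is in memory, it can go straight into a parser. The sketch below uses only the Python standard library. `parse_extraction` and `top_level_summary` are hypothetical helper names, and the sample string is a made-up stand-in: this page does not describe the engine's output schema, so inspect a real result before relying on any key.

```python
import json


def parse_extraction(json_text):
    """Parse the JSON text returned by the extraction call.

    Raises ValueError with a short message if the text is not valid JSON.
    """
    try:
        return json.loads(json_text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"extraction output is not valid JSON: {exc.msg}") from exc


def top_level_summary(parsed):
    """Describe the top level: sorted keys for an object, length for an array."""
    if isinstance(parsed, dict):
        return sorted(parsed.keys())
    if isinstance(parsed, list):
        return len(parsed)
    return type(parsed).__name__


# Hypothetical stand-in for the string returned by ExtractData;
# the real structure may differ.
sample = '{"pages": [{"tables": []}]}'
parsed = parse_extraction(sample)
summary = top_level_summary(parsed)
```

Summarizing the top level first is a cheap way to learn the actual structure before writing code that depends on it.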
Specify the name of the input PDF file and the name of the output XLSX file:
C#
    DataExtractionModule.ExtractToXLSX("table.pdf", "table.xlsx");

C++
    DataExtractionModule::ExtractToXLSX("table.pdf", "table.xlsx");

Go
    DataExtractionModuleExtractToXLSX("table.pdf", "table.xlsx")

Java
    DataExtractionModule.extractToXLSX("table.pdf", "table.xlsx");

JavaScript
    await PDFNet.DataExtractionModule.extractToXLSX('table.pdf', 'table.xlsx');

PHP
    DataExtractionModule::ExtractToXLSX("table.pdf", "table.xlsx");

Python
    DataExtractionModule.ExtractToXLSX("table.pdf", "table.xlsx")

Ruby
    DataExtractionModule.ExtractToXLSX("table.pdf", "table.xlsx")

VB
    DataExtractionModule.ExtractToXLSX("table.pdf", "table.xlsx")
Specify the name of the input PDF file and an output filter, such as MemoryFilter:
C#
    MemoryFilter output_xlsx_stream = new MemoryFilter(0, false);
    DataExtractionModule.ExtractToXLSX("financial.pdf", output_xlsx_stream);

C++
    MemoryFilter output_xlsx_stream(0, false);
    DataExtractionModule::ExtractToXLSX("financial.pdf", output_xlsx_stream);

Go
    outputXlsxStream := NewMemoryFilter(0, false)
    DataExtractionModuleExtractToXLSX("financial.pdf", outputXlsxStream)

Java
    MemoryFilter output_xlsx_stream = new MemoryFilter(0, false);
    DataExtractionModule.extractToXLSX("financial.pdf", output_xlsx_stream);

JavaScript
    const outputXlsxStream = PDFNet.Filters.MemoryFilter(0, false);
    await PDFNet.DataExtractionModule.extractToXLSX('financial.pdf', outputXlsxStream);

PHP
    $outputXlsxStream = new MemoryFilter(0, false);
    DataExtractionModule::ExtractToXLSX("financial.pdf", $outputXlsxStream);

Python
    outputXlsxStream = Filters.MemoryFilter(0, False)
    DataExtractionModule.ExtractToXLSX("financial.pdf", outputXlsxStream)

Ruby
    outputXlsxStream = Filters.MemoryFilter.new(0, false)
    DataExtractionModule.ExtractToXLSX("financial.pdf", outputXlsxStream)

VB
    Dim output_xlsx_stream As MemoryFilter = New MemoryFilter(0, False)
    DataExtractionModule.ExtractToXLSX("financial.pdf", output_xlsx_stream)
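An XLSX file is a ZIP package, so once the workbook bytes are in memory they can be sanity-checked without writing anything to disk. The sketch below is a standard-library Python illustration, not Apryse code: `list_worksheet_parts` is a hypothetical helper, how the bytes are read out of the MemoryFilter is not covered here, and the `xl/worksheets/` location is the conventional one rather than a guarantee.

```python
import io
import zipfile


def list_worksheet_parts(xlsx_bytes):
    """Return the worksheet part names found in an in-memory XLSX package.

    Assumes worksheets live under the conventional xl/worksheets/ folder.
    Raises ValueError if the bytes are not a ZIP archive at all.
    """
    buffer = io.BytesIO(xlsx_bytes)
    if not zipfile.is_zipfile(buffer):
        raise ValueError("data is not a ZIP-based XLSX package")
    buffer.seek(0)
    with zipfile.ZipFile(buffer) as package:
        return sorted(
            name
            for name in package.namelist()
            if name.startswith("xl/worksheets/") and name.endswith(".xml")
        )


# Stand-in package built in memory for illustration only; a real workbook
# produced by the SDK contains more parts than this.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("xl/workbook.xml", "<workbook/>")
    archive.writestr("xl/worksheets/sheet1.xml", "<worksheet/>")
sheets = list_worksheet_parts(demo.getvalue())
```

An empty result on real output would suggest either an empty workbook or a package that stores its sheets somewhere other than the conventional folder.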
Select OCR Language
Password-Protected PDFs
Page Range
Typical documents for this engine:
- Financial statements
- Invoices and billing reports
- Research tables
- Survey exports
- Any document where tabular data is the core structure