Requirements

These packages are required to use these features in production. Trial keys have unlimited access to all features.
Apryse's Document Structure Recognition engine helps you capture the visual and logical layout of a document. Unlike tabular extraction, this mode is designed to mimic how a human sees the page, recognizing paragraphs, lists, headers, footers, and images as distinct blocks.
It's ideal for use cases involving:
- Accessibility tagging (e.g., reading order)
- Screen reading tools
- Document reconstruction
- Visual layout parsing

The engine detects layout elements based on visual positioning, spacing, indentation, and structural boundaries. It separates:

- Paragraphs and lists
- Headers and footers
- Section columns vs table columns
- Tables embedded inside paragraphs
- Images and graphical elements

Specify the name of the input PDF file and the name of the output JSON file, then select the Doc Structure engine:
C#

    DataExtractionModule.ExtractData("paragraphs_and_tables.pdf", "paragraphs_and_tables.json", DataExtractionModule.DataExtractionEngine.e_doc_structure);

C++

    DataExtractionModule::ExtractData("paragraphs_and_tables.pdf", "paragraphs_and_tables.json", DataExtractionModule::e_DocStructure);

Go

    DataExtractionModuleExtractData("paragraphs_and_tables.pdf", "paragraphs_and_tables.json", DataExtractionModuleE_DocStructure)

Java

    DataExtractionModule.extractData("paragraphs_and_tables.pdf", "paragraphs_and_tables.json", DataExtractionModule.DataExtractionEngine.e_doc_structure);

JavaScript

    await PDFNet.DataExtractionModule.extractData('paragraphs_and_tables.pdf', 'paragraphs_and_tables.json', PDFNet.DataExtractionModule.DataExtractionEngine.e_DocStructure);

PHP

    DataExtractionModule::ExtractData("paragraphs_and_tables.pdf", "paragraphs_and_tables.json", DataExtractionModule::e_DocStructure);

Python

    DataExtractionModule.ExtractData("paragraphs_and_tables.pdf", "paragraphs_and_tables.json", DataExtractionModule.e_DocStructure)

Ruby

    DataExtractionModule.ExtractData("paragraphs_and_tables.pdf", "paragraphs_and_tables.json", DataExtractionModule::E_DocStructure)

VB

    DataExtractionModule.ExtractData("paragraphs_and_tables.pdf", "paragraphs_and_tables.json", DataExtractionModule.DataExtractionEngine.e_doc_structure)
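Once the JSON file has been written, a quick way to get oriented is to load it and list which keys appear and how often. The following is a minimal Python sketch using only the standard library. It assumes nothing about the engine's JSON schema, which is not described here; the helper names `summarize_json` and `summarize_file` are illustrative, not part of the SDK.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_json(node, path="$", counts=None):
    """Recursively tally how often each key path occurs in parsed JSON."""
    if counts is None:
        counts = Counter()
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            counts[child] += 1
            summarize_json(value, child, counts)
    elif isinstance(node, list):
        # List items share one path so repeated blocks are counted together.
        for item in node:
            summarize_json(item, f"{path}[]", counts)
    return counts


def summarize_file(json_path):
    """Load a JSON file and return the key-path tally for its contents."""
    data = json.loads(Path(json_path).read_text(encoding="utf-8"))
    return summarize_json(data)


if __name__ == "__main__":
    # Same output file name as in the extraction snippets above.
    out = Path("paragraphs_and_tables.json")
    if out.exists():
        for key_path, n in sorted(summarize_file(out).items()):
            print(f"{n:5d}  {key_path}")
```

Because the walker only relies on generic JSON structure, it keeps working whatever keys the output actually uses.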
If you plan to parse the JSON right away, you can retrieve it as an in-memory string instead of writing it to an external file.
Specify the name of the input PDF file, then select the Doc Structure engine:
C#

    string json = DataExtractionModule.ExtractData("tagged.pdf", DataExtractionModule.DataExtractionEngine.e_doc_structure);

C++

    UString json = DataExtractionModule::ExtractData("tagged.pdf", DataExtractionModule::e_DocStructure);

Go

    json := DataExtractionModuleExtractData("tagged.pdf", DataExtractionModuleE_DocStructure).(string)

Java

    String json = DataExtractionModule.extractData("tagged.pdf", DataExtractionModule.DataExtractionEngine.e_doc_structure);

JavaScript

    const json = await PDFNet.DataExtractionModule.extractDataAsString('tagged.pdf', PDFNet.DataExtractionModule.DataExtractionEngine.e_DocStructure);

PHP

    $json = DataExtractionModule::ExtractData("tagged.pdf", DataExtractionModule::e_DocStructure);

Python

    json = DataExtractionModule.ExtractData("tagged.pdf", DataExtractionModule.e_DocStructure)

Ruby

    json = DataExtractionModule.ExtractData("tagged.pdf", DataExtractionModule::E_DocStructure)

VB

    Dim json As String = DataExtractionModule.ExtractData("tagged.pdf", DataExtractionModule.DataExtractionEngine.e_doc_structure)
Select OCR Language
Password-Protected PDFs
Page Range
Deep Learning Assist