Apryse's Data Extraction Suite allows the programmatic inspection of unstructured PDF documents and detects various structural elements in an easy-to-process way.
We can name several use cases where content recognition brings added value to documents:
We offer three types of Data Extraction Modes:
Note: If you would prefer Word, Excel, or PowerPoint output for editing, viewing, or printing, we suggest our Office conversion APIs instead of data extraction, unless your goal is to perform extensive spreadsheet calculations or data mining on the cells, in which case Tabular Data Extraction may suit you better.
For developers, system integrators, statisticians, and machine learning engineers, JSON is probably the most suitable format. It is significantly easier to parse and iterate over than Excel or even HTML. The JSON links back to the input PDF via page numbers and bounding box coordinates, which lets you visualize the logical structure as annotation overlays on top of the PDF. For example, you may choose to highlight certain entities or draw boxes around them.
The JSON also supplies a reading order for natural language processing or screen reading.
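As a rough illustration of how the page numbers, bounding boxes, and reading order might be consumed, here is a minimal Python sketch. The JSON layout it parses is a hypothetical, simplified one invented for this example (the keys `pages`, `pageNumber`, `elements`, `type`, `rect`, and `text` are assumptions, not the module's documented schema), and it assumes elements are listed in reading order. Check the actual output of your extraction mode before adapting it.

```python
import json
from collections import defaultdict

# Hypothetical, simplified extraction output. The real schema may use
# different key names and nesting; adjust the lookups accordingly.
SAMPLE_JSON = """
{
  "pages": [
    {"pageNumber": 1, "elements": [
      {"type": "heading", "rect": [72, 700, 540, 730], "text": "Invoice"},
      {"type": "table", "rect": [72, 400, 540, 680], "text": ""}
    ]},
    {"pageNumber": 2, "elements": [
      {"type": "paragraph", "rect": [72, 600, 540, 720], "text": "Terms apply."}
    ]}
  ]
}
"""


def boxes_by_page(doc, wanted_types=None):
    """Map page number -> list of (type, rect) tuples.

    If wanted_types is given, only elements of those types are kept.
    The result can drive an overlay step that draws one box per entry.
    """
    result = defaultdict(list)
    for page in doc.get("pages", []):
        for element in page.get("elements", []):
            if wanted_types is None or element["type"] in wanted_types:
                result[page["pageNumber"]].append(
                    (element["type"], tuple(element["rect"]))
                )
    return dict(result)


def reading_order_text(doc):
    """Join non-empty element text, assuming elements appear in reading order."""
    lines = []
    for page in doc.get("pages", []):
        for element in page.get("elements", []):
            text = element.get("text", "")
            if text:
                lines.append(text)
    return "\n".join(lines)


if __name__ == "__main__":
    doc = json.loads(SAMPLE_JSON)
    # Rectangles of every table, grouped by page, ready to be drawn as overlays.
    print(boxes_by_page(doc, wanted_types={"table"}))
    # Plain text in reading order, e.g. as input to an NLP pipeline.
    print(reading_order_text(doc))
```

Grouping the rectangles by page keeps the overlay step simple: each page of the PDF can be annotated independently with the boxes collected for it.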
The data extraction functionality is implemented as an external module that can be downloaded from the Data Extraction Module page. It is currently offered for Windows and Linux, on both desktop and server.
In evaluation mode, you are limited to processing no more than 6 pages in a single extraction operation.
In addition, an evaluation sheet is randomly inserted into the input document with the following text:
PDFTron Data Extraction trial mode. The trial is limited to 6 pages and will insert extra pages into the result (like this one).
This message will show up randomly in the JSON or Excel output.
Intelligent Data Extraction workflow
This section showcases a potential Data Extraction workflow.