Using Intelligent Data Extraction to Augment Contextual LLM Queries - Document Context Example

Example 1 - Document Context

This simple example shows how to include document context in a query to an LLM while making use of the document structure information contained in the PDF. The process breaks down into three steps:

1. Extract document structure information using the Apryse Data Extraction Module.
2. Convert the document structure information to a more concise and recognizable format. We have chosen HTML, which works well with OpenAI's GPT.
3. Include the context and structure information in a query to the LLM.
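Steps 2 and 3 can be sketched in a few lines of Python. This is a minimal illustration, not the code in `doc_context.py`: the `sample_structure` dictionary is a hypothetical, simplified stand-in for the Data Extraction Module's JSON output (the real schema differs), and `structure_to_html` and `build_prompt` are illustrative helper names. The call to the LLM itself is left out, since it depends on the client library you use.

```python
from html import escape

# Hypothetical, simplified stand-in for extracted document structure.
# The real Data Extraction Module output uses its own JSON schema.
sample_structure = {
    "elements": [
        {"type": "heading", "text": "Travel Expenses"},
        {"type": "paragraph", "text": "Expenses for Q1 & Q2."},
        {
            "type": "table",
            "rows": [
                ["Description", "Amount"],
                ["Flight to Toronto, ON, Canada", "$500.00"],
            ],
        },
    ]
}


def structure_to_html(structure: dict) -> str:
    """Convert the simplified structure into compact HTML (step 2)."""
    parts = []
    for element in structure["elements"]:
        kind = element["type"]
        if kind == "heading":
            parts.append(f"<h1>{escape(element['text'])}</h1>")
        elif kind == "paragraph":
            parts.append(f"<p>{escape(element['text'])}</p>")
        elif kind == "table":
            rows = []
            for row in element["rows"]:
                cells = "".join(f"<td>{escape(cell)}</td>" for cell in row)
                rows.append(f"<tr>{cells}</tr>")
            parts.append("<table>" + "".join(rows) + "</table>")
    return "\n".join(parts)


def build_prompt(document_html: str, question: str) -> str:
    """Combine the HTML context and the question into one prompt (step 3)."""
    return (
        "Answer the question using only the document below.\n\n"
        f"Document (HTML):\n{document_html}\n\n"
        f"Question: {question}"
    )


html_context = structure_to_html(sample_structure)
prompt = build_prompt(html_context, "How much did the employee spend on airfare?")
print(prompt)
```

Keeping tables as `<table>` markup preserves the row and column relationships that would be lost if the text were flattened, so the model can tell which amount belongs to which description.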

To run the example, use the following command (with your virtual environment active, if you are using one):

```sh
python3 ./doc_context.py
```

You should see some text indicating progress, with a question and answer about the document appearing at the end. LLMs aren't guaranteed to produce identical output between runs, but you should see something similar to the following:

```sh
Extracting Document Structure from <your-absolute-path>/doc_context_guide/data/
    pdf/travel_expenses.pdf...
Extracted data to <your-absolute-path>/doc_context_guide/data/output/
    doc_context_example/travel_expenses/json/travel_expenses.json

================================================================================

Question: How much did the employee spend on airfare?

Answer: To calculate the total amount spent on airfare by the employee, we need
to sum up all the expenses categorized under "Travel" that specifically mention
flights. According to the expense report, these are the relevant entries:

1. Flight to Toronto, ON, Canada (03/17/2023 - 03/20/2023): $500.00
2. Flight to Boston, MA, USA (05/01/2023 - 05/02/2023): $400.75
3. Flight to Toronto, ON, Canada (06/07/2023 - 06/08/2023): $450.25
4. Flight to Miami, FL, USA (08/19/2023 - 08/24/2023): $600.35

Adding these amounts together gives:

$500.00 + $400.75 + $450.25 + $600.35 = $1951.35

Therefore, the employee spent a total of $1951.35 on airfare.
```
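Because an LLM can make arithmetic mistakes, it can be worth checking a figure like this independently. The short sketch below re-adds the four flight amounts copied from the sample answer above. It uses `Decimal` rather than floats so the currency values are summed exactly. It is only a sanity check and not part of the example script.

```python
from decimal import Decimal

# Flight amounts as quoted in the sample answer above.
airfare = [
    Decimal("500.00"),  # Toronto, 03/17/2023 - 03/20/2023
    Decimal("400.75"),  # Boston, 05/01/2023 - 05/02/2023
    Decimal("450.25"),  # Toronto, 06/07/2023 - 06/08/2023
    Decimal("600.35"),  # Miami, 08/19/2023 - 08/24/2023
]

total = sum(airfare, Decimal("0"))
print(f"${total}")  # prints $1951.35
```

The total matches the figure in the model's answer.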

Next Steps

Document RAG Example

In this section, we introduce the concept of Retrieval Augmented Generation (RAG), and show how you can break down larger documents into searchable chunks to use with your queries.
