Using Intelligent Data Extraction to Augment Contextual LLM Queries - Document Context Example

Example 1 - Document Context

The following is a simple example that shows how one might include document context with a query to an LLM, while leveraging information about the document structure contained in the PDF. The process can be broken down into a few steps:

  1. Extract document structure information using the Apryse Data Extraction Module
  2. Convert the document structure information to a more concise and recognizable format. We have chosen HTML, which works well with Open AI's GPT.
  3. Include the context and structure information in a query to the LLM.

To run the example, use the following command (with your virtual environment active, if using):

sh

1python3 ./doc_context.py

You should see some text indicating progress, with a question and answer about the document appearing at the end. LLM's aren't guaranteed to produce identical output between runs, but you should see something similar to the following:

sh

1Extracting Document Structure from <your-absolute-path>/doc_context_guide/data/
2 pdf/travel_expenses.pdf...
3Extracted data to <your-absolute-path>c/doc_context_guide/data/output/
4 doc_context_example/travel_expenses/json/travel_expenses.json
5
6================================================================================
7
8Question: How much did the employee spend on airfare?
9
10Answer: To calculate the total amount spent on airfare by the employee, we need
11to sum up all the expenses categorized under "Travel" that specifically mention
12flights. According to the expense report, these are the relevant entries:
13
141. Flight to Toronto, ON, Canada (03/17/2023 - 03/20/2023): $500.00
152. Flight to Boston, MA, USA (05/01/2023 - 05/02/2023): $400.75
163. Flight to Toronto, ON, Canada (06/07/2023 - 06/08/2023): $450.25
174. Flight to Miami, FL, USA (08/19/2023 - 08/24/2023): $600.35
18
19Adding these amounts together gives:
20
21$500.00 + $400.75 + $450.25 + $600.35 = $1951.35
22
23Therefore, the employee spent a total of $1951.35 on airfare.

Next Steps

Document RAG Example

In this section, we introduce the concept of Retrieval Augmented Generation (RAG), and show how you can break down larger documents into searchable chunks to use with your queries.

Did you find this helpful?

Trial setup questions?

Ask experts on Discord

Need other help?

Contact Support

Pricing or product questions?

Contact Sales