Using Intelligent Data Extraction to Augment Contextual LLM Queries - Setup for Linux and Windows

Setup

To run the provided examples, you will need to do some initial setup. All commands should be run from within the idp_rag_guide folder unless otherwise stated. We'll assume you already have Python version >= 3.5 installed (if not, see instructions here). Perform the following setup steps:

1. Download Sample Code

Download the sample code and unpack it. You should see an idp_rag_guide folder with the following structure:

sh

idp_rag_guide/
├── data
│   └── pdf
│       └── travel_expenses.pdf
├── doc_context.py
├── idp_rag_utils
│   ├── bookmark_utils.py
│   ├── document_structure.py
│   └── __init__.py
├── iso32000_rag.py
└── requirements.txt
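
To confirm the archive unpacked completely, a small script can compare the folder against the listing above. This is only a sketch: the path list assumes the nested layout that the listing implies, and the `missing_paths` helper is a hypothetical name, not part of the sample code.

```python
from pathlib import Path

# Relative paths taken from the folder listing above (assumed nesting).
EXPECTED_PATHS = [
    "data/pdf/travel_expenses.pdf",
    "doc_context.py",
    "idp_rag_utils/bookmark_utils.py",
    "idp_rag_utils/document_structure.py",
    "idp_rag_utils/__init__.py",
    "iso32000_rag.py",
    "requirements.txt",
]


def missing_paths(root, expected=EXPECTED_PATHS):
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    # Run from inside the idp_rag_guide folder.
    missing = missing_paths(".")
    if missing:
        print("Missing from the sample folder:")
        for rel in missing:
            print("  " + rel)
    else:
        print("Sample folder layout looks complete.")
```

Run it from inside idp_rag_guide; an empty result means every file in the listing was found.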

2. Obtain an Apryse SDK license.

Your license should include our IDP offerings. If you don't already have one, you can request a demo key. A demo key is sufficient for the simpler Document Context Example, but not for the more complex Document RAG Example.

Apryse collects some data regarding your usage of the SDK for product improvement.

If you wish to continue without data collection, contact us and we will email you a no-tracking trial key for you to get started.

3. Obtain an OpenAI API key.

The code in this guide makes paid requests to the OpenAI API, so you will need a funded account.

4. Install the required Python modules.

We will do so in a virtual environment.

Bash

python3 -m venv idp-venv
source idp-venv/bin/activate
python3 -m pip install -r requirements.txt
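
If you are unsure whether the virtual environment is actually active, the interpreter itself can tell you. The sketch below uses only the standard library; the helper names `in_virtualenv` and `version_ok` are hypothetical, and the minimum version is the one stated at the top of this guide.

```python
import sys

MIN_VERSION = (3, 5)  # minimum Python version stated in this guide


def in_virtualenv():
    """Return True when running inside an environment created by venv."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def version_ok(version_info=None, minimum=MIN_VERSION):
    """Return True when the (major, minor) version meets the minimum."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    print("Python version OK:", version_ok())
    print("Inside a virtual environment:", in_virtualenv())
```

Both lines should report True when the script is run with the idp-venv environment activated.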

5. Export license keys

Export your Apryse SDK license key and OpenAI API key as environment variables:

Bash

export OPENAI_API_KEY=<your-api-key>
export APRYSE_SDK_LICENSE_KEY=<your-license-key>
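
Before running the examples, it can help to confirm that both variables are visible to Python. The following is a minimal sketch that only checks the two variable names exported above; the `missing_env_vars` helper is a hypothetical name, and the script does not validate the keys themselves.

```python
import os

# Variable names exported in the step above.
REQUIRED_VARS = ("OPENAI_API_KEY", "APRYSE_SDK_LICENSE_KEY")


def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or blank in the given environment."""
    if environ is None:
        environ = os.environ
    return [name for name in names if not environ.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Not set: " + ", ".join(missing))
    else:
        print("Both keys are set.")
```

Environment variables set with export last only for the current shell session, so run this check in the same terminal you will use for the examples.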

6. Download the Structured Output Module.

If you don't already have the Apryse Structured Output Module installed, you can download it as follows. Note that these commands are PowerShell and fetch the Windows build of the module:

PowerShell

New-Item -ItemType Directory -Force -Path apryse_sdk_modules
Set-Location apryse_sdk_modules
Invoke-WebRequest -Uri https://www.pdftron.com/downloads/StructuredOutputModuleWindows.zip -OutFile StructuredOutputModuleWindows.zip
tar -xf StructuredOutputModuleWindows.zip
Remove-Item StructuredOutputModuleWindows.zip
Set-Location ..

7. Optional: download the ISO 32000-2 PDF.

If you plan to run the Document RAG example, also download the ISO_32000-2 PDF standard used in that sample. It is available as a free download from Adobe. Place the downloaded file at the following location: idp_rag_guide/data/pdf/PDF_ISO_32000-2.pdf
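
A quick way to confirm the file landed in the right place is to check that it exists and begins with the `%PDF-` header that well-formed PDF files start with. This is a sketch only: `looks_like_pdf` is a hypothetical helper, and the path is written relative to the idp_rag_guide folder.

```python
from pathlib import Path

# Location given above, relative to the idp_rag_guide folder.
ISO_PDF = Path("data/pdf/PDF_ISO_32000-2.pdf")


def looks_like_pdf(path):
    """Return True if path is a file whose first bytes are the PDF header."""
    path = Path(path)
    if not path.is_file():
        return False
    with path.open("rb") as handle:
        return handle.read(5) == b"%PDF-"


if __name__ == "__main__":
    if looks_like_pdf(ISO_PDF):
        print("Found " + str(ISO_PDF))
    else:
        print("Missing or not a PDF: " + str(ISO_PDF))
```

A failed check usually means the file was saved under a different name or that the download produced an HTML page instead of the PDF.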

You should now be ready to run the examples.

Next Steps

Document Context Example

In this section, we show how to run a simple example that demonstrates how to attach contextual information from a document to your queries.
