Using Intelligent Data Extraction to Augment Contextual LLM Queries - Overview

Overview

Large language models (LLMs) are powerful tools for a wide range of tasks, including information retrieval, summarization, and logical reasoning. A general limitation of LLMs is that their knowledge is confined to the information in their training data, so they cannot reliably answer questions about private data or about events that occurred after training. One workaround is to provide the relevant contextual data within the query itself, giving the LLM access to information that was not present in its training set. This guide shows you how to extract structured data from your documents using the Apryse Data Extraction Module, and demonstrates several techniques for providing that data as context to an LLM along with a query.
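The context-injection pattern described above can be sketched as follows. This is a minimal illustration, not the guide's full implementation: the sample context string and function name are hypothetical, and in the full guide the context would come from data extracted by the Apryse Data Extraction Module.

```python
def build_contextual_prompt(context: str, question: str) -> str:
    """Embed extracted document data in the prompt so the LLM can use it."""
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = build_contextual_prompt(
    context="Q3 revenue was $4.2M, up 12% year over year.",  # hypothetical extracted data
    question="What was Q3 revenue?",
)

# The assembled prompt would then be sent to the model, for example via
# OpenAI's chat completions API:
# client.chat.completions.create(
#     model="gpt-4o",
#     messages=[{"role": "user", "content": prompt}],
# )
```

Because the context travels with the query, the model can answer from the supplied data rather than relying on what it memorized during training.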

This guide uses OpenAI's Python API, although it can be adapted to work with other platforms as well.

NOTE

Some of the operations in this guide are long-running or incur costs. To save time and money, results are cached to disk where appropriate, so you can modify the queries without repeating the preprocessing. To clear the cache, delete the associated cache files.
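The disk-caching approach mentioned above can be sketched like this. The cache directory, file naming, and the `extract` placeholder are assumptions for illustration; the idea is that results are keyed by a hash of the input and stored as files, so deleting a file forces the expensive step to re-run.

```python
import hashlib
import json
import tempfile
from pathlib import Path

# Assumed cache location; the real guide's cache path may differ.
CACHE_DIR = Path(tempfile.gettempdir()) / "extraction_cache"

def cached(expensive_fn):
    """Cache expensive_fn(text) results on disk, keyed by a hash of the input."""
    def wrapper(text: str):
        CACHE_DIR.mkdir(parents=True, exist_ok=True)
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        cache_file = CACHE_DIR / f"{key}.json"
        if cache_file.exists():
            return json.loads(cache_file.read_text())  # cache hit: skip the work
        result = expensive_fn(text)
        cache_file.write_text(json.dumps(result))      # cache miss: store result
        return result
    return wrapper

@cached
def extract(text: str):
    # Placeholder for a long-running or paid operation, such as
    # document extraction or an LLM API call.
    return {"length": len(text)}
```

Calling `extract` a second time with the same input returns the stored result from disk instead of repeating the operation.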

Get Started

Setup

In this section, we show how to set up your environment to run the examples.
