Document RAG Example
The previous example works well with small documents, but problems can arise with larger documents or a large corpus of documents, for a couple of reasons:
What can we do about this? Here, we introduce the concept of Retrieval-Augmented Generation (RAG). Depending on context, we will also use "RAG" to refer to a Retrieval-Augmented Generator, that is, a system that employs Retrieval-Augmented Generation. A RAG can be used to find the information in a large corpus that is relevant to a query. That information can then be attached to the query as context, without attaching the entire document. To do this, we will expand on the list of steps provided in the previous example:
For the following example, we will use a very large document, the ISO 32000-2:2020 PDF standard, to demonstrate these techniques. This document is available for free download from Adobe. If you haven't already, please download it and place it at the following location: idp_rag_guide/data/pdf/PDF_ISO_32000-2.pdf.
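Before running the full example, it may help to see the retrieval idea in miniature. The sketch below is not the code in iso32000_rag.py. It assumes a toy bag-of-words vector in place of a real embedding model, and the section titles and texts are hypothetical stand-ins for sections of a large document. It scores each section against the query with cosine similarity and returns the best matches.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, sections: dict[str, str], top_k: int = 2) -> list[str]:
    """Return the titles of the top_k sections most similar to the query."""
    query_vec = embed(query)
    scored = [
        (cosine_similarity(query_vec, embed(body)), title)
        for title, body in sections.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored[:top_k]]


# Hypothetical section texts, used only to exercise the functions above.
sections = {
    "Text-showing operators": (
        "The TJ operator shows text strings and adjusts glyph positions "
        "using numbers in an array."
    ),
    "Colour spaces": (
        "Colour values are interpreted according to the current colour "
        "space, such as DeviceRGB."
    ),
    "File structure": (
        "A PDF file consists of a header, a body, a cross-reference table "
        "and a trailer."
    ),
}

top = retrieve("What do the numbers in a TJ operator array mean?", sections, top_k=1)
print(top)
```

Only the sections returned by retrieve would be attached to the query, which is what keeps the prompt small even when the corpus is large. Swapping embed for a call to a real embedding model leaves the rest of the sketch unchanged, provided the similarity function is adapted to dense vectors.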
To run the example, use the following command (with your virtual environment active, if you are using one):
python3 ./iso32000_rag.py
You should see some text indicating progress, with a question and answer about the document appearing at the end. LLMs aren't guaranteed to produce identical output between runs, but you should see something similar to the following:
Extracting Document Structure from /home/matt/dev/idp-deep-learning/document-summary-rag/idp_rag_guide/data/pdf/PDF_ISO_32000-2.pdf...
Extracted data to /home/matt/dev/idp-deep-learning/document-summary-rag/idp_rag_guide/data/output/rag_example/PDF_ISO_32000-2/json/PDF_ISO_32000-2.json
Using bookmark tree to split the document into sections...
Generating HTML and Text representations for each section...
Generating embeddings for each section...
================================================================================
Question: What are the meanings of the numeric values used by the Tj Operator? For example, "[(He)20(ll)10(o Wo)10(rld)]TJ"?
Detected Context:
9.4.3 Text-showing operators
9.2.3 Achieving special graphical effects
9.2.4 Glyph positioning and metrics
Response: The numeric values used by the TJ operator in a text-showing command
like "[(He)20(ll)10(o Wo)10(rld)]TJ" represent adjustments to the text position
between the glyphs or strings of glyphs. According to Excerpt #1, each element
of the array passed to the TJ operator can be either a string or a number. If
the element is a string, the operator shows the string. If it is a number, the
operator adjusts the text position by that amount. This adjustment is a
translation of the text matrix, Tm, and the number is expressed in thousandths
of a unit of text space. The effect of this adjustment is to move the next
glyph painted either to the left or down by the given amount, depending on the
writing mode. In the default coordinate system, a positive adjustment moves the
next glyph to the left (in horizontal writing mode) by the amount specified.
Therefore, in the example "[(He)20(ll)10(o Wo)10(rld)]TJ":
- The "20" after "(He)" moves the next glyph ("ll") 20 thousandths of a unit of
text space to the left of where it would normally be placed.
- The "10" after "(ll)" moves the next glyph sequence "(o Wo)" 10 thousandths
of a unit of text space to the left of its standard position.
- Similarly, the "10" after "(o Wo)" adjusts the position of "(rld)" to the
left by 10 thousandths of a unit of text space from where it would otherwise be
positioned.
This mechanism allows for fine control over the spacing between glyphs or
groups of glyphs, enabling adjustments for kerning, aesthetic spacing, or other
typographic considerations.
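The response above cites "Excerpt #1", which suggests that the retrieved sections were attached to the question as numbered excerpts. The exact prompt used by the example is not shown here, so the following is only a sketch of how such a prompt might be assembled. The function name build_prompt, the instruction wording, and the layout are assumptions made for illustration.

```python
def build_prompt(question: str, excerpts: list[tuple[str, str]]) -> str:
    """Attach retrieved sections to a question as numbered excerpts.

    `excerpts` holds (section_title, section_text) pairs, most relevant first.
    The layout is an illustrative assumption, not the guide's actual prompt.
    """
    parts = ["Answer the question using only the excerpts below.", ""]
    for number, (title, text) in enumerate(excerpts, start=1):
        parts.append(f"Excerpt #{number} ({title}):")
        parts.append(text.strip())
        parts.append("")  # blank line between excerpts
    parts.append(f"Question: {question}")
    return "\n".join(parts)


prompt = build_prompt(
    "What do the numbers in a TJ array mean?",
    [
        ("9.4.3 Text-showing operators", "(section text goes here)"),
        ("9.2.4 Glyph positioning and metrics", "(section text goes here)"),
    ],
)
print(prompt)
```

Numbering the excerpts gives the model a way to say which section supports each claim, which makes its answers easier to check against the source document.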
For more details on how to build something like this yourself, and a discussion of some of the decisions made for this example, see the Detailed Discussion.