PDFTron is now Apryse, learn more here.

Search for text in a PDF in C++

This guide shows how to search for text in a PDF using a regular expression and then apply a link annotation to each highlighted result.

In this example we add a link annotation, but any other annotation type can be applied in the same way, for example redaction annotations in a search-and-redact workflow.
PDFDoc doc(filename);
TextSearch txt_search;
TextSearch::Mode mode = TextSearch::e_whole_word | TextSearch::e_page_stop;
UString pattern("");

// Use a regular expression to find a credit card number.
mode |= TextSearch::e_reg_expression | TextSearch::e_highlight;
pattern = "\\d{4}-\\d{4}-\\d{4}-\\d{4}"; // or "(\\d{4}-){3}\\d{4}"

// Call Begin() to initialize the text search, then Run() to get the next result.
txt_search.Begin( doc, pattern, mode );
SearchResult result = txt_search.Run();

if ( result )
{
  // Add a link annotation based on the location of the found instance.
  Highlights hlts = result.GetHighlights();
  hlts.Begin(doc);
  while ( hlts.HasNext() )
  {
    Page cur_page = doc.GetPage(hlts.GetCurrentPageNumber());
    const double *quads;
    int quad_count = hlts.GetCurrentQuads(quads);
    for ( int i = 0; i < quad_count; ++i )
    {
      // Assume each quad is an axis-aligned rectangle; each quad is 8 doubles.
      // std::min and std::max require <algorithm>.
      const double *q = &quads[8*i];
      double x1 = std::min(std::min(std::min(q[0], q[2]), q[4]), q[6]);
      double x2 = std::max(std::max(std::max(q[0], q[2]), q[4]), q[6]);
      double y1 = std::min(std::min(std::min(q[1], q[3]), q[5]), q[7]);
      double y2 = std::max(std::max(std::max(q[1], q[3]), q[5]), q[7]);
      Annots::Link hyper_link = Annots::Link::Create(doc, Rect(x1, y1, x2, y2), Action::CreateURI(doc, ""));
      // Attach the annotation to the page; creating it alone does not add it.
      cur_page.AnnotPushBack(hyper_link);
    }
    // Advance to the next highlight so the loop terminates.
    hlts.Next();
  }
}
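
The bounding-box step inside the loop above can be tried on its own, without the SDK. The following is a minimal sketch, assuming each quad is stored as 8 consecutive doubles (an x, y pair per corner), as the sample's indexing implies. `BBox` and `QuadBounds` are hypothetical names introduced for illustration and are not part of the PDFTron/Apryse API.

```cpp
#include <algorithm>

// Hypothetical helper type for illustration; not part of the PDFTron/Apryse API.
struct BBox {
    double x1, y1, x2, y2;
};

// Computes the axis-aligned bounding box of one quad.
// q points at 8 doubles: x at even indices, y at odd indices.
inline BBox QuadBounds(const double* q) {
    BBox b;
    b.x1 = std::min({q[0], q[2], q[4], q[6]});
    b.x2 = std::max({q[0], q[2], q[4], q[6]});
    b.y1 = std::min({q[1], q[3], q[5], q[7]});
    b.y2 = std::max({q[1], q[3], q[5], q[7]});
    return b;
}
```

With a helper like this, the loop body would reduce to calling `QuadBounds(&quads[8*i])` and passing the four values to `Rect`. For a rotated quad the result is the enclosing rectangle, which may be larger than the text itself.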

Search PDF files for text
Full code sample that shows how to use TextSearch to search for text on PDF pages using regular expressions.
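
Before running a search against a document, it can help to check the pattern from this guide on plain strings. The sketch below uses the standard library's `std::regex` (ECMAScript syntax), on the assumption that it treats this simple pattern the same way TextSearch does; that equivalence is not confirmed by this guide. `FindCardLikeNumbers` is a hypothetical helper name, and unlike the guide's `e_whole_word` mode it applies no word-boundary check.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper for illustration: returns every substring of `text`
// that matches four groups of four digits separated by hyphens.
inline std::vector<std::string> FindCardLikeNumbers(const std::string& text) {
    static const std::regex pattern("\\d{4}-\\d{4}-\\d{4}-\\d{4}");
    std::vector<std::string> matches;
    for (std::sregex_iterator it(text.begin(), text.end(), pattern), end;
         it != end; ++it) {
        matches.push_back(it->str());
    }
    return matches;
}
```

The alternative form mentioned in the sample, `"(\\d{4}-){3}\\d{4}"`, describes the same set of strings and can be substituted into the sketch unchanged.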
