Search for text in a PDF in UWP

This guide shows how to search a PDF for text using a regular expression and then apply a link annotation to the highlighted result.

In this example, we add a link annotation, but any other type of annotation can be applied instead, such as a redaction annotation in a search-and-redact workflow.

C#

PDFDoc doc = new PDFDoc(filename);
Int32Ref pageNumber = new Int32Ref(0);
StringRef resultString = new StringRef();
StringRef ambientString = new StringRef();
Highlights highlights = new Highlights();
TextSearch textSearch = new TextSearch();

Int32 mode = (Int32)(TextSearchSearchMode.e_whole_word | TextSearchSearchMode.e_page_stop | TextSearchSearchMode.e_highlight);

// Use a regular expression to find a credit card number
mode |= (Int32)(TextSearchSearchMode.e_reg_expression | TextSearchSearchMode.e_highlight);
textSearch.SetMode(mode);
String pattern = "\\d{4}-\\d{4}-\\d{4}-\\d{4}"; // or "(\\d{4}-){3}\\d{4}"
textSearch.SetPattern(pattern);

// Call Begin() to initialize the text search
textSearch.Begin(doc, pattern, mode, -1, -1);
TextSearchResultCode code = textSearch.Run(pageNumber, resultString, ambientString, highlights);

if (code == TextSearchResultCode.e_found)
{
    // Add a link annotation based on the location of the found instance
    highlights.Begin(doc);
    while (highlights.HasNext())
    {
        Page cur_page = doc.GetPage(highlights.GetCurrentPageNumber());
        // Each quad is 8 doubles: four (x, y) corner points
        double[] quads = highlights.GetCurrentQuads();
        int quad_count = quads.Length / 8;
        for (int i = 0; i < quad_count; ++i)
        {
            // Assume each quad is an axis-aligned rectangle
            int offset = 8 * i;
            double x1 = Math.Min(Math.Min(Math.Min(quads[offset + 0], quads[offset + 2]), quads[offset + 4]), quads[offset + 6]);
            double x2 = Math.Max(Math.Max(Math.Max(quads[offset + 0], quads[offset + 2]), quads[offset + 4]), quads[offset + 6]);
            double y1 = Math.Min(Math.Min(Math.Min(quads[offset + 1], quads[offset + 3]), quads[offset + 5]), quads[offset + 7]);
            double y2 = Math.Max(Math.Max(Math.Max(quads[offset + 1], quads[offset + 3]), quads[offset + 5]), quads[offset + 7]);

            Annots.Link hyper_link = Annots.Link.Create(doc.GetSDFDoc(), new Rect(x1, y1, x2, y2), Action.CreateURI(doc.GetSDFDoc(), "http://www.apryse.com"));
            hyper_link.RefreshAppearance();
            cur_page.AnnotPushBack(hyper_link);
        }
        highlights.Next();
    }
}
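
The two pieces of plain C# logic in the sample, the credit card pattern and the reduction of a quad to a rectangle, can be checked without the SDK. The following is a minimal sketch, not part of the Apryse API: `SearchHelpers`, `LooksLikeCardNumber`, and `QuadBounds` are hypothetical names introduced here for illustration, and .NET's `Regex` class stands in for the SDK's own regular expression engine, which may differ in some details.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class for illustration; not part of the SDK.
public static class SearchHelpers
{
    // Same pattern as in the sample, written as a verbatim string.
    public const string CardPattern = @"\d{4}-\d{4}-\d{4}-\d{4}";

    // Returns true if the text contains four dash-separated groups of four digits.
    // Assumption: .NET's Regex is used only to sanity-check the pattern.
    public static bool LooksLikeCardNumber(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        return Regex.IsMatch(text, CardPattern);
    }

    // Reduces the quad at the given index (8 doubles: x0,y0,...,x3,y3)
    // to its bounding box, returned as { x1, y1, x2, y2 }.
    public static double[] QuadBounds(double[] quads, int index)
    {
        if (quads == null) throw new ArgumentNullException(nameof(quads));
        int offset = 8 * index;
        if (index < 0 || offset + 8 > quads.Length)
            throw new ArgumentOutOfRangeException(nameof(index));

        double x1 = double.PositiveInfinity, y1 = double.PositiveInfinity;
        double x2 = double.NegativeInfinity, y2 = double.NegativeInfinity;
        for (int k = 0; k < 8; k += 2)
        {
            x1 = Math.Min(x1, quads[offset + k]);
            x2 = Math.Max(x2, quads[offset + k]);
            y1 = Math.Min(y1, quads[offset + k + 1]);
            y2 = Math.Max(y2, quads[offset + k + 1]);
        }
        return new[] { x1, y1, x2, y2 };
    }
}
```

With a helper like this, the four `Math.Min`/`Math.Max` lines in the loop could be replaced by one call to `QuadBounds(quads, i)`, with the four returned values passed to `new Rect(x1, y1, x2, y2)`. As in the sample, the result is only an exact fit when each quad is an axis-aligned rectangle; for rotated text it is the enclosing bounding box.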

Search PDF files for text
A full code sample that shows how to use TextSearch to search text on PDF pages using regular expressions.
