Programmatic search without WebViewer

It's possible to load a document and search its content without rendering it in WebViewer. Because no pages need to be rendered, this can be faster than searching through the viewer. This feature is provided by the TextSearch class; an example is shown below.

This guide uses the full API without a viewer.

JavaScript

```javascript
const main = async () => {
  try {
    const doc = await PDFNet.PDFDoc.createFromURL('PATH TO File/fileName.pdf');
    await doc.initSecurityHandler();
    await doc.lock();

    const txtSearch = await PDFNet.TextSearch.create();
    // Combine search options with a bitwise OR
    const mode = PDFNet.TextSearch.Mode.e_whole_word | PDFNet.TextSearch.Mode.e_highlight;
    // 'pattern' can be a regular expression when using 'e_reg_expression' mode
    const pattern = 'string to search';

    await txtSearch.begin(doc, pattern, mode);
    let result = await txtSearch.run();

    while (true) {
      if (result.code === PDFNet.TextSearch.ResultCode.e_found) {
        const highlights = result.highlights;
        await highlights.begin(doc);

        while (await highlights.hasNext()) {
          // 'highlights' will have multiple quads if 'pattern' spans multiple lines
          const quads = await highlights.getCurrentQuads();
          // ... use 'quads' here, e.g. to position annotations
          await highlights.next();
        }
      } else if (result.code === PDFNet.TextSearch.ResultCode.e_page) {
        // Only returned if 'PDFNet.TextSearch.Mode.e_page_stop' was added to 'mode'
        console.log(`Finished searching page ${result.page_num}`);
      } else if (result.code === PDFNet.TextSearch.ResultCode.e_done) {
        // If 'run()' is called again, it returns the same 'result' object with 'result.code' of 'e_done'
        console.log('Finished searching the document');
        break;
      }

      // It's possible to change the search pattern or mode while searching.
      // However, any text or pages already searched will not be searched again.
      // txtSearch.setMode(mode);
      // txtSearch.setPattern('new string to search');

      result = await txtSearch.run();
    }
  } catch (err) {
    console.log(err);
  }
};

PDFNet.runWithCleanup(main, 'Insert commercial license key here after purchase');
```

As with other PDFNet code, run it through runWithCleanup. Inside the callback, create the PDFDoc and TextSearch objects (the sample above uses createFromURL, but other PDFDoc creation methods work as well). To start the search, call the begin method on the TextSearch object. begin takes the following parameters:

  • doc: PDFDoc object of the document to search
  • search_pattern: text string or regex pattern to search
  • mode: A number that encodes the search options, generated by bitwise ORing options together
  • start_page: optional page number to start searching on. Defaults to 1
  • end_page: optional page number for when to stop searching. Defaults to last page
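To search only part of a document, pass start_page and end_page to begin. The sketch below wraps the begin/run loop in a hypothetical helper, findPagesWithMatches, that returns the page numbers containing at least one match. It takes the PDFNet namespace as an argument so it has no global dependency (and can be exercised with a stub), and it assumes doc has already been prepared with initSecurityHandler and lock as in the sample above. The helper name and its return shape are illustrative, not part of the PDFNet API.

```javascript
// Hypothetical helper: search pages startPage..endPage (inclusive) and return the
// sorted, de-duplicated page numbers that contain at least one match.
async function findPagesWithMatches(PDFNet, doc, pattern, startPage, endPage) {
  const { Mode, ResultCode } = PDFNet.TextSearch;
  const txtSearch = await PDFNet.TextSearch.create();
  // Whole-word matching only; highlights aren't needed just to record page numbers
  const mode = Mode.e_whole_word;

  await txtSearch.begin(doc, pattern, mode, startPage, endPage);

  const pages = new Set();
  let result = await txtSearch.run();
  // Keep calling run() until the whole range has been searched
  while (result.code !== ResultCode.e_done) {
    if (result.code === ResultCode.e_found) {
      pages.add(result.page_num);
    }
    result = await txtSearch.run();
  }
  return [...pages].sort((a, b) => a - b);
}
```

Inside main, this could be called as `const pages = await findPagesWithMatches(PDFNet, doc, 'string to search', 2, 5);`.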

The mode input is a number that controls how the search behaves. Build it by combining the desired options with the "|" bitwise OR operator. All modes are defined on the PDFNet.TextSearch.Mode object:

  • e_reg_expression: If set, treat the search pattern as a regular expression
  • e_case_sensitive: If set, the text searched must match case of the search pattern
  • e_whole_word: If set, only match whole words
  • e_search_up: If set, search from the last page of the document backwards to the first
  • e_page_stop: If set, will return a 'result' whenever a page has been searched
  • e_highlight: If set, will return the quads of found results
  • e_ambient_string: If set, will return text around the search pattern
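Because each mode is a separate bit, options can be combined with | and tested with &. The sketch below assumes, as bitwise ORing implies, that the flags on PDFNet.TextSearch.Mode are distinct powers of two. buildSearchMode, hasModeFlag, and the option names (wholeWord, highlight, and so on) are hypothetical conveniences, not PDFNet API.

```javascript
// Hypothetical helper: turn a set of boolean options into a TextSearch mode number.
// 'Mode' is expected to be PDFNet.TextSearch.Mode (passed in to avoid a global dependency).
function buildSearchMode(Mode, options = {}) {
  const flags = {
    regex: Mode.e_reg_expression,
    caseSensitive: Mode.e_case_sensitive,
    wholeWord: Mode.e_whole_word,
    searchUp: Mode.e_search_up,
    pageStop: Mode.e_page_stop,
    highlight: Mode.e_highlight,
    ambientString: Mode.e_ambient_string,
  };
  let mode = 0;
  for (const [name, flag] of Object.entries(flags)) {
    if (options[name]) {
      mode |= flag; // set this option's bit
    }
  }
  return mode;
}

// Returns true if 'flag' is set in an existing mode number
function hasModeFlag(mode, flag) {
  return (mode & flag) !== 0;
}
```

For example, `buildSearchMode(PDFNet.TextSearch.Mode, { wholeWord: true, highlight: true })` produces the same value as the mode used in the sample above.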

After calling begin, call run to start searching the document. It returns a promise that resolves to a 'result' object with the following properties:

  • ambient_str: If using e_ambient_string mode, the characters surrounding the search pattern
  • code: a PDFNet.TextSearch.ResultCode value indicating whether a match was found or the search finished a page or the whole document
    • e_found: search pattern found
    • e_page: done searching a page
    • e_done: done searching the whole document
  • highlights: a Highlights object
  • out_str: The text that matched the search pattern. Because the search may be case-insensitive or use a regular expression, this can differ from the original search pattern
  • page_num: The page the result was found on
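Only results whose code is e_found carry a match; e_page and e_done results just report progress. As an illustration, the hypothetical helper below groups the matched text (out_str) by page_num from an array of results already collected from run(), skipping progress results. ResultCode is assumed to be PDFNet.TextSearch.ResultCode, passed in as an argument.

```javascript
// Hypothetical helper: group matched strings by page, ignoring e_page / e_done results.
// Returns a Map of page number -> array of matched strings, in the order they were found.
function groupMatchesByPage(results, ResultCode) {
  const byPage = new Map();
  for (const result of results) {
    if (result.code !== ResultCode.e_found) {
      continue; // progress result, no match attached
    }
    if (!byPage.has(result.page_num)) {
      byPage.set(result.page_num, []);
    }
    // out_str can differ from the pattern, e.g. in case-insensitive or regex mode
    byPage.get(result.page_num).push(result.out_str);
  }
  return byPage;
}
```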

If using e_page_stop mode, run returns a result whenever it finishes searching a page. Otherwise, it only returns a result when a match is found or when the entire document has been searched. After the first result is returned, keep calling run to get the next result until the search is complete.
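The keep-calling-run pattern maps naturally onto an async generator, which lets the caller consume results with for await...of and stop early with break. This is a sketch, not part of the PDFNet API: searchResults is a hypothetical name, and it assumes begin has already been called on txtSearch.

```javascript
// Hypothetical wrapper: yield every result from an already-started TextSearch
// until the whole document has been searched (ResultCode.e_done).
async function* searchResults(txtSearch, ResultCode) {
  while (true) {
    const result = await txtSearch.run();
    if (result.code === ResultCode.e_done) {
      return; // further run() calls would keep returning e_done
    }
    yield result; // e_found, or e_page when e_page_stop is set
  }
}

// Example usage inside main, after 'await txtSearch.begin(doc, pattern, mode)':
// for await (const result of searchResults(txtSearch, PDFNet.TextSearch.ResultCode)) {
//   if (result.code === PDFNet.TextSearch.ResultCode.e_found) {
//     console.log(`Found "${result.out_str}" on page ${result.page_num}`);
//   }
// }
```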
