Programmatic search without WebViewer

Using full API search

You can load a document and search its content without rendering it in WebViewer. Skipping the rendering step can improve performance when only the search results are needed. The TextSearch class provides this feature, and an example is shown below.

This guide uses the full API without a viewer.

JavaScript

const main = async () => {
  try {
    const doc = await PDFNet.PDFDoc.createFromURL('PATH TO File/fileName.pdf');
    await doc.initSecurityHandler();
    await doc.lock();

    const txtSearch = await PDFNet.TextSearch.create();
    // Combine search options with bitwise OR
    const mode = PDFNet.TextSearch.Mode.e_whole_word | PDFNet.TextSearch.Mode.e_highlight;
    // 'pattern' can be a regular expression when using 'e_reg_expression' mode
    const pattern = 'string to search';

    await txtSearch.begin(doc, pattern, mode);
    let result = await txtSearch.run();

    while (true) {
      if (result.code === PDFNet.TextSearch.ResultCode.e_found) {
        const highlights = result.highlights;
        await highlights.begin(doc);

        while (await highlights.hasNext()) {
          // 'highlights' will have multiple Quad objects if 'pattern' is on multiple lines
          const quads = await highlights.getCurrentQuads();
          await highlights.next();
        }
      } else if (result.code === PDFNet.TextSearch.ResultCode.e_page) {
        // will only get 'result' for end of page if 'PDFNet.TextSearch.Mode.e_page_stop' was added to 'mode'
        console.log(`Finished searching page ${result.page_num}`);
      } else if (result.code === PDFNet.TextSearch.ResultCode.e_done) {
        // if 'run()' is called again, it'll return the same 'result' object with 'result.code' of 'e_done'
        console.log('Finished searching the document');
        break;
      }

      // It's possible to change the search pattern or mode while searching
      // However any text or pages searched will not be searched again
      // txtSearch.setMode(mode);
      // txtSearch.setPattern('new string to search');

      result = await txtSearch.run();
    }
  } catch (err) {
    console.log(err);
  }
};

PDFNet.runWithCleanup(main, 'YOUR_LICENSE_KEY');

Like other PDFNet code, wrap the search in runWithCleanup. Inside it, create a PDFDoc object (the sample above uses createFromURL, but other methods work as well) and a TextSearch object. To start the search, call the begin method on the TextSearch object. begin takes the following parameters:

doc: PDFDoc object of the document to search
search_pattern: text string or regex pattern to search
mode: A number that encodes the search options, generated by bitwise ORing options together
start_page: optional page number to start searching on. Defaults to 1
end_page: optional page number for when to stop searching. Defaults to last page
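
As a minimal sketch of the optional page arguments, the helper below limits a search to a range of pages. The name beginInRange and its validation rules are hypothetical and not part of PDFNet. It assumes txtSearch and doc were created as in the sample above, and it only forwards the five parameters listed here in the same order.

```javascript
// Hypothetical helper (not part of PDFNet): start a search limited to
// pages startPage..endPage. Assumes 'txtSearch' is a PDFNet.TextSearch
// and 'doc' is a PDFNet.PDFDoc, created as in the sample above.
async function beginInRange(txtSearch, doc, pattern, mode, startPage, endPage) {
  if (!Number.isInteger(startPage) || !Number.isInteger(endPage)) {
    throw new TypeError('startPage and endPage must be integers');
  }
  // Page numbers are 1-based, matching the default start_page of 1
  if (startPage < 1 || endPage < startPage) {
    throw new RangeError('expected 1 <= startPage <= endPage');
  }
  // Parameter order follows the list above:
  // doc, search_pattern, mode, start_page, end_page
  await txtSearch.begin(doc, pattern, mode, startPage, endPage);
}
```

After this call, run is used exactly as in the full sample. Validating the range up front is a design choice for this sketch only.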

The mode input is a number that controls how the search behaves. Create it by combining the desired modes with the "|" bitwise OR operator. All of the modes are available on the PDFNet.TextSearch.Mode object:

e_reg_expression: If set, treat the search pattern as a regular expression
e_case_sensitive: If set, the text searched must match case of the search pattern
e_whole_word: If set, only match whole words
e_search_up: If set, search from the last page of the document backwards to the first
e_page_stop: If set, will return a 'result' whenever a page has been searched
e_highlight: If set, will return the quads of found results
e_ambient_string: If set, will return text around the search pattern
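
To illustrate how the flags combine, here is a small sketch that ORs several modes together and checks which ones a combined value contains. The Mode object is a stand-in so the snippet runs without the library, and its numeric values are assumptions for illustration only. The helpers combineModes and hasMode are hypothetical names, not PDFNet functions.

```javascript
// Stand-in for PDFNet.TextSearch.Mode so the snippet runs without the library.
// The numeric values are assumed for illustration; in real code, read the
// flags from PDFNet.TextSearch.Mode instead of hard-coding them.
const Mode = {
  e_reg_expression: 1,
  e_case_sensitive: 2,
  e_whole_word: 4,
  e_search_up: 8,
  e_page_stop: 16,
  e_highlight: 32,
  e_ambient_string: 64,
};

// Combine any number of flags into a single mode value with bitwise OR.
const combineModes = (...flags) => flags.reduce((mode, flag) => mode | flag, 0);

// Check whether a combined mode value includes a given flag.
const hasMode = (mode, flag) => (mode & flag) === flag;

const mode = combineModes(Mode.e_whole_word, Mode.e_highlight, Mode.e_page_stop);
console.log(hasMode(mode, Mode.e_highlight));      // true
console.log(hasMode(mode, Mode.e_case_sensitive)); // false
```

Because each flag occupies its own bit, the order in which the flags are combined does not matter.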

After calling begin, call run to start searching the document. It returns a promise that resolves to a 'result' object with the following properties:

ambient_str: If using e_ambient_string mode, the characters surrounding the matched text
code: a PDFNet.TextSearch.ResultCode indicating why run returned: a match was found, a page was finished, or the whole document was finished
- e_found: search pattern found
- e_page: done searching a page
- e_done: done searching the whole document
highlights: a Highlights object
out_str: The string that matched the search term. It can differ from the original search term, for example when the search is case-insensitive or uses a regular expression
page_num: The page the result was found on

If using e_page_stop mode, run also returns a result each time it finishes searching a page. Otherwise, it only returns a result when a match is found or when the whole document has been searched. After the first result is returned, keep calling run to get the next result until the returned code is e_done.
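
The repeated calls to run can be wrapped in a small helper. The sketch below uses a hypothetical function name, collectMatches, which is not part of PDFNet. It assumes begin has already been called on txtSearch, and that ResultCode is PDFNet.TextSearch.ResultCode. It gathers the page number and matched text of every e_found result and stops at e_done.

```javascript
// Hypothetical helper (not part of PDFNet): drain a search that was already
// started with begin(). 'txtSearch' must expose run() as described above and
// 'ResultCode' is expected to be PDFNet.TextSearch.ResultCode.
async function collectMatches(txtSearch, ResultCode) {
  const matches = [];
  let result = await txtSearch.run();

  // Keep calling run() until the search reports that the document is done
  while (result.code !== ResultCode.e_done) {
    if (result.code === ResultCode.e_found) {
      matches.push({ page: result.page_num, text: result.out_str });
    }
    // e_page results (only returned with e_page_stop) are skipped here
    result = await txtSearch.run();
  }
  return matches;
}
```

Inside main from the sample above, this could be used as `const matches = await collectMatches(txtSearch, PDFNet.TextSearch.ResultCode);` once begin has been called.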
