Programmatic Search without UI
It's possible to load a document and search through its content without rendering the document in WebViewer. This can help performance, because the document doesn't need to be rendered before it can be searched. This feature is provided by the TextSearch class, and an example can be found below.
const main = async () => {
  try {
    const doc = await PDFNet.PDFDoc.createFromURL('PATH TO File/fileName.pdf');
    await doc.initSecurityHandler();
    await doc.lock();
    const txtSearch = await PDFNet.TextSearch.create();
    // Combine the desired search options with bitwise OR
    const mode = PDFNet.TextSearch.Mode.e_whole_word | PDFNet.TextSearch.Mode.e_highlight;
    // 'pattern' can be a regular expression when using 'e_reg_expression' mode
    const pattern = 'string to search';
    await txtSearch.begin(doc, pattern, mode);
    let result = await txtSearch.run();
    while (true) {
      if (result.code === PDFNet.TextSearch.ResultCode.e_found) {
        const highlights = result.highlights;
        await highlights.begin(doc);
        while (await highlights.hasNext()) {
          // 'highlights' will have multiple Quad objects if 'pattern' is on multiple lines
          const quads = await highlights.getCurrentQuads();
          await highlights.next();
        }
      } else if (result.code === PDFNet.TextSearch.ResultCode.e_page) {
        // will only get 'result' for end of page if 'PDFNet.TextSearch.Mode.e_page_stop' was added to 'mode'
        console.log(`Finished searching page ${result.page_num}`);
      } else if (result.code === PDFNet.TextSearch.ResultCode.e_done) {
        // if 'run()' is called again, it'll return the same 'result' object with 'result.code' of 'e_done'
        console.log('Finished searching the document');
        break;
      }
      // It's possible to change the search pattern or mode while searching
      // However any text or pages searched will not be searched again
      // txtSearch.setMode(mode);
      // txtSearch.setPattern('new string to search');
      result = await txtSearch.run();
    }
  } catch (err) {
    console.log(err);
  }
};
PDFNet.runWithCleanup(main, 'Insert commercial license key here after purchase');
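The mode value in the sample is a bit mask built by ORing flags together. As a rough illustration of how such a mask behaves, the sketch below uses a stand-in Mode object so it runs without PDFNet. The numeric values and the helper names combineModes, hasMode and describeMode are assumptions made for this example, not part of the PDFNet API; real code should read the flags from PDFNet.TextSearch.Mode.

```javascript
// Stand-in for PDFNet.TextSearch.Mode. The numeric values are assumptions
// chosen for illustration; real code should read them from PDFNet.
const Mode = Object.freeze({
  e_reg_expression: 1 << 0,
  e_case_sensitive: 1 << 1,
  e_whole_word: 1 << 2,
  e_search_up: 1 << 3,
  e_page_stop: 1 << 4,
  e_highlight: 1 << 5,
  e_ambient_string: 1 << 6,
});

// Hypothetical helper: combine any number of flags into one mode number.
function combineModes(...flags) {
  return flags.reduce((mode, flag) => mode | flag, 0);
}

// Hypothetical helper: true when `flag` is set in `mode`.
function hasMode(mode, flag) {
  return (mode & flag) === flag;
}

// Hypothetical helper: names of the flags set in `mode`, handy for logging.
function describeMode(mode) {
  return Object.keys(Mode).filter((name) => hasMode(mode, Mode[name]));
}

const mode = combineModes(Mode.e_whole_word, Mode.e_highlight);
console.log(describeMode(mode)); // [ 'e_whole_word', 'e_highlight' ]
```

Because each flag occupies its own bit, ORing the same flag twice has no extra effect, and a flag can be tested with a bitwise AND without knowing which other flags are set.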
Like other PDFNet code, start by using runWithCleanup to run the code. Afterwards, create new TextSearch and PDFDoc objects (the sample uses createFromURL, but other methods work as well). To start the search, call the begin method on the TextSearch object. begin takes the following parameters:
doc: PDFDoc object of the document to search
search_pattern: text string or regex pattern to search for
mode: a number that encodes the search options, generated by bitwise ORing options together
start_page: optional page number to start searching on. Defaults to 1
end_page: optional page number at which to stop searching. Defaults to the last page

The mode input is a number that controls how the search behaves. It is created by applying the bitwise OR operator "|" to the desired modes. All the modes can be found on the PDFNet.TextSearch.Mode object. They are:
e_reg_expression: if set, treat the search pattern as a regular expression
e_case_sensitive: if set, the text searched must match the case of the search pattern
e_whole_word: if set, only match whole words
e_search_up: if set, search from the last page of the document backwards to the first
e_page_stop: if set, return a result whenever a page has been searched
e_highlight: if set, return the quads of found results
e_ambient_string: if set, return text around the search pattern

After calling begin, calling run starts searching the document. It returns a promise that resolves to a result object with the following properties:
ambient_str: if using e_ambient_string mode, the characters surrounding the search pattern
code: a PDFNet.TextSearch.ResultCode indicating whether the result comes from a match being found, or from the search finishing a page or the whole document:
  e_found: search pattern found
  e_page: done searching a page
  e_done: done searching the whole document
highlights: a Highlights object
out_str: the string that matches the search term. Because case may be ignored or a regular expression may be used for searching, this can differ from the original search term
page_num: the page the result was found on

If using e_page_stop mode, run returns a result whenever it has finished searching a page. Otherwise, it only returns a result when a match has been found or when the whole document has been searched. After the first search result is returned, keep calling run to get the next result until the search is complete.
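The "keep calling run until done" pattern can be wrapped in a small helper that gathers every match into an array. The following is a minimal sketch under stated assumptions: collectResults and makeFakeSearch are hypothetical names introduced here, the ResultCode values are stand-ins so the example runs without PDFNet, and the fake search object only replays canned results to exercise the loop.

```javascript
// Stand-in for PDFNet.TextSearch.ResultCode; the numeric values are assumptions.
const ResultCode = Object.freeze({ e_done: 0, e_page: 1, e_found: 2 });

// Hypothetical helper: calls run() until e_done, collecting matches and the
// pages reported as finished (the latter only appear in e_page_stop mode).
// `search` is any object with an async run() that resolves to
// { code, page_num, out_str }.
async function collectResults(search, codes = ResultCode) {
  const matches = [];
  const finishedPages = [];
  while (true) {
    const result = await search.run();
    if (result.code === codes.e_found) {
      matches.push({ page: result.page_num, text: result.out_str });
    } else if (result.code === codes.e_page) {
      finishedPages.push(result.page_num);
    } else if (result.code === codes.e_done) {
      return { matches, finishedPages };
    } else {
      // Guard against looping forever on a code this sketch does not know
      throw new Error(`Unexpected result code: ${result.code}`);
    }
  }
}

// Hypothetical fake that replays canned results. Once the list is exhausted
// it keeps returning the last entry, mirroring how run() keeps reporting e_done.
function makeFakeSearch(results) {
  let i = 0;
  return {
    run: async () => results[Math.min(i++, results.length - 1)],
  };
}

const fake = makeFakeSearch([
  { code: ResultCode.e_found, page_num: 1, out_str: 'Hello' },
  { code: ResultCode.e_page, page_num: 1 },
  { code: ResultCode.e_found, page_num: 2, out_str: 'hello' },
  { code: ResultCode.e_page, page_num: 2 },
  { code: ResultCode.e_done },
]);

collectResults(fake).then(({ matches, finishedPages }) => {
  console.log(matches.length, finishedPages); // 2 [ 1, 2 ]
});
```

With PDFNet loaded, the same helper would presumably be called with the TextSearch object (after begin) and PDFNet.TextSearch.ResultCode as the second argument, inside the function passed to runWithCleanup.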