Search and Redact PDF on Server/Desktop

This sample code searches a PDF document for all instances of a search pattern and redacts these instances.

JavaScript

const { PDFNet } = require('@pdftron/pdfnet-node');
const ApryseLicense = require('../LicenseKey/LicenseKey');

// Assumption: the original sample never defines `filename`; 'input.pdf' is a placeholder.
const filename = 'input.pdf';

const main = async () => {
  const pattern = 'features'; // expression to search for
  const redactText = 'Top Secret'; // text to show in place of redacted content. Can be empty string.

  // For a local file in Node.js, PDFNet.PDFDoc.createFromFilePath(filename) is the usual alternative.
  const doc = await PDFNet.PDFDoc.createFromURL(filename);
  const txtSearch = await PDFNet.TextSearch.create();
  // whole-word search that also returns highlight information for each match
  const mode = PDFNet.TextSearch.Mode.e_whole_word + PDFNet.TextSearch.Mode.e_highlight;

  await txtSearch.setMode(mode);
  await txtSearch.setPattern(pattern);

  // call begin() to initialize the text search
  await txtSearch.begin(doc, pattern, mode);
  const redactions = []; // array to hold redaction objects
  let result;
  // loop to find all instances of the pattern
  while ((result = await txtSearch.run()).code === PDFNet.TextSearch.ResultCode.e_found) {
    // add a redaction object based on the location of the found instance
    const highlights = result.highlights;
    await highlights.begin(doc);
    while (await highlights.hasNext()) {
      const pageNumber = await highlights.getCurrentPageNumber();
      const quadArr = await highlights.getCurrentQuads();
      for (const currQuad of quadArr) {
        // axis-aligned bounding box of the quad's four corners
        const x1 = Math.min(currQuad.p1x, currQuad.p2x, currQuad.p3x, currQuad.p4x);
        const x2 = Math.max(currQuad.p1x, currQuad.p2x, currQuad.p3x, currQuad.p4x);
        const y1 = Math.min(currQuad.p1y, currQuad.p2y, currQuad.p3y, currQuad.p4y);
        const y2 = Math.max(currQuad.p1y, currQuad.p2y, currQuad.p3y, currQuad.p4y);
        const rect = await PDFNet.Rect.init(x1, y1, x2, y2);
        redactions.push(await PDFNet.Redactor.redactionCreate(pageNumber, rect, false, redactText));
      }
      await highlights.next();
    }
  }
  const appearance = {};
  appearance.redaction_overlay = true;
  appearance.border = false;
  appearance.positive_overlay_color = await PDFNet.ColorPt.init(1, 0.2, 0.2, 0); // red
  appearance.show_redacted_content_regions = true;
  // await both calls so the file is fully written before runWithCleanup releases the document
  await PDFNet.Redactor.redact(doc, redactions, appearance, false, false);
  await doc.save('textsearch_redacted.pdf', PDFNet.SDFDoc.SaveOptions.e_linearized);
};

PDFNet.runWithCleanup(main, ApryseLicense.key) // provide your license key here
  .catch(function (error) {
    console.log('Error: ' + JSON.stringify(error));
  })
  .then(function () {
    return PDFNet.shutdown();
  });
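
The sample turns each highlight quad into a rectangle by taking the minimum and maximum of its four corner coordinates. As a minimal sketch, that step could be factored into a standalone helper. The name `quadToBounds` is hypothetical and not part of the Apryse API; it only assumes the `p1x` ... `p4y` fields that the sample reads from each quad.

```javascript
// Hypothetical helper (not an Apryse API): returns the axis-aligned bounding
// box of a quad given as { p1x, p1y, p2x, p2y, p3x, p3y, p4x, p4y }.
function quadToBounds(quad) {
  const xs = [quad.p1x, quad.p2x, quad.p3x, quad.p4x];
  const ys = [quad.p1y, quad.p2y, quad.p3y, quad.p4y];
  return {
    x1: Math.min(...xs),
    y1: Math.min(...ys),
    x2: Math.max(...xs),
    y2: Math.max(...ys),
  };
}

// Example: a slightly skewed quad, as can occur for rotated text.
const bounds = quadToBounds({
  p1x: 10, p1y: 20,
  p2x: 50, p2y: 22,
  p3x: 49, p3y: 40,
  p4x: 9, p4y: 38,
});
console.log(bounds);
```

The four values map directly onto the arguments of `PDFNet.Rect.init(x1, y1, x2, y2)` used in the sample. Taking the bounding box means a skewed quad is redacted with a slightly larger rectangle than the text itself occupies.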

PDF redaction
Full code sample which shows how to use Apryse's PDFNet.Redactor to remove potentially sensitive content within PDF documents.

About redactor

Apryse Redactor ensures that any portion of an image, text, or vector graphics that falls inside a redaction region is destroyed, not merely hidden with clipping or image masks. The Apryse SDK API can also be used to review and remove metadata and other content that can exist in a PDF document, including XML Forms Architecture (XFA) content and Extensible Metadata Platform (XMP) content.

The redaction process in Apryse SDK consists of two steps:

1. Content identification
A user applies redact annotations that specify the pieces or regions of content that should be removed. This example uses PDFNet.TextSearch to identify the content for redaction programmatically, but it can also be identified in other ways such as using PDFNet.TextExtractor, or interactively (e.g. using WebViewer). Up until the next step is performed, the user can see, move and redefine these annotations.

2. Content removal
Using PDFNet.Redactor.redact(), the user instructs Apryse SDK to apply the redact regions, after which the content in the areas specified by the redact annotations is removed. The redaction function includes a number of options to control the style of the redaction overlay (color, text, font, border, transparency, and so on).
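
The overlay style is passed to `PDFNet.Redactor.redact()` as a plain options object. The sketch below collects the four fields the sample sets into one place so individual values can be overridden per call. The helper name `buildAppearance` is hypothetical, and the sketch deliberately covers only the fields that appear in the sample; any other option names should be checked against the Apryse API reference.

```javascript
// Hypothetical helper (not an Apryse API): builds the appearance object that
// the sample passes as the third argument to PDFNet.Redactor.redact().
// `overlayColor` is expected to be a PDFNet.ColorPt in real use.
function buildAppearance(overlayColor, overrides = {}) {
  return {
    redaction_overlay: true,               // draw an overlay where content was removed
    border: false,                         // no border around the overlay
    positive_overlay_color: overlayColor,  // fill color of the overlay
    show_redacted_content_regions: true,   // value used in the sample
    ...overrides,                          // caller-supplied values take precedence
  };
}

// Stand-in value so the sketch runs without the SDK; with PDFNet loaded this
// would be: await PDFNet.ColorPt.init(1, 0.2, 0.2, 0)
const placeholderColor = { note: 'stand-in for a PDFNet.ColorPt' };
const appearance = buildAppearance(placeholderColor, { border: true });
console.log(appearance);
```

With the SDK available, the result would be used as in the sample: `await PDFNet.Redactor.redact(doc, redactions, appearance, false, false);`.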
