Sanitize PDFs - Python Sample Code

This sample code shows how to use Apryse SDK to remove hidden, non-visual content from PDF documents. If metadata, form data, bookmarks, hidden layers, markup annotations, JavaScript, or file attachments are present in a document, pdftron.PDF.Sanitizer permanently destroys that content rather than simply disabling or obscuring it. Sample code is provided in Python, C++, C#, Java, Node.js (JavaScript), PHP, Ruby, and VB.

Implementation steps

To sanitize files with Apryse Server SDK:

Step 1: Follow the get-started guide for Server SDK in your preferred language or framework.
Step 2: Add the sample code provided in this guide.

Learn more about Apryse Server SDK.

#---------------------------------------------------------------------------------------
# Copyright (c) 2001-2026 by Apryse Software Inc. All Rights Reserved.
# Consult legal.txt regarding legal and license information.
#---------------------------------------------------------------------------------------

import site
site.addsitedir("../../../PDFNetC/Lib")
import sys
from PDFNetPython import *

sys.path.append("../../LicenseKey/PYTHON")
from LicenseKey import *

#------------------------------------------------------------------------------
# PDFNet's Sanitizer is a security-focused feature that permanently removes
# hidden, sensitive, or potentially unsafe content from a PDF document.
# While redaction targets visible page content such as text or graphics,
# sanitization focuses on non-visual elements and embedded structures.
#
# PDFNet Sanitizer ensures hidden or inactive content is destroyed,
# not merely obscured or disabled. This prevents leakage of sensitive
# data such as authoring details, editing history, private identifiers,
# and residual form entries, and neutralizes scripts or attachments.
#
# Sanitization is recommended prior to external sharing with clients,
# partners, or regulatory bodies. It helps align with privacy policies
# and compliance requirements by permanently removing non-visual data.
#------------------------------------------------------------------------------

def main():
    # Relative paths to folders containing test files.
    input_path = "../../TestFiles/"
    output_path = "../../TestFiles/Output/"

    PDFNet.Initialize(LicenseKey)

    # The following example illustrates how to retrieve the existing
    # sanitizable content categories within a document.
    try:
        doc = PDFDoc(input_path + "numbered.pdf")
        doc.InitSecurityHandler()

        opts = Sanitizer.GetSanitizableContent(doc)
        if opts.GetMetadata():
            print("Document has metadata.")
        if opts.GetMarkups():
            print("Document has markups.")
        if opts.GetHiddenLayers():
            print("Document has hidden layers.")
        print("Done...")
    except Exception as e:
        print(e)

    # The following example illustrates how to sanitize a document with default options,
    # which will remove all sanitizable content present within a document.
    try:
        doc = PDFDoc(input_path + "financial.pdf")
        doc.InitSecurityHandler()

        Sanitizer.SanitizeDocument(doc, None)
        doc.Save(output_path + "financial_sanitized.pdf", SDFDoc.e_linearized)
        print("Done...")
    except Exception as e:
        print(e)

    # The following example illustrates how to sanitize a document with custom set options,
    # which will only remove the content categories specified by the options object.
    try:
        options = SanitizeOptions()
        options.SetMetadata(True)
        options.SetFormData(True)
        options.SetBookmarks(True)

        doc = PDFDoc(input_path + "form1.pdf")
        doc.InitSecurityHandler()

        Sanitizer.SanitizeDocument(doc, options)
        doc.Save(output_path + "form1_sanitized.pdf", SDFDoc.e_linearized)
        print("Done...")
    except Exception as e:
        print(e)

    PDFNet.Terminate()

if __name__ == '__main__':
    main()
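The first example in the sample checks each content category with its own if statement. As a minimal sketch, the same check could be collected into a helper that returns the categories found, which may be easier to log or test. The helper name describe_sanitizable_content and the StubOptions class are hypothetical and not part of the Apryse SDK; the stub only stands in for the object returned by Sanitizer.GetSanitizableContent so the snippet runs without the SDK installed. Only the three getters used in the sample (GetMetadata, GetMarkups, GetHiddenLayers) are assumed to exist.

```python
def describe_sanitizable_content(opts):
    """Return labels for the content categories that `opts` reports as present."""
    # Getter names are the three that appear in the sample code.
    checks = [
        ("metadata", "GetMetadata"),
        ("markups", "GetMarkups"),
        ("hidden layers", "GetHiddenLayers"),
    ]
    return [label for label, getter in checks if getattr(opts, getter)()]


class StubOptions:
    """Hypothetical stand-in for the SDK options object, for illustration only."""

    def __init__(self, metadata=False, markups=False, hidden_layers=False):
        self._metadata = metadata
        self._markups = markups
        self._hidden_layers = hidden_layers

    def GetMetadata(self):
        return self._metadata

    def GetMarkups(self):
        return self._markups

    def GetHiddenLayers(self):
        return self._hidden_layers


if __name__ == "__main__":
    # With the real SDK, pass Sanitizer.GetSanitizableContent(doc) instead of the stub.
    found = describe_sanitizable_content(StubOptions(metadata=True, hidden_layers=True))
    for label in found:
        print("Document has " + label + ".")
```

In a real script the stub would be dropped and the helper would be called with the result of Sanitizer.GetSanitizableContent(doc), as in the first example of the sample.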
