Sanitize PDFs - Ruby Sample Code

This sample shows how to use the Apryse SDK to remove hidden, non-visual content from PDF documents. pdftron.PDF.Sanitizer ensures that any metadata, form data, bookmarks, hidden layers, markup annotations, JavaScript, or file attachments present in a document are permanently destroyed rather than merely disabled or obscured. Sample code is provided in Python, C++, C#, Java, Node.js (JavaScript), PHP, Ruby, and VB.

Implementation steps

To sanitize files with Apryse Server SDK:

Step 1: Follow the get-started guide for the Server SDK in your preferred language or framework.
Step 2: Add the sample code provided in this guide.

Learn more about Apryse Server SDK.

#---------------------------------------------------------------------------------------
# Copyright (c) 2001-2026 by Apryse Software Inc. All Rights Reserved.
# Consult legal.txt regarding legal and license information.
#---------------------------------------------------------------------------------------

require '../../../PDFNetC/Lib/PDFNetRuby'
include PDFNetRuby
require '../../LicenseKey/RUBY/LicenseKey'

$stdout.sync = true

#------------------------------------------------------------------------------
# PDFNet's Sanitizer is a security-focused feature that permanently removes
# hidden, sensitive, or potentially unsafe content from a PDF document.
# While redaction targets visible page content such as text or graphics,
# sanitization focuses on non-visual elements and embedded structures.
#
# PDFNet Sanitizer ensures hidden or inactive content is destroyed,
# not merely obscured or disabled. This prevents leakage of sensitive
# data such as authoring details, editing history, private identifiers,
# and residual form entries, and neutralizes scripts or attachments.
#
# Sanitization is recommended prior to external sharing with clients,
# partners, or regulatory bodies. It helps align with privacy policies
# and compliance requirements by permanently removing non-visual data.
#------------------------------------------------------------------------------

# Relative paths to folders containing test files.
input_path = "../../TestFiles/"
output_path = "../../TestFiles/Output/"

PDFNet.Initialize(PDFTronLicense.Key)

# The following example illustrates how to retrieve the existing
# sanitizable content categories within a document.
begin
  doc = PDFDoc.new(input_path + "numbered.pdf")
  doc.InitSecurityHandler

  opts = Sanitizer.GetSanitizableContent(doc)
  if opts.GetMetadata
    puts "Document has metadata."
  end
  if opts.GetMarkups
    puts "Document has markups."
  end
  if opts.GetHiddenLayers
    puts "Document has hidden layers."
  end
  puts "Done..."
rescue Exception => e
  puts e
end

# The following example illustrates how to sanitize a document with default options,
# which will remove all sanitizable content present within a document.
begin
  doc = PDFDoc.new(input_path + "financial.pdf")
  doc.InitSecurityHandler

  Sanitizer.SanitizeDocument(doc, nil)
  doc.Save(output_path + "financial_sanitized.pdf", SDFDoc::E_linearized)
  puts "Done..."
rescue Exception => e
  puts e
end

# The following example illustrates how to sanitize a document with custom set options,
# which will only remove the content categories specified by the options object.
begin
  options = SanitizeOptions.new
  options.SetMetadata(true)
  options.SetFormData(true)
  options.SetBookmarks(true)

  doc = PDFDoc.new(input_path + "form1.pdf")
  doc.InitSecurityHandler

  Sanitizer.SanitizeDocument(doc, options)
  doc.Save(output_path + "form1_sanitized.pdf", SDFDoc::E_linearized)
  puts "Done..."
rescue Exception => e
  puts e
end

PDFNet.Terminate
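
The three separate checks in the first example can be folded into one small helper. The following is a minimal sketch and not part of the Apryse sample: `sanitizable_categories` is a hypothetical name, and the helper relies only on the three getters the sample itself calls (`GetMetadata`, `GetMarkups`, `GetHiddenLayers`). No getters for the remaining categories are assumed here.

```ruby
# Hypothetical helper, not part of the Apryse SDK.
# Returns the labels of the content categories that `opts` reports as present.
# `opts` is expected to be the object returned by
# Sanitizer.GetSanitizableContent(doc). Any object that responds to the three
# getters works, so the helper can be exercised without the SDK loaded.
def sanitizable_categories(opts)
  # Label => getter name. Only getters that appear in the sample are listed.
  getters = {
    "metadata"      => :GetMetadata,
    "markups"       => :GetMarkups,
    "hidden layers" => :GetHiddenLayers
  }
  # Keep the entries whose getter returns a truthy value, then take the labels.
  getters.select { |_label, getter| opts.public_send(getter) }.keys
end
```

With the SDK loaded, the helper could replace the three `if` blocks, for example `sanitizable_categories(Sanitizer.GetSanitizableContent(doc)).each { |c| puts "Document has #{c}." }`. An empty result means none of the three checked categories was reported.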
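
When the set of categories to remove comes from configuration rather than hard-coded calls, the custom-options example can be driven by a list. This is a hedged sketch under stated assumptions: `enable_categories` and the symbol names `:metadata`, `:form_data`, and `:bookmarks` are hypothetical, and only the three setters used in the sample (`SetMetadata`, `SetFormData`, `SetBookmarks`) are mapped.

```ruby
# Hypothetical helper, not part of the Apryse SDK.
# Enables each requested category on `options` and returns `options`.
# `options` is expected to be a SanitizeOptions instance; any object that
# responds to the three setters below works equally well.
def enable_categories(options, categories)
  # Category symbol => setter name. Only setters shown in the sample are mapped.
  setters = {
    metadata:  :SetMetadata,
    form_data: :SetFormData,
    bookmarks: :SetBookmarks
  }
  categories.each do |category|
    # Fail loudly on an unmapped category instead of silently skipping it.
    setter = setters.fetch(category) do
      raise ArgumentError, "unsupported category: #{category.inspect}"
    end
    options.public_send(setter, true)
  end
  options
end
```

In the sample this would read `options = enable_categories(SanitizeOptions.new, [:metadata, :form_data, :bookmarks])` followed by `Sanitizer.SanitizeDocument(doc, options)`. Raising on an unknown category is a deliberate choice: a silently ignored entry would leave content in the file that the caller believed had been removed.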
