Handwriting ICR to search PDFs and Extract Text - Ruby Sample Code

This sample code shows how to use the Apryse Server OCR module on scanned documents written in multiple languages. It is provided in Python, C++, C# (.NET), Java, Node.js (JavaScript), PHP, Ruby and VB. The OCR module can make PDFs searchable and extract scanned text for further indexing.

Looking for OCR + WebViewer? Check out our OCR - Showcase Sample Code

Learn more about our Server SDK and OCR capabilities.

Implementation steps

To run this sample:

  1. Get started with Server SDK in your language/framework.
  2. Download ICR Module.
  3. Add the sample code provided below.

To use this feature in production, your license key will need the ICR Package. Trial keys already include this package.
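
The sample reads its input from, and writes its results to, relative folders, so it can help to confirm that the input file is present and the output folder exists before running it. The following is a minimal sketch in plain Ruby, using only the standard library. `prepare_paths` is a hypothetical helper and is not part of the Apryse SDK.

```ruby
require 'fileutils'

# Hypothetical helper (not part of the Apryse SDK): checks that the input
# file exists and creates the output folder if it is missing.
# Returns the full input path, or raises if the input file cannot be found.
def prepare_paths(input_dir, output_dir, filename)
  input_file = File.join(input_dir, filename)
  raise ArgumentError, "Input file not found: #{input_file}" unless File.file?(input_file)

  FileUtils.mkdir_p(output_dir) # no-op when the folder already exists
  input_file
end
```

For example, `prepare_paths("../../TestFiles/HandwritingICR/", "../../TestFiles/Output/", "icr.pdf")` could be called before the document is opened, so that a missing file is reported before any ICR work starts.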

#---------------------------------------------------------------------------------------
# Copyright (c) 2001-2026 by Apryse Software Inc. All Rights Reserved.
# Consult LICENSE.txt regarding license information.
#---------------------------------------------------------------------------------------

require '../../../PDFNetC/Lib/PDFNetRuby'
include PDFNetRuby
require '../../LicenseKey/RUBY/LicenseKey'

$stdout.sync = true

# Relative paths to the folder containing test files and to the output folder.
$input_path = "../../TestFiles/HandwritingICR/"
$output_path = "../../TestFiles/Output/"

#---------------------------------------------------------------------------------------
# The Handwriting ICR Module is an optional PDFNet add-on that can be used to extract
# handwriting from image-based pages and apply it as hidden text.
#
# The Apryse SDK Handwriting ICR Module can be downloaded from https://dev.apryse.com/
#---------------------------------------------------------------------------------------

# The first step in every application using PDFNet is to initialize the
# library and set the path to common PDF resources. The library is usually
# initialized only once, but calling Initialize multiple times is also fine.
PDFNet.Initialize(PDFTronLicense.Key)

# The location of the Handwriting ICR Module
PDFNet.AddResourceSearchPath("../../../PDFNetC/Lib/")

begin

  # Test if the add-on is installed
  if !HandwritingICRModule.IsModuleAvailable
    puts 'Unable to run HandwritingICRTest: Apryse SDK Handwriting ICR Module'
    puts 'not available.'
    puts '---------------------------------------------------------------'
    puts 'The Handwriting ICR Module is an optional add-on, available for download'
    puts 'at https://dev.apryse.com/. If you have already downloaded this'
    puts 'module, ensure that the SDK is able to find the required files'
    puts 'using the PDFNet.AddResourceSearchPath() function.'

  else

    # --------------------------------------------------------------------------------
    # Example 1) Process a PDF without specifying options
    puts "Example 1: processing icr.pdf"

    # Open the .pdf document
    doc = PDFDoc.new($input_path + "icr.pdf")

    # Run ICR on the .pdf with the default options
    HandwritingICRModule.ProcessPDF(doc)

    # Save the result with hidden text applied
    doc.Save($output_path + "icr-simple.pdf", SDFDoc::E_linearized)
    doc.Close

    # --------------------------------------------------------------------------------
    # Example 2) Process a subset of PDF pages
    puts "Example 2: processing pages from icr.pdf"

    # Open the .pdf document
    doc = PDFDoc.new($input_path + "icr.pdf")

    # Process handwriting with custom options
    options = HandwritingICROptions.new

    # Optionally, process a subset of pages
    options.SetPages("2-3")

    # Run ICR on the .pdf
    HandwritingICRModule.ProcessPDF(doc, options)

    # Save the result with hidden text applied
    doc.Save($output_path + "icr-pages.pdf", SDFDoc::E_linearized)
    doc.Close

    # --------------------------------------------------------------------------------
    # Example 3) Ignore zones specified for each page
    puts "Example 3: processing & ignoring zones"

    # Open the .pdf document
    doc = PDFDoc.new($input_path + "icr.pdf")

    # Process handwriting with custom options
    options = HandwritingICROptions.new

    # Process page 2, ignoring the signature area at the bottom
    options.SetPages("2")
    ignore_zones_page2 = RectCollection.new
    # These coordinates are in PDF user space, with the origin at the bottom left corner of the page.
    # Coordinates rotate with the page, if it has rotation applied.
    ignore_zones_page2.AddRect(Rect.new(78, 850.1 - 770, 340, 850.1 - 676))
    options.AddIgnoreZonesForPage(ignore_zones_page2, 2)

    # Run ICR on the .pdf
    HandwritingICRModule.ProcessPDF(doc, options)

    # Save the result with hidden text applied
    doc.Save($output_path + "icr-ignore.pdf", SDFDoc::E_linearized)
    doc.Close

    # --------------------------------------------------------------------------------
    # Example 4) The post-processing workflow can also extract ICR results
    # in JSON format, similar to the one used by the OCR Module
    puts "Example 4: extract & apply"

    # Open the .pdf document
    doc = PDFDoc.new($input_path + "icr.pdf")

    # Extract ICR results in JSON format
    json = HandwritingICRModule.GetICRJsonFromPDF(doc)
    File.open($output_path + "icr-get.json", 'w') { |file| file.write(json) }

    # Insert your post-processing step (whatever it might be)
    # ...

    # Apply potentially modified ICR JSON to the PDF
    HandwritingICRModule.ApplyICRJsonToPDF(doc, json)

    # Save the result with hidden text applied
    doc.Save($output_path + "icr-get-apply.pdf", SDFDoc::E_linearized)
    doc.Close

    puts "Done."
  end

rescue => error
  puts "Unable to extract handwriting, error: " + error.message
end

PDFNet.Terminate
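
Example 3 builds its ignore zone with expressions such as `850.1 - 770`. One plausible reading, which is an assumption and is not stated in the sample, is that 850.1 is the page height in points and that 770 and 676 are distances measured from the top of the page, as an image viewer would report them. Under that assumption, the conversion to PDF user space (origin at the bottom left corner) can be sketched in plain Ruby. `top_left_to_pdf_rect` is a hypothetical helper, not an SDK function, and it does not account for page rotation.

```ruby
# Hypothetical helper (not part of the Apryse SDK).
# Converts a rectangle measured from the top-left corner of the page
# (y grows downward) into PDF user space (origin at the bottom left,
# y grows upward). Returns [x1, y1, x2, y2] with y1 <= y2.
def top_left_to_pdf_rect(left, top, right, bottom, page_height)
  y1 = page_height - bottom # lower edge in PDF user space
  y2 = page_height - top    # upper edge in PDF user space
  [left, y1, right, y2]
end

# Assumed values based on Example 3: page height 850.1, zone running from
# 676 (top edge) to 770 (bottom edge), both measured from the top of the page.
x1, y1, x2, y2 = top_left_to_pdf_rect(78, 676, 340, 770, 850.1)
puts format("%.1f %.1f %.1f %.1f", x1, y1, x2, y2)
```

The four values match the arguments the sample passes to `Rect.new`, so a zone measured in a top-left coordinate system could be converted this way before being added to a `RectCollection`.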
