Handwriting ICR to Search PDFs and Extract Text - Python Sample Code

This sample shows how to use the Apryse Server SDK Handwriting ICR module on scanned documents. The code is available in multiple languages: Python, C++, C# (.NET), Java, Node.js (JavaScript), PHP, Ruby and VB. The module can make scanned PDFs searchable by applying the recognized handwriting as hidden text, and it can extract that text for further indexing.

Looking for OCR + WebViewer? Check out our OCR - Showcase Sample Code

Learn more about our Server SDK and OCR capabilities.

Implementation steps

To run this sample, follow these steps:

  1. Get started with Server SDK in your language/framework.
  2. Download ICR Module.
  3. Add the sample code provided below.

To use this feature in production, your license key will need the ICR Package. Trial keys already include this package.
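
Example 3 in the sample code on this page passes its ignore zone in PDF user space, with the origin at the bottom left corner of the page. Its rectangle is written as `850.1 - 770` and `850.1 - 676`, which suggests that the y values were measured from the top edge of a page 850.1 points tall. The following is a minimal sketch of that conversion under that assumption; `top_left_to_pdf_rect` is a hypothetical helper, not part of the Apryse SDK.

```python
def top_left_to_pdf_rect(left, top, right, bottom, page_height):
    """Convert a box measured from the top-left corner to PDF user space.

    Returns (x1, y1, x2, y2) with the origin at the bottom-left corner,
    normalized so that y1 <= y2.
    """
    y_a = page_height - top
    y_b = page_height - bottom
    return (left, min(y_a, y_b), right, max(y_a, y_b))


# Values taken from Example 3; 850.1 is assumed to be the page height in points.
x1, y1, x2, y2 = top_left_to_pdf_rect(78, 676, 340, 770, 850.1)
print(x1, round(y1, 1), x2, round(y2, 1))
```

The four values can then be passed to `Rect(x1, y1, x2, y2)` in the same way the sample does before adding the rectangle to a `RectCollection`.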

#---------------------------------------------------------------------------------------
# Copyright (c) 2001-2026 by Apryse Software Inc. All Rights Reserved.
# Consult LICENSE.txt regarding license information.
#---------------------------------------------------------------------------------------

import site
site.addsitedir("../../../PDFNetC/Lib")
import sys
from PDFNetPython import *

sys.path.append("../../LicenseKey/PYTHON")
from LicenseKey import *

# Relative path to the folder containing test files.
input_path = "../../TestFiles/HandwritingICR/"
output_path = "../../TestFiles/Output/"

def WriteTextToFile(outputFile, text):
    # Write the contents of text to the disk
    f = open(outputFile, "w")
    try:
        f.write(text)
    finally:
        f.close()

# ---------------------------------------------------------------------------------------
# The Handwriting ICR Module is an optional PDFNet add-on that can be used to extract
# handwriting from image-based pages and apply it as hidden text.
#
# The Apryse SDK Handwriting ICR Module can be downloaded from https://dev.apryse.com/
# --------------------------------------------------------------------------------------

def main():

    # The first step in every application using PDFNet is to initialize the
    # library and set the path to common PDF resources. The library is usually
    # initialized only once, but calling Initialize() multiple times is also fine.
    PDFNet.Initialize(LicenseKey)

    # The location of the Handwriting ICR Module
    PDFNet.AddResourceSearchPath("../../../PDFNetC/Lib/")

    # Test if the add-on is installed
    if not HandwritingICRModule.IsModuleAvailable():

        print("""
        Unable to run HandwritingICRTest: Apryse SDK Handwriting ICR Module
        not available.
        ---------------------------------------------------------------
        The Handwriting ICR Module is an optional add-on, available for download
        at https://dev.apryse.com/. If you have already downloaded this
        module, ensure that the SDK is able to find the required files
        using the PDFNet.AddResourceSearchPath() function.""")

    else:

        # --------------------------------------------------------------------------------
        # Example 1) Process a PDF without specifying options
        print("Example 1: processing icr.pdf")

        # Open the .pdf document
        doc = PDFDoc(input_path + "icr.pdf")

        # Run ICR on the .pdf with the default options
        HandwritingICRModule.ProcessPDF(doc)

        # Save the result with hidden text applied
        doc.Save(output_path + "icr-simple.pdf", SDFDoc.e_linearized)
        doc.Close()

        # --------------------------------------------------------------------------------
        # Example 2) Process a subset of PDF pages
        print("Example 2: processing pages from icr.pdf")

        # Open the .pdf document
        doc = PDFDoc(input_path + "icr.pdf")

        # Process handwriting with custom options
        options = HandwritingICROptions()

        # Optionally, process a subset of pages
        options.SetPages("2-3")

        # Run ICR on the .pdf
        HandwritingICRModule.ProcessPDF(doc, options)

        # Save the result with hidden text applied
        doc.Save(output_path + "icr-pages.pdf", SDFDoc.e_linearized)
        doc.Close()

        # --------------------------------------------------------------------------------
        # Example 3) Ignore zones specified for each page
        print("Example 3: processing & ignoring zones")

        # Open the .pdf document
        doc = PDFDoc(input_path + "icr.pdf")

        # Process handwriting with custom options
        options = HandwritingICROptions()

        # Process page 2, ignoring the signature area at the bottom
        options.SetPages("2")
        ignore_zones_page2 = RectCollection()
        # These coordinates are in PDF user space, with the origin at the bottom left corner of the page.
        # Coordinates rotate with the page, if it has rotation applied.
        ignore_zones_page2.AddRect(Rect(78, 850.1 - 770, 340, 850.1 - 676))
        options.AddIgnoreZonesForPage(ignore_zones_page2, 2)

        # Run ICR on the .pdf
        HandwritingICRModule.ProcessPDF(doc, options)

        # Save the result with hidden text applied
        doc.Save(output_path + "icr-ignore.pdf", SDFDoc.e_linearized)
        doc.Close()

        # --------------------------------------------------------------------------------
        # Example 4) The postprocessing workflow also has an option to extract ICR results
        # in JSON format, similar to the one used by the OCR Module
        print("Example 4: extract & apply")

        # Open the .pdf document
        doc = PDFDoc(input_path + "icr.pdf")

        # Extract ICR results in JSON format
        json = HandwritingICRModule.GetICRJsonFromPDF(doc)
        WriteTextToFile(output_path + "icr-get.json", json)

        # Insert your post-processing step (whatever it might be)
        # ...

        # Apply potentially modified ICR JSON to the PDF
        HandwritingICRModule.ApplyICRJsonToPDF(doc, json)

        # Save the result with hidden text applied
        doc.Save(output_path + "icr-get-apply.pdf", SDFDoc.e_linearized)
        doc.Close()

        print("Done.")

    PDFNet.Terminate()


if __name__ == '__main__':
    main()
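
Example 4 leaves the post-processing step open. As one possible illustration, the sketch below walks a parsed JSON tree and applies a function to every string stored under a given key. It uses only the Python standard library. The key name `"text"`, the sample payload, and the helper names `map_text_fields` and `postprocess_icr_json` are assumptions made for this example; the actual layout of the ICR JSON is not described on this page and may differ, so inspect the `icr-get.json` output before relying on any key.

```python
import json


def map_text_fields(node, fn, key="text"):
    """Return a copy of a parsed JSON tree with fn applied to every string under `key`."""
    if isinstance(node, dict):
        return {
            k: fn(v) if k == key and isinstance(v, str) else map_text_fields(v, fn, key)
            for k, v in node.items()
        }
    if isinstance(node, list):
        return [map_text_fields(item, fn, key) for item in node]
    # Numbers, booleans, None and strings under other keys pass through unchanged.
    return node


def postprocess_icr_json(json_text, fn, key="text"):
    """Parse a JSON string, transform the matching text fields, and serialize it again."""
    data = json.loads(json_text)
    return json.dumps(map_text_fields(data, fn, key))


# Hypothetical, heavily simplified payload; the real ICR JSON layout may differ.
sample = '{"Page": [{"Word": [{"text": " hello ", "x": 10}]}]}'
cleaned = postprocess_icr_json(sample, str.strip)
print(cleaned)
```

In the sample, a call like this would sit between `GetICRJsonFromPDF` and `ApplyICRJsonToPDF`. Note that the sample stores the extracted string in a variable named `json`, which would shadow the standard `json` module imported here, so one of the two would need a different name.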
