Handwriting ICR to Search PDFs and Extract Text - C++ Sample Code

This sample code shows how to use the Apryse Server OCR module on scanned documents in multiple languages. It is provided in Python, C++, C# (.NET), Java, Node.js (JavaScript), PHP, Ruby and VB. The OCR module can make PDFs searchable and extract scanned text for further indexing.

Looking for OCR + WebViewer? Check out our OCR - Showcase Sample Code

Learn more about our Server SDK and OCR capabilities.

Implementation steps

To run this sample:

  1. Get started with Server SDK in your language/framework.
  2. Download the ICR Module.
  3. Add the sample code provided below.

To use this feature in production, your license key will need the ICR Package. Trial keys already include this package.

//---------------------------------------------------------------------------------------
// Copyright (c) 2001-2026 by Apryse Software Inc. All Rights Reserved.
// Consult legal.txt regarding legal and license information.
//---------------------------------------------------------------------------------------
#include <PDF/PDFNet.h>
#include <PDF/PDFDoc.h>
#include <PDF/HandwritingICRModule.h>
#include <PDF/HandwritingICROptions.h>
#include <iostream>
#include <fstream>
#include <vector>
#include <string>
#include "../../LicenseKey/CPP/LicenseKey.h"

using namespace std;
using namespace pdftron;
using namespace PDF;
using namespace SDF;

// Writes the given text to a file as UTF-8 bytes.
static void WriteTextToFile(const std::string& filename, const UString& text)
{
    ofstream out_file(filename.c_str(), ofstream::binary);
    string out_buf = text.ConvertToUtf8();
    out_file.write(out_buf.c_str(), out_buf.size());
    out_file.close();
}

//---------------------------------------------------------------------------------------
// The Handwriting ICR Module is an optional PDFNet add-on that can be used to extract
// handwriting from image-based pages and apply it as hidden text.
//
// The Apryse SDK Handwriting ICR Module can be downloaded from https://dev.apryse.com/
//---------------------------------------------------------------------------------------
int main(int argc, char *argv[])
{
    try
    {
        // The first step in every application using PDFNet is to initialize the
        // library and set the path to common PDF resources. The library is usually
        // initialized only once, but calling Initialize() multiple times is also fine.
        PDFNet::Initialize(LicenseKey);

        // The location of the Handwriting ICR Module
        PDFNet::AddResourceSearchPath("../../../Lib/");

        // Test if the add-on is installed
        if (!HandwritingICRModule::IsModuleAvailable())
        {
            cout << endl;
            cout << "Unable to run HandwritingICRTest: Apryse SDK Handwriting ICR Module" << endl;
            cout << "not available." << endl;
            cout << "---------------------------------------------------------------" << endl;
            cout << "The Handwriting ICR Module is an optional add-on, available for download" << endl;
            cout << "at https://dev.apryse.com/. If you have already downloaded this" << endl;
            cout << "module, ensure that the SDK is able to find the required files" << endl;
            cout << "using the PDFNet::AddResourceSearchPath() function." << endl << endl;
            return 0;
        }

        // Relative path to the folder containing test files.
        string input_path = "../../TestFiles/HandwritingICR/";
        string output_path = "../../TestFiles/Output/";

        //--------------------------------------------------------------------------------
        // Example 1) Process a PDF without specifying options
        try
        {
            cout << "Example 1: processing icr.pdf" << endl;

            // Open the .pdf document
            PDFDoc doc(input_path + "icr.pdf");

            // Run ICR on the .pdf with the default options
            HandwritingICRModule::ProcessPDF(doc);

            // Save the result with hidden text applied
            doc.Save(output_path + "icr-simple.pdf", SDFDoc::e_linearized);
        }
        catch (Common::Exception& e)
        {
            cout << e << endl;
        }
        catch (...)
        {
            cout << "Unknown Exception" << endl;
        }

        //--------------------------------------------------------------------------------
        // Example 2) Process a subset of PDF pages
        try
        {
            cout << "Example 2: processing pages from icr.pdf" << endl;

            // Open the .pdf document
            PDFDoc doc(input_path + "icr.pdf");

            // Process handwriting with custom options
            HandwritingICROptions options;

            // Optionally, process a subset of pages
            options.SetPages("2-3");

            // Run ICR on the .pdf
            HandwritingICRModule::ProcessPDF(doc, &options);

            // Save the result with hidden text applied
            doc.Save(output_path + "icr-pages.pdf", SDFDoc::e_linearized);
        }
        catch (Common::Exception& e)
        {
            cout << e << endl;
        }
        catch (...)
        {
            cout << "Unknown Exception" << endl;
        }

        //--------------------------------------------------------------------------------
        // Example 3) Ignore zones specified for each page
        try
        {
            cout << "Example 3: processing & ignoring zones" << endl;

            // Open the .pdf document
            PDFDoc doc(input_path + "icr.pdf");

            // Process handwriting with custom options
            HandwritingICROptions options;

            // Process page 2, ignoring the signature area at the bottom
            options.SetPages("2");
            RectCollection ignore_zones_page2;
            // These coordinates are in PDF user space, with the origin at the bottom left corner of the page.
            // Coordinates rotate with the page, if it has rotation applied.
            ignore_zones_page2.AddRect(78, 850.1 - 770, 340, 850.1 - 676);
            options.AddIgnoreZonesForPage(ignore_zones_page2, 2);

            // Run ICR on the .pdf
            HandwritingICRModule::ProcessPDF(doc, &options);

            // Save the result with hidden text applied
            doc.Save(output_path + "icr-ignore.pdf", SDFDoc::e_linearized);
        }
        catch (Common::Exception& e)
        {
            cout << e << endl;
        }
        catch (...)
        {
            cout << "Unknown Exception" << endl;
        }

        //--------------------------------------------------------------------------------
        // Example 4) The post-processing workflow also has an option to extract ICR results
        // in JSON format, similar to the one used by the OCR Module
        try
        {
            cout << "Example 4: extract & apply" << endl;

            // Open the .pdf document
            PDFDoc doc(input_path + "icr.pdf");

            // Extract ICR results in JSON format
            UString json = HandwritingICRModule::GetICRJsonFromPDF(doc);
            WriteTextToFile(output_path + "icr-get.json", json);

            // Insert your post-processing step (whatever it might be)
            // ...

            // Apply potentially modified ICR JSON to the PDF
            HandwritingICRModule::ApplyICRJsonToPDF(doc, json);

            // Save the result with hidden text applied
            doc.Save(output_path + "icr-get-apply.pdf", SDFDoc::e_linearized);
        }
        catch (Common::Exception& e)
        {
            cout << e << endl;
        }
        catch (...)
        {
            cout << "Unknown Exception" << endl;
        }

        cout << "Done." << endl;

        PDFNet::Terminate();
    }
    catch (Common::Exception& e)
    {
        cout << e << endl;
    }
    catch (...)
    {
        cout << "Unknown Exception" << endl;
    }

    return 0;
}
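
Example 3 passes its ignore zone as 850.1 - 770 and 850.1 - 676, which reads like a conversion from measurements taken from the top of the page into PDF user space (origin at the bottom left). The sketch below shows that conversion on its own, without the SDK. It assumes that 850.1 is the page height in points and that 676 and 770 were measured downward from the top edge; the sample does not state either. ZoneRect and TopLeftToUserSpace are hypothetical names used for illustration and are not part of the Apryse SDK.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical plain rectangle for illustration; not an Apryse SDK type.
struct ZoneRect
{
    double x1, y1, x2, y2;
};

// Converts a rectangle measured from the top-left corner (y grows downward)
// into PDF user space (origin at the bottom left, y grows upward).
// page_height is assumed to be the page height in points.
ZoneRect TopLeftToUserSpace(double left, double top, double right, double bottom,
                            double page_height)
{
    assert(page_height > 0.0);

    ZoneRect r;
    r.x1 = std::min(left, right);
    r.x2 = std::max(left, right);
    // Flipping the y axis swaps which edge is lower, so normalize with min/max.
    r.y1 = std::min(page_height - top, page_height - bottom);
    r.y2 = std::max(page_height - top, page_height - bottom);
    return r;
}
```

Under those assumptions, TopLeftToUserSpace(78, 676, 340, 770, 850.1) gives approximately (78, 80.1, 340, 174.1), the same four values that Example 3 passes to AddRect.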

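Examples 2 and 3 select pages with the strings "2-3" and "2". When the first and last page come from variables, a small helper can build that string. This is a minimal sketch: MakePageRange is a hypothetical helper, not an SDK function, it assumes page numbers are 1-based, and it only produces the two forms that appear in the sample.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper for illustration; not part of the Apryse SDK.
// Builds a page range string in the two forms used by the sample:
// "2" for a single page and "2-3" for a contiguous range.
// Page numbers are assumed to be 1-based.
std::string MakePageRange(int first_page, int last_page)
{
    if (first_page < 1 || last_page < first_page)
    {
        throw std::invalid_argument("page range must satisfy 1 <= first <= last");
    }
    if (first_page == last_page)
    {
        return std::to_string(first_page);
    }
    return std::to_string(first_page) + "-" + std::to_string(last_page);
}
```

MakePageRange(2, 3) returns "2-3" and MakePageRange(2, 2) returns "2", matching the literals passed to SetPages in the sample.
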