Handwriting ICR to search PDFs and Extract Text - PHP Sample Code

This sample code shows how to use the Apryse Server SDK Handwriting ICR module on scanned documents. It is provided in Python, C++, C# (.Net), Java, Node.js (JavaScript), PHP, Ruby and VB. The ICR module can make scanned PDFs searchable and extract handwritten text for further indexing.

Looking for OCR + WebViewer? Check out our OCR - Showcase Sample Code

Learn more about our Server SDK and OCR capabilities.

Implementation steps

To run this sample, you will need:

  1. Get started with the Server SDK in your language/framework.
  2. Download the Handwriting ICR Module.
  3. Add the sample code provided below.

To use this feature in production, your license key will need the ICR Package. Trial keys already include this package.

<?php
//---------------------------------------------------------------------------------------
// Copyright (c) 2001-2026 by Apryse Software Inc. All Rights Reserved.
// Consult LICENSE.txt regarding license information.
//---------------------------------------------------------------------------------------
if(file_exists("../../../PDFNetC/Lib/PDFNetPHP.php"))
    include("../../../PDFNetC/Lib/PDFNetPHP.php");
include("../../LicenseKey/PHP/LicenseKey.php");

// Relative path to the folder containing the test files.
$input_path = getcwd()."/../../TestFiles/HandwritingICR/";
$output_path = getcwd()."/../../TestFiles/Output/";

// Writes $text to $outputFile, overwriting any existing contents.
function WriteTextToFile($outputFile, $text)
{
    $outfile = fopen($outputFile, "w");
    fwrite($outfile, $text);
    fclose($outfile);
}

//---------------------------------------------------------------------------------------
// The Handwriting ICR Module is an optional PDFNet add-on that can be used to extract
// handwriting from image-based pages and apply it as hidden text.
//
// The Apryse SDK Handwriting ICR Module can be downloaded from https://dev.apryse.com/
//---------------------------------------------------------------------------------------

// The first step in every application using PDFNet is to initialize the
// library and set the path to common PDF resources. The library is usually
// initialized only once, but calling Initialize() multiple times is also fine.
PDFNet::Initialize($LicenseKey);

// Wait for fonts to be loaded if they haven't been already. PHP can run into
// errors when shutting down if font loading is still in progress.
PDFNet::GetSystemFontList();

// The location of the Handwriting ICR Module
PDFNet::AddResourceSearchPath("../../../PDFNetC/Lib/");

// Test whether the add-on is installed
if(!HandwritingICRModule::IsModuleAvailable()) {
    echo "Unable to run HandwritingICRTest: PDFTron SDK Handwriting ICR Module\n"
        ."not available.\n"
        ."---------------------------------------------------------------\n"
        ."The Handwriting ICR Module is an optional add-on, available for download\n"
        ."at https://dev.apryse.com/. If you have already downloaded this\n"
        ."module, ensure that the SDK is able to find the required files\n"
        ."using the PDFNet::AddResourceSearchPath() function.\n";
} else {
    //--------------------------------------------------------------------------------
    // Example 1) Process a PDF without specifying options
    echo "Example 1: processing icr.pdf\n";

    // Open the .pdf document
    $doc = new PDFDoc($input_path."icr.pdf");

    // Run ICR on the .pdf with the default options
    HandwritingICRModule::ProcessPDF($doc);

    // Save the result with hidden text applied
    $doc->Save($output_path."icr-simple.pdf", SDFDoc::e_linearized);
    $doc->Close();

    //--------------------------------------------------------------------------------
    // Example 2) Process a subset of PDF pages
    echo "Example 2: processing pages from icr.pdf\n";

    // Open the .pdf document
    $doc = new PDFDoc($input_path."icr.pdf");

    // Process handwriting with custom options
    $options = new HandwritingICROptions();

    // Optionally, process a subset of pages
    $options->SetPages("2-3");

    // Run ICR on the .pdf
    HandwritingICRModule::ProcessPDF($doc, $options);

    // Save the result with hidden text applied
    $doc->Save($output_path."icr-pages.pdf", SDFDoc::e_linearized);
    $doc->Close();

    //--------------------------------------------------------------------------------
    // Example 3) Ignore zones specified for each page
    echo "Example 3: processing & ignoring zones\n";

    // Open the .pdf document
    $doc = new PDFDoc($input_path."icr.pdf");

    // Process handwriting with custom options
    $options = new HandwritingICROptions();

    // Process page 2, ignoring the signature area at the bottom
    $options->SetPages("2");
    $ignore_zones_page2 = new RectCollection();
    // These coordinates are in PDF user space, with the origin at the bottom left corner of the page.
    // Coordinates rotate with the page, if it has rotation applied.
    // The y values are written as 850.1 minus a distance from the top edge of the page;
    // 850.1 is presumably the height of page 2 in points.
    $rect = new Rect(78.0, 850.1 - 770.0, 340.0, 850.1 - 676.0);
    $ignore_zones_page2->AddRect($rect);
    $options->AddIgnoreZonesForPage($ignore_zones_page2, 2);

    // Run ICR on the .pdf
    HandwritingICRModule::ProcessPDF($doc, $options);

    // Save the result with hidden text applied
    $doc->Save($output_path."icr-ignore.pdf", SDFDoc::e_linearized);
    $doc->Close();

    //--------------------------------------------------------------------------------
    // Example 4) The post-processing workflow also offers the option of extracting ICR
    // results in JSON format, similar to the format used by the OCR Module
    echo "Example 4: extract & apply\n";

    // Open the .pdf document
    $doc = new PDFDoc($input_path."icr.pdf");

    // Extract ICR results in JSON format
    $json = HandwritingICRModule::GetICRJsonFromPDF($doc);
    WriteTextToFile($output_path."icr-get.json", $json);

    // Insert your post-processing step (whatever it might be)
    // ...

    // Apply the (potentially modified) ICR JSON to the PDF
    HandwritingICRModule::ApplyICRJsonToPDF($doc, $json);

    // Save the result with hidden text applied
    $doc->Save($output_path."icr-get-apply.pdf", SDFDoc::e_linearized);
    $doc->Close();

    echo "Done.\n";
}
PDFNet::Terminate();

?>
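Example 3 builds its ignore zone by subtracting y values from 850.1, which reads like converting measurements taken from the top of the page into PDF user space (origin at the bottom left). The sketch below isolates that arithmetic in a hypothetical helper, `topLeftToPdfRect`, which is not part of the Apryse SDK. It assumes an unrotated page and that 850.1 is the page height in points; it only does arithmetic, so it runs without PDFNet.

```php
<?php
// Hypothetical helper (not part of the Apryse SDK): converts a rectangle
// measured from the top-left corner of an UNROTATED page into PDF user space,
// where the origin is the bottom-left corner and y grows upward.
// $top and $bottom are distances from the top edge of the page.
function topLeftToPdfRect(float $pageHeight, float $left, float $top,
                          float $right, float $bottom): array
{
    // Flip each y value around the page height. The bottom edge (farther
    // from the top) becomes the smaller y value in PDF user space.
    return [$left, $pageHeight - $bottom, $right, $pageHeight - $top];
}

// Reproduces the numbers used in Example 3, assuming 850.1 is the page height.
[$x1, $y1, $x2, $y2] = topLeftToPdfRect(850.1, 78.0, 676.0, 340.0, 770.0);
printf("x1=%.1f y1=%.1f x2=%.1f y2=%.1f\n", $x1, $y1, $x2, $y2);
// prints: x1=78.0 y1=80.1 x2=340.0 y2=174.1
```

The four values could then be passed to `new Rect($x1, $y1, $x2, $y2)` in place of the inline expressions. The helper ignores page rotation, which the sample notes also affects these coordinates.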

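Example 4 leaves the post-processing step open. One possible step is correcting misrecognized words before calling `ApplyICRJsonToPDF`. The sketch below is a hypothetical helper, `correctIcrText`, not an Apryse API. It assumes, without confirmation from this page, that recognized words appear as JSON objects with a `text` key, as in the OCR Module's JSON; inspect `icr-get.json` to check the real layout. The sample input is made up for illustration.

```php
<?php
// Hypothetical post-processing helper (not part of the Apryse SDK).
// ASSUMPTION: recognized words appear as objects with a "text" key, as in the
// OCR Module's JSON output; inspect icr-get.json to confirm the real layout.
function correctIcrText(string $json, array $corrections): string
{
    // Decode to associative arrays; throw instead of returning null on bad input.
    $data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

    // Walk every nested array and rewrite any "text" value found in the map.
    $walk = function (array &$node) use (&$walk, $corrections): void {
        foreach ($node as $key => &$value) {
            if (is_array($value)) {
                $walk($value);
            } elseif ($key === 'text' && is_string($value)
                      && array_key_exists($value, $corrections)) {
                $value = $corrections[$value];
            }
        }
        unset($value); // break the reference left behind by foreach
    };
    if (is_array($data)) {
        $walk($data);
    }

    return json_encode($data, JSON_THROW_ON_ERROR | JSON_UNESCAPED_UNICODE);
}

// Illustrative input only; this structure is invented for the example.
$sample = '{"Page":[{"num":1,"Word":[{"text":"Jhon","x":10},{"text":"Smith","x":60}]}]}';
echo correctIcrText($sample, ['Jhon' => 'John']), "\n";
```

In the sample, the result would take the place of `$json` before `HandwritingICRModule::ApplyICRJsonToPDF($doc, $json)`. Decoding to PHP arrays turns an empty JSON object `{}` into `[]` on re-encoding; if the real ICR JSON contains empty objects, decoding to `stdClass` objects instead avoids that.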