Handwriting Intelligent Character Recognition (ICR) workflows for the Apryse Server SDK

Requirements

Package: ICR

Module: ICR

This guide walks through handwriting ICR workflows, starting with the simplest use case and moving on to more advanced ones.

Process a scanned document

Make a searchable PDF by adding invisible text to an image-based PDF, such as a scanned document, using Handwriting ICR.

C++

PDFDoc doc(input_pdf_path);

// Run ICR on the .pdf with the default options.
HandwritingICRModule::ProcessPDF(doc);

C#

PDFDoc doc = new PDFDoc(input_pdf_path);

// Run ICR on the .pdf with the default options.
HandwritingICRModule.ProcessPDF(doc);

Go

doc := NewPDFDoc(input_pdf_path)
// Run ICR on the .pdf with the default options.
HandwritingICRModuleProcessPDF(doc)

VB

Using doc As PDFDoc = New PDFDoc(input_pdf_path)

   ' Run ICR on the .pdf with the default options.
   HandwritingICRModule.ProcessPDF(doc)

End Using

Java

PDFDoc doc = new PDFDoc(input_pdf_path);

// Run ICR on the .pdf with the default options.
HandwritingICRModule.processPDF(doc);

Node.js

async function main() {
   const doc = await PDFNet.PDFDoc.createFromFilePath(input_pdf_path);

   // Run ICR on the .pdf with the default options.
   await PDFNet.HandwritingICRModule.processPDF(doc);
}
PDFNet.runWithCleanup(main);

Obj-C

PTPDFDoc * doc = [[PTPDFDoc alloc] initWithFilepath: input_pdf_path];

// Run ICR on the .pdf with the default options.
[PTHandwritingICRModule ProcessPDF: doc options: nil];

Python

doc = PDFDoc(input_pdf_path)

# Run ICR on the .pdf with the default options.
HandwritingICRModule.ProcessPDF(doc)

PHP

$doc = new PDFDoc($input_pdf_path);

// Run ICR on the .pdf with the default options.
HandwritingICRModule::ProcessPDF($doc);

Ruby

doc = PDFDoc.new(input_pdf_path)

# Run ICR on the .pdf with the default options.
HandwritingICRModule.ProcessPDF(doc)

Full code sample to process a scanned document

A full code sample is also available. It adds searchable and selectable text to an image-based PDF, such as a scanned document, and shows how to use the Apryse Handwriting ICR module on scanned documents in multiple programming languages. The Handwriting ICR module can make searchable PDFs and extract scanned text for further indexing. Samples are available in Python, C# (.NET), C++, Go, Java, Node.js (JavaScript), PHP, Ruby, VB, and Obj-C.

Extract handwritten text as JSON

To apply raw ICR output directly to the input document, call HandwritingICRModule.ProcessPDF. In practice, some post-processing is often beneficial, for example running a common spell checker or comparing results against whitelists or blacklists. For this purpose, first extract the text and its corresponding metadata as JSON, post-process it, and then re-apply the processed results to the input document.

C++

// Open the .pdf document.
PDFDoc doc(input_path + "icr.pdf");

// Extract ICR results in JSON format.
UString json = HandwritingICRModule::GetICRJsonFromPDF(doc);

// Post-processing step (whatever it might be).

// Re-apply results.
HandwritingICRModule::ApplyICRJsonToPDF(doc, json);

C#

// Open the .pdf document.
PDFDoc doc = new PDFDoc(input_pdf_path);

// Extract ICR results in JSON format.
string json = HandwritingICRModule.GetICRJsonFromPDF(doc);

// Post-processing step (whatever it might be).

// Re-apply results.
HandwritingICRModule.ApplyICRJsonToPDF(doc, json);

Go

// Open the .pdf document.
doc := NewPDFDoc(input_pdf_path)
// Extract ICR results in JSON format.
json := HandwritingICRModuleGetICRJsonFromPDF(doc)
// Post-processing step (whatever it might be).
// Re-apply results.
HandwritingICRModuleApplyICRJsonToPDF(doc, json)

VB

' Open the .pdf document.
Using doc As PDFDoc = New PDFDoc(input_pdf_path)
   ' Extract ICR results in JSON format.
   Dim json As String = HandwritingICRModule.GetICRJsonFromPDF(doc)

   ' Post-processing step (whatever it might be).

   ' Re-apply results.
   HandwritingICRModule.ApplyICRJsonToPDF(doc, json)

End Using

Java

// Open the .pdf document.
PDFDoc doc = new PDFDoc(input_pdf_path);

// Extract ICR results in JSON format.
String json = HandwritingICRModule.getICRJsonFromPDF(doc);

// Post-processing step (whatever it might be).

// Re-apply results.
HandwritingICRModule.applyICRJsonToPDF(doc, json);

Node.js

async function main() {
   // Open the .pdf document.
   const doc = await PDFNet.PDFDoc.createFromFilePath(input_pdf_path);

   // Extract ICR results in JSON format.
   const json = await PDFNet.HandwritingICRModule.getICRJsonFromPDF(doc);

   // Post-processing step (whatever it might be).

   // Re-apply results.
   await PDFNet.HandwritingICRModule.applyICRJsonToPDF(doc, json);
}
PDFNet.runWithCleanup(main);

Obj-C

// Open the .pdf document.
PTPDFDoc * doc = [[PTPDFDoc alloc] initWithFilepath: input_pdf_path];

// Extract ICR results in JSON format.
NSString * json = [PTHandwritingICRModule GetICRJsonFromPDF: doc options: nil];

// Post-processing step (whatever it might be).

// Re-apply results.
[PTHandwritingICRModule ApplyICRJsonToPDF: doc json: json];

Python

# Open the .pdf document.
doc = PDFDoc(input_pdf_path)

# Extract ICR results in JSON format.
json = HandwritingICRModule.GetICRJsonFromPDF(doc)

# Post-processing step (whatever it might be).

# Re-apply results.
HandwritingICRModule.ApplyICRJsonToPDF(doc, json)

PHP

// Open the .pdf document.
$doc = new PDFDoc($input_pdf_path);

// Extract ICR results in JSON format.
$json = HandwritingICRModule::GetICRJsonFromPDF($doc);

// Post-processing step (whatever it might be).

// Re-apply results.
HandwritingICRModule::ApplyICRJsonToPDF($doc, $json);

Ruby

# Open the .pdf document.
doc = PDFDoc.new(input_pdf_path)

# Extract ICR results in JSON format.
json = HandwritingICRModule.GetICRJsonFromPDF(doc)

# Post-processing step (whatever it might be).

# Re-apply results.
HandwritingICRModule.ApplyICRJsonToPDF(doc, json)
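The post-processing step in the snippets above is left open. As one possible illustration, the following minimal sketch corrects recognized words against a lookup table using only the Python standard library. The correction table and the function names `correct_words` and `postprocess_icr_json` are hypothetical and not part of the SDK. The sketch also assumes that every JSON object carrying a `text` field is a word, so it does not depend on the exact names of the intermediate arrays.

```python
import json

# Hypothetical correction table; a real one might come from a spell checker
# or a domain-specific word list.
CORRECTIONS = {"lnvoice": "Invoice", "T0tal": "Total"}


def correct_words(node, corrections):
    """Recursively visit parsed ICR JSON and fix the "text" of each word.

    Assumption: any object with a string "text" field is treated as a word.
    Returns the number of words that were changed.
    """
    changed = 0
    if isinstance(node, dict):
        text = node.get("text")
        if isinstance(text, str) and text in corrections:
            node["text"] = corrections[text]
            changed += 1
        for value in node.values():
            changed += correct_words(value, corrections)
    elif isinstance(node, list):
        for item in node:
            changed += correct_words(item, corrections)
    return changed


def postprocess_icr_json(icr_json, corrections=CORRECTIONS):
    """Parse the ICR JSON string, apply corrections, and serialize it again."""
    data = json.loads(icr_json)
    correct_words(data, corrections)
    return json.dumps(data)
```

The string returned by `postprocess_icr_json` could then be passed to ApplyICRJsonToPDF in place of the raw JSON. Coordinates and other metadata are left untouched, so only the text content changes.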

Output Attributes

ICR output consists of nested arrays:

Array of pages.
   Array of paragraphs.
      Array of lines.
         Array of words.

Pages have additional metadata:

Attribute	Value	Description
num		page number
dpi		document resolution (needed to correctly scale the coordinates from points to pixels)
origin	TopLeft	coordinate system has origin at the top left corner (default)
	BottomLeft	coordinate system has origin at the bottom left corner (i.e., PDF page coordinate system)
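The `dpi` and `origin` attributes matter when coordinates are used outside the PDF. The sketch below shows the usual conversion between typographic points (1/72 inch) and pixels, and a y-axis flip between the two origins. The function names are hypothetical, and `page_height` is an assumed input that the caller must supply in the same unit as `y`, since it is not one of the attributes listed above.

```python
POINTS_PER_INCH = 72.0  # 1 typographic point = 1/72 inch


def points_to_pixels(value_pt, dpi):
    """Scale a length or coordinate from points to pixels at the given dpi."""
    return value_pt * dpi / POINTS_PER_INCH


def pixels_to_points(value_px, dpi):
    """Scale a length or coordinate from pixels at the given dpi to points."""
    return value_px * POINTS_PER_INCH / dpi


def flip_y(y, page_height):
    """Convert a single y coordinate between TopLeft and BottomLeft origins.

    page_height is a caller-supplied assumption in the same unit as y.
    """
    return page_height - y
```

For example, at 96 dpi a value of 72 points corresponds to 96 pixels.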

Then, each word in the ICR output includes the following:

Attribute	Value	Description
x		bounding box lower left corner x coordinate
y		bounding box lower left corner y coordinate
length		length of bounding box
font-size		text's font size
text		text output
orientation	L	270 degrees clockwise rotation
	R	90 degrees clockwise rotation
	D	180 degrees clockwise rotation
	U	0 degrees clockwise rotation

Each line has an optional `box` property consisting of 4 values having the same interpretation as `pdftron::PDF::Rect`.
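When consuming the JSON, the orientation letters can be mapped to rotation angles with a small lookup, as in the sketch below. The helper name `orientation_to_degrees` is hypothetical, and treating a missing `orientation` as upright (`U`) is an assumption rather than documented behavior.

```python
# Clockwise rotation in degrees for each orientation letter in the table above.
ORIENTATION_DEGREES = {"U": 0, "R": 90, "D": 180, "L": 270}


def orientation_to_degrees(word):
    """Return the clockwise rotation of a word parsed from the ICR JSON.

    Assumption: a word without an "orientation" field is treated as upright.
    """
    return ORIENTATION_DEGREES[word.get("orientation", "U")]
```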

External ICR results

The API can also be used to apply ICR JSON generated by other OCR or ICR engines. The expected structure of the input JSON is:

JSON

{
   "Page": [
      {
         "Word": [
            {
               "font-size": 12,
               "length": 43,
               "text": "ABC",
               "x": 321,
               "y": 141
            }
         ],
         "num": 1,
         "dpi": 96,
         "origin": "TopLeft"
      }
   ]
}

Note that this ICR structure is simplified: the input is expected to be an array of Page objects, each containing a Word array. Each Word is described by its text content and 4 typographic point values (font-size 12, x 321, y 141, and length 43 in the example above), which are needed to construct the bounding box for placing the text on the page.
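As an illustration, results from another engine could be converted into this structure with a few lines of standard-library Python. This is a minimal sketch: the function name `build_icr_json` and the input layout (a list of page dictionaries with `num`, `dpi`, and a `words` list of tuples) are assumptions made for the example, not part of the SDK.

```python
import json


def build_icr_json(pages):
    """Build the simplified Page/Word JSON from external recognition results.

    pages is a hypothetical input format: a list of dicts with "num", "dpi",
    an optional "origin", and "words" as (text, x, y, length, font_size) tuples.
    """
    doc = {"Page": []}
    for page in pages:
        words = [
            {"font-size": font_size, "length": length, "text": text, "x": x, "y": y}
            for (text, x, y, length, font_size) in page["words"]
        ]
        doc["Page"].append(
            {
                "Word": words,
                "num": page["num"],
                "dpi": page["dpi"],
                "origin": page.get("origin", "TopLeft"),
            }
        )
    return json.dumps(doc, indent=3)


# Reproduces the example structure shown above.
example = build_icr_json(
    [{"num": 1, "dpi": 96, "words": [("ABC", 321, 141, 43, 12)]}]
)
```

The resulting string has the same shape as the JSON example above and could be passed to ApplyICRJsonToPDF.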
