PDFTron pdfnet-node Class: OCROptions

Class: OCROptions

PDFNet.OCRModule.OCROptions

new OCROptions()

An object containing options for OCRModule functions

Methods

addDPI(value)

Knowing the correct image resolution matters because it lets the OCR engine translate the pixel heights of characters into their respective font sizes. The resolution is read from the input's metadata where possible, but that metadata is occasionally corrupt or missing. This method manually overrides the source's resolution; the override supersedes any metadata found, whether explicit (as in image metadata) or implicit (as in PDF).

Parameters:

Name	Type	Description
`value`	number	image resolution, in dots per inch (DPI)

Returns:

this object, for call chaining

Type: PDFNet.OCRModule.OCROptions
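
The relationship between resolution and font size described for addDPI can be sketched with simple arithmetic. This is an illustration only, not the OCR engine's actual computation: `pixelHeightToPoints` is a hypothetical helper, and it assumes the usual definition of 72 points per inch and treats the character's pixel height as a rough stand-in for its font size.

```javascript
// Hypothetical helper, not part of PDFNet: shows why the DPI value matters.
// One inch is 72 points, so a glyph that is `pixelHeight` pixels tall in an
// image scanned at `dpi` dots per inch is about pixelHeight * 72 / dpi points.
const POINTS_PER_INCH = 72;

function pixelHeightToPoints(pixelHeight, dpi) {
  if (!(dpi > 0)) {
    throw new RangeError('dpi must be a positive number');
  }
  return (pixelHeight * POINTS_PER_INCH) / dpi;
}

// The same 50-pixel glyph maps to very different sizes at different resolutions.
console.log(pixelHeightToPoints(50, 300)); // 12
console.log(pixelHeightToPoints(50, 72));  // 50
```

If the metadata reports 72 DPI for a scan that was really made at 300 DPI, the estimated size would be off by roughly a factor of four, which is the kind of error a manual override avoids.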

addIgnoreZonesForPage(regions, page_num)

Adds a collection of ignorable regions for the given page: an optional list of page areas to exclude from analysis.

Parameters:

Name	Type	Description
`regions`	Array.<PDFNet.Rect>	the zones to be added to the ignore list
`page_num`	number	the page number the added regions belong to

Returns:

this object, for call chaining

Type: PDFNet.OCRModule.OCROptions

addLang(lang)

Adds a language to the list of languages to be considered when processing this document.

Parameters:

Name	Type	Description
`lang`	string	the new language to be added to the language list

Returns:

this object, for call chaining

Type: PDFNet.OCRModule.OCROptions
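
A minimal sketch of assembling an options object from plain settings follows. `buildOcrOptions` is a hypothetical helper, and the constructor is passed in as a parameter (expected to be `PDFNet.OCRModule.OCROptions`) so the sketch makes no assumption about how PDFNet is loaded or initialized. The language codes shown in the comment (`'eng'`, `'deu'`) are assumed values; check which codes your OCR module accepts.

```javascript
// Hypothetical helper, not part of PDFNet.
// OCROptionsCtor: expected to be PDFNet.OCRModule.OCROptions.
// settings.langs: language identifiers, e.g. ['eng', 'deu'] (assumed codes).
// settings.dpi:   optional manual resolution override.
function buildOcrOptions(OCROptionsCtor, { langs = [], dpi } = {}) {
  const opts = new OCROptionsCtor();
  for (const lang of langs) {
    opts.addLang(lang);
  }
  // Only override the resolution when the caller actually knows it;
  // otherwise the metadata-derived resolution stays in effect.
  if (dpi !== undefined) {
    opts.addDPI(dpi);
  }
  return opts;
}
```

Because each method returns the options object itself, the same configuration can also be written as a single chained expression, for example `new OCROptionsCtor().addLang(lang).addDPI(dpi)`.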

addTextZonesForPage(regions, page_num)

Adds a collection of text regions of interest for the given page: an optional list of known text zones that will be used to improve OCR quality.

Parameters:

Name	Type	Description
`regions`	Array.<PDFNet.Rect>	the zones to be added to the text region list
`page_num`	number	the page number the added regions belong to

Returns:

this object, for call chaining

Type: PDFNet.OCRModule.OCROptions
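
Both addIgnoreZonesForPage and addTextZonesForPage take all regions for one page in a single call. When zones arrive as a flat list, they can be grouped per page first, as in the sketch below. `applyZonesByPage` and the `{ pageNum, kind, rect }` record shape are hypothetical; the `rect` values are assumed to be `PDFNet.Rect` objects created elsewhere. Grouping into one call per page is a cautious choice, since this page does not say whether repeated calls for the same page append or replace.

```javascript
// Hypothetical helper, not part of PDFNet.
// opts:  an OCROptions object.
// zones: array of { pageNum, kind, rect }, where kind is 'text' or 'ignore'
//        and rect is assumed to be a PDFNet.Rect built by the caller.
function applyZonesByPage(opts, zones) {
  const groups = new Map();
  for (const { pageNum, kind, rect } of zones) {
    if (kind !== 'text' && kind !== 'ignore') {
      throw new Error(`unknown zone kind: ${kind}`);
    }
    const key = `${kind}:${pageNum}`;
    if (!groups.has(key)) {
      groups.set(key, { pageNum, kind, rects: [] });
    }
    groups.get(key).rects.push(rect);
  }
  // One call per (kind, page) pair, in first-seen order.
  for (const { pageNum, kind, rects } of groups.values()) {
    if (kind === 'text') {
      opts.addTextZonesForPage(rects, pageNum);
    } else {
      opts.addIgnoreZonesForPage(rects, pageNum);
    }
  }
  return opts;
}
```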

setIgnoreExistingText(value)

Sets the value for IgnoreExistingText in the options object

Parameters:

Name	Type	Description
`value`	boolean	Default value is false, so that areas with existing text will be automatically skipped during OCR. Setting to true probably only makes sense when used with GetOCRJson/XML, as pre-existing text might end up being duplicated in the document when used with ImageToPDF and ProcessPDF.

Returns:

this object, for call chaining

Type: PDFNet.OCRModule.OCROptions

setOCREngine(value)

Sets the backend processing engine to use for OCR operations. Options include 'default', 'any', or 'iris'. The chosen module must be present and correctly licensed.

Parameters:

Name	Type	Description
`value`	string	Options include 'default', 'any', or 'iris'.

Returns:

this object, for call chaining

Type: PDFNet.OCRModule.OCROptions

setUsePDFPageCoords(value)

Sets the value for UsePDFPageCoords in the options object. This sets the origin of the coordinate system for input/output.

Parameters:

Name	Type	Description
`value`	boolean	If true, the origin of the input/output coordinate system is the bottom-left corner and the y-axis points upward; otherwise the origin is the top-left corner and the y-axis points downward.

Returns:

this object, for call chaining

Type: PDFNet.OCRModule.OCROptions
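
The two coordinate conventions selected by setUsePDFPageCoords differ only in where the origin sits and which way the y-axis points. The sketch below converts a rectangle between them. `flipRectY` and the plain `{ x1, y1, x2, y2 }` shape are hypothetical, and the sketch assumes that both systems use the same units and that the page height is known in those units; this page does not state the units used in either mode.

```javascript
// Hypothetical helper, not part of PDFNet.
// Converts a rectangle between a top-left-origin system (y grows downward)
// and a bottom-left-origin system (y grows upward). The mapping is its own
// inverse, so the same function works in both directions.
function flipRectY(rect, pageHeight) {
  return {
    x1: rect.x1,
    y1: pageHeight - rect.y2, // the old far edge becomes the new near edge
    x2: rect.x2,
    y2: pageHeight - rect.y1,
  };
}

// A zone 50 units below the top of a 792-unit-tall page...
const topLeftZone = { x1: 100, y1: 50, x2: 300, y2: 80 };
// ...sits between y = 712 and y = 742 when measured from the bottom.
console.log(flipRectY(topLeftZone, 792)); // { x1: 100, y1: 712, x2: 300, y2: 742 }
```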
