PDFTron pdfnet-node Class: WordOutputOptions

new WordOutputOptions()

A class containing options common to the ToWord functions.

Members

<static> SearchableImageSetting

Type:

number

Properties:

Name	Type	Description
`e_ocr_image_text`	number	Deprecated. OCR will be performed.
`e_ocr_image`	number	Deprecated. OCR will not be performed.
`e_ocr_text`	number	Indicates that OCR will be performed on scanned pages, and the recognized text replaces the image pixels underneath (default).
`e_ocr_off`	number	Indicates that OCR will not be performed.
`e_ocr_always`	number	Indicates that OCR will always be performed on all pages, and the recognized text replaces the image pixels underneath.

<static> WordOutputFormat

Type:

number

Properties:

Name	Type	Description
`e_wof_docx`	number	DOCX output.
`e_wof_doc`	number	Deprecated. DOCX is used automatically instead.
`e_wof_rtf`	number	RTF output.
`e_wof_txt`	number	TXT output.

Methods

setConnectHyphens(connect)

Specifies whether hyphens in the PDF should be connected. This only works with English words. Default is false.

Parameters:

Name	Type	Description
`connect`	boolean	if true, hyphens in the PDF will be connected.

Returns:

this object, for call chaining

Type: PDFNet.Convert.WordOutputOptions
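
Every setter on this class returns the options object itself, so calls can be chained. The following is a minimal sketch of that pattern. It uses a hypothetical stand-in class (`FakeWordOutputOptions`, not part of the library) so that it runs without a PDFNet installation; with the SDK, the chain would presumably start from `new PDFNet.Convert.WordOutputOptions()` instead.

```javascript
// Hypothetical stand-in that mimics the chaining contract documented here.
// It only records the values it is given; it is NOT the PDFNet class.
class FakeWordOutputOptions {
  constructor() {
    this.settings = {};
  }

  setConnectHyphens(connect) {
    this.settings.connectHyphens = Boolean(connect);
    return this; // returning `this` is what makes chaining possible
  }

  setPages(pageFrom, pageTo) {
    this.settings.pages = [pageFrom, pageTo];
    return this;
  }
}

// Chain two setters in a single expression.
const chained = new FakeWordOutputOptions()
  .setConnectHyphens(true)
  .setPages(1, -1);

console.log(chained.settings);
```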

setCustomOCRLanguage(ocrlang)

Specifies the custom OCR languages to use. Note: Use 3-letter ISO 639-2 language codes, separated by spaces. Example: "eng deu spa fra". The default is English.

Parameters:

Name	Type	Description
`ocrlang`	string	the OCR language(s).

Returns:

this object, for call chaining

Type: PDFNet.Convert.WordOutputOptions
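
The space-separated format above is easy to get wrong when the language list comes from configuration. A minimal sketch of a helper that builds the string follows. `buildOcrLanguageString` is a hypothetical name, and lowercasing and de-duplicating the codes are assumptions of this sketch, not documented library behavior.

```javascript
// Hypothetical helper: joins language codes into the space-separated string
// format described for setCustomOCRLanguage.
function buildOcrLanguageString(codes) {
  if (!Array.isArray(codes) || codes.length === 0) {
    throw new TypeError("expected a non-empty array of language codes");
  }
  // Assumption: codes are normalized to lowercase before use.
  const cleaned = codes.map((code) => String(code).trim().toLowerCase());
  for (const code of cleaned) {
    // Shape check only: three ASCII letters. This does not verify that the
    // code is a real ISO 639-2 entry or that the OCR engine supports it.
    if (!/^[a-z]{3}$/.test(code)) {
      throw new RangeError(`not a 3-letter language code: "${code}"`);
    }
  }
  // Drop duplicates while keeping first-seen order.
  return [...new Set(cleaned)].join(" ");
}

const ocrLanguages = buildOcrLanguageString(["eng", "deu", "spa", "fra"]);
console.log(ocrLanguages); // "eng deu spa fra"
```

The resulting string is what would be passed as the `ocrlang` argument.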

setFootnotesSetting(option)

Specifies how footnotes should be converted. Default is e_Recover, which will include them as footnotes.

Parameters:

Name	Type	Description
`option`	number	PDFNet.Convert.StructuredOutput.SectionConversionSetting = { e_Recover: 0, e_DoNotDetect: 1, e_DetectAndRemove: 2 } The footnotes setting.

Returns:

this object, for call chaining

Type: PDFNet.Convert.WordOutputOptions

setHeadersAndFootersSetting(option)

Specifies how headers and footers should be converted. Default is e_Recover, which will include them as headers and footers.

Parameters:

Name	Type	Description
`option`	number	PDFNet.Convert.StructuredOutput.SectionConversionSetting = { e_Recover: 0, e_DoNotDetect: 1, e_DetectAndRemove: 2 } The header and footer setting.

Returns:

this object, for call chaining

Type: PDFNet.Convert.WordOutputOptions
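
Both section setters take the same SectionConversionSetting values, so they can be validated and applied together. The sketch below assumes only the two setter names and the numeric values listed above. `applySectionSettings` and `makeRecordingOptions` are hypothetical helpers, and the recording stub stands in for a real options object so the example runs on its own.

```javascript
// Values copied from the SectionConversionSetting listing above.
const SectionConversionSetting = Object.freeze({
  e_Recover: 0,
  e_DoNotDetect: 1,
  e_DetectAndRemove: 2,
});

// Hypothetical helper: validates both values, then applies them to any object
// that exposes the two setters documented above.
function applySectionSettings(options, { footnotes, headersAndFooters }) {
  const valid = Object.values(SectionConversionSetting);
  for (const value of [footnotes, headersAndFooters]) {
    if (!valid.includes(value)) {
      throw new RangeError(`unknown SectionConversionSetting value: ${value}`);
    }
  }
  return options
    .setFootnotesSetting(footnotes)
    .setHeadersAndFootersSetting(headersAndFooters);
}

// Hypothetical stub that records setter calls instead of configuring PDFNet.
function makeRecordingOptions() {
  const calls = [];
  const stub = {
    calls,
    setFootnotesSetting(value) {
      calls.push(["setFootnotesSetting", value]);
      return stub;
    },
    setHeadersAndFootersSetting(value) {
      calls.push(["setHeadersAndFootersSetting", value]);
      return stub;
    },
  };
  return stub;
}

// Keep footnotes, but detect and remove running headers and footers.
const sectionDemo = applySectionSettings(makeRecordingOptions(), {
  footnotes: SectionConversionSetting.e_Recover,
  headersAndFooters: SectionConversionSetting.e_DetectAndRemove,
});
console.log(sectionDemo.calls);
```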

setLanguage(language)

Specifies the OCR language. Default is automatic language detection. Note: This option is only available for e_reflow_paragraphs mode.

Parameters:

Name	Type	Description
`language`	number	PDFNet.Convert.OutputOptionsOCR.LanguageChoice = { e_lang_auto: 0, e_lang_catalan: 1, e_lang_danish: 2, e_lang_german: 3, e_lang_english: 4, e_lang_spanish: 5, e_lang_finnish: 6, e_lang_french: 7, e_lang_italian: 8, e_lang_dutch: 9, e_lang_norwegian: 10, e_lang_portuguese: 11, e_lang_polish: 12, e_lang_romanian: 13, e_lang_russian: 14, e_lang_slovenian: 15, e_lang_swedish: 16, e_lang_turkish: 17 } the OCR language.

Returns:

this object, for call chaining

Type: PDFNet.Convert.WordOutputOptions
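
Applications often hold the document language as a locale tag rather than as a LanguageChoice value. The sketch below shows one possible lookup. The numeric values are copied from the listing above; the mapping from ISO 639-1 two-letter codes and the helper name `languageChoiceFor` are assumptions of this example and are not provided by the library.

```javascript
// Values copied from the LanguageChoice listing above.
const LanguageChoice = Object.freeze({
  e_lang_auto: 0, e_lang_catalan: 1, e_lang_danish: 2, e_lang_german: 3,
  e_lang_english: 4, e_lang_spanish: 5, e_lang_finnish: 6, e_lang_french: 7,
  e_lang_italian: 8, e_lang_dutch: 9, e_lang_norwegian: 10,
  e_lang_portuguese: 11, e_lang_polish: 12, e_lang_romanian: 13,
  e_lang_russian: 14, e_lang_slovenian: 15, e_lang_swedish: 16,
  e_lang_turkish: 17,
});

// Assumption: ISO 639-1 codes are used as lookup keys. This table belongs to
// the sketch, not to PDFNet.
const ISO_TO_CHOICE = Object.freeze({
  ca: "e_lang_catalan", da: "e_lang_danish", de: "e_lang_german",
  en: "e_lang_english", es: "e_lang_spanish", fi: "e_lang_finnish",
  fr: "e_lang_french", it: "e_lang_italian", nl: "e_lang_dutch",
  no: "e_lang_norwegian", pt: "e_lang_portuguese", pl: "e_lang_polish",
  ro: "e_lang_romanian", ru: "e_lang_russian", sl: "e_lang_slovenian",
  sv: "e_lang_swedish", tr: "e_lang_turkish",
});

// Hypothetical helper: returns the LanguageChoice value for a tag such as
// "de", "DE" or "fr-CA", falling back to automatic detection when unknown.
function languageChoiceFor(tag) {
  const primary = String(tag || "").trim().toLowerCase().split(/[-_]/)[0];
  const known = Object.prototype.hasOwnProperty.call(ISO_TO_CHOICE, primary);
  return LanguageChoice[known ? ISO_TO_CHOICE[primary] : "e_lang_auto"];
}

console.log(languageChoiceFor("de"));    // 3
console.log(languageChoiceFor("fr-CA")); // 7
console.log(languageChoiceFor("ja"));    // 0 (not in the list, so automatic)
```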

setPages(page_from, page_to)

Specifies a range of pages to be converted. By default all pages are converted. The first page has the page number of 1.

Parameters:

Name	Type	Description
`page_from`	number	the first page to be converted.
`page_to`	number	the last page to be converted (inclusive). Use a negative value to specify the last page in the PDF.

Returns:

this object, for call chaining

Type: PDFNet.Convert.WordOutputOptions
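
The page-range rules above (1-based numbering, inclusive end, negative `page_to` meaning the last page) can be illustrated with a small calculation. `resolvePageRange` is a hypothetical helper for this example. Rejecting out-of-range values is an assumption of the sketch, since this page does not say how the library treats them.

```javascript
// Hypothetical helper: shows which pages a (page_from, page_to) pair selects
// in a document with `pageCount` pages, following the rules stated above.
function resolvePageRange(pageFrom, pageTo, pageCount) {
  if (!Number.isInteger(pageCount) || pageCount < 1) {
    throw new RangeError("pageCount must be a positive integer");
  }
  // A negative page_to stands for the last page in the PDF.
  const last = pageTo < 0 ? pageCount : pageTo;
  // Pages are numbered from 1.
  if (!Number.isInteger(pageFrom) || pageFrom < 1) {
    throw new RangeError("page_from must be an integer >= 1");
  }
  // Assumption: ranges outside the document are treated as errors here.
  if (!Number.isInteger(last) || last < pageFrom || last > pageCount) {
    throw new RangeError("page_to must lie between page_from and pageCount");
  }
  // The end of the range is inclusive.
  return { first: pageFrom, last, count: last - pageFrom + 1 };
}

console.log(resolvePageRange(1, -1, 12)); // { first: 1, last: 12, count: 12 }
console.log(resolvePageRange(3, 5, 12));  // { first: 3, last: 5, count: 3 }
```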

setPDFPassword(password)

Specifies the password if the PDF requires one.

Parameters:

Name	Type	Description
`password`	string	the PDF password, if required; an empty string otherwise.

Returns:

this object, for call chaining

Type: PDFNet.Convert.WordOutputOptions

setPreferredOCREngine(engine)

Specifies the preferred OCR engine. Default is solidocr.

Parameters:

Name	Type	Description
`engine`	number	PDFNet.Convert.OutputOptionsOCR.PreferredOCREngine = { e_engine_default: 0, e_engine_tesseract: 1 } the OCR engine.

Returns:

this object, for call chaining

Type: PDFNet.Convert.WordOutputOptions

setPrioritizeVisualAppearance(replica)

Specifies whether to prefer an exact visual replica of the PDF at the expense of preventing reflow of document paragraphs. Default is false.

Parameters:

Name	Type	Description
`replica`	boolean	False is preferred for most documents that contain paragraphs. Consider using true for documents that don't flow, such as CAD drawings or Illustrator-generated files.

Returns:

this object, for call chaining

Type: PDFNet.Convert.WordOutputOptions

setSearchableImageSetting(setting)

Specifies how scanned image pages should be converted. Default is e_ocr_text.

Parameters:

Name	Type	Description
`setting`	number	PDFNet.Convert.WordOutputOptions.SearchableImageSetting = { e_ocr_text: 2, e_ocr_off: 3, e_ocr_always: 4 } the searchable image setting.

Returns:

this object, for call chaining

Type: PDFNet.Convert.WordOutputOptions
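
The three non-deprecated settings differ only in when OCR runs. A minimal sketch of choosing between them from a readable mode name follows. The numeric values are copied from the parameter description above; the mode names and the helper `searchableImageSettingFor` are hypothetical.

```javascript
// Values copied from the SearchableImageSetting listing above.
const SearchableImageSetting = Object.freeze({
  e_ocr_text: 2,
  e_ocr_off: 3,
  e_ocr_always: 4,
});

// Hypothetical helper: maps a readable mode name to a setting value.
function searchableImageSettingFor(mode = "scanned") {
  switch (mode) {
    case "scanned": // OCR on scanned pages only (the documented default)
      return SearchableImageSetting.e_ocr_text;
    case "off": // never run OCR
      return SearchableImageSetting.e_ocr_off;
    case "always": // run OCR on all pages
      return SearchableImageSetting.e_ocr_always;
    default:
      throw new RangeError(`unknown OCR mode: "${mode}"`);
  }
}

console.log(searchableImageSettingFor());         // 2
console.log(searchableImageSettingFor("always")); // 4
```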

setWordOutputFormat(format)

Specifies the output document format (DOCX, RTF, TXT). This is most useful when the output file extension is not .docx, .rtf or .txt. Note: The DOC file format is now deprecated; DOCX is used automatically instead.

Parameters:

Name	Type	Description
`format`	number	PDFNet.Convert.WordOutputOptions.WordOutputFormat = { e_wof_docx: 0, e_wof_rtf: 2, e_wof_txt: 3 } the output document format (DOCX, RTF, TXT).

Returns:

this object, for call chaining

Type: PDFNet.Convert.WordOutputOptions
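
Since the explicit format matters mainly when the file extension does not identify it, one possible approach is to derive the format from the extension and fall back to an explicit choice otherwise. The sketch below uses the WordOutputFormat values listed above. `wordOutputFormatFor` and its DOCX fallback are assumptions of this example, not library behavior.

```javascript
// Values copied from the WordOutputFormat listing above.
const WordOutputFormat = Object.freeze({
  e_wof_docx: 0,
  e_wof_rtf: 2,
  e_wof_txt: 3,
});

// Hypothetical helper: picks a format from the output path's extension and
// uses `fallback` when the extension is missing or unrecognized.
function wordOutputFormatFor(outputPath, fallback = WordOutputFormat.e_wof_docx) {
  // Capture the text after the final dot of the last path segment.
  const match = /\.([^./\\]+)$/.exec(String(outputPath));
  const extension = match ? match[1].toLowerCase() : "";
  switch (extension) {
    case "docx":
      return WordOutputFormat.e_wof_docx;
    case "rtf":
      return WordOutputFormat.e_wof_rtf;
    case "txt":
      return WordOutputFormat.e_wof_txt;
    default:
      return fallback;
  }
}

console.log(wordOutputFormatFor("out/report.RTF")); // 2
console.log(wordOutputFormatFor("out/export.bin", WordOutputFormat.e_wof_txt)); // 3
```

The returned number is what would be passed as the `format` argument.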
