Class: WordOutputOptions

PDFNet.Convert.WordOutputOptions


new WordOutputOptions()

A class containing options common to the ToWord conversion functions.

Members


<static> SearchableImageSetting

Type:
  • number
Properties:
Name Type Description
e_ocr_image_text number Deprecated. OCR will be performed.
e_ocr_image number Deprecated. OCR will not be performed.
e_ocr_text number Indicates that OCR will be performed on scanned pages, and the recognized text replaces the image pixels underneath (default).
e_ocr_off number Indicates that OCR will not be performed.
e_ocr_always number Indicates that OCR will always be performed on all pages, and the recognized text replaces the image pixels underneath.

<static> WordOutputFormat

Type:
  • number
Properties:
Name Type Description
e_wof_docx number
e_wof_doc number Deprecated. DOCX is used automatically instead.
e_wof_rtf number
e_wof_txt number

Methods


setConnectHyphens(connect)

Specifies whether hyphens in the PDF should be connected. This only works with English words. Default is false.
Parameters:
Name Type Description
connect boolean if true, hyphens in the PDF will be connected.
Returns:
this object, for call chaining
Type
PDFNet.Convert.WordOutputOptions

setCustomOCRLanguage(ocrlang)

Specifies the custom OCR languages to use. Note: Use 3-letter ISO 639-2 language codes, separated by spaces. Example: "eng deu spa fra". The default is English.
Parameters:
Name Type Description
ocrlang string the OCR language(s).
Returns:
this object, for call chaining
Type
PDFNet.Convert.WordOutputOptions
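
The space-separated format described above can be assembled from a list of codes. The following is a minimal sketch, not part of PDFNet: the helper name buildOcrLanguageString is hypothetical, and it only checks that each entry looks like a 3-letter code, not that the code is a valid ISO 639-2 entry or that the OCR engine supports it.

```javascript
// Hypothetical helper (not part of PDFNet): joins 3-letter language codes
// into the space-separated string that setCustomOCRLanguage expects.
function buildOcrLanguageString(codes) {
  if (!Array.isArray(codes) || codes.length === 0) {
    throw new TypeError("codes must be a non-empty array of strings");
  }
  return codes
    .map((code) => {
      const normalized = String(code).trim().toLowerCase();
      // Shape check only: three ASCII letters. This does not verify that
      // the code exists in ISO 639-2.
      if (!/^[a-z]{3}$/.test(normalized)) {
        throw new RangeError(`Not a 3-letter language code: "${code}"`);
      }
      return normalized;
    })
    .join(" ");
}
```

The result could then be passed along as options.setCustomOCRLanguage(buildOcrLanguageString(["eng", "deu"])).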

setLanguage(language)

Specifies the OCR language. Default is automatic language detection. Note: This option is only available for e_reflow_paragraphs mode.
Parameters:
Name Type Description
language number
PDFNet.Convert.OutputOptionsOCR.LanguageChoice = {
	e_lang_auto: 0,
	e_lang_catalan: 1,
	e_lang_danish: 2,
	e_lang_german: 3,
	e_lang_english: 4,
	e_lang_spanish: 5,
	e_lang_finnish: 6,
	e_lang_french: 7,
	e_lang_italian: 8,
	e_lang_dutch: 9,
	e_lang_norwegian: 10,
	e_lang_portuguese: 11,
	e_lang_polish: 12,
	e_lang_romanian: 13,
	e_lang_russian: 14,
	e_lang_slovenian: 15,
	e_lang_swedish: 16,
	e_lang_turkish: 17
}
the OCR language.
Returns:
this object, for call chaining
Type
PDFNet.Convert.WordOutputOptions

setPages(page_from, page_to)

Specifies a range of pages to be converted. By default all pages are converted. The first page has the page number of 1.
Parameters:
Name Type Description
page_from number the first page to be converted.
page_to number the last page to be converted (inclusive). Use a negative value to specify the last page in the PDF.
Returns:
this object, for call chaining
Type
PDFNet.Convert.WordOutputOptions
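
The page numbering rules above (1-based, inclusive, negative page_to meaning the last page) can be wrapped in a small guard. This is a sketch under stated assumptions: applyPageRange is a hypothetical helper, and it works with any object that exposes a chainable setPages(page_from, page_to) method as documented here.

```javascript
// Hypothetical helper (not part of PDFNet): validates a 1-based, inclusive
// page range and forwards it to a chainable setPages method.
function applyPageRange(options, firstPage, lastPage) {
  if (!Number.isInteger(firstPage) || firstPage < 1) {
    throw new RangeError("firstPage must be an integer >= 1");
  }
  // Assumption: when lastPage is omitted, convert through the end of the
  // PDF. The reference says a negative page_to selects the last page.
  const pageTo = lastPage === undefined ? -1 : lastPage;
  if (!Number.isInteger(pageTo)) {
    throw new RangeError("lastPage must be an integer");
  }
  if (pageTo >= 0 && pageTo < firstPage) {
    throw new RangeError("lastPage must not precede firstPage");
  }
  // setPages returns the options object, so the result can be chained.
  return options.setPages(firstPage, pageTo);
}
```

For example, applyPageRange(options, 3) would request page 3 through the end of the document, and applyPageRange(options, 1, 5) the first five pages.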

setPDFPassword(password)

Specifies the password if the PDF requires one.
Parameters:
Name Type Description
password string the PDF password, if required; an empty string otherwise.
Returns:
this object, for call chaining
Type
PDFNet.Convert.WordOutputOptions

setPreferredOCREngine(engine)

Specifies the preferred OCR engine. Default is solidocr.
Parameters:
Name Type Description
engine number
PDFNet.Convert.OutputOptionsOCR.PreferredOCREngine = {
	e_engine_default: 0,
	e_engine_tesseract: 1
}
the OCR engine.
Returns:
this object, for call chaining
Type
PDFNet.Convert.WordOutputOptions

setPrioritizeVisualAppearance(replica)

Specifies whether to prefer an exact visual replica of the PDF at the expense of preventing reflow of document paragraphs. Default is false.
Parameters:
Name Type Description
replica boolean False is preferred for most documents that contain paragraphs. Consider using true for documents that don't flow, such as CAD drawings or Illustrator-generated files.
Returns:
this object, for call chaining
Type
PDFNet.Convert.WordOutputOptions

setSearchableImageSetting(setting)

Specifies how scanned image pages should be converted. Default is e_ocr_text.
Parameters:
Name Type Description
setting number
PDFNet.Convert.WordOutputOptions.SearchableImageSetting = {
	e_ocr_text: 2,
	e_ocr_off: 3,
	e_ocr_always: 4
}
the searchable image setting.
Returns:
this object, for call chaining
Type
PDFNet.Convert.WordOutputOptions
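
Because each setter returns the options object, OCR-related settings can be applied in one chained expression. The sketch below is illustrative only: configureOcr is a hypothetical helper, the numeric values are copied from the SearchableImageSetting listing in this section, and skipping the language call when OCR is off is an assumption of this example, not documented library behavior.

```javascript
// Values copied from the SearchableImageSetting listing in this reference.
const SEARCHABLE_IMAGE_SETTING = Object.freeze({
  e_ocr_text: 2,
  e_ocr_off: 3,
  e_ocr_always: 4,
});

// Hypothetical helper (not part of PDFNet): sets the searchable image mode
// and, when OCR is enabled, the custom OCR languages, using call chaining.
function configureOcr(options, mode, languages) {
  if (!Object.prototype.hasOwnProperty.call(SEARCHABLE_IMAGE_SETTING, mode)) {
    throw new RangeError(`Unknown searchable image setting: ${mode}`);
  }
  const configured = options.setSearchableImageSetting(
    SEARCHABLE_IMAGE_SETTING[mode]
  );
  // Assumption: a language list is irrelevant when OCR is switched off.
  if (mode === "e_ocr_off" || !languages) {
    return configured;
  }
  return configured.setCustomOCRLanguage(languages);
}
```

A call such as configureOcr(options, "e_ocr_always", "eng fra") would then be equivalent to options.setSearchableImageSetting(4).setCustomOCRLanguage("eng fra").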

setWordOutputFormat(format)

Specifies the output document format (DOCX, RTF, TXT). It is most useful when the output file extension is not .docx, .rtf or .txt. Note: The DOC file format is now deprecated; DOCX is used automatically instead.
Parameters:
Name Type Description
format number
PDFNet.Convert.WordOutputOptions.WordOutputFormat = {
	e_wof_docx: 0,
	e_wof_rtf: 2,
	e_wof_txt: 3
}
the output document format (DOCX, RTF, TXT).
Returns:
this object, for call chaining
Type
PDFNet.Convert.WordOutputOptions
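
When the output file name is known, the format value can be derived from its extension and passed to setWordOutputFormat. The following is a minimal sketch: formatFromFileName is a hypothetical helper, the numeric values are copied from the WordOutputFormat listing in this section, and falling back to DOCX for unrecognized extensions (including .doc) is an assumption of this example.

```javascript
// Values copied from the WordOutputFormat listing in this reference.
const WORD_OUTPUT_FORMAT = Object.freeze({
  e_wof_docx: 0,
  e_wof_rtf: 2,
  e_wof_txt: 3,
});

// Hypothetical helper (not part of PDFNet): maps a file name to a format
// value based on its extension.
function formatFromFileName(fileName) {
  // Capture the text after the final dot, ignoring dots in directory names.
  const match = /\.([^./\\]+)$/.exec(fileName);
  const extension = match ? match[1].toLowerCase() : "";
  switch (extension) {
    case "rtf":
      return WORD_OUTPUT_FORMAT.e_wof_rtf;
    case "txt":
      return WORD_OUTPUT_FORMAT.e_wof_txt;
    default:
      // Assumption: anything else, including the deprecated .doc, is
      // written as DOCX.
      return WORD_OUTPUT_FORMAT.e_wof_docx;
  }
}
```

It might be used as options.setWordOutputFormat(formatFromFileName(outputPath)), where outputPath is whatever path the caller intends to write to.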