PDFTron pdfnet-node Class: HTMLOutputOptions

new HTMLOutputOptions()

An object containing options common to ToHtml and ToEpub functions

Members

<static> ContentReflowSetting

Type:

number

Properties:

Name	Type	Description
`e_fixed_position`	number	Content uses fixed positioning (default).
`e_reflow_paragraphs`	number	Deprecated. Text flows within paragraphs.
`e_reflow_full`	number	Text flows freely edge-to-edge in a single column.

<static> SearchableImageSetting

Type:

number

Properties:

Name	Type	Description
`e_ocr_image_text`	number	Convert both images and pre-existing hidden text from previous OCR. Only applies to e_reflow_paragraphs.
`e_ocr_image`	number	Convert images only, ignoring pre-existing text from previous OCR, and do not perform any new OCR.
`e_ocr_text`	number	Convert pre-existing text from previous OCR only (e_reflow_paragraphs mode). Perform new OCR on scanned pages (e_reflow_full mode).
`e_ocr_off`	number	Convert images only, ignoring pre-existing text from previous OCR, and do not perform any new OCR.
`e_ocr_always`	number	Perform new OCR on all pages (e_reflow_full mode).

Methods

setConnectHyphens(connect)

Specifies whether hyphens in the PDF should be connected. Default is false. Note: This option is only available for e_reflow_paragraphs and e_reflow_full modes.

Parameters:

Name	Type	Description
`connect`	boolean	if true, hyphens in the PDF will be connected.

Returns:

this object, for call chaining

Type: PDFNet.Convert.HTMLOutputOptions

setContentReflowSetting(reflow)

Switch between fixed (pre-paginated) and reflowable HTML generation. Default is e_fixed_position. In e_reflow_paragraphs mode (now deprecated), conversions require that the optional PDFTron HTML reflow paragraphs add-on module is available. In e_reflow_full mode, conversions require that the optional PDFTron StructuredOutput add-on module is available.

Parameters:

Name	Type	Description
`reflow`	number	PDFNet.Convert.HTMLOutputOptions.ContentReflowSetting = { e_fixed_position: 0, e_reflow_paragraphs: 1, e_reflow_full: 2 } the generated HTML will be either fixed or reflowable.

Returns:

this object, for call chaining

Type: PDFNet.Convert.HTMLOutputOptions
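
A minimal sketch of selecting a reflow mode by name, using only the constructor, enum, and setter listed on this page. The helper `makeReflowOptions` is hypothetical, and `PDFNet` is passed in as a parameter on the assumption that it is the namespace exported by the pdfnet-node package. The add-on modules mentioned above still have to be installed separately.

```javascript
// Hypothetical helper: builds HTMLOutputOptions for a named reflow mode.
// `PDFNet` is assumed to be the namespace exported by the pdfnet-node package.
function makeReflowOptions(PDFNet, modeName) {
  const { HTMLOutputOptions } = PDFNet.Convert;
  const mode = HTMLOutputOptions.ContentReflowSetting[modeName];
  if (mode === undefined) {
    throw new RangeError(`Unknown ContentReflowSetting: ${modeName}`);
  }
  const options = new HTMLOutputOptions();
  // The setter returns the options object, so it can be returned directly.
  return options.setContentReflowSetting(mode);
}
```

Passing the name as a string keeps the numeric enum values out of calling code, so a typo fails early with a `RangeError` instead of silently passing `undefined` to the setter.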

setDisableVerticalSplit(disable)

Specifies whether to disable the detection of section columns. Default is false. Enable this if your tables are coming out as section columns. Note: This option is only available for e_reflow_paragraphs mode. In e_reflow_full mode, columns are detected automatically.

Parameters:

Name	Type	Description
`disable`	boolean	if true, the detection of section columns is disabled.

Returns:

this object, for call chaining

Type: PDFNet.Convert.HTMLOutputOptions

setDPI(dpi)

The output resolution, from 1 to 1000, in Dots Per Inch (DPI) at which to render elements which cannot be directly converted. Default is 140. Note: This option is only available for e_fixed_position mode.

Parameters:

Name	Type	Description
`dpi`	number	the resolution in Dots Per Inch

Returns:

this object, for call chaining

Type: PDFNet.Convert.HTMLOutputOptions
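
This page does not say how the library handles values outside the 1 to 1000 range, so a caller may want to validate before calling `setDPI`. The sketch below is one way to do that; `checkedDpi` and `applyDpi` are hypothetical helpers, and `options` is assumed to be an HTMLOutputOptions instance in e_fixed_position mode.

```javascript
// Hypothetical guard: rejects values outside the documented 1..1000 DPI range.
function checkedDpi(dpi) {
  if (!Number.isFinite(dpi) || dpi < 1 || dpi > 1000) {
    throw new RangeError(`dpi must be between 1 and 1000, got ${dpi}`);
  }
  return dpi;
}

// Applies a validated DPI; falls back to the documented default of 140.
// `options` is assumed to be an HTMLOutputOptions instance.
function applyDpi(options, dpi = 140) {
  return options.setDPI(checkedDpi(dpi));
}
```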

setEmbedImages(embed)

Specifies whether images are embedded in the HTML without having to link to external files. Default is true. Note: This option is only available for e_reflow_paragraphs and e_reflow_full modes.

Parameters:

Name	Type	Description
`embed`	boolean	if true, images are embedded in the HTML; otherwise, images are saved as external files.

Returns:

this object, for call chaining

Type: PDFNet.Convert.HTMLOutputOptions

setExternalLinks(enable)

Enable the conversion of external URL navigation. Default is false. Note: This option is only available for e_fixed_position mode.

Parameters:

Name	Type	Description
`enable`	boolean	if true, links that specify external URLs are converted into HTML.

Returns:

this object, for call chaining

Type: PDFNet.Convert.HTMLOutputOptions

setFileConversionTimeoutSeconds(seconds)

Specifies the amount of time in seconds after which the conversion fails. Default is 300. Very long files need more time to convert. Note: This option is only available for e_reflow_paragraphs mode. The timeout feature is not necessary in other modes.

Parameters:

Name	Type	Description
`seconds`	number	the timeout in seconds.

Returns:

this object, for call chaining

Type: PDFNet.Convert.HTMLOutputOptions

setFootnotesSetting(option)

Specifies how footnotes should be converted. Default is e_Recover, which will include them as footnotes. Note: This option is only available for e_reflow_full mode.

Parameters:

Name	Type	Description
`option`	number	PDFNet.Convert.StructuredOutput.SectionConversionSetting = { e_Recover: 0, e_DoNotDetect: 1, e_DetectAndRemove: 2 } The footnotes setting.

Returns:

this object, for call chaining

Type: PDFNet.Convert.HTMLOutputOptions

setHeadersAndFootersSetting(option)

Specifies how headers and footers should be converted. Default is e_Recover, which will include them as headers and footers. Note: This option is only available for e_reflow_full mode.

Parameters:

Name	Type	Description
`option`	number	PDFNet.Convert.StructuredOutput.SectionConversionSetting = { e_Recover: 0, e_DoNotDetect: 1, e_DetectAndRemove: 2 } The header and footer setting.

Returns:

this object, for call chaining

Type: PDFNet.Convert.HTMLOutputOptions
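
As a sketch of how the two section settings combine, the hypothetical helper below builds e_reflow_full options that detect and remove headers, footers, and footnotes, leaving only body content. It uses the enum paths quoted in the parameter tables above and assumes `PDFNet` is the namespace exported by the pdfnet-node package.

```javascript
// Hypothetical helper: e_reflow_full options that strip headers, footers,
// and footnotes. `PDFNet` is assumed to be the pdfnet-node namespace.
function makeBodyOnlyOptions(PDFNet) {
  const { HTMLOutputOptions, StructuredOutput } = PDFNet.Convert;
  const remove = StructuredOutput.SectionConversionSetting.e_DetectAndRemove;
  // Both section setters are documented as e_reflow_full only,
  // so the reflow mode is set first in the chain.
  return new HTMLOutputOptions()
    .setContentReflowSetting(HTMLOutputOptions.ContentReflowSetting.e_reflow_full)
    .setHeadersAndFootersSetting(remove)
    .setFootnotesSetting(remove);
}
```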

setImageDPI(dpi)

Specifies the output image resolution, from 8 to 600, in Pixels Per Inch (PPI). The higher the PPI, the larger the image. Default is 192. Note: This option is only available for e_reflow_paragraphs mode. In other modes, image resolution is determined automatically for an optimal result.

Parameters:

Name	Type	Description
`dpi`	number	the resolution in Pixels Per Inch.

Returns:

this object, for call chaining

Type: PDFNet.Convert.HTMLOutputOptions

setInternalLinks(enable)

Enable the conversion of internal document navigation. Default is false. Note: This option is only available for e_fixed_position mode.

Parameters:

Name	Type	Description
`enable`	boolean	if true, links that specify page jumps are converted into HTML.

Returns:

this object, for call chaining

Type: PDFNet.Convert.HTMLOutputOptions

setJPGQuality(quality)

Specifies the compression quality to use when generating JPEG images. Note: This option is only available for e_fixed_position and e_reflow_paragraphs modes. In e_reflow_full mode, the optimal JPEG quality is chosen automatically for best balance between size and quality.

Parameters:

Name	Type	Description
`quality`	number	the JPEG compression quality, from 0(highest compression) to 100(best quality).

Returns:

this object, for call chaining

Type: PDFNet.Convert.HTMLOutputOptions

setLanguage(language)

Specifies the OCR language. Default is automatic language detection. Note: This option is only available for e_reflow_full mode.

Parameters:

Name	Type	Description
`language`	number	PDFNet.Convert.OutputOptionsOCR.LanguageChoice = { e_lang_auto: 0, e_lang_catalan: 1, e_lang_danish: 2, e_lang_german: 3, e_lang_english: 4, e_lang_spanish: 5, e_lang_finnish: 6, e_lang_french: 7, e_lang_italian: 8, e_lang_dutch: 9, e_lang_norwegian: 10, e_lang_portuguese: 11, e_lang_polish: 12, e_lang_romanian: 13, e_lang_russian: 14, e_lang_slovenian: 15, e_lang_swedish: 16, e_lang_turkish: 17 } the OCR language.

Returns:

this object, for call chaining

Type: PDFNet.Convert.HTMLOutputOptions

setMaximumImagePixels(max_pixels)

Specifies the maximum image slice size in pixels. Default is 2000000. Note: This setting no longer reduces the total number of image pixels; a lower value produces more slices, and a higher value produces fewer. Note: Because image compression works better with more pixels, a larger maximum generally creates smaller files. Note: This option is only available for e_fixed_position mode.

Parameters:

Name	Type	Description
`max_pixels`	number	the maximum number of pixels an image can have

Returns:

this object, for call chaining

Type: PDFNet.Convert.HTMLOutputOptions

setNoPageWidth(enable)

Determines whether to flow contents across the entire browser window. Default is false. Note: This option is only available for e_reflow_paragraphs mode. In e_reflow_full mode, content always flows across the entire browser window.

Parameters:

Name	Type	Description
`enable`	boolean	if true, content will flow across entire page.

Returns:

this object, for call chaining

Type: PDFNet.Convert.HTMLOutputOptions

setPages(page_from, page_to)

Specifies a range of pages to be converted. By default all pages are converted. The first page has the page number of 1. Note: This option is only available for e_reflow_paragraphs and e_reflow_full modes.

Parameters:

Name	Type	Description
`page_from`	number	the first page to be converted.
`page_to`	number	the last page to be converted (inclusive). Use a negative value to specify the last page in the PDF.

Returns:

this object, for call chaining

Type: PDFNet.Convert.HTMLOutputOptions
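
The page-range semantics described above (1-based, inclusive, negative `page_to` meaning the last page) can be illustrated with a small standalone function. `resolvePageRange` is a hypothetical helper that mirrors the documented behaviour for planning purposes; it is not the library's implementation, and the `pageCount` argument is an assumption the caller must supply.

```javascript
// Illustrates the documented setPages semantics; not the library's own code.
// pageFrom is 1-based; a negative pageTo stands for the last page of the PDF.
function resolvePageRange(pageFrom, pageTo, pageCount) {
  const last = pageTo < 0 ? pageCount : pageTo;
  if (pageFrom < 1 || pageFrom > last || last > pageCount) {
    throw new RangeError(`invalid page range ${pageFrom}..${pageTo}`);
  }
  // The range is inclusive at both ends.
  return { first: pageFrom, last, count: last - pageFrom + 1 };
}
```

For a 12-page document, `resolvePageRange(3, -1, 12)` describes pages 3 through 12, which is 10 pages.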

setPDFPassword(password)

Specifies the password if the PDF requires one. Note: This option is only available for e_reflow_paragraphs and e_reflow_full modes.

Parameters:

Name	Type	Description
`password`	string	the PDF password, if required; an empty string otherwise.

Returns:

this object, for call chaining

Type: PDFNet.Convert.HTMLOutputOptions

setPreferJPG(prefer_jpg)

Use JPG files rather than PNG. This will apply to all generated images. Default is true. Note: This option is only available for e_fixed_position and e_reflow_paragraphs modes.

Parameters:

Name	Type	Description
`prefer_jpg`	boolean	if true, JPG images will be used whenever possible.

Returns:

this object, for call chaining

Type: PDFNet.Convert.HTMLOutputOptions

setPreferredOCREngine(engine)

Specifies the preferred OCR engine. Default is solidocr. Note: This option is only available for e_reflow_full mode.

Parameters:

Name	Type	Description
`engine`	number	PDFNet.Convert.OutputOptionsOCR.PreferredOCREngine = { e_engine_default: 0, e_engine_tesseract: 1 } the OCR engine.

Returns:

this object, for call chaining

Type: PDFNet.Convert.HTMLOutputOptions

setReportFile(path)

Generate an XML file that contains additional information about the conversion process. By default no report is generated. Note: This option is only available for e_fixed_position mode.

Parameters:

Name	Type	Description
`path`	string	The file path to which the XML report is written.

Returns:

this object, for call chaining

Type: PDFNet.Convert.HTMLOutputOptions

setScale(scale)

Set an overall scaling of the generated HTML pages. Default is 1.0. Note: This option is only available for e_fixed_position mode.

Parameters:

Name	Type	Description
`scale`	number	a number greater than 0 which is used as a scale factor. For example, calling setScale(0.5) will reduce the HTML body of the page to half its original size, whereas setScale(2) will double the HTML body dimensions of the page and will rescale all page content appropriately.

Returns:

this object, for call chaining

Type: PDFNet.Convert.HTMLOutputOptions
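
The scale factor acts as a plain multiplier on the body dimensions. The arithmetic can be sketched as follows; `scaledBodySize` is a hypothetical helper, and the pixel dimensions in the usage note are illustrative values rather than anything the converter reports.

```javascript
// Hypothetical helper: applies a setScale-style factor to body dimensions.
function scaledBodySize(widthPx, heightPx, scale) {
  if (!(scale > 0)) {
    throw new RangeError('scale must be greater than 0');
  }
  return { width: widthPx * scale, height: heightPx * scale };
}
```

For an assumed 816 by 1056 pixel body, a scale of 0.5 gives 408 by 528, and a scale of 2 gives 1632 by 2112.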

setSearchableImageSetting(setting)

Specifies how scanned image pages should be converted. Default is e_ocr_image_text. Note: This option is only available for e_reflow_paragraphs and e_reflow_full modes. In e_reflow_paragraphs mode, this feature does not perform OCR, but instead it relies on pre-existing text from previous OCR. Both images and pre-existing hidden text are kept by default. In e_reflow_full mode, pre-existing OCRed content is ignored and a new OCR is performed from scratch by default. e_ocr_off can be used to disable OCR.

Parameters:

Name	Type	Description
`setting`	number	PDFNet.Convert.HTMLOutputOptions.SearchableImageSetting = { e_ocr_image_text: 0, e_ocr_image: 1, e_ocr_text: 2, e_ocr_off: 3, e_ocr_always: 4 } the searchable image setting.

Returns:

this object, for call chaining

Type: PDFNet.Convert.HTMLOutputOptions
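
A minimal sketch of toggling OCR for scanned pages in e_reflow_full mode, using the enum values listed in the table above. The helper `makeScannedPageOptions` is hypothetical, and `PDFNet` is assumed to be the namespace exported by the pdfnet-node package.

```javascript
// Hypothetical helper: e_reflow_full options with OCR forced on or turned off.
// `PDFNet` is assumed to be the pdfnet-node namespace.
function makeScannedPageOptions(PDFNet, runOcr) {
  const { HTMLOutputOptions } = PDFNet.Convert;
  const { e_ocr_always, e_ocr_off } = HTMLOutputOptions.SearchableImageSetting;
  return new HTMLOutputOptions()
    .setContentReflowSetting(HTMLOutputOptions.ContentReflowSetting.e_reflow_full)
    // e_ocr_always performs new OCR on all pages; e_ocr_off disables OCR.
    .setSearchableImageSetting(runOcr ? e_ocr_always : e_ocr_off);
}
```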

setSimpleLists(enable)

Determines whether to use tags for list items. Default is false. Note: This option is only available for e_reflow_paragraphs mode. In e_reflow_full mode, list items always use tags.

Parameters:

Name	Type	Description
`enable`	boolean	if true, tags are used for list items.

Returns:

this object, for call chaining

Type: PDFNet.Convert.HTMLOutputOptions

setSimplifyText(enable)

Controls whether the converter optimizes the DOM or preserves text placement accuracy. Default is false. Note: This option is only available for e_fixed_position mode.

Parameters:

Name	Type	Description
`enable`	boolean	If true, the converter will try to reduce DOM complexity at the expense of text placement accuracy.

Returns:

this object, for call chaining

Type: PDFNet.Convert.HTMLOutputOptions

setTitle(title)

Specifies the title for the output HTML. Note: This option is only available for e_reflow_paragraphs mode. HTML titles are not supported in other modes at the moment.

Parameters:

Name	Type	Description
`title`	string	the title of the output HTML.

Returns:

this object, for call chaining

Type: PDFNet.Convert.HTMLOutputOptions
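
The per-method "only available for" notes on this page can be collected into a single lookup table, which makes it easier to spot setters that do not apply to the chosen reflow mode. The table below is transcribed from those notes; the names `SETTER_MODES` and `unavailableSetters` are hypothetical and not part of the library.

```javascript
// Mode availability as stated in the per-method notes on this page.
const FIXED = 'e_fixed_position';
const PARAGRAPHS = 'e_reflow_paragraphs';
const FULL = 'e_reflow_full';

const SETTER_MODES = {
  setConnectHyphens: [PARAGRAPHS, FULL],
  setDisableVerticalSplit: [PARAGRAPHS],
  setDPI: [FIXED],
  setEmbedImages: [PARAGRAPHS, FULL],
  setExternalLinks: [FIXED],
  setFileConversionTimeoutSeconds: [PARAGRAPHS],
  setFootnotesSetting: [FULL],
  setHeadersAndFootersSetting: [FULL],
  setImageDPI: [PARAGRAPHS],
  setInternalLinks: [FIXED],
  setJPGQuality: [FIXED, PARAGRAPHS],
  setLanguage: [FULL],
  setMaximumImagePixels: [FIXED],
  setNoPageWidth: [PARAGRAPHS],
  setPages: [PARAGRAPHS, FULL],
  setPDFPassword: [PARAGRAPHS, FULL],
  setPreferJPG: [FIXED, PARAGRAPHS],
  setPreferredOCREngine: [FULL],
  setReportFile: [FIXED],
  setScale: [FIXED],
  setSearchableImageSetting: [PARAGRAPHS, FULL],
  setSimpleLists: [PARAGRAPHS],
  setSimplifyText: [FIXED],
  setTitle: [PARAGRAPHS],
};

// Hypothetical helper: returns the names in `setters` that the notes do not
// list as available in `mode`. Names missing from the table are ignored.
function unavailableSetters(mode, setters) {
  return setters.filter((name) => {
    const modes = SETTER_MODES[name];
    return modes !== undefined && !modes.includes(mode);
  });
}
```

For example, `unavailableSetters(FULL, ['setDPI', 'setPages', 'setTitle'])` returns `['setDPI', 'setTitle']`, since only `setPages` is documented for e_reflow_full mode.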