
PDF to HTML Command Line Conversion

Apryse's PDF2HTML is an easy-to-use, stand-alone command-line application for efficiently converting PDF documents to HTML files. PDF2HTML exports high-quality HTML that faithfully preserves the contents of the original PDF.

Like other Apryse products, PDF2HTML does not depend on any third-party software. It can be used in server environments or as part of a batch conversion process.

Why PDF2HTML?

PDF2HTML is easy to use and reliably preserves the contents of the PDF in the HTML output. PDF fonts are mapped to appropriate system fonts, with style, size, and kerning matched for an accurate fit. Tables are detected and converted to HTML tables automatically, whether or not the PDF defines explicit table structures. Single- and multi-column pages are converted to equivalent structures, with text flow preserved to facilitate editing. Graphics are converted faithfully and placed accurately on the page.


Key Functions

  • Convert PDF to HTML (.html or .htm files).
  • Support for all versions of Acrobat documents.
  • Support for Unicode and all PDF font formats.
  • Support for password-protected PDF.
  • Batch conversion.
  • Options to improve text readability and layout.
  • Automatic conversion to structured content.
  • Option to convert specific page ranges.
  • Options to control image quality.
  • Options to handle OCRed PDFs.

Common Use Case Scenarios

  • Simple conversion of PDF to HTML for Web posting.
  • Server-based, on-demand conversion of PDF documents to HTML format.
  • Batch processing of PDF files for data collection.

Operating Systems Supported

  • Windows, Linux and Mac.

System Requirements

  • At least 30 MB of free disk space.
  • Memory requirements depend on the source document being converted; we recommend a minimum of 4 GB.
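The disk-space minimum above can be checked before installing or running PDF2HTML. The following is a minimal POSIX shell sketch, not an Apryse tool: `free_kb`, `has_enough_disk`, and `MIN_FREE_KB` are hypothetical names, and the sketch assumes `df -P -k` reports available space in 1 KB blocks in its fourth column (the POSIX output format on Linux and macOS). It does not check memory, since the recommended amount depends on the documents being converted.

```sh
#!/bin/sh
# Hypothetical pre-flight check for the 30 MB free-disk-space minimum.
MIN_FREE_KB=$((30 * 1024))

# Print the available space, in KB, on the filesystem holding directory $1
# (defaults to the current directory).
free_kb() {
    # -P forces one-line POSIX output per filesystem; -k uses 1 KB blocks.
    # Row 2 is the data row; column 4 is "Available".
    df -P -k "${1:-.}" | awk 'NR == 2 { print $4 }'
}

# Succeed if at least MIN_FREE_KB is available under directory $1.
# If df fails, the numeric test errors out and the function returns non-zero.
has_enough_disk() {
    avail=$(free_kb "$1")
    [ "$avail" -ge "$MIN_FREE_KB" ]
}
```

For example, `has_enough_disk . || echo "Less than 30 MB free in the current directory" >&2` prints a warning before a conversion run.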

Example

```sh
#!/bin/sh
# PDFTRON_LICENSE_KEY is a placeholder: replace it with your own license key.
echo "Example 1) Convert myIn.pdf in this folder to myOut.html using default options:"
./pdf2html -in myIn.pdf -out myOut.html -license PDFTRON_LICENSE_KEY
```
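Batch conversion can be scripted around the single-file example above. The following is a minimal POSIX shell sketch that assumes `pdf2html` accepts the same `-in`, `-out`, and `-license` flags for each file; `convert_all`, `PDF2HTML_BIN`, and `LICENSE_KEY` are hypothetical names introduced for illustration. PDF2HTML may also offer its own batch options, which this sketch does not use.

```sh
#!/bin/sh
# Hypothetical batch wrapper around the single-file pdf2html invocation.
# Assumption: the -in/-out/-license flags from the example work per file.
PDF2HTML_BIN="${PDF2HTML_BIN:-./pdf2html}"
LICENSE_KEY="${LICENSE_KEY:-PDFTRON_LICENSE_KEY}"   # placeholder key

# Convert every *.pdf in directory $1 (default: current directory),
# writing name.html next to each name.pdf. Returns non-zero if any file fails.
convert_all() {
    dir="${1:-.}"
    status=0
    for pdf in "$dir"/*.pdf; do
        # With no matches, the glob stays literal; skip it.
        [ -e "$pdf" ] || continue
        html="${pdf%.pdf}.html"
        echo "Converting $pdf -> $html"
        if ! "$PDF2HTML_BIN" -in "$pdf" -out "$html" -license "$LICENSE_KEY"; then
            echo "Failed: $pdf" >&2
            status=1
        fi
    done
    return "$status"
}
```

After setting `LICENSE_KEY` (and `PDF2HTML_BIN` if the binary is not in the current folder), call `convert_all ./pdfs`. Because the function reports failure through its exit status, a server or batch pipeline can stop or retry when any document fails to convert.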

More PDF to HTML tools

Depending on your use case, PDF to HTML conversion can target high-fidelity, accurate rendering or content extraction. Our tools can help you either display the output or feed it into data analysis workflows.

Here are the different options for PDF to HTML conversion depending on your requirements:

PDF to HTML for the highest rendering accuracy

Here are the options for maintaining the original PDF layout and visual accuracy.

WebViewer
To convert PDF to an HTML canvas in real time on the client side.

PDF to HTML/ePub
To convert PDF to fixed-layout HTML/ePub, where each PDF page becomes one HTML file.

PDF2SVG
To convert PDF to SVG, creating a vector-based image that can be embedded in an HTML file.

PDF2Image
To convert PDF to an image (PNG, JPG, TIFF, Raw), creating a raster-based image that can be embedded in an HTML file.

PDF to HTML for extracting semantic content

Here are the options for extracting semantic content from a PDF.

PDF2HTML
To convert PDF to a single HTML file that preserves the PDF content using a custom heuristic method.
