PDF2Html - Convert PDF to HTML - Ruby Sample Code

Sample code for using Apryse SDK to programmatically convert generic PDF documents to HTML, provided in Python, C++, C#, Java, Node.js (JavaScript), PHP, Ruby, Go and VB. Learn more about our PDF to HTML

#---------------------------------------------------------------------------------------
# Copyright (c) 2001-2023 by Apryse Software Inc. All Rights Reserved.
# Consult LICENSE.txt regarding license information.
#---------------------------------------------------------------------------------------

require '../../../PDFNetC/Lib/PDFNetRuby'
include PDFNetRuby
require '../../LicenseKey/RUBY/LicenseKey'

$stdout.sync = true

#---------------------------------------------------------------------------------------
# The following sample illustrates how to use the PDF.Convert utility class to convert
# documents and files to HTML.
#
# There are two HTML modules and one of them is an optional PDFNet Add-on.
# 1. The built-in HTML module is used to convert PDF documents to fixed-position HTML
#    documents.
# 2. The optional add-on module is used to convert PDF documents to HTML documents with
#    text flowing across the browser window.
#
# The PDFTron SDK HTML add-on module can be downloaded from https://dev.apryse.com/
#
# Please contact us if you have any questions.
#---------------------------------------------------------------------------------------

# Relative path to the folder containing the test files.
$inputPath = "../../TestFiles/"
$outputPath = "../../TestFiles/Output/"

def main()
    # The first step in every application using PDFNet is to initialize the
    # library. The library is usually initialized only once, but calling
    # Initialize() multiple times is also fine.
    PDFNet.Initialize(PDFTronLicense.Key)

    #-----------------------------------------------------------------------------------

    begin
        # Convert PDF document to HTML with fixed positioning option turned on (default)
        puts "Converting PDF to HTML with fixed positioning option turned on (default)"

        $outputFile = $outputPath + "paragraphs_and_tables_fixed_positioning"

        Convert.ToHtml($inputPath + "paragraphs_and_tables.pdf", $outputFile)
        puts "Result saved in " + $outputFile
    rescue => error
        puts "Unable to convert PDF document to HTML, error: " + error.message
    end

    #-----------------------------------------------------------------------------------

    PDFNet.AddResourceSearchPath("../../../PDFNetC/Lib/")

    if !StructuredOutputModule.IsModuleAvailable() then
        puts ""
        puts "Unable to run part of the sample: PDFTron SDK Structured Output module not available."
        puts "-------------------------------------------------------------------------------------"
        puts "The Structured Output module is an optional add-on, available for download"
        puts "at https://docs.apryse.com/core/info/modules/. If you have already"
        puts "downloaded this module, ensure that the SDK is able to find the required files"
        puts "using the PDFNet::AddResourceSearchPath() function."
        puts ""
        return
    end

    #-----------------------------------------------------------------------------------

    begin
        # Convert PDF document to HTML with reflow full option turned on (1)
        puts "Converting PDF to HTML with reflow full option turned on (1)"

        $outputFile = $outputPath + "paragraphs_and_tables_reflow_full.html"

        $htmlOutputOptions = Convert::HTMLOutputOptions.new()

        # Set e_reflow_full content reflow setting
        $htmlOutputOptions.SetContentReflowSetting(Convert::HTMLOutputOptions::E_reflow_full)

        Convert.ToHtml($inputPath + "paragraphs_and_tables.pdf", $outputFile, $htmlOutputOptions)
        puts "Result saved in " + $outputFile
    rescue => error
        puts "Unable to convert PDF document to HTML, error: " + error.message
    end

    #-----------------------------------------------------------------------------------

    begin
        # Convert PDF document to HTML with reflow full option turned on (only converting the first page) (2)
        puts "Converting PDF to HTML with reflow full option turned on (only converting the first page) (2)"

        $outputFile = $outputPath + "paragraphs_and_tables_reflow_full_first_page.html"

        $htmlOutputOptions = Convert::HTMLOutputOptions.new()

        # Set e_reflow_full content reflow setting
        $htmlOutputOptions.SetContentReflowSetting(Convert::HTMLOutputOptions::E_reflow_full)

        # Convert only the first page
        $htmlOutputOptions.SetPages(1, 1)

        Convert.ToHtml($inputPath + "paragraphs_and_tables.pdf", $outputFile, $htmlOutputOptions)
        puts "Result saved in " + $outputFile
    rescue => error
        puts "Unable to convert PDF document to HTML, error: " + error.message
    end

    #-----------------------------------------------------------------------------------
    PDFNet.Terminate
    puts "Done."
end

main()
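
The three conversion steps in the sample share one pattern: announce the conversion, call the converter, report the output path, and rescue any error so that the next conversion still runs. The following is a minimal sketch of that pattern as a data-driven loop. It is not part of the Apryse sample: `run_conversions`, the job hash keys, and `stub_converter` are hypothetical names introduced for illustration, and the stub stands in for the SDK so the sketch runs without PDFNet installed. With the SDK loaded, the converter would presumably wrap `Convert.ToHtml` and `Convert::HTMLOutputOptions` as the sample does.

```ruby
# Runs every job through `converter`, a callable taking (input, output, configure).
# A failing job is reported and skipped, mirroring the begin/rescue blocks in the sample.
# Returns the output paths of the jobs that did not raise.
def run_conversions(jobs, converter)
  converted = []
  jobs.each do |job|
    begin
      puts "Converting PDF to HTML with #{job[:label]}"
      converter.call(job[:input], job[:output], job[:configure])
      puts "Result saved in #{job[:output]}"
      converted << job[:output]
    rescue => error
      puts "Unable to convert PDF document to HTML, error: #{error.message}"
    end
  end
  converted
end

# Assumption: a stand-in for the SDK call. It only records what it was asked to do
# and raises on an empty input path so the rescue branch can be exercised.
attempted = []
stub_converter = lambda do |input, output, configure|
  raise ArgumentError, "missing input" if input.empty?
  attempted << [input, output, !configure.nil?]
end

# `configure` is an optional callable that would receive the options object
# (nil means "use the defaults", as in the fixed-positioning conversion).
jobs = [
  { label: "fixed positioning (default)",
    input: "paragraphs_and_tables.pdf",
    output: "paragraphs_and_tables_fixed_positioning",
    configure: nil },
  { label: "reflow full (first page only)",
    input: "paragraphs_and_tables.pdf",
    output: "paragraphs_and_tables_reflow_full_first_page.html",
    configure: ->(options) { options.SetPages(1, 1) } },
  { label: "a deliberately empty input path",
    input: "",
    output: "never_written.html",
    configure: nil },
]

saved = run_conversions(jobs, stub_converter)
```

Passing the converter in as a callable keeps the loop independent of the SDK, so the error-handling path can be checked with a stub before the real conversion is wired in.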
