Compress PDF Image JBIG2 - Python Sample Code

Sample code for using the Apryse SDK to recompress bitonal (black and white) images in existing PDF documents using JBIG2 compression (lossless or lossy). The sample is intended to show how to specify hint information for the image encoder and is not meant to be a generic PDF optimization tool. To demonstrate the achievable compression, we recompressed a document containing 17 scanned pages. The original input document is ~1.4 MB and uses standard CCITT Fax compression. Lossless JBIG2 compression reduced the file size to 641 KB, while lossy JBIG2 compression reduced it to 176 KB.
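To put the quoted sizes in perspective, the short sketch below works out the reduction for each mode. It assumes 1 MB = 1000 KB and treats the approximate 1.4 MB input as exactly 1400 KB, so the results are rough. The helper name `reduction` is ours and is not part of the SDK.

```python
def reduction(original_kb, compressed_kb):
    """Return (compression ratio, percent saved) for two file sizes in KB."""
    ratio = original_kb / compressed_kb
    percent_saved = 100.0 * (1.0 - compressed_kb / original_kb)
    return ratio, percent_saved


# Sizes quoted in the description; 1400 KB is an approximation of "~1.4 MB".
ORIGINAL_KB = 1400
RESULTS_KB = {"lossless": 641, "lossy": 176}

for mode, size_kb in RESULTS_KB.items():
    ratio, saved = reduction(ORIGINAL_KB, size_kb)
    print(f"{mode}: {ratio:.1f}x smaller, {saved:.0f}% saved")
```

Under these assumptions, lossless JBIG2 saves a little over half of the original size and lossy JBIG2 saves close to nine tenths.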

Learn more about our Server SDK.

#---------------------------------------------------------------------------------------
# Copyright (c) 2001-2023 by Apryse Software Inc. All Rights Reserved.
# Consult LICENSE.txt regarding license information.
#---------------------------------------------------------------------------------------

import site
site.addsitedir("../../../PDFNetC/Lib")
import sys
from PDFNetPython import *

sys.path.append("../../LicenseKey/PYTHON")
from LicenseKey import *


# This sample project illustrates how to recompress bi-tonal images in an
# existing PDF document using JBIG2 compression. The sample is not intended
# to be a generic PDF optimization tool.
#
# You can download the entire document using the following link:
# http://www.pdftron.com/net/samplecode/data/US061222892.pdf

def main():
    PDFNet.Initialize(LicenseKey)

    pdf_doc = PDFDoc("../../TestFiles/US061222892-a.pdf")
    pdf_doc.InitSecurityHandler()

    cos_doc = pdf_doc.GetSDFDoc()
    num_objs = cos_doc.XRefSize()

    i = 1
    while i < num_objs:
        obj = cos_doc.GetObj(i)
        if obj is not None and not obj.IsFree() and obj.IsStream():
            # Process only images
            itr = obj.Find("Subtype")
            if not itr.HasNext() or not itr.Value().GetName() == "Image":
                i = i + 1
                continue

            input_image = Image(obj)
            # Process only gray-scale images
            if input_image.GetComponentNum() != 1:
                i = i + 1
                continue

            # Skip images that are already compressed using JBIG2
            itr = obj.Find("Filter")
            if (itr.HasNext() and itr.Value().IsName() and itr.Value().GetName() == "JBIG2Decode"):
                i = i + 1
                continue

            filter = obj.GetDecodedStream()
            reader = FilterReader(filter)

            hint_set = ObjSet()  # hint to image encoder to use JBIG2 compression
            hint = hint_set.CreateArray()

            hint.PushBackName("JBIG2")
            hint.PushBackName("Lossless")

            new_image = (Image.Create(cos_doc, reader,
                                      input_image.GetImageWidth(),
                                      input_image.GetImageHeight(),
                                      1,
                                      ColorSpace.CreateDeviceGray(),
                                      hint))

            new_img_obj = new_image.GetSDFObj()
            itr = obj.Find("Decode")

            if itr.HasNext():
                new_img_obj.Put("Decode", itr.Value())
            itr = obj.Find("ImageMask")
            if itr.HasNext():
                new_img_obj.Put("ImageMask", itr.Value())
            itr = obj.Find("Mask")
            if itr.HasNext():
                new_img_obj.Put("Mask", itr.Value())

            cos_doc.Swap(i, new_img_obj.GetObjNum())
        i = i + 1

    pdf_doc.Save("../../TestFiles/Output/US061222892_JBIG2.pdf", SDFDoc.e_remove_unused)
    pdf_doc.Close()
    PDFNet.Terminate()

if __name__ == '__main__':
    main()
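The sample hard-codes the "Lossless" hint, while the description also mentions lossy JBIG2. The sketch below shows one way the hint construction could be factored out so the mode becomes a parameter. `build_jbig2_hint` is a hypothetical helper, not an SDK function, and it assumes that leaving out the "Lossless" name lets the encoder choose lossy JBIG2; the sample itself does not show this, so check it against the SDK documentation before relying on it.

```python
def build_jbig2_hint(hint_set, lossless=True):
    """Build the encoder hint array that the sample passes to Image.Create.

    hint_set is expected to be a PDFNet ObjSet; any object exposing
    CreateArray() whose result supports PushBackName() will work.
    """
    hint = hint_set.CreateArray()
    hint.PushBackName("JBIG2")
    if lossless:
        # Same name the sample pushes. Assumption: omitting it
        # requests lossy JBIG2 encoding instead.
        hint.PushBackName("Lossless")
    return hint
```

In the sample, the two `PushBackName` calls could then be replaced by `hint = build_jbig2_hint(hint_set, lossless=False)`, with `hint_set` created just before the call as it is now.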
