Filters and streams in Ruby

One of the basic building blocks of a PDF document is the SDF stream object. For example, in a PDF document all page content, images, embedded fonts, and files are represented using stream objects that can be compressed and encrypted using various Filter chains. See the "Stream Objects" and "Filters" chapters in the PDF Reference Manual for more details.

Apryse SDK supports an efficient and flexible architecture for processing streams using filter pipelines.

A filter is an abstraction of a sequence of bytes, such as a file, an input/output device, an inter-process communication pipe, or a TCP/IP socket. A filter can also perform certain transformations of input/output data (e.g. data compression/decompression, color conversion, and so on).
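The idea of a filter as a readable byte source that can also transform data can be sketched in plain Ruby. This is a conceptual illustration only: MemorySource and InflateFilter are hypothetical names built on the standard library, not Apryse SDK classes, and the read method is an assumed interface.

```ruby
require "stringio"
require "zlib"

# Hypothetical source filter: serves bytes from an in-memory buffer.
class MemorySource
  def initialize(bytes)
    @io = StringIO.new(bytes.b)
  end

  # Returns up to max_bytes bytes, or nil once the data is exhausted.
  def read(max_bytes)
    @io.read(max_bytes)
  end
end

# Hypothetical transforming filter: pulls zlib-compressed bytes from an
# upstream filter and hands out the decompressed bytes.
class InflateFilter
  def initialize(upstream)
    @upstream = upstream
    @inflater = Zlib::Inflate.new
    @buffer = "".b
    @done = false
  end

  def read(max_bytes)
    # Keep pulling small chunks from upstream until enough output is buffered.
    while @buffer.bytesize < max_bytes && !@done
      chunk = @upstream.read(16)
      if chunk.nil?
        @buffer << @inflater.finish
        @done = true
      else
        @buffer << @inflater.inflate(chunk)
      end
    end
    return nil if @buffer.empty?

    @buffer.slice!(0, max_bytes)
  end
end

# Usage: the consumer only sees a sequence of bytes, not the compression.
compressed = Zlib::Deflate.deflate("Hello, filters! " * 8)
pipeline = InflateFilter.new(MemorySource.new(compressed))

output = "".b
while (chunk = pipeline.read(32))
  output << chunk
end
puts output
```

The consumer reads from the last filter only; each filter pulls what it needs from the one before it, which is the same shape as the pipelines described in the rest of this guide.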

Input filters/streams

Apryse SDK enables generic input from external files using the MappedFile filter. Use MappedFile to open, read from, and close files on a file system. For example:

file = MappedFile.new(filename)
doc_stream = SDFDoc.new(file)

This opens an external file for reading. MappedFile buffers input and output for better performance. Although it is possible to read input data directly through the Filter interface (MappedFile is a subclass of Filter), it is more convenient to attach a FilterReader to the filter and then read data through the FilterReader interface:

file_sz = file.FileSize()
file_reader = FilterReader.new(file)
mem = file_reader.Read(file_sz)
doc_mem = SDFDoc.new(mem, file_sz)

Data associated with SDF stream objects can be accessed using the Stream.GetRawStream() or Stream.GetDecodedStream() methods:

stream = doc.GetTrailer()
dec_stm = stream.GetDecodedStream()
reader = FilterReader.new(dec_stm)

Stream.GetRawStream() creates a Filter used to extract raw data as it appears in a serialized SDF document (or a decrypted version of the stream if the document is secured). Stream.GetDecodedStream() creates a Filter pipeline and returns the last filter in the chain. For example, a given stream may be compressed using JPEG (DCTDecode) compression and then encoded into an ASCII stream using ASCII85. When GetDecodedStream() is invoked on this SDF stream, it returns the last filter in a chain composed of three filters: the file segment input Filter, the ASCII85Decode Filter, and the DCTDecode Filter, in that order. Decoding undoes the encodings in reverse, so the ASCII85 layer is removed first and the JPEG data is decompressed last. Data extracted from the returned Filter will be raw image data (i.e. RGB byte triples).

It's possible to iterate through the Filter chain using the Filter.GetAttachedFilter() method. It's also possible to construct new filter chains, and to edit existing ones, using the Filter.AttachFilter() method.
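The relationship between a decode chain and the walk from its last filter back to the source can be sketched in plain Ruby. This is a model of the idea, not the Apryse API: SketchFilter, its attached accessor, and read_all are hypothetical, and ASCIIHex plus Flate stand in for ASCII85 and DCT because the Ruby standard library can decode them.

```ruby
require "zlib"

# Hypothetical model of one filter in a chain (not an Apryse class).
class SketchFilter
  attr_reader :name, :attached

  def initialize(name, attached = nil, &transform)
    @name = name
    @attached = attached   # the upstream filter, or nil for the source
    @transform = transform
  end

  # Pulls all bytes from upstream and applies this filter's transform.
  def read_all
    input = @attached ? @attached.read_all : "".b
    @transform.call(input)
  end
end

payload = "raw stream bytes"
# Encode: compress first, then hex-encode the compressed bytes.
encoded = Zlib::Deflate.deflate(payload).unpack1("H*")

# Decode chain: the encodings are undone in reverse order.
source  = SketchFilter.new("Source") { |_| encoded }
hex     = SketchFilter.new("ASCIIHexDecode", source) { |data| [data].pack("H*") }
inflate = SketchFilter.new("FlateDecode", hex) { |data| Zlib::Inflate.inflate(data) }

# Walk the chain from the last filter back to the source.
names = []
filter = inflate
while filter
  names << filter.name
  filter = filter.attached
end

decoded = inflate.read_all
puts names.join(" <- ")
puts decoded
```

Reading from the last filter is enough to get fully decoded data, which is why a decoded-stream accessor only needs to hand back the end of the chain.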

Output filters/streams

To write a filter's data to a file, use Filter.WriteToFile() (the second argument selects whether to append to an existing file):

dec_stm.WriteToFile(output_filename, false)

To modify or add to an output filter/stream, use the FilterWriter class:

writer = FilterWriter.new(dec_stm)
writer.WriteString("Hello World")
writer.Flush()

Implement custom filters

Apryse SDK provides full support for all common Filters used in PDF. Although the included Filters should cover all common use cases, advanced users may want to provide custom implementations for certain filters (e.g. custom color conversion, or a new compression method). Apryse SDK provides an open and expandable architecture for creating custom filters. To implement a custom Filter, derive a new class from the Filter base class and implement the required interface. A more detailed guide for implementing custom Filters is available through the Apryse Systems developer program.

Please contact support@pdftron.com for more details.
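As a rough picture of what deriving a custom filter might look like, here is a plain-Ruby sketch of a color conversion filter. The SketchFilter base class and its read_all method are assumptions made for illustration; the real Filter interface is defined by the SDK and is not reproduced here. The grayscale weights are the common Rec. 601 luma coefficients.

```ruby
# Hypothetical base class; the interface is an assumption, not the SDK's.
class SketchFilter
  def initialize(upstream = nil)
    @upstream = upstream
  end

  # Subclasses return their complete output as a binary string.
  def read_all
    raise NotImplementedError, "#{self.class} must implement read_all"
  end
end

# Source filter that serves a fixed byte string.
class BytesSource < SketchFilter
  def initialize(bytes)
    super(nil)
    @bytes = bytes.b
  end

  def read_all
    @bytes
  end
end

# Custom filter: converts RGB byte triples into one gray byte per pixel.
class RgbToGrayFilter < SketchFilter
  def read_all
    rgb = @upstream.read_all.bytes
    raise ArgumentError, "input is not a whole number of RGB triples" unless (rgb.length % 3).zero?

    gray = rgb.each_slice(3).map do |r, g, b|
      ((299 * r + 587 * g + 114 * b) / 1000.0).round
    end
    gray.pack("C*")
  end
end

# Usage: four pixels (red, green, blue, white) become four gray bytes.
pixels = [255, 0, 0, 0, 255, 0, 0, 0, 255, 255, 255, 255].pack("C*")
gray_bytes = RgbToGrayFilter.new(BytesSource.new(pixels)).read_all.bytes
p gray_bytes
```

Keeping the conversion inside its own class means it can be attached to any upstream filter without that filter knowing about color spaces.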
