TextExtractorProcessingFlags Enumeration

Processing options that can be passed to the Begin() method to direct the flow of content recognition algorithms.

Namespace:  pdftron.PDF
Assembly:  pdftron (in pdftron.dll) Version: 255.255.255.255
Syntax
public enum TextExtractorProcessingFlags
Members
Member name             Value  Description
e_none                  0
e_no_ligature_exp       1      Disables expansion of ligatures using a predefined mapping. Default ligatures are: fi, ff, fl, ffi, ffl, ch, cl, ct, ll, ss, fs, st, oe, OE.
e_no_dup_remove         2      Disables removal of duplicated text, which is frequently used to achieve visual effects such as drop shadow and fake bold.
e_punct_break           4      Treats punctuation (e.g. full stop, comma, semicolon) as word-break characters.
e_remove_hidden_text    8      Enables removal of text that is obscured by images or rectangles. Because this option carries a small performance penalty for text extraction, it is not enabled by default.
e_no_invisible_text     16     Enables removal of text that uses rendering mode 3 (invisible text). Invisible text is typically used in 'PDF Searchable Images' (scanned pages with corresponding OCR text), so it is extracted by default.
e_no_watermarks         128    Enables removal of text that is marked as part of a Watermark layer.
e_extract_using_zorder  256    Uses Z-order as the reading order for text.
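The member values are distinct powers of two, which suggests that several options can be combined into one value with bitwise OR before being passed to Begin(). The following is a minimal sketch of that combination, not code from the library: DemoProcessingFlags is a local, hypothetical mirror of the values listed above so that the example runs without pdftron.dll, and FlagsDemo, Combine, and IsSet are illustrative names. The exact Begin() overload that accepts the combined value is not shown here and should be checked against the TextExtractor reference.

```csharp
using System;

// Hypothetical local mirror of the documented member values.
// The real enumeration lives in the pdftron.PDF namespace.
[Flags]
public enum DemoProcessingFlags
{
    e_none = 0,
    e_no_ligature_exp = 1,
    e_no_dup_remove = 2,
    e_punct_break = 4,
    e_remove_hidden_text = 8,
    e_no_invisible_text = 16,
    e_no_watermarks = 128,
    e_extract_using_zorder = 256
}

public static class FlagsDemo
{
    // Combines any number of options into a single bit mask using bitwise OR.
    public static int Combine(params DemoProcessingFlags[] flags)
    {
        int mask = 0;
        foreach (DemoProcessingFlags flag in flags)
        {
            mask |= (int)flag;
        }
        return mask;
    }

    // Returns true when the given (non-zero) option is present in the mask.
    public static bool IsSet(int mask, DemoProcessingFlags flag)
    {
        return (mask & (int)flag) != 0;
    }

    public static void Main()
    {
        // Skip obscured text and invisible OCR text, and break words at punctuation.
        int mask = Combine(
            DemoProcessingFlags.e_remove_hidden_text,
            DemoProcessingFlags.e_no_invisible_text,
            DemoProcessingFlags.e_punct_break);

        Console.WriteLine(mask); // 8 + 16 + 4 = 28
        Console.WriteLine(IsSet(mask, DemoProcessingFlags.e_no_invisible_text)); // True
        Console.WriteLine(IsSet(mask, DemoProcessingFlags.e_no_watermarks));     // False
    }
}
```

With the real type, the same OR-combined value would be cast or passed in whatever form the Begin() overload in use expects; e_none (0) corresponds to the default behaviour with no options set.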
See Also