Enum TextExtractor.ProcessingFlags

Processing options that can be passed to the Begin() method to direct the flow of the content recognition algorithms.

public enum TextExtractor.ProcessingFlags

Name	Description
e_extract_using_zorder	Uses Z-order as the reading order for text.
e_no_dup_remove	Disables removal of duplicated text, which is frequently used to achieve visual effects such as drop shadow and fake bold.
e_no_invisible_text	Enables removal of text that uses rendering mode 3 (i.e. invisible text). Invisible text is usually used in 'PDF Searchable Images' (i.e. scanned pages with corresponding OCR text). For that reason, invisible text is extracted by default.
e_no_ligature_exp	Disables expansion of ligatures using a predefined mapping. Default ligatures are: fi, ff, fl, ffi, ffl, ch, cl, ct, ll, ss, fs, st, oe, OE.
e_no_watermarks	Enables removal of text that is marked as part of a Watermark layer.
e_none	No processing flags are set; the default behavior applies.
e_punct_break	Treat punctuation (e.g. full stop, comma, semicolon, etc.) as word break characters.
e_remove_hidden_text	Enables removal of text that is obscured by images or rectangles. Because this option carries a small performance penalty for text extraction, it is not enabled by default.
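
Example

The following is a minimal sketch of passing these options to Begin(), not a definitive implementation. It assumes the PDFNet .NET SDK is referenced, that the three-argument overload Begin(Page, Rect, TextExtractor.ProcessingFlags) is available, and that the members can be combined with the bitwise OR operator. The file name "input.pdf" and the license key string are placeholders.

```csharp
using System;
using pdftron;
using pdftron.PDF;

class ProcessingFlagsExample
{
    static void Main()
    {
        // Placeholder key: substitute your own license key.
        PDFNet.Initialize("YOUR_LICENSE_KEY");

        // "input.pdf" is a hypothetical file name used for illustration.
        using (PDFDoc doc = new PDFDoc("input.pdf"))
        {
            doc.InitSecurityHandler();
            Page page = doc.GetPage(1);

            // Assumption: the options combine with bitwise OR.
            // Here: drop text obscured by images or rectangles, and skip watermark text.
            TextExtractor.ProcessingFlags flags =
                TextExtractor.ProcessingFlags.e_remove_hidden_text |
                TextExtractor.ProcessingFlags.e_no_watermarks;

            using (TextExtractor txt = new TextExtractor())
            {
                // A null clip rectangle means the whole page is processed.
                txt.Begin(page, null, flags);
                Console.WriteLine(txt.GetAsText());
            }
        }
    }
}
```

Since e_remove_hidden_text has a small performance cost, it is probably worth enabling only for documents where obscured text is known to be a problem.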