Enum TextExtractor.ProcessingFlags
Processing options that can be passed to the Begin() method to direct the flow of content recognition algorithms.
Namespace: pdftron.PDF
Assembly: PDFNet.dll
Syntax
public enum TextExtractor.ProcessingFlags
Fields
Name | Description |
---|---|
e_extract_using_zorder | Use Z-order as reading order for text |
e_no_dup_remove | Disables removal of duplicated text, which is frequently used to achieve visual effects such as drop shadow and fake bold. |
e_no_invisible_text | Enables removal of text that uses rendering mode 3 (i.e. invisible text). Invisible text is usually used in 'PDF Searchable Images' (i.e. scanned pages with corresponding OCR text), which is why invisible text is extracted by default. |
e_no_ligature_exp | Disables expansion of ligatures using a predefined mapping. Default ligatures are: fi, ff, fl, ffi, ffl, ch, cl, ct, ll, ss, fs, st, oe, OE. |
e_no_watermarks | Enables removal of text that is marked as part of a Watermark layer |
e_none | |
e_punct_break | Treat punctuation (e.g. full stop, comma, semicolon, etc.) as word break characters. |
Enables removal of text that is obscured by images or rectangles. Because this option carries a small performance penalty for text extraction, it is not enabled by default. |
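Examples

The following is a minimal sketch of passing processing flags to Begin(), not a definitive usage pattern. It assumes the three-argument overload Begin(Page, Rect, ProcessingFlags) with a null clip rectangle, and it assumes that flags may be combined with the bitwise OR operator; this page does not state either point. The file name "input.pdf" is a placeholder, and PDFNet.Initialize() may require a license key argument depending on the SDK version.

```csharp
using System;
using pdftron;
using pdftron.PDF;

class ProcessingFlagsExample
{
    static void Main(string[] args)
    {
        // Assumption: some SDK versions require a license key here.
        PDFNet.Initialize();

        // "input.pdf" is a placeholder path, not a file shipped with the SDK.
        using (PDFDoc doc = new PDFDoc("input.pdf"))
        {
            doc.InitSecurityHandler();
            Page page = doc.GetPage(1);

            // Assumption: flags can be combined with bitwise OR.
            // Keep duplicated text and treat punctuation as word breaks.
            TextExtractor.ProcessingFlags flags =
                TextExtractor.ProcessingFlags.e_no_dup_remove |
                TextExtractor.ProcessingFlags.e_punct_break;

            using (TextExtractor txt = new TextExtractor())
            {
                // null clip rectangle: process the whole page.
                txt.Begin(page, null, flags);
                Console.WriteLine(txt.GetAsText());
            }
        }
    }
}
```

Passing a single flag works the same way, for example TextExtractor.ProcessingFlags.e_no_invisible_text to drop the OCR text layer of a scanned page.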