TextExtractorProcessingFlags Enumeration
Processing options that can be passed to the Begin() method to direct the flow of content recognition algorithms.
Namespace:
pdftron.PDF
Assembly:
pdftron (in pdftron.dll) Version: 255.255.255.255
Syntax
C#:
public enum TextExtractorProcessingFlags
Visual Basic:
Public Enumeration TextExtractorProcessingFlags
C++:
public enum class TextExtractorProcessingFlags
JavaScript:
pdftron.PDF.TextExtractorProcessingFlags = function();
pdftron.PDF.TextExtractorProcessingFlags.createEnum('pdftron.PDF.TextExtractorProcessingFlags', false);
Members
| Member name | Value | Description |
|---|---|---|
| e_none | 0 | |
| e_no_ligature_exp | 1 | Disables expansion of ligatures using a predefined mapping. The default ligatures are: fi, ff, fl, ffi, ffl, ch, cl, ct, ll, ss, fs, st, oe, OE. |
| e_no_dup_remove | 2 | Disables removal of duplicated text, which is frequently used to achieve visual effects such as drop shadow and fake bold. |
| e_punct_break | 4 | Treats punctuation (e.g. full stop, comma, semicolon) as word-break characters. |
| e_remove_hidden_text | 8 | Enables removal of text that is obscured by images or rectangles. Because this option carries a small performance penalty for text extraction, it is not enabled by default. |
| e_no_invisible_text | 16 | Enables removal of text that uses rendering mode 3 (i.e. invisible text). Invisible text is usually used in 'PDF Searchable Images' (i.e. scanned pages with corresponding OCR text), so it is extracted by default. |
| e_no_watermarks | 128 | Enables removal of text that is marked as part of a Watermark layer. |
| e_extract_using_zorder | 256 | Uses Z-order as the reading order for text. |
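
The member values in the table above are distinct powers of two, which suggests they are intended to be combined with a bitwise OR before being passed to Begin(). The following is a minimal sketch of that idea, not code taken from the library. `DemoFlags`, `FlagDemo`, and `IsSet` are hypothetical stand-ins that only mirror the documented values so the sample compiles without pdftron.dll; the assumption that the real enumeration can be combined in the same way is not stated on this page.

```csharp
using System;

// Hypothetical local mirror of the documented member values.
// This is NOT the pdftron type; real code would use
// pdftron.PDF.TextExtractorProcessingFlags instead.
[Flags]
enum DemoFlags
{
    e_none = 0,
    e_no_ligature_exp = 1,
    e_no_dup_remove = 2,
    e_punct_break = 4,
    e_remove_hidden_text = 8,
    e_no_invisible_text = 16,
    e_no_watermarks = 128,
    e_extract_using_zorder = 256
}

static class FlagDemo
{
    // Returns true when every bit of `flag` is set in `value`.
    // e_none (0) is never reported as set.
    public static bool IsSet(DemoFlags value, DemoFlags flag)
    {
        return flag != DemoFlags.e_none && (value & flag) == flag;
    }

    static void Main()
    {
        // Request removal of obscured text and of invisible (mode 3) text.
        DemoFlags flags = DemoFlags.e_remove_hidden_text
                        | DemoFlags.e_no_invisible_text;

        Console.WriteLine((int)flags);                               // 24
        Console.WriteLine(IsSet(flags, DemoFlags.e_no_invisible_text)); // True
        Console.WriteLine(IsSet(flags, DemoFlags.e_no_watermarks));     // False
    }
}
```

In an application that references pdftron.dll, the combined value would be built from the real enumeration members and supplied to Begin(); check the TextExtractor reference for the exact overload before relying on this pattern.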
See Also