
TextExtractor.GetAsXML Method (TextExtractor.XMLOutputFlags)

Gets text content in the form of an XML string.

Namespace:  pdftron.PDF
Assembly:  pdftron (in pdftron.dll) Version: 255.255.255.255
Syntax
public string GetAsXML(
	TextExtractorXMLOutputFlags flags
)

Parameters

flags
Type: pdftron.PDF.TextExtractor.XMLOutputFlags
Flags controlling XML output. For more information, see TextExtractor.XMLOutputFlags.

Return Value

Type: String
The string containing XML output.
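
A minimal usage sketch, assuming the usual PDFNet pattern of initializing the library, opening a document, and calling Begin on a page before requesting output. The file name "input.pdf" and the class name are hypothetical, and depending on the SDK version PDFNet.Initialize may expect a license key argument.

```csharp
using System;
using pdftron;
using pdftron.PDF;

class GetAsXmlExample
{
    static void Main()
    {
        // Assumption: newer SDK versions may require a license key here.
        PDFNet.Initialize();

        // "input.pdf" is a hypothetical path used only for illustration.
        using (PDFDoc doc = new PDFDoc("input.pdf"))
        {
            doc.InitSecurityHandler();
            Page page = doc.GetPage(1); // page numbers start at 1

            using (TextExtractor txt = new TextExtractor())
            {
                // The extractor must process a page before output is requested.
                txt.Begin(page);

                // Combine the output flags with bitwise OR.
                TextExtractor.XMLOutputFlags flags =
                    TextExtractor.XMLOutputFlags.e_words_as_elements |
                    TextExtractor.XMLOutputFlags.e_output_bbox |
                    TextExtractor.XMLOutputFlags.e_output_style_info;

                string xml = txt.GetAsXML(flags);
                Console.WriteLine(xml);
            }
        }
    }
}
```

This flag combination is the one the Remarks section names as producing per-word elements with bounding boxes and style information.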
Remarks
XML output will be encoded in UTF-8 and will have the following structure:

<Page num="1" crop_box="0, 0, 612, 792" media_box="0, 0, 612, 792" rotate="0">
<Flow id="1">
<Para id="1">
<Line box="72, 708.075, 467.895, 10.02" style="font-family:Calibri; font-size:10.02; color: #000000;">
<Word box="72, 708.075, 30.7614, 10.02">PDFNet</Word>
<Word box="106.188, 708.075, 15.9318, 10.02">SDK</Word>
<Word box="125.617, 708.075, 6.22242, 10.02">is</Word>
...
</Line>
</Para>
</Flow>
</Page>

The above XML output was generated by passing the following union of flags in the call to GetAsXML():

(TextExtractor::e_words_as_elements | TextExtractor::e_output_bbox | TextExtractor::e_output_style_info)

If no flags are specified, the default XML output looks as follows:

<Page num="1" crop_box="0, 0, 612, 792" media_box="0, 0, 612, 792" rotate="0">
<Flow id="1">
<Para id="1">
<Line>PDFNet SDK is an amazingly comprehensive, high-quality PDF developer toolkit...</Line>
<Line>levels. Using the PDFNet PDF library, ...</Line>
...
</Para>
</Flow>
</Page>
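
Because the return value is a well-formed XML string, it can be read with the standard System.Xml.Linq classes. The sketch below is illustrative only: it parses a hard-coded sample in the documented shape rather than live GetAsXML output, and the class and method names (GetAsXmlOutputDemo, ParseBox, Words) are hypothetical helpers, not part of the pdftron API. It treats each box attribute simply as four comma-separated numbers without assuming what they mean.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

public static class GetAsXmlOutputDemo
{
    // Hard-coded sample in the documented shape (words as elements,
    // with box and style attributes). Not produced by a live call.
    public const string SampleXml = @"<Page num=""1"" crop_box=""0, 0, 612, 792"" media_box=""0, 0, 612, 792"" rotate=""0"">
<Flow id=""1"">
<Para id=""1"">
<Line box=""72, 708.075, 467.895, 10.02"" style=""font-family:Calibri; font-size:10.02; color: #000000;"">
<Word box=""72, 708.075, 30.7614, 10.02"">PDFNet</Word>
<Word box=""106.188, 708.075, 15.9318, 10.02"">SDK</Word>
<Word box=""125.617, 708.075, 6.22242, 10.02"">is</Word>
</Line>
</Para>
</Flow>
</Page>";

    // Splits a comma-separated box attribute into its numeric components.
    public static double[] ParseBox(string box) =>
        box.Split(',')
           .Select(s => double.Parse(s.Trim(), CultureInfo.InvariantCulture))
           .ToArray();

    // Returns the text of every Word element, in document order.
    public static string[] Words(string xml) =>
        XDocument.Parse(xml).Descendants("Word").Select(w => w.Value).ToArray();

    public static void Main()
    {
        XDocument doc = XDocument.Parse(SampleXml);
        foreach (XElement word in doc.Descendants("Word"))
        {
            double[] box = ParseBox((string)word.Attribute("box"));
            string numbers = string.Join(", ",
                box.Select(v => v.ToString(CultureInfo.InvariantCulture)));
            Console.WriteLine($"{word.Value}: [{numbers}]");
        }
    }
}
```

Parsing with the invariant culture keeps the decimal points in the box values from being misread on machines whose locale uses a comma as the decimal separator.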
See Also