pdftron::PDF::Line Class Reference

#include <TextExtractor.h>

Public Member Functions

int GetNumWords ()
bool IsSimpleLine ()
const double * GetBBox ()
std::vector< double > GetQuad ()
void GetQuad (double out_quad[8])
Word GetFirstWord ()
Word GetWord (int word_idx)
Line GetNextLine ()
int GetCurrentNum ()
Style GetStyle ()
int GetParagraphID ()
int GetFlowID ()
bool EndsWithHyphen ()
bool IsValid ()
bool operator== (const Line &) const
bool operator!= (const Line &) const
Line ()

Detailed Description

A TextExtractor::Line object represents a line of text on a PDF page. Each line consists of a sequence of words, and each word is rendered in one or more styles.

Definition at line 530 of file TextExtractor.h.

Constructor & Destructor Documentation

pdftron::PDF::Line::Line ( )

Member Function Documentation

bool pdftron::PDF::Line::EndsWithHyphen ( )
Returns
true if this line of text ends with a hyphen (i.e. '-'), false otherwise.
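A typical use of EndsWithHyphen() is de-hyphenation when reassembling running text. The sketch below assumes you have already collected each line's text together with its EndsWithHyphen() value into a hypothetical LineText record (not a PDFNet type). It drops the line-break hyphen and glues the next line on directly. Note that this naive rule also merges genuine hyphenated compounds that happen to break at the end of a line.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-line record: the line's text plus the value that
// Line::EndsWithHyphen() reported for it. Not part of the PDFNet API.
struct LineText {
    std::string text;
    bool ends_with_hyphen;
};

// Joins lines into running text. A line flagged as ending with a hyphen
// loses its trailing '-' and is concatenated with the next line without a
// space; all other lines are separated by a single space.
std::string JoinLines(const std::vector<LineText>& lines) {
    std::string out;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        std::string t = lines[i].text;
        bool hyphen = lines[i].ends_with_hyphen && !t.empty() && t.back() == '-';
        if (hyphen) {
            t.pop_back();  // drop the line-break hyphen
        }
        out += t;
        if (i + 1 < lines.size() && !hyphen) {
            out += ' ';
        }
    }
    return out;
}
```

For example, the lines "extrac-" (flagged) and "tion works" join to "extraction works".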
const double* pdftron::PDF::Line::GetBBox ( )
Returns
the bounding box for this line (in unrotated page coordinates).
Note
To account for the effect of the page's '/Rotate' attribute, transform all points using page.GetDefaultMatrix().
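The note above says to transform all points, not just two corners: under a 90-degree rotation the original lower-left and upper-right corners no longer bound the box. The sketch below assumes the array returned by GetBBox() is laid out as {x1, y1, x2, y2}, and uses a hypothetical Affine struct in the PDF matrix convention [a b c d h v] as a stand-in for the matrix returned by page.GetDefaultMatrix(). It transforms all four corners and returns the axis-aligned box around them.

```cpp
#include <algorithm>
#include <array>

// Hypothetical affine matrix in PDF convention [a b c d h v]:
//   x' = a*x + c*y + h,  y' = b*x + d*y + v
// Stand-in for the matrix returned by page.GetDefaultMatrix().
struct Affine {
    double a, b, c, d, h, v;
    void Apply(double& x, double& y) const {
        double nx = a * x + c * y + h;
        double ny = b * x + d * y + v;
        x = nx;
        y = ny;
    }
};

// Assumes bbox = {x1, y1, x2, y2} in unrotated page coordinates.
// Transforms all four corners and returns {min_x, min_y, max_x, max_y}.
std::array<double, 4> TransformBBox(const double* bbox, const Affine& m) {
    double xs[4] = {bbox[0], bbox[2], bbox[2], bbox[0]};
    double ys[4] = {bbox[1], bbox[1], bbox[3], bbox[3]};
    for (int i = 0; i < 4; ++i) {
        m.Apply(xs[i], ys[i]);
    }
    return {*std::min_element(xs, xs + 4), *std::min_element(ys, ys + 4),
            *std::max_element(xs, xs + 4), *std::max_element(ys, ys + 4)};
}
```

With the matrix {0, 1, -1, 0, 100, 0} (a 90-degree rotation plus a translation), the box {10, 20, 30, 40} maps to {60, 10, 80, 30}.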
int pdftron::PDF::Line::GetCurrentNum ( )
Returns
the index of this line on the current page.
Word pdftron::PDF::Line::GetFirstWord ( )
Returns
the first word in the line.
Note
To traverse the list of all words on this line, use word.GetNextWord().
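The traversal described in the note can be written as a loop from GetFirstWord() through GetNextWord(). The sketch below assumes Word also exposes IsValid() to mark the end of the list, and that Word::GetString() returns the word's text. It is written as a template so it compiles without PDFNet. MockWord and MockLine are hypothetical stand-ins used only to exercise it; with the real classes you would pass a Line and a callback taking a Word.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins so the sketch compiles without PDFNet.
struct MockWord {
    const std::vector<std::string>* words;
    std::size_t idx;
    bool IsValid() const { return words != nullptr && idx < words->size(); }
    MockWord GetNextWord() const { return {words, idx + 1}; }
    std::string GetString() const { return (*words)[idx]; }
};

struct MockLine {
    std::vector<std::string> words;
    MockWord GetFirstWord() const { return {&words, 0}; }
    int GetNumWords() const { return static_cast<int>(words.size()); }
};

// Visits every word of a line in order: start at GetFirstWord() and follow
// GetNextWord() until the word is no longer valid. Returns the word count.
template <typename LineT, typename Fn>
int ForEachWord(LineT& line, Fn fn) {
    int n = 0;
    for (auto w = line.GetFirstWord(); w.IsValid(); w = w.GetNextWord()) {
        fn(w);
        ++n;
    }
    return n;
}
```

The returned count should agree with GetNumWords(), which is a cheap consistency check when debugging extraction.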
int pdftron::PDF::Line::GetFlowID ( )
Returns
The unique identifier for the flow (for example, a column) that this line belongs to. This information can be used to identify which lines and paragraphs belong to which flows.
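Combining GetFlowID() with GetParagraphID() gives a flow-to-paragraph map, for example to find which paragraphs make up each column. The sketch below assumes both IDs have already been read from each Line into a hypothetical LineIds record (not a PDFNet type).

```cpp
#include <map>
#include <set>
#include <vector>

// Hypothetical record of the IDs reported by one Line.
struct LineIds {
    int flow_id;       // value of Line::GetFlowID()
    int paragraph_id;  // value of Line::GetParagraphID()
};

// Maps each flow ID to the set of paragraph IDs that have at least one
// line in that flow.
std::map<int, std::set<int>> ParagraphsByFlow(const std::vector<LineIds>& lines) {
    std::map<int, std::set<int>> flows;
    for (const LineIds& l : lines) {
        flows[l.flow_id].insert(l.paragraph_id);
    }
    return flows;
}
```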
Line pdftron::PDF::Line::GetNextLine ( )
Returns
the next line on the page.
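Lines are walked the same way as words: start from the first line and call GetNextLine() until IsValid() returns false. In real code the first line would typically come from the TextExtractor (assumed here to be TextExtractor::GetFirstLine(), which is not documented on this page). The sketch is a template so it compiles without PDFNet. MockLine is a hypothetical stand-in that numbers its lines from 0; this page does not state the base of GetCurrentNum().

```cpp
#include <vector>

// Hypothetical stand-in for a Line, so the sketch compiles without PDFNet.
struct MockLine {
    int idx;
    int count;
    bool IsValid() const { return idx < count; }
    MockLine GetNextLine() const { return {idx + 1, count}; }
    int GetCurrentNum() const { return idx; }
};

// Follows GetNextLine() from `first` until the line is no longer valid and
// records GetCurrentNum() for each line visited.
template <typename LineT>
std::vector<int> LineNumbers(LineT first) {
    std::vector<int> nums;
    for (LineT line = first; line.IsValid(); line = line.GetNextLine()) {
        nums.push_back(line.GetCurrentNum());
    }
    return nums;
}
```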
int pdftron::PDF::Line::GetNumWords ( )
Returns
The number of words in this line.
int pdftron::PDF::Line::GetParagraphID ( )
Returns
The unique identifier for a paragraph or column that this line belongs to. This information can be used to identify which lines belong to which paragraphs.
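One way to use GetParagraphID() is to rebuild paragraphs from a page's lines. The sketch below assumes each line's paragraph ID and text were recorded during traversal into a hypothetical ParaLine record (not a PDFNet type). It concatenates lines that share an ID, in the order they were visited. Hyphenated line breaks are not handled here.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical record built from one Line while traversing the page.
struct ParaLine {
    int paragraph_id;  // value of Line::GetParagraphID()
    std::string text;  // e.g. the line's words joined by spaces
};

// Concatenates the text of all lines sharing a paragraph ID, in visit
// order, separated by single spaces.
std::map<int, std::string> BuildParagraphs(const std::vector<ParaLine>& lines) {
    std::map<int, std::string> paras;
    for (const ParaLine& l : lines) {
        std::string& p = paras[l.paragraph_id];
        if (!p.empty()) {
            p += ' ';
        }
        p += l.text;
    }
    return paras;
}
```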
std::vector<double> pdftron::PDF::Line::GetQuad ( )
Returns
The quadrilateral representing a tight bounding box for this line (in unrotated page coordinates).
void pdftron::PDF::Line::GetQuad ( double  out_quad[8])
Parameters
out_quad: an output array of eight doubles that receives the quadrilateral representing a tight bounding box for this line (in unrotated page coordinates).
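Because the quad is tight, it can be tilted for rotated text. The sketch below computes the axis-aligned box enclosing a quad, for example to compare against GetBBox() or to build a highlight rectangle. It assumes the eight values are four corner points laid out as x1, y1, x2, y2, x3, y3, x4, y4. The std::vector<double> overload's result can be passed via data().

```cpp
#include <algorithm>
#include <array>

// Assumes quad holds four corner points as x1, y1, x2, y2, x3, y3, x4, y4.
// Returns the enclosing axis-aligned box as {min_x, min_y, max_x, max_y}.
std::array<double, 4> QuadBounds(const double quad[8]) {
    std::array<double, 4> r = {quad[0], quad[1], quad[0], quad[1]};
    for (int i = 1; i < 4; ++i) {
        double x = quad[2 * i];
        double y = quad[2 * i + 1];
        r[0] = std::min(r[0], x);
        r[1] = std::min(r[1], y);
        r[2] = std::max(r[2], x);
        r[3] = std::max(r[3], y);
    }
    return r;
}
```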
Style pdftron::PDF::Line::GetStyle ( )
Returns
the predominant style for this line.
Word pdftron::PDF::Line::GetWord ( int  word_idx)
Returns
the word at index word_idx in this line.
Parameters
word_idx: an integer representing the index of the word to get.
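GetNumWords() and GetWord() together give random access as an alternative to following GetNextWord(). The sketch below assumes indices run from 0 to GetNumWords() - 1; this page does not state the base. It is a template so it compiles without PDFNet. MockLine is a hypothetical stand-in whose GetWord() returns a plain string instead of a Word.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in: GetWord() returns the word text directly.
struct MockLine {
    std::vector<std::string> words;
    int GetNumWords() const { return static_cast<int>(words.size()); }
    std::string GetWord(int i) const { return words.at(static_cast<std::size_t>(i)); }
};

// Calls fn(i, word) for every word index, assuming zero-based indices.
template <typename LineT, typename Fn>
void ForEachWordByIndex(LineT& line, Fn fn) {
    const int n = line.GetNumWords();
    for (int i = 0; i < n; ++i) {
        fn(i, line.GetWord(i));
    }
}
```

Random access is convenient when you only need a specific word, such as the last one: line.GetWord(line.GetNumWords() - 1) on a non-empty line.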
bool pdftron::PDF::Line::IsSimpleLine ( )
Returns
true if this line is not rotated (i.e. if the bounding box returned by GetBBox() and the quadrilateral returned by GetQuad() coincide), false otherwise.
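The condition IsSimpleLine() describes can be checked directly on a quad: the quad coincides with its bounding box exactly when it is an axis-aligned rectangle. The sketch below illustrates that geometric test. It is not the library's implementation. It assumes the eight values are four corners listed in order around the quadrilateral (x1, y1, ..., x4, y4). The tolerance is an arbitrary choice for floating-point comparison.

```cpp
#include <cmath>

// Assumes quad holds four corners in order around the quadrilateral:
// x1, y1, x2, y2, x3, y3, x4, y4.
// Returns true if every edge is horizontal or vertical (within tol), i.e.
// the quad is an axis-aligned rectangle and so coincides with its
// bounding box.
bool QuadIsAxisAligned(const double quad[8], double tol = 1e-6) {
    for (int i = 0; i < 4; ++i) {
        int j = (i + 1) % 4;  // next corner, wrapping around
        double dx = std::fabs(quad[2 * j] - quad[2 * i]);
        double dy = std::fabs(quad[2 * j + 1] - quad[2 * i + 1]);
        if (dx > tol && dy > tol) {
            return false;  // slanted edge: the line is rotated
        }
    }
    return true;
}
```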
bool pdftron::PDF::Line::IsValid ( )
Returns
true if this is a valid line, false otherwise.
bool pdftron::PDF::Line::operator!= ( const Line & ) const
bool pdftron::PDF::Line::operator== ( const Line & ) const
