#include <TextExtractor.h>
A TextExtractor::Line object represents a line of text on a PDF page. Each line consists of a sequence of words, and each word appears in one or more styles.
Definition at line 530 of file TextExtractor.h.
pdftron::PDF::Line::Line()
bool pdftron::PDF::Line::EndsWithHyphen()
- Returns
- true if this line of text ends with a hyphen (i.e. '-'), false otherwise.
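A common use of EndsWithHyphen() is rejoining words that were split across lines. The following is a minimal sketch of that idea, not the library's own method. StubLine is a hypothetical stand-in so the example compiles without the SDK, and its Text() member and the text_of callback are invented for illustration. The sketch also assumes the extracted line text still contains the trailing hyphen.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for TextExtractor::Line. Only IsValid(),
// GetNextLine() and EndsWithHyphen() mirror members documented on this
// page; Text() is invented for the demo.
struct StubLine {
    const std::vector<std::string>* lines;
    std::size_t index;

    bool IsValid() const { return lines != nullptr && index < lines->size(); }
    StubLine GetNextLine() const { return StubLine{lines, index + 1}; }
    const std::string& Text() const {
        assert(IsValid());
        return (*lines)[index];
    }
    bool EndsWithHyphen() const {
        return IsValid() && !Text().empty() && Text().back() == '-';
    }
};

// Walks the chain of lines and concatenates their text. A line that ends
// with a hyphen is glued to the next one without the hyphen or a space;
// every other line is followed by a single space.
template <class LineT, class TextOf>
std::string JoinLines(LineT line, TextOf text_of) {
    std::string out;
    for (; line.IsValid(); line = line.GetNextLine()) {
        std::string text = text_of(line);
        if (line.EndsWithHyphen()) {
            if (!text.empty() && text.back() == '-') text.pop_back();
            out += text;
        } else {
            out += text;
            out += ' ';
        }
    }
    if (!out.empty() && out.back() == ' ') out.pop_back();  // trim last space
    return out;
}
```

This naive join also removes hyphens that belong to the word (for example a compound broken at its hyphen), so real code would likely need a dictionary check or similar heuristic.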
const double* pdftron::PDF::Line::GetBBox()
- Returns
- The bounding box for this line (in unrotated page coordinates).
- Note
- To account for the effect of page '/Rotate' attribute, transform all points using page.GetDefaultMatrix().
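The note above amounts to pushing each corner of the box through an affine matrix. The sketch below shows that arithmetic under stated assumptions: Affine is a hypothetical stand-in for the matrix returned by page.GetDefaultMatrix(), its field layout follows the usual PDF convention, and the box is assumed to hold four values ordered x1, y1, x2, y2. None of these details are confirmed by this page.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Hypothetical stand-in for a 2D affine matrix. Assumed convention:
//   x' = a*x + c*y + h
//   y' = b*x + d*y + v
struct Affine {
    double a, b, c, d, h, v;
};

// Transforms the four corners of an axis-aligned box (assumed layout:
// x1, y1, x2, y2) and returns the axis-aligned box that encloses the
// transformed corners, as {min_x, min_y, max_x, max_y}.
std::array<double, 4> TransformBBox(const double bbox[4], const Affine& m) {
    assert(bbox != nullptr);
    const double xs[2] = {bbox[0], bbox[2]};
    const double ys[2] = {bbox[1], bbox[3]};
    std::array<double, 4> out{};
    bool first = true;
    for (double x : xs) {
        for (double y : ys) {
            const double tx = m.a * x + m.c * y + m.h;
            const double ty = m.b * x + m.d * y + m.v;
            if (first) {
                out = std::array<double, 4>{{tx, ty, tx, ty}};
                first = false;
            } else {
                out[0] = std::min(out[0], tx);
                out[1] = std::min(out[1], ty);
                out[2] = std::max(out[2], tx);
                out[3] = std::max(out[3], ty);
            }
        }
    }
    return out;
}
```

All four corners are transformed rather than just two, because after a 90 degree rotation the original lower-left and upper-right corners no longer bound the box.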
int pdftron::PDF::Line::GetCurrentNum()
- Returns
- the index of this line on the current page.
Word pdftron::PDF::Line::GetFirstWord()
- Returns
- the first word in the line.
- Note
- To traverse the list of all words on this line use word.GetNextWord().
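The traversal described in the note can be sketched as a simple linked walk. The stub types below are hypothetical stand-ins so the example compiles without the SDK. The loop assumes the Word type exposes an IsValid() member that turns false past the last word; this page documents IsValid() only for Line, so treat that as an assumption. Text() is invented for the demo.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for TextExtractor::Word.
struct StubWord {
    const std::vector<std::string>* words;
    std::size_t index;

    bool IsValid() const { return words != nullptr && index < words->size(); }
    StubWord GetNextWord() const { return StubWord{words, index + 1}; }
    const std::string& Text() const {
        assert(IsValid());
        return (*words)[index];
    }
};

// Hypothetical stand-in for TextExtractor::Line.
struct StubLine {
    std::vector<std::string> words;

    StubWord GetFirstWord() const { return StubWord{&words, 0}; }
    int GetNumWords() const { return static_cast<int>(words.size()); }
};

// Calls visit(word) for every word on the line, starting at
// GetFirstWord() and following GetNextWord() until the word is invalid.
// Returns the number of words visited.
template <class LineT, class Visitor>
int ForEachWord(LineT& line, Visitor visit) {
    int count = 0;
    for (auto word = line.GetFirstWord(); word.IsValid();
         word = word.GetNextWord()) {
        visit(word);
        ++count;
    }
    return count;
}
```

In this sketch the count returned by the walk matches GetNumWords(), which gives a cheap consistency check when both forms of access are available.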
int pdftron::PDF::Line::GetFlowID()
- Returns
- The unique identifier of the flow (for example, a column) that this line belongs to. This information can be used to identify which lines and paragraphs belong to which flows.
Line pdftron::PDF::Line::GetNextLine()
- Returns
- the next line on the page.
int pdftron::PDF::Line::GetNumWords()
- Returns
- The number of words in this line.
int pdftron::PDF::Line::GetParagraphID()
- Returns
- The unique identifier for a paragraph or column that this line belongs to. This information can be used to identify which lines belong to which paragraphs.
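One way to use the paragraph identifier is to bucket lines by it while walking the page. The sketch below is an illustration under assumptions: StubLine is a hypothetical stand-in so the code compiles without the SDK, and its zero-based GetCurrentNum() is a guess about the indexing, which this page does not specify.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for TextExtractor::Line, backed by a list of
// paragraph IDs (one entry per line on the page).
struct StubLine {
    const std::vector<int>* paragraph_ids;
    std::size_t index;

    bool IsValid() const {
        return paragraph_ids != nullptr && index < paragraph_ids->size();
    }
    StubLine GetNextLine() const { return StubLine{paragraph_ids, index + 1}; }
    int GetCurrentNum() const { return static_cast<int>(index); }
    int GetParagraphID() const {
        assert(IsValid());
        return (*paragraph_ids)[index];
    }
};

// Walks the lines starting at `line` and returns a map from paragraph ID
// to the indices (GetCurrentNum()) of the lines in that paragraph, in
// the order they were visited.
template <class LineT>
std::map<int, std::vector<int>> GroupLinesByParagraph(LineT line) {
    std::map<int, std::vector<int>> groups;
    for (; line.IsValid(); line = line.GetNextLine()) {
        groups[line.GetParagraphID()].push_back(line.GetCurrentNum());
    }
    return groups;
}
```

The same pattern applies to GetFlowID() by swapping the key, which would group lines by flow instead of by paragraph.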
std::vector<double> pdftron::PDF::Line::GetQuad()
- Returns
- The quadrilateral representing a tight bounding box for this line (in unrotated page coordinates).
void pdftron::PDF::Line::GetQuad(double out_quad[8])
- Parameters
-
out_quad | The quadrilateral representing a tight bounding box for this line (in unrotated page coordinates). |
Style pdftron::PDF::Line::GetStyle()
- Returns
- the predominant style for this line.
Word pdftron::PDF::Line::GetWord(int word_idx)
- Returns
- the word at index word_idx in this line.
- Parameters
-
word_idx | An integer representing the index of the word to get. |
bool pdftron::PDF::Line::IsSimpleLine()
- Returns
- true if this line is not rotated (i.e. if the quadrilaterals returned by GetBBox() and GetQuad() coincide), false otherwise.
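The coincidence test described above can be illustrated geometrically. The sketch below is not the library's implementation; it only shows one way to check that a quad and a box describe the same rectangle. It assumes the box holds x1, y1, x2, y2 and the quad holds four (x, y) pairs in eight doubles, and the tolerance eps is an invented parameter.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns true when every vertex of the quad lies on a corner of the
// axis-aligned box, within tolerance eps. A rotated line produces a quad
// whose vertices fall on the box edges or inside it instead, so the
// check fails.
bool QuadMatchesBBox(const double bbox[4], const std::vector<double>& quad,
                     double eps = 1e-6) {
    assert(eps >= 0.0);
    if (bbox == nullptr || quad.size() != 8) return false;
    for (std::size_t i = 0; i < quad.size(); i += 2) {
        const bool on_x = std::fabs(quad[i] - bbox[0]) <= eps ||
                          std::fabs(quad[i] - bbox[2]) <= eps;
        const bool on_y = std::fabs(quad[i + 1] - bbox[1]) <= eps ||
                          std::fabs(quad[i + 1] - bbox[3]) <= eps;
        if (!(on_x && on_y)) return false;
    }
    return true;
}
```

The check does not verify that the four vertices are distinct, so a degenerate quad collapsed onto one corner would also pass; that case is ignored here for brevity.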
bool pdftron::PDF::Line::IsValid()
- Returns
- true if this is a valid line, false otherwise.
bool pdftron::PDF::Line::operator!=(const Line &) const
bool pdftron::PDF::Line::operator==(const Line &) const