public class TextExtractor.Word

extends Object
implements AutoCloseable
java.lang.Object
   ↳ com.pdftron.pdf.TextExtractor.Word

Summary

Public Methods
void close()
Frees the native memory of the object.
void destroy()
Frees the native memory of the object.
boolean equals(Object other)
Rect getBBox()
Get the bounding box.
TextExtractor.Style getCharStyle(int char_idx)
Get the character style.
int getCurrentNum()
Get the index of this word within the current line.
double[] getGlyphQuad(int glyph_idx)
Get the glyph quadrilateral bounding box.
TextExtractor.Word getNextWord()
Get the next word.
int getNumGlyphs()
Get the number of glyphs in this word.
double[] getQuad()
Return the quadrilateral representing a tight bounding box for this word (in unrotated page coordinates).
String getString()
Get the string.
int getStringLen()
Get the string length.
TextExtractor.Style getStyle()
Get the word style.
boolean isValid()
Check whether this word is valid.
Inherited Methods
From class java.lang.Object
From interface java.lang.AutoCloseable

Public Methods

public void close ()

Frees the native memory of the object. Call it explicitly to control when the native memory is released, instead of relying on the garbage collector, which may not free the object in a timely manner.

public void destroy ()

Frees the native memory of the object. Call it explicitly to control when the native memory is released, instead of relying on the garbage collector, which may not free the object in a timely manner.
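A minimal sketch of releasing words deterministically while walking a line, rather than leaving every Word to the garbage collector. So that it compiles without the pdftron library, the nested Word interface is a hypothetical stand-in that declares only isValid(), getNextWord() and close() with the signatures documented on this page. The sketch also assumes two things this page does not state: that a word returned by getNextWord() stays usable after its predecessor is closed, and that the final invalid word should be closed too.

```java
import java.util.function.Consumer;

final class WordCleanupSketch {
    // Hypothetical stand-in for TextExtractor.Word; only the methods used here are declared.
    interface Word extends AutoCloseable {
        boolean isValid();
        Word getNextWord();
        @Override
        void close(); // no checked exception, matching the documented "void close()"
    }

    // Applies 'action' to each valid word starting at 'first' and closes every word
    // as soon as its successor has been fetched. Returns the number of valid words visited.
    static int forEachWord(Word first, Consumer<Word> action) {
        int visited = 0;
        Word word = first;
        while (word.isValid()) {
            action.accept(word);
            Word next = word.getNextWord(); // fetch the successor before releasing this word
            word.close();
            word = next;
            visited++;
        }
        word.close(); // assumption: the invalid end-of-line word also owns native memory
        return visited;
    }
}
```

Closing each word right after advancing keeps at most two native Word objects alive at a time, regardless of line length.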

public boolean equals (Object other)

public Rect getBBox ()

Get the bounding box.

Note: To account for the effect of the page's /Rotate attribute, transform all points using the matrix returned by getDefaultMatrix().

Returns
  • The bounding box for this word (in unrotated page coordinates).
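The note above says to transform points to account for /Rotate. The sketch below shows what that transformation does to a bounding box: each corner is mapped through a PDF-style affine matrix {a, b, c, d, e, f}, where x' = a*x + c*y + e and y' = b*x + d*y + f, and the axis-aligned box enclosing the results is returned. To keep it independent of the Rect and matrix classes, the box is passed as plain doubles {x1, y1, x2, y2}; the 90-degree matrix used in the usage note is an illustrative value, not necessarily what getDefaultMatrix() returns for a given page.

```java
final class BBoxTransformSketch {
    // Maps (x, y) through m = {a, b, c, d, e, f}: x' = a*x + c*y + e, y' = b*x + d*y + f.
    static double[] apply(double[] m, double x, double y) {
        return new double[] { m[0] * x + m[2] * y + m[4], m[1] * x + m[3] * y + m[5] };
    }

    // Transforms all four corners of box = {x1, y1, x2, y2} and returns the
    // axis-aligned box {minX, minY, maxX, maxY} that encloses them.
    static double[] transformBox(double[] m, double[] box) {
        double minX = Double.POSITIVE_INFINITY, minY = Double.POSITIVE_INFINITY;
        double maxX = Double.NEGATIVE_INFINITY, maxY = Double.NEGATIVE_INFINITY;
        double[][] corners = {
            { box[0], box[1] }, { box[2], box[1] }, { box[2], box[3] }, { box[0], box[3] }
        };
        for (double[] c : corners) {
            double[] p = apply(m, c[0], c[1]);
            minX = Math.min(minX, p[0]);
            minY = Math.min(minY, p[1]);
            maxX = Math.max(maxX, p[0]);
            maxY = Math.max(maxY, p[1]);
        }
        return new double[] { minX, minY, maxX, maxY };
    }
}
```

For example, with m = {0, 1, -1, 0, 792, 0} (a 90-degree rotation followed by a shift of 792 units), the box {10, 20, 30, 25} becomes {767, 10, 772, 30}: its width and height swap, as expected for a quarter turn.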

public TextExtractor.Style getCharStyle (int char_idx)

Get the character style.

Parameters
char_idx The index of a character in this word.
Returns
  • The style associated with a given character.
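Because getCharStyle() reports a style per character, a word can contain several differently styled stretches. The sketch below splits a word into maximal runs of consecutive characters that share an equal style. It is written generically so it compiles without pdftron: in real use, len would presumably be getStringLen() and styleAt would be word::getCharStyle. This assumes TextExtractor.Style compares meaningfully with equals(), which this page does not document; if it does not, compare whichever Style properties matter to you instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.IntFunction;

final class StyleRunsSketch {
    // Splits character indices 0..len-1 into maximal runs whose styles are equal.
    // Each run is returned as {start, endExclusive}.
    static <S> List<int[]> styleRuns(int len, IntFunction<S> styleAt) {
        List<int[]> runs = new ArrayList<>();
        int start = 0;
        for (int i = 1; i <= len; i++) {
            // Close the current run at the end of the word or where the style changes.
            if (i == len || !Objects.equals(styleAt.apply(i), styleAt.apply(start))) {
                runs.add(new int[] { start, i });
                start = i;
            }
        }
        return runs;
    }
}
```

A word whose characters have styles A, A, B yields the runs {0, 2} and {2, 3}; an empty word yields no runs.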

public int getCurrentNum ()

Get the index of this word within the current line.

Returns
  • The zero-based index of this word within the current line. The first word on the line returns 0, and the last word returns (line.getNumWords()-1).

public double[] getGlyphQuad (int glyph_idx)

Get the glyph quadrilateral bounding box.

Parameters
glyph_idx The index of a glyph in this word.
Returns
  • The quadrilateral representing a tight bounding box for a given glyph in the word (in unrotated page coordinates).
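One use of per-glyph quads is hit-testing: finding which glyph of a word lies under a point such as a click. The sketch below assumes each quad is an array of eight doubles {x1, y1, x2, y2, x3, y3, x4, y4} listing four vertices in order around a convex quadrilateral (the array layout is an assumption; this page only says a double[] is returned). The quads would be collected by calling getGlyphQuad(i) for each i from 0 to getNumGlyphs() - 1, and the point must be in the same unrotated page coordinates.

```java
final class GlyphHitTestSketch {
    // Returns true if (px, py) lies inside or on the convex quad {x1, y1, ..., x4, y4}.
    // The point is inside when it is on the same side of all four edges.
    static boolean contains(double[] q, double px, double py) {
        boolean hasPos = false, hasNeg = false;
        for (int i = 0; i < 4; i++) {
            double x1 = q[2 * i], y1 = q[2 * i + 1];
            double x2 = q[(2 * i + 2) % 8], y2 = q[(2 * i + 3) % 8];
            // z-component of the cross product (edge vector) x (point - edge start)
            double cross = (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1);
            if (cross > 0) hasPos = true;
            if (cross < 0) hasNeg = true;
        }
        return !(hasPos && hasNeg);
    }

    // Returns the index of the first glyph quad containing the point, or -1 if none does.
    static int glyphIndexAt(double[][] glyphQuads, double px, double py) {
        for (int i = 0; i < glyphQuads.length; i++) {
            if (contains(glyphQuads[i], px, py)) return i;
        }
        return -1;
    }
}
```

The same-side test works for either vertex orientation (clockwise or counter-clockwise), so the sketch does not need to know which one the library uses, only that the vertices are consecutive.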

public TextExtractor.Word getNextWord ()

Get the next word.

Returns
  • the next word on the current line.
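A sketch of walking one line word by word with getNextWord(), stopping when isValid() returns false (treating an invalid word as the end-of-line signal is an assumption based on isValid() documented below). It joins the words' strings and checks that getCurrentNum() runs 0, 1, 2, ... as documented above. The nested Word interface is a hypothetical stand-in declaring only the documented methods used, so the sketch compiles without pdftron; joining with single spaces is a simplification and may not reproduce the page's actual spacing.

```java
final class LineTextSketch {
    // Hypothetical stand-in for TextExtractor.Word; only the methods used here are declared.
    interface Word {
        boolean isValid();
        Word getNextWord();
        String getString();
        int getCurrentNum();
    }

    // Concatenates the words of one line, separated by single spaces.
    static String lineText(Word first) {
        StringBuilder sb = new StringBuilder();
        int index = 0;
        for (Word w = first; w.isValid(); w = w.getNextWord()) {
            // getCurrentNum() is documented as 0 for the first word on the line.
            if (w.getCurrentNum() != index) {
                throw new IllegalStateException("unexpected word index " + w.getCurrentNum());
            }
            if (index > 0) sb.append(' ');
            sb.append(w.getString());
            index++;
        }
        return sb.toString();
    }
}
```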

public int getNumGlyphs ()

Get the number of glyphs in this word.

Returns
  • The number of glyphs in this word.

public double[] getQuad ()

Return the quadrilateral representing a tight bounding box for this word (in unrotated page coordinates).

Returns
  • The quadrilateral, as an array of doubles.
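When a word is slanted or rotated, its quad is tighter than any axis-aligned rectangle, but some tasks (such as drawing a simple highlight box) need an axis-aligned rectangle anyway. The sketch below reduces a quad to its enclosing {minX, minY, maxX, maxY}. It assumes the returned array holds four (x, y) vertex pairs, eight doubles in total; that layout is not stated on this page, so the length check is there to flag any mismatch. The same helper applies to the arrays returned by getGlyphQuad().

```java
final class QuadBoundsSketch {
    // Returns {minX, minY, maxX, maxY} for quad = {x1, y1, x2, y2, x3, y3, x4, y4}.
    static double[] bounds(double[] quad) {
        if (quad.length != 8) {
            throw new IllegalArgumentException("expected 8 coordinates, got " + quad.length);
        }
        double minX = quad[0], maxX = quad[0], minY = quad[1], maxY = quad[1];
        for (int i = 2; i < 8; i += 2) {
            minX = Math.min(minX, quad[i]);
            maxX = Math.max(maxX, quad[i]);
            minY = Math.min(minY, quad[i + 1]);
            maxY = Math.max(maxY, quad[i + 1]);
        }
        return new double[] { minX, minY, maxX, maxY };
    }
}
```

For a tilted square {1, 0, 3, 2, 2, 3, 0, 1}, the sketch returns {0, 0, 3, 3}.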

public String getString ()

Get the string.

Returns
  • the content of this word represented as a Unicode string.

public int getStringLen ()

Get the string length.

Returns
  • the number of characters in this word.

public TextExtractor.Style getStyle ()

Get the word style.

Returns
  • The predominant style of this word.

public boolean isValid ()

Check whether this word is valid.

Returns
  • true if this is a valid word, false otherwise.