1 OCRDocumentText

Class Ax.ocr.OCRDocumentText


Text layer OCR (optical character recognition) library. It registers a text and returns an object that encapsulates contextual analysis methods for extracting data, either through search patterns or through text positions. Text layer: the text of the document represented as plain text. It can be extracted from a PDF document with the getTextFromPage function (Axional PDF library).

Constructor Summary

Method
Description
JSOCRDocumentText(Ax.text.String text)
Creates a new JSOCRDocumentText instance from a text.

Method Summary

Modifier and Type
Method
Description
boolean
findArray(Ax.ocr.OCRArrayHeader header)
Searches the given text.
boolean
findText(Ax.text.String text)
Searches the given text.
Ax.text.String
generateMultigroupExpr(Ax.text.String[] parts)
Returns a regular expression that matches only the given parts.
Ax.text.String
getLineAt(int row)
Returns a specific row.
int
getMatchCol()
Returns the column in the text where the search value was matched by findText.
int
getMatchEnd()
Returns the end index in the text where the search value was matched by findText.
Ax.text.String
getMatchGroup()
Returns the whole text matched by the regular expression.
Ax.text.String
getMatchGroup(int group)
Returns the text matched by a specific group.
object
getMatchGroupBoundingBox(int group)
object
getMatchGroupBoundingBox(int group, Ax.text.String text)
int
getMatchGroupCount()
Returns the number of groups matched by findText.
int
getMatchGroupEnd(int group)
Returns the end index in the text where the specific group was matched by findText.
int
getMatchGroupStart(int group)
Returns the start index in the text where the specific group was matched by findText.
int
getMatchRow()
Returns the row in the text where the search value was matched by findText.
int
getMatchStart()
Returns the start index in the text where the search value was matched by findText.
int
getMatchWidth()
Returns the number of columns matched.
int
getNumberOfRows()
Returns the number of rows in the OCRDocumentText.
Ax.text.String
getText()
Ax.text.String
getTextArray(int start, int end)
Returns the rows between start and end.
Ax.text.String
getTextRect(int row, int nrows, int col, int ncols)
Returns the text inside the rectangle delimited by the given rows and columns.
Ax.text.String
toString()

Constructor Detail

Ax.ocr.OCRDocumentText.JSOCRDocumentText

Ax.ocr.OCRDocumentText.JSOCRDocumentText(
	string text
)
Info:
Creates a new JSOCRDocumentText instance from a text.
Parameters:
text - the text to use

Example
var pdf = new Ax.pdf.Reader(new Ax.net.URL('https://bitbucket.org/deister/axional-docs-resources/raw/master/PDF/sample.pdf'));
var ocrText = new Ax.ocr.OCRDocumentText(pdf.getTextFromPage(2));

Method Detail

Ax.ocr.OCRDocumentText.findArray

boolean Ax.ocr.OCRDocumentText.findArray(
	object header
)
Info:
Searches the given text. It can contain regular expressions.
Parameters:
header - the Ax.ocr.OCRArrayHeader to search for
Returns:
boolean

Ax.ocr.OCRDocumentText.findText

boolean Ax.ocr.OCRDocumentText.findText(
	string text
)
Info:
Searches the document for the given text, which may contain regular expressions.
Parameters:
text - the text to search
Returns:
boolean
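As a rough mental model of what findText records for the getMatch* methods, the plain JavaScript sketch below runs a regular expression over a text layer and keeps the start and end offsets, row and column of the match, plus its groups. It runs outside Axional: findTextSketch and the shape of its result are hypothetical, and taking the first match with 0-based rows and columns is an assumption, not documented behavior.

```javascript
// Hypothetical model of findText: NOT the Ax.ocr implementation.
// Searches `text` with a regular expression and records where the first
// match starts and ends, plus its row and column (0-based, assumed).
function findTextSketch(text, pattern) {
  const re = new RegExp(pattern);
  const m = re.exec(text);
  if (m === null) {
    return null; // no match (findText itself returns a boolean)
  }
  const start = m.index;
  const end = start + m[0].length;
  // Row = number of line breaks before the match; col = offset from that row's start.
  const before = text.slice(0, start);
  const row = before.split("\n").length - 1;
  const col = start - (before.lastIndexOf("\n") + 1);
  return { start, end, row, col, groups: Array.from(m) };
}

const sample = "INVOICE\nNumber: A-1024\nDate: 2021-03-15";
const match = findTextSketch(sample, "Number: ([A-Z]-\\d+)");
console.log(match.row, match.col, match.groups[1]); // 1 0 A-1024
```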

Ax.ocr.OCRDocumentText.generateMultigroupExpr

string Ax.ocr.OCRDocumentText.generateMultigroupExpr(
	string[] parts
)
Info:
Returns a regular expression that matches only the given parts.
Parameters:
parts - the strings to match
Returns:
string
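The pattern that generateMultigroupExpr actually produces is not documented here. The sketch below shows one plausible construction, purely as an assumption: each part is escaped so it matches literally, wrapped in its own capture group, and the parts are joined by whitespace. multigroupExprSketch and escapeRegExp are hypothetical helpers.

```javascript
// Hypothetical stand-in for generateMultigroupExpr: the real pattern is not
// documented. Every part is escaped so it matches literally, wrapped in its
// own capture group, and the parts are joined by "one or more whitespace".
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function multigroupExprSketch(parts) {
  return parts.map((p) => "(" + escapeRegExp(p) + ")").join("\\s+");
}

const expr = multigroupExprSketch(["Total", "EUR", "1.250,00"]);
console.log(expr); // (Total)\s+(EUR)\s+(1\.250,00)
const m = new RegExp(expr).exec("Net  Total EUR 1.250,00");
console.log(m[3]); // 1.250,00
```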

Ax.ocr.OCRDocumentText.getLineAt

string Ax.ocr.OCRDocumentText.getLineAt(
	smallint row
)
Info:
Returns a specific row.
Parameters:
row - the index of the row to return
Returns:
string
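A text layer can be pictured as plain text split into rows. The sketch below models getLineAt and getNumberOfRows that way in plain JavaScript; toRows and lineAtSketch are hypothetical, and both the 0-based row numbering and returning null for an out-of-range row are choices of this sketch, not documented library behavior.

```javascript
// Hypothetical model of the row accessors, assuming the text layer is split
// on line breaks and rows are numbered from 0 (the real base is not documented).
function toRows(text) {
  return text.split(/\r?\n/);
}

function lineAtSketch(text, row) {
  const rows = toRows(text);
  // Return null for out-of-range rows instead of throwing.
  return row >= 0 && row < rows.length ? rows[row] : null;
}

const page = "ACME S.L.\r\nInvoice 2021/001\nTotal: 100,00";
console.log(toRows(page).length);   // 3
console.log(lineAtSketch(page, 1)); // Invoice 2021/001
```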

Ax.ocr.OCRDocumentText.getMatchCol

smallint Ax.ocr.OCRDocumentText.getMatchCol()
Info:
Returns the column in the text where the search value was matched by findText.
Returns:
smallint

Ax.ocr.OCRDocumentText.getMatchEnd

smallint Ax.ocr.OCRDocumentText.getMatchEnd()
Info:
Returns the end index in the text where the search value was matched by findText.
Returns:
smallint

Ax.ocr.OCRDocumentText.getMatchGroup

string Ax.ocr.OCRDocumentText.getMatchGroup()
Info:
Returns the whole text matched by the regular expression.
Returns:
string

Ax.ocr.OCRDocumentText.getMatchGroup

string Ax.ocr.OCRDocumentText.getMatchGroup(
	smallint group
)
Info:
Returns the text matched by a specific group. Useful when searching with regular expressions.
Parameters:
group - the match group
Returns:
string
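In JavaScript's own RegExp API, element 0 of the exec() result is the whole match and the capture groups follow from index 1. The sketch below uses that convention to illustrate what getMatchGroup(), getMatchGroup(group) and getMatchGroupCount() report; that Ax.ocr numbers its groups the same way is an assumption.

```javascript
// Plain JavaScript analogue of getMatchGroup() / getMatchGroup(n) /
// getMatchGroupCount(): group 0 is the whole match, groups 1..n are the
// capture groups. That Ax.ocr follows the same numbering is assumed.
const re = /Date:\s*(\d{2})\/(\d{2})\/(\d{4})/;
const found = re.exec("Invoice 17\nDate: 15/03/2021\nTotal 10");

const whole = found[0];              // like getMatchGroup()
const groupCount = found.length - 1; // like getMatchGroupCount()
const year = found[3];               // like getMatchGroup(3)
console.log(whole, groupCount, year); // Date: 15/03/2021 3 2021
```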

Ax.ocr.OCRDocumentText.getMatchGroupBoundingBox

object Ax.ocr.OCRDocumentText.getMatchGroupBoundingBox(
	smallint group
)
Parameters:
group - the match group
Returns:
object

Ax.ocr.OCRDocumentText.getMatchGroupBoundingBox

object Ax.ocr.OCRDocumentText.getMatchGroupBoundingBox(
	smallint group,
	string text
)
Parameters:
group - the match group
text - 
Returns:
object

Ax.ocr.OCRDocumentText.getMatchGroupCount

smallint Ax.ocr.OCRDocumentText.getMatchGroupCount()
Info:
Returns the number of groups matched by findText. Useful when working with regular expressions.
Returns:
smallint

Ax.ocr.OCRDocumentText.getMatchGroupEnd

smallint Ax.ocr.OCRDocumentText.getMatchGroupEnd(
	smallint group
)
Info:
Returns the end index in the text where the specific group was matched by findText.
Parameters:
group - the match group
Returns:
smallint

Ax.ocr.OCRDocumentText.getMatchGroupStart

smallint Ax.ocr.OCRDocumentText.getMatchGroupStart(
	smallint group
)
Info:
Returns the start index in the text where the specific group was matched by findText.
Parameters:
group - the match group
Returns:
smallint
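Group start and end offsets can be illustrated with JavaScript's "d" (hasIndices) regular expression flag, available since ES2022 and Node.js 16, which makes exec() report a [start, end) pair for each group. The sketch mirrors what getMatchGroupStart(group) and getMatchGroupEnd(group) describe; that Ax.ocr uses the same 0-based, end-exclusive convention is an assumption.

```javascript
// Plain JavaScript analogue of getMatchGroupStart(n) / getMatchGroupEnd(n).
// The "d" flag makes exec() fill `indices` with [start, end) offsets per group.
// That Ax.ocr uses the same offset convention (0-based, end exclusive) is assumed.
const text = "Ref: 2021/001 Total: 100,00";
const m = /Total:\s*([\d.,]+)/d.exec(text);
const [groupStart, groupEnd] = m.indices[1];
console.log(groupStart, groupEnd, text.slice(groupStart, groupEnd)); // 21 27 100,00
```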

Ax.ocr.OCRDocumentText.getMatchRow

smallint Ax.ocr.OCRDocumentText.getMatchRow()
Info:
Returns the row in the text where the search value was matched by findText.
Returns:
smallint

Ax.ocr.OCRDocumentText.getMatchStart

smallint Ax.ocr.OCRDocumentText.getMatchStart()
Info:
Returns the start index in the text where the search value was matched by findText.
Returns:
smallint

Ax.ocr.OCRDocumentText.getMatchWidth

smallint Ax.ocr.OCRDocumentText.getMatchWidth()
Info:
Returns the number of columns matched.
Returns:
smallint

Ax.ocr.OCRDocumentText.getNumberOfRows

smallint Ax.ocr.OCRDocumentText.getNumberOfRows()
Info:
Returns the number of rows in the OCRDocumentText.
Returns:
smallint

Ax.ocr.OCRDocumentText.getText

string Ax.ocr.OCRDocumentText.getText()
Returns:
string

Ax.ocr.OCRDocumentText.getTextArray

array Ax.ocr.OCRDocumentText.getTextArray(
	smallint start,
	smallint end
)
Info:
Returns the rows between start and end. Each array element is one row of the text.
Parameters:
start - starting row
end - ending row
Returns:
array
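The reference does not say whether rows are 0-based or whether end is inclusive. The plain JavaScript sketch below assumes both, only to show the idea of getTextArray returning one array element per row; textArraySketch is hypothetical.

```javascript
// Hypothetical model of getTextArray: returns rows start..end as an array,
// one element per row. Rows are assumed 0-based and `end` is assumed
// inclusive; neither is stated in the reference.
function textArraySketch(text, start, end) {
  return text.split(/\r?\n/).slice(start, end + 1);
}

const doc = "Header\nLine A\nLine B\nFooter";
console.log(textArraySketch(doc, 1, 2)); // [ 'Line A', 'Line B' ]
```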

Ax.ocr.OCRDocumentText.getTextRect

string Ax.ocr.OCRDocumentText.getTextRect(
	smallint row,
	smallint nrows,
	smallint col,
	smallint ncols
)
Info:
Returns the text inside the rectangle delimited by the given rows and columns.
Parameters:
row - starting row
nrows - number of rows
col - starting column
ncols - number of columns
Returns:
string

Ax.ocr.OCRDocumentText.toString

string Ax.ocr.OCRDocumentText.toString()
Returns:
string

2 OCRDocumentBytes

Class Ax.ocr.OCRDocumentBytes


Bitmap layer OCR (optical character recognition) library. It registers the byte code layer of a document and returns an object that encapsulates contextual analysis methods for extracting data, either through search patterns or through text positions. Byte code layer: a multidimensional array holding the information of the text layer, including the position of each word on the original document. It can be extracted from a PDF document with the getBitmapFromPage function (Axional PDF library).

Constructor Summary

Method
Description
JSOCRDocumentBytes(Ax.text.String byteCode)
Creates a new instance from any InputStream convertible object (File, Blob, URL, base64 string).

Method Summary

Modifier and Type
Method
Description
Map 
getRectangleFromTextPosition(int row, int col, int endRow, int endCol)
Returns the positions on the PDF that correspond to the given positions on the text layer.
List 
getWordsRectangle(int areaIniY,int areaIniX,int areaEndY,int areaEndX)
Returns an array of the words inside the given rectangle of the PDF document.

Constructor Detail

Ax.ocr.OCRDocumentBytes.JSOCRDocumentBytes

Ax.ocr.OCRDocumentBytes.JSOCRDocumentBytes(
	string byteCode
)
Info:
Creates a new instance from any InputStream convertible object (File, Blob, URL, base64 string).
Parameters:
byteCode - object to analyze

Example
var pdf = new Ax.pdf.PDFReader(new Ax.net.URL('https://bitbucket.org/deister/axional-docs-resources/raw/master/PDF/sample.pdf'));
var ocrByteCode = new Ax.ocr.OCRDocumentBytes(pdf.getBitmapFromPage(2));

Method Detail

Ax.ocr.OCRDocumentBytes.getRectangleFromTextPosition

object Ax.ocr.OCRDocumentBytes.getRectangleFromTextPosition(
	smallint row,
	smallint col,
	smallint endRow,
	smallint endCol
)
Info:
Returns the positions on the PDF that correspond to the given positions on the text layer.
Parameters:
row - row on text layer
col - column on text layer
endRow - end row on text layer
endCol - end column on text layer
Returns:
object
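The internal structure of the byte code layer is not documented here. To illustrate the idea of mapping a text-layer range back to a PDF area, the sketch below assumes a hypothetical list of word records, each with its text-layer position (row, col) and its box on the page (x, y, width, height, with y growing downwards), and returns the union of the boxes of the words in the range. The record shape and rectFromTextPositionSketch are assumptions, not the library's data model.

```javascript
// Hypothetical word records: text-layer position (row, col) plus the word's
// box on the PDF page (x, y, width, height; y assumed to grow downwards).
// The real structure of the byte code layer is not documented.
const words = [
  { text: "Total", row: 12, col: 40, x: 310, y: 700, width: 32, height: 10 },
  { text: "100,00", row: 12, col: 47, x: 350, y: 700, width: 38, height: 10 },
  { text: "EUR", row: 13, col: 47, x: 350, y: 712, width: 22, height: 10 },
];

// Union of the boxes of all words starting inside the text range
// [row, col] .. [endRow, endCol] (inclusive), compared in reading order.
function rectFromTextPositionSketch(wordList, row, col, endRow, endCol) {
  const inRange = (w) =>
    (w.row > row || (w.row === row && w.col >= col)) &&
    (w.row < endRow || (w.row === endRow && w.col <= endCol));
  const hits = wordList.filter(inRange);
  if (hits.length === 0) return null;
  const x = Math.min(...hits.map((w) => w.x));
  const y = Math.min(...hits.map((w) => w.y));
  const right = Math.max(...hits.map((w) => w.x + w.width));
  const bottom = Math.max(...hits.map((w) => w.y + w.height));
  return { x, y, width: right - x, height: bottom - y };
}

console.log(rectFromTextPositionSketch(words, 12, 40, 12, 60));
// { x: 310, y: 700, width: 78, height: 10 }
```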

Ax.ocr.OCRDocumentBytes.getWordsRectangle

array Ax.ocr.OCRDocumentBytes.getWordsRectangle(
	smallint areaIniY,
	smallint areaIniX,
	smallint areaEndY,
	smallint areaEndX
)
Info:
Returns an array of the words inside the given rectangle of the PDF document.
Parameters:
areaIniY - initial y position in pdf
areaIniX - initial x position in pdf
areaEndY - end y position in pdf
areaEndX - end x position in pdf
Returns:
array
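Note that getWordsRectangle takes the Y coordinate before the X coordinate for both corners. The plain JavaScript sketch below keeps that argument order while filtering a hypothetical list of word boxes; the word record shape, the choice to keep only words lying completely inside the area, and wordsInRectSketch itself are assumptions rather than documented behavior.

```javascript
// Hypothetical model: each word carries its box on the PDF page. The sketch
// keeps words whose box lies completely inside the area
// (areaIniY, areaIniX) .. (areaEndY, areaEndX); whether the real method also
// keeps partially overlapping words is not documented.
function wordsInRectSketch(wordList, areaIniY, areaIniX, areaEndY, areaEndX) {
  return wordList
    .filter((w) =>
      w.x >= areaIniX && w.x + w.width <= areaEndX &&
      w.y >= areaIniY && w.y + w.height <= areaEndY)
    .map((w) => w.text);
}

const pageWords = [
  { text: "Invoice", x: 40, y: 50, width: 60, height: 12 },
  { text: "2021/001", x: 110, y: 50, width: 70, height: 12 },
  { text: "Total", x: 300, y: 700, width: 40, height: 12 },
];
console.log(wordsInRectSketch(pageWords, 40, 30, 70, 200)); // [ 'Invoice', '2021/001' ]
```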