1 OCRDocumentBytes

You can load the byte content of a PDF document into a JavaScript OCRDocumentBytes object from any object convertible to an InputStream (File, Blob, URL, or base64 string).

<script>
// Open the sample PDF from a URL
var pdf = new Ax.pdf.PDFReader(new Ax.net.URL('https://bitbucket.org/deister/axional-docs-resources/raw/master/PDF/sample.pdf'));
// Build the OCR object from the bitmap of page 2
var ocrByteCode = new Ax.ocr.OCRDocumentBytes(pdf.getBitmapFromPage(2));
</script>

You can use OCRDocumentBytes to obtain an object that encapsulates the contextual analysis methods for extracting data through search patterns or through text positions.

Return  Method                        Description
JSON    getRectangleFromTextPosition  Returns the rectangle on the PDF that corresponds to the given positions in the text layer.
Array   getWordsRectangle             Returns an array of the words inside the given rectangle of the PDF document.

2 Get the rectangle position on the PDF from a text position

The getRectangleFromTextPosition method returns the rectangle on the PDF that corresponds to a range of positions in the text layer. Parameters: (start row, start column, end row, end column).

<script>
var pdf = new Ax.pdf.PDFReader(new Ax.net.URL('https://bitbucket.org/deister/axional-docs-resources/raw/master/PDF/sample.pdf'));
var ocrByteCode = new Ax.ocr.OCRDocumentBytes(pdf.getBitmapFromPage(2));
// Text range: from row 4, column 14 to row 6, column 20
console.log("POSITION ON THE PDF: " + ocrByteCode.getRectangleFromTextPosition(4,14,6,20));
</script>
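
How getRectangleFromTextPosition computes its result is not described here. The following plain JavaScript sketch only illustrates the general idea of mapping a range of text positions to an enclosing rectangle. The `textLayer` structure, the `rectangleFromTextPosition` function, and all coordinates are hypothetical and are not part of the Ax API.

```javascript
// Hypothetical text layer (NOT the Ax data model): each row lists its words
// with the column where the word starts and its bounding box on the page.
const textLayer = [
  [{ text: "Invoice", col: 0, box: { x: 70, y: 660, w: 60, h: 12 } },
   { text: "number", col: 8, box: { x: 135, y: 660, w: 55, h: 12 } }],
  [{ text: "Total", col: 0, box: { x: 70, y: 680, w: 40, h: 12 } },
   { text: "100.00", col: 6, box: { x: 115, y: 680, w: 50, h: 12 } }]
];

// Returns the smallest rectangle enclosing every word that overlaps the text
// range (rowStart, colStart) .. (rowEnd, colEnd), or null if no word matches.
function rectangleFromTextPosition(layer, rowStart, colStart, rowEnd, colEnd) {
  let x1 = Infinity, y1 = Infinity, x2 = -Infinity, y2 = -Infinity;
  const lastRow = Math.min(rowEnd, layer.length - 1);
  for (let row = Math.max(0, rowStart); row <= lastRow; row++) {
    for (const word of layer[row]) {
      const wordEnd = word.col + word.text.length - 1;
      // On the first row, skip words that end before the start column.
      if (row === rowStart && wordEnd < colStart) continue;
      // On the last row, skip words that start after the end column.
      if (row === rowEnd && word.col > colEnd) continue;
      x1 = Math.min(x1, word.box.x);
      y1 = Math.min(y1, word.box.y);
      x2 = Math.max(x2, word.box.x + word.box.w);
      y2 = Math.max(y2, word.box.y + word.box.h);
    }
  }
  if (x1 === Infinity) return null;
  return { x: x1, y: y1, width: x2 - x1, height: y2 - y1 };
}

// Range from row 0, column 8 to row 1, column 4 covers "number" and "Total".
console.log(JSON.stringify(rectangleFromTextPosition(textLayer, 0, 8, 1, 4)));
```

In this sketch the result is the union of the matching word boxes, so a range that spans several rows yields one rectangle covering all of them.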

3 Get the words inside a rectangle of a PDF

The getWordsRectangle function returns the words inside a rectangular selection given in PDF coordinates. Parameters: (start y, start x, end y, end x).

<script>
var pdf = new Ax.pdf.PDFReader(new Ax.net.URL('https://bitbucket.org/deister/axional-docs-resources/raw/master/PDF/sample.pdf'));
var ocrByteCode = new Ax.ocr.OCRDocumentBytes(pdf.getBitmapFromPage(2));
// Rectangle: from (y=660, x=70) to (y=700, x=100)
console.log("WORDS IN THE RECTANGLE: " + ocrByteCode.getWordsRectangle(660,70,700,100));
</script>
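
As a rough illustration of what selecting words by rectangle involves, the plain JavaScript sketch below filters a list of word boxes using the same parameter order (start y, start x, end y, end x). The `words` array and the `wordsInRectangle` function are hypothetical. The sketch also assumes a word must lie fully inside the rectangle, which may differ from how getWordsRectangle treats words that only partly overlap.

```javascript
// Hypothetical recognized words with their bounding boxes (made-up values).
const words = [
  { text: "Total", x: 72, y: 665, w: 25, h: 10 },
  { text: "100.00", x: 120, y: 665, w: 40, h: 10 },
  { text: "Footer", x: 72, y: 780, w: 35, h: 10 }
];

// Returns the text of every word whose box lies fully inside the rectangle
// (yStart, xStart) .. (yEnd, xEnd). Full containment is an assumption.
function wordsInRectangle(wordList, yStart, xStart, yEnd, xEnd) {
  return wordList
    .filter(w => w.x >= xStart && w.x + w.w <= xEnd &&
                 w.y >= yStart && w.y + w.h <= yEnd)
    .map(w => w.text);
}

console.log(wordsInRectangle(words, 660, 70, 700, 100));
```

Switching the containment test to an intersection test would also pick up words that cross the rectangle border, so the choice depends on how strict the extraction needs to be.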