1 OCRDocumentBytes
You can load the byte code text of a PDF document into a JavaScript OCRDocumentBytes object from any object that can be converted to an InputStream (File, Blob, URL, base64 string).
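<script>
// Open the sample PDF from its URL
var pdf = new Ax.pdf.Reader(new Ax.net.URL('https://bitbucket.org/deister/axional-docs-resources/raw/master/PDF/sample.pdf'));
// Build the OCRDocumentBytes object from the bitmap of the requested page (here, 2)
var ocrByteCode = new Ax.ocr.OCRDocumentBytes(pdf.getBitmapFromPage(2));
</script>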
You can use OCRDocumentBytes to obtain an object that encapsulates the contextual analysis methods for extracting data, either through search patterns or through text positions.
Return | Method | Description |
---|---|---|
JSON | getRectangleFromTextPosition | Returns the rectangle on the PDF that corresponds to a range of positions in the text layer. |
Array | getWordsRectangle | Returns an array of the words found inside a rectangle on the PDF document. |
2 Get the rectangle position on the PDF from a text position
The getRectangleFromTextPosition method returns the rectangle on the PDF that corresponds to a range of positions in the text layer. Parameters: (initial row, initial column, final row, final column).
<script>
var pdf = new Ax.pdf.Reader(new Ax.net.URL('https://bitbucket.org/deister/axional-docs-resources/raw/master/PDF/sample.pdf'));
var ocrByteCode = new Ax.ocr.OCRDocumentBytes(pdf.getBitmapFromPage(2));
console.log("POSITION ON THE PDF: " + ocrByteCode.getRectangleFromTextPosition(4, 14, 6, 20));
</script>
POSITION ON THE PDF: {yini=665, xini=73, yend=696, xend=99}
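The returned rectangle can be post-processed in plain JavaScript. The following is a minimal sketch, assuming the result exposes `yini`, `xini`, `yend` and `xend` as numeric properties, as the printed output suggests. `rectangleSize` and `padRectangle` are hypothetical helper names used for illustration only; they are not part of the Ax API.

```javascript
// Sample value copied from the output above; in practice it would come from
// ocrByteCode.getRectangleFromTextPosition(4, 14, 6, 20).
var rect = { yini: 665, xini: 73, yend: 696, xend: 99 };

// Hypothetical helper: width and height of the rectangle, in the same units.
function rectangleSize(r) {
    return { width: r.xend - r.xini, height: r.yend - r.yini };
}

// Hypothetical helper: grow the rectangle by `margin` on every side,
// never letting the initial corner go below zero.
function padRectangle(r, margin) {
    return {
        yini: Math.max(0, r.yini - margin),
        xini: Math.max(0, r.xini - margin),
        yend: r.yend + margin,
        xend: r.xend + margin
    };
}

var size = rectangleSize(rect);
var padded = padRectangle(rect, 5);
console.log("SIZE: " + JSON.stringify(size));
console.log("PADDED: " + JSON.stringify(padded));
```

A padded rectangle of this kind could then be passed, in the order (yini, xini, yend, xend), to a method that expects PDF positions, such as getWordsRectangle.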
3 Get the words inside a rectangle position of a PDF
The getWordsRectangle function returns the words inside a rectangular selection defined by positions on the PDF. Parameters: (initial y position, initial x position, final y position, final x position).
<script>
var pdf = new Ax.pdf.Reader(new Ax.net.URL('https://bitbucket.org/deister/axional-docs-resources/raw/master/PDF/sample.pdf'));
var ocrByteCode = new Ax.ocr.OCRDocumentBytes(pdf.getBitmapFromPage(2));
console.log("WORDS IN THE RECTANGLE: " + ocrByteCode.getWordsRectangle(660, 70, 700, 100));
</script>
WORDS IN THE RECTANGLE: [{str=...continued, col=13, row=3}, {str=And, col=13, row=4}, {str=more, col=17, row=4}, {str=text., col=13, row=5}, {str=Oh,, col=19, row=5}]
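Each entry in the result carries the word (`str`) and its position in the text layer (`row`, `col`), so the words can be reassembled into lines of text. The following is a minimal sketch in plain JavaScript, assuming the entries can be read as objects with those three properties. `wordsToLines` is a hypothetical helper, not part of the Ax API, and the sample array is copied from the output above.

```javascript
// Sample data copied from the output above; in practice it would come from
// ocrByteCode.getWordsRectangle(660, 70, 700, 100).
var words = [
    { str: "...continued", col: 13, row: 3 },
    { str: "And", col: 13, row: 4 },
    { str: "more", col: 17, row: 4 },
    { str: "text.", col: 13, row: 5 },
    { str: "Oh,", col: 19, row: 5 }
];

// Hypothetical helper: group words by row, order each row by column,
// and join the words of a row with single spaces.
function wordsToLines(list) {
    var byRow = {};
    list.forEach(function (w) {
        (byRow[w.row] = byRow[w.row] || []).push(w);
    });
    return Object.keys(byRow)
        .map(Number)
        .sort(function (a, b) { return a - b; })
        .map(function (row) {
            return byRow[row]
                .sort(function (a, b) { return a.col - b.col; })
                .map(function (w) { return w.str; })
                .join(" ");
        });
}

var lines = wordsToLines(words);
lines.forEach(function (line) { console.log(line); });
```

Joining with a single space discards the original column spacing; if alignment matters, the `col` values could be used to pad each word instead.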