Obtains the formatted text from a HOCR document.
1 hocr.getTextFromDocument
<hocr.getTextFromDocument
rows='rows'
ncols='ncols'
>
<hocr text /> +
</hocr.getTextFromDocument>
Attributes
Name | Type | Required | Default | Description
---|---|---|---|---
rows | string | | | Number of rows to be returned.
ncols | string | | | Number of columns to be returned.
Arguments
Name | Type | Required | Unique | Nullable | Description
---|---|---|---|---|---
hocr text | String | | | |
Returns
Type | Description
---|---
String | The text of the HOCR definition, formatted as ASCII. |
Example
Obtains the text from a HOCR document.
<xsql-script name='hocr_getTextFromPage'>
    <body>
        <!-- HOCR markup to convert, held as a string in m_ocr_text -->
        <set name='m_ocr_text'><![CDATA[
            <html xmlns="http://www.w3.org/1999/xhtml">
                <body>
                    <div class="ocr_page" title="bbox 0 0 2548 3300; image /path/to/scanned/image.png">
                        <span class="ocr_line" title="bbox 659 143 863 177">Some Text</span>
                        <span class="ocr_line" title="bbox 723 275 916 324">More Text</span>
                    </div>
                </body>
            </html>
        ]]></set>
        <!-- Print the formatted text obtained from the HOCR document -->
        <println>
            <hocr.getTextFromDocument>
                <m_ocr_text />
            </hocr.getTextFromDocument>
        </println>
    </body>
</xsql-script>
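
The example above calls the function without the `rows` and `ncols` attributes. A minimal sketch of a call that sets both might look like the following. The values 60 and 80 are illustrative assumptions, not documented defaults, and the script name `hocr_getTextWithSize` is hypothetical. How each attribute affects the layout of the returned text is not stated in this reference, so the output should be checked on the target engine.

```xml
<xsql-script name='hocr_getTextWithSize'>
    <body>
        <set name='m_ocr_text'><![CDATA[
            <html xmlns="http://www.w3.org/1999/xhtml">
                <body>
                    <div class="ocr_page" title="bbox 0 0 2548 3300; image /path/to/scanned/image.png">
                        <span class="ocr_line" title="bbox 659 143 863 177">Some Text</span>
                    </div>
                </body>
            </html>
        ]]></set>
        <!-- rows and ncols are string attributes; 60 and 80 are assumed values -->
        <println>
            <hocr.getTextFromDocument rows='60' ncols='80'>
                <m_ocr_text />
            </hocr.getTextFromDocument>
        </println>
    </body>
</xsql-script>
```

Both attributes are declared with type string, so the values are written as quoted literals in the same way as in the syntax summary.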