Obtains the text corresponding to the requested page of a document in HOCR format.
 

1 hocr.getTextFromPage

<hocr.getTextFromPage
    page='page'
    rows='rows'
    ncols='ncols'
>
    <hocr text /> +
</hocr.getTextFromPage>
Example

Obtains the text of page 1 from an HOCR document held in a variable.

<xsql-script name='hocr.getTextFromPage'>
        <body>
            <set name='m_ocr_text'><![CDATA[
            <html xmlns="http://www.w3.org/1999/xhtml">
              <body>
                <div class="ocr_page" title="bbox 0 0 2548 3300; image /path/to/scanned/image.png">
                  <span class="ocr_line" title="bbox 659 143 863 177">Some Text</span>
                  <span class="ocr_line" title="bbox 723 275 916 324">More Text</span>
                </div>
              </body>
            </html>]]>
            </set>

            <println>
                <hocr.getTextFromPage page='1'>
                    <m_ocr_text />
                </hocr.getTextFromPage>
            </println>
        </body>
</xsql-script>
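
To make the expected result of the example easier to picture, the following Python sketch extracts the line text from the same HOCR sample. It is not the implementation of hocr.getTextFromPage: the helper name get_text_from_page is hypothetical, and the 1-based page numbering and newline-joined output are assumptions based on the example above, not documented behaviour.

```python
import xml.etree.ElementTree as ET

# hOCR is XHTML, so every element lives in the XHTML namespace.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Same sample document as in the XSQL example above.
HOCR_SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div class="ocr_page" title="bbox 0 0 2548 3300; image /path/to/scanned/image.png">
      <span class="ocr_line" title="bbox 659 143 863 177">Some Text</span>
      <span class="ocr_line" title="bbox 723 275 916 324">More Text</span>
    </div>
  </body>
</html>"""


def get_text_from_page(hocr: str, page: int) -> str:
    """Hypothetical helper: return the ocr_line text of one page.

    Assumes pages are numbered from 1 and lines are joined with newlines.
    """
    root = ET.fromstring(hocr)
    # Collect every <div class="ocr_page"> in document order.
    pages = [
        el for el in root.iter(XHTML_NS + "div")
        if el.get("class") == "ocr_page"
    ]
    if not 1 <= page <= len(pages):
        raise ValueError(f"page {page} not found; document has {len(pages)} page(s)")
    # Gather the text of each <span class="ocr_line"> inside the chosen page.
    lines = [
        "".join(span.itertext()).strip()
        for span in pages[page - 1].iter(XHTML_NS + "span")
        if span.get("class") == "ocr_line"
    ]
    return "\n".join(lines)


print(get_text_from_page(HOCR_SAMPLE, 1))
```

For the sample document this prints the two recognized lines, "Some Text" and "More Text", one per line. The actual output format of hocr.getTextFromPage, and the effect of the rows and ncols attributes, may differ.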