Obtains the formatted text from an hOCR document.
 

1 hocr.getTextFromDocument

<hocr.getTextFromDocument
    rows='rows'
    ncols='ncols'
>
    <hocr text /> +
</hocr.getTextFromDocument>
Example

Stores a small hOCR document in a variable, obtains its text and prints it.

<xsql-script name='hocr_getTextFromDocument'>
    <body>
        <!-- Store a minimal hOCR document (one page, two text lines) in a variable -->
        <set name='m_ocr_text'><![CDATA[
            <html xmlns="http://www.w3.org/1999/xhtml">
              <body>
                <div class="ocr_page" title="bbox 0 0 2548 3300; image /path/to/scanned/image.png">
                  <span class="ocr_line" title="bbox 659 143 863 177">Some Text</span>
                  <span class="ocr_line" title="bbox 723 275 916 324">More Text</span>
                </div>
              </body>
            </html>]]>
        </set>

        <!-- Obtain the text from the hOCR document and print it -->
        <println>
            <hocr.getTextFromDocument>
                <m_ocr_text />
            </hocr.getTextFromDocument>
        </println>
    </body>
</xsql-script>