OCR (Optical Character Recognition) is a technology that extracts the text from an image or scanned document so that it can be automatically stored and indexed in a database. In general, this data recognition system works by applying regular-expression pattern matching to the extracted text.

Axional OCR facilitates data entry for the various types of documents that serve as the starting point for company-specific business processes. It also simplifies information storage and filing, since the physical document is no longer needed to examine its contents in detail.

Axional OCR thus provides an efficient information entry system for company databases, making it possible to integrate any structured physical document. More specifically, this Axional model aims to automatically integrate digitized supplier invoices into the ERP system database.
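The regular-expression recognition described above can be illustrated with a minimal Python sketch. The field labels (`Invoice No`, `Date`, `Total`), the patterns, and the function name `extract_invoice_fields` are assumptions made for illustration only. They are not taken from Axional OCR, and each real supplier layout would need its own expressions.

```python
import re
from datetime import date
from decimal import Decimal

# Hypothetical patterns for one invented invoice layout (not Axional OCR's own rules).
# \b prevents "Total" from matching inside "Subtotal".
PATTERNS = {
    "invoice_number": re.compile(r"\bInvoice\s+No\.?\s*:?\s*([A-Z0-9-]+)", re.IGNORECASE),
    "invoice_date": re.compile(r"\bDate\s*:?\s*(\d{4})-(\d{2})-(\d{2})", re.IGNORECASE),
    "total": re.compile(r"\bTotal\s*:?\s*(\d+(?:\.\d{2})?)", re.IGNORECASE),
}


def extract_invoice_fields(text: str) -> dict:
    """Return the fields found in the text layer; fields not found stay None."""
    fields = {"invoice_number": None, "invoice_date": None, "total": None}

    match = PATTERNS["invoice_number"].search(text)
    if match:
        fields["invoice_number"] = match.group(1)

    match = PATTERNS["invoice_date"].search(text)
    if match:
        year, month, day = (int(part) for part in match.groups())
        fields["invoice_date"] = date(year, month, day)

    match = PATTERNS["total"].search(text)
    if match:
        # Decimal avoids floating-point rounding on monetary amounts.
        fields["total"] = Decimal(match.group(1))

    return fields
```

Called on the text of an invoice, the function returns a dictionary with one entry per field. A field whose pattern does not match is left as `None`, so that incomplete documents can be flagged for manual review instead of being loaded with wrong data.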

1 Prerequisites: PDF generation

For the application to work correctly, documents must be digital PDFs with a text layer, that is, documents such as scanned paper documents or PDF files that have been transformed into digitized text. The transformed document looks exactly like the original, but its content can be recognized as searchable data. These files are easy to identify, since their text is selectable.

Nowadays it is very common to receive invoices from suppliers by e-mail, and these are very likely to be PDFs with a text layer already.

When documents are not available as PDFs with a text layer, they must be transformed. This transformation is external to the application. The documents can be generated by an external provider or with a document scanning application that has OCR capabilities. For example, you can use Tesseract, an open source OCR engine. Most current printers with a scanner also include an OCR application.
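As an illustration of this external step, the following Python sketch calls the Tesseract command-line tool to turn a scanned image into a PDF with a text layer. It assumes that Tesseract is installed and available on the PATH. The function names and file names are hypothetical, and the script is not part of Axional OCR.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path: str, output_base: str, lang: str = "eng") -> list[str]:
    """Build the argument list for: tesseract <image> <outputbase> -l <lang> pdf

    The trailing "pdf" config asks Tesseract to write <outputbase>.pdf,
    a PDF that shows the scanned image with a searchable text layer.
    """
    return ["tesseract", image_path, output_base, "-l", lang, "pdf"]


def image_to_searchable_pdf(image_path: str, output_base: str, lang: str = "eng") -> Path:
    """Run Tesseract and return the path of the generated PDF."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    # check=True raises CalledProcessError if Tesseract exits with an error.
    subprocess.run(build_tesseract_command(image_path, output_base, lang), check=True)
    return Path(output_base + ".pdf")
```

For example, `image_to_searchable_pdf("scan.png", "invoice")` would be expected to produce `invoice.pdf` next to the script. The `lang` argument must correspond to a language pack installed with Tesseract.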

IMPORTANT

The process of transforming a document (either paper or scanned) into a PDF with a text layer is an external process that is outside the scope of the Axional OCR application.

2 Structure

The process of integrating data obtained from the PDF document into the system is carried out in several consecutive stages.

• Internal integration: the processed data is transferred to a predefined internal table (the destination table). This is the last step of the Axional OCR process.
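The internal integration step can be sketched as a plain insert into a destination table. The example below uses SQLite from the Python standard library as a stand-in for the ERP database. The table name `ocr_invoice_dest`, its columns, and both function names are hypothetical and do not reflect the actual Axional schema.

```python
import sqlite3

# Hypothetical destination table; in a real installation the table is predefined.
CREATE_DEST_TABLE = """
CREATE TABLE IF NOT EXISTS ocr_invoice_dest (
    invoice_number TEXT PRIMARY KEY,
    invoice_date   TEXT NOT NULL,   -- ISO 8601 date, e.g. 2024-03-15
    total          TEXT NOT NULL    -- kept as text to preserve the exact decimal
)
"""

INSERT_INVOICE = """
INSERT INTO ocr_invoice_dest (invoice_number, invoice_date, total)
VALUES (:invoice_number, :invoice_date, :total)
"""


def create_destination_table(conn: sqlite3.Connection) -> None:
    """Create the stand-in destination table if it does not exist yet."""
    with conn:
        conn.execute(CREATE_DEST_TABLE)


def load_invoice(conn: sqlite3.Connection, fields: dict) -> None:
    """Insert one processed invoice into the destination table.

    The connection context manager commits on success and rolls back on error,
    so a failed insert leaves the table unchanged.
    """
    with conn:
        conn.execute(INSERT_INVOICE, fields)
```

Declaring the invoice number as the primary key means that loading the same invoice twice raises `sqlite3.IntegrityError` rather than silently creating a duplicate row. This is one possible design choice, not a documented behavior of the product.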