Intelligent Character Recognition

Data extraction using Intelligent Character Recognition (ICR): it’s the dream, right? Scan your paper, load your electronic images, and stand back while the algorithms do their thing. The result is clean, usable, error-free data.

Unfortunately, the expectation and the reality are often very different. While moving toward automation may be the right move, organizations must understand the capabilities and shortcomings of ICR technology, and whether it meets their data capture goals. These considerations will keep them from spending on technology they may not need, and help them choose one that can successfully meet their document output objectives.

No one should force document automation if it isn’t needed. ICR is an expensive technology that requires a robust IT infrastructure to employ properly. How forms are structured, how they’re filled out, and their volume will determine whether a project is a candidate for ICR.

ICR Structure

Structured and semi-structured forms can have the majority of their data read and captured with ICR. These form types are stable and ICR can locate most or all of the fields with very little effort. In addition to recognizing fields, ICR can assign data to multiple fields for future use, attach lookup tables, run calculations, organize, sort and apply a variety of algorithms based on the specific need.

Unstructured forms are the opposite; the data points that need to be captured are random. Because of their very nature, unstructured forms must be manually reviewed and interpreted. The best industry practice to capture data from unstructured form types is double-blind manual keying (a topic I will cover in another blog post).

How Forms are Filled Out

The best type of data to capture is machine print: documents filled out using standardized type, as would be produced by a computer or typewriter. When implementing an ICR solution that relies on finding and capturing keyword data in either structured or unstructured forms, machine print documents must be used to achieve a high rate of data capture.

When capturing machine print data from structured forms, data capture accuracy approaches 100%. If structured documents are handwritten, the ICR software will know what the data points are, but the accuracy rate drops somewhat. For unstructured handwritten forms, the accuracy is extremely low, and these forms are not suited for ICR.
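The combinations described above can be summarized as a simple screening rule. The sketch below is only an illustration: the names (Structure, Fill, icr_fit) and the rating labels are my own assumptions rather than part of any ICR product, and treating semi-structured forms the same as structured ones is a simplification.

```python
from enum import Enum


class Structure(Enum):
    STRUCTURED = "structured"
    SEMI_STRUCTURED = "semi-structured"
    UNSTRUCTURED = "unstructured"


class Fill(Enum):
    MACHINE_PRINT = "machine print"
    HANDWRITTEN = "handwritten"


def icr_fit(structure: Structure, fill: Fill) -> str:
    """Return a qualitative (hypothetical) rating of how well a form type suits ICR."""
    if structure is Structure.UNSTRUCTURED:
        if fill is Fill.HANDWRITTEN:
            # Accuracy is extremely low; plan on manual keying instead.
            return "poor: not suited for ICR"
        # Keyword capture needs machine print, and manual review is still expected.
        return "limited: keyword capture with manual review"
    if fill is Fill.MACHINE_PRINT:
        # Stable fields plus standardized type: accuracy approaches 100%.
        return "strong: accuracy approaches 100%"
    # Fields are located reliably, but handwriting lowers the accuracy rate.
    return "moderate: fields are found, accuracy drops somewhat"


if __name__ == "__main__":
    for structure in Structure:
        for fill in Fill:
            print(f"{structure.value} / {fill.value}: {icr_fit(structure, fill)}")
```

A rule like this is useful only as a first pass; a real assessment would sample actual forms and measure capture accuracy.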

Volume Level

Because of the expense of ICR, it only makes sense to employ it if a project’s volume is high enough. Automation is meant to be a time-saving, and ultimately a cost-saving, technology. Even if highly structured, machine print forms are used, low volume will never produce a return on the ICR investment.
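To make the volume argument concrete, here is a rough break-even sketch. It assumes a deliberately simple cost model (a one-time ICR investment plus a flat per-form cost, compared against a flat per-form cost for manual keying); the function name and every figure are hypothetical and not taken from any real project.

```python
import math
from typing import Optional


def break_even_volume(
    fixed_cost: float,
    manual_cost_per_form: float,
    icr_cost_per_form: float,
) -> Optional[int]:
    """Smallest number of forms at which ICR costs no more than manual keying.

    Returns None when ICR saves nothing per form, since no volume
    can then recover the up-front investment.
    """
    saving_per_form = manual_cost_per_form - icr_cost_per_form
    if saving_per_form <= 0:
        return None
    return math.ceil(fixed_cost / saving_per_form)


if __name__ == "__main__":
    # Hypothetical figures: $50,000 up front, $0.75 per form keyed manually,
    # $0.25 per form with ICR (including exception handling).
    print(break_even_volume(50_000, 0.75, 0.25))
```

With these made-up numbers the project needs 100,000 forms before ICR pays for itself; a project expecting far fewer forms than that would be better served by manual keying.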

Intelligent Character Recognition has made some huge advancements in recent years, and with the rise of machine learning and artificial intelligence it may even surpass the human ability to recognize and capture form-level data. Until then, understanding what it can and cannot do will determine its successful implementation.

About iTech

iTech Data Services is a US-based data services and content management company with principal operations in the United States and India. iTech specializes in delivering cost-effective and quality solutions, including document scanning, OCR/ICR data capture, data entry, data integration, forms processing, workflow management, data transformation, and data archiving. Well-trained, skilled employees and state-of-the-art offshore locations enable iTech to deliver optimal solutions for its clients. For more information, contact Jason Dodge at jason@iTechDataServices.com.