Handwriting Recognition for the Enterprise
Despite the pace of digital transformation, handwritten content still flows through many enterprises and requires manual intervention. OCR has been available for decades, and with deep learning models now widely available, many AI experts would have us believe that handwriting recognition is a solved problem. Conversations with various enterprises suggest otherwise: it remains a thorn in the flesh. For a handwriting model to be adopted in the enterprise, it must either eliminate manual intervention (which means the extraction accuracy has to be good enough for straight-through processing) or significantly reduce the effort involved in verifying and correcting the extracted data. On closer examination, handwritten content appears mostly in forms such as policy applications, mortgage applications, insurance claims, and tax forms.
Most forms contain input areas which could be grouped into the following categories
Looking at the problem of extraction differently
Instead of approaching the handwriting problem as a whole, is it possible to break it into individual problems and carve out a solution tailored to each input group? That would let us apply specialized techniques to each type of content being captured.
Text in boxes is usually hand-printed, and people generally try to write legibly, which makes it easier for specialized character and number recognition models to read accurately. Text box reading accuracy is around 80%, which can be improved further with post-processing logic and rules. Similarly, machine learning models can be built for checkboxes to identify whether a box is ticked, irrespective of how the boxes are laid out (horizontally, vertically, or a combination of both), with very good accuracy of about 97-98%. Free-form text tends to be written in cursive, which is difficult to read even for the best deep learning models, so its accuracy will be significantly lower. Amazon, Microsoft, and Google all provide handwriting models that can be used with transaction-based pricing. Logos can be detected with image classification techniques, and open source solutions for extracting bar codes are readily available.
Strong domain-specific post-processing for each type of content can improve extraction accuracy further. The advantage of this approach is that each type of model can be refined and tuned independently of the others. The extraction accuracy of a handwritten form is then no longer a single number but an amalgamation of the accuracies of all these components.
Value to an enterprise
The end objective of a handwriting extraction solution is to reduce the manual effort in verifying and correcting the data.
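To connect the component accuracies mentioned earlier with that manual effort, here is a minimal Python sketch. It is an illustration under stated assumptions, not a measured result: the field counts per form are hypothetical, the 80% and 97.5% figures are treated as field-level accuracies, the 50% figure for free-form text is a placeholder (the article only says it is significantly lower), and errors are assumed to be independent across fields.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    fields_per_form: int  # hypothetical number of fields of this type on one form
    accuracy: float       # assumed field-level accuracy for this component


def expected_corrections(components):
    """Expected number of fields per form that a human has to correct."""
    return sum(c.fields_per_form * (1.0 - c.accuracy) for c in components)


def overall_field_accuracy(components):
    """Field-weighted average accuracy across all components."""
    total = sum(c.fields_per_form for c in components)
    correct = sum(c.fields_per_form * c.accuracy for c in components)
    return correct / total


def straight_through_rate(components):
    """Probability that every field on a form is correct,
    assuming errors are independent across fields."""
    rate = 1.0
    for c in components:
        rate *= c.accuracy ** c.fields_per_form
    return rate


# Hypothetical form layout; only the first two accuracies come from the article.
FORM = [
    Component("boxed_text", fields_per_form=20, accuracy=0.80),
    Component("checkbox", fields_per_form=10, accuracy=0.975),
    Component("free_form", fields_per_form=2, accuracy=0.50),  # assumed value
]

if __name__ == "__main__":
    print(f"expected corrections per form: {expected_corrections(FORM):.2f}")
    print(f"overall field accuracy: {overall_field_accuracy(FORM):.1%}")
    print(f"straight-through rate: {straight_through_rate(FORM):.2%}")
```

With these assumed numbers, the form averages about 84% field accuracy and roughly five corrections per form, while the chance of a form passing with no errors at all is well under 1%. That gap is why a single headline accuracy figure says little about how much manual work actually remains.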
The main problem for data entry teams is solution accuracy: the bulk of the human work goes into working out which fields are erroneous and correcting them, which is time-consuming. Any solution we provide has to start by identifying how a user would consume such an application. Can we design our model so that human effort is spent on evaluating the fields most critical to the enterprise rather than all the fields, so that the desired efficiency is achieved relative to the cost being spent?
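One possible reading of that question, sketched in Python: route only some fields to a human, using a stricter confidence threshold for business-critical fields and a looser one for the rest. Everything here is an assumption for illustration, including the `ExtractedField` structure, the availability of a per-field confidence score, the `critical` flag, and the threshold values.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # model confidence in [0, 1]; assumed to be available
    critical: bool     # hypothetical business-defined flag


def needs_review(field, critical_threshold=0.95, default_threshold=0.60):
    """Flag a field for human review when its confidence falls below
    the threshold that applies to its criticality."""
    threshold = critical_threshold if field.critical else default_threshold
    return field.confidence < threshold


def review_queue(fields):
    """Return flagged fields, critical ones first, lowest confidence first."""
    flagged = [f for f in fields if needs_review(f)]
    return sorted(flagged, key=lambda f: (not f.critical, f.confidence))


# Hypothetical extraction result for one form.
FIELDS = [
    ExtractedField("policy_number", "A12B45", 0.90, critical=True),
    ExtractedField("date_of_birth", "1980-02-11", 0.99, critical=True),
    ExtractedField("remarks", "call after 5", 0.40, critical=False),
    ExtractedField("middle_name", "J", 0.70, critical=False),
    ExtractedField("claim_amount", "1200", 0.55, critical=True),
]

if __name__ == "__main__":
    for f in review_queue(FIELDS):
        print(f"{f.name}: {f.value!r} (confidence {f.confidence:.2f})")
```

In this example three of the five fields reach the reviewer, with `claim_amount` and `policy_number` ahead of `remarks`. Tightening or loosening the two thresholds is the lever for trading review cost against the risk of an uncorrected error in a critical field.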
Some of the approaches which may be considered are