Handwriting Recognition for the Enterprise
Despite the pace of digital transformation, handwritten content still flows through many enterprises, warranting manual intervention. OCR as a technology has been available for decades, and with deep learning models now widely available, many AI experts would have us believe this is a solved problem. Conversations with various enterprises suggest it remains a thorn in their side. For a handwriting model to be adopted in the enterprise, the expected benefit is either eliminating manual intervention (which means extraction accuracy has to be good enough for straight-through processing) or significantly reducing the effort involved in verifying and correcting the extracted data. Examining the problem a little more closely, handwritten content appears mostly in forms: policy applications, mortgage applications, insurance claims, tax forms and the like.
Most forms contain input areas that can be grouped into the following categories:
- Text to be written in boxes
- Numbers in boxes
- Check boxes or radio buttons
- Free-form text (e.g., job description, feedback on services, nature of injury)
- Logo of the company
- Barcode
Looking at the problem of extraction differently
Instead of approaching handwriting recognition as a single problem, can we break it into individual problems and carve out a solution tailored to each input group? That means we can apply specialized techniques to each type of content being captured. Text in boxes is usually hand-printed characters, and people generally try to write legibly, making it easier for specialized character and number recognition models to read them accurately. Text-box reading accuracy is around 80%, which can be improved further with post-processing logic and rules. Similarly, machine learning models can be built for checkboxes to identify whether a box is ticked or not, irrespective of how the boxes are laid out (horizontally, vertically or a combination of both), giving very good accuracy of about 97-98%. Free-form text is where people tend to use cursive handwriting, which is difficult to read even for the best deep learning models, so its accuracy will be significantly lower. Amazon, Microsoft and Google all provide handwriting models that can be used on transaction-based pricing. Logos can be detected with image classification techniques, and open-source solutions for extracting barcodes are also readily available. Strong domain-specific post-processing for each type of content can improve extraction accuracy further.
The advantage of this approach is that each type of model can be refined and tweaked independently of the others. The extraction accuracy of a handwritten form is then no longer a single number but an amalgamation of the accuracies of all these components.
Value to an enterprise
The end objective of a handwriting extraction solution is to reduce the manual effort spent verifying and correcting the extracted data.
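The per-input-type decomposition described above can be sketched as a simple dispatch from field type to a specialized extractor. This is a minimal illustration, not a real pipeline: the extractor functions, field names and confidence numbers below are hypothetical placeholders standing in for trained models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

@dataclass
class Field:
    name: str      # e.g. "applicant_name" (illustrative)
    kind: str      # "boxed_text", "checkbox", "free_text", ...
    pixels: bytes  # cropped image region for this field (placeholder)

# Hypothetical extractors; each returns (value, confidence).
def extract_boxed_text(field: Field) -> Tuple[str, float]:
    return "JOHN DOE", 0.80      # per-character recognition model (~80%)

def extract_checkbox(field: Field) -> Tuple[str, float]:
    return "ticked", 0.97        # binary tick/no-tick classifier (~97-98%)

def extract_free_text(field: Field) -> Tuple[str, float]:
    return "cursive note", 0.60  # e.g. a cloud handwriting API

EXTRACTORS: Dict[str, Callable[[Field], Tuple[str, float]]] = {
    "boxed_text": extract_boxed_text,
    "checkbox": extract_checkbox,
    "free_text": extract_free_text,
}

def extract_form(fields):
    """Route each field to its specialized model; keep per-field confidence."""
    results = {}
    for field in fields:
        value, confidence = EXTRACTORS[field.kind](field)
        results[field.name] = {"value": value, "confidence": confidence}
    return results
```

Because each field carries its own confidence, the form's accuracy is reported per component rather than as a single number, and each extractor can be retrained or swapped out independently.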
The main problem for data entry teams is solution accuracy: the bulk of human effort goes into working out which fields are erroneous and correcting them, which is time consuming. Any solution we provide has to start by identifying how a user would consume such an application. Can we design the model so that human effort is spent evaluating the fields most critical to the enterprise rather than all of them, achieving the desired efficiency vis-à-vis the cost being spent?
Some of the approaches that may be considered are:
- Grade extracted fields as Red/Amber/Green (RAG), with green indicating the fields the system is most confident about.
- Order the extracted content by importance (descriptive cursive text may matter less to the consuming enterprise application than a checkbox or text in boxes).
- Combine both approaches, so that a human spends time scrutinizing the most important parts of the application.
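The RAG grading and importance ordering above can be combined in a short sketch. Each extracted field is assumed to carry a model confidence score; the thresholds and importance weights below are illustrative assumptions, not fixed recommendations.

```python
def grade(confidence: float, green: float = 0.95, amber: float = 0.80) -> str:
    """Map a model confidence score to a Red/Amber/Green grade."""
    if confidence >= green:
        return "green"
    if confidence >= amber:
        return "amber"
    return "red"

def review_queue(results: dict, importance: dict) -> list:
    """Return the fields needing human review, most important first.

    `results` maps field name -> {"value": ..., "confidence": ...};
    `importance` maps field name -> weight (higher = more critical).
    Green fields are skipped, so reviewers focus their effort on
    uncertain extractions of business-critical data.
    """
    pending = [(name, r) for name, r in results.items()
               if grade(r["confidence"]) != "green"]
    return sorted(pending, key=lambda item: -importance.get(item[0], 0))
```

A reviewer working through this queue sees the amber and red fields the enterprise cares about most before anything else, which is where the efficiency gain comes from.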
In some cases the extraction accuracy may be good enough for straight-through processing. For example, if the solution were extracting content from feedback forms, the critical fields would be the rating checkboxes rather than the descriptive text; getting the rating checkboxes to 99% accuracy with descriptive text at 60% could be good enough for straight-through processing. In conclusion, innovative approaches to recognizing handwritten content can deliver greater accuracy while leveraging current investments in technology.
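The feedback-form example can be expressed as a simple eligibility check: a form qualifies for straight-through processing when every field the enterprise deems critical clears its accuracy threshold, regardless of how poorly the non-critical fields fare. The field names and the 99% threshold are taken from the example above; the function itself is an illustrative sketch, assuming each field carries a confidence score.

```python
def eligible_for_stp(results: dict, critical_fields: list,
                     threshold: float = 0.99) -> bool:
    """True when every critical field meets the confidence threshold.

    Non-critical fields (e.g. descriptive cursive text) are ignored,
    so a 60%-confidence comments field does not block processing.
    """
    return all(results[f]["confidence"] >= threshold for f in critical_fields)
```

For a feedback form where only the rating checkboxes are critical, a low-confidence comments field no longer forces the whole form into manual review.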