Enhancing Lao Vehicle License Plate Recognition through jTessBoxEditor and Tesseract Optical Character Recognition (OCR)
Keywords:
Optical Character Recognition (OCR), Lao Characters, Data Set, Data Training, Lao Car Plate, jTessBoxEditor Tool, Image Processing
Abstract
Optical Character Recognition (OCR) is a critical technology that converts printed or handwritten text into machine-readable text, enabling automated data processing and analysis. However, OCR for scripts with complex layouts, such as Lao, is challenging because vowel signs and tone marks can be placed before, after, above, or below a consonant. This paper uses the jTessBoxEditor tool to train the Tesseract OCR engine to recognize Lao text. jTessBoxEditor, a box editor and trainer for Tesseract, provides a user-friendly interface for creating and refining training data. The experimental results demonstrate the effectiveness of several techniques for recognizing Lao characters on Lao car plates: data set creation, generation of TIFF/box files through character separation, and either isolating consonants or combining them with other letters. Notably, using both existing-box training and shape-clustering training contributes to improved recognition performance. Our findings also highlight the importance of speed and simplicity in the OCR modeling process. Specifically, creating a data set with character separation and consonant isolation, combined with existing-box training, emerges as the most efficient and effective approach for Lao text recognition.
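To make the TIFF/box step more concrete, the sketch below shows how separated character regions could be written as Tesseract box-file lines. It assumes a hypothetical upstream segmentation step that yields one `(symbol, x, y, width, height)` tuple per character with a top-left image origin; the names `CharBox` and `to_box_lines` are illustrative, not part of jTessBoxEditor or Tesseract. The output follows Tesseract's plain-text box format, one glyph per line as `<symbol> <left> <bottom> <right> <top> <page>`, with the origin at the bottom-left corner, so the vertical coordinates must be flipped.

```python
from typing import List, Tuple

# Hypothetical output of a character-separation step:
# (symbol, x, y, width, height), origin at the TOP-left of the plate image.
CharBox = Tuple[str, int, int, int, int]


def to_box_lines(boxes: List[CharBox], image_height: int, page: int = 0) -> List[str]:
    """Convert top-left-origin character boxes into Tesseract box-file lines.

    Each output line is "<symbol> <left> <bottom> <right> <top> <page>",
    measured from the BOTTOM-left corner of the image.
    """
    lines = []
    for symbol, x, y, w, h in boxes:
        # Box files are space-separated, so the symbol must not contain whitespace.
        if not symbol or any(c.isspace() for c in symbol):
            raise ValueError(f"invalid symbol: {symbol!r}")
        left, right = x, x + w
        # Flip the y axis: a top-left y becomes a distance from the bottom edge.
        bottom = image_height - (y + h)
        top = image_height - y
        lines.append(f"{symbol} {left} {bottom} {right} {top} {page}")
    return lines


if __name__ == "__main__":
    # Two separated Lao consonants on a plate image 100 pixels high.
    demo = [("ກ", 10, 20, 30, 40), ("ຂ", 45, 20, 28, 40)]
    for line in to_box_lines(demo, image_height=100):
        print(line)
```

Under the consonant-isolation strategy, each consonant gets its own box, as in the demo above. If consonants were instead combined with accompanying letters into one training unit, the symbol field would presumably hold the whole multi-character cluster for that box. Box files produced this way can then be inspected and corrected in jTessBoxEditor before training.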
