Is it possible to train Tesseract v5 to OCR Egyptian licence plates?

2 years ago

#271607

Radwa Ahmed

I'm working on a project to OCR Egyptian licence plates, which are written in the Arabic alphabet and Arabic-Indic numerals. The traineddata from https://github.com/Shreeshrii/tessdata_arabic gives an accuracy of 60% for letters and 70% for numbers. I'm guessing the accuracy is poor because the font on the plates is different. Also, the letters are written separately on the plates (أ هـ ج)(ل ل ص), while they are usually connected in textbooks (أهج)(للص). On top of that, the detected plates have different lighting conditions, and the letters may not be clear because a plate can be dirty or distorted.

Here's a sample that's recognised with an extra apostrophe at the beginning ('ل ل ص ٦٢٩) after preprocessing the image to grayscale and then to black and white. The correct characters are (ل ل ص ٦٢٩).
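One cheap workaround for stray punctuation like that apostrophe is to filter the OCR output down to the characters a plate can contain. This is a minimal sketch, assuming the plates only use Arabic letters (U+0621 to U+064A, which also covers the tatweel in هـ) and Arabic-Indic digits (U+0660 to U+0669). The function name `clean_plate_text` is hypothetical, and it only hides spurious symbols; it does not fix misread letters.

```python
import re

# Assumption: a plate contains only Arabic letters (U+0621-U+064A, including
# the tatweel U+0640) and Arabic-Indic digits (U+0660-U+0669).
# Everything else except whitespace is treated as OCR noise.
_NOT_PLATE_CHAR = re.compile(r"[^\u0621-\u064A\u0660-\u0669\s]")


def clean_plate_text(raw: str) -> str:
    """Drop characters that cannot appear on a plate and normalise spacing."""
    stripped = _NOT_PLATE_CHAR.sub("", raw)
    # Collapse runs of whitespace and trim the ends.
    return " ".join(stripped.split())


# The sample above: the leading apostrophe is removed, the rest is kept.
print(clean_plate_text("'ل ل ص ٦٢٩"))
```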

Here is another sample of the plates I am trying to recognise, also with black-and-white preprocessing. This one fails: it's recognised as (ط ئ ؤ د ١٢), but the characters on the plate are (ط ج د ١٢٦٤).
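To compare preprocessing or training options on samples like these, it helps to score each result with a number rather than by eye. The sketch below computes a character error rate from the Levenshtein edit distance, ignoring spaces. The helper names `levenshtein` and `char_error_rate` are made up for illustration, and this is only one possible metric, not necessarily how the accuracy figures above were measured.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def char_error_rate(predicted: str, truth: str) -> float:
    """Edit distance divided by the ground-truth length, with spaces ignored."""
    p = predicted.replace(" ", "")
    t = truth.replace(" ", "")
    return levenshtein(p, t) / max(len(t), 1)


# The failing sample above: OCR output versus the characters on the plate.
print(char_error_rate("ط ئ ؤ د ١٢", "ط ج د ١٢٦٤"))
```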

Should I try different preprocessing? Or should I retrain the existing traineddata for the different font (I searched for the font name but couldn't find it)? Or should I train from scratch, since the plate images have a lot of noise and differ in brightness/contrast?
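As an illustration of one preprocessing alternative, a fixed black-and-white cutoff can be replaced with Otsu's method, which picks a threshold per image from its histogram and so copes somewhat better with plates of differing brightness. This is a dependency-free sketch, assuming the grayscale image is a 2D list of 0-255 integers. The function names are hypothetical, and in practice a library routine such as OpenCV's `cv2.threshold` with `cv2.THRESH_OTSU` would normally be used instead.

```python
def otsu_threshold(gray):
    """Return the Otsu threshold for a 2D list of 0-255 grayscale values."""
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1

    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0  # pixel count and intensity sum of the dark class
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue  # no pixels at or below t yet
        w_fg = total - w_bg
        if w_fg == 0:
            break  # every pixel is already in the dark class
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance (up to a constant factor); Otsu maximises it.
        between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def binarize(gray, threshold):
    """Map pixels above the threshold to 255 (white) and the rest to 0 (black)."""
    return [[255 if v > threshold else 0 for v in row] for row in gray]


# Tiny synthetic example: three dark pixels and three bright ones.
sample = [[10, 12, 200], [11, 210, 205]]
t = otsu_threshold(sample)
print(t, binarize(sample, t))
```

Global thresholding like this still struggles with uneven lighting across a single plate, so it is a starting point for experiments rather than a fix.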

ocr

arabic

python-tesseract

pre-trained-model

indic

0 Answers
