1 year ago
#271607
Radwa Ahmed
Is it possible to train Tesseract v5 to OCR Egyptian licence plates?
I'm working on a project to OCR Egyptian licence plates written in the Arabic alphabet and Arabic-Indic numerals. The traineddata from https://github.com/Shreeshrii/tessdata_arabic gives an accuracy of 60% for letters and 70% for numbers. I'm guessing the poor accuracy is because the font on the plates is different. Also, the letters are written separately (أ هـ ج)(ل ل ص) on the plates, whereas they are usually connected in textbooks (أهج)(للص). The detected plates also have different lighting conditions, and the letters may not be clear because the plate can be dirty or distorted.
Here's a sample that is recognised with an extra apostrophe at the beginning ('ل ل ص ٦٢٩) after converting the image to grayscale and then to black and white. The correct characters are (ل ل ص ٦٢٩).
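One way a stray character such as the leading apostrophe could be dropped is to filter the OCR output after recognition. The following is only a minimal sketch, assuming the plates contain nothing but Arabic letters and Arabic-Indic digits; the function name `clean_plate_text` and the chosen Unicode ranges are illustrative assumptions, not part of Tesseract or pytesseract.

```python
import re

# Assumption: valid plate characters are Arabic letters (U+0621..U+064A,
# which also covers the tatweel U+0640) and Arabic-Indic digits (U+0660..U+0669).
_ALLOWED = re.compile(r"[\u0621-\u064A\u0660-\u0669]")


def clean_plate_text(raw: str) -> str:
    """Hypothetical helper: drop every character that is not an Arabic
    letter, an Arabic-Indic digit, or whitespace, then collapse whitespace."""
    kept = "".join(ch for ch in raw if ch.isspace() or _ALLOWED.match(ch))
    return " ".join(kept.split())


if __name__ == "__main__":
    # The misrecognised sample from above, with the extra leading apostrophe.
    print(clean_plate_text("'ل ل ص ٦٢٩"))
```

This only hides spurious symbols; it cannot repair letters or digits that were misread or missed.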
Here is another sample of the plates I am trying to recognise, also with black-and-white preprocessing. This one fails: it is recognised as (ط ئ ؤ د ١٢), while the characters on the plate are (ط ج د ١٢٦٤).
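For reference, the grayscale-then-black-and-white step mentioned for both samples could look roughly like the sketch below. It is written in plain Python on nested lists so it runs without extra libraries, and it assumes Otsu's method for picking the threshold, which the description above does not specify. The function names are illustrative. In practice the same steps are usually done with OpenCV (`cv2.cvtColor` followed by `cv2.threshold` with the `cv2.THRESH_OTSU` flag).

```python
def to_grayscale(rgb):
    """Convert rows of (r, g, b) tuples (0-255) to rows of gray levels,
    using the common ITU-R BT.601 luma weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb]


def otsu_threshold(gray):
    """Return the threshold that maximises the between-class variance
    (Otsu's method). Pixels <= threshold form the dark class."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark, sum_dark = 0, 0
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue            # no dark pixels yet
        w_light = total - w_dark
        if w_light == 0:
            break               # every pixel is already in the dark class
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        var_between = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray, threshold):
    """Map pixels above the threshold to white (255) and the rest to black (0)."""
    return [[255 if v > threshold else 0 for v in row] for row in gray]


if __name__ == "__main__":
    # Tiny synthetic 2x2 "image": dark and light pixels on a checkerboard.
    image = [[(10, 10, 10), (200, 200, 200)],
             [(200, 200, 200), (10, 10, 10)]]
    gray = to_grayscale(image)
    print(binarize(gray, otsu_threshold(gray)))
```

A single global threshold like this is sensitive to uneven lighting, which may matter for plates photographed under varying conditions.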
Should I try different preprocessing? Or should I retrain the existing traineddata for the different font (I searched for the font name but couldn't find it)? Or should I train from scratch, since the plate images have a lot of noise and differ in brightness/contrast?
ocr
arabic
python-tesseract
pre-trained-model
indic
0 Answers