1 year ago
#382850
seongyeop
Removing noise lines from a captcha image to solve a complex captcha
I want to delete the random noise lines in a captcha image. Several captcha samples are shown below.
I utilized cv2 and pytesseract.
import cv2
import matplotlib.pyplot as plt
import pytesseract
from pytesseract import image_to_string
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
img = cv2.imread("test.jpg")
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
plt.imshow(gry, 'gray')
plt.show()
(h, w) = gry.shape[:2]
gry = cv2.resize(gry, (w*2, h*2))
plt.imshow(gry, 'gray')
plt.show()
cls = cv2.morphologyEx(gry, cv2.MORPH_CLOSE, None)
# cls = cv2.morphologyEx(gry, cv2.MORPH_OPEN, None)
plt.imshow(cls, 'gray')
plt.show()
# val, thr = cv2.threshold(cls, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
val, thr = cv2.threshold(cls, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)
# val, thr = cv2.threshold(cls, 200, 255, cv2.THRESH_BINARY_INV)
# val, thr = cv2.threshold(cls, 0, 255, 8 )
plt.imshow(thr, 'gray')
plt.show()
print(val)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (4,8))
morph_img = cv2.morphologyEx(thr, cv2.MORPH_CLOSE, kernel)
plt.imshow(morph_img, 'gray')
plt.show()
txt = image_to_string(thr)
print(txt)
txt = image_to_string(morph_img)
print(txt)
The result.
As the result shows, some noise lines are hard to remove, and they make characters such as '7' and 'E' hard to recognize.
Is there a good way to solve this captcha?
Please recommend a method for dealing with these noise lines!
++ The image's data URL is encoded with base64, like this:

When I changed some characters in the base64 text, a portion of the image changed. Could this be a small clue toward solving the captcha?
opencv
line
captcha
python-tesseract
noise
0 Answers