1 year ago

#382850

test-img

seongyeop

Removing noise line in captcha image to solve complex captcha image

I want to remove the random noise lines from a captcha image. Several captcha samples are shown below.

test1

test2

test3

I utilized cv2 and pytesseract.

import cv2
import matplotlib.pyplot as plt  # needed for the plt.imshow / plt.show calls below
import pytesseract
from pytesseract import image_to_string
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

img = cv2.imread("test.jpg")
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
plt.imshow(gry, 'gray')
plt.show()
(h, w) = gry.shape[:2]
gry = cv2.resize(gry, (w*2, h*2))
plt.imshow(gry, 'gray')
plt.show()

cls = cv2.morphologyEx(gry, cv2.MORPH_CLOSE, None)
# cls = cv2.morphologyEx(gry, cv2.MORPH_OPEN, None)
plt.imshow(cls, 'gray')
plt.show()
# val, thr = cv2.threshold(cls, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
val, thr = cv2.threshold(cls, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)

# val, thr = cv2.threshold(cls, 200, 255, cv2.THRESH_BINARY_INV)
# val, thr = cv2.threshold(cls, 0, 255, 8 )
plt.imshow(thr, 'gray')
plt.show()
print(val)

kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (4,8))
morph_img = cv2.morphologyEx(thr, cv2.MORPH_CLOSE, kernel)
plt.imshow(morph_img, 'gray')
plt.show()

txt = image_to_string(thr)
print(txt)
txt = image_to_string(morph_img)
print(txt)

The result.

result

As the result shows, some noise lines are hard to remove, and they make characters such as '7' and 'E' hard to recognize.

Is there a good way to solve this captcha?

Please recommend a method for dealing with these noise lines!

++ The image's data URL is base64-encoded, like this:



When I changed some characters in the base64 text, part of the image changed. Could this be a small step toward solving the captcha?
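
For context, the base64 text in a data URL is normally just the image file's bytes in encoded form, which would explain why editing characters changes part of the picture. A minimal sketch of decoding such a URL with the standard library follows. The helper name `decode_data_url` and the sample payload are made up for illustration, and the sketch assumes the simple `data:<mime>;base64,<payload>` form.

```python
import base64


def decode_data_url(data_url):
    """Split a 'data:<mime>;base64,<payload>' URL into (mime_type, raw_bytes)."""
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URL")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)


# Made-up sample: only the 8-byte PNG signature, not a real captcha image.
sample_bytes = b"\x89PNG\r\n\x1a\n"
sample_url = "data:image/png;base64," + base64.b64encode(sample_bytes).decode("ascii")

mime_type, raw = decode_data_url(sample_url)
print(mime_type, len(raw))
```

With a real image payload, the decoded bytes can be turned into an OpenCV image with `cv2.imdecode(np.frombuffer(raw, dtype=np.uint8), cv2.IMREAD_COLOR)`, so the captcha does not have to be saved to disk first. Since the payload holds pixel data rather than the answer text, editing it is unlikely to help with recognition on its own.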

opencv

line

captcha

python-tesseract

noise

0 Answers
