1 year ago

#382850

test-img

seongyeop

Removing noise lines from a captcha image to solve a complex captcha

I want to delete the random noise lines in a captcha image. Several captcha samples are shown below.

test1

test2

test3

I used cv2 and pytesseract.

import cv2
import matplotlib.pyplot as plt  # needed for the plt.imshow / plt.show calls below
import pytesseract
from pytesseract import image_to_string
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

img = cv2.imread("test.jpg")
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
plt.imshow(gry, 'gray')
plt.show()
(h, w) = gry.shape[:2]
gry = cv2.resize(gry, (w*2, h*2))
plt.imshow(gry, 'gray')
plt.show()

cls = cv2.morphologyEx(gry, cv2.MORPH_CLOSE, None)
# cls = cv2.morphologyEx(gry, cv2.MORPH_OPEN, None)
plt.imshow(cls, 'gray')
plt.show()
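# Non-inverted alternative (its result would be overwritten by the next call, so it is disabled):
# val, thr = cv2.threshold(cls, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
val, thr = cv2.threshold(cls, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)

# Other variants that were tried:
# val, thr = cv2.threshold(cls, 200, 255, cv2.THRESH_BINARY_INV)
# val, thr = cv2.threshold(cls, 0, 255, cv2.THRESH_OTSU)  # cv2.THRESH_OTSU == 8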
plt.imshow(thr, 'gray')
plt.show()
print(val)

kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (4,8))
morph_img = cv2.morphologyEx(thr, cv2.MORPH_CLOSE, kernel)
plt.imshow(morph_img, 'gray')
plt.show()

txt = image_to_string(thr)
print(txt)
txt = image_to_string(morph_img)
print(txt)

The result.

result

As the result shows, some noise lines are hard to remove, and they make characters such as '7' and 'E' hard to recognize.

Is there any good solution to solve the captcha?

Please recommend a method for dealing with these interfering lines!
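To make the question concrete, here is a minimal sketch of one approach that might be worth trying: morphological opening (erosion followed by dilation), which removes foreground structures thinner than the kernel. It assumes the noise lines are thinner than the character strokes, which may not hold for every sample. It is written in plain Python on a 0/1 grid so the idea is explicit; the helper names `erode`, `dilate` and `opening` are illustrative only. In the OpenCV code above, the closest equivalent would be calling cv2.morphologyEx with cv2.MORPH_OPEN (rather than cv2.MORPH_CLOSE) on the inverted binary image, where the text is white.

```python
def erode(img, k=3):
    """Keep a pixel only if its whole k x k neighbourhood is foreground.
    k is assumed odd; pixels outside the image count as background."""
    h, w, r = len(img), len(img[0]), k // 2
    return [[int(all(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                     for dy in range(-r, r + 1)
                     for dx in range(-r, r + 1)))
             for x in range(w)]
            for y in range(h)]


def dilate(img, k=3):
    """Set a pixel if any pixel in its k x k neighbourhood is foreground."""
    h, w, r = len(img), len(img[0]), k // 2
    return [[int(any(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                     for dy in range(-r, r + 1)
                     for dx in range(-r, r + 1)))
             for x in range(w)]
            for y in range(h)]


def opening(img, k=3):
    """Erosion followed by dilation: drops structures thinner than k."""
    return dilate(erode(img, k), k)


if __name__ == "__main__":
    # Toy image: a 1-pixel-thick "noise line" on row 1 and a 4x4 "stroke".
    demo = [[0] * 12 for _ in range(9)]
    for x in range(12):
        demo[1][x] = 1
    for y in range(3, 7):
        for x in range(4, 8):
            demo[y][x] = 1
    for row in opening(demo, 3):
        print("".join("#" if v else "." for v in row))
```

The kernel size is the main knob: too small and the lines survive, too large and thin parts of the characters are eroded as well, so it would need tuning per captcha style.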

++ The image is served as a base64-encoded data URL like this:

data:image/png;base64,/9j/4AAQSkZJRgABAgAAAQABAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRofHh0aHBwgJC4nICIsIxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDL/2wBDAQkJCQwLDBgNDRgyIRwhMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjL/wAARCAAgAKoDASIAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEAAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSExBhJBUQdhcRMiMoEIFEKRobHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3uLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk5ebn6Onq8vP09fb3+Pn6/9oADAMBAAIRAxEAPwD2+iimSyxwQvNNIscUalndzgKBySSegqxD6p3el2d7KJpY2WYLt86GRopNvXbvQhtuTnGcZ5rye68V6pY/ES3u5tQV7O4YEWJvGjit1YGMLMOQjrwzjBwc161Z2stvve4u5bmaTG4sAqLjsijgDJPXLdAWOBUwnzXFCbvoVv7Pv4P+PTVpCOgjvIlmRV9iu1yenLMe+cnmgXmo2rxi9tLZoWdYvPguMHcxADFHA2gk9AzHJAG7rVu/a1WwmF8qPbMuyRHTeHDcbdvO7OcbcHOcYOa+ftL8ax6Z41f/AISOHVLvw3e7pLWPVC83lRO+YplSTduAUFcjJILHJ+7XTSoyqptdDXn6y2Pomq91ex2mxWWSSWTPlxRKWZsfyHIG44AyMkZrJv8ATItLtHu7O5vbSOIFpfLuxsjiAJO1JsxKowOgXABwccHlrjwxqnizw5ZC+u7OacyRzX0Sl4pUlEbZjMhMoXazn5AgHJxtzXO07aGUr7RNu71PSVvBc3+s6M95bSjdaefGuzbu+XLHPmKx4Y7emAE3MT01pd29/apc2syTQyD5XU8HBwR9QQQR2IrzPVvCPhrQvCl9f3fhqdbizhVVeW7Z1nkJ2hhskB27iCcqhweAOg6rwtpt14X8OadbXkrvGsJNzuOfIcncMcnCjJUkHHAbAG41HKoarYlJxdjZvJHtL22ud7fZ5CLeZc8AsfkbHruO3gc7wScLUl7qmn6bs+339ra+ZnZ58ypuxjOMnnqPzqe4giuraW3mXdFKhR1yRlSMEcV5JoN1Lp3xMlvNdb7bc3ELRpJbwCSSOQIjcxoNykJ8pIXrn0YhylaxpFXZ6Bq+qwahpKW+k38csl/cLZJLazAlM/NKVdThXWISOCe6jg5AO/Xkmh/8TL4331/pPy2MClblR+5Ylo8PujbDH94Fzxwxye1ejJ/xN74TdbC1kdFRv+W06tjdj0QqwGerfNgbVY6LVWKcbE1xrek2k7QXOqWUMy43Ry3CKwyMjIJ9Kz5/FugW19DFd3D21ywwn2m1liIVjjqyjCkjr049q4fWYoPC/wAU9PuLYW0FvN5bFNgjjhRsxN0IHQFs8cn89H4g6nBrPh+yS0ju/s5uEme7ktJUhjQgqGLFeQd4xgH+Wed1XZ+R50
sVNKb0Ti9j0C3W2KtcWyxFbkiVpIwMSnaAGJH3vlCjPoBS3FzBaQNPczRwwrjdJIwVRk4GSfeszwtdWt14ZsBaXUdwsEKQO8eQN6KARggEevIHBB71sVsndXOyEuaKZFb3MF3As9tNHNC2dskbBlODg4I96lrx34enSl/tF9WsEngHlfv5LcSpB9/O44JQEZJYjaAvzEcZ9Be00G11G3NlaWkOowXBSKOGNYi7mIkqTjlQj7jj0HU8FytHcdzaluNs6QRp5kpwzDOAiZ+8T+BwO59gSJ6gt7fydzu/mTyY8yTGM46ADsoycD69SSTPUq/UEc9eeHPCVjEJJ/D2lgM21VTTkd3PXCqqkscAngdAT0BrnNU8H22r3dvp0GjaHpit/pEypYrJIsSn5Q7oV2l2ONq9kciQ4we40meS60axuJm3Sy28bu2MZJUEnio4tFs4dTfUUN19pf7xa7lZSPmwNhbbgbmwMYGeMUX5kmuobnlvj7wrNbeH7fURoWj2L2pC3EmmuFDhsDJQxqcbguPmJG49eo7Pw8f7W8L2mtLrOpWTTRF7uRpUdWdSQ7YlDrGuQxATaMHkcADQ8XR293oz6dNK4NwrHy1MY3Ko+ZnZ1YIi5DF8ZB24yxCnI8E+Djo1s8s89yYJXWaOzlfKK4Jw5G0HOMYyAflDEK2FjUaTU3N7FU1KL5r6FqSbW7WCbXtQfT5ra0ieSOKWKW28qMKS0xA80hiMjaVyFPVSWU8vrHhyXxB8MtL0q/0O6t72zsYfs195QlELhF+Uop84bgArDYdp5wwUNXoeuaLbeINKm067kuY4ZVZS1vO0TcqV/hPzDBPytlT3Bq1ZwSW1qkMt3NduuczTBA7c55CKq8dOAOlbxm42cXrcuVRSfvRX9fh+B4J4Q8a3o8La74H1gzPfG0ltNNjdcOJCrIYCSRyDjaDzwVHOxa92ltbe92XcMmyUoPLuoNpYoecZIIZTnocjv1ANc34h+Huna54n0rxDFJ9j1GyuYppXSPcLlEIIVhkfNwAG9OCDgY37m3NsJniWRrWbd58MRIdSerx7eQe5A5PUfNkNpXqQnaUdO/qY2scrr8M+v+MdC8P3saPb22/UbtI8FJEX5YyQwzy24FBnhup6ju65E3UenmfUbJbLVZpswJqEKo9wZMAhJFjUF1AUE7Pm2gfIdpauptnSS1heOf7QjIpWbIPmDH3srxz144rmuC3aG2lt9ktxbq+6KP5YgRyqAcKT3x6+mM5OSfLPCJuND8Q6xe6t5ou7aCH7UjgySGJl/eSKBkvh1jJYZGM9TivWq5rX7LSoBc3d3YWN3eTHehuLYSGKNVUO57lUALEZGSQoOWGZ5NVboXFX0R57f6ZeXHxA0ODzpYdVuII5dRa2nCyxSuXeRQ4PRUO1ck/IqjnjPskUUcESRRRrHGihURBgKBwAB2FY/hnQoNDspvLt1hluZTLIAFBA6KDt4BxyQPlDFtuFwK26abbcu/8AX9fcaVJJ2iuh5p8V45IZdHv4EaORGkU3CDBUjaUG4dD94j8cd66C+8ZaXGi5e21O0vNsccFqweclh91oj1B55yDlgu3vW/d6Tp9/KJL2zhuWVdqidA4UewOQCe5HXAz0GLUcccMSRRIscaKFVFGAoHQAdhUOErtp2ucKozVSUk97HnPw88K7tJmv72W4QXLL5K2148WVAPJMTjJySMNyu0+td5q13JYaNfXkQUyQW8kqhhwSqkjPtxTbjTVadrq0k+yXbY3yogIlwOBID94cDuGAyARk057KO/s0i1W1tbllbJUx70JGQGAbOCR25xkjJ6mqacUovoaUaSpQUF0PMvh9Z6c1u8l7YwXj3DyIkcsSyNmMRlQmehPmtnthQeACa6Kx0ddP8TXV1p0VuTbIp/s2LakYkZR5vkg42uENv83CtvYHbuJV76FpmneIbuUwy22nW1o147QXEkYjdiQSFVsg7Y3zt4I2jHAzs6BocdhDFezpN/acsWbgy3
Ly7WbaWUZYjA2qoPJwi5J60P35XYJNy1Jo9etrjMFsN2ojh7F2CyxH/poBnav+1yCMbd25cv8A7OvW+Z9avFY8kRxQhQfYFCQPTJJ9zTJNChuMTzzSjUByLyFjG6/7K9R5YOD5bblJALBjknVpcrl8Rve2x//Z

When I changed some of the encoded characters in the base64 text, part of the image changed. Could this be a small step toward solving the captcha?
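For reference, a minimal sketch of how such a data URL can be decoded and inspected with the standard library only. The helper names `decode_data_url` and `sniff_format` are hypothetical, and the sample string is just the first 28 characters of the payload above, which is enough to read the file header. Although the URL declares image/png, the payload begins with /9j/, which decodes to the JPEG signature bytes FF D8 FF.

```python
import base64


def decode_data_url(data_url):
    """Split a base64 data URL into (declared MIME type, raw bytes)."""
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URL")
    mime = header[len("data:"):-len(";base64")]
    return mime, base64.b64decode(payload)


def sniff_format(raw):
    """Guess the actual image format from the leading magic bytes."""
    if raw.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if raw.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    return "unknown"


# Truncated sample: only the start of the payload shown above.
sample = "data:image/png;base64,/9j/4AAQSkZJRgABAgAAAQABAAD/"
mime, raw = decode_data_url(sample)
print(mime, sniff_format(raw))  # prints: image/png jpeg
```

With the full payload, the decoded bytes could be turned into an OpenCV image via cv2.imdecode on a NumPy uint8 buffer instead of reading a file from disk. Since the payload appears to be compressed JPEG data, editing base64 characters changes the compressed stream itself, so it most likely corrupts blocks of the picture rather than selectively removing the noise lines.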

opencv

line

captcha

python-tesseract

noise

0 Answers
