1 year ago

#280679

aiden rosenblatt

How to replace strings in a dataframe where there is a likely typo

I've been working on this for a few hours but have made no progress on automating it. I have a dataframe with over 50,000 rows.

Occasionally there is a misspelling, such as:

  • Rosalind vs Rosalinda
  • Wong vs Wang

Of course, there can be cases where these really are two different people, but let's assume that in those cases they work in different factories:

  • John Wong from Factory1
  • John Wang from Factory1 -> Should be changed to John Wong
  • John Wang from Factory2

Without manually finding all the typos, how do I clean this dataset, or at least identify likely typos?
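To make "identify likely typos" concrete, here is a minimal sketch of one way it could be done: compare every pair of distinct full names within the same Location and flag pairs that are very similar. It uses `difflib.SequenceMatcher` from the standard library. The helper name `likely_typo_pairs` and the `threshold` value of 0.8 are assumptions chosen for illustration, and the sketch assumes the Location column itself is already clean.

```python
from difflib import SequenceMatcher
from itertools import combinations

import pandas as pd


def likely_typo_pairs(df, threshold=0.8):
    """List pairs of distinct full names in the same Location that look alike.

    `threshold` is an illustrative cutoff on SequenceMatcher.ratio(), not a tuned value.
    """
    full = df["Fname"].str.strip() + " " + df["Lname"].str.strip()
    rows = []
    # Only compare names that share a Location, so the same name in
    # different factories is never flagged.
    for location, names in full.groupby(df["Location"]):
        for a, b in combinations(sorted(names.unique()), 2):
            score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if score >= threshold:
                rows.append(
                    {"Location": location, "name_a": a, "name_b": b, "score": score}
                )
    return pd.DataFrame(rows, columns=["Location", "name_a", "name_b", "score"])


df = pd.DataFrame({
    "Lname": ["Wong", "Wang", "Wong", "Wang"],
    "Fname": ["John", "John", "Joh", "John"],
    "Location": ["Factory1", "Factory1", "Factory1", "Factory2"],
})
candidates = likely_typo_pairs(df)
print(candidates)
```

The result is a list of candidates to review rather than a set of confirmed typos. Pairwise comparison grows quadratically with the number of distinct names per location, so on 50,000 rows it may help that only unique names are compared.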

So the dataframe would go from

DF1

  Lname    Fname     Location
  Wong     John      Factory1
  Wang     John      Factory1
  Wong     Joh       Factory1
  Wang     John      Factory2

to something like

   Lname   Fname     Location
   Wong    John      Factory1
   Wong    John      Factory1
   Wong    John      Factory1
   Wang    John      Factory2

Is something like this possible? Thanks
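For the transformation from DF1 to the cleaned table, one possible sketch is below. Within each Location, and for each name column separately, a value is replaced by a more frequent value that is within a small Levenshtein distance of it. The function names `levenshtein`, `canonical_map`, and `fix_typos`, and the `max_dist=1` cutoff, are assumptions for illustration. Treating the most frequent spelling as the correct one is also an assumption, and the sketch assumes the Location column is already clean.

```python
import pandas as pd


def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def canonical_map(values, max_dist=1):
    """Map each value to the most frequent value within max_dist edits of it."""
    counts = values.value_counts().to_dict()
    mapping = {}
    for name in counts:
        best = name
        for other in counts:
            # Only move towards a strictly more frequent spelling.
            if counts[other] > counts[best] and levenshtein(name, other) <= max_dist:
                best = other
        mapping[name] = best
    return mapping


def fix_typos(df, columns=("Lname", "Fname"), max_dist=1):
    """Return a copy of df with likely typos replaced, one Location at a time."""
    out = df.copy()
    for col in columns:
        for _, idx in out.groupby("Location").groups.items():
            mapping = canonical_map(out.loc[idx, col], max_dist)
            out.loc[idx, col] = out.loc[idx, col].map(mapping)
    return out


df1 = pd.DataFrame({
    "Lname": ["Wong", "Wang", "Wong", "Wang"],
    "Fname": ["John", "John", "Joh", "John"],
    "Location": ["Factory1", "Factory1", "Factory1", "Factory2"],
})
cleaned = fix_typos(df1)
print(cleaned)
```

Because each column is corrected independently, two genuinely different people in the same factory whose last names differ by one letter would also be merged, so it is probably safer to review the proposed replacements before applying them.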

Edit: fixed typo in the location

python

python-3.x

pandas

levenshtein-distance

0 Answers
