1 year ago
#280679
aiden rosenblatt
How to replace strings in a dataframe where there is a likely typo
I've been working on this for a few hours but have made no progress on how to automate it. I have a dataframe with over 50,000 rows.
Occasionally there is a misspelling, such as:
- Rosalind vs Rosalinda
- Wong vs Wang
Of course, there can be cases where there really are two different people, but let's assume that they work in different factories:
- John Wong from Factory1
- John Wang from Factory1 -> Should be changed to John Wong
- John Wang from Factory2
Without manually finding all the typos, how do I clean this dataset, or at least identify likely typos?
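For the "at least identify likely typos" part, one possible starting point is to compare names only within the same location and flag pairs that are very similar. The sketch below is only an illustration under several assumptions: it uses `difflib.SequenceMatcher` from the standard library as a stand-in for a true Levenshtein distance, the sample `rows`, the helper names, and the `0.8` threshold are all hypothetical, and the location values are assumed to be consistent already.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical sample rows: (Lname, Fname, Location).
# Assumption: the location values are already spelled consistently.
rows = [
    ("Wong", "John", "Factory1"),
    ("Wang", "John", "Factory1"),
    ("Wong", "Joh", "Factory1"),
    ("Wang", "John", "Factory2"),
]


def similarity(a, b):
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def likely_typos(rows, threshold=0.8):
    """Return (location, name_a, name_b, score) for similar names in one location."""
    # Collect the distinct full names seen at each location.
    by_location = {}
    for lname, fname, location in rows:
        by_location.setdefault(location, set()).add(f"{fname} {lname}")

    flagged = []
    for location, names in by_location.items():
        # Compare every pair of distinct names within the same location only.
        for a, b in combinations(sorted(names), 2):
            score = similarity(a, b)
            if score >= threshold:
                flagged.append((location, a, b, round(score, 2)))
    return flagged


for hit in likely_typos(rows):
    print(hit)
```

Names at different locations are never compared, so John Wang at Factory2 is not flagged against John Wong at Factory1. The threshold would need tuning on real data: too low and distinct people get flagged, too high and real typos are missed.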
So the dataframe would go from
DF1
Lname  Fname  Location
Wong   John   Factory1
Wang   John   Factory1
Wong   Joh    Facotry1
Wang   John   Factory2
to something like
Lname  Fname  Location
Wong   John   Factory1
Wong   John   Factory1
Wong   John   Factory1
Wang   John   Factory2
Is something like this possible? Thanks
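A rough sketch of how the transformation above might be automated with pandas: group by `Location`, greedily cluster similar full names within each group, then overwrite each name column with the most common spelling in the cluster. Everything here is an assumption rather than a known solution: `normalize_names`, `similar`, and the `0.8` threshold are hypothetical, `difflib.SequenceMatcher` stands in for Levenshtein distance, the dataframe is assumed to have a unique index, and `Location` is assumed to be cleaned already (the same idea could be applied to that column first).

```python
from difflib import SequenceMatcher

import pandas as pd

# Hypothetical sample mirroring the example in the question,
# with the Location column assumed to be clean already.
df = pd.DataFrame(
    {
        "Lname": ["Wong", "Wang", "Wong", "Wang"],
        "Fname": ["John", "John", "Joh", "John"],
        "Location": ["Factory1", "Factory1", "Factory1", "Factory2"],
    }
)


def similar(a, b, threshold):
    """True when two strings are at least `threshold` similar (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def normalize_names(df, threshold=0.8):
    """Return a copy of df with likely name typos unified within each Location."""
    out = df.copy()
    for _, group in df.groupby("Location"):
        full = group["Fname"] + " " + group["Lname"]

        # Greedy clustering: a row joins the first cluster whose first
        # member is similar enough, otherwise it starts a new cluster.
        clusters = []
        for idx, name in full.items():
            for cluster in clusters:
                if similar(name, full.loc[cluster[0]], threshold):
                    cluster.append(idx)
                    break
            else:
                clusters.append([idx])

        # Within a cluster, keep the most frequent spelling of each column.
        # Ties are resolved alphabetically, because mode() returns sorted values.
        for cluster in clusters:
            for col in ("Lname", "Fname"):
                out.loc[cluster, col] = df.loc[cluster, col].mode().iloc[0]
    return out


print(normalize_names(df))
```

On the sample data this turns the three Factory1 rows into Wong John and leaves Wang John at Factory2 untouched. The pairwise comparison is quadratic within each location, so with 50,000 rows it only stays practical if the groups are reasonably small, and it is probably safer to review the clusters before overwriting anything.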
Edit: fixed typo in the location
python
python-3.x
pandas
levenshtein-distance
0 Answers