Fuzzy String Matching in Python

Let's say you want to determine whether two strings are almost the same in Python. A few years ago I used python-Levenshtein. I chose it because I had read about the Levenshtein distance, thought it looked good, and found the package through a Google search. One problem with it: it is not maintained, and it is not ready for production.

Recently, I came at the problem anew and found difflib, a module in Python's standard library. I could match strings using:

>>> from difflib import SequenceMatcher as SM

>>> SM(None, 'The first string', 'The first string').ratio()
1.0

>>> SM(None, 'The first string', 'The second string').ratio()
0.7272727272727273

For my needs, this works well enough. It’s built in. It’s my new go-to library.
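As a rough sketch of how the ratio could be turned into a yes/no answer, the helper below compares it against a cutoff. The name is_similar and the 0.7 threshold are my own assumptions for illustration, not something difflib prescribes, so tune the threshold on your own data. The last lines use get_close_matches, which is also part of difflib, to pick the best candidate from a list.

```python
from difflib import SequenceMatcher, get_close_matches


def is_similar(a, b, threshold=0.7):
    """Return True if two strings are 'almost the same'.

    Hypothetical helper: the name and the default threshold are
    illustrative assumptions, not part of difflib.
    """
    # Lowercase both sides so case differences do not count against the match.
    ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return ratio >= threshold


print(is_similar('The first string', 'The second string'))  # True (ratio is about 0.73)
print(is_similar('The first string', 'Something else'))     # False

# get_close_matches ranks a list of candidates by the same ratio
# and drops anything below its default cutoff of 0.6.
candidates = ['The first string', 'The second string', 'Unrelated text']
print(get_close_matches('The frist string', candidates, n=1))  # ['The first string']
```

Lowercasing is a design choice here: SequenceMatcher itself is case-sensitive, so skip that step if case matters for you.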

If difflib does not work for you, fuzzywuzzy looks promising.
