Fuzzy Match

One common use for the Python tool is fuzzy matching. Fuzzy matching finds strings in a dataset that are approximately similar to a target string, even when they are not an exact match, by using an algorithm to calculate their degree of similarity.
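To make "degree of similarity" concrete: the example code on this page uses Python's built-in difflib module. Its SequenceMatcher.ratio() is documented as 2.0*M / T, where M is the number of matching characters and T is the total number of characters in both strings. The sketch below checks that formula by hand for "Apple" and "appl". Note that the comparison is case-sensitive, so "A" and "a" do not count as a match.

```python
import difflib

a, b = "Apple", "appl"
matcher = difflib.SequenceMatcher(None, a, b)

# M: total size of the matching blocks ("ppl"; "A" and "a" differ in case)
m = sum(block.size for block in matcher.get_matching_blocks())
# T: total number of characters in both strings
t = len(a) + len(b)

print(m, t)                       # 3 9
print(round(2 * m / t, 4))        # 0.6667
print(round(matcher.ratio(), 4))  # 0.6667
```

Multiplying the ratio by 100 gives the Similarity Score shown in the example output (66.67 for this pair).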

Input and Output

Input: A table from an Analytics tool output with two columns that can be fuzzy matched

Output: The data frames or charts returned by your Python code; in this example, a table of matched values with a similarity score for each match

Example Use Cases

  • Matching addresses that may have been entered incorrectly or slightly differently

  • Matching names that may have been entered incorrectly or slightly differently

  • Assistance in reconciling data between different systems where exact matches aren't guaranteed
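Names and addresses entered in different systems often differ only in case or surrounding whitespace. Because difflib compares case-sensitively, one option is to normalize both sides before matching. The sketch below is a hypothetical variant, not part of the example code on this page: normalize and fuzzy_match_normalized are illustrative names, and lowercasing plus trimming is an assumed clean-up step. It uses the example input table's values.

```python
import difflib

def normalize(value: str) -> str:
    """Hypothetical clean-up step: trim surrounding whitespace and lowercase."""
    return value.strip().lower()

def fuzzy_match_normalized(list1, list2, threshold=0.5):
    """Match each item in list1 to its closest item in list2 after normalizing both."""
    # Map each normalized value back to its original spelling
    # (if two values normalize to the same string, the last one wins)
    lookup = {normalize(v): v for v in list2}
    matches = []
    for item in list1:
        key = normalize(item)
        found = difflib.get_close_matches(key, list(lookup), n=1, cutoff=threshold)
        if found:
            score = difflib.SequenceMatcher(None, key, found[0]).ratio() * 100
            matches.append((item, lookup[found[0]], round(score, 2)))
    return matches

col_a = ["Apple", "Banana", "Orange", "Grape"]
col_b = ["grpe", "appl", "bnna", "orng"]
result = fuzzy_match_normalized(col_a, col_b)
for row in result:
    print(row)
```

With normalization, all four values find a match (Banana matches bnna with a score of 60.0), and the scores are higher overall because capital letters no longer count as mismatches.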

Example Input Table

A         B
Apple     grpe
Banana    appl
Orange    bnna
Grape     orng

Example Code

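This example code fuzzy matches the values in the first column of a tool input against the values in the second column. For each value in the first column, it returns the single closest value from the second column, keeping only matches whose similarity ratio is at least 0.5 (a Similarity Score of 50 or more), as set by the threshold=0.5 default in fuzzy_match(list1, list2, threshold=0.5). Because only the first two columns are compared, rearrange your data so that the columns you want to match come first.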

# Import libraries
import pandas as pd
import difflib

# Function to perform fuzzy matching.
# For each item in list1, find the single closest item in list2 whose
# similarity ratio (0 to 1) is at least `threshold`.
def fuzzy_match(list1, list2, threshold=0.5):
    matches = []
    for item in list1:
        match = difflib.get_close_matches(item, list2, n=1, cutoff=threshold)
        if match:
            # Convert the 0-1 ratio to a 0-100 score, rounded to two decimals
            similarity = round(difflib.SequenceMatcher(None, item, match[0]).ratio() * 100, 2)
            matches.append((item, match[0], similarity))
    return matches


# Get the pandas DataFrame from the first input table (sources is provided by the Python tool)
tbl = sources[0]
df = tbl.df


# Print the DataFrame to check its structure
print("DataFrame structure:\n", df)


# Select the first two columns dynamically; drop empty cells and convert
# values to strings so difflib can compare them
list1 = df.iloc[:, 0].dropna().astype(str).tolist()
list2 = df.iloc[:, 1].dropna().astype(str).tolist()

# Perform fuzzy matching
matches = fuzzy_match(list1, list2)


# Create a DataFrame for the results
results_df = pd.DataFrame(matches, columns=['Original Column', 'Match', 'Similarity Score'])


# Print and return the results
print("\nResults:\n", results_df)
return [Table(df=results_df, name="Fuzzy Match Results")]
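The example code relies on names that appear to be provided by the Python tool's environment (sources and Table), and a top-level return only works inside that tool, so the code will not run as a standalone script. To check the matching logic locally, a sketch like the following rebuilds the example input table as a pandas DataFrame in place of sources[0].df and calls a fuzzy_match function that mirrors the example code's, with scores rounded to two decimals as in the example output table. The column names A and B come from the example input table.

```python
import difflib
import pandas as pd

def fuzzy_match(list1, list2, threshold=0.5):
    """Mirror of the example code's matcher: (item, match, score) tuples."""
    matches = []
    for item in list1:
        match = difflib.get_close_matches(item, list2, n=1, cutoff=threshold)
        if match:
            similarity = difflib.SequenceMatcher(None, item, match[0]).ratio() * 100
            matches.append((item, match[0], round(similarity, 2)))
    return matches

# Example input table, built locally in place of sources[0].df
df = pd.DataFrame({
    "A": ["Apple", "Banana", "Orange", "Grape"],
    "B": ["grpe", "appl", "bnna", "orng"],
})

matches = fuzzy_match(df.iloc[:, 0].tolist(), df.iloc[:, 1].tolist())
results_df = pd.DataFrame(matches, columns=["Original Column", "Match", "Similarity Score"])
print(results_df)
```

The printed table should contain the same three rows as the example output table.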

Example Output Table

Original Column    Match    Similarity Score
Apple              appl     66.67
Orange             orng     60.00
Grape              grpe     66.67
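Banana does not appear in the output because its closest candidate in column B, bnna, scores below the 0.5 threshold. The threshold is passed to difflib.get_close_matches as its cutoff argument, which keeps only candidates whose ratio is at least that value. The sketch below checks this with the example data: lowering the threshold to 0.4 lets Banana match bnna, at the cost of admitting weaker matches in general.

```python
import difflib

candidates = ["grpe", "appl", "bnna", "orng"]  # column B of the example input

# Case-sensitive similarity between Banana and its closest candidate
banana_score = difflib.SequenceMatcher(None, "Banana", "bnna").ratio()
print(banana_score)  # 0.4

# cutoff keeps only candidates whose ratio is at least the given value
at_default = difflib.get_close_matches("Banana", candidates, n=1, cutoff=0.5)
at_lower = difflib.get_close_matches("Banana", candidates, n=1, cutoff=0.4)
print(at_default)  # []
print(at_lower)    # ['bnna']
```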
