Fuzzy Match

A common use case for the Python tool is fuzzy matching. Fuzzy matching finds strings in a dataset that are approximately similar to a target string, even when they are not exact matches, by using an algorithm to calculate the degree of similarity.
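As a rough illustration of what "degree of similarity" means, the sketch below scores a few string pairs with Python's standard difflib module. The helper name similarity is hypothetical and used only for illustration; the strings are taken from the example table on this page.

```python
import difflib

def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0 (nothing in common) and 1 (identical)."""
    return difflib.SequenceMatcher(None, a, b).ratio()

exact = similarity("Banana", "Banana")  # identical strings
close = similarity("Apple", "appl")     # near match
far = similarity("Apple", "orng")       # no characters in common

print(exact, round(close, 2), far)
```

The comparison is case-sensitive, so "A" and "a" count as different characters.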

Input, Output

Input: A table from an AuditBoard Analytics tool output with two columns that can be fuzzy matched

Output: The data frames or charts produced by your Python code, including the similarity scores, as determined by the return statement

Example Use Cases

Matching addresses that may have been entered incorrectly or slightly differently
Matching names that may have been entered incorrectly or slightly differently
Reconciling data between different systems where exact matches aren't guaranteed
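For names or addresses that differ only in capitalization or spacing, one optional approach is to normalize both sides before comparing them. The sketch below is an assumption rather than part of the example on this page: the normalize helper is hypothetical, and the sample name is made up for illustration.

```python
import difflib

def normalize(text: str) -> str:
    """Lowercase the text and collapse runs of whitespace into single spaces."""
    return " ".join(text.lower().split())

# Hypothetical value with stray spaces and mixed case
cleaned_name = normalize("  Jon   SMITH ")

# Compare the same pair of strings with and without normalization
raw_score = difflib.SequenceMatcher(None, "Apple", "appl").ratio()
normalized_score = difflib.SequenceMatcher(
    None, normalize("Apple"), normalize("appl")
).ratio()

print(cleaned_name)
print(round(raw_score, 2), round(normalized_score, 2))
```

Normalizing raises the score here because the leading "A" and "a" now count as the same character. Whether that is desirable depends on whether case is meaningful in your data.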

Example Input Table

Apple | grpe
Banana | appl
Orange | bnna
Grape | orng

Example Code

This example code fuzzy matches the first two columns of the tool input and returns only the matches with a similarity score of at least 0.5, as set by threshold=0.5 in def fuzzy_match(list1, list2, threshold=0.5):. Because it always compares the first two columns, arrange your data so that the columns to be matched come first.
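If the columns you want to compare are not already the first two, one way to work out the new column order is sketched below. The move_to_front helper and the column names are hypothetical and only for illustration.

```python
def move_to_front(columns, first, second):
    """Return the column names with `first` and `second` moved to the front."""
    rest = [c for c in columns if c not in (first, second)]
    return [first, second] + rest

# Hypothetical column names from a tool output
columns = ["ID", "Vendor Name", "Amount", "Supplier Name"]
new_order = move_to_front(columns, "Vendor Name", "Supplier Name")
print(new_order)

# With a pandas DataFrame, the same order can be applied with:
#     df = df[new_order]
```

After reordering, df.iloc[:, 0] and df.iloc[:, 1] refer to the two columns being matched.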

# Import libraries
import pandas as pd
import difflib

# Pair each item in list1 with its closest match in list2.
# get_close_matches defaults to a cutoff of 0.6; 0.5 is used here to allow looser matches.
def fuzzy_match(list1, list2, threshold=0.5):
    matches = []
    for item in list1:
        # n=1 keeps only the single best candidate that scores at least `threshold`
        match = difflib.get_close_matches(item, list2, n=1, cutoff=threshold)
        if match:
            # Express the similarity ratio (0 to 1) as a percentage, rounded to two decimals
            similarity = round(difflib.SequenceMatcher(None, item, match[0]).ratio() * 100, 2)
            matches.append((item, match[0], similarity))
    return matches


# Get the first input table as a pandas DataFrame
tbl = sources[0]
df = tbl.df


# Print the DataFrame to check its structure
print("DataFrame structure:\n", df)


# Select the first two columns by position.
# difflib compares strings, so drop empty cells and convert the values to text.
list1 = df.iloc[:, 0].dropna().astype(str).tolist()
list2 = df.iloc[:, 1].dropna().astype(str).tolist()

# Perform fuzzy matching
matches = fuzzy_match(list1, list2)


# Create a DataFrame for the results
results_df = pd.DataFrame(matches, columns=['Original Column', 'Match', 'Similarity Score'])


# Print and return the results
print("\nResults:\n", results_df)
return [Table(df=results_df, name="Fuzzy Match Results")]

Example Output Table

Original Column | Match | Similarity Score
Apple | appl | 66.67
Orange | orng | 60.00
Grape | grpe | 66.67
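The table above can be checked with a short standalone sketch that uses the same difflib calls as the example code. It reproduces the 66.67 score for Apple and shows why Banana is missing from the results: none of its candidates reaches the 0.5 cutoff. The 0.3 cutoff is purely illustrative and not a recommended setting.

```python
import difflib

# Second column of the example input table
candidates = ["grpe", "appl", "bnna", "orng"]

# Similarity ratio scaled to a percentage and rounded to two decimals
score = round(difflib.SequenceMatcher(None, "Apple", "appl").ratio() * 100, 2)

# At the example's cutoff, "Banana" has no qualifying candidate
strict = difflib.get_close_matches("Banana", candidates, n=1, cutoff=0.5)

# A lower, illustrative cutoff lets its closest candidate through
loose = difflib.get_close_matches("Banana", candidates, n=1, cutoff=0.3)

print(score)   # 66.67
print(strict)  # []
print(loose)   # ['bnna']
```

Lowering the threshold returns more pairs but also admits weaker matches, so review low-scoring results before relying on them.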

