Fuzzy Match
Fuzzy matching is a common use case for the Python tool. Fuzzy matching finds strings in a dataset that are approximately similar to a target string, even when they are not an exact match, by using an algorithm to score the degree of similarity.
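The example code on this page scores similarity with Python's built-in `difflib` module. As a sketch of what that score means, the hypothetical helper `similarity_ratio` recomputes `difflib.SequenceMatcher.ratio()` by hand: the ratio is 2*M/T, where M is the number of characters in matching blocks and T is the total number of characters in both strings.

```python
import difflib


def similarity_ratio(a: str, b: str) -> float:
    """Hypothetical helper: recompute difflib's ratio 2*M/T for two strings."""
    matcher = difflib.SequenceMatcher(None, a, b)
    # M: total size of all matching blocks the matcher finds
    matched = sum(block.size for block in matcher.get_matching_blocks())
    # T: total number of characters in both strings
    total = len(a) + len(b)
    # difflib treats two empty strings as identical (ratio 1.0)
    return 2.0 * matched / total if total else 1.0


score = similarity_ratio("Grape", "grpe")
print(round(score, 4))  # 0.6667
print(score == difflib.SequenceMatcher(None, "Grape", "grpe").ratio())  # True
```

For "Grape" and "grpe", the matching blocks are "r" and "pe", so M = 3 and T = 9, giving 6/9 (about 0.6667). Multiplied by 100, that is the 66.67 shown for Grape in the example output.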
Input, Output
Input: A table output by another AuditBoard Analytics tool, containing two columns to be fuzzy matched.
Output: The DataFrames or charts your Python script returns. In this example, that is a table of matched pairs, each with a similarity score.
Example Use Cases
Matching addresses that may have been entered incorrectly or slightly differently
Matching names that may have been entered incorrectly or slightly differently
Reconciling data between different systems where exact matches aren't guaranteed
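`difflib` comparisons are case-sensitive and count punctuation and spacing, so "Apple" and "apple" score 0.8 rather than 1.0. For names and addresses, one option is to normalize both sides before matching, as in the sketch below. The cleanup rules in `normalize` (lowercase, punctuation replaced by spaces, whitespace collapsed) and the helper names `normalize` and `best_match` are assumptions to adapt to your data, not part of the tool.

```python
import difflib
import re


def normalize(text: str) -> str:
    """Assumed cleanup rule: lowercase, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", str(text).lower())
    return " ".join(text.split())


def best_match(value: str, candidates: list[str], threshold: float = 0.5):
    """Return (candidate, ratio) for the closest normalized candidate, or None."""
    # Map each normalized candidate back to its original spelling.
    # If two candidates normalize to the same text, the last one wins.
    lookup = {normalize(c): c for c in candidates}
    target = normalize(value)
    hits = difflib.get_close_matches(target, list(lookup), n=1, cutoff=threshold)
    if not hits:
        return None
    ratio = difflib.SequenceMatcher(None, target, hits[0]).ratio()
    return lookup[hits[0]], ratio


print(best_match("123 Main St.", ["123 main st", "456 Oak Ave"]))
# ('123 main st', 1.0)
```

Normalization only removes differences in case, punctuation, and spacing; genuine typos still lower the score, and the threshold still decides what counts as a match.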
Example Input Table
| First column | Second column |
|---|---|
| Apple | grpe |
| Banana | appl |
| Orange | bnna |
| Grape | orng |
Example Code
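This example code fuzzy matches the values in a tool input and returns only the matches whose similarity ratio is at least 0.5, as set by the `threshold` parameter in `def fuzzy_match(list1, list2, threshold=0.5):`. For each value in the first column, it looks for the single closest value in the second column. The threshold is a ratio from 0 to 1, while the reported Similarity Score is that ratio multiplied by 100, so a threshold of 0.5 corresponds to a score of 50. Because the code reads the first two columns by position, rearrange your data so that the two columns to be matched come first.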
```python
# Import libraries
import difflib

import pandas as pd


# Match each value in list1 to its closest value in list2
def fuzzy_match(list1, list2, threshold=0.5):  # Minimum similarity ratio (0 to 1) to keep a match
    matches = []
    for item in list1:
        # At most one candidate from list2 whose ratio is at least `threshold`
        match = difflib.get_close_matches(item, list2, n=1, cutoff=threshold)
        if match:
            # Score the best candidate as a percentage, rounded to two decimals
            similarity = difflib.SequenceMatcher(None, item, match[0]).ratio() * 100
            matches.append((item, match[0], round(similarity, 2)))
    return matches


# Get the DataFrame of the first input table passed to the tool
tbl = sources[0]
df = tbl.df

# Print the DataFrame to check its structure
print("DataFrame structure:\n", df)

# Select the first two columns by position; skip empty cells and
# convert values to strings so difflib can compare them
list1 = df.iloc[:, 0].dropna().astype(str).tolist()
list2 = df.iloc[:, 1].dropna().astype(str).tolist()

# Perform fuzzy matching
matches = fuzzy_match(list1, list2)

# Create a DataFrame for the results
results_df = pd.DataFrame(matches, columns=["Original Column", "Match", "Similarity Score"])

# Print and return the results
print("\nResults:\n", results_df)
return [Table(df=results_df, name="Fuzzy Match Results")]
```
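The script above uses `sources`, `Table`, and a top-level `return`, none of which it defines itself, so it presumably runs only inside the Python tool. To try the same matching logic on your own machine, the sketch below swaps `sources[0].df` for a hard-coded DataFrame built from the example input table. The column names "First column" and "Second column" are hypothetical.

```python
import difflib

import pandas as pd


def fuzzy_match(list1, list2, threshold=0.5):
    """Best candidate from list2 for each item in list1, scored 0 to 100."""
    matches = []
    for item in list1:
        match = difflib.get_close_matches(item, list2, n=1, cutoff=threshold)
        if match:
            similarity = difflib.SequenceMatcher(None, item, match[0]).ratio() * 100
            # Rounded to two decimals to match the example output table
            matches.append((item, match[0], round(similarity, 2)))
    return matches


# Hard-coded stand-in for sources[0].df, using the example input table
df = pd.DataFrame({
    "First column": ["Apple", "Banana", "Orange", "Grape"],
    "Second column": ["grpe", "appl", "bnna", "orng"],
})

list1 = df.iloc[:, 0].dropna().astype(str).tolist()
list2 = df.iloc[:, 1].dropna().astype(str).tolist()

results_df = pd.DataFrame(
    fuzzy_match(list1, list2),
    columns=["Original Column", "Match", "Similarity Score"],
)
print(results_df)
```

Running it should print the same three matches as the example output table: Apple/appl, Orange/orng, and Grape/grpe.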
Example Output Table
| Original Column | Match | Similarity Score |
|---|---|---|
| Apple | appl | 66.67 |
| Orange | orng | 60.00 |
| Grape | grpe | 66.67 |
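Banana has no row in the output because none of its candidates reaches the 0.5 threshold. Its closest candidate, "bnna", shares only the block "na" with it (M = 2, T = 10), because the comparison is case-sensitive and "B" does not match "b", so the ratio is 0.4. The sketch below hard-codes the example input values and builds a table of every pairwise ratio, so you can see which rows the threshold filters out.

```python
import difflib

import pandas as pd

originals = ["Apple", "Banana", "Orange", "Grape"]
candidates = ["grpe", "appl", "bnna", "orng"]

# Similarity ratio (0 to 1) for every original/candidate pair, rounded to 2 decimals
scores = pd.DataFrame(
    [[difflib.SequenceMatcher(None, o, c).ratio() for c in candidates] for o in originals],
    index=originals,
    columns=candidates,
).round(2)

# Best ratio per original value; fuzzy_match drops any value whose best is under 0.5
print(scores.max(axis=1))
```

Lowering `threshold` below 0.4 would let Banana match "bnna", at the cost of admitting weaker matches elsewhere.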