Fuzzy Match

A common use case for the Python tool is fuzzy matching. Fuzzy matching finds strings in a dataset that are approximately similar to a target string, even when they are not an exact match, by using an algorithm to calculate a degree of similarity.
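To make "degree of similarity" concrete, here is a minimal sketch using Python's standard-library difflib module, which the example code on this page also relies on. The helper name similarity is illustrative only and is not part of the Analytics tool. SequenceMatcher.ratio() returns 2 * M / T, where M is the number of matching characters and T is the combined length of both strings.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0 and 1 (comparison is case-sensitive)."""
    return SequenceMatcher(None, a, b).ratio()


exact = similarity("Grape", "Grape")     # identical strings score 1.0
close = similarity("Apple", "appl")      # "ppl" matches: 2 * 3 / (5 + 4)
unrelated = similarity("Apple", "zzzz")  # no shared characters score 0.0

print(round(exact, 2), round(close, 2), round(unrelated, 2))  # 1.0 0.67 0.0
```

A score near 1 means the strings are almost identical, and a score near 0 means they share little.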

Input, Output

| Input | Output |
| --- | --- |
| A table from an AuditBoard Analytics tool output with two columns that can be fuzzy matched | The data frames or charts produced by your Python code, with the similarity score determined by what the code returns |

Example Use Cases

  • Matching addresses that may have been entered incorrectly or slightly differently

  • Matching names that may have been entered incorrectly or slightly differently

  • Reconciling data between different systems where exact matches aren't guaranteed
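Names and addresses often differ only in letter case or stray whitespace. The sketch below assumes a hypothetical normalize helper (not part of the Analytics tool) and shows how light cleanup before comparison can raise the similarity score. The sample address is made up for illustration.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Hypothetical cleanup step: lowercase the text and collapse whitespace."""
    return " ".join(text.lower().split())


def similarity(a: str, b: str) -> float:
    """Return a case-sensitive similarity ratio between 0 and 1."""
    return SequenceMatcher(None, a, b).ratio()


entry_a = "123 Main St"
entry_b = "123  main st "

raw = similarity(entry_a, entry_b)                            # penalized by case and spacing
cleaned = similarity(normalize(entry_a), normalize(entry_b))  # identical after cleanup

print(raw < cleaned, cleaned)  # True 1.0
```

Whether to normalize depends on the data: if case or spacing is meaningful in your source systems, compare the raw values instead.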

Example Input Table

| A | B |
| --- | --- |
| Apple | grpe |
| Banana | appl |
| Orange | bnna |
| Grape | orng |

Example Code

This example code fuzzy matches the first two columns of the tool input and returns only the matches with a similarity of at least 0.5, the default threshold set in def fuzzy_match(list1, list2, threshold=0.5):. The returned Similarity Score is expressed on a 0 to 100 scale, and the comparison is case-sensitive. Because the code always compares the first two columns, rearrange your data so that the columns to be matched come first.

# Import libraries
import pandas as pd
import difflib

# Find the closest match in list2 for each item in list1.
# threshold is the minimum similarity (0 to 1); 0.5 is lower than difflib's default cutoff of 0.6.
def fuzzy_match(list1, list2, threshold=0.5):
    matches = []
    for item in list1:
        match = difflib.get_close_matches(item, list2, n=1, cutoff=threshold)
        if match:
            # Express the similarity ratio as a percentage, rounded to two decimals
            similarity = round(difflib.SequenceMatcher(None, item, match[0]).ratio() * 100, 2)
            matches.append((item, match[0], similarity))
    return matches


# Get the first input table and its pandas DataFrame.
# sources and Table are not imported here; they are assumed to be supplied by the Python tool.
tbl = sources[0]
df = tbl.df

# Print the DataFrame to check its structure
print("DataFrame structure:\n", df)

# Select the first two columns by position; drop blanks and cast to text so difflib can compare them
list1 = df.iloc[:, 0].dropna().astype(str).tolist()
list2 = df.iloc[:, 1].dropna().astype(str).tolist()

# Perform fuzzy matching
matches = fuzzy_match(list1, list2)

# Create a DataFrame for the results
results_df = pd.DataFrame(matches, columns=['Original Column', 'Match', 'Similarity Score'])

# Print and return the results
print("\nResults:\n", results_df)
return [Table(df=results_df, name="Fuzzy Match Results")]
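To try the matching logic outside the Python tool, the sketch below runs the same fuzzy_match approach on the example input using plain lists, so it needs neither sources nor Table. The variable names column_a and column_b are illustrative, and scores are rounded to two decimals as in the example output table.

```python
import difflib


def fuzzy_match(list1, list2, threshold=0.5):
    """Return (item, best match, score 0-100) for items whose best match meets the threshold."""
    matches = []
    for item in list1:
        match = difflib.get_close_matches(item, list2, n=1, cutoff=threshold)
        if match:
            ratio = difflib.SequenceMatcher(None, item, match[0]).ratio()
            matches.append((item, match[0], round(ratio * 100, 2)))
    return matches


# Stand-ins for the first two columns of the example input table
column_a = ["Apple", "Banana", "Orange", "Grape"]
column_b = ["grpe", "appl", "bnna", "orng"]

results = fuzzy_match(column_a, column_b)
for row in results:
    print(row)
# ('Apple', 'appl', 66.67)
# ('Orange', 'orng', 60.0)
# ('Grape', 'grpe', 66.67)
```

Once the logic behaves as expected locally, the same function can be pasted back into the tool with the sources input and Table return restored.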

Example Output Table

| Original Column | Match | Similarity Score |
| --- | --- | --- |
| Apple | appl | 66.67 |
| Orange | orng | 60.00 |
| Grape | grpe | 66.67 |
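Banana is absent from the output because its best candidate, bnna, scores below the 0.5 threshold. The sketch below checks this with difflib directly; the lower cutoff of 0.3 is chosen only for illustration and is not a recommended setting.

```python
import difflib

candidates = ["grpe", "appl", "bnna", "orng"]

# Only "na" lines up between the two strings: 2 * 2 / (6 + 4) = 0.4
score = difflib.SequenceMatcher(None, "Banana", "bnna").ratio()

# At the example's threshold nothing qualifies; a lower cutoff lets the match through
at_default = difflib.get_close_matches("Banana", candidates, n=1, cutoff=0.5)
at_lower = difflib.get_close_matches("Banana", candidates, n=1, cutoff=0.3)

print(round(score, 2), at_default, at_lower)  # 0.4 [] ['bnna']
```

Lowering the threshold recovers more matches but also admits weaker ones, so review low-scoring pairs before relying on them.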

Last updated 7 months ago
