Guest Blog: Detect Fraud Using Z-Score

Trent Blog Photo-53.png
 

By Trent Russell

Trent Russell is the Founder of Greenskies Analytics. He graduated from the University of Alabama with an MIS degree before joining Ernst & Youngs’ IT Risk Assurance practice where he served multiple industries. He later joined the Financial Service Office at EY and facilitated the development of data analytics procedures. At Greenskies he works to change the perception of internal audit as a cost center byway of analytics and tech.

One of the easiest ways to detect fraud is to use a statistical measure called z-score. We're hardly going to touch the statistical part of this whole thing though. We'll barely touch the statistical part because, I'll admit, the stats class I took in college was one of my least favorite. Stats class was one of my least favorite because there was more of a focus on getting the right answer and less on what the right answer actually meant.

So I'll be doing two things to show you how to use the statistical measure called z-score. I'll talk about

-  how you can do the calculation

- how to easily read the results of the analysis.

Using Z-Score to Find Outliers in Data

In simplest terms, a z-score finds outliers in your data.

A simple and common example of what we'd look for using the z-score is an outlier in the form of a dollar amount in an invoice for each vendor. Let's say a given vendor typically has invoices in the range of 2,000 – 7,000 dollars, but there's also an invoice out there for $19,000. That $19,000 doesn't fit the vendor's typical invoice range; hence it is an outlier. That's something we should consider investigating as a potential fraudulent invoice.

If we tried to look for the outlier in descending order, the $19,000 transaction might not stick out because there might be hundreds of invoices greater than $19,000.

The Z-score calculation

To do the z-score calculation, we can apply the z-score formula.

z-score = (invoice amount – average invoice amount) / standard deviation

Check out my z-score tutorial for a step-by-step video on calculating z-score in Excel.

(Tip to ACL Galvanize customers – there's functionality called "Outliers" that will do a lot of this work for you, but note that you won’t see the actual z-score. The functionality will just list in a table all the transactions that have a z-score greater than the score you declare when you set the analysis up.)

Interpreting the results of the analysis

So now that you've watched the video and applied it to your data, all you have to do is interpret the data.

Typically, a z-score of 2.5 or higher is considered an outlier (There you go. Interpretation done!).

However, depending on how many invoices have a z-score that high, you might want to increase it by half a point until the population is more reasonable to review. Conversely, if the population is really small, then you should consider decreasing the z-score by half a point.

Here comes the techy stats part - what we're saying is that an invoice with a z-score of 2.5 is 2.5 standard deviations away from the average invoice amount for that vendor. If you calculate the standard deviation for a set of normally distributed values, 99.7% of the values fall within three standard deviations of the mean (±), which is why we typically start in the 2.5 range.

Other Uses of Z-Score

While it's not related to invoices, one of my favorite use cases for using z-score is finding where employees have overcharged overtime! Other than running the z-score on the vendor, we can also run it on a job title. With the z-score, you can find where someone with a given job title is charging more overtime hours compared to their peers with the same job title.

There's no limitation on what you can run z-score on, regardless of if you're an accountant, an investigator, an auditor, or a business owner. Identifying outliers can help you identify potential fraud and, more importantly, get you to start asking questions of your data. 

For more information on fraudulent invoices, check out my article on duplicate invoice analytics.