Outliers can distort the results of a data analysis and make them less reliable. For example, if one or more of your values is much higher than most of the other data, it pulls the mean upward, so the mean may no longer be a good representation of the data as a whole. Identifying and dealing with these outlying data points is, therefore, an important step in making statistical calculations that can be trusted. Since Excel doesn’t have a dedicated outlier function, the easiest way to find outliers is to use the interquartile range. You can also use the TRIMMEAN function to calculate a mean that leaves out the most extreme values.
How to figure out the Interquartile Range (IQR)
The interquartile range of a data set is the range covered by the “box” on a box-and-whisker plot, or more specifically, the difference between the values of the third and first quartiles. Excel has a function that lets you figure out any quartile for your data. Find a free cell and type “=QUARTILE([data range], [quartile number])”. Put the range of your data cells where it says “[data range]” and the quartile you want where it says “[quartile number]”.
For instance, if you have data in cells A2 through A101 and want to find the value for the first quartile, you would type “=QUARTILE(A2:A101, 1)”, where A2:A101 is the range of cells. For the first argument, you can use your mouse to highlight the relevant cells, but after the comma, you need to type the number of the quartile you want. For the third quartile, you type “=QUARTILE(A2:A101, 3)”.
In another empty cell, subtract the value of the first quartile from the value of the third quartile. If the first quartile is in cell C2 and the third quartile is in cell D2, type “=D2-C2” to get the answer. This is the interquartile range: the spread of the middle 50 percent of your data.
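If you want to double-check the worksheet result outside Excel, the same calculation can be sketched in Python. This is only an illustration under a few assumptions: the function name `quartiles_and_iqr` and the sample values are made up for the example, and the "inclusive" method of the standard library's `statistics.quantiles` is assumed to interpolate the same way Excel's QUARTILE does, so verify it against your own worksheet before relying on it.

```python
from statistics import quantiles


def quartiles_and_iqr(values):
    """Return (first quartile, third quartile, interquartile range).

    Assumption: method="inclusive" interpolates between data points
    the same way Excel's QUARTILE function does.
    """
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return q1, q3, q3 - q1


# Hypothetical sample standing in for the worksheet range A2:A101.
sample = [2, 4, 4, 5, 6, 7, 8, 9, 10, 40]

q1, q3, iqr = quartiles_and_iqr(sample)
print("Q1:", q1)    # plays the role of cell C2
print("Q3:", q3)    # plays the role of cell D2
print("IQR:", iqr)  # plays the role of =D2-C2
```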
Outlier Analysis in Excel
You can now use the interquartile range in the outlier formula, which says that the upper limit of the data is the value of the third quartile plus 1.5 times the interquartile range, and the lower limit is the value of the first quartile minus 1.5 times the interquartile range.
If the first quartile value is in cell C2, the third quartile value is in cell D2, and the interquartile range is in cell E2, you would type “=C2-(1.5 * E2)” to find the lower limit and “=D2+(1.5 * E2)” to find the upper limit. In general, you type “=[first quartile] - (1.5 * [interquartile range])” to find the lower limit and “=[third quartile] + (1.5 * [interquartile range])” to find the upper limit.
Outliers are values that fall below the lower limit or above the upper limit.
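The two limit formulas can be written as a small Python function, which may help if you want to see the arithmetic in one place. This is a sketch, not part of Excel: the name `outlier_limits` and the `multiplier` parameter are assumptions made for the example, and the quartile values passed in are illustrative.

```python
def outlier_limits(q1, q3, multiplier=1.5):
    """Return (lower limit, upper limit) from the first and third quartiles."""
    iqr = q3 - q1                    # same as =D2-C2
    lower = q1 - multiplier * iqr    # same as =C2-(1.5 * E2)
    upper = q3 + multiplier * iqr    # same as =D2+(1.5 * E2)
    return lower, upper


# Illustrative quartile values, not taken from any real worksheet.
lower, upper = outlier_limits(4.25, 8.75)
print("Lower limit:", lower)
print("Upper limit:", upper)
```

With a first quartile of 4.25 and a third quartile of 8.75, the interquartile range is 4.5, so the limits sit 6.75 below the first quartile and 6.75 above the third.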
To finish the outlier test in Excel, use the logical “OR” function to quickly figure out which values in your data set are outliers. Type “=OR([data cell]>[upper limit], [data cell]<[lower limit])”, where [data cell] is the cell holding the value you are testing and [upper limit] and [lower limit] are the cells holding the upper and lower limits. For example, if the data is in cells A2 through A101, the upper limit is in cell F2, and the lower limit is in cell G2, go to cell B2 and type “=OR(A2>$F$2, A2<$G$2)” to use the function. The dollar signs before “F,” “G,” and “2” tell Excel that these references shouldn’t change when you drag the formula down.
If the value in A2 is above the upper limit or below the lower limit, the value is an outlier and “TRUE” is shown; otherwise, “FALSE” is shown. You can copy this formula down by clicking the bottom right corner of the cell with the formula and dragging it down until it is next to the last data cell. This will make the same calculation for each data point.
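The same TRUE/FALSE test can be sketched in Python, with one result per data value, just like the column of formulas in Excel. The function name `flag_outliers`, the sample values, and the limits used here are all hypothetical and only serve the illustration.

```python
def flag_outliers(values, lower, upper):
    """Return one True/False flag per value, like =OR(A2>$F$2, A2<$G$2)."""
    return [value > upper or value < lower for value in values]


# Hypothetical data and limits standing in for A2:A101, G2 and F2.
sample = [2, 4, 4, 5, 6, 7, 8, 9, 10, 40]
flags = flag_outliers(sample, lower=-2.5, upper=15.5)

# Show each value next to its flag, as columns A and B would in Excel.
for value, is_outlier in zip(sample, flags):
    print(value, is_outlier)
```

As in the Excel formula, a value that is exactly equal to one of the limits is not flagged, because the comparisons are strict.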
If you want to highlight the outliers with their own formatting, you can also select the data and go to “Conditional Formatting” in the “Styles” section of the “Home” tab. Select “New Rule” and then “Use a formula to determine which cells to format.” Type the same OR formula used for the outlier test, for example “=OR(A2>$F$2, A2<$G$2)”, and then click “Format” to choose a different format for outliers.
How to Use TRIMMEAN
The “TRIMMEAN” function doesn’t identify individual outliers, but it offers a simpler way to reduce their effect on the mean. To use the function, type “=TRIMMEAN([data range], [proportion to trim])”, where “[data range]” is the range of cells with data and “[proportion to trim]” is the share of the data you want to trim, written as a decimal. This takes out the most extreme values at the top and bottom, and then the mean is found based on the values that are left. So, if you trim 10% (0.1), it would get rid of the top 5% and the bottom 5% before figuring out the mean.
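Enter “=TRIMMEAN(A2:A101, 0.05)” to find the adjusted mean if the data is in cells A2 through A101 and you want to leave out the most extreme 5 percent of values in total, split evenly between the top and the bottom. Excel rounds the number of excluded values down to the nearest multiple of two, so with 100 values this removes the two highest and the two lowest. To trim 15 percent instead, enter “=TRIMMEAN(A2:A101, 0.15)”.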
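To see what the trimming does step by step, here is a minimal Python sketch of a trimmed mean. It assumes the documented Excel behaviour of rounding the number of excluded values down to the nearest multiple of two. The function name `trimmed_mean` and the sample values are hypothetical, and the sketch makes no attempt to match Excel's handling of floating-point edge cases.

```python
import math


def trimmed_mean(values, proportion):
    """Mean of the values after dropping the most extreme ones.

    proportion is the total share to drop (0.1 means 5% from each end).
    Assumption: the number of dropped values is rounded down to the
    nearest multiple of two so both ends lose the same count.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= proportion < 1:
        raise ValueError("proportion must be at least 0 and less than 1")

    count = len(values)
    to_drop = math.floor(count * proportion)
    to_drop -= to_drop % 2           # round down to an even number
    per_end = to_drop // 2           # values removed from each end

    kept = sorted(values)[per_end:count - per_end]
    return sum(kept) / len(kept)


# Hypothetical sample standing in for the worksheet range.
sample = [2, 4, 4, 5, 6, 7, 8, 9, 10, 40]
print("Plain mean:", trimmed_mean(sample, 0))
print("Trimmed mean (20%):", trimmed_mean(sample, 0.2))
```

In this sample, trimming 20 percent drops the lowest value (2) and the highest value (40), so the single very high value no longer pulls the mean upward.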