An outlier is a value in a data set that differs markedly from the other values. In simple terms, outliers are values unusually far from the middle. Outliers usually have a significant impact on the mean, but little or no effect on the median or mode. Remember that there is no single fixed rule for determining outliers. A common convention treats a value as an outlier if it lies more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile.

Plotting the data on a number line as a dot plot will enable you to determine the outliers.

Outliers can be thought of as stragglers: extremely high or extremely low values in a data set that can throw off the statistics. For example, if you were measuring the heights of people in a room, your average might be thrown off if Robert Wadlow were in the room.

Robert Wadlow is the tallest man in recorded medical history. He stood 2.72 m (8 ft 11.1 in) tall when last measured on 27 June 1940.
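The effect on the average can be illustrated with a short Python sketch. The five room heights below are made-up values chosen only for illustration; the only figure taken from the text is Wadlow's 2.72 m, written here as 272 cm.

```python
from statistics import mean, median

# Hypothetical heights in centimetres for five people in a room (made-up values).
room = [160, 165, 170, 175, 180]
wadlow_cm = 272  # 2.72 m, the height quoted in the text
room_with_wadlow = room + [wadlow_cm]

# Compare both averages before and after the very tall value joins the room.
mean_before, median_before = mean(room), median(room)
mean_after, median_after = mean(room_with_wadlow), median(room_with_wadlow)

print(f"mean:   {mean_before} -> {mean_after}")
print(f"median: {median_before} -> {median_after}")
```

With these assumed heights the mean rises by 17 cm, while the median moves by only 2.5 cm.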

Box and whisker plots often display outliers as dots set apart from the rest of the plot.

Below is a box and whisker plot of the distribution from above that does not display outliers.

(Image will be uploaded soon)

Below is a box and whisker plot of a similar distribution that does display outliers.

(Image will be uploaded soon)

Below is a step-by-step worked example on outliers.

Example:

Determine the outliers of the data set. Also, evaluate the mean of the data set including the outliers and excluding the outliers.

35, 75, 20, 25, 15, 30, 30, 15, 45, 40, 110

Solution:

First, arrange the data set in order.

15, 15, 20, 25, 30, 30, 35, 40, 45, 75, 110

Now, plot the data on a number line in the form of a dot plot.

On the dot plot, the values 75 and 110 sit far from the middle. Judging by eye, this example treats these two values as the outliers of the data set.

Find the mean of the data including the outliers:

Mean = (Sum of the data values)/(Number of data values)

= (15 + 15 + 20 + 25 + 30 + 30 + 35 + 40 + 45 + 75 + 110)/11

= 440/11

= 40

Now find the mean without the outliers. Remove the values far from the middle (i.e. 75 and 110) and evaluate the mean of the remaining values:

Mean = (Sum of the data values)/(Number of data values)

= (15 + 15 + 20 + 25 + 30 + 30 + 35 + 40 + 45)/9

= 255/9

≈ 28.33

The mean of the given data set is 40 when the outliers are included, but about 28.33 when they are excluded.
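The arithmetic in this example can be checked with a few lines of Python. This is only a sketch: the two outliers are entered by hand, exactly as they were picked from the dot plot, rather than detected by any rule.

```python
from statistics import mean

data = [35, 75, 20, 25, 15, 30, 30, 15, 45, 40, 110]

# Outliers as chosen by eye in the example; nothing here detects them automatically.
outliers = {75, 110}
trimmed = [x for x in data if x not in outliers]

mean_all = mean(data)         # 440 / 11
mean_trimmed = mean(trimmed)  # 255 / 9

print(mean_all, round(mean_trimmed, 2))  # prints: 40 28.33
```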

Example:

For the data set 2, 5, 6, 9, 12, use the five-number summary to check for outliers.

Solution:

Minimum = 2

1st Quartile = 3.5

Median = 6

3rd Quartile = 10.5

Maximum = 12

IQR = 10.5 – 3.5 = 7.

Thus, 1.5·IQR = 10.5.

To identify any outliers, look for values that lie more than 1.5·IQR, that is 10.5, beyond the quartiles.

1st quartile – 1.5·IQR = 3.5 – 10.5 = –7

3rd quartile + 1.5·IQR = 10.5 + 10.5 = 21

Since none of the data lies outside the interval from –7 to 21, we conclude that there are no outliers.
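The steps of this example can be sketched in Python. The helper names `five_number_summary` and `iqr_fences` are illustrative, not from any library. The quartiles are taken as the medians of the lower and upper halves of the sorted data (leaving out the middle value when the count is odd), which is the convention that reproduces the 3.5 and 10.5 above; other quartile conventions can give slightly different values.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention."""
    data = sorted(values)
    n = len(data)  # assumes at least two values
    half = n // 2
    lower = data[:half]          # values below the middle
    upper = data[half + n % 2:]  # values above the middle (skips it when n is odd)
    return data[0], median(lower), median(data), median(upper), data[-1]

def iqr_fences(q1, q3, k=1.5):
    """Return the lower and upper cut-offs Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

data = [2, 5, 6, 9, 12]
minimum, q1, med, q3, maximum = five_number_summary(data)
low, high = iqr_fences(q1, q3)
outliers = [x for x in data if x < low or x > high]

print(minimum, q1, med, q3, maximum)  # prints: 2 3.5 6 10.5 12
print(low, high, outliers)            # prints: -7.0 21.0 []
```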

An outlier is a data point that lies outside the overall pattern of a distribution.

In a box and whisker plot, outliers are shown as individual dots.

A common rule says that a data point is an outlier if it lies more than 1.5·IQR below the first quartile or above the third quartile.

When outliers are displayed, the whiskers must be adjusted.

Each whisker extends to the farthest point in the data set that is not an outlier.
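As a rough sketch of how the whiskers and the dots could be computed, the Python snippet below applies the 1.5·IQR rule to the eleven values from the first example. The function name `box_plot_parts` is made up for illustration, and the quartiles come from `statistics.quantiles` with its default "exclusive" method, which for this data gives Q1 = 20 and Q3 = 45.

```python
from statistics import quantiles

def box_plot_parts(values, k=1.5):
    """Split data into whisker ends and outlier dots using the k*IQR rule."""
    data = sorted(values)
    q1, _, q3 = quantiles(data, n=4)  # default method is "exclusive"
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    # Each whisker ends at the farthest value that is not an outlier.
    return {"whiskers": (inside[0], inside[-1]), "outliers": outliers}

parts = box_plot_parts([35, 75, 20, 25, 15, 30, 30, 15, 45, 40, 110])
print(parts)  # prints: {'whiskers': (15, 75), 'outliers': [110]}
```

Under this quartile convention the upper cut-off is 82.5, so the rule flags only 110 and the upper whisker stops at 75. Whether a borderline value such as 75 is flagged depends on the quartile convention and on how strictly "far from the middle" is judged.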

FAQ (Frequently Asked Questions)

Q1. Can There be a Negative Outlier?

Answer: Yes, absolutely.

To see why, consider the following data set, where the first quartile, the median, and the third quartile are shown in parentheses:

-19, -1, (0), 5, 7, (9), 12, 12, (12), 13, 13

Low threshold = Q1 - 1.5 × (Q3 - Q1)

= 0 - 1.5 × (12 - 0)

= 0 - 1.5 × 12

= -18.

Since the minimum value, -19, is less than -18, it is an outlier.

Now, let's shift every number up by 19 so that there are no negative numbers left:

0, 18, (19), 24, 26, (28), 31, 31, (31), 32, 32 (the same pattern, but with every value moved to be non-negative)

Low threshold = Q1 - 1.5 × (Q3 - Q1)

= 19 - 1.5 × (31 - 19)

= 19 - 1.5 × 12

= 19 - 18 = 1.

The minimum sits the same distance below the threshold in both cases: -19 - (-18) = -1 and 0 - 1 = -1. So the value 0 is an outlier in the shifted set, just as -19 was in the original, and the rule works the same way for negative and positive numbers.

Notice that the sign of the numbers makes no difference, just as the distance between two points on the (x, y) plane does not depend on where those points lie. Negative outliers are not a special case.
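The shift argument can be checked with a short Python sketch. The helper `low_outliers` is a hypothetical name, and the quartiles again come from `statistics.quantiles` with its default method, which matches the parenthesised quartiles for these eleven values.

```python
from statistics import quantiles

def low_outliers(values, k=1.5):
    """Return the low threshold Q1 - k*IQR and the values that fall below it."""
    q1, _, q3 = quantiles(values, n=4)
    low = q1 - k * (q3 - q1)
    return low, [x for x in values if x < low]

original = [-19, -1, 0, 5, 7, 9, 12, 12, 12, 13, 13]
shifted = [x + 19 for x in original]  # move every value up by 19

low_a, out_a = low_outliers(original)
low_b, out_b = low_outliers(shifted)

print(low_a, out_a)   # prints: -18.0 [-19]
print(low_b, out_b)   # prints: 1.0 [0]
print(low_b - low_a)  # prints: 19.0 (the threshold moves by the same shift)
```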

Q2. What if Most of the Data Points Lie Outside the IQR?

Answer: Although you can have "many" outliers (in a large data set), it is impossible for "most" of the data points to lie outside the IQR. The zone between Q1 and Q3 contains, by definition, the middle 50% of the data, so no more than about half of the points can fall outside it. Extending that zone by 1.5·IQR above and below makes a very generous range that encompasses most of the data.
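As an informal check rather than a proof, the sketch below counts how much of each data set used on this page lies between Q1 and Q3. The function name `share_between_quartiles` is invented for this example, and the quartiles use the default method of `statistics.quantiles`.

```python
from statistics import quantiles

def share_between_quartiles(values):
    """Fraction of the values lying between Q1 and Q3, inclusive."""
    q1, _, q3 = quantiles(values, n=4)
    inside = [x for x in values if q1 <= x <= q3]
    return len(inside) / len(values)

# The three data sets that appear in the examples on this page.
datasets = [
    [35, 75, 20, 25, 15, 30, 30, 15, 45, 40, 110],
    [2, 5, 6, 9, 12],
    [-19, -1, 0, 5, 7, 9, 12, 12, 12, 13, 13],
]
shares = [share_between_quartiles(d) for d in datasets]
print([round(s, 2) for s in shares])  # prints: [0.64, 0.6, 0.64]
```

In each case at least half of the values sit between the quartiles, so only a minority can lie beyond them.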