What is a Box and Whiskers Plot?

Measures of central tendency such as the mean, median, and mode do not always tell us enough about a distribution or dataset. To understand the variability, or dispersion, of the data we need something more, and a box and whiskers plot provides it. So what is a box and whiskers plot? A box plot is a graph that gives a clear indication of how the values in the data are spread. Lines called whiskers extend from the box and indicate the variability outside the upper and lower quartiles.

A box plot is non-parametric: it displays the variation in samples of a statistical population without making any assumptions about the underlying statistical distribution. The spacing between the different parts of the box indicates the degree of dispersion (spread) and the skewness present in the data, and the plot also shows outliers. A box plot can be drawn either horizontally or vertically, and it is often used in exploratory data analysis.

A box plot displays the five-number summary of a set of data: the minimum score, the first (lower) quartile, the median, the third (upper) quartile, and the maximum score.

Although a box plot may seem primitive compared to a histogram or a density plot, it has the advantage of occupying less space, which is useful when comparing distributions between many groups or datasets.

Elements of Box and Whisker Plot

The Minimum Score: The minimum score is the lowest score after excluding the outliers.

The Maximum Score: The maximum score is the highest score after excluding the outliers.

The Median: The median is the midpoint of the data and is shown by the line that divides the box into two parts (it is sometimes also called the second quartile). Half of the scores are greater than or equal to this value and half are less.

The Lower Quartile: The lower quartile is also known as the first quartile. Twenty-five percent of the scores fall below it.

The Upper Quartile: The upper quartile is also known as the third quartile. Seventy-five percent of the scores fall below it.

The Whiskers: The upper and the lower whiskers are the lines that represent the scores outside the middle 50%.

The Interquartile Range (or IQR): The interquartile range is represented by the box itself, which covers the scores between the 25th and 75th percentiles, i.e., the middle 50 percent of the scores.
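The definitions above say the minimum and maximum exclude outliers, but not how outliers are identified. One widely used convention, which is an assumption here and not something this article states, treats any score more than 1.5 times the IQR beyond the quartiles as an outlier. A minimal sketch of that convention follows; the function name `whisker_ends`, the multiplier `k`, and the sample scores are all hypothetical.

```python
def whisker_ends(data, q1, q3, k=1.5):
    """Return (lowest non-outlier, highest non-outlier, outliers).

    Assumes the 1.5 * IQR convention; k is the fence multiplier.
    """
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    # Scores inside the fences decide where the whiskers end.
    inside = [x for x in data if low_fence <= x <= high_fence]
    # Scores beyond the fences are reported as outliers.
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return min(inside), max(inside), outliers


# Hypothetical scores; Q1 = 3 and Q3 = 8 are the medians of the lower and upper halves.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
low, high, outliers = whisker_ends(scores, q1=3, q3=8)
print(low, high, outliers)  # 1 9 [30]
```

Here the IQR is 5, so the fences sit at -4.5 and 15.5, and the score 30 would be drawn as a separate point instead of being reached by the upper whisker.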

Box and Whisker Plot Solved Examples

Question 1: Given below is a sample of the weight of 10 boxes of raisins in grams. Find the five-number summary.

25, 28, 29, 29, 30, 34, 35, 35, 37, 38.

Solution 1:

Step 1: First arrange the data from smallest to largest. Here the data is already in increasing order, so let us move on to the next step.

Step 2: Find the median. Since there are 10 data points, the median is the mean of the two middle numbers, 30 and 34.

Therefore, median = (30 + 34)/2 = 32

Step 3: Next, find the quartiles. The first quartile is the median of the data points to the left of the median (25, 28, 29, 29, 30), so it is 29. The third quartile is the median of the data points to the right of the median (34, 35, 35, 37, 38), so it is 35.

Step 4: In this last step we complete the five-number summary by finding the minimum and the maximum. The minimum is 25, the smallest data point, and the maximum is 38, the largest data point.

So finally, the five-number summary is 25, 29, 32, 35, and 38.
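The four steps above can be sketched in Python. This is one possible sketch and not the only way to compute quartiles: it uses the same "median of each half" method as the worked solution, and other quartile conventions can give slightly different values. The function name `five_number_summary` is hypothetical, and like the worked example it takes the smallest and largest data points as the minimum and maximum without checking for outliers.

```python
from statistics import median


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves method."""
    values = sorted(data)                      # Step 1: order the data
    half = len(values) // 2
    lower = values[:half]                      # points to the left of the median
    # For an odd count, the median itself is left out of both halves.
    upper = values[half:] if len(values) % 2 == 0 else values[half + 1:]
    return (values[0], median(lower), median(values), median(upper), values[-1])


weights = [25, 28, 29, 29, 30, 34, 35, 35, 37, 38]
summary = five_number_summary(weights)
print(summary)  # (25, 29, 32.0, 35, 38)
```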

Question 2: Using the five-number summary from the above example, draw a box and whiskers plot.

25, 29, 32, 35, and 38.

Solution 2:

Step 1: Scale and label an axis that fits the five-number summary.

Step 2: Draw a box from Q1 to Q3 with a vertical line through the median. Recall that Q1 is 29, Q3 is 35, and the median is 32.

Step 3: Draw the whiskers from Q1 to the minimum and from Q3 to the maximum. The minimum is 25 and the maximum is 38.

And boom! Our box and whiskers plot is ready.
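To see the result of these drawing steps without a charting tool, here is a rough text-only sketch. The function `ascii_box_plot` and its `scale` parameter (characters per data unit) are hypothetical names made up for this illustration; a real chart would normally be drawn with a plotting library.

```python
def ascii_box_plot(minimum, q1, median, q3, maximum, scale=1):
    """Render a horizontal box plot as a single line of text."""
    def col(value):
        # Column index of a value, measured from the minimum.
        return round((value - minimum) * scale)

    row = [" "] * (col(maximum) + 1)
    # Whiskers run from the minimum to Q1 and from Q3 to the maximum.
    for i in range(col(minimum), col(q1)):
        row[i] = "-"
    for i in range(col(q3) + 1, col(maximum) + 1):
        row[i] = "-"
    # The box runs from Q1 to Q3.
    for i in range(col(q1), col(q3) + 1):
        row[i] = "="
    # Mark the whisker ends, the box edges, and the median line.
    row[col(minimum)] = "|"
    row[col(maximum)] = "|"
    row[col(q1)] = "["
    row[col(q3)] = "]"
    row[col(median)] = "|"
    return "".join(row)


plot = ascii_box_plot(25, 29, 32, 35, 38)
print(plot)  # |---[==|==]--|
```

Each character stands for one gram, so the left whisker (25 to 29) comes out one character longer than the right whisker (35 to 38).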

Question 1: What is the Use of Box and Whiskers Plot?

Answer: A box plot offers a visual summary of the data, allowing researchers to quickly identify the median, the dispersion of the data set, and signs of skewness. If the median is close to the bottom of the box and the whisker is shorter on the lower end, the distribution is positively skewed. If the median is closer to the top of the box and the whisker is shorter on the upper end, the distribution is negatively skewed. When the median sits exactly in the middle of the box and the whiskers on both sides are equal, the distribution is symmetric. A box plot also displays the outliers within the data set.
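The skewness rules in this answer can be turned into a rough check on a five-number summary. This is only a heuristic sketch, not a formal skewness test: the function name `describe_skew` is hypothetical, ties are treated as agreeing with either direction, and all summaries except the raisin one are made up for illustration.

```python
def describe_skew(minimum, q1, median, q3, maximum):
    """Classify skew by comparing the two box halves and the two whiskers."""
    lower_box, upper_box = median - q1, q3 - median
    lower_whisker, upper_whisker = q1 - minimum, maximum - q3
    if lower_box == upper_box and lower_whisker == upper_whisker:
        return "symmetric"
    # Median near the bottom of the box and a shorter lower whisker.
    if lower_box <= upper_box and lower_whisker <= upper_whisker:
        return "positively skewed"
    # Median near the top of the box and a shorter upper whisker.
    if lower_box >= upper_box and lower_whisker >= upper_whisker:
        return "negatively skewed"
    return "mixed signals"


print(describe_skew(1, 2, 3, 6, 12))    # hypothetical summary
print(describe_skew(0, 6, 9, 10, 11))   # hypothetical summary
print(describe_skew(0, 2, 4, 6, 8))     # hypothetical summary
```

When the box and the whiskers point in different directions, the sketch reports "mixed signals" and the plot should be inspected by eye.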

Question 2: How Do We Compare Two Box and Whisker Plots?

Answer: Two box plots can be compared on the basis of three things:

First, the difference in level, found by comparing the two boxes, that is, Q1, the median, and Q3.

Second, the difference in variation. The length of the box (the interquartile range) measures the variation of the data set. If the boxes are almost the same length, we must look at the ranges, i.e., the minimum and the maximum.

Third, the asymmetries and outliers, which are important because they make a large difference to the skewness.
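The first two comparisons can be sketched numerically from two five-number summaries. The function name `compare_box_plots` and the second summary are hypothetical and used only for illustration; asymmetry and outliers are not covered here and still need to be judged from the plots.

```python
def compare_box_plots(a, b):
    """Return b minus a for the median, the IQR, and the range.

    a and b are five-number summaries: (min, Q1, median, Q3, max).
    """
    def stats(summary):
        minimum, q1, median, q3, maximum = summary
        return {"median": median, "iqr": q3 - q1, "range": maximum - minimum}

    stats_a, stats_b = stats(a), stats(b)
    return {key: stats_b[key] - stats_a[key] for key in stats_a}


raisins = (25, 29, 32, 35, 38)   # summary from the solved example
other = (24, 30, 33, 36, 44)     # hypothetical second sample
diff = compare_box_plots(raisins, other)
print(diff)  # {'median': 1, 'iqr': 0, 'range': 7}
```

In this made-up case the two boxes have the same length, so the difference in variation only shows up in the ranges.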