To understand the idea of mean deviation for ungrouped data, it helps to first look at the concept of a frequency distribution. A frequency distribution shows how many observations fall within each of a set of intervals, either as a graph or as a table. The data being analyzed and the objective of the analyst determine the size of these intervals. In a statistical context, the intervals must be mutually exclusive and exhaustive. When plotted, many frequency distributions approximate the bell-shaped curve of the normal distribution, although skewed and other shapes are also common.
A frequency distribution gives a visual representation of how the observations in a particular analysis are spread out. The data collected in a sample is illustrated by histograms or bar graphs, where the y-axis shows the frequency count and the x-axis shows the variable being evaluated. It can also be represented as a frequency table or a pie chart.
Now, when we look at the types of data available for distribution, there are two categories: grouped data and ungrouped data.
Grouped data is raw data that has been categorized into groups and presented in the form of a table. The main purpose of the table is to show how many data points fall in each group. A single data set can be grouped in many ways. For example, a class of students can be grouped according to their height or according to the marks obtained in different subjects.
Ungrouped data, on the other hand, is raw data that has not been categorized into groups. For example, when conducting a survey to analyze how many women above 50 use social media in a particular area, you first need to know the total number of women who use social media in that area; that initial list of responses, before it is sorted by age, is ungrouped data.
Example: Marks of 30 students in a class in Mathematics, out of 100, are given as:
88, 90, 56, 39, 45, 60, 65, 78, 91, 99, 78, 38, 67, 85, 47, 59, 81, 33, 12, 55, 96, 23, 74, 86, 88, 69, 77, 89, 95, 91
The above data is ungrouped in nature.
On grouping, the above data can be represented as a frequency table, with each mark counted under the class interval it falls in.
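One possible grouping of these marks can be sketched in Python. The class width of 10 is an assumption made for illustration; the text does not specify which intervals to use, and a different width would give a different, equally valid table.

```python
from collections import Counter

# Marks of the 30 students, as listed above
marks = [88, 90, 56, 39, 45, 60, 65, 78, 91, 99,
         78, 38, 67, 85, 47, 59, 81, 33, 12, 55,
         96, 23, 74, 86, 88, 69, 77, 89, 95, 91]

WIDTH = 10  # assumed class width, not given in the text


def group_marks(values, width=WIDTH):
    """Count how many values fall in each interval [lo, lo + width - 1]."""
    # v // width * width is the lower limit of the interval containing v
    counts = Counter((v // width) * width for v in values)
    return {(lo, lo + width - 1): counts[lo] for lo in sorted(counts)}


frequency_table = group_marks(marks)
for (lo, hi), freq in frequency_table.items():
    print(f"{lo}-{hi}: {freq}")
```

Each mark lands in exactly one interval, so the intervals are mutually exclusive, and the frequencies add up to the 30 observations.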
Mean deviation measures the dispersion of the data items in a series relative to a measure of central tendency. This measure is commonly the median or the mean, although a mean deviation can also be calculated about the mode.
Let us consider a set of data that consists of the observations \[x_{1}, x_{2}, x_{3}, \ldots, x_{n}\]
There are two steps involved in the calculation of the mean deviation of ungrouped data: first calculate the measure of central tendency (the mean or the median), then calculate the mean of the absolute deviations from it.
To calculate the median, the data set is first arranged in ascending order and the number of observations, n, is counted. Depending on whether n is odd or even, the following calculations are performed:
If n is odd, the median is the \[\left(\frac{n+1}{2}\right)^{\text{th}}\] item
If n is even, the median is \[\frac{\left(\frac{n}{2}\right)^{\text{th}} \text{ item} + \left(\frac{n}{2} + 1\right)^{\text{th}} \text{ item}}{2}\]
The mean (\[\bar{x}\]), on the other hand, is \[\bar{x} = \frac{\text{Sum of the observations}}{\text{Number of observations}} = \frac{\sum_{i=1}^{n}x_{i}}{n}\]
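The positional rule for the median and the formula for the mean can be sketched in Python as follows. The function names and the two sample lists are illustrative assumptions, not part of the original text; the standard library's `statistics.median` and `statistics.mean` compute the same quantities.

```python
import statistics


def median_by_position(values):
    """Median via the positional rule: sort, then pick the middle item(s)."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)-th item (1-based) sits at index (n - 1) // 2
        return ordered[(n - 1) // 2]
    # Even n: average of the (n / 2)-th and (n / 2 + 1)-th items (1-based)
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


def mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


odd_sample = [7, 3, 9, 1, 5]        # illustrative data, n = 5
even_sample = [7, 3, 9, 1, 5, 11]   # illustrative data, n = 6

print(median_by_position(odd_sample))   # 5
print(median_by_position(even_sample))  # 6.0
print(mean(even_sample))                # 6.0
```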
Once the mean or median has been calculated, the value of the central tendency relative to which the mean deviation is calculated is denoted by ‘a’. The absolute deviation of each data value from this measure of central tendency is calculated first:
\[\mid x_{1} - a \mid, \mid x_{2} - a \mid, \mid x_{3} - a \mid, \ldots, \mid x_{n} - a \mid\]
Once we have all the absolute deviations, we find the mean of these absolute deviations for that particular set of data.
Mean absolute deviation = \[\frac{\sum_{i=1}^{n} \mid x_{i} - a \mid }{n}\]
If the measure of central tendency taken into consideration is the mean, i.e. \[\bar{x}\], the equation becomes:
Mean absolute deviation = \[\frac{\sum_{i=1}^{n} \mid x_{i} - \bar{x} \mid }{n}\]
In the case of the median (M), the equation becomes:
Mean absolute deviation = \[\frac{\sum_{i=1}^{n} \mid x_{i} - M \mid }{n}\]
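As a small worked illustration of both formulas, take the five observations 2, 4, 5, 9, 10. These values are made up for this purpose and do not come from the examples in this article. Their mean is \[\bar{x} = 30 / 5 = 6\] and their median is M = 5, the middle item.

About the mean: \[\frac{\mid 2 - 6 \mid + \mid 4 - 6 \mid + \mid 5 - 6 \mid + \mid 9 - 6 \mid + \mid 10 - 6 \mid}{5} = \frac{4 + 2 + 1 + 3 + 4}{5} = \frac{14}{5} = 2.8\]
About the median: \[\frac{\mid 2 - 5 \mid + \mid 4 - 5 \mid + \mid 5 - 5 \mid + \mid 9 - 5 \mid + \mid 10 - 5 \mid}{5} = \frac{3 + 1 + 0 + 4 + 5}{5} = \frac{13}{5} = 2.6\]

The value about the median is the smaller of the two, which is expected: the sum of absolute deviations is at its smallest when it is taken about the median.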
It is worth remembering that the range, i.e. the difference between the lowest and highest values in a set of observations, is the simplest measure of dispersion, but it depends only on the two extreme values. The mean deviation, by contrast, takes every observation into account.
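To see how the range and the mean deviation can differ, the sketch below compares two data sets that share the same lowest and highest values. Both lists and the function names are illustrative assumptions, not data from this article.

```python
def value_range(values):
    """Difference between the highest and the lowest value."""
    return max(values) - min(values)


def mean_deviation_about_mean(values):
    """Mean of the absolute deviations from the arithmetic mean."""
    centre = sum(values) / len(values)
    return sum(abs(v - centre) for v in values) / len(values)


clustered = [10, 50, 50, 50, 90]  # most values sit at the centre
spread = [10, 30, 50, 70, 90]     # values are evenly spread out

print(value_range(clustered), value_range(spread))  # 80 80
print(mean_deviation_about_mean(clustered))         # 16.0
print(mean_deviation_about_mean(spread))            # 24.0
```

The range is identical for both lists, while the mean deviation reflects that the second list is more widely scattered around its mean.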
The four steps to calculating the Mean Absolute Deviation or MAD are:
Find the average or mean
Find the value of the difference between the mean and each data point
For each difference, take the absolute value
Find the average, or mean, of these absolute values
Isn’t it simple?!
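The four steps can be followed one at a time in a short Python sketch. The data list is an illustrative assumption, chosen so that the arithmetic is easy to check by hand.

```python
data = [4, 8, 6, 2, 10]  # illustrative observations, not from the article

# Step 1: find the average or mean
mean = sum(data) / len(data)                       # 6.0

# Step 2: find the difference between each data point and the mean
differences = [x - mean for x in data]             # [-2.0, 2.0, 0.0, -4.0, 4.0]

# Step 3: take the absolute value of each difference
absolute_differences = [abs(d) for d in differences]  # [2.0, 2.0, 0.0, 4.0, 4.0]

# Step 4: find the mean of the absolute differences
mad = sum(absolute_differences) / len(absolute_differences)

print(mad)  # 2.4
```

Skipping step 3 would make the result useless as a measure of spread, because the signed differences from the mean always add up to zero.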
Q1. Calculate the Mean Deviation of the Following Ungrouped Data that Represents the Temperature Set for a Region for 10 Days in °F, Taking the Measure of Central Tendency as Mean.
Ans: Temperatures recorded over the 10 days: 85, 96, 76, 108, 85, 80, 100, 85, 70, 95
Mean (\[\bar{x}\]) = \[\frac{\sum_{i=1}^{n}x_{i}}{n}\]
= (85 + 96 + 76 + 108 + 85 + 80 + 100 + 85 + 70 + 95) / 10
= 880 / 10
= 88
Now, in order to calculate the mean deviation, we first have to find the absolute deviation of each data item from the mean.
\[x_{i}\] | \[\mid x_{i} - \bar{x} \mid\] |
85 | 3 |
96 | 8 |
76 | 12 |
108 | 20 |
85 | 3 |
80 | 8 |
100 | 12 |
85 | 3 |
70 | 18 |
95 | 7 |
\[\sum_{i=1}^{n} \mid x_{i} - \bar{x} \mid\] | 94 |
Mean absolute deviation (δx) = \[\frac{\sum_{i=1}^{n} \mid x_{i} - \bar{x} \mid}{n}\] = 94 / 10 = 9.4
Q2. The Marks Scored By a Student Out of 100 in Eight Different Subjects are as Given Below:
88, 75, 91, 68, 95, 79, 86, 99
Calculate the mean deviation about the median of this ungrouped data.
Ans: In order to find the median, the data set needs to be arranged in ascending order:
68, 75, 79, 86, 88, 91, 95, 99
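Now, as there is an even number of observations, the median is given by:
\[\frac{\left(\frac{n}{2}\right)^{\text{th}} \text{ item} + \left(\frac{n}{2} + 1\right)^{\text{th}} \text{ item}}{2}\]
In this case, n = 8
Thus, median (M) = \[\frac{\left(\frac{8}{2}\right)^{\text{th}} \text{ item} + \left(\frac{8}{2} + 1\right)^{\text{th}} \text{ item}}{2}\]
= \[\frac{4^{\text{th}} \text{ item} + 5^{\text{th}} \text{ item}}{2}\]
= (86 + 88) ÷ 2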
= 174 / 2
= 87
Now that we know the median of the data distribution, we have to find the absolute deviation of each data item from the measure of central tendency, i.e. the median.
\[x_{i}\] | \[\mid x_{i} - M \mid\] |
68 | 19 |
75 | 12 |
79 | 8 |
86 | 1 |
88 | 1 |
91 | 4 |
95 | 8 |
99 | 12 |
\[\sum_{i=1}^{n} \mid x_{i} - M \mid\] | 65 |
The mean absolute deviation of the above distribution about its median is:
Mean absolute deviation (δx) = \[\frac{\sum_{i=1}^{n} \mid x_{i} - M \mid}{n}\] = 65 / 8 = 8.125
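Both worked answers can be cross-checked with a few lines of Python. This is a minimal sketch: the helper name `mean_deviation` is an assumption made here, while `statistics.mean` and `statistics.median` come from the Python standard library.

```python
import statistics


def mean_deviation(values, centre):
    """Mean of the absolute deviations of values from the given centre."""
    return sum(abs(v - centre) for v in values) / len(values)


# Q1: temperatures in °F, deviation taken about the mean
temperatures = [85, 96, 76, 108, 85, 80, 100, 85, 70, 95]
q1_answer = mean_deviation(temperatures, statistics.mean(temperatures))

# Q2: marks out of 100, deviation taken about the median
marks = [88, 75, 91, 68, 95, 79, 86, 99]
q2_answer = mean_deviation(marks, statistics.median(marks))

print(q1_answer)  # 9.4
print(q2_answer)  # 8.125
```

Passing the centre as an argument keeps one function usable for the mean, the median, or the mode, which mirrors the general formula written in terms of ‘a’.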