Bias Definition In Statistics
Bias is the deliberate or involuntary favouring of one class or outcome over other potential groups or outcomes in a chosen set of data. In statistics, bias is a phenomenon that occurs when a model or data set is unrepresentative. It poses a serious problem for the researcher because simply increasing the sample size cannot remove it. Formally, bias is the difference between the expected value of an estimate and the true value of the parameter under study. It can arise from multiple sources. It is a drawback in statistical analysis and needs to be corrected for the analysis to be accurate.
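The point about sample size can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: the population is synthetic, and the flawed procedure (only the upper half of the population can be reached) is a hypothetical stand-in for any unrepresentative sampling scheme.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic population: 20,000 values centred on 50 (illustrative numbers only).
population = [rng.gauss(50, 10) for _ in range(20_000)]
true_mean = statistics.fmean(population)

# Hypothetical flawed procedure: only the upper half of the population is reachable.
cutoff = statistics.median(population)
reachable = [x for x in population if x >= cutoff]

errors = {}
for n in (100, 10_000):
    sample = rng.choices(reachable, k=n)  # draw n values with replacement
    errors[n] = statistics.fmean(sample) - true_mean
    print(f"n={n:>6}: sample mean - true mean = {errors[n]:+.2f}")
```

Under these assumptions the error should stay well above zero for both sample sizes: a larger sample makes the estimate less noisy, but it does not move it any closer to the true mean.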
In this article, the main types of bias are discussed in detail to help you identify potential sources of bias while planning a sample survey. Once a probable bias has been identified, it is important to determine whether it leads to an overestimate or an underestimate.
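The direction of a bias can be read from its sign. The notation here is an assumption for illustration and does not come from the article: \hat{\theta} stands for the estimate computed from a sample and \theta for the true value of the parameter.

```latex
\operatorname{Bias}(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta
```

A positive value means the procedure overestimates the parameter on average, a negative value means it underestimates it, and a value of zero means the estimator is unbiased.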
Different Types of Bias In Statistics
The major types of bias that can significantly affect the job of a data scientist or analyst are:
Omitted variable bias
Depending on how it enters the sampling process, bias can be divided into two broad classes:
1. Measurement Bias (Observation or Information Bias):
When key information in a survey is measured, collected, or interpreted inaccurately, the result is information bias. According to Johns Hopkins, it arises when: “…information is collected differently between two groups, leading to an error in the conclusion of the association.”
For example, suppose a survey asks whether respondents have used a specific brand of soap. Confused by the layout of the questionnaire, some people who have used it may mistakenly answer that they have not. Information bias can arise for the following reasons:
(i) Data Collection Error:
Mishandled data or malfunctioning equipment may cause values to be recorded incorrectly.
(ii) Fault in the Questionnaire:
The interviewer may word a question, or offer answer choices, in a way that favours the interviewer’s own point of view over the opposing one. This steers the respondent’s answer.
(iii) Respondents’ Record-keeping System:
Older adults, when asked to answer survey questions by recalling past experiences, may misunderstand the questionnaire and give incorrect answers because of poor record-keeping.
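The soap example can be turned into a quick numerical check of how a measurement error shifts an estimate. This is a minimal sketch with made-up numbers: the true usage rate and the share of confused respondents are assumptions chosen only for illustration.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

TRUE_USE_RATE = 0.60   # assumed share of people who really used the soap
MISTAKE_RATE = 0.10    # assumed share of users who wrongly tick "not used"
N = 50_000             # number of simulated respondents

yes_answers = 0
for _ in range(N):
    used = rng.random() < TRUE_USE_RATE
    # A confused user records "no"; non-users are assumed to answer correctly.
    answered_yes = used and rng.random() >= MISTAKE_RATE
    yes_answers += answered_yes

observed_rate = yes_answers / N
print(f"true rate {TRUE_USE_RATE:.2f}, observed rate {observed_rate:.3f}")
```

Because the mistakes run in one direction only, the observed rate should come out below the true rate, so this particular information bias produces an underestimate.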
2. Non-representative Bias (Selection Bias):
This type of bias occurs when a survey sample fails to represent the population accurately. It typically results from unintentionally working with one particular segment of the population rather than the whole. The major types of selection bias are:
(i) Undercoverage Bias:
It occurs when some members of the target population are not represented in the sample because they are excluded from the survey. Convenience sampling is a major cause of this bias: data is collected from an easily accessible source, such as shoppers at a local supermarket.
(ii) Non-response Bias:
In such cases, individuals selected for the sample are unwilling or unable to participate in the survey. As a result, those who do respond have a disproportionate influence on the outcome, and the possibly differing views of non-respondents go unrecorded.
(iii) Voluntary Response Bias:
Voluntary response bias happens when members of a sample are self-selected volunteers. Call-in radio shows are an example. Responses from voluntary callers misrepresent the overall population by over-representing people with strong opinions.
(iv) Volunteer Bias:
Volunteer bias arises when the people who volunteer for a trial do not represent the target population.
(v) Survivorship Bias:
Survivorship bias arises when a response counts as complete only if the respondent gets through a lengthy process. Those who drop out along the way are excluded, so the sample reflects only those who “survived”.
(vi) Confirmation Bias:
Such a bias arises in samples that favour information supporting only one belief.
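Non-response bias, described in the list above, can be illustrated with a short simulation. The sketch assumes a yes/no opinion held by about half of the population and made-up response probabilities that depend on that opinion; none of these figures come from a real survey.

```python
import random

rng = random.Random(2)  # fixed seed so the run is repeatable

N = 50_000  # number of people invited to the survey
# Assumed response probabilities: people whose opinion is "yes" reply far more often.
RESPONSE_PROB = {True: 0.8, False: 0.2}

opinions = [rng.random() < 0.5 for _ in range(N)]  # about half hold opinion "yes"
responses = [o for o in opinions if rng.random() < RESPONSE_PROB[o]]

true_share = sum(opinions) / N
observed_share = sum(responses) / len(responses)
print(f"true share of yes:       {true_share:.3f}")
print(f"share among respondents: {observed_share:.3f}")
```

With these assumed probabilities the share of "yes" among respondents should land far above the true share. The same mechanism applies to voluntary response bias, where the decision to take part is tied to the strength of the opinion.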
This article has covered the definition of bias in statistics, with a special focus on its different kinds, to give a clear idea of how to identify and correct bias in data analysis.
Did You Know?
An estimator in statistics is a rule for calculating an estimate of a quantity from collected data. A biased estimator is one whose estimates are, on average, off from the true value of the population parameter. Suppose you are at a party, playing “bell the cat”, where you must stick a bell onto a picture of a cat while blindfolded. The person who pins the bell closest to its proper place on the neck wins the game. Unfortunately, even after ten tries, you keep putting the bell on the cat’s nose, stomach, or ears. In this case, your estimate of where the bell should be pinned behaves like a biased estimator.
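A standard textbook case of a biased estimator is the sample variance computed by dividing by n instead of n - 1. The sketch below is an illustration rather than part of the original article: it assumes a normal population with a known variance of 4 and compares the long-run average of both versions.

```python
import random
import statistics

rng = random.Random(3)  # fixed seed so the run is repeatable

TRUE_VARIANCE = 4.0   # the population is normal with standard deviation 2
SAMPLE_SIZE = 5
TRIALS = 20_000

divide_by_n = []
divide_by_n_minus_1 = []
for _ in range(TRIALS):
    sample = [rng.gauss(0, 2) for _ in range(SAMPLE_SIZE)]
    divide_by_n.append(statistics.pvariance(sample))         # squared deviations / n
    divide_by_n_minus_1.append(statistics.variance(sample))  # squared deviations / (n - 1)

mean_biased = statistics.fmean(divide_by_n)
mean_unbiased = statistics.fmean(divide_by_n_minus_1)
print(f"true variance:            {TRUE_VARIANCE:.2f}")
print(f"average when dividing by n:     {mean_biased:.2f}")
print(f"average when dividing by n - 1: {mean_unbiased:.2f}")
```

Averaged over many samples, the divide-by-n version should settle below the true variance (its expected value is (n - 1)/n times the true variance), while the divide-by-(n - 1) version should settle close to it. Like the misplaced bell, the first estimator misses in the same direction on average, however many times it is repeated.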