Sampling in Statistics

A sample is a group of elements chosen from a population. The features that describe the population are called parameters, and the corresponding properties of the sample data are called statistics. A sample statistic is therefore a quantity computed from a sample, that is, a piece of information collected from a fraction of the population. Both populations and samples are central to statistics. Here we study sampling methods, the hypothesized mean, the mean and standard deviation, and the sampling distribution of means.


What is a Sample in Statistics?

A sample statistic is a numerical descriptive measure computed from the data points in a sample. It is generally derived from measurements of the individual items in the sample. Common statistics describe characteristics of the sample's distribution, such as the mean, median, mode, standard deviation, and proportion. A sample statistic can be used to measure any characteristic of the sample.


Hypothesized Mean

Hypothesis testing is an essential procedure in statistics. A hypothesis test evaluates two mutually exclusive statements about a population to determine which statement is better supported by the sample data.

The process of hypothesis testing involves setting up two competing hypotheses: the null hypothesis and the alternative hypothesis. When the test concerns a mean, the null hypothesis states a specific value for the population mean, and that value is called the hypothesized mean.

The techniques for hypothesis testing depend on

(i) the type of outcome variable being analyzed (continuous, dichotomous, discrete)

(ii) the number of comparison groups in the investigation

(iii) whether the comparison groups are independent
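As an illustration for the simplest case (one group, continuous outcome), the sketch below computes a one-sample t statistic, which measures how far the sample mean lies from a hypothesized mean in units of standard error. The function name, the data, and the hypothesized value of 48 are assumptions made for this example, not part of the original text.

```python
import math
import statistics

def one_sample_t_statistic(sample, hypothesized_mean):
    """Return the t statistic for H0: population mean == hypothesized_mean."""
    n = len(sample)
    sample_mean = statistics.mean(sample)
    sample_sd = statistics.stdev(sample)        # uses the n - 1 denominator
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error

# Hypothetical data: weights (in grams) of 5 packets; the claim is a mean of 48 g.
weights = [48, 50, 52, 51, 49]
t = one_sample_t_statistic(weights, hypothesized_mean=48)
print(round(t, 3))
```

The larger the absolute value of t, the less consistent the sample is with the null hypothesis. In practice the statistic is compared with a critical value from a t-table with n - 1 degrees of freedom.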


Estimating the Mean

The steps for estimating the mean from a grouped frequency table are:

Step 1. Add a new column to the table and write down the midpoint (middle value) of each group.

Step 2. Multiply each midpoint by the frequency of that group and record the results in another new column.

Step 3. Add up the values in the midpoint × frequency column.

Step 4. Finally, divide that total by the total frequency to get the estimate of the mean.
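The four steps above can be sketched in Python as follows. The grouped table and the function name are hypothetical, chosen only to show the arithmetic.

```python
# Hypothetical grouped frequency table: (lower bound, upper bound, frequency).
groups = [
    (0, 10, 5),
    (10, 20, 8),
    (20, 30, 12),
    (30, 40, 5),
]

def estimate_grouped_mean(groups):
    """Estimate the mean of grouped data from midpoints and frequencies."""
    total_frequency = 0
    total_midpoint_times_frequency = 0.0
    for lower, upper, frequency in groups:
        midpoint = (lower + upper) / 2                            # Step 1
        total_midpoint_times_frequency += midpoint * frequency    # Steps 2 and 3
        total_frequency += frequency
    return total_midpoint_times_frequency / total_frequency      # Step 4

estimated_mean = estimate_grouped_mean(groups)
print(round(estimated_mean, 2))
```

Here the midpoint × frequency column sums to 620 and the total frequency is 30, so the estimate is 620 / 30, about 20.67. The result is only an estimate because every value in a group is treated as if it sat at the midpoint.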


Sample Standard Deviation

The sample standard deviation formula is:

s = \[\sqrt{\frac{\sum (X - \bar{X})^{2}}{n-1}}\]

where,

s = sample standard deviation

\[\sum\] = sum over all scores in the sample

X = each individual score in the sample

\[\bar{X}\] = sample mean

n = number of scores in the sample.
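A minimal sketch of this formula in Python is shown below, with the standard library's `statistics.stdev` printed alongside as a cross-check. The scores are made-up illustration data.

```python
import math
import statistics

def sample_standard_deviation(scores):
    """Sample standard deviation with the n - 1 denominator."""
    n = len(scores)
    mean = sum(scores) / n
    squared_deviations = [(x - mean) ** 2 for x in scores]
    return math.sqrt(sum(squared_deviations) / (n - 1))

# Hypothetical scores; their mean is 5 and the squared deviations sum to 32.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
s = sample_standard_deviation(scores)
print(round(s, 4))
print(round(statistics.stdev(scores), 4))  # library result for comparison
```

Dividing by n - 1 rather than n is what makes this the sample (not population) standard deviation.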


Sampling Distribution

A sampling distribution is the probability distribution of a statistic obtained from random samples of a given population. It is also known as a finite-sample distribution, and it describes how the values of the statistic are spread across the different samples that could be drawn from that population.


The sampling distribution depends on multiple factors, such as the statistic chosen, the sample size, the sampling process, and the overall population. It helps us judge how much statistics such as means, ranges, variances, and standard deviations vary from one sample to another.


Sample Mean

The sample mean is the average value found in a sample. A sample is just a small part of the whole data. For example, if we work for a polling company and want to know how much people pay for food in a year, we are not going to want to poll over 300 million people. Instead, we take a fraction of those 300 million (perhaps a thousand people); that fraction is called a sample. The mean is simply the average, so in this example the sample mean is the average amount those thousand people pay for food in a year.


The sample mean is useful when we have to estimate what the whole population is doing without surveying everyone. Suppose the sample mean for the food example was $2400 per year. If the sample is representative, the odds are that we would get a very similar figure if we surveyed all 300 million people. So the sample mean saves a lot of time as well as money.


The Sample Mean Formula 

The sample mean formula is: \[\bar{X}\] = \[\frac{\sum x_{i}}{n}\]

Here

  • \[\bar{X}\] stands for the sample mean

  • \[\sum\] is summation notation, meaning "add up"

  • x\[_{i}\] stands for each of the x-values in the sample

  • n is the number of items in the sample
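As a small worked example of the formula, the sketch below averages five hypothetical yearly food-spending figures. The numbers are invented for illustration.

```python
def sample_mean(values):
    """Sum of the x-values divided by the number of items in the sample."""
    return sum(values) / len(values)

# Hypothetical yearly food spending (in dollars) reported by five respondents.
spending = [2100, 2600, 2400, 2300, 2600]
x_bar = sample_mean(spending)
print(x_bar)  # 2400.0
```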

Mean and Standard Deviation

The mean refers to the average value of a collection of numbers. There are several ways to calculate a mean; the two most popular are the arithmetic mean and the geometric mean.


The standard deviation measures how spread out a dataset is around its mean. It is calculated as the square root of the variance, which is obtained from each data point's deviation from the mean. The further the data points lie from the mean, the larger the deviations, so the more spread out the data, the higher the standard deviation.


The Formula for Standard Deviation is Given Below:

Standard deviation = s = \[\sqrt{\frac{\sum_{i=1}^{n} (X_{i} - \bar{X})^{2}}{(n-1)}}\]

Where

X\[_{i}\] = the value of the i\[^{th}\] point in the data set

\[\bar{X}\] = the mean value of the data set

n = the number of data points in the data set


Probability Sample

Probability sampling is a technique researchers use to choose samples from a larger population by a method based on the theory of probability. For a sample to count as a probability sample, its participants must be chosen by random selection.


The most critical requirement of probability sampling is that every member of the population is known and has a known chance of being selected; in the simplest case, everyone has an equal chance. For example, if we select one person at random from a population of 100 people, every person has a 1 in 100 chance of being selected. Probability sampling therefore gives us the best chance of creating a sample that is representative of the population.


It uses statistical theory to select a small group of people (a sample) from an existing large population and then to predict that their responses will match those of the overall population.
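The equal-chance idea can be checked empirically. The sketch below repeatedly draws simple random samples of 10 people from a population of 100 and records how often each person is selected. The population labels, sample size, repetition count, and seed are all assumptions made for the demonstration.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the run is repeatable

population = list(range(1, 101))   # 100 people, labelled 1 to 100
sample_size = 10
repetitions = 20000

# Count how many times each person ends up in a sample.
selection_counts = Counter()
for _ in range(repetitions):
    selection_counts.update(random.sample(population, sample_size))

# Fraction of samples that included each person.
inclusion_rates = {p: selection_counts[p] / repetitions for p in population}
lowest = min(inclusion_rates.values())
highest = max(inclusion_rates.values())
print(lowest, highest)
```

With equal selection chances, every person should appear in roughly 10 out of every 100 samples, so both printed rates should sit close to 0.1.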


Errors in Sampling

Sampling error occurs when the sample used in a study is not representative of the whole population. Because it occurs often, researchers routinely report a margin of error with their final results. The margin of error is the amount of error allowed for when estimating the difference between the sample and the actual population. We can control and reduce sampling errors by creating a careful sample design, using a sample large enough to reflect the entire population, or using an online sample or survey audience to collect responses.
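One common way to quantify a margin of error for a sample mean is z × s / √n, where z = 1.96 gives an approximate 95% level under a normal approximation. The text above does not prescribe a particular formula, so the sketch below is an assumption for illustration, as are the standard deviation of 500 and the function name.

```python
import math

def margin_of_error(standard_deviation, sample_size, z=1.96):
    """Approximate 95% margin of error for a sample mean (normal approximation)."""
    return z * standard_deviation / math.sqrt(sample_size)

# Hypothetical standard deviation of yearly food spending, in dollars.
assumed_sd = 500

# Margin of error for three sample sizes, each four times the previous one.
margins = {n: margin_of_error(assumed_sd, n) for n in (100, 400, 1600)}
for n, margin in margins.items():
    print(n, round(margin, 1))
```

Each fourfold increase in sample size halves the margin of error, which is why a large enough sample helps keep sampling error under control.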

FAQs (Frequently Asked Questions)

1. What are the Different Methods of Probability Sampling?

Ans: The different methods of probability sampling are:

(i) Simple random sampling

(ii) Systematic sampling

(iii) Cluster sampling

(iv) Stratified random sampling
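The four methods listed above can be sketched on a toy population as follows. The population of 20 labelled members, the cluster size, and the odd/even strata are hypothetical choices made only to keep the example small.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

population = list(range(1, 21))  # 20 hypothetical members labelled 1 to 20

# (i) Simple random sampling: every subset of size 5 is equally likely.
simple = random.sample(population, 5)

# (ii) Systematic sampling: random start, then every k-th member.
k = len(population) // 5
start = random.randrange(k)
systematic = population[start::k]

# (iii) Cluster sampling: split into clusters, then keep whole clusters chosen at random.
clusters = [population[i:i + 5] for i in range(0, len(population), 5)]
clustered = [member for cluster in random.sample(clusters, 2) for member in cluster]

# (iv) Stratified random sampling: sample separately within each stratum.
strata = {
    "odd": [p for p in population if p % 2 == 1],
    "even": [p for p in population if p % 2 == 0],
}
stratified = [member for group in strata.values() for member in random.sample(group, 2)]

print(simple, systematic, clustered, stratified, sep="\n")
```

The stratified sample is guaranteed to contain two odd and two even members, whereas a simple random sample of the same size could by chance contain only one kind.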

2. What is the Sampling Distribution of a Proportion?

Ans: It describes how the sample proportion varies across samples drawn from a population. We can select samples from the population and compute the proportion in each one. The mean of the sample proportions, taken over all possible samples of a given size, equals the proportion in the entire population.
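For a very small population this can be verified exactly by listing every possible sample. The sketch below uses a hypothetical population of five members, two of whom have the characteristic of interest (coded 1), and enumerates all samples of size 2.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population: 1 = has the characteristic, 0 = does not.
population = [1, 1, 0, 0, 0]
population_proportion = Fraction(sum(population), len(population))

# Sample proportion for every possible sample of size 2 (10 samples in total).
sample_size = 2
sample_proportions = [
    Fraction(sum(sample), sample_size)
    for sample in combinations(population, sample_size)
]

mean_of_sample_proportions = sum(sample_proportions) / len(sample_proportions)
print(population_proportion, mean_of_sample_proportions)  # 2/5 2/5
```

Exact fractions are used instead of floats so that the equality holds without rounding error.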

3. How Does Sampling Distribution Work?

Ans: A sampling distribution works in the following way:

  • First, select a random sample of a specific size from a given population.

  • Calculate a statistic for the sample, such as the mean, median, or standard deviation.

  • Repeat the two steps above for many samples, then develop a frequency distribution of the sample statistics calculated.

  • Finally, plot that frequency distribution. The resulting graph shows the sampling distribution.
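The procedure above can be simulated in a few lines. In this sketch the population (10,000 simulated die rolls), the sample size of 30, the 2,000 repetitions, and the text histogram are all assumptions chosen for illustration; the statistic is the sample mean.

```python
import random
import statistics
from collections import Counter

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 simulated die rolls.
population = [random.randint(1, 6) for _ in range(10000)]

sample_size = 30
number_of_samples = 2000

# Draw many random samples and record the mean of each one.
sample_means = [
    statistics.fmean(random.sample(population, sample_size))
    for _ in range(number_of_samples)
]

# Frequency distribution of the sample means, rounded to one decimal place.
frequency = Counter(round(m, 1) for m in sample_means)

# Crude text plot: one '#' per 10 samples in each bin.
for value in sorted(frequency):
    print(f"{value:.1f} {'#' * (frequency[value] // 10)}")
```

The bars should pile up around the population mean (about 3.5 for a fair die) and thin out on either side, which is the shape of the sampling distribution of the mean.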