What Is the Chi Square Formula?

How Do You Calculate Chi Square Step by Step?

The chi-square statistic is a fundamental quantity in inferential statistics, particularly in hypothesis tests involving categorical data. It quantifies the discrepancy between observed frequencies and the frequencies expected under a specific hypothesis.


Mathematical Expression for the Chi-Square Statistic

Let $O_i$ denote the observed frequency for the $i$-th category, and $E_i$ denote the expected frequency for the $i$-th category under the null hypothesis. The chi-square statistic, denoted as $\chi^2$, is defined by the formula:


\[\chi^2 = \sum_{i=1}^k \frac{(O_i - E_i)^2}{E_i}\]


Here, $k$ is the total number of mutually exclusive categories. Each term in the summation is the squared difference between the observed and expected frequencies for one category, divided by the expected frequency, so that a given absolute discrepancy counts for more when $E_i$ is small.
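The formula translates directly into code. The following is a minimal Python sketch, assuming the observed and expected frequencies are supplied as equal-length sequences listed in the same category order. The function name `chi_square_statistic` is illustrative and not part of any library.

```python
def chi_square_statistic(observed, expected):
    """Return chi^2 = sum over categories of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of categories")
    if any(e <= 0 for e in expected):
        raise ValueError("every expected frequency must be positive")
    # One term per category: squared difference divided by the expected frequency
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Usage with the car-ownership figures from the worked example in this article
print(round(chi_square_statistic([30, 14, 6], [25.6, 15.1, 5.2]), 5))  # 0.95946
```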


Determination of Expected Frequencies in Contingency Tables

In a contingency table of $r$ rows and $c$ columns, the expected frequency $E_{ij}$ for cell $(i, j)$ is calculated under the assumption of independence as follows:


\[E_{ij} = \frac{(\text{Row total for }i) \times (\text{Column total for }j)}{N}\]


Here, $N$ denotes the total number of observations (the grand total), and the row and column totals are the sums of the observed frequencies in row $i$ and column $j$. For the underlying principles, see Statistics and Probability Overview.
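As a sketch of how the expected-frequency formula feeds into the statistic for a test of independence, the Python below builds the table of $E_{ij}$ values from the row and column totals and then sums $(O_{ij} - E_{ij})^2 / E_{ij}$ over every cell. It assumes the observed counts are given as a list of rows; the function names and the 2 × 2 table are hypothetical, chosen only to illustrate the arithmetic.

```python
def expected_frequencies(table):
    """E_ij = (row total i) * (column total j) / N, assuming independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]  # zip(*table) walks the columns
    n = sum(row_totals)                             # grand total N
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_independence(table):
    """Chi-square statistic for an r x c contingency table of observed counts."""
    expected = expected_frequencies(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical 2 x 2 table of counts, used only to illustrate the calculation
observed = [[20, 30],
            [30, 20]]
print(expected_frequencies(observed))     # [[25.0, 25.0], [25.0, 25.0]]
print(chi_square_independence(observed))  # 4.0
```

Each of the four cells contributes $(\pm 5)^2 / 25 = 1$, which is why the total is 4.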


Interpretation of the Chi-Square Value in Hypothesis Testing

The computed value of $\chi^2$ is used to evaluate the null hypothesis by comparing it with the critical value from the chi-square distribution table at the appropriate number of degrees of freedom. For an $r \times c$ contingency table, the degrees of freedom are:


\[\text{Degrees of freedom (df)} = (r-1) \times (c-1)\]


For a goodness-of-fit test with $k$ categories, the number of degrees of freedom is $k-1$ (one further degree of freedom is lost for each parameter estimated from the data).
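The two degrees-of-freedom rules can be written as small helpers. This Python sketch covers only the basic cases stated above; the function names are hypothetical, and the extra reduction for parameters estimated from the data is deliberately not modelled.

```python
def df_contingency(rows: int, cols: int) -> int:
    """Degrees of freedom for an r x c contingency table: (r - 1)(c - 1)."""
    if rows < 2 or cols < 2:
        raise ValueError("a contingency table needs at least 2 rows and 2 columns")
    return (rows - 1) * (cols - 1)


def df_goodness_of_fit(k: int) -> int:
    """Degrees of freedom for a goodness-of-fit test with k categories: k - 1."""
    if k < 2:
        raise ValueError("at least 2 categories are required")
    return k - 1


print(df_contingency(2, 3))    # 2
print(df_goodness_of_fit(3))   # 2
```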


Calculation of the p-Value in the Chi-Square Test

Once $\chi^2$ is computed, the p-value is the probability, under the null hypothesis, of obtaining a value at least as large as the observed statistic, i.e., $P(\chi^2_{\text{df}} \geq \chi^2_{\text{obs}})$. A p-value below the chosen significance level (commonly 0.05) is evidence against the null hypothesis and leads to its rejection. A larger p-value means the data are consistent with the null hypothesis, although it does not prove it.
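Statistical libraries compute this upper-tail probability directly (for example, SciPy provides `scipy.stats.chi2.sf`). As a dependency-free alternative, the Python sketch below uses the standard closed forms of the chi-square survival function for integer degrees of freedom: a finite exponential series when df is even, and a complementary error function plus a finite series when df is odd. The function name `chi2_sf` is our own choice, and the sketch assumes df is a positive integer.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + 2*phi(sqrt(x)) * sum_{j=1}^{(df-1)/2} sqrt(x)^(2j-1) / (1*3*...*(2j-1))
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    phi = math.exp(-x / 2) / math.sqrt(2 * math.pi)  # standard normal density at sqrt(x)
    return math.erfc(chi / math.sqrt(2)) + 2 * phi * total


print(round(chi2_sf(5.991, 2), 3))    # 0.05
print(round(chi2_sf(0.95946, 2), 3))  # 0.619
```

The second line applies the function to the statistic from the car-ownership example below, under the assumption that it is tested with df = 2; with a p-value near 0.62, that null hypothesis would not be rejected at the 5% level.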


Detailed Example: Computing the Chi-Square Statistic for Observed vs. Expected Car Ownership

Given: the observed ($O_i$) and expected ($E_i$) numbers of families, out of $N = 50$ families, according to the number of cars they own:


One car: $O_1 = 30$, $E_1 = 25.6$
Two cars: $O_2 = 14$, $E_2 = 15.1$
Three cars: $O_3 = 6$, $E_3 = 5.2$


For each category, calculate the contribution to $\chi^2$:


\[ \frac{(O_1 - E_1)^2}{E_1} = \frac{(30 - 25.6)^2}{25.6} = \frac{19.36}{25.6} = 0.75625 \]

\[ \frac{(O_2 - E_2)^2}{E_2} = \frac{(14 - 15.1)^2}{15.1} = \frac{1.21}{15.1} = 0.08013 \]

\[ \frac{(O_3 - E_3)^2}{E_3} = \frac{(6 - 5.2)^2}{5.2} = \frac{0.64}{5.2} = 0.12308 \]


Sum these contributions:


\[ \chi^2 = 0.75625 + 0.08013 + 0.12308 = 0.95946 \]


Result: $\chi^2 = 0.96$ (to two decimal places).


To draw a conclusion, compare this value with the tabulated critical value for the appropriate degrees of freedom and the chosen significance level: the null hypothesis is rejected only if $\chi^2$ exceeds the critical value. For further computational tools, see Chi Square Formula.
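The arithmetic of the worked example can be checked with a few lines of Python. This sketch simply reproduces the per-category contributions and their sum from the figures given above; the variable names are illustrative.

```python
# Observed and expected counts from the car-ownership example
categories = ["One car", "Two cars", "Three cars"]
observed = [30, 14, 6]
expected = [25.6, 15.1, 5.2]

# Contribution of each category: (O_i - E_i)^2 / E_i
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
for name, c in zip(categories, contributions):
    print(f"{name}: {c:.5f}")

chi_sq = sum(contributions)
print(f"chi-square = {chi_sq:.5f} (about {chi_sq:.2f})")
```

It prints 0.75625, 0.08013, and 0.12308 for the three categories and a total of 0.95946, about 0.96, matching the hand calculation.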


Stepwise Construction of the Chi-Square Statistic

Consider a discrete random experiment whose outcomes fall into $k$ mutually exclusive classes. Under the null hypothesis $H_0$, let the expected frequency for the $i$-th class be $E_i = N p_i$, where $p_i$ is the probability of the $i$-th class under $H_0$ and $N$ is the sample size. For each $i = 1, 2, \ldots, k$:


Step 1: Compute the difference between observed and expected values:


\[ d_i = O_i - E_i \]


Step 2: Square this difference:


\[ d_i^2 = (O_i - E_i)^2 \]


Step 3: Normalize the squared difference by dividing by $E_i$:


\[ \frac{d_i^2}{E_i} = \frac{(O_i - E_i)^2}{E_i} \]


Step 4: Sum over all categories to obtain the chi-square statistic:


\[ \chi^2 = \sum_{i=1}^k \frac{(O_i - E_i)^2}{E_i} \]
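The four steps can be followed literally in code. The Python sketch below starts from observed counts and the null-hypothesis probabilities $p_i$, forms $E_i = N p_i$, and accumulates the statistic one step at a time. The function name and the 9:3:3:1 counts in the usage line are hypothetical, included only to show the calculation on a Mendelian-style ratio.

```python
def chi_square_from_probabilities(observed, probs):
    """Chi-square goodness-of-fit statistic built with Steps 1-4, using E_i = N * p_i."""
    if len(observed) != len(probs):
        raise ValueError("need one probability per category")
    if any(p <= 0 for p in probs) or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must be positive and sum to 1")
    n = sum(observed)                    # sample size N
    chi_sq = 0.0
    for o, p in zip(observed, probs):
        e = n * p                        # expected frequency E_i = N p_i
        d = o - e                        # Step 1: d_i = O_i - E_i
        d_sq = d ** 2                    # Step 2: d_i^2
        chi_sq += d_sq / e               # Step 3 (divide by E_i) and Step 4 (sum)
    return chi_sq


# Hypothetical counts for four classes, tested against a 9:3:3:1 ratio (N = 160)
print(round(chi_square_from_probabilities([100, 25, 25, 10], [9/16, 3/16, 3/16, 1/16]), 5))  # 2.77778
```

Here the expected counts are 90, 30, 30, and 10, so the statistic is $10^2/90 + 2 \times 5^2/30 + 0 = 25/9$.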


Critical Use Cases and Statistical Considerations of the Chi-Square Statistic

The chi-square test is used to evaluate categorical data in the goodness-of-fit test, the test of independence in a contingency table, and the test of homogeneity. Valid inference requires the data to be frequencies (actual counts, not percentages or proportions) and each expected frequency $E_i$ to be sufficiently large (generally $E_i \geq 5$) for the chi-square distribution to approximate the sampling distribution of the statistic accurately. For deeper foundational understanding, see Understanding Probability.
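The $E_i \geq 5$ rule of thumb is easy to check before running the test. This Python sketch, with a hypothetical function name, flags the categories that fall below the threshold; the default of 5 is the conventional guideline, not a hard limit. When categories are flagged, one common remedy is to merge sparse categories with neighbouring ones.

```python
def low_expected_categories(expected, threshold=5.0):
    """Return the indices of categories whose expected frequency is below threshold.

    The default of 5 is the usual rule of thumb, not a strict statistical limit.
    """
    return [i for i, e in enumerate(expected) if e < threshold]


# Car-ownership example: every expected frequency is at least 5
print(low_expected_categories([25.6, 15.1, 5.2]))   # []
# Hypothetical expected frequencies with two sparse categories
print(low_expected_categories([12.0, 3.5, 4.5]))    # [1, 2]
```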


Frequently Examined Points on the Chi-Square Formula

The chi-square formula presupposes mutually exclusive categories. The statistic is always non-negative, since each term is a squared difference divided by a positive expected frequency. A small value of $\chi^2$ indicates close agreement between observed and expected frequencies. A large value indicates a discrepancy that may be statistically significant, which leads to rejection of the null hypothesis at the chosen significance level. The exact rejection threshold is read from the chi-square distribution table for the test's degrees of freedom.


The chi-square procedure is widely used in biology, genetics, and the medical sciences, and more generally in hypothesis tests involving qualitative (categorical) data. For application to distributions and combinatorial data, see Permutations and Combinations.


FAQs on What Is the Chi Square Formula?

1. What is the Chi Square formula?

The Chi Square formula is used in statistics to assess how observed data compares to expected data under a specific hypothesis. The formula is:

\[\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}\]

Where:

  • $\chi^2$ = Chi Square statistic
  • $O_i$ = Observed frequency for category $i$
  • $E_i$ = Expected frequency for category $i$
This formula is widely used for hypothesis testing in statistics, such as the Chi Square test of independence and the goodness of fit test.

2. What are the assumptions of the Chi Square test?

The Chi Square test relies on certain assumptions:

  • Data must be frequency counts of categorical variables.
  • Observations are independent of each other.
  • Expected frequency in each category should generally be at least 5.
  • The sample is randomly selected.
Meeting these conditions ensures reliable results from the Chi Square test in statistics.

3. What is the purpose of the Chi Square test?

The main purpose of the Chi Square test is to determine if there is a significant difference between the expected and observed frequencies in categorical data.

It is commonly used for:

  • Testing goodness of fit (how well observed data fits an expected distribution)
  • Test of independence (checking association between two categorical variables)
  • Test of homogeneity (checking whether different populations have the same distribution of a categorical variable)
It is a very important tool in statistics and research projects for categorical data.

4. How do you calculate expected frequency in a Chi Square test?

For each cell of a contingency table, the expected frequency under the null hypothesis of independence is calculated as:

Expected frequency = (Row total × Column total) / Grand total

Steps:

  1. Multiply the total for the row by the total for the column.
  2. Divide the result by the grand total (sum of all observations).
This ensures proper comparison between observed and expected values.

5. What are the degrees of freedom in a Chi Square test?

The degrees of freedom (df) in a Chi Square test depend on the number of categories.

  • For a single variable (goodness of fit): df = Number of categories – 1
  • For two-way tables (independence): df = (Rows – 1) × (Columns – 1)
The df is crucial for determining the critical value from the Chi Square table.

6. What are the types of Chi Square tests?

The two most common types of Chi Square test in statistics are:

  • Chi Square test of goodness of fit: Checks how well observed data matches expected distribution.
  • Chi Square test of independence: Tests whether two categorical variables are independent in a population.
Both tests use the Chi Square formula and are essential in analysing categorical data.

7. What are the uses of the Chi Square test in real life?

The Chi Square test is widely used in real life applications such as:

  • Survey data analysis (e.g. customer preference studies)
  • Medical research (testing association between disease and risk factors)
  • Genetics and biology (testing Mendelian ratios)
  • Market research (product popularity, brand association)
It allows researchers to validate hypotheses using categorical data effectively.

8. What are the limitations of the Chi Square test?

The Chi Square test has several limitations:

  • Not suitable for small sample sizes (expected frequency should be ≥ 5).
  • Applies only to categorical data, not numerical data.
  • Cannot determine the strength or direction of association, only presence.
  • Sensitive to sample size: with very large samples, even small and practically unimportant differences can be statistically significant.
Always consider these points when interpreting results from Chi Square analysis.

9. What is the difference between the Chi Square test and t-test?

The Chi Square test and t-test are different statistical tools:

  • Chi Square test is used for categorical data to compare frequencies or proportions.
  • t-test is used for numerical/continuous data to compare means.
Each test is chosen based on the data type and research question.

10. Why is the Chi Square test called a non-parametric test?

The Chi Square test is called a non-parametric test because:

  • It does not require assumptions about the population parameters (like mean or variance).
  • It works with categorical data and frequency counts.
  • No need for normality in the data distribution.
This makes the Chi Square test flexible and widely used in statistics.