
Correlation and Regression in Statistics Explained


Correlation and Regression: Formulas, Types and Solved Examples

The concepts of correlation and regression play a key role in mathematics and statistics, helping students analyse data relationships and make predictions. These are frequently used in school projects, competitive exams like JEE, and real-world data analysis.


What Is Correlation and Regression?

Correlation is a statistical measure that describes the strength and direction of the relationship between two variables. If two variables, like temperature and ice cream sales, tend to increase together, they show positive correlation. If one variable tends to decrease as the other increases (like exercise and weight), they show negative correlation. Regression, by contrast, is used to predict the value of one variable from the value(s) of one or more other variables. For example, regression can help predict a student's future exam marks from the number of hours studied.

You’ll find these concepts applied in data analysis, predictive modeling, research writing, and classroom projects. Studying correlation and regression boosts logical thinking and data literacy for students in all fields.


Key Formula for Correlation and Regression

Correlation Coefficient Formula:
\( r = \frac{\sum (x_i - \bar{x}) (y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} \)
Regression Line Equation (Simple Linear Regression):
\( y = a + bx \)
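Where:

\(\bar{x}, \bar{y}\) = Means of the x and y values
x = Independent (predictor) variable; y = Dependent (predicted) variable
a = Intercept of the line
b = Slope or regression coefficient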
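As a minimal sketch (Python standard library only; `pearson_r` is an illustrative name of our own, not a library function), the correlation formula translates almost line for line into code:

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient, computed straight from the deviation formula."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must be equal-length lists with at least two values")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of products of paired deviations
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the two sums of squared deviations
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return s_xy / math.sqrt(s_xx * s_yy)


print(pearson_r([1, 2, 3, 4], [3, 5, 7, 9]))  # 1.0 (the points lie exactly on y = 1 + 2x)
```

The guard for a constant variable reflects the formula itself: if every x (or every y) is the same, the denominator is zero and r is undefined.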


Difference Between Correlation and Regression

| Aspect | Correlation | Regression |
|---|---|---|
| Definition | Measures the strength and direction of the relationship between two variables. | Predicts the value of one variable based on the value of another. |
| Output | A coefficient ranging from -1 (perfect negative) to +1 (perfect positive) | A regression equation (e.g., \(y = a + bx\)) |
| Roles of variables | Variables are not classified as dependent or independent; swapping x and y gives the same r | There is a clear dependent (y) and independent (x) variable; swapping them gives a different line |
| Cause and effect | Does not imply causation | Models a directional, predictive relationship, but does not by itself prove causation |
| Application | To summarise relationship and association | To make predictions and model data |
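The contrast between the two columns can be checked numerically. This sketch (plain Python; the helper names are illustrative) uses the hours-and-marks data from the step-by-step illustration: swapping x and y leaves r unchanged, but the slope of the line predicting marks from hours differs from the slope of the line predicting hours from marks.

```python
import math


def deviation_sums(xs, ys):
    """Return S_xy, S_xx and S_yy, the sums of products/squares of deviations."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy, s_xx, s_yy


def pearson_r(xs, ys):
    s_xy, s_xx, s_yy = deviation_sums(xs, ys)
    return s_xy / math.sqrt(s_xx * s_yy)


def slope(xs, ys):
    """Slope of the least-squares line that predicts ys from xs."""
    s_xy, s_xx, _ = deviation_sums(xs, ys)
    return s_xy / s_xx


hours = [2, 4, 6, 8, 10]
marks = [40, 50, 65, 80, 100]

print(pearson_r(hours, marks), pearson_r(marks, hours))  # the same value both ways (about 0.9934)
print(slope(hours, marks))  # 7.5 marks per hour (regression of y on x)
print(slope(marks, hours))  # about 0.1316 hours per mark (regression of x on y)
```

Inverting the first slope gives 1/7.5 ≈ 0.133, not 0.1316, so the x-on-y line is not simply the y-on-x line rearranged; the two lines coincide only when |r| = 1. In general, the product of the two slopes equals r².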

Cross-Disciplinary Usage

Correlation and regression are not only useful in Mathematics but also play an important role in fields like Physics (to relate physical measurements), Computer Science (machine learning models use regression), Economics (predicting financial trends), and even in daily logical reasoning. Students preparing for JEE, NEET, or research-based projects will often need these concepts to support data-driven conclusions.


Step-by-Step Illustration

Example Problem: A teacher collected data from five students on hours studied (x) and marks scored (y):
x: 2, 4, 6, 8, 10
y: 40, 50, 65, 80, 100
Find the correlation coefficient and regression equation to predict marks based on study hours.

Step-by-step Solution:

1. Calculate the mean of x (\(\bar{x}\)) and y (\(\bar{y}\))

2. Find the deviations (\(x_i - \bar{x}\)) and (\(y_i - \bar{y}\)) for each observation

3. Multiply the deviations for each pair and sum them: \(\sum (x_i - \bar{x})(y_i - \bar{y})\)

4. Calculate the sum of squared deviations for x and y separately

5. Use the correlation formula:
\( r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} \)

6. Calculate slope (b):
\( b = \frac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sum (x_i-\bar{x})^2} \)

7. Find intercept (a):
\( a = \bar{y} - b\bar{x} \)

8. Final regression equation: \(y = a + bx\)
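Working steps 1 to 8 through for this data gives the following numbers:

```latex
\begin{aligned}
\bar{x} &= \frac{2+4+6+8+10}{5} = 6, \qquad \bar{y} = \frac{40+50+65+80+100}{5} = 67\\
x_i - \bar{x} &: -4,\ -2,\ 0,\ 2,\ 4 \qquad y_i - \bar{y}: -27,\ -17,\ -2,\ 13,\ 33\\
\sum (x_i - \bar{x})(y_i - \bar{y}) &= 108 + 34 + 0 + 26 + 132 = 300\\
\sum (x_i - \bar{x})^2 &= 16 + 4 + 0 + 4 + 16 = 40\\
\sum (y_i - \bar{y})^2 &= 729 + 289 + 4 + 169 + 1089 = 2280\\
r &= \frac{300}{\sqrt{40 \times 2280}} = \frac{300}{\sqrt{91200}} \approx 0.9934\\
b &= \frac{300}{40} = 7.5, \qquad a = 67 - 7.5 \times 6 = 22\\
y &= 22 + 7.5x
\end{aligned}
```

An r of about 0.9934 indicates a very strong positive linear relationship, and the slope says each extra hour of study is associated with about 7.5 more predicted marks.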

Interpretation: Use the regression line to predict marks for students who studied any number of hours within the observed range (2 to 10 hours); predictions outside this range (extrapolation) are less reliable.
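The steps above, together with the range restriction in the interpretation, can be sketched in Python as follows. `fit_line` and `predict_marks` are hypothetical names, and raising an error outside the observed range is one conservative design choice rather than a standard rule.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + bx."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n                       # step 1: means
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # steps 2-3: deviation products
    s_xx = sum((x - x_bar) ** 2 for x in xs)                       # step 4: squared x deviations
    b = s_xy / s_xx                                                # step 6: slope
    a = y_bar - b * x_bar                                          # step 7: intercept
    return a, b


def predict_marks(hours_studied, xs, ys):
    """Hypothetical helper: predict marks, refusing to extrapolate beyond the data."""
    low, high = min(xs), max(xs)
    if not low <= hours_studied <= high:
        raise ValueError(f"{hours_studied} h is outside the observed range {low}-{high} h")
    a, b = fit_line(xs, ys)
    return a + b * hours_studied


hours = [2, 4, 6, 8, 10]
marks = [40, 50, 65, 80, 100]
print(fit_line(hours, marks))          # (22.0, 7.5)
print(predict_marks(5, hours, marks))  # 59.5
```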


Speed Trick or Vedic Shortcut

Here's a quick routine for finding the mean and the deviations from it, a common sub-calculation in correlation and regression questions:

  1. Add up all the values for x and y separately.
  2. Divide by the number of values to get the mean quickly.
  3. Subtract the mean from each value to get deviations instantly.
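The three steps map directly onto a few lines of Python (a sketch; `mean_and_deviations` is an illustrative name). The last line also shows a handy check for hand calculations: deviations from the mean always sum to zero, so a non-zero total points to an arithmetic slip.

```python
def mean_and_deviations(values):
    total = sum(values)                      # step 1: add up the values
    mean = total / len(values)               # step 2: divide by how many there are
    deviations = [v - mean for v in values]  # step 3: subtract the mean from each value
    return mean, deviations


mean, devs = mean_and_deviations([40, 50, 65, 80, 100])
print(mean, devs)  # 67.0 [-27.0, -17.0, -2.0, 13.0, 33.0]
print(sum(devs))   # 0.0: deviations from the mean always sum to zero
```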

Practicing this shortcut improves calculation speed in statistics sections of exams. Vedantu’s expert teachers often demonstrate such hacks in live sessions for a smoother problem-solving experience.


Try These Yourself

  • List real-life pairs showing positive, negative, and zero correlation.
  • Given x: 1, 2, 3, y: 2, 4, 6, find the regression line for predicting y from x.
  • Explain why correlation does not mean causation, using your own example.
  • If r = 0.9, what does it say about the relationship between the two variables?

Frequent Errors and Misunderstandings

  • Assuming correlation means one variable causes another (it does not).
  • Mixing up dependent and independent variables in regression equations.
  • Believing that a weak correlation always means no relationship (the relationship may be non-linear, or other factors may be involved).
  • Forgetting to check for linear trend before applying formulas.
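The last two mistakes are easy to demonstrate. In this sketch (plain Python, with made-up symmetric x values), y = x² is completely determined by x, yet Pearson's r comes out as exactly 0, because r only measures linear association:

```python
import math


def pearson_r(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]  # a perfect, but non-linear, relationship
print(pearson_r(xs, ys))  # 0.0
```

A scatter plot of these points would show a clear U-shape, which is why plotting the data before calculating is worthwhile.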

Relation to Other Concepts

The idea of correlation and regression connects closely with covariance (which measures how two variables vary together), mean and variance, and is foundational for statistical inference. Mastering it builds strong skills for understanding probability distributions, making predictions, and interpreting research data.


Classroom Tip

A quick way to remember: Correlation = Connection; Regression = Regression line predicts Results. Vedantu’s teachers suggest drawing scatter plots for visual clues before calculating—this helps students see the relationship type at a glance.


We explored correlation and regression—from definition, formula, example problem, quick tricks, and common mistakes, to how these connect with bigger math ideas. For more tricks, live help, and exam support, keep practicing with Vedantu’s online courses and resources.




FAQs on Correlation and Regression in Statistics Explained

1. What is correlation in statistics?

Correlation is a statistical measure that shows the strength and direction of the relationship between two variables. In correlation analysis, the relationship can be:

  • Positive correlation: both variables increase or decrease together.
  • Negative correlation: one variable increases while the other decreases.
  • No correlation: no clear linear relationship.
The strength of correlation is measured using the correlation coefficient, which ranges from -1 to +1.

2. What is the formula for Pearson’s correlation coefficient?

The formula for Pearson’s correlation coefficient (r) is r = Σ[(x − x̄)(y − ȳ)] / √[Σ(x − x̄)² Σ(y − ȳ)²]. Here:

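  • x̄ and ȳ are the means of x and y.
  • The numerator is the sum of products of paired deviations, which is proportional to the covariance.
  • The denominator scales this by a quantity proportional to the product of the standard deviations, so r has no units.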
The value of r always lies between -1 and +1.

3. What does the correlation coefficient tell you?

The correlation coefficient tells you the strength and direction of a linear relationship between two variables. Specifically:

  • r = +1: perfect positive linear correlation.
  • r = -1: perfect negative linear correlation.
  • r = 0: no linear correlation.
Values closer to ±1 indicate a stronger linear relationship, while values near 0 indicate a weak or absent linear relationship.

4. What is regression in statistics?

Regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables, and to predict the dependent variable from them. In simple linear regression (one independent variable), the relationship is expressed as y = a + bx, where:

  • y is the dependent variable.
  • x is the independent variable.
  • a is the intercept.
  • b is the slope (regression coefficient).
Regression analysis is mainly used for prediction and forecasting.

5. What is the difference between correlation and regression?

The main difference is that correlation measures relationship strength, while regression predicts values. Key differences include:

  • Correlation measures the degree and direction of association between variables.
  • Regression establishes a functional relationship and predicts the dependent variable.
  • Correlation is symmetric (no dependent/independent variable).
  • Regression distinguishes between dependent and independent variables.
Neither establishes causation on its own: correlation summarises association, while regression is used for estimation and prediction.

6. What is the formula for the regression line?

The formula for the simple linear regression line is y = a + bx. The coefficients are calculated as:

  • b = Σ[(x − x̄)(y − ȳ)] / Σ(x − x̄)² (slope)
  • a = ȳ − b x̄ (intercept)
This equation represents the line of best fit: the line that minimizes the sum of squared vertical errors (residuals) between the observed and predicted values of y.
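To see what minimizing the sum of squared errors means in practice, this sketch (plain Python; `sse` is a hypothetical helper name) fits the line for the hours-and-marks data from the worked example and confirms that shifting the intercept or slope slightly makes the total squared error larger:

```python
def sse(a, b, xs, ys):
    """Hypothetical helper: sum of squared vertical errors for the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


hours = [2, 4, 6, 8, 10]
marks = [40, 50, 65, 80, 100]

# Least-squares coefficients from b = S_xy / S_xx and a = y_bar - b * x_bar
n = len(hours)
x_bar, y_bar = sum(hours) / n, sum(marks) / n
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, marks))
     / sum((x - x_bar) ** 2 for x in hours))
a = y_bar - b * x_bar
best = sse(a, b, hours, marks)
print(a, b, best)  # 22.0 7.5 30.0

# Nudging either coefficient away from its least-squares value raises the error
for da, db in [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    print(da, db, sse(a + da, b + db, hours, marks) > best)  # True each time
```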

7. How do you interpret the slope in linear regression?

The slope in linear regression represents the change in the dependent variable for a one-unit increase in the independent variable. In the equation y = a + bx, the slope b means:

  • If b > 0, y increases as x increases.
  • If b < 0, y decreases as x increases.
  • The magnitude of b shows the rate of change.
For example, if b = 2, then y increases by 2 units for every 1-unit increase in x.

8. Can you give an example of calculating correlation?

Yes, correlation can be calculated using paired data and the Pearson formula. For example, consider (x, y): (1,2), (2,4), (3,6).

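  • x̄ = 2, ȳ = 4
  • Deviations: x − x̄ = −1, 0, 1 and y − ȳ = −2, 0, 2
  • Σ(x − x̄)(y − ȳ) = 2 + 0 + 2 = 4, Σ(x − x̄)² = 2, Σ(y − ȳ)² = 8
  • Substituting into r = Σ[(x − x̄)(y − ȳ)] / √[Σ(x − x̄)² Σ(y − ȳ)²] gives r = 4 / √(2 × 8) = 4 / 4 = 1
This indicates a perfect positive correlation because every point lies exactly on the increasing line y = 2x.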

9. What is the coefficient of determination (R²)?

The coefficient of determination, denoted by R², measures the proportion of variance in the dependent variable explained by the regression model. It is calculated as R² = r² in simple linear regression.

  • R² = 0: model explains none of the variation.
  • R² = 1: model explains all the variation.
For example, if r = 0.8, then R² = 0.64, meaning 64% of the variation is explained.
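In general R² = 1 − SSE/SST, where SSE is the sum of squared residuals and SST = Σ(y − ȳ)² is the total variation; in simple linear regression this equals r². The sketch below checks that on the hours-and-marks data from the worked example, using Python's `fractions.Fraction` so the comparison is exact rather than approximate:

```python
from fractions import Fraction

hours = [2, 4, 6, 8, 10]
marks = [40, 50, 65, 80, 100]

# Exact rational arithmetic avoids any floating-point rounding.
n = len(hours)
x_bar = Fraction(sum(hours), n)
y_bar = Fraction(sum(marks), n)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, marks))
s_xx = sum((x - x_bar) ** 2 for x in hours)
sst = sum((y - y_bar) ** 2 for y in marks)  # total variation in y

b = s_xy / s_xx
a = y_bar - b * x_bar
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(hours, marks))  # unexplained variation

r_squared_from_fit = 1 - sse / sst          # R² = 1 - SSE/SST
r_squared_from_r = s_xy ** 2 / (s_xx * sst)  # r², with no square root needed

print(r_squared_from_fit, r_squared_from_r)  # 75/76 75/76
```

Since 75/76 ≈ 0.987, the line explains roughly 98.7% of the variation in marks for this small dataset.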

10. Does correlation imply causation?

No, correlation does not imply causation because a relationship between two variables does not prove that one causes the other. Even if the correlation coefficient is close to +1 or -1:

  • There may be a third (confounding) variable.
  • The relationship may be coincidental.
  • The direction of cause may be unclear.
Correlation analysis only measures association, not cause-and-effect.