Linear Regression

Introduction to Linear Regression

Linear regression models the relationship between two variables by fitting a linear equation to observed data. One variable is the independent (predictor) variable, and the other is the dependent (outcome) variable. Linear regression is commonly used for predictive analysis. Regression examines two main questions. First, does a set of predictor variables do a good job of predicting an outcome (dependent) variable? Second, which variables are significant predictors of the outcome variable? In this article, we discuss the linear regression equation, its formula, and the properties of linear regression.


Examples of Linear Regression

The weight of a person is approximately linearly related to their height: as height increases, weight tends to increase as well.


It is not necessary that one variable depends on the other, or that one causes the other, but there should be some meaningful relationship between the two variables. A scatter plot is used to gauge the strength of that relationship. If there is no association between the variables, the scatter plot shows no increasing or decreasing pattern, and a linear regression model is not useful for the data.


Linear Regression Equation

The strength of the relationship between two variables is measured by the correlation coefficient. The coefficient lies between -1 and +1 and shows how strongly the two variables are associated in the observed data.
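As a rough illustration, the correlation coefficient can be computed directly from its definition. The sketch below uses Python (an assumption, since the article itself contains no code), the function name `pearson_r` is hypothetical, and the four data pairs are borrowed from the solved example later in this article.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

x = [2, 4, 6, 8]
y = [3, 7, 5, 10]
r = pearson_r(x, y)
print(round(r, 3))  # prints 0.821
```

A value close to +1, as here, suggests a fairly strong positive linear association, so fitting a line to this data is reasonable.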

The linear regression equation is given below:

Y=a+bX

where X is the independent variable and it is plotted along the x-axis

Y is the dependent variable and it is plotted along the y-axis

Here, b is the slope of the line, and a is the intercept (the value of Y when X = 0).


Linear Regression Formula

As noted above, linear regression describes the linear relationship between two variables. Its equation has the same form as the slope-intercept equation of a straight line, familiar from linear equations in two variables. The linear regression formula is given by the equation

Y= a + bX

The values of a and b are found using the formulas below, where n is the number of data points:

a = \[\frac{(\sum y)(\sum x^{2})-(\sum x)(\sum xy)}{[n(\sum x^{2})-(\sum x)^{2}]}\]

b = \[\frac{[n(\sum xy)-(\sum x)(\sum y)]}{[n(\sum x^{2})-(\sum x)^{2}]}\]
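The two formulas above translate almost line by line into code. The following is a minimal Python sketch, not a reference implementation; the function name `fit_line` is hypothetical, and the sample data is taken from the solved example later in this article.

```python
def fit_line(xs, ys):
    """Return (a, b) for the line y = a + b*x using the summation formulas."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    # Both formulas share the same denominator.
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    a = (sum_y * sum_x2 - sum_x * sum_xy) / denom  # intercept
    b = (n * sum_xy - sum_x * sum_y) / denom       # slope
    return a, b

a, b = fit_line([2, 4, 6, 8], [3, 7, 5, 10])
print(a, b)  # prints 1.5 0.95
```

The explicit check on the denominator is a design choice of this sketch: when every x value is the same, no unique line can be fitted.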


Simple Linear Regression

Simple linear regression is the most straightforward case, with a single scalar predictor variable x and a single scalar response variable y. The equation for this regression is y = a + bx.


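The extension to multiple or vector-valued predictor variables is known as multiple linear regression, also called multivariable linear regression. The equation keeps the form Y = a + bX, with X now a vector of predictors and b a vector of coefficients, that is, \[Y = a + b_{1}X_{1} + b_{2}X_{2} + \dots + b_{k}X_{k}\]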


Almost all real-world regression models involve multiple predictors, and basic descriptions of linear regression are often phrased in terms of multiple regression. Note that in these cases the dependent variable y is still a scalar.
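To make the multi-predictor case concrete, here is a small Python sketch that fits y = a + b1*x1 + b2*x2 by solving the least-squares normal equations. Everything in it is an assumption made for illustration: the helper names `solve` and `fit_multiple` are hypothetical, the data is synthetic (generated exactly from y = 1 + 2*x1 + 3*x2), and a real project would more likely rely on a numerical library.

```python
def solve(A, b):
    """Solve the square system A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Augmented matrix [A | b], copied so the inputs are left untouched.
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z

def fit_multiple(rows, ys):
    """Least-squares coefficients [a, b1, ..., bk] for y = a + b1*x1 + ... + bk*xk."""
    X = [[1.0] + list(r) for r in rows]  # the leading 1 carries the intercept a
    k = len(X[0])
    # Normal equations: (X^T X) coeffs = X^T y
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(k)]
    return solve(XtX, Xty)

# Synthetic data, generated exactly from y = 1 + 2*x1 + 3*x2 (not from the article).
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]

coeffs = fit_multiple(rows, ys)
print([round(c, 6) for c in coeffs])
```

Because the synthetic data has no noise, the recovered coefficients should match 1, 2 and 3 up to rounding error. Note that each observation still has a single scalar response y.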


Least Squares Regression Line or Linear Regression Line

The most common way to fit a regression line to an XY plot is the method of least squares. This method determines the best-fitting line for the given data by minimizing the sum of the squares of the vertical deviations from each data point to the line. If a point lies exactly on the fitted line, its vertical deviation is 0. Because the deviations are squared before they are added, positive and negative deviations do not cancel each other out.
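The least-squares idea can be checked numerically. The Python sketch below compares the sum of squared vertical deviations for the line y = 1.5 + 0.95x (the result of the solved example in this article) against a few arbitrarily chosen alternative lines. The function name `sum_squared_errors` and the alternative lines are assumptions picked purely for illustration.

```python
def sum_squared_errors(xs, ys, a, b):
    """Sum of squared vertical deviations of the points from the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

x = [2, 4, 6, 8]
y = [3, 7, 5, 10]

# Least-squares line from the solved example in this article.
best = sum_squared_errors(x, y, 1.5, 0.95)

# A few other candidate lines (intercept, slope), chosen arbitrarily.
candidates = [(1.5, 1.0), (1.0, 0.95), (2.0, 0.9), (0.0, 1.25)]
others = [sum_squared_errors(x, y, a, b) for a, b in candidates]

print(round(best, 4))
print([round(v, 4) for v in others])
```

Every alternative line should give a larger sum than the least-squares line, which is what "best-fitting" means in this method.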

Linear regression determines the straight line known as the least-squares regression line (LSRL). If Y is the dependent variable and X is the independent variable, the population regression line is given by the equation:

\[Y = B_{0} + B_{1}X\]

Where

\[B_{0}\] is a constant

\[B_{1}\] is the regression coefficient

When a random sample of observations is given, then the regression line is expressed as;

\[\hat{y} = b_{0} + b_{1}x\]

where \[b_{0}\] is a constant,

\[b_{1}\] is the regression coefficient,

x is the independent variable,

\[\hat{y}\] is the predicted value of the dependent variable.


Properties of Linear Regression

For the regression line with regression parameters \[b_{0}\] and \[b_{1}\], the following properties hold:

  • The regression line minimizes the sum of squared differences between observed values and predicted values.

  • The regression line passes through the point \[(\bar{x}, \bar{y})\], the means of the X and Y values.

  • The regression constant \[b_{0}\] is equal to the y-intercept of the regression line.

  • The regression coefficient \[b_{1}\] is the slope of the regression line. Its value is equal to the average change in the dependent variable (Y) for a unit change in the independent variable (X).
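These properties can be verified numerically. The Python sketch below uses the data and the fitted values b0 = 1.5 and b1 = 0.95 from the solved example in this article; the helper name `predict` and the variable names are assumptions made for this illustration.

```python
x = [2, 4, 6, 8]
y = [3, 7, 5, 10]
b0, b1 = 1.5, 0.95  # fitted values from the solved example in this article

def predict(xi):
    """Predicted value on the regression line for input xi."""
    return b0 + b1 * xi

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# The line passes through the point of means: predict(mean_x) equals mean_y.
value_at_mean_x = predict(mean_x)

# A consequence of the same property: the residuals sum to zero.
residual_sum = sum(yi - predict(xi) for xi, yi in zip(x, y))

# b0 is the y-intercept: the predicted value at x = 0.
intercept = predict(0)

# b1 is the change in the predicted value for a unit change in x.
slope_step = predict(4) - predict(3)

print(mean_y, round(value_at_mean_x, 6))
print(round(residual_sum, 6), intercept, round(slope_step, 6))
```

The first property, minimizing the sum of squared differences, is what defines the line in the first place, so it is not re-checked here.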

Regression Coefficient

The regression coefficient appears in the regression equation:

\[Y = B_{0} + B_{1}X\]

Where

\[B_{0}\] is a constant

\[B_{1}\] is the regression coefficient

Given below is the formula for \[b_{1}\], the estimate of the regression coefficient \[B_{1}\] computed from the sample.

\[b_{1} = \frac{\sum (x_{i} - \bar{x})(y_{i} - \bar{y})}{\sum (x_{i} - \bar{x})^{2}}\]

where \[x_{i}\] and \[y_{i}\] are the observed data values,

and \[\bar{x}\] and \[\bar{y}\] are their mean values.
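The deviation-from-the-mean formula can be sketched in Python as follows. The function name `regression_coefficients` is hypothetical. The intercept is obtained from the property that the regression line passes through the point of means, and the data comes from the solved example in this article.

```python
def regression_coefficients(xs, ys):
    """Return (b0, b1) using deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Numerator and denominator of the b1 formula.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)

    b1 = s_xy / s_xx
    # The line passes through (mean_x, mean_y), which fixes the intercept.
    b0 = mean_y - b1 * mean_x
    return b0, b1

b0, b1 = regression_coefficients([2, 4, 6, 8], [3, 7, 5, 10])
print(round(b0, 6), round(b1, 6))
```

For this data the numerator is 19 and the denominator is 20, so the slope agrees with the value 0.95 obtained from the summation formulas.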


Solved Examples

1. Find a linear regression equation for the following two sets of data:


x     2     4     6     8
y     3     7     5     10


Sol: To find the linear regression equation, we need the values of Σx, Σy, Σx² and Σxy.

Construct the table and find the values.


x          y          x²           xy
2          3          4            6
4          7          16           28
6          5          36           30
8          10         64           80
Σx = 20    Σy = 25    Σx² = 120    Σxy = 144


The linear regression equation is y = a + bx

Using the formulas, we find the values of a and b, with n = 4 data points

a = \[\frac{(\sum y)(\sum x^{2})-(\sum x)(\sum xy)}{[n(\sum x^{2})-(\sum x)^{2}]}\]

Now put the values in the equation

a = \[\frac{25 \times 120 - 20 \times 144}{4 \times 120 - 400}\]

a = \[\frac{120}{80}\]

a = 1.5

b = \[\frac{[n(\sum xy)-(\sum x)(\sum y)]}{[n(\sum x^{2})-(\sum x)^{2}]}\]

Put the values in the equation

b = \[\frac{4 \times 144 - 20 \times 25}{4 \times 120 - 400}\]

b = \[\frac{76}{80}\]

b = 0.95

Hence we get a = 1.5 and b = 0.95

The linear equation is given by

y = a + bx

Now substitute the values of a and b into the equation

Hence the equation of linear regression is y = 1.5 + 0.95x

FAQs (Frequently Asked Questions)

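1. What are the Types of Linear Regression?

Ans: Linear regression itself has two forms, simple and multiple. The remaining entries in this list are related regression and classification techniques that are often listed alongside it: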

  • Simple linear regression

  • Multiple linear regression

  • Logistic regression

  • Ordinal regression

  • Multinomial regression

  • Discriminant Analysis

2. What are the Differences Between Linear and Logistic Regression?

Ans: Linear regression is used to predict the value of a continuous dependent variable from the independent variables. Logistic regression is used to predict a categorical dependent variable from the independent variables, typically by estimating the probability of each category.

3. How Does a Linear Regression Work?

Ans: Linear regression is the process of finding the line that best fits the data points on the plot. The fitted line is then used to predict output values for inputs that are not present in the data set. Those predicted outputs lie on the line, while actual observations generally scatter around it.
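As a small illustration of prediction, the Python sketch below applies the line y = 1.5 + 0.95x from the solved example to inputs that do not appear in the original data. The function name `predict` and the chosen inputs 5 and 10 are assumptions made for this example.

```python
def predict(x_new, a=1.5, b=0.95):
    """Predicted y on the fitted line y = a + b*x (defaults from the solved example)."""
    return a + b * x_new

observed_x = [2, 4, 6, 8]

# Inputs that are not in the data set: 5 lies inside the observed range,
# while 10 lies outside it (extrapolation).
new_inputs = [5, 10]
predictions = {x_new: predict(x_new) for x_new in new_inputs}

for x_new, y_hat in predictions.items():
    inside = min(observed_x) <= x_new <= max(observed_x)
    print(x_new, round(y_hat, 2), "interpolation" if inside else "extrapolation")
```

Predictions outside the observed range of x, such as x = 10 here, rest on the assumption that the same linear trend continues, so they deserve more caution than predictions inside the range.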