Coefficient of Determination

Introduction to Coefficient of Determination

The coefficient of determination (denoted R²) is the square of the correlation (r) between the predicted y scores and the actual y scores; hence, it ranges from 0 to 1. It is a key output of regression analysis, interpreted as the proportion of the variance in the dependent variable that is predictable from the independent variable. In simple linear regression, the coefficient of determination is also equal to the square of the correlation between the x and y scores.
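
As a rough illustration of the two readings in the paragraph above, the following Python sketch fits a least-squares line to a small data set and compares the squared correlation between predicted and actual y with the squared correlation between x and y. The data values and the helper names (`mean`, `correlation`, `fit_line`) are assumptions made up for this example.

```python
import math

def mean(values):
    return sum(values) / len(values)

def correlation(a, b):
    """Pearson correlation between two equal-length sequences."""
    a_m, b_m = mean(a), mean(b)
    num = sum((ai - a_m) * (bi - b_m) for ai, bi in zip(a, b))
    den = math.sqrt(sum((ai - a_m) ** 2 for ai in a) * sum((bi - b_m) ** 2 for bi in b))
    return num / den

def fit_line(x, y):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    x_m, y_m = mean(x), mean(y)
    slope = (sum((xi - x_m) * (yi - y_m) for xi, yi in zip(x, y))
             / sum((xi - x_m) ** 2 for xi in x))
    return slope, y_m - slope * x_m

# Made-up sample data (an assumption, not taken from the article).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(x, y)
y_pred = [slope * xi + intercept for xi in x]

r2_from_predictions = correlation(y_pred, y) ** 2  # predicted vs actual y
r2_from_x = correlation(x, y) ** 2                 # x vs y
print(r2_from_predictions, r2_from_x)  # the two values agree up to rounding
```

The agreement holds here because the model is a straight line in a single independent variable; with several independent variables only the first reading applies.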

Interpretation of R²

R² is interpreted as follows:

  • An R² of 0 means that the dependent variable cannot be predicted from the independent variable.

  • An R² of 1 means that the dependent variable can be predicted without error from the independent variable.

  • An R² between 0 and 1 indicates the extent to which the dependent variable is predictable.

  • An R² of 0.10 means that 10% of the variance in Y is predictable from X.

  • An R² of 0.20 means that 20% is predictable; and so on.
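
The two extreme cases in the list above can be checked numerically. The sketch below uses the common "variance explained" form R² = 1 - SS_res / SS_tot for a least-squares line, which is a standard identity rather than something stated in this article; the function name `r_squared` and both data sets are illustrative assumptions.

```python
def r_squared(x, y):
    """R^2 of the least-squares line through (x, y), as 1 - SS_res / SS_tot."""
    count = len(x)
    x_m = sum(x) / count
    y_m = sum(y) / count
    slope = (sum((xi - x_m) * (yi - y_m) for xi, yi in zip(x, y))
             / sum((xi - x_m) ** 2 for xi in x))
    intercept = y_m - slope * x_m
    # Variance left unexplained by the line, and total variance around the mean.
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_m) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# y is an exact linear function of x, so all of its variance is predictable.
perfect_line = r_squared([1, 2, 3, 4], [3, 5, 7, 9])

# y has no linear link with x (it is symmetric around x = 0),
# so a straight line explains none of its variance.
no_linear_link = r_squared([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

print(perfect_line, no_linear_link)
```

The second data set is a reminder that an R² of 0 for a straight-line model says only that a line does not help; y may still depend on x in some other way.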

Formula for Coefficient of Determination

The standard formula for the coefficient of determination in a linear regression with one independent variable is given below:

Coefficient of Determination = \[(Correlation Coefficient)^{2}\]

Where,

Correlation Coefficient = \[\frac{\sum[(X - X_{m}) * (Y - Y_{m})]}{\sqrt{\sum(X - X_{m})^{2} * \sum(Y - Y_{m})^{2}}}\]

Here X_m and Y_m denote the means of X and Y.

SYNTAX    MEANING

R         Correlation coefficient

R²        Coefficient of determination of the linear regression equation

N         Number of observations in the linear regression equation

X̄         Average/Mean of the independent variable of the linear regression equation (written X_m in the formula)

Xi        Independent variable of the linear regression equation

Ȳ         Average/Mean of the dependent variable of the linear regression equation (written Y_m in the formula)

Yi        Dependent variable of the linear regression equation

σx        Standard deviation of the independent variable

σy        Standard deviation of the dependent variable
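
The correlation formula and the symbols listed above translate fairly directly into code. The following is a minimal Python sketch under that reading; the function names and the `hours`/`scores` data are hypothetical and serve only to show the arithmetic.

```python
import math

def correlation_coefficient(xs, ys):
    """Correlation: sum of products of deviations over the root of the
    product of the summed squared deviations."""
    x_m = sum(xs) / len(xs)  # mean of the independent variable
    y_m = sum(ys) / len(ys)  # mean of the dependent variable
    numerator = sum((x - x_m) * (y - y_m) for x, y in zip(xs, ys))
    denominator = math.sqrt(
        sum((x - x_m) ** 2 for x in xs) * sum((y - y_m) ** 2 for y in ys)
    )
    return numerator / denominator

def coefficient_of_determination(xs, ys):
    """Square of the correlation coefficient."""
    return correlation_coefficient(xs, ys) ** 2

# Hypothetical observations (not from the article).
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]

r = correlation_coefficient(hours, scores)
r2 = coefficient_of_determination(hours, scores)
print(round(r, 4), round(r2, 4))
```

For these made-up values the sum of products of deviations is 45 and the summed squared deviations are 10 and 205.2, so R² works out to 2025/2052, roughly 0.987.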

 

Adequacy of R²

The usefulness of R² lies in its ability to indicate how likely future observations are to fall within the outcomes predicted by the model. The idea is that, as more samples are added, the coefficient reflects how closely a new point can be expected to fall to the fitted line.

Even when there is a strong relationship between two variables, the coefficient of determination is not evidence of causality. For example, a study on birthdays may show a large number of birthdays falling within a span of one or two months. This does not mean that the passage of time or the change of seasons causes pregnancy.

Syntax of Coefficient of Determination

In mathematics, the coefficient of determination is generally written as R²_p. Here, the “p” denotes the number of columns of data, which is useful when comparing the R² of different data sets.

Solved Examples

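Problem 1.

Two variables m and n are linked by m + 3n = 10. Likewise, two other variables p and q are linked by 2p + 5q = 25. The regression coefficient of p on n is 0.80. Find the regression coefficient of q on m.


Solution 1.

We have,

m + 3n = 10

\[m = \frac{n - \frac{10}{3}}{-\frac{1}{3}}\]

Furthermore,

2p + 5q = 25

\[q = \frac{p - \frac{25}{2}}{-\frac{5}{2}}\]

For a change of origin and scale of the form m = (n - a)/h and q = (p - b)/k, the regression coefficients are related by the formula

\[b_{pn} = \frac{k}{h} \times b_{qm}\]

Here h = -1/3 (about -0.33) and k = -5/2 (that is, -2.5). Then,

\[0.80 = \frac{-2.5}{-0.33} \times b_{qm}\]

\[0.80 = 7.5 \times b_{qm}\]

Thus,

\[b_{qm} = \frac{0.80}{7.5} = 0.1333 \times 0.80 = \frac{8}{75}\]

Hence the answer is 8/75 (approximately 0.107).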
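
The result can be sanity-checked numerically. The sketch below reads the given value as the regression coefficient of p on n and the requested value as the regression coefficient of q on m, then builds a small data set that satisfies both linear relations. The data, the `wiggle` term and the function name `regression_coefficient` are assumptions made for this check; exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction

def regression_coefficient(y, x):
    """Slope of the least-squares regression of y on x: cov(x, y) / var(x)."""
    count = len(x)
    x_m = sum(x, Fraction(0)) / count
    y_m = sum(y, Fraction(0)) / count
    cov = sum((xi - x_m) * (yi - y_m) for xi, yi in zip(x, y))
    var = sum((xi - x_m) ** 2 for xi in x)
    return cov / var

# Made-up data: p = 0.8 * n plus a term with zero covariance with n,
# so the regression coefficient of p on n is exactly 0.80.
n_vals = [Fraction(v) for v in (-2, -1, 0, 1, 2)]
wiggle = [Fraction(v) for v in (4, 1, 0, 1, 4)]
p_vals = [Fraction(4, 5) * nv + w for nv, w in zip(n_vals, wiggle)]

m_vals = [10 - 3 * nv for nv in n_vals]        # from m + 3n = 10
q_vals = [(25 - 2 * pv) / 5 for pv in p_vals]  # from 2p + 5q = 25

b_pn = regression_coefficient(p_vals, n_vals)
b_qm = regression_coefficient(q_vals, m_vals)
print(b_pn, b_qm)  # 4/5 8/75
```

Any other data set with a regression coefficient of 0.80 for p on n would give the same 8/75, since the result depends only on the two scale factors.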

Problem 2.

Given that the coefficient of determination is 0.64, find the value of the coefficient of correlation. Choose from the options given below.

A. 0.04

B. 0.40

C. 0.08

D. 0.80

Solution 2.

The answer is Option D, which is 0.80.

The coefficient of determination is R², the square of the coefficient of correlation r. A coefficient of determination of 0.64 therefore gives r = ±√0.64 = ±0.8. The coefficient of correlation ranges from -1 to +1, which is why the coefficient of determination ranges from 0 to +1. Of the options given, only 0.80 has a square of 0.64.
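
The same check can be written as a few lines of Python. This is only a sketch of the arithmetic; the `options` dictionary simply restates the four choices from the problem.

```python
import math

r_squared = 0.64
options = {"A": 0.04, "B": 0.40, "C": 0.08, "D": 0.80}

# The correlation is a square root of R^2; its sign cannot be
# recovered from R^2 alone, so only the magnitude is compared.
magnitude = math.sqrt(r_squared)
matches = [label for label, value in options.items()
           if math.isclose(abs(value), magnitude)]
print(magnitude, matches)
```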

Fun Facts/ Key Takeaways

  • The coefficient of determination is frequently referred to as R² (or R-squared).

  • The coefficient of determination is often described as a measure of ‘goodness of fit’.

  • The coefficient of determination is a statistical measure used to assess how well a model is likely to predict future data.

  • The coefficient of determination is used to explain how much of the variability of one factor can be accounted for by its relationship to another factor.

  • The coefficient of determination takes a value between 0.0 and 1.0.

  • A value of 0.0 means that the model explains none of the variability in the data.

  • A value of 1.0 represents a perfect fit, which also makes the model highly reliable for future predictions.

FAQ (Frequently Asked Questions)

1. What does Adjusted Coefficient of Determination mean?

The Adjusted Coefficient of Determination (Adjusted R-squared) is a modification of the Coefficient of Determination that takes into account the number of variables in a data set. It also applies a penalty for variables that do not improve the model.

You may know that too few values in a data set (a too-small sample size) can give misleading results, but you might not know that adding too many variables can also cause problems. Each time you add an independent variable to a regression analysis, R² increases or stays the same; it never decreases. Thus, the more variables you add, the better the regression will appear to “fit” your data. If your data does not seem to fit a line, it can be tempting to keep adding variables until you obtain a satisfactory fit. Adjusted R-squared corrects for this by penalizing each added variable.
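
The adjustment is commonly computed as 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of observations and k the number of independent variables. This formula is the standard textbook one rather than something given in this article, and the function and parameter names in the sketch below are assumptions.

```python
def adjusted_r_squared(r_squared, n_observations, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    degrees_of_freedom = n_observations - n_predictors - 1
    if degrees_of_freedom <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r_squared) * (n_observations - 1) / degrees_of_freedom

# Same raw R^2 and sample size, but a different number of predictors:
# the model with more predictors gets the lower adjusted value.
few_predictors = adjusted_r_squared(0.80, n_observations=20, n_predictors=1)
many_predictors = adjusted_r_squared(0.80, n_observations=20, n_predictors=5)
print(round(few_predictors, 4), round(many_predictors, 4))
```

Comparing the two results shows the penalty at work: with an unchanged R² of 0.80, going from one predictor to five lowers the adjusted value.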


2. What is the best use of the Coefficient of Determination?

The most common use of R² is perhaps to assess how well the regression model fits the observed data. For example, an R² of 80% means that 80% of the variance in the dependent variable is explained by the regression model. Usually, a larger coefficient signifies a better fit for the model. However, a large R² is not always a sign of a good regression model. The quality of the coefficient depends on several factors, including the units of the variables, the nature of the variables used in the model, and the data transformation applied. Therefore, even a large coefficient can sometimes go hand in hand with problems in the regression model.
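
One way to see that a large R² does not guarantee a suitable model is to fit a straight line to data that is actually curved. The sketch below does this for made-up data following y = x²; the data and the helper name `fit_and_score` are assumptions chosen for illustration.

```python
def fit_and_score(x, y):
    """Return (slope, intercept, r_squared) for the least-squares line."""
    count = len(x)
    x_m = sum(x) / count
    y_m = sum(y) / count
    s_xy = sum((xi - x_m) * (yi - y_m) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_m) ** 2 for xi in x)
    s_yy = sum((yi - y_m) ** 2 for yi in y)
    slope = s_xy / s_xx
    intercept = y_m - slope * x_m
    return slope, intercept, s_xy ** 2 / (s_xx * s_yy)

# Curved data (assumed for illustration): y grows as the square of x.
x = list(range(1, 11))
y = [xi ** 2 for xi in x]

slope, intercept, r2 = fit_and_score(x, y)
residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]

print(round(r2, 3))                      # high, about 0.95
print([round(r, 1) for r in residuals])  # positive, then negative, then positive
```

The R² is high, yet the residuals are positive at both ends and negative in the middle. That systematic pattern suggests the straight line is the wrong shape for this data, which R² alone does not reveal.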