Correlation

Definition of Correlation

Correlation refers to the relationship between two variables and to the process of measuring it. To see whether a relationship exists between two variables, you plot the paired values as points on a scatter plot. There are several ways to measure the relationship, depending on whether the variables are measured at the ordinal level or at a higher (interval or ratio) level, but the most commonly used measure is the correlation coefficient.

 

Correlation in Statistics

In this section, you will learn how to interpret correlation coefficients and how to calculate them for interval-level scales as well as ordinal-level scales. A correlation coefficient is a single number that summarizes the relationship between two variables. The coefficient is scaled so that it always lies between -1 and +1. If the coefficient is close to 0, the linear relationship between the two variables is weak; the farther it is from 0, in either direction, the stronger the relationship.

The two variables are usually denoted X and Y. To show how they are related, each pair of values (X, Y) is plotted as a point on a scatter diagram. First, the scatter diagram is drawn. Next, Pearson’s r is calculated. The method is usually illustrated with small samples first and then applied to larger samples.

Types of correlation

Now that we know that scatter plots are used to show the correlation between two variables, let us look at the types of correlation. The relationship between two variables falls into one of three types: positive correlation, negative correlation, or no correlation.

  • Positive Correlation: This occurs when, as the value of one variable increases, the value of the other variable also tends to increase

  • Negative Correlation: This occurs when, as the value of one variable increases, the value of the other variable tends to decrease

  • No Correlation: In this situation, there is no linear relationship between the variables, so a change in one says nothing about the direction of change in the other
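As a rough illustration of the three types, the direction of a linear relationship can be read from the sign of the summed products of deviations from the mean, which has the same sign as the correlation coefficient. The sketch below is a minimal example under stated assumptions: the function name `direction`, the tolerance `tol`, and the three small data sets are hypothetical and are not taken from this article.

```python
def direction(xs, ys, tol=1e-9):
    """Classify the direction of the linear relationship between xs and ys.

    Uses the sign of sum((x - mean_x) * (y - mean_y)), which matches the
    sign of Pearson's r whenever both variables actually vary.
    `tol` is an illustrative threshold, not a statistical test.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if co_movement > tol:
        return "positive"
    if co_movement < -tol:
        return "negative"
    return "none"


# Hypothetical data, chosen only to show each case.
x = [1, 2, 3, 4, 5]
print(direction(x, [2, 3, 5, 4, 6]))  # positive
print(direction(x, [6, 4, 5, 3, 2]))  # negative
print(direction(x, [3, 1, 2, 1, 3]))  # none
```

Real data rarely give a sum of exactly zero, so in practice "no correlation" usually means a coefficient close to 0 rather than equal to it.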

Pearson’s Correlation Coefficient Formula

The most commonly used formula for measuring the linear dependence between two sets of data is Pearson’s Correlation Coefficient Formula. The value of Pearson’s correlation coefficient always lies between -1 and +1. When the coefficient is 0 or close to 0, the data sets show little or no linear relationship. The data sets are positively correlated if the coefficient is greater than 0, with +1 indicating a perfect positive linear relationship, and negatively correlated if the coefficient is less than 0, with -1 indicating a perfect negative linear relationship.


\[r = \frac{n(\sum{xy})-(\sum{x})(\sum{y})}{\sqrt{[n\sum{x^{2}}-(\sum{x})^{2}][n\sum{y^{2}}-(\sum{y})^{2}]}}\]

Here,

n = the number of paired observations

Σx = the sum of the values of the first variable

Σy = the sum of the values of the second variable

Σxy = the sum of the products of the paired first and second values

Σx² = the sum of the squares of the first values

Σy² = the sum of the squares of the second values
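To make the formula concrete, the following is a minimal sketch that evaluates Pearson’s r directly from the raw sums. The function name `pearson_r` and the small data set are hypothetical and chosen only for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from the raw sums n, Σx, Σy, Σxy, Σx² and Σy²."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))  # sum of products
    sum_x2 = sum(x * x for x in xs)              # sum of squares
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator


# Hypothetical data: Σx = 15, Σy = 20, Σxy = 66, Σx² = 55, Σy² = 86
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 3))  # 0.775
```

Here the numerator is 5 × 66 − 15 × 20 = 30 and the denominator is √(50 × 30) ≈ 38.73, which gives r ≈ 0.775.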

 

Linear Correlation Coefficient Formula

The linear correlation coefficient is Pearson’s r written with explicit summation indices:

\[r = \frac{n\sum_{i=1}^{n}{x_i}{y_i}-\sum_{i=1}^{n}{x_i}\sum_{i=1}^{n}{y_i}}{\sqrt{n\sum_{i=1}^{n}{x_i}^{2}-(\sum_{i=1}^{n}{x_i})^{2}}\sqrt{n\sum_{i=1}^{n}{y_i}^{2}-(\sum_{i=1}^{n}{y_i})^{2}}}\]

Sample Correlation Coefficient Formula

The sample correlation coefficient formula is:

\[r_{ab} = \frac{S_{ab}}{S_{a}S_{b}}\]

Here,

Sa = the sample standard deviation of a

Sb = the sample standard deviation of b

Sab = the sample covariance of a and b
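The same quantity can be sketched in code by computing the sample covariance and the two sample standard deviations separately. This is a minimal illustration, not a reference implementation: the function name `sample_correlation` and the data are hypothetical, and the covariance is written out by hand so that the n − 1 divisor is visible.

```python
import statistics


def sample_correlation(a, b):
    """r_ab = S_ab / (S_a * S_b), with n - 1 in every denominator."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    n = len(a)
    mean_a = statistics.mean(a)
    mean_b = statistics.mean(b)
    # Sample covariance S_ab: summed products of deviations over n - 1.
    s_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)
    s_a = statistics.stdev(a)  # sample standard deviation of a
    s_b = statistics.stdev(b)  # sample standard deviation of b
    if s_a == 0 or s_b == 0:
        raise ValueError("r is undefined when either variable is constant")
    return s_ab / (s_a * s_b)


# Hypothetical data: S_ab = 6 / 4 = 1.5
a = [1, 2, 3, 4, 5]
b = [2, 4, 5, 4, 5]
print(round(sample_correlation(a, b), 3))  # 0.775
```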

 

Population Correlation Coefficient Formula

The formula for population correlation coefficient is:

 

\[r_{ab} = \frac{\sigma_{ab}}{\sigma_{a}\sigma_{b}}\]

 

Here,

σa = the population standard deviation of a

σb = the population standard deviation of b

σab = the population covariance of a and b
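When both versions are computed from the same data, the population form (dividing by n) and the sample form (dividing by n − 1) give the same coefficient, because the divisor cancels between the covariance and the two standard deviations. The sketch below checks this numerically; the function name `correlation`, its `population` flag, and the data are hypothetical.

```python
import math


def correlation(a, b, population=True):
    """Covariance divided by the product of the standard deviations.

    population=True divides by n (population quantities);
    population=False divides by n - 1 (sample quantities).
    """
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    divisor = n if population else n - 1
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / divisor
    sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in a) / divisor)
    sd_b = math.sqrt(sum((y - mean_b) ** 2 for y in b) / divisor)
    if sd_a == 0 or sd_b == 0:
        raise ValueError("correlation is undefined when either variable is constant")
    return cov / (sd_a * sd_b)


# Hypothetical data, used for both versions.
a = [1, 2, 3, 4, 5]
b = [2, 4, 5, 4, 5]
r_pop = correlation(a, b, population=True)
r_sample = correlation(a, b, population=False)
print(round(r_pop, 3), round(r_sample, 3))  # 0.775 0.775
print(math.isclose(r_pop, r_sample))        # True
```

The distinction between the two formulas therefore concerns which quantities are being described (a whole population or a sample drawn from it), not the arithmetic result for a given data set.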

 

Solved Problem

Consider the relationship between the number of years of formal education a person has received and the age at which that person entered the workforce. In the table below, you’ll see the years of education (A) each person received and the age at which he or she entered the workforce (B). The survey was done among 12 people, all of whom were aged 30 years or more.

 

Person Number    No. of Years of Education (A)    Age of Entry in Workforce (B)
1                10                               16
2                12                               17
3                15                               18
4                8                                15
5                20                               18
6                17                               22
7                12                               19
8                15                               22
9                12                               18
10               10                               15
11               8                                18
12               10                               16

 

Here, you can see that people with more years of education tend to enter the workforce at a later age, although the relationship is not perfect. For example, see Person 11, who had just 8 years of formal education but entered the workforce at the age of 18. Plotting A against B on a scatter diagram helps you see the relationship between the number of years of schooling and the age of entry into the workforce.
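The article stops short of computing the coefficient for this survey, so the following is a hedged sketch that applies Pearson’s formula to the twelve pairs in the table. The variable names are illustrative; the data values are copied from the table.

```python
import math

# Data copied from the table: years of education (A), age of entry (B).
education_years = [10, 12, 15, 8, 20, 17, 12, 15, 12, 10, 8, 10]
entry_age = [16, 17, 18, 15, 18, 22, 19, 22, 18, 15, 18, 16]

n = len(education_years)                                          # 12
sum_a = sum(education_years)                                      # 149
sum_b = sum(entry_age)                                            # 214
sum_ab = sum(a * b for a, b in zip(education_years, entry_age))   # 2716
sum_a2 = sum(a * a for a in education_years)                      # 1999
sum_b2 = sum(b * b for b in entry_age)                            # 3876

numerator = n * sum_ab - sum_a * sum_b                            # 706
denominator = math.sqrt(
    (n * sum_a2 - sum_a ** 2) * (n * sum_b2 - sum_b ** 2)         # 1787 * 716
)
r = numerator / denominator
print(round(r, 3))  # 0.624
```

A coefficient of about 0.62 indicates a positive linear relationship that is far from perfect, which is consistent with exceptions such as Person 11.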

FAQ (Frequently Asked Questions)

1. What is Correlation?

Correlation refers to the relationship between two variables and to the process of measuring it. To see whether a relationship exists between two variables, you plot the paired values as points on a scatter plot. The relationship can be measured in several ways, depending on whether the variables are at the ordinal level or a higher level of measurement, but the most commonly used measure is the correlation coefficient.

2. What are the Types of Correlation?

Scatter plots are used to show the different types of correlation between two variables. Between two data sets, the types of correlation are positive correlation, negative correlation, or no correlation.

  • Positive Correlation: This occurs when, as the value of one variable increases, the value of the other variable also tends to increase

  • Negative Correlation: This occurs when, as the value of one variable increases, the value of the other variable tends to decrease

  • No Correlation: In this situation, there is no linear relationship between the variables



3. What is the Formula for Correlation Coefficient?

The most commonly used formula for measuring the linear dependence between two sets of data is Pearson’s Correlation Coefficient Formula. The value always lies between -1 and +1, and a coefficient of 0 means there is no linear relationship between the two data sets.

\[r = \frac{n(\sum{xy})-(\sum{x})(\sum{y})}{\sqrt{[n\sum{x^{2}}-(\sum{x})^{2}][n\sum{y^{2}}-(\sum{y})^{2}]}}\]

Here,

n = the number of paired observations

Σx = the sum of the values of the first variable

Σy = the sum of the values of the second variable

Σxy = the sum of the products of the paired first and second values

Σx² = the sum of the squares of the first values

Σy² = the sum of the squares of the second values