
Prove that ${{b}_{yx}}\cdot {{b}_{xy}}={{\left\{ \rho \left( X,Y \right) \right\}}^{2}}$

Answer
Hint: We recall the definitions and formulas of the regression coefficients ${{b}_{xy}},{{b}_{yx}}$ in the regression analysis of bivariate data $X,Y$ as the slopes of the regression lines. We recall the formula for the correlation coefficient $\rho \left( X,Y \right)$, which is the ratio of the covariance $COV\left( X,Y \right)$ of the bivariate population to the product of the standard deviations ${{\sigma }_{x}},{{\sigma }_{y}}$ of $X$ and $Y$. We proceed from the left hand side to prove the statement.

Complete step by step answer:
We know that the mean of a population with $n$ data points $X={{x}_{1}},{{x}_{2}},...,{{x}_{n}}$ is given by
\[\overline{X}=\dfrac{1}{n}\sum\limits_{i=1}^{n}{{{x}_{i}}}\]
We know in regression analysis that in bivariate data the two variables vary with each other. It means that if there are two variables $X,Y$ then $X$ may depend on $Y$ and also $Y$ may depend on $X.$ Let us take a set of $n$ data points $\left( {{x}_{1}},{{y}_{1}} \right),\left( {{x}_{2}},{{y}_{2}} \right),...,\left( {{x}_{n}},{{y}_{n}} \right)$. We use the least squares method to find the regression lines that fit the data. When $X$ is regressed on $Y={{y}_{1}},{{y}_{2}},{{y}_{3}},...,{{y}_{n}}$ we obtain the equation of the regression line
\[X=c+dY\]
Here $c$ is the average value of $X$ when $Y$ is zero. We know that the slope $d$ of the above line is called the regression coefficient ${{b}_{xy}}$, which is given by
\[{{b}_{xy}}=\dfrac{\sum\limits_{i=1}^{n}{{{x}_{i}}{{y}_{i}}}-n\overline{X}\overline{Y}}{\sum\limits_{i=1}^{n}{{{y}_{i}}^{2}}-n{{\left( \overline{Y} \right)}^{2}}}\]
Similarly, we assume the equation of the regression line of $Y$ on $X$ as
\[Y=a+bX\]
Here $a$ is the average value of $Y$ when $X$ is zero. The slope $b$ of this line is the regression coefficient ${{b}_{yx}}$, which is given by
\[{{b}_{yx}}=\dfrac{\sum\limits_{i=1}^{n}{{{x}_{i}}{{y}_{i}}}-n\overline{X}\overline{Y}}{\sum\limits_{i=1}^{n}{{{x}_{i}}^{2}}-n{{\left( \overline{X} \right)}^{2}}}\]
The correlation coefficient $\rho \left( X,Y \right)$ of the population measures the degree of linear association between $X$ and $Y.$ We know that it is the ratio of the covariance $COV\left( X,Y \right)$ of the bivariate population to the product of the standard deviations of $X$ and $Y$. So it is given by
\[\rho \left( X,Y \right)=\dfrac{\text{COV}\left( X,Y \right)}{{{\sigma }_{x}}{{\sigma }_{y}}}=\dfrac{\sum{{{x}_{i}}{{y}_{i}}-n\overline{X}\overline{Y}}}{\sqrt{\sum{{{x}_{i}}^{2}-n{{\overline{X}}^{2}}}}\sqrt{\sum{{{y}_{i}}^{2}-n{{\overline{Y}}^{2}}}}}\]
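To make these formulas concrete, here is a minimal Python sketch that computes ${{b}_{yx}},{{b}_{xy}}$ and $\rho \left( X,Y \right)$ directly from the definitions above; the five data points are hypothetical, chosen only for illustration.
# A minimal sketch: compute both regression coefficients and the
# correlation coefficient from the formulas above (hypothetical data).
xs = [1, 2, 3, 4, 5]
ys = [2, 3, 5, 4, 6]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar  # n times COV(X, Y)
sxx = sum(x * x for x in xs) - n * x_bar ** 2                 # n times Var(X)
syy = sum(y * y for y in ys) - n * y_bar ** 2                 # n times Var(Y)
b_yx = sxy / sxx   # slope of the regression line of Y on X
b_xy = sxy / syy   # slope of the regression line of X on Y
rho = sxy / (sxx * syy) ** 0.5
print(b_yx, b_xy, rho)  # 0.9 0.9 0.9 for this data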
We proceed from the left hand side of the statement ${{b}_{yx}}\cdot {{b}_{xy}}={{\left\{ \rho \left( X,Y \right) \right\}}^{2}}$
\[\begin{align}
  & {{b}_{yx}}\cdot {{b}_{xy}}=\dfrac{\sum\limits_{i=1}^{n}{{{x}_{i}}{{y}_{i}}}-n\overline{X}\overline{Y}}{\sum\limits_{i=1}^{n}{{{x}_{i}}^{2}}-n{{\left( \overline{X} \right)}^{2}}}\times \dfrac{\sum\limits_{i=1}^{n}{{{x}_{i}}{{y}_{i}}}-n\overline{X}\overline{Y}}{\sum\limits_{i=1}^{n}{{{y}_{i}}^{2}}-n{{\left( \overline{Y} \right)}^{2}}} \\
 & ={{\left( \dfrac{\sum\limits_{i=1}^{n}{{{x}_{i}}{{y}_{i}}}-n\overline{X}\overline{Y}}{\sqrt{\left( \sum\limits_{i=1}^{n}{{{x}_{i}}^{2}}-n{{\left( \overline{X} \right)}^{2}} \right)\left( \sum\limits_{i=1}^{n}{{{y}_{i}}^{2}}-n{{\left( \overline{Y} \right)}^{2}} \right)}} \right)}^{2}}={{\left\{ \rho \left( X,Y \right) \right\}}^{2}} \\
\end{align}\]
This is equal to the right hand side, and hence the statement is proved.
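As a quick numerical check with the same hypothetical data points $\left( 1,2 \right),\left( 2,3 \right),\left( 3,5 \right),\left( 4,4 \right),\left( 5,6 \right)$ we have $n=5,\overline{X}=3,\overline{Y}=4,\sum{{{x}_{i}}{{y}_{i}}}=69,\sum{{{x}_{i}}^{2}}=55,\sum{{{y}_{i}}^{2}}=90$, so that
\[{{b}_{yx}}=\dfrac{69-60}{55-45}=0.9,\quad {{b}_{xy}}=\dfrac{69-60}{90-80}=0.9,\quad \rho \left( X,Y \right)=\dfrac{9}{\sqrt{10\times 10}}=0.9\]
and indeed ${{b}_{yx}}\cdot {{b}_{xy}}=0.81={{\left( 0.9 \right)}^{2}}={{\left\{ \rho \left( X,Y \right) \right\}}^{2}}$.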

Note: We can alternatively solve this if we know the relation between the regression coefficients ${{b}_{xy}},{{b}_{yx}}$ and the standard deviations ${{\sigma }_{x}},{{\sigma }_{y}}$, namely ${{b}_{xy}}=\rho \dfrac{{{\sigma }_{x}}}{{{\sigma }_{y}}},{{b}_{yx}}=\rho \dfrac{{{\sigma }_{y}}}{{{\sigma }_{x}}}$, and then proceed from the left hand side. The statement to be proved can also be read as: the correlation coefficient of bivariate data is the geometric mean of the two regression coefficients.
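A minimal Python sketch of this alternative route, again with the hypothetical data used above, checks that the ${{\sigma }_{x}},{{\sigma }_{y}}$ factors cancel in the product and leave ${{\rho }^{2}}$.
import statistics as st

xs = [1, 2, 3, 4, 5]
ys = [2, 3, 5, 4, 6]
n = len(xs)
# Population (divide-by-n) standard deviations and covariance.
sigma_x = st.pstdev(xs)
sigma_y = st.pstdev(ys)
cov_xy = sum(x * y for x, y in zip(xs, ys)) / n - st.mean(xs) * st.mean(ys)
rho = cov_xy / (sigma_x * sigma_y)
b_xy = rho * sigma_x / sigma_y  # regression coefficient of X on Y
b_yx = rho * sigma_y / sigma_x  # regression coefficient of Y on X
# The sigmas cancel in the product, leaving rho squared.
assert abs(b_xy * b_yx - rho ** 2) < 1e-12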