
Prove that ${{b}_{yx}}\cdot {{b}_{xy}}={{\left\{ \rho \left( X,Y \right) \right\}}^{2}}$

Answer
Hint: We recall the definitions and formulas of the regression coefficients ${{b}_{xy}},{{b}_{yx}}$ in the regression analysis of bivariate data $X,Y$ as the slopes of the regression lines. We recall the formula for the correlation coefficient $\rho \left( X,Y \right)$, which is the ratio of the covariance $COV\left( X,Y \right)$ of the bivariate population to the product of the standard deviations ${{\sigma }_{x}},{{\sigma }_{y}}$ of $X$ and $Y$. We proceed from the left hand side to prove the statement.

Complete step by step answer:
We know that the mean of a population with $n$ data points $X={{x}_{1}},{{x}_{2}},...,{{x}_{n}}$ is given by
\[\overline{X}=\dfrac{1}{n}\sum\limits_{i=1}^{n}{{{x}_{i}}}\]
We know in regression analysis that in bivariate data the two variables vary with each other. It means that if there are two variables $X,Y$ then $X$ may depend on $Y$ and also $Y$ may depend on $X.$ Let us take a set of $n$ data points $\left( {{x}_{1}},{{y}_{1}} \right),\left( {{x}_{2}},{{y}_{2}} \right),...,\left( {{x}_{n}},{{y}_{n}} \right)$. We use the least squares method to find the regression lines that fit the data. When $X$ is regressed on $Y={{y}_{1}},{{y}_{2}},{{y}_{3}},...,{{y}_{n}}$ we obtain the equation of the regression line
\[X=c+dY\]
Here $c$ is the average value of $X$ when $Y$ is zero. We know that the slope $d$ of the above line is called the regression coefficient ${{b}_{xy}}$, which is given by
\[{{b}_{xy}}=\dfrac{\sum\limits_{i=1}^{n}{{{x}_{i}}{{y}_{i}}}-n\overline{X}\overline{Y}}{\sum\limits_{i=1}^{n}{{{y}_{i}}^{2}}-n{{\left( \overline{Y} \right)}^{2}}}\]
Similarly, we assume the equation of the regression line of $Y$ on $X$ as
\[Y=a+bX\]
Here $a$ is the average value of $Y$ when $X$ is zero. The slope $b$ of this line is the regression coefficient ${{b}_{yx}}$, which is given by
\[{{b}_{yx}}=\dfrac{\sum\limits_{i=1}^{n}{{{x}_{i}}{{y}_{i}}}-n\overline{X}\overline{Y}}{\sum\limits_{i=1}^{n}{{{x}_{i}}^{2}}-n{{\left( \overline{X} \right)}^{2}}}\]
The correlation coefficient $\rho \left( X,Y \right)$ of the population measures the degree of linear association between $X$ and $Y.$ We know that it is the ratio of the covariance $COV\left( X,Y \right)$ of the bivariate population to the product of the standard deviations of $X$ and $Y$. So it is given by
\[\rho \left( X,Y \right)=\dfrac{\text{COV}\left( X,Y \right)}{{{\sigma }_{x}}{{\sigma }_{y}}}=\dfrac{\sum{{{x}_{i}}{{y}_{i}}-n\overline{X}\overline{Y}}}{\sqrt{\sum{{{x}_{i}}^{2}-n{{\overline{X}}^{2}}}}\sqrt{\sum{{{y}_{i}}^{2}-n{{\overline{Y}}^{2}}}}}\]
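To make these formulas concrete, here is a minimal Python sketch that computes ${{b}_{yx}},{{b}_{xy}}$ and $\rho \left( X,Y \right)$ directly from the definitions above; the five data points are hypothetical, chosen only for illustration.
# A minimal sketch: compute both regression coefficients and the
# correlation coefficient from the formulas above (hypothetical data).
xs = [1, 2, 3, 4, 5]
ys = [2, 3, 5, 4, 6]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar  # n times COV(X, Y)
sxx = sum(x * x for x in xs) - n * x_bar ** 2                 # n times Var(X)
syy = sum(y * y for y in ys) - n * y_bar ** 2                 # n times Var(Y)
b_yx = sxy / sxx   # slope of the regression line of Y on X
b_xy = sxy / syy   # slope of the regression line of X on Y
rho = sxy / (sxx * syy) ** 0.5
print(b_yx, b_xy, rho)  # 0.9 0.9 0.9 for this data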
We proceed from the left hand side of the statement ${{b}_{yx}}\cdot {{b}_{xy}}={{\left\{ \rho \left( X,Y \right) \right\}}^{2}}$
\[\begin{align}
  & {{b}_{yx}}\cdot {{b}_{xy}}=\dfrac{\sum\limits_{i=1}^{n}{{{x}_{i}}{{y}_{i}}}-n\overline{X}\overline{Y}}{\sum\limits_{i=1}^{n}{{{x}_{i}}^{2}}-n{{\left( \overline{X} \right)}^{2}}}\times \dfrac{\sum\limits_{i=1}^{n}{{{x}_{i}}{{y}_{i}}}-n\overline{X}\overline{Y}}{\sum\limits_{i=1}^{n}{{{y}_{i}}^{2}}-n{{\left( \overline{Y} \right)}^{2}}} \\
 & ={{\left( \dfrac{\sum\limits_{i=1}^{n}{{{x}_{i}}{{y}_{i}}}-n\overline{X}\overline{Y}}{\sqrt{\left( \sum\limits_{i=1}^{n}{{{x}_{i}}^{2}}-n{{\left( \overline{X} \right)}^{2}} \right)\left( \sum\limits_{i=1}^{n}{{{y}_{i}}^{2}}-n{{\left( \overline{Y} \right)}^{2}} \right)}} \right)}^{2}}={{\left\{ \rho \left( X,Y \right) \right\}}^{2}} \\
\end{align}\]
This is equal to the right hand side, and hence the statement is proved.
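As a quick numerical check with the same hypothetical data points $\left( 1,2 \right),\left( 2,3 \right),\left( 3,5 \right),\left( 4,4 \right),\left( 5,6 \right)$ we have $n=5,\overline{X}=3,\overline{Y}=4,\sum{{{x}_{i}}{{y}_{i}}}=69,\sum{{{x}_{i}}^{2}}=55,\sum{{{y}_{i}}^{2}}=90$, so that
\[{{b}_{yx}}=\dfrac{69-60}{55-45}=0.9,\quad {{b}_{xy}}=\dfrac{69-60}{90-80}=0.9,\quad \rho \left( X,Y \right)=\dfrac{9}{\sqrt{10\times 10}}=0.9\]
and indeed ${{b}_{yx}}\cdot {{b}_{xy}}=0.81={{\left( 0.9 \right)}^{2}}={{\left\{ \rho \left( X,Y \right) \right\}}^{2}}$.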

Note: We can alternatively solve this if we know the relation between the regression coefficients ${{b}_{xy}},{{b}_{yx}}$ and the standard deviations ${{\sigma }_{x}},{{\sigma }_{y}}$, namely ${{b}_{xy}}=\rho \dfrac{{{\sigma }_{x}}}{{{\sigma }_{y}}},{{b}_{yx}}=\rho \dfrac{{{\sigma }_{y}}}{{{\sigma }_{x}}}$, and then proceed from the left hand side. The statement to be proved can also be read as: the correlation coefficient of bivariate data is the geometric mean of the two regression coefficients.
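A minimal Python sketch of this alternative route, again with the hypothetical data used above, checks that the ${{\sigma }_{x}},{{\sigma }_{y}}$ factors cancel in the product and leave ${{\rho }^{2}}$.
import statistics as st

xs = [1, 2, 3, 4, 5]
ys = [2, 3, 5, 4, 6]
n = len(xs)
# Population (divide-by-n) standard deviations and covariance.
sigma_x = st.pstdev(xs)
sigma_y = st.pstdev(ys)
cov_xy = sum(x * y for x, y in zip(xs, ys)) / n - st.mean(xs) * st.mean(ys)
rho = cov_xy / (sigma_x * sigma_y)
b_xy = rho * sigma_x / sigma_y  # regression coefficient of X on Y
b_yx = rho * sigma_y / sigma_x  # regression coefficient of Y on X
# The sigmas cancel in the product, leaving rho squared.
assert abs(b_xy * b_yx - rho ** 2) < 1e-12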