Consider the statistical set-up from the previous problem. In particular, recall that $\mathbf{u} = \frac{1}{\sqrt{5}} (1,2)^T$ and
$$\mathbf{X}_1 = \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \quad \mathbf{X}_2 = \begin{pmatrix} 3 \\ 4 \end{pmatrix}, \quad \mathbf{X}_3 = \begin{pmatrix} -1 \\ 0 \end{pmatrix}.$$
Observe that for $i = 1,2,3$, the number $\mathbf{u}\cdot \mathbf{X}_i$ (where $\mathbf{u}$ is a unit vector) gives the signed distance from the origin to the endpoint of the projection $\text{proj}_{\mathbf{u}} \mathbf{X}_i$. By signed distance, we mean that $\left| \mathbf{u}\cdot \mathbf{X}_i \right|$ is the length of $\text{proj}_{\mathbf{u}} \mathbf{X}_i$ and
$$\begin{aligned}
\mathbf{u}\cdot \mathbf{X}_i > 0 &\Longrightarrow \mathbf{X}_i \text{ points approximately in the direction of } \mathbf{u}, \\
\mathbf{u}\cdot \mathbf{X}_i < 0 &\Longrightarrow \mathbf{X}_i \text{ points approximately in the opposite direction of } \mathbf{u}.
\end{aligned}$$
Compute the empirical variance of the data set
$$\mathbf{u}\cdot \mathbf{X}_1, \quad \mathbf{u}\cdot \mathbf{X}_2, \quad \mathbf{u}\cdot \mathbf{X}_3.$$
Let $\mathbb{X}$ denote the matrix whose $i$-th row is $\mathbf{X}_i^T$.
Recall that $S = \frac{1}{3} \mathbb{X}^T \left(I_3 - \frac{1}{3} \mathbf{1} \mathbf{1}^T\right) \mathbb{X}$ denotes the empirical covariance matrix of our data set.
What is $\mathbf{u}^T S \mathbf{u}$?
(You are encouraged to use computational software.)
Are your answers from part 1 and part 2 of this question the same?
1 answer
First, let's write down the data matrix X, whose rows are the observations:
X = [1 2; 3 4; -1 0]
Next, let 1 denote the all-ones column vector:
1 = [1; 1; 1]
Now, let's compute the centering matrix A:
A = I3 - (1/3) * 1 * 1'
A = [2/3 -1/3 -1/3; -1/3 2/3 -1/3; -1/3 -1/3 2/3]
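As a rough illustration of what A does, the following NumPy sketch builds the same matrix and applies it to X. The variable names (`ones`, `centered`) are chosen here for illustration and are not part of the original problem. Multiplying by A should subtract the column means from each row of X.

```python
import numpy as np

# Data matrix: one observation per row.
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [-1.0, 0.0]])
n = X.shape[0]

# All-ones column vector and centering matrix A = I - (1/n) 1 1^T.
ones = np.ones((n, 1))
A = np.eye(n) - (ones @ ones.T) / n

# A @ X subtracts the column means (1, 2) from every row of X.
centered = A @ X
print(centered)
```

Because A is symmetric and satisfies A @ A = A, the product X' A X equals (A X)' (A X), which makes S easy to compute by hand from the centered rows.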
Finally, let's compute the empirical covariance matrix S:
S = (1/3) * X' * A * X
The column means of X are (1, 2), so the centered data is
A * X = [0 0; 2 2; -2 -2]
Since A is symmetric and A * A = A, we have X' * A * X = (A * X)' * (A * X) = [8 8; 8 8], so
S = [8/3 8/3; 8/3 8/3]
Now, let's compute u^T S u, where u is the column vector
u = (1/sqrt(5)) * [1; 2]
S u = (1/sqrt(5)) * [8/3 + 16/3; 8/3 + 16/3] = (1/sqrt(5)) * [8; 8]
u^T S u = (1/5) * (1*8 + 2*8)
u^T S u = 24/5
Therefore, the value of u^T S u is 24/5 = 4.8.
For part 1, the projected values are u . X_1 = 5/sqrt(5), u . X_2 = 11/sqrt(5), and u . X_3 = -1/sqrt(5). Their mean is 5/sqrt(5), so the empirical variance (using the same 1/3 normalization as S) is (1/3) * (0 + 36/5 + 36/5) = 24/5.
The two answers are the same. This is no coincidence: u^T S u is exactly the empirical variance of the data projected onto the direction u.
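Since the problem encourages computational software, here is a minimal NumPy sketch that checks both parts numerically. It assumes the 1/n normalization used in the definition of S; the names `proj`, `var_proj`, `H`, and `quad_form` are illustrative only.

```python
import numpy as np

# Data matrix (rows are X_1, X_2, X_3) and the unit vector u.
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [-1.0, 0.0]])
u = np.array([1.0, 2.0]) / np.sqrt(5.0)
n = X.shape[0]

# Part 1: empirical variance of the projections u . X_i.
# ndarray.var() uses ddof=0 by default, i.e. it divides by n.
proj = X @ u
var_proj = proj.var()

# Part 2: S = (1/n) X^T (I - (1/n) 1 1^T) X, then u^T S u.
H = np.eye(n) - np.ones((n, n)) / n
S = X.T @ H @ X / n
quad_form = u @ S @ u

print("variance of projections:", var_proj)
print("u^T S u:", quad_form)
print("equal:", np.isclose(var_proj, quad_form))
```

Both quantities should agree up to floating-point rounding, since u^T S u is the variance of the centered data along u.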