Consider the statistical set-up from the previous problem. In particular, recall that \mathbf{u}= \frac{1}{\sqrt{5}} (1,2)^ T and

\displaystyle \mathrm{{\boldsymbol X}}_1 \, =\, \begin{pmatrix} 1\\ 2\end{pmatrix},\, \mathrm{{\boldsymbol X}}_2 \, = \, \begin{pmatrix} 3\\ 4\end{pmatrix},\, \mathrm{{\boldsymbol X}}_3 \, =\, \begin{pmatrix} -1 \\ 0\end{pmatrix}.
Observe that for i = 1,2,3, the number \mathbf{u}\cdot \mathrm{{\boldsymbol X}}_ i (where \mathbf{u} is a unit vector) gives the signed distance from the origin to the endpoint of the projection \text {proj}_{\mathbf{u}} \mathrm{{\boldsymbol X}}_ i. By signed distance, we mean that \left| \mathbf{u}\cdot \mathrm{{\boldsymbol X}}_ i \right| is the length of \text {proj}_{\mathbf{u}} \mathrm{{\boldsymbol X}}_ i and

\displaystyle \mathbf{u}\cdot \mathrm{{\boldsymbol X}}_ i > 0 \displaystyle \Longrightarrow \mathrm{{\boldsymbol X}}_ i \, \, \text {points approximately in the direction of } \, \mathbf{u}
\displaystyle \mathbf{u}\cdot \mathrm{{\boldsymbol X}}_ i < 0 \displaystyle \Longrightarrow \mathrm{{\boldsymbol X}}_ i \, \, \text {points approximately in the opposite direction of } \, \mathbf{u}\,
Compute the empirical variance of the data set

\mathbf{u}\cdot \mathrm{{\boldsymbol X}}_1,\, \mathbf{u}\cdot \mathrm{{\boldsymbol X}}_2,\, \mathbf{u}\cdot \mathrm{{\boldsymbol X}}_3.

Let \mathbb {X} denote the matrix whose i-th row is \mathrm{{\boldsymbol X}}_ i^ T.

Recall that S = \frac{1}{3} \mathbb {X}^ T (I_3 - \frac{1}{3} \mathbf{1} \mathbf{1}^ T) \mathbb {X} denotes the empirical covariance matrix of our data set.

What is \mathbf{u}^ T S \mathbf{u}?
(You are encouraged to use computational software.)

Are your answers from part 1 and part 2 of this question the same?

1 answer

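Part 1 does not require the covariance matrix: the empirical variance of the projections can be computed directly.

u · X1 = (1/sqrt(5)) * (1*1 + 2*2) = 5/sqrt(5) = sqrt(5)

u · X2 = (1/sqrt(5)) * (1*3 + 2*4) = 11/sqrt(5)

u · X3 = (1/sqrt(5)) * (1*(-1) + 2*0) = -1/sqrt(5)

Their mean is (5 + 11 - 1)/(3*sqrt(5)) = sqrt(5), so the deviations from the mean are 0, 6/sqrt(5), and -6/sqrt(5). With the same 1/3 normalization that appears in S, the empirical variance is

(1/3) * (0 + 36/5 + 36/5) = 24/5 = 4.8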
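The projections and their variance can also be checked numerically. The following is a minimal sketch in plain Python, assuming the 1/n (not 1/(n-1)) normalization that matches the definition of S in the problem; the names `points`, `projections`, and `variance` are illustrative and not part of the original problem.

```python
import math

# Data points and direction vector from the problem statement.
points = [(1.0, 2.0), (3.0, 4.0), (-1.0, 0.0)]
norm = math.sqrt(1.0**2 + 2.0**2)
u = (1.0 / norm, 2.0 / norm)  # unit vector (1, 2) / sqrt(5)

# Signed distances u . X_i along the direction u.
projections = [u[0] * x + u[1] * y for x, y in points]

# Empirical variance with the 1/n normalization (assumption: matches S).
n = len(projections)
mean = sum(projections) / n
variance = sum((p - mean) ** 2 for p in projections) / n

print(projections)  # approximately [2.236, 4.919, -0.447]
print(variance)     # approximately 4.8
```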

For part 2, start with the data matrix X, whose i-th row is X_i^T:

X = [1 2; 3 4; -1 0]

Next, let 1 denote the column vector of ones:

1 = [1; 1; 1]

The centering matrix A is then:

A = I3 - (1/3) * 1 * 1'

A = [2/3 -1/3 -1/3; -1/3 2/3 -1/3; -1/3 -1/3 2/3]
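To see what the matrix A does, it can help to apply it to X: multiplying by A subtracts the column means from every row. The sketch below uses exact fractions from the Python standard library; the helper `matmul` is a hypothetical convenience function written for this illustration.

```python
from fractions import Fraction

X = [[1, 2], [3, 4], [-1, 0]]
n = len(X)

# A = I_n - (1/n) * 1 * 1^T, built entry by entry with exact fractions.
A = [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
     for i in range(n)]

def matmul(P, Q):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]

# Each row of A*X is the corresponding data point minus the mean (1, 2).
centered = matmul(A, X)
print(centered)  # rows equal to (0, 0), (2, 2), (-2, -2)
```

Because A is symmetric and satisfies A*A = A, the product X' * A * X equals the product of the centered matrix's transpose with the centered matrix itself.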

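Finally, compute the empirical covariance matrix S. Multiplying by A subtracts the column means (1 and 2) from each row of X, so A*X = [0 0; 2 2; -2 -2] and

S = (1/3) * X' * A * X

S = [8/3 8/3; 8/3 8/3]

Now compute u^T S u, with u written as a column vector. Note that u appears twice, so the factor 1/sqrt(5) appears twice and contributes 1/5:

u = (1/sqrt(5)) * [1; 2]

u^T S u = (1/5) * [1 2] * [8/3 8/3; 8/3 8/3] * [1; 2]

u^T S u = (1/5) * [8 8] * [1; 2]

u^T S u = (1/5) * (8 + 16)

u^T S u = 24/5 = 4.8

Therefore, the value of u^T S u is 24/5.

Yes, the answers to part 1 and part 2 are the same. This is not a coincidence: for any vector u, u^T S u = (1/3) * sum_i (u · X_i - u · Xbar)^2, where Xbar is the sample mean of the X_i, and this is exactly the empirical variance of the projections u · X_1, u · X_2, u · X_3. In other words, u^T S u measures how much the data set varies along the direction of u.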
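The comparison between part 1 and part 2 can be checked exactly. The sketch below is one way to do so with the Python standard library. It assumes the 1/n normalization from the definition of S, and it works with the integer vector v = (1, 2) and divides by |v|^2 = 5 at the end, which avoids floating-point square roots. All variable names are illustrative.

```python
from fractions import Fraction

X = [[1, 2], [3, 4], [-1, 0]]
n = len(X)
v = [1, 2]                         # u = v / sqrt(5)
v_norm_sq = sum(c * c for c in v)  # |v|^2 = 5

# Column means and centered data.
means = [Fraction(sum(row[j] for row in X), n) for j in range(2)]
centered = [[row[j] - means[j] for j in range(2)] for row in X]

# Empirical covariance S = (1/n) * Xc^T Xc, equal to (1/n) X^T A X.
S = [[sum(r[a] * r[b] for r in centered) / n for b in range(2)]
     for a in range(2)]

# Part 2: u^T S u = (v^T S v) / |v|^2.
quad_form = sum(v[a] * S[a][b] * v[b]
                for a in range(2) for b in range(2)) / v_norm_sq

# Part 1: variance of u . X_i = (variance of v . X_i) / |v|^2.
proj = [sum(v[j] * row[j] for j in range(2)) for row in X]
proj_mean = Fraction(sum(proj), n)
proj_var = sum((p - proj_mean) ** 2 for p in proj) / n / v_norm_sq

print(S)          # every entry equals 8/3
print(quad_form)  # 24/5
print(proj_var)   # 24/5
```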