The least-squares estimator is a vector \hat{\boldsymbol{\beta}} that achieves

\min_{\boldsymbol{\beta} \in \mathbb{R}^p} \| \mathbf{Y} - \mathbb{X} \boldsymbol{\beta} \|_2^2,

where \| v \|_2 denotes the Euclidean norm.
Let n be the number of samples and let each \mathbf{X}_i be p-dimensional. (For example, n might be the number of patients, and p-1 the number of covariates under study, e.g. height, weight, age, and blood pressure as in the previous problem.)
Recall that by the same technique of computing the gradient (with respect to the components of \boldsymbol{\beta}) and setting it equal to zero, we can show that \hat{\boldsymbol{\beta}} must satisfy the score equation (also known as the normal equations)

\mathbb{X}^T \mathbb{X} \hat{\boldsymbol{\beta}} = \mathbb{X}^T \mathbf{Y}.
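Before turning to the question, here is a minimal NumPy sketch (all data, dimensions, and variable names below are made up for illustration) that builds a design matrix, solves the score equation directly, and checks the result against NumPy's built-in least-squares routine:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 4  # 50 samples, 4 coefficients (intercept plus 3 covariates)

# Design matrix: a column of ones (intercept) plus p-1 random covariates.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([1.0, 2.0, -0.5, 0.3])
Y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Solve the score equation X^T X beta_hat = X^T Y directly.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Compare with NumPy's least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
assert np.allclose(beta_hat, beta_lstsq)
print(beta_hat)
```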
We would like to isolate \hat{\boldsymbol{\beta}} by multiplying by (\mathbb{X}^T \mathbb{X})^{-1} from the left. Which of the following conditions, each on its own, guarantees that \mathbb{X}^T \mathbb{X} is invertible? Choose all that apply.
Hint: Use the fact from linear algebra that \mathrm{rank}(A^T A) = \mathrm{rank}(A). What are the dimensions of \mathbb{X}^T \mathbb{X}?
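The rank fact in the hint is easy to sanity-check numerically; a small sketch with an arbitrary matrix (of my choosing, not from the problem):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(6, 3))
A[:, 2] = A[:, 0] + A[:, 1]            # force a linear dependence: rank drops to 2

# rank(A^T A) equals rank(A); note A^T A is 3 x 3 while A is 6 x 3.
print(np.linalg.matrix_rank(A))        # 2
print(np.linalg.matrix_rank(A.T @ A))  # 2
```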
1) There are at least as many observations as covariates (i.e., n \geq p).
2) There are at least as many covariates as observations (i.e., n \leq p).
3) \mathbb{X} has rank n, where n is the number of samples.
4) \mathbb{X} has rank p, where p is the number of covariates.
5) There are at least p distinct samples among the \mathbf{X}_i, so that \mathbb{X} has at least p distinct rows.
6) There are at least p distinct values among the Y_i.
Answer: only one of the conditions suffices on its own, namely 4): \mathbb{X} has rank p, where p is the number of covariates.

By the hint, \mathrm{rank}(\mathbb{X}^T \mathbb{X}) = \mathrm{rank}(\mathbb{X}), and \mathbb{X}^T \mathbb{X} is a p \times p matrix, so it is invertible precisely when \mathrm{rank}(\mathbb{X}) = p. The condition n \geq p is necessary, since \mathrm{rank}(\mathbb{X}) \leq \min(n, p), but it is not sufficient on its own: the columns of \mathbb{X} may be linearly dependent no matter how many observations there are. Similarly, p distinct rows do not force rank p (distinct points can still lie in a common lower-dimensional subspace), and the values of \mathbf{Y} have no bearing on \mathbb{X}^T \mathbb{X} at all.
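To see concretely why n \geq p alone is not enough, here is a hypothetical example (dimensions and data of my choosing) in which n > p but two covariates are perfectly collinear, so \mathbb{X}^T \mathbb{X} is rank-deficient:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 100, 3
X = rng.normal(size=(n, p))
X[:, 2] = 2 * X[:, 1]                  # perfect collinearity: rank(X) = 2 < p

G = X.T @ X                            # the p x p Gram matrix X^T X
print(np.linalg.matrix_rank(G))        # 2, so G is singular despite n >= p
print(np.linalg.eigvalsh(G)[0])        # smallest eigenvalue is numerically zero
```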