The least-squares estimator is a vector \hat{\boldsymbol{\beta}} that achieves

\min_{\boldsymbol{\beta} \in \mathbb{R}^p} \| \mathbf{Y} - \mathbb{X} \boldsymbol{\beta} \|_2^2,

where \| v \|_2 denotes the Euclidean norm.
Let n be the number of samples and let each \mathbf{X}_i be p-dimensional. (For example, n might be the number of patients, and p-1 the number of covariates under study, e.g. height, weight, age, and blood pressure as in the previous problem.)
Recall that by the same technique of computing the gradient (with respect to the components of \boldsymbol{\beta}) and setting it equal to zero, we can show that \hat{\boldsymbol{\beta}} must satisfy the score equation (also known as the normal equations)

\mathbb{X}^T \mathbb{X} \hat{\boldsymbol{\beta}} = \mathbb{X}^T \mathbf{Y}.
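Before turning to the question, here is a minimal NumPy sketch (all data, dimensions, and variable names below are made up for illustration) that builds a design matrix, solves the score equation directly, and checks the result against NumPy's built-in least-squares routine:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 4  # 50 samples, 4 coefficients (intercept plus 3 covariates)

# Design matrix: a column of ones (intercept) plus p-1 random covariates.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([1.0, 2.0, -0.5, 0.3])
Y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Solve the score equation X^T X beta_hat = X^T Y directly.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Compare with NumPy's least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
assert np.allclose(beta_hat, beta_lstsq)
print(beta_hat)
```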
We would like to isolate \hat{\boldsymbol{\beta}} by multiplying by (\mathbb{X}^T \mathbb{X})^{-1} from the left. Which of the following conditions, each on its own, guarantees that \mathbb{X}^T \mathbb{X} is invertible? Choose all that apply.
Hint: Use the fact from linear algebra that \mathrm{rank}(A^T A) = \mathrm{rank}(A). What are the dimensions of \mathbb{X}^T \mathbb{X}?
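The rank fact in the hint is easy to sanity-check numerically; a small sketch with an arbitrary matrix (of my choosing, not from the problem):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(6, 3))
A[:, 2] = A[:, 0] + A[:, 1]            # force a linear dependence: rank drops to 2

# rank(A^T A) equals rank(A); note A^T A is 3 x 3 while A is 6 x 3.
print(np.linalg.matrix_rank(A))        # 2
print(np.linalg.matrix_rank(A.T @ A))  # 2
```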
1) There are at least as many observations as covariates (i.e., n \geq p).
2) There are at least as many covariates as observations (i.e., n \leq p).
3) \mathbb{X} has rank n, where n is the number of samples.
4) \mathbb{X} has rank p, where p is the number of covariates.
5) There are at least p distinct samples among the \mathbf{X}_i, so that \mathbb{X} has at least p distinct rows.
6) There are at least p distinct values among the Y_i.
Answer: only one of the conditions suffices on its own, namely 4): \mathbb{X} has rank p, where p is the number of covariates.

By the hint, \mathrm{rank}(\mathbb{X}^T \mathbb{X}) = \mathrm{rank}(\mathbb{X}), and \mathbb{X}^T \mathbb{X} is a p \times p matrix, so it is invertible precisely when \mathrm{rank}(\mathbb{X}) = p. The condition n \geq p is necessary, since \mathrm{rank}(\mathbb{X}) \leq \min(n, p), but it is not sufficient on its own: the columns of \mathbb{X} may be linearly dependent no matter how many observations there are. Similarly, p distinct rows do not force rank p (distinct points can still lie in a common lower-dimensional subspace), and the values of \mathbf{Y} have no bearing on \mathbb{X}^T \mathbb{X} at all.
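To see concretely why n \geq p alone is not enough, here is a hypothetical example (dimensions and data of my choosing) in which n > p but two covariates are perfectly collinear, so \mathbb{X}^T \mathbb{X} is rank-deficient:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 100, 3
X = rng.normal(size=(n, p))
X[:, 2] = 2 * X[:, 1]                  # perfect collinearity: rank(X) = 2 < p

G = X.T @ X                            # the p x p Gram matrix X^T X
print(np.linalg.matrix_rank(G))        # 2, so G is singular despite n >= p
print(np.linalg.eigvalsh(G)[0])        # smallest eigenvalue is numerically zero
```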