Question

An aerospace manufacturing company would like to assess the performance of its existing planes for its latest design. Based on a sample of n = 1000 flights, each made with an identically designed plane, it collects data of the form (x_{1},y_{1}), \ldots , (x_{1000},y_{1000}), where x represents the distance traveled (in km) and y represents liters of fuel consumed.

You, as a statistician hired by the company, decide to perform linear regression on the model y = a + bx to predict the efficiency of the design. In the context of linear regression, recall that the mathematical model calls for:

\mathbf Y= \left( \begin{array}{c} y_1 \\ \vdots \\ y_{1000} \end{array} \right) \in \mathbb {R}^{1000}, \quad {\boldsymbol \varepsilon }\in \mathbb {R}^{1000}, \quad \mathbb {X} = \left( \begin{array}{cc} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_{1000} \end{array} \right) \in \mathbb {R}^{1000 \times 2}, \quad {\boldsymbol \beta }= \left( \begin{array}{c} a \\ b \end{array} \right) \in \mathbb {R}^2.

Assume that {\boldsymbol \varepsilon }\sim \mathcal{N}(0, \sigma ^2 I_{1000}) for some fixed \sigma ^2, so that \mathbf Y\sim \mathcal{N}(\mathbb {X} {\boldsymbol \beta }, \sigma ^2 I_{1000}).
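To make this distributional assumption concrete, here is a minimal Python sketch that simulates data from the model. It draws independent N(0, \sigma^2) errors, builds the rows (1, x_i) of \mathbb {X}, and stacks the y_i into \mathbf Y. The following are hypothetical choices for illustration, not the company's data: the true values a = 0.8, b = 15.0 and sigma = 5.0, the distance range of 100 to 2000 km, the seed, and the name simulate_flights.

```python
import random

def simulate_flights(n, a, b, sigma, x_min=100.0, x_max=2000.0, seed=0):
    """Draw n pairs (x_i, y_i) from y = a + b*x + eps with eps ~ N(0, sigma^2).

    Hypothetical illustration: the parameter values, distance range and seed
    are assumptions, and the output is synthetic, not real flight data.
    """
    rng = random.Random(seed)
    xs = [rng.uniform(x_min, x_max) for _ in range(n)]
    # Independent N(0, sigma^2) errors, i.e. epsilon ~ N(0, sigma^2 I_n)
    ys = [a + b * x + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys

xs, ys = simulate_flights(n=1000, a=0.8, b=15.0, sigma=5.0)
# Each row of the design matrix X is (1, x_i); Y stacks the y_i.
X = [[1.0, x] for x in xs]
Y = ys
```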

Using the setup above, you compute the least-squares estimator (LSE), which comes out to

\hat{{\boldsymbol \beta }} = \left( \begin{array}{c} \hat{a} \\ \hat{b} \end{array} \right) = \left( \begin{array}{c} 0.8 \text { liters} \\ 15.0 \text { liters / km} \end{array} \right).

Using only \hat{{\boldsymbol \beta }}, what is a reasonable prediction for the total amount of fuel (in liters) that a plane consumes over 200 kilometers?
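With only the fitted coefficients, the natural prediction is the fitted line evaluated at x = 200 km:

\hat{y} = \hat{a} + \hat{b} \cdot 200 = 0.8 + 15.0 \cdot 200 = 3000.8 liters.

A minimal Python sketch of that plug-in follows. The helper name predict_fuel is hypothetical.

```python
def predict_fuel(a_hat, b_hat, distance_km):
    """Evaluate the fitted line y = a_hat + b_hat * x at a distance in km (result in liters)."""
    return a_hat + b_hat * distance_km

# a_hat = 0.8 liters and b_hat = 15.0 liters/km, taken from the LSE above
prediction = predict_fuel(0.8, 15.0, 200.0)
print(prediction)  # 3000.8
```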

Using the same setup as the previous problem:

\hat{{\boldsymbol \beta }} = \left( \begin{array}{c} \hat{a} \\ \hat{b} \end{array} \right) = \left( \begin{array}{c} 0.8 \text { liters} \\ 15.0 \text { liters / km} \end{array} \right).

Using the n = 1000 samples and thinking of \mathbf Y as the vector of observations, we might also consider the Maximum Likelihood Estimator {\boldsymbol \beta }_{MLE}. As a reminder, {\boldsymbol \beta }_{MLE} maximizes, over all choices of {\boldsymbol \beta }, the likelihood (or the log-likelihood) of \mathbf Y\sim \mathcal{N}(\mathbb {X} {\boldsymbol \beta }, \sigma ^2 I_{1000}).

Numerically, {\boldsymbol \beta }_{MLE} = \left( \begin{array}{c} a_{MLE} \\ b_{MLE} \end{array} \right), where:

a_{MLE} =
b_{MLE} =

Answers

GPT 3.5
To find $\beta_{MLE}$, we need to maximize the likelihood (or the log-likelihood) of $Y \sim \mathcal{N}(X\beta, \sigma^2 I_{1000})$. Since $Y$ is multivariate normal, its log-likelihood is given below, up to the additive constant $-\frac{1000}{2}\log(2\pi)$:

$$\ell(\beta) = -\frac{1}{2} \log(|\sigma^2 I_{1000}|) - \frac{1}{2}(Y - X\beta)^T(\sigma^2 I_{1000})^{-1}(Y - X\beta)$$
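As a sanity check on this formula, here is a minimal Python sketch of the full log-likelihood, including the $2\pi$ constant, for a design matrix $X$ with rows $(1, x_i)$. It uses $\log|\sigma^2 I_n| = n\log\sigma^2$ and $(\sigma^2 I_n)^{-1} = I_n/\sigma^2$. Only the residual sum of squares depends on $\beta$, which is why maximizing $\ell$ is equivalent to minimizing $\|Y - X\beta\|^2$. The function name and the toy data are hypothetical.

```python
import math

def log_likelihood(beta, xs, ys, sigma2):
    """Gaussian log-likelihood of Y ~ N(X beta, sigma2 * I_n) for X with rows (1, x_i).

    Includes the -n/2 * log(2*pi) constant; it does not change the maximizer.
    """
    a, b = beta
    n = len(ys)
    # Residual sum of squares ||Y - X beta||^2: the only beta-dependent part
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # -1/2 log|sigma2 I_n| = -n/2 log(sigma2); the quadratic form is rss / sigma2
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * n * math.log(sigma2) - rss / (2 * sigma2)

# Hypothetical toy data lying exactly on the line y = 0.8 + 15 x
xs = [100.0, 200.0, 300.0]
ys = [1500.8, 3000.8, 4500.8]
print(log_likelihood((0.8, 15.0), xs, ys, sigma2=4.0) > log_likelihood((0.8, 15.1), xs, ys, sigma2=4.0))  # True
```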

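Since $\ell(\beta)$ is a concave quadratic function of $\beta$ (its Hessian is $-\frac{1}{\sigma^2}X^TX$), any point where its gradient vanishes is a maximizer. So we take the derivative with respect to $\beta$ and set it equal to zero: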

$$\frac{\partial \ell(\beta)}{\partial \beta} = \frac{\partial}{\partial \beta}\left(-\frac{1}{2} \log(|\sigma^2 I_{1000}|) - \frac{1}{2}(Y - X\beta)^T(\sigma^2 I_{1000})^{-1}(Y - X\beta)\right)$$

The derivative of the first term, $\frac{\partial}{\partial \beta}\left(-\frac{1}{2} \log(|\sigma^2 I_{1000}|)\right)$, is zero because that term does not depend on $\beta$.

For the second term, note that $(\sigma^2 I_{1000})^{-1} = \frac{1}{\sigma^2} I_{1000}$ does not depend on $\beta$. Only the quadratic form needs to be differentiated, using $\frac{\partial}{\partial \beta}(Y - X\beta)^T(Y - X\beta) = -2X^T(Y - X\beta)$:

$$\frac{\partial}{\partial \beta}\left(- \frac{1}{2}(Y - X\beta)^T(\sigma^2 I_{1000})^{-1}(Y - X\beta)\right) = -\frac{1}{2\sigma^2} \frac{\partial}{\partial \beta}\left((Y - X\beta)^T(Y - X\beta)\right)$$
$$= -\frac{1}{2\sigma^2}\left(-2X^T(Y - X\beta)\right)$$
$$= \frac{1}{\sigma^2}X^T(Y - X\beta)$$

Setting this derivative equal to zero, we have:

$$\frac{1}{\sigma^2}X^T(Y - X\beta) = 0$$
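The gradient formula $\frac{\partial \ell}{\partial \beta} = \frac{1}{\sigma^2}X^T(Y - X\beta)$ can be checked numerically with central finite differences. The sketch below is in Python. The function names, the small data set, and the values of $\beta$ and $\sigma^2$ are all hypothetical.

```python
def gradient(beta, xs, ys, sigma2):
    """Analytic gradient (1/sigma2) * X^T (Y - X beta) for X with rows (1, x_i)."""
    a, b = beta
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    # X^T r has entries sum(r_i) (intercept column) and sum(x_i * r_i) (slope column)
    return (sum(residuals) / sigma2,
            sum(x * r for x, r in zip(xs, residuals)) / sigma2)

def beta_part(beta, xs, ys, sigma2):
    """The beta-dependent part of the log-likelihood: -||Y - X beta||^2 / (2 sigma2)."""
    a, b = beta
    return -sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (2 * sigma2)

def numeric_gradient(beta, xs, ys, sigma2, h=1e-5):
    """Central finite differences of beta_part in each coordinate of beta."""
    grads = []
    for k in range(2):
        up = list(beta)
        up[k] += h
        down = list(beta)
        down[k] -= h
        grads.append((beta_part(up, xs, ys, sigma2) - beta_part(down, xs, ys, sigma2)) / (2 * h))
    return tuple(grads)

# Hypothetical toy data and evaluation point
xs = [1.0, 2.0, 3.0, 4.0]
ys = [16.1, 30.5, 46.2, 60.7]
beta, sigma2 = (0.5, 14.0), 2.0
print(gradient(beta, xs, ys, sigma2))
print(numeric_gradient(beta, xs, ys, sigma2))
```

The $\beta$-dependent part of $\ell$ is quadratic, so central differences match the analytic gradient up to rounding error.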

Since $\sigma^2 > 0$, the scalar factor can be dropped:

$$X^TY - X^TX\beta = 0$$

Rearranging gives the normal equations:

$$X^TX\beta = X^TY$$

Since $X^TX$ is invertible whenever the $x_i$ are not all equal, solving for $\beta$ gives:

$$\beta_{MLE} = (X^TX)^{-1}X^TY$$
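For this model, $X$ has the two columns $(1, \dots, 1)^T$ and $(x_1, \dots, x_{1000})^T$. The normal equations therefore reduce to a $2 \times 2$ system with

$$X^TX = \left(\begin{array}{cc} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{array}\right), \quad X^TY = \left(\begin{array}{c} \sum y_i \\ \sum x_i y_i \end{array}\right).$$

Here is a minimal Python sketch that solves this system with Cramer's rule. The function name and the test data are hypothetical.

```python
def least_squares_line(xs, ys):
    """Solve X^T X beta = X^T Y for X with rows (1, x_i), returning (a_hat, b_hat).

    X^T X = [[n, Sx], [Sx, Sxx]] and X^T Y = [Sy, Sxy]; the 2x2 system is solved
    by Cramer's rule. Requires det(X^T X) = n*Sxx - Sx**2 != 0 (x_i not all equal).
    """
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("X^T X is singular: all x_i are equal")
    a_hat = (sxx * sy - sx * sxy) / det
    b_hat = (n * sxy - sx * sy) / det
    return a_hat, b_hat

# Hypothetical noise-free data on the line y = 0.8 + 15 x
xs = [100.0, 250.0, 400.0, 800.0]
ys = [0.8 + 15.0 * x for x in xs]
a_hat, b_hat = least_squares_line(xs, ys)
print(a_hat, b_hat)  # close to 0.8 and 15.0, up to rounding
```

For large or badly scaled $x_i$, centering them before forming the sums is numerically safer. The direct form is kept here to mirror $X^TX\beta = X^TY$.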

This is exactly the least-squares estimator. Under the Gaussian model, maximizing the likelihood over $\beta$ amounts to minimizing $\|Y - X\beta\|^2$, and the maximizer does not depend on $\sigma^2$. No further computation with the data is needed: the MLE coincides with the LSE $\hat{\beta}$ given in the problem,

$$\beta_{MLE} = \left(\begin{array}{c} a_{MLE} \\ b_{MLE} \end{array}\right) = \hat{\beta} = \left(\begin{array}{c} 0.8 \\ 15.0 \end{array}\right)$$

Therefore, $a_{MLE} = 0.8$ liters and $b_{MLE} = 15.0$ liters/km.
