We continue with the LR-test on the HIP study.
Let Y_T and Y_C be the numbers of breast cancer deaths in the treatment and control groups, respectively. Assuming they are independent, the probability of observing y_T breast cancer deaths in the treatment group and y_C in the control group is the product
\mathbf{P}(Y_T=y_T, Y_C=y_C) = \mathbf{P}(Y_T=y_T)\,\mathbf{P}(Y_C=y_C).
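Recall the HIP mammography study data: 31,000 participants in each group, with 39 breast cancer deaths in the treatment group and 63 in the control group.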
We use the binomial model for Y_ T and Y_ C:
Y_T\sim \text{Binom}(31000, \pi_T)
Y_C\sim \text{Binom}(31000, \pi_C)
The likelihood ratio test statistic is
\begin{aligned}
\Lambda(y_T, y_C) &= -2\log \frac{\max_{\Theta_0} \mathbf{P}(y_T,y_C;\pi_T,\pi_C)}{\max_{\Theta_A} \mathbf{P}(y_T,y_C;\pi_T,\pi_C)} \\
&= -2\log \frac{\max_{\pi_T=\pi_C=\pi\in[0,1]} \mathbf{P}(y_T,y_C;\pi)}{\max_{\pi_T\neq\pi_C} \mathbf{P}(y_T,y_C;\pi_T,\pi_C)} \\
&= -2\log \frac{\max_{\pi_T=\pi_C=\pi\in[0,1]} \mathbf{P}\left(\text{Binom}(31000,\pi)=y_T\right)\mathbf{P}\left(\text{Binom}(31000,\pi)=y_C\right)}{\max_{\pi_T\neq\pi_C} \mathbf{P}\left(\text{Binom}(31000,\pi_T)=y_T\right)\mathbf{P}\left(\text{Binom}(31000,\pi_C)=y_C\right)} \\
&= -2\log \frac{\mathbf{P}\left(\text{Binom}(31000,{\color{blue}{\hat{\pi}^{\text{MLE}}}})=y_T\right)\mathbf{P}\left(\text{Binom}(31000,{\color{blue}{\hat{\pi}^{\text{MLE}}}})=y_C\right)}{\mathbf{P}\left(\text{Binom}(31000,{\color{blue}{\hat{\pi}^{\text{MLE}}_T}})=y_T\right)\mathbf{P}\left(\text{Binom}(31000,{\color{blue}{\hat{\pi}^{\text{MLE}}_C}})=y_C\right)}
\end{aligned}
where we have used \displaystyle \mathbf{P}\left(\text {Binom}(n,p)=y\right) to denote the probability that a binomial variable with parameters n,p takes value y.
Based on the observed data, find the parameters (\pi_T,\pi_C) that maximize the numerator and the denominator in the definition of the test statistic \Lambda. That is, find the three maximum likelihood estimates (in blue) in the expression above.
The value \pi that maximizes \mathbf{P}(\text {Binom}(31000,\pi ) = 39)\mathbf{P}(\text {Binom}(31000,\pi ) = 63):
{\color{blue}{\hat{\pi }^{\text {MLE}}}} =\quad
The value of \pi _ T that maximizes \mathbf{P}(\text {Binom}(31000,\pi _ T) = 39):
{\color{blue}{\hat{\pi }^{\text {MLE}}_ T}} =\quad
The value of \pi _ C that maximizes \mathbf{P}(\text {Binom}(31000,\pi _ C) = 63):
{\color{blue}{\hat{\pi }^{\text {MLE}}_ C}} =\quad
What is the value of the test statistic \Lambda based on the observed data? (Enter the value to three decimal places.)
1 answer
For the numerator, the value of \pi that maximizes \mathbf{P}(\text{Binom}(31000,\pi) = 39)\mathbf{P}(\text{Binom}(31000,\pi) = 63) is the MLE for the overall breast cancer death rate. We can calculate it using the MLE formula for the binomial distribution:
{\color{blue}{\hat{\pi }^{\text {MLE}}}} = \frac{y_T + y_C}{31000 + 31000}
Substituting the values y_T = 39 and y_C = 63, we have:
{\color{blue}{\hat{\pi }^{\text {MLE}}}} = \frac{39 + 63}{31000 + 31000} = \frac{102}{62000} \approx 0.00164516
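The pooled formula follows from the usual calculus argument applied to the product of the two binomial likelihoods. A short derivation sketch, writing n = 31000 for the common group size (a symbol introduced here only for brevity):

```latex
\begin{aligned}
\mathbf{P}(\text{Binom}(n,\pi)=y_T)\,\mathbf{P}(\text{Binom}(n,\pi)=y_C)
  &= \binom{n}{y_T}\binom{n}{y_C}\,\pi^{y_T+y_C}(1-\pi)^{2n-y_T-y_C}, \\
\frac{d}{d\pi}\Big[(y_T+y_C)\log\pi + (2n-y_T-y_C)\log(1-\pi)\Big]
  &= \frac{y_T+y_C}{\pi}-\frac{2n-y_T-y_C}{1-\pi} = 0
  \;\Longrightarrow\; \hat{\pi}^{\text{MLE}} = \frac{y_T+y_C}{2n}.
\end{aligned}
```

The binomial coefficients do not depend on \pi, so they drop out when differentiating the log-likelihood.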
For the denominator, we want to find the values of \pi_T and \pi_C that maximize the probabilities \mathbf{P}(\text{Binom}(31000,\pi_T) = 39) and \mathbf{P}(\text{Binom}(31000,\pi_C) = 63) respectively.
To find the MLE for \pi_T, write the likelihood as a function of \pi_T:
\mathbf{P}(\text{Binom}(31000,\pi_T) = 39) = \binom{31000}{39} \pi_T^{39} (1-\pi_T)^{31000-39}
Taking the logarithm, differentiating with respect to \pi_T, and setting the derivative to zero gives:
\frac{39}{\pi_T} - \frac{31000-39}{1-\pi_T} = 0
Solving for \pi_T:
39(1-\pi_T) = (31000-39)\pi_T \quad\Longrightarrow\quad \pi_T = \frac{39}{31000}
This equation has a closed-form solution, the sample proportion, so no numerical method is needed: the MLE for \pi_T is 39/31000 \approx 0.00125806.
The same argument applies to \pi_C: maximizing \mathbf{P}(\text{Binom}(31000,\pi_C) = 63) gives the sample proportion, so the MLE for \pi_C is 63/31000 \approx 0.00203226.
Therefore, the 3 different maximum likelihood estimates for the parameters (\pi_T, \pi_C) are:
{\color{blue}{\hat{\pi }^{\text {MLE}}}} = 0.00164516
{\color{blue}{\hat{\pi }^{\text {MLE}}_T}} = 0.00125806
{\color{blue}{\hat{\pi }^{\text {MLE}}_C}} = 0.00203226
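As a quick check of these three estimates, here is a minimal Python sketch. The variable names and the helper `log_lik` are illustrative choices, not part of the original problem; the sanity check only confirms that moving \pi_T slightly away from 39/31000 does not increase the log-likelihood.

```python
import math

n = 31000            # participants per group
y_t, y_c = 39, 63    # breast cancer deaths: treatment, control

# Closed-form binomial MLEs (sample proportions)
pi_hat = (y_t + y_c) / (2 * n)   # pooled estimate under pi_T = pi_C
pi_hat_t = y_t / n               # treatment group
pi_hat_c = y_c / n               # control group

def log_lik(p, y, n):
    """Binomial log-likelihood, omitting the constant log C(n, y)."""
    return y * math.log(p) + (n - y) * math.log1p(-p)

# Sanity check: nudging pi_T by 1% in either direction should not help
for factor in (0.99, 1.01):
    assert log_lik(pi_hat_t * factor, y_t, n) <= log_lik(pi_hat_t, y_t, n)

print(f"{pi_hat:.8f} {pi_hat_t:.8f} {pi_hat_c:.8f}")
```

The printed values should agree with the three estimates listed above to eight decimal places.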
To find the value of the test statistic \Lambda based on the observed data, we substitute these values into the formula:
\Lambda(y_T, y_C) = -2\log \frac{\mathbf{P}(\text{Binom}(31000,\hat{\pi }^{\text {MLE}}) = y_T)\mathbf{P}(\text{Binom}(31000,\hat{\pi }^{\text {MLE}}) = y_C) } {\mathbf{P}(\text{Binom}(31000,\hat{\pi }^{\text {MLE}}_T) = y_T)\mathbf{P}(\text{Binom}(31000,\hat{\pi }^{\text {MLE}}_C) = y_C)}
Substituting the observed values y_T = 39, y_C = 63, and the MLEs, we can calculate the test statistic:
\Lambda(39, 63) = -2\log \frac{\mathbf{P}(\text{Binom}(31000,0.00164516) = 39)\mathbf{P}(\text{Binom}(31000,0.00164516) = 63) } {\mathbf{P}(\text{Binom}(31000,0.00125806) = 39)\mathbf{P}(\text{Binom}(31000,0.00203226) = 63)}
Evaluating this expression gives the value of the test statistic based on the observed data: \Lambda(39, 63) \approx 5.710.
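The expression for \Lambda(39, 63) can be evaluated numerically. The following is a sketch using only the Python standard library; `log_binom_pmf` is a hypothetical helper written for this example, and it works on the log scale because the individual binomial probabilities involve very large binomial coefficients.

```python
import math

def log_binom_pmf(y, n, p):
    """Log of P(Binom(n, p) = y), using lgamma for the binomial coefficient."""
    log_coeff = math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
    return log_coeff + y * math.log(p) + (n - y) * math.log1p(-p)

n = 31000
y_t, y_c = 39, 63

pi_pooled = (y_t + y_c) / (2 * n)   # MLE under the null (pi_T = pi_C)
pi_t = y_t / n                      # MLE for the treatment group
pi_c = y_c / n                      # MLE for the control group

# Log of the numerator and denominator of the likelihood ratio
log_num = log_binom_pmf(y_t, n, pi_pooled) + log_binom_pmf(y_c, n, pi_pooled)
log_den = log_binom_pmf(y_t, n, pi_t) + log_binom_pmf(y_c, n, pi_c)

lam = -2 * (log_num - log_den)
print(round(lam, 3))
```

The binomial coefficients cancel between numerator and denominator, so dropping `log_coeff` entirely would give the same statistic; it is kept here so the helper matches the probabilities written in the formula.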