We consider a one-dimensional logistic regression problem, i.e., we assume that data \, X_i \in \mathbb{R}, i = 1, \dots, n \, is given and that we get independent observations of
Y_i | X_i \sim \textsf{Ber}\left( \frac{e^{\beta X_i}}{1 + e^{\beta X_i}} \right),
where \, \beta \in \mathbb{R} \,.
Moreover, recall that the associated log likelihood for \, \beta \, is then given by
\ell(\beta) = \sum_{i=1}^{n} \left( Y_i X_i \beta - \ln(1 + \exp(X_i \beta)) \right).
Calculate the first and second derivative of \, \ell \,. Instructions: The summation \sum_{i=1}^{n} is already placed to the left of the answer box. Enter the summands in terms of \beta, X_i (enter “X_i”) and Y_i (enter “Y_i”).
\displaystyle \ell '(\beta ) = \sum _{i=1}^{n}
X_i*Y_i-((X_i*e^(X_i*beta))/(1+e^(X_i*beta)))
correct
\displaystyle \ell ^{\prime \prime }(\beta ) = \sum _{i=1}^{n}
-(((X_i)^2)*e^(X_i*beta))/((1+e^(X_i*beta))^2)
correct
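As a quick sanity check of these two derivatives, here is a short numerical sketch comparing the analytic \ell'(\beta) to a central finite difference. The data values below are made up purely for illustration; they are not part of the problem.

```python
import math

# Hypothetical sample data, chosen only to exercise the formulas.
X = [0.5, -1.2, 2.0]
Y = [1, 0, 1]

def ell(beta):
    # log-likelihood: sum of Y_i X_i beta - ln(1 + exp(X_i beta))
    return sum(y * x * beta - math.log(1 + math.exp(x * beta)) for x, y in zip(X, Y))

def ell_prime(beta):
    # first derivative: sum of X_i Y_i - X_i e^{X_i beta} / (1 + e^{X_i beta})
    return sum(x * y - x * math.exp(x * beta) / (1 + math.exp(x * beta)) for x, y in zip(X, Y))

def ell_double_prime(beta):
    # second derivative: -sum of X_i^2 e^{X_i beta} / (1 + e^{X_i beta})^2
    return -sum(x**2 * math.exp(x * beta) / (1 + math.exp(x * beta))**2 for x in X)

# Central finite difference of ell at an arbitrary test point.
h = 1e-6
beta = 0.3
fd = (ell(beta + h) - ell(beta - h)) / (2 * h)
print(abs(fd - ell_prime(beta)) < 1e-6)  # expect True: analytic derivative matches
print(ell_double_prime(beta) < 0)        # expect True: second derivative is negative
```

Note that \ell''(\beta) is negative whenever some X_i \neq 0, which is exactly why \ell' is strictly decreasing.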
What can you conclude about \, \ell '(\beta ) \,?
\, \ell ' \, is neither increasing nor decreasing on the whole of \, \mathbb {R} \,.
\, \ell ' \, is strictly decreasing.
\, \ell ' \, is strictly increasing.
correct
(b)
3/3 points (graded)
Imagine we are given the following data (n=2):
\displaystyle X_1 = 0, \qquad Y_1 = 0
\displaystyle X_2 = 1, \qquad Y_2 = 1
In order to give the maximum likelihood estimator, we want to solve
\ell '(\beta ) = 0
for the given data.
First, we rewrite this as
\ell '(\beta ) = f(\beta ) + g,
where
f(\beta) = -\sum_{i=1}^{n} X_i \frac{1}{1 + e^{-X_i \beta}},
and g is some appropriate value.
What is the range of \, f(\beta ) \,?
\mathbb {R}
\mathbb {R}_{< 0} = \{ r \in \mathbb {R}: r < 0\}
(-1,0), the unit open interval
\{ -1,0\}, the set containing two values, -1 and 0
correct
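A quick numerical illustration of this range: with the data above, only the i = 2 term contributes, so f(\beta) = -\sigma(\beta) where \sigma is the logistic sigmoid. The values approach 0 and -1 in the limits but never attain them:

```python
import math

def sigmoid(t):
    return 1 / (1 + math.exp(-t))

# With (X_1, X_2) = (0, 1), only i = 2 contributes: f(beta) = -sigmoid(beta).
def f(beta):
    X = [0, 1]
    return -sum(x * sigmoid(x * beta) for x in X)

print(f(-30))  # very close to 0, but still negative
print(f(30))   # very close to -1, but still greater than -1
```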
What is g?
1
correct
What can you conclude about the solution \, \beta \,?
\, \beta = 1 \,.
\, \beta = 0 \,.
There is no \, \beta \, that solves \, \ell '(\beta ) = 0 \,.
All \, \beta \in \mathbb {R} \, solve \, \ell '(\beta ) = 0 \,.
correct
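With this data the conclusion can be seen directly: \ell'(\beta) = g + f(\beta) = 1 - \sigma(\beta), which is strictly positive for every \beta, so the likelihood keeps increasing as \beta \to \infty and no finite maximizer exists. A minimal sketch (plain Python, nothing beyond the formulas above):

```python
import math

def sigmoid(t):
    return 1 / (1 + math.exp(-t))

# Data from part (b): only the i = 2 term contributes, so
# ell'(beta) = sum_i (X_i Y_i - X_i sigmoid(X_i beta)) = 1 - sigmoid(beta).
def ell_prime(beta):
    return 1 - sigmoid(beta)

for b in [-10.0, 0.0, 10.0, 30.0]:
    print(b, ell_prime(b))  # strictly positive for every beta: no root exists
```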
(c)
5 points possible (graded)
The problem you encountered in part (b) is called separation. It occurs when the \, Y_i \, can be perfectly recovered by a linear classifier, i.e., when there is a \, \beta \, such that
\displaystyle X_ i \beta > 0 \implies {} \displaystyle Y_ i = 1,
\displaystyle X_ i \beta < 0 \implies {} \displaystyle Y_ i = 0.
In order to avoid this behavior, one option is to use a prior on \, \beta \,. Let us investigate what happens if we assume that \, \beta \, is drawn from a \, N(0, 1) \, distribution, i.e.,
P(\beta , Y | X) = P(\beta ) \prod _{i=1}^{n} P(Y_ i | X_ i, \beta )
What is the joint log likelihood \, \widetilde{\ell }(\beta ) \, of this Bayesian model? Again, for simplicity, let's plug in (X_1,Y_1) = (0,0) and (X_2,Y_2) = (1,1). (Try to work out the general formula on your own. It will also be provided in the solution.)
\, \widetilde{\ell }(\beta ) =\quad \,
unanswered
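For reference, one possible derivation (the official formula is provided in the solution): taking logarithms of the Bayesian model and using the N(0,1) prior density gives

\widetilde{\ell}(\beta) = \ln P(\beta) + \sum_{i=1}^{n} \ln P(Y_i | X_i, \beta) = -\frac{\beta^2}{2} - \frac{1}{2}\ln(2\pi) + \sum_{i=1}^{n} \left( Y_i X_i \beta - \ln(1 + \exp(X_i \beta)) \right).

Plugging in (X_1, Y_1) = (0, 0) and (X_2, Y_2) = (1, 1), the i = 1 term contributes only -\ln 2, so

\widetilde{\ell}(\beta) = -\frac{\beta^2}{2} + \beta - \ln(1 + e^{\beta}) - \ln 2 - \frac{1}{2}\ln(2\pi).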
Now, we want to find the maximum a posteriori probability estimate, which is obtained by finding \, \beta \, such that \, \widetilde{\ell}'(\beta ) = 0 \,. To this end, calculate the first and second derivative \, \widetilde{\ell}'(\beta ) \, and \, \widetilde{\ell}''(\beta ) \,.
\displaystyle \widetilde\ell '(\beta )=\quad
unanswered
\displaystyle \widetilde\ell ^{\prime \prime }(\beta )=\quad
unanswered
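A sketch of the derivatives (the solution gives the official expressions): since \widetilde{\ell}(\beta) = -\beta^2/2 + \ell(\beta) + \text{const}, differentiating term by term gives

\widetilde{\ell}'(\beta) = -\beta + \sum_{i=1}^{n} \left( X_i Y_i - \frac{X_i e^{X_i \beta}}{1 + e^{X_i \beta}} \right), \qquad \widetilde{\ell}''(\beta) = -1 - \sum_{i=1}^{n} \frac{X_i^2 e^{X_i \beta}}{(1 + e^{X_i \beta})^2}.

In particular \widetilde{\ell}''(\beta) \le -1 < 0 for every \beta, regardless of the data.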
What can you conclude about \, \widetilde{\ell }'(\beta ) \,?
\, \widetilde{\ell }' \, is neither increasing nor decreasing on the whole of \, \mathbb {R} \,.
\, \widetilde{\ell }' \, is strictly decreasing.
\, \widetilde{\ell }' \, is strictly increasing.
unanswered
Given the same data as in (b), what can you say about the existence of a solution?
Applying the same arguments as in (b), we see that there is no optimal \, \beta \,.
Modifying the definition of \, f \, in (b) accordingly, we see that \, f \, now ranges over all of \, \mathbb {R} \,, hence there is a solution.
unanswered
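Following the second line of reasoning, the MAP estimate can also be computed numerically. A minimal sketch using Newton's method on the derivative for the data from (b) under the N(0,1) prior, where \widetilde{\ell}'(\beta) = -\beta + 1 - \sigma(\beta):

```python
import math

def sigmoid(t):
    return 1 / (1 + math.exp(-t))

# Posterior log-likelihood derivatives for (X, Y) = ((0, 1), (0, 1)):
# ell~'(beta)  = -beta + 1 - sigmoid(beta)
# ell~''(beta) = -1 - sigmoid(beta) * (1 - sigmoid(beta))  (always <= -1)
def lp(beta):
    return -beta + 1 - sigmoid(beta)

def lpp(beta):
    s = sigmoid(beta)
    return -1 - s * (1 - s)

# Newton's method: since ell~'' is bounded away from zero, the iteration is safe.
beta = 0.0
for _ in range(20):
    beta -= lp(beta) / lpp(beta)

print(beta)          # a finite MAP estimate now exists
print(abs(lp(beta))) # the derivative vanishes here, unlike in part (b)
```

In contrast to part (b), the -\beta term from the prior drags \widetilde{\ell}' down to -\infty as \beta \to \infty, so a finite root always exists.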