We consider a one-dimensional logistic regression problem, i.e., we assume that data \, X_i \in \mathbb{R}, i = 1, \dots, n \, is given and that we get independent observations of
Y_i | X_i \sim \textsf{Ber}\left( \frac{e^{\beta X_i}}{1 + e^{\beta X_i}} \right),
where \, \beta \in \mathbb{R} \,.
Moreover, recall that the associated log likelihood for \, \beta \, is then given by
\ell(\beta) = \sum_{i=1}^{n} \left( Y_i X_i \beta - \ln(1 + \exp(X_i \beta)) \right).
Calculate the first and second derivative of \, \ell \,. Instructions: The summation \sum_{i=1}^{n} is already placed to the left of the answer box. Enter the summands in terms of \beta, X_i (enter “X_i”) and Y_i (enter “Y_i”).
\displaystyle \ell '(\beta ) = \sum _{i=1}^{n}
X_i*Y_i-((X_i*e^(X_i*beta))/(1+e^(X_i*beta)))
correct
\displaystyle \ell ^{\prime \prime }(\beta ) = \sum _{i=1}^{n}
-(((X_i)^2)*e^(X_i*beta))/((1+e^(X_i*beta))^2)
correct
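As a quick sanity check of these two derivatives, here is a short numerical sketch comparing the analytic \ell'(\beta) to a central finite difference. The data values below are made up purely for illustration; they are not part of the problem.

```python
import math

# Hypothetical sample data, chosen only to exercise the formulas.
X = [0.5, -1.2, 2.0]
Y = [1, 0, 1]

def ell(beta):
    # log-likelihood: sum of Y_i X_i beta - ln(1 + exp(X_i beta))
    return sum(y * x * beta - math.log(1 + math.exp(x * beta)) for x, y in zip(X, Y))

def ell_prime(beta):
    # first derivative: sum of X_i Y_i - X_i e^{X_i beta} / (1 + e^{X_i beta})
    return sum(x * y - x * math.exp(x * beta) / (1 + math.exp(x * beta)) for x, y in zip(X, Y))

def ell_double_prime(beta):
    # second derivative: -sum of X_i^2 e^{X_i beta} / (1 + e^{X_i beta})^2
    return -sum(x**2 * math.exp(x * beta) / (1 + math.exp(x * beta))**2 for x in X)

# Central finite difference of ell at an arbitrary test point.
h = 1e-6
beta = 0.3
fd = (ell(beta + h) - ell(beta - h)) / (2 * h)
print(abs(fd - ell_prime(beta)) < 1e-6)  # expect True: analytic derivative matches
print(ell_double_prime(beta) < 0)        # expect True: second derivative is negative
```

Note that \ell''(\beta) is negative whenever some X_i \neq 0, which is exactly why \ell' is strictly decreasing.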
What can you conclude about \, \ell '(\beta ) \,?
\, \ell ' \, is neither increasing nor decreasing on the whole of \, \mathbb {R} \,.
\, \ell ' \, is strictly decreasing.
\, \ell ' \, is strictly increasing.
correct
(b)
3/3 points (graded)
Imagine we are given the following data (n=2):
\displaystyle X_1 = 0, \qquad Y_1 = 0
\displaystyle X_2 = 1, \qquad Y_2 = 1
In order to give the maximum likelihood estimator, we want to solve
\ell '(\beta ) = 0
for the given data.
First, we rewrite this as
\ell '(\beta ) = f(\beta ) + g,
where
f(\beta) = -\sum_{i=1}^{n} X_i \frac{1}{1 + e^{-X_i \beta}},
and g is some appropriate value.
What is the range of \, f(\beta ) \,?
\mathbb {R}
\mathbb {R}_{< 0} = \{ r \in \mathbb {R}: r < 0\}
(-1,0), the unit open interval
\{ -1,0\}, the set containing two values, -1 and 0
correct
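A quick numerical illustration of this range: with the data above, only the i = 2 term contributes, so f(\beta) = -\sigma(\beta) where \sigma is the logistic sigmoid. The values approach 0 and -1 in the limits but never attain them:

```python
import math

def sigmoid(t):
    return 1 / (1 + math.exp(-t))

# With (X_1, X_2) = (0, 1), only i = 2 contributes: f(beta) = -sigmoid(beta).
def f(beta):
    X = [0, 1]
    return -sum(x * sigmoid(x * beta) for x in X)

print(f(-30))  # very close to 0, but still negative
print(f(30))   # very close to -1, but still greater than -1
```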
What is g?
1
correct
What can you conclude about the solution \, \beta \,?
\, \beta = 1 \,.
\, \beta = 0 \,.
There is no \, \beta \, that solves \, \ell '(\beta ) = 0 \,.
All \, \beta \in \mathbb {R} \, solve \, \ell '(\beta ) = 0 \,.
correct
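With this data the conclusion can be seen directly: \ell'(\beta) = g + f(\beta) = 1 - \sigma(\beta), which is strictly positive for every \beta, so the likelihood keeps increasing as \beta \to \infty and no finite maximizer exists. A minimal sketch (plain Python, nothing beyond the formulas above):

```python
import math

def sigmoid(t):
    return 1 / (1 + math.exp(-t))

# Data from part (b): only the i = 2 term contributes, so
# ell'(beta) = sum_i (X_i Y_i - X_i sigmoid(X_i beta)) = 1 - sigmoid(beta).
def ell_prime(beta):
    return 1 - sigmoid(beta)

for b in [-10.0, 0.0, 10.0, 30.0]:
    print(b, ell_prime(b))  # strictly positive for every beta: no root exists
```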
(c)
5 points possible (graded)
The problem you encountered in part (b) is called separation. It occurs when the \, Y_i \, can be perfectly recovered by a linear classifier, i.e., when there is a \, \beta \, such that
\displaystyle X_ i \beta > 0 \implies {} \displaystyle Y_ i = 1,
\displaystyle X_ i \beta < 0 \implies {} \displaystyle Y_ i = 0.
In order to avoid this behavior, one option is to use a prior on \, \beta \,. Let us investigate what happens if we assume that \, \beta \, is drawn from a \, N(0, 1) \, distribution, i.e.,
P(\beta , Y | X) = P(\beta ) \prod _{i=1}^{n} P(Y_ i | X_ i, \beta )
What is the joint log likelihood \, \widetilde{\ell }(\beta ) \, of this Bayesian model? Again, for simplicity, let's plug in (X_1,Y_1) = (0,0) and (X_2,Y_2) = (1,1). (Try to work out the general formula on your own. It will also be provided in the solution.)
\, \widetilde{\ell }(\beta ) =\quad \,
unanswered
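For reference, one possible derivation (the official formula is provided in the solution): taking logarithms of the Bayesian model and using the N(0,1) prior density gives

\widetilde{\ell}(\beta) = \ln P(\beta) + \sum_{i=1}^{n} \ln P(Y_i | X_i, \beta) = -\frac{\beta^2}{2} - \frac{1}{2}\ln(2\pi) + \sum_{i=1}^{n} \left( Y_i X_i \beta - \ln(1 + \exp(X_i \beta)) \right).

Plugging in (X_1, Y_1) = (0, 0) and (X_2, Y_2) = (1, 1), the i = 1 term contributes only -\ln 2, so

\widetilde{\ell}(\beta) = -\frac{\beta^2}{2} + \beta - \ln(1 + e^{\beta}) - \ln 2 - \frac{1}{2}\ln(2\pi).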
Now, we want to find the maximum a posteriori probability estimate, which is obtained by finding \, \beta \, such that \, \widetilde{\ell}'(\beta ) = 0 \,. To this end, calculate the first and second derivative \, \widetilde{\ell}'(\beta ) \, and \, \widetilde{\ell}''(\beta ) \,.
\displaystyle \widetilde\ell '(\beta )=\quad
unanswered
\displaystyle \widetilde\ell ^{\prime \prime }(\beta )=\quad
unanswered
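A sketch of the derivatives (the solution gives the official expressions): since \widetilde{\ell}(\beta) = -\beta^2/2 + \ell(\beta) + \text{const}, differentiating term by term gives

\widetilde{\ell}'(\beta) = -\beta + \sum_{i=1}^{n} \left( X_i Y_i - \frac{X_i e^{X_i \beta}}{1 + e^{X_i \beta}} \right), \qquad \widetilde{\ell}''(\beta) = -1 - \sum_{i=1}^{n} \frac{X_i^2 e^{X_i \beta}}{(1 + e^{X_i \beta})^2}.

In particular \widetilde{\ell}''(\beta) \le -1 < 0 for every \beta, regardless of the data.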
What can you conclude about \, \widetilde{\ell }'(\beta ) \,?
\, \widetilde{\ell }' \, is neither increasing nor decreasing on the whole of \, \mathbb {R} \,.
\, \widetilde{\ell }' \, is strictly decreasing.
\, \widetilde{\ell }' \, is strictly increasing.
unanswered
Given the same data as in (b), what can you say about the existence of a solution?
Applying the same arguments as in (b), we see that there is no optimal \, \beta \,.
Modifying the definition of \, f \, in (b) accordingly, we see that \, f \, now ranges over all of \, \mathbb {R} \,, hence there is a solution.
unanswered
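Following the second line of reasoning, the MAP estimate can also be computed numerically. A minimal sketch using Newton's method on the derivative for the data from (b) under the N(0,1) prior, where \widetilde{\ell}'(\beta) = -\beta + 1 - \sigma(\beta):

```python
import math

def sigmoid(t):
    return 1 / (1 + math.exp(-t))

# Posterior log-likelihood derivatives for (X, Y) = ((0, 1), (0, 1)):
# ell~'(beta)  = -beta + 1 - sigmoid(beta)
# ell~''(beta) = -1 - sigmoid(beta) * (1 - sigmoid(beta))  (always <= -1)
def lp(beta):
    return -beta + 1 - sigmoid(beta)

def lpp(beta):
    s = sigmoid(beta)
    return -1 - s * (1 - s)

# Newton's method: since ell~'' is bounded away from zero, the iteration is safe.
beta = 0.0
for _ in range(20):
    beta -= lp(beta) / lpp(beta)

print(beta)          # a finite MAP estimate now exists
print(abs(lp(beta))) # the derivative vanishes here, unlike in part (b)
```

In contrast to part (b), the -\beta term from the prior drags \widetilde{\ell}' down to -\infty as \beta \to \infty, so a finite root always exists.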