Consider the following mixture of two Gaussians:

\[ p(x;\theta) = \pi_1 N(x;\mu_1,\sigma_1^2) + \pi_2 N(x;\mu_2,\sigma_2^2) \]
This mixture has parameters \( \theta = \{\pi_1, \pi_2, \mu_1, \mu_2, \sigma_1^2, \sigma_2^2\} \). They correspond to the mixing proportions, means, and variances of each Gaussian. We initialize \( \theta \) as \( \theta^0 = \{0.5, 0.5, 6, 7, 1, 4\} \).

We have a dataset \( D \) with the following samples of \( x \): \( x^{(0)} = -1 \), \( x^{(1)} = 0 \), \( x^{(2)} = 4 \), \( x^{(3)} = 5 \), \( x^{(4)} = 6 \).

We want to set our parameters \( \theta \) such that the data log-likelihood \( l(D;\theta) \) is maximized:

\[ \arg\max_\theta \; \sum_{i=0}^{4} \log p(x^{(i)};\theta). \]
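
Substituting the mixture density into this objective, the quantity we evaluate (and later maximize) for this five-point dataset is

\[ l(D;\theta) = \sum_{i=0}^{4} \log \Big[ \pi_1 N(x^{(i)};\mu_1,\sigma_1^2) + \pi_2 N(x^{(i)};\mu_2,\sigma_2^2) \Big]. \]
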
Recall that we can do this with the EM algorithm. The algorithm optimizes a lower bound on the log-likelihood, thus iteratively pushing the data likelihood upwards. The iterative algorithm is specified by two steps applied successively:

E-step: infer the component assignments from the current setting \( \theta^0 = \theta \) (complete the data)

\[ p(y=k \mid x^{(i)}) := p(y=k \mid x^{(i)}; \theta^0), \quad \text{for } k = 1,2 \text{ and } i = 0,\dots,4. \]
M-step: maximize the expected log-likelihood

\[ \tilde{l}(D;\theta) := \sum_i \sum_k p(y=k \mid x^{(i)}) \log \frac{p(x^{(i)}, y=k; \theta)}{p(y=k \mid x^{(i)})} \]
with respect to \( \theta \) while keeping \( p(y=k \mid x^{(i)}) \) fixed.
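
To make the two steps concrete, here is a minimal sketch of one EM iteration for this two-component mixture. The helper name gaussian and the function em_iteration are illustrative choices (not part of the problem statement), and the M-step uses the standard closed-form Gaussian-mixture updates: average responsibilities for the proportions, responsibility-weighted means, and responsibility-weighted variances.

import numpy as np

# Gaussian density N(x; mu, sigma^2); mirrors the helper in the script further below
def gaussian(x, mu, sigma_squared):
    return (1 / np.sqrt(2 * np.pi * sigma_squared)) * np.exp(-((x - mu) ** 2) / (2 * sigma_squared))

def em_iteration(data, pi, mu, sigma_squared):
    # E-step: responsibilities p(y = k | x^(i); theta^0), shape (2, n)
    weighted = np.array([pi[k] * gaussian(data, mu[k], sigma_squared[k]) for k in range(2)])
    posteriors = weighted / weighted.sum(axis=0)
    # M-step: closed-form updates that maximize the expected log-likelihood
    n_k = posteriors.sum(axis=1)                 # effective number of points per component
    new_pi = n_k / len(data)                     # mixing proportions
    new_mu = (posteriors @ data) / n_k           # weighted means
    new_sigma_squared = np.array(
        [np.sum(posteriors[k] * (data - new_mu[k]) ** 2) / n_k[k] for k in range(2)]
    )                                            # weighted variances
    return new_pi, new_mu, new_sigma_squared

# One iteration starting from theta^0 = {0.5, 0.5, 6, 7, 1, 4}
data = np.array([-1, 0, 4, 5, 6], dtype=np.float64)
print(em_iteration(data, np.array([0.5, 0.5]), np.array([6.0, 7.0]), np.array([1.0, 4.0])))
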

To see why this optimizes a lower bound, consider the following inequality:

\[
\begin{aligned}
\log p(x;\theta) &= \log \sum_y p(x,y;\theta) \\
&= \log \sum_y q(y \mid x) \frac{p(x,y;\theta)}{q(y \mid x)} \\
&= \log \mathbb{E}_{y \sim q(y \mid x)}\!\left[ \frac{p(x,y;\theta)}{q(y \mid x)} \right] \\
&\ge \mathbb{E}_{y \sim q(y \mid x)}\!\left[ \log \frac{p(x,y;\theta)}{q(y \mid x)} \right] \\
&= \sum_y q(y \mid x) \log \frac{p(x,y;\theta)}{q(y \mid x)}
\end{aligned}
\]
where the inequality follows from Jensen's inequality (the logarithm is concave). EM makes this bound tight at the current setting of \( \theta \) by choosing \( q(y \mid x) \) to be \( p(y \mid x; \theta^0) \).
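
The bound can also be checked numerically. The sketch below, for the single point \( x = 5 \) under \( \theta^0 \), shows that setting \( q(y \mid x) \) to the posterior makes the bound equal \( \log p(x;\theta^0) \) exactly, while any other \( q \) gives a strictly smaller value; the gaussian helper mirrors the one in the script further below.

import numpy as np

# Gaussian density N(x; mu, sigma^2)
def gaussian(x, mu, sigma_squared):
    return (1 / np.sqrt(2 * np.pi * sigma_squared)) * np.exp(-((x - mu) ** 2) / (2 * sigma_squared))

# Joint probabilities p(x, y = k; theta^0) for the single point x = 5
x = 5.0
joint = np.array([0.5 * gaussian(x, 6.0, 1.0), 0.5 * gaussian(x, 7.0, 4.0)])
log_px = np.log(joint.sum())              # exact log p(x; theta^0)

def lower_bound(q):
    # sum_y q(y|x) * log( p(x, y; theta^0) / q(y|x) )
    return np.sum(q * np.log(joint / q))

posterior = joint / joint.sum()           # q(y|x) = p(y|x; theta^0)
print(log_px, lower_bound(posterior))     # identical values: the bound is tight
print(lower_bound(np.array([0.9, 0.1])))  # any other q gives a smaller value
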

Note: If you have taken 6.431x Probability–The Science of Uncertainty, you may want to review the video in Unit 8: Limit Theorems and Classical Statistics, Additional Theoretical Material, 2. Jensen's Inequality.

Likelihood Function
What is the log-likelihood of the data \( l(D;\theta) \) given the initial setting of \( \theta \)? Please round to the nearest tenth.

Note: You will want to write a script to calculate this, using the natural log (np.log) and np.float64 data types.


To calculate the log-likelihood of the data \( l(D;\theta) \) given the initial setting of the parameters \( \theta^0 = \{0.5, 0.5, 6, 7, 1, 4\} \) (where \( \pi_1 = 0.5 \), \( \pi_2 = 0.5 \), \( \mu_1 = 6 \), \( \mu_2 = 7 \), \( \sigma_1^2 = 1 \), and \( \sigma_2^2 = 4 \)), we can follow these steps:

  1. Define the Gaussian function: The probability density function for a Gaussian distribution is given by:

    \[ N(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x - \mu)^2}{2\sigma^2}} \]

  2. Calculate the mixture probability: The mixture probability for the given parameters is:

    \[ p(x; \theta) = \pi_1 \cdot N(x; \mu_1, \sigma_1^2) + \pi_2 \cdot N(x; \mu_2, \sigma_2^2) \]

  3. Compute the log-likelihood: The log-likelihood is the sum of the log of this mixture probability over all data points \( D = \{x^{(0)}, x^{(1)}, x^{(2)}, x^{(3)}, x^{(4)}\} \); a worked evaluation for a single point is shown after the data list below.

The dataset given is:

  • \( x^{(0)} = -1 \)
  • \( x^{(1)} = 0 \)
  • \( x^{(2)} = 4 \)
  • \( x^{(3)} = 5 \)
  • \( x^{(4)} = 6 \)
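
As a concrete check of steps 1–3, consider the single point \( x^{(4)} = 6 \):

\[ p(6;\theta^0) = 0.5 \cdot N(6; 6, 1) + 0.5 \cdot N(6; 7, 4) = 0.5 \cdot \frac{1}{\sqrt{2\pi}} + 0.5 \cdot \frac{1}{\sqrt{8\pi}} e^{-1/8} \approx 0.5 \cdot 0.3989 + 0.5 \cdot 0.1760 \approx 0.2875, \]

so \( \log p(6;\theta^0) \approx -1.25 \). The script below repeats this for every data point and sums the logs.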

Implementation in Python

Below is a Python script that calculates the log-likelihood based on the given parameters and the data points.

import numpy as np

# Parameters
pi1 = 0.5
pi2 = 0.5
mu1 = 6
mu2 = 7
sigma1_squared = 1
sigma2_squared = 4

# Dataset
data = np.array([-1, 0, 4, 5, 6], dtype=np.float64)

# Gaussian function
def gaussian(x, mu, sigma_squared):
    return (1 / np.sqrt(2 * np.pi * sigma_squared)) * np.exp(-((x - mu) ** 2) / (2 * sigma_squared))

# Calculate the log-likelihood
log_likelihood = 0
for x in data:
    p1 = gaussian(x, mu1, sigma1_squared)
    p2 = gaussian(x, mu2, sigma2_squared)
    mixture_prob = pi1 * p1 + pi2 * p2
    log_likelihood += np.log(mixture_prob)

# Round result to nearest tenth
log_likelihood_rounded = round(log_likelihood, 1)
print(log_likelihood_rounded)

Output

Running the script in a Python environment prints the log-likelihood rounded to the nearest tenth. With the parameters and data above it evaluates to approximately -24.5, which is the value to enter as the answer.
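
For larger datasets or more extreme parameter settings, multiplying very small densities can underflow even in float64, so it is common to work entirely in log space with a log-sum-exp. The sketch below is an optional alternative that assumes SciPy is available (the problem only requires NumPy); it should print the same value as the script above.

import numpy as np
from scipy.stats import norm
from scipy.special import logsumexp

# Parameters and data as before
pis = np.array([0.5, 0.5])
mus = np.array([6.0, 7.0])
sigmas = np.sqrt(np.array([1.0, 4.0]))   # norm takes standard deviations, not variances
data = np.array([-1, 0, 4, 5, 6], dtype=np.float64)

# log p(x^(i), y = k) for every component k and point i, shape (2, 5)
log_joint = np.log(pis)[:, None] + norm.logpdf(data[None, :], loc=mus[:, None], scale=sigmas[:, None])

# log p(x^(i)) via log-sum-exp over components, then sum over the points
log_likelihood = logsumexp(log_joint, axis=0).sum()
print(round(log_likelihood, 1))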