THE EM ALGORITHM

Rahul Saini July 07, 2021

The EM algorithm can be used even for variables whose value is never directly observed, provided the general form of the probability distribution governing these variables is known.

Estimating Means of k Gaussian's

Consider a problem in which the data D is a set of instances generated by a probability distribution that is a mixture of k distinct Normal distributions.

This problem setting is illustrated in Figure for the case where k = 2 and where the instances are the points shown along the x axis.
Each instance is generated using a two-step process.

First, one of the k Normal distributions is selected at random.
Second, a single random instance xi is generated according to this selected distribution.

This process is repeated to generate a set of data points as shown in the figure.
To simplify, consider the special case

The selection of the single Normal distribution at each step is based on choosing each with uniform probability
Each of the k Normal distributions has the same variance σ², known value.

The learning task is to output a hypothesis h = (μ1 , . . . ,μk) that describes the means of each of the k distributions.
We would like to find a maximum likelihood hypothesis for these means; that is, a hypothesis h that maximizes p(D |h).

Our problem here, however, involves a mixture of k different Normal distributions, and we cannot observe which instances were generated by which distribution.
Consider full description of each instance as the triple (xi, zi1, zi2),

where xi is the observed value of the ith instance and
where zi1 and zi2 indicate which of the two Normal distributions was used to generate the value xi

In particular, zij has the value 1 if xi was created by the j^th Normal distribution and 0 otherwise.
Here xi is the observed variable in the description of the instance, and zil and zi2 are hidden variables.
If the values of zil and zi2 were observed, we could use following Equation to solve for the means p1 and p2
Because they are not, we will instead use the EM algorithm

EM algorithm

Post a Comment

0 Comments