\begin{displaymath}
\mbox{Pr}\left[ {\bf x}, \rho \,\vert\, \lambda, {\bf c} \right] \propto \exp\left( -\lambda \sum_i \sum_j \rho_j^{(i)} \vert {\bf x}_i - {\bf c}_j \vert^2 \right)
\end{displaymath}
Since the exponent is quadratic in ${\bf x}_i$, the above probability density is equivalent to a product of Gaussian radial basis functions. The vectors ${\bf c}_j$ are the basis function centres and $\lambda$ is an inverse variance parameter. The multiplicities $\rho_j^{(i)}$ are missing data. The EM algorithm (Dempster, Laird & Rubin, 1977) gives a way to maximise likelihoods with missing data by alternating two steps. The E-step computes the expected multiplicities given the current centres,
\begin{displaymath}
\left\langle \rho_j^{(i)} \right\rangle = \frac{\exp\left( -\lambda \vert {\bf x}_i - {\bf c}_j \vert^2 \right)}{\sum_k \exp\left( -\lambda \vert {\bf x}_i - {\bf c}_k \vert^2 \right)} ,
\end{displaymath}
and the M-step re-estimates each centre as the expectation-weighted mean of the data,
\begin{displaymath}
{\bf c}_j = \frac{\sum_i \langle \rho_j^{(i)} \rangle \, {\bf x}_i}{\sum_i \langle \rho_j^{(i)} \rangle} .
\end{displaymath}
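As a concrete illustration, here is a minimal NumPy sketch of these two updates under the model above; the names (`soft_kmeans`, `lam`, `resp`) are illustrative choices, not from the original text.

```python
import numpy as np

def soft_kmeans(x, c, lam, n_iter=50):
    """One possible implementation of the E- and M-steps above.
    x: (N, D) data points; c: (K, D) initial centres;
    lam: the inverse-variance parameter lambda."""
    for _ in range(n_iter):
        # E-step: expected multiplicities (responsibilities),
        # proportional to exp(-lam * |x_i - c_j|^2) for each point i.
        sq_dist = ((x[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)  # (N, K)
        log_r = -lam * sq_dist
        log_r -= log_r.max(axis=1, keepdims=True)   # guard against underflow
        resp = np.exp(log_r)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: each centre becomes the responsibility-weighted mean.
        c = (resp.T @ x) / resp.sum(axis=0)[:, None]
    return c, resp
```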
As $\lambda$ gets large, the expected multiplicities $\langle \rho_j^{(i)} \rangle$ tend to one for the cluster nearest to point $i$ and to zero otherwise. So in the limit of zero variance, this algorithm reduces to k-means (Bishop, 1995, ``Neural Networks for Pattern Recognition'').
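A short usage sketch of this limit, with made-up data: as `lam` grows, the responsibilities returned by the `soft_kmeans` sketch above saturate to zero or one, so the centre updates coincide with ordinary k-means.

```python
rng = np.random.default_rng(0)
# Two well-separated synthetic clusters in 2-D.
x = np.vstack([rng.normal(-2.0, 0.3, (50, 2)),
               rng.normal(+2.0, 0.3, (50, 2))])
c0 = x[rng.choice(len(x), size=2, replace=False)]   # random initial centres
c, resp = soft_kmeans(x, c0, lam=100.0)
print(resp.round(2))   # rows are essentially one-hot: hard, k-means-style assignments
```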