
Bayesian interpretation of k-means (3)


\begin{displaymath}\mbox{Pr}\left[ {\bf x}, \rho \,\vert\, \lambda, {\bf c} \right] \propto \exp\left( -\lambda \sum_i \sum_j \rho_j^{(i)} \vert {\bf x}_i - {\bf c}_j \vert^2 \right) \end{displaymath}


Since the exponent is quadratic in the ${\bf x}_i - {\bf c}_j$, the above probability density is proportional to a product of Gaussian radial basis functions. The ${\bf c}_j$ vectors are the basis function centres and $\lambda$ is an inverse variance parameter.
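Writing the exponential of the double sum as a double product makes the factorisation explicit (the same density as above, with the normalisation still dropped):

\begin{displaymath}\exp\left( -\lambda \sum_i \sum_j \rho_j^{(i)} \vert {\bf x}_i - {\bf c}_j \vert^2 \right) = \prod_i \prod_j \left[ \exp\left( -\lambda \vert {\bf x}_i - {\bf c}_j \vert^2 \right) \right]^{\rho_j^{(i)}} \end{displaymath}

Each factor is an unnormalised spherical Gaussian centred at ${\bf c}_j$ with variance $\sigma^2 = 1/(2\lambda)$, switched on or off by the multiplicity $\rho_j^{(i)}$.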


The multiplicities $\rho$ are missing data. The EM algorithm (Dempster, Laird & Rubin, 1977) gives a way to maximise likelihoods in the presence of missing data as follows. In the E-step, each multiplicity is replaced by its expected value given the current centres,

\begin{displaymath}\hat{\rho}_j^{(i)} = \frac{\exp\left( -\lambda \vert {\bf x}_i - {\bf c}_j \vert^2 \right)}{\sum_k \exp\left( -\lambda \vert {\bf x}_i - {\bf c}_k \vert^2 \right)} ; \end{displaymath}

in the M-step, each centre is re-estimated as the mean of the data weighted by these expected multiplicities,

\begin{displaymath}{\bf c}_j = \frac{\sum_i \hat{\rho}_j^{(i)} {\bf x}_i}{\sum_i \hat{\rho}_j^{(i)}} . \end{displaymath}

The two steps are alternated until the likelihood converges.
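These two updates translate directly into a short program. Below is a minimal NumPy sketch, assuming the softmax E-step and weighted-mean M-step written above; the name soft_kmeans, the fixed iteration count, and the argument names are illustrative rather than taken from the original notes.

\begin{verbatim}
import numpy as np

def soft_kmeans(X, C, lam, n_iters=50):
    # X: (N, d) data; C: (K, d) initial centres; lam: inverse-variance parameter.
    for _ in range(n_iters):
        # E-step: expected multiplicities rho[i, j] for each point/cluster pair.
        sq_dists = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)  # |x_i - c_j|^2
        logits = -lam * sq_dists
        logits -= logits.max(axis=1, keepdims=True)  # stabilise before exponentiating
        rho = np.exp(logits)
        rho /= rho.sum(axis=1, keepdims=True)
        # M-step: move each centre to the responsibility-weighted mean of the data.
        C = (rho.T @ X) / rho.sum(axis=0)[:, None]
    return C, rho
\end{verbatim}

Subtracting the row maximum before exponentiating leaves the responsibilities unchanged but avoids underflow when $\lambda$ is large.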

As $\lambda$ gets large, the expected multiplicities $\hat{\rho}_j^{(i)}$ tend to one for the cluster nearest to point $i$ and to zero for all others. So in the zero-variance limit $\lambda \to \infty$, this algorithm reduces to k-means (Bishop, 1995, ``Neural Networks for Pattern Recognition'').
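A quick check of this limit, assuming the softmax E-step written above: dividing numerator and denominator by the numerator gives

\begin{displaymath}\hat{\rho}_j^{(i)} = \left[ \sum_k \exp\left( -\lambda \left( \vert {\bf x}_i - {\bf c}_k \vert^2 - \vert {\bf x}_i - {\bf c}_j \vert^2 \right) \right) \right]^{-1} . \end{displaymath}

If ${\bf c}_j$ is the unique nearest centre to ${\bf x}_i$, every difference in the exponent is positive for $k \neq j$, those terms vanish as $\lambda \to \infty$, and $\hat{\rho}_j^{(i)} \to 1$; if some other centre is strictly nearer, the corresponding term diverges and $\hat{\rho}_j^{(i)} \to 0$. The E-step then performs exactly the hard nearest-centre assignment of k-means, and the M-step becomes the usual cluster-mean update.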



