Since the exponent is quadratic, the above probability density is equivalent to a product of Gaussian radial basis functions. The vectors are the basis function centres, and the coefficient in the exponent is an inverse variance parameter.
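For concreteness, writing the centres as $\mathbf{m}_k$ and the inverse variance as $\beta$ (symbols introduced here for illustration; the text does not fix the notation), each basis function would take the standard Gaussian RBF form
\[
  \phi_k(\mathbf{x}) \;=\; \exp\!\left( -\tfrac{\beta}{2}\,\lVert \mathbf{x} - \mathbf{m}_k \rVert^2 \right),
\]
so that a variance of $\sigma^2$ corresponds to $\beta = 1/\sigma^2$.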
The multiplicities are treated as missing data. The EM algorithm (Dempster, Laird \& Rubin, 1977) gives a way to maximise likelihoods in the presence of missing data as follows:
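As a minimal sketch of the two steps for this model (assuming equal mixing weights and spherical components sharing a fixed inverse variance; the names \texttt{X}, \texttt{centres} and \texttt{beta} are illustrative, not taken from the text), one EM iteration might look like this in Python:

\begin{verbatim}
import numpy as np

def em_step(X, centres, beta):
    """One EM iteration for an equal-weight mixture of spherical Gaussians.

    X       : (N, D) array of data points
    centres : (K, D) array of current basis-function centres
    beta    : shared inverse variance (held fixed, not re-estimated)
    """
    # E-step: expected multiplicities (responsibilities) of each cluster for
    # each point, proportional to the Gaussian RBF evaluated at that point.
    sq_dist = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(-1)  # (N, K)
    log_r = -0.5 * beta * sq_dist
    log_r -= log_r.max(axis=1, keepdims=True)      # for numerical stability
    r = np.exp(log_r)
    r /= r.sum(axis=1, keepdims=True)              # each row sums to one

    # M-step: each centre moves to the responsibility-weighted mean of the data.
    new_centres = (r.T @ X) / r.sum(axis=0)[:, None]
    return new_centres, r
\end{verbatim}

Iterating this update to convergence maximises the likelihood over the centres; the array \texttt{r} holds the expected multiplicities discussed next.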
As the inverse variance parameter gets large, the expected multiplicities tend to one for the cluster nearest to point $i$ and to zero otherwise. So in the limit of zero variance, this algorithm reduces to k-means (Bishop, 1995, ``Neural Networks for Pattern Recognition'').
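In the illustrative notation above, with $z_{ik}$ denoting the multiplicity of cluster $k$ for point $i$ (again a symbol assumed here rather than taken from the text), this limit reads
\[
  \lim_{\beta \to \infty} \langle z_{ik} \rangle =
  \begin{cases}
    1 & \text{if } k = \arg\min_{k'} \lVert \mathbf{x}_i - \mathbf{m}_{k'} \rVert^2, \\
    0 & \text{otherwise,}
  \end{cases}
\]
so each point is assigned entirely to its nearest centre, which is precisely the k-means assignment step.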