We now move to the question of how to cluster sequence and expression data simultaneously using multiple competing models. The prototype optimization algorithms (Expectation/Maximization for clustering, Gibbs sampling for alignment) have already been introduced, so it only remains to combine them. We will describe just one of several ways that this can be done.
We must introduce some more notation (the last!). Suppose that we have k sequence-expression models; the m'th model is $\mathcal{M}_m$ and its parameters are $\Theta_m$.
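As a purely illustrative sketch of this notation, the following Python fragment stores k such models. The particular parameterization (a position weight matrix for the sequence component and a Gaussian profile for the expression component) is an assumption made here for illustration, not a definition taken from the text.

from dataclasses import dataclass
import numpy as np

@dataclass
class SequenceExpressionModel:
    """Parameters Theta_m of one hypothetical sequence-expression model."""
    pwm: np.ndarray        # (motif_length, 4) column probabilities over A, C, G, T
    expr_mean: np.ndarray  # (n_conditions,) mean expression profile of the cluster
    expr_var: float        # within-cluster variance of the expression values

k = 3                      # number of competing models
rng = np.random.default_rng(0)
models = [
    SequenceExpressionModel(
        pwm=rng.dirichlet(np.ones(4), size=8),  # hypothetical 8-column motif
        expr_mean=rng.normal(size=10),          # hypothetical 10-condition profile
        expr_var=1.0,
    )
    for _ in range(k)
]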
We also allow a null sequence-expression model, $\mathcal{M}_0$. The null sequence likelihood is given by (6), and the null expression likelihood by the substitution in (3) that makes every gene its own cluster. The joint null likelihood is the product of these two likelihoods, by analogy with equation (11). The null sequence model generates independent, unrelated sequences, and the null expression model generates expression profiles, each of which is effectively an independent one-member cluster.
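The structure of this joint null likelihood can be sketched as follows. The stand-in densities (a uniform residue background for the sequence part and an independent Gaussian per condition for the expression part) are assumptions made for illustration, since equations (3), (6) and (11) are not reproduced in this section.

import numpy as np

def null_sequence_loglik(seq, background=None):
    # Stand-in for the null sequence likelihood of (6): independent residues
    # drawn from a background composition (uniform here, by assumption).
    if background is None:
        background = {b: 0.25 for b in "ACGT"}
    return float(sum(np.log(background[b]) for b in seq))

def null_expression_loglik(profile, noise_var=1.0):
    # Stand-in for the null expression likelihood obtained from (3): each
    # profile is scored on its own, here against an independent zero-mean
    # Gaussian per condition (an assumption made for illustration).
    profile = np.asarray(profile, dtype=float)
    return float(-0.5 * np.sum(profile ** 2 / noise_var
                               + np.log(2.0 * np.pi * noise_var)))

def null_joint_loglik(seq, profile):
    # Joint null likelihood = product of the two null likelihoods,
    # i.e. a sum in log space.
    return null_sequence_loglik(seq) + null_expression_loglik(profile)

print(null_joint_loglik("ACGTACGT", [0.3, -1.2, 0.5]))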
The basic job of our clustering algorithm is to assign each gene to a cluster. We handle this in our formalism by having a different multiplicities vector for each model, denoting the vector for the m'th model by $\mathbf{n}^{(m)}$ (and, for the null model, by $\mathbf{n}^{(0)}$). Then the act of assigning gene g to model $\mathcal{M}_m$ is equivalent to setting

$n_g^{(m)} = 1$ and $n_g^{(m')} = 0$ for all $m' \neq m$.   (14)
In our Gibbs/EM algorithm, the $n_g^{(m)}$ are the missing data. The E-step of the EM algorithm involves setting these missing data to their expectation values according to the posterior probability distribution over models for each gene, $P(\mathcal{M}_m \mid \text{sequence and expression data for gene } g)$ (i.e. multiplying equation (14) by this probability and summing over m).
In other words, as with k-means, we effectively force the models to compete to explain the genes. At the same time, by allowing the $n_g^{(m)}$ to be continuously valued expectations rather than zero or one, we allow for uncertainty where k-means does not. See e.g. Bishop95 for an alternative explanation of this approach.
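The E-step can be illustrated with a short sketch (not the paper's equations (12)-(13), which are not shown here): for a single gene, per-model joint log-likelihoods, with the null model included, are converted by Bayes' rule into posterior model probabilities, and these posteriors are the expected multiplicities. The uniform model prior and the example numbers are assumptions.

import numpy as np

def expected_multiplicities(joint_logliks, log_priors=None):
    # joint_logliks[0] is the null model; joint_logliks[1..k] are the k
    # sequence-expression models (all values are log P(data_g | model)).
    joint_logliks = np.asarray(joint_logliks, dtype=float)
    if log_priors is None:
        log_priors = np.zeros_like(joint_logliks)  # uniform prior over models
    log_post = joint_logliks + log_priors
    log_post -= log_post.max()          # stabilise before exponentiating
    post = np.exp(log_post)
    return post / post.sum()            # Bayes' rule: posterior over models

# Hypothetical log-likelihoods for one gene; model 2 explains it best, so its
# expected multiplicity is closest to one, but none is forced to exactly 0 or 1.
print(expected_multiplicities([-120.0, -118.5, -115.2, -119.9]))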
We are now ready to describe our Gibbs/EM update procedure, which is:

1. Gibbs sampling step: sample a motif alignment for each sequence, conditional on the current model parameters.
2. Expectation step: set the missing multiplicities $n_g^{(m)}$ to their expectation values, i.e. the posterior probability that gene g belongs to each model (including the null model).
3. Maximization step: re-estimate the parameters $\Theta_m$ of each model from the sampled alignments and the expected multiplicities.
Relevant equations (apart from Bayes' rule) are (7), (8) and (10) for the Gibbs sampling step, (12) and (13) for the Expectation step (we can also introduce prior model probabilities here) and (5) for the Maximization step.
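A skeleton of how the three steps might be wired together is sketched below. The callables gibbs_sample_alignments, joint_loglik and maximize_parameters are hypothetical placeholders for the computations defined by equations (7), (8), (10), (11) and (5), which are not reproduced here; this shows only the shape of the loop, not an implementation of the procedure itself.

import numpy as np

def gibbs_em(genes, models, null_model,
             gibbs_sample_alignments,  # hypothetical: Gibbs step, eqs (7), (8), (10)
             joint_loglik,             # hypothetical: joint likelihood, eq (11)
             maximize_parameters,      # hypothetical: M-step, eq (5)
             n_iterations=50, seed=0):
    rng = np.random.default_rng(seed)
    multiplicities = None
    for _ in range(n_iterations):
        # Gibbs sampling step: sample motif alignments for each sequence
        # conditional on the current model parameters.
        alignments = gibbs_sample_alignments(genes, models, rng)

        # Expectation step: expected multiplicities are the posterior model
        # probabilities for each gene, with the null model in column 0.
        logliks = np.array([[joint_loglik(g, m, alignments)
                             for m in [null_model] + list(models)]
                            for g in genes])
        logliks -= logliks.max(axis=1, keepdims=True)
        post = np.exp(logliks)
        multiplicities = post / post.sum(axis=1, keepdims=True)

        # Maximization step: re-estimate each model's parameters from the
        # genes, weighted by its column of expected multiplicities
        # (the null model's parameters are held fixed).
        models = [maximize_parameters(genes, alignments, multiplicities[:, j + 1])
                  for j in range(len(models))]
    return models, multiplicities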
Another way of viewing this algorithm is as a mixture density estimation problem, in which a k-component mixture of models must be fitted to an observed scatter of sequence-expression data. Viewed this way, the oft-quoted limitation of k-means (that the parameter k must be known in advance) appears surmountable, since mixture density estimation with an unknown number of components is a familiar problem in Bayesian statistics.
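For instance, one standard (approximate) way to treat k as unknown is to fit the mixture for several candidate values and compare them with an information criterion such as BIC, which approximates Bayesian model comparison; the numbers below are made up purely for illustration and are not results from this method.

import numpy as np

def bic(total_loglik, n_free_params, n_genes):
    # Bayesian information criterion (lower is better): a large-sample
    # approximation to the log marginal likelihood used for model comparison.
    return -2.0 * total_loglik + n_free_params * np.log(n_genes)

# Hypothetical fits of the sequence-expression mixture for several k:
# (k, total log-likelihood, number of free parameters). Illustrative numbers only.
fits = [(2, -10450.0, 120), (3, -10230.0, 180), (4, -10215.0, 240)]
n_genes = 500
scores = {k: bic(ll, p, n_genes) for k, ll, p in fits}
best_k = min(scores, key=scores.get)
print(scores, "-> chosen k =", best_k)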