Gibbs sampling

A stochastic method for finding ungapped multiple sequence alignments, designed to avoid getting trapped in local optima.

Essentially: holding the rest of the alignment fixed, sample the indentation (start offset) of one sequence at a time.

    Sequence#1  (+)   18 GCGTACCACT 27
    Sequence#2  (+)    8 GCGTACCACG 17
    Sequence#3  (-)   12 GCGTACCCCG 21
    Sequence#4  (+)    5 GCTGACTACG 14
    Sequence#5  (-)   14 GCATACCGCG 23
    Sequence#6  (+)   18 TCGTACCACG 27

    Sequence#7:
         ctaaggctcggatgcgTACGACGACA
          ctaaggctcggatgcGTACGACGACa
           ctaaggctcggatgCGTACGACGAca
            ctaaggctcggatGCGTACGACGaca
             ctaaggctcggaTGCGTACGACgaca
                .....
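The sampling step illustrated above can be sketched in Python as follows. This is a simplified illustration, not the procedure of Lawrence et al.: it assumes a DNA alphabet, a flat pseudocount of 0.5 per residue, and forward-strand matching only, and it weights each offset by the plain product of column probabilities with no background model. The names `column_probs`, `offset_weights` and `sample_offset` are hypothetical.

```python
import random
from collections import Counter

ALPHABET = "ACGT"

def column_probs(column, pseudo=0.5):
    """Residue probabilities for one column: (count + pseudo) / total."""
    counts = Counter(column)
    total = len(column) + pseudo * len(ALPHABET)
    return {r: (counts[r] + pseudo) / total for r in ALPHABET}

def offset_weights(motifs, seq, pseudo=0.5):
    """Unnormalised weight of every possible start offset of seq."""
    width = len(motifs[0])
    # Build the profile from the sequences whose alignment is held fixed.
    profile = [column_probs([m[c] for m in motifs], pseudo)
               for c in range(width)]
    weights = []
    for start in range(len(seq) - width + 1):
        w = 1.0
        for c in range(width):
            w *= profile[c][seq[start + c]]
        weights.append(w)
    return weights

def sample_offset(motifs, seq, rng=random, pseudo=0.5):
    """Draw one start offset with probability proportional to its weight."""
    weights = offset_weights(motifs, seq, pseudo)
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

# Sequences #1 to #6 as currently aligned, and sequence #7 to be re-sampled.
motifs = ["GCGTACCACT", "GCGTACCACG", "GCGTACCCCG",
          "GCTGACTACG", "GCATACCGCG", "TCGTACCACG"]
seq7 = "CTAAGGCTCGGATGCGTACGACGACA"

weights = offset_weights(motifs, seq7)
start = sample_offset(motifs, seq7, random.Random(0))
print(start, seq7[start:start + len(motifs[0])])
```

Because the offset is drawn at random rather than chosen greedily, a lower-scoring offset is occasionally accepted, which is what lets the sampler move away from a local optimum.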

Likelihood of aligning residue r to column c is

$$p_r^{(c)} = \frac{f_r^{(c)} + D_r}{\sum_s \left( f_s^{(c)} + D_s \right)}$$

where $f_r^{(c)}$ is the observed frequency (count) of residue $r$ in column $c$, and the $D_r$ are (Dirichlet) pseudocounts.
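As a small worked example of this formula, the sketch below computes the residue probabilities for the first column of the six aligned sequences (G, G, G, G, G, T). The pseudocount value of 0.5 for every residue is an assumption made for illustration, and `column_probs` is a hypothetical helper name.

```python
from collections import Counter

ALPHABET = "ACGT"

def column_probs(column, pseudocounts):
    """p_r = (f_r + D_r) / sum_s (f_s + D_s) for one alignment column."""
    counts = Counter(column)  # f_r; residues that do not occur count as 0
    total = sum(counts[s] + pseudocounts[s] for s in ALPHABET)
    return {r: (counts[r] + pseudocounts[r]) / total for r in ALPHABET}

# Assumed pseudocounts: D_r = 0.5 for every residue.
D = {r: 0.5 for r in ALPHABET}

# First column of sequences #1 to #6.
probs = column_probs("GGGGGT", D)
for r in ALPHABET:
    print(r, probs[r])
```

With these values the denominator is 6 + 4 × 0.5 = 8, so G gets 5.5/8 and T gets 1.5/8, while the unobserved residues A and C each get 0.5/8 instead of zero. That is the purpose of the pseudocounts: no residue is ruled out entirely on the basis of a handful of sequences.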

(Lawrence et al., Science 262, 1993)

2000-04-26