The likelihood of sequence $S$, at indentation $d_S$, conditioned on the rest of the alignment $A$, is given by the product of the column likelihoods:

$$P(S \mid d_S, A) = \prod_i P(S_i \mid d_S, A)$$

where $S_i$ is the residue at position $i$ of sequence $S$.
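The product of column likelihoods can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration, not taken from the text: the DNA alphabet, the function names `column_counts` and `sequence_log_likelihood`, the convention that motif column `col` of a sequence with indentation `indent` is the residue `seq[indent + col]`, and the choice of column likelihood (residue counts from the rest of the alignment plus a flat pseudocount). The sum of logs is used in place of the raw product to avoid numerical underflow.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # assumed alphabet (DNA)


def column_counts(others, width):
    """Count residues in each motif column over the rest of the alignment.

    `others` is a list of (sequence, indentation) pairs; column `col` of a
    sequence is taken to be sequence[indentation + col] (assumed convention).
    """
    counts = [Counter() for _ in range(width)]
    for seq, indent in others:
        for col in range(width):
            counts[col][seq[indent + col]] += 1
    return counts


def sequence_log_likelihood(seq, indent, others, width, pseudocount=1.0):
    """Log of the product of column likelihoods for `seq` at `indent`.

    Each column likelihood is assumed to be a pseudocount-smoothed
    frequency estimated from the other sequences in the alignment.
    """
    counts = column_counts(others, width)
    n = len(others)
    denom = n + pseudocount * len(ALPHABET)
    total = 0.0
    for col in range(width):
        residue = seq[indent + col]
        total += math.log((counts[col][residue] + pseudocount) / denom)
    return total


# Toy alignment: each sequence carries the motif ACGT at its indentation.
rest = [("TTACGT", 2), ("ACGTAA", 0), ("GACGTC", 1)]
aligned = sequence_log_likelihood("CCACGT", 2, rest, width=4)
shifted = sequence_log_likelihood("CCACGT", 0, rest, width=4)
print(aligned > shifted)
```

In a sampler, a score of this kind would be evaluated for each candidate indentation of the sequence and the indentation drawn in proportion to the resulting likelihoods; the toy data above only shows that the indentation matching the other sequences scores higher.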
Extensions:
The traditional procedure is to (i) cluster the microarray data, then (ii) sample the upstream sequences of each cluster to find promoter motifs (see e.g. Tavazoie et al., Nature Genetics 22, 1999).
However, we now have all the ammo necessary to make joint, competitive models of sequence and array data: