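Consider what happens if we take a couple of real numbers, $\mu_g$ and $\sigma_g$, and use them to offset and scale the expression data for gene $g$ by applying the mapping
$$x_{gj} \mapsto \frac{x_{gj} - \mu_g}{\sigma_g}$$
to all likelihood formulae involving expression profiles, where $x_{gj}$ is the expression level of gene $g$ in experiment $j$.
If we choose
$$\mu_g = \frac{1}{J}\sum_{j=1}^{J} x_{gj}, \qquad \sigma_g = \sqrt{\frac{1}{J}\sum_{j=1}^{J} \left(x_{gj} - \mu_g\right)^2},$$
i.e. the mean and standard deviation of the profile over its $J$ experiments, then this is equivalent to the "normalization" of profile vectors that is commonly applied to expression data before analysis. While this normalization is effective as a correction for biases in the experimental protocol, it is also crude: it does not distinguish between systematic experimental errors and genuine differences in transcription levels.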
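As a minimal sketch of that equivalence, the code below applies the per-gene mapping and then chooses $\mu_g$ and $\sigma_g$ as the profile mean and population standard deviation; the function names, the choice of population (rather than sample) standard deviation, and the example values are illustrative assumptions, not part of the original method.

```python
import statistics


def scale_profile(profile, mu_g, sigma_g):
    """Apply the per-gene offset/scale mapping x -> (x - mu_g) / sigma_g."""
    return [(x - mu_g) / sigma_g for x in profile]


def normalize_profile(profile):
    """Z-score normalization expressed as a special case of the mapping:
    mu_g is the profile mean and sigma_g its population standard deviation."""
    mu_g = statistics.fmean(profile)
    sigma_g = statistics.pstdev(profile)
    return scale_profile(profile, mu_g, sigma_g)


if __name__ == "__main__":
    x_g = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical profile for one gene
    z_g = normalize_profile(x_g)
    print(z_g)
    # The normalized profile has zero mean and unit standard deviation.
    print(statistics.fmean(z_g), statistics.pstdev(z_g))
```

Any other choice of $\mu_g$ and $\sigma_g$ gives a different, non-standardized profile, which is what the prior-based treatment exploits.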
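Our likelihood framework permits a more principled approach. We can incorporate prior probabilities for the model-independent scaling parameters $\mu_g$ and $\sigma_g$. Suitable priors might be the Gaussian distributions $\mu_g \sim \mathcal{N}(m_\mu, s_\mu^2)$ and $\sigma_g \sim \mathcal{N}(m_\sigma, s_\sigma^2)$. The values of $\mu_g$ and $\sigma_g$ can then be sampled at some point during the Gibbs/EM update procedure, e.g. between the Gibbs and Expectation steps. One way to do the sampling is to draw a set of candidate values from the prior distribution, then select one of them at random with probability proportional to its likelihood as given by equation (12). This sampling-importance-resampling step approximates a draw from the conditional posterior distribution of $\mu_g$ and $\sigma_g$.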
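It is equally straightforward to incorporate experiment-dependent parameters into the scaling, e.g.
$$x_{gj} \mapsto \frac{x_{gj} - \mu_g - \nu_j}{\sigma_g \tau_j}$$
with $\nu_j \sim \mathcal{N}(0, s_\nu^2)$ and $\tau_j \sim \mathcal{N}(1, s_\tau^2)$.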
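For concreteness, here is a minimal sketch of applying a combined gene- and experiment-dependent scaling to a whole expression matrix, assuming a mapping of the form $x_{gj} \mapsto (x_{gj} - \mu_g - \nu_j)/(\sigma_g \tau_j)$, where $\nu_j$ and $\tau_j$ are per-experiment offset and scale parameters. The function name and list-of-lists layout are hypothetical.

```python
def scale_matrix(x, mu, sigma, nu, tau):
    """Apply x[g][j] -> (x[g][j] - mu[g] - nu[j]) / (sigma[g] * tau[j]).

    x is a list of gene profiles (rows = genes g, columns = experiments j);
    mu, sigma are per-gene parameters and nu, tau per-experiment ones."""
    return [
        [(x_gj - mu[g] - nu[j]) / (sigma[g] * tau[j]) for j, x_gj in enumerate(row)]
        for g, row in enumerate(x)
    ]


if __name__ == "__main__":
    x = [[1.0, 3.0], [2.0, 6.0]]  # hypothetical 2 genes x 2 experiments
    print(scale_matrix(x, mu=[1.0, 2.0], sigma=[1.0, 2.0], nu=[0.0, 1.0], tau=[1.0, 2.0]))
```

With every $\nu_j = 0$ and $\tau_j = 1$ this reduces to the per-gene mapping, which is why priors centred on those values leave the model unchanged unless the data pull the parameters away.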
When such additional parameters are allowed, we may constrain them (by keeping the variances of their priors small) to avoid overfitting when the dataset is sparse.
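To illustrate why narrow priors guard against overfitting, consider a single experiment offset $\nu_j$ in a simplified, hypothetical model: residuals $r_{gj} \sim \mathcal{N}(\nu_j, 1)$ independently over the genes measured in experiment $j$, with prior $\nu_j \sim \mathcal{N}(0, s^2)$. Gaussian conjugacy gives a posterior mean of $\sum_g r_{gj} / (G + 1/s^2)$ for $G$ genes. This is a sketch of the shrinkage effect only, not the update used in the framework above.

```python
def posterior_mean_offset(residuals, prior_sd):
    """Posterior mean of one experiment offset nu_j.

    Hypothetical model: residuals ~ N(nu_j, 1) independently, prior
    nu_j ~ N(0, prior_sd**2). The posterior precision is
    len(residuals) + 1 / prior_sd**2."""
    precision = len(residuals) + 1.0 / prior_sd ** 2
    return sum(residuals) / precision


if __name__ == "__main__":
    residuals = [1.0, 1.5, 0.5]  # hypothetical: only three genes observed in this experiment
    for prior_sd in (10.0, 1.0, 0.1):
        print(prior_sd, round(posterior_mean_offset(residuals, prior_sd), 4))
```

With a wide prior the offset simply tracks the mean of the three residuals, fitting whatever noise they contain; with a small prior standard deviation it stays close to 0 unless many genes consistently support a shift.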