Normal Linear Regression
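We consider the following model
$$y \mid \beta, \sigma^2 \sim N(X\beta, \sigma^2 I_n),$$
often assuming
$$\beta \sim N(\mu_\beta, \Sigma_\beta),$$
with either
$$\sigma^2 \sim \text{Inverse-Gamma}(A, B)$$
or
$$\sigma \sim \text{Half-}t(A, \nu).$$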
(Look at the generalisation of this using G-Wishart priors as in
Maestrini and Wand 2020)
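The latter can be expressed as
$$\sigma^2 \mid a \sim \text{Inverse-Gamma}(\nu/2, \nu/a), \qquad a \sim \text{Inverse-Gamma}(1/2, 1/A^2),$$
where $a$ is an auxiliary variable, $A > 0$ is the Half-$t$ scale and $\nu$ its degrees of freedom.
The proposed product-form variational densities are
$$q(\beta, \sigma^2, a) = q(\beta)\, q(\sigma^2)\, q(a)$$
with optimal solutions
$$q^*(\beta) \sim N(\mu_{q(\beta)}, \Sigma_{q(\beta)}), \quad q^*(\sigma^2) \sim \text{Inverse-Gamma}(A_{q(\sigma^2)}, B_{q(\sigma^2)}), \quad q^*(a) \sim \text{Inverse-Gamma}(A_{q(a)}, B_{q(a)}),$$
where the $a$ parameter is dropped if not needed.
From the model above, the joint density is
$$p(y, \beta, \sigma^2, a) = p(y \mid \beta, \sigma^2)\, p(\beta)\, p(\sigma^2 \mid a)\, p(a).$$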
Model Likelihood
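The log-likelihood has the form
$$\log p(y \mid \beta, \sigma^2) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\lVert y - X\beta\rVert^2$$
so that
$$E_q[\log p(y \mid \beta, \sigma^2)] = -\frac{n}{2}\log(2\pi) - \frac{n}{2}E_q[\log\sigma^2] - \frac{1}{2}E_q[1/\sigma^2]\, E_q\lVert y - X\beta\rVert^2$$
due to the assumed variational independence of $\beta$ and $\sigma^2$.
To obtain the above (letting $\mu_{q(\beta)} = E_q[\beta]$ and $\Sigma_{q(\beta)} = \mathrm{Cov}_q[\beta]$),
$$E_q\lVert y - X\beta\rVert^2 = \lVert y - X\mu_{q(\beta)}\rVert^2 + \mathrm{tr}\left(X^\top X\, \Sigma_{q(\beta)}\right).$$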
Model Coefficients
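For the regression coefficients, the full conditional is
$$\log p(\beta \mid y, \sigma^2) = -\frac{1}{2\sigma^2}\lVert y - X\beta\rVert^2 + \log p(\beta) + \text{const}.$$
From which, the optimal density is
$$q^*(\beta) \propto \exp\left\{-\frac{1}{2}E_q[1/\sigma^2]\, \lVert y - X\beta\rVert^2 + \log p(\beta)\right\}.$$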
The exact formulation of this density depends upon the form of
and the resulting density
.
Assuming that
,
replace
with
.
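For the common case of a Gaussian prior, $\beta \sim N(\mu_\beta, \Sigma_\beta)$, standard conjugacy gives a Gaussian optimal density with covariance $(E_q[1/\sigma^2]\, X^\top X + \Sigma_\beta^{-1})^{-1}$. The following is a minimal sketch of that update; the function name and argument names are illustrative assumptions, not taken from the source.

```python
import numpy as np

def update_q_beta(XtX, Xty, prior_mean, prior_cov, e_inv_sigma2):
    """Mean and covariance of the Gaussian q*(beta) under a
    N(prior_mean, prior_cov) prior (hypothetical helper)."""
    prior_prec = np.linalg.inv(prior_cov)
    # Precision of q*(beta): E[1/sigma^2] X'X + prior precision.
    Sigma = np.linalg.inv(e_inv_sigma2 * XtX + prior_prec)
    mu = Sigma @ (e_inv_sigma2 * Xty + prior_prec @ prior_mean)
    return mu, Sigma

# Tiny example: identity design, zero-mean unit-covariance prior.
X = np.eye(2)
y = np.array([1.0, 2.0])
mu, Sigma = update_q_beta(X.T @ X, X.T @ y, np.zeros(2), np.eye(2),
                          e_inv_sigma2=1.0)
```

With equal weight on data and prior, the mean is shrunk halfway from `y` towards the prior mean.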
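Inverse-Gamma prior on $\sigma^2$
For the variance component assuming $\sigma^2 \sim \text{Inverse-Gamma}(A, B)$,
$$\log p(\sigma^2 \mid y, \beta) = -\left(A + \frac{n}{2} + 1\right)\log\sigma^2 - \frac{1}{\sigma^2}\left(B + \frac{1}{2}\lVert y - X\beta\rVert^2\right) + \text{const}.$$
From which, the optimal density is
$$q^*(\sigma^2) \sim \text{Inverse-Gamma}\left(A + \frac{n}{2},\; B + \frac{1}{2}E_q\lVert y - X\beta\rVert^2\right),$$
implying that
$$E_q[1/\sigma^2] = \frac{A + n/2}{B + \frac{1}{2}E_q\lVert y - X\beta\rVert^2}$$
in the variational parameters for $\beta$.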
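An alternative form for the $E_q\lVert y - X\beta\rVert^2$
term which avoids repeated computation of statistics is to use
$$E_q\lVert y - X\beta\rVert^2 = y^\top y - 2\mu_{q(\beta)}^\top X^\top y + \mathrm{tr}\left\{X^\top X\left(\Sigma_{q(\beta)} + \mu_{q(\beta)}\mu_{q(\beta)}^\top\right)\right\},$$
where $y^\top y$, $X^\top y$ and $X^\top X$ are computed once, and $\mu_{q(\beta)}$ and $\Sigma_{q(\beta)}$ are the mean and covariance of $q(\beta)$.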
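As a quick numerical check of this idea, the sketch below computes the expected squared residual once directly from the data and once from the cached statistics $y^\top y$, $X^\top y$ and $X^\top X$. The helper name and the simulated inputs are assumptions made for illustration.

```python
import numpy as np

def expected_sq_residual(yty, Xty, XtX, mu, Sigma):
    """E_q ||y - X beta||^2 from precomputed y'y, X'y and X'X,
    where q(beta) has mean mu and covariance Sigma."""
    return yty - 2.0 * mu @ Xty + mu @ XtX @ mu + np.trace(XtX @ Sigma)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = rng.normal(size=50)
mu = np.array([0.5, -1.0, 2.0])
Sigma = 0.1 * np.eye(3)

# Statistics computed once, outside any iteration.
yty, Xty, XtX = y @ y, X.T @ y, X.T @ X

# Direct form touches the full data each time it is evaluated.
direct = np.sum((y - X @ mu) ** 2) + np.trace(X.T @ X @ Sigma)
cached = expected_sq_residual(yty, Xty, XtX, mu, Sigma)
```

After the one-off pass over the data, each evaluation of the cached form costs $O(p^2)$ rather than $O(np)$.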
The updates are then
until the change in
is below a specified tolerance level indicating convergence.
The lower bound itself is
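The iteration can be sketched as follows, assuming a Gaussian prior $N(\mu_0, \Sigma_0)$ on $\beta$ and an Inverse-Gamma$(A, B)$ prior on $\sigma^2$ (shape $A$, scale $B$). The function name, the starting value for $B_q$ and the simplified lower-bound expression are assumptions of this sketch: the bound is written in the form it takes immediately after the $q(\sigma^2)$ update, where the terms in $E_q[\log\sigma^2]$ and $E_q[1/\sigma^2]$ cancel.

```python
import math
import numpy as np

def cavi_normal_ig(y, X, mu0, Sigma0, A, B, tol=1e-8, max_iter=200):
    """Coordinate-ascent updates for q(beta) q(sigma^2) (illustrative sketch)."""
    n, p = X.shape
    yty, Xty, XtX = y @ y, X.T @ y, X.T @ X
    P0 = np.linalg.inv(Sigma0)
    logdet0 = np.linalg.slogdet(Sigma0)[1]
    A_q = A + 0.5 * n   # shape of q(sigma^2); does not change across iterations
    B_q = B             # starting value (an assumption of this sketch)
    elbo_trace = []
    for _ in range(max_iter):
        e_inv_s2 = A_q / B_q
        # Update q(beta) = N(mu, Sigma).
        Sigma = np.linalg.inv(e_inv_s2 * XtX + P0)
        mu = Sigma @ (e_inv_s2 * Xty + P0 @ mu0)
        # Update q(sigma^2) = Inverse-Gamma(A_q, B_q).
        e_ss = yty - 2 * mu @ Xty + mu @ XtX @ mu + np.trace(XtX @ Sigma)
        B_q = B + 0.5 * e_ss
        # Lower bound, valid right after the q(sigma^2) update.
        d = mu - mu0
        kl_beta = 0.5 * (np.trace(P0 @ Sigma) + d @ P0 @ d - p
                         + logdet0 - np.linalg.slogdet(Sigma)[1])
        elbo = (-0.5 * n * math.log(2 * math.pi)
                + A * math.log(B) - math.lgamma(A)
                - A_q * math.log(B_q) + math.lgamma(A_q) - kl_beta)
        converged = bool(elbo_trace) and abs(elbo - elbo_trace[-1]) < tol
        elbo_trace.append(elbo)
        if converged:
            break
    return mu, Sigma, A_q, B_q, elbo_trace

# Simulated data, for illustration only.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
y = X @ np.array([1.0, -2.0, 0.5]) + 0.5 * rng.normal(size=200)
mu, Sigma, A_q, B_q, elbo_trace = cavi_normal_ig(
    y, X, np.zeros(3), 100.0 * np.eye(3), A=0.01, B=0.01)
```

Since each update maximises the bound over one factor, `elbo_trace` should be non-decreasing, which is a useful check on any implementation.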
Half-$t$ prior on $\sigma$ (Maestrini and Wand 2020)
The optimal density for $\beta$ is unchanged from the previous section.
For
,
From which, the optimal density is
For
,
from which, the optimal density is
The updates are then
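A sketch of one such scheme is given below. It assumes the auxiliary-variable representation $\sigma^2 \mid a \sim \text{Inverse-Gamma}(\nu/2, \nu/a)$, $a \sim \text{Inverse-Gamma}(1/2, 1/A^2)$, which is one common way of writing a Half-$t(A, \nu)$ prior on $\sigma$ and may be parametrised differently from Maestrini and Wand (2020). It also assumes a Gaussian prior on $\beta$. For brevity it monitors the change in $E_q[1/\sigma^2]$ rather than the lower bound; all names are illustrative.

```python
import numpy as np

def cavi_normal_half_t(y, X, mu0, Sigma0, A, nu, tol=1e-10, max_iter=500):
    """Coordinate-ascent updates for q(beta) q(sigma^2) q(a) (illustrative sketch)."""
    n, p = X.shape
    yty, Xty, XtX = y @ y, X.T @ y, X.T @ X
    P0 = np.linalg.inv(Sigma0)
    e_inv_s2, e_inv_a = 1.0, 1.0   # starting values (assumptions)
    for _ in range(max_iter):
        old = e_inv_s2
        # q(beta) = N(mu, Sigma): same form as under the Inverse-Gamma prior.
        Sigma = np.linalg.inv(e_inv_s2 * XtX + P0)
        mu = Sigma @ (e_inv_s2 * Xty + P0 @ mu0)
        e_ss = yty - 2 * mu @ Xty + mu @ XtX @ mu + np.trace(XtX @ Sigma)
        # q(sigma^2) = Inverse-Gamma((n + nu)/2, nu E[1/a] + e_ss/2).
        e_inv_s2 = 0.5 * (n + nu) / (nu * e_inv_a + 0.5 * e_ss)
        # q(a) = Inverse-Gamma((nu + 1)/2, nu E[1/sigma^2] + 1/A^2).
        e_inv_a = 0.5 * (nu + 1) / (nu * e_inv_s2 + A ** -2)
        if abs(e_inv_s2 - old) < tol * old:
            break
    return mu, Sigma, e_inv_s2, e_inv_a

# Simulated data, for illustration only.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
y = X @ np.array([1.0, -2.0, 0.5]) + 0.5 * rng.normal(size=200)
mu, Sigma, e_inv_s2, e_inv_a = cavi_normal_half_t(
    y, X, np.zeros(3), 100.0 * np.eye(3), A=25.0, nu=2.0)
```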
Linear Regression
(Wand et al. 2010)
We replace
by
which is equivalent to
Assuming an inverse-gamma prior on
,
the full-conditionals satisfy
We use the factorisation
which, from the form of the full-conditionals above, results in
The ELBO is then
Linear Mixed Models
Either of the previous likelihoods may be extended to a hierarchical
model
Where we specify grouped parameters,
such that
hint at the structure of
,
e.g. (but not restricted to)
This covers combinations of different hierarchical terms in the
linear predictor for example,
or
Similarly to the fixed effects case, we can consider priors on the
variance components
where if
then
and
The derivation for the optimal densities follows that for the linear
regression model. Define
,
,
and
Then
From which, the optimal density is
Suppose we again assumed
and
,
then similar to before
The new variational densities are those for
.
Consider just one
where
(i.e. a standard random effects model) and where
then
Note that
So the optimal density is also
Inverse-Wishart, i.e.
where
and where the required expectations are
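To make the conjugate update concrete, the sketch below assumes $m$ random-effect vectors $u_i \sim N(0, \Sigma)$ with prior $\Sigma \sim \text{Inverse-Wishart}(\nu_0, \Psi_0)$, using the parametrisation with density proportional to $|\Sigma|^{-(\nu_0 + d + 1)/2}\exp\{-\mathrm{tr}(\Psi_0\Sigma^{-1})/2\}$. Under these assumptions the optimal density is Inverse-Wishart$(\nu_0 + m,\ \Psi_0 + \sum_i E_q[u_i u_i^\top])$ and $E_q[\Sigma^{-1}] = \nu_q \Psi_q^{-1}$. The symbols $\nu_0$, $\Psi_0$ and the function name are hypothetical, chosen for this example.

```python
import numpy as np

def update_q_cov(mu_u, Sigma_u, nu0, Psi0):
    """Inverse-Wishart update for a random-effects covariance (sketch).

    mu_u:    (m, d) array of q(u_i) means
    Sigma_u: (m, d, d) array of q(u_i) covariances
    """
    m = mu_u.shape[0]
    # sum_i E[u_i u_i'] = sum_i (mu_i mu_i' + Sigma_i)
    e_outer = mu_u.T @ mu_u + Sigma_u.sum(axis=0)
    nu_q = nu0 + m
    Psi_q = Psi0 + e_outer
    # Sigma^{-1} is Wishart(nu_q, Psi_q^{-1}), so its mean is nu_q Psi_q^{-1}.
    e_inv_cov = nu_q * np.linalg.inv(Psi_q)
    return nu_q, Psi_q, e_inv_cov

# Two random effects in two dimensions with degenerate q(u_i).
mu_u = np.array([[1.0, 0.0], [0.0, 1.0]])
Sigma_u = np.zeros((2, 2, 2))
nu_q, Psi_q, e_inv_cov = update_q_cov(mu_u, Sigma_u, nu0=2, Psi0=np.eye(2))
```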
Notation
The symbol
is used to indicate equality up to an additive constant, similar to $\propto$
for multiplicative constants.
References
Wand, Matt, JT Ormerod, SA Padoan, and R Fruhwirth. 2010.
“Variational Bayes for Elaborate Distributions.”