Normal Linear Regression
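We consider the following model
$$y \mid \beta, \sigma^2 \sim N(X\beta, \sigma^2 I_n),$$
often assuming $\beta \sim N(\mu_0, \Sigma_0)$ with either $\sigma^2 \sim \text{IG}(a_0, b_0)$ or $\sigma \sim \text{Half-}t(A, \nu)$.
(Look at the generalisation of this using G-Wishart priors as in
Maestrini and Wand 2020.)
The latter can be expressed as
$$\sigma^2 \mid a \sim \text{IG}(\nu/2, \nu/a), \qquad a \sim \text{IG}(1/2, 1/A^2).$$
The proposed product-form variational densities are
$$q(\beta, \sigma^2, a) = q(\beta)\, q(\sigma^2)\, q(a)$$
with optimal solutions
$$q^*(\beta) = N(\mu_q, \Sigma_q), \qquad q^*(\sigma^2) = \text{IG}(a_q, b_q), \qquad q^*(a) = \text{IG}(c_q, d_q),$$
where the $a$ parameter is dropped if not needed.
From the model above, the joint density is
$$p(y, \beta, \sigma^2, a) = p(y \mid \beta, \sigma^2)\, p(\beta)\, p(\sigma^2 \mid a)\, p(a).$$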
Model Likelihood
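The log-likelihood has the form
$$\log p(y \mid \beta, \sigma^2) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\|y - X\beta\|^2$$
so that
$$E_q[\log p(y \mid \beta, \sigma^2)] = -\frac{n}{2}\log(2\pi) - \frac{n}{2}E_q[\log\sigma^2] - \frac{1}{2}E_q[1/\sigma^2]\, E_q\|y - X\beta\|^2$$
due to the assumed variational independence of $\beta$ and $\sigma^2$.
To obtain the above (letting $\mu_q = E_q[\beta]$ and $\Sigma_q = \text{Cov}_q[\beta]$), note that
$$E_q\|y - X\beta\|^2 = \|y - X\mu_q\|^2 + \text{tr}(X^\top X \Sigma_q).$$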
 
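The expected squared residual used in this section can be checked numerically. The following is a minimal sketch, assuming $q(\beta)$ is Gaussian with mean `mu` and covariance `Sigma`; the function names and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np

def expected_sq_residual(y, X, mu, Sigma):
    """Closed form of E_q ||y - X beta||^2 when q(beta) = N(mu, Sigma)."""
    resid = y - X @ mu
    return float(resid @ resid + np.trace(X.T @ X @ Sigma))

def monte_carlo_sq_residual(y, X, mu, Sigma, draws=50_000, seed=0):
    """Monte Carlo estimate of the same expectation, for comparison only."""
    rng = np.random.default_rng(seed)
    betas = rng.multivariate_normal(mu, Sigma, size=draws)  # shape (draws, p)
    resid = y[None, :] - betas @ X.T                         # shape (draws, n)
    return float(np.mean(np.sum(resid**2, axis=1)))

# Toy data (hypothetical): 50 observations, 3 coefficients.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=50)
mu = np.array([1.0, -2.0, 0.5])
Sigma = 0.1 * np.eye(3)

exact = expected_sq_residual(y, X, mu, Sigma)
approx = monte_carlo_sq_residual(y, X, mu, Sigma)
```

The two values should agree up to Monte Carlo error, which is a cheap sanity check before the closed form is used inside an iterative scheme.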
Model Coefficients
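For the regression coefficients, the full conditional is
$$\log p(\beta \mid \text{rest}) = -\frac{1}{2\sigma^2}\|y - X\beta\|^2 + \log p(\beta) + \text{const},$$
from which the optimal density is
$$\log q^*(\beta) = -\frac{1}{2}E_q[1/\sigma^2]\,\|y - X\beta\|^2 + \log p(\beta) + \text{const}.$$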
The exact formulation of this density depends upon the form of
and the resulting density
.
Assuming that
,
replace
with
.
 
Inverse-Gamma prior on $\sigma^2$
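For the variance component, assuming $\sigma^2 \sim \text{IG}(a_0, b_0)$,
$$\log p(\sigma^2 \mid \text{rest}) = -\left(a_0 + \frac{n}{2} + 1\right)\log\sigma^2 - \frac{1}{\sigma^2}\left(b_0 + \frac{1}{2}\|y - X\beta\|^2\right) + \text{const},$$
from which the optimal density is
$$q^*(\sigma^2) = \text{IG}\left(a_0 + \frac{n}{2},\; b_0 + \frac{1}{2}E_q\|y - X\beta\|^2\right),$$
implying that
$$E_q[1/\sigma^2] = \frac{a_0 + n/2}{b_0 + \frac{1}{2}E_q\|y - X\beta\|^2}$$
in the variational parameters for $\beta$.
An alternative form for the $E_q\|y - X\beta\|^2$ term which avoids repeated computation of statistics is to use
$$E_q\|y - X\beta\|^2 = y^\top y - 2\mu_q^\top X^\top y + \text{tr}\{X^\top X(\Sigma_q + \mu_q\mu_q^\top)\},$$
where $\mu_q$ and $\Sigma_q$ are the mean and covariance of $q(\beta)$, so that $y^\top y$, $X^\top y$ and $X^\top X$ need only be computed once.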
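Assuming $\beta \sim N(\mu_0, \Sigma_0)$ and writing $q(\beta) = N(\mu_q, \Sigma_q)$ and $q(\sigma^2) = \text{IG}(a_q, b_q)$ with $a_q = a_0 + n/2$, the updates are then
$$\Sigma_q \leftarrow \left(\frac{a_q}{b_q}X^\top X + \Sigma_0^{-1}\right)^{-1}, \qquad \mu_q \leftarrow \Sigma_q\left(\frac{a_q}{b_q}X^\top y + \Sigma_0^{-1}\mu_0\right), \qquad b_q \leftarrow b_0 + \frac{1}{2}E_q\|y - X\beta\|^2,$$
until the change in the lower bound is below a specified tolerance level, indicating convergence.
The lower bound itself, evaluated immediately after the $b_q$ update and with $p$ the number of columns of $X$, is
$$-\frac{n}{2}\log(2\pi) + \frac{p}{2} + \frac{1}{2}\log\frac{|\Sigma_q|}{|\Sigma_0|} - \frac{1}{2}\left\{(\mu_q - \mu_0)^\top\Sigma_0^{-1}(\mu_q - \mu_0) + \text{tr}(\Sigma_0^{-1}\Sigma_q)\right\} + a_0\log b_0 - a_q\log b_q + \log\Gamma(a_q) - \log\Gamma(a_0).$$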
 
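The iterative scheme for the inverse-gamma case can be sketched as below. This is a minimal illustration rather than the author's implementation: it assumes a prior $\beta \sim N(\mu_0, \Sigma_0)$ and $\sigma^2 \sim \text{IG}(a_0, b_0)$ in the shape/rate parameterisation, and the function name `cavi_normal_ig` and all argument names are hypothetical. The lower bound is evaluated right after the rate update, where several terms cancel.

```python
import numpy as np
from math import lgamma, log, pi

def cavi_normal_ig(y, X, mu0, Sigma0, a0, b0, tol=1e-8, max_iter=500):
    """Mean-field updates for q(beta) = N(mu_q, Sigma_q), q(sigma^2) = IG(a_q, b_q)."""
    n, p = X.shape
    # Sufficient statistics, computed once and reused in every iteration.
    XtX, Xty, yty = X.T @ X, X.T @ y, float(y @ y)
    Sigma0_inv = np.linalg.inv(Sigma0)
    _, logdet0 = np.linalg.slogdet(Sigma0)
    a_q = a0 + 0.5 * n   # shape is fixed by the model
    b_q = b0             # starting value for the rate (an arbitrary choice)
    elbo_path = []
    for _ in range(max_iter):
        e_inv_s2 = a_q / b_q  # E_q[1 / sigma^2]
        Sigma_q = np.linalg.inv(e_inv_s2 * XtX + Sigma0_inv)
        mu_q = Sigma_q @ (e_inv_s2 * Xty + Sigma0_inv @ mu0)
        # E_q ||y - X beta||^2 from the precomputed statistics.
        S = yty - 2.0 * mu_q @ Xty + np.trace(XtX @ (Sigma_q + np.outer(mu_q, mu_q)))
        b_q = b0 + 0.5 * S
        # Lower bound, valid immediately after the b_q update.
        _, logdet_q = np.linalg.slogdet(Sigma_q)
        d = mu_q - mu0
        elbo = (-0.5 * n * log(2 * pi) + 0.5 * p + 0.5 * (logdet_q - logdet0)
                - 0.5 * (d @ Sigma0_inv @ d + np.trace(Sigma0_inv @ Sigma_q))
                + a0 * log(b0) - lgamma(a0) + lgamma(a_q) - a_q * log(b_q))
        elbo_path.append(float(elbo))
        if len(elbo_path) > 1 and abs(elbo_path[-1] - elbo_path[-2]) < tol:
            break
    return mu_q, Sigma_q, a_q, b_q, elbo_path

# Toy usage with simulated data (all values hypothetical).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.7 * rng.normal(size=200)
mu_q, Sigma_q, a_q, b_q, elbo_path = cavi_normal_ig(
    y, X, mu0=np.zeros(3), Sigma0=100.0 * np.eye(3), a0=0.01, b0=0.01)
```

Because every step is an exact coordinate update, the recorded lower bound should never decrease, which makes `elbo_path` a useful debugging aid as well as a stopping rule.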
Half-$t$ prior on $\sigma$
(Maestrini and Wand 2020)
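The optimal density for $\beta$ is unchanged from the previous section.
For $\sigma^2$, using the representation $\sigma^2 \mid a \sim \text{IG}(\nu/2, \nu/a)$, $a \sim \text{IG}(1/2, 1/A^2)$,
$$\log p(\sigma^2 \mid \text{rest}) = -\left(\frac{n + \nu}{2} + 1\right)\log\sigma^2 - \frac{1}{\sigma^2}\left(\frac{\nu}{a} + \frac{1}{2}\|y - X\beta\|^2\right) + \text{const},$$
from which the optimal density is
$$q^*(\sigma^2) = \text{IG}\left(\frac{n + \nu}{2},\; \nu E_q[1/a] + \frac{1}{2}E_q\|y - X\beta\|^2\right).$$
For $a$,
$$\log p(a \mid \text{rest}) = -\left(\frac{\nu + 1}{2} + 1\right)\log a - \frac{1}{a}\left(\frac{\nu}{\sigma^2} + \frac{1}{A^2}\right) + \text{const},$$
from which the optimal density is
$$q^*(a) = \text{IG}\left(\frac{\nu + 1}{2},\; \nu E_q[1/\sigma^2] + \frac{1}{A^2}\right).$$
The updates are then
$$E_q[1/\sigma^2] \leftarrow \frac{(n + \nu)/2}{\nu E_q[1/a] + \frac{1}{2}E_q\|y - X\beta\|^2}, \qquad E_q[1/a] \leftarrow \frac{(\nu + 1)/2}{\nu E_q[1/\sigma^2] + 1/A^2},$$
together with the update for $q(\beta)$ from the previous section.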
 
 
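The half-$t$ case differs from the inverse-gamma case only in the variance updates. The sketch below assumes the auxiliary-variable representation $\sigma^2 \mid a \sim \text{IG}(\nu/2, \nu/a)$, $a \sim \text{IG}(1/2, 1/A^2)$ and a prior $\beta \sim N(\mu_0, \Sigma_0)$; the function name `cavi_normal_half_t` is hypothetical, and for brevity convergence is monitored through the relative change in $E_q[1/\sigma^2]$ rather than through the lower bound.

```python
import numpy as np

def cavi_normal_half_t(y, X, mu0, Sigma0, A, nu, tol=1e-10, max_iter=1000):
    """Mean-field updates with sigma ~ Half-t(A, nu) via an auxiliary variable a."""
    n, p = X.shape
    XtX, Xty, yty = X.T @ X, X.T @ y, float(y @ y)
    Sigma0_inv = np.linalg.inv(Sigma0)
    e_inv_s2, e_inv_a = 1.0, 1.0  # starting values for E_q[1/sigma^2], E_q[1/a]
    for _ in range(max_iter):
        # q(beta) update, same form as under the inverse-gamma prior.
        Sigma_q = np.linalg.inv(e_inv_s2 * XtX + Sigma0_inv)
        mu_q = Sigma_q @ (e_inv_s2 * Xty + Sigma0_inv @ mu0)
        S = yty - 2.0 * mu_q @ Xty + np.trace(XtX @ (Sigma_q + np.outer(mu_q, mu_q)))
        # q(sigma^2) = IG((n + nu)/2, nu * E[1/a] + S/2)
        new_e_inv_s2 = 0.5 * (n + nu) / (nu * e_inv_a + 0.5 * S)
        # q(a) = IG((nu + 1)/2, nu * E[1/sigma^2] + 1/A^2)
        e_inv_a = 0.5 * (nu + 1.0) / (nu * new_e_inv_s2 + 1.0 / A**2)
        converged = abs(new_e_inv_s2 - e_inv_s2) < tol * abs(e_inv_s2)
        e_inv_s2 = new_e_inv_s2
        if converged:
            break
    return mu_q, Sigma_q, e_inv_s2, e_inv_a

# Toy usage with simulated data (all values hypothetical).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.7 * rng.normal(size=200)
mu_q, Sigma_q, e_inv_s2, e_inv_a = cavi_normal_half_t(
    y, X, mu0=np.zeros(3), Sigma0=100.0 * np.eye(3), A=25.0, nu=1.0)
```

With a large scale `A` the prior is diffuse, so the fitted $1 / E_q[1/\sigma^2]$ should sit close to the residual variance of the data.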
Linear Regression
(Wand et al. 2010)
We replace
by
 which is equivalent to
Assuming an inverse-gamma prior on
,
the full-conditionals satisfy
We use the factorisation
which, from the form of the full-conditionals above, results in
The ELBO is then
 
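The equivalence this section relies on can be illustrated numerically, under a loudly stated assumption: that the replacement likelihood here is the Student-$t$, one of the elaborate distributions treated by Wand et al., with the usual scale-mixture form $\varepsilon_i \mid a_i \sim N(0, a_i)$, $a_i \sim \text{IG}(\nu/2, \nu/2)$. If the section uses a different likelihood, the sketch does not apply. The function name is hypothetical.

```python
import numpy as np

def sample_t_via_scale_mixture(nu, size, rng):
    """Draw standard t_nu variates as a normal scale mixture.

    a ~ IG(nu/2, nu/2) is equivalent to 1/a ~ Gamma(shape=nu/2, rate=nu/2),
    and NumPy's gamma sampler takes scale = 1/rate.
    """
    inv_a = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=size)
    z = rng.standard_normal(size)
    return z / np.sqrt(inv_a)  # same as z * sqrt(a)

rng = np.random.default_rng(0)
nu = 8.0
mix = sample_t_via_scale_mixture(nu, 200_000, rng)
direct = rng.standard_t(nu, size=200_000)  # reference draws from NumPy

probs = [0.1, 0.5, 0.9]
mix_q = np.quantile(mix, probs)
direct_q = np.quantile(direct, probs)
```

Matching quantiles between the two samples is what justifies treating the auxiliary variables $a_i$ as extra latent quantities with conjugate full conditionals.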
Linear Mixed Models
Either of the previous likelihoods may be extended to a hierarchical
model
where we specify grouped parameters,
 such that
hint at the structure of
,
e.g. (but not restricted to)
This covers combinations of different hierarchical terms in the
linear predictor, for example,
 or
Similarly to the fixed effects case, we can consider priors on the
variance components
 where if
then
and
The derivation for the optimal densities follows that for the linear
regression model. Define
,
,
and
Then
from which the optimal density is
Suppose we again assumed
and
,
then similar to before
The new variational densities are those for
.
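Consider just one $u_k$ where $u_{k,i} \mid \Sigma_k \sim N(0, \Sigma_k)$ independently for $i = 1, \dots, m_k$, each of dimension $d_k$
(i.e. a standard random effects model) and where $\Sigma_k \sim \text{IW}(\nu_k, \Psi_k)$,
then
$$\log p(\Sigma_k \mid \text{rest}) = -\frac{\nu_k + m_k + d_k + 1}{2}\log|\Sigma_k| - \frac{1}{2}\text{tr}(\Psi_k\Sigma_k^{-1}) - \frac{1}{2}\sum_{i=1}^{m_k} u_{k,i}^\top\Sigma_k^{-1}u_{k,i} + \text{const}.$$
Note that
$$\sum_{i=1}^{m_k} u_{k,i}^\top\Sigma_k^{-1}u_{k,i} = \text{tr}\left(\Sigma_k^{-1}\sum_{i=1}^{m_k} u_{k,i}u_{k,i}^\top\right).$$
So the optimal density is also
Inverse-Wishart, i.e.
$$q^*(\Sigma_k) = \text{IW}(\nu_{q,k}, \Psi_{q,k})$$
where
$$\nu_{q,k} = \nu_k + m_k, \qquad \Psi_{q,k} = \Psi_k + \sum_{i=1}^{m_k} E_q[u_{k,i}u_{k,i}^\top]$$
and where the required expectations are
$$E_q[u_{k,i}u_{k,i}^\top] = E_q[u_{k,i}]E_q[u_{k,i}]^\top + \text{Cov}_q[u_{k,i}], \qquad E_q[\Sigma_k^{-1}] = \nu_{q,k}\Psi_{q,k}^{-1}.$$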
 
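The Inverse-Wishart update for a single random-effects covariance can be sketched as follows. This assumes the parameterisation in which $\Sigma \sim \text{IW}(\nu, \Psi)$ has density proportional to $|\Sigma|^{-(\nu + d + 1)/2}\exp\{-\tfrac{1}{2}\text{tr}(\Psi\Sigma^{-1})\}$, so that $E[\Sigma^{-1}] = \nu\Psi^{-1}$; the function name and the arguments `mu_u` (variational means, one row per group) and `Sigma_u` (the matching marginal covariances) are hypothetical.

```python
import numpy as np

def inverse_wishart_update(nu0, Psi0, mu_u, Sigma_u):
    """Update q(Sigma) = IW(nu_q, Psi_q) for m random-effect vectors of dimension d.

    mu_u    : array (m, d), E_q[u_i] for each group i
    Sigma_u : array (m, d, d), Cov_q[u_i] for each group i
    """
    mu_u = np.asarray(mu_u, dtype=float)
    Sigma_u = np.asarray(Sigma_u, dtype=float)
    m = mu_u.shape[0]
    # sum_i E_q[u_i u_i^T] = sum_i (mu_i mu_i^T + Sigma_i)
    second_moments = np.einsum("ij,ik->jk", mu_u, mu_u) + Sigma_u.sum(axis=0)
    nu_q = nu0 + m
    Psi_q = np.asarray(Psi0, dtype=float) + second_moments
    e_inv_Sigma = nu_q * np.linalg.inv(Psi_q)  # E_q[Sigma^{-1}]
    return nu_q, Psi_q, e_inv_Sigma

# Small worked case: three groups, two-dimensional effects, zero q-covariances.
nu_q, Psi_q, e_inv_Sigma = inverse_wishart_update(
    nu0=4.0,
    Psi0=np.eye(2),
    mu_u=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    Sigma_u=np.zeros((3, 2, 2)),
)
```

Only the marginal mean and covariance of each $u_{k,i}$ under $q$ enter the update, so the same function applies whether or not $q$ treats the random effects jointly with the fixed effects.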
Notation
The symbol
is used to indicate equality up to an additive constant, similar to
$\propto$ for multiplicative constants.
 
References
Wand, Matt, JT Ormerod, SA Padoan, and R Fruhwirth. 2010.
“Variational Bayes for Elaborate Distributions.”