Deriving the Posterior Distribution for Population Variance

In an earlier post, we discussed how to derive the posterior distribution for the population mean. In this post, we derive the posterior distribution for the variance parameter (more precisely, the covariance matrix), which appears in many types of Bayesian inference.

Background Context

In many Bayesian contexts (e.g., hierarchical Bayesian linear regression, hierarchical Bayesian estimation of discrete choice models, etc.), the following situation arises:

1. There are $$n$$ respondents whose responses can be modeled by a set of independent variables and associated parameters. We will denote these parameters by $$\beta_i$$.
2. We assume that these individual parameters are drawn from a normal distribution with mean $$\bar{\beta}$$ and covariance matrix $$\Sigma$$, i.e., $$\beta_i \sim N(\bar{\beta}, \Sigma)$$.
3. In Bayesian analysis, we incorporate prior information, and one way to do so for the covariance matrix is to assume that $$\Sigma \sim W^{-1} \left(\Psi, \mathcal{v}\right)$$, where $$W^{-1}(.)$$ denotes the inverse-Wishart distribution.

Given the above context, the goal is to find the posterior distribution for $$\Sigma$$ conditional on the values of all the other parameters (written as $$-$$ in the conditioning below), so that we can generate draws from that distribution. The posterior distribution for $$\Sigma$$ is again an inverse-Wishart distribution, as given below:

$$\Sigma \,|\, - \;\sim\; W^{-1} \left(\Psi + \sum_i{ \left(\beta_i-\bar{\beta}\right) \left(\beta_i-\bar{\beta}\right)^\top}, \mathcal{v} + n\right)$$
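To make the result concrete, here is a minimal sketch of how one draw from this posterior might be generated with SciPy's `invwishart`. The function name `draw_sigma_posterior`, the simulated $$\beta_i$$ values, and the prior settings ($$\Psi = I$$, $$\mathcal{v} = p + 2$$, where $$p$$ is the length of each $$\beta_i$$) are illustrative assumptions and not part of the original derivation.

```python
import numpy as np
from scipy.stats import invwishart


def draw_sigma_posterior(betas, beta_bar, psi, nu, rng=None):
    """Draw Sigma from W^{-1}(Psi + S, nu + n).

    betas    : (n, p) array, one row per respondent (hypothetical input layout)
    beta_bar : (p,) population mean
    psi, nu  : prior scale matrix and prior degrees of freedom
    """
    betas = np.asarray(betas, dtype=float)
    n = betas.shape[0]
    dev = betas - beta_bar      # row i holds (beta_i - beta_bar)
    S = dev.T @ dev             # sum_i (beta_i - beta_bar)(beta_i - beta_bar)^T
    return invwishart.rvs(df=nu + n, scale=psi + S, random_state=rng)


# Simulated inputs, for illustration only.
rng = np.random.default_rng(0)
p, n = 3, 200
true_sigma = np.array([[1.0, 0.3, 0.0],
                       [0.3, 2.0, 0.5],
                       [0.0, 0.5, 1.5]])
beta_bar = np.zeros(p)
betas = rng.multivariate_normal(beta_bar, true_sigma, size=n)

# Assumed prior: Psi = identity, nu = p + 2.
sigma_draw = draw_sigma_posterior(betas, beta_bar, psi=np.eye(p), nu=p + 2, rng=rng)
print(sigma_draw)
```

In a Gibbs sampler, a step like this would typically be repeated at every iteration, with `betas` and `beta_bar` replaced by their current draws.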

The next section gives a rough outline of the proof. Some steps are skipped or glossed over. Feel free to ask for clarification in the comments.

Informal Sketch of Proof

We assumed that $$\beta_i \sim N({\bar{\beta}}, \Sigma)$$ and $$\Sigma \sim W^{-1} \left(\Psi, \mathcal{v}\right)$$. Thus, using Bayes' theorem, the posterior for $$\Sigma$$ is proportional to the prior density of $$\Sigma$$ times the likelihood of the $$\beta_i$$:

$$f \left(\Sigma | - \right) \propto f \left(\Sigma | \Psi, \mathcal{v} \right) \prod_i \left\{ f \left( \beta_i | \bar{\beta}, \Sigma \right) \right\}$$

Collecting the terms that involve $$\Sigma$$, and writing $$p$$ for the dimension of $$\beta_i$$ (so that $$\Sigma$$ is $$p \times p$$), we get the following:

$$|\Sigma|^{-\frac{\mathcal{v}+p+1}{2}} e^{-\frac{1}{2} tr\left(\Psi \Sigma^{-1} \right)} |\Sigma|^{-\frac{n}{2}} e^{-\frac{1}{2} \sum_i{\left(\beta_i-\bar{\beta}\right)^\top \Sigma^{-1} \left(\beta_i-\bar{\beta}\right)} }$$

Each quadratic form is a scalar and therefore equals its own trace, so we can write $$\left(\beta_i-\bar{\beta}\right)^\top \Sigma^{-1} \left(\beta_i-\bar{\beta}\right) = tr \left( \left(\beta_i-\bar{\beta}\right)^\top \Sigma^{-1} \left(\beta_i-\bar{\beta}\right) \right)$$. Using two properties of the trace, $$tr(ABC) = tr(CAB)$$ and $$tr(A+B) = tr(A) + tr(B)$$, we obtain the following:

$$|\Sigma|^{-\frac{\mathcal{v}+p+1}{2}} e^{-\frac{1}{2} tr\left(\Psi \Sigma^{-1} \right)} |\Sigma|^{-\frac{n}{2}} e^{-\frac{1}{2} tr \left( \sum_i{ \left(\beta_i-\bar{\beta}\right) \left(\beta_i-\bar{\beta}\right)^\top} \Sigma^{-1} \right) }$$
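The trace step can be checked numerically. The sketch below compares the sum of quadratic forms with the trace form on randomly generated inputs; all values and variable names here are made up purely for the check.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 4

# Arbitrary test inputs (assumptions for illustration).
beta_bar = rng.normal(size=p)
betas = rng.normal(size=(n, p))
A = rng.normal(size=(p, p))
sigma = A @ A.T + p * np.eye(p)     # symmetric positive definite by construction
sigma_inv = np.linalg.inv(sigma)

dev = betas - beta_bar              # row i holds (beta_i - beta_bar)

# Left-hand side: sum_i (beta_i - beta_bar)^T Sigma^{-1} (beta_i - beta_bar)
quad_sum = sum(d @ sigma_inv @ d for d in dev)

# Right-hand side: tr( [sum_i (beta_i - beta_bar)(beta_i - beta_bar)^T] Sigma^{-1} )
S = dev.T @ dev
trace_form = np.trace(S @ sigma_inv)

print(quad_sum, trace_form)
```

The two printed numbers should agree up to floating-point rounding, which is what the cyclic and additive properties of the trace guarantee.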

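Combining the two determinant factors and the two exponentials gives the kernel

$$|\Sigma|^{-\frac{\mathcal{v}+n+p+1}{2}} e^{-\frac{1}{2} tr\left( \left(\Psi + \sum_i{ \left(\beta_i-\bar{\beta}\right) \left(\beta_i-\bar{\beta}\right)^\top}\right) \Sigma^{-1} \right)}$$

which has the form of an inverse-Wishart density with scale matrix $$\Psi + \sum_i{ \left(\beta_i-\bar{\beta}\right) \left(\beta_i-\bar{\beta}\right)^\top}$$ and $$\mathcal{v} + n$$ degrees of freedom. Thus, it follows that the posterior distribution for $$\Sigma$$ is: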

$$\Sigma \,|\, - \;\sim\; W^{-1} \left(\Psi + \sum_i{ \left(\beta_i-\bar{\beta}\right) \left(\beta_i-\bar{\beta}\right)^\top}, \mathcal{v} + n\right)$$
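As a sanity check on the conjugacy result, the sketch below verifies numerically that log prior plus log likelihood minus the log density of the claimed posterior stays the same across several candidate values of $$\Sigma$$, which is what proportionality requires. The simulated data, the prior settings, and the helper name `log_ratio` are assumptions made for this illustration.

```python
import numpy as np
from scipy.stats import invwishart, multivariate_normal

rng = np.random.default_rng(2)
n, p = 30, 3

# Simulated inputs and an assumed prior (Psi = identity, nu = p + 2).
beta_bar = np.zeros(p)
betas = rng.normal(size=(n, p))
psi, nu = np.eye(p), p + 2

dev = betas - beta_bar
S = dev.T @ dev                     # sum of outer products of deviations


def log_ratio(sigma):
    """log prior + log likelihood - log claimed posterior, evaluated at sigma."""
    log_prior = invwishart.logpdf(sigma, df=nu, scale=psi)
    log_lik = multivariate_normal.logpdf(betas, mean=beta_bar, cov=sigma).sum()
    log_post = invwishart.logpdf(sigma, df=nu + n, scale=psi + S)
    return log_prior + log_lik - log_post


# Evaluate at a few arbitrary positive definite matrices.
candidates = [invwishart.rvs(df=p + 5, scale=np.eye(p), random_state=rng)
              for _ in range(5)]
ratios = np.array([log_ratio(s) for s in candidates])
spread = ratios.max() - ratios.min()
print(ratios, spread)
```

If the posterior is correct, every entry of `ratios` equals the same constant (the log normalizing constant), so `spread` should be zero up to rounding error.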