Deriving the Posterior Distribution for Population Mean

Background Context

In many Bayesian settings (e.g., hierarchical Bayesian linear regression, hierarchical Bayesian estimation of discrete choice models, etc.), the following situation arises:

1. There are $$n$$ respondents whose responses can be modeled by a set of independent variables and associated parameters. We denote respondent $$i$$'s parameters by $$\beta_i$$.
2. We assume that these individual-level parameters come from a normal distribution with mean $$\bar{\beta}$$ and covariance matrix $$\Sigma$$, i.e., $$\beta_i \sim N(\bar{\beta}, \Sigma)$$.
3. In Bayesian analysis, we incorporate prior information; one way to do so here is to assume that $$\bar{\beta} \sim N \left(\beta_0, \Sigma_0 \right)$$.

Given this setup, the goal is to derive the full conditional (posterior) distribution of $$\bar{\beta}$$ given the values of all the other parameters, so that we can generate draws from it. This posterior is the normal distribution given below:

$$\bar{\beta} \mid - \;\sim\; N\left( \left(\Sigma_0^{-1} + n \Sigma^{-1} \right)^{-1} \left( \Sigma_0^{-1} \beta_0 + \sum_i \Sigma^{-1} \beta_i \right), \left(\Sigma_0^{-1} + n \Sigma^{-1} \right)^{-1}\right)$$
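As a quick numerical sanity check, the sketch below evaluates the posterior mean and covariance from this formula on simulated data and draws one sample. All specific values ($$n = 50$$, the diagonal covariance matrices, the "true" $$\bar{\beta}$$) are made up purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative (made-up) values: n respondents, k parameters each.
n, k = 50, 3
beta_0 = np.zeros(k)            # prior mean for bar_beta
Sigma_0 = 10.0 * np.eye(k)      # prior covariance (fairly diffuse)
Sigma = 2.0 * np.eye(k)         # population covariance, treated as known
true_beta_bar = np.array([1.0, -0.5, 2.0])

# Simulate the individual-level parameters beta_i ~ N(bar_beta, Sigma)
betas = rng.multivariate_normal(true_beta_bar, Sigma, size=n)

# Posterior covariance and mean, straight from the formula above
Sigma_0_inv = np.linalg.inv(Sigma_0)
Sigma_inv = np.linalg.inv(Sigma)
post_cov = np.linalg.inv(Sigma_0_inv + n * Sigma_inv)
post_mean = post_cov @ (Sigma_0_inv @ beta_0 + Sigma_inv @ betas.sum(axis=0))

# One draw from the posterior of bar_beta
draw = rng.multivariate_normal(post_mean, post_cov)
```

With a diffuse prior like this one, the posterior mean sits very close to the sample mean of the $$\beta_i$$, shrunk only slightly toward $$\beta_0$$.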

The next section gives a rough outline of the proof. Some steps are skipped and glossed over. Feel free to ask for clarification in the comments.

Informal Sketch of Proof

We assumed that $$\beta_i \sim N({\bar{\beta}}, \Sigma)$$ and $${\bar{\beta}} \sim N({\beta_0}, \Sigma_0)$$. Thus, by Bayes' theorem, the posterior for $$\bar{\beta}$$ is proportional to the prior times the likelihood:

$$f \left(\bar{\beta} \mid - \right) \propto f \left(\bar{\beta}\right) \prod_i \left\{ f \left( \beta_i \mid \bar{\beta} \right) \right\}$$

Both densities are Gaussian, so the exponent of this product is, up to a factor of $$-\tfrac{1}{2}$$, the sum of two quadratic forms:

$$\left(\bar{\beta} - \beta_0 \right)^\top \Sigma_0^{-1} \left(\bar{\beta} - \beta_0 \right) + \sum_i \left(\beta_i - \bar{\beta} \right)^\top \Sigma^{-1} \left(\beta_i - \bar{\beta} \right)$$

Expanding the quadratic forms using the property $$(A+B)^\top = A^\top + B^\top$$ and retaining only the terms that involve $$\bar{\beta}$$, we get:

$$\left(\bar{\beta}^\top \Sigma_0^{-1} \bar{\beta} \; - \; \bar{\beta}^\top \Sigma_0^{-1} \beta_0 \; - \; \beta_0^\top \Sigma_0^{-1} \bar{\beta} \right) \; + \; \sum_i \left( - \; \beta_i^\top \Sigma^{-1} \bar{\beta} \; - \; \bar{\beta}^\top \Sigma^{-1} \beta_i \; + \; \bar{\beta}^\top \Sigma^{-1} \bar{\beta} \right)$$

Using the property $$(A B)^\top = B^\top A^\top$$, the symmetry of $$\Sigma_0^{-1}$$ and $$\Sigma^{-1}$$, and the fact that each of these cross terms is a scalar (and so equals its own transpose), we get:

$$\left(\bar{\beta}^\top \Sigma_0^{-1} \bar{\beta} \; - \; 2 \beta_0^\top \Sigma_0^{-1} \bar{\beta} \right) \; + \; \sum_i \left( - \; 2 \beta_i^\top \Sigma^{-1} \bar{\beta} \; + \; \bar{\beta}^\top \Sigma^{-1} \bar{\beta} \right)$$

Collecting terms and further simplification leads to:

$$\bar{\beta}^\top \left( \Sigma_0^{-1} + n \Sigma^{-1} \right) \bar{\beta} \; - \; 2 \left( \beta_0^\top \Sigma_0^{-1} + \sum_i \beta_i^\top \; \Sigma^{-1} \right) \bar{\beta}$$
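The completing-the-square step that follows rests on the identity $$x^\top A x - 2 b^\top x = (x - m)^\top A (x - m) - b^\top m$$ with $$m = A^{-1} b$$, where $$A$$ plays the role of the combined precision $$\Sigma_0^{-1} + n \Sigma^{-1}$$ and $$b$$ the linear term $$\Sigma_0^{-1} \beta_0 + \Sigma^{-1} \sum_i \beta_i$$. A small numerical check of that identity, with random stand-ins for $$A$$ and $$b$$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random symmetric positive-definite A, standing in for Sigma_0^{-1} + n Sigma^{-1},
# and a vector b standing in for Sigma_0^{-1} beta_0 + Sigma^{-1} sum_i beta_i.
k = 3
M = rng.normal(size=(k, k))
A = M @ M.T + k * np.eye(k)     # symmetric positive definite
b = rng.normal(size=k)
m = np.linalg.solve(A, b)       # m = A^{-1} b, the completed-square center

# Evaluate both sides of the identity at an arbitrary point x (i.e., bar_beta)
x = rng.normal(size=k)
lhs = x @ A @ x - 2 * b @ x
rhs = (x - m) @ A @ (x - m) - b @ m
```

Since the leftover term $$b^\top m$$ does not involve $$\bar{\beta}$$, it is absorbed into the proportionality constant.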

Completing the square in matrix form, we recognize the kernel of a normal density, so the posterior for $$\bar{\beta}$$ is given by:

$$\bar{\beta} \mid - \;\sim\; N\left( \left(\Sigma_0^{-1} + n \Sigma^{-1} \right)^{-1} \left( \Sigma_0^{-1} \beta_0 + \sum_i \Sigma^{-1} \beta_i \right), \left(\Sigma_0^{-1} + n \Sigma^{-1} \right)^{-1}\right)$$
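In a Gibbs sampler for the full hierarchical model, this result supplies the step that draws $$\bar{\beta}$$ conditional on the current $$\beta_i$$ and $$\Sigma$$. A minimal helper for that step might look like the sketch below (Python with NumPy; the function name and the example inputs are made up for illustration):

```python
import numpy as np

def draw_beta_bar(betas, Sigma, beta_0, Sigma_0, rng):
    """One draw of bar_beta from its full conditional
    N(mu_n, V_n), where V_n = (Sigma_0^{-1} + n Sigma^{-1})^{-1}
    and mu_n = V_n (Sigma_0^{-1} beta_0 + Sigma^{-1} sum_i beta_i).

    betas is an (n, k) array holding the current individual-level draws."""
    n = betas.shape[0]
    Sigma_0_inv = np.linalg.inv(Sigma_0)
    Sigma_inv = np.linalg.inv(Sigma)
    V_n = np.linalg.inv(Sigma_0_inv + n * Sigma_inv)
    mu_n = V_n @ (Sigma_0_inv @ beta_0 + Sigma_inv @ betas.sum(axis=0))
    return rng.multivariate_normal(mu_n, V_n)

# Example usage with made-up inputs
rng = np.random.default_rng(7)
k, n = 2, 100
betas = rng.multivariate_normal([1.0, 2.0], np.eye(k), size=n)
sample = draw_beta_bar(betas, np.eye(k), np.zeros(k), 100.0 * np.eye(k), rng)
```

Within a full sampler this call would alternate with draws of the $$\beta_i$$ and of $$\Sigma$$ from their own full conditionals, which are outside the scope of this post.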