Deriving the Posterior Distribution for Population Mean

Background Context

In many Bayesian settings (e.g., hierarchical Bayesian linear regression, hierarchical Bayesian estimation of discrete choice models), the following situation arises:

  1. There are \(n\) respondents whose responses can be modeled by a set of independent variables and associated parameters. We denote the parameters for respondent \(i\) by \(\beta_i\).

  2. We assume that these individual parameters come from a normal distribution with mean \(\bar{\beta}\) and covariance matrix \(\Sigma\), i.e., \(\beta_i \sim N(\bar{\beta}, \Sigma)\).

  3. In Bayesian analysis we incorporate prior information, and one way to do so is to assume that \(\bar{\beta} \sim N \left(\beta_0, \Sigma_0 \right)\).
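As a concrete illustration, the setup in steps 1–3 can be simulated as follows. This is a minimal NumPy sketch; the dimensions and hyperparameter values are illustrative assumptions, not part of the model above.

```python
import numpy as np

rng = np.random.default_rng(0)

d, n = 2, 50                      # dimension of beta and number of respondents (assumed)

# Prior hyperparameters (illustrative values)
beta_0 = np.zeros(d)              # prior mean of beta-bar
Sigma_0 = np.eye(d)               # prior covariance of beta-bar
Sigma = 0.5 * np.eye(d)           # covariance of individual betas around beta-bar

# Step 3: draw the population mean from its prior
beta_bar = rng.multivariate_normal(beta_0, Sigma_0)

# Step 2: draw the individual-level parameters around the population mean
betas = rng.multivariate_normal(beta_bar, Sigma, size=n)   # shape (n, d)
```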

Given the above context, the goal is to derive the posterior distribution of \(\bar{\beta}\), conditional on the values of all the other parameters, so that we can generate draws from it. This posterior is the normal distribution given below:

$$\bar{\beta} \mid - \; \sim \; N\left( \left(\Sigma_0^{-1} + n \Sigma^{-1} \right)^{-1} \left( \Sigma_0^{-1} \beta_0 + \Sigma^{-1} \sum_i \beta_i \right), \left(\Sigma_0^{-1} + n \Sigma^{-1} \right)^{-1}\right)$$
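Given the individual \(\beta_i\) and the hyperparameters, this posterior mean and covariance can be computed directly, and a draw generated from the resulting normal. A minimal NumPy sketch (the function name and the example values are assumptions for illustration):

```python
import numpy as np

def posterior_beta_bar(betas, beta_0, Sigma_0, Sigma):
    """Posterior mean and covariance of beta-bar given the individual betas."""
    n = betas.shape[0]
    prec_0 = np.linalg.inv(Sigma_0)             # prior precision
    prec = np.linalg.inv(Sigma)                 # within-population precision
    post_cov = np.linalg.inv(prec_0 + n * prec)
    post_mean = post_cov @ (prec_0 @ beta_0 + prec @ betas.sum(axis=0))
    return post_mean, post_cov

# Illustrative use with made-up values
rng = np.random.default_rng(1)
betas = rng.normal(size=(100, 2))               # pretend these are the current beta_i
mean, cov = posterior_beta_bar(betas, np.zeros(2), np.eye(2), np.eye(2))
draw = rng.multivariate_normal(mean, cov)       # one draw from the posterior
```

In a Gibbs sampler, a call like this would supply the conditional draw of \(\bar{\beta}\) at each iteration.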

The next section gives a rough outline of the proof. Some steps are skipped or only sketched. Feel free to ask for clarification in the comments.

Informal Sketch of Proof

We assumed that \(\beta_i \sim N({\bar{\beta}}, \Sigma)\) and \({\bar{\beta}} \sim N({\beta_0}, \Sigma_0)\). Thus, using Bayes theorem, the posterior for \(\bar{\beta}\) is given by:

$$ f \left(\bar{\beta} \mid - \right) \propto f \left(\bar{\beta}\right) \prod_i f \left( \beta_i \mid \bar{\beta} \right) $$

Writing out the exponents of the normal densities (up to a factor of \(-\frac{1}{2}\)), the terms that involve \(\bar{\beta}\) are:

$$ \left(\bar{\beta} - \beta_0 \right)^\top \Sigma_0^{-1} \left(\bar{\beta} - \beta_0 \right) + \sum_i \left(\beta_i - \bar{\beta} \right)^\top \Sigma^{-1} \left(\beta_i - \bar{\beta} \right) $$

Expanding the quadratic forms and retaining only the terms that involve \(\bar{\beta}\) (terms such as \(\beta_0^\top \Sigma_0^{-1} \beta_0\) and \(\beta_i^\top \Sigma^{-1} \beta_i\) are constants and can be dropped), we get:

$$ \left(\bar{\beta}^\top \Sigma_0^{-1} \bar{\beta} \; - \; \bar{\beta}^\top \Sigma_0^{-1} \beta_0 \; - \; \beta_0^\top \Sigma_0^{-1} \bar{\beta} \right) \; + \; \sum_i \left( - \; \beta_i^\top \Sigma^{-1} \bar{\beta} \; - \; \bar{\beta}^\top \Sigma^{-1} \beta_i \; + \; \bar{\beta}^\top \Sigma^{-1} \bar{\beta} \right) $$

Using the property \((A B)^\top = B^\top A^\top\) and the fact that a symmetric matrix equals its own transpose (so the scalar \(\bar{\beta}^\top \Sigma_0^{-1} \beta_0\) equals \(\beta_0^\top \Sigma_0^{-1} \bar{\beta}\), and likewise for the \(\Sigma^{-1}\) terms), we get:

$$ \left(\bar{\beta}^\top \Sigma_0^{-1} \bar{\beta} \; - \; 2 \beta_0^\top \Sigma_0^{-1} \bar{\beta} \right) \; + \; \sum_i \left( - \; 2 \beta_i^\top \Sigma^{-1} \bar{\beta} \; + \; \bar{\beta}^\top \Sigma^{-1} \bar{\beta} \right) $$

Collecting terms and simplifying further leads to:

$$ \bar{\beta}^\top \left( \Sigma_0^{-1} + n \Sigma^{-1} \right) \bar{\beta} \; - \; 2 \left( \beta_0^\top \Sigma_0^{-1} + \sum_i \beta_i^\top \; \Sigma^{-1} \right) \bar{\beta} $$

Using the matrix version of completing the square, we recognize the kernel of a normal density, so the posterior for \(\bar{\beta}\) is given by:
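Concretely, the completing-the-square identity used here is, for a symmetric positive-definite matrix \(A\) and vector \(b\):

$$ \bar{\beta}^\top A \bar{\beta} \; - \; 2 b^\top \bar{\beta} \; = \; \left(\bar{\beta} - A^{-1} b \right)^\top A \left(\bar{\beta} - A^{-1} b \right) \; - \; b^\top A^{-1} b $$

With \(A = \Sigma_0^{-1} + n \Sigma^{-1}\) and \(b = \Sigma_0^{-1} \beta_0 + \Sigma^{-1} \sum_i \beta_i\), the last term does not involve \(\bar{\beta}\), leaving the kernel of a normal density with mean \(A^{-1} b\) and covariance \(A^{-1}\).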

$$\bar{\beta} \mid - \; \sim \; N\left( \left(\Sigma_0^{-1} + n \Sigma^{-1} \right)^{-1} \left( \Sigma_0^{-1} \beta_0 + \Sigma^{-1} \sum_i \beta_i \right), \left(\Sigma_0^{-1} + n \Sigma^{-1} \right)^{-1}\right)$$
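In the scalar case this posterior mean reduces to a precision-weighted average of the prior mean \(\beta_0\) and the sample mean of the \(\beta_i\), which makes the shrinkage behavior easy to see. A small sketch with made-up numbers:

```python
import numpy as np

# Scalar illustration: posterior mean is a precision-weighted average of the
# prior mean beta_0 and the sample mean of the individual betas.
beta_0, sigma0_sq, sigma_sq = 0.0, 1.0, 1.0   # illustrative hyperparameters
betas = np.array([2.0, 3.0, 4.0])             # n = 3 individual parameters
n = betas.size

w_prior = 1.0 / sigma0_sq                     # prior precision
w_data = n / sigma_sq                         # total data precision
post_mean = (w_prior * beta_0 + w_data * betas.mean()) / (w_prior + w_data)
post_var = 1.0 / (w_prior + w_data)
# post_mean = 2.25, post_var = 0.25: the sample mean 3.0 is shrunk toward beta_0
```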
