Abstract
What is Gibbs sampling?
How does it work?
- Procedure
- Example
Why does it work?
- Markov Chain Monte Carlo (MCMC)
- Proof

Abstract

This article provides the recipes, simple Python codes and mathematical proof to the most basic form of Gibbs sampling.

What is Gibbs sampling?

Gibbs sampling is a method to generate samples from a multivariant distribution using only conditional distributions , and so on. It is used when the original distribution is hard to calculate but the conditional distributions are available.

How does it work?

Procedure

Starting with a sample , to generate a new sample , do the following

Sample from .
Sample from .
Sample from
… and so on. The last term is sampling from .
Now we have the new sample . Repeat for the next sample.

Note that the value of a new sample depends only on the the previous one, and not the one before, Gibbs sampling belongs to a class of methods called Markov Chain Monte Carlo (MCMC).

Example

Consider a mixture of two 2-dimensional normal distribution below with joint distribution .

We can use the procedure above to to generate samples using only the conditional distribution and . The sample chain could initially be trapped in one local distribution:

But if enough number of points are sampled, the original distribution is recovered:

The codes that generate this example can be found here.

Why does it work?

This section outlines the proof of Gibbs sampling. I mostly follow MIT 14.384 Time Series Analysis, Lectures 25 and 26.

Markov Chain Monte Carlo (MCMC)

Gibbs sampling generates a Markov chain. That’s obvious because the series , , … depends only on the previous value. An important concept to understand is the transition Probability , the probability of generating sample from the previous sample .

Suppose we want to sample a distribution , we want to generate samples in a way that the new sample still follows distribution . Mathematically, for transition probability , the probability measure is invariant if

and the transition probability is

irreducible (i.e. any point can reach any other points)
aperiodic

The MCMC problem is to find a transition probability such that is invariant.

Proof

The procedure of Gibbs sampling outlined previously can be written as a transition probability

Let be the distribution we want to sample from. We need to show is invariant under this transition, i.e.

In other words, if we follow to get new sample based on the previous sample , and repeat this process over and over again, we stay in the same distribution .

Since we need to integrate over , the first thing is to rewrite using Bayes rule so that there’s no conditional on . Bayes rule is

or if there’s always a conditional on :

Using

We get

Substituting this to Eq.(), and noticing that does not depend on the integrating variable ,

The first product

because

So we have

We need to show the integral on the right hand side to be exactly 1.

We start by writing out the terms explicitly and use ,

The only term that depends on is and it is integrated out to be 1. The integral becomes

Turning each fraction into a conditional probability

and so on. The integral becomes

Only the first term depends on and it is integrated out to be 1. After that, only the leading term depends on and it is integrated out to be 1 and so on. Therefore the whole integral is 1 resulting in

So the transition probability from Gibbs sampling is invariant. If we sample according to the Gibbs scheme, we can get unbiased samples of .