Gibbs Sampling

October 02, 2020

Yet Another MCMC Method

Like other MCMC methods, the Gibbs sampler constructs a Markov chain whose values converge towards a target distribution. Gibbs Sampling is in fact a special case of the Metropolis-Hastings algorithm in which the proposals are always accepted.
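
To see why the proposals are always accepted, consider updating one variable X while holding the remaining variables Y fixed. The Gibbs proposal draws x′ from the exact conditional, q(x′ | x, y) = p(x′ | y), so the Metropolis-Hastings acceptance ratio reduces to

p(x′, y)·p(x | y) / (p(x, y)·p(x′ | y)) = p(x′ | y)·p(y)·p(x | y) / (p(x | y)·p(y)·p(x′ | y)) = 1

and the acceptance probability, min(1, 1), is always one.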

To elaborate, suppose you wanted to sample a multivariate probability distribution.

Note: A multivariate probability distribution is a function of multiple variables (e.g. a two-dimensional normal distribution).

(Figure: a multivariate normal distribution, https://commons.wikimedia.org/wiki/File:MultivariateNormal.png)

We don’t know how to sample from this joint distribution directly. However, because of some mathematical convenience, or maybe just sheer luck, we happen to know the conditional probability distributions.

This is where Gibbs sampling comes in. Gibbs Sampling is applicable when the joint distribution is not known explicitly or is difficult to sample from directly, but the conditional distribution of each variable is known and is easier to sample from.

Gibbs Sampling Algorithm

We start off by selecting initial values for the random variables X and Y. Then, we sample from the conditional probability distribution of X given Y = Y⁰, denoted p(X|Y⁰), to obtain X¹. In the next step, we sample a new value of Y conditional on X¹, which we just computed. We repeat the procedure for an additional n - 1 iterations, alternating between drawing a new sample from the conditional distribution of X and the conditional distribution of Y, given the current value of the other random variable.
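
In code, the scheme looks roughly like the sketch below. The functions sample_x_given_y and sample_y_given_x are hypothetical placeholders for whatever conditional samplers a given problem provides.

def gibbs(sample_x_given_y, sample_y_given_x, x0, y0, n_iter):
    # Generic two-variable Gibbs sampler (a sketch, not the example below).
    xs, ys = [x0], [y0]
    for _ in range(n_iter):
        # Draw a new X from p(X|Y) using the most recent Y,
        # then a new Y from p(Y|X) using the X we just drew.
        x_new = sample_x_given_y(ys[-1])
        y_new = sample_y_given_x(x_new)
        xs.append(x_new)
        ys.append(y_new)
    return xs, ys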

Let’s take a look at an example. Suppose we had the following posterior and conditional probability distributions.

p(x, y) = C·exp(-(x²y² + x² + y² - 8x - 8y)/2)

p(x|y) = g(y)·exp(-((1 + y²)x² - 8x)/2)

p(y|x) = g(x)·exp(-((1 + x²)y² - 8y)/2)

where g(y) contains the terms that don't include x, and g(x) contains those that don't include y.

We don’t know the value of C (normalizing constant). However, we do know the conditional probability distributions. Therefore, we can use Gibbs Sampling to approximate the posterior distribution.

Note: The conditional probability distributions are in fact normal distributions, and can be rewritten as follows.

x | y ~ N(4/(1 + y²), 1/(1 + y²))

y | x ~ N(4/(1 + x²), 1/(1 + x²))
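
To see why, complete the square in x in the exponent of p(x|y):

(1 + y²)x² - 8x = (1 + y²)·(x - 4/(1 + y²))² - 16/(1 + y²)

so p(x|y) ∝ exp(-(1 + y²)·(x - 4/(1 + y²))²/2), which is a normal density with mean 4/(1 + y²) and variance 1/(1 + y²). By symmetry, the same argument applies to p(y|x) with x and y swapped.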

Given the preceding equations, we proceed to implement the Gibbs Sampling algorithm in Python. To begin, we import the following libraries.

import numpy as np  
import scipy as sp  
import matplotlib.pyplot as plt  
import pandas as pd  
import seaborn as sns  
sns.set()

We define the function for the posterior distribution (assume C=1).

f = lambda x, y: np.exp(-(x*x*y*y+x*x+y*y-8*x-8*y)/2.)

Then, we plot the probability distribution.

xx = np.linspace(-1, 8, 100)  
yy = np.linspace(-1, 8, 100)  
xg,yg = np.meshgrid(xx, yy)  
z = f(xg.ravel(), yg.ravel())  
z2 = z.reshape(xg.shape)  
plt.contourf(xg, yg, z2, cmap='BrBG')

Now, we’ll attempt to estimate the probability distribution using Gibbs Sampling. As we mentioned previously, the conditional probabilities are normal distributions. Therefore, we can express them in terms of mu and sigma.

In the following block of code, we define functions for mu and sigma, initialize our random variables X & Y, and set N (the number of iterations).

N = 50000                                        # number of iterations
x = np.zeros(N+1)
y = np.zeros(N+1)
x[0] = 1.                                        # initial value of X
y[0] = 6.                                        # initial value of Y
sig = lambda z, i: np.sqrt(1./(1.+z[i]*z[i]))    # conditional standard deviation
mu = lambda z, i: 4./(1.+z[i]*z[i])              # conditional mean

We step through the Gibbs Sampling algorithm.

for i in range(1, N, 2):
    # Sample a new X from p(X|Y) using the current value of Y.
    sig_x = sig(y, i-1)
    mu_x = mu(y, i-1)
    x[i] = np.random.normal(mu_x, sig_x)
    y[i] = y[i-1]        # carry Y forward unchanged for this half-step

    # Sample a new Y from p(Y|X) using the X we just drew.
    sig_y = sig(x, i)
    mu_y = mu(x, i)
    y[i+1] = np.random.normal(mu_y, sig_y)
    x[i+1] = x[i]        # carry X forward unchanged for this half-step

Finally, we plot the results.

plt.hist(x, bins=50);  
plt.hist(y, bins=50);

Additionally, we can overlay the samples drawn by the Gibbs sampler on the contour plot of the target distribution.

plt.contourf(xg, yg, z2, alpha=0.8, cmap='BrBG')
plt.plot(x[::10], y[::10], '.', alpha=0.1)
plt.plot(x[:300], y[:300], c='r', alpha=0.3, lw=1)

As we can see, the samples obtained using the Gibbs Sampling algorithm do a good job of approximating the target distribution.
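
As an optional sanity check (this step is not part of the original walkthrough), we could discard a burn-in period, here a hypothetical 1,000 samples, and inspect summary statistics of the remaining draws. Since the target density is symmetric in x and y, the empirical means and standard deviations of the two chains should come out close to each other.

burn_in = 1000                                   # hypothetical burn-in length
print("x mean:", x[burn_in:].mean(), "x std:", x[burn_in:].std())
print("y mean:", y[burn_in:].mean(), "y std:", y[burn_in:].std())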

Conclusion

Gibbs Sampling is a Markov chain Monte Carlo (MCMC) method that iteratively draws a sample from the distribution of each variable, conditional on the current values of the other variables, in order to estimate complex joint distributions. In contrast to the Metropolis-Hastings algorithm, the proposal is always accepted. Thus, when the conditional distributions are easy to sample from, Gibbs Sampling can be much more efficient.


Written by Cory Maklin