Monte Carlo Integration

October 03, 2020

Monte Carlo Integration

Often times, we can’t solve integrals analytically and must resort to numerical methods. Among these include Monte Carlo integration. As you may remember, the integral of a function can be interpreted as the area under a function’s curve. Monte Carlo integration works by evaluating a function at different random points between a and b, adding up the area of the rectangles and taking the average of the sum. As the number of points increases, the result approaches to the actual solution to the integral.

Monte Carlo integration expressed algebraically:

Compared to other numerical methods, Monte Carlo integration is particularly well suited for computing the area of odd shapes.

In the proceeding section, we’ll see how we can use Monte Carlo integration to determine the posterior probability when we know the prior and likelihood, but are lacking the normalizing constant.

Bayesian Statistics In a Nutshell

The posterior probability refers to a specific term in Baye’s formula.

Baye’s theorem can be used to calculate the probability that a person who tests positive on a screening test for a particular disease actually has the disease.

We would use this formula if we already know P(A), P(B) and P(B|A) but want to know P(A|B). For example, say we’re testing for a rare disease that infects 1% of the population. Medical professionals have developed a highly sensitive and specific test, which is not quite perfect.

99% of sick patients test positive
99% of healthy patients test negative

Bayes theorem tells us that:

Suppose we had 10,000 people, 100 are sick and 9,900 are healthy. Moreover, after giving all of them the test we’d get 99 sick people testing sick, but 99 healthy people testing sick as well. Thus, we’d end up with the following probabilities.

p(sick) = 0.01

p(not sick) = 1–0.01 = 0.99

p(+|sick) = 0.99

p(+|not sick) = 0.01

Bayes’ Theorem applied to probability distributions

In the previous example, we assumed the prior probability a person is sick was a known quantity of exactly .001. However, in the real world, it is unreasonable to believe that this probability of .001 is in fact so precise. The probability that a given person is sick will vary widely depending on their age, sex, weight, etc. In general, our knowledge of a given prior probability is far from perfect because it is derived from previous samples (meaning different populations may give different estimates of the prior probability). In Bayesian statistics, we may replace this value of .001 with a distribution for the prior probability that captures our prior uncertainty about its true value. The inclusion of a prior probability distribution ultimately produces a posterior probability that is also no longer a single quantity; rather, the posterior becomes a probability distribution as well. This is in contrast to the classical perspective, which assumes parameters are fixed quantities.

Normalizing constant

As we saw in the articles on Gibbs Sampling and Metropolis-Hasting, Monte Carlo methods can be used to compute the posterior probability distribution when the normalizing constant is not known.

Let’s explore why we need a normalizing constant in the first place. In probability theory, a normalizing constant is a constant by which a function must be multiplied so the area under its graph is 1. Still not clear? Let’s take a look at an example.

Recall that the function of a normal distribution can be written as:

The square root of two pi is the normalizing constant.

Let’s examine how we went about determining it. We start with the following function (assumes that mean is 0, and the variance is 1):

If we drew it out, it would form a bell shape curve.

The problem lies in the fact that if we took the area under the curve, it wouldn’t be equal to 1, which is required for it to be a probability density function. Therefore, we divide the function by the result of the integral (the normalizing constant).

Going back to the issue at hand, that is, how to compute the posterior probability without the normalizing constant… As it turns out, for a continuous sample space, the normalizing constant can be rewritten as:

At this point, you should be thinking Monte Carlo integration!

Python Code

Let’s take a look at how we could go about determining the posterior probability by performing Monte Carlo Integration in Python. We start off by importing the required libraries, and setting the random seed to ensure the results are reproducible.

import os  
import sys  
import matplotlib.pyplot as plt  
import numpy as np  
import pandas as pd  
import scipy.stats as st

np.random.seed(42)

We then set the values of the parameters of the beta and binomial distributions.

a, b = 10, 10  
n = 100  
h = 59  
thetas = np.linspace(0, 1, 200)

The domain of probability density function ranges from 0 to 1. Therefore, we can simplify the equation.

In code, the preceding equation is written as follows:

prior = st.beta(a, b).pdf(thetas)  
likelihood = st.binom(n, thetas).pmf(h)  
post = prior * likelihood  
post /= (post.sum() / len(thetas))

Finally, we visualize the probability density functions of the prior, posterior, and likelihood.

plt.figure(figsize=(12, 9))  
plt.plot(thetas, prior, label='Prior', c='blue')  
plt.plot(thetas, n*likelihood, label='Likelihood', c='green')  
plt.plot(thetas, post, label='Posterior', c='red')  
plt.xlim([0, 1])  
plt.xlabel(r'$\theta$', fontsize=14)  
plt.ylabel('PDF', fontsize=16)  
plt.legend();

Conclusion

Monte Carlo integration is a numerical method for solving integrals. It works by evaluating a function at random points, summing said values, and then computing their average.