Unsupervised Machine Learning: Affinity Propagation Algorithm Explained

July 01, 2019

Unsupervised Machine Learning: Affinity Propagation Algorithm Explained

The Affinity Propagation algorithm was published in 2007 by Brendan Frey and Delbert Dueck in Science. In contrast to other traditional clustering methods, Affinity Propagation does not require you to specify the number of clusters. In layman’s terms, in Affinity Propagation, each data point sends messages to all other points informing its targets of each target’s relative attractiveness to the sender. Each target then responds to all senders with a reply informing each sender of its availability to associate with the sender, given the attractiveness of the messages that it has received from all other senders. Senders reply to the targets with messages informing each target of the target’s revised relative attractiveness to the sender, given the availability messages it has received from all targets. The message-passing procedure proceeds until a consensus is reached. Once the sender is associated with one of its targets, that target becomes the point’s exemplar. All points with the same exemplar are placed in the same cluster.

Algorithm

Suppose that we had the following dataset. Each of the participants is represented as a data point in a 5 dimensional space.

Similarity Matrix (c)

Barring those on the diagonal, every cell in the similarity matrix is calculated by negating the sum of the squares of the differences between participants.

If that didn’t make any sense to you, don’t fret, it will become clear once go walk through an example. For the similarity between Alice and Bob, the sum of the squares of the differences is (3–4)² + (4–3)² + (3–5)² + (2–1)² + (1–1)² = 7. Thus, the similarity value is -7.

Still not clear? Let’s look at another example.

I highly recommend calculated a few of these yourself. You should end up with something close to the following table.

The algorithm will converge around a small number of clusters if a smaller value is chosen for the diagonal, and vice versa. Therefore, we fill in the diagonal elements of the similarity matrix with -22, the lowest number from among the different cells.

Responsibility Matrix (r)

We start off by constructing an availability matrix with all elements set to zero. Then, we calculate every cell in the responsibility matrix using the following formula:

where i refers to the row and k the column of the associated matrix.

For example, the responsibility of Bob (column) to Alice (row) is -1, which is the similarity of Bob to Alice (-7) minus the maximum of the remaining similarities of Alice’s row (-6).

Again, I highly recommend you try calculating a few of these yourself.

After calculating the responsibilities for the rest of the pairs of participants, we end up with the following matrix.

Availability Matrix (a)

We use a separate equation for updating the elements on the diagonal of the availability matrix than we do the elements off the diagonal of the availability matrix.

The proceeding formula is used to fill in the elements on the diagonal:

where i refers to the row and k the column of the associated matrix.

In essence, the equation is telling you to sum all the values above 0 along the column except for the row whose value is equal to the column in question. For example, the self-availability of Alice is the sum of the positive responsibilities of Alice’s column excluding Alice’s self-responsibility (10 + 11 + 0 + 0 = 21).

Still don’t understand? Let’s go over a couple more examples.

The following equation is used to update off-diagonal elements:

In other words, say you’re trying to fill in a(Cary, Edna). Considering the elements along Edna’s column, you exclude the Edna/Edna relationship and the Cary/Edna relationship and sum all the remaining positive responsibilities together. For example, the availability of Bob (column) to Alice (row) is Bob’s self-responsibility plus the sum of the remaining positive responsibilities of Bob’s column excluding the responsibility of Bob to Alice (-15 + 0 + 0 + 0 = -15).

I highly recommend you try calculating a few cells yourself.

After calculating the rest, we wind up with the following availability matrix.

Criterion Matrix (c)

Each cell in the criterion matrix is simply the sum of the availability matrix and responsibility matrix at that location.

The criterion value of Bob (column) to Alice (row) is the sum of the responsibility and availability of Bob to Alice (-1 + -15 = -16).

The highest criterion value of each row is designated as the exemplar. Rows that share the same exemplar are in the same cluster. Thus, in our example. Alice, Bob, and Cary form one cluster whereas Doug and Edna constitute the second.

It’s worth noting that in this example, the variables ranged over the same scale. In general, however, variables are on different scales and must be normalized prior to training.

Code

Let’s jump into some code. To start, import the following libraries.

import numpy as np  
from matplotlib import pyplot as plt  
import seaborn as sns  
sns.set()  
from sklearn.datasets.samples_generator import make_blobs  
from sklearn.cluster import AffinityPropagation

We use scikit-learn to generate data with nicely defined clusters.

X, clusters = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)  
plt.scatter(X[:,0], X[:,1], alpha=0.7, edgecolors='b')

Next, we initialize and train our model.

af = AffinityPropagation(preference=-50)

clustering = af.fit(X)

Finally, we plot the data points using a different color for each cluster.

plt.scatter(X[:,0], X[:,1], c=clustering.labels_, cmap='rainbow', alpha=0.7, edgecolors='b')

Final Thoughts

Affinity Propagation is an unsupervised machine learning algorithm that is particularly well suited for problems where we don’t know the optimal number of clusters.

Sources

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.490.7628&rep=rep1&type=pdf