Mutual Information in R

In my work as a graduate student in image processing I have found R to be an incredibly useful tool for testing hypotheses and rapid prototyping. Ultimately everything needs to be implemented in a compiled language like C++ to make a practical tool, but R has saved me a lot of time by letting me rule out dumb ideas before implementing them. I am going to share my experience of using R to do feature selection.

I am trying to predict position on a surface from shape descriptors. I have a training set of about 50 point clouds representing the surface of the corpus callosum. For each point in the training set, I compute geometric moments. Each moment boils all the information in a spherical region around the point down to a single number. Moments are more sophisticated than simple descriptors like curvature and should capture the relationship between position on the surface and the shape of the corpus callosum.
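
To make this concrete, here is a minimal sketch of a single raw geometric moment computed over a spherical neighborhood. The function name, radius, and moment orders (p, q, r) are illustrative assumptions on my part; the descriptors I actually use are more elaborate, but the idea is the same:

geometric_moment <- function(cloud, center, radius, p, q, r) {
  # cloud: n x 3 matrix of point coordinates
  centered <- sweep(cloud, 2, center)              # shift so 'center' is the origin
  inside <- sqrt(rowSums(centered^2)) <= radius    # keep points inside the sphere
  nb <- centered[inside, , drop = FALSE]
  sum(nb[, 1]^p * nb[, 2]^q * nb[, 3]^r)           # raw moment of order (p, q, r)
}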

The problem with geometric moments is that there are infinitely many of them to choose from, and many are redundant or not very useful. Therefore I want a way to try a lot of them and quickly rule out the useless ones. I will be using mutual information (MI) for that. MI uses entropy to measure how much two random variables depend on one another. An MI of 0 indicates no dependence (and therefore no relationship); higher values indicate a stronger relationship.
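
For two discrete variables, MI can be written in terms of entropies, which is what the empirical estimator used below computes from binned counts:

I(X;Y) = H(X) + H(Y) - H(X,Y)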

For each point, I write its medial axis coordinates and the shape descriptors to a CSV file in the following format:

AxialDistance, Angle, d0, d1, d2, d3, d4, d5, d6, d7, d8
0.00388039, 1.66916, 0.137261, 1, 2.4811e-07, 0.0446063, 0.0102502, 1, -1.07544e-06, 0.071294, 0.00606162
0.00466314, 1.03462, 0.288215, 1, -1.34452e-06, 0.0483304, 0.0120159, 1, -2.80076e-06, 0.072637, 0.00579168
...
~378,000 more rows like this

I'm trying to figure out which shape descriptors will be the best predictors of the medial axis coordinates. One way to eyeball this is to make an x-y plot with the descriptor on one axis and the variable you're trying to predict on the other. For example:

data <- read.csv("fs_lowres_2mm_o4_i4.csv")   # load the training set
plot(data$d5, data$AxialDistance)             # descriptor vs. target variable

produces a plot similar to the following:
[Plot: d5 vs. AxialDistance (fs_lowres_2mm_o4_i4_axial006)]
This particular descriptor seems to have a (nonlinear) relationship with axial distance. It looks like a pretty good descriptor compared to something like:

[Plot: descriptor vs. Angle, with no visible pattern (fs_lowres_2mm_o4_i4_angle003)]

In my case, I am trying to pick a set of useful feature descriptors in a relatively automatic way, so looking at plots is far too time-consuming. Instead I can calculate the mutual information for each descriptor and only look at the ones significantly above 0.

# install.packages("entropy")   # run once if you don't have the entropy package
library(entropy)
# bin the joint sample into a 16x16 contingency table, then estimate MI from it
disc <- discretize2d(data$AxialDistance, data$d0, numBins1 = 16, numBins2 = 16)
mutualInfo <- mi.empirical(disc)

Here we see what we expect: the descriptor whose plot looks nearly uniformly random has an MI of 0.043, while the one with the strongly patterned relationship has an MI of 0.905!
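
To scale this up to all descriptors at once, here is a minimal sketch that loops over the descriptor columns and ranks them by MI. The helper name, the bin count, and the 0.1 cutoff are my own illustrative choices, not anything canonical:

# hypothetical helper: MI between a target column and each descriptor column
mi_for_target <- function(data, target, descriptors, bins = 16) {
  sapply(descriptors, function(d) {
    disc <- discretize2d(data[[target]], data[[d]], numBins1 = bins, numBins2 = bins)
    mi.empirical(disc)
  })
}

descriptors <- paste0("d", 0:8)        # d0 ... d8 from the CSV header
mis <- mi_for_target(data, "AxialDistance", descriptors)
sort(mis, decreasing = TRUE)           # rank descriptors by MI
names(mis)[mis > 0.1]                  # keep only the clearly informative ones

The same call with "Angle" as the target ranks the descriptors for the other medial axis coordinate.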

I hope this helps!
