Mutual information¶
The intensity of a voxel located at coordinates
(
) within the fixed image
is given
by:
and similarly, the intensity of a voxel at coordinates
(
) within the moving image
is given
by:
We want to apply a coordinate transform
to the moving image such that it
registers best with the fixed image
. However, since
and
could have been obtained from different
imaging modalities, we cannot simply minimize the error between
intensities
and
. Instead, we must attempt
to maximize the amount of information shared between the two
images. The statistical Mutual Information can therefore be
used as a similarity metric:
(1)¶
which depends on the probability distributions of the voxel
intensities for the static and moving images. We now view
and
as random variables with associated
probability distribution functions
and
and joint probability
. Note
that applying the spatial transformation
to
will result in modification of
. This
effect is implied with the notation
.
Furthermore, if
results in voxels being
displaced outside of the image,
will change.
Perhaps more importantly, if
results in a
voxel being remapped to a location that falls between points on the
voxel grid, some form of interpolation must be employed to obtain
, which will modify
. These effects are
implied with the notation
.
The method of interpolation used to obtain given
is important both in terms of
execution speed and solution convergence. Here we will discuss the
Partial Volume Interpolation method proposed by Maes et. al.

Figure 1(a) depicts the situation where has displaced a voxel (red) to the center of a
neighborhood of 8 other voxels (black). The interpolation method divides the
volume defined by the 8-nearest neighbors (red) into 8 partial volumes (w1 -
w8) that all share the red voxel’s location as a common vertex Figure 1(b-c).
We can more formally define the 8 partial volumes as a function of
:
(2)¶
Noting that
Once the partial volumes have been computed, they are placed into
the histograms of their corresponding voxels in the 8-neighborhood.
Partial volume is placed into the histogram bin
associated with
,
with
, etc.
This method is used in computing both
and
.
We now have a complete expression for the mutual information similarity metric, which we will use as a cost function that describes the quality of the registration.
(3)¶
We can attempt to achieve the optimal registration by maximizing this cost
function, which is achieved through modification of the coordinate
transformation applied to the moving image
. It is important to note that the
could be any
type of conceivable coordinate transformation. For example,
could represent a set of affine transforms to be applied
to the moving image. Here, however, we will be focusing on a parametrically
defined nonrigid coordinate transformation. The transformation
is characterized by the displacement field
, which defines how individual voxels within the moving image
should be displaced such that
and
are
optimally registered. This displacement field is computed at every voxel
given the B-spline coefficients
defined for the sparse array of
control-points:
(4)¶
where () form the indices of the 64 control points
in the immediate vicinity of a voxel located at
(
). Note, (
) forms the
voxel’s offsets within the region bound by the 8 control-points in
the immediate vicinity of the voxel. For simplicity, we refer to
such a region as a tile.
Having defined the coordinate transformation
in terms of the B-spline
coefficients
, it is becomes apparent that the mutual
information may be maximized through optimization of
.
If this is to be accomplished via the method of gradient descent,
an analytic expression for the gradient
must be derived. The chain rule
is employed to separate the expression into two partial derivatives
with the first being dependent on the similarity metric employed
and the second on the type of transform employed:
(5)¶
where the second term is easily obtained by taking the derivative
of (4) with respect to :
Referring again to the first term of (5), it is
important to realize that and
are coupled
through the probability distribution
and are
therefore directly affected by the 8 neighborhood partial volume
interpolation. This becomes further apparent when
is further decomposed:
(6)¶
where the first term may be found by taking the derivative of
(3) with respect to the joint distribution :
(7)¶
see {A}.
The second term, ,
describes how the joint distribution changes with the vector field.
Explicitly stated, a displacement vector locally transforms the
coordinates of a voxel in the moving image
such that:
From Figure fig:pv8 it becomes immediately apparent that as the vector field is
modified the partial volumes -
to be inserted into
the moving and joint histograms
and
will change in
size. Consequently,
can be
found for
by taking the derivative of (2)
with respect to each of the Cartesian directions.
Therefore, as prescribed by (6), computing at a given voxel in
involves first cycling through the
8-bins corresponding to the 8 nearest neighbors used during the partial volume
interpolation and computing
for each, which
is scalar and therefore the same for all Cartesian directions. The 8
values are each weighted by one
. Finally, once this process has been
completed for all 8 bins, the resulting
values is assimilated into the sparse coefficient array
by way of
(5). Once this is completed for all voxels, the next optimizer
iteration begins and the process is repeated with a modified coefficient array
.
NOTE
It is important to note that if all 8 histograms associated with
the neighborhood are equal, the value of
. This is a direct result
of
and similarly for the - and
-directions. However, when
the 8 histograms within the neighborhood are not equal, this will result in the
gradient
being “pulled” in the Cartesian
direction of voxels associated with large
,
which is desirable when attempting to maximize MI.