# Catching Common Image Processing Programming Errors with Generic Unit Tests

When implementing image algorithms, I am prone to make these mistakes:

• swapping x and y;
• working on the wrong channel;
• making off-by-one errors, especially in window algorithms;
• making division-by-zero errors;
• handling borders incorrectly; and
• handling non-power-of-two sized images incorrectly.

Since these types of errors are prevalent in many image-processing algorithms, it would be useful to develop, once and for all, general tests that will catch these errors quickly for any algorithm.

This post is about such tests.

## The General Idea

The general idea is to exploit the fact that many algorithms satisfy invariants such as these:

some_transform(algorithm(image))
== algorithm(some_transform(image))

For example: a 3-by-3 box blur is invariant under a vertical flip:

box_blur3x3(flip_vertical(image))
== flip_vertical(box_blur3x3(image))

It does not matter whether we apply the flip before or after applying the box filter—the result should be the same.
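This invariance check can be sketched in a few lines of Python. This is an illustration, not library code: images are plain nested lists of numbers, and `box_blur3x3` and `flip_vertical` are hypothetical helpers implementing the (correct) clamped-window blur.

```python
def flip_vertical(img):
    # Reverse the row order: the top row becomes the bottom row.
    return img[::-1]

def box_blur3x3(img):
    # Average over the 3x3 window, clamped at the image borders.
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            x0, y0 = max(0, x - 1), max(0, y - 1)
            x1, y1 = min(x + 2, w), min(y + 2, h)
            total = sum(img[yy][xx]
                        for yy in range(y0, y1)
                        for xx in range(x0, x1))
            out[y][x] = total / ((x1 - x0) * (y1 - y0))
    return out

def is_flip_invariant(algorithm, img):
    # The invariance check: transform-then-apply must equal apply-then-transform.
    return algorithm(flip_vertical(img)) == flip_vertical(algorithm(img))
```

With integer pixel values the equality is exact; with floating-point values, summation order can introduce rounding differences, a preview of the equality issues discussed later in this post.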

What kind of errors can the above test expose? Since it checks whether the algorithm (not the image) is vertically symmetric, the test can catch certain off-by-one errors along the vertical axis. Here is an example of a faulty implementation of the blur algorithm, where the function sum adds all the pixel values in the rectangle x0..x1 - 1, y0..y1 - 1.

//C-like pseudo-code
Image & box_blur3x3(Image & image)
{
Image result(image.dimensions);

forXY(image, x, y)
{
x0, y0 = max(0, x - 1), max(0, y - 1)
x1, y1 = min(x + 1, image.width), min(y + 1, image.height)
result(x, y) = sum(image, x0, x1, y0, y1) / ((x1 - x0) * (y1 - y0));
}

image = result;
return image;
}

Can you spot the error? The window used is at most 2 by 2 instead of 3 by 3. But how does the test catch this error? To see why the test will fail, compare the windows used for the top-left and bottom-left pixels:

Top left:

x0, y0 == 0, 0
x1, y1 == 1, 1

Bottom left:

x0, y0 == 0, height - 2
x1, y1 == 1, height

As you can see, the two windows have different sizes: the top-left window is 1 by 1, but the bottom-left window is 1 by 2. Flipping the image before applying the box blur therefore gives a different result from flipping it after.
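The mismatch can be checked directly. The sketch below (plain Python, illustrative only) computes the clamped window size for a pixel under both variants: `end_offset = 1` reproduces the faulty `min(x + 1, ...)`, and `end_offset = 2` the corrected version.

```python
def window_size(x, y, width, height, end_offset):
    # Clamped window for the pixel at (x, y); end_offset is 1 for the
    # faulty implementation and 2 for the corrected one.
    x0, y0 = max(0, x - 1), max(0, y - 1)
    x1, y1 = min(x + end_offset, width), min(y + end_offset, height)
    return (x1 - x0, y1 - y0)

width, height = 11, 13
# Faulty: the top-left and bottom-left windows differ in size,
# so a vertical flip changes the result.
faulty_top = window_size(0, 0, width, height, 1)              # (1, 1)
faulty_bottom = window_size(0, height - 1, width, height, 1)  # (1, 2)
# Corrected: both windows are 2 by 2.
fixed_top = window_size(0, 0, width, height, 2)
fixed_bottom = window_size(0, height - 1, width, height, 2)
```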

The correct algorithm is

//C-like pseudo-code
Image & box_blur3x3(Image & image)
{
Image result(image.dimensions);

forXY(image, x, y)
{
x0, y0 = max(0, x - 1), max(0, y - 1)
x1, y1 = min(x + 2, image.width), min(y + 2, image.height)
result(x, y) = sum(image, x0, x1, y0, y1) / ((x1 - x0) * (y1 - y0));
}

image = result;
return image;
}

Now the windows for the top-left and bottom-left pixels are:

Top left:

x0, y0 == 0, 0
x1, y1 == 2, 2

Bottom left:

x0, y0 == 0, height - 2
x1, y1 == 2, height

Both windows are now 2 by 2: the 3-by-3 window is still clipped at the border, but symmetrically so.

## Transforms

Here is a list of suggested transforms and the types of errors they can catch:

| Transform | Errors caught |
| --- | --- |
| flip_vertical, flip_horizontal | Off-by-one errors. |
| flip_diagonal | Swapping x and y. |
| rotate_channels | Working on the wrong channel. |
| invert | Calculation mistakes, certain kinds of division-by-zero errors. |
| scale_intensity | Calculation mistakes. |
| adjust_gamma | Certain kinds of statistical ordering errors. |
| crop | Certain kinds of incorrect border calculations, certain power-of-two errors. |
| translate_vertical, translate_horizontal | Certain kinds of incorrect border calculations, off-by-one errors. |
| desaturate | Certain channel processing errors. |

Many algorithms should (in theory) also be invariant under scaling. However, because of the complexity involved in interpolation and sampling, I do not recommend using scaling for testing. It is quite difficult to determine whether an algorithm should in fact be (exactly) invariant under a given interpolation or sampling scheme. This also applies to any other transform that relies on sampling or interpolation.

You will probably also select transforms that are more specific to the algorithms you are implementing. Keep these as simple as possible—not only to avoid implementation errors, but also to avoid subtle misconceptions: it must be easy to see (or well established) that a certain algorithm is invariant under a transform. Test your own transform functions aggressively.
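To make the table concrete, here are Python sketches of a few of these transforms for single-channel images stored as nested lists (with values in [0, 1] for invert). These are illustrative stand-ins, not library functions; your real transforms should be kept just as simple.

```python
def flip_vertical(img):
    # Reverse the rows.
    return img[::-1]

def flip_horizontal(img):
    # Reverse each row.
    return [row[::-1] for row in img]

def flip_diagonal(img):
    # Transpose the image, swapping the roles of x and y.
    return [list(col) for col in zip(*img)]

def invert(img):
    # Map each value v to 1 - v (assumes values in [0, 1]).
    return [[1.0 - v for v in row] for row in img]
```

Note that flip_diagonal changes the dimensions of a non-square image, which is exactly why it exposes x/y swaps.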

## Generic Solutions

Writing test code for each transform is easy, but tedious (and hence error prone). Here I show how a generic test can be implemented using C++ macros or functional programming. The generic test can be used to test whether any algorithm is invariant under a given transform.

### Macros Implementation

Probably the easiest way to implement a general test-function in C++ is to implement it as a macro. Here is how this looks:

//C++
#define TEST_TRANSFORM_INVARIANCE(image, transform, command) \
do { \
Image image; \
make_test_image(image); \
\
Image original_image(image); \
command; \
\
Image image1(image); \
transform(image1); \
\
image = original_image; \
transform(image); \
command; \
\
if (image != image1) \
report_useful_failure_message(...); \
} while(0)

We can now use this macro like this:

TEST_TRANSFORM_INVARIANCE(image, flip_vertical, my_algorithm(image));

The nice thing about the macro approach is how easily functions that take any number of parameters can be tested:

TEST_TRANSFORM_INVARIANCE(image, flip_vertical, my_algorithm(image, 10, 20));

Notice that the first parameter is a variable name. The macro declares this variable, and the user can use it in the command parameter to pass the image to the algorithm.

### Functional Programming Implementation

If macros are not available, or using them is not desirable, you can still use a generic approach. Unfortunately, we need to define functions with a consistent argument list for each test we want to perform, which means we might need to write more code in some languages.

The general test function can be defined as follows in C++:

//C++
void test_transform_invariance(
Image & (* transform)(Image &),
Image & (* function)(Image &))
{
Image image;
make_test_image(image);

Image original_image(image);
function(image);

Image image1(image);
transform(image1);

image = original_image;
transform(image);
function(image);

if (image != image1)
report_useful_failure_message(...);
}

To use it with a single-argument function, we simply call it like this:

test_transform_invariance(flip_vertical, my_algorithm);

To use it with a function that takes more than one parameter, we need to define a wrapper function that supplies the extra parameters:

// C++
Image & my_algorithm_wrapper(Image & image)
{
return my_algorithm(image, 10, 20);
}
...
test_transform_invariance(flip_vertical, my_algorithm_wrapper);

In languages that support closures, you can wrap functions more cleanly so that you do not need to define a function for each test command. For example, in Python:

# Python
def wrap(fn, *args):
    def wrapper(image):
        return fn(image, *args)
    return wrapper
...
test_transform_invariance(flip_vertical, wrap(my_algorithm, 10, 20))
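In Python specifically, `functools.partial` from the standard library can replace a hand-written wrap, provided the extra arguments are bound by keyword so that the image stays the first positional argument. The algorithm and its parameters below are hypothetical placeholders.

```python
from functools import partial

def my_algorithm(image, radius, strength):
    # Hypothetical algorithm; the parameters are placeholders.
    return image

# Bind the extra arguments by keyword; the result is a one-argument
# callable that the generic test can invoke as fn(image).
wrapped = partial(my_algorithm, radius=10, strength=20)
```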

## Test Bundles

Both the macro and functional programming approaches allow you to combine tests of different transforms in convenient functions:

// C++
#define TEST_ALL(image, command) \
do { \
TEST_TRANSFORM_INVARIANCE(image, flip_vertical, command); \
TEST_TRANSFORM_INVARIANCE(image, flip_horizontal, command); \
...
} while(0)

// C++
void test_all(Image & (* fn)(Image &))
{
test_transform_invariance(flip_vertical, fn);
test_transform_invariance(flip_horizontal, fn);
...
}

## Test Images

It is important that your test images do not destroy the very asymmetry that you are trying to expose. For instance, if we had run the test at the very beginning of this post with an all-zero image, it would have passed even for the faulty implementation. The image should be asymmetric under the transform we are using; that is, the following must hold:

transform(image) != image

For different transforms, this means different things: for flip_diagonal, the image must not be symmetric about the diagonal (a non-square image trivially satisfies this); for invert, the image must not be the constant grayscale image 0.5.
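This requirement is cheap to check up front. Here is a sketch, using illustrative nested-list images and stand-in transforms:

```python
def flip_vertical(img):
    return img[::-1]

def invert(img):
    return [[1.0 - v for v in row] for row in img]

def is_suitable(image, transforms):
    # The image must be changed by every transform we plan to test with.
    return all(t(image) != image for t in transforms)

vertically_symmetric = [[0.1, 0.2], [0.1, 0.2]]  # unchanged by flip_vertical
constant_half = [[0.5, 0.5], [0.5, 0.5]]         # unchanged by invert
noise_like = [[0.1, 0.9], [0.4, 0.7]]            # changed by both
```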

It is useful to build an extra test into our test macro or function, to make sure that we do not inadvertently break this requirement. Here is the modified macro:

// C++
#define TEST_TRANSFORM_INVARIANCE(image, transform, command)\
do { \
Image image; \
make_test_image(image); \
\
Image original_image(image); \
transform(image); \
\
if(image == original_image) \
report_unsuitable_test_image(...); \
\
image = original_image; \
command; \
\
Image image1(image); \
transform(image1); \
\
image = original_image; \
transform(image); \
command; \
\
if (image != image1) \
report_useful_failure_message(...); \
} while(0)

For many algorithms, using an image of noise (independent for each channel) works well. Make sure the image has unequal dimensions that are prime numbers (for example, 11 and 13). Using prime numbers ensures that the two dimensions have no common divisors, which makes certain kinds of tests for recursive algorithms (such as quad-tree compression) more robust.
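A sketch of such a test-image generator, seeded so that any failure is reproducible (the helper name and defaults are my own):

```python
import random

def make_test_image(width=11, height=13, channels=3, seed=42):
    # Independent noise per channel, with unequal prime dimensions.
    rng = random.Random(seed)  # fixed seed: failures are reproducible
    return [[tuple(rng.random() for _ in range(channels))
             for _ in range(width)]
            for _ in range(height)]

image = make_test_image()
```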

## Reporting Failures

In the example implementation, we have the line

if (image != image1) report_useful_failure_message(...);

This is a somewhat simplistic scheme. What is a useful failure message? Of course, the message must report the test that failed. But it should also give information about how the test failed:

• whether it was a mismatch of the number of channels, image dimensions, or pixel values;
• the number of channels and image dimensions;
• the number of pixels for which the test failed; and
• the first pixel location where the test failed, and the expected and actual pixel values.

These bits of information can help us hunt down the cause. For example, if the test data is an 11-by-13 image, and the test fails for 13 pixels, the error is most likely a border problem caused by an off-by-one error. If the first pixel that fails is (0, 0), the test fails for (almost) all pixels, and the pixel values differ only by sign, we can guess that a sign error affects the entire image.

The actual test will thus be a bit more complicated, delegating to a function such as this:

// C++
void test_image_equality(
Image & image1,
Image & image2,
FailureInfo & info)
{
info.has_failed = false;
info.failed_pixels_count = 0;
info.dimensions_image1 = image1.dimensions;
info.dimensions_image2 = image2.dimensions;
info.channels_image1 = image1.channels;
info.channels_image2 = image2.channels;

if (image1.channels != image2.channels)
{
info.has_failed = true;
info.failure_type = CHANNEL_MISMATCH;
return;
}

if (image1.dimensions != image2.dimensions)
{
info.has_failed = true;
info.failure_type = DIMENSION_MISMATCH;
return;
}

forXY(image1, x, y)
{
if (abs(image1(x, y) - image2(x, y)) >= THRESHOLD)
{
if (!info.has_failed)
{
info.has_failed = true;
info.failure_type = VALUE_MISMATCH;
info.first_failed_x = x;
info.first_failed_y = y;
info.first_failed_image1 = image1(x, y);
info.first_failed_image2 = image2(x, y);
}

info.failed_pixels_count++;
}
}
}

In the test macro or function, we call the function like this:

...
FailureInfo info; \
test_image_equality(image, image1, info); \
if (info.has_failed) report_failure(info); \
...

## Using Third-Party Image Libraries

It is important that you understand the transforms you use for testing very well. It is therefore a good idea to use your own transforms, and not those of a library; this way, unknown design choices in the library cannot bite you. Many image libraries, for example, do not specify how they handle borders, division by zero, rounding, and so on. These details are often not important when using the algorithms in a task, but they can make a test fail (or succeed) when it should not.

The same danger exists when you use third-party algorithms as building blocks for your own; in this case, testing your code will be harder. As a starting point, consider testing whether the third-party algorithms satisfy the invariants you expect to hold. If they do not, you may need to modify your test code to accommodate the discrepancy. If they do, you can be a little more confident that your tests will be reliable.

(Yes, it is not normally recommended to test other people’s code with unit tests. However, this is a once-off test to make sure that your own tests are reliable, and since it is so easy to implement (since you already have the test macro / function and transforms), I do not see the harm. Just keep the third-party library test separate from your own.)

## Defining Correctness—When are two images equal?

Correctness of an image algorithm is a subtle issue, mostly because the discrete, quantized, finite model of image computation is inherently an approximation of the “real” thing. The question is too broad to tackle here (and frankly, I do not have enough knowledge to do it), so I will focus on aspects that are specifically relevant to this kind of testing.

Let us start with an example of an essentially correct algorithm that fails a test we expect it to pass:

// C++
Image & threshold(Image & image)
{
forXY(image, x, y)
image(x, y) = (image(x, y) <= 0.5f) ? 0 : 1;

return image;
}

We expect this simple algorithm to be invariant under the inverse transform. However, it is not. Consider a pixel value of 0.5: its inverse is also 0.5. So both the pixel and its inverse map to zero, and hence the following does not hold:

inverse(threshold(image)) == threshold(inverse(image)) //not true
inverse(threshold(0.5f)) == inverse(0) == 1 //left side
threshold(inverse(0.5f)) == threshold(0.5f) == 0 //right side
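The counterexample can be verified numerically. Here inverse and threshold act on a single pixel value for clarity:

```python
def inverse(v):
    # Pointwise inverse of a pixel value in [0, 1].
    return 1.0 - v

def threshold(v):
    # The first (<= 0.5) variant of the thresholding algorithm.
    return 0.0 if v <= 0.5 else 1.0

left = inverse(threshold(0.5))   # inverse(0.0) == 1.0
right = threshold(inverse(0.5))  # threshold(0.5) == 0.0
# left != right: the invariant fails at exactly 0.5.
```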

It looks like the problem is easily solved by changing the algorithm:

// C++
Image & threshold(Image & image)
{
forXY(image, x, y)
image(x, y) = (image(x, y) < 0.5f) ? 0 : 1;

return image;
}

But, for floating-point numbers on a machine, there is a number 0.5 - epsilon (the largest representable value below 0.5) such that the following algorithm is equivalent to the one above:

// C++
Image & threshold(Image & image)
{
forXY(image, x, y)
image(x, y) = (image(x, y) <= 0.5f - epsilon) ? 0 : 1;

return image;
}

But the invariant still does not hold. Since 0.5 is a fixed point of inverse, invariance at that value requires threshold(0.5) == 1 - threshold(0.5), which no function that returns only 0 or 1 can satisfy: for the modified algorithm, the left side gives inverse(threshold(0.5)) == inverse(1) == 0, while the right side gives threshold(inverse(0.5)) == threshold(0.5) == 1. In fact, it is not possible to implement a thresholding algorithm that is correct in this strict sense.

But we feel that any of the above implementations is correct, and hence our test is wrong. So how do we change it, and should we?

We can change one (or more) of three things:

• the test data
• the test transform
• how we test for equality in images

Because of the simplicity and general usefulness of both the transform and the test data, I feel it is best to leave these as is. That means we have to change the measure of equality. Since we use random data, we might want to use an "is probably equal" measure; that is, two images are equal if a certain percentage of pixels are equal (within some threshold). Since the likelihood of generating exactly 0.5 is low, our test should pass. I do not like this idea, for two reasons:

First, it is not generally useful. For example, another test might fail when the borders are not equal (a much larger number of pixels), even though we feel that it must not for some specific algorithm. We then need another equality measure to handle that case.

Second, it is bound to hide a class of errors where a small number of pixels are incorrectly processed, and we do not want that.

Instead, we would like some measure that is generally useful, but can be tweaked for the specific algorithm to overcome those cases where we expect the algorithm to fail.

One way to do this is to use equality masks: a simple bit array that says whether we are interested in that pixel or not. In this case, we need the mask to be constructed from the test data; here is how we do it:

//C++
BitImage & make_ignore_value_mask(
BitImage & mask,
Image & image,
float value)
{
forXY(image, x, y)
mask(x, y) = (image(x, y) != value);

return mask;
}

We then change our macro to this:

#define TEST_TRANSFORM_INVARIANCE(image, mask, transform, command, mask_command) \
do { \
Image image; \
make_test_image(image); \
\
Image original_image(image); \
transform(image); \
\
if (image == original_image) \
report_unsuitable_test_image(...); \
\
image = original_image; \
BitImage mask; \
mask_command; \
\
command; \
\
Image image1(image); \
transform(image1); \
\
image = original_image; \
transform(image); \
command; \
\
if (!images_equal(image, image1, mask)) \
report_useful_failure_message(...); \
} while(0)

We call the macro like this:

TEST_TRANSFORM_INVARIANCE(image, mask, inverse, threshold(image), make_ignore_value_mask(mask, image, 0.5f));

The functional programming solution is similarly modified:

// C++
void test_transform_invariance(
Image & (* transform)(Image &),
Image & (* function)(Image &),
BitImage & (* make_mask)(BitImage &, Image &))
{
Image image;
make_test_image(image);

Image original_image(image);
transform(image);

if (image == original_image)
report_unsuitable_test_image(...);

image = original_image;
BitImage mask;
make_mask(mask, image);

function(image);

Image image1(image);
transform(image1);

image = original_image;
transform(image);
function(image);

if (!images_equal(image, image1, mask))
report_useful_failure_message(...);
}

We need a wrapping function to supply the extra argument:

// C++
BitImage & wrapped_make_ignore_value_mask(
BitImage & mask,
Image & image)
{
return make_ignore_value_mask(mask, image, 0.5f);
}

Now we can call our test function like this:

test_transform_invariance(inverse, threshold, wrapped_make_ignore_value_mask);

In languages that support closures, such as Python, we can again use the wrapping technique from before:

test_transform_invariance(inverse, threshold, wrap(make_ignore_value_mask, 0.5))

Here are a few suggestions for useful mask types:

| Mask | Description |
| --- | --- |
| ignore_none | A constant array of 1s. Probably the one to use with most algorithms. |
| ignore_value, ignore_values | Ignores all pixels in the test data that equal a given value or list of values. |
| ignore_range | Ignores all pixels in the test data that fall inside a given range. |
| ignore_out_of_range | Ignores all pixels in the test data that fall outside a given range. |
| ignore_rect | Ignores all pixels inside a specified rectangle. |
| ignore_border | Ignores all pixels in a border of specified width. |

Be careful of inadvertent ignore_all masks. For example, the ignore_border mask will be all zero when the border is wider than half the smallest image dimension. It is a good idea to build a check into the test macro or function that the mask is not all zero.
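A sketch of the ignore_border mask together with that guard, using illustrative helper names of my own:

```python
def ignore_border(width, height, border):
    # 1 = compare this pixel, 0 = ignore it.
    return [[0 if (x < border or x >= width - border or
                   y < border or y >= height - border) else 1
             for x in range(width)]
            for y in range(height)]

def check_mask_not_empty(mask):
    # Guard against an inadvertent ignore-all mask.
    if not any(any(row) for row in mask):
        raise ValueError("mask ignores every pixel; the test would be vacuous")

mask = ignore_border(11, 13, 2)
check_mask_not_empty(mask)  # passes: interior pixels remain
# check_mask_not_empty(ignore_border(11, 13, 6)) would raise, because a
# 6-pixel border swallows the whole 11-pixel dimension.
```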

Also watch out when transforms or algorithms change image dimensions—make sure the mask is constructed correctly.

We have now answered how we can change the test to pass for the thresholding example (and more generally); it remains to answer whether we should. My feeling is: only when you have to. Generally, I test without regard for such subtle issues. When a test fails, I carefully try to understand whether it is because the algorithm is fundamentally wrong, or whether it is just an obscure instance that makes the test fail. In the latter case, I will amend the test with the necessary masks.

Update: Here is another example of how an invariance test can fail even though the algorithm is fundamentally correct. Consider a region quadtree compression scheme. Suppose we want to compress a 5-by-5 image, and suppose the algorithm divides the image into four rectangles. Because 5 is odd, the rectangles will differ in size. The biggest rectangle will always be in the same spot, regardless of the reflection; thus the test will fail for pixels along the center. We cannot really address this issue with masks, and it is unclear to me how to handle it at all (clearly, we do not want to limit our tests to power-of-two images only).

## Warning: A False Sense of Security

When you have a large group of transforms to construct invariance checks from, you can easily test for a whole bunch of errors with just a few lines of code. If the tests pass, it is easy to think that the algorithm is correct. It may not be:

• Perhaps the test image is unsuitable. This may happen by coincidence if the test image is random.
• No single test catches all errors of a given type.
• Even all the transforms together do not cover the entire set of possible errors.

A set of invariance tests does not, in general, prove the correctness of a particular algorithm; it merely exposes certain kinds of errors. Additional tests for correctness are therefore necessary.

## Invariance Test Checklist / Summary

The basic structure of the test macro / function is as follows:

1. Create test data.
2. Calculate transform(algorithm(test_image)).
3. Calculate algorithm(transform(test_image)).
4. Compare these results and report any failures.

Remember:

• Check that the test data is changed by the transform.
• Use masks to cater for peripheral special cases where certain pixels, values, or regions in an image should be ignored for comparisons. 