
Examples using the sample tools

Note: The ImageMagick tool display is used throughout. ImageMagick is an excellent free, open source set of general image processing tools.

Object detection - looking for compact blobs of features

NOTE: The EmbedCV library really does not support object detection! It does have a few functions for computing the integral image transform and simple box features (Paul Viola and Michael Jones, Robust Real-time Object Detection, July 2001). Use of the integral transform for object detection really requires much deeper infrastructure than what EmbedCV currently provides. The level of effort for full object detection and identification is so much greater that I felt it best to checkpoint the code as-is. The approach on this page is a hack done over Thanksgiving weekend and is extremely inefficient. An optimized approach should run orders of magnitude faster.

All object detection methods I am aware of detect features and try to cluster them into clumps that appear recognizable to the computer vision system as distinct objects. The detected features almost always come from the luminosity image. Color is typically ignored. This is sensible as most image information content is in luminosity, not chromaticity. However, it does sometimes happen that the useful information is in chromaticity rather than luminosity.

Let's try detecting edges in the image to see if they "clump" together in a way useful for object detection. We will use the ppm2edge tool to generate an edge image.

cat pooltable.jpg | ./jpg2ppm | ./ppm2edge | display -

Usage:    cat input.ppm | ./ppm2edge [-a|-x|-y|-z] [-s number] [-d] > output.ppm
  how edges are output (default is -z)
      -a show -y as red, -x as green, luma as blue
      -x only show horizontal edges (from vertical kernel)
      -y only show vertical edges (from horizontal kernel)
      -z show combined edges
  reduce edge magnitudes by a power of 2 (default is -s 5)
      -s number of bits to right shift

Unfortunately, the edge image appears cluttered. This is typical, especially for edge detection methods that rely exclusively on convolution with a small kernel. In this case, a 3x3 Sobel kernel is used. As a result, high-frequency noise is detected as edges. Worse, most of the edges do not seem to fit into compact blobs. We need a way of rejecting the longer edges while keeping the shorter ones.
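To make the noise problem concrete, here is a minimal sketch of 3x3 Sobel edge detection in plain Python. It is not the ppm2edge source. Combining the two gradients as |gx| + |gy| and the `shift` parameter (modelled on the -s option) are my assumptions for illustration.

```python
# Sobel kernels: KX responds to left-right intensity changes (vertical edges),
# KY responds to up-down intensity changes (horizontal edges).
KX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
KY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel(gray, shift=0):
    """Return |gx| + |gy| for each interior pixel, right-shifted by `shift` bits.

    Border pixels are left at zero. The |gx| + |gy| magnitude is an assumption.
    """
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for j in range(3):
                for i in range(3):
                    p = gray[y + j - 1][x + i - 1]
                    gx += KX[j][i] * p
                    gy += KY[j][i] * p
            out[y][x] = (abs(gx) + abs(gy)) >> shift
    return out

# A 5x6 image with one vertical step edge between columns 2 and 3.
clean = [[10] * 3 + [200] * 3 for _ in range(5)]

# The same image with a single noisy pixel in the flat region.
noisy = [row[:] for row in clean]
noisy[2][1] = 60

clean_edges = sobel(clean)
noisy_edges = sobel(noisy)
```

In the clean image the flat region gives no response at all. In the noisy image the single bad pixel produces non-zero edge values in its neighbours, which is the kind of clutter seen above.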

Let's try something simple. We'll blur the image using ppm2blur before detecting edges. That should reduce the effect of noise at least. Note: A linear filter like blurring (some form of average of neighboring values) works well for additive noise but poorly for "shot noise" - that requires a non-linear filter (e.g. median of neighboring values).
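The difference between a linear and a non-linear filter is easy to see on one row of pixels. This is a minimal sketch in plain Python, not code from ppm2blur. The 3-sample window and the helper names are my own choices for illustration.

```python
from statistics import median

def box_blur_1d(signal):
    """3-sample moving average (a linear filter). End samples are copied."""
    out = list(signal)
    for i in range(1, len(signal) - 1):
        out[i] = (signal[i - 1] + signal[i] + signal[i + 1]) // 3
    return out

def median_1d(signal):
    """3-sample median (a non-linear filter). End samples are copied."""
    out = list(signal)
    for i in range(1, len(signal) - 1):
        out[i] = median(signal[i - 1:i + 2])
    return out

# A flat row of pixels with one "shot noise" spike.
row = [10, 10, 10, 255, 10, 10, 10]

blurred = box_blur_1d(row)   # the spike is smeared over three samples
cleaned = median_1d(row)     # the spike is removed entirely
```

Averaging only spreads the spike out, while the median discards it because an isolated outlier is never the middle value of its window.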

cat pooltable.jpg | ./jpg2ppm | ./ppm2blur -r 10 | ./ppm2edge | display -

Usage:    cat input.ppm | ./ppm2blur [-r num] > output.ppm
  default is blur once (-r 1)
      -r number of times to repeat blurring operation

To make a clear difference, the image was blurred 10 times. It did help. There are fewer noisy specks of detected edge. The balls on the pool table stand out more than they did before. But it is still not enough. Alone, Sobel edge detection will not give us the features we need to identify objects (compact blobs of stuff) in the image.

We need to try something different.

cat pooltable.jpg | ./jpg2ppm | ./ppm2fbox | display -

Usage:    cat input.ppm | ./ppm2fbox [-s shift] > output.ppm
  size of feature boxes (default -p 8)
      -p number pixels of box small dimension
  reduce feature magnitudes by a power of 2 (default is -s 5)
      -s number of bits to right shift

The tool ppm2fbox computes the integral image transform and uses it to calculate two kinds of box sum features - one for horizontal differences and the other for vertical. It is equivalent to looking for edges at different image scales (remember that the Sobel convolution kernel is fixed at 3x3 - multiple scales would require an image pyramid). Integral image box features may be of arbitrary size without incurring any additional computational cost. That is the power of the integral image transform (a very clever arithmetic trick).
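The arithmetic trick can be sketched in a few lines of plain Python. This is not the EmbedCV implementation. The function names and the layout of the two features (a pair of adjacent equal-size boxes, one subtracted from the other) are assumptions for illustration.

```python
def integral_image(gray):
    """ii[y][x] holds the sum of all pixels above and to the left of (x, y).

    The table has one extra row and column of zeros to avoid edge cases.
    """
    h, w = len(gray), len(gray[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += gray[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def box_sum(ii, x, y, w, h):
    """Sum of the w-by-h box with top-left corner (x, y), using four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def horizontal_feature(ii, x, y, w, h):
    """Right box minus left box: large where intensity changes left to right."""
    return box_sum(ii, x + w, y, w, h) - box_sum(ii, x, y, w, h)

def vertical_feature(ii, x, y, w, h):
    """Lower box minus upper box: large where intensity changes top to bottom."""
    return box_sum(ii, x, y + h, w, h) - box_sum(ii, x, y, w, h)

# An 8x8 image: left half dark (1), right half bright (9).
img = [[1] * 4 + [9] * 4 for _ in range(8)]
ii = integral_image(img)

h_feat = horizontal_feature(ii, 0, 0, 4, 8)  # strong response across the step
v_feat = vertical_feature(ii, 0, 0, 8, 4)    # no change from top to bottom
```

Note that box_sum always costs four table lookups, whether the box covers 4 pixels or 4000. The integral image itself is built once per frame in a single pass.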

The result is better. Noise is greatly attenuated. Unlike the Sobel kernel, which sees only the 9 pixel values in a 3x3 square mask, the integral box features for the image above consider 256 pixel values in a 16x16 square. This effectively averages over many pixels. Anything much smaller than the box size will tend to be ignored. This is where the multiple scale calculation enters into our approach. By varying the feature box size and shape, we can select for different features in the image.

So now let's see if we can isolate the feature blobs. We will use the ppm2morph tool for this (somewhat non-intuitive to use it this way but very useful, especially for visualizing what is going on). The horizontal and vertical box feature blobs will grow outwards one pixel from the outer edges. This allows recognizing them as distinct.

cat pooltable.jpg | ./jpg2ppm | ./ppm2fbox | ./ppm2morph -r -d 1 | ./ppm2morph -g -d 1 | display -

Usage:    cat input.ppm | ./ppm2morph [-r|-g|-b] [-d num|-e num|-o num|-c num] > output.ppm
  which color channel to process (default is all channels)
      -r red image channel only
      -g green image channel only
      -b blue image channel only
  morphological operation (default is -e 1)
      -d number of times repeat dilation (expand region)
      -e number of times repeat erosion (shrink region)
      -o number of times repeat opening (remove small blobs)
      -c number of times repeat closing (fill in holes)
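For readers unfamiliar with these four operations, here is a minimal sketch on a binary mask in plain Python. It is not the ppm2morph source. The 3x3 square neighbourhood and the handling of the image border (only in-bounds pixels are examined) are assumptions for illustration.

```python
def _window(mask, x, y):
    """Values of the in-bounds pixels in the 3x3 window centred on (x, y)."""
    h, w = len(mask), len(mask[0])
    return [mask[j][i]
            for j in range(max(0, y - 1), min(h, y + 2))
            for i in range(max(0, x - 1), min(w, x + 2))]

def dilate(mask):
    """Expand regions: a pixel is set if anything in its window is set."""
    return [[int(any(_window(mask, x, y))) for x in range(len(mask[0]))]
            for y in range(len(mask))]

def erode(mask):
    """Shrink regions: a pixel stays set only if its whole window is set."""
    return [[int(all(_window(mask, x, y))) for x in range(len(mask[0]))]
            for y in range(len(mask))]

def opening(mask):
    """Erosion then dilation: removes blobs smaller than the window."""
    return dilate(erode(mask))

def closing(mask):
    """Dilation then erosion: fills holes smaller than the window."""
    return erode(dilate(mask))

def blob7():
    """A 7x7 mask containing a 3x3 blob in the centre."""
    m = [[0] * 7 for _ in range(7)]
    for y in range(2, 5):
        for x in range(2, 5):
            m[y][x] = 1
    return m

blob = blob7()

speck = blob7()
speck[0][0] = 1          # an isolated noise pixel in the corner

ring = blob7()
ring[3][3] = 0           # a one-pixel hole in the middle of the blob

grown = dilate(blob)     # the 3x3 blob becomes 5x5
```

Dilating once and comparing against the original mask gives the one-pixel outline around each blob, which is how the tool is used on this page.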

Uh oh, this is not a very good result. Apparently, the entire image frame is still filled up with detected feature values. Only small scattered patches of the image are clear to allow for dilation to grow an edge. This is the same problem we had with Sobel edge detection. However, all we need to do is ignore all but the highest feature values. To do this quickly, EmbedCV relies on division by bit shifting. It is close enough for our purpose and faster than integer division.
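As a small illustration of why a larger shift acts as a threshold, here is a sketch in plain Python. The sample feature values are made up, and the helper name is hypothetical. It assumes the feature magnitudes are non-negative integers.

```python
def shift_down(features, shift):
    """Divide each non-negative value by 2**shift using a right shift."""
    return [v >> shift for v in features]

# Made-up feature magnitudes, from weak to strong.
values = [300, 4000, 8191, 8192, 20000, 70000]

weak_kept = shift_down(values, 5)     # divide by 32: everything survives
strong_only = shift_down(values, 13)  # divide by 8192: weak values become 0
```

With a shift of 5 every value stays non-zero. With a shift of 13, anything below 8192 collapses to zero, so only the strongest features are left for the later stages to work with.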

cat pooltable.jpg | ./jpg2ppm | ./ppm2fbox -s 13 | ./ppm2morph -r -d 1 | ./ppm2morph -g -d 1 | display -

This looks much better. There are distinct blob outlines. They appear approximately where our eye would be drawn in distinguishing objects. The highest concentration of blobs, many of them overlapping, is on the pool balls. Next, we have to cluster these blobs into distinct objects.

cat pooltable.jpg | ./jpg2ppm | ./ppm2fbox -s 13 | ./ppm2morph -r -d 1 | ./ppm2morph -g -d 1 | ./ppm2tbox | display -

Usage:    cat input.ppm | ./ppm2tbox [-t threshold] [-r radius] > output.ppm
  threshold for detected object (default is -t 8)
      -t number of intersection points
  object radius window (default is -r 16)
      -r number of pixels

This sort of worked? The tool ppm2tbox counts any point where a horizontal (green) and a vertical (red) blob outline overlap as a feature point (appears as yellow). Many object detection and recognition approaches rely on first detecting corners or other spatially invariant features in an image. Then these features are sifted and clustered to look for objects. The integral image box feature approach specifically avoids this! What I am doing here is only a hack of my own invention. Keypoint detection followed by object detection is analogous to searching a full tree to find an optimal answer. The method of Viola and Jones specifically searches for objects with an optimized and highly aggressive pruning algorithm.
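One way to read the threshold and radius options is sketched below in plain Python. To be clear, this is a guess at the general idea and not the ppm2tbox algorithm. The greedy seeding order, the square window, and all function names are assumptions for illustration.

```python
def feature_points(red, green):
    """Points set in both the red and the green mask (these would look yellow)."""
    return [(x, y)
            for y, (r_row, g_row) in enumerate(zip(red, green))
            for x, (r, g) in enumerate(zip(r_row, g_row))
            if r and g]

def detect(points, threshold, radius):
    """Greedy clustering of feature points into objects.

    The first unclaimed point is taken as a seed. If at least `threshold`
    unclaimed points (the seed included) lie inside the square window of
    half-width `radius` around it, they are claimed as one object.
    Otherwise only the seed is dropped.
    """
    remaining = list(points)
    objects = []
    while remaining:
        cx, cy = remaining[0]
        near = [(x, y) for (x, y) in remaining
                if abs(x - cx) <= radius and abs(y - cy) <= radius]
        if len(near) >= threshold:
            objects.append((cx, cy, len(near)))
            remaining = [p for p in remaining if p not in near]
        else:
            remaining = remaining[1:]
    return objects

# Overlap of two tiny masks.
red = [[1, 0], [1, 1]]
green = [[1, 1], [0, 1]]
yellow = feature_points(red, green)

# A tight clump of four feature points plus one stray point far away.
points = [(2, 2), (3, 2), (2, 3), (3, 3), (20, 20)]

strict = detect(points, threshold=3, radius=2)   # the stray point is rejected
loose = detect(points, threshold=1, radius=2)    # the stray point is an object
merged = detect(points, threshold=1, radius=20)  # everything is one object
```

Even in this toy version, both knobs matter: the threshold rejects sparse points, and the radius decides how much area is merged into a single object.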

Anyway, this attempt detected:

We need to cluster more aggressively as too many objects are identified.

cat pooltable.jpg | ./jpg2ppm | ./ppm2fbox -s 13 | ./ppm2morph -r -d 1 | ./ppm2morph -g -d 1 | ./ppm2tbox -t 50 -r 50 | display -

This worked much better.

Note that the number of points exceeding the threshold is almost the same as before. Most of the improvement in object detection comes from the larger clusters allowed by the bigger object radius, not from culling bad feature points. This is another illustration of searching for objects at multiple scales.

One more example ...

A pool table with a solid top of green felt allows for a more efficient approach using image segmentation based on color. Let's pick a problem that forces us to use feature detection instead of something simpler (and much faster) like segmentation.

Here is a picture of some rocks. We would like to isolate the rocks as distinct objects. This is a difficult image for chroma image segmentation. The shadows of the larger rocks do appear distinct from the background. But mostly the field of rocks is a mottled mixture of color. One thing to remember is that the 8x8 block discrete cosine transform (DCT) compression artifacts from JPEGs and many other media formats affect chromaticity more than luminosity. Information content is deliberately much lower in chroma, as the human eye is far less sensitive to color inaccuracy than it is to luminance noise. As a result, color image segmentation must be able to tolerate this - either the regions to be detected are large and homogeneous or the colors are very distinct. In this case, we have neither.

cat rocks.jpg | ./jpg2ppm | ./ppm2fbox -s 13 | ./ppm2morph -r -d 1 | ./ppm2morph -g -d 1 | ./ppm2tbox -t 100 -r 50 | display -

This time, we just jumped to a good solution. The larger stones are detected in the rock field. One question you may be asking is: where did all of these magic numbers come from? How did I know to use this threshold or that radius? The answer is that I did not know until I tried. Through experimentation, I found numbers that seemed to work reasonably well. This may seem like cheating. I agree with you.

Better statistical techniques for clustering (analogous to Otsu's method for image segmentation) address this issue to some extent. EmbedCV does not support these techniques. I still need to learn how they work, let alone write software to automate the process. So I resort to hand-tuning the object filter parameters through experimentation until the output looks right.
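For comparison, this is roughly what an automatic choice of threshold looks like in the segmentation case. The sketch below is a plain Python version of Otsu's method, which picks the threshold that maximizes the variance between the two resulting classes. It is not part of EmbedCV, and the sample pixel values are made up.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximizes between-class variance.

    Class 0 holds values <= t and class 1 holds values > t.
    Pixel values are assumed to be integers in range(levels).
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * n for i, n in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in class 0 so far
    sum0 = 0    # sum of pixel values in class 0 so far
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # class 0 is still empty
        w1 = total - w0
        if w1 == 0:
            break             # class 1 is empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

# Made-up data: a dark population near 20-30 and a bright one near 200-210.
pixels = [20] * 50 + [30] * 50 + [200] * 40 + [210] * 60
t = otsu_threshold(pixels)
dark = [p for p in pixels if p <= t]
bright = [p for p in pixels if p > t]
```

No magic number is supplied here. The threshold falls out of the histogram, which is the kind of self-tuning the object filter parameters on this page lack.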

On a deeper level, this is not entirely cheating. Many techniques require tuned parameters. For example, the K-Means algorithm for clustering presumes from the start that the number of clusters is known. And threshold-based image segmentation methods are known to perform best with tuning. In a controlled environment like a factory, hand-tuning is likely the best way.

The approach of Viola and Jones also tunes the filter, except in an automated fashion. A large set of positive and negative object images is fed into a machine learning algorithm based on Adaptive Boosting (AdaBoost). Now the computer is tuning itself. Of course, initially a human being had to establish the "ground truth" by sorting images into positive and negative cases.