Note: The ImageMagick tool display is used throughout. ImageMagick is an excellent free, open source set of general image processing tools.
NOTE: The EmbedCV library really does not support object detection! It does have a few functions for computing the integral image transform and simple box features (Paul Viola and Michael Jones, Robust Real-time Object Detection, July 2001). Use of the integral transform for object detection really requires much deeper infrastructure than what EmbedCV currently provides. The level of effort for full object detection and identification is so much greater that I felt it best to checkpoint the code as-is. The approach on this page is a hack done over Thanksgiving weekend and is extremely inefficient. An optimized approach should run orders of magnitude faster.
All object detection methods I am aware of detect features and try to cluster them into clumps that appear recognizable to the computer vision system as distinct objects. The detected features almost always come from the luminosity image. Color is typically ignored. This is sensible as most image information content is in luminosity, not chromaticity. However, it does sometimes happen that the useful information is in chromaticity rather than luminosity.
Let's try detecting edges in the image to see if they "clump" together in a way useful for object detection. We will use the ppm2edge tool to generate an edge image.
cat pooltable.jpg | ./jpg2ppm | ./ppm2edge | display -
Usage: cat input.ppm | ./ppm2edge [-a|-x|-y|-z] [-s number] [-d] > output.ppm
how edges are output (default is -z)
-a show -y as red, -x as green, luma as blue
-x only show horizontal edges (from vertical kernel)
-y only show vertical edges (from horizontal kernel)
-z show combined edges
reduce edge magnitudes by a power of 2 (default is -s 5)
-s number of bits to right shift
Unfortunately, the edge image appears cluttered. This is typical, especially for the edge detection methods that rely exclusively on convolution with a small kernel. In this case, a 3x3 Sobel kernel is used. As a result, higher frequency noise is detected as edges. Worse, most of the edges do not seem to fit into compact blobs. We need a way of rejecting the longer edges while keeping the shorter ones.
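To see why a small kernel reacts so strongly to noise, here is a minimal sketch of 3x3 Sobel edge detection on a luma image stored as a list of rows. It is not the ppm2edge source. The function name, the |gx| + |gy| magnitude approximation and the border handling are assumptions made for illustration.

```python
def sobel_magnitude(img, shift=0):
    """Approximate edge magnitude |gx| + |gy| for a 2-D list of luma values.

    Border pixels are left at zero. The optional right shift scales the
    magnitude down by a power of two (an assumed analogue of the -s option).
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Change along x: right column minus left column
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            # Change along y: bottom row minus top row
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            out[y][x] = (abs(gx) + abs(gy)) >> shift
    return out


# A flat image with one noisy pixel in the middle
img = [[0] * 5 for _ in range(5)]
img[2][2] = 100
edges = sobel_magnitude(img)
for row in edges:
    print(row)
```

The single noisy pixel produces a ring of eight strong responses around itself, which is the kind of speckle that clutters the edge image.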
Let's try something simple. We'll blur the image using ppm2blur before detecting edges. That should reduce the effect of noise at least. Note: A linear filter like blurring (some form of average of neighboring values) works well for additive noise but poorly for "shot noise" - that requires a non-linear filter (e.g. median of neighboring values).
cat pooltable.jpg | ./jpg2ppm | ./ppm2blur -r 10 | ./ppm2edge | display -
Usage: cat input.ppm | ./ppm2blur [-r num] > output.ppm
default is blur once (-r 1)
-r number of times to repeat blurring operation
To make a clear difference, the image was blurred 10 times. It did help. There are fewer noisy specks of detected edge. The balls on the pool table stand out more than they did before. But it is still not enough. Alone, Sobel edge detection will not give us the features we need to identify objects (compact blobs of stuff) in the image.
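The difference between a linear blur and a median filter on "shot noise" can be sketched in a few lines. This is an illustration only. The 3x3 window, the helper names and the plain box average are assumptions, and the kernel that ppm2blur actually uses may differ.

```python
from statistics import median


def filter3x3(img, reduce):
    """Apply reduce() to every 3x3 neighborhood; border pixels are copied."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [img[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = reduce(window)
    return out


def mean(window):
    """Integer average of the window (a simple linear blur)."""
    return sum(window) // len(window)


# Flat gray image with a single bright "shot noise" pixel
img = [[10] * 5 for _ in range(5)]
img[2][2] = 255

blurred = filter3x3(img, mean)    # the speck is smeared over a 3x3 patch
cleaned = filter3x3(img, median)  # the speck is removed entirely

print(blurred[2])
print(cleaned[2])
```

The average spreads the outlier into its neighbors, while the median discards it because it never reaches the middle of the sorted window.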
We need to try something different.
cat pooltable.jpg | ./jpg2ppm | ./ppm2fbox | display -
Usage: cat input.ppm | ./ppm2fbox [-s shift] > output.ppm
size of feature boxes (default -p 8)
-p number pixels of box small dimension
reduce feature magnitudes by a power of 2 (default is -s 5)
-s number of bits to right shift
The tool ppm2fbox computes the integral image transform and uses it to calculate two kinds of box sum features - one for horizontal differences and the other for vertical. It is equivalent to looking for edges at different image scales (remember that the Sobel convolution kernel is fixed at 3x3 - multiple scales would require an image pyramid). Once the transform has been computed, the sum over any rectangular box takes the same four table lookups whatever the box size, so integral image box features may be of arbitrary size without incurring any additional computational cost. That is the power of the integral image transform (a very clever arithmetic trick).
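The arithmetic trick can be sketched as follows. This is a generic integral image, not the EmbedCV implementation. The function names and the extra row and column of zeros are choices made for this example.

```python
def integral_image(img):
    """Return ii where ii[y][x] is the sum of img over rows < y and cols < x."""
    h, w = len(img), len(img[0])
    # One extra row and column of zeros avoids special cases at the border
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, x, y, bw, bh):
    """Sum of the bw x bh box with top-left corner (x, y): four lookups."""
    return ii[y + bh][x + bw] - ii[y][x + bw] - ii[y + bh][x] + ii[y][x]


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(box_sum(ii, 0, 0, 3, 3))  # whole image
print(box_sum(ii, 1, 1, 2, 2))  # lower-right 2x2 box
```

The transform is built in one pass over the image. After that, box_sum does the same amount of work for a 2x2 box as for a 200x200 box.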
The result is better. Noise is greatly attenuated. Unlike the Sobel kernel that only sees the 9 pixel values in a 3x3 square mask, the integral box features for the image above consider 256 pixel values in a 16x16 square. This effectively averages over many pixels. Anything much smaller than the box size will tend to be ignored. This is where the multiple scale calculation enters into our approach. By varying the feature box size and shape, we can select for different features in the image.
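A small sketch shows why anything much smaller than the box tends to be ignored. The layout assumed here, two adjacent boxes of 8x16 pixels covering a 16x16 footprint, is a guess at what ppm2fbox does with its default box size, and the function names are hypothetical. Box sums are computed by direct summation to keep the example short. An integral image would give the same values.

```python
def region_sum(img, x, y, bw, bh):
    """Direct sum of the bw x bh box whose top-left corner is (x, y)."""
    return sum(sum(row[x:x + bw]) for row in img[y:y + bh])


def horizontal_difference(img, x, y, p):
    """|left box - right box|, each box p wide and 2p tall (assumed layout)."""
    return abs(region_sum(img, x, y, p, 2 * p)
               - region_sum(img, x + p, y, p, 2 * p))


def vertical_difference(img, x, y, p):
    """|top box - bottom box|, each box 2p wide and p tall (assumed layout)."""
    return abs(region_sum(img, x, y, 2 * p, p)
               - region_sum(img, x, y + p, 2 * p, p))


P = 8
# One bright noise pixel on a black background
speck = [[0] * 16 for _ in range(16)]
speck[3][3] = 255
# A real edge: left half black, right half white
step = [[0] * 8 + [255] * 8 for _ in range(16)]

speck_response = horizontal_difference(speck, 0, 0, P)
step_response = horizontal_difference(step, 0, 0, P)
print(speck_response, step_response)
print(speck_response >> 5, step_response >> 5)  # after a right shift of 5
```

The edge fills a whole box and outweighs the single pixel by a factor of 128, so after scaling the speck all but disappears while the edge remains.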
So now let's see if we can isolate the feature blobs. We will use the ppm2morph tool for this. Using it this way is somewhat unintuitive but very useful, especially for visualizing what is going on. The horizontal and vertical box feature blobs will grow outwards one pixel from the outer edges. This allows recognizing them as distinct.
cat pooltable.jpg | ./jpg2ppm | ./ppm2fbox | ./ppm2morph -r -d 1 | ./ppm2morph -g -d 1 | display -
Usage: cat input.ppm | ./ppm2morph [-r|-g|-b] [-d num|-e num|-o num|-c num] > output.ppm
which color channel to process (default is all channels)
-r red image channel only
-g green image channel only
-b blue image channel only
morphological operation (default is -e 1)
-d number of times repeat dilation (expand region)
-e number of times repeat erosion (shrink region)
-o number of times repeat opening (remove small blobs)
-c number of times repeat closing (fill in holes)
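The four operations listed above can be sketched on a single image channel. This is the textbook definition with a 3x3 square neighborhood, which is an assumption. The structuring element and border handling in ppm2morph may differ, and the function names are made up for this example.

```python
def _morph(img, pick):
    """Replace each pixel by pick() over its 3x3 neighborhood (clipped at borders)."""
    h, w = len(img), len(img[0])
    return [[pick(img[ny][nx]
                  for ny in range(max(0, y - 1), min(h, y + 2))
                  for nx in range(max(0, x - 1), min(w, x + 2)))
             for x in range(w)]
            for y in range(h)]


def dilate(img, times=1):
    """Expand bright regions by one pixel per pass."""
    for _ in range(times):
        img = _morph(img, max)
    return img


def erode(img, times=1):
    """Shrink bright regions by one pixel per pass."""
    for _ in range(times):
        img = _morph(img, min)
    return img


def opening(img):
    """Erosion followed by dilation: removes small blobs."""
    return dilate(erode(img))


def closing(img):
    """Dilation followed by erosion: fills in small holes."""
    return erode(dilate(img))


# A single set pixel in the middle of a 5x5 channel
img = [[0] * 5 for _ in range(5)]
img[2][2] = 1
for row in dilate(img):
    print(row)
```

One pass of dilation grows the single pixel into a 3x3 blob, while opening removes it entirely.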
Uh oh, this is not a very good result. Apparently, the entire image frame is still filled up with detected feature values. Only small scattered patches of the image are clear enough to allow dilation to grow an edge. This is the same problem we had with Sobel edge detection. However, all we need to do is ignore all but the highest feature values. To do this quickly, EmbedCV relies on division by bit shifting: a right shift by n bits divides by 2^n and discards the remainder. It is close enough for our purpose and faster than integer division.
cat pooltable.jpg | ./jpg2ppm | ./ppm2fbox -s 13 | ./ppm2morph -r -d 1 | ./ppm2morph -g -d 1 | display -
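The effect of raising the shift from 5 to 13 can be sketched with a few made-up feature magnitudes. Clamping the result to 255 for an 8-bit output channel is an assumption about how the values end up in the PPM image, and the function name is hypothetical.

```python
def scale_feature(value, shift):
    """Divide by 2**shift with a right shift, then clamp to 8 bits (assumed)."""
    return min(value >> shift, 255)


# Made-up raw feature magnitudes, from weak to very strong
features = [255, 4000, 32640, 70000]

print([scale_feature(v, 5) for v in features])   # weak features still visible
print([scale_feature(v, 13) for v in features])  # only the strongest survive

# For non-negative integers a right shift equals floor division by 2**n
assert all((v >> 13) == v // 2 ** 13 for v in features)
```

With a shift of 5, even modest feature values remain nonzero and the strong ones saturate. With a shift of 13, everything below 8192 drops to zero, which leaves clear space around the remaining blobs.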
This looks much better. There are distinct blob outlines. They appear approximately where our eye would be drawn in distinguishing objects. The highest concentration of blobs, many of them overlapping, is on the pool balls. Next, we have to cluster these blobs into distinct objects.
cat pooltable.jpg | ./jpg2ppm | ./ppm2fbox -s 13 | ./ppm2morph -r -d 1 | ./ppm2morph -g -d 1 | ./ppm2tbox | display -
Usage: cat input.ppm | ./ppm2tbox [-t threshold] [-r radius] > output.ppm
threshold for detected object (default is -t 8)
-t number of intersection points
object radius window (default is -r 16)
-r number of pixels
This sort of worked. The tool ppm2tbox counts any point where a horizontal (green) and a vertical (red) blob outline overlap as a feature point (appears as yellow). Many object detection and recognition approaches rely on first detecting corners or other spatially invariant features in an image. Then these features are sifted and clustered to look for objects. The integral image box feature approach specifically avoids this! What I am doing here is only a hack of my own invention. Keypoint detection followed by object detection is analogous to searching a full tree to find an optimal answer. The method of Viola and Jones specifically searches for objects with an optimized and highly aggressive pruning algorithm.
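One plausible reading of the threshold and radius options is sketched below: collect the points where both channels are set, then keep the points that have enough neighbors inside a square window. This is a guess for illustration, not the ppm2tbox source. The function names, the square window and the counting rule are all assumptions.

```python
def intersection_points(red, green):
    """Points (x, y) where both the red and the green channel are nonzero."""
    return [(x, y)
            for y, (r_row, g_row) in enumerate(zip(red, green))
            for x, (r, g) in enumerate(zip(r_row, g_row))
            if r and g]


def detect(points, threshold, radius):
    """Keep points whose +/- radius window holds at least threshold points.

    The count includes the point itself. This brute-force version is
    quadratic in the number of points.
    """
    hits = []
    for cx, cy in points:
        count = sum(1 for x, y in points
                    if abs(x - cx) <= radius and abs(y - cy) <= radius)
        if count >= threshold:
            hits.append((cx, cy))
    return hits


size = 24
red = [[0] * size for _ in range(size)]
green = [[0] * size for _ in range(size)]
# A tight cluster of overlaps, one isolated overlap, and two non-overlaps
for x, y in [(1, 1), (2, 1), (1, 2), (2, 2), (20, 20)]:
    red[y][x] = green[y][x] = 255
red[10][10] = 255
green[5][5] = 255

points = intersection_points(red, green)
objects = detect(points, threshold=3, radius=2)
print(points)
print(objects)
```

Raising the threshold or shrinking the radius makes the filter stricter, which matches the kind of hand-tuning described in the text.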
Anyway, this attempt detected:
cat pooltable.jpg | ./jpg2ppm | ./ppm2fbox -s 13 | ./ppm2morph -r -d 1 | ./ppm2morph -g -d 1 | ./ppm2tbox -t 50 -r 50 | display -
This worked much better.
A pool table with a solid top of green felt allows for a more efficient approach using image segmentation based on color. Let's pick a problem that forces us to use feature detection instead of something simpler (and much faster) like segmentation.
Here is a picture of some rocks. We would like to isolate the rocks as distinct objects. This is a difficult image for chroma image segmentation. The shadows of the larger rocks do appear distinct from the background. But mostly the field of rocks is a mottled mixture of color. One thing to remember is that the 8x8 block cosine transform compression artifacts from JPEGs and many other media formats affect chromaticity more than luminosity. Information content in chroma is deliberately kept much lower, as the human eye is far less sensitive to color inaccuracy than it is to luminance noise. As a result, color image segmentation must be able to tolerate this - either the regions to be detected are large and homogeneous or the colors very distinct. In this case, we have neither.
cat rocks.jpg | ./jpg2ppm | ./ppm2fbox -s 13 | ./ppm2morph -r -d 1 | ./ppm2morph -g -d 1 | ./ppm2tbox -t 100 -r 50 | display -
This time, we just jumped to a good solution. The larger stones are detected in the rock field. One question you may be asking is: where did all of these magic numbers come from? How did I know to use this threshold or that radius? The answer is that I did not know until I tried. Through experimentation, I found numbers that seemed to work reasonably well. This may seem like cheating. I agree with you.
Better statistical techniques for clustering (analogous to Otsu's method for image segmentation) address this issue to some extent. EmbedCV does not support these techniques. I still need to learn how they work, let alone write software to automate the process. So I resort to hand-tuning the object filter parameters through experimentation until the output looks right.
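For reference, Otsu's method itself is short. The sketch below picks a threshold from a list of 8-bit values by maximizing the between-class variance. It only illustrates the kind of automatic parameter selection mentioned above and is not part of EmbedCV. The function name and the sample data are made up.

```python
def otsu_threshold(values, levels=256):
    """Return t such that splitting into (<= t) and (> t) maximizes
    the between-class variance of the two groups."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * c for i, c in enumerate(hist))

    best_t, best_score = 0, -1.0
    w0 = 0    # number of values in the lower class
    sum0 = 0  # sum of values in the lower class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, nothing to compare
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor
        score = w0 * w1 * (m0 - m1) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


# Two made-up populations: dark background and bright objects
values = [10] * 50 + [12] * 50 + [200] * 30 + [205] * 30
t = otsu_threshold(values)
print(t)
```

The returned threshold falls between the dark and the bright group without any hand-tuned number, which is the appeal of such statistical methods.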
On a deeper level, this is not entirely cheating. A great many techniques require tuned parameters. For example, the K-Means algorithm for clustering presumes from the start that the number of clusters is known. And threshold-based image segmentation methods are known to perform best with tuning. In a controlled environment like a factory, hand-tuning is likely the best way.
The approach of Viola and Jones also tunes the filter, but in an automated fashion. A large set of positive and negative object images is fed into an Adaptive Boosting based machine learning algorithm. Now the computer is tuning itself. Of course, a human being first had to establish the "ground truth" by sorting images into positive and negative cases.