Date: Tue, 23 May 2006 00:11:16 -0500 (CDT)
Subject: big insight into stanley's vision system
X-UID: 173

The Intel OpenCV article about Stanley says exactly how it works. But the meaning doesn't come through until some of the concepts are laid out. This last week, I've been poking at my image processing book while eating lunch, waiting for a haircut, etc. Everything came together just now.

Stanley's image segmentation relies on an ergodic assumption that only holds when the robot is moving. When stationary, it is completely reliant on LIDAR. Basically, the adaptive vision is an optimization for extending range, just like they have always claimed. It doesn't work when the robot is not moving.

Stanley maintains a circular image buffer. This is a time series of monocular camera images. When the robot is moving, stuff like rocks and dirt on the side of the road will blur together when each pixel is averaged over time. That time average will equal the spatial average if the ergodic assumption is true. This will tend to be the case when the road and scenery are not changing too much. For a robot moving at high speed on a road, this is most of the time.

In general, ergodicity is a statement about the equality of averages in the time and spatial domains. I don't know much more about it than this, as I just read a definition on Wikipedia. This is a very powerful result (discovered 75 years ago by the American mathematician Birkhoff).

Alone, this is not enough. The image segmentation is very rough. Adaptive RGB histogram thresholding does not yield a clean road surface. Region connectedness criteria are applied to yield the large contiguous blob that should be the road. Then the blob is compared to expected models of a road. This last step is a key observation: the system is not model-free. It must know what a road looks like to find it reliably. In this light, I was going in the right direction. My approach was just far too simple.

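To make the steps above concrete, here is a rough sketch of the idea as I understand it. It is not Stanley's code. It assumes tiny grayscale frames stored as lists of rows, a fixed intensity threshold standing in for the adaptive RGB histogram thresholding, and 4-connectivity for the region test. The names TEXTURE, ROAD, make_frame, temporal_mean, threshold and largest_region are all made up for illustration, and the final model comparison is left out.

```python
from collections import deque

TEXTURE = [100, 20, 180, 20]  # hypothetical roadside intensities; mean is 80
ROAD = 100                    # hypothetical road intensity


def make_frame(t, rows=4):
    """Synthetic frame at time t: road in columns 1-3, scrolling roadside texture."""
    frame = []
    for r in range(rows):
        side = TEXTURE[(r + t) % len(TEXTURE)]  # texture scrolls as the robot moves
        frame.append([side, ROAD, ROAD, ROAD, side, side])
    frame[0][5] = ROAD  # static road-colored patch that is not part of the road
    return frame


def temporal_mean(frames):
    """Per-pixel average over every frame currently in the buffer."""
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[sum(f[r][c] for f in frames) / n for c in range(cols)]
            for r in range(rows)]


def threshold(image, reference, tolerance):
    """Mark pixels whose intensity is within tolerance of the reference."""
    return [[abs(p - reference) <= tolerance for p in row] for row in image]


def largest_region(mask):
    """Return the (row, col) cells of the largest 4-connected True region."""
    rows, cols = len(mask), len(mask[0])
    seen, best = set(), set()
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or (r, c) in seen:
                continue
            region, queue = set(), deque([(r, c)])
            seen.add((r, c))
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                region.add((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            if len(region) > len(best):
                best = region
    return best


# Circular buffer: deque(maxlen=...) silently drops the oldest frame.
buffer = deque(maxlen=4)
for t in range(4):
    buffer.append(make_frame(t))

single_blob = largest_region(threshold(make_frame(0), ROAD, 10))
road_blob = largest_region(threshold(temporal_mean(buffer), ROAD, 10))
print(len(single_blob), len(road_blob))  # 15 12
```

In a single frame, a few roadside pixels happen to match the road and leak into the blob (15 cells). After averaging over the buffer, the roadside settles to its mean of 80, the stray patch is cut off from the road, and only the 12 road cells survive the connectedness test.
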
Also, performance should be best when driving down a path not much wider than the robot. The reason is that the model is simple. The road is a long strip of pavement. This allows the vision system to reject shadows or other anomalies from lighting. In a parking lot with trees, the model becomes far more complex. Stanley would then rely on the LIDAR system. But as my robot does not have such a system, it will tend to become confused.

What this means is that real-world robot vision is generations away. The most advanced field robots today are still very primitive. Without LIDAR or other pulsed time-of-flight distance measurement systems, they could not function reliably.