Date: Thu, 20 Apr 2006 01:08:42 -0500 (CDT)
Subject: full pipe buffers caused control system failure
X-UID: 165

The robot control system failed because the pipe buffers between processes filled up. This was an inherent design flaw that went unnoticed during development inside my apartment. In fact, multiple symptoms were observed but never connected to a common cause or recognized as a serious problem.

The rate gyro and sensors are read by a process named "sensor". This process prints the readings to standard output, which is piped to the next process in the chain, named "boss".

    gyro --data--> sensor --pipe--> boss

From 10 to 50 times per second, "sensor" writes approximately 15 bytes into the pipe connected to "boss". Pipes in Linux are 4096 bytes in size. This means the pipe can hold about 270 readings, which at the slowest rate is roughly the last 25 seconds of gyro and sensor readings. So it is possible for the "boss" process to act on gyro and sensor readings that occurred up to 25 seconds in the past. The robot might be turning and not know this until much later.

Motor control is done by a process named "motor" that has real-time (RT) scheduling. This process reads command frames from a pipe fed by the "boss" process.

    boss --pipe--> motor

"boss" writes approximately 25 to 50 bytes into the pipe to "motor" every time it passes through its control loop. At 25 bytes per cycle, this means it is possible for over 160 control loop cycles to take place before the motors act on a given command. The pipe buffer creates a time delay.

So here's what happened. The robot is powered up, all systems booted, and the control system started. At this point, it just sits there. In the time it takes to get everything ready, the pipe buffer between "sensor" and "boss" is partially filled with several seconds of old readings. The pipe buffer between "boss" and "motor" is still empty, as "motor" has RT scheduling (SCHED_RR).

Now forward drive throttle is applied. It so happens that this doubles the number of bytes output from "boss" into the pipe to "motor".
The reason is that the robot has just switched from the initial neutral into forward drive and must now output many other commands. Instead of 25 bytes, there are now 50 bytes with every control loop cycle. Even with RT scheduling (SCHED_RR, which still observes time slices), the "motor" process cannot consume the control commands fast enough and the pipe buffer quickly fills up. There is a backlog of several dozen commands telling the robot to drive forwards.

As the robot starts moving, the command to turn left is given. As explained earlier, the robot still believes it is not turning, as all the gyro readings it sees are old and do not reflect what is happening now. So the wheels turn left and lock in place against the limits. These commands to turn left start filling up the pipe to the motor control process.

The robot is now turning hard left and driving forwards at speed. It cannot slow down or reverse, as the pipe holds several seconds' worth of commands telling it to drive forwards. It cannot turn right for the same reason: there are several seconds of commands to turn left. This explains why the controls, even the manual override, were locked out.

On numerous occasions, I witnessed similar anomalies of control lockout during development in the apartment. But every time I restarted the control system, everything worked perfectly. Now I know that I was flushing away the pipe buffers.

The old sensor reading problem was never noticed, as the robot was usually stationary and not turning. The delay in sensor readings did not affect motor control because the gyro was stationary: historical readings are the same as new ones.

There were times when everything was working. When I picked up the robot and turned it back and forth to see the steering compensate, there was minimal lag and good feedback control.
My explanation for this is that not much else was happening: the robot remained in neutral (25 bytes per motor control loop, not 50) and there were no radio commands (I was holding the robot, not moving the joysticks). So the computer was able to keep the pipe buffers empty. But it was on the edge all the time.

The fix is to keep the pipe buffers from filling up by reducing the sensor reading rate. I had so many sensor reading problems before that I tuned "sensor" for highest performance. In hindsight, the sensor readings set the basic arrival rate into the system. Unfortunately, the system as a whole is not fast enough to service it. I will probably add a noise filter into "sensor", time-average successive readings for noise reduction, and halve the output rate.
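
The buffer arithmetic and the planned fix can be sketched in a few lines of Python. This is only an illustration, not the robot's actual code: the function names (backlog_seconds, backlog_writes, average_pairs) are made up for the example, the 4096-byte capacity is the figure quoted in this post, and readings are assumed to be plain numbers. The first two functions estimate how stale a full pipe can be; the third averages each successive pair of readings, so it emits one value for every two it receives.

```python
PIPE_CAPACITY = 4096  # bytes; the pipe size quoted in this post (assumed here)


def backlog_seconds(bytes_per_write, writes_per_second, capacity=PIPE_CAPACITY):
    """Worst-case age in seconds of the oldest data sitting in a full pipe."""
    return capacity / (bytes_per_write * writes_per_second)


def backlog_writes(bytes_per_write, capacity=PIPE_CAPACITY):
    """Number of whole writes that fit in the pipe before it is full."""
    return capacity // bytes_per_write


def average_pairs(readings):
    """Yield the mean of each successive pair of readings.

    Two readings in, one reading out, so the output rate is half the
    input rate. A trailing unpaired reading is dropped.
    """
    previous = None
    for value in readings:
        if previous is None:
            previous = value  # hold the first reading of the pair
        else:
            yield (previous + value) / 2.0
            previous = None  # start a new pair


if __name__ == "__main__":
    # "sensor" pipe: about 15 bytes per reading, 10 to 50 readings per second
    print(backlog_seconds(15, 10))  # slowest rate
    print(backlog_seconds(15, 50))  # fastest rate
    # "motor" pipe: 25-byte command frames
    print(backlog_writes(25))
    # four readings in, two averaged readings out
    print(list(average_pairs([1.0, 3.0, 5.0, 7.0])))
```

With these numbers the sensor pipe can hold about 27 seconds of readings at 10 per second and about 5.5 seconds at 50 per second, consistent with the rough 25-second figure above, and the motor pipe can hold 163 command frames of 25 bytes. Averaging in pairs is the simplest possible filter; a longer window would smooth more but would also add its own delay.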