Date: Tue, 18 Apr 2006 01:04:01 -0500 (CDT)
Subject: robot crash (long)
X-UID: 164

http://golem5.org/robot1/images/img2883.jpg
http://golem5.org/robot1/images/img2884.jpg
http://golem5.org/robot1/images/design20060418.jpg

I crashed the robot on Sunday. It wasn't a good feeling. It was also a lesson, as I had stayed up all night, was tired and hungry, and decided to go test anyway. That was a mistake. I need to go more with common sense instead of pushing. When that little voice says it is a bad idea, it usually is.

The pictures look worse than the damage really is. The only damage is the bent front wheel axle. It's a short length of threaded stainless rod. I'll pick up another piece and make another one.

On the positive side, the robot is actually pretty strong. It survives crashes far better than I would have thought. I was always worried about the exposed front wheels in the typical dune buggy configuration. They have no bumper to protect them. What happens in a turn is that if the robot's outer front wheel strikes something, it turns inward, bending the axle section and stretching the elastic cables (bungee cords) that hold the front wheels in place. This turns out to be an advantage over rigid tie rods: the cables absorb crash energy better. I was also concerned about the 1/4 inch threaded stainless axles up front. They always felt springy and prone to bending - that is actually a good thing. Were they fully rigid, the impact energy would transfer deeper into the frame.

On the negative side, the control system was very erratic. The front wheels were twitchy. I had not seen that indoors. It was windy, which could have caused the robot to vibrate and the gyro reading to fluctuate. The wheels slewed hard left after I pushed the stick over. Then they were stuck and wouldn't come back to center. The throttle stuck on full. So it was a hard left turn at full throttle, and the controls weren't responding. In the rush to rewrite the control system, I had removed the remote kill.
It flashed through my mind as I saw the robot go out of control how much I wished I hadn't done that. The only thing that stopped the robot was the runaway failsafe on the motor control board. This sounds kind of cool written down here but at the time I had a very sick feeling in my stomach.

I still don't know why this happened. I had planned on logging and recording everything. But I forgot to turn on the logging system! So I have no test run data from the crash. Otherwise, I'd have video and data.

Note that while the sensor bar was knocked loose and the right front axle bent, none of the electronics suffered damage (that I know of) or crashed. So the machine was still driveable afterwards, although crippled.

As you can see from the high level design drawing, there are multiple points of failure. I'm aware of some failure modes as I have seen them in development. Sometimes the sensor process locks onto the serial frames wrong and shifts the bytes. Were this to happen, the distance sensor reading might replace the gyro reading. This would cause the robot to slew hard left and stay there. I still had direct manual control over the steering motor and could command it to turn right. However, both the gyro feedback and the manual override control are done in the same thread. Additionally, the same thread sends logging data to hippo. So a failure with either the gyro or logging would take out the manual override.

There are many more failure scenarios. But this is making me think of another style of defensive design. There are tiers of service and function. The core functions must be highly reliable. It's like the airbag system in a car. That must function under all conditions no matter what happens. If the electrical system fails, the airbag system must still work. I've read that automotive design has traditionally been constrained by these considerations. That is why you don't see pure drive-by-wire cars.
Commercial cars are always power and computer assisted but the driver is always in the loop with manual control at all times. In a pure drive-by-wire system, if you lose the computer, then you lose control.

At some level, this becomes inevitable. There's the famous story of the US naval ship that lost propulsion because a Windows NT computer crashed. And the F-16 has a force sensitive joystick with a complete fly-by-wire system. These kinds of machines are so complicated that you must rely on the computer. And if you lose the computer, then you are doomed anyway. Although, the Russians did very well with the MiG-29 (now obsolescent but impressive for its time) with fully hydraulic and manual controls.

So, my better sense tells me that I must have two things before I proceed with field testing.

1. A good understanding of the control loops and data flow paths through the software with an eye to failure modes. This means that the design is conscious of failure modes and done in such a way as to obviate them.

2. Formalized field procedures for preparing, warming up, and deploying the robot. I was stressed and tired on Sunday, as I had too much to do. I need to have a checklist with clear steps. For each run, I should follow the checklist. That way, mistakes like forgetting to turn on the logging and recording system will not happen.

It takes me a good 15 minutes to set up everything to run the robot. This is very awkward. It's actually nerve-wracking as sometimes people are watching me while I am booting all the systems, getting the radios up, and uploading software. After the crash, I forgot to unplug the router on the robot. So I drove home and carried it inside powered up the entire time.

At work, one guy with a nice Mac PowerBook power cycles his laptop only once every two or three weeks. He just puts it in sleep mode instead of doing a full power down. This gives me an idea: arrive at a testing area with everything hot.
If I can get enough reliability and safety in the system, then everything could be powered up, robot, computer, radios, before reaching the testing area. That way, all the computers and electronics are ready to go. Park the car, open the trunk, remove robot, take out laptop, go. From 15 minutes down to one minute. I'm probably not going to pursue this for quite a while. But it is a lesson in how important these logistical considerations are in the real world. If your technology or equipment requires lots of special care, then it is most inconvenient to use.

One more thing. The control system in the design diagram is a pretty radical departure from the original notion I had of one big multithreaded process (which later had a small RT scheduled child process). The reason I leaned towards the one big process idea was to have more control. If you only run one thing, then it is easier to control what is going on. But if you have many processes, then timing and scheduling are much more complicated. This is true in general. A multitasking OS is not DOS. So really what I wanted to do was make the multitasking OS like DOS, where I had tight control over the computer.

Well, what I found is that in practice this is very difficult or perhaps impossible. In Unix, all processes are descendants of init, process id 1. So in theory, especially as in Linux threads are implemented in the kernel as processes, the functionality of the new design with small processes connected by pipes should be technically possible with the "one big process". In practice, I think this is very difficult to do.
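
To make the "small processes connected by pipes" layout a bit more concrete, here is a minimal Python sketch. It is not the robot's actual code. The two stage scripts, the "gyro,distance" line format, the sample values, and the GAIN constant are all made up for illustration. It only shows the shape of the idea: each stage is its own OS process, and the only thing they share is a pipe.

```python
import subprocess
import sys

# Hypothetical stage 1: a stand-in for the sensor process.
# It writes one "gyro,distance" text line per sample to stdout.
SENSOR_STAGE = r"""
import sys
for gyro, distance in [(1.5, 118), (-0.5, 115), (2.0, 110)]:
    sys.stdout.write(f"{gyro},{distance}\n")
"""

# Hypothetical stage 2: a stand-in for the control process.
# It reads samples from stdin and writes one steering command per line.
CONTROL_STAGE = r"""
import sys
GAIN = 2.0  # made-up proportional gain, for illustration only
for line in sys.stdin:
    gyro, _distance = line.strip().split(",")
    sys.stdout.write(f"steer {-GAIN * float(gyro):+.1f}\n")
"""


def run_pipeline():
    """Run sensor | control as two separate processes joined by a pipe."""
    sensor = subprocess.Popen(
        [sys.executable, "-c", SENSOR_STAGE],
        stdout=subprocess.PIPE,
    )
    control = subprocess.Popen(
        [sys.executable, "-c", CONTROL_STAGE],
        stdin=sensor.stdout,
        stdout=subprocess.PIPE,
        text=True,
    )
    # Close the parent's copy of the pipe's read end so that only the
    # control stage holds it.
    sensor.stdout.close()
    output, _ = control.communicate()
    sensor.wait()
    return output.splitlines()


if __name__ == "__main__":
    for command in run_pipeline():
        print(command)
```

Because each stage is a separate process, a stage that hangs or dies does not by itself take the others down, which is the property the one-thread design above lacked. A real version would still need to handle a stalled pipe and stale data.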