Why Independent Verification Is Important

While researching a topic, I came across an article that is a good reminder to us all of the consequences, sometimes fatal, of believing in the infallibility of software. In “An Investigation of the Therac-25 Accidents” (parts one, two, three, four, and five), Nancy G. Leveson and Clark S. Turner say that trusting software too much is a common pitfall in system development that “leads to complacency and over-reliance on computerized functions.” Leveson and Turner make the point that no software can be entirely bug-free, and therefore program managers need to incorporate independent verification methods. “Systems should not be designed such that a single software error or software errors can be catastrophic,” they write.

Their case in point is the Therac-25, a radiation therapy machine used to treat cancer patients in the mid-1980s. For almost two years (from 1985 to 1987), the machine sometimes administered fatal radiation doses, despite signs that something was wrong. The article on the Therac-25, published in IEEE Computer in 1993, makes for fascinating, albeit slightly challenging, reading.

Basically, the story is this: A software bug caused the Therac-25 to display that it had administered too low a dose of radiation, or none at all. In response, operators would repeatedly push the button to administer additional radiation. As a result, some patients received overdoses, even though there was visible evidence of them, such as red, striped skin in the pattern of the radiation-blocking tray. In some cases patients eventually died. The manufacturer insisted that its machine was incapable of overdosing patients, and hospital clinicians, who also had a hard time believing they were killing patients, continued to use the machine.

But patients continued to die. Investigators eventually concluded that one of the software bugs behind the undetected overdoses was triggered when the operator used the cursor keys to edit treatment data quickly.

A few important conclusions come from this incident:

First, the manufacturer had subjected the Therac-25 to safety testing, and hospitals had used it for thousands of hours. But testing alone is not enough. Errors can still occur even after you install a software patch to fix the first problem. “There is always another software bug,” the authors wrote.

Second, much of the software code for the Therac-25 was taken from earlier models. Don’t assume that just because the software was used before, it’s okay to use it again. “Reusing software modules does not guarantee safety in the new system … and sometimes leads to awkward and dangerous designs,” Leveson and Turner wrote.
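
To picture what an independent verification method might look like in practice, here is a minimal sketch in Python. It is purely illustrative and is not drawn from the Therac-25 or from Leveson and Turner's article: the Interlock class, the deliver function, the 5 percent tolerance, and the dose figures are all hypothetical. The idea it tries to show is that the decision to continue treatment rests on a separately measured dose, so a bug in what the controller reports cannot by itself let an overdose through.

```python
from dataclasses import dataclass


@dataclass
class Interlock:
    """Hypothetical independent safety check (not a real device API)."""
    max_dose_cgy: float      # hard ceiling, configured separately from the controller
    tolerance: float = 0.05  # allowed relative disagreement (assumed value)

    def permits(self, prescribed_cgy: float, measured_cgy: float) -> bool:
        """Return True only if the independent measurement matches the plan."""
        if prescribed_cgy <= 0 or measured_cgy < 0:
            return False  # nonsensical input: fail safe
        if measured_cgy > self.max_dose_cgy:
            return False  # above the hard ceiling, whatever was prescribed
        return abs(measured_cgy - prescribed_cgy) <= self.tolerance * prescribed_cgy


def deliver(prescribed_cgy: float,
            controller_reported_cgy: float,
            measured_cgy: float,
            interlock: Interlock) -> str:
    """Decide whether treatment may continue.

    controller_reported_cgy is deliberately ignored for the safety decision:
    it comes from the same software that might be wrong.
    """
    if not interlock.permits(prescribed_cgy, measured_cgy):
        return "HALT"
    return "CONTINUE"


if __name__ == "__main__":
    interlock = Interlock(max_dose_cgy=300.0)
    # The controller claims no dose was given, but the independent sensor disagrees.
    print(deliver(200.0, 0.0, 20000.0, interlock))
    # The measurement agrees with the prescription within tolerance.
    print(deliver(200.0, 200.0, 202.0, interlock))
```

In this sketch, a faulty display reading of "no dose" never reaches the safety decision, which is the kind of separation the authors argue for. A real system would need far more than this, including hardware interlocks that do not depend on software at all.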