CSC 379:Week 5, Group 5

Software Safety: Accident Models - Systems Theory vs. Chain of Events

Skim through the following paper (focus on sections 1, 2.3, and 3, skip figures and tables) entitled "A Systems-Theoretic Approach to Safety in Software-Intensive Systems" by Nancy G. Leveson, a Professor of Aeronautics and Astronautics at MIT, then answer the following questions:

The majority of the content you need to form an informed response to the questions below is included in the paper. Bring in outside resources and topics discussed in class lectures as appropriate to support your response.


What are some shortcomings of traditional methods of accident reporting when applied to complex systems like software systems?

  • Event-chain models tend to stop once something to blame is found. "reports stopped after assigning blame—usually to the operators who interacted with the software—and never got to the root of why the accident occurred"
  • Event-chain models were not designed to handle complex systems such as software. "...in dealing with software in safety-critical systems is the result of inappropriately attempting to extend the techniques that were successful in simpler, electromechanical systems and were based on models of accident causation that no longer apply"
    • Software can be very complex, so accidents often stem from interactions among components rather than from a single failed part.

How does the STAMP model improve accident prevention efforts? Explain some general concepts of the model.

"Systems theory allows more complex relationships between events to be considered"

"Accident models based on systems theory consider accidents as arising from the interactions among system components and usually do not specify single causal variables or factors"

The STAMP model provides more information about how to prevent future accidents, rather than trying to place blame.

Hazard analysis using STAMP rather than traditional methods can help prevent accidents in software-based systems.

Why was the Milstar satellite damaged although the components of the Inertial Navigation Unit (INU) operated correctly with respect to the instructions, including constraints, and data provided? Why would use of the STAMP model more thoroughly prevent problems such as those that occurred with the INU compared to traditional accident reporting?

There was miscommunication, or a lack of communication, between the different agencies responsible for the different components of system control. Specifically, the Flight Control Software and the Inertial Measurement System were not collaborating properly. Individually, each subsystem worked properly; problems manifested only when they worked together. Had the STAMP model been used, the system and its processes would have been broken down in a way that gives a better understanding of each process and of how it interacts with other components.
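The idea that each subsystem can be correct in isolation while the combination is unsafe can be illustrated with a small sketch. Everything in it is hypothetical: the function names (`measure_roll_rate`, `control_command`, `system_constraint_ok`) and the numbers are invented for illustration and are not taken from the paper or from the INU software. The only claim is structural: component-level checks pass while a system-level safety constraint fails.

```python
# Hypothetical illustration only: names and values are invented, not from the INU.

def measure_roll_rate(raw_rate, filter_gain):
    """Sensor-side component: scales the raw rate by a configured gain."""
    return raw_rate * filter_gain


def control_command(reported_rate):
    """Control-side component: commands a correction opposing the reported rate."""
    return -reported_rate


def system_constraint_ok(raw_rate, filter_gain):
    """System-level safety constraint: a real roll must produce an opposing correction."""
    command = control_command(measure_roll_rate(raw_rate, filter_gain))
    return raw_rate == 0 or command * raw_rate < 0


# Each component does exactly what its own instructions and data say...
assert measure_roll_rate(2.0, 0.0) == 0.0  # follows its configured gain
assert control_command(0.0) == 0.0         # no reported roll, so no correction

# ...but only the system-level check sees that the combination is unsafe.
print(system_constraint_ok(2.0, 1.0))  # True
print(system_constraint_ok(2.0, 0.0))  # False
```

A check written at the level of the whole control loop, as in `system_constraint_ok`, is the kind of question a systems-theoretic analysis asks; checks on each component alone would not flag the second case.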

What are some appropriate applications of the STAMP model (both current and past)? Explain.

The STAMP model is especially useful for analyzing complex socio-technical and software-based systems, where accidents can arise from complex human decision-making, from component interactions rather than single-component failures, and from slow shifts toward an accident-prone environment.

Walkerton, Ontario: Water Contamination Accident

"The stage for the accident had been set over a large number of years by actions at all levels of the socio-technical system structure."[2]

"Degradation in the water safety control structure had occurred over time, without any particular single decision to do so but simply as a series of decisions that moved the public water system slowly toward a state of high risk where any slight error or deviation from the normal could lead to a major accident. Degradation of the safety control structure may be related to asynchronous evolution, where one part of a system changes without the related necessary changes in other parts. Changes to subsystems may be carefully designed, but consideration of their effects on other parts of the system, including the control aspects, may be neglected or inadequate."[2]

CNN Article on the Outbreak

The Mars Polar Lander Loss

"The software did not adequately control the descent speed of the aircraft - it misinterpreted noise from a Hall effect sensor as an indication the spacecraft had reached the surface of the planet"[1]

The components did not fail in the sense of violating their specified requirements; the loss resulted from an unplanned interaction among the system's components.
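A minimal sketch of how this kind of interaction can arise, assuming (hypothetically) that a touchdown indication is latched as soon as any sensor pulse is seen and is only acted on later in the descent. The class, its method names, and the `clear_latch_on_enable` option are invented for illustration; this is not the lander's flight software.

```python
# Hypothetical sketch: names and behavior are illustrative, not flight code.

class DescentController:
    def __init__(self, clear_latch_on_enable):
        self.clear_latch_on_enable = clear_latch_on_enable
        self.touchdown_latched = False
        self.sensing_enabled = False
        self.engine_on = True

    def sensor_pulse(self):
        # Any pulse is latched, whether it is real contact or transient noise.
        self.touchdown_latched = True

    def enable_touchdown_sensing(self):
        if self.clear_latch_on_enable:
            self.touchdown_latched = False  # discard pulses seen before enabling
        self.sensing_enabled = True

    def step(self):
        # Shutdown is commanded once sensing is enabled and a touchdown is latched.
        if self.sensing_enabled and self.touchdown_latched:
            self.engine_on = False
        return self.engine_on


def engine_on_after_noisy_descent(clear_latch_on_enable):
    ctrl = DescentController(clear_latch_on_enable)
    ctrl.sensor_pulse()              # transient noise while still above the surface
    ctrl.enable_touchdown_sensing()  # later in the descent, not yet landed
    return ctrl.step()


print(engine_on_after_noisy_descent(clear_latch_on_enable=False))  # False: engine cut early
print(engine_on_after_noisy_descent(clear_latch_on_enable=True))   # True: stale noise ignored
```

In the first case every method behaves as written, yet the stale noise pulse still shuts the engine down once sensing is enabled, which is the pattern of a hazardous interaction without a component failure.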

CNN Article on the Mars Polar Lander Loss

The Space Shuttle Challenger

The O-rings did not adequately control propellant gas release and there were inadequate controls in the launch-decision process. The failures occurred due to a complex socio-technical interaction.

The Rogers Commission Report

What are some ethical concerns of assigning blame for accidents?

An investigation into an accident has two main objectives:

1. to assign blame/responsibility for the accident

2. to prevent future accidents

An ethical dilemma may occur if, based on chain-of-events reasoning, blame is assigned to a "root cause" without taking into account, for instance, a situation that had slowly become unstable, in which anything could have set off an accident.

Outside Links

1. "A Systems-Theoretic Approach to Safety in Software-Intensive Systems" - Nancy G. Leveson

2. "Applying STAMP in Accident Analysis" - Nancy Leveson, Mirna Daouk, Nicolas Dulac, and Karen Marais

Relevant Class Website Links