On Nuclear Reactors and Banks

Putting nuclear reactors and banks into the same sentence seems odd to most people, but Tim Harford points out in What we can learn from a nuclear reactor that there are some important similarities. Both are complex and tightly coupled systems. There are similarities in their failure modes and safeguard systems, and there are similarities in the ways those safeguards can fail us and cause further harm.

It might seem obvious that the way to make a complex system safer is to install some safety measures. Engineers have long known that life is not so simple. In 1638, Galileo described an early example of unintended consequences in engineering. Masons would store stone columns horizontally, lifted off the soil by two piles of stone. The columns often cracked in the middle under their own weight. The “solution” – a third pile of stone in the centre – didn’t help. The two end supports would often settle a little, and the column, balanced like a see-saw on the central pile, would then snap as the ends sagged.

Galileo had found a simple example of a profound point: a new safety measure or reinforcement often introduces unexpected ways for things to go wrong. This was true at Three Mile Island. It was also true during the horrific accident on the Piper Alpha oil and gas platform in 1988, which was aggravated by a safety device designed to prevent vast seawater pumps from starting automatically and killing the rig’s divers. The death toll was 167.

In 1966, at the Fermi nuclear reactor near Detroit, a partial meltdown put the lives of 65,000 people at risk. Several weeks after the plant was shut down, the reactor vessel had cooled enough to identify the culprit: a zirconium filter the size of a crushed beer can, which had been dislodged by a surge of coolant in the reactor core and then blocked the circulation of the coolant. The filter had been installed at the last moment for safety reasons, at the express request of the Advisory Committee on Reactor Safeguards.

The problem in all of these cases is that the safety system introduced what an engineer would call a new “failure mode” – in other words, a new way for things to go wrong. And that was precisely the problem in the financial crisis.

“… a new safety measure or reinforcement often introduces unexpected ways for things to go wrong”

We the people do not understand this principle. We the people demand that something be done. But often that something just makes the system more complex while introducing new failure modes.

See also Some Laws of Systemantics:

The Fundamental Failure-Mode Theorem (F.F.T.): Complex systems usually operate in failure mode.

Software engineers know this too.

Long, long ago, in a galaxy far away, I worked on a large and complex system with lots of built-in logging. Against all odds, a bug was discovered in the code. When I turned up the level of detail for debug logging, the system crashed. It turned out that there was a bug in the log statement itself: code that had never been run before.
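A minimal sketch of how this kind of bug hides (this is an illustrative reconstruction, not the original system's code). The detailed log statement sits behind a verbosity check, so at the normal log level its code never executes; turn the level up and the bug in the log statement itself, here a hypothetical missing dictionary key, finally runs and crashes:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("batch")

def process(records):
    """Process records; detailed logging only happens at DEBUG level."""
    for r in records:
        if log.isEnabledFor(logging.DEBUG):
            # This branch never runs at the default INFO level, so the
            # bug hides here: these records have no "user" key -> KeyError.
            log.debug("processing record for user %s", r["user"])
        # ... real work would go here ...
    return len(records)

records = [{"id": 1}, {"id": 2}]   # note: no "user" field

print(process(records))            # fine at INFO level: prints 2

log.setLevel(logging.DEBUG)        # "turn up the level of detail"
try:
    process(records)
except KeyError:
    print("crashed inside the log statement")
```

The safety measure (extra diagnostic logging) introduced a failure mode of its own, and one that by construction only appears when you are already debugging a problem.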
