Data Center Outages - Part Deux

How to Avoid Outages – Part Deux

A previous Uptime Intelligence Note suggested that avoiding data center outages might be as simple as trying harder. The Note suggested that management failures are the main reason that enterprises continue to experience downtime incidents, even in fault tolerant facilities. “The Intelligence Trap,” a new book by David Robson, sheds light on why management mistakes continue to plague data center operations. The book suggests that management must be aware of its own limitations if it is going to properly staff operations, develop training programs and evaluate risk. Robson also notes that modern education practices, especially for children in the US and Europe, have created notions about intelligence that eventually undermine management efforts and also inform opposing and intractable political arguments about wider societal issues such as climate change and the anti-vaccination movement.

Citing dozens of anecdotes and psychological studies, Robson carefully examines how the intelligence trap ensnares almost everyone, including notables such as Arthur Conan Doyle, Albert Einstein and other famous, high IQ individuals — even Nobel laureates. Brilliant as they are, many fall prey to a variety of hoaxes. Some believe a wide variety of conspiracy theories and junk science, in many cases persuading others to join them.

The foundations of the trap, Robson notes, begin with the early and widespread acceptance of the IQ test, its narrow definition of intelligence and expectations that “smart” people will perform better at all tasks than others. Psychologist Keith Stanovich says intelligence tests “work quite well for what they do,” but are a poor metric of overall success. He calls the link between intelligence and rationality weak.

Studies show that geniuses experience deference in fields far beyond their areas of expertise. But according to studies, they are at least as apt to use their intelligence to craft arguments that justify their existing positions than to open themselves to new information that may cause them to change their opinions.

In some cases, ingrained patterns of thinking may further undermine these geniuses, with Robson noting that experts often develop mental shortcuts, based on experience, that enable them to work quickly on routine cases. In the medical field, these patterns contribute to a higher rate of misdiagnosis: in the US, one in six initial diagnoses proves to be incorrect, leading to one in 10 (40,000-80,000) patient deaths per year.

The intelligence trap can catch organizations, too. The US FBI held firmly to a mistaken fingerprint reading that implicated an innocent man in the 2004 terrorist bombing in Madrid, even after non-forensic evidence emerged that showed the suspect had played no role in the attack. FBI experts were not even persuaded by Spanish authorities who found discrepancies between the print found at the bombing and the suspect’s fingerprint.

Escaping the intelligence trap is difficult, but possible. However, ingrained patterns of thinking lead successful individuals to surround themselves with too many experts, forming non-diverse teams that do not function well together. High levels of intelligence and experience and lack of diversity can lead to overconfidence that increases their appetite for risk. Competition among accomplished team members can also lead to poor team function.

Data center owners and operators should note that even highly functional teams sometimes process close calls as successes, which leads them to discount the likelihood of a serious incident. NASA suffered two well-known space shuttle disasters after its engineers began to downplay known risks due to a string of successful missions. Prior to the loss of the Challenger, many shuttles had survived the faulty seals that caused the 1986 disaster, and every shuttle had survived damage to foam insulation until the Columbia disaster in 2003. Prior experience had reduced the perceived risk of these dangerous conditions, a condition called “outcome bias.”

The result, says Catherine Tinsley, a professor of management at Georgetown specializing in corporate catastrophe, is a subtle but distinct increase in risk appetite. She says, “I’m not here to tell you what your risk tolerance should be. I’m here to say that when you experience a near miss, your risk tolerance will increase and you won’t be aware of it.”

The real question here is not whether mistakes and disasters are inevitable. Rather, it’s how to become conscious of the subconscious. Staff should be trained to recognize the limitations that cause bad decisions. And perhaps more succinctly, data center operators and owners should recognize that procedures must not only be foolproof — they must be “expert-proof” as well.

More information on this topic is available to members of the Uptime Institute Network. Membership information can be found here.

Share this