Making sense of the outage numbers

In recent years, Uptime Institute has published regular reports examining both the rate and causes of data center and IT service outages. The reports, which have been widely read and reported in the media, paint a picture of an industry that is struggling with resiliency and reliability — and one where operators regularly suffer downtime, disruption, and reputational and financial damage.

Is this a fair picture? Rather like the authors of a scientific paper whose findings from a small experiment are hailed as a major breakthrough, Uptime Institute Intelligence has often felt a certain unease when complex findings, drawn from an ever-changing environment, are distilled into sound bites and headlines.

In May this year, Uptime Intelligence published its Annual Outage Analysis 2022. The key findings were worded cautiously: outage rates are not falling; many outages have serious or severe consequences; and the cost and impact of outages are increasing. This year, the media largely reported the findings accurately, picking different angles on the data, but this has not always been the case.

What does Uptime Institute think about the overall outage rate? Is reliability good or bad? Are outages rising or falling? If there were straightforward answers, there would be no need for this discussion. The reality, however, is that outages are both worsening and improving.

In our recent outage analysis report, four in five organizations surveyed said they had experienced an outage in the past three years (Figure 1). This is in line with previous years’ results and consistent with various other Uptime studies.

Figure 1: Most organizations experienced an outage in the past three years

A smaller proportion, about one in five, have had a “serious” or “severe” outage (Uptime classifies outages on a severity scale of one to five; these are levels four and five), meaning the outage had serious or severe financial and reputational consequences. This is consistent with our previous studies, and our data also shows that the cost of outages is increasing.

Combining this information, we can see that the rate of outages, and their severity and impact, are not improving; in some ways they are worsening. But hold on, say many providers of data center and IT services: we know our equipment is much better than it was, and we know that IT and data center technicians have better tools and skills, so why aren’t things improving? The fact is, they are.

Our data and findings are based on multiple sources: some reliable, others less so. The primary tools we use are large, multinational surveys of IT and data center operators. Respondents report on outages of the IT services delivered from their data center site(s) or IT operation. Therefore, the outage rate is “per site” or “per organization”.

This is important because the number of organizations with IT and data centers has increased significantly. Even more notable is the amount of IT (data, compute, IT services) per site or organization, which is rising dramatically every year.

What do we conclude? First, the rate of outages per site, company or IT operation is steady on average, neither rising nor falling. Second, the total number of outages is rising steadily but not substantially, even as the number of organizations using or offering IT increases. Lastly, the number of outages as a percentage of all IT delivered is falling steadily, if not dramatically.
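The three conclusions can all hold at once because each divides the same outage count by a different denominator. The short sketch below illustrates this with entirely hypothetical numbers (they are invented for illustration and are not Uptime Institute survey data): the per-organization rate stays flat, the total grows with the number of organizations, and the rate per unit of IT delivered falls because IT per organization grows faster.

```python
from dataclasses import dataclass

# HYPOTHETICAL figures for illustration only; not Uptime Institute data.
@dataclass
class Period:
    label: str
    organizations: int    # sites/organizations running IT
    outages: int          # total outages reported across all of them
    it_delivered: float   # IT delivered, in arbitrary units (data, compute, services)


def outages_per_org(p: Period) -> float:
    """Outage rate per site/organization (how the surveys measure it)."""
    return p.outages / p.organizations


def outages_per_it_unit(p: Period) -> float:
    """Outages relative to the total amount of IT delivered."""
    return p.outages / p.it_delivered


periods = [
    Period("earlier", organizations=1_000, outages=500, it_delivered=10_000.0),
    Period("later", organizations=1_200, outages=600, it_delivered=20_000.0),
]

for p in periods:
    print(f"{p.label}: total={p.outages}, "
          f"per org={outages_per_org(p):.2f}, "
          f"per IT unit={outages_per_it_unit(p):.3f}")
```

With these made-up inputs, the per-organization rate is 0.50 in both periods, the total rises from 500 to 600, and outages per unit of IT fall from 0.050 to 0.030.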

This analysis is not easy for the media to summarize in one headline. But let’s make one more observation as a comparison. In 1970, there were 298 air crashes, which resulted in 2,226 deaths; in 2021, there were 84 air crashes, which resulted in 359 deaths. This is an enormous improvement, particularly allowing for the huge increase in passenger miles flown. If the airline safety record were similar to the IT industry’s outage rate, there would still be many hundreds of crashes per year and thousands of deaths.
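As a quick check on the scale of that improvement, the sketch below works only from the four figures quoted above. It computes the raw percentage declines and deaths per crash; it does not normalize by passenger miles, since those figures are not given here, so it understates the true per-mile improvement.

```python
# Figures as quoted in the text above (air crashes and resulting deaths).
crashes = {1970: 298, 2021: 84}
deaths = {1970: 2_226, 2021: 359}


def pct_decline(before: float, after: float) -> float:
    """Percentage fall from `before` to `after`."""
    return (before - after) / before * 100


crash_decline = pct_decline(crashes[1970], crashes[2021])
death_decline = pct_decline(deaths[1970], deaths[2021])

# Average deaths per crash in each year (a severity proxy, not an exposure rate).
deaths_per_crash = {year: deaths[year] / crashes[year] for year in crashes}

print(f"crashes down {crash_decline:.1f}%")
print(f"deaths down {death_decline:.1f}%")
for year, value in deaths_per_crash.items():
    print(f"{year}: {value:.2f} deaths per crash")
```

On these figures alone, crashes fell by about 71.8% and deaths by about 83.9%, while deaths per crash dropped from roughly 7.47 to 4.27; dividing by passenger miles flown would widen the gap further.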

This is perhaps not a like-for-like comparison — flight safety is, after all, always a matter of life and death. It does, however, demonstrate the power of collective commitment, transparency of reporting, regulation and investment. As IT becomes more critical, the question for the sector, for regulators and for every IT service and data center operator is (as it always has been): what level of outage risk is acceptable?

Watch our 2022 Outage Report Webinar for more from Uptime Institute Intelligence on the causes, frequency and costs of digital infrastructure outages. The complete 2022 Annual Outage Analysis report is available exclusively to Uptime Institute members. Learn more about Uptime Institute Membership and how to request a free guest trial.
