As a result of some high-profile outages and the growing interest in running critical services in a public cloud, the reliability — and transparency — of public cloud services has come under scrutiny.
Cloud services are designed to operate with low failure rates. Large (at-scale) cloud and IT service providers, such as Amazon Web Services, Microsoft Azure and Google Cloud, incorporate layers of software and middleware, balance capacity across systems, networks and data centers, and reroute workloads and traffic away from failures and troubled sites.
On the whole, these architectures provide high levels of service availability at scale. However, no architecture is fail-safe, and many recorded failures can be attributed to the difficulty of managing such complex, at-scale software, data and networks.
Our recent resiliency survey of data center and IT managers shows that enterprise managers are reasonably concerned about the resiliency of public cloud services (see Figure 1). Only one in seven (14%) respondents say public cloud services are resilient enough to run all their workloads. The same proportion say the cloud is not resilient enough to run any of their workloads, and 32% say it is only resilient enough to run some of them. The number of “don’t know” responses has also increased since our 2021 survey, which suggests that confidence in cloud resiliency is now tempered by uncertainty and skepticism.
Concerns over cloud resiliency may be partly due to several recent outages being attributed to third-party service providers. In our resiliency survey, 39% of organizations suffered an outage that was caused by a problem with a third-party supplier.
As more workloads are outsourced to external providers, these operators account for a growing share of high-profile public outages. Over the five-year period since 2016, when Uptime Institute started tracking them, third-party commercial operators of IT and/or data centers (cloud, hosting, colocation, digital services, telecommunications, etc.) together accounted for almost 63% of all public outages. This percentage has crept up year by year: in 2021 the combined proportion of outages caused by these commercial operators was 70%.
When third-party IT and data center service providers do have an outage, customers are affected immediately. These customers may seek compensation — and a full explanation. Many regulators and enterprises now want increased visibility, accountability and improved service level agreements — especially for cloud providers.
Watch our 2022 Outage Report Webinar for more research on the causes, frequency and costs of digital infrastructure outages. The complete 2022 Annual Outage Analysis report is available exclusively to Uptime Institute members.