The pandemic, outages and the internet giants
In a recent Uptime Institute Intelligence analysis we considered a question that Uptime Institute has been asked many times since COVID-19 lockdowns began: Has the pandemic caused any increase in outages? The question arose because the pandemic has caused staff shortages, extended shifts, delays to maintenance, and a shortage of parts for at least some operators. In theory, any of these factors could contribute to more outages.
We also noted considerable speculation in the media about the internet giants, which have seen some dramatic changes to traffic and workload patterns during lockdowns in various regions.
In April, based on a survey and other evidence, we concluded that there may indeed have been a small increase in outages, although it is not always possible to attribute an outage definitively to the pandemic. There were roughly eight outages, about 4% of the sample, that were related to COVID-19.
In mid-July, we repeated our research. Again, the share of respondents reporting an outage related to COVID-19 was in the 3-4% range. In this survey, a similar percentage said COVID-19 contributed to an IT service slowdown. Not dramatic, but significant. For context, the survey found two to three times as many outages over the period that were not COVID-19 related. However, as we noted before, an outage caused by human error cannot necessarily be ascribed to pandemic-related conditions (e.g., tiredness or unfamiliar duties).
And what of the internet giants/cloud companies, which deploy architectures based on sharing loads across multiple data centers (within and between regional availability zones)? These companies make use of the natural, distributed resiliency of the internet, but at the same time experience great changes in traffic flows as worker (and machine) behaviors change.
Now that the first half of 2020 has ended, we have compared the prevalence and impact of publicly reported outages in the cloud/internet giant and digital service provider groups against their 2019 performance.
As shown in the figure above, the patterns are consistent with our past reporting: the number of publicly recorded outages by cloud/internet giants is holding steady or increasing, and most outages are minor. Although the category of “serious/severe” outages is likely to jump this year (the half-year count for 2020 equals the full-year count for 2019), there has been a strong increase in all outages every year since we began tracking public outages in 2016.
As the pandemic forced changes on businesses — notably, a shift to remote working and greater online service delivery — many increased their dependence on cloud and digital service providers. There have been a few well-publicized outages (e.g., Google Cloud, Zoom, IBM Cloud) and/or instances of capacity constraints (e.g., Microsoft Azure), but overall, these cloud and digital service providers appear to have responded well to the stresses of the pandemic.