Blog Multi Author - Uptime Institute Blog

Is Google a credible enterprise cloud?

February 15, 2023/in Executive, Operations/by Dr. Owen Rogers, Research Director for Cloud Computing, Uptime Institute, [email protected]

Google was an underdog when it launched its infrastructure cloud in 2013. Amazon had already made a name for itself as a disruptive technology provider, having launched Amazon Web Services (AWS) seven years prior. Microsoft, a household name in commercial software, launched Azure in 2010. What chance did Google, a company known primarily for its search engine, have competing with cloud leader AWS and enterprise behemoth Microsoft?

Initially, it was difficult to understand Google Cloud’s value proposition. Google was primarily a consumer business, with users expected to serve themselves through instruction manuals and support portals. In a business-to-business (B2B) engagement involving significant expenditure, most cloud buyers need more of a personalized and professional relationship — i.e., help and technical support accessible round the clock, coupled with regular review meetings and negotiations.

Although organizations like the idea of the newest and shiniest technologies, it is the reliability and consistency — not just of the product but also of its vendor — that drives day-to-day success. Buyers want relationships, commitments and financial sustainability from their suppliers. Google had the technology, but its reliability and consistency as an IT services partner were untested and unclear.

Google Cloud has, since then, become a more credible enterprise cloud provider. It has achieved this by developing its partnerships and enterprise credentials while harnessing its existing reputation for innovation and scale. But Google’s endeavor to build a competitive cloud business has never been straightforward: nor has it proved possible by mimicking its rivals.

Building differentiation

When it first launched, Google Cloud had yet to identify (or at least promote) its fundamental value proposition. As part of its reinvention, however, it is now promoting itself as specializing in Big Data. Google has a core web business (search and advertising) that effectively uses the entire web as its database. Google’s search engine exemplifies outstanding web-wide reliability and simplicity, and the company is known globally for innovation on a vast scale.

Google has doubled down on this message, promising users access to those innovations that have made Google so ubiquitous — such as machine learning and web-scale databases. The company doesn’t want users to choose Google Cloud because it has low-price virtual machines; rather, Google sees its role (and brand) as helping businesses scale their opportunities, using the newest technology.

In this vein, the company recently announced new capabilities for its BigQuery database, new machine learning translation tools, and other Big Data products and capabilities at the company’s developer conference, Google Cloud Next ‘22. But even infrastructure capabilities (such as new Intel and NVIDIA server chips) as well as the general availability of Google’s latest generation of AI-accelerators (tensor processors or TPUs) were presented in the context of making the most of data. In Google’s messaging, new infrastructure doesn’t just promise faster virtual machines — it delivers better processing capabilities to customers looking to develop new ways of extracting value from data.

Building credibility

Google might be relatively new to enterprise sales, but its partners are experienced players. Google has addressed its lack of enterprise experience by partnering with systems integrators such as Infosys, HCLTech, Tata Consultancy Services (TCS), Accenture, Capgemini and Atos. It has developed European “sovereign” clouds with T-Systems, Thales and Minsait. Google offers its Anthos Multi-Cloud platform through original equipment manufacturers (OEMs) including Cisco, Dell EMC, Hewlett Packard Enterprise (HPE), Intel, Lenovo, NetApp, Nutanix, NVIDIA, and VMware.

Google is historically popular with developers due to its open-source approach. But some recent successes may be the result of its repositioning to promote business value over technical ability. This approach is more likely to capture the ear of C-level executives (who make the big, transformational decisions), including appointing primary cloud providers. Google has built credibility by selling to brands such as Toyota, Wayfair, Snap, Twitter, PayPal and HSBC.

The company also demonstrates credibility through continued investment. At Google Cloud Next ‘22 the company announced new regions in Austria, Greece, Norway, South Africa, and Sweden, bringing the total number of regions to 48. Security and productivity, too, were high on the agenda at that event, again helping to build brand credibility.

Economics is still a challenge

Although Google’s cloud business has matured considerably in recent years, it still faces challenges. As previously discussed in Cloud price increases damage trust, Google Cloud prices saw some sharp increases in October 2022 – with multi-region nearline storage rates, for example, increasing by 50%, and some operations fees doubling. Load balancers will also be subject to an outbound bandwidth charge. Google Cloud has made considerable gains in convincing users that it is a relationship-led, enterprise-focused, innovative company and not just a consumer business. But such sweeping price increases would appear to damage its credibility as a reliable business partner in this regard. Google Cloud revenue increased by 35% year-on-year in Q2 2022, reaching $6.3 billion. Despite this growth, however, the division reported an operating loss of $858 million for the same period. Google Cloud’s revenue trails that of AWS and Microsoft Azure, by a wide margin. Google Cloud may well have implemented its recent price increases with the intention of building a more profitable and more sustainable business. Its recent price hikes are reasonable, considering, as outlined above, the importance customers attach to reliability and consistency. The question is, has Google yet convinced the market that it is worth its recent price hikes? While users should continue to consider Google Cloud as part of fuller vendor evaluations, they should perhaps bear in mind its history of raising prices.

Higher data center costs unlikely to cause exodus to public cloud

February 8, 2023/in Executive, Operations/by Dr. Owen Rogers, Research Director for Cloud Computing, Uptime Institute, [email protected]

A debate has been raging since cloud computing entered the mainstream: which is the cheaper venue for enterprise customers — cloud or on-premises data centers? This debate has proved futile for two reasons. First, the characteristics of any specific application will dictate which venue is more expensive — there is no simple, unequivocal answer. Second — the question implies that a buyer would choose a cloud or on-premises data center primarily because it is cheaper. This is not necessarily the case.

Infrastructure is not a commodity. Most users will not choose a venue purely because it costs less. Users might choose to keep workloads within their data centers or at a colo because they want to be confident they are fully compliant with legislation and / or regulatory requirements, or to be situated close to end users. They might choose cloud computing for workloads that require rapid scalability, or to access platform services further up the stack. Of course, costs matter to CIOs and CFOs alike, but cloud computing, on-premises data centers and colos all deliver value beyond their relative cost differences.

One way of assessing the value of a product is through a price-sensitivity analysis, whereby users are asked how they would (hypothetically) respond to price changes. Users who derive considerable value from a product are less likely to change their buying behavior following any increase in cost. Users more sensitive to cost increases will typically consider competing offers to reduce or maintain costs. Switching costs are also a factor in a user’s sensitivity to price changes. In cloud computing, for example, the cost of rearchitecting an application as part of a migration might not be justifiable if the resultant ongoing cost savings are limited.

IT decision-makers surveyed as part of Uptime Intelligence’s Data Center Capacity Trends Survey 2022 were asked what percentage of current workloads they would be likely to migrate to the cloud if their existing data center costs (covering on-premises and colos) rose 10%, 50% or 100%, respectively (assuming cloud prices remained stable).

While Uptime has neither conducted nor seen extensive research into rising costs, most operators are likely to be experiencing strong inflationary pressures (i.e., of over 15%) on their operations: energy prices and staff shortages being the main drivers.

The survey responses are illustrated in two different formats:

Figure 1 summarizes the average percentage of workloads likely to be migrated to the cloud as a result of any increase in costs.
Figure 2 shows what percentage of respondentswould make no response to any such increases (shown as 0%), what proportion would be likely to migrate some of their workloads (10% to 50%) and what proportion would be likely to migrate most of their workloads (50% or more).

**Figure 1 Average likely workload migration to public cloud due to data center cost increases**

diagram: Operators’ responses to data center cost increase — **Figure 2 Operators’ responses to data center cost increase**

What does this data tell us? Figure 1 shows that if on-premises or colo costs were to increase by 10%, then around 12% of workloads could migrate to the cloud. If costs were to increase by 50%, approximately 24% of workloads would potentially move to the cloud. Even if costs were to double, however, only just over 30% of workloads would be likely to migrate to the public cloud. This suggests that on-premises and colo users are not particularly price-sensitive. While they are likely to have some impact, rising data center costs per se are unlikely to trigger a mass exodus to the public cloud.

Some users are more price sensitive than others, however. Figure 2 shows that 42% of respondents indicate a 10% increase in costs would not drive any workloads to the public cloud. One quarter of respondents would still be unlikely to migrate workloads even if faced with price hikes of 50%. Notably, a quarter of respondents indicate they would not migrate any workloads even if costs were to double. This may suggest that at least 25% of those organizations surveyed do not consider the public cloud to be a viable option for their workloads currently.

This reluctance may be the result of several factors. Some respondents may derive value from hosting workloads in non-cloud data centers and may believe this to justify any additional expense. Others may believe that regulatory, technical and compliance issues render the public cloud unviable, making cost implications irrelevant. Some users may feel that moving to the public cloud is simply cost-prohibitive.

Most users are susceptible to price increases, however — at least to some extent. A 10% increase in costs would drive 55% of organizations to migrate some or most of their workloads to the cloud. A total of 59% of respondents indicate they would do so if faced with a more substantial 50% increase. Faced with a doubling of their costs, over a quarter of respondents would migrate most of their workloads to the cloud. Again, this is assuming that cloud costs remain constant — and it is unlikely that cloud providers could absorb such significant upward cost pressures without any increase in prices.

Other survey data (not shown in graphics) indicates that even if infrastructure expenditure were to double, only 7% of respondents would migrate their entire workloads to the cloud. Given that 25% of respondents indicate that they would keep all workloads on-premises regardless of cost increases, this confirms that most users are adopting a hybrid IT approach. Most users are willing to consider on-premises and cloud facilities for their workloads, choosing the most appropriate option for each application.

Although the Uptime Intelligence Data Center Capacity Trends Survey 2022 did not, specifically, cover the impact of price reductions, it is possible to estimate the potential impacts of cloud providers cutting their rates. A price cut of 10% would be unlikely to attract significantly more workloads to the public cloud: but a 50% reduction would have a more dramatic impact. As indicated above, however, cloud providers — faced with the same energy-cost challenges as data center owners and colos — are more likely to absorb any cost increases in their gross margins rather than risk damaging their credibility by raising prices (see OVHcloud price hike shows cloud’s vulnerability to energy costs).

In conclusion:

Many organizations have no desire (or the ability) to use the public cloud, regardless of any cost increases, and will absorb any price hikes as best as they can.
Most organizations are adopting a hybrid IT approach and use a mix of cloud and on-premises locations for their workloads.
Rising costs (such as energy) may accelerate workload migration from on-premises data centers and colos to the public cloud (assuming cloud providers’ prices do not rise too).
The costs involved in moving applications and rearchitecting them to work effectively in the public cloud mean single-digit cost increases are likely to have only a minimal effect on migrations.
More significant cost increases could drive more workloads to the cloud since the savings to be made over the longer term could justify the switching costs involved.
Public-cloud price reductions could, similarly, accelerate cloud migration; however, dramatic price cuts are unlikely.

First signs of federal data center reporting mandates appear in US

February 1, 2023/in Executive, Operations/by Jay Dietrich, Research Director of Sustainability, Uptime Institute, [email protected]

The past year (2022) has seen regulators in many countries develop or mandate requirements to report data centers’ operating information and environmental performance metrics. The first of these, the European Commission (EC) Energy Efficiency Directive (EED) recast is currently under review by the European Parliament and is expected to become law in 2023. This directive will mandate three levels of information reporting, the application and publication of energy performance improvement and efficiency metrics, and conformity with certain energy efficiency requirements (see EU’s EED recast set to create reporting challenges).

Similar legislative and regulatory initiatives are now appearing in the US with the White House Office of Technology and Science Policy’s (OTSP’s) Climate and energy implications of crypto-assets in the US report, published in September 2022. Concurrently with this, Senator Sheldon Whitehouse is drafting complimentary legislation that addresses both crypto and conventional data centers and sets the stage for the introduction of similar regulation to the EED over the next three to five years.

The OTSP report focuses on the impacts of the recent precipitous increase in energy consumption resulting from cryptocurrency mining in the US — initially driven by high crypto prices, low electricity costs and China’s prohibition of cryptomining operations. The OTSP report estimates cryptomining energy consumption (for both Bitcoin and Ethereum mining) to be responsible for 0.9% to 1.7% of US electricity consumption, and for 0.4% to 0.8% of greenhouse gas (GHG) emissions, in 2021.

The OTSP’s projections may already be out of date due to the current high energy prices and the collapse in value of most crypto assets. The OTSP’s projections, moreover, do not take into account the likely impact of Ethereum mining operations (estimated to account for one-quarter to one-third of industry consumption) moving from “proof of work” (PoW) to “proof of stake” (PoS).

PoW is the original “consensus mechanism” used in cryptocurrency transactions, whereby miners compete to solve increasingly difficult algorithms to validate transactions — at the cost of ever-increasing energy consumption. PoS transactions are mediated by randomly selected miners who stake a quantity of cryptocurrency (and their experience level) for the right to confirm transactions — enabling the use of less computationally intense (and therefore less energy-intense) algorithms. Ethereum converted to PoS in September 2022 in an initiative known as “the Merge”: this change is expected to reduce its mining energy consumption by over 99%.

The OTSP report implies that the broader adoption of crypto assets and the application of the underlying blockchain software used across a range of business processes will continue to drive increasing blockchain-related energy consumption. The report does not offer a specific projection of increasing energy consumption from cryptomining and further blockchain deployments. Given that most, if not all, enterprise blockchain deployments use PoS validation, and given the ability of PoW infrastructure to move quickly to locations with minimal regulation and energy costs, much of this anticipated energy growth may not materialize.

To mitigate this projected growth in energy consumption, the OTSP report calls on the federal government to encourage and ensure the responsible development of cryptomining operations in three specific areas.

Minimizing GHG emissions and other impacts from cryptomining operations. The report proposes that the US government implement a collaborative process to develop effective, evidence-based environmental performance standards governing the development, design and operation of cryptomining facilities. It proposes that the Department of Energy (DOE) or the Environmental Protection Agency (EPA) should be empowered to set energy performance standards for “crypto-asset mining equipment, blockchain and other operations.”
Requiring cryptomining organizations to obtain and publicly report data in order to understand, monitor and mitigate impacts. The report stipulates that cryptomining operations should publicly report their location(s), energy consumption, energy mix, GHG emissions (using existing protocols), electronic waste recycling, environmental justice implications and demand-response participation.
Promoting further research to improve understanding and innovation. The report recommends prioritizing research and development in next-generation digital asset technologies that promote the US’ goals in terms of security, privacy, equity, resiliency and climate.

While these recommendations are primarily directed at cryptomining operations, the report also assesses conventional (i.e., non-crypto-asset) data center operations, noting that cryptomining energy consumption in 2021 was roughly comparable to that of conventional data centers. This clearly raises the question: if cryptomining energy consumption warrants public data reporting and energy performance standards, then why should conventional data center operations not also be included in that mandate?

Under US law, Congress would need to pass legislation authorizing an administrative agency to require data centers to report their location(s), operational data and environmental performance information. Senator Whitehouse is developing draft legislation to address both crypto-asset and conventional data centers, using the EED as a blueprint. The Senator’s proposals would amend the Energy Independence and Security Act of 2007 (EISA) to require all public and private conventional and cryptomining data center locations with more than 100 kW of installed IT equipment (nameplate power) to report data to the Energy Information Administration (EIA). These data center locations would need to outline their operating attributes: a requirement remarkably similar to the EED’s information reporting mandates.

The proposals also require the DOE to promulgate a final rule covering energy conservation standards for “Servers and Equipment for Cryptomining” within two years of the EISA amendments going into force. While this requirement is specific to cryptomining equipment, it is likely that the DOE will lobby Congress to include energy conservation standards for conventional data center IT equipment as part of these proposed amendments. The DOE has already attempted to set energy conservation standards for computer servers (79 FR 11350 02/28/2014) through authority granted under the EISA regulating commercial office equipment.

Little will happen immediately. Legislative and regulatory processes and procedures in the US can be laborious, and final standards governing data center information and energy efficiency reporting are likely to remain several years away. But the release of the OTSP report and the development of draft US legislation indicate that the introduction and adoption of these standards is a matter of “when” (and how strictly?) rather than “if”.

Owners and operators of digital infrastructure need to be prepared. The eventual promulgation of these standards, taken in conjunction with proposed regulation on climate change disclosures from the Securities and Exchange Commission will, sooner or later, dictate that operators establish data collection and management processes to meet information reporting requirements. Operators will need to develop a strategy for meeting these requirements and will need to have policies in place to ensure they undertake projects that increase the work delivered per megawatt-hour of energy consumed across their data center operations.

Data center managers would also be wise to engage with industry efforts to develop simple and effective energy-efficiency metrics. These metrics are required under both US draft legislation and the EC EED recast and are likely to be included in legislation and regulation in other jurisdictions. An ITI Green Grid (TGG) Working Group has been put in place to work on this issue, and other efforts have been proposed by groups and institutions such as Infrastructure Masons (iMasons) and the Climate Neutral Data Centre Pact. Uptime Institute is also providing detailed feedback on behalf of its members on an EC study proposing options and making recommendations for data reporting and metrics as required under the EED recast.

Industry initiatives that encompass all types of IT operations are going to be important. Just as importantly, the industry will need to converge on a single and cohesive globally applicable metric (or set of metrics) to facilitate standardized reporting and minimize confusion.

Rapid interconnectivity growth will add complexity and risk

January 25, 2023/in Executive, Operations/by Lenny Simon, Senior Research Associate, Uptime Institute

Recent geopolitical concerns, predictions of a looming recession, and continued supply chain difficulties are unlikely to dampen growth in digital bandwidth on private networks according to Equinix’s 2022 Global Interconnection Index (GXI). Global interconnection bandwidth (the volume of data exchanged between companies directly, bypassing the public internet) is a barometer for digital infrastructure and sheds light on the difference in dynamics between verticals. High growth in private interconnection is a boon for Equinix as the world’s largest colocation provider by market share but makes resiliency more challenging for its customers: all these interconnects are also potential points of failure.

The Equinix GXI projects strong growth across the industry in 2023, with global interconnection bandwidth projected to increase by 41% compared to 2022. Overall, global interconnection bandwidth is projected to grow by a compound annual growth rate (CAGR) of 40% into 2025, when it is expected to reach nearly 28,000 terabits per second (tbps). These numbers include direct connections between enterprises and their digital business partners (such as telecommunications, cloud, edge, and software as a service (SaaS) providers).

The Equinix study projects faster growth in private interconnection for enterprises than for networks operated by telecommunications companies or cloud providers. This growth in private interconnection is driven by high demand for digital services and products — many of which also require a presence with multiple cloud providers as well as integration with major SaaS companies.

The energy and utility sector is likely to see the greatest growth in private network interconnection through 2025, with a CAGR of 53%, as energy data becomes increasingly important for managing intermittent renewable energy and decarbonizing the grid. Digital services supporting sustainability efforts such as carbon accounting are likely to require additional private interconnection with SaaS providers to accurately track operational sustainability metrics.

The banking and insurance and manufacturing sectors are expected to see CAGRs of 49% and 45%, respectively, over the same period. These industries are particularly sensitive to errors and outages, however, and appropriate planning will be necessary.

There is a reason Equinix has been drawing attention to the benefits of interconnection for the past six years: as at Q2 2022 the company operates 435,800 cross-connects throughout its own data centers. Its closest competitor, Digital Realty, reported just 185,000 cross-connects at its facilities in the same quarter. Equinix defines a cross-connect as a point-to-point cable link between two customers in the same retail colocation data center. For colocation companies, cross-connects not only represent core recurring revenue streams but also make their network-rich facilities more valuable as integration hubs between organizations.

As private interconnection increases, so too does the interdependency of digital infrastructure. Strong growth in interconnection may be responsible for the increasing proportion of networking and third-party-related outages in recent years. Uptime’s 2022 resiliency survey sheds light on the two most common causes of connectivity-related outages: misconfiguration and change management failure (reported by 43% of survey respondents); and third-party network-provider failure (43%). Asked specifically if their organization had suffered an outage caused by a problem with a third-party supplier, 39% of respondents confirmed this to be the case (see Figure 1).

diagram: Most common causes of major third-party outages — **Figure 1. The most common causes of major third-party outages**

When third-party IT and data center service providers do have an outage, customers are immediately affected — and may seek compensation. Enterprise end-users will need additional transparency and stronger service-level agreements from providers to better manage additional points of failure, as well as the outsourcing of their architecture resiliency. Importantly, managing the added complexity of an enterprise IT architecture spanning on-premises, colocation and cloud facilities demands more organizational resources in terms of skilled staff, time and budget.

Failing that, businesses might encounter unexpected availability and reliability issues rather than any anticipated improvement. According to Uptime’s 2021 annual survey of IT and data center managers, one in eight (of those who had a view) reported that using a mix of IT venues had resulted in their organization experiencing a deterioration in service resiliency, rather than the reverse.

By: Lenny Simon, Senior Research Associate and Max Smolaks, Analyst

Reports of cloud decline have been greatly exaggerated

January 18, 2023/in Executive, Operations/by Dr. Owen Rogers, Research Director for Cloud Computing, Uptime Institute, [email protected]

Cloud providers have experienced unprecedented growth over the past few years. CIOs the world over, often prompted by CFOs and CEOs, have been favoring the cloud over on-premises IT for new and major projects — with the result that the largest cloud provider, Amazon Web Services (AWS), has seen revenue increase by 30% to 40% every year since 2014 (when it recorded an 80% jump in turnover). Microsoft Azure and Google have reported similar numbers in recent times.

But there are signs of a slowdown:

While AWS reported a quarter-on-quarter revenue increase of 27.5% for Q3 2022, this is down from 33% in Q2 — the slowest growth in its history.
Microsoft’s CFO has also commented that Azure could see revenue growth decline in their next quarter, following disappointing 35% growth in the three months to September 2022.

Why this slowdown in cloud growth?

The global macroeconomic environment — specifically, high energy costs together with inflation — is making organizations more cautious about spending money. Cloud development projects are no different from many others and are likely to be postponed or deprioritized due to rising costs, skill shortages and global uncertainty.

Some moves to the cloud may have been indefinitely deferred. Public cloud is not always cheaper than on-premises implementations, and many organizations may have concluded that migration is just not worthwhile in light of other financial pressures.

For those organizations that have already built cloud-based applications it is neither feasible nor wise to turn off applications or resources to save money: these organizations are, instead, spending more time examining and optimizing their costs.

Cutting cloud costs, not consumption

Cloud providers’ top-line revenue figures suggest customers are successfully reducing their cloud costs. How are they doing this?

Optimizing cloud expenditure involves two key activities: first, eliminating waste (such as orphaned resources and poorly sized virtual machines); and second, more cost-effective procurement, through alternative pricing models such as consistent-usage commitments or spot instances — both of which, crucially, reduce expenditure without impacting application performance.

Hyperscaler cloud providers, which are more interested in building longer-term relationships than in deriving higher gross margins in the short term, offer tools to help users reduce expenditure. These tools have improved significantly over the past few years.

Many organizations have now crossed a threshold in terms of cloud use, where the savings to be made mean it is to their benefit to invest in optimization (using these tools). One factor driving optimization here is higher cloud expenditure — in part an ongoing consequence of the pandemic, which saw businesses retooling IT to survive, rather than focusing on cutting IT costs.

It should, perhaps, have been anticipated that customers would, at some point, start using these tools to their own advantage — current pressures on other costs having made cutting IT expenditure more critical than before.

Will cloud prices rise?

Cloud providers’ overriding objective of winning and keeping customers over the long term explains why hyperscalers are likely to try and avoid increasing their prices for the foreseeable future. Providers want to maintain good relationships with their customers so that they are the de facto provider of choice for new projects and developments: price hikes would damage the customer trust they’ve spent so long cultivating.

AWS’s Q3 2022 gross margin was 26%, some 3% down on Q2. This drop in margin could be attributed to rising energy costs, which AWS states almost doubled over the same period (hedging and long-term purchase agreements notwithstanding). Microsoft has reported it will face additional energy costs of $800 million this financial year. While AWS and Microsoft could have increased prices to offset rising energy costs and maintain their profit margins they have, so far, chosen not to do so rather than risk damaging customers’ trust.

How will this play out, going forward? Financial pressures may make organizations more careful about cloud spending. Projects may be subject to more stringent justification and approval, and some migrations are likely to be delayed (or even cancelled) for now. As revenue increases in absolute terms, achieving high-percentage revenue gains becomes increasingly difficult. Nonetheless, while the days of 40% revenue jumps may be over, this recent downturn is unlikely to be the start of a rapid downward spiral. AWS’s Q3 2022 revenue growth may have shrunk in percentage terms: but it was still in excess of $4 billion.

Applications architected for the cloud should be automatically scalable, and capable of meeting customers’ requirements without their having to spend more than necessary. Cloud applications allow organizations to adapt their business models and / or drive innovation — which may be one of the reasons many have been able to survive (and, in some cases, thrive) during challenging times. In a sense, the decline in growth that the cloud companies have suffered recently demonstrates that the cloud model is working exactly as intended.

The hyperscaler cloud providers are likely to continue to expand globally and create new products and services. Enterprise customers, in turn, are likely to continue to find cloud services competitive in comparison with colocation-based or on-premises alternatives. Much of the cloud’s value comes from a perception of it offering “unlimited” resources. If providers don’t increase capacity, they risk failing to meet customers’ expectations when required — damaging credibility, and relationships. AWS, Google and Microsoft continue to compete for market share, worldwide. Reducing investment now could risk future profitability.

AWS currently has 13,000 vacancies advertised on its website — a sign that the cloud sector is certainly not in retreat. This fact, rather, suggests future growth will be strong.

Major data center fire highlights criticality of IT services

January 11, 2023/in Design, Executive, Operations/by Daniel Bizo, Research Director, Uptime Institute Intelligence, [email protected]

Uptime Institute’s outages database suggests data center fires are infrequent, and rarely have a significant impact on operations. Uptime has identified 14 publicly reported, high-profile data center outages caused by fire or fire suppression systems since 2020. The frequency of fires is not increasing relative to the IT load or number of data centers but, uncontained, they are potentially disastrous to facilities, and subsequent outages can be ruinous for the business.

SK Group, South Korea’s second largest conglomerate, is the latest high-profile organization to suffer a major data center fire, following a breakout at a multistory colocation facility operated by its SK Inc. C&C subsidiary in Pangyo (just south of Seoul) on October 15. According to police reports, the fire started in a battery room before spreading quickly to the rest of the building. It took firefighters around eight hours to bring the blaze under control.

While there were no reported injuries, this incident could prove to be the largest data center outage caused by fire to date. It is a textbook example of how seemingly minor incidents can escalate to wreak havoc through cascading interdependencies in IT services.

The incident took tens of thousands of servers offline, including not only SK Group’s own systems but also the IT infrastructure running South Korea’s most popular messaging and single sign-on platform, KakaoTalk. The outage disrupted its integrated mobile payment system, transport app, gaming platform and music service — all of which are used by millions. The outage also affected domestic cloud giant Naver (the “Google of South Korea”) which reported disruption to its online search, shopping, media and blogging services.

While SK Group has yet to disclose the root cause of the fire, Kakao, the company behind KakaoTalk, has pointed to the Lithium-ion (Li-ion) batteries deployed at the facility — manufactured by SK on, another SK Group subsidiary. In response, SK Group has released what it claims are records from its battery management system (BMS) showing no deviation from normal operations prior to the incident. Some local media reports contradict this, however, claiming multiple warnings were, in fact, generated by the BMS. Only a thorough investigation will settle these claims. In the meantime, both sides are reported to be “lawyering up.”

The fallout from the outage is not limited to service disruptions or lost revenue, and has prompted a statement from the country’s president, Yoon Suk-yeol, who has promised a thorough investigation into the causes of, and the extent of the damages arising from, the fire. The incident has, so far, led to a police raid on SK Inc. C&C headquarters; the resignation of Kakao co-CEO Whon Namkoong; and the establishment of a national task force for disaster prevention involving military officials and the national intelligence agency. Multiple class-action lawsuits against Kakao are in progress, mainly based on claims that the company has prioritized short-term profits over investment in more resilient IT infrastructure.

The South Korean government has announced a raft of measures aimed at preventing large-scale digital service failures. All large data centers will now be subject to disaster management procedures defined by the government, including regular inspections and safety drills. Longer-term, the country’s Ministry of Science and ICT will be pushing for the development of battery technologies posing a lower fire risk — a matter of national interest for South Korea, home to some of the world’s largest Li-ion cell manufacturers including Samsung SDI and LG Chem, in addition to SK on.

The fire in South Korea will inevitably draw comparisons with the data center fire that brought down the OVHcloud Strasbourg facility in 2021. Impacting some 65,000 customers, many of whom lost their data in the blaze (see Learning from the OVHcloud data center fire), this fire, as in Pangyo, was thought to have involved uninterruptible power supply (UPS) systems. According to the French Bureau of Investigation and Analysis on Industrial Risks (BEA-RI), the lack of an automatic fire extinguisher system, delayed electrical cutoff and building design all contributed to the spread of the blaze.

A further issue arising from this outage, and one that remains to be determined, is the financial cost to SK Group, Kakao and Naver. The fire at the OVHcloud Strasbourg facility was estimated to cost the operator more than €105 million — with less than half of this being covered by insurance. The cost of the fire in Pangyo is likely to run into tens (if not hundreds) of millions of dollars. This should serve as a timely reminder of the importance of fire suppression, particularly in battery rooms.

Li-ion batteries in mission-critical applications — risk creep?

Li-ion batteries present a greater fire risk than valve-regulated lead-acid batteries, regardless of their specific chemistries and construction – a position endorsed by the US’ National Fire Protection Association and others. Since the breakdown of cells in Li-ion batteries produces combustible gases (including oxygen) which can result in a major thermal-runaway event (in which the fire spreads uncontrollably between cells, across battery packs and, potentially, even cabinets if these are inappropriately distanced), the fires they cause are notoriously difficult to suppress.

Many operators have, hitherto, found the risk-reward profile of Li-ion batteries (in terms of their lower footprint and longer lifespan) to be acceptable. Uptime surveys show major UPS vendors reporting strong uptake of Li-ion batteries in data center and industrial applications: some vendors report shipping more than half their major three-phase UPS systems with Li-ion battery strings. According to the Uptime Institute Global Data Center Survey 2021, nearly half of operators have adopted this technology for their centralized UPS plants, up from about a quarter three years ago. The Uptime Institute Global Data Center Survey 2022 found Li-ion adoption levels to be increasing still further (see Figure 1).

diagram of Data centers are embracing Li-ion batteries — **Figure 1. Data centers are embracing Li-ion batteries**

The incident at the SK Inc. C&C facility highlights the importance of selecting appropriate fire suppression systems, and the importance of fire containment as part of resiliency. Most local regulation governing fire prevention and mitigation concentrates (rightly) on securing people’s safety, rather than on protecting assets. Data center operators, however, have other critically important issues to consider — including equipment protection, operational continuity, disaster recovery and mean time to recovery.

While gaseous (or clean agent) suppression is effective at slowing down the spread of a fire in the early stages of Li-ion cell failure (when coupled with early detection), it is arguably less suitable for handling a major thermal-runaway event. The cooling effects of water and foam mean these are likely to perform better; double-interlock pre-action sprinklers also limit the spread. Placing battery cabinets farther apart can help prevent or limit the spread of a major fire. Dividing battery rooms into fire-resistant compartments (a measure mandated by Uptime Institute’s Tier IV resiliency requirements) can further decrease the risk of facility-wide outages.

Such extensive fire prevention measures could, however, compromise the benefits of Li-ion batteries in terms of their higher volumetric energy density, lower cooling needs and overall advantage in lifespan costs (particularly where space is at a premium).

Advances in Li-ion chemistries and cell assembly will address operational safety concerns — lithium iron phosphate, with its higher ignition point and no release of oxygen during decomposition – being a case in point. Longer term, inherently safer, innovative chemistries — such as sodium-ion and nickel-zinc — will probably offer a more lasting solution to the safety (and sustainability) conundrum around Li-ion. Until then, the growing prevalence of vast amounts of Li-ion batteries in data centers means the propensity of violent fires can only grow — with potentially dire financial consequences.

By: Max Smolaks, Analyst, Uptime Institute Intelligence and Daniel Bizo, Research Director, Uptime Institute Intelligence

First signs of federal data center reporting mandates appear in US

Major data center fire highlights criticality of IT services

Li-ion batteries in mission-critical applications — risk creep?

Explaining the Uptime Institute’s Tier Classification System (April 2021 Update)

The Making of a Good Method of Procedure

A Look at Data Center Cooling Technologies

Data Center Cooling: CRAC/CRAH redundancy, capacity, and selection metrics

Implementing Data Center Cooling Best Practices