A debate has been raging since cloud computing entered the mainstream: which is the cheaper venue for enterprise customers, cloud or on-premises data centers? This debate has proved futile for two reasons. First, the characteristics of any specific application will dictate which venue is more expensive; there is no simple, unequivocal answer. Second, the question implies that a buyer would choose a cloud or on-premises data center primarily because it is cheaper. This is not necessarily the case.
Infrastructure is not a commodity. Most users will not choose a venue purely because it costs less. Users might choose to keep workloads within their data centers or at a colo because they want to be confident they are fully compliant with legislation and / or regulatory requirements, or to be situated close to end users. They might choose cloud computing for workloads that require rapid scalability, or to access platform services further up the stack. Of course, costs matter to CIOs and CFOs alike, but cloud computing, on-premises data centers and colos all deliver value beyond their relative cost differences.
One way of assessing the value of a product is through a price-sensitivity analysis, whereby users are asked how they would (hypothetically) respond to price changes. Users who derive considerable value from a product are less likely to change their buying behavior following any increase in cost. Users more sensitive to cost increases will typically consider competing offers to reduce or maintain costs. Switching costs are also a factor in a user’s sensitivity to price changes. In cloud computing, for example, the cost of rearchitecting an application as part of a migration might not be justifiable if the resultant ongoing cost savings are limited.
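The trade-off between switching costs and ongoing savings can be made concrete with a simple payback calculation. This is a minimal sketch under stated assumptions: the function names and every figure (current monthly spend, cloud price, one-off rearchitecting cost, three-year planning horizon) are hypothetical and are not Uptime survey data. As in the survey question, the cloud price is held constant while on-premises costs rise.

```python
# Hypothetical break-even check for a migration decision.
# All figures are illustrative assumptions, not survey data.

def payback_months(switching_cost: float, monthly_saving: float) -> float:
    """Months until cumulative savings repay a one-off switching cost."""
    if monthly_saving <= 0:
        return float("inf")  # the move never pays for itself
    return switching_cost / monthly_saving


def worth_migrating(switching_cost: float, onprem_monthly: float,
                    cloud_monthly: float, horizon_months: int) -> bool:
    """True if the switching cost is recovered within the planning horizon."""
    saving = onprem_monthly - cloud_monthly
    return payback_months(switching_cost, saving) <= horizon_months


BASE_ONPREM = 100_000   # hypothetical current monthly on-premises cost
CLOUD = 105_000         # hypothetical monthly cloud cost, held constant
SWITCHING = 500_000     # hypothetical one-off rearchitecting cost
HORIZON = 36            # hypothetical three-year planning horizon

for increase in (0.10, 0.50, 1.00):
    onprem = BASE_ONPREM * (1 + increase)
    months = payback_months(SWITCHING, onprem - CLOUD)
    print(f"+{increase:.0%}: payback {months:.1f} months, "
          f"migrate={worth_migrating(SWITCHING, onprem, CLOUD, HORIZON)}")
```

With these assumed numbers, a 10% rise leaves only a thin monthly saving that takes years to repay the rearchitecting cost, while a 50% rise repays it within a year: the same pattern of low sensitivity to small increases described below.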
IT decision-makers surveyed as part of Uptime Intelligence’s Data Center Capacity Trends Survey 2022 were asked what percentage of current workloads they would be likely to migrate to the cloud if their existing data center costs (covering on-premises and colos) rose 10%, 50% or 100%, respectively (assuming cloud prices remained stable).
While Uptime has neither conducted nor seen extensive research into rising costs, most operators are likely to be experiencing strong inflationary pressures on their operations (i.e., cost increases of over 15%), with energy prices and staff shortages the main drivers.
The survey responses are illustrated in two different formats:
Figure 1 summarizes the average percentage of workloads likely to be migrated to the cloud as a result of any increase in costs.
Figure 2 shows what percentage of respondents would not migrate any workloads in response to such increases (shown as 0%), what proportion would be likely to migrate some of their workloads (10% to 50%) and what proportion would be likely to migrate most of their workloads (50% or more).
What does this data tell us? Figure 1 shows that if on-premises or colo costs were to increase by 10%, then around 12% of workloads could migrate to the cloud. If costs were to increase by 50%, approximately 24% of workloads would potentially move to the cloud. Even if costs were to double, however, only just over 30% of workloads would be likely to migrate to the public cloud. This suggests that on-premises and colo users are not particularly price-sensitive. While they are likely to have some impact, rising data center costs per se are unlikely to trigger a mass exodus to the public cloud.
Some users are more price-sensitive than others, however. Figure 2 shows that 42% of respondents indicate a 10% increase in costs would not drive any workloads to the public cloud. One quarter of respondents would still be unlikely to migrate workloads even if faced with price hikes of 50%. Notably, a quarter of respondents indicate they would not migrate any workloads even if costs were to double. This may suggest that at least 25% of the organizations surveyed do not currently consider the public cloud to be a viable option for their workloads.
This reluctance may be the result of several factors. Some respondents may derive value from hosting workloads in non-cloud data centers and may believe this to justify any additional expense. Others may believe that regulatory, technical and compliance issues render the public cloud unviable, making cost implications irrelevant. Some users may feel that moving to the public cloud is simply cost-prohibitive.
Most users are susceptible to price increases, however — at least to some extent. A 10% increase in costs would drive 55% of organizations to migrate some or most of their workloads to the cloud. A total of 59% of respondents indicate they would do so if faced with a more substantial 50% increase. Faced with a doubling of their costs, over a quarter of respondents would migrate most of their workloads to the cloud. Again, this is assuming that cloud costs remain constant — and it is unlikely that cloud providers could absorb such significant upward cost pressures without any increase in prices.
Other survey data (not shown in graphics) indicates that even if infrastructure expenditure were to double, only 7% of respondents would migrate all of their workloads to the cloud. Given that 25% of respondents indicate that they would keep all workloads on-premises regardless of cost increases, the remaining two-thirds or so (about 68%) would use a mix of venues, which confirms that most users are adopting a hybrid IT approach. Most users are willing to consider both on-premises and cloud facilities for their workloads, choosing the most appropriate option for each application.
Although the Uptime Intelligence Data Center Capacity Trends Survey 2022 did not specifically cover the impact of price reductions, it is possible to estimate the potential impacts of cloud providers cutting their rates. A price cut of 10% would be unlikely to attract significantly more workloads to the public cloud, but a 50% reduction would have a more dramatic impact. Cloud providers, however, face the same energy-cost challenges as data center owners and colos, and are more likely to absorb cost increases in their gross margins than to risk damaging their credibility by raising prices (see OVHcloud price hike shows cloud’s vulnerability to energy costs). Deep price cuts are therefore unlikely.
In conclusion:
Many organizations have neither the desire nor the ability to use the public cloud, regardless of any cost increases, and will absorb any price hikes as best they can.
Most organizations are adopting a hybrid IT approach and use a mix of cloud and on-premises locations for their workloads.
Rising costs (such as energy) may accelerate workload migration from on-premises data centers and colos to the public cloud (assuming cloud providers’ prices do not rise too).
The costs involved in moving applications and rearchitecting them to work effectively in the public cloud mean single-digit cost increases are likely to have only a minimal effect on migrations.
More significant cost increases could drive more workloads to the cloud since the savings to be made over the longer term could justify the switching costs involved.
Public-cloud price reductions could, similarly, accelerate cloud migration; however, dramatic price cuts are unlikely.
Higher data center costs unlikely to cause exodus to public cloud. By Dr. Owen Rogers, Research Director for Cloud Computing, Uptime Institute. Published February 8, 2023.
The past year (2022) has seen regulators in many countries develop or mandate requirements to report data centers’ operating information and environmental performance metrics. The first of these, the European Commission (EC) Energy Efficiency Directive (EED) recast, is currently under review by the European Parliament and is expected to become law in 2023. This directive will mandate three levels of information reporting, the application and publication of energy performance improvement and efficiency metrics, and conformity with certain energy efficiency requirements (see EU’s EED recast set to create reporting challenges).
Similar legislative and regulatory initiatives are now appearing in the US, starting with the White House Office of Technology and Science Policy’s (OTSP’s) Climate and energy implications of crypto-assets in the US report, published in September 2022. At the same time, Senator Sheldon Whitehouse is drafting complementary legislation that addresses both crypto and conventional data centers and sets the stage for the introduction of regulation similar to the EED over the next three to five years.
The OTSP report focuses on the impacts of the recent precipitous increase in energy consumption resulting from cryptocurrency mining in the US — initially driven by high crypto prices, low electricity costs and China’s prohibition of cryptomining operations. The OTSP report estimates cryptomining energy consumption (for both Bitcoin and Ethereum mining) to be responsible for 0.9% to 1.7% of US electricity consumption, and for 0.4% to 0.8% of greenhouse gas (GHG) emissions, in 2021.
The OTSP’s projections may already be out of date due to the current high energy prices and the collapse in value of most crypto assets. The OTSP’s projections, moreover, do not take into account the likely impact of Ethereum mining operations (estimated to account for one-quarter to one-third of industry consumption) moving from “proof of work” (PoW) to “proof of stake” (PoS).
PoW is the original “consensus mechanism” used in cryptocurrency transactions, whereby miners compete to solve computational puzzles of increasing difficulty to validate transactions, at the cost of ever-increasing energy consumption. Under PoS, transactions are validated by randomly selected participants (validators) who stake a quantity of cryptocurrency (and their experience level) for the right to confirm transactions, removing the need for computationally intense (and therefore energy-intense) puzzle-solving. Ethereum converted to PoS in September 2022 in an initiative known as “the Merge”: this change is expected to reduce its energy consumption by over 99%.
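The difference in computational effort between the two mechanisms can be illustrated with a toy model. This sketch is a simplification under stated assumptions, not how Bitcoin or Ethereum actually implement consensus: proof of work is modeled as a hashcash-style search for a SHA-256 digest with a given number of leading zeros, and proof of stake as a single stake-weighted random draw. The function names and stake values are hypothetical.

```python
import hashlib
import random


def proof_of_work(block_data: str, difficulty: int) -> tuple[int, str]:
    """Toy PoW: search for a nonce whose SHA-256 digest starts with
    `difficulty` zero hex digits. Each extra digit multiplies the expected
    number of hashes (and so the energy spent) by 16."""
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce, digest
        nonce += 1


def select_validator(stakes: dict[str, float], rng: random.Random) -> str:
    """Toy PoS: choose one validator at random, with probability proportional
    to stake. The cost is a single random draw, whatever the stakes."""
    names = list(stakes)
    return rng.choices(names, weights=[stakes[n] for n in names], k=1)[0]


for difficulty in range(1, 5):
    nonce, digest = proof_of_work("example block", difficulty)
    print(f"difficulty {difficulty}: {nonce + 1} hashes tried, digest {digest[:12]}...")

stakes = {"alice": 32.0, "bob": 64.0, "carol": 32.0}  # hypothetical stakes
print("next validator:", select_validator(stakes, random.Random(42)))
```

The PoW search grows roughly sixteenfold with each added zero digit, which is why its energy use scales with difficulty; the PoS selection costs essentially the same regardless of how much value is at stake.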
The OTSP report implies that the broader adoption of crypto assets and the application of the underlying blockchain software across a range of business processes will continue to drive increasing blockchain-related energy consumption. The report does not offer a specific projection of increasing energy consumption from cryptomining and further blockchain deployments. Given that most, if not all, enterprise blockchain deployments use PoS validation, and given the ability of PoW infrastructure to move quickly to locations with minimal regulation and low energy costs, much of this anticipated energy growth may not materialize.
To mitigate this projected growth in energy consumption, the OTSP report calls on the federal government to encourage and ensure the responsible development of cryptomining operations in three specific areas.
Minimizing GHG emissions and other impacts from cryptomining operations. The report proposes that the US government implement a collaborative process to develop effective, evidence-based environmental performance standards governing the development, design and operation of cryptomining facilities. It proposes that the Department of Energy (DOE) or the Environmental Protection Agency (EPA) should be empowered to set energy performance standards for “crypto-asset mining equipment, blockchain and other operations.”
Requiring cryptomining organizations to obtain and publicly report data in order to understand, monitor and mitigate impacts. The report stipulates that cryptomining operations should publicly report their location(s), energy consumption, energy mix, GHG emissions (using existing protocols), electronic waste recycling, environmental justice implications and demand-response participation.
Promoting further research to improve understanding and innovation. The report recommends prioritizing research and development in next-generation digital asset technologies that promote the US’ goals in terms of security, privacy, equity, resiliency and climate.
While these recommendations are primarily directed at cryptomining operations, the report also assesses conventional (i.e., non-crypto-asset) data center operations, noting that cryptomining energy consumption in 2021 was roughly comparable to that of conventional data centers. This clearly raises the question: if cryptomining energy consumption warrants public data reporting and energy performance standards, then why should conventional data center operations not also be included in that mandate?
Under US law, Congress would need to pass legislation authorizing an administrative agency to require data centers to report their location(s), operational data and environmental performance information. Senator Whitehouse is developing draft legislation to address both crypto-asset and conventional data centers, using the EED as a blueprint. The Senator’s proposals would amend the Energy Independence and Security Act of 2007 (EISA) to require all public and private conventional and cryptomining data center locations with more than 100 kW of installed IT equipment (nameplate power) to report data to the Energy Information Administration (EIA). These data center locations would need to outline their operating attributes: a requirement remarkably similar to the EED’s information reporting mandates.
The proposals also require the DOE to promulgate a final rule covering energy conservation standards for “Servers and Equipment for Cryptomining” within two years of the EISA amendments going into force. While this requirement is specific to cryptomining equipment, it is likely that the DOE will lobby Congress to include energy conservation standards for conventional data center IT equipment as part of these proposed amendments. The DOE has already attempted to set energy conservation standards for computer servers (79 FR 11350 02/28/2014) through authority granted under the EISA regulating commercial office equipment.
Little will happen immediately. Legislative and regulatory processes and procedures in the US can be laborious, and final standards governing data center information and energy efficiency reporting are likely to remain several years away. But the release of the OTSP report and the development of draft US legislation indicate that the introduction and adoption of these standards is a matter of “when” (and how strictly?) rather than “if”.
Owners and operators of digital infrastructure need to be prepared. The eventual promulgation of these standards, together with the Securities and Exchange Commission’s proposed regulation on climate change disclosures, will sooner or later require operators to establish data collection and management processes to meet information reporting requirements. Operators will need to develop a strategy for meeting these requirements and will need to have policies in place to ensure they undertake projects that increase the work delivered per megawatt-hour of energy consumed across their data center operations.
Data center managers would also be wise to engage with industry efforts to develop simple and effective energy-efficiency metrics. These metrics are required under both US draft legislation and the EC EED recast and are likely to be included in legislation and regulation in other jurisdictions. An ITI Green Grid (TGG) Working Group has been put in place to work on this issue, and other efforts have been proposed by groups and institutions such as Infrastructure Masons (iMasons) and the Climate Neutral Data Centre Pact. Uptime Institute is also providing detailed feedback on behalf of its members on an EC study proposing options and making recommendations for data reporting and metrics as required under the EED recast.
Industry initiatives that encompass all types of IT operations are going to be important. Just as importantly, the industry will need to converge on a single and cohesive globally applicable metric (or set of metrics) to facilitate standardized reporting and minimize confusion.
First signs of federal data center reporting mandates appear in US. By Jay Dietrich, Research Director of Sustainability, Uptime Institute. Published February 1, 2023.
Recent geopolitical concerns, predictions of a looming recession, and continued supply chain difficulties are unlikely to dampen growth in digital bandwidth on private networks, according to Equinix’s 2022 Global Interconnection Index (GXI). Global interconnection bandwidth (the volume of data exchanged between companies directly, bypassing the public internet) is a barometer for digital infrastructure and sheds light on the differing dynamics between verticals. High growth in private interconnection is a boon for Equinix, the world’s largest colocation provider by market share, but makes resiliency more challenging for its customers: all these interconnects are also potential points of failure.
The Equinix GXI projects strong growth across the industry in 2023, with global interconnection bandwidth projected to increase by 41% compared with 2022. Overall, global interconnection bandwidth is projected to grow at a compound annual growth rate (CAGR) of 40% through 2025, when it is expected to reach nearly 28,000 terabits per second (Tbps). These numbers include direct connections between enterprises and their digital business partners (such as telecommunications, cloud, edge, and software as a service (SaaS) providers).
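Compound growth is easy to underestimate. The short sketch below shows the arithmetic behind a CAGR figure: a 40% CAGR multiplies a value by about 2.7 over three years and about 3.8 over four. The helper names are hypothetical, and the sketch works with growth multiples rather than Equinix’s absolute bandwidth figures, since the base year of the projection is not stated here.

```python
def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate implied by growing from start to end."""
    return (end / start) ** (1 / years) - 1


def project(value: float, rate: float, years: int) -> float:
    """Value after compounding at a constant annual rate for `years` years."""
    return value * (1 + rate) ** years


# Growth multiple implied by a 40% CAGR over different horizons.
for years in range(1, 5):
    print(f"{years} year(s) at 40% CAGR: x{project(1.0, 0.40, years):.2f}")

# Round trip: a value that grows from 100 to 196 in two years has a 40% CAGR.
print(f"implied CAGR: {cagr(100, 196, 2):.0%}")
```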
The Equinix study projects faster growth in private interconnection for enterprises than for networks operated by telecommunications companies or cloud providers. This growth in private interconnection is driven by high demand for digital services and products — many of which also require a presence with multiple cloud providers as well as integration with major SaaS companies.
The energy and utility sector is likely to see the greatest growth in private network interconnection through 2025, with a CAGR of 53%, as energy data becomes increasingly important for managing intermittent renewable energy and decarbonizing the grid. Digital services supporting sustainability efforts such as carbon accounting are likely to require additional private interconnection with SaaS providers to accurately track operational sustainability metrics.
The banking and insurance sector and the manufacturing sector are expected to see CAGRs of 49% and 45%, respectively, over the same period. These industries are, however, particularly sensitive to errors and outages, and appropriate planning will be necessary.
There is a reason Equinix has been drawing attention to the benefits of interconnection for the past six years: as of Q2 2022, the company operated 435,800 cross-connects across its data centers. Its closest competitor, Digital Realty, reported just 185,000 cross-connects at its facilities in the same quarter. Equinix defines a cross-connect as a point-to-point cable link between two customers in the same retail colocation data center. For colocation companies, cross-connects not only represent core recurring revenue streams but also make their network-rich facilities more valuable as integration hubs between organizations.
As private interconnection increases, so too does the interdependency of digital infrastructure. Strong growth in interconnection may be responsible for the increasing proportion of networking and third-party-related outages in recent years. Uptime’s 2022 resiliency survey sheds light on the two most common causes of connectivity-related outages: misconfiguration and change management failure (reported by 43% of survey respondents); and third-party network-provider failure (43%). Asked specifically if their organization had suffered an outage caused by a problem with a third-party supplier, 39% of respondents confirmed this to be the case (see Figure 1).
When third-party IT and data center service providers do have an outage, customers are immediately affected and may seek compensation. Enterprise end users will need greater transparency and stronger service-level agreements from providers to better manage these additional points of failure and the resiliency they have, in effect, outsourced. Importantly, managing the added complexity of an enterprise IT architecture spanning on-premises, colocation and cloud facilities demands more organizational resources in terms of skilled staff, time and budget.
Failing that, businesses might encounter unexpected availability and reliability issues rather than any anticipated improvement. According to Uptime’s 2021 annual survey of IT and data center managers, one in eight (of those who had a view) reported that using a mix of IT venues had resulted in their organization experiencing a deterioration in service resiliency, rather than the reverse.
By: Lenny Simon, Senior Research Associate and Max Smolaks, Analyst
Rapid interconnectivity growth will add complexity and risk. Published January 25, 2023.
Cloud providers have experienced unprecedented growth over the past few years. CIOs the world over, often prompted by CFOs and CEOs, have been favoring the cloud over on-premises IT for new and major projects — with the result that the largest cloud provider, Amazon Web Services (AWS), has seen revenue increase by 30% to 40% every year since 2014 (when it recorded an 80% jump in turnover). Microsoft Azure and Google have reported similar numbers in recent times.
But there are signs of a slowdown:
While AWS reported a year-on-year revenue increase of 27.5% for Q3 2022, this is down from 33% in Q2 and is the slowest growth in its history.
Microsoft’s CFO has also commented that Azure could see revenue growth decline in its next quarter, following disappointing growth of 35% in the three months to September 2022.
Why this slowdown in cloud growth?
The global macroeconomic environment — specifically, high energy costs together with inflation — is making organizations more cautious about spending money. Cloud development projects are no different from many others and are likely to be postponed or deprioritized due to rising costs, skill shortages and global uncertainty.
Some moves to the cloud may have been indefinitely deferred. Public cloud is not always cheaper than on-premises implementations, and many organizations may have concluded that migration is just not worthwhile in light of other financial pressures.
For organizations that have already built cloud-based applications, it is neither feasible nor wise to turn off applications or resources to save money: these organizations are, instead, spending more time examining and optimizing their costs.
Cutting cloud costs, not consumption
Cloud providers’ top-line revenue figures suggest customers are successfully reducing their cloud costs. How are they doing this?
Optimizing cloud expenditure involves two key activities: first, eliminating waste (such as orphaned resources and poorly sized virtual machines); and second, more cost-effective procurement, through alternative pricing models such as consistent-usage commitments or spot instances — both of which, crucially, reduce expenditure without impacting application performance.
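The two activities can be sketched in a few lines. This is a minimal illustration under assumptions: the `Resource` record, the 20% utilization threshold for flagging oversized virtual machines and the 30% commitment discount are all hypothetical, and do not reflect any provider’s actual tooling or rates. Spot pricing is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    monthly_cost: float
    attached: bool        # False means orphaned, e.g. an unattached disk
    avg_cpu_util: float   # average CPU utilization, 0.0 to 1.0


def find_waste(resources: list[Resource], util_threshold: float = 0.2):
    """Return (orphaned, oversized): resources attached to nothing, and
    attached resources whose average utilization is below the threshold."""
    orphaned = [r for r in resources if not r.attached]
    oversized = [r for r in resources if r.attached and r.avg_cpu_util < util_threshold]
    return orphaned, oversized


def commitment_saving(on_demand_monthly: float, discount: float,
                      committed_fraction: float) -> float:
    """Monthly saving from moving a fraction of steady usage to a
    commitment priced at the given discount to on-demand rates."""
    return on_demand_monthly * committed_fraction * discount


# Hypothetical fleet used only to exercise the functions above.
fleet = [
    Resource("vm-web-1", 400.0, True, 0.55),
    Resource("vm-batch-2", 800.0, True, 0.08),
    Resource("disk-old-3", 120.0, False, 0.0),
]
orphaned, oversized = find_waste(fleet)
print("delete:", [r.name for r in orphaned])
print("rightsize:", [r.name for r in oversized])
print("monthly orphan waste:", sum(r.monthly_cost for r in orphaned))
print("commitment saving:", commitment_saving(10_000, 0.30, 0.5))
```

Neither step changes what the application does: waste removal targets capacity nobody is using, and a commitment changes only how steady usage is billed.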
Hyperscaler cloud providers, which are more interested in building longer-term relationships than in deriving higher gross margins in the short term, offer tools to help users reduce expenditure. These tools have improved significantly over the past few years.
Many organizations have now crossed a threshold in terms of cloud use, where the savings to be made mean it is to their benefit to invest in optimization (using these tools). One factor driving optimization here is higher cloud expenditure — in part an ongoing consequence of the pandemic, which saw businesses retooling IT to survive, rather than focusing on cutting IT costs.
It should, perhaps, have been anticipated that customers would, at some point, start using these tools to their own advantage — current pressures on other costs having made cutting IT expenditure more critical than before.
Will cloud prices rise?
Cloud providers’ overriding objective of winning and keeping customers over the long term explains why hyperscalers are likely to try to avoid increasing their prices for the foreseeable future. Providers want to maintain good relationships with their customers so that they are the de facto provider of choice for new projects and developments: price hikes would damage the customer trust they’ve spent so long cultivating.
AWS’s Q3 2022 operating margin was 26%, some three percentage points down on Q2. This drop in margin could be attributed to rising energy costs, which AWS states almost doubled over the same period (hedging and long-term purchase agreements notwithstanding). Microsoft has reported it will face additional energy costs of $800 million this financial year. While AWS and Microsoft could have increased prices to offset rising energy costs and maintain their profit margins, they have so far chosen not to, rather than risk damaging customers’ trust.
How will this play out, going forward? Financial pressures may make organizations more careful about cloud spending. Projects may be subject to more stringent justification and approval, and some migrations are likely to be delayed (or even cancelled) for now. As revenue increases in absolute terms, achieving high-percentage revenue gains becomes increasingly difficult. Nonetheless, while the days of 40% revenue jumps may be over, this recent downturn is unlikely to be the start of a rapid downward spiral. AWS’s Q3 2022 revenue growth may have shrunk in percentage terms, but in absolute terms it was still more than $4 billion.
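The arithmetic behind the point about absolute and percentage growth is simple: a constant absolute increase is a shrinking percentage of an ever-larger base. The sketch below uses hypothetical round numbers (a starting revenue of 10 and a fixed increase of 4 per period, in arbitrary currency units), not AWS’s reported figures; the function name is illustrative.

```python
def growth_rates(start: float, increment: float, periods: int) -> list[float]:
    """Percentage growth in each period when revenue rises by a fixed amount."""
    rates = []
    revenue = start
    for _ in range(periods):
        rates.append(increment / revenue)  # same absolute gain, bigger base
        revenue += increment
    return rates


for period, rate in enumerate(growth_rates(start=10.0, increment=4.0, periods=5), 1):
    print(f"period {period}: {rate:.1%} growth")
```

The absolute gain is identical in every period, yet the growth rate falls from 40% to about 15% by the fifth period.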
Applications architected for the cloud should be automatically scalable, and capable of meeting customers’ requirements without their having to spend more than necessary. Cloud applications allow organizations to adapt their business models and / or drive innovation — which may be one of the reasons many have been able to survive (and, in some cases, thrive) during challenging times. In a sense, the decline in growth that the cloud companies have suffered recently demonstrates that the cloud model is working exactly as intended.
The hyperscaler cloud providers are likely to continue to expand globally and create new products and services. Enterprise customers, in turn, are likely to continue to find cloud services competitive in comparison with colocation-based or on-premises alternatives. Much of the cloud’s value comes from a perception of it offering “unlimited” resources. If providers don’t increase capacity, they risk failing to meet customers’ expectations when required — damaging credibility, and relationships. AWS, Google and Microsoft continue to compete for market share, worldwide. Reducing investment now could risk future profitability.
AWS currently has 13,000 vacancies advertised on its website, a sign that the cloud sector is certainly not in retreat. If anything, this suggests that future growth will be strong.
Reports of cloud decline have been greatly exaggerated. By Dr. Owen Rogers, Research Director for Cloud Computing, Uptime Institute. Published January 18, 2023.
Uptime Institute’s outages database suggests data center fires are infrequent, and rarely have a significant impact on operations. Uptime has identified 14 publicly reported, high-profile data center outages caused by fire or fire suppression systems since 2020. The frequency of fires is not increasing relative to the IT load or number of data centers but, uncontained, they are potentially disastrous to facilities, and subsequent outages can be ruinous for the business.
SK Group, South Korea’s second-largest conglomerate, is the latest high-profile organization to suffer a major data center fire, which broke out at a multistory colocation facility operated by its SK Inc. C&C subsidiary in Pangyo (just south of Seoul) on October 15, 2022. According to police reports, the fire started in a battery room before spreading quickly to the rest of the building. It took firefighters around eight hours to bring the blaze under control.
While there were no reported injuries, this incident could prove to be the largest data center outage caused by fire to date. It is a textbook example of how seemingly minor incidents can escalate to wreak havoc through cascading interdependencies in IT services.
The incident took tens of thousands of servers offline, including not only SK Group’s own systems but also the IT infrastructure running South Korea’s most popular messaging and single sign-on platform, KakaoTalk. The outage disrupted its integrated mobile payment system, transport app, gaming platform and music service — all of which are used by millions. The outage also affected domestic cloud giant Naver (the “Google of South Korea”) which reported disruption to its online search, shopping, media and blogging services.
While SK Group has yet to disclose the root cause of the fire, Kakao, the company behind KakaoTalk, has pointed to the lithium-ion (Li-ion) batteries deployed at the facility, which were manufactured by SK On, another SK Group subsidiary. In response, SK Group has released what it claims are records from its battery management system (BMS) showing no deviation from normal operations prior to the incident. Some local media reports contradict this, however, claiming that the BMS did, in fact, generate multiple warnings. Only a thorough investigation will settle these claims. In the meantime, both sides are reported to be “lawyering up.”
The fallout from the outage is not limited to service disruptions or lost revenue: it has prompted a statement from the country’s president, Yoon Suk-yeol, who has promised a thorough investigation into the causes of the fire and the extent of the damage it caused. The incident has, so far, led to a police raid on SK Inc. C&C headquarters; the resignation of Kakao co-CEO Whon Namkoong; and the establishment of a national task force for disaster prevention involving military officials and the national intelligence agency. Multiple class-action lawsuits against Kakao are in progress, mainly based on claims that the company has prioritized short-term profits over investment in more resilient IT infrastructure.
The South Korean government has announced a raft of measures aimed at preventing large-scale digital service failures. All large data centers will now be subject to disaster management procedures defined by the government, including regular inspections and safety drills. Longer-term, the country’s Ministry of Science and ICT will be pushing for the development of battery technologies posing a lower fire risk, a matter of national interest for South Korea, home to some of the world’s largest Li-ion cell manufacturers including Samsung SDI and LG Chem, in addition to SK On.
The fire in South Korea will inevitably draw comparisons with the data center fire that brought down the OVHcloud Strasbourg facility in 2021. Impacting some 65,000 customers, many of whom lost their data in the blaze (see Learning from the OVHcloud data center fire), this fire, as in Pangyo, was thought to have involved uninterruptible power supply (UPS) systems. According to the French Bureau of Investigation and Analysis on Industrial Risks (BEA-RI), the lack of an automatic fire extinguisher system, delayed electrical cutoff and building design all contributed to the spread of the blaze.
A further issue arising from this outage, and one that remains to be determined, is the financial cost to SK Group, Kakao and Naver. The fire at the OVHcloud Strasbourg facility was estimated to cost the operator more than €105 million — with less than half of this being covered by insurance. The cost of the fire in Pangyo is likely to run into tens (if not hundreds) of millions of dollars. This should serve as a timely reminder of the importance of fire suppression, particularly in battery rooms.
Li-ion batteries in mission-critical applications — risk creep?
Li-ion batteries present a greater fire risk than valve-regulated lead-acid batteries, regardless of their specific chemistry and construction: a position endorsed by the US National Fire Protection Association, among others. The breakdown of Li-ion cells produces flammable gases and, in many chemistries, oxygen, which can fuel a major thermal-runaway event (in which the fire spreads uncontrollably between cells, across battery packs and potentially even between cabinets, if these are inappropriately spaced). As a result, the fires they cause are notoriously difficult to suppress.
Many operators have, hitherto, found the risk-reward profile of Li-ion batteries (in terms of their smaller footprint and longer lifespan) to be acceptable. Uptime surveys show major UPS vendors reporting strong uptake of Li-ion batteries in data center and industrial applications: some vendors report shipping more than half of their major three-phase UPS systems with Li-ion battery strings. According to the Uptime Institute Global Data Center Survey 2021, nearly half of operators have adopted this technology for their centralized UPS plants, up from about a quarter three years earlier. The Uptime Institute Global Data Center Survey 2022 found Li-ion adoption levels to be increasing still further (see Figure 1).
The incident at the SK Inc. C&C facility highlights the importance of selecting appropriate fire suppression systems, and the importance of fire containment as part of resiliency. Most local regulation governing fire prevention and mitigation concentrates (rightly) on securing people’s safety, rather than on protecting assets. Data center operators, however, have other critically important issues to consider — including equipment protection, operational continuity, disaster recovery and mean time to recovery.
While gaseous (or clean agent) suppression is effective at slowing down the spread of a fire in the early stages of Li-ion cell failure (when coupled with early detection), it is arguably less suitable for handling a major thermal-runaway event. The cooling effects of water and foam mean these are likely to perform better; double-interlock pre-action sprinklers also limit the spread. Placing battery cabinets farther apart can help prevent or limit the spread of a major fire. Dividing battery rooms into fire-resistant compartments (a measure mandated by Uptime Institute’s Tier IV resiliency requirements) can further decrease the risk of facility-wide outages.
Such extensive fire prevention measures could, however, compromise the benefits of Li-ion batteries in terms of their higher volumetric energy density, lower cooling needs and overall advantage in lifespan costs (particularly where space is at a premium).
Advances in Li-ion chemistries and cell assembly will help address operational safety concerns; lithium iron phosphate, with its higher ignition point and no release of oxygen during decomposition, is a case in point. Longer term, inherently safer, innovative chemistries, such as sodium-ion and nickel-zinc, will probably offer a more lasting solution to the safety (and sustainability) conundrum around Li-ion. Until then, the growing volume of Li-ion batteries in data centers means the risk of violent fires can only grow, with potentially dire financial consequences.
By: Max Smolaks, Analyst, Uptime Institute Intelligence and Daniel Bizo, Research Director, Uptime Institute Intelligence
Major data center fire highlights criticality of IT services (Daniel Bizo, Research Director, Uptime Institute Intelligence; published 2023-01-11)
Amazon Web Services (AWS) has made a minor change to its private-cloud appliance, AWS Outposts, that could significantly impact resiliency. The cloud provider has enabled local access to cloud administration, removing the appliance’s reliance on the public cloud. In the event of a network failure between the public cloud and the user’s data center, the private-cloud container platform can still be configured and maintained.
Many public-cloud providers have extended their offerings so that their services can be delivered from a data center of the user’s choosing. Services are typically billed in the same way as in the public cloud, and accessed through the same portal and software interfaces, but are delivered from hardware and software hosted in the user’s own facility. Such services are in demand from customers seeking to meet compliance or data protection requirements, or to improve the end-user experience through lower latency.
In one business model, the cloud provider ships a server-storage private-cloud appliance to an organization’s data center. The organization manages the data center. The public-cloud provider is responsible for the hardware and middleware that delivers the cloud functionality.
The term “private cloud” describes a cloud platform where the user has access to elements of the platform not usually accessible in the public cloud (such as the data center facility, hardware and middleware). These appliances are a particular type of private cloud, not designed to be operated independently of the public cloud. They are best thought of as extensions of the public cloud to the on-premises data center (or colocation facility), since administration and software maintenance are performed via the public cloud.
As the public and private cloud use the same platform and application programming interfaces (APIs), applications can be built across the organization’s and the cloud provider’s data centers, and the platform can be managed as one. For more information on private-cloud appliances, see the Uptime Institute Intelligence report Cloud scalability and resiliency from first principles.
The resilience of this architecture has not, hitherto, been assured, because the application still relies on the cloud provider’s ability to manage some services, such as the management interface. The public-cloud provider controls the interface for interacting with the user’s on-premises cloud (the “control plane”); if that interface goes down, so too does the ability to administer the on-premises cloud.
Ironically, it is precisely during an outage that an administrator is most likely to want to make such configuration changes: to reserve capacity for mission-critical workloads or to reprioritize applications to handle the loss of public-cloud capacity, for example. If an AWS Outposts appliance were supporting manufacturing machinery in a factory, the inability to configure local capabilities during a network failure could significantly affect production.
It is for this reason that AWS’s announcement that its Elastic Kubernetes Service product (Amazon EKS) can be managed locally on AWS Outposts is important. Kubernetes is a platform used to manage containers. This new capability allows users to configure API endpoints on the AWS Outposts appliance, meaning the container configuration can be changed via the local network without connecting to the public cloud.
In practical terms, this addition makes AWS Outposts more resilient to outages because it can function in the event of a connectivity failure between the cloud provider and the data center. AWS Outposts is now far more feasible as a disaster-recovery or failover location, and more appropriate for edge locations, where connectivity might be less assured.
The most important aspect of this development, however, is that it indicates AWS — the largest cloud provider — is perhaps acknowledging that users don’t just want an extension of the public cloud to their own facilities. Although many organizations are pursuing a hybrid-cloud approach, where public and private cloud platforms can work together, they don’t want to sacrifice the autonomy of each of those environments.
Organizations want venues to work independently of each other if required, avoiding single points of failure. To address this desire, other AWS Outposts services may be made locally configurable over time as users demand autonomy and greater control over their cloud applications.
Tweak to AWS Outposts reflects demand for greater cloud autonomy (Dr. Owen Rogers, Research Director for Cloud Computing, Uptime Institute; published 2022-12-07)
Higher data center costs unlikely to cause exodus to public cloud
By: Dr. Owen Rogers, Research Director for Cloud Computing, Uptime Institute
A debate has been raging since cloud computing entered the mainstream: which is the cheaper venue for enterprise customers, cloud or on-premises data centers? This debate has proved futile for two reasons. First, the characteristics of any specific application will dictate which venue is more expensive; there is no simple, unequivocal answer. Second, the question implies that a buyer would choose a cloud or on-premises data center primarily because it is cheaper. This is not necessarily the case.
Infrastructure is not a commodity. Most users will not choose a venue purely because it costs less. Users might choose to keep workloads within their data centers or at a colo because they want to be confident they are fully compliant with legislation and / or regulatory requirements, or to be situated close to end users. They might choose cloud computing for workloads that require rapid scalability, or to access platform services further up the stack. Of course, costs matter to CIOs and CFOs alike, but cloud computing, on-premises data centers and colos all deliver value beyond their relative cost differences.
One way of assessing the value of a product is through a price-sensitivity analysis, whereby users are asked how they would (hypothetically) respond to price changes. Users who derive considerable value from a product are less likely to change their buying behavior following any increase in cost. Users more sensitive to cost increases will typically consider competing offers to reduce or maintain costs. Switching costs are also a factor in a user’s sensitivity to price changes. In cloud computing, for example, the cost of rearchitecting an application as part of a migration might not be justifiable if the resultant ongoing cost savings are limited.
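The switching-cost point above can be made concrete with a simple payback check. The sketch below is illustrative only: the function names and the dollar figures are hypothetical, and it ignores discounting, migration risk and any change in the non-cost value a venue delivers.

```python
def payback_years(rearchitecting_cost: float, annual_savings: float) -> float:
    """Years of ongoing savings needed to recover a one-off migration cost."""
    if annual_savings <= 0:
        return float("inf")  # savings never recover the switching cost
    return rearchitecting_cost / annual_savings


def migration_justified(rearchitecting_cost: float, annual_savings: float,
                        horizon_years: float) -> bool:
    """True if cumulative savings over the planning horizon exceed the switching cost."""
    return annual_savings * horizon_years > rearchitecting_cost


# Hypothetical figures for illustration only; these are not survey data.
cost = 500_000.0     # one-off cost of rearchitecting the application
savings = 60_000.0   # expected annual reduction in running costs after migration
print(round(payback_years(cost, savings), 2))   # 8.33
print(migration_justified(cost, savings, 5))    # False: 300,000 < 500,000
```

When ongoing savings are limited, the payback period can stretch beyond a typical planning horizon, which is one reason a workload may stay where it is even as its current venue becomes more expensive.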
IT decision-makers surveyed as part of Uptime Intelligence’s Data Center Capacity Trends Survey 2022 were asked what percentage of current workloads they would be likely to migrate to the cloud if their existing data center costs (covering on-premises and colos) rose 10%, 50% or 100%, respectively (assuming cloud prices remained stable).
While Uptime has neither conducted nor seen extensive research into rising costs, most operators are likely to be experiencing strong inflationary pressures (of more than 15%) on their operations, with energy prices and staff shortages the main drivers.
The survey responses are illustrated in two different formats:
What does this data tell us? Figure 1 shows that if on-premises or colo costs were to increase by 10%, then around 12% of workloads could migrate to the cloud. If costs were to increase by 50%, approximately 24% of workloads would potentially move to the cloud. Even if costs were to double, however, only just over 30% of workloads would be likely to migrate to the public cloud. This suggests that on-premises and colo users are not particularly price-sensitive. While they are likely to have some impact, rising data center costs per se are unlikely to trigger a mass exodus to the public cloud.
Some users are more price-sensitive than others, however. Figure 2 shows that 42% of respondents indicate a 10% increase in costs would not drive any workloads to the public cloud. One quarter of respondents would still be unlikely to migrate workloads even if faced with price hikes of 50%. Notably, a quarter of respondents indicate they would not migrate any workloads even if costs were to double. This may suggest that at least 25% of the organizations surveyed do not currently consider the public cloud to be a viable option for their workloads.
This reluctance may be the result of several factors. Some respondents may derive value from hosting workloads in non-cloud data centers and may believe this to justify any additional expense. Others may believe that regulatory, technical and compliance issues render the public cloud unviable, making cost implications irrelevant. Some users may feel that moving to the public cloud is simply cost-prohibitive.
Most users are susceptible to price increases, however — at least to some extent. A 10% increase in costs would drive 55% of organizations to migrate some or most of their workloads to the cloud. A total of 59% of respondents indicate they would do so if faced with a more substantial 50% increase. Faced with a doubling of their costs, over a quarter of respondents would migrate most of their workloads to the cloud. Again, this is assuming that cloud costs remain constant — and it is unlikely that cloud providers could absorb such significant upward cost pressures without any increase in prices.
Other survey data (not shown in the figures) indicates that even if infrastructure expenditure were to double, only 7% of respondents would migrate all of their workloads to the cloud. Given that 25% of respondents indicate they would keep all workloads on-premises regardless of cost increases, this suggests that most users are adopting a hybrid IT approach: they are willing to consider both on-premises and cloud facilities, choosing the most appropriate option for each application.
Although the Uptime Intelligence Data Center Capacity Trends Survey 2022 did not specifically cover the impact of price reductions, it is possible to estimate the potential impact of cloud providers cutting their rates. A price cut of 10% would be unlikely to attract significantly more workloads to the public cloud; a 50% reduction, however, would have a more dramatic impact. But as indicated above, cloud providers, faced with the same energy-cost challenges as data center owners and colos, are more likely to absorb any cost increases in their gross margins than to risk damaging their credibility by raising prices, which leaves little room for deep price cuts (see OVHcloud price hike shows cloud’s vulnerability to energy costs).
In conclusion:
First signs of federal data center reporting mandates appear in US
By: Jay Dietrich, Research Director of Sustainability, Uptime Institute
The past year (2022) has seen regulators in many countries develop or mandate requirements to report data centers’ operating information and environmental performance metrics. The first of these, the European Commission (EC) Energy Efficiency Directive (EED) recast, is currently under review by the European Parliament and is expected to become law in 2023. This directive will mandate three levels of information reporting, the application and publication of energy performance improvement and efficiency metrics, and conformity with certain energy efficiency requirements (see EU’s EED recast set to create reporting challenges).
Similar legislative and regulatory initiatives are now appearing in the US, notably the White House Office of Technology and Science Policy’s (OTSP’s) Climate and energy implications of crypto-assets in the US report, published in September 2022. Concurrently, Senator Sheldon Whitehouse is drafting complementary legislation that addresses both crypto and conventional data centers and sets the stage for the introduction of regulation similar to the EED over the next three to five years.
The OTSP report focuses on the impacts of the recent precipitous increase in energy consumption resulting from cryptocurrency mining in the US — initially driven by high crypto prices, low electricity costs and China’s prohibition of cryptomining operations. The OTSP report estimates cryptomining energy consumption (for both Bitcoin and Ethereum mining) to be responsible for 0.9% to 1.7% of US electricity consumption, and for 0.4% to 0.8% of greenhouse gas (GHG) emissions, in 2021.
The OTSP’s projections may already be out of date due to the current high energy prices and the collapse in value of most crypto assets. The OTSP’s projections, moreover, do not take into account the likely impact of Ethereum mining operations (estimated to account for one-quarter to one-third of industry consumption) moving from “proof of work” (PoW) to “proof of stake” (PoS).
PoW is the original “consensus mechanism” used to validate cryptocurrency transactions: miners compete to solve increasingly difficult cryptographic puzzles, at the cost of ever-increasing energy consumption. In PoS, transactions are validated by randomly selected validators who stake a quantity of cryptocurrency (and their experience level) for the right to confirm transactions, which allows the use of less computationally intense (and therefore less energy-intense) algorithms. Ethereum converted to PoS in September 2022 in an initiative known as “the Merge”; this change is expected to reduce its energy consumption by over 99%.
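To show why the two mechanisms differ so sharply in energy use, here is a toy sketch. It is a simplification under stated assumptions, not how Bitcoin or Ethereum actually work: the “block” is a plain string, difficulty is counted in leading hexadecimal zeros rather than as a numeric target, and validator selection is a single stake-weighted random draw. The function names and example stakes are hypothetical.

```python
import hashlib
import random


def proof_of_work(block_data: str, difficulty: int) -> tuple[int, str]:
    """Brute-force a nonce whose SHA-256 digest starts with `difficulty` hex zeros.

    Each extra zero multiplies the expected number of hash attempts by 16,
    so the computation (and energy) needed grows with difficulty.
    """
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce, digest
        nonce += 1


def select_validator(stakes: dict[str, float], rng: random.Random) -> str:
    """Pick one validator at random, with probability proportional to its stake.

    A single weighted draw replaces the hashing race, which is why this toy
    PoS needs far less computation than the PoW function above.
    """
    names = list(stakes)
    return rng.choices(names, weights=[stakes[n] for n in names], k=1)[0]


nonce, digest = proof_of_work("block-1", difficulty=4)  # about 65,536 attempts on average
print(nonce, digest)

rng = random.Random(42)
print(select_validator({"alice": 32.0, "bob": 64.0, "carol": 4.0}, rng))
```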
The OTSP report implies that the broader adoption of crypto assets and the application of the underlying blockchain software used across a range of business processes will continue to drive increasing blockchain-related energy consumption. The report does not offer a specific projection of increasing energy consumption from cryptomining and further blockchain deployments. Given that most, if not all, enterprise blockchain deployments use PoS validation, and given the ability of PoW infrastructure to move quickly to locations with minimal regulation and energy costs, much of this anticipated energy growth may not materialize.
To mitigate this projected growth in energy consumption, the OTSP report calls on the federal government to encourage and ensure the responsible development of cryptomining operations in three specific areas.
While these recommendations are primarily directed at cryptomining operations, the report also assesses conventional (i.e., non-crypto-asset) data center operations, noting that cryptomining energy consumption in 2021 was roughly comparable to that of conventional data centers. This clearly raises the question: if cryptomining energy consumption warrants public data reporting and energy performance standards, then why should conventional data center operations not also be included in that mandate?
Under US law, Congress would need to pass legislation authorizing an administrative agency to require data centers to report their location(s), operational data and environmental performance information. Senator Whitehouse is developing draft legislation to address both crypto-asset and conventional data centers, using the EED as a blueprint. The Senator’s proposals would amend the Energy Independence and Security Act of 2007 (EISA) to require all public and private conventional and cryptomining data center locations with more than 100 kW of installed IT equipment (nameplate power) to report data to the Energy Information Administration (EIA). These data center locations would need to outline their operating attributes: a requirement remarkably similar to the EED’s information reporting mandates.
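As described above, the draft threshold is based on installed (nameplate) IT power rather than measured consumption. A minimal sketch of that test follows; how a final rule would count equipment (for example, whether network and storage gear is included) is not settled, so the per-device list and the function name are assumptions.

```python
def must_report(nameplate_watts: list[float], threshold_kw: float = 100.0) -> bool:
    """Return True if a site's installed IT nameplate power exceeds the threshold.

    Nameplate power is each device's rated maximum, not its measured draw, so a
    lightly loaded site can still cross the threshold.
    """
    installed_kw = sum(nameplate_watts) / 1000  # convert watts to kilowatts
    return installed_kw > threshold_kw


# Hypothetical sites: identical 750 W servers, different counts.
print(must_report([750.0] * 150))  # True: 112.5 kW installed
print(must_report([750.0] * 120))  # False: 90 kW installed
```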
The proposals also require the Department of Energy (DOE) to promulgate a final rule covering energy conservation standards for “Servers and Equipment for Cryptomining” within two years of the EISA amendments coming into force. While this requirement is specific to cryptomining equipment, the DOE is likely to lobby Congress to include energy conservation standards for conventional data center IT equipment as part of these proposed amendments. The DOE has already attempted to set energy conservation standards for computer servers (79 FR 11350, February 28, 2014) through authority granted under the EISA to regulate commercial office equipment.
Little will happen immediately. Legislative and regulatory processes and procedures in the US can be laborious, and final standards governing data center information and energy efficiency reporting are likely to remain several years away. But the release of the OTSP report and the development of draft US legislation indicate that the introduction and adoption of these standards is a matter of “when” (and how strictly?) rather than “if”.
Owners and operators of digital infrastructure need to be prepared. The eventual promulgation of these standards, taken in conjunction with proposed regulation on climate change disclosures from the Securities and Exchange Commission, will sooner or later require operators to establish data collection and management processes to meet information reporting requirements. Operators will need to develop a strategy for meeting these requirements and put policies in place to ensure they undertake projects that increase the work delivered per megawatt-hour of energy consumed across their data center operations.
Data center managers would also be wise to engage with industry efforts to develop simple and effective energy-efficiency metrics. These metrics are required under both US draft legislation and the EC EED recast and are likely to be included in legislation and regulation in other jurisdictions. An ITI Green Grid (TGG) Working Group has been put in place to work on this issue, and other efforts have been proposed by groups and institutions such as Infrastructure Masons (iMasons) and the Climate Neutral Data Centre Pact. Uptime Institute is also providing detailed feedback on behalf of its members on an EC study proposing options and making recommendations for data reporting and metrics as required under the EED recast.
Industry initiatives that encompass all types of IT operations are going to be important. Just as importantly, the industry will need to converge on a single and cohesive globally applicable metric (or set of metrics) to facilitate standardized reporting and minimize confusion.
Rapid interconnectivity growth will add complexity and risk
By: Lenny Simon, Senior Research Associate, Uptime Institute
Recent geopolitical concerns, predictions of a looming recession and continued supply chain difficulties are unlikely to dampen growth in digital bandwidth on private networks, according to Equinix’s 2022 Global Interconnection Index (GXI). Global interconnection bandwidth (the volume of data exchanged between companies directly, bypassing the public internet) is a barometer for digital infrastructure and sheds light on differences in dynamics between verticals. High growth in private interconnection is a boon for Equinix, the world’s largest colocation provider by market share, but it makes resiliency more challenging for its customers: all these interconnects are also potential points of failure.
The Equinix GXI projects strong growth across the industry in 2023, with global interconnection bandwidth projected to increase by 41% compared with 2022. Overall, global interconnection bandwidth is projected to grow at a compound annual growth rate (CAGR) of 40% through 2025, when it is expected to reach nearly 28,000 terabits per second (Tbps). These figures include direct connections between enterprises and their digital business partners (such as telecommunications, cloud, edge and software as a service (SaaS) providers).
The Equinix study projects faster growth in private interconnection for enterprises than for networks operated by telecommunications companies or cloud providers. This growth in private interconnection is driven by high demand for digital services and products — many of which also require a presence with multiple cloud providers as well as integration with major SaaS companies.
The energy and utility sector is likely to see the greatest growth in private network interconnection through 2025, with a CAGR of 53%, as energy data becomes increasingly important for managing intermittent renewable energy and decarbonizing the grid. Digital services supporting sustainability efforts such as carbon accounting are likely to require additional private interconnection with SaaS providers to accurately track operational sustainability metrics.
The banking and insurance sector and the manufacturing sector are expected to see CAGRs of 49% and 45%, respectively, over the same period. These industries are particularly sensitive to errors and outages, however, so appropriate planning will be necessary.
There is a reason Equinix has been drawing attention to the benefits of interconnection for the past six years: as of Q2 2022, the company operated 435,800 cross-connects across its data centers. Its closest competitor, Digital Realty, reported just 185,000 cross-connects at its facilities in the same quarter. Equinix defines a cross-connect as a point-to-point cable link between two customers in the same retail colocation data center. For colocation companies, cross-connects not only represent core recurring revenue streams but also make their network-rich facilities more valuable as integration hubs between organizations.
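The GXI growth figures above are compound annual growth rates (CAGRs), which multiply year on year rather than adding up. The sketch below shows the arithmetic; the start value is normalized to 1.0 because the base-year bandwidth is not given here, and the four-year horizon is an assumption chosen for illustration.

```python
def compound(value: float, rate: float, years: int) -> float:
    """Value after `years` of growth at a constant annual rate (0.40 means 40%)."""
    return value * (1 + rate) ** years


def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate implied by growth from `start` to `end`."""
    return (end / start) ** (1 / years) - 1


# Growth multiples over an assumed four-year horizon for the rates quoted above.
for label, rate in [("global", 0.40), ("energy and utilities", 0.53),
                    ("banking and insurance", 0.49), ("manufacturing", 0.45)]:
    print(f"{label}: x{compound(1.0, rate, 4):.2f}")
```

A 40% CAGR sustained for four years multiplies bandwidth by about 3.84, whereas simply adding 40% four times would suggest only 2.6.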
As private interconnection increases, so too does the interdependency of digital infrastructure. Strong growth in interconnection may be responsible for the increasing proportion of networking and third-party-related outages in recent years. Uptime’s 2022 resiliency survey sheds light on the two most common causes of connectivity-related outages: misconfiguration and change management failure (reported by 43% of survey respondents); and third-party network-provider failure (43%). Asked specifically if their organization had suffered an outage caused by a problem with a third-party supplier, 39% of respondents confirmed this to be the case (see Figure 1).
When third-party IT and data center service providers do have an outage, customers are immediately affected and may seek compensation. Enterprise end users will need greater transparency and stronger service-level agreements from providers to better manage the additional points of failure and the resiliency they now outsource to third parties. Importantly, managing the added complexity of an enterprise IT architecture spanning on-premises, colocation and cloud facilities demands more organizational resources in terms of skilled staff, time and budget.
Failing that, businesses might encounter unexpected availability and reliability issues rather than any anticipated improvement. According to Uptime’s 2021 annual survey of IT and data center managers, one in eight (of those who had a view) reported that using a mix of IT venues had resulted in their organization experiencing a deterioration in service resiliency, rather than the reverse.
By: Lenny Simon, Senior Research Associate and Max Smolaks, Analyst
Reports of cloud decline have been greatly exaggerated
By: Dr. Owen Rogers, Research Director for Cloud Computing, Uptime Institute
Cloud providers have experienced unprecedented growth over the past few years. CIOs the world over, often prompted by CFOs and CEOs, have been favoring the cloud over on-premises IT for new and major projects, with the result that the largest cloud provider, Amazon Web Services (AWS), has seen revenue increase by 30% to 40% every year since 2014 (when it recorded an 80% jump in turnover). Microsoft Azure and Google have reported similar numbers in recent times.
But there are signs of a slowdown:
Why this slowdown in cloud growth?
The global macroeconomic environment — specifically, high energy costs together with inflation — is making organizations more cautious about spending money. Cloud development projects are no different from many others and are likely to be postponed or deprioritized due to rising costs, skill shortages and global uncertainty.
Some moves to the cloud may have been indefinitely deferred. Public cloud is not always cheaper than on-premises implementations, and many organizations may have concluded that migration is just not worthwhile in light of other financial pressures.
For those organizations that have already built cloud-based applications it is neither feasible nor wise to turn off applications or resources to save money: these organizations are, instead, spending more time examining and optimizing their costs.
Cutting cloud costs, not consumption
Cloud providers’ top-line revenue figures suggest customers are successfully reducing their cloud costs. How are they doing this?
Optimizing cloud expenditure involves two key activities: first, eliminating waste (such as orphaned resources and poorly sized virtual machines); and second, more cost-effective procurement, through alternative pricing models such as consistent-usage commitments or spot instances — both of which, crucially, reduce expenditure without impacting application performance.
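The two activities described above can be sketched as simple rules. The data structure, the 40% peak-CPU rightsizing threshold and the 30% commitment discount below are hypothetical; real cost-management tools weigh more signals (memory, network, usage history), and spot pricing is not modeled.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    peak_cpu_pct: float  # highest CPU utilization seen over the review period
    attached: bool       # False if the resource no longer serves any application


def find_waste(vms: list[VM], rightsize_threshold: float = 40.0) -> dict[str, list[str]]:
    """Split VMs into orphaned resources and rightsizing candidates."""
    orphaned = [vm.name for vm in vms if not vm.attached]
    oversized = [vm.name for vm in vms
                 if vm.attached and vm.peak_cpu_pct < rightsize_threshold]
    return {"orphaned": orphaned, "oversized": oversized}


def commitment_pays_off(discount: float, expected_utilization: float) -> bool:
    """Compare a usage commitment with on-demand pricing.

    The commitment bills every hour at (1 - discount) times the on-demand rate,
    used or not; on-demand bills only the hours used. The commitment is cheaper
    once expected utilization exceeds 1 - discount.
    """
    return expected_utilization > 1 - discount


fleet = [VM("web-1", 85.0, True), VM("batch-old", 0.0, False), VM("db-2", 12.0, True)]
print(find_waste(fleet))               # {'orphaned': ['batch-old'], 'oversized': ['db-2']}
print(commitment_pays_off(0.30, 0.9))  # True: busy 90% of the time
print(commitment_pays_off(0.30, 0.5))  # False: idle half the time
```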
Hyperscaler cloud providers, which are more interested in building longer-term relationships than in deriving higher gross margins in the short term, offer tools to help users reduce expenditure. These tools have improved significantly over the past few years.
Many organizations have now reached a level of cloud use at which the potential savings make it worthwhile to invest in optimization using these tools. One factor driving this is higher cloud expenditure, in part an ongoing consequence of the pandemic, during which businesses retooled IT to survive rather than focusing on cutting IT costs.
It should, perhaps, have been anticipated that customers would, at some point, start using these tools to their own advantage — current pressures on other costs having made cutting IT expenditure more critical than before.
Will cloud prices rise?
Cloud providers’ overriding objective of winning and keeping customers over the long term explains why hyperscalers are likely to try and avoid increasing their prices for the foreseeable future. Providers want to maintain good relationships with their customers so that they are the de facto provider of choice for new projects and developments: price hikes would damage the customer trust they’ve spent so long cultivating.
AWS’s Q3 2022 operating margin was 26%, some 3 percentage points down on Q2. This drop in margin could be attributed to rising energy costs, which AWS states almost doubled over the same period (hedging and long-term purchase agreements notwithstanding). Microsoft has reported it will face additional energy costs of $800 million this financial year. AWS and Microsoft could have raised prices to offset rising energy costs and maintain their profit margins. So far, however, they have chosen not to, rather than risk damaging customers’ trust.
How will this play out? Financial pressures may make organizations more careful about cloud spending. Projects may be subject to more stringent justification and approval, and some migrations are likely to be delayed (or even cancelled) for now. As revenue grows in absolute terms, high percentage gains become increasingly difficult to achieve. Nonetheless, while the days of 40% revenue jumps may be over, this recent slowdown is unlikely to be the start of a rapid downward spiral. AWS’s Q3 2022 revenue growth may have shrunk in percentage terms, but in absolute terms it still exceeded $4 billion.
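The point that percentage growth shrinks as revenue grows can be checked with a few lines of arithmetic. The sketch below uses hypothetical numbers, not AWS figures: revenue starts at 10 units and gains the same 4 units every year.

```python
def percentage_growth(start_revenue: float, absolute_increase: float,
                      periods: int) -> list[float]:
    """Year-on-year growth (%) when revenue rises by the same absolute amount each year."""
    rates = []
    revenue = start_revenue
    for _ in range(periods):
        rates.append(100 * absolute_increase / revenue)
        revenue += absolute_increase
    return rates


# Hypothetical: revenue starts at 10 (any currency unit) and adds 4 every year.
for year, rate in enumerate(percentage_growth(10.0, 4.0, 5), start=1):
    print(f"year {year}: +{rate:.1f}%")
```

Growth falls from 40% in the first year to about 15% by the fifth, even though the absolute increase never changes.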
Applications architected for the cloud should be automatically scalable, and capable of meeting customers’ requirements without their having to spend more than necessary. Cloud applications allow organizations to adapt their business models and / or drive innovation — which may be one of the reasons many have been able to survive (and, in some cases, thrive) during challenging times. In a sense, the decline in growth that the cloud companies have suffered recently demonstrates that the cloud model is working exactly as intended.
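The idea that a well-architected cloud application spends less when demand falls can be sketched with a generic proportional scaling rule. This is not any specific provider's autoscaling algorithm. The per-instance capacity, 70% utilization target, instance limits and hourly price are all hypothetical.

```python
import math


def desired_instances(demand_rps: float, capacity_per_instance_rps: float,
                      target_utilization: float = 0.7,
                      min_instances: int = 2, max_instances: int = 100) -> int:
    """Instances needed to serve demand while keeping each near target utilization."""
    needed = math.ceil(demand_rps / (capacity_per_instance_rps * target_utilization))
    # Clamp to a floor (for availability) and a ceiling (for budget control).
    return max(min_instances, min(max_instances, needed))


HOURLY_RATE = 0.20  # hypothetical price per instance-hour

for demand in (7000, 3500, 50):
    n = desired_instances(demand, capacity_per_instance_rps=100)
    print(f"{demand} req/s -> {n} instances, ${n * HOURLY_RATE:.2f}/hour")
```

Halving demand halves the fleet and the bill. From the provider's side, that same mechanism shows up as slower revenue growth.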
The hyperscaler cloud providers are likely to continue to expand globally and create new products and services. Enterprise customers, in turn, are likely to continue to find cloud services competitive in comparison with colocation-based or on-premises alternatives. Much of the cloud’s value comes from a perception that it offers “unlimited” resources. If providers don’t increase capacity, they risk failing to meet customers’ expectations when required, damaging their credibility and customer relationships. AWS, Google and Microsoft continue to compete for market share worldwide, and reducing investment now could risk future profitability.
At the time of writing, AWS had 13,000 vacancies advertised on its website. This is a sign that the cloud sector is certainly not in retreat and that future growth is likely to be strong.
Major data center fire highlights criticality of IT services
By: Daniel Bizo, Research Director, Uptime Institute Intelligence
Uptime Institute’s outages database suggests data center fires are infrequent, and rarely have a significant impact on operations. Uptime has identified 14 publicly reported, high-profile data center outages caused by fire or fire suppression systems since 2020. Fires are not becoming more frequent relative to IT load or the number of data centers. If uncontained, however, they can be disastrous for facilities, and the resulting outages can be ruinous for the business.
SK Group, South Korea’s second-largest conglomerate, is the latest high-profile organization to suffer a major data center fire. The fire broke out on October 15, 2022, at a multistory colocation facility operated by its subsidiary SK Inc. C&C in Pangyo (just south of Seoul). According to police reports, the fire started in a battery room before spreading quickly to the rest of the building. It took firefighters around eight hours to bring the blaze under control.
While there were no reported injuries, this incident could prove to be the largest data center outage caused by fire to date. It is a textbook example of how seemingly minor incidents can escalate to wreak havoc through cascading interdependencies in IT services.
The incident took tens of thousands of servers offline, including not only SK Group’s own systems but also the IT infrastructure running South Korea’s most popular messaging and single sign-on platform, KakaoTalk. The outage disrupted its integrated mobile payment system, transport app, gaming platform and music service — all of which are used by millions. The outage also affected domestic cloud giant Naver (the “Google of South Korea”) which reported disruption to its online search, shopping, media and blogging services.
While SK Group has yet to disclose the root cause of the fire, Kakao, the company behind KakaoTalk, has pointed to the lithium-ion (Li-ion) batteries deployed at the facility, which were manufactured by another SK Group subsidiary, SK On. In response, SK Group has released what it claims are records from its battery management system (BMS) showing no deviation from normal operations prior to the incident. Some local media reports contradict this, however, claiming the BMS did in fact generate multiple warnings. Only a thorough investigation will settle these claims. In the meantime, both sides are reported to be “lawyering up.”
The fallout from the outage extends beyond service disruptions and lost revenue. It has prompted a statement from the country’s president, Yoon Suk-yeol, who has promised a thorough investigation into the causes of the fire and the extent of the damage it caused. So far, the incident has led to a police raid on SK Inc. C&C headquarters, the resignation of Kakao co-CEO Whon Namkoong, and the establishment of a national task force for disaster prevention involving military officials and the national intelligence agency. Multiple class-action lawsuits against Kakao are in progress. They are mainly based on claims that the company prioritized short-term profits over investment in more resilient IT infrastructure.
The South Korean government has announced a raft of measures aimed at preventing large-scale digital service failures. All large data centers will now be subject to disaster management procedures defined by the government, including regular inspections and safety drills. Longer term, the country’s Ministry of Science and ICT will push for the development of battery technologies that pose a lower fire risk. This is a matter of national interest for South Korea, which is home to some of the world’s largest Li-ion cell manufacturers, including Samsung SDI and LG Chem in addition to SK On.
The fire in South Korea will inevitably draw comparisons with the data center fire that brought down the OVHcloud Strasbourg facility in 2021. That fire affected some 65,000 customers, many of whom lost their data in the blaze (see Learning from the OVHcloud data center fire). As in Pangyo, it was thought to have involved uninterruptible power supply (UPS) systems. According to the French Bureau of Investigation and Analysis on Industrial Risks (BEA-RI), the lack of an automatic fire extinguishing system, a delayed electrical cutoff and the building’s design all contributed to the spread of the blaze.
One question that remains open is the financial cost of this outage to SK Group, Kakao and Naver. The fire at the OVHcloud Strasbourg facility was estimated to cost the operator more than €105 million, with less than half of this covered by insurance. The cost of the fire in Pangyo is likely to run into tens (if not hundreds) of millions of dollars. This should serve as a timely reminder of the importance of fire suppression, particularly in battery rooms.
Li-ion batteries in mission-critical applications — risk creep?
Li-ion batteries present a greater fire risk than valve-regulated lead-acid batteries, regardless of their specific chemistries and construction. This position is endorsed by the US National Fire Protection Association (NFPA) and others. When Li-ion cells break down they release flammable gases as well as oxygen, which can result in a major thermal-runaway event. In such an event, the fire spreads uncontrollably between cells, across battery packs and potentially even between cabinets if these are spaced too closely. The resulting fires are notoriously difficult to suppress.
Until now, many operators have found the risk-reward profile of Li-ion batteries (in terms of their smaller footprint and longer lifespan) to be acceptable. Uptime surveys show major UPS vendors reporting strong uptake of Li-ion batteries in data center and industrial applications: some vendors report shipping more than half their major three-phase UPS systems with Li-ion battery strings. According to the Uptime Institute Global Data Center Survey 2021, nearly half of operators have adopted this technology for their centralized UPS plants, up from about a quarter three years earlier. The Uptime Institute Global Data Center Survey 2022 found Li-ion adoption to be increasing still further (see Figure 1).
The incident at the SK Inc. C&C facility highlights the importance of selecting appropriate fire suppression systems and of fire containment as part of resiliency. Most local regulation governing fire prevention and mitigation concentrates (rightly) on securing people’s safety rather than on protecting assets. Data center operators, however, have other critically important issues to consider, including equipment protection, operational continuity, disaster recovery and mean time to recovery.
While gaseous (or clean agent) suppression is effective at slowing down the spread of a fire in the early stages of Li-ion cell failure (when coupled with early detection), it is arguably less suitable for handling a major thermal-runaway event. The cooling effects of water and foam mean these are likely to perform better; double-interlock pre-action sprinklers also limit the spread. Placing battery cabinets farther apart can help prevent or limit the spread of a major fire. Dividing battery rooms into fire-resistant compartments (a measure mandated by Uptime Institute’s Tier IV resiliency requirements) can further decrease the risk of facility-wide outages.
Such extensive fire prevention measures could, however, compromise the benefits of Li-ion batteries in terms of their higher volumetric energy density, lower cooling needs and overall advantage in lifespan costs (particularly where space is at a premium).
Advances in Li-ion chemistries and cell assembly will help address operational safety concerns. Lithium iron phosphate, with its higher ignition point and no release of oxygen during decomposition, is a case in point. Longer term, inherently safer chemistries, such as sodium-ion and nickel-zinc, will probably offer a more lasting solution to the safety (and sustainability) conundrum around Li-ion. Until then, the growing volume of Li-ion batteries in data centers means the potential for violent fires can only grow, with potentially dire financial consequences.
By: Max Smolaks, Analyst, Uptime Institute Intelligence and Daniel Bizo, Research Director, Uptime Institute Intelligence
Tweak to AWS Outposts reflects demand for greater cloud autonomy
By: Dr. Owen Rogers, Research Director for Cloud Computing, Uptime Institute
Amazon Web Services (AWS) has made a minor change to its private-cloud appliance, AWS Outposts, that could significantly improve resiliency. The cloud provider has enabled local access to cloud administration, reducing the appliance’s reliance on the public cloud. If the network between the public cloud and the user’s data center fails, the private-cloud container platform can still be configured and maintained.
Many public-cloud providers have extended their offerings so that their services can be delivered from a data center of the user’s choice. Services are typically billed in the same way as they are via the public cloud, and accessed through the same portal and software interfaces. They are, however, delivered from hardware and software hosted in the user’s own facility. Such services are in demand from customers seeking to meet compliance or data protection requirements, or to improve the end-user experience through lower latency.
In one business model, the cloud provider ships a server-storage private-cloud appliance to an organization’s data center. The organization manages the data center. The public-cloud provider is responsible for the hardware and middleware that delivers the cloud functionality.
The term “private cloud” describes a cloud platform where the user has access to elements of the platform not usually accessible in the public cloud (such as the data center facility, hardware and middleware). These appliances are a particular type of private cloud, not designed to be operated independently of the public cloud. They are best thought of as extensions of the public cloud to the on-premises data center (or colocation facility), since administration and software maintenance are performed via the public cloud.
As the public and private cloud use the same platform and application programming interfaces (APIs), applications can be built across the organization’s and the cloud provider’s data centers, and the platform can be managed as one. For more information on private-cloud appliances, see the Uptime Institute Intelligence report Cloud scalability and resiliency from first principles.
Until now, the resilience of this architecture has not been assured, because the application still relies on the cloud provider to run some services, such as the management interface. The public-cloud provider controls the interface for interacting with the user’s on-premises cloud (the “control plane”). If that interface goes down, so too does the ability to administer the on-premises cloud.
Ironically, it is precisely during an outage that an administrator is most likely to want to make such configuration changes, for example to reserve capacity for mission-critical workloads or to reprioritize applications to handle the loss of public-cloud capacity. If an AWS Outposts appliance were being used in a factory to support manufacturing machinery, for instance, the inability to configure local capabilities during a network failure could significantly affect production.
For this reason, AWS’s announcement that Amazon Elastic Kubernetes Service (Amazon EKS) can be managed locally on AWS Outposts is important. Kubernetes is a platform for orchestrating containers. The new capability allows users to configure API endpoints on the AWS Outposts appliance, meaning the container configuration can be changed over the local network without connecting to the public cloud.
In practical terms, this addition makes AWS Outposts more resilient to outages because it can function in the event of a connectivity failure between the cloud provider and the data center. AWS Outposts is now far more feasible as a disaster-recovery or failover location, and more appropriate for edge locations, where connectivity might be less assured.
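The effect of moving the control plane on site can be shown as a simple reachability check. Configuration changes are possible only if the administrator can reach the control plane and the control plane can reach the appliance. The sketch below is a hypothetical model, not an AWS API:

- The node names and the four-link topology are invented for illustration.
- Only network reachability is modeled.

```python
from collections import deque

# Hypothetical topology: an on-site administrator reaches the Outposts
# appliance over the local network, and the public-cloud region over a WAN link.
LINKS = [
    ("admin", "lan"),
    ("lan", "outpost"),
    ("lan", "wan"),
    ("wan", "region"),
]


def reachable(src, dst, failed=frozenset()):
    """Breadth-first search over undirected links, ignoring any failed link."""
    graph = {}
    for a, b in LINKS:
        if (a, b) in failed or (b, a) in failed:
            continue
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def can_administer(control_plane_host, failed=frozenset()):
    """Config changes need admin -> control plane and control plane -> appliance."""
    return (reachable("admin", control_plane_host, failed)
            and reachable(control_plane_host, "outpost", failed))


wan_down = {("lan", "wan")}
print(can_administer("region"))             # True: all links are up
print(can_administer("region", wan_down))   # False: control plane cut off
print(can_administer("outpost", wan_down))  # True: control plane is local
```

With the control plane in the public-cloud region, the WAN link is a single point of failure for administration. Hosting the control plane on the appliance removes that dependency for local changes.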
The most important aspect of this development, however, is that it indicates AWS — the largest cloud provider — is perhaps acknowledging that users don’t just want an extension of the public cloud to their own facilities. Although many organizations are pursuing a hybrid-cloud approach, where public and private cloud platforms can work together, they don’t want to sacrifice the autonomy of each of those environments.
Organizations want venues to work independently of each other if required, avoiding single points of failure. To address this desire, other AWS Outposts services may be made locally configurable over time as users demand autonomy and greater control over their cloud applications.