Data centers weather solar storms

The US Space Weather Prediction Center (SWPC) issued multiple geomagnetic storm watches throughout August and September 2022. Geomagnetic storms occur when solar storms interact with the Earth’s atmosphere and magnetic field, risking disruption to satellites, radio communications and the power grid. The strongest predicted storm earned the agency’s rating of G3 (“strong”) on a five-level classification system of geomagnetic activity — intense enough to require voltage corrections on the grid in some regions. In comparison, an “extreme” G5 geomagnetic storm could overheat or destroy high-voltage transformers, leading to widespread and lasting power outages.

A geomagnetic storm, also known as a geomagnetic disturbance or geomagnetic EMP, is one type of electromagnetic pulse (EMP) — a rapid discharge of electromagnetic energy. Its secondary effects on power consumers expose data centers to the risk of equipment disruption and damage. The three types of EMP (geomagnetic, nuclear and intentional) vary in their physical characteristics, but each can endanger data centers (see the Uptime Institute report Electromagnetic pulse and its risk to data centers). However, many operators do not include any type of EMP in their risk assessments and have not implemented protective measures.

The SWPC monitors solar events and can provide hours’ or days’ notice of events likely to impact Earth. Long-term prediction of individual geomagnetic EMP events is not currently possible. Solar events that could cause geomagnetic EMP occur frequently but chaotically; they are often directed away from Earth, and astronomers can only predict them on the basis of probability. On average, a G5 (“extreme”) event reaches Earth about once every 25 years. Such an event caused a nine-hour outage of the Hydro-Québec transmission system in 1989, and an extreme geomagnetic EMP damaged 12 transformers in South Africa in 2003. Before the advent of today’s power grid, the 1859 Carrington Event (the most intense geomagnetic storm in history) caused sparking and fires at multiple telegraph stations.

Due to its low frequency, geomagnetic EMP acts most strongly on electrical conductors running miles in length, such as high-voltage transmission lines in the power grid. The induced current behaves similarly to direct current (DC) in a system designed for alternating current (AC). Most storms, such as those in late summer 2022, are not intense enough to cause power outages. Grid operators can compensate for the induced currents of a smaller EMP event and continue delivering power. Data centers may experience problems with power quality, however — specifically, harmonic distortion (defects in AC voltage waveforms). As power is transmitted from high-voltage lines to utility customers, it passes through a succession of transformers — each of which steps down the voltage but intensifies harmonics. Harmonics are at their greatest intensity once the power reaches the end user, in this case a data center.
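
For context, harmonic distortion in an AC waveform is commonly quantified as total harmonic distortion (THD). This is a standard engineering definition, included here as background rather than drawn from the report above: it compares the combined magnitude of the harmonic components with that of the fundamental (50 Hz or 60 Hz) component.

$$\mathrm{THD} = \frac{\sqrt{V_2^{2} + V_3^{2} + V_4^{2} + \cdots}}{V_1}$$

where $V_1$ is the root-mean-square (RMS) voltage of the fundamental and $V_n$ is the RMS voltage of the nth harmonic. The higher the THD delivered to the facility, the more stress is placed on the UPS and downstream equipment by the distorted waveform.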

Most data center uninterruptible power supply (UPS) systems are designed to accommodate some harmonics and protect downstream equipment, but geomagnetic EMP events can overwhelm these built-in protections — potentially damaging the UPS or other equipment. The effects of harmonics inside a data center can include inefficient UPS operation, UPS rectifier damage, tripped circuit breakers, overheated wiring, malfunctioning motors in mechanical equipment and, ultimately, physical damage to IT equipment.

Some grid operators are already installing protective devices to guard their infrastructure against geomagnetic EMP, sparing their customers the secondary effects of outages and harmonics.

Those data center operators that include EMP risk as part of an overall risk assessment can improve their infrastructure resiliency by implementing their own EMP safeguards. The primary threats to individual data centers from geomagnetic EMP — power outages and harmonics — act via the power grid. Operators can manage this risk by disconnecting from the grid and operating on backup power. Increased on-site fuel / energy storage may be appropriate in preparing for smaller geomagnetic storms.

In the event of a large geomagnetic storm, the entire population of the affected area will be competing for fuel and other supplies, and prolonged power outages are likely to outlast data centers’ fuel-storage capacity. These same conditions are likely to affect users of the applications running in a data center, leaving them unable to connect. Relying on geographically dispersed multisite resiliency (stretching over thousands of miles) is likely to offer more effective protection. More localized EMP effects — from, for example, a small geomagnetic EMP or an intentional EMP — will not affect more distant locations, so there may be a stronger economic argument for maintaining availability through such events.

Awareness of EMP as a risk to data centers is increasing. Best practice in EMP protection is not well established in the data center industry yet, but periodic risk assessments will enable operators to incorporate updated information and guidelines as these become available.

Sacrifice speed to cut cloud carbon and costs

New findings from research by Uptime Institute Intelligence reveal that organizations can cut both their cloud carbon emissions and costs by moving workloads to different regions. The trade-off with this migration, however, is an increase in latency.

Cloud users choose regions based primarily on two factors:

  1. Locating the application close to end users improves the user experience by delivering content faster. Some applications, such as interactive gaming, require very low latency, which is driving cloud providers to invest in new edge locations close to end users. Not all applications, however, need such a quick response, and end users can often tolerate a slight increase in latency without material impact on their experience.
  2. Offering cloud-based services in a country usually has data protection implications, which can be partly addressed by keeping data within the same jurisdiction as the end users.

If there are no legal reasons to keep data in a jurisdiction, cloud users can often migrate their workloads to a nearby region and gain reductions in their carbon footprint; this migration can also result in lower costs. Uptime Intelligence collated information from Microsoft Azure, Amazon Web Services (AWS), Google Cloud, the Cloud Carbon Footprint project (which sources data from carbonfootprint.com, the European Environment Agency and the US Environmental Protection Agency) and CloudPing to produce the Cloud Carbon Explorer, which includes three interactive maps.

The maps show the potential cross-region workload migration paths for AWS, Google Cloud and Microsoft Azure. These migration paths can reduce carbon footprint without significantly affecting user experience and, in some cases, can also reduce cost. The tool lets users explore suitable compromises between latency, cost and carbon for each application.

We found 38 migration paths between AWS regions that provide both carbon and cost reductions with a latency impact of less than 100 milliseconds (ms). For Google Cloud, there were 39 migration paths and for Microsoft Azure there were 15, all with a latency impact of <100 ms.

Figure 1 shows the tool’s analysis of possible workload migrations from AWS’s Frankfurt data center. The Cloud Carbon Footprint project estimates AWS’s Frankfurt data center to have relatively high grid carbon emissions compared with the rest of Europe. Migrations to Stockholm, Milan, Paris or London all provide significant reductions in carbon and cost, with a maximum increase in latency of 30 ms.

Figure 1 Cloud Carbon Explorer: migration paths from AWS’s Frankfurt region

The bubble size in Figure 1 represents the grid carbon emissions, and the thickness of the line represents the latency impact (wider equals slower). For example, clicking on the migration path pointer from Frankfurt to Stockholm shows a potential 98% cut in grid emissions. The color of the line indicates the impact on cost, with green representing a cost saving (yellow lines, not shown in this example, represent cost increases of less than 5%).

Users can also make carbon and cost reductions when using Microsoft Azure or Google Cloud. For instance, the Cloud Carbon Explorer shows that by moving a virtual machine from Google’s Hong Kong data center to Taiwan, grid emissions drop 14% and the cost of an e2-standard-2 virtual machine decreases by 17%. The trade-off is a slight increase in round-trip latency of 13 ms. In another example, Microsoft Azure users can reduce carbon and cost by migrating their workloads from Iowa (US) to Illinois (US). For a slight increase in latency of 13 ms, the cost of a D2as_v5 virtual machine drops by 12% and grid carbon emissions decrease by 17%.
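
As an illustration of how these trade-offs can be screened programmatically, the sketch below (a hypothetical helper, not the Cloud Carbon Explorer’s own code) filters candidate migrations by a latency budget and ranks them by combined carbon and cost savings, using the indicative figures quoted above.

```python
from dataclasses import dataclass

@dataclass
class Migration:
    provider: str
    source: str
    destination: str
    carbon_saving_pct: float   # reduction in grid carbon emissions
    cost_saving_pct: float     # reduction in virtual machine cost
    added_latency_ms: float    # extra round-trip latency

# Indicative figures quoted in the text above; the AWS cost saving is described
# as "significant" but not quantified, so 0 is used as a placeholder.
MIGRATIONS = [
    Migration("AWS", "Frankfurt", "Stockholm", 98, 0, 30),
    Migration("Google Cloud", "Hong Kong", "Taiwan", 14, 17, 13),
    Migration("Azure", "Iowa", "Illinois", 17, 12, 13),
]

def shortlist(migrations, latency_budget_ms=100):
    """Keep migrations within the latency budget, ranked by combined savings."""
    viable = [m for m in migrations if m.added_latency_ms <= latency_budget_ms]
    return sorted(viable,
                  key=lambda m: m.carbon_saving_pct + m.cost_saving_pct,
                  reverse=True)

for m in shortlist(MIGRATIONS, latency_budget_ms=30):
    print(f"{m.provider}: {m.source} -> {m.destination} "
          f"({m.carbon_saving_pct}% carbon, {m.cost_saving_pct}% cost, "
          f"+{m.added_latency_ms} ms)")
```

In practice, the latency budget and the relative weighting of carbon against cost will differ per application; the point is simply that this screening step is mechanical once the per-region figures are known.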

The Cloud Carbon Explorer provides indicative carbon, cost and latency figures based on various assumptions. A lack of data is a significant problem for users in calculating their cloud carbon footprints. This difficulty in acquiring the appropriate data is the reason Uptime Intelligence has used third-party sources and was not able to evaluate all regions for all cloud providers. In addition, the individual characteristics of specific data centers (such as power usage effectiveness variations) have not been considered due to a lack of comprehensive information. Although the analysis is imperfect, it does demonstrate that there are suitable trade-offs to be made.

Organizations should commit to cutting carbon emissions, partly for regulatory reasons and partly because sustainability is high on consumer and corporate agendas. However, moving regions isn’t always simple, and there are legal and latency repercussions to consider. As discussed in our report How resiliency drives cloud carbon emissions, before users consider migrating workloads, they should investigate whether the application can be tuned to reduce carbon.

The next step for users to reduce their carbon (and cost) involves the choice of data center. Users need to balance the benefits of carbon reduction (sometimes coupled with a cost reduction) against the impact of a latency increase.

Uptime Intelligence believes the third-party data sources used in the Cloud Carbon Explorer are reliable and fair, but we have not audited them in depth. Users should use the interactive maps to evaluate feasible migrations before performing their own more detailed assessments. Our analysis suggests that these investigations are worthwhile, given the potential savings in both carbon and costs.

New server leasing models promise cloud-like flexibility

IT hardware vendors, such as Dell and Hewlett Packard Enterprise (HPE), are pivoting their revenue models away from product sales toward service-based subscriptions. The objective is to make hardware appear more flexible and cloud-like so buyers can retain servers in their choice of data center while receiving the critical benefit of the cloud: scalability.

Much of the value of the public cloud is in its pay-as-you-go consumption model. Scalability is an application’s ability to consume more or fewer resources as needed (see Cloud scalability and resiliency from first principles). Scalability is only possible if the user is billed in arrears based on resources consumed. This model allows IT to address changing business needs without having to provision capacity far in advance. The provider, not the user, is primarily responsible for infrastructure capacity planning.

To provide this flexibility, public cloud providers provision far more capacity than is likely to be needed. This excess allows users to consume at will without needing to reserve capacity in advance. The aggregation of demand from thousands of users allows hyperscaler cloud providers to overprovision capacity and still make a profit.
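
The benefit of aggregation can be illustrated with a toy simulation (all figures are arbitrary and purely for illustration): as independent customer demands are pooled, the combined peak grows more slowly than the combined average, so the relative capacity buffer the provider must hold shrinks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-customer demand: mean 10 units, standard deviation 5
MEAN, STD, SAMPLES = 10, 5, 10_000

def relative_buffer(n_customers):
    """Capacity above average demand needed to cover the 99th-percentile peak,
    expressed as a share of average demand."""
    demand = rng.normal(MEAN, STD, size=(SAMPLES, n_customers)).clip(min=0).sum(axis=1)
    return (np.percentile(demand, 99) - demand.mean()) / demand.mean()

for n in (1, 10, 1_000):
    print(f"{n:>5} customers: ~{relative_buffer(n):.0%} capacity buffer needed")
```

With a single customer, the buffer needed exceeds the average demand itself; with a thousand independent customers it falls to a few percent, which is what allows a hyperscaler to overprovision and still make a profit.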

Server utilization is a critical metric in the cloud business model. The hyperscale provider faces a balancing act: too many servers sitting unused equates to waste, and waste equates to sunk costs. Too few servers and the principal value of the cloud, namely the ability for users to scale spontaneously, falls apart.

Organizations using physical servers in their own or colocation facilities (including for reasons of compliance, data sovereignty or performance) face the same capacity-management problem. But unlike hyperscalers, they have neither the benefit of demand aggregation at scale nor the budget to buy and host many empty servers on a just-in-case basis.

Server vendors are changing their business models to allow the leasing of hardware on a more flexible basis. In this hardware as a service or consumption-based model, the buyer commits to a monthly minimum capacity of resources, usually expressed in central processing unit (CPU) cores, memory and storage. The provider supplies a server (to the user’s data center or colocation facility) that delivers at a minimum this capacity, but also builds in a buffer of capacity — usually 10% to 20% above the user’s requirement. The organization is billed at the end of each month for its minimum commitment, plus any additional capacity consumed.
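
A minimal sketch of how a monthly bill might be calculated under such a model follows; the function name, prices and buffer size are hypothetical, not any specific vendor’s terms.

```python
def monthly_bill(committed_cores, used_cores, price_per_core=25.0, buffer_pct=0.15):
    """Consumption-based billing: pay for the committed minimum, plus any usage
    above it, up to the installed capacity (commitment plus buffer)."""
    installed_capacity = committed_cores * (1 + buffer_pct)
    if used_cores > installed_capacity:
        # Demand has outgrown the delivered hardware: a further server would
        # need to be ordered, and performance may suffer in the meantime.
        print("Warning: usage exceeds installed capacity; order more hardware")
        used_cores = installed_capacity
    billable_cores = max(committed_cores, used_cores)
    return billable_cores * price_per_core

# Example: a 64-core commitment at a notional $25 per core-month
print(monthly_bill(64, 58))   # under the commitment: pay the 64-core minimum
print(monthly_bill(64, 70))   # over the commitment: pay for the 70 cores used
```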

The vendor provides a software portal that tracks consumption and capacity. If consumption regularly breaches the user’s committed capacity, the user is alerted and a new server can be ordered and delivered to the organization easily. Server maintenance is included.

Leasing hardware in this way isn’t necessarily cheaper than purchasing upfront, but the key benefits for the user include being able to switch from capital expense to operating expense, plus a lower capacity-management burden. From the vendor’s point of view, as-a-service models provide a defense against the pull of the public cloud. They also help build a more relationship-led business by proactively responding to customers’ capacity issues.

HPE pioneered this model with GreenLake. It is pivoting to a services-first approach, away from its established reputation as a hardware manufacturer. HPE GreenLake includes computing and storage hardware plus integrations with software such as cloud stacks, container platforms and multicloud portals. The GreenLake portfolio now includes more than 70 products.

Cloud integration is a critical use case. HPE provides the hardware, and software providers such as VMware, Microsoft, Red Hat and Google supply the cloud software, enabling pay-as-you-go private clouds. Public cloud integrations also allow organizations to build hybrid clouds that run across public and private venues using the same platform and are charged on a consumption basis.

This new approach appears to be working. In Q2 2022, the annualized revenue run rate for HPE’s as-a-service offerings reached $820 million, and the customer base grew to more than 1,600. The company reports that Q2 2022 orders were up by 107% compared with Q2 2021.

Dell, too, is pivoting to a services-first approach with its APEX portfolio. As with HPE, APEX will offer hardware owned, managed and maintained by Dell and billed using a subscription model. APEX launched in 2021 and its success is not yet evident, but Dell sees the model as mission-critical to its future. Other major hardware vendors, including Cisco, Hitachi and Lenovo, have also introduced as-a-service models (called Cisco Plus, Hitachi EverFlex and Lenovo TruScale).

Organizations should consider consumption-based servers. There is no (or little) capital investment, the provider takes responsibility for some aspects of capacity planning and maintenance, and physical hardware can be consumed flexibly in a cloud-like model. However, capacity isn’t guaranteed: if unexpected resource demands overrun the capacity buffer, there may be performance issues while more servers are delivered to the site.

Is a consumption-based server cheaper than a purchased one? It depends on several factors, such as the contract term, the server model, the commitment and the utilization. For example, if the user over-commits to a minimum capacity, it may pay more than if it had bought a smaller server upfront. Furthermore, the vendor still owns the server at the end of the term: there is no resale or trade-in value, which affects the buyer’s return on investment.

Hardware leasing models could be good news for colocation operators because they remove the capital requirement for customers to run servers in their facilities. The model also opens new revenue streams for managed services providers: could partnerships and packages that unify pay-as-you-go data center capacity, hardware and software attract capital-constrained customers?

Quantum Computing

Quantum computing is not a panacea yet

Quantum computing promises a revolution in scientific discovery. Quantum computing’s main advantage over digital computing is in quickly solving highly complex problems that require significant time or resources to process. Currently, solutions to many complex problems can be estimated using supercomputers or pools of servers over days or weeks. Other problems are so complex that they cannot be solved in human timescales using today’s technology. Although research is progressing, there is no guarantee that a practically useful quantum computer can ever be realized.

The impact of a fault-tolerant quantum computer on science and engineering would be vast. For example, it could help reduce global carbon emissions by finding a better process for producing ammonia than the energy-intensive Haber-Bosch process used today.

For companies involved in quantum computing, such as IBM, Microsoft, Honeywell, Amazon, D-Wave, Intel and IQM, quantum computing presents a significant new revenue stream. But it threatens to displace existing applications that use supercomputers or server pools to estimate solutions to tough problems.

Data centers, physical servers and applications used for complex processing tasks could be most vulnerable to being displaced by quantum computing, particularly modeling applications in fields such as finance, biochemistry and engineering. For many applications, however, the cost and complexity of quantum computing will not guarantee a return on investment. Most quantum computer prototypes today require operating temperatures near absolute zero, huge upfront investments (although many are now accessible remotely as cloud services) and very specialized skills — a significant and costly undertaking.

So, what is quantum computing? In a digital computer, chips use transistors that are either switched on or off, representing 1s and 0s. Digital circuits use this mechanism to perform logic and memory functions, ultimately enabling complex IT systems and software. Electrical signals, typically encoding binary information, represent the flow of digital information.

Digital computers are deterministic systems, meaning they proceed with calculations step by step and can only mimic randomness in mathematics and the chaotic nature of the real world. They have another limitation: their finite numerical precision means they can only approximate when modeling nature. Even with high clock speeds and the massive parallelism of supercomputers’ processing resources, there are many complex problems classical computers cannot practically attack: there are too many steps to go through, or too much simplification is required to be practical. Financial modeling, route optimization and protein folding in molecular biology are just a few examples of areas where the limits of classical computing hinder progress.

The quantum in quantum computing refers to tiny particles on the very limit of measurement. At the quantum scale, particles act spontaneously and randomly. Particles only assume a known state (i.e., their movement and position) once they are measured. A quantum particle “decides” what it is doing only when observed. Consequently, the results of observations are probabilistic. Unlike the clear, definite states of traditional computers, quantum bits, called qubits in a quantum computer, take multiple probabilistic states (or superpositions) between 0 and 1. The unobserved particle has essentially yet to decide its definite state.

Rather than looking through every possible combination of a problem, quantum computers use the inherent uncertainty in quantum particles and apply physics to solve the problem. Imagine we are trying to find the correct combination from a sequence of 0s and 1s that lies between 000 and 111. We could either go through every possible combination in series using a digital computer, or we could create three qubits and put them into a superposition state where we don’t know what they are doing. We can then perform operations on these qubits using lasers or electromagnetic fields.

Crucially, we manipulate these particles without looking at them — they remain in superposition. Once we have performed our operations, we then measure the particle’s state. The uncertainty between 0 and 1 then “collapses” into the correct combination.

This highly simplified example may raise more questions than it answers. The concepts in play at the quantum level are not intuitive, and the math is not simple. The critical point is that a quantum computer allows operations to be performed across a vast range of potential answers in as little as one step, whereas in a digital computer each possible solution must be investigated separately.
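
To make the three-qubit example concrete, the sketch below simulates Grover-style amplitude amplification on an ordinary digital computer. It illustrates the principle only: real quantum hardware does not work by multiplying small matrices, and this example is not taken from any particular vendor’s toolkit.

```python
import numpy as np

N_QUBITS = 3
N = 2 ** N_QUBITS            # 8 possible combinations, 000 through 111
TARGET = 0b101               # the "correct combination" we want to find

# Step 1: put the simulated qubits into a uniform superposition
state = np.full(N, 1 / np.sqrt(N))

# Step 2: the "oracle" flips the sign of the target state's amplitude
oracle = np.eye(N)
oracle[TARGET, TARGET] = -1

# Step 3: the diffusion operator reflects all amplitudes about their mean,
# amplifying the sign-flipped target amplitude
diffusion = 2 * np.full((N, N), 1 / N) - np.eye(N)

# Roughly (pi/4) * sqrt(N) repetitions are needed for a single marked item
for _ in range(int(np.pi / 4 * np.sqrt(N))):
    state = diffusion @ oracle @ state

# Step 4: "measure" -- squared amplitudes give the outcome probabilities
probabilities = state ** 2
best = int(np.argmax(probabilities))
print(f"Most likely outcome: {best:03b} (probability {probabilities[best]:.2f})")
```

After just two iterations the target combination is found with a probability above 0.9, whereas a naive classical search would test combinations one at a time.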

In the real world, however, building a quantum computer large enough to be practical has so far proved elusive. The problem is that controlling and protecting quantum states at such small scales is extremely difficult. Any interference, such as a stray particle bumping into a qubit, can accidentally change its state and cause a loss of coherence. Errors that prevent coherence from lasting long enough to perform useful and correct calculations are the biggest block to a viable quantum computer. The more qubits, the more difficult these errors are to control. Fault-tolerant quantum computers with hundreds, or even thousands, of useful qubits are the holy grail of quantum computing research.

Even though there is no guarantee that large-scale quantum computing is viable, scientists, government sponsors and investors either want to believe it is possible or don’t want to risk missing out on its possibilities. A more telling indicator is that some governments are reportedly collecting encrypted data in the hope of cracking it with a quantum computer later. As such, quantum-proof encryption is increasingly being considered for high-security use cases. But even quantum pioneers say we’re at least a decade away from a practical quantum computer.

Even if quantum computing is realized technically, it will probably not be as disruptive as some have forecast. Practicalities, such as accessing a quantum computer and programming it, are far from simple, and the cost will be prohibitive for many applications. As a result, quantum computers will not displace digital computers, including supercomputers; instead, they will augment existing computing infrastructure as a new accelerated computing platform.

Alternative clouds are vulnerable to demanding buyers

Although big-name hyperscalers such as Amazon, Google and Microsoft dominate the cloud arena, other companies also believe they have a role to play. Vultr, OVHcloud, Linode, DigitalOcean, Bluehost and Scaleway, for example, don’t offer huge portfolios of cutting-edge products; instead, they focus on simplicity and low cost for smaller businesses with relatively straightforward requirements.

The hallmark of the cloud is the ability to provision resources remotely and pay by credit card. But these resources are no longer just virtual machines — hyperscalers have evolved to offer vast portfolios, from software containers to artificial intelligence. Microsoft Azure claims to offer 200 products and services, for example, with each product having different variations and sizes in different regions and at different prices. Hyperscaler cloud providers have millions of individual line items for sale. Users must choose the best combination of these products for their requirements and architect them into an application.

Not all users need to build highly scalable applications across international borders utilizing the latest technology. Many companies just want to develop simple web applications using standard tools and are willing to use standard products to deliver their applications, rather than demanding specific capabilities.

With limited portfolios and limited variations, smaller alternative cloud providers can focus on delivering a few products well:

  • They focus innovation on squeezing costs and maximizing efficiency rather than on developing broad ranges of cutting-edge products.
  • Most have a handful of locations offering a few services (usually virtual machines, storage and, more recently, containers) in a limited range of configurations (such as a few different sizes of virtual machines).
  • They can focus on a region or country so that specific local demands for data sovereignty or ownership can be met.
  • Customer service and support is often more personal with an alternative cloud provider than a hyperscaler.

Hyperscalers are upfront that users must build resiliency into their applications using multiple availability zones (akin to data centers) and cloud services such as load balancers (see Public cloud costs versus resiliency: stateless applications). Alternative cloud providers don’t necessarily offer such capabilities. Users of alternative cloud providers often seek simplicity, without the need to architect decoupled, multivenue applications. The result is that users — almost inadvertently — rely more heavily on alternative cloud providers’ data centers to deliver resiliency than they might in a hyperscaler data center, where the application itself has been designed to be resilient.

This dynamic is difficult to quantify, but there are examples. In 2021, a fire destroyed one of OVHcloud’s Strasbourg (France) data centers. The local fire service said the data center had neither an automatic fire extinguisher system nor an electrical cut-off mechanism. OVHcloud is now facing a class action lawsuit from 140 affected clients, demonstrating how heavily these companies relied on OVHcloud.

The challenge with the alternative provider model is that it reduces cloud services to a commodity where price, not innovation or quality, is a differentiator. As a result, alternative providers are under more pressure than hyperscalers to keep prices low. With less diverse portfolios, price pressure on a single service can impact overall margins more than a hyperscaler with other services to offset any losses. OVHcloud’s fire demonstrates that simplicity doesn’t necessarily mean resiliency.

In June 2022, DigitalOcean increased prices on nearly all its services. We believe some of this price increase is due to rising costs. Inflation is high in many economies, and supply chain issues are affecting the timely and inexpensive delivery of servers and other equipment. The COVID-19 pandemic has triggered a movement of workers, reducing the supply of labor and, anecdotally, raising salaries. An innovation-driven hyperscaler might be able to absorb some of these costs in its margins; it is harder for a cost-differentiated alternative cloud provider.

In the short term, users may question whether they are getting enough value from alternative cloud providers to justify a larger bill (because of price increases). Migrating away from an alternative cloud provider can be simpler than migrating from a hyperscaler: the simple services on offer have many commonalities across providers and are, therefore, easier to move.

Hyperscaler services are usually more proprietary, including application programming interfaces coded into the fabric of applications that can be expensive to migrate. In addition, hyperscalers are increasingly opening new data centers in new countries, offsetting some of the alternative cloud providers’ value in locality. As a result, alternative cloud providers are more vulnerable to internal cost increases and buyers demanding lower prices than hyperscalers.

Alternative cloud providers are also vulnerable to a broader demand for cutting-edge technology. In the longer term, will cloud users see IT as something that has to operate cheaply and simply while the business focuses on bigger things, or will they want to pay more for their IT and invest in skills to build more complex yet strategically important applications? Small businesses want to use innovative technologies such as the internet of things, cloud-native development or machine learning to build differentiated and resilient applications that drive the business forward — even if these innovations come at a cost premium.

Data center operators cautiously support nuclear

The value, role and safety of nuclear power have strongly divided opinion since the 1950s. This debate has now, in 2022, reached a critical point again as energy security and prices cause increasing concern globally (particularly in geographies such as Europe) and as the climate crisis requires energy producers to shift toward either non- or low-carbon sources — including nuclear.

At the beginning of 2022, Uptime Institute Intelligence forecast that data center operators, in their search for low-carbon, firm (non-intermittent) power sources, would increasingly favor — and even lobby for — nuclear power. The Uptime Institute Global Data Center Survey 2022 shows that data center operators / owners in major data center economies around the world are cautiously in favor of nuclear power. There are, however, significant regional differences (see Figure 1).

Figure 1 Nuclear is needed, say operators in most regions

In both North America and Europe, about three-quarters of data center operators believe nuclear should either play a core long-term role in providing grid power or is necessary for a period of transition. However, Europeans are more wary, with 35% saying nuclear should only play a temporary or transitional role (compared with just 23% in North America).

In Europe, attitudes to nuclear power are complex and politicized. Following the Chernobyl and Fukushima nuclear accidents, green parties in Europe lobbied strongly against nuclear power, with Germany eventually deciding to close all its nuclear power plants. More recently, the Russian invasion of Ukraine has exposed Germany’s over-reliance on energy imports, and many have called for a halt to this nuclear shutdown.

In the US, there is greater skepticism among the general population about climate change being caused by humans, and consequently surveys record lower levels of concern about carbon emissions. Uptime Intelligence’s survey appears to show data center operators in North America also have lower levels of concern about nuclear safety, given their greater willingness for nuclear to play a core role (Figure 1). As the issues of climate change and energy security intensify, this gap in opinion between the US and Europe is likely to close in the years ahead.

In China, not a single respondent thought nuclear power should be phased out — perhaps reflecting both its government’s stance and a strong faith in technology. China, more than most countries, faces major challenges meeting energy requirements and simultaneously reducing carbon emissions.

In Latin America, and Africa and the Middle East, significantly lower proportions of data center operators think nuclear power should play a key role. This may reflect political reality: there is far less nuclear power already in use in those regions, and concerns about political stability and nuclear proliferation (and cost) will likely limit even peaceful nuclear use. In practice, data center operators will not have a major impact on the use (or non-use) of nuclear power. Decisions will primarily be made by grid-scale investors and operators and will be steered by government policy. However, large-scale energy buyers can make investments more feasible — and existing plants more economic — if they choose to class nuclear power as a renewable (zero-carbon) energy source and include nuclear in power purchase agreements. They can also benefit by siting their data centers in regions where nuclear is a major energy source. Early-stage discussions around the use of small modular reactors (SMRs) for large data center campuses (see Data center operators ponder the nuclear option) are, at present, just that — exploratory discussions.