Rising energy prices

Will high energy prices push operators to direct liquid cooling?

The data center industry and other large power-consuming industries continue to feel pressure from skyrocketing electricity prices. In Germany and France, wholesale energy prices this August increased six-fold compared to prices from 2021. The US has fared better, but wholesale electricity prices have doubled this summer compared with last year’s prices.

While leased data center operators can typically pass on these higher energy costs to tenants, many IT service providers, such as web-hosting platforms and cloud data center operators, have seen their profits erode. High energy prices contributed to the bankruptcy of the UK division of colocation and cloud provider Sungard Availability Services in March 2022, followed by a bankruptcy filing for its US and Canadian operations in April.

A positive side effect of historically high energy prices is that investments in efficiency become more attractive. Industry-wide, power usage effectiveness (PUE) has been largely stagnant in recent years and cooling remains the largest source of inefficiency (see the Uptime Institute Global Data Center Survey 2022).

Direct liquid cooling (DLC) of IT hardware, while still relatively niche, can deliver significant energy savings for digital infrastructure. Even before the latest spikes in power costs, energy savings were already the top attraction for operators considering DLC. Uptime Institute’s Direct Liquid Cooling Survey, conducted early in 2022, shows that two-thirds of enterprise respondents think cost savings are the key factor for organizations considering a switch to DLC (see Figure 1).

Figure 1 Energy savings and sustainability are top DLC drivers

This is a potential shift in the adoption dynamics for DLC: for early adopters, high rack density was the major catalyst in moving away from air-cooled systems. The recent spikes in energy prices, however, may push an even higher proportion of operators to consider DLC as a means to reduce energy costs.

DLC enables efficiency gains for both the data center facility and the IT hardware. For facility cooling, DLC offers the potential to use less energy for mechanical refrigeration — or in some cases none, depending on implementation and local climate conditions. It also substantially lowers the volume of air that needs to be moved around the data hall, thereby reducing energy consumption from air handlers.

There are further efficiency gains to be made in powering the IT hardware: eliminating server fans and, by significantly reducing IT operating temperatures, potentially lowering static power losses in the silicon.

These savings in IT power are difficult to quantify precisely, but models estimate that they can be considerable — ranging from 10% to 20% of total IT power. Yet, despite the energy and cost savings associated with DLC, there are some key barriers to adoption that allow air cooling to dominate:

  • A lack of standardization for existing DLC technologies.
  • Concerns over coolant leaks and material compatibility, which limit the speed of DLC adoption.
  • Retrofitting existing data centers with DLC may not be economically sound unless the facility already uses a chilled water loop.
  • Racks need to be densified (typically above 20 kilowatts per rack) for DLC to be economically viable.
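
As a rough illustration of the potential scale of these savings, the sketch below combines an assumed PUE improvement with an assumed reduction in IT power. Every input value is a hypothetical assumption chosen for illustration, not a measured figure from this report.

```python
# Hypothetical, simplified estimate of DLC energy savings.
# All input values are assumptions for illustration only.

HOURS_PER_YEAR = 8760

it_load_kw = 1000            # assumed average IT load (kW), air-cooled baseline
pue_air = 1.6                # assumed PUE with air cooling
pue_dlc = 1.2                # assumed PUE after a DLC retrofit
it_power_saving = 0.15       # assumed IT power saved (fans, silicon losses), midpoint of 10-20%
price_per_kwh = 0.25         # assumed electricity price (USD/kWh)

# Baseline: annual facility energy with air cooling
baseline_kwh = it_load_kw * pue_air * HOURS_PER_YEAR

# With DLC: IT load shrinks (no server fans, lower static losses) and
# facility overhead shrinks (less mechanical refrigeration, fewer air handlers)
dlc_it_load_kw = it_load_kw * (1 - it_power_saving)
dlc_kwh = dlc_it_load_kw * pue_dlc * HOURS_PER_YEAR

saved_kwh = baseline_kwh - dlc_kwh
print(f"Baseline energy: {baseline_kwh / 1e6:.1f} GWh/year")
print(f"With DLC:        {dlc_kwh / 1e6:.1f} GWh/year")
print(f"Savings:         {saved_kwh / 1e6:.1f} GWh/year "
      f"({saved_kwh / baseline_kwh:.0%}), about ${saved_kwh * price_per_kwh / 1e6:.1f}M/year")
```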

Sustainability is an additional key factor that is likely to drive DLC adoption this decade. Energy savings translate into reductions in Scope 2 emissions (from purchased, off-site electricity), which is a major focus for companies seeking to improve their sustainability credentials.

The combination of this commitment to sustainability and historically high energy prices for the foreseeable future means data center operators have an unprecedented and powerful incentive to improve their infrastructure efficiency, strengthening the business case for a shift to DLC.

AWS price cuts: is serverless gaining momentum?

A serverless platform is an abstracted cloud computing service that executes a user’s code without the user needing to provision the underlying server or operating environment. The physical server, resources and operating environment used to execute the user’s code are managed by the cloud provider and are not accessible to the user (hence “serverless”). In July, Amazon Web Services (AWS) announced a price cut to its serverless platform, Lambda, but only for high levels of consumption.

Why would AWS make this price cut? In a serverless computing service, developers upload code via a graphical user interface or application programming interface (API). The user defines a trigger that executes this code, such as an API call sent from an application, a timer or an event on another cloud service. When triggered, the serverless platform assigns resources to the code, executes it and returns any result.
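
As an illustration of this model, a minimal Lambda-style function in Python might look like the sketch below. The handler signature follows AWS's documented convention; the event fields used are hypothetical.

```python
# Minimal sketch of a serverless (AWS Lambda-style) function in Python.
# The platform provisions the runtime, invokes handler() when the trigger
# fires (an API call, a timer, or an event from another cloud service),
# bills for the execution period, then releases the resources.
# The event fields used here are hypothetical.

import json

def handler(event, context):
    # 'event' carries the trigger payload; 'context' carries runtime metadata.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}"}),
    }
```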

What differentiates serverless from platform as a service is that the user is only billed for the precise period the platform runs the code. Aside from the cost of using persistent cloud storage, there is no ongoing expense for a serverless application when dormant, making it economically attractive for “bursty” workloads that run only for short periods when demand requires.

Serverless computing is a relatively recent innovation in IT. AWS pioneered the serverless cloud model (also called function as a service) with the launch of Lambda in 2014. Usage of Lambda is billed on two metrics: the number of transactions, and the total amount of memory consumed by code execution for the period executed, expressed in gigabyte (GB)-seconds.

AWS has announced a 10% to 20% discount for monthly Lambda consumption above six billion GB-seconds. In practice, users would have to consume a substantial amount to qualify for this discount; no price cuts have been announced below the six billion GB-second threshold.

The price cut is unlikely to be related to market price pressure or internal cost reductions. The driver is likely to be that some organizations have grown their serverless consumption to a cost-prohibitive point. To reduce this barrier, AWS has opted to take a hit on gross margins for larger consumers in the belief that their sales volume will increase to offset the loss.

An organization’s cloud consumption is more likely to rise than fall over time. Most would prefer that their cloud applications grow to meet demand, rather than lock down scalability to save costs. Cost efficiency is rarely at the forefront for application development and operations teams. Many Lambda users will likely see increased usage and increased bills, but this isn’t necessarily a problem if it translates into business benefits.

Analysis of the price cuts suggests that some users are consuming significant amounts. AWS has announced a 10% discount on the monthly execution costs of consumption between six billion and 15 billion GB-seconds, and 20% for consumption above 15 billion GB-seconds. Six billion GB-seconds is a considerable capacity: a user would have to consume the equivalent of 2.3 terabytes (TB) of memory for an entire month to obtain a 10% discount and 5.7 TB to receive a 20% discount. The latter is the equivalent of 178 regular cloud instances, each configured with 32 GB of memory.
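
The arithmetic behind these figures can be reproduced in a short sketch, assuming an average-length month and a 32 GB comparison instance; both assumptions are ours, for illustration.

```python
# Back-of-the-envelope check of the Lambda discount thresholds quoted above.
# Assumes an average-length month and 32 GB comparison instances.

SECONDS_PER_MONTH = (365.25 / 12) * 24 * 3600   # average month, ~2.63 million seconds

def gb_needed(gb_seconds_per_month):
    """Memory (GB) that must be held continuously for a whole month
    to accumulate the given number of GB-seconds."""
    return gb_seconds_per_month / SECONDS_PER_MONTH

tier_10pct = 6e9      # GB-seconds/month where the 10% discount starts
tier_20pct = 15e9     # GB-seconds/month where the 20% discount starts

for label, threshold in [("10% tier", tier_10pct), ("20% tier", tier_20pct)]:
    gb = gb_needed(threshold)
    instances = gb / 32   # equivalent count of 32 GB cloud instances
    print(f"{label}: {gb / 1000:.1f} TB held for a month "
          f"(~{instances:.0f} x 32 GB instances)")

# Approximate output: 2.3 TB (~71 instances) and 5.7 TB (~178 instances)
```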

These figures demonstrate the significant scale at which some organizations are adopting serverless. Considering AWS has rebuilt its pricing model to reduce costs for these users, the number of organizations consuming such large amounts cannot be trivial. This price drop matters because it makes the serverless model more economical and attractive for use at larger scale, beyond just simple functionality. It signals wider adoption of serverless computing to the market, validating the emerging computing model’s viability. Highly bursty workloads architected to take advantage of serverless platforms are also likely to be more resource-efficient than even modern, containerized software, let alone traditional, nonscalable applications.

EU’s EED recast set to create reporting challenges

The European Commission’s (EC’s) proposed recast of its Energy Efficiency Directive (EED) sets out new and strict reporting requirements for data centers operating in the EU. If passed, data centers with 100 kilowatts or more total installed IT power demand (from server, storage and network equipment) will have to report their energy performance every year, including details on data traffic, quantity of data stored, water use, energy consumption, heat re-use and power utilization (see Table 1).

Table 1 Reporting requirements of proposed EED recast

These reporting requirements raise several concerns for data centers. One concern is that some of the information is simply difficult to collect — at least for some operators. Most colocation operators do not currently have control or insight into the data traffic, storage and processing being performed on their customers’ IT equipment. For example, it will be challenging for a large retail colocation data center to collect, normalize and aggregate data from tens or hundreds of IT operators with different data collection and management systems into a coherent, accurate and standardized report.

Some parties have also raised concerns about the security risks associated with publicly reporting the names of owners, addresses and other details of data centers — information that is particularly sensitive for financial institutions. At present, it is relatively easy to find the location of data centers, but far more difficult to find details of the owners and operators of those data centers. Other parties are concerned about the administrative and financial burdens imposed on smaller operators.

Some data center operators have welcomed further transparency regarding their energy use but argue against some of the proposed metrics. Feedback from DigitalEurope, a trade association representing the digital technology industry in Europe, notes that data traffic, processing and storage are unrelated to data center sustainability, as well as being commercially sensitive information. Moreover, servers account for most data center energy use, yet the new EED makes no attempt to gather indicative data on server power efficiency. This is a missed opportunity to tackle the single largest cause of energy inefficiency in data centers.

As part of the proposed EED recast, the EC is planning mandatory annual reporting of key performance indicators (KPIs). The aggregated data will be used to develop sustainability indicators based on energy efficiency, use of renewable energy, water usage and waste heat utilization (see Table 1). These indicators will be used to define and rate a data center in terms of sustainability.
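
The sketch below illustrates the kind of annual ratios such indicators imply, using common industry definitions (PUE, WUE, energy reuse factor and renewable energy factor) with hypothetical input values; it is not taken from the directive's text.

```python
# Sketch of annual sustainability indicators of the kind the EED recast targets.
# Definitions follow common industry usage; all input values are hypothetical.

total_facility_energy_kwh = 12_000_000   # assumed annual facility energy
it_energy_kwh             = 8_000_000    # assumed annual IT equipment energy
water_use_liters          = 15_000_000   # assumed annual water consumption
reused_heat_kwh           = 1_000_000    # assumed energy exported for re-use
renewable_energy_kwh      = 6_000_000    # assumed renewable energy consumed

pue = total_facility_energy_kwh / it_energy_kwh        # power usage effectiveness
wue = water_use_liters / it_energy_kwh                 # water usage effectiveness (L/kWh)
erf = reused_heat_kwh / total_facility_energy_kwh      # energy reuse factor
ref = renewable_energy_kwh / total_facility_energy_kwh # renewable energy factor

print(f"PUE: {pue:.2f}  WUE: {wue:.2f} L/kWh  ERF: {erf:.2f}  REF: {ref:.2f}")
```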

The EC hopes that by creating a register of energy use based on performance indicator reporting, opportunities for the data center industry to reduce energy consumption and increase efficiency can be identified and encouraged. It also hopes that future sustainability ratings developed from the KPIs will help to provide transparency of and accountability for data center carbon footprints. In time, stakeholders may come to expect this data from operators when evaluating business decisions in the EU.

The stakes are particularly high when it comes to defining renewable energy usage in terms of these KPIs. The legislation is currently unclear as to how the use of carbon offsets (such as guarantees of origin or renewable energy certificates) will be treated in audits or in developing sustainability ratings. Ideally, the ratings should assess the direct use of renewable and zero-carbon energy (as supplied throughout a grid region or through power purchase agreements) to accurately depict a data center’s energy use and carbon footprint. Without greater clarity, the impact of the proposed reporting requirements may instead depend on how the proposed legislation is interpreted by governments or operators.

For more information, see our recent report Critical regulation: the EU Energy Efficiency Directive recast.

This Halloween, beware the vampire server

Halloween brings joy to many in the form of tricks and treats. But to IT managers, Halloween is a stark reminder of the evil spirits that hide out of sight in data center cabinets and public cloud applications. Vampires, zombies and ghosts haunt the infrastructure, sucking valuable energy, space and resources. IT managers need to hunt, identify and purge these evil spirits before it’s too late — efficiency and sustainability are at stake.

Vampire (or comatose) servers and virtual machines lie — seemingly dead — in the shadows, their purpose unknown. Forgotten because of staff changes, poor tagging or missing documentation, they leave no record of their function or value. But their removal should be performed with care. The aggression of a user who loses their valued, but seldom used, business process can be far more terrifying than the vampire that was slain.

Similarly, zombie servers and virtual machines wander the data center infrastructure, forgotten and unclaimed by the user that provisioned them. Readily identified by low utilization, they add no value and offer no useful purpose. They should be quietly put to rest and repurposed, refurbished or recycled.

Ghost data is generated and saved without a known purpose or immediate need. It haunts storage devices, occupying valuable terabytes. The data may be generated by a vampire server, creating an ever-increasing volume of data and making resolution ever more critical. In an ideal world, the server would be banished and the data deleted, but this may not be feasible if the data’s purpose is unknown or it is encrypted with a seemingly lost encryption key. The ghost data may need to be sent to a permanent grave of low-energy and low-cost long-term storage, available to be called back from the dead when needed.

Fortunately, there are tools available to help wannabe paranormal investigators banish these demons: data center administrative systems, specialty workload management software and cloud management tools.

IT administrators have access to data center management tools, such as data center infrastructure management (DCIM) programs that identify applications, and track and report resource demand and utilization for assigned resources. This data can be used to find servers and virtual machines with either low or no resource utilization or activity. Reports can be generated to list candidates for shutdown or consolidation. Many cloud providers such as Amazon Web Services, Google Cloud and Microsoft Azure offer this functionality for free in cloud management portals.
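
A simplified version of this kind of screening pass might look like the following sketch; the report layout, column names and thresholds are hypothetical, not a real DCIM or cloud-provider format.

```python
# Hypothetical sketch: flag candidate "zombie" servers from a utilization export.
# The file layout, column names and thresholds are assumptions for illustration.

import csv

CPU_THRESHOLD = 5.0        # average CPU % below which a server is suspect
NETWORK_THRESHOLD_MB = 50  # monthly network traffic (MB) below which it is suspect

def find_zombie_candidates(path):
    candidates = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            cpu = float(row["avg_cpu_percent"])
            net = float(row["monthly_network_mb"])
            if cpu < CPU_THRESHOLD and net < NETWORK_THRESHOLD_MB:
                candidates.append(row["hostname"])
    return candidates

if __name__ == "__main__":
    for host in find_zombie_candidates("utilization_report.csv"):
        # Candidates should be verified with owners before shutdown:
        # a "vampire" may still back a valued but seldom-used process.
        print(f"Review for shutdown or consolidation: {host}")
```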

Software tools are also available to identify these vampires, zombies and ghosts. Products such as Densify, Granulate, Turbonomic Application Resource Management, TSO Logic and others scan physical servers and applications placed in the public cloud. They assess central processing unit (CPU), memory and application service levels, recommending or implementing resource adjustments to maximize operational efficiency, and minimize energy and resource use. Where an application or piece of IT equipment is not being used, it is slated for shutdown and removal. This can be done automatically or verified and initiated by a system administrator. Where resources are over- or underutilized, application placements are adjusted to optimize their deployment, minimizing resource use and improving resiliency and reliability.

Many spirits can be banished before they appear if the IT manager enforces a process to register, deploy and manage IT equipment and applications. This process is augmented and improved using a software monitoring tool. Properly executed, the process enables tracking and management of all equipment and applications to prevent the appearance of vampires, zombies and ghosts.

This is particularly important when users of the IT environment can conjure up an application or bare metal server from the public cloud in a matter of minutes. Easy launching and deployment, meant to simplify and improve the user’s experience, also feeds the system administrator’s worst nightmare of uncontrolled spirits wandering across their environments.

Controlling these spirits is an important aspect of a sustainability strategy. Eliminating wasted work and equipment reduces energy use and increases delivered work per watt of consumed energy. It is hauntingly beautiful to behold.

Data centers weather solar storms

The US Space Weather Prediction Center (SWPC) issued multiple geomagnetic storm watches throughout August and September 2022. Geomagnetic storms occur when solar storms interact with the Earth’s atmosphere and magnetic field, risking disruption to satellites, radio communications and the power grid. The strongest predicted storm earned the agency’s rating of G3 (“strong”) on a five-level classification system of geomagnetic activity — intense enough to require voltage corrections on the grid in some regions. In comparison, an “extreme” G5 geomagnetic storm could overheat or destroy high-voltage transformers, leading to widespread and lasting power outages.

A geomagnetic storm represents one type of electromagnetic pulse (EMP) — a rapid discharge of electromagnetic energy. A geomagnetic storm, also known as a geomagnetic disturbance or geomagnetic EMP, has secondary effects on power consumers that expose data centers to risk of equipment disruption and damage. The three types of EMP (geomagnetic, nuclear, or intentional) vary in their physical characteristics, but each can endanger data centers (see the Uptime Institute report Electromagnetic pulse and its risk to data centers). However, many operators do not include any type of EMP in their risk assessments, and have not implemented protective measures.

The SWPC monitors solar events and can provide hours’ or days’ notice of those likely to affect Earth, but long-term prediction of an individual geomagnetic EMP event is not currently possible. Solar events that could cause geomagnetic EMP events occur frequently but chaotically; they are often directed away from Earth and astronomers can only predict them on the basis of probability. For example, a G5 (“extreme”) event typically reaches Earth once every 25 years. Such an event caused a nine-hour outage of the Hydro-Québec transmission system in 1989. An extreme geomagnetic EMP damaged 12 transformers in South Africa in 2003. Before the advent of today’s power grid, the 1859 Carrington Event (the most intense geomagnetic storm in history) caused sparking and fires at multiple telegraph stations.

Due to its low frequency, geomagnetic EMP acts most strongly on electrical conductors running miles in length, such as high-voltage transmission lines in the power grid. The induced current behaves similarly to direct current (DC) in a system designed for alternating current (AC). Most storms, such as those in late summer 2022, are not intense enough to cause power outages. Grid operators can compensate for the induced currents of a smaller EMP event and continue delivering power. Data centers may experience problems with power quality, however — specifically, harmonic distortion (defects in AC voltage waveforms). As power is transmitted from high-voltage lines to utility customers, it passes through a succession of transformers — each of which steps down the voltage but intensifies harmonics. Harmonics are at their greatest intensity once the power reaches the end user, in this case a data center.
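
Harmonic distortion is commonly summarized as total harmonic distortion (THD): the combined magnitude of the harmonic components relative to the fundamental. A minimal sketch with hypothetical harmonic magnitudes follows.

```python
# Total harmonic distortion (THD) of a voltage waveform:
# the RMS of the harmonic components divided by the fundamental.
# Harmonic magnitudes below are hypothetical.

import math

fundamental_v = 230.0              # RMS voltage of the fundamental (50/60 Hz)
harmonics_v = [8.0, 5.0, 3.0, 2.0] # RMS voltages of higher-order harmonics

thd = math.sqrt(sum(v * v for v in harmonics_v)) / fundamental_v
print(f"THD: {thd:.1%}")           # ~4.4% in this example
```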

Most data center uninterruptible power supply (UPS) systems are designed to accommodate some harmonics and protect downstream equipment, but geomagnetic EMP events can overwhelm these built-in protections — potentially damaging the UPS or other equipment. The effects of harmonics inside a data center can include inefficient UPS operation, UPS rectifier damage, tripped circuit breakers, overheated wiring, malfunctioning motors in mechanical equipment and, ultimately, physical damage to IT equipment.

Some grid operators are already installing protective devices to guard their infrastructure against geomagnetic EMP, sparing their customers the secondary effects of outages and harmonics.

Those data center operators that include EMP risk as part of an overall risk assessment can improve their infrastructure resiliency by implementing their own EMP safeguards. The primary threats to individual data centers from geomagnetic EMP — power outages and harmonics — act via the power grid. Operators can manage this risk by disconnecting from the grid and operating on backup power. Increased on-site fuel / energy storage may be appropriate in preparing for smaller geomagnetic storms.

In the event of a large geomagnetic storm, the entire population of the affected area will be competing for fuel and other supplies and prolonged power outages are likely to outlast data centers’ fuel-storage capacity. These same conditions are likely to affect users of those applications running in a data center, leaving them unable to connect. Relying on geographically dispersed multisite resiliency (stretching over thousands of miles) is likely to offer more effective protection. More localized EMP effects — from, for example, a small geomagnetic EMP or an intentional EMP — will not affect more distant locations, so there may be a stronger economic argument for maintaining availability through such events.

Awareness of EMP as a risk to data centers is increasing. Best practice in EMP protection is not well established in the data center industry yet, but periodic risk assessments will enable operators to incorporate updated information and guidelines as these become available.

Sacrifice speed to cut cloud carbon and costs

New findings from research by Uptime Institute Intelligence reveal that organizations can cut both their cloud carbon emissions and costs by moving workloads to different regions. However, the trade-off with this migration is an increase in latency.

Cloud users choose regions based primarily on two factors:

  1. Locating the application close to end users improves the user experience by delivering faster content. Some applications, such as interactive gaming, require very low latency, which is driving cloud providers to invest in new edge locations close to end users. Not all applications, however, need such a quick response and end users can often tolerate a slight increase in latency without material impact to their experience.
  2. Offering cloud-based services in a country usually has data protection implications; these can be partly addressed by keeping data within the same jurisdiction as the end users.

If there are no legal reasons to keep data in a jurisdiction, cloud users can often migrate their workloads to a nearby region and gain reductions in their carbon footprint; this migration can also result in lower costs. Uptime Intelligence collated information from Microsoft Azure, Amazon Web Services (AWS), Google Cloud, the Cloud Carbon Footprint project (which sources data from carbonfootprint.com, the European Environment Agency and the US Environmental Protection Agency) and CloudPing to produce the Cloud Carbon Explorer, which includes three interactive maps.

The maps show the potential cross-region workload migration paths for AWS, Google Cloud and Microsoft Azure. These workload migration paths can reduce carbon footprint without significantly impacting user experience and, in some cases, reduce cost. Users can use the tool to explore suitable compromises of latency, cost and carbon for each application.

We found 38 migration paths between AWS regions that provide both carbon and cost reductions with a latency impact of less than 100 milliseconds (ms). For Google Cloud, there were 39 migration paths and for Microsoft Azure there were 15, all with a latency impact of less than 100 ms.
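
A simplified version of the filtering behind these counts might look like the sketch below; the region names, carbon intensities, prices and latencies are hypothetical and are not values from the tool.

```python
# Hypothetical sketch of the filtering behind the migration-path counts above:
# keep only moves that cut both grid carbon intensity and price, with a
# round-trip latency penalty under 100 ms. All figures are made up.

regions = {
    # region: (grid gCO2/kWh, relative hourly price, added latency vs. home in ms)
    "home":    (350, 1.00,   0),
    "north-1": ( 30, 0.95,  28),
    "west-2":  (120, 0.90,  22),
    "south-1": (300, 0.85, 140),   # cheaper, but latency penalty too high
}

MAX_LATENCY_MS = 100
home_carbon, home_price, _ = regions["home"]

for name, (carbon, price, latency) in regions.items():
    if name == "home":
        continue
    if carbon < home_carbon and price < home_price and latency < MAX_LATENCY_MS:
        print(f"{name}: carbon -{1 - carbon / home_carbon:.0%}, "
              f"cost -{1 - price / home_price:.0%}, +{latency} ms")
```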

Figure 1 shows the tool’s analysis of possible workload migrations from AWS’s Frankfurt data center. The Cloud Carbon Footprint project estimates AWS’s Frankfurt data center to have relatively high grid carbon emissions compared with the rest of Europe. Migration to Stockholm, Milan, Paris or London all provide significant reductions in carbon and cost, with a maximum increase in latency of 30 ms.

Figure 1 Cloud Carbon Explorer: migration paths from AWS’s Frankfurt region

The bubble size in Figure 1 represents the grid carbon emissions, and the thickness of the line represents the latency impact (wider equals slower). For example, clicking on the migration path pointer from Frankfurt to Stockholm shows a potential 98% cut in grid emissions. The color of the line indicates the impact on cost, with green representing a cost saving (yellow lines, not shown in this example, represent cost increases of less than 5%).

Users can also make carbon and cost reductions when using Microsoft Azure or Google Cloud. For instance, the Cloud Carbon Explorer shows that by moving a virtual machine from Google’s Hong Kong data center to Taiwan, grid emissions drop 14% and the cost of an e2-standard-2 virtual machine decreases by 17%. The trade-off is a slight increase of round-trip latency of 13 ms. In another example, Microsoft Azure users can reduce carbon and cost by migrating their workloads from Iowa (US) to Illinois (US). For a slight increase in latency of 13 ms, the cost of a D2as_v5 virtual machine drops by 12% and grid carbon emissions decrease by 17%.

The Cloud Carbon Explorer provides indicative carbon, cost and latency figures based on various assumptions. A lack of data is a significant problem for users in calculating their cloud carbon footprints. This difficulty in acquiring the appropriate data is the reason Uptime Intelligence has used third-party sources and was not able to evaluate all regions for all cloud providers. In addition, the individual characteristics of specific data centers (such as power usage effectiveness variations) have not been considered due to a lack of comprehensive information. Although the analysis is imperfect, it does demonstrate that there are suitable trade-offs to be made.

Organizations should commit to cutting carbon emissions, partly for regulatory reasons and also because sustainability is high on consumer and corporate agendas. However, moving region isn’t always simple, and there are legal and latency repercussions to consider. As discussed in our report How resiliency drives cloud carbon emissions, before users consider migrating workloads, they should investigate whether the application can be tuned to reduce carbon.

The next step for users to reduce their carbon (and cost) involves the choice of data center. Users need to balance the benefits of carbon reduction (sometimes coupled with a cost reduction) against the impact of a latency increase.

Uptime Intelligence believes the third-party data sources used in the Cloud Carbon Explorer are reliable and fair, but we have not audited them in depth. Users should explore the interactive maps to identify feasible migrations before performing their own more detailed assessments. Our analysis suggests that these investigations are worthwhile, given the potential savings in both carbon and costs.