Large data centers are mostly more efficient, analysis confirms

Uptime Institute calculates an industry average power usage effectiveness (PUE), which is a ratio of total site power to IT power, each year using data from the Uptime Institute Global Data Center Survey. This PUE data is pulled from a large sample over the course of 15 years and provides a reliable view of progress in facility efficiency.

Uptime’s data shows that industry PUE has remained at a high average (ranging from 1.55 to 1.59) since around 2020. Despite ongoing industry modernization, this overall PUE figure has remained almost static, in part because many older and less-efficient legacy facilities have a moderating effect on the average. In 2023, industry average PUE stood at 1.58.

For the 2023 annual survey, Uptime refined and expanded the survey questionnaire to provide deeper insights into the industry trends and improvements underlying the slow-moving average. This analysis builds on Uptime’s recent PUE research that focused on the influence of facility age and regional location (see Global PUEs — are they going anywhere?).

Influence of larger sites on PUE

Uptime’s headline PUE figure of 1.58 (weighted per respondent, for consistency with historical data) approximates the efficiency of an average facility. The new survey data allows us to analyze PUE in greater detail across a range of facility sizes. We applied provisioned IT capacity (in megawatts, MW) as a weighting factor to examine the PUE of a normalized unit of IT power. Using this approach, the capacity-weighted PUE figure is 1.47 — a result that many may expect, given the large amount of total IT power deployed in larger (often newer) data centers.
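
The gap between the two figures is purely a matter of weighting. As a minimal sketch (in Python, using made-up figures rather than Uptime survey data), the example below contrasts a simple per-respondent average PUE with a capacity-weighted average, in which each site's PUE is weighted by its provisioned IT capacity in MW.

    # Illustrative only: per-respondent vs capacity-weighted average PUE.
    # Each tuple is (annual average PUE, provisioned IT capacity in MW) for a
    # hypothetical survey respondent -- not actual Uptime survey data.
    sites = [
        (1.75, 0.8),   # small legacy site
        (1.60, 3.0),
        (1.50, 10.0),
        (1.35, 40.0),  # large, newer facility
    ]

    # Per-respondent average: every site counts equally.
    per_site_avg = sum(pue for pue, _ in sites) / len(sites)

    # Capacity-weighted average: each site counts in proportion to its IT
    # capacity, approximating the PUE seen by a typical unit of IT power.
    total_mw = sum(mw for _, mw in sites)
    capacity_weighted = sum(pue * mw for pue, mw in sites) / total_mw

    print(f"Per-respondent average PUE: {per_site_avg:.2f}")          # 1.55
    print(f"Capacity-weighted average PUE: {capacity_weighted:.2f}")  # 1.40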

Larger facilities tend to be more efficient — most are relatively new and use leading-edge equipment, with more efficient cooling designs and optimized controls. Modernization of smaller facilities is less likely to yield a return on investment from energy savings. In Figure 1, Uptime compares survey respondents’ annual average PUE based on their data center’s provisioned IT capacity, showing a clear trend of improving efficiency as data center capacity increases.

Figure 1. Weighted average PUE by data center IT capacity

The PUE metric was introduced to track the efficiency of a given data center over time, rather than to compare between different data centers. Uptime analyzes PUE in large sample sizes to track trends in the industry, including the influence of facility size on PUE. Other factors shaping PUE include IT equipment utilization, facility design and age, system redundancy and local climate conditions.

Capacity and scrutiny will grow

Data centers are expanding in capacity — and this will warrant closer attention to the influence of facility size on efficiency. Some campuses currently have capacities of 300 MW, and several others are planned to reach in excess of 1 gigawatt (GW), which is between 10 and 30 times more power than the largest data centers seen in recent years. Uptime has identified approximately 28 hyperscale colocation campuses in development in addition to existing large hyperscale cloud sites. If the planned capacity of these sites is realized, they would account for approximately one-quarter of data center energy consumption globally.

These hyperscale colocation campuses, in common with many large new colocation facilities in existing prime data center locations, are designed for PUEs significantly below the industry average (1.4 and lower). Scala Data Centers is one example: it is building its Tamboré Campus in São Paulo (Brazil), which is intended to reach 450 MW with a PUE of 1.4. To preserve an economic advantage, the organization will need to optimize efficiency as the number of tenants filling the data halls increases.

Cloud hyperscalers Google, Amazon Web Services and Microsoft already claim PUE of 1.2 or lower at some sites. However, this is not always representative of the actual PUE of a customer application in the cloud. Their workloads may be provisioned by a colocation partner whose PUE is higher, or cloud workloads may need to be replicated across one or more availability zones — driving up energy usage and aggregating PUE across multiple facilities.

PUE improvements will be demanded as legislatures start to reference PUE in binding regulations. The new Energy Efficiency Act in Germany, which came into force in September 2023, requires data centers in Germany to achieve a PUE of 1.5 from July 1, 2027 and a PUE of 1.3 from July 1, 2030. New data centers opening from July 1, 2026 are required to have a PUE of 1.2 or less — which even new-build operators may find challenging at higher levels of resiliency.


The Uptime Intelligence View

Facility size is one of many important factors influencing facility efficiency, and this is reflected in the capacity-weighted average PUE figure of 1.47, as opposed to the per-site average of 1.58. The data suggests that, over time, the replacement of older sites with larger, more efficient ones may produce a more impactful or immediate improvement in efficiency than modernizing smaller sites.

Jacqueline Davis, Research Analyst, [email protected]

John O’Brien, Senior Research Analyst, [email protected]

When net-zero goals meet harsh realities

For more than a decade, the data center industry — and the wider digital infrastructure that relies on it — has lived with the threat of much greater sustainability legislation or other forms of mandatory or semi-mandatory controls. But in a period of boom, it has mostly been a background worry, with legislators more concerned about disrupting an important new industry.

The EU, for example, first introduced the voluntary Code of Conduct for data centers in 2008, warning that legislation would follow if carbon and energy footprints were not brought under control. In the UK, a carbon reduction commitment was required of data centers but was later withdrawn.

Some countries, states and cities, including Singapore, California and Amsterdam, have introduced tighter planning restrictions for data centers and even moratoriums on new developments. However, some of these have been watered down or suspended.

Since 2018, Uptime Institute has repeatedly warned of the likelihood of more legislation and greater public pressure, advising operators to, at the very least, avoid over-ambitious statements, collect better data and prepare. But the pressure to do so has not been strong: improvements in energy efficiency and processors’ performance (Moore’s law), along with the greater use of cloud computing, have held down energy and carbon use, while facility efficiency has gradually improved.

This “green honeymoon” is, however, coming to an end and for some, this will be both painful and expensive. From 2024, new reporting laws and a toughening of requirements will enforce stricter carbon reporting in many countries. These will attempt to ensure that corporate promises are both realistic and evidence-based (see the end of the report for details), and their effects will not be confined to countries or states where there is primary legislation.

A difficult period ahead

Meeting these tougher public goals will not be easy. For several reasons, ranging from software and processor developments to the availability of renewable energy on the power grid, it will become more difficult for organizations using digital infrastructure to contain or reduce their energy use and carbon emissions.

Ultimately, these pressures may combine to encourage the widespread adoption of more aggressive and thoughtful sustainability strategies as well as a period of progressive and effective investment.

But Uptime Intelligence is also forecasting a difficult period for the sector from 2024 to 2030, as organizations miss sustainability goals and reporting requirements, battle with regulators and even some partners, and struggle to align their corporate business goals with wider sustainability objectives.

For example, in August, the UN-backed Science Based Targets initiative (SBTi) removed Amazon’s operations (including AWS) from its list of committed companies, as Amazon had failed to validate its net-zero emissions target against the SBTi criteria for science-based targets.

This is part of a wider trend. The CDP (previously known as the Carbon Disclosure Project), the most comprehensive global registry of corporate carbon emission commitments, recently said that of the 19,000 companies with registered plans on its platform, only 81 were credible.

A clear disconnect

In the coming years, larger and listed companies in most major economies will have to report their carbon emissions and climate-related risks, sometimes under financial reporting law and sometimes through special directives — the EU’s Corporate Sustainability Reporting Directive (CSRD) and California’s Climate Corporate Data Accountability Act (passed in September 2023) are two examples. The US Securities and Exchange Commission will also eventually require some emissions and risk disclosure from listed companies.

In some jurisdictions, the reporting and improvement of energy use will be required. The latest recast of the EU’s Energy Efficiency Directive (EED), finally published in October 2023, has detailed reporting requirements for data centers that include IT and network equipment use. The German implementation of the EED goes a step further, setting down PUE levels and requirements to reuse heat (with some exceptions). It also requires separate reporting by owners and operators of IT in colocation data centers.

There is a move towards greater precision and accountability at the non-governmental level, too. The principles of carbon emission measurement and reporting that underpin, for example, all corporate net-zero objectives tend to be agreed upon internationally by institutions such as the World Resources Institute and the World Business Council for Sustainable Development; in turn, these are used by bodies such as the SBTi and the CDP. Here too, standards are being rewritten, so that, for example, the use of carbon offsets is becoming less acceptable, forcing operators to buy carbon-free energy directly.

With all these developments under way, there is a startling disconnect between many of the public commitments by countries and companies, and what most digital infrastructure organizations are currently doing or are able to do. Figure 1 below shows that, according to two big surveys from Uptime Institute and IBM, far less than half of managers polled in IT (digital infrastructure) organizations say they are currently tracking any kind of carbon emission data (fuel, purchased electricity, and purchased goods and services).

Figure 1. Digital infrastructure’s tracking of carbon emissions

The difference between the two surveys highlights a second disconnect. IBM’s findings, based on responses from senior IT and sustainability staff, show a much higher proportion of organizations collecting carbon emission data than Uptime’s. However, the Uptime group is more likely to be directly responsible for electricity bills and associated carbon emissions, as well as generator fuel use, and is therefore more likely to have the tools and knowledge to collect the data.

One explanation for this is that sustainability and senior IT staff may not always collect all the underlying data but may use higher-level models and estimates. While this may be legally acceptable, it will not provide the data to identify waste and make the critical, detailed improvements necessary to reduce the digital carbon footprint.

In interviews with enterprises and colocation companies, Uptime similarly found that most of those concerned with reducing energy consumption or collecting sustainability data have limited contact with sustainability or executive teams.

Further challenges

Accurate, timely reporting of carbon emissions and other data will be difficult enough for many digital infrastructure operators, especially as it extends to Scope 3 (embedded, third-party and supply chain emissions). But operators will ultimately face a challenge that will not only be more difficult but may require significant investment: reducing emissions, whether in absolute terms or relative to the overall business workload.

Reducing emissions has never been easy, so why may it become more difficult? The first set of problems relates to IT. In the past five years, Moore’s law-type improvements in processor performance have slowed and are being supplemented or replaced by multi-core processors and GPUs. These are doing more work but also require more power, which pushes up the power and cooling requirements both at a server level and in aggregate across data centers. Significantly improved cooling (e.g., direct liquid cooling), better utilization of IT and better, more intelligent management of workloads will be required to prevent runaway power consumption and carbon emissions.

A second set of problems relates to the energy grid. In most regions, it will take decades before grids are able to operate carbon-free most or all the time. But carbon reporting standards will increasingly require the use of in-region, carbon-free energy (or renewable energy). As more data center operators seek to buy carbon-free energy to meet net-zero goals, this renewable energy will rise in price — if it is available at all. Purchasing enough carbon-free energy to match all demand (24×7) will, at best, be expensive and, at worst, impossible.

The third problem is the continuing explosion in workload growth. Energy use by data centers is currently estimated to be between 150 terawatt-hours (TWh) and 400 TWh a year. And even without generative AI, it is universally expected to increase significantly, with some forecasts suggesting it will double or more beyond 2030 as workloads increase. With generative AI — the overall impact of which is as yet not fully understood — energy use could skyrocket, straining power grids and supply chains, and rendering carbon emission targets yet more difficult to meet.


The Uptime Intelligence View

This analysis suggests that, for most operators of digital infrastructure, it will not only be very difficult to meet stated carbon emission targets, but new reporting requirements will mean that many will be seen to fail. This may be expensive and incur reputational damage. Managers should work closely with all appropriate functional departments and partners to develop a strategy based on realistic goals and real data. No one should announce public goals without first doing this work.

What role might generative AI play in the data center?

Advances in artificial intelligence (AI) are expected to change the way work is done across numerous organizations and job roles. This is especially true for generative AI tools, which are capable of synthesizing new content based on patterns learned from existing data.

This update explores the rise of large language models (LLMs) and generative AI. It examines whether data center managers should be as dismissive of this technology as many appear to be and considers whether generative AI will find a role in data center operations.

This is the first in a series of reports on AI and its impact on the data center sector. Future reports will cover the use of AI in data center management tools, the power and cooling demands of AI systems, training and inference processing, and how the technology will affect core and edge data centers.

Trust in AI is affected by the hype

Data center owners and operators are starting to develop an understanding of the potential benefits of AI in data center management. So long as the underlying models are robust, transparent and trusted, AI is proving beneficial and is increasingly being used in areas such as predictive maintenance, anomaly detection, physical security, and the filtering and prioritization of alerts.

At the same time, a deluge of marketing messages and media coverage is creating confusion around the exact capabilities of AI-based products and services. The Uptime Institute Global Data Center Survey 2023 reveals that managers are significantly less likely to trust AI for their operational decision-making than they were a year ago. This fall in trust coincides with the sudden emergence of generative AI. Conversations with data center managers show that there is widespread caution in the industry.

Machine learning, deep learning and generative AI

Machine learning (ML) is an umbrella term for software techniques that involve training mathematical models on large data sets. The process enables ML models to analyze new data and make predictions or solve tasks without being explicitly programmed to do so, in a process called inferencing.

Deep learning — an advanced approach to ML inspired by the workings of the human brain — makes use of deep neural networks (DNNs) to identify patterns and trends in seemingly unrelated or uncorrelated data.

Generative AI is not a specific technology but a type of application that relies on the latest advances in DNN research. Much of the recent progress in generative AI is down to the transformer architecture — a method of building DNNs unveiled by Google in 2017 as part of its search engine technology and later used to create tools such as ChatGPT, which generates text, and DALL-E for images.

Transformers use attention mechanisms to learn the relationships between datapoints — such as words and phrases — largely without human oversight. The architecture manages to simultaneously improve output accuracy while reducing the duration of training required to create generative AI models. This technique kick-started a revolution in applied AI that became one of the key trends of 2023.

Transformer-based LLMs, such as ChatGPT, are trained using hundreds of gigabytes of text and can generate essays, scholarly or journalistic articles and even poetry. However, they face an issue that differentiates them from other types of AI and prevents them from being embraced in a mission-critical setting: they cannot guarantee the accuracy of their output.

The good, the bad and the impossible

With the growing awareness of the benefits of AI comes greater knowledge of its limitations. One of the key limitations of LLMs is their tendency to “hallucinate” and provide false information in a convincing manner. These models are not looking up facts; they are pattern-spotting engines that guess the next best option in a sequence.

This has led to some much-publicized news stories about early adopters of tools like ChatGPT landing themselves in trouble because they relied on the output of generative AI models that contained factual errors. Such stories have likely contributed to the erosion of trust in AI as a tool for data center management — even though these types of issues are exclusive to generative AI.

It is a subject of debate among researchers and academics whether hallucinations can be eliminated entirely from the output of generative AI, but their prevalence can certainly be reduced.

One way to achieve this is called grounding and involves the automated cross-checking of LLM output against web search results or reliable data sources. Another way to minimize the chances of hallucinations is called process supervision. Here, the models are rewarded for each correct step of their reasoning rather than only for the correct final answer.
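
As a rough illustration of the grounding idea only (a simplified lexical-overlap check in Python, not a production fact-checker and not tied to any specific LLM or vendor API), the sketch below flags generated sentences that share little vocabulary with a set of trusted reference passages, so that a person can review them before the text is used.

    # Minimal sketch of "grounding": cross-check generated sentences against
    # trusted reference text and flag weakly supported ones for human review.
    # A crude lexical-overlap check stands in for real retrieval and verification.
    import re

    def _words(text):
        return set(re.findall(r"[a-z']+", text.lower()))

    def flag_unsupported(generated, references, threshold=0.5):
        """Return (sentence, support) pairs whose overlap with every reference is below threshold."""
        ref_vocab = [_words(r) for r in references]
        flagged = []
        for sentence in re.split(r"(?<=[.!?])\s+", generated.strip()):
            vocab = _words(sentence)
            if not vocab:
                continue
            # Best overlap ratio against any single reference passage.
            support = max((len(vocab & rv) / len(vocab) for rv in ref_vocab), default=0.0)
            if support < threshold:
                flagged.append((sentence, round(support, 2)))
        return flagged

    # Hypothetical usage: 'draft' is LLM output, 'sources' are vetted documents.
    draft = "The chiller plant uses three 500 kW units. Maintenance is annual."
    sources = ["Site A chiller plant: three 500 kW units, maintained quarterly."]
    print(flag_unsupported(draft, sources))  # flags the unsupported second sentence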

Finally, there is the creation of domain-specific LLMs. These can be either built from scratch using data sourced from specific organizations or industry verticals, or they can be created through the fine-tuning of generic or “foundational” models to perform well-defined, industry-specific tasks. Domain-specific LLMs are much better at understanding jargon and are less likely to hallucinate when used in a professional setting because they have not been designed to cater to a wide variety of use cases.

The propensity to provide false information with confidence likely disqualifies generative AI tools from ever taking part in operational decision-making in the data center — this is better handled by other types of AI or traditional data analytics. However, there are other aspects of data center management that could be enhanced by generative AI, albeit with human supervision.

Generative AI and data center management

First, generative AI can be extremely powerful as an outlining tool for creating first-pass documents, models, designs and even calculations. For this reason, it will likely find a place in those parts of the industry that are concerned with the creation and planning of data centers and operations. However, the limitations of generative AI mean that its accuracy can never be assumed or guaranteed, and that human expertise and oversight will still be required.

Generative AI also has the potential to be valuable in certain management and operations activities within the data center as a productivity and administrative support tool.

The Uptime Institute Data Center Resiliency Survey 2023 reveals that 39% of data center operators have experienced a serious outage because of human error, of which 50% were the result of a failure to follow the correct procedures. To mitigate these issues, generative AI could be used to support the learning and development of staff with different levels of experience and knowledge.

Generative AI could also be used to create and update methods of procedure (MOPs), standard operating procedures (SOPs) and emergency operating procedures (EOPs), which can often get overlooked due to time and management pressures. Other examples of potential applications of generative AI include the creation of:

  • Technical user guides and operating manuals that are pertinent to the specific infrastructure within the facility.
  • Step-by-step maintenance procedures.
  • Standard information guides for new employee and / or customer onboarding.
  • Recruitment materials.
  • Risk awareness information and updates.
  • Q&A resources that can be updated as required.

In our conversations with Uptime Institute members and other data center operators, some said they used generative AI for purposes like these — for example, when summarizing notes made at industry events and team meetings. These members agreed that LLMs will eventually be capable of creating documents like MOPs, SOPs and EOPs.

It is crucial to note that in these scenarios, AI-based tools would be used to draft documents that would be checked by experienced data center professionals before being used in operations. The question is whether the efficiencies gained in the process of using generative AI offset the risks that necessitate human validation.


The Uptime Intelligence View

There is considerable confusion around AI. The first point to understand is that ML employs several techniques, and many of the analytics methods and tools can be very useful in a data center setting. But it is generative AI that is causing most of the stir. Given the number of hurdles and uncertainties, data center managers are right to be skeptical about how far they can trust generative AI to provide actionable intelligence in the data center.

That said, it does have very real potential in management and operations, supporting managers and teams in their training, knowledge and documentation of operating procedures. AI — in many forms — is also likely to find its way into numerous software tools that play an important supporting role in data center design, development and operations. Managers should track the technology and understand where and how it can be safely used in their facilities.

John O’Brien, Senior Research Analyst, Uptime Institute [email protected]

Max Smolaks, Research Analyst, Uptime Institute [email protected]

Looking for the x-factor in data center efficiency

The suitability of a data center environment is primarily judged by its effect on the long-term health of IT hardware. Facility operators define their temperature and humidity set points with a view to balancing hardware failure rates against the associated capital and operational expenditures, with the former historically prioritized.

Over the past decade, this balance has shifted in favor of facility efficiencies as data center operators have gradually lifted their temperature (and humidity) limits from a typical 65°F to 68°F (18°C to 20°C) to somewhere between 72°F and 77°F (22°C and 25°C). However, many operators remain reluctant to further relax their temperature settings even for new data center designs, let alone their existing footprint.

This conservative stance goes against the empirical evidence. Industry body ASHRAE issued its failure rate guidance, called x-factor, almost a decade ago, yet it remains underused in key data center planning and design decisions. As a result, data centers have larger, and thus more expensive, cooling systems that use more energy than the data justifies.

Operators (and their tenants) that do follow x-factor in their cooling design and control strategy may reap the benefits not only in lower construction costs and energy savings but also in the possibility of fewer IT hardware failures. This Uptime Intelligence Update offers an overview of what x-factor is, what it indicates and what cooling optimization opportunities it presents.

Navigating x-factor data

ASHRAE and IT vendors tend to agree that it is best to operate hardware well below the specified temperature limits as standard. This is both to extract the maximum performance and to minimize the thermal stress on components — heat accelerates wear, which ultimately increases the likelihood of failures. Hard disk drives, with their moving parts, are the most prone to failure under heat stress.

X-factor is a dimensionless quantity that estimates how the annualized likelihood of IT hardware component failures changes with inlet temperature. The proportional change to failure rates assumes continuous operation at a given temperature when compared with the baseline of 68°F (20°C) — a temperature set point historically preferred by many operators. Based on empirical data from IT vendors and published in 2014, x-factor comprises three data series:

  • X-factor numbers, which correspond to the average (typical) hardware configuration of a server.
  • Upper-bound numbers, which reflect the server hardware configurations that are more prone to temperature-related failures. For example, less heat-resistant system designs or those with a lot of hard disk drives.
  • Lower-bound numbers, which reflect the server hardware configurations that are less likely than the average to develop failures. For example, those designs that are optimized for airflow and heat distribution or are diskless.

Figure 1 depicts the correlation between the likelihood of failure and inlet temperature up to 90.5°F (32.5°C), which is just above the upper limit of ASHRAE’s allowable envelope for Class A1 data center environments. These x-factor numbers represent changes in the likelihood of annualized IT component failures when the hardware is operated at the chosen temperature continuously.

Figure 1. X-factor data visualized up to 90.5°F (32.5°C)

For example, x-factor data indicates that when IT hardware operates at a constant 77°F (25°C) to reduce cooling energy needs, the annualized component failure rates will likely increase anywhere between 4% and 43% (midpoint 24%) when compared with the baseline at 68°F (20°C). This elevated x-factor quantifies the outcome of the accelerated wear on the hardware.

These failures do not necessarily represent a complete failure of an IT system, but cover any type of hardware failure, including redundant components. Some failures will manifest themselves in system instability and restarts. It is worth noting that modern IT systems are typically designed and warranted to operate in temperatures up to 95°F (32°C) as standard, although this may be dependent on the configuration.

Giving x-factor dimension: failure rate scenarios

Unfortunately, a key piece of information is missing that affects the trade-off calculations: the actual baseline failure rate before x-factor is applied. For business confidentiality reasons, IT vendors do not publish their respective field data, not even their calculated figures based on accelerated aging tests, such as the mean time between failures or failures in time. Based on what Uptime Intelligence was able to gather from the limited disclosures and academic publications, the following information can be surmised:

  • The best, most durable IT system designs may have an annualized baseline failure rate of around 2% or less. This is due to good thermal management, robust build quality and the selection of components based on their durability.
  • The difference in annualized failure rates between the most and least durable hardware is probably a factor of three or four (i.e., 6% to 8% of IT systems that develop a component failure in a year).
  • Unsurprisingly, hard disk drives are the most common source of failure. In most cases, these do not result in a system outage or data loss due to the ubiquitous use of data redundancy techniques.

As an example, if there is an assumed 4% baseline failure rate, operating constantly at 77°F (25°C) will mean an increase in typical IT system component failures by approximately 1 percentage point. X-factor’s average, upper bound and lower bound series also indicate that not all IT hardware reacts to the higher temperature in the same way. At 77°F (25°C), the respective increases are 1.16 percentage points for the upper bound of x-factor and 0.64 percentage point for the lower bound likelihoods of failure.

If a facility were to operate constantly at 81.5°F (27.5°C), which is just above the upper limit of ASHRAE’s recommended envelope, the typical annualized likelihood of failure increases by 1.36 percentage points, with the upper bound adding 1.6 percentage points and the lower bound adding just under 1 percentage point. Figure 2 shows these x-factor correlations for a 4% annualized failure rate baseline scenario.
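
The arithmetic behind these figures is straightforward: multiply the assumed baseline failure rate by the x-factor and subtract the baseline to get the increase in percentage points. The short sketch below reproduces the 4% baseline scenario using approximate x-factor values back-calculated from the numbers quoted above (treat them as illustrative inputs, not the published ASHRAE tables).

    # Percentage-point increases in annualized failure rate at a constant inlet
    # temperature, for an assumed 4% baseline. The x-factor values are approximate,
    # back-calculated from the figures in the text; consult the published ASHRAE
    # data series for design work.
    X_FACTOR = {
        "77F (25C)":     {"lower": 1.16, "average": 1.24, "upper": 1.29},
        "81.5F (27.5C)": {"lower": 1.24, "average": 1.34, "upper": 1.40},
    }

    BASELINE = 0.04  # assumed 4% annualized baseline failure rate

    for temp, series in X_FACTOR.items():
        for bound, xf in series.items():
            increase_pp = BASELINE * (xf - 1) * 100  # increase in percentage points
            annual_rate = BASELINE * xf * 100        # resulting annualized rate, %
            print(f"{temp} {bound:>7}: +{increase_pp:.2f} pp -> {annual_rate:.2f}%")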

Figure 2. Upper, average and lower bound annualized failure rates at a 4% baseline

Figure 3 applies the upper bound x-factor to three example baseline rates (2%, 4% and 6%) to show the likelihood of annualized IT component failures at a given constant operating temperature.

Figure 3. Upper bound failure rate scenarios at 2%, 4% and 6% baselines

Arguably, these are marginal increases. Yet this is not a complete picture. As servers age, failure rates escalate — anecdotally, by the fourth year of a server’s life span there is often a noticeable increase. This is largely accounted for by the mechanical components (i.e., motors and actuators) in hard disk drives wearing out in greater numbers and, to a lesser extent, instability due to voltage sags and other issues with embedded power components. Uptime Intelligence’s data shows that IT operators are retaining servers for longer than before, with a five-year average refresh cycle — a substantial number of servers are now older than seven years.

When running many aging servers, particularly the older legacy systems found in some larger enterprise IT environments, the chances of a marked increase in failures will likely deter operators from pursuing aggressive operating temperature targets, regardless of the energy efficiency benefits.

For new builds (or full refurbishments) with largely or fully new IT systems, the scale tilts toward designing for an elevated upper temperature set point to optimize energy and water consumption, while reducing mechanical equipment sizes and the associated power infrastructure to save on capital costs.

Temperature excursions: leaner cooling, not more failures

The true potential of using x-factor data in design decisions, however, lies not with constant warmer operation, but with allowing the data hall’s temperature to vary across a wider range within the allowable operating envelope. The business case for this is not built on further savings on cooling energy — those have largely been captured already by operating closer to the upper limits of the recommended envelope. In temperate climates, numerous data center facilities have already achieved annualized PUEs of around 1.2 or better by taking advantage of the recommended envelope and economization.

Adopting the use of ASHRAE’s allowable envelope (Class A1, up to 89.6°F / 32°C) for excursions into higher temperatures allows for a much leaner cooling and power infrastructure, including the downsizing, or even elimination, of mechanical refrigeration capacity. This is not only less expensive for the same IT capacity but may also reduce both maintenance tasks and the risk of cooling component failures.

A further boon, which is important for sites that are limited by their substation power envelope, is that more electrical capacity can be allocated to the IT load instead of reserving it for peak cooling energy needs (when the data center is fully utilized during the worst-case climatic conditions assumed for the facility design).

What x-factor data reveals is that controlled, short-duration temperature increases may have an effect on IT failure rates, but that impact is immaterial from a financial / operational perspective even if it is statistically significant (not easily explained by random chance). This is because the limited number of hours spent in those further elevated temperatures will only marginally accelerate component wear.

Take a facility located in Washington DC as an example, assuming the primary use of evaporative or adiabatic economization for cooling. Tracking wet-bulb conditions (the temperature to which the air can be cooled down by absorbing moisture) closely at a delta of 7.2°F (4°C) or less would have an imperceptible effect on the likelihood of IT failures when compared with operating at a constant 77°F (25°C). This is because the data hall would only spend an average of 320 hours a year above 77°F (25°C); the influence of the x-factor is mitigated when IT operates at excursion temperatures for less than 3.7% of the hours in a year.

Since the climate of Washington DC is not exceptional, this approach should be broadly applicable. The likelihood of component failures would increase only marginally in most of the popular data center locations across the world if operators let temperatures rise to the allowable limit of Class A1.

But letting temperatures rise for a leaner cooling and power infrastructure is only half the x-factor story. Opening up the lower set point can effectively bring x-factor down. Allowing the temperature to fall to track ambient conditions would help reduce failure rates, often to below the baseline rate. This is because the facility could spend a large number of hours, sometimes the majority of the year, at temperatures around or below the baseline temperature of 68°F (20°C).
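
A time-weighted (annualized) view makes this concrete: weight the x-factor of each temperature band by the share of the year spent in it, which is essentially how ASHRAE derives its annualized figures. The bin shares and band-average x-factors in the sketch below are hypothetical illustrations, not the Washington DC data referenced above.

    # Time-weighted ("annualized") x-factor: weight each inlet temperature band's
    # x-factor by the fraction of the year spent in that band.
    # The bands, shares and x-factors below are hypothetical illustrations.
    HOURS_PER_YEAR = 8760

    # (approximate band-average x-factor, share of annual hours in the band)
    temperature_profile = [
        (0.85, 0.45),  # below the 68F (20C) baseline: wear decelerates
        (1.00, 0.30),  # around the baseline
        (1.24, 0.21),  # around 77F (25C)
        (1.40, 0.04),  # brief excursions toward the allowable limit (~350 h/year)
    ]

    assert abs(sum(share for _, share in temperature_profile) - 1.0) < 1e-9

    annualized_x = sum(xf * share for xf, share in temperature_profile)
    excursion_hours = temperature_profile[-1][1] * HOURS_PER_YEAR

    print(f"Excursion hours near the allowable limit: ~{excursion_hours:.0f} h/year")
    print(f"Annualized x-factor: {annualized_x:.2f} (vs 1.24 at a constant 77F / 25C)")

In this made-up profile, the many hours spent at or below the baseline pull the annualized x-factor back to roughly 1.0, which is the mechanism behind the lower failure rates described above.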

ASHRAE first estimated annualized x-factors for several US and global locations in the 2015 revision of its thermal guidelines, updated in 2021, calculating multiple scenarios for several types of cooling systems. What these scenarios have in common is that they all assume the use of economizers that track ambient dry- or wet-bulb conditions within the allowable ranges.

The results strongly indicated that adopting wider temperature ranges to take advantage of ambient conditions would, in many climates, deliver substantial benefits in IT failure rates when compared with a constant operating temperature, such as 77°F (25°C). The gains from the deceleration of wear often more than offset the acceleration effect from excursions into the upper bound of the allowable range, which means IT component failures will likely come down in most cases.

Yet despite these benefits, the adoption of wide temperature bands (as opposed to tightly controlling for a targeted set point) remains sporadic, even for new data center builds. The practice is only relatively common in some large technical computing applications and cloud facilities that opt for direct outside air systems.

In upcoming reports, Uptime Intelligence will dive deeper into how x-factor calculations and IT hardware thermal trends inform facility cooling strategies for future builds and refurbishments.


The Uptime Intelligence View

ASHRAE’s x-factor data indicates that the data center industry by and large is still overprotective of IT hardware, and it does so at great expense in capital expenditures and energy costs — and against growing pressure on sustainability credentials. Taking full advantage of advanced cooling systems and control strategies promises to bring not only energy efficiency but a leaner cooling infrastructure, possibly lowered IT failure rates and improved sustainability credentials for the business. Despite the complexities involved in engineering (and in departing from past practices), this optimization opportunity is too big to ignore.

Global PUEs — are they going anywhere?

Regular readers of Uptime Institute’s annual data center survey, the longest running of its kind, already know that the industry average power usage effectiveness (PUE, a ratio of total site power and IT power) has trended sideways in recent years. Since 2020, it has been stuck in the 1.55 to 1.59 band.

Even going back further, we would see only a trivial improvement in the headline number. For many data center industry watchers, including regulators, this flatlining has become a cause for concern — an apparent lack of advancement raises questions about operators’ commitment to energy performance and environmental sustainability.

Why is global PUE lingering in near stasis? Has the industry come up against long-term practical limits? The leaps in efficiency that marked the early years of PUE were largely due to the adoption of best practices around air-flow management (e.g., blanking panels and containment systems) and the fine-tuning of controls.

In addition, new builds benefited from the use of better-performing cooling designs (especially optimized compressors and new types of heat rejection) and more efficient power distribution equipment. The combined results of these measures were dramatic: the industry average PUE dropped from 2.5 in 2007 to 1.98 in 2011, reaching 1.65 by 2014 (see Figure 1).

Figure 1. Industry-wide PUE improvement slows, then stops

The business reality of return on investment shapes the long-term trajectory of PUE. Gains from operational improvement and technical innovation naturally taper off due to diminishing returns because the most cost-effective upgrades are performed first. The expense of a major refurbishment in an existing facility is often difficult to justify for the energy savings alone — if the project is feasible at all without risking service disruption. Although PUE gains would naturally have slowed as a result, they should not have come to a full stop.

A closer look at the data indicates they likely have not; shifts in other factors have masked underlying improvements. Uptime’s 2023 survey questionnaire included new and refined questions to understand the industry PUE dynamics better. In the interest of accuracy, the survey primarily asked respondents for the PUE at the specific data center they are most familiar with rather than the largest facility of the operator’s organization. When the two differed, we also asked for the annualized PUE of the latter facility; this methodological change has not meaningfully affected the resulting average PUEs.

The biggest component in the flatlining of the headline PUE number is a richer geographical mix of surveyed data centers. North American and European participants used to dominate Uptime’s annual survey. In 2018, for example, nearly two-thirds of the responses came from these regions. By 2023, this proportion had fallen to less than half, as the survey’s panel was gradually expanded into other territories to take a more global view.

This matters due to differences in climates: ambient conditions across Asia, the Middle East, Africa and Latin America tend to be more taxing on cooling systems, which use more energy to cool the same IT load.

Another factor is that in some regions, particularly in Africa and Latin America, facilities in our sample tend to be smaller in capacity, with many being around 1 megawatt (MW) or less in commissioned UPS system capacity. And smaller sites lend themselves to the use of air-cooled direct expansion air conditioning units, which typically use more energy.

Prevailingly hot and / or humid climates and less efficient cooling systems mean that in the Middle East, Africa and Latin America, average PUE readings are above 1.7 — in contrast, North America and Europe are hovering around 1.5.

Directionally, industry PUEs should see gradual improvement in the coming years. Newer data centers tend to use more efficient cooling designs, leveraging not only leading-edge equipment and optimized controls but also relaxed temperature set points, following ASHRAE’s thermal guidelines. Newer data centers also tend toward larger capacity, which means an even stronger focus on infrastructure energy performance.

Figure 2. Average PUE by age of facility (1 MW and above)

Uptime’s data lends support to this theory. Facilities that are larger than 1 MW and less than 15 years old average around 1.48 PUE globally (see Figure 2). Those data centers built in the past five years are even more promising for energy efficiency and sustainability, with an average global PUE of around 1.45. North America and Europe lead these, as most of the new sites from these regions in the survey achieve annualized PUEs that are better than 1.4. Directionally, this is the level where global average PUEs are headed, but a large number of smaller facilities, arguably less optimized for energy, weigh on averages.


The Uptime Intelligence View

Uptime Institute’s headline PUE figure will improve as IT workloads migrate out of legacy facilities (statistically, those older than 20 years) into new and preferably larger data centers — a process that will take years to deliver incremental improvements to global PUEs. But this is putting too fine a point on PUE itself: the energy performance of digital infrastructure hinges primarily on the efficiency of IT, not facilities. The most effective energy performance and sustainability efforts need to examine the opportunities for savings on both sides of the decimal point.

Emerging regulatory requirements: tactics for riding the tsunami

Over the past 12 months, Uptime Institute Intelligence has been closely following regulatory developments in the area of sustainability. These include mandates based on the Task Force on Climate-related Financial Disclosures (TCFD) standard, such as the EU’s Corporate Sustainability Reporting Directive and the UK’s Climate-related Financial Disclosure Regulations, as well as the EU’s Energy Efficiency Directive. These legal frameworks mandate the collection and reporting of corporate and data center-level information on building type and location, energy use, greenhouse gas (GHG) emissions and operating metrics, such as power usage effectiveness (PUE) and work delivered per unit of energy.

Uptime Intelligence expects these types of regulations to propagate globally over the next five years.

Operational regulations such as these are new territory for the data center industry, which has primarily operated outside the thicket of environmental and operational regulations familiar to manufacturing entities. These rules arrive at a time when the physical size and energy and water intensity of recently constructed or proposed hyperscale and colocation campuses have captured public and governmental attention.

As regulations that require data center reporting and operational requirements propagate, digital infrastructure managers and executives are seeking guidance on the steps needed to maintain an effective compliance posture.

First and foremost, managers need to track the development of regulations that may affect operations at each of their facilities. A local, state or national regulation typically takes at least six months, and often two or three years of consultation and discussion, before it becomes law. Staff members need to be assigned to track regulatory and legislative developments and prepare facilities for compliance. At a minimum, designated employees should:

  • Track legislative and regulatory developments to anticipate the measurements, metrics and operational performance data that will be needed to comply with the requirements.
  • Identify, plan and initiate projects to implement the necessary measurements, data collection and reporting systems, and energy and water efficiency improvements.
  • Establish key performance indicators with goals to track and improve energy and operational performance to meet or exceed expected regulatory requirements.

Even if there is little or no sustainability legislation planned or expected in any country or region, it is still advisable to track the developments in key regions, such as the EU. Regulations can often be “exported,” and even if they do not become mandatory in a given country, they can become best practice or de facto standards.

To prepare for the expected regulatory mandates, digital infrastructure managers are advised to give as much importance to energy performance (work per megawatt-hour, MWh) and GHG emission reduction metrics and objectives as they do to those governing operational resiliency, reliability and performance. Much of the legislation aimed at reducing energy consumption and carbon emissions is focused on improving these metrics.

Sustainability and operational performance metrics are not mutually exclusive; they can collectively advance both environmental and business performance. The following are some of the key areas that digital infrastructure managers need to address:

  1. Managers are advised to increase the utilization of their IT infrastructure. If managers set and pursue goals to improve average server utilization, servers running enterprise and office applications can achieve average utilizations approaching 50% and run batch jobs at average utilizations approaching 80%. Reaching these utilization levels can enhance business performance by reducing IT capital and operating expenditures. Environmental performance will also be improved through an increase in work delivered per MWh of energy consumed.
  2. IT equipment should operate with power management functions enabled if the workloads can tolerate the slower response times inherent in these functions. Deploying these functions can reduce average server energy use by 10% or more (see The strong case for power management); a simple way to check the current setting is sketched after this list. An Uptime Institute survey of operators indicates that more than 30% of respondents already enable power management on some portion of their server fleet, capturing energy and cost reductions.
  3. Managers are advised to collaborate with their energy provider(s) to establish tactical and strategic plans to increase the carbon-free energy (CFE) consumed by a facility over time. Achieving 100% consumption of CFE will take 5 to 20 years (or more) to come to fruition, depending on the generation assets available in a given market. Collaboration with energy providers enables operators to reduce operational carbon emissions over time in line with government and customer expectations.
  4. Central cooling, IT space and IT equipment infrastructure should be managed and optimized for best energy use, with automated control packages delivering the best results. These systems can improve system energy efficiency by 20% or more by scheduling central cooling units, adjusting IT space cooling delivery and managing workload placement on the IT equipment infrastructure. This will maximize the utilization of these assets while minimizing their energy consumption when delivering the required workloads.
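
As an example of the power management point above (item 2), the minimal sketch below, which assumes a Linux host exposing cpufreq through sysfs, reports the CPU frequency scaling governor in use on each core as a quick first check on whether server power management is active. The paths are standard sysfs locations, but their availability depends on the platform, kernel driver and firmware settings.

    # Minimal sketch, assuming a Linux host with cpufreq exposed via sysfs:
    # summarize the CPU frequency scaling governor per core as a quick check
    # on whether power management is active.
    from collections import Counter
    from pathlib import Path

    def governor_summary() -> Counter:
        """Count scaling governors across CPUs (e.g., 'performance', 'powersave')."""
        counts = Counter()
        pattern = "cpu[0-9]*/cpufreq/scaling_governor"
        for gov_file in Path("/sys/devices/system/cpu").glob(pattern):
            counts[gov_file.read_text().strip()] += 1
        return counts

    if __name__ == "__main__":
        summary = governor_summary()
        if not summary:
            print("No cpufreq data exposed; power management may be disabled in firmware.")
        else:
            for governor, cpus in sorted(summary.items()):
                print(f"{cpus} CPUs using governor '{governor}'")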

Improving the productivity of data center operations to prepare for regulatory requirements and enhance business and environmental performance is an ongoing, multi-year journey. It requires the total commitment of the management team and the IT and facilities operations staff, as well as properly resourced energy efficiency improvement and GHG emissions reduction plans. By taking early action, managers can demonstrate a commitment to sustainability and smooth the path to compliance with emerging regulations.


The Uptime Intelligence View

Over the next decade, regulatory mandates will become a fact of life for the digital infrastructure industry. Managers need to track and anticipate these requirements, then initiate the necessary actions to meet the compliance obligations. Establishing the required business processes, data management and reporting systems, and making operational improvements in an orderly fashion will benefit the business and ensure compliance.