What role might generative AI play in the data center?

Advances in artificial intelligence (AI) are expected to change the way work is done across numerous organizations and job roles. This is especially true for generative AI tools, which are capable of synthesizing new content based on patterns learned from existing data.

This update explores the rise of large language models (LLMs) and generative AI. It examines whether data center managers should be as dismissive of this technology as many appear to be and considers whether generative AI will find a role in data center operations.

This is the first in a series of reports on AI and its impact on the data center sector. Future reports will cover the use of AI in data center management tools, the power and cooling demands of AI systems, training and inference processing, and how the technology will affect core and edge data centers.

Trust in AI is affected by the hype

Data center owners and operators are starting to develop an understanding of the potential benefits of AI in data center management. So long as the underlying models are robust, transparent and trusted, AI is proving beneficial and is increasingly being used in areas such as predictive maintenance, anomaly detection, physical security, and the filtering and prioritization of alerts.

At the same time, a deluge of marketing messages and media coverage is creating confusion around the exact capabilities of AI-based products and services. The Uptime Institute Global Data Center Survey 2023 reveals that managers are significantly less likely to trust AI for their operational decision-making than they were a year ago. This fall in trust coincides with the sudden emergence of generative AI. Conversations with data center managers show that there is widespread caution in the industry.

Machine learning, deep learning and generative AI

Machine learning (ML) is an umbrella term for software techniques that involve training mathematical models on large data sets. Once trained, an ML model can analyze new data and make predictions or solve tasks without being explicitly programmed to do so; this step is called inference (or inferencing).
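
The sketch below is a deliberately tiny illustration of the split between training and inference. It uses ordinary least-squares linear regression as a stand-in for far larger ML models. The function names and data are hypothetical. Real ML frameworks add many more parameters, data pipelines and optimization steps.

```python
from typing import Sequence, Tuple

Model = Tuple[float, float]  # (slope, intercept)


def train(xs: Sequence[float], ys: Sequence[float]) -> Model:
    """Training: estimate slope and intercept from example data (least squares)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


def infer(model: Model, x: float) -> float:
    """Inference: apply the learned parameters to an input not seen in training."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data that follows y = 2x + 1; the rule is never coded explicitly.
model = train([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(infer(model, 10.0))  # prints 21.0
```

The model is never told the rule y = 2x + 1. It recovers the rule from the examples and then applies it to an input it has not seen, and this second step is the inference.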

Deep learning — an advanced approach to ML inspired by the workings of the human brain — makes use of deep neural networks (DNNs) to identify patterns and trends in seemingly unrelated or uncorrelated data.

Generative AI is not a specific technology but a type of application that relies on the latest advances in DNN research. Much of the recent progress in generative AI is due to the transformer architecture, a method of building DNNs introduced by Google researchers in 2017 (originally for machine translation) and later used to create tools such as ChatGPT, which generates text, and DALL-E, which generates images.

Transformers use attention mechanisms to learn the relationships between data points, such as words and phrases, largely without the need for human-labeled data. The architecture processes the elements of a sequence in parallel rather than one at a time. As a result, it improved output accuracy while also reducing the time needed to train generative AI models. This technique kick-started a revolution in applied AI that became one of the key trends of 2023.
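
At the heart of the transformer is scaled dot-product attention. This operation weighs how strongly each element of a sequence relates to every other element. The minimal plain-Python sketch below shows only that computation. Real transformers add learned projection matrices, multiple attention heads and many stacked layers, all of which are omitted here. The toy token vectors are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V, row by row."""
    d_k = len(k[0])
    output = []
    for q_row in q:
        # How strongly this query matches every key.
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d_k) for k_row in k]
        weights = softmax(scores)
        # Blend the value vectors according to those weights.
        output.append([sum(w * v_row[j] for w, v_row in zip(weights, v))
                       for j in range(len(v[0]))])
    return output


# Self-attention over three hypothetical 2-D token embeddings (Q = K = V).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in attention(tokens, tokens, tokens):
    print([round(x, 3) for x in row])
```

Each output row is a weighted blend of all the value vectors. The weights are not hand-coded rules. They come from how similar the tokens are to one another.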

Transformer-based LLMs, such as those behind ChatGPT, are trained on hundreds of gigabytes of text and can generate essays, scholarly or journalistic articles and even poetry. However, they face an issue that differentiates them from other types of AI and prevents them from being embraced in a mission-critical setting: they cannot guarantee the accuracy of their output.

The good, the bad and the impossible

With the growing awareness of the benefits of AI comes greater knowledge of its limitations. One of the key limitations of LLMs is their tendency to “hallucinate,” providing false information in a convincing manner. These models are not looking up facts. They are pattern-spotting engines that predict the most likely next word (strictly, the next token) in a sequence.
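
A hypothetical toy makes the point concrete. The bigram model below is far simpler than a transformer-based LLM. It always appends the word that most often followed the previous word in its training text. The maintenance-log corpus is invented for illustration. The model still produces a fluent statement that appears nowhere in that text, because it tracks word patterns rather than facts.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, List

Bigrams = DefaultDict[str, Counter]


def train_bigrams(corpus: List[str]) -> Bigrams:
    """Count how often each word follows each other word in the training text."""
    follows: Bigrams = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1
    return follows


def generate(follows: Bigrams, start: str, max_words: int) -> str:
    """Greedily append the most frequent next word; no source of facts is consulted."""
    words = [start]
    for _ in range(max_words):
        candidates = follows.get(words[-1])
        if not candidates:
            break  # nothing ever followed this word in training
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)


# Hypothetical maintenance-log snippets used as training text.
corpus = [
    "chiller 1 was serviced in may",
    "pump 1 was replaced in june",
    "pump 2 was replaced in june",
]
model = train_bigrams(corpus)
print(generate(model, "chiller", 5))  # prints: chiller 1 was replaced in june
```

Nothing in the training text says chiller 1 was replaced. That claim is simply the statistically likeliest continuation, which is the essence of a hallucination.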

This has led to some much-publicized news stories about early adopters of tools like ChatGPT landing themselves in trouble because they relied on generative AI output that contained factual errors. Such stories have likely contributed to the erosion of trust in AI as a tool for data center management, even though these types of issues are specific to generative AI.

It is a subject of debate among researchers and academics whether hallucinations can be eliminated entirely from the output of generative AI, but their prevalence can certainly be reduced.

One approach, called grounding, ties LLM output to web search results or other reliable data sources. It does this either by supplying relevant retrieved information to the model or by automatically cross-checking the model’s output against those sources. Another approach, called process supervision, rewards a model during training for each correct step of its reasoning rather than only for reaching the right final answer.

Finally, there is the creation of domain-specific LLMs. These can either be built from scratch using data sourced from specific organizations or industry verticals, or created through the fine-tuning of generic or “foundational” models to perform well-defined, industry-specific tasks. Domain-specific LLMs are much better at understanding jargon and are less likely to hallucinate when used in a professional setting because they have not been designed to cater to a wide variety of use cases.

The propensity to provide false information with confidence likely disqualifies generative AI tools from ever taking part in operational decision-making in the data center — this is better handled by other types of AI or traditional data analytics. However, there are other aspects of data center management that could be enhanced by generative AI, albeit with human supervision.

Generative AI and data center management

First, generative AI can be extremely powerful as an outlining tool for creating first-pass documents, models, designs and even calculations. For this reason, it will likely find a place in those parts of the industry that are concerned with the creation and planning of data centers and operations. However, the limitations of generative AI mean that its accuracy can never be assumed or guaranteed, and that human expertise and oversight will still be required.

Generative AI also has the potential to be valuable in certain management and operations activities within the data center as a productivity and administrative support tool.

The Uptime Institute Data Center Resiliency Survey 2023 reveals that 39% of data center operators have experienced a serious outage because of human error, of which 50% were the result of a failure to follow the correct procedures. To mitigate these issues, generative AI could be used to support the learning and development of staff with different levels of experience and knowledge.

Generative AI could also be used to create and update methods of procedure (MOPs), standard operating procedures (SOPs) and emergency operating procedures (EOPs), which can often get overlooked due to time and management pressures. Other examples of potential applications of generative AI include the creation of:

  • Technical user guides and operating manuals that are pertinent to the specific infrastructure within the facility.
  • Step-by-step maintenance procedures.
  • Standard information guides for new employee and / or customer onboarding.
  • Recruitment materials.
  • Risk awareness information and updates.
  • Q&A resources that can be updated as required.

In our conversations with Uptime Institute members and other data center operators, some said they used generative AI for purposes like these — for example, when summarizing notes made at industry events and team meetings. These members agreed that LLMs will eventually be capable of creating documents like MOPs, SOPs and EOPs.

It is crucial to note that in these scenarios, AI-based tools would be used to draft documents that would be checked by experienced data center professionals before being used in operations. The question is whether the efficiencies gained in the process of using generative AI offset the risks that necessitate human validation.

The Uptime Intelligence View

There is considerable confusion around AI. The first point to understand is that ML employs several techniques, and many of the analytics methods and tools can be very useful in a data center setting. But it is generative AI that is causing most of the stir. Given the number of hurdles and uncertainties, data center managers are right to be skeptical about how far they can trust generative AI to provide actionable intelligence in the data center.

That said, it does have very real potential in management and operations, supporting managers and teams in their training, knowledge and documentation of operating procedures. AI — in many forms — is also likely to find its way into numerous software tools that play an important supporting role in data center design, development and operations. Managers should track the technology and understand where and how it can be safely used in their facilities.

John O’Brien, Senior Research Analyst, Uptime Institute

Max Smolaks, Research Analyst, Uptime Institute

Looking for the x-factor in data center efficiency

The suitability of a data center environment is primarily judged by its effect on the long-term health of IT hardware. Facility operators define their temperature and humidity set points with a view to balancing hardware failure rates against the associated capital and operational expenditures, with the former historically prioritized.

Over the past decade, this balance has shifted in favor of facility efficiencies as data center operators have gradually lifted their temperature (and humidity) limits from a typical 65°F to 68°F (18°C to 20°C) to somewhere between 72°F and 77°F (22°C and 25°C). However, many operators remain reluctant to further relax their temperature settings even for new data center designs, let alone their existing footprint.

This conservative stance goes against the empirical evidence. Industry body ASHRAE issued its failure rate guidance, called x-factor, almost a decade ago, yet it remains underused in key data center planning and design decisions. As a result, data centers have larger, and thus more expensive, cooling systems that use more energy than the data justifies.

Operators (and their tenants) that do follow x-factor in their cooling design and control strategy may reap the benefits not only in lower construction costs and energy savings but also in the possibility of fewer IT hardware failures. This Uptime Intelligence Update offers an overview of what x-factor is, what it indicates and what cooling optimization opportunities it presents.

Navigating x-factor data

ASHRAE and IT vendors tend to agree that it is best to operate hardware well below the specified temperature limits as standard. This is both to extract the maximum performance and to minimize the thermal stress on components — heat accelerates wear, which ultimately increases the likelihood of failures. Hard disk drives, with their moving parts, are the most prone to failure under heat stress.

X-factor is a dimensionless quantity that estimates how the annualized likelihood of IT hardware component failures changes with inlet temperature. The proportional change to failure rates assumes continuous operation at a given temperature compared with the baseline of 68°F (20°C), a temperature set point historically preferred by many operators. Based on empirical data from IT vendors and published in 2014, x-factor comprises three data series:

  • Average x-factor numbers, which reflect a typical server hardware configuration.
  • Upper-bound numbers, which reflect the server hardware configurations that are more prone to temperature-related failures. For example, less heat-resistant system designs or those with a lot of hard disk drives.
  • Lower-bound numbers, which reflect the server hardware configurations that are less likely than the average to develop failures. For example, those designs that are optimized for airflow and heat distribution or are diskless.

Figure 1 depicts the correlation between the likelihood of failure and inlet temperature up to 90.5°F (32.5°C), which is just above the upper limit of ASHRAE’s allowable envelope for Class A1 data center environments. These x-factor numbers represent changes in the likelihood of annualized IT component failures when hardware is operated continuously at the chosen temperature.

Figure 1. X-factor data visualized up to 90.5°F (32.5°C)

For example, x-factor data indicates that when IT hardware operates at a constant 77°F (25°C) to reduce cooling energy needs, the annualized component failure rates will likely increase anywhere between 4% and 43% (midpoint 24%) when compared with the baseline at 68°F (20°C). This elevated x-factor quantifies the outcome of the accelerated wear on the hardware.

These failures do not necessarily represent a complete failure of an IT system. They cover any type of hardware failure, including failures of redundant components. Some failures will manifest themselves as system instability and restarts. It is worth noting that modern IT systems are typically designed and warranted to operate in temperatures up to 95°F (35°C) as standard, although this may depend on the configuration.

Giving x-factor dimension: failure rate scenarios

Unfortunately, a key piece of information is missing that affects the trade-off calculations: the actual baseline failure rate before x-factor is applied. For business confidentiality reasons, IT vendors do not publish their respective field data, not even their calculated figures based on accelerated aging tests, such as the mean time between failures or failures in time. Based on what Uptime Intelligence was able to gather from the limited disclosures and academic publications, the following information can be surmised:

  • The best, most durable IT system designs may have an annualized baseline failure rate of around 2% or less. This is due to good thermal management, robust build quality and the selection of components based on their durability.
  • The difference in annualized failure rates between the most and least durable hardware is probably a factor of three or four (i.e., 6% to 8% of IT systems that develop a component failure in a year).
  • Unsurprisingly, hard disk drives are the most common source of failure. In most cases, these do not result in a system outage or data loss due to the ubiquitous use of data redundancy techniques.

As an example, assume a 4% baseline failure rate. Operating constantly at 77°F (25°C) will then increase typical IT system component failures by approximately 1 percentage point. X-factor’s average, upper bound and lower bound series also indicate that not all IT hardware reacts to higher temperatures in the same way. At 77°F (25°C), the increase is 1.16 percentage points for the upper bound and 0.64 percentage point for the lower bound. Each of these figures is measured against that series’ own value at 68°F (20°C).

If a facility were to operate constantly at 81.5°F (27.5°C), which is just above the upper limit of ASHRAE’s recommended envelope, the typical annualized likelihood of failure increases by 1.36 percentage points, with the upper bound adding 1.6 percentage points and the lower bound adding just under 1 percentage point. Figure 2 shows these x-factor correlations for a 4% annualized failure rate baseline scenario.
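
The percentage-point figures above can be reproduced with a simple calculation. Multiply the 4% baseline by the change in each x-factor series between 68°F (20°C) and the target temperature, comparing each series with its own value at 68°F. The sketch below assumes x-factor values back-calculated from those figures. For example, the upper bound at 68°F must be 1.14, since 4% × (1.43 - 1.14) ≈ 1.16 percentage points. These values are believed to match ASHRAE’s published table but should be checked against it before reuse.

```python
# Assumed x-factor values (average series at 68°F / 20°C = 1.00). The 25°C
# values match the 4%-43% range quoted earlier; the others are inferred from
# the percentage-point figures and should be verified against ASHRAE's table.
X_FACTORS = {
    20.0: {"lower": 0.88, "average": 1.00, "upper": 1.14},
    25.0: {"lower": 1.04, "average": 1.24, "upper": 1.43},
    27.5: {"lower": 1.12, "average": 1.34, "upper": 1.54},
}
BASELINE_TEMP_C = 20.0


def added_failures_pp(baseline_pct: float, temp_c: float, series: str) -> float:
    """Percentage-point rise in annualized failures versus the same series at 20°C."""
    delta_x = X_FACTORS[temp_c][series] - X_FACTORS[BASELINE_TEMP_C][series]
    return baseline_pct * delta_x


for temp_c in (25.0, 27.5):
    for series in ("lower", "average", "upper"):
        pp = added_failures_pp(4.0, temp_c, series)
        print(f"{temp_c}°C, {series}: +{pp:.2f} percentage points")
```

You can substitute 2.0 or 6.0 for the 4.0 baseline to explore the other baseline scenarios discussed for Figure 3.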

Figure 2. Upper, average and lower bound annualized failure rates at a 4% baseline

Figure 3 applies the upper bound x-factor to three example baseline rates (2%, 4% and 6%) to show the likelihood of annualized IT component failures at a given constant operating temperature.

Figure 3. Upper bound failure rate scenarios at 2%, 4% and 6% baselines

Arguably, these are marginal increases. Yet, it is not a complete picture. As servers age, failure rates escalate — anecdotally, by the fourth year of a server’s life span there is often a noticeable increase. This is largely accounted for by the mechanical components (i.e., motors and actuators) in hard disk drives wearing out in greater numbers and, to a lesser extent, instability due to voltage sags and other issues with embedded power components. Uptime Intelligence’s data shows that IT operators are retaining servers for longer than before, with a five-year average refresh cycle — a substantial number of servers are now older than seven years.

When running many aging servers, particularly the older legacy systems found in some larger enterprise IT environments, the chances of a marked increase in failures will likely deter operators from pursuing aggressive operating temperature targets, regardless of the energy efficiency benefits.

For new builds (or full refurbishments) with largely or fully new IT systems, the scale tilts toward designing for an elevated upper temperature set point to optimize energy and water consumption, while reducing mechanical equipment sizes and the associated power infrastructure to save on capital costs.

Temperature excursions: leaner cooling, not more failures

The true potential of using x-factor data in design decisions, however, lies not in constant warmer operation but in allowing the data hall’s temperature to vary across a wider range within the allowable operating envelope. The business case for this is not built on further savings on cooling energy; those have largely been captured already by operating closer to the upper limits of the recommended envelope. In temperate climates, numerous data center facilities have already achieved annualized PUEs of around 1.2 or better by taking advantage of the recommended envelope and economization.

Adopting ASHRAE’s allowable envelope (Class A1, up to 89.6°F / 32°C) for excursions into higher temperatures allows for a much leaner cooling and power infrastructure, including the downsizing, or even elimination, of mechanical refrigeration capacity. This is not only less expensive for the same IT capacity but may also reduce both maintenance tasks and the risk of cooling component failures.

A further boon, which is important for sites that are limited by their substation power envelope, is that more electrical capacity can be allocated to the IT load instead of reserving it for peak cooling energy needs (when the data center is fully utilized during the worst-case climatic conditions assumed for the facility design).

What x-factor data reveals is that controlled, short-duration temperature increases may have an effect on IT failure rates, but that impact is immaterial from a financial / operational perspective even if it is statistically significant (not easily explained by random chance). This is because the limited number of hours spent in those further elevated temperatures will only marginally accelerate component wear.

Take a facility located in Washington DC as an example, assuming the primary use of evaporative or adiabatic economization for cooling. The wet-bulb temperature is the lowest temperature to which air can be cooled by evaporating water into it. Closely tracking wet-bulb conditions, at a delta of 7.2°F (4°C) or less, would have an imperceptible effect on the likelihood of IT failures compared with operating at a constant 77°F (25°C). This is because the data hall would spend an average of only 320 hours a year above 77°F (25°C). That is less than 3.7% of the 8,760 hours in a year, which limits the influence of the elevated x-factor.
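
The arithmetic behind this example can be sketched as a time-weighted average of x-factor over the hours of a year. As we understand it, this is how annualized x-factors are derived. The temperature-hour split below is hypothetical and deliberately pessimistic. It places every non-excursion hour at 77°F (25°C) and all 320 excursion hours at 81.5°F (27.5°C). It uses average x-factors of 1.24 and 1.34, the values implied by the figures quoted earlier.

```python
from typing import Dict

HOURS_PER_YEAR = 8760


def annualized_x_factor(hours_at_temp: Dict[float, float], x_at_temp: Dict[float, float]) -> float:
    """Time-weighted average x-factor across the temperature bins of a year."""
    total_hours = sum(hours_at_temp.values())
    weighted = sum(hours * x_at_temp[temp] for temp, hours in hours_at_temp.items())
    return weighted / total_hours


excursion_hours = 320
print(f"Share of year above 25°C: {excursion_hours / HOURS_PER_YEAR:.2%}")  # 3.65%

# Hypothetical, pessimistic split: all other hours sit exactly at 25°C.
hours = {25.0: HOURS_PER_YEAR - excursion_hours, 27.5: excursion_hours}
x_factors = {25.0: 1.24, 27.5: 1.34}  # assumed average-series values
annual_x = annualized_x_factor(hours, x_factors)
print(f"Annualized x-factor: {annual_x:.4f} vs 1.2400 at a constant 25°C")
```

At a 4% baseline, the difference of roughly 0.004 in x-factor amounts to only about 0.015 percentage point of additional failures.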

Since the climate of Washington DC is not exceptional, this approach should be broadly applicable. The likelihood of component failures would increase only marginally in most of the popular data center locations across the world if operators let temperatures rise to the allowable limit of Class A1.

But letting temperatures rise for a leaner cooling and power infrastructure is only half the x-factor story. Opening up the lower set point can effectively bring x-factor down. Allowing the temperature to fall to track ambient conditions would help reduce failure rates, often below the baseline rate. This is because the facility could spend a great number of hours, sometimes the majority of the year, in temperatures around or below the baseline temperature of 68°F (20°C).

In the 2015 revision of its thermal guidelines (updated in 2021), ASHRAE estimated the annualized x-factors for several US and global locations, calculating multiple scenarios for several types of cooling systems. What these scenarios have in common is that they all assume the use of economizers that track ambient dry- or wet-bulb conditions within the allowable ranges.

The results strongly indicated that, in many climates, adopting wider temperature ranges to take advantage of ambient conditions would deliver substantial benefits in IT failure rates compared with a constant operating temperature, such as 77°F (25°C). The gains from decelerated wear during cooler hours often more than offset the acceleration effect of excursions toward the upper end of the allowable range. This means IT component failures will likely come down in most cases.
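
The same time-weighted averaging shows why wider bands can lower failure rates. In the hypothetical year below, most hours fall at or below 68°F (20°C), where x-factor is at or below 1.0. The 0.72 value for 59°F (15°C) is an assumption believed to approximate ASHRAE’s average series, and the hour counts are invented. The point is only the direction of the result.

```python
from typing import Dict

HOURS_PER_YEAR = 8760


def annualized_x_factor(hours_at_temp: Dict[float, float], x_at_temp: Dict[float, float]) -> float:
    """Time-weighted average x-factor across the temperature bins of a year."""
    total_hours = sum(hours_at_temp.values())
    return sum(h * x_at_temp[t] for t, h in hours_at_temp.items()) / total_hours


# Assumed average-series x-factors (68°F / 20°C = 1.00); the 15°C value is illustrative.
x_factors = {15.0: 0.72, 20.0: 1.00, 25.0: 1.24, 27.5: 1.34}

# Hypothetical wide-band year: many cool hours, some warm excursions.
wide_band = {15.0: 4000, 20.0: 2500, 25.0: 1900, 27.5: 360}
assert sum(wide_band.values()) == HOURS_PER_YEAR

constant_25c = {25.0: HOURS_PER_YEAR}
print(f"Wide band:     {annualized_x_factor(wide_band, x_factors):.3f}")
print(f"Constant 25°C: {annualized_x_factor(constant_25c, x_factors):.3f}")
```

The excursion hours push the wide-band figure up, but the many cool hours pull it further down. The net annualized x-factor lands below 1.0, and well below the constant 77°F (25°C) case.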

Yet despite these benefits, the adoption of wide temperature bands (as opposed to tightly controlling for a targeted set point) remains sporadic, even for new data center builds. The practice is only relatively common in some large technical computing applications and cloud facilities that opt for direct outside air systems.

In upcoming reports, Uptime Intelligence will dive deeper into how x-factor calculations and IT hardware thermal trends inform facility cooling strategies for future builds and refurbishments.

The Uptime Intelligence View

ASHRAE’s x-factor data indicates that the data center industry by and large is still overprotective of IT hardware. This comes at great expense in capital expenditures and energy costs, and despite growing pressure on sustainability credentials. Taking full advantage of advanced cooling systems and control strategies promises more than energy efficiency. It also offers a leaner cooling infrastructure, possibly lower IT failure rates and stronger sustainability credentials for the business. Despite the complexities involved in engineering (and in departing from past practices), this optimization opportunity is too big to ignore.

Global PUEs — are they going anywhere?

Regular readers of Uptime Institute’s annual data center survey, the longest running of its kind, already know that the industry average power usage effectiveness (PUE, the ratio of total site power to IT power) has trended sideways in recent years. Since 2020, it has been stuck in the 1.55 to 1.59 band.
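
Because PUE is a simple ratio, it is easy to compute from metered data. The sketch below uses annual energy totals, in line with the annualized PUEs the survey asks for. The meter readings are hypothetical.

```python
def pue(total_facility_mwh: float, it_mwh: float) -> float:
    """Power usage effectiveness: total site energy divided by IT energy."""
    if it_mwh <= 0:
        raise ValueError("IT energy must be positive")
    return total_facility_mwh / it_mwh


# Hypothetical annual readings for one site.
site_pue = pue(total_facility_mwh=15_700, it_mwh=10_000)
print(f"PUE: {site_pue:.2f}")  # PUE: 1.57
# Every 1 MWh delivered to IT costs (PUE - 1) MWh of overhead (cooling, power losses, etc.).
print(f"Overhead per MWh of IT: {site_pue - 1:.2f} MWh")
```

A PUE of 1.0 would mean that every unit of energy reaches the IT equipment. The band of roughly 1.55 to 1.59 therefore implies more than half a unit of facility overhead for each unit of IT energy.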

Even going back further, we would see only a trivial improvement in the headline number. For many data center industry watchers, including regulators, this flatlining has become a cause for concern — an apparent lack of advancement raises questions about operators’ commitment to energy performance and environmental sustainability.

Why is global PUE lingering in near stasis? Has the industry come up against long-term practical limits? The leaps in efficiency that marked the early years of PUE were largely due to the adoption of best practices around airflow management (e.g., blanking panels and containment systems) and the fine-tuning of controls.

In addition, new builds benefited from the use of better-performing cooling designs (especially optimized compressors and new types of heat rejection) and more efficient power distribution equipment. The combined results of these measures were dramatic: the industry average PUE dropped from 2.5 in 2007 to 1.98 in 2011, reaching 1.65 by 2014 (see Figure 1).

Figure 1. Industry-wide PUE improvement slows, then stops

The business reality of return on investment shapes the long-term trajectory of PUE. Gains from operational improvement and technical innovation naturally taper off due to diminishing returns because the most cost-effective upgrades are performed first. The expense of a major refurbishment in an existing facility is often difficult to justify for the energy savings alone — if the project is feasible at all without risking service disruption. Although PUE gains would naturally have slowed as a result, they should not have come to a full stop.

A closer look at the data indicates they likely have not; shifts in other factors have masked the underlying improvements. Uptime’s 2023 survey questionnaire included new and refined questions to understand the industry PUE dynamics better. In the interest of accuracy, the survey primarily asked respondents for the PUE at the specific data center they are most familiar with rather than the largest facility of the operator’s organization. When the two differed, we also asked for the annualized PUE of the latter facility; this methodological change has not meaningfully affected the resulting average PUEs.

The biggest factor in the flatlining of the headline PUE number is a richer geographical mix of surveyed data centers. North American and European participants used to dominate Uptime’s annual survey. In 2018, for example, nearly two-thirds of the responses came from these regions. By 2023, this proportion had fallen to less than half, as the survey’s panel was gradually expanded into other territories to take a more global view.

This matters due to differences in climates: ambient conditions across Asia, the Middle East, Africa and Latin America tend to be more taxing on cooling systems, which use more energy to cool the same IT load.

Another factor is that in some regions, particularly in Africa and Latin America, facilities in our sample tend to be smaller in capacity, with many being around 1 megawatt (MW) or less in commissioned UPS system capacity. And smaller sites lend themselves to the use of air-cooled direct expansion air conditioning units, which typically use more energy.

Prevailingly hot and / or humid climates and less efficient cooling systems mean that in the Middle East, Africa and Latin America, average PUE readings are above 1.7. In contrast, North America and Europe are hovering around 1.5.
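
A simple weighted average shows how a changing geographical mix can mask real improvements. The sketch below is a hypothetical two-region simplification. The response shares loosely follow the proportions cited above (nearly two-thirds in 2018, less than half in 2023). The regional PUEs are rounded illustrations of the roughly 1.5 and above-1.7 levels, not survey results.

```python
from typing import Dict


def survey_average_pue(shares: Dict[str, float], pue_by_region: Dict[str, float]) -> float:
    """Average PUE of a survey sample, weighted by each region's share of responses."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(share * pue_by_region[region] for region, share in shares.items())


# Hypothetical two-region simplification.
shares_2018 = {"North America + Europe": 2 / 3, "rest of world": 1 / 3}
shares_2023 = {"North America + Europe": 0.45, "rest of world": 0.55}

pue_then = {"North America + Europe": 1.50, "rest of world": 1.72}
pue_now = {"North America + Europe": 1.46, "rest of world": 1.68}  # both improved

print(f"2018 mix, earlier PUEs:  {survey_average_pue(shares_2018, pue_then):.3f}")
print(f"2023 mix, improved PUEs: {survey_average_pue(shares_2023, pue_now):.3f}")
```

In this illustration, every region improves by 0.04. Yet the sample-wide average still edges up, because a larger share of responses now comes from regions with more taxing climates.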

Directionally, industry PUEs should see gradual improvement in the coming years. Newer data centers tend to use more efficient cooling designs, leveraging not only leading-edge equipment and optimized controls but also relaxed temperature set points, following ASHRAE’s thermal guidelines. Newer data centers also tend toward larger capacity, which means an even stronger focus on infrastructure energy performance.

Figure 2. Average PUE by age of facility (1 MW and above)

Uptime’s data lends support to this theory. Facilities that are larger than 1 MW and less than 15 years old average around 1.48 PUE globally (see Figure 2). Those data centers built in the past five years are even more promising for energy efficiency and sustainability, with an average global PUE of around 1.45. North America and Europe lead these, as most of the new sites from these regions in the survey achieve annualized PUEs that are better than 1.4. Directionally, this is the level where global average PUEs are headed, but a large number of smaller facilities, arguably less optimized for energy, weigh on averages.

The Uptime Intelligence View

Uptime Institute’s headline PUE figure will move together with IT workloads migrating out of legacy (statistically older than 20 years) facilities into new and preferably larger data centers — a process that will take years to deliver incremental improvements to global PUEs. But this is putting too fine a point on PUE itself: the energy performance of digital infrastructure hinges primarily on the efficiency of IT, not facilities. The most effective energy performance and sustainability efforts need to examine the opportunities for savings on both sides of the decimal point.

Emerging regulatory requirements: tactics for riding the tsunami

Over the past 12 months, Uptime Institute Intelligence has been closely following regulatory developments in the area of sustainability. These include mandates based on the Task Force on Climate-related Financial Disclosures (TCFD) framework, such as the EU’s Corporate Sustainability Reporting Directive and the UK’s Climate-related Financial Disclosure Regulations, and the EU’s Energy Efficiency Directive. These legal frameworks mandate the collection and reporting of corporate and data center-level information on building type and location, energy use, greenhouse gas (GHG) emissions and operating metrics, such as power usage effectiveness (PUE) and work delivered per unit of energy.

Uptime Intelligence expects these types of regulations to propagate globally over the next five years.

Operational regulations such as these are new territory for the data center industry, which has primarily operated outside the thicket of environmental and operational regulations familiar to manufacturing entities. These rules arrive at a time when the physical size and energy and water intensity of recently constructed or proposed hyperscale and colocation campuses have captured public and governmental attention.

As regulations imposing data center reporting and operational requirements propagate, digital infrastructure managers and executives are seeking guidance on the steps needed to maintain an effective compliance posture.

First and foremost, managers need to track the development of regulations that may affect operations at each of their facilities. A local, state or national regulation typically requires at least six months, and often two or three years, of consultation and discussion before it becomes law. Staff members need to be assigned to track regulatory and legislative developments and prepare facilities for compliance. At a minimum, designated employees should:

  • Track legislative and regulatory developments to anticipate the measurements, metrics and operational performance data that will be needed to comply with the requirements.
  • Identify, plan and initiate projects to implement the necessary measurements, data collection and reporting systems, and energy and water efficiency improvements.
  • Establish key performance indicators with goals to track and improve energy and operational performance to meet or exceed expected regulatory requirements.

Even if there is little or no sustainability legislation planned or expected in any country or region, it is still advisable to track the developments in key regions, such as the EU. Regulations can often be “exported,” and even if they do not become mandatory in a given country, they can become best practice or de facto standards.

To prepare for the expected regulatory mandates, digital infrastructure managers are advised to give as much importance to energy performance (work per megawatt-hour, MWh) and GHG emission reduction metrics and objectives as they do to those governing operational resiliency, reliability and performance. Much of the legislation aimed at reducing energy consumption and carbon emissions is focused on improving these metrics.

Sustainability and operational performance metrics are not mutually exclusive; they can collectively advance both environmental and business performance. The following are some of the key areas that digital infrastructure managers need to address:

  1. Managers are advised to increase the utilization of their IT infrastructure. If managers set and pursue goals to improve average server utilization, servers running enterprise and office applications can achieve average utilizations approaching 50% and run batch jobs at average utilizations approaching 80%. Reaching these utilization levels can enhance business performance by reducing IT capital and operating expenditures. Environmental performance will also be improved through an increase in work delivered per MWh of energy consumed.
  2. IT equipment should operate with power management functions enabled if the workloads can tolerate the slower response times that are inherent in the functions. Deploying these functions can reduce average server energy use by 10% or more (see The strong case for power management). An Uptime Institute survey of operators indicates that more than 30% of respondents already enable power management on some portion of their server fleet, capturing energy and cost reductions.
  3. Managers are advised to collaborate with their energy provider(s) to establish tactical and strategic plans to increase the carbon-free energy (CFE) consumed by a facility over time. Achieving 100% CFE consumption will take 5 to 20 years (or more), depending on the generation assets available in a given market. Collaboration with energy providers enables operators to reduce operational carbon emissions over time in line with government and customer expectations.
  4. Central cooling, IT space and IT equipment infrastructure should be managed and optimized for best energy use, with automated control packages delivering the best results. These systems can improve system energy efficiency by 20% or more by scheduling central cooling units, adjusting IT space cooling delivery and managing workload placement on the IT equipment infrastructure. This will maximize the utilization of these assets while minimizing their energy consumption when delivering the required workloads.
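The link between server utilization and work per MWh (points 1 and 2) can be illustrated with a simple model. The sketch below assumes that a server draws a fixed idle power plus a share of its dynamic range proportional to utilization, and that useful work scales linearly with utilization. The 100 W idle and 250 W peak figures, the work capacity, and the flat energy-saving factor standing in for power management are all hypothetical values, not measurements.

```python
def server_power_w(utilization: float, idle_w: float = 100.0, max_w: float = 250.0) -> float:
    """Assumed linear power model: idle draw plus a share of the dynamic range."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return idle_w + (max_w - idle_w) * utilization


def work_per_kwh(
    utilization: float,
    capacity_per_hour: float = 1000.0,
    energy_saving: float = 0.0,
) -> float:
    """Work delivered per kWh, assuming work scales linearly with utilization.

    energy_saving is a hypothetical flat fractional reduction in server energy,
    standing in for the effect of power management (e.g., 0.10 for 10%).
    """
    if not 0.0 <= energy_saving < 1.0:
        raise ValueError("energy_saving must be in [0, 1)")
    work_per_hour = capacity_per_hour * utilization
    kwh_per_hour = server_power_w(utilization) / 1000.0 * (1.0 - energy_saving)
    return work_per_hour / kwh_per_hour


if __name__ == "__main__":
    for u in (0.2, 0.5, 0.8):
        print(f"utilization {u:.0%}: {work_per_kwh(u):,.0f} work units per kWh")
    # Same 50% utilization with an assumed 10% saving from power management.
    print(f"50% with power management: {work_per_kwh(0.5, energy_saving=0.10):,.0f}")
```

Under these assumed figures, work per kWh rises from roughly 1,540 units at 20% utilization to roughly 2,860 at 50% and 3,640 at 80%, because the idle power is spread across more work. Real servers are not perfectly linear, and the benefit of power management depends heavily on the workload.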

Improving the productivity of data center operations to prepare for regulatory requirements and enhance business and environmental performance is an ongoing, multi-year journey. It requires the total commitment of the management team and the IT and facilities operations staff, as well as adequately resourced plans for energy efficiency improvement and GHG emissions reduction. By taking early action, managers can demonstrate a commitment to sustainability and smooth the path to compliance with emerging regulations.

The Uptime Intelligence View

Over the next decade, regulatory mandates will become a fact of life for the digital infrastructure industry. Managers need to track and anticipate these requirements, then initiate the necessary actions to meet the compliance obligations. Establishing the required business processes, data management and reporting systems, and making operational improvements in an orderly fashion will benefit the business and ensure compliance.

Data centers are short-staffed boys’ clubs

Two persistent trends in data center staffing are in apparent tension. The 2023 Uptime Institute Global Data Center Survey confirmed, once again, that operations teams are struggling to attract and retain qualified staff. The severity of this shortage should justify aggressive hiring from all available labor sources — yet data centers still employ shockingly few women. The average proportion of female employees at respondents’ organizations is just 8% — lower than in many physically demanding, conventionally male-dominated industries, such as construction, mining and manufacturing.

In the Uptime Institute Global Data Center Survey 2023 report, Uptime details the staffing shortage that has frustrated the data center industry for more than a decade. Operator responses show that the past four years have been particularly trying for the sector. About half of the survey’s respondents reported difficulty in filling open job positions, and one in four have seen their staff hired away, with most being poached by competitors. Skills shortages affect virtually all job roles but are the most acute among operations, mechanical and electrical staff.

While researching the staffing demands of data center design, build, and operations teams, Uptime has collected gender data since 2018. Figure 1 shows that the majority (81%) of data center teams are overwhelmingly male, with women making up one in 10 workers or fewer. On average, women make up a meager 8% of team members. This year, Uptime calculated this weighted average for the first time, and five years of data reveal no significant change.
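The 8% figure is described as a weighted average. The sketch below shows why weighting matters, assuming (as an illustration only) that each team is weighted by its headcount; Uptime's exact weighting method is not described here, and the team data are invented, not survey results.

```python
from typing import Sequence, Tuple

Team = Tuple[int, int]  # (number of women, total team size)


def simple_mean_share(teams: Sequence[Team]) -> float:
    """Unweighted mean of each team's female share (every team counts equally)."""
    if not teams or any(size <= 0 for _, size in teams):
        raise ValueError("need at least one team, each with positive size")
    return sum(women / size for women, size in teams) / len(teams)


def weighted_share(teams: Sequence[Team]) -> float:
    """Headcount-weighted share: all women divided by all staff across teams."""
    total_staff = sum(size for _, size in teams)
    if total_staff <= 0:
        raise ValueError("need at least one staff member")
    return sum(women for women, _ in teams) / total_staff


# Hypothetical teams for illustration, not survey data.
example_teams = [(1, 20), (0, 15), (12, 100), (2, 10)]

if __name__ == "__main__":
    print(f"simple mean:   {simple_mean_share(example_teams):.1%}")
    print(f"weighted mean: {weighted_share(example_teams):.1%}")
```

In this invented example, the simple mean of per-team shares (about 9%) is lower than the headcount-weighted share (about 10%), because the largest team, which carries the most weight in the weighted figure, has an above-average share of women.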

Figure 1. Data center teams lack gender diversity

In the 2019 survey questions on hiring initiatives, good intentions abounded. Almost three in four respondents agreed the industry would benefit from hiring more women, and 45% said the gender imbalance was a threat to the industry, contributing to the dry talent pipeline and even technical stagnation. Nearly half of respondents were planning initiatives to hire more women or had such programs already in place. These results suggested a will to invest in hiring women, and an expectation of that investment paying off.

Four years on, the staff shortfall looms large in keynote presentations and data halls alike. So, where are the women? Did the time and money invested in hiring initiatives fail to attract female candidates, or were these plans and initiatives quietly shelved? Regardless of the causes, the gender divide persists, and the benefits of mitigating it remain hypothetical.

Identifying the most relevant influences on the gender balance of the workforce is difficult when the data show no movement over time, but some explanations can be ruled out by comparison with other industries.

Anecdotally, women are disinclined to pursue manual labor, and labor statistics confirm that physically demanding job roles typically attract fewer women. However, this alone cannot account for the disparity in the data halls. Uptime’s findings suggest that data center operations teams employ proportionally fewer women than manufacturing (29%), mining (15%) and construction (11%), based on figures for those industries from a 2021 publication by the US Bureau of Labor Statistics (Employment and Wages Online Annual Averages). The construction workforce in western Europe is about 9% women, according to government statistical offices, and the International Energy Agency reports at least 10% women in the mining industry in all regions.

In occupational health and safety research, women describe construction sites as hostile workplaces. Women endure isolation, sexual harassment, and bullying, as well as injuries from tools and personal protective equipment designed for the average size, weight, and grip strength of a male worker. In interviews, women say they experience little job security in construction, which discourages timely reporting of harassment or injuries and frustrates efforts at improvement. If female workers are choosing such punishing job sites over data centers (even by a small margin), there must be other factors influencing the minimal presence of women beyond the assumed aversion to physical labor.

Data center operations are not a highly visible career path, and the problem is more pronounced for women. With few female role models in the industry, women may perceive a data center career as unwelcoming or unsafe and may feel discouraged from applying. This cycle is likely to perpetuate itself unless more resources are devoted specifically to closing the gender gap.

Many data center operators now want better visibility in the labor market and are moving their organizations out of a habit of comfortable obscurity. To draw in more workers, the industry will need to communicate effectively that it has rewarding careers available for qualified women. An uptick in female representation would strengthen this message and could serve as an early indicator of success for recruitment efforts at large.

To draw future workers from universities and trade schools, some data center organizations have formed educational partnerships and programs, often working with high school students or younger pupils. Efforts in education to attract more women and girls to careers in data centers must begin today to produce more gender diversity when these students grow into job seekers.

The data center industry also seeks out workers who are changing careers and can bring some applicable skills with them. Operators can smooth the transition into a data center career by revisiting the advertised job requirements — lowering the bar for entry so candidates can supplement their existing skill sets with training on the job and mentoring programs. Active outreach to women in this talent pool can ensure that fewer qualified candidates are overlooked.

This avenue of recruitment can expand to include “career returners.” Many in this group are women who left the workforce for childcare or family care duties — and operators offering benefits suited to workers with family obligations may gain an advantage in recruitment.

Operators desperate for staff would benefit from understanding what went wrong with female representation in the industry. Ongoing efforts to recruit women must be re-examined and held to account for a lack of return on time and money invested.

To lessen the data center staffing struggle, operators will need to draw from every available labor pool. Women and other underrepresented groups are underutilized resources — and the industry likely will not escape its staffing crisis until it can bring them in.

The Uptime Intelligence View

Gender balance in the data center lags behind many physically demanding, conventionally male-dominated industries such as construction, mining, and manufacturing: a sign the industry is not doing enough to advertise rewarding career opportunities to women. Data centers cannot afford to leave female candidates out. Outreach efforts aimed at career changers, universities, and vocational schools could maximize their returns by seeking out more women.

Consensus on regulatory goals hides national differences

In recent reports, Uptime Institute Intelligence has warned that a wave of resiliency, security and sustainability legislation is making its way toward the statute books. Governments around the world — aware that digital infrastructure is increasingly critical to economic and national security (and consumes a lot of power) — have decided the sector cannot be left unwatched and unmanaged.

New laws relating to data center sustainability have attracted the most attention, partly because some of the provisions will likely prove expensive or difficult to meet. In Germany, for example, the Energy Efficiency Act, which was passed by Germany’s lower house of parliament in September 2023, requires all new data centers to reuse a set proportion of their waste heat (with some exceptions) and have a power usage effectiveness (PUE) of 1.3 or below. The legislation also specifies that older data centers will be required to reach this level by 2026.
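PUE is conventionally defined as total facility energy divided by IT equipment energy, usually measured over a full year (ISO/IEC 30134-2 standardizes the metric). A minimal check against the 1.3 threshold cited above might look like the sketch below. The annual energy figures are hypothetical, and the act's actual measurement rules, reporting periods, phase-in dates and exceptions are not modeled.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power usage effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    if total_facility_kwh < it_equipment_kwh:
        raise ValueError("total facility energy cannot be below IT energy")
    return total_facility_kwh / it_equipment_kwh


def meets_pue_limit(total_facility_kwh: float, it_equipment_kwh: float, limit: float = 1.3) -> bool:
    """True if the annual PUE is at or below the limit (1.3 is the figure cited above)."""
    return pue(total_facility_kwh, it_equipment_kwh) <= limit


def overhead_cut_needed_kwh(total_facility_kwh: float, it_equipment_kwh: float, limit: float = 1.3) -> float:
    """Non-IT energy that must be removed to reach the limit, holding IT energy constant."""
    return max(0.0, total_facility_kwh - limit * it_equipment_kwh)


if __name__ == "__main__":
    # Hypothetical annual figures for an older facility.
    total, it = 15_000_000, 10_000_000
    print(f"PUE: {pue(total, it):.2f}")
    print(f"meets 1.3 limit: {meets_pue_limit(total, it)}")
    print(f"overhead to cut: {overhead_cut_needed_kwh(total, it):,.0f} kWh")
```

For this hypothetical facility (PUE 1.5), about 2 GWh of annual non-IT energy, such as cooling and power distribution losses, would have to be eliminated to reach 1.3, assuming the IT load stays constant.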

The act, which some have dubbed the “data center prevention act,” is the first of many new laws planned by European governments. It anticipates (and adds to) the requirements of the EU’s Energy Efficiency Directive (EED), which comes into force in the latter part of 2023. The EED contains a long list of onerous reporting and improvement requirements for data centers that will be transposed into national law in all 27 EU member states.

Sustainability is not, however, the focus of much of the upcoming regulation. A recent Uptime Institute report, Digital resiliency: global trends in regulation, looks at how legislation addressing resiliency across the digital supply chain is being implemented or planned in the US, EU, UK, Singapore, Australia and elsewhere. At least some of these laws will have far-reaching effects on the digital infrastructure ecosystem.

While rules to ensure resiliency are not new, the latest wave signals a significant extension of regulatory oversight. The new laws are a response to the growing threat of complex, systemic or even irrecoverable outages and are a recognition that data center services now play a critical role in a modern economy — a lesson underlined during the COVID-19 pandemic.

A common theme of these new rules is that, effectively, governments are now classifying digital infrastructure as part of the critical national infrastructure. This is a specific designation: operators of critical services are subject to security, availability, transparency and reporting mandates. Under these regulations, all participants in the infrastructure supply chain, whether software, hosting, colocation, cloud or networking, need to be transparent and accountable.

Uptime Intelligence research suggests that few operators are up to date with pending legislation or the requirements and costs of compliance. In the area of sustainability, surveys show that most data center / digital infrastructure operators are not collecting the data they will soon need to report.

Operators in North America (primarily the US) tend to be much more wary of regulation than their counterparts in other parts of the world — wariness that applies both to sustainability and to resiliency / transparency. In a 2021 Uptime Institute climate change survey, for example, three-quarters of European and Asian data center operators said they thought data center sustainability laws were needed — but only about 40% of US operators agreed (see Figure 1).

Figure 1. Majority invite more regulation for sustainability

In the 2023 Uptime Institute Global Data Center Survey, operators were asked about their attitude toward resiliency laws that would reveal more details of data center infrastructures and would enable certain customers (i.e., clients with critical requirements) to visit or assess facilities for resiliency. Operators everywhere were broadly supportive, but once again, those in North America were the most wary (see Figure 2).

Figure 2. North American operators least likely to favor transparency laws

Differences in attitudes toward the role of regulation are not limited to businesses; regulators, too, differ in their aims and methods. The US, for example, generally puts more emphasis on economic incentives, partly through tax credits such as those in the Inflation Reduction Act, while Europe favors the stick: rules must be followed, or penalties will be imposed.

In the US, when resiliency laws are introduced or proposed, for example by the Securities and Exchange Commission, they are not expected to be backed up by tough sanctions for failures to comply — unlike in the EU, where penalties can be high. Instead, organizations are encouraged to conform, or they will be prevented from bidding for certain government contracts.

And while Europe and parts of Asia (such as China and Singapore) have tough sustainability laws in the pipeline, the US has no federal laws planned.

There is, of course, a long-running debate over whether expensive carrots (the US model) or punitive, bureaucratic sticks are the most effective methods in facilitating change. Evidence from the Uptime Institute annual survey does show that regulations drive some investments (see Regulations drive investments in cybersecurity and efficiency). But, of course, so do rules that require investments in equipment.

For data center owners and operators, the end result may not be so different. High levels of resiliency and transparency are mostly expected to be rewarded in law and in the market, as are energy efficiency and low carbon emissions.

However, the incentive model may cost operators less because of generous rebates and exemptions. For comparison, Uptime estimates that fulfilling the emerging reporting requirements for sustainability and resiliency can cost upwards of $100,000.

The Uptime Intelligence View

The role and extent of regulation — and of incentives — is likely to change constantly in the next decade, making it difficult for data center operators to formulate a clear strategy. The most successful data center owners and operators will be those that aim for high standards at an early stage, in both areas of resiliency and sustainability, and invest accordingly. The business case for such investments is becoming ever stronger — in all geographies.