Colocation and public cloud growth masks enterprise expansion

The colocation and public cloud sectors of the digital infrastructure industry continue to make headlines, with many organizations planning large-scale capacity expansion to meet rising demand. However, there is also a less public expansion underway: enterprise operators, for the third successive year, say they are going to invest in more data center capacity in 2024.

Results from the Uptime Institute Capacity Trends Survey 2023 reveal that 64% of enterprise operators are growing their data center capacity — a six percentage-point uptick from two years earlier in 2021 (see Figure 1). Notably, one in five organizations in this group say they are expanding by more than 20% annually. This scale of expansion is difficult to implement without significant investment.

Figure 1. Enterprise data center capacity shows strong growth

Some suppliers — and some operators — may be surprised by this rate of growth since it follows a decade-long period in which enterprise data centers were often dismissed as expensive, inflexible and outdated by many executives. However, Uptime’s data, which has been consistent over the past three years and is backed by many conversations, suggests that enterprise investment is strong.

What is driving this growth in enterprise data center capacity? Partly, it is simply a demand for more digital services. However, companies also say that they are investing to enhance the resiliency of their data centers (45% of survey respondents) and to support a hybrid cloud strategy (37%). Moving to cloud architectures often requires an increase in data center capacity, especially if an organization is developing distributed resiliency architectures.

Cost may also be a factor that is driving enterprise investment. According to separate Uptime research, of those enterprise operators that compared the cost of provisioning workloads on-premises versus off-premises, most report that corporate data centers are less expensive than using colocation (56%, n=154) or public cloud (51%, n=151). This is particularly true for enterprises that have already made significant investments to expand capacity.

One looming problem for enterprises — and indeed for colocation companies — is the widely forecast increase in rack power density in the coming years. To accommodate this, new investments in cooling and power distribution will be required.

Four-fifths (82%) of enterprises say that they expect more demand for higher power densities in the next two to three years, but more than one-third (36%) say that they cannot accommodate this demand with their existing infrastructure (see Figure 2). As a result, Uptime expects many workloads with higher density demands to be outsourced to third parties that have the requisite power and cooling infrastructure.

Figure 2. Many need new investment to meet expected power densities

In spite of the increased enterprise sector spending, the trend towards greater outsourcing to colocation and cloud companies is expected to remain strong (see The majority of enterprise IT is now off-premises). For example, 79% of colocation companies (n=130) report capacity growth, 15 percentage points more than their enterprise counterparts, and twice as many report annual growth rates of more than 20% (40%, n=130).

Taken together, Uptime’s survey data shows that chief information officers are investing in cloud, hosting, colocation and enterprise data centers. While more workloads may be outsourced, the enterprise data center will most likely continue to grow and evolve. Large companies with complex mission-critical workloads, especially those that are heavily regulated, will most likely maintain on-premises sites.

However, as colocation and public cloud providers expand the depth of their services in response to the industry’s staffing, regulatory and supply chain challenges, enterprises will increasingly integrate these resources over the next decade.


The Uptime Intelligence View

Enterprise data centers have been characterized as being in decline over the past five years, especially in the context of the significant, double-digit annual growth of large colocation and public cloud organizations. But Uptime survey data has consistently shown investment in the sector. Although owning and operating data centers may not feature in the strategies employed by many large and especially newer organizations, enterprise facilities will likely remain essential to businesses beyond the medium term.

Long shifts in data centers — time to reconsider?

Human error has been, and remains, a major cause of outages in data centers. Uptime Intelligence’s research shows that about four in 10 operators have had a major outage in the past three years in which human error played a role (Annual outage analysis 2023). Half of these respondents said errors were made because staff failed to follow the correct procedures.

Thorough training, regular practice in equipment testing and work experience all help to reduce these errors — particularly in an emergency when a prompt reaction is crucial. An often underappreciated factor is the importance of mental performance and the effects of fatigue.

The relationship between shift length, fatigue and human error is well documented, but less clear is how the data center industry can define shifts that help minimize human error. The recommended best practices for other industries do not always translate into the data center world, where 24/7 service availability is the standard. Additionally, data center owners and operators wanting to optimize shift length to limit fatigue need to navigate employee preferences and region-specific constraints.

What the research says

Studies indicate there is a tipping point after which the performance of most staff deteriorates. Researchers at the Chinese University of Hong Kong Department of Systems Engineering and Engineering Management analyzed 241 papers on the relationship between shift length and occupational health and found that individuals working more than 10-hour shifts are significantly more likely to experience fatigue. A similar review from the Finnish Institute of Occupational Health shows the risk of workplace injury due to fatigue-related accidents across a range of industries is 15% higher in 10-hour shifts than 8-hour shifts, and jumps to 38% higher at 12 hours.

Errors that stem from disrupted circadian rhythms (biological processes that follow a 24-hour cycle) and mental exhaustion, and that can lead to injury (e.g., from improper machine operation), can be considered products of cognitive oversight. This oversight, an unintentional failure to interpret events correctly, is at the root of much human error in data centers and can result not just in injury but also in disruption to services.

Currently, 8- to 10-hour single-day shifts are most common in the data center industry across all major regions, according to the Uptime Institute Data Center Staffing Survey 2023 (Operators struggle to overcome ongoing staff and skills shortage). There are, however, some geographic variations in the results: while 17% of all respondents report single-day shifts of more than 10 hours, Asia-Pacific leads at 22%. In contrast, respondents from Europe report more than three times as many 5- to 7-hour shifts as respondents from Asia-Pacific, and only 13% report shifts of more than 10 hours, just over half the Asia-Pacific share.

Policy variations across different regions are clearly a factor in how data center owners and operators choose specific shift lengths for their employees, particularly in relation to night shifts. In Europe, labor laws in several major countries do not allow night shifts to exceed 8 or 10 hours as standard. Exceptions can be made to meet 24/7 staffing requirements, with night shifts extended to 12 hours, as long as employees are compensated with sufficient paid time off work.

These policy restrictions in Europe — along with the survey results indicating that European respondents provide more 5- to 7-hour shifts than respondents from other regions — may indicate that these companies are hiring more part-time employees to make up their staffing shortfall.

Companies in other regions attempting to replicate a similar strategy to reduce shift length face obstacles. Unlike European employees, workers in the US and several Latin-American countries risk losing access to healthcare coverage if their shifts become shorter. In the US there is no statutory obligation for the employer to provide healthcare coverage if employees work less than a 40-hour week. Staff are therefore reluctant to reduce their weekly hours.

Employers can limit long shifts — particularly night shifts (which have higher workplace injury risk) — to 8 hours. While this may appear to be an intuitive solution to avoid performance deterioration, Uptime Institute’s technical consultants advise that any change will not be without friction, and shift length may not even be the primary contributory factor. Some key considerations are:

  • Complacency and ownership. Shift structure should promote sharing of knowledge, break monotony of routines and help develop a sense of inclusion through rotating shifts. Shift silos, such as staff having a fixed schedule, with some only working at weekends or nights, may create unhealthy attitudes resulting from complacency or a lack of team cohesion.
  • Meeting staff lifestyle preferences. Despite data suggesting that long shifts are detrimental to performance, it is difficult for some operators to cut back hours. Uptime Institute technical consultants often see a staff preference for 12-hour shifts over several days, for the benefits of both additional overtime pay and extended blocks of time off work.
  • Relief shifts. Consensus in the industry is that extending shifts to more than 12 hours is ultimately worse for the business than sending employees home. For many operators, however, extending shifts to beyond 12 hours is unavoidable as a means of meeting staffing requirements. In practice, identifying individuals that can handle these extended shift lengths is not easy. It is not just very long shifts that carry the risks associated with fatigue. Staff not being able to rest sufficiently due to covering the shifts of absentee staff is another source of potential exhaustion, even if these shifts are not particularly long.

Long-term impact

Sourcing the appropriate, qualified individual for a relief shift in an understaffed industry is challenging. Typically, companies request employees to clock in on their rest days. This may work well for an employee during a week they are already off work, but it could also force employees to clock back on before they have had sufficient rest between shifts. Adding more staff into the shift rotation may prevent other employees from having to extend shifts or clock in with insufficient rest, but this simply patches over the root of the problem: the absence of staff from their scheduled shifts.

Operators need to monitor absence levels and understand the reasons behind them. Working shifts of more than 10 hours over the long term increases the risk of developing a range of health conditions, as well as fatigue. Although many data center operators have developed shift schedules to minimize errors, this needs to be balanced with a long-term view of health, work-life balance and burnout.

Planning ahead

Retroactively adjusting the shift lengths of established employees could lower morale and, counterintuitively, raise fatigue levels as staff adjust to their new schedules. Many data center owners and operators, however, are undergoing significant infrastructure expansion, and these new facilities need to be staffed on a shift rotation that minimizes human error and limits the risks of disruption to service availability. Owners and operators should consider the following recommendations:

  • Avoid shift lengths of more than 12 hours. Staffing levels and schedules should be defined to minimize the occurrences of abnormally long shifts.
  • Identify shifts that are not appropriate as relief shifts. Establish a system for ensuring well-rested coverage. Monitor overtime and rest periods between shifts to avoid calling in exhausted staff.
  • Consider individual employee preferences but remain mindful that shift workers often ignore potential risks to their own job performance and health when requesting their preferred schedule.

The Uptime Intelligence View

While many data center managers take a flexible approach to staffing, relief shifts remain a common source of human error. Employees experiencing long-term effects of extended shift work, in terms of risks to health and performance, may be perpetuating difficulties in filling the required shifts due to increased levels of staff absence. These factors can result in an operational stress of lower-than-ideal staffing levels in many facilities, leaving data center managers with few options to optimize shifts.

What does embedded carbon of IT really represent?

Due to regulatory mandates and expanded stakeholder expectations, a growing share of operators are quantifying and publicly reporting a complete carbon dioxide equivalent (CO2e) emissions inventory for their data center infrastructure. An organization’s direct on-site emissions (classified as Scope 1 according to the Greenhouse Gas Protocol) and emissions from purchased energy sources (classified as Scope 2) are relatively easy to calculate from measured operational facility data and available grid emissions factors.

In contrast, Scope 3 data (comprising indirect emissions from the activities of other organizations and individuals) is more challenging to gather and has a high degree of uncertainty. This is because Scope 3 represents the Scope 1 and 2 emissions of both upstream suppliers and downstream buyers, which can be up to five layers deep in the value chain. By definition, this includes potentially millions of product users scattered around the globe. Despite the challenges, data center operators need to establish processes to collect and quantify their Scope 3 emissions while recognizing both the inherent uncertainty in the data and the limited levels of control over said emissions.

Carbon emissions (shorthand for CO2e greenhouse gases) embedded in IT equipment are an important Scope 3 category for IT infrastructure operators. The most valuable data is collected from manufacturers because they have the best insight into and connections with their supply chains. Where manufacturers do not supply the data, publicly available databases and proprietary estimation tools are useful for looking up or calculating embedded carbon values for IT equipment.

IT original equipment manufacturers (OEMs) can provide product carbon footprint (PCF) reports of typical configurations for some or all of their machine models. These reports combine embedded carbon estimates from the product’s manufacture and transportation to the customer along with estimates of use emissions (Scope 2) and emissions associated with the management of end-of-life equipment (a separate Scope 3 category). The “manufacturing” and “transport” emissions categories are the most relevant for reporting embedded emissions from IT equipment. Tables 1a and 1b give select examples of PCF reports by some of the major OEMs.

Table 1a. Example server configurations from different manufacturers

Table 1b. Server manufacturers’ PCF reports for the example configurations

The “use” and “end of life” carbon footprint categories are of little value to a data center operator when purchasing IT equipment. The OEM estimate of emissions from operational energy use is redundant to the operator’s emission reports because this will be accounted for under Scope 2 calculations based on functional energy use and the data center emissions factor after the equipment is installed. Emissions generated by the end-of-use recovery and disposal of IT equipment are several years away and will depend on the vendor hired to manage the disposal process.

While manufacturers’ PCF reports are a convenient source of Scope 3 data, calculating carbon content for each of the four categories requires assumptions and highly uncertain estimates that limit the data’s accuracy and usefulness. Quantifying embedded carbon emissions is an academic exercise that provides little, if any, actionable insight for data center operators.

Embedded emissions: manufacture of IT equipment

The manufacture of IT equipment has two primary sources of carbon emissions: the emissions associated with the energy consumed by the manufacturing and assembly processes (typically 30% to 50% of the total) and those associated with component production, particularly semiconductors such as flash memory, dynamic random access memory (DRAM) and processors (50% to 70% of the total).

The supply chains are also geographically concentrated. Equipment manufacturing and assembly operations are mostly based in the Asia-Pacific region, where electricity emission factors vary between 0.4 and 0.8 metric tonnes of CO2 per megawatt-hour due to the high fossil fuel content in the generation mix. IT equipment has hundreds of components that are sourced from multiple companies around the globe.

IT hardware vendors, let alone buyers, cannot know the exact electricity mix as different components are manufactured and assembled in other countries with varying sources of electricity. The raw materials used in each component further complicate the problem because they are typically processed in different geographic areas, each with its own electricity source and associated variations in CO2 emissions.

The greenhouse gas emissions from semiconductor manufacturing are the result of energy consumption (about 40%), the use of perfluorinated compounds (about 20%) — these are high global warming potential gases used for chamber cleaning — and the production of the many materials and process chemicals used to fabricate a semiconductor device (about 40%).

Most of these emissions are generated deep in the supply chain of the server manufacturing process. Neither the equipment manufacturer nor the purchaser has the visibility to observe these processes or has direct leverage over the suppliers responsible for the energy use and manufacturing emissions. Academic research has found up to 30% uncertainties in IT equipment manufacturing and assembly emissions estimates.

Actions to drive emissions reductions in these processes need to be promoted by the individual suppliers and their immediate customers — there is little that the IT buyer can do to drive reductions. Importantly, there will be little difference in embedded carbon between IT OEMs for a comparable product configuration. Equally, the embedded carbon of IT equipment can be a trade-off with emissions from use because larger, more richly configured systems can also be more energy efficient.

The manufacturing emissions data for the three configurations of the HPE ProLiant DL360 illustrates why manufacturing emissions estimates have high uncertainty. Using data from the EU, the base (minimal, low-component) configuration has an embedded emissions estimate that is 55% of the estimate for the performance configuration. Because each server configuration is unique to the purchaser, estimating the emissions for a specific configuration adds to the 30% uncertainty inherent in assessing the embedded emissions.

This uncertainty escalates significantly for the Dell server cited in Tables 1a and 1b because the embedded manufacturing emissions are calculated for a base server with only 32 GB of memory, an improbable configuration for a data center. Reporting a low-end configuration is primarily responsible for the higher uncertainty in the Dell estimate compared with the HPE estimate.

An operator can reduce the uncertainty introduced by configuration choices by weighting the published manufacturing emissions. To make this adjustment, divide the published manufacturing emissions by the weight of the server configuration used to estimate the PCF (available from the manufacturer), then multiply the result by the weight of the purchased server (available from shipping documents). This gives an adjusted estimate of the manufacturing PCF for the purchased server. The approach is best applied to the mainstream or performance configurations, as the base configuration has minimal quantities of DRAM and storage devices that contribute a significant portion of the manufacturing emissions footprint.
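The weight-based adjustment described above reduces to a single ratio. The following is a minimal Python sketch of that arithmetic, not a method published by any manufacturer; the function name, parameter names and the numbers in the usage example are hypothetical and chosen only for illustration.

```python
def adjusted_manufacturing_pcf(purchased_weight_kg: float,
                               reference_pcf_kgco2e: float,
                               reference_weight_kg: float) -> float:
    """Scale a published manufacturing PCF by server weight.

    purchased_weight_kg: weight of the server actually bought
        (from shipping documents).
    reference_pcf_kgco2e: manufacturing emissions from the
        manufacturer's PCF report.
    reference_weight_kg: weight of the configuration used for
        that PCF report (from the manufacturer).
    """
    if purchased_weight_kg <= 0 or reference_weight_kg <= 0:
        raise ValueError("weights must be positive")
    # Manufacturing emissions per kilogram of the reference configuration.
    intensity_kgco2e_per_kg = reference_pcf_kgco2e / reference_weight_kg
    return purchased_weight_kg * intensity_kgco2e_per_kg


# Hypothetical example values, not taken from any PCF report.
estimate = adjusted_manufacturing_pcf(
    purchased_weight_kg=20.0,
    reference_pcf_kgco2e=1000.0,
    reference_weight_kg=16.0,
)
print(f"Adjusted manufacturing PCF: {estimate:.0f} kg CO2e")
```

The adjusted figure inherits the uncertainty of the published estimate; the scaling only corrects for the difference in configuration size.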

This adjustment may reduce the error and create a more representative manufacturing emissions estimate. However, given the limited value of this emission quantity, the additional time and effort to collect the data and perform the adjustment may not be worthwhile.

Embedded emissions: transportation to the customer

Product transportation from the assembly site to the customer accounts for a small percentage of embedded carbon in Scope 3 emission estimates. These vary based on geographical region and type of transport vehicle.

Greater geographical distances to the reporting company inevitably result in more fuel use and, therefore, higher emissions. Most products are assembled and shipped from Asia. The transport data for HPE servers with a mainstream configuration shows three to five times the emissions when shipped to Europe (98 kilograms of CO2e) or the US (150 kg of CO2e) compared with Japan (31 kg of CO2e).

The most significant impact on transport emissions (and cost) is whether the product is shipped by sea or air. When demand for products is high and delivery time is critical, companies may opt for a faster transport method at the expense of greater CO2e emissions. The air transportation of goods has a 20 to 30 times larger carbon footprint than transport via ocean freight.

Use emissions

To reduce use emissions, data center operators should focus on energy use of IT instead of embedded carbon. This entails buying the most efficient IT equipment for their workloads as measured in work delivered per watt, maximizing hardware utilization, and deploying power management where the workload can tolerate the higher response times.

These aspects of evaluating or improving server efficiency are covered in several Uptime Intelligence Updates and Briefing Reports (listed at the end of this Update). Focusing on efficiency, and particularly the better utilization of IT assets, will not only lower Scope 2 emissions through better energy performance but can also help Scope 3 inventories by requiring fewer IT systems to perform the same amount of work.

Manufacturers’ estimates of use emissions have no value to the purchaser. A sustainable purchase decision needs to minimize energy consumption by procuring the most efficient equipment for the workload. The use emissions will then be a function of the electricity emissions factor at the data center location where the IT equipment is installed. These will be reported as Scope 2 emissions in an environmental / sustainability report and required regulatory disclosures.
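The dependence of use emissions on the local electricity emissions factor can be illustrated with a short calculation. This is a sketch under stated assumptions: the average power draw and the two emissions factors are hypothetical, the function name is illustrative, and only IT energy is counted (no cooling or power distribution overhead).

```python
HOURS_PER_YEAR = 8760  # non-leap year


def annual_use_emissions_kgco2e(average_power_w: float,
                                emissions_factor_kg_per_kwh: float,
                                hours: float = HOURS_PER_YEAR) -> float:
    """Estimate annual use emissions for one server.

    Emissions = energy consumed (kWh) x electricity emissions factor
    (kg CO2e per kWh) at the site where the server is installed.
    """
    if average_power_w < 0 or emissions_factor_kg_per_kwh < 0:
        raise ValueError("inputs must be non-negative")
    energy_kwh = average_power_w / 1000.0 * hours
    return energy_kwh * emissions_factor_kg_per_kwh


# Same hypothetical server (400 W average draw) at two hypothetical sites.
for site, factor in [("high-carbon grid", 0.4), ("low-carbon grid", 0.05)]:
    kg = annual_use_emissions_kgco2e(400.0, factor)
    print(f"{site}: {kg:.1f} kg CO2e per year")
```

Because the same hardware yields very different results at different sites, a single manufacturer estimate cannot stand in for the operator's own Scope 2 calculation.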

End-of-life product management emissions

The end-of-life product emissions estimate also offers no value to the data center operator. These emissions are not accounted for until the product is removed from service during a specified refresh cycle. The emissions associated with the refurbishment, recycling and disposal of the product and its components will be a function of the chosen end-of-life product management process.

Conclusion

Apart from massive IT buyers who can force shifts in upstream supply chain emissions, most data center infrastructure operators cannot influence the emissions generated within the supply chain. A select few hyperscalers can leverage their buying power by prioritizing sustainability in their purchasing decisions. And while smaller data centers can be indirect beneficiaries of hyperscaler-driven innovations in the industry, they cannot rely on this possibility as an actionable strategy for carbon reduction. Instead, most data centers that wish to track and curb Scope 3 emissions can focus on the following:

  • Require access to PCF for all products. IT buyers should expect to be able to obtain and compare equipment PCF data for their reporting purposes and to consider estimates when selecting their configurations. Buyers can signal their preference for transparency to vendors and, over time, choose to work with those companies that are more transparent in their reporting and methodology for calculating estimates.
  • Focus on IT energy performance. Scope 3 emissions are becoming an important component of emissions inventory reports, but Scope 2 emissions will likely account for most life-cycle emissions. Even for those data centers that use low-carbon power sources, good stewardship of energy resources should prioritize the reduction of energy consumption. Workload consolidation, a key lever for infrastructure energy performance, will also benefit the Scope 3 balance by using fewer IT systems, less hardware, or both. Consequently, product configuration decisions should not be based solely on PCF for a single piece of equipment because better-performing, and thus potentially more efficient, configurations will often have more silicon and components (i.e., bigger processors, more memory, more storage), accompanied by a higher Scope 3 bill.
  • Factor Scope 3 emissions into decisions about IT systems refresh. A relatively recent development in server technology is that replacing an older system with a newer one may not automatically mean better energy performance. Modern servers come with the caveat that, unless given a substantial amount of work, their utilization will not be high enough to perform more work for each kilowatt-hour of consumed energy. Without a considerable efficiency advantage, the new server may not be able to recover its manufacturing emissions.
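The refresh consideration in the last bullet can be framed as a simple payback calculation: how many years of energy savings it takes to offset the embedded emissions of the replacement server. The sketch below is illustrative only. The function name and all numbers are hypothetical, and it assumes both servers do the same amount of work and that the emissions factor stays constant.

```python
from typing import Optional


def refresh_payback_years(embedded_kgco2e: float,
                          old_annual_kwh: float,
                          new_annual_kwh: float,
                          emissions_factor_kg_per_kwh: float) -> Optional[float]:
    """Years for a new server to recover its embedded emissions.

    Returns None when the new server saves no energy for the same
    work, in which case the embedded emissions are never recovered.
    """
    if embedded_kgco2e < 0 or emissions_factor_kg_per_kwh < 0:
        raise ValueError("inputs must be non-negative")
    annual_saving_kgco2e = (
        (old_annual_kwh - new_annual_kwh) * emissions_factor_kg_per_kwh
    )
    if annual_saving_kgco2e <= 0:
        return None
    return embedded_kgco2e / annual_saving_kgco2e


# Hypothetical figures for the same workload on old and new hardware.
years = refresh_payback_years(
    embedded_kgco2e=1300.0,
    old_annual_kwh=3500.0,
    new_annual_kwh=2500.0,
    emissions_factor_kg_per_kwh=0.4,
)
print("never recovered" if years is None else f"payback: {years:.2f} years")
```

A payback period longer than the planned refresh cycle suggests the replacement would add to, rather than reduce, total emissions. On a low-carbon grid the annual saving shrinks, so the payback period lengthens.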

The Uptime Intelligence View

Data center managers are facing mandates to report Scope 3 CO2e emissions. Estimating the embedded carbon in IT equipment fails to produce actionable data because the values have a high degree of uncertainty and provide little guidance on how to reduce manufacturing emissions. Innovative operators and owners will instead focus on reducing their own direct emissions and forming collaborative partnerships with suppliers and manufacturers to encourage transparency in emissions reporting.

Jay Dietrich, Research Director

Rose Weinschenk, Research Associate

DLC will not come to the rescue of data center sustainability

A growing number of data center operators and equipment vendors are anticipating the proliferation of direct liquid cooling (DLC) systems over the next few years. As far as projections go, Uptime Institute’s surveys agree: the industry consensus for the mainstream adoption of liquid-cooled IT converges on the latter half of the 2020s.

DLC systems, such as cold plate and immersion, have already proved themselves in technical computing applications as well as mainframe systems for decades. More recently, IT and facility equipment vendors, together with some of the larger data center operators, have started working on commercializing DLC systems for much broader adoption.

A common theme running through both operators’ expectations of DLC and vendors’ messaging is that a main benefit of DLC is improved energy efficiency. Specifically, the superior thermal performance of liquids compared with air will dramatically reduce the consumption of electricity and water in heat rejection systems, such as chillers, as well as increase opportunities for year-round free cooling in some climates. In turn, the data center’s operational sustainability credentials would improve significantly. Better still, the cooling infrastructure would become leaner, cost less and be easier to maintain.

These benefits will be out of reach for many facilities for several practical reasons. The reality of mainstream data centers combined with the varied requirements of generic IT workloads (as opposed to high-performance computing) means that cost and energy efficiency gains will be unevenly distributed across the sector. Many of the operators deploying DLC systems in the next few years will likely prioritize speed and ease of installation into existing environments, as well as focus on maintaining infrastructure resiliency — rather than aiming for maximum DLC efficiency.

Another major factor is time: the pace of adoption. The use of DLC in mission-critical facilities, let alone a large-scale change, represents a wholesale shift in cooling design and infrastructure operations, with industry best practices yet to catch up. Adding to the hurdles is that many data center operators will deem the current DLC systems limited or uneconomical for their applications, slowing rollout across the industry.

Cooling in mixed company

Data center operators retrofitting a DLC system into their existing data center footprint will often do so gradually in an iterative process, accumulating operational experience. Operators will need to manage a potentially long period when liquid-cooled and air-cooled IT systems and infrastructure coexist in the same data center. This is because air-cooled IT systems will continue to be in production for many years to come, with typical life cycles of between five and seven years. In many cases, this will also mean a cooling infrastructure (for heat transport and rejection) shared between air and liquid systems.

In these hybrid environments, DLC’s energy efficiency will be constrained by the supply temperature requirements of air-cooling equipment, which puts a lid on operating at higher temperatures and compromises the energy and capital efficiency benefits of DLC on the facility side. This includes DLC systems that are integrated with chilled water systems (running the return facility loop as supply for DLC may deliver some marginal gains) and DLC implementations where the coolant distribution unit (CDU) is cooled by the cold air supply.

Even though DLC eliminates many, if not all, server fans and reduces airflow requirements for major gains in total infrastructure energy efficiency, these gains will be difficult to quantify for real-world reporting purposes because IT fan power is not a commonly tracked metric — it is hidden in the IT load.

It will take years for DLC installations to reach the scale where a dedicated cooling infrastructure can be justified as a standard approach, and for energy efficiency gains to have a positive effect on the industry’s energy performance, such as in power usage effectiveness (PUE) numbers. Most likely, any impact on PUE or sustainability performance from DLC adoption will remain imperceptible for years.

Hidden trade-offs in temperature

There are other factors that will limit the cooling efficiency seen with DLC installations. At the core of DLC’s efficiency potential are the liquid coolants’ favorable thermal properties, which enable them to capture IT heat more effectively. The same thermal properties can also be used for a cooling performance advantage as opposed to maximizing cooling system efficiency. When planning for and configuring a DLC system, some operators will give performance, underpinned by lower operating temperatures, more weight in their balancing act between design trade-offs.

Facility water temperature is a crucial variable in this trade-off. Many DLC systems can cool IT effectively with facility water that is as high as 104°F (40°C) or even higher in specific cases. This minimizes capital and energy expenditure (and water consumption) for the heat rejection infrastructure, particularly for data centers in hotter climates.

Yet, even when presented with the choice, a significant number of facility and IT operators will choose lower supply temperatures for their DLC systems’ water supply. This is because there are substantial benefits to using lower water temperatures — often below 68°F (20°C) — despite the costs involved. Chiefly, a low facility water temperature reduces the flow rate needed for the same cooling capacity, which eases pressure on pipes and pumping.
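
The link between water temperature and flow rate follows from the basic heat balance Q = ṁ × cp × ΔT. The sketch below is a simplified illustration, not a design calculation: it assumes water with a specific heat of about 4,186 J/(kg·K) and a density of 1 kg per liter, and it assumes that a lower supply temperature lets the loop run with a wider temperature rise (ΔT) before reaching the return limit. The 100 kW load and the 5 K and 15 K ΔT values are hypothetical.

```python
WATER_CP_J_PER_KG_K = 4186.0   # approximate specific heat of water
WATER_KG_PER_LITER = 1.0       # approximation; varies slightly with temperature

def required_flow_l_per_min(heat_kw: float, delta_t_k: float) -> float:
    """Water flow needed to carry heat_kw with a supply-to-return rise of delta_t_k."""
    mass_flow_kg_s = heat_kw * 1000.0 / (WATER_CP_J_PER_KG_K * delta_t_k)
    return mass_flow_kg_s / WATER_KG_PER_LITER * 60.0

heat_kw = 100.0  # hypothetical cooling load
warm_loop = required_flow_l_per_min(heat_kw, delta_t_k=5.0)   # narrow delta T
cool_loop = required_flow_l_per_min(heat_kw, delta_t_k=15.0)  # wide delta T

print(f"Narrow delta T (5 K):  {warm_loop:.0f} L/min")
print(f"Wide delta T (15 K):   {cool_loop:.0f} L/min")
print(f"Flow ratio: {warm_loop / cool_loop:.1f}x")
```

Because flow scales inversely with ΔT, tripling the available temperature rise cuts the required flow to a third, with corresponding relief on pipe sizing and pumping power.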

Conversely, organizations that use warm water and DLC to enable data center designs with dry coolers face planning and design uncertainties. High facility water temperatures not only require higher flow rates and pumping power but also need to account for potential supply temperature reductions in the future as IT requirements become stricter due to evolving server silicon. For a given capacity, this could mean more or larger dry coolers, which potentially require upgrades with mechanical or evaporative assistance. Data center operators that want free cooling benefits and a light mechanical plant have a complex planning and design task ahead.

On the IT side, taking advantage of low temperatures makes sense when maximizing the performance and energy efficiency of processors because silicon exhibits lower static power losses at lower temperatures. This approach is already common today because the primary reason for most current DLC installations is to support high IT performance objectives. Data center operators currently use DLC primarily because they need to cool high-density IT rather than conserve energy.

The bulk of DLC system sales in the coming years will likely be to support high-performance IT systems, many of which will use processors with restricted temperature limits — these models are sold by chipmakers specifically to maximize compute speeds. Operators may select low water temperatures to accommodate these low-temperature processors and to maximize the cooling capacity of the CDU. In effect, a significant share of DLC adoption will likely represent an investment in performance rather than facility efficiency gains.

DLC changes more than the coolant

For all its potential benefits, a switch to DLC raises some challenges to resiliency design, maintenance and operation. These can be especially daunting in the absence of mature and application-specific guidance from standards organizations. Data center operators that support business-critical workloads are unlikely to accept compromises to resiliency standards and realized application availability for a new mode of cooling, regardless of the technical or economic benefits.

In the event of a failure in the DLC system, cold plates tend to offer much less than a minute of ride-through time because of their small coolant volume. The latest high-powered processors would have only a few seconds of ride-through at full load when using typical cold plate systems. Operating at high temperatures means that there are thin margins in a failure, something that operators of mainstream, mission-critical facilities will be mindful of when making these decisions.
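
A rough sense of these margins can be obtained from a lumped estimate, t = m × cp × ΔT / P, which treats the coolant in the cold plate loop as the only thermal buffer once flow stops. The sketch below is a simplification under stated assumptions: the 0.1 kg coolant mass, 10 K allowable temperature rise and the heat loads are hypothetical, and the thermal mass of the metal and any residual flow are ignored.

```python
WATER_CP_J_PER_KG_K = 4186.0  # approximate specific heat of water

def ride_through_s(coolant_kg: float, allowable_rise_k: float, heat_w: float) -> float:
    """Seconds until stagnant coolant warms by allowable_rise_k under heat_w."""
    return coolant_kg * WATER_CP_J_PER_KG_K * allowable_rise_k / heat_w

# Hypothetical cold plate loop holding 0.1 kg of water with 10 K of headroom.
for heat_w in (350.0, 500.0, 800.0):
    seconds = ride_through_s(0.1, 10.0, heat_w)
    print(f"{heat_w:.0f} W load: about {seconds:.1f} s of ride-through")
```

The estimate shrinks in direct proportion to the headroom, so running the coolant closer to the component limit leaves correspondingly less time to respond to a pump or flow failure.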

In addition, implementing concurrent maintainability or fault tolerance with some DLC equipment may not be practical. As a result, a conversion to DLC can demand that organizations maintain their infrastructure resiliency standard in a different way from air cooling. Operators may consider protecting coolant pumps with an uninterruptible power supply (UPS) and using software resiliency strategies when possible.

Organizational procedures for procurement, commissioning, maintenance and operations need to be re-examined because DLC disrupts the current division of facilities and IT infrastructure functions. For air-cooling equipment, there is strong consensus regarding the division of equipment between facilities and IT teams, as well as their corresponding responsibilities in procurement, maintenance and resiliency. No such consensus exists for liquid cooling equipment. A resetting of staff responsibilities will require much closer cooperation between facilities and IT infrastructure teams.

These considerations will temper the enthusiasm for large-scale use of DLC and make for a more measured approach to its adoption. As operators increasingly understand the ways in which DLC deployment is not straightforward, they will bide their time and wait for industry best practices to mature and fill their knowledge gaps.

In the long term (i.e., 10 years or more), DLC is likely to handle a large share of IT workloads, including a broad set of systems running business applications. This will happen as standardization efforts, real-world experience with DLC systems in production environments and mature guidance take shape in new, more robust products and best practices for the industry. To grow the number and size of deployments of cold plate and immersion systems in mission-critical facility infrastructure, DLC system designs will have to meet additional technical and economic objectives. This will complicate the case for efficiency improvements.

The cooling efficiency figures of today’s DLC products are often derived from niche applications that differ from typical commercial data centers — and real-world efficiency gains from DLC in mainstream data centers will necessarily be subject to more trade-offs and constraints.

In the near term, the business case for DLC is likely to tilt in favor of prioritizing IT performance and ease of retrofitting with a shared cooling infrastructure. Importantly, choosing lower, more traditional water supply temperatures and utilizing chillers appears to be an attractive proposition for added resiliency and future-proofing. As many data center operators deem performance needs and mixed environments to be more pressing business concerns, free cooling aspirations, along with their benefits in sustainability, will have to wait for much of the industry.

US mandates crypto energy reporting: will data centers be next?

Rising concerns about cryptocurrency mining energy use have led the US Energy Information Administration (EIA) to launch a six-month emergency data reporting mandate (on January 26, 2024) to obtain information from 82 cryptocurrency mining companies. The emergency order, which was approved by the Office of Management and Budget (OMB), requires cryptocurrency miners to provide information detailing their monthly energy consumption, average and maximum electricity demand, energy suppliers, mining unit counts, and the hash rate (compute power of a blockchain network) at each of their operating locations from February 2024 to July 2024. The order is expected to capture information from 150 facilities.

At the end of February 2024, the initiative (survey) was temporarily put on hold after a lawsuit brought by a cryptocurrency association and a bitcoin mining company alleged that the data collection initiative could harm businesses by forcing them to divulge confidential and sensitive information. The lawsuit contested the notion that cryptocurrency mining operations pose a danger to the reliability of the grid, which will now have to be proven in court.

While the legal action has halted the survey for at least a month, it does not dispute that the OMB and the EIA have the legal mechanisms available to launch this initiative, and the same mechanisms can be applied to other sectors of the data center industry.

EIA estimates rise in crypto mining energy use

The EIA administrator requested emergency reporting authority from the OMB because they and their staff concluded that escalating cryptocurrency mining energy demand in the US could reasonably result in public harm for the following reasons:

  • Rising Bitcoin prices risk more electricity use as miners expand their operations.
  • The increased energy demand is occurring without an accompanying increase in energy supply, which is likely to increase energy prices and grid instability. 
  • There is no data available to assess the speed and extent of the potential energy use growth, making it difficult to mitigate the potential public harm.

The order was generated based on data models developed under an EIA in-depth analysis of cryptocurrency mining activities that estimated that these operations were responsible for 0.6% to 2.3% of US electricity consumption and that energy consumption was likely to grow rapidly (see Tracking electricity consumption from US cryptocurrency mining operations). In addition, concerns were expressed by members of Congress and the Administration (see First signs of federal data center reporting mandates appear in US) about cryptocurrency mining energy use, while grid planners have indicated that the growth in energy consumption will negatively affect electricity supply costs, the quantity of reserve supply, grid reliability and greenhouse gas emissions.

To continue data collection beyond the six-month emergency order, the EIA is currently using the agency’s authority to request a three-year extension to the data collection period.

Energy reporting for traditional data centers likely

While the emergency order warrants the data center industry’s attention, it is the fact that the EIA has the authority to compel major energy users to supply this information that will be of real concern.

The rapid growth of conventional data center operations has increasingly been under public scrutiny, particularly in regard to data center expansion in the US. The US Office of Science and Technology Policy report (reviewed in First signs of federal data center reporting mandates appear in US) estimated that the energy consumption of traditional data centers was likely to be equivalent to the energy consumption of cryptocurrency mining operations, making traditional data centers a logical next target for facility and energy use reporting.

Data center operators should not be surprised if the EIA turns its attention to energy consumption in traditional data centers, proposing regulations for reporting sometime in 2024. The buildout of traditional data centers is eliciting the same criticisms levelled at cryptocurrency mining operations: they reduce available electricity supply and grid reliability, raise electricity costs, and increase emissions due to increased demand on fossil fuel-powered generation facilities. The final push for the EIA to act is likely to be the EU’s publication of region-wide data center energy use as reported under the EED (which is still being finalized), and associated regulations.

Conclusion

US data center operators should prepare for a potential EIA-mandated energy consumption reporting regulation in 2024. The reporting requirements are likely to resemble those mandated for cryptocurrency mining operations: facility information, energy consumption and demand data, and a count of installed equipment. Two items that will have to be addressed in the regulatory proposal are the criteria that determine which data centers are required to report, such as data center type, installed IT capacity and installed power capacity; and the reporting frequency (monthly, quarterly or annually). Any US data center energy consumption reporting regulation will require publication in the US Federal Register and a comment period, giving the industry an opportunity to review and shape the final reporting requirements.


The Uptime Intelligence View

US data center operators have been sanguine about the potential for government regulation mandating data reporting or minimum performance requirements for key metrics. Unfortunately, the regulation establishing the US EIA contains a vehicle (15 USC 772) that authorizes the EIA to compel major energy users to report energy consumption and relevant facility data. Given the current public, legislator and regulator concerns relating to the projected growth of data center energy demand, which is anticipated to accelerate with the growth of AI offerings, it is highly likely that the EIA will propose a regulation mandating energy consumption reporting for data centers in 2024.

Addendum

The US Energy Information Administration (EIA) and Bitcoin industry groups have reached a settlement in the industry groups’ lawsuit. The EIA has agreed to “destroy” all the data collected. Under the settlement agreement, the EIA intends to use its authority under 15 USC 772 (the Administrator’s information-gathering power) to request a three-year data collection period (Federal Register Vol. 89 No. 28; February 9, 2024). This process will allow the industry to provide comments and shape the requirements of the data collection process.

Performance expectations of liquid cooling need a reality check

The idea of using liquids to cool IT hardware, exemplified by technologies such as cold plates and immersion cooling, is frequently hailed as the ultimate solution to the data center’s energy efficiency and sustainability challenges. If a data center replaces air cooling with direct liquid cooling (DLC), chilled water systems can operate at higher supply and return water temperatures, which are favorable for both year-round free cooling and waste heat recovery.

Indeed, there are some larger DLC system installations that use only dry coolers for heat rejection, and a few installations are integrated into heat reuse schemes. As supply chains remain strained and regulatory environments tighten, the attraction of leaner and more efficient data center infrastructure will only grow.

However, thermal trends in server silicon will challenge engineering assumptions, chiefly DLC coolant design temperature points that ultimately underpin operators’ technical, economic and sustainability expectations of DLC. Some data center operators say the mix of technical and regulatory changes on the horizon are difficult to understand when planning for future capacity expansions — and the evolution of data center silicon will only add to the complications.

The only way is up: silicon power keeps escalating

Uptime Institute Intelligence has repeatedly noted the gradual but inescapable trend towards higher server power, barring a fundamental change in chip manufacturing technology (see Silicon heatwave: the looming change in data center climates). Not long ago, a typical enterprise server used less than 200 watts (W) on average, and stayed well below 400 W even when fully loaded. More recent highly performant dual-socket servers can reach 700 W to 800 W of thermal power, even when lightly configured with memory, storage and networking. In a few years, mainstream data center servers with high-performance configurations will require as much as 1 kilowatt (kW) in cooling, even without the addition of power-hungry accelerators.

Two factors underlie this trend: semiconductor physics and server economics. First, even though semiconductor circuits’ switching energy is dropping, the energy gains are being outpaced by an increase in the scale of integration. As semiconductor technology advances, the same area of silicon will gradually consume (and dissipate) ever more power as a result. Chips are also increasing in size, compounding this effect.

Second, many large server buyers prefer highly performant chips that can process greater software payloads faster because these chips drive infrastructure efficiency and business value. For some, such as financial traders and cloud services providers, higher performance can translate into more direct revenue. In return for these benefits, IT customers are ready to pay hefty price premiums and accept that high-end chips are more power-hungry.

DLC to wash cooling problems away

The escalation of silicon power is now supercharged by the high demand for artificial intelligence (AI) training and other supercomputing workloads, which will make the use of air cooling more costly. Fan power in high-performance servers can often account for 10% to 20% of total system power, in addition to silicon static power losses, due to operating near the upper temperature limit. There is also a loss of server density, resulting from the need to accommodate larger heat sinks and fans, and to allow more space between the electronics.

In addition, air cooling may soon see restrictions in operating temperatures after nearly two decades of gradual relaxation of set points. In its 2021 Equipment thermal guidelines for data processing environments, US industry body ASHRAE created a new environmental class for high-density servers with a recommended supply temperature maximum of 22°C (71.6°F) — a whole 5°C (9°F) lower than the general guidelines (Class A1 to A4), with a corresponding dip in data center energy efficiency (see New ASHRAE guidelines challenge efficiency drive).

Adopting DLC offers relief from the pressure of these trends. The superior thermal performance of liquids, whether water or engineered fluids, makes the job of removing several hundred watts of thermal energy from compact IT electronics more straightforward. Current top-of-the-line processors (up to 350 W thermal design power) and accelerators (up to 700 W on standard parts such as NVIDIA data center GPUs) can be effectively cooled even at high liquid coolant temperatures, allowing the facility water supply for the DLC system to be running as high as 40°C (104°F), and even up to 45°C (113°F).

High facility water temperatures could enable the use of dry coolers in most climates; or alternatively, the facility can offer valuable waste heat to a potential offtaker. The promise is attractive: much reduced IT and facility fan power, elimination of compressors that also lower capital and maintenance needs, and little to no water use for cooling. Today, several high-performance computing facilities with DLC systems take advantage of the heat-rejection or heat-reuse benefits of high temperatures.

Temperature expectations need to cool down

Achieving these benefits is not necessarily straightforward. Details of DLC system implementation, further increases in component thermal power, and temperature restrictions on some components all complicate the process further.

  • Temperatures depend on the type of DLC implementation. Many water-cooled IT systems, the most common type in use today, often serialize multiple cold plates within a server to simplify tubing, which means downstream components will receive a higher temperature coolant than the original supply. This is particularly true for densified compute systems with very compact chassis, and restricts coolant supply temperatures well below what would be theoretically permissible with a parallel supply to every single cold plate.
  • Thermal design power has not peaked. The forces underlying the rise in silicon power (discussed above) remain in play, and the data center industry widely expects even more power-hungry components in the coming years. Yet, these expectations remain in the realm of anecdotes, rumors and leaks in the trade press rather than publicly available information. Server chip vendors refuse to publicize the details of their roadmaps; only select customers under nondisclosure agreements have improved visibility. From our discussions with suppliers, Uptime Intelligence can surmise that more powerful processors are likely to surpass the 500 W mark by 2025. Some suppliers are running proofs of concept simulating silicon heat loads of 800 W and higher.
  • Temperature restrictions of processors. It is not necessarily the heat load that will cap facility water temperatures, but the changing silicon temperature requirements. As thermal power goes up, the maximum temperature permitted on the processor case (known as Tcase) is coming down to create a larger temperature difference to the silicon and boost heat flux. Intel has also introduced processor models specified for liquid cooling, with Tcase as low as 57°C (134.6°F), which is more than a 20°C (36°F) drop from comparable air-cooled parts. These low-Tcase models are intended to take advantage of the lower operating temperature made possible by liquid cooling to maximize peak performance levels when running computationally intense code, which is typical in technical and scientific computing.
  • Memory module cooling. In all the speculation around high-power processors and accelerators, a potentially overlooked issue is the cooling of server memory modules, whose heat output was once treated as negligible. As module density, operating speeds and overall capacity increase with successive generations, maintaining healthy operating temperature ranges is becoming more challenging. Unlike logic chips, such as processors that can withstand higher operating temperatures, dynamic memory (DRAM) cells show performance degradation above 85°C (185°F), including elevated power use, higher latency and, if thermal escalation is unchecked, bit errors and overwhelmed error correction schemes. Because some of the memory modules will typically be downstream of processors in a cold-plate system, they receive higher temperature coolant. In many cases, it will not be the processor’s Tcase that restricts coolant supply temperatures, but the limits of memory chips.
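
The effect of serialized cold plates on downstream components can be sketched with a simple heat balance, where each plate warms the coolant by ΔT = P / (ṁ × cp). The example below is illustrative only: the 40°C supply, the 1 L/min flow per server loop and the chain of two 350 W processors followed by a 100 W memory cold plate are hypothetical values, and water properties are approximated.

```python
WATER_CP_J_PER_KG_K = 4186.0  # approximate specific heat of water

def inlet_temperatures_c(supply_c, flow_l_per_min, heat_loads_w):
    """Return the coolant inlet temperature at each cold plate in a series chain."""
    mass_flow_kg_s = flow_l_per_min / 60.0  # assumes 1 kg per liter of water
    inlets = []
    coolant_c = supply_c
    for load_w in heat_loads_w:
        inlets.append(coolant_c)
        # Each plate adds its heat to the coolant before the next plate sees it.
        coolant_c += load_w / (mass_flow_kg_s * WATER_CP_J_PER_KG_K)
    return inlets

chain_w = [350.0, 350.0, 100.0]  # hypothetical: two processors, then memory
inlets = inlet_temperatures_c(40.0, 1.0, chain_w)
for load_w, inlet_c in zip(chain_w, inlets):
    print(f"{load_w:.0f} W component sees coolant at {inlet_c:.1f} C")
```

Under these assumptions the last component in the chain receives coolant roughly 10°C warmer than the supply, which is why the supply temperature has to be set for the most restrictive downstream part rather than for the first cold plate.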

The net effect of all these factors is clear: widespread deployment of DLC to promote virtually free heat rejection and heat reuse will remain aspirational in all but a few select cases where the facility infrastructure is designed around a specific liquid-cooled IT deployment.

There are too many moving parts to accurately assess the precise requirements of mainstream DLC systems in the next five years. What is clear, however, is that the very same forces that are pushing the data center industry towards liquid cooling will also challenge some of the engineering assumptions around its expected benefits.

Operators that are considering dedicated heat rejection for DLC installations will want to make sure they prepare the infrastructure for a gradual decrease in facility supply temperatures. They can achieve this by planning increased space for additional or larger heat rejection units — or by setting the water temperature conservatively from the outset.

Temperature set points are not dictated solely by IT requirements, but also by flow rate considerations, which have consequences for pipe and pump sizing. Operating close to temperature limits means loss of cooling capacity for the coolant distribution units (CDUs), requiring either larger CDUs or more of them. Slim margins also mean any degradation or loss of cooling may have a near-immediate effect at full load: a cooling failure in water or single-phase dielectric cold-plate systems may have less than 10 seconds of ride-through time.

Today, temperatures seem to be converging around 32°C (89.6°F) for facility water — a good balance between facility efficiency, cooling capacity and support for a wide range of DLC systems. Site manuals for many water-cooled IT systems also have the same limit. Although this is far higher than any elevated water temperature for air-cooling systems, it still requires additional heat rejection infrastructure either in the form of water evaporation or mechanical cooling. Whether lower temperatures will be needed as server processors approach 500 W — with large memory arrays and even higher power accelerators — will depend on a number of factors, but it is fair to assume the likely answer will be “yes”, despite the high cost of larger mechanical plants.

These considerations and limitations are mostly defined by water cold-plate systems. Single-phase immersion with forced convection and two-phase coolants, probably in the form of cold-plate evaporators rather than immersion, offer alternative approaches to DLC that should help ease supply temperature restrictions. For the time being, water cold plates remain the most widely available and commonly deployed option, and mainstream data center operators will need to ensure they meet the requirements of the IT systems that use them.

In many cases, Uptime Intelligence expects operators to opt for lower facility supply water temperatures for their DLC systems, which brings benefits in lower pumping energy and fewer CDUs for the same cooling capacity, and is also more future proof. Many operators have already opted for conservative water temperatures as they upgrade their facilities for a blend of air and liquid-cooled IT. Others will install DLC systems that are not connected to a water supply but are air-cooled using fans and large radiators.


The Uptime Intelligence View

The switch to liquid to cool IT electronics offers a host of energy and compute performance benefits. However, future expectations based on the past performance of DLC installations are unlikely to be met. The challenges of silicon thermal management will only become more difficult as new generations of high-power server and memory chips develop. This is due to stricter component temperature limits, with future maximum facility water temperatures to be set at more conservative levels. For now, the vision of a lean data center cooling plant without either compressors or evaporative water consumption remains elusive.