Forecasting the solar storm threat

A proposed permanent network of electromagnetic monitoring stations across the continental US, operating in tandem with a machine learning (ML) algorithm, could facilitate accurate predictions of geomagnetic disturbances (GMDs). If realized, this predictive system could help grid operators avert disruption and reduce the likelihood of damage to their — and their customers’ — infrastructure, including data centers.

Geomagnetic disturbances, also referred to as “geomagnetic storms” or “geomagnetic EMP”, occur when violent solar events interact with Earth’s atmosphere and magnetic field. Solar events that cause geomagnetic EMP (such as coronal mass ejections or solar flares) occur frequently but chaotically, and are often directed away from Earth. The only available long-term predictions are probabilistic and imprecise: for example, an extreme geomagnetic EMP typically occurs once every 25 years. When a solar event occurs, the US Space Weather Prediction Center (SWPC) can give hours’ to days’ notice of when it is expected to reach Earth. At present, these warnings lack practical information regarding the intensity and the location of such EMPs’ effects on power infrastructure and customer equipment (such as data centers).

A GMD produces geomagnetically induced currents (GICs) in electrical conductors. The low frequency of a GMD concentrates GICs in very long electrical conductors — such as the high-voltage transmission lines in a power grid. A severe GMD can cause high-voltage transformer damage and widespread power outages — which could last indefinitely: high-voltage transformers have long manufacturing lead times, even in normal circumstances. Some grid operators have begun protecting their infrastructure against GICs. Data centers, however, are at risk of secondary GIC effects through their connections to the power grid; many data center operators have not taken protective measures against GMDs, or any other form of EMP (see Electromagnetic pulse and its threat to data centers).

In the event of a less intense GMD, grid operators can often compensate for GICs without failures. Data centers, however, may experience power-quality issues such as harmonic distortions (defects in AC voltage waveforms). Most data center uninterruptible power supply (UPS) systems are designed to accommodate some harmonics and protect downstream equipment, but the intense effects of a GMD can overwhelm these built-in protections — potentially damaging the UPS or other equipment. The effects of harmonics inside a data center can include inefficient UPS operation, UPS rectifier damage, tripped circuit breakers, overheated wiring, malfunctioning motors in mechanical equipment and, ultimately, physical damage to IT equipment.

The benefit to data center operators from improved forecasting of GMD effects is greatest in the event of these less intense incidents, which threaten equipment damage to power customers but are insufficient to bring down the power grid. An operator’s best defense against secondary GIC effects is to pre-emptively disconnect from the grid and run on backup generators. Actionable, accurate, and localized forecasting of GIC effects would better prepare operators to disconnect in time to avert damage (and to avoid unnecessary generator runtime in regions where this is strictly regulated).

An added challenge is that the problem of geomagnetic effects on power infrastructure is interdisciplinary: the interactions between Earth’s magnetic field and the power grid have historically not been well understood by experts in either geology or electrical infrastructure. Computationally simulating the effects of geomagnetic events on grid infrastructure is still not practically feasible.

This might change with rapid advancements in computer performance and modeling methods. At the 2022 InfraGard National Disaster Resilience Council Summit in the US, researchers at Oregon State University presented a machine learning approach that could produce detailed geomagnetic forecasting — with the objective of giving grid operators the information needed to assess what protection their grid infrastructure requires.

Better modeling and forecasting of GMD effects requires many measurements spanning a geographic area of interest. The Magnetotelluric (MT) Array collects data across the continental US using seven permanent stations and over 1,600 temporary locations (as of 2022), arranged on a grid at 43-mile (70 km) spacing. Over 1,900 temporary MT stations are planned by 2024. Station instruments measure time-dependent changes in Earth’s electric and magnetic fields, providing insight into the resistivity and electromagnetic impedance of Earth’s crust and upper mantle in three dimensions. This data informs predictions of GIC intensity, which closely correlates with damaging effects on power infrastructure. The MT Array provides a dramatic and much-needed improvement in the resolution of data available on these geomagnetic effects.

Figure 1 Magnetotelluric Array stations (2022). Map image © Google

Researchers trained a machine learning model on two months of continuous and simultaneous data output from an array of 25 MT stations in Alaska (US). The trained model effectively predicts geomagnetic effects with 30 minutes’ advance notice. Fortunately, scaling these forecasting capabilities to the continental US will not require the long-term operation of thousands of MT stations. The trained model can forecast geomagnetic effects at the 43-mile (70 km) resolution of the full MT Array with significantly fewer permanent stations providing input.
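The presentation does not detail the model architecture, so the sketch below is purely illustrative — not the Oregon State implementation. It shows the general shape of such a pipeline: a regressor trained on sliding windows of multi-station field readings to predict a GIC proxy 30 minutes ahead. The synthetic data, window sizes and model choice are all assumptions.

```python
# Illustrative sketch only (not the published method): predict a GIC proxy
# 30 minutes ahead from sliding windows of multi-station field readings.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
N_STATIONS = 25    # stations, matching the Alaska pilot array
SAMPLES = 2000     # one synthetic reading per minute
WINDOW = 60        # minutes of history fed to the model
HORIZON = 30       # forecast lead time in minutes

# Synthetic stand-ins: magnetic-field readings per station, and a target
# proxy for GIC intensity (here, the mean absolute field rate of change).
field = rng.normal(size=(SAMPLES, N_STATIONS)).cumsum(axis=0)
target = np.abs(np.diff(field, axis=0)).mean(axis=1)

X, y = [], []
for t in range(WINDOW, SAMPLES - HORIZON - 1):
    X.append(field[t - WINDOW:t].ravel())  # flattened window, all stations
    y.append(target[t + HORIZON])          # value 30 minutes ahead
X, y = np.array(X), np.array(y)

# Chronological split (no shuffling) to mimic real forecasting conditions.
X_train, X_test, y_train, y_test = train_test_split(X, y, shuffle=False)
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X_train, y_train)
print("holdout R^2:", round(model.score(X_test, y_test), 3))
```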

The proposed permanent network is called the “Internet of MT” (IoMT) and would cover the continental US with just 500 permanently installed devices producing ongoing forecasts, on a grid at 87-mile (140 km) spacing. These devices are designed differently from the equipment at today’s MT Array stations: while collecting the same types of data, they have several advantages. Powered by solar panels and uploading data automatically over a mobile network connection, the IoMT devices have a smaller footprint and a much lower cost of acquisition — approximately $5,000 per station (in contrast to current MT Array station equipment, which would cost $60,000 to install permanently).

The MT Array has, so far, been financed through funding from various US government agencies, including the National Science Foundation (NSF), the National Aeronautics and Space Administration (NASA), and the United States Geological Survey (USGS). Though the IoMT’s equipment design promises a significantly lower cost of acquisition and installation than the technology used in today’s temporary array, funding for this next phase has not yet been secured.

Detailed geomagnetic forecasts could make it possible for grid operators to take proactive steps to protect their infrastructure — preventing prolonged power outages and sparing their customers (including data centers) damaging secondary effects. The predictions offered through the IoMT provide a model that could be used worldwide to address the risks inherent in the threat of geomagnetic EMP. Though it is too early to anticipate how this data could be distributed to data center operators, the value of proactive defense from GMDs may support a subscription service — for instance, on the part of companies that provide weather data.

Cloud migrations to face closer scrutiny

Big public-cloud operators have often had to compete against each other — sometimes ferociously. Only rarely have they had to compete against alternative platforms for corporate IT, however. More often than not, chief information officers (CIOs) responsible for mission-critical IT have seen a move to the public cloud as low-risk, flexible, forward-looking and, ultimately, inexpensive. But these assumptions are now coming under pressure.

As the coming years threaten to be economically and politically turbulent, infrastructure and supply chains will be subject to disruption. Increasing government and stakeholder interest will force enterprises to scrutinize the financial and other risks of moving on-premises applications to the public cloud. More effort, and more investment, may be required to ensure that resiliency is both maintained and clearly evident to customers. While cloud has, in the past, been viewed as a low-risk option, the balance of uncertainty is changing — as are the cost equations.

Although the picture is complicated, with many factors at play, there are some signs that these pressures may already be slowing adoption. Amazon Web Services (AWS), the largest cloud provider, reported a historic slowdown in growth in the second half of 2022, after nearly a decade of 30% to 40% year-on-year increases. Microsoft, too, has flagged a likely slowdown in the growth of its Azure cloud service.

No one in the industry is suggesting that the adoption of public cloud has peaked, or that it is no longer of strategic value to large enterprises. Use of the public cloud is still growing dramatically and is still driving growth in the data center industry. Public cloud will continue to be the near-automatic choice for most new applications, but organizations with complex, critical and hybrid requirements are likely to slow down or pause their migrations from on-premises infrastructure to the cloud.

Is the cloud honeymoon over?

Many businesses have been under pressure to move applications to the cloud quickly, without comprehensive analysis of the costs, benefits and risks. CIOs, often prompted or backed by heads of finance or chief executives, have favored the cloud over on-premises IT for new and / or major projects.

Data from the Uptime Institute Global Data Center Survey 2022 suggests that, while many were initially wary, organizations are becoming more confident in using the cloud for their most important critical workloads. The proportion of respondents not placing mission-critical workloads into the public cloud has dropped from 74% in 2019 to 63% in 2022. Figure 1 shows the drivers encouraging on-premises to cloud migrations — C-level enthusiasm and positive perceptions of inexpensive performance among them.

Figure 1 Drivers for and barriers to cloud migration (infrastructure-level factors)

High-profile cloud outages, however, together with increasing regulatory interest, are encouraging some customers to take a closer look. Customers are beginning to recognize that not all applications have been architected to take advantage of key cloud features — and architecting applications properly can be very costly. “Lifting and shifting” applications that cannot scale, or that cannot track changes in user demand or resource supply dynamically, is unlikely to deliver the full benefits of the cloud and could create new challenges. Figure 1 shows how several internal (IT) and external (macroeconomic) pressures could suppress growth in the future.

One particular challenge is that many applications have not been rearchitected to meet business objectives — most notably resiliency. Many cloud customers are not fully aware of their responsibilities regarding the resiliency and scalability of their application architecture, in the belief that cloud companies take care of this automatically. Cloud providers, however, make it explicitly clear that zones will suffer outages occasionally and that customers are required to play their part. Cloud providers recommend that customers distribute workloads across multiple availability zones, thereby increasing the likelihood that applications will remain functional, even if a single availability zone falters.

Research by Uptime shows how vulnerable enterprise-cloud customers currently are to single-zone outages. Data from the Uptime Institute Global Data Center Survey 2022 shows that 35% of respondents believe the loss of an availability zone would result in significant performance issues, and only 16% of respondents indicated that the loss of an availability zone would not impact their cloud applications.

To capture the full benefits of the cloud and to reduce the risk of outages, organizations need to (re)architect for resiliency. This resiliency has upfront and ongoing cost implications, which need to be factored into any decision to migrate applications from on-premises to the cloud. Uptime Intelligence has previously found that architecting an application across dual availability zones can cost 43% more than a non-duplicated application (see Public cloud costs versus resiliency: stateless applications). Building across regions, which further improves resiliency, can double costs. Some applications might not be worth migrating to the cloud once the additional expense of resiliency is factored into the application architecture.
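A rough worked example shows how these premiums compound; the percentages are the Uptime figures cited above, while the $10,000 monthly baseline is an arbitrary assumption for illustration:

```python
# Worked example of the resiliency premiums cited above.
# The $10,000 monthly baseline is an arbitrary assumption.
baseline = 10_000                 # single-zone deployment, $/month
dual_zone = baseline * 1.43       # dual availability zones: ~43% more
multi_region = baseline * 2.0     # cross-region architecture: can double cost

print(f"single zone:  ${baseline:,.0f}/month")
print(f"dual zone:    ${dual_zone:,.0f}/month")
print(f"multi-region: ${multi_region:,.0f}/month")
```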

Economic forces will reduce pressure to migrate to the cloud

Successful and fully functional cloud migrations of critical workloads carry additional costs that are often substantial — a factor that is only now starting to be fully understood by many organizations.

These costs include both the initial phase — when applications have to be redeveloped to be cloud-native, at a time when skills are in short supply and high demand — and the ongoing consumption charges that arise from long periods of operation across multiple zones. The cost of the cloud has clearly not always been factored in: cost was a major reason for organizations moving workloads back on-premises from the public cloud, cited by 43% of respondents to Uptime Institute’s Data Center Capacity Trends Survey 2022.

Server refresh cycles often act as a trigger for cloud migration. Rather than purchasing new physical servers, IT C-level leaders choose to lift-and-shift applications to the public cloud. Uptime’s 2015 global survey of data center managers showed that 35% of respondents kept their servers in operation for five years or more; this proportion had increased to 52% by 2022. During challenging economic times, CIOs may be choosing to keep existing servers running instead of investing in a migration to the cloud.

Even if CIOs continue to exert pressure for a move to the cloud, this will be muted by the need to justify the expense of migration. Although migration allows for a reduction in on-premises IT and data center footprints, many organizations do not have the leeway to absorb the unexpected costs required to make cloud applications more resilient or performant. Poor access to capital, together with tighter budgets, will force executives to think carefully about the need for full cloud migrations. Applications with a clear return on investment will continue to move to the cloud; migrations that are borderline may be put on the back burner until conditions are clearer.

Additional pressure from regulators

Governments are also becoming concerned that cloud applications are not sufficiently resilient, or that they present other risks. The dominance of Amazon, Google and Microsoft (the “hyperscalers”) has raised concerns regarding “concentration risk” — an over-reliance on a limited number of cloud providers — in several countries and key sectors.

Regulators are taking steps to assess and manage this concentration risk, amid concerns that it could threaten the stability of many economies. The EU’s recently adopted Digital Operational Resilience Act (DORA) provides a framework for making the oversight of outsourced IT providers (including cloud) the responsibility of financial market players. The UK government’s Office of Communications (Ofcom) has launched a study into the country’s £15 billion public-cloud-services market. The long-standing but newly updated Gramm-Leach-Bliley Act (GLBA, also known as the Financial Services Modernization Act) in the US now requires regular cyber and physical security assessments.

The direction is clear. More organizations are going to be required to better evaluate and plan risks arising from third-party providers. This will not always be easy or accurate. Cloud providers face the same array of risks (arising from cyber-security issues, staff shortages, supply chains, extreme weather and unstable grids, etc.) as other operators. They are rarely transparent about the challenges associated with these risks.

Organizations are becoming increasingly aware that lifting and shifting applications from on-premises to public-cloud locations does not guarantee the same levels of performance or availability. Applications must be architected to take advantage of the public cloud — with the resulting upfront and ongoing cost implications. Many organizations may not have the funds (or indeed the expertise and / or staff) to rearchitect applications during these challenging times, particularly if the business benefits are not clear. Legislation will force regulated industries to consider all risks before venturing into the public cloud. Much of this legislation, however, is yet to be drafted or introduced.

How will this affect the overall growth of the public cloud and its appeal to the C-level management? Hyperscaler cloud providers will continue to expand globally and to create new products and services. Enterprise customers, in turn, are likely to continue finding cloud services competitive. The rush to migrate workloads will slow down as organizations do the right thing: assess their risks, design architectures that help mitigate those risks, and move only when ready to do so (and when doing so will add value to the business).


The full report Five data center predictions for 2023 is available here.

See our Five Data Center Predictions for 2023 webinar here.

Accounting for digital infrastructure GHG emissions

A host of regulations worldwide have introduced (or will introduce) legal mandates forcing data center operators to report specific operational data and metrics. Key examples include the European Union’s Corporate Sustainability Reporting Directive (CSRD); the European Commission’s proposed Energy Efficiency Directive (EED) recast; the US Securities and Exchange Commission’s (SEC) draft climate disclosure proposal; and various national reporting requirements (including in Brazil, Hong Kong, Japan, New Zealand, Singapore, Switzerland and the UK) under the Task Force on Climate-related Financial Disclosures (TCFD). The industry, however, is not currently adequately prepared to address these requirements.

Current data-exchange practices lack consistency — and any recognized consensus — on reporting sustainability-related data, such as energy and water use, greenhouse gas (GHG) emissions and operational metrics. Many enterprises have, in discussions with Uptime Institute, indicated that it is difficult (and sometimes impossible) to obtain energy and emissions data from colocation and cloud operators.

The Uptime Institute Global Data Center Survey 2022 made clear that data center operators’ readiness to report GHG emissions has seen incremental improvement over previous surveys, with only 37% of respondents indicating that they are prepared to publicly report their GHG emission inventories (up by just 4 percentage points on the previous year). Of these, less than one-third of respondents are currently including their Scope 3 emissions inventories.

Fortunately, most of the reporting regimes will become effective for the 2024 reporting year, giving data center managers time to work with their colocation and cloud providers on obtaining the necessary data, and to put their carbon accounting processes in order. While the finer details will vary according to each enterprise’s digital infrastructure footprint, there are certain common steps that data center managers can implement to facilitate the collection of quality data to fulfill these new reporting mandates.

Colocation operations

The GHG Protocol classifies emissions as Scope 1 and Scope 2 where an entity has operational or financial control. Having analyzed these definitions, Uptime’s position is that IT operators exercise both operational and financial control over their IT operations in colocation data centers.

From an operational control standpoint, IT operators specify and purchase the IT equipment installed in the colocation space, set the operating parameters for that equipment (power management settings, virtual machine creation and assignment, hardware utilization levels, etc.) and maintain and monitor operations. Similarly, IT operators have financial control: they purchase, install, operate and maintain the IT equipment. On this basis, GHG emissions from IT operations in a colocation facility should be classified as Scope 2 for the IT operator and Scope 3 for the colocation operator. Emissions (and energy use) from facility functions, such as power distribution losses from the grid connection to the IT hardware, and cooling energy, should fall into Scope 3 for the IT operator tenant.

Table 1 outlines Scope 2 and Scope 3 emissions reporting responsibilities for IT operations in enterprise (owned), colocation and public cloud data centers under GHG Protocol Corporate Accounting and Reporting Standards.

Table 1 Emissions Scope assignments for IT operations in different data center types
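A minimal sketch of the assignment logic described above and summarized in Table 1; the facility types, party names and dictionary keys are illustrative assumptions rather than any standard schema.

```python
# Sketch of the Scope assignments for IT-operations energy described above.
# Facility types, party names and labels are illustrative only.
SCOPE_ASSIGNMENTS = {
    # (facility type, reporting party): scope of IT-operations emissions
    ("enterprise", "it_operator"):      "Scope 2",
    ("colocation", "it_operator"):      "Scope 2",  # Uptime's recommended treatment
    ("colocation", "colo_operator"):    "Scope 3",
    ("public_cloud", "cloud_operator"): "Scope 2",
    ("public_cloud", "it_operator"):    "Scope 3",
}

def scope_for(facility: str, party: str) -> str:
    """Return the emissions scope for IT operations in a given facility type."""
    return SCOPE_ASSIGNMENTS[(facility, party)]

# Facility functions (cooling, distribution losses) in colocation fall to
# Scope 2 for the colocation operator and Scope 3 for the tenant.
print(scope_for("colocation", "it_operator"))  # -> Scope 2
```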

In collaboration with colocation and IT operators, Business for Social Responsibility (a sustainable business network and consultancy) published initial guidance on emissions accounting in 2017: GHG Emissions Accounting, Renewable Energy Purchases, and Zero-Carbon Accounting: Issues and Considerations for the Colocation Data Center Industry. This guidance did not, however, take a position on the assignment of Scope 2 and 3 emissions in colocation operations, leaving the decision to individual colocation operators.

In practice, different operators use two different accounting criteria. Equinix, for example, accounts for all energy use and emissions as Scope 2, with emissions effectively passed to tenants as Scope 3. NTT follows the approach (also recommended by Uptime) that GHG emissions from the energy use of IT operations in a colocation facility should be classified as Scope 2 for the IT operator and Scope 3 for the colocation operator.

The use of two different accounting criteria creates confusion and makes the comparison and understanding of emissions reports across the data center industry difficult. The industry needs to settle on a single accounting methodology for emissions reporting.

The GHG Protocol Corporate Accounting and Reporting Standards are likely to be cited as governing the classification of Scope 1, 2 and 3 emissions under legal mandates such as the CSRD and the proposed SEC climate disclosure requirements. Uptime recommends that colocation operators and their tenants conform to the GHG Protocol to meet these legal requirements.

Public cloud operations

Emissions accounting for IT operations in a public-cloud facility is straightforward: all emissions are Scope 2 for the cloud operator (since they own and operate the IT and facilities infrastructure) and Scope 3 for the IT operator (customer).

A cloud operation in a colocation facility adds another layer of allocation. Public-cloud IT energy use should be accounted for as Scope 3 by the colocation operator and Scope 2 by the cloud operator, with facility infrastructure-related emissions accounted for as Scope 2 by the colocation operator and Scope 3 by the cloud operator. This represents no change for the IT operator: all emissions associated with its cloud-based applications and data — regardless of where that cloud footprint exists — will be accounted for as Scope 3.

IT operators report that they have difficulty obtaining energy-use and emissions information from their cloud providers. The larger cloud operators, and several of the large colocation providers, typically claim that there are zero emissions associated with operations at their facilities because they are carbon-neutral on account of buying renewable energy and carbon offsets. The same providers are typically unable or unwilling to provide more detailed information — making compliance with legally mandated reporting requirements difficult for IT operators.

If IT operators are to comply with forthcoming disclosure obligations they will, in accordance with the GHG Protocol, need data on their energy use and their location-based (grid power mix) and market-based (contractual mix) emissions. They will also need more granular information on renewable energy consumption and the application of renewable energy certificates (RECs) in offsetting grid power use and the associated emissions if they are to fully understand the underlying details.

Required cloud and colocation provider sustainability data

With new sustainability reporting regulations due to take effect in the medium term, IT operators will clearly need more detailed data on energy and emissions from their infrastructure providers, both to meet their compliance responsibilities and to assess the total environmental impact of their operations. Colocation and cloud services providers (and others providing hosting and various IT infrastructure services) will be expected to provide the data listed below — ideally as a condition of any service contract. This data will provide the information necessary to complete TCFD climate disclosures, as well as the IT operator’s sustainability report. Additional data may need to be added to this list to address specific local reporting or operating-efficiency mandates.

Data-transfer requirements for colocation and cloud services contracts should facilitate the annual reporting of operational data including:

  • IT power consumption as reported through the operator-specific meter.
  • 12-month average PUE for the space supporting the racks.
  • Quantity of waste heat recovered and reused.
  • Total-facility electricity consumption (over the year).
  • The percentage of each type of generation supplying electricity to the facility (i.e., coal, natural gas, wind, solar, biomass etc.).
  • Quantity of renewable energy consumed by the facility (megawatt hours, MWh).
  • MWh of RECs / grid offsets (guarantees of origin, GOs) matched to grid purchases to claim renewable energy “use” and / or to offset grid emissions (to include generation type(s) and the avoided emissions value for each group of RECs or GOs used to match grid electricity consumption).
  • Percentage of renewable energy “used” (consumed and matched) at the facility as declared by the supplier.
  • Reported location-based emissions for the facility (metric tons of CO2, MT CO2).
  • Reported market-based emissions for the facility (MT CO2).
  • Average annual emissions factor of electricity supplied (by utility, energy retailer or country or grid region) (MT CO2/MWh).
  • Total-facility water consumption (over the year).

Note: GHG emissions values should be reported for the facility’s total fuel and electricity consumption. Scope 1 emissions should include any refrigerant emissions (fugitive or failure).

Energy-use data should be requested monthly or quarterly (recognizing that data reports from service providers will typically lag by one to three months) to allow tracking of power consumption, emissions metrics and objectives throughout the year. Current-year emissions can be estimated using the previous year’s emissions factor for electrical consumption at a facility.
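As a simplified worked illustration of the accounting described above (all input figures are invented, and a full market-based calculation would apply a residual mix factor to unmatched consumption rather than the grid average used here):

```python
# Simplified, illustrative emissions estimate; all figures are invented.
grid_mwh = 12_000   # annual electricity drawn from the grid (MWh)
rec_mwh = 8_000     # MWh matched with RECs / guarantees of origin
factor = 0.35       # prior-year grid emissions factor (MT CO2 per MWh)

location_based = grid_mwh * factor                  # ignores contractual instruments
market_based = max(grid_mwh - rec_mwh, 0) * factor  # nets out matched MWh
# (Simplification: a full market-based figure would use the residual mix
# factor, not the grid average, for the unmatched 4,000 MWh.)

print(f"location-based: {location_based:,.0f} MT CO2")
print(f"market-based:   {market_based:,.0f} MT CO2")
```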

Mandated reporting requirements will typically require data to be submitted in March following the end of the reporting year. Therefore, service agreements should allow for this to be delivered to clients by February.

Colocation and cloud-service providers need to develop methodologies to provide energy-use and location- and market-based-emissions estimates to their clients. Colocation providers should install metering to measure tenants’ IT power consumption, simplifying allocated energy use and emissions reporting. Cloud providers have several different approaches available for measuring or estimating energy use. Algorithms can be created that use IT-system and equipment power and utilization data (with tracking capabilities and / or knowledge of the power-use characteristics of the deployed IT equipment configurations) to estimate a customer’s energy use and associated location-based emissions.
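One plausible shape for such an algorithm is a linear interpolation between idle and full-load server power, weighted by utilization, with a PUE uplift for facility overheads. The sketch below is illustrative only — every coefficient is an assumption, not any provider's published methodology:

```python
# Illustrative sketch: estimating a cloud customer's energy and emissions
# from utilization data. All coefficients are assumptions.
def server_energy_kwh(idle_w: float, max_w: float,
                      utilization: float, hours: float) -> float:
    """Energy of one server at a constant average utilization (0..1),
    using a linear idle-to-peak power model."""
    power_w = idle_w + (max_w - idle_w) * utilization
    return power_w * hours / 1000.0

it_kwh = server_energy_kwh(idle_w=200, max_w=750, utilization=0.4, hours=730)
facility_kwh = it_kwh * 1.4                # assumed PUE of 1.4 for overheads
emissions_mt = facility_kwh / 1000 * 0.35  # assumed 0.35 MT CO2 per MWh
print(f"{it_kwh:.0f} kWh IT load, {facility_kwh:.0f} kWh at the meter, "
      f"{emissions_mt:.2f} MT CO2 (location-based)")
```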

Any calculation methodology should be transparent for customers. Cloud providers will need to choose a methodology that fits with their data collection capabilities and start providing data to their customers as soon as possible.

IT operators need to obtain information on the RECs, GOs and carbon offsets applied to the overall energy use at each facility at which they operate. This data will allow IT operators to validate the actual emissions associated with the energy consumed by the data center, as well as claims regarding renewable energy use and GHG emissions reductions. IT operators will need to exercise due diligence to ensure that data is accurately reported, and will need to match the service provider’s data to the operator’s chosen sustainability metrics.

Data required from IT tenants at colocation facilities

Colocation operators may require operational information from their tenants. The proposed EED recast is likely to require colocation operators to report specific IT operational data — which will have to be supplied by their tenants. At a minimum, colocation operators need to incorporate a clause into their standard contracts requiring tenants to provide legally mandated operational data. Contract language can be made more specific to the facilities covered by the forthcoming mandates as new regulations are promulgated.

Conclusion

The reporting of data center energy use and GHG emissions is undergoing a major transition — from a voluntary effort subject to limited scrutiny to legally mandated reporting requiring third-party assurance. These legal requirements can extend to smaller enterprise and colocation operators: the EED recast, for example, will apply to operations with just 100 kW of installed IT equipment power. These forthcoming requirements will require IT operators to take responsibility for their operations across all data-center categories — owned, colocation and cloud.

This new regulatory environment will mean digital infrastructure managers will now have to facilitate collaboration between their facilities teams, IT teams and data center service providers to create a coherent sustainability strategy across their operations. Processes will need to be created to generate, collect and report the data and metrics needed to comply with these requirements. At the industry level, standards need to be developed to create a consistent framework for data and metrics reporting.  

These efforts need to be undertaken with some urgency since many of these new reporting obligations will take effect from the 2023 or 2024 operating year.

Data center costs set to rise and rise

Up until two years ago, the cost of building and operating data centers had been falling reasonably steeply. Improving technology, greater production volumes as the industry expanded and consolidated, large-scale builds, prefabricated and modular construction techniques, stable energy prices and the low cost of capital have all played a part. While labor costs have risen during this time, better management, processes and automation have helped to prevent spiraling wage bills.

The past two years, however, have seen these trends come to a halt. Ongoing supply chain issues and rising labor, energy and capital costs are all set to make building and running data centers more expensive in 2023 and beyond.

But the impact of these cost increases — affecting IT as well as facilities — will be muted due to the durable growth of the data center industry, fueled by global digitization and the overwhelming appetite for more IT. In response, most large data center operators (and data center capacity buyers) are continuing to move forward with expansion projects and taking on more space.

However, smaller and medium-sized data center operators that lack the resources to weather higher costs are likely to find this environment particularly challenging, with some smaller colocation operators (and enterprise data centers) struggling to remain competitive. Increasing overhead costs arising from new regulatory requirements and climbing interest rates will further challenge some operators — but an immediate rush to the public cloud is unlikely, since this strategy, too, has non-trivial (and often high) costs.

Capital costs

Capital plays a major part in data center life cycle costs. Capital has been both cheap and readily available to data center builders for more than a decade, but the market changed in 2022. Countries that are home to major data center markets, or to major companies that own and build data centers, are now facing decades-high inflation rates (see Table 1), making it more difficult and more expensive to raise capital. Even so, with increasing demand for capacity — partly due to pent-up demand resulting from construction bottlenecks during the COVID-19 pandemic, and to permitting and energy supply problems more recently — the most active and best positioned operators are funding their capacity expansion.

Table 1 Inflation rates as at September 2022

Uptime Institute’s Data Center and IT Spending Survey 2022 shows that more than two-thirds of enterprise and colocation operators expect their data center spending to increase in 2023. Most enterprise operators (90%) say they will be adding IT or data center capacity over the next two to three years, with half expecting to construct new facilities (although they may be closing down others).

The recent rise in construction costs may have come as a shock to some. Data center construction costs and lead-times had improved significantly in the 2010s, but we are now seeing a reversal of this trend. An average Tier III enterprise data center (a technical facility with concurrently maintainable site infrastructure) would have cost approximately $12 million per megawatt (MW) in 2010 per Uptime’s estimates (not including land and civil works) and would have taken up to two years to build.

Changes in design and construction had resulted in these costs dropping — in the best cases, to as little as $6 to $8 million per MW immediately before the COVID-19 pandemic, with lead-times cut to less than 12 months. While Uptime has not verified these claims, some projects were reported to have been budgeted at less than $4 million per MW and taken just six months to complete.

The view today is markedly different. Long waiting times for some significant components (such as certain engine generators and centralized UPS systems) are driving up prices. By 2022, costs for Tier III specifications had risen by $1 million to $2 million per MW, according to Uptime’s estimates. Lead-times can now reach or exceed 12 months, prolonging capacity expansion and refurbishment projects — and sometimes preventing operators from earning revenue from near-complete facilities.
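Put in concrete terms, the per-MW estimates above imply the following totals for a hypothetical 10 MW Tier III build. The 2022 range here is derived by adding the cited $1 million to $2 million increase to the pre-pandemic best cases — an assumption made for illustration:

```python
# Worked example: total build cost for a hypothetical 10 MW Tier III facility,
# using the per-MW estimates cited above (land and civil works excluded).
CAPACITY_MW = 10
cost_per_mw = {
    "2010":                   (12e6, 12e6),
    "pre-pandemic best case": (6e6, 8e6),
    "2022 (assumed range)":   (7e6, 10e6),  # pre-pandemic range + $1M-$2M/MW rise
}
for period, (low, high) in cost_per_mw.items():
    low_total = low * CAPACITY_MW / 1e6
    high_total = high * CAPACITY_MW / 1e6
    print(f"{period}: ${low_total:,.0f}M to ${high_total:,.0f}M")
```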

While prices for some construction materials have started to stabilize since the COVID-19 pandemic — albeit at elevated levels — overall prices are expected to increase further in 2023. Product shortages, together with higher prices for labor, semiconductors and power, are all having an inflationary effect across the industry. Concurrently, site acquisitions at major data center hubs with low-latency network connections now come at a premium, as popular data center locations run out of suitable land and power.

Uptime Institute’s Supply Chain Survey 2022 shows computer room cooling units, UPS systems and power distribution components to be the data center equipment most severely impacted by shortages. Of the 678 respondents to this survey, 80% said suppliers had increased their prices over the past 18 months. Notably, Li-ion battery prices, which had been trending downwards every year until 2021, increased in 2022 due to shortages of raw materials coupled with high demand.

More stringent sustainability requirements, too, contribute to higher capital costs. Regulations in some major data center hubs (such as Amsterdam and Singapore) mean only developments with highly energy efficient designs can move forward. But meeting these requirements will come at a cost (engineering fees, structural changes, different cooling systems), raising the barriers to entry. New energy efficiency standards (as stipulated under the EC’s Energy Efficiency Directive recast, for example) will stress budgets still further (see Critical regulation: the EU Energy Efficiency Directive recast).

Operators are looking to recover the cost of sustainability requirements through efficiency gains. Surging power costs, which are likely to remain high in the coming years, now mean the calculation has shifted in favor of more aggressive energy optimization — but upfront capital requirements will often be higher.

Operating and IT costs

The operating expenditures associated with data centers and IT infrastructure are also set to increase in 2023, due to steep rises in major input costs. Uptime Institute’s Data Center and IT Spending Survey 2022 showed power to be driving the greatest unit cost increases for most operators (see Figure 1) — the result of high gas prices, the transition to renewable energy, imbalances in grid supply and the war in Ukraine.

The UK and the EU have been most affected by these increases, with certain colocation operators passing significant energy cost increases on to their customers. While energy prices are expected to drop (at least against the record highs of 2022), they are likely to remain well above the average levels of the past two decades.

Figure 1 Enterprise data centers most impacted by IT hardware costs

Second only to power, IT hardware showed the next greatest increase in unit costs for enterprise data center respondents, partly because of various dislocations in the hardware supply chain, shortages of some processors and switching silicon, and inflation. Demand for IT hardware has continued to outpace supply, and manufacturing backlogs resulting from the COVID-19 pandemic have yet to catch up.

Uptime sees promising signs of improvements in data center hardware supply, largely due to a recent sag in global demand (caused by economic headwinds and IT investment cycles). As a result, prices and lead-times for generic IT hardware (with some exceptions) will likely moderate in the first half of 2023.

If history is any guide, demand for data center IT will rise again sometime in 2023, once some major IT infrastructure buyers accelerate their capacity expansion — which will yet again lead to tightness in the supply of select hardware later in the year.

Staffing will also play a major role in the increased cost of running data centers, and is likely to continue to impact the industry beyond 2023. Many operators say they are spending more on labor costs in a bid to retain current staff (see Figure 2). This presents a further challenge for those enterprises that are unable to match salary offers made by some of the booming tech giants.

Figure 2 Labor spending driven by staff retention initiatives

The aggregate view is clear: the overall costs of building and running data centers are set to rise significantly over the next few years. While businesses can deploy various strategies and technologies — such as automation, energy efficiency and tactical migration to the cloud — to reduce operational costs, these are likely to entail capital investment, new skills and technical complexity.

Will rising data center costs drive more operators towards colocation or the cloud? It seems unlikely that higher on-premises costs will cause greater migration per se. Results from Uptime Institute’s Data Center and IT Spending Survey 2022 show that despite increasing costs, many operators find that keeping workloads on-premises is still cheaper than colocation (54%, n=96) or migrating to the cloud (64%, n=84).

Estimating the costs of each of these options, however, is difficult in a rapidly changing market, in which some costs are opaque. Given the high costs associated with migrating to the cloud, it is likely to be cheaper for enterprises to endure higher construction and refurbishment costs in the near term and benefit from lower operating costs over the longer term. Not all companies will be able to capitalize on this strategy, however.

Those larger organizations with the financial resources to benefit from economies of scale, with the ability to raise capital more easily and with sufficient purchasing power to leverage suppliers, are likely to have lower costs compared with smaller companies (and most enterprise data centers). Given their scale, however, they are still likely to face higher costs elsewhere, such as sustainability reporting and calls for proving — and improving — their infrastructure resiliency and security.

The full report Five data center predictions for 2023 is available to download here.

See our Five Data Center Predictions for 2023 webinar here.


Max Smolaks

Douglas Donnellan

High costs drive cloud repatriation, but impact is overstated

Unexpected costs are driving some data-heavy and legacy applications back from public-cloud to on-premises locations. However, very few organizations are moving away from the public cloud strategically — let alone altogether.

The past decade has seen numerous reports of so-called cloud “repatriations” — the migration of applications back to on-premises venues following negative experiences with, or unsuccessful migrations to, the public cloud. These reports have been cited by some colocation providers and private-cloud vendors as evidence of the public cloud’s failures, particularly concerning cost and performance.

Cloud-storage vendor Dropbox brought attention to this issue after migrating from Amazon Web Services (AWS) in 2017. Documents submitted to the US Securities and Exchange Commission suggest the company saved an estimated $75 million over the following two years as a result. Software vendor 37signals also made headlines after moving its project management platform Basecamp and email service Hey from AWS and Google Cloud to a colocation facility.

Responses to Uptime Institute’s 2022 Data Center Capacity Trends Survey also indicated that some applications are moving back from the public cloud. One-third (33%) of respondents said their organizations had moved production applications from a public-cloud provider to a colocation facility or data center on a permanent basis (Figure 1). The terms “permanently” and “production” were included in this survey question specifically to ensure that respondents did not count applications being moved between venues as part of application development processes or redistribution across hybrid-cloud deployments.

Figure 1 Many organizations are moving some applications out of public cloud

Poor planning for scale is driving repatriation

Respondents to Uptime Institute’s 2022 Data Center Capacity Trends Survey cited cost as the biggest driver behind migration back to on-premises facilities (Figure 2).

Figure 2 Unexpected costs are driving repatriation

Why are costs greater than expected?

Data is often described as having “gravity” — meaning the greater the amount of data stored in a system the more data (and, very often, software applications) it will attract over time. This growth is logical in light of two major drivers: data growth and storage economics. Most users and applications accumulate more data automatically (and perhaps inadvertently) over time: cleaning and deleting data, on the other hand, is a far more manual and onerous (and, therefore, costly) task. At the same time, the economics of data storage promote centralization, largely driven by strong scale efficiencies arising from better storage management. Dropbox’s data-storage needs were always going to grow over time because it aggregated large volumes of consumer and business users — with each gradually storing more data with the service.

A key benefit of cloud computing is scalability — not just upwards during periods of high demand (to meet performance requirements), but also downwards when demand is low (to reduce expenditure). Dropbox cannot easily shrink its capacity, because its data has gravity: it cannot reduce costs by scaling back resources. Moreover, Dropbox, as a collection of private file repositories, does not benefit from other cloud services (such as web-scale databases, machine learning or Internet of Things technologies) that might use this data. Dropbox needs ever-growing storage capacity — and very little else — from a cloud provider. At Dropbox’s level of scale, the company would inevitably save money by buying storage servers as required and adding them to its data centers.

Does this mean all data-heavy customers should avoid the public cloud?

No. Business value may be derived from cloud services that use this growing data as a source. This value often justifies the expense of storing data in the cloud. For example, a colossal database of DNA sequences might create a significant monthly spend. But if a cloud analytics service (one that would otherwise be time consuming and costly to deploy privately) could use this data source to help create new drugs or treatments, the price would probably be worth paying.

Many companies will not have the scale of Dropbox to make on-premises infrastructure cost efficient in comparison with the public cloud. Companies with only a few servers’ worth of storage might not have the appetite (or the staff) to manage storage servers and data centers when they could, alternatively, upload data to the public cloud. However, the ever-growing cost of storage is by no means trivial, even for some smaller companies: 37signals’ main reason for leaving the public cloud was the cost of data storage — which the company stated was over $500,000 per year.

Other migrations away from the public cloud may be due to “lifting-and-shifting” existing applications (from on-premises environments to the public cloud) without rearchitecting these to be scalable. An application that can neither grow to meet demand, nor shrink to reduce costs, rarely benefits from deployment on the public cloud (see Cloud scalability and resiliency from first principles). According to Uptime Institute’s 2022 Data Center Capacity Trends Survey, the largest share (41%) of applications migrated back to on-premises infrastructure were existing applications that had previously been lifted and shifted to the public cloud.

The extent of repatriation is exaggerated

Since Dropbox’s migration, many analyses of cloud repatriation (and the associated commentary) have assumed an all-or-nothing approach to public cloud, forgetting that a mixed approach is a viable option. Organizations have many applications. Some applications can be migrated to the public cloud and perform as expected at an affordable price; others may be less successful. Just because a third of respondents have migrated some applications back from the public cloud, it does not necessarily mean the public cloud has universally failed at those organizations — nor that the public cloud is not a viable model for other applications.

Only 6% of respondents to Uptime Institute’s 2022 Data Center Capacity Trends Survey stated that they had abandoned the public cloud altogether as a result of repatriation (Figure 3). Some 23% indicated that repatriation had no impact on public-cloud usage, with 59% indicating that repatriation had somewhat reduced their cloud usage.

Figure 3 Overall impacts of moving from public cloud

The low number of respondents abandoning public cloud suggests most are pursuing a hybrid approach, involving both on-premises and public-cloud venues. These venues don’t necessarily work together as an integrated platform: hybrid IT here refers to an open-minded strategy regarding which venue best suits each application’s requirements.

Conclusion

Some applications are moving back to on-premises locations from the public cloud, with unexpected costs being the most significant driver here. These applications are likely to be slow-growing, data-heavy applications that don’t benefit from other cloud services, or applications that have been lifted and shifted without being refactored for scalability (upwards or downwards). The impact of repatriation on public-cloud adoption is, however, moderate at most. Some applications are moving away from the public cloud, but very few organizations are abandoning the public cloud altogether. Hybrid IT — at both on-premises and cloud venues — is the standard approach. Organizations need to thoroughly analyze the costs, risks and benefits of migrating to the public cloud before they move — not in retrospect.

Too hot to handle? Operators to struggle with new chips

Standard IT hardware was a boon for data centers: for almost two decades, mainstream servers have had relatively constant power and cooling requirements. This technical stability moored the planning and design of facilities (for both new builds and retrofits) and has helped attract investment in data center capacity and technical innovation. Furthermore, many organizations are operating data centers near or beyond their design lifespan because, at least in part, they have been able to accommodate several IT refreshes without major facility upgrades.

This stability has helped data center designers and planners. Data center developers could confidently plan for design power averaging between 4 kilowatts (kW) and 6 kW per rack, while (in specifying thermal management criteria) following US industry body ASHRAE’s climatic guidelines. This maturity and consistency in data center power density and cooling standards has, of course, been dependent on stable, predictable power consumption by processors and other server components.

The rapid rise in IT power density, however, now means that plausible design assumptions regarding future power density and environmental conditions are starting to depart from these standard, narrow ranges.

This increases technical and business risk — particularly given the potential divergence of future scenarios. The business costs of incorrect design assumptions can be significant: be too conservative (i.e., retain low-density approaches), and a data center may quickly become limited or even obsolete; be too technically aggressive (i.e., assume or predict highly densified racks and heat reuse), and there is a risk of significant overspend on underutilized capacity and capabilities.

Facilities built today need to remain economically competitive and technically capable for 10 to 15 years. This means certain assumptions must be speculative, made without data center designers knowing the future specifications of IT racks. As a result, engineers and decision-makers need to grapple with the uncertainty that will surround data center technical requirements for the second half of the 2020s and beyond.

Server heat turns to high

Driven by the rising power demands of IT silicon, server power and — in turn — typical rack power are both escalating. Extreme-density racks are also increasingly prevalent in technical computing, high-performance analytics and artificial intelligence training. New builds and retrofits will be more difficult to optimize for future generations of IT.

While server heat output remained relatively modest, it was possible to establish industry standards around air cooling. ASHRAE’s initial recommendations on supply temperature and humidity ranges (in 2004, almost 20 years ago) met the needs and risk appetites of most operators. ASHRAE subsequently encouraged incrementally wider ranges, helping drive industry gains in facilities’ energy efficiency.

Uptime Institute research shows a trend in consistent, if modest, increases in rack power density over the past decade. Contrary to some (aggressive) expectations, the typical rack remains under 10 kW. This long-running trend has picked up pace more recently, and Uptime expects it to accelerate further. The uptick in rack power density is not exclusively due to more heavily loaded racks. It is also due to greater power consumption per server, which is being driven primarily by the mass-market emergence of higher-powered server processors that are attractive for their performance and often superior energy efficiency if utilized well (Figure 1).

Figure 1 Server power consumption on a steep climb

This trend will soon reach a point when it starts to destabilize existing facility design assumptions. As semiconductor technology slowly — but surely — approaches its physical limits, there will be major consequences for both power delivery and thermal management (see Silicon heatwave: the looming change in data center climates).

“Hotter” processors are already a reality. Intel’s latest server processor series, expected to be generally available from January 2023, achieves thermal design power (TDP) ratings as high as 350 watts (W) — with optional configuration to more than 400 W should the server owner seek ultimate performance (compared with 120 W to 150 W only 10 years ago). Product roadmaps call for 500 W to 600 W TDP processors in a few years. This will result in mainstream “workhorse” servers approaching or exceeding 1 kW in power consumption each — an escalation that will strain not only cooling, but also power delivery within the server chassis.

Servers for high-performance computing (HPC) applications can act as an early warning of the cooling challenges that mainstream servers will face as their power consumption rises. ASHRAE, in a 2021 update, defined a new thermal standard (Class H1) for high-density servers requiring restricted air supply temperatures (of up to 22°C / 71.6°F) to allow for sufficient cooling, adding a cooling overhead that will worsen energy consumption and power usage effectiveness (PUE). This is largely because of the number of tightly integrated, high-power components. HPC accelerators, such as graphics processing units, can use hundreds of watts each at peak power — in addition to server processors, memory modules and other electronics.

The coming years will see more mainstream servers requiring similar restrictions, even without accelerators or densification. In addition to processor heat output, cooling is also constrained by markedly lower limits on processor case temperatures — e.g., 55°C, down from a typical 80°C to 82°C — for a growing number of models. Other types of data center chips, such as computing accelerators and high-performance switching silicon, are likely to follow suit. This is the key problem: removing greater volumes of lower-temperature heat is thermodynamically challenging.

Data centers strike the balance

Increasing power density may prove difficult at many existing facilities. Power or cooling capacity may be limited by budgetary or facility constraints — and upgrades may be needed for live electrical systems such as UPS, batteries, switchgear and generators. This is expensive and carries operational risks. Without it, however, more powerful IT hardware will result in considerable stranded space. In a few years, the total power of just a few servers will exceed 5 kW, and a quarter-rack of richly configured servers can reach 10 kW if concurrently stressed.
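A back-of-the-envelope illustration of this stranded-space arithmetic; the server power, rack budget and slot count below are assumptions consistent with the figures above:

```python
# Back-of-the-envelope rack arithmetic; all figures are assumptions
# consistent with the trends described above.
server_w = 1_000       # a near-future mainstream "workhorse" server (W)
rack_budget_kw = 6     # a legacy design point of 4-6 kW per rack
rack_slots = 42        # 1U server slots in a standard rack

servers_supported = int(rack_budget_kw * 1000 // server_w)
fill_ratio = servers_supported / rack_slots
print(f"{servers_supported} of {rack_slots} slots usable ({fill_ratio:.0%}); "
      f"the remaining rack space is stranded")
```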

Starting with a clean sheet, designers can optimize new data centers for a significantly denser IT configuration. There is, however, a business risk in overspending on costly electrical gear, unless managed by designing a flexible power capacity and technical space (e.g., prefabbed modular infrastructure). Power requirements for the next 10 to 15 years are still too far ahead to be forecast with confidence. Major chipmakers are ready to offer technological guidance covering the next three to five years, at most. Will typical IT racks reach average power capacities of 10 kW, 20 kW or even 30 kW by the end of the decade? What will be the highest power densities a new data center will be expected to handle? Today, even the best informed can only speculate.

Thermal management is becoming tricky, too. There are multiple intricacies inherent in any future cooling strategy. Many “legacy” facilities are limited in their ability to supply the necessary air flow to cool high-density IT. The restricted temperatures typically needed by (or preferable for) high-density racks and upcoming next-generation servers, moreover, demand higher cooling power — at the risk of otherwise losing IT performance (modern silicon throttles itself when it exceeds temperature limits). To this end, ASHRAE recommends dedicated low-temperature areas to minimize the hit to facilities’ energy efficiency.

A growing number of data center operators will consider support for direct liquid cooling (DLC), often as a retrofit. Although DLC engineering and operations practices have matured, and now offer a wider array of options (cold plates or immersion) than ever before, its deployment will come with its own challenges. A current lack of standardization raises fears of vendor lock-in and supply-chain constraints for key parts, as well as a reduced choice in server configurations. In addition, large parts of enterprise IT infrastructure (chiefly storage systems and networking equipment) cannot currently be liquid-cooled.

Although IT vendors are offering (and will continue to offer) more server models with integrated DLC systems, this approach requires bulk buying of IT hardware. For facilities’ management teams, this will lead to technical fragmentation involving multiple DLC vendors, each with its own set of requirements. Data center designers and operations teams will have to plan not only for mixed-density workloads, but also for a more diverse technical environment. The finer details of DLC system maintenance procedures, particularly for immersion-type systems, will be unfamiliar to some data center staff, highlighting the importance of training and codified procedure over muscle memory. The propensity for human error can only increase in such an environment.

The coming changes in data center IT will be powerful. Semiconductor physics is, fundamentally, the key factor behind this dynamic but infrastructure economics is driving it: more powerful chips tend to help deliver infrastructure efficiency gains and, through the applications they run, generate more business value. In a time of technological flux, data center operators will find there are multiple opportunities for gaining an edge over peers and competitors — but not without a level of risk. Going forward, adaptability is key.


See our Five Data Center Predictions for 2023 webinar here.


Jacqueline Davis, Research Analyst, Uptime Institute

Max Smolaks, Research Analyst, Uptime Institute