Common factors in IT venue selection (choosing in-house or outsourced)

Data center capacity is expanding rapidly in outsourced, third-party IT venues, such as colocation data centers and the public cloud. Whether measured in megawatts (MW) of uninterruptible power supply capacity or IT load, or by some other measure (such as square feet of white or leased space, or units of compute or storage), overall capacity is growing quickly.

There are many factors driving this growth, including newer cloud-based services, such as social media and streaming; mobile applications and services; new enterprise services and applications; the migration of corporate workloads into colocation sites; and the adoption of software as a service (SaaS) and public cloud platforms.

Does this mean that almost all IT workloads will end up running in a third-party data center? In a word, no: not for a long time and, in some cases, never. In the 2020 Uptime Institute survey, as in years past, we asked respondents to estimate, by percentage, how much of their workload/data is processed/stored in different types of data centers today and how this might look in two years’ time.

Respondents said that the majority of their workloads (58%) run in corporate data centers today — that is, enterprise-owned, on-premises facilities. Add in micro data centers and server closets, and that figure rises to 69%!

These findings are similar to those of our previous years’ surveys. They confirm Uptime Institute’s view that the enterprise-owned data center sector, while not necessarily the most innovative, will continue to be the foundation of enterprise IT for the next decade. In our survey, nearly two-thirds of IT workloads are expected to be running in privately owned environments (large data centers, server closets and micro data centers) by 2022, with the remainder contracted to external suppliers. Although the enterprise data center sector is falling as a percentage of the whole, the absolute amount of enterprise data center capacity is still growing.

A mix of factors typically drives enterprise demand for third-party IT venues; similarly, multiple factors drive demand for on-premises data centers. Often, some combination of the factors listed below comes into play when deciding the best-execution venue for workloads.

COMMON DRIVERS FOR IT VENUE SELECTION DECISIONS

Choosing Outsourced:

Cost: Outsourcing can lower costs in the short to medium term. For organizations “born” in a public cloud or colo, it typically is cost-prohibitive to move to an enterprise data center.

Cost allocation: Outsourcing shifts cost allocations from capex toward recurring opex models.

IT agility and flexibility: Outsourcing provides the ability to readily and quickly adapt to changing capacity needs without the burden of managing the full stack of IT and applications; IT can be used for a project’s duration only (e.g., for test and development).

Access to resources: Third parties may provide access to a wider range of resources, including technology, interconnections, software tools, services and application environments.

Security: Third parties can offer the most advanced, highly resourced security features.


Choosing On-premises:

Cost: Ownership delivers total cost of ownership benefits over the long term; in the shorter term, owners avoid the data transport costs of moving to an outsourced venue.

Governance: On-premises environments may be necessary for compliance with data governance and regulatory requirements.

Control: Owners can closely monitor and control factors such as latency, availability and application performance. While most outsourced venues are strong in these areas, service level agreements vary and are limited.

Risk: Ownership ensures full visibility into (and the ability to adjust) the risk profile of every workload.

Security: Ownership provides the ability to maintain control and governance over security features, with dedicated rather than shared physical infrastructure.

Want to know more about venue selection, data center staffing and skills, latest technologies in use and the cloud, risk and cost management, operational strategies and more? Check out our full 2020 Annual Survey.

COVID-19, air filtration and energy use

The COVID-19 pandemic has raised concerns about data center HVAC (heating, ventilation and air conditioning) filtration. As a result, many data center operators are adjusting filtration protocols, including upgrading to finer MERV (minimum efficiency reporting value) 13 filters, to better filter out aerosols that can carry the COVID-19 virus.1 But there is no free lunch: depending on the design of the data center’s HVAC system, the added filtration static pressure can slightly increase energy consumption.

Technically, with a PSC (permanent split capacitor) blower motor, the extra static pressure from a denser filter results in lower airflow, reduced system performance and poorer air distribution. With ECMs (electronically commutated motors) — the type of motor most commonly found in data center cooling systems — the extra static pressure instead results in higher blower motor power consumption, because the motor works harder to maintain airflow.
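To see why, note that ideal fan (air) power is roughly airflow multiplied by static pressure, divided by fan and motor efficiency. The sketch below illustrates the ECM case; the airflow, pressure rise and efficiency figures are assumed placeholders, not measurements from any particular system.

```python
# Illustrative only: ideal fan power = airflow * static pressure / efficiency.
# All numbers are assumed placeholders, not measurements from a real system.

CFM_TO_M3S = 0.000471947   # cubic feet per minute -> cubic meters per second
IN_WC_TO_PA = 249.089      # inches of water column -> pascals

def fan_power_w(airflow_cfm: float, static_pressure_in_wc: float, efficiency: float) -> float:
    """Approximate electrical fan power in watts."""
    return (airflow_cfm * CFM_TO_M3S) * (static_pressure_in_wc * IN_WC_TO_PA) / efficiency

# An ECM holds airflow constant, so a higher-resistance filter raises power draw:
baseline = fan_power_w(10_000, 1.5, efficiency=0.6)  # assumed system resistance with MERV 8
upgraded = fan_power_w(10_000, 1.7, efficiency=0.6)  # assumed +0.2 in. WC from a MERV 13 filter
print(f"~{baseline:.0f} W -> ~{upgraded:.0f} W "
      f"({(upgraded / baseline - 1) * 100:.0f}% more fan power)")
```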

Recently, an Uptime Institute member reported an 8% increase in fan energy use following a filtration upgrade from a MERV 8 to a denser MERV 13 filter, due to the extra fan energy required to maintain static pressure and airflow. Because fan power accounts for 7-10% of that data center’s total energy use, any increase in fan energy affects its PUE (power usage effectiveness).
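To put that figure in context, a back-of-the-envelope calculation (using assumed round numbers rather than the member’s actual data) shows how an 8% rise in fan energy flows through to total facility energy and PUE:

```python
# Back-of-the-envelope PUE impact of an 8% rise in fan energy.
# Assumed round numbers for illustration only.
it_load_kw = 1000.0
pue_before = 1.6
total_before_kw = it_load_kw * pue_before        # 1600 kW total facility load
fan_share = 0.08                                 # fans assumed at ~8% of total energy
fan_before_kw = total_before_kw * fan_share      # 128 kW
fan_after_kw = fan_before_kw * 1.08              # 8% more fan energy after the filter upgrade
total_after_kw = total_before_kw + (fan_after_kw - fan_before_kw)
pue_after = total_after_kw / it_load_kw
print(f"PUE rises from {pue_before:.3f} to {pue_after:.3f}")  # roughly 1.600 -> 1.610
```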

But this single data point may not be representative of all environments, since the quality and quantity of the filter media and the configuration of the HVAC system can significantly affect static pressure and airflow. Denser MERV 13 filters won’t necessarily require more fan energy if the operator installs filters that increase surface area through more and deeper pleats.2

There is a direct correlation between airflow in CFM (cubic feet per minute) and a filter’s initial resistance, measured in inches of water column (IWC), so some HVAC adjustments might be needed (see chart below).2


[Chart: filter initial resistance (inches of water column) versus airflow (CFM)]

Source: “Is There a Downside to High-MERV Filters? The new high-MERV filters extract an energy penalty,” by David Springer, November 2, 2009, in Home Energy: The Home Performance Magazine. http://www.homeenergy.org/show/article/nav/issues/page/4/id/667

Some guidance is in order: When choosing a filter, remember that airflow (the static pressure increase) is just as important a consideration as filtration (the MERV rating). Just because a filter fits doesn’t mean it’s the best filter for the system, so it’s very important to check the rating of the facility’s filters against the design specifications of the cooling system.

Many data center cooling systems use an airflow rate of 330-490 CFM, so the system would experience an initial resistance (static pressure increase) of 0.10-0.16 inches of water column with the MERV 11 filter example shown in the image below.3


[Image: example filter data showing initial resistance at different airflow rates for a MERV 11 filter]


Typical data center underfloor static pressure is 20-25 pascals (Pa), or 0.08-0.1 inches of water column. Therefore, while this filter would introduce some additional static pressure, it pales in comparison to the roughly 220 Pa required to move air through the cooling coils in the CRAC (computer room air conditioning) units, or the losses from the impingement and turning of cooled air in the subfloor, which can be as high as 200 Pa.4
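These pascal and inches-of-water-column figures can be cross-checked with the standard conversion of roughly 249 Pa per inch of water column, as in this minimal sketch:

```python
# Cross-check of the static pressure figures quoted above.
PA_PER_IN_WC = 249.089  # 1 inch of water column is approximately 249 Pa

def pa_to_in_wc(pascals: float) -> float:
    return pascals / PA_PER_IN_WC

for pa in (20, 25, 200, 220):
    print(f"{pa:>3} Pa ~ {pa_to_in_wc(pa):.2f} in. WC")
# 20-25 Pa is ~0.08-0.10 in. WC (typical underfloor pressure), while 200-220 Pa is
# ~0.80-0.88 in. WC (coil and subfloor losses) -- an order of magnitude larger than
# the 0.10-0.16 in. WC added by the MERV 11 filter example above.
```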

With careful airflow planning, there should be minimal airflow and pressure impacts from denser filters, such as MERV 13. These finer filters, though, will need to be changed more frequently, since they trap more particulates and so clog sooner. In addition to the higher change frequency, budget for higher per-unit costs: filters with larger surface area media will likely be more expensive than smaller, less efficient filters.

But all things considered, given the modest energy trade-offs and the improved particulate and virus filtering they provide, MERV 13 filters are quickly becoming the new normal.


Endnotes

1 https://www.ashrae.org/technical-resources/commercial#general
2 “Is There a Downside to High-MERV Filters? The new high-MERV filters extract an energy penalty,” by David Springer, November 2, 2009, in Home Energy: The Home Performance Magazine. http://www.homeenergy.org/show/article/nav/issues/page/4/id/667
3 https://www.hvacrschool.com/air-filter-static-pressure-drop/
4 http://asmedigitalcollection.asme.org/heattransfer/article-pdf/132/7/073001/5794918/073001_1.pdf

Why ASHRAE is concerned about edge data centers

There are few organizations that have had as big an impact on data center design as ASHRAE — and specifically, its Technical Committee (TC) 9.9. ASHRAE’s 2004 publication Thermal Guidelines for Data Processing Environments (now in its fourth edition) described the optimal environmental operating conditions for the electronics in a data center and, in doing so, effectively established some of the key design goals that have been followed by almost every data center builder and operator for the past 15 years.

In October 2020, ASHRAE published what it describes as a “groundbreaking” Technical Bulletin — this time looking specifically at edge data centers. The Bulletin is a lighter, shorter document than most previous TC9.9 publications, and it does not signal any change to the thermal operating guidelines, which remain as important as ever. Nor does it reveal any significant new findings — even though TC9.9 members have access to a lot of data and tests conducted by equipment manufacturers.

But its brevity does not imply that it lacks importance. The ASHRAE committee members want to send a warning: unless careful processes and guidelines are followed, and good technical choices made, there will be a significantly increased risk of electronic equipment failure in many edge data centers. And given that a lot more processing will be done at the edge in the next decade or so, that means there could be more failures and service disruptions in the future than in the past.

In the document, and in separate conversations with Uptime Institute (which has members on the TC9.9 committee), ASHRAE TC9.9 members identify two major issues as they relate to edge data centers.

First, the simple fact is that almost all “core” enterprise, large and colocation data centers have strictly climate-controlled white space. Small edge data centers, however, cannot always maintain this — especially during maintenance. Doors may be open during maintenance, exposing equipment to rapid temperature change, rain and pollution; cooling may be turned off, leading to rapid heat rise; routine or urgent maintenance may not be possible or timely; and edge environments are more likely to have higher densities and to be hotter, colder, wetter, dustier or less secure than big, remote, well-managed facilities.

A second issue is that a lot of the equipment on the market has been designed for use in large data centers where failure is more easily tolerated — or even deliberately allowed, as part of an overall “abandon in place” strategy. Put bluntly, the quality of IT equipment is no longer required to be as high as it used to be, because it has been designed to be quickly swapped out. Some leading suppliers no longer do the systematic, lifecycle-based testing that was once a necessity.

What can be done?

The first step is to avoid poor site selection — but that, of course, is not always an option for small edge data centers, which need to be near the point of use. Ideally, the data center itself (which may be prefabricated) should be designed to allow ease of maintenance without exposure to outside conditions, to minimize damage (such as to batteries and IT equipment). A data center infrastructure management (DCIM) system that remotely monitors key environmental variables and IT performance will help prevent failures. It also makes sense to design in some concurrent maintainability/fault tolerance, if the design goals and budgets allow.

Other key points of advice are:

  • Protect the edge facility by servicing only during moderate weather or when using a mantrap or tent, to avoid potentially deleterious conditions. Even time of day can be a factor (e.g., humidity is often higher in the morning).
  • Monitor humidity, condensation and temperature during servicing.
  • Monitor the rate of temperature or humidity change when doors are opened (a minimal monitoring sketch follows this list).
  • Beware of local particulates/pollution, such as ocean spray, dust, industrial pollutants and heavy vehicle exhaust.
  • Air filtration should meet necessary standards — MERV (Minimum Efficiency Reporting Value) 11 or 13.
  • Beware the effect of gaseous pollutants that cause corrosion (e.g., of components containing copper or silver).
  • Use remote monitoring where possible to track corrosion rate, filter performance, etc.
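As a sketch of the rate-of-change monitoring suggested above, the snippet below flags rapid temperature swings while a door is open. All names and the alert threshold are hypothetical placeholders; real deployments should use the rate-of-change limits for the ASHRAE class the installed equipment is rated for.

```python
# Hypothetical sketch: flag rapid temperature change in an edge enclosure during servicing.
# The 20 degC/hour threshold is a placeholder, not an ASHRAE-specified limit; check the
# rate-of-change limit for the equipment's ASHRAE class before relying on any value.
from __future__ import annotations
from datetime import datetime

MAX_RATE_C_PER_HOUR = 20.0  # placeholder threshold

def temperature_rate_alarm(samples: list[tuple[datetime, float]]) -> bool:
    """Return True if the temperature change between the first and last samples
    exceeds the allowed rate. `samples` is a time-ordered list of (timestamp, deg C)."""
    if len(samples) < 2:
        return False
    (t0, c0), (t1, c1) = samples[0], samples[-1]
    hours = (t1 - t0).total_seconds() / 3600
    return hours > 0 and abs(c1 - c0) / hours > MAX_RATE_C_PER_HOUR
```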

The key takeaway from this ASHRAE bulletin is that it is far more difficult (and possibly more expensive) to control virtually all environmental variables in edge data centers, which will almost certainly lead to more failures. Vigilance and good practices can improve early detection, reduce the likelihood of failures and mitigate their impact.

The full document, Edge Computing: Considerations for Reliable Operation, is available on the ASHRAE website.

PUE: The golden metric is looking rusty

When the PUE (power usage effectiveness) metric was first discussed at a meeting of The Green Grid in Santa Clara, back in 2007, a microphone stand was placed in each aisle of the auditorium. The importance of the initiative was understood even then: experts, including the founders of Uptime Institute, formed lines to give their considered input. And if there was one point that came across, it was that the industry should not treat PUE as a comparative metric, a gold standard that every designer, operator and planning authority must chase down, for comparison, presentation and applause.

Thirteen years on, that is almost exactly what has happened. PUE, thanks to its simplicity and universal applicability, has become the critical benchmark for scoring data centers for efficiency and “greenness” (environmental responsibility). But it is often used uncritically.

This has had both positive and negative consequences. The positive side is that PUE values have been forced down across the world, as operators strive to reduce waste and operating costs. To meet their goals, suppliers have focused on improving electrical efficiency of equipment and have developed nonmechanical alternatives (such as free cooling). PUEs have fallen from an average of 2.5 in 2007 to around 1.6 today. In an Uptime Institute survey, around 95% of respondents said it is important that colocation companies have a low PUE.

So what is the negative side? Its simplicity. The very quality that led to PUE’s near universal adoption also may have led to its being over-used and misapplied — increasingly, with unintended consequences.

This observation is hardly new — indeed, this is exactly what those experts feared back in 2007 as they lined up to voice their concerns. And it is why PUE was developed into a standard, with rules on how it should be measured, and why Partial PUE variants were developed.

But the unintended consequences are becoming more serious and more concerning. The metric may be driving the wrong behaviors. For this reason, it is very possible that PUE values may begin to be de-emphasized in the near future and more operators may allow their number to rise a little. (In 2019, Uptime Institute did note a marginal rise in PUE values.) If this happens, some operators may find themselves in conflict with certain regulators and planners, for whom PUE has been an easily understood but simplistic way of enforcing environmental goals (policing this is another matter). These planning authorities, then, may also need to review their policies around driving down the PUE numbers. (California, Singapore, Amsterdam, take note.)

There are, we think, three strong and strengthening reasons why focusing on the PUE of the data center might be counter-productive:

  • Resiliency (N to N+1). There is an industry-wide move to increase the resiliency of data centers, with the growing dependency on IT and the impact of the pandemic and extreme weather among the drivers. At the physical site level, it is difficult to improve redundancy and resiliency without adding power-consuming equipment — which drives up the PUE. (Similarly, one operator installed higher-rated MERV [minimum efficiency reporting value] air filters to filter out viruses — but fan energy use rose by 8% and the PUE value rose accordingly.)
  • Water use. The drive to improve energy efficiency has encouraged operators to use economizers and, in particular, adiabatic and evaporative cooling and chillers. But this has often involved a trade-off: energy use drops, but water use rises. That compromise is becoming problematic in some regions, where climate change means water is becoming more scarce and, in some instances, a bigger local environmental issue than on-site energy use. Again, PUE may be the wrong metric to track — WUE (water usage effectiveness) is a more important one.
  • IT efficiency. It is widely known that while facilities have become ever more energy efficient, IT can be extremely inefficient and wasteful, with processors and memory often drawing energy while doing relatively little work. Energy-saving investments on the IT side, however, are not rewarded with a lower PUE but a higher one, because PUE is the ratio of total facility energy to IT energy: reducing IT energy shrinks the denominator (see the worked example after this list). This has been known from the outset. With a slew of new technologies becoming available to lower the overall energy figure, it is important that the need to meet a given PUE value does not discourage investment and innovation on the IT side.
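To make the last point concrete, here is a minimal worked example with assumed round numbers showing how an IT-side efficiency gain raises PUE even though total energy falls:

```python
# Why an IT-side efficiency gain raises PUE even as total energy falls.
# Assumed round numbers; in practice facility overhead would also drop somewhat
# (less heat to reject), but proportionally less than the IT load.
overhead_kw = 600.0                         # cooling, power distribution, lighting
it_before_kw, it_after_kw = 1000.0, 800.0   # IT load before/after consolidation or refresh

pue_before = (it_before_kw + overhead_kw) / it_before_kw  # 1.60
pue_after = (it_after_kw + overhead_kw) / it_after_kw     # 1.75

print(f"Total load falls from {it_before_kw + overhead_kw:.0f} kW to "
      f"{it_after_kw + overhead_kw:.0f} kW, yet PUE rises from "
      f"{pue_before:.2f} to {pue_after:.2f}")
```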

There is, to be fair, a good counter-argument to these three points: that by searching for ways to lower the PUE regardless of these constraints, data center operators can find ways to meet all the goals. Liquid cooling, for example, can be highly resilient and uses little water. The argument here, then, is not for anyone to abandon PUE, or to stop watching and measuring it, but to apply it more cautiously than ever, and more flexibly.

Why data center operators are investing in more redundancy

When Uptime Institute recently asked over 300 data center managers how the pandemic would change their operations, one answer stood out: Two-thirds expect to increase the resiliency of their core data center(s) in the years ahead. Many said they expected their costs to increase as a result.

The reasoning is clear: the pandemic — or any future one — can mean operating with fewer staff, and possibly with disrupted service and supply chains. Remote monitoring and preventive maintenance will help to reduce the likelihood of an incident, but machines will always fail. It makes sense to reduce the impact of failure by increasing system redundancy.
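One way to see why added redundancy blunts the impact of individual machine failures is a simple “k-of-n” availability model. The sketch below assumes independent failures and a per-unit availability figure chosen purely for illustration:

```python
from math import comb

def system_availability(installed: int, required: int, unit_availability: float) -> float:
    """Probability that at least `required` of `installed` units are up,
    assuming independent failures (an idealization)."""
    return sum(
        comb(installed, up) * unit_availability**up * (1 - unit_availability)**(installed - up)
        for up in range(required, installed + 1)
    )

N = 2        # units needed to carry the load (assumed)
a = 0.99     # assumed availability of a single unit
for spares, label in ((0, "N"), (1, "N+1"), (2, "N+2")):
    print(f"{label}:  {system_availability(N + spares, N, a):.6f}")
# N: 0.980100, N+1: 0.999702, N+2: 0.999996 -- each spare sharply cuts the chance of
# losing capacity, which is the intuition behind the shift toward N+2 discussed below.
```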

But even before the pandemic, there was a trend toward higher levels of redundancy. As shown in the figure below, roughly half of those participating in the Uptime Institute 2020 global survey of suppliers, designers and advisors reported their customers have increased redundancy levels in the last three to five years.


[Figure: share of 2020 survey respondents (suppliers, designers and advisors) reporting that customers have increased redundancy levels in the past three to five years]


This trend may seem unsurprising to some, but it was not entirely predictable. The growth of cloud has been accompanied by the much greater use of multisite resiliency and regional availability zones. In theory, at least, these substantially reduce the impact of single-site facility outages, because traffic and workloads can be diverted elsewhere. Backed by this capability, some operators — Facebook is an example — have proceeded with lower levels of redundancy than was common in the past (thereby saving costs and energy).

The use of availability zones, however, has come with its own problems, with networking and software issues often causing service interruptions. And the loss of one data center immediately places capacity and traffic demand on others, heightening risks. For this reason, even the big cloud providers and internet application operators manage mostly concurrently maintainable facilities, and it is common for them to stipulate that colocation partners have N+2 level facilities.

With a variety of options, the overall shift to increased resiliency is still slow and quite nuanced, with designers mostly favoring either N+1 or N+2 configurations, according to site and business needs, and, often, according to the creativity of the designers. Overall, there is actually a marginal decrease in the number of data centers that are 2N, but a steady three-year shift from N+1 to N+2 — not only in power, but also in cooling (see figure below). There is also an increase in the use of active-active availability zones, as discussed in our recent report Uptime Institute global data center survey 2020.


[Figure: changes in power and cooling redundancy configurations (N+1, N+2, 2N) over the past three years]


Demand patterns and growing IT dependency partly account for these higher levels of redundancy/resiliency. The level of resiliency needed for each service or by each customer is dictated by business requirements, but this is not fixed in time. The growing criticality of many IT services highlights the importance of mitigating risk through increased resiliency. “Creeping criticality” — a situation in which infrastructure and processes have not been upgraded or updated to reflect the growing criticality of the applications or business processes they support — may require redundancy upgrades.

Uptime Institute expects operators to make more use of distributed resiliency in the future — especially as more workloads are designed using cloud or microservices architectures (workloads are more portable, and instances are more easily copied). But there is no sign that this is diminishing the need for site-level resiliency. The software running these distributed services is often opaque and complex, and it may be prone to programming or configuration errors; annual outage data shows these types of issues are proliferating. Further, with data and applications synchronized across multiple sites, big component failures can cascade, making recovery difficult and expensive.

The trend for now is clear: more resiliency at every level is the least risky approach — even if it means some extra expense and duplication of effort.

Job projections 2019-2029: Macro shifts, gig work and the baby boomers

In mid-September 2020, the US Bureau of Labor Statistics (BLS) published its updated 2019-2029 Employment Projections summary news release and the associated handbook, which discusses various job roles, hiring and salary expectations over the next decade. It identified a number of easy-to-digest trends that caught my attention and are perhaps worth your consideration.

In the report, the BLS states that the US had 162.8 million workers in 2019, a figure it projects to grow by 6-8 million jobs in total over the next ten years, an annual growth rate of less than half of 1%, compared with the previous decade’s growth of 1.3%. Not surprisingly (and due to the aging of America), 60% of those new jobs will be in the health and medical fields. You can browse through the interactive handbook and click on any number of filters and report types, but be sure to try the links in the “Browse Occupations” section on the bottom half of the first screen. Try “Most New Jobs” or “Highest Paid Jobs”. (Spoiler alerts: “Highest Paid Jobs” has psychiatrists at the top, and while “Most New Jobs” is led by home health, the second-biggest increase in jobs is fast food workers!)

BLS talks about some notable related and supporting projections which can be summarized as follows:

  • More people over 55 years old will continue to be employed full-time (financial stability has decreased)
  • Fewer people below 34 years old will choose to be employed full-time (the ‘desk job’ has less appeal to many in this age group who want freelance or “gig” style work)
  • About 57% of women will work full-time outside the home, compared to 66% of men (and both will be down slightly from today)

The BLS offers its own narrative: “The decline in labor force participation is due to the aging of the baby-boom generation, a continuation of the declining trend in men’s participation, and a slight decline in women’s participation.” And throughout the report, it cites the aging of the baby boomers as a root cause.

The BLS spends billions of dollars and countless resources gathering and analyzing data, and it ultimately comes to the same conclusions and projections that we all implicitly feel: baby boomers are timing out, or as I like to say, ‘greying out.’ Couple this workforce aging with the urgent macro topics of:

  • COVID – all of the long-term shifts and economic effects it has caused
  • Trade policies – shifts in raw materials, goods, transportation and macro-economics
  • ‘Gig work’ mentality – intentionally skipping full-time employment altogether
  • Online everything – the death of brick-and-mortar retail
  • ‘Valuing differences’ – diversity, division and human rights

and we are in for a crazy ride over the next 10 years, to be sure. The world we knew has changed, and those changes are structural. We will never go back to pre-2020 life. This is not a fad, nor a short-term, manageable incident. The world has changed, and the quicker each of us decides how we wish to participate in the new structure, the easier life will be. And when enough people have gotten on board, life will become easier once again.

Note: While the report above is US-centric, similar patterns are already being seen worldwide.