Despite years of discussion, warnings and strict regulations in some countries, hot work remains a contentious issue in the data center industry. Hot work is the practice of working on energized electrical circuits (voltage limits differ regionally) — and it is usually done, in spite of the risks, to reduce the possibility of a downtime incident during maintenance.
Uptime Institute advises against hot work in almost all instances. The safety concerns are just too great, and data suggests work on energized circuits may — at best — only reduce the number of manageable incidents, while increasing the risk of arc flash and other events that damage expensive equipment and may lead to an outage or injury. In addition, concurrently maintainable or fault tolerant designs as described in Uptime Institute’s Tier Standard make hot work unnecessary.
The pressure against hot work continues to mount. In the US, electrical contractors have begun to decline some work that involves working on energized circuits, even if an energized work permit has been created and signed by appropriate management, as required by National Fire Protection Association (NFPA) 70E (Standard for Electrical Safety in the Workplace). In addition, the US Department of Labor’s Occupational Safety and Health Administration (OSHA) has repeatedly rejected business continuity as grounds for an exception to hot work restrictions, making it harder for management to justify hot work and to find executives willing to sign the energized work permit.
OSHA statistics make clear that work on energized systems is a dangerous practice, especially for construction trades workers; installation, maintenance, and repair occupations; and grounds maintenance workers. For this reason, NFPA 70E sharply limits the situations in which organizations are allowed to work on energized equipment. Personnel safety is not the only issue; personal protective equipment (PPE) protects only workers, not equipment, so an arc flash can destroy many thousands of dollars of IT gear.
Ignoring local and national standards can be costly, too. OSHA reported 2,923 lockout/tagout and 1,528 PPE violations in 2017, among the many safety concerns it addressed that year. Penalties for a single serious violation can now exceed $13,000, with top total fines for numerous, willful and repeated violations running into the millions of dollars. Wrongful death and injury suits add to the cost, and violations can lead to higher insurance premiums, too.
Participants in a recent Uptime Institute discussion roundtable agreed that the remaining firms performing work on live loads should begin preparing to end the practice. They said that senior management is often the biggest impediment to ending hot work, at least at some organizations, despite the well-known and documented risks. Executive resistance can be tied to concerns about power supplies or failure to maintain independent A/B feeds. In some cases, service level agreements contain restrictions against powering down equipment.
Despite executive resistance at some companies, the trend is clearly against hot work. By 2015, more than two-thirds of facilities operators had already eliminated the practice, according to Uptime Institute data. A tighter regulatory environment, heightened safety concerns, increased financial risk and improved equipment should combine to all but eliminate hot work in the near future. But there are still holdouts, and the practice is far more acceptable in some countries — China is an example — than in others, such as the US, where NFPA 70E severely limits the practice in all industries.
Also, hot work does not eliminate IT failure risk. Uptime Institute has been tracking data center abnormal incidents for more than 20 years, and the data shows that at least 71 failures occurred during hot work. While these failures are generally attributed to poor procedures or maintenance, a recent, more careful analysis concluded that better procedures or maintenance (or both) would have made it possible to perform the work safely — and without any failures — on de-energized systems.
The Uptime Institute abnormal incident database includes only four injury reports; all occurred during work on energized systems. In addition, the database includes 16 reports of arc flash. One occurred during normal preventive maintenance and one during an infrared scan. Neither caused injury, but the potential risk to personnel is apparent, as is the potential for equipment damage (and legal exposure).
Undoubtedly, eliminating hot work is a difficult process. One large retailer that has just begun the process expects the transition to take several years. And not all organizations succeed: Uptime Institute is aware of at least one organization in which incidents involving failed power supplies caused senior management to cancel their plan to disallow work on energized equipment.
According to several Uptime Institute Network community members, building a culture of safety is the most time-consuming part of the transition away from hot work, even for data center operators, which are goal-oriented organizations well-practiced at developing and following programs to identify and eliminate risk.
It is not necessary or even prudent to eliminate all hot work at once. The IT team can help slowly retire the practice by eliminating the most dangerous hot work first, building experience on less critical loads, or reducing the number of circuits affected at any one time. To prevent common failures when de-energizing servers, the Operations team can increase scrutiny on power supplies and ensure that dual-corded servers are properly fed.
In early data centers, the practice of hot work was understandable — necessary, even. However, Uptime Institute has long advocated against hot work. Modern equipment and higher resiliency architectures based on dual-corded servers make it possible to switch power feeds in the case of an electrical equipment failure. These advances not only improve data center availability, they also make it possible to isolate equipment for maintenance purposes.
Phasing Out Data Center Hot Work, by Kevin Heslin (published 24 February 2020)
Uptime Institute Intelligence plans to release its 2019/2020 outages report shortly. This report will examine the types, causes and impacts of public outages, as well as further analyze the results of a recent Uptime survey on outages and impacts. The data will once again show that serious IT service interruptions are common and costly, with the impacts often causing serious disruption.
We have excluded one type of outage from the report: those caused by cyberattacks. Data integrity and cybersecurity are, of course, major issues that require vigilant attention and investment, but they are not currently areas in which Uptime Institute researches and advises. Most security incidents are data breaches; although these have serious consequences, they do not usually lead to a service interruption.
However, two forms of malicious attack can and often do lead to outages or at least a severe service degradation. The first is a Distributed Denial of Service (DDoS) attack, where a coordinated attempt is made to overwhelm a site with traffic. Uptime has tracked a number of these each year for many years, and security specialists say they are increasingly common. Even so, most organizations that are DDoS targets have developed effective countermeasures that minimize the threat. These measures include such techniques as packet filtering, load balancing and blocking suspect internet protocol addresses. As a result, DDoS attacks are showing up less frequently in our lists of outages.
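As a rough illustration of the packet-filtering and blocking techniques mentioned above, the sketch below implements a per-source sliding-window rate limiter with a blocklist. It is a minimal teaching example under stated assumptions, not a production mitigation; the window length and request threshold are arbitrary values chosen for illustration.

```python
# Minimal illustration of per-source rate limiting with a blocklist, the kind of
# filtering logic DDoS countermeasures build on (thresholds are illustrative only).
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 10    # sliding window length (illustrative assumption)
MAX_REQUESTS = 100     # requests allowed per source within the window (illustrative)

recent = defaultdict(deque)   # source IP -> timestamps of its recent requests
blocklist = set()             # sources already flagged as suspect

def allow_request(source_ip: str) -> bool:
    """Return True if the request should be passed on, False if it should be dropped."""
    now = time.time()
    if source_ip in blocklist:
        return False
    window = recent[source_ip]
    while window and now - window[0] > WINDOW_SECONDS:   # expire old timestamps
        window.popleft()
    window.append(now)
    if len(window) > MAX_REQUESTS:
        blocklist.add(source_ip)   # block the suspect address from now on
        return False
    return True
```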
The second type, ransomware, is emerging as a major problem and cause of outages. Ransomware attackers deny authorized users access to their own data; the hackers use malware to encrypt the user’s files and refuse to unlock them unless a ransom is paid. Often, operators have no choice but to take down all involved IT services in an attempt to recover access, restore from the last clean backup copy, and purge the systems of viruses. Outages can last days or weeks.
In the past two years, ransomware attacks have increased dramatically. The FBI investigated over 1,400 ransomware attacks in 2018. Government offices are a particular target. Kaspersky Research Labs, operated by security software supplier Kaspersky, identified 147 attacks on municipalities in 2019 (up 60%), in which the criminals demanded ransoms of $5.3 million. The IT Governance blog, based in the UK, recorded 19 major ransomware attacks globally in December 2019 alone.
Most US cities have now signed a charter never to pay a ransom to the criminals — but more importantly, most are now also upgrading their infrastructure and practices to prevent attacks. Some that have been targeted, however, have paid the ransom.
Perhaps the two most serious attacks on US cities to date have been on the City of Baltimore in 2019, which refused to pay the ransom and budgeted $18 million to fix its problem, and the City of Atlanta in 2018, which also refused to pay the ransom and spent over $7 million to fully restore operations. The 2017 WannaCry attack reportedly cost the UK National Health Service over $120 million (£92 million). And on New Year’s Eve 2019, Travelex’s currency trading went offline for two weeks due to a ransomware attack, costing it millions.
Preventing a ransomware attack has become — or should become — a very high priority for those concerned with resiliency. Addressing the risk may involve some stringent, expensive and inconvenient processes, such as multifactor authentication, since attackers will likely try to copy passwords as well as encrypt files. In terms of the Uptime Institute Outage Severity Rating, many attacks quickly escalate to the most serious Category 4 or 5 levels — severe enough to cost millions and threaten the survival of the organization. Indeed, one North American health provider has struggled to recover after receiving a $14 million ransom demand.
All of this points to the obvious imperative: The availability and integrity of digital infrastructure, data and services are critical — in the fullest sense of the word — to almost all organizations today, and assessments of vulnerability need to span security, software, systems, power, networks and facilities. Weaknesses are likely to be exploited; sufficient investment and diligence in this area have become essential and must never waver. In hindsight, almost all outages could have been prevented with better management, processes and technology.
Members of the Uptime Institute Network can read more on this topic here.
The spectre of ransomware, by Andy Lawrence, Executive Director of Research, Uptime Institute (published 10 February 2020)
A wave of new technologies, from 5G to the internet of things (IoT) to artificial intelligence (AI), means much more computing and much more data will be needed near the point of use. That means many more small data centers will be required. But there will be no sudden mass deployment, no single standout use case, no single design dominating. Demand is likely to grow faster from 2022.
Small package, big impact
Suppliers in the data center industry are excited. Big vendors such as Schneider, Vertiv and Huawei have been rapidly adding to their product lines and redrawing their financial forecasts; startups — companies such as Vapor IO, EdgeMicro, EdgeInfra and MetroEDGE — are pioneering new designs; and established telco specialists, such as Ericsson, along with telco operators, are working on new technologies and partnerships. Builders and operators of colocation data centers, such as EdgeConneX, Equinix and Compass, are assessing where the opportunity lies.
The opportunity is to supply, build or operate local edge data centers — small micro data centers that are designed to operate near the point of use, supporting applications that are not suited to run in big, remote data centers, or even in mid-sized regional colocation data centers. Unlike most larger data centers, micro data centers will mostly be built, configured and tested in a factory and delivered on a truck. Typical sizes will be 50 kW to 400 kW, and there are expected to be a lot of them.
But with the anticipation comes consternation — it is possible to commit too early. Some analysts had predicted that the explosion in edge demand would be in full swing by now, fueled by the growing maturity of the IoT and the 2020 launch schedules for 5G services. Suppliers, however, mostly report only a trickle — not a flood — of orders.
Privately, some suppliers admit they have been caught off guard. There is a deep discussion about the extent of data center capacity needed at the local edge; about just how many applications and services really need local edge processing; and about the type and size of IT equipment needed — maybe a small box on the wall will be enough?
While the technical answers to most of these questions are largely understood, questions remain about the economics, the ownership, and the scale and pace of deployment of new technologies and services. These are critical matters affecting deployment.
Edge demand and 5G
In the past decade, data and processing have shifted to a cloudy core, with hundreds of hyperscale data centers built or planned. This will continue. But a rebalancing is underway (see Uptime Institute Intelligence report: The internet tilts toward the edge), with more processing being done not just at the regional edge, in nearby colocation (and other regional) data centers, but locally, in a micro data center that is tens or hundreds of meters away.
This new small facility may be needed to support services that have a lot of data, such as MRI scanners, augmented reality and real-time streaming; it may be needed to provide very low latency, instantly responsive services for both humans and machines — factory machines are one example, driverless cars another; and it may be needed to quickly crunch AI calculations for immediate, real-time responses. There is also a more mundane application: to provide on-site services, such as in a hospital, factory or retail establishment, should the network fail.
With all these use cases, why is there any doubt about the micro data center opportunity?
First, in terms of demand drivers, no new technology has created so much interest and excitement as 5G. The next-generation wireless telecom standard promises communications speeds of up to 10 gigabits per second (Gbps), latency below five milliseconds (ms), support for one million devices per square kilometer, and five-nines availability. It will ultimately support a vast array of new always-on, low-latency and immersive applications that will require unimaginable amounts of data and compute power — too much to realistically or economically send back to the internet’s hyperscale core. Much of this will require low-latency communications and rapid processing of a few milliseconds or less — which, the speed of light dictates, must happen within a few kilometers of the user.
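To see why the speed of light forces compute so close to the user, the back-of-envelope sketch below converts a propagation allowance into a maximum one-way distance. It assumes signals travel at roughly two-thirds of the vacuum speed of light in optical fiber and that radio access, switching and processing consume the rest of the end-to-end budget; the budget values are illustrative assumptions, not 5G specifications.

```python
# How far a signal can travel in optical fiber within a given propagation budget.
# Assumes ~2/3 the vacuum speed of light in fiber; the budget shown is only the
# share of end-to-end latency left for propagation after radio access, switching
# and processing overheads (illustrative values, not measurements).
C_KM_PER_MS = 299_792.458 / 1000        # ~300 km per millisecond in vacuum
FIBER_KM_PER_MS = C_KM_PER_MS * 2 / 3   # ~200 km per millisecond in fiber (assumption)

def reach_km(propagation_budget_ms: float) -> float:
    """One-way distance reachable if the round trip may spend this long in fiber."""
    return (propagation_budget_ms / 2) * FIBER_KM_PER_MS

for budget in (0.05, 0.1, 1.0):
    print(f"{budget} ms spent in fiber -> compute within ~{reach_km(budget):.0f} km")
# 0.05 ms -> ~5 km, 0.1 ms -> ~10 km, 1 ms -> ~100 km
```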
Few doubt that 5G will create (or satisfy) huge demand and play a pivotal role in IoT. But the rollout of 5G, already underway, is not going to be quick, sudden or dramatic. In fact, full rollout may take 15 years. This is because the infrastructure required to support 5G is too expensive, too complex, and involves too many parties to do all at once. Estimates vary, with at least one analyst firm predicting that telecom companies will need to spend $1 trillion upgrading their networks.
A second issue that is creating uncertainty about demand is that many edge applications — whether supported by 5G or some other networking technology (such as WiFi 6) — may not require a local micro data center. For example, high-bandwidth applications may be best served from a content distribution network at the regional edge, in a colo, or by the colo itself, while many sensors and IoT devices produce very little data and so can be served by small gateway devices. Among 5G’s unique properties is the ability to support data-heavy, low-latency services at scale — but this is exactly the kind of service that will mostly be deployed in 2021 or later.
Suppliers and telcos alike, then, are unsure about the number, type and size of data centers at the local edge. Steve Carlini, a Schneider Electric executive, told Uptime Institute that he expects most demand for micro data centers supporting 5G will be in the cities, where mobile edge-computing clusters would likely each need one micro data center. But the number of clusters in each city, far fewer than the number of new masts, would depend on demand, applications and other factors.
A third set of issues that will slow demand for micro data centers is economic and organizational. These include licensing, location and ownership of sites; support and maintenance; security and resiliency concerns; and management sentiment. Most enterprises expect to own their own edge micro data centers, according to Uptime Intelligence research, but many others will likely prefer to outsource this altogether, in spite of potentially higher operational costs and a loss of control.
Suppliers are bullish, even if they know demand will grow slowly at first. Among the first-line targets are those simply looking to upgrade server rooms, where the work cannot be turned over to a colo or the cloud; factories with local automation needs; retailers and others that need more resiliency in distributed locations; and telcos, whose small central offices need the security, availability and cost base of small data centers.
This wide range of applications has also led to an explosion of innovation. Expect micro data centers to vary in density, size, shape, cooling types (including liquid), power sources (including lithium-ion batteries and fuel cells) and levels of resiliency.
The surge in demand for micro data centers will be real, but it will take time. Many of the economic and technical drivers are not yet mature; 5G, one of the key underlying catalysts, is in its infancy. In the near term, much of the impetus behind the use of micro data centers will lie in their ability to ensure local availability in the event of network or other remote outages.
The full report Ten data center industry trends in 2020 is available to members of the Uptime Institute Network here.
Micro data centers: An explosion in demand, in slow motion, by Andy Lawrence, Executive Director of Research, Uptime Institute (published 3 February 2020)
Hardware refresh is the process of replacing older, less efficient servers with newer, more efficient ones with more compute capacity. However, there is a complication to the refresh cycle that is relatively recent: the slowing down of Moore’s law. There is still a very strong case for savings in energy when replacing servers that are up to nine years old. However, the case for refreshing more recent servers — say, up to three years old — may be far less clear, due to the stagnation witnessed in Moore’s law over the past few years.
Moore’s law refers to the observation made by Gordon Moore (co-founder of Intel) that the transistor count on microchips would double every two years. This implied that transistors would become smaller and faster, while drawing less energy. Over time, the doubling in performance per watt was observed to happen around every year and a half.
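As a rough illustration of what that historical rate implies for a refresh decision, the short sketch below projects the expected performance-per-watt gain over a given refresh interval. The 1.5-year doubling period is taken from the observation above; the four-year interval is an arbitrary example, not a recommendation.

```python
# Projection of the historical trend described above (assumption: performance per
# watt doubles roughly every 1.5 years); not a measurement of any CPU generation.
doubling_period_years = 1.5
refresh_interval_years = 4          # arbitrary example refresh cycle
expected_gain = 2 ** (refresh_interval_years / doubling_period_years)
print(f"Expected perf/W gain after {refresh_interval_years} years: about {expected_gain:.1f}x")  # ~6.3x
```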
It is this doubling in performance per watt that underpins the major opportunity for increasing compute capacity while increasing efficiency through hardware refresh. But in the past five years, it has been harder for Intel (and immediate rival AMD) to maintain the pace of improvement. This raises the question: Are we still seeing these gains from recent and forthcoming generations of central processing units (CPUs)? If not, the hardware refresh case will be undermined … and suppliers are unlikely to be making that point too loudly.
To answer this question, Uptime Institute Intelligence analyzed performance data from the Standard Performance Evaluation Corporation (SPEC; https://www.spec.org/). The SPECpower dataset contains energy performance results from hundreds of servers, based on the SPECpower server energy performance benchmark. To track trends consistently and eliminate potential outlier bias in reported servers (e.g., high-end servers versus volume servers), only dual-socket servers were considered in our analysis. The dataset was then broken down into 18-month intervals (based on the published date of release of servers in SPECpower) and the performance averaged for each period. The results (server performance per watt) are shown in Figure 1, along with the trend line (polynomial, order 3).
The figure above shows how performance increases have started to plateau, particularly over the past two periods. The data suggests upgrading a 2015 server in 2019 might provide only a 20% boost in processing power for the same number of watts. In contrast, upgrading a 2008/2009 server in 2012 might have given a boost of 200% to 300%.
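For readers who want to reproduce this kind of aggregation on the public SPECpower results, a minimal sketch follows. The file name and column names are placeholders (the actual SPECpower export uses its own schema), but the steps mirror the methodology described above: keep dual-socket systems, bucket results into 18-month publication periods, average performance per watt and fit an order-3 polynomial trend.

```python
# Sketch of the aggregation described in the article; schema names are placeholders.
import numpy as np
import pandas as pd

df = pd.read_csv("specpower_results.csv", parse_dates=["publish_date"])
dual = df[df["sockets"] == 2].copy()          # dual-socket systems only, for consistency

# Bucket results into 18-month periods counted from the earliest publication date.
start = dual["publish_date"].min()
months = (dual["publish_date"].dt.year - start.year) * 12 + (
    dual["publish_date"].dt.month - start.month
)
dual["period"] = months // 18

# Average performance per watt in each period, then fit an order-3 polynomial trend.
trend = dual.groupby("period")["overall_ssj_ops_per_watt"].mean()
coeffs = np.polyfit(trend.index.to_numpy(dtype=float), trend.to_numpy(), deg=3)
print(trend)
print("Order-3 trend coefficients:", coeffs)
```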
To further understand the reason behind this, we charted the way CPU technology (lithography) has evolved over time, along with performance and idle power consumption (see Figure 2).
Figure 2 reveals some interesting insights. At the beginning of the decade, the move from one CPU lithography to the next, e.g., 65 nanometers (nm) to 45 nm, or 45 nm to 32 nm, brought major performance per watt gains (orange line), as well as a substantial reduction in idle power consumption (blue line), thanks to the reduction in transistor size and voltage.
However, it is also interesting to see that the introduction of a larger number of cores to maintain performance gains produced a negative impact on idle power consumption. This can be seen briefly during the 45 nm lithography and very clearly in recent years with 14 nm.
Over the past few years, while lithography stagnated at 14 nm, the increase in performance per watt (when working with a full load) has been accompanied by a steady increase in idle power consumption (perhaps due to the increase in core count to achieve performance gains). This is one reason why the case for hardware refresh for more recent kit has become weaker: Servers in real-life deployments tend to spend a substantial part of their time in idle mode — 75% of the time, on average. As such, the increase in idle power may offset energy gains from performance.
This is an important point that will likely have escaped many buyers and operators: If a server spends a disproportionate amount of time in active idle mode — as is the case for most — the focus should be on active idle efficiency (e.g., choosing servers with lower core count) rather than just on higher server performance efficiency, while satisfying overall compute capacity requirements.
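A simple energy model makes the trade-off concrete. The sketch below compares the annual energy of two hypothetical servers, one older with lower idle draw and one newer with better full-load efficiency but higher idle draw, assuming the 75% idle time cited above. All wattages are invented for illustration and are not SPECpower measurements.

```python
# Illustrative annual-energy comparison showing how higher idle power can offset a
# better full-load efficiency when servers idle ~75% of the time (made-up wattages).
HOURS_PER_YEAR = 8760
IDLE_FRACTION = 0.75          # share of time in active idle, per the article

def annual_kwh(idle_watts: float, active_watts: float, idle_fraction: float = IDLE_FRACTION) -> float:
    idle_kwh = idle_watts * idle_fraction * HOURS_PER_YEAR / 1000
    active_kwh = active_watts * (1 - idle_fraction) * HOURS_PER_YEAR / 1000
    return idle_kwh + active_kwh

old_server = annual_kwh(idle_watts=60, active_watts=300)   # older box, lower idle draw
new_server = annual_kwh(idle_watts=90, active_watts=280)   # newer box, better perf/W but higher idle draw
print(f"old: {old_server:.0f} kWh/yr, new: {new_server:.0f} kWh/yr")
# With these example numbers the newer server uses more energy per year despite its
# better full-load efficiency — the offset the paragraph above describes.
```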
It is, of course, a constantly moving picture. The more recent introduction of the 7 nm lithography by AMD (Intel’s main competitor) should give Moore’s law a new lease of life for the next couple of years. However, it has become clear that we are starting to reach the limits of the existing approach to CPU design. Innovation and efficiency improvements will need to be based on new architectures, entirely new technologies and more energy-aware software design practices.
The full report Beyond PUE: Tackling IT’s wasted terawatts is available to members of the Uptime Institute Network here.
Optimizing server refresh cycles with an aging Moore’s law, by Rabih Bashroush (published 27 January 2020)
Big IT outages are occurring with growing regularity, many with severe consequences. Executives, industry authorities and governments alike are responding with more rules, calls for more transparency and a more formal approach to end-to-end, holistic resiliency.
Creeping criticality
IT outages and data center downtime can cause huge disruption. That is hardly news: veterans with long memories can remember severe IT problems caused by power outages, for example, back in the early 1990s.
Three decades on, the situation is vastly different. Almost every component and process in the entire IT supply chain has been engineered, re-engineered and architected for the better, with availability a prime design criterion. Failure avoidance and management, business continuity and data center resiliency have become disciplines in their own right, informed by proven approaches and supported by real-time data and a vast array of tools and systems.
But there is a paradox: The very success of IT, and of remotely delivered services, has created a critical dependency on IT in almost every business and for almost every business process. This dependency has radically increased in recent years. Outages — and there are more of them — now have a more immediate, wider and bigger impact than in the past.
A particular issue that has affected many high-profile organizations, especially in industries such as air transport, finance and retail, is “asymmetric criticality” or “creeping criticality.” This refers to a situation in which the infrastructure and processes have not been upgraded or updated to reflect the growing criticality of the applications or business processes they support. Some of the infrastructure has a 15-year life cycle, a timeframe out of sync with the far faster pace of innovation and change in the IT market.
While the level of dependency on IT is growing, another big set of changes is still only partway through: the move to cloud and distributed IT architectures (which may or may not involve the public cloud). Cloud and distributed applications enable the move, in part or whole, to a more distributed approach to resiliency. This approach involves replicating data across availability zones (regional clusters of three or more data centers) and using a variety of software tools and approaches, distributed databases, decentralized traffic and workload management, data replication and disaster recovery as a service.
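The arithmetic behind zone replication helps explain its appeal. The sketch below computes composite availability on the simplifying assumption that a service stays up as long as at least one replica is reachable and that zone failures are independent (an assumption real deployments only approximate); the 99.9% per-zone figure is illustrative.

```python
# Rough illustration of why replicating across availability zones raises availability,
# assuming independent zone failures and that one surviving replica keeps the service up.
def composite_availability(zone_availability: float, zones: int) -> float:
    """Probability that at least one of N independent replicas is up."""
    return 1 - (1 - zone_availability) ** zones

for zones in (1, 2, 3):
    print(f"{zones} zone(s): {composite_availability(0.999, zones):.9f}")
# 1 zone: 0.999000000, 2 zones: 0.999999000, 3 zones: 0.999999999
```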
These approaches can be highly effective but bring two challenges. First are complexity and cost — these architectures are difficult to set up, manage and configure, even for a customer with no direct responsibility for the infrastructure (Uptime Institute data suggests that difficulties with IT and software contribute to ever more outages). And second, for most customers, is a loss of control, visibility and accountability. This loss of visibility is now troubling regulators, especially in the financial services sector, which now plan to exercise more oversight in the United States (US), Europe, the United Kingdom (UK) and elsewhere.
Will outages get worse?
Are outages becoming more common or more damaging? The answer depends on the exact phrasing of the question: neither the number nor the severity of outages is increasing as a proportion of the level of IT services being deployed — in fact, reliability and availability are probably improving, albeit perhaps not significantly.
But the absolute number of outages is clearly increasing. In both our 2018 and 2019 global annual surveys, half (almost exactly 50%) said their organization had a serious data center or IT outage in the past three years – and it is known that the number of data centers has risen significantly during this time. Our data also shows the impact of these outages is serious or severe in almost 20% of cases, with many industry sectors, including public cloud and colocation, suffering problems.
What next?
The industry is now at an inflection point; whatever the overall rate of outages, the impact of outages at all levels has become more public, has more consequential effects, and is therefore more costly. This trend will continue for several years, as networks, IT and cloud services take time to mature and evolve to meet the heavy availability demands put upon them. More high-profile outages can be expected, and more sectors and governments will start examining the nature of critical infrastructure.
This has already started in earnest: In the UK, the Bank of England is investigating large banks’ reliance on cloud as part of a broader risk-reduction initiative for financial digital services. The European Banking Authority specifically states that an outsourcer/cloud operator must allow site inspections of data centers. And in the US, the Federal Reserve has conducted a formal examination of at least one Amazon Web Services (AWS) data center, in Virginia, with a focus on its infrastructure resiliency and backup systems. More site visits are expected.
Authorities in the Netherlands, Sweden and the US have also been examining the resiliency of 911 and equivalent emergency call services after a series of failures. And in the US, the Government Accountability Office published an analysis to determine what could be done about the impact and frequency of IT outages at airlines. Meanwhile, data centers themselves will continue to be the most resilient and mature component (and with Uptime Institute certification, can be shown to be designed and operated for resiliency). There are very few signs that any sector of the market (enterprise, colocation or cloud) plans on downgrading physical infrastructure redundancy.
As a result of the high impact of outages, a much greater focus on resiliency can be expected, with best practices and management, investment, technical architectures, transparency and reporting, and legal responsibility all under discussion.
The full report Ten data center industry trends in 2020 is available to members of the Uptime Institute Network here.
Outages drive authorities and businesses to act, by Andy Lawrence, Executive Director of Research, Uptime Institute (published 13 January 2020)
Energy use by data centers and IT will continue to rise, putting pressure on energy infrastructure and raising questions about carbon emissions. The drivers for more energy use are simply too great to be offset by efficiency gains.
Drivers
Demand for digital services has seen sustained, exceptional growth over the past few years — and with it, the energy consumption of the underlying infrastructure has risen steadily as well. This has given rise to concerns about the ability of the energy industry to effectively supply data centers in some geographies, and to continuing worries about the sector’s growing carbon footprint.
Although there is a shortage of reliable, comprehensive data about the industry’s use of energy, it is likely that some models have underestimated energy use and carbon emissions, and that the issue will become more critical in the years ahead.
There are some standout examples of IT energy use. Bitcoin mining, for example, is reliably estimated to have consumed over 73 terawatt-hours (TWh) of energy in 2019. This equates to the electricity use of 6.8 million average US households, or 20 million UK households. This is one cryptocurrency — of over 1,500 — and just one application area of blockchains.
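The household equivalence can be checked with simple arithmetic. The sketch below assumes average annual household electricity consumption of roughly 10,700 kWh in the US and 3,700 kWh in the UK; those averages are approximations and vary by source and year.

```python
# Rough check of the household equivalence above, using assumed average annual
# household electricity consumption figures (approximate, vary by source and year).
BITCOIN_TWH = 73
US_HOUSEHOLD_KWH = 10_700
UK_HOUSEHOLD_KWH = 3_700

kwh = BITCOIN_TWH * 1e9                 # 1 TWh = 1 billion kWh
print(f"US households: {kwh / US_HOUSEHOLD_KWH / 1e6:.1f} million")  # ~6.8 million
print(f"UK households: {kwh / UK_HOUSEHOLD_KWH / 1e6:.1f} million")  # ~19.7 million
```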
Social media provides another example of uncontrolled energy use. Research by Uptime Intelligence shows that every time an image is posted on Instagram by the Portuguese soccer star Cristiano Ronaldo (who at the time of writing had the most followers on the platform), his more than 188 million followers consume over 24 megawatt-hours (MWh) of energy to view it.
Media streaming, which represents the biggest proportion of global traffic and which is rising steadily and globally, has become the energy guzzler of the internet. According to our analysis, streaming a 2.5 hour high definition (HD) movie consumes 1 kilowatt-hour (kWh) of energy. But for 4K (Ultra HD) streaming — expected to become more mainstream in 2020 — this will be closer to 3 kWh, a three-fold increase.
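Working backward from the figures above gives a sense of the per-viewer cost; the short calculation below derives the per-view energy and the implied average draw of an HD stream directly from the article’s numbers (the underlying measurement assumptions are Uptime’s).

```python
# Per-view implications of the Instagram and streaming figures quoted above.
FOLLOWERS = 188e6
POST_ENERGY_MWH = 24
per_view_wh = POST_ENERGY_MWH * 1e6 / FOLLOWERS          # convert MWh to Wh
print(f"~{per_view_wh:.2f} Wh per follower viewing one post")          # ~0.13 Wh

HD_MOVIE_KWH, MOVIE_HOURS = 1.0, 2.5
avg_draw_w = HD_MOVIE_KWH / MOVIE_HOURS * 1000           # energy over time -> average watts
print(f"HD streaming implies ~{avg_draw_w:.0f} W across the delivery chain")  # ~400 W
```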
Data from the most developed countries shows what can be expected elsewhere. In the UK, which has more than 94% internet penetration, monthly household broadband data consumption increased from 17 gigabytes (GB) in 2011 to 132 GB in 2016, according to official Ofcom data — a sustained 50% increase year-on-year for five years. (The growth figure is much higher in other parts of the world, such as Asia and Africa.) Internet penetration, standing at 58% globally in 2019, is expected to increase by 10% in 2020.
This increase in demand is a big driver — although not the only one — for more infrastructure and more energy consumption in cloud, colocation and some enterprise data centers. But a new factor has yet to kick in: 5G.
While it will take a few years for 5G to mature and become widespread, the rollout from 2020 is widely expected to substantially accelerate data growth trends, with many new types of digital services in domains such as smart cities, IoT and transportation, among many others. The increased bandwidth compared with 4G will drive demand for higher-resolution content and richer media formats (e.g., virtual reality) beginning as soon as late 2020, with demand and the associated energy consumption rising more steeply after that.
The role of blockchain (of which Bitcoin is just one example) and its impact on energy consumption is still to be fully determined, but if takeup is on a large scale, it can only be an upward force. Most analysts in this area have predicted a dramatic rise in blockchain adoption beyond cryptocurrency in 2020, helped by new offerings such as the AWS blockchain service. Not all blockchain models are the same, but blockchain inherently means a decentralized architecture, which requires extensive infrastructure to accommodate the replication of data. This consumes more energy than traditional centralized architectures.
Bitcoin is an example of a blockchain that uses Proof of Work as a consensus mechanism — and such models are extremely energy-intensive, requiring multiple parties to solve complex mathematical problems. While alternatives to this model (e.g., Proof of Stake) are likely to gain widespread commercial adoption, the uptake to date has been slow.
Energy consumption and global IT
Several reports have been published in recent years on IT energy consumption and its predicted growth rates. An International Energy Agency (IEA) report published in 2019 noted that workloads and internet traffic will double, but it also forecast that data center energy demand will remain flat to 2021, due to efficiency trends. It cited various references for the basic research.
But Uptime Institute Intelligence is wary of this prediction and intends to collaborate with various parties in 2020 to research this further. There are very strong factors driving up IT energy consumption, and some of the existing data on IT energy use contradicts the IEA figures. The IEA report, for example, stated that global data center energy consumption was 197.8 TWh in 2018 and is expected to drop slightly by 2021. However, research by the European Union’s (EU’s) EURECA (EU Resource Efficiency Coordination Action) Project found that European data centers consumed 130 TWh in 2017, whereas Greenpeace put energy consumption by the Chinese data center industry at 160 TWh in 2018. This suggests an annual total for China and Europe alone in the neighborhood of 290 TWh, far higher than the IEA global figures.
It is true that the explosive increase in IT demand will not translate directly into the same rate of growth for infrastructure energy consumption (due to increased IT energy efficiency). However, given the exponential rate of growth, it is likely that demand will substantially outpace the gains from efficiency practices over the next five years.
In US data centers, the law of diminishing returns may begin to limit the impact of energy savings. For example, at the data center level, best practices such as hot/cold aisle containment, installation of blanking plates and raising set point temperature have already been widely deployed; this can be seen in the substantial drop in power usage effectiveness (PUE) between 2011 and 2014. However, since 2014, PUE has not dropped much, and in 2019, we noticed a slight increase in the average annual PUE reported by respondents to our global data center survey. Similarly, with IT hardware, Moore’s law has slowed down, and newer servers are not maintaining the same efficiency improvements seen in the past.
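A quick worked example shows why facility-level gains are tapering. PUE is total facility energy divided by IT energy, so once the overhead share is small, further overhead reductions barely move total consumption; the kilowatt-hour figures below are illustrative only.

```python
# PUE = total facility energy / IT energy. Illustrative numbers showing the
# diminishing returns described above (not measurements from any facility).
def pue(it_kwh: float, overhead_kwh: float) -> float:
    return (it_kwh + overhead_kwh) / it_kwh

it = 1000.0
for overhead in (600, 200, 100):
    print(f"overhead {overhead:>3} kWh -> PUE {pue(it, overhead):.2f}, total {it + overhead:.0f} kWh")
# Cutting PUE from 1.6 to 1.2 trims total energy by 25%; going from 1.2 to 1.1
# saves only a further ~8%, even though the overhead itself is halved.
```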
Uptime Institute expects the strong growth in the IT sector to be sustained over the next five years, given the well-understood demand patterns and the existing technologies coming into large-scale adoption. Our preliminary research suggests that IT energy consumption will rise steadily too, by as much as 10% in 2020, but further research will be conducted to develop and validate these forecasts.
The full report Ten data center industry trends in 2020 is available to members of the Uptime Institute Network here.
Data center energy use goes up and up and up, by Rabih Bashroush (published 6 January 2020)
Phasing Out Data Center Hot Work
/in Executive, Operations/by Kevin HeslinDespite years of discussion, warnings and strict regulations in some countries, data center hot work remains a contentious issue in the data center industry. Hot work is the practice of working on energized electrical circuits (voltage limits differ regionally) — and it is usually done, in spite of the risks, to reduce the possibility of a downtime incident during maintenance.
Uptime Institute advises against hot work in almost all instances. The safety concerns are just too great, and data suggests work on energized circuits may — at best — only reduce the number of manageable incidents, while increasing the risk of arc flash and other events that damage expensive equipment and may lead to an outage or injury. In addition, concurrently maintainable or fault tolerant designs as described in Uptime Institute’s Tier Standard make hot work unnecessary.
The pressure against hot work continues to mount. In the US, electrical contractors have begun to decline some work that involves working on energized circuits, even if an energized work permit has been created and signed by appropriate management, as required by National Fire Protection Association (NFPA) 70E (Standard for Electrical Safety in the Workplace). In addition, US Department of Labor’s Occupational Safety and Hazards Agency (OSHA) has repeatedly rejected business continuity as an exception to hot work restrictions, making it harder for management to justify hot work and to find executives willing to sign the energized work permit.
OSHA statistics make clear that work on energized systems is a dangerous practice, especially for construction trades workers; installation, maintenance, and repair occupations; and grounds maintenance workers. For this reason, NFPA 70E sharply limits the situations in which organizations are allowed to work on energized equipment. Personnel safety is not the only issue; personal protective equipment (PPE) protects only workers, not equipment, so an arc flash can destroy many thousands of dollars of IT gear.
Ignoring local and national standards can be costly, too. OSHA reported 2,923 lockout/tagout and 1,528 PPE violations in 2017, among the many safety concerns it addressed that year. New minimum penalties for a single violation exceed $13,000, with top total fines for numerous, willful and repeated violations running into the millions of dollars. Wrongful death and injury suits add to the cost, and violations can lead to higher insurance premiums, too.
Participants in a recent Uptime Institute discussion roundtable agreed that the remaining firms performing work on live loads should begin preparing to end the practice. They said that senior management is often the biggest impediment to ending hot work, at least at some organizations, despite the well-known and documented risks. Executive resistance can be tied to concerns about power supplies or failure to maintain independent A/B feeds. In some cases, service level agreements contain restrictions against powering down equipment.
Despite executive resistance at some companies, the trend is clearly against hot work. By 2015, more than two-thirds of facilities operators had already eliminated the practice, according to Uptime Institute data. A tighter regulatory environment, heightened safety concerns, increased financial risk and improved equipment should combine to all but eliminate hot work in the near future. But there are still holdouts, and the practice is far more acceptable in some countries — China is an example — than in others, such as the US, where NFPA 70E severely limits the practice in all industries.
Also, hot work does not eliminate IT failure risk. Uptime Institute has been tracking data center abnormal incidents for more than 20 years and when studying the data, at least 71 failures occurred during hot work. While these failures are generally attributed to poor procedures or maintenance, a recent, more careful analysis concluded that better procedures or maintenance (or both) would have made it possible to perform the work safely — and without any failures — on de-energized systems.
The Uptime Institute abnormal incident database includes only four injury reports; all occurred during work on energized systems. In addition, the database includes 16 reports of arc flash. One occurred during normal preventive maintenance and one during an infrared scan. Neither caused injury, but the potential risk to personnel is apparent, as is the potential for equipment damage (and legal exposure).
Undoubtedly, eliminating hot work is a difficult process. One large retailer that has just begun the process expects the transition to take several years. And not all organizations succeed: Uptime Institute is aware of at least one organization in which incidents involving failed power supplies caused senior management to cancel their plan to disallow work on energized equipment.
According to several Uptime Institute Network community members, building a culture of safety is the most time-consuming part of the transition from hot work, as data centers are goal-oriented organizations, well-practiced at developing and following programs to identify and eliminate risk.
It is not necessary or even prudent to eliminate all hot work at once. The IT team can help slowly retire the practice by eliminating the most dangerous hot work first, building experience on less critical loads, or reducing the number of circuits affected at any one time. To prevent common failures when de-energizing servers, the Operations team can increase scrutiny on power supplies and ensure that dual-corded servers are properly fed.
In early data centers, the practice of hot work was understandable — necessary, even. However, Uptime Institute has long advocated against hot work. Modern equipment and higher resiliency architectures based on dual-corded servers make it possible to switch power feeds in the case of an electrical equipment failure. These advances not only improve data center availability, they also make it possible to isolate equipment for maintenance purposes.
The spectre of ransomware
/in Executive, Operations/by Andy Lawrence, Executive Director of Research, Uptime Institute, [email protected]Uptime Institute Intelligence plans to release its 2019/2020 outages report shortly. This report will examine the types, causes and impacts of public outages, as well as further analyze the results of a recent Uptime survey on outages and impacts. The data will once again show that serious IT service interruptions are common and costly, with the impacts often causing serious disruption.
We have excluded one type of outage from the report: those caused by cyberattacks. Data integrity and cybersecurity is, of course, a very major issue that requires vigilant attention and investment, but it is not currently an area on which Uptime Institute researches and advises. Most security issues are data breaches; although they have serious consequences, they do not usually lead to a service interruption.
However, two forms of malicious attack can and often do lead to outages or at least a severe service degradation. The first is a Distributed Denial of Service (DDoS) attack, where a coordinated attempt is made to overwhelm a site with traffic. Uptime has tracked a number of these each year for many years, and security specialists say they are increasingly common. Even so, most organizations that are DDoS targets have developed effective countermeasures that minimize the threat. These measures include such techniques as packet filtering, load balancing and blocking suspect internet protocol addresses. As a result, DDoS attacks are showing up less frequently in our lists of outages.
The second type, ransomware, is emerging as a major problem and cause of outages. Ransomware attackers deny authorized users access to their own data; the hackers use malware to encrypt the user’s files and refuse to unlock them unless a ransom is paid. Often, operators have no choice but to take down all involved IT services in an attempt to recover access, restore from the last clean backup copy, and purge the systems of viruses. Outages can last days or weeks.
In the past two years, ransomware attacks have increased dramatically. The FBI investigated over 1,400 ransomware attacks in 2018. Government offices are a particular target. Kaspersky Research Labs, operated by security software supplier Kaspersky, identified 147 attacks on municipalities in 2019 (up 60%), in which the criminals demanded ransoms of $5.3 million. The IT Governance blog, based in the UK, recorded 19 major ransomware attacks globally in December 2019 alone.
Most US cities have now signed a charter never to pay a ransom to the criminals — but more importantly, most are now also upgrading their infrastructure and practices to prevent attacks. Some that have been targeted, however, have paid the ransom.
Perhaps the two most serious attacks in 2019 were the City of Baltimore, which refused to pay the ransom and budgeted $18 million to fix its problem; and the City of Atlanta, which also refused to pay the ransom and paid over $7 million to fully restore operations. The WannaCry virus attack in 2018 reportedly cost the UK National Health Service over $120 million (£92 million). And on New Year’s Eve 2019, Travelex’s currency trading went offline for two weeks due to a ransomware attack, costing it millions.
Preventing a ransomware attack has become — or should become — a very high priority for those concerned with resiliency. Addressing the risk may involve some stringent, expensive and inconvenient processes, such as multifactor security, since attackers will likely try to copy all passwords as well as encrypt files. In terms of the Uptime Institute Outage Severity Rating, many attacks quickly escalate to the most serious Category 4 or 5 levels — severe enough to costs millions and threaten the survival of the organization. Indeed, one North American health provider has struggled to recover after receiving a $14 million ransom demand.
All of this points to the obvious imperative: The availability and integrity of digital infrastructure, data and services is critical — in the fullest sense of the word — to almost all organizations today, and assessments of vulnerability need to span security, software, systems, power, networks and facilities. Weaknesses are likely to be exploited; sufficient investment and diligence in this area has become essential and must never waver. In hindsight we discover that almost all outages could have prevented with better management, processes and technology.
Members of the Uptime Institute Network can read more on this topic here.
Micro data centers: An explosion in demand, in slow motion
/in Design, Executive/by Andy Lawrence, Executive Director of Research, Uptime Institute, [email protected]A wave of new technologies, from 5G to the internet of things (IoT) to artificial intelligence (AI), means much more computing and much more data will be needed near the point of use. That means many more small data centers will be required. But there will be no sudden mass deployment, no single standout use case, no single design dominating. Demand is likely to grow faster from 2022.
Small package, big impact
Suppliers in the data center industry are excited. Big vendors such as Schneider, Vertiv and Huawei have been rapidly adding to their product lines and redrawing their financial forecasts; startups — companies such as Vapor IO, EdgeMicro, EdgeInfra and MetroEDGE — are pioneering new designs; and established telco specialists, such as Ericsson, along with telco operators, are working on new technologies and partnerships. Builders and operators of colocation data centers, such as EdgeConneX, Equinix and Compass, are assessing where the opportunity lies.
The opportunity is to supply, build or operate local edge data centers — small micro data centers that are designed to operate near the point of use, supporting applications that are not suited to run in big, remote data centers, even in mid-sized regional colocation data centers. Unlike most larger data centers, micro data centers will mostly be built, configured and tested in a factory and delivered on a truck. Typical sizes will be 50 kW to 400 kW, and there are expected to be a lot of them.
But with the anticipation comes consternation — it is possible to commit too early. Some analysts had predicted that the explosion in edge demand would be in full swing by now, fueled by the growing maturity of the IoT and the 2020 launch schedules for 5G services. Suppliers, however, mostly report only a trickle — not a flood — of orders.
Privately, some suppliers admit they have been caught off guard. There is a deep discussion about the extent of data center capacity needed at the local edge; about just how many applications and services really need local edge processing; and about the type and size of IT equipment needed — maybe a small box on the wall will be enough?
While the technical answers to most of these questions are largely understood, questions remain about the economics, the ownership, and the scale and pace of deployment of new technologies and services. These are critical matters affecting deployment.
Edge demand and 5G
In the past decade, data and processing has shifted to a cloudy core, with hundreds of hyperscale data centers built or planned. This will continue. But a rebalancing is underway (see Uptime Institute Intelligence report: The internet tilts toward the edge), with more processing being done not just at the regional edge, in nearby colocation (and other regional) data centers, but locally, in a micro data center that is tens or hundreds of meters away.
This new small facility may be needed to support services that have a lot of data, such as MRI scanners, augmented reality and real-time streaming; it may be needed to provide very low latency, instantly responsive services for both humans and machines — factory machines are one example, driverless cars another; and it may be needed to quickly crunch AI calculations for immediate, real-time responses. There is also a more mundane application: to provide on-site services, such as in a hospital, factory or retail establishment, should the network fail.
With all these use cases, why is there any doubt about the micro data center opportunity?
First, in terms of demand drivers, no new technology has created so much interest and excitement as 5G. The next generation telecom wireless network standard promises speeds of up to 10 gigabits per second (Gbps) communications, latency of below five millisecond (ms), support for one million devices per square kilometer, and five-nines availability. It will ultimately support a vast array of new always-on, low latency and immersive applications that will require unimaginable amounts of data and compute power — too much to realistically or economically send back to the internet’s hyperscale core. Much of this will require low-latency communications and rapid processing of a few milliseconds or less — which, the speed of light dictates, must be within a few kilometers.
Few doubt that 5G will create (or satisfy) huge demand and play a pivotal role in IoT. But the rollout of 5G, already underway, is not going to be quick, sudden or dramatic. In fact, full rollout may take 15 years. This is because the infrastructure required to support 5G is too expensive, too complex, and involves too many parties to do all at once. Estimates vary, with at least one analyst firm predicting that telecom companies will need to spend $1 trillion upgrading their networks.
A second issue that is creating uncertainty about demand is that many edge applications — whether supported by 5G or some other networking technology (such as WiFi 6) — may not require a local micro data center. For example, high-bandwidth applications may be best served from a content distribution network at the regional edge, in a colo, or by the colo itself, while many sensors and IoT devices produce very little data and so can be served by small gateway devices. Among 5G’s unique properties is the ability to support data-heavy, low-latency services at scale — but this is exactly the kind of service that will mostly be deployed in 2021 or later.
Suppliers and telcos alike, then, are unsure about the number, type and size of data centers at the local edge. Steve Carlini, a Schneider Electric executive, told Uptime Institute that he expects most demand for micro data centers supporting 5G will be in the cities, where mobile edge-computing clusters would likely each need one micro data center. But the number of clusters in each city, far fewer than the number of new masts, would depend on demand, applications and other factors.
A third big issue that will slow demand for micro data centers is economic and organizational. These issues include licensing, location and ownership of sites; support and maintenance; security and resiliency concerns; and management sentiment. Most enterprises expect to own their own edge micro data centers, according to Uptime Intelligence research, but many others will likely prefer to outsource this altogether, in spite of potentially higher operational costs and a loss of control.
Suppliers are bullish, even if they know demand will grow slowly at first. Among the first-line targets are those simply looking to upgrade server rooms, where the work cannot be turned over to a colo or the cloud; factories with local automation needs; retailers and others that need more resiliency in distributed locations; and telcos, whose small central offices need the security, availability and cost base of small data centers.
This wide range of applications has also led to an explosion of innovation. Expect micro data centers to vary in density, size, shape, cooling types (include liquid), power sources (including lithium ion batteries and fuel cells) and levels of resiliency.
The surge in demand for micro data centers will be real, but it will take time. Many of the economic and technical drivers are not yet mature; 5G, one of the key underlying catalysts, is in its infancy. In the near term, much of the impetus behind the use of micro data centers will lie in their ability to ensure local availability in the event of network or other remote outages.
The full report Ten data center industry trends in 2020 is available to members of the Uptime Institute Network here.
Optimizing server refresh cycles with an aging Moore’s law
By Rabih Bashroush
Hardware refresh is the process of replacing older, less efficient servers with newer, more efficient ones that offer more compute capacity. However, there is a relatively recent complication to the refresh cycle: the slowing of Moore’s law. There is still a very strong case for energy savings when replacing servers that are up to nine years old. However, the case for refreshing more recent servers (say, up to three years old) may be far less clear, due to the stagnation of Moore’s law over the past few years.
Moore’s law refers to the observation made by Gordon Moore (co-founder of Intel) that the transistor count on microchips would double every two years. This implied that transistors would become smaller and faster, while drawing less energy. Over time, the doubling in performance per watt was observed to happen around every year and a half.
It is this doubling in performance per watt that underpins the major opportunity for increasing compute capacity while increasing efficiency through hardware refresh. But in the past five years, it has been harder for Intel (and immediate rival AMD) to maintain the pace of improvement. This raises the question: Are we still seeing these gains from recent and forthcoming generations of central processing units (CPUs)? If not, the hardware refresh case will be undermined … and suppliers are unlikely to be making that point too loudly.
To answer this question, Uptime Institute Intelligence analyzed performance data from the Standard Performance Evaluation Corporation (SPEC; https://www.spec.org/). The SPECpower dataset used contains energy performance results for hundreds of servers, based on the SPECpower server energy performance benchmark. To track trends consistently and eliminate potential outlier bias in reported servers (e.g., high-end servers versus volume servers), only dual-socket servers were considered in our analysis. The dataset was then broken down into 18-month intervals (based on the published release date of servers in SPECpower) and the performance averaged for each period. The results (server performance per watt) are shown in Figure 1, along with the trend line (polynomial, order 3).
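As a rough illustration of that procedure, the sketch below restricts a result set to dual-socket servers, bins it into 18-month windows by publication date and averages performance per watt in each window. The column names (pub_date, sockets, ssj_ops_per_watt) are placeholders for whatever fields a SPECpower export actually provides, not official names, and this is not the exact pipeline used to produce Figure 1.

```python
# Sketch: average perf/watt of dual-socket servers in 18-month windows.
# Column names are placeholders, not official SPECpower field names.
import pandas as pd

def average_by_window(results: pd.DataFrame, window_months: int = 18) -> pd.Series:
    """Mean perf/watt of dual-socket servers in successive fixed-length windows."""
    dual = results[results["sockets"] == 2].copy()             # dual-socket only
    dates = pd.to_datetime(dual["pub_date"])
    months = dates.dt.year * 12 + dates.dt.month               # month index
    dual["window"] = (months - months.min()) // window_months  # 0, 1, 2, ...
    return dual.groupby("window")["ssj_ops_per_watt"].mean()
```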
Figure 1 shows how performance increases have started to plateau, particularly over the past two periods. The data suggests upgrading a 2015 server in 2019 might provide only a 20% boost in processing power for the same number of watts. In contrast, upgrading a 2008/2009 server in 2012 might have given a boost of 200% to 300%.
To further understand the reason behind this, we charted the way CPU technology (lithography) has evolved over time, along with performance and idle power consumption (see Figure 2).
Figure 2 reveals some interesting insights. At the beginning of the decade, the move from one CPU lithography to another (e.g., 65 nanometers [nm] to 45 nm, or 45 nm to 32 nm) delivered major performance per watt gains (orange line), as well as a substantial reduction in idle power consumption (blue line), thanks to the reduction in transistor size and voltage.
However, it is also interesting to see that the introduction of a larger number of cores to maintain performance gains produced a negative impact on idle power consumption. This can be seen briefly during the 45 nm lithography and very clearly in recent years with 14 nm.
Over the past few years, while lithography stagnated at 14 nm, the increase in performance per watt (when working with a full load) has been accompanied by a steady increase in idle power consumption (perhaps due to the increase in core count to achieve performance gains). This is one reason why the case for hardware refresh for more recent kit has become weaker: Servers in real-life deployments tend to spend a substantial part of their time in idle mode — 75% of the time, on average. As such, the increase in idle power may offset energy gains from performance.
This is an important point that will likely have escaped many buyers and operators: If a server spends a disproportionate amount of time in active idle mode — as is the case for most — the focus should be on active idle efficiency (e.g., choosing servers with lower core count) rather than just on higher server performance efficiency, while satisfying overall compute capacity requirements.
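A simple weighted-average calculation makes the point. In the sketch below, the 75% idle share comes from the figure above, while the wattages are hypothetical; a server that is better at full load but worse at idle can still draw more energy on average.

```python
# Why idle power matters when servers spend ~75% of their time idle.
# The idle fraction is from the text; the wattages are hypothetical.

def average_power_w(idle_w: float, active_w: float, idle_fraction: float = 0.75) -> float:
    """Time-weighted average power draw of a server."""
    return idle_fraction * idle_w + (1 - idle_fraction) * active_w

if __name__ == "__main__":
    older = average_power_w(idle_w=100, active_w=300)  # hypothetical older server
    newer = average_power_w(idle_w=150, active_w=250)  # better at load, worse at idle
    print(f"older server: {older:.0f} W average, newer server: {newer:.0f} W average")
    # Despite the better full-load figure, the newer server averages more power
    # here because its higher idle draw dominates at 75% idle time.
```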
It is, of course, a constantly moving picture. The more recent introduction of the 7 nm lithography by AMD (Intel’s main competitor) should give Moore’s law a new lease of life for the next couple of years. However, it has become clear that we are starting to reach the limits of the existing approach to CPU design. Innovation and efficiency improvements will need to be based on new architectures, entirely new technologies and more energy-aware software design practices.
The full report Beyond PUE: Tackling IT’s wasted terawatts is available to members of the Uptime Institute Network here.
Outages drive authorities and businesses to act
By Andy Lawrence, Executive Director of Research, Uptime Institute
Big IT outages are occurring with growing regularity, many with severe consequences. Executives, industry authorities and governments alike are responding with more rules, calls for more transparency and a more formal approach to end-to-end, holistic resiliency.
Creeping criticality
IT outages and data center downtime can cause huge disruption. That is hardly news: veterans with long memories can remember severe IT problems caused by power outages, for example, back in the early 1990s.
Three decades on, the situation is vastly different. Almost every component and process in the entire IT supply chain has been engineered, re-engineered and architected for the better, with availability a prime design criterion. Failure avoidance and management, business continuity and data center resiliency have become established disciplines, informed by proven approaches and supported by real-time data and a vast array of tools and systems.
But there is a paradox: The very success of IT, and of remotely delivered services, has created a critical dependency on IT in almost every business and for almost every business process. This dependency has radically increased in recent years. There are more outages, and many of them have a more immediate, wider and bigger impact than in the past.
A particular issue that has affected many high-profile organizations, especially in industries such as air transport, finance and retail, is “asymmetric criticality” or “creeping criticality.” This refers to a situation in which the infrastructure and processes have not been upgraded or updated to reflect the growing criticality of the applications or business processes they support. Some of the infrastructure has a 15-year life cycle, a timeframe out of sync with the far faster pace of innovation and change in the IT market.
While the level of dependency on IT is growing, another big set of changes is still only partway through: the move to cloud and distributed IT architectures (which may or may not involve the public cloud). Cloud and distributed applications enable the move, in part or whole, to a more distributed approach to resiliency. This approach involves replicating data across availability zones (regional clusters of three or more data centers) and using a variety of software tools and approaches, distributed databases, decentralized traffic and workload management, data replication and disaster recovery as a service.
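The underlying arithmetic of zone-level replication can be sketched as follows. The calculation assumes zone failures are statistically independent, which correlated failures (software bugs, control-plane faults) routinely violate, and the per-zone availability figure is hypothetical rather than any provider's stated SLA.

```python
# Simplified illustration of replication across availability zones.
# Assumes independent zone failures; the 99.95% per-zone figure is hypothetical.

def multi_zone_availability(per_zone: float, zones: int) -> float:
    """Probability that at least one of `zones` independent zones is up."""
    return 1 - (1 - per_zone) ** zones

if __name__ == "__main__":
    for n in (1, 2, 3):
        a = multi_zone_availability(0.9995, n)
        print(f"{n} zone(s): ~{a:.8%} theoretical availability")
```

The theoretical gains are large, which is why the model is attractive, but as the next paragraph notes, the practical picture is complicated by the cost and complexity of configuring such architectures correctly.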
These approaches can be highly effective but bring two challenges. The first is complexity and cost: these architectures are difficult to set up, manage and configure, even for a customer with no direct responsibility for the infrastructure (Uptime Institute data suggests that difficulties with IT and software contribute to ever more outages). The second, for most customers, is a loss of control, visibility and accountability. This loss of visibility is now troubling regulators, especially in the financial services sector, who plan to exercise more oversight in the United States (US), Europe, the United Kingdom (UK) and elsewhere.
Will outages get worse?
Are outages becoming more common or more damaging? The answer depends on the exact phrasing of the question: neither the number nor the severity of outages is increasing as a proportion of the level of IT services being deployed; in fact, reliability and availability are probably improving, albeit perhaps not significantly.
But the absolute number of outages is clearly increasing. In both our 2018 and 2019 global annual surveys, half of respondents (almost exactly 50%) said their organization had suffered a serious data center or IT outage in the past three years, and the number of data centers in operation has risen significantly during this time. Our data also shows the impact of these outages was serious or severe in almost 20% of cases, with many industry sectors, including public cloud and colocation, suffering problems.
What next?
The industry is now at an inflection point: whatever the overall rate of outages, outages at all levels have become more public, have more consequential effects, and are therefore more costly. This trend will continue for several years, as networks, IT and cloud services take time to mature and evolve to meet the heavy availability demands placed upon them. More high-profile outages can be expected, and more sectors and governments will start examining the nature of critical infrastructure.
This has already started in earnest: In the UK, the Bank of England is investigating large banks’ reliance on cloud as part of a broader risk-reduction initiative for financial digital services. The European Banking Authority specifically states that an outsourcer/cloud operator must allow site inspections of data centers. And in the US, the Federal Reserve has conducted a formal examination of at least one Amazon Web Services (AWS) data center, in Virginia, with a focus on its infrastructure resiliency and backup systems. More site visits are expected.
Authorities in the Netherlands, Sweden and the US have also been examining the resiliency of 911 services after a series of failures. And in the US, the Government Accountability Office published an analysis of what could be done about the impact and frequency of IT outages at airlines. Meanwhile, data centers themselves will continue to be the most resilient and mature component (and with Uptime Institute certification, can be shown to be designed and operated for resiliency). There are very few signs that any sector of the market (enterprise, colocation or cloud) plans to downgrade physical infrastructure redundancy.
As a result of the high impact of outages, a much greater focus on resiliency can be expected, with best practices and management, investment, technical architectures, transparency and reporting, and legal responsibility all under discussion.
The full report Ten data center industry trends in 2020 is available to members of the Uptime Institute Network here.
Data center energy use goes up and up and up
By Rabih Bashroush
Energy use by data centers and IT will continue to rise, putting pressure on energy infrastructure and raising questions about carbon emissions. The drivers for more energy use are simply too great to be offset by efficiency gains.
Drivers
Demand for digital services has seen sustained, exceptional growth over the past few years — and with it, the energy consumption of the underlying infrastructure has risen steadily. This has given rise to concerns about the ability of the energy industry to supply data centers effectively in some geographies, and to continuing worries about the sector’s growing carbon footprint.
Although there is a shortage of reliable, comprehensive data about the industry’s use of energy, it is likely that some models have underestimated energy use and carbon emissions, and that the issue will become more pressing in the years ahead.
There are some standout examples of IT energy use. Bitcoin mining, for example, is reliably estimated to have consumed over 73 terawatt-hours (TWh) of energy in 2019. This equates to the electricity use of 6.8 million average US households, or 20 million UK households. And this is one cryptocurrency — of over 1,500 — and just one application area of blockchains.
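The household comparison can be checked with simple division. In the sketch below, the per-household consumption figures (roughly 10,700 kWh per year for the US and 3,600 kWh for the UK) are typical published averages used here as assumptions, not figures from this report.

```python
# Arithmetic behind the household comparison. Per-household consumption
# figures are typical published averages, used here as assumptions.

BITCOIN_TWH = 73
US_HOUSEHOLD_KWH_PER_YEAR = 10_700   # assumed average
UK_HOUSEHOLD_KWH_PER_YEAR = 3_600    # assumed average

def households_equivalent(twh: float, household_kwh: float) -> float:
    """Number of average households that twh of energy would supply for a year."""
    return twh * 1e9 / household_kwh  # 1 TWh = 1e9 kWh

if __name__ == "__main__":
    us = households_equivalent(BITCOIN_TWH, US_HOUSEHOLD_KWH_PER_YEAR)
    uk = households_equivalent(BITCOIN_TWH, UK_HOUSEHOLD_KWH_PER_YEAR)
    print(f"US: ~{us / 1e6:.1f} million households")   # ~6.8 million
    print(f"UK: ~{uk / 1e6:.1f} million households")   # ~20 million
```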
Social media provides another example of uncontrolled energy use. Research by Uptime Intelligence shows that every time an image is posted on Instagram by the Portuguese soccer star Cristiano Ronaldo (who at the time of writing had the most followers on the platform), his more than 188 million followers consume over 24 megawatt-hours (MWh) of energy to view it.
Media streaming, which represents the biggest proportion of global traffic and which is rising steadily and globally, has become the energy guzzler of the internet. According to our analysis, streaming a 2.5 hour high definition (HD) movie consumes 1 kilowatt-hour (kWh) of energy. But for 4K (Ultra HD) streaming — expected to become more mainstream in 2020 — this will be closer to 3 kWh, a three-fold increase.
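Dividing those headline figures down gives a feel for the per-viewer and per-hour numbers. The sketch below assumes, as the text does, that every follower views the post once.

```python
# Per-viewer and per-hour energy implied by the figures quoted above.
# Assumes every follower views the Instagram post once, as in the text.

INSTAGRAM_POST_MWH = 24
FOLLOWERS = 188_000_000

HD_MOVIE_KWH, MOVIE_HOURS = 1.0, 2.5
UHD_MOVIE_KWH = 3.0

if __name__ == "__main__":
    per_view_wh = INSTAGRAM_POST_MWH * 1e6 / FOLLOWERS        # MWh -> Wh
    print(f"~{per_view_wh:.2f} Wh per follower viewing one image")
    print(f"HD streaming: ~{HD_MOVIE_KWH / MOVIE_HOURS:.1f} kWh per hour")
    print(f"4K streaming: ~{UHD_MOVIE_KWH / MOVIE_HOURS:.1f} kWh per hour")
```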
Data from the most developed countries shows what can be expected elsewhere. In the UK, which has more than 94% internet penetration, average monthly household broadband data consumption increased from 17 gigabytes (GB) in 2011 to 132 GB in 2016, according to official Ofcom data — a sustained increase of roughly 50% year-on-year for five years. (The growth figure is much higher in other parts of the world, such as Asia and Africa.) Internet penetration, standing at 58% globally in 2019, is expected to increase by 10% in 2020.
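A quick calculation confirms the stated growth rate: moving from 17 GB to 132 GB over five years implies compound growth of roughly 50% per year.

```python
# Implied compound annual growth rate for 17 GB (2011) to 132 GB (2016).

def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate."""
    return (end / start) ** (1 / years) - 1

if __name__ == "__main__":
    print(f"Implied year-on-year growth: ~{cagr(17, 132, 5):.0%}")  # about 50%
```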
This increase in demand is a big driver — although not the only one — for more infrastructure and more energy consumption in cloud, colocation and some enterprise data centers. But a new factor has yet to kick in: 5G.
While it will take a few years for 5G to mature further and become widespread, it is widely expected that the rollout of 5G from 2020 will substantially accelerate data growth, with many new types of digital services in domains such as smart cities, IoT and transportation, among others. The increased bandwidth compared with 4G will lead to growing demand for higher-resolution content and richer media formats (e.g., virtual reality) as soon as late 2020, with that demand, and the associated energy consumption, rising more steeply thereafter.
The role of blockchain (of which Bitcoin is just one example) and its impact on energy consumption is still to be fully determined, but if take-up happens at large scale, it can only be an upward force. Most analysts in this area have predicted a dramatic rise in blockchain adoption beyond cryptocurrency in 2020, helped by new offerings such as the AWS blockchain service. Not all blockchain models are the same, but blockchain inherently means a decentralized architecture, which requires extensive infrastructure to accommodate the replication of data. This consumes more energy than traditional centralized architectures.
Bitcoin is an example of a blockchain that uses Proof of Work as a consensus mechanism — and such models are extremely energy-intensive, requiring multiple parties to solve complex mathematical problems. While alternatives to this model (e.g., Proof of Stake) are likely to gain widespread commercial adoption, the uptake to date has been slow.
Energy consumption and global IT
Several reports have been published in recent years on IT energy consumption and its predicted growth rates. An International Energy Agency (IEA) report published in 2019 noted that workloads and internet traffic will double, but it also forecast that data center energy demand will remain flat to 2021, due to efficiency trends. It cited various references for the basic research.
But Uptime Institute Intelligence is wary of this prediction and intends to collaborate with various parties in 2020 to research this further. There are very strong factors driving up IT energy consumption, and some of the existing data on IT energy use contradicts the IEA figures. The IEA report, for example, stated that global data center energy consumption was 197.8 TWh in 2018 and is expected to drop slightly by 2021. However, research by the European Union’s (EU’s) EURECA (EU Resource Efficiency Coordination Action) Project found that European data centers consumed 130 TWh in 2017, whereas Greenpeace put energy consumption by the Chinese data center industry at 160 TWh in 2018. This suggests an annual total for China and Europe alone in the neighborhood of 290 TWh, far higher than the IEA global figures.
It is true that the explosive increase in IT demand will not translate directly into the same rate of growth for infrastructure energy consumption (due to increased IT energy efficiency). However, given the exponential rate of growth, it is likely that demand will substantially outpace the gains from efficiency practices over the next five years.
In US data centers, the law of diminishing returns may begin to limit the impact of energy savings. For example, at the data center level, best practices such as hot/cold aisle containment, installation of blanking plates and raising set point temperature have already been widely deployed; this can be seen in the substantial drop in power usage effectiveness (PUE) between 2011 and 2014. However, since 2014, PUE has not dropped much, and in 2019, we noticed a slight increase in the average annual PUE reported by respondents to our global data center survey. Similarly, with IT hardware, Moore’s law has slowed down, and newer servers are not maintaining the same efficiency improvements seen in the past.
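For reference, PUE is simply the ratio of total facility energy to the energy delivered to IT equipment: the closer to 1.0, the less overhead is spent on cooling, power distribution and other facility loads. The figures in the sketch below are hypothetical.

```python
# PUE: total facility energy divided by IT equipment energy (hypothetical figures).

def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power usage effectiveness."""
    return total_facility_kwh / it_equipment_kwh

if __name__ == "__main__":
    # A site drawing 15 GWh/year to deliver 10 GWh/year to IT has a PUE of 1.5;
    # containment, blanking plates and higher set points reduce the numerator.
    print(f"PUE = {pue(15_000_000, 10_000_000):.2f}")
```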
Uptime Institute expects the strong growth in the IT sector to be sustained over the next five years, given the well-understood demand patterns and the existing technologies coming into large-scale adoption. Our preliminary research suggests that IT energy consumption will rise steadily too, by as much as 10% in 2020, but further research will be conducted to develop and validate these forecasts.
The full report Ten data center industry trends in 2020 is available to members of the Uptime Institute Network here.