The pandemic has led to a renewed interest by data center managers in remote monitoring, management and automation. Uptime Institute has fielded dozens of inquiries about these approaches in recent months, but one in particular stands out: What will operational automation and machine learning mean for on-site staff requirements?
With greater automation, the expectation is a move toward light staffing models, with just one or a handful of technicians on-site. These technicians will need to be able to respond to a range of potential situations: electrical or mechanical issues; software administration problems; break/fix needs on the computer-room floor; configuration of equipment, including servers, switches and routers; and so on. Do people with these competencies exist? How long does it take to train them?
Our experts agree: the requirements for on-site staff are shifting — from electrical and mechanical specialists to more generalist technicians whose primary role is to monitor and control data center activities to prevent incidents (especially outages).
Even before the pandemic, most on-site data center technicians did not carry out major preventive maintenance activities (although some conducted low-level preventive maintenance); they support and escort the vendors who do this work. The pandemic has accelerated this trend. On-site technicians today are typically trained as operational coordinators: switching and isolating equipment when necessary, ensuring adequate monitoring, and reacting to unexpected and emergency conditions with the goal of getting a situation under control and returning the data center to a stable state.
Staffing costs have always been a major item in data center operations budgets. With improvements in remote monitoring and data center resiliency in recent years, the perceived need for large numbers of on-site staff has diminished, particularly during off-hours when activity is low. This trend is unlikely to reverse once the pandemic has passed.
One of our members, for example, runs an extremely large data center site that can be described as being built with a Tier III (concurrently maintainable) intent. It is mostly leased by hyperscale internet/cloud companies. On-site technicians are trained as generalists for operations and emergency coverage, and they work 12-hour shifts. A separate 8-hour day shift is staffed more heavily with engineers to handle customer projects and to assist with other operator activities as needed. All preventive maintenance is conducted by third-party vendors, who are escorted on-site by the staff technicians. Management anticipates moving to an automated, condition-based maintenance approach in the future, with the aim of lowering the number of on-site technical staff over time. The expectation is that on-site 24/7 staff will always be required to meet client service level agreements, but lowering their numbers will yield meaningful operational savings.
However, this will not be a swift change (for this or any other data center). Implementing automated, software-driven systems is an iterative — and human-driven — process that takes time, ongoing investment and, critically, organizational and process change.
Technologies and services for remote data center monitoring and management are available and continue to develop. As they are (slowly and carefully) implemented, managers will feel more comfortable not having personnel on-site 24/7. In time, management focus will likely shift from ensuring round-the-clock staffing to developing more of an on-call approach. Already, more data centers are employing technicians and engineers who support multiple sites rather than having a fully staffed, dedicated team for each individual data center. These technicians have general knowledge of electrical and mechanical systems, and they coordinate the preventive and corrective maintenance activities, which are mostly performed by vendors.
Today, however, because of the pandemic, there is generally a greater reliance on on-site staffing: technicians in the data center provide managers with reassurance and insurance in case there is an incident or outage. This is likely a short-term reaction.
In the medium term — say, in the next three to five years or so — we expect there will be increased use of plug-and-play data center and IT components and systems, so generalist site staffers can readily remove and replace modules as needed, without extensive training.
In the long term, more managers will seek to enhance and ensure data center and application resiliency and automation. This will involve the technical development of more self-healing systems/networks and redundancies (driven by software) that allow for reduced levels of on-site staff and reduced expertise of those personnel. If business functions can continue in the face of a failure without human intervention, then mean-time-to-repair becomes far less critical — a technician or vendor can be dispatched in due course with a replacement component to restore full functionality of the site. This type of self-healing approach has been discussed in earnest for at least the past decade but has not yet been realized — in no small part because of the operational change and new operational processes needed. A self-healing, autonomous operational approach would be an overhaul of today’s decades-long, industry-wide practices. Change is not always easy, and rarely is it inexpensive.
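The self-healing behavior described above can be sketched in outline. The following snippet is purely illustrative, with hypothetical service and node names; a real implementation would sit on top of monitoring and orchestration tooling rather than an in-memory dictionary:

```python
# Illustrative sketch only: a self-healing control loop that fails services
# over to a standby node when the active node stops responding. All node
# and service names are hypothetical.
services = {
    "billing": {"primary": "node-a", "standby": "node-b", "active": "node-a"},
    "web": {"primary": "node-c", "standby": "node-d", "active": "node-c"},
}

def is_healthy(node: str) -> bool:
    # Placeholder health probe; here we simply simulate a failure on node-a.
    return node != "node-a"

def self_heal(services: dict) -> list:
    """Redirect each service to its standby when the active node is unhealthy."""
    actions = []
    for name, svc in services.items():
        if not is_healthy(svc["active"]):
            failed = svc["active"]
            svc["active"] = (svc["standby"] if failed == svc["primary"]
                             else svc["primary"])
            actions.append(f"{name}: failed over from {failed} to {svc['active']}")
    return actions

for action in self_heal(services):
    print(action)
```

The point of the pattern is that the failover decision requires no human in the loop; a technician or vendor can be dispatched later, in due course, to replace the failed component.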
What is likely to (finally) propel the development of and move to self-healing technologies is the expected demand for large numbers of lights-out edge data centers. These small facilities will increasingly be designed to be plug-and-play and to be serviced by people with little specialized skill or training. On-site staff will be trained primarily to reliably follow directions from remote technical experts, who will be responsible for analyzing monitored data and providing specific instructions to the staffer at the site. It is possible, if not likely, that most people dispatched to edge facilities will be vendors swapping out components. And increasingly, the specialist mechanical and electrical staff will not only be remote, but also trained experts in real-time monitoring and management software and software-driven systems.
Recent events have heightened concerns around physical security for many data center operators, and with good reason: the pandemic means many data centers may still be short-staffed, less time may have been available for review of and training on routine procedures, and vendor substitutes may be more common than under non-pandemic conditions. Add the usual “unusuals” that affect operations (e.g., severe storms causing staff absences and increasing the likelihood of utility failures), and normal precautions may fall by the wayside.
For most data centers, much of physical security starts at site selection and design. The typical layered (“box inside a box”) security strategy adopted by most facilities handles many concerns. If a data center has vulnerabilities (e.g., dark fiber wells beyond the perimeter), they’re generally known and provisions have been made to monitor them. Routine security standards are in place, emergency procedures are established, and all employees are trained.
But what keeps data center operators up at night is the unexpected. The recent bombing in Nashville, Tennessee (US), which disrupted internet and wireless services, and new threats to Amazon Web Services facilities following its decision to suspend hosting the social media platform Parler are stark reminders that extreme events can occur.
A December 2019 report from Uptime Institute summed it up best, by stating that IT security is one of the big issues of the information age. Billions of dollars are spent protecting the integrity and availability of data against the actions of malign agents. But while cybersecurity is a high-profile issue, all information lives in a physical data center somewhere, and much of it needs the highest order of protection. Data center owners/operators employ a wide range of tactics to maintain a perimeter against intruders and to regulate the activities of clients and visitors inside the data center. The full report assesses operator security spending, concerns, management and best practices.
Key findings from the December 2019 report:
• Spending on physical security is commonly around 5% of the operations budget but in extreme cases can be as high as 30%.
• Data centers employ a range of common technologies and techniques to control access to the facility, but there is no “one size fits all” solution to physical security: each organization must tailor its approach to fit its circumstances.
• Neither cloud-based data replication nor the threat of cyberattacks on both IT systems and facilities equipment has significantly diminished the need for physical security.
• Most data center owners and operators consider unauthorized activity in the data center to be the greatest physical threat to IT.
• Access to the data center property is governed by policies that reflect the business requirements of the organization and establish the techniques and technologies used to ensure the physical security of the facility. These policies should be reviewed regularly and benchmarked against those of similar organizations.
• Data centers commonly employ third-party security services to enforce physical security policies.
• Attempts at unwarranted entry do occur. In a recent study, about one in five data centers experienced some form of attempted access in a five-year period.
• Drones, infrared cameras, thermal scanners and video analytics are promising new technologies.
• Biometric recognition is still viewed skeptically by many operators.
Now is the time to review your security plans and emergency operations procedures and to brief staff. Ensure they know the organization’s strategies and expectations. If your facility is in an area where many data centers are clustered together, consider collaborating with them to develop a regional plan.
Ensuring physical security in uncertain times, by Rhonda Ascierto, Vice President, Research, Uptime Institute (published January 25, 2021).
Data center operators (and enterprise IT) are generally cautious adopters of new technologies. Only a few (beyond hyperscale operators) try to gain a competitive advantage through their early use of technology. Rather, they have a strong preference toward technologies that are proven, reliable and well-supported. This reduces risks and costs, even if it means opportunities to jump ahead in efficiency, agility or functionality are missed.
But innovation does occur, and sometimes it comes in waves, perhaps triggered by the opportunity for a significant leap forward in efficiency, the sudden maturing of a technology, or some external catalyst. The threat of having to close critical data centers to move workloads to the public cloud may be one such driver; the need to operate a facility without staff during a weather event, or a pandemic crisis, may be another; the need to operate with far fewer carbon emissions may be yet another. Sometimes one new technology needs another to make it more economic.
The year 2021 may be one of those standouts in which a number of emerging technologies begin to gain traction. Among the technologies on the edge of wider adoption are:
Storage-class memory – A long-awaited class of semiconductors with ramifications for server performance, storage strategies and power management.
Silicon photonics – A way of connecting microchips that may revolutionize server and data center design.
ARM servers – Low-powered compute engines that, after a decade of stuttering adoption, are now attracting attention.
Software-defined power – A way to unleash and virtualize power assets in the data center.
All of these technologies are complementary; all have been much discussed, sampled and tested for several years, but so far with limited adoption. Three of these four were identified as highly promising technologies in the Uptime Institute/451 Research Disrupted Data Center research project summarized in the report Disruptive Technologies in the Datacenter: 10 Technologies Driving a Wave of Change, published in 2017. As the disruption profile below shows, these technologies were clustered to the left of the timeline, meaning they were, at that time, not yet ready for widespread adoption.
Now the time may be coming, with hyperscale operators particularly interested in storage-class memory and silicon photonics. But small operators, too, are trying to solve new problems — to match the efficiency of their larger counterparts, and, in some cases, to deploy highly efficient, reliable, and powerful edge data centers.
Storage-class memory
Storage-class memory (SCM) is a generic label for emerging types of solid-state media that offer the same or similar performance as dynamic or static random access memory, but at lower cost and with far greater data capacities. By allowing servers to be fitted with larger memories, SCM promises to significantly boost processing speeds. SCM is also nonvolatile, or persistent: it retains data even if power to the device is lost, and it promises greater application availability by allowing far faster restarts of servers after reboots and crashes.
SCM can be used not just as memory, but also as an alternative to flash for high-speed data storage. For data center operators, the (widespread) use of SCM could reduce the need for redundant facility infrastructure, as well as promote higher-density server designs and more dynamic power management (software-defined power is discussed below).
However, the continuing efforts to develop commercially viable SCM have faced major technical challenges. Currently only one SCM exists with the potential to be used widely in servers. That memory was jointly developed by Intel and Micron Technology, and is now called Optane by Intel, and 3D XPoint by Micron. Since 2017, it has powered storage drives made by Intel that, although far faster than flash equivalents, have enjoyed limited sales because of their high cost. More promisingly, Intel last year launched the first memory modules powered by Optane.
Software suppliers such as Oracle and SAP are changing the architecture of their databases to maximize the benefits of SCM devices, and major cloud providers are offering services based on Optane used as memory. Meanwhile, a second generation of Optane/3D XPoint is expected to ship soon and, at lower prices, to be used more widely in storage drives.
Silicon photonics
Silicon photonics enables optical switching functions to be fabricated on silicon substrates. This means electronic and optical devices can be combined into a single connectivity/processing package, reducing transceiver/switching latency, costs, size and power consumption (by up to 40%). While this innovation has uses across the electronics world, data centers are expected to be the biggest market for the next decade.
In the data center, silicon photonics allows components (such as processors, memory, input/output [I/O]) that are traditionally packaged on one motherboard or within one server to be optically interconnected, and then spread across a data hall — or even far beyond. Effectively, it has the potential to turn a data center into one big computer, or for data centers to be built out in a less structured way, using software to interconnect disaggregated parts without loss of performance. The technology will support the development of more powerful supercomputers and may be used to support the creation of new local area networks at the edge. Networking switches using the technology can also save 40% on power and cooling (this adds up in large facilities, which can have up to 50,000 switches).
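To put the cited 40% saving in context, a back-of-envelope calculation helps (the per-switch power draw below is an assumed figure, not from this article):

```python
# Back-of-envelope only: per-switch draw is an assumed value.
switches = 50_000            # upper bound on switches cited for large facilities
watts_per_switch = 150       # assumed average draw for a data center switch
savings_fraction = 0.40      # power/cooling saving cited for silicon photonics

saved_kw = switches * watts_per_switch * savings_fraction / 1000
print(f"{saved_kw:,.0f} kW of switch power saved")  # 3,000 kW
```

Even at a modest assumed draw per switch, the aggregate saving across a very large facility is on the order of megawatts, before counting the cooling load avoided.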
Acquisitions by Intel (Barefoot Networks), Cisco (Luxtera, Acacia Communications) and Nvidia (Mellanox Networking) signal a much closer integration between network switching and processors in the future. Hyperscale data center operators are the initial target market because the technology can combine with other innovations (as well as with Open Compute Project rack and networking designs). As a result, we expect to see the construction of flexible, large-scale networks of devices in a more horizontal, disaggregated way.
ARM servers
The Intel x86 processor family is one of the building blocks of the internet age, of data centers and of cloud computing. Whether supplied by Intel or a competitor such as Advanced Micro Devices, almost every server in every data center is built around this processor architecture. With its powerful (and power-hungry) cores, x86 defines the motherboard and server design and is the foundation of the software stack. It dictates technical standards, how workloads are processed and allocated, and how data centers are designed, powered and organized.
This hegemony may be about to break down. Servers based on the ARM processor design — the processors used in billions of mobile phones and other devices and, most recently, in Apple MacBooks — are now being used by Amazon Web Services (AWS) in its proprietary designs. Commercially available ARM systems offer dramatic price, performance and energy consumption improvements over current Intel x86 designs. When Nvidia announced its (proposed) $40 billion acquisition of ARM in September 2020, it identified the data center market as its main opportunity. The server market is currently worth $67 billion a year, according to market research company IDC (International Data Corporation).
Skeptics may point out that many servers using alternative, low-power, smaller processors have been developed and offered, but none have been widely adopted to date. Hewlett Packard Enterprise’s Moonshot server system, initially launched using low-powered Intel Atom processors, is the best known, but due to a variety of factors market adoption has been low.
Will that change? The commitment to use ARM chips by Apple (currently for MacBooks) and AWS (for cloud servers) will make a big difference, as will the fact that even the world’s most powerful supercomputer (as of mid-2020) uses an ARM Fujitsu microprocessor. But innovation may make the biggest difference. The UK-based company Bamboo Systems, for example, designed its system to support ARM servers from the ground up, with extra memory, connectivity and I/O processors at each core. It claims to save around 60% of the costs, 60% of the energy and 40% of the space when compared with a Dell x86 server configured for the same workload.
Software-defined power
In spite of its intuitive appeal and the apparent importance of the problems it addresses, the technology that has come to be known as “software-defined power” has to date received little uptake among operators. Software-defined power, also known as “smart energy,” is not one system or single technology but a broad umbrella term for technologies and systems that can be used to intelligently manage and allocate power and energy in the data center.
Software-defined power systems promise greater efficiency and use of capacity, more granular and dynamic control of power availability and redundancy, and greater real-time management of resource use. In some instances, these systems may reduce the amount of power that needs to be provisioned, and they may allow some energy storage to be sold back to the grid, safely and easily.
Software-defined power adopts some of the architectural designs and goals of software-defined networks, in that it virtualizes power switches as if they were network switches. The technology has three components: energy storage, usually lithium-ion (Li-ion) batteries; intelligently managed power switches or breakers; and, most importantly, management software that has been designed to automatically reconfigure and allocate power according to policies and conditions. (For a more detailed description, see our report Smart energy in the data center).
Software-defined power has taken a long time to break into the mainstream — and even 2021 is unlikely to be the breakthrough year. But a few factors are swinging in its favor. These include the widespread adoption of Li-ion batteries for uninterruptible power supplies, an important precondition; growing interest from the largest operators and the biggest suppliers (which have so far assessed technology, but viewed the market as unready); and, perhaps most importantly, an increasing understanding by application owners that they need to assess and categorize their workloads and services for differing resiliency levels. Once they have done that, software-defined power (and related smart energy technologies) will enable power availability to be applied more dynamically to the applications that need it, when they need it.
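The policy-driven allocation at the heart of software-defined power can be sketched simply. The workload names, resiliency tiers and figures below are hypothetical; a real system would drive intelligent breakers and battery systems rather than return a dictionary:

```python
# Hypothetical sketch of a software-defined power policy: when available
# power shrinks (say, during a grid event on battery), power is reassigned
# so the most critical resiliency tiers keep full power and lower tiers
# are capped or shed. All workloads and figures are illustrative.
workloads = [
    {"name": "payments-db", "tier": 1, "demand_kw": 40},
    {"name": "analytics", "tier": 2, "demand_kw": 60},
    {"name": "batch-reports", "tier": 3, "demand_kw": 50},
]

def allocate(workloads, available_kw):
    """Grant power by resiliency tier (1 = most critical) until capacity runs out."""
    grants = {}
    remaining = available_kw
    for w in sorted(workloads, key=lambda w: w["tier"]):
        grant = min(w["demand_kw"], remaining)
        grants[w["name"]] = grant
        remaining -= grant
    return grants

print(allocate(workloads, 150))  # enough capacity: all tiers get full power
print(allocate(workloads, 90))   # constrained: tier 2 capped, tier 3 shed
```

This kind of policy only works once application owners have categorized their workloads by resiliency requirement, which is why that categorization step is a precondition for adoption.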
The full report, Five data center trends for 2021, is available to members of the Uptime Institute community.
Through 2021 and beyond, the world will begin to recover from its acute crisis — COVID-19 — and will turn its attention to other matters. Few if any of these issues will be as important as climate change, a chronic condition that will become more pressing and acute as each year passes.
In the critical digital infrastructure sector, as in all businesses, issues arising directly or indirectly from climate change will play a significant role in strategic decision-making and technical operations in the years ahead. And this is regardless of the attitude or beliefs of senior executives; stakeholders, governments, customers, lobbyists and watchdogs all want and expect to see more action. The year 2021 will be critical, with governments expected to act with greater focus and unity as the new US government rejoins the global effort.
We can group the growing impact of climate change into four areas:
Extreme weather/climate impact – As discussed in our report The gathering storm: Climate change and data center resiliency, extreme weather and climate change present an array of direct and indirect threats to data centers. For example, extreme heatwaves — which will challenge many data center cooling systems — are projected to occur once every three or four years, not once in 20.
Legislation and scrutiny – Nearly 2,000 pieces of climate-related legislation have been passed globally to date (covering all areas). Many more, along with more standards and customer mandates, can be expected in the next several years.
Litigation and customer losses – Many big companies are demanding rigorous standards through their supply chains — or their contracts will be terminated. Meanwhile, climate activists, often well-resourced, are filing lawsuits against technology companies and digital infrastructure operators to cover everything from battery choices to water consumption.
The need for new technologies – Management will be under pressure to invest more, partly to protect against weather events, and partly to migrate to cleaner technologies such as software-defined power or direct liquid cooling.
In the IT sector generally — including in data centers — it has not all been bad news to date. Led by the biggest cloud and colo companies, and judged by several metrics, the data center sector has made good progress in curtailing carbon emissions and wasteful energy use. According to the Carbon Trust, a London-based body focused on reducing carbon emissions, the IT sector is on course to meet its science-based target for 2030 — a target that will help keep the world to 1.5 degrees Celsius (2.7 degrees Fahrenheit) of warming (but still enough warming to create huge global problems). Its data shows IT sector carbon emissions from 2020 to 2030 are on a trajectory to fall significantly in five key areas: data centers, user devices, mobile networks, fixed networks and enterprise networks. Overall, the IT sector needs to cut carbon emissions by 50% from 2020 to 2030.
Data centers are just a part of this, accounting for more carbon emissions than mobile, fixed or enterprise networks, but significantly less than all the billions of user devices. Data center energy efficiency has been greatly helped by facility efficiencies, such as economizer cooling, improvements in server energy use, and greater utilization through virtualization and other IT/software improvements. Use of renewables has also helped: according to Uptime Institute data (our 2020 Climate Change Survey), over a third of operators now largely power their data centers using renewable energy sources or offset their carbon use (see figure below). Increasing availability of renewable power in the grid will help to further reduce emissions.
But there are some caveats to the data center sector’s fairly good performance. First, the reduction in carbon emissions achieved to date is contested by many who think the impact of overall industry growth on energy use and carbon emissions has been understated (i.e., energy use/carbon emissions are actually quite a lot higher than widely accepted models suggest — a debatable issue that Uptime Institute continues to review). Second, at an individual company or data center level, it may become harder to achieve carbon emissions reductions in the next decade than it has been in the past decade — just as the level of scrutiny and oversight, and the penalty for not doing enough, ratchets up. Why? There are many possible reasons, including the following:
Many of the facilities-level improvements in energy use at data centers have been achieved already — indeed, average industry power usage effectiveness values show only marginal improvements over the last five years. Some of these efficiencies may even go into reverse if other priorities, such as water use or resiliency, take precedence (economizers may have to be supplemented with mechanical chillers to reduce water use, for example).
Improvements in IT energy efficiency have also slowed — partly due to the slowing or even ending of Moore’s Law (i.e., IT performance doubling every two years) — and because the easiest gains in IT utilization have already been achieved.
Some of the improvements in carbon emissions over the next decade require looking beyond immediate on-site emissions, or those from energy supplies. Increasingly, operators of critical digital infrastructure — very often under external pressure and executive mandate — must start to record the embedded carbon emissions (known as Scope 3 emissions) in the products and services they use. This requires skills, tools and considerable administrative effort.
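Power usage effectiveness (PUE), mentioned in the first reason above, is a simple ratio of total facility energy to IT energy, and the arithmetic shows why further facility-side gains are marginal (the figures below are illustrative):

```python
def pue(total_facility_kwh, it_equipment_kwh):
    """Power usage effectiveness: total facility energy / IT energy (1.0 is ideal)."""
    return total_facility_kwh / it_equipment_kwh

# Illustrative figures: cutting facility overhead from 50% to 40% of the
# IT load moves PUE only from 1.5 to 1.4, and each further cut is harder.
print(pue(1500, 1000))  # 1.5
print(pue(1400, 1000))  # 1.4
```

Once overhead is a small fraction of IT load, the remaining reductions must come from the IT side or from cleaner energy, not from the facility.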
The biggest operators of digital infrastructure — among them Amazon, Digital Realty, Equinix, Facebook, Google and Microsoft — have made ambitious and specific commitments to achieve carbon neutrality in line with science-based targets within the next two decades. That means, first, they are setting standards that will be difficult for many others to match, giving them a competitive advantage; and second, these companies will put pressure on their supply chains — including data center partners — to minimize emissions.
The full report, Five data center trends for 2021, is available to members of Uptime Institute.
Sustainability: More challenging, more transparent, by Andy Lawrence, Executive Director of Research, Uptime Institute (published January 11, 2021).
One of the most widely anticipated trends in IT and infrastructure is significant new demand for edge computing, fueled by technologies such as 5G, IoT and AI. To date, net new demand for edge computing — processing, storing and integrating data close to where it is generated — has built slowly. As a result, some suppliers of micro data center and edge technologies have had to lower their investors’ expectations.
This slow build-out, however, does not mean that it will not happen. Demand for decentralized IT will certainly grow. There will be more workloads that need low latency, such as healthcare tech, high performance computing (notably more AI), critical IoT, and virtual and augmented reality, as well as more traffic from latency-sensitive internet companies (as Amazon famously said 10 years ago, every 100 milliseconds of latency costs them one percent in sales). There will also be more data generated by users and “things” at the edge, which will be too expensive to transport across long distances to large, centralized data centers (the “core”).
For all these reasons, new edge data center and connectivity capacity will be needed, and we expect a wave of new partnerships and deals in 2021. Enterprises will connect to clouds via as-a-service (on-demand, software-driven) interconnections at the edge, and the internet will extend its reach with new exchange points. Just as the internet is a network of tens of thousands of individual networks connected together, the edge will require not just new capacity but also a new ecosystem of suppliers working together. The year 2021 will likely see intense activity — but the long-expected surge in demand may have to wait.
The edge build-out will be uneven, in part because the edge is not a monolith. Different edge workloads need different levels of latency, bandwidth and resiliency, as shown in the data center schema below. Requirements for data transit and exchanges will also vary. Edge infrastructure service providers will need to rely on many partners, including specialist vendors that will serve different customer requirements. Enterprise customers will become increasingly dependent on third-party connections to different services.
So far, much attention has been focused on the local edge, where connectivity and IT capacity are sited within a kilometer or so of devices and users. In urban areas, where 5G is (generally) expected to flourish, and in places where a lot of IoT data is generated, such as factories and retail stores, we are slowly seeing more micro data centers being deployed. These small facilities can act as private interconnection points, internet exchange points, or both, handing off wireless data to a fiber connection and creating new “middle-mile” connections.
We expect that edge micro data centers will be installed both privately and as shared infrastructure, including for cloud providers, telcos and other edge platform providers, to reduce latency and keep transit costs in check. To get closer to users and “things,” fiber providers will also partner with more wireless operators.
In 2021, most of the action is likely to be one step further back from the edge, in regional locations where telcos, cloud providers and enterprises are creating — or consuming — new interconnections in carrier-neutral data centers such as colo and wholesale facilities. All major cloud providers are increasingly creating points of presence (PoPs) in more colos, creating software-defined WANs of public (internet) and private (enterprise) connections. Colo customers are then able to connect to various destinations, depending on their business needs, via software, hardware and networks that colos are increasingly providing. These interconnections are making large leased facilities a preferred venue for other suppliers to run edge infrastructure-as-a-service offerings, including for IoT workloads. For enterprises and suppliers alike, switching will become as important as power and space.
We expect more leased data centers will be built (and bought) in cities and suburbs in 2021 and beyond. Large and small colos alike will place more PoPs in third-party facilities. And more colos will provide more software-driven interconnection platforms, either via internal development, partnerships or acquisitions.
At the same time, CDNs that already have large edge footprints will further exploit their strong position by offering more edge services on their networks directly to enterprises. We’re also seeing more colos selling “value-add” IT and infrastructure-as-a-service products — and we expect they will extend further up the IT stack with more compute and storage capabilities.
The edge build-out will clearly lead to increased operational complexity, whereby suppliers will have to manage hundreds of application program interfaces and multiple service level agreements. For these reasons, the edge will need to become increasingly software-defined and driven by AI. We expect investment and partnerships across all these areas.
How exactly it will play out remains unclear; it is simply too early. Already we have seen major telco and data center providers pivot their edge strategies, including moving from partnerships to acquisitions.
One segment we are watching particularly closely is the big internet and cloud companies. Having built significant backbone infrastructure, they have made little or only modest investments to date at the edge. With their huge workloads and deep pockets, their appetite for direct ownership of edge infrastructure is not yet known but could significantly shape the ecosystem around them.
The full report Five data center trends for 2021 is available to members of Uptime Institute; guest membership information can be found here.
Outsourcing the requirement to own and operate data center capacity is the cornerstone of many digital transformation strategies, with almost every large enterprise spreading their workloads across their own data centers, colocation sites and public cloud. But ask any regulator, any chief executive, any customer: You can’t outsource responsibility — for incidents, outages, security breaches or even, in the years ahead, carbon emissions.
Chief information officers, chief technology officers and other operational heads knew this three or four decades ago (and many have learned the hard way since). That is why data centers became physical and logical fortresses, and why almost every component and electrical circuit has some level of redundancy.
In 2021, senior executives will grapple with a new iteration of the accountability imperative. Even the most cautious enterprises now want to make more use of the public cloud, while the use of private clouds is enabling greater choices of third-party venue and IT architecture. But this creates a problem: cloud service operators, software-as-a-service (SaaS) providers and even some colos are rarely fully accountable or transparent about their shortcomings — and they certainly do not expect to be held financially accountable for consequences of failures. Investors, regulators, customers and partners, meanwhile, want more oversight, more transparency and, where possible, more accountability.
This is forcing many organizations to take a hard look at which workloads can be safely moved to the cloud and which cannot. For some, such as the European financial services sector, regulators will require an assessment of the criticality of workloads — a trend that is likely to spread and grow to other sectors over time. The most critical applications and services will either have to stay in-house, or enterprise executives will need to satisfy themselves and their regulators that these services are run well by a third-party provider, and that they have full visibility into the operational practices and technical infrastructure of their provider.
The data suggests this is a critical period in the development of IT governance. The shift of enterprise IT workloads from on-premises data center to cloud and hosted services is well underway. But there is a long way to go, and some of the issues around transparency and accountability have arisen only recently as more critical and sensitive data and functionality is considered for migration to the cloud.
The first tranche of workloads moving to third parties often did not include the most critical or sensitive services. For many organizations, a public cloud is (or was initially) the venue of choice for specific types of workloads, such as application test and development; big-data processing, such as AI; and new applications that are cloud-native. But as more IT departments become familiar with the tool sets from cloud providers, such as for application development and deployment orchestration, more types of workloads have moved into public clouds only recently, with more critical applications to follow (or perhaps not). High-profile, expensive public cloud outages, increased regulatory pressures and an increasingly uncertain macroeconomic outlook will force many enterprises to assess — or reassess — where workloads should actually be running (a process that has been called “The Big Sort”).
Uptime Institute believes that many mission-critical workloads are likely to remain in on-premises or colo data centers — at least for many years to come: More than 70% of IT and critical infrastructure operators we surveyed in 2020 do not put any critical workloads in a public cloud, with over a quarter of this group (21% of the total sample) saying the reason is a lack of visibility/accountability about resiliency. And over a third of those who do place critical applications in a public cloud also say they do not have enough visibility (see chart below). Clearly, providers’ assurances of availability and of adherence to best practices are not enough for mission-critical workloads. (These results were almost identical when we asked the same question in our 2019 annual survey.)
The issues of transparency, reporting and governance are likely to ripple through the cloud, SaaS and hosting industries, as customers seek assurances of excellence in operations — especially when financial penalties for failures by third parties are extremely light. While even the largest cloud and internet application providers operate mostly concurrently maintainable facilities, experience has shown that unaudited (“mark your own homework”) assurances frequently lead to poor outcomes.
Creeping criticality
There is an added complication. While the definitions and requirements of criticality in IT are dictated by business requirements, they are not fixed in time. Demand patterns and growing IT dependency mean many workloads/services have become more critical — but the infrastructure and processes supporting them may not have been updated (“creeping criticality”). This is a particular concern for workloads subject to regulatory compliance (“compliance drift”).
COVID-19 may have already caused a reassessment of the criticality or risk profile of IT; extreme weather may provide another. When Uptime Institute recently asked over 250 on-premises and colo data center managers how the pandemic would change their operations, two-thirds said they expect to increase the resiliency of their core data center(s) in the years ahead. Many said they expected their costs to increase as a result. One large public cloud company recently asked their leased data center providers to upgrade their facilities to N+1 redundancy, if they were not already.
But even before the pandemic, there was a trend toward higher levels of redundancy for on-premises data centers. There is also an increase in the use of active-active availability zones, especially as more workloads are designed using cloud or microservices architectures. Workloads are more portable, and instances are more easily copied than in the past. But we see no signs that this is diminishing the need for site-level resiliency.
Colos are well-positioned to provide both site-level resiliency (which is transparent and auditable) and outsourced IT services, such as hosted private clouds. We expect more colos will offer a wider range of IT services, in addition to interconnections, to meet the risk (and visibility) requirements of more mission-critical workloads. The industry, it seems, has concluded that more resiliency at every level is the least risky approach — even if it means some extra expense and duplication of effort.
Uptime Institute expects that the number of enterprise (privately owned/on-premises) data centers will continue to dwindle but that enterprise investment in site-level resiliency will increase (as will investment in data-driven operations). Data centers that remain in enterprise ownership will likely receive more investment and continue to be run to the highest standards.
The full report Five data center trends for 2021 is available to members of the Uptime Institute Inside Track community here.
Data center staff on-site: engineering specialists or generalists?
The pandemic has led to a renewed interest by data center managers in remote monitoring, management and automation. Uptime Institute has fielded dozens of inquiries about these approaches in recent months, but one in particular stands out: What will operational automation and machine learning mean for on-site staff requirements?
With greater automation, the expectation is a move toward light staffing models, with just one or a handful of technicians on-site. These technicians will need to be able to respond to a range of potential situations: electrical or mechanical issues; software administration problems; break/fix needs on the computer-room floor; the ability to configure equipment, including servers, switches and routers; and so on. Do people with these competencies exist? How long does it take to train them?
Our experts agree: the requirement of on-site staff is shifting — from electrical and mechanical specialists to more generalist technicians whose primary role is to monitor and control data center activities to prevent incidents (especially outages).
Even before the pandemic, most data center technicians on-site did not carry out major preventative maintenance activities (although some do conduct low-level preventative maintenance); they support and escort vendors who do this work. The pandemic has accelerated this trend. On-site technicians today are typically trained as operational coordinators, switching and isolating equipment when necessary, ensuring adequate monitoring, reacting to unexpected and emergency conditions with the goal of getting a situation under control and returning the data center to a stable state.
Staffing costs have always been a major item in data center operations budgets. With the advent of better remote monitoring and resiliency of the data center in recent years, the perceived need for larger numbers of on-site data center staffing has diminished, particularly during off hours when activity is low. This trend is unlikely to reverse even after pandemic times.
One of our members, for example, runs an extremely large data center site that can be described as being built with a Tier III (concurrently maintainable) intent. It is mostly leased by hyperscale internet/cloud companies. On-site technicians are trained as generalists for operations and emergency coverage, and they work 12-hour shifts. A separate 8-hour day shift is staffed more heavily with engineers to handle customer projects and to assist with other operator activities as needed. All preventative maintenance is conducted by third-party vendors, who are escorted on-site by the staff technicians. Management anticipates moving to an automated, condition-based maintenance approach in the future, with the aim of lowering the number of on-site technical staff over time. The expectation is that on-site 24/7 staff will always be required to meet client service level agreements, but by lowering their numbers there will be meaningful operational savings.
However, this will not be a swift change (for this or any other data center). Implementing automated, software-driven systems is an iterative — and human-driven — process that takes time, ongoing investment and, critically, organizational and process change.
Technologies and services for remote data center monitoring and management are available and continue to develop. As they are (slowly and carefully) implemented, managers will feel more comfortable not having personnel on-site 24/7. In time, management focus will likely shift from ensuring round-the-clock staffing to developing more of an on-call approach. Already, more data centers are employing technicians and engineers who support multiple sites rather than having a fully staffed, dedicated team for each individual data center. These technicians have general knowledge of electrical and mechanical systems, and they coordinate the preventive and corrective maintenance activities, which are mostly performed by vendors.
Today, however, because of the pandemic, there is generally a greater reliance on on-site staffing, with technicians in the data center providing managers reassurance and insurance in case of an incident or outage. This is likely a short-term reaction.
In the medium term — say, in the next three to five years or so — we expect there will be increased use of plug-and-play data center and IT components and systems, so generalist site staffers can readily remove and replace modules as needed, without extensive training.
In the long term, more managers will seek to enhance and ensure data center and application resiliency and automation. This will involve the technical development of more self-healing systems/networks and redundancies (driven by software) that allow for reduced levels of on-site staff and reduced expertise of those personnel. If business functions can continue in the face of a failure without human intervention, then mean-time-to-repair becomes far less critical — a technician or vendor can be dispatched in due course with a replacement component to restore full functionality of the site. This type of self-healing approach has been discussed in earnest for at least the past decade but has not yet been realized — in no small part because of the operational change and new operational processes needed. A self-healing, autonomous operational approach would be an overhaul of today’s decades-long, industry-wide practices. Change is not always easy, and rarely is it inexpensive.
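The self-healing pattern described above can be sketched in a few lines. This is a minimal illustration with hypothetical component names, not any vendor's system: a health check detects a failed unit in a redundant pair, fails over to the healthy peer without human intervention, and merely queues a low-priority dispatch ticket instead of paging on-site staff.

```python
from dataclasses import dataclass, field

@dataclass
class Component:
    name: str
    healthy: bool = True

@dataclass
class RedundantPair:
    """An N+1 pair: the load can run entirely on either unit."""
    active: Component
    standby: Component
    tickets: list = field(default_factory=list)

    def health_check(self) -> None:
        # Self-heal: fail over automatically, then queue a
        # non-urgent repair dispatch for the failed unit.
        if not self.active.healthy and self.standby.healthy:
            self.active, self.standby = self.standby, self.active
            self.tickets.append(
                f"LOW priority: replace {self.standby.name} at next scheduled visit"
            )

pair = RedundantPair(Component("UPS-A"), Component("UPS-B"))
pair.active.healthy = False   # simulated fault on the active unit
pair.health_check()
print(pair.active.name)       # → UPS-B (load carried on; no page sent)
print(pair.tickets[0])
```

Because the failover itself restores service, the repair visit becomes a scheduled logistics task rather than an emergency, which is exactly why mean-time-to-repair matters less in this model.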
What is likely to (finally) propel the development of and move to self-healing technologies is the expected demand for large numbers of lights-out edge data centers. These small facilities will increasingly be designed to be plug-and-play and to be serviced by people with little specialized skill or training. On-site staff will be trained primarily to reliably follow directions from remote technical experts. These remote experts will be responsible for analyzing monitored data and providing specific instructions for the staffer at the site. It is possible, if not likely, that most people dispatched on-site to edge facilities will be vendors swapping out components. And increasingly, the specialist mechanical and electrical staff will not only be remote, but also trained experts in real-time monitoring and management software and software-driven systems.
Ensuring physical security in uncertain times
Recent events have heightened concerns around physical security for many data center operators, and with good reason: the pandemic means many data centers may still be short-staffed, less time may have been available for review of and training on routine procedures, and vendor substitutes may be more common than under non-pandemic conditions. Add the usual “unusuals” that affect operations (e.g., severe storms causing staff absences and increasing the likelihood of utility failures), and normal precautions may fall by the wayside.
For most data centers, much of physical security starts at site selection and design. The typical layered (“box inside a box”) security strategy adopted by most facilities handles many concerns. If a data center has vulnerabilities (e.g., dark fiber wells beyond the perimeter), they’re generally known and provisions have been made to monitor them. Routine security standards are in place, emergency procedures are established, and all employees are trained.
But what keeps data center operators up at night is the unexpected. The recent bombing in Nashville, Tennessee (US), which disrupted internet and wireless services, and new threats to Amazon Web Services facilities as a result of their decision to suspend hosting the social media platform Parler are a stark reminder that extreme events can occur.
A December 2019 report from Uptime Institute summed it up best, by stating that IT security is one of the big issues of the information age. Billions of dollars are spent protecting the integrity and availability of data against the actions of malign agents. But while cybersecurity is a high-profile issue, all information lives in a physical data center somewhere, and much of it needs the highest order of protection. Data center owners/operators employ a wide range of tactics to maintain a perimeter against intruders and to regulate the activities of clients and visitors inside the data center. The full report assesses operator security spending, concerns, management and best practices.
Key findings from the December 2019 report:
• Spending on physical security is commonly around 5% of the operations budget but in extreme cases can be as high as 30%.
• Data centers employ a range of common technologies and techniques to control access to the facility, but there is no “one size fits all” solution to physical security: each organization must tailor their approach to fit their circumstances.
• Neither cloud-based data replication nor cybersecurity threats to both IT systems and facilities equipment has significantly diminished the need for physical security.
• Most data center owners and operators consider unauthorized activity in the data center to be the greatest physical threat to IT.
• Access to the data center property is governed by policies that reflect the business requirements of the organization and establish the techniques and technologies used to ensure the physical security of the facility. These policies should be reviewed regularly and benchmarked against those of similar organizations.
• Data centers commonly employ third-party security services to enforce physical security policies.
• Attempts at unauthorized entry do occur. In a recent study, about one in five data centers experienced some form of attempted access over a five-year period.
• Drones, infrared cameras, thermal scanners and video analytics are promising new technologies.
• Biometric recognition is still viewed skeptically by many operators.
Now is the time to review your security plans and emergency operations procedures and to brief staff. Ensure they know the organization’s strategies and expectations. If your facility is in an area where many data centers are clustered together, consider collaborating with them to develop a regional plan.
A Surge of Innovation
Data center operators (and enterprise IT) are generally cautious adopters of new technologies. Only a few (beyond hyperscale operators) try to gain a competitive advantage through their early use of technology. Rather, they have a strong preference toward technologies that are proven, reliable and well-supported. This reduces risks and costs, even if it means opportunities to jump ahead in efficiency, agility or functionality are missed.
But innovation does occur, and sometimes it comes in waves, perhaps triggered by the opportunity for a significant leap forward in efficiency, the sudden maturing of a technology, or some external catalyst. The threat of having to close critical data centers to move workloads to the public cloud may be one such driver; the need to operate a facility without staff during a weather event, or a pandemic crisis, may be another; the need to operate with far fewer carbon emissions may be yet another. Sometimes one new technology needs another to make it more economic.
The year 2021 may be one of those standouts in which a number of emerging technologies begin to gain traction. Among the technologies on the edge of wider adoption are storage-class memory, silicon photonics, ARM servers and software-defined power.
All of these technologies are complementary; all have been much discussed, sampled and tested for several years, but so far with limited adoption. Three of these four were identified as highly promising technologies in the Uptime Institute/451 Research Disrupted Data Center research project summarized in the report Disruptive Technologies in the Datacenter: 10 Technologies Driving a Wave of Change, published in 2017. As the disruption profile below shows, these technologies were clustered to the left of the timeline, meaning they were, at that time, not yet ready for widespread adoption.
Now the time may be coming, with hyperscale operators particularly interested in storage-class memory and silicon photonics. But small operators, too, are trying to solve new problems — to match the efficiency of their larger counterparts, and, in some cases, to deploy highly efficient, reliable, and powerful edge data centers.
Storage-class memory
Storage-class memory (SCM) is a generic label for emerging types of solid-state media that offer the same or similar performance as dynamic random access memory or static random access memory, but at lower cost and with far greater data capacities. By allowing servers to be fitted with larger memories, SCM promises to heavily boost processing speeds. SCM is also nonvolatile or persistent — it retains data even if power to the device is lost and promises greater application availability by allowing far faster restarts of servers after reboots and crashes.
SCM can be used not just as memory, but also as an alternative to flash for high-speed data storage. For data center operators, the (widespread) use of SCM could reduce the need for redundant facility infrastructure, as well as promote higher-density server designs and more dynamic power management (software-defined power is discussed below).
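The defining property described above is persistence: data written to SCM survives a power loss or restart without a separate save-to-disk step. A rough Python analogue of this programming model (real persistent-memory code would use a library such as Intel's PMDK against a DAX-mapped device; the file name here is illustrative) uses a memory-mapped file to show byte-addressable writes that a "restarted" process sees immediately:

```python
import mmap
import os

PATH = "scm_region.bin"   # stand-in for a persistent-memory region
SIZE = 4096

# Create the backing region once (zero-filled, like fresh media).
if not os.path.exists(PATH):
    with open(PATH, "wb") as f:
        f.write(b"\x00" * SIZE)

# "First process": a byte-addressable store, like writing to memory.
with open(PATH, "r+b") as f:
    mem = mmap.mmap(f.fileno(), SIZE)
    mem[0:5] = b"hello"
    mem.flush()           # persist -- analogous to flushing CPU caches to SCM
    mem.close()

# "Restarted process": the data is there immediately, with no reload
# from a database or checkpoint file.
with open(PATH, "r+b") as f:
    mem = mmap.mmap(f.fileno(), SIZE)
    recovered = mem[0:5].decode()
    mem.close()
print(recovered)          # → hello
os.remove(PATH)
```

The fast-restart benefit for application availability follows directly: state that would otherwise be rebuilt from storage after a crash is simply still in (persistent) memory.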
However, the continuing efforts to develop commercially viable SCM have faced major technical challenges. Currently only one SCM exists with the potential to be used widely in servers. That memory was jointly developed by Intel and Micron Technology and is sold as Optane by Intel and as 3D XPoint by Micron. Since 2017, it has powered storage drives made by Intel that, although far faster than flash equivalents, have seen limited sales because of their high cost. More promisingly, Intel last year launched the first memory modules powered by Optane.
Software suppliers such as Oracle and SAP are changing the architecture of their databases to maximize the benefits of the SCM devices, and major cloud providers are offering services based on Optane used as memory. Meanwhile a second generation of Optane/3D XPoint is expected to ship soon, and by reducing prices is expected to be more widely used in storage drives.
Silicon photonics
Silicon photonics enables optical switching functions to be fabricated on silicon substrates. This means electronic and optical devices can be combined into a single connectivity/processing package, reducing transceiver/switching latency, costs, size and power consumption (by up to 40%). While this innovation has uses across the electronics world, data centers are expected to be the biggest market for the next decade.
In the data center, silicon photonics allows components (such as processors, memory, input/output [I/O]) that are traditionally packaged on one motherboard or within one server to be optically interconnected, and then spread across a data hall — or even far beyond. Effectively, it has the potential to turn a data center into one big computer, or for data centers to be built out in a less structured way, using software to interconnect disaggregated parts without loss of performance. The technology will support the development of more powerful supercomputers and may be used to support the creation of new local area networks at the edge. Networking switches using the technology can also save 40% on power and cooling (this adds up in large facilities, which can have up to 50,000 switches).
Acquisitions by Intel (Barefoot Networks), Cisco (Luxtera, Acacia Communications) and Nvidia (Mellanox Networking) signal a much closer integration between network switching and processors in the future. Hyperscale data center operators are the initial target market because the technology can combine with other innovations (as well as with Open Compute Project rack and networking designs). As a result, we expect to see the construction of flexible, large-scale networks of devices in a more horizontal, disaggregated way.
ARM servers
The Intel x86 processor family is one of the building blocks of the internet age, of data centers and of cloud computing. Whether provided by Intel or a competitor such as Advanced Micro Devices, almost every server in every data center is built around this processor architecture. With its powerful (and power-hungry) cores, its use defines the motherboard and the server design and is the foundation of the software stack. Its use dictates technical standards, how workloads are processed and allocated, and how data centers are designed, powered and organized.
This hegemony may be about to break down. Servers based on the ARM processor design — the processors used in billions of mobile phones and other devices (and soon, in Apple MacBooks) — are now being used by Amazon Web Services (AWS) in its proprietary designs. Commercially available ARM systems offer dramatic price, performance and energy consumption improvements over current Intel x86 designs. When Nvidia announced its (proposed) $40 billion acquisition of ARM in late 2020, it identified the data center market as its main opportunity. The server market is currently worth $67 billion a year, according to market research company IDC (International Data Corporation).
Skeptics may point out that many servers have been developed and offered using alternative, smaller, low-power processors, but none has been widely adopted to date. Hewlett Packard Enterprise’s Moonshot server system, initially launched using low-powered Intel Atom processors, is the best known, but due to a variety of factors its market adoption has been low.
Will that change? The commitment to use ARM chips by Apple (currently for MacBooks) and AWS (for cloud servers) will make a big difference, as will the fact that even the world’s most powerful supercomputer (as of mid-2020) uses an ARM Fujitsu microprocessor. But innovation may make the biggest difference. The UK-based company Bamboo Systems, for example, designed its system to support ARM servers from the ground up, with extra memory, connectivity and I/O processors at each core. It claims to save around 60% of the costs, 60% of the energy and 40% of the space when compared with a Dell x86 server configured for the same workload.
Software-defined power
In spite of its intuitive appeal and the apparent importance of the problems it addresses, the technology that has come to be known as “software-defined power” has to date received little uptake among operators. Software-defined power, also known as “smart energy,” is not one system or single technology but a broad umbrella term for technologies and systems that can be used to intelligently manage and allocate power and energy in the data center.
Software-defined power systems promise greater efficiency and use of capacity, more granular and dynamic control of power availability and redundancy, and greater real-time management of resource use. In some instances, it may reduce the amount of power that needs to be provisioned, and it may allow some energy storage to be sold back to the grid, safely and easily.
Software-defined power adopts some of the architectural designs and goals of software-defined networks, in that it virtualizes power switches as if they were network switches. The technology has three components: energy storage, usually lithium-ion (Li-ion) batteries; intelligently managed power switches or breakers; and, most importantly, management software that has been designed to automatically reconfigure and allocate power according to policies and conditions. (For a more detailed description, see our report Smart energy in the data center).
Software-defined power has taken a long time to break into the mainstream — and even 2021 is unlikely to be the breakthrough year. But a few factors are swinging in its favor. These include the widespread adoption of Li-ion batteries for uninterruptible power supplies, an important precondition; growing interest from the largest operators and the biggest suppliers (which have so far assessed technology, but viewed the market as unready); and, perhaps most importantly, an increasing understanding by application owners that they need to assess and categorize their workloads and services for differing resiliency levels. Once they have done that, software-defined power (and related smart energy technologies) will enable power availability to be applied more dynamically to the applications that need it, when they need it.
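The policy-driven allocation idea described above — categorizing workloads by resiliency tier so power can be applied dynamically to those that need it — can be sketched simply. This is a hypothetical illustration (the feed names and function are invented for the example, not any product's API): when a utility event caps available capacity, the controller keeps the most critical tiers powered and sheds the rest.

```python
def allocate(feeds, capacity_kw):
    """Return the feeds kept powered under a capacity cap.

    feeds: list of (name, tier, draw_kw) tuples; tier 1 = most critical.
    Lower-tier (less critical) feeds are shed first when capacity is short.
    """
    powered = []
    for name, tier, draw in sorted(feeds, key=lambda f: f[1]):
        if draw <= capacity_kw:
            powered.append(name)
            capacity_kw -= draw
    return powered

feeds = [
    ("payments-db", 1, 40),   # mission-critical
    ("analytics",   3, 60),   # deferrable batch work
    ("web-tier",    2, 50),   # important but restartable
]

# Normal operation: 200 kW available, everything runs.
print(allocate(feeds, 200))   # → ['payments-db', 'web-tier', 'analytics']

# Utility event, running on batteries: only 100 kW.
# The tier 3 feed is shed automatically, per policy.
print(allocate(feeds, 100))   # → ['payments-db', 'web-tier']
```

The hard part in practice is not this logic but the prerequisite the text identifies: application owners must first have classified their workloads by resiliency tier, and the switchgear must be controllable in software.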
The full report Five data center trends for 2021 is available to members of the Uptime Institute community here.
Sustainability: More challenging, more transparent
Through 2021 and beyond, the world will begin to recover from its acute crisis — COVID-19 — and will turn its attention to other matters. Few if any of these issues will be as important as climate change, a chronic condition that will become more pressing and acute as each year passes.
In the critical digital infrastructure sector, as in all businesses, issues arising directly or indirectly from climate change will play a significant role in strategic decision-making and technical operations in the years ahead. And this is regardless of the attitude or beliefs of senior executives; stakeholders, governments, customers, lobbyists and watchdogs all want and expect to see more action. The year 2021 will be critical, with governments expected to act with greater focus and unity as the new US government rejoins the global effort.
We can group the growing impact of climate change into four areas:
In the IT sector generally — including in data centers — it has not all been bad news to date. Led by the biggest cloud and colo companies, and judged by several metrics, the data center sector has made good progress in curtailing carbon emissions and wasteful energy use. According to the Carbon Trust, a London-based body focused on reducing carbon emissions, the IT sector is on course to meet its science-based target for 2030 — a target that will help keep the world to 1.5 degrees Celsius (2.7 degrees Fahrenheit) of warming (but still enough warming to create huge global problems). Its data shows IT sector carbon emissions from 2020 to 2030 are on a trajectory to fall significantly in five key areas: data centers, user devices, mobile networks, fixed networks and enterprise networks. Overall, the IT sector needs to cut carbon emissions by 50% from 2020 to 2030.
Data centers are just a part of this, accounting for more carbon emissions than mobile, fixed or enterprise networks, but significantly less than all the billions of user devices. Data center energy efficiency has been greatly helped by facility efficiencies, such as economizer cooling, improvements in server energy use, and greater utilization through virtualization and other IT/software improvements. Use of renewables has also helped: According to Uptime Institute data (our 2020 Climate Change Survey) over a third of operators now largely power their data centers using renewable energy sources or offset their carbon use (see figure below). Increasing availability of renewable power in the grid will help to further reduce emissions.
But there are some caveats to the data center sector’s fairly good performance. First, the reduction in carbon emissions achieved to date is contested by many who think the impact of overall industry growth on energy use and carbon emissions has been understated (i.e., energy use/carbon emissions are actually quite a lot higher than widely accepted models suggest — a debatable issue that Uptime Institute continues to review). Second, at an individual company or data center level, it may become harder to achieve carbon emissions reductions in the next decade than it has been in the past decade — just as the level of scrutiny and oversight, and the penalty for not doing enough, ratchets up. Why? There are many possible reasons, including the following:
The biggest operators of digital infrastructure — among them Amazon, Digital Realty, Equinix, Facebook, Google and Microsoft — have made ambitious and specific commitments to achieve carbon neutrality in line with science-based targets within the next two decades. That means, first, they are setting standards that will be difficult for many others to match, giving them a competitive advantage; and second, these companies will put pressure on their supply chains — including data center partners — to minimize emissions.
Edge Computing – The Next Frontier
By Rhonda Ascierto, Vice President, Research, Uptime Institute
One of the most widely anticipated trends in IT and infrastructure is significant new demand for edge computing, fueled by technologies such as 5G, IoT and AI. To date, net new demand for edge computing — processing, storing and integrating data close to where it is generated — has built slowly. As a result, some suppliers of micro data center and edge technologies have had to lower their investors’ expectations.
This slow build-out, however, does not mean that it will not happen. Demand for decentralized IT will certainly grow. There will be more workloads that need low latency, such as healthcare tech, high performance computing (notably more AI), critical IoT, and virtual and augmented reality, as well as more traffic from latency-sensitive internet companies (as Amazon famously said 10 years ago, every 100 milliseconds of latency costs them one percent in sales). There will also be more data generated by users and “things” at the edge, which will be too expensive to transport across long distances to large, centralized data centers (the “core”).
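Amazon's oft-quoted rule of thumb lends itself to a quick back-of-the-envelope model. The sketch below is hypothetical: the 1%-per-100 ms ratio is the figure quoted in the text, but the linear extrapolation and the example revenue number are assumptions for illustration only (real latency/revenue curves are not linear):

```python
# Hypothetical sketch of revenue lost to added latency, based on the
# quoted "every 100 ms of latency costs 1% of sales" rule of thumb.
# The linear model and the $1B revenue figure are illustrative assumptions.

def revenue_loss(annual_revenue: float, added_latency_ms: float,
                 loss_per_100ms: float = 0.01) -> float:
    """Estimated annual revenue lost for `added_latency_ms` of extra
    latency, assuming a linear loss of `loss_per_100ms` per 100 ms."""
    return annual_revenue * loss_per_100ms * (added_latency_ms / 100.0)

# e.g., a $1B/year service that adds 50 ms of latency:
print(f"${revenue_loss(1e9, 50):,.0f} per year")  # $5,000,000 per year
```

Even under this crude model, shaving tens of milliseconds justifies real infrastructure spend, which is the economic case behind moving capacity toward the edge.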
For all these reasons, new edge data center and connectivity capacity will be needed, and we expect a wave of new partnerships and deals in 2021. Enterprises will connect to clouds via as-a-service (on-demand, software-driven) interconnections at the edge, and the internet will extend its reach with new exchange points. Just as the internet is a network of tens of thousands of individual networks connected together, the edge will require not just new capacity but also a new ecosystem of suppliers working together. The year 2021 will likely see intense activity — but the long-expected surge in demand may have to wait.
The edge build-out will be uneven, in part because the edge is not a monolith. Different edge workloads need different levels of latency, bandwidth and resiliency, as shown in the data center schema below. Requirements for data transit and exchanges will also vary. Edge infrastructure service providers will need to rely on many partners, including specialist vendors that will serve different customer requirements. Enterprise customers will become increasingly dependent on third-party connections to different services.
So far, much attention has been focused on the local edge, where connectivity and IT capacity are sited within a kilometer or so from devices and users. In urban areas, where 5G is (generally) expected to flourish, and in places where a lot of IoT data is generated, such as factories and retail stores, we are slowly seeing more micro data centers being deployed. These small facilities can act either as private connections or internet exchange points (or both), handing off wireless data to a fiber connection and creating new “middle-mile” connections.
We expect that edge micro data centers will be installed both privately and as shared infrastructure, including for cloud providers, telcos and other edge platform providers, to reduce latency and keep transit costs in check. To get closer to users and “things,” fiber providers will also partner with more wireless operators.
In 2021, most of the action is likely to be one step further back from the edge, in regional locations where telcos, cloud providers and enterprises are creating — or consuming — new interconnections in carrier-neutral data centers such as colo and wholesale facilities. All major cloud providers are increasingly creating points of presence (PoPs) in more colos, creating software-defined WANs of public (internet) and private (enterprise) connections. Colo customers are then able to connect to various destinations, depending on their business needs, via software, hardware and networks that colos are increasingly providing. These interconnections are making large leased facilities a preferred venue for other suppliers to run edge infrastructure-as-a-service offerings, including for IoT workloads. For enterprises and suppliers alike, switching will become as important as power and space.
We expect more leased data centers will be built (and bought) in cities and suburbs in 2021 and beyond. Large and small colos alike will place more PoPs in third-party facilities. And more colos will provide more software-driven interconnection platforms, either via internal development, partnerships or acquisitions.
At the same time, CDNs that already have large edge footprints will further exploit their strong position by offering more edge services on their networks directly to enterprises. We’re also seeing more colos selling “value-add” IT and infrastructure-as-a-service products — and we expect they will extend further up the IT stack with more compute and storage capabilities.
The edge build-out will clearly lead to increased operational complexity, whereby suppliers will have to manage hundreds of application programming interfaces (APIs) and multiple service level agreements. For these reasons, the edge will need to become increasingly software-defined and driven by AI. We expect investment and partnerships across all these areas.
How exactly it will play out remains unclear; it is simply too early. Already we have seen major telco and data center providers pivot their edge strategies, including moving from partnerships to acquisitions.
One segment we are watching particularly closely is the big internet and cloud companies. Having built significant backbone infrastructure, they have made little or only modest investments to date at the edge. With their huge workloads and deep pockets, their appetite for direct ownership of edge infrastructure is not yet known but could significantly shape the ecosystem around them.
Accountability – the “new” imperative
By Andy Lawrence, Executive Director of Research, Uptime Institute
Outsourcing the requirement to own and operate data center capacity is the cornerstone of many digital transformation strategies, with almost every large enterprise spreading their workloads across their own data centers, colocation sites and public cloud. But ask any regulator, any chief executive, any customer: You can’t outsource responsibility — for incidents, outages, security breaches or even, in the years ahead, carbon emissions.
Chief information officers, chief technology officers and other operational heads knew this three or four decades ago (and many have learned the hard way since). That is why data centers became physical and logical fortresses, and why almost every component and electrical circuit has some level of redundancy.
In 2021, senior executives will grapple with a new iteration of the accountability imperative. Even the most cautious enterprises now want to make more use of the public cloud, while the use of private clouds is enabling greater choices of third-party venue and IT architecture. But this creates a problem: cloud service operators, software-as-a-service (SaaS) providers and even some colos are rarely fully accountable or transparent about their shortcomings — and they certainly do not expect to be held financially accountable for consequences of failures. Investors, regulators, customers and partners, meanwhile, want more oversight, more transparency and, where possible, more accountability.
This is forcing many organizations to take a hard look at which workloads can be safely moved to the cloud and which cannot. For some, such as the European financial services sector, regulators will require an assessment of the criticality of workloads — a trend that is likely to spread and grow to other sectors over time. The most critical applications and services will either have to stay in-house, or enterprise executives will need to satisfy themselves and their regulators that these services are run well by a third-party provider, and that they have full visibility into the operational practices and technical infrastructure of their provider.
The data suggests this is a critical period in the development of IT governance. The shift of enterprise IT workloads from on-premises data center to cloud and hosted services is well underway. But there is a long way to go, and some of the issues around transparency and accountability have arisen only recently as more critical and sensitive data and functionality is considered for migration to the cloud.
The first tranche of workloads moving to third parties often did not include the most critical or sensitive services. For many organizations, a public cloud is (or was initially) the venue of choice for specific types of workloads, such as application test and development; big-data processing, such as AI; and new applications that are cloud-native. But as more IT departments become familiar with the tool sets from cloud providers, such as for application development and deployment orchestration, more types of workloads have recently begun moving into public clouds, with more critical applications to follow (or perhaps not). High-profile, expensive public cloud outages, increased regulatory pressures and an increasingly uncertain macroeconomic outlook will force many enterprises to assess — or reassess — where workloads should actually be running (a process that has been called “The Big Sort”).
Uptime Institute believes that many mission-critical workloads are likely to remain in on-premises or colo data centers — at least for many years to come: More than 70% of IT and critical infrastructure operators we surveyed in 2020 do not put any critical workloads in a public cloud, with over a quarter of this group (21% of the total sample) saying the reason is a lack of visibility/accountability about resiliency. And over a third of those who do place critical applications in a public cloud also say they do not have enough visibility (see chart below). Clearly, providers’ assurances of availability and of adherence to best practices are not enough for mission-critical workloads. (These results were almost identical when we asked the same question in our 2019 annual survey.)
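The survey figures quoted above are internally consistent, as a quick check shows (the percentages come from the 2020 survey; the arithmetic is ours):

```python
# Consistency check of the survey figures quoted above: more than 70% of
# respondents place no critical workloads in a public cloud, and 21% of
# the *total* sample cite lack of visibility/accountability. That 21%
# works out to roughly 30% of the no-cloud group -- "over a quarter".

no_cloud_share = 0.70            # lower bound stated in the text
visibility_share_of_total = 0.21  # share of the whole sample

share_of_group = visibility_share_of_total / no_cloud_share
print(f"{share_of_group:.0%} of the no-cloud group")  # 30%
```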
The issues of transparency, reporting and governance are likely to ripple through the cloud, SaaS and hosting industries, as customers seek assurances of excellence in operations — especially when financial penalties for failures by third parties are extremely light. While even the largest cloud and internet application providers operate mostly concurrently maintainable facilities, experience has shown that unaudited (“mark your own homework”) assurances frequently lead to poor outcomes.
Creeping criticality
There is an added complication. While the definitions and requirements of criticality in IT are dictated by business requirements, they are not fixed in time. Demand patterns and growing IT dependency mean many workloads/services have become more critical — but the infrastructure and processes supporting them may not have been updated (“creeping criticality”). This is a particular concern for workloads subject to regulatory compliance (“compliance drift”).
COVID-19 may have already caused a reassessment of the criticality or risk profile of IT; extreme weather may provide another. When Uptime Institute recently asked over 250 on-premises and colo data center managers how the pandemic would change their operations, two-thirds said they expect to increase the resiliency of their core data center(s) in the years ahead. Many said they expected their costs to increase as a result. One large public cloud company recently asked its leased data center providers to upgrade their facilities to N+1 redundancy, if they had not already done so.
But even before the pandemic, there was a trend toward higher levels of redundancy for on-premises data centers. There is also an increase in the use of active-active availability zones, especially as more workloads are designed using cloud or microservices architectures. Workloads are more portable, and instances are more easily copied than in the past. But we see no signs that this is diminishing the need for site-level resiliency.
Colos are well-positioned to provide both site-level resiliency (which is transparent and auditable) and outsourced IT services, such as hosted private clouds. We expect more colos will offer a wider range of IT services, in addition to interconnections, to meet the risk (and visibility) requirements of more mission-critical workloads. The industry, it seems, has concluded that more resiliency at every level is the least risky approach — even if it means some extra expense and duplication of effort.
Uptime Institute expects that the number of enterprise (privately owned/on-premises) data centers will continue to dwindle but that enterprise investment in site-level resiliency will increase (as will investment in data-driven operations). Data centers that remain in enterprise ownership will likely receive more investment and continue to be run to the highest standards.