Data Center Free Air Cooling Trends
by Rabih Bashroush

With the recent expansion of the American Society of Heating, Refrigerating and Air-Conditioning Engineers’ (ASHRAE’s) acceptable data center operating temperature and humidity ranges — taken as an industry-standard best practice by many operators — the case for free air cooling has become much stronger. Free air cooling is an economical method of using low external air temperature to cool server rooms.
In the 2019 Uptime Institute Supply-side Survey (available to members of the Uptime Institute Network), we asked more than 500 data center vendors, consultants and engineers about their customers’ adoption of free air economizer cooling (the use of outside air or a combination of water and air to supplement mechanical cooling) using the following approaches:
Indirect air: Outside air passes through a heat exchanger that separates the air inside the data center from the cooler outside air. This approach prevents particulates from entering the white space and helps control humidity levels.
Direct air: Outside air passes through an evaporative cooler and is then directed via filters to the data center cold aisle. When the temperature outside is too cold, the system mixes the outside air with exhaust air to achieve the correct inlet temperature for the facility.
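To make the mixing step concrete, here is a minimal, hypothetical sketch of how a direct air economizer controller might choose the proportion of outside air to blend with warm exhaust air to hit a target cold-aisle inlet temperature. The function name, setpoints and edge-case handling are illustrative assumptions, not features of any particular product; a real controller would also manage humidity, filtration and the evaporative cooling stage.

```python
def direct_air_mix_fraction(t_outside_c: float,
                            t_exhaust_c: float,
                            t_target_c: float = 24.0) -> float:
    """Return the fraction of outside air (0.0-1.0) to blend with exhaust air.

    Assumes a simple energy balance for the mixed air stream:
        t_supply = f * t_outside + (1 - f) * t_exhaust
    and solves for f so that t_supply equals t_target.
    """
    if t_outside_c >= t_target_c:
        # Outside air alone cannot reach the target; use 100% outside air
        # and let evaporative/mechanical cooling make up the difference.
        return 1.0
    if t_exhaust_c <= t_target_c:
        # Exhaust air is already at or below the target; mixing adds nothing.
        return 1.0
    f = (t_exhaust_c - t_target_c) / (t_exhaust_c - t_outside_c)
    return max(0.0, min(1.0, f))


# Example: -5 degC outside, 35 degC exhaust, 24 degC target inlet temperature
print(direct_air_mix_fraction(-5.0, 35.0))  # ~0.275, i.e., ~27.5% outside air
```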
Findings from the survey show that free air cooling economization projects continue to gain traction, with indirect free air cooling being slightly more popular than direct air. In our survey, 84% said that at least some of their customers are deploying indirect air cooling (74% for direct air). Only 16% of participants said that none of their customers are deploying indirect free air cooling (26% for direct air), as shown in the figure below.
The data suggests that there is more momentum behind direct free air cooling in North America than in other parts of the world. Among North American respondents, 70% indicated that some of their customers are deploying direct air cooling (compared with 63% for indirect air). As shown in the figure below, this was not the case in Europe or Asia-Pacific, where suppliers reported that more customers were deploying indirect air. This may be linked to the fact that internet giants, which are known to favor direct free air cooling when deploying at scale, represent a bigger share of the data center market in North America than in other parts of the world.
Continued pressure to improve cost-efficiency, along with rising awareness of and interest in environmental impact, is likely to keep driving uptake of free air cooling. Compared with traditional compressor-based cooling systems, free air cooling requires less upfront capital investment and involves lower operational expenses, while having a lower environmental impact (e.g., no refrigerants, low embedded carbon and a higher proportion of recyclable components).
Yet, some issues hampering free air cooling uptake will likely continue in the short term. These include the upfront retrofit investment required for existing facilities; humidity and air quality constraints (which are less of a problem for indirect air cooling); lack of reliable weather models in some areas (and the potential impact of climate change); and restrictive service level agreements, particularly in the colocation sector.
Moreover, a lack of understanding of the ASHRAE standards and a lack of clarity around IT equipment needs are driving some operators to design to the most restrictive requirement, particularly when hosting legacy or mixed IT systems. As a result, the opportunity to take advantage of free air cooling is missed, due to the perceived need to adopt lower operating temperatures.
Going forward, at least in Europe, this problem might be partially addressed by the introduction of the new European EcoDesign legislation for servers and online storage devices, which will take effect from March 2020. The new legislation will require IT manufacturers to declare the operating condition classes and thermal performance of their equipment. This, in turn, will help enterprise data centers better optimize their operations by segregating IT equipment based on ambient operating requirements.
The full report Uptime Institute data center supply-side survey 2019 is available to members of the Uptime Institute Network. You can become a member or request guest access here, or by contacting any member of the Uptime Institute team.
The Evolving Data Center Management Maturity Model, A Quick Update
by Rhonda Ascierto, Vice President, Research, Uptime Institute
Uptime Institute has long argued that, although it may take many years, the long-term trend is toward a high level of automation in the data center, covering many functions that most managers currently would not trust to machines or outside programmers. Recent advances in artificial intelligence (AI) have made this seem more likely. (For more on data center AI, see our report Very smart data centers: How artificial intelligence will power operational decisions.)
Our data center management maturity model shows this long-term evolution.
In our model, we have mapped different levels of operating efficiency to different stages of deployment of data center infrastructure management (DCIM) software. We encourage any manager who is looking to buy DCIM, or who has already implemented the software and seeks expanded features or functions, to consider their short- and long-term automation goals.
Today, most DCIM deployments fall into Level 2 or Level 3 of the model. A growing number of organizations are targeting Level 3 by integrating DCIM data with IT, cloud service and other non-facility data, as discussed in the report Data center management software and services: Effective selection and deployment (co-authored with Andy Lawrence).
The advent of AI-driven, cloud-based services will, we believe, drive greater efficiencies and, when deployed in combination with on-premises DCIM software, enable more data centers to reach Level 4 (and, over time, Level 5).
Although procurement decisions today may be only minimally affected by current automation needs, a later move toward greater automation should be considered, especially in terms of vendor choice/lock-in and integration.
Integration capabilities, as well as the use and integration of AI (including AI-driven cloud services), are important factors in both the overall strategic decision to deploy DCIM and the choice of a particular supplier/platform.
The full report Data center management software and services: Effective selection and deployment is available to members of the Uptime Institute Network here.
DCIM as a Hub: Integrations Make All the Difference
by Rhonda Ascierto, Vice President, Research, Uptime Institute
Today, the role that the physical data center, particularly its design and operational management, plays in software-defined data centers is often overlooked. However, this is likely to change.
As more networking and compute becomes virtualized and flexible, so too must data center resources, in order to achieve maximum agility and efficiency. To virtualize only IT and networking resources is to optimize only the top layers of the stack; the supply of underlying physical data center resources — power, cooling, space — must also be tightly coupled to IT demand and resources, and automated accordingly.
This is where data center infrastructure management (DCIM) software comes into play. Leading DCIM platforms enable not just the operational management of data centers, but also the automation of key resources, such as power and cooling. For dynamic resource management, integration with IT and other non-facility data is key.
By integrating DCIM, organizations can tightly couple demand for the virtualized and logical resources (IT and networking) with the supply of physical facility resources (power, cooling and space). Doing so enables cost efficiencies and reduces the risk of service interruption due to under-provisioning.
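As a simple illustration of coupling IT demand with facility supply, the hedged sketch below checks whether a rack has enough power headroom, based on DCIM readings, before a new workload is placed on it. The data structure, field names and the 10% safety margin are assumptions made for illustration; a real deployment would pull live readings through the DCIM platform's own interfaces.

```python
from dataclasses import dataclass

@dataclass
class RackPower:
    rack_id: str
    provisioned_kw: float    # capacity allocated to the rack (from DCIM)
    measured_peak_kw: float  # recent peak draw reported by DCIM power monitoring

def can_place_workload(rack: RackPower,
                       workload_kw: float,
                       safety_margin: float = 0.10) -> bool:
    """Return True if the workload fits within the rack's power headroom.

    Keeps a safety margin (default 10%) of provisioned capacity unused to
    reduce the risk of under-provisioning under transient peaks.
    """
    usable_kw = rack.provisioned_kw * (1.0 - safety_margin)
    return rack.measured_peak_kw + workload_kw <= usable_kw

# Example: a rack provisioned at 8 kW, currently peaking at 5.2 kW
rack = RackPower("R12", provisioned_kw=8.0, measured_peak_kw=5.2)
print(can_place_workload(rack, workload_kw=1.5))  # True  (5.2 + 1.5 <= 7.2)
print(can_place_workload(rack, workload_kw=2.5))  # False (5.2 + 2.5 >  7.2)
```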
Integrating DCIM also enables more informed decision-making around best-execution venues (internally and for colocation customers), taking into account the cost and availability of IT, connectivity and data center resources.
While integration is typically a “phase two” strategy (i.e., following the full deployment of a DCIM suite), integration goals should be established early on. The figure below is a simplified view of some of the data sources from systems spanning the IT stack that DCIM can integrate with. Uptime Institute Intelligence’s report Data center management software and services: Effective selection and deployment provides a better understanding of what is required.
Which processes are likely to require multisystem integration? Here are some examples (a code sketch of one such integration follows the list):
Monitoring capacity across clouds and on-premises data centers (enterprise and colocation)
Possible software integrations: Cloud service application programming interfaces (APIs) for cloud monitoring, virtual machine (VM) management, DCIM suite, IT service management (ITSM).
Adjusting or moving workloads according to availability or energy costs/reliability, or to reduce risk during maintenance
Possible software integrations: DCIM suite, VM management, cloud service APIs for cloud monitoring and hybrid cloud management, ITSM/IT asset management, maintenance management, service catalog.
Colocation portal providing key data to customers
Possible software integrations: service level agreement (SLA) management, customer relationship management (CRM), DCIM, ITSM/IT asset management, interconnection management.
Data center service-based costing (real-time, chargeback)
Possible software integrations: CRM, service management, financial management, DCIM power monitoring, VM/IT resource use. Also useful for carbon/energy tracking/reporting.
Cloud-based resiliency/disaster recovery
Possible software integrations: DCIM, IT monitoring, workload management, capacity management, storage management, VM management, cloud service APIs for cloud monitoring and hybrid cloud management, disaster recovery/backup.
Unified incident/problem management
Possible software integrations: DCIM, ITSM, maintenance management, work-order system.
Identifying and eliminating underused/comatose servers
Possible software integrations: DCIM monitoring, ITSM utilization, IT asset/capacity management, VM management.
End-to-end financial planning
Possible software integrations: Financial planning, DCIM capacity planning.
Automated services (provisioning, colocation customer onboarding, audits, etc.)
Possible software integrations: DCIM monitoring, CRM, financial management/planning, IT asset management.
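To show how two of the integrations listed above might be joined in practice, here is a hedged sketch that flags potentially comatose servers by combining per-server power readings (as a DCIM monitoring platform might expose them) with utilization figures from ITSM/IT asset or VM management tooling. All field names, thresholds and data sources are illustrative assumptions, not references to any specific product API.

```python
from typing import Dict, List

def find_comatose_candidates(dcim_power_w: Dict[str, float],
                             itsm_cpu_util_pct: Dict[str, float],
                             idle_power_threshold_w: float = 120.0,
                             idle_cpu_threshold_pct: float = 2.0) -> List[str]:
    """Return server IDs that draw power but do near-zero useful work.

    dcim_power_w:       average power per server, e.g., from DCIM monitoring.
    itsm_cpu_util_pct:  average CPU utilization per server, e.g., from
                        ITSM/IT asset or VM management tooling.
    """
    candidates = []
    for server_id, power_w in dcim_power_w.items():
        cpu = itsm_cpu_util_pct.get(server_id)
        if cpu is None:
            continue  # no utilization data; flag for manual review instead
        if power_w > idle_power_threshold_w and cpu < idle_cpu_threshold_pct:
            candidates.append(server_id)
    return candidates

# Example with made-up readings
power = {"srv-001": 180.0, "srv-002": 95.0, "srv-003": 210.0}
cpu   = {"srv-001": 0.8,   "srv-002": 1.1,  "srv-003": 35.0}
print(find_comatose_candidates(power, cpu))  # ['srv-001']
```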
We are seeing more organizations, including large enterprises and colos, invest in DCIM integrations for higher levels of visibility with a goal of end-to-end automation.
Ultimately, by integrating DCIM with IT and other systems, organizations can more effectively plan for data center capacity investments. Using DCIM to optimize the use of existing facilities could also mean enterprises and colos may need fewer or smaller facilities in the future.
The full report Data center management software and services: Effective selection and deployment is available to members of the Uptime Institute Network here.
IT Outages in the Airline Industry: A New Report by the GAO
by Kevin Heslin
Uptime Institute’s Annual outage analysis, published early this year, called attention to the persistent problem of IT service and data center outages. Coupled with our annual survey data on outages, the analysis explains, to a degree, why investments to date have not greatly reduced the outage problem — at least from an end-to-end service view.
Gathering outage data is a challenge: there is no centralized database of outage reports in any country (that we are aware of) and, short of mandatory rules, there probably won’t be. Uptime Institute’s outage analysis relied on reports in the media, which skews the findings, and on survey data, which has its own biases. Other initiatives have similar limitations.
The US government also struggles to get an accurate accounting of data center/IT outages, even in closely watched industries with a public profile. The US Government Accountability Office (GAO) recently issued a report (GAO-19-514) documenting 34 IT outages from 2015 through 2017 that affected 11 of the 12 selected (domestic US) airlines included in the report. The GAO believes that about 85% of the outages resulted in some flight delays or cancellations, and that 14% caused a ground stop of several hours or more. Relatedly, Uptime Institute has identified 10 major outages affecting the airline industry worldwide since January 2016.
The Uptime Institute data is drawn from media reports and other more direct sources, and it is not expected to be comprehensive. Many outages are kept as quiet as possible, and the parties involved do their best to downplay the impact. The media-based approach provides insights, but it probably understates the extent of the outage problem — at least in the global airline industry.
Government data is not complete either. The GAO explicitly notes many circumstances in which information about airline IT outages is unavailable to it and other agencies, except in unusual cases. These circumstances might involve smaller airlines and airports that don’t get attention. The GAO also notes that delays and cancellations can have multiple causes, which can reduce the number of instances in which an IT outage is blamed. The GAO’s illustration below provides examples of potential IT outage effects.
The report further notes: “No government data were available to identify IT outages or determine how many flights or passengers were affected by such outages. Similarly, the report does not describe the remedies given to passengers or their costs.” We do know, of course, that some airlines — Delta and United are two examples — have faced significant outage-related financial consequences.
Consumer complaints stemming from IT outages accounted for less than one percent of all complaints received by the US Department of Transportation from 2015 through June 2018, according to agency officials. These complaints raised concerns similar to those resulting from more common causes of flight disruption, such as weather. It is likely that these incidents impose reputational costs on airlines that exceed the operational costs they incur.
The GAO did not have the mandate to identify the causes of the outages it identified, so the report describes possible causes only in general terms. These include aging and legacy systems, incompatible systems, complexity, interdependencies, and the transition to third-party and cloud systems. Other issues include hardware failures, software outages or slowdowns, power or telecommunications failures, and network connectivity problems.
The GAO said, “Representatives from six airlines, an IT expert, and four other aviation industry stakeholders pointed to a variety of factors that could contribute to an outage or magnify the effect of an IT disruption. These factors ranged from under-investment in IT systems after years of poor airline profitability, increasing requirements on aging systems or systems not designed to work together, and the introduction of new customer-oriented platforms and services.” All of this is hardly breaking news to industry professionals, and many of these issues have been discussed in Uptime Institute meetings and in our 2016 Airline outages FAQ.
The report cites prevention efforts that reflect similarly standard themes, with five airlines moving to hybrid models (spreading workloads and risk, in theory) and two improving connectivity by using multiple telecommunications network providers. Stakeholders interviewed by the GAO mentioned contingency planning, recovery strategies and routine system testing; the use of artificial intelligence (although it is not clear for what functions); and outage drills as means for avoiding and minimizing system disruptions. (The Uptime Institute Digital Infrastructure Resiliency Assessment helps organizations better understand where their strengths and weaknesses lie.)
In short, the GAO was able to throw some light on a known problem but was not able to generate a complete record of outages in the US airline industry, provide an estimate of direct or indirect costs, explain their severity and impact, or pinpoint their causes. As a result, each airline is on its own to determine whether it will investigate outages, identify causes or invest in remedies. There is little information sharing; Uptime Institute’s Abnormal Incident Reporting System examines causes for data center-specific events, but it is not industry specific and would not capture many network or IT-related events. Although there have been some calls for greater sharing, within industries and beyond, there is little sign that most operators are willing to openly discuss causes and failures owing to the dangers of further reputation damage, lawsuits and exploitation by competition.
Access to our complete annual outage reports, data center survey results, Abnormal Incident Reporting data, research on energy efficiency in the data center and a wealth of other topics is available to members of the Uptime Institute Network. Want to know more about this organization? Check out the complete benefits and request a trial membership in the community here.
Data center AI: Start with the end in mind
by Rhonda Ascierto, Vice President, Research, Uptime Institute
An artificial intelligence (AI) strategy for data center management and operation requires more than just data and some very smart humans. Selecting specific use cases and understanding the types of data that influence AI outcomes — and then validating those outcomes — will be key if the needs of the business are to be met.
By focusing on specific use cases, early successes can then be scaled, and further value can be extracted incrementally. Managers don’t need to be AI experts, but Uptime Institute does recommend that they understand the fundamental depth and breadth of the AI being applied. Doing so means they can better determine how much data is required and how the AI will be using the data, which will be critical when validating results and recommendations. Uptime Institute Intelligence’s recent report Very smart data centers: How artificial intelligence will power operational decisions provides a better understanding of what is required.
As a first step, let’s address a few points about AI. First, what is the difference between algorithms and models? AI marketers may use these terms interchangeably, although they are not the same thing.
An algorithm is a sequence of mathematical steps or computational instructions. It is an automated instruction set. An algorithm can be a single instruction or a sequence of instructions — its complexity depends on how simple or complex each individual instruction is and/or the sheer number of instructions that the algorithm needs to execute.
In AI, a model refers to a mathematical model that is able to process data and provide the expected response to or outcome of that data. For example, if an algorithm is applied to a data set, the outcome would be the model. So, the model is the outcome of one or many algorithms. A model changes if the data fed into the algorithm changes, or if the same data is fed through a different algorithm.
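A small, hypothetical example may help fix the distinction. In the sketch below, the least-squares fitting routine is the algorithm, and the coefficients it returns are the model; feed the same algorithm different data and you get a different model. The telemetry values are invented for illustration.

```python
import numpy as np

# Hypothetical telemetry: outside air temperature (degC) vs. facility cooling power (kW)
temps_a = np.array([5.0, 10.0, 15.0, 20.0, 25.0])
cooling_a = np.array([40.0, 48.0, 57.0, 66.0, 74.0])

temps_b = np.array([5.0, 10.0, 15.0, 20.0, 25.0])
cooling_b = np.array([60.0, 63.0, 71.0, 80.0, 92.0])

# The *algorithm*: ordinary least-squares fitting of a straight line.
# The *model*: the slope and intercept it produces for a given data set.
model_a = np.polyfit(temps_a, cooling_a, deg=1)
model_b = np.polyfit(temps_b, cooling_b, deg=1)

print(model_a)  # different coefficients...
print(model_b)  # ...because the same algorithm saw different data

# Using a model to predict the response for new input:
predicted_kw = np.polyval(model_a, 18.0)
print(round(float(predicted_kw), 1))
```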
Another very important distinction is between the two main types of AI techniques being used in data centers today: machine learning and deep learning.
There are three main types of machine learning techniques:
Supervised learning: Humans supply a model and training data. Algorithms take the training data and fine-tune the model so the inputs and outputs/responses are more closely aligned. As more data is added over time, the algorithms further improve the model and can make reasonable predictions for responses to new data. Supervised machine learning is most commonly used in data centers and other industries.
Unsupervised learning: Algorithms find patterns or intrinsic structures in unlabeled data. In some scenarios, unsupervised machine learning techniques are combined with supervised ones. In effect, the output of unsupervised machine learning can become the training data for supervised machine learning.
Reinforcement learning: Humans supply a model and unlabeled data. When an algorithm determines an optimal outcome for the data, it is reinforced by a positive mathematical “reward.” (An open-source reinforcement learning framework from Google is appropriately called Dopamine.) Through this feedback, the algorithm learns by trying different variations. Reinforcement learning is the newest of the three machine learning techniques.
Deep learning, a subset of machine learning, uses multiple layers of artificial neural networks to build algorithms, based on vast amounts of data, that find an optimal way to make decisions or perform tasks on their own. Humans supply training data and algorithms, and the computer breaks down these inputs into a hierarchy of very simple concepts; each concept becomes a mathematical node in the neural network. Instead of relying on models supplied by humans, deep learning builds new models from its own analysis of the training data, with the layered network acting somewhat like a decision tree.
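The hedged sketch below illustrates the supervised approach described above: humans supply labeled training examples and a model form, the fitting algorithm tunes the model, and the trained model then predicts responses for new inputs. The features, target and values are invented for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical labeled data: [IT load in kW, outside temp in degC] -> cooling power in kW
X_train = np.array([
    [200.0,  5.0],
    [220.0, 12.0],
    [250.0, 18.0],
    [260.0, 25.0],
    [300.0, 30.0],
])
y_train = np.array([60.0, 75.0, 95.0, 115.0, 150.0])

# Supervised learning: the algorithm fits the model to the labeled examples.
model = LinearRegression().fit(X_train, y_train)

# The trained model can now make a reasonable prediction for unseen conditions.
X_new = np.array([[240.0, 20.0]])
print(model.predict(X_new))  # predicted cooling power in kW

# As more labeled data arrives over time, refitting further refines the model.
```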
Which technique is best for which use case? It depends on the quality and sophistication of the algorithm, as well as the model and data being used. If all these things are equal, however, there are certain techniques that are particularly well-suited to certain use cases.
Some say deep learning can find greater levels of inefficiencies because it is unfettered by known models. On the other hand, supervised machine learning is more transparent (making it easier for domain-expert humans to validate results) and, arguably, quicker to automate.
It can vary, but below are some use cases that tend to be well-suited to different types of machine learning and to deep learning.
It is still early days, but it is likely that certain techniques will dominate specific use cases over time.
At a minimum, operators should understand the fundamental depth and breadth of the AI being applied. Ask the supplier to show the data points in the model and the relationships between those items — in other words, how the AI is using the data to make recommendations for action. And, of course, it is always important to track the results when actions are taken (by a human operator).
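One simple way to track results when a recommendation is acted on is to log the AI's predicted outcome alongside the outcome actually measured afterwards, and to review the error over time. The sketch below is an illustrative assumption of how that bookkeeping might look, not a feature of any particular tool; PUE is used here only as an example metric.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class RecommendationOutcome:
    action: str            # e.g., "raise CRAH setpoint by 1 degC"
    predicted_pue: float   # what the AI expected
    measured_pue: float    # what was actually observed afterwards

@dataclass
class ValidationLog:
    outcomes: List[RecommendationOutcome] = field(default_factory=list)

    def record(self, outcome: RecommendationOutcome) -> None:
        self.outcomes.append(outcome)

    def mean_absolute_error(self) -> float:
        """Average gap between predicted and measured outcomes."""
        if not self.outcomes:
            return 0.0
        return sum(abs(o.predicted_pue - o.measured_pue)
                   for o in self.outcomes) / len(self.outcomes)

log = ValidationLog()
log.record(RecommendationOutcome("raise CRAH setpoint by 1 degC", 1.42, 1.44))
log.record(RecommendationOutcome("shut down one CRAH unit", 1.40, 1.41))
print(round(log.mean_absolute_error(), 3))  # 0.015
```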
The full report Very smart data centers: How artificial intelligence will power operational decisions is available to members of the Uptime Institute Network community. For more information about membership, click here.
Open19 expects 2019 to be the year of “Accelerated Adoption”
by Kevin Heslin
We have heard for a dozen years about the Open Compute Project (OCP) and its non-traditional approach to computing hardware, from the racks to the servers, storage and networking. Over the last few years, the Open19 Foundation has started to promote an alternative platform that resembles the more traditional 19-inch rack approach we have all known since the 1980s. (Note: the official Open19 specification was only publicly released this past year.) But neither of these approaches has resulted in earth-shattering, data center-changing deployments outside of the web-scalers and a handful of early-adopter pilots and test beds. Many argue that companies such as Facebook, which are heavily invested in the OCP platform, have essentially created a hardware design that works (mostly) for them, and that very few other companies could realize the savings Facebook enjoys with this ‘non-standard’ approach, due to many factors, including hardware, software, staffing, support, training and compatibility concerns.
But time continues to move forward, and some of the core values we grew up on in the data center are changing too. Think of storage and networking approaches now versus 10 years ago. Think of what VMware looked like back then versus today. Vastly different. Think of the skills needed back then versus now. So perhaps there is room for a new platform, and as new blood enters IT staffing, perhaps they can cut their teeth on one.
As such, the Open19 Foundation has designated 2019 the Year of Accelerated Adoption for its ‘new platform’, the Open19 specification, which defines four server profiles based on a standard 19-inch rack. Open19 expects the specification to be the basis of flexible and economical data center and edge solutions for facilities of many sizes and densities.
At the organization’s May 2019 summit, Yuval Bachar, president of the Open19 Foundation and principal engineer of data center architecture at LinkedIn, told Data Center Knowledge that Open19 gear has been deployed at scale in the social network’s facilities in Oregon and Texas. In addition, two mega data centers are running proofs of concept and six other companies are deploying or evaluating Open19 technology.
These early deployments support recent Uptime Institute Intelligence findings: just 3% of respondents to our Ninth Annual Uptime Institute Data Center Survey (available to Uptime Institute Network members) said they were deploying Open19 hardware or designs, with another 8% evaluating Open19. That’s a total of about 50 respondents deploying or evaluating Open19. However, 54% of respondents said that they were not aware of Open19.
Despite these survey results, we agree with the Foundation: conditions may be right for an increase in Open19 adoption.
Viewed from one perspective, these adoption (or planned adoption) figures are really quite impressive: until its public release on March 12, 2019, the Open19 project specification was available only to the current Foundation members (including founding members Flex, GE Digital, Hewlett Packard Enterprise, Packet, LinkedIn and Vapor IO). The public release of the Open19 standard greatly increases the potential for new product options and deployments.
We found an additional point of interest in our survey data: senior executives (56%) and designers (47%) are more aware of Open19 than IT management (41%) and critical facilities management (41%). Senior executives (16%) and design engineers (17%) are also far more likely to say that they are deploying or considering Open19 designs or hardware than IT management (6%) and critical facilities management (9%). One possibility: Open19 designs and hardware are making their way into production without serious disruption to the routines of IT management and critical facilities management. That would be a promising development for Open19.
For more information on OCP, Open19 and other data center standards, a wealth of research is available to members of the Uptime Institute Network. Members enjoy a continuous stream of relevant and actionable knowledge from our analysts and share a wealth of experiences with their peers from some of the largest companies in the world. Membership instills a primary focus on operational efficiency and best practices that can be put into action every day. For membership information, click here.