Examining the scope of the challenge
By David Schirmacher
Digital Realty’s 127 properties cover around 24 million square feet of mission-critical data center space in over 30 markets across North America, Europe, Asia and Australia, and the company continues to expand its data center footprint. As senior VP of operations, it’s my job to ensure that all of these data centers perform consistently—that they’re operating reliably and at peak efficiency and delivering best-in-class performance to our 600-plus customers.
At its core, this challenge is one of managing information. Managing any one of these data centers requires access to large amounts of operational data.
If Digital Realty could collect all the operational data from every data center in its entire portfolio and analyze it properly, the company would have access to a tremendous amount of information that it could use to improve operations across its portfolio. And that is exactly what we have set out to do by rolling out what may be the largest-ever data center infrastructure management (DCIM) project.
Earlier this year, Digital Realty launched a custom DCIM platform that collects data from all the company’s properties, aggregates it into a data warehouse for analysis, and then reports the data to our data center operations team and customers using an intuitive browser-based user interface. Once the DCIM platform is fully operational, we believe we will have the ability to build the largest statistically meaningful operational data set in the data center industry.
Business Needs and Challenges
The list of systems that data center operators report using to manage their data center infrastructure often includes a building management system, an assortment of equipment-specific monitoring and control systems, possibly an IT asset management program and quite likely a series of homegrown spreadsheets and reports. But they also report that they don’t have access to the information they need. All too often, the data required to effectively manage a data center operation is captured by multiple isolated systems, or worse, not collected at all. Accessing the data necessary to effectively manage a data center operation continues to be a significant challenge in the industry.
At every level, data and access to data are necessary to measure data center performance, and DCIM is intrinsically about data management. In 451 Research’s DCIM: Market Monitor Forecast, 2010-2015, analyst Greg Zwakman writes that a DCIM platform, “…collects and manages information about a data center’s assets, resource use and operational status.” But 451 Research’s definition does not end there. The collected information “…is then distributed, integrated, analyzed and applied in ways that help managers meet business and service-oriented goals and optimize their data center’s performance.” In other words, a DCIM platform must be an information management system that, in the end, provides access to the data necessary to drive business decisions.
Over the years, Digital Realty successfully deployed both commercially available and custom software tools to gather operational data at its data center facilities. Some of these systems provide continuous measurement of energy consumption and give our operators and customers a variety of dashboards that show energy performance. Additional systems deliver automated condition and alarm escalation, as well as work order generation. In early 2012 Digital Realty recognized that the wealth of data that could be mined across its vast data center portfolio was far greater than current systems allowed.
In response to this realization, Digital Realty assembled a dedicated and cross-functional operations and technology team to conduct an extensive evaluation of the firm’s monitoring capabilities. The company also wanted to leverage the value of meaningful data mined from its entire global operations.
The team realized that the breadth of the company’s operations would make the project challenging even as it began designing a framework for developing and executing its solution. Neither Digital Realty nor its internal operations and technology teams were aware of any similar development and implementation project at this scale—and certainly not one done by an owner/operator.
As the team analyzed data points across the company portfolio, it found additional challenges: how to interlace the different varieties and vintages of infrastructure across the portfolio, taking into consideration the broad deployment of Digital Realty’s Turn-Key Flex data center product, the design diversity of its custom solutions and acquired data center locations, and the geographic diversity of the sites, as well as the overall financial implications and complexity of the undertaking.
Drilling Down
Many data center operators are tempted to first explore what DCIM vendors have to offer when starting a project, but taking the time to gain internal consensus on requirements is a better approach. Since no two commercially available systems offer the same features, assessing whether a particular product is right for an application is almost impossible without a clearly defined set of requirements. All too often, members of due diligence teams are drawn to what I refer to as “eye candy” user interfaces. While such interfaces might look appealing, the 3-D renderings and colorful “spinning visual elements” are rarely useful and can often be distracting to a user whose true goal is managing operational performance.
When we started our DCIM project, we took a highly disciplined approach to understanding our requirements and those of our customers. Harnessing all the in-house expertise that supports our portfolio to define the project requirements was itself a daunting task but essential to defining the larger project. Once we thought we had a firm handle on our requirements, we engaged a number of key customers and asked them what they needed. It turned out that our customers’ requirements aligned well with those our internal team had identified. We took this alignment as validation that we were on the right track. In the end, the project team defined the following requirements:
• The first of our primary business requirements was global access to consolidated data. We required that every one of Digital Realty’s data centers have access to the data, and we needed the capability to aggregate data from every facility into a consolidated view, which would allow us to compare the performance of various data centers across the portfolio in real time.
• Second, the data access system had to be highly secured and give us the ability to limit views based on user type and credentials. More than 1,000 people in Digital Realty’s operations department alone would need some level of data access. Plus, we have a broad range of customers who would also need some level of access, which highlights the importance of data security.
• The user interface also had to be extremely user-friendly. If we didn’t get that right, Digital Realty’s help desk would be flooded with requests on how to use the system. We required a clean navigational platform that is intuitive enough for people to access the data they need quickly and easily, with minimal training.
• Data scalability and mining capability were other key requirements. The amount of information Digital Realty has across its many data centers is massive, and we needed a database that could handle all of it. We also had to ensure that Digital Realty would get that information into the database. Digital Realty has a good idea of what it wants from its dashboard and reporting systems today, but in five years the company will want access to additional kinds of data. We don’t want to run into a new requirement for reporting and not have the historical data available to meet it.
Other business requirements included:
• Open bidirectional access to data that would allow the DCIM system to exchange information with other systems, including computerized maintenance management systems (CMMS), event management, procurement and invoicing systems
• Real-time condition assessment that allows authorized users to instantly see and assess operational performance and reliability at each local data center as well as at our central command center
• Asset tracking and capacity management
• Cost allocation and financial analysis to show not only how much energy is being consumed but also how that translates to dollars spent and saved
• The ability to pull information from individual data centers back to a central location using minimal resources at each facility
Each of these features was crucial to Digital Realty. While other owners and operators may share similar requirements, the point is that a successful project is always contingent on how much discipline is exercised in defining requirements in the early stages of the project—before users become enamored by the “eye candy” screens many of these products employ.
To Buy or Build?
With 451 Research’s DCIM definition—as well as Digital Realty’s business requirements—in mind, the project team could focus on delivering an information management system that would meet the needs of a broad range of user types, from operators to C-suite executives. The team wanted DCIM to bridge the gap between facilities and IT systems, thus providing data center operators with a consolidated view of the data that would meet the requirements of each user type.
The team discussed whether to buy an off-the-shelf solution or to develop one on its own. A number of solutions on the market appeared to address some of the identified business requirements, but the team was unable to find a single solution that had the flexibility and scalability required to support all of Digital Realty’s operational requirements. The team concluded it would be necessary to develop a custom solution.
Avoiding Unnecessary Risk
There is significant debate in the industry about whether DCIM systems should have control functionality—i.e., the ability to change the state of IT, electrical and mechanical infrastructure systems. Digital Realty strongly disagrees with the idea of incorporating this capability into a DCIM platform. By its very definition, DCIM is an information management system. To be effective, this system needs to be accessible to a broad array of users. In our view, granting broad access to a platform that could alter the state of mission-critical systems would be careless, despite security provisions that would be incorporated into the platform.
While Digital Realty and the project team excluded direct-control functionality from its DCIM requirements, they saw that real-time data collection and analytics could be beneficial to various control-system schemas within the data center environment. Because of this potential benefit, the project team took great care to allow for seamless data exchange between the core database platform and other systems. This feature will enable the DCIM platform to exchange data with discrete control subsystems in situations where the function would be beneficial. Further, making the DCIM a true browser-based application would allow authorized users to call up any web-accessible control system or device from within the application. These users could then key in the additional security credentials of that system and have full access to it from within the DCIM platform. Digital Realty believes this strategy fully leverages the data without compromising security.
The Challenge of Data Scale
Managing the volume of data generated by a DCIM is among the most misunderstood areas of DCIM development and application. A DCIM platform collects, analyzes and stores a truly immense volume of data. Even a relatively small data center generates staggering amounts of information—billions of annual data transactions—that few systems can adequately support. By contrast, most building management systems (BMS) have very limited capability to manage significant amounts of historical data for the purposes of defining ongoing operational performance and trends.
Consider a data center with a 10,000-ft2 data hall and a traditional BMS that monitors a few thousand data points associated mainly with the mechanical and electrical infrastructure. This system communicates in near real time with devices in the data center to provide control- and alarm-monitoring functions. However, the information streams are rarely collected. Instead they are discarded after being acted on. Most of the time, in fact, the information never leaves the various controllers distributed throughout the facility. Data are collected and stored at the server for a period of time only when an operator chooses to manually initiate a trend routine.
If the facility operators were to add an effective DCIM to the facility, it would be able to collect much more data. In addition to the mechanical and electrical data, the DCIM could collect power and cooling data at the IT rack level and for each power circuit supporting the IT devices. The DCIM could also include detailed information about the IT devices installed in the racks. Depending on the type and amount desired, data collection could easily require 10,000 points.
But the challenge facing this facility operator is even more complex. In order to evaluate performance trends, all the data would need to be collected, analyzed and stored for future reference. If the DCIM were to collect and store a value for each data point for each minute of operation, it would have more than five billion transactions per year. And this would be just the data coming in. Once collected, the five billion transactions would have to be sorted, combined and analyzed to produce meaningful output. Few, if any, of the existing technologies installed in a typical data center have the ability to manage this volume of information. In the real world, Digital Realty is trying to accomplish this same goal across its entire global portfolio.
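As a quick back-of-envelope check of the figure above, the sketch below multiplies the example facility’s point count by one reading per minute; the per-minute sampling rate and the hypothetical 100-site scale-up are assumptions for illustration, not details of Digital Realty’s system.

```python
# Back-of-envelope check of the transaction volume described above:
# 10,000 points sampled once per minute (an assumed rate) for one year.

POINTS = 10_000
MINUTES_PER_YEAR = 60 * 24 * 365                 # 525,600

readings_per_year = POINTS * MINUTES_PER_YEAR
print(f"{readings_per_year:,} readings per year")            # 5,256,000,000 -- over five billion

# Hypothetically scaled across a 100-site portfolio:
print(f"{readings_per_year * 100:,} readings per year, portfolio-wide")
```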
The Three Silos of DCIM
As Digital Realty’s project team examined the process of developing a DCIM platform, it found that the challenge included three distinct silos of data functionality: the engine for collection, the logical structures for analysis and the reporting interface.
Figure 1. Digital Realty’s view of the DCIM stack.
The engine of Digital Realty’s DCIM must reach out and collect vast quantities of data from the company’s entire portfolio (see Figure 1). The platform will need to connect to all the sites and all the systems within these sites to gather information. This challenge requires a great deal of expertise in the communication protocols of these systems. In some instances, accomplishing this goal will require “cracking” data formats that have historically stranded data within local systems. Once collected, the data must be checked for integrity and packaged for reliable transmission to the central data store.
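A minimal sketch of that collect, check, and package step follows; the point identifiers, plausibility limits, and JSON batch format are illustrative assumptions, not details of Digital Realty’s engine.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
import json

@dataclass
class Reading:
    point_id: str      # hypothetical identifier, e.g. "SITE01.PDU2.KW"
    value: float
    timestamp: datetime

def is_valid(reading: Reading, lo: float, hi: float) -> bool:
    """Basic integrity check: value within a plausible range and timestamped."""
    return lo <= reading.value <= hi and reading.timestamp is not None

def package(readings: list[Reading]) -> bytes:
    """Package validated readings as a JSON batch for transmission to the data store."""
    batch = [
        {"point": r.point_id, "value": r.value, "ts": r.timestamp.isoformat()}
        for r in readings
    ]
    return json.dumps(batch).encode("utf-8")

# Usage: validate readings, then queue a batch for the central data store.
raw = [Reading("SITE01.PDU2.KW", 42.7, datetime.now(timezone.utc))]
clean = [r for r in raw if is_valid(r, lo=0.0, hi=500.0)]
payload = package(clean)
```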
The project team also faced the challenge of creating the logical data structures needed to process, analyze and archive the data once the DCIM has successfully accessed and transmitted the raw data from each location to the data store. Dealing with 100-plus data centers, often with hundreds of thousands of square feet of white space each, increases the scale of the challenge exponentially. The project team overcame a major hurdle in addressing this challenge when it was able to define relationships between various data categories that allowed the database developers to prebuild and then volume-test data structures to ensure they were up to the challenge.
These data structures, or “data hierarchies” as Digital Realty’s internal team refers to them, are the “secret sauce” of the solution (see Figure 2). Many of the traditional monitoring and control systems in the marketplace require a massive amount of site-level point mapping that is often field-determined by local installation technicians. These points are then manually associated with the formulas necessary to process the data. This manual work is why these projects often take much longer to deploy and can be difficult to commission as mistakes are flushed out.
Figure 2. Digital Realty mapped all the information sources and their characteristics as a step toward developing its DCIM.
In this solution, these data relationships have been predefined and are built into the core database from the start. Since this solution is targeted specifically to a data center operation, the project team was able to identify a series of data relationships, or hierarchies, that can be applied to any data center topology and still hold true.
For example, an IT application such as an email platform will always be installed on some type of IT device or devices. These devices will always be installed in some type of rack or footprint in a data room. The data room will always be located on a floor, the floor will always be located in a building, the building in a campus or region, and so on, up to the global view. The type of architectural or infrastructure design doesn’t matter; the relationship will always be fixed.
The challenge is defining a series of these hierarchies that always test true, regardless of the design type. Once defined, the hierarchies can be prebuilt, tested for validity and optimized to handle scale. There are many opportunities for these kinds of hierarchies, and this is exactly what we have done.
Having these structures in place facilitates rapid deployment and minimizes data errors. It also streamlines the dashboard analytics and reporting capabilities, as the project team was able to define specific data requirements and relationships and then point the dashboard or report at the layer of the hierarchy to be analyzed. For example, a single report template designed to look at IT assets can be developed and optimized and would then rapidly return accurate values based on where the report was pointed. If pointed at the rack level, the report would show all the IT assets in the rack; if pointed at the room level, the report would show all the assets in the room, and so on. Since all the locations are brought into a common predefined database, the query will always yield an apples-to-apples comparison regardless of any unique topologies existing at specific sites.
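The sketch below illustrates the idea of pointing one query at different layers of a predefined hierarchy; the five-level hierarchy, field names, and sample assets are assumptions for illustration, not Digital Realty’s schema.

```python
# Illustrative only: a toy asset hierarchy (region > building > room > rack > device)
# and a single query function that can be "pointed" at any level, echoing the
# report-template idea described above.

assets = [
    {"device": "srv-001", "rack": "R101", "room": "DH1", "building": "B1", "region": "NA"},
    {"device": "srv-002", "rack": "R101", "room": "DH1", "building": "B1", "region": "NA"},
    {"device": "srv-003", "rack": "R205", "room": "DH2", "building": "B1", "region": "NA"},
]

def assets_at(level: str, name: str) -> list[str]:
    """Return all devices under the given node of the hierarchy."""
    return [a["device"] for a in assets if a[level] == name]

print(assets_at("rack", "R101"))    # ['srv-001', 'srv-002']
print(assets_at("room", "DH1"))     # ['srv-001', 'srv-002']
print(assets_at("building", "B1"))  # all three devices
```

Because every site is loaded into the same predefined structure, the same query returns comparable results wherever it is pointed, which is the apples-to-apples behavior described above.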
Figure 3. Structure and analysis as well as web-based access were important functions.
Last remains the challenge of creating the user interface, or front end, for the system. There is no point in collecting and processing the data if operators and customers can’t easily access it. A core requirement was that the front end needed to be a true browser-based application. Terms like “web-based” or “web-enabled” are often used in the control industry to disguise the user interface limitations of existing systems. Often to achieve some of the latest visual and 3-D effects, vendors will require the user’s workstation to be configured with a variety of thin-client applications. In some cases, full-blown applications have to be installed. For Digital Realty, installing add-ins on workstations would be impractical given the number of potential users of the platform. In addition, in many cases, customers would reject these installs due to security concerns. A true browser-based application requires only a standard computer configuration, a browser and the correct security credentials (see Figure 3).
Intuitive navigation is another key user interface requirement. A user should need very little training to get to the information they need. Further, the information should be displayed in a way that ensures quick and accurate assessment of the data.
Digital Realty’s DCIM Solution
Digital Realty set out to build and deploy a custom DCIM platform to meet all these requirements. Rollout commenced in May 2013, and as of August, the core team was ahead of schedule in terms of implementing the DCIM solution across the company’s global portfolio of data centers.
The name EnVision reflects the platform’s ability to look at data from different user perspectives. Digital Realty developed EnVision to allow its operators and customers insight into their operating environments and also to offer unique features specifically targeted to colocation customers. EnVision provides Digital Realty with vastly increased visibility into its data center operations as well as the ability to analyze information so it is digestible and actionable. It has a user interface with data displays and reports that are tailored to operators. Finally, it has access to historical and predictive data.
In addition, EnVision provides a global perspective allowing high-level and granular views across sites and regions. It solves the stranded data issue by reaching across all relevant data stores on the facilities and IT sides to provide a comprehensive and consolidated view of data center operations. EnVision is built on an enterprise-class database platform that allows for unlimited data scaling and analysis and provides intuitive visuals and data representations, comprehensive analytics, dashboard and reporting capabilities from an operator’s perspective.
Trillions of data points will be collected and processed by true browser-based software that is deployed on high-availability network architecture. The data collection engine offers real-time, high-speed and high-volume data collection and analytics across multiple systems and protocols. Furthermore, reporting and dashboard capabilities offer visualization of the interaction between systems and equipment.
Executing the Rollout
A project of this scale requires a broad range of skill sets to execute successfully. IT specialists must build and operate the high-availability compute infrastructure that the core platform sits on. Network specialists define the data transport mechanisms from each location.
Control specialists create the data integration for the various systems and data sources. Others assess the available data at each facility, determine where gaps exist and define the best methods and systems to fill those gaps.
The project team’s approach was to create and install the core, head-end compute architecture using a high-availability model and then to target several representative facilities for proof of concept. This allowed the team of specialists to work out the installation and configuration challenges and then to build a template so that Digital Realty could repeat the process successfully at other facilities. With the process validated, the program moved on to the full rollout phase, with multiple teams executing across the company’s portfolio.
Even as Digital Realty deploys version 1.0 of the platform, a separate development team continues to refine the user interface with the addition of reports, dashboards and other functions and features. Version 2.0 of the platform is expected in early 2014, and will feature an entirely new user interface, with even more powerful dashboard and reporting capabilities, dynamically configurable views and enhanced IT asset management capabilities.
The project has been daunting, but the team at Digital Realty believes the rollout of the EnVision DCIM platform will set a new standard of operational transparency, further bridging the gap between facilities and IT systems and allowing operators to drive performance into every aspect of a data center operation.
David Schirmacher
David Schirmacher is senior vice president of Portfolio Operations at Digital Realty, where he is responsible for overseeing the company’s global property operations as well as technical operations, customer service and security functions. He joined Digital Realty in January 2012. His more than 30 years of relevant experience includes turns as principal and Chief Strategy Officer for FieldView Solutions, where he focused on driving data center operational performance; and vice president, global head of Engineering for Goldman Sachs, where he focused on developing data center strategy and IT infrastructure for the company’s headquarters, trading floor, branch offices and data center facilities around the world. Mr. Schirmacher also held senior executive and technical positions at Compass Management and Leasing and Jones Lang LaSalle. Considered a thought leader within the data center industry, Mr. Schirmacher is president of 7×24 Exchange International, and he has served on the technical advisory board of Mission Critical.
Turning back the clock at Barclays Americas’ data centers
By Jan van Halem and Frances Cabrera
Barclays is a major global financial services provider engaged in personal banking, credit cards, corporate and investment banking, and wealth and investment management with an extensive international presence in Europe, the Americas, Africa, and Asia. Barclays has two major data centers in the northeastern United States with production environments that support the Americas region’s operations (see Figure 1). Barclays Corporate Real Estate Solutions (CRES) Engineering team manages the data centers, in close partnership with the Global Technology and Information Services (GTIS) team.
Both of Barclays Americas’ data centers are approaching 7 years old but are not showing their age, at least not energy-wise. For the last 3 years, Barclays Americas’ engineering and IT teams have pursued an energy-efficiency program to ensure that the company’s data center portfolio continues to operate even more efficiently than when it was originally commissioned.
Figure 1. Comparison of DC 1 and DC 2
By 2013, Barclays Americas’ two data centers had reduced their energy consumption by 8,000 megawatt-hours (MWh), or 8%, which equates to 3,700 tons of carbon emissions avoided. In addition, the power usage effectiveness (PUE) of the largest data center dropped from an annual average of 1.63 to 1.54, earning it an Energy Star certification for 2013.
The Barclays Americas team pinpointed the following strategies for a three-pronged attack on energy inefficiency:
Airflow management
Enhancement of cooling operations
Variable frequency drive (VFD) installations on computer room air conditioning (CRAC) units and pumps
The goals were to:
Reduce total cooling
Enhance heat transfer efficiencies
Reduce cooling losses (i.e., short cycling of air)
Figure 2. Summary of initiatives implemented across the two data centers: 2010-present.
The team found considerable savings without having to make large capital investments or use complex solutions that would threaten the live data center environment (see Figure 2). The savings opportunities were all identified, tested, implemented, and validated using in-house resources with collaboration between the engineering and IT teams.
Background
In 2010, Barclays launched a Climate Action Programme, which has since been expanded into its 2015 Citizenship Plan. The program included a carbon reduction target of 4% by the end of 2013 from 2010 levels. The target spurred action among the regions to identify and implement energy conservation measures and created a favorable culture for the funding of initiatives. Due to energy initiatives like the data center programs put in place in the Americas, Barclays reduced its carbon emissions globally by 12% in 2012, meeting the target two years early. Barclays is now setting its sights on a further carbon reduction of 10% by 2015.
From the beginning, it was understood that any new data center energy initiatives in the Americas must take into account both IT and building operation considerations. CRES and GTIS worked together on research, testing, and mocking up these initiatives. Their findings were shared with the central teams at headquarters so that case studies could be developed to ensure that the successes can be used at other sites around the world.
Blowing Back the Years
Initially, the team focused on airflow management. The team successfully reduced CRAC unit fan speeds at DC 1 from 70% to 40% by raising temperature set points as recommended by ASHRAE TC9.9, replacing unneeded perforated tiles with solid tiles, and installing Cold Aisle containment. All these changes were implemented with no impact to equipment operation.
The first step, and the easiest, was to evaluate the initial fan speed of the CRAC units at DC 1 and reduce it from 70% to 58%, with no effect on performance. When the DCs were commissioned, they briefly operated at 55°F (13°C), which was quickly increased to 65°F (18°C). By 2010, the temperature set points in the Cold Aisles had been raised from 65°F (18°C) to 75°F (24°C) in both DC 1 and DC 2, as recommended by the updated ASHRAE standard for data center cooling.
Next, Barclays’ teams turned to the perforated tiles. Perforated tiles in DC 1 were replaced with solid tiles. This reduced CRAC fan speeds from 58% to 50%, with no impact on Cold Aisle temperature conditions and equipment operation.
Finally, in 2011, the PDU aisles were retrofitted with containment doors, which further improved airflow efficiency. After the site teams researched various options and recommended a commercial product, an in-house crew completed the installation. The team opted to use in-house personnel to avoid having external contractors work in a live environment; the IT operations team felt comfortable having trusted engineering staff in the server rooms. The engineering staff used its expertise and common sense to install the doors with no disturbance to the end users.
Figure 3. Energy savings attributed to cooling enhancements
The team chose not to extend the containment above the aisles. Using data from wireless temperature and humidity sensors located throughout the aisles, the team found that it could still achieve ~80% of the projected energy savings without extending the containment to the ceiling, while avoiding the additional cost and potential issues with local fire codes.
With the doors installed, the teams continue monitoring temperatures through the sensors and can control airflow by adjusting CRAC fan speeds, which regulates the amount of supply air in the Cold Aisle to minimize bypass airflow and ensure efficient air distribution. As a result, return air temperature increased, driving efficiency and allowing further fan speed reductions from 50% to 40%. Again, these reductions were achieved with no impact to operating conditions. Based on these successes, Cold Aisle containment was installed at DC 2 in 2014.
Figure 4. Available free cooling at DC 1 and DC 2
A curve based on manufacturer’s data and direct power readings for the CRAC units enabled the teams to calculate the power reductions associated with volume flow rate reductions. Based on this curve, they calculated that the airflow initiatives save 3.6 gigawatt-hours (GWh) of energy annually across DC 1 and DC 2 (see Figure 3).
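The sketch below illustrates this kind of estimate: fit a simple power-versus-speed curve to spot readings, then evaluate it at the old and new operating points. The readings, fleet size, and run hours are assumptions chosen only to land in the same ballpark as the reported savings; they are not Barclays’ data.

```python
# Sketch of the estimating approach described above, with made-up inputs.
import numpy as np

speed_fraction = np.array([0.40, 0.50, 0.58, 0.70, 1.00])   # fan speed (fraction of full)
measured_kw    = np.array([0.7,  1.3,  2.0,  3.4,  10.0])   # assumed per-unit power readings

def unit_power_kw(speed: float) -> float:
    """Interpolate per-unit fan power from the measured curve."""
    return float(np.interp(speed, speed_fraction, measured_kw))

saving_kw_per_unit = unit_power_kw(0.70) - unit_power_kw(0.40)
annual_mwh = saving_kw_per_unit * 150 * 8760 / 1000          # assumed fleet of 150 units, 8,760 h/yr
print(f"~{annual_mwh:,.0f} MWh/year")                        # ~3,500 MWh with these assumptions
```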
Staying Cool
DC 1’s primary cooling system is a water-cooled plant consisting of two 1,250-ton chillers and five cooling towers. DC 1 was designed to utilize free cooling technology that feeds condenser water from cooling towers into a heat exchanger. The heat exchanger then cools the building’s chilled water. In 2011, the CRES Engineering team embarked on an initiative to maximize the use of free cooling.
After reviewing operational parameters such as temperature, water flow, and cooling tower fan speeds, the Barclays team made the following adjustments:
Adding additional units in series to slow the water pumped through the heat exchangers. Making the water pass through two units instead of one increased the efficiency of the heat exchange.
Running two cooling tower fans at half the speed of one fan to reduce power demand, based on analysis of data from an electrical power monitoring system (EPMS). As a result, the volume of condenser water was divided among multiple cooling tower units instead of all going into one.
Increasing the chilled water temperature by 4°F (2°C) between 2011 and today, which expanded the period of time that free cooling is possible (see Figure 4).
Of the three strategies, the energy savings from enhanced cooling operations are the hardest to measure and attribute directly, as the changes affect several parts of the cooling plant. Barclays used the EPMS to track power readings throughout the systems, particularly the cooling tower units. The EPMS enables PUE trending and shows the reduction of DC 1’s PUE over time. Since 2011, it has dropped 5% to an annual average of 1.54.
Driving Back Inefficiency
Figure 5. Savings attributed to VFDs
In 2013 the teams began an intensive review of VFD technology. They found that considerable energy savings could be obtained by installing VFDs on several pieces of equipment, such as air handlers in plant rooms and condenser water pumps, in both DC 1 and DC 2. The VFDs control the speed of an existing AC induction motor. By reducing the speed of the air handlers, the unit’s output can be matched to the existing heat load (see Figure 5).
The team focused on 36 30-ton units throughout DC 1 that would yield positive energy and cost savings. Utility rebates further enhanced the business case. The VFDs were installed toward the end of 2013, and the business case was applied to DC 2 for further installations in 2014.
Figure 6. Savings attributed to frequency reductions on AC motors
To calculate savings, direct power readings were taken at the CRAC units at intervals of 2 hertz (Hz) from 60 Hz to 30 Hz. As shown in Figure 6, reducing CRAC frequency from 60 Hz to 56 Hz reduced power demand by 19%. In addition, the fan motor releases less heat to the air, further reducing the cooling load.
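As a plausibility check (not Barclays’ method, which relied on direct readings), the fan affinity laws predict roughly the same figure if fan power is assumed to scale with the cube of frequency:

```python
# Fan affinity law check: power ~ frequency^3.
reduction = 1 - (56 / 60) ** 3
print(f"Predicted power reduction: {reduction:.1%}")   # ~18.7%, consistent with the measured ~19%
```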
Additional maintenance cost savings are achieved through the extension of the filter replacement cycle. The VFDs allow less air to pass through the system, increasing the life span of the filters. Once fully implemented, the VFD installations will save over 3.9 GWh of energy.
Data as the Fountain of Youth
Comprehensive monitoring systems in place across the two data centers provided data accessible by both the GTIS and CRES teams, enabling them to make data-driven decisions. The sites’ EPMS and branch circuit monitoring system (BCMS) enable the teams to pinpoint the areas with the greatest energy-saving potential and work together to trial and implement initiatives.
Barclays uses the EPMS as a tool to monitor, measure, and optimize the performance of the electrical loads. It monitors critical systems such as HVAC equipment and electrical switchgear. A dashboard displays trend data. For example, the team can trend the total UPS load and the total building load, which together yield PUE, on a continuous, real-time basis.
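A minimal sketch of that calculation follows; the load readings are placeholders, and a real-time trend like this uses instantaneous power readings rather than the annual energy totals behind a formal PUE figure.

```python
# Minimal PUE trending sketch: PUE = total facility (building) power / IT (UPS output) power.
# The readings below are placeholders, not Barclays' data.

samples = [
    {"building_kw": 2450.0, "ups_kw": 1590.0},
    {"building_kw": 2380.0, "ups_kw": 1585.0},
    {"building_kw": 2410.0, "ups_kw": 1600.0},
]

pue_trend = [s["building_kw"] / s["ups_kw"] for s in samples]
print([round(p, 2) for p in pue_trend])                    # e.g. [1.54, 1.5, 1.51]
print(f"average PUE: {sum(pue_trend) / len(pue_trend):.2f}")
```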
In addition to the EPMS, the CRES and GTIS teams also use the BCMS to track energy use by server cabinet, remote power panel, and power distribution unit. This system is used for capacity planning and load balancing. In addition, the monitored cabinet loads are used to optimize the airflow in the Cold Aisles.
Conclusion
With the right level of collaboration between CRES and GTIS, the right data, and the right corporate environmental targets, the Barclays Americas’ team was able to find energy and cost savings in their data centers. By executing airflow management, enhanced cooling, and VFD strategies in the existing data centers, the team applied the latest standards and best practices to keep energy consumption at levels typical of new data centers. At 8 GWh lighter, with an Energy Star certification and a PUE that keeps dropping—these data centers are not showing their energy age.
Jan van Halem is vice president, Data Center Engineering at Barclays. He joined Barclays’ engineering team in 2004. Mr. van Halem has more than 20 years experience and a strong knowledge of critical data center systems and mechanical operations. At Barclays, he is responsible for the mechanical and electrical systems of the company’s major data centers in the Americas. In addition, he has provided engineering design to new construction and expansion projects in the region.
Before joining Barclays, Mr. van Halem was with real estate organizations and held positions in facility management, construction management, and project management. Mr. van Halem has a BS degree in Marine Engineering from the Maritime Institute De Ruyter in Flushing, the Netherlands. He served as a marine engineer for 8 years in the Dutch Merchant Marine.
Frances Cabrera, LEED AP, is vice president, Environmental Management at Barclays. She joined Barclays’ environmental management team in 2011. Ms. Cabrera oversees Barclays Americas’ environmental programs, both resource saving and compliance, to support the region’s ISO certified management system. With the collaboration of the corporate real estate team, the region has achieved multiple LEED and Energy Star certifications. She’s also part of the firm’s global center of excellence for environment, where she works across regions to measure and support the firm’s green IT achievements.
Before joining Barclays, Ms. Cabrera ran the ISO 14001 systems in North and South America for Canon USA and worked at various manufacturing companies in Rochester, NY, integrating environmental considerations into their operations and meeting regulations. Ms. Cabrera has a BS degree in Environmental Technology and a MS degree in Environmental, Health, and Safety Management from the Rochester Institute of Technology.
Achieving Concurrently Maintainable and Fault Tolerant cooling using various close coupled cooling technologies
By Matt Mescall
Early mainframe computers were cooled by water at the chip level. Then, as computing moved to the distributed server model, air replaced water. Typically, data centers included perimeter computer room air conditioning (CRAC) units to supply cold air to a raised floor plenum and perforated floor tiles to deliver it to IT equipment. These CRAC units were either direct-expansion (DX) or chilled-water units (for simplicity, CRAC will be used to refer to either kind of unit). This arrangement worked for the past few decades while data centers were primarily occupied by low-density IT equipment (< 2-4 kilowatts [kW] per rack). However, as high-density racks become more common, CRAC units and a raised floor may not provide adequate cooling.
To address these situations, data center cooling vendors developed close coupled cooling (CCC). CCC technology includes in-row, in-rack, above-rack, and rear-door heat exchanger (RDHx) systems. Manufacturers typically recommend the use of a Cold Aisle/Hot Aisle arrangement for greater efficiency, which is a best practice for all data center operations. As rack density increased due to IT consolidation and virtualization, CCC moved from being a solution to an unusual cooling situation to being the preferred cooling method. Implemented properly, a CCC solution can meet the Concurrently Maintainable and Fault Tolerant requirements of a data center.
This paper assumes that, while an air handler may provide humidity control, the close coupled cooling solution provides the only cooling for the IT equipment in the data center. It is also assumed that the reader understands how to design a direct-expansion or chilled-water CRAC-based cooling system to meet Concurrent Maintainability and Fault Tolerant requirements. This paper does not address Concurrent Maintainability and Fault Tolerant requirements for a central cooling plant, only the CCC system in the data hall.
Meeting Concurrently Maintainable and Fault Tolerant Requirements
First, let’s clarify what is required for a Concurrently Maintainable (Tier III) and a Fault Tolerant (Tier IV) cooling system. This discussion is not a comprehensive description of all Concurrently Maintainable and Fault Tolerant requirements, but it provides the basis for the rest of the discussion in this paper.
A Concurrently Maintainable system must have redundant capacity components and independent distribution paths, which means that each and every capacity component and distribution path element can be taken out of service for maintenance, repair, or replacement without impacting the critical environment.
To meet this requirement, the system must have dry pipes (no flowing or pressurized liquid) to prevent liquid spills when maintaining pipes, joints, and valves. Draining a pipe while it is disassembled is allowed, but hot tapping and pipe freezing are not. A Fault Tolerant cooling system may look like a Concurrently Maintainable system, but it must also respond autonomously to failures, provide Continuous Cooling, and compartmentalize the chilled-water and/or refrigerant pipes outside the room of use (typically the computer room).
There are several different types and configurations of CCC. For simplicity, this paper will break them into two groups: in-row and above-row units, and RDHx units. While there are other CCC solutions available, the same concepts can be used to provide a Concurrently Maintainable or Fault Tolerant design.
In-row and above-row CCC
When data center owners have a business requirement for a high density data center to be Concurrently Maintainable or Fault Tolerant, a CCC design poses special circumstances that do not exist with room-based cooling. First, airflow must be considered. A CRAC-unit-based cooling design that is Concurrently Maintainable or Fault Tolerant has N+R cooling units that provide cooling to the whole room. When a redundant unit is off for maintenance or suffers a fault, the IT equipment still receives cooling from the remaining CRAC units via the perforated tiles in the Cold Aisle. The cooling in any Cold Aisle is not affected when the redundant unit is offline. This arrangement allows for one or two redundant CRAC units in an entire room (see Figure 1).
Figure 1. CCC airflow considerations
CCC provides cooling to the specific Cold Aisle where the unit is located. In other words, CCC units cannot provide cooling to different Cold Aisles the way CRAC units can. Accordingly, the redundant CCC unit must be located in the aisle where the cooling is required. In addition to having sufficient redundant cooling in every Cold Aisle, the distance from the cooling unit to the IT equipment must also be considered. In-row and above-row cooling units typically can provide cold air for only a limited distance. The design must take into account the worst-case scenario during maintenance or a failure event.
After considering the number of units and their location in the Cold Aisle, the design team must consider the method of cooling, which may be air-cooled direct expansion (DX), chilled water, or a pumped refrigerant. Air-cooled DX units are typically matched with their own condenser units. Other than proper routing, piping for air-cooled DX units requires no special considerations.
Piping to chilled-water units is either traditional chilled-water piping or a cooling distribution unit (CDU). In the former method chilled water is piped directly to CCC units, similar to CRAC units. In this case, chilled-water piping systems are designed to be Concurrently Maintainable or Fault Tolerant in the same way as single-coil, room-based CRAC units.
The latter method, which uses CDUs, poses a number of special considerations. Again, chilled-water piping to a CDU and to single-coil, room-based CRAC units is designed to be Concurrently Maintainable or Fault Tolerant in the same way. However, designers must consider the impact to each Cold Aisle when a CDU is removed from service or suffers a fault.
If any single CDU provides cooling to more than the redundant number of cooling units in any aisle, the design is not Concurrently Maintainable or Fault Tolerant. When CDUs are located outside of the server room or data hall in a Fault Tolerant design, they must be properly compartmentalized so that a single event does not remove more than the redundant number of cooling units from service. A Fault Tolerant system also requires Continuous Cooling, the ability to detect, isolate, and contain a fault, and sustain operations. In a CCC system that rejects heat to a chilled-water system, the mechanical part of Continuous Cooling can be met with an appropriate thermal storage tank system that is part of a central plant.
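As a rough illustration of the thermal storage approach mentioned above, the sketch below estimates ride-through time for a chilled-water storage tank; every input is an assumed placeholder, not a value from this paper or any specific design.

```python
# Rough ride-through estimate for a chilled-water thermal storage tank of the
# kind mentioned above. Every input below is an assumed placeholder.

COOLING_LOAD_KW = 1500.0       # assumed critical heat load served by the CCC system
USABLE_DELTA_T_C = 8.0         # assumed usable chilled-water temperature rise
TANK_VOLUME_L = 200_000        # assumed tank volume in litres (~1 kg of water per litre)
CP_WATER_KJ_PER_KG_K = 4.186   # specific heat of water

stored_energy_kj = TANK_VOLUME_L * CP_WATER_KJ_PER_KG_K * USABLE_DELTA_T_C
ride_through_minutes = stored_energy_kj / COOLING_LOAD_KW / 60
print(f"~{ride_through_minutes:.0f} minutes of Continuous Cooling ride-through")   # ~74 min here
```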
A CCC system that rejects heat to outside air via refrigerant and a condenser will likely rely on uninterrupted power to provide Continuous Cooling, which will be discussed in the following paragraphs.
Some CCC systems use pumped refrigerant. These systems transfer heat from pumped refrigerant to a building’s chilled-water system, a glycol system, or an external condenser unit.
Due to the similarities between chilled-water and glycol systems with respect to the piping headers, glycol and chilled-water systems will be treated the same for the purposes of this paper. The heat transfer occurs at an in-room chiller or heat exchanger that, for the purposes of this discussion, is similar to a CDU. The Concurrently Maintainable and Fault Tolerant design considerations for a pumped refrigerant system are the same as a chilled-water system that uses a CDU.
The system that powers all CCC components must be designed to ensure that the electrical system does not defeat the Concurrent Maintainability or Fault Tolerance of the mechanical system. In a Concurrently Maintainable mechanical system electrical design, no more than the redundant number of cooling units may be removed from service when any part of the electrical system is removed from service in a planned manner. This requirement includes the cooling within any aisle, not just the room as a whole. Designing the CCC units and the associated CDUs, in-room chillers, or heat exchangers in a 2N configuration greatly simplifies the electrical distribution.
Providing an A feed to half of the units and a B feed to the other half, while paying attention to the distribution of the CCC units, will typically provide a Concurrently Maintainable electrical design.
If the cooling system is in an N+R configuration, the distribution of the power sources will require special coordination. Typically, the units will be dual fed, which can be accomplished by utilizing an internal transfer switch in the units, an external manual transfer switch, or an external automatic transfer switch. This requirement applies to all components of the CCC system that require power to cool the critical space, including the in-row and above-row units, the in-room chillers, heat exchangers, and any power that is required for CDUs (see Figure 2).
Figure 2. CCC valve scenario
When any part of a Fault Tolerant electrical design for a mechanical system experiences a fault, no more than the redundant number of cooling units may be removed from service. The same Concurrently Maintainable concepts apply to a Fault Tolerant electrical system; however, all of the transfer switches must be automatic and cannot rely on human intervention to respond to a fault. Additionally, in order to provide Continuous Cooling, uninterruptible power must be provided for cooling fans, in-room chillers and heat exchangers, pumps, and CDUs. A CCC system that uses DX and condensers to reject heat to outside air will require uninterrupted power to all system components to achieve Continuous Cooling.
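The sketch below illustrates the kind of design check described in the preceding paragraphs: for each Cold Aisle, confirm that the loss of any single power source still leaves at least N cooling units with power. The aisle layouts, unit names, feeds, and the value of N are illustrative assumptions.

```python
# Per-aisle redundancy check under loss of a single power source (illustrative only).

N_REQUIRED = 2   # assumed number of units ("N") each aisle needs to stay within design limits

# Each unit is (name, set of power sources feeding it).
aisles = {
    "2N aisle":              [("CCC-1", {"A"}), ("CCC-2", {"B"}), ("CCC-3", {"A"}), ("CCC-4", {"B"})],
    "N+1 aisle, single-fed": [("CCC-5", {"A"}), ("CCC-6", {"A"}), ("CCC-7", {"B"})],
    "N+1 aisle, dual-fed":   [("CCC-8", {"A", "B"}), ("CCC-9", {"A", "B"}), ("CCC-10", {"A", "B"})],
}

def survives_loss_of(source: str, units) -> bool:
    """True if at least N_REQUIRED units still have a live feed after losing `source`."""
    return sum(1 for _, feeds in units if feeds - {source}) >= N_REQUIRED

for name, units in aisles.items():
    sources = {s for _, feeds in units for s in feeds}
    ok = all(survives_loss_of(src, units) for src in sources)
    print(f"{name}: {'OK' if ok else 'removes more than the redundant number of units'}")

# The single-fed N+1 aisle fails when source A is lost, illustrating why N+R
# configurations typically require dual-fed units (via transfer switches).
```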
The controls for these systems must also be considered in the design and meet the appropriate Concurrent Maintainability and Fault Tolerant requirements.
RDHx
The requirements for a Concurrently Maintainable or Fault Tolerant RDHx cooling solution are similar to those for in-row cooling. The RDHx units typically use chilled water or a pumped refrigerant and CDUs, in-room chillers, or heat exchangers. These units need to meet all of the Concurrent Maintainability or Fault Tolerant requirements of in-row CCC units. Airflow when a door is removed from service, whether for a planned event or due to a failure, is a major consideration. When an RDHx solution cools an entire data center, the racks may be arranged in a front-to-back configuration. When one or more doors are removed from service, the affected racks will blow hot exhaust air into the racks behind them, which may cause them to overheat, depending on the heat load.
This configuration does not meet Concurrent Maintainability or Fault Tolerant requirements, which require that the cooling system provide N cooling to all critical equipment during a planned maintenance event or a failure. Placing the racks in a Cold Aisle/Hot Aisle configuration may not meet this requirement as exhaust air from the affected rack may circulate over its top from the Hot Aisle and overheat the servers at the top of the rack and possibly adjacent racks. The same airflow issue is possible for racks placed at the end of rows when their RDHx is not working.
Summary
Using CCC as the only form of cooling in the data center is becoming more common. CCC presents challenges to meeting Concurrent Maintainability and Fault Tolerant requirements beyond those typically experienced with a CRAC-based cooling system: the airflow behaves differently than with room-based CRAC units, and the consequential impact of maintenance and failures on the additional capacity components and distribution systems must not remove more than the redundant number of units from service. These challenges can be met with careful consideration when designing all parts of the CCC system.
Matt Mescall
Matthew Mescall, PE, is a senior consultant for Uptime Institute Professional Services and Tier Certification Authority, where he performs audits and provides strategic-level consulting and Tier Certification reviews. Mr. Mescall’s career in critical facilities spans 12 years and includes responsibilities in planning, engineering, design, construction, and operation. Before joining Uptime Institute, Mr. Mescall was with IBM, where he operated its Boulder, CO, data center and led a worldwide team analyzing best practices across IBM data centers to ensure consistent, cost-effective reliability. Mr. Mescall holds a BS degree in Civil Engineering from the University of Southern California, an MS in Civil Engineering from the Georgia Institute of Technology, and a Masters Certificate in Project Management from George Washington University.
The fourth annual Uptime Institute Data Center Industry Survey provides an overview of global industry trends by surveying 1,000 data center operators and IT practitioners. Uptime Institute collected responses via email February through April 2014 and presented preliminary results in May 2014 at the 9th Uptime Institute Symposium: Empowering the Data Center Professional. This document provides the full results and analysis.
I. Survey Demographics
Uptime Institute focused its analysis on the end-user survey respondents. The majority of survey participants are data center managers, followed by smaller percentages of IT managers and senior executives. The U.S. and Canada make up a significant portion of the response, with growing numbers of participants from around the globe.
About half of the end-user respondents work for third-party commercial data center companies (colocation or cloud computing providers), and the other half work for enterprises in vertical industries such as financial services (11%), manufacturing (7%), healthcare (4%), government (4%), and other industries (26%).
In various sections throughout this survey, the financial services industry’s responses have been broken out from those of traditional enterprise companies. Across multiple focus areas, the responses of financial services organizations differ significantly from those of other verticals.
Previous annual surveys in 2011, 2012, and 2013 showed that large organizations (defined in this context as companies managing over 5,000 servers) were adopting new technologies, outsourcing less-critical workloads, and pursuing energy efficiency goals much faster than smaller companies.
For 2014, we analyzed the results further and found that the data center maturity gap is not just a matter of size, but is also specific to a single industry (banking). This difference is most likely due to the massive investment financial organizations have in IT, especially in relation to their overall cost structures. A financial organization’s efficiency at deploying IT correlates directly to its profitability in a way that may not be as obvious to other companies.
When the response profiles of the financial organizations and the colocation providers are compared, a pattern starts to emerge – both banks and colos run their data center operations as a business.
We will explore the implications of these parallels throughout the survey data.
II. Data center budgets
In each year’s survey, we ask participants to compare their organization’s current spending on data centers (including real estate, infrastructure equipment, staffing, operations, etc.) to the previous year. Every year, the answers to these questions reveal massive growth in the colocation and multi-tenant data center (MTDC) industry compared to enterprise spending.
In 2014, the vast majority of third-party data center respondents (86%) report receiving budget increases, versus 63% of financial firms and just 50% of the other enterprise companies. This gap is similar to the 2013 results: 77% of third-party respondents increased budget, versus just 47% of enterprise companies.
With half of all enterprises reporting stagnant or shrinking data center budgets and colocation operators reporting massive growth, our conclusion is that increasingly, enterprise data center workloads are shifting to third-party providers.
This is not to say that the enterprise data centers are going away any time soon. These organizations will continue to squeeze a return on their investments in enterprise data center assets. The value of a performing, fully or partially depreciated asset cannot be disregarded.
According to surveys from Uptime Institute’s executive programs, nearly every enterprise has chosen to host some percentage of its IT workloads off-premise, in a MTDC, cloud, or other third-party environment. Anecdotal reports to Uptime Institute lead us to believe this to be a fairly recent development (within the last 5 years).
Yet, while nearly every company is deploying off-premise computing, the percentage of workloads hosted in these environments appears to be fairly static as a percentage of overall compute. The figure below represents enterprise organizations’ IT deployment mix today, and projected deployment mix in 2014. This indicates that the growth trends in off-premise computing will continue as overall enterprise IT workloads continue to grow. This finding also indicates that there is no imminent rebound or recoil in the circular trend of outsourcing-insourcing-outsourcing commonly held in IT and other key enterprise functions such as call centers.
This report will delve into the drivers and decision-making challenges for enterprise IT organizations contracting MTDCs and cloud computing in Section IV.
III. IT Efficiency
A special focus in 2014’s survey is an assessment of the behaviors, management strategies, and technologies used to improve IT efficiency.
Over the past several years, Uptime Institute has documented the rise in adoption of Power Usage Effectiveness (PUE) and the meager gains achieved by further pursuing that metric. Based on Uptime Institute’s field experience and feedback from the Uptime Institute Network (a user group of large data center owners and operators) around the world, enterprise IT executives are overly focused on PUE.
The vast majority (72%) of respondents measure PUE. Isolating the responses by job function, a huge percentage of executives (82%) are tracking that metric and reporting it to their corporate management.
PUE is an effective engineering ratio that data center facilities teams use to capture baseline data and track the results of efficiency improvements to mechanical and electrical infrastructure. It is also useful for design teams comparing equipment- or topology-level solutions. But as industry adoption of PUE has expanded, the metric is increasingly being misused as a methodology to cut costs and prove stewardship of corporate and/or environmental resources.
In 2007, Uptime Institute surveyed its Network members, and found an average PUE of 2.50. The average PUE improved from 2.50 in 2007 to 1.89 in 2011 in Uptime Institute’s data center industry survey.
From 2011 to today, the average self-reported PUE has only improved from 1.89 to 1.7. The biggest infrastructure efficiency gains happened five years ago, and further improvements will require significant investment and effort, with increasingly diminishing returns.
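As a rough illustration of how the metric is tracked, the sketch below applies the standard PUE ratio (total facility energy divided by IT equipment energy) to the self-reported averages above. The kilowatt-hour figures are hypothetical; only the average PUE values come from the survey.

```python
# Minimal sketch: computing PUE and the improvement between survey years.
# The per-site energy figures are hypothetical; the average PUE values are
# the self-reported survey results cited above.

def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness = total facility energy / IT equipment energy."""
    return total_facility_kwh / it_equipment_kwh

# Hypothetical month for a single site: 1,700 MWh total, 1,000 MWh to IT.
print(f"Site PUE: {pue(1_700_000, 1_000_000):.2f}")  # -> 1.70

# Self-reported industry averages, showing diminishing gains over time.
survey_avg_pue = [(2007, 2.50), (2011, 1.89), (2014, 1.70)]
for (y0, p0), (y1, p1) in zip(survey_avg_pue, survey_avg_pue[1:]):
    print(f"{y0}-{y1}: {100 * (p0 - p1) / p0:.0f}% improvement")  # 24%, then 10%
```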
The figure following represents adoption of various data center cooling approaches to improve efficiency. The low-capital-cost approaches have largely been adopted by data center operators. And yet, executives press for further reductions in PUE. High-cost efficiency investments in technologies and design approaches may provide negative financial payback and zero improvement of systemic IT efficiency problems.
Many companies’ targets for PUE are far lower than their reported current state. By focusing on PUE, IT executives are spending effort and capital for diminishing returns and continue to ignore the underlying drivers of poor IT utilization.
For example, Uptime Institute estimates that 20% of servers in data centers are obsolete, outdated, or unused. Yet, very few survey respondents believe their server populations include comatose machines. Nearly half of survey respondents have no scheduled auditing to identify and remove unused hardware.
Historically, IT energy efficiency has been driven by data center facilities management. According to the Uptime Institute’s Annual Data Center Industry Survey (2011-2014), less than 20% of companies report that their IT departments pay the data center power bill, and the vast majority of companies allocate this cost to the facilities or real estate budgets.
This lopsided financial arrangement fosters unaccountable IT growth, inaccurate planning, and waste. This is why 67% of the senior IT executives believe comatose hardware is not a problem.
Uptime Institute launched the Server Roundup contest in October 2011 to raise awareness about the removal and recycling of comatose and obsolete IT equipment and reduce data center energy use. Uptime Institute invited companies around the globe to help address and solve this problem by participating in the Server Roundup, an initiative to promote IT and Facilities integration and improve data center energy efficiency.
In 2 years of participating in Server Roundup, the financial firm Barclays has removed nearly 15,000 servers and saved over US$10M. Server Roundup overwhelmingly proves that disciplined hardware decommissioning can provide a significant financial impact.
Yet despite these huge savings and intangible benefits to the overall IT organization, many firms are not applying the same level of diligence and discipline to a server decommissioning plan, as noted previously.
This is the crux of the data center efficiency challenge ahead—convincing more organizations of the massive return on investment in addressing IT instead of relentlessly pursuing physical infrastructure efficiency.
Organizations need to hold IT operations teams accountable to root out inefficiencies, of which comatose servers are only the most obvious and egregious example.
For nearly a decade, Uptime Institute has recommended enterprise IT executives take a holistic approach to significantly reduce the cost and resource consumption of compute infrastructure. That approach is outlined here.
IV. Enterprise Adoption of Third-Party Data Centers and IT Services
As stated earlier, nearly every enterprise organization is using some combination of in-house IT and off-premise computing. There are a number of drivers for this trend, including the ability to right-size deployments, lower investment costs, and get IT workloads into production quickly.
So far, enterprise organizations have largely been satisfied with their experiences using multi-tenant data center providers. In fact, in this unverified and self-reported survey, the colocation operators report fewer outages than their enterprise counterparts.
Despite many enterprise organizations currently reporting satisfaction with colocation providers, the move to off-premise computing has not always been a smooth transition. In Uptime Institute’s experience, many large enterprise organizations historically ran their own data centers and only recently started deploying into third-party sites at scale. The facilities and corporate real estate teams who are often responsible for managing these third-party relationships have limited experience with contract terms, service level agreements, pricing, and other challenges specific to an outsourced IT relationship.
In fact, the decision over whether to outsource an IT workload and where to host it typically comes from the IT department, and not the management team that ultimately holds responsibility for that contract.
The facilities managers and data center engineers are expected to become experts in third-party data center management on the fly—to learn on the job. All the while, the usage of third-party data center providers is rapidly expanding, and very few enterprises have formalized requirements for engaging with the MTDC market. A large percentage cannot track the cost of downtime for their organizations.
The vast majority of enterprise organizations are blasting workloads into off-premise computing environments, but they don’t know where they are going, or what their staff are supposed to do when they get there. Many organizations are making decisions on a very limited selection of criteria and inputs.
Ultimately, this was the primary reason Uptime Institute developed the FORCSS™ Methodology in 2012.
Uptime Institute FORCSS is a means to capture, compare, prioritize, and communicate the benefits, costs, and impacts of multiple IT deployment alternatives. Deployment alternatives may include owned/existing data centers, commercial data centers (wholesale, retail, colocation, managed service), or IaaS (including cloud) that is procured on a scale or limited basis.
FORCSS provides organizations with the flexibility to develop specific responses to varying organizational needs. A case study series will present the process of applying the FORCSS Factors to specific deployment options and present the outcome of the FORCSS Index—a concise structure that can be understood by non-IT executive management.
Enterprise companies are investing less in their own data centers and instead deploying IT into off-premise data center environments. This trend holds across the four years Uptime Institute has conducted this survey, and it is driving massive spending in the MTDC market that shows no signs of abating.
Although more IT workloads are moving to third-party providers (especially new workloads), the enterprise-owned data center will continue to be responsible for much core IT production for the foreseeable future.
Enterprise organizations are satisfied with the performance of their current MTDC providers, but very few companies have the expertise or processes in place yet to manage or even make the most effective decisions about off-premise computing options.
As noted in previous years, IT efficiency efforts have largely been limited to data center facilities management and design teams. Very little work has been done to address the systemic IT inefficiencies that have plagued the industry for nearly a decade. But as senior executives push for more improvements in efficiency, many will realize they are running out of return on investment; hopefully, they will turn to improving IT utilization.
A large majority (75%) of survey participants said the data center industry needs a new energy efficiency metric.
Appendix
Additional 2014 survey responses:
i. If your organization has adopted Cold Aisle or Hot Aisle containment, approximately what percentage of your cabinets uses this design?
a. Less than 10% contained: 22%
b. 10-25% contained: 13%
c. 25-50% contained: 12%
d. 50% contained: 7%
e. 50-75% contained: 16%
f. 75-100% contained: 30%
ii. Would your organization consider a data center that did not include the following designs/technologies?
a. Raised floor: 52% yes
b. Mechanical cooling: 24% yes
c. Generator: 8% yes
d. Uninterruptible power supply: 7% yes
iii. Does management receive reports on data center energy costs?
a. Yes: 71%
b. No: 29%
iv. Does management set targets for reducing data center energy costs?
a. Yes: 54%
b. No: 46%
v. How does your organization measure PUE?
a. PUE Category 0: 30%
b. PUE Category 1: 25%
c. PUE Category 2: 19%
d. PUE Category 3: 11%
e. Alternative method: 8%
f. Don’t know: 7%
vi. Does your company report PUE publicly?
a. Yes: 10%
b. No: 90%
vii. Has your organization achieved environmental or sustainability certifications for any of its data centers?
a. Colo/MTDC: 35% yes
b. Financial Services: 46% yes
c. Other Enterprises: 21% yes
viii. Considering your company’s primary multi-tenant or colocation provider, what is the length of the commitment you have made to that provider?
a. Under 2 years
i. Financial Services: 28%
ii. Other Enterprise: 36%
b. 2-3 years
i. Financial Services: 11%
ii. Other Enterprise: 22%
c. 3-5 years
i. Financial Services: 30%
ii. Other Enterprise: 21%
d. Over 5 years
i. Financial Services: 32%
ii. Other Enterprise: 21%
ix. If your organization measures the cost of data center downtime, how do you use that information?
a. Report to management: 88%
b. Rationalize equipment purchases: 51%
c. Rationalize services purchases: 42%
d. Rationalize increased staff or staff training: 39%
e. Rationalize software purchases: 32%
x. Does your organization perform unscheduled drills that simulate data center emergencies?
a. Yes: 44%
b. No: 56%
xi. Considering your organization’s largest enterprise data center, what staffing model is used for facilities staff?
a. 24 hours a day, 7 days a week: 70%
b. Other: 30%
Email Uptime Institute Director of Content and Publications Matt Stansberry with any questions or feedback: [email protected].
This paper provides analysis and commentary of the Uptime Institute survey responses. Uptime Institute makes reasonable efforts to facilitate a survey that is reliable and relevant. All participant responses are assumed to be in good faith. Uptime Institute does not verify or endorse the responses of the participants; any claims to savings or benefits are entirely the representations of the survey participants.
Data Center Cooling: CRAC/CRAH redundancy, capacity, and selection metrics
Striking the appropriate balance between cost and reliability is a business decision that requires metrics
By Dr. Hussein Shehata
This paper focuses on cooling limitations of down-flow computer room air conditioners/air handlers (CRACs/CRAHs) with dedicated heat extraction solutions in high-density data center cooling applications. The paper also explains how higher redundancy can increase total cost of ownership (TCO) while supporting only very light loads and proposes a metric to help balance the requirements of achieving higher capacities and efficient space utilization.
With several vendors proposing passive high-density technologies (e.g., cabinet hot air removal as a total resolution to the challenge of high density), this analysis shows that such solutions are only possible for a select few cabinets in each row and not for full deployments.
The vendors claim that the technologies can remove heat loads exceeding 20 kilowatts (kW) per cabinet, but our study disproves that claim; passive-cooling units cannot extract more heat than the cold air supplied by the CRACs. For the efficient design of a data center, the aim is to increase the number of cabinets and the total IT load, with the minimal necessary supporting cooling infrastructure. See Figure 1.
Figure 1. The relationship between IT and supporting spaces
Passive Hot Air Removal
Data center design continually evolves toward increasing capacity in decreasing spatial volume, which increases energy density. High-end applications and equipment have higher energy density than standard equipment; however, the high-performance models of any technology have historically become the market standard with the passage of time, which in the case of the IT industry is a short period. As an example, every 3 years the world’s fastest supercomputers offer 10 times the performance of the previous generation, a trend that has been documented over the past 20 years.
Cooling high-density data centers is most commonly achieved by:
• Hot Air Removal (HAR) via cabinet exhaust ducts—active and passive.
See Figure 2.
Figure 2. HAR via cabinet exhaust ducts (active and passive). Courtesy APC
• Dedicated fan-powered cooling units (i.e., chilled water cabinets).
See Figure 3.
Figure 3. Dedicated fan-powered cooling units
This paper focuses on HAR/CRAC technology using an underfloor air distribution plenum.
Approach
High-density data centers require cooling units that are capable of delivering the highest cooling capacity using the smallest possible footprint. The highest-powered CRACs with the smallest footprints available from a major manufacturer offer a net sensible cooling capacity of approximately 90 kW and require a 3 x 1-meter (m) (width by depth) footprint. (Appendix C includes the technical specifications for the example CRAC.)
Setting aside a detailed heat load estimate and air distribution effectiveness, the variables of CRAC capacity, cabinet quantity, and cabinet capacity may be related by the following formula.
Note: The formula is simplified and focused on IT cooling requirements, excluding other loads such as lighting and solar gains.
CRAC Capacity = Number of IT cabinets x kW/cabinet (1)
Example 1 for N Capacity: If a 90-kW CRAC cools 90 cabinets, the average cooling delivered per cabinet is 1 kW.
90 kW = 90 cabinets x 1 kW/cabinet (2)
Example 2 for N Capacity: If a 90-kW CRAC cools two cabinets, the average cooling delivered per cabinet is 45 kW.
90 kW = 2 cabinets x 45 kW/cabinet (3)
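A minimal Python sketch of formula (1) and the two examples above; the function and variable names are illustrative, not from the paper.

```python
# Sketch of formula (1): CRAC capacity = number of IT cabinets x kW/cabinet.
# Names are illustrative only.

CRAC_KW = 90  # net sensible capacity of the reference CRAC

def avg_kw_per_cabinet(crac_capacity_kw: float, num_cabinets: int) -> float:
    """Average cooling delivered per cabinet by one CRAC."""
    return crac_capacity_kw / num_cabinets

print(avg_kw_per_cabinet(CRAC_KW, 90))  # Example 1 -> 1.0 kW/cabinet
print(avg_kw_per_cabinet(CRAC_KW, 2))   # Example 2 -> 45.0 kW/cabinet
```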
The simplified methodology, however, does not provide practical insight into space usage and heat extraction capability. In Example 1, one CRAC would struggle to efficiently deliver air evenly to all 90 cabinets due to the practical constraints of CRAC airflow throw; in most circumstances the cabinets farthest from the CRAC would likely receive less air than the closer cabinets (assuming practical raised-floor heights and minimal obstructions to underfloor airflow).
In Example 2, one CRAC would be capable of supplying sufficient cooling to both cabinets; however, the ratio of space utilization of the CRAC, service access space, and airflow throw buffer would result in a high space usage for the infrastructure compared to prime white space (IT cabinets). Other constraints, such as allocating sufficient perforated floor tiles/grills in case of a raised-floor plenum or additional Cold Aisle containment for maximum air distribution effectiveness may lead to extremely large Cold Aisles that again render the data center space utilization inefficient.
Appendix B includes a number of data center layouts generated to illustrate these concepts. The strategic layouts in this study considered maximum (18 m), average (14 m) and minimal (10 m) practical CRAC air throw, with CRACs installed perpendicular to cabinet rows on one and two sides as recommended in ASHRAE TC9.9. The front-to-back airflow cabinets are assumed to be configured to the best practice of Cold Aisle/Hot Aisle arrangement (See Figure 4). Variation in throw resulted in low, medium, and high cabinet count, best defined as high density, average density, and maximum packed (high number of cabinets) for the same data center whitespace area and electrical load (see Figure 5).
Figure 5. CRAC throw area
In the example layouts, CRACs were placed close together, with the minimal 500-millimeter (mm) maintenance space on one side and 1,000 mm on the long side (see Figure 6). Note that each CRAC manufacturer might have different unit clearance requirements. A minimal 2-m buffer between the nearest cabinet and each CRAC unit prevents entrainment of warm air into the cold air plenum. Cold and Hot aisle widths were modeled on approximately 1,000 mm (hot) and 1,200 mm (cold) as recommended in ASHRAE TC9.9 literature.
In the context of this study, CRAC footprint is defined as the area occupied by CRACs (including maintenance and airflow throw buffer); cabinet footprint is defined as the area occupied by cabinets (and their aisles). These two areas have been compared to analyze the use of prime footprint within the data center hall.
A given Tier level requires every power and cooling component and distribution path to fulfill the Tier requirements; in the context of this paper the redundancy configuration reflects the Tier level of CRAC capacity components only, excluding other subsystems required for the facility’s operation. Tier I does not require redundant components, hence N CRAC units are employed. Tiers II, III, and IV require redundant CRACs; therefore N+1 and N+2 configurations were also considered.
Figure 6. CRAC maintenance zone
A basic analysis shows that using a CRAC as described above would require a 14-m2 area (including throw buffer), which would generate 25.7 kW of cooling for every 1 m of active CRAC perimeter at N redundancy, 19.3 kW for one-sided N+1 redundancy and two-sided N+2 redundancy, 22.5 kW for two-sided N+1 redundancy, and 12.9 kW for one-sided N+2 redundancy. However, data center halls are not predominantly selected and designed based on perimeter length, but rather on floor area.
The study focused on identifying the area required by CRAC units, compared to that occupied by IT cabinets, and defines it as a ratio. Figure 7 shows Tier I (N) one-sided CRACs in a high-density cabinet configuration. Appendix A includes the other configuration models.
Furthermore, a metric has been derived to help determine the appropriate cabinet footprint at the required Tier level (considering CRAC redundancy only).
Figure 7. Tier 1 (N) one-sided CRACs in a high-density cabinet configuration
Cabinet capacity to footprint factor: C2F = (kW/cabinet) / C2C (4)
where the CRAC-to-Cabinet factor: C2C = CRAC footprint / Cabinet footprint (5)
For multiple layout configurations, the higher the C2F, the more IT capacity can be incorporated into the space. Higher capacity could be established by more cabinets at lower densities or by fewer cabinets at higher densities. However, the C2F is closely linked to the necessary CRAC footprint, which as analyzed in this paper, could be a major limiting factor (see Figure 8).
Figure 8. C2F versus cabinet load (kW) for various CRAC redundancies
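A minimal sketch of formulas (4) and (5), using the best-case C2C of 0.46 and the 4.5 kW/cabinet figure discussed in the Results section below; the function names are illustrative, not from the paper.

```python
# Sketch of formulas (4) and (5). Names are illustrative only.

def c2c(crac_footprint_m2: float, cabinet_footprint_m2: float) -> float:
    """CRAC-to-Cabinet area factor (formula 5)."""
    return crac_footprint_m2 / cabinet_footprint_m2

def c2f(kw_per_cabinet: float, c2c_factor: float) -> float:
    """Cabinet capacity-to-footprint factor (formula 4)."""
    return kw_per_cabinet / c2c_factor

ratio = 0.46  # best-case C2C reported in the Results section
print(f"CRAC share of total hall area: {ratio / (1 + ratio):.0%}")  # ~32%
print(f"C2F at 4.5 kW/cabinet: {c2f(4.5, ratio):.1f}")              # ~9.8
```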
Results
The detailed results appear in Appendix B. The variations analyzed included reference CRACs with no redundancy, with one redundant unit, and with two redundant units. For each of the CRAC configurations, three cabinet layouts were considered: maximum packed, average density, and high density.
Results showed that the highest C2F based on the six variations within each of the three redundancy configurations is as follows:
The noteworthy finding is that the highest C2F in all 18-modeled variations was for high-density implementation and at a CRAC-to-cabinet (C2C) area ratio of 0.46 (i.e., CRACs occupy 32% of the entire space) and a cabinet footprint of 2.3 m2 per cabinet. This is supporting evidence that, although high-density cabinets would require more cooling footprint, high density is the most efficient space utilization per kW of IT.
Example 3 illustrates how the highest C2F for a given CRAC redundancy and one- or two-sided layout may be utilized for sizing the footprint and capacity within an average-sized 186-m2 data center hall for a Tier II-IV (N+2, C2F = 9.8, C2C = 0.46, and cabinet footprint of 2.3 m2) deployment. The space is divided into a net 127-m2 data hall for cabinets and 59 m2 of space for CRAC units by utilizing the resulting ideal C2C of 0.46.
Example 3: If a net 127-m2 data hall for cabinets and 59 m2 of space for CRAC units is available, the highest achievable capacity would be 4.5 kW/cabinet.
9.8 = (4.5 kW/cabinet) / (59 m2 / 127 m2) (6)
To determine the number of cabinets and CRACs, the CRAC cooling capability will be used rather than the common method of dividing the area by cabinet footprint.
The total area occupied by a CRAC is 14 m2; hence approximately four CRACs would occupy the 59-m2 space. Two CRACs are on duty, since N+2 is utilized; therefore, the available capacity would be 90 kW x 2 = 180 kW. The number of cabinets that could then be installed in this 186-m2 total area would be 180/4.5 = 40 cabinets.
The total effective space used by the 40 cabinets is 92 m2 (40 x 2.3 m2), which is 72% of the available cabinet-dedicated area. This shows that higher redundancy improves resilience but does not use the space efficiently, highlighting the trade-off between resilience and space utilization.
Example 4 illustrates how C2F may be utilized for sizing the footprint and capacity within the same data center hall but at a lower redundancy of N+1 configuration.
Example 4: By applying the same methodology, the highest achievable capacity would be 5.2 kW/cabinet.
11.4 = (5.2 kW/cabinet) / (59 m2 / 127 m2) (7)
The total area occupied by a CRAC is 14 m2 (including CRAC throw and maintenance); hence approximately four CRACs would occupy 59 m2 of space. Three CRACs would be on duty, since N+1 is utilized; therefore, the available capacity would be 90 kW x 3 = 270 kW. The number of cabinets that could then be installed in this 186-m2 total area would be 270/5.2 = 52 cabinets.
The total effective space used by the 52 cabinets is 120 m2 (52 x 2.3 m2 ), which is 95% of the space. The comparison of Example 3 to Example 4 shows that less redundancy provides more efficient space utilization.
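A minimal sketch reproducing the sizing arithmetic of Examples 3 and 4, under the assumptions used above (90-kW CRACs occupying 14 m2 each, a 59-m2 CRAC area, and 2.3-m2 cabinets); the names and rounding choices are mine, not the paper's.

```python
# Sketch of the Example 3 and Example 4 sizing arithmetic. Names are
# illustrative; constants come from the assumptions stated in the text.

CRAC_KW = 90        # net sensible capacity per CRAC
CRAC_AREA_M2 = 14   # per CRAC, including maintenance and throw buffer
CRAC_SPACE_M2 = 59  # hall area available to CRACs
CABINET_M2 = 2.3    # footprint per cabinet

def size_hall(redundant_cracs: int, kw_per_cabinet: float) -> tuple[int, int]:
    """Return (cabinet count, duty cooling kW) for an N+redundant layout."""
    installed = CRAC_SPACE_M2 // CRAC_AREA_M2   # ~4 CRACs fit in 59 m2
    duty = installed - redundant_cracs          # N+2 -> 2 duty, N+1 -> 3 duty
    capacity_kw = duty * CRAC_KW
    return round(capacity_kw / kw_per_cabinet), capacity_kw

print(size_hall(redundant_cracs=2, kw_per_cabinet=4.5))      # Example 3 -> (40, 180)
print(size_hall(redundant_cracs=1, kw_per_cabinet=5.2))      # Example 4 -> (52, 270)
print(f"Example 3 cabinet space: {40 * CABINET_M2:.0f} m2")  # ~92 m2, ~72% of 127 m2
```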
Figure 9. Summary of the results
The analysis shows that, taking the maximum C2F obtained for each redundancy type and projecting it onto a given average load per cabinet, an example high-density cabinet averaging 20 kW would require the CRAC units to occupy double the IT cabinet space in an N+2 configuration, lowering the effective use of such prime IT floor space (see Figure 9).
Additional Metrics
Additional metrics for design purposes have been derived from the illustrated graphs and resultant formulae.
The derived formula could be documented as follows:
P = K/(L + M) - 6.4 x (R/S) (8)
Where
P = Cooling per perimeter meter (kW/m)
K = CRAC net sensible capacity (kW)
L = CRAC length (m)
M = CRAC manufacturer side maintenance clearance (m)
R = CRAC redundancy
S = One- or two-sided CRAC layout
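A minimal sketch of formula (8), read with L = 3 m (CRAC length) and M = 0.5 m (side maintenance clearance) for the reference 90-kW CRAC, and with S = 1 for a one-sided layout or 2 for a two-sided layout. This mapping is my reading of the variable list; the output reproduces the per-meter figures quoted earlier (25.7, 22.5, 19.3, and 12.9 kW/m).

```python
# Sketch of formula (8): cooling delivered per meter of active CRAC perimeter.
# Symbol names follow the list above; the constants are my reading of the
# reference CRAC dimensions (90 kW, 3-m length, 0.5-m side clearance).

def cooling_per_perimeter_m(k: float, l: float, m: float, r: int, s: int) -> float:
    """P = K / (L + M) - 6.4 x (R / S)."""
    return k / (l + m) - 6.4 * (r / s)

K, L, M = 90.0, 3.0, 0.5

for r, s, label in [(0, 1, "N, one-sided"),
                    (1, 1, "N+1, one-sided"),
                    (1, 2, "N+1, two-sided"),
                    (2, 1, "N+2, one-sided"),
                    (2, 2, "N+2, two-sided")]:
    print(f"{label}: {cooling_per_perimeter_m(K, L, M, r, s):.1f} kW/m")
# -> 25.7, 19.3, 22.5, 12.9, 19.3
```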
Conclusion
Approximately 50% more capacity (270 kW vs. 180 kW), 30% more cabinets, and 16% higher cabinet load density could be utilized in the same space with only one redundant CRAC, which may still fulfill Tier II-IV component redundancy requirements. This is achievable at no additional investment cost, as the same number of CRACs (4) is installed within the same available footprint of 2,000 ft2. The analysis also showed that the highest average practical load per cabinet should not exceed 6 kW if efficient space utilization is sought by maintaining a C2C of 0.46.
This study shows that an average high-density cabinet load may not be cooled efficiently with the use of only CRACs or even with CRACs coupled with passive heat-extraction solutions. The data supports the necessary implementation of row- and cabinet-based active cooling for high-density data center applications.
The first supercomputers were water cooled; however, the low-density data centers commissioned closer to a decade ago (below 2 kW per cabinet) almost totally eliminated liquid cooling, due to reservations about the risks of water leakage within live, critical data centers.
Data centers of today are considered to be medium-density facilities. Some of these data centers average below 4 kW per cabinet. Owners and operators that have higher demands and are ahead of the average market typically dedicate only a portion of the data center space to high-density cabinets.
With server density increasing every day and high-density cabinets (approaching 40 kW and above) becoming a potential future deployment, data centers seem likely to experience soaring heat loads that will demand comprehensive liquid-cooling infrastructures.
With future high-density requirements, CRAC units may become secondary cooling support or, even more drastically, obsolete.
Appendix A5. Two-sided CRAC, average-throw, medium packed cabinets
Appendix A6. Two-sided CRAC, minimum-throw, high density cabinets
Appendix B
Appendix B1. Tier I (N) CRAC modeling results
Note 1: HD = High Density
Note 2: MP = Max Packed
Note 3: * = CRAC Area includes maintenance and throw buffer
Note 4: ^ = 27 m2 area is deducted from total area, as it is already included in the throw buffer
Liebert CRAC Technical Specification
Note: Net sensible cooling will be reduced by 7.5 kW x 3 = 22.5 kW for fans; 68.7 kW for Model DH/VH380A
Dr Hussein Shehata, BA, PhD, CEng, PGDip, MASHRAE, MIET, MCIBSE, is the technical director, EMEA, Uptime Institute Professional Services (UIPS). Dr Shehata is a U.K. Chartered Engineer who joined Uptime Institute Professional Services in 2011. He is based in Dubai, serving the EMEA region. From 2008-2011, Hussein was vice president & Asia-Pacific DC Engineering, Architecture & Strategy Head at JP Morgan in Japan. Prior to that, he co-founded, managed, and operated as a subject matter expert (SME) at PTS Consulting Japan. He graduated in Architecture, followed by a PhD in HVAC and a diploma in Higher Education focused on multi-discipline teaching of Engineers and Architects.
Digital Realty Deploys Comprehensive DCIM Solution
Examining the scope of the challenge
By David Schirmacher
Digital Realty’s 127 properties cover around 24 million square feet of mission-critical data center space in over 30 markets across North America, Europe, Asia and Australia, and it continues to grow and expand its data center footprint. As senior VP of operations, it’s my job to ensure that all of these data centers perform consistently—that they’re operating reliably and at peak efficiency and delivering best-in-class performance to our 600-plus customers.
At its core, this challenge is one of managing information. Managing any one of these data centers requires access to large amounts of operational data.
If Digital Realty could collect all the operational data from every data center in its entire portfolio and analyze it properly, the company would have access to a tremendous amount of information that it could use to improve operations across its portfolio. And that is exactly what we have set out to do by rolling out what may be the largest-ever data center infrastructure management (DCIM) project.
Earlier this year, Digital Realty launched a custom DCIM platform that collects data from all the company’s properties, aggregates it into a data warehouse for analysis, and then reports the data to our data center operations team and customers using an intuitive browser-based user interface. Once the DCIM platform is fully operational, we believe we will have the ability to build the largest statistically meaningful operational data set in the data center industry.
Business Needs and Challenges
The list of systems that data center operators report using to manage their data center infrastructure often includes a building management system, an assortment of equipment-specific monitoring and control systems, possibly an IT asset management program and quite likely a series of homegrown spreadsheets and reports. But they also report that they don’t have access to the information they need. All too often, the data required to effectively manage a data center operation is captured by multiple isolated systems, or worse, not collected at all. Accessing the data necessary to effectively manage a data center operation continues to be a significant challenge in the industry.
At every level, data and access to data are necessary to measure data center performance, and DCIM is intrinsically about data management. In 451 Research’s DCIM: Market Monitor Forecast, 2010-2015, analyst Greg Zwakman writes that a DCIM platform, “…collects and manages information about a data center’s assets, resource use and operational status.” But 451 Research’s definition does not end there. The collected information “…is then distributed, integrated, analyzed and applied in ways that help managers meet business and service-oriented goals and optimize their data center’s performance.” In other words, a DCIM platform must be an information management system that, in the end, provides access to the data necessary to drive business decisions.
Over the years, Digital Realty successfully deployed both commercially available and custom software tools to gather operational data at its data center facilities. Some of these systems provide continuous measurement of energy consumption and give our operators and customers a variety of dashboards that show energy performance. Additional systems deliver automated condition and alarm escalation, as well as work order generation. In early 2012 Digital Realty recognized that the wealth of data that could be mined across its vast data center portfolio was far greater than current systems allowed.
In response to this realization, Digital Realty assembled a dedicated and cross-functional operations and technology team to conduct an extensive evaluation of the firm’s monitoring capabilities. The company also wanted to leverage the value of meaningful data mined from its entire global operations.
The team realized that the breadth of the company’s operations would make the project challenging even as it began designing a framework for developing and executing its solution. Neither Digital Realty nor its internal operations and technology teams were aware of any similar development and implementation project at this scale—and certainly not one done by an owner/operator.
As the team analyzed data points across the company portfolio, it found additional challenges. Those challenges included how to interlace the different varieties and vintages of infrastructure across the company’s portfolio, taking into consideration the broad deployment of Digital Realty’s Turn-Key Flex data center product, the design diversity of its custom solutions and acquired data center locations, the geographic diversity of the sites, and the overall financial implications and complexity of the undertaking.
Drilling Down
Many data center operators are tempted to first explore what DCIM vendors have to offer when starting a project, but taking the time to gain internal consensus on requirements is a better approach. Since no two commercially available systems offer the same features, assessing whether a particular product is right for an application is almost impossible without a clearly defined set of requirements. All too often, members of due diligence teams are drawn to what I refer to as “eye candy” user interfaces. While such interfaces might look appealing, the 3-D renderings and colorful “spinning visual elements” are rarely useful and can often be distracting to a user whose true goal is managing operational performance.
When we started our DCIM project, we took a highly disciplined approach to understanding our requirements and those of our customers. Harnessing all the in-house expertise that supports our portfolio to define the project requirements was itself a daunting task but essential to defining the larger project. Once we thought we had a firm handle on our requirements, we engaged a number of key customers and asked them what they needed. It turned out that our customers’ requirements aligned well with those our internal team had identified. We took this alignment as validation that we were on the right track. In the end, the project team defined the following requirements:
• The first of our primary business requirements was global access to consolidated data. We required that every single one of Digital Realty’s data centers have access to the data, and we needed the capability to aggregate data from every facility into a consolidated view, which would allow us to compare the performance of various data centers across the portfolio in real time.
• Second, the data access system had to be highly secured and give us the ability to limit views based on user type and credentials. More than 1,000 people in Digital Realty’s operations department alone would need some level of data access. Plus, we have a broad range of customers who would also need some level of access, which highlights the importance of data security.
• The user interface also had to be extremely user-friendly. If we didn’t get that right, Digital Realty’s help desk would be flooded with requests on how to use the system. We required a clean navigational platform that is intuitive enough for people to access the data they need quickly and easily, with minimal training.
• Data scalability and mining capability were other key requirements. The amount of information Digital Realty has across its many data centers is massive, and we needed a database that could handle all of it. We also had to ensure that Digital Realty would get that information into the database. Digital Realty has a good idea of what it wants from its dashboard and reporting systems today, but in five years the company will want access to additional kinds of data. We don’t want to run into a new requirement for reporting and not have the historical data available to meet it.
Other business requirements included:
• Open bidirectional access to data that would allow the DCIM system to exchange information with other systems, including computerized maintenance management systems (CMMS), event management, procurement and invoicing systems
• Real-time condition assessment that allows authorized users to instantly see and assess operational performance and reliability at each local data center as well as at our central command center
• Asset tracking and capacity management
• Cost allocation and financial analysis to show not only how much energy is being consumed but also how that translates to dollars spent and saved
• The ability to pull information from individual data centers back to a central location using minimal resources at each facility
Each of these features was crucial to Digital Realty. While other owners and operators may share similar requirements, the point is that a successful project is always contingent on how much discipline is exercised in defining requirements in the early stages of the project—before users become enamored by the “eye candy” screens many of these products employ.
To Buy or Build?
With 451 Research’s DCIM definition—as well as Digital Realty’s business requirements—in mind, the project team could focus on delivering an information management system that would meet the needs of a broad range of user types, from operators to C-suite executives. The team wanted DCIM to bridge the gap between facilities and IT systems, thus providing data center operators with a consolidated view of the data that would meet the requirements of each user type.
The team discussed whether to buy an off-the-shelf solution or to develop one on its own. A number of solutions on the market appeared to address some of the identified business requirements, but the team was unable to find a single solution that had the flexibility and scalability required to support all of Digital Realty’s operational requirements. The team concluded it would be necessary to develop a custom solution.
Avoiding Unnecessary Risk
There is significant debate in the industry about whether DCIM systems should have control functionality—i.e., the ability to change the state of IT, electrical and mechanical infrastructure systems. Digital Realty strongly disagrees with the idea of incorporating this capability into a DCIM platform. By its very definition, DCIM is an information management system. To be effective, this system needs to be accessible to a broad array of users. In our view, granting broad access to a platform that could alter the state of mission-critical systems would be careless, despite security provisions that would be incorporated into the platform.
While Digital Realty and the project team excluded direct-control functionality from its DCIM requirements, they saw that real-time data collection and analytics could be beneficial to various control-system schemas within the data center environment. Because of this potential benefit, the project team took great care to allow for seamless data exchange between the core database platform and other systems. This feature will enable the DCIM platform to exchange data with discrete control subsystems in situations where the function would be beneficial. Further, making the DCIM a true browser-based application would allow authorized users to call up any web-accessible control system or device from within the application. These users could then key in the additional security credentials of that system and have full access to it from within the DCIM platform. Digital Realty believes this strategy fully leverages the data without compromising security.
The Challenge of Data Scale
Managing the volume of data generated by a DCIM is among the most misunderstood areas of DCIM development and application. A DCIM platform collects, analyzes and stores a truly immense volume of data. Even a relatively small data center generates staggering amounts of information—billions of annual data transactions—that few systems can adequately support. By contrast, most building management systems (BMS) have very limited capability to manage significant amounts of historical data for the purposes of defining ongoing operational performance and trends.
Consider a data center with a 10,000-ft2 data hall and a traditional BMS that monitors a few thousand data points associated mainly with the mechanical and electrical infrastructure. This system communicates in near real time with devices in the data center to provide control- and alarm-monitoring functions. However, the information streams are rarely collected. Instead they are discarded after being acted on. Most of the time, in fact, the information never leaves the various controllers distributed throughout the facility. Data are collected and stored at the server for a period of time only when an operator chooses to manually initiate a trend routine.
If the facility operators were to add an effective DCIM to the facility, it would be able to collect much more data. In addition to the mechanical and electrical data, the DCIM could collect power and cooling data at the IT rack level and for each power circuit supporting the IT devices. The DCIM could also include detailed information about the IT devices installed in the racks. Depending on the type and amount desired, data collection could easily require 10,000 points.
But the challenge facing this facility operator is even more complex. In order to evaluate performance trends, all the data would need to be collected, analyzed and stored for future reference. If the DCIM were to collect and store a value for each data point for each minute of operation, it would have more than five billion transactions per year. And this would be just the data coming in. Once collected, the five billion transactions would have to be sorted, combined and analyzed to produce meaningful output. Few, if any, of the existing technologies installed in a typical data center have the ability to manage this volume of information. In the real world, Digital Realty is trying to accomplish this same goal across its entire global portfolio.
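The back-of-the-envelope arithmetic behind that figure is simple, as the sketch below shows; the point count and sampling interval are the ones assumed in the example above.

```python
# Data volume for the example above: ~10,000 monitored points,
# one stored value per point per minute of operation.
points = 10_000
samples_per_year = 60 * 24 * 365   # 525,600 minutes in a year
transactions = points * samples_per_year
print(f"{transactions:,} transactions per year")  # 5,256,000,000 -> "more than five billion"
```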
The Three Silos of DCIM
As Digital Realty’s project team examined the process of developing a DCIM platform, it found that the challenge included three distinct silos of data functionality: the engine for collection, the logical structures for analysis and the reporting interface.
Figure 1. Digital Realty’s view of the DCIM stack.
The engine of Digital Realty’s DCIM must reach out and collect vast quantities of data from the company’s entire portfolio (see Figure 1). The platform will need to connect to all the sites and all the systems within these sites to gather information. This challenge requires a great deal of expertise in the communication protocols of these systems. In some instances, accomplishing this goal will require “cracking” data formats that have historically stranded data within local systems. Once collected, the data must be checked for integrity and packaged for reliable transmission to the central data store.
The project team also faced the challenge of creating the logical data structures needed to process, analyze and archive the data once the DCIM has successfully accessed and transmitted the raw data from each location to the data store. Dealing with 100-plus data centers, often with hundreds of thousands of square feet of white space each, increases the scale of the challenge exponentially. The project team overcame a major hurdle in addressing this challenge when it was able to define relationships between various data categories that allowed the database developers to prebuild and then volume-test data structures to ensure they were up to the challenge.
These data structures, or “data hierarchies” as Digital Realty’s internal team refers to them, are the “secret sauce” of the solution (see Figure 2). Many of the traditional monitoring and control systems in the marketplace require a massive amount of site-level point mapping that is often field-determined by local installation technicians. These points are then manually associated with the formulas necessary to process the data. This manual work is why these projects often take much longer to deploy and can be difficult to commission as mistakes are flushed out.
Figure 2. Digital Realty mapped all the information sources and their characteristics as a step toward developing its DCIM.
In this solution, these data relationships have been predefined and are built into the core database from the start. Since this solution is targeted specifically to a data center operation, the project team was able to identify a series of data relationships, or hierarchies, that can be applied to any data center topology and still hold true.
For example, an IT application such as an email platform will always be installed on some type of IT device or devices. These devices will always be installed in some type of rack or footprint in a data room. The data room will always be located on a floor, the floor will always be located in a building, the building in a campus or region, and so on, up to the global view. The type of architectural or infrastructure design doesn’t matter; the relationship will always be fixed.
The challenge is defining a series of these hierarchies that always test true, regardless of the design type. Once defined, the hierarchies can be prebuilt, validated, and optimized to handle scale. There are many opportunities for these kinds of hierarchies, and this is exactly what we have done.
Having these structures in place facilitates rapid deployment and minimizes data errors. It also streamlines the dashboard analytics and reporting capabilities, as the project team was able to define specific data requirements and relationships and then point the dashboard or report at the layer of the hierarchy to be analyzed. For example, a single report template designed to look at IT assets can be developed and optimized and would then rapidly return accurate values based on where the report was pointed. If pointed at the rack level, the report would show all the IT assets in the rack; if pointed at the room level, the report would show all the assets in the room, and so on. Since all the locations are brought into a common predefined database, the query will always yield an apples-to-apples comparison regardless of any unique topologies existing at specific sites.
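As an illustration only (this is not Digital Realty's schema), the sketch below shows how a fixed hierarchy lets a single report template aggregate at whatever level it is pointed at; the class and field names are hypothetical.

```python
# Illustrative sketch of a fixed data hierarchy: every asset resolves to
# rack -> room -> floor -> building -> region, so a report "pointed" at any
# level aggregates everything beneath it. Names are hypothetical.
from dataclasses import dataclass

@dataclass
class Asset:
    name: str
    rack: str
    room: str
    floor: str
    building: str
    region: str
    power_kw: float

def report(assets: list[Asset], level: str, value: str) -> float:
    """Total IT load (kW) for all assets under one node of the hierarchy."""
    return sum(a.power_kw for a in assets if getattr(a, level) == value)

inventory = [
    Asset("mail-01", "R12", "DH1", "2", "BLDG-A", "EMEA", 0.4),
    Asset("db-07",   "R12", "DH1", "2", "BLDG-A", "EMEA", 0.9),
    Asset("web-03",  "R40", "DH2", "3", "BLDG-A", "EMEA", 0.3),
]

print(f"{report(inventory, 'rack', 'R12'):.1f} kW")         # rack-level view -> 1.3 kW
print(f"{report(inventory, 'building', 'BLDG-A'):.1f} kW")  # building-level view -> 1.6 kW
```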
Figure 3. Structure and analysis as well as web-based access were important functions.
Last comes the challenge of creating the user interface, or front end, for the system. There is no point in collecting and processing the data if operators and customers can’t easily access it. A core requirement was that the front end needed to be a true browser-based application. Terms like “web-based” or “web-enabled” are often used in the control industry to disguise the user interface limitations of existing systems. Often, to achieve some of the latest visual and 3-D effects, vendors will require the user’s workstation to be configured with a variety of thin-client applications. In some cases, full-blown applications have to be installed. For Digital Realty, installing add-ins on workstations would be impractical given the number of potential users of the platform. In addition, in many cases, customers would reject these installs due to security concerns. A true browser-based application requires only a standard computer configuration, a browser and the correct security credentials (see Figure 3).
Intuitive navigation is another key user interface requirement. A user should need very little training to get to the information they need. Further, the information should be displayed to ensure quick and accurate assessment of the data.
Digital Realty’s DCIM Solution
Digital Realty set out to build and deploy a custom DCIM platform to meet all these requirements. Rollout commenced in May 2013, and as of August, the core team was ahead of schedule in terms of implementing the DCIM solution across the company’s global portfolio of data centers.
The name EnVision reflects the platform’s ability to look at data from different user perspectives. Digital Realty developed EnVision to allow its operators and customers insight into their operating environments and also to offer unique features specifically targeted to colocation customers. EnVision provides Digital Realty with vastly increased visibility into its data center operations as well as the ability to analyze information so it is digestible and actionable. It has a user interface with data displays and reports that are tailored to operators. Finally, it has access to historical and predictive data.
In addition, EnVision provides a global perspective allowing high-level and granular views across sites and regions. It solves the stranded data issue by reaching across all relevant data stores on the facilities and IT sides to provide a comprehensive and consolidated view of data center operations. EnVision is built on an enterprise-class database platform that allows for unlimited data scaling and analysis and provides intuitive visuals and data representations, comprehensive analytics, dashboard and reporting capabilities from an operator’s perspective.
Trillions of data points will be collected and processed by true browser-based software that is deployed on high-availability network architecture. The data collection engine offers real-time, high-speed and high-volume data collection and analytics across multiple systems and protocols. Furthermore, reporting and dashboard capabilities offer visualization of the interaction between systems and equipment.
Executing the Rollout
A project of this scale requires a broad range of skill sets to execute successfully. IT specialists must build and operate the high-availability compute infrastructure that the core platform sits on. Network specialists define the data transport mechanisms from each location.
Control specialists create the data integration for the various systems and data sources. Others assess the available data at each facility, determine where gaps exist and define the best methods and systems to fill those gaps.
The project team’s approach was to create and install the core, head-end compute architecture using a high-availability model and then to target several representative facilities for proof-of-concept. This allowed the team of specialists to work out the installation and configuration challenges and then to build a template so that Digital Realty could repeat the process successfully at other facilities. With the process validated, the program moved onto the full rollout phase, with multiple teams executing across the company’s portfolio.
Even as Digital Realty deploys version 1.0 of the platform, a separate development team continues to refine the user interface with the addition of reports, dashboards and other functions and features. Version 2.0 of the platform is expected in early 2014, and will feature an entirely new user interface, with even more powerful dashboard and reporting capabilities, dynamically configurable views and enhanced IT asset management capabilities.
The project has been daunting, but the team at Digital Realty believes the rollout of the EnVision DCIM platform will set a new standard of operational transparency, further bridging the gap between facilities and IT systems and allowing operators to drive performance into every aspect of a data center operation.
David Schirmacher
David Schirmacher is senior vice president of Portfolio Operations at Digital Realty, where he is responsible for overseeing the company’s global property operations as well as technical operations, customer service and security functions. He joined Digital Realty in January 2012. His more than 30 years of relevant experience includes turns as principal and Chief Strategy Officer for FieldView Solutions, where he focused on driving data center operational performance, and vice president, global head of Engineering for Goldman Sachs, where he focused on developing data center strategy and IT infrastructure for the company’s headquarters, trading floor, branch offices and data center facilities around the world. Mr. Schirmacher also held senior executive and technical positions at Compass Management and Leasing and Jones Lang LaSalle. Considered a thought leader within the data center industry, Mr. Schirmacher is president of 7×24 Exchange International, and he has served on the technical advisory board of Mission Critical.
Thinking Ahead Can Prevent the Mid-Life Energy Crisis in Data Centers
Turning back the clock at Barclays Americas’ data centers
By Jan van Halem and Frances Cabrera
Barclays is a major global financial services provider engaged in personal banking, credit cards, corporate and investment banking, and wealth and investment management with an extensive international presence in Europe, the Americas, Africa, and Asia. Barclays has two major data centers in the northeastern United States with production environments that support the Americas region’s operations (see Figure 1). Barclays Corporate Real Estate Solutions (CRES) Engineering team manages the data centers, in close partnership with the Global Technology and Information Services (GTIS) team.
Both of Barclays Americas’ data centers are reaching 7 years old, but not showing their age—at least not energy wise. For the last 3 years, Barclays Americas’ engineering and IT teams have been on an energy-efficiency program to ensure that the company’s data center portfolio continues operating even more efficiently than originally commissioned.
Figure 1. Comparison of DC 1 and DC 2
By 2013, Barclays Americas’ two data centers had reduced their energy consumption by 8,000 megawatt-hours (MWh), or 8%, which equates to 3,700 tons of carbon emissions avoided. In addition, the power usage effectiveness (PUE) of the largest data center dropped from an annual average of 1.63 to 1.54, earning it an Energy Star certification for 2013.
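A quick sketch of how those figures relate; the baseline consumption and the emission factor are simply implied by the article's own numbers, not official Barclays conversions.

```python
# Arithmetic implied by the reported figures (derived values only).
savings_mwh = 8_000
savings_share = 0.08
carbon_avoided_t = 3_700
pue_before, pue_after = 1.63, 1.54

print(f"Implied baseline consumption: {savings_mwh / savings_share:,.0f} MWh/year")  # ~100,000
print(f"Implied emission factor: {carbon_avoided_t / savings_mwh:.2f} t CO2/MWh")    # ~0.46
print(f"PUE improvement: {(pue_before - pue_after) / pue_before:.1%}")               # ~5.5%
```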
The Barclays Americas team pinpointed the following strategies for a three-pronged attack on energy inefficiency:
• Improving airflow management and reducing CRAC unit fan speeds
• Maximizing the use of free cooling
• Installing variable frequency drives (VFDs) on equipment such as air handlers, computer room air conditioning (CRAC) units and pumps
The goals were to:
Figure 2. Summary of initiatives implemented across the two data centers: 2010-present.
The team found considerable savings without having to make large capital investments or use complex solutions that would threaten the live data center environment (see Figure 2). The savings opportunities were all identified, tested, implemented, and validated using in-house resources with collaboration between the engineering and IT teams.
Background
In 2010, Barclays launched a Climate Action Programme, which has since been expanded into its 2015 Citizenship Plan. The program included a carbon reduction target of 4% by the end of 2013 from 2010 levels. The target spurred action among the regions to identify and implement energy conservation measures and created a favorable culture for the funding of initiatives. Due to energy initiatives like the data center programs put in place in the Americas, Barclays reduced its carbon emissions globally by 12% in 2012, meeting the target two years early. Barclays is now setting its sights on a further carbon reduction of 10% by 2015.
From the beginning, it was understood that any new data center energy initiatives in the Americas must take into account both IT and building operation considerations. CRES and GTIS worked together on research, testing, and mocking up these initiatives. Their findings were shared with the central teams at headquarters so that case studies could be developed to ensure that the successes can be used at other sites around the world.
Blowing Back the Years
Initially, the team focused on airflow management. The team successfully reduced CRAC unit fan speeds at DC 1 from 70% to 40% by raising temperature set points as recommended by ASHRAE TC9.9, replacing unneeded perforated tiles with solid tiles, and installing Cold Aisle containment. All these changes were implemented with no impact to equipment operation.
The first step, and the easiest, was to evaluate the initial fan speed of the CRAC units at DC 1 and reduce it from 70% to 58%, with no effect on performance. When the DCs were commissioned, they briefly operated at 55°F (13°C), which was quickly increased to 65°F (18°C). By 2010, the temperature set points in Cold Aisles had been raised from 65°F (18°C) to 75°F (24°C) in both DC 1 and DC 2, as recommended by the updated ASHRAE standard for data center cooling.
Next, Barclays’ teams turned to the perforated tiles. Perforated tiles in DC 1 were replaced with solid tiles. This reduced CRAC fan speeds from 58% to 50%, with no impact on Cold Aisle temperature conditions and equipment operation.
Finally, in 2011, the PDU aisles were retrofitted with containment doors, which further improved airflow efficiency. After the site teams researched various options and recommended a commercial product, an in-house crew completed the installation. The team opted to use in-house personnel to avoid having external contractors working in a live environment, which meant that the IT operations team felt comfortable having trusted engineering staff in the server rooms. The engineering staff used its expertise and common sense to install the doors with no disturbance to the end users.
Figure 3. Energy savings attributed to cooling enhancements
The team chose not to extend the containment above the aisles. Using data from wireless temperature and humidity sensors located throughout the aisles, the team found that it could still achieve ~80% of the projected energy savings without extending the containment to the ceiling, avoiding the additional cost and potential issues with local fire codes.
With the doors installed, the teams continue to monitor temperatures through the sensors and can control airflow by adjusting CRAC fan speeds, which regulates the amount of supply air in the Cold Aisle, minimizing bypass airflow and ensuring efficient air distribution. As a result, return air temperature increased, driving efficiency and allowing further fan speed reductions from 50% to 40%. Again, these reductions were achieved with no impact to operating conditions. Based on these successes, Cold Aisle containment was installed at DC 2 in 2014.
Figure 4. Available free cooling at DC 1 and DC 2
A curve based on manufacturer’s data and direct power readings for the CRAC units enabled the teams to calculate the power reductions associated with volume flow rate reductions. As a result, they calculated that the airflow initiatives save 3.6 gigawatt-hours (GWh) of energy annually across DC 1 and DC 2 (see Figure 3).
Staying Cool
DC 1’s primary cooling system is a water-cooled plant consisting of two 1,250-ton chillers and five cooling towers. DC 1 was designed to utilize free cooling technology that feeds condenser water from cooling towers into a heat exchanger. The heat exchanger then cools the building’s chilled water. In 2011, the CRES Engineering team embarked on an initiative to maximize the use of free cooling.
After reviewing operational parameters such as temperature, water flow, and cooling tower fan speeds, Barclays’ team made the following adjustments:
• Having the condenser water go through two units instead of one increased the efficiency of the heat exchange.
• Cooling tower operation was tuned by analyzing data from an electrical power monitoring system (EPMS). As a result, the volume of condenser water was divided among multiple cooling tower units instead of all going into one.
• Together, these adjustments extended the conditions under which free cooling is possible (see Figure 4).
Of the three strategies, it is hardest to directly measure and attribute energy savings to enhancing cooling operations, as the changes impact several parts of the cooling plant. Barclays used the EPMS to track power readings throughout the systems, particularly the cooling tower units. The EPMS enables PUE trending and shows the reduction of DC 1’s PUE over time. Since 2011, it has dropped 5% to an annual average of 1.54.
Driving Back Inefficiency
Figure 5. Savings attributed to VFDs
In 2013 the teams began an intensive review of VFD technology. They found that considerable energy savings could be obtained by installing VFDs on several pieces of equipment, such as air handlers in plant rooms and condenser water pumps, in both DC 1 and DC 2. The VFDs control the speed of an existing AC induction motor. By reducing the speed of the air handlers, the unit load can be adjusted to match the existing heat load (see Figure 5).
The team focused on thirty-six 30-ton units throughout DC 1 that would yield positive energy and cost savings. Utility rebates further enhanced the business case. The VFDs were installed toward the end of 2013, and the business case was applied to DC 2 for further installations in 2014.
Figure 6. Savings attributed to frequency reductions on AC motors
To calculate savings, direct power readings were taken at the CRAC units at intervals of 2 hertz (Hz) from 60 Hz to 30 Hz. As shown in Figure 6, reducing CRAC frequency from 60 Hz to 56 Hz reduced power demand by 19%. In addition, the fan motor releases less heat to the air, further reducing the cooling load.
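As a rough cross-check of the measured readings, the fan affinity laws predict that fan power scales approximately with the cube of speed, so running at 56 Hz instead of 60 Hz should draw about (56/60)³ ≈ 81% of full power, a reduction of roughly 19%. The short sketch below applies that cube-law approximation (an idealized assumption; the teams relied on direct power readings and manufacturer data rather than this relationship):

```python
# Estimate CRAC fan power reduction from a frequency (speed) reduction
# using the fan affinity laws (power ~ speed^3). This is an idealized
# approximation, not the measured curve used by the Barclays teams.

def fan_power_fraction(new_hz: float, base_hz: float = 60.0) -> float:
    """Fraction of full-speed fan power remaining at a reduced frequency."""
    return (new_hz / base_hz) ** 3

if __name__ == "__main__":
    for hz in range(60, 28, -2):  # 60 Hz down to 30 Hz in 2-Hz steps
        reduction = 1.0 - fan_power_fraction(hz)
        print(f"{hz} Hz: ~{reduction:.0%} lower power demand than at 60 Hz")
    # The cube law predicts ~19% at 56 Hz, in line with the measured value.
```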
Additional maintenance cost savings are achieved through the extension of the filter replacement cycle: the VFDs move less air through the system, increasing the life span of the system filters. Once fully implemented, the VFD installations will save over 3.9 GWh of energy.
Data as the Fountain of Youth
Comprehensive monitoring systems in place across the two data centers provided data accessible by both the GTIS and CRES teams, enabling them to make the best, data-driven decisions. The sites’ EPMS and branch circuit monitoring system (BCMS) enable the teams to pinpoint areas with the greatest energy-saving potential and work together to trial and implement initiatives.
Barclays uses the EPMS as a tool to monitor, measure, and optimize the performance of the electrical loads. It monitors critical systems such as HVAC equipment, electrical switchgear, etc. A dashboard displays trend data. For example, the team can trend the total UPS load and the total building load, which yields PUE, on a continuous, real-time basis.
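Since the EPMS already trends total building load and total UPS (IT) load, the PUE calculation itself is a simple ratio. The sketch below shows the arithmetic with made-up sample readings (the values are illustrative, not Barclays’ EPMS data):

```python
# Derive interval and average PUE from paired EPMS readings of total
# building load and UPS (IT) load. The readings below are illustrative.

from statistics import mean

readings_kw = [
    # (total_building_kw, ups_it_kw)
    (1540.0, 1000.0),
    (1525.0,  995.0),
    (1560.0, 1010.0),
]

interval_pue = [total / it for total, it in readings_kw]
average_pue = mean(interval_pue)

print("Interval PUE:", [round(p, 2) for p in interval_pue])
print(f"Average PUE: {average_pue:.2f}")
```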
In addition to the EPMS, the CRES and GTIS teams also use the BCMS to track energy use by server cabinet, remote power panel, and power distribution unit. This system is used for capacity planning and load balancing. In addition, the monitored cabinet loads are used to optimize the airflow in the Cold Aisles.
Conclusion
With the right level of collaboration between CRES and GTIS, the right data, and the right corporate environmental targets, the Barclays Americas’ team was able to find energy and cost savings in their data centers. By executing airflow management, enhanced cooling, and VFD strategies in the existing data centers, the team applied the latest standards and best practices to keep energy consumption at levels typical of new data centers. At 8 GWh lighter, with an Energy Star certification and a PUE that keeps dropping—these data centers are not showing their energy age.
Jan van Halem is vice president, Data Center Engineering at Barclays. He joined Barclays’ engineering team in 2004. Mr. van Halem has more than 20 years of experience and a strong knowledge of critical data center systems and mechanical operations. At Barclays, he is responsible for the mechanical and electrical systems of the company’s major data centers in the Americas. In addition, he has provided engineering design to new construction and expansion projects in the region.
Before joining Barclays, Mr. van Halem was with real estate organizations and held positions in facility management, construction management, and project management. Mr. van Halem has a BS degree in Marine Engineering from the Maritime Institute De Ruyter in Flushing, the Netherlands. He served as a marine engineer for 8 years in the Dutch Merchant Marine.
Frances Cabrera, LEED AP, is vice president, Environmental Management at Barclays. She joined Barclays’ environmental management team in 2011. Ms. Cabrera oversees Barclays Americas’ environmental programs, both resource saving and compliance, to support the region’s ISO certified management system. With the collaboration of the corporate real estate team, the region has achieved multiple LEED and Energy Star certifications. She’s also part of the firm’s global center of excellence for environment, where she works across regions to measure and support the firm’s green IT achievements.
Before joining Barclays, Ms. Cabrera ran the ISO 14001 systems in North and South America for Canon USA and worked at various manufacturing companies in Rochester, NY, integrating environmental considerations into their operations and meeting regulations. Ms. Cabrera has a BS degree in Environmental Technology and an MS degree in Environmental, Health, and Safety Management from the Rochester Institute of Technology.
Close Coupled Cooling and Reliability
Achieving Concurrently Maintainable and Fault Tolerant cooling using various close coupled cooling technologies
By Matt Mescall
Early mainframe computers were cooled by water at the chip level. Then, as computing moved to the distributed server model, air replaced water. Typically, data centers included perimeter computer room air conditioning (CRAC) units to supply cold air to a raised-floor plenum and perforated floor tiles to deliver it to IT equipment. These CRAC units were either direct-expansion (DX) or chilled-water units (for simplicity, CRAC will be used to refer to either kind of unit). This arrangement worked for the past few decades while data centers were primarily occupied by low-density IT equipment (< 2-4 kilowatts [kW] per rack). However, as high-density racks become more common, CRAC units and a raised floor may not provide adequate cooling.
To address these situations, data center cooling vendors developed close coupled cooling (CCC). CCC technology includes in-row, in-rack, above-rack, and rear-door heat exchanger (RDHx) systems. Manufacturers typically recommend the use of a Cold Aisle/Hot Aisle arrangement for greater efficiency, which is a best practice for all data center operations. As rack density increased due to IT consolidation and virtualization, CCC moved from being a solution to an unusual cooling situation to being the preferred cooling method. Implemented properly, a CCC solution can meet the Concurrently Maintainable and Fault Tolerant requirements of a data center.
This paper assumes that the close coupled cooling solution provides the only cooling for the IT equipment in the data center, although an air handler may provide humidity control. Additionally, it is assumed that the reader understands how to design a direct-expansion or chilled-water CRAC-based cooling system to meet Concurrent Maintainability and Fault Tolerant requirements. This paper does not address Concurrent Maintainability and Fault Tolerant requirements for a central cooling plant, only the CCC system in the data hall.
Meeting Concurrently Maintainable and Fault Tolerant Requirements
First, let’s clarify what is required for a Concurrently Maintainable (Tier III) and a Fault Tolerant (Tier IV) cooling system. This discussion is not a comprehensive description of all Concurrently Maintainable and Fault Tolerant requirements, but it provides the basis for the rest of the discussion in this paper.
A Concurrently Maintainable system must have redundant capacity components and independent distribution paths, which means that each and every capacity component and distribution path element can be taken out of service for maintenance, repair, or replacement without impacting the critical environment.
To meet this requirement, the system must have dry pipes (no flowing or pressurized liquid) to prevent liquid spills when maintaining pipes, joints, and valves. Draining a pipe while it is disassembled is allowed, but hot tapping and pipe freezing are not. A Fault Tolerant cooling system may look like a Concurrently Maintainable system, but it must also autonomously respond to failures, including Continuous Cooling, and compartmentalize the chilled-water and/or refrigerant pipes outside the room of use (typically the computer room).
There are several different types and configurations of CCC. For simplicity, this paper will break them into two groups: in-row and above-row units, and RDHx. While there are other CCC solutions available, the same concepts can be used to provide a Concurrently Maintainable or Fault Tolerant design.
In-row and above-row CCC
When data center owners have a business requirement for a high density data center to be Concurrently Maintainable or Fault Tolerant, a CCC design poses special circumstances that do not exist with room-based cooling. First, airflow must be considered. A CRAC-unit-based cooling design that is Concurrently Maintainable or Fault Tolerant has N+R cooling units that provide cooling to the whole room. When a redundant unit is off for maintenance or suffers a fault, the IT equipment still receives cooling from the remaining CRAC units via the perforated tiles in the Cold Aisle. The cooling in any Cold Aisle is not affected when the redundant unit is offline. This arrangement allows for one or two redundant CRAC units in an entire room (see Figure 1).
Figure 1. CCC airflow considerations
CCC provides cooling to the specific Cold Aisle where the unit is located. In other words, CCC units cannot provide cooling to different Cold Aisles the way CRAC units can. Accordingly, the redundant CCC unit must be located in the aisle where the cooling is required. In addition to having sufficient redundant cooling in every Cold Aisle, the distance from the cooling unit to the IT equipment must also be considered. In-row and above-row cooling units typically can provide cold air for only a limited distance. The design must take into account the worst-case scenario during maintenance or a failure event.
After considering the number of units and their location in the Cold Aisle, the design team must consider the method of cooling, which may be air-cooled direct expansion (DX), chilled water, or pumped refrigerant. Air-cooled DX units are typically matched with their own condenser units. Other than proper routing, piping for air-cooled DX units requires no special considerations.
Piping to chilled-water units is either traditional chilled-water piping or a cooling distribution unit (CDU). In the former method chilled water is piped directly to CCC units, similar to CRAC units. In this case, chilled-water piping systems are designed to be Concurrently Maintainable or Fault Tolerant in the same way as single-coil, room-based CRAC units.
The latter method, which uses CDUs, poses a number of special considerations. Again, chilled-water piping to a CDU and to single-coil, room-based CRAC units is designed to be Concurrently Maintainable or Fault Tolerant in the same way. However, designers must consider the impact to each Cold Aisle when a CDU is removed from service or suffers a fault.
If any single CDU provides cooling to more than the redundant number of cooling units in any aisle, the design is not Concurrently Maintainable or Fault Tolerant. When CDUs are located outside of the server room or data hall in a Fault Tolerant design, they must be properly compartmentalized so that a single event does not remove more than the redundant number of cooling units from service. A Fault Tolerant system also requires Continuous Cooling, the ability to detect, isolate, and contain a fault, and sustain operations. In a CCC system that rejects heat to a chilled-water system, the mechanical part of Continuous Cooling can be met with an appropriate thermal storage tank system that is part of a central plant.
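One way to express the CDU rule above is as a simple per-aisle check: for each Cold Aisle, losing any single CDU must not take out more than that aisle’s redundant number of cooling units. The sketch below illustrates the check against a hypothetical unit-to-CDU mapping (the aisle names, unit names, and CDU assignments are invented for the example):

```python
# Check that no single CDU serves more than the redundant number of
# CCC units in any Cold Aisle. The layout below is hypothetical.

from collections import defaultdict

aisles = {
    # Each aisle: how many units it can lose and still deliver N cooling,
    # plus its CCC units mapped to the CDU that serves each one.
    "Aisle 1": {"redundant_units": 1,
                "units": {"CCC-1": "CDU-A", "CCC-2": "CDU-B", "CCC-3": "CDU-C"}},
    "Aisle 2": {"redundant_units": 1,
                "units": {"CCC-4": "CDU-B", "CCC-5": "CDU-B", "CCC-6": "CDU-A"}},
}

for aisle, info in aisles.items():
    units_per_cdu = defaultdict(int)
    for cdu in info["units"].values():
        units_per_cdu[cdu] += 1
    worst_cdu, worst_count = max(units_per_cdu.items(), key=lambda kv: kv[1])
    compliant = worst_count <= info["redundant_units"]
    print(f"{aisle}: losing {worst_cdu} removes {worst_count} unit(s) -> "
          f"{'OK' if compliant else 'not Concurrently Maintainable/Fault Tolerant'}")
```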
A CCC system that rejects heat to outside air via refrigerant and a condenser will likely rely on uninterrupted power to provide Continuous Cooling which will be discussed in the following paragraphs.
Some CCC systems use pumped refrigerant. These systems transfer heat from pumped refrigerant to a building’s chilled-water system, a glycol system, or an external condenser unit.
Due to the similarities between chilled-water and glycol systems with respect to the piping headers, glycol and chilled-water systems will be treated the same for purposes of this paper. The heat transfer occurs at an in-room chiller or heat exchanger that, for the purposes of this discussion, is similar to a CDU. The Concurrently Maintainable and Fault Tolerant design considerations for a pumped refrigerant system are the same as for a chilled-water system that uses a CDU.
The system that powers all CCC components must be designed to ensure that the electrical system does not defeat the Concurrent Maintainability or Fault Tolerance of the mechanical system. In a Concurrently Maintainable mechanical system electrical design, no more than the redundant number of cooling units may be removed from service when any part of the electrical system is removed from service in a planned manner. This requirement includes the cooling within any aisle, not just the room as a whole. Designing the CCC units and the associated CDUs, in-room chillers, or heat exchangers in a 2N configuration greatly simplifies the electrical distribution.
Providing an A feed to half of the units and a B feed to the other half, while paying attention to the distribution of the CCC units, will typically provide a Concurrently Maintainable electrical design.
If the cooling system is in an N+R configuration, the distribution of the power sources will require special coordination. Typically, the units will be dual fed, which can be accomplished by utilizing an internal transfer switch in the units, an external manual transfer switch, or an external automatic transfer switch. This requirement applies to all components of the CCC system that require power to cool the critical space, including the in-row and above-row units, the in-room chillers, heat exchangers, and any power that is required for CDUs (see Figure 2).
Figure 2. CCC valve scenario
When any part of a Fault Tolerant electrical design for a mechanical system experiences a fault, no more than the redundant number of cooling units may be removed from service. The same Concurrently Maintainable concepts apply to a Fault Tolerant electrical system; however, all of the transfer switches must be automatic and cannot rely on human intervention to respond to a fault. Additionally, in order to provide Continuous Cooling, uninterruptible power must be provided for cooling fans, in-room chillers and heat exchangers, pumps, and CDUs. A CCC system that uses DX and condensers to reject heat to outside air will require uninterrupted power to all system components to achieve Continuous Cooling.
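The same kind of per-aisle accounting can be applied to the electrical side: for any single feed taken out of service for planned work or lost to a fault, no Cold Aisle should lose more than its redundant number of cooling units, and dual-fed units survive the loss of either source. The sketch below is a hypothetical illustration (feed names and unit assignments are assumptions, not a prescribed design):

```python
# Verify that the loss of any single electrical feed does not remove
# more than the redundant number of cooling units from any Cold Aisle.
# Dual-fed units (e.g., via a transfer switch) list both sources.

aisles = {
    "Aisle 1": {"redundant_units": 1,
                "unit_feeds": {"CCC-1": {"A"},          # single-fed from A
                               "CCC-2": {"B"},          # single-fed from B
                               "CCC-3": {"A", "B"}}},   # dual-fed
}
feeds = {"A", "B"}

for feed in sorted(feeds):
    for aisle, info in aisles.items():
        # A unit drops out only if the lost feed was its sole source.
        lost = [u for u, sources in info["unit_feeds"].items() if sources == {feed}]
        compliant = len(lost) <= info["redundant_units"]
        print(f"Loss of feed {feed}, {aisle}: {len(lost)} unit(s) lost -> "
              f"{'OK' if compliant else 'FAILS'}")
```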
The controls for these systems must also be considered in the design and meet the appropriate Concurrent Maintainability and Fault Tolerant requirements.
RDHx
The requirements for a Concurrently Maintainable or Fault Tolerant RDHx cooling solution are similar to those for in-row cooling. The RDHx units typically use chilled water or a pumped refrigerant and CDUs, in-room chillers, or heat exchangers. These units need to meet all of the Concurrent Maintainability
or Fault Tolerant requirements of in-row CCC units. Airflow is a major consideration when a door is removed from service, whether for a planned event or due to a failure. When an RDHx solution cools an entire data center, it may be configured in a front-to-back rack configuration. When one or more doors are removed from service, the affected racks will blow hot exhaust air into the racks behind them, which may cause them to overheat, depending on the heat load.
This configuration does not meet Concurrent Maintainability or Fault Tolerant requirements, which require that the cooling system provide N cooling to all critical equipment during a planned maintenance event or a failure. Placing the racks in a Cold Aisle/Hot Aisle configuration may not meet this requirement as exhaust air from the affected rack may circulate over its top from the Hot Aisle and overheat the servers at the top of the rack and possibly adjacent racks. The same airflow issue is possible for racks placed at the end of rows when their RDHx is not working.
Summary
Using CCC as the only form of cooling in the data center is becoming more common. Compared with a CRAC-based cooling system, CCC poses additional challenges in meeting Concurrent Maintainability and Fault Tolerant requirements: airflow behaves differently than with room-based CRACs, and maintenance or failures must not remove more than the redundant number of the additional capacity components and distribution systems from service. These challenges can be met with careful consideration when designing all parts of the CCC system.
Matt Mescall
Matthew Mescall, PE, is a senior consultant for Uptime Institute Professional Services and Tier Certification Authority, where he performs audits and provides strategic-level consulting and Tier Certification reviews. Mr. Mescall’s career in critical facilities spans 12 years and includes responsibilities in planning, engineering, design, construction, and operation. Before joining Uptime Institute, Mr. Mescall was with IBM, where he operated its Boulder, CO, data center and led a worldwide team analyzing best practices across IBM data centers to ensure consistent, cost-effective reliability. Mr. Mescall holds a BS degree in Civil Engineering from the University of Southern California, an MS in Civil Engineering from the Georgia Institute of Technology, and a Masters Certificate in Project Management from George Washington University.
2014 Data Center Industry Survey
The fourth annual Uptime Institute Data Center Industry Survey provides an overview of global industry trends by surveying 1,000 data center operators and IT practitioners. Uptime Institute collected responses via email February through April 2014 and presented preliminary results in May 2014 at the 9th Uptime Institute Symposium: Empowering the Data Center Professional. This document provides the full results and analysis.
I. Survey Demographics
Uptime Institute focused its analysis on the end-user survey respondents. The majority of survey participants are data center managers, followed by smaller percentages of IT managers and senior executives. The U.S. and Canada make up a significant portion of the response, with growing numbers of participants from around the globe.
About half of the end-user respondents work for third-party commercial data center companies (colocation or cloud computing providers), and the other half work for enterprises in vertical industries such as financial services (11%), manufacturing (7%), healthcare (4%), government (4%), and other industries (26%).
In various sections throughout this survey, the financial services industry’s responses have been broken out from those of traditional enterprise companies. Across multiple focus areas, the responses of financial services organizations differ significantly from those of other verticals.
Previous annual surveys in 2011, 2012, and 2013 showed that large organizations (defined in this context as companies managing over 5,000 servers) were adopting new technologies, outsourcing less-critical workloads, and pursuing energy efficiency goals much faster than smaller companies.
For 2014, we analyzed the results further and found that the data center maturity gap is not just a matter of size, but also specific to a single industry (banking). This difference is most likely due to the massive investment financial organizations have in IT, especially in relation to their overall cost structures. A financial organization’s efficiency at deploying IT correlates directly to its profitability in a way that may not be as obvious to other companies.
When the response profiles of the financial organizations and the colocation providers are compared, a pattern starts to emerge – both banks and colos run their data center operations as a business.
We will explore the implications of these parallels throughout the survey data.
II. Data center budgets
In each year’s survey, we ask participants to compare their organization’s current spending on data centers (including real estate, infrastructure equipment, staffing, operations, etc.) to the previous year. Every year, the answers to these questions reveal massive growth in the colocation and multi-tenant data center (MTDC) industry compared to enterprise spending.
In 2014, the vast majority of third-party data center respondents (86%) report receiving budget increases, versus 63% of financial firms and just 50% of the other enterprise companies. This gap is similar to the 2013 results: 77% of third-party respondents increased budget, versus just 47% of enterprise companies.
With half of all enterprises reporting stagnant or shrinking data center budgets and colocation operators reporting massive growth, our conclusion is that increasingly, enterprise data center workloads are shifting to third-party providers.
This is not to say that the enterprise data centers are going away any time soon. These organizations will continue to squeeze a return on their investments in enterprise data center assets. The value of a performing, fully or partially depreciated asset cannot be disregarded.
According to surveys from Uptime Institute’s executive programs, nearly every enterprise has chosen to host some percentage of its IT workloads off-premise, in a MTDC, cloud, or other third-party environment. Anecdotal reports to Uptime Institute lead us to believe this to be a fairly recent development (within the last 5 years).
Yet, while nearly every company is deploying off-premise computing, the percentage of workloads hosted in these environments appears to be fairly static as a percentage of overall compute. The figure below represents enterprise organizations’ IT deployment mix today, and projected deployment mix in 2014. This indicates that the growth trends in off-premise computing will continue as overall enterprise IT workloads continue to grow. This finding also indicates that there is no imminent rebound or recoil in the circular trend of outsourcing-insourcing-outsourcing commonly seen in IT and in other key enterprise functions such as call centers.
This report will delve into the drivers and decision-making challenges for enterprise IT organizations contracting MTDCs and cloud computing in Section IV.
III. IT Efficiency
A special focus in 2014’s survey is an assessment of the behaviors, management strategies, and technologies used to improve IT efficiency.
Over the past several years, Uptime Institute has documented the rise in adoption of Power Usage Effectiveness (PUE) and the meager gains achieved by further pursuing that metric. Based on Uptime Institute’s field experience and feedback from the Uptime Institute Network (a user group of large data center owners and operators) around the world, enterprise IT executives are overly focused on PUE.
The vast majority (72%) of respondents measure PUE. Isolating the responses by job function, a huge percentage of executives (82%) are tracking that metric and reporting it to their corporate management.
PUE is an effective engineering ratio that data center facilities teams use to capture baseline data and track the results of efficiency improvements to mechanical and electrical infrastructure. It is also useful for design teams to compare equipment- or topology-level solutions. But as industry adoption of PUE has expanded, the metric is increasingly being misused as a methodology to cut costs and prove stewardship of corporate and/or environmental resources.
In 2007, Uptime Institute surveyed its Network members, and found an average PUE of 2.50. The average PUE improved from 2.50 in 2007 to 1.89 in 2011 in Uptime Institute’s data center industry survey.
From 2011 to today, the average self-reported PUE has only improved from 1.89 to 1.7. The biggest infrastructure efficiency gains happened five years ago, and further improvements will require significant investment and effort, with increasingly diminishing returns.
The figure following represents adoption of various data center cooling approaches to improve efficiency. The low-capital-cost approaches have largely been adopted by data center operators. And yet, executives press for further reductions in PUE. High-cost efficiency investments in technologies and design approaches may provide negative financial payback and zero improvement of systemic IT efficiency problems.
Many companies’ targets for PUE are far lower than their reported current state. By focusing on PUE, IT executives are spending effort and capital for diminishing returns and continue to ignore the underlying drivers of poor IT utilization.
For example, Uptime Institute estimates that 20% of servers in data centers are obsolete, outdated, or unused. Yet, very few survey respondents believe their server populations include comatose machines. Nearly half of survey respondents have no scheduled auditing to identify and remove unused hardware.
Historically, IT energy efficiency has been driven by data center facilities management. According to the Uptime Institute’s Annual Data Center Industry Survey (2011-2014), less than 20% of companies report that their IT departments pay the data center power bill, and the vast majority of companies allocate this cost to the facilities or real estate budgets.
This lopsided financial arrangement fosters unaccountable IT growth, inaccurate planning, and waste. This is why 67% of the senior IT executives believe comatose hardware is not a problem.
Uptime Institute launched the Server Roundup contest in October 2011 to raise awareness about the removal and recycling of comatose and obsolete IT equipment and reduce data center energy use. Uptime Institute invited companies around the globe to help address and solve this problem by participating in the Server Roundup, an initiative to promote IT and Facilities integration and improve data center energy efficiency.
In 2 years of participating in Server Roundup, the financial firm Barclays has removed nearly 15,000 servers and saved over US$10M. Server Roundup overwhelmingly proves that disciplined hardware decommissioning can provide a significant financial impact.
Yet despite these huge savings and intangible benefits to the overall IT organization, many firms are not applying the same level of diligence and discipline to a server decommissioning plan, as noted previously.
This is the crux of the data center efficiency challenge ahead—convincing more organizations of the massive return on investment in addressing IT instead of relentlessly pursuing physical infrastructure efficiency.
Organizations need to hold IT operations teams accountable to root out inefficiencies, of which comatose servers are only the most obvious and egregious example.
For nearly a decade, Uptime Institute has recommended enterprise IT executives take a holistic approach to significantly reduce the cost and resource consumption of compute infrastructure. That approach is outlined here.
IV. Enterprise Adoption of Third-Party Data Centers and IT Services
As stated earlier, nearly every enterprise organization is using some combination of in-house IT and off-premise computing. There are a number of drivers for this trend, including the ability to right-size deployments, lower the cost of investment, and get IT workloads into production quickly.
So far, enterprise organizations have largely been satisfied with their experiences using multi-tenant data center providers. In fact, in this unverified and self-reported survey, the colocation operators report fewer outages than their enterprise counterparts.
Despite many enterprise organizations currently reporting satisfaction with colocation providers, the move to off-premise computing has not always been a smooth transition. In Uptime Institute’s experience, many large enterprise organizations historically ran their own data centers and only recently started deploying into third-party sites at scale. The facilities and corporate real estate teams who are often responsible for managing these relationships have limited experience in contract terms, service level agreements, pricing, and other challenges specific to an outsourced IT relationship.
In fact, the decision over whether to outsource an IT workload and where to host it typically comes from the IT department, and not the management team that ultimately holds responsibility for that contract.
The facilities managers and data center engineers are expected to become experts in third-party data center management on the fly—to learn on the job. All the while, the usage of third-party data center providers is rapidly expanding, and very few enterprises have formalized requirements for engaging with the MTDC market. A large percentage cannot track the cost of downtime for their organizations.
The vast majority of enterprise organizations are blasting workloads into off-premise computing environments, but they don’t know where they are going, or what their staff are supposed to do when they get there. Many organizations are making decisions on a very limited selection of criteria and inputs.
Ultimately, this was the primary reason Uptime Institute developed the FORCSS™ Methodology in 2012.
Uptime Institute FORCSS is a means to capture, compare, prioritize, and communicate the benefits, costs, and impacts of multiple IT deployment alternatives. Deployment alternatives may include owned/existing data centers, commercial data centers (wholesale, retail, colocation, managed service), or IaaS (including cloud) that is procured on a scale or limited basis.
FORCSS provides organizations with the flexibility to develop specific responses to varying organizational needs. A case study series will present the process of applying the FORCSS Factors to specific deployment options and present the outcome of the FORCSS Index—a concise structure that can be understood by non-IT executive management.
Please refer to the FORCSS Introduction and FORCSS Case Study 1.
Conclusions
Appendix
Additional 2014 survey responses:
i. If your organization has adopted Cold Aisle or Hot Aisle containment, approximately what percentage of your cabinets uses this design?
a. Less than 10% contained: 22%
b. 10-25% contained: 13%
c. 25-50% contained: 12%
d. 50% contained: 7%
e. 50-75% contained: 16%
f. 75-100% contained: 30%
ii. Would your organization consider a data center that did not include the following designs/technologies?
a. Raised floor: 52% yes
b. Mechanical cooling: 24% yes
c. Generator: 8% yes
d. Uninterruptible power supply: 7% yes
iii. Does management receive reports on data center energy costs?
a. Yes: 71%
b. No: 29%
iv. Does management set targets for reducing data center energy costs?
a. Yes: 54%
b. No: 46%
v. How does your organization measure PUE?
a. PUE Category 0: 30%
b. PUE Category 1: 25%
c. PUE Category 2: 19%
d. PUE Category 3: 11%
e. Alternative method: 8%
f. Don’t know: 7%
vi. Does your company report PUE publicly?
a. Yes: 10%
b. No: 90%
vii. Has your organization achieved environmental or sustainability certifications for any of its data centers?
a. Colo/MTDC: 35% yes
b. Financial Services: 46% yes
c. Other Enterprises: 21% yes
viii. Considering your company’s primary multi-tenant or colocation provider, what is the length of the commitment you have made to that provider?
a. Under 2 years
i. Financial Services: 28%
ii. Other Enterprise: 36%
b. 2-3 years
i. Financial Services: 11%
ii. Other Enterprise: 22%
c. 3-5 years
i. Financial Services: 30%
ii. Other Enterprise: 21%
d. Over 5 years
i. Financial Services: 32%
ii. Other Enterprise: 21%
ix. If your organization measures the cost of data center downtime, how do you use that information?
a. Report to management: 88%
b. Rationalize equipment purchases: 51%
c. Rationalize services purchases: 42%
d. Rationalize increased staff or staff training: 39%
e. Rationalize software purchases: 32%
x. Does your organization perform unscheduled drills that simulate data center emergencies?
a. Yes: 44%
b. No: 56%
xi. Considering your organization’s largest enterprise data center, what staffing model is used for facilities staff?
a. 24 hours a day, 7 days a week: 70%
b. Other: 30%
Email Uptime Institute Director of Content and Publications Matt Stansberry with any questions or feedback: [email protected].
This paper provides analysis and commentary of the Uptime Institute survey responses. Uptime Institute makes reasonable efforts to facilitate a survey that is reliable and relevant. All participant responses are assumed to be in good faith. Uptime Institute does not verify or endorse the responses of the participants; any claims to savings or benefits are entirely the representations of the survey participants.
Data Center Cooling: CRAC/CRAH redundancy, capacity, and selection metrics
Striking the appropriate balance between cost and reliability is a business decision that requires metrics
By Dr. Hussein Shehata
This paper focuses on cooling limitations of down-flow computer room air conditioners/air handlers (CRACs/CRAHs) with dedicated heat extraction solutions in high-density data center cooling applications. The paper also explains how higher redundancy can increase total cost of ownership (TCO) while supporting only very light loads and proposes a metric to help balance the requirements of achieving higher capacities and efficient space utilization.
With several vendors proposing passive high-density technologies (e.g., cabinet hot air removal as a total resolution to the challenge of high density), this analysis shows that such solutions are only possible for a select few cabinets in each row and not for full deployments.
The vendors claim that the technologies can remove heat loads exceeding 20 kilowatts (kW) per cabinet, but our study disproves that claim; passive-cooling units cannot extract more heat than the cold air supplied by the CRACs. For the efficient design of a data center, the aim is to increase the number of cabinets and the total IT load, with the minimal necessary supporting cooling infrastructure. See Figure 1.
Figure 1. The relationship between IT and supporting spaces
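The physical limit behind that statement is a simple sensible-heat balance: a passive exhaust duct can only carry away as much heat as the cold air delivered to the cabinet can absorb, roughly 1.2 kJ per cubic meter per kelvin of temperature rise for air near sea level. The sketch below applies that balance with assumed airflow and temperature-rise values (both are illustrative assumptions, not measurements from this study):

```python
# Upper bound on the sensible heat a cabinet can reject, given the cold
# airflow actually supplied to it. Airflow and delta-T are assumptions.

RHO_CP_AIR = 1.2  # kJ/(m^3*K), approximate volumetric heat capacity of air

def max_heat_removal_kw(airflow_m3_per_s: float, delta_t_k: float) -> float:
    """Sensible heat carried by an airstream: Q = (rho*cp) * V * dT, in kW."""
    return RHO_CP_AIR * airflow_m3_per_s * delta_t_k

# Roughly 0.5 m^3/s (~1,060 CFM) of supply air with a 12 K rise across
# the IT equipment supports only about 7 kW, well short of 20 kW/cabinet.
print(f"{max_heat_removal_kw(0.5, 12.0):.1f} kW")
```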
Passive Hot Air Removal
Data center design continually evolves towards increasing capacity and decreasing spatial volume, increasing energy density. High-end applications and equipment have higher energy density than standard equipment; however, the high-performance models of any technology have historically become the market standard with the passage of time, which in the case of the IT industry is a short period. As an example, every 3 years the world’s fastest supercomputers offer 10 times the performance of the previous generation, a trend that has been documented over the past 20 years.
Cooling high-density data centers is most commonly achieved by:
• Hot Air Removal (HAR) via cabinet exhaust ducts—active and passive.
See Figure 2.
Figure 2. HAR via cabinet exhaust ducts (active and passive). Courtesy APC
• Dedicated fan-powered cooling units (i.e., chilled water cabinets).
See Figure 3.
Figure 3. Dedicated fan-powered cooling units
This paper focuses on HAR/CRAC technology using an underfloor air distribution plenum.
Approach
High-density data centers require cooling units that are capable of delivering the highest cooling capacity using the smallest possible footprint. The high-powered CRACs in the smallest footprints available from the major manufacturer offer a net sensible cooling capacity of approximately 90 kW but require 3×1-meter (m) (width by depth) footprints. (Appendix C includes the technical specifications for the example CRAC).
Excluding a detailed heat load estimate and air distribution effectiveness, the variables of CRAC capacity, cabinet quantity, and cabinet capacity may be related in the following formula.
Note: The formula is simplified and focused on IT cooling requirements, excluding other loads such as lighting and solar gains.
CRAC Capacity = Number of IT cabinets x kW/cabinet (1)
Example 1 for N Capacity: If a 90-kW CRAC cools 90 cabinets, the average cooling delivered per cabinet is 1 kW.
90 kW = 90 cabinets x 1 kW/cabinet (2)
Example 2 for N Capacity: If a 90-kW CRAC cools two cabinets, the average cooling delivered per cabinet is 45 kW.
90 kW = 2 cabinets x 45 kW/cabinet (3)
The simplified methodology, however, does not provide practical insight into space usage and heat extraction capability. In Example 1, one CRAC would struggle to efficiently deliver air evenly to all 90 cabinets due to the practical constraints of CRAC airflow throw; in most circumstances the cabinets farthest from the CRAC would likely receive less air than the closer cabinets (assuming practical raised-floor heights and minimal obstructions to underfloor airflow).
In Example 2, one CRAC would be capable of supplying sufficient cooling to both cabinets; however, the ratio of space utilization of the CRAC, service access space, and airflow throw buffer would result in a high space usage for the infrastructure compared to prime white space (IT cabinets). Other constraints, such as allocating sufficient perforated floor tiles/grills in case of a raised-floor plenum or additional Cold Aisle containment for maximum air distribution effectiveness may lead to extremely large Cold Aisles that again render the data center space utilization inefficient.
Figure 4. Typical Cold Aisle/Hot Aisle arrangement (ASHRAE TC9.9)
Appendix B includes a number of data center layouts generated to illustrate these concepts. The strategic layouts in this study considered maximum (18 m), average (14 m) and minimal (10 m) practical CRAC air throw, with CRACs installed perpendicular to cabinet rows on one and two sides as recommended in ASHRAE TC9.9. The front-to-back airflow cabinets are assumed to be configured to the best practice of Cold Aisle/Hot Aisle arrangement (See Figure 4). Variation in throw resulted in low, medium, and high cabinet count, best defined as high density, average density, and maximum packed (high number of cabinets) for the same data center whitespace area and electrical load (see Figure 5).
Figure 5. CRAC throw area
In the example layouts, CRACs were placed close together, with the minimal 500-millimeter (mm) maintenance space on one side and 1,000 mm on the long side (see Figure 6). Note that each CRAC manufacturer might have different unit clearance requirements. A minimal 2-m buffer between the nearest cabinet and each CRAC unit prevents entrainment of warm air into the cold air plenum. Hot Aisle and Cold Aisle widths were modeled at approximately 1,000 mm (hot) and 1,200 mm (cold), as recommended in ASHRAE TC9.9 literature.
In the context of this study, CRAC footprint is defined as the area occupied by CRACs (including maintenance and airflow throw buffer); cabinet footprint is defined as the area occupied by cabinets (and their aisles). These two areas have been compared to analyze the use of prime footprint within the data center hall.
Tier level requires each and every power and cooling component and path to fulfill the Tier requirements; in the context of this paper the redundancy configuration reflects the Tier level of CRAC capacity components only, excluding considerations to other subsystems required for the facility’s operation. Tier I would not require redundant components, hence N CRAC units are employed. Tiers II, III, and IV would require redundant CRACs; therefore N+1 and N+2 configurations were also considered.
Figure 6. CRAC maintenance zone
A basic analysis shows that using a CRAC as described above would require a 14-m2 area (including throw buffer), which would generate 25.7 kW of cooling for every 1 m of active CRAC perimeter at N redundancy, 19.3 kW for one-sided N+1 redundancy and two-sided N+2 redundancy, 22.5 kW for two-sided N+1 redundancy, and 12.9 kW for one-sided N+2 redundancy. However, data center halls are not predominantly selected and designed based on perimeter length, but rather on floor area.
The study focused on identifying the area required by CRAC units, compared to that occupied by IT cabinets, and defines it as a ratio. Figure 7 shows Tier I (N) one-sided CRACs in a high-density cabinet configuration. Appendix A includes the other configuration models.
Furthermore, a metric has been derived to help determine the appropriate cabinet footprint at the required Tier level (considering CRAC redundancy only).
Figure 7. Tier 1 (N) one-sided CRACs in a high-density cabinet configuration
Cabinet capacity-to-footprint factor: C2F = kW per cabinet / C2C (4)
Where CRAC-to-cabinet factor: C2C = CRAC footprint / Cabinet footprint (5)
For multiple layout configurations, the higher the C2F, the more IT capacity can be incorporated into the space. Higher capacity could be established by more cabinets at lower densities or by fewer cabinets at higher densities. However, the C2F is closely linked to the necessary CRAC footprint, which as analyzed in this paper, could be a major limiting factor (see Figure 8).
Figure 8. C2F versus cabinet load (kW) for various CRAC redundancies
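A minimal sketch of formulas (4) and (5), using the footprint and load figures that appear in Example 3 below (59 m2 of CRAC space, 127 m2 of cabinet space, 4.5 kW per cabinet); the inputs come from the paper, and the small difference from the quoted C2F of 9.8 is rounding:

```python
# C2C and C2F as defined in formulas (4) and (5). The inputs below are
# the Example 3 figures; any other layout can be substituted.

def c2c(crac_footprint_m2: float, cabinet_footprint_m2: float) -> float:
    """CRAC-to-cabinet area ratio, formula (5)."""
    return crac_footprint_m2 / cabinet_footprint_m2

def c2f(kw_per_cabinet: float, c2c_ratio: float) -> float:
    """Cabinet capacity-to-footprint factor, formula (4)."""
    return kw_per_cabinet / c2c_ratio

ratio = c2c(crac_footprint_m2=59.0, cabinet_footprint_m2=127.0)
print(f"C2C = {ratio:.2f}")            # ~0.46
print(f"C2F = {c2f(4.5, ratio):.1f}")  # ~9.7, compared with the quoted 9.8
```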
Results
The detailed results appear in Appendix B. The variations analyzed included reference CRACs with no redundancy, with one redundant unit, and with two redundant units. For each of the CRAC configurations, three cabinet layouts were considered: maximum packed, average density, and high density.
Results showed that the highest C2F based on the six variations within each of the three redundancy configurations is as follows:
• Tier I (N)–one-sided CRAC deployment: C2F = 13
• Tier II-IV (N+1)–two-sided CRAC deployment: C2F = 11.4
• Tier II-IV (N+2 and above)–two-sided CRAC deployment: C2F = 9.8
The noteworthy finding is that the highest C2F in all 18-modeled variations was for high-density implementation and at a CRAC-to-cabinet (C2C) area ratio of 0.46 (i.e., CRACs occupy 32% of the entire space) and a cabinet footprint of 2.3 m2 per cabinet. This is supporting evidence that, although high-density cabinets would require more cooling footprint, high density is the most efficient space utilization per kW of IT.
Example 3 illustrates how the highest C2F on a given CRAC redundancy and one- or two-sided layout may be utilized for sizing the footprint and capacity within an average-sized 186-m2 data center hall for a Tier II-IV (N+2, C2F=9.8, C2C=0.5, and cabinet footprint of 2.3 m2) deployment. The space is divided into a net 124-m2 data hall for cabinets, and 62 m2 of space for CRAC units by utilizing the resulting ideal C2C of 0.46.
Example 3: If a net 124-m2 data hall for cabinets and 62 m2 of space for CRAC units is available, the highest achievable capacity would be 4.5 kW/cabinet.
9.8 = (4.5 kW/cabinet) / (59 m2 / 127 m2) (6)
To determine the number of cabinets and CRACs, the CRAC cooling capability will be used rather than the common method of dividing the area by cabinet footprint.
The total area occupied by a CRAC is 14 m2; hence approximately four CRACs would occupy the 59-m2 space. Two CRACs are on duty, since N+2 is utilized; therefore, the available capacity would be 90 kW x 2 = 180 kW. The number of cabinets that could then be installed in this 186-m2 total area would be 180/4.5 = 40 cabinets.
The total effective space used by the 40 cabinets is 92 m2 (40 x 2.3 m2), which is 72% of the available cabinet-dedicated area. This shows that higher redundancy may be more resilient but does not use the space efficiently. This argument highlights the importance of the debate between resilience and space utilization.
Example 4 illustrates how C2F may be utilized for sizing the footprint and capacity within the same data center hall but at a lower redundancy of N+1 configuration.
Example 4: By applying the same methodology, the highest achievable capacity would be 5.2 kW/cabinet.
11.4 = (5.2 kW/cabinet) / (59 m2 / 127 m2) (7)
The total area occupied by a CRAC is 14 m2 (including CRAC throw and maintenance); hence approximately four CRACs would occupy 59 m2 of space. Three CRACs would be on duty, since N+1 is utilized; therefore, the available capacity would be 90 kW x 3 = 270 kW. The number of cabinets that could then be installed in this 186-m2 total area would be 270/5.2 = 52 cabinets.
The total effective space used by the 52 cabinets is 120 m2 (52 x 2.3 m2), which is 95% of the space. The comparison of Example 3 to Example 4 shows that less redundancy provides more efficient space utilization.
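The arithmetic behind Examples 3 and 4 can be written out as a short calculation. The sketch below simply replays the paper's inputs (90-kW CRACs occupying 14 m2 each, 2.3-m2 cabinets, 59 m2 of CRAC space, and 127 m2 of cabinet space); small differences from the quoted percentages come from rounding:

```python
import math

# Replay the sizing arithmetic of Examples 3 (N+2) and 4 (N+1) using the
# figures given in the paper.

CRAC_KW = 90.0          # net sensible capacity per CRAC
CRAC_AREA_M2 = 14.0     # per CRAC, including maintenance and throw buffer
CABINET_AREA_M2 = 2.3   # per cabinet, including aisles
CRAC_SPACE_M2 = 59.0
CABINET_SPACE_M2 = 127.0

def size_hall(redundant_cracs: int, kw_per_cabinet: float) -> None:
    installed = math.floor(CRAC_SPACE_M2 / CRAC_AREA_M2)   # ~4 CRACs fit
    duty = installed - redundant_cracs
    capacity_kw = duty * CRAC_KW
    cabinets = round(capacity_kw / kw_per_cabinet)
    used_m2 = cabinets * CABINET_AREA_M2
    print(f"N+{redundant_cracs}: {duty} duty CRACs -> {capacity_kw:.0f} kW, "
          f"{cabinets} cabinets, {used_m2 / CABINET_SPACE_M2:.0%} of cabinet space")

size_hall(redundant_cracs=2, kw_per_cabinet=4.5)   # Example 3
size_hall(redundant_cracs=1, kw_per_cabinet=5.2)   # Example 4
```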
Figure 9. Summary of the results
The analysis shows that, taking the maximum C2F obtained for each redundancy type and projecting it onto a given average load per cabinet, an average high-density cabinet of 20 kW would require the CRAC units to occupy double the IT cabinet space in an N+2 configuration, lowering the effective use of prime IT floor space (see Figure 9).
Additional Metrics
Additional metrics for design purposes have been derived from the illustrated graphs and resultant formulae.
The derived formula can be documented as follows (a numerical check appears after the variable definitions):
P = K/(L + M) - (6.4 x R/S) (8)
Where
P = Cooling per perimeter meter (kW/m)
K = CRAC net sensible capacity (kW)
L = CRAC length (m)
M = CRAC manufacturer side maintenance clearance (m)
R = Number of redundant CRAC units (e.g., 0 for N, 1 for N+1, 2 for N+2)
S = Number of sides on which CRACs are deployed (1 for one-sided, 2 for two-sided layouts)
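Plugging the example CRAC's figures into formula (8), with K = 90 kW of net sensible capacity, L = 3 m, and M = 0.5 m for the 500-mm side maintenance clearance noted earlier, reproduces the per-perimeter-meter values quoted above (25.7, 19.3, 22.5, and 12.9 kW/m). The sketch below runs the redundancy and layout combinations; the inputs come from this study, and only the tabulation is new:

```python
# Cooling capacity per meter of active CRAC perimeter, formula (8):
#   P = K/(L + M) - 6.4 x (R/S)
# Inputs: K = 90 kW, L = 3 m, M = 0.5 m (the example CRAC in this study).

def cooling_per_perimeter_meter(k_kw: float, l_m: float, m_m: float,
                                redundancy: int, sides: int) -> float:
    return k_kw / (l_m + m_m) - 6.4 * redundancy / sides

cases = [
    ("N, one-sided",   0, 1),
    ("N+1, one-sided", 1, 1),
    ("N+1, two-sided", 1, 2),
    ("N+2, one-sided", 2, 1),
    ("N+2, two-sided", 2, 2),
]

for label, r, s in cases:
    print(f"{label}: {cooling_per_perimeter_meter(90.0, 3.0, 0.5, r, s):.1f} kW/m")
# Yields ~25.7, 19.3, 22.5, 12.9, and 19.3 kW/m, matching the study's figures.
```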
Conclusion
Approximately 50% (270 kW vs. 180 kW) more capacity, 30% more cabinets, and 16% higher cabinet load density could be utilized in the same space with only one redundant CRAC and may still fulfill Tier II-IV component redundancy requirements. This is achievable at no additional investment cost, as the same number of CRACs (four) is installed within the same available footprint of 2,000 ft2. The analysis also showed that the highest average practical load per cabinet should not exceed 6 kW if efficient space utilization is sought by maintaining a C2C of 0.46.
This study shows that an average high-density cabinet load may not be cooled efficiently with the use of only CRACs or even with CRACs coupled with passive heat-extraction solutions. The data supports the necessary implementation of row- and cabinet-based active cooling for high-density data center applications.
The first supercomputers were water cooled; however, the low-density data centers commissioned roughly a decade ago (below 2 kW per cabinet) almost totally eliminated liquid cooling. This was due to reservations about the risks of water leakage within live, critical data centers.
Data centers of today are considered to be medium-density facilities. Some of these data centers average below 4 kW per cabinet. Owners and operators that have higher demands and are ahead of the average market typically dedicate only a portion of the data center space to high-density cabinets.
With server density increasing every day and high-density cabinets (approaching 40 kW and above) becoming a potential future deployment, data centers seem likely to experience soaring heat loads that will demand comprehensive liquid-cooling infrastructures.
With future high-density requirements, CRAC units may become secondary cooling support or even more drastically, CRAC units may become obsolete!
Appendix A
Appendix A1. One-sided CRAC, maximum-throw, maximum-packed cabinets
Appendix A2. One-sided CRAC, average-throw, medium cabinets
Appendix A3. One-sided CRAC, minimum-throw, high-density cabinets
Appendix A4. Two-sided CRAC, maximum-throw, maximum-packed cabinets.
Appendix A5. Two-sided CRAC, average-throw, medium packed cabinets
Appendix A6. Two-sided CRAC, minimum-throw, high density cabinets
Appendix B
Appendix B1. Tier I (N) CRAC modeling results
Note 1: HD = High Density
Note 2: MP = Max Packed
Note 3: * = CRAC Area includes maintenance and throw buffer
Note 4:^ = 27 m2 area is deducted from total area, as it is already included in the throw buffer
Appendix B2. Tier II-IV (N+1) CRAC modeling results
Note 1: HD = High Density
Note 2: MP = Max Packed
Note 3: * = CRAC Area includes maintenance and throw buffer
Note 4: ^ = 27 m2 area is deducted from total area, as it is already included in the throw buffer
Appendix B3. Tier II-IV (N+1) CRAC modeling results
Appendix C
Liebert CRAC Technical Specification
Note: Net sensible cooling will be reduced by 7.5 kW x 3 = 22.5 kW for fans; 68.7 kW for Model DH/VH380A
Dr Hussein Shehata, BA, PhD, CEng, PGDip, MASHRAE, MIET, MCIBSE, is the technical director, EMEA, Uptime Institute Professional Services (UIPS). Dr Shehata is a U.K. Chartered Engineer who joined Uptime Institute Professional Services in 2011. He is based in Dubai, serving the EMEA region. From 2008-2011, Hussein was vice president & AsiaPacific DC Engineering, Architecture & Strategy Head at JP Morgan in Japan. Prior to that, he co-founded, managed, and operated as a subject matter expert (SME) at PTS Consulting Japan. He graduated in Architecture, followed by a PhD in HVAC, and a diploma in Higher Education that focused on multi-discipline teaching, with a focus on Engineers and Architects.