Close Coupled Cooling and Reliability

Achieving Concurrently Maintainable and Fault Tolerant cooling using various close coupled cooling technologies

By Matt Mescall

Early mainframe computers were cooled by water at the chip level. Then, as computing moved to the distributed server model, air replaced water. Typically, data centers included perimeter computer room air conditioning (CRAC) units to supply cold air to a raised floor plenum and perforated floor tiles to deliver it to IT equipment. These CRAC units were either direct-expansion (DX) or chilled-water units (for simplicity, CRAC will be used to refer to either kind of unit). This arrangement worked for the past few decades while data centers were primarily occupied by low-density IT equipment (< 2-4 kilowatts [kW] per rack). However, as high-density racks become more common, CRAC units and a raised floor may not provide adequate cooling.

To address these situations, data center cooling vendors developed close coupled cooling (CCC). CCC technology includes in-row, in-rack, above-rack, and rear-door heat exchanger (RDHx) systems. Manufacturers typically recommend the use of a Cold Aisle/Hot Aisle arrangement for greater efficiency, which is a best practice for all data center operations. As rack density increased due to IT consolidation and virtualization, CCC moved from being a solution to an unusual cooling situation to being the preferred cooling method. Implemented properly, a CCC solution can meet the Concurrently Maintainable and Fault Tolerant requirements of a data center.

This paper assumes that, while an air handler may provide humidity control, the close coupled cooling solution provides the only cooling for the IT equipment in the data center. Additionally, it is assumed that the reader understands how to design a direct-expansion or chilled-water CRAC-based cooling system to meet Concurrent Maintainability and Fault Tolerant requirements. This paper does not address Concurrent Maintainability and Fault Tolerant requirements for a central cooling plant, only the CCC system in the data hall.

Meeting Concurrently Maintainable and Fault Tolerant Requirements

First, let’s clarify what is required for a Concurrently Maintainable (Tier III) and a Fault Tolerant (Tier IV) cooling system. This discussion is not a comprehensive description of all Concurrently Maintainable and Fault Tolerant requirements, but it provides the basis for the rest of the discussion in this paper.

A Concurrently Maintainable system must have redundant capacity components and independent distribution paths, which means that each and every capacity component and distribution path element can be taken out of service for maintenance, repair, or replacement without impacting the critical environment.

To meet this requirement, the system must have dry pipes (no flowing or pressurized liquid) to prevent liquid spills when maintaining pipes, joints, and valves. Draining a pipe while it is disassembled is allowed, but hot tapping and pipe freezing are not. A Fault Tolerant cooling system may look like a Concurrently Maintainable system, but it must also autonomously respond to failures, provide Continuous Cooling, and compartmentalize the chilled-water and/or refrigerant pipes outside the room of use (typically the computer room).

There are several different types and configurations of CCC. For simplicity, this paper will break them into two groups: in-row and above-row units, and RDHx units. While there are other CCC solutions available, the same concepts can be used to provide a Concurrently Maintainable or Fault Tolerant design.

In-row and above-row CCC

When data center owners have a business requirement for a high-density data center to be Concurrently Maintainable or Fault Tolerant, a CCC design presents special considerations that do not exist with room-based cooling. First, airflow must be considered. A CRAC-unit-based cooling design that is Concurrently Maintainable or Fault Tolerant has N+R cooling units that provide cooling to the whole room. When a redundant unit is off for maintenance or suffers a fault, the IT equipment still receives cooling from the remaining CRAC units via the perforated tiles in the Cold Aisle. The cooling in any Cold Aisle is not affected when the redundant unit is offline. This arrangement allows for one or two redundant CRAC units in an entire room (see Figure 1).

Figure 1. CCC airflow considerations

CCC provides cooling to the specific Cold Aisle where the unit is located. In other words, CCC units cannot provide cooling to different Cold Aisles the way CRAC units can. Accordingly, the redundant CCC unit must be located in the aisle where the cooling is required. In addition to having sufficient redundant cooling in every Cold Aisle, distance from the cooling unit to the IT equipment must also be considered. In-row and above-row cooling units typically can provide cold air for only a limited distance. The design must take into account the worst-case scenario during maintenance or a failure event.

After considering the number of units and their location in the Cold Aisle, the design team must consider the method of cooling, which may be air-cooled direct expansion (DX), chilled water, or a pumped refrigerant. Air-cooled DX units are typically matched with their own condenser units. Other than proper routing, piping for air-cooled DX units requires no special considerations.

Piping to chilled-water units is either traditional chilled-water piping or a cooling distribution unit (CDU). In the former method chilled water is piped directly to CCC units, similar to CRAC units. In this case, chilled-water piping systems are designed to be Concurrently Maintainable or Fault Tolerant in the same way as single-coil, room-based CRAC units.

The latter method, which uses CDUs, poses a number of special considerations. Again, chilled-water piping to a CDU and to single-coil, room-based CRAC units is designed to be Concurrently Maintainable or Fault Tolerant in the same way. However, designers must consider the impact to each Cold Aisle when a CDU is removed from service or suffers a fault.

If any single CDU provides cooling to more than the redundant number of cooling units in any aisle, the design is not Concurrently Maintainable or Fault Tolerant. When CDUs are located outside of the server room or data hall in a Fault Tolerant design, they must be properly compartmentalized so that a single event does not remove more than the redundant number of cooling units from service. A Fault Tolerant system also requires Continuous Cooling, the ability to detect, isolate, and contain a fault, and sustain operations. In a CCC system that rejects heat to a chilled-water system, the mechanical part of Continuous Cooling can be met with an appropriate thermal storage tank system that is part of a central plant.
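
The aisle-level impact of losing a CDU can be checked mechanically. The short Python sketch below (a hypothetical data model, not from any vendor tool) flags any single CDU whose loss would take more than the redundant number of cooling units out of service in an aisle.

    from collections import defaultdict

    def check_cdu_redundancy(unit_to_cdu, redundancy_per_aisle):
        """unit_to_cdu: list of (aisle, cdu) pairs, one entry per cooling unit.
        redundancy_per_aisle: dict mapping aisle -> number of redundant units (R)."""
        counts = defaultdict(lambda: defaultdict(int))
        for aisle, cdu in unit_to_cdu:
            counts[aisle][cdu] += 1
        violations = []
        for aisle, by_cdu in counts.items():
            for cdu, n in by_cdu.items():
                if n > redundancy_per_aisle[aisle]:  # losing this CDU exceeds R in this aisle
                    violations.append((aisle, cdu, n))
        return violations

    # Aisle 1 is N+1 (R = 1); CDU-A feeds two of its three units, so its loss exceeds R.
    print(check_cdu_redundancy([(1, "CDU-A"), (1, "CDU-A"), (1, "CDU-B")], {1: 1}))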

A CCC system that rejects heat to outside air via refrigerant and a condenser will likely rely on uninterrupted power to provide Continuous Cooling, which will be discussed in the following paragraphs.

Some CCC systems use pumped refrigerant. These systems transfer heat from pumped refrigerant to a building’s chilled-water system, a glycol system, or an external condenser unit.

Due to the similarities between chilled-water and glycol systems with respect to the piping headers, glycol and chilled-water systems will be treated the same for purposes of this paper. The heat transfer occurs at an in-room chiller or heat exchanger that, for the purposes of this discussion, is similar to a CDU. The Concurrently Maintainable and Fault Tolerant design considerations for a pumped refrigerant system are the same as for a chilled-water system that uses a CDU.

The system that powers all CCC components must be designed to ensure that the electrical system does not defeat the Concurrent Maintainability or Fault Tolerance of the mechanical system. In a Concurrently Maintainable mechanical system electrical design, no more than the redundant number of cooling units may be removed from service when any part of the electrical system is removed from service in a planned manner. This requirement includes the cooling within any aisle, not just the room as a whole. Designing the CCC units and the associated CDUs, in-room chillers, or heat exchangers in a 2N configuration greatly simplifies the electrical distribution.

Providing an A feed to half of the units and a B feed to the other half, while paying attention to the distribution of the CCC units, will typically provide a Concurrently Maintainable electrical design.

If the cooling system is in an N+R configuration, the distribution of the power sources will require special coordination. Typically, the units will be dual fed, which can be accomplished by utilizing an internal transfer switch in the units, an external manual transfer switch, or an external automatic transfer switch. This requirement applies to all components of the CCC system that require power to cool the critical space, including the in-row and above-row units, the in-room chillers, heat exchangers, and any power that is required for CDUs (see Figure 2).
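
As an illustration of this coordination, the short Python sketch below (a hypothetical example; the single- and dual-fed units are invented for illustration) verifies that the loss of either the A or the B distribution path leaves no aisle short by more than its redundant number of units.

    def units_lost(units, failed_feed, aisle):
        """Count units in an aisle left with no surviving feed when one path fails."""
        return sum(1 for u in units
                   if u["aisle"] == aisle and not (u["feeds"] - {failed_feed}))

    def feed_loss_acceptable(units, redundancy_per_aisle):
        for failed_feed in ("A", "B"):
            for aisle, r in redundancy_per_aisle.items():
                if units_lost(units, failed_feed, aisle) > r:
                    return False
        return True

    # Dual-fed units (internal or external transfer switch) ride through the loss of either feed.
    units = [{"aisle": 1, "feeds": {"A", "B"}},
             {"aisle": 1, "feeds": {"A"}},
             {"aisle": 1, "feeds": {"B"}}]
    print(feed_loss_acceptable(units, {1: 1}))  # True: at most one unit per aisle is lost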

Figure 2. CCC valve scenario

When any part of a Fault Tolerant electrical design for a mechanical system experiences a fault, no more than the redundant number of cooling units may be removed from service. The same Concurrently Maintainable concepts apply to a Fault Tolerant electrical system; however, all of the transfer switches must be automatic and cannot rely on human intervention to respond to a fault. Additionally, in order to provide Continuous Cooling, uninterruptible power must be provided for cooling fans, in-room chillers and heat exchangers, pumps, and CDUs. A CCC system that uses DX and condensers to reject heat to outside air will require uninterrupted power to all system components to achieve Continuous Cooling.

The controls for these systems must also be considered in the design and meet the appropriate Concurrent Maintainability and Fault Tolerant requirements.

RDHx

The requirements for a Concurrently Maintainable or Fault Tolerant RDHx cooling solution are similar to those for in-row cooling. The RDHx units typically use chilled water or a pumped refrigerant and CDUs, in-room chillers, or heat exchangers. These units need to meet all of the Concurrent Maintainability or Fault Tolerant requirements of in-row CCC units. Airflow when a door is removed from service for either a planned event or due to a failure is a major consideration. When an RDHx solution cools an entire data center, it may be configured in a front-to-back rack configuration. When one or more doors are removed from service, the affected racks will blow hot exhaust air into the racks behind them, which may cause them to overheat, depending on the heat load.

This configuration does not meet Concurrent Maintainability or Fault Tolerant requirements, which require that the cooling system provide N cooling to all critical equipment during a planned maintenance event or a failure. Placing the racks in a Cold Aisle/Hot Aisle configuration may not meet this requirement as exhaust air from the affected rack may circulate over its top from the Hot Aisle and overheat the servers at the top of the rack and possibly adjacent racks. The same airflow issue is possible for racks placed at the end of rows when their RDHx is not working.

Summary

Using CCC as the only form of cooling in the data center is becoming more common. CCC introduces challenges to meeting Concurrent Maintainability and Fault Tolerant requirements beyond those typically experienced with a CRAC-based cooling system: airflow behaves differently than it does with room-based CRAC units, and the consequential impact of maintenance and failures on the additional capacity components and distribution systems must not remove more than the redundant number of units from service. These challenges can be met with careful consideration when designing all parts of the CCC system.

Matt Mescall

Matthew Mescall, PE, is a senior consultant for Uptime Institute Professional Services and Tier Certification Authority, where he performs audits and provides strategic-level consulting and Tier Certification reviews. Mr. Mescall's career in critical facilities spans 12 years and includes responsibilities in planning, engineering, design, construction, and operation. Before joining Uptime Institute, Mr. Mescall was with IBM, where he operated its Boulder, CO, data center and led a worldwide team analyzing best practices across IBM data centers to ensure consistent, cost-effective reliability. Mr. Mescall holds a BS degree in Civil Engineering from the University of Southern California, an MS in Civil Engineering from the Georgia Institute of Technology, and a Masters Certificate in Project Management from George Washington University.

Annual Data Center Industry Survey 2014

The fourth annual Uptime Institute Data Center Industry Survey provides an overview of global industry trends by surveying 1,000 data center operators and IT practitioners. Uptime Institute collected responses via email February through April 2014 and presented preliminary results in May 2014 at the 9th Uptime Institute Symposium: Empowering the Data Center Professional. This document provides the full results and analysis.

I. Survey Demographics

Uptime Institute focused its analysis on the end-user survey respondents. The majority of survey participants are data center managers, followed by smaller percentages of IT managers and senior executives. The U.S. and Canada make up a significant portion of the response, with growing numbers of participants from around the globe.

See survey demographics graphic

About half of the end-user respondents work for third-party commercial data center companies (colocation or cloud computing providers), and the other half work for enterprises in vertical industries such as financial services (11%), manufacturing (7%), healthcare (4%), government (4%), and other industries (26%).

In various sections throughout this survey, the financial services industry’s responses have been broken out from those of traditional enterprise companies. Across multiple focus areas, the responses of financial services organizations differ significantly from those of other verticals.

Previous annual surveys in 2011, 2012, and 2013 showed that large organizations (defined in this context as companies managing over 5,000 servers) were adopting new technologies, outsourcing less-critical workloads, and pursuing energy efficiency goals much faster than smaller companies.

For 2014, we analyzed the results further and found that the data center maturity gap is not just a matter of size, but also specific to a single industry (banking). This difference is most likely due to the massive investment financial organizations have in IT, especially in relation to their overall cost structures. A financial organization’s efficiency at deploying IT correlates directly to its profitability in a way that may not be as obvious to other companies.

When the response profiles of the financial organizations and the colocation providers are compared, a pattern starts to emerge – both banks and colos run their data center operations as a business.

We will explore the implications of these parallels throughout the survey data.

II. Data center budgets

In each year’s survey, we ask participants to compare their organization’s current spending on data centers (including real estate, infrastructure equipment, staffing, operations, etc.) to the previous year. Every year, the answers to these questions reveal massive growth in the colocation and multi-tenant data center (MTDC) industry compared to enterprise spending.

See budget increases graphic

In 2014, the vast majority of third-party data center respondents (86%) report receiving budget increases, versus 63% of financial firms and just 50% of the other enterprise companies. This gap is similar to the 2013 results: 77% of third-party respondents increased budget, versus just 47% of enterprise companies.

With half of all enterprises reporting stagnant or shrinking data center budgets and colocation operators reporting massive growth, our conclusion is that increasingly, enterprise data center workloads are shifting to third-party providers.

This is not to say that the enterprise data centers are going away any time soon. These organizations will continue to squeeze a return on their investments in enterprise data center assets. The value of a performing, fully or partially depreciated asset cannot be disregarded.

According to surveys from Uptime Institute's executive programs, nearly every enterprise has chosen to host some percentage of its IT workloads off-premise, in an MTDC, cloud, or other third-party environment. Anecdotal reports to Uptime Institute lead us to believe this to be a fairly recent development (within the last 5 years).

Yet, while nearly every company is deploying off-premise computing, the percentage of workloads hosted in these environments appears to be fairly static as a percentage of overall compute. The figure below represents enterprise organizations' IT deployment mix today and projected deployment mix in 2014. This indicates that the growth trends in off-premise computing will continue as overall enterprise IT workloads continue to grow. This finding also indicates that there is no imminent rebound in the cyclical outsourcing-insourcing-outsourcing pattern commonly seen in IT and in other key enterprise functions such as call centers.

See in-house vs outsource graphic

This report will delve into the drivers and decision-making challenges for enterprise IT organizations contracting MTDCs and cloud computing in Section IV.

III. IT Efficiency

A special focus in 2014’s survey is an assessment of the behaviors, management strategies, and technologies used to improve IT efficiency.

Over the past several years, Uptime Institute has documented the rise in adoption of Power Usage Effectiveness (PUE) and the meager gains achieved by further pursuing that metric. Based on Uptime Institute’s field experience and feedback from the Uptime Institute Network (a user group of large data center owners and operators) around the world, enterprise IT executives are overly focused on PUE.

The vast majority (72%) of respondents measure PUE. Isolating the responses by job function, a huge percentage of executives (82%) are tracking that metric and reporting it to their corporate management.

See PUE Measurement graphic

PUE is an effective engineering ratio that data center facilities teams use to capture baseline data and track the results of efficiency improvements to mechanical and electrical infrastructure. It is also useful for design teams to compare equipment- or topology-level solutions. But as industry adoption of PUE has expanded, the metric is increasingly being misused as a methodology to cut costs and prove stewardship of corporate and/or environmental resources.
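
For reference, PUE is simply the ratio of total facility energy to the energy delivered to IT equipment; the minimal Python lines below (illustrative figures only) show the arithmetic.

    def pue(total_facility_kwh, it_equipment_kwh):
        # PUE = total facility energy / energy delivered to the IT equipment
        return total_facility_kwh / it_equipment_kwh

    print(pue(1890.0, 1000.0))  # 1.89, the average self-reported PUE in the 2011 survey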

In 2007, Uptime Institute surveyed its Network members, and found an average PUE of 2.50. The average PUE improved from 2.50 in 2007 to 1.89 in 2011 in Uptime Institute’s data center industry survey.

See largest PUE site graphic

From 2011 to today, the average self-reported PUE has only improved from 1.89 to 1.7. The biggest infrastructure efficiency gains happened five years ago, and further improvements will require significant investment and effort, with increasingly diminishing returns.

The following figure represents adoption of various data center cooling approaches to improve efficiency. The low-capital-cost approaches have largely been adopted by data center operators. And yet, executives press for further reductions in PUE. High-cost efficiency investments in technologies and design approaches may provide negative financial payback and zero improvement of systemic IT efficiency problems.

See advanced cooling graphic
See target PUE graphic

Many companies’ targets for PUE are far lower than their reported current state. By focusing on PUE, IT executives are spending effort and capital for diminishing returns and continue to ignore the underlying drivers of poor IT utilization.

For example, Uptime Institute estimates that 20% of servers in data centers are obsolete, outdated, or unused. Yet, very few survey respondents believe their server populations include comatose machines. Nearly half of survey respondents have no scheduled auditing to identify and remove unused hardware.

See comatose server graphic

Historically, IT energy efficiency has been driven by data center facilities management. According to the Uptime Institute’s Annual Data Center Industry Survey (2011-2014), less than 20% of companies report that their IT departments pay the data center power bill, and the vast majority of companies allocate this cost to the facilities or real estate budgets.

This lopsided financial arrangement fosters unaccountable IT growth, inaccurate planning, and waste. This is why 67% of the senior IT executives believe comatose hardware is not a problem.

Uptime Institute launched the Server Roundup contest in October 2011 to raise awareness about the removal and recycling of comatose and obsolete IT equipment and reduce data center energy use. Uptime Institute invited companies around the globe to help address and solve this problem by participating in the Server Roundup, an initiative to promote IT and Facilities integration and improve data center energy efficiency.

In 2 years of participating in Server Roundup, the financial firm Barclays has removed nearly 15,000 servers and saved over US$10M. Server Roundup overwhelmingly proves that disciplined hardware decommissioning can provide a significant financial impact.

Yet despite these huge savings and intangible benefits to the overall IT organization, many firms are not applying the same level of diligence and discipline to a server decommissioning plan, as noted previously.

This is the crux of the data center efficiency challenge ahead—convincing more organizations of the massive return on investment in addressing IT instead of relentlessly pursuing physical infrastructure efficiency.

Organizations need to hold IT operations teams accountable to root out inefficiencies, of which comatose servers are only the most obvious and egregious example.

See auditing comatose servers graphic

For nearly a decade, Uptime Institute has recommended enterprise IT executives take a holistic approach to significantly reduce the cost and resource consumption of compute infrastructure. That approach is outlined here.

IV. Enterprise Adoption of Third-Party Data Centers and IT Services

As stated earlier, nearly every enterprise organization is using some combination of in-house IT and off-premise computing. There are a number of drivers for this trend, including the ability to right-size deployments, lower the cost of investment, and get IT workloads into production quickly.

See rate drivers multitenant datacenter graphic

So far, enterprise organizations have largely been satisfied with their experiences using multi-tenant data center providers. In fact, in this unverified and self-reported survey, the colocation operators report fewer outages than their enterprise counterparts.

See colo-satisfaction graphic
See outage impact graphic

See colocation outage graphic

Despite many enterprise organizations currently reporting satisfaction with colocation providers, the deployment to off-premise computing has not always been a smooth transition. In Uptime Institute’s experience, many large enterprise organizations historically ran their own data centers, and only recently started deploying into third-party sites at scale. The facilities and corporate real estate teams who are often responsible for managing these companies have limited experience in contract terms, service level agreements, pricing, and other challenges specific to an outsourced IT relationship.

In fact, the decision over whether to outsource an IT workload and where to host it typically comes from the IT department, and not the management team that ultimately holds responsibility for that contract.

See job role graphic

The facilities managers and data center engineers are expected to become experts in third-party data center management on the fly—to learn on the job. All the while, the usage of third-party data center providers is rapidly expanding, and very few enterprises have formalized requirements for engaging with the MTDC market. A large percentage cannot track the cost of downtime for their organizations.

The vast majority of enterprise organizations are blasting workloads into off-premise computing environments, but they don’t know where they are going, or what their staff are supposed to do when they get there. Many organizations are making decisions on a very limited selection of criteria and inputs.

See Third Party IT Graphic
See provider selection graphic

Ultimately, this was the primary reason Uptime Institute developed the FORCSS™ Methodology in 2012.

Uptime Institute FORCSS is a means to capture, compare, prioritize, and communicate the benefits, costs, and impacts of multiple IT deployment alternatives. Deployment alternatives may include owned/existing data centers, commercial data centers (wholesale, retail, colocation, managed service), or IaaS (including cloud) that is procured on a scale or limited basis.

FORCSS provides organizations with the flexibility to develop specific responses to varying organizational needs. A case study series will present the process of applying the FORCSS Factors to specific deployment options and present the outcome of the FORCSS Index—a concise structure that can be understood by non-IT executive management.

Please refer to the FORCSS Introduction and FORCSS Case Study 1.

Conclusions

  • Enterprise companies are investing less in their own data centers. Instead, they are deploying their IT in off-premise data center environments. This trend goes back through the 4 years Uptime Institute has conducted this survey. This trend is leading to massive spending in the MTDC market. This spending does not show signs of abating.
  • Although more IT workloads are moving to third-party providers (especially new workloads), the enterprise-owned data center will continue to be responsible for much core IT production for the foreseeable future.
  • Enterprise organizations are satisfied with the performance of their current MTDC providers, but very few companies have the expertise or processes in place yet to manage or even make the most effective decisions about off-premise computing options.
  • As noted in previous years, IT efficiency efforts have largely been limited to data center facilities management and design teams. Very little work has been done to address the systemic IT inefficiencies that have plagued the industry for nearly a decade. But as senior executives push for more improvements in efficiency, many will realize they are running out of return on investment; hopefully, they will turn to improving IT utilization.
  • A large majority (75%) of survey participants said the data center industry needs a new energy efficiency metric.

Appendix

Additional 2014 survey responses:

i. If your organization has adopted Cold Aisle or Hot Aisle containment, approximately what percentage of your cabinets uses this design?

a. Less than 10% contained: 22%
b. 10-25% contained: 13%
c. 25-50% contained: 12%
d. 50% contained: 7%
e. 50-75% contained: 16%
f. 75-100% contained: 30%

ii. Would your organization consider a data center that did not include the following designs/technologies?

a. Raised floor: 52% yes
b. Mechanical cooling: 24% yes
c. Generator: 8% yes
d. Uninterruptible power supply: 7% yes

iii. Does management receive reports on data center energy costs?
a. Yes: 71%
b. No: 29%

iv. Does management set targets for reducing data center energy costs?
a. Yes: 54%
b. No: 46%

v. How does your organization measure PUE?
a. PUE Category 0: 30%
b. PUE Category 1: 25%
c. PUE Category 2: 19%
d. PUE Category 3: 11%
e. Alternative method: 8%
f. Don’t know: 7%

vi. Does your company report PUE publicly?
a. Yes: 10%
b. No: 90%

vii. Has your organization achieved environmental or sustainability certifications for any of its data centers?
a. Colo/MTDC: 35% yes
b. Financial Services: 46% yes
c. Other Enterprises: 21% yes

viii. Considering your company’s primary multi-tenant or colocation provider, what is the length of the commitment you have made to that provider?

a. Under 2 years
i. Financial Services: 28%
ii. Other Enterprise: 36%

b. 2-3 years
i. Financial Services: 11%
ii. Other Enterprise: 22%

c. 3-5 years
i. Financial Services: 30%
ii. Other Enterprise: 21%

d. Over 5 years
i. Financial Services: 32%
ii. Other Enterprise: 21%

ix. If your organization measures the cost of data center downtime, how do you use that information?
a. Report to management: 88%
b. Rationalize equipment purchases: 51%
c. Rationalize services purchases: 42%
d. Rationalize increased staff or staff training: 39%
e. Rationalize software purchases: 32%

x. Does your organization perform unscheduled drills that simulate data center emergencies?
a. Yes: 44%
b. No: 56%

xi. Considering your organization’s largest enterprise data center, what staffing model is used for facilities staff?
a. 24 hours a day, 7 days a week: 70%
b. Other: 30%


Email Uptime Institute Director of Content and Publications Matt Stansberry with any questions or feedback: [email protected].

This paper provides analysis and commentary of the Uptime Institute survey responses. Uptime Institute makes reasonable efforts to facilitate a survey that is reliable and relevant. All participant responses are assumed to be in good faith. Uptime Institute does not verify or endorse the responses of the participants; any claims to savings or benefits are entirely the representations of the survey participants.

Data Center Cooling: CRAC/CRAH redundancy, capacity, and selection metrics

Striking the appropriate balance between cost and reliability is a business decision that requires metrics

By Dr. Hussein Shehata

This paper focuses on cooling limitations of down-flow computer room air conditioners/air handlers (CRACs/CRAHs) with dedicated heat extraction solutions in high-density data center cooling applications. The paper also explains how higher redundancy can increase total cost of ownership (TCO) while supporting only very light loads and proposes a metric to help balance the requirements of achieving higher capacities and efficient space utilization.

With several vendors proposing passive high-density technologies (e.g., cabinet hot air removal as a total resolution to the challenge of high density), this analysis shows that such solutions are only possible for a select few cabinets in each row and not for full deployments.

The vendors claim that the technologies can remove heat loads exceeding 20 kilowatts (kW) per cabinet, but our study disproves that claim; passive-cooling units cannot extract more heat than the cold air supplied by the CRACs can absorb. For the efficient design of a data center, the aim is to increase the number of cabinets and the total IT load with the minimal necessary supporting cooling infrastructure. See Figure 1.

Figure 1. The relationship between IT and supporting spaces

Passive Hot Air Removal
Data center design continually evolves towards increasing capacity and decreasing spatial volume, which increases energy density. High-end applications and equipment have higher energy density than standard equipment; however, the high-performance models of any technology have historically become the market standard with the passage of time, which in the case of the IT industry is a short period. As an example, every 3 years the world's fastest supercomputers offer 10 times the performance of the previous generation, a trend that has been documented over the past 20 years.

Cooling high-density data centers is most commonly achieved by:

• Hot Air Removal (HAR) via cabinet exhaust ducts—active and passive.
See Figure 2.

Figure 2. HAR via cabinet exhaust ducts (active and passive). Courtesy APC

• Dedicated fan-powered cooling units (i.e., chilled water cabinets).
See Figure 3.

Figure 3. Dedicated fan-powered cooling units

This paper focuses on HAR/CRAC technology using an underfloor air distribution plenum.

Approach
High-density data centers require cooling units that are capable of delivering the highest cooling capacity using the smallest possible footprint. The high-powered CRACs with the smallest footprints available from major manufacturers offer a net sensible cooling capacity of approximately 90 kW but require a 3 x 1-meter (m) (width by depth) footprint. (Appendix C includes the technical specifications for the example CRAC.)

Excluding a detailed heat load estimate and air distribution effectiveness, the variables of CRAC capacity, cabinet quantity, and cabinet capacity may be related by the following formula.

Note: The formula is simplified and focused on IT cooling requirements, excluding other loads such as lighting and solar gains.

CRAC Capacity = Number of IT cabinets x kW/cabinet (1)

Example 1 for N Capacity: If a 90-kW CRAC cools 90 cabinets, the average cooling delivered per cabinet is 1 kW.

90 kW = 90 cabinets x 1 kW/cabinet (2)

Example 2 for N Capacity: If a 90-kW CRAC cools two cabinets, the average cooling delivered per cabinet is 45 kW.

90 kW = 2 cabinets x 45 kW/cabinet (3)
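
The same arithmetic can be expressed as a short Python sketch (the function name is illustrative):

    def avg_kw_per_cabinet(crac_capacity_kw, cabinet_count):
        # Rearranging formula (1): kW/cabinet = CRAC capacity / number of IT cabinets
        return crac_capacity_kw / cabinet_count

    print(avg_kw_per_cabinet(90, 90))  # Example 1: 1.0 kW per cabinet
    print(avg_kw_per_cabinet(90, 2))   # Example 2: 45.0 kW per cabinet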

The simplified methodology, however, does not provide practical insight into space usage and heat extraction capability. In Example 1, one CRAC would struggle to efficiently deliver air evenly to all 90 cabinets due to the practical constraints of CRAC airflow throw; in most circumstances, the cabinets farthest from the CRAC would likely receive less air than the closer cabinets (assuming practical raised-floor heights and minimal obstructions to underfloor airflow).

In Example 2, one CRAC would be capable of supplying sufficient cooling to both cabinets; however, the ratio of space utilization of the CRAC, service access space, and airflow throw buffer would result in a high space usage for the infrastructure compared to prime white space (IT cabinets). Other constraints, such as allocating sufficient perforated floor tiles/grills in the case of a raised-floor plenum or additional Cold Aisle containment for maximum air distribution effectiveness, may lead to extremely large Cold Aisles that again render the data center space utilization inefficient.

Figure 4. Typical Cold Aisle/Hot Aisle arrangement (ASHRAE TC9.9)

Appendix B includes a number of data center layouts generated to illustrate these concepts. The strategic layouts in this study considered maximum (18 m), average (14 m), and minimal (10 m) practical CRAC air throw, with CRACs installed perpendicular to cabinet rows on one and two sides as recommended in ASHRAE TC9.9. The front-to-back airflow cabinets are assumed to be configured to the best practice of a Cold Aisle/Hot Aisle arrangement (see Figure 4). Variation in throw resulted in low, medium, and high cabinet counts, best described as high density, average density, and maximum packed (high number of cabinets), respectively, for the same data center whitespace area and electrical load (see Figure 5).

Figure 5. CRAC throw area

In the example layouts, CRACs were placed close together, with the minimal 500-millimeter (mm) maintenance space on one side and 1,000 mm on the long side (see Figure 6). Note that each CRAC manufacturer might have different unit clearance requirements. A minimal 2-m buffer between the nearest cabinet and each CRAC unit prevents entrainment of warm air into the cold air plenum. Cold and Hot Aisle widths were modeled at approximately 1,000 mm (hot) and 1,200 mm (cold), as recommended in ASHRAE TC9.9 literature.

In the context of this study, CRAC footprint is defined as the area occupied by CRACs (including maintenance and airflow throw buffer); cabinet footprint is defined as the area occupied by cabinets (and their aisles). These two areas have been compared to analyze the use of prime footprint within the data center hall.

Tier level requires each and every power and cooling component and path to fulfill the Tier requirements; in the context of this paper, the redundancy configuration reflects the Tier level of the CRAC capacity components only, excluding considerations for other subsystems required for the facility's operation. Tier I would not require redundant components; hence, N CRAC units are employed. Tiers II, III, and IV would require redundant CRACs; therefore, N+1 and N+2 configurations were also considered.

Figure 6. CRAC maintenance zone

A basic analysis shows that using a CRAC as described above would require a 14-m2 area (including throw buffer), which would generate 25.7 kW of cooling for every 1 m of active CRAC perimeter at N redundancy, 19.3 kW for one-sided N+1 redundancy and two-sided N+2 redundancy, 22.5 kW for two-sided N+1 redundancy, and 12.9 kW for one-sided N+2 redundancy. However, data center halls are not predominantly selected and designed based on perimeter length, but rather on floor area.

The study focused on identifying the area required by CRAC units, compared to that occupied by IT cabinets, and defines it as a ratio. Figure 7 shows Tier I (N) one-sided CRACs in a high-density cabinet configuration. Appendix A includes the other configuration models.

Furthermore, a metric has been derived to help determine the appropriate cabinet footprint at the required Tier level (considering CRAC redundancy only).

Figure 7. Tier 1 (N) one-sided CRACs in a high-density cabinet configuration

Cabinet capacity to footprint factor: C2F = kW/cabinet / C2C (4)

Where CRAC to cabinet factor: C2C = CRAC footprint / Cabinet footprint (5)

For multiple layout configurations, the higher the C2F, the more IT capacity can be incorporated into the space. Higher capacity could be established by more cabinets at lower densities or by fewer cabinets at higher densities. However, the C2F is closely linked to the necessary CRAC footprint, which as analyzed in this paper, could be a major limiting factor (see Figure 8).
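
A minimal Python sketch of formulas (4) and (5), using footprint figures from the highest-C2F case reported later in the results (the specific numbers are illustrative here):

    def c2c(crac_footprint_m2, cabinet_footprint_m2):
        # Formula (5): CRAC-to-cabinet area ratio
        return crac_footprint_m2 / cabinet_footprint_m2

    def c2f(kw_per_cabinet, c2c_ratio):
        # Formula (4): cabinet capacity-to-footprint factor
        return kw_per_cabinet / c2c_ratio

    ratio = c2c(59, 127)              # ~0.46; CRACs then occupy ~0.46/1.46 = ~32% of the combined area
    print(round(c2f(4.5, ratio), 1))  # ~9.7, close to the reported N+2 result of 9.8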

Figure 8. C2F versus cabinet load (kW) for various CRAC redundancies

Results
The detailed results appear in Appendix B. The variations analyzed included reference CRACs with no redundancy, with one redundant unit, and with two redundant units. For each of the CRAC configurations, three cabinet layouts were considered: maximum packed, average density, and high density.

Results showed that the highest C2F based on the six variations within each of the three redundancy configurations is as follows:

• Tier I (N)–one-sided CRAC deployment: C2F = 13

• Tier II-IV (N+1)–two-sided CRAC deployment: C2F = 11.4

• Tier II-IV (N+2 and above)–two-sided CRAC deployment: C2F = 9.8

The noteworthy finding is that the highest C2F in all 18 modeled variations was for the high-density implementation, at a CRAC-to-cabinet (C2C) area ratio of 0.46 (i.e., CRACs occupy 32% of the entire space) and a cabinet footprint of 2.3 m2 per cabinet. This is supporting evidence that, although high-density cabinets require more cooling footprint, high density is the most efficient space utilization per kW of IT.

Example 3 illustrates how the highest C2F on a given CRAC redundancy and one- or two-sided layout may be utilized for sizing the footprint and capacity within an average-sized 186-m2 data center hall for a Tier II-IV (N+2, C2F=9.8, C2C=0.5, and cabinet footprint of 2.3 m2) deployment. The space is divided into a net 124-m2 data hall for cabinets, and 62 m2 of space for CRAC units by utilizing the resulting ideal C2C of 0.46.

Example 3: If a net 124-m2 data hall for cabinets and 62 m2 of space for CRAC units is available, the highest achievable capacity would be 4.5 kW/cabinet.

9.8 = (4.5 kW/cabinet) / (59 m2 / 127 m2) (6)

To determine the number of cabinets and CRACs, the CRAC cooling capability will be used rather than the common method of dividing the area by cabinet footprint.

The total area occupied by a CRAC is 14 m2; hence, approximately four CRACs would occupy the 59-m2 space. Two CRACs are on duty, since N+2 is utilized; therefore, the available capacity would be 90 kW x 2 = 180 kW. The number of cabinets that could then be installed in this 186-m2 total area would be 180/4.5 = 40 cabinets.

The total effective space used by the 40 cabinets is 92 m2 (40 x 2.3 m2), which is 72% of the available cabinet-dedicated area. This shows that higher redundancy may be more resilient but does not utilize the space as efficiently. This argument highlights the importance of the debate between resilience and space utilization.

Example 4 illustrates how C2F may be utilized for sizing the footprint and capacity within the same data center hall but at a lower redundancy of N+1 configuration.

Example 4: By applying the same methodology, the highest achievable capacity would be 5.2 kW/cabinet.

11.4 = (5.2 kW/cabinet) / C2C (7)

The total area occupied by a CRAC is 14 m2 (including CRAC throw and maintenance); hence approximately four CRACs would occupy 59 m2 of space. Three CRACs would be on duty, since N+1 is utilized; therefore, the available capacity would be 90 kW x 3 = 270 kW. The number of cabinets that could then be installed in this 186-m2 total area would be 270/5.2 = 52 cabinets.

The total effective space used by the 52 cabinets is 120 m2 (52 x 2.3 m2 ), which is 95% of the space. The comparison of Example 3 to Example 4 shows that less redundancy provides more efficient space utilization.
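
The sizing arithmetic of Examples 3 and 4 can be reproduced with a short Python sketch (areas, capacities, and rounding follow the text above):

    def size_hall(crac_area_m2, crac_footprint_m2, crac_capacity_kw,
                  redundant_cracs, kw_per_cabinet, cabinet_footprint_m2):
        cracs = round(crac_area_m2 / crac_footprint_m2)  # CRACs that fit the allotted space
        duty = cracs - redundant_cracs                   # units actually carrying load
        capacity_kw = duty * crac_capacity_kw
        cabinets = round(capacity_kw / kw_per_cabinet)
        return cabinets, capacity_kw, round(cabinets * cabinet_footprint_m2, 1)

    print(size_hall(59, 14, 90, 2, 4.5, 2.3))  # Example 3 (N+2): (40, 180, 92.0)
    print(size_hall(59, 14, 90, 1, 5.2, 2.3))  # Example 4 (N+1): (52, 270, 119.6)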

Figure 9. Summary of the results

The analysis shows that, taking into consideration the maximum C2F results obtained for each redundancy type and projecting them onto a given average load per cabinet, an example average high-density cabinet of 20 kW would require the CRAC units to occupy double the IT cabinet space in an N+2 configuration, hence lowering the effective use of such prime IT floor space (see Figure 9).

Additional Metrics
Additional metrics for design purposes have been derived from the illustrated graphs and resultant formulae.

The derived formula could be documented as follows:

P = K/(L + M) - (6.4 x R/S)  (8)

Where
P = Cooling per perimeter meter (kW/m)
K = CRAC net sensible capacity (kW)
L = CRAC length (m)
M = CRAC manufacturer side maintenance clearance (m)
R = Number of redundant CRACs (0 for N, 1 for N+1, 2 for N+2)
S = Number of sides in the CRAC layout (1 for one-sided, 2 for two-sided)
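
A brief Python check of formula (8) against the perimeter-cooling figures quoted earlier (the example 90-kW CRAC, 3-m length, and 0.5-m side maintenance clearance):

    def cooling_per_perimeter_m(k_kw, l_m, m_m, redundancy, sides):
        # Formula (8): P = K/(L + M) - (6.4 x R/S)
        return k_kw / (l_m + m_m) - 6.4 * redundancy / sides

    print(round(cooling_per_perimeter_m(90, 3, 0.5, 0, 1), 1))  # N, one-sided: 25.7 kW/m
    print(round(cooling_per_perimeter_m(90, 3, 0.5, 1, 2), 1))  # N+1, two-sided: 22.5 kW/m
    print(round(cooling_per_perimeter_m(90, 3, 0.5, 2, 1), 1))  # N+2, one-sided: 12.9 kW/m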

Conclusion
Approximately 50% more capacity (270 kW vs. 180 kW), 30% more cabinets, and 16% higher cabinet load density could be utilized in the same space with only one redundant CRAC, which may still fulfill Tier II-IV component redundancy requirements. This is achievable at no additional investment cost, as the same number of CRACs (four) is installed within the same available footprint of approximately 2,000 ft2 (186 m2). The analysis also showed that the highest average practical load per cabinet should not exceed 6 kW if efficient space utilization is sought by maintaining a C2C of 0.46.

This study shows that an average high-density cabinet load may not be cooled efficiently with the use of only CRACs or even with CRACs coupled with passive heat-extraction solutions. The data supports the necessary implementation of row- and cabinet-based active cooling for high-density data center applications.

The first supercomputers used cooling water; however, the low-density data centers that were commissioned closer to a decade ago (below 2 kW per cabinet) almost totally eliminated liquid cooling. This was due to reservations about the risks of water leakage within live, critical data centers.

Data centers of today are considered to be medium-density facilities. Some of these data centers average below 4 kW per cabinet. Owners and operators that have higher demands and are ahead of the average market typically dedicate only a portion of the data center space to high-density cabinets.

With server density increasing every day and high-density cabinets (approaching 40 kW and above) becoming a potential future deployment, data centers seem likely to experience soaring heat loads that will demand comprehensive liquid-cooling infrastructures.

With future high-density requirements, CRAC units may become secondary cooling support or, even more drastically, may become obsolete!

Appendix A

Appendix A1. One-sided CRAC, maximum-throw, maximum-packed cabinets

Appendix A2. One-sided CRAC, average-throw, medium cabinets

Appendix A3. One-sided CRAC, minimum-throw, high-density cabinets

Appendix A4. Two-sided CRAC, maximum-throw, maximum-packed cabinets.

Appendix A5. Two-sided CRAC, average-throw, medium packed cabinets

Appendix A6. Two-sided CRAC, minimum-throw, high density cabinets

Appendix B

Appendix B1. Tier I (N) CRAC modeling results

Note 1: HD = High Density
Note 2: MP = Max Packed
Note 3: * = CRAC Area includes maintenance and throw buffer
Note 4: ^ = 27 m2 area is deducted from total area, as it is already included in the throw buffer

Appendix B2. Tier II-IV (N+1) CRAC modeling results

Note 1: HD = High Density
Note 2: MP = Max Packed
Note 3: * = CRAC Area includes maintenance and throw buffer
Note 4: ^ = 27 m2 area is deducted from total area, as it is already included in the throw buffer

Appendix B3. Tier II-IV (N+2) CRAC modeling results

Appendix C

Liebert CRAC Technical Specification
Note: Net sensible cooling will be reduced by 7.5 kW x 3 = 22.5 kW for fans; 68.7 kW for Model DH/VH380A


Dr. Hussein Shehata, BA, PhD, CEng, PGDip, MASHRAE, MIET, MCIBSE, is the technical director, EMEA, Uptime Institute Professional Services (UIPS). Dr. Shehata is a U.K. Chartered Engineer who joined Uptime Institute Professional Services in 2011. He is based in Dubai, serving the EMEA region. From 2008-2011, Hussein was vice president & Asia-Pacific DC Engineering, Architecture & Strategy Head at JP Morgan in Japan. Prior to that, he co-founded, managed, and operated as a subject matter expert (SME) at PTS Consulting Japan. He graduated in Architecture, followed by a PhD in HVAC and a diploma in Higher Education focused on multi-discipline teaching of Engineers and Architects.

Cogeneration powers South Africa’s first Tier III Certified data center

MTN’s new facility makes use of Kyoto Wheel

By Olu Soluade, Robin Sloan, Willem Weber, and Philip Young

Figure 1. MTN Centurion Site

MTN’s new data center in Centurion, Johannesburg, South Africa, includes a 500-square-meter (m2) space to support MTN’s existing Pretoria Switch. MTN provides cellular telecommunications services, hosted data space, and operations offices via a network of regional switches. The Centurion Switch data center is a specialist regional center serving a portion of the smallest but most densely populated province of South Africa, Gauteng. The operational Centurion Switch Data Center provides energy efficient and innovative service to the MTN regional network (See Figure 1).

As part of the project, MTN earned Uptime Institute Tier III Design and Facility Certifications and obtained Carbon Credit application approval from the Department of Energy-South Africa. Among other measures, MTN deployed Novec 1230 fire-suppression gas to gain carbon credits from the United Nations Framework Convention on Climate Change (UNFCCC). MTN Centurion is the first Uptime Institute Tier III Certified Design and Facility in South Africa. In addition, the facility became the first in South Africa to make use of the Kyoto Wheel to help it achieve its low-PUE and energy-efficiency operations goals.

A modular design accommodates the 500-m2 white space and provides auxiliary services and functions to ensure a data center that meets MTN’s standards and specifications.

Space was also allocated for the future installation of:

  • Radio mast
  • RF room
  • Tri-generation plant
  • Solar systems
  • Wind banks

Electrical Services

The building is divided into 250 m2 of transmission space and 250 m2 of data space. Both spaces were designed to the following specifications.

  • Data cabinets at 6 kilowatt (kW)/cabinet
  • Transmission cabinets at 2.25 kW/cabinet
  • Maximum 12 cabinets per row
  • Primary backup power with rotary UPS run on biodiesel
  • In row dc PDUs (15 kW)
  • In row ac PDUs (40 kW)
  • Utility supply from Tshwane (CAI applied and got a connection of 8 mega volt-amperes)
  • 25 percent of all energy consumed to be generated from on site renewable resources

Figure 2. External chiller plant

 

Heating, Ventilation, Air-conditioning

A specific client requirement was to build a facility that is completely off grid. As a result the design team conducted extensive research and investigated various types of refrigeration plants to determine which system would be the most efficient and cost effective.

The final technologies for the main areas include (see Figures 2-5):

  1. Air-cooled chillers
  2. Kyoto Wheel in main switch room
  3. Chilled water down-blow air handling units in other rooms
  4. Hot Aisle containment

The design for the data center facility began as a Tier IV facility, but the requirement for autonomous control caused management to target Tier III instead. However, the final plans incorporate many features that might be found in a Fault Tolerant facility.

Tables 1 and 2 describe the facility’s electrical load in great detail.

Green Technologies

Figure 3. Stainless steel cladded chilled water pipework.

The horizontal mounting of the coil of the Kyoto Wheel (see Figures 5a-b) at MTN Centurion is one of a kind. The company paid strict attention to installation details and dedicated great effort to the seamless architectural integration of the technology.
MTN chose the Kyoto Wheel (enthalpy wheel) to transfer energy between hot indoor return air from the data center and outdoor air because the indirect heat exchange between hot return air and cooler outdoor air offers:

  • Free cooling/heat recovery
  • Reduced load on major plant
  • Lower running costs
  • Low risk of dust transfer

    Figure 4. Ducted return air from cabinets in data space

Although the use of an enthalpy wheel in South Africa is rare (MTN Centurion is one of two installations known to the author), Southern African temperature conditions are very well suited to the use of air-side economizers. Nonetheless the technology has not been widely accepted in South Africa because of:

  • Aversion to technologies untested in the African market
  • Risk mitigation associated with dust ingress to the data center
  • Historically low data center operating temperatures (older equipment)
  • Historically low local energy costs

The tri-generation plant is another green measure at the Centurion Switch; it meets the base load of the switch.

Figure 5a. Kyoto Wheel installation

Figure 5b. Kyoto Wheel installation

MTN first employed a tri-generation plant at its head office about four years ago (see Figures 6-8).

The data center also incorporates low-power, high-efficiency lighting, which is controlled by occupancy sensors and photosensors (see Figure 9).

Design Challenges

Table 1. Phase 1 Building: 950 W/data cabinet and 1,950 W/switch rack at 12 cabinets/row, 600-kW maximum per floor

MTN Centurion Switch experienced several challenges during design and construction and ultimately applied solutions that can be used on future projects:

  • The original Kyoto Wheel software was developed for the Northern Hemisphere. For this project, several changes were incorporated into the software for the Southern Hemisphere.
  • Dust handling in Africa differs from the rest of the world. Heavy dust requires heavy washable pre-filters and filtration media with a high dust-carrying capacity.

    Table 2. Phase 1 Building SF – 2,050 W/data cabinet at 12 cabinets/row, 800 kW maximum per floor

The design team also identified three steps to encourage further use of airside economizers in South Africa:

  • Increased education to inform operators about the benefits of higher operating temperatures
  • Increased publicity to increase awareness of air-side economizers
  • Better explanations to promote understanding of dust risks and solutions

Innovation

Many features incorporated in the MTN facility are tried-and-true data center solutions. However, in addition to the enthalpy wheel, MTN employed modularity and distributed PDU technology for the first time in this project.

In addition, while the Kyoto Wheel is widely used in HVAC design, it is rarely applied at this scale and in this configuration. The use of this system in this application, with the addition of the chilled water coils and water spray, was a first within the MTN network and the first in South Africa.

Conclusion

MTN tirelessly pursues energy efficiency and innovation in all its data center designs. The MTN Centurion site is the first Tier III Certified Constructed Facility in South Africa and the first for MTN.

The future provisions for tri-generation, photovoltaic, and wind installations all promise to increase the sustainability of this facility.

Figure 6. Tri-generation plant room at MTN Head Office

Figure 7. Tri-generation gas engines at MTN Head Office

Figure 8. Tri-generation schematics at MTN Head Office

Figure 9. Typical light installation with occupancy sensing

Figure 10. Data rack installation

Figure 11. Power control panels

Figure 12. External chilled water plant equipment


Olu Soluade started AOS Consulting Engineers in 2008. He holds a Master's degree in Industrial Engineering and a BSc (Hons) degree with Second Class upper division in Mechanical Engineering. He is a professional engineer and professional construction project manager with 21 years of experience in the profession.

Robin Sloan is Building Services Manager at AOS Consulting Engineers. He is a mechanical engineer with 7 years of experience in education, healthcare, commercial, residential, retail, and transportation building projects. His core competencies include project management, design works for railway infrastructure, education, commercial, and health-care projects from concept through to hand-over, HVAC systems, mechanical and natural ventilation, drainage, pipework services (gas, water, and compressed air), control systems, and thermal modelling software.

Willem Weber is Senior Manager: Technical Infrastructure for MTN South Africa, the largest cellular operator in Africa. Mr. Weber was responsible for the initiation and development of the first methane tri-generation plant in South Africa, the first CSP cooling system using Fresnel technology, and the first Tier III Design and Constructed Facility Certified by Uptime Institute in South Africa, which utilizes thermal energy wheel technology for cooling and tri-generation.

Philip Young is Building Services Manager at AOS Consulting Engineers and a professional electronic and mechanical engineer registered with ECSA (P Eng) with 10 years of experience. Previously, he was a project manager and engineer at WSP Group Africa (Pty) Ltd. Mr. Young is involved in design, feasibility studies, multidisciplinary technical and financial evaluations, Building Management Systems, and renewable energy.

Russia’s First Tier IV Certification of Design Documents

Next Step: Preparing for Facility Certification

By Alexey Karpov

The Technopark Mordovia Data Center (Technopark Data Center) is part of one of the most significant projects in the Republic of Mordovia (see Figure 1). Technopark-Mordovia is a mini-city that includes research organizations, industry facilities, business centers, exhibition centers, schools, a residential village, and service facilities. A key part of the project is the data center, which is intended to provide information, computing, and telecommunication services and resources to residents of Technopark-Mordovia, public authorities, business enterprises of the region, and the country as a whole. The data processing complex will accommodate institutions primarily engaged in software development, as well as companies whose activities are connected with the information environment and the creation of information resources and databases using modern technologies.

Figure 1. Map of Mordovia

The data center offers colocation and hosting services, hardware maintenance, infrastructure as a service (IaaS) through terminal access via open and secure channels, and access to groupware software on a software as a service (SaaS) model. As a result, Technopark Data Center will minimize residents’ costs to conduct research, manage general construction and design projects, and interact with consumers in the early stages of production through the outsourcing of information and telecommunication functions and the collective use of expensive software and hardware complexes. Mordovia created and helped fund the project to help enterprises develop and promote innovative products and technologies. About 30 leading science and technology centers cooperate with Technopark-Mordovia, conducting research and bringing new and innovative technologies, products, and materials into production with the support of the Technopark Data Center (see Figure 2).

Figures 2a and 2b. Renderings of the Technopark Data Center show both elevated and street-level views.

Why Design Certification?
Technopark Data Center is the largest and most powerful computing center in Mordovia. Its designers understood that the facility would eventually serve many of the government’s most significant social programs. In addition, the data center would also be used to test and run Electronic Government programs, which are currently in development. According to Alexey Romanov, Director of Gosinform, the state operator of Technopark-Mordovia, “Our plan is to attract several groups of developers to become residents. They will use the computing center as a testing ground for developing programs such as Safe City, medical services for citizens, etc. Therefore, we are obliged to provide the doctors with round the clock online access to clinical records, as well as provide the traffic police with the same access level to the management programs of the transport network in the region.”

To meet these requirements, Technoserv followed the Uptime Institute requirements for engineering infrastructure set out in the Data Center Site Infrastructure Tier Standard: Topology. As a result, all engineering systems are designed to fully meet the Uptime Institute Tier IV Certification of Design Documents requirements for redundancy, physical separation, and maintenance of equipment and distribution lines (see Figures 3 and 4).

Figure 3. One-line diagram shows Technopark Data Center’s redundant power paths.

Figure 4. Technopark Data Center’s processing area.

Meeting these requirements enables Mordovia to achieve significant savings: with the Technopark Data Center as its hub, the region’s overall data center plan can rely on lower-reliability regional centers. Though not Tier Certified by Uptime Institute, these regional data centers are built with redundant components, which reduces capital costs. Meanwhile, the central data center provides backup if one of the regional data centers experiences downtime.

The Technopark Data Center is the core of all IT services in Mordovia. The regional data centers are like “access terminals” in this environment, so the government reasoned that it was not necessary to build them to meet high reliability requirements.

The Data Center Specification
The Technopark Data Center is a 1,900-kW facility that can house about 110 racks, with average consumption of 8-9 kW per rack. Power is supplied from four independent sources: two independent feeds from the city’s electricity system and diesel generator sets with 2N redundancy.
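
As a rough, back-of-the-envelope illustration of how these published figures relate, the rack count and per-rack load imply the aggregate IT load estimated below. The calculation is only a sketch: treating the 1,900 kW as total facility capacity, and attributing the remainder to cooling and other support loads, are assumptions, not statements from the design team.

    # Rough, illustrative check of the published capacity figures.
    # Assumption: 1,900 kW is total facility capacity; rack figures are from the article.
    RACK_COUNT = 110
    KW_PER_RACK_RANGE = (8, 9)      # average consumption per rack, kW
    FACILITY_CAPACITY_KW = 1900

    it_load_low = RACK_COUNT * KW_PER_RACK_RANGE[0]    # 880 kW
    it_load_high = RACK_COUNT * KW_PER_RACK_RANGE[1]   # 990 kW

    print(f"Estimated IT load: {it_load_low}-{it_load_high} kW "
          f"within a {FACILITY_CAPACITY_KW}-kW facility; the remainder would cover "
          f"cooling, UPS losses, and other support loads")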

Main characteristics:
(See Table 1)

The data center building is a multi-story structure. Servers occupy the first floor: computing resources are placed in three areas, and various types of IT equipment (basic computing, telecommunications, and storage systems) are placed in different rooms. The administrative block and call-center are on the second floor.

Chillers, pumping stations, chilled water storage tanks, UPS batteries, and other support equipment are located in the basement and on technical floors. Transformers and diesel generators are located in a separate area adjoining the data center. Diesel fuel tanks are located in two below-grade areas on opposite sides of the building.

The data center design includes several energy-saving technologies, which enables the facility to be very energy efficient by Russian standards (PUE <1.45). For example, the cooling system includes a free-cooling mode, and all power and cooling equipment operate in modes intended to provide maximum efficiency. Other energy efficiency details include:

  • Computing equipment is installed in a Cold Aisle/Hot Aisle configuration, with containment of the Hot Aisles. In-row cooling further improves energy efficiency.
  • The cooling system utilizes efficient chillers with screw compressors and water-cooled condensers. Dry cooling towers installed on the roof cool the chillers’ condensers in the summer; in the winter, the same towers help provide free cooling. Calculations for the design of the cooling and air-conditioning systems were performed according to ASHRAE standards.
  • All elements of the engineered systems, as well as the systems themselves, are integrated into a single BMS. The BMS controls all the necessary functions of the equipment and interconnected subsystems, quickly localizes faults, and limits the consequences of emergencies. Technoserv utilizes a distributed architecture in which each component has a dedicated controller that feeds information back to the single BMS; if the BMS servers fail, the individual controllers maintain autonomous control of the facility (a simplified sketch of this principle follows the list). The BMS also collects and processes exhaustive amounts of information about the equipment, issues reports, and archives data. A control room is provided at the facility for operators, where they can monitor the operation of all elements of the engineering infrastructure.
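
The following minimal sketch, written in Python, illustrates the distributed-control principle described in the last two points above. The controller class, mode names, and changeover threshold are hypothetical illustrations under stated assumptions, not elements of Technoserv’s actual BMS.

    # Illustrative sketch of a local controller that selects free cooling when the
    # outdoor air is cold enough and keeps operating autonomously if the central
    # BMS is unreachable. All names and thresholds are assumptions.
    FREE_COOLING_MAX_OUTDOOR_C = 5.0   # assumed changeover temperature

    class LocalController:
        def __init__(self, name):
            self.name = name
            self.mode = "chiller"

        def select_cooling_mode(self, outdoor_temp_c):
            # Winter operation: dry cooling towers provide free cooling.
            self.mode = ("free cooling"
                         if outdoor_temp_c <= FREE_COOLING_MAX_OUTDOOR_C
                         else "chiller")
            return self.mode

        def report(self, bms_online):
            status = {"controller": self.name, "mode": self.mode}
            # Normal operation: the central BMS logs, archives, and reports.
            # BMS failure: the controller keeps running on its local logic.
            return ("sent to BMS" if bms_online else "autonomous", status)

    controller = LocalController("chiller-plant-1")
    controller.select_cooling_mode(outdoor_temp_c=-2.0)
    print(controller.report(bms_online=False))
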
Table 1. The Technopark Data Center is designed to be Fault Tolerant. Plans are being made to begin the Tier Certification of Constructed Facility process.

From a security standpoint, the data center is organized into three access levels (a brief illustrative sketch follows the list):

  • Green areas are open to all users, including visitors to the showroom.
  • Blue areas are restricted to Technopark Data Center residents performing their own IT projects.
  • Red areas are open only to data center staff.
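
A simple encoding of this three-zone policy is sketched below. The role names and the assumption that staff may also enter the green and blue zones are illustrative only; they do not come from Technopark Data Center’s actual access-control configuration.

    # Hypothetical encoding of the three access zones described above.
    ZONE_ACCESS = {
        "green": {"visitor", "resident", "staff"},   # open areas, including the showroom
        "blue":  {"resident", "staff"},              # residents running their own IT projects
        "red":   {"staff"},                          # data center staff only
    }

    def may_enter(zone, role):
        return role in ZONE_ACCESS[zone]

    print(may_enter("blue", "visitor"))   # False
    print(may_enter("red", "staff"))      # True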

Three independent fiber-optic lines, each with a capacity of 10 gigabits per second, ensure uninterrupted, high-capacity data transmission to users of Technopark Data Center’s network infrastructure. Russia’s key backbone operators (Rostelecom, Transtelekom, and Megaphone) were selected as Technopark Data Center’s telecom partners because of their well-connected and powerful infrastructure in Russia.

The data center also includes a monitoring and dispatching system. The system is based on three software products: EMC Ionix (availability monitoring of all IT infrastructure components), EMC APG (statistics collection and performance analysis), and VMware vCenter Operations Enterprise (intelligent performance and capacity monitoring of objects in VMware virtual environments), together with integration modules specially designed by Technoserv.

Challenges

Figure 5. Inside a data hall.

As noted previously, the data center was designed to achieve the highest levels of reliability. Some data centers in Russia perform critical national tasks, but none of those facilities require the highest levels of reliability. This reality made the task seem more daunting to everyone who worked on it: Technoserv had to do something that had never been done in Russia, and do it in a limited time. The company accomplished this feat in less than two years.

During the Uptime Institute’s Design Certification process, Technoserv stayed in close contact with Uptime Institute subject matter experts. As a result, Technoserv was able to develop solutions as problems emerged. The company is also proud of the qualifications of Technoserv specialists, who have extensive experience in designing and building data centers and who provided the basis for the successful completion of this project.

The technical challenge was also significant. Meeting Tier IV Certification of Design Documents requirements calls for a large number of redundant elements, close coordination of mechanical and electrical systems, and testing to demonstrate that emergencies can be addressed without human intervention or damage to IT equipment.

It was necessary to anticipate every possible development in the space and then develop BMS hardware that would meet these potential challenges. In addition, the automation system had to continue working with no loss of functionality in the event of a BMS fault. Designing and implementing the BMS algorithms demanded the involvement of Technoserv’s automation division and almost six months of hard work.

It was important to limit the noise from the engineering equipment, as the data center is located in a residential area. Noise insulation measures required examination of the normative and regulatory documents. Knowledge of local codes was key!

Lessons Learned
Technoserv also learned again that there are no minor details in a high-tech data center. For example, a topcoat applied to the floor during construction caused the raised floor to oxidize actively. Only after numerous measurements and tests did Technoserv find that an additive in the coating had entered into an electrochemical reaction with the metal supports, forming sulfuric acid and creating an electric potential on the raised floor pedestals.

The data center is currently operational. Technoserv plans to complete the Tier IV Certification of Constructed Facility process.

Alexey Karpov is head of the Data Center Construction Department at Technoserv. With more than 10 years of experience in designing and building data centers, Mr. Karpov is an Accredited Tier Designer, Certified Data Centre Design Professional, and Certified Data Centre Management Professional. The data center for VTB Bank, recognized as the largest infrastructure project in Russia in 2010, and the data center for Bashneft are two large-scale projects completed under his guidance. Technoserv, Russia’s largest system integrator, was founded in 1992. Technoserv installs, develops, and outsources IT infrastructure and develops communications, engineering, and information security systems as well as power systems and application platforms. According to RA Expert, a leading Russian analytical agency, Technoserv is a leader in providing IT services in Russia. Business volumes confirm the company’s leadership in the Russian IT market; total revenues for the Technoserv group of companies exceeded 43 billion rubles in fiscal year 2012.