Switzerland’s First Tier IV Certified Facility Achieves Energy Efficiency

Telecommunications company Swisscom AG builds a new data center in Berne, one of seven Tier IV data centers in Europe

By Beat Lauber, Urs Moell, and Rudolf Anker

Swisscom AG, a Switzerland-based telecommunications company, recently built a new data center in Berne, Switzerland. Swisscom spent three years and invested around US$62.5 million to build a highly efficient and reliable data center. Following two years of construction, the new Berne-Wankdorf Data Center was fully operational in January 2015.
The Swisscom Berne-Wankdorf Data Center xDC, Phase 1, is one of only seven data centers in Europe awarded Uptime Institute Tier IV Certification of Constructed Facility and the first in Switzerland. It also has top ratings for energy efficiency, thanks to an innovative cooling concept, and won a 2015 Brill Award for Efficient IT. The new building is the largest of the 24 data centers operated by Swisscom in Switzerland (see Figure 1).

Figure 1. Exterior of Swisscom’s Berne-Wankdorf Data Center; photo: Nils Sandmeier

The data center is designed on a modular principle, permitting future expansion whenever necessary. This ensures the required degree of investment security for Swisscom and its customers. The initial stage includes four modules with access areas for personnel and equipment. Swisscom can add capacity as needed, up to a total of seven modules. The data center will house around 5,000 servers and approximately 10,000 customer systems.

MODULAR 2N CONCEPT

Each module in the Berne-Wankdorf Data Center has an IT capacity of 600 kW (see Figure 2). Modules A to D, which have a total capacity of 2.4 megawatts (MW), were built in the first phase of construction. Modules E, F, and G are to be built at some point in the future, either individually or in parallel. In addition to the production modules, an entrance module housing a reception area, a lodge, workstations, and break-out spaces has also been built.

Figure 2. Site layout with extension modules

Each module is supplied by two independent cells (electrical power supply and cooling) rated at 150% of the nominal power demand, meaning either cell can cover the entire power requirement of a module. The initial configuration includes four cells to supply four modules. Additional modules, each with an individual supply cell, can be attached without interruption. Supply is provided via two independent paths, ensuring uninterrupted electricity and cooling.

SITE ARCHITECTURE

The building totals four stories, three above ground and one below ground. Server rooms are located on the ground and first floors (see Figure 3). Fuel and water tanks, as well as storage areas, are located in the basement. Outside air cools the energy supply equipment. For this reason most of the top floor is dedicated to housing building services (see Figure 4). 

The frame of the building as well as its floors, ceilings, and walls are made primarily from prefabricated sections of concrete (see Figure 5). Only the basement and the sections providing reinforcement for seismic protection are constructed from cast-in-situ concrete. The façade also consists of prefabricated sections of concrete 15 meters (m) high with inlaid thermal insulation.

The server rooms do not have visible pillars. Joists 1.05 m high support the ceilings and span 13.8 m above the IT equipment. The space between the joists is used for air movement. Warm air from the server racks is fed through a suspended ceiling to recirculating air coolers. This removes the need for a raised floor in the server rooms (see Figure 6).

Figure 3. Ground floor layout

Figure 4. Second floor layout

Figure 5. Bearing structure

Figure 6. System profile through a server room

EFFICIENT COOLING SYSTEMS

An adiabatic re-cooling system using rainwater enables Swisscom to eliminate mechanical chillers completely. As long as the outside temperature is below 21°C (70°F), the re-cooling units work on dry free cooling. When the temperature rises above 21°C (70°F), water added to the warm air draws out heat through evaporation.

The cooled water from the re-cooling units then supplies the CRACs that cool the IT systems. The racks are configured in a Hot Aisle containment cube that keeps the cold supply air and warm exhaust air entirely separate. Warm air from the Hot Aisle is fed to the recirculating air coolers via a suspended ceiling. This layout means that the majority of the room is on the cool side of the cooling system, so relatively pleasant working conditions are assured despite the warm cooling air (see Figure 7).

Figure 7. Pictorial schematic of the cooling supply

The CRACs are specifically designed for a high cooling water temperature of 26°C (79°F) and the lowest possible inlet air temperature of 28°C (82°F). The exhaust air temperature is 38°C (100°F). With the exception of a small number of damp, hot days in the summer, this ecological cooling concept can supply air cooler than 28°C (82°F) all year round.
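
As a rough cross-check (not stated in the article; standard air properties of cp ≈ 1.005 kJ/kg·K and ρ ≈ 1.2 kg/m³ are assumed), the 10 K air-side rise from 28°C to 38°C implies the airflow that each 600-kW module’s recirculating air coolers must move at full load:

\[
\dot{m} = \frac{Q}{c_p \, \Delta T} = \frac{600\ \text{kW}}{1.005\ \text{kJ/(kg·K)} \times 10\ \text{K}} \approx 60\ \text{kg/s}, \qquad \dot{V} = \frac{\dot{m}}{\rho} \approx 50\ \text{m}^3\text{/s} \approx 180{,}000\ \text{m}^3\text{/h}
\]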

Retrospective calculations (see Figure 8) based on data from the equipment show that the maximum foreseeable temperature of the supply air would be 30°C (86°F) in the worst-case scenario (full load, failure of an entire supply cell, extreme climate values from the last 20 years).

Figure 8. h-x diagram for air conditions 2005

Rainwater for the adiabatic cooling system is collected from the roof and stored in two separate tanks, each of which can hold enough water to support at least 72 hours of operation. Two separate networks supply water pumped from the two tanks to the hybrid recoolers through redundant water treatment systems. The recoolers can be supplied either from the osmosis tank or directly from the rainwater tank. If there is not enough rainwater, water is drawn from the city water network to fill the tanks.
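
For a sense of scale (an illustrative upper bound, not a figure from the article), if a fully loaded 600-kW module were cooled entirely by evaporation, then with a latent heat of vaporization of roughly 2,450 kJ/kg the water draw would be about

\[
\dot{m}_w = \frac{Q}{h_{fg}} = \frac{600\ \text{kW}}{2{,}450\ \text{kJ/kg}} \approx 0.25\ \text{kg/s} \approx 0.9\ \text{m}^3\text{/h},
\]

or on the order of 60-65 m³ over 72 hours per module; actual consumption is lower whenever dry free cooling carries part of the load.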

During the heating season, the heat recovered from the cooling systems heats the building directly. Efficient heat distribution regulates the temperature in the rooms. The data center dissipates the remainder of the waste heat to the local energy supplier’s existing waste heat grid, and the thermal energy from the grid heats a residential quarter and swimming pools. The more heat these consumers draw, the less the hybrid recoolers need to operate, unlocking further energy savings.

NOBREAK UPS

The Wankdorf Data Center does not use batteries; instead it deploys SMS NoBreak equipment that safeguards the uninterruptible power supply (UPS) using kinetic energy. Should the power supply fail, the NoBreak equipment uses flywheel inertia to ensure that operation continues uninterrupted until the diesel engine-generator sets start up (within seconds) and take over the energy supply (see Figure 9).
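
The article does not give the flywheel specifications, but a hypothetical example illustrates the principle of kinetic ride-through. A rotor with a moment of inertia of I = 500 kg·m² spinning at 3,000 rpm (ω ≈ 314 rad/s) stores

\[
E = \tfrac{1}{2} I \omega^2 = 0.5 \times 500\ \text{kg·m}^2 \times (314\ \text{rad/s})^2 \approx 25\ \text{MJ};
\]

if roughly half of that energy is usable before voltage and frequency droop too far, it could bridge a 600-kW module for about 12 MJ / 600 kW ≈ 20 seconds, a comfortable margin for engine-generator sets that start within seconds.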

Figure 9. NoBreak equipment

EXTENSIVE BUILDING AUTOMATION

Essentially, the building automation system (BAS) comprises two redundant VMware ESX servers on which the BAS software and an energy management tool are installed. While the BAS supports all vital functions, the energy management tool evaluates and records energy measurements.

A redundant signaling system provides a back-up path for alarm signals and has its own independent network. All measured values are displayed and recorded, and an energy management function analyzes them so that energy efficiency can be continuously optimized.
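
The calculation at the heart of such an energy management function can be sketched in a few lines of Python; the meter names and data structure below are illustrative assumptions, not Swisscom’s actual tooling.

# Illustrative sketch: derive PUE from periodic energy-meter readings.
from dataclasses import dataclass

@dataclass
class MeterSample:
    total_facility_kwh: float  # energy measured at the utility intake for the interval
    it_load_kwh: float         # energy delivered to the IT equipment for the interval

def interval_pue(sample: MeterSample) -> float:
    """PUE for a single measurement interval."""
    if sample.it_load_kwh <= 0:
        raise ValueError("IT load must be positive")
    return sample.total_facility_kwh / sample.it_load_kwh

def cumulative_pue(samples: list[MeterSample]) -> float:
    """Energy-weighted PUE across all recorded intervals."""
    return sum(s.total_facility_kwh for s in samples) / sum(s.it_load_kwh for s in samples)

# Example: two hourly samples
print(round(cumulative_pue([MeterSample(700.0, 600.0), MeterSample(660.0, 600.0)]), 2))  # about 1.13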

MAXIMUM PROTECTION

From the selection of its location to its specific construction, from its physical protective measures to its advanced security and safety concept, the Berne-Wankdorf Data Center offers the maximum in protection. Access is strictly controlled with a biometric access control system and the site is monitored from a lodge that is staffed around the clock.

Beat Lauber

Beat Lauber is a recognized visionary in the field of data center design. He is a founding member and CEO of RZintegral AG, a leading Swiss company specializing in data center planning and engineering. His career includes more than 20 years of experience with critical infrastructures and notable challenges in the design, planning, realization, and project management of large data center projects. Audits and strategy mandates complete his list of activities. Mr. Lauber graduated as Architect FH/SIA, completed post-graduate studies in Business Administration and Risk Management, and is a Fire Protection Manager CFPA-E.

 

Urs Moell

Urs Moell is a senior data center designer at RZintegral AG and has acquired broad knowledge of the strategy and layout of critical infrastructures as well as availability, energy efficiency, safety, and security. He is in charge of the development and layout, architectural design, and the optimal coordination of all trades for best-performance data centers. He graduated as Architect ETH and has 20 years of experience planning buildings as well.

 

 

Rudolf Anker

Rudolf Anker is head of Datacenter Management at Swisscom IT, where he has worked since 2004. His responsibilities include project management for new data centers, including planning, lifecycle, and operations. He initiated and provided overall project management for the new RZ Future and xDC data center buildings in Zollikofen and Wankdorf.

LG CNS Deploys Custom Cooling Approach in Korean Data Centers

IT services provider develops innovative cooling system for use in its own cloud computing data centers
By Jong Wan Kim

LG CNS is a major IT services and solutions provider in Korea. Since 1987, LG CNS has acted as the CIO for LG Group’s affiliates. As of the end of 2014, its data centers provided IT services to LG Group’s affiliates in more than 40 countries, including China, Japan, the United States, India, Indonesia, Brazil, Colombia, Malaysia, and several nations in Europe. LG CNS also offers services to government, public, and financial sector entities.

LG CNS operates four data centers in Korea and one each in Beijing, Nanjing, New Jersey, and Amsterdam, The Netherlands. Three of its domestic data centers are located in or near Seoul, Korea’s capital; the other is in Busan, which is located in southeast Korea and is the country’s second largest city (see Figure 1).

Figure 1. LG CNS data centers worldwide

LG CNS and its operating technicians and potential customers all view energy efficiency as crucial to controlling costs. In recent years, however, rapid developments have dramatically improved IT performance. At the same time, new processors produce more heat, so data centers must provide more space for power and cooling infrastructure. As a result, LG CNS concluded that it needed to develop a cooling system optimized for very dense data centers. They expected that the optimized cooling system would also be more energy-efficient.

LG CNS developed and applied an optimized custom cooling system concept to its new 40-megawatt (MW), 32,321-square-meter (m2) Busan Global Cloud Data Center, which is the largest in Korea. This facility, which opened in 2013, serves as the company’s hub in northeast Asia (see Figure 2). The Busan Data Center can accommodate 3,600 servers at 6 kilowatts (kW) per rack.

Figure 2. Busan Global Cloud Data Center

The Busan Data Center makes use of whole-building, chimney-style hot-air exhaust and a hybrid cooling system (LG CNS calls it a Built-up Outside Air Cooling System) that the company developed to improve upon the energy efficiency it achieved in its existing facilities, which used packaged-type air conditioning systems. In addition, LG CNS developed its Smart Green Platform (SGP) software, which automatically controls the cooling system and other data center components to achieve free cooling for eight months of the year without running chillers. The annual average PUE is estimated to be 1.39, with a minimum of 1.15 in the winter. After seeing positive results at the Busan Data Center, LG CNS decided to apply the cooling system to its Inchon Data Center, which was built in 1992 and was the first purpose-built data center in Korea.


THE BUSAN DATA CENTER SITE

The Busan location provides three advantages to LG CNS: geographic, network connectivity, and proximity to new customers.

•   Geographic: Data centers should be located where the risk of natural disasters, especially earthquakes, is low. Korea has relatively little seismic activity, making it a good candidate for data centers. The building is also set on an elevation that is higher than historic high water levels.

•   Network Connectivity: Korea has four active submarine connections (the Busan, Keoje, C2C Busan, and Taean cables), which connect to the APCN, APCN-2, C2C, China-US CN, EAC, FNAL/RNAL, FLAG FEA, RJCN, R-J-K, and TPE submarine cable systems. This connectivity positions Busan as an IT hub for Asia-Pacific (see Figure 3).

•   New customers: Development plans promise to transform Busan into a global IT hub with many foreign companies accessing its infrastructure resources.

Figure 3: Map of submarine cables serving Busan

COOLING THE BUSAN DATA CENTER

Utilizing cold outside air may be the best way to reduce energy use. From June through September, however, outside air temperatures near Busan often exceed 30°C (86°F) and the air is humid, so outside air cannot be used for cooling (see Figure 4). To meet this environmental challenge, LG CNS developed a system that converts the space that normally houses CRAC units into a functional CRAC. Although LG CNS developed the system for its new Busan Data Center, it subsequently applied it to the existing Inchon Data Center. In the existing data center, this transformation involved disassembling the CRACs. In both new and legacy retrofit applications, LG CNS utilized the walls of the CRAC room as CRAC surfaces and its aisles as paths for airflow.

Figure 4. Average annual temperatures in South Korea

EXISTING PACKAGED-TYPE CRACS

The existing packaged-type air conditioning system used in LG CNS facilities includes an air supply louver, outside air chamber, air supply duct, mixing chamber, filter, humidifier/cooling coil, and fan. These systems require less space than other systems (see Figures 5-7).

The existing packaged-type air conditioning system used at LG CNS facilities has three operating modes that vary with outside temperature. When the temperature is below 7°C (45°F), the system uses 100% outside air and only the internal fan of the CRAC runs; this is accomplished by stopping the compressors where dedicated air-cooled DX CRACs are in use and the chillers where chilled water CRAHs are in use (both types are used in LG CNS facilities). At ≈8–16°C (46-62°F), the CRACs provide additional cooling to supplement the outside air. When the temperature exceeds 16°C (62°F), the CRAC is fully operational and no outside air is supplied.

In Korea, this system yields 100% compressor savings only from December to February and partial energy savings from October to November and March to May. Limitations of the system include:

•   The small air supply duct limits the airflow and requires more fan energy. In addition, the narrow inner space makes it difficult to access for maintenance.

•   Winter air in Korea is quite dry, so humidification is required during the winter when 100% outside air cooling is possible. However, the size of the CRAC limits its ability to humidify, which causes inefficiencies.

•   The existing packaged-type air conditioning requires space for maintaining plumbing, including pipes and ducts.

Figure 5. Diagram of existing outside air conditioning (air supply louver 110, outside air chamber 120, air supply duct 130, mixing chamber 140, filter 150, humidifier/cooling coil 160, and fan 170)

 

Figure 6. Plan view of existing packaged-type CRAC

Figure 7. Sectional view of the CRAC

BUILT-UP OUTSIDE AIR COOLING SYSTEM

LG CNS created a task force team (TFT) comprising Operations, cooling system vendors, consulting firms, construction personnel, and other relevant professionals to develop a method of improving the flow of outside air for cooling interior white spaces. The TFT investigated new approaches for approximately 10 months, including pilot testing.
The TFT focused on three things:

•   Introducing more outdoor air to the computer room

•   Controlling the temperature of the static air supply by mixing outside cold air with the inside heated air

•   Controlling the airflow of exhaust hot air by maximizing airflow to the outside.

The main concept that resulted from the investigation involves:

•   Utilizing the building floor and ceiling as a duct and chamber

•   Customizing the size of the CRAC room walls to meet cooling demands

•   Increasing humidification efficiency by making the path that air travels longer after it passes through the cold water coil

•   Creating a separate pathway for exhausting hot air

•   Utilizing a maintenance space as a duct for the CRAC while it is operating and as a maintenance space when it is not.

The Built-up Outside Air Cooling System applied in LG CNS’s new Busan Data Center uses outdoor air and built chimneys to exhaust heat generated by the servers, which reduces energy consumption (see Figure 8). In the company’s existing data centers, the width of the packaged-type CRAC was about 2.5 meters (m). The Built-up Outside Air Cooling System should be 4-5 m wide in order to increase the volume of outside air used for cooling. This change, however, increases the construction costs and the size of the building, which can be a significant expense because of the high cost of real estate in Korea. Saving energy is important, but larger equipment and space requirements for infrastructure can reduce IT white space. To address these issues, the width of the additional space required to supply outdoor air in the Busan Data Center is 3 m.

Figure 8. Architecture of the Built-up Outside Air Cooling System


Figure 9. Sectional view of Built-up Outside Air Cooling System 
OA Damper (410): Damper to control the outside air supply 
OA Louver (411): Opening to introduce the outside air supply and keep rain and sunshine out
RA Damper (420): Damper to control the indoor air
Filter (430): Filter to remove dust from outside and inside
Coil, Humidifier (440): A water mist humidifier to control humidity and a coil for cooling air supplied inside
Main Pipe (450): Pipe to provide supply and return water
Fan (461): Fan for supplying cold air into internal server room

Because of the importance of saving space, LG CNS tested various designs and determined that an S-shaped design would provide the optimum airflow in a small space (see Figure 9). In addition, the system supplies additional outside air using the Built-up Outside Air Cooling System’s internal fan.

Care also has to be taken to separate the supply and exhaust air paths so that mixed air does not enter the upper computer rooms. This task can be complicated in Korea, where most data centers are at least five stories high. To solve this issue, LG CNS put the cold air supplies on either side of the Busan Data Center and routed the exhaust through a middle passage to the roof, a wind way the company calls the Chimney.

Figure 10. Built-up Cooling, Pretest Condition

Figure 11. Mock-up testing

MOCK-UP TEST

As soon as the design was complete, LG CNS built a separate testing space to determine how changes in ambient temperature would affect temperatures in the computer room and to evaluate the SGP software (see Figures 10 and 11). LG CNS also used the space to evaluate overall airflow, which was satisfactory because the system utilized the entire CRAC room. Coil static pressure, increased airflow, mixing efficiency, utilization of maintenance space, and humidification efficiency also checked out well (see Figures 12-14), and the team determined that the overall system would extend the period in which outside air could be used. LG CNS expected 19% power savings compared to the existing packaged-type CRAC.

Figure 12. Locations of temperature sensors

Figure 13. Airflow and temperature distribution (plan view)

Figure 14. Airflow and temperature distribution (front view)

OPERATION OF BUILT-UP COOLING

The Built-up Outside Air Cooling System has three modes of operation: full outside air mode, mixed mode, and circulation mode. 

Full outside air mode introduces 100% outside air by means of a damper installed on the building exterior. This mode is used when temperatures are ≈16–20°C (62-68°F) from March to May and September to November. Air enters the building and passes through a mixing chamber without mixing with hot air inside the computer room. If the temperature of the supply air in the computer room is lower than appropriate, the system automatically changes to mixed mode.

Full outside air mode was designed to meet LG CNS’s service level agreements (SLA) at outside air temperatures up to 20°C (68°F), and LG CNS found that it could maintain Cold Aisle temperatures of 20–21°C (68-70°F) when the supply air was below 20°C (68°F). In fact, LG CNS found that it could still meet its SLAs even when outside air temperatures reached 23°C (73°F). At that outside air temperature, the maximum temperature in front of the servers is ≈23–24°C (73-75°F), which still meets LG CNS’s SLAs [23-25°C (73-77°F) air at the servers]. LG CNS believes that the fast airflow generated by using a whole room as a big CRAC helps maximize cooling from the outside air.

When the ambient temperature is less than 16°C (62°F), the system operates in mixed mode. In this mode, the system mixes cold outdoor air with warm return air before introducing it into the computer room. Sensors just inside the building’s outer wall measure the air temperature, and the controls adjust the outside damper and the computer room vent dampers to supply air to the computing room at the proper temperature.

Circulation mode activates when the outside temperature is greater than 23°C (73°F). At those high temperatures, the outside damper is closed so that no outside air is introduced. Instead air cooled by cold water from the chiller system is introduced into the computer room. By opening the computer room vent dampers 100% while the outside damper remains closed, 4–8°C (39-46°F) cold water from the chiller cools air to the appropriate temperature to supply the computer room.
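
A minimal sketch of the mode-selection logic described above, using the temperature thresholds the article gives; the actual SGP control software is certainly more sophisticated and uses additional inputs.

# Illustrative mode selection for the Built-up Outside Air Cooling System (thresholds from the article).
MIXED_BELOW_C = 16.0        # below this, cold outside air is blended with warm return air
CIRCULATION_ABOVE_C = 23.0  # above this, the outside damper closes and chilled water carries the load

def select_mode(outside_temp_c: float) -> str:
    if outside_temp_c < MIXED_BELOW_C:
        return "mixed mode"        # temper cold outdoor air with indoor return air
    if outside_temp_c <= CIRCULATION_ABOVE_C:
        return "full outside air"  # 100% outside air through the exterior damper
    return "circulation mode"      # recirculate room air across the chilled water coil

for t in (5.0, 18.0, 27.0):
    print(f"{t:>5.1f} C -> {select_mode(t)}")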

In Busan, the chiller operates at 100% capacity when the outside temperature exceeds 25°C (77°F). Therefore, LG CNS reduces energy use by raising the temperature of the cooling water from the cooling tower and of the chilled water from the chiller to optimum setpoints.

RESULTS

At first there were many doubts about whether Korea’s first Built-up Outside Air Cooling System would reduce energy use, but now many companies view the Busan Data Center as a benchmark. The Built-up Outside Air Cooling System enables LG CNS to use 100% outside air about 8.5 months per year and achieve a PUE of 1.38, which is lower than the design basis PUE of 1.40. 

The Busan Data Center received the highest rating of A +++ from the Korean Information Technology Service Industry Association (the Association) as a result of a green data center audit. This was the first time a local data center received the Green Data Center Certification since the Association first instituted it in 2012. The Association explained that the amount of electricity saved when the PUE index at a large data center was reduced from 1.80 to 1.40 is equal to the energy that 5,840 ordinary Korean households use for a year (see Figure 15).
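
The arithmetic behind such comparisons is straightforward. For a hypothetical data center with a constant 10-MW IT load (the Association’s reference load is not stated in the article), reducing PUE from 1.80 to 1.40 saves

\[
\Delta E = P_{IT} \times (\text{PUE}_{old} - \text{PUE}_{new}) \times 8{,}760\ \text{h} = 10{,}000\ \text{kW} \times 0.40 \times 8{,}760\ \text{h} \approx 35\ \text{GWh per year}.
\]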

Figure 15. PUE at Busan Global Cloud Data Center

APPLICATION OF BUILT-UP AIR COOLING SYSTEM TO AN EXISTING DATA CENTER

LG CNS decided to apply the Built-up Outside Air Cooling System at its Inchon Data Center. Built in 1992, it was the first purpose-built data center in Korea and has been operating for more than 20 years. Energy efficiency is relatively low because the servers are not arranged in a Hot Aisle/Cold Aisle configuration, even though aging power and cooling equipment has been replaced in recent years.

Unlike the Busan site, Inchon provided no separate space to build in, since the servers and the CRACs are both in the computing room. As a result LG CNS had to customize the Built-up Outside Air Cooling System. 

According to a power consumption analysis, the existing packaged-type system used in the Inchon Data Center accounted for 36% of the facility’s total energy use, the second largest share after IT. The air-cooled DX CRACs and chilled water CRAHs used in the facility consumed a high percentage of this energy.

LG CNS decided to install the Built-up Outside Air Cooling System as an addition to the existing CRAC system. It was not easy to access the outer periphery where the CRACs are installed from the exterior, so LG CNS cut into the exterior wall of the building.

There are two types of CRAC units, down-blower and upper-blower type (see Figures 16 and 17).

Figure 16. Models of the down-blower and upper-blower type CRACs

 

The type used on a project depends on the cooling exhaust system. The down-blower type is designed to mix internal air and outside air. It needs exhaust temperature sensors, an absorbing temperature sensor, a CRAC exhaust sensor, and an internal air-absorbing sensor. A damper regulates the exhaust, and a filter cleans the air supply. The basic concept of the upper-blower type CRAC is very similar but with a different equipment layout. The outside air and mixing chamber ducts of upper-blower CRACs are large enough to allow 100% supply air to be introduced into the computing room.

The Inchon Data Center building is a two-layered structure, with down-blower type CRACs on the first floor and upper-blower types on the second floor. LG CNS designed two ways of supplying cold outside air into the computer room and installed a large duct for the down-blower and upper-blower CRACs to supply outside air from the opening cut in the outer wall.

Figure 17. The down-blower and upper-blower type CRACs deployed at the Inchon Data Center



IMPLEMENTATION AND PRE-TEST

The Built-up Outside Air Cooling System CRAC is installed in a 132-m2 space on the first floor of the Inchon facility. As in the Busan Data Center, the system has three modes, with similar operating parameters (see Figure 18).

Figure 18. Three modes of operation: outside air mode, mixing mode, and circulation mode (top to bottom)

Even though LG CNS had experience with its system in Busan, the Inchon installation had additional risks because all the computer rooms were in operation. Before the construction phase, a preliminary review of expected risks was conducted so as not to affect the existing servers. 

To protect the rooms from dust created by cutting the outside walls, LG CNS installed medium-density fiberboard (MDF) barriers. A temporary finish coating on both sides of the exterior wall prevented rain from entering the building.

When LG CNS connected the new Built-up Outside Air Cooling System to the existing CRACs, it had to turn off power to the existing CRACs. That eliminated all cooling to the server rooms, so portable fans were used to provide cooling air. To maintain proper temperatures during construction, LG CNS operated the backup CRAC and set the existing CRACs to a temperature lower than baseline.

During the pre-test, the system was able to maintain the computer room temperature in the enclosed space, unaffected by ambient airflow, in all three operating modes. However, the computer room is an open type, so the amount of cooling supplied and the heat generated by servers differ from area to area. The solution was to optimize cooling by setting individual targets area by area (see Figure 19).

Figure 19. The Inchon Data Center with the Built-up Outside Air Cooling System

As the Built-up Outside Air Cooling System CRAC was attached to the inner wall of the computer room, cooling air could not reach the center of the room, so there was a hot spot. Therefore, supply and exhaust vents were installed separately in the center of the room to smooth circulation.

RESULTS AND BENEFITS

As a result of the Inchon retrofit, LG CNS is able to maximize its use of outside air and save 1.9 million kWh of electricity annually. The installation saves LG CNS about US$228,000 in electricity costs each year, with PUE improving from 1.91 to 1.62 (see Figure 21).
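
These figures are mutually consistent; dividing the stated cost savings by the stated energy savings implies an electricity price of roughly

\[
\frac{\$228{,}000}{1{,}900{,}000\ \text{kWh}} \approx \$0.12\ \text{per kWh},
\]

a rate the article does not state explicitly.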

There are various methods of improving the efficiency of air conditioning systems and reducing the heat generated in high-density data centers. It is much easier, however, to introduce the Built-up Method into a newly built data center than into an existing facility such as the Inchon Data Center. Through advance planning and risk prevention activities, LG CNS managed the feat.

Figure 20. 100% IT power change when outside air supplied

Figure 21. PUE before and after the Built-up Outside Air Cooling System

ONGOING COMMITMENT

When LG CNS first started planning the Busan Data Center, it built on four main pillars: efficiency, reliability, scalability, and security, all of which were integral to achieving overall sustainability. These pillars are the foundations of the data center and ensure that it is able to meet customer needs now and in the future. With that commitment, LG CNS has worked to accumulate energy-efficiency technologies and continues to make efforts to reduce energy use.


Jong Wan Kim

Jong Wan Kim is vice president of the LG CNS Infrastructure Unit. He has more than 25 years of experience in data center management, distributed systems, and system integration projects. He has implemented operational innovations in the company’s data centers and focused on maximizing productivity through automation in its next generation data center. Since 2010, Mr. Kim has been president of Data Center Associates, which comprises 28 domestic data center executives. He has consulted with the government regarding data center-related policies and encouraged the exchange of technical information among national data center operators to raise local standards to the global level. More recently, Mr. Kim has concentrated on providing platform-based infrastructure services, including software-defined data centers and cloud computing in distributed computing environments.

 

U.S. Bank Upgrades Its Data Center Electrical Distribution

Open heart surgery on the data center: Switchgear replacement in a live facility

By Mark Johns

U.S. Bank emerged during the 1990s from mergers and acquisitions among several major regional banks in the West and Midwest. Since then, the company has continued to grow through additional large acquisitions and mergers with more than 50 banks. Today, U.S. Bancorp is a diversified American financial services holding company headquartered in Minneapolis, MN. It is the parent company of U.S. Bank National Association, which is the fifth largest bank in the United States by assets and fourth largest by total branches. U.S. Bank’s branch network serves 25 midwestern and western states with 3,081 banking offices and 4,906 ATMs. U.S. Bancorp offers regional consumer and business banking and wealth management services, national wholesale and trust services, and global payments services to over 15.8 million customers.

Rich in history, U.S. Bancorp operates under the second oldest continuous national charter (originally Charter #24), granted during Abraham Lincoln’s administration in 1863. In addition, U.S. Bank helped finance Charles Lindbergh’s historic flight across the Atlantic. For sheer volume, U.S. Bank is the fifth largest check processor in the nation, handling 4 billion paper checks annually at 12 processing sites. The bank’s air and ground courier fleet moves 15 million checks each day.

Energy Park Site
U.S. Bank relies on its Energy Park site in St. Paul, MN, to support these operations. Energy Park comprises a 350,000-square-foot (ft2) multi-use building that houses the check production operations and a 40,000-ft2 data center, as well as support staff for both. Xcel Energy provides two 2,500-kilovolt-ampere (kVA) feeds to the data center and two 2,000-kVA feeds to the rest of the building.

The utility’s data center feeds supply power to two automatic throw over switches (ATO); each ATO feeds two transformers. Two transformers support the data center, and two other transformers support check production and power for the rest of the building, including offices and HVAC (see Figures 1-3).

Figure 1. Temporary stand-alone power plant

Figures 2 and 3. Utility transfer and ATS

A single UPS module feeds the check production area. However, two separate multi-module, parallel redundant UPS systems feed data center loads. Four N+1 1,500-kilowatt (kW) standby-rated engine generators back up the three UPS systems through existing switchgear distribution. The data center switchgear is a paralleling/closed-transition type, and the check production area switchgear is an open-transition type. The remaining office space is not backed up by engine generators.

Project Summary
To ensure data center reliability, U.S. Bank initiated an Electric Modernization Project for its data center electrical distribution. The project included replacing outdated switchgear and UPS systems that were no longer supported by the manufacturer. In the project’s first phase, Russelectric paralleling switchboards were selected to replace existing equipment and create two separate distribution systems, each backed up by existing engine generators. Mechanical and UPS loads are divided between the two systems so that either one can support the data center. Switchgear tie breakers increase overall redundancy. The facility benefits from new generator controls and new switchgear SCADA functionality, which monitors and controls utility or generator power.

Since this project was undertaken in a live facility, several special considerations had to be addressed. In order to safely replace the existing switchgear, a temporary stand-alone power plant, sized to support all of the data center loads, was assembled in a parking lot just outside the building’s existing electric/switchgear room (see Figures 4-6). The temporary power plant consisted of a new utility transformer, powered from one of the utility’s ATOs, which supplies power to an automatic transfer switch (ATS). The ATS supplies power from either the utility feeds or the standby-rated engine generators to a new distribution switchboard to support data center loads. The switchboard was installed inside a small building to protect it from the elements. Maintenance bypass switches enable staff to work on the ATS.

Figure 4. Maintenance bypass switches were installed to allow for work on the ATS

Figures 5 (top) and 6 (bottom). Switchboard was installed in a small building

Each standby-rated engine generator has two sources of fuel oil. The primary source is from a bulk tank, with additional piping connected to the site’s two existing 10,000-gallon fuel oil storage tanks to allow for filling the bulk tank or direct feed to the engine generators (see Figure 7).

Transferring Data Center Loads
U.S. Bank’s commissioning of the stand-alone power plant included testing the ATS, load testing the engine generators, infrared (IR) scanning all connections, and a simulated utility outage. Some additional cabling was added during commissioning to address cable heating due to excessive voltage drop. After commissioning was completed, data center loads were transferred to the stand-alone plant. This required providing temporary circuits for select mechanical equipment and moving loads away from four panelboards (two for mechanical equipment and two for the UPS) so that they could be shut down and re-fed from the temporary power plant. The panelboards were transferred one at a time to keep the data center on-line throughout the work. The transfer work took place over two weekends.
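
The cable heating issue follows directly from Ohm’s law; a hypothetical example (the article gives no conductor sizes or lengths) shows why adding cable helps. A feeder carrying I = 1,000 A through a run with resistance R = 0.01 Ω dissipates

\[
P_{loss} = I^2 R = (1{,}000\ \text{A})^2 \times 0.01\ \Omega = 10\ \text{kW}, \qquad V_{drop} = I R = 10\ \text{V};
\]

paralleling an identical set of conductors halves the resistance, cutting both the voltage drop and the heat to 5 V and 5 kW at the same load.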

The mechanical loads were sequenced first in order to put load on the stand-alone plant to provide a stable power source when the UPS systems were cut over and brought on-line. Data center loads were transferred to engine-generator power at the beginning of each day to isolate the data center from the work.

On the first Saturday devoted to the transfer process, the mechanical loads were rotated away from the first panelboard to be re-fed. Equipment requiring temporary power was cut over (see Figure 8). The isolated panelboard was then shut down and re-fed from the stand-alone plant. Once the panelboard was re-fed and power restored to it, equipment receiving temporary power was returned to its normal source. Mechanical loads were rotated back to this panelboard, so that the second panelboard could be shut down and re-fed. Data center loads were transferred back to utility power at the end of each day.

The Sunday mechanical cut over followed the same sequence as Saturday, except the stand-alone power plant, with live load, was tested at the end of the day. This testing included having Xcel Energy simulate a utility outage to the data center, which the utility did with data center loads still on engine-generator power so as not to impact the data center.

The UPS systems were transferred the following weekend. On Saturday, the two UPS systems were transferred to engine-generator power and put into maintenance bypass so their primary power sources could be re-fed from the stand-alone power plant. At the end of the day, the two UPS systems went back on-line and transferred back to utility power. On Sunday, workers cut over the UPS maintenance bypass source. That day’s work concluded with additional testing of the stand-alone power plant, including another simulated utility outage to see how the plant would respond while supporting the entire data center.

Figure 7. Standby-rated engine generators have two sources of fuel oil

Figure 8. Data center loads were transferred to the temporary stand-alone power plant over the course of two weekends

Cable Bus Installation
At the same time the stand-alone power plant was assembled and loads cut over to it, four sets of cable trays and cables were installed to facilitate dividing the UPS loads. These four sets of cable trays had to be run through office and production areas to get to the existing UPS room, which is a run of approximately 625 feet (see Figure 9). Each tray served one of the four primary and maintenance bypass UPS systems.

Figure 9. New cable bus ran about 625 feet through the facility

Switchgear and Generators
After the data center loads were transferred over to the stand-alone power plant, the old switchgear was disconnected from utility power so it could be disassembled and removed from the facility (see Figures 10 and 11). Then, the new switchgear was installed (see Figures 12 and 13).

The switchgear was designed for even distribution of loads, with an A (yellow) side and a B (blue) side (see Figure 14). Each side supports one of the two UPS systems, one of the two chillers with its pumps and towers, and half of the computer room cooling units.

Figures 10 and 11. Old switchgear was disassembled and removed from the facility

After installation, portable load banks were brought in for commissioning the new switchgear. The engine generators also received a full re-commission due to the changes in the controls and the additional alarms.

Figures 12 and 13. New switchgear was installed

Figure 14. Switchgear supporting Yellow and Blue sides, equally dividing the critical load

 

After the new switchgear was fully commissioned, data center loads were cut over to it following a transfer sequence similar to the one used for the stand-alone power plant. The panelboards supporting mechanical and UPS equipment were again cut over one panel at a time to keep the data center on-line, which again required transferring data center loads to engine-generator power to isolate the data center throughout the work.

Figures 15 and 16. Upgraded engine-generator controls and alarming were installed, with panels installed in the Engineering Office

As previously mentioned, upgraded engine-generator controls and alarming were installed as part of the project (see Figures 15 and 16). The older controls had to be upgraded to allow communication with the new switchgear. Upgraded alarm panels were installed in the Engineering Office. In addition, each switchboard has a SCADA screen with a workstation installed in the Engineering Office (see Figure 17). The project also included updating MOPs for all aspects of the switchgear operation (see Figure 18).

Figure 17. New switchgear includes a new SCADA system

Figure 18. Updated MOPs for the switchgear

The overall project went well and was completed on time with no impact to the data center. Since this phase of the project was completed, we have performed a number of live-load engine-generator tests, including a few brief utility power tests in which the engine generators were started and supported the transferred load. In each test, the new equipment performed well. Phase 2 of the modernization project, the replacement of UPS System 1, is currently underway and anticipated to be completed later in 2014. Phase 3, the replacement of UPS System 2, is scheduled for 2015.


Mark Johns

Mark Johns is chief engineer, U.S. Bank IT Critical Facilities Services. He has more than 26 years of data center engineering experience and has completed numerous infrastructure upgrade projects, including all commissioning, without interruption to data center operations. Mr. Johns’ long career prior to U.S. Bank includes working in a 7-story multi-use facility that housed data center operations, check processing operations, and support staff.

A Look at Data Center Cooling Technologies

Sabey optimizes air-cooled data centers through containment
By John Sasser

The sole purpose of data center cooling technology is to maintain environmental conditions suitable for information technology equipment (ITE) operation. Achieving this goal requires removing the heat produced by the ITE and transferring that heat to some heat sink. In most data centers, the operators expect the cooling system to operate continuously and reliably.

I clearly recall a conversation with a mechanical engineer who had operated data centers for many years. He felt that most mechanical engineers did not truly understand data center operations and design. He explained that most HVAC engineers start in office or residential design, focusing on comfort cooling, before getting into data center design. He thought that the paradigms they learn in those design projects don’t necessarily translate well to data centers.

It is important to understand that comfort cooling is not the primary purpose of data center cooling systems, even though data centers must be safe for the people who work in them. In fact, it is perfectly acceptable (and typical) for areas within a data center to be uncomfortable for long-term occupancy.

As with any well-engineered system, a data center cooling system should efficiently serve its function. Data centers can be very energy intensive, and it is quite possible for a cooling system to use as much (or more) energy as the computers it supports. Conversely, a well-designed and operated cooling system may use only a small fraction of the energy used by ITE.

In this article, I will provide some history on data center cooling. I will then discuss some of the technical elements of data center cooling, along with a comparison of data center cooling technologies, including some that we use in Sabey’s data centers.

The Economic Meltdown of Moore’s Law
In the early to mid-2000s, designers and operators worried about the ability of air-cooling technologies to cool increasingly power hungry servers. With design densities approaching or exceeding 5 kilowatts (kW) per cabinet, some believed that operators would have to resort to technologies such as rear-door heat exchangers and other kinds of in-row cooling to keep up with the increasing densities.

In 2007, Ken Brill of the Uptime Institute famously predicted the Economic Meltdown of Moore’s Law. He said that the increasing amount of heat resulting from fitting more and more transistors onto a chip would reach an endpoint at which it would no longer be economically feasible to cool the data center without significant advances in technology (see Figure 1).

Figure 1. ASHRAE New Datacom Equipment Power Chart, published February 1, 2005

The U.S. Congress even got involved. National leaders had become aware of data centers and the amount of energy they require. Congress directed the U.S. Environmental Protection Agency (EPA) to submit a report on data center energy consumption (Public Law 109-341). This law also directed the EPA to identify efficiency strategies and drive the market for efficiency. This report projected vastly increasing energy use by data centers unless measures were taken to significantly increase efficiency (see Figure 2).

Figure 2. Chart ES-1 from EPA report dated August 2, 2007

As of 2014, Moore’s Law has not yet failed. When it does, the end will be a result of physical limitations involved in the design of chips and transistors, having nothing to do with the data center environment.

At about the same time that the EPA published its data center report, industry leaders took note of efficiency issues: ITE manufacturers began to place a greater emphasis on efficiency, in addition to performance, in their designs; data center designers and operators began designing for efficiency as well as reliability and cost; and operators started to realize that efficiency does not require a sacrifice of reliability.

Legacy Cooling and the End of Raised Floor
For decades, computer rooms and data centers utilized raised floor systems to deliver cold air to servers. Cold air from a computer room air conditioner (CRAC) or computer room air handler (CRAH) pressurized the space below the raised floor. Perforated tiles provided a means for the cold air to leave the plenum and enter the main space—ideally in front of server intakes. After passing through the server, the heated air returned to the CRAC/CRAH to be cooled, usually after mixing with the cold air. Very often, the CRAC unit’s return temperature was the set point used to control the cooling system’s operation. Most commonly the CRAC unit fans ran at a constant speed, and the CRAC had a humidifier within the unit that produced steam. The primary benefit of a raised floor, from a cooling standpoint, is to deliver cold air where it is needed, with very little effort, by simply swapping a solid tile for a perforated tile (see Figure 3).

Figure 3: Legacy raised floor cooling

For many years, this system was the most common design for computer rooms and data centers. It is still employed today. In fact, I still find many operators who are surprised to enter a modern data center and not find raised floor and CRAC units.

The legacy system relies on one of the principles of comfort cooling: deliver a relatively small quantity of conditioned air and let that small volume of conditioned air mix with the larger volume of air in the space to reach the desired temperature. This system worked okay when ITE densities were low. Low densities enabled the system to meet its primary objective despite its flaws—poor efficiency, uneven cooling, etc.
At this point, it is an exaggeration to say the raised floor is obsolete. Companies still build data centers with raised floor air delivery. However, more and more modern data centers do not have raised floor simply because improved air delivery techniques have rendered it unnecessary.

How Cold is Cold Enough?
“Grab a jacket. We’re going in the data center.”

Heat must be removed from the vicinity of the ITE electrical components to avoid overheating the components. If a server gets too hot, onboard logic will turn it off to avoid damage to the server.

ASHRAE Technical Committee 9.9 (TC 9.9) has done considerable work in the area of determining suitable environments for ITE. I believe their publications, especially Thermal Guidelines for Data Processing Equipment, have facilitated the transformation of data centers from the “meat lockers” of legacy data centers to more moderate temperatures. [Editor’s note: The ASHRAE Technical Committee TC9.9 guideline recommends that the device inlet be between 18-27°C and 20-80% relative humidity (RH) to meet the manufacturer’s established criteria. Uptime Institute further recommends that the upper limit be reduced to 25°C to allow for upsets, variable conditions in operation, or to compensate for errors inherent in temperature sensors and/or controls systems.]
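
A minimal sketch, in Python, of how an operator might turn the envelope quoted in the editor’s note into a monitoring check; the limits below come from that note, not from any particular ASHRAE class table or vendor specification.

# Check a measured server inlet condition against the envelope quoted above.
RECOMMENDED_TEMP_C = (18.0, 27.0)  # ASHRAE TC 9.9 recommended inlet range cited in the note
CONSERVATIVE_MAX_C = 25.0          # lower ceiling suggested by Uptime Institute
RECOMMENDED_RH_PCT = (20.0, 80.0)  # relative humidity range cited in the note

def inlet_within_envelope(temp_c: float, rh_pct: float, conservative: bool = True) -> bool:
    upper = CONSERVATIVE_MAX_C if conservative else RECOMMENDED_TEMP_C[1]
    return (RECOMMENDED_TEMP_C[0] <= temp_c <= upper
            and RECOMMENDED_RH_PCT[0] <= rh_pct <= RECOMMENDED_RH_PCT[1])

print(inlet_within_envelope(24.0, 45.0))  # True
print(inlet_within_envelope(26.5, 45.0))  # False with the conservative 25°C ceiling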

It is extremely important to understand that the TC 9.9 guidelines are based on server inlet temperatures—not internal server temperatures, not room temperatures, and certainly not server exhaust temperatures. It is also important to understand the concepts of Recommended and Allowable conditions.
If a server is kept too hot, but not so hot that it turns itself off, its lifespan could be reduced. Generally speaking, this lifespan reduction is a function of the high temperatures the server experiences and the duration of that exposure. In providing a broader Allowable range, ASHRAE TC 9.9 suggests that ITE can be exposed to the higher temperatures for more hours each year.

Given that technology refreshes can occur as often as every 3 years, ITE operators should consider how relevant the lifespan reduction is to their operations. The answer may depend on the specifics of a given situation. In a homogeneous environment with a refresh rate of 4 years or less, the increase in failure rate at higher temperatures may be insufficient to drive cooling design, especially if the manufacturer will warrant the ITE at higher temperatures. In a mixed environment with equipment of longer expected life spans, temperatures may warrant increased scrutiny.

In addition to temperature, humidity and contamination can affect ITE. Humidity and contamination tend to only affect ITE when the ITE is exposed to unacceptable conditions for a long period of time. Of course, in extreme cases (if someone dumped a bucket of water or dirt on a computer) one would expect to see an immediate effect.

The concern about low humidity involves electro-static discharge (ESD). As most people have experienced, in an environment with less moisture in the air (lower humidity), ESD events are more likely. However, ESD concerns related to low humidity in a data center have been largely debunked. In “Humidity Controls for Data Centers – Are They Necessary” (ASHRAE Journal, March 2010), Mark Hydeman and David Swenson wrote that ESD was not a real threat to ITE, as long as it stayed in the chassis. On the flip side, tight humidity control is no guarantee of protection against ESD for ITE with its casing removed. A technician removing the casing to work on components should use a wrist strap.

High humidity, on the other hand, does appear to pose a realistic threat to ITE. While condensation should definitely not occur, it is not a significant threat in most data centers. The primary threat comes from hygroscopic dust particles. Basically, higher humidity can make dust in the air more likely to stick to electrical components in the computer. When dust sticks, it can reduce heat transfer and possibly cause corrosion to those components. The effect of reduced heat transfer is very similar to that caused by high temperatures.

There are several threats related to contamination. Dust can coat electronic components, reducing heat transfer. Certain types of dust, called zinc whiskers, are conductive. Zinc whiskers have been most commonly found in electroplated raised floor tiles. The zinc whiskers can become airborne and land inside a computer. Since they are conductive, they can actually cause damaging shorts in tiny internal components.  Uptime Institute documented this phenomenon in a paper entitled “Zinc Whiskers Growing on Raised-Floor Tiles Are Causing Conductive Failures and Equipment Shutdowns.”

In addition to the threats posed by physical particulate contamination, there are threats related to gaseous contamination. Certain gases can be corrosive to the electronic components.

Cooling Process
The cooling process can be broken into steps:

1.   Server Cooling. Removing heat from ITE

2.  Space Cooling. Removing heat from the space housing the ITE

3.  Heat Rejection. Rejecting the heat to a heat sink outside the data center

4.  Fluid Conditioning. Tempering and returning fluid to the white space to maintain appropriate conditions within the space.

Server Cooling
ITE generates heat as the electronic components within the ITE use electricity. It’s basic physics: energy is conserved, so the energy in the incoming electricity ends up as heat. When we say a server uses electricity, we mean the server’s components are effectively changing the state of the energy from electricity to heat.

Heat transfers from a solid (the electrical component) to a fluid (typically air) within the server, often via another solid (heat sinks within the server). ITE fans draw air across the internal components, facilitating this heat transfer.

Some systems use liquids to absorb and carry heat away from ITE. In general, liquids perform this function more efficiently than air. I have seen three such systems:

• Liquid contact with a heat sink. A liquid flows through a server and makes contact with a heat sink inside the equipment, absorbing heat and removing it from the ITE.

• Immersion cooling. ITE components are immersed in a non-conductive liquid. The liquid absorbs the heat and transfers it away from the components.

• Dielectric fluid with state change. ITE components are sprayed with a non-conductive liquid. The liquid changes state and takes heat away to another heat exchanger, where the fluid rejects the heat and changes state back into a liquid.

In this article, I focus on systems associated with air-cooled ITE, as that is by far the most common method used in the industry.

Space Cooling
In legacy data center designs, heated air from servers mixes with other air in the space and eventually makes its way back to a CRAC/CRAH unit. The air transfers its heat, via a coil, to a fluid within the CRAC/CRAH. In the case of a CRAC, the fluid is a refrigerant. In the case of a CRAH, the fluid is chilled water. The refrigerant or chilled water removes the heat from the space. The air coming out of the CRAC/CRAH often has a discharge temperature of 55-60°F (13-15.5°C). The CRAC/CRAH blows the air into a raised floor plenum—typically using constant-speed fans. The standard CRAC/CRAH configuration from many manufacturers and designers controls the unit’s cooling based on return air temperature.

Layout and Heat Rejection Options
This type of raised floor cooling worked okay in low-density spaces where no one paid attention to efficiency, but it could not meet the demands of increasing heat density and efficiency, at least not as it had historically been used. In legacy data centers, I have measured temperatures around 60°F (15.5°C) at the base of a rack and near 80°F (26°C) at the top of the same rack, and I have calculated PUEs for such facilities well in excess of 2.

People began to employ best practices and technologies including Hot Aisles and Cold Aisles, ceiling return plenums, raised floor management, and server blanking panels to improve the cooling performance in raised floor environments. These methods are definitely beneficial, and operators should use them.

Around 2005, design professionals and operators began to experiment with the idea of containment. The idea is simple: use a physical barrier to separate cool server intake air from heated server exhaust air. Preventing cool supply air and heated exhaust air from mixing provides a number of benefits, including:

• More consistent inlet air temperatures

• Supply air can be delivered to the white space at a higher temperature, improving options for efficiency

• Return air reaches the cooling coil at a higher temperature, which typically allows the coil to operate more efficiently

• The space can accommodate higher-density equipment

Ideally, in a contained environment, air leaves the air handling equipment at a temperature and humidity suitable for ITE operation. The air goes through the ITE only once and then returns to the air handling equipment for conditioning.

Hot Aisle Containment vs. Cold Aisle Containment
In a Cold Aisle containment system, cool air from air handlers is contained, while hot server exhaust air is allowed to return freely to the air handlers. In a Hot Aisle containment system, hot exhaust air is contained and returns to the air handlers, usually via a ceiling return plenum (see Figure 4).

Figure 4: Hot Aisle containment

Cold Aisle containment can be very useful in a raised floor retrofit, especially if there is no ceiling return plenum. In such a case, it might be possible to leave the cabinets more or less as they are, as long as they are in a Cold Aisle/Hot Aisle arrangement. One builds the containment system around the existing Cold Aisles.

Most Cold Aisle containment environments are used in conjunction with a raised floor. It is also possible to use Cold Aisle containment with another delivery system, such as overhead ducting. The raised floor option allows for some flexibility; it is much more difficult to move a duct once it is installed.

In a raised floor environment with multiple Cold Aisle pods, the volume of cold air delivered to each pod depends largely on the number of floor tiles deployed within each containment area. Unless one builds an extremely high raised floor, the amount of air that can reach a given pod is limited. High raised floors are expensive to build, in part because the floor structure must support the heavy ITE above it.

In a Cold Aisle containment data center, one must typically assume that airflow requirements for a pod will not vary significantly on a regular basis. It is not practical to frequently switch out floor tiles or even adjust floor tile dampers. In some cases, a software system that uses CFD modeling to determine airflows based on real-time information can control air handler fan speeds in an attempt to get the right amount of air to the right pods. Even so, there are limits to how much air can be delivered to a pod with any given tile configuration; one must still have roughly the right number of floor tiles in the proper positions.

In summary, Cold Aisle containment works best when the designer and operator have confidence in the layout of the ITE cabinets and when the ITE loading neither changes much nor varies widely.

I prefer Hot Aisle containment in new data centers. Hot Aisle containment increases flexibility. In a properly designed Hot Aisle containment data center, operators have more flexibility in deploying containment. The operator can deploy a full pod or chimney cabinets. The cabinet layouts can vary. One simply connects the pod or chimney to the ceiling plenum and cuts or removes ceiling tiles to allow hot air to enter it.

In a properly controlled Hot Aisle containment environment, the ITE determines how much air is needed. There is significant flexibility in density. The cooling system floods the room with temperate air. As server fans remove air from the cool side of the room, the resulting lower pressure causes more air to flow in and replace it.

Ideally, the server room has a large, open ceiling plenum, with clear returns to the air handling equipment. It is easier to have a large, open ceiling plenum than a large, open raised floor, because the ceiling plenum does not have to support the server cabinets. The air handlers remove air from the ceiling return plenum. Sabey typically controls fan speed based on differential pressure (dP) between the cool air space and the ceiling return plenum. Sabey attempts to keep the dP slightly negative in the ceiling return plenum, with respect to the cool air space. In this manner, any small leaks in containment cause cool air to go into the plenum. The air handler fans ramp up or down to maintain the proper airflow.
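
As a rough illustration of that control approach, the sketch below shows a simple proportional-integral loop that trims fan speed to hold the return plenum slightly negative. The setpoint, gains, and interface names are hypothetical and included only to make the concept concrete; an actual BMS sequence would add filtering, alarm limits, and staging across multiple air handlers.

```python
# Illustrative sketch (not an actual BMS sequence): hold the ceiling return
# plenum slightly negative relative to the cool air space by trimming air
# handler fan speed with a simple PI loop. All values are hypothetical.

SETPOINT_IN_WC = -0.01            # target dP (plenum minus room), in. w.c.
KP, KI = 500.0, 2.0               # proportional/integral gains (illustrative)
MIN_PCT, MAX_PCT = 20.0, 100.0    # allowable fan speed command, percent

class PlenumDPController:
    def __init__(self, base_speed_pct=50.0):
        self.base = base_speed_pct
        self.integral = 0.0

    def update(self, measured_dp_in_wc, dt_s):
        # Positive error means the plenum is not negative enough: the servers
        # are pushing more air into the plenum than the air handlers remove,
        # so the fan command should rise.
        error = measured_dp_in_wc - SETPOINT_IN_WC
        self.integral += error * dt_s
        command = self.base + KP * error + KI * self.integral
        return max(MIN_PCT, min(MAX_PCT, command))

# Example: a reading of 0.00 in. w.c. (plenum no longer negative) pushes the
# command above the 50% baseline until the setpoint is restored.
controller = PlenumDPController()
print(controller.update(measured_dp_in_wc=0.00, dt_s=5.0))
```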

Hot Aisle containment requires a much simpler control scheme and provides more flexible cabinet layouts than a typical Cold Aisle containment system.

In one rather extreme example, Sabey deployed six customer racks in a 6000 ft2 space pulling a little more than 35 kilowatts (kW) per rack. The racks were all placed in a row. Sabey allowed about 24 inches between the racks and built a Hot Aisle containment pod around them. Many data centers would have trouble accommodating such high density racks. A more typical utilization in the same space might be 200 racks (30 ft2 per rack) at 4.5 kW/rack. Other than building the pod, Sabey did not have to take any sort of custom measures for the cooling. The operations sequence worked as intended, simply ramping up the air handler fans a bit to compensate for the increased airflow. These racks have been operating well for almost a year.
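
A quick back-of-the-envelope comparison, using only the figures quoted above, shows why the challenge in that example was localized density rather than total heat: the six 35-kW racks actually put less total load into the room than a conventional fill of the same space would.

```python
# Rough density comparison for the 6,000 ft2 example above (rack counts and
# kW figures from the text; the W/ft2 values are simply derived from them).

room_area_ft2 = 6000

# High-density pod: 6 racks at a little more than 35 kW each
hd_total_kw = 6 * 35                                     # 210 kW
hd_room_w_per_ft2 = hd_total_kw * 1000 / room_area_ft2   # 35 W/ft2

# More typical layout: 200 racks (30 ft2/rack) at 4.5 kW each
typ_total_kw = 200 * 4.5                                   # 900 kW
typ_room_w_per_ft2 = typ_total_kw * 1000 / room_area_ft2   # 150 W/ft2

print(f"High-density pod: {hd_total_kw} kW total, {hd_room_w_per_ft2:.0f} W/ft2")
print(f"Typical layout:   {typ_total_kw:.0f} kW total, {typ_room_w_per_ft2:.0f} W/ft2")
```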

Hot Aisle containment systems also tend to keep a larger volume of conditioned air in the room than Cold Aisle containment systems, a minor benefit. In a Cold Aisle containment system, the conditioned air at any given time is limited to the air in the supply plenum (whether a raised floor or overhead duct) plus the air in the contained Cold Aisles; this volume is typically less than the volume of the rest of the room. In a Hot Aisle containment system, the whole room is flooded with conditioned air, and the hot air is limited to the Hot Aisle containment and the ceiling return plenum.

Hot Aisle containment also allows operators to remove raised floor from the design. Temperate air floods the room, often from the perimeter. The containment prevents mixing, so air does not have to be delivered immediately in front of the ITE. Removing raised floor reduces the initial costs and the continuing management headache.

There is one factor that could lead operators to continue to install raised floor. If one anticipates direct liquid cooling during the lifespan of the data center, a raised floor may make a very good location for the necessary piping.

Close-Coupled Cooling
There are other methods of removing heat from white spaces, including in-row and in-cabinet solutions. For example, rear-door heat exchangers accept heat from servers and remove it from a data center via a liquid.

In-row cooling devices are placed near the servers, typically as a piece of equipment placed in a row of ITE cabinets. There are also systems that are located above the server cabinets.

These close-coupled cooling systems reduce the fan energy required to move air. These types of systems do not strike me as being optimal for Sabey’s business model. I believe such a system would likely be more expensive and less flexible than Hot Aisle containment layouts for accommodating unknown future customer requirements, which is important for Sabey’s operation. Close-coupled cooling solutions can have good applications, such as increasing density in legacy data centers.

Heat Rejection
After server heat is removed from a white space, it must be rejected to a heat sink. The most common heat sink is the atmosphere. Other choices include bodies of water or the ground.

There are various methods of transferring data center heat to its ultimate heat sink. Here is a partial list:

• CRAH units with water-cooled chillers and cooling towers

• CRAH units with air-cooled chillers

• Split system CRAC units

• CRAC units with cooling towers or fluid coolers

• Pumped liquid (e.g., from in-row cooling) and cooling towers

• Airside economization

• Airside economization with direct evaporative cooling (DEC)

• Indirect evaporative cooling (IDEC)

Economizer Cooling
Most legacy systems include some form of refrigerant-based thermodynamic cycle to obtain the desired environmental conditions. Economization is cooling in which the refrigerant cycle is turned off—either part or all of the time.

Airside economizers draw in outside air, often mixing it with return air to obtain the right conditions, and supply it to the data center. IDEC is a variation in which the outside air does not enter the data center but instead absorbs heat from the inside air through a heat exchanger.

Evaporative cooling systems (either direct or indirect) use the evaporation of water to extend the range of economizer cooling or to make refrigerant-based cooling more efficient. The state change of water absorbs energy, lowering the dry bulb temperature toward the wet bulb (saturation) temperature of the air (see Figure 5).

Figure 5. Direct evaporative cooling (simplified)
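
A common way to estimate the effect is the saturation-effectiveness relation: the leaving dry bulb temperature is the entering dry bulb minus the media effectiveness times the wet bulb depression. The sketch below assumes an effectiveness of 0.85, a typical value for rigid evaporative media, purely for illustration.

```python
# Direct evaporative cooling estimate using the saturation-effectiveness
# relation: T_leaving = T_db - effectiveness * (T_db - T_wb).
# The 0.85 effectiveness is an assumed, typical value for rigid media.

def direct_evap_supply_temp_f(dry_bulb_f, wet_bulb_f, effectiveness=0.85):
    """Approximate leaving dry bulb temperature (°F) of a direct evap cooler."""
    return dry_bulb_f - effectiveness * (dry_bulb_f - wet_bulb_f)

# Example: a 95°F dry bulb with a 65°F wet bulb yields roughly 69.5°F air.
print(direct_evap_supply_temp_f(95, 65))
```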

In waterside economizer systems, the refrigerant cycle is not required when outside conditions are cold enough to achieve the desired chilled water temperature set points. The chilled water passes through a heat exchanger and rejects the heat directly to the condenser water loop.

Design Criteria
In order to design a cooling system, the design team must agree upon certain criteria.

Heat load (most often measured in kilowatts) typically gets the most attention. Heat load actually includes two elements: the total heat to be rejected and the density of that heat. Traditionally, data centers have measured heat density in watts per square foot. Many argue that density should really be measured in kilowatts per cabinet, which is very defensible in cases where one knows the number of cabinets to be deployed.

Airflow receives less attention than heat load. Many people use computational fluid dynamics (CFD) software to model airflow. These programs can be especially useful in non-contained raised floor environments.

In all systems, but especially in contained environments, it is important that the volume of air produced by the cooling system meet the ITE requirement. There is a direct relationship between heat gain through a server, power consumed by the server, and airflow through that server. Heat gain through a server is typically measured by the temperature difference between the server intake and server exhaust or delta T (∆T). Airflow is measured in volume over time, typically cubic feet per minute (CFM).

Assuming load has already been determined, a designer should know (or, more realistically, assume) a ∆T. If the designer does not assume a ∆T, the designer leaves it to the equipment manufacturer to determine the design ∆T, which could result in airflow that does not match the requirements.

I typically ask designers to assume a 20°F (11°C) ∆T. Higher density equipment, such as blades, typically has higher ∆T. However, most commodity servers are doing well to get as high as a 20°F (11°C) ∆T. (Proper containment and various set points can also make a tremendous difference.)
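
The sensible heat equation for air makes this relationship concrete. At sea-level air density, airflow in CFM is roughly 3.16 times the load in watts divided by the ∆T in °F (the constant shifts with altitude and air conditions). The short sketch below shows how much the design ∆T matters.

```python
# Rule-of-thumb airflow for air-cooled ITE at standard (sea level) conditions:
# CFM ≈ 3.16 × watts / ΔT(°F). The constant varies with air density.

def required_cfm(load_watts, delta_t_f):
    """Airflow (CFM) needed to carry away a given ITE load at a given delta-T."""
    return 3.16 * load_watts / delta_t_f

# A 10-kW rack designed at a 20°F delta-T needs about 1,580 CFM;
# the same rack designed at a 15°F delta-T needs about one-third more air.
print(f"{required_cfm(10_000, 20):,.0f} CFM at a 20°F delta-T")
print(f"{required_cfm(10_000, 15):,.0f} CFM at a 15°F delta-T")
```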

The risk of designing a system in which the design ∆T is lower than the actual ∆T is that the system will not be able to deliver the necessary airflow/cooling. The risk in going the other way is that the owner will have purchased more capacity than the design goals otherwise warrant.

The Design Day represents the most extreme outside air conditions the design is intended to handle. The owner and designers have to decide how hot is hot enough, as this choice affects the selection and operation of the equipment. In Seattle, in the 100 years before July 29, 2009, no ambient temperature above 100°F (38°C) was recorded (as measured at SeaTac airport). Keep in mind, too, that equipment is often located (especially on the roof) where temperatures are higher than those recorded at official weather stations.

An owner must determine what the temperature and humidity should be in the space. Typically, this is specified for a Design Day when N equipment is operating and redundant units are off-line. Depending on the system, the designers will determine air handler discharge set points based on these conditions, making assumptions and/or calculations of the temperature increases between the air handler discharge and the server inlet. There can be opportunities for more efficient systems if the owner is willing to go into the ASHRAE Allowable range during extreme outside temperatures and/or during upset conditions such as utility interruptions. In its business model, Sabey typically seeks to stay within the ASHRAE Recommended range.

The owner and designer should understand the reliability goals of the data center and design the mechanical, electrical, and controls systems to support those goals. Of course, when considering these items, the design team may be prone to overbuilding. If the design team assumes an extreme Design Day, adds redundant equipment, specifies the low end of the ASHRAE Recommended range, and then adds a little percentage on top just in case, the resulting system can be highly reliable, if designed and operated appropriately. It can also be too expensive to build and inefficient to operate.

It is worth understanding that data centers do not typically operate at design load. In fact, during much of a data center’s lifespan, it may operate in a lightly loaded state. Operators and designers should spend some time making the data center efficient in those conditions, not just as it approaches design load. Sabey has made design choices that allow its data centers to cool efficiently not only near design load but also at light loads. Figure 6 shows an average PUE of 1.20 at only 10% loading at one of Sabey’s operating data centers.

Figure 6. PUE and design load (%) over time.
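
As a purely illustrative model (the numbers below are invented, not Sabey’s data), partial-load PUE is driven largely by how much of the facility overhead is fixed versus proportional to the IT load. A design dominated by constant-speed equipment can look acceptable at full load but poor at 10% load; a design with small fixed overhead and mostly variable-speed equipment stays efficient across the range.

```python
# Illustrative only: how fixed vs. proportional overhead shapes PUE at
# partial load. All figures are invented for the example.

def pue(it_kw, fixed_overhead_kw, variable_fraction):
    """PUE = total facility power / IT power, with a crude overhead model."""
    overhead_kw = fixed_overhead_kw + variable_fraction * it_kw
    return (it_kw + overhead_kw) / it_kw

design_it_kw = 1000.0

for load_pct in (10, 50, 100):
    it_kw = design_it_kw * load_pct / 100
    # Design A: large fixed overhead (constant-speed fans, always-on plant)
    a = pue(it_kw, fixed_overhead_kw=300.0, variable_fraction=0.10)
    # Design B: small fixed overhead, mostly variable-speed equipment
    b = pue(it_kw, fixed_overhead_kw=10.0, variable_fraction=0.12)
    print(f"{load_pct:3d}% load: PUE A = {a:.2f}, PUE B = {b:.2f}")
```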

Crystal Ball
While very high density ITE is still being built and deployed, the density of most ITE has not kept up with the increases projected 10 years ago. Sabey was designing data centers at an average of 150 watts/ft2 six years ago, and the company has not yet seen a reason to increase that. Of course, Sabey can accommodate significantly higher localized densities where needed.

In the near future, I expect air-based cooling systems with containment to continue to be the system of choice for cooling data centers. In the long term, I would not be surprised to see increasing adoption of liquid-cooling technologies.

Conclusion
Sabey Data Centers develops and operates data centers. It has customers in many different verticals and of many different sizes. As a service provider, Sabey does not typically know the technology or layout its customers will require. Sabey’s data centers use different cooling technologies, suitable to the location. Sabey has data centers in the mild climate of Seattle, the semi-arid climate of central Washington, and in downtown New York City. Sabey’s data centers are housed in single-story greenfield buildings and in a redeveloped high-rise.

Despite these variations and uncertainties, all the data centers Sabey designs and operates have certain common elements. They all use Hot Aisle containment without raised floor. All have a ceiling return plenum for server exhaust air and flood the room for the server inlet air. These data centers all employ some form of economizer. Sabey seeks to operate efficiently in lightly loaded conditions, with variable speed motors for fans, pumps, and chillers, where applicable.

Sabey has used a variety of mechanical systems with Hot Aisle containment, and I tend to prefer IDEC air handlers where practical. Sabey has found IDEC to be a very efficient system with lower water use than the name implies; much of the time, the system operates in dry heat exchanger mode. The system also lends itself to very simple control sequencing, and that simplicity enhances reliability. The systems restart rapidly, which is valuable during utility interruptions: the fans keep spinning and ramp up as soon as the generators start providing power, and because water remains in the sump, the evaporative cooling process requires essentially no restart time. Sabey has cooled racks of 35 to 40 kW with these systems with no problem.

Until there is broad adoption of liquid-cooled servers, the primary opportunities appear to be in optimizing air-cooled, contained data centers.


John Sasser

John Sasser brings more than 20 years of management experience to the operations of Sabey Data Centers’ portfolio of campuses. In addition to all day-to-day operations, start-ups and transitions, he is responsible for developing the conceptual bases of design and operations for all Sabey data centers, managing client relationships, overseeing construction projects, and overall master planning.

Mr. Sasser and his team have received recognition from a variety of organizations, including continuous uptime awards from the Uptime Institute and energy conservation awards from Seattle City Light and the Association of Energy Engineers.

Prior to joining Sabey, he worked for Capital One and Walt Disney Company. Mr. Sasser also spent 7 years with the Navy Civil Engineer Corps.

AIG Tells How It Raised Its Level of Operations Excellence

By Kevin Heslin and Lee Kirby

Driving operational excellence across multiple data centers is exponentially more difficult than managing just one. Technical complexity multiplies as you move to different sites, regions, and countries where codes, cultures, climates, and other factors are different. Organizational complexity further complicates matters when the data centers in your portfolio have different business requirements.

With little difficulty, an organization can focus on staffing, maintenance planning and execution, training and operations for a single site. Managing a portfolio turns the focus from projects to programs and from activity to outcomes. Processes become increasingly complex and critical. In this series of interviews, you will hear from practitioners about the challenges and lessons they have drawn from their experiences. You will find that those who thrive in this role share the understanding that Operational Excellence is not an end state, but a state of mind.

This interview is part of a series of conversations with executives who are managing diverse data center portfolios. The interviewees in this series participated in a panel at Uptime Institute Symposium 2015, discussing their use of the Uptime Institute Management & Operations (M&O) Stamp of Approval to drive standardization across data center operations.

Herb Alvarez: Director of Global Engineering and Critical Facilities
American International Group

An experienced staff was empowered to improve infrastructure, staffing, processes, and programs

What’s the greatest challenge managing your current footprint?

Providing global support and oversight via a thin staffing model can be difficult, but due to the organizational structure and the relationship with our global FM alliance partner (CBRE) we have been able to improve service delivery, manage cost, and enhance reliability. From my perspective, the greatest challenges have been managing the cultural differences of the various regions, followed by the limited availability of qualified staffing in some of the regions. With our global FM partner, we can provide qualified coverage for approximately 90% of our portfolio; the remaining 10% is where we see some of these challenges.

 

Do you have reliability or energy benchmarks?

We continue to make energy efficiency and sustainability a core requirement of our data center management practice. Over the last few years, we retrofitted two existing data center pods at our two global data centers and replaced EOL (end-of-life) equipment with best-in-class, higher-efficiency systems. The UPS systems that we installed achieve a 98% efficiency rating while operating in ESS mode and a 94 to 96% rating while operating in VMMS mode. In addition, the new cooling systems were installed with variable flow controls and VFDs for the chillers, pumps, and CRAHs, along with full cold aisle containment and multiple control algorithms to enhance operating efficiency. Our target operating model for the new data center pods was to achieve a Tier III level of reliability along with a 1.75 PUE, and we achieved both of these objectives. The next step on our energy and sustainability path is to seek Energy Star and other industry recognitions.

 

Can you tell me about your governance model and how that works?

My group in North America is responsible for the strategic direction and the overall management of the critical environments around the world. We set the standards (design, construction, operations, etc.), guidelines, and processes. Our regional engineering managers, in turn, carry these out at the regional level. At the country level, we have the tactical management (FM) that ultimately implements the strategy. We subscribe to a system of checks and balances, and we have incorporated global and regional auditing to ensure that we have consistency throughout the execution phase. We also incorporate KPIs to promote the high level of service delivery that we expect.

 

From your perspective, what is the greatest difficulty in making that model work, ensuring that the design ideas are appropriate for each facility, and that they are executed according to your standards?

The greatest difficulties we encountered were attributed to the cultural differences between regions. Initially, we encountered some resistance at the international level regarding broad acceptance of design standards and operating standards. However, with the support of executive senior leadership and the on-going consolidation effort, we achieved global acceptance through a persistent and focused effort. We now have the visibility and oversight to ensure that our standards and guidelines are being enforced across the regions. It is important to mention that our standards, although rigid, do have flexible components embedded in them, because a “one size fits all” regimen is not always feasible. For these instances, we incorporated an exception process that grants the required flexibility to deviate from a documented standard. In terms of execution, we now have the ability via “in-country” resources to validate designs and their execution.

It also requires changing the culture, even within our own corporate group. For example, we have a Transactions group that starts the search for facilities. Our group said that we should only be in this certain type of building, this quality of building, so we created some standards and minimum requirements. We said, “We are AIG. We are an insurance company. We can’t go into a shop house.” This was a cultural change, because Transactions always looked for the lowest cost option first.

The AIG name is at stake. Anything we do that is deficient has the potential to blemish the brand.

 

Herb, it sounds like you are describing a pretty successful program. And yet, I am wondering if there are things that you would do differently if you were starting from scratch.

If it were a clean slate, and a completely new start, I would look to use an M&O type of assessment at the onset of any new initiatives as it relates to data center space acquisition. Utilizing M&O as a widely accepted and recognized tool would help us achieve consistency across data centers and would validate colo provider capabilities as it relates to their operational practices.

 

How do M&O stamps help the organization, and which parts of your operations do they influence the most?

I see two clear benefits. From the management and operations perspective, the M&O Stamp offers us a proven methodology of assessing our M&O practice, not only validating our program but also offering a level of benchmarking against other participants of the assessments. The other key benefit is that the M&O stamp helps us promote our capabilities within the AIG organization. Often, we believe that we are operationally on par with the industry, but a third-party validation from a globally accepted and recognized organization helps further validate our beliefs and our posture as it relates to the quality of the service delivery that we provide. We look at the M&O stamp as an on-going certification process that ensures that we continually uphold the underlying principles of management and operations excellence, a badge of honor if you will.

 

AIG has been awarded two M&O Stamps of Approval in the U.S. I know you had similar scores on the two facilities. Were the recommendations similar?

I expected more commonality between both of the facilities. When you have a global partner, you expect consistency across sites. In these cases, there were about five recommendations for each site; two of them were common to both sites. The others were not. It highlighted the need for us to re-assess the operation in several areas, and remediate where necessary.

 

Of course you have way more than two facilities. Were you able to look at those reports and those recommendations and apply them universally?

Oh, absolutely. If there was a recommendation specific to one site, we did not look at it just for that site. We looked to leverage it across the portfolio. It only makes sense, as it applies to our core operating principle of standardizing across the portfolio.

 

Is setting KPIs for operations performance part of your FM vendor management strategy?

KPIs are very important to the way we operate. They allow us to set clear and measurable performance indicators that we utilize to gauge our performance. The KPIs drive our requirement for continuous improvement and development. We incentivize our alliance partner and its employees based on KPI performance, which helps drive operational excellence.

 

Who do you share the information with and who holds you accountable for improvements in your KPIs?

That’s an interesting question. This information is shared with our senior management as it forms our year-over-year objectives and is used as a basis for our own performance reviews and incentive packages. We review our KPIs on an on-going basis to ensure that we are trending positively; we re-assess the KPIs on an annual basis to ensure that they remain relevant to the desired corporate objectives. During the last several years, one of our primary KPIs has been to drive cost reductions to the tune of 5% across the portfolio.

 

Does implementing those reductions become part of staff appraisals?

For my direct reports, the answer is yes. It becomes part of their annual objectives; the objectives have to be measurable, and we have to agree that they are achievable. We track progress on a regular basis and communicate it via our quarterly employee reviews. Again, we are very careful that any such reductions do not adversely impact our operations or distract us from achieving our uptime requirements.

 

Do you feel that AIG has mastered demand management so you can effectively plan, deploy, and manage capacity at the speed of the client?

I think that we have made significant improvements over the last few years in terms of capacity planning, but I do believe that this is an area where we can still continue to improve. Our capacity planning team does a very good job of tracking, trending, and projecting workloads. But there is ample opportunity for us to become more granular on the projections side of the reporting, so that we have a very clear and transparent view of what is planned, its anticipated arrival, and its anticipated deployment time line. We recognize that we all play a role, and the expectation is that we will all work collaboratively to implement these types of enhancements to our demand/capacity management practice.

 

So you are viewing all of this as a competitive advantage.

You have to. That’s a clear objective for all of senior management. We have to have a competitive edge in the marketplace, whether that’s on the technology side, product side, or how we deliver services to our clients. We need to be best in class. We need to champion the cause and drive this message throughout the organization.

 

Staffing is a huge part of maintaining data center operational excellence. We hear from our Network members that finding and keeping talent is a challenge. Is this something you are seeing as well?

I definitely do think there is a shortage of data center talent. We have experienced this first hand. I do believe that the industry needs a focused data center education program to train data center personnel. I am not referring to the theoretical or on-line programs, which already exist, but to hands-on training that is specific to data center infrastructure. Typical trade school programs focus on general systems and equipment but do not have a track that is specific to data centers, one that also includes operational practices in critical environments. I think there has got to be something in the industry that’s specialized and hands-on: training that covers the complex systems found in data centers, such as UPS systems, switchgear, EPMS, BMS, fire suppression, etc.

 

How do you retain your own good talent?

Keep them happy, keep them trained, and above all keep it interesting. You have to have a succession track, a practice that allows growth from within but also accounts for employee turnover. The succession track has to ensure that we have operational continuity when a team member moves on to pursue other opportunities.

The data center environment is a very demanding environment, and so you have to keep staff members focused and engaged. We focus on building a team, and as part of team development we ensure team members are properly trained and developed to the point where we can help them achieve their personal goals, which oftentimes include upward mobility. Our development track is based on the CBRE Foundations training program. In addition to the training program, AIG and CBRE provide multiple avenues for staff members to pursue growth opportunities.

 

When the staff is stable, what kinds of things can you do to keep them happy when you can’t promote them?

Oftentimes, it is the small things you do that resonate the most. I am a firm believer that above-average performance needs to be rewarded. We are pro-active and at times very creative in how we acknowledge those that are considered top performers. The Brill Award, which we achieved as a team, is just one example. We acknowledged the team members with a very focused and sincere thank you communication, acknowledging not only their participation but also the fact that it could not have been achieved without them. From a senior management perspective, we can’t lose sight of the fact that in order to cultivate a team environment you have to be part of the team. We advocate for a culture of inclusion, development, and opportunity.


Herb Alvarez

Herb Alvarez is director of Global Engineering & Critical Facilities, American International Group, Inc. Mr. Alvarez is responsible for engineering and critical facilities management for the AIG portfolio, which comprises 970 facilities spread across 130 countries. Mr. Alvarez has overarching responsibility for the global data center facilities and their building operations. He works closely and in collaboration with AIG’s Global Services group, which is the company’s IT division.

AIG operates three purpose-built data centers in the U.S., including a 235,000 square foot (ft2) facility in New Jersey and a 205,000-ft2 facility in Texas, and eight regional colo data centers in Asia Pacific, EMEA, and Japan.

Mr. Alvarez helped implement the Global Infrastructure Utility (GIU), a consolidation and standardization effort that AIG CEO Robert Benmosche launched in 2010. The initiative was completed in 2013.


 

Kevin Heslin

Kevin Heslin is chief editor and director of ancillary projects at the Uptime Institute. He served as an editor at New York Construction News, Sutton Publishing, the IESNA, and BNP Media, where he founded Mission Critical, the leading commercial publication dedicated to data center and backup power professionals. In addition, Heslin served as communications manager at the Lighting Research Center of Rensselaer Polytechnic Institute. He earned a B.A. in Journalism from Fordham University in 1981 and a B.S. in Technical Communications from Rensselaer Polytechnic Institute in 2000.

Meeting the M&O Challenge of Managing a Diverse Data Center Footprint: John Sheputis and Don Jenkins, Infomart

By Matt Stansberry and Lee Kirby

Driving operational excellence across multiple data centers is exponentially more difficult than managing just one. Technical complexity multiplies as you move to different sites, regions, and countries where codes, cultures, climates and other factors are different. Organizational complexity further complicates matters when the data centers in your portfolio have different business requirements.

With little difficulty, an organization can focus on staffing, maintenance planning and execution, training and operations for a single site. Managing a portfolio turns the focus from projects to programs and from activity to outcomes. Processes become increasingly complex and critical. In this series of interviews, you will hear from practitioners about the challenges and lessons they have drawn from their experiences. You will find that those who thrive in this role share the understanding that Operational Excellence is not an end state, but a state of mind.

This interview is part of a series of conversations with executives who are managing diverse data center portfolios. The interviewees in this series participated in a panel at Uptime Institute Symposium 2015, discussing their use of the Uptime Institute Management & Operations (M&O) Stamp of Approval to drive standardization across data center operations.

John Sheputis: President, Infomart Data Centers

Don Jenkins: VP Operations, Infomart Data Centers

Give our readers a sense of your current data center footprint.

Sheputis: The portfolio includes about 2.2 million square feet (ft2) of real estate, mostly data center space. The facilities in both of our West Coast locations are exclusively data center space. The Dallas facility is enormous, at 1.6 million ft2, and is a combination of mission critical and non-mission critical space. Our newest site in Ashburn, VA, is 180,000 ft2 and undergoing re-development now, with commissioning of the new critical load capacity expected to be complete early next year.

The Dallas site has been operational since the 1980s. We assumed the responsibility for the data center pods in that building in Q4 2014 and brought on staff from that site to our team.

What is the greatest challenge of managing your current footprint?

Jenkins: There are several challenges, but communicating standards across the portfolio is a big one. Also, different municipalities have varying local codes and governmental regulations. We need to adapt our standards to the different regions.

For example, air quality control standards vary at different sites. We have to meet very high air quality standards in California, which means we adhere to very strict requirements for engine-generator runtimes and exhaust filter media. But in other locations, the regulations are less strict, and that variance impacts our maintenance schedules and parts procurement.

Sheputis: It may sound trivial to go from an area where air quality standards are high to one that is less stringent, but it still represents a change in our standards. If you’re going to do development, it’s probably best to start in California or somewhere with more restrictive standards and then go somewhere else. It would be very difficult to go the other way.

More generally, the Infomart merger was a big bite. It includes a lot of responsibility for non-data center space, so now we have two operating standards. We have over 500,000 ft2 of office-use real estate that uses the traditional break-fix operating model. We also have over two dozen data center suites with another 500,000 ft2 of mission critical space, where nothing breaks, or if it does, there can be no interruption of service. These two types of property have different operations objectives and require different skill sets. Putting those varying levels of operations under one team expands the number of challenges you absorb. It pushes us from managing a few sites to a “many sites” level of complexity.

How do you benchmark performance goals?

Sheputis: I’m going to restrict my response to our mission critical space. When we start or assume control of a project, we have some pretty unforgiving standards. We want concurrent maintenance, industry-leading PUE, on time, on budget, and no injuries—and we want our project to meet critical load capacity and quality standards.

But picking up somebody else’s capital project after they‘ve already completed their design and begun the work, yet before they finished? That is the hardest thing in the world. The Dallas Infomart site is so big that there are two or three construction projects going on at any time. Show up any weekend, and you’ll see somebody doing a crane pick or a helicopter delivering equipment to be installed on the roof. It’s that big. It’s a damn good thing that we have great staff on site in Dallas and someone like Don Jenkins to make sure everything goes smoothly.

We hear a lot about data center operations staffing shortages. What has been your experience at Infomart?

Jenkins: Good help is hard to find anywhere. Data center skills are very specific. It’s a lot harder to find good data center people. One of the things we try to do is hire veterans. Over half our operating engineers have military backgrounds, including myself. We do this not just out of patriotism or to meet security concerns, but because we understand and appreciate the similarity of a mission critical operation and a military operation (see http://journal.uptimeinstitute.com/resolving-data-center-staffing-shortage/).

Sheputis: If you have high standards, there is always a shortage of people for any job. But the corollary is that if you’re known for doing your job very well, the best people often find you. Don deserves credit for building low-turnover teams. Creating a culture of continuity requires more than strong technical skill sets; you have to start by recruiting the kinds of people who can play on a team.

Don uses this phrase a lot to describe the type of person he’s looking for: people who are capable of both leading and being led. He wants candidates with low egos and strong ethics who care about outcomes and want to learn. We invest heavily in our training program, and we are rigorous in finding people who buy into our process. We don’t want people who want to be heroes. The ideal candidate is a responsible team player with an aptitude for learning, and we fill in the technical gaps as necessary over time. No one has all the skills they need on day one. Our training is industry leading. To date, we have had no voluntary turnover.

Jenkins: We do about 250 man-hours of training for each staff member. It’s not cheap, but we feel it’s necessary and the guys love it. They want to learn. They ask for it. Greater skill attainment is a win-win for them, our tenants, and us.

Sheputis: When you build a data center, you often meet the technically strongest people at either the beginning of the project during design or the end of the project during the commissioning phase. Every project we do is Level 5 Commissioned. That’s when you find and address all of the odd or unusual use cases that the manufacturer may not have anticipated. More than once, we have had a UPS troubleshooting specialist say to Don, “You guys do it right. Let me know when you have an opening in your organization.”

Jenkins: I think it’s a testament to how passionate we are about what we do.

Are you standardizing management practices across multiple sites?

Sheputis: When we had one or two sites, it wasn’t a challenge because we were copying from California to Oregon. But with three or more sites it becomes much more difficult. With the inclusion of Dallas and Ashburn, we have had to raise our game. It is tempting to say we do the same thing everywhere, but that would be unrealistic at best.

Broadly speaking, we have two families of standards: Content and Process. For functional content we have specs for staffing, maintenance, security, monitoring, and the like. We apply these with the knowledge that there will be local exceptions—such as different codes and different equipment choices. An operator from one site has to appreciate the deviations at the other sites. We also have process-based standards, and these are more meticulously applied across sites. While the OEM equipment may be different, shouldn’t the process for change management be consistent? Same goes for the problem management process. Compliance is another area where consistency is expected.

The challenge with projecting any standard is to efficiently create evidence of acceptance and verification. We try to create a working feedback loop, and we are always looking for ways to do it better. We can centrally document standard policies and procedures, but we rely on field acceptance of the standard, and we leverage our systems to measure execution versus expectation. We can say please complete work orders on time and to the following spec, and we can delegate scheduling to the field, but the loop isn’t complete until we confirm execution and offer feedback on whether the work and documentation were acceptable.

What technology or methodology has helped your organization to significantly improve data center management?

Jenkins: Our standard building management system (BMS) is a Niagara product with an open framework. This allows our legacy equipment to talk over open protocols. All of our dashboards and data look and feel the same across all of the sites, so an operator could pull up another site and it would look familiar.

Sheputis: Whatever system you’re using, there has to be a high premium on keeping it open. If you run on a closed system, it eventually becomes a lost island. This is especially true as you scale your operation. You have to have open systems.

How does your organization use the M&O Stamp?

Sheputis: The M&O stamp is one of the most important things we have ever achieved. And I’m not saying this to flatter you or the Uptime Institute. We believe data center operations are very important, and we have always believed we were pretty good. But I have to believe that many operators think they do a good job as well. So who is right? How does anyone really know? The challenge to the casual observer is that the data center industry is fairly closed. Operations are secure and private.

We started the process to see how good we were, and if we were good, we also thought it would be great to have a credible third party acknowledge that. Saying I think I’m good is one thing; having a credentialed organization like Uptime Institute say so is much more.

But the M&O process is more than the Stamp of Approval. Our operations have matured and improved by participating in this process. Every year that we reassess and recertify, we feel like we learn new things, and we track our progress. The bigger benefit may be that the process forces us to think procedurally. When we’re setting up a new site, it helps us set a roadmap for what we want to achieve. Compared to all other forms of certification, we get something out of this beyond the credential; we get a path to improve.

Jenkins: Lots of people run a SWOT (strengths, weaknesses, opportunities, and threats) analysis or internal audit, but that feedback often lacks external reference points. You can give yourself an audit, and you can say “we’re great.” But what are you learning? How do you expand your knowledge? The M&O Stamp of Approval provides learning opportunities for us by providing a neutral experienced outsider viewpoint on where, and more importantly, how we can do better.

On one of the assessments, one of Uptime Institute’s consultants demonstrated how we could set up our chiller plant so that an operator could see all the key variables at a glance, with fewer steps to see which valves are open or closed. The advice was practical and easy to implement: markers on a chain, little flags on a chiller, LED lights on a pump. Very simple things to do, but we hadn’t thought of them. They’d seen it in Europe, it was easy to do, and it helps. That’s one specific example, but we used the knowledge of the M&O team to help us grow.

We think the M&O criteria and content will get better and deeper as time goes on. This is a solid standard for people to grow on.

Sheputis: We are for certifications, as they remove doubt, but most of the work and value comes from obtaining the first certification. I can see why others are cynical about the value and cost of recertifying. But I do think there’s real value in the ongoing M&O certification, mainly because it shows continuous improvement. No other certification process does that.

Jenkins: A lot of certifications are binary in that you pass if you have enough checked boxes—the content is specific, but operationally shallow. We feel that we get a lot more content out of the M&O process.

Sheputis: As I said before, we are for compliance and transparency. As we are often fulfilling a compliance requirement for someone else, there is clear value in saying we are PCI compliant or SSAE certified. But the M&O Stamp of Approval process is more like seeing a professional instructor. All other certifications should address the M&O Stamp as “Sir.”


Matt Stansberry

Matt Stansberry is director of Content and Publications for the Uptime Institute and also serves as program director for the Uptime Institute Symposium, an annual spring event that brings together 1,500 stakeholders in enterprise IT, data center facilities, and corporate real estate to deal with the critical issues surrounding enterprise computing. He was formerly editorial director for Tech Target’s Data Center and Virtualization media group, and was managing editor of Today’s Facility Manager magazine. He has reported on the convergence of IT and Facilities for more than a decade.