Open heart surgery on the data center: Switchgear replacement in a live facility
By Mark Johns
U.S. Bank emerged during the 1990s from mergers and acquisitions among several major regional banks in the West and Midwest. Since then, the company continued to grow through additional large acquisitions and mergers with more than 50 banks. Today, U.S. Bancorp is a diversified American financial services holding company headquartered in Minneapolis, MN. It is the parent company of U.S. Bank National Association, which is the fifth largest bank in the United States by assets, and fourth largest by total branches. U.S. Bank’s branch network serves 25 midwestern and western states with 3,081 banking offices and 4,906 ATMs. U.S. Bancorp offers regional consumer and business banking and wealth management services, national wholesale and trust services and global payments services to over 15.8 million customers.
Rich in history, U.S. Bancorp operates under the second oldest continuous national charter (originally Charter #24), granted during Abraham Lincoln’s administration in 1863. In addition, U.S. Bank helped finance Charles Lindbergh’s historic flight across the Atlantic. For sheer volume, U.S. Bank is the fifth-largest check processor in the nation, handling 4 billion paper checks annually at 12 processing sites. The bank’s air and ground courier fleet moves 15 million checks each day.
Energy Park Site
U.S. Bank relies on its Energy Park site in St. Paul, MN, to support these operations. Energy Park comprises a 350,000-square-foot (ft2) multi-use building that houses the check production operations and a 40,000-ft2 data center, as well as support staff for both. Xcel Energy provides two 2,500-kilovolt-ampere (kVA) feeds to the data center and two 2,000-kVA feeds to the rest of the building.
The utility’s data center feeds supply power to two automatic throw over switches (ATO); each ATO feeds two transformers. Two transformers support the data center, and two other transformers support check production and power for the rest of the building, including offices and HVAC (see Figures 1-3).
Figure 1. Temporary stand-alone power plant
Figures 2 and 3. Utility transfer and ATS
A single UPS module feeds the check production area. However, two separate multi-module, parallel redundant UPS systems feed data center loads. Four N+1 1,500-kilowatt (kW) standby-rated engine generators back up the three UPS systems through existing switchgear distribution. The data center switchgear is a paralleling/closed-transition type, and the check production area switchgear is an open-transition type. The remaining office area space is not backed up by engine generators.
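As a rough illustration of what N+1 means for the generator plant described above, the sketch below computes the capacity that remains when the redundant unit is out of service. The 1,500-kW rating and the count of four come from the text; reading "four N+1" as three units needed plus one redundant unit is an interpretation, and the function name is hypothetical. The article does not give the actual facility load, so none is assumed here.

```python
def firm_capacity_kw(unit_rating_kw: float, units_installed: int,
                     redundant_units: int = 1) -> float:
    """Capacity still available when the redundant units are out of service."""
    if not 0 <= redundant_units < units_installed:
        raise ValueError("redundant units must be fewer than installed units")
    return unit_rating_kw * (units_installed - redundant_units)


# Values from the article: four 1,500-kW standby-rated engine generators, N+1.
installed_kw = 1500 * 4
firm_kw = firm_capacity_kw(1500, 4, redundant_units=1)
print(f"Installed: {installed_kw} kW, firm (N) capacity: {firm_kw} kW")
```

Under this reading, the plant can lose any one engine generator and still carry a load up to the firm capacity.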
Project Summary
To ensure data center reliability, U.S. Bank initiated an Electric Modernization Project (data center electrical distribution). The project included replacing outdated switchgear and UPS systems, which were no longer supported by the manufacturer. In the project’s first phase, Russelectric paralleling switchboards were selected to replace existing equipment and create two separate distribution systems, each backed up by existing engine generators. Mechanical and UPS loads are divided between the two systems, so that either one can support the data center. Switchgear tie breakers increase overall redundancy. The facility benefits from new generator controls and new switchgear SCADA functionality, which will monitor and control utility or generator power.
Since this project was undertaken in a live facility, several special considerations had to be addressed. In order to safely replace the existing switchgear, a temporary stand-alone power plant, sized to support all of the data center loads, was assembled in a parking lot just outside the building’s existing electric/switchgear room (see Figures 4-6). The temporary power plant consisted of a new utility transformer, powered from one of the utility’s ATOs, which supplies power to an automatic transfer switch (ATS). The ATS supplies power from either the utility feeds or the standby-rated engine generators to a new distribution switchboard to support data center loads. The switchboard was installed inside a small building to protect it from the elements. Maintenance bypass switches enable staff to work on the ATS.
Figure 4. Maintenance bypass switches were installed to allow for work on the ATS
Figure 5 (Top) and 6 (Bottom). Switchboard was installed in a small building
Each standby-rated engine generator has two sources of fuel oil. The primary source is from a bulk tank, with additional piping connected to the site’s two existing 10,000-gallon fuel oil storage tanks to allow for filling the bulk tank or direct feed to the engine generators (see Figure 7).
Transferring Data Center Loads
U.S. Bank’s commissioning of the stand-alone power plant included testing the ATS, load testing the engine generators, infrared (IR) scanning all connections, and simulating a utility outage. Some additional cabling was added during commissioning to address cable heating due to excessive voltage drop. After commissioning was completed, data center loads were transferred to the stand-alone plant. This required providing temporary circuits for select mechanical equipment and moving loads away from four panelboards (two for mechanical equipment and two for the UPS), so that they could be shut down and re-fed from the temporary power plant. The panelboards were transferred one at a time to keep the data center on-line throughout all this work. The transfer work took place over two weekends.
The mechanical loads were sequenced first in order to put load on the stand-alone plant to provide a stable power source when the UPS systems were cut over and brought on-line. Data center loads were transferred to engine-generator power at the beginning of each day to isolate the data center from the work.
On the first Saturday devoted to the transfer process, the mechanical loads were rotated away from the first panelboard to be re-fed. Equipment requiring temporary power was cut over (see Figure 8). The isolated panelboard was then shut down and re-fed from the stand-alone plant. Once the panelboard was re-fed and power restored to it, equipment receiving temporary power was returned to its normal source. Mechanical loads were rotated back to this panelboard, so that the second panelboard could be shut down and re-fed. Data center loads were transferred back to utility power at the end of each day.
The Sunday mechanical cut over followed the same sequence as Saturday, except the stand-alone power plant, with live load, was tested at the end of the day. This testing included having Xcel Energy simulate a utility outage to the data center, which the utility did with data center loads still on engine-generator power so as not to impact the data center.
The UPS systems were transferred the following weekend. On Saturday, the two UPS systems were transferred to engine-generator power and put into maintenance bypass so their primary power sources could be re-fed from the stand-alone power plant. At the end of the day, the two UPS systems went back on-line and transferred back to utility power. On Sunday, workers cut over the UPS maintenance bypass source. That day’s work concluded with additional testing of the stand-alone power plant, including another simulated utility outage to see how the plant would respond while supporting the entire data center.
Figure 7. Standby-rated engine generators have two sources of fuel oil
Figure 8. Data center loads were transferred to the temporary stand-alone power plant over the course of two weekends
Cable Bus Installation
At the same time the stand-alone power plant was assembled and loads cut over to it, four sets of cable trays and cables were installed to facilitate dividing the UPS loads. These four sets of cable trays had to be run through office and production areas to get to the existing UPS room, which is a run of approximately 625 feet (see Figure 9). Each tray served one of the four primary and maintenance bypass UPS systems.
Figure 9. New cable bus ran about 625 feet through the facility
Switchgear and Generators
After the data center loads were transferred over to the stand-alone power plant, the old switchgear was disconnected from utility power so it could be disassembled and removed from the facility (see Figures 10 and 11). Then, the new switchgear was installed (see Figures 12 and 13).
The switchgear was designed for even distribution of loads, with an A (yellow) side and a B (blue) side (see Figure 14). Each side supports one of the two UPS systems, one of the two chillers with its pumps and towers, and half of the computer room cooling units.
Figures 10 and 11. Old switchgear was disassembled and removed from the facility
After installation, portable load banks were brought in for commissioning the new switchgear. The engine generators also received a full re-commission due to the changes in the controls and the additional alarms.
Figures 12 and 13. New switchgear was installed
Figure 14. Switchgear supporting Yellow and Blue sides, equally dividing the critical load
After the new switchgear was fully commissioned, data center loads were cut over to it following a transfer sequence similar to the one used for the stand-alone power plant. The panelboards supporting mechanical and UPS equipment were again cut over one at a time to keep the data center on-line, and data center loads were again transferred to engine-generator power to isolate the data center throughout this work.
Figures 15 and 16. Upgraded engine-generator controls and alarming were installed, with panels installed in the Engineering Office
As previously mentioned, upgraded engine-generator controls and alarming were installed as part of the project (see Figures 15 and 16). The older controls had to be upgraded to allow communication with the new switchgear. Upgraded alarm panels were installed in the Engineering Office. In addition, each switchboard has a SCADA screen with a workstation installed in the Engineering Office (see Figure 17). The project also included updating MOPs for all aspects of the switchgear operation (see Figure 18).
Figure 17. New switchgear includes a new SCADA system
Figure 18. Updated MOPs for the switchgear
The overall project went well and was completed on time with no impact to the data center. Since this phase of the project was completed, we have performed a number of live load engine-generator tests, including a few brief utility power tests, in which the engine generators were started and supported transferred load. In each test, the new equipment performed well. Phase 2 of the modernization project is the replacement of UPS System 1, which is currently underway and anticipated to be completed later in 2014. Phase 3 is replacement of UPS System 2, scheduled for 2015.
Mark Johns
Mark Johns is chief engineer, U.S. Bank IT Critical Facilities Services. He has more than 26 years of data center engineering experience, completing numerous infrastructure upgrade projects, including all commissioning, without interruption to data center operations. Mr. Johns’s long career prior to U.S. Bank includes working in a 7-story multi-use facility, which includes data center operations, check processing operations, and support staff.
Sabey optimizes air-cooled data centers through containment
By John Sasser
The sole purpose of data center cooling technology is to maintain environmental conditions suitable for information technology equipment (ITE) operation. Achieving this goal requires removing the heat produced by the ITE and transferring that heat to some heat sink. In most data centers, the operators expect the cooling system to operate continuously and reliably.
I clearly recall a conversation with a mechanical engineer who had operated data centers for many years. He felt that most mechanical engineers did not truly understand data center operations and design. He explained that most HVAC engineers start in office or residential design, focusing on comfort cooling, before getting into data center design. He thought that the paradigms they learn in those design projects don’t necessarily translate well to data centers.
It is important to understand that comfort cooling is not the primary purpose of data center cooling systems, even though the data center must be safe for the people who work in it. In fact, it is perfectly acceptable (and typical) for areas within a data center to be uncomfortable for long-term occupancy.
As with any well-engineered system, a data center cooling system should efficiently serve its function. Data centers can be very energy intensive, and it is quite possible for a cooling system to use as much (or more) energy as the computers it supports. Conversely, a well-designed and operated cooling system may use only a small fraction of the energy used by ITE.
In this article, I will provide some history on data center cooling. I will then discuss some of the technical elements of data center cooling, along with a comparison of data center cooling technologies, including some that we use in Sabey’s data centers.
The Economic Meltdown of Moore’s Law
In the early to mid-2000s, designers and operators worried about the ability of air-cooling technologies to cool increasingly power hungry servers. With design densities approaching or exceeding 5 kilowatts (kW) per cabinet, some believed that operators would have to resort to technologies such as rear-door heat exchangers and other kinds of in-row cooling to keep up with the increasing densities.
In 2007, Ken Brill of the Uptime Institute famously predicted the Economic Meltdown of Moore’s Law. He said that the increasing amount of heat resulting from fitting more and more transistors onto a chip would reach an endpoint at which it would no longer be economically feasible to cool the data center without significant advances in technology (see Figure 1).
Figure 1. ASHRAE New Datacom Equipment Power Chart, published February 1, 2005
The U.S. Congress even got involved. National leaders had become aware of data centers and the amount of energy they require. Congress directed the U.S. Environmental Protection Agency (EPA) to submit a report on data center energy consumption (Public Law 109-341). This law also directed the EPA to identify efficiency strategies and drive the market for efficiency. This report projected vastly increasing energy use by data centers unless measures were taken to significantly increase efficiency (see Figure 2).
As of 2014, Moore’s Law has not yet failed. When it does, the end will be a result of physical limitations involved in the design of chips and transistors, having nothing to do with the data center environment.
At about the same time that EPA published its data center report, industry leaders took note of efficiency issues. ITE manufacturers began to place a greater emphasis on efficiency, in addition to performance, in their designs; data center designers and operators began designing for efficiency as well as reliability and cost; and operators started to realize that efficiency does not require a sacrifice of reliability.
Legacy Cooling and the End of Raised Floor
For decades, computer rooms and data centers utilized raised floor systems to deliver cold air to servers. Cold air from a computer room air conditioner (CRAC) or computer room air handler (CRAH) pressurized the space below the raised floor. Perforated tiles provided a means for the cold air to leave the plenum and enter the main space—ideally in front of server intakes. After passing through the server, the heated air returned to the CRAC/CRAH to be cooled, usually after mixing with the cold air. Very often, the CRAC unit’s return temperature was the set point used to control the cooling system’s operation. Most commonly the CRAC unit fans ran at a constant speed, and the CRAC had a humidifier within the unit that produced steam. The primary benefit of a raised floor, from a cooling standpoint, is to deliver cold air where it is needed, with very little effort, by simply swapping a solid tile for a perforated tile (see Figure 3).
Figure 3: Legacy raised floor cooling
For many years, this system was the most common design for computer rooms and data centers. It is still employed today. In fact, I still find many operators who are surprised to enter a modern data center and not find raised floor and CRAC units.
The legacy system relies on one of the principles of comfort cooling: deliver a relatively small quantity of conditioned air and let that small volume of conditioned air mix with the larger volume of air in the space to reach the desired temperature. This system worked okay when ITE densities were low. Low densities enabled the system to meet its primary objective despite its flaws—poor efficiency, uneven cooling, etc.
At this point, it is an exaggeration to say the raised floor is obsolete. Companies still build data centers with raised floor air delivery. However, more and more modern data centers do not have raised floor simply because improved air delivery techniques have rendered it unnecessary.
How Cold is Cold Enough?
“Grab a jacket. We’re going in the data center.”
Heat must be removed from the vicinity of the ITE electrical components to avoid overheating the components. If a server gets too hot, onboard logic will turn it off to avoid damage to the server.
ASHRAE Technical Committee 9.9 (TC 9.9) has done considerable work in the area of determining suitable environments for ITE. I believe their publications, especially Thermal Guidelines for Data Processing Equipment, have facilitated the transformation of data centers from the “meat lockers” of legacy data centers to more moderate temperatures. [Editor’s note: The ASHRAE Technical Committee TC9.9 guideline recommends that the device inlet be between 18-27°C and 20-80% relative humidity (RH) to meet the manufacturer’s established criteria. Uptime Institute further recommends that the upper limit be reduced to 25°C to allow for upsets, variable conditions in operation, or to compensate for errors inherent in temperature sensors and/or controls systems.]
It is extremely important to understand that the TC 9.9 guidelines are based on server inlet temperatures—not internal server temperatures, not room temperatures, and certainly not server exhaust temperatures. It is also important to understand the concepts of Recommended and Allowable conditions.
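To make the inlet-based guidance concrete, here is a minimal sketch that checks a server inlet reading against the ranges quoted in the editor's note above (18-27°C and 20-80% RH, with Uptime Institute's suggested 25°C upper limit). The function and constant names are hypothetical, and the sketch deliberately does not encode the Allowable ranges, since those values are not given in this article.

```python
# Ranges taken from the editor's note in this article.
RECOMMENDED_TEMP_C = (18.0, 27.0)
RECOMMENDED_RH_PCT = (20.0, 80.0)
UPTIME_UPPER_TEMP_C = 25.0  # Uptime Institute's suggested upper limit


def check_inlet(temp_c: float, rh_pct: float) -> list[str]:
    """Return a list of findings for one server INLET reading (not exhaust)."""
    findings = []
    low, high = RECOMMENDED_TEMP_C
    if temp_c < low:
        findings.append("inlet temperature below recommended range")
    elif temp_c > high:
        findings.append("inlet temperature above recommended range")
    elif temp_c > UPTIME_UPPER_TEMP_C:
        findings.append("within recommended range but above the 25 C margin")

    rh_low, rh_high = RECOMMENDED_RH_PCT
    if not rh_low <= rh_pct <= rh_high:
        findings.append("relative humidity outside 20-80%")
    return findings


print(check_inlet(22.0, 45.0))  # an in-range reading yields no findings
print(check_inlet(26.0, 45.0))
print(check_inlet(30.0, 10.0))
```

The readings passed in must come from sensors at the server inlet; feeding room or return-air temperatures into a check like this would misrepresent the conditions the ITE actually sees.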
If a server is kept too hot, but not so hot that it turns itself off, its lifespan could be reduced. Generally speaking, this lifespan reduction is a function of the high temperatures the server experiences and the duration of that exposure. In providing a broader Allowable range, ASHRAE TC 9.9 suggests that ITE can be exposed to the higher temperatures for more hours each year.
Given that technology refreshes can occur as often as every 3 years, ITE operators should consider how relevant the lifespan reduction is to their operations. The answer may depend on the specifics of a given situation. In a homogeneous environment with a refresh rate of 4 years or less, the increase in failure rate at higher temperatures may be insufficient to drive cooling design, especially if the manufacturer will warrant the ITE at higher temperatures. In a mixed environment with equipment of longer expected life spans, temperatures may warrant increased scrutiny.
In addition to temperature, humidity and contamination can affect ITE. Humidity and contamination tend to only affect ITE when the ITE is exposed to unacceptable conditions for a long period of time. Of course, in extreme cases (if someone dumped a bucket of water or dirt on a computer) one would expect to see an immediate effect.
The concern about low humidity involves electro-static discharge (ESD). As most people have experienced, in an environment with less moisture in the air (lower humidity), ESD events are more likely. However, ESD concerns related to low humidity in a data center have been largely debunked. In “Humidity Controls for Data Centers – Are They Necessary” (ASHRAE Journal, March 2010), Mark Hydeman and David Swenson wrote that ESD was not a real threat to ITE, as long as it stayed in the chassis. On the flip side, tight humidity control is no guarantee of protection against ESD for ITE with its casing removed. A technician removing the casing to work on components should use a wrist strap.
High humidity, on the other hand, does appear to pose a realistic threat to ITE. While condensation should definitely not occur, it is not a significant threat in most data centers. The primary threat is something called hygroscopic dust particles. Basically, higher humidity can make dust in the air more likely to stick to electrical components in the computer. When dust sticks, it can reduce heat transfer and possibly cause corrosion to those components. The effect of reduced heat transfer is very similar to that caused by high temperatures.
There are several threats related to contamination. Dust can coat electronic components, reducing heat transfer. Certain types of dust, called zinc whiskers, are conductive. Zinc whiskers have been most commonly found in electroplated raised floor tiles. The zinc whiskers can become airborne and land inside a computer. Since they are conductive, they can actually cause damaging shorts in tiny internal components. Uptime Institute documented this phenomenon in a paper entitled “Zinc Whiskers Growing on Raised-Floor Tiles Are Causing Conductive Failures and Equipment Shutdowns.”
In addition to the threats posed by physical particulate contamination, there are threats related to gaseous contamination. Certain gases can be corrosive to the electronic components.
Cooling Process
The cooling process can be broken into steps:
1. Server Cooling. Removing heat from ITE
2. Space Cooling. Removing heat from the space housing the ITE
3. Heat Rejection. Rejecting the heat to a heat sink outside the data center
4. Fluid Conditioning. Tempering and returning fluid to the white space, to maintain appropriate conditions within the space.
Server Cooling
ITE generates heat as the electronic components within the ITE use electricity. It’s the first law of thermodynamics: the energy in the incoming electricity is conserved. When we say a server uses electricity, we mean the server’s components are effectively changing the state of the energy from electricity to heat.
Heat transfers from a solid (the electrical component) to a fluid (typically air) within the server, often via another solid (heat sinks within the server). ITE fans draw air across the internal components, facilitating this heat transfer.
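Because essentially all of the electrical power a server draws ends up as heat, the airflow needed to carry that heat away can be estimated from the load and the air temperature rise across the equipment. The sketch below uses the common sensible-heat approximation for standard air near sea level (heat in BTU/hr equals about 1.08 times airflow in CFM times temperature rise in °F). The function names, the 5-kW cabinet, and the 20°F rise are assumptions for illustration, not figures from a specific installation.

```python
BTU_PER_HR_PER_WATT = 3.412   # 1 W is approximately 3.412 BTU/hr
SENSIBLE_HEAT_FACTOR = 1.08   # BTU/hr per CFM per deg F, standard air near sea level


def heat_output_btu_hr(it_load_w: float) -> float:
    """Heat released by ITE, assuming all electrical input becomes heat."""
    return it_load_w * BTU_PER_HR_PER_WATT


def required_airflow_cfm(it_load_w: float, delta_t_f: float) -> float:
    """Airflow needed to remove the heat at a given inlet-to-exhaust rise."""
    if delta_t_f <= 0:
        raise ValueError("temperature rise must be positive")
    return heat_output_btu_hr(it_load_w) / (SENSIBLE_HEAT_FACTOR * delta_t_f)


# Hypothetical example: a 5-kW cabinet with a 20 deg F rise across the servers.
cfm = required_airflow_cfm(5000, 20)
print(f"Approximate airflow required: {cfm:.0f} CFM")
```

The relationship also shows why a larger temperature rise across the ITE reduces the volume of air that fans must move for the same load.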
Some systems make use of liquids to absorb and carry heat from ITE. In general, liquids perform this function more efficiently than air. I have seen three such systems:
• Liquid contact with a heat sink. A liquid flows through a server and makes contact with a heat sink inside the equipment, absorbing heat and removing it from the ITE.
• Immersion cooling. ITE components are immersed in a non-conductive liquid. The liquid absorbs the heat and transfers it away from the components.
• Dielectric fluid with state change. ITE components are sprayed with a non-conductive liquid. The liquid changes state and takes heat away to another heat exchanger, where the fluid rejects the heat and changes state back into a liquid.
In this article, I focus on systems associated with air-cooled ITE, as that is by far the most common method used in the industry.
Space Cooling
In legacy data center designs, heated air from servers mixes with other air in the space and eventually makes its way back to a CRAC/CRAH unit. The air transfers its heat, via a coil, to a fluid within the CRAC/CRAH. In the case of a CRAC, the fluid is a refrigerant. In the case of a CRAH, the fluid is chilled water. The refrigerant or chilled water removes the heat from the space. The air coming out of the CRAC/CRAH often has a discharge temperature of 55-60°F (13-15.5°C). The CRAC/CRAH blows the air into a raised floor plenum—typically using constant-speed fans. The standard CRAC/CRAH configuration from many manufacturers and designers controls the unit’s cooling based on return air temperature.
Layout and Heat Rejection Options
While raised floor cooling worked okay in low-density spaces where no one paid attention to efficiency, it could not meet the demands of increasing heat density and efficiency, at least not as it had been historically used. I have been in legacy data centers with temperature gauges, and I’ve measured temperatures around 60°F (15.5°C) at the base of a rack and temperatures near 80°F (27°C) at the top of the same rack and also calculated PUEs well in excess of two.
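For readers less familiar with the metric, power usage effectiveness (PUE) is total facility power divided by the power delivered to the ITE. The sketch below shows how a cooling system that uses as much power as the computers it supports pushes PUE past two. The load figures are hypothetical and are not measurements from any facility mentioned in this article.

```python
def pue(it_kw: float, cooling_kw: float, other_kw: float = 0.0) -> float:
    """Power usage effectiveness: total facility power / ITE power."""
    if it_kw <= 0:
        raise ValueError("ITE power must be positive")
    return (it_kw + cooling_kw + other_kw) / it_kw


# Hypothetical legacy room: cooling draws as much as the ITE it supports,
# plus some electrical losses and lighting lumped into "other".
legacy = pue(it_kw=500, cooling_kw=500, other_kw=100)

# Hypothetical efficient room: cooling is a small fraction of the ITE load.
efficient = pue(it_kw=500, cooling_kw=50, other_kw=50)

print(f"Legacy PUE: {legacy:.2f}, efficient PUE: {efficient:.2f}")
```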
People began to employ best practices and technologies including Hot Aisles and Cold Aisles, ceiling return plenums, raised floor management, and server blanking panels to improve the cooling performance in raised floor environments. These methods are definitely beneficial, and operators should use them.
Around 2005, design professionals and operators began to experiment with the idea of containment. The idea is simple: use a physical barrier to separate cool server intake air from heated server exhaust air. Preventing cool supply air and heated exhaust air from mixing provides a number of benefits, including:
• More consistent inlet air temperatures
• The temperature of air supplied to the white space can be raised, improving options for efficiency
• The temperature of air returning to the coil is higher, which typically makes it operate more efficiently
• The space can accommodate higher density equipment
Ideally, in a contained environment, air leaves the air handling equipment at a temperature and humidity suitable for ITE operation. The air goes through the ITE only once and then returns to the air handling equipment for conditioning.
Hot Aisle Containment vs. Cold Aisle Containment
In a Cold Aisle containment system, cool air from air handlers is contained, while hot server exhaust air is allowed to return freely to the air handlers. In a Hot Aisle containment system, hot exhaust air is contained and returns to the air handlers, usually via a ceiling return plenum (see Figure 4).
Figure 4: Hot Aisle containment
Cold Aisle containment can be very useful in a raised floor retrofit, especially if there is no ceiling return plenum. In such a case, it might be possible to leave the cabinets more or less as they are, as long as they are in a Cold Aisle/Hot Aisle arrangement. One builds the containment system around the existing Cold Aisles.
Most Cold Aisle containment environments are used in conjunction with raised floor. It is also possible to use Cold Aisle containment with another delivery system, such as overhead ducting. The raised floor option allows for some flexibility; it is much more difficult to move a duct, once it is installed.
In a raised floor environment with multiple Cold Aisle pods, the volume of cold air delivered to each pod depends largely on the number of floor tiles deployed within each of the containment areas. Unless one builds an extremely high raised floor, the amount of air that can go to a given pod is going to be limited. High raised floors can be expensive to build; the heavy ITE must go on top of the raised floor.
In a Cold Aisle containment data center, one must typically assume that airflow requirements for a pod will not vary significantly on a regular basis. It is not practical to frequently switch out floor tiles or even adjust floor tile dampers. In some cases, a software system that uses CFD modeling to determine airflows based on real time information can then control air handler fan speeds in an attempt to get the right amount of air to the right pods. There are limits to how much air can be delivered to a pod with any given tile configuration; one must still try to have about the right number of floor tiles in the proper position.
In summary, Cold Aisle containment works best in instances where the designer and operator have confidence in the layout of ITE cabinets and in instances where the loading of the ITE does not change much, nor vary widely.
I prefer Hot Aisle containment in new data centers. Hot Aisle containment increases flexibility. In a properly designed Hot Aisle containment data center, operators have more flexibility in deploying containment. The operator can deploy a full pod or chimney cabinets. The cabinet layouts can vary. One simply connects the pod or chimney to the ceiling plenum and cuts or removes ceiling tiles to allow hot air to enter it.
In a properly controlled Hot Aisle containment environment, the ITE determines how much air is needed. There is a significant flexibility in density. The cooling system floods the room with temperate air. As air is removed from the cool side of the room by server fans, the lower pressure area causes more air to flow to replace it.
Ideally, the server room has a large, open ceiling plenum, with clear returns to the air handling equipment. It is easier to have a large, open ceiling plenum than a large, open raised floor, because the ceiling plenum does not have to support the server cabinets. The air handlers remove air from the ceiling return plenum. Sabey typically controls fan speed based on differential pressure (dP) between the cool air space and the ceiling return plenum. Sabey attempts to keep the dP slightly negative in the ceiling return plenum, with respect to the cool air space. In this manner, any small leaks in containment cause cool air to go into the plenum. The air handler fans ramp up or down to maintain the proper airflow.
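The differential-pressure approach described above can be sketched as a simple control step. This is only an illustration of the idea, not Sabey's actual control sequence: the setpoint, gain, speed limits, and units (inches of water column) are all assumed values, and the function name is hypothetical. Here dP is defined as ceiling plenum pressure minus cool-space pressure, so a slightly negative target keeps leakage flowing from the cool space into the plenum.

```python
def next_fan_speed(current_speed_pct: float, measured_dp: float,
                   target_dp: float = -0.01, gain: float = 500.0,
                   min_speed: float = 20.0, max_speed: float = 100.0) -> float:
    """One incremental proportional adjustment of air handler fan speed.

    measured_dp and target_dp are plenum pressure minus cool-space pressure
    (assumed units: inches of water column). All defaults are illustrative.
    """
    # Positive error: the plenum is not negative enough, meaning server fans
    # are pushing more air into the hot side than the air handlers remove.
    error = measured_dp - target_dp
    new_speed = current_speed_pct + gain * error
    return max(min_speed, min(max_speed, new_speed))


# Servers ramp up and push the plenum positive: the fans speed up.
print(next_fan_speed(50.0, measured_dp=0.01))
# Plenum is more negative than needed: the fans slow down.
print(next_fan_speed(50.0, measured_dp=-0.03))
```

In this arrangement the ITE effectively sets the airflow demand, and the air handlers follow it, which is the flexibility the Hot Aisle approach is meant to provide.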
Hot Aisle containment requires a much simpler control scheme and provides more flexible cabinet layouts than a typical Cold Aisle containment system.
In one rather extreme example, Sabey deployed six customer racks in a 6,000 ft2 space pulling a little more than 35 kilowatts (kW) per rack. The racks were all placed in a row. Sabey allowed about 24 inches between the racks and built a Hot Aisle containment pod around them. Many data centers would have trouble accommodating such high density racks. A more typical utilization in the same space might be 200 racks (30 ft2 per rack) at 4.5 kW/rack. Other than building the pod, Sabey did not have to take any sort of custom measures for the cooling. The operations sequence worked as intended, simply ramping up the air handler fans a bit to compensate for the increased airflow. These racks have been operating well for almost a year.
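The arithmetic behind this comparison is worth spelling out. The sketch below uses only the figures quoted above, treating 35 kW per rack as a lower bound for the high-density case since the text says "a little more than 35"; the function name is hypothetical.

```python
def summarize(racks: int, kw_per_rack: float, floor_area_ft2: float) -> dict:
    """Total load, floor area per rack, and average power density."""
    total_kw = racks * kw_per_rack
    return {
        "total_kw": total_kw,
        "ft2_per_rack": floor_area_ft2 / racks,
        "w_per_ft2": total_kw * 1000 / floor_area_ft2,
    }


high_density = summarize(racks=6, kw_per_rack=35, floor_area_ft2=6000)
typical = summarize(racks=200, kw_per_rack=4.5, floor_area_ft2=6000)

print("High density:", high_density)
print("Typical:     ", typical)
```

The six-rack deployment concentrates its load in one row, yet its total load (about 210 kW) is well below the roughly 900 kW that the typical 200-rack layout would draw in the same space, which helps explain why the room-level cooling needed no custom measures beyond the pod.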
Hot Aisle containment systems tend to provide higher volumes of conditioned air compared to Cold Aisle containment, which is a minor benefit. In a Cold Aisle containment system, the volume of air in a data center at any given time is the volume of air in the supply plenum (whether that is a raised floor or overhead duct) and the amount of air in the contained Cold Aisles. This volume is typically less than the volume in the remainder of the room. In a Hot Aisle containment system, the room is flooded with air. The volume of hot air is typically limited to the air inside the Hot Aisle containment and the ceiling return plenum.
Hot Aisle containment also allows operators to remove raised floor from the design. Temperate air floods the room, often from the perimeter. The containment prevents mixing, so air does not have to be delivered immediately in front of the ITE. Removing raised floor reduces the initial costs and the continuing management headache.
There is one factor that could lead operators to continue to install raised floor. If one anticipates direct liquid cooling during the lifespan of the data center, a raised floor may make a very good location for the necessary piping.
Close-Coupled Cooling
There are other methods of removing heat from white spaces, including in-row and in-cabinet solutions. For example, rear-door heat exchangers accept heat from servers and remove it from a data center via a liquid.
In-row cooling devices are placed near the servers, typically as a piece of equipment placed in a row of ITE cabinets. There are also systems that are located above the server cabinets.
These close-coupled cooling systems reduce the fan energy required to move air. These types of systems do not strike me as being optimal for Sabey’s business model. I believe such a system would likely be more expensive and less flexible than Hot Aisle containment layouts for accommodating unknown future customer requirements, which is important for Sabey’s operation. Close-coupled cooling solutions can have good applications, such as increasing density in legacy data centers.
Heat Rejection
After server heat is removed from a white space, it must be rejected to a heat sink. The most common heat sink is the atmosphere. Other choices include bodies of water or the ground.
There are various methods of transferring data center heat to its ultimate heat sink. Here is a partial list:
• CRAH units with water-cooled chillers and cooling towers
• CRAH units with air-cooled chillers
• Split system CRAC units
• CRAC units with cooling towers or fluid coolers
• Pumped liquid (e.g., from in-row cooling) and cooling towers
• Airside economization
• Airside economization with direct evaporative cooling (DEC)
• Indirect evaporative cooling (IDEC)
Economizer Cooling
Most legacy systems include some form of refrigerant-based thermodynamic cycle to obtain the desired environmental conditions. Economization is cooling in which the refrigerant cycle is turned off—either part or all of the time.
Airside economizers draw in outside air, which is often mixed with return air to obtain the right conditions before it enters the data center. IDEC is a variation of this in which the outside air does not enter the data center but receives heat from the inside air via a solid heat exchanger.
Evaporative cooling (either direct or indirect) systems use evaporated water to supplement the availability of economizer cooling or more efficient refrigerant-based cooling. The state change of water absorbs energy, lowering the dry bulb temperature to a point where it approaches the wet bulb (saturated) temperature of the air (see Figure 5).
Figure 5. Direct evaporative cooling (simplified)
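The approach of dry bulb temperature toward wet bulb temperature is commonly expressed with a saturation effectiveness. The following is a minimal sketch of that textbook relationship, not a description of any particular Sabey unit; the function name, the default effectiveness of 0.85, and the sample temperatures are assumptions chosen for illustration.

```python
def dec_supply_temp(dry_bulb_f: float, wet_bulb_f: float,
                    effectiveness: float = 0.85) -> float:
    """Leaving dry bulb temperature (°F) of a direct evaporative cooler.

    effectiveness is the fraction of the wet bulb depression that the
    media closes (assumed value; real media vary).
    """
    if not 0.0 <= effectiveness <= 1.0:
        raise ValueError("effectiveness must be between 0 and 1")
    if wet_bulb_f > dry_bulb_f:
        raise ValueError("wet bulb cannot exceed dry bulb")
    return dry_bulb_f - effectiveness * (dry_bulb_f - wet_bulb_f)


# Hypothetical hot, dry afternoon: 95°F dry bulb, 65°F wet bulb.
supply = dec_supply_temp(dry_bulb_f=95.0, wet_bulb_f=65.0)
print(f"Supply air leaves the media at about {supply:.1f} °F")
```

With an effectiveness of 1.0 the supply air would reach the wet bulb temperature exactly; practical equipment falls somewhat short of that limit.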
In waterside economizer systems, the refrigerant cycle is not required when outside conditions are cold enough to achieve the desired chilled water temperature set points. The chilled water passes through a heat exchanger and rejects the heat directly to the condenser water loop.
Design Criteria
In order to design a cooling system, the design team must agree upon certain criteria.
Heat load (most often measured in kilowatts) typically gets the most attention. Most often, heat load actually includes two elements: total heat to be rejected and the density of that heat. Traditionally, data centers have measured heat density in watts per square foot. Many postulate that density should actually be measured in kilowatts per cabinet, which is very defensible in cases where one knows the number of cabinets to be deployed.
Airflow receives less attention than heat load. Many people use computational fluid dynamics (CFD) software to model airflow. These programs can be especially useful in non-contained raised floor environments.
In all systems, but especially in contained environments, it is important that the volume of air produced by the cooling system meet the ITE requirement. There is a direct relationship between heat gain through a server, power consumed by the server, and airflow through that server. Heat gain through a server is typically measured by the temperature difference between the server intake and server exhaust or delta T (∆T). Airflow is measured in volume over time, typically cubic feet per minute (CFM).
Assuming load has already been determined, a designer should know (or, more realistically, assume) a ∆T. If the designer does not assume a ∆T, the designer leaves it to the equipment manufacturer to determine the design ∆T, which could result in airflow that does not match the requirements.
I typically ask designers to assume a 20°F (11°C) ∆T. Higher density equipment, such as blades, typically has higher ∆T. However, most commodity servers are doing well to get as high as a 20°F (11°C) ∆T. (Proper containment and various set points can also make a tremendous difference.)
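The relationship among load, ∆T, and airflow can be estimated with the standard sensible-heat equation for air. The sketch below assumes standard air near sea level (the 1.08 factor) and uses a hypothetical 4.5-kW rack; the function and constant names are illustrative.

```python
# Sensible heat for standard air: Q [BTU/hr] = 1.08 * CFM * dT [°F]
BTU_PER_HR_PER_WATT = 3.412
STANDARD_AIR_FACTOR = 1.08  # assumes sea-level air density


def required_cfm(load_watts: float, delta_t_f: float) -> float:
    """Airflow (CFM) needed to carry load_watts at a given server delta T (°F)."""
    if delta_t_f <= 0:
        raise ValueError("delta T must be positive")
    return load_watts * BTU_PER_HR_PER_WATT / (STANDARD_AIR_FACTOR * delta_t_f)


rack_watts = 4500.0  # hypothetical rack load
for delta_t in (15.0, 20.0, 25.0):
    cfm = required_cfm(rack_watts, delta_t)
    print(f"dT = {delta_t:.0f} °F -> {cfm:.0f} CFM")
```

For this rack, a 20°F ∆T works out to roughly 710 CFM, while a 15°F ∆T calls for roughly 950 CFM. Airflow for a fixed load scales inversely with ∆T, which is why the assumed value matters so much.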
Because the airflow needed for a given load is inversely proportional to ∆T, the risk of designing a system in which the design ∆T is higher than the actual ∆T is that the system will not be able to deliver the necessary airflow/cooling. The risk in going the other way is that the owner will have purchased more capacity than the design goals otherwise warrant.
The Design Day equals the most extreme outside air conditions the design is intended to handle. The owner and designers have to decide how hot is hot enough, as it affects the operation of the equipment. In Seattle, in the 100 years before July 29, 2009, there was not a recorded ambient temperature above 100°F (38°C) (as measured at SeaTac airport). Also keep in mind that equipment is often located (especially on the roof) where temperatures are higher than are experienced at official weather stations.
An owner must determine what the temperature and humidity should be in the space. Typically, this is specified for a Design Day when N equipment is operating and redundant units are off-line. Depending on the system, the designers will determine air handler discharge set points based on these conditions, making assumptions and/or calculations of temperature increases between the air handler discharge and the server inlet. There can be opportunities for more efficient systems if the owner is willing to go into the ASHRAE Allowable range during extreme outside temperatures and/or during upset conditions such as utility interruptions. Sabey typically seeks to stay within the ASHRAE Recommended range in its business model.
The owner and designer should understand the reliability goals of the data center and design mechanical, electrical, and controls to support these reliability goals. Of course, when considering these items, the design team may be subject to overbuilding. If the design team assumes an extreme Design Day, adds in redundant equipment, specifies the low end of the ASHRAE Recommended range, and then maybe adds a little percentage on top, just in case, the resulting system can be highly reliable, if designed and operated appropriately. It can also be too expensive to build and inefficient to operate.
It is worth understanding that data centers do not typically operate at design load. In fact, during much of a data center’s lifespan, it may operate in a lightly loaded state. Operators and designers should spend some time making the data center efficient in those conditions, not just as it approaches design load. Sabey has made design choices that allow us to not only cool efficiently, but also to cool efficiently at light loads. Figure 6 shows that we reached average PUE conditions of 1.20 at only 10% loading at one of our operating data centers.
Figure 6. PUE and design load (%) over time.
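To see why light-load efficiency takes deliberate effort, consider a simple model in which facility overhead has a fixed part and a part that scales with IT load. PUE is total facility power divided by IT power. The numbers below are invented for illustration and are not Sabey measurements; the model itself is a deliberate simplification.

```python
def pue(it_kw: float, fixed_overhead_kw: float, proportional_overhead: float) -> float:
    """PUE = total facility power / IT power.

    fixed_overhead_kw: overhead that does not fall with load (hypothetical).
    proportional_overhead: overhead per kW of IT load (hypothetical).
    """
    if it_kw <= 0:
        raise ValueError("IT load must be positive")
    overhead_kw = fixed_overhead_kw + proportional_overhead * it_kw
    return (it_kw + overhead_kw) / it_kw


DESIGN_IT_KW = 1000.0  # hypothetical design IT load
for fraction in (0.10, 0.50, 1.00):
    it_kw = DESIGN_IT_KW * fraction
    value = pue(it_kw, fixed_overhead_kw=20.0, proportional_overhead=0.12)
    print(f"{fraction:.0%} of design load -> PUE {value:.2f}")
```

In this toy model the fixed overhead dominates at 10% load, so PUE rises as load falls. Variable speed fans, pumps, and chillers help by turning more of the overhead into the proportional kind.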
Crystal Ball
While very high density ITE is still being built and deployed, the density of most ITE has not kept up with the increases projected 10 years ago. Sabey was designing data centers at an average 150 watts/ft2 6 years ago, and the company has not yet seen a reason to increase that. Of course, Sabey can accommodate significantly higher localized densities where needed.
In the near future, I expect air-based cooling systems with containment to continue to be the system of choice for cooling data centers. In the long term, I would not be surprised to see increasing adoption of liquid-cooling technologies.
Conclusion
Sabey Data Centers develops and operates data centers. It has customers in many different verticals and of many different sizes. As a service provider, Sabey does not typically know the technology or layout its customers will require. Sabey’s data centers use different cooling technologies, suitable to the location. Sabey has data centers in the mild climate of Seattle, the semi-arid climate of central Washington, and in downtown New York City. Sabey’s data centers are housed in single-story greenfield buildings and in a redeveloped high-rise.
Despite these variations and uncertainties, all the data centers Sabey designs and operates have certain common elements. They all use Hot Aisle containment without raised floor. All have a ceiling return plenum for server exhaust air and flood the room for the server inlet air. These data centers all employ some form of economizer. Sabey seeks to operate efficiently in lightly loaded conditions, with variable speed motors for fans, pumps, and chillers, where applicable.
Sabey has used a variety of different mechanical systems with Hot Aisle containment, and I tend to prefer IDEC air handlers, where practical. Sabey has found that this is a very efficient system with lower water use than the name implies. Much of the time, the system is operating in dry heat exchanger mode. The system tends to facilitate very simple control sequencing, and that simplicity enhances reliability. The systems restart rapidly, which is good in utility interruptions. The fans keep spinning and ramp up as soon as the generators start providing power. Water remains in the sump, so the evaporative cooling process requires essentially no restart time. Sabey has successfully cooled racks drawing 35 to 40 kW with no problem.
Until there is broad adoption of liquid-cooled servers, the primary opportunities appear to be in optimizing air-cooled, contained data centers.
John Sasser
John Sasser brings more than 20 years of management experience to the operations of Sabey Data Centers’ portfolio of campuses. In addition to all day-to-day operations, start-ups and transitions, he is responsible for developing the conceptual bases of design and operations for all Sabey data centers, managing client relationships, overseeing construction projects, and overall master planning.
Mr. Sasser and his team have received recognition from a variety of organizations, including continuous uptime awards from the Uptime Institute and energy conservation awards from Seattle City Light and the Association of Energy Engineers.
Prior to joining Sabey, he worked for Capital One and Walt Disney Company. Mr. Sasser also spent 7 years with the Navy Civil Engineer Corps.
A Look at Data Center Cooling Technologies (Kevin Heslin, 2015-07-30)
Driving operational excellence across multiple data centers is exponentially more difficult than managing just one. Technical complexity multiplies as you move to different sites, regions, and countries where codes, cultures, climates, and other factors are different. Organizational complexity further complicates matters when the data centers in your portfolio have different business requirements.
With little difficulty, an organization can focus on staffing, maintenance planning and execution, training and operations for a single site. Managing a portfolio turns the focus from projects to programs and from activity to outcomes. Processes become increasingly complex and critical. In this series of interviews, you will hear from practitioners about the challenges and lessons they have drawn from their experiences. You will find that those who thrive in this role share the understanding that Operational Excellence is not an end state, but a state of mind.
This interview is part of a series of conversations with executives who are managing diverse data center portfolios. The interviewees in this series participated in a panel at Uptime Institute Symposium 2015, discussing their use of the Uptime Institute Management & Operations (M&O) Stamp of Approval to drive standardization across data center operations.
Herb Alvarez: Director of Global Engineering and Critical Facilities, American International Group
An experienced staff was empowered to improve infrastructure, staffing, processes, and programs
What’s the greatest challenge managing your current footprint?
Providing global support and oversight via a thin staffing model can be difficult, but due to the organizational structure and the relationship with our global FM alliance partner (CBRE) we have been able to improve service delivery, manage cost, and enhance reliability. From my perspective, the greatest challenges have been managing the cultural differences of the various regions, followed by the limited availability of qualified staffing in some of the regions. With our global FM partner, we can provide qualified coverage for approximately 90% of our portfolio; the remaining 10% is where we see some of these challenges.
Do you have reliability or energy benchmarks?
We continue to make energy efficiency and sustainability a core requirement of our data center management practice. Over the last few years we retrofitted two existing data center pods at our two global data centers and we replaced EOL (end of life) equipment with best-in-class, higher efficiency systems. The UPS systems that we installed achieve a 98% efficiency rating while operating in ESS mode and a 94 to 96% rating while operating in VMMS mode. In addition, the new cooling systems were installed with variable flow controls and VFDs for the chillers, pumps, and CRAHs, as well as full cold aisle containment and multiple control algorithms to enhance operating efficiency. Our target operating model for the new data center pods was to achieve a Tier III level of reliability along with a 1.75 PUE, and we achieved both of these objectives. The next step on our energy and sustainability path is to seek Energy Star and other industry recognitions.
Can you tell me about your governance model and how that works?
My group in North America is responsible for the strategic direction and the overall management for the critical environments around the world. We set the standards (design, construction, operations, etc.), guidelines, and processes. Our regional engineering managers, in turn, carry these out at the regional level. At the country level, we have the tactical management (FM) that ultimately implements the strategy. We subscribe to a system of checks and balances, and we have incorporated global and regional auditing to ensure that we have consistency throughout the execution phase. We also incorporate KPIs to promote the high level of service delivery that we expect.
From your perspective, what is the greatest difficulty in making that model work, ensuring that the design ideas are appropriate for each facility, and that they are executed according to your standards?
The greatest difficulties encountered were attributed to the cultural differences between regions. Initially, we encountered some resistance at the international level in regards to broad acceptance of design standards and operating standards. However, with the support of executive senior leadership and the on-going consolidation effort, we achieved global acceptance through a persistent and focused effort. We now have the visibility and oversight to ensure that our standards and guidelines are being enforced across the regions. It is important to mention that our standards, although rigid, do have flexible components embedded in them due to the fact that a “one size fits all” regimen is not always feasible. For these instances, we incorporated an exception process that grants the required flexibility to deviate from a documented standard. In terms of execution, we now have the ability via “in-country” resources to validate designs and their execution.
It also requires changing the culture, even within our own corporate group. For example, we have a Transactions group that starts the search for facilities. Our group said that we should only be in this certain type of building, this quality of building, so we created some standards and minimum requirements. We said, “We are AIG. We are an insurance company. We can’t go into a shop house.” This was a cultural change, because Transactions always looked for the lowest cost option first.
The AIG name is at stake. Anything we do that is deficient has the potential to blemish the brand.
Herb, it sounds like you are describing a pretty successful program. And yet, I am wondering if there are things that you would do differently if you were starting from scratch.
If it were a clean slate, and a completely new start, I would look to use an M&O type of assessment at the onset of any new initiatives as it relates to data center space acquisition. Utilizing M&O as a widely accepted and recognized tool would help us achieve consistency across data centers and would validate colo provider capabilities as it relates to their operational practices.
How do M&O stamps help the organization, and which parts of your operations do they influence the most?
I see two clear benefits. From the management and operations perspective, the M&O Stamp offers us a proven methodology of assessing our M&O practice, not only validating our program but also offering a level of benchmarking against other participants of the assessments. The other key benefit is that the M&O stamp helps us promote our capabilities within the AIG organization. Often, we believe that we are operationally on par with the industry, but a third-party validation from a globally accepted and recognized organization helps further validate our beliefs and our posture as it relates to the quality of the service delivery that we provide. We look at the M&O stamp as an on-going certification process that ensures that we continually uphold the underlying principles of management and operations excellence, a badge of honor if you will.
AIG has been awarded two M&O Stamps of Approval in the U.S. I know you had similar scores on the two facilities. Were the recommendations similar?
I expected more commonality between both of the facilities. When you have a global partner, you expect consistency across sites. In these cases, there were about five recommendations for each site; two of them were common to both sites. The others were not. It highlighted the need for us to re-assess the operation in several areas, and remediate where necessary.
Of course you have way more than two facilities. Were you able to look at those reports and those recommendations and apply them universally?
Oh, absolutely. If there was a recommendation specific to one site, we did not look at it just for that site. We looked to leverage that across the portfolio. It only makes sense, as it applies to our core operating principles of standardizing across the portfolio.
Is setting KPIs for operations performance part of your FM vendor management strategy?
KPIs are very important to the way we operate. They allow us to set clear and measurable performance indicators that we utilize to gauge our performance. The KPIs drive our requirement for continuous improvement and development. We incentivize our alliance partner and its employees based on KPI performance, which helps drive operational excellence.
Who do you share the information with and who holds you accountable for improvements in your KPIs?
That’s an interesting question. This information is shared with our senior management as it forms our year-over-year objectives and is used as a basis for our own performance reviews and incentive packages. We review our KPIs on an on-going basis to ensure that we are trending positively; we re-assess the KPIs on an annual basis to ensure that they remain relevant to the desired corporate objectives. During the last several years one of our primary KPIs has been to drive cost reductions to the tune of 5% reductions across the portfolio.
Does implementing those reductions become part of staff appraisals?
For my direct reports, the answer is yes. It becomes part of their annual objectives; they have to be measurable, and we have to agree that they are achievable. We track progress on a regular basis and communicate progress via our quarterly employee reviews. Again, we are very careful that any such reductions do not adversely impact our operations or distract us from achieving our uptime requirements.
Do you feel that AIG has mastered demand management so you can effectively plan, deploy, and manage capacity at the speed of the client?
I think that we have made significant improvements over the last few years in terms of capacity planning, but I do believe that this is an area where we can still continue to improve. Our capacity planning team does a very good job of tracking, trending, and projecting workloads. But there is ample opportunity for us to become more granular on the projections side of the reporting, so that we have a very clear and transparent view of what is planned, its anticipated arrival, and its anticipated deployment time line. We recognize that we all play a role, and the expectation is that we will all work collaboratively to implement these types of enhancements to our demand/capacity management practice.
So you are viewing all of this as a competitive advantage.
You have to. That’s a clear objective for all of senior management. We have to have a competitive edge in the marketplace, whether that’s on the technology side, product side, or how we deliver services to our clients. We need to be best in class. We need to champion the cause and drive this message throughout the organization.
Staffing is a huge part of maintaining data center operational excellence. We hear from our Network members that finding and keeping talent is a challenge. Is this something you are seeing as well?
I definitely do think there is a shortage of data center talent. We have experienced this first hand. I do believe that the industry needs to have a focused data center education program to train data center personnel. I am not referring to the theoretical or on-line programs, which already exist, but hands-on training that is specific to data center infrastructure. Typical trade school programs focus on general systems and equipment but do not have a track that is specific to data centers, one that also includes operational practices in critical environments. I think there has got to be something in the industry that’s specialized and hands-on. Training that covers the complex systems found in data centers, such as UPS systems, switchgear, EPMS, BMS, fire suppression, etc.
How do you retain your own good talent?
Keep them happy, keep them trained, and above all keep it interesting. You have to have a succession track, a practice that allows growth from within but also accounts for employee turnover. The succession track has to ensure that we have operational continuity when a team member moves on to pursue other opportunities.
The data center environment is a very demanding environment, and so you have to keep staff members focused and engaged. We focus on building a team, and as part of team development we ensure team members are properly trained and developed to the point where we can help them achieve their personal goals, which often include upward mobility. Our development track is based on the CBRE Foundations training program. In addition to the training program, AIG and CBRE provide multiple avenues for staff members to pursue growth opportunities.
When the staff is stable, what kinds of things can you do to keep them happy when you can’t promote them?
Oftentimes, it is the small things you do that resonate the most. I am a firm believer that above-average performance needs to be rewarded. We are pro-active and at times very creative in how we acknowledge those that are considered top performers. The Brill Award, which we achieved as a team, is just one example. We acknowledged the team members with a very focused and sincere thank you communication, acknowledging not only their participation but also the fact that it could not have been achieved without them. From a senior management perspective, we can’t lose sight of the fact that in order to cultivate a team environment you have to be part of the team. We advocate for a culture of inclusion, development, and opportunity.
Herb Alvarez
Herb Alvarez is director of Global Engineering & Critical Facilities, American International Group, Inc. Mr. Alvarez is responsible for engineering and critical facilities management for the AIG portfolio, which comprises 970 facilities spread across 130 countries. Mr. Alvarez has overarching responsibility for the global data center facilities and their building operations. He works closely and in collaboration with AIG’s Global Services group, which is the company’s IT division.
AIG operates three purpose-built data centers in the U.S., including a 235,000-square-foot (ft2) facility in New Jersey and a 205,000-ft2 facility in Texas, and eight regional colo data centers in Asia Pacific, EMEA, and Japan.
Mr. Alvarez helped implement a consolidation and standardization effort, Global Infrastructure Utility (GIU), that AIG’s CEO Robert Benmosche implemented in 2010. This initiative was completed in 2013.
Kevin Heslin
Kevin Heslin is chief editor and director of ancillary projects at the Uptime Institute. He served as an editor at New York Construction News, Sutton Publishing, the IESNA, and BNP Media, where he founded Mission Critical, the leading commercial publication dedicated to data center and backup power professionals. In addition, Heslin served as communications manager at the Lighting Research Center of Rensselaer Polytechnic Institute. He earned a B.A. in Journalism from Fordham University in 1981 and a B.S. in Technical Communications from Rensselaer Polytechnic Institute in 2000.
AIG Tells How It Raised Its Level of Operations Excellence (Kevin Heslin, 2015-07-20; updated 2015-07-22)
This interview is part of a series of conversations with executives who are managing diverse data center portfolios. The interviewees in this series participated in a panel at Uptime Institute Symposium 2015, discussing their use of the Uptime Institute Management & Operations (M&O) Stamp of Approval to drive standardization across data center operations.
John Sheputis: President, Infomart Data Centers
Don Jenkins: VP Operations, Infomart Data Centers
Give our readers a sense of your current data center footprint.
Sheputis: The portfolio includes about 2.2 million square feet (ft2) of real estate, mostly data center space. The facilities in both of our West Coast locations are data center exclusive. The Dallas facility is enormous, at 1.6 million ft2, and is a combination of mission critical and non-mission critical space. Our newest site in Ashburn, VA, is 180,000 ft2 and undergoing re-development now, with commissioning on the new critical load capacity expected to complete early next year.
The Dallas site has been operational since the 1980s. We assumed the responsibility for the data center pods in that building in Q4 2014 and brought on staff from that site to our team.
What is the greatest challenge of managing your current footprint?
Jenkins: There are several challenges, but communicating standards across the portfolio is a big one. Also, different municipalities have varying local codes and governmental regulations. We need to adapt our standards to the different regions.
For example, air quality control standards vary at different sites. We have to meet very high air quality standards in California, which means we adhere to very strict requirements for engine-generator runtimes and exhaust filter media. But in other locations, the regulations are less strict, and that variance impacts our maintenance schedules and parts procurement.
Sheputis: It may sound trivial to go from an area where air quality standards are high to one that is less stringent, but it still represents a change in our standards. If you’re going to do development, it’s probably best to start in California or somewhere with more restrictive standards and then go somewhere else. It would be very difficult to go the other way.
More generally, the Infomart merger was a big bite. It includes a lot of responsibility for non-data center space. So now we have two operating standards. We have over 500,000 ft2 of office-use real estate that uses the traditional break-fix operation model. We also have over two dozen data center suites with another 500,000 ft2 of mission critical space as well, where nothing breaks, or if it does, there can be no interruption of service. These different types of property have two different operations objectives and require different skill sets. Putting those varying levels of operations under one team expands the number of challenges you absorb. It pushes us from managing a few sites to a “many sites” level of complexity.
How do you benchmark performance goals?
Sheputis: I’m going to restrict my response to our mission critical space. When we start or assume control of a project, we have some pretty unforgiving standards. We want concurrent maintenance, industry-leading PUE, on time, on budget, and no injuries—and we want our project to meet critical load capacity and quality standards.
But picking up somebody else’s capital project after they’ve already completed their design and begun the work, yet before they finished? That is the hardest thing in the world. The Dallas Infomart site is so big, there are two or three construction projects going on at any time. Show up any weekend, and you’ll see somebody is doing a crane pick or has a helicopter delivering some equipment to be installed on the roof. It’s that big. It’s a damn good thing that we have great staff on site in Dallas and someone like Don Jenkins to make sure everything goes smoothly.
We hear a lot about data center operations staffing shortages. What has been your experience at Infomart?
Jenkins: Good help is hard to find anywhere. Data center skills are very specific. It’s a lot harder to find good data center people. One of the things we try to do is hire veterans. Over half our operating engineers have military backgrounds, including myself. We do this not just out of patriotism or to meet security concerns, but because we understand and appreciate the similarity of a mission critical operation and a military operation (see http://journal.uptimeinstitute.com/resolving-data-center-staffing-shortage/).
Sheputis: If you have high standards, there is always a shortage of people for any job. But the corollary for that is that if you’re known for doing your job very well, the best people often find you. Don deserves credit for building low turnover teams. Creating a culture of continuity requires more than strong technical skillsets; you have to begin by recruiting the kinds of people who can play on a team.
Don uses this phrase a lot to describe the type he’s looking for—people who are capable of both leading and being led. He wants candidates with low egos who care about outcomes, strong ethics, and who want to learn. We invest heavily in our training program, and we are rigorous in finding people who buy into our process. We don’t want people who want to be heroes. The ideal candidate is a responsible team player with an aptitude for learning, and we fill in the technical gaps as necessary over time. No one has all the skills they need day one. Our training is industry leading. To date, we have had no voluntary turnover.
Jenkins: We do about 250 man-hours of training for each staff member. It’s not cheap, but we feel it’s necessary and the guys love it. They want to learn. They ask for it. Greater skill attainment is a win-win for them, our tenants, and us.
Sheputis: When you build a data center, you often meet the technically strongest people at either the beginning of the project during design or the end of the project during the commissioning phase. Every project we do is Level 5 Commissioned. That’s when you find and address all of the odd or unusual use cases that the manufacturer may not have anticipated. More than once, we have had a UPS troubleshooting specialist say to Don, “You guys do it right. Let me know when you have an opening in your organization.”
Jenkins: I think it’s a testament that shows how passionate we are about what we do.
Are you standardizing management practices across multiple sites?
Sheputis: When we had one or two sites, it wasn’t a challenge because we were copying from California to Oregon. But with three or more sites it becomes much more difficult. With the inclusion of Dallas and Ashburn, we have had to raise our game. It is tempting to say we do the same thing everywhere, but that would be unrealistic at best.
Broadly speaking, we have two families of standards: Content and Process. For functional content we have specs for staffing, maintenance, security, monitoring, and the like. We apply these with the knowledge that there will be local exceptions—such as different codes and different equipment choices. An operator from one site has to appreciate the deviations at the other sites. We also have process-based standards, and these are more meticulously applied across sites. While the OEM equipment may be different, shouldn’t the process for change management be consistent? Same goes for the problem management process. Compliance is another area where consistency is expected.
The challenge with projecting any standard is to efficiently create evidence of acceptance and verification. We try to create a working feedback loop, and we are always looking for ways to do it better. We can centrally document standard policies and procedures, but we rely on field acceptance of the standard, and we leverage our systems to measure execution versus expectation. We can say please complete work orders on time and to the following spec, and we can delegate scheduling to the field, but the loop isn’t complete until we confirm execution and offer feedback on whether the work and documentation were acceptable.
What technology or methodology has helped your organization to significantly improve data center management?
Jenkins: Our standard building management system (BMS) is a Niagara™ product with an open framework. This allows our legacy equipment to talk over open protocols. All of our dashboards and data look and feel the same across all of the sites, so anybody could pull up another site and it would look the same to the operator.
Sheputis: Whatever system you’re using, there has to be a high premium on keeping it open. If you run on a closed system, it eventually becomes a lost island. This is especially true as you scale your operation. You have to have open systems.
How does your organization use the M&O Stamp?
Sheputis: The M&O stamp is one of the most important things we have ever achieved. And I’m not saying this to flatter you or the Uptime Institute. We believe data center operations are very important, and we have always believed we were pretty good. But I have to believe that many operators think they do a good job as well. So who is right? How does anyone really know? The challenge to the casual observer is that the data center industry is fairly closed. Operations are secure and private.
We started the process to see how good we were, and if we were good, we also thought it would be great to have a credible third party to acknowledge that. Saying I think I’m good is one thing, having a credentialed organization like Uptime Institute say so is much more.
But the M&O process is more than the Stamp of Approval. Our operations have matured and improved by participating in this process. Every year we reassess and recertify we feel like we learn new things, and we’re tracking our progress. The bigger benefit may be that the process forces us to think procedurally. When we’re setting up a new site, it helps us set a roadmap for what we want to achieve. Compared to all other forms of certification, we get something out of this beyond the credential; we get a path to improve.
Jenkins: Lots of people run a SWOT (strengths, weaknesses, opportunities, and threats) analysis or internal audit, but that feedback often lacks external reference points. You can give yourself an audit, and you can say “we’re great.” But what are you learning? How do you expand your knowledge? The M&O Stamp of Approval provides learning opportunities for us by providing a neutral experienced outsider viewpoint on where, and more importantly, how we can do better.
On one of the assessments, one of Uptime Institute’s consultants demonstrated how we could set up our chiller plant so that an operator could see all the key variables easily at a glance, with fewer steps to see what valves are open or closed. The advice was practical and easy to implement: markers on a chain, little flags on a chiller, LED lights on a pump. Very simple things to do, but we hadn’t thought of them. They’d seen it in Europe, it was easy to do, and it helps. That’s one specific example, but we used the knowledge of the M&O team to help us grow.
We think the M&O criteria and content will get better and deeper as time goes on. This is a solid standard for people to grow on.
Sheputis: We are for certifications, as they remove doubt, but most of the work and value is had in obtaining the first certification. I can see why others are cynical about value and cost to recertify. But I do think there’s real value in the ongoing M&O certification, mainly because it shows continuous improvement. No other certification process does that.
Jenkins: A lot of certifications are binary in that you pass if you have enough checked boxes—the content is specific, but operationally shallow. We feel that we get a lot more content out of the M&O process.
Sheputis: As I said before, we are for compliance and transparency. As we are often fulfilling a compliance requirement for someone else, there is clear value in saying we are PCI compliant or SSAE certified. But the M&O Stamp of Approval process is more like seeing a professional instructor. All other certifications should address the M&O stamp as “Sir.”
Matt Stansberry
Matt Stansberry is director of Content and Publications for the Uptime Institute and also serves as program director for the Uptime Institute Symposium, an annual spring event that brings together 1,500 stakeholders in enterprise IT, data center facilities, and corporate real estate to deal with the critical issues surrounding enterprise computing. He was formerly editorial director for Tech Target’s Data Center and Virtualization media group, and was managing editor of Today’s Facility Manager magazine. He has reported on the convergence of IT and Facilities for more than a decade.
Meeting the M&O Challenge of Managing a Diverse Data Center Footprint: John Sheputis and Don Jenkins, Infomart (Kevin Heslin, 2015-06-18)
Better information leads to better decisions
By Jose Ruiz
New tools have dramatically enhanced the ability of data center operators to base decisions regarding capacity planning and operational performance, such as moves, adds, and changes, on actual data. Using modeling technologies to calibrate the data center during commissioning, and then using those benchmarks to model prospective configuration scenarios, enables end users to optimize the efficiency of their facilities before a single rack is moved or added.
Data center construction is expected to continue growing in coming years to house the compute and storage capacity needed to support the geometric increases in data volume that will characterize our technological environment for the foreseeable future. As a result, data center operators will find themselves under ever-increasing pressure to fulfill dynamic requirements in the most optimized environment possible. Every kilowatt (kW) of cooling capacity will become increasingly precious, and operators will need to understand the best way to deliver it proactively.
As Uptime Institute’s Lee Kirby explains in Start With the End in Mind, a data center’s ongoing operations should be the driving force behind its design, construction, and commissioning processes.
This paper examines performance calibration and its impact on ongoing operations. To maximize data center resources, Compass performs a variety of analyses using Future Facilities’ 6SigmaDC and Romonet’s Software Suite. In the sections that follow, I will discuss predictive modeling during data center design, the commissioning process, and, finally, how the calibration process validates the predictive models. Armed with the calibrated model, a customer can study the impact of proposed modifications on data center performance before any IT equipment is physically installed in the data center. This practice helps data center operators account for three key elements during facility operations: availability, capacity, and efficiency. Compass calls this continuous modeling.
Figure 1. CFD software creates a virtual facility model and studies the physics of the cooling and power elements of the data center
What is a Predictive Model?
A predictive model, in a general sense, combines the physical attributes and operating data of a system and uses them to calculate a future outcome. The 6Sigma model provides a complete 3D representation of a data center at any given point in its life cycle. Combining the physical elements of IT equipment, racks, cables, air handling units (AHUs), power distribution units (PDUs), etc., with computational fluid dynamics (CFD) and power modeling enables designers and operators to predict the impact of their configuration on future data center performance. Compass uses commercially available performance modeling and CFD tools to model data center performance in the following ways:
• CFD software creates a virtual facility model and studies the physics of the cooling and power elements of the data center (see Figure 1).
• The modeling tool interrogates the individual components that make up the data center and compares their actual performance with the initial modeling prediction.
This proactive modeling process allows operators to fine-tune performance and identify potential operational issues at the component level. A service provider, for example, could use this process to maximize the sellable capacity of the facility and/or its ability to meet the service level agreement (SLA) requirements of new as well as existing customers.
Case Study Essentials
For the purpose of this case study all of the calibrations and modeling are based upon Compass Data Center’s Shakopee, MN, facility with the following specifications (see Figure 2):
• 13,000 square feet (ft2) of raised floor space
• No columns on the data center floor
• 12-foot (ft) false ceiling used as a return air plenum
• 36-inch (in.) raised floor
• 1.2 megawatt (MW) of critical IT load
• four rooftop air handlers in an N+1 configuration
• 336 perforated tiles (25% open) with dampers installed
• Customer type: service provider
Figure 2. Data center room with rooftop AHUs
Cooling Baseline
The cooling system of this data center comprises four 120-ton rooftop air handling units in an N+1 configuration (see Figure 3). The system provides a net cooling capacity that a) supports the data center’s 1.2-MW power requirement and b) delivers 156,000 cubic feet per minute (CFM) of airflow to the white space. The cooling units are controlled based on the total IT load present in the space. This method turns on AHUs as the load increases. Table 1 describes the scheme.
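As a rough plausibility check on the figures above, the following sketch converts the nominal tonnage of the N units (three of the four AHUs) to kilowatts and derives the design airflow per kilowatt of IT load. It uses only the numbers quoted in this section plus the standard conversion of 1 ton of refrigeration to about 3.517 kW. Nominal tonnage is not the same as net sensible capacity, so this is an illustration, not the design calculation.

```python
# Plausibility check on the cooling baseline (illustrative only).
# Assumption: nominal tonnage is used as a stand-in for net cooling capacity.

KW_PER_TON = 3.517          # 1 ton of refrigeration = 12,000 BTU/h, about 3.517 kW
AHU_COUNT = 4               # rooftop air handlers installed
REDUNDANT_UNITS = 1         # N+1 configuration
TONS_PER_AHU = 120
CRITICAL_LOAD_KW = 1200     # 1.2 MW of critical IT load
DESIGN_AIRFLOW_CFM = 156_000

# Capacity available with the redundant unit out of service (the "N" units).
n_capacity_kw = (AHU_COUNT - REDUNDANT_UNITS) * TONS_PER_AHU * KW_PER_TON

# Design airflow per kilowatt of IT load.
cfm_per_kw = DESIGN_AIRFLOW_CFM / CRITICAL_LOAD_KW

print(f"Nominal capacity of N units: {n_capacity_kw:.0f} kW")
print(f"Covers critical load: {n_capacity_kw >= CRITICAL_LOAD_KW}")
print(f"Design airflow: {cfm_per_kw:.0f} CFM per kW")
```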
Table 2. Tests performed during calibration
These units have outside air economizers to leverage free cooling and increase efficiency. For the purpose of the calibration, the system was set to full recirculation mode with the outside air economization feature turned off. This allows the cooling system to operate at 100% mechanical cooling, which is representative of a standard operating day under the Design Day conditions.
Figure 3. Rooftop AHUs
Figure 4. Cabinet and perforated tile layout. Note: Upon turnover, the customer is responsible for racking and stacking the IT equipment.
Cabinet Layout
The default cabinet layout is based on a standard Cold Aisle/Hot Aisle configuration (see Figure 4).
Airflow Delivery and Extraction
Because the cooling units are effectively outside the building, a long opening on one side of the room serves as a supply air plenum. The air travels down the 36-in.-wide plenum to a patent-pending air dam before entering the raised floor. The placement of the air dam ensures even pressurization of the raised floor during both normal and maintenance failure modes. Once past the air dam, the air enters a 36-in. raised floor and is released into the space above the floor through 336 perforated tiles (25% open) (see Figure 5).
Figure 5. Airflow
Hot air from the servers then passes through ventilation grilles placed in the 12-ft false ceiling.
Commissioning and Calibration
Commissioning is a critical step in the calibration process because it eliminates extraneous variables that may affect subsequent reporting values. Upon the completion of the Integrated Systems Testing (IST), the calibration process begins. This calibration exercise is designed to enable the data center operator to compare actual data center performance against the modeled values.
Figure 6. Inconsistencies between model values and actual performance can be explored and examined prior to placing the facility into actual operation. These results provide a unique insight into whether the facility will operate as per the design intent in the local climate.
The actual process consists of conducting partial load tests in 25% increments and monitoring actual readings from specific building management system points, sensors, and devices that account for all the data center’s individual components.
Figure 7. Load bank and PDUs during the test
As a result of this testing, inconsistencies between model values and actual performance can be explored and examined prior to placing the facility into actual operation. These results provide a unique insight into whether the facility will operate as per the design intent in the local climate or whether there are issues that will affect future operation that must be addressed. Figure 6 shows the process. Figure 7 shows load banks and PDUs as arranged for testing.
Table 2. Tests performed during calibration
All testing at Shakopee was performed by a third-party entity to eliminate the potential for any reporting bias in the testing. The end result of this calibration exercise is that the operator now has a clear understanding of the benchmark performance standards unique to their data center. This provides specific points of reference for all future analysis and modeling to determine the prospective performance impact of site moves, adds, or changes. Table 2 lists the tests performed during the calibration.
Table 3. Perforated tile configuration during testing
During the calibration, dampers on an appropriate number of tiles were closed proportionally to coincide with the load step. Table 3 shows the perforated tile damper configuration used during the test.
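The load-step schedule can be sketched as follows. This assumes that "proportionally" means the number of open tiles scales linearly with the load step. That is an assumption made for illustration; the damper counts actually used are the ones given in Table 3 and may differ.

```python
# Sketch of the 25% partial-load steps used during calibration.
# Assumption: open tiles scale linearly with load (actual counts are in Table 3).

CRITICAL_LOAD_KW = 1200   # 1.2 MW of critical IT load
TOTAL_TILES = 336         # perforated tiles with dampers


def load_steps(increment_pct=25):
    """Return (percent, load_kw, open_tiles) for each partial-load step."""
    steps = []
    for pct in range(increment_pct, 101, increment_pct):
        load_kw = CRITICAL_LOAD_KW * pct / 100
        open_tiles = round(TOTAL_TILES * pct / 100)
        steps.append((pct, load_kw, open_tiles))
    return steps


for pct, load_kw, tiles in load_steps():
    print(f"{pct:3d}% load: {load_kw:6.0f} kW, {tiles} tiles open")
```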
Table 4. CPM goals, test results, and potential adjustments
Analysis & Results
To properly interpret the results of the initial calibration testing, it’s important to understand the concept of cooling path management (CPM), which is the process of stepping through the full route taken by the cooling air and systematically minimizing or eliminating potential breakdowns. The ultimate goal of this exercise is meeting the air intake requirement for each unit of IT equipment. The objectives and associated changes are shown in Table 4.
Cooling paths are influenced by a number of variables, including the room configuration, IT equipment and its arrangement, and any changes that will fundamentally change the cooling paths. In order to proactively avoid cooling problems or inefficiencies that may creep in over time, CPM is, therefore, essential to the initial design of the room and to configuration management of the data center throughout its life span.
AHU Fans to Perforated Tiles (Cooling Path #1). CPM begins by tracing the airflow from the source (AHU fans) to the returns (AHU returns). The initial step consists of investigating the underfloor pressure. Figure 8 shows the pressure distribution in the raised floor. In this example, the underfloor pressure is uniform from the very onset, thereby ensuring an even flow rate distribution.
From a calibration perspective, Figure 9 demonstrates that the results obtained from the simulation are aligned with the data collected during commissioning/calibration testing. The average underfloor pressure captured by software during the commissioning process was 0.05 in. of H2O, compared with the 0.047 in. of H2O predicted by 6SigmaDC.
The airflow variation across the 336 perforated tiles was determined to be 51 CFM. These data guaranteed an average target cooling capacity of 4 kW/cabinet compared to the installed 3.57 kW/cabinet (assuming that the data center operator uses the same type of perforated tiles as those initially installed). In this instance, the calibration efforts provided the benchmark for ongoing operations, and verified that the customer target requirements could be fulfilled prior to their taking ownership of the facility.
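The installed figure of 3.57 kW/cabinet is consistent with spreading the 1.2-MW critical load evenly over 336 cabinet positions. The sketch below reproduces that arithmetic under the assumption of one cabinet per perforated tile, which is inferred from the numbers and not stated explicitly here. The margin calculation is likewise an illustration.

```python
# Reproduce the installed kW/cabinet figure (illustrative arithmetic).
# Assumption: one cabinet position per perforated tile (336 positions).

CRITICAL_LOAD_KW = 1200
CABINET_POSITIONS = 336
TARGET_KW_PER_CABINET = 4.0   # average target cooling capacity quoted above

installed_kw_per_cabinet = CRITICAL_LOAD_KW / CABINET_POSITIONS

# Margin of the verified target over the installed average, in percent.
margin_pct = (TARGET_KW_PER_CABINET - installed_kw_per_cabinet) \
    / installed_kw_per_cabinet * 100

print(f"Installed: {installed_kw_per_cabinet:.2f} kW/cabinet")
print(f"Target margin: {margin_pct:.1f}%")
```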
The important takeaway in this example is the ability of calibration testing to not only validate that the facility is capable of supporting its initial requirements but also to offer the end user a cost-saving mechanism to determine the impact of proposed modifications on the site’s performance, prior to their implementation. In short, hard experience no longer needs to be the primary mode of determining the performance impact of prospective moves, adds, and changes.
Table 5. Airflow simulations and measured results
During the commissioning process, all 336 perforated tiles were measured.
Table 5 is a results comparison of the measured and simulated flow from the perforated tiles.
Table 6. Airflow distribution at the perforated tiles
The results show a 1% error between measured and simulated values. Let’s take a look at the flow distribution at the perforated tiles (see Table 6).
The flows appear to match up quite well. It is worth noting that the locations of the minimum and maximum flows are different between measured and simulated values. However, this is not of concern as the flows are within an acceptable margin of error. Any large discrepancy (> 10%) between simulated and measured would warrant further investigation (see Table 7). The next step in the calibration process examined the AHU supply temperatures.
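The 10% screening rule mentioned above can be sketched as a small helper. The function names are hypothetical, and using the measured value as the reference for the percentage is an assumption; a different convention would shift the numbers slightly. The example applies it to the underfloor pressure values quoted earlier (0.05 in. measured, 0.047 in. simulated).

```python
# Screening rule: flag any measured/simulated pair that differs by more
# than a threshold (10% in the text). Helper names are hypothetical.


def percent_difference(measured, simulated):
    """Absolute difference as a percentage of the measured value."""
    return abs(simulated - measured) / abs(measured) * 100


def needs_investigation(measured, simulated, threshold_pct=10.0):
    """True when the discrepancy exceeds the screening threshold."""
    return percent_difference(measured, simulated) > threshold_pct


# Average underfloor pressure, inches of water column.
pressure_diff = percent_difference(measured=0.05, simulated=0.047)
print(f"Underfloor pressure difference: {pressure_diff:.1f}%")
print(f"Investigate further: {needs_investigation(0.05, 0.047)}")
```

A check like this can be run over every calibrated point (tile flows, supply and return temperatures) so that only the outliers need a closer look.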
Perforated Tiles to Cabinets (Cooling Path #2). Perforated tile to cabinet airflow (see Figure 10) is another key point of reference that should be included in calibration testing and determination. Airflow leaving the perforated tiles enters the inlets of the IT equipment with minimal bypass.
Figure 9. Simulated flow through the perforated tiles
Figure 10. The blue particles cool the IT equipment, but the gray particles bypass the equipment.
Figure 10 shows how effective the perforated tiles are in terms of delivering the cold air to the IT equipment. The blue particles cool the IT equipment, while the gray particles bypass the equipment.
A key point of this testing is the ability to proactively identify solutions that can increase efficiency. For example, during this phase, testing helped determine that reducing fan speed would improve the site’s efficiency. As a result, the AHU fans were fitted with variable frequency drives (VFDs), which enable Compass to more effectively regulate this grille-to-cabinet airflow.
Figure 11. Inlet temperatures
It was also determined that inlet temperatures to the cabinets were at the lower end of the ASHRAE allowable range (see Figure 11), creating the potential to raise the air temperature within the room for operations. If the operator takes action and raises the supply air temperature, they will have immediate efficiency gains and see significant cost savings.
Table 8. Savings estimates based on IT loads
The analytical model can estimate these savings quickly. Table 8 shows the estimated annual cost savings based on IT load, supply air temperature setting for the facility and a power cost of seven cents per kilowatt-hour (U.S. national average). It is important to note the location of the data center because the model uses specific EnergyPlus TMY3 weather files published by the U.S. Department of Energy for its calculation.
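At its simplest, the cost side of such an estimate is an average power reduction multiplied by hours per year and the power cost. The sketch below shows only that final step. The 10-kW reduction is a hypothetical input, not a value from Table 8, and the real model derives the reduction hour by hour from the TMY3 weather data.

```python
# Simplified annual savings arithmetic (not the weather-based model).
# The average power reduction passed in is a hypothetical input.

HOURS_PER_YEAR = 8760
POWER_COST_USD_PER_KWH = 0.07   # seven cents per kWh, as stated in the text


def annual_savings_usd(avg_kw_saved, rate=POWER_COST_USD_PER_KWH):
    """Annual cost savings for a constant average power reduction in kW."""
    return avg_kw_saved * HOURS_PER_YEAR * rate


# Hypothetical example: a 10 kW average reduction in cooling power.
example = annual_savings_usd(10.0)
print(f"Estimated savings: ${example:,.0f} per year")
```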
Figure 12. Cooling path three tracks airflow from the equipment exhaust to the returns of the AHU units
Cabinet Exhaust to AHU Returns (Cooling Path #3). Cooling path three tracks airflow from the equipment exhaust to the returns of the AHU units (see Figure 12). In this case, the inlet temperatures recorded during calibration testing suggest that there was very little external or internal cabinet recirculation. The return temperatures and the capacities of the AHU units are fairly uniform. The table shows the comparison between measured and simulated AHU return temperatures:
Looking at the percentage of cooling load utilized for each AHU, the measured load was around 75%, while the simulated values show an average of 80% for each AHU. This slight discrepancy was acceptable given the differences between the measured and simulated supply and return temperatures, thereby establishing the acceptable parameters for ongoing operation within the site.
Introducing Continuous Modeling
Up to this point, I have illustrated how calibration efforts can be used to both verify the suitability of the data center to successfully perform as originally designed and to prescribe the specific benchmarks for the site. This knowledge can be used to evaluate the impact of future operational modifications, which is the basis of continuous modeling.
The essential value of continuous modeling is its ability to facilitate more effective capacity planning. By modeling prospective changes before moving IT equipment in, a lot of important what-ifs can be answered (and costs avoided) while meeting all the SLA requirements.
Examples of continuous modeling applications include, but are not limited to:
• Creating custom cabinet layouts to predict the impact of various configurations
• Increasing cabinet power density or modeling custom cabinets
• Modeling Cold Aisle/Hot Aisle containment
• Changing the control systems that regulate VFDs to move capacity where needed
• Increasing the air temperature safely without breaking the temperature SLA
• Investigating upcoming AHU maintenance or AHU failures that can’t be achieved in a production environment
In each of these applications, the appropriate modeling tools are used in concert with initial calibration data to determine the best method of implementing a desired change. The ability to proactively identify the level of deviation from the site’s initial system benchmarks can aid in the identification of more effective alternatives that not only improve operational performance but also reduce the time and cost associated with their implementation.
Case History: Continuous Modeling
Total airflow in the facility described in this case study is based on the percentage of IT load in the data hall, with a design criterion of 25°F (14°C) ∆T. Careful tile management must be practiced in order to maintain proper static pressure under the raised floor and avoid potential hot spots. Using the calibrated model, Compass created two scenarios to understand the airflow behavior. This resulted in installing fewer perforated tiles than originally planned and better SLA compliance. Having the calibrated model gave a higher level of confidence in the results. The two scenarios are summarized below.
Figure 13. Case history equipment layout
Scenario 1: Less Than Ideal Management
There are 72 4-kW racks in one area of the raised floor and six 20-kW racks in the opposite corner (see Figure 13). The total IT load is 408 kW, which is equal to 34% of the total IT load available. The total design airflow at 1,200 kW is 156,000 CFM, meaning the total airflow delivered in this example is 53,040 CFM. A leakage rate of 12% is assumed, which means that 88% of the 53,040 CFM is distributed using the perforated tiles. Perforated tiles were provided in front of each rack. The 25% open tiles were used in front of the 4-kW racks and Tate GrateAire tiles were used in front of the 20-kW racks.
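The Scenario 1 airflow figures follow directly from the rack counts and the design values. The short sketch below reproduces the arithmetic, assuming (as the text does) that delivered airflow scales linearly with the fraction of design IT load.

```python
# Scenario 1 airflow arithmetic, using the values stated in the text.

DESIGN_LOAD_KW = 1200
DESIGN_AIRFLOW_CFM = 156_000
LEAKAGE_FRACTION = 0.12      # assumed leakage rate from the text

# 72 racks at 4 kW plus six racks at 20 kW.
it_load_kw = 72 * 4 + 6 * 20

# Airflow is scaled linearly with the fraction of design load in use.
load_fraction = it_load_kw / DESIGN_LOAD_KW
delivered_cfm = DESIGN_AIRFLOW_CFM * load_fraction

# Airflow that actually reaches the perforated tiles after leakage.
tile_cfm = delivered_cfm * (1 - LEAKAGE_FRACTION)

print(f"IT load: {it_load_kw} kW ({load_fraction:.0%} of design)")
print(f"Delivered airflow: {delivered_cfm:,.0f} CFM")
print(f"Airflow through tiles: {tile_cfm:,.0f} CFM")
```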
Figure 14. Scenario 1 data hall temperatures
The results of Scenario 1 demonstrate the temperature differences between the hot and cold aisles. For the area with 4-kW racks there is an average temperature difference of around 10°F (5.6°C) between the hot and cold aisles, and the 20-kW racks have a temperature difference of around 30°F (17°C) (see Figure 14).
Scenario 2: Ideal Management
In this scenario, the racks were left in the same location, but the perforated tiles were adjusted to better distribute air based on the IT load. The 20-kW racks account for 120 kW of the total IT load while the 4-kW racks account for 288 kW of the total IT load. In an ideal floor layout, 29.4% of the airflow will be delivered to the 20-kW racks and 70.6% of the airflow will be delivered to the 4-kW racks. This will allow for an ideal average temperature difference across all racks.
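The ideal split is simply each rack group's share of the total IT load. The sketch below reproduces the 29.4% and 70.6% figures; the dictionary keys are illustrative labels only.

```python
# Ideal airflow split for Scenario 2: proportional to each group's IT load.

rack_groups_kw = {
    "20-kW racks": 6 * 20,   # 120 kW
    "4-kW racks": 72 * 4,    # 288 kW
}

total_kw = sum(rack_groups_kw.values())

# Share of total airflow each group should receive.
airflow_share = {name: kw / total_kw for name, kw in rack_groups_kw.items()}

for name, share in airflow_share.items():
    print(f"{name}: {share:.1%} of airflow")
```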
Figure 15. Scenario 2 data hall temperatures
Scenario 2 shows a much better airflow distribution than Scenario 1. The 20-kW racks now have around 25°F (14°C) difference between the hot and cold aisles (see Figure 15).
In general, it may stand to reason that if there are a total of 336 perforated tiles in the space and the space is running at 34% IT load, 114 perforated tiles should be open. The model validated that if 114 perforated tiles were opened, the underfloor static pressure would drop off and potentially cause hot spots due to lack of airflow.
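The 114-tile figure is the naive proportional estimate, shown below for reference. It is only the rule-of-thumb starting point; the tile count that actually maintains static pressure has to come from the calibrated model, not from this arithmetic.

```python
# Naive proportional tile count: tiles opened in proportion to IT load.
# This is the rule of thumb that the calibrated model was used to test.

TOTAL_TILES = 336
IT_LOAD_KW = 408
DESIGN_LOAD_KW = 1200

load_fraction = IT_LOAD_KW / DESIGN_LOAD_KW
proportional_tiles = round(TOTAL_TILES * load_fraction)

print(f"Load fraction: {load_fraction:.0%}")
print(f"Proportional tile count: {proportional_tiles}")
```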
Furthermore, continuous modeling will allow operators a better opportunity to match growth with actual demand. Using this process, operators can validate capacity and avoid wasted capital expense due to poor capacity planning.
Conclusion
To a large extent, a lack of evaluative tools has historically forced data center operators to accept on faith their new data center’s ability to meet its design requirements. Recent developments in modeling applications not only address this long-standing shortcoming, but also provide operators with an unprecedented level of control. The availability of these tools provides end users with proactive analytical capabilities that manifest themselves in more effective capacity planning and efficient data center operation.
Table 9. Summary of the techniques used in each step of model development and verification
Through the combination of rigorous calibration testing, measurement, and continuous modeling, operators can evaluate the impact of prospective operational modifications prior to their implementation and ensure that they are cost-effectively implemented without negatively affecting site performance. This enhanced level of control is essential for effectively managing data centers in an environment that will continue to be characterized by its dynamic nature and increasing application complexity. Finally, Table 9 summarizes the reasons why these techniques are valuable and have a positive impact on data center operations.
Most importantly, all of these help the data center owner and operator make a more informed decision.
Jose Ruiz
Jose Ruiz is an accomplished data center professional with a proven track record of success. Mr. Ruiz serves as Compass Datacenters’ director of Engineering where he is responsible for all of the company’s sales engineering and development support activities. Prior to joining Compass, he spent four years serving in various sales engineering positions and was responsible for a global range of projects at Digital Realty Trust. Mr. Ruiz is an expert on CFD modeling.
Prior to Digital Realty Trust, Mr. Ruiz was a pilot in the United States Navy where he was awarded two Navy Achievement Medals for leadership and outstanding performance. He continues to serve in the Navy’s Individual Ready Reserve. Mr. Ruiz is a graduate of the University of Massachusetts with a degree in Bio-Mechanical Engineering.
The Calibrated Data Center: Using Predictive Modeling (Kevin Heslin, 2015-06-03)
These small devices prevent accidental disconnection of mission critical gear
By Scott Good
Today IEC plugs are used at the rack-level PDU and the IT device. IEC plugs backing out of sockets create a significant concern, since these plugs feed UPS power to the device. In the past, twist-lock cord caps were used, but these did not address the connection of the IEC plug at the IT device. Retainers are a way the industry has addressed this problem.
In one case, Uptime Institute evaluated a facility in the Caribbean (a Tier Certified Constructed Facility) which was not using the retainers. While operators had checked all the connections two weeks earlier, when they isolated one UPS during the TCCF process, a single cord on a single device belonging to the largest customer was found to be loose and the device suffered an interruption of power.
The International Electrotechnical Commission (IEC) plug is the most common device used to connect rack-mounted IT hardware to power. In recent years, the use of IEC 60320 cords with IEC plugs has become more common, replacing twist-lock and field-constructed hard-wired type IEC plug connections. During several recent site evaluations, Uptime Institute has observed that the IEC 60320 plug-in electrical cords may fit loosely and accidentally disconnect during routine site network maintenance. Some incidents have involved plugs that were not fully inserted at the connections to the power distribution units (PDUs) in the IT rack or became loose due to temperature fluctuations. This technical paper provides information on cable and connector installation methods that can be used to ensure a secure connection at the PDU.
IT Hardware Power Cables
The IEC publishes consensus-based international standards and manages conformity assessment systems for electric and electronic products, systems and services, collectively known as electrotechnology. The IEC 60320 standard describes the devices used to couple IT hardware to power systems. The plugs and cords described by this standard come in various configurations to meet the current and voltages found in each region. This standard is intended to ensure that proper voltage and current are provided to IT appliances wherever they are deployed (see http://www.iec.ch/worldplugs/?ref=extfooter).
The most common cables used to power standard PCs, monitors, and servers are designated C13 and C19. Cable connectors have male and female versions, with the female always carrying an odd-numbered designation. The male version carries the next higher even number as its designation. C19 and C20 connectors are becoming more common for use with servers and PDUs in high-power applications.
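The numbering rule described above is easy to express in code. The following is a minimal Python sketch that applies only the rule as stated in this article (female connectors are odd, and the mating male connector is the next higher even number). The function names are hypothetical, and the sketch does not check whether a given designation is actually defined in IEC 60320.

```python
def _connector_number(designation: str) -> int:
    """Parse the numeric part of a designation such as 'C13'."""
    text = designation.strip().upper()
    if not text.startswith("C") or not text[1:].isdigit():
        raise ValueError(f"not a C-style designation: {designation!r}")
    return int(text[1:])


def is_female(designation: str) -> bool:
    """Female connectors carry odd numbers, per the rule in the text."""
    return _connector_number(designation) % 2 == 1


def mating_connector(designation: str) -> str:
    """Return the designation that mates with the given one.

    Odd (female) connectors mate with the next higher even number;
    even (male) connectors mate with the next lower odd number.
    """
    number = _connector_number(designation)
    mate = number + 1 if number % 2 == 1 else number - 1
    return f"C{mate}"


if __name__ == "__main__":
    for name in ("C13", "C14", "C19", "C20"):
        kind = "female" if is_female(name) else "male"
        print(f"{name} is {kind} and mates with {mating_connector(name)}")
```

A helper like this could be used in an inventory or audit script to flag cord and receptacle pairs that cannot mate.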
Most standard PCs accept a C13 female cable end, which connects a cord set with a standard 5-15 plug, plugged into a 120-volt (V) outlet, to the C14 male inlet on the device. In U.S. data centers, a C14/C13 coupler includes a C14 (male) end that plugs into a PDU and a C13 (female) end that plugs into the server. Couplers in EU data centers also include C13s at the IT appliance end but have different male connectors to the PDU. These male ends are identified as C or CEE types. For example, the CEE /7 has two rounded prongs and provides power at 220 V.
IEC Plug Installation Methods
In data centers, PDUs are typically configured to support dual-corded IT hardware. Power cords are plugged into PDU receptacles that are powered from A and B power sources. During installation, installers typically plug a cable coupler in a server outlet first and then into a PDU.
Figure 1. Coiled cable
Sometimes the cord is longer than the distance between the server outlet and the PDU, so the installer will coil the cable and secure the coil with cable ties or Velcro (see Figures 1 and 2). This practice adds weight on the cable and stress to the closest connection, which is at the PDU. If the connection at the PDU is not properly supported, the connector can easily pull or fall out during network maintenance activity. Standard methods for securing PDU connections include cable retention clips, plug lock inserts, and locking connectors such as the IEC Plug Lock and IEC Lock Plus.
Figure 2. Velcro ties
Cable retention clips are the original solution developed for IT hardware cable installations. These clips are manufactured to install at the connection point and clip to retention receptacles on the side of the PDU. Supports on the PDU receive the clip and hold the connector in the receptacle slot (see Figure 3).
Figure 3. A retention clip to PDU in use
Plug lock inserts prevent power cords from accidentally disconnecting from C13 output receptacles (see Figure 4). A plug lock insert placed over any C14 input cord strengthens the connection of the plug to the C13 outlet, keeping critical equipment plugged in and running during routine rack access and maintenance.
Figure 4. Plug lock
C13 and C19 IEC Lock connectors include lockable female cable ends suitable for use with standard C14 or C20 outlets. They cannot be accidentally dislodged or vibrated out of the outlets (see Figure 5).
The IEC Plug Lock and IEC Lock Plus are also alternatives. Both products have an integral locking mechanism that secures C13 and C19 plugs to the power pins of all C13 and C19 outlets.
Summary
In recent years, manufacturers of IEC plugs have developed technologies in new and existing plug and cable products to help mitigate the issue of plugs working their way out of the sockets on both IT hardware and PDU power feeds.
Figure 5. IEC plug lock
As these connections are audited in the data center, it is good practice to see where these conditions exist or could be created. Having a plan to change out older style and suspect cables will help mitigate or avoid incidents during maintenance and change processes in data centers.
Scott Good
Scott Good is a senior consultant of Uptime Institute Professional Services, facilitating prospective engagements and delivering Tier Topology and Facilities Certifications to contracted clients. Mr. Good has been in the data center industry for more than 25 years and has developed data center programs for enterprise clients globally. He has been involved in the execution of Tier programs in alignment with the Uptime Institute and was one of the first to be involved in the creation of the original Tier IV facilities. Mr. Good developed and executed a systematic approach to commissioning these facilities, and the processes he created are used by the industry to this day.
U.S. Bank Upgrades Its Data Center Electrical Distribution
Open heart surgery on the data center: Switchgear replacement in a live facility
By Mark Johns
U.S. Bank emerged during the 1990s from mergers and acquisitions among several major regional banks in the West and Midwest. Since then, the company has continued to grow through additional large acquisitions and mergers with more than 50 banks. Today, U.S. Bancorp is a diversified American financial services holding company headquartered in Minneapolis, MN. It is the parent company of U.S. Bank National Association, which is the fifth largest bank in the United States by assets, and fourth largest by total branches. U.S. Bank’s branch network serves 25 midwestern and western states with 3,081 banking offices and 4,906 ATMs. U.S. Bancorp offers regional consumer and business banking and wealth management services, national wholesale and trust services and global payments services to over 15.8 million customers.
Rich in history, U.S. Bancorp operates under the second oldest continuous national charter (originally Charter #24), granted during Abraham Lincoln’s administration in 1863. In addition, U.S. Bank helped finance Charles Lindbergh’s historic flight across the Atlantic. For sheer volume, U.S. Bank is the fifth-largest check processor in the nation, handling 4 billion paper checks annually at 12 processing sites. The bank’s air and ground courier fleet moves 15 million checks each day.
Energy Park Site
U.S. Bank relies on its Energy Park site in St. Paul, MN, to support these operations. Energy Park comprises a 350,000-square-foot (ft2) multi-use building that houses the check production operations and 40,000-ft2 data center, as well as support staff for both. Xcel Energy provides two 2,500-kilovolt-ampere (kVA) feeds to the data center and two 2,000-kVA feeds to the rest of the building.
The utility’s data center feeds supply power to two automatic throw-over (ATO) switches; each ATO feeds two transformers. Two transformers support the data center, and two other transformers support check production and power for the rest of the building, including offices and HVAC (see Figures 1-3).
Figure 1. Temporary stand-alone power plant
Figures 2 and 3. Utility transfer and ATS
A single UPS module feeds the check production area. However, two separate multi-module, parallel redundant UPS systems feed data center loads. Four N+1 1,500-kilowatt (kW) standby-rated engine generators back up the three UPS systems through existing switchgear distribution. The data center switchgear is a paralleling/closed-transition type, and the check production area switchgear is an open-transition type. The remaining office area space is not backed up by engine generators.
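The arithmetic behind an N+1 generator plant can be sketched in a few lines of Python. This is an illustration only: the unit count and rating come from the paragraph above, but the function name is hypothetical and the article does not state the actual site load, so no load comparison is made here.

```python
def firm_capacity_kw(unit_count: int, unit_rating_kw: float,
                     redundant_units: int = 1) -> float:
    """Capacity still available with the redundant units out of service.

    For an N+1 plant, redundant_units is 1, so the firm capacity is
    the combined rating of the remaining N units.
    """
    if unit_count <= 0 or unit_rating_kw <= 0:
        raise ValueError("unit count and rating must be positive")
    if not 0 <= redundant_units < unit_count:
        raise ValueError("redundant units must be fewer than total units")
    return (unit_count - redundant_units) * unit_rating_kw


if __name__ == "__main__":
    # Four 1,500-kW units in an N+1 arrangement, as described in the text.
    installed = 4 * 1500
    firm = firm_capacity_kw(unit_count=4, unit_rating_kw=1500)
    print(f"Installed: {installed} kW, firm (N) capacity: {firm:.0f} kW")
```

With four 1,500-kW units, any one unit can be out of service and 4,500 kW remains available.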
Project Summary
To ensure data center reliability, U.S. Bank initiated an Electric Modernization Project (data center electrical distribution). The project included replacing outdated switchgear and UPS systems, which were no longer supported by the manufacturer. In the project’s first phase, Russelectric paralleling switchboards were selected to replace existing equipment and create two separate distribution systems, each backed up by existing engine generators. Mechanical and UPS loads are divided between the two systems, so that either one can support the data center. Switchgear tie breakers increase overall redundancy. The facility benefits from new generator controls and new switchgear SCADA functionality, which will monitor and control utility or generator power.
Since this project was undertaken in a live facility, several special considerations had to be addressed. In order to safely replace the existing switchgear, a temporary stand-alone power plant, sized to support all of the data center loads, was assembled in a parking lot just outside the building’s existing electric/switchgear room (see Figures 4-6). The temporary power plant consisted of a new utility transformer, powered from one of the utility’s ATOs, which supplies power to an automatic transfer switch (ATS). The ATS supplies power from either the utility feeds or the standby-rated engine generators to a new distribution switchboard to support data center loads. The switchboard was installed inside a small building to protect it from the elements. Maintenance bypass switches enable staff to work on the ATS.
Figure 4. Maintenance bypass switches were installed to allow for work on the ATS
Figures 5 (top) and 6 (bottom). Switchboard was installed in a small building
Each standby-rated engine generator has two sources of fuel oil. The primary source is from a bulk tank, with additional piping connected to the site’s two existing 10,000-gallon fuel oil storage tanks to allow for filling the bulk tank or direct feed to the engine generators (see Figure 7).
Transferring Data Center Loads
U.S. Bank’s commissioning of the stand-alone power plant included testing the ATS, load testing the engine generators, infrared (IR) scanning all connections, and simulating a utility outage. Some additional cabling was added during commissioning to address cable heating due to excessive voltage drop. After commissioning was completed, data center loads were transferred to the stand-alone plant. This required providing temporary circuits for select mechanical equipment and moving loads away from four panelboards (two for mechanical equipment and two for the UPS), so that they could be shut down and re-fed from the temporary power plant. The panelboards were transferred one at a time to keep the data center on-line throughout all this work. The transfer work took place over two weekends.
The mechanical loads were sequenced first in order to put load on the stand-alone plant to provide a stable power source when the UPS systems were cut over and brought on-line. Data center loads were transferred to engine-generator power at the beginning of each day to isolate the data center from the work.
On the first Saturday devoted to the transfer process, the mechanical loads were rotated away from the first panelboard to be re-fed. Equipment requiring temporary power was cut over (see Figure 8). The isolated panelboard was then shut down and re-fed from the stand-alone plant. Once the panelboard was re-fed and power restored to it, equipment receiving temporary power was returned to its normal source. Mechanical loads were rotated back to this panelboard, so that the second panelboard could be shut down and re-fed. Data center loads were transferred back to utility power at the end of each day.
The Sunday mechanical cut over followed the same sequence as Saturday, except the stand-alone power plant, with live load, was tested at the end of the day. This testing included having Xcel Energy simulate a utility outage to the data center, which the utility did with data center loads still on engine-generator power so as not to impact the data center.
The UPS systems were transferred the following weekend. On Saturday, the two UPS systems were transferred to engine-generator power and put into maintenance bypass so their primary power sources could be re-fed from the stand-alone power plant. At the end of the day, the two UPS systems went back on-line and transferred back to utility power. On Sunday, workers cut over the UPS maintenance bypass source. That day’s work concluded with additional testing of the stand-alone power plant, including another simulated utility outage to see how the plant would respond while supporting the entire data center.
Figure 7. Standby-rated engine generators have two sources of fuel oil
Figure 8. Data center loads were transferred to the temporary stand-alone power plant over the course of two weekends
Cable Bus Installation
At the same time the stand-alone power plant was assembled and loads cut over to it, four sets of cable trays and cables were installed to facilitate dividing the UPS loads. These four sets of cable trays had to be run through office and production areas to get to the existing UPS room, which is a run of approximately 625 feet (see Figure 9). Each tray served one of the four primary and maintenance bypass UPS systems.
Figure 9. New cable bus ran about 625 feet through the facility
Switchgear and Generators
After the data center loads were transferred over to the stand-alone power plant, the old switchgear was disconnected from utility power so it could be disassembled and removed from the facility (see Figures 10 and 11). Then, the new switchgear was installed (see Figures 12 and 13).
The switchgear was designed for even distribution of loads, with an A (yellow) side and a B (blue) side (see Figure 14). Each side supports one of the two UPS systems, one of the two chillers with its pumps and towers, and half of the computer room cooling units.
Figures 10 and 11. Old switchgear was disassembled and removed from the facility
After installation, portable load banks were brought in for commissioning the new switchgear. The engine generators also received a full re-commission due to the changes in the controls and the additional alarms.
Figures 12 and 13. New switchgear was installed
Figure 14. Switchgear supporting Yellow and Blue sides, equally dividing the critical load
After the new switchgear was fully commissioned, data center loads were cut over to it, following a transfer sequence similar to the one used for the move to the stand-alone power plant. The panelboards supporting mechanical and UPS equipment were again cut over one at a time to keep the data center on-line, which again required transferring data center loads to engine-generator power to isolate the data center throughout this work.
Figures 15 and 16. Upgraded engine-generator controls and alarming were installed, with panels installed in the Engineering Office
As previously mentioned, upgraded engine-generator controls and alarming were installed as part of the project (see Figures 15 and 16). The older controls had to be upgraded to allow communication with the new switchgear. Upgraded alarm panels were installed in the Engineering Office. In addition, each switchboard has a SCADA screen with a workstation installed in the Engineering Office (see Figure 17). The project also included updating MOPs for all aspects of the switchgear operation (see Figure 18).
Figure 17. New switchgear includes a new SCADA system
Figure 18. Updated MOPs for the switchgear
The overall project went well and was completed on time with no impact to the data center. Since this phase of the project was completed, we have performed a number of live load engine-generator tests, including a few brief utility power tests, in which the engine generators were started and supported transferred load. In each test, the new equipment performed well. Phase 2 of the modernization project is the replacement of UPS System 1, which is currently underway and anticipated to be completed later in 2014. Phase 3 is replacement of UPS System 2, scheduled for 2015.
Mark Johns
Mark Johns is chief engineer, U.S. Bank IT Critical Facilities Services. He has more than 26 years of data center engineering experience, completing numerous infrastructure upgrade projects, including all commissioning, without interruption to data center operations. Mr. Johns’ long career prior to U.S. Bank includes working in a 7-story multi-use facility, which includes data center operations, check processing operations, and support staff.
A Look at Data Center Cooling Technologies
Sabey optimizes air-cooled data centers through containment
By John Sasser
The sole purpose of data center cooling technology is to maintain environmental conditions suitable for information technology equipment (ITE) operation. Achieving this goal requires removing the heat produced by the ITE and transferring that heat to some heat sink. In most data centers, the operators expect the cooling system to operate continuously and reliably.
I clearly recall a conversation with a mechanical engineer who had operated data centers for many years. He felt that most mechanical engineers did not truly understand data center operations and design. He explained that most HVAC engineers start in office or residential design, focusing on comfort cooling, before getting into data center design. He thought that the paradigms they learn in those design projects don’t necessarily translate well to data centers.
It is important to understand that comfort cooling is not the primary purpose of data center cooling systems, even though the data center must be safe for the people who work in them. In fact, it is perfectly acceptable (and typical) for areas within a data center to be uncomfortable for long-term occupancy.
As with any well-engineered system, a data center cooling system should efficiently serve its function. Data centers can be very energy intensive, and it is quite possible for a cooling system to use as much (or more) energy as the computers it supports. Conversely, a well-designed and operated cooling system may use only a small fraction of the energy used by ITE.
In this article, I will provide some history on data center cooling. I will then discuss some of the technical elements of data center cooling, along with a comparison of data center cooling technologies, including some that we use in Sabey’s data centers.
The Economic Meltdown of Moore’s Law
In the early to mid-2000s, designers and operators worried about the ability of air-cooling technologies to cool increasingly power hungry servers. With design densities approaching or exceeding 5 kilowatts (kW) per cabinet, some believed that operators would have to resort to technologies such as rear-door heat exchangers and other kinds of in-row cooling to keep up with the increasing densities.
In 2007, Ken Brill of the Uptime Institute famously predicted the Economic Meltdown of Moore’s Law. He said that the increasing amount of heat resulting from fitting more and more transistors onto a chip would reach an endpoint at which it would no longer be economically feasible to cool the data center without significant advances in technology (see Figure 1).
Figure 1. ASHRAE New Datacom Equipment Power Chart, published February 1, 2005
The U.S. Congress even got involved. National leaders had become aware of data centers and the amount of energy they require. Congress directed the U.S. Environmental Protection Agency (EPA) to submit a report on data center energy consumption (Public Law 109-341). This law also directed the EPA to identify efficiency strategies and drive the market for efficiency. This report projected vastly increasing energy use by data centers unless measures were taken to significantly increase efficiency (see Figure 2).
Figure 2. Chart ES-1 from EPA report dated (August 2, 2007)
As of 2014, Moore’s Law has not yet failed. When it does, the end will be a result of physical limitations involved in the design of chips and transistors, having nothing to do with the data center environment.
At about the same time that EPA published its data center report, industry leaders took note of efficiency issues. ITE manufacturers began to place a greater emphasis on efficiency, in addition to performance, in their designs; data center designers and operators began designing for efficiency as well as reliability and cost; and operators started to realize that efficiency does not require a sacrifice of reliability.
Legacy Cooling and the End of Raised Floor
For decades, computer rooms and data centers utilized raised floor systems to deliver cold air to servers. Cold air from a computer room air conditioner (CRAC) or computer room air handler (CRAH) pressurized the space below the raised floor. Perforated tiles provided a means for the cold air to leave the plenum and enter the main space—ideally in front of server intakes. After passing through the server, the heated air returned to the CRAC/CRAH to be cooled, usually after mixing with the cold air. Very often, the CRAC unit’s return temperature was the set point used to control the cooling system’s operation. Most commonly the CRAC unit fans ran at a constant speed, and the CRAC had a humidifier within the unit that produced steam. The primary benefit of a raised floor, from a cooling standpoint, is to deliver cold air where it is needed, with very little effort, by simply swapping a solid tile for a perforated tile (see Figure 3).
Figure 3: Legacy raised floor cooling
For many years, this system was the most common design for computer rooms and data centers. It is still employed today. In fact, I still find many operators who are surprised to enter a modern data center and not find raised floor and CRAC units.
The legacy system relies on one of the principles of comfort cooling: deliver a relatively small quantity of conditioned air and let that small volume of conditioned air mix with the larger volume of air in the space to reach the desired temperature. This system worked okay when ITE densities were low. Low densities enabled the system to meet its primary objective despite its flaws—poor efficiency, uneven cooling, etc.
At this point, it is an exaggeration to say the raised floor is obsolete. Companies still build data centers with raised floor air delivery. However, more and more modern data centers do not have raised floor simply because improved air delivery techniques have rendered it unnecessary.
How Cold is Cold Enough?
“Grab a jacket. We’re going in the data center.”
Heat must be removed from the vicinity of the ITE electrical components to avoid overheating the components. If a server gets too hot, onboard logic will turn it off to avoid damage to the server.
ASHRAE Technical Committee 9.9 (TC 9.9) has done considerable work in the area of determining suitable environments for ITE. I believe their publications, especially Thermal Guidelines for Data Processing Equipment, have facilitated the transformation of data centers from the “meat lockers” of legacy data centers to more moderate temperatures. [Editor’s note: The ASHRAE Technical Committee TC9.9 guideline recommends that the device inlet be between 18-27°C and 20-80% relative humidity (RH) to meet the manufacturer’s established criteria. Uptime Institute further recommends that the upper limit be reduced to 25°C to allow for upsets, variable conditions in operation, or to compensate for errors inherent in temperature sensors and/or controls systems.]
It is extremely important to understand that the TC 9.9 guidelines are based on server inlet temperatures—not internal server temperatures, not room temperatures, and certainly not server exhaust temperatures. It is also important to understand the concepts of Recommended and Allowable conditions.
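As a rough illustration of how an operator might check server inlet readings against these limits, here is a minimal Python sketch. The numeric ranges are taken from the editor's note above (18-27°C and 20-80% RH, with the 25°C upper limit recommended by Uptime Institute). The class and constant names are hypothetical, and the published ASHRAE guideline is more detailed than this simple box check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InletEnvelope:
    """A simple temperature and relative humidity box for server inlet air."""
    min_temp_c: float
    max_temp_c: float
    min_rh_pct: float
    max_rh_pct: float

    def contains(self, temp_c: float, rh_pct: float) -> bool:
        """True if an inlet reading falls inside the envelope."""
        return (self.min_temp_c <= temp_c <= self.max_temp_c
                and self.min_rh_pct <= rh_pct <= self.max_rh_pct)


# Values quoted in the editor's note; names are illustrative only.
EDITOR_NOTE_RANGE = InletEnvelope(18.0, 27.0, 20.0, 80.0)
UPTIME_UPPER_LIMIT_RANGE = InletEnvelope(18.0, 25.0, 20.0, 80.0)

if __name__ == "__main__":
    # Readings must be taken at the server inlet, not the room or the exhaust.
    reading_temp_c, reading_rh_pct = 26.0, 45.0
    print("Within 18-27 C range:",
          EDITOR_NOTE_RANGE.contains(reading_temp_c, reading_rh_pct))
    print("Within 25 C upper limit:",
          UPTIME_UPPER_LIMIT_RANGE.contains(reading_temp_c, reading_rh_pct))
```

A reading of 26°C at the inlet passes the wider range but exceeds the more conservative 25°C limit, which is the margin the editor's note describes.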
If a server is kept too hot, but not so hot that it turns itself off, its lifespan could be reduced. Generally speaking, this lifespan reduction is a function of the high temperatures the server experiences and the duration of that exposure. In providing a broader Allowable range, ASHRAE TC 9.9 suggests that ITE can be exposed to the higher temperatures for more hours each year.
Given that technology refreshes can occur as often as every 3 years, ITE operators should consider how relevant the lifespan reduction is to their operations. The answer may depend on the specifics of a given situation. In a homogenous environment with a refresh rate of 4 years or less, the increase in failure rate at higher temperatures may be insufficient to drive cooling design, especially if the manufacturer will warrant the ITE at higher temperatures. In a mixed environment with equipment of longer expected life spans, temperatures may warrant increased scrutiny.
In addition to temperature, humidity and contamination can affect ITE. Humidity and contamination tend to only affect ITE when the ITE is exposed to unacceptable conditions for a long period of time. Of course, in extreme cases (if someone dumped a bucket of water or dirt on a computer) one would expect to see an immediate effect.
The concern about low humidity involves electro-static discharge (ESD). As most people have experienced, in an environment with less moisture in the air (lower humidity), ESD events are more likely. However, ESD concerns related to low humidity in a data center have been largely debunked. In “Humidity Controls for Data Centers – Are They Necessary” (ASHRAE Journal, March 2010), Mark Hydeman and David Swenson wrote that ESD was not a real threat to ITE, as long as it stayed in the chassis. On the flip side, tight humidity control is no guarantee of protection against ESD for ITE with its casing removed. A technician removing the casing to work on components should use a wrist strap.
High humidity, on the other hand, does appear to pose a realistic threat to ITE. While condensation should definitely not occur, it is not a significant threat in most data centers. The primary threat is something called hygroscopic dust particles. Basically, higher humidity can make dust in the air more likely to stick to electrical components in the computer. When dust sticks, it can reduce heat transfer and possibly cause corrosion to those components. The effect of reduced heat transfer is very similar to that caused by high temperatures.
There are several threats related to contamination. Dust can coat electronic components, reducing heat transfer. Certain types of dust, called zinc whiskers, are conductive. Zinc whiskers have been most commonly found in electroplated raised floor tiles. The zinc whiskers can become airborne and land inside a computer. Since they are conductive, they can actually cause damaging shorts in tiny internal components. Uptime Institute documented this phenomenon in a paper entitled “Zinc Whiskers Growing on Raised-Floor Tiles Are Causing Conductive Failures and Equipment Shutdowns.”
In addition to the threats posed by physical particulate contamination, there are threats related to gaseous contamination. Certain gases can be corrosive to the electronic components.
Cooling Process
The cooling process can be broken into steps:
1. Server Cooling. Removing heat from ITE
2. Space Cooling. Removing heat from the space housing the ITE
3. Heat Rejection. Rejecting the heat to a heat sink outside the data center
4. Fluid Conditioning. Tempering and returning fluid to the white space, to maintain appropriate conditions within the space.
Server Cooling
ITE generates heat as the electronic components within the ITE use electricity. It’s basic physics: energy is conserved, so the energy in the incoming electricity has to go somewhere. When we say a server uses electricity, we mean the server’s components are effectively changing the state of the energy from electricity to heat.
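Because essentially all of the electrical power drawn by ITE ends up as heat, the cooling load can be estimated directly from the electrical load. The short Python sketch below converts watts to BTU per hour and to tons of refrigeration using standard conversion factors. The function names are hypothetical, and the 5-kW cabinet is only an example figure borrowed from the density discussion earlier in this article.

```python
# Standard unit conversions.
BTU_PER_HOUR_PER_WATT = 3.412142      # 1 W = 3.412142 BTU/h
BTU_PER_HOUR_PER_TON = 12_000.0       # 1 ton of refrigeration = 12,000 BTU/h


def heat_load_btu_per_hour(it_power_w: float) -> float:
    """Heat released by ITE, assuming all electrical input becomes heat."""
    if it_power_w < 0:
        raise ValueError("power cannot be negative")
    return it_power_w * BTU_PER_HOUR_PER_WATT


def heat_load_tons(it_power_w: float) -> float:
    """The same heat load expressed in tons of refrigeration."""
    return heat_load_btu_per_hour(it_power_w) / BTU_PER_HOUR_PER_TON


if __name__ == "__main__":
    cabinet_w = 5_000.0  # example: one 5-kW cabinet
    print(f"{cabinet_w / 1000:.1f} kW of ITE load is about "
          f"{heat_load_btu_per_hour(cabinet_w):,.0f} BTU/h "
          f"or {heat_load_tons(cabinet_w):.2f} tons of cooling")
```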
Heat transfers from a solid (the electrical component) to a fluid (typically air) within the server, often via another solid (heat sinks within the server). ITE fans draw air across the internal components, facilitating this heat transfer.
Some systems make use of liquids to absorb and carry heat from ITE. In general, liquids perform this function more efficiently than air. I have seen three such systems:
• Liquid contact with a heat sink. A liquid flows through a server and makes contact with a heat sink inside the equipment, absorbing heat and removing it from the ITE.
• Immersion cooling. ITE components are immersed in a non-conductive liquid. The liquid absorbs the heat and transfers it away from the components.
• Dielectric fluid with state change. ITE components are sprayed with a non-conductive liquid. The liquid changes state and takes heat away to another heat exchanger, where the fluid rejects the heat and changes state back into a liquid.
In this article, I focus on systems associated with air-cooled ITE, as that is by far the most common method used in the industry.
Space Cooling
In legacy data center designs, heated air from servers mixes with other air in the space and eventually makes its way back to a CRAC/CRAH unit. The air transfers its heat, via a coil, to a fluid within the CRAC/CRAH. In the case of a CRAC, the fluid is a refrigerant. In the case of a CRAH, the fluid is chilled water. The refrigerant or chilled water removes the heat from the space. The air coming out of the CRAC/CRAH often has a discharge temperature of 55-60°F (13-15.5°C). The CRAC/CRAH blows the air into a raised floor plenum—typically using constant-speed fans. The standard CRAC/CRAH configuration from many manufacturers and designers controls the unit’s cooling based on return air temperature.
Layout and Heat Rejection Options
While raised floor cooling worked okay in low-density spaces where no one paid attention to efficiency, it could not meet the demands of increasing heat density and efficiency, at least not as it had been historically used. I have been in legacy data centers with temperature gauges, and I’ve measured temperatures around 60°F (15.5°C) at the base of a rack and temperatures near 80°F (26°C) at the top of the same rack. I have also calculated PUEs well in excess of two.
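For readers unfamiliar with the metric, power usage effectiveness (PUE) is the ratio of total facility energy to the energy used by the ITE. The Python sketch below shows the calculation. The energy figures are invented for illustration and are not measurements from any site discussed here; the function name is also hypothetical.

```python
def pue(total_facility_kwh: float, it_kwh: float) -> float:
    """Power usage effectiveness: total facility energy / ITE energy."""
    if it_kwh <= 0:
        raise ValueError("ITE energy must be positive")
    if total_facility_kwh < it_kwh:
        raise ValueError("total facility energy cannot be less than ITE energy")
    return total_facility_kwh / it_kwh


if __name__ == "__main__":
    # Hypothetical legacy room: cooling uses as much energy as the ITE,
    # plus some electrical losses and lighting.
    it_kwh = 1_000.0
    cooling_kwh = 1_000.0
    other_kwh = 200.0
    total_kwh = it_kwh + cooling_kwh + other_kwh
    print(f"PUE = {pue(total_kwh, it_kwh):.2f}")
```

In this made-up case the cooling system alone pushes the PUE to two, and any other overhead takes it higher.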
People began to employ best practices and technologies including Hot Aisles and Cold Aisles, ceiling return plenums, raised floor management, and server blanking panels to improve the cooling performance in raised floor environments. These methods are definitely beneficial, and operators should use them.
Around 2005, design professionals and operators began to experiment with the idea of containment. The idea is simple: use a physical barrier to separate cool server intake air from heated server exhaust air. Preventing cool supply air and heated exhaust air from mixing provides a number of benefits, including:
• More consistent inlet air temperatures
• The temperature of air supplied to the white space can be raised, improving options for efficiency
• The temperature of air returning to the coil is higher, which typically makes it operate more efficiently
• The space can accommodate higher density equipment
Ideally, in a contained environment, air leaves the air handling equipment at a temperature and humidity suitable for ITE operation. The air goes through the ITE only once and then returns to the air handling equipment for conditioning.
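One way to see why separating supply and exhaust air helps is the sensible heat relationship: the airflow needed to remove a given heat load is inversely proportional to the temperature rise of the air across the ITE. The Python sketch below is an approximation under stated assumptions. The air density and specific heat are typical values near sea level and room temperature, and the 5-kW load and temperature differences are illustrative, not figures from this article.

```python
# Assumed air properties (approximate, near sea level and room temperature).
AIR_DENSITY_KG_PER_M3 = 1.2
AIR_SPECIFIC_HEAT_J_PER_KG_K = 1005.0
CFM_PER_M3_PER_S = 2118.88  # 1 m^3/s expressed in cubic feet per minute


def required_airflow_m3_per_s(heat_w: float, delta_t_k: float) -> float:
    """Airflow needed to carry away heat_w with an air temperature rise of delta_t_k.

    Uses the sensible heat relation: heat = density * specific_heat * flow * delta_t.
    """
    if heat_w < 0:
        raise ValueError("heat load cannot be negative")
    if delta_t_k <= 0:
        raise ValueError("temperature rise must be positive")
    return heat_w / (AIR_DENSITY_KG_PER_M3 * AIR_SPECIFIC_HEAT_J_PER_KG_K * delta_t_k)


if __name__ == "__main__":
    heat_w = 5_000.0  # illustrative 5-kW cabinet
    for delta_t_k in (10.0, 20.0):
        flow = required_airflow_m3_per_s(heat_w, delta_t_k)
        print(f"dT = {delta_t_k:.0f} K: {flow:.3f} m^3/s "
              f"({flow * CFM_PER_M3_PER_S:.0f} CFM)")
```

Doubling the temperature difference between supply and return air halves the required airflow, which is part of why keeping the two air streams from mixing opens up efficiency options.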
Hot Aisle Containment vs. Cold Aisle Containment
In a Cold Aisle containment system, cool air from air handlers is contained, while hot server exhaust air is allowed to return freely to the air handlers. In a Hot Aisle containment system, hot exhaust air is contained and returns to the air handlers, usually via a ceiling return plenum (see Figure 4).
Figure 4: Hot Aisle containment
Cold Aisle containment can be very useful in a raised floor retrofit, especially if there is no ceiling return plenum. In such a case, it might be possible to leave the cabinets more or less as they are, as long as they are in a Cold Aisle/Hot Aisle arrangement. One builds the containment system around the existing Cold Aisles.
Most Cold Aisle containment environments are used in conjunction with raised floor. It is also possible to use Cold Aisle containment with another delivery system, such as overhead ducting. The raised floor option allows for some flexibility; it is much more difficult to move a duct, once it is installed.
In a raised floor environment with multiple Cold Aisle pods, the volume of cold air delivered to each pod depends largely on the number of floor tiles deployed within each of the containment areas. Unless one builds an extremely high raised floor, the amount of air that can go to a given pod is going to be limited. High raised floors can be expensive to build; the heavy ITE must go on top of the raised floor.
In a Cold Aisle containment data center, one must typically assume that airflow requirements for a pod will not vary significantly on a regular basis. It is not practical to frequently switch out floor tiles or even adjust floor tile dampers. In some cases, a software system that uses CFD modeling to determine airflows based on real-time information can then control air handler fan speeds in an attempt to get the right amount of air to the right pods. There are limits to how much air can be delivered to a pod with any given tile configuration; one must still try to have about the right number of floor tiles in the proper positions.
In summary, Cold Aisle containment works best in instances where the designer and operator have confidence in the layout of ITE cabinets and in instances where the loading of the ITE does not change much, nor vary widely.
I prefer Hot Aisle containment in new data centers. Hot Aisle containment increases flexibility. In a properly designed Hot Aisle containment data center, operators have more flexibility in deploying containment. The operator can deploy a full pod or chimney cabinets. The cabinet layouts can vary. One simply connects the pod or chimney to the ceiling plenum and cuts or removes ceiling tiles to allow hot air to enter it.
In a properly controlled Hot Aisle containment environment, the ITE determines how much air is needed. There is a significant flexibility in density. The cooling system floods the room with temperate air. As air is removed from the cool side of the room by server fans, the lower pressure area causes more air to flow to replace it.
Ideally, the server room has a large, open ceiling plenum, with clear returns to the air handling equipment. It is easier to have a large, open ceiling plenum than a large, open raised floor, because the ceiling plenum does not have to support the server cabinets. The air handlers remove air from the ceiling return plenum. Sabey typically controls fan speed based on differential pressure (dP) between the cool air space and the ceiling return plenum. Sabey attempts to keep the dP slightly negative in the ceiling return plenum, with respect to the cool air space. In this manner, any small leaks in containment cause cool air to go into the plenum. The air handler fans ramp up or down to maintain the proper airflow.
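The pressure-based fan control described above can be sketched as a simple proportional loop. This is only an illustration, not Sabey's actual control sequence: the function name, the setpoint, the gain, and the speed limits are all hypothetical, and dP is defined here as plenum pressure minus cool-space pressure, so the target value is slightly negative.

```python
def next_fan_speed(current_speed_pct, dp_pa, setpoint_pa=-2.0,
                   gain_pct_per_pa=2.0, min_pct=20.0, max_pct=100.0):
    """Return a new air handler fan speed (percent) from the measured dP.

    dp_pa is ceiling plenum pressure minus cool-space pressure, in pascals.
    All default values are hypothetical illustration numbers.
    """
    # When server fans push more air into the plenum, dP rises above the
    # (slightly negative) setpoint, so the air handlers must remove more
    # air from the plenum: the fan speed goes up. The reverse also holds.
    error = dp_pa - setpoint_pa
    new_speed = current_speed_pct + gain_pct_per_pa * error
    # Keep the command inside the allowed speed range.
    return max(min_pct, min(max_pct, new_speed))


# Plenum drifting positive: fans ramp up.
faster = next_fan_speed(50.0, 1.0)
# Plenum more negative than needed: fans ramp down.
slower = next_fan_speed(50.0, -5.0)
```

A real sequence would normally add integral action, filtering of the pressure signal, and staging across multiple air handlers; the sketch only shows the direction of the response.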
Hot Aisle containment requires a much simpler control scheme and provides more flexible cabinet layouts than a typical Cold Aisle containment system.
In one rather extreme example, Sabey deployed six customer racks in a 6,000-ft2 space, each pulling a little more than 35 kilowatts (kW). The racks were all placed in a row. Sabey allowed about 24 inches between the racks and built a Hot Aisle containment pod around them. Many data centers would have trouble accommodating such high density racks. A more typical utilization in the same space might be 200 racks (30 ft2 per rack) at 4.5 kW/rack. Other than building the pod, Sabey did not have to take any sort of custom measures for the cooling. The operations sequence worked as intended, simply ramping up the air handler fans a bit to compensate for the increased airflow. These racks have been operating well for almost a year.
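The arithmetic behind this comparison can be sketched as follows, using the figures quoted in the example and taking 35 kW as a round number for each high-density rack. The function name is illustrative only.

```python
SPACE_FT2 = 6000  # floor area of the space in the example


def density_w_per_ft2(racks, kw_per_rack, area_ft2=SPACE_FT2):
    """Average heat density of a room, in watts per square foot."""
    return racks * kw_per_rack * 1000 / area_ft2


# Typical utilization: 200 racks at 4.5 kW each (900 kW total).
typical = density_w_per_ft2(200, 4.5)
# High-density case: six racks at about 35 kW each (210 kW total).
high_density_pod = density_w_per_ft2(6, 35)

print(f"typical: {typical:.0f} W/ft2, six-rack pod: {high_density_pod:.0f} W/ft2")
```

Averaged over the room, the six racks represent a far lower total load than the typical layout, even though each rack is nearly eight times as dense. That is consistent with the cooling system needing no custom measures beyond the containment pod.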
Hot Aisle containment systems tend to provide higher volumes of conditioned air compared to Cold Aisle containment, which is a minor benefit. In a Cold Aisle containment system, the volume of air in a data center at any given time is the volume of air in the supply plenum (whether that is a raised floor or overhead duct) and the amount of air in the contained Cold Aisles. This volume is typically less than the volume in the remainder of the room. In a Hot Aisle containment system, the room is flooded with air. The volume of hot air is typically limited to the air inside the Hot Aisle containment and the ceiling return plenum.
Hot Aisle containment also allows operators to remove raised floor from the design. Temperate air floods the room, often from the perimeter. The containment prevents mixing, so air does not have to be delivered immediately in front of the ITE. Removing raised floor reduces the initial costs and the continuing management headache.
There is one factor that could lead operators to continue to install raised floor. If one anticipates direct liquid cooling during the lifespan of the data center, a raised floor may make a very good location for the necessary piping.
Close-Coupled Cooling
There are other methods of removing heat from white spaces, including in-row and in-cabinet solutions. For example, rear-door heat exchangers accept heat from servers and remove it from a data center via a liquid.
In-row cooling devices are placed near the servers, typically as a piece of equipment placed in a row of ITE cabinets. There are also systems that are located above the server cabinets.
These close-coupled cooling systems reduce the fan energy required to move air. These types of systems do not strike me as being optimal for Sabey’s business model. I believe such a system would likely be more expensive and less flexible than Hot Aisle containment layouts for accommodating unknown future customer requirements, which is important for Sabey’s operation. Close-coupled cooling solutions can have good applications, such as increasing density in legacy data centers.
Heat Rejection
After server heat is removed from a white space, it must be rejected to a heat sink. The most common heat sink is the atmosphere. Other choices include bodies of water or the ground.
There are various methods of transferring data center heat to its ultimate heat sink. Here is a partial list:
• CRAH units with water-cooled chillers and cooling towers
• CRAH units with air-cooled chillers
• Split system CRAC units
• CRAC units with cooling towers or fluid coolers
• Pumped liquid (e.g., from in-row cooling) and cooling towers
• Airside economization
• Airside economization with direct evaporative cooling (DEC)
• Indirect evaporative cooling (IDEC)
Economizer Cooling
Most legacy systems include some form of refrigerant-based thermodynamic cycle to obtain the desired environmental conditions. Economization is cooling in which the refrigerant cycle is turned off—either part or all of the time.
Airside economizers draw outside air into the data center, which is often mixed with return air to obtain the right conditions, before entering the data center. IDEC is a variation of this in which the outside air does not enter the data center but receives heat from the inside air via a solid heat exchanger.
Evaporative cooling (either direct or indirect) systems use evaporated water to supplement the availability of economizer cooling or more efficient refrigerant-based cooling. The state change of water absorbs energy, lowering the dry bulb temperature to a point where it approaches the wet bulb (saturated) temperature of the air (see Figure 5).
Figure 5. Direct evaporative cooling (simplified)
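The approach of dry bulb temperature toward wet bulb temperature is commonly expressed with a saturation effectiveness. The sketch below assumes that standard relationship; the default effectiveness value and the example temperatures are hypothetical and do not describe any particular product or site.

```python
def dec_supply_temp(dry_bulb, wet_bulb, effectiveness=0.85):
    """Estimate air temperature leaving a direct evaporative cooler.

    dry_bulb and wet_bulb are entering-air temperatures in the same unit.
    effectiveness is the fraction of the dry bulb/wet bulb gap that is
    closed (1.0 would mean the air leaves fully saturated).
    """
    if not 0.0 <= effectiveness <= 1.0:
        raise ValueError("effectiveness must be between 0 and 1")
    if wet_bulb > dry_bulb:
        raise ValueError("wet bulb cannot exceed dry bulb")
    return dry_bulb - effectiveness * (dry_bulb - wet_bulb)


# Hypothetical hot, dry afternoon: 95°F dry bulb, 65°F wet bulb.
supply_f = dec_supply_temp(95.0, 65.0, effectiveness=0.8)
print(f"supply air: {supply_f:.1f}°F")
```

The wider the gap between dry bulb and wet bulb, the more cooling evaporation can provide, which is why the technique is most useful in dry climates.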
In waterside economizer systems, the refrigerant cycle is not required when outside conditions are cold enough to achieve the desired chilled water temperature set points. The chilled water passes through a heat exchanger and rejects the heat directly to the condenser water loop.
Design Criteria
In order to design a cooling system, the design team must agree upon certain criteria.
Heat load (most often measured in kilowatts) typically gets the most attention. Most often, heat load actually includes two elements: total heat to be rejected and the density of that heat. Traditionally, data centers have measured heat density in watts per square foot. Many postulate that density should actually be measured in kilowatts per cabinet, which is very defensible in cases where one knows the number of cabinets to be deployed.
Airflow receives less attention than heat load. Many people use computational fluid dynamics (CFD) software to model airflow. These programs can be especially useful in non-contained raised floor environments.
In all systems, but especially in contained environments, it is important that the volume of air produced by the cooling system meet the ITE requirement. There is a direct relationship between heat gain through a server, power consumed by the server, and airflow through that server. Heat gain through a server is typically measured by the temperature difference between the server intake and server exhaust or delta T (∆T). Airflow is measured in volume over time, typically cubic feet per minute (CFM).
Assuming load has already been determined, a designer should know (or, more realistically, assume) a ∆T. If the designer does not assume a ∆T, the designer leaves it to the equipment manufacturer to determine the design ∆T, which could result in airflow that does not match the requirements.
I typically ask designers to assume a 20°F (11°C) ∆T. Higher density equipment, such as blades, typically has higher ∆T. However, most commodity servers are doing well to get as high as a 20°F (11°C) ∆T. (Proper containment and various set points can also make a tremendous difference.)
The risk of designing a system in which the design ∆T is higher than the actual ∆T is that the system will not be able to deliver the necessary airflow/cooling, because for a given load the required airflow rises as ∆T falls. The risk in going the other way is that the owner will have purchased more capacity than the design goals otherwise warrant.
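As a rough illustration of how the assumed ∆T drives airflow, the sketch below uses the common sensible heat approximation for air, BTU/hr = 1.08 × CFM × ∆T (°F). The 1.08 factor assumes standard air density near sea level, and the 100-kW load is a hypothetical example value.

```python
BTU_PER_HR_PER_WATT = 3.412
SENSIBLE_HEAT_FACTOR = 1.08  # BTU/hr per (CFM x °F); standard air, near sea level


def required_cfm(load_kw, delta_t_f):
    """Airflow (CFM) needed to carry load_kw of heat at a given ∆T in °F."""
    if delta_t_f <= 0:
        raise ValueError("delta T must be positive")
    btu_per_hr = load_kw * 1000 * BTU_PER_HR_PER_WATT
    return btu_per_hr / (SENSIBLE_HEAT_FACTOR * delta_t_f)


# Hypothetical 100-kW load at three assumed server temperature rises.
for delta_t in (15, 20, 25):
    print(f"∆T {delta_t}°F: {required_cfm(100, delta_t):,.0f} CFM")
```

At a 20°F ∆T, a 100-kW load works out to roughly 15,800 CFM. Halving the ∆T doubles the airflow required, so the ∆T assumption matters as much as the load itself when sizing fans.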
The Design Day equals the most extreme outside air conditions the design is intended to handle. The owner and designers have to decide how hot is hot enough, as it affects the operation of the equipment. In Seattle, in the 100 years before July 29, 2009, there was not a recorded ambient temperature above 100°F (38°C) (as measured at SeaTac airport). Also keep in mind that equipment is often located (especially on the roof) where temperatures are higher than are experienced at official weather stations.
An owner must determine what the temperature and humidity should be in the space. Typically, this is specified for a Design Day when N equipment is operating and redundant units are off-line. Depending on the system, the designers will determine air handler discharge set points based on these conditions, making assumptions and/or calculations of temperature increases between the air handler discharge and the server inlet. There can be opportunities for more efficient systems if the owner is willing to go into the ASHRAE Allowable range during extreme outside temperatures and/or during upset conditions such as utility interruptions. Sabey typically seeks to stay within the ASHRAE Recommended range in its business model.
The owner and designer should understand the reliability goals of the data center and design mechanical, electrical, and controls to support these reliability goals. Of course, when considering these items, the design team may be subject to over building. If the design team assumes an extreme Design Day, adds in redundant equipment, specifies the low end of the ASHRAE Recommended range, and then maybe adds a little percentage on top, just in case, the resulting system can be highly reliable, if designed and operated appropriately. It can also be too expensive to build and inefficient to operate.
It is worth understanding that data centers do not typically operate at design load. In fact, during much of a data center’s lifespan, it may operate in a lightly loaded state. Operators and designers should spend some time making the data center efficient in those conditions, not just as it approaches design load. Sabey has made design choices that allow us to not only cool efficiently, but also to cool efficiently at light loads. Figure 6 shows that we reached average PUE conditions of 1.20 at only 10% loading at one of our operating data centers.
Figure 6. PUE and design load (%) over time.
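To see why light-load efficiency is difficult, recall that PUE is total facility power divided by ITE power. The sketch below splits the non-IT overhead into a fixed part and a part that scales with IT load. The split and every number in it are hypothetical, chosen only to show the effect; they are not Sabey's measured data.

```python
def pue(it_kw, overhead_kw):
    """Power usage effectiveness: total facility power over ITE power."""
    if it_kw <= 0:
        raise ValueError("IT load must be positive")
    return (it_kw + overhead_kw) / it_kw


def pue_at_load(load_fraction, design_it_kw, fixed_overhead_kw,
                variable_overhead_ratio):
    """PUE at a fraction of design load for a simple two-part overhead model.

    fixed_overhead_kw is drawn regardless of load (hypothetical);
    variable_overhead_ratio is overhead kW per kW of IT load (hypothetical).
    """
    it_kw = load_fraction * design_it_kw
    overhead_kw = fixed_overhead_kw + variable_overhead_ratio * it_kw
    return pue(it_kw, overhead_kw)


# Hypothetical 1,000-kW design: compare small and large fixed overheads.
for fixed_kw in (10, 50):
    light = pue_at_load(0.10, 1000, fixed_kw, 0.10)
    full = pue_at_load(1.00, 1000, fixed_kw, 0.10)
    print(f"fixed {fixed_kw} kW: PUE {light:.2f} at 10% load, {full:.2f} at 100%")
```

In this model the fixed overhead dominates at light load, which is why variable speed equipment that scales down with the IT load matters so much early in a data center's life.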
Crystal Ball
While very high density ITE is still being built and deployed, the density of most ITE has not kept up with the increases projected 10 years ago. Sabey was designing data centers at an average 150 watts/ft2 6 years ago, and the company has not yet seen a reason to increase that. Of course, Sabey can accommodate significantly higher localized densities where needed.
In the near future, I expect air-based cooling systems with containment to continue to be the system of choice for cooling data centers. In the long term, I would not be surprised to see increasing adoption of liquid-cooling technologies.
Conclusion
Sabey Data Centers develops and operates data centers. It has customers in many different verticals and of many different sizes. As a service provider, Sabey does not typically know the technology or layout its customers will require. Sabey’s data centers use different cooling technologies, suitable to the location. Sabey has data centers in the mild climate of Seattle, the semi-arid climate of central Washington, and in downtown New York City. Sabey’s data centers are housed in single-story greenfield buildings and in a redeveloped high-rise.
Despite these variations and uncertainties, all the data centers Sabey designs and operates have certain common elements. They all use Hot Aisle containment without raised floor. All have a ceiling return plenum for server exhaust air and flood the room for the server inlet air. These data centers all employ some form of economizer. Sabey seeks to operate efficiently in lightly loaded conditions, with variable speed motors for fans, pumps, and chillers, where applicable.
Sabey has used a variety of different mechanical systems with Hot Aisle containment, and I tend to prefer IDEC air handlers, where practical. Sabey has found that this is a very efficient system with lower water use than the name implies. Much of the time, the system is operating in dry heat exchanger mode. The system tends to facilitate very simple control sequencing, and that simplicity enhances reliability. The systems restart rapidly, which is good in utility interruptions. The fans keep spinning and ramp up as soon as the generators start providing power. Water remains in the sump, so the evaporative cooling process requires essentially no restart time. Sabey has successfully cooled racks drawing 35 to 40 kW with no problem.
Until there is broad adoption of liquid-cooled servers, the primary opportunities appear to be in optimizing air-cooled, contained data centers.
John Sasser
John Sasser brings more than 20 years of management experience to the operations of Sabey Data Centers’ portfolio of campuses. In addition to all day-to-day operations, start-ups and transitions, he is responsible for developing the conceptual bases of design and operations for all Sabey data centers, managing client relationships, overseeing construction projects, and overall master planning.
Mr. Sasser and his team have received recognition from a variety of organizations, including continuous uptime awards from the Uptime Institute and energy conservations awards from Seattle City Light and the Association of Energy Engineers.
Prior to joining Sabey, he worked for Capital One and Walt Disney Company. Mr. Sasser also spent 7 years with the Navy Civil Engineer Corps.
AIG Tells How It Raised Its Level of Operations Excellence
By Kevin Heslin and Lee Kirby
Driving operational excellence across multiple data centers is exponentially more difficult than managing just one. Technical complexity multiplies as you move to different sites, regions, and countries where codes, cultures, climates, and other factors are different. Organizational complexity further complicates matters when the data centers in your portfolio have different business requirements.
With little difficulty, an organization can focus on staffing, maintenance planning and execution, training and operations for a single site. Managing a portfolio turns the focus from projects to programs and from activity to outcomes. Processes become increasingly complex and critical. In this series of interviews, you will hear from practitioners about the challenges and lessons they have drawn from their experiences. You will find that those who thrive in this role share the understanding that Operational Excellence is not an end state, but a state of mind.
This interview is part of a series of conversations with executives who are managing diverse data center portfolios. The interviewees in this series participated in a panel at Uptime Institute Symposium 2015, discussing their use of the Uptime Institute Management & Operations (M&O) Stamp of Approval to drive standardization across data center operations.
Herb Alvarez: Director of Global Engineering and Critical Facilities
American International Group
An experienced staff was empowered to improve infrastructure, staffing, processes, and programs
What’s the greatest challenge managing your current footprint?
Providing global support and oversight via a thin staffing model can be difficult, but due to the organizational structure and the relationship with our global FM alliance partner (CBRE) we have been able to improve service delivery, manage cost, and enhance reliability. From my perspective, the greatest challenges have been managing the cultural differences of the various regions, followed by the limited availability of qualified staffing in some of the regions. With our global FM partner, we can provide qualified coverage for approximately 90% of our portfolio; the remaining 10% is where we see some of these challenges.
Do you have reliability or energy benchmarks?
We continue to make energy efficiency and sustainability a core requirement of our data center management practice. Over the last few years we retrofitted two existing data center pods at our two global data centers and we replaced EOL (end of life) equipment with best-in-class, higher efficiency systems. The UPS systems that we installed achieve a 98% efficiency rating while operating in ESS mode and 94 to 96% rating while operating in VMMS mode. In addition, the new cooling systems were installed with variable flow controls and VFDs for the chillers, pumps, and CRAHs, along with full cold aisle containment and multiple control algorithms to enhance operating efficiency. Our target operating model for the new data center pods was to achieve a Tier III level of reliability along with a 1.75 PUE, and we achieved both of these objectives. The next step on our energy and sustainability path is to seek Energy Star and other industry recognitions.
Can you tell me about your governance model and how that works?
My group in North America is responsible for the strategic direction and the overall management for the critical environments around the world. We set the standards (design, construction, operations, etc.), guidelines, and processes. Our regional engineering managers, in turn, carry these out at the regional level. At the country level, we have the tactical management (FM) that ultimately implements the strategy. We subscribe to a system of checks and balances, and we have incorporated global and regional auditing to ensure that we have consistency throughout the execution phase. We also incorporate KPIs to promote the high level of service delivery that we expect.
From your perspective, what is the greatest difficulty in making that model work, ensuring that the design ideas are appropriate for each facility, and that they are executed according to your standards?
The greatest difficulties encountered were attributed to the cultural differences between regions. Initially, we encountered some resistance at the international level with regard to broad acceptance of design standards and operating standards. However, with the support of executive senior leadership and the on-going consolidation effort, we achieved global acceptance through a persistent and focused effort. We now have the visibility and oversight to ensure that our standards and guidelines are being enforced across the regions. It is important to mention that our standards, although rigid, do have flexible components embedded in them because a “one size fits all” regimen is not always feasible. For these instances, we incorporated an exception process that grants the required flexibility to deviate from a documented standard. In terms of execution, we now have the ability via “in-country” resources to validate designs and their execution.
It also requires changing the culture, even within our own corporate group. For example, we have a Transactions group that starts the search for facilities. Our group said that we should only be in this certain type of building, this quality of building, so we created some standards and minimum requirements. We said, “We are AIG. We are an insurance company. We can’t go into a shop house.” This was a cultural change, because Transactions always looked for the lowest cost option first.
The AIG name is at stake. Anything we do that is deficient has the potential to blemish the brand.
Herb, it sounds like you are describing a pretty successful program. And yet, I am wondering if there are things that you would do differently if you were starting from scratch.
If it were a clean slate, and a completely new start, I would look to use an M&O type of assessment at the onset of any new initiatives as it relates to data center space acquisition. Utilizing M&O as a widely accepted and recognized tool would help us achieve consistency across data centers and would validate colo provider capabilities as it relates to their operational practices.
How do M&O stamps help the organization, and which parts of your operations do they influence the most?
I see two clear benefits. From the management and operations perspective, the M&O Stamp offers us a proven methodology of assessing our M&O practice, not only validating our program but also offering a level of benchmarking against other participants of the assessments. The other key benefit is that the M&O stamp helps us promote our capabilities within the AIG organization. Often, we believe that we are operationally on par with the industry, but a third-party validation from a globally accepted and recognized organization helps further validate our beliefs and our posture as it relates to the quality of the service delivery that we provide. We look at the M&O stamp as an on-going certification process that ensures that we continually uphold the underlying principles of management and operations excellence, a badge of honor if you will.
AIG has been awarded two M&O Stamps of Approval in the U.S. I know you had similar scores on the two facilities. Were the recommendations similar?
I expected more commonality between both of the facilities. When you have a global partner, you expect consistency across sites. In these cases, there were about five recommendations for each site; two of them were common to both sites. The others were not. It highlighted the need for us to re-assess the operation in several areas, and remediate where necessary.
Of course you have way more than two facilities. Were you able to look at those reports and those recommendations and apply them universally?
Oh, absolutely. If there was a recommendation specific to one site, we did not look at it just for that site. We looked to leverage that across the portfolio. It only makes sense, as it applies to our core operating principles of standardizing across the portfolio.
Is setting KPIs for operations performance part of your FM vendor management strategy?
KPIs are very important to the way we operate. They allow us to set clear and measurable performance indicators that we utilize to gauge our performance. The KPIs drive our requirement for continuous improvement and development. We incentivize our alliance partner and its employees based on KPI performance, which helps drive operational excellence.
Who do you share the information with and who holds you accountable for improvements in your KPIs?
That’s an interesting question. This information is shared with our senior management as it forms our year-over-year objectives and is used as a basis for our own performance reviews and incentive packages. We review our KPIs on an on-going basis to ensure that we are trending positively; we re-assess the KPIs on an annual basis to ensure that they remain relevant to the desired corporate objectives. During the last several years one of our primary KPIs has been to drive cost reductions to the tune of 5% reductions across the portfolio.
Does implementing those reductions become part of staff appraisals?
For my direct reports, the answer is yes. It becomes part of their annual objectives; the objectives have to be measurable, and we have to agree that they are achievable. We track progress on a regular basis and communicate progress via our quarterly employee reviews. Again, we are very careful that any such reductions do not adversely impact our operations or distract us from achieving our uptime requirements.
Do you feel that AIG has mastered demand management so you can effectively plan, deploy, and manage capacity at the speed of the client?
I think that we have made significant improvements over the last few years in terms of capacity planning, but I do believe that this is an area where we can still continue to improve. Our capacity planning team does a very good job of tracking, trending, and projecting workloads. But there is ample opportunity for us to become more granular on the projections side of the reporting, so that we have a very clear and transparent view of what is planned, its anticipated arrival, and its anticipated deployment time line. We recognize that we all play a role, and the expectation is that we will all work collaboratively to implement these types of enhancements to our demand/capacity management practice.
So you are viewing all of this as a competitive advantage.
You have to. That’s a clear objective for all of senior management. We have to have a competitive edge in the marketplace, whether that’s on the technology side, product side, or how we deliver services to our clients. We need to be best in class. We need to champion the cause and drive this message throughout the organization.
Staffing is a huge part of maintaining data center operational excellence. We hear from our Network members that finding and keeping talent is a challenge. Is this something you are seeing as well?
I definitely do think there is a shortage of data center talent. We have experienced this first hand. I do believe that the industry needs to have a focused data center education program to train data center personnel. I am not referring to the theoretical or on-line programs, which already exist, but hands-on training that is specific to data center infrastructure. Typical trade school programs focus on general systems and equipment but do not have a track that is specific to data centers, one that also includes operational practices in critical environments. I think there has got to be something in the industry that’s specialized and hands-on. Training that covers the complex systems found in data centers, such as UPS systems, switchgear, EPMS, BMS, fire suppression, etc.
How do you retain your own good talent?
Keep them happy, keep them trained, and above all keep it interesting. You have to have a succession track, a practice that allows growth from within but also accounts for employee turnover. The succession track has to ensure that we have operational continuity when a team member moves on to pursue other opportunities.
The data center environment is a very demanding environment, and so you have to keep staff members focused and engaged. We focus on building a team, and as part of team development we ensure team members are properly trained and developed to the point where we can help them achieve their personal goals, which oftentimes include upward mobility. Our development track is based on the CBRE Foundations training program. In addition to the training program, AIG and CBRE provide multiple avenues for staff members to pursue growth opportunities.
When the staff is stable, what kinds of things can you do to keep them happy when you can’t promote them?
Oftentimes, it is the small things you do that resonate the most. I am a firm believer that above-average performance needs to be rewarded. We are pro-active and at times very creative in how we acknowledge those that are considered top performers. The Brill Award, which we achieved as a team, is just one example. We acknowledged the team members with a very focused and sincere thank you communication, acknowledging not only their participation but also the fact that it could not have been achieved without them. From a senior management perspective, we can’t lose sight of the fact that in order to cultivate a team environment you have to be part of the team. We advocate for a culture of inclusion, development, and opportunity.
Herb Alvarez
Herb Alvarez is director of Global Engineering & Critical Facilities, American International Group, Inc. Mr. Alvarez is responsible for engineering and critical facilities management for the AIG portfolio, which comprises 970 facilities spread across 130 countries. Mr. Alvarez has overarching responsibility for the global data center facilities and their building operations. He works closely and in collaboration with AIG’s Global Services group, which is the company’s IT division.
AIG operates three purpose-built data centers in the U.S., including a 235,000 square foot (ft2) facility in New Jersey and a 205,000-ft2 facility in Texas, and eight regional colo data centers in Asia Pacific, EMEA, and Japan.
Mr. Alvarez helped implement the Global Infrastructure Utility (GIU), a consolidation and standardization effort that AIG’s CEO Robert Benmosche launched in 2010. This initiative was completed in 2013.
Kevin Heslin
Kevin Heslin is chief editor and director of ancillary projects at the Uptime Institute. He served as an editor at New York Construction News, Sutton Publishing, the IESNA, and BNP Media, where he founded Mission Critical, the leading commercial publication dedicated to data center and backup power professionals. In addition, Heslin served as communications manager at the Lighting Research Center of Rensselaer Polytechnic Institute. He earned a B.A. in Journalism from Fordham University in 1981 and a B.S. in Technical Communications from Rensselaer Polytechnic Institute in 2000.
Meeting the M&O Challenge of Managing a Diverse Data Center Footprint: John Sheputis and Don Jenkins, Infomart
By Matt Stansberry and Lee Kirby
Driving operational excellence across multiple data centers is exponentially more difficult than managing just one. Technical complexity multiplies as you move to different sites, regions, and countries where codes, cultures, climates and other factors are different. Organizational complexity further complicates matters when the data centers in your portfolio have different business requirements.
With little difficulty, an organization can focus on staffing, maintenance planning and execution, training and operations for a single site. Managing a portfolio turns the focus from projects to programs and from activity to outcomes. Processes become increasingly complex and critical. In this series of interviews, you will hear from practitioners about the challenges and lessons they have drawn from their experiences. You will find that those who thrive in this role share the understanding that Operational Excellence is not an end state, but a state of mind.
This interview is part of a series of conversations with executives who are managing diverse data center portfolios. The interviewees in this series participated in a panel at Uptime Institute Symposium 2015, discussing their use of the Uptime Institute Management & Operations (M&O) Stamp of Approval to drive standardization across data center operations.
John Sheputis: President, Infomart Data Centers
Don Jenkins: VP Operations, Infomart Data Centers
Give our readers a sense of your current data center footprint.
Sheputis: The portfolio includes about 2.2 million square feet (ft2) of real estate, mostly data center space. The facilities in both of our West Coast locations are data center exclusive. The Dallas facility is enormous, at 1.6 million ft2, and is a combination of mission critical and non-mission critical space. Our newest site in Ashburn, VA, is 180,000 ft2 and undergoing re-development now, with commissioning on the new critical load capacity expected to complete early next year.
The Dallas site has been operational since the 1980s. We assumed the responsibility for the data center pods in that building in Q4 2014 and brought on staff from that site to our team.
What is the greatest challenge of managing your current footprint?
Jenkins: There are several challenges, but communicating standards across the portfolio is a big one. Also, different municipalities have varying local codes and governmental regulations. We need to adapt our standards to the different regions.
For example, air quality control standards vary at different sites. We have to meet very high air quality standards in California, which means we adhere to very strict requirements for engine-generator runtimes and exhaust filter media. But in other locations, the regulations are less strict, and that variance impacts our maintenance schedules and parts procurement.
Sheputis: It may sound trivial to go from an area where air quality standards are high to one that is less stringent, but it still represents a change in our standards. If you’re going to do development, it’s probably best to start in California or somewhere with more restrictive standards and then go somewhere else. It would be very difficult to go the other way.
More generally, the Infomart merger was a big bite. It includes a lot of responsibility for non-data center space. So now we have two operating standards. We have over 500,000 ft2 of office-use real estate that uses the traditional break-fix operation model. We also have over two dozen data center suites with another 500,000 ft2 of mission critical space as well, where nothing breaks, or if it does, there can be no interruption of service. These different types of property have two different operations objectives and require different skill sets. Putting those varying levels of operations under one team expands the number of challenges you absorb. It pushes us from managing a few sites to a “many sites” level of complexity.
How do you benchmark performance goals?
Sheputis: I’m going to restrict my response to our mission critical space. When we start or assume control of a project, we have some pretty unforgiving standards. We want concurrent maintenance, industry-leading PUE, on time, on budget, and no injuries—and we want our project to meet critical load capacity and quality standards.
But picking up somebody else’s capital project after they’ve already completed their design and begun the work, yet before they finished? That is the hardest thing in the world. The Dallas Infomart site is so big, there are two or three construction projects going on at any time. Show up any weekend, and you’ll see somebody doing a crane pick or using a helicopter to deliver equipment to be installed on the roof. It’s that big. It’s a damn good thing that we have great staff on site in Dallas and someone like Don Jenkins to make sure everything goes smoothly.
We hear a lot about data center operations staffing shortages. What has been your experience at Infomart?
Jenkins: Good help is hard to find anywhere. Data center skills are very specific. It’s a lot harder to find good data center people. One of the things we try to do is hire veterans. Over half our operating engineers have military backgrounds, including myself. We do this not just out of patriotism or to meet security concerns, but because we understand and appreciate the similarity of a mission critical operation and a military operation (see http://journal.uptimeinstitute.com/resolving-data-center-staffing-shortage/).
Sheputis: If you have high standards, there is always a shortage of people for any job. But the corollary for that is that if you’re known for doing your job very well, the best people often find you. Don deserves credit for building low turnover teams. Creating a culture of continuity requires more than strong technical skillsets; you have to begin recruiting the kinds of people who can play on a team.
Don uses this phrase a lot to describe the type he’s looking for: people who are capable of both leading and being led. He wants candidates with low egos and strong ethics who care about outcomes and want to learn. We invest heavily in our training program, and we are rigorous in finding people who buy into our process. We don’t want people who want to be heroes. The ideal candidate is a responsible team player with an aptitude for learning, and we fill in the technical gaps as necessary over time. No one has all the skills they need on day one. Our training is industry leading. To date, we have had no voluntary turnover.
Jenkins: We do about 250 man-hours of training for each staff member. It’s not cheap, but we feel it’s necessary and the guys love it. They want to learn. They ask for it. Greater skill attainment is a win-win for them, our tenants, and us.
Sheputis: When you build a data center, you often meet the technically strongest people at either the beginning of the project during design or the end of the project during the commissioning phase. Every project we do is Level 5 Commissioned. That’s when you find and address all of the odd or unusual use cases that the manufacturer may not have anticipated. More than once, we have had a UPS troubleshooting specialist say to Don, “You guys do it right. Let me know when you have an opening in your organization.”
Jenkins: I think it’s a testament that shows how passionate we are about what we do.
Are you standardizing management practices across multiple sites?
Sheputis: When we had one or two sites, it wasn’t a challenge because we were copying from California to Oregon. But with three or more sites it becomes much more difficult. With the inclusion of Dallas and Ashburn, we have had to raise our game. It is tempting to say we do the same thing everywhere, but that would be unrealistic at best.
Broadly speaking, we have two families of standards: Content and Process. For functional content we have specs for staffing, maintenance, security, monitoring, and the like. We apply these with the knowledge that there will be local exceptions—such as different codes and different equipment choices. An operator from one site has to appreciate the deviations at the other sites. We also have process-based standards, and these are more meticulously applied across sites. While the OEM equipment may be different, shouldn’t the process for change management be consistent? Same goes for the problem management process. Compliance is another area where consistency is expected.
The challenge with projecting any standard is to efficiently create evidence of acceptance and verification. We try to create a working feedback loop, and we are always looking for ways to do it better. We can centrally document standard policies and procedures, but we rely on field acceptance of the standard, and we leverage our systems to measure execution versus expectation. We can say please complete work orders on time and to the following spec, and we can delegate scheduling to the field, but the loop isn’t complete until we confirm execution and offer feedback on whether the work and documentation were acceptable.
What technology or methodology has helped your organization to significantly improve data center management?
Jenkins: Our standard building management system (BMS) is a Niagara™ product with an open framework. This allows our legacy equipment to talk over open protocols. All of our dashboards and data look the same and feel the same across all of the sites so that anybody could pull up another site and it would look the same to the operator.
Sheputis: Whatever system you’re using, there has to be a high premium on keeping it open. If you run on a closed system, it eventually becomes a lost island. This is especially true as you scale your operation. You have to have open systems.
How does your organization use the M&O Stamp?
Sheputis: The M&O stamp is one of the most important things we have ever achieved. And I’m not saying this to flatter you or the Uptime Institute. We believe data center operations are very important, and we have always believed we were pretty good. But I have to believe that many operators think they do a good job as well. So who is right? How does anyone really know? The challenge to the casual observer is that the data center industry is fairly closed. Operations are secure and private.
We started the process to see how good we were, and if we were good, we also thought it would be great to have a credible third party to acknowledge that. Saying I think I’m good is one thing; having a credentialed organization like Uptime Institute say so is much more.
But the M&O process is more than the Stamp of Approval. Our operations have matured and improved by participating in this process. Every year that we reassess and recertify, we feel like we learn new things, and we’re tracking our progress. The bigger benefit may be that the process forces us to think procedurally. When we’re setting up a new site, it helps us set a roadmap for what we want to achieve. Compared to all other forms of certification, we get something out of this beyond the credential; we get a path to improve.
Jenkins: Lots of people run a SWOT (strengths, weaknesses, opportunities, and threats) analysis or internal audit, but that feedback often lacks external reference points. You can give yourself an audit, and you can say “we’re great.” But what are you learning? How do you expand your knowledge? The M&O Stamp of Approval provides learning opportunities for us by providing a neutral experienced outsider viewpoint on where, and more importantly, how we can do better.
On one of the assessments, one of Uptime Institute’s consultants demonstrated how we could set up our chiller plant so that an operator could see all the key variables easily at a glance, with fewer steps to see what valves are open or closed. The advice was practical and easy to implement: markers on a chain, little flags on a chiller, LED lights on a pump. Very simple things to do, but we hadn’t thought of them. They’d seen it in Europe, it was easy to do, and it helps. That’s one specific example, but we used the knowledge of the M&O team to help us grow.
We think the M&O criteria and content will get better and deeper as time goes on. This is a solid standard for people to grow on.
Sheputis: We are for certifications, as they remove doubt, but most of the work and value is had in obtaining the first certification. I can see why others are cynical about value and cost to recertify. But I do think there’s real value in the ongoing M&O certification, mainly because it shows continuous improvement. No other certification process does that.
Jenkins: A lot of certifications are binary in that you pass if you have enough checked boxes—the content is specific, but operationally shallow. We feel that we get a lot more content out of the M&O process.
Sheputis: As I said before, we are for compliance and transparency. As we are often fulfilling a compliance requirement for someone else, there is clear value in saying we are PCI compliant or SSAE certified. But the M&O Stamp of Approval process is more like seeing a professional instructor. All other certifications should address the M&O stamp as “Sir.”
Matt Stansberry
Matt Stansberry is director of Content and Publications for the Uptime Institute and also serves as program director for the Uptime Institute Symposium, an annual spring event that brings together 1,500 stakeholders in enterprise IT, data center facilities, and corporate real estate to deal with the critical issues surrounding enterprise computing. He was formerly editorial director for Tech Target’s Data Center and Virtualization media group, and was managing editor of Today’s Facility Manager magazine. He has reported on the convergence of IT and Facilities for more than a decade.
The Calibrated Data Center: Using Predictive Modeling
Better information leads to better decisions
By Jose Ruiz
New tools have dramatically enhanced the ability of data center operators to base decisions regarding capacity planning and operational performance, such as moves, adds, and changes, on actual data. The combined use of modeling technologies to effectively calibrate the data center during the commissioning process and the use of these benchmarks in modeling prospective configuration scenarios enable end users to optimize the efficiency of their facilities prior to the movement or addition of a single rack.
Data center construction is expected to continue growing in coming years to house the compute and storage capacity needed to support the geometric increases in data volume that will characterize our technological environment for the foreseeable future. As a result, data center operators will find themselves under ever-increasing pressure to fulfill dynamic requirements in the most optimized environment possible. Every kilowatt (kW) of cooling capacity will become increasingly precious, and operators will need to understand the best way to deliver it proactively.
As Uptime Institute’s Lee Kirby explains in Start With the End in Mind, a data center’s ongoing operations should be the driving force behind its design, construction, and commissioning processes.
This paper examines performance calibration and its impact on ongoing operations. To maximize data center resources, Compass performs a variety of analyses using Future Facilities’ 6SigmaDC and Romonet’s Software Suite. In the sections that follow, I will discuss predictive modeling during data center design, the commissioning process, and finally, how the calibration process validates the predictive models. Armed with the calibrated model, a customer can study the impact of proposed modifications on data center performance before any IT equipment is physically installed in the data center. This practice helps data center operators account for the three key elements during facility operations: availability, capacity, and efficiency. Compass calls this continuous modeling.
Figure 1. CFD software creates a virtual facility model and studies the physics of the cooling and power elements of the data center
What is a Predictive Model?
A predictive model, in a general sense, combines the physical attributes and operating data of a system and uses them to calculate a future outcome. The 6Sigma model provides a complete 3D representation of a data center at any given point in its life cycle. Combining the physical elements of IT equipment, racks, cables, air handling units (AHUs), power distribution units (PDUs), etc., with computational fluid dynamics (CFD) and power modeling enables designers and operators to predict the impact of their configuration on future data center performance. Compass uses commercially available performance modeling and CFD tools to model data center performance in the following ways:
• CFD software creates a virtual facility model and studies the physics of the cooling and power elements of the data center (see Figure 1).
• The modeling tool interrogates the individual components that make up the data center and compares their actual performance with the initial modeling prediction.
This proactive modeling process allows operators to fine tune performance and identify potential operational issues at the component level. A service provider, for example, could use this process to maximize the sellable capacity of the facility and/or its ability to meet the service level agreements (SLA) requirements for new as well as existing customers.
Case Study Essentials
For the purpose of this case study all of the calibrations and modeling are based upon Compass Data Center’s Shakopee, MN, facility with the following specifications (see Figure 2):
• 13,000 square feet (ft2) of raised floor space
• No columns on the data center floor
• 12-foot (ft) false ceiling used as a return air plenum
• 36-inch (in.) raised floor
• 1.2 megawatt (MW) of critical IT load
• Four rooftop air handlers in an N+1 configuration
• 336 perforated tiles (25% open) with dampers installed
• Customer type: service provider
Figure 2. Data center room with rooftop AHUs
Cooling Baseline
The cooling system of this data center comprises four 120-ton rooftop air handler units in an N+1 configuration (see Figure 3). The system provides a net cooling capacity that a) supports the data center’s 1.2-MW power requirement and b) delivers 156,000 cubic feet per minute (CFM) of airflow to the white space. The cooling units are controlled based on the total IT load present in the space. This method turns on AHUs as the load increases. Table 1 describes the scheme.
These units have outside air economizers to leverage free cooling and increase efficiency. For the purpose of the calibration, the system was set to full recirculation mode with the outside air economization feature turned off. This allows the cooling system to operate at 100% mechanical cooling, which is representative of a standard operating day under the Design Day conditions.
Figure 3. Rooftop AHUs
Figure 4. Cabinet and perforated tile layout. Note: Upon turnover, the customer is responsible for racking and stacking the IT equipment.
Cabinet Layout
The default cabinet layout is based on a standard Cold Aisle/Hot Aisle configuration (see Figure 4).
Airflow Delivery and Extraction
Because the cooling units are effectively outside the building, a long opening on one side of the room serves as a supply air plenum. The air travels down the 36-in.-wide plenum to a patent-pending air dam before entering the raised floor. The placement of the air dam ensures even pressurization of the raised floor during both normal and maintenance failure modes. Once past the air dam, the air enters a 36-in. raised floor and is released into the space above the floor through 336 perforated tiles (25% open) (see Figure 5).
Figure 5. Airflow
Hot air from the servers then passes through ventilation grilles placed in the 12-ft false ceiling.
Commissioning and Calibration
Commissioning is a critical step in the calibration process because it eliminates extraneous variables that may affect subsequent reporting values. Upon the completion of the Integrated Systems Testing (IST), the calibration process begins. This calibration exercise is designed to enable the data center operator to compare actual data center performance against the modeled values.
Figure 6. Inconsistencies between model values and actual performance can be explored and examined prior to placing the facility into actual operation. These results provide a unique insight into whether the facility will operate as per the design intent in the local climate.
The actual process consists of conducting partial load tests in 25% increments and monitoring actual readings from specific building management system points, sensors, and devices that account for all the data center’s individual components.
Figure 7. Load bank and PDUs during the test
As a result of this testing, inconsistencies between model values and actual performance can be explored and examined prior to placing the facility into actual operation. These results provide a unique insight into whether the facility will operate as per the design intent in the local climate or whether there are issues that will affect future operation that must be addressed. Figure 6 shows the process. Figure 7 shows load banks and PDUs as arranged for testing.
Table 2. Tests performed during calibration
All testing at Shakopee was performed by a third-party entity to eliminate the potential for any reporting bias in the testing. The end result of this calibration exercise is that the operator now has a clear understanding of the benchmark performance standards unique to their data center. This provides specific points of reference for all future analysis and modeling to determine the prospective performance impact of site moves, adds, or changes. Table 2 lists the tests performed during the calibration.
Table 3. Perforated tile configuration during testing
During the calibration, dampers on an appropriate number of tiles were closed in proportion to the load step. Table 3 shows the perforated tile damper configuration used during the test.
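As a rough illustration of the load steps and proportional damper settings described above, the following Python sketch derives the IT load and the number of open tiles at each 25% step. The strictly linear tile count is an assumption made here for illustration only (Table 3 lists the configuration actually used), and the function name `calibration_steps` is hypothetical.

```python
# Sketch only: assumes open tiles scale linearly with the load step.
# The site values (1,200 kW critical load, 336 tiles) come from the case study.
CRITICAL_LOAD_KW = 1200
TOTAL_TILES = 336

def calibration_steps(increment_pct=25):
    """Return (load %, IT load in kW, open tiles) for each load step."""
    steps = []
    for pct in range(increment_pct, 101, increment_pct):
        load_kw = CRITICAL_LOAD_KW * pct / 100
        open_tiles = round(TOTAL_TILES * pct / 100)
        steps.append((pct, load_kw, open_tiles))
    return steps

for pct, load_kw, tiles in calibration_steps():
    print(f"{pct:>3}% load: {load_kw:6.0f} kW, {tiles} tiles open")
```

Keeping the site constants at the top makes it easy to rerun the same steps for a facility with a different critical load or tile count.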
Table 4. CPM goals, test results, and potential adjustments
Analysis & Results
To properly interpret the results of the initial calibration testing, it’s important to understand the concept of cooling path management (CPM), which is the process of stepping through the full route taken by the cooling air and systematically minimizing or eliminating potential breakdowns. The ultimate goal of this exercise is meeting the air intake requirement for each unit of IT equipment. The objectives and associated changes are shown in Table 4.
Cooling paths are influenced by a number of variables, including the room configuration, IT equipment and its arrangement, and any changes that will fundamentally change the cooling paths. In order to proactively avoid cooling problems or inefficiencies that may creep in over time, CPM is, therefore, essential to the initial design of the room and to configuration management of the data center throughout its life span.
AHU Fans to Perforated Tiles (Cooling Path #1). CPM begins by tracing the airflow from the source (AHU fans) to the returns (AHU returns). The initial step consists of investigating the underfloor pressure. Figure 8 shows the pressure distribution in the raised floor. In this example, the underfloor pressure is uniform from the very onset, thereby ensuring an even flow rate distribution.
Figure 8. Pressure distribution in the raised floor
From a calibration perspective, Figure 9 demonstrates that the results obtained from the simulation are aligned with the data collected during commissioning/calibration testing. The average underfloor pressure captured by software during the commissioning process was 0.05 in. of H2O as compared to 0.047 in. of H2O predicted by 6SigmaDC.
The airflow variation across the 336 perforated tiles was determined to be 51 CFM. These data guaranteed an average target cooling capacity of 4 kW/cabinet compared to the installed 3.57 kW/cabinet (assuming that the data center operator uses the same type of perforated tiles as those initially installed). In this instance, the calibration efforts provided the benchmark for ongoing operations, and verified that the customer target requirements could be fulfilled prior to their taking ownership of the facility.
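The figures quoted above can be checked with a few lines of Python. This is a sketch, not part of the calibration tooling: the variable names are illustrative, and the assumption that the 3.57 kW/cabinet figure corresponds to one cabinet per perforated tile is an inference from the numbers rather than something the article states.

```python
# Values quoted in the text; variable names are illustrative.
measured_pressure = 0.05    # in. of H2O, captured during commissioning
simulated_pressure = 0.047  # in. of H2O, predicted by the model

# Relative deviation of the simulation from the measurement
deviation = abs(simulated_pressure - measured_pressure) / measured_pressure
print(f"Underfloor pressure deviation: {deviation:.1%}")

# Installed density, ASSUMING one cabinet per perforated tile (336 tiles)
installed_kw_per_cabinet = 1200 / 336
print(f"Installed density: {installed_kw_per_cabinet:.2f} kW/cabinet")
```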
The important takeaway in this example is the ability of calibration testing to not only validate that the facility is capable of supporting its initial requirements but also to offer the end user a cost-saving mechanism to determine the impact of proposed modifications on the site’s performance, prior to their implementation. In short, hard experience no longer needs to be the primary mode of determining the performance impact of prospective moves, adds, and changes.
Table 5. Airflow simulations and measured results
During the commissioning process, all 336 perforated tiles were measured.
Table 5 is a results comparison of the measured and simulated flow from the perforated tiles.
Table 6. Airflow distribution at the perforated tiles
The results show a 1% error between measured and simulated values. Let’s take a look at the flow distribution at the perforated tiles (see Table 6).
The flows appear to match up quite well. It is worth noting that the locations of the minimum and maximum flows are different between measured and simulated values. However, this is not of concern as the flows are within an acceptable margin of error. Any large discrepancy (>10%) between simulated and measured values would warrant further investigation (see Table 7). The next step in the calibration process examined the AHU supply temperatures.
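The 10% rule of thumb mentioned above lends itself to a simple automated check. The sketch below is one possible way to flag measurement points for follow-up. The function name `flag_discrepancies` and the sample readings are hypothetical and are not taken from the Shakopee data.

```python
def flag_discrepancies(measured, simulated, threshold=0.10):
    """Return the names whose simulated value differs from the measured
    value by more than `threshold` (as a fraction of the measurement)."""
    flagged = []
    for name, measured_value in measured.items():
        error = abs(simulated[name] - measured_value) / abs(measured_value)
        if error > threshold:
            flagged.append(name)
    return flagged

# Hypothetical readings in CFM, invented purely to exercise the function
measured = {"tile_A": 400.0, "tile_B": 420.0, "tile_C": 380.0}
simulated = {"tile_A": 404.0, "tile_B": 470.0, "tile_C": 377.0}
print(flag_discrepancies(measured, simulated))
```

The same function could be applied to any pair of measured and simulated series, such as AHU supply or return temperatures, by passing a different threshold.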
Perforated Tiles to Cabinets (Cooling Path #2). Perforated tile to cabinet airflow (see Figure 10) is another key point of reference that should be included in calibration testing and determination. Airflow leaving the perforated tiles enters the inlets of the IT equipment with minimal bypass.
Figure 9. Simulated flow through the perforated tiles
Figure 10. The blue particles cool the IT equipment, but the gray particles bypass the equipment.
Figure 10 shows how effective the perforated tiles are in terms of delivering the cold air to the IT equipment. The blue particles cool the IT equipment, while the gray particles bypass the equipment.
A key point of this testing is the ability to proactively identify solutions that can increase efficiency. For example, during this phase, testing helped determine that reducing fan speed would improve the site’s efficiency. As a result, the AHU fans were fitted with variable frequency drives (VFDs), which enable Compass to more effectively regulate this grille-to-cabinet airflow.
Figure 11. Inlet temperatures
It was also determined that inlet temperatures to the cabinets were at the lower end of the ASHRAE allowable range (see Figure 11), creating the potential to raise the air temperature within the room for operations. If the operator takes action and raises the supply air temperature, they will have immediate efficiency gains and see significant cost savings.
Table 8. Savings estimates based on IT loads
The analytical model can estimate these savings quickly. Table 8 shows the estimated annual cost savings based on IT load, supply air temperature setting for the facility and a power cost of seven cents per kilowatt-hour (U.S. national average). It is important to note the location of the data center because the model uses specific EnergyPlus TMY3 weather files published by the U.S. Department of Energy for its calculation.
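For a rough sense of scale, the energy-cost arithmetic behind such an estimate can be sketched as follows. This is only a back-of-the-envelope illustration: the 10-kW average reduction in cooling power is a hypothetical placeholder, not a result from Table 8, and the function name `annual_savings_usd` is invented here. The calibrated model itself accounts for weather data and operating conditions that this sketch ignores.

```python
HOURS_PER_YEAR = 8760
POWER_COST_USD_PER_KWH = 0.07  # seven cents per kWh, as quoted in the text

def annual_savings_usd(avg_kw_reduction, rate=POWER_COST_USD_PER_KWH):
    """Annual cost avoided by a constant average power reduction (kW)."""
    return avg_kw_reduction * HOURS_PER_YEAR * rate

# Hypothetical input: 10 kW less cooling power on average over the year
print(f"${annual_savings_usd(10):,.0f} per year")
```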
Figure 12. Cooling path three tracks airflow from the equipment exhaust to the returns of the AHU units
Cabinet Exhaust to AHU Returns (Cooling Path #3). Cooling path three tracks airflow from the equipment exhaust to the returns of the AHU units (see Figure 12). In this case, calibration testing found that the inlet temperatures indicated very little external or internal cabinet recirculation. The return temperatures and the capacities of the AHU units are fairly uniform. The table shows the comparison between measured and simulated AHU return temperatures:
Looking at the percentage cooling load utilized for each AHU unit, the measured load was around 75% and the simulated values show an average value of 80% for each AHU. This slight discrepancy was acceptable given the differences between the measured and simulated supply and return temperatures, thereby establishing the acceptable parameters for ongoing operation within the site.
Introducing Continuous Modeling
Up to this point, I have illustrated how calibration efforts can be used to both verify the suitability of the data center to successfully perform as originally designed and to prescribe the specific benchmarks for the site. This knowledge can be used to evaluate the impact of future operational modifications, which is the basis of continuous modeling.
The essential value of continuous modeling is its ability to facilitate more effective capacity planning. By modeling prospective changes before moving IT equipment in, a lot of important what-ifs can be answered (and costs avoided) while meeting all the SLA requirements.
Examples of continuous modeling applications include, but are not limited to:
• Creating custom cabinet layouts to predict the impact of various configurations
• Increasing cabinet power density or modeling custom cabinets
• Modeling Cold Aisle/Hot Aisle containment
• Changing the control systems that regulate VFDs to move capacity where needed
• Increasing the air temperature safely without breaking the temperature SLA
• Investigating upcoming AHU maintenance or AHU failures that can’t be achieved in a production environment
In each of these applications, the appropriate modeling tools are used in concert with initial calibration data to determine the best method of implementing a desired change. The ability to proactively identify the level of deviation from the site’s initial system benchmarks can aid in the identification of more effective alternatives that not only improve operational performance but also reduce the time and cost associated with their implementation.
Case History: Continuous Modeling
Total airflow in the facility described in this case study is based on the percentage of IT load in the data hall with a design criterion of 25°F (14°C) ∆T. Careful tile management must be practiced in order to maintain proper static pressure under the raised floor and avoid potential hot spots. Using the calibrated model, Compass created two scenarios to understand the airflow behavior. This resulted in installing fewer perforated tiles than originally planned and better SLA compliance. Having the calibrated model gave a higher level of confidence for the results. The two scenarios are summarized below.
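Before turning to the scenarios, the design airflow can be sanity-checked against the 25°F ∆T using the common sensible-heat rule of thumb for standard air (heat in BTU/hr = 1.08 × CFM × ∆T in °F). The sketch below applies that general formula. It is not necessarily how the facility's airflow was sized, and the function name `required_airflow_cfm` is hypothetical.

```python
BTU_PER_HR_PER_KW = 3412.14   # 1 kW expressed in BTU/hr
STANDARD_AIR_FACTOR = 1.08    # BTU/hr per (CFM x deg F) for standard air

def required_airflow_cfm(it_load_kw, delta_t_f):
    """Airflow needed to carry away `it_load_kw` at a given air delta-T (deg F)."""
    heat_btu_per_hr = it_load_kw * BTU_PER_HR_PER_KW
    return heat_btu_per_hr / (STANDARD_AIR_FACTOR * delta_t_f)

estimate = required_airflow_cfm(1200, 25)
print(f"Estimated airflow: {estimate:,.0f} CFM (design value: 156,000 CFM)")
```

With these inputs the estimate lands within a few percent of the 156,000-CFM design figure, which suggests the stated airflow and ∆T are mutually consistent.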
Figure 13. Case history equipment layout
Scenario 1: Less Than Ideal Management
There are 72 4-kW racks in one area of the raised floor and six 20-kW racks in the opposite corner (see Figure 13). The total IT load is 408 kW, which is equal to 34% of the total IT load available. The total design airflow at 1,200 kW is 156,000 CFM, meaning the total airflow delivered in this example is 53,040 CFM. A leakage rate of 12% is assumed, which means that 88% of the 53,040 CFM is distributed using the perforated tiles. Perforated tiles were provided in front of each rack. The 25% open tiles were used in front of the 4-kW racks and Tate GrateAire tiles were used in front of the 20-kW racks.
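The arithmetic in this scenario can be reproduced in a short Python sketch. All input values come from the paragraph above; only the variable names are introduced here for illustration.

```python
DESIGN_LOAD_KW = 1200
DESIGN_AIRFLOW_CFM = 156_000
LEAKAGE = 0.12  # fraction of delivered air assumed lost to leakage

# Rack groups as (count, kW per rack)
racks = {"4-kW racks": (72, 4), "20-kW racks": (6, 20)}

it_load_kw = sum(count * kw for count, kw in racks.values())
load_fraction = it_load_kw / DESIGN_LOAD_KW
delivered_cfm = DESIGN_AIRFLOW_CFM * load_fraction
tile_cfm = delivered_cfm * (1 - LEAKAGE)

print(f"IT load: {it_load_kw} kW ({load_fraction:.0%} of design)")
print(f"Airflow delivered: {delivered_cfm:,.0f} CFM")
print(f"Airflow through perforated tiles: {tile_cfm:,.0f} CFM")
```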
Figure 14. Scenario 1 data hall temperatures
The results of Scenario 1 demonstrate the temperature differences between the hot and cold aisles. For the area with 4-kW racks there is an average temperature difference of around 10°F (5.5°C) between the hot and cold aisles, and the 20-kW racks have a temperature difference of around 30°F (17°C) (see Figure 14).
Scenario 2: Ideal Management
In this scenario, the racks were left in the same location, but the perforated tiles were adjusted to better distribute air based on the IT load. The 20-kW racks account for 120 kW of the total IT load while the 4-kW racks account for 288 kW of the total IT load. In an ideal floor layout, 29.4% of the airflow will be delivered to the 20-kW racks and 70.6% of the airflow will be delivered to the 4-kW racks. This will allow for an ideal average temperature difference across all racks.
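The airflow split described above follows directly from each rack group's share of the IT load. A minimal sketch, with the hypothetical helper name `airflow_shares`:

```python
def airflow_shares(loads_kw):
    """Split airflow across rack groups in proportion to their IT load."""
    total = sum(loads_kw.values())
    return {name: kw / total for name, kw in loads_kw.items()}

shares = airflow_shares({"20-kW racks": 120, "4-kW racks": 288})
for name, share in shares.items():
    print(f"{name}: {share:.1%} of airflow")
```

Because the shares are computed from load alone, the same helper can be reused when racks are added or densities change, before the layout is rerun in the CFD model.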
Figure 15. Scenario 2 data hall temperatures
Scenario 2 shows a much better airflow distribution than Scenario 1. The 20-kW racks now have around 25°F (14°C) difference between the hot and cold aisles (see Figure 15).
In general, it may stand to reason that if there are a total of 336 perforated tiles in the space and the space is running at 34% IT load, 114 perforated tiles should be open. The model showed, however, that if 114 perforated tiles were opened, the underfloor static pressure would drop off and potentially cause hot spots due to lack of airflow.
Furthermore, continuous modeling will allow operators a better opportunity to match growth with actual demand. Using this process, operators can validate capacity and avoid wasted capital expense due to poor capacity planning.
Conclusion
To a large extent, a lack of evaluative tools has historically forced data center operators to accept on faith their new data center’s ability to meet its design requirements. Recent developments in modeling applications not only address this long-standing shortcoming, but also provide operators with an unprecedented level of control. The availability of these tools provides end users with proactive analytical capabilities that manifest themselves in more effective capacity planning and efficient data center operation.
Table 9. Summary of the techniques used in each step of model development and verification
Through the combination of rigorous calibration testing, measurement, and continuous modeling, operators can evaluate the impact of prospective operational modifications prior to their implementation and ensure that they are cost-effectively implemented without negatively affecting site performance. This enhanced level of control is essential for effectively managing data centers in an environment that will continue to be characterized by its dynamic nature and increasing application complexity. Finally, Table 9 summarizes the reasons why these techniques are valuable and have a positive impact on data center operations.
Most importantly, all of these help the data center owner and operator make a more informed decision.
Jose Ruiz
Jose Ruiz is an accomplished data center professional with a proven track record of success. Mr. Ruiz serves as Compass Datacenters’ director of Engineering where he is responsible for all of the company’s sales engineering and development support activities. Prior to joining Compass, he spent four years serving in various sales engineering positions and was responsible for a global range of projects at Digital Realty Trust. Mr. Ruiz is an expert on CFD modeling.
Prior to Digital Realty Trust, Mr. Ruiz was a pilot in the United States Navy where he was awarded two Navy Achievement Medals for leadership and outstanding performance. He continues to serve in the Navy’s Individual Ready Reserve. Mr. Ruiz is a graduate of the University of Massachusetts with a degree in Bio-Mechanical Engineering.
Retainers Improve the Effectiveness of IEC Plugs
These small devices prevent accidental disconnection of mission critical gear
By Scott Good
Today IEC plugs are used at the rack-level PDU and the IT device. IEC plugs backing out of sockets create a significant concern, since these plugs feed UPS power to the device. In the past, twist-lock cord caps were used, but these did not address the connection of the IEC plug at the IT device. Retainers are a way the industry has addressed this problem.
In one case, Uptime Institute evaluated a facility in the Caribbean (a Tier Certified Constructed Facility) which was not using the retainers. While operators had checked all the connections two weeks earlier, when they isolated one UPS during the TCCF process, a single cord on a single device belonging to the largest customer was found to be loose and the device suffered an interruption of power.
The International Electrotechnical Commission (IEC) plug is the most common device used to connect rack-mounted IT hardware to power. In recent years, the use of IEC 60320 cords with IEC plugs has become more common, replacing twist-lock and field-constructed hard-wired IEC plug connections. During several recent site evaluations, Uptime Institute has observed that the IEC 60320 plug-in electrical cords may fit loosely and accidentally disconnect during routine site network maintenance. Some incidents have involved plugs that were not fully inserted at the connections to the power distribution units (PDUs) in the IT rack or became loose due to temperature fluctuations. This technical paper provides information on cable and connector installation methods that can be used to ensure a secure connection at the PDU.
IT Hardware Power Cables
The IEC publishes consensus-based international standards and manages conformity assessment systems for electric and electronic products, systems and services, collectively known as electrotechnology. The IEC 60320 standard describes the devices used to couple IT hardware to power systems. The plugs and cords described by this standard come in various configurations to meet the current and voltages found in each region. This standard is intended to ensure that proper voltage and current are provided to IT appliances wherever they are deployed (see http://www.iec.ch/worldplugs/?ref=extfooter).
The most common cables used to power standard PCs, monitors, and servers are designated C13 and C19. Cable connectors have male and female versions, with the female always carrying an odd number label. The male version carries the next higher even number as its designation. C19 and C20 connectors are becoming more common for use with servers and PDUs in high-power applications.
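The numbering convention described above (female connectors odd, the matching male version the next higher even number) can be expressed as a small lookup. This is a sketch of the naming rule only: the function names are hypothetical, and the code does not check that a given designation is actually defined in IEC 60320.

```python
import re


def parse_designation(designation):
    """Return the numeric part of a designation such as 'C13'."""
    match = re.fullmatch(r"C(\d+)", designation.strip().upper())
    if match is None:
        raise ValueError(f"not a C-type designation: {designation!r}")
    return int(match.group(1))


def is_female(designation):
    """Female connectors carry odd numbers under the convention above."""
    return parse_designation(designation) % 2 == 1


def mating_designation(designation):
    """Return the designation of the part that mates with the given one."""
    number = parse_designation(designation)
    # Odd (female) pairs with the next higher even number (male), and vice versa.
    partner = number + 1 if number % 2 == 1 else number - 1
    return f"C{partner}"


for cable_end in ("C13", "C19"):
    print(f"{cable_end} mates with {mating_designation(cable_end)}")
```

For the two connectors named in the text, this pairs C13 with C14 and C19 with C20.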
Most standard PCs have a C14 male inlet that accepts a C13 female cable end; the other end of the cord set is a standard 5-15 plug that plugs into a 120-volt (V) outlet. In U.S. data centers, a C14/C13 coupler includes a C14 (male) end that plugs into a PDU and a C13 (female) end that plugs into the server. Couplers in EU data centers also include C13s at the IT appliance end but have different male connectors to the PDU. These male ends are identified as C or CEE types. For example, the CEE /7 has two rounded prongs and provides power at 220 V.
IEC Plug Installation Methods
In data centers, PDUs are typically configured to support dual-corded IT hardware. Power cords are plugged into PDU receptacles that are powered from A and B power sources. During installation, installers typically plug a cable coupler in a server outlet first and then into a PDU.
Figure 1. Coiled cable
Sometimes the cord is longer than the distance between the server outlet and the PDU, so the installer will coil the cable and secure the coil with cable ties or Velcro (see Figures 1 and 2). This practice adds weight on the cable and stress to the closest connection, which is at the PDU. If the connection at the PDU is not properly supported, the connector can easily pull or fall out during network maintenance activity. Standard methods for securing PDU connections include cable retention clips, plug lock inserts, and the IEC Plug Lock and IEC Lock Plus connectors.
Figure 2. Velcro ties
Cable retention clips are the original solution developed for IT hardware cable installations. These clips are manufactured to install at the connection point and clip to retention receptacles on the side of the PDU. Supports on the PDU receive the clip and hold the connector in the receptacle slot (see Figure 3).
Figure 3. A retention clip to PDU in use
Plug lock inserts prevent power cords from accidentally disconnecting from C13 output receptacles (see Figure 4). A plug lock insert placed over any C14 input cord strengthens the connection of the plug to the C13 outlet, keeping critical equipment plugged in and running during routine rack access and maintenance.
Figure 4. Plug lock
C13 and C19 IEC Lock connectors include lockable female cable ends suitable for use with standard C14 or C20 outlets. They cannot be accidentally dislodged or vibrated out of the outlets (see Figure 5).
The IEC Plug Lock and IEC Lock Plus are also alternatives. Both products have an integral locking mechanism that secures C13 and C19 plugs to the power pins of all C13 and C19 outlets.
Summary
In recent years, manufacturers of IEC plugs have developed technologies in new and existing plug and cable products to help mitigate the issue of plugs working their way out of the sockets on both IT hardware and PDU power feeds.
Figure 5. IEC plug lock
As these connections are audited in the data center, it is good practice to see where these conditions exist or could be created. Having a plan to change out older style and suspect cables will help mitigate or avoid incidents during maintenance and change processes in data centers.
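One way to structure such an audit is to record each cord and flag the conditions discussed in this paper. The sketch below is purely illustrative: the record fields, category names, and sample data are all hypothetical, and a real audit would follow the site's own procedures.

```python
from dataclasses import dataclass
from typing import List

# Retention categories based on the methods named in this paper;
# "none" marks a connection with no retention device.
RETENTION_METHODS = {"retention clip", "plug lock insert", "locking connector", "none"}


@dataclass
class CordRecord:
    rack: str
    device: str
    feed: str            # "A" or "B" power source
    retention: str       # one of RETENTION_METHODS
    coiled_excess: bool  # True if surplus cord is coiled and tied near the PDU


def audit_findings(cord: CordRecord) -> List[str]:
    """Return the reasons, if any, that a cord should be reviewed."""
    if cord.retention not in RETENTION_METHODS:
        raise ValueError(f"unknown retention method: {cord.retention!r}")
    findings = []
    if cord.retention == "none":
        findings.append("no retention device at the PDU connection")
    if cord.coiled_excess:
        findings.append("coiled excess cord adds weight and stress at the PDU")
    return findings


# Hypothetical sample records for a dual-corded server.
cords = [
    CordRecord("R01", "server-01", "A", "retention clip", False),
    CordRecord("R01", "server-01", "B", "none", True),
]
for cord in cords:
    for finding in audit_findings(cord):
        print(f"{cord.rack}/{cord.device} feed {cord.feed}: {finding}")
```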
Scott Good
Scott Good is a senior consultant of Uptime Institute Professional Services, facilitating prospective engagements and delivering Tier Topology and Facilities Certifications to contracted clients. Mr. Good has been in the data center industry for more than 25 years and has developed data center programs for enterprise clients globally. He has been involved in the execution of Tier programs in alignment with the Uptime Institute and was one of the first to be involved in the creation of the original Tier IV facilities. Mr. Good developed and executed a systematic approach to commissioning these facilities, and the processes he created are used by the industry to this day.