Achieving Uptime Institute Tier III Gold Certification of Operational Sustainability

Vantage Data Centers certifies design, facility, and operational sustainability at its Quincy, WA site

By Mark Johnson

In February 2015, Vantage Data Centers earned Tier III Gold Certification of Operational Sustainability (TCOS) from Uptime Institute for its first build at its 68-acre Quincy, WA campus. This project is a bespoke design for a customer that expects a fully redundant, mission critical, and environmentally sensitive data center environment for its company business and mission critical applications.

Achieving TCOS verifies that practices and procedures (according to the Uptime Institute Tier Standard: Operational Sustainability) are in place to avoid preventable errors, maintain IT functionality, and support effective site operation. The Tier Certification process ensures operations are in alignment with an organization’s business objectives, availability expectations, and mission imperatives. The Tier III Gold TCOS provides evidence that the 134,000-square-foot (ft²) Quincy facility, which achieved Tier III Certification of Constructed Facility (TCCF) in September 2014, would meet the customer’s operational expectations.

Vantage believes that TCOS is a validation that its practices, procedures, and facilities management are among the best in the world. Uptime Institute professionals verified not only that all the essential components for success are in place but also that each team member demonstrates tangible evidence of adhering strictly to procedure. It also provides verification to potential tenants that everything from maintenance practices to procedures, training, and documentation is done properly.

Recognition at this level is a career highlight for data center operators and engineers—the equivalent of receiving a 4.0-grade-point average from Vantage’s most elite peers. This recognition of hard work is a morale booster for everyone involved—including the tenant, vendors, and contractors, who all worked together and demonstrated a real commitment to process in order to obtain Tier Certification at this level. This commitment from all parties is essential to ensuring that human error does not undermine the capital investment required to build a 2N+1 facility capable of supporting up to 9 megawatts of critical load.

Data centers looking to achieve TCOS (for Tier-track facilities) or Uptime Institute Management & Operations (M&O) Stamp of Approval (independent of Tiers) should recognize that the task is first and foremost a management challenge involving building a team, training, developing procedures, and ensuring consistent implementation and follow-up.

BUILDING THE RIGHT TEAM

The right team is the foundation of an effectively run data center. Assembling the team was Vantage’s highest priority and required a careful examination of the organization’s strengths and weaknesses, culture, and appeal to prospective employees.

Having a team of skilled heating, ventilation and air conditioning (HVAC) mechanics, electricians, and other highly trained experts in the field is crucial to running a data center effectively. Vantage seeks technical expertise but also demonstrable discipline, accountability, responsibility, and drive in its team members.

Beyond these must-have features is a subset of nice-to-have characteristics, and at the top of that list is diversity. A team that includes diverse skill sets, backgrounds, and expertise not only ensures a more versatile organization but also enables more work to be done in-house. This is a cost saving and quality control measure, and yet another way to foster pride and ownership in the team.

Time invested upfront in selecting the best team members helps reduce headaches down the road and gives managers a clear reference for what an effective hire looks like. A poorly chosen hire costs more in the long run, even if it seems like an urgent decision in the moment, so a rigorous, competency-based interview process is a must. If the existing team does not unanimously agree on a potential hire, organizations must move on and keep searching until the right person is found.

Recruiting is a continuous process. The best time to look for top talent is before it’s desperately needed. Universities, recruiters, and contractors can be sources of local talent. The opportunity to join an elite team can be a powerful inducement to promising young talent.

TRAINING

Talent, by itself, is not enough. It is just as important to train the employees who represent the organization. Like medicine or finance, the data center world is constantly evolving—standards shift, equipment changes, and processes are streamlined. Training is both about certification (external requirements) and ongoing learning (internal advancement and education). To accomplish these goals, Vantage maintains and mandates a video library of training modules at its facilities in Quincy and Santa Clara, CA. In addition, the company has developed an online learning management system that augments safety training, on-site video training, and personnel qualifications standards that require every employee to be trained on every piece of equipment on site.

The first component of a successful training program is fostering on-the-job learning in every situation. Structuring on-the-job learning requires that senior staff work closely with junior staff and that employees with different types and levels of expertise pair up to learn from one another. Having a diverse hiring strategy can lead to the creation of small educational partnerships.

It’s impossible to ensure the most proficient team members will be available for every problem and shift, so it’s essential that all employees have the ability to maintain and operate the data center. Data center management should encourage and challenge employees to try new tasks and require peer reviews to demonstrate competency. Improving overall competency reduces over-dependence on key employees and helps encourage a healthier work-life balance.

Formalized, continuous training programs should be designed to evaluate and certify employees using a multi-level process through which varying degrees of knowledge, skill, and experience are attained. The objectives are ensuring overall knowledge, keeping engineers apprised of any changes to systems and equipment, and identifying and correcting any knowledge shortfalls.

PROCEDURES

Ultimately, discipline and adherence to fine-tuned procedures are essential to operational excellence within a data center. The world’s best-run data centers even have procedures on how to write procedures. Any element that requires human interaction or consideration—from protective equipment to approvals—should have its own section in the operating procedures, including step-by-step instructions and potential risks. Cutting corners, while always tempting, should be avoided; data centers live and die by procedure.

Managing and updating procedures is equally important. For example, major fires broke out just a few miles from Vantage’s Quincy facility not long ago. The team carefully monitored and tracked the fires, noting that they were still several miles away and seemingly headed away from the site. That information, however, was not communicated directly to the largest customer at the site, which called in the middle of the night to ask about possible evacuation and the recovery plan. Vantage collaborated with the customer to develop a standardized system for emergency notifications, which it incorporated into its procedures, to mitigate the possibility of future miscommunications.

Once procedures are created, they should go through a careful vetting process involving a peer review, to verify the technical accuracy of each written step, including lockout/tagout and risk identification. Vetting procedures means physically walking on site and carrying out each step to validate the procedure for accuracy and precision.

Effective work order management is part of a well-organized procedure. Vantage’s work order management process:

• Predefines scope of service documents to stay ahead of work

• Manages key work order types, such as corrective work orders, preventive maintenance work orders, and project work orders

• Measures and reports on performance at every step

Maintaining regular, detailed reporting practices adds yet another layer of procedural security. A work order system can maintain and manage all action items. Reporting should be reviewed with the parties involved in each step, with everyone held accountable for the results and mistakes analyzed and rectified on an ongoing basis.
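
To make the work order lifecycle concrete, the sketch below models the order types and reporting steps described above as a simple data structure. It is illustrative only; the type names, statuses, and fields are hypothetical and do not represent Vantage's actual system.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum

# Illustrative only: a minimal model of the work order types and reporting
# steps described in the text, not a representation of Vantage's actual system.

class WorkOrderType(Enum):
    CORRECTIVE = "corrective"
    PREVENTIVE = "preventive maintenance"
    PROJECT = "project"

class Status(Enum):
    DRAFT = "draft"                    # scope of service predefined ahead of work
    PEER_REVIEWED = "peer reviewed"
    SCHEDULED = "scheduled"
    COMPLETED = "completed"
    CLOSED = "closed"                  # results reported and reviewed with all parties

@dataclass
class WorkOrder:
    order_id: str
    order_type: WorkOrderType
    equipment: str
    scope_of_service: str
    status: Status = Status.DRAFT
    history: list = field(default_factory=list)

    def advance(self, new_status: Status, note: str) -> None:
        """Record each step so performance can be measured and reported."""
        self.history.append((date.today(), self.status, new_status, note))
        self.status = new_status

# Example: a preventive maintenance order moving through the process
wo = WorkOrder("PM-0042", WorkOrderType.PREVENTIVE, "UPS-2A", "Quarterly UPS inspection")
wo.advance(Status.PEER_REVIEWED, "Procedure steps validated by a second engineer")
wo.advance(Status.SCHEDULED, "Customer notified prior to work")
```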

Peer review is also essential to maintaining quality methods of procedure (MOPs) and standard operating procedures (SOPs). As with training, pairing up employees for peer review processes helps ensure excellence at all stages.

IMPLEMENTATION AND DISCIPLINE

Disciplined enforcement of processes that are proven to work is the most important component of effective standards and procedures. Procedures are not there to be followed only when time allows or when it is convenient. For instance, if a contractor shows up on site without a proper work order or without having followed proper procedure, that is not an invitation to make an exception. Work must be placed on hold until procedures can be adhered to, with those who did not follow protocol bearing accountability for the delay.

For example, Vantage developed emergency operating procedures (EOPs) for any piece of equipment that could possibly fail. And, sure enough, an uninterruptible power supply (UPS) failed during routine maintenance. Because proper procedures had been developed and employees properly trained, they followed the EOP to the letter, solving the problem quickly and entirely eliminating human error from the process. The loads were diverted, the crisis averted, and everything was properly stabilized to work on the UPS system without fear of interrupting critical loads.

Similarly, proper preparation for maintenance procedures eliminates the risk of losing uptime during construction. Vantage develops and maintains scope of service documents for each piece of equipment in the data center, detailing what is required to maintain it. The same procedures for diverting critical loads for maintenance were used during construction to ensure the build didn’t interfere with critical infrastructure despite the load being moved more than 20 times.

Transparency and open communication between data center operators and customers while executing preventative maintenance is key. Vantage notifies the customer team at the Quincy facility prior to executing any preventative maintenance that may pose a risk to their data hall. The customer then puts in a snap record, which notifies their internal teams about the work. Following these procedures and getting the proper permissions ensures that the customer won’t be subjected to any uncontrolled risk and covers all bases should any unexpected issues arise.

When procedure breaks down and fails due to lack of employee discipline, it puts both the company and managerial staff in a difficult position. First, the lack of discipline undermines the effectiveness of the procedures. Second, management must make a difficult choice—retrain or replace the offending employee. For those given a second chance, managers put their own jobs on the line—a tough prospect in a business that requires to-the-letter precision at every stage.

To ensure that discipline is instilled deeply in every employee, it’s important that the team take ownership of every component. Vantage keeps all its work in-house and consistently trains its employees in multiple disciplines rather than outsourcing. This makes the core team better and more robust and avoids reliance on outside sources. Additionally, Vantage does not allow contractors to turn breakers on and off, because the company ultimately bears the responsibility of an interrupted load. Keeping everything under one roof and knowing every aspect of the data center inside and out is a competitive advantage.

Vantage’s accomplishment of Tier III Gold Certification of Operational Sustainability validates everything the company does to develop and support its operational excellence.


Mark Johnson

Mark Johnson is Site Operations Manager at Vantage Data Centers. Prior to joining Vantage, Mr. Johnson was data center facilities manager at Yahoo, where he was responsible for the critical facilities infrastructure for the Wenatchee and Quincy, WA, data centers. He was also a CITS Facilities Engineer at Level 3 Communications, where he was responsible for the critical facilities infrastructure for two Sunnyvale, CA, data centers. Before that, Mr. Johnson was an Engineer III at VeriSign, where he was responsible for two data centers, and a chief facilities engineer at Abovenet.

 

Economizers in Tier Certified Data Centers

Achieving the efficiency and cost savings benefits of economizers without compromising Tier level objectives
By Keith Klesner

In their efforts to achieve lower energy use and greater mechanical efficiency, data center owners and operators are increasingly willing to consider and try economizers. At the same time, many new vendor solutions are coming to market. In Tier Certified data center environments, however, economizers, just as any other significant infrastructure system, must operate consistently with performance objectives.

Observation by Uptime Institute consultants indicates that roughly one-third of new data center construction designs include an economizer function. Existing data centers are also looking at retrofitting economizer technologies to improve efficiency and lower costs. Economizers use external ambient air to help cool IT equipment. In some climates, the electricity savings from implementing economizers can be so significant that the method has been called “free cooling.” But, all cooling solutions require fans, pumps, and/or other systems that draw power; thus, the technology is not really free and the term economizers is more accurate.

When The Green Grid surveyed data centers larger than 2,500 square feet in 2011, 49% of the respondents (primarily U.S. and European facilities) reported using economizers and another 24% were considering them. In the last four years, these numbers have continued to grow. In virtually all climatic regions, adoption of these technologies appears to be on the rise. Uptime Institute has seen an increase in the use of economizers in both enterprise and commercial data centers, as facilities attempt to lower their power usage effectiveness (PUE) and increase efficiency. This increased adoption is due in large part to fears about rising energy costs (predicted to grow significantly in the next 10 years). In addition, outside organizations, such as ASHRAE, are advocating for greater efficiencies, and internal corporate and client sustainability initiatives at many organizations drive the push to be more efficient and reduce costs.
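
Because PUE figures anchor much of the discussion that follows, a brief reminder of the metric is useful. PUE is the ratio of total facility energy to the energy delivered to the IT equipment, so an economizer improves PUE by shrinking the cooling share of the numerator:

```latex
\mathrm{PUE} \;=\; \frac{E_{\text{total facility}}}{E_{\text{IT equipment}}} \;\ge\; 1
```

A PUE of 1.0 would mean every kilowatt-hour entering the facility reaches the IT equipment; real facilities always operate above 1.0.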

The marketplace includes a broad array of economizer solutions:

• Direct air cooling: Fans blow cooler outside air into a data center, typically through filters

• Indirect evaporative cooling: A wetted medium or water spray promotes evaporation to supply cool air into a data center

• Pumped refrigerant dry coolers: A closed-loop fluid, similar to an automotive radiator, rejects heat to external air and provides cooling to the data center

• Water-side economizing: Traditional cooling tower systems incorporating heat exchangers bypass chillers to cool the data center

IMPLICATIONS FOR TIER
Organizations that plan to utilize an economizer system and desire to attain Tier Certification must consider how best to incorporate these technologies into a data center in a way that meets Tier requirements. For example, Tier III Certified Constructed Facilities have Concurrently Maintainable critical systems. Tier IV Certified Facilities must be Fault Tolerant.

Some economizer technologies and/or their implementation methods can affect critical systems that are integral to meeting Tier Objectives. For instance, many technologies were not originally designed for data center use, and manufacturers may not have thought through all the implications.

For example, true Fault Tolerance is difficult to achieve and requires sophisticated controls. Detailed planning and testing is essential for a successful implementation. Uptime Institute does not endorse or recommend any specific technology solution or vendor; each organization must make its own determination of what solution will meet the business, operating, and environmental needs of its facility.

ECONOMIZER TECHNOLOGIES
Economizer technologies include commercial direct air rooftop units, direct air plus evaporative systems, indirect evaporative cooling systems, water-side economizers, and direct air plus dry cooler systems.

DIRECT AIR
Direct air units used as rooftop economizers are often the same units used for commercial heating, ventilation, and air-conditioning (HVAC) systems. Designed for office and retail environments, this equipment has been adapted for 24 x 7 applications. Select direct air systems also use evaporative cooling, but all of them combine direct air and multi-stage direct expansion (DX) or chilled water. These units require low capital investment because they are generally available commercially, service technicians are readily available, and the systems typically consume very little water. Direct air units also yield good reported PUE (1.30–1.60).

On the other hand, commercial direct air rooftop units may require outside air filtration, as many units do not have adequate filtration to prevent the introduction of outside air directly into critical spaces, which increases the risk of particulate contamination.

Outside air units suitable for mission critical spaces require the capability of 100% air recirculation during certain air quality events (e.g., high pollution events and forest or brush fires) that will temporarily negate the efficiency gains of the units.

Figure 1. A direct expansion unit with an air-side economizer unit provides four operating modes including direct air, 100% recirculation, and two mixed modes. It is a well-established technology, designed to go from full stop (no power) to full cooling in 120 seconds or less, and allowing for PUE as low as 1.30-1.40.

Because commercial HVAC systems do not always meet the needs of mission critical facilities, owners and operators must identify the design limitations of any particular solution. Systems may integrate critical cooling and the air handling unit or provide a mechanical solution that incorporates air handling and chilled water. These units will turn off the other cooling mechanism when outside air cooling permits. Units that offer dual modes are typically not significantly more expensive. These commercial units require reliable controls that ensure that functional settings align with the mission critical environment. It is essential that the controls sequence be dialed in and that all control possibilities be thoroughly tested and commissioned (see Figure 1). Commercial direct air rooftop units have been used successfully in Tier III and Tier IV applications (see Figure 2).


Figure 2. Chilled water with air economizer and wetting media provides nine operating modes including direct air plus evaporative cooling. With multiple operating modes, the testing regimen is extensive (required for all modes).

A key step in adapting commercial units to mission critical applications is considering the data center’s worst-case scenario. Most commercial applications are rated at 95°F (35°C), and HVAC units will typically allow some fluctuation in temperature and discomfort for workers in commercial settings. The temperature requirements for data centers, however, are more stringent. Direct air or chilled water coils must be designed for peak day—the location’s ASHRAE dry bulb temperature and/or extreme maximum wet bulb temperature. Systems must be commissioned and tested in Tier demonstrations for any event that would require 100% recirculation. If the unit includes evaporative cooling, makeup (process) water must meet all Tier requirements or the evaporative capacity must be excluded from the Tier assessment.

In Tier IV facilities, Continuous Cooling is required, including during any transition from utility power to engine generators. Select facilities have achieved Continuous Cooling using chilled water storage. In the case of one Tier IV site, the rooftop chilled water unit included very large thermal storage tanks to provide Continuous Cooling via the chilled water coil.

Controller capabilities and building pressure are also considerations. As these are commercial units, their controls are usually not optimized for the transition of power from the utility to the engine-generator sets and back. Typically, over- or under-pressure imbalance in a data center increases following a utility loss or mode change due to outside air damper changes and supply and exhaust fans starting and ramping up. This pressure can be significant. Uptime Institute consultants have even seen an entire wall blow out from over-pressure in a data hall. Facility engineers have to adjust controls for the initial building pressure and fine-tune them to adjust to the pressure in the space.

To achieve Tier III objectives, each site must determine if a single or shared controller will meet its Concurrent Maintainability requirements. In a Tier IV environment, Fault Tolerance is required in each operating mode to prevent a fault from impacting the critical cooling of other units. It is acceptable to have multiple rooftop units, but they must not rely on a single controller or a single weather sensor/control system component. It is important to have some form of distributed or lead-lag (master/slave) system to control these components, enabling them to operate in a coordinated fashion with no points of commonality. If any one component fails, the master control system will switch to the other unit, so that a fault will not impact critical cooling. For one Tier IV project demonstration, Uptime Institute consultants found additional operating modes while on site. Each required additional testing and controls changes to ensure Fault Tolerance.
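
As a sketch of the lead-lag principle described above, the following example shows two hypothetical rooftop-unit controllers, each with its own weather sensor, where a fault in the lead controller hands control to the lag unit. The controller names, thresholds, and interfaces are assumptions for illustration only, not any vendor's control sequence.

```python
# Illustrative lead-lag failover sketch (hypothetical controllers and sensors).
# It shows the principle of avoiding single points of commonality, not any
# specific vendor's control logic.

class UnitController:
    def __init__(self, name: str):
        self.name = name
        self.healthy = True

    def read_outdoor_temp(self) -> float:
        """Each unit reads its own dedicated weather sensor."""
        if not self.healthy:
            raise RuntimeError(f"{self.name}: sensor/controller fault")
        return 18.0  # placeholder reading, degrees C

def select_operating_mode(lead: UnitController, lag: UnitController,
                          economizer_limit_c: float = 21.0) -> str:
    """Try the lead controller first; on any fault, the lag controller takes
    over so a single failure cannot interrupt critical cooling."""
    for controller in (lead, lag):
        try:
            outdoor = controller.read_outdoor_temp()
            mode = "economizer" if outdoor < economizer_limit_c else "mechanical"
            return f"{controller.name} in control, {mode} mode"
        except RuntimeError:
            continue  # fall through to the redundant controller
    return "all controllers faulted: hold last safe mode (100% mechanical cooling)"

lead, lag = UnitController("RTU-1"), UnitController("RTU-2")
lead.healthy = False  # simulate a weather sensor fault on the lead unit
print(select_operating_mode(lead, lag))
```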

DIRECT AIR PLUS EVAPORATIVE SYSTEMS
One economizer solution on the market, along with many other similar designs, involves a modular data center with direct air cooling and wetted media fed from a fan wall. The fan wall provides air to a flooded Cold Aisle in a layout that includes a contained Hot Aisle. This proprietary solution is modular and scalable, with direct air cooling via an air optimizer. This system is factory built with well-established performance across multiple global deployments. The systems have low reported PUEs and excellent partial load efficiency. Designed as a prefabricated modular cooling system and computer room, the system comes with a control algorithm that is designed for mission critical performance.

These units are described as being somewhat like a “data center in a box” but without the electrical infrastructure, which must be site designed to go with the mechanical equipment. Cost may be another disadvantage, and there have been no deployments to date in North America. In operation, the system determines whether direct air or evaporative cooling is appropriate, depending upon external temperature and conditions. Air handling units are integrated into the building envelope rather than placed on a rooftop.

Figure 3. Bladeroom’s prefabricated modular data center uses direct air with DX and evaporative cooling.

One company has used a prefabricated modular data center solution with integrated cooling optimization between indirect, evaporative, and DX cooling in Tier III facilities. In these facilities, a DX cooling system provides redundancy to the evaporative cooler. If there is a critical failure of the water supply to the evaporative cooler (or of the water pump, which is monitored by a flow switch), the building management system starts DX cooling and puts the air optimizer into full recirculation mode. In this setup, from a Tier objective perspective, the evaporative system and the water system supporting it are not critical systems. Fans are installed in an N+20% configuration to provide resilience. The design plans for DX cooling less than 6% of the year at the installations in Japan and Australia; for the remainder of the year, the DX system acts as redundant mechanical cooling able to meet 100% of the IT capacity. The redundant mechanical cooling system itself is an N+1 design (see Figures 3 and 4).

Figure 4. Supply air from the Bladeroom “air optimizer” brings direct air with DX and evaporative cooling into flooded cold aisles in the data center.

This data center solution has seen multiple Tier I and Tier II deployments, as well as several Tier III installations, providing good efficiency results. Achieving Tier IV may be difficult with this type of DX plus evaporative system because of Compartmentalization and Fault Tolerant capacity requirements. For example, Compartmentalization of the two different air optimizers is a challenge that must be solved; the louvers and louver controls in the Cold Aisles are not Fault Tolerant and would require modification; and Compartmentalization of electrical controls has not been incorporated into the concept (for example, one in the Hot Aisle and one in the Cold Aisle).

INDIRECT EVAPORATIVE COOLING SYSTEMS
Another type of economizer employs evaporative cooling to indirectly cool the data center using a heat exchanger. There are multiple suppliers of these types of systems. New technologies incorporate cooling media of hybrid plastic polymers or other materials. This approach excludes outside air from the facility. The result is a very clean solution; pollutants, over-pressure/under-pressure, and changes in humidity from outside events like thunderstorms are not concerns. Additionally, a more traditional, large chilled water plant is not necessary (although makeup water storage will be needed) because chilled water is not required.

As with many economizing technologies, greater efficiency can enable facilities to avoid upsizing the electrical plant to accommodate the cooling. A reduced mechanical footprint may mean lower engine-generator capacity, fewer transformers and switchgear, and an overall reduction in the often sizable electrical generation systems traditionally seen in a mission critical facility. For example, one data center eliminated an engine generator set and switchgear, saving approximately US$1M (although the cooling units themselves were more expensive than some other solutions on the market).

The performance of these types of systems is climate dependent. No external cooling systems are generally required in more northern locations. For most temperate and warmer climates some supplemental critical cooling will be needed for hotter days during the year. The systems have to be sized appropriately; however, a small supplemental DX top-up system can meet all critical cooling requirements even in warmer climates. These cooling systems have produced low observed PUEs (1.20 or less) with good partial load PUEs. Facilities employing these systems in conjunction with air management systems and Hot Aisle containment to supply air inlet temperatures up to the ASHRAE recommendation of 27°C (81°F) have achieved Tier III certification with no refrigeration or DX systems needed.

Indirect air/evaporative solutions have two drawbacks: a relative lack of skilled service technicians to service the units and high water requirements. For example, one fairly typical unit on the market can use approximately 1,500 cubic meters (≈400,000 gallons) of water per megawatt annually. Facilities need to budget for water treatment and prepare for a peak water scenario to avoid an impactful water shortage for critical cooling.

Makeup water storage must meet Tier criteria. Water treatment, distribution, pumps, and other parts of the water system must meet the same requirements as the critical infrastructure. Water treatment is an essential, long-term operation performed using methods such as filtration, reverse osmosis, or chemical dosing. Untreated or insufficiently treated water can potentially foul or scale equipment, and thus, water-based systems require vigilance.

It is important to accurately determine how much makeup water is needed on site. For example, a Tier III facility requires 12 hours of Concurrently Maintainable makeup water, which means multiple makeup water tanks. Designing capacity to account for a worst-case scenario can mean handling and treating a lot of water. Over the 20-30 year life of a data center, thousands of gallons (tens of cubic meters) of stored water may be required, which becomes a site planning issue. Many owners have chosen to exceed 12 hours for additional risk avoidance. For more information, refer to the Accredited Tier Designer Technical Paper Series: Makeup Water.
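
As a back-of-envelope illustration of this sizing exercise, the sketch below computes a 12-hour makeup water store from an assumed peak evaporation rate. The load and the liters-per-kilowatt-hour figure are hypothetical placeholders; a real design must use the manufacturer's peak-day consumption data for the specific unit.

```python
# Back-of-envelope sizing sketch for on-site makeup water storage.
# The peak evaporation rate below is a hypothetical placeholder; use the
# manufacturer's peak-day figure for the actual unit when sizing a real site.

IT_LOAD_MW = 4.0                 # hypothetical critical load
PEAK_EVAP_L_PER_KWH = 1.5        # hypothetical liters evaporated per kWh of heat rejected
STORAGE_HOURS = 12               # Tier III requires 12 hours of Concurrently
                                 # Maintainable makeup water (many owners exceed this)

peak_l_per_hour = IT_LOAD_MW * 1000 * PEAK_EVAP_L_PER_KWH   # kWh of heat per hour x L/kWh
storage_liters = peak_l_per_hour * STORAGE_HOURS
storage_m3 = storage_liters / 1000
storage_gallons = storage_liters / 3.785

print(f"Peak makeup water demand: {peak_l_per_hour:,.0f} L/h")
print(f"{STORAGE_HOURS}-hour storage: {storage_m3:,.0f} m3 ({storage_gallons:,.0f} US gal)")
```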

WATER-SIDE ECONOMIZERS
Water-side economizer solutions combine a traditional water-cooled chilled water plant with heat exchangers to bypass chiller units. These systems are well known, which means that skilled service technicians are readily available. Data centers have reduced mechanical plant power consumption by 10–25% using water-side economizers. Systems of this type provide perhaps the most traditional form of economizer/mechanical systems power reduction. The technology uses much of the infrastructure that is already in place in older data centers, so it can be the easiest option to adopt. These systems introduce heat exchangers, so cooling comes directly from cooling towers and bypasses chiller units. For example, in a climate like that in the northern U.S., a facility can run water through a cooling tower during the winter to reject heat and supply the data center with cool water without operating a chiller unit.

Controls and automation for transitions between chilled water and heat exchanger modes are operationally critical but can be difficult to achieve smoothly. Some operators may bypass the water-side economizers if they don’t have full confidence in the automation controls. In some instances, operators may choose not to make the switch when a facility is not going to utilize more than four or six hours of economization. Thus energy savings may actually turn out to be much less than expected.

Relatively high capital expense (CapEx) investment is another drawback. Significant infrastructure must be in place on Day 1 to account for water consumption and treatment, heat exchangers, water pumping, and cooling towers. Additionally, the annualized PUE reduction that results from water-side systems is often not significant, most often in the 0.1–0.2 range. Data center owners will want a realistic cost/ROI analysis to determine if this cooling approach will meet business objectives.

Figure 5. Traditional heat exchanger typical of a water-side economizer system.

Water-side economizers are proven in Tier settings and are found in multiple Tier III facilities. Tier demonstrations are focused on the critical cooling system, not necessarily the economizer function. Because the water-side economizer itself is not considered critical capacity, Tier III demonstrations are performed under chiller operations, as with typical rooftop units. Demonstrations also include isolation of the heat exchanger systems and valves, and control of economizer functions is critical (see Figures 5 and 6). However, for Tier IV settings where Fault Tolerance is required, the system must be able to respond autonomously. For example, one data center in Spain had an air-side heat recovery system with a connected office building. If an economizer fault occurred, the facility would need to ensure it would not impact the data center. The solution was to have a leak detection system that would shut off the economizer to maintain critical cooling of the data hall in isolation.

Figure 6. The cooling tower takes in hot air from the sides and blows hot, wet air out of the top, cooling the condenser water as it falls down the cooling tower. In operation it can appear that steam is coming off the unit, but this is a traditional cooling tower.

CRACs WITH PUMPED REFRIGERANT ECONOMIZER
Another system adds an economizer function to a computer room air conditioner (CRAC) unit using a pumped liquid refrigerant. In some ways, this technology operates similarly to standard refrigeration units, which use a compressor to convert a liquid refrigerant to a gas. However, instead of using a compressor, newer technology blows air across a radiator unit to reject the heat externally without converting the liquid refrigerant to a gas. This technology has been implemented and tested in several data centers with good results in two Tier III facilities.

The advantages of this system include low capital cost compared to many other mechanical cooling solutions. These systems can also be fairly inexpensive to operate and require no water. Because they use existing technology that has been modified just slightly, it is easy to find service technicians. It is a proven cooling method with low estimated PUE (1.25–1.35), a substantial improvement over traditional CRAC installations, which typically yield 1.60–1.80 PUE. These systems offer distributed control of mode changes. In traditional facilities, switching from chillers to coolers typically happens using one master control. A typical DX CRAC installation will have 10-12 units (or even up to 30) that will self-determine the cooling situation and individually select the appropriate operating mode. Distributed control is less likely to cause a critical cooling problem even if one or several units fail. Additionally, these units do not use any outside air. They recirculate inside air, thus avoiding any outside air issues like pollution and humidity.
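
The distributed-control point can be illustrated with a short sketch in which each CRAC unit selects its own operating mode from its own local sensors, so no shared master controller can take down critical cooling. The thresholds and function names are hypothetical, not a manufacturer's algorithm.

```python
# Illustrative sketch of distributed (per-unit) mode selection for pumped
# refrigerant CRACs; thresholds are hypothetical, not a vendor's algorithm.

def crac_select_mode(outdoor_temp_c: float, return_air_c: float) -> str:
    """Each CRAC chooses its own mode from its own local sensors, so no
    shared master controller can take down critical cooling."""
    if outdoor_temp_c <= return_air_c - 10.0:     # enough delta-T for the dry cooler alone
        return "pumped refrigerant economizer"
    elif outdoor_temp_c <= return_air_c - 4.0:    # partial free cooling, compressor trims
        return "mixed economizer + DX"
    else:
        return "full DX (compressor)"

# Ten to thirty units might run this logic independently; a fault in one
# unit's sensor leaves the others unaffected.
for unit, temp in [("CRAC-01", 5.0), ("CRAC-07", 18.0), ("CRAC-12", 30.0)]:
    print(unit, "->", crac_select_mode(outdoor_temp_c=temp, return_air_c=32.0))
```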

The purchase of DX CRAC units with dry coolers does require more CapEx investment, a 50–100% premium over traditional CRACs. Other cooling technologies may offer higher energy efficiency. Additional space is required for the liquid pumping units, typically on the roof or beside the data center.

Multiple data centers that use this technology have achieved Tier III Certification. From a Tier standpoint, these CRACs are the same as the typical CRAC. In particular, the distributed control supports Tier III requirements, including Concurrent Maintainability. The use of DX CRAC systems needs to be considered early in the building design process. For example, the need to pump refrigerant limits the number of building stories. With a CRAC in the computer room and condenser units on the roof, two stories seem to be the building height limit at this time. The suitability of this solution for Tier IV facilities is still undetermined. The local control mechanism is an important step to Fault Tolerance, and Compartmentalization of refrigerant and power must be considered.

OPERATIONS CONSIDERATIONS WITH ECONOMIZERS
Economizer solutions present a number of operational ramifications, including short- and long-term impacts, risk, CapEx, commissioning, and ongoing operating costs. An efficiency gain is one obvious impact, although an economizer can also increase some operational maintenance expenses:
• Several types require water filtration and/or water treatment

• Select systems require additional outside air filtration

• Water-side economizing can require additional cooling tower maintenance

Unfortunately, in certain applications, economization may not be a sustainable practice overall, from either a cost or “green” perspective, even though it reduces energy use. For example, high water use is not an ideal solution in dry or water-limited climates. Additionally, heavy use of consumables such as filters and water treatment chemicals can increase costs and reduce the sustainability of some economizer solutions.

CONCLUSION
Uptime Institute experience has amply shown that, with careful evaluation, planning, and implementation, economizers can be effective at reducing energy use and costs without sacrificing performance, availability, or Tier objectives. Even so, modern data centers have begun to see diminishing PUE returns overall, with many data centers experiencing a leveling off after initial gains. These and all facilities can find it valuable to consider whether investing in mechanical efficiency or broader IT efficiency measures, such as server utilization and decommissioning, will yield the most significant gains and greater holistic efficiencies.

Economizer solutions can introduce additional risks into the data center, where changes in operating modes increase the risk of equipment failure or operator error. These multi-modal systems are inherently more complex and have more components than traditional cooling solutions. In the event of a failure, operators must know how to manually isolate the equipment or transition modes to ensure critical cooling is maintained.

Any economizer solution must fit both the uptime requirement and business objective, especially if it uses newer technologies or was not originally designed for mission critical facilities. Equally important is ensuring that system selection and installation takes Tier requirements into consideration.

Many data centers with economizers have attained Tier Certification; however, in the majority of facilities, Uptime Institute consultants discovered flaws in the operational sequences or system installation during site inspections that were defeating Tier objectives. In all cases so far, the issues were correctible, but extra diligence is required.

Many economizer solutions are newer technologies, or new applications of existing technology outside of their original intended environment; therefore, careful attention should be paid to elements such as control systems to ensure compatibility with mission critical data center operation. Single shared control systems or mechanical system control components are a problem. A single controller, workstation, or weather sensor may fault or require removal from service for maintenance/upgrade over the lifespan of a data center. Neither the occurrence of a component fault nor taking a component offline for maintenance should impact critical cooling. These factors are particularly important when evaluating the impact of economizers on a facility’s Tier objective.

Despite the drawbacks and challenges of properly implementing and managing economizers, their increased use represents a trend for data center operational and ecological sustainability. For successful economizer implementation, designers and owners need to consider the overarching design objectives and data center objectives to ensure those are not compromised in pursuit of efficiency.


ECONOMIZER SUCCESS STORY

Digital Realty’s Profile Park facility in Ireland implemented compressor-less cooling by employing an indirect evaporative economizer, using technology adapted from commercial applications. The system is a success, but it took some careful consideration, adaptation, and fine-tuning to optimize the technology for a Tier III mission critical data center.

Figure 7. The unit operates as a scavenger air system (red area at left) taking the external air and running it across a media. That scavenger air is part of the evaporative process, with the air used to cool the media directly or cool the return air. This image shows summer operation where warm outside air is cooled by the addition of moisture. In winter, outside air cools the return air.

Achieving the desired energy savings first required attention to water (storage, treatment, and consumption). The water storage needs were significant: approximately 60,000 liters (about 16,000 gallons) for 3.8 megawatts (MW). Water treatment and filtration are critical in this type of system and were a significant challenge. The facility implemented very fine filtration at a particulate size of 1 micron (which is 10 times stricter than would typically be required for potable water). This type of indirect air system eliminates the need for chiller units but does require significant water pressure.

To achieve Tier III Certification, the system also had to be Concurrently Maintainable. Valves between the units and a loop format with many valves separating units, similar to what would be used with a chilled water system, helped the system meet the Concurrent Maintainability requirement. Two valves in series are located between each unit on a bi-directional water loop (see Figures 7 and 8).

As with any installation that makes use of new technology, the facility required additional testing and operations sequence modification for a mission critical Tier III setting. For example, initially the units were overconsuming power, not responding to a loss of power as expected, and were draining all of the water when power was lost. After adjustments, the system performance was corrected.


Figure 8 (a and b). The system requires roughly twice the fan energy of typical rooftop units or CRACs but does not use a compressor refrigeration unit, which reduces some of the energy use. Additionally, the fans themselves are high efficiency with optimized motors. Thus, while the facility has approximately twice the number of fans and twice the airflow, it can run the many small units more efficiently.

Ultimately, this facility with its indirect cooling system was Tier III Certified, proving that it is possible to sustain mechanical cooling year-round without compressors. Digital Realty experienced a significant reduction in PUE with this solution, improving from 1.60 with chilled water to 1.15. With this anticipated annualized PUE reduction, the solution is expected to result in approximately €643,000 (US$711,000) in savings per year. Digital Realty was recognized with an Uptime Institute Brill Award for Efficient IT in 2014.
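
The scale of those savings follows directly from the PUE delta. Assuming the facility operates near its 3.8-MW IT capacity (an assumption made here only for illustration), the annual energy avoided is approximately:

```latex
\Delta E_{\text{yr}} \;\approx\; (\mathrm{PUE}_{\text{old}} - \mathrm{PUE}_{\text{new}}) \times P_{\text{IT}} \times 8760\,\mathrm{h}
\;=\; (1.60 - 1.15) \times 3.8\,\mathrm{MW} \times 8760\,\mathrm{h} \;\approx\; 15{,}000\,\mathrm{MWh}
```

Multiplying that figure by the local electricity rate, which is not stated here, yields the annual cost savings.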


CHOOSING THE RIGHT ECONOMIZER SOLUTION FOR YOUR FACILITY

Organizations that are considering implementing economizers—whether retrofitting an existing facility or building a new one—have to look at a range of criteria. The specifications of any one facility need to be explored with mechanical, electrical, plumbing (MEP), and other vendors, but key factors to consider are:

Geographical area/climate: This is perhaps the most important factor in determining which economizer technologies are viable options for a facility. For example, direct outside air can be a very effective solution in northern locations that have an extended cold winter, while select industrial environments can preclude the use of outside air because of high pollutant content. Other solutions will work better in tropical climates than in arid regions, where water-based solutions are less appropriate.

New build or retrofit: Retrofitting an existing facility can eliminate available economizer options, usually due to space considerations but also because systems such as direct air plus evaporative and DX CRAC need to be incorporated at the design stage as part of the building envelope.

Supplier history: Beware of suppliers from other industries entering the data center space. Limited experience with mission critical functionality including utility loss restarts, control architecture, power consumption, and water consumption can mean systems need to be substantially modified to conform to 24 x 7 data center operating objectives. New suppliers are entering into the data center market, but consider which of them will be around for the long term before entering into any agreements to ensure parts supplies and skilled service capabilities will be available to maintain the system throughout its life cycle.

Financial considerations: Economizers have both CapEx and operating expense (OpEx) impact. Whether an organization wants to invest capital up front or focus on long-term operating budgets depends on the business objectives.

Some general CapEx/OpEx factors to keep in mind include:

• Select newer cooling technology systems are high cost, and thus require more up front CapEx.

• A low initial capital outlay with higher OpEx may be justified in some settings.

• Enterprise owners/operators should consider insertion of economizers into the capital project budget with long-term savings justifications.

ROI objectives: As an organization, what payback horizon is needed for significant PUE reduction? Is it one to two years, five years, or ten? The assumptions for the performance of economizer systems should use real-world savings, with expectations for annual hours of use and performance reduced from the best-case scenarios provided by suppliers. A simple payback model should show the energy savings recovering the investment in less than three to five years.
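
A simple payback model of the kind described above can be expressed in a few lines; all of the inputs below are hypothetical placeholders to be replaced with site-specific, derated values.

```python
# Simple payback sketch with hypothetical numbers; real projects should use
# conservative (real-world) hours of economization, not supplier best cases.

capex_premium_usd = 900_000        # hypothetical added cost of the economizer option
it_load_kw = 2_000                 # hypothetical IT load
pue_without = 1.55                 # hypothetical annualized PUE, conventional cooling
pue_with = 1.35                    # hypothetical annualized PUE with economizer (derated)
energy_cost_per_kwh = 0.10         # hypothetical blended utility rate, US$

annual_kwh_saved = (pue_without - pue_with) * it_load_kw * 8760
annual_savings = annual_kwh_saved * energy_cost_per_kwh
payback_years = capex_premium_usd / annual_savings

print(f"Annual energy saved: {annual_kwh_saved:,.0f} kWh")
print(f"Annual savings: ${annual_savings:,.0f}")
print(f"Simple payback: {payback_years:.1f} years")   # compare against the 3-5 year target
```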

Depending on an organization’s status and location, it may be possible to utilize sustainability or alternate funding. When it comes to economizers, geography/climate and ROI are typically the most significant decision factors. Uptime Institute’s FORCSS model can aid in evaluating the various economizer technology and deployment options (see more about FORCSS at https://journal.uptimeinstitute.com/introducing-uptime-institutes-forcss-system/).


Keith Klesner is Uptime Institute’s Vice President of Strategic Accounts. Mr. Klesner’s career in critical facilities spans 16 years and includes responsibilities ranging from planning, engineering, design, and construction to start-up and ongoing operation of data centers and mission critical facilities. He has a B.S. in Civil Engineering from the University of Colorado-Boulder and an MBA from the University of La Verne. He maintains status as a professional engineer (PE) in Colorado and is a LEED Accredited Professional.

Tier Certification for Modular and Phased Construction

Special care must be taken on modular and phased construction projects to avoid compromising reliability goals. Shared system coordination could defeat your Tier Certification objective
By Chris Brown

Today, we often see data center owners taking a modular or phased construction approach to reduce design, construction, and operating costs as well as build time. Taking a modular or phased construction approach allows companies to make a smaller initial investment and to delay some capital expenditures by scaling capacity with business growth.

The modular and phased construction approaches bring some challenges, including the need for multiple design drawings for each phase, potential interruption of regular operations and systems during expansion, and the logistics of installation and commissioning alongside a live production environment. Meticulous planning can minimize the risks of downtime or disruption to operations and enable a facility to achieve the same high level of performance and resilience as conventionally built data centers. In fact, with appropriate planning in the design stage and by aligning Tier Certification with the commissioning process for each construction phase, data center owners can simultaneously reap the business and operating benefits of phased construction along with the risk management and reliability validation benefits of Tier Certification of Constructed Facility (TCCF).

DEFINING MODULAR AND PHASED CONSTRUCTION
The terms modular construction and phased construction, though sometimes used interchangeably, are distinct. Both terms refer to the emerging practice of building production capacity in increments over time based on expanded need.

Figure 1. Phased construction allows for the addition of IT capacity over time but relies on infrastructure design to support each additional IT increment.

However, though all modular construction is by its nature phased, not all phased construction projects are modular. Uptime Institute classifies phased construction as any project in which critical capacity components are installed over time (see Figure 1). Such projects often include common distribution systems. Modular construction describes projects that add capacity in blocks over time, typically in repeated, sequential units, each with self-contained infrastructure sufficient to support the capacity of the expansion unit rather than accessing shared infrastructure (see Figure 2).

Figure 2. Modular design supports the IT capacity growth over time by allowing for separate and independent expansions of infrastructure.

For example, a phased construction facility might be built with adequate electrical distribution systems and wiring to support the ultimate intended design capacity, with additional power supply added as needed to support growing IT load. Similarly, cooling piping systems might be constructed for the entire facility at the outset of a project, with additional pumps or chiller units added later, all using a shared distribution system.

Figure 3. Simplified modular electrical system with each phase utilizing independent equipment and distribution systems

For modular facilities, the design may specify an entire electrical system module that encompasses all the engine-generator sets, uninterruptible power supply (UPS) capacities, and associated distribution systems needed to support a given IT load. Then, for each incremental increase in capacity, the design may call for adding another separate and independent electrical system module to support the IT load growth. These two modules would operate independently, without sharing distribution systems (see Figure 3). Taking this same approach, a design may specify a smaller chiller, pump, piping, and an air handler to support a given heat load. Then, as load increases, the design would include the addition of another small chiller, pump, piping, and air handler to support the incremental heat load growth instead of adding onto the existing chilled water or piping system. In both examples, the expansion increments do not share distribution systems and therefore are distinct modules (see Figure 4).

Figure 4. Simplified modular mechanical system with each phase utilizing independent equipment and distribution systems
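
The distinction can be summarized schematically: in a phased design, later equipment feeds shared distribution paths, while in a modular design each expansion block brings its own. The sketch below is purely illustrative; the equipment names are placeholders, not a real facility inventory.

```python
# Schematic illustration only: the essential difference between phased and
# modular expansion is whether later phases share distribution systems.

phased_design = {
    "distribution": "shared",                 # one set of boards/piping sized for the end state
    "phase_1": ["UPS-A", "ENGINE-GEN-1", "CHILLER-1"],
    "phase_2": ["UPS-B", "ENGINE-GEN-2", "CHILLER-2"],   # added capacity feeds the shared paths
}

modular_design = {
    "module_1": {"distribution": "independent", "equipment": ["UPS-A", "ENGINE-GEN-1", "CHILLER-1"]},
    "module_2": {"distribution": "independent", "equipment": ["UPS-B", "ENGINE-GEN-2", "CHILLER-2"]},
}

def later_phase_testing_touches_live_load(design: dict) -> bool:
    """Shared distribution means later-phase commissioning and TCCF
    demonstrations interact with systems already carrying production load."""
    return design.get("distribution") == "shared"

print("Phased:", later_phase_testing_touches_live_load(phased_design))    # True
print("Modular:", later_phase_testing_touches_live_load(modular_design))  # False
```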

CERTIFICATION IN A PHASED MODEL­: DESIGN THROUGH CONSTRUCTION
Organizations desiring a Tier Certified data center must first obtain Tier Certification of Design Documents (TCDD). For phased construction projects, the Tier Certification process culminates with TCCF after construction. (For conventional data center projects the Tier Certification process culminates in Tier Certification of Operational Sustainability.) TCCF validates the facility Tier level as it has been built and commissioned. It is not uncommon for multiple infrastructure and/or system elements to be altered during construction, which is why Tier Certification does not end with TCDD; a facility must undergo TCCF to ensure that the facility was built and performs as designed, without any alterations that would compromise its reliability. This applies whether a conventional, phased, or modular construction approach is used.

In a phased construction project, planning for Tier Certification begins in the design stage. To receive TCDD, Uptime Institute will review each phase and all design documents from the initial build through the final construction phase to ensure compliance with Tier Standards. All phases should meet the requirements for the Tier objective.

Certification of each incremental phase of the design depends on meaningful changes to data center capacity, meaningful being the key concept. For example, upgrading a mechanical system may increase cooling capacity, but if it does not increase processing capacity, it is not a meaningful increment. An upgrade to mechanical and/or electrical systems that expands a facility’s overall processing capacity would be considered a meaningful change and necessitate that a facility have its Certification updated.

In some cases, organizations may not yet have fully defined long-term construction phases that would enable Certification of the ultimate facility. In these situations, Uptime Institute will review design documents for only those phases that are fully defined for Tier Certification specific to those phases. Tier Certification (Tier I-IV) is limited to that specific phase alone. Knowing the desired endpoint is important: if Phases 1 and 2 of a facility do not meet Tier criteria but Phase 3 does, then completion of a TCCF review must wait until Phase 3 is finished.

TCCF includes a site visit with live functional demonstrations of all critical systems, which is typically completed immediately following commissioning. For a phased construction project, Tier Certification of the Phase 1 facility can be the same as Tier Certification for conventional (non-phased) projects in virtually all respects. In both cases, there is no live load at the time, allowing infrastructure demonstrations to be performed easily without risking interruption to the production environment.

Figure 5. Simplified phased electrical system with each additional phase adding equipment while sharing distribution components

The process for Tier Certification of later phases can be as easy as it is for Phase 1 or more difficult, depending on the construction approach. Truly modular expansion designs minimize risk during later phases of commissioning and TCCF because they do not rely on shared distribution systems. Because modules consist of independent, discrete systems, installing additional capacity segments over time does not put facility-wide systems at risk. However, when there is shared infrastructure, as in phased (not modular) projects, commissioning and TCCF can be more complex. Installing new capacity components on top of shared distribution paths, e.g., adding or upgrading an engine generator or UPS module, requires that all testing and demonstrations be repeated across the whole system. It’s important to ensure that all of the system settings work together, for example, verifying that all circuit breaker settings remain appropriate for the new capacity load, so that the new production load will not trip the breakers.

Pre-planning for later phases can help ensure a smooth commissioning and Tier Certification process even with shared infrastructure. As long as the design phases support a Tier Certification objective, there is no reason why phased construction projects cannot be Tier Certified.

COMMISSIONING AND TIER CERTIFICATION
TCCF demonstrations align with commissioning; both must be completed at the same stage (following installation, prior to live load). If a data center design allows full commissioning to be completed at each phase of construction, Tier Certification is achievable for both modular and non-modular phased projects. TCCF demonstrations would be done at the same expansion stages designated for the TCDD at the outset of the project.

For a modular installation, commissioning and Tier Certification demonstrations can be conducted as normal using load banks inside a common data hall, with relatively low risk. The only significant risk is that poorly managed load banks can direct hot air at server intakes, and careful placement readily prevents it.

For phased installations that share infrastructure, later phases of commissioning and Tier Certification carry increased risk, because load banks are running in common data halls with shared distribution paths and capacity systems that are supporting a concurrent live load. The best way to reduce the risks of later-phase commissioning and Tier Certification is to conduct demonstrations as early in the Certification process as possible.

Figure 6. Simplified phased mechanical system with each additional phase adding equipment while sharing distribution components


Shared critical infrastructure distribution systems included in the initial phase of construction can be commissioned and Tier Certified at full (planned) capacity during the initial TCCF review, so these demonstrations can be front loaded and will not need to be repeated at future expansion phases.

The case studies offer examples of how two data centers approached the process of incorporating phased construction practices without sacrificing Tier Certification vital to supporting their business and operating objectives.

CONCLUSION
Modular and phased construction approaches can be less expensive at each phase and require less up-front capital than traditional construction, but installing equipment that is outside of that specified for the TCDD or beyond the capacity of the TCCF demonstrations puts not only the Tier Certification at risk, but the entire operation. Tier Certification remains valid only until there has been a change to the infrastructure. Beyond that, regardless of an organization’s Tier objective, if construction phases are designed and built in a manner that prevents effective commissioning, then there are greater problems than the status of Tier Certification.
A data center that cannot be commissioned at the completion of a phase incurs increased risk of downtime or system error for that phase of operation and all later phases. Successful commissioning and Tier Certification of phased or modular projects requires thinking through the business and operational impacts of the design philosophy and the decisions made regarding facility expansion strategies. Design decisions must be made with an understanding of which factors are and are not consistent with achieving the Tier Certification (these are essentially the same factors that allow commissioning). In cases where a facility expansion or system upgrade cannot be Tier Certified, Uptime Institute usually finds that the cause is a limitation inherent in the design of the facility or a business choice made long before.

It is incumbent upon organizations to think through not only the business rationale but also the potential operational impacts of various design and construction choices. Organizations can simultaneously protect their data center investment and achieve the Tier Certification level that supports the business and operating mission (including modular and phased construction plans) by properly anticipating the need for commissioning in Phase 2 and beyond.

Planning design and construction activities to allow for commissioning greatly reduces the organization's overall risk. TCCF is the formal validation of the reliability of the built facility.


Case Study: Tier III Certification of Constructed Facility: Phased Construction
An organization planned a South African Tier III facility capital infrastructure project in two build phases, with shared infrastructure (i.e., non-modular, phased construction). The original design drawings specified two chilled-water plants: an air-cooled chiller plant and an absorption chiller plant, although the absorption chiller plant was not installed initially due to a limited natural gas supply. The chilled-water system piping was installed up front and connected to the air-cooled chiller plant. Two air-cooled chillers capable of supporting the facility load were then installed.

The organization installed all the data hall air-handling units (AHUs), including two Kyoto Cooling AHUs, on day one. Because the Kyoto AHUs would be very difficult to install once the facility was built, the facility was essentially designed around them. In other words, it was more cost effective to install both AHUs during the initial construction phase, even if their full capacity would not be reached until after Phase 2.

The facility design utilizes a common infrastructure with a single data hall. Phase 1 called for installing 154 kilowatts (kW) of IT capacity; an additional 306 kW of capacity would be added in Phase 2 for a total planned capacity of 460 kW. Phase 1 TCCF demonstrations were conducted first for the 154 kW of IT load that the facility would be supporting initially. In order to minimize the risk to IT assets when Phase 2 TCCF demonstrations are performed, the commissioning team next demonstrated both AHUs at full capacity. They increased the loading on the data hall to a full 460 kW, successfully demonstrating that the AHUs could support that load in accordance with Tier III requirements.

For Tier Certification of Phase 2, the facility will have to demonstrate that the overall chilled-water piping system and additional electrical systems support the full 460-kW capacity, but it will not have to demonstrate the AHUs again. During Phase 1 demonstrations, the chillers and engine generators ran at N capacity (both units operating) to provide ample power and cooling to show that the AHUs could support 460 kW in a Concurrently Maintainable manner. The Phase 2 demonstrations will not require placing extra load on the UPS, but they will test the effects of putting more load into the data hall and possibly raising the temperature for the systems under live load.


Case Study: Tier III Expanded to Tier IV
The design for a U.S.-based cloud data center, validated as a Tier III Certified Constructed Facility after the first construction phase, calls for a second construction phase and relies on a common infrastructure (i.e., non-modular, phased construction). The ultimate business objective for the facility is Tier IV, and the facility design supports those objectives. The organization was reluctant to make expenditures on the mechanical UPS required to provide Continuous Cooling for the full capacity of the center until it had secured a client that required Tier IV performance, which would then justify the capital investment in increasing cooling capacity.

The organization was only able to achieve this staged Tier expansion because it worked with Uptime Institute consultants to plan both phases and the Tier demonstrations. For Phase 1, the organization installed all systems and infrastructure needed to support a Tier IV operation, except for the mechanical UPS; thus, the Tier Certification objective for Phase 1 was to attain Tier III. Phase 1 Tier Certification included all of the required demonstrations normally conducted to validate Tier III, with load banks located in the data hall. Additionally, because all systems except for the mechanical UPS were already installed, Uptime Institute was able to observe all of the demonstrations that would normally be required for Tier IV TCCF, with the exception of Continuous Cooling.

As a result, when the facility is ready to proceed with the Phase 2 expansion, the only demonstrations required to qualify for Tier IV TCCF will be those for Continuous Cooling. The organization will have to locate load banks within the data hall but will not be required to power those load banks from the IT UPS or simulate faults on the IT UPS system because that capability has already been satisfactorily observed. Thus, the organization can avoid any risk of interruption to the live customer load the facility will have in place during Phase 2.

The Tier III Certification of Constructed Facility demonstrations require Concurrent Maintainability. The data center must be able to provide baseline power and cooling capacity in each and every maintenance configuration required to operate and maintain the site for an indefinite period. The topology and procedures to isolate each and every component for maintenance, repair, or replacement without affecting the baseline power and cooling capacity in the computer rooms should be in place, with a total of 750 kW of critical IT load spread across the data hall. All other house and infrastructure loads required to sustain the baseline load must also be supported in parallel with, and without affecting, the baseline computer room load.

Tier Certification requirements are cumulative; Tier IV encompasses Concurrent Maintainability, with the additional requirements of Fault Tolerance and Continuous Cooling. To demonstrate Fault Tolerance, a facility must have the systems and redundancy in place so that a single failure of a capacity system, capacity component, or distribution element will not impact the IT equipment. The organization must demonstrate that the system automatically responds to a failure to prevent further impact to the site operations. Assessing Continuous Cooling capabilities requires demonstrations of computer room air conditioning (CRAC) units under various conditions and simulated fault situations.


Chris Brown


Christopher Brown joined Uptime Institute in 2010 and currently serves as Vice President, Global Standards and is the Global Tier Authority. He manages the technical standards for which Uptime Institute delivers services and ensures the technical delivery staff is properly trained and prepared to deliver the services. Mr. Brown continues to actively participate in the technical services delivery including Tier Certifications, site infrastructure audits, and custom strategic-level consulting engagements.

 

Arc Flash Mitigation in the Data Center

Meeting OSHA and NFPA 70E arc flash safety requirements while balancing prevention and production demands
By Ed Rafter

Uninterruptible uptime, 24 x 7, zero downtime…these are some of the terms that characterize data center business goals for IT clients. Given these demands, facility managers and technicians in the industry are skilled at managing the infrastructure that supports these goals, including essential electrical and mechanical systems that are paramount to maintaining the availability of business-critical systems.

Electrical accidents such as arc flash occur all too often in facility environments that have high-energy use requirements, a multitude of high-voltage electrical systems and components, and frequent maintenance and equipment installation activities. A series of stringent standards with limited published exceptions govern work on these systems and associated equipment. The U.S. Occupational Safety and Health Administration (OSHA) and National Fire Protection Association (NFPA) Standard 70E set safety and operating requirements to prevent arc flash and electric shock accidents in the workplace. Many other countries have similar regulatory requirements for electrical safety in the workplace.

When these accidents occur they can derail operations and cause serious harm to workers and equipment. Costs to businesses can include lost work time, downtime, OSHA investigation, fines, medical costs, litigation, lost business, equipment damage, and most tragically, loss of life. According to the Workplace Safety Awareness Council (WPSAC), the average cost of hospitalization for electrical accidents is US$750,000, with many exceeding US$1,000,000.

There are reasonable steps data center operators can—and must—take to ensure the safety of personnel, facilities, and equipment. These steps offer a threefold benefit: the same measures taken to protect personnel also serve to protect infrastructure, and thus protect data center operations.

Across all industries, many accidents are caused by basic mistakes, for example, electrical workers not being properly prepared, working on opened equipment that was not well understood, or magnifying risks through a lack of due diligence. Data center operators, however, are already attuned to the discipline and planning it takes to run and maintain high-availability environments.

While complying with OSHA and NFPA 70E requirements may seem daunting at first, the maintenance and operating standards in place at many data centers enable this industry to effectively meet the challenge of adhering to these mandates. The performance and rigor required to maintain 24 x 7 reliability means the gap between current industry practices and the requirements of these regulatory standards is smaller than it might at first appear, allowing data centers to balance safety with the demands of mission critical production environments.

In this article we describe arc flash and electrical safety issues, provide an overview of the essential measures data centers must follow to meet OSHA and NFPA 70E requirements, and discuss how many of the existing operational practices and adherence to Tier Standards already places many data centers well along the road to compliance.

Figure 1. An arc flash explosion demonstration. Source: Open Electrical


UNDERSTANDING ARC FLASH

Arc flash is a discharge of electrical energy characterized by an explosion that generates light, noise, shockwave, and heat. OSHA defines it as “a phenomenon where a flashover of electric current leaves its intended path and travels through the air from one conductor to another, or to ground (see Figure 1). The results are often violent and when a human is in close proximity to the arc flash, serious injury and even death can occur.” The resulting radiation and shrapnel can cause severe skin burns and eye injuries, and pressure waves can have enough explosive force to propel people and objects across a room and cause lung and hearing damage. OSHA reports that up to 80% of all “qualified” electrical worker injuries and fatalities are not due to shock (electrical current passing through the body) but to external burn injuries caused by the intense radiant heat and energy of an arc fault/arc blast.1

An arc flash results from an arcing electrical fault, which can be caused by dust particles in the air, moisture condensation or corrosion on electrical/mechanical components, material failure, or by human factors such as improper electrical system design, faulty installation, negligent maintenance procedures, dropped tools, or accidentally touching a live electrical circuit. In short, there are numerous opportunities for arc flash to occur in industrial settings, especially those in which there is inconsistency or a lack of adherence to rigorous maintenance, training, and operating procedures.

Variables that affect the power of an arc flash are amperage, voltage, the distance of the arc gap, closure time, three-phase vs. single-phase circuit, and being in a confined space. The power of an arc at the flash location, the distance a worker is from the arc, and the time duration of their exposure to the arc will all affect the extent of skin damage. The WPSAC reports that fatal burns can occur even at distances greater than 10 feet (ft) from an arc location; in fact, serious injuries and fatalities can occur up to 20 ft away. The majority of hospital admissions for electrical accidents are due to arc flash burns, with 30,000 arc incidents and 7,000 people suffering burn injuries per year, 2,000 of those requiring admission to burn centers with severe arc flash burns.2

The severity of an arc flash incident is determined by several factors, including temperature, the available fault current, and the time for a circuit to break. The total clearing time of the overcurrent protective device during a fault is not necessarily linear, as lower fault currents can sometimes result in a breaker or fuse taking longer to clear, thus extending the arc duration and thereby raising the arc flash energy.
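As a rough illustration of why slower clearing at lower current can be more dangerous, the sketch below compares two hypothetical faults using the simplified proportionality of incident energy to current, duration, and distance discussed later in this article. The breaker settings and the inverse-time expression are invented for illustration and do not represent any real device curve.

# Illustrative only: hypothetical breaker behavior showing why a lower-current arcing
# fault that misses the instantaneous trip band can release more energy than a
# higher-current fault that clears instantly. Settings and curve are invented.

def clearing_time_s(fault_ka, instantaneous_pickup_ka=8.0):
    """Fast clearing above the instantaneous pickup, slow inverse-time band below it."""
    if fault_ka >= instantaneous_pickup_ka:
        return 0.05                      # instantaneous trip
    return 30.0 / fault_ka ** 2          # made-up inverse-time region (seconds)

def relative_incident_energy(fault_ka, distance_mm=455):
    """Relative energy, proportional to current x duration / distance squared."""
    return fault_ka * clearing_time_s(fault_ka) / distance_mm ** 2

high_current = relative_incident_energy(12.0)   # clears instantaneously
low_current = relative_incident_energy(6.0)     # rides the slow portion of the curve
print(f"Lower-current fault releases about {low_current / high_current:.0f}x the energy")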

Unlike the bolted fault (in which high current flows through a solid conductive material typically tripping a circuit breaker or protective device), an arcing fault uses ionized air as a conductor, with current jumping a gap between two conductive objects. The cause of the fault normally burns away during the initial flash, but a highly conductive, intensely hot plasma arc established by the initial arc sustains the event. Arc flash temperatures can easily reach 14,000–16,000°F (7,760–8,870°C) with some projections as high as 35,000°F (19,400°C)—more than three times hotter than the surface of the sun.

These temperatures can be reached by an arc fault event in as little as a few seconds or even a few cycles. The heat generated by the high current flow may melt or vaporize the conductive material and create an arc characterized by a brilliant flash, intense heat, and a fast-moving pressure wave that propels the arcing products. The pressure of an arc blast [up to 2,000 pounds/square foot (9765 kilograms/square meter)] is due to the expansion of the metal as it vaporizes and the heating of the air by the arc. This accounts for the expulsion of molten metal up to 10 ft away. Given these extremes of heat and energy, arc flashes often cause fires, which can rapidly spread through a facility.

INDUSTRY STANDARDS AND REGULATIONS
To prevent these kinds of accidents and injuries, it is imperative that data center operators understand and follow appropriate safety standards for working with electrical equipment. Both the NFPA and OSHA have established standards and regulations that help protect workers against electrical hazards and prevent electrical accidents in the workplace.

OSHA is a federal agency (part of the U.S. Department of Labor) that ensures safe and healthy working conditions for Americans by enforcing standards and providing workplace safety training. OSHA 29 CFR Part 1910, Subpart S and OSHA 29 CFR Part 1926, Subpart K include requirements for electrical installation, equipment, safety-related work practices, and maintenance for general industry and construction workplaces, including data centers.

NFPA 70E is a set of detailed standards (issued at the request of OSHA and updated periodically) that address electrical safety in the workplace. It covers safe work practices associated with electrical tasks and safe work practices for performing other non-electrical tasks that may expose an employee to electrical hazards. OSHA revised its electrical standard to reference NFPA 70E-2000 and continues to recognize NFPA 70E today.

The OSHA standard outlines prevention and control measures for hazardous energies including electrical, mechanical, hydraulic, pneumatic, chemical, thermal, and other energy sources.  OSHA requires that facilities:

•   Provide and be able to demonstrate a safety program with defined responsibilities.

•   Calculate the degree of arc flash hazard.

•   Use correct personal protective equipment (PPE) for workers.

•   Train workers on the hazards of arc flash.

•   Use appropriate tools for safe working.

•   Provide warning labels on equipment.

NFPA 70E further defines “electrically safe work conditions” to mean that equipment is not and cannot be energized. To ensure these conditions, personnel must identify all power sources, interrupt the load and disconnect power, visually verify that a disconnect has opened the circuit, lock out and tag the circuit, test for absence of voltage, and ground all power conductors, if necessary.

LOCKOUT/TAGOUT
Most data center technicians will be familiar with lockout and tagging procedures for disabling machinery or equipment. A single qualified individual should be responsible for de-energizing one set of conditions (unqualified personnel should never perform lockout/tagout, work on energized equipment, or enter high risk areas). An appropriate lockout or tagout device should be affixed to the de-energized equipment identifying the responsible individual (see Figure 2).

Figure 2. Equipment lockout/tagout


OVERVIEW: WORKING ON ENERGIZED EQUIPMENT
As the WPSAC states, “the most effective and foolproof way to eliminate the risk of electrical shock or arc flash is to simply de-energize the equipment.” However, both NFPA 70E and OSHA clarify that working “hot” (on live, energized systems) is allowed within the set safety limits on voltage exposures, work zone boundary requirements, and other measures to take in these instances. Required safety elements include personnel qualifications, hazard analysis, protective boundaries, and the use of PPE by workers.

Only qualified persons should work on electrical conductors or circuit parts that have not been put into an electrically safe work condition. A qualified person is one who has received training in and possesses skills and knowledge in the construction and operation of electric equipment and installations and the hazards involved with this type of work. Knowledge or training should encompass the skill to distinguish exposed live parts from other parts of electric equipment, determine the nominal voltage of exposed live parts, and calculate the necessary clearance distances and the corresponding voltages to which a worker will be exposed.

An arc flash hazard analysis for any work must be conducted to determine the appropriate arc flash boundary, the incident energy at the working distance, and the necessary protective equipment for the task. Arc flash energy is measured in thermal energy units of calories per square centimeter (calories/cm2), and the value produced by an arc flash analysis is referred to as the incident energy of the circuit. Incident energy is both radiant and convective. It is inversely proportional to the square of the working distance and directly proportional to the time duration of the arc and to the available bolted fault current. Time has a greater effect on intensity than the available bolted fault current.

The incident energy and flash protection boundary are both calculated in an arc flash hazard analysis. There are two calculation methods, one outlined in NFPA 70E-2012 Annex D and the other in Institute of Electrical and Electronics Engineers (IEEE) Standard 1584.

In practice, to calculate the arc flash (incident energy) at a location, the amount of fault current and the amount of time it takes for the upstream device to trip must be known. A data center should model the distribution system in a software program such as SKM Power System Analysis, calculate the short circuit fault current levels, and use the settings of the protective devices feeding switchboards, panelboards, industrial control panels, and motor control centers to determine the incident energy level.
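To make the relationship concrete, the short sketch below estimates incident energy and the flash protection boundary from bolted fault current, clearing time, and working distance. It is a minimal illustration only, using constants in the Lee-method form as commonly cited in NFPA 70E Annex D; the function names and example inputs are hypothetical, and any real analysis should rely on IEEE 1584 calculations or power-system software rather than this simplification.

import math

# Illustrative only: Lee-method-style estimate of incident energy and flash boundary.
# Constants follow commonly cited NFPA 70E Annex D values and should be verified
# against the standard (or an IEEE 1584 study) before any real-world use.

def incident_energy_cal_cm2(v_kv, bolted_fault_ka, clearing_time_s, distance_mm):
    """Approximate incident energy (cal/cm2) at a given working distance."""
    return 5.12e5 * v_kv * bolted_fault_ka * clearing_time_s / distance_mm ** 2

def flash_boundary_mm(v_kv, bolted_fault_ka, clearing_time_s, e_b_j_cm2=5.0):
    """Distance (mm) at which incident energy falls to about 1.2 cal/cm2 (5 J/cm2)."""
    return math.sqrt(2.142e6 * v_kv * bolted_fault_ka * clearing_time_s / e_b_j_cm2)

# Hypothetical example: 480-V switchboard, 30-kA bolted fault, 0.1-s clearing,
# 455-mm (18-in.) working distance
print(round(incident_energy_cal_cm2(0.48, 30.0, 0.1, 455), 1))  # ~3.6 cal/cm2
print(round(flash_boundary_mm(0.48, 30.0, 0.1)))                # ~785 mm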

BOUNDARIES
NFPA has defined several protection boundaries: Limited Approach, Restricted, and Prohibited. The intent of NFPA 70E regarding arc flash is to provide guidelines that will limit injury to the onset of second degree burns. Where these boundaries are drawn for any specific task is based on the employee’s level of training, the use of PPE, and the voltage of the energized equipment (see Figure 3).

Figure 3. Protection boundaries. Source: Open Electrical


The closer a worker approaches an exposed, energized conductor or circuit part, the greater the chance of inadvertent contact and the more severe the injury an arc flash is likely to cause. When an energized conductor is exposed, the worker may not approach closer than the flash boundary without wearing appropriate personal protective clothing and PPE.

IEEE defines Flash Protection Boundary as “an approach limit at a distance from live parts operating at 50 V or more that are un-insulated or exposed within which a person could receive a second degree burn.” NFPA defines approach boundaries and workspaces as shown in Figure 4. See the sidebar Protection Boundary Definitions.

Figure 4. PPE: typical arc flash suit. Source: Open Electrical


Calculating the specific boundaries for any given piece of machinery, equipment, or electrical component can be done using a variety of methods, including referencing NFPA tables (easiest to do but the least accurate) or using established formulas, an approach calculator tool (provided by IEEE), or one of the software packages available for this purpose.

PROTECTIVE EQUIPMENT
NFPA 70E outlines strict standards for the type of PPE required for any employees working in areas where electrical hazards are present, based on the task, the parts of the body that need protection, and the suitable arc rating to match the potential flash exposure. PPE includes items such as a flash suit, switching coat, mask, hood, gloves, and leather protectors. Flame-resistant clothing underneath the PPE gear is also required.

After an arc flash hazard analysis has been performed, the correct PPE can be selected according to the equipment’s arc thermal performance exposure value (ATPV) and breakopen threshold energy rating (EBT). Together, these ratings determine the hazard level (measured in calories per square centimeter) from which any piece of equipment is capable of protecting a worker. For example, a hard hat with an attached face shield provides adequate protection for Hazard/Risk Category 2, whereas an arc flash protection hood is needed for a worker exposed to Hazard/Risk Category 4.
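As a simplified illustration of matching PPE to a calculated incident energy, the sketch below maps an incident-energy value to a hazard/risk category using the commonly cited thresholds of 4, 8, 25, and 40 cal/cm2. Both the thresholds and the function are illustrative assumptions and should be confirmed against the current edition of NFPA 70E and the selected gear's ATPV/EBT ratings.

# Illustrative only: map a calculated incident energy to a hazard/risk category.
# The 4/8/25/40 cal/cm2 thresholds are the commonly cited NFPA 70E category limits;
# confirm against the current edition and the chosen equipment's ATPV/EBT ratings.

def hazard_risk_category(incident_energy_cal_cm2):
    """Return the smallest hazard/risk category whose limit covers the incident energy."""
    for limit, category in [(4.0, 1), (8.0, 2), (25.0, 3), (40.0, 4)]:
        if incident_energy_cal_cm2 <= limit:
            return category
    return None  # above Category 4: de-energize; no PPE category is considered adequate

print(hazard_risk_category(3.6))   # -> 1
print(hazard_risk_category(12.0))  # -> 3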

PPE is the last line of defense in an arc flash incident; it’s not intended to prevent all injuries, but to mitigate the impact of a flash, should one occur. In many cases, the use of PPE has saved lives or prevented serious injury.

OTHER SAFETY MEASURES
Additional safety-related practices for working on energized systems could include conducting a pre-work job briefing, using insulated tools, having a written safety program, applying flash hazard labels (labels should indicate the flash hazard boundaries for a piece of equipment and the PPE needed to work within those boundaries), and completing an Energized Electrical Work Permit. According to NFPA, an Energized Electrical Work Permit is required for a task when live parts over 50 volts are involved. The permit outlines conditions and work practices needed to protect employees from arc flash or contact with live parts, and includes the following information:

•   Circuit, equipment, and location

•   Reason for working while energized

•   Shock and arc flash hazard analysis

•   Safe work practices

•   Approach boundaries

•   Required PPE and tools

•   Access control

•   Proof of job briefing.

DECIDING WHEN TO WORK HOT
NFPA 70E and OSHA require employers to prove that working in a de-energized state creates more or worse hazards than the risk presented by working on live components or is not practical because of equipment design or operational limitations, for example, when working on circuits that are part of a continuous process that cannot be completely shut down. Other exceptions include situations in which isolating and deactivating system components would create a hazard for people not associated with the work, for example, when working on life-support systems, emergency alarm systems, ventilation equipment for hazardous locations, or extinguishing illumination for an area.

In addition, OSHA makes provision for situations in which it would be “infeasible” to shut down equipment; for example, some maintenance and testing operations can only be done on live electric circuits or equipment. The decision to work hot should only be made after careful analysis of what constitutes infeasibility. In recent years, some well publicized OSHA actions and statements have centered on how to interpret this term.

ELECTRICAL SAFETY MEASURES IN PRACTICE
Many operational and maintenance practices will help minimize the potential for arc flash, reduce the incident energy or arcing time, or move the worker away from the energy source. In fact, many of these practices are consistent with the rigorous operational and maintenance processes and procedures of a mission-critical data center.

Although the electrical industry is aware of the risks of arc flash, according to the National Institute for Occupational Safety and Health, the biggest worksite personnel hazard is still electrical shock in all but the construction and utility industries. In his presentation at an IEEE-Industry Applications Society (IAS) workshop, Ken Mastrullo of the NFPA compared OSHA 1910 Subpart S citations versus accidents and fatalities between 1 Oct. 2003, and 30 Sept. 2004. Installations accounted for 80% of the citations, while safe work practice issues were cited 20% of the time. However, installations accounted for 9% of the accidents, while safe work practice issues accounted for 91% of all electrical-related accidents. Looking at Mastrullo’s data, while the majority of the OSHA citations were for installation issues, the majority of the injuries were related to work practice issues.

Can OSHA cite a company that does not comply with NFPA 70E? The simple answer is: Yes. If employees are involved in a serious electrical incident, OSHA likely will present the employer/owner with several citations. In fact, OSHA assessed more than 2,880 fines between 2007 and 2011 for sites not meeting Regulation 1910.132(d), averaging 1.5 fines a day.

On the other hand, an OSHA inspection may actually help uncover issues. A May 2012 study of 800 California companies found that those receiving an inspection saw a decline of 9.4% in injuries. On average, these companies saved US$350,000 over the five years following the OSHA inspections,3 an outcome far preferable to being fined for noncompliance or experiencing an electrical accident. Beyond the matter of fines, however, any organization that wishes to effectively avoid putting its personnel in danger—and endangering infrastructure and operations—should endeavor to follow NFPA 70E guidelines (or their regional equivalent).

REDUCING ARC FLASH HAZARDS IN THE FACILITY
While personnel-oriented safety measures are the most important (and mandated) steps to reduce the risk of arc flash accidents, there are numerous equipment and component elements that can be incorporated into facility systems that also help reduce the risk. These include metal-clad switchgear, arc resistant switchgear, current-limiting power circuit breakers, and current-limiting reactors. Setting up zone selective interlocking of circuit breakers can also be an effective prevention measure.

TIER STANDARDS & DATA CENTER PRACTICES ALIGN WITH ARC FAULT PREVENTION
Data centers are already ahead of many industries in conforming to many provisions of OSHA and NFPA 70E. Many electrical accidents are caused by issues such as dust in the environment, improper equipment installation, and human factors. To maintain the performance and reliability demanded by customers, data center operators have adopted a rigorous approach to cleaning, maintenance, installation, training, and other tasks that forestall arc flash. Organizations that subscribe to Tier standards and maintain stringent operational practices are better prepared to take on the challenges of compliance with OSHA and NFPA 70E requirements, in particular the requirements for safely performing work on energized systems, when such work is allowed per the safety standards.

For example, commissioning procedures eliminate the risk of improper installation. Periodically load testing engine generators and UPS systems demonstrates that equipment capacity is available and helps identify out-of-tolerance conditions that are indicative of degrading hardware or calibration and alignment issues. Thermographic scanning of equipment, distribution boards, and conduction paths can identify loose or degraded connections before they reach a point of critical failure.

Adherence to rigorous processes and procedures helps avoid operator error, and these processes and procedures serve as tools in personnel training and refresher classes. Facility and equipment design and capabilities, maintenance programs, and operating procedures are typically well defined and established in a mission critical data center, especially one at a Tier III or Tier IV Certification level.

Beyond the Tier Topology, the operational requirements for every infrastructure classification, as defined in the Tier Standard: Operational Sustainability, include the implementation of processes and procedures for all work activities. Completing comprehensive predictive and preventive maintenance increases reliability, which in turn improves availability. Methods of procedure are generally very detailed and task specific. Maintenance technicians meet stringent qualifications to perform work activities. Training is essential, and planning, practice, and preparation are key to managing an effective data center facility.

This industry focus on rigor and reliability in both systems and operational practices, reinforced by the Tier Standards, will enable data center teams to rapidly adopt and adhere to the practices required for compliance with OSHA and NFPA 70E. What still remains in question is whether or not a data center meets the infeasibility test prescribed by these governing bodies in either the equipment design or operational limitations.

It can be argued that some of today’s data center operations approach the status of being “essential” for much of the underlying infrastructure that runs our 24 x 7 digitized society. Data centers support the functioning of global financial systems, power grids and utilities, air traffic control operations, communication networks, and the information processing that supports vital activities ranging from daily commerce to national security. Each facility must assess its operations and system capabilities to enable adherence to safe electrical work practices as much as possible without jeopardizing critical mission functions. In many cases, the answer for a specific data center business requirement may ultimately be a jurisdictional decision.

No measure will ever completely remove the risk of working on live, energized equipment. In instances where working on live systems is necessary and allowed by NFPA 70E rules, the application of Uptime Institute Tier III and Tier IV criteria can help minimize the risks. Tier III and IV both require the design and installation of systems that enable equipment to be fully de-energized to allow planned activities such as repair, maintenance, replacement, or upgrade without exposing personnel to the risks of working on energized electrical equipment.

CONCLUSION
Over the last several decades, data centers and the information processing power they provide have become a fundamental necessity in our global, interconnected society. Balancing the need for appropriate electrical safety measures and compliance with the need to maintain and sustain uninterrupted production capacity in an energy-intensive environment is a challenge. But it is a challenge the data center industry is perhaps better prepared to meet than many other industry segments. It is apparent that those in the data center industry who subscribe to high-availability concepts such as the Tier Standards: Topology and Operational Sustainability are in a position to readily meet the requirements of NFPA 70E and OSHA from an execution perspective.


 

SIDEBAR: PROTECTION BOUNDARY DEFINITIONS
The flash protection boundary is the closest approach allowed by qualified or unqualified persons without the use of PPE. If the flash protection boundary is crossed, PPE must be worn. The boundary is a calculated number based upon several factors such as voltage, available fault current, and time for the protective device to operate and clear the fault. It is defined as the distance at which the worker is exposed to 1.2 cal/cm2 for 0.1 second.

LIMITED APPROACH BOUNDARY
The limited approach boundary is the minimum distance from the energized item where untrained personnel may safely stand. No unqualified (untrained) personnel may approach any closer to the energized item than this boundary. The boundary is determined by NFPA 70E Table 130.4-(1) (2) (3) and is based on the voltage of the equipment (2012 Edition).

RESTRICTED APPROACH BOUNDARY
The restricted approach boundary is the distance where qualified personnel may not cross without wearing appropriate PPE. In addition, they must have a written approved plan for the work that they will perform. This boundary is determined from NFPA Table 130.4-(1) (4)  (2012 Edition) and is based on the voltage of the equipment.

PROHIBITED APPROACH BOUNDARY
Only qualified personnel wearing appropriate PPE can cross a prohibited approach boundary. Crossing this boundary is considered the same as contacting the exposed energized part. Therefore, personnel must obtain a risk assessment before the prohibited boundary is crossed. This boundary is determined by NFPA 70E Table 130.4-(1) (5)  (2012 Edition) and is based upon the voltage of the equipment.


Ed Rafter


Edward P. Rafter has been a consultant to Uptime Institute Professional Services (ComputerSite Engineering) since 1999 and assumed a full time position with Uptime Institute in 2013 as principal of Education and Training. He currently serves as vice president-Technology. Mr. Rafter is responsible for the daily management and direction of the professional education staff to deliver all Uptime Institute training services. This includes managing the activities of the faculty/staff delivering the Accredited Tier Designer (ATD) and Accredited Tier Specialist (ATS) programs, and any other courses to be developed and delivered by Uptime Institute.

 

ADDITIONAL RESOURCES
To review the complete NFPA-70E standards as set forth in NFPA 70E: Standard For Electrical Safety In The Workplace, visit www.NFPA.org

For resources to assist with calculating flash protection boundaries, visit:

•   http://www.littelfuse.com/arccalc/calc.html

•   http://www.pnl.gov/contracts/esh-procedures/forms/sp00e230.xls

•   http://www.bussmann.com/arcflash/index.aspx

To determine what PPE is required, the tables in NFPA 70E-2012 provide the simplest method. They give instant answers with almost no field data needed, although they have limited application and are conservative for most applications (the tables are not intended as a substitute for an arc hazard analysis but only as a guide).

A simplified two-category PPE approach is found in NFPA 70E-2012, Table H-2 of Annex H. This table ensures adequate PPE for electrical workers within facilities with large and diverse electrical systems. Other good resources include:

•   Controlling Electrical Hazards. OSHA Publication 3076, (2002). 71 pages. Provides a basic overview of electrical safety on the job, including information on how electricity works, how to protect against electricity, and how OSHA can help.

•   Electrical Safety: Safety and Health for Electrical Trades Student Manual, U.S. Department of Health and Human Services (DHHS), National Institute for Occupational Safety and Health (NIOSH), Publication No. 2002-123, (2002, January). This student manual is part of a safety and health curriculum for secondary and post-secondary electrical trades courses. It is designed to engage the learner in recognizing and controlling hazards associated with electrical work.

•   Electrocutions Fatality Investigation Reports. National Institute for Occupational Safety and Health
(NIOSH) Safety and Health Topic. Provides information regarding hundreds of fatal incidents involving
electrocutions investigated by NIOSH and state investigators.

•   Working Safely with Electricity. OSHA Fact sheet. Provides safety information on working with
generators, power lines, extension cords, and electrical equipment.

•   Lockout/Tagout OSHA Fact Sheet, (2002).

•   Lockout-Tagout Interactive Training Program. OSHA. Includes selected references for training and
interactive case studies.

•   NIOSH Arc Flash Awareness, NIOSH Publication No. 2007-116D.

ENDNOTES
1.  http://www.arcsafety.com/resources/arc-flash-statistics

2. Common Electrical Hazards in the Workplace including Arc Flash, Workplace Safety Awareness Council (www.wpsac.org), produced under Grant SH-16615-07-60-F-12 from the Occupational Safety and Health Administration, U.S. Department of Labor.

3. “The Business Case For Safety and Health,” U.S. Department of Labor, https://www.osha.gov/dcsp/products/topics/businesscase/

Failure Doesn’t Keep Business Hours: 24×7 Coverage

A statistical justification for 24×7 coverage
By Richard Van Loo

As a result of performing numerous operational assessments at data centers around the world, Uptime Institute has observed that staffing levels at data centers vary greatly from site to site. This observation is discouraging, but not surprising, because while staffing is an important function for data centers attempting to maintain operational excellence, many factors influence an organization’s decision on appropriate staffing levels.

Factors that can affect overall staffing numbers include the complexity of the data center, the level of IT turnover, the number of support activity hours required, the number of vendors contracted to support operations, and business objectives for availability. Cost is also a concern because each staff member represents a direct cost. Because of these numerous factors, data center staffing levels must be constantly reviewed in an attempt to achieve effective data center support at a reasonable cost.

Uptime Institute is often asked, “What is the proper staffing level for my data center?” Unfortunately, there is no quick answer that works for every data center since proper staffing depends on a number of variables.

The time required to perform maintenance tasks and to provide shift coverage support are two basic variables. Staffing for maintenance hours requirements is relatively fixed but affected by which activities are performed by data center personnel and which are performed by vendors. Shift coverage support is defined as staffing for data center monitoring and rounds and for responding to any incidents or events. Staffing levels to support shift coverage can be provided in a number of different ways. Each method of providing shift coverage has potential impacts on operations depending on how that coverage is focused.
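As a rough illustration of how those two variables translate into headcount, the sketch below estimates full-time equivalents (FTEs) for round-the-clock shift coverage and for a fixed annual maintenance workload. All inputs (work week, availability factor, task hours) are hypothetical assumptions, not Uptime Institute guidance.

# Illustrative only: rough FTE math for 24 x 7 shift coverage plus a fixed maintenance
# workload. Every input below is a hypothetical assumption, not Uptime Institute guidance.

HOURS_PER_WEEK = 168        # continuous (24 x 7) coverage
SCHEDULED_WEEK = 40         # scheduled hours per FTE
AVAILABILITY = 0.80         # fraction remaining after vacation, sick time, and training

def ftes_for_shift_coverage(positions_per_shift):
    """FTEs needed to keep a given number of qualified people on site around the clock."""
    return positions_per_shift * HOURS_PER_WEEK / (SCHEDULED_WEEK * AVAILABILITY)

def ftes_for_maintenance(annual_task_hours):
    """FTEs needed to absorb a fixed annual maintenance workload."""
    return annual_task_hours / (52 * SCHEDULED_WEEK * AVAILABILITY)

print(round(ftes_for_shift_coverage(2), 1))   # two people per shift -> about 10.5 FTEs
print(round(ftes_for_maintenance(3000), 1))   # 3,000 task hours per year -> about 1.8 FTEs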

TRENDS IN SHIFT COVERAGE
The primary purpose of having qualified personnel on site is to mitigate the risk of an outage caused by abnormal incidents or events, either by preventing the incident or by containing and isolating it to keep it from spreading or impacting other systems. Many data centers still support shift presence with a team of qualified electricians, mechanics, and other technicians who provide 24 x 7 shift coverage. Remote monitoring technology, designs that incorporate redundancy, campus data center environments, the desire to balance costs, and other practices can lead organizations to deploy personnel differently.

Managing shift presence without having qualified personnel on site at all times can elevate risks due to delayed response to abnormal incidents. Ultimately, the acceptable level of risk must be a company decision.

Other shift presence models include:

• Training security personnel to respond to alarms and execute an escalation procedure

• Monitoring the data center through a local or regional building monitoring system (BMS) and having technicians on call

• Having personnel on site during normal business hours and on call during nights and weekends

• Operating multiple data centers as a campus or portfolio so that a team supports multiple data centers without necessarily being on site at each individual data center at a given time

These and other models have to be individually assessed for effectiveness. To assess the effectiveness of any shift presence model, the data center must determine the potential risks of incidents to the operations of the data center and the impact on the business.

For the last 20 years, Uptime Institute has built the Abnormal Incident Reports (AIRs) database using information reported by Uptime Institute Network members. Uptime Institute analyzes the data annually and reports its findings to Network members. The AIRs database provides interesting insights relating to staffing concerns and effective staffing models.

INCIDENTS OCCUR OUTSIDE BUSINESS HOURS
In 2013, a slight majority of incidents (out of 277 total incidents) occurred during normal business hours. However, 44% of incidents happened between midnight and 8:00 a.m., which underscores the potential need for 24 x 7 coverage (see Figure 1).

Figure 1. Approximately half the AIRs that occurred in 2013 took place between 8 a.m. and 12 a.m.; the other half occurred between 12 a.m. and 8 a.m.


Similarly, incidents can happen at any time of the year. As a result, focusing shift presence activities toward a certain time of year over others would not be productive. Incident occurrence is pretty evenly spread out over the year.

Figure 2 details the day of the week when incidents occurred. The chart shows that incidents occur on nearly an equal basis every day of the week, which suggests that shift presence requirement levels should be the same every day of the week. To do otherwise would leave shifts with little or no shift presence to mitigate risks. This is an important finding because some data centers focus their shift presence support Monday through Friday and leave weekends to more remote monitoring (see Figure 2).

Figure 2. Data center staff must be ready every day of the week.


INCIDENTS BY INDUSTRY
Figure 3 further breaks down the incidents by industry and shows no significant difference in those trends between industries. The chart does show that the financial services industry reported far more incidents than other industries, but that number reflects the makeup of the sample more than anything.


Figure 3. Incidents in data centers take place all year round.

INCIDENT BREAKDOWNS

Knowing when incidents occur does little to say what personnel should be on site. Knowing what kinds of incidents occur most often will help shape the composition of the on-site staff, as will knowing how incidents are most often identified. Figure 4 shows that electrical systems experience the most incidents, followed by mechanical systems. By contrast, critical IT load causes relatively few incidents.

Figure 4. More than half the AIRs reported in 2013 involved the electrical system.


As a result, it would seem to make sense that shift presence teams should have sufficient electrical experience to respond to the most common incidents. The shift presence team must also respond to other types of incidents, but cross training electrical staff in mechanical and building systems might provide sufficient coverage. And, on-call personnel might cover the relatively rare IT-related incidents.

The AIRs database also sheds some light on how incidents are discovered. Figure 5 shows that over half of all incidents in 2013 were discovered by alarms and more than 40% by technicians on site, together accounting for about 95% of incidents. The biggest change over the years covered by the chart is a slow growth in the share of incidents discovered by alarm.

Figure 5. Alarms are now the source for most AIRs; however, availability failures are more likely to be found by technicians.


Alarms, however, cannot respond to or mitigate incidents. Uptime Institute has witnessed a number of methods for saving a data center from going down and reducing the impact of an incident. These methods include having personnel respond to the incident, building redundancy into critical systems, and running strong predictive maintenance programs that forecast potential failures before they occur. Figure 6 breaks down how often each of these methods produced actual saves.

Figure 6. Equipment redundancy was responsible for more saves in 2013 than in previous years.


The chart also appears to suggest that in recent years, equipment redundancy and predictive maintenance are producing more saves and technicians fewer. There are several possible explanations for this finding, including more robust systems, greater use of predictive maintenance, and budget cuts that reduce staffing or move it off site.

FAILURES
The data show that all the availability failures in 2013 were caused by electrical system incidents. A majority of the failures occurred because maintenance procedures were not followed. This finding underscores the importance of having proper procedures and well trained staff, and ensuring that vendors are familiar with the site and procedures.

Figure 7. Almost half the AIRs reported in 2013 were In Service.


Figure 7 further explores the causes of incidents in 2013. Roughly half the incidents were described as “In Service,” which is defined as inadequate maintenance, equipment adjustment, operated to failure, or no root cause found. The incidents attributed to preventive maintenance actually refer to preventive maintenance that was performed improperly. Data center staff caused just 2% of incidents, showing that the interface of personnel and equipment is not a main cause of incidents and outages.

SUMMARY
The increasing sophistication of data center infrastructure management (DCIM), building management systems (BMS), and building automation systems (BAS) increasingly raises the question of whether staffing can be reduced at data centers. The advances in these systems are real and can enhance the operations of a data center; however, as the AIRs data show, mitigation of incidents often requires on-site personnel. This is why it is still a prescriptive behavior for Tier III and Tier IV Operational Sustainability Certified data centers to have qualified full-time equivalent (FTE) personnel on site at all times. The driving purpose is to provide quick response time to mitigate any incidents and events.

The data show that there is no pattern as to when incidents occur; their occurrence is spread fairly evenly across all hours of the day and all days of the week. Watching data centers continue to evolve, with increased remote access and more built-in redundancy, will show whether these trends continue on their current path.

As with any data center operations program, the fundamental objective is risk avoidance. Each data center is unique, with its own set of inherent risks. Shift presence is just one factor, but an important one; decisions about how many people to staff on each shift, and with what qualifications, can have a major impact on risk avoidance and continued data center availability. Choose wisely.


Rich Van Loo


Rich Van Loo is Vice President, Operations for Uptime Institute. He performs Uptime Institute Professional Services audits and Operational Sustainability Certifications. He also serves as an instructor for the Accredited Tier Specialist course.

Mr. Van Loo’s work in critical facilities includes responsibilities ranging from project manager of a major facility infrastructure service contract for a data center to space planning for the design and construction of several data center modifications and facilities IT support. As a contractor for the Department of Defense, Mr. Van Loo provided planning, design, construction, operation, and maintenance of worldwide mission critical data center facilities. Mr. Van Loo’s 27-year career includes 11 years as a facility engineer and 15 years as a data center manager.

Unipol Takes Space Planning to a New Level

Municipal requirements imposed difficult space considerations for Italian insurance company’s Tier IV data center

By Andrea Ambrosi and Roberto Del Nero

Space planning is often the key to a successful data center project. Organizing a facility into functional blocks is a fundamental way to limit interference between systems, reduce any problem related to power distribution, and simplify project development. However, identifying functional blocks and optimizing space within an existing building can be extremely complex. Converting an office building into a data center can cause further complexity.

This was the challenge facing Maestrale, a consortium of four engineering companies active in the building field, including Ariatta Ingegneria dei Sistemi Srl as the mechanical and electrical engineer, when a major Italian insurance company, UnipolSai Assicurazioni S.p.a (UnipolSai), asked it to design a data center in an office building that had been built at the end of the 1980s in Bologna, Italy. UnipolSai set ambitious performance goals, requiring the electrical and mechanical infrastructure to achieve Uptime Institute Tier IV Certification of Constructed Facility and to be very energy efficient.

In addition, the completed facility, designed and built to meet UnipolSai’s requirements, has the following attributes:

•   1,200 kilowatts (kW) maximum overall IT equipment load

•   UPS capacity: not less than 10 minutes

•   Four equipment rooms having a total area of 1,400 square meters (m2)

•   Cold Aisle/Hot Aisle floor layouts

In an energy-conscious project, all innovative free-cooling technologies must be considered.

After a thorough investigation of these free-cooling technologies, Ariatta chose direct free cooling for the UnipolSai project.

MUNICIPAL RESTRICTIONS
The goal of the architectural and structural design of a data center is to accommodate, contain, and protect the mechanical, electrical, and IT equipment. The size, location, and configuration of the mechanical and electrical infrastructure will determine the architecture of the rest of the building. In a pre-existing building, however, this approach does not always apply. More often, builders must work around limitations such as fixed perimeter length and floor height, floors not capable of bearing the weight of expected IT equipment, lack of adjoining external spaces, and other restrictions imposed by municipal regulations. In this project, a series of restrictions and duties imposed by the Municipality of Bologna had a direct impact on technical choices, in particular:

•   Any part or any piece of equipment more than 1.8 meters high in the outside yard or on any external surface of the building (e.g., the roof) would be considered added volume, and therefore not allowed.

•   Any modification or remodelling activity that changed the shape of the 
building was to be considered as incompatible with municipal regulations.

•   The location was part of a residential area with strict noise limits (noise levels at property lines of 50 decibels [dBA] during the day and 40 dBA at night).

New structural work would also be subject to seismic laws, now in force throughout the country. In addition, UnipolSai’s commitment to Uptime Institute Tier IV Certification required it to also find solutions to achieve Continuous Cooling to IT equipment and to Compartmentalize ancillary systems. 

The final design incorporates a diesel rotary UPS (DRUPS) in a 2N distribution scheme, a radial double-feed electrical system, and an N+1 mechanical system with a dual-water distribution backbone (Line 1 and Line 2), which together enable the UnipolSai facility to meet Uptime Institute Tier IV requirements. Refrigerated water chillers with magnetic levitation bearings and air exchangers, inserted in an N+1 hydraulic scheme, serve the mechanical systems. The chillers are provided with a double electric service entrance controlled by an automatic transfer switch (ATS).

The DRUPS combine UPS and diesel engine-generator functions and do not require the battery systems that are normally part of static UPS installations. Uptime Institute Tier IV requires Compartmentalization, which necessitates more space; eliminating the batteries saved a great deal of space. In addition, using the DRUPS to feed the chillers ensured that the facility would meet Tier IV requirements for Continuous Cooling with no need for storage tanks, which would have been difficult to place on this site. The DRUPS also completely eliminated cooling requirements in the UPS room because the design ambient temperature would be around 30°C (maximum 40°C). Finally, using the DRUPS greatly simplified the distribution structure, limiting the ATSs on primary electric systems to a minimum.
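To illustrate the space that avoiding chilled-water buffer tanks can save, the following minimal sketch estimates the tank volume a static-UPS design might need to bridge a chiller restart while maintaining Continuous Cooling. The 1,200-kW IT load comes from the project brief; the ride-through time and the allowed water temperature rise are assumed values, not UnipolSai design figures.

    RHO = 1000.0   # water density, kg/m^3
    CP = 4.186     # specific heat of water, kJ/(kg*K)

    def buffer_tank_volume_m3(it_load_kw, ride_through_min, delta_t_k):
        # Chilled-water volume needed to absorb it_load_kw for ride_through_min
        # minutes with a temperature rise of delta_t_k across the buffer.
        energy_kj = it_load_kw * ride_through_min * 60.0   # kW x s = kJ
        mass_kg = energy_kj / (CP * delta_t_k)
        return mass_kg / RHO

    # 1,200 kW IT load (from the article); a 5-minute ride-through and 6 K rise are assumptions.
    print(f"{buffer_tank_volume_m3(1200, 5, 6):.1f} m^3")   # roughly 14 m^3 of stored water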

Municipal restrictions meant that the best option for locating the DRUPS and chillers would require radically transforming some areas inside the building. For example, Ariatta uncovered an office floor and adapted the structures and waterproofing to install the chiller plant (see Figures 1 and 2).

Figures 1 and 2. Bird’s eye and ground level views of the chiller plant.

Positioning the DRUPS posed another challenge. In another municipality, the units’ dimensions (12-m length by 4.5-m height), weight, and maintenance requirements would have guided the design team towards a simple solution, such as installing them in containers at ground level. However, the municipal restrictions for this location (the 1.8-m limit above street level) required an alternative solution. As a result, geological, geotechnical, and hydrogeological studies of the underground garage site showed that:

•   Soil conditions met the technical and structural requirements of the DRUPS installation.

•   The stratum was lower than the foundations.

•   Flood indexes are fixed 30 centimeters above street level (taking zero level as reference).

The garage area was therefore opened and completely modified to contain a watertight tank housing the DRUPS. The tank included a 1.2-m high parapet to prevent flooding and was equipped with redundant water-lifting systems fed by the DRUPS (see Figures 3 and 4).

Figures 3 and 4. Particular care was given to protect the DRUPS against water intrusions. Soundproofing was also necessary.

Meeting the city’s acoustic requirements required soundproofing the DRUPS machines, so the DRUPS systems were double shielded, reducing noise levels to 40 decibels (dBA) at 10 m during normal operation when connected to mains power. Low-noise chillers and high-performance acoustic barriers helped the entire facility meet its acoustical goals.
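As a rough plausibility check of those figures (an assumption-laden sketch, not an acoustic study), free-field point-source spreading attenuates sound by 20·log10(r2/r1) dB with distance, so a 40 dBA level measured at 10 m already falls below the 40 dBA night limit at any greater distance to the property line:

    import math

    def level_at_distance(level_db_at_r1, r1_m, r2_m):
        # Free-field, point-source (spherical spreading) attenuation with distance.
        return level_db_at_r1 - 20.0 * math.log10(r2_m / r1_m)

    # 40 dBA at 10 m (from the article); a 30-m distance to the property line is assumed.
    print(round(level_at_distance(40.0, 10.0, 30.0), 1))   # about 30.5 dBA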

After identifying technical rooms and allocating space for equipment rooms, Ariatta had to design systems that met UnipolSai’s IT and mechanical and electrical requirements, IT distribution needs, and Uptime Institute Tier IV Compartmentalization requirements.

The floors of the existing building did not always align, especially on the lower stories. These changes in elevation were hard to read in plans and sections. To meet this challenge, Starching S.R.L. Studio Architettura & Ingegneria and Redesco Progetti Srl, both part of the Maestrale Consortium, developed a three-dimensional Revit model, which included information about the mechanical and electrical systems. The Revit model helped identify problems caused by the misalignment of the floors and conflicts between systems during the design phase. It also helped communicate new information about the project to contractors during the construction phase (see Figures 5 and 6).


Figures 5 and 6. Revit models helped highlight changes in building elevations that were hard to discern in other media and also aided in communication with contractors.

The use of 3D models is becoming a common way to eliminate interference between systems in final design solutions, with positive effects on the execution of work in general and only a moderate increase in engineering costs.

Figure 7. Fire-rated pipe enclosure

At UnipolSai, Compartmentalizing ancillary systems represented one of the main problems to be resolved to obtain Uptime Institute Tier IV Certification because of restrictions imposed by the existing building. Ariatta engaged in continuous dialogue with Uptime Institute to identify technical solutions. This dialogue, along with studies and functional demonstrations carried out jointly with sector specialists, led to a shared solution in which the two complementary systems that form the technological backbone are compartmentalized with respect to one another (see Figure 7). The enclosures, which run basically parallel to each other, have:

•   An external fire-resistant layer (60 minutes, same as the building structure)

•   An insulation layer to keep the temperature of the technological systems within design limits for 60 minutes

•   A channel that contains and protects against leaks and absorbs shocks

•   Dedicated independent mounting brackets.

This solution was needed where the architectural characteristics of the building affected the technological backbone (see Figure 8).

Figure 8. The layout of the building limited the potential paths for pipe runs.

ENERGY EFFICIENCY

The choice of direct free cooling followed an environmental study intended to determine and analyze the time periods when outdoor thermo-hygrometric conditions are favorable to the indoor IT microclimate of the UnipolSai data center, as well as the relevant technical, economic, and energy impact of free cooling on the facility.

The next-generation IT equipment used at UnipolSai allows the facility to modify the environmental parameters used as the design reference.

Figure 9. UnipolSai sized equipment to meet the requirements of ASHRAE’s “Thermal Guidelines for Data Processing Environments, 3rd edition,” as illustrated by that publication’s Figure 2.

The air conditioning systems in the data center were sized to guarantee temperatures of 24–26°C (75–79°F) in the Class A1 equipment rooms, per ASHRAE’s “Thermal Guidelines for Data Processing Environments, 3rd edition,” and in accordance with the ASHRAE psychrometric chart (see Figure 9).

The studies carried out showed that, in the Bologna region specifically, the outdoor thermo-hygrometric conditions are favorable to the IT microclimate of the data center about 70% of the time with energy savings of approximately 2,000 megawatt-hours.
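The screening behind such a study can be sketched as a simple scan of hourly weather data against the design envelope. The sketch below is illustrative only: the 26°C upper supply limit comes from the design described above, while the dewpoint limit, the CSV layout, and the file name are assumptions rather than the study’s actual inputs.

    import csv

    SUPPLY_MAX_C = 26.0       # upper end of the design supply band (from the article)
    MAX_DEWPOINT_C = 15.0     # humidity limit (assumed)

    def free_cooling_fraction(weather_csv):
        # weather_csv: hourly rows with 'dry_bulb_c' and 'dewpoint_c' columns (assumed format).
        total = favorable = 0
        with open(weather_csv, newline="") as f:
            for row in csv.DictReader(f):
                total += 1
                t = float(row["dry_bulb_c"])
                td = float(row["dewpoint_c"])
                # Outdoor air cool and dry enough to be used directly (colder air
                # can be mixed with return air up to the supply band).
                if t <= SUPPLY_MAX_C and td <= MAX_DEWPOINT_C:
                    favorable += 1
        return favorable / total if total else 0.0

    # Example call with a hypothetical file of hourly Bologna weather data;
    # the study cited above found roughly 0.70.
    # print(free_cooling_fraction("bologna_hourly.csv"))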

Direct free cooling brought undeniable advantages in terms of energy efficiency but introduced a significant functional complication related to Tier compliance. The Tier Standards do not reference direct free cooling or other economization systems, as the Tier requirements apply regardless of the technology. Eventually, it was decided that the free-cooling system had to be subordinated to continuous operation of the IT equipment and excluded every time there was a problem with the mechanical and electrical systems, in which case Continuous Cooling would be ensured by the chiller plant.
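A minimal sketch of that subordination logic, written as assumed control pseudocode rather than the plant’s actual building-management program, might look like the following: any mechanical or electrical fault excludes free cooling and hands the load back to the chiller plant.

    def select_cooling_mode(outdoor_favorable, mech_fault, elec_fault):
        # Continuous Cooling takes priority over economization: any fault on the
        # mechanical or electrical systems excludes direct free cooling.
        if mech_fault or elec_fault:
            return "CHILLER_PLANT"
        return "DIRECT_FREE_COOLING" if outdoor_favorable else "CHILLER_PLANT"

    # Favorable outdoor air but a mechanical fault: fall back to the chillers.
    assert select_cooling_mode(True, mech_fault=True, elec_fault=False) == "CHILLER_PLANT"
    assert select_cooling_mode(True, mech_fault=False, elec_fault=False) == "DIRECT_FREE_COOLING"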

The direct free-cooling arrangement, with unchanneled hot-air rejection imposed by the pre-existing architectural restrictions, dictated the room layout and drove the choice of Cold Aisle containment.

The direct free-cooling system consists of N+1 CRACs placed along the perimeter of the room, blowing cool air into a 60-inch plenum created by the access floor. The same units manage the free-cooling system. Every machine is equipped with a dual-feed electric entrance controlled by an ATS and connected to a dual water circuit through a series of automatic valves (see Figure 10).

Figure 10. CRACs are connected with a dual-feed electric entrance controlled by an ATS and connected to a dual water circuit.

Containing the Cold Aisles caused a behavioral response among the IT operators, who normally work in a cold environment. At UnipolSai’s data center, they encounter hot air when entering the room. Design return air temperatures in the circulation areas are 32–34°C (90–93°F), and design supply air temperatures are 24–26°C (75–79°F). It became necessary to start an informational campaign to prevent alarm over room temperatures in the areas outside the functional aisles (see Figures 11-13).


Figures 11-13. Pictures show underfloor piping, containment, and the raised floor environment.

Prefabricated electric busbars placed on the floor at regular intervals provide the power supply to the IT racks. This decision was made in collaboration with UnipolSai technicians, who considered it the most flexible solution in terms of installation and power draw, both initially and to accommodate future changes (see Figures 14 and 15).

Figure 14. Electric busbar

Figure 15. Taps on the busbar allow great flexibility on the data center floor and feed servers on the white space floor below.

In addition, a labeling system based on a univocal synthetic description (an alphanumeric code) and color coding allows quick visual identification of any part of any system and simplifies the process of testing, operating, and managing all building systems (see Figures 16 and 17).
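As a purely hypothetical illustration of such a scheme (the article does not publish UnipolSai’s actual code structure), a label might encode system, distribution line, level, and a sequential item number so that any component can be traced back to its backbone at a glance:

    import re

    # Hypothetical label format: SYSTEM-LINE-LEVEL-ITEM, e.g. CHW-1-L2-014.
    LABEL_RE = re.compile(
        r"^(?P<system>[A-Z]{2,3})-(?P<line>[12])-(?P<level>[A-Z0-9]{2})-(?P<item>\d{3})$"
    )

    def parse_label(label):
        # Returns the label's fields, or raises if the label does not match the scheme.
        m = LABEL_RE.match(label)
        if not m:
            raise ValueError(f"unrecognized label: {label}")
        return m.groupdict()

    print(parse_label("CHW-1-L2-014"))
    # {'system': 'CHW', 'line': '1', 'level': 'L2', 'item': '014'}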


Figures 16 and 17. UnipolSai benefits from a well-thought-out labeling system, which simplifies many aspects of operations.

FINAL TESTING

Functional tests were carried out at nominal load with the support of electric heaters, distributed in a regular manner within the equipment rooms and connected to the infrastructure feeding the IT equipment (see Figures 18 and 19). Uptime Institute also observed technical and functional tests as part of Tier IV Certification of Constructed Facility (TCCF). The results of all the tests were positive; final demonstrations are pending. The data center has received Uptime Institute Tier IV Certification of Design Documents, and Tier IV Certification of Constructed Facility is in progress.

Figures 18 and 19. Two views of the data center floor, including heaters, during final testing.

To fully respond to UnipolSai’s energy saving and consumption control policy, the site was equipped with a network of heating/cooling energy meters and electrical meters connected to the central supervision system. Each chiller, pumping, and air-handling system is individually metered on the electrical side, with the chillers also metered on the thermal side. Each electric system feeding IT loads is also metered.
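The following minimal sketch shows how readings from such a metering network might be rolled up into a facility-level indicator; the meter names and the PUE-style calculation are illustrative assumptions, not the actual logic of UnipolSai’s supervision system.

    def rollup(electrical_kwh, it_kwh):
        # electrical_kwh: per-system energy over the same interval, e.g.
        # {"chillers": ..., "pumps": ..., "air_handling": ..., "it_feed": ...}
        total = sum(electrical_kwh.values())
        return {
            "total_kwh": round(total, 1),
            "it_kwh": round(it_kwh, 1),
            "pue": round(total / it_kwh, 2) if it_kwh else None,
        }

    # Illustrative hourly figures only (not measured data from the site).
    print(rollup({"chillers": 180.0, "pumps": 25.0,
                  "air_handling": 40.0, "it_feed": 1205.0}, it_kwh=1200.0))
    # {'total_kwh': 1450.0, 'it_kwh': 1200.0, 'pue': 1.21}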

UnipolSai also adopted DCIM software that, if properly used, can represent the first step towards an effectively organized maintenance process, which is essential for keeping a system efficient and operational regardless of its level of redundancy and sophistication.


Andrea Ambrosi

Andrea Ambrosi is a project manager, design team manager, and site manager at Ariatta Ingegneria dei Sistemi Srl (Ariatta). He is responsible for the executive planning and management of operations of electrical power, control and supervision systems, safety and security, and fire detection systems to be installed in data centers. He has specific experience in domotics and special systems for the high-tech residential sector. He has been an Accredited Tier Designer since 2013 and an Accredited Tier Specialist since 2014.

Roberto Del Nero

Roberto Del Nero is a project manager and design team manager at Ariatta, where he is responsible for the executive planning and management of mechanical plants, control and supervision systems, fire systems, and plumbing and drainage to be installed in data centers. He has been a LEED AP (Accredited Professional) since 2009, an Accredited Tier Designer since 2013, and an Accredited Tier Specialist since 2014.