Experience, teamwork, and third-party verification are keys to avoiding data center construction problems
By Keith Klesner
In 2014, Uptime Institute spoke to the common conflicts between data center owners and designers. In our paper, “Resolving Conflicts Between Data Center Owners and Designers” [The Uptime Institute Journal, Volume 3, p 111], we noted that both the owner and designer bear a certain degree of fault for data center projects that fail to meet the needs of the enterprise or require expensive and time-consuming remediation when problems are uncovered during commissioning or Tier Certification.
Further analysis reveals that not all the communications failures can be attributed to owners or designers. In a number of cases, data center failures, delays, or cost overruns occur during the construction phase because of misaligned construction incentives or poor contractor performance. In reality, the seeds of both these issues are sown in the earliest phases of the capital project, when design objectives, budgets, and schedules are developed, RFPs and RFIs issued, and the construction team assembled. The global scale of planning shortfalls and project communication issues became clear due to insight gained through the rapid expansion of the Tier Certiﬁcation program.
Many construction problems related to data center functionality are avoidable. This article will provide real-life examples and ways to avoid these problems.
In Uptime Institute’s experience from more than 550 Tier Certifications in over 65 countries, problems in construction resulting in poor data center performance can be attributed to:
• Poor integration of complex systems
• Lack of thorough commissioning or compressed commissioning schedules
• Design changes
• Substitution of materials or products
These issues arise during construction, commissioning, or even after operations have commenced and may impact cost, schedule, or IT operations. These construction problems often occur because of poor change management processes, inexperienced project teams, misaligned objectives of project participants, or lack of third-party verification.
Lapses in construction oversight, planning, and budget can mean that a new facility will fail to meet the owner’s expectations for resilience or require additional time or budget to cure problems that become obvious during commissioning—or even afterwards.
APPOINTING AN OWNER’S REPRESENTATIVE
At the project outset, all parties should recognize that owner objectives differ greatly from builder objectives. The owner wants a data center that best meets cost, schedule, and overall business needs, including data center availability. The builder wants to meet project budget and schedule requirements while preserving project margin. Data center uptime (availability) and operations considerations are usually outside the builder’s scope and expertise.
Thus, it is imperative that the project owner—or owner’s representatives—devise contract language, processes, and controls that limit the contractors’ ability to change or undermine design decisions while making use of the contractors’ experience in materials and labor costs, equipment availability, and local codes and practices, which can save money and help construction follow the planned timeline without compromising availability and reliability.
Data center owners should appoint an experienced owner’s representative to properly vet contractors. This representative should review contractor qualifications, experience, staffing, leadership, and communications. Less experienced and cheaper contractors can often lead to quality control problems and design compromises.
The owner or owner’s representative must work through all the project requirements and establish an agreed upon sequence of operations and an appropriate and incentivized construction schedule that includes sufficient time for rigorous and complete commissioning. In addition, the owner’s representative should regularly review the project schedule and apprise team members of the project status to ensure that the time allotted for testing and commissioning is not reduced.
Project managers, or contractors, looking to keep on schedule may perform tasks out of sequence. Tasks performed out of sequence often have to be reworked to allow access to space allocated to another system or to correct misplaced electrical service, conduits, ducts, etc., which only exacerbates scheduling problems.
Construction delays should not be allowed to compromise commissioning. Incorporating penalties for delays into the construction contract is one solution that should be considered.
Value Engineering (VE) is a regularly accepted construction practice employed by owners to reduce the expected cost of building a completed design. The VE process has its beneﬁts, but it tends to focus just on the ﬁrst costs of the build. Often conducted by a building contractor, the practice has a poor reputation among designers because it often leads to changes that compromise the design intent. Yet other designers believe that in qualified hands, VE, even in data centers, can yield savings for the project owner, without affecting reliability, availability, or operations.
If VE is performed without input from Operations and appropriate design review, any initial savings realized from VE changes may be far less than charges for remedial work needed to restore features necessary to achieve Concurrent Maintainability or Fault Tolerance and increased operating costs over the life of the data center (See Start with the End in Mind, The Uptime Institute Journal, Volume 3, p.104).
Uptime Institute believes that data center owners should be wary of changes suggested by VE that deviate from either the owner’s project requirements (OPR) or design intent. Cost savings may be elusive if changes resulting from VE substantially alter the design. As a result, each and every change must be scrutinized for its effect on the design. Retaining the original design engineer or a project engineer with experience in data centers may reduce the number of inappropriate changes generated during the process. Even so, data center owners should be aware that Uptime Institute personnel have observed that improperly conducted VE has led to equipment substitutions or systems consolidations that compromised owner expectations of Fault Tolerance or Concurrent Maintainability. Contractors may substitute lower-priced equipment that has different capacity, control methodology, tolerances, or specifications without realizing the effect on reliability.
Examples of VE changes include:
• Eliminating valves needed for Concurrent Maintainability (see Figure 1)
• Reducing the number of automatic transfer switches (ATS) by consolidating equipment onto a single ATS
• Deploying one distinct panel rather than two, confounding Fault Tolerance
• Integrating economizer and energy-efficiency systems in a way that does not allow for Concurrent Maintainability or Fault Tolerant operation
ADEQUATE TIME FOR COMMISSIONING
Problems attributed to construction delays sometimes result when the initial construction timeline does not include adequate time for Level 4 and Level 5 testing. Construction teams that are insufficiently experienced in the rigors of data center commissioning (Cx) are most susceptible to this mistake. This is not to say that builders do not contribute to the problem by working to a deadline and regarding the commissioning period as a kind of buffer that can be accessed when work runs late. For both these reasons, it is important that the owner or owner’s representative take care to schedule adequate time for commissioning and ensure that contractors meet or exceed construction deadlines. A recommendation would be to engage the Commissioning Agent (CxA) and General Contractor early in the process as a partner in the development of the project schedule.
In addition, data center capital projects include requirements that might be unfamiliar to teams lacking experience in mission critical environments; these requirements often have budgetary impacts.
For example, owners and owner’s representatives must scrutinize construction bids to ensure that they include funding and time for:
• Factory witness tests of critical equipment
• Extended Level 4 and Level 5 commissioning with vendor support
• Load banks to simulate full IT load within the critical environment
• Diesel fuel to test and verify engine-generator systems
EXAMPLES OF DATA CENTER CONSTRUCTION MISTAKES
Serious mistakes can take place at almost any time during the construction process, including during the bidding process. In one such instance, an owner’s procurement department tried to maximize a vendor discount for a UPS but failed to order bus and other components to connect the UPS.
In another example, consider the contractor who won a bid based on the cost of transporting completely assembled generators on skids for more than 800 miles. When the vendor threatened to void warranty support for this creative use of product, the contractor was forced to absorb the substantial costs of transporting equipment in a more conventional way. In such instances, owners might be wise to watch closely whether the contractor tries to recoup his costs by changing the design or making other equipment substitution.
During the Tier Certification of a Constructed Facility (TCCF) for a large financial firm, Uptime Institute uncovered a problematic installation of electrical bus duct. Experienced designers and contractors, or those willing to involve Operations in the construction process, know that these bus ducts should be regularly scanned under load at all joints. Doing so ensures that the connections do not loosen and overheat, which can lead to an arc-based failure. Locating the bus over production equipment or in hard to reach locations may prevent thorough infrared scanning and eventual maintenance.
Labeling the critical feeders is just as important so Operations knows how to respond to an incident and which systems to shut down (see Figure 2).
In this case, the TCCF team found that the builder implemented a design as it saw fit, without considering maintenance access or labeling of this critical infrastructure. The builder had instead rerouted the bus ducts into a shared compartment and neglected to label any of the conductors.
In another such case, a contractor in Latin America simply did not want to meet the terms of the contract. After bidding on a scope of work, the contractor made an unapproved change that was approved by the local engineer. Only later did the experienced project engineer hired by the owner note the discrepancy, which began a months-long struggle to get the contractor to perform. During this time, when he was reminded of his obligation, he simply deflected responsibility and eventually admitted that he didn’t want to do the work as specified. The project engineer still does not know the source of the contractor’s intransigence but speculates that inexperience led him to submit an unrealistically low bid.
Uptime Institute has witnessed all the following cooling system problems in facilities with Tier III objectives:
• When the rooftop unit (RTU) control sequence was not well understood and coordinated, RTU supply air fan and outside air dampers did not react at the same speed, creating over/under pressure conditions in the data hall. In one case, over-pressurization blew a wall out. In another case over/under pressure created door opening and closing hazards.
• A fire detection and suppression system was specifically reviewed for Concurrent Maintainability to ensure no impact to power or cooling during any maintenance or repair activities. At the TCCF, Uptime Institute recognized that a dual-fed UPS power supply to a CRAC shutdown relay that fed a standing voltage system was still an active power supply to the switchboard, even though the mechanical board had been completely isolated. Removing that relay caused the loss of all voltage, the breakers for all the CRACs to open, and critical cooling to the data halls and UPS rooms to be lost. The problem was traced to a minor construction change to the Concurrently Maintainable design of a US$22-million data center.
• In yet another instance, Uptime Institute discovered during a TCCF that a builder had fed power to a motorized building air supply and return using a single ATS, which would have defeated all critical cooling. The solution involved the application of multiple distributed damper control power ATS devices.
Fuel supply systems are also susceptible to construction errors. Generally diesel fuel for engine generators is pumped from bulk storage tanks through a control and filtration room to day tanks near the engine generators.
But in one instance, the fuel subcontractor built the system incorrectly and failed to do adequate quality control. The commissioning team also did not rigorously confirm that the system was built as designed, which is a major oversight. In fact, the commissioning agent was only manually testing the valves as the TCCF team arrived on site (see Figure 3). In this example, an experienced data center developer created an overly complex design for which the architect provisioned too little space. Operating the valves required personnel to climb on and over the piping. Much of the system was removed and rebuilt at the contractor’s expense. The owner also suffered adding project time and additional commissioning and TCCF testing after the fact.
AVOIDING CONSTRUCTION PROBLEMS
Once a design has been finalized and meets the OPR, change control processes are essential to managing and reducing risk during the construction phase. For various reasons, many builders, and even some owners, may be unfamiliar with the criticality of change control as it relates to data center projects. No project will be completely error free; however, good processes and documentation will reduce the number and severity of errors and sometimes make the errors that do occur easier to fix. Uptime Institute recommends that anyone contemplating a data center project take the following steps to protect against errors and other problems that can occur during construction.
Gather a design, construction, and project management team with extensive data center experience. If necessary bring in outside experts to focus on the OPR. Keep in mind that an IT group may not understand schedule risk or the complexity of a project. Experienced teams pushback on unrealistic schedules or VE suggestions that do not meet OPR, which prevents commissioning schedule compression and leads to good Operational Sustainability. In addition, experienced teams have data center operations and commissioning experience, which means that project changes will more likely benefit the owner. The initial costs may be higher, but experienced teams bring better ROI.
Because experienced teams understand the importance of data center specific Cx, the CxA will be able to work more effectively early in the process, setting the stage for the transition to operations. The Cx requirements and focus on functionality will be clear from the start.
In addition, Operations should be part of the design and construction team from the start. Including Operations in change management gives it the opportunity to share and learn key information about how that data center will run, including set points, equipment rotation, change management, training, and spare inventory, that will be essential in every day operations and dealing with incidents.
Finally vendors should be a resource to the construction team, but almost by definition, their interests and those of the owner are not aligned.
Assembling an experienced team only provides benefits if they work as a team. The owner and owner’s representatives can encourage collaboration among team members who have divergent interests and strong opinions by structuring contracts with designers, project engineering, and builders to prioritize meeting the OPR. Many data center professionals find that Design-Build or Design–Bid–Build using guaranteed maximum price (GMP) and sharing of cost savings contract types conducive to developing a team approach.
Third-party verifications can assure the owner that the project delivered meets the OPR. Uptime Institute has witnessed third-party verification improve contractor performance. The verifications motivate the contractors to work better, perhaps because verification increases the likelihood that shortcuts or corner cutting will be found and repaired at the contractor’s expense. Uptime Institute does not believe that contractors, as a whole, engage in such activities, but it is logical that the threat of verification may make contractors more cautious about “interpreting contract language” and making changes that inexperienced project engineers and owner’s representatives may not detect.
Certifications and verifications are only effective when conducted by an unbiased, vendor-neutral third-party. Many certifications in the market fail to meet this threshold. Some certifications and verification processes are little more than a vendor stamp of approval on pieces of equipment. Others take a checklist approach, without examining causes of test failures. Worthwhile verification and certification approaches insist on identifying the causes of anomalous results, so they do not repeat in a live environment.
Similarly, the CxA should also be independent and not the designer or project engineer. In addition the Cx team should have extensive data center experience.
The CxA should focus on proving the design and installation meet OPR. The CxA should be just as inquisitive as the verification and certification agencies, and for the same reasons: if the root cause of abnormal performance during commissioning is not identified and addressed, it will likely recur during operations.
Third-party verifications and certifications provide peer review of design changes and VE. The truth is that construction is messy: On-site teams can get caught up in the demands of meeting budget and schedule and may lose site of the objective. A third-party resource that reviews major RFIs, VE, or design changes can keep a project on track, because an independent third-party can remain uninfluenced by project pressure.
TIER CERTIFICATION IS THE WRONG TIME TO FIND THESE PROBLEMS
Uptime Institute believes that the Tier Certification process is not the appropriate time to identify design and construction errors or to find that a facility is not Concurrently Maintainable or Fault Tolerant, as the owner may require. In fact, we note with alarm that a great number of the examples in this article were first identified during the Tier Certification process, at a time when correcting problems is most costly.
In this regard, then, the number of errors discovered during commissioning and Tier Certifications point out one value of third-party review of the design and built facility. By identifying problems that would have gone unnoticed until a facility failed, the third-party reviewer saves the enterprise a potentially existential incident.
More often, though, Uptime Institute believes that a well-organized construction process, including independent Level 4 and Level 5 Commissioning and Tier Certification, includes enough checks and balances to catch errors as early as possible and to eliminate any contractor incentive to “paper over” or minimize the need for corrective action when deviations from design are identified.
SIDEBAR: FUEL SUPPLY SYSTEM ISSUE OVERLOOKED DURING COMMISSIONING
Uptime Institute recently examined a facility that used DRUPS to meet the IT loads in a data center. The facility also had separate engine generators for mechanical loads. The DRUPS were located on the lower of two basement levels, with bulk fuel storage tanks buried outside the building. As a result, the DRUPS and daily tanks were lower than the actual bulk storage tanks.
The Tier Certification Constructed Facility (TCCF) demonstrations required that the building operate on engine-generator sets for the majority of the testing. During the day, the low fuel alarm tripped on multiple DRUPS.
The ensuing investigation faulted the sequence of operations for the fuel transfer from the bulk storage tanks to the day tanks. When the day tanks called for fuel, the system would open the electrical solenoid valve at the day tank and delay the start of the fuel transfer pump to flow fuel. This sequence was intended to ensure the solenoid valve had opened so the pump would not deadhead against a closed valve.
Unfortunately, when the valve opened, gravity caused the fuel in the pipe to flow into the day tank before the pump started, which caused an automatic fuel leak detection valve to close. The fuel pump was pumping against a closed valve.
The fuel supply problem had not manifested previously, although the facility had undergone a number of commissioning exercises, because the engine-generator sets had not run long enough to deplete the fuel from the day tanks. In these exercises, the engine generators would run for a period of time and not start again until the next day. By then, the pumps running against the closed valve pushed enough fuel by the closed valves to refill the day tanks. The TCCF demonstrations caused the engine generators to run non-stop for an entire day, which emptied the day tanks and required the system to refill the day tanks in real time.
The solution to this problem did not require drastic remediation, as sometimes occurs. Instead, engineers removed the time delay after the opening of the valve from the sequence of operation so that fuel could flow as desired.
MORAL OF THE STORY
Commissioning is an important exercise. It ensures that data center infrastructure is ready day one to support a facility’s mission and business objectives. Commissioning activities must be planned so that every system is required to operate under real-world conditions. In this instance, the engine-generator set runs were too brief to test the fuel system in a real-world condition.
TCCF brings another perspective, which made all the difference in this case. In the effort to test everything during commissioning, the big picture can be lost. The TCCF focuses on demonstrating each system works as a whole to support the overall objective of supporting the critical load.
Keith Klesner’s career in critical facilities spans 15 years. In the role of Uptime Institute Vice President of Strategic Accounts, Mr. Klesner has provided leadership and results-driven consulting for leading organizations around the world. Prior to joining Uptime Institute, Mr. Klesner was responsible for the planning, design, construction, operation, and maintenance of critical facilities for the U.S. government worldwide. His early career includes six years as a U.S. Air Force Officer. He has a Bachelor of Science degree in Civil Engineering from the University of Colorado- Boulder and a Masters in Business Administration from the University of LaVerne. He maintains status as a Professional Engineer (PE) in Colorado and is a LEED Accredited Professional.