Compass sets the course with new modular data center approach.
The term modular has been used to describe a variety of approaches to data center design. Historically, the first commercially available modular design was a 2007 Sun Microsystems container-based product called the Black Box. Today, the term describes products that range from shipping containers and simple repeated designs to fully manufactured IT spaces and MEP systems built in factories and shipped to various sites.
Many modular data center providers have written white papers that tout the benefits of their proprietary designs, but there are also some informative reports on the ins and outs of modular design. John Stanley of 451 Research presented the results of his survey work at the Uptime Institute Symposium in May 2012. He surveyed 35 companies that use and provide various types of modular solutions for data centers. His research pointed out that 65% of those surveyed deployed capacity in ‘chunky’ increments, specifically highlighting the need for small chunks of capacity per module or deployment to match growth of their data centers.
Compass has based its business strategy on this industry driver and five main features of modular design:
High quality
Low construction costs
Speed to deploy
Integrated supply chain
Low operating costs
In the Green Grid paper, “Deploying and Using Containerized/Modular Data Center Facilities,” published in November 2011, the Green Grid demonstrated that modular data centers can follow capacity requirements more closely, freeing capital and keeping MEP systems more fully loaded than large-scale deployments. The authors of the paper showed that end users could employ their utility systems and floor spaces more fully, without undue concern for system exhaustion or space starvation. The authors discuss how small increments of capacity can follow compute demands more closely, with the caveat that the facility must also have a faster implementation cycle in order to match the changing IT requirements.
The data center industry is beginning to realize the benefits of the early industrial revolution. Standardized modular power center designs provide some of the same benefits to design and construction personnel. Instead of hand-building custom electrical systems for each data center, the modular approach allows for greater deployment speed, improved quality and lower costs, all achieved by using factory-based labor. The use of modules also relieves labor stacking on the job site, while reducing the overall cost of the work by a significant amount.
In his 1936 paper, “Factors Affecting the Cost of Airplanes,” T.P. Wright first quantified the cost savings that could be attained using factory labor. Wright showed that the direct labor cost of assembling an airframe decreased by roughly 20% for each doubling of the production numbers. In other words, if the labor cost of building one airframe was $1,000,000, the labor cost of building two of the same model might be $800,000 each. Doubling again would mean the cost of four of the same airframes would be $640,000 each, or 36% lower than the first unique model.
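Wright's doubling rule can be written as a short calculation. The following is a minimal sketch, assuming an 80% learning rate (a 20% drop per doubling) and the $1,000,000 first-unit figure used above; the function and parameter names are illustrative and do not come from Wright's paper.

```python
import math


def unit_labor_cost(first_unit_cost: float, units: int, learning_rate: float = 0.80) -> float:
    """Labor cost per unit at a production volume of `units`.

    Each doubling of volume multiplies the cost by `learning_rate`,
    so cost(n) = first_unit_cost * n ** log2(learning_rate).
    """
    if units < 1:
        raise ValueError("units must be at least 1")
    exponent = math.log2(learning_rate)  # about -0.322 for an 80% curve
    return first_unit_cost * units ** exponent


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        cost = unit_labor_cost(1_000_000, n)
        saving = 1 - cost / 1_000_000
        print(f"{n} units: ${cost:,.0f} each ({saving:.0%} below the first unit)")
```

With these inputs the sketch reproduces the $800,000 and $640,000 figures in the text; a third doubling, to eight units, continues the same curve.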
Compass Data Centers intended to take advantage of this principle as it developed its patent-pending solution. In doing so, Compass has found that modular designs enable it to:
Benefit from learning-curve dynamics to improve quality and speed of implementation with each iteration
Lower operating costs from design efficiencies that tune initial innovative ideas
Provide predictable reliability attributes and repeatable, low-cost operations
Deliver a standard product
As a result, Compass has moved from building a prototype on every data center project to productizing the data center program and build.
Despite the fact that the industry has not settled on a clear definition of the ‘modular data center,’ it is attempting to evade this ambiguity by proclaiming the coming of ‘Version 2,’ which seems to include prefabrication as a fundamental element. Before, modular construction was associated with ISO containers and packaged chillers. Now, modularity is being embraced as synonymous with sound design and sound capital usage.
The Compass Deployment
Every data center design begins with the establishment of availability and power requirements – reliability and power density are the cornerstones of any data center specification. The business needs of the enterprise and the IT applications to be contained within the data center should drive these decisions.
Virtually every initial design addresses the project power requirements and references the Uptime Institute’s well-known Tier specifications to define reliability requirements. Uptime’s Tier Standard helps define the equipment and systems required to support the reliability and power demands of the business’s IT model. If the needs of a business are truly defined by an Uptime Tier III specification, then its data center design should be certifiable as a Tier III design.
Compass’ Truly Modular Architecture design provides for standard features such as RHI Modular Power Solutions’ Modular Power Center units, which are also built off-site, power modules that can be configured to provide a 2N power infrastructure, and N+1 mechanical systems. Other Compass architecture features include:
10,000 square feet of column-free raised floor space
Hardened structure (Seismic 1.5 and Category 4 Hurricane Winds)
The capability to support up to 20-kW racks
Convenient operations, staging and storage spaces
Uptime Tier III-certifiable design
Uptime Tier III-certifiable facility
The standardized design of the Compass Modular Architecture provides assurance that the facility will be Tier III design certified. Internal auditors and external customers can not only be assured of the design certification, but also have the option to certify the facility itself. Importantly, the ability to productize a unique modular data center solution into a repeatable and well-defined process was a top priority. Modularizing data center components permits control over three primary variables:
Cost
Quality
Schedule
Compass shares the view that these three variables are the only three points that end users actually care about. Compass also believes that modularity:
Is cost effective
Is not always a prefabricated solution
Increases quality
Supports continuous improvement
May bring jurisdictional issues
Some jurisdictions have misconceptions about modular or containerized solutions. They envision standard ISO shipping containers, assembled on-site, as the basis of the construction process.
Meeting the local authority having jurisdiction (AHJ) early in the project’s developmental cycle is an effective way to establish trust and dispel misconceptions. Providing an extensive amount of specific information on the modular components of the proposed data centers is a good way to start.
Next Phase
As part of its schematic designs, Compass decided that its basic architectural scheme would be a shell that would contain a 10,000-square-foot (ft2) column-free, raised-floor hall and all the traditional support spaces. Compass made this decision based on the collective experience of its internal and consultant teams, as well as market analysis.
Compass decided that the systems, called Modular Power Centers (MPCs), would be packaged and delivered as standardized systems. This work was provided by Modular Power Solutions. The MPCs could be installed either internally or externally to the data center’s brick-and-mortar shell structure. Crosby and the MPC executives have filed for patent protection of the intellectual property.
Compass chose not to employ a central mechanical plant, but rather to deploy an N+1 redundant rooftop mechanical system to support the facility’s HVAC requirements. So, the schematic design established the following parameters:
Data Hall:
10,000 ft2 data hall
Column free
36-inch raised-floor white space, supporting up to 500 racks
120 watts per square foot, with the ability to cool heterogeneous loads and accommodate racks up to 30 kW
1.2-MW Modular Power Centers, which could be either internally or externally installed at the facility.
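The data hall parameters are tied together by simple arithmetic: floor area times design density gives the critical load that the MPCs are sized for. The sketch below uses only the figures listed above; the constant and function names are illustrative.

```python
DATA_HALL_AREA_FT2 = 10_000      # column-free raised floor
DESIGN_DENSITY_W_PER_FT2 = 120   # design power density
RACK_COUNT = 500                 # maximum racks supported
MAX_RACK_KW = 30                 # highest individual rack load supported


def hall_load_kw(area_ft2: float, density_w_per_ft2: float) -> float:
    """Total critical load of the hall in kW."""
    return area_ft2 * density_w_per_ft2 / 1000


def average_rack_kw(total_kw: float, racks: int) -> float:
    """Average load per rack if the hall is fully populated."""
    return total_kw / racks


def racks_at_density(total_kw: float, rack_kw: float) -> int:
    """How many racks of a given load fit within the power budget."""
    return int(total_kw // rack_kw)


if __name__ == "__main__":
    total = hall_load_kw(DATA_HALL_AREA_FT2, DESIGN_DENSITY_W_PER_FT2)
    print(total)                                  # 1200.0 kW, i.e., 1.2 MW
    print(average_rack_kw(total, RACK_COUNT))     # 2.4 kW per rack on average
    print(racks_at_density(total, MAX_RACK_KW))   # 40 racks if every rack drew 30 kW
```

The gap between the 2.4-kW average and the 30-kW maximum is what "heterogeneous loads" implies: a few dense racks can be accommodated as long as the hall total stays within 1.2 MW.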
The basis-of-design (BOD) goals are then used for the next step in the design process, the creation of a single-line diagram. The development of the schematic design (SD) phase single-line diagram brings together the conceptual components and requirements of the BOD (see Figure 1).
Figure 1. Schematic design single-line diagram
The initial single-line is the tool that Compass used to determine the structure and components required for the electrical distribution system. The MPCs must be sized to be able to support the demands of the 1.2-MW data hall, the mechanical systems and the support spaces. The single-line diagram is also used to validate the Tier III redundancy and concurrent maintainability requirements. The redundancy of critical components and systems and the requirement for de-energized maintenance must be worked through at this stage of the design process.
The major electrical distribution system components were identified as:
One 2500-kilovolt-ampere (kVA) utility transformer
One 2500-kVA/2000-kW standby-rated generator
One redundant 2500-kVA/2000-kW standby generator
One 480-volt (V), 3000-ampere (A) main distribution switchboard
Two 1200-kW high-efficiency uninterruptible power supplies (UPS)
Two 1600-A static transfer switch (STS) bypasses
Two 1600-A UPS output distribution switchboards
Two 1600-A maintenance bypasses
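The ratings in this list can be cross-checked with the standard three-phase full-load current formula, I = S / (√3 × V). The sketch below is illustrative only: it assumes a 480-V three-phase system throughout and, for the UPS, a unity power factor so that 1200 kW is treated as 1200 kVA. It is not a substitute for the coordination study behind the actual design.

```python
import math


def full_load_amps(kva: float, line_voltage: float) -> float:
    """Three-phase full-load current in amperes for an apparent power in kVA."""
    return kva * 1000 / (math.sqrt(3) * line_voltage)


if __name__ == "__main__":
    # 2500-kVA utility transformer or generator at 480 V
    print(round(full_load_amps(2500, 480)))   # 3007
    # 1200-kW UPS, assuming unity power factor (an assumption, not a source figure)
    print(round(full_load_amps(1200, 480)))   # 1443
```

Under these assumptions the 2500-kVA sources work out to roughly 3000 A, in line with the 3000-A main switchboard, and each UPS draws about 1443 A, which sits within the 1600-A output switchboards and bypasses.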
The Compass one-line based on the Tier III requirements provided a look at the electrical system’s redundancy (see Figure 2).
Figure 2. Electrical system redundancy
Then Compass added two isolation or tie breakers to the main switchboard in order to meet the Tier III concurrently maintainable requirement. These breakers were required to allow either side of the switchboard to be shut down and completely de-energized for service. The tie breakers can also be used to enhance fault tolerance. If they are forced to trip before the main circuit breakers on either side, the non-faulted side of the system can remain operational.
The decision was made to look to the local utility provider to supply an N redundant 2500-kVA utility power transformer. The 2500-kVA/2000-kW standby generator in the base system is N redundant. A second N+1 or 2N redundant generator added to the lineup meets the Tier III redundancy requirements. The second generator will be connected to the opposite side of the switchboard lineup. Compass preferred generator redundancy over utility redundancy, finding generators to be more reliable than the utility, based upon its experience that blackouts tend to be regional in nature (as seen with Hurricane Sandy).
The remaining equipment would be installed in a Power Center module. That equipment was identified as:
Two 480-V, 3000-A main distribution switchboards
Two 1200-kW high-efficiency UPS units
Two 1600-A STS bypasses
Two 1600-A UPS output distribution switchboards
Two 1600-A maintenance bypasses
Based on the equipment defined in the single-line, Compass determined that at least two MPCs would be required.
The tie breakers provide a natural point to split the system. This split became the basis of how the system is packaged. The size and weight of the MPCs are constrained by the need to ship them over highways from the assembly facility to the job site. Typically, those shipping packages are not to exceed 50 feet by 12 feet and 100,000 pounds.
Each MPC switchboard lineup features a 3000-A and a 1600-A UL 891-listed switchboard. Each switchboard is equipped with two 3000-A, four 400-A, and four 450-A UL 489-listed circuit breakers. All circuit breakers larger than 200 A are 100% duty rated. All circuit breakers feature zone selective interlocks (ZSIs). A ZSI ties the circuit breaker trip units together, allowing them to communicate in order to ensure that the circuit breaker closest to the fault trips first. Increasing the fault isolation capabilities increases the data center’s ability to maintain operational continuity.
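The ZSI behavior described above can be illustrated with a simplified model. The sketch assumes a single radial chain of breakers ordered from source to load; every breaker upstream of the fault sees fault current, and a breaker that sees the fault restrains the breaker immediately upstream of it. The breaker names and delay values are hypothetical, and real trip units add settings and timing that are not modeled here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Breaker:
    name: str
    backup_delay_s: float  # hypothetical time delay used only when restrained


def zsi_response(chain: List[Breaker], fault_after_index: int) -> Dict[str, str]:
    """Describe how each breaker reacts to a fault.

    `chain` is ordered source -> load. The fault sits immediately
    downstream of chain[fault_after_index].
    """
    if not 0 <= fault_after_index < len(chain):
        raise ValueError("fault_after_index is outside the chain")
    response = {}
    for i, breaker in enumerate(chain):
        if i > fault_after_index:
            # Breakers downstream of the fault carry no fault current.
            response[breaker.name] = "no fault current"
        elif i < fault_after_index:
            # A breaker closer to the fault also sees it and sends a restraint signal.
            response[breaker.name] = (
                f"restrained, backup trip after {breaker.backup_delay_s} s"
            )
        else:
            # No restraint received: this is the breaker closest to the fault.
            response[breaker.name] = "trips with no intentional delay"
    return response


if __name__ == "__main__":
    chain = [Breaker("main", 0.3), Breaker("tie", 0.2), Breaker("feeder", 0.1)]
    for name, action in zsi_response(chain, fault_after_index=2).items():
        print(f"{name}: {action}")
```

In this model a fault below the feeder opens only the feeder, while the main and tie stay closed as timed backup, which is the fault isolation the paragraph credits with preserving operational continuity.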
The main switchboards are configured as main-tie-main-tie-main. Each MPC has a dedicated programmable logic controller (PLC). The PLCs are hot swappable, meaning that if either processor goes down, the other processor will automatically take control. The I/O rack is located in the A MPC. There is no power bussing in the I/O section. The switchgear can be manually operated if the I/O rack is de-energized for maintenance.
Modbus protocol is provided to the Schneider Electric StruxureWare management system at the PLC gateway for each main switchboard. The main switchgear has integrated revenue-grade power-quality metering.
Each side of the redundant power system features a 1.2-MW Schneider Electric APC Symmetra Megawatt UPS (see Figure 3). Each UPS has a dedicated external 1600-A continuous-duty-rated static bypass switch. Power to the two UPS systems is delivered from two separate (A/B) switchboards. Each switchboard is able to support the entire data center. The two tie breakers operate in the normally closed position.
Figure 3. UPS efficiency
Figure 4. Modular Power Center – top view (patent pending)
Figure 5. Modular Power Center – plan view (patent pending)
Power for the data hall will be derived from two identical MPC modules, each of which is factory-assembled prior to on-site delivery. The first layouts were completed for the MPCs with the addition of the following components:
Four battery cabinets per side for 5 minutes of battery backup at 1.2 MW
Two automatic transfer switches for N+1 rooftop units
One 208/120-V distribution panel for local power needs
Figure 6. Modular Power Center – interior
The MPCs (see Figures 4-8) are IBC-rated R17 structures, which meet Miami-Dade County 149-mph wind-pressure loading requirements. The MPCs are constructed to provide protection with respect to harmful effects on the equipment due to the ingress of water (rain, sleet, snow), and will be undamaged by the external formation of ice on the enclosure or seismic events.
Compass employs a 2.0-MW/2.5-MVA, 277/480-V, 3ø, four-wire generator rated at 1825 kW to provide standby power (see Figures 9 and 10).
Compass determined that the data center infrastructure will require support from a 2.0-MW/2.5-MVA generator. The sequence of operation of the total system is controlled automatically through deployment of redundant PLC control units installed in each of the 3000-A main switchboards. Should the primary standby generator fail to come online after loss of the utility source, the swing generator will pick up the critical loads of the system. Each generator will be provided with a weather-protective enclosure. All generator permits (including all operations, fuel storage, noise and air) will be obtained and maintained with the appropriate AHJ. Generators are equipped with 4,000-gallon fuel storage belly tanks for 24 hours of fuel capacity at full load.
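The fuel figures imply a full-load burn rate that can be backed out directly. The sketch below assumes that the 4,000 gallons is per generator and fully usable; the 100-gallon-per-hour partial-load rate in the second call is a hypothetical value for illustration, not a manufacturer figure.

```python
def implied_burn_rate_gph(tank_gallons: float, runtime_hours: float) -> float:
    """Fuel consumption in gallons per hour implied by tank size and runtime."""
    return tank_gallons / runtime_hours


def runtime_hours(tank_gallons: float, burn_rate_gph: float) -> float:
    """Hours of operation available at a given burn rate."""
    return tank_gallons / burn_rate_gph


if __name__ == "__main__":
    # 4,000 gallons lasting 24 hours at full load
    print(round(implied_burn_rate_gph(4000, 24), 1))   # 166.7
    # Hypothetical partial-load burn rate of 100 gal/h
    print(runtime_hours(4000, 100))                    # 40.0
```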
Figure 7. Modular Power Center – interior
Figure 8. Modular Power Center – exterior
Figure 9. Standby generator
The output of each UPS module is a 1600-A distribution board equipped with a maintenance bypass. A solenoid key release unit (SKRU) is provided to ensure that the UPS has transferred to bypass before the MBP breaker can be engaged to the output switchboard. This will always be a closed transition transfer so that critical load power will never be lost.
Critical power to the IT load will be provided by eight 300-kVA PDUs installed in an alternating A/B arrangement in the data hall to provide 208/120-V power to either overhead busway or remote power panels. Each PDU has a 300-kVA K-13 rated transformer and six 225-A breakers. Additionally, each PDU has six integrated revenue-grade power-monitoring meters.
Compass determined that the mechanical requirements for each data hall can be supported by four 120-ton rooftop units (RTUs). The four RTUs provide N+1 system redundancy. Power for each RTU will be available from either the A or B System through dedicated automatic transfer switches (see Figure 11). The units feature integrated controls allowing for efficient airside economization across all units. A proprietary rapid-restart feature ensures full air movement within 30 seconds after restoration of power. The system controls deliver uniform under-floor pressures.
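The N+1 claim can be sanity-checked by removing one RTU and comparing the remaining capacity with the 1.2-MW IT load. This is a rough sketch: it uses the conversion 1 ton of refrigeration ≈ 3.517 kW, treats nominal tonnage as usable capacity, and ignores heat from UPS losses, lighting and the building envelope, all of which a real load calculation would include.

```python
KW_PER_TON = 3.517  # 1 ton of refrigeration = 12,000 BTU/h, about 3.517 kW


def capacity_kw(unit_tons: float, unit_count: int, units_out: int = 0) -> float:
    """Cooling capacity in kW with `units_out` units unavailable."""
    if units_out > unit_count:
        raise ValueError("cannot remove more units than are installed")
    return (unit_count - units_out) * unit_tons * KW_PER_TON


if __name__ == "__main__":
    it_load_kw = 1200
    all_units = capacity_kw(120, 4)
    one_out = capacity_kw(120, 4, units_out=1)
    print(f"All four RTUs: {all_units:.0f} kW")
    print(f"One RTU out:   {one_out:.0f} kW")
    print("Covers IT load with one unit out:", one_out >= it_load_kw)
```

With one unit out of service, the remaining three RTUs provide 360 tons, or roughly 1266 kW under these assumptions, which is slightly more than the 1.2-MW IT load.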
Figure 10. Generator exterior
This single-line diagram (see Figure 12) represents the next step in the construction document’s development process: the design development (DD)-phase documents. The updated single-line features significant developments in the design process. The new features are:
The division of the electrical distribution system into two completely separate MPCs
The addition of a second generator to provide N+1 redundancy
Provisions for ‘A side’ and ‘B side’ mechanical feeders
The integration of the main and output switchboards
The integration of the maintenance bypass (MBP) breaker into the new contiguous switchboard lineup
The introduction of dual-programmable logic controllers (PLCs)
The introduction of a remote main utility circuit breaker
The new single-line diagram now reflects the separate MPCs. The MPCs in this arrangement actually complement each other, providing 2N redundancy to the data hall.
A second generator was added to provide 2N generator redundancy. In the future, this second generator could be shared with additional data halls on the same campus. The redundancy of the generators would then be N+1.
Each MPC was outfitted with provisions to provide power for the entire mechanical system. Normally, power for one-half of the mechanical equipment is supplied by each MPC. Dedicated ATSs will automatically roll power to any active MPC if power is lost.
Figure 11. Power for each RTU is available from dedicated automatic transfer switches.
The new single-line shows how both the input and output switchboards have been integrated into a single switchgear lineup. The new lineup provided for a simplified interconnection scheme when it was installed in the MPC.
The use of dual MPCs facilitates the use of dual PLCs. The ability of the PLCs to stay in synchronous operation allows for a seamless transfer of control between either unit.
High levels of arc-flash energy in the main switchboards became a concern when the decision was made to have the utility provide the main 2500-kVA transformer and the transformer’s primary-side protection. Most utilities design their protection schemes to protect the utility’s own equipment. This typically doesn’t translate into limiting the arc-flash energy levels on the secondary side of these large transformers. Remoting the main utility breaker outside the MPC allows the arc-flash energy to be contained outside the remote switchboard. The arc-flash energy inside the MPC is now significantly reduced.
The deployment of the MPCs is just part of the modular concept that was developed for Compass Data Centers. Compass has done further development of the concept. This work and Tier III compliance with the Uptime Institute are important elements of its business plan to control costs and provide alignment of data center facilities with customers’ business requirements.
Steve Emert earned a BSEE degree from San Jose State University. He is currently a Registered Professional Engineer in more than 15 states, and is director of Mission-Critical Engineering at Rosendin Electric, the largest privately owned electrical contractor in the U.S. Mr. Emert began his career performing design and analysis of industrial, commercial and utility power systems, cogeneration plant design and coordination studies. In the mid-1990s, he worked at the Ames Research Center located at Moffett Field, CA, where he began a mission-critical-focused career working on the NAS supercomputer and many of the technically advanced NASA facilities.
Since joining Rosendin Electric, Mr. Emert has provided the engineering foundation for the company’s design-build mission-critical construction business. Today Rosendin Electric is the largest design-build electrical contractor for mission-critical facilities in the U.S. Mr. Emert is an active member of the IEEE P1584 IEEE Guidelines for Performing Arc-Flash Hazard Calculations Working Group. He is a member of the IEEE P1584 Configuration Task Group, currently engaged in the process of defining a new set of standards for the next issue of the P1584 Guideline publication. Mr. Emert has coauthored IEEE EMC Society presentations and written articles on electrical power systems for Electrical Contractor Magazine.
Data Center Facility Owners See Modules as Efficient Way to Deploy Capital (Kevin Heslin, Uptime Institute Journal, September 4, 2014)
Details of dual-corded power change, but the theme remains the same.
Uptime Institute has worked with owners and operators of data centers since the early 1990s. At the time, data center owners used single-corded IT devices for even their most critical IT assets. Figure 1 shows a selection of the many potential sources of outage in the single path.
Early on, Site Uptime Network (now the Uptime Institute Network) founder Ken Brill recognized that outages due to faults or maintenance in the critical distribution system were a major problem in high availability computing. The Uptime Institute considers the critical distribution to include the power supply to IT devices from the UPS output to any PDU (power distribution unit), panel, or remote power panel (RPP), and the power distribution down to the rack via whip or bus duct.
Ahead of their time, Ken Brill and the Network created the Fault Tolerant Power Compliance Specification in 2000 to address the sources of outages, and updated it in 2002. Then, in 2004 Uptime Institute produced the paper Fault Tolerant Power Certification is Essential When Buying Products for High-Availability to directly address the issue. When this paper was written, four years after the Fault Tolerant Power Compliance Specification was first issued, critical distribution failures continued to cause the majority of data center outages.
In the mid-1990s, the Uptime Institute led the data center industry in establishing Tiers as a way to define the performance characteristics of data centers. Each Tier builds upon the previous Tier, adding maintenance opportunity and Fault Tolerance. This progress culminated in the 2009 publication of the Tier Standard: Topology, which established Tiers as progressive maintenance opportunities and fault tolerance. The Tier Standard also included the requirement for dual-corded devices in Tier III and Tier IV objective data centers. Tier III data centers have dual power paths to provide Concurrent Maintainability of each and every component and path. Tier IV data centers require the same dual power paths for Concurrent Maintainability and add the ability to autonomously respond to failures.
Figure 1. Single-corded IT equipment
Present
The Fault Tolerant Power Compliance Specification, Version 2.0 is clearly relevant 12 years later. Originally called Fault Tolerant IT devices, today the commonly used vernacular is dual corded, and these devices have become the basis of high availability. The two terms Fault Tolerant IT devices and dual-corded IT devices are used interchangeably.
Tier III and Tier IV data center designs continue to be based upon the use of dual-corded architecture and require an active-active, dual-path distribution. The dual-corded concept is cemented into high-availability architecture in enterprise data centers, hyper-scale internet providers, and third-party data center spaces. Even the innovative Open Compute Project, sponsored by Facebook, which uses cutting-edge electrical architecture, utilizes dual-corded, Fault Tolerant IT devices.
Confoundingly, though, more than half of the more than 5,000 reported incidents in the Uptime Institute Network’s Abnormal Incident Reports (AIRs) database relate to the critical distribution system.
Dual-corded assets have increased maintenance opportunities for data center facilities management. Operations teams no longer need to wait for inconveniently timed maintenance windows to perform maintenance; instead they can maintain their facilities without IT impact during safe and regular hours. If there is an anomaly, the facilities and IT staff are on hand to address it.
Figure 2. Fault Tolerant and dual corded
Uptime Institute Network members today recognize the benefits of dual-corded devices. COO Jason Weckworth of RagingWire recently said, “Dual-corded IT devices allow RagingWire the maintenance and operations flexibility that are consistent with our Concurrently Maintainable objective and provide that extra level of availability assurance below the UPS system where any problem may have consequential impacts to availability.”
Uptime Institute Network adoption of dual-corded devices has clearly improved, as indicated by the number of outages attributed to critical distribution. Properly applied, dual-corded devices do not experience any effect on loss of a single source. Analysis of the AIRs database from 2007 to 2012 showed a reduction of more than 90% of critical distribution failures impacting the IT load.
Some data center owners or IT teams try to achieve dual power paths to IT equipment using large static transfer switches (STS) or STS power distribution units (PDU) (see Figure 3). However, the maintenance, replacement, or fault of an STS threatens the critical load from that device onward. One data center suffered a fault on an STS-PDU that affected one third of its IT equipment, and the loss of those systems rendered the entire data center unavailable. As noted in Figure 3, the single large STS solution does not meet Tier III or Tier IV criteria.
Figure 3. Static transfer switches
Uptime Institute recognizes that some heritage devices or legacy systems may end up in data centers, due to systems migrations challenges, mergers and acquisitions, consolidations, or historical clients. Data center infrastructure professionals need to question the justifications that lead to these conditions: If the system is so important, why is it not migrated to a high-availability, dual-corded IT asset?
The Tier Standard: Topology does include an accommodation for single-corded equipment as shown in Figure 4, depicting a local, rack-mounted transfer switch. The rack-mounted or point-of-use transfer switch places the single point of risk as far down the critical distribution as possible.
Still, many in IT have not yet gotten the message and bring in more than the occasional one-off device. Single-corded devices are found in a larger percentage of installations than should be expected. Rob McClary, SVP and GM of FORTRUST, said, “FORTRUST utilizes infrastructure with dual-power paths, yet we estimate that greater than 50% of our clients continue to deploy at least one or more single-corded devices that do not utilize our power infrastructure and could impact their own availability. FORTRUST strongly supports continued education to our end user community to utilize all dual-corded IT assets for a true high-availability solution.” The loss of even one of the assets in their deployment can render the platforms or applications of the deployment unavailable. The disconnect between data center infrastructure and IT hardware continues to exist.
Figure 4. Point-of-use transfer switch
Uptime Institute teams still find the following configurations that continue to plague data center operators:
Single-corded network devices
Mainframes that degrade or are lost on loss of a single source of power
IT devices with an odd number of cords
The Future: A Call to Action
Complex systems such as data center infrastructure and the IT equipment and systems within them require comprehensive team approaches to management, which means breaking down the barriers between the organizations by integrating Facilities and IT staff, allowing the integrated organization to manage the data center and educating end users who don’t understand power infrastructure. If we can’t integrate, then educate.
If a merger of IT and facilities just won’t work in an enterprise data center, a regular meeting will at least enable teams to share knowledge and review change management and facilities maintenance actions. In addition, codifying change management and maintenance window procedures in terms IT can understand using an ITIL-based system will enable IT counterparts to start to understand the criticality of power distribution as they see the how and why of data center facility operations firsthand.
Colocation and third-party data centers understand that many client IT organizations have limited in-house staff, expertise, and familiarity with high-availability data centers. The need to educate these clients is clear. Several ways to educate include:
Compile incident reports involving single-corded devices and share them with new tenants and deployment teams
Create a one-page fact sheet on dual-corded infrastructure with a schematic and benefits summary that those users can understand
Create a policy that requires rack-mounted or point-of-use transfer switches for all single-corded devices
Require all devices that support a high-availability application or IT deployment to be dual corded
These actions will pay dividends with increased ease of maintenance and reduced client coordination.
Facilities teams also need to look within themselves. Improved monitoring and data center infrastructure management (DCIM) solutions provide windows into the infrastructure but do not replace good management. Anecdotal evidence has shown 1-10% of servers in a data center may be improperly corded, i.e., both cords are plugged into the A distribution.
Management can address these challenges by:
Clearly and consistently labeling A and B power
Training all staff working in critical areas about data center policies, including the dual-corded policy
Performing quality control to verify A/B cording, phase balancing, and installation documentation
Capturing the configuration of the data center
Regularly tracking single-corded installations to pressure owners of those systems to modernize
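The quality-control step for A/B cording lends itself to a simple automated check against an inventory export. The sketch below is a hypothetical illustration: the `Device` record, the feed labels "A" and "B", and the sample device names are assumptions, not the schema of any particular DCIM product.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Device:
    name: str
    cord_feeds: List[str]  # distribution side each cord is plugged into, e.g. ["A", "B"]


def audit_cording(devices: List[Device]) -> Dict[str, List[str]]:
    """Return cording problems per device; devices with no findings are omitted."""
    findings: Dict[str, List[str]] = {}
    for device in devices:
        feeds = [feed.strip().upper() for feed in device.cord_feeds]
        issues: List[str] = []
        if not feeds:
            issues.append("no cords recorded")
        elif len(feeds) == 1:
            issues.append("single-corded")
        elif len(feeds) % 2 == 1:
            issues.append("odd number of cords")
        if len(feeds) > 1 and len(set(feeds)) == 1:
            issues.append(f"all cords on {feeds[0]} distribution")
        if issues:
            findings[device.name] = issues
    return findings


if __name__ == "__main__":
    inventory = [
        Device("web-01", ["A", "B"]),         # correctly dual corded
        Device("switch-01", ["A"]),           # single-corded network device
        Device("db-01", ["A", "A"]),          # both cords on the A distribution
        Device("mainframe-01", ["A", "B", "A"]),
    ]
    for name, issues in audit_cording(inventory).items():
        print(f"{name}: {', '.join(issues)}")
```

Run periodically, a report like this gives facilities staff a concrete list to take to the owners of the flagged systems.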
Summary
Millions of dollars are regularly invested in the dual-power path infrastructure in data centers for high availability because of business needs. This is clearly represented in the increasing cost of downtime from lost business to ruined reputations or goodwill. It is essential that Facilities and IT, including the procurement and installation teams, work together to safeguard the investment, making sure dual-power path technology is utilized for business critical applications. In addition, owners and operators of data centers must continue to educate customers who lack knowledge of or familiarity with data center practices and manage the data center to ensure high-availability principles such as dual-corded architecture are fully utilized.
Fault-Tolerant Power Compliance Specification Version 2.0
Fault-tolerant power equipment refers to computer or communication hardware that is capable of receiving AC input from two different AC power sources. The objective is to maintain full equipment functionality when operating from A and B power sources or from A alone or from B alone. Equipment with an odd number of external power inputs (line cords) generally will not meet this requirement. It is desirable for equipment to have the least number of external power inputs while still meeting the requirement for receiving AC input from two different AC power sources. Products requiring more than two external power inputs risk being rejected by some sites. For equipment to qualify as truly fault-tolerant power compliant, it must meet all of the following criteria as initially installed, at ultimate capacity, and under any configuration or combination of options. (The designation of A and B power sources is used for clarity in the following descriptions.)
If either one of two AC power sources fails or is out-of-tolerance, the equipment must still be able to start up or continue uninterrupted operation with no loss of data, reduction in hardware functionality, performance, capacity, or cooling.
After the return of either AC power source from a failed or out-of-tolerance condition, during which acceptable power was continuously available from the other AC power source, the equipment will not require a power-down, IPL, or human intervention to restore data, hardware functionality, performance, or capacity.
The first or second AC power source may then subsequently fail no later than 10 seconds after the return of the first or second AC power source from a failed or out-of-tolerance condition with no loss of data, reduction in hardware functionality, performance, capacity, or cooling.
The two AC power sources can be out of synchronization with each having a different voltage, frequency, phase rotation, and phase angle as long as the power characteristics for each separate AC source remain within the range of the manufacturer’s published specifications and tolerances.
Both external AC power inputs must terminate within the manufacturer’s fault-tolerant power compliant computer equipment. In the event that the external AC power input is a detachable power cord, the equipment must provide for positive retention of the female plug so the plug cannot be pulled loose accidentally. Within the equipment, the AC power train (down to and including the AC to DC power supplies) must be compartmentalized such that any power train component on either side can be safely serviced without affecting computer equipment availability or performance and without putting the AC power train of the other side at risk.
For single- or three-phase power sources, the neutral conductor in the AC power input shall not be bonded to the chassis ground inside the equipment. This will prevent circulating ground currents between the two external power sources.
Internal or external active AC input switching devices (e.g., mechanical or electronic transfer switches) are not acceptable.
A fault inside the manufacturer’s equipment that results in the failure of one AC power source shall not be transferred to the second AC power source causing it to also fail.
For single- or three-phase power sources, with both AC power inputs available and with both inputs operating at approximately the same voltage, the normal load on each power source will be shared within 10% of the average.
For three-phase power source configurations, the normal load on each phase will be within 10% of the average.
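The two load-sharing criteria above can be checked with the same small calculation. The sketch below assumes that "within 10% of the average" means each measured load may deviate from the mean of all loads by no more than 10% of that mean. The function name and the ampere readings are hypothetical and serve only as an illustration.

```python
def within_tolerance(loads, tolerance=0.10):
    """Return True if every load deviates from the average by <= tolerance * average."""
    average = sum(loads) / len(loads)
    if average == 0:
        return True  # no load on any input, nothing to balance
    return all(abs(load - average) <= tolerance * average for load in loads)


# Hypothetical readings in amperes.
source_loads = [10.4, 9.6]       # A and B inputs, average 10.0
phase_loads = [12.0, 9.0, 9.0]   # three phases of one input, average 10.0

print(within_tolerance(source_loads))  # True: each input is 0.4 A from the average
print(within_tolerance(phase_loads))   # False: one phase is 2.0 A above the average
```

The same function serves both criteria: pass it the per-source loads to check A/B sharing, or the per-phase loads of one source to check phase balance.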
Keith Klesner’s career in critical facilities spans 14 years and includes responsibilities ranging from planning, engineering, design and construction to start-up and ongoing operation of data centers and mission-critical facilities. In the role of Uptime Institute vice president of Engineering, Mr. Klesner has provided leadership and strategic direction to maintain the highest levels of availability for leading organizations around the world. Mr. Klesner performs strategic-level consulting engagements, Tier Certifications and industry outreach, in addition to instructing premier professional accreditation courses. Prior to joining the Uptime Institute, Mr. Klesner was responsible for the planning, design, construction, operation and maintenance of critical facilities for the U.S. government worldwide. His early career includes six years as a U.S. Air Force officer. He has a Bachelor of Science degree in Civil Engineering from the University of Colorado-Boulder and a Master of Business Administration from the University of LaVerne. He maintains status as a professional engineer (PE) in Colorado and is a LEED-accredited professional.
Dual-Corded Power and Fault Tolerance: Past, Present, and Future (Kevin Heslin, published 2014-08-18, updated 2016-10-24)
The availability of qualified candidates is just part of the problem with data center staffing; the industry also lacks training and clear career paths attractive to recruits.
The data center industry is experiencing a shortage of personnel. Uptime Institute Founder Ken Brill, as always, was among the first to note a trend, mentioning it more than 10 years ago. This trend reflects, in part, an aging global demographic but also increasing demand for data center personnel, which Uptime Institute Network members have described as chronic. The aging population threatens many industries but few more so than the data center industry, where the Network Abnormal Incident Reports (AIRs) database supports the relationship between downtime and inexperienced personnel.
As a result, at recent meetings North American Network members discussed how enterprises could successfully attract, train and retain staff. Network members blame the shortage on increasing market demand for data centers, inflexible organizational structures and an aging workforce retiring in greater numbers. This shortfall has already caused interruptions in service and reduced availability to mission-critical business applications. Some say that the shortage of skilled personnel has already created conditions that could lead to downtime. If not addressed in the near term, the problem could affect sections of the economy and company valuations.
Prior to 2000, data center infrastructure was fairly static; power and cooling demand generally grew following a linear curve. Enterprises could manage the growth in demand during this period fairly easily. Then technology advances and increased market adoption rates changed the demand for data center capacity so that it no longer followed a linear growth model. This trend continues, with one recent survey from TheInfoPro, a service of 451 Research, finding that 37% of data center operators had added data center space from July 2012 to June 2013. Similarly, the 2013 Uptime Institute Data Center Industry Survey found that 70% of 1000 respondents had built new data center space or added space in the last five years. The survey reported even more data center construction projects in 2012 and 2011 (see Figures 1-3). The 2013 survey showing more detail about industry growth appears here.
Figure 1. New data center space reported in the last 12 months.
Drivers of the data center market are similar to those that drive overall internet growth and include increasing broadband penetration, e-commerce, video delivery, gaming, social networking, VOIP, cloud computing and web applications that make the internet and data networking a key enabler of business and consumer activity. More qualified personnel are required to respond to this accelerated growth.
The organizational models of many companies kept IT and Facilities or real-estate groups totally separate. IT did IT work while Facilities maintained the building, striped the parking lot, and, oh, by the way, supported the UPS systems. The groups did not share goals, schedules, meetings or ideas. This organizational structure worked well until technology accentuated both the importance of a middle ground between the two groups and the lack of one.
Figure 2. Demand has significantly outpaced supply since 2010.
Efforts to bridge the gap between the two groups foundered because of conflicting work processes and multiple definitions for shared terms (such as mission critical maintenance). Simply put, the two groups spoke different languages, followed different leaders and pursued unreconciled goals. Companies that recognized the implications of the situation immediately reorganized. Some of these companies established mission critical teams and others moved Facilities and IT into the same department. This organizational challenge is by no means worked out and will continue well into the next decade.
Though no government agency or private enterprise keeps track of employment trends in data centers, U.S. Social Security Administration (SSA) statistics for the general population support anecdotes shared by Network members. According to the SSA, the agency that supervises the federal retirement benefits program in the U.S., 10,000 people per day apply for Social Security benefits, a rate expected to continue to 2025 as the baby boomers continue to retire. This phenomenon was apparently first dubbed the “silver tsunami” by the Alliance for Aging Research in 2006. The populations of Europe and wide parts of Asia, including China and Japan, are also aging.
The direct experiences shared by Uptime Institute Network members suggest that the data center industry is highly vulnerable to, if not already diminished by, this larger societal trend. Network members estimate that 40% of the facilities engineering community is older than 50. One member of the Network expects that 50% of its staff will retire in the next two years. Network members remain concerned that many qualified candidates—science, technology, engineering and mathematics (STEM) students—are unaware of the employment opportunities offered by the industry and may not be attracted to the 24 x 7 nature of the work.
Tony Ulichnie, who presided over many of these discussions as Network Director, North America (before retiring in July of this year), described the cost of the wisdom and experience lost as this generation retires as “the price of silver,” referring to the loss to the organization when a longstanding and silver-haired data center operations specialist retires.
Military and civilian nuclear programs have proven to be a source of excellent candidates for data center facilities but yield only so many graduates. These “Navy Nukes” and seasoned facilities engineers command very competitive salaries and find themselves being courted by the industry.
Industry leaders say that the pipeline for replacement engineers has slowed to a dribble. Tactics such as poaching and counteroffers have become commonplace in the field.
Potential employers are also reluctant to risk hiring green (inexperienced) recruits. The practices of mission-critical maintenance require much discipline and patience, especially when dealing with IT schedules and meeting application availability requirements. Deliberate processes along with clear communications skills become necessary elements of an effective facilities organization. Identifying individuals with these capabilities is the trick: one Uptime Institute Network member found a key recruit working at a bakery. Another member puts HVAC students through an 18-month training program after hiring them from a local vocational school, with a 70% success rate.
Figure 3. Respondents to the 2013 Uptime Institute Data Center Survey reporting new space in the last five years (see p. 142 for the full survey). Growth in new whitespace by size shows that a wide variety of spaces had been built.
The hunt for unexplored candidate pools will increase in intensity as the demand for talent escalates in the next decade, and availability and reliability will also suffer unless the industry addresses the problem in a comprehensive manner. To mitigate the silver tsunami, some combination of industry, individual enterprises and academia must create effective training, development and apprenticeship programs to prepare replacements for retirees at all levels of responsibility. In particular, data center operators must develop ways to identify and recruit talented young individuals who possess the key attributes needed to succeed in a pressure-packed environment.
A Resource Pool
Veterans form an often overlooked and/or misunderstood talent pool for technical and high-precision jobs in many fields, including data centers. Statistics suggest that unemployment among veterans exceeds the national rate, which is counterintuitive to those who have served. With more than one million service members projected to leave the military between now and 2016 due to the drawdown in combat operations, according to U.S. Department of Defense estimates, unemployment among veterans could be a growing national problem in the U.S.
From the industry perspective, however, the national problem of unemployed veterans could prove an opportunity to “do well by doing good.” While experienced nuclear professionals represent a small pool of high-end and experienced talent, the pool of unemployed but trainable veterans represents a nearly inexhaustible source of talent suitable, with appropriate preparation, for all kinds of data center challenges.
Data centers compete with other industries for personnel, so now is the time to seize the opportunity because other industries are already moving to capitalize on this pool of talent. For example, Walmart has committed to hiring any veteran who was honorably discharged in the past 12 months, JP Morgan Chase has teamed with the U.S. Chamber of Commerce to hire over 100,000 veterans, and iHeartRadio’s Show Your Stripes program features many large and small enterprises, including some that own or operate data centers, committed to meeting the employment needs of veterans. For its own good, the data center industry must broadly participate in these efforts and drive to acquire and train candidates from this talent pool.
In North America, some data center staffs already include veterans who received technical training in the military and were able to land a job because they could quickly apply those skills to data centers. These technicians have proven the value of hiring veterans for data center work, not only for their relevant skills but also for their personal attributes of discipline and performance excellence.
The data center industry can take further advantage of the talent pool of veterans by establishing effective training and onboarding programs (mechanisms that enable new employees to acquire the necessary knowledge, skills and behaviors to become effective organizational members and insiders) for veterans who do not have the technical training (e.g., infantry, armor) that translates easily to the data center industry but have all the other important characteristics, including a proven ability to learn. Providing clear pathways for veterans of all backgrounds to enter the industry will ensure that it benefits from the growing talent pool and will be able to compete effectively with the other industries.
While technically trained veterans can enter the data center industry needing only mentoring and experience to become near-term replacements for retiring mid-level personnel, reaching out to a broader pool that requires technical training will create a generation of junior staff who can grow into mid-level positions and beyond with time and experience. The leadership, discipline and drive that veterans have will enable them to more quickly grasp and master the technical requirements of the job and adapt with ease to the rigor of data center operations.
Veterans’ Value to Industries
Military training and experience is unequaled in the breadth and depth of skills that it develops and the conditions in which these skills are vetted. Service members are trained to be intellectually, mentally and emotionally strong. They are then continuously tested in the most extreme conditions. Combat veterans have made life and death decisions, 24 hours a day for months without a break. They perform complex tasks, knowing that the consequences of failure could result in harm or even the death of themselves and others. This resilience and strength can be relied on in the civilian marketplace.
Basic training teaches the men and women of the military that the needs of the team are greater than their individual needs. They are taught to lead and to follow. They are taught to communicate up, down and across. They learn that they can achieve things they never thought possible because of these skills, and with a humble confidence can do the same in any work environment.
The military is in a constant state of learning, producing individuals with uncanny adaptive thinking and a capacity and passion for continuing to learn. This learning environment focuses not only on personal development but also on training and developing subordinates and peers. This experience acts as a force multiplier when a veteran who is used to knowing his job plus that of the entire team is added to the staff. The veteran is used to making sure that the team as a whole is performing well rather than focusing on the individual. This unwavering commitment to a greater cause becomes an ingrained ethos that can improve the work habits of the entire team.
The public commonly stereotypes military personnel as unable to think outside of a chain of command, but following a chain of command is only a facet of understanding how to perform in a team. Service members are also trained to be problem solvers. In this author’s experience, Iraq and Afghanistan were highly complex operations where overlooking the smallest detail could change outcomes. The military could not succeed at any mission if everyone waited for specific orders/instructions from their superiors before reacting to a situation. The mindset of a veteran is to focus on the mission: mission leaders impart a thorough understanding of the intent of a plan to troops, who then apply problem-solving skills to each situation in order to get the most positive outcome. They are trained to be consummate planners, engaging in a continuous process of assessment, planning, execution, feedback and fine tuning to ensure mission success.
Reliability is another key attribute that comes from military service. Veterans know that a mission that starts a minute late can be fatal. This precision translates to little things like punctuality and big things like driving projects to meet due dates and budgets. This level of dependability is a cornerstone of being a good teammate and leader.
Finally, an often overlooked value of military service is the scope of responsibility that veterans have had, which is often much larger than their non-veteran peers. It is not uncommon for servicemen and women in their twenties to have managed multi-million dollar budgets and hundreds of people. Their planning and management experience is gained in situations where bad decisions can cause troops to drive into an ambush that might also prevent supplies or reinforcements from reaching an under-provisioned unit.
Because military experience produces individuals who demonstrate strong leadership skills, reliability, dependability, integrity, problem-solving ability, proven ability to learn and a team-first attitude, veterans are the best source of talent available. Salute Inc. is an example of a company that helps bring veterans into the data center industry, and in less than six months has proven the value proposition.
Challenges
Recent Uptime Institute Network discussions identified the need for standard curriculum and job descriptions to help establish a pathway for veterans to more easily enter the industry, and Network members are forming a subcommittee to examine the issue. The subcommittee’s first priority is establishing a foundation of training for veterans whose military specialty did not include technical training. Training programs should allow each veteran to enter the data center industry at an appropriate level.
At the same time, the subcommittee will assess and recommend human resource (HR) policies to address a myriad of systemic issues that should be expected. For example, once trainees become qualified, how should companies adjust their salaries? Pay adjustments might exceed normal increases; however, the market value of these trainees has changed, and, unlike other entry-level trainees, veterans have proven high retention rates. The subcommittee has already defined several entry-level positions:
Network operations center trainee
Data center operations trainee
Security administration trainee
IT equipment installation trainee
Asset management administrator trainee
Resources for Veterans
The Center for a New American Security (CNAS) conducted in-depth interviews with 69 companies and found that more than 80% named one or two negative perceptions about veterans. The two most common are skill translation and concerns about post-traumatic stress (PTS).
Many organizations have looked at the issue of skill translation. Some of them have developed online resources to help veterans translate their experiences into meaningful and descriptive civilian terms (www.careerinfonet.org/moc/). They also provide tools to help veterans articulate their value in terms that civilian organizations will understand.
Organizations that access these resources will also gain a better understanding of how a veteran’s training and skills suit the data center environment. In addition, the military has established comprehensive transition programs that all service members go through when re-entering the civilian job market, including resume preparation and interview planning. The combination of government-sponsored programs and resources, a veteran’s own initiative and a civilian organization’s desire to understand can offset concerns about skill translation.
PTS is an issue that cannot be ignored. It is one of the more misunderstood problems in America, even among some in the medical community. It is important to understand more about PTS before assuming this problem affects only veterans. It is estimated that 8% of all Americans suffer from PTS, which is about 25 million people. The number of returning military who have been diagnosed with PTS is 300,000, which is about 30% of Iraq/Afghanistan combat veterans, yet only a very small proportion of the total PTS sufferers in the U.S. The mass media—where most people learn about PTS—often describes PTS as a military issue because the military approach to PTS is very visible: there is a formal process for identifying it and also enormous resources focused on helping veterans cope with it. Given that there are 80 times more non-veterans suffering from PTS, the focus for any HR organization should be ensuring that a company’s practices (from the interview to employee assistance programs and retention) are effectively addressing the issue in general.
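The proportions cited above can be checked with a few lines of arithmetic. This is a rough sketch that uses only the rounded figures quoted in the paragraph (25 million total, 300,000 diagnosed veterans, 30% of combat veterans). The variable names are illustrative, and the results are only as precise as those inputs.

```python
# Figures as quoted in the text above (rounded estimates).
total_pts = 25_000_000          # Americans estimated to suffer from PTS
veteran_pts = 300_000           # returning military diagnosed with PTS
share_of_combat_veterans = 0.30

non_veteran_pts = total_pts - veteran_pts
ratio = non_veteran_pts / veteran_pts
veteran_share = veteran_pts / total_pts
implied_combat_veterans = veteran_pts / share_of_combat_veterans

print(f"Non-veteran to veteran ratio: about {ratio:.0f} to 1")
print(f"Veteran share of all PTS cases: {veteran_share:.1%}")
print(f"Implied Iraq/Afghanistan combat veterans: about {implied_combat_veterans:,.0f}")
```

The ratio comes out slightly above 80 to 1, consistent with the "80 times" figure in the text, and veterans account for a little over 1% of all cases.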
Conclusion
The data center industry needs the discipline, leadership and flexibility of veterans to serve as a foundation on which it can build the next generation of data center operators. The Uptime Institute Network is establishing a subcommittee and has called for volunteers to help define the fundamentals required for an effective onboarding, training and development program in the industry. This group will address everything from job descriptions to clearly defined career paths for both entry-level trainees and experienced technicians transitioning from the military. For further information or if you are interested in contributing to this effort, please contact Rob Costa, Network Director, North America ([email protected]).
Resources
The following list provides a good starting point for understanding the many resources available for veterans and employers to connect.
Lee Kirby is Uptime Institute senior vice president, CTO. In his role he is responsible for serving Uptime Institute clients throughout the life cycle of the data center from design through operations. Mr. Kirby’s experience includes senior executive positions at Skanska, Lee Technologies and Exodus Communications. Prior to joining the Uptime Institute, he was CEO and founder of Salute Inc. He has more than 30 years of experience in all aspects of information systems, strategic business development, finance, planning, human resources and administration both in the private and public sectors. Mr. Kirby has successfully led several technology startups and turn-arounds as well as built and run world-class global operations. In addition to an MBA from University of Washington and further studies at Henley School of Business in London and Stanford University, Mr. Kirby holds professional certifications in management and security (ITIL v3 Expert, Lean Six Sigma, CCO). In addition to his many years as a successful technology industry leader, he masterfully balanced a successful military career over 36 years (Ret. Colonel) and continues to serve as an advisor to many veteran support organizations.
Mr. Kirby also has extensive experience working cooperatively with leading organizations across many industries, including Morgan Stanley, Citibank, Digital Realty, Microsoft, Cisco and BP.
Resolving the Data Center Staffing Shortage (Kevin Heslin, published 2014-08-13, updated 2015-04-28)
Improving communication between the enterprise and design engineers during a capital project
For over 10 years, Uptime Institute has sought to improve the relationship between data center design engineers and data center owners. Yet, it is clear that issues remain.
Uptime Institute’s uniquely unbiased position—it does not design, construct, commission, operate, or provision equipment to data centers—affords direct insight into data center capital projects throughout the world. Uptime Institute develops this insight through relationships with Network members in North America, Latin America, EMEA, and Asia Pacific; the Accredited Tier Designer (ATD) community; and the owner/operators of 392 Tier Certified, high-performance data centers in 56 countries.
Despite increasingly sophisticated analyses and tools available to the industry, Uptime Institute continues to find that when an enterprise data center owner’s underlying assumptions at the outset of a capital project are not attuned to its business needs for performance and capacity, problematic operations issues can plague the data center for its entire life.
The most extreme cases can result in disrupted service life of the new data center. Disrupted service life may be classified in three broad categories.
1. Limited flexibility
The resulting facility does not meet the performance requirements of an IT deployment that could have been reasonably forecast
The resulting facility is difficult to operate, and staff may avoid using performance or efficiency features in the design
2. Insufficient capacity
Another deployment (either new build, expansion, or colocation) must be launched earlier than expected
The Enterprise must again fund and resource a definition, justification, and implementation phase with all the associated business disruptions
The project is cancelled and capacity sought elsewhere
3. Excess capacity
Stranded assets in terms of space, power, and/or cooling represent a poor use of the Enterprise’s capital resources
Efficiency is diminished over the long term due to low utilization of equipment and space
Capital and operating cost per piece of IT or network equipment is untenable
Any data center capital project is subject to complex challenges. Overtime and over-budget considerations, such as inclement weather, delayed equipment delivery, overwhelmed local resources, slow-moving permitting and approval bureaucracies, lack of availability of public utilities (power, water, gas), merger or acquisition, or other shift in corporate strategy, may be outside of the direct control of the enterprise.
But other causes of overtime and over-budget are avoidable and can be dealt with effectively during the pre-design phase. Unfortunately, many of these issues become clear to the Enterprise after the project management, design, construction, and commissioning teams have completed their obligations.
Planning and justifying major data center projects has been a longstanding topic of research and education for Uptime Institute. Nevertheless, the global scale of planning shortfalls and project communication issues only became clear due to insight gained through the rapid expansion of Tier Certifications.
Even before a Tier Certification contract is signed, Uptime Institute requests a project profile, composed of key characteristics including size, capacity, density, phasing, and Tier objective(s). This information helps Uptime Institute determine the level of effort required for Tier Certification, based on similar projects. Additionally, this allows Uptime Institute to provide upfront counsel on common shortfalls and items of concern based upon our experience of similar projects.
Furthermore, a project team may update or amend the project profile to maintain cost controls. Yet, Uptime Institute noted significant variances in these updated profiles in terms of density, capacity, and Tier. It is acknowledged that an owner may decide to amend the size of a data center, or to adjust phasing, to limit initial capital costs or otherwise better respond to business needs. But a project that moves up and down the Tier levels or varies dramatically in density from one profile to another indicates management and communication issues.
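The distinction drawn above, between acceptable changes to size or phasing and warning-sign changes to Tier or density, can be expressed as a simple comparison across profile revisions. The sketch below is only an illustration: the `ProjectProfile` fields, their units, and the 25% density threshold are assumptions and do not represent Uptime Institute's actual profile format or criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProjectProfile:
    size_m2: float              # computer room area (assumed unit)
    capacity_kw: float          # critical IT load (assumed unit)
    density_kw_per_rack: float  # assumed unit
    phases: int
    tier: int                   # Tier objective, 1 to 4


def unstable_fields(revisions, density_swing=0.25):
    """Flag Tier changes and large density swings across profile revisions."""
    if len(revisions) < 2:
        return []
    flags = []
    if len({p.tier for p in revisions}) > 1:
        flags.append("tier")
    densities = [p.density_kw_per_rack for p in revisions]
    low, high = min(densities), max(densities)
    if low > 0 and (high - low) / low > density_swing:
        flags.append("density")
    return flags


# Hypothetical revision histories for two projects.
resized = [
    ProjectProfile(1000, 2000, 4.0, 2, 3),
    ProjectProfile(800, 2000, 4.0, 3, 3),   # smaller and rephased only
]
drifting = [
    ProjectProfile(1000, 2000, 4.0, 2, 3),
    ProjectProfile(800, 2000, 8.0, 3, 4),
    ProjectProfile(800, 1600, 5.0, 3, 3),
]

print(unstable_fields(resized))   # []
print(unstable_fields(drifting))  # ['tier', 'density']
```

Changes to size and phasing are deliberately ignored, reflecting the point that an owner may legitimately adjust them to respond to business needs.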
These issues result in project delays, work stoppages, or cancellations. And if the project is completed, it can be expected to lack in terms of capacity (either too much or too little), meeting performance requirements (in design or facility), and flexibility.
Typically, a Tier Certification inquiry occurs after a business need has been established for a data center project and a data center design engineer has been contracted. Unstable Certification profiles show that a project may have prematurely been moved into the design phase, with cost, schedule, and credibility consequences for a number of parties, notably the owner and the designer.
Addressing the Communications Gap
Beginning in May 2013, Uptime Institute engaged the industry to address this management and communication issue on a broader basis. Anecdotally, both sides, via the Network or ATD courses, had voiced concern that one had insufficient insight into the scope or responsibility, or unrealistic expectations, of the other. For example, a design engineer would typically be contracted to produce an executable design but soon find out that the owner was not ready to make the decisions that would allow the design process to begin. On the other hand, owners found that the design engineers lacked commitment to innovation, and that they would be delivered a solution that was similar to a previous project rather than vetted against established performance and operations requirements. This initiative was entitled Owners vs Designers (OvD) to call attention to a tension evident between these two responsibilities.
The Uptime Institute’s approach was to meet with the designers and owners separately to gather feedback and recommendations and to then reconcile the feedback and recommendations in a publication.
OvD began with the ATD community during a special session at Uptime Institute Symposium in May 2013. Participants were predominantly senior design engineers with experience in the U.S., Canada, Brazil, Mexico, Kenya, Australia, Saudi Arabia, Lebanon, Germany, Oman, and Russia. This initial session verified the need for more attention to this issue.
The design engineers’ overwhelming guidance to owners could be summarized as “know what you want.” The following issues were raised specifically and repeatedly:
1. Lack of credible IT forecasting
Without a credible IT requirements definition, it is difficult to establish the basic project profile (size, density, capacity, phasing, and Tier). As this information is discovered, the profile changes, requiring significant delays and rework.
In the absence of an IT forecast, designers have to make assumptions about IT equipment. The designers felt that this task is outside their design contract and that they are hired to be data center design experts, not IT planning experts.
2. Lack of detailed Facilities Technical Requirements
The absence of detailed Facilities Technical Requirements forces designers to complete a project definitions document themselves because a formal design exercise cannot be launched in its absence.
Some designers offer, or are forced, to complete the Facilities Technical Requirements, although it is out of scope
Others hesitated to do so as this is an extensive effort that requires knowledge and input from a variety of stakeholders
Others acknowledged that this process is outside their core competency and the result could be compromised by schedule pressures or limited experience
3. Misalignment of available budget and performance expectations
Owners wanted low capital expense, operating expense, and cost of ownership over the life of the project.
Most solutions cannot satisfy all three. The owners should establish the highest priority (Capex, Opex, TCO).
Designers felt unduly criticized for not prioritizing energy efficiencies in data center designs, although the owners did not understand the correlation between Capex and efficiency. “Saving money takes money; a cheap data center design is rarely efficient.”
Data Center Owners Respond
Following the initial meeting with the data center design community, Uptime Institute brought the discussion to data center owners and operators in the Uptime Institute Network throughout 2013, at the North America Network Meeting in Seattle, WA, APAC Network Meeting in Shenzhen, China, and at the Fall Network Meeting in Scottsdale, AZ.
Uptime Institute solicited input from the owners and also presented the designers’ perspective to the Network members. The problems the engineering community identified resonated with the Operations professionals. However, the owners also identified multiple problems encountered on the design side of a capital project.
In the owners’ words, “designers, do your job.”
According to the owners, the design community is responsible for drawing out the owners’ requirements, providing multiple options, and identifying and explaining potential costs. Common problems in the owners’ experience include:
Conflicts often arise between the design team and outside consultants hired by owners
Various stakeholders in the owner’s organization have different agendas that confuse priorities
Isolated IT and Facilities teams result in capacity planning problems
Design teams are reluctant to stray from their preferred designs
The data center owner community agreed with the designers’ perspective and took responsibility for those shortcomings. But the owners pointed out that many design firms promote cookie-cutter solutions and are reluctant to stray from their preferred topologies and equipment-based solutions. One participant shared that he received data center design documents for a project with the name of the design firm’s previous customer still on the paperwork.
Recommendations
Throughout this process, Uptime Institute worked to collect and synthesize the feedback and potential solutions to chronic communications problems between these two constituencies. The following best practices will improve management and communication throughout the project planning and development, with lasting positive effect on the operations lifecycle.
Pre-Design Phase
All communities that participated in OvD discussions understood the need to unite stakeholders throughout the project and the importance of reviewing documentation and tracking changes throughout. Owners and designers also agreed on the need to invest time and budget for pre-design, specifically including documenting the IT Capacity Plan with near-term, mid-term, and long-term scenarios.
The owners and designers also agreed on the importance of building Facilities Technical Requirements that are responsive to the IT Capacity Plan and include essential project parameters:
Capacity (initial and ultimate)
Tier(s)
Redundancy
Density
Phased implementation strategy
Configuration preferences
Technology preferences
Operations requirements
Level of innovation
Energy efficiency objectives
Computer Room Master Plan
Workshop Computer Room Master Plans with IT, Facilities, Corporate Real Estate, Security, and other stakeholders and then incorporate them into the Facilities Technical Requirements. After preparing the Facilities Technical Requirements, invite key stakeholders to ratify the document. This recommendation does not prohibit changes later but provides a basis of understanding and launch point for the project. Following ratification, brief the executive (or board). This and subsequent briefings can provide the appropriate forum for communicating the costs associated with various design alternatives, but also how they deliver business value.
RFP and Hiring
Provide as much detail about project requirements as possible in the RFP, including an excerpt of Facilities Technical Requirements in the RFP itself and technology and operations preferences and requirements. This allows respondents to the RFP to begin to understand the project and respond with their most relevant experience. Also, given that many RFPs compel some level of at-risk design work, a detailed RFP will best guide this qualification period and facilitate the choice of the right design firm. Inclusion of details in the RFP does not prohibit the design from changing during its development and implementation.
Negotiate in person as much as possible. Owners regretted not spending more time with the design firm(s) before a formal engagement as misalignments only became evident once it was too late. Also, multiple owners remarked with pride that they walked out of a negotiation at least once. This demonstrated their own commitment to their projects and set a tone of consequences and accountability for poor or insufficient communication.
Assess and score the culture of the design firms for alignment with the owner’s preferred mode and tone of operations. Owners commented that they preferred a small and local design firm, which may require some additional investment in training, but they were confident they would get more careful and close attention in return.
Notify the design engineer from the outset of specific requirements and indicators of success to pre-empt receiving a generic or reconstituted design.
Should the owner engage an outside consultant, avoid setting an aggressive tone for consultants. Owners may want to augment their internal team with a trusted advisor resource. Yet, this role can inadvertently result in the consultant assuming the role of guard dog, rather than focusing on collaboration and facilitation.
Design and Subsequent Phases
Owners and designers agreed that a design effort was a management challenge rather than a technical one. An active and engaged owner yields a more responsive and operable design. Those owners that viewed it as outsourcing the production/fabrication effort of a data center struggled with the resulting solution. The following recommendations will reduce surprises during or after the project.
Success was defined not as a discrete number of meetings or reports, but as being contingent upon establishing and managing a communication system.
Key components of this system include the following:
Glossary of terms: Stakeholders will have varying experience or expertise and some terms may be foreign or misconceived. A glossary of terms establishes a consistent vocabulary, encourages questions, and builds common understanding.
List of stakeholders: Stakeholders may vary, but identifying the ‘clients’ of the data center helps to establish and maintain accountability.
Document all changes: The owner must be able to evidence the circumstances and reasons behind any changes. These are a natural aspect of a complex data center project, but knowing the decision made and why will be key to setting expectations and successful operation of the data center.
Notify stakeholders of changes to IT Capacity Plans, Facilities Technical Requirements, and design documents. This will also help executive and non-technical stakeholders to feel engaged without disruption to the project flow and allow the project stakeholders to provide accurate and timely answers when decisions are questioned during or after the project.
As the recommendations were compiled from the OvD initiative, many of the recommendations resonated with Uptime Institute guidance of years past. Over 10 years ago, Ken Brill and Pitt Turner held seminars on project governance that touched upon a number of the items herein. It is an old problem, but just as relevant.
Key Quotes from the Design Community
Owners want to design to Tier III, but they want to pay for Tier II and get Tier IV performance.
Owners want technologies or designs that don’t work in their region or budget.
The IT people are not at the table, and engineers don’t have adequate opportunity to understand their requirements. Designers are often trying to meet the demands of an absent, remote, or shielded IT client who lives in a state of constant crisis.
Once the project is defined, it’s in the hands of the general contractor and commercial real estate group. Intermediaries may not have data center experience, and engineers aren’t in direct contract with the end user anymore.
Industry Perspectives
Chris Crosby, CEO, Compass Datacenters
There are some days when I’d like to throw architects and engineers off the roof. They don’t read their own documents, for example, putting in boilerplate that has nothing to do with the current project in a spec. They can also believe that they know better than the owner—making assumptions and changes independent of what you have clearly told them on paper that you want. It drives me nuts because as an owner you may not catch it until it has cost you a hundred grand, since it just gets slipsheeted into some detail or RFI response with no communication back to you.
Dennis R. Julian, PE, ATD, Principal, Integrated Design Group, Inc.
Data center designs are detail oriented. Missing a relatively minor item (e.g. control circuit), could result in shutting down the IT equipment. When schedules are compressed, it is more difficult and requires more experienced design personnel to flush out the details, do the analysis, and provide options with recommendations required for a successful solution.
There are pressures to stick with proven designs when:
Fees are low. Standard designs are used so less experienced personnel may be used to meet budgets.
Schedules are compressed. Reuse of existing designs and minimizing options and analysis speeds up completion of the design.
Good design saves capital and operating costs over the life of the facility and vastly dwarfs any savings in design fees. Selecting designers based on qualifications and not fees, similar to the Brooks Act regulating the selection of engineers by the U.S. Federal government (Public Law 92-582 92nd Congress, H.R. 12807. October 27, 1972) and allowing reasonable schedules will allow the discussion about the client’s goals and needs and the time to review alternatives for the most cost-effective solution based on total cost of ownership.
Julian Kudritzki joined the Uptime Institute in 2004 and currently serves as Chief Operating Officer. He is responsible for the global proliferation of Uptime Institute Standards. He has supported the founding of Uptime Institute offices in numerous regions, including Brasil, Russia, and North Asia. He has collaborated on the development of numerous Uptime Institute publications, education programs, and unique initiatives such as Server Roundup and FORCSS. He is based in Seattle, WA.
Matt Stansberry is director of Content and Publications for the Uptime Institute and also serves as program director for the Uptime Institute Symposium, an annual spring event that brings together 1,500 stakeholders in enterprise IT, data center facilities, and corporate real estate to deal with the critical issues surrounding enterprise computing. He was formerly Editorial Director for Tech Target’s Data Center and Virtualization media group, and was managing editor of Today’s Facility Manager magazine. He has reported on the convergence of IT and Facilities for more than a decade.
Resolving Conflicts between Data Center Owners and Designers (posted by Kevin Heslin, 2014-08-07; updated 2015-02-06)
How did these six enterprises find and eliminate so much waste?
Comatose IT equipment, servers long abandoned by application owners and users but still racked and running, are hiding in plain sight within even the most sophisticated IT organizations. Obsolete or unused servers represent a double threat in terms of energy waste—squandering power at the plug, but also wasting data center facility power and capacity.
Uptime Institute Research circa 2009 states that decommissioning one rack unit (1U) of servers can result in a savings of US$500 per year in energy costs, an additional US$500 in operating system licenses, and US$1,500 in hardware maintenance costs. But reaping those rewards is no easy task.
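As a rough illustration of how those per-unit figures add up, the sketch below totals the estimated savings for a given number of decommissioned rack units. The per-1U figures are the ones quoted above; treating all three as annual amounts is an assumption, and the function name and the 100-unit example are hypothetical, chosen only for illustration.

```python
# Per-1U savings figures quoted from Uptime Institute Research (circa 2009).
# Assumption: all three figures are treated as annual amounts in US$.
SAVINGS_PER_1U = {
    "energy": 500,
    "os_licenses": 500,
    "hardware_maintenance": 1_500,
}


def estimate_annual_savings(rack_units: int) -> dict:
    """Return estimated annual savings (US$) by category, plus a total."""
    if rack_units < 0:
        raise ValueError("rack_units must be non-negative")
    breakdown = {
        name: per_unit * rack_units for name, per_unit in SAVINGS_PER_1U.items()
    }
    # The sum is evaluated before the "total" key is added to the dict.
    breakdown["total"] = sum(breakdown.values())
    return breakdown


if __name__ == "__main__":
    # Hypothetical example: 100 rack units of comatose servers removed.
    for category, amount in estimate_annual_savings(100).items():
        print(f"{category}: US${amount:,}")
```

Real programs will differ: servers are not all 1U, and license and maintenance savings depend on contract terms, so the result is best read as an order-of-magnitude estimate.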
According to Uptime Institute’s estimates based on industry experience, around 20% of servers in data centers today are obsolete, outdated or unused. That percentage may in fact be conservative.
According to one media report, Lexis Nexis found 50% of its servers were comatose in one of its audit samples. When the insurance firm Sun Life took back management from an outsourced data center engagement firm in 2011, it found 40% of its servers were doing absolutely nothing.
As early as 2006, Uptime Institute Founder Ken Brill identified comatose servers as one of the biggest opportunities for companies to improve overall IT energy efficiency. While Mr. Brill advocated for industry action on this issue, he often cautioned, “Nobody gets promoted for going around in the data center and unplugging servers.” Mr. Brill meant that data center professionals had no incentive to remove comatose machines and that IT executives lacked insight into the impact idle IT equipment was having on the cost structures of their organizations, as their departments do not pay the data center power bill.
The corporate disconnect between IT and Facilities Operations continues to challenge the data center industry. Data center managers need to overcome that organizational barrier and get executive level buy-in in order to implement an effective server decommissioning program.
Winners of Server Roundup at the Uptime Institute Symposium 2013
This is why Uptime Institute invited companies around the globe to help address and solve the problem of comatose servers by participating in the Server Roundup, an initiative to promote IT and Facilities integration and improve data center energy efficiency.
The annual Uptime Institute Server Roundup contest was launched in October 2011 to raise awareness about the removal and recycling of comatose and obsolete IT equipment in an effort to reduce data center energy use. In 2012, Uptime Institute named AOL and NBC Universal inaugural Server Roundup champions. AOL had removed nearly 10,000 obsolete servers, and NBC Universal culled 1,090 comatose machines, representing 29% of its overall IT footprint. The following year’s results were even more impressive.
2013 Winners and Finalists
WINNER: AOL won in back-to-back years for its overall tally of servers removed. The global Web services company decommissioned 8,253 servers in calendar year 2012. This produced (gross) total savings of almost US$3 million from reduced utility and maintenance costs and asset resale/scrap. Environmental benefits included reducing carbon emissions by more than 16,000 tons, according to AOL.
WINNER: Barclays, a global financial organization, removed 5,515 obsolete servers in 2012, gaining power savings of around 3 megawatts, US$3.4 million annualized savings for power, and a further US$800K savings in hardware maintenance.
FINALIST: TD Bank removed 513 servers in 2012. The team from this Canadian financial firm removed 2,941 units in the 5 years they’ve been working to remove obsolete machines from the raised floor. Although the TD Bank annual server count does not approach the impressive numbers put up by AOL, the organization makes up for it in volume of waste that it diverts from local and municipal waste sites. All the equipment sent through the E-Waste recycler is salvaged within a 110-mile radius from TD Bank’s primary data centers. Nothing is shipped overseas for processing.
Server Roundup trophy belt buckle
FINALIST: McKesson pulled 586 servers in 2012, reducing data center power usage by 931.7 kilowatts and saving US$734,550.
FINALIST: Sun Life Financial removed 387 servers in 2012, which resulted in 32 kilowatts of power savings across three data centers and financial savings of US$8,800 per month.
Since the contest’s launch two years ago, Server Roundup participants have decommissioned and recycled 30,000 units of obsolete IT equipment.
In the sidebars, Server Roundup winner Paul Nally and finalist Rocco Alonzi discuss the challenges and benefits of a server-decommissioning program and detail their strategies for success.
Takeaways From Last Year’s Winners
During the 2013 Uptime Institute Symposium, last year’s winners provided the following advice:
Get senior management to buy into the program. “There is risk involved, but we need to get senior management buy-off on the risk,” Nally said. “There’s short-term risk and long-term risk. If you flip the wrong switch, and we have, you’ll cause an outage. But if you leave it on the wire to stagnate for five to six years, when it eventually dies, we will not be able to recover it.”
When you pitch server decommissioning to execs, discuss business impacts. “The easiest way to find yourself alone in an empty room is to call a meeting about server retirement,” Nally said. “People don’t understand the challenge. When we have the conversations with the C-level suite, we tell them what 5,000 servers means. We don’t talk in terms of kilowatts. We talk in terms of dollars.”
The biggest roadblock will be cultural. “Executives have other things on their roadmaps that are more interesting, like developing revenue. Getting buy-in requires getting people to commit to doing stuff they don’t like doing. People would rather move on to the next great thing, rather than dealing with the management problem they have,” said Scott Killian, Senior Technical Director of Data Center Services at AOL.
Get some help. “We brought in a couple of university students to do a book-to-floor audit of all the servers over three months under supervision of my group,” Alonzi said. “We took that information and started to cross-reference based on applications. All these data were about 80% accurate. Once we gathered all the information, we found question marks around a lot of hardware. There was work we had to do with our service providers, network people and storage guys. We literally had to drag people onto the raised floor and point to a cabinet or a bank of servers and say, ‘What are these doing?’”
Don’t be afraid to perform the “scream test.” “This is where you have a server that you know is not live, but you cannot find or establish the server owner. You pull the network cable from the back of the server and see who calls you to report the server being down and then investigate from there,” said Guy Pattison, Technical Solutions Officer, Data Center Management, TD Bank.
Document as much as possible. “Having a good DCIM is key. We have a backend system polling the servers to understand how machines are being used and who’s using them,” Nally said.
Keep up with incoming servers. “Any new hardware purchased comes through the data center operations group,” Alonzi said. “We don’t make a decision on what they’re buying, but we make sure it’s assigned to a project, and it’s not landing on the dock because the vendor was having a fire sale. Unless there’s a net new project or growth, we challenge more now.”
Paul Nally, Director at Barclays
“It has been said that the greenest data center is the one that’s never built. That is the main reason we have our server decommissioning program at Barclays. We are looking to shrink our data center footprint and benefit from the savings that this affords us, while allowing ourselves to massively expand our overall compute capability. When obsolete servers are removed in the thousands, it creates the capacity that we need to bring the next generation of systems in.
We save in space; we save in power. It helps us meet our carbon targets. When we eliminate or virtualize a server, we also save on network, SAN, and software costs. A server that may have cost US$100,000 seven years ago, took up half a rack of space, and required a couple of kW to run is absolutely crushed in compute performance by a modern blade costing US$5,000. But the benefits extend throughout the overall organization. A focus on removing these obsolete systems simplifies the environment from a network and systems administration perspective. Applications teams benefit from a more stable system that is easily maintained and integrated into contemporary management frameworks. We end up in a cleaner, safer, cheaper place with the capacity in hand that we need to continue to grow our business. There is real work, and some risk, in getting this job done, but the benefits are simply too many to ignore.”
Rocco Alonzi, AVP Data Center Governance at Sun Life Financial
“The removal of an under-utilized server sounds much easier than it really is. The thought of turning off a server and removing it from the raised floor can be overwhelming even if you are 100% certain that it is no longer required. Think about the process for a moment. As the server connections (electrical power, network, SAN storage) are removed and the server physically pulled out of a production cabinet, the hard drive data must be permanently destroyed and finally the server needs to be returned to the vendor or disposed of properly. The logical aspect includes another entire separate process so that in the end it is much easier on everyone to leave it powered on.
This is the message I communicated to the Leadership team followed by a solution and a promise. The solution included a dedicated resource (Contractor), asset database, and cooperation from the Server, Storage, and Network support teams. The contractor walked the raised floor performing an asset database book to raised-floor audit. And, yes, this did take some time, three months to be exact. This rich information was used to identify the servers that were not in the database but physically on the raised floor. We also challenged the support groups to associate their service offering with corresponding hardware infrastructure. These two exercises led to approximately 400 servers being switched off.
The promise was that Data Centre Operations team would do all the work after the hardware device was switched off. This included working with the support groups to reclaim IP addresses, SAN storage ports, and electrical power cords. We also provided the Financial department with detailed hardware information reclaiming cost savings that was passed on to the business unit. Finally, a process was put into place to remove the physical server from the raised floor, destroy the data, and properly dispose of the hardware.
The message: Raise awareness to the Leadership team of the issue and take a dedicated approach of decommissioning hardware infrastructure. It is well worth the effort.”
Matt Stansberry is director of Content and Publications for the Uptime Institute and also serves as program director for the Uptime Institute Symposium, an annual spring event that brings together 1,500 stakeholders in enterprise IT, data center facilities, and corporate real estate to deal with the critical issues surrounding enterprise computing. He was formerly Editorial Director for Tech Target’s Data Center and Virtualization media group, and was managing editor of Today’s Facility Manager magazine. He has reported on the convergence of IT and Facilities for over a decade.
Decommissioning as a Discipline: Server Roundup Winners Share Success (posted by Kevin Heslin, 2014-08-07; updated 2014-08-28)
What happens when economies of scale is a false promise?
By Chris Crosby
Chris Crosby
Chris Crosby is a recognized visionary and leader in the data center space, Founder and CEO of Compass Datacenters. Mr. Crosby has more than 20 years of technology experience and 10 years of real estate and investment experience. Previously, he served as a senior executive and founding member of Digital Realty Trust. Mr. Crosby was Senior Vice President of Corporate Development for Digital Realty Trust, responsible for growth initiatives including establishing the company’s presence in Asia. Mr. Crosby received a B.S. degree in Computer Sciences from the University of Texas at Austin.
For many of us, Economics 101 was not a highlight of our academic experience. However, most of us picked up enough jargon to have an air of competence when conversing with our business compadres. Does “supply and demand” ring a bell?
Another favorite term that we don’t hesitate to use is “economies of scale.” It sounds professorial and is easy for everyone, even those of us who slept our way through the course, to understand. Technically the term means: The cost advantages that enterprises obtain due to size, throughput, or scale of operation, with cost per unit of output generally decreasing with increasing scale as fixed costs are spread out over more units of output.
The metrics used in our world are usually expressed as cost/kilowatt (kW) of IT capacity and cost/square foot (ft²) of real estate. Some folks note all costs as cost/kW. Others simply talk about the data center fit out in cost/kW and leave the land and building (cost/ft²) out of the equation entirely. In both cases, however, economy of scale is the assumed catalyst that drives cost/ft² and/or cost/kW ever lower. Hence the birth of data centers so large that they have their own atmospheric fields.
This model is used by providers of multi-tenant data centers (MTDC), vendors of pre-fabricated modular units, and many enterprises building their own facilities. Although the belief that building at scale is the most cost-efficient data center development method appears logical on the surface, it does, in fact, rely on a fundamental requirement: boatloads of cash to burn.
It’s First Cost, Not Just TCO
In data center economics, no concept has garnered more attention, and less understanding, than Total Cost of Ownership (TCO). Entering the term “data center total cost of ownership” into Google returns more than 11.5 million results, so obviously people have given this a lot of thought. Fortunately for folks who write white papers, nobody has broken the code. To a large degree, the problem is the nature of the components that comprise the TCO calculus. Because of the longitudinal elements that are part of the equation, energy costs over time for example, the perceived benefits of design decisions sometimes hide the fact that they are not worth the cost of the initial investment (first cost) required to produce them. For example, we commonly find this to be the case in the quest of many operators and providers to achieve lower PUE. While certainly admirable, incomplete economic analysis can mask the impact of a poor investment. In other words, this is like my wife bragging about the money she saved by buying the new dining room set because it was on sale even though we really liked the one we already had.
In a paper posted on the Compass website, Trading TCO for PUE?, Romonet, a leading provider of data center analytical software, illustrated the effect of failing to properly examine the impact of first cost on a long-term investment. Due to New Mexico’s favorable atmospheric conditions, Compass chose it as the location to examine the value of using an adiabatic cooling system in addition to airside economization as the cooling method for a hypothetical location. This is a fairly common industry approach to free cooling. New Mexico’s climate is hot and dry and offers substantial free cooling benefits in the summer and winter as demonstrated by Figure 1.
Figure 1. Free cooling means that the compressors are off, which, on the surface, means “free” as they are not drawing electricity.
In fact, through the use of an adiabatic system, the site would benefit from more than four times the free-cooling hours of a site without one. Naturally, the initial reaction to this cursory data would be “get that cooling guy in here and give him a PO so I have a really cool case study to present at the next Uptime Institute Symposium.” And, if we looked at the perceived cost savings over a ten-year period, we’d be feeling even better about our US$500,000 investment in that adiabatic system since it appears that it saved us over US$440,000 in operating expenses.
Unfortunately, appearances can be deceiving, and any analysis of this type needs to include a few things such as discounted future savings (otherwise known as net present value, or NPV) and the cost of not only the system maintenance but also the water used, and its treatment, over the 10-year period. When these factors are taken into account, it turns out that our US$500,000 investment in an adiabatic cooling system actually resulted in a negative payback of US$430,000! That’s a high price to pay for a tenth of a point in your PUE.
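To make the mechanics concrete, here is a minimal sketch of how discounting and recurring costs change the picture. Only the US$500,000 first cost and the US$440,000 of apparent ten-year savings come from the text above; spreading those savings evenly across the years is an assumption, and the discount rate and the annual maintenance-and-water figure are hypothetical placeholders, not Romonet's inputs. The sketch is not expected to reproduce the US$430,000 result.

```python
def npv(rate: float, cash_flows: list[float]) -> float:
    """Net present value; cash_flows[0] occurs today, cash_flows[t] at end of year t."""
    return sum(cf / (1.0 + rate) ** year for year, cf in enumerate(cash_flows))


FIRST_COST = 500_000          # from the article (US$)
APPARENT_SAVINGS = 440_000    # from the article (US$ over ten years)
YEARS = 10

# Hypothetical inputs, for illustration only.
DISCOUNT_RATE = 0.08
MAINTENANCE_AND_WATER_PER_YEAR = 30_000

# Assumption: the apparent savings accrue evenly, year by year.
annual_saving = APPARENT_SAVINGS / YEARS

# Naive view: no discounting, no maintenance or water costs.
naive_flows = [-FIRST_COST] + [annual_saving] * YEARS
naive_net = npv(0.0, naive_flows)

# Fuller view: discount future savings and subtract recurring costs.
full_flows = [-FIRST_COST] + [annual_saving - MAINTENANCE_AND_WATER_PER_YEAR] * YEARS
discounted_net = npv(DISCOUNT_RATE, full_flows)

if __name__ == "__main__":
    print(f"Undiscounted net, savings only: US${naive_net:,.0f}")
    print(f"NPV with maintenance and water: US${discounted_net:,.0f}")
```

Whatever inputs are used, the fuller calculation can only look worse than the naive one, because later savings are worth less today and the recurring costs eat into them every year.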
The point is that the failure to account for the long-term impact of an initial decision can permanently preclude companies from exercising alternative business, not just data center, strategies.
The Myth of Scale (a.k.a., The First Cost Trap)
Perhaps there is no better example of how the myth of scale morphs into the first cost trap than when a company elects to build out the entire shell of its data center upfront, even though its initial space requirements are only a fraction of the ultimate capacity. This is typically done using the justification that they will eventually “grow into it,” and it is necessary to build a big building because of the benefit of economy of scale. It’s important to note that this is also a strategy used by providers of MTDCs, and it doesn’t work any better for them.
The average-powered core and shell (defined here as the land, four walls, and roof along with a transformer and common areas for security, loading dock, restrooms, corridors, etc.) of a data center facility typically ranges from US$20 million to upwards of US$100 million. The standard rationale for this “upfront” mode of data center construction is that this is not the “expensive” portion of the build and will be necessary in the long run. In other words, the belief is that it is logical to build the facility in its entirety because construction is cheap on a per-square-foot basis. Under this scenario, the cost savings are gained through the purchase and use of materials in a high enough volume that price reductions can be extracted from providers. The problem is that when the first data center pod or module is added the costs go up in additional US$10 million increments. In other words, in the best case it costs US$30 million minimum just to turn on the first server! Even modular options require that first generator and that first heat rejection and piping. First cost per kW is two to four times the supposed “end” point cost per kilowatt. Enterprises can pay two or three times more.
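The arithmetic behind the first cost trap can be sketched in a few lines. The US$20 million shell and US$10 million pod increments are taken from the paragraph above; the pod capacity (1,000 kW) and the planned pod count (4) are hypothetical values chosen only to make the ratio visible.

```python
SHELL_COST = 20_000_000   # low end of the core-and-shell range cited above (US$)
POD_COST = 10_000_000     # cost increment per pod cited above (US$)

# Hypothetical values, for illustration only.
POD_CAPACITY_KW = 1_000
PLANNED_PODS = 4


def cost_per_kw(pods_built: int, shell_cost: float = SHELL_COST) -> float:
    """Cumulative cost per kW of IT capacity after `pods_built` pods are fitted out."""
    if pods_built < 1:
        raise ValueError("at least one pod must be built")
    total_cost = shell_cost + pods_built * POD_COST
    return total_cost / (pods_built * POD_CAPACITY_KW)


if __name__ == "__main__":
    for pods in range(1, PLANNED_PODS + 1):
        print(f"{pods} pod(s): US${cost_per_kw(pods):,.0f} per kW")
    ratio = cost_per_kw(1) / cost_per_kw(PLANNED_PODS)
    print(f"First cost per kW is {ratio:.1f}x the full build-out figure")
```

Under these assumptions the first pod carries the whole shell, so its cost per kW is twice the figure reached at full build-out; a more expensive shell or a larger planned pod count pushes the ratio higher, and the lower figure is only realized if every planned pod is eventually built.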
Option Value
This volume mode of cost efficiency has long been viewed as an irrefutable truth within the industry. Fortunately, or unfortunately, depending on how you look at things, irrefutable truths oftentimes prove very refutable. In this method of data center construction, what is gained is often less important than what has been lost.
Option value is the associated monetary value of the prospective economic alternatives (options) that a company has in making decisions. In the example, the company gained a shell facility that it believes, based on its current analysis, will satisfy both its existing and future data center requirements. However, the inexpensive (as compared to the fit out of the data center) cost of US$100-US$300/ft2 is still real money (US$20-US$100 million depending on the size and hardening of the building). The building and the land it sits on are now dedicated to the purpose of housing the company’s data center, which means that it will employ today’s architecture for the data center of the future. If the grand plan does not unfold as expected, this is kind of like going bust after you’ve gone all in during a poker game.
Figure 2. Estimated hours of free cooling at a hypothetical site in New Mexico.
Now that we have established what the business has gained through its decision to build out the data center shell, we should examine what it has lost. In making the decision to build in this way, the business has chosen to forgo any other use. By building the entire shell first, it has lost any future value of an appreciating asset: the land used for the facility. It cannot be used to support any other corporate endeavors, such as disaster recovery offices, and it cannot be sold for its appreciated value. While maybe not foreseeable, this decision can become doubly problematic if the site never reaches capacity and some usable portion of the building/land is permanently rendered useless. It will be a 39-year depreciating rent payment that delivers zero return on assets. Suddenly, the economy of scale is never realized, so the initial cost per kilowatt is the end-point cost.
For example, let’s assume a US$3-million piece of land and US$17 million to build a building of 125,000 ft2 that supports six pods at 1,100 kW each. At US$9,000 per kW for the first data center, we have an all-in of US$30 million for 1,100 kW, or over US$27,000 per kW. It’s not until we build all six pods that we get to the economy of scale that produces an all-in of US$12,000/kW. In other words, there is no economy of scale unless you commit to invest almost US$80 million! This is the best case, assuming the builder is an MTDC.
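The arithmetic behind this example can be sketched in a few lines of Python. The sketch assumes, as the example implies but does not state outright, that every pod costs the same US$9,000 per kW to fit out and that the land and shell are paid for in full before the first pod is built. The function names are illustrative only.

```python
# Figures taken from the example in the text.
LAND_COST = 3_000_000       # US$
SHELL_COST = 17_000_000     # US$, 125,000 ft2 building
POD_KW = 1_100              # kW of capacity per pod
POD_COST_PER_KW = 9_000     # US$ per kW (assumed identical for every pod)
MAX_PODS = 6


def all_in_cost(pods: int) -> int:
    """Total spend (land + shell + pod fit-out) once `pods` pods exist."""
    return LAND_COST + SHELL_COST + pods * POD_KW * POD_COST_PER_KW


def all_in_cost_per_kw(pods: int) -> float:
    """All-in cost divided by the capacity actually built so far."""
    return all_in_cost(pods) / (pods * POD_KW)


for pods in range(1, MAX_PODS + 1):
    print(f"{pods} pod(s): US${all_in_cost(pods):,} total, "
          f"US${all_in_cost_per_kw(pods):,.0f} per kW")
```

Under these assumptions, the first pod comes to roughly US$27,200 per kW, and the figure only approaches US$12,000 per kW once all six pods (about US$79 million in total) have been built.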
It is logical for corporate financial executives to ask whether this is the most efficient way to allocate capital. The company has also forfeited any alternative uses for the incremental capital that was invested in this all-at-once approach. Obviously, once invested, this capital cannot be repurposed and remains tied to an underutilized depreciating asset.
Figure 3. Savings assume an energy cost of US$0.058/kWh, a cooling overhead of 0.25, and 1,250 kW of IT load
An Incremental Approach
The best way to address the shortcomings associated with the myth of scale is to construct data center capacity incrementally. This approach entails building a facility in discrete units that, as part of the base architecture, enable additional capacity to be added when it is required. For a number of reasons, until recently, this approach has not been a practical reality for businesses desiring this type of solution.
For organizations that elect to build their own data centers, the incremental approach described above is difficult to implement due to resource limitations. Lacking a viable prototype design (the essential element for incremental implementation), each project effectively begins from scratch and is typically focused on near-term requirements. Thus, the ultimate design methodology reflects the build-it-all-at-once approach, as it is perceived to limit the drain on corporate resources to a one-time-only requirement. The typical end result of these projects is an extended design and construction period (18-36 months on average), which sacrifices the efficiency of capital allocation and option value for a flawed definition of expediency.
For purveyors of MTDC facilities, incremental expansion via standardized discrete units is precluded due to their business models. Exemplifying the definition of economies of scale found in our old Economics 101 textbooks, these organizations reduce their cost metrics by leveraging their size to procure discounted volume purchase agreements with their suppliers. These economies then translate into the need to build large facilities designed to support multiple customers. Thus, the cost efficiencies of MTDC providers drive a business model that requires large first-cost investments in data center facilities, with the core and shell built all at once and data center pods completed based on customer demand. Since MTDC efficiencies can only be achieved by reducing high first-cost investments by leasing capacity to multiple tenants or multiple pods to a tenant, they are forced to locate these sites in market areas that include a high population of their target customers. Thus, the majority of MTDC facilities are found within a handful of markets (e.g., Northern Virginia, New York/New Jersey, and the San Francisco Bay area) where a critical mass of prospective customers can be found. This is the predominant reason why they have not been able to respond to customers requiring data centers in other locations. As a result, this MTDC model requires a high degree of sacrifice to be made by the customers. Not only must they relinquish their ability to locate their new data center wherever they need it, they must also pre-lease additional space to ensure that it will be available if they grow over time, as even the largest MTDC facilities have finite data center capacity.
Many industry experts view prefabricated data centers as a solution to this incremental requirement. In a sense, they are correct. These offerings are designed to make the addition of capacity a function of adding one or more additional units. Unfortunately, many users of prefabricated data centers experience problems from how these products are incorporated in designs. Unless the customer is using them in a parking lot, more permanent configurations require the construction of a physical building to house them. The end result of this need is the construction of an oversized facility that will be grown into, but also suffers from the same first cost and absence of option value as the typical customer-constructed or MTDC facility. In other words, if I have to spend US$20 million on day one for the shell and core, how am I saving by only building in 300-kW increments instead of 1-megawatt increments like the traditional guys?
The Purpose-Built Facility
In order to effectively implement a data center strategy that eliminates the issues of exorbitant first costs and the elimination of option value, the facility itself must be designed for just such a purpose. Unlike attempting to use size as the method for cost reduction, the data center would achieve this requirement through the use of a prototype, replicable design. In effect, the data center becomes a product with cost focus on a system level, not parts and pieces.
To many, the term “standard” is viewed as a pejorative that denotes a less than optimal configuration. However, as “productization” has shown with the likes of the Boeing 737, the Honda Accord, or the Dell PC, when you include the most commonly desired features at a price below non-standard offerings, you eliminate or minimize the concern. For example, features like Uptime Institute Tier III Design and Construction Certification, LEED certification, a hardened shell, and ergonomic features like a move/add/change optimized design would be included in the standard offering. This limits the scope of customer personalization to the data hall, branding experience, security and management systems, and jurisdictional requirements. This is analogous to car models that incorporate the most commonly desired features as standard, while enabling the customer to “customize” their selection in areas such as car color, wheels, and interior finish.
The resulting solution then provides the customer with a dedicated facility, including the most essential features that can be delivered within a short timeframe (under six months from initial ground breaking) without requiring them to spend US$20-US$100 million on a shell while simultaneously relinquishing the option value of the remaining land. Each unit would also be designed to easily allow additional units to be built and conjoined to enable expansion to be based on the customer’s timeframe and financial consideration rather than have them imposed on them by the facility itself or a provider.
Summary
Due to their historically limited alternatives, many businesses have been forced to justify the inefficiency of their data center implementations based on the myth of scale. Although solutions like prefabricated facilities have attempted to offer prospective users the incremental approach that negates the impact of high first costs and the elimination of alternatives (option value), ultimately they require the same upfront physical and financial requirements as MTDC alternatives. The alternative to these approaches is the productization of the data center, in which a standard offering that includes all of the most commonly requested customer features provides end users with a cost-effective option that can be grown incrementally in response to their individual corporate needs.
Industrialization, a la Henry Ford, ensures that each component is purchased at scale to reduce the cost per component. Productization shatters this theory by focusing on the system level, not the part/component level. It is through productization that the paradox of high-quality, low-cost, quickly delivered data centers becomes a reality.
Data Center Cost Myths: SCALE, by Kevin Heslin, 2014-07-30 (updated 2014-07-31)
Data Center Facility Owners See Modules as Efficient Way to Deploy Capital
By Kevin Heslin, in Design
Compass sets the course with new modular data center approach.
The term modular has been used to describe a variety of approaches to data center design. Historically, the first commercially available modular design was a 2007 Sun Microsystems container-based product called the Black Box. Today, the term describes products that range from shipping containers and simple repeated designs to fully manufactured IT spaces and MEP systems built in factories and shipped to various sites.
Many modular data center providers have written white papers that tout the benefits of their proprietary designs, but there are also some informative reports on the ins and outs of modular design. John Stanley of 451 Research presented the results of his survey work at the Uptime Institute Symposium in May 2012. He surveyed 35 companies that use and provide various types of modular solutions for data centers. His research pointed out that 65% of those surveyed deployed capacity in ‘chunky’ increments, specifically highlighting the need for small chunks of capacity per module or deployment to match growth of their data centers.
Compass has based its business strategy on this industry driver and five main features of modular design:
High quality
Low construction costs
Speed to deploy
Integrated supply chain
Low operating costs
In the Green Grid paper, “Deploying and Using Containerized/Modular Data Center Facilities,” published in November 2011, the Green Grid demonstrated that modular data centers can follow capacity requirements more closely, freeing capital and keeping MEP systems more fully loaded than large-scale deployments. The authors of the paper showed that end users could employ their utility systems and floor spaces more fully, without undue concern for system exhaustion or space starvation. The authors discuss how small increments of capacity can follow compute demands more closely, with the caveat that the facility must also have a faster implementation cycle in order to match the changing IT requirements.
The data center industry is beginning to realize the benefits of the early industrial revolution. Standardized modular power center designs provide some of the same benefits to design and construction personnel. Instead of hand-building custom electrical systems for each data center, the modular approach allows for greater deployment speed, improved quality and lower costs, all achieved by using factory-based labor. The use of modules also relieves labor stacking on the job site, while reducing the overall cost of the work by a significant amount.
In his 1936 paper, Factors Affecting the Cost of Airplanes, T.P. Wright first quantified the cost savings that could be attained using factory labor. Wright showed that the direct labor cost of assembling an airframe decreased by roughly 20% for each doubling of the production numbers. In other words, if the labor cost of building one airframe was $1,000,000, the labor cost of building two of the same model might be $800,000 each. Doubling again would mean the cost of four of the same airframes would be $640,000 each, or 36% lower than the first unique model.
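Wright's relationship is usually written as a power law, and the figures above can be reproduced with a short Python sketch. The 80% learning rate (a 20% drop per doubling) comes from the text; the power-law form and the function name are the conventional statement of the curve rather than something the article spells out, and the result is read here as the labor cost per unit at a given production quantity.

```python
import math


def wright_unit_labor_cost(first_unit_cost: float, units: int,
                           learning_rate: float = 0.80) -> float:
    """Labor cost per unit once `units` have been produced.

    Each doubling of production multiplies the cost by `learning_rate`,
    so cost(n) = first_unit_cost * n ** log2(learning_rate).
    """
    if units < 1:
        raise ValueError("units must be at least 1")
    return first_unit_cost * units ** math.log2(learning_rate)


for n in (1, 2, 4):
    cost = wright_unit_labor_cost(1_000_000, n)
    print(f"{n} airframe(s): ${cost:,.0f} each")
```

With a first-unit cost of $1,000,000, the sketch gives about $800,000 at two units and $640,000 at four, which matches the 36% reduction cited above.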
Compass Data Centers intended to take advantage of this principle as it developed its patent-pending solution. In doing so, Compass has found that modular designs enable it to:
As a result, Compass has moved from building a prototype on every data center project to productizing the data center program and build.
Despite the fact that the industry has not settled on a clear definition of the ‘modular data center,’ it is attempting to evade this ambiguity by proclaiming the coming of ‘Version 2,’ which seems to include prefabrication as a fundamental element. Before, modular construction was associated with ISO containers and packaged chillers. Now, modularity is being embraced as synonymous with sound design and sound capital usage.
The Compass Deployment
Every data center design begins with the establishment of availability and power requirements – reliability and power density are the cornerstones of any data center specification. The business needs of the enterprise and the IT applications to be contained within the data center should drive these decisions.
Virtually every initial design addresses the project power requirements and references the Uptime Institute’s well-known Tier specifications to define reliability requirements. Uptime’s Tier Standard helps define the equipment and systems required to support the reliability and power demands of the business’s IT model. If the needs of a business are truly defined by an Uptime Tier III specification, then its data center design should be certifiable as a Tier III design.
Compass’ Truly Modular Architecture design provides for standard features such as RHI Modular Power Solutions’ Modular Power Center units, which are also built off-site, power modules that can be configured to provide a 2N power infrastructure, and N+1 mechanical systems. Other Compass architecture features include:
The standardized design of the Compass Modular Architecture provides assurance that the facility will be Tier III design certified. Internal auditors and external customers can not only be assured of the design certification, but also have the option to certify the facility itself. Importantly, the ability to productize a unique modular data center solution into a repeatable and well-defined process was a top priority. Modularizing data center components permits control over three primary variables:
Compass shares the view that these three variables are the only three points that end users actually care about. Compass also believes that modularity:
Some jurisdictions have misconceptions about modular or containerized solutions. They envision standard ISO shipping containers, assembled on-site, as the basis of the construction process.
Meeting the local authority having jurisdiction (AHJ) early in the project’s developmental cycle is an effective way to establish trust and dispel misconceptions. Providing an extensive amount of specific information on the modular components of the proposed data centers is a good way to start.
Next Phase
As part of its schematic designs, Compass decided that its basic architectural scheme would be a shell that would contain a 10,000-square-foot (ft2) column-free, raised-floor hall and all the traditional support spaces. Compass made this decision based on the collective experience of its internal and consultant teams, as well as market analysis.
Compass decided that the systems, called Modular Power Centers (MPCs), would be packaged and delivered as standardized systems. This work was provided by Modular Power Solutions. The MPCs could be installed either internally or externally to the data center’s brick-and-mortar shell structure. Crosby and the MPC executives have filed for patent protection of the intellectual property.
Compass chose not to employ a central mechanical plant, but rather to deploy an N+1 redundant rooftop mechanical system to support the facility’s HVAC requirements. So, the schematic design established the following parameters:
Data Hall:
Support Spaces:
Modular Power Centers:
The basis-of-design (BOD) goals are then used for the next step in the design process, the creation of a single-line diagram. The development of the schematic design (SD) phase single-line diagram brings together the conceptual components and requirements of the BOD (see Figure 1).
Figure 1. Schematic design single-line diagram
The initial single-line is the tool that Compass used to determine the structure and components required for the electrical distribution system. The MPCs must be sized to be able to support the demands of the 1.2-MW data hall, the mechanical systems and the support spaces. The single-line diagram is also used to validate the Tier III redundancy and concurrent maintainability requirements. The redundancy of critical components and systems and the requirement for de-energized maintenance must be worked through at this stage of the design process.
The major electrical distribution system components were identified as:
The Compass one-line based on the Tier III requirements provided a look at the electrical system’s redundancy (see Figure 2).
Figure 2. Electrical system redundancy
Then Compass added two isolation or tie breakers to the main switchboard in order to meet the Tier III concurrently maintainable requirement. These breakers were required to allow either side of the switchboard to be shut down and completely de-energized for service. The tie breakers can also be used to enhance fault tolerance. If they are forced to trip before the main circuit breakers on either side, the non-faulted side of the system can remain operational.
The decision was made to look to the local utility provider to supply an N redundant 2500-kVA utility power transformer. The 2500-kVA/2000-kW standby generator in the base system is N redundant. A second N+1 or 2N redundant generator added to the lineup meets the Tier III redundancy requirements. The second generator will be connected to the opposite side of the switchboard lineup. Compass preferred generator redundancy over utility redundancy, finding generators to be more reliable than the utility, based upon its experience that blackouts tend to be regional in nature (as seen with Hurricane Sandy).
The remaining equipment would be installed in a Power Center module. That equipment was identified as:
Based on the equipment defined in the single-line, Compass determined that at least two MPCs would be required.
The tie breakers provide a natural point to split the system. This split became the basis of how the system is packaged. The size and weight of the MPC modules are constrained by the need to ship them over highways from the assembly facility to the job site. Typically, those shipping packages are not to exceed 50 feet by 12 feet and 100,000 pounds.
Each MPC switchboard lineup features a 3000-A and a 1600-A UL 891-listed switchboard. Each switchboard is equipped with two 3000-A, four 400-A, and four 450-A UL 489-listed circuit breakers. All circuit breakers larger than 200-A are 100% duty rated. All circuit breakers feature zone selective interlocks (ZSIs). A ZSI ties the circuit breaker trip units together, allowing them to communicate in order to ensure that the circuit breaker closest to the fault trips first. Increasing the fault isolation capabilities increases the data center’s ability to maintain operational continuity.
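The ZSI behavior described above can be illustrated with a deliberately simplified Python model. It is a sketch of the general restraint-signal idea, not the trip-unit logic of any particular manufacturer: a breaker that sees fault current sends a restraint signal to the breaker upstream of it, a restrained breaker waits out a backup delay, and an unrestrained breaker trips without intentional delay. The breaker names and delay values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Breaker:
    name: str
    backup_delay_s: float  # hypothetical delay applied only when restrained


def zsi_trip_delays(chain: list[Breaker], fault_below: int) -> dict[str, float]:
    """Trip delay for every breaker that carries the fault current.

    `chain` is ordered from upstream to downstream, and the fault sits
    immediately downstream of chain[fault_below].
    """
    if not 0 <= fault_below < len(chain):
        raise ValueError("fault_below must index a breaker in the chain")
    delays = {}
    for i, breaker in enumerate(chain[: fault_below + 1]):
        # A breaker is restrained when a breaker below it also sees the fault.
        restrained = i < fault_below
        delays[breaker.name] = breaker.backup_delay_s if restrained else 0.0
    return delays


chain = [Breaker("main", 0.3), Breaker("distribution", 0.2), Breaker("feeder", 0.1)]
delays = zsi_trip_delays(chain, fault_below=2)
first_to_trip = min(delays, key=delays.get)
print(delays, "-> first to trip:", first_to_trip)
```

In this model a fault below the feeder leaves the feeder unrestrained, so it trips first while the upstream breakers hold as backup. A fault directly below the main breaker leaves the main unrestrained, so it clears the fault without waiting.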
The main switchboards are configured as main-tie-main-tie-main. Each CPC has a dedicated programmable logic controller (PLC). The PLCs are hot swappable, meaning that if either processor goes down, the other processor will automatically take control. The I/O rack is located in the A CPC. There is no power bussing in the I/O section. The switchgear can be manually operated if the I/O rack is de-energized for maintenance.
Modbus protocol is provided to the Schneider Electric StruxureWare management system at the PLC gateway for each main switchboard. The main switchgear has integrated revenue-grade power-quality metering.
Each side of the redundant power system features a 1.2-MW Schneider Electric APC Symmetra Megawatt UPS (see Figure 3). Each UPS has a dedicated external 1600-A continuous-duty-rated static bypass switch. Power to the two UPS systems is delivered from two separate (A/B) switchboards. Each switchboard is able to support the entire data center. The two tie breakers operate in the normally closed position.
Figure 3. UPS efficiency
Figure 4. (Above) Modular Power Center – top view (patent pending)
Figure 5. (Below) Modular Power Center – plan view (patent pending)
Power for the data hall will be derived from two identical MPC modules, each of which is factory-assembled prior to on-site delivery. The first layouts were completed for the MPCs with the addition of the following components:
Figure 6. Modular Power Center – interior
The MPCs (see Figures 4-8) are IBC-rated R17 structures, which meet Miami-Dade County 149-mph wind-pressure loading requirements. The MPCs are constructed to provide protection with respect to harmful effects on the equipment due to the ingress of water (rain, sleet, snow), and will be undamaged by the external formation of ice on the enclosure or seismic events.
Compass employs a 2.0-MW/2.5-MVA, 277/480-V, 3ø, four-wire generator rated at 1825 kW to provide standby power (see Figures 9 and 10).
Compass determined that the data center infrastructure will require support from a 2.0-MW/2.5-MVA generator. The sequence of operation of the total system is controlled automatically through deployment of redundant PLC control units installed in each of the 3000-A main switchboards. Should the primary standby generator fail to come online after loss of the utility source, the swing generator will pick up the critical loads of the system. Each generator will be provided with a weather-protective enclosure. All generator permits (including all operations, fuel storage, noise and air) will be obtained and maintained with the appropriate AHJ. Generators are equipped with 4,000-gallon fuel storage belly tanks for 24 hours of fuel capacity at full load.
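The source priority described in this sequence of operation can be sketched as a small decision function in Python. This is only an illustration of the ordering stated in the text (utility, then primary standby generator, then swing generator). Real PLC logic would also handle start timers, breaker interlocks, and retransfer, none of which is modeled here, and the final fallback state is an assumption added for completeness.

```python
from enum import Enum


class Source(Enum):
    UTILITY = "utility"
    PRIMARY_GENERATOR = "primary standby generator"
    SWING_GENERATOR = "swing generator"
    NONE = "no source available"  # assumed fallback, not described in the text


def select_source(utility_ok: bool, primary_gen_ok: bool,
                  swing_gen_ok: bool) -> Source:
    """Pick the power source in the priority order the text describes."""
    if utility_ok:
        return Source.UTILITY
    if primary_gen_ok:
        return Source.PRIMARY_GENERATOR
    if swing_gen_ok:
        # The swing generator picks up the critical loads only when the
        # primary standby generator fails to come online.
        return Source.SWING_GENERATOR
    return Source.NONE


print(select_source(utility_ok=False, primary_gen_ok=False, swing_gen_ok=True).value)
```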
Figure 7. Modular Power Center – interior
Figure 8. Modular Power Center – exterior
Figure 9. Standby generator
The output of each UPS module is a 1600-A distribution board equipped with a maintenance bypass. A solenoid key release unit (SKRU) is provided to ensure that the UPS has transferred to bypass before the MBP breaker can be engaged to the output switchboard. This will always be a closed transition transfer so that critical load power will never be lost.
Critical power to the IT load will be provided by eight 300-kVA PDUs installed in an alternating A/B arrangement in the data hall to provide 208/120-V power to either overhead busway or remote power panels. Each PDU has a 300-kVA K-13 rated transformer and six 225-A breakers. Additionally, each PDU has six integrated revenue-grade power-monitoring meters.
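A quick capacity check on this PDU arrangement can be written in Python. The PDU count and ratings and the 1.2-MW data hall figure come from the text. The even four-and-four split between the A and B sides and the unity power factor used to compare kVA with kW are assumptions made for the sketch.

```python
import math

PDU_COUNT = 8
PDU_KVA = 300
PDU_SECONDARY_VOLTS = 208      # line-to-line voltage of the 208/120-V output
DATA_HALL_KW = 1_200           # 1.2-MW data hall
ASSUMED_POWER_FACTOR = 1.0     # assumption: IT load close to unity power factor

# Assumption: the alternating A/B arrangement splits the PDUs evenly.
pdus_per_side = PDU_COUNT // 2
side_kva = pdus_per_side * PDU_KVA
side_kw = side_kva * ASSUMED_POWER_FACTOR
one_side_carries_hall = side_kw >= DATA_HALL_KW

# Three-phase full-load current on the secondary of one PDU transformer.
pdu_full_load_amps = PDU_KVA * 1_000 / (math.sqrt(3) * PDU_SECONDARY_VOLTS)

print(f"Each side: {side_kva} kVA, carries full hall: {one_side_carries_hall}")
print(f"PDU full-load secondary current: {pdu_full_load_amps:.0f} A")
```

Under these assumptions each side totals 1,200 kVA, just enough to carry the full data hall alone, which is consistent with the 2N intent of the A/B design.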
Compass determined that the mechanical requirements for each data hall can be supported by four 120-ton rooftop units (RTUs). The four RTUs provide N+1 system redundancy. Power for each RTU will be available from either the A or B System through dedicated automatic transfer switches (see Figure 11). The units feature integrated controls allowing for efficient airside economization across all units. A proprietary rapid-restart feature ensures full air movement within 30 seconds after restoration of power. The system controls deliver uniform under-floor pressures.
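The N+1 claim can be checked with simple arithmetic, sketched here in Python. The unit count and size and the 1.2-MW data hall load come from the text. The conversion of one ton of refrigeration to about 3.517 kW is standard. Treating all IT power as heat and ignoring other heat gains is a simplifying assumption.

```python
KW_PER_TON = 3.517    # 1 ton of refrigeration = 12,000 BTU/h, about 3.517 kW
RTU_COUNT = 4
RTU_TONS = 120
IT_LOAD_KW = 1_200    # assumption: all 1.2 MW of IT power ends up as heat


def cooling_capacity_kw(units_running: int) -> float:
    """Combined cooling capacity of the running rooftop units, in kW."""
    return units_running * RTU_TONS * KW_PER_TON


installed_kw = cooling_capacity_kw(RTU_COUNT)
one_unit_out_kw = cooling_capacity_kw(RTU_COUNT - 1)
n_plus_1_ok = one_unit_out_kw >= IT_LOAD_KW

print(f"All units running: {installed_kw:,.0f} kW")
print(f"One unit out:      {one_unit_out_kw:,.0f} kW (meets load: {n_plus_1_ok})")
```

With one unit out of service, the remaining three provide 360 tons, or roughly 1,266 kW, which still covers the 1,200-kW load under these assumptions.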
Figure 10. Generator exterior
This single-line diagram (see Figure 12) represents the next step in the construction document’s development process: the design development (DD)-phase documents. The updated single-line features significant developments in the design process. The new features are:
The division of the electrical distribution system into two completely separate MPCs
The new single-line diagram now reflects the separate MPCs. The MPCs in this arrangement actually complement each other, providing 2N redundancy to the data hall.
A second generator was added to provide 2N generator redundancy. In the future, this second generator could be shared with additional data halls on the same campus. The redundancy of the generators would then be N+1.
Each MPC was outfitted with provisions to provide power for the entire mechanical system. Normally, power for one-half of the mechanical equipment is supplied by each MPC. Dedicated ATSs will automatically roll power to any active MPC if power is lost.
Figure 11. Power for each RTU is available from dedicated automatic transfer switches.
The new single-line shows how both the input and output switchboards have been integrated into a single switchgear lineup. The new lineup provided for a simplified interconnection scheme when it was installed in the MPC.
The use of dual MPCs facilitates the use of dual PLCs. The ability of the PLCs to stay in synchronous operation allows for a seamless transfer of control between either unit.
High levels of arc-flash energy in the main switchboards became a concern when the decision was made to have the utility provide the main 2500-kVA transformer and the transformer’s primary-side protection. Most utilities design their protection schemes to protect the utility’s own equipment. This typically doesn’t translate into limiting the arc-flash energy levels on the secondary side of these large transformers. Remoting the main utility breaker outside the MPC allows the arc-flash energy to be contained outside the remote switchboard. The arc-flash energy inside the MPC is now significantly reduced.
Figure 12. Design-Development (DD) single-line diagram (patent pending)
The deployment of the MPCs is just part of the modular concept that was developed for Compass Data Centers. Compass has done further development of the concept. This work and Tier III compliance with the Uptime Institute are important elements of its business plan to control costs and provide alignment of data center facilities with customers’ business requirements.
Steve Emert earned a BSEE degree from San Jose State University. He is currently a Registered Professional Engineer in more than 15 states, and is director of Mission-Critical Engineering at Rosendin Electric, the largest privately owned electrical contractor in the U.S. Mr. Emert began his career performing design and analysis of industrial, commercial and utility power systems, cogeneration plant design and coordination studies. In the mid-1990s, he worked at the Ames Research Center located at Moffett Field, CA, where he began a mission-critical-focused career working on the NAS supercomputer and many of the technically advanced NASA facilities.
Since joining Rosendin Electric, Mr. Emert has provided the engineering foundation for the company’s design-build mission-critical construction business. Today Rosendin Electric is the largest design-build electrical contractor for mission-critical facilities in the U.S. Mr. Emert is an active member of the IEEE P1584 IEEE Guidelines for Performing Arc-Flash Hazard Calculations Working Group. He is a member of the IEEE P1584 Configuration Task Group, currently engaged in the process of defining a new set of standards for the next issue of the P1584 Guideline publication. Mr. Emert has coauthored IEEE EMC Society presentations and written articles on electrical power systems for Electrical Contractor Magazine.
Dual-Corded Power and Fault Tolerance: Past, Present, and Future
By Kevin Heslin, in Operations
Details of dual-corded power change, but the theme remains the same.
Uptime Institute has worked with owners and operators of data centers since the early 1990s. At the time, data center owners used single-corded IT devices for even their most critical IT assets. Figure 1 shows a selection of the many potential sources of outage in the single path.
Early on, Site Uptime Network (now the Uptime Institute Network) founder Ken Brill recognized that outages due to faults or maintenance in the critical distribution system were a major problem in high availability computing. The Uptime Institute considers the critical distribution to include the power supply to IT devices from the UPS output to any PDU (power distribution unit), panel, or remote power panel (RPP), and the power distribution down to the rack via whip or bus duct.
Ahead of their time, Ken Brill and the Network created the Fault Tolerant Power Compliance Specification in 2000 to address the sources of outages, and updated it in 2002. Then, in 2004 Uptime Institute produced the paper Fault Tolerant Power Certification is Essential When Buying Products for High-Availability to directly address the issue. When this paper was written, four years after the Fault Tolerant Power Compliance Specification was first issued, critical distribution failures continued to cause the majority of data center outages.
“Fault-Tolerant Power Compliance Specification Version 2.0” lists the required functionality of Fault Tolerant dual-corded IT devices as defined by the Uptime Institute.
In the mid-1990s, the Uptime Institute led the data center industry in establishing Tiers as a way to define the performance characteristics of data centers. Each Tier builds upon the previous Tier, adding maintenance opportunity and Fault Tolerance. This progress culminated in the 2009 publication of the Tier Standard: Topology, which established Tiers as progressive maintenance opportunities and fault tolerance. The Tier Standard also included the requirement for dual-corded devices in Tier III and Tier IV objective data centers. Tier III data centers have dual power paths to provide Concurrent Maintainability of each and every component and path. Tier IV data centers require the same dual power paths for Concurrent Maintainability and add the ability to autonomously respond to failures.
Figure 1. Single-corded IT equipment
Present
The Fault Tolerant Power Compliance Specification, Version 2.0 is clearly relevant 12 years later. Originally called Fault Tolerant IT devices, today the commonly used vernacular is dual corded, and these devices have become the basis of high availability. The two terms Fault Tolerant IT devices and dual-corded IT devices are used interchangeably.
Tier III and Tier IV data center designs continue to be based upon the use of dual-corded architecture and require an active-active, dual-path distribution. The dual-corded concept is cemented into high-availability architecture in enterprise data centers, hyper-scale internet providers, and third-party data center spaces. Even the innovative Open Compute Project, sponsored by Facebook, which uses cutting-edge electrical architecture, utilizes dual-corded, Fault Tolerant IT devices.
Confoundingly, though, more than half of the more than 5,000 reported incidents in the Uptime Institute Network’s Abnormal Incident Reports (AIRs) database relate to the critical distribution system.
Dual-corded assets have increased maintenance opportunities for data center facilities management. Operations teams no longer need to wait for inconveniently timed maintenance windows to perform maintenance; instead they can maintain their facilities without IT impact during safe and regular hours. If there is an anomaly, the facilities and IT staff are on hand to address it.
Figure 2. Fault Tolerant and dual-corded
Uptime Institute Network members today recognize the benefits of dual-corded devices. COO Jason Weckworth of RagingWire recently said, “Dual-corded IT devices allow RagingWire the maintenance and operations flexibility that are consistent with our Concurrently Maintainable objective and provide that extra level of availability assurance below the UPS system where any problem may have consequential impacts to availability.”
Uptime Institute Network adoption of dual-corded devices has clearly improved, as indicated by the decline in the number of outages attributed to critical distribution. Properly applied, dual-corded devices are unaffected by the loss of a single source. Analysis of the AIRs database from 2007 to 2012 showed a reduction of more than 90% in critical distribution failures impacting the IT load.
Some data center owners or IT teams try to achieve dual power paths to IT equipment using large static transfer switches (STS) or STS power distribution units (PDU) (see Figure 3). However, the maintenance, replacement, or fault of an STS threatens the critical load from that device onward. One data center suffered a fault on an STS-PDU that affected one third of its IT equipment, and the loss of those systems rendered the entire data center unavailable. As noted in Figure 3, the single large STS solution does not meet Tier III or Tier IV criteria.
Figure 3. Static transfer switches
Uptime Institute recognizes that some heritage devices or legacy systems may end up in data centers, due to system migration challenges, mergers and acquisitions, consolidations, or historical clients. Data center infrastructure professionals need to question the justifications that lead to these conditions: If the system is so important, why is it not migrated to a high-availability, dual-corded IT asset?
The Tier Standard: Topology does include an accommodation for single-corded equipment as shown in Figure 4, depicting a local, rack-mounted transfer switch. The rack-mounted or point-of-use transfer switch pushes the risk as far down the critical distribution as possible, so that a fault affects the fewest devices.
Still, many in IT have not yet gotten the message and bring in more than the occasional one-off device. Single-corded devices are found in a larger percentage of installations than should be expected. Rob McClary, SVP and GM of FORTRUST, said, “FORTRUST utilizes infrastructure with dual-power paths, yet we estimate that greater than 50% of our clients continue to deploy at least one or more single-corded devices that do not utilize our power infrastructure and could impact their own availability. FORTRUST strongly supports continued education to our end user community to utilize all dual-corded IT assets for a true high-availability solution.” The loss of even one asset in a client’s deployment can render the platforms or applications of that deployment unavailable. The disconnect between data center infrastructure and IT hardware continues to exist.
Figure 4. Point-of-use transfer switch
Uptime Institute teams still find the following configurations that continue to plague data center operators:
The Future: A Call to Action
Complex systems such as data center infrastructure and the IT equipment and systems within them require comprehensive team approaches to management, which means breaking down the barriers between the organizations by integrating Facilities and IT staff, allowing the integrated organization to manage the data center and educating end users who don’t understand power infrastructure. If we can’t integrate, then educate.
If a merger of IT and facilities just won’t work in an enterprise data center, a regular meeting will at least enable teams to share knowledge and review change management and facilities maintenance actions. In addition, codifying change management and maintenance window procedures in terms IT can understand using an ITIL-based system will enable IT counterparts to start to understand the criticality of power distribution as they see the how and why of data center facility operations firsthand.
Colocation and third-party data centers understand that many client IT organizations have limited in-house staff, expertise, and familiarity with high-availability data centers. The need to educate these clients is clear. Several ways to educate include:
These actions will pay dividends with increased ease of maintenance and reduced client coordination.
Facilities teams also need to look within themselves. Improved monitoring and data center infrastructure management (DCIM) solutions provide windows into the infrastructure but do not replace good management. Anecdotal evidence has shown 1-10% of servers in a data center may be improperly corded, i.e., both cords are plugged into the A distribution.
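The improper cording described above can be caught with a simple audit of where each power cord is plugged in. The following is a minimal sketch, assuming a hypothetical inventory in which each server lists the distribution side ("A" or "B") of every cord; the `Server` record, the function names, and the sample data are illustrative assumptions, not part of any Uptime Institute tool or DCIM product.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Server:
    """Hypothetical inventory record: one entry in `cords` per power cord."""
    name: str
    cords: Tuple[str, ...]  # distribution side of each cord, e.g. ("A", "B")


def cording_issue(server: Server) -> Optional[str]:
    """Describe the cording problem, or return None if cords span two sides."""
    if len(server.cords) < 2:
        return "single-corded"
    if len(set(server.cords)) < 2:
        # Every cord lands on the same distribution side.
        return f"all cords on {server.cords[0]} distribution"
    return None


def audit(servers: List[Server]) -> Dict[str, str]:
    """Map each improperly corded server name to a description of the issue."""
    findings = {}
    for server in servers:
        issue = cording_issue(server)
        if issue is not None:
            findings[server.name] = issue
    return findings


# Illustrative data only.
inventory = [
    Server("web-01", ("A", "B")),
    Server("db-01", ("A", "A")),
    Server("legacy-01", ("B",)),
]
findings = audit(inventory)
for name, issue in findings.items():
    print(f"{name}: {issue}")
```

In this sample, db-01 and legacy-01 are flagged and web-01 passes. A check like this only reports what the inventory says; it is no substitute for physically verifying the cords.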
Management can address these challenges by
Summary
Millions of dollars are regularly invested in the dual-power path infrastructure in data centers for high availability because of business needs. This is clearly represented in the increasing cost of downtime, from lost business to ruined reputations or goodwill. It is essential that Facilities and IT, including the procurement and installation teams, work together to safeguard the investment, making sure dual-power path technology is utilized for business-critical applications. In addition, owners and operators of data centers must continue to educate customers who lack knowledge of or familiarity with data center practices, and must manage the data center to ensure that high-availability principles such as dual-corded architecture are fully utilized.
Fault-Tolerant Power Compliance Specification Version 2.0
Fault-tolerant power equipment refers to computer or communication hardware that is capable of receiving AC input from two different AC power sources. The objective is to maintain full equipment functionality when operating from A and B power sources or from A alone or from B alone. Equipment with an odd number of external power inputs (line cords) generally will not meet this requirement. It is desirable for equipment to have the least number of external power inputs while still meeting the requirement for receiving AC input from two different AC power sources. Products requiring more than two external power inputs risk being rejected by some sites. For equipment to qualify as truly fault-tolerant power compliant, it must meet all of the following criteria as initially installed, at ultimate capacity, and under any configuration or combination of options. (The designation of A and B power sources is used for clarity in the following descriptions.)
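The objective stated above (full functionality from A and B together, from A alone, or from B alone) can be illustrated with a short capacity check. This is a minimal sketch under stated assumptions: each external power input is modeled as a (source, capacity in watts) pair, and the equipment is treated as functional when the summed capacity of the live inputs covers the load. That simple model, the function name, and the wattages are hypothetical and do not reproduce the specification's actual criteria.

```python
from typing import List, Tuple


def survives_single_source_loss(load_watts: float,
                                inputs: List[Tuple[str, float]]) -> bool:
    """Return True if the load is carried on A+B, on A alone, and on B alone.

    `inputs` holds one (source, capacity_watts) pair per external power input.
    Assumption: live inputs share the load and their capacities simply add.
    """
    def capacity(live_sources) -> float:
        return sum(cap for source, cap in inputs if source in live_sources)

    scenarios = ({"A", "B"}, {"A"}, {"B"})
    return all(capacity(live) >= load_watts for live in scenarios)


# Two inputs, one per source, each able to carry the whole 600 W load.
balanced = survives_single_source_loss(600, [("A", 800), ("B", 800)])

# Three 400 W inputs split A, A, B: losing source A leaves only 400 W.
odd_inputs = survives_single_source_loss(600, [("A", 400), ("A", 400), ("B", 400)])

print(balanced, odd_inputs)
```

The second case shows why an odd number of line cords generally fails the requirement: the inputs cannot be split evenly, so one source ends up unable to carry the full load alone.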
Keith Klesner’s career in critical facilities spans 14 years and includes responsibilities ranging from planning, engineering, design and construction to start-up and ongoing operation of data centers and mission-critical facilities. In the role of Uptime Institute vice president of Engineering, Mr. Klesner has provided leadership and strategic direction to maintain the highest levels of availability for leading organizations around the world. Mr. Klesner performs strategic-level consulting engagements, Tier Certifications and industry outreach, in addition to instructing premier professional accreditation courses. Prior to joining the Uptime Institute, Mr. Klesner was responsible for the planning, design, construction, operation and maintenance of critical facilities for the U.S. government worldwide. His early career includes six years as a U.S. Air Force officer. He has a Bachelor of Science degree in Civil Engineering from the University of Colorado-Boulder and a Masters in Business Administration from the University of LaVerne. He maintains status as a professional engineer (PE) in Colorado and is a LEED-accredited professional.
Resolving the Data Center Staffing Shortage
In Executive, by Kevin Heslin
The availability of qualified candidates is just part of the problem with data center staffing; the industry also lacks training and clear career paths attractive to recruits.
The data center industry is experiencing a shortage of personnel. Uptime Institute Founder Ken Brill, as always, was among the first to note a trend, mentioning it more than 10 years ago. This trend reflects, in part, an aging global demographic but also increasing demand for data center personnel, which Uptime Institute Network members have described as chronic. The aging population threatens many industries but few more so than the data center industry, where the Network Abnormal Incident Reports (AIRs) database supports the relationship between downtime and inexperienced personnel.
As a result, at recent meetings North American Network members discussed how enterprises could successfully attract, train and retain staff. Network members blame the shortage on increasing market demand for data centers, inflexible organizational structures and an aging workforce retiring in greater numbers. This shortfall has already caused interruptions in service and reduced availability to mission-critical business applications. Some say that the shortage of skilled personnel has already created conditions that could lead to downtime. If not addressed in the near term, the problem could affect sections of the economy and company valuations.
Prior to 2000, data center infrastructure was fairly static; power and cooling demand generally grew following a linear curve. Enterprises could manage the growth in demand during this period fairly easily. Then technology advances and increased market adoption rates changed the demand for data center capacity so that it no longer followed a linear growth model. This trend continues, with one recent survey from TheInfoPro, a service of 451 Research, finding that 37% of data center operators had added data center space from July 2012 to June 2013. Similarly, the 2013 Uptime Institute Data Center Industry Survey found that 70% of 1,000 respondents had built new data center space or added space in the last five years. The survey reported even more data center construction projects in 2012 and 2011 (see Figures 1-3). The 2013 survey showing more detail about industry growth appears here.
Figure 1: New data center space reported in the last 12 months.
Drivers of the data center market are similar to those that drive overall internet growth and include increasing broadband penetration, e-commerce, video delivery, gaming, social networking, VOIP, cloud computing and web applications that make the internet and data networking a key enabler of business and consumer activity. More qualified personnel are required to respond to this accelerated growth.
The organizational models of many companies kept IT and Facilities or real-estate groups totally separate. IT did IT work while Facilities maintained the building, striped the parking lot and, oh, by the way, supported the UPS systems. The groups did not share goals, schedules, meetings or ideas. This organizational structure worked well until technology accentuated both the importance of middle ground between the two groups and the lack of it.
Figure 2. Demand significantly outpaced supply since 2010.
Efforts to bridge the gap between the two groups foundered because of conflicting work processes and multiple definitions for shared terms (such as mission critical maintenance). Simply put, the two groups spoke different languages, followed different leaders and pursued unreconciled goals. Companies that recognized the implications of the situation immediately reorganized. Some of these companies established mission critical teams and others moved Facilities and IT into the same department. This organizational challenge is by no means worked out and will continue well into the next decade.
Though no government agency or private enterprise keeps track of employment trends in data centers, U.S. Social Security Administration (SSA) statistics for the general population support anecdotes shared by Network members. According to the SSA, which is the agency that supervises the federal retirement benefits program in the U.S., 10,000 people per day apply for Social Security benefits, with this number expected to continue to 2025 as the baby boomers continue to retire, a phenomenon apparently first dubbed the “silver tsunami” by the Alliance for Aging Research in 2006. The populations of Europe and wide parts of Asia, including China and Japan, are also aging.
The direct experiences shared by Uptime Institute Network members suggest that the data center industry is highly vulnerable to, if not already diminished by, this larger societal trend. Network members estimate that 40% of the facilities engineering community is older than 50. One member of the Network expects that 50% of its staff will retire in the next two years. Network members remain concerned that many qualified candidates—science, technology, engineering and mathematics (STEM) students—are unaware of the employment opportunities offered by the industry and may not be attracted to the 24 x 7 nature of the work.
Tony Ulichnie, who presided over many of these discussions as Network Director, North America (before retiring in July of this year), described the cost of the wisdom and experience lost as this generation retires as “the price of silver,” referring to the loss to the organization when a longstanding and silver-haired data center operations specialist retires.
Military and civilian nuclear programs have proven to be a source of excellent candidates for data center facilities but yield only so many graduates. These “Navy Nukes” and seasoned facilities engineers command very competitive salaries and find themselves being courted by the industry.
Industry leaders say that the pipeline for replacement engineers has slowed to a dribble. Tactics such as poaching and counteroffers have become commonplace in the field.
Potential employers are also reluctant to risk hiring green (inexperienced) recruits. The practices of mission-critical maintenance require much discipline and patience, especially when dealing with IT schedules and meeting application availability requirements. Deliberate processes along with clear communications skills become necessary elements of an effective facilities organization. Identifying individuals with these capabilities is the trick: one Uptime Institute Network member found a key recruit working at a bakery. Another member puts HVAC students through an 18-month training program after hiring them from a local vocational school, with a 70% success rate.
Figure 3. Growth in new whitespace by size. Those reporting new space in the last five years in the Uptime Institute Survey (see p. 142 for the full 2013 Uptime Institute Data Center Survey) also reported that a wide variety of spaces had been built.
The hunt for unexplored candidate pools will increase in intensity as the demand for talent escalates in the next decade, and availability and reliability will also suffer unless the industry addresses the problem in a comprehensive manner. To mitigate the silver tsunami, some combination of industry, individual enterprises and academia must create effective training, development and apprenticeship programs to prepare replacements for retirees at all levels of responsibility. In particular, data center operators must develop ways to identify and recruit talented young individuals who possess the key attributes needed to succeed in a pressure-packed environment.
A Resource Pool
Veterans form an often overlooked or misunderstood talent pool for technical and high-precision jobs in many fields, including data centers. Statistics suggest that unemployment among veterans exceeds the national rate, which is counterintuitive to those who have served. With more than one million service members projected to leave the military between now and 2016 due to the drawdown in combat operations, according to U.S. Department of Defense estimates, unemployment among veterans could be a growing national problem in the U.S.
From the industry perspective, however, the national problem of unemployed veterans could prove an opportunity to “do well by doing good.” While experienced nuclear professionals represent a small pool of high-end and experienced talent, the pool of unemployed but trainable veterans represents a nearly inexhaustible source of talent suitable, with appropriate preparation, for all kinds of data center challenges.
Data centers compete with other industries for personnel, so now is the time to seize the opportunity because other industries are already moving to capitalize on this pool of talent. For example, Walmart has committed to hiring any veteran who was honorably discharged in the past 12 months, JP Morgan Chase has teamed with the U.S. Chamber of Commerce to hire over 100,000 veterans, and iHeartRadio’s Show Your Stripes program features many large and small enterprises, including some that own or operate data centers, committed to meeting the employment needs of veterans. For its own good, the data center industry must broadly participate in these efforts and drive to acquire and train candidates from this talent pool.
In North America, some data center staffs already include veterans who received technical training in the military and were able to land a job because they could quickly apply those skills to data centers. These technicians have proven the value of hiring veterans for data center work, not only for their relevant skills but also for their personal attributes of discipline and performance excellence.
The data center industry can take further advantage of the talent pool of veterans by establishing effective training and onboarding programs (mechanisms that enable new employees to acquire the necessary knowledge, skills and behaviors to become effective organizational members and insiders) for veterans who do not have the technical training (e.g., infantry, armor) that translates easily to the data center industry but have all the other important characteristics, including a proven ability to learn. Providing clear pathways for veterans of all backgrounds to enter the industry will ensure that it benefits from the growing talent pool and will be able to compete effectively with the other industries.
While technically trained veterans can enter the data center industry needing only mentoring and experience to become near-term replacements for retiring mid-level personnel, reaching out to a broader pool that requires technical training will create a generation of junior staff who can grow into mid-level positions and beyond with time and experience. The leadership, discipline and drive that veterans have will enable them to more quickly grasp and master the technical requirements of the job and adapt with ease to the rigor of data center operations.
Veterans’ Value to Industries
Military training and experience are unequaled in the breadth and depth of skills that they develop and the conditions in which these skills are vetted. Service members are trained to be intellectually, mentally and emotionally strong. They are then continuously tested in the most extreme conditions. Combat veterans have made life and death decisions, 24 hours a day for months without a break. They perform complex tasks, knowing that failure could result in harm to, or even the death of, themselves and others. This resilience and strength can be relied on in the civilian marketplace.
Basic training teaches the men and women of the military that the needs of the team are greater than their individual needs. They are taught to lead and to follow. They are taught to communicate up, down and across. They learn that they can achieve things they never thought possible because of these skills, and with a humble confidence can do the same in any work environment.
The military is in a constant state of learning, producing individuals with uncanny adaptive thinking and a capacity and passion for continuing to learn. This learning environment focuses not only on personal development but also on training and developing subordinates and peers. This experience acts as a force multiplier when a veteran who is used to knowing his job plus that of the entire team is added to the staff. The veteran is used to making sure that the team as a whole is performing well rather than focusing on the individual. This unwavering commitment to a greater cause becomes an ingrained ethos that can improve the work habits of the entire team.
The public commonly stereotypes military personnel as unable to think outside of a chain of command, but following a chain of command is only a facet of understanding how to perform in a team. Service members are also trained to be problem solvers. In this author’s experience, Iraq and Afghanistan were highly complex operations where overlooking the smallest detail could change outcomes. The military could not succeed at any mission if everyone waited for specific orders/instructions from their superiors before reacting to a situation. The mindset of a veteran is to focus on the mission: mission leaders impart a thorough understanding of the intent of a plan to troops, who then apply problem-solving skills to each situation in order to get the most positive outcome. They are trained to be consummate planners, engaging in a continuous process of assessment, planning, execution, feedback and fine tuning to ensure mission success.
Reliability is another key attribute that comes from military service. Veterans know that a mission that starts a minute late can be fatal. This precision translates to little things like punctuality and big things like driving projects to meet due dates and budgets. This level of dependability is a cornerstone of being a good teammate and leader.
Finally, an often overlooked value of military service is the scope of responsibility that veterans have had, which is often much larger than that of their non-veteran peers. It is not uncommon for servicemen and women in their twenties to have managed multi-million dollar budgets and hundreds of people. Their planning and management experience is gained in situations where bad decisions can cause troops to drive into an ambush that might also prevent supplies or reinforcements from reaching an under-provisioned unit.
Because military experience produces individuals who demonstrate strong leadership skills, reliability, dependability, integrity, problem-solving ability, proven ability to learn and a team-first attitude, veterans are the best source of talent available. Salute Inc. is an example of a company that helps bring veterans into the data center industry, and in less than six months has proven the value proposition.
Challenges
Recent Uptime Institute Network discussions identified the need for standard curriculum and job descriptions to help establish a pathway for veterans to more easily enter the industry, and Network members are forming a subcommittee to examine the issue. The subcommittee’s first priority is establishing a foundation of training for veterans whose military specialty did not include technical training. Training programs should allow each veteran to enter the data center industry at an appropriate level.
At the same time, the subcommittee will assess and recommend human resource (HR) policies to address a myriad of systemic issues that should be expected. For example, once trainees become qualified, how should companies adjust their salaries? Pay adjustments might exceed normal increases; however, the market value of these trainees has changed, and, unlike other entry-level trainees, veterans have proven high retention rates. The subcommittee has already defined several entry-level positions:
Resources for Veterans
The Center for a New American Security (CNAS) conducted in-depth interviews with 69 companies and found that more than 80% named one or two negative perceptions about veterans. The two most common are skill translation and concerns about post-traumatic stress (PTS).
Many organizations have looked at the issue of skill translation. Some of them have developed online resources to help veterans translate their experiences into meaningful and descriptive civilian terms (www.careerinfonet.org/moc/). They also provide tools to help veterans articulate their value in terms that civilian organizations will understand.
Organizations that access these resources will also gain a better understanding of how a veteran’s training and skills suit the data center environment. In addition, the military has established comprehensive transition programs that all service members go through when re-entering the civilian job market, including resume preparation and interview planning. The combination of government-sponsored programs and resources, a veteran’s own initiative and a civilian organization’s desire to understand can offset concerns about skill translation.
PTS is an issue that cannot be ignored. It is one of the more misunderstood problems in America, even among some in the medical community. It is important to understand more about PTS before assuming this problem affects only veterans. It is estimated that 8% of all Americans suffer from PTS, which is about 25 million people. The number of returning military who have been diagnosed with PTS is 300,000, which is about 30% of Iraq/Afghanistan combat veterans, yet only a very small proportion of the total PTS sufferers in the U.S. The mass media—where most people learn about PTS—often describes PTS as a military issue because the military approach to PTS is very visible: there is a formal process for identifying it and also enormous resources focused on helping veterans cope with it. Given that there are 80 times more non-veterans suffering from PTS, the focus for any HR organization should be ensuring that a company’s practices (from the interview to employee assistance programs and retention) are effectively addressing the issue in general.
Conclusion
The data center industry needs the discipline, leadership and flexibility of veterans to serve as a foundation on which it can build the next generation of data center operators. The Uptime Institute Network is establishing a subcommittee and has called for volunteers to help define the fundamentals required for an effective onboarding, training and development program in the industry. This group will address everything from job descriptions to clearly defined career paths for both entry-level trainees and experienced technicians transitioning from the military. For further information or if you are interested in contributing to this effort, please contact Rob Costa, Network Director, North America ([email protected]).
Resources
The following list provides a good starting point for understanding the many resources available for veterans and employers to connect.
Lee Kirby is Uptime Institute senior vice president, CTO. In his role he is responsible for serving Uptime Institute clients throughout the life cycle of the data center from design through operations. Mr. Kirby’s experience includes senior executive positions at Skanska, Lee Technologies and Exodus Communications. Prior to joining the Uptime Institute, he was CEO and founder of Salute Inc. He has more than 30 years of experience in all aspects of information systems, strategic business development, finance, planning, human resources and administration both in the private and public sectors. Mr. Kirby has successfully led several technology startups and turn-arounds as well as built and run world-class global operations. In addition to an MBA from University of Washington and further studies at Henley School of Business in London and Stanford University, Mr. Kirby holds professional certifications in management and security (ITIL v3 Expert, Lean Six Sigma, CCO). In addition to his many years as a successful technology industry leader, he masterfully balanced a successful military career over 36 years (Ret. Colonel) and continues to serve as an advisor to many veteran support organizations.
Mr. Kirby also has extensive experience working cooperatively with leading organizations across many industries, including Morgan Stanley, Citibank, Digital Realty, Microsoft, Cisco and BP.
Resolving Conflicts between Data Center Owners and Designers
In Design, Executive, by Kevin Heslin
Improving communication between the enterprise and design engineers during a capital project
For over 10 years, Uptime Institute has sought to improve the relationship between data center design engineers and data center owners. Yet, it is clear that issues remain.
Uptime Institute’s uniquely unbiased position—it does not design, construct, commission, operate, or provision equipment to data centers—affords direct insight into data center capital projects throughout the world. Uptime Institute develops this insight through relationships with Network members in North America, Latin America, EMEA, and Asia Pacific; the Accredited Tier Designer (ATD) community; and the owner/operators of 392 Tier Certified, high-performance data centers in 56 countries.
Despite increasingly sophisticated analyses and tools available to the industry, Uptime Institute continues to find that when an enterprise data center owner’s underlying assumptions at the outset of a capital project are not attuned to its business needs for performance and capacity, problematic operations issues can plague the data center for its entire life.
The most extreme cases can result in disrupted service life of the new data center. Disrupted service life may be classified in three broad categories.
1. Limited flexibility
2. Insufficient capacity
3. Excess capacity
Any data center capital project is subject to complex challenges. Overtime and over-budget considerations, such as inclement weather, delayed equipment delivery, overwhelmed local resources, slow-moving permitting and approval bureaucracies, lack of availability of public utilities (power, water, gas), merger or acquisition, or other shift in corporate strategy, may be outside of the direct control of the enterprise.
But other causes of schedule and budget overruns are avoidable and can be dealt with effectively during the pre-design phase. Unfortunately, many of these issues become clear to the enterprise only after the project management, design, construction, and commissioning teams have completed their obligations.
Planning and justifying major data center projects has been a longstanding topic of research and education for Uptime Institute. Nevertheless, the global scale of planning shortfalls and project communication issues only became clear due to insight gained through the rapid expansion of Tier Certifications.
Even before a Tier Certification contract is signed, Uptime Institute requests a project profile, composed of key characteristics including size, capacity, density, phasing, and Tier objective(s). This information helps Uptime Institute determine the level of effort required for Tier Certification, based on similar projects. Additionally, this allows Uptime Institute to provide upfront counsel on common shortfalls and items of concern based upon our experience of similar projects.
Furthermore, a project team may update or amend the project profile to maintain cost controls. Yet, Uptime Institute noted significant variances in these updated profiles in terms of density, capacity, and Tier. It is acknowledged that an owner may decide to amend the size of a data center, or to adjust phasing, to limit initial capital costs or otherwise better respond to business needs. But a project that moves up and down the Tier levels or varies dramatically in density from one profile to another indicates management and communication issues.
These issues result in project delays, work stoppages, or cancellations. And if the project is completed, it can be expected to lack in terms of capacity (either too much or too little), meeting performance requirements (in design or facility), and flexibility.
Typically, a Tier Certification inquiry occurs after a business need has been established for a data center project and a data center design engineer has been contracted. Unstable Certification profiles show that a project may have prematurely been moved into the design phase, with cost, schedule, and credibility consequences for a number of parties, notably the owner and the designer.
Addressing the Communications Gap
Beginning in May 2013, Uptime Institute engaged the industry to address this management and communication issue on a broader basis. Anecdotally, both sides, via the Network or ATD courses, had voiced concern that one side had insufficient insight into the scope or responsibility of the other, or held unrealistic expectations of it. For example, a design engineer would typically be contracted to produce an executable design but soon find out that the owner was not ready to make the decisions that would allow the design process to begin. On the other hand, owners found that the design engineers lacked commitment to innovation and would deliver a solution that was similar to a previous project rather than vetted against established performance and operations requirements. This initiative was entitled Owners vs Designers (OvD) to call attention to a tension evident between these two responsibilities.
The Uptime Institute’s approach was to meet with the designers and owners separately to gather feedback and recommendations and to then reconcile the feedback and recommendations in a publication.
OvD began with the ATD community during a special session at Uptime Institute Symposium in May 2013. Participants were predominantly senior design engineers with experience in the U.S., Canada, Brasil, Mexico, Kenya, Australia, Saudi Arabia, Lebanon, Germany, Oman, and Russia. This initial session verified the need for more attention to this issue.
The design engineers’ overwhelming guidance to owners could be summarized as “know what you want.” The following issues were raised specifically and repeatedly:
1. Lack of credible IT forecasting
2. Lack of detailed Facilities Technical Requirements
3. Misalignment of available budget and performance expectations
Data Center Owners Respond
Following the initial meeting with the data center design community, Uptime Institute brought the discussion to data center owners and operators in the Uptime Institute Network throughout 2013, at the North America Network Meeting in Seattle, WA, APAC Network Meeting in Shenzhen, China, and at the Fall Network Meeting in Scottsdale, AZ.
Uptime Institute solicited input from the owners and also presented the designers’ perspective to the Network members. The problems the engineering community identified resonated with the Operations professionals. However, the owners also identified multiple problems encountered on the design side of a capital project.
In the owners’ words, “designers, do your job.”
According to the owners, the design community is responsible for drawing out the owners’ requirements, providing multiple options, and identifying and explaining potential costs. Common problems in the owners’ experience include:
The data center owner community agreed with the designers’ perspective and took responsibility for those shortcomings. But the owners pointed out that many design firms promote cookie-cutter solutions and are reluctant to stray from their preferred topologies and equipment-based solutions. One participant shared that he received data center design documents for a project with the name of the design firm’s previous customer still on the paperwork.
Recommendations
Throughout this process, Uptime Institute worked to collect and synthesize the feedback and potential solutions to chronic communications problems between these two constituencies. The following best practices will improve management and communication throughout the project planning and development, with lasting positive effect on the operations lifecycle.
Pre-Design Phase
All communities that participated in OvD discussions understood the need to unite stakeholders throughout the project and the importance of reviewing documentation and tracking changes throughout. Owners and designers also agreed on the need to invest time and budget for pre-design, specifically including documenting the IT Capacity Plan with near-term, mid-term, and long-term scenarios.
The owners and designers also agreed on the importance of building Facilities Technical Requirements that are responsive to the IT Capacity Plan and include essential project parameters:
Workshop Computer Room Master Plans with IT, Facilities, Corporate Real Estate, Security, and other stakeholders and then incorporate them into the Facilities Technical Requirements. After preparing the Facilities Technical Requirements, invite key stakeholders to ratify the document. This recommendation does not prohibit changes later but provides a basis of understanding and launch point for the project. Following ratification, brief the executive (or board). This and subsequent briefings can provide the appropriate forum for communicating the costs associated with various design alternatives, but also how they deliver business value.
RFP and Hiring
Provide as much detail about project requirements as possible in the RFP, including an excerpt of Facilities Technical Requirements in the RFP itself and technology and operations preferences and requirements. This allows respondents to the RFP to begin to understand the project and respond with most relevant experience. Also, given that many RFPs compel some level of at-risk design work, a detailed RFP will best guide this qualification period and facilitate the choice of the right design firm. Inclusion of details in the RFP does not prohibit the design from changing during its development and implementation.
Negotiate in person as much as possible. Owners regretted not spending more time with the design firm(s) before a formal engagement as misalignments only became evident once it was too late. Also, multiple owners remarked with pride that they walked out of a negotiation at least once. This demonstrated their own commitment to their projects and set a tone of consequences and accountability for poor or insufficient communication.
Assess and score the culture of the design firms for alignment with the owner’s preferred mode and tone of operations. Owners commented that they preferred a small and local design firm, which may require some additional investment in training, but they were confident they would get more careful and close attention in return.
Notify the design engineer from the outset of specific requirements and indicators of success to pre-empt receiving a generic or reconstituted design.
Should the owner engage an outside consultant, avoid setting an aggressive tone for consultants. Owners may want to augment their internal team with a trusted advisor resource. Yet, this role can inadvertently result in the consultant assuming the role of guard dog, rather than focusing on collaboration and facilitation.
Design and Subsequent Phases
Owners and designers agreed that a design effort was a management challenge rather than a technical one. An active and engaged owner yields a more responsive and operable design. Those owners that viewed it as outsourcing the production/fabrication effort of a data center struggled with the resulting solution. The following recommendations will reduce surprises during or after the project.
Key components of this system include the following:
As the recommendations were compiled from the OvD initiative, many of the recommendations resonated with Uptime Institute guidance of years past. Over 10 years ago, Ken Brill and Pitt Turner held seminars on project governance that touched upon a number of the items herein. It is an old problem, but just as relevant.
Key Quotes from the Design Community
Owners want to design to Tier III, but they want to pay for Tier II and get Tier IV performance.
Owners want technologies or designs that don’t work in their region or budget.
The IT people are not at the table, and engineers don’t have adequate opportunity to understand their requirements. Designers are often trying to meet the demands of an absent, remote, or shielded IT client who lives in a state of constant crisis.
Once the project is defined, it’s in the hands of the general contractor and commercial real estate group. Intermediaries may not have data center experience, and engineers aren’t in direct contract with the end user anymore.
Industry Perspectives
Chris Crosby, CEO, Compass Datacenters
There are some days when I’d like to throw architects and engineers off the roof. They don’t read their own documents, for example, putting in boilerplate that has nothing to do with the current project in a spec. They can also believe that they know better than the owner—making assumptions and changes independent of what you have clearly told them on paper that you want. It drives me nuts because as an owner you may not catch it until it has cost you a hundred grand, since it just gets slipsheeted into some detail or RFI response with no communication back to you.
Dennis R. Julian, PE, ATD, Principal, Integrated Design Group, Inc.
Data center designs are detail oriented. Missing a relatively minor item (e.g., a control circuit) could result in shutting down the IT equipment. When schedules are compressed, it is more difficult and requires more experienced design personnel to flesh out the details, do the analysis, and provide options with recommendations required for a successful solution.
There are pressures to stick with proven designs when:
Good design saves capital and operating costs over the life of the facility and vastly dwarfs any savings in design fees. Selecting designers based on qualifications and not fees, similar to the Brooks Act regulating the selection of engineers by the U.S. Federal government (Public Law 92-582 92nd Congress, H.R. 12807. October 27, 1972) and allowing reasonable schedules will allow the discussion about the client’s goals and needs and the time to review alternatives for the most cost-effective solution based on total cost of ownership.
Julian Kudritzki joined the Uptime Institute in 2004 and currently serves as Chief Operating Officer. He is responsible for the global proliferation of Uptime Institute Standards. He has supported the founding of Uptime Institute offices in numerous regions, including Brasil, Russia, and North Asia. He has collaborated on the development of numerous Uptime Institute publications, education programs, and unique initiatives such as Server Roundup and FORCSS. He is based in Seattle, WA.
Matt Stansberry is director of Content and Publications for the Uptime Institute and also serves as program director for the Uptime Institute Symposium, an annual spring event that brings together 1,500 stakeholders in enterprise IT, data center facilities, and corporate real estate to deal with the critical issues surrounding enterprise computing. He was formerly Editorial Director for Tech Target’s Data Center and Virtualization media group, and was managing editor of Today’s Facility Manager magazine. He has reported on the convergence of IT and Facilities for more than a decade.
Decommissioning as a Discipline: Server Roundup Winners Share Success
In Executive, by Kevin Heslin
How did these six enterprises find and eliminate so much waste?
Comatose IT equipment, servers long abandoned by application owners and users but still racked and running, are hiding in plain sight within even the most sophisticated IT organizations. Obsolete or unused servers represent a double threat in terms of energy waste—squandering power at the plug, but also wasting data center facility power and capacity.
Uptime Institute Research circa 2009 states decommissioning one rack unit (1U) of servers can result in a savings of US$500 per year in energy costs, an additional US$500 in operating system licenses and US$1,500 in hardware maintenance costs. But reaping those rewards is no easy task.
According to Uptime Institute’s estimates based on industry experience, around 20% of servers in data centers today are obsolete, outdated or unused. That percentage may in fact be conservative.
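To put the per-unit figures and the 20% estimate side by side, here is a minimal sketch of the arithmetic. The per-rack-unit savings are the circa-2009 Uptime Institute numbers quoted above; the fleet size of 1,000 rack units is purely a hypothetical input, and the function name is illustrative only. Real savings will vary with server size, licensing terms, and maintenance contracts.

```python
# Annual savings per decommissioned rack unit (1U), as quoted above (US$).
ENERGY_PER_U = 500
OS_LICENSE_PER_U = 500
HW_MAINTENANCE_PER_U = 1_500

SAVINGS_PER_U = ENERGY_PER_U + OS_LICENSE_PER_U + HW_MAINTENANCE_PER_U


def annual_savings_usd(rack_units_removed: int) -> int:
    """Estimated yearly savings for a number of decommissioned rack units."""
    return rack_units_removed * SAVINGS_PER_U


# Hypothetical fleet: 1,000 rack units of servers (an assumption, not source data).
HYPOTHETICAL_FLEET_U = 1_000
COMATOSE_SHARE = 0.20  # the "around 20%" estimate quoted above

comatose_units = round(HYPOTHETICAL_FLEET_U * COMATOSE_SHARE)
estimate = annual_savings_usd(comatose_units)

print(f"Savings per rack unit: US${SAVINGS_PER_U:,} per year")
print(f"{comatose_units} comatose units: US${estimate:,} per year")
```

Under these assumptions, each rack unit removed is worth US$2,500 per year, so even a modest fleet can justify a dedicated decommissioning effort.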
According to one media report, Lexis Nexis found 50% of its servers were comatose in one of its audit samples. When the insurance firm Sun Life took back management from an outsourced data center engagement firm in 2011, it found 40% of its servers were doing absolutely nothing.
As early as 2006, Uptime Institute Founder Ken Brill identified comatose servers as one of the biggest opportunities for companies to improve overall IT energy efficiency. While Mr. Brill advocated for industry action on this issue, he often cautioned, “Nobody gets promoted for going around in the data center and unplugging servers.” Mr. Brill meant that data center professionals had no incentive to remove comatose machines and that IT executives lacked insight into the impact idle IT equipment was having on the cost structures of their organizations, as their departments do not pay the data center power bill.
The corporate disconnect between IT and Facilities Operations continues to challenge the data center industry. Data center managers need to overcome that organizational barrier and get executive level buy-in in order to implement an effective server decommissioning program.
Winners of Server Roundup at the Uptime Institute Symposium 2013
This is why Uptime Institute invited companies around the globe to help address and solve the problem of comatose servers by participating in the Server Roundup, an initiative to promote IT and Facilities integration and improve data center energy efficiency.
The annual Uptime Institute Server Roundup contest was launched in October 2011 to raise awareness about the removal and recycling of comatose and obsolete IT equipment in an effort to reduce data center energy use. In 2012, Uptime Institute named AOL and NBC Universal inaugural Server Roundup champions. AOL had removed nearly 10,000 obsolete servers, and NBC Universal culled 1,090 comatose machines, representing 29% of its overall IT footprint. The following year’s results were even more impressive.
2013 Winners and Finalists
WINNER: AOL won in back-to-back years for its overall tally of servers removed. The global Web services company decommissioned 8,253 servers in calendar year 2012. This produced (gross) total savings of almost US$3 million from reduced utility and maintenance costs and asset resale/scrap. Environmental benefits included reducing carbon emissions by more than 16,000 tons, according to AOL.
WINNER: Barclays, a global financial organization, removed 5,515 obsolete servers in 2012, gaining power savings of around 3 megawatts, US$3.4 million annualized savings for power, and a further US$800K savings in hardware maintenance.
FINALIST: TD Bank removed 513 servers in 2012. The team from this Canadian financial firm removed 2,941 units in the 5 years they’ve been working to remove obsolete machines from the raised floor. Although the TD Bank annual server count does not approach the impressive numbers put up by AOL, the organization makes up for it in volume of waste that it diverts from local and municipal waste sites. All the equipment sent through the E-Waste recycler is salvaged within a 110-mile radius from TD Bank’s primary data centers. Nothing is shipped overseas for processing.
Server Roundup trophy belt buckle
FINALIST: McKesson pulled 586 servers in 2012, reducing data center power usage by 931.7 kilowatts and saving US$734,550.
FINALIST: Sun Life Financial removed 387 servers in 2012, which resulted in 32 kilowatts of power savings across three data centers and financial savings of US$8,800 per month.
Since the contest’s launch two years ago, Server Roundup participants have decommissioned and recycled 30,000 units of obsolete IT equipment.
In the sidebars, Server Roundup winner Paul Nally and finalist Rocco Alonzi discuss the challenges and benefits of a server-decommissioning program and detail their strategies for success.
Takeaways From Last Year’s Winners
During the 2013 Uptime Institute Symposium, last year’s winners provided the following advice:
Paul Nally, Director at Barclays
“It has been said that the greenest data center is the one that’s never built. That is the main reason we have our server decommissioning program at Barclays. We are looking to shrink our data center footprint and benefit from the savings that this affords us, while allowing ourselves to massively expand our overall compute capability. When obsolete servers are removed in the thousands, it creates the capacity that we need to bring the next generation of systems in.
We save in space; we save in power. It helps us meet our carbon targets. When we eliminate or virtualize a server, we also save on network, SAN, and software costs. A server that may have cost US$100,000 seven years ago, took up half a rack of space, and required a couple of kW to run is absolutely crushed in compute performance by a modern blade costing US$5,000. But the benefits extend throughout the overall organization. A focus on removing these obsolete systems simplifies the environment from a network and systems administration perspective. Applications teams benefit from a more stable system that is easily maintained and integrated into contemporary management frameworks. We end up in a cleaner, safer, cheaper place with the capacity in hand that we need to continue to grow our business. There is real work, and some risk, in getting this job done, but the benefits are simply too many to ignore.”
Rocco Alonzi, AVP Data Center Governance at Sun Life Financial
“The removal of an under-utilized server sounds much easier than it really is. The thought of turning off a server and removing it from the raised floor can be overwhelming even if you are 100% certain that it is no longer required. Think about the process for a moment. As the server connections (electrical power, network, SAN storage) are removed and the server physically pulled out of a production cabinet, the hard drive data must be permanently destroyed and finally the server needs to be returned to the vendor or disposed of properly. The logical aspect includes another entire separate process so that in the end it is much easier on everyone to leave it powered on.
This is the message I communicated to the Leadership team followed by a solution and a promise. The solution included a dedicated resource (Contractor), asset database, and cooperation from the Server, Storage, and Network support teams. The contractor walked the raised floor performing an asset database book to raised-floor audit. And, yes, this did take some time, three months to be exact. This rich information was used to identify the servers that were not in the database but physically on the raised floor. We also challenged the support groups to associate their service offering with corresponding hardware infrastructure. These two exercises led to approximately 400 servers being switched off.
The promise was that Data Centre Operations team would do all the work after the hardware device was switched off. This included working with the support groups to reclaim IP addresses, SAN storage ports, and electrical power cords. We also provided the Financial department with detailed hardware information reclaiming cost savings that was passed on to the business unit. Finally, a process was put into place to remove the physical server from the raised floor, destroy the data, and properly dispose of the hardware.
The message: Raise awareness to the Leadership team of the issue and take a dedicated approach of decommissioning hardware infrastructure. It is well worth the effort.”
Matt Stansberry is director of Content and Publications for the Uptime Institute and also serves as program director for the Uptime Institute Symposium, an annual spring event that brings together 1,500 stakeholders in enterprise IT, data center facilities, and corporate real estate to deal with the critical issues surrounding enterprise computing. He was formerly Editorial Director for Tech Target’s Data Center and Virtualization media group, and was managing editor of Today’s Facility Manager magazine. He has reported on the convergence of IT and Facilities for over a decade.
Data Center Cost Myths: SCALE
In Executive, by Kevin Heslin
What happens when economies of scale is a false promise?
By Chris Crosby
Chris Crosby is a recognized visionary and leader in the data center space, Founder and CEO of Compass Datacenters. Mr. Crosby has more than 20 years of technology experience and 10 years of real estate and investment experience. Previously, he served as a senior executive and founding member of Digital Realty Trust. Mr. Crosby was Senior Vice President of Corporate Development for Digital Realty Trust, responsible for growth initiatives including establishing the company’s presence in Asia. Mr. Crosby received a B.S. degree in Computer Sciences from the University of Texas at Austin.
For many of us, Economics 101 was not a highlight of our academic experience. However, most of us picked up enough jargon to have an air of competence when conversing with our business compadres. Does “supply and demand” ring a bell?
Another favorite term that we don’t hesitate to use is “economies of scale.” It sounds professorial and is easy for everyone, even those of us who slept our way through the course, to understand. Technically the term means: The cost advantages that enterprises obtain due to size, throughput, or scale of operation, with cost per unit of output generally decreasing with increasing scale as fixed costs are spread out over more units of output.
The metrics used in our world are usually expressed as cost/kilowatt (kW) of IT capacity and cost/square foot (ft²) of real estate. Some folks note all costs as cost/kW. Others simply talk about the data center fit out in cost/kW and leave the land and building (cost/ft²) out of the equation entirely. In both cases, however, economy of scale is the assumed catalyst that drives cost/ft² and/or cost/kW ever lower. Hence the birth of data centers so large that they have their own atmospheric fields.
This model is used by providers of multi-tenant data centers (MTDC), vendors of pre-fabricated modular units, and many enterprises building their own facilities. Although the belief that building at scale is the most cost-efficient data center development method appears logical on the surface, it does, in fact, rely on a fundamental requirement: boatloads of cash to burn.
It’s First Cost, Not Just TCO
In data center economics, no concept has garnered more attention, and less understanding, than Total Cost of Ownership (TCO). Entering the term “data center total cost of ownership” into Google returns more than 11.5 million results, so obviously people have given this a lot of thought. Fortunately for folks who write white papers, nobody has broken the code. To a large degree, the problem is the nature of the components that comprise the TCO calculus. Because of the longitudinal elements that are part of the equation, energy costs over time for example, the perceived benefits of design decisions sometimes hide the fact that they are not worth the cost of the initial investment (first cost) required to produce them. For example, we commonly find this to be the case in the quest of many operators and providers to achieve lower PUE. While certainly admirable, incomplete economic analysis can mask the impact of a poor investment. In other words, this is like my wife bragging about the money she saved by buying the new dining room set because it was on sale even though we really liked the one we already had.
In a paper posted on the Compass website, Trading TCO for PUE?, Romonet, a leading provider of data center analytical software, illustrated the effect of failing to properly examine the impact of first cost on a long-term investment. Due to New Mexico’s favorable atmospheric conditions, Compass chose it as the location to examine the value of using an adiabatic cooling system in addition to airside economization as the cooling method for a hypothetical location. This is a fairly common industry approach to free cooling. New Mexico’s climate is hot and dry and offers substantial free cooling benefits in the summer and winter as demonstrated by Figure 1.
Figure 1. Free cooling means that the compressors are off, which, on the surface, means “free” as they are not drawing electricity.
In fact, through the use of an adiabatic system, the site would benefit from more than four times the free-cooling hours of a site without one. Naturally, the initial reaction to this cursory data would be “get that cooling guy in here and give him a PO so I have a really cool case study to present at the next Uptime Institute Symposium.” And, if we looked at the perceived cost savings over a ten-year period, we’d be feeling even better about our US$500,000 investment in that adiabatic system since it appears that it saved us over US$440,000 in operating expenses.
Unfortunately, appearances can be deceiving, and any analysis of this type needs to include a few more things: discounted future savings, otherwise known as net present value (NPV), and the cost of not only the system maintenance but also the water used, and its treatment, over the 10-year period. When these factors are taken into account, it turns out that our US$500,000 investment in an adiabatic cooling system actually resulted in a negative payback of US$430,000! That’s a high price to pay for a tenth of a point in your PUE.
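For readers who want to see how discounting and recurring costs erode a headline savings figure, the following is a rough sketch of an NPV calculation. Only the US$500,000 first cost and the US$440,000 of nominal ten-year savings come from the text. The even spread of savings across years, the 8% discount rate, and the US$30,000 annual water-and-maintenance figure are hypothetical placeholders, so the output is not expected to reproduce the Romonet result.

```python
def npv(rate, cash_flows):
    """Net present value; cash_flows[0] occurs today, later entries at each year end."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


FIRST_COST = 500_000       # adiabatic system investment (from the text)
NOMINAL_SAVINGS = 440_000  # ten-year operating savings (from the text)
YEARS = 10

# Hypothetical inputs, chosen only to illustrate the mechanics.
DISCOUNT_RATE = 0.08
ANNUAL_WATER_AND_MAINTENANCE = 30_000

# Assumption: savings accrue evenly, one tenth per year.
annual_savings = NOMINAL_SAVINGS / YEARS

naive_flows = [-FIRST_COST] + [annual_savings] * YEARS
full_flows = [-FIRST_COST] + [annual_savings - ANNUAL_WATER_AND_MAINTENANCE] * YEARS

undiscounted = npv(0.0, naive_flows)
discounted = npv(DISCOUNT_RATE, naive_flows)
discounted_with_costs = npv(DISCOUNT_RATE, full_flows)

print(f"Undiscounted, savings only:     {undiscounted:,.0f}")
print(f"Discounted, savings only:       {discounted:,.0f}")
print(f"Discounted, water + maintenance: {discounted_with_costs:,.0f}")
```

Each step makes the picture worse: discounting shrinks the value of savings that arrive in later years, and recurring water and maintenance costs reduce every year's net savings before discounting is even applied.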
The point is that the failure to account for the long-term impact of an initial decision can permanently preclude companies from exercising alternative business, not just data center, strategies.
The Myth of Scale (a.k.a., The First Cost Trap)
Perhaps there is no better example of how the myth of scale morphs into the first cost trap than when a company elects to build out the entire shell of its data center upfront, even though its initial space requirements are only a fraction of the ultimate capacity. This is typically done using the justification that the company will eventually “grow into it,” and it is necessary to build a big building because of the benefit of economy of scale. It’s important to note that this is also a strategy used by providers of MTDCs, and it doesn’t work any better for them.
The average-powered core and shell (defined here as the land, four walls, and roof along with a transformer and common areas for security, loading dock, restrooms, corridors, etc.) of a data center facility typically ranges from US$20 million to upwards of US$100 million. The standard rationale for this “upfront” mode of data center construction is that this is not the “expensive” portion of the build and will be necessary in the long run. In other words, the belief is that it is logical to build the facility in its entirety because construction is cheap on a per-square-foot basis. Under this scenario, the cost savings are gained through the purchase and use of materials in a high enough volume that price reductions can be extracted from providers. The problem is that when the first data center pod or module is added the costs go up in additional US$10 million increments. In other words, in the best case it costs US$30 million minimum just to turn on the first server! Even modular options require that first generator and that first heat rejection and piping. First cost per kW is two to four times the supposed “end” point cost per kilowatt. Enterprises can pay two or three times more.
Option Value
This volume mode of cost efficiency has long been viewed as an irrefutable truth within the industry. Fortunately, or unfortunately, depending on how you look at things, irrefutable truths oftentimes prove very refutable. In this method of data center construction, what is gained is often less important than what has been lost.
Option value is the associated monetary value of the prospective economic alternatives (options) that a company has in making decisions. In the example, the company gained a shell facility that it believes, based on its current analysis, will satisfy both its existing and future data center requirements. However, the inexpensive (as compared to the fit out of the data center) cost of US$100-US$300/ft² is still real money (US$20-US$100 million depending on the size and hardening of the building). The building and the land it sits on are now dedicated to the purpose of housing the company’s data center, which means that it will employ today’s architecture for the data center of the future. If the grand plan does not unfold as expected, this is kind of like going bust after you’ve gone all in during a poker game.
Figure 2. Estimated hours of free cooling at a hypothetical site in New Mexico.
Now that we have established what the business has gained through its decision to build out the data center shell, we should examine what it has lost. In making the decision to build in this way, the business has chosen to forgo any other use. By building the entire shell first, it has lost any future value of an appreciating asset: the land used for the facility. It cannot be used to support any other corporate endeavors, such as disaster recovery offices, and it cannot be sold for its appreciated value. While maybe not foreseeable, this decision can become doubly problematic if the site never reaches capacity and some usable portion of the building/land is permanently rendered useless. It will be a 39-year depreciating rent payment that delivers zero return on assets. Suddenly, the economy of scale is never realized, so the initial cost per kilowatt is the end-point cost.
For example, let’s assume a US$3-million piece of land and US$17 million to build a building of 125,000 ft² that supports six pods at 1,100 kW each. At US$9,000 per kW for the first data center, we have an all-in of US$30 million for 1,100 kW, or over US$27,000 per kW. It’s not until we build all six pods that we get to the economy of scale that produces an all-in of US$12,000/kW. In other words, there is no economy of scale unless you commit to invest almost US$80M! This is the best case, assuming the builder is an MTDC.
It is logical for corporate financial executives to ask whether this is the most efficient way to allocate capital. The company has also forfeited any alternative uses for the incremental capital that was invested to manifest this all-at-once approach. Obviously once invested, this capital cannot be repurposed and remains tied to an underutilized depreciating asset.
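The arithmetic behind this example is easy to check with a short script. The land, shell, pod size, and fit-out price are the figures given in the example; the one added assumption is that every later pod costs the same US$9,000 per kW as the first, which the text does not state explicitly. The function name is illustrative only.

```python
LAND_COST = 3_000_000    # US$, from the example
SHELL_COST = 17_000_000  # US$, 125,000 sq ft building
POD_KW = 1_100           # IT capacity per pod
FITOUT_PER_KW = 9_000    # US$ per kW; assumed constant for every pod
MAX_PODS = 6


def all_in_cost(pods_built: int) -> tuple[int, float]:
    """Return (total US$ invested, US$ per kW) once `pods_built` pods are fitted out."""
    capacity_kw = pods_built * POD_KW
    total = LAND_COST + SHELL_COST + capacity_kw * FITOUT_PER_KW
    return total, total / capacity_kw


for pods in range(1, MAX_PODS + 1):
    total, per_kw = all_in_cost(pods)
    print(f"{pods} pod(s): US${total / 1e6:5.1f}M total, US${per_kw:,.0f} per kW")
```

With one pod built, the script gives US$29.9 million, or roughly US$27,200 per kW. With all six, it gives US$79.4 million, or roughly US$12,000 per kW. The per-kW figure falls only as fast as the pods are actually built, which is the crux of the first cost trap.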
Figure 3. Savings assumes energy cost of US$0.058/kWh, 0.25 cooling overhead, and 1,250 kW of IT load.
An Incremental Approach
The best way to address the shortcomings associated with the myth of scale is to construct data center capacity incrementally. This approach entails building a facility in discrete units that, as part of the base architecture, enable additional capacity to be added when it is required. For a number of reasons, until recently, this approach has not been a practical reality for businesses desiring this type of solution.
For organizations that elect to build their own data centers, the incremental approach described above is difficult to implement due to resource limitations. Lacking a viable prototype design (the essential element for incremental implementation), each project effectively begins from scratch and is typically focused on near-term requirements. Thus, the ultimate design methodology reflects the build it all at once approach as it is perceived to limit the drain on corporate resources to a one-time-only requirement. The typical end result of these projects is an extended design and construction period (18-36 months on average), which sacrifices the efficiency of capital allocation and option value for a flawed definition of expediency.
For purveyors of MTDC facilities, incremental expansion via standardized discrete units is precluded by their business models. Exemplifying the definition of economies of scale found in our old Economics 101 textbooks, these organizations reduce their costs by leveraging their size to negotiate volume discounts with their suppliers. These economies then translate into the need to build large facilities designed to support multiple customers. Thus, the cost efficiencies of MTDC providers drive a business model that requires large first-cost investments in data center facilities, with the core and shell built all at once and data center pods completed based on customer demand. Since MTDC efficiencies can be achieved only by offsetting high first-cost investments through leasing capacity to multiple tenants, or multiple pods to a single tenant, providers are forced to locate these sites in markets with a high concentration of their target customers. Thus, the majority of MTDC facilities are found within a handful of markets (e.g., Northern Virginia, New York/New Jersey, and the San Francisco Bay area) where a critical mass of prospective customers exists. This is the predominant reason why MTDC providers have not been able to respond to customers requiring data centers in other locations. As a result, the MTDC model demands a high degree of sacrifice from customers. Not only must they relinquish the ability to locate a new data center wherever they need it, they must also pre-lease additional space to ensure that it will be available if they grow over time, since even the largest MTDC facilities have finite data center capacity.
Many industry experts view prefabricated data centers as a solution to this incremental requirement. In a sense, they are correct. These offerings are designed to make the addition of capacity a function of adding one or more units. Unfortunately, many users of prefabricated data centers experience problems that stem from how these products are incorporated into designs. Unless the customer is using them in a parking lot, more permanent configurations require the construction of a physical building to house them. The result is an oversized facility that will be grown into over time but that suffers from the same first cost and absence of option value as the typical customer-constructed or MTDC facility. In other words, if I have to spend US$20 million on day one for the shell and core, how am I saving by building in 300-kW increments instead of 1-MW increments like the traditional guys?
The Purpose-Built Facility
To implement a data center strategy that avoids both exorbitant first costs and the loss of option value, the facility itself must be designed for just such a purpose. Rather than using size as the method for cost reduction, the data center would meet this requirement through the use of a prototype, replicable design. In effect, the data center becomes a product, with the cost focus at the system level rather than on parts and pieces.
To many, the term “standard” is a pejorative that denotes a less than optimal configuration. However, as “productization” has shown with the likes of the Boeing 737, the Honda Accord, or the Dell PC, when you include the most commonly desired features at a price below that of non-standard offerings, you eliminate or minimize the concern. For example, features such as Uptime Institute Tier III Design and Construction Certification, LEED certification, a hardened shell, and ergonomic features like a move/add/change optimized design would be included in the standard offering. This limits the scope of customer personalization to the data hall, branding experience, security and management systems, and jurisdictional requirements. This is analogous to car models that incorporate the most commonly desired features as standard, while enabling customers to “customize” their selection in areas such as car color, wheels, and interior finish.
The resulting solution provides the customer with a dedicated facility that includes the most essential features and can be delivered within a short timeframe (under six months from initial groundbreaking), without requiring the customer to spend US$20-US$100 million on a shell while simultaneously relinquishing the option value of the remaining land. Each unit would also be designed so that additional units can easily be built and joined to it, enabling expansion to follow the customer’s timeframe and financial considerations rather than constraints imposed by the facility itself or by a provider.
Summary
Due to their historically limited alternatives, many businesses have been forced to justify the inefficiency of their data center implementations based on the myth of scale. Although solutions like prefabricated facilities have attempted to offer prospective users an incremental approach that negates the impact of high first costs and the loss of alternatives (option value), ultimately they carry the same upfront physical and financial requirements as MTDC alternatives. The alternative to these approaches is the productization of the data center, in which a standard offering that includes all of the most commonly requested customer features provides end users with a cost-effective option that can be grown incrementally in response to their individual corporate needs.
Industrialization, à la Henry Ford, ensures that each component is purchased at scale to reduce the cost per component. Productization shatters this theory by focusing on the system level, not the part or component level. It is through productization that the paradox of high-quality, low-cost, quickly delivered data centers becomes a reality.