Fuel System Design and Reliability

Topology and operational sustainability considerations must drive fuel system design.

When Superstorm Sandy hit the Northeastern U.S., some data center owners and operators discovered that they had not treated their fuel systems as part of a mission-critical system. In one highly publicized story, a data center operator in New York City was forced to use a “fuel bucket brigade,” moving fuel by hand up 18 stories for 48 hours. The brigade’s heroics kept the data center online, but thoughtful design of the fuel topology would have led to better operational sustainability. Superstorm Sandy taught many other New York and New Jersey data center owners that fuel system planning is essential if critical power is to be maintained in an emergency.

Uptime Institute has observed that designers may not be well versed in fuel solutions for data centers. Possible explanations include the delegation of the fuel system design to a fuel or engine generator supplier and a focus on other systems. As a critical system, however, the fuel system requires the same consideration from data center designers and owners as the topology of any other data center subsystem, along with integration of Operational Sustainability principles to ensure high availability.

Tier Topology of Fuel Systems
Recall that the Tier Standard is based on the resiliency of the lowest-rated data center subsystem. The topology of the fuel system includes the fuel distribution paths and every fuel component, all of which must correspond to the Tier objective of the data center. For Tier III and Tier IV data centers, the fuel supply path, and potentially the return lines, must be either Concurrently Maintainable or Fault Tolerant. The fuel system components (pumps, manual valves, automated valves, control panels, bulk tanks and day tanks) must all meet either the Concurrently Maintainable or Fault Tolerant objective.

Determining whether a fuel system is Concurrently Maintainable or Fault Tolerant means evaluating it the same way as a chilled-water schematic or an electrical one-line drawing: start from a given point, often the bulk storage tanks, and methodically work through the paths and components to the fuel treatment supplies and day tanks. Removing each component and path in turn reveals points in the system where “N,” the needed fuel flow and/or storage, is unavailable.
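
This remove-and-recheck exercise can be captured in a simple checklist. The sketch below is a minimal illustration of the logic in Python; the pump names, capacities and the required flow "N" are hypothetical, and a real evaluation would also walk the valves, piping paths, control power and return lines.

```python
# Minimal sketch of the remove-one-component check described above.
# Component names, capacities and the required flow "N" are hypothetical;
# a real evaluation would also model valves, piping paths and return lines.

REQUIRED_FLOW_GPM = 20  # "N": the fuel flow the engine generators need (assumed)

# Parallel fuel pumps in an N+1 arrangement (capacities in gal/min)
fuel_pumps = {"pump_A": 20, "pump_B": 20}

def remaining_capacity(components, removed):
    """Capacity still available after one component is taken out of service."""
    return sum(cap for name, cap in components.items() if name != removed)

for pump in fuel_pumps:
    available = remaining_capacity(fuel_pumps, pump)
    verdict = "OK" if available >= REQUIRED_FLOW_GPM else "not Concurrently Maintainable"
    print(f"Maintain {pump}: {available} gal/min still available -> {verdict}")
```

Any component or path whose removal drops the available flow or storage below "N" is a gap in the Concurrently Maintainable (or, for failure scenarios, Fault Tolerant) objective.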

Tier Topology Fuel Requirement
The Uptime Institute Owners Advisory Committee defined 12 hours of fuel storage as the minimum starting point for Tier-defined data centers. The Tier Standard: Topology requires this minimum for all Tiers: 12 hours of runtime at “N” load while meeting the facility’s stated topology objective. Put another way, the fuel storage must be adequate to support the data center design load on engine generators for 12 hours while meeting the Concurrently Maintainable or Fault Tolerant objective. Exceeding the 12-hour minimum is an Operational Sustainability issue that requires careful analysis of the risks to the data center energy supply.
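
As a rough illustration of the 12-hour minimum, the arithmetic below sizes usable storage for a hypothetical "N" load; the specific fuel consumption figure is an assumption and should be taken from the engine generator manufacturer's fuel curve at the actual site load.

```python
# Rough sizing sketch for the 12-hour minimum at "N" load.
# The consumption rate is an assumed placeholder; use the engine generator
# manufacturer's fuel-consumption data for the actual design.

design_load_kw = 2000      # "N" load the engine generators must carry (assumed)
gallons_per_kwh = 0.07     # assumed diesel consumption at this load point
autonomy_hours = 12        # Tier Standard: Topology minimum

minimum_usable_storage_gal = design_load_kw * gallons_per_kwh * autonomy_hours
print(f"Minimum usable fuel storage: {minimum_usable_storage_gal:,.0f} gal")
# -> 1,680 gal for this hypothetical load; this must remain available with
#    the redundant tank(s) or path(s) out of service.
```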

Fuel Storage Topology
Many owners quote the amount of fuel on hand in terms of total capacity. Just as with engine generators, chillers and other capacity components, the true fuel storage capacity is evaluated by removing the redundant component(s). A common example is an owner’s claim of 48 hours of fuel when the configuration is two 24-hour tanks, which provides only 24 hours of Concurrently Maintainable fuel. The baseline is the amount of Concurrently Maintainable or Fault Tolerant fuel that is always available, not the best-case raw storage figure.
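
The tank-count accounting behind that example can be made explicit, as in the sketch below; the tank sizes are hypothetical, and the point is simply that the usable figure is what remains after the redundant tank is removed.

```python
# Sketch of Concurrently Maintainable fuel accounting: usable runtime is what
# remains with the redundant tank(s) out of service, not the raw total.

tank_runtime_hours = [24, 24]   # two bulk tanks, each sized for 24 hours at "N" load
redundant_tanks = 1             # tanks that must be removable for maintenance

raw_total = sum(tank_runtime_hours)
# Worst case: assume the largest tank(s) are the ones taken out of service.
usable = sum(sorted(tank_runtime_hours)[:len(tank_runtime_hours) - redundant_tanks])

print(f"Raw storage: {raw_total} h; Concurrently Maintainable storage: {usable} h")
# -> Raw storage: 48 h; Concurrently Maintainable storage: 24 h
```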

Common fuel system design discrepancies in Tier III and Tier IV systems include:

  • The fuel system valving and piping do not match the Concurrently Maintainable objective. The most common error is a single valve between two pumps or storage tanks in an N+1 configuration; that valve cannot be maintained without shutting down more than the redundant number of components.
  • The power to fuel pumps and fuel control panels is not Concurrently Maintainable or Fault Tolerant. A single panel or single ATS feeding all fuel pumps is a common example. The entire power path to fuel pumps and controls must be Concurrently Maintainable or Fault Tolerant.
  • Depending on the topology of the fuel system (constant flow, etc.), the return fuel path may be critical. If the fuel supply is pressurized or part of a continuously functioning return system from the day tanks to the bulk tanks, the return path must meet the Concurrently Maintainable or Fault Tolerant objective.
  • The fuel control system is not Concurrently Maintainable. Either the fuel control panel must be replicated or a manual method of fuel operations must be available during maintenance or replacement of the panel. Any manual method must be part of the designed and installed solution.

Other considerations include fuel cooling, the fault tolerance of fuel systems and fuel type.

The internal supply pumps of most engine generators continuously oversupply diesel to the engine generator and return hot diesel fuel to the day tank. During extended operations or high ambient temperatures, the day tank can overheat to the point that it reduces engine generator power or even causes thermal shutdown of the unit. There are two primary ways to prevent fuel overheating:

  • Recirculating day tank fuel to the underground bulk tank. This system utilizes the thermal inertia of the underground system with little or no bulk tank temperature change. Of course, the return path is a critical element of the fuel system and must meet all the same Tier criteria.
  • Specifying a fuel cooler as part of the engine generator package. Fuel coolers function well as long as their net power impacts on the engine generator are considered and the fuel cooler is confirmed to function at extreme ambient temperatures.

Tier IV requires autonomous response to failure, which encompasses the ability to detect a fault, isolate the fault and sustain operations with alternate systems. The simple test of Fault Tolerance for fuel is that “N,” or the needed amount of fuel for the data center, is available after any failure. Some failure scenarios of fuel systems include loss of power, leaks, failure of components or fire. Leak detection of fuel systems is crucial to Fault Tolerance. Fault detection may be achieved using a variety of methods, but the chosen method must be integrated with the system response to isolate the fuel leak. The isolation must limit fuel loss such that the ability of alternate systems to deliver “N” fuel is not impacted.

Tier IV also requires the compartmentalization of fuel systems. Compartmentalizing complementary systems within separate spaces limits the damage of a single catastrophic event to a single fuel system. This includes fuel lines, fuel controllers, fuel pumps and all types of fuel tanks.

Owners considering propane or natural gas to feed either gas turbines or more recent innovations such as fuel cells face additional fuel considerations.

Fuel must have completely redundant supply lines in order to be Concurrently Maintainable.

Tier topology views gas supply lines as a utility similar to water or electricity. Thus, as a utility, a minimum of 12 hours of gas is required to be stored on-site in a manner consistent with the site’s Tier design objective. The result is one or more large gas tanks for storage. While Tiers allow this solution, not many jurisdictions have shown tolerance for large gas storage under local or national code. And, from an Operational Sustainability perspective, multiple large gas tanks could be an explosion hazard. Explosion risk may be mitigated but certainly is a major design consideration.

Fuel System Design Impacts to Operational Sustainability
Operational Sustainability defines the behaviors and risks beyond Tier Topology that impact the ability of a data center to meet its business objectives over the long term. Of the three elements of Operational Sustainability, the data center design team influences the building characteristics element the most. An improperly designed fuel system can have serious negative Operational Sustainability impacts.

Sustainability Impacts
Recent weather and disaster events have shown that circumstances more extreme than previously anticipated are being realized, and they must be planned for. Properly designed fuel storage can mitigate the operational risks related to long-term outages and fuel availability caused by disaster. Some questions to ask include:

  • What is the owner’s business continuity objective and corporate policy?
  • How will the data center operator manage the bulk fuel on hand?
  • What is the site’s risk of a natural disaster?
  • What is the site’s risk of a man-made disaster?
  • What fuel service can be provided in a local disaster, and can local suppliers meet fuel requirements?
  • What fuel service can be provided in a regional emergency, and within what time period?

The answers to questions about fuel storage requirements can come from dialogue with the data center owner, who must clearly define the site’s autonomy objective. The design team can then apply the Tier topology along with Operational Sustainability risk mitigation to create the fuel storage and delivery system.

Operational Sustainability is about mitigating risks through defining specific behaviors. Human error is one of the biggest risks to operations. To mitigate the human factor, the design of infrastructure, including fuel, must incorporate simplicity of design and operations. In essence, the design must make the fuel system easier to operate and maintain. Fuel distribution isolation valves and control systems must be accessible. Underground vaults for fuel systems should be routinely accessible and provide adequate space for maintenance activities. Fuel tank sight glasses located in readable locations can provide visual clues of infrastructure status, confirm fuel levels and give operators direct feedback before the start of maintenance activities.

Fuel treatment is not required by Tier topology, but it is certainly an important Operational Sustainability site characteristic. Diesel fuel requires regular management to remove water, algal growth and sediment to ensure that engine generators are available to provide power. Permanently installed fuel treatment systems encourage regular fuel maintenance and improve long-term availability. Local fuel vendors can help address local concerns based on diesel fuel and climate. Finally, the Institute encourages designers to integrate fuel treatment systems carefully to ensure that the Tier objective is not compromised.

The Institute has seen a growing use of engine generators in exterior enclosures with fuel systems located outside. Physical protection of these assets is an Operational Sustainability concern. With the level of investment made in data center infrastructure, designers must prevent weather and incidental damage to the fuel and engine generator systems. Protecting fuel lines, tanks and engine generators from errant operators or vehicles is paramount; bollards or permanent structures are good solutions. Fuel system design must consider Operational Sustainability to achieve high availability. Simplicity of design and consideration of operations can be the most effective long-term investment in uptime. Analysis of Operational Sustainability risks with appropriate mitigations will ensure uptime in an emergency.

Summary
Fuel systems must match the topology objective set for the entire facility. The chart outlines the benefits, drawbacks and Operational Sustainability considerations of different fuel solutions.

fuel solutions chart

Keith Klesner’s career in critical facilities spans 14 years and includes responsibilities ranging from planning, engineering, design and construction to start-up and ongoing operation of data centers and mission-critical facilities. In the role of Uptime Institute vice president of Engineering, Mr. Klesner has provided leadership and strategic direction to maintain the highest levels of availability for leading organizations around the world. Mr. Klesner performs strategic-level consulting engagements, Tier Certifications and industry outreach—in addition to instructing premier professional accreditation courses. Prior to joining the Uptime Institute, Mr. Klesner was responsible for the planning, design, construction, operation and maintenance of critical facilities for the U.S. government worldwide. His early career includes six years as a U.S. Air Force officer. He has a Bachelor of Science degree in Civil Engineering from the University of Colorado-Boulder and a Masters in Business Administration from the University of LaVerne. He maintains status as a professional engineer (PE) in Colorado and is a LEED-accredited professional.

2014 Uptime Institute Data Center Survey Results

Uptime Institute Director of Content and Publications, Matt Stansberry, delivers the opening keynote from Uptime Institute Symposium 2014: Empowering the Data Center Professional. This is Uptime Institute’s fourth annual survey; it looks at data center budgets, energy efficiency metrics, and adoption trends in colocation and cloud computing.

Diesel exhaust after treatment for data centers

Cleaning up your act: A guide to exhaust standards

By Lamont Fortune, PE, United Health Group

Are your diesel generators used strictly for emergency or non-emergency purposes? Your answer to this question will tell you just how clean your diesel exhaust has to be by 2015 (and as early as 2014) as far as the U.S. Environmental Protection Agency (EPA) is concerned. Of course, you may have already run into this issue.

The EPA is thinking about your diesel exhaust because of the air pollution regulations spelled out in the EPA Standards of Performance for Stationary Compression Ignition Internal Combustion Engines (CI ICE) from 40CFR (Code of Federal Regulations), Part 60, Subpart IIII. The EPA air pollution regulations are very extensive, including a completely separate standard for spark ignition engines. The air quality requirements for engine generator exhaust emissions regulated by this document were first declared effective on September 11, 2006. The regulations have undergone some revisions and updates since then due to industry objections. The last final revision was released on January 14, 2013. This six-plus-year period has seen active industry involvement in an attempt to better balance the overall environmental benefit with regulation adoption and enforcement.

CI ICE in the Data Center Arena
EPA 40CFR, Part 60, Subpart IIII covers CI ICE powered by liquid, gaseous, and solid fuels and built from 2007 forward. This particular article, however, focuses strictly on the compliance needs of diesel-fueled engines. EPA 40CFR, Part 60, Subpart IIII sets up four primary engine characteristics to determine what emission source restrictions must be observed and when. These characteristics are as follows:
1. Classification
2. Model year manufactured (not shipped or delivered)
3. Cylinder displacement
4. Horsepower

These characteristics are part of an elaborate, layered matrix of tiered compliance categories designed to guide the industry’s implementation of these air quality standards from day one in 2007 through January 1, 2015.

In turn, these standards included a targeted and continual incremental reduction of allowed emissions of four pollutant groups during engine operation. The lowest and final permitted discharge levels take effect on January 1, 2015.

Now in its seventh year of implementation, the current (2013) compliance program has evolved so that Tier 4 Interim and Tier 4 Final are the only remaining categories. The four targeted pollutant groups are:
1. Nitrogen oxides
2. Particulate matter
3. Carbon monoxide
4. Non-methane hydrocarbons

Both engine manufacturers and end users are responsible for working together to achieve operational compliance. Manufacturers have to build increasingly clean burning engines; end users have to purchase the correct EPA Tier-certified engine to install (either EPA Tier 4 Interim or EPA Tier 4 Final at this point) and the appropriate engine exhaust after treatment (see Figure 1).

Emissions control is a joint effort of the end user and the engine generator vendor.

Since no single after treatment effectively reduces all four pollutant groups, a combination of methods is needed to meet mandated targets. Consequently, the only way EPA Tier 4 Final compliance can be verified is by field testing the generator and its after-treatment system. As a side note, all of these requirements for diesel engine generators are applied in conjunction with the current use of ultra-low sulfur diesel fuel (less than or equal to 15-ppm sulfur content) as part of the EPA’s multi-pronged approach to achieving clean air.

So now that you know about the EPA’s emission limitations for CI ICE, how do they apply to your situation? Let’s go back to the opening question to start to answer that: Are your diesel generators used strictly for emergency or non-emergency purposes?

Emergency or Non-Emergency?
If your generators are used solely for emergency purposes, 40CFR, Part 60, Subpart IIII does not apply. In this case, emergency means that your generators are operated for emergency backup power only when there is an electrical utility outage. Therefore, generators installed for fire and life-safety purposes also fall within the emergency designation. In addition, generators used to power critical networks and equipment can qualify as long as they are operated only during times of electric utility loss. And fire and flood pump applications also fall squarely in the emergency status camp, whether powered by engine generator or direct engine drive.

If, however, the chosen operational strategy is to run CI ICE for any other purpose when utility power is available (such as storm avoidance), then non-emergency status applies, and the application must comply with 40CFR, Part 60, Subpart IIII. This status also applies if there is a standing financial incentive or arrangement (peak shaving, demand response, rate curtailment, continuous base loading, etc.) for a generator operator to supply power into an electric utility grid for the grid owner’s benefit.

Regardless of whether the engine generators involved are permanently installed or rental units for temporary use (i.e., non-road mobile), 40 CFR, Part 60, Subpart IIII still applies.

EPA regulations recognize the practical need to maintain the reliability of the backup system, including emergency utility requests to disconnect from the grid to sustain grid stability and avoid brownouts, blackouts and utility islanding. The EPA does allow emergency equipment to be run for system maintenance, exercising and testing purposes, up to 100 hours per year. Non-emergency equipment subject to 40CFR, Part 60, Subpart IIII regulations can typically be run up to 50 hours per year for these purposes. Allowable runtimes, though, may be less than these numbers, depending on the local air district authority having jurisdiction (AHJ).
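
The classification and runtime allowances described above can be summarized in a simple decision sketch. This is only an illustration of the logic as presented in this article, not legal guidance; classification and hour limits must be confirmed against the regulation and the local air district AHJ.

```python
# Simplified sketch of the emergency/non-emergency logic described above.
# Illustration only; confirm status and hour limits with 40CFR, Part 60,
# Subpart IIII and the local air district AHJ.

def classify_engine(runs_only_during_utility_outage: bool,
                    supplies_grid_for_financial_benefit: bool) -> str:
    """Return the status suggested by the criteria discussed in this article."""
    if runs_only_during_utility_outage and not supplies_grid_for_financial_benefit:
        return "emergency"
    return "non-emergency"

# Typical annual maintenance/exercising allowances cited above (hours per year)
MAINTENANCE_HOURS_LIMIT = {"emergency": 100, "non-emergency": 50}

status = classify_engine(runs_only_during_utility_outage=True,
                         supplies_grid_for_financial_benefit=False)
print(status, MAINTENANCE_HOURS_LIMIT[status])   # -> emergency 100
```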

Figure 2 shows the EPA Tier 4 compliance timeline for various sizes of non-emergency engine generators. Please also understand that once an after-treatment system has been successfully installed and commissioned, detailed system operating records need to be maintained from that point forward. Besides being useful from a practical operating perspective, these records document ongoing compliance with EPA’s air quality standards.

EPA compliance timeline.

After-Treatment Systems
After-treatment systems basically come in either proprietary prepackaged forms or as custom-located and interconnected components. Either way, the prevailing design approach is that each engine has its own dedicated after-treatment system. These use a reactant-based treatment in combination with stationary catalyst elements. The reactant is typically a 32.5-wt% urea solution (known generically in the U.S. as diesel exhaust fluid or DEF and in Europe as AdBlue or AUS 32), a 40-wt% solution, or an ammonia solution. While ammonia can be used as a reactant, urea is preferred because it is much less toxic, easier to transport and store and simpler to permit.

Platinum, palladium, copper zeolite, iron zeolite, and vanadia are common catalysts. The contact geometries for the catalyst elements are often of a honeycomb type but vary by system manufacturer. Catalyst elements are not intended to be thought of or act as particulate filters.

Each of the major engine manufacturers is currently promoting its own after-treatment system to accompany and mate with its respective engine products, apparently in an effort to keep after-treatment packages in as compact an arrangement as possible to minimize the floor and/or cubic space required to house them. Nonetheless, be prepared to need a significant amount of room in addition to that normally used by the engine generator. Serviceability and maintainability also become important design issues to consider when evaluating the various options for after-treatment systems.

Figure 3 is a diagram of a fairly generic urea after-treatment system offered by one major supplier of such systems. Note that the preliminary treatment equipment is shown without the external insulation that will be required for safety and operational efficiency reasons for a final installation.

After-treatment system diagram

Exhaust gases from an EPA Tier 4-certified engine are directed through a pre-oxidation catalyst section to a reactant mixing section and then finally on to a selective catalytic reduction (SCR) unit before being discharged to the outside atmosphere. The precious metal catalyst materials in the pre-oxidation unit significantly reduce carbon monoxide, non-methane hydrocarbons and other hazardous air pollutants from the exhaust, including up to 50% of the particulate matter. Urea then gets injected into and thoroughly mixed with the pre-oxidation output gases after the temperature of these gases rises to the 300-400°C (572-752°F) range, before subsequent introduction into the SCR unit. This urea activation temperature is needed to actively regenerate the SCR catalyst as well as incinerate any accumulated ash. Passive and slower rate regeneration, however, occurs before this activation temperature is reached. (Note: Some SCR systems activate urea injection at about 200°C [392°F].) And lastly, the SCR unit is where the bulk (up to 90%) of the remaining nitrogen oxides is removed before final discharge to the atmosphere.
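
As a back-of-the-envelope illustration of those staged reductions, the sketch below applies the removal fractions quoted above to an arbitrary inlet stream. The inlet values are hypothetical, and actual removal efficiencies depend on the catalyst package, tuning and exhaust temperature.

```python
# Back-of-the-envelope sketch of the staged reductions described above.
# Inlet values are arbitrary units; removal fractions are the "up to" figures
# quoted in the text and will vary with the actual catalyst package and tuning.

inlet = {"particulate_matter": 100.0, "nitrogen_oxides": 100.0}

after_pre_oxidation = {
    "particulate_matter": inlet["particulate_matter"] * (1 - 0.50),  # up to 50% PM removed
    "nitrogen_oxides": inlet["nitrogen_oxides"],                     # NOx handled downstream
}

after_scr = {
    "particulate_matter": after_pre_oxidation["particulate_matter"],
    "nitrogen_oxides": after_pre_oxidation["nitrogen_oxides"] * (1 - 0.90),  # up to 90% NOx removed
}

print(after_scr)  # -> {'particulate_matter': 50.0, 'nitrogen_oxides': 10.0}
```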

The remaining components shown (reactant storage tank, booster pump, air compressor, dosing box and SCR controller) all support the correct operation of the overall after-treatment system. They are the dynamic parts of the system, whereas the pre-oxidation and SCR units are the static components. The SCR controller serves as the brains of the system, directing the urea injection rates necessary to achieve the required discharge emission targets.

One side benefit of the after-treatment system is that it provides acoustical attenuation of the engine-generator noise that has normally been handled by an exhaust silencer. In fact, an acoustical analysis could show that a traditional silencer is not needed because the after-treatment system will fulfill the necessary sound reduction needs.

An After-Treatment System Example
The next set of figures (See Figures 4-10) depicts an actual installation of the generic after-treatment system previously described for a 2.725-megawatt (MW) generator driven by a 4005-HP prime-rated engine. They also convey some sense of the amount of additional cubic space required to accommodate such a system.

The surfaces of the catalyst modules that the exhaust gases flow across in this section (See Figure 4) should be visually checked for ash deposits every few years depending on the intensity of engine use during that time. These catalyst modules can be removed, swept clean of ash (using appropriate safety gear and associated practices), re-oriented to expose clean surfaces and then reinstalled. There is no need to replace the catalyst modules themselves.

Oxidation catalyst housing

Although the mixing section itself (See Figure 5) is basically a static unit with no moving parts, it is specifically engineered from a dimensional perspective to promote maximal mixing of injected urea with the engine exhaust gas stream given the engine’s discharge flow parameters. The combination of flow turbulence produced by the static mixers and atomized urea injection into the exhaust flow are integral to producing thorough mixing.

Illustration shows the urea injection and mixing section.

Figure 6 gives a more detailed view of the urea-dosing box and how it interfaces with the point of atomized urea spray injection. And although the various equipment components of a generic after-treatment system can be custom located, there can be limits on the maximum distances between components. Figure 6 shows one such limit: the maximum distance between the dosing box and the urea injection point.

The dosing box is a critical part of the mixing process.

The urea dose rate adjustment occurs in the dosing box, based on control feedback from the SCR controller. The compressed air feed is responsible for producing a very fine atomization of the liquid (or aqueous) urea flow for optimal injection and mixing efficiency. Both flows must work together for effective performance.

The heat of the mixed exhaust stream then evaporates the urea content and causes it to decompose into ammonia before this exhaust stream enters the SCR housing.

The urea supply line in this case is heat traced and insulated to ensure reliable flow operation during engine runtimes for the coldest expected winter days. Maintaining urea quality and fluidity within its recommended temperature range is critical to keep the injection spray nozzle from clogging with crystalline deposits and, thus, preventing adequate after treatment.

The final key after-treatment step occurs next within the SCR housing unit (see Figure 7), which is where the nitrogen oxide content of the exhaust stream gets knocked down to levels that comply with the EPA’s Tier 4 mandate. The ammonia formed from the evaporated urea, intimately mixed with the exhaust, chemically reacts over the catalyst elements within the SCR housing to convert the remaining nitrogen oxides into mostly nitrogen (N2), CO2 and water for discharge to the atmosphere. Some minute amounts of ammonia (called ammonia slip) may also sneak through the SCR housing and exit with the intended discharge products. Proper tuning of the urea dosing to the actual installed system performance conditions, however, will minimize ammonia slip issues.

As a reminder, the SCR housing shown in Figure 7 is for a 2.725-MW system. The installed unit with final insulation for this size system requires a 7-foot clearance from the underside of the supporting structure. Because of the width of the housing, a fire sprinkler head is required underneath. This head and accompanying sprinkler piping in turn take up even more height. Consequently, careful planning, unit placement, and overall adequate room height are needed for a successful installation that can be safely accessed and serviced.

Components have maximum and minimum clearances.

SCR technology, by the way, has been successfully used for nitrogen oxide removal for decades, having started out in central power plant applications for the exhausts of gas turbines and reciprocating engines. Therefore, this technology will likely continue to be around for a long time to meet future emissions standards as well.

Figure 8 illustrates the types of operating characteristics monitored for the SCR unit. These include the differential pressure across the SCR housing, inlet and outlet temperatures and a continuous sampling port for the final exhaust gas. They each have status indication connections with the SCR controller so the controller can regulate the upstream urea dosing and overall system operation to provide the required after-treatment results. All things considered, SCR unit monitoring is relatively straightforward.

Key operating parameters to be monitored include differential pressure, inlet and outlet temperatures and final exhaust gas quality.

Figure 9 shows a room full of SCR controllers. This is where the technical complexity of the after-treatment system lies. Though there is a sample port in the discharge of the SCR unit, the actual gas sampling operation happens within the SCR controller. Each controller housing contains a nitrogen oxide comparison cell, small pumps, fans, fine filters and small cooling equipment, as well as a touchscreen-activated programmable controller for each after-treatment system. The controller handles operating set points, status monitoring, performance parameter indications, diagnostics, alarm generation, maintenance alerts and other historical data records.

A room full of SCR controllers.

Figure 10 shows a large-scale central urea storage and distribution system. Using high-quality aqueous urea (solid urea dissolved in water) and sustaining its quality throughout its storage life are essential for effective after-treatment system performance. To start, the solid urea used to make aqueous urea needs to be of commercial- or technical-grade purity (99.45% pure). Next, the water used to make 32.5-wt% aqueous urea solutions (40 wt% is also used for some systems) also has to be ultra clean, such as the product of reverse osmosis or ion exchange cartridges. These purification requirements remove as many potential catalyst poisons as possible. The list of catalyst poisons contains at least 31 elements, most of them heavy metals and compounds containing silicon (Si), phosphorus (P), arsenic (As), antimony (Sb), sodium (Na), and zinc (Zn). Fortunately, ready-made aqueous urea solutions meeting the necessary purification standards can be purchased commercially for nominally around US$3 per gallon, often from a fuel oil supplier.

Things to Know About Aqueous Urea
Make sure you and your urea solution supplier use caution in transporting and handling this solution to avoid any contamination from its exposure to other materials or environments. Once the solution is safely stored on-site, the next task is to keep it between 32-77°F (0-25°C). At 32°F (0°C) and lower, the solution will begin to stratify and form precipitates, which will form deposits in the storage tanks. In addition, the solution concentration will become inconsistent, and the after treatment will no longer perform reliably. Aqueous urea freezes and crystallizes at 12°F (-11°C), and solution temperatures above 77°F (25°C), and especially above 86°F (30°C), accelerate solution breakdown. These temperature restrictions define the environmental conditions required for effective storage and distribution of aqueous urea.

A large-scale central storage and distribution system.

The materials within the urea storage and distribution system likewise play an important role in helping to maintain solution purity. For example, stainless steels (304, 304L, 316, and 316L), polyethylene, polypropylene and polyisobutylene are recommended for direct contact applications. Other plastics such as PFE, PTFE, PFA and PVDF are also acceptable. All plastics considered should be free of additives. On the opposite end of the compatibility spectrum, carbon steel, galvanized steel, copper, copper alloys, aluminum, lead and zinc are not recommended for direct contact uses. Solders containing silver, lead, zinc and copper are also not recommended.

Urea as a solid material is available in different grades and is used in fertilizers, cosmetics, food additives and industrial processes. Aqueous urea can also be used as a fertilizer, which may be a disposal option depending on local regulations. It is a clear and colorless liquid with a specific gravity of 1.09. It is non-flammable and readily biodegradable, with no known significant hazardous properties, even at a pH between 9.0 and 9.8. Nonetheless, personnel handling urea should use goggles and rubber gloves.

So what is a reasonable shelf life for aqueous urea? Figure 11 is a table that shows shelf life as a function of storage temperature. As the storage temperature rises, the expected urea shelf life drops. This inverse relationship (roughly a six-month shelf life change for every 9°F [5°C] change) is a key reason why prolonged storage above 77°F (25°C), and especially above 86°F (30°C), is not a recommended practice.

Urea shelf life as a function of temperature.
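
A rough way to reason about the values in Figure 11 is the linear rule of thumb described above. The 18-month baseline at 77°F used below is an assumption for illustration only; use the Figure 11 table or the urea supplier's data sheet for actual planning.

```python
# Rough shelf-life rule of thumb from the relationship described above:
# roughly six months of shelf life lost for every 9 degrees F of storage
# temperature above the baseline. The 18-month/77 F baseline is an assumed
# placeholder; use Figure 11 or the supplier's data for actual planning.

def estimated_shelf_life_months(storage_temp_f: float,
                                baseline_temp_f: float = 77.0,
                                baseline_months: float = 18.0) -> float:
    months = baseline_months - 6.0 * (storage_temp_f - baseline_temp_f) / 9.0
    return max(months, 0.0)

for temp_f in (68, 77, 86, 95):
    print(f"{temp_f} F: about {estimated_shelf_life_months(temp_f):.0f} months")
# -> 68 F: 24 months, 77 F: 18 months, 86 F: 12 months, 95 F: 6 months
```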

Another good shelf-life management practice is to periodically sample and test the urea solution quality. ISO 22241 probably comes closest to being the national or international sampling standard. That said, the following would be a recommended monitoring practice:

1. Determine the alkalinity (as NH3) of the initial (or latest) as-received shipment of aqueous urea as the starting baseline indicator of quality for that shipment.

2. Choose a time interval (six months, three months, monthly, etc.) for urea quality control checks based on expected usage.

3. Trend subsequent quality check results by continuing to measure urea alkalinity of each sample as NH3 to track the change of this value.

4. Use ISO 22241-2 parameters as the complete set of reference characteristics for judging urea quality conditions.

Lessons Learned
At this point, we have covered the basic design and installation of an engine exhaust after-treatment system and maybe a smidge of operational considerations. Almost two years of actual operating experience, however, has brought me additional insight into predictive factors.

After-treatment systems have to reach internal temperatures of nominally 600°F (316°C) before any urea solution injection will be initiated. With an installation whose engines produce 817°F (436°C) exhaust temperatures only when they are running flat out at 100% load, a single exercising engine could activate urea use with relative ease by powering a load bank set for about 70% or higher engine loading. However, if the installation involves multiple engines running in a redundant configuration, hitting the initiation temperature becomes much harder without some clever manipulation (see the sketch following the list below). And even clever manipulation may not be enough to achieve an activation scenario, depending on the active mix of the following:

1. Overall building load (move-in, fully loaded, in-between)
2. Minimum number of generators needed to run to keep total harmonic distortion on the building electrical distribution system within acceptable limits
3. Load bank conditions (permanent, temporary and capacity relative to a single generator, etc.)
4. The various operating configurations employed (i.e., emergency, non-emergency, maintenance, exercising, etc.)
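
The sketch below shows why, in a highly redundant configuration, per-engine loading tends to stay below the threshold needed to initiate urea injection. The building load is hypothetical; the 2.725-MW unit rating and the roughly 70% loading threshold are the figures cited in this article for one installation.

```python
# Sketch of per-engine loading vs. the ~70% loading this installation needed
# before urea injection would initiate. The building load is hypothetical.

ACTIVATION_LOAD_FRACTION = 0.70     # per-engine loading cited above
UNIT_RATING_KW = 2725               # one 2.725-MW engine generator

def per_engine_load_fraction(building_load_kw: float, engines_running: int) -> float:
    return building_load_kw / (engines_running * UNIT_RATING_KW)

building_load_kw = 6000             # assumed demand at a point in time
for engines in (3, 4, 5):           # e.g., fewer vs. more redundant units online
    fraction = per_engine_load_fraction(building_load_kw, engines)
    injects = fraction >= ACTIVATION_LOAD_FRACTION
    print(f"{engines} engines running: {fraction:.0%} load each -> urea injection: {injects}")
# -> 3 engines: 73% each (True); 4 engines: 55% (False); 5 engines: 44% (False)
```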

The design expectation based on manufacturer technical literature for effective after treatment was for six gallons of urea to be used for every 100 gallons of fuel oil. The operating reality is that this ratio is more like 1.9 gallons of urea per 100 gallons of fuel oil, or about 32% of the original prediction. This reality raises the following issues for further scrutiny:

1. The original design numbers were based on engines running at 100% load. In a highly redundant normal operating configuration (N+2 minimum), each engine generator operates partly loaded, with that partial load depending on the actual overall demand load at a point in time and the number of engines running. That demand load can shift depending upon the operating scenario. Therefore, the original design assumptions not only did not match actual operating conditions, they grossly overestimated urea consumption for most of these scenarios, by roughly a factor of three.

2. This overestimation, in turn, translates into oversized urea storage facilities and the purchase of too much urea, at least initially. Having too much urea on-hand leads to storing urea to the limits of its expected shelf life or beyond.

3. The minimum individual engine loading to create hot enough exhaust temperatures to initiate urea injection is 70%. While this loading was first determined during the commissioning phase, full realization of its consequences, unfortunately, took much longer to arrive.

4. Engine-generator operation during monthly exercising activities requires a permanently installed load bank to ensure urea injection. Of the several operating configurations employed, this is the only one in which urea injection can reliably be achieved.

5. Exercising an engine under no-load conditions for any length of time usually results in incomplete combustion within the engine. This, in turn, causes the discharge of partially combusted fuel oil by-products into the exhaust and, therefore, the downstream after-treatment system. The fouling in this situation (affectionately called slobber by some) is certainly not good for the internal surfaces (including the precious metal impregnated ones) of the after-treatment system.

6. Operating records indicate that the normal non-exercise operation scenario typically imposes a 45% load on each engine generator, well below that required to initiate urea injection. This, unfortunately, is inherent to running a highly redundant configuration.

7. There seems to be no current consensus on what should be done if normal operation of an engine-generator system does not produce hot enough exhaust temperatures to initiate urea injection into its after-treatment system. Will enforcement of 40CFR, Part 60, Subpart IIII be such that owners should expect frequent assessment of penalty fees for non-compliance? Will such no-urea-injection operating results mean that backup systems cannot be legally operated? Will lower-than-needed activation temperatures for urea injection be acceptable? Will the Best Available Control Technology (BACT) standard be invoked by the EPA to address this apparent disconnect between after-treatment functional goals and the BACT to allow additional time to come up with a workable solution? Will backup power requirements needed to meet Uptime Institute Tier Standards be affected?

So caveat emptor (buyer beware), especially if a highly redundant engine-generator system is involved. A new design and operation challenge is now before the mission critical data center community!

Cost Awareness
The installed cost of the after-treatment system described in this article for ten 2.725-MW units was about US$4.5 million, with only US$2,500 spent on urea in the first year of operation. Contracted maintenance service costs, however, are much more substantial, as shown in Figure 12.

Contracted maintenance service costs for a typical system comprising ten 2.725-MW units.

These costs are based on an annual maintenance visit and a more thorough visit in alternating years. They demonstrate the importance of careful consideration of maintenance costs.

NESHAP
Now that the EPA requirements for engines made in 2007 and after have been discussed, attention should be turned to those engine-generator installations with engines manufactured in 2006 and earlier. These installations do not get to avoid EPA regulations. They just have to comply with another long-named reference standard instead (40CFR, Part 63, Subpart ZZZZ). These regulations are known as the EPA National Emission Standards for Hazardous Air Pollutants for Reciprocating Internal Combustion Engines (or NESHAP for RICE for short).

Though less stringent than 40CFR, Part 60, Subpart IIII, NESHAP still will likely require the installation of an oxidation catalyst unit on the exhaust of each engine to meet carbon monoxide (CO) emission targets that are newer and much stricter than those that previously existed. In fact, there are already combination silencer and oxidation catalyst housings available that can often be installed in the same location as existing silencers to minimize retrofit problems. These CO targets vary according to engine horsepower rating, hazardous air pollutant source potential-to-emit (either Area or Major), and the same Emergency and Non-Emergency engine classifications as previously described for 40CFR, Part 60.

Conclusion
Backup diesel engine generators are integral to most modern data centers. And engine-generator use varies depending on the business mission, criticality, and sophistication of a particular data center. The EPA and regional air quality districts have long since pegged engine generators as unfavorable but necessary sources of air emissions, typically restricting their annual runtime allowances to 100 hours or less.

The individual impact of this campaign for a data center depends on how the data center uses its engine generator(s). If the use is for emergency purposes only, then the prevailing emission standards are less restrictive. If nothing else, an oxidation catalyst installation may (or may not) be in your near future. Many new and emerging data center projects are claiming emergency status to avoid the added costs and complexities of installing exhaust after-treatment systems.

Nonetheless, if intended generator use does include non-emergency use, then expect to get very familiar with exhaust after-treatment systems. The content of this article is meant to assist that familiarization so your air emissions act can get appropriately cleaned up. Uptime Institute Tier III and higher projects are more likely candidates for falling into the non-emergency category.

In conclusion, currently operating data centers and new data centers due to come online need to understand their vulnerability to EPA and regional air quality district regulations. Evaluating this potential impact hinges on a host of factors discussed within this article, including some lessons learned from a real-life operating after-treatment system. While the technology for legitimate after-treatment systems is fairly mature, its use within the data center world is not. Therefore, let the growing pains begin—and with knowledgeable foresight and guidance, let them quickly be absorbed and abated. Uninterruptible uptime is about to get a lot cleaner.

Lamont Fortune, PE, is the lead mechanical engineer in the Data Center Facilities group within the Information Technology division of UnitedHealth Group (or UHG-IT). He has over 40 years of engineering experience in the facility infrastructure field involving water and wastewater treatment plants, high production printing, corporate offices, research laboratories, archival record storage and other technical operations. His last 20-plus years have been especially focused on mission-critical data center projects including their planning, design, project management, construction, commissioning and operations. These projects have ranged across the private, institutional, governmental and military arenas. He enjoys addressing the complexities of successfully sustaining uninterruptible uptime environments.

How business practices can impair data center availability

Don’t pit man against machine

By Charles Selkirk, ATD

While data centers in Southern Africa utilize world-class designs and construction techniques, ongoing operational sustainability models struggle to attract and retain sufficient qualified and motivated personnel, in part due to the lack of recognition of the importance of their work (unless something goes wrong!).

The Uptime Institute’s Tier models have wide recognition and acceptance in the region, and many of our data center customers have lately opted for Tier III designs. In most cases, customers require active-active A and B paths, both with UPS backup, due to power instability issues experienced throughout the region; however, the recently launched Operational Sustainability Standard has had less impact in this region and is only now starting to gain some traction. As a result, data center operators in Southern Africa seem to take one of two very different approaches, which we have labeled Availability-Centric and Safety-Centric.

We based this observation on a catalog of data center failure and near-miss events compiled through our work in building facilities and providing sustainable support of ongoing operations in the region (especially in South Africa), which caused us to examine how the customers’ focus on operator safety might affect availability and reliability. We wanted to test the general business perceptions that availability is paramount and that it can be assumed that operators and maintenance support personnel have the skills and motivation to meet their own safety needs.

Availability-Centric View
Whatever the agreed-upon availability level of a facility, businesses consider availability to be paramount. This view is founded on the belief that business will suffer in the event of any downtime, and availability cannot be compromised unless the circumstances are either unavoidable or dire. In high-availability facilities, owners may believe there is no need for downtime and that maintenance can be done only in strictly controlled time slots, with generally no tolerance for faults or errors. The issue of operator and maintenance personnel safety is rarely, if ever, raised or discussed. Without generalizing too much, this attitude is prevalent in the financial services and retail industries.

Safety-Centric View
In a progressive corporate culture, the safety of employees is paramount, and all business needs are undertaken with this understanding. This is not to say that availability is unimportant, only that “safety is king.” These businesses have better-informed engineering support, and without exception, they view accidents and other incidents as potentially significant losses that must be considered and minimized. This culture drives, empowers and enables designers and operators to value themselves and their workplaces – and, in our experience, it also leads to improved availability.

We have found this culture to be more prevalent within the resources and manufacturing industries, although some commercial and colocation operators have also adopted this perspective.

In the schedules below, we summarize 49 incidents that have occurred in data centers over a period of 13 years throughout Southern Africa. Luckily, none of these incidents led to any form of personal injury, although many of them led to outages that damaged reputations, sapped customer confidence and caused financial losses.

In the schedules, we list the centricity characteristic of the owners, the markets served, the availability levels, the presence of dual-cabinet feeds, a description of the root cause of the problem, and whether an outage occurred – plus generalized comments on the incidents reported. The assignment of a centricity characteristic label may appear somewhat subjective, but in our observation, the difference between the Safety-Centric and Availability-Centric perspectives is easily discernible.

Summary
We would make the following observations based on this study:

A. The study was limited to the 49 incidents reported. A wider study with a larger number of incidents may yield somewhat different results.

B. There has been a shift to higher-reliability facilities over the last decade, in accordance with the higher uptime requirements of customers.

C. Despite the shift to higher-availability facilities, it is worth noting that the number of downtime incidents reported by Safety-Centric customers remains significantly lower than the number reported by Availability-Centric customers.

D. There is a clear and significant trend indicating that refocusing data center operations and maintenance to a Safety-Centric focus has significant benefits to customers in terms of uptime experienced.

Results
Figure 1. In our study, half the incidents reported resulted in downtime of some form to the data center.

Figure 1: Outage Breakdowns

Figure 2. Clearly, a higher Tier-level design has the desired effect of reducing downtime.
*The availability assessments are the author’s subjective evaluations.

Figure 2: Outage breakdowns by availability

Figure 3. If we exclude downtime incidents at lower-availability facilities, the ratio adjusts. Three times as many incidents occur in facilities designed to meet Tier III certification as in Tier IV or similar facilities, in part because of the relative number of the lower-availability facilities.

Figure 3: Outages by Availability

Figure 4. Of the data center owners included in our report, 75% are clearly more focused on availability, while we would classify only 25% as Safety-Centric owners.

Figure 4: Breakdown by owners

Figure 5. We can see from this chart that the breakdown is largely in agreement with the breakdown by centricity type. If we exclude the incidents reported for lower-availability facilities, the incident breakdown remains largely unchanged.

Figure 5: Incidents

Figure 6. This figure illustrates the relative proportion of downtime incidents by centricity type.

Figure 6: Centricity Type

Figure 7. Of the total of 33 incidents recorded by Availability-Centric customers, 22 (67%) caused either partial or total outages. Of the total of 16 incidents recorded by Safety-Centric customers, just three (19%) caused either partial or total outages. In all three cases reported, the downtime incidents were the result of equipment failures in older facilities where maintenance practices and scheduling were not optimal. If we reexamine the proportion of downtime incidents when lower availability facilities are excluded, the incident breakdown is substantially similar.

Figure 7: Incidents with downtime

Figure 8. Of the 12 downtime incidents reported, 11 were recorded by Availability-Centric customers while only one was recorded by a Safety-Centric customer. That single failure was attributable to equipment failure in a legacy facility.

Figure 8: Downtime Incidents

The following tables list the 13 years of events the author used to draw the conclusions presented in this article.

 Table 1: Data Center Incidents

 Table 2: Data Center Incidents

 Table 3: Data Center Incidents

 Table 4: Data Center Incidents

 Table 5: Data Center Incidents

Charles Selkirk was born in Zimbabwe and grew up in Harare, completing high school in 1977. He served as a crime investigator in rural districts for the local police force and later became a circuit court prosecutor. Mr. Selkirk then followed his brother into electrical engineering, earning a degree from the University of Cape Town in 1984. In 1985, he married and moved to ‘The Reef’ near Johannesburg and started working on a deep-level gold mine, initially serving as a foreman and supervisor. He eventually became a section engineer in charge of engineering construction and maintenance operations, a position he held for five years.

Mr. Selkirk left the mining industry in 1989 and moved to Cape Town, where he joined his brother in an engineering consulting practice. In the early years of the firm, they consulted on a wide range of projects for building services. Mr. Selkirk’s brother emigrated to the U.S. in 2000 as the firm shifted its focus to specialize in data center MEP services, turnkey data center construction and running data centers. More recently, his two sons have joined the business – one as a site engineer and the other as a programmer.

Introducing Uptime Institute’s FORCSS System

A new process to compare IT deployment options across both in-house and outsourced alternatives

By Julian Kudritzki and Matt Stansberry, Uptime Institute

Uptime Institute FORCSS™ is an original system to capture, compare, and prioritize the impacts of the many IT deployment alternatives.

On an ongoing basis, enterprise organizations decide to deploy IT assets in an internal data center, colocation facility, hosting environment, or cloud solution. These decisions may not holistically consider the financial, risk, performance, or other impacts. FORCSS enables the management of an enterprise organization to identify, weigh, and communicate the advantages and risks of IT application deployment options using consistent and relevant criteria based on business drivers and influences.

Since its inception, the mission of the Uptime Institute has been to assist the enterprise in devising feasible and adaptable data center solutions that are responsive to the business. Successful solutions align data center design, technology selection, construction, and operation to achieve high reliability. One of the leading challenges today is deciding the most viable IT deployment option.

FORCSS helps the enterprise to overcome this challenge by focusing on the critical selection factors, thereby reducing or eliminating unfounded assumptions and organizational “blind spots.” FORCSS establishes a consistent and repeatable set of evaluation criteria and a structure to communicate the informed decision to stakeholders.

A coherent IT deployment strategy is often difficult because the staff responsible for IT assets and IT services across multiple geographies and multiple operating units are themselves spread over multiple geographies and operating units. The result can be a range of operating goals, modes, and needs that are virtually impossible to incorporate into a single, unified deployment strategy. And when a single strategy is developed from the “top down,” the staff responsible for implementing that strategy often struggles to adapt that strategy to their operational requirements and environments.

FORCSS was developed to provide organizations with the flexibility to respond to varying organizational needs while maintaining a consistent overall strategic approach to IT deployments. FORCSS represents a process a) to apply consistent selection criteria to specific deployment options, and b) to translate the outcome of the key criteria into a concise structure that can be presented to “non-IT” executive management.

The FORCSS system is composed of six necessary and sufficient selection factors relevant to an effective deployment decision. These six factors, or criteria, provide a holistic evaluation system, and drive a succinct decision exercise that avoids analytical paralysis. FORCSS identifies the relevant internal and external input.

And, by scaling the importance of the criteria within the system, FORCSS allows each organization to align the decision process to organizational needs and business drivers.
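
FORCSS itself does not prescribe a particular arithmetic in this overview, but a minimal sketch of how an organization might scale the six factors when comparing options follows. The weights and 1-5 scores are entirely hypothetical and would come from the organization's own business drivers and stakeholder input.

```python
# Minimal sketch of scaling the six FORCSS factors across deployment options.
# FORCSS does not prescribe this arithmetic; all weights and scores below are
# hypothetical placeholders supplied by the organization.

weights = {  # relative importance assigned by the organization (sums to 1.0)
    "Financial": 0.25, "Opportunity": 0.15, "Risk": 0.25,
    "Compliance": 0.15, "Sustainability": 0.10, "Service Quality": 0.10,
}

options = {  # hypothetical 1-5 scores per factor for each deployment alternative
    "In-house data center": {"Financial": 3, "Opportunity": 2, "Risk": 4,
                             "Compliance": 5, "Sustainability": 3, "Service Quality": 4},
    "Colocation":           {"Financial": 4, "Opportunity": 4, "Risk": 3,
                             "Compliance": 4, "Sustainability": 3, "Service Quality": 4},
    "Cloud":                {"Financial": 4, "Opportunity": 5, "Risk": 2,
                             "Compliance": 3, "Sustainability": 4, "Service Quality": 3},
}

for name, scores in options.items():
    total = sum(weights[factor] * scores[factor] for factor in weights)
    print(f"{name}: weighted score {total:.2f}")
```

The purpose of such a sketch is only to show how scaling the criteria lets different organizations reach different, but equally defensible, conclusions from the same six factors.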

FORCSS Factors And Their Definitions

The Uptime Institute FORCSS system is a means to evaluate deployment alternatives. Accordingly, it is crucial to have a working knowledge of the tangible or intangible values associated with the application being deployed. Tangible values notably include revenues; intangible values include end-user satisfaction. To be effective and lasting, FORCSS must involve the stakeholder requesting the IT deployment. In other words, don’t lose sight of your client.

FORCSS TABLE

Financial

The fiscal consequences associated with deployment alternatives.

  • Net Revenue Impact: An estimation of gross profit margin: estimated revenues of the IT service or application minus the cost of ownership (a worked sketch follows this list).
  • Comparative Cost of Ownership: The identified differential cost of deploying the alternative plus ongoing operations and maintenance, including the incremental cost of scaling the alternative as the business grows. For example: Significant cost centers can include real estate development, MEP infrastructure, cost of financing, taxes, IT equipment, software licenses and customization, staffing, and service provider and consulting fees. The most definitive cost information for each alternative comes from a Total Cost of Ownership (TCO) accounting protocol, for those few companies that have the capability to reliably determine TCO. Differential and incremental costs are often more directly determined.
  • Cash and Funding Commitment: Representation of liquidity: cash necessary at appropriate intervals for the projected duration of the business service.
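
As a simple illustration of the Financial arithmetic, the hypothetical sketch below estimates Net Revenue Impact and Comparative Cost of Ownership for two invented alternatives. The figures and helper functions are assumptions for illustration only; FORCSS does not prescribe a specific formula or costing tool.

```python
# Hypothetical illustration of the Financial factor arithmetic.
# All figures are invented; FORCSS prescribes no specific formula or tool.

def net_revenue_impact(estimated_revenue, cost_of_ownership):
    """Estimated gross profit margin: revenues of the IT service minus cost of ownership."""
    return estimated_revenue - cost_of_ownership

def comparative_cost(deployment_cost, annual_opex, years, incremental_scaling_cost=0.0):
    """Differential cost of an alternative: deployment plus ongoing operations
    and maintenance, plus the incremental cost of scaling as the business grows."""
    return deployment_cost + annual_opex * years + incremental_scaling_cost

# Two invented alternatives evaluated over a three-year service life.
colo  = comparative_cost(deployment_cost=1_200_000, annual_opex=400_000, years=3)
cloud = comparative_cost(deployment_cost=150_000, annual_opex=700_000, years=3,
                         incremental_scaling_cost=90_000)

revenue = 5_000_000  # estimated revenue of the IT service over the same period
for name, cost in [("colocation", colo), ("cloud", cloud)]:
    print(f"{name}: cost of ownership {cost:,.0f}, "
          f"net revenue impact {net_revenue_impact(revenue, cost):,.0f}")
```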

Opportunity

A deployment alternative’s ability to fulfill compute capacity demand over time.

  • Time to Value: The time period from decision to IT service availability. The timeline must include the deployment schedules of IT, facilities, network, and service providers.
  • Scalable Capacity: Available capacity for expansion of a given deployment alternative.
  • Business Leverage and Synergy: Significant ancillary benefits of a deployment alternative outside of the specific application or business service.

For example: Improved economies of scale and pricing for other applications, or a site’s geographic location providing business benefits beyond the scope of a single application.

Risk

A deployment alternative’s potential for negative business impacts.

  • Cost of Downtime vs. Availability: Estimated cost of an IT service outage vs. the forecasted availability of the deployment alternative (a worked example follows this list).
  • Acceptable Security Assessment: Internal security staff evaluation of deployment alternative’s physical and data security.
  • Supplier Flexibility: Potential “lock-ins” from a technical or contractual standpoint.

For example: Rating situations as simple, difficult/costly, or impossible to negotiate regarding software, hardware, site, and service provider commitments.
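
The Cost of Downtime vs. Availability criterion lends itself to a simple expected-loss calculation. The sketch below uses assumed availability and outage-cost figures purely for illustration; FORCSS does not mandate this, or any particular, formula.

```python
# Hypothetical expected-outage-cost comparison for two deployment alternatives.
HOURS_PER_YEAR = 8760

def expected_annual_outage(cost_per_hour, forecast_availability):
    """Expected downtime hours per year and the resulting expected outage cost."""
    downtime_hours = (1 - forecast_availability) * HOURS_PER_YEAR
    return downtime_hours, downtime_hours * cost_per_hour

COST_PER_HOUR = 50_000  # assumed cost of one hour of IT service outage

# Assumed forecast availability for each alternative (illustration only).
alternatives = {"in-house data center": 0.9995,
                "colocation provider":  0.9999}

for name, availability in alternatives.items():
    hours, cost = expected_annual_outage(COST_PER_HOUR, availability)
    print(f"{name}: ~{hours:.1f} h/yr downtime, expected cost ~${cost:,.0f}/yr")
```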

Compliance

Verification, internal and/or third-party, of a deployment alternative’s compliance with regulatory, industry, or other relevant criteria.

  • Government: Legally mandated reporting obligations associated with the application or business service. For example: HIPAA, Sarbanes-Oxley, PCI-DSS.
  • Corporate Policies: Internal reporting requirements associated with the application or business service. For example: Data protection and privacy, ethical procurement, Corporate Social Responsibility.
  • Compliance & Certifications to Industry Standards: Current or recurring validations achieved by the site or service provider, beyond internal and governmental regulations. For example: SAS 70®, SSAE 16, Uptime Institute Tier Certification or M&O Stamp of Approval, ISO®.

Sustainability

Environmental consequences of a deployment alternative.

  • Carbon and Water Impact: Carbon and water use for a given site or service. For example: The Green Grid’s Carbon Usage Effectiveness (CUE)™ and Water Usage Effectiveness (WUE)™ metrics (a worked sketch of these metrics follows this list).
  • Green Compliance & Certifications: Current or recurring validations achieved by the site or service provider, beyond internal and governmental regulations, of sustainable design and/or operations practices. For example: LEED®, BREEAM®, Carbon Credits, European Union Code of Conduct, U.S. EPA Energy Star®, and The Green Grid’s® DC Maturity Model equalizer.
  • PUE Reporting: PUE is an industry-accepted indicator of a site or service provider’s efficiency commitment.
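
For reference, The Green Grid defines these metrics as simple ratios: PUE is total facility energy divided by IT equipment energy, CUE is the carbon emissions attributable to that energy divided by IT energy, and WUE is annual site water use divided by IT energy. The sketch below computes all three from hypothetical annual figures; the numbers are illustrative, not reported data.

```python
# Hypothetical annual figures for a single site. Metric definitions follow
# The Green Grid: PUE (dimensionless), CUE (kg CO2 per kWh of IT energy),
# WUE (liters of water per kWh of IT energy).

total_facility_energy_kwh = 8_000_000   # everything behind the utility meter
it_equipment_energy_kwh   = 5_500_000   # energy delivered to the IT equipment
carbon_emissions_kg       = 3_600_000   # CO2-equivalent attributable to that energy
site_water_usage_liters   = 9_000_000   # annual water use for the site

pue = total_facility_energy_kwh / it_equipment_energy_kwh
cue = carbon_emissions_kg / it_equipment_energy_kwh
wue = site_water_usage_liters / it_equipment_energy_kwh

print(f"PUE = {pue:.2f}")            # ~1.45
print(f"CUE = {cue:.2f} kgCO2/kWh")  # ~0.65
print(f"WUE = {wue:.2f} L/kWh")      # ~1.64
```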

Service Quality

A deployment alternative’s capability to meet end-user performance requirements.

  • Application Availability: Computing environment uptime at the application or operating system level.
  • Application Performance: Evaluation of an application’s functional response; acceptable speeds at the end-user level.
  • End-User Satisfaction: Stakeholder response that an application or deployment alternative addresses end-user functional needs. For example: End-user preference for Graphical User Interfaces or Operating/Management Systems tied to a specific deployment alternative.

Using Uptime Institute FORCSS

This system was developed and validated by thought leaders in the enterprise IT industry to ensure its usefulness to those who inform senior-level decision makers. Many organizations already perform due diligence that would include most of this process. But the Uptime Institute FORCSS system provides the following:

  • A structure and a set of common definitions agreed upon by an elite group of data center owners and
    operators from around the world.
  • A succinct and effective way to communicate recommendations to the C-level executives.

Uptime Institute believes the FORCSS system is sufficiently flexible and comprehensive to improve IT investment decisions.

Notes on using FORCSS:

Uptime Institute acknowledges that there are overlaps and dependencies across all six factors. But, in order to provide a succinct, sufficient process to inform C-level decision makers, categories must be finite and separate to avoid analysis paralysis. The purpose of FORCSS is to identify the business requirements of the IT service, and pragmatically evaluate capabilities of potential deployment options as defined.

Uptime Institute recognizes organizations will have unique business demands and priorities. Therefore, each company conducting a FORCSS analysis will need to weight each criterion according to its specific business requirements. For example, most companies try to maximize data center efficiency. But, for a growing number of organizations, the overall environmental sustainability of operations and supplier choices is a very public (and therefore critical) aspect of their business. Organizations that put a high value on sustainability will weight the criteria accordingly when applying FORCSS. Other organizations may treat sustainability as a low-value, even inconsequential, criterion.
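
One hypothetical way to record such organization-specific weightings is sketched below. It is not part of FORCSS itself, and Uptime Institute’s displays are graphical rather than numerical; the structure simply captures each factor’s relative importance alongside qualitative ratings gathered for each deployment alternative.

```python
# Hypothetical record of organization-specific FORCSS weightings and
# qualitative ratings. Not part of FORCSS itself; Uptime Institute's own
# displays are graphical rather than numerical scores.

FACTORS = ["Financial", "Opportunity", "Risk",
           "Compliance", "Sustainability", "Service Quality"]

# Relative importance assigned by this (invented) organization.
weights = {"Financial": "high", "Opportunity": "medium", "Risk": "high",
           "Compliance": "high", "Sustainability": "low", "Service Quality": "medium"}

# Qualitative ratings gathered from stakeholders for two alternatives.
ratings = {
    "existing enterprise site": {
        "Financial": "favorable", "Opportunity": "constrained", "Risk": "well understood",
        "Compliance": "certified", "Sustainability": "moderate", "Service Quality": "proven"},
    "colocation provider": {
        "Financial": "moderate", "Opportunity": "scalable", "Risk": "contract-dependent",
        "Compliance": "certified", "Sustainability": "strong", "Service Quality": "per SLA"},
}

# One row per factor, side by side, for presentation to stakeholders.
for factor in FACTORS:
    row = " | ".join(f"{alt}: {ratings[alt][factor]}" for alt in ratings)
    print(f"{factor} (weight: {weights[factor]}) -> {row}")
```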

Uptime Institute is currently evaluating numerous concepts for FORCSS ‘displays.’ These displays will be graphical in nature, rather than a numerical score, to allow for evaluation of each factor within FORCSS and provide a visual comparison of one deployment alternative against another. Please visit FORCSS on the Uptime Institute Web site for the latest information and tools.

Uptime Institute’s Unique Approach To FORCSS Development

In order to ensure the development of a well-rounded, thorough, and useful methodology, Uptime Institute facilitated a series of Charrettes. (A Charrette is a design process that brings stakeholders together at one time, in one place, as a group completing design tasks in a focused, time-limited effort.) The benefits of this approach are that the stakeholders begin with a common understanding of the design objective, share in the development process, and receive immediate feedback on the result of their deliberations.

In October 2011 the first Charrette was held, composed of peers within Uptime Institute and the 451 Group. The fundamental objective was to define the problem and assemble an original framework to be submitted at a second Charrette of key industry stakeholders. This initial work created the structure of a multiple-component solution, including business functions, facilities infrastructure, computing hardware, and applications performance perspectives.

Building on this foundational effort, in January 2012, Uptime Institute hosted over 25 hand-picked senior technology executives from large organizations across multiple industries at a second Charrette. Uptime Institute invited executive leaders whose organizations’ decisions impacted international markets and brands and who brought broad experience making decisions influenced by multiple factors and challenges.

This group edited and crystallized the original structure into six top-level criteria, or principal factors, that make up the FORCSS framework. Following the second Charrette, Uptime Institute identified three key components for each of the six top-level criteria to further define the FORCSS criteria, and presented the expanded system at Uptime Institute Symposium in Santa Clara, CA, in May 2012.

At Symposium, Uptime Institute reconvened the previous group of executives who comprised the second Charrette, as well as new end-user participants, for a follow-up Charrette on FORCSS.

Some of the new participants represented companies that had been in business for more than 100 years and plan to be in business another 100 years. Many of these organizations are at a strategic inflection point—do they modernize or minimize their IT infrastructures? The participants recognized the FORCSS approach as a means to improve confidence in decision making and avoid unintended consequences.

The third Charrette participants were tasked with vetting the expanded 18-point FORCSS process. The discussions and debate provided substantive insight resulting in changes to the components making up the six factors.

The majority of executives at the second Charrette reported consistent and enduring challenges within their organizations:

  • Incomplete data when evaluating internal assets, such as data center capital costs that aren’t included in TCO calculations for IT projects, or lack of insight into personnel costs associated with providing internal IT services.
  • Lack of insight into cloud computing security, pricing models, and reliability data. Lack of credible cloud
    computing case studies.
  • Inconsistency in reporting structures across geographies and divisions and between internal resources and
    colocation providers.
  • Difficulty articulating business value for criteria not tied to a specific cost metric, like redundancy or
    service quality. Difficulty connecting IT metrics to business performance metrics.
  • Challenge of capacity planning for IT requirements forecast beyond six months due to evolving
    architecture/application strategy and shifting vendor roadmaps.
  • Difficulty collecting information across the various stakeholders, from application development to corporate real estate.

FORCSS Begins With These Steps:

  • The first step is to identify the new application workload to be analyzed. The process is designed to evaluate a specific application workload against specific, existing assets or external resources (or, in cases where a new site or service may be considered, a detailed evaluation of the planned asset).
  • Identify and engage the decision maker or C-level executive who will sign off on the final project. Provide
    background on FORCSS as a selection tool for winnowing deployment choices and eliminating blind spots
    in an organization.
  • Identify senior management in adjacent divisions to assess the implementation being considered. No one
    person will have sufficient insight into all areas of an organization. Be sure to include application owners
    and users, facilities/real estate, IT operations, and any other stakeholders.
  • Set parameters for the application: determine the functional life cycle of the application or IT service being analyzed in order to establish the value of the application, the appropriate cost profile, and the other attributes necessary to ensure the viability of the business solution.

Uptime Institute recognizes the many challenges in conducting a FORCSS analysis:

  • Getting buy-in and understanding of the FORCSS language across disciplines and at the C-level.
  • Avoiding inappropriate weighting of Risk or other criteria based on division bias.
  • Obtaining objective data on third-party service provider pricing and availability.

Also, many companies may be challenged by the subjective nature of some of the inputs or have difficulty determining the true costs and benefits of various projects.

The purpose of this timely initiative is to improve a company’s investments and decision making, not to compare one company’s decisions against another’s. The way one organization determines the business value of an application or total cost of providing a service does not need to be the same as how another organization gathers those same data inputs.

A FORCSS analysis may pose tough questions without easy answers, but will help organizations make IT deployment decisions with confidence.

Julian Kudritzki joined the Uptime Institute in 2004 and currently serves as Chief Operating Officer. He is responsible for the global proliferation of Uptime Institute Standards. He has supported the founding of Uptime Institute offices in numerous regions, including Brasil, Russia and North Asia. He has collaborated on the development of numerous Uptime Institute publications, education programs and unique initiatives such as Server Roundup and FORCSS. He is based in Seattle, WA.

Matt Stansberry is Director of Content and Publications for the Uptime Institute and also serves as Program Director for the Uptime Institute Symposium, an annual spring event that brings together 1,500 stakeholders in enterprise IT, data center facilities, and corporate real estate to deal with the critical issues surrounding enterprise computing. He was formerly Editorial Director for TechTarget’s Data Center and Virtualization media group, and was managing editor of Today’s Facility Manager magazine. He has reported on the convergence of IT and Facilities for over a decade.

Accredited Tier Designer Profiles: Adel Rizk, Gerard Thibault and Michael Kalny

Three data center design professionals talk about their work and Uptime Institute’s ATD program.
By Kevin Heslin, Uptime Institute

Uptime Institute’s Accredited Tier Designer (ATD) program and its Tier Certification program have affected data center design around the world, raised standards for construction, and brought a new level of sophistication to facility owners, operators, and designers everywhere, according to three far-flung professionals who have completed the ATD program. Adel Rizk of Saudi Arabia’s Edarat; Gerard Thibault, senior technical director, Design and Construction division of Digital Realty (DLR), in the U.K.; and Michael Kalny, head of Metronode Engineering, Leighton Telecommunications Group, in Australia, have applied the concepts they learned in the ATD program to develop new facilities and improve the operation of legacy facilities while also aggressively implementing energy-efficiency programs. Together, they prove that high reliability and energy efficiency are not mutually exclusive goals. Of course, they each work in different business environments in different countries, and the story of how they achieve their goals under such different circumstances makes interesting reading.

In addition to achieving professional success, the ATDs each noted that Tier certification and ATD programs had helped them innovate and develop new approaches to data center design and operations while helping market facilities and raise the standards of construction in their countries.

More than that, Rizk, Thibault, and Kalny have followed career arcs with some similarities. Each developed data center expertise after entering the field from a different discipline, Rizk from telephony and manufacturing, Thibault from real estate, and Kalny from building fiber transmission networks. They each acknowledge the ATD program as having deepened their understanding of data center design and construction and having increased their ability to contribute to major company initiatives. This similarity has particular significance in the cases of Rizk and Kalny, who have become data center experts in regions that often depend on consultants and operators from around the globe to ensure reliability and energy efficiency. It is in these areas, perhaps, that the ATD credential and Tier certification have their greatest impact.

On the other hand, the U.K., especially London, has been the home of many sophisticated data center operators and customers for years, making Thibault’s task of modifying Digital Realty’s U.S. specification to meet European market demands a critical one.

On the technology front, all three see continued advances in energy efficiency, and they all see market demand for greater sustainability and energy efficiency. Kalny and Thibault both noted increased adoption of higher server air supply temperatures in data centers and the use of outside air. Kalny, located in Australia, noted extreme interest in a number of water-saving technologies.

Hear from these three ATDs below:

ADEL RIZK

Just tell me a little about yourself.

I’m a consulting engineer. After graduating from a civil engineering program in 1998 and working for a few years on public projects for the Public Switched Telephone Network (PSTN) Outside Plant (OSP), I decided in 2000 to change my career and joined a manufacturer of fast-moving consumer goods. During this period, I also pursued my MBA.

After gaining knowledge and experience in IT by enhancing and automating the manufacturer’s operations and business processes, I found an opportunity to start my own business in IT consulting with two friends and colleagues of mine and co-founded Edarat Group in 2005.

As a consultant working in Edarat Group, I also pursued professional certifications in project management (PMP) and business continuity (MBCI) and was in charge of implementing the Business Continuity Management Program for telecom and financial institutions in Saudi Arabia.

How did you transition from this IT environment to data centers?

One day, a customer who was operating a strategic and mission-critical data center facility asked me to help him improve the reliability of his MEP infrastructure. I turned his problem into an opportunity and ventured into the data center facility infrastructure business in 2008.

In 2009-2010, Edarat Group, in partnership with IDGroup, a leading data center design company based out of Boston, developed the design for two Tier IV and two Tier III data centers for a telecom operator and the smart cities being built in Riyadh by the Public Pension Agency. In 2010, I got accredited as a Tier Designer (ATD) by the Uptime Institute, and all four facilities achieved Tier Certification of Design Documents (TCDD).

What was the impact of the Tier certification?

Once we succeeded in achieving the Tier Certification, it was like a tipping point.

We became the leading company in the region in data center design. Saudi Arabia values certifications very highly. Any certification is considered valuable and even considered a requirement for businesses, as well as for professionals. By the same token, the ATD certificate positioned me as the lead consultant at that time.

Since that time, Edarat has grown very rapidly, working on the design and construction supervision of Tier III, Tier IV, and even Tier II facilities. Today, we have at least 10 facilities that received design Tier-certifications and one facility that is Tier III Certified as a Constructed Facility (TCCF).

What has been your personal involvement in projects at Edarat?

I am involved in every detail in the design and construction process. I have full confidence in these facilities being built, and Uptime Institute Certifications are mere evidence of these significant successful achievements.

What is Edarat doing today?

Currently, we are involved in design and construction. In construction, we review material submittals and shop drawings and apply value engineering to make sure that the changes during construction don’t affect reliability or Tier certification of the constructed facility. Finally, we manage the integrated testing and final stages of commissioning and ensure smooth handover to the operations team.

Are all your projects in Saudi Arabia?

No. We also obtained Tier III certification for a renowned bank in Lebanon. We also have done consultancy work for data centers in Abu Dhabi and Muscat.

What stimulates demand for Tier certification in Saudi Arabia?

Well, there are two factors: the guarantee of quality and the show-off factor due to competition. Some customers have asked us to design and build a Tier IV facility for them, though they can tolerate a long period of downtime and would not suffer great losses from a business outage.

Edarat Group is vendor-neutral, and as consultants, it is our job to educate the customer and raise his awareness because investing in a Tier IV facility should be justifiable when compared to the cost of disruption.

My experience in business continuity enables me to help customers meet their business requirements. A data center facility should be fit-for-purpose, and every customer is unique, each having different business, regulatory, and
operational requirements. You can’t just copy and paste. Modeling is most important at the beginning of every data center design project.

Though it may seem like hype, I strongly believe that Uptime Institute certification is a guarantee of reliability and high availability.

Information Technology Communications Complex (ITCC), in Riyadh, Saudi Arabia.

What has the effect of the ATD been on the data center industry in Saudi Arabia?

Now you can see other players in the market, including systems integrators, getting their engineers ATD certified. Being ATD certified really helps. I personally always refer to the training booklet; you can’t capture and remember everything about Tiers after just three days of training.

What’s unique about data centers in Saudi Arabia?

Energy is cheap; telecom is also cheap. In addition, Saudi Arabia is a gateway from Europe to Asia. The SAS1 cable connects Europe to India through Saudi Arabia. Energy-efficient solutions are difficult to achieve. Free cooling is not available in the major cities, and connectivity is not yet available in remote areas where free cooling is available for longer periods during the year. In addition, the climate conditions are not very favorable to energy-efficient solutions; for example, dust and sand make it difficult to rely on solar power. In Riyadh, the cost of water is so high that it makes the cost of investing in cooling towers unjustifiable compared to air-cooled chillers. It could take 10 years to get payback on such a system.

Budget can sometimes be a constraint on energy efficiency because, as you know, green solutions have high capex, which is unattractive because energy is cheap in Saudi Arabia. If you use free cooling, there are limited hours, plus the climate is sandy, which renders maintenance costs high. So the total cost of ownership for a green solution is not really justifiable from an economic perspective, and the government so far does not have any regulations on carbon emissions and so forth.

Therefore, in the big cities, Riyadh, Dammam, and Jeddah, we focus primarily on reliability. Nevertheless, some customers still want to achieve LEED Gold.

What’s the future for Edarat?

We are expanding geographically and expanding our services portfolio. After design and building, the challenge is now in the operation. As you already know, human errors represent 70 percent of the causes for downtime. Customers are now seeking our help to provide consultancy in facility management, such as training, drafting SOPs, EOPs, capacity management, and change management procedures.


MICHAEL KALNY

Tell me a little about yourself.

I received an honors degree in electrical engineering in the 1980s and completed a postgraduate diploma in communications systems a couple of years later. In conjunction with practical experience working as a technical officer and engineer in the telecommunications field, I gained a very sound foundation on which to progress my career in the ICT space.

My career started with a technical officer position in a company called Telecom, the monopoly carrier operating in Australia. I was there about 14 years and worked my way through many departments within the company including research, design, construction and business planning. It was a time spent learning much about the ICT business and applying engineering skills and experience to modernize and progress the Telecom business. Around 1990 the Australian government decided to end the carrier monopoly, and a company by the name of Optus emerged to compete directly with Telecom. Optus was backed by overseas carriers Bellsouth (USA) and Cable and Wireless (UK). There were many new exciting opportunities in the ICT carrier space when Optus began operations. At this point I left Telecom and started with Optus as project manager to design and construct Optus’ national fiber and data center network around Australia.

Telecom was viewed by many as slow to introduce new technologies and services and not competitive compared to many overseas carriers. Optus changed all that. They introduced heavily discounted local and overseas calls, mobile cellular systems, pay TV, point-to-point high-capacity business network services and a host of other value-added services for business and residential customers. At the time, Telecom struggled to develop and launch service offerings that could compete with Optus, and a large portion of the Australian population embraced the growth and service offerings available from Optus.

I spent 10 years at Optus, where I managed the 8500-kilometer rollout of fiber that extended from Perth to Adelaide, Melbourne, Canberra, Sydney and Brisbane. I must have done a good job on the build, as I was promoted to the role of Field/National Operations manager to manage all the infrastructure that was built in the first four years. Maybe that was the punishment in some way? I had a workgroup of some 300-400 staff during this period and gained a great deal of operational experience.

The breadth of knowledge, experience and networks established during my time at Optus was invaluable and led me to my next exciting role in the telecommunications industry during 2001. Nextgen Networks was basically formed to fill a void in the Australian long-haul, high-capacity digital transmission carrier market, spanning all mainland capital cities with high-speed fiber networks. Leighton Contractors was engaged to build the network and maintain it. In conjunction with transmission carriage services, Nextgen also pre-empted the introduction and development of transmission nodes and data center services. Major rollouts of fiber networks, associated transmission hubs and data centers were a major undertaking, providing exciting opportunities to employ innovation and new technologies.

My new role within Leighton Telecommunications Group to support and build the Nextgen network had several similarities with Optus. Included were design activities and technical acceptance from the builder of all built infrastructure, including transmission nodes and data centers. During 2003 Nextgen Networks went into administration as the forecast demand for large amounts of transmission capacity did not eventuate. Leighton Telecommunications realized the future opportunities and potential of the Nextgen assets and purchased the company. Through good strategic planning and business management, the Nextgen business has continued to successfully expand and grow. It is now Australia’s third largest carrier. This was indeed a success story for the Leighton Telecommunications Group.

Metronode was established as a separate business entity to support the Nextgen network rollout by way of providing capital city transmission nodes for the longhaul fiber network and data center colo space. Metronode is now one of the biggest data center owner/operators in Australia, with the largest coverage nationally.

I’ve now been with the Leighton Telco Group for 12 years and have worked in the areas of design, development, project management and operations. Much of the time was actually spent in the data center area of the business.

For the last three years I have headed Metronode’s Engineering Group and have been involved in many exciting activities including new technology assessment, selection and data center design. All Metronode capital city data centers were approaching full design capacity a couple of years ago. In order to continue on a successful growth path for data center space, services and much improved energy efficiency, Metronode could no longer rely on traditional data center topologies and builds to meet current and future demands in the marketplace. After very careful consideration and planning, it was obvious that any new data center we would build in Australia would have to meet several important design criteria. These included good energy efficiency (sub 1.2 PUE), modular construction, quick to build, high availability and environmental sustainability. Formal certification of the site to an Uptime Institute Tier III standard was also an important requirement.

During 2011 I embarked on a mission to assess a range of data center offerings and technologies that would meet all of Metronode’s objectives.

So your telecom and fiber work set you up to be the manager of operations at Metronode, but where did you learn the other data center disciplines?

During the rollout of fiber for the Optus network, capital city “nodes” or data centers were built to support the transmission network. I was involved with the acceptance of all the transmission nodes and data centers, and I guess that’s where I got my first exposure to data centers. Also, when Nextgen rolled out its fiber network, it was also supported by nodes and data centers in all the capital cities.

I was primarily responsible for acceptance of the fiber network, all regenerator sites (basically mini data centers) and the capital city data centers that the fiber passed through. I wasn’t directly involved with design, but I was involved with commissioning and acceptance, which is where I got my experience.

Do you consider yourself to be more of a network professional or facilities professional?

A bit of both; however, I have a stronger affinity with the data center side of things. I’m an electrical engineer and relate more closely to the infrastructure side of data centers and transmission regeneration sites: they all depend on cooling, UPS, batteries, controlled environments, high levels of redundancy and all that sort of thing.

Prior to completion of any data center build (includes fiber transmission nodes), a rigorous commissioning and integrated services testing (IST) regime is extremely important. A high level of confidence in the design, construction and operation is gained after the data center is subjected to a large range of different fault types and scenarios to successfully prove resilience. My team and I always work hard to cover all permutations and combinations of fault/operational scenarios during IST to demonstrate resilience of the site before handing it over to our operations colleagues.

What does the Australian data center business look like? Is it international or dependent on local customers?

Definitely dependent on both local and international customers. Metronode is a bit different from most of the competition, in that we design, build, own and operate each of our sites. We specialize in providing wholesale colo services to the large corporations, state and federal government departments and carriers. Many government departments in Australia have embarked on plans to consolidate and migrate their existing owned, leased and “back of office” sites into two or more modern data centers managed and operated by experienced operators with a solid track record. Metronode recently secured the contract to build two new major data centers in NSW to consolidate and migrate all of government’s requirements.

Can you describe Metronode’s data centers?

We have five legacy sites that are designed on traditional technologies and builds comprising raised floor, chilled-water recirculation, CRACs on floor, under-floor power cabling from PDUs to racks and with relatively low-power density format. About three years ago, capacity was reaching design limits and the requirement to expand was paramount. I was given the task of reviewing new and emerging data center technologies that would best fit Metronode’s business plans and requirements.

It was clear that a modular data center configuration would provide significant capital savings upfront by way of only expanding when customer demand dictated. The data center had to be pre-built in the factory, tested and commissioned and proven to be highly resilient under all operating conditions. It also had to be able to grow in increments of around 800 kW of IT load up to a total of 15 MW if need be. Energy efficiency was another requirement, with a PUE of less than 1.2 set as a non-negotiable target.

BladeRoom technology from the UK was the technology chosen. Metronode purchased BladeRoom modules from the U.K. They were shipped to Australia and assembled on-site. We also coordinated the design of plant rooms to accommodate site HV intake, switchgear, UPS and switchboards to support the BladeRoom modules. The first BladeRoom deployment of 1.5 MW in Melbourne took around nine months to complete.

The exterior of the Metronode facility.

BladeRoom uses direct free air and evaporative cooling as the primary cooling system. It uses an N+1 DX cooling system as a backup. The evaporative cooling system was looked at very favorably in Australia mainly because of the relatively high temperatures and low humidity levels throughout the year in most capital cities.

To date, we have confirmed that free air/evaporative cooling is used for 95-98% of the year, with DX cooling systems used for the balance. Our overall energy usage is very low compared to any traditional type sites.

We were the first data center owner/operator to have a fully Uptime Institute-certified Tier III site in Australia. This was another point of differentiation we used to present a unique offering in the Australian marketplace.

In Australia and New Zealand, we have experienced many other data center operators claiming all sorts of Tier ratings for their sites, such as Tier IV, Tier IV+, etc. Our aim was to formalize our tier rating by gaining a formal accreditation that would stand up to any scrutiny.

The Uptime Institute Tier ratings appealed to us for many reasons, so we embarked on a Tier III rating for all of our new BladeRoom sites. From a marketing perspective, it’s been very successful. Most customers have stopped asking many questions relating to concurrent maintainability, now that we have been formally certified. In finalizing the design for our new generation data centers, we also decided on engineering out all single points of failure. This is over and above the requirement for a Tier III site, which has been very well received in the marketplace.

What does the one-line diagram of the BladeRoom electrical system look like?

The BladeRoom data hall comprises a self-contained cooling unit, UPS distribution point and accommodation space for all the IT equipment.

Support of the BladeRoom data hall requires utility power, a generator, UPS and associated switchgear to power the BladeRoom data hall. These components are built into what we call a “duty plant room.” The design is based on a block-redundant architecture and does not involve the paralleling of large strings of generators to support the load under utility power failure conditions.

A separate duty plant room is dedicated or assigned to each BladeRoom. A separate “redundant plant room” is part of the block-redundant design. In the case of any duty plant room failure, critical data hall load will be transferred to the redundant plant room via a pair of STS switches. Over here we refer to that as a block-redundant architecture. We calculate that we will achieve better than 5 nines availability.
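
For context on the “better than 5 nines” figure, an availability percentage translates directly into allowable downtime per year. The short sketch below shows the arithmetic; the targets listed are generic illustrations, not Metronode’s published figures.

```python
# Availability targets expressed as allowable downtime per year (illustrative).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

for label, availability in [("three nines", 0.999),
                            ("four nines", 0.9999),
                            ("five nines", 0.99999)]:
    downtime_minutes = (1 - availability) * MINUTES_PER_YEAR
    print(f"{label} ({availability:.5f}): ~{downtime_minutes:.1f} minutes of downtime per year")
```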

Our objective was to ensure a simple design in order to reduce any operational complexity.

What’s the current footprint and power draw?

The data center built in Melbourne is based on a BladeRoom data hall with a cooling module on either end, allowing it to support 760 kW of IT load.

To minimize footprint, we build data halls into a block, comprising four by 760 kW and double-stack them. This provides us with two ground floor data halls and two first floor data halls, with a total IT capacity of 3 MW. Currently, the Melbourne site comprises a half block, which is 1.5 MW; we’re planning to build the next half block by the end of this year. Due to the modularity aspect and the similar design, we simply replicate the block structure across the site, based on client-driven demand. Our Melbourne site has capacity to accommodate five blocks or 15 MW of IT load.

In terms of the half block in operation in Melbourne, we have provisioned about 1 MW of IT load to our clients; however, utilization is still low at around 100 kW. In general, we have found that many clients do not reach their full allocated capacity for some time, possibly due to being conservative about demand forecast and the time it takes for complex migrations and new installations to be completed.

What about power density?

We can accommodate 30 kW per rack. A supercomputer was recently installed that took up six or seven rack spaces. With a future power demand in excess of 200 kW, we are getting close to the 30 kW per rack mark.

And PUE?

At our new Melbourne site, we currently support an average IT load of 100 kW across 1.5 MW of IT load capacity, which works out to roughly 7% IT load, a very light load that would produce an unmentionable PUE in a traditionally designed site! Our monthly PUE is now running at about 1.5. Based on trending over the last three months (which have been summer months in Australia), we are well on target to achieve our design PUE profile. We’re very confident we’ll have a 1.2 annual rolling PUE once we reach 30% load; we should have a sub-1.2 PUE as the load approaches 100%.
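
The behavior Kalny describes, a PUE of about 1.5 at roughly 7% load trending toward 1.2 at 30% load and below 1.2 near full load, is what a simple fixed-plus-proportional overhead model predicts. The sketch below uses assumed overhead figures chosen only to roughly reproduce those numbers; they are not Metronode’s measured data.

```python
# Simple fixed-plus-proportional overhead model of PUE versus IT load.
# Overhead figures are assumptions chosen only to roughly reproduce the
# behavior described in the interview; they are not measured site data.

FIXED_OVERHEAD_KW = 35        # always-on plant, controls, lighting (assumed)
PROPORTIONAL_OVERHEAD = 0.15  # cooling/distribution losses per kW of IT load (assumed)

def pue(it_load_kw):
    """PUE = (IT load + facility overhead) / IT load."""
    overhead_kw = FIXED_OVERHEAD_KW + PROPORTIONAL_OVERHEAD * it_load_kw
    return (it_load_kw + overhead_kw) / it_load_kw

for load_kw in (100, 450, 1500):  # roughly 7%, 30% and 100% of the 1.5 MW half block
    print(f"IT load {load_kw:4d} kW -> PUE {pue(load_kw):.2f}")
```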

What got you interested in the ATD program and what have been its benefits?

During my assessment of new-generation data centers, there was some difficulty experienced in fully understanding the resilience that various data center configurations provided and comparing them against one another. At this point it was decided that some formal standard/certification would be the logical way to proceed, so that data center performance/characteristics could be compared in a like-for-like manner. A number of standards and best practices were reviewed, including those published by IBM, BICSI, AS/NZ Standards, TIA 942, UI, etc.; many of which quoted different rating definitions for each “tier.” The Uptime Institute tiering regime appealed to me the most, as it was not prescriptive in nature and yet provided an objective basis for comparing the functionality, maintainability, redundancy and fault tolerance levels for different site infrastructure topologies.

My view was that formal certification of a site would provide a clear differentiator between ourselves and competition in the marketplace. To further familiarize myself with the UI standards, applications and technical discussions with clients, an ATD qualification was considered a valuable asset. I undertook the ATD course nearly two years ago. Since then, the learnings have been applied to our new designs, in discussions with clients on technical performance, in design reviews with consultants, and for comparing different data center attributes.

I found the ATD qualification very useful in terms of assessing various designs. During the design of our Melbourne data center, our local design consultants didn’t have anyone who was ATD accredited, and I found that the designs presented did not always comply with the minimum requirements for Tier III, while in other cases they exceeded the Tier III requirements. This was another very good reason for completing the ATD course: to keep an eye on consultant designs.

Michael, are you the only ATD at Metronode?

I’m the only Tier accredited designer in Metronode. Our consultants have since had a few people accredited because it looked a little bit odd that the client had an ATD and the consultant didn’t.

What does the future hold?

We won a major contract with the NSW government last year, which involved building two new data centers. The NSW government objective was to consolidate around 200 existing “data centers” and migrate services into the two main facilities. They’re under construction as we speak. There’s one in Sydney and one just south of Sydney near Wollongong. We recently obtained Tier III design certification from Uptime for both sites.

The Sydney data center will be ready for live operations in mid-July this year and the site near Wollongong operational a couple of months later. The facilities in Sydney and near Wollongong have been dimensioned to support an ultimate IT capacity of 9 MW.

Metronode also has a new data center under construction in Perth. It will support an ultimate IT capacity of 2.2 MW, and the first stage will support 760 kW. We hope to obtain Tier III design certification on the Perth site shortly and expect to have it completed and operational before the end of the year.

The other exciting opportunity is in Canberra, and we’re currently finalizing our design for this site. It will be a Tier III site with 6 MW of IT capacity.

With my passion for sustainability and high efficiency, we’re now looking at some major future innovations to further improve our data center performance. We are now looking at new hydrogel technologies where moisture from the data hall exhaust air can be recycled back into the evaporative cooling systems. We are also harvesting rainwater from every square meter of roof at our data centers. Rainwater is stored in tanks and used for the evaporative cooling systems.

Plant rooms containing UPS, switchboards, ATS, etc. in our legacy sites are air conditioned. If you walked into one of these plant rooms, you’d experience a very comfortable 23 or 24 degrees Centigrade all year round. Plant rooms in our new data centers run between 35 and 40 degrees Centigrade, allowing them to be free air-cooled for most of the year. This provides significant energy savings and allows our PUE to be minimized.

We are now exploring the use of hot exhaust air from the data halls to heat the generators, rather than using electrical energy to heat engine water jackets and alternators. Office heating is another area where the use of data hall exhaust is being examined.


GERARD THIBAULT

How did you get started?

I had quite a practical education up to the point I undertook a degree in electronics and electrical engineering and then came to the market.
I worked on a number of schemes unconnected with data centers until the point that I joined CBRE in 1998, and it was working with them that I stumbled across the data center market in support of Global Crossing, looking for real estate across Europe trying to build its Pan European network. Within CBRE, I provided a lot of support to the project management of their data center/POP builds as well as consultancy to the CBRE customer base. When Global Crossing got their network established, I joined them to head up their building services design function within the Global Center, the web-hosting portion of Global Crossing. Together with a colleague leading construction, we ran a program of building five major data centers across the tier one cities with two more in design.

After the crash of 2001, I returned to CBRE, working within the technical real estate group. I advised clients such as HSBC, Goldman Sachs and Lehman Brothers about the design and programming and feasibility of potential data centers. So I had a lot of exposure to high capital investment programs, feasibility studies for HSBC, for instance, covering about £220 million of investment in the U.K. for a pair of mirrored data centers to replace a pair of dated facilities for a global bank.

During the period 2005-2006, I became aware of Digital Realty through a number of projects undertaken by our team. I actually left CBRE to work for Digital directly, becoming European employee #4, setting up an embryonic group as part of the REIT that is Digital Realty.

Since then, I’ve been responsible for a number of new builds in Europe, including Dublin and London, driving the design standards that Digital builds to in Europe, which included taking the U.S. guidelines requirements and adapting them to not only European units but also to European data center requirements. That, in a way, led to my current role.

Today, I practice less of the day-to-day development and am more involved with strategic design and how we design and build projects. I get involved with the sales team about how we try to invite people into our portfolio. One of those key tools was working with Uptime Institute to get ATD accreditation so that we could talk authoritatively in assessing customer needs for reliability and redundancy.

What are the European data center requirements, other than units of measure and power differences?

I think it’s more about the philosophy of redundancy. It’s my own view that transactions in the U.S. are much more reliant on the SLA and on how the developer or operator manages his own risk, compared to certification within the U.K. End users of data centers, whether it is an opex (rental) or build-to-own requirement, seem to exercise more due diligence for the product they are looking to buy. Part of the development of the design requirement was that we had to modify the electrical arrangements to be more of a 2N system and provide greater resilience in the cooling system to meet the more stringent view of the infrastructure.

Digital Realty Data Center, SW London

So you feel your U.K. customers are going to examine claims of redundancy more carefully, and they’re going to look for higher standards?

Yes. It’s a matter of where they see the risks. Historically, we’ve seen that in the U.K., and across Europe too, they’ve sought to eradicate the risks, whereas I think maybe because of the approach to rental in the U.S., the risk is left with the operator. DLR has a robust architecture globally to mitigate our risk, but other operators may adopt different risk mitigation measures. It seems to me that over the years people in the U.K. market want to tick all the boxes and ensure that there is no risk before they take on the SLA, whereas in the U.S., it’s left to the operator to manage.

How has the ATD program played into your current role and helped you meet skepticism?

I think one of the only ways people can benchmark a facility is by having a certain stamp that says it meets a certain specification. In the data center market, there isn’t really anything that gives you a benchmark to judge a facility by, aside from the Uptime Institute ATD and certification programs.

It’s a shame that people in the industry will say that a data center is Tier X when it hasn’t been assessed or certified. More discerning clients can easily see that a data center that has been certified in line with the specifications of the Uptime Institute will give them the assurance they need, compared to a facility where they have no visibility of what a failure is going to do to the data center service. Certification, particularly to the Uptime Institute guidance, is really a good way of benchmarking and reducing risk. That is certainly what customers are looking for. And perhaps it helps them sleep better at night; I don’t know.

Did the standards influence the base design documents of Digital Realty? Or was the U.S. version more or less complete?

I think the standards have affected the document quite a lot within Europe. We were the first to introduce the 2N supply right through from the medium voltage supply to the rack supply. In the U.S., we always operated at 2N UPS, but the switchgear requires the skills of the DLR Tech Ops team to make it concurrently maintainable. Additional features are required to meet the Uptime Institute standards that I’ve come to understand from taking the course.

I think we had always looked at achieving concurrent maintainability, but that might be by taking some additional operational risks. When you sit back and analyze systems using the Uptime philosophy, you can see that having features such as double valving or double electrical isolation gives you the ability to maintain the facility, not just by maintaining your N capacity, or further resilience if you have a Tier IV system, but in a safe and predictable manner.

We’ve often considered something concurrently maintainable on systems where a pipe freeze could be used to replace a critical valve. Now that might well be an industry-accepted technique, but if it goes wrong, the consequences could be very significant. What I’ve learned from the ATD process is how to regularize the approach to a problem in a system and how to make sure that it is fail-safe in its operation to either concurrent maintainability or fault tolerance standards, if you are at the Tier III or Tier IV level, respectively.

How does Digital deploy its European Product Development strategy?

What we tried to do through product development is offer choice to people. In terms of the buildings, we are trying to build a non-specific envelope that can adapt to multiple solutions and thereby give people the choice to elect to have those data center systems deployed as the base design or upgrade them to meet Tier requirements.

Within a recent report that I completed on product development, this approach has made it simpler and cheaper to bring a facility up to a Tier level. We don’t by default build everything to Tier III in every respect, although that’s changing now with DLR’s decision to gain a higher degree of certification. So far, we’ve pursued certification within the U.S. market as required, and more frequently on the Asia-Pacific new build sites we have.

I think that difference may be due to the maturity of each market. There are a lot of people building data centers who perhaps don’t have the depth of maturity in engineering that DLR has. So people are looking for the facilities they buy to be certified so they can be sure they are reliable. Perhaps in the U.S. and Europe, they might be more familiar with data centers; they can look at the design themselves and make that assessment.

And that’s why in the European arena, we want to offer not only a choice of system but also to improve higher load efficiencies. The aim is to offer chilled water and then outside air, direct or indirect, or Tier certified designs, all within the same building and all offered as a sort of off-the-shelf product from a catalog of designs.

It would seem that providing customers with the option to certify to Tier would be a lot easier in a facility where you have just one customer.

Yes, but we are used to having buildings where we have a lot of customers and sometimes a number of customers in one data hall. Clearly, the first customer that goes into that space will often determine the infrastructure of that space. You can’t go back when you have shared systems beyond what the specification was when the initial build was completed. It is complicated sometimes, but it is something we’re used to because we do deal in multi-customer buildings.

What is the current state of Digital’s footprint in terms of size?

Within the U.K., we now have properties (ten buildings) in the London metro area, which represents over 1.2 million ft2 of space. This is now our largest metro region outside of the U.S. We have a very strong presence south of central London near Gatwick airport, but that has increased with recently acquired stock. We have two facilities there, one we built for a sole client—it was a build-to-suit option—and then a multi-tenant one. Then we have another multi-tenant building, probably in the region of 8 MW in the southwest of London. To the north in Manchester, we have another facility that is fully leased, and it’s probably in the neighborhood of 4-5 MW.

We’re actually under way with a new development for a client, a major cloud and hosting provider. We are looking to provide a 10-MW data center for them, and we’re going through the design and selection process for that project at the moment.

That’s the core of what we own within the U.K., but we also offer services for design and project management. We actually assisted HSBC with two very, very secure facilities; one to the north of London in the region of 4.5 MW with a 30,000 ft2 raised floor base and another in the north of the U.K. with an ultimate capacity of 14.5 MW and a full 2N electrical system, approaching a Tier IV design but not actually certified.

The most recent of our projects that we finished in Europe has been the first phase of a building for Telefonica in Madrid. This was a project where we acted as consultants and design manager in a process to create a custom-designed Tier IV data center with outside air, and that was done in conjunction with your team and Keith Klesner. I believe that’s one of only nine Tier IV data centers in Europe.

Walk me through the Madrid project.

It’s the first of five planned phases in which we assisted in creating a total of seven data halls, along with a support office block. The data halls are approximately 7,500 ft2 each. We actually advised on the design and fit-out of six of these data halls, and the design is based on an outside-air, direct-cooling system, which takes advantage of the very dry climate. Even though the dry-bulb temperatures are quite high, Madrid, being at high altitude, gives Telefonica the ability to get a good number of free-cooling hours within the year, driving down their PUE and running costs.

Each of the data halls has been set to run at four different power levels. The initial phase is at 1,200 kW per data hall, but the ultimate capacity is 4.8 MW per hall. All of it is supported by a fully concurrently maintainable and fault-tolerant Tier IV-certified infrastructure. On the cooling side, the design was based on N+2 direct-air cooling units on the roof. Each unit is provided with a chilled water circuit for cooling in recirculation mode when outside-air free cooling is not available. There are two independent chilled-water systems in physically separate support buildings, separated from the main data center building.

The electrical system is based on a full 2N+1 UPS system with transformerless UPS to help the power efficiency and reduce the losses within the infrastructure. Those are based around the Schneider Galaxy 7000 UPS. Each of those 2N+1 UPS systems and the mechanical cooling systems were supported by a mains infrastructure at medium voltage, with a 2N on-site full continuous duty-rated backup generation system.

Is the PUE for a fully built out facility or partially loaded?

Bear with me, because people will focus on that. The approximate annualized PUEs, based on recorded data, were 1.25 at 100% load; 1.3 at 75%; 1.35 at 50%; and, not surprisingly, as you drop down the curve, about 1.5 at 25%.

What do you foresee for future development?

In the last few years, there has been quite a significant change in how people look at the data center and how people are prepared to manage the temperature parameters. Within the last 18 months I would say, the desire to adopt ASHRAE’s 2011 operating parameters for servers has been fairly uniform. Across the business, there has been quite a significant movement, which has been brought to a head by a combination of a lot of new cooling technologies going forward. So now you have ability to use outside air direct with full mechanical backup or outside air indirect where there is a use of evaporative cooling, but in the right climates, of course.

I think there is also an extreme amount of effort looking at various liquid-cooled server technology. From my standpoint, we still see an awful lot of equipment that wants to be cooled by air because that is the easiest presentation of the equipment, so it may be a few years before liquid rules the day.

There’s been a lot of development mechanically and I think we’re sort of pushing the limits of what we can
achieve with our toolkit.

In the next phase of development, there have got to be ways to improve the electrical systems’ efficiencies, so I think there is going to be huge pressure on UPS technology to reduce losses, and on different types of voltage distribution, whether that be direct current or elevated AC voltages. All these ideas have been around before but may not have been fully exploited. The key thing is the potential: provided you are not operating on recirculation for a lot of the time, with outside air direct you’ve got PUEs approaching 1.2 and closing in on 1.15 in the right climate. At that level of PUE, the UPS and the electrical infrastructure account for a significant part of the remaining PUE uplift, given the amount of waste that currently exists. So I think one of the issues going forward will be addressing the efficiency issues on the electrical side of the equation.

Kevin Heslin is senior editor at the Uptime Institute. He served as an editor at New York Construction News, Sutton Publishing, the IESNA, and BNP Media, where he founded Mission Critical, the leading publication dedicated to data center and backup power professionals. In addition, Heslin served as communications manager at the Lighting Research Center of Rensselaer Polytechnic Institute. He earned a B.A. in Journalism from Fordham University in 1981 and a B.S. in Technical Communications from Rensselaer Polytechnic Institute in 2000.