Fuel System Design and Reliability

Topology and operational sustainability considerations must drive fuel system design.

When Superstorm Sandy hit the Northeastern U.S., some data center owners and operators discovered that they had not treated their fuel systems as part of a mission-critical system. In one highly publicized story, a data center operator in New York City was forced to use a “fuel bucket brigade,” moving fuel by hand up 18 stories for 48 hours. The brigade’s heroics kept the data center online, but thoughtful design of the fuel topology would have led to better operational sustainability. Superstorm Sandy taught many other New York and New Jersey data center owners that fuel system planning is essential if critical power is to be maintained in an emergency.

Uptime Institute has observed that designers may not be well versed in fuel solutions for data centers. Possible explanations include the delegation of the fuel system design to a fuel or engine generator supplier and a focus on other systems. As a critical system, however, the fuel system requires the same consideration from data center designers and owners as the topology of any other data center subsystem, along with integration of Operational Sustainability principles to ensure high availability.

Tier Topology of Fuel Systems
Recall that the Tier Standard is based on the resiliency of the lowest-rated data center subsystem. The topology of the fuel system includes the fuel distribution paths and every fuel component, all of which must correspond to the Tier objective of the data center. For Tier III and Tier IV data centers, the fuel supply path, and potentially the return lines, must be either Concurrently Maintainable or Fault Tolerant. The fuel system components (pumps, manual valves, automated valves, control panels, bulk tanks and day tanks) must all meet either the Concurrently Maintainable or Fault Tolerant objective.

Determining whether a fuel system is Concurrently Maintainable or Fault Tolerant means evaluating it the same way as a chilled-water schematic or an electrical one-line drawing: start from a given point, often the bulk storage tanks, and methodically work through the paths and components to the fuel treatment supplies and day tanks. Removing each component and path in turn reveals points in the system where “N,” the needed fuel flow and/or storage, is unavailable.
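
This remove-and-recheck exercise can be captured in a simple checklist. The sketch below is a minimal illustration of the logic in Python; the pump names, capacities and the required flow "N" are hypothetical, and a real evaluation would also walk the valves, piping paths, control power and return lines.

```python
# Minimal sketch of the remove-one-component check described above.
# Component names, capacities and the required flow "N" are hypothetical;
# a real evaluation would also model valves, piping paths and return lines.

REQUIRED_FLOW_GPM = 20  # "N": the fuel flow the engine generators need (assumed)

# Parallel fuel pumps in an N+1 arrangement (capacities in gal/min)
fuel_pumps = {"pump_A": 20, "pump_B": 20}

def remaining_capacity(components, removed):
    """Capacity still available after one component is taken out of service."""
    return sum(cap for name, cap in components.items() if name != removed)

for pump in fuel_pumps:
    available = remaining_capacity(fuel_pumps, pump)
    verdict = "OK" if available >= REQUIRED_FLOW_GPM else "not Concurrently Maintainable"
    print(f"Maintain {pump}: {available} gal/min still available -> {verdict}")
```

Any component or path whose removal drops the available flow or storage below "N" is a gap in the Concurrently Maintainable (or, for failure scenarios, Fault Tolerant) objective.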

Tier Topology Fuel Requirement
The Uptime Institute Owners Advisory Committee defined 12 hours of fuel storage as the minimum starting point for Tier-defined data centers. The Tier Standard: Topology requires this minimum for all Tiers: 12 hours of runtime at “N” load while meeting the facility’s stated topology objective. Put another way, the fuel storage must be adequate to support the data center design load on engine generators for 12 hours while meeting the Concurrently Maintainable or Fault Tolerant objective. Exceeding the 12-hour minimum is an Operational Sustainability issue that requires careful analysis of the risks to the data center energy supply.
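
As a rough illustration of the 12-hour minimum, the arithmetic below sizes usable storage for a hypothetical "N" load; the specific fuel consumption figure is an assumption and should be taken from the engine generator manufacturer's fuel curve at the actual site load.

```python
# Rough sizing sketch for the 12-hour minimum at "N" load.
# The consumption rate is an assumed placeholder; use the engine generator
# manufacturer's fuel-consumption data for the actual design.

design_load_kw = 2000      # "N" load the engine generators must carry (assumed)
gallons_per_kwh = 0.07     # assumed diesel consumption at this load point
autonomy_hours = 12        # Tier Standard: Topology minimum

minimum_usable_storage_gal = design_load_kw * gallons_per_kwh * autonomy_hours
print(f"Minimum usable fuel storage: {minimum_usable_storage_gal:,.0f} gal")
# -> 1,680 gal for this hypothetical load; this must remain available with
#    the redundant tank(s) or path(s) out of service.
```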

Fuel Storage Topology
Many owners quote the amount of fuel on hand in terms of total capacity. Just as with engine generators, chillers and other capacity components, the true fuel storage capacity is evaluated by removing the redundant component(s). A common example is an owner’s claim of 48 hours of fuel when the configuration is two 24-hour tanks, which provides only 24 hours of Concurrently Maintainable fuel. The baseline is the amount of Concurrently Maintainable or Fault Tolerant fuel that is always available, not the best-case raw storage figure.
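
The tank-count accounting behind that example can be made explicit, as in the sketch below; the tank sizes are hypothetical, and the point is simply that the usable figure is what remains after the redundant tank is removed.

```python
# Sketch of Concurrently Maintainable fuel accounting: usable runtime is what
# remains with the redundant tank(s) out of service, not the raw total.

tank_runtime_hours = [24, 24]   # two bulk tanks, each sized for 24 hours at "N" load
redundant_tanks = 1             # tanks that must be removable for maintenance

raw_total = sum(tank_runtime_hours)
# Worst case: assume the largest tank(s) are the ones taken out of service.
usable = sum(sorted(tank_runtime_hours)[:len(tank_runtime_hours) - redundant_tanks])

print(f"Raw storage: {raw_total} h; Concurrently Maintainable storage: {usable} h")
# -> Raw storage: 48 h; Concurrently Maintainable storage: 24 h
```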

Common fuel system design discrepancies in Tier III and Tier IV systems include:

  • The fuel system valving and piping do not match the Concurrently Maintainable objective. The most common error is a single valve between two pumps or storage tanks in an N+1 configuration; that valve cannot be maintained without shutting down more than the redundant number of components.
  • The power to fuel pumps and fuel control panels is not Concurrently Maintainable or Fault Tolerant. A single panel or single ATS feeding all fuel pumps is a common example. The entire power path to fuel pumps and controls must be Concurrently Maintainable or Fault Tolerant.
  • Depending on the topology of the fuel system (constant flow, etc.), the return fuel path may be critical. If the fuel supply is pressurized or part of a continuously functioning return system from the day tanks to the bulk tanks, the return path must meet the Concurrently Maintainable or Fault Tolerant objective.
  • The fuel control system is not Concurrently Maintainable. Either the fuel control panel must be replicated or a manual method of fuel operations must be available during maintenance or replacement of the panel. Any manual method must be part of the designed and installed solution.

Other considerations include fuel cooling, the fault tolerance of fuel systems and fuel type.

The internal supply pumps of most engine generators continuously oversupply diesel to the engine generator and return hot diesel fuel to the day tank. During extended operations or high ambient temperatures, the day tank can overheat to the point that it reduces engine generator power or even causes thermal shutdown of the unit. There are two primary ways to prevent fuel overheating:

  • Recirculating day tank fuel to the underground bulk tank. This system utilizes the thermal inertia of the underground system with little or no bulk tank temperature change. Of course, the return path is a critical element of the fuel system and must meet all the same Tier criteria.
  • Specifying a fuel cooler as part of the engine generator package. Fuel coolers function well as long as their net power impacts on the engine generator are considered and the fuel cooler is confirmed to function at extreme ambient temperatures.

Tier IV requires autonomous response to failure, which encompasses the ability to detect a fault, isolate the fault and sustain operations with alternate systems. The simple test of Fault Tolerance for fuel is that “N,” or the needed amount of fuel for the data center, is available after any failure. Some failure scenarios of fuel systems include loss of power, leaks, failure of components or fire. Leak detection of fuel systems is crucial to Fault Tolerance. Fault detection may be achieved using a variety of methods, but the chosen method must be integrated with the system response to isolate the fuel leak. The isolation must limit fuel loss such that the ability of alternate systems to deliver “N” fuel is not impacted.

Tier IV also requires the compartmentalization of fuel systems. Compartmentalizing complementary systems within separate spaces limits the damage of a single catastrophic event to a single fuel system. This includes fuel lines, fuel controllers, fuel pumps and all types of fuel tanks.

Owners considering propane or natural gas to feed either gas turbines or more recent innovations such as fuel cells face additional fuel considerations.

Fuel must have completely redundant supply lines in order to be Concurrently Maintainable.

Tier topology views gas supply lines as a utility similar to water or electricity. Thus, as a utility, a minimum of 12 hours of gas is required to be stored on-site in a manner consistent with the site’s Tier design objective. The result is one or more large gas tanks for storage. While Tiers allow this solution, not many jurisdictions have shown tolerance for large gas storage under local or national code. And, from an Operational Sustainability perspective, multiple large gas tanks could be an explosion hazard. Explosion risk may be mitigated but certainly is a major design consideration.

Fuel System Design Impacts to Operational Sustainability
Operational Sustainability defines the behaviors and risks beyond Tier Topology that impact the ability of a data center to meet its business objectives over the long term. Of the three elements of Operational Sustainability, the data center design team influences the building characteristics element the most. An improperly designed fuel system can have serious negative Operational Sustainability impacts.

Sustainability Impacts
Recent weather and disaster events have shown that circumstances more extreme than previously anticipated are being realized, and they must be planned for. Properly designed fuel storage can mitigate the operational risks related to long-term outages and fuel availability caused by disaster. Some questions to ask include:

  • What is the owner’s business continuity objective and corporate policy?
  • How will the data center operator manage the bulk fuel on hand?
  • What is the site’s risk of a natural disaster?
  • What is the site’s risk of a man-made disaster?
  • What fuel service can be provided in a local disaster, and can local suppliers meet fuel requirements?
  • What fuel service can be provided in a regional emergency, and within what time period?

The answers to questions about fuel storage requirements can come from dialogue with the data center owner, who must clearly define the site’s autonomy objective. The design team can then apply the Tier topology along with Operational Sustainability risk mitigation to create the fuel storage and delivery system.

Operational Sustainability is about mitigating risks through defining specific behaviors. Human error is one of the biggest risks to operations. To mitigate the human factor, the design of infrastructure, including fuel, must incorporate simplicity of design and operations. In essence, the design must make the fuel system easier to operate and maintain. Fuel distribution isolation valves and control systems must be accessible. Underground vaults for fuel systems should be routinely accessible and provide adequate space for maintenance activities. Fuel tank sight glasses located in readable locations can provide visual clues of infrastructure status, confirm fuel levels and give operators direct feedback before the start of maintenance activities.

Fuel treatment is not required by Tier topology, but it is certainly an important Operational Sustainability site characteristic. Diesel fuel requires regular management to remove water, algal growth and sediment to ensure that engine generators are available to provide power. Permanently installed fuel treatment systems encourage regular fuel maintenance and improve long-term availability. Local fuel vendors can help address local concerns based on diesel fuel and climate. Finally, the Institute encourages designers to integrate fuel treatment systems carefully to ensure that the Tier objective is not compromised.

The Institute has seen a growing use of engine generators in exterior enclosures with fuel systems located outside. Physical protection of these assets is an Operational Sustainability concern. With the level of investment made in data center infrastructure, designers must prevent weather and incidental damage to the fuel and engine generator systems. Protecting fuel lines, tanks and engine generators from errant operators or vehicles is paramount; bollards or permanent structures are good solutions. Fuel system design must consider Operational Sustainability to achieve high availability. Simplicity of design and consideration of operations can be the most effective long-term investment in uptime. Analysis of Operational Sustainability risks with appropriate mitigations will ensure uptime in an emergency.

Summary
Fuel systems must match the topology objective set for the entire facility. The chart outlines the benefits, drawbacks and Operational Sustainability considerations of different fuel solutions.

fuel solutions chart

Keith Klesner’s career in critical facilities spans 14 years and includes responsibilities ranging from planning, engineering, design and construction to start-up and ongoing operation of data centers and mission-critical facilities. In the role of Uptime Institute vice president of Engineering, Mr. Klesner has provided leadership and strategic direction to maintain the highest levels of availability for leading organizations around the world. Mr. Klesner performs strategic-level consulting engagements, Tier Certifications and industry outreach—in addition to instructing premier professional accreditation courses. Prior to joining the Uptime Institute, Mr. Klesner was responsible for the planning, design, construction, operation and maintenance of critical facilities for the U.S. government worldwide. His early career includes six years as a U.S. Air Force officer. He has a Bachelor of Science degree in Civil Engineering from the University of Colorado-Boulder and a Masters in Business Administration from the University of LaVerne. He maintains status as a professional engineer (PE) in Colorado and is a LEED-accredited professional.

2014 Uptime Institute Data Center Survey Results

Uptime Institute Director of Content and Publications, Matt Stansberry, delivers the opening keynote from Uptime Institute Symposium 2014: Empowering the Data Center Professional. This is Uptime Institute’s fourth annual survey; it looks at data center budgets, energy efficiency metrics, and adoption trends in colocation and cloud computing.

Diesel exhaust after treatment for data centers

Cleaning up your act: A guide to exhaust standards

By Lamont Fortune, PE, United Health Group

Are your diesel generators used strictly for emergency or non-emergency purposes? Your answer to this question will tell you just how clean your diesel exhaust has to be by 2015 (and as early as 2014) as far as the U.S. Environmental Protection Agency (EPA) is concerned. Of course, you may have already run into this issue.

The EPA is thinking about your diesel exhaust because of the air pollution regulations spelled out in the EPA Standards of Performance for Stationary Compression Ignition Internal Combustion Engines (CI ICE) from 40CFR (Code of Federal Regulations), Part 60, Subpart IIII. The EPA air pollution regulations are very extensive, including a completely separate standard for spark ignition engines. The air quality requirements for engine generator exhaust emissions regulated by this document were first declared effective on September 11, 2006. The regulations have undergone some revisions and updates since then due to industry objections. The last final revision was released on January 14, 2013. This six-plus-year period has seen active industry involvement in an attempt to better balance the overall environmental benefit with regulation adoption and enforcement.

CI ICE in the Data Center Arena
EPA 40CFR, Part 60, Subpart IIII covers CI ICE powered by liquid, gaseous, and solid fuels and built from 2007 forward. This particular article, however, focuses strictly on the compliance needs of diesel-fueled engines. EPA 40CFR, Part 60, Subpart IIII sets up four primary engine characteristics to determine what emission source restrictions must be observed and when. These characteristics are as follows:
1. Classification
2. Model year manufactured (not shipped or delivered)
3. Cylinder displacement
4. Horsepower

These characteristics are part of an elaborate, layered matrix of tiered compliance categories designed to guide the industry’s implementation of these air quality standards from day one in 2007 through January 1, 2015.

In turn, these standards included a targeted and continual incremental reduction of allowed emissions of four pollutant groups during engine operation. The lowest and final permitted discharge levels take effect on January 1, 2015.

Now in its seventh year of implementation, the current (2013) compliance program has evolved so that Tier 4 Interim and Tier 4 Final are the only remaining categories. The four targeted pollutant groups are:
1. Nitrogen oxides
2. Particulate matter
3. Carbon monoxide
4. Non-methane hydrocarbons

Both engine manufacturers and end users are responsible for working together to achieve operational compliance. Manufacturers have to build increasingly clean burning engines; end users have to purchase the correct EPA Tier-certified engine to install (either EPA Tier 4 Interim or EPA Tier 4 Final at this point) and the appropriate engine exhaust after treatment (see Figure 1).

Emissions control is a joint effort of the end user and the engine generator vendor.

Since no single after treatment effectively reduces all four pollutant groups, a combination of methods is needed to meet mandated targets. Consequently, the only way EPA Tier 4 Final compliance can be verified is by field testing the generator and its after-treatment system. As a side note, all of these requirements for diesel engine generators are applied in conjunction with the current use of ultra-low sulfur diesel fuel (less than or equal to 15-ppm sulfur content) as part of the EPA’s multi-pronged approach to achieving clean air.

So now that you know about the EPA’s emission limitations for CI ICE, how do they apply to your situation? Let’s go back to the opening question to start to answer that: Are your diesel generators used strictly for emergency or non-emergency purposes?

Emergency or Non-Emergency?
If your generators are used solely for emergency purposes, 40CFR, Part 60, Subpart IIII does not apply. In this case, emergency means that your generators are operated for emergency backup power only when there is an electrical utility outage. Therefore, generators installed for fire and life-safety purposes also fall within the emergency designation. In addition, generators used to power critical networks and equipment can qualify as long as they are operated only during times of electric utility loss. And fire and flood pump applications also fall squarely in the emergency status camp, whether powered by engine generator or direct engine drive.

If, however, the chosen operational strategy is to run CI ICE for any other purpose when utility power is available (such as storm avoidance), then non-emergency status applies, and the application must comply with 40CFR, Part 60, Subpart IIII. This status also applies if there is a standing financial incentive or arrangement (peak shaving, demand response, rate curtailment, continuous base loading, etc.) for a generator operator to supply power into an electric utility grid for the grid owner’s benefit.

Regardless of whether the engine generators involved are permanently installed or rental units for temporary use (i.e., non-road mobile), 40 CFR, Part 60, Subpart IIII still applies.

EPA regulations recognize the practical need to maintain the reliability of the backup system, including emergency utility requests to disconnect from the grid to sustain grid stability and avoid brownouts, blackouts and utility islanding. The EPA does allow emergency equipment to be run for system maintenance, exercising and testing purposes, up to 100 hours per year. Non-emergency equipment subject to 40CFR, Part 60, Subpart IIII regulations can typically be run up to 50 hours per year for these purposes. Allowable runtimes, though, may be less than these numbers, depending on the local air district authority having jurisdiction (AHJ).
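
The classification and runtime allowances described above can be summarized in a simple decision sketch. This is only an illustration of the logic as presented in this article, not legal guidance; classification and hour limits must be confirmed against the regulation and the local air district AHJ.

```python
# Simplified sketch of the emergency/non-emergency logic described above.
# Illustration only; confirm status and hour limits with 40CFR, Part 60,
# Subpart IIII and the local air district AHJ.

def classify_engine(runs_only_during_utility_outage: bool,
                    supplies_grid_for_financial_benefit: bool) -> str:
    """Return the status suggested by the criteria discussed in this article."""
    if runs_only_during_utility_outage and not supplies_grid_for_financial_benefit:
        return "emergency"
    return "non-emergency"

# Typical annual maintenance/exercising allowances cited above (hours per year)
MAINTENANCE_HOURS_LIMIT = {"emergency": 100, "non-emergency": 50}

status = classify_engine(runs_only_during_utility_outage=True,
                         supplies_grid_for_financial_benefit=False)
print(status, MAINTENANCE_HOURS_LIMIT[status])   # -> emergency 100
```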

Figure 2 shows the EPA Tier 4 compliance timeline for various sizes of non-emergency engine generators. Please also understand that once an after-treatment system has been successfully installed and commissioned, detailed system operating records need to be maintained from that point forward. Besides being useful from a practical operating perspective, these records document ongoing compliance with EPA’s air quality standards.

EPA compliance timeline.

After-Treatment Systems
After-treatment systems basically come in either proprietary prepackaged forms or as custom-located and interconnected components. Either way, the prevailing design approach is that each engine has its own dedicated after-treatment system. These use a reactant-based treatment in combination with stationary catalyst elements. The reactant is typically a 32.5-wt% urea solution (known generically in the U.S. as diesel exhaust fluid or DEF and in Europe as AdBlue or AUS 32), a 40-wt% solution, or an ammonia solution. While ammonia can be used as a reactant, urea is preferred because it is much less toxic, easier to transport and store and simpler to permit.

Platinum, palladium, copper zeolite, iron zeolite, and vanadia are common catalysts. The contact geometries for the catalyst elements are often of a honeycomb type but vary by system manufacturer. Catalyst elements are not intended to be thought of or act as particulate filters.

Each of the major engine manufacturers is currently promoting its own after-treatment system to accompany and mate with its respective engine products, apparently in an effort to keep after-treatment packages in as compact an arrangement as possible to minimize the floor and/or cubic space required to house them. Nonetheless, be prepared to need a significant amount of room in addition to that normally used by the engine generator. Serviceability and maintainability also become important design issues to consider when evaluating the various options for after-treatment systems.

Figure 3 is a diagram of a fairly generic urea after-treatment system offered by one major supplier of such systems. Note that the preliminary treatment equipment is shown without the external insulation that will be required for safety and operational efficiency reasons for a final installation.

After-treatment system diagram

Exhaust gases from an EPA Tier 4-certified engine are directed through a pre-oxidation catalyst section to a reactant mixing section and then finally on to a selective catalytic reduction (SCR) unit before being discharged to the outside atmosphere. The precious metal catalyst materials in the pre-oxidation unit significantly reduce carbon monoxide, non-methane hydrocarbons and other hazardous air pollutants from the exhaust, including up to 50% of the particulate matter. Urea then gets injected into and thoroughly mixed with the pre-oxidation output gases after the temperature of these gases rises to the 300-400°C (572-752°F) range, before subsequent introduction into the SCR unit. This urea activation temperature is needed to actively regenerate the SCR catalyst as well as incinerate any accumulated ash. Passive and slower rate regeneration, however, occurs before this activation temperature is reached. (Note: Some SCR systems activate urea injection at about 200°C [392°F].) And lastly, the SCR unit is where the bulk (up to 90%) of the remaining nitrogen oxides is removed before final discharge to the atmosphere.
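
As a back-of-the-envelope illustration of those staged reductions, the sketch below applies the removal fractions quoted above to an arbitrary inlet stream. The inlet values are hypothetical, and actual removal efficiencies depend on the catalyst package, tuning and exhaust temperature.

```python
# Back-of-the-envelope sketch of the staged reductions described above.
# Inlet values are arbitrary units; removal fractions are the "up to" figures
# quoted in the text and will vary with the actual catalyst package and tuning.

inlet = {"particulate_matter": 100.0, "nitrogen_oxides": 100.0}

after_pre_oxidation = {
    "particulate_matter": inlet["particulate_matter"] * (1 - 0.50),  # up to 50% PM removed
    "nitrogen_oxides": inlet["nitrogen_oxides"],                     # NOx handled downstream
}

after_scr = {
    "particulate_matter": after_pre_oxidation["particulate_matter"],
    "nitrogen_oxides": after_pre_oxidation["nitrogen_oxides"] * (1 - 0.90),  # up to 90% NOx removed
}

print(after_scr)  # -> {'particulate_matter': 50.0, 'nitrogen_oxides': 10.0}
```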

The remaining components shown (reactant storage tank, booster pump, air compressor, dosing box and SCR controller) all support the correct operation of the overall after-treatment system. They are the dynamic parts of the system, whereas the pre-oxidation and SCR units are the static components. The SCR controller serves as the brains of the system, directing the urea injection rates necessary to achieve the required discharge emission targets.

One side benefit of the after-treatment system is that it provides acoustical attenuation of the engine-generator noise that has normally been handled by an exhaust silencer. In fact, an acoustical analysis could show that a traditional silencer is not needed because the after-treatment system will fulfill the necessary sound reduction needs.

An After-Treatment System Example
The next set of figures (See Figures 4-10) depicts an actual installation of the generic after-treatment system previously described for a 2.725-megawatt (MW) generator driven by a 4005-HP prime-rated engine. They also convey some sense of the amount of additional cubic space required to accommodate such a system.

The surfaces of the catalyst modules that the exhaust gases flow across in this section (See Figure 4) should be visually checked for ash deposits every few years depending on the intensity of engine use during that time. These catalyst modules can be removed, swept clean of ash (using appropriate safety gear and associated practices), re-oriented to expose clean surfaces and then reinstalled. There is no need to replace the catalyst modules themselves.

Oxidation catalyst housing

Although the mixing section itself (See Figure 5) is basically a static unit with no moving parts, it is specifically engineered from a dimensional perspective to promote maximal mixing of injected urea with the engine exhaust gas stream given the engine’s discharge flow parameters. The combination of flow turbulence produced by the static mixers and atomized urea injection into the exhaust flow are integral to producing thorough mixing.

Illustration shows the urea injection and mixing section.

Figure 6 gives a more detailed view of the urea-dosing box and how it interfaces with the point of atomized urea spray injection. And although the various equipment components of a generic after-treatment system can be custom located, there can be limits on the maximum distances between components. Figure 6 shows one such limit: the maximum distance between the dosing box and the urea injection point.

The dosing box is a critical part of the mixing process.

The urea dose rate adjustment occurs in the dosing box, based on control feedback from the SCR controller. The compressed air feed is responsible for producing a very fine atomization of the liquid (or aqueous) urea flow for optimal injection and mixing efficiency. Both flows must work together for effective performance.

The heat of the mixed exhaust stream then evaporates the urea content and causes it to decompose into ammonia before this exhaust stream enters the SCR housing.

The urea supply line in this case is heat traced and insulated to ensure reliable flow operation during engine runtimes for the coldest expected winter days. Maintaining urea quality and fluidity within its recommended temperature range is critical to keep the injection spray nozzle from clogging with crystalline deposits and, thus, preventing adequate after treatment.

The final key after-treatment step occurs next within the SCR housing unit (see Figure 7), which is where the nitrogen oxide content of the exhaust stream gets knocked down to levels that comply with the EPA’s Tier 4 mandate. The ammonia formed from the evaporated urea, intimately mixed with the exhaust, chemically reacts over the catalyst elements within the SCR housing to convert the remaining nitrogen oxides into mostly nitrogen (N2), CO2 and water for discharge to the atmosphere. Some minute amounts of ammonia (called ammonia slip) may also sneak through the SCR housing and exit with the intended discharge products. Proper tuning of the urea dosing to the actual installed system performance conditions, however, will minimize ammonia slip issues.

As a reminder, the SCR housing shown in Figure 7 is for a 2.725-MW system. The installed unit with final insulation for this size system requires a 7-foot clearance from the underside of the supporting structure. Because of the width of the housing, a fire sprinkler head is required underneath. This head and accompanying sprinkler piping in turn take up even more height. Consequently, careful planning, unit placement, and overall adequate room height are needed for a successful installation that can be safely accessed and serviced.

Components have maximum and minimum clearances.

SCR technology, by the way, has been successfully used for nitrogen oxide removal for decades, having started out in central power plant applications for the exhausts of gas turbines and reciprocating engines. Therefore, this technology will likely continue to be around for a long time to meet future emissions standards as well.

Figure 8 illustrates the types of operating characteristics monitored for the SCR unit. These include the differential pressure across the SCR housing, inlet and outlet temperatures and a continuous sampling port for the final exhaust gas. They each have status indication connections with the SCR controller so the controller can regulate the upstream urea dosing and overall system operation to provide the required after-treatment results. All things considered, SCR unit monitoring is relatively straightforward.

Key operating parameters to be monitored include differential pressure, inlet and outlet temperatures and final exhaust gas quality.

Figure 9 shows a room full of SCR controllers. This is where the technical complexity of the after-treatment system lies. Though there is a sample port in the discharge of the SCR unit, the actual gas sampling operation happens within the SCR controller. Each controller housing contains a nitrogen oxide comparison cell, small pumps, fans, fine filters and small cooling equipment, as well as a touchscreen-activated programmable controller for each after-treatment system. The controller handles operating set points, status monitoring, performance parameter indications, diagnostics, alarm generation, maintenance alerts and other historical data records.

A room full of SCR controllers.

Figure 10 shows a large-scale central urea storage and distribution system. Using high-quality aqueous urea (solid urea dissolved in water) and sustaining its quality throughout its storage life are essential for effective after-treatment system performance. To start, the solid urea used to make aqueous urea needs to be of commercial- or technical-grade purity (99.45% pure). Next, the water used to make 32.5-wt% aqueous urea solutions (40 wt% is also used for some systems) also has to be ultra clean, such as the product of reverse osmosis or ion exchange cartridges. These purification requirements remove as many potential catalyst poisons as possible. The list of catalyst poisons contains at least 31 elements, most of them heavy metals and compounds containing silicon (Si), phosphorus (P), arsenic (As), antimony (Sb), sodium (Na), and zinc (Zn). Fortunately, ready-made aqueous urea solutions meeting the necessary purification standards can be purchased commercially for nominally around US$3 per gallon, often from a fuel oil supplier.

Things to Know About Aqueous Urea
Make sure you and your urea solution supplier use caution in transporting and handling this solution to avoid any contamination from its exposure to other materials or environments. Once the solution is safely stored on-site, the next task is to keep it between 32-77°F (0-25°C). At 32°F (0°C) and lower, the solution will begin to stratify and form precipitates, which will form deposits in the storage tanks. In addition, the solution concentration will become inconsistent, and the after treatment will no longer perform reliably. Aqueous urea freezes and crystallizes at 12°F (-11°C), and solution temperatures above 77°F (25°C), and especially above 86°F (30°C), accelerate solution breakdown. These temperature restrictions define the environmental conditions required for effective storage and distribution of aqueous urea.

A large-scale central storage and distribution system.

The materials within the urea storage and distribution system likewise play an important role in helping to maintain solution purity. For example, stainless steels (304, 304L, 316, and 316L), polyethylene, polypropylene and polyisobutylene are recommended for direct contact applications. Other plastics such as PFE, PTFE, PFA and PVDF are also acceptable. All plastics considered should be free of additives. On the opposite end of the compatibility spectrum, carbon steel, galvanized steel, copper, copper alloys, aluminum, lead and zinc are not recommended for direct contact uses. Solders containing silver, lead, zinc and copper are also not recommended.

Urea as a solid material is available in different grades and is used in fertilizers, cosmetics, food additives and industrial processes. Aqueous urea can also be used as a fertilizer, which may be a disposal option depending on local regulations. It is a clear and colorless liquid with a specific gravity of 1.09. It is non-flammable and readily biodegradable, with no known significant hazardous properties, even at a pH between 9.0 and 9.8. Nonetheless, personnel handling urea should use goggles and rubber gloves.

So what is a reasonable shelf life for aqueous urea? Figure 11 is a table that shows shelf life as a function of storage temperature. As the storage temperature rises, the expected urea shelf life drops. This inverse relationship (roughly a six-month shelf life change for every 9°F [5°C] change) is a key reason why prolonged storage above 77°F (25°C), and especially above 86°F (30°C), is not a recommended practice.

Urea shelf life as a function of temperature.
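
A rough way to reason about the values in Figure 11 is the linear rule of thumb described above. The 18-month baseline at 77°F used below is an assumption for illustration only; use the Figure 11 table or the urea supplier's data sheet for actual planning.

```python
# Rough shelf-life rule of thumb from the relationship described above:
# roughly six months of shelf life lost for every 9 degrees F of storage
# temperature above the baseline. The 18-month/77 F baseline is an assumed
# placeholder; use Figure 11 or the supplier's data for actual planning.

def estimated_shelf_life_months(storage_temp_f: float,
                                baseline_temp_f: float = 77.0,
                                baseline_months: float = 18.0) -> float:
    months = baseline_months - 6.0 * (storage_temp_f - baseline_temp_f) / 9.0
    return max(months, 0.0)

for temp_f in (68, 77, 86, 95):
    print(f"{temp_f} F: about {estimated_shelf_life_months(temp_f):.0f} months")
# -> 68 F: 24 months, 77 F: 18 months, 86 F: 12 months, 95 F: 6 months
```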

Another good shelf-life management practice is to periodically sample and test the urea solution quality. ISO 22241 probably comes closest to being the national or international sampling standard. That said, the following would be a recommended monitoring practice:

1. Determine the alkalinity (as NH3) of the initial (or latest) as-received shipment of aqueous urea as the starting baseline indicator of quality for that shipment.

2. Choose a time interval (six months, three months, monthly, etc.) for urea quality control checks based on expected usage.

3. Trend subsequent quality check results by continuing to measure urea alkalinity of each sample as NH3 to track the change of this value.

4. Use ISO 22241-2 parameters as the complete set of reference characteristics for judging urea quality conditions.

Lessons Learned
At this point, we have covered the basic design and installation of an engine exhaust after-treatment system and maybe a smidge of operational considerations. Almost two years of actual operating experience, however, has brought me additional insight into predictive factors.

After-treatment systems have to reach internal temperatures of nominally 600°F (316°C) before any urea solution injection will be initiated. With an installation whose engines produce 817°F (436°C) exhaust temperatures only when they are running flat out at 100% load, a single exercising engine could activate urea use with relative ease by powering a load bank set for about 70% or higher engine loading. However, if the installation involves multiple engines running in a redundant configuration, hitting the initiation temperature becomes much harder without some clever manipulation (see the sketch following the list below). And even clever manipulation may not be enough to achieve an activation scenario, depending on the active mix of the following:

1. Overall building load (move-in, fully loaded, in-between)
2. Minimum number of generators needed to run to keep total harmonic distortion on the building electrical distribution system within acceptable limits
3. Load bank conditions (permanent, temporary and capacity relative to a single generator, etc.)
4. The various operating configurations employed (i.e., emergency, non-emergency, maintenance, exercising, etc.)
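
The sketch below shows why, in a highly redundant configuration, per-engine loading tends to stay below the threshold needed to initiate urea injection. The building load is hypothetical; the 2.725-MW unit rating and the roughly 70% loading threshold are the figures cited in this article for one installation.

```python
# Sketch of per-engine loading vs. the ~70% loading this installation needed
# before urea injection would initiate. The building load is hypothetical.

ACTIVATION_LOAD_FRACTION = 0.70     # per-engine loading cited above
UNIT_RATING_KW = 2725               # one 2.725-MW engine generator

def per_engine_load_fraction(building_load_kw: float, engines_running: int) -> float:
    return building_load_kw / (engines_running * UNIT_RATING_KW)

building_load_kw = 6000             # assumed demand at a point in time
for engines in (3, 4, 5):           # e.g., fewer vs. more redundant units online
    fraction = per_engine_load_fraction(building_load_kw, engines)
    injects = fraction >= ACTIVATION_LOAD_FRACTION
    print(f"{engines} engines running: {fraction:.0%} load each -> urea injection: {injects}")
# -> 3 engines: 73% each (True); 4 engines: 55% (False); 5 engines: 44% (False)
```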

The design expectation based on manufacturer technical literature for effective after treatment was for six gallons of urea to be used for every 100 gallons of fuel oil. The operating reality is that this ratio is more like 1.9 gallons of urea per 100 gallons of fuel oil, or about 32% of the original prediction. This reality raises the following issues for further scrutiny:

1. The original design numbers were based on engines running at 100% load. In a highly redundant normal operating configuration (N+2 minimum), each engine generator operates partly loaded, with that partial load depending on the actual overall demand load at a point in time and the number of engines running. That demand load can shift depending upon the operating scenario. Therefore, the original design assumptions not only did not match actual operating conditions, they grossly overestimated urea consumption for most of these scenarios, by roughly a factor of three.

2. This overestimation, in turn, translates into oversized urea storage facilities and the purchase of too much urea, at least initially. Having too much urea on-hand leads to storing urea to the limits of its expected shelf life or beyond.

3. The minimum individual engine loading to create hot enough exhaust temperatures to initiate urea injection is 70%. While this loading was first determined during the commissioning phase, full realization of its consequences, unfortunately, took much longer to arrive.

4. Engine-generator operation during monthly exercising activities requires a permanently installed load bank to ensure urea injection. Of the several operating configurations employed, this is the only one in which urea injection can reliably be achieved.

5. Exercising an engine under no-load conditions for any length of time usually results in incomplete combustion within the engine. This, in turn, causes the discharge of partially combusted fuel oil by-products into the exhaust and, therefore, the downstream after-treatment system. The fouling in this situation (affectionately called slobber by some) is certainly not good for the internal surfaces (including the precious metal impregnated ones) of the after-treatment system.

6. Operating records indicate that the normal non-exercise operation scenario typically imposes a 45% load on each engine generator, well below that required to initiate urea injection. This, unfortunately, is inherent to running a highly redundant configuration.

7. There seems to be no current consensus on what should be done if normal operation of an engine-generator system does not produce hot enough exhaust temperatures to initiate urea injection into its after-treatment system. Will enforcement of 40CFR, Part 60, Subpart IIII be such that owners should expect frequent assessment of penalty fees for non-compliance? Will such no-urea-injection operating results mean that backup systems cannot be legally operated? Will lower-than-needed activation temperatures for urea injection be acceptable? Will the Best Available Control Technology (BACT) standard be invoked by the EPA to address this apparent disconnect between after-treatment functional goals and the BACT to allow additional time to come up with a workable solution? Will backup power requirements needed to meet Uptime Institute Tier Standards be affected?

So caveat emptor (buyer beware), especially if a highly redundant engine-generator system is involved. A new design and operation challenge is now before the mission critical data center community!

Cost Awareness
The installed cost of the after-treatment system described in this article for ten 2.725-MW units was about US$4.5 million, with only US$2,500 spent on urea in the first year of operation. Contracted maintenance service costs, however, are much more substantial, as shown in Figure 12.

Contracted maintenance service costs for a typical system comprising ten 2.725-MW units.

These costs are based on an annual maintenance visit and a more thorough visit in alternating years. They demonstrate the importance of careful consideration of maintenance costs.

NESHAP
Now that the EPA requirements for engines made in 2007 and after have been discussed, attention should be turned to those engine-generator installations with engines manufactured in 2006 and earlier. These installations do not get to avoid EPA regulations. They just have to comply with another long-named reference standard instead (40CFR, Part 63, Subpart ZZZZ). These regulations are known as the EPA National Emission Standards for Hazardous Air Pollutants for Reciprocating Internal Combustion Engines (or NESHAP for RICE for short).

Though less stringent than 40CFR, Part 60, Subpart IIII, NESHAP still will likely require the installation of an oxidation catalyst unit on the exhaust of each engine to meet carbon monoxide (CO) emission targets that are newer and much stricter than those that previously existed. In fact, there are already combination silencer and oxidation catalyst housings available that can often be installed in the same location as existing silencers to minimize retrofit problems. These CO targets vary according to engine horsepower rating, hazardous air pollutant source potential-to-emit (either Area or Major), and the same Emergency and Non-Emergency engine classifications as previously described for 40CFR, Part 60.

Conclusion
Backup diesel engine generators are integral to most modern data centers. And engine-generator use varies depending on the business mission, criticality, and sophistication of a particular data center. The EPA and regional air quality districts have long since pegged engine generators as unfavorable but necessary sources of air emissions, typically restricting their annual runtime allowances to 100 hours or less.

The individual impact of this campaign for a data center depends on how the data center uses its engine generator(s). If the use is for emergency purposes only, then the prevailing emission standards are less restrictive. If nothing else, an oxidation catalyst installation may (or may not) be in your near future. Many new and emerging data center projects are claiming emergency status to avoid the added costs and complexities of installing exhaust after-treatment systems.

Nonetheless, if intended generator use does include non-emergency use, then expect to get very familiar with exhaust after-treatment systems. The content of this article is meant to assist that familiarization so your air emissions act can get appropriately cleaned up. Uptime Institute Tier III and higher projects are more likely candidates for falling into the non-emergency category.

In conclusion, currently operating data centers and new data centers due to come online need to understand their vulnerability to EPA and regional air quality district regulations. Evaluating this potential impact hinges on a host of factors discussed within this article, including some lessons learned from a real-life operating after-treatment system. While the technology for legitimate after-treatment systems is fairly mature, its use within the data center world is not. Therefore, let the growing pains begin—and with knowledgeable foresight and guidance, let them quickly be absorbed and abated. Uninterruptible uptime is about to get a lot cleaner.

Lamont Fortune, PE, is the lead mechanical engineer in the Data Center Facilities group within the Information Technology division of UnitedHealth Group (or UHG-IT). He has over 40 years of engineering experience in the facility infrastructure field involving water and wastewater treatment plants, high production printing, corporate offices, research laboratories, archival record storage and other technical operations. His last 20-plus years have been especially focused on mission-critical data center projects including their planning, design, project management, construction, commissioning and operations. These projects have ranged across the private, institutional, governmental and military arenas. He enjoys addressing the complexities of successfully sustaining uninterruptible uptime environments.

How business practices can impair data center availability

Don’t pit man against machine

By Charles Selkirk, ATD

While data centers in Southern Africa utilize world-class designs and construction techniques, ongoing operational sustainability models struggle to attract and retain sufficient qualified and motivated personnel, in part due to the lack of recognition of the importance of their work (unless something goes wrong!).

The Uptime Institute’s Tier models have wide recognition and acceptance in the region, and many of our data center customers have lately opted for Tier III designs. In most cases, customers require active-active A and B paths, both with UPS backup, due to power instability issues experienced throughout the region; however, the recently launched Operational Sustainability Standard has had less impact in this region and is only now starting to gain some traction. As a result, data center operators in Southern Africa seem to take one of two very different approaches, which we have labeled Availability-Centric and Safety-Centric.

We based this observation on a catalog of data center failure and near-miss events compiled through our work in building facilities and providing sustainable support of ongoing operations in the region (especially in South Africa), which caused us to examine how the customers’ focus on operator safety might affect availability and reliability. We wanted to test the general business perceptions that availability is paramount and that it can be assumed that operators and maintenance support personnel have the skills and motivation to meet their own safety needs.

Availability-Centric View
Whatever the agreed-upon availability level of a facility, businesses consider availability to be paramount. This view is founded on the belief that business will suffer in the event of any downtime, and availability cannot be compromised unless the circumstances are either unavoidable or dire. In high-availability facilities, owners may believe there is no need for downtime and that maintenance can be done only in strictly controlled time slots, with generally no tolerance for faults or errors. The issue of operator and maintenance personnel safety is rarely, if ever, raised or discussed. Without generalizing too much, this attitude is prevalent in the financial services and retail industries.

Safety-Centric View
In a progressive corporate culture, the safety of employees is paramount, and all business needs are undertaken with this understanding. This is not to say that availability is unimportant, only that “safety is king.” These businesses have better-informed engineering support, and without exception, they view accidents and other incidents as potentially significant losses that must be considered and minimized. This culture drives, empowers and enables designers and operators to value themselves and their workplaces – and, in our experience, it also leads to improved availability.

We have found this culture to be more prevalent within the resources and manufacturing industries, although some commercial and colocation operators have also adopted this perspective.

In the schedules below, we summarize 49 incidents that have occurred in data centers over a period of 13 years throughout Southern Africa. Luckily, none of these incidents led to any form of personal injury, although many of them led to outages that damaged reputations, sapped customer confidence and caused financial losses.

In the schedules, we list the centricity characteristic of the owners, the markets served, the availability levels, the presence of dual-cabinet feeds, a description of the root cause of the problem, and whether an outage occurred – plus generalized comments on the incidents reported. The assignment of a centricity characteristic label may appear somewhat subjective, but in our observation, the difference between the Safety-Centric and Availability-Centric perspectives is easily discernible.

Summary
We would make the following observations based on this study:

A. The study was limited to the 49 incidents reported. A wider study with a larger number of incidents may yield somewhat different results.

B. There has been a shift to higher-reliability facilities over the last decade, in accordance with the higher uptime requirements of customers.

C. Despite the shift to higher-availability facilities, it is worth noting that the number of downtime incidents reported by Safety-Centric customers remains significantly lower than the number reported by Availability-Centric customers.

D. There is a clear and significant trend indicating that refocusing data center operations and maintenance to a Safety-Centric focus has significant benefits to customers in terms of uptime experienced.

Results
Figure 1. In our study, half the incidents reported resulted in downtime of some form to the data center.

Figure 1: Outage Breakdowns

Figure 2. Clearly, a higher Tier-level design has the desired effect of reducing downtime.
*The availability assessments are the author’s subjective evaluations.

Figure 2: Outage breakdowns by availability

Figure 3. If we exclude downtime incidents at lower-availability facilities, the ratio adjusts. Three times as many incidents occur in facilities designed to meet Tier III certification as in Tier IV or similar facilities, in part because of the relative number of the lower-availability facilities.

Figure 3: Outages by Availability

Figure 4. Of the data center owners included in our report, 75% are clearly more focused on availability, while we would classify only 25% as Safety-Centric owners.

Figure 4: Breakdown by owners

Figure 5. We can see from this chart that the breakdown is largely in agreement with the breakdown by centricity type. If we exclude the incidents reported for lower-availability facilities, the incident breakdown remains largely unchanged.

Figure 5: Incidents

Figure 6. This figure illustrates the relative proportion of downtime incidents by centricity type.

Figure 6: Centricity Type

Figure 7. Of the total of 33 incidents recorded by Availability-Centric customers, 22 (67%) caused either partial or total outages. Of the total of 16 incidents recorded by Safety-Centric customers, just three (19%) caused either partial or total outages. In all three cases reported, the downtime incidents were the result of equipment failures in older facilities where maintenance practices and scheduling were not optimal. If we reexamine the proportion of downtime incidents when lower availability facilities are excluded, the incident breakdown is substantially similar.

Figure 7: Incidents with downtime

Figure 8. Of the 12 downtime incidents reported, 11 were recorded by Availability-Centric customers while only one was recorded by a Safety-Centric customer. That single failure was attributable to equipment failure in a legacy facility.

Figure 8: Downtime Incidents

The following tables list the 13 years of events the author used to draw the conclusions presented in this article.

 Table 1: Data Center Incidents

 Table 2: Data Center Incidents

 Table 3: Data Center Incidents

 Table 4: Data Center Incidents

 Table 5: Data Center Incidents

Charles Selkirk was born in Zimbabwe and grew up in Harare, completing high school in 1977. He served as a crime investigator in rural districts for the local police force and later became a circuit court prosecutor. Mr. Selkirk then followed his brother into electrical engineering, earning a degree from the University of Cape Town in 1984. In 1985, he married and moved to ‘The Reef’ near Johannesburg and started working on a deep-level gold mine, initially serving as a foreman and supervisor. He eventually became a section engineer in charge of engineering construction and maintenance operations, a position he held for five years.

Mr. Selkirk left the mining industry in 1989 and moved to Cape Town, where he joined his brother in an engineering consulting practice. In the early years of the firm, they consulted on a wide range of projects for building services. Mr. Selkirk’s brother emigrated to the U.S. in 2000 as the firm shifted its focus to specialize in data center MEP services, turnkey data center construction and running data centers. More recently, his two sons have joined the business – one as a site engineer and the other as a programmer.

Introducing Uptime Institute’s FORCSS System

A new process to compare IT deployment options across both in-house and outsourced alternatives

By Julian Kudritzki and Matt Stansberry, Uptime Institute

Uptime Institute FORCSS™ is an original system to capture, compare, and prioritize the impacts of the many IT deployment alternatives.

On an ongoing basis, enterprise organizations decide to deploy IT assets in an internal data center, colocation facility, hosting environment, or cloud solution. These decisions may not holistically consider the financial, risk, performance, or other impacts. FORCSS enables the management of an enterprise organization to identify, weigh, and communicate the advantages and risks of IT application deployment options using consistent and relevant criteria based on business drivers and influences.

Since its inception, the mission of the Uptime Institute has been to assist the enterprise in devising feasible and adaptable data center solutions that are responsive to the business. Successful solutions align data center design, technology selection, construction, and operation to achieve high reliability. One of the leading challenges today is deciding the most viable IT deployment option.

FORCSS helps the enterprise to overcome this challenge by focusing on the critical selection factors, thereby reducing or eliminating unfounded assumptions and organizational “blind spots.” FORCSS establishes a consistent and repeatable set of evaluation criteria and a structure to communicate the informed decision to stakeholders.

A coherent IT deployment strategy is often difficult because the staff responsible for IT assets and IT services across multiple geographies and multiple operating units are themselves spread over multiple geographies and operating units. The result can be a range of operating goals, modes, and needs that are virtually impossible to incorporate into a single, unified deployment strategy. And when a single strategy is developed from the “top down,” the staff responsible for implementing that strategy often struggles to adapt that strategy to their operational requirements and environments.

FORCSS was developed to provide organizations with the flexibility to respond to varying organizational needs while maintaining a consistent overall strategic approach to IT deployments. FORCSS represents a process a) to apply consistent selection criteria to specific deployment options, and b) to translate the outcome of the key criteria into a concise structure that can be presented to “non-IT” executive management.

The FORCSS system is composed of six necessary and sufficient selection factors relevant to an effective deployment decision. These six factors, or criteria, provide a holistic evaluation system, and drive a succinct decision exercise that avoids analytical paralysis. FORCSS identifies the relevant internal and external input.

And, by scaling the importance of the criteria within the system, FORCSS allows each organization to align the decision process to organizational needs and business drivers.
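
FORCSS itself does not prescribe a particular arithmetic in this overview, but a minimal sketch of how an organization might scale the six factors when comparing options follows. The weights and 1-5 scores are entirely hypothetical and would come from the organization's own business drivers and stakeholder input.

```python
# Minimal sketch of scaling the six FORCSS factors across deployment options.
# FORCSS does not prescribe this arithmetic; all weights and scores below are
# hypothetical placeholders supplied by the organization.

weights = {  # relative importance assigned by the organization (sums to 1.0)
    "Financial": 0.25, "Opportunity": 0.15, "Risk": 0.25,
    "Compliance": 0.15, "Sustainability": 0.10, "Service Quality": 0.10,
}

options = {  # hypothetical 1-5 scores per factor for each deployment alternative
    "In-house data center": {"Financial": 3, "Opportunity": 2, "Risk": 4,
                             "Compliance": 5, "Sustainability": 3, "Service Quality": 4},
    "Colocation":           {"Financial": 4, "Opportunity": 4, "Risk": 3,
                             "Compliance": 4, "Sustainability": 3, "Service Quality": 4},
    "Cloud":                {"Financial": 4, "Opportunity": 5, "Risk": 2,
                             "Compliance": 3, "Sustainability": 4, "Service Quality": 3},
}

for name, scores in options.items():
    total = sum(weights[factor] * scores[factor] for factor in weights)
    print(f"{name}: weighted score {total:.2f}")
```

The purpose of such a sketch is only to show how scaling the criteria lets different organizations reach different, but equally defensible, conclusions from the same six factors.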

FORCSS Factors And Their Definitions

The Uptime Institute FORCSS system is a means to evaluate deployment alternatives. Accordingly, it is crucial to have a working knowledge of the tangible or intangible values associated with the application being deployed. Tangible values notably include revenues; intangible values include end-user satisfaction. To be effective and lasting, FORCSS must involve the stakeholder requesting the IT deployment. In other words, don’t lose sight of your client.

FORCSS TABLE

Financial

The fiscal consequences associated with deployment alternatives.

  • Net Revenue Impact: An estimation of gross profit margin: estimated revenues of the IT service or application minus the cost of ownership (a worked sketch follows this list).
  • Comparative Cost of Ownership: The identified differential cost of deploying the alternative plus ongoing operations and maintenance, including the incremental cost of scaling the alternative as the business grows. For example: Significant cost centers can include real estate development, MEP infrastructure, cost of financing, taxes, IT equipment, software licenses and customization, staffing, and service provider and consulting fees. The most definitive cost information for each alternative comes from a Total Cost of Ownership (TCO) accounting protocol, for those few companies that have the capability to reliably determine TCO. Differential and incremental costs are often more directly determined.
  • Cash and Funding Commitment: Representation of liquidity: cash necessary at appropriate intervals for the projected duration of the business service.
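
As a simple illustration of the Financial arithmetic, the hypothetical sketch below estimates Net Revenue Impact and Comparative Cost of Ownership for two invented alternatives. The figures and helper functions are assumptions for illustration only; FORCSS does not prescribe a specific formula or costing tool.

```python
# Hypothetical illustration of the Financial factor arithmetic.
# All figures are invented; FORCSS prescribes no specific formula or tool.

def net_revenue_impact(estimated_revenue, cost_of_ownership):
    """Estimated gross profit margin: revenues of the IT service minus cost of ownership."""
    return estimated_revenue - cost_of_ownership

def comparative_cost(deployment_cost, annual_opex, years, incremental_scaling_cost=0.0):
    """Differential cost of an alternative: deployment plus ongoing operations
    and maintenance, plus the incremental cost of scaling as the business grows."""
    return deployment_cost + annual_opex * years + incremental_scaling_cost

# Two invented alternatives evaluated over a three-year service life.
colo  = comparative_cost(deployment_cost=1_200_000, annual_opex=400_000, years=3)
cloud = comparative_cost(deployment_cost=150_000, annual_opex=700_000, years=3,
                         incremental_scaling_cost=90_000)

revenue = 5_000_000  # estimated revenue of the IT service over the same period
for name, cost in [("colocation", colo), ("cloud", cloud)]:
    print(f"{name}: cost of ownership {cost:,.0f}, "
          f"net revenue impact {net_revenue_impact(revenue, cost):,.0f}")
```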

Opportunity

A deployment alternative’s ability to fulfill compute capacity demand over time.

  • Time to Value: The time period from decision to IT service availability. The timeline must include the deployment schedules of IT, facilities, network, and service providers.
  • Scalable Capacity: Available capacity for expansion of a given deployment alternative.
  • Business Leverage and Synergy: Significant ancillary benefits of a deployment alternative outside of the specific application or business service.

For example: Improved economies of scale and pricing for other applications, or a site’s geographic location providing business benefits beyond the scope of a single application.

Risk

A deployment alternative’s potential for negative business impacts.

  • Cost of Downtime vs. Availability: Estimated cost of an IT service outage vs. the forecasted availability of the deployment alternative (a worked example follows this list).
  • Acceptable Security Assessment: Internal security staff evaluation of deployment alternative’s physical and data security.
  • Supplier Flexibility: Potential “lock-ins” from a technical or contractual standpoint.

For example: Rating situations as simple, difficult/costly, or impossible to negotiate regarding software, hardware, site, and service provider commitments.
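
The Cost of Downtime vs. Availability criterion lends itself to a simple expected-loss calculation. The sketch below uses assumed availability and outage-cost figures purely for illustration; FORCSS does not mandate this, or any particular, formula.

```python
# Hypothetical expected-outage-cost comparison for two deployment alternatives.
HOURS_PER_YEAR = 8760

def expected_annual_outage(cost_per_hour, forecast_availability):
    """Expected downtime hours per year and the resulting expected outage cost."""
    downtime_hours = (1 - forecast_availability) * HOURS_PER_YEAR
    return downtime_hours, downtime_hours * cost_per_hour

COST_PER_HOUR = 50_000  # assumed cost of one hour of IT service outage

# Assumed forecast availability for each alternative (illustration only).
alternatives = {"in-house data center": 0.9995,
                "colocation provider":  0.9999}

for name, availability in alternatives.items():
    hours, cost = expected_annual_outage(COST_PER_HOUR, availability)
    print(f"{name}: ~{hours:.1f} h/yr downtime, expected cost ~${cost:,.0f}/yr")
```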

Compliance

Verification, internal and/or third-party, of a deployment alternative’s compliance with regulatory, industry, or other relevant criteria.

  • Government: Legally mandated reporting obligations associated with the application or business service. For example: HIPAA, Sarbanes-Oxley, PCI-DSS.
  • Corporate Policies: Internal reporting requirements associated with the application or business service. For example: Data protection and privacy, ethical procurement, Corporate Social Responsibility.
  • Compliance & Certifications to Industry Standards: Current or recurring validations achieved by the site or service provider, beyond internal and governmental regulations. For example: SAS 70®, SSAE 16, Uptime Institute Tier Certification or M&O Stamp of Approval, ISO®.

Sustainability

Environmental consequences of a deployment alternative.

  • Carbon and Water Impact: Carbon and water use for a given site or service. For example: The Green Grid’s Carbon Usage Effectiveness (CUE)™ and Water Usage Effectiveness (WUE)™ metrics (a worked sketch of these metrics follows this list).
  • Green Compliance & Certifications: Current or recurring validations achieved by the site or service provider, beyond internal and governmental regulations, of sustainable design and/or operations practices. For example: LEED®, BREEAM®, Carbon Credits, European Union Code of Conduct, U.S. EPA Energy Star®, and The Green Grid’s® DC Maturity Model equalizer.
  • PUE Reporting: PUE is an industry-accepted indicator of a site or service provider’s efficiency commitment.
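
For reference, The Green Grid defines these metrics as simple ratios: PUE is total facility energy divided by IT equipment energy, CUE is the carbon emissions attributable to that energy divided by IT energy, and WUE is annual site water use divided by IT energy. The sketch below computes all three from hypothetical annual figures; the numbers are illustrative, not reported data.

```python
# Hypothetical annual figures for a single site. Metric definitions follow
# The Green Grid: PUE (dimensionless), CUE (kg CO2 per kWh of IT energy),
# WUE (liters of water per kWh of IT energy).

total_facility_energy_kwh = 8_000_000   # everything behind the utility meter
it_equipment_energy_kwh   = 5_500_000   # energy delivered to the IT equipment
carbon_emissions_kg       = 3_600_000   # CO2-equivalent attributable to that energy
site_water_usage_liters   = 9_000_000   # annual water use for the site

pue = total_facility_energy_kwh / it_equipment_energy_kwh
cue = carbon_emissions_kg / it_equipment_energy_kwh
wue = site_water_usage_liters / it_equipment_energy_kwh

print(f"PUE = {pue:.2f}")            # ~1.45
print(f"CUE = {cue:.2f} kgCO2/kWh")  # ~0.65
print(f"WUE = {wue:.2f} L/kWh")      # ~1.64
```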

Service Quality

A deployment alternative’s capability to meet end-user performance requirements.

  • Application Availability: Computing environment uptime at the application or operating system level.
  • Application Performance: Evaluation of an application’s functional response; acceptable speeds at the end-user level.
  • End-User Satisfaction: Stakeholder response that an application or deployment alternative addresses end-user functional needs. For example: End-user preference for Graphical User Interfaces or Operating/Management Systems tied to a specific deployment alternative.

Using Uptime Institute FORCSS

This system was developed and validated by thought leaders in the enterprise IT industry to ensure its usefulness to those who inform senior-level decision makers. Many organizations already perform due diligence that would include most of this process. But the Uptime Institute FORCSS system provides the following:

  • A structure and a set of common definitions agreed upon by an elite group of data center owners and
    operators from around the world.
  • A succinct and effective way to communicate recommendations to the C-level executives.

Uptime Institute believes the FORCSS system is sufficiently flexible and comprehensive to improve IT investment decisions.

Notes on using FORCSS:

Uptime Institute acknowledges that there are overlaps and dependencies across all six factors. But, in order to provide a succinct, sufficient process to inform C-level decision makers, categories must be finite and separate to avoid analysis paralysis. The purpose of FORCSS is to identify the business requirements of the IT service, and pragmatically evaluate capabilities of potential deployment options as defined.

Uptime Institute recognizes organizations will have unique business demands and priorities. Therefore, each company conducting a FORCSS analysis will need to weight each criterion according to its specific business requirements. For example, most companies try to maximize data center efficiency. But, for a growing number of organizations, the overall environmental sustainability of operations and supplier choices is a very public (and therefore critical) aspect of their business. Organizations that put a high value on sustainability will weight the criteria accordingly when applying FORCSS. Other organizations may treat sustainability as a low-value, even inconsequential, criterion.
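
One hypothetical way to record such organization-specific weightings is sketched below. It is not part of FORCSS itself, and Uptime Institute’s displays are graphical rather than numerical; the structure simply captures each factor’s relative importance alongside qualitative ratings gathered for each deployment alternative.

```python
# Hypothetical record of organization-specific FORCSS weightings and
# qualitative ratings. Not part of FORCSS itself; Uptime Institute's own
# displays are graphical rather than numerical scores.

FACTORS = ["Financial", "Opportunity", "Risk",
           "Compliance", "Sustainability", "Service Quality"]

# Relative importance assigned by this (invented) organization.
weights = {"Financial": "high", "Opportunity": "medium", "Risk": "high",
           "Compliance": "high", "Sustainability": "low", "Service Quality": "medium"}

# Qualitative ratings gathered from stakeholders for two alternatives.
ratings = {
    "existing enterprise site": {
        "Financial": "favorable", "Opportunity": "constrained", "Risk": "well understood",
        "Compliance": "certified", "Sustainability": "moderate", "Service Quality": "proven"},
    "colocation provider": {
        "Financial": "moderate", "Opportunity": "scalable", "Risk": "contract-dependent",
        "Compliance": "certified", "Sustainability": "strong", "Service Quality": "per SLA"},
}

# One row per factor, side by side, for presentation to stakeholders.
for factor in FACTORS:
    row = " | ".join(f"{alt}: {ratings[alt][factor]}" for alt in ratings)
    print(f"{factor} (weight: {weights[factor]}) -> {row}")
```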

Uptime Institute is currently evaluating numerous concepts for FORCSS ‘displays.’ These displays will be graphical in nature, rather than a numerical score, to allow for evaluation of each factor within FORCSS and provide a visual comparison of one deployment alternative against another. Please visit FORCSS on the Uptime Institute Web site for the latest information and tools.

Uptime Institute’s Unique Approach To FORCSS Development

In order to ensure the development of a well-rounded, thorough, and useful methodology, Uptime Institute facilitated a series of Charrettes. (A Charrette is a design process that brings stakeholders together at one time, in one place, as a group completing design tasks in a focused, time-limited effort.) The benefits of this approach are that the stakeholders begin with a common understanding of the design objective, share in the development process, and receive immediate feedback on the result of their deliberations.

In October 2011 the first Charrette was held, composed of peers within Uptime Institute and the 451 Group. The fundamental objective was to define the problem and assemble an original framework to be submitted at a second Charrette of key industry stakeholders. This initial work created the structure of a multiple-component solution, including business functions, facilities infrastructure, computing hardware, and applications performance perspectives.

Building on this foundational effort, in January 2012, Uptime Institute hosted over 25 hand-picked senior technology executives from large organizations across multiple industries at a second Charrette. Uptime Institute invited executive leaders whose organizations’ decisions impacted international markets and brands and who brought broad experience making decisions influenced by multiple factors and challenges.

This group edited and crystallized the original structure into six top-level criteria, or principal factors, that make up the FORCSS framework. Following the second Charrette, Uptime Institute identified three key components for each of the six top-level criteria to further define the FORCSS criteria, and presented the expanded system at Uptime Institute Symposium in Santa Clara, CA, in May 2012.

At Symposium, Uptime Institute reconvened the previous group of executives who comprised the second Charrette, as well as new end-user participants, for a follow-up Charrette on FORCSS.

Some of the new participants represented companies that had been in business for more than 100 years and plan to be in business another 100 years. Many of these organizations are at a strategic inflection point—do they modernize or minimize their IT infrastructures? The participants recognized the FORCSS approach as a means to improve confidence in decision making and avoid unintended consequences.

The third Charrette participants were tasked with vetting the expanded 18-point FORCSS process. The discussions and debate provided substantive insight resulting in changes to the components making up the six factors.

The majority of executives at the second Charrette reported consistent and enduring challenges within their organizations:

  • Incomplete data when evaluating internal assets, such as data center capital costs that aren’t included in TCO calculations for IT projects, or lack of insight into personnel costs associated with providing internal IT services.
  • Lack of insight into cloud computing security, pricing models, and reliability data. Lack of credible cloud
    computing case studies.
  • Inconsistency in reporting structures across geographies and divisions and between internal resources and
    colocation providers.
  • Difficulty articulating business value for criteria not tied to a specific cost metric, like redundancy or
    service quality. Difficulty connecting IT metrics to business performance metrics.
  • Challenge of capacity planning for IT requirements forecast beyond six months due to evolving
    architecture/application strategy and shifting vendor roadmaps.
  • Difficulty collecting information across the various stakeholders, from application development to corporate real estate.

FORCSS Begins With These Steps:

  • The first step is to identify the new application workload to be analyzed. The process is designed to evaluate a specific application workload against specific, existing assets or external resources (or, in cases where a new site or service may be considered, a detailed evaluation of the planned asset).
  • Identify and engage the decision maker or C-level executive who will sign off on the final project. Provide
    background on FORCSS as a selection tool for winnowing deployment choices and eliminating blind spots
    in an organization.
  • Identify senior management in adjacent divisions to assess the implementation being considered. No one
    person will have sufficient insight into all areas of an organization. Be sure to include application owners
    and users, facilities/real estate, IT operations, and any other stakeholders.
  • Set parameters for the application: determine the functional life cycle of the application or IT service being analyzed in order to establish the value of the application, the appropriate cost profile, and the other attributes necessary to ensure the viability of the business solution.

Uptime Institute recognizes the many challenges in conducting a FORCSS analysis:

  • Getting buy-in and understanding of the FORCSS language across disciplines and at the C-level.
  • Avoiding inappropriate weighting of Risk or other criteria based on division bias.
  • Obtaining objective data on third-party service provider pricing and availability.

Also, many companies may be challenged by the subjective nature of some of the inputs or have difficulty determining the true costs and benefits of various projects.

The purpose of this timely initiative is to improve a company’s investments and decision making, not to compare one company’s decisions against another’s. The way one organization determines the business value of an application or total cost of providing a service does not need to be the same as how another organization gathers those same data inputs.

A FORCSS analysis may pose tough questions without easy answers, but will help organizations make IT deployment decisions with confidence.

Julian Kudritzki joined the Uptime Institute in 2004 and currently serves as Chief Operating Officer. He is responsible for the global proliferation of Uptime Institute Standards. He has supported the founding of Uptime Institute offices in numerous regions, including Brasil, Russia and North Asia. He has collaborated on the development of numerous Uptime Institute publications, education programs and unique initiatives such as Server Roundup and FORCSS. He is based in Seattle, WA.

Matt Stansberry is Director of Content and Publications for the Uptime Institute and also serves as Program Director for the Uptime Institute Symposium, an annual spring event that brings together 1,500 stakeholders in enterprise IT, data center facilities, and corporate real estate to deal with the critical issues surrounding enterprise computing. He was formerly Editorial Director for TechTarget’s Data Center and Virtualization media group, and was managing editor of Today’s Facility Manager magazine. He has reported on the convergence of IT and Facilities for over a decade.

Accredited Tier Designer Profiles: Adel Rizk, Gerard Thibault and Michael Kalny

Three data center design professionals talk about their work and Uptime Institute’s ATD program.
By Kevin Heslin, Uptime Institute

Uptime Institute’s Accredited Tier Designer (ATD) program and its Tier Certification program have affected data center design around the world, raised standards for construction, and brought a new level of sophistication to facility owners, operators, and designers everywhere, according to three far-flung professionals who have completed the ATD program. Adel Rizk of Saudi Arabia’s Edarat; Gerard Thibault, senior technical director, Design and Construction division of Digital Realty (DLR), in the U.K.; and Michael Kalny, head of Metronode Engineering, Leighton Telecommunications Group, in Australia, have applied the concepts they learned in the ATD program to develop new facilities and improve the operation of legacy facilities while also aggressively implementing energy-efficiency programs. Together, they prove that high reliability and energy efficiency are not mutually exclusive goals. Of course, they each work in different business environments in different countries, and the story of how they achieve their goals under such different circumstances makes interesting reading.

In addition to achieving professional success, the ATDs each noted that Tier certification and ATD programs had helped them innovate and develop new approaches to data center design and operations while helping market facilities and raise the standards of construction in their countries.

More than that, Rizk, Thibault, and Kalny have followed career arcs with some similarities. Each developed data center expertise after entering the field from a different discipline, Rizk from telephony and manufacturing, Thibault from real estate, and Kalny from building fiber transmission networks. They each acknowledge the ATD program as having deepened their understanding of data center design and construction and having increased their ability to contribute to major company initiatives. This similarity has particular significance in the cases of Rizk and Kalny, who have become data center experts in regions that often depend on consultants and operators from around the globe to ensure reliability and energy efficiency. It is in these areas, perhaps, that the ATD credential and Tier certification have their greatest impact.

On the other hand, the U.K., especially London, has been the home of many sophisticated data center operators and customers for years, making Thibault’s task of modifying Digital Realty’s U.S. specification to meet European market demands a critical one.

On the technology front, all three see continued advances in energy efficiency, and they all see market demand for greater sustainability and energy efficiency. Kalny and Thibault both noted increased adoption of higher server air supply temperatures in data centers and the use of outside air. Kalny, located in Australia, noted extreme interest in a number of water-saving technologies.

Hear from these three ATDs below:

ADEL RIZK

Just tell me a little about yourself.

I’m a consulting engineer. After graduating from a civil engineering program in 1998 and working for a few years on public projects for the Public Switched Telephone Network (PSTN) Outside Plant (OSP), I decided in 2000 to change my career and joined a manufacturer of fast-moving consumer goods. During this period, I also pursued my MBA.

After gaining knowledge and experience in IT by enhancing and automating the manufacturer’s operations and business processes, I found an opportunity to start my own business in IT consulting with two friends and colleagues of mine and co-founded Edarat Group in 2005.

As a consultant working in Edarat Group, I also pursued professional certifications in project management (PMP) and business continuity (MBCI) and was in charge of implementing the Business Continuity Management Program for telecom and financial institutions in Saudi Arabia.

How did you transition from this IT environment to data centers?

One day, a customer who was operating a strategic and mission-critical data center facility asked me to help him improve the reliability of his MEP infrastructure. I turned his problem into an opportunity and ventured into the data center facility infrastructure business in 2008.

In 2009-2010, Edarat Group, in partnership with IDGroup, a leading data center design company based out of Boston, developed the design for two Tier IV and two Tier III data centers for a telecom operator and the smart cities being built in Riyadh by the Public Pension Agency. In 2010, I got accredited as a Tier Designer (ATD) by the Uptime Institute, and all four facilities achieved Tier Certification of Design Documents (TCDD).

What was the impact of the Tier certification?

Once we succeeded in achieving the Tier Certification, it was like a tipping point.

We became the leading company in the region in data center design. Saudi Arabia values certifications very highly. Any certification is considered valuable and even considered a requirement for businesses, as well as for professionals. By the same token, the ATD certificate positioned me as the lead consultant at that time.

Since that time, Edarat has grown very rapidly, working on the design and construction supervision of Tier III, Tier IV, and even Tier II facilities. Today, we have at least 10 facilities that received design Tier-certifications and one facility that is Tier III Certified as a Constructed Facility (TCCF).

What has been your personal involvement in projects at Edarat?

I am involved in every detail in the design and construction process. I have full confidence in these facilities being built, and Uptime Institute Certifications are mere evidence of these significant successful achievements.

What is Edarat doing today?

Currently, we are involved in design and construction. In construction, we review material submittals and shop drawings and apply value engineering to make sure that the changes during construction don’t affect reliability or Tier certification of the constructed facility. Finally, we manage the integrated testing and final stages of commissioning and ensure smooth handover to the operations team.

Are all your projects in Saudi Arabia?

No. We also obtained Tier III certification for a renowned bank in Lebanon. We also have done consultancy work for data centers in Abu Dhabi and Muscat.

What stimulates demand for Tier certification in Saudi Arabia?

Well, there are two factors: the guarantee of quality and the show-off factor due to competition. Some customers have asked us to design and build a Tier IV facility for them, though they can tolerate a long period of downtime and would not suffer great losses from a business outage.

Edarat Group is vendor-neutral, and as consultants, it is our job to educate the customer and raise his awareness because investing in a Tier IV facility should be justifiable when compared to the cost of disruption.

My experience in business continuity enables me to help customers meet their business requirements. A data center facility should be fit-for-purpose, and every customer is unique, each having different business, regulatory, and
operational requirements. You can’t just copy and paste. Modeling is most important at the beginning of every data center design project.

Though it may seem like hype, I strongly believe that Uptime Institute certification is a guarantee of reliability and high availability.

Information Technology Communications Complex (ITCC), in Riyadh, Saudi Arabia.

What has the effect of the ATD been on the data center industry in Saudi Arabia?

Now you can see other players in the market, including systems integrators, getting their engineers ATD certified. Being ATD certified really helps. I personally always refer to the training booklet; you can’t capture and remember everything about Tiers after just three days of training.

What’s unique about data centers in Saudi Arabia?

Energy is cheap; telecom is also cheap. In addition, Saudi Arabia is a gateway from Europe to Asia. The SAS1 cable connects Europe to India through Saudi Arabia. Energy-efficient solutions are difficult to achieve. Free cooling is not available in the major cities, and connectivity is not yet available in remote areas where free cooling is available for longer periods during the year. In addition, the climate conditions are not very favorable to energy-efficient solutions; for example, dust and sand make it difficult to rely on solar power. In Riyadh, the cost of water is so high that it makes the cost of investing in cooling towers unjustifiable compared to air-cooled chillers. It could take 10 years to get payback on such a system.

Budget can sometimes be a constraint on energy efficiency because, as you know, green solutions have high capex, which is unattractive because energy is cheap in Saudi Arabia. If you use free cooling, there are limited hours, plus the climate is sandy, which renders maintenance costs high. So the total cost of ownership for a green solution is not really justifiable from an economic perspective, and the government so far does not have any regulations on carbon emissions and so forth.

Therefore, in the big cities, Riyadh, Dammam, and Jeddah, we focus primarily on reliability. Nevertheless, some customers still want to achieve LEED Gold.

What’s the future for Edarat?

We are expanding geographically and expanding our services portfolio. After design and building, the challenge is now in the operation. As you already know, human errors represent 70 percent of the causes for downtime. Customers are now seeking our help to provide consultancy in facility management, such as training, drafting SOPs, EOPs, capacity management, and change management procedures.


MICHAEL KALNY

Tell me a little about yourself.

I received an honors degree in electrical engineering in the 1980s and completed a postgraduate diploma in communications systems a couple of years later. In conjunction with practical experience working as a technical officer and engineer in the telecommunications field, I gained a very sound foundation on which to progress my career in the ICT space.

My career started with a technical officer position in a company called Telecom, the monopoly carrier operating in Australia. I was there about 14 years and worked my way through many departments within the company including research, design, construction and business planning. It was a time spent learning much about the ICT business and applying engineering skills and experience to modernize and progress the Telecom business. Around 1990 the Australian government decided to end the carrier monopoly, and a company by the name of Optus emerged to compete directly with Telecom. Optus was backed by overseas carriers Bellsouth (USA) and Cable and Wireless (UK). There were many new exciting opportunities in the ICT carrier space when Optus began operations. At this point I left Telecom and started with Optus as project manager to design and construct Optus’ national fiber and data center network around Australia.

Telecom was viewed by many as slow to introduce new technologies and services and not competitive compared to many overseas carriers. Optus changed all that. They introduced heavily discounted local and overseas calls, mobile cellular systems, pay TV, point-to-point high-capacity business network services and a host of other value-added services for business and residential customers. At the time, Telecom struggled to develop and launch service offerings that could compete with Optus, and a large portion of the Australian population embraced the growth and service offerings available from Optus.

I spent 10 years at Optus, where I managed the 8500-kilometer rollout of fiber that extended from Perth to Adelaide, Melbourne, Canberra, Sydney and Brisbane. I must have done a good job on the build, as I was promoted to the role of Field/National Operations manager to manage all the infrastructure that was built in the first four years. Maybe that was the punishment in some way? I had a workgroup of some 300-400 staff during this period and gained a great deal of operational experience.

The breadth of knowledge, experience and networks established during my time at Optus was invaluable and led me to my next exciting role in the telecommunications industry during 2001. Nextgen Networks was basically formed to fill a void in the Australian long-haul, high-capacity digital transmission carrier market, spanning all mainland capital cities with high-speed fiber networks. Leighton Contractors was engaged to build the network and maintain it. In conjunction with transmission carriage services, Nextgen also pre-empted the introduction and development of transmission nodes and data center services. Major rollouts of fiber networks, associated transmission hubs and data centers were a major undertaking, providing exciting opportunities to employ innovation and new technologies.

My new role within Leighton Telecommunications Group to support and build the Nextgen network had several similarities with Optus. Included were design activities and technical acceptance from the builder of all built infrastructure, including transmission nodes and data centers. During 2003 Nextgen Networks went into administration as the forecast demand for large amounts of transmission capacity did not eventuate. Leighton Telecommunications realized the future opportunities and potential of the Nextgen assets and purchased the company. Through good strategic planning and business management, the Nextgen business has continued to successfully expand and grow. It is now Australia’s third largest carrier. This was indeed a success story for the Leighton Telecommunications Group.

Metronode was established as a separate business entity to support the Nextgen network rollout by way of providing capital city transmission nodes for the longhaul fiber network and data center colo space. Metronode is now one of the biggest data center owner/operators in Australia, with the largest coverage nationally.

I’ve now been with the Leighton Telco Group for 12 years and have worked in the areas of design, development, project management and operations. Much of the time was actually spent in the data center area of the business.

For the last three years I have headed Metronode’s Engineering Group and have been involved in many exciting activities including new technology assessment, selection and data center design. All Metronode capital city data centers were approaching full design capacity a couple of years ago. In order to continue on a successful growth path for data center space, services and much improved energy efficiency, Metronode could no longer rely on traditional data center topologies and builds to meet current and future demands in the marketplace. After very careful consideration and planning, it was obvious that any new data center we would build in Australia would have to meet several important design criteria. These included good energy efficiency (sub 1.2 PUE), modular construction, quick to build, high availability and environmental sustainability. Formal certification of the site to an Uptime Institute Tier III standard was also an important requirement.

During 2011 I embarked on a mission to assess a range of data center offerings and technologies that would meet all of Metronode’s objectives.

So your telecom and fiber work set you up to be the manager of operations at Metronode, but where did you learn the other data center disciplines?

During the rollout of fiber for the Optus network, capital city “nodes” or data centers were built to support the transmission network. I was involved with the acceptance of all the transmission nodes and data centers, and I guess that’s where I got my first exposure to data centers. Also, when Nextgen rolled out its fiber network, it was also supported by nodes and data centers in all the capital cities.

I was primarily responsible for acceptance of the fiber network, all regenerator sites (basically mini data centers) and the capital city data centers that the fiber passed through. I wasn’t directly involved with design, but I was involved with commissioning and acceptance, which is where I got my experience.

Do you consider yourself to be more of a network professional or facilities professional?

A bit of both; however, I have a stronger affinity with the data center side of things. I’m an electrical engineer and relate more closely to the infrastructure side of data centers and transmission regeneration sites: they all depend on cooling, UPS, batteries, controlled environments, high levels of redundancy and all that sort of thing.

Prior to completion of any data center build (includes fiber transmission nodes), a rigorous commissioning and integrated services testing (IST) regime is extremely important. A high level of confidence in the design, construction and operation is gained after the data center is subjected to a large range of different fault types and scenarios to successfully prove resilience. My team and I always work hard to cover all permutations and combinations of fault/operational scenarios during IST to demonstrate resilience of the site before handing it over to our operations colleagues.

What does the Australian data center business look like? Is it international or dependent on local customers?

Definitely dependent on both local and international customers. Metronode is a bit different from most of the competition, in that we design, build, own and operate each of our sites. We specialize in providing wholesale colo services to the large corporations, state and federal government departments and carriers. Many government departments in Australia have embarked on plans to consolidate and migrate their existing owned, leased and “back of office” sites into two or more modern data centers managed and operated by experienced operators with a solid track record. Metronode recently secured the contract to build two new major data centers in NSW to consolidate and migrate all of government’s requirements.

Can you describe Metronode’s data centers?

We have five legacy sites that are designed on traditional technologies and builds comprising raised floor, chilled-water recirculation, CRACs on floor, under-floor power cabling from PDUs to racks and with relatively low-power density format. About three years ago, capacity was reaching design limits and the requirement to expand was paramount. I was given the task of reviewing new and emerging data center technologies that would best fit Metronode’s business plans and requirements.

It was clear that a modular data center configuration would provide significant capital savings upfront by way of only expanding when customer demand dictated. The data center had to be pre-built in the factory, tested and commissioned and proven to be highly resilient under all operating conditions. It also had to be able to grow in increments of around 800 kW of IT load up to a total of 15 MW if need be. Energy efficiency was another requirement, with a PUE of less than 1.2 set as a non-negotiable target.

BladeRoom technology from the UK was the technology chosen. Metronode purchased BladeRoom modules from the U.K. They were shipped to Australia and assembled on-site. We also coordinated the design of plant rooms to accommodate site HV intake, switchgear, UPS and switchboards to support the BladeRoom modules. The first BladeRoom deployment of 1.5 MW in Melbourne took around nine months to complete.

The exterior of the Metronode facility.

BladeRoom uses direct free air and evaporative cooling as the primary cooling system. It uses an N+1 DX cooling system as a backup. The evaporative cooling system was looked at very favorably in Australia mainly because of the relatively high temperatures and low humidity levels throughout the year in most capital cities.

To date, we have confirmed that free air/evaporative cooling is used for 95-98% of the year, with DX cooling systems used for the balance. Our overall energy usage is very low compared to any traditional type sites.

We were the first data center owner/operator to have a fully Uptime Institute-certified Tier III site in Australia. This was another point of differentiation we used to present a unique offering in the Australian marketplace.

In Australia and New Zealand, we have experienced many other data center operators claiming all sorts of Tier ratings for their sites, such as Tier IV, Tier IV+, etc. Our aim was to formalize our tier rating by gaining a formal accreditation that would stand up to any scrutiny.

The Uptime Institute Tier ratings appealed to us for many reasons, so we embarked on a Tier III rating for all of our new BladeRoom sites. From a marketing perspective, it’s been very successful. Most customers have stopped asking many questions relating to concurrent maintainability, now that we have been formally certified. In finalizing the design for our new generation data centers, we also decided on engineering out all single points of failure. This is over and above the requirement for a Tier III site, which has been very well received in the marketplace.

What does the one-line diagram of the BladeRoom electrical system look like?

The BladeRoom data hall comprises a self-contained cooling unit, UPS distribution point and accommodation space for all the IT equipment.

Support of the BladeRoom data hall requires utility power, a generator, UPS and associated switchgear to power the BladeRoom data hall. These components are built into what we call a “duty plant room.” The design is based on a block-redundant architecture and does not involve the paralleling of large strings of generators to support the load under utility power failure conditions.

A separate duty plant room is dedicated or assigned to each BladeRoom. A separate “redundant plant room” is part of the block-redundant design. In the case of any duty plant room failure, critical data hall load will be transferred to the redundant plant room via a pair of STS switches. Over here we refer to that as a block-redundant architecture. We calculate that we will achieve better than 5 nines availability.
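
For context on the “better than 5 nines” figure, an availability percentage translates directly into allowable downtime per year. The short sketch below shows the arithmetic; the targets listed are generic illustrations, not Metronode’s published figures.

```python
# Availability targets expressed as allowable downtime per year (illustrative).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

for label, availability in [("three nines", 0.999),
                            ("four nines", 0.9999),
                            ("five nines", 0.99999)]:
    downtime_minutes = (1 - availability) * MINUTES_PER_YEAR
    print(f"{label} ({availability:.5f}): ~{downtime_minutes:.1f} minutes of downtime per year")
```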

Our objective was to ensure a simple design in order to reduce any operational complexity.

What’s the current footprint and power draw?

The data center built in Melbourne is based on a BladeRoom data hall with a cooling module on either end, allowing it to support 760 kW of IT load.

To minimize footprint, we build data halls into a block, comprising four by 760 kW and double-stack them. This provides us with two ground floor data halls and two first floor data halls, with a total IT capacity of 3 MW. Currently, the Melbourne site comprises a half block, which is 1.5 MW; we’re planning to build the next half block by the end of this year. Due to the modularity aspect and the similar design, we simply replicate the block structure across the site, based on client-driven demand. Our Melbourne site has capacity to accommodate five blocks or 15 MW of IT load.

In terms of the half block in operation in Melbourne, we have provisioned about 1 MW of IT load to our clients; however, utilization is still low at around 100 kW. In general, we have found that many clients do not reach their full allocated capacity for some time, possibly due to being conservative about demand forecast and the time it takes for complex migrations and new installations to be completed.

What about power density?

We can accommodate 30 kW per rack. A supercomputer was recently installed that took up six or seven rack spaces. With a future power demand in excess of 200 kW, we are getting close to the 30 kW per rack mark.

And PUE?

At our new Melbourne site, we currently support an average IT load of 100 kW across 1.5 MW of IT load capacity, which works out to roughly 7% IT load, a very light load that would produce an unmentionable PUE in a traditionally designed site! Our monthly PUE is now running at about 1.5. Based on trending over the last three months (which have been summer months in Australia), we are well on target to achieve our design PUE profile. We’re very confident we’ll have a 1.2 annual rolling PUE once we reach 30% load; we should have a sub-1.2 PUE as the load approaches 100%.
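
The behavior Kalny describes, a PUE of about 1.5 at roughly 7% load trending toward 1.2 at 30% load and below 1.2 near full load, is what a simple fixed-plus-proportional overhead model predicts. The sketch below uses assumed overhead figures chosen only to roughly reproduce those numbers; they are not Metronode’s measured data.

```python
# Simple fixed-plus-proportional overhead model of PUE versus IT load.
# Overhead figures are assumptions chosen only to roughly reproduce the
# behavior described in the interview; they are not measured site data.

FIXED_OVERHEAD_KW = 35        # always-on plant, controls, lighting (assumed)
PROPORTIONAL_OVERHEAD = 0.15  # cooling/distribution losses per kW of IT load (assumed)

def pue(it_load_kw):
    """PUE = (IT load + facility overhead) / IT load."""
    overhead_kw = FIXED_OVERHEAD_KW + PROPORTIONAL_OVERHEAD * it_load_kw
    return (it_load_kw + overhead_kw) / it_load_kw

for load_kw in (100, 450, 1500):  # roughly 7%, 30% and 100% of the 1.5 MW half block
    print(f"IT load {load_kw:4d} kW -> PUE {pue(load_kw):.2f}")
```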

What got you interested in the ATD program and what have been its benefits?

During my assessment of new-generation data centers, there was some difficulty experienced in fully understanding the resilience that various data center configurations provided and comparing them against one another. At this point it was decided that some formal standard/certification would be the logical way to proceed, so that data center performance/characteristics could be compared in a like-for-like manner. A number of standards and best practices were reviewed, including those published by IBM, BICSI, AS/NZ Standards, TIA 942, UI, etc.; many of which quoted different rating definitions for each “tier.” The Uptime Institute tiering regime appealed to me the most, as it was not prescriptive in nature and yet provided an objective basis for comparing the functionality, maintainability, redundancy and fault tolerance levels for different site infrastructure topologies.

My view was that formal certification of a site would provide a clear differentiator between ourselves and competition in the marketplace. To further familiarize myself with the UI standards, applications and technical discussions with clients, an ATD qualification was considered a valuable asset. I undertook the ATD course nearly two years ago. Since then, the learnings have been applied to our new designs, in discussions with clients on technical performance, in design reviews with consultants, and for comparing different data center attributes.

I found the ATD qualification very useful in terms of assessing various designs. During the design of our Melbourne data center, our local design consultants didn’t have anyone who was ATD accredited, and I found that the designs presented did not always comply with the minimum requirements for Tier III, while in other cases they exceeded the Tier III requirements. This was another very good reason for completing the ATD course: to keep an eye on consultant designs.

Michael, are you the only ATD at Metronode?

I’m the only Tier accredited designer in Metronode. Our consultants have since had a few people accredited because it looked a little bit odd that the client had an ATD and the consultant didn’t.

What does the future hold?

We won a major contract with the NSW government last year, which involved building two new data centers. The NSW government objective was to consolidate around 200 existing “data centers” and migrate services into the two main facilities. They’re under construction as we speak. There’s one in Sydney and one just south of Sydney near Wollongong. We recently obtained Tier III design certification from Uptime for both sites.

The Sydney data center will be ready for live operations in mid-July this year and the site near Wollongong operational a couple of months later. The facilities in Sydney and near Wollongong have been dimensioned to support an ultimate IT capacity of 9 MW.

Metronode also has a new data center under construction in Perth. It will support an ultimate IT capacity of 2.2 MW, and the first stage will support 760 kW. We hope to obtain Tier III design certification on the Perth site shortly and expect to have it completed and operational before the end of the year.

The other exciting opportunity is in Canberra, and we’re currently finalizing our design for this site. It will be a Tier III site with 6 MW of IT capacity.

With my passion for sustainability and high efficiency, we’re now looking at some major future innovations to further improve our data center performance. We are now looking at new hydrogel technologies where moisture from the data hall exhaust air can be recycled back into the evaporative cooling systems. We are also harvesting rainwater from every square meter of roof at our data centers. Rainwater is stored in tanks and used for the evaporative cooling systems.

Plant rooms containing UPS, switchboards, ATS, etc. in our legacy sites are air conditioned. If you walked into one of these plant rooms, you’d experience a very comfortable 23 or 24 degrees Centigrade all year round. Plant rooms in our new data centers run between 35 and 40 degrees Centigrade, allowing them to be free air-cooled for most of the year. This provides significant energy savings and allows our PUE to be minimized.

We are now exploring the use of hot exhaust air from the data halls to heat the generators, rather than using electrical energy to heat engine water jackets and alternators. Office heating is another area where the use of data hall exhaust is being examined.


GERARD THIBAULT

How did you get started?

I had quite a practical education up to the point I undertook a degree in electronics and electrical engineering and then came to the market.
I worked on a number of schemes unconnected with data centers until the point that I joined CBRE in 1998, and it was working with them that I stumbled across the data center market in support of Global Crossing, looking for real estate across Europe trying to build its Pan European network. Within CBRE, I provided a lot of support to the project management of their data center/POP builds as well as consultancy to the CBRE customer base. When Global Crossing got their network established, I joined them to head up their building services design function within the Global Center, the web-hosting portion of Global Crossing. Together with a colleague leading construction, we ran a program of building five major data centers across the tier one cities with two more in design.

After the crash of 2001, I returned to CBRE, working within the technical real estate group. I advised clients such as HSBC, Goldman Sachs and Lehman Brothers about the design and programming and feasibility of potential data centers. So I had a lot of exposure to high capital investment programs, feasibility studies for HSBC, for instance, covering about £220 million of investment in the U.K. for a pair of mirrored data centers to replace a pair of dated facilities for a global bank.

During the period 2005-2006, I became aware of Digital Realty through a number of projects undertaken by our team. I actually left CBRE to work for Digital directly, becoming European employee #4, setting up an embryonic group as part of the REIT that is Digital Realty.

Since then, I’ve been responsible for a number of new builds in Europe, including Dublin and London, driving the design standards that Digital builds to in Europe, which included taking the U.S. guidelines requirements and adapting them to not only European units but also to European data center requirements. That, in a way, led to my current role.

Today, I practice less of the day-to-day development and am more involved with strategic design and how we design and build projects. I get involved with the sales team about how we try to invite people into our portfolio. One of those key tools was working with Uptime Institute to get ATD accreditation so that we could talk authoritatively in assessing customer needs for reliability and redundancy.

What are the European data center requirements, other than units of measure and power differences?

I think it’s more about the philosophy of redundancy. It’s my own view that transactions in the U.S. are much more reliant on the SLA and on how the developer or operator manages his own risk, compared to certification within the U.K. End users of data centers, whether it is an opex (rental) or build-to-own requirement, seem to exercise more due diligence for the product they are looking to buy. Part of the development of the design requirement was that we had to modify the electrical arrangements to be more of a 2N system and provide greater resilience in the cooling system to meet the more stringent view of the infrastructure.

Digital Realty Data Center, SW London

So you feel your U.K. customers are going to examine claims of redundancy more carefully, and they’re going to look for higher standards?

Yes. It’s a matter of where they see the risks. Historically, we’ve seen that in the U.K., and across Europe too, they’ve sought to eradicate the risks, whereas I think maybe because of the approach to rental in the U.S., the risk is left with the operator. DLR has a robust architecture globally to mitigate our risk, but other operators may adopt different risk mitigation measures. It seems to me that over the years people in the U.K. market want to tick all the boxes and ensure that there is no risk before they take on the SLA, whereas in the U.S., it’s left to the operator to manage.

How has the ATD program played into your current role and helped you meet skepticism?

I think one of the only ways people can benchmark a facility is by having a certain stamp that says it meets a certain specification. In the data center market, there isn’t really anything that gives you a benchmark to judge a facility by, aside from the Uptime Institute ATD and certification programs.

It’s a shame that people in the industry will say that a data center is Tier X when it hasn’t been assessed or certified. More discerning clients can easily see that a data center that has been certified in line with the specifications of the Uptime Institute will give them the assurance they need, compared to a facility where they have no visibility of what a failure is going to do to the data center service. Certification, particularly to the Uptime Institute guidance, is really a good way of benchmarking and reducing risk. That is certainly what customers are looking for. And perhaps it helps them sleep better at night; I don’t know.

Did the standards influence the base design documents of Digital Realty? Or was the U.S. version more or less complete?

I think the standards have affected the document quite a lot within Europe. We were the first to introduce the 2N supply right through from the medium voltage supply to the rack supply. In the U.S., we always operated at 2N UPS, but the switchgear requires the skills of the DLR Tech Ops team to make it concurrently maintainable. Additional features are required to meet the Uptime Institute standards that I’ve come to understand from taking the course.

I think we had always looked at achieving concurrent maintainability, but that might be by taking some additional operational risks. When you sit back and analyze systems using the Uptime philosophy, you can see that having features such as double valving or double electrical isolation gives you the ability to maintain the facility, not just by maintaining your N capacity, or further resilience if you have a Tier IV system, but in a safe and predictable manner.

We’ve often considered something concurrently maintainable on systems where a pipe freeze could be used to replace a critical valve. Now that might well be an industry-accepted technique, but if it goes wrong, the consequences could be very significant. What I’ve learned from the ATD process is how to regularize the approach to a problem in a system and how to make sure that it is fail-safe in its operation to either concurrent maintainability or fault tolerance standards, if you are at the Tier III or Tier IV level, respectively.

How does Digital deploy its European Product Development strategy?

What we tried to do through product development is offer choice to people. In terms of the buildings, we are trying to build a non-specific envelope that can adapt to multiple solutions and thereby give people the choice to elect to have those data center systems deployed as the base design or upgrade them to meet Tier requirements.

Within a recent report that I completed on product development, this approach has made it simpler and cheaper to bring a facility up to a Tier level. We don’t by default build everything to Tier III in every respect, although that’s changing now with DLR’s decision to gain a higher degree of certification. So far, we’ve pursued certification within the U.S. market as required, and more frequently on the Asia-Pacific new build sites we have.

I think that difference may be due to the maturity of each market. There are a lot of people building data centers who perhaps don’t have the depth of maturity in engineering that DLR has. So people are looking for the facilities they buy to be certified so they can be sure they are reliable. Perhaps in the U.S. and Europe, they might be more familiar with data centers; they can look at the design themselves and make that assessment.

And that’s why in the European arena, we want to offer not only a choice of system but also to improve higher load efficiencies. The aim is to offer chilled water and then outside air, direct or indirect, or Tier certified designs, all within the same building and all offered as a sort of off-the-shelf product from a catalog of designs.

It would seem that providing customers with the option to certify to Tier would be a lot easier in a facility where you have just one customer.

Yes, but we are used to having buildings where we have a lot of customers and sometimes a number of customers in one data hall. Clearly, the first customer that goes into that space will often determine the infrastructure of that space. You can’t go back when you have shared systems beyond what the specification was when the initial build was completed. It is complicated sometimes, but it is something we’re used to because we do deal in multi-customer buildings.

What is the current state of Digital’s footprint in terms of size?

Within the U.K., we now have properties (ten buildings) in the London metro area, which represents over 1.2 million ft2 of space. This is now our largest metro region outside of the U.S. We have a very strong presence south of central London near Gatwick airport, but that has increased with recently acquired stock. We have two facilities there, one we built for a sole client—it was a build-to-suit option—and then a multi-tenant one. Then we have another multi-tenant building, probably in the region of 8 MW in the southwest of London. To the north in Manchester, we have another facility that is fully leased, and it’s probably in the neighborhood of 4-5 MW.

We’re actually under way with a new development for a client, a major cloud and hosting provider. We are looking to provide a 10-MW data center for them, and we’re going through the design and selection process for that project at the moment.

That’s the core of what we own within the U.K., but we also offer services for design and project management. We actually assisted HSBC with two very, very secure facilities; one to the north of London in the region of 4.5 MW with a 30,000 ft2 raised floor base and another in the north of the U.K. with an ultimate capacity of 14.5 MW and a full 2N electrical system, approaching a Tier IV design but not actually certified.

The most recent of our projects that we finished in Europe has been the first phase of a building for Telefonica in Madrid. This was a project where we acted as consultants and design manager in a process to create a custom-designed Tier IV data center with outside air, and that was done in conjunction with your team and Keith Klesner. I believe that’s one of only nine Tier IV data centers in Europe.

Walk me through the Madrid project.

It’s the first of five planned phases in which we assisted in creating a total of seven data halls, along with a support office block. The data halls are approximately 7,500 ft2 each. We actually advised on the design and fit-out of six of these data halls, and the design is based on an outside-air, direct-cooling system, which takes advantage of the very dry climate. Even though the dry-bulb temperatures are quite high, Madrid, being at high altitude, gives Telefonica the ability to get a good number of free-cooling hours within the year, driving down their PUE and running costs.

Each of the data halls has been set to run at four different power levels. The initial phase is at 1,200 kW per data hall, but the ultimate capacity is 4.8 MW per hall. All of it is supported by a fully concurrently maintainable and fault-tolerant Tier IV-certified infrastructure. On the cooling side, the design was based on N+2 direct-air cooling units on the roof. Each unit is provided with a chilled water circuit for cooling in recirculation mode when outside-air free cooling is not available. There are two independent chilled-water systems in physically separate support buildings, separated from the main data center building.

The electrical system is based on a full 2N+1 UPS system with transformerless UPS to help the power efficiency and reduce the losses within the infrastructure. Those are based around the Schneider Galaxy 7000 UPS. Each of those 2N+1 UPS systems and the mechanical cooling systems were supported by a mains infrastructure at medium voltage, with a 2N on-site full continuous duty-rated backup generation system.

Is the PUE for a fully built out facility or partially loaded?

Bear with me, because people will focus on that. The approximate annualized PUEs, based on recorded data, were 1.25 at 100% load; 1.3 at 75%; 1.35 at 50%; and, not surprisingly, as you drop down the curve, about 1.5 at 25%.

What do you foresee for future development?

In the last few years, there has been quite a significant change in how people look at the data center and how people are prepared to manage the temperature parameters. Within the last 18 months I would say, the desire to adopt ASHRAE’s 2011 operating parameters for servers has been fairly uniform. Across the business, there has been quite a significant movement, which has been brought to a head by a combination of a lot of new cooling technologies going forward. So now you have ability to use outside air direct with full mechanical backup or outside air indirect where there is a use of evaporative cooling, but in the right climates, of course.

I think there is also an extreme amount of effort looking at various liquid-cooled server technology. From my standpoint, we still see an awful lot of equipment that wants to be cooled by air because that is the easiest presentation of the equipment, so it may be a few years before liquid rules the day.

There’s been a lot of development mechanically and I think we’re sort of pushing the limits of what we can
achieve with our toolkit.

In the next phase of development, there have got to be ways to improve the electrical systems’ efficiencies, so I think there is going to be huge pressure on UPS technology to reduce losses, and on different types of voltage distribution, whether that be direct current or elevated AC voltages. All these ideas have been around before but may not have been fully exploited. The key thing is the potential: provided you are not operating on recirculation for a lot of the time, with outside air direct you’ve got PUEs approaching 1.2 and closing in on 1.15 in the right climate. At that level of PUE, the UPS and the electrical infrastructure account for a significant part of the remaining PUE uplift, given the amount of waste that currently exists. So I think one of the issues going forward will be addressing the efficiency issues on the electrical side of the equation.

Kevin Heslin is senior editor at the Uptime Institute. He served as an editor at New York Construction News, Sutton Publishing, the IESNA, and BNP Media, where he founded Mission Critical, the leading publication dedicated to data center and backup power professionals. In addition, Heslin served as communications manager at the Lighting Research Center of Rensselaer Polytechnic Institute. He earned a B.A. in Journalism from Fordham University in 1981 and a B.S. in Technical Communications from Rensselaer Polytechnic Institute in 2000.