Data Center Security

Hacking the Physical Data Center – Not just for Hollywood Movies

We have all seen years of big headline stories about overseas network hackers extracting millions of account details and social security numbers from retail, financial and a litany of other industries. And Hollywood has done a great job of painting the picture of bad guys physically breaking into data centers to steal information that makes them rich overnight. But a less considered scenario is the one that fits somewhere in the middle: online hacking of the physical data center itself. The reality of what this hacking entails lies somewhere between what you see in the Hollywood movies and the stories your IT staffers share around the water cooler. Data center hacking is real, occurs nearly every day, and goes far beyond the customer data downloads that show up on the evening news. And in many cases, it may not even be noticed for days, weeks or years!

Every organization needs to be aware of all of the ways in which its business can be compromised. In fact, now more than ever it needs to take all of these threats seriously, as the company's core business is transforming to be digital. Every process and asset is represented digitally. Workflows and actions are defined digitally. Building systems and energy usage are managed digitally too. And all of this access is connected to systems in the data center. Get into the data center, logically or physically, and a person can wreak havoc throughout an entire organization.

Security Starts in the Physical World

Security Starts with Policies

But let's focus on the here and now: the ground-level reality of what this all means and how it happens…

It actually started quite innocently. Over the years, manufacturers realized that hardwired RS232/RS485 and other proprietary control wiring interfaces were a dead end, so they turned to open systems approaches and added IP access to their infrastructure control devices. This open, connected approach simplified the systems design and integration needed to achieve the desired results. As such, nearly every company has now placed the vast majority of its infrastructure control systems on the company intranet: the air conditioning, the power distribution systems, the security cameras, the ID readers and door access. Yup, it's now all digital and all IP-controlled on the network. It's easy to see the huge upside in efficiency and intelligence, but the downside is that essentially anyone with access to the systems or the network can turn all of these systems on and off, wipe security footage, and unlock doors.

So why has this gone unnoticed? We publicly hear about the security breaches that result in the bad guys gathering sensitive customer data because those affect us all. But all of the other breaches are mostly invisible to the public. Who really cares about a breach that simply shuts off the air conditioning in the data center? Or the breach that unlocks all of the perimeter doors and allows criminals to wander around? Or the one that turns off the security cameras where your diesel generator fuel is stored? These types of hacks aren't just invisible to the public; in many cases they are invisible to the executive team as well.

So how did we get here? This situation is very different than in decades past, since most of the people responsible for these control systems are not accustomed to worrying about such ‘invisible’ security risks. Many of these folks still think about bigger padlocks rather than firewalls. They think about taller fences and night-vision cameras. The very idea that all of those physical/mechanical systems can be useless when a hacker “rides the wire” is hard to imagine for many of them.

And it all starts with policy and procedure. A huge percentage of the hacking we hear about today actually originates with good people who HAVE AUTHORIZED ACCESS to the data center to do their regular jobs. What? Good people hack? Well, not intentionally. Data center operators regularly plug in flash drives they used at home, which have become infected. And as bad as that sounds with operators, higher-level technicians who have access to deeper or more strategic systems can do the same thing, unintentionally unleashing a storm! They walk into a data center to work on some type of control system and the first thing they do is connect THEIR laptop to the systems. That first hop is essential to any bad guy's hack, so policies that prevent these types of common 'innocent' practices can offer huge reductions in risk and incidents.

It's hard to comprehend the true cost of physical systems that have been hacked. Security cameras that protect millions of dollars of capital and other assets can be compromised, enabling theft. Cooling systems can be switched off for a mere few minutes, taking down entire data centers holding hundreds of millions of dollars of equipment, along with all of the lost business and valuation that goes with massive downtime. And this hacking isn't even focused on the customer data breaches we all hear about every month.

So for 2018, commit to a top-to-bottom assessment of how you fare in this security framework. You likely already have lots of people focused on the network and server portions of security, but I would suspect that you have far fewer people working at the very top (policy and physical) or the very bottom (application and device) of the security framework we are discussing.

Get on it!

Shrinking Data Center

As Data Center Footprint shrinks, the Importance of the Data Center Grows!

Today, your data center is more important and essential than it has ever been!

Let me explain. We are at a crossroads in thinking about IT infrastructures. For nearly 40 years (about the time when the first connected computing systems came into being), bigger has always been better. Bigger processors, bigger disk drives, bigger servers, and bigger data centers to house it all. Most seasoned IT professionals have taken great pride in the amount of stuff they could acquire and then cobble together to create bespoke IT solutions that could meld and adjust to changing business conditions. And for many years, bigger worked. Every imaginable application was thrown at the IT organization, and more stuff would be added to the infrastructure to allow for these new services and expanded capacity. Bigger really was better!

Balancing Public and Private Clouds is essential for 2018

But that all changed in the last ten years as applications and infrastructures were virtualized and all of the essential active devices increased in density and at the same time shrank in physical size. And all of this was happening at the very same time that the public cloud was coming into its own stride and offering new ways of running applications without consuming any owned and operated gear at all! In-house data centers began to shrink in size and take on many of the attributes of the Public Cloud, forming Private Clouds.

Now, some of our colleagues are already equating this shrinking trend to the elimination of data centers altogether. They see the physical size trending downward and extrapolate it to zero footprint at some point in time. Not so fast, partner! This isn't a statistical and theoretical discussion. It's about identifying all of the business services that are needed to run a business, and then determining the best platform on which to deliver those services. And the platform choice is not one-size-fits-all… it is a balancing act based upon cost, capacity, timing, security, support and regulatory needs!

Perhaps counterintuitively, the shrinking footprints of these data centers that you and your team are operating and optimizing into private clouds today are actually becoming MORE important to the business itself! Think about it. What are the applications and services that are being kept in these owned and/or operated data centers? Well, typically it's the most critical ones. It's the secret sauce that you hold close to the vest. It's the stuff that runs the very essence of your business. It's your customer detail and your confidential data. It's the stuff that can't leave your four walls for jurisdictional, regulatory and/or security reasons.

Make no mistake, the public cloud will help us all offload capacity and allow us access to new applications that cannot be brought in-house easily or cost-effectively. The Public Cloud provides a great platform for some applications, but as we have all learned in our IT careers, this is NEVER a one-size-fits-all game, and to focus on a zero-footprint end-game is irrational and contrary to good business and technology understanding. Each IT business service needs to be delivered on the most suitable platform. And it is the definition of "most suitable" that will get us all in trouble if we are not careful. In cases of security, jurisdiction or regulatory compliance, that definition will be more stringent than for many other applications, and your ability to reliably deliver those 'bet the business' core applications has never been more critical.

In 2018, those IT professionals that really get their business’ needs, and can defend WHY they have satisfied those needs in the various technology platforms available to them today, will be handsomely rewarded….

Delivering IT in 2018 is All About Accountability!

It has always been incredibly difficult to align an organization's business needs and strategies with the underlying information technologies required to get there. For many years, the long-tenured IT organization was given a tremendous amount of leeway due to the complex nature of delivering IT. IT was viewed as an almost magical discipline, something that had to exist at the genetic level of the people who called themselves IT professionals. In fact, the executive team knew that IT was a domain that few humans could comprehend, so they rarely asked any deep questions at all! They didn't really challenge budgets, timing or strategy for that matter. Instead, they hired IT experts with impressive resumes and trusted them to deliver. In doing so, they simply let those experts TELL THEM what was possible with IT, how soon they could bring it online, and how much it would cost. And since these were IT experts, everyone listening just wrote down what they heard and took it as fact. There was not a lot of discussion or challenge to their statements. The tail really was wagging the dog!

Last summer I conducted a survey of IT professionals during the VMworld trade show held in Las Vegas. Not surprisingly, almost 80% of the people I surveyed indicated that they had been personally focused on delivering IT services for at least 10 years. A bit more surprising was that half of those surveyed indicated it was almost twice that long! The IT world is full of long-term practitioners. During such long stints, many individuals find an approach or 'rhythm' that works for them. From their perspective, this rhythm is tried and true and will always meet the needs of their constituents. And given the lack of questions from the executive team and other stakeholders, and the apparently free-wheeling nature of IT deployment and direction, times were pretty good for all those years.

In 2018… this is a problem. In an era where all businesses are fast becoming digitally centric, this model no longer works. There must be a plan that is well understood and very defensible, in ALL contexts. There is massive accountability for every action taken and every dollar spent, and just being able to deliver raw capacity and make systems talk to each other is expected as table stakes.

And the strategies must be well-conceived and sound. Think back about 5 years ago, when nearly every company on the planet was blindly jumping on the Public Cloud bandwagon like drunken sailors on holiday leave. That was their 'strategy'. It sounded really good on paper and appeared to be very high tech. Consequently, this Public Cloud cut-over plan was presented as the savior for all of the IT woes that existed….

Then reality set in, and these same drunken sailors sobered up and realized that their actual business needs were not always being met by just abandoning what they already had and foolishly running en masse to the Public Cloud. Whether it was cost or access or predictability, there were one or more aspects of their shiny new "Cloud" strategy that simply didn't pass real-world muster. Oops.

So companies are now (re-)thinking their IT strategies and plans. They realize that, due to economics, applications or audience, they need to create a defensible plan that blends existing resources (like their billions of dollars invested in data center infrastructure) with new resources that are now available as a service. They need operational plans that account for a long list of "what happens if" scenarios. They need to understand the hardware, software and management costs of each component, and then apply those costs to each business service on a per-unit basis to determine whether the business can afford 'their plan'.
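As a rough illustration of that per-unit exercise, here is a minimal sketch; every figure in it (the cost buckets, the dollar amounts, the volume of units delivered) is a hypothetical assumption, not a benchmark.

    # Hypothetical per-unit cost roll-up for a single business service.
    # Every figure below is an illustrative assumption, not a benchmark.
    annual_costs = {
        "hardware": 120_000,
        "software_licensing": 80_000,
        "facility_and_power": 60_000,
        "management_and_staff": 140_000,
    }
    units_delivered_per_year = 2_000_000  # e.g., transactions processed

    total_cost = sum(annual_costs.values())
    cost_per_unit = total_cost / units_delivered_per_year

    print(f"Total annual cost: ${total_cost:,}")       # $400,000
    print(f"Cost per unit:     ${cost_per_unit:.4f}")  # $0.2000

Repeating that arithmetic for each candidate platform, whether in-house, colocation or public cloud, is what turns a preference into a defensible plan.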

So the point is, true innovation is available everywhere you look in IT, but there is a white-collar wrapper needed for everything we do. New data centers can be built if the business model makes sense. Existing data centers can be run better and more efficiently. Colocation space can be added based on cost or capacity build-outs, and Public Cloud services can be adopted for dynamic capacity when the business needs it. This is not a one-size-fits-all game.

Accountability is not a FOUR-letter word (it's actually 14); it's your way of life if you are going to be a successful IT professional in 2018.


By Mark Harris, Senior Vice President of Marketing, Uptime Institute

Fire Suppression Systems Bring Unexpected Risk

Hazards include server damage from loud noise during discharge of inert gas fire suppression systems

By Kevin Heslin, with contributions from Scott Good and Pitt Turner

A downtime incident in Europe has rekindled interest in a topic that never seems more than a spark away from becoming a heated discussion. In that incident, the accidental discharge of an inert gas fire suppression system during testing damaged the servers in a mission-critical facility.

The incident, which took place during testing of the fire suppression system on 10 September 2016 at an ING facility in Bucharest, destroyed dozens of hard drives, according to published reports from the BBC and elsewhere. As a result, ING was forced to rely on a nearby backup facility to support its Romanian operations. The event was of great interest to Uptime Institute's EMEA Network because of the universal requirement for fire protection in data centers. Uptime Institute and Network principals inaugurated a number of information exchanges at ensuing Network meetings.

Fires that originate in data centers are relatively rare and are usually caused by human error during testing and maintenance or by electrical failures, which tend to be self-extinguishing. Other fires spread to the data center from other spaces. At these times, the need for an effective and functioning fire suppression system is obvious: the system must provide life safety and protect expensive gear and mission-critical data. However, a fire suppression system can also pose a risk to operations when it is inadvertently activated during testing and maintenance, and fire suppression systems can themselves cause damage in a facility when they discharge.

These considerations mean that the choice and design of a fire suppression system must meet the business needs and fire threats the facility is likely to face. Water-based systems, for example, will destroy sensitive IT gear when deployed. In general, however, the loss of IT gear in a fire is acceptable to insurance companies and local authorities having jurisdiction (AHJs), who view equipment as replaceable as long as the system saves lives and preserves the building. Data center operators will place a higher value on the data and operations.

Some data centers, however, do deploy inert gas fire suppression systems. In general, these systems are used to protect irreplaceable or extremely costly gear. High-performance computers, for example, tend to be far more expensive to replace than standard x86 servers. In theory, inert gas fire suppression systems keep sprinkler water out of the server room. However, discharge of an inert gas system has been shown to damage data center servers even when there is no fire, so many facilities are turning instead to pre-action systems, which also keep water off the data center floor except when activated. According to one major vendor of both types of fire suppression systems, inert gas systems better protect IT equipment because they do not damage electric and electronic circuits, even under full-load operation. In addition, inert gas systems can suppress deep-seated fires, including those inside a cabinet.

Uptime Institute agrees that accidental discharges of inert gas fire suppression systems are rare. But, at the same time, according to the 2017 Uptime Institute Data Center Industry Survey, about one-third of data center operators have experienced an accidental discharge. In fact, in the same survey, respondents were three times more likely to have experienced an accidental discharge than an actual fire.

Beyond that point of agreement, however, consensus throughout the industry is rare, with much uncertainty about exactly how IT gear is damaged by the discharge of inert gas, how to protect against the damage, or even whether inert gas fire suppression system vendors or IT manufacturers are best positioned to eliminate the problem. Still, anecdotes continue to surface, and vendors have documented, under test conditions, the phenomenon in which loud noises from fire suppression systems impaired the performance of data center servers or disabled them, either temporarily or permanently, leading to data loss.

Uptime Institute notes that vendors have tried to address problems tied to the release of inert gasses by redesigning nozzles and improving sensors to reduce false positive signals. Uptime Institute also agrees with vendor recommendations regarding the use of inert gas systems, including:

  1. Installing racks that have doors to muffle noise
  2. Installing sound-insulating cabinets
  3. Using high-quality servers or even solid-state servers and memory
  4. Slowing the rate of inert gas discharge
  5. Installing walls and ceilings that incorporate sound-muffling materials
  6. Aiming gas discharge nozzles away from servers
  7. Removing power from IT gear before testing inert gas fire suppression systems
  8. Muffling alarms during testing
  9. Replicating data to off-site disk storage

A more dramatic step would be to move to pre-action (dry pipe) water sprinkler or chemical suppression systems, but at least one insurance broker recommends the use of inert gas systems in conjunction with a water system as part of a two-phase fire suppression system.

Regardless, pre-action fire suppression systems have become more common. The use of water means that facility owners are protected against the total loss of a data center, and the dry-pipe feature—originally developed to protect against fire in cold environments such as parking garages or refrigerated coolers—protects facilities from the consequences of an accidental discharge in white spaces. In many applications, they are also the more economical choice, especially as local codes and authorities may require the use of a water suppression system, and the inert gas system then becomes not a replacement, but a fairly expensive supplement.

Still inert gas fire suppression systems continue to have their adherents, and they may make business sense in select applications. Data center operators may consider using inert gas in locations where water is scarce or when an application makes use of very expensive and unique IT gear, such as supercomputers in HPC facilities or old-style tape-drive storage. In addition, inert gas systems may be the best choice when water damage would cause irreplaceable data to be irretrievably lost. In these instances, Uptime Institute believes that organizations would be better served by developing improved backup and business continuity plans.

Those considering inert gas suppression systems may be somewhat relieved to learn that vendors have taken steps to minimize damage from discharges of inert gas systems, perhaps the most important of these being improved sensors that register fewer false positives. It is also entirely possible to develop rigorous procedures that reduce the likelihood of an inadvertent discharge due to human error, which is by far the most common cause of accidental discharge.

Some industry sources believe that the problem first began to manifest around 2008 as inert gas systems began to become popular as fire suppression systems in data centers. Others note that server density increased at about the same time.

An examination of Uptime Institute Network’s Abnormal Incident Reports (AIRs) database does not support this belief. It includes reports dating back as far as 1994 and 1995, with no obvious increase in 2008. In total, the AIRs database includes 54 incidents involving inert gas fire suppression systems. Of these reported incidents, 15 involved accidental discharge, with 2 downtime reports. However, many of the incidents took place in support areas, with no possibility of server downtime. Still the documented possibility of damage to IT gear and facility downtime worries many data center operators.

Uptime Institute data finds that operations and management issues lead to most accidental discharges of fire suppression systems.

Uptime Institute Network members commonly use the AIRs database to identify and prevent problems experienced by their colleagues. In this case, Network members report that 27 incidents were caused by technician error, with another 14 resulting from no maintenance or poor procedures, 9 resulting from a manufacturing problem, and 4 from a design omission or installation problem.

Although these results are typical for most system failures, they are particularly relevant to discussions of all fire suppression systems, as differences of opinion exist about how exactly the discharge of inert gas systems damages IT gear. Manufacturers believe that the sound of the discharge damages the drives, but others say it is the noise of the fire alarm, either independently or by contributing to the noise level in the facility.

Uptime Institute does not believe this to be a realistic concern, as the AIRs database includes no instances where fire alarms by themselves resulted in data center downtime or server damage.

The fire suppression industry is aware of the problem. During a 2014 meeting with server manufacturers, fire suppression vendors acknowledged the problem, noting concerns about the noise (actually a pressure wave moving through the room) emitted by inert gas suppression systems during discharge. The vendors theorized that the volume (decibels) and frequency (hertz) of sound emitted during system discharge combine to damage servers. In response, many vendors redesigned nozzles to reduce pressure and, therefore, the decibel level of the discharge. This, one vendor said, would not eliminate the problem, as each server type has a different sensitivity, and new server types may be susceptible to different volume and frequency combinations. In heterogeneous environments, sensitivity may vary from server to server or rack to rack.

Vendors say that the time required for testing makes it impossible to develop inert gas suppression systems that discharge without affecting the many server types already available in the marketplace. Many environments are heterogeneous, so a discharge system may affect some of the equipment in a rack or room without doing any apparent damage to other equipment. Vendors have also introduced sensors that are more accurate, reducing the likelihood that a false alarm will trigger an unnecessary discharge.

These same vendors note that higher-grade enterprise servers are less susceptible to damage from inert gas discharges. These servers, they note, are more likely to be installed in quality racks that have doors and other features that muffle noise and cushion servers against the shock from the sound waves. In addition, enterprise servers are designed to be operated in large cabinets where many drives spin at the same time, creating both noise and vibration. These drives are tested to withstand harsher environments than consumer drives. They track more precisely and sustain higher data rates, making them more resistant to sound and vibration. According to one vendor, even slamming doors can degrade the performance of the consumer drives sometimes found in data centers.

These measures can be effective, according to a Data Center Journal article written in 2012 by IBM's Brian P. Rawson and Kent C. Green. They explain that noise causes the read-write element to go off the data track: "Current-generation HDDs have up to about 250,000 data tracks per inch on their disks. To read and write, the element must be within ±15% of the data track spacing. This means the HDD can tolerate less than 1/1,000,000 of an inch offset from the center of the data track—any more than that will halt reads and writes." They theorize that decreased spacing between data tracks has made servers more susceptible to damage or degraded performance from noise. In the same article, Rawson and Green cite a YouTube video that shows how even a low-decibel noise such as a human voice can degrade HDD performance.
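To put those numbers in perspective, here is a quick back-of-the-envelope check using only the figures quoted from the article; real drives vary by model and generation.

    # Back-of-the-envelope check of the figures quoted above.
    tracks_per_inch = 250_000                  # data tracks per inch, per the article
    track_pitch_in = 1 / tracks_per_inch       # spacing between adjacent tracks (inches)
    allowed_offset_in = 0.15 * track_pitch_in  # element must stay within +/-15% of the pitch

    print(f"Track pitch:    {track_pitch_in * 1e6:.1f} millionths of an inch")
    print(f"Allowed offset: {allowed_offset_in * 1e6:.1f} millionths of an inch")
    # Prints 4.0 and 0.6: the allowed offset is indeed less than one
    # millionth of an inch, as Rawson and Green state.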

Uptime Institute notes that common security practices along with strong operational procedures relating to testing fire suppression systems can mitigate most risk associated with inert gas fire suppression systems.

Uptime Institute recommends that IT management teams work with risk managers to ensure that all stakeholders understand a facility’s fire suppression requirements and options before selecting a fire suppression system. Operational considerations should also be included, so that the system is well suited to an organization’s risk exposure and the business requirements.

Uptime Institute believes that most data centers would be best served by a combination of a pre-action (dry pipe) sprinkler system and high-sensitivity smoke detection. Most AHJs, risk managers, and insurance companies will support this choice as long as other operating requirements are met, like having educated and trained staff providing building coverage. These authorities are generally quite familiar with water-based fire suppression systems, as these constitute the vast majority of installations in the U.S.; however, they may not always be familiar with pre-action systems.

In instances when risk managers or insurers require an inert gas fire suppression system, operations staff may be able to mitigate the risk of accidental discharge by implementing documented policies, procedures, and practices. These documents should include as many of the vendor recommendations listed earlier in this article as possible. In this way, a risk manager's requirement for inert gas fire suppression should not be taken as the end of the discussion but rather the start of a dialog.

Finally, IT should continuously evaluate its fire suppression system and consider removing inert gas systems from spaces whose use changes. Uptime Institute has documented the use of inert gas fire suppression in spaces that were converted from IT to storage. In those instances, the facility increased its risk of accidental discharge but gained no benefit at all.


Kevin Heslin is chief editor and director of Ancillary Projects at Uptime Institute. In these roles, he supports Uptime Institute communications and education efforts. Previously, he served as an editor at BNP Media, where he founded Mission Critical, a commercial publication dedicated to data center and backup power professionals. He also served as editor at New York Construction News and CEE and was the editor of LD+A and JIES at the IESNA. In addition, Heslin served as communications manager at the Lighting Research Center of Rensselaer Polytechnic Institute. He earned a BA in journalism from Fordham University in 1981 and a BS in technical communications from Rensselaer Polytechnic Institute in 2000.

Myths and Misconceptions Regarding the Uptime Institute’s Tier Certification System

True or false? When it comes to Tier Certification, just ask Uptime Institute

By Uptime Institute Staff

Uptime Institute’s Tier Classification System for data centers has reached the two-decade mark. Since its creation in the mid-1990s, Tiers has evolved from a shared industry terminology into the global standard for third-party validation of data center critical infrastructure.

In that time, the industry has changed, and Tiers has evolved with it, remaining as relevant and important as it was when Uptime Institute first developed and disseminated Tiers. At the same time, Uptime Institute has observed that public understanding of Tiers has been clouded by the many myths and misconceptions that have developed over the years.

Uptime Institute has long been aware that not everyone fully understands the concepts described by the Tier Standards, and that others disagree with some of the definitions. Both situations lead to classic misunderstandings in which individuals substitute their preferences for accurate information. Other times, marketers have invoked a kind of shorthand based on Tiers, coining objectionable terms like 'Tier III plus' when speaking to potential customers. These terms have no basis in Tiers but can be especially confusing to IT, real estate, and procurement personnel, and even CFOs, all of whom might lack a technical background.

Other myths develop because some industry professionals reference old, out-of-date publications and explanatory material that is no longer valid. There may be other sources of myth, but what really matters is that Uptime Institute is the only source of current and reliable information about Tiers. We conduct numerous classes during the year, write many articles, and field numerous inquiries to keep the industry current on Tiers.

Fundamentally, Uptime Institute created the Tier Classification System to consistently evaluate various data center facilities in terms of potential site infrastructure performance, or uptime. The system comprises four Tiers; each Tier incorporates the requirements of the lower Tiers.

• Tier I: Basic Capacity

• Tier II: Redundant Capacity Components

• Tier III: Concurrently Maintainable

• Tier IV: Fault Tolerant

Data center infrastructure costs and operational complexities increase with Tier level, and it is up to the data center owner to determine the Tier that fits the business’s need.

Uptime Institute is the only organization permitted to Certify data centers against the Tier Classification System. Uptime Institute does not design, build, or operate data centers. Uptime Institute’s role is to evaluate site infrastructure, operations, and strategy.

From this experience, we have compiled and addressed many of the myths and misconceptions. You can read about some of these experiences in Uptime Institute eJournal articles such as “Avoid Failure and Delay on Capital Projects: Lessons from Tier Certification” and “Avoiding Data Center Construction Problems.” For even more information, please contact us at https://uptimeinstitute.com/contact.

Tiers does not address business requirements. 

False. Tiers is a performance-based, business-case-driven data center benchmarking system. An organization’s risk tolerance determines the appropriate Tier for the business. In other words, Tiers is predicated on the business case of an individual company. Companies that fail to develop a unique business case for their facilities before developing a Tier objective are misusing Tiers and bypassing the internal dialogue that needs to occur.

Tier IV is the best. 

False. An organization’s tolerance for risk determines the appropriate Tier to support the business objective. Tier IV is not the best answer for all organizations, and neither is Tier II. Owners should perform due diligence assessments of their facilities before determining a Tier objective. If no business objective is defined, then Tiers may be misused to rationalize unnecessary investment.

Tier I and Tier II are tactical solutions, usually driven more by first-cost and time-to-market than life-cycle cost and performance (uptime) requirements. Organizations selecting Tier I and Tier II solutions typically do not depend on real-time delivery of products or services for a significant part of their revenue stream. Generally, these organizations are contractually protected from damages stemming from lack of system availability.

Rigorous uptime requirements and long-term viability are usually the reason for selecting strategic solutions found in Tier III and Tier IV site infrastructure. In a Tier III facility, each and every capacity component can be taken out of service on a planned basis, without affecting the critical environment or IT processes. Tier IV solutions are even more robust, as each and every capacity component and distribution path can sustain a failure, error, or unplanned event without impacting the critical environment or IT processes.

A Tier IV solution is not better than a Tier II solution. The performance and capabilities of a data center’s infrastructure should match a business application; otherwise companies may overinvest or take on too much risk.

For example, before building a Tier II Certified Constructed Facility, which by definition does not include Concurrent Maintainability across all critical subsystems, an owner should consider whether the business can tolerate a planned or maintenance-related shutdown and how the site operations team would coordinate a site-wide shutdown for maintenance. Similarly business objectives should drive decisions to build a Tier I, Tier III, or Tier IV Certified Constructed Facility.

Component count determines Tier level. 

False. Tier Certification is a performance-based evaluation of a data center's specific infrastructure; it is not a checklist or cookbook. Unfortunately, some industry shorthand employs N terminology—where N is defined as the number of components that are minimally required to meet the load demand—to define availability. Incorporating more equipment can be described as designing an N+1, N+2, 2N, or 2(N+1) facility. However, increasing the component count does not determine or guarantee achievement of any specific Tier level, because Tiers also includes evaluation of distribution pathways and other system elements. Therefore, it is possible to achieve Tier IV with just N+1 components, depending on how they are configured and connected to redundant distribution pathways.
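As an illustration of the N terminology only, the sketch below counts units for a hypothetical cooling plant; the load and chiller capacity are assumptions, and as noted above, these counts by themselves say nothing about Tier level.

    import math

    # Hypothetical cooling plant used only to illustrate N terminology.
    # N is the minimum number of units needed to carry the design load;
    # these counts say nothing about Tier level on their own.
    design_load_kw = 1800    # assumed critical cooling load
    unit_capacity_kw = 500   # assumed capacity of a single chiller

    n = math.ceil(design_load_kw / unit_capacity_kw)  # N = 4 units

    configurations = {
        "N": n,                 # just enough capacity, no redundancy
        "N+1": n + 1,           # one redundant unit
        "N+2": n + 2,           # two redundant units
        "2N": 2 * n,            # fully duplicated capacity
        "2(N+1)": 2 * (n + 1),  # two independent plants, each with a spare unit
    }

    for label, units in configurations.items():
        print(f"{label:>7}: {units} chillers")

Two facilities with identical N+1 counts can land at different Tier levels depending on how those units tie into the distribution pathways.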

Design Certification is the only Certification that matters. 

False. The first step in a Tier Certification process is a Tier Certification of Design Documents. Uptime Institute Consultants review the 100% design documents, ensuring each electrical, mechanical, monitoring, and automation subsystem meets the fundamental concepts and there are no weak links in the chain. The Design Certification is intended to be a milestone so that data center owners can commence data center construction knowing that the intended design meets the Tier objective.

Tier Certification of Design Documents applies to a document package. It is intended as provisional verification until the Tier Certification of Constructed Facility. Uptime Institute has not verified the constructed environment of these facilities, and thus cannot speak to the standard(s) to which they were built. To emphasize this point, Uptime Institute implemented an expiration date on Design Certifications. All Tier Certification of Design Documents awards issued after 1 January 2014 expired two years after the award date.

During a Facility Certification, a team of Uptime Institute consultants conducts a site visit, identifying discrepancies between the design drawings and installed equipment. The consultants observe tests and demonstrations to prove Tier compliance. Fundamentally, this is the value of Tier Certification: finding the blind spots and weak links in the chain. Uptime Institute consultants say that in almost every site visit they find that changes have been made after the Design Certification was awarded, so that one or more systems or subsystems will not perform in a way that complies with Tier requirements.

More recently, Uptime Institute instituted the Tier Certification of Operational Sustainability to evaluate how operators run and manage their mission-critical facilities. Even the most robustly designed and constructed facilities may experience outages without a well-developed comprehensive management and operation program. Certification at all three levels is how data center owners can be assured they are realizing the maximum potential of their data centers.

Tier levels specify an estimated downtime per year. 

False. Uptime Institute removed references to “expected downtime per year” from the Tier Standard in 2009, but they were never a part of the Tier definitions. Tier Standard: Topology is based on specific performance factors (outcomes) that demonstrate that a facility has met specific performance objectives, such as having redundant capacity components, Concurrent Maintainability (generally, the ability to remove any capacity or distribution component from service on a planned basis without impacting IT), or Fault Tolerance (generally, the ability to experience any unplanned failure in the site infrastructure without impacting IT). However, even a Tier IV data center, which is Fault Tolerant, may experience IT outages if it is not operated and managed effectively.

There are statistical tools to predict the frequency of failures and time to recover. Availability is simply the arithmetic calculation of time a site was available over total time. The number, frequency, and duration of disruptions will drive the availability result. However, caution is appropriate when using these tools. Human activity is often not considered by statistical models. In addition, the statistical prediction of a 100-year storm, for example, can obscure the possibility that several 100-year storms can happen in the same year.
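As a simple illustration of that arithmetic, with a hypothetical downtime total:

    # Hypothetical availability calculation: time available divided by total time.
    hours_per_year = 8760   # 365-day year
    downtime_hours = 4.0    # assumed combined duration of all disruptions

    availability = (hours_per_year - downtime_hours) / hours_per_year
    print(f"Availability: {availability:.5%}")  # 99.95434%

The same result could come from one 4-hour outage or eight 30-minute outages, which is why the number, frequency, and duration of disruptions all need to be examined, not just the headline percentage.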

Tier Certification applies only to newly built facilities. 

False. Uptime Institute has Certified many existing buildings. However, the process can be more challenging when working in facilities with live loads. For best results with an existing facility, the process should begin with a Tier Gap Analysis rather than a formal Certification effort. Tier Gap Analysis provides a high-level summary review for major Tier shortfalls. This allows the owner to make an informed decision whether to proceed with a detailed, exhaustive Certification effort. Tier Certification of Constructed Facility can be performed with any load profile, including resistive load banks, live critical IT load, or a mix.

Uptime Institute Tiers is U.S.-centric. 

False. Uptime Institute is currently delivering Tier Certifications in more than 85 countries. Tiers, which allows for many solutions and a variety of configurations, gives the design, engineering, and operations teams the flexibility to meet both local regulations and performance requirements. To date, there has not been a conflict between Tiers and local building codes, statutes, or jurisdictions.

TIA-942 is a guideline for Uptime Institute Tiers. 

False. In 2014 Uptime Institute and the Telecommunications Industry Association (TIA) agreed on a clear separation between their respective benchmarking systems to avoid industry confusion and drive accountability. In fact, any reference to the TIA rating of a data center may not include the word Tier.

The core objective of Uptime Institute Tiers is to define performance capabilities that will deliver availability required by the data center owner. By contrast, TIA member company experts focus on the need to support the deployment of advanced communications networks. See https://uptimeinstitute.com/uptime-tia for a more detailed explanation.

Utility feeds determine Tier level. 

False. According to Tier Standard: Topology, the only reliable source of power for a data center is the engine-generator plant. This is because utility power is subject to unscheduled interruption—even in places with reliable power grids. As a result, the number of utility feeds, substations, and power grids that provide public power to a data center neither predicts nor influences Tier level. As a consequence, utility power is not even required for Tiers. Most Tier Certified data centers use utility power for main operations as an economic alternative, but this decision does not affect the owner's target Tier objective.

For Tier III and IV, the engine-generator plant must be operational at all times. 

False. Tiers does not require that the engine-generator plant actually run at all times; however, data centers will typically utilize a public utility a majority of the time for cost or regulatory reasons. At the same time, the engine-generator plant must be properly configured, rated, and sized to have the capability to carry the critical load without runtime limitations. Hence, the performance requirements outlined in the Tier Standard must be met with the data center supported by engine-generator power. Meeting these criteria requires special attention to engine-generator capacity ratings and power distribution.

Uptime Institute’s Tier system is based on the U.S. Environmental Protection Agency (EPA) regulations on diesel-engine operations.

False. There is no correlation between EPA’s Tiers (or other restrictions of engine-generator operation) and Uptime Institute Tiers, except that both systems use a similar hierarchical system of nomenclature. The EPA’s limits on runtime may complicate a facility’s testing and maintenance regimens and add costs when a facility is forced to rely on backup power for an extended period. However, runtime limitations posed by local authorities do not exempt a data center from having on-site power generation rated to operate without runtime limitations at a constant load.

EPOs (emergency power off) or other systems that shut down the critical load impact Tier objectives.

False. When code or the local authority having jurisdiction (AHJ) mandates an EPO, this does not prohibit Tier compliance. At the same time, Uptime Institute does not recommend EPO installation unless it is compelled by local code, because even Tier Certified data centers are vulnerable to outages from purposeful or accidental activation of the EPO system. Analysis of the Uptime Institute Network's Abnormal Incident Report (AIRs) database confirms that accidental EPO activation is a recurring cause of downtime.

The Tiers Standard requires that maintenance, isolation, and/or removal can be performed on the EPO system without affecting the critical load for Tier III data centers. Tier IV data centers additionally require a Fault Tolerant EPO system.

Uptime Institute Tiers requires raised floor. 

False. The choice of underfloor or overhead cooling is a decision to be made by the owner based on operational preference. In Uptime Institute's experience, a raised floor enhances operational flexibility over the long term. Decisions such as raised floor or on-slab, Cold Aisle/Hot Aisle, containment of Cold Aisle/Hot Aisle, and gallery cooling can affect the efficiency of the computer room environment, but they are not mandated by Uptime Institute Tiers.

Rack-based automatic transfer switches (ATS) meet the requirement for dual paths to a server (i.e., the server has one cord to the ATS, but the ATS [rack mount] has dual power feeds).

True. The Tier Standard includes a concession for equipment with odd numbers of cords (1,3,5) in the form of rack-mounted transfer switches to provide access to multiple power paths. However, Tier III and Tier IV data centers must still have multiple and independent feeds to the rack.

The Tier Standard focuses on ensuring that the facility’s infrastructure meets the requirements of the Tier objective. There are many reasons why a facility may contain single-corded IT devices or those with an odd number of power supplies, including lack of knowledge of the facility impacts, lack of options for equipment vendors, and colocation environments where facility personnel have no control over the types of IT devices within the data center. Rack-based transfer switches are most typically supplied by the IT side of the organization, so the facility’s infrastructure can meet the Tier objective. However, planned isolation or fault of these rack-based transfer switches may lead to an outage for individual racks or devices.

Tier II provides Concurrent Maintenance opportunities. 

Partially true. Tier II allows for Concurrent Maintenance of capacity components, but not distribution pathways or critical elements. So a Tier II Certified facility can perform Concurrent Maintenance on engine generators, UPS, chillers, cooling towers, pumps, air conditioners, fuel tanks, water tanks, and fuel pumps, but not on switchboards, panels, transfer switches, transformers, bus bars, cables, and pipes. In many cases, this limitation will require the computer room to be shut down for planned maintenance or replacement of critical pathways and elements.

The requirement to maintain any component, pathway, or element without shutting down equipment, known as Concurrent Maintainability, defines Tier III. Many owners’ business cases, including healthcare, domestic outsourcers, and state governments, require Tier III. The list of organizations that have protected their investment with Tier Certifications may be found on Uptime Institute’s website.

A Tier III facility is Tier compliant if one of the redundant branches is inactive. 

Partially true. Tier III requires active/active distribution for critical power distribution (which is defined as the output of the UPS and below). Outside of that, active/inactive is acceptable. This means that if a rack receives dual power from two separate power distributions, they must both normally be active. It is not allowable to have one feed normally disabled, nor is it Tier III compliant to have one of the power feeds directly fed from utility power while bypassing a UPS power source.

There are no active/active requirements for mechanical systems in Tier III data centers. So if there are N+1 chillers in a Tier III facility, with each chiller feeding separate A and B chilled water loops, it is permissible for one of the loops to be normally disabled, with all air conditioners normally fed from the same loop.

Facilities can’t be changed after Tier Certification of Constructed Facility. 

False. Infrastructure changes must be approached using carefully developed and written procedures and processes. If the topology of a facility changes, it may no longer be Concurrently Maintainable or Fault Tolerant, so clients should have Uptime Institute review designs and construction that might affect a facility’s topology to protect their investment and Tier Certification. Tier Certifications can be revoked if unreviewed changes compromise a facility’s Concurrent Maintainability or Fault Tolerance.

A Tier IV facility must have all its cooling units operating. 

Mostly false. The Tier Standard requires only that Tier IV facilities provide stable cooling to the IT and UPS environment for the time it takes the mechanical systems to completely restart after a utility power outage and provide rated load to the data center. Tier IV data centers must also be able to maintain a stable thermal environment for the duration of the mechanical restart time and for any 15-minute period in accordance with the 2015 ASHRAE Thermal Guidelines. Tier IV facilities are also required to be active/active for all systems. This is intended to ensure that Continuous Cooling solutions are not negated by a lack of active operation of components. A lightly loaded data center or one with a very complex control system may be able to meet these requirements without using all the available cooling units. However, there are Tier IV data center designs, especially those at full load, that would in fact require all units to run during normal operations.

Makeup air capacity counts as critical cooling capacity. 

Typically false. Makeup air systems in data center applications are typically designed to meet one of three objectives (or a combination of the three):

• Provide the fresh air for occupants required by an AHJ

• Ensure positive pressure in the data hall, which will help keep contaminants out of the data center

• Aid in meeting the humidity requirements of the data center

Data centers are rarely designed in a manner that requires the makeup air handler to be active in order to meet the N cooling capacity requirement. However, the existence of a makeup air handler and its operation cannot negatively impact compliance with Tiers. For example, if a makeup air handler is not sized to ASHRAE extremes in compliance with Tiers, the additional heat load from this air handler at those conditions must be considered when sizing the critical cooling system.

It is not possible to utilize diesel rotary uninterruptible power systems (DRUPS) as continuous cooling in a Tier IV facility.

False. The Tier Standard is vendor and technology neutral, which means it is possible to Tier Certify facilities that include a wide variety of innovative and new technologies, including DRUPS.

Facilities tend to deploy DRUPS, which combine a diesel engine with a rotary UPS that uses kinetic energy in place of batteries; batteries require high levels of maintenance, somewhat frequent replacement, and a lot of extra space for placement and storage. This design usually provides ride-through times of between 10 and 30 seconds, depending on the application, which is shorter than other technologies. The Tier Standard does not include a minimum ride-through time. In fact, Uptime Institute has Certified several facilities that include DRUPS technology.

DRUPS may also be used to power motor loads. That means that caution must be exercised to ensure that the DRUPS have sufficient capacity to power each and every system and subsystem, including cooling systems, which is accomplished by putting the mechanical components on a no-break bus.

Ductwork does not need to comply with the Tier requirements. 

False. Tier Certification analyzes each and every system and subsystem down to the level of valve positions and panel feeds. Ductwork, just like piping systems, may need planned maintenance, replacement, or reconfiguration. As such, traditional ductwork distribution systems must meet the requirements of the Tier objective.

Uptime Institute understands that there is a lot of confusion about what “maintaining” ductwork means to meet Concurrently Maintainable requirements. But in this case, Concurrent Maintainability is about having the capability to isolate a system or part of a system to maintain, repair, upgrade, or reconfigure the data center without impacting any of the computer equipment.

Site Location affects Tier level. 

False. Although a critical consideration for the life-cycle operation of the facility and in determining, evaluating, and mitigating risk to the data center, geographical location does not affect a facility’s Tier level and is not part of the Tier Standard: Topology. 

Data center designers can take precautions to address the specific risks of a site. A data center sited in a high-risk earthquake zone can include equipment that has been seismically rated and certified as well as incorporate techniques that mitigate damage from seismic activity. Or if a data center has been sited in a high-risk tornado area, designers can consider wind protection measures for the exterior electrical and heat rejection equipment.

Site Location is a criterion in the Tier Certification of Operational Sustainability.


Chris Brown, Enrique Hernandez, Kevin Heslin, Julian Kudritzki, Eric Maddison, Ryan Orr, Sarah Thomas, Pitt Turner, and Rich Van Loo all contributed to this article.

Top Considerations for Addressing Data Center Facilities Management Risks

Uptime Institute recently published "Top Considerations for Addressing Data Center Facilities Management Risks," a guide for reducing data center risks in enterprise IT organizations. The guide comprises 14 top considerations useful for designing and running an enterprise-grade data center facilities management program. The full guide is available for download on the Uptime Institute website.

Areas of discussion include:

  • Monitoring and Managing Staff Overtime
  • Managing a Critical Spares Inventory
  • Maintaining a Reliable Diesel Fuel Supply
  • Developing Emergency Operating Procedures (EOPs)
  • Regular Execution of Site Drills
  • Executing Against a Procedure-Based Control Methodology
  • Establishing a NFPA 70E Compliant Safety Program
  • Completing Short-Circuit Coordination Studies & Arc Flash Assessments
  • Implementing Battery Monitoring Systems
  • Preparing a Formalized Training Curriculum
  • Invoking Maintenance Program Best Practices
  • Enforcing Access Control & Vendor Supervision
  • Performing Regular Integrated and Key Systems Testing & Validation
  • Building & Integrating Robust Change Management Protocols