Achieving Uptime Institute Tier III Gold Certification of Operational Sustainability
Vantage Data Centers certifies design, facility, and operational sustainability at its Quincy, WA site
By Mark Johnson
In February 2015, Vantage Data Centers earned Tier III Gold Certification of Operational Sustainability (TCOS) from Uptime Institute for its first build at its 68-acre Quincy, WA campus. This project is a bespoke design for a customer that expects a fully redundant, mission critical, and environmentally sensitive data center environment for its company business and mission critical applications.
Achieving TCOS verifies that practices and procedures (according to the Uptime Institute Tier Standard: Operational Sustainability) are in place to avoid preventable errors, maintain IT functionality, and support effective site operation. The Tier Certification process ensures operations are in alignment with an organization’s business objectives, availability expectations, and mission imperatives. The Tier III Gold TCOS provides evidence that the 134,000-square-foot (ft²) Quincy facility, which qualified as Tier III Certified Constructed Facility (TCCF) in September 2014, would meet the customer’s operational expectations.
Vantage believes that TCOS is a validation that its practices, procedures, and facilities management are among the best in the world. Uptime Institute professionals verified not only that all the essential components for success are in place but also that each team member demonstrates tangible evidence of adhering strictly to procedure. It also provides verification to potential tenants that everything from maintenance practices to procedures, training, and documentation is done properly.
Recognition at this level is a career highlight for data center operators and engineers—the equivalent of receiving a 4.0-grade-point average from Vantage’s most elite peers. This recognition of hard work is a morale booster for everyone involved—including the tenant, vendors, and contractors, who all worked together and demonstrated a real commitment to process in order to obtain Tier Certification at this level. This commitment from all parties is essential to ensuring that human error does not undermine the capital investment required to build a 2N+1 facility capable of supporting up to 9 megawatts of critical load.
Data centers looking to achieve TCOS (for Tier-track facilities) or Uptime Institute Management & Operations (M&O) Stamp of Approval (independent of Tiers) should recognize that the task is first and foremost a management challenge involving building a team, training, developing procedures, and ensuring consistent implementation and follow up.
BUILDING THE RIGHT TEAM
The right team is the foundation of an effectively run data center. Assembling the team was Vantage’s highest priority and required a careful examination of the organization’s strengths and weaknesses, culture, and appeal to prospective employees.
Having a team of skilled heating, ventilation and air conditioning (HVAC) mechanics, electricians, and other highly trained experts in the field is crucial to running a data center effectively. Vantage seeks technical expertise but also demonstrable discipline, accountability, responsibility, and drive in its team members.
Beyond these must-have features is a subset of nice-to-have characteristics, and at the top of that list is diversity. A team that includes diverse skill sets, backgrounds, and expertise not only ensures a more versatile organization but also enables more work to be done in-house. This is a cost saving and quality control measure, and yet another way to foster pride and ownership in the team.
Time invested upfront in selecting the best team members helps reduce headaches down the road and gives managers a clear reference for what an effective hire looks like. A poorly chosen hire costs more in the long run, even if it seems like an urgent decision in the moment, so a rigorous, competency-based interview process is a must. If the existing team does not unanimously agree on a potential hire, organizations must move on and keep searching until the right person is found.
Recruiting is a continuous process. The best time to look for top talent is before it’s desperately needed. Universities, recruiters, and contractors can be sources of local talent. The opportunity to join an elite team can be a powerful inducement to promising young talent.
Talent, by itself, is not enough. It is just as important to train the employees who represent the organization. Like medicine or finance, the data center world is constantly evolving: standards shift, equipment changes, and processes are streamlined. Training is about both certification (external requirements) and ongoing learning (internal advancement and education). To accomplish these goals, Vantage maintains and mandates a video library of training modules at its facilities in Quincy and Santa Clara, CA. The company has also developed an online learning management system that augments safety training, on-site video training, and personnel qualification standards that require every employee to be trained on every piece of equipment on site.
The first component of a successful training program is fostering on-the-job learning in every situation. Structuring on-the-job learning requires that senior staff work closely with junior staff and that employees with different types and levels of expertise pair up to learn from one another. Having a diverse hiring strategy can lead to the creation of small educational partnerships.
It’s impossible to ensure the most proficient team members will be available for every problem and shift, so it’s essential that all employees have the ability to maintain and operate the data center. Data center management should encourage and challenge employees to try new tasks and require peer reviews to demonstrate competency. Improving overall competency reduces over-dependence on key employees and helps encourage a healthier work-life balance.
Formalized, continuous training programs should be designed to evaluate and certify employees using a multi-level process through which varying degrees of knowledge, skill, and experience are attained. The objectives are ensuring overall knowledge, keeping engineers apprised of any changes to systems and equipment, and identifying and correcting any knowledge shortfalls.
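Identifying knowledge shortfalls against a requirement that every employee be trained on every piece of equipment comes down to a simple comparison. The following is a minimal sketch, not Vantage's actual system; the function name, equipment labels, and employee names are all hypothetical.

```python
def training_gaps(equipment, qualifications):
    """Map each employee to the equipment they are not yet qualified on.

    equipment: iterable of equipment names on site (hypothetical labels)
    qualifications: dict of employee -> iterable of equipment they are trained on
    """
    required = set(equipment)
    return {
        employee: sorted(required - set(qualified))
        for employee, qualified in qualifications.items()
    }


# Illustrative data only; not drawn from any real site.
equipment = ["UPS", "generator", "chiller"]
qualifications = {
    "engineer_a": ["UPS", "generator", "chiller"],
    "engineer_b": ["UPS"],
}
gaps = training_gaps(equipment, qualifications)
```

An empty list for an employee means no outstanding training; a non-empty list is the shortfall to correct.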
Ultimately, discipline and adherence to fine-tuned procedures are essential to operational excellence within a data center. The world’s best-run data centers even have procedures on how to write procedures. Any element that requires human interaction or consideration—from protective equipment to approvals—should have its own section in the operating procedures, including step-by-step instructions and potential risks. Cutting corners, while always tempting, should be avoided; data centers live and die by procedure.
Managing and updating procedure is equally important. For example, major fires broke out just a few miles away from Vantage’s Quincy facility not long ago. The team carefully monitored and tracked the fires, noting that they were still several miles away and seemingly headed away from the site. That information, however, was not communicated directly to the largest customer at the site, which called in the middle of the night to ask about possible evacuation and the recovery plan. Vantage collaborated with the customer to develop a standardized system for emergency notifications, which it incorporated in its procedures, to mitigate the possibility of future miscommunications.
Once procedures are created, they should go through a careful vetting process involving a peer review, to verify the technical accuracy of each written step, including lockout/tagout and risk identification. Vetting procedures means physically walking on site and carrying out each step to validate the procedure for accuracy and precision.
Effective work order management is part of a well-organized procedure. Vantage’s work order management process:
• Predefines scope of service documents to stay ahead of work
• Manages key work order types, such as corrective work orders, preventive maintenance work orders, and project work orders
• Measures and reports on performance at every step
Maintaining regular, detailed reporting practices adds yet another layer of procedural security. A work order system can maintain and manage all action items. Reporting should be reviewed with the parties involved in each step, with everyone held accountable for the results and mistakes analyzed and rectified on an ongoing basis.
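As a rough illustration of the work order management described above, the sketch below models the three work order types and reports a completion rate for each. It is an assumption-laden example rather than a description of Vantage's system: the class names, fields, and sample orders are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class WorkOrderType(Enum):
    CORRECTIVE = "corrective"
    PREVENTIVE = "preventive maintenance"
    PROJECT = "project"


@dataclass
class WorkOrder:
    order_id: str
    kind: WorkOrderType
    scope_document: str  # predefined scope of service the order follows
    completed: bool = False


def completion_report(orders):
    """Return {work order type: fraction completed} for the types present."""
    totals = Counter(order.kind for order in orders)
    done = Counter(order.kind for order in orders if order.completed)
    return {kind: done[kind] / totals[kind] for kind in totals}


# Illustrative orders only.
orders = [
    WorkOrder("WO-1", WorkOrderType.PREVENTIVE, "UPS quarterly inspection", True),
    WorkOrder("WO-2", WorkOrderType.PREVENTIVE, "Chiller filter change"),
    WorkOrder("WO-3", WorkOrderType.CORRECTIVE, "Replace failed fan", True),
]
report = completion_report(orders)
```

A per-type report like this gives the parties reviewing each step a shared number to be accountable for.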
Peer review is also essential to maintaining quality methods of procedure (MOPs) and standard operating procedures (SOPs). As with training, pairing up employees for peer review processes helps ensure excellence at all stages.
IMPLEMENTATION AND DISCIPLINE
Disciplined enforcement of processes that are proven to work is the most important component of effective standards and procedures. Procedures are not there to be followed when time allows or when it is convenient. For instance, if a contractor shows up on site without a proper work order or without having followed proper procedure, that’s not an invitation to make an exception. Work must be placed on hold until procedures can be adhered to, with those who did not follow protocol bearing accountability for the delay.
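The no-exceptions rule for contractors can be expressed as a simple gate. This is a hypothetical sketch of the logic only; the names `SiteVisit` and `admit` and the sample work order IDs are illustrative assumptions, not part of any Vantage tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SiteVisit:
    contractor: str
    work_order_id: Optional[str]  # None if the contractor arrived without one
    procedure_followed: bool


def admit(visit, approved_work_orders):
    """Return (allowed, reason); work stays on hold unless every check passes."""
    if visit.work_order_id is None or visit.work_order_id not in approved_work_orders:
        return False, "on hold: no approved work order"
    if not visit.procedure_followed:
        return False, "on hold: procedure not followed"
    return True, "proceed"


# Illustrative data only.
approved = {"WO-100", "WO-101"}
ok = admit(SiteVisit("Contractor A", "WO-100", True), approved)
held = admit(SiteVisit("Contractor B", None, True), approved)
```

The gate has no override branch, which mirrors the point that convenience is not grounds for an exception.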
For example, Vantage developed emergency operating procedures (EOPs) for any piece of equipment that could possibly fail. And, sure enough, an uninterruptible power supply (UPS) failed during routine maintenance. Because proper procedures had been developed and employees properly trained, they followed the EOP to the letter, solving the problem quickly and entirely eliminating human error from the process. The loads were diverted, the crisis averted, and everything was properly stabilized to work on the UPS system without fear of interrupting critical loads.
Similarly, proper preparation for maintenance procedures eliminates risk of losing uptime during construction. Vantage develops and maintains scope of service documents for each piece of equipment in the data center, specifying what is required to maintain it. The same procedures for diverting critical loads for maintenance were used during construction to ensure the build didn’t interfere with critical infrastructure despite the load being moved more than 20 times.
Transparency and open communication between data center operators and customers while executing preventive maintenance is key. Vantage notifies the customer team at the Quincy facility prior to executing any preventive maintenance that may pose a risk to their data hall. The customer then puts in a snap record, which notifies their internal teams about the work. Following these procedures and getting the proper permissions ensures that the customer won’t be subjected to any uncontrolled risk and covers all bases should any unexpected issues arise.
When procedure breaks down and fails due to lack of employee discipline, it puts both the company and managerial staff in a difficult position. First, the lack of discipline undermines the effectiveness of the procedures. Second, management must make a difficult choice—retrain or replace the offending employee. For those given a second chance, managers put their own jobs on the line—a tough prospect in a business that requires to-the-letter precision at every stage.
To ensure that discipline is instilled deeply in every employee, it’s important that the team take ownership of every component. Vantage keeps all its work in-house and consistently trains its employees in multiple disciplines rather than outsourcing. This makes the core team better and more robust and avoids reliance on outside sources. Additionally, Vantage does not allow contractors to turn breakers on and off, because the company ultimately bears the responsibility of an interrupted load. Keeping everything under one roof and knowing every aspect of the data center inside and out is a competitive advantage.
Vantage’s accomplishment of Tier III Gold Certification of Operational Sustainability validates everything the company does to develop and support its operational excellence.
Mark Johnson is Site Operations Manager at Vantage Data Centers. Prior to joining Vantage, Mr. Johnson was data center facilities manager at Yahoo, where he was responsible for the critical facilities infrastructure for the Wenatchee and Quincy, WA, data centers. He was also a CITS Facilities Engineer at Level 3 Communications, where he was responsible for the critical facilities infrastructure for two Sunnyvale, CA, data centers. Before that, Mr. Johnson was an Engineer III at VeriSign, where he was responsible for critical facilities at two data centers, and a chief facilities engineer at Abovenet.