By Matt Stansberry and Lee Kirby
Driving operational excellence across multiple data centers is exponentially more difficult than managing just one. Technical complexity multiplies as you move to different sites, regions, and countries where codes, cultures, climates and other factors are different. Organizational complexity further complicates matters when the data centers in your portfolio have different business requirements.
With little difficulty, an organization can focus on staffing, maintenance planning and execution, training and operations for a single site. Managing a portfolio turns the focus from projects to programs and from activity to outcomes. Processes become increasingly complex and critical. In this series of interviews, you will hear from practitioners about the challenges and lessons they have drawn from their experiences. You will find that those who thrive in this role share the understanding that Operational Excellence is not an end state, but a state of mind.
This interview is part of a series of conversations with executives who are managing diverse data center portfolios. The interviewees in this series participated in a panel at Uptime Institute Symposium 2015, discussing their use of the Uptime Institute Management & Operations (M&O) Stamp of Approval to drive standardization across data center operations.
John Sheputis: President, Infomart Data Centers
Don Jenkins: VP Operations, Infomart Data Centers
Give our readers a sense of your current data center footprint.
Sheputis: The portfolio includes about 2.2 million square feet (ft2) of real estate, mostly data center space. The facilities in both of our West Coast locations are data center exclusive. The Dallas facility is enormous, at 1.6 million ft2, and is a combination of mission critical and non-mission critical space. Our newest site in Ashburn, VA, is 180,000 ft2 and undergoing re-development now, with commissioning on the new critical load capacity expected to complete early next year.
The Dallas site has been operational since the 1980s. We assumed the responsibility for the data center pods in that building in Q4 2014 and brought on staff from that site to our team.
What is the greatest challenge of managing your current footprint?
Jenkins: There are several challenges, but communicating standards across the portfolio is a big one. Also, different municipalities have varying local codes and governmental regulations. We need to adapt our standards to the different regions.
For example, air quality control standards vary at different sites. We have to meet very high air quality standards in California, which means we adhere to very strict requirements for engine-generator runtimes and exhaust filter media. But in other locations, the regulations are less strict, and that variance impacts our maintenance schedules and parts procurement.
Sheputis: It may sound trivial to go from an area where air quality standards are high to one that is less stringent, but it still represents a change in our standards. If you’re going to do development, it’s probably best to start in California or somewhere with more restrictive standards and then go somewhere else. It would be very difficult to go the other way.
More generally, the Infomart merger was a big bite. It includes a lot of responsibility for non-data center space. So now we have two operating standards. We have over 500,000 ft2 of office-use real estate that uses the traditional break-fix operation model. We also have over two dozen data center suites with another 500,000 ft2 of mission critical space as well, where nothing breaks, or if it does, there can be no interruption of service. These different types of property have two different operations objectives and require different skill sets. Putting those varying levels of operations under one team expands the number of challenges you absorb. It pushes us from managing a few sites to a “many sites” level of complexity.
How do you benchmark performance goals?
Sheputis: I’m going to restrict my response to our mission critical space. When we start or assume control of a project, we have some pretty unforgiving standards. We want concurrent maintenance, industry-leading PUE, on time, on budget, and no injuries—and we want our project to meet critical load capacity and quality standards.
But picking up somebody else’s capital project after they’ve already completed their design and begun the work, yet before they finished? That is the hardest thing in the world. The Dallas Infomart site is so big, there are two or three construction projects going on at any time. Show up any weekend, and you’ll see somebody doing a crane pick or a helicopter delivering equipment to be installed on the roof. It’s that big. It’s a damn good thing that we have great staff on site in Dallas and someone like Don Jenkins to make sure everything goes smoothly.
We hear a lot about data center operations staffing shortages. What has been your experience at Infomart?
Jenkins: Good help is hard to find anywhere. Data center skills are very specific. It’s a lot harder to find good data center people. One of the things we try to do is hire veterans. Over half our operating engineers have military backgrounds, including myself. We do this not just out of patriotism or to meet security concerns, but because we understand and appreciate the similarity of a mission critical operation and a military operation (see http://journal.uptimeinstitute.com/resolving-data-center-staffing-shortage/).
Sheputis: If you have high standards, there is always a shortage of people for any job. But the corollary is that if you’re known for doing your job very well, the best people often find you. Don deserves credit for building low-turnover teams. Creating a culture of continuity requires more than strong technical skill sets; you have to begin by recruiting the kinds of people who can play on a team.
Don uses this phrase a lot to describe the type he’s looking for: people who are capable of both leading and being led. He wants candidates with low egos and strong ethics who care about outcomes and want to learn. We invest heavily in our training program, and we are rigorous in finding people who buy into our process. We don’t want people who want to be heroes. The ideal candidate is a responsible team player with an aptitude for learning, and we fill in the technical gaps as necessary over time. No one has all the skills they need on day one. Our training is industry leading. To date, we have had no voluntary turnover.
Jenkins: We do about 250 man-hours of training for each staff member. It’s not cheap, but we feel it’s necessary and the guys love it. They want to learn. They ask for it. Greater skill attainment is a win-win for them, our tenants, and us.
Sheputis: When you build a data center, you often meet the technically strongest people at either the beginning of the project during design or the end of the project during the commissioning phase. Every project we do is Level 5 Commissioned. That’s when you find and address all of the odd or unusual use cases that the manufacturer may not have anticipated. More than once, we have had a UPS troubleshooting specialist say to Don, “You guys do it right. Let me know when you have an opening in your organization.”
Jenkins: I think it’s a testament that shows how passionate we are about what we do.
Are you standardizing management practices across multiple sites?
Sheputis: When we had one or two sites, it wasn’t a challenge because we were copying from California to Oregon. But with three or more sites it becomes much more difficult. With the inclusion of Dallas and Ashburn, we have had to raise our game. It is tempting to say we do the same thing everywhere, but that would be unrealistic at best.
Broadly speaking, we have two families of standards: Content and Process. For functional content we have specs for staffing, maintenance, security, monitoring, and the like. We apply these with the knowledge that there will be local exceptions—such as different codes and different equipment choices. An operator from one site has to appreciate the deviations at the other sites. We also have process-based standards, and these are more meticulously applied across sites. While the OEM equipment may be different, shouldn’t the process for change management be consistent? Same goes for the problem management process. Compliance is another area where consistency is expected.
The challenge with projecting any standard is to efficiently create evidence of acceptance and verification. We try to create a working feedback loop, and we are always looking for ways to do it better. We can centrally document standard policies and procedures, but we rely on field acceptance of the standard, and we leverage our systems to measure execution versus expectation. We can say please complete work orders on time and to the following spec, and we can delegate scheduling to the field, but the loop isn’t complete until we confirm execution and offer feedback on whether the work and documentation were acceptable.
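The feedback loop described above (issue a standard, delegate scheduling to the field, then confirm execution and review the documentation) can be sketched in a few lines of Python. This is only an illustrative sketch, not Infomart’s actual system: the `WorkOrder` fields, the status labels, and the `on_time_rate` measure are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class WorkOrder:
    # All fields are hypothetical; a real maintenance system will differ.
    site: str
    due: date
    completed: Optional[date] = None      # None means the work is still open
    docs_accepted: Optional[bool] = None  # None means documentation not yet reviewed


def review(order: WorkOrder, today: date) -> str:
    """Return the feedback status for one work order."""
    if order.completed is None:
        return "overdue" if today > order.due else "open"
    if order.completed > order.due:
        return "late"
    if order.docs_accepted is None:
        return "awaiting review"
    return "accepted" if order.docs_accepted else "rework"


def on_time_rate(orders, today):
    """Per-site fraction of work orders finished on or before their due date.

    Orders that are still open and not yet due are left out of the count.
    """
    totals, on_time = {}, {}
    for o in orders:
        if o.completed is None and o.due > today:
            continue  # not yet due, so nothing to measure
        totals[o.site] = totals.get(o.site, 0) + 1
        if o.completed is not None and o.completed <= o.due:
            on_time[o.site] = on_time.get(o.site, 0) + 1
    return {site: on_time.get(site, 0) / n for site, n in totals.items()}


# Made-up sample data, for illustration only.
orders = [
    WorkOrder("Dallas", date(2015, 6, 1), date(2015, 5, 30), True),
    WorkOrder("Dallas", date(2015, 6, 1), date(2015, 6, 3), True),
    WorkOrder("Ashburn", date(2015, 6, 1)),
    WorkOrder("Ashburn", date(2015, 7, 1)),
]
today = date(2015, 6, 15)
statuses = [review(o, today) for o in orders]
rates = on_time_rate(orders, today)
```

The point of the sketch is that the central team defines the expectation once (a due date and a documentation review), each site executes on its own schedule, and the same comparison of execution against expectation is applied to every site.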
What technology or methodology has helped your organization to significantly improve data center management?
Jenkins: Our standard building management system (BMS) is a Niagara™ product with an open framework. This allows our legacy equipment to talk over open protocols. All of our dashboards and data look and feel the same across all of the sites, so an operator could pull up any other site and find it familiar.
Sheputis: Whatever system you’re using, there has to be a high premium on keeping it open. If you run on a closed system, it eventually becomes a lost island. This is especially true as you scale your operation. You have to have open systems.
How does your organization use the M&O Stamp?
Sheputis: The M&O stamp is one of the most important things we have ever achieved. And I’m not saying this to flatter you or the Uptime Institute. We believe data center operations are very important, and we have always believed we were pretty good. But I have to believe that many operators think they do a good job as well. So who is right? How does anyone really know? The challenge to the casual observer is that the data center industry is fairly closed. Operations are secure and private.
We started the process to see how good we were, and if we were good, we also thought it would be great to have a credible third party acknowledge that. Saying I think I’m good is one thing; having a credentialed organization like Uptime Institute say so is much more.
But the M&O process is more than the Stamp of Approval. Our operations have matured and improved by participating in this process. Every year when we reassess and recertify, we feel like we learn new things, and we’re tracking our progress. The bigger benefit may be that the process forces us to think procedurally. When we’re setting up a new site, it helps us set a roadmap for what we want to achieve. Compared to all other forms of certification, we get something out of this beyond the credential; we get a path to improve.
Jenkins: Lots of people run a SWOT (strengths, weaknesses, opportunities, and threats) analysis or internal audit, but that feedback often lacks external reference points. You can give yourself an audit, and you can say “we’re great.” But what are you learning? How do you expand your knowledge? The M&O Stamp of Approval provides learning opportunities for us by providing a neutral experienced outsider viewpoint on where, and more importantly, how we can do better.
On one of the assessments, one of Uptime Institute’s consultants demonstrated how we could set up our chiller plant so that an operator could see all the key variables at a glance, with fewer steps to see which valves are open or closed. The advice was practical and easy to implement: markers on a chain, little flags on a chiller, LED lights on a pump. Very simple things to do, but we hadn’t thought of them. They’d seen it in Europe, it was easy to do, and it helps. That’s one specific example, but we used the knowledge of the M&O team to help us grow.
We think the M&O criteria and content will get better and deeper as time goes on. This is a solid standard for people to grow on.
Sheputis: We are for certifications, as they remove doubt, but most of the work and value is had in obtaining the first certification. I can see why others are cynical about value and cost to recertify. But I do think there’s real value in the ongoing M&O certification, mainly because it shows continuous improvement. No other certification process does that.
Jenkins: A lot of certifications are binary in that you pass if you have enough checked boxes—the content is specific, but operationally shallow. We feel that we get a lot more content out of the M&O process.
Sheputis: As I said before, we are for compliance and transparency. As we are often fulfilling a compliance requirement for someone else, there is clear value in saying we are PCI compliant or SSAE certified. But the M&O Stamp of Approval process is more like seeing a professional instructor. All other certifications should address the M&O stamp as “Sir.”
Matt Stansberry is director of Content and Publications for the Uptime Institute and also serves as program director for the Uptime Institute Symposium, an annual spring event that brings together 1,500 stakeholders in enterprise IT, data center facilities, and corporate real estate to deal with the critical issues surrounding enterprise computing. He was formerly editorial director for TechTarget’s Data Center and Virtualization media group, and was managing editor of Today’s Facility Manager magazine. He has reported on the convergence of IT and Facilities for more than a decade.