Tweak to AWS Outposts reflects demand for greater cloud autonomy

Amazon Web Services (AWS) has made a minor change to its private-cloud appliance, AWS Outposts, that could significantly improve resiliency. The cloud provider has enabled local access to cloud administration, removing the appliance’s reliance on the public cloud. In the event of a network failure between the public cloud and the user’s data center, the private-cloud container platform can still be configured and maintained.
Many public-cloud providers have extended their offerings so that their services can now be delivered from a data center of the user’s choosing. Services are typically billed in the same way as they are via the public cloud, and accessed through the same portal and software interfaces, but are delivered from hardware and software hosted in the user’s own facility. Such services are in demand from customers seeking to meet compliance or data protection requirements, or to improve the end-user experience through lower latency.
In one business model, the cloud provider ships a server-storage private-cloud appliance to an organization’s data center. The organization manages the data center. The public-cloud provider is responsible for the hardware and middleware that delivers the cloud functionality.
The term “private cloud” describes a cloud platform where the user has access to elements of the platform not usually accessible in the public cloud (such as the data center facility, hardware and middleware). These appliances are a particular type of private cloud, not designed to be operated independently of the public cloud. They are best thought of as extensions of the public cloud to the on-premises data center (or colocation facility), since administration and software maintenance are performed via the public cloud.
As the public and private cloud use the same platform and application programming interfaces (APIs), applications can be built across the organization’s and the cloud provider’s data centers, and the platform can be managed as one. For more information on private-cloud appliances, see the Uptime Institute Intelligence report Cloud scalability and resiliency from first principles.
The resilience of this architecture has not, hitherto, been assured because the application still relies on the cloud provider’s ability to manage some services, such as the management interface. The public-cloud provider controls the interface for interacting with the user’s on-premises cloud (the “control plane”); if that interface goes down, so too does the ability to administer the on-premises cloud.
Ironically, it is precisely during an outage that an administrator is most likely to want to make configuration changes — to reserve capacity for mission-critical workloads or to reprioritize applications to handle the loss of public-cloud capacity, for example. If an AWS Outposts appliance were being used in a factory to support manufacturing machinery, for instance, the inability to configure local capabilities during a network failure could significantly affect production.
It is for this reason that AWS’s announcement that its Amazon Elastic Kubernetes Service (EKS) can be managed locally on AWS Outposts is important. Kubernetes is an open-source platform for deploying and managing containers. This new capability allows users to configure API endpoints on the AWS Outposts appliance, meaning container configuration can be changed via the local network without connecting to the public cloud.
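To make this concrete, the sketch below shows the kind of administration that local access makes possible. It assumes, hypothetically, a kubeconfig context named “outposts-local” pointing at the Kubernetes API endpoint exposed on the appliance, and uses the standard Kubernetes Python client to rebalance workloads over the local network; the cluster, deployment and namespace names are illustrative and not taken from AWS documentation.

```python
# Minimal sketch: administer containers on a locally managed cluster during a
# loss of connectivity to the public cloud. The kubeconfig context, deployment
# names and namespaces are hypothetical examples.
from kubernetes import client, config

# Load credentials for the locally exposed API endpoint (no public-cloud
# control plane involved).
config.load_kube_config(context="outposts-local")
apps = client.AppsV1Api()

# Reprioritize capacity: scale a non-critical workload down and a
# mission-critical one up.
apps.patch_namespaced_deployment_scale(
    name="batch-analytics", namespace="default", body={"spec": {"replicas": 0}}
)
apps.patch_namespaced_deployment_scale(
    name="line-control", namespace="production", body={"spec": {"replicas": 6}}
)
```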
In practical terms, this addition makes AWS Outposts more resilient to outages because it can function in the event of a connectivity failure between the cloud provider and the data center. AWS Outposts is now far more feasible as a disaster-recovery or failover location, and more appropriate for edge locations, where connectivity might be less assured.
The most important aspect of this development, however, is that it indicates AWS — the largest cloud provider — is perhaps acknowledging that users don’t just want an extension of the public cloud to their own facilities. Although many organizations are pursuing a hybrid-cloud approach, where public and private cloud platforms can work together, they don’t want to sacrifice the autonomy of each of those environments.
Organizations want venues to work independently of each other if required, avoiding single points of failure. To address this desire, other AWS Outposts services may be made locally configurable over time as users demand autonomy and greater control over their cloud applications.
Why are governments investigating cloud competitiveness?
In any market, fewer sellers or providers typically means less choice for buyers. Where the number of sellers is very low, this could theoretically lead to exploitation through higher prices or lower-quality goods and services — with buyers having no choice but to accept such terms.
Three hyperscale cloud providers — Amazon Web Services, Google Cloud and Microsoft Azure — have become dominant throughout most of the world. This has triggered investigations by some governments to check that limited competition is not impacting customers.
The UK government’s Office of Communications’ (Ofcom’s) Cloud services market study is intended to investigate the role played by these “cloud provider hyperscalers” in the country’s £15 billion public cloud services market. Ofcom’s objective, specifically, is to understand the strength of competition in the market and to investigate whether the dominance of these hyperscalers is limiting growth and innovation.
Although there is a debate about the cost and strategic implications of moving core workloads to the cloud, competition among cloud provider hyperscalers, so far, seems to be good for users: recent inflation-driven increases notwithstanding, prices have generally decreased (across all providers) over the past few years. Apart from the hyperscalers, users can procure cloud services from local providers (and established brands), colocation providers and private cloud vendors. The cloud provider hyperscalers continue to develop innovative products, sold for pennies per hour through the pay-as-you-go pricing model and accessible to anyone with a credit card.
However, Ofcom is concerned. It cites research from Synergy Research Group showing that the combined market share of the hyperscalers is growing at the expense of smaller providers (at a rate of 3% per year) with the hyperscalers’ UK market share now standing at over 80%. As discussed in Uptime Institute Intelligence’s Cloud scalability and resiliency from first principles report, vendor lock-in can make it harder for users to change cloud providers to find a better deal.
The Herfindahl-Hirschman Index (HHI) is commonly used to assess market competitiveness on the basis of market share. A market with an HHI of over 2,500 suggests a limited number of companies have significant power to control market prices — a “high concentration.” The UK cloud services market is estimated to have an HHI of over 2,900. Given the global HHI of 1,600 for this sector, the UK’s high value validates the need for the Ofcom investigation.
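For readers unfamiliar with the metric, HHI is simply the sum of the squared market shares (expressed in percentage points) of every firm in the market. The short calculation below uses purely hypothetical share figures, chosen only to illustrate the arithmetic behind a “high concentration” score; they are not Ofcom’s or Synergy Research Group’s estimates.

```python
# Herfindahl-Hirschman Index: sum of squared market shares (percentage points).
# The share figures below are hypothetical, for illustration only.
def hhi(shares_pct):
    return sum(share ** 2 for share in shares_pct)

hypothetical_uk_cloud_shares = [40, 30, 12, 6, 4, 3, 2, 2, 1]  # sums to 100%
print(hhi(hypothetical_uk_cloud_shares))  # 2714 -> above the 2,500 threshold
```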
Such a high market concentration isn’t necessarily a problem, however, if competing companies keep prices low while offering innovative products and services to a large population. A high concentration is only problematic if the cloud providers are in a stalemate (or worse, in collusion) — not cutting prices, not releasing new products, and not fighting to win each other’s customers. UK law prevents cloud providers from colluding to fix prices or restrict competition. But with so few competitors, such anti-competitive behavior might emerge accidentally (although there are few — if any — signs of such a stalemate so far).
The most intriguing part of Ofcom’s study will be its recommendations on how to make the market more competitive. Unless Ofcom can find evidence of anti-competitive behavior, there may be very little it can do to help smaller players compete, apart from limiting the hyperscalers’ ambitions, through regulation or divestiture. Outward signs are that cloud providers have come to dominate the market by providing users with the services they expect, at a price they’re willing to pay, rather than through any nefarious means.
Hyperscale cloud providers require colossal capital, deep and cutting-edge expertise, and investment in global-scale efficiency — all of which means they can cut prices, over time, while expanding into new markets and releasing new products. The hyperscalers themselves have not created the significant barrier to entry that smaller players face in attempting to compete: that barrier exists because of the sheer scale of operations fundamental to cloud computing’s raison d’être.
In most countries, competition authorities — or governments generally — have limited ability to help smaller providers overcome this barrier, whether through investment or support. In the case of the UK, Ofcom’s only option is to restrict the dominance of the hyperscalers.
One option open to competition authorities would be regulating cloud prices by setting price caps, or by forcing providers to pass on cost savings. But price regulation only makes sense if prices are going up and users have no alternatives. Many users of cloud services have seen prices come down, and they are, in any case, at liberty to use noncloud infrastructure if providers are not delivering good value.
Ofcom (and other regulators) could, alternatively, enforce the divestment of hyperscalers’ assets. But breaking up a cloud provider on the basis of the products and services offered would penalize those users looking for integrated services from a single source. It would also be an extremely bold and highly controversial step that the UK government would be unlikely to undertake without wider political consensus. In the US, there is bipartisan support for an investigation into tech giant market power, which could provide that impetus.
Regulators could also legislate to force suppliers to offer greater support in migrating services between cloud providers: but this could stifle innovation, with providers unable to develop differentiated features that might not work elsewhere. Theoretically, a government could even nationalize a major cloud provider (although this is highly unlikely).
Given the high concentration of this market, Ofcom’s interest in conducting an investigation is understandable: while there is limited evidence to date, there could be anti-competitive factors at play that are not immediately obvious to customers. Ofcom’s study may well not uncover many competitive concerns at the moment, but it might, equally, focus attention on the nation’s over-reliance on a limited number of cloud providers in the years ahead.
In this Note, we have focused purely on AWS’s, Google’s and Microsoft’s cloud infrastructure businesses (Amazon Web Services, Google Cloud and Microsoft Azure). But these tech giants also provide many other products and services in many markets, each of which has different levels of competitiveness.
Microsoft, for example, has recently been pressured into making changes to its software licensing terms following complaints from EU regulators and European cloud providers (including Aruba, NextCloud and OVHcloud). These regulators and cloud providers argue that Microsoft has an unfair advantage in delivering cloud services (via its Azure cloud), given it owns the underlying operating system. Microsoft, they claim, could potentially price its cloud competitors out of the market by increasing its software licensing fees.
As their market power continues to increase, these tech giants will continue to face anti-competitive regulation and lawsuits in some, or many, of these markets. In the UK, how far Ofcom will investigate the hyperscalers’ impact in particular subsectors, such as retail, mobile, operating systems and internet search, remains to be seen.
Users unprepared for inevitable cloud outages
Organizations are becoming more confident in using the cloud for mission-critical workloads — partly due to a perception of improved visibility into operational resiliency. But many users aren’t taking basic steps to ensure their mission-critical applications can endure relatively frequent availability zone outages.
Data from the 2022 Uptime Institute annual survey reflects this growing confidence in public cloud. The proportion of respondents not placing mission-critical workloads into a public cloud has now dropped from 74% (2019) to 63% (2022), while those saying they have adequate visibility into the resiliency of public-cloud services has risen from 14% to 21%.
However, other survey data suggests cloud users’ confidence may be misplaced. Cloud providers recommend that users distribute their workloads across multiple availability zones. An availability zone is a logical data center, often understood to have redundant and separate power and networking. Cloud providers make it explicitly clear that zones will suffer outages occasionally — their position being that users must architect their applications to handle the loss of an availability zone.
Zone outages are relatively common, yet 35% of respondents said the loss of an availability zone would result in significant performance issues. Only 16% of those surveyed said that the loss of an availability zone would have no impact on their cloud applications (see Figure 1).
This presents a clear contradiction. Users appear to be more confident that the public cloud can handle mission-critical workloads, yet over a third of users are architecting applications vulnerable to relatively common availability zone outages. This contradiction is due to a lack of clarity on the respective roles and responsibilities of provider and user.
Who is at fault if an application goes down as a result of a single availability zone outage? Survey responses would appear to reflect this lack of clarity on roles and responsibilities: half of respondents to Uptime’s annual survey believe this to be primarily the cloud provider’s fault, while the other half believe responsibility lies with the user, for having failed to architect the application to avoid such downtime.
The provider is, of course, responsible for the operational resiliency of its data centers. But cloud providers neither state nor guarantee that availability zones will be highly available. Given this, why do users assume that a single availability zone will provide the resiliency their application requires?
This misunderstanding might, at least in part, be due to the simplistic view that the cloud is just someone else’s computer, in someone else’s data center: but this is not the case. A cloud service is a complex combination of data center, hardware, software and people. Services will fail from time to time due to unexpected behavior arising from the complexity of interacting systems, and people.
Accordingly, organizations that want to achieve high availability in the cloud must architect their applications to endure frequent outages of single availability zones. Lifting and shifting an application from an on-premises server to a cloud virtual machine might reduce resiliency if the application is not rearchitected to work across cloud zones.
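As a simple illustration of what “architecting across zones” means at the infrastructure level, the sketch below (using the boto3 library; the region, subnet IDs and AMI ID are placeholders) launches identical instances into subnets in different availability zones, so the loss of any one zone leaves capacity running elsewhere. Production designs would typically go further, with load balancing, health checks and zone-aware data replication.

```python
# Minimal sketch: spread identical instances across availability zones so that
# a single-zone outage does not take the whole application down.
# Region, subnet IDs and AMI ID are placeholders, not real resources.
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-2")

# One subnet per availability zone (placeholder IDs).
subnets_by_zone = {
    "eu-west-2a": "subnet-0aaaaaaaaaaaaaaaa",
    "eu-west-2b": "subnet-0bbbbbbbbbbbbbbbb",
    "eu-west-2c": "subnet-0cccccccccccccccc",
}

for zone, subnet_id in subnets_by_zone.items():
    ec2.run_instances(
        ImageId="ami-0123456789abcdef0",   # placeholder image
        InstanceType="t3.medium",
        MinCount=1,
        MaxCount=1,
        SubnetId=subnet_id,                # pins the instance to that zone's subnet
    )
```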
As cloud adoption increases, the impact of outages is likely to grow as a significantly higher number of organizations rely on cloud computing for their applications. While many will architect their applications to weather occasional outages, many are not yet fully prepared for inevitable cloud service failures and the subsequent impact on their applications.
Will high energy prices push operators to direct liquid cooling?
The data center industry and other large power-consuming industries continue to feel pressure from skyrocketing electricity prices. In Germany and France, wholesale energy prices this August increased six-fold compared to prices from 2021. The US has fared better, but wholesale electricity prices have doubled this summer compared with last year’s prices.
While leased data center operators can typically pass on these higher energy costs to tenants, many IT service providers, such as web-hosting platforms and cloud data center operators, have seen their profits erode. High energy prices contributed to the bankruptcy of the UK division of colocation and cloud provider Sungard Availability Services in March 2022, followed by a bankruptcy filing for its US and Canadian operations in April.
A positive side effect of historically high energy prices is that investments in efficiency become more attractive. Industry-wide, power usage effectiveness (PUE) has been largely stagnant in recent years and cooling remains the largest source of inefficiency (see the Uptime Institute Global Data Center Survey 2022).
Direct liquid cooling (DLC) of IT hardware, while still relatively niche, can deliver significant energy savings for digital infrastructure. Even before the latest spikes in power costs, energy savings were already the top attraction for operators considering DLC. Uptime Institute’s Direct Liquid Cooling Survey, conducted early in 2022, shows that two-thirds of enterprise respondents think cost savings are the key factor for organizations considering a switch to DLC (see Figure 1).
This is a potential shift in the adoption dynamics for DLC: for early adopters, high rack density was the major catalyst in moving away from air-cooled systems. The recent spikes in energy prices, however, may push an even higher proportion of operators to consider DLC as a means to reduce energy costs.
DLC enables efficiency gains for both the data center facility and the IT hardware. For facility cooling, DLC offers the potential to use less energy for mechanical refrigeration — or in some cases none, depending on implementation and local climate conditions. It also substantially lowers the volume of air that needs to be moved around the data hall, thereby reducing energy consumption from air handlers.
There are further efficiency gains to be made in the IT hardware itself: server fans can be eliminated, and significantly lower operating temperatures can reduce static power losses in the silicon.
These savings in IT power are difficult to quantify, but models suggest they can be considerable — ranging from 10% to 20% of total IT power (a rough cost illustration follows the list below). Yet, despite the energy and cost savings associated with DLC, some key barriers to adoption allow air cooling to continue to dominate:
A lack of standardization across existing DLC technologies.
Concerns over coolant leaks and material compatibility, which slow DLC adoption.
The questionable economics of retrofitting existing data centers with DLC, unless the facility already uses a chilled water loop.
The need for dense racks (typically above 20 kilowatts per rack) for DLC to be economically viable.
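The calculation below illustrates how these savings might translate into money at current energy prices. Every input is a hypothetical assumption (a 1 MW IT load, a 10% cut in IT power from removing fans and lowering silicon leakage, a modest PUE improvement and an assumed electricity price); it shows the arithmetic, not any particular facility’s outcome.

```python
# Rough, illustrative estimate of annual energy-cost savings from DLC.
# All inputs are hypothetical assumptions, not measured values.
it_load_kw = 1_000            # 1 MW of IT load
hours_per_year = 8_760
price_per_kwh = 0.20          # assumed electricity price (USD/kWh)

pue_air, pue_dlc = 1.6, 1.2   # assumed facility PUE before and after DLC
it_power_saving = 0.10        # assumed 10% cut in IT power (fans, leakage)

energy_air_kwh = it_load_kw * pue_air * hours_per_year
energy_dlc_kwh = it_load_kw * (1 - it_power_saving) * pue_dlc * hours_per_year

saving_kwh = energy_air_kwh - energy_dlc_kwh
print(f"Energy saved: {saving_kwh:,.0f} kWh per year")
print(f"Cost saved:   ${saving_kwh * price_per_kwh:,.0f} per year")
```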
Sustainability is an additional key factor that is likely to drive DLC adoption this decade. Energy savings translate into reductions in Scope 2 emissions (from purchased, off-site electricity), which is a major focus for companies seeking to improve their sustainability credentials.
The combination of this commitment to sustainability and energy prices that look set to remain historically high for the foreseeable future gives data center operators an unprecedented and powerful incentive to improve their infrastructure efficiency, strengthening the business case for a shift to DLC.
AWS price cuts: is serverless gaining momentum?
A serverless platform is an abstracted cloud computing service that executes a user’s code without the user needing to provision the underlying server or operating environment. The physical server, resources and operating environment used to execute the user’s code are managed by the cloud provider and are not accessible to the user (hence “serverless”). In July, Amazon Web Services (AWS) announced a price cut to its serverless platform, Lambda, but only for high levels of consumption.
Why would AWS make this price cut? In a serverless computing service, developers upload code via a graphical user interface or application programming interface (API). The user defines a trigger that executes this code, such as an API call sent from an application, a timer or an event on another cloud service. When triggered, the serverless platform assigns resources to the code, executes it and returns any result.
What differentiates serverless from platform as a service is that the user is only billed for the precise period the platform runs the algorithm. Aside from the cost of using persistent cloud storage, there is no ongoing expense for a serverless application when dormant, making it economically attractive for “bursty” workloads that exist for short periods of time when demand necessitates.
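As a minimal illustration of the model, the Python function below is the kind of code a user uploads: the platform invokes it only when a trigger fires, and charges accrue only for the memory and time consumed by each invocation. The handler signature follows the standard Lambda convention; the event fields are hypothetical.

```python
# Minimal serverless (AWS Lambda-style) handler in Python. The platform
# provisions resources, calls this function when a trigger fires, and bills
# only for the duration and memory of each invocation.
import json

def lambda_handler(event, context):
    # "event" carries the trigger payload (an API call, a timer, a storage
    # event, etc.). The "order_id" field here is a hypothetical example.
    order_id = event.get("order_id", "unknown")
    result = {"order_id": order_id, "status": "processed"}
    return {"statusCode": 200, "body": json.dumps(result)}
```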
Serverless computing is a relatively recent innovation in IT. AWS pioneered the serverless cloud model (also called function as a service) with the launch of Lambda in 2014. Usage of Lambda is billed to the user using two metrics: the number of transactions and the total amount of memory consumed by code execution for the period executed, expressed in gigabyte (GB)-seconds.
AWS has announced a 10% to 20% discount for monthly consumption of Lambda of over six billion GB-seconds. To achieve this discount, users would have to consume a substantial amount in practice. No price cuts have been announced below the six billion GB-second threshold.
The price cut is unlikely to be related to market price pressure or internal cost reductions. The driver is likely to be that some organizations have grown their serverless consumption to a cost-prohibitive point. To reduce this barrier, AWS has opted to take a hit on gross margins for larger consumers in the belief that their sales volume will increase to offset the loss.
An organization’s cloud consumption is more likely to rise than fall over time. Most would prefer that their cloud applications grow to meet demand, rather than lock down scalability to save costs. Cost efficiency is rarely a priority for application development and operations teams. Many Lambda users will likely see increased usage and increased bills, but this isn’t necessarily a problem if it translates into business benefits.
Analysis of the price cuts suggests that some users are consuming significant amounts. AWS has announced a 10% discount on the monthly execution costs of consumption between six billion and 15 billion GB-seconds, and 20% for consumption greater than 15 billion GB-seconds. Six billion GB-seconds is a considerable capacity: a user would have to consume the equivalent of 2.3 terabytes (TB) of memory for an entire month to obtain a 10% discount and 5.7 TB to receive a 20% discount. The latter is the equivalent of 178 regular cloud instances, each configured with 32 GB of memory.
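The arithmetic behind those figures can be checked as follows, assuming a 30-day month; the figure of 178 instances quoted above follows from rounding to 5.7 TB before dividing by 32 GB.

```python
# Check the arithmetic behind the discount thresholds, assuming a 30-day month.
SECONDS_PER_MONTH = 30 * 24 * 3600            # 2,592,000 seconds

tier_10pct_gb_s = 6_000_000_000               # threshold for the 10% discount
tier_20pct_gb_s = 15_000_000_000              # threshold for the 20% discount

gb_all_month_10 = tier_10pct_gb_s / SECONDS_PER_MONTH   # ~2,315 GB (~2.3 TB)
gb_all_month_20 = tier_20pct_gb_s / SECONDS_PER_MONTH   # ~5,787 GB (~5.7 TB)

# Equivalent number of 32 GB instances running continuously for the month.
instances_32gb = 5_700 / 32                              # ~178 instances
print(round(gb_all_month_10), round(gb_all_month_20), round(instances_32gb))
```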
These figures demonstrate the significant scale of serverless being adopted by some organizations. Considering AWS has rebuilt its pricing model to reduce costs for these users, the number of organizations consuming such large amounts cannot be trivial. This price drop matters because it makes the serverless model more economical and attractive for use at larger scale, beyond just simple functionality. It signals wider adoption of serverless computing to the market, validating the emerging computing model’s viability. Highly bursty workloads architected to take advantage of serverless platforms are also likely to be more resource-efficient than even modern, containerized software, let alone traditional, nonscalable applications.
EU’s EED recast set to create reporting challenges
The European Commission’s (EC’s) proposed recast of its Energy Efficiency Directive (EED) sets out new and strict reporting requirements for data centers operating in the EU. If passed, data centers with 100 kilowatts or more total installed IT power demand (from server, storage and network equipment) will have to report their energy performance every year, including details on data traffic, quantity of data stored, water use, energy consumption, heat re-use and power utilization (see Table 1).
These reporting requirements raise several concerns for data centers. One concern is that some of the information is simply difficult to collect, at least for some operators. Most colocation operators do not currently have control over or insight into the data traffic, storage and processing being performed on their customers’ IT equipment. For example, it will be challenging for a large retail colocation data center to collect, normalize and aggregate data from tens or hundreds of IT operators with different data collection and management systems into a coherent, accurate and standardized report.
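In practice, that aggregation step would look something like the sketch below: per-tenant reports arriving with inconsistent units and field names (all hypothetical here) must be normalized before they can be summed into a single facility-level figure.

```python
# Illustrative only: normalize per-tenant energy reports that arrive with
# different units and field names, then aggregate them for a facility report.
# Tenant data and field names are hypothetical.
tenant_reports = [
    {"tenant": "A", "it_energy_mwh": 1_200},
    {"tenant": "B", "it_energy_kwh": 850_000},
    {"tenant": "C", "energy_consumed_kwh": 430_000},
]

def to_kwh(report):
    if "it_energy_mwh" in report:
        return report["it_energy_mwh"] * 1_000
    for key in ("it_energy_kwh", "energy_consumed_kwh"):
        if key in report:
            return report[key]
    raise ValueError(f"Unrecognized report format: {report}")

total_it_kwh = sum(to_kwh(r) for r in tenant_reports)
print(f"Facility-wide IT energy: {total_it_kwh:,.0f} kWh")  # 2,480,000 kWh
```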
Some parties have also raised concerns about the security risks associated with publicly reporting the names of owners, addresses and other details of data centers — information that is particularly sensitive for financial institutions. At present, it is relatively easy to find the location of data centers, but far more difficult to find details of the owners and operators of those data centers. Other parties are concerned about the administrative and financial burdens imposed on smaller operators.
Some data center operators have welcomed further transparency regarding their energy use, but argue against some of the proposed metrics. Feedback from DigitalEurope, a trade association representing the digital technology industry in Europe, notes that data traffic, processing and storage are unrelated to data center sustainability, as well as being commercially sensitive information. Moreover, servers account for most data center energy use, yet the new EED makes no attempt to gather indicative data on server power efficiency. This is a missed opportunity to tackle the single largest cause of energy inefficiency in data centers.
As part of the proposed EED recast, the EC is planning mandatory annual reporting of key performance indicators (KPIs). The aggregated data will be used to develop sustainability indicators based on energy efficiency, use of renewable energy, water usage and waste heat utilization (see Table 1). These indicators will be used to define and rate a data center in terms of sustainability.
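The final KPI definitions have yet to be settled, but most candidates resemble established metrics such as power usage effectiveness (PUE), water usage effectiveness (WUE), the energy reuse factor (ERF) and a renewable energy share. The sketch below computes these from assumed annual figures, purely to illustrate the kind of calculation annual reporting would require; it does not reproduce the EED’s own formulas.

```python
# Illustrative KPI calculations from assumed annual figures. These are common
# industry metrics, not the EED's final definitions, and the inputs are invented.
total_energy_kwh = 10_000_000     # all energy entering the data center
it_energy_kwh = 6_500_000         # energy delivered to IT equipment
water_litres = 12_000_000         # annual water consumption
reused_heat_kwh = 800_000         # heat exported for reuse
renewable_kwh = 4_000_000         # renewable energy consumed

pue = total_energy_kwh / it_energy_kwh          # power usage effectiveness
wue = water_litres / it_energy_kwh              # litres per kWh of IT energy
erf = reused_heat_kwh / total_energy_kwh        # energy reuse factor
renewable_share = renewable_kwh / total_energy_kwh

print(f"PUE {pue:.2f}, WUE {wue:.2f} L/kWh, ERF {erf:.2f}, "
      f"renewables {renewable_share:.0%}")
```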
The EC hopes that by creating a register of energy use based on performance indicator reporting, opportunities for the data center industry to reduce energy consumption and increase efficiency can be identified and encouraged. It also hopes that future sustainability ratings developed from the KPIs will help to provide transparency of and accountability for data center carbon footprints. In time, stakeholders may come to expect this data from operators when evaluating business decisions in the EU.
The stakes are particularly high when it comes to defining renewable energy usage in terms of these KPIs. The legislation is currently unclear as to how the use of carbon offsets (such as guarantees of origin or renewable energy certificates) will be treated in audits or in developing sustainability ratings. Ideally, the ratings should assess the direct use of renewable and zero-carbon energy (as supplied throughout a grid region or through power purchase agreements) to accurately depict a data center’s energy use and carbon footprint. Without greater clarity, the impact of the proposed reporting requirements may instead depend on how the proposed legislation is interpreted by governments or operators.
For more information, see our recent report Critical regulation: the EU Energy Efficiency Directive recast.