Why are governments investigating cloud competitiveness?

In any market, fewer sellers or providers typically means less choice for buyers. Where the number of sellers is very low, this could, in theory, lead to exploitation through higher prices or lower-quality goods and services — with buyers having no choice but to accept such terms.

Three hyperscale cloud providers — Amazon Web Services, Google Cloud and Microsoft Azure — have become dominant throughout most of the world. This has triggered investigations by some governments to check that limited competition is not impacting customers.

Ofcom (the UK’s Office of Communications) has launched its Cloud services market study to investigate the role played by these “cloud provider hyperscalers” in the country’s £15 billion public cloud services market. Ofcom’s objective, specifically, is to understand the strength of competition in the market and to establish whether the dominance of these hyperscalers is limiting growth and innovation.

Although there is a debate about the cost and strategic implications of moving core workloads to the cloud, competition among cloud provider hyperscalers, so far, seems to be good for users: recent inflation-driven increases notwithstanding, prices have generally decreased (across all providers) over the past few years. Apart from the hyperscalers, users can procure cloud services from local providers (and established brands), colocation providers and private cloud vendors. The cloud provider hyperscalers continue to develop innovative products, sold for pennies per hour through the pay-as-you-go pricing model and accessible to anyone with a credit card.

However, Ofcom is concerned. It cites research from Synergy Research Group showing that the combined market share of the hyperscalers is growing at the expense of smaller providers (at a rate of 3% per year), with the hyperscalers’ UK market share now standing at over 80%. As discussed in Uptime Institute Intelligence’s Cloud scalability and resiliency from first principles report, vendor lock-in can make it harder for users to change cloud providers to find a better deal.

The Herfindahl-Hirschman Index (HHI), calculated as the sum of the squared market shares of the companies in a market, is commonly used to assess market competitiveness. An HHI of over 2,500 suggests a limited number of companies have significant power to control market prices — a “high concentration.” The UK cloud services market is estimated to have an HHI of over 2,900. Given the global HHI of 1,600 for this sector, the UK’s high value validates the need for the Ofcom investigation.
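
As a quick illustration of how the HHI is calculated, the short sketch below computes it from a set of hypothetical market shares (the figures are invented for the example; they are not the actual UK or global cloud shares).

```python
# Herfindahl-Hirschman Index (HHI): the sum of the squared percentage
# market shares of every company in a market.

def hhi(shares_percent):
    """Return the HHI for a list of market shares expressed in percent."""
    return sum(s ** 2 for s in shares_percent)

# Hypothetical market shares, for illustration only.
example_shares = [40, 25, 20, 10, 5]

print(hhi(example_shares))  # 2750 -- above the 2,500 "high concentration" threshold
```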

Such a high market concentration isn’t necessarily a problem, however, if competing companies keep prices low while offering innovative products and services to a large population. A high concentration is only problematic if the cloud providers are in a stalemate (or worse, in collusion) — not cutting prices, not releasing new products, and not fighting to win each other’s customers. UK law prevents cloud providers from colluding to fix prices or restrict competition. But with so few competitors, such anti-competitive behavior might emerge accidentally (although there are few — if any — signs of such a stalemate so far).

The most intriguing part of Ofcom’s study will be its recommendations on how to make the market more competitive. Unless Ofcom can find evidence of anti-competitive behavior, there may be very little it can do to help smaller players compete, apart from limiting the hyperscalers’ ambitions, through regulation or divestiture. Outward signs are that cloud providers have come to dominate the market by providing users with the services they expect, at a price they’re willing to pay, rather than through any nefarious means.

Hyperscale cloud providers require colossal capital, deep and cutting-edge expertise, and global-scale investments in efficiency — all of which mean they can cut prices, over time, while expanding into new markets and releasing new products. The hyperscalers themselves have not created the significant barrier to entry faced by smaller players attempting to compete here: that barrier exists because of the sheer scale of operations fundamental to cloud computing’s raison d’être.

In most countries, competition authorities — or governments generally — have limited ability to help smaller providers overcome this barrier, whether through investment or support. In the case of the UK, Ofcom’s only option is to restrict the dominance of the hyperscalers.

One option open to competition authorities would be regulating cloud prices by setting price caps, or by forcing providers to pass on cost savings. But price regulation only makes sense if prices are going up and users have no alternatives. Many users of cloud services have seen prices come down; they are, in any case, at liberty to use noncloud infrastructure if providers are not delivering good value.

Ofcom (and other regulators) could, alternatively, enforce the divestment of hyperscalers’ assets. But breaking up a cloud provider on the basis of the products and services offered would penalize those users looking for integrated services from a single source. It would also be an extremely bold and highly controversial step that the UK government would be unlikely to undertake without wider political consensus. In the US, there is bipartisan support for an investigation into tech giant market power, which could provide that impetus.

Regulators could also legislate to force suppliers to offer greater support in migrating services between cloud providers, but this could stifle innovation, with providers unable to develop differentiated features that might not work elsewhere. Theoretically, a government could even nationalize a major cloud provider (although this is highly unlikely).

Given the high concentration of this market, Ofcom’s interest in conducting an investigation is understandable: while there is limited evidence to date, there could be anti-competitive factors at play that are not immediately obvious to customers. Ofcom’s study may well not uncover many competitive concerns at the moment, but it might, equally, focus attention on the nation’s over-reliance on a limited number of cloud providers in the years ahead.

In this Note, we have focused purely on the cloud infrastructure businesses of Amazon (Amazon Web Services), Google (Google Cloud) and Microsoft (Microsoft Azure). But these tech giants also provide many other products and services in many markets, each with different levels of competitiveness.

Microsoft, for example, has recently been pressured into making changes to its software licensing terms following complaints from EU regulators and European cloud providers (including Aruba, NextCloud and OVHcloud). These regulators and cloud providers argue that Microsoft has an unfair advantage in delivering cloud services (via its Azure cloud), given it owns the underlying operating system. Microsoft, they claim, could potentially price its cloud competitors out of the market by increasing its software licensing fees.

As their market power continues to increase, these tech giants will continue to face competition regulation and antitrust lawsuits in some, or many, of these markets. In the UK, how far Ofcom will investigate the hyperscalers’ impact in particular subsectors, such as retail, mobile, operating systems and internet search, remains to be seen.

Users unprepared for inevitable cloud outages

Organizations are becoming more confident in using the cloud for mission-critical workloads — partly due to a perception of improved visibility into operational resiliency. But many users aren’t taking basic steps to ensure their mission-critical applications can endure relatively frequent availability zone outages.

Data from the 2022 Uptime Institute annual survey reflects this growing confidence in public cloud. The proportion of respondents not placing mission-critical workloads into a public cloud has dropped from 74% (2019) to 63% (2022), while the proportion saying they have adequate visibility into the resiliency of public-cloud services has risen from 14% to 21%.

However, other survey data suggests cloud users’ confidence may be misplaced. Cloud providers recommend that users distribute their workloads across multiple availability zones. An availability zone is a logical data center, often understood to have redundant and separate power and networking. Cloud providers make it explicitly clear that zones will suffer outages occasionally — their position being that users must architect their applications to handle the loss of an availability zone.

Zone outages are relatively common, yet 35% of respondents said the loss of an availability zone would result in significant performance issues. Only 16% of those surveyed said that the loss of an availability zone would have no impact on their cloud applications (see Figure 1).

Figure 1. Many cloud applications vulnerable to availability zone outages

This presents a clear contradiction. Users appear to be more confident that the public cloud can handle mission-critical workloads, yet over a third of users are architecting applications vulnerable to relatively common availability zone outages. This contradiction is due to a lack of clarity on the respective roles and responsibilities of provider and user.

Who is at fault if an application goes down as a result of a single availability zone outage? Responses to this question reflect the lack of clarity on roles and responsibilities: half of respondents to Uptime’s annual survey believe this to be primarily the cloud provider’s fault, while the other half believe responsibility lies with the user, for having failed to architect the application to avoid such downtime.

The provider is, of course, responsible for the operational resiliency of its data centers. But cloud providers neither state nor guarantee that individual availability zones will be highly available. Given this, why do users assume that a single availability zone will provide the resiliency their application requires?

This misunderstanding might, at least in part, be due to the simplistic view that the cloud is just someone else’s computer in someone else’s data center. But this is not the case: a cloud service is a complex combination of data center, hardware, software and people. Services will fail from time to time due to unexpected behavior arising from the complexity of interacting systems and the people who operate them.

Accordingly, organizations that want to achieve high availability in the cloud must architect their applications to endure frequent outages of single availability zones. Lifting and shifting an application from an on-premises server to a cloud virtual machine might reduce resiliency if the application is not rearchitected to work across cloud zones.
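
As a minimal sketch of what architecting across zones can mean in practice, the hypothetical Python (boto3) snippet below launches one instance in every availability zone of a region, so that the loss of any single zone leaves capacity running elsewhere; the region, AMI ID and instance type are placeholder assumptions.

```python
# Minimal sketch (not a production pattern): place one instance in every
# availability zone of a region so no single zone is a point of failure.
# The region, AMI ID and instance type below are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-2")

# Discover the zones currently available in this region.
zones = [
    z["ZoneName"]
    for z in ec2.describe_availability_zones()["AvailabilityZones"]
    if z["State"] == "available"
]

for zone in zones:
    ec2.run_instances(
        ImageId="ami-0123456789abcdef0",  # placeholder AMI
        InstanceType="t3.micro",
        MinCount=1,
        MaxCount=1,
        Placement={"AvailabilityZone": zone},
    )
```

In practice, managed building blocks such as load balancers and autoscaling groups handle this distribution and the failover between zones, but the principle is the same: no single zone should be a single point of failure.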

As cloud adoption increases, the impact of outages is likely to grow as a significantly higher number of organizations rely on cloud computing for their applications. While many will architect their applications to weather occasional outages, many are not yet fully prepared for inevitable cloud service failures and the subsequent impact on their applications.

Rising energy prices

Will high energy prices push operators to direct liquid cooling?

The data center industry and other large power-consuming industries continue to feel pressure from skyrocketing electricity prices. In Germany and France, wholesale energy prices this August increased six-fold compared to prices from 2021. The US has fared better, but wholesale electricity prices have doubled this summer compared with last year’s prices.

While operators of leased (colocation) data centers can typically pass these higher energy costs on to tenants, many IT service providers, such as web-hosting platforms and cloud data center operators, have seen their profits erode. High energy prices contributed to the bankruptcy of the UK division of colocation and cloud provider Sungard Availability Services in March 2022, followed by a bankruptcy filing for its US and Canadian operations in April.

A positive side effect of historically high energy prices is that investments in efficiency become more attractive. Industry-wide, power usage effectiveness (PUE) has been largely stagnant in recent years and cooling remains the largest source of inefficiency (see the Uptime Institute Global Data Center Survey 2022).

Direct liquid cooling (DLC) of IT hardware, while still relatively niche, can deliver significant energy savings for digital infrastructure. Even before the latest spikes in power costs, energy savings were already the top attraction for operators considering DLC. Uptime Institute’s Direct Liquid Cooling Survey, conducted early in 2022, shows that two-thirds of enterprise respondents think cost savings are the key factor for organizations considering a switch to DLC (see Figure 1).

Figure 1. Energy savings and sustainability are top DLC drivers

This is a potential shift in the adoption dynamics for DLC: for early adopters, high rack density was the major catalyst in moving away from air-cooled systems. The recent spikes in energy prices, however, may push an even higher proportion of operators to consider DLC as a means to reduce energy costs.

DLC enables efficiency gains for both the data center facility and the IT hardware. For facility cooling, DLC offers the potential to use less energy for mechanical refrigeration — or in some cases none, depending on implementation and local climate conditions. It also substantially lowers the volume of air that needs to be moved around the data hall, thereby reducing energy consumption from air handlers.

There are further efficiency gains to be made in powering the IT hardware itself: eliminating server fans and, by significantly reducing IT operating temperatures, lowering static power losses in the silicon.

These savings in IT power are difficult to quantify, but models estimate that they can be considerable — ranging from 10% to 20% of total IT power (a worked example follows the list below). Yet, despite the energy and cost savings associated with DLC, some key barriers to adoption allow air cooling to remain dominant:

  • A lack of standardization for existing DLC technologies.
  • Concerns over coolant leaks and material compatibility, which limit the speed of DLC adoption.
  • Retrofitting existing data centers with DLC may not be economically sound unless the facility already uses a chilled water loop.
  • Racks need to be densified (typically above 20 kilowatts per rack) for DLC to be economically viable.
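
To give a sense of scale, the back-of-the-envelope calculation below combines a facility-side improvement with an IT power saving in the 10% to 20% range cited above. Every input is an assumption chosen for illustration (a 1 MW IT load, PUEs of 1.6 and 1.2, a 15% IT power saving and an electricity price of $0.20/kWh); none of these figures come from Uptime’s surveys.

```python
# Back-of-the-envelope estimate of annual savings from a DLC retrofit.
# All inputs are assumptions for illustration, not survey data.

it_load_kw      = 1_000   # assumed IT load (1 MW)
pue_air         = 1.6     # assumed PUE with air cooling
pue_dlc         = 1.2     # assumed PUE after switching to DLC
it_power_saving = 0.15    # assumed IT power saving (within the 10-20% range)
price_per_kwh   = 0.20    # assumed electricity price, $/kWh
hours_per_year  = 8760

it_load_dlc_kw = it_load_kw * (1 - it_power_saving)

annual_kwh_air = it_load_kw * pue_air * hours_per_year
annual_kwh_dlc = it_load_dlc_kw * pue_dlc * hours_per_year

saving_kwh = annual_kwh_air - annual_kwh_dlc
print(f"Energy saved: {saving_kwh:,.0f} kWh/year")
print(f"Cost saved:   ${saving_kwh * price_per_kwh:,.0f}/year")
```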

Sustainability is an additional key factor that is likely to drive DLC adoption this decade. Energy savings translate into reductions in Scope 2 emissions (from purchased, off-site electricity), which is a major focus for companies seeking to improve their sustainability credentials.

The combination of this commitment to sustainability and energy prices that look set to remain high for the foreseeable future means data center operators have an unprecedented and powerful incentive to improve their infrastructure efficiency, strengthening the business case for a shift to DLC.

AWS price cuts: is serverless gaining momentum?

A serverless platform is an abstracted cloud computing service that executes a user’s code without the user needing to provision the underlying server or operating environment. The physical server, resources and operating environment used to execute the user’s code are managed by the cloud provider and are not accessible to the user (hence “serverless”). In July, Amazon Web Services (AWS) announced a price cut to its serverless platform, Lambda, but only for high levels of consumption.

Why would AWS make this price cut? In a serverless computing service, developers upload code via a graphical user interface or application programming interface (API). The user defines a trigger that executes this code, such as an API call sent from an application, a timer or an event on another cloud service. When triggered, the serverless platform assigns resources to the code, executes it and returns any result.
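
For readers unfamiliar with the model, a minimal Lambda function written in Python looks something like the sketch below. The handler is invoked whenever the configured trigger fires; the input field used here is purely illustrative.

```python
# Minimal AWS Lambda handler (Python). The platform invokes this function
# each time the configured trigger fires; the user never provisions or
# manages the server it runs on.
import json

def lambda_handler(event, context):
    # 'event' carries the trigger payload (e.g., an API request);
    # 'context' carries runtime metadata (request ID, time remaining, etc.).
    name = event.get("name", "world")  # illustrative input field
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}"}),
    }
```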

What differentiates serverless from platform as a service is that the user is billed only for the precise period the platform runs the code. Aside from the cost of using persistent cloud storage, there is no ongoing expense for a serverless application when dormant, making it economically attractive for “bursty” workloads that run for short periods when demand requires.

Serverless computing is a relatively recent innovation in IT. AWS pioneered the serverless cloud model (also called function as a service) with the launch of Lambda in 2014. Lambda usage is billed using two metrics: the number of requests, and the memory allocated to the code multiplied by its execution time, expressed in gigabyte (GB)-seconds.

AWS has announced a 10% to 20% discount for monthly consumption of Lambda of over six billion GB-seconds. To achieve this discount, users would have to consume a substantial amount in practice. No price cuts have been announced below the six billion GB-second threshold.

The price cut is unlikely to be related to market price pressure or internal cost reductions. The driver is likely to be that some organizations have grown their serverless consumption to a cost-prohibitive point. To reduce this barrier, AWS has opted to take a hit on gross margins for larger consumers in the belief that their sales volume will increase to offset the loss.

An organization’s cloud consumption is more likely to rise than fall over time. Most would prefer that their cloud applications grow to meet demand, rather than lock down scalability to save costs. Cost efficiency is rarely at the forefront of application development and operations teams’ priorities. Many Lambda users will likely see increased usage and increased bills, but this isn’t necessarily a problem if it translates into business benefits.

Analysis of the price cuts suggests that some users are consuming significant amounts. AWS has announced a 10% discount on the monthly execution costs of consumption between six billion and 15 billion GB-seconds, and 20% for consumption greater than 15 billion GB-seconds. Six billion GB-seconds is a considerable capacity: a user would have to consume the equivalent of 2.3 terabytes (TB) of memory for an entire month to obtain a 10% discount and 5.7 TB to receive a 20% discount. The latter is the equivalent of 178 regular cloud instances, each configured with 32 GB of memory.
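
The arithmetic behind those figures is simple to reproduce, as the short calculation below shows (it assumes an average month of 730 hours and uses the 32 GB instance size as the point of comparison).

```python
# Convert Lambda's monthly GB-second discount thresholds into the
# equivalent memory held continuously for an average month (730 hours).
SECONDS_PER_MONTH = 730 * 3600  # 2,628,000 seconds

for threshold_gb_s in (6e9, 15e9):
    sustained_gb = threshold_gb_s / SECONDS_PER_MONTH
    instances_32gb = sustained_gb / 32  # vs. a cloud instance with 32 GB of memory
    print(f"{threshold_gb_s / 1e9:.0f} billion GB-seconds ~ "
          f"{sustained_gb / 1000:.1f} TB sustained ~ {instances_32gb:.0f} x 32 GB instances")
```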

These figures demonstrate the significant scale at which some organizations are adopting serverless. Considering AWS has rebuilt its pricing model to reduce costs for these users, the number of organizations consuming such large amounts cannot be trivial. This price drop matters because it makes the serverless model more economical and attractive for use at larger scale, beyond just simple functionality. It signals wider adoption of serverless computing to the market, validating the emerging computing model’s viability. Highly bursty workloads architected to take advantage of serverless platforms are also likely to be more resource-efficient than even modern, containerized software, let alone traditional, nonscalable applications.

EU’s EED recast set to create reporting challenges

The European Commission’s (EC’s) proposed recast of its Energy Efficiency Directive (EED) sets out new and strict reporting requirements for data centers operating in the EU. If passed, data centers with 100 kilowatts or more total installed IT power demand (from server, storage and network equipment) will have to report their energy performance every year, including details on data traffic, quantity of data stored, water use, energy consumption, heat re-use and power utilization (see Table 1).
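
The directive does not yet define a reporting format, but purely as an illustration of the kind of record an operator might need to assemble each year, here is a hypothetical sketch; all field names, units and values are invented for the example.

```python
# Hypothetical annual report record for one data center under the proposed
# EED recast. Field names, units and values are illustrative assumptions;
# the directive's final reporting schema has not been defined.
from dataclasses import dataclass

@dataclass
class AnnualDataCenterReport:
    operator_name: str
    site_address: str
    installed_it_power_kw: float   # reporting applies from 100 kW upwards
    energy_consumption_kwh: float
    water_use_m3: float
    data_traffic_tb: float
    data_stored_tb: float
    heat_reused_kwh: float
    power_utilization: float       # e.g., average IT load / installed IT power

report_2022 = AnnualDataCenterReport(
    operator_name="Example Colo Ltd",          # hypothetical operator
    site_address="1 Example Street, Dublin",   # hypothetical address
    installed_it_power_kw=2_500,
    energy_consumption_kwh=31_000_000,
    water_use_m3=48_000,
    data_traffic_tb=120_000,
    data_stored_tb=9_500,
    heat_reused_kwh=1_200_000,
    power_utilization=0.62,
)
```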

Table 1. Reporting requirements of proposed EED recast

These reporting requirements raise several concerns for data centers. One concern is that some of the information is simply difficult to collect — at least for some operators. Most colocation operators do not currently have control over, or insight into, the data traffic, storage and processing taking place on their customers’ IT equipment. For example, it will be challenging for a large retail colocation data center to collect, normalize and aggregate data from tens or hundreds of IT operators, each with different data collection and management systems, into a coherent, accurate and standardized report.

Some parties have also raised concerns about the security risks associated with publicly reporting the names of owners, addresses and other details of data centers — information that is particularly sensitive for financial institutions. At present, it is relatively easy to find the location of data centers, but far more difficult to find details of the owners and operators of those data centers. Other parties are concerned about the administrative and financial burdens imposed on smaller operators.

Some data center operators have welcomed further transparency regarding their energy use, but argue against some of the proposed metrics. Feedback from DigitalEurope, a trade association representing the digital technology industry in Europe, notes that data traffic, processing and storage are unrelated to data center sustainability, as well as being commercially sensitive information. Moreover, servers account for most data center energy use, yet the new EED makes no attempt to gather indicative data on server power efficiency. This is a missed opportunity to tackle the single largest cause of energy inefficiency in data centers.

As part of the proposed EED recast, the EC is planning mandatory annual reporting of key performance indicators (KPIs). The aggregated data will be used to develop sustainability indicators based on energy efficiency, use of renewable energy, water usage and waste heat utilization (see Table 1). These indicators will be used to define and rate a data center in terms of sustainability.

The EC hopes that by creating a register of energy use based on performance indicator reporting, opportunities for the data center industry to reduce energy consumption and increase efficiency can be identified and encouraged. It also hopes that future sustainability ratings developed from the KPIs will help to provide transparency of and accountability for data center carbon footprints. In time, stakeholders may come to expect this data from operators when evaluating business decisions in the EU.

The stakes are particularly high when it comes to defining renewable energy usage in terms of these KPIs. The legislation is currently unclear as to how market-based instruments (such as guarantees of origin or renewable energy certificates) will be treated in audits or in developing sustainability ratings. Ideally, the ratings should assess the direct use of renewable and zero-carbon energy (as supplied throughout a grid region or through power purchase agreements) to accurately depict a data center’s energy use and carbon footprint. Without greater clarity, the impact of the proposed reporting requirements may instead depend on how the legislation is interpreted by governments or operators.

For more information, see our recent report Critical regulation: the EU Energy Efficiency Directive recast.

This Halloween, beware the vampire server

Halloween brings joy to many in the form of tricks and treats. But to IT managers, Halloween is a stark reminder of the evil spirits that hide out of sight in data center cabinets and public cloud applications. Vampires, zombies and ghosts haunt the infrastructure, sucking valuable energy, space and resources. IT managers need to hunt, identify and purge these evil spirits before it’s too late — efficiency and sustainability are at stake.

Vampire (or comatose) servers and virtual machines lie — seemingly dead — in the shadows, their purpose unknown. Forgotten due to staff changes, poor tagging or missing documentation, there are no records of their function or value. But their removal should be performed with care: the aggression of a user who loses a valued, but seldom used, business process can be far more terrifying than the vampire that was slain.

Similarly, zombie servers and virtual machines wander the data center infrastructure, forgotten and unclaimed by the user that provisioned them. Readily identified by low utilization, they add no value and offer no useful purpose. They should be quietly put to rest and repurposed, refurbished or recycled.

Ghost data is generated and saved without a known purpose or immediate need. It haunts storage devices, occupying valuable terabytes. The data may be generated by a vampire server, creating an ever-increasing volume of data and making resolution ever more critical. In an ideal world, the server would be banished and the data deleted, but this may not be feasible if the data’s purpose is unknown, or if the data is encrypted with a seemingly lost encryption key. The ghost data may need to be sent to a permanent grave of low-energy, low-cost long-term storage, available to be called back from the dead when needed.

Fortunately, there are tools available to help wannabe paranormal investigators banish these demons: data center administrative systems, specialty workload management software and cloud management tools.

IT administrators have access to data center management tools, such as data center infrastructure management (DCIM) programs, that identify applications and track and report resource demand and utilization for assigned resources. This data can be used to find servers and virtual machines with little or no resource utilization or activity, and reports can be generated to list candidates for shutdown or consolidation. Many cloud providers, such as Amazon Web Services, Google Cloud and Microsoft Azure, offer this functionality for free in their cloud management portals.
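
As a simple illustration of the kind of check these tools automate, the sketch below flags servers whose average utilization falls below a threshold, or that have no recorded owner; the inventory records and the 5% threshold are hypothetical.

```python
# Illustrative sketch: flag candidate "zombie" or "vampire" servers from
# utilization and ownership data. The inventory records and the 5% CPU
# threshold are hypothetical; DCIM tools and cloud portals report this data.

ZOMBIE_CPU_THRESHOLD = 0.05  # flag anything averaging below 5% CPU

inventory = [
    {"name": "app-web-01", "owner": "ecommerce", "avg_cpu_30d": 0.41},
    {"name": "legacy-batch-07", "owner": None, "avg_cpu_30d": 0.01},  # unclaimed
    {"name": "report-gen-02", "owner": "finance", "avg_cpu_30d": 0.03},
]

candidates = [
    server for server in inventory
    if server["avg_cpu_30d"] < ZOMBIE_CPU_THRESHOLD or server["owner"] is None
]

for server in candidates:
    print(f"Review for shutdown or consolidation: {server['name']}")
```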

Software tools are also available to identify these vampires, zombies and ghosts. Products such as Densify, Granulate, Turbonomic Application Resource Management, TSO Logic and others scan physical servers and applications placed in the public cloud. They assess central processing unit (CPU), memory and application service levels, recommending or implementing resource adjustments to maximize operational efficiency and minimize energy and resource use. Where an application or piece of IT equipment is not being used, it is slated for shutdown and removal, either automatically or after verification by a system administrator. Where resources are over- or underutilized, application placements are adjusted to minimize resource use and improve resiliency and reliability.

Many spirits can be banished before they appear if the IT manager enforces a process to register, deploy and manage IT equipment and applications. This process is augmented and improved using a software monitoring tool. Properly executed, the process enables tracking and management of all equipment and applications to prevent the appearance of vampires, zombies and ghosts.

This is particularly important when users of the IT environment can conjure up an application or bare metal server from the public cloud in a matter of minutes. Easy launching and deployment, meant to simplify and improve the user’s experience, also feeds the system administrator’s worst nightmare of uncontrolled spirits wandering across their environments.

Controlling these spirits is an important aspect of a sustainability strategy. Eliminating wasted work and equipment reduces energy use and increases delivered work per watt of consumed energy. It is hauntingly beautiful to behold.