Use tools to control cloud costs before it’s too late

The public cloud’s on-demand pricing model is vital in enabling application scalability — the key benefit of cloud computing. Resources need to be readily available for a cloud application to scale when required without the customer having to give advance notification. Cloud providers can offer such flexibility by allowing customers to pay their bills in arrears based on the number of resources consumed during a specified period.

This flexibility does have a downside, however. If more resources are consumed than expected due to increased demand or configuration errors, the organization is still liable to pay for them — it is too late to control costs after the fact. A total of 42% of respondents to the Uptime Institute Data Center Capacity Trends Survey 2022 cited escalating costs as the top reason for moving workloads from the public cloud back to on-premises infrastructure. Chief information officers face a tricky balancing act when allowing applications to scale to meet business objectives without letting budgets spiral out of control.

This Uptime Intelligence Update summarizes the challenges that cloud customers face when forecasting, controlling and optimizing costs. It also provides simple steps that can help buyers take control of their spend. As organizations face increasing macroeconomic pressures, reducing cloud expenditure has never been more important (see Cloud migrations to face closer scrutiny).

Cloud complexity

A cloud application is usually architected from multiple cloud services, such as virtual machines, storage platforms and databases. Each cloud service has its own metrics for billing. For example, customers may be charged for storage services based on the amount of storage used, the number of transactions made, and the bandwidth consumed between the storage service and the end user. The result is that even a simple bill for a cloud application will have many different charges spread across various services.
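
As a rough illustration of how several metrics combine into one charge, the sketch below totals a storage bill from the three metrics mentioned above. The class name, function name and unit prices are hypothetical and chosen only for illustration; each provider publishes its own rates and billing units.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoragePrices:
    """Hypothetical unit prices; real providers publish their own rates."""
    per_gb_month: float      # price per GB stored for the month
    per_10k_requests: float  # price per 10,000 transactions
    per_gb_egress: float     # price per GB sent to end users


def storage_bill(gb_stored, requests, gb_egress, prices):
    """Return each line item of a storage charge, plus the total."""
    items = {
        "storage": gb_stored * prices.per_gb_month,
        "requests": requests / 10_000 * prices.per_10k_requests,
        "egress": gb_egress * prices.per_gb_egress,
    }
    items["total"] = sum(items.values())
    return items


# Illustrative figures only.
example_prices = StoragePrices(per_gb_month=0.02, per_10k_requests=0.05, per_gb_egress=0.09)
example_bill = storage_bill(gb_stored=500, requests=2_000_000, gb_egress=100, prices=example_prices)
```

Even this single service produces three separate line items, and two of them (requests and egress) depend on end-user behavior rather than on anything the customer provisions.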

Controlling, forecasting and optimizing the costs of cloud-native applications (i.e., applications built for the cloud that can scale automatically) is challenging for several reasons:

Consumption is not always under the customer’s control. For example, many end users might upload data to a storage platform — thus increasing the customer’s bill — without the customer being aware until the end of the billing period.
Each service has many metrics to consider and an application will typically use multiple cloud services. Each cloud provider measures the consumption of their service in different ways; there is no standard approach.
Metrics are not always related to tangible units that are easy for the customer to predict. For example, a specific and unknown type of transaction may generate a cost on a database platform; however, the customer may have no understanding or visibility of how many of these transactions will be executed in a certain period.
Applications may scale up by accident due to errant code or human error and use resources without purpose. Similarly, applications may not scale down when able (reducing costs) due to incorrect configuration.

Conversely, applications that don’t scale, such as those lifted and shifted from on-premises locations, are generally predictable and stable in capacity terms. However, without the ability to scale down (and reduce costs), infrastructure expenses are not always as low as hoped (see Asset utilization drives cloud repatriation economics).

A sudden and unexpected increase in a monthly cloud bill is often described as “bill shock,” a term coined initially for unexpectedly large consumer phone bills. Is bill shock a problem? Not necessarily. If an application is scaled to derive more revenue from the end users, for example, then paying more for the underlying infrastructure is not an issue. But although applications may be designed to be scalable, organizations and budgets are not. An IT department might generate new revenue for the organization from spending more on infrastructure — but if the department has a fixed budget, the chief financial officer might not understand why costs have increased. Most organizations would not report the cost of cloud services against any revenue created by the investment in those services — to senior management, cloud services may appear to be an expense rather than a value-generating activity.

The complexity of the situation has led to the creation of an open-source project, the FinOps Foundation. The foundation describes FinOps (a portmanteau of finance and operations) as a “financial management discipline and cultural practice that enables organizations to get maximum business value by helping engineering, finance, technology and business teams to collaborate on data-driven spending decisions.” At a high level, the foundation describes six principles to effectively manage cloud costs:

Teams need to collaborate.
Decisions should be driven by the business value of the cloud.
Everyone needs to take ownership of their cloud usage.
FinOps data should be accessible and timely.
FinOps needs to be driven by a centralized team.
Organizations should take advantage of the variable cost model of the cloud.

The need for a foundation dedicated to cloud finance demonstrates the complexity of managing cloud costs effectively. Fully implementing the foundation’s six principles requires substantial investment and motivation, and many organizations will need expert assistance in this endeavor.

Taking charge

There are some simple steps organizations can take to control their public cloud costs, most of which relate to the foundation’s six principles:

Set alerts to warn of overspending

All cloud providers allow customers to set custom spend alerts, which warn when a cost threshold has been reached. Such alerts enable the budget holders to determine if the spend is justified, if further funding should be sought, or if the spending is accidental and needs to be curtailed. Setting alerts is the minimal step all organizations should take to control their cloud expenditures. Organizations should ensure that alerts are configured and sent to a valid mailbox, phone number or event management system.
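
The logic behind a spend alert is simple, as the sketch below suggests. It is not any provider's API: the function name and the default thresholds of 50%, 80% and 100% of budget are assumptions made for illustration. In practice the provider's own alerting feature would perform this check and send the notification.

```python
def crossed_thresholds(spend_to_date, budget, thresholds=(0.5, 0.8, 1.0)):
    """Return the budget fractions that the current spend has reached.

    The threshold values are hypothetical defaults; budget holders would
    choose their own.
    """
    if budget <= 0:
        raise ValueError("budget must be positive")
    used = spend_to_date / budget
    return [t for t in sorted(thresholds) if used >= t]


# Example: 850 spent against a monthly budget of 1,000.
reached = crossed_thresholds(spend_to_date=850, budget=1000)
```

Using several thresholds rather than one gives budget holders an early warning while there is still time to decide whether the spend is justified.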

Use free tools to forecast month-to-month consumption

Most cloud providers include tools to forecast future spending based on past performance. These tools aren’t perfect, but they do give some visibility into how an application consumes resources over time, for free. It’s better to inform leadership in advance if costs are expected to rise rather than after the bill is due.
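
A minimal sketch of the simplest kind of forecast, a run-rate projection, is shown below. It assumes that spending continues at the average daily rate seen so far in the month; providers' forecasting tools are likely to use more sophisticated methods, and the function name here is hypothetical.

```python
import calendar
from datetime import date


def forecast_month_end(daily_costs, today):
    """Project the month-end total from the average daily cost so far.

    daily_costs holds one figure per elapsed day of the month, including today.
    """
    if not daily_costs:
        raise ValueError("need at least one day of cost data")
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    average_per_day = sum(daily_costs) / len(daily_costs)
    remaining_days = days_in_month - today.day
    return sum(daily_costs) + average_per_day * remaining_days


# Example: ten days at 100 per day, projected across a 30-day month.
projection = forecast_month_end([100.0] * 10, date(2023, 4, 10))
```

A projection like this ignores seasonality and planned events, which is why it should be combined with input from the business.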

Work with stakeholders to determine future needs

Ensure that all parts of the business that use the public cloud understand how costs may change. For example, a new product launch, sale or event may increase the use of a website, which increases costs. Knowing this in advance enables a more realistic forecast of future costs and an open discussion on who will pay.

Consider showback and chargeback models

In a showback model, the IT department shows individual departments and business units their monthly cloud spend. The idea is that they become more aware of how their decisions affect expenditure, which enables them to take steps to reduce it. In a chargeback model, IT invoices these departments for the cloud costs related to their applications. Each department is then responsible for its own costs and is obliged to justify the expenditure relative to the value gained (e.g., increased revenue, better customer satisfaction).

Showback can be set up relatively quickly by “tagging” resources appropriately with an owner and then using the cloud provider’s reporting tools to break down business owners’ spending. Chargeback is a more significant undertaking, which affects the culture and structure of a company — most non-IT teams may not have the understanding or appetite to be financially responsible for their IT bills.
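
The breakdown that tagging makes possible can be sketched as follows. The record layout (a "cost" field and a "tags" dictionary with an "owner" key) is an assumption for illustration and does not match any particular provider's billing export.

```python
from collections import defaultdict


def showback(line_items, tag_key="owner", untagged="untagged"):
    """Sum cost per owner tag; resources without the tag are grouped together."""
    totals = defaultdict(float)
    for item in line_items:
        owner = item.get("tags", {}).get(tag_key, untagged)
        totals[owner] += item["cost"]
    return dict(totals)


# Hypothetical billing records.
records = [
    {"resource": "vm-1", "cost": 120.0, "tags": {"owner": "marketing"}},
    {"resource": "db-1", "cost": 300.0, "tags": {"owner": "sales"}},
    {"resource": "vm-2", "cost": 80.0, "tags": {"owner": "marketing"}},
    {"resource": "disk-9", "cost": 15.0, "tags": {}},
]
report = showback(records)
```

Grouping untagged resources under their own heading is a deliberate choice in this sketch: a large untagged total suggests that tagging is incomplete and the report should not yet be relied upon.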

Take advantage of optimization tools

With an accurate forecast, organizations can use alternative pricing models to reduce their spend. These models give customers discounts of up to 70% compared with on-demand pricing in return for a commitment of up to three years or a minimum spend. Many cloud providers also offer spot instances, which provide cheap access to cloud resources on the understanding that this access can be terminated without warning. The best use of alternative pricing models will be discussed further in a future Uptime Intelligence Update. Most cloud providers offer tools that suggest alternative pricing models based on past performance. Such tools can also identify “orphaned” resources that cost money but don’t appear to be doing anything useful.
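
The trade-off behind a commitment can be sketched with simple arithmetic. The example assumes, purely for illustration, that a committed resource is billed at the discounted rate for every hour of the term whether or not it is used, while an on-demand resource is billed only for the hours it runs. Actual commitment terms vary by provider, and the function names and figures are hypothetical.

```python
def commitment_saving(hourly_rate, discount, hours_in_term, hours_used):
    """Return on-demand cost minus committed cost (positive means the commitment is cheaper).

    Assumes the commitment is charged for every hour of the term, used or not.
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be in the range [0, 1)")
    on_demand_cost = hourly_rate * hours_used
    committed_cost = hourly_rate * (1 - discount) * hours_in_term
    return on_demand_cost - committed_cost


def break_even_utilization(discount):
    """Fraction of the term a resource must run for the commitment to pay off."""
    return 1 - discount


# Example: a 70% discount on a resource that runs for half of a 1,000-hour term.
saving = commitment_saving(hourly_rate=1.0, discount=0.7, hours_in_term=1000, hours_used=500)
```

Under these assumptions, a 70% discount breaks even when the resource runs for about 30% of the term. Smaller discounts require higher utilization, which is why an accurate forecast needs to come first.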

Security and governance practices prevent overspend

A well-tested application hosted in a secure cloud environment reduces the likelihood of things going wrong and costs increasing as a result. For example, organizations should use role-based access to ensure only those employees who need to create resources are permitted to do so. This prevents costly services from being set up and subsequently forgotten about. Similarly, cloud customers should take appropriate precautions to stop malicious scripts from executing in their environment and sending out large quantities of data that will increase bandwidth costs. IT teams should test code thoroughly before deployment to reduce the chance of accidental resource consumption.

Get help

Most hyperscaler cloud providers, including Amazon Web Services, Google Cloud Platform, Microsoft Azure, Oracle Cloud, IBM Cloud and Alibaba Cloud, offer tools to aid cost forecasting, optimization and management. Smaller cloud providers are less likely to have these features, but they usually offer fewer services and base their charges on fewer metrics, which reduces complexity.

Some organizations use third-party platforms to track and optimize their spend. The key benefit of these platforms is that they can optimize across multiple cloud providers and are independent, which arguably provides a more unbiased view of costs. These platforms include Apptio Cloudability, Flexera, NetApp CloudCheckr, IBM Turbonomic and VMware CloudHealth.

There are also consultancies and managed service providers, such as Accenture, Deloitte and HCLTech, that integrate cost-optimization practices into organizations and optimize cloud costs on their customers’ behalf on an ongoing basis.

The cost of not acting can be substantial. This analyst spent $4,000 on a bare-metal cloud server after accidentally leaving it running for two months. Without an alert set up, the analyst only became aware when the cloud provider posted an invoice to his home address. Organizations should check that warnings and limits are configured now, before it is too late. If cloud costs are a significant part of the IT expenditure, expert advice is essential.