Understanding how server power management works

Uptime Intelligence regularly addresses IT infrastructure efficiency, particularly servers, in our reports on data center energy performance and sustainability. Without active contribution from IT operations, facility operations alone will not be able to meet future energy and sustainability demands on data center infrastructure. Purchases of renewable energy and renewable energy certificates will become increasingly — and, in many locations, prohibitively — expensive as demand outstrips supply, making the energy wasted by IT even more costly.

The power efficiency of a server fleet, that is, how much work servers perform for the energy they use, is influenced by multiple factors. Hardware features receive the most attention from IT buyers: the server’s technology generation, the configuration of the system and the selection of power supply or fan settings. The single most significant factor that affects server efficiency, however, is the level at which the servers are typically utilized. This may seem an obvious consideration, and it is important enough for regulators to include it as a reporting requirement in the EU’s new Energy Efficiency Directive (see EED comes into force, creating an enormous task for the industry). Even so, the process of sourcing the correct utilization data for the purposes of power efficiency calculations (as opposed to capacity planning) remains arguably misunderstood (see Tools to watch and improve power use by IT are underused).

The primacy of server utilization in data center efficiency has increased in recent years. The latest server platforms are only able to deliver major gains in energy performance when put to heavy-duty work — either by carrying a larger software payload through workload consolidation, or by running scalable, large applications. If these conditions are not met, running lighter or bursty workloads on today’s servers (regardless of whether based on Intel or AMD chips) will deliver only a marginal, if any, improvement in the power efficiency compared with many of the supposedly outdated servers that are five to seven years old (see Server efficiency increases again — but so do the caveats).

Cycles of a processor’s sleep

This leads into the key discussion point of this report: the importance of taking advantage of dynamic energy saving features. Settings for power and performance management of servers are an often overlooked and underused lever in improving power efficiency. Server power management techniques significantly affect power use and overall system efficiency. This effect is even more pronounced for systems that are only lightly loaded or spend much of their time doing little work, such as servers that run enterprise applications.

The reduction in server power demand resulting from power management can be substantial. In July 2023, Uptime Intelligence published a report discussing data (although sparse) that indicates 10% to 20% reductions in energy use from enabling certain power-saving modes in modern servers, with only a marginal performance penalty when running Java-based business logic (see The strong case for power management). Energy efficiency gains will depend on the type of processor and hardware configuration, but we consider the results indicative for most servers. Despite this, our research indicates that many, if not most, IT operations do not use power management features.
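To put the 10% to 20% range into fleet terms, the arithmetic can be sketched as follows. The average power draw (300 W per server) and the fleet size (1,000 servers) are hypothetical inputs chosen for illustration; they are not figures from the report.

```python
HOURS_PER_YEAR = 8760  # non-leap year


def annual_energy_saved_kwh(avg_power_w: float, reduction: float, servers: int = 1) -> float:
    """Energy saved per year, in kWh, when average power falls by `reduction` (0 to 1)."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be a fraction between 0 and 1")
    return avg_power_w * reduction * HOURS_PER_YEAR * servers / 1000.0


if __name__ == "__main__":
    # Hypothetical inputs: 300 W average draw per server, 1,000 servers.
    for reduction in (0.10, 0.20):
        saved = annual_energy_saved_kwh(300.0, reduction, servers=1000)
        print(f"{reduction:.0%} reduction: {saved:,.0f} kWh per year")
```

Under these assumed inputs, the saving works out to between roughly 263 MWh and 526 MWh per year. Real results depend on the actual power profile of the fleet.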

So, what are these power management settings? Server power management is governed statically by the firmware (which modes are enabled upon system start-up) and dynamically, once the system is running, by the operating system or hypervisor through the Advanced Configuration and Power Interface (ACPI).

Many components in a server may have power management features that enable them to run slower or power off. Operating systems also have their own software mechanisms, such as suspending their operation and saving the machine state to main memory or the storage system.

But in servers, which tend to be always powered on, it is the processors’ power management modes that dictate most of the energy gains. Modern processors have sophisticated power management features for idling, that is, when the processor does not execute code. These are represented by various levels of C-states (the C stands for CPU) denoted by numbers, such as C1 and C2 (with C0 being the fully active state).

The number of these states has expanded over time as chip architects introduce new, more advanced power-saving features to help processors reduce their energy use when doing no work. The chief benefit of these techniques is to minimize leakage currents that would otherwise increasingly permeate modern processor silicon.

The higher the C-state number, the more of its circuitry the CPU sends to various states of sleep. In summary:

C0: processor active.
C1/C1E: processor core halts and performs no work, but is ready to resume operation immediately with negligible performance penalty; it may optionally reduce its voltage and frequency to save power.
C3: processor clock distribution is switched off and core caches are emptied.
C4: enhancement to C3 that extends the parts covered.
C6: essentially powers down entire cores after saving the state to resume from later.
C7 and higher: resources shared between cores, or even the entire processor package, may be powered down.

Skipped numbers, such as C2 and C5, are incremental, transitional processor power states between the main states. Not all of these C-states are available on all processor architectures.
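To see which idle states a given machine exposes, a minimal sketch follows. It assumes a Linux host that publishes the kernel's cpuidle information under `/sys/devices/system/cpu/cpu0/cpuidle`. The function name `read_idle_states` is illustrative. The state names reported there are defined by the idle driver and may not map one-to-one to the hardware C-state numbers listed above.

```python
from pathlib import Path

# Usual Linux cpuidle location for CPU 0; assumed here, and absent on some hosts.
DEFAULT_CPUIDLE_DIR = Path("/sys/devices/system/cpu/cpu0/cpuidle")


def _read_field(state_dir: Path, name: str) -> str:
    """Return the stripped contents of one sysfs attribute, or '' if it is missing."""
    attr = state_dir / name
    return attr.read_text().strip() if attr.is_file() else ""


def read_idle_states(cpuidle_dir: Path = DEFAULT_CPUIDLE_DIR) -> list[dict]:
    """Return one dict per idle state found under `cpuidle_dir`, ordered by state index."""
    if not cpuidle_dir.is_dir():
        return []
    state_dirs = sorted(cpuidle_dir.glob("state[0-9]*"), key=lambda p: int(p.name[5:]))
    states = []
    for d in state_dirs:
        states.append({
            "index": int(d.name[5:]),
            "name": _read_field(d, "name"),
            "exit_latency_us": int(_read_field(d, "latency") or 0),  # wake-up latency
            "entries": int(_read_field(d, "usage") or 0),            # times the state was entered
            "residency_us": int(_read_field(d, "time") or 0),        # total time spent in the state
            "disabled": _read_field(d, "disable") == "1",
        })
    return states


if __name__ == "__main__":
    found = read_idle_states()
    if not found:
        print("No cpuidle information found on this host.")
    for s in found:
        print(f"state{s['index']}: {s['name']:<8} exit latency {s['exit_latency_us']} us, "
              f"entered {s['entries']} times, {s['residency_us']} us total")
```

The entry counts and residency times give a rough indication of how often the cores actually reach each sleep level under the current workload.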

A good sleep in a millisecond

The levels of C-states, and an understanding of them, matter because they largely define the cost in performance and the benefit in power. The measured 10% to 20% reduction in energy use from enabling certain power management features, as discussed earlier, was achieved by allowing the server processor (an AMD model) to enter processor power states up to C6. These sleep states save power even when the server, on a human level of perception, is busy processing database transactions and responding to queries.

This is because processors operate on a timescale measured in nanoseconds, while the gaps between software-level requests can last milliseconds even on a busy machine. This is a difference of a factor of one million: milliseconds between work assignments represent millions of processor cycles spent waiting. For modern server processors, some of the many cores may often have no work to do for a second or more, which is an eternity on the processor’s timescale. On a human scale, a comparable time would be several years of inactivity.
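The scale of this gap can be checked with simple arithmetic. The sketch below assumes a 3 GHz core clock, which is an illustrative value and not one taken from the measurements discussed here.

```python
def cycles_in_interval(interval_ns: float, clock_ghz: float) -> float:
    """Number of clock cycles that elapse in `interval_ns` at `clock_ghz`.

    One GHz is one cycle per nanosecond, so the product is a cycle count.
    """
    return interval_ns * clock_ghz


if __name__ == "__main__":
    clock_ghz = 3.0  # assumed clock speed, for illustration only
    for label, interval_ns in (("1 millisecond", 1e6), ("1 second", 1e9)):
        cycles = cycles_in_interval(interval_ns, clock_ghz)
        print(f"{label} of idleness = {cycles:,.0f} cycles at {clock_ghz} GHz")
```

At the assumed clock speed, one idle millisecond corresponds to three million cycles and one idle second to three billion.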

However, there is a cost associated with the processor cores going to sleep. Entering ever deeper sleep states across processor cores or entire chips can take thousands of cycles, and waking up to resume operation can take as many as tens of thousands of cycles. This added latency in responding to wake-up requests is what shows up as a loss of performance in measurements. In the reference measurement running the Java-based business logic, the loss is in the 5% to 6% range, arguably a small price to pay.
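To give a feel for the size of this wake-up cost, the sketch below converts a cycle count into time and compares it with the idle gap that preceded it. The inputs (30,000 wake-up cycles, a 3 GHz clock and a 1 ms idle gap) are hypothetical, and the calculation is not a model of the 5% to 6% penalty measured in the reference test.

```python
def wake_latency_us(wake_cycles: float, clock_hz: float) -> float:
    """Convert a wake-up cost in clock cycles to microseconds."""
    return wake_cycles / clock_hz * 1e6


def latency_share(wake_cycles: float, idle_gap_s: float, clock_hz: float) -> float:
    """Wake-up latency expressed as a fraction of the idle gap before it."""
    return (wake_cycles / clock_hz) / idle_gap_s


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to illustrate the orders of magnitude.
    cycles, clock_hz, gap_s = 30_000, 3.0e9, 1e-3
    print(f"Wake-up latency: {wake_latency_us(cycles, clock_hz):.1f} us")
    print(f"Share of a {gap_s * 1e3:.0f} ms idle gap: {latency_share(cycles, gap_s, clock_hz):.1%}")
```

The shorter and more frequent the idle gaps, the larger the share of time lost to waking up, which is why latency-sensitive workloads are affected more.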

Workloads will vary greatly in the size of the performance penalty introduced by this added latency. Crucially, they will differ even more in how costly the lost application performance is for the business: high-frequency trading and the processing of high volumes of mission-critical online transactions are areas where any loss of performance is unacceptable. Another such area may be storage servers with heavy demand for handling random read-write operations at low latency. But a vast array of applications will not see a material change in the quality of service.

Using server power management is not a binary decision either. IT buyers can also calibrate the depth of sleep they enable for the server processor (and other components) to enter. Limiting it to C3 or C1E may deliver better trade-offs. Many servers, however, are not running performance-critical applications and spend most of their time doing no work, even if, by human standards, they seem to be called upon often. For servers that are often idle, the energy saved can be in the 20% to 40% range, which can amount to tens of watts for every lightly loaded or idle server.
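One way to express such a calibration is as a wake-up latency budget: keep every sleep state whose exit latency fits within the budget and switch off the deeper ones. The sketch below only plans the change and writes nothing. The state table and the 20-microsecond budget are hypothetical, and the names `IdleState` and `states_to_disable` are illustrative. On Linux, each idle state has a `disable` attribute under the cpuidle sysfs directory; firmware settings are another place where the depth of sleep is commonly limited.

```python
from typing import NamedTuple


class IdleState(NamedTuple):
    index: int
    name: str
    exit_latency_us: int


def states_to_disable(states: list[IdleState], max_exit_latency_us: int) -> list[IdleState]:
    """Return the states whose exit latency exceeds the allowed budget."""
    return [s for s in states if s.exit_latency_us > max_exit_latency_us]


if __name__ == "__main__":
    # Hypothetical state table; real names and latencies come from the platform.
    sample = [
        IdleState(0, "POLL", 0),
        IdleState(1, "C1", 2),
        IdleState(2, "C1E", 10),
        IdleState(3, "C6", 133),
    ]
    for s in states_to_disable(sample, max_exit_latency_us=20):
        # Dry run: report the attribute an administrator would set, without touching it.
        print(f"would disable {s.name}: cpuidle/state{s.index}/disable")
```

Keeping the planning step separate from any write makes it easier to review the intended policy for each class of server before applying it.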

Optimizing server energy performance does not stop with C-states. Performance-governing features (setting performance levels when the processor is actively working), known as P-states, offer another set of possibilities to find better trade-offs between power and performance. Rather than minimizing waste when the processor idles, P-states direct how much power should be expended on getting the work done. Future reports will introduce P-states for a more complete view of server power and performance management for IT infrastructure operators that are looking for further options in meeting their efficiency and sustainability objectives.

The Uptime Intelligence View

A server processor’s power management is a seemingly minute function buried under layers of technical details of an infrastructure. Still, its role in the overall energy performance of a data center infrastructure will be outsized for many organizations. In the near term, blanket policies (or simply IT administrator habits) of keeping server power management features switched off will inevitably be challenged by internal stakeholders in pursuit of cost efficiencies and better sustainability credentials; or, possibly in the longer term, by regulators catching on to technicalities and industry practices. Technical organizations at enterprises and IT service providers will want to map out server power management opportunities ahead of time.