Power Saving Guide

SLURM provides an integrated power saving mechanism beginning with version 1.2.7. Nodes that remain idle for a configurable period of time can be placed in a power saving mode. The nodes are restored to normal operation once work is assigned to them. Power saving is accomplished using a cpufreq governor that can change CPU frequency and voltage; note that the cpufreq driver must be enabled in the Linux kernel configuration. While the "ondemand" governor can be configured to operate at all times, automatically altering CPU performance based upon workload, SLURM provides somewhat greater flexibility for power management on a cluster. Of particular note, SLURM can alter the governors across the cluster at a configurable rate to prevent rapid changes in power demand. For example, starting a 1000-node job on an idle cluster could result in an instantaneous surge in power demand of multiple megawatts without SLURM's support for increasing power demand in a gradual fashion.
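To verify that a node's kernel exposes the cpufreq interface before enabling this feature, the standard sysfs files can be read directly. The following is a minimal check using the conventional cpufreq paths; the exact locations may vary with your kernel version:

#!/bin/bash
# List the governors available for CPU 0 and show which one is
# currently active (conventional cpufreq sysfs paths).
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor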

Configuration

Rather than changing SLURM's configuration file (and data structures) after SLURM version 1.2 was released, we decided to temporarily put the configuration parameters directly in the src/slurmctld/power_save.c file. These parameters will all be moved into the slurm.conf configuration file when SLURM version 1.3 is released. Until that time, please edit the code directly to use this feature. The configuration parameters include SuspendProgram and ResumeProgram, described below.
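For illustration only, the sketch below shows how such settings might eventually appear in slurm.conf. The parameter names (SuspendTime, SuspendRate, ResumeRate, SuspendProgram, ResumeProgram) are drawn from the slurm.conf interface adopted in later SLURM releases; the values and script paths shown are hypothetical:

# Illustrative slurm.conf excerpt; not valid for SLURM version 1.2.
SuspendProgram=/usr/sbin/slurm_suspend   # hypothetical script path
ResumeProgram=/usr/sbin/slurm_resume     # hypothetical script path
SuspendTime=600    # seconds a node must remain idle before suspension
SuspendRate=60     # maximum number of nodes suspended per minute
ResumeRate=300     # maximum number of nodes resumed per minute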

Note that SuspendProgram and ResumeProgram execute as SlurmUser. The programs can take advantage of this to execute commands directly on the nodes as user root through the SLURM infrastructure. Example scripts are shown below:

#!/bin/bash
# Example SuspendProgram for a cluster where every node has two CPUs.
# The command string is quoted so the redirection executes on the
# compute node rather than on the host running this script.
srun --uid=0 --no-allocate --nodelist=$1 /bin/bash -c "echo powersave >/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor"
srun --uid=0 --no-allocate --nodelist=$1 /bin/bash -c "echo powersave >/sys/devices/system/cpu/cpu1/cpufreq/scaling_governor"

#!/bin/bash
# Example ResumeProgram for a cluster where every node has two CPUs.
# The command string is quoted so the redirection executes on the
# compute node rather than on the host running this script.
srun --uid=0 --no-allocate --nodelist=$1 /bin/bash -c "echo performance >/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor"
srun --uid=0 --no-allocate --nodelist=$1 /bin/bash -c "echo performance >/sys/devices/system/cpu/cpu1/cpufreq/scaling_governor"

The srun --no-allocate option permits SlurmUser and user root only to spawn tasks directly on the compute nodes without actually creating a SLURM job. No other users have this permission (their requests will generate an invalid credential error message, and the event will be logged). The srun --uid option permits SlurmUser and user root only to execute a job as some other user. When SlurmUser uses the srun --uid option, the srun command will try to set its user ID to that value in order to fully operate as the specified user. This will fail unless SlurmUser has root privileges, and srun will report an error to that effect; the error does not prevent the spawned programs from running as user root. No other users have this permission (their requests will generate an invalid user id error message, and the event will be logged).
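For example, SlurmUser can spawn a command as user root on a specific node without creating a job; here "tux12" is a hypothetical node name:

# Run as SlurmUser: confirm the active cpufreq governor on node tux12
# without creating a SLURM job (node name is hypothetical).
srun --uid=0 --no-allocate --nodelist=tux12 cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor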

The slurmctld daemon will periodically (every 10 minutes) log how many nodes are in power save mode using messages of this sort:

[May 02 15:31:25] Power save mode 0 nodes
...
[May 02 15:41:26] Power save mode 10 nodes
...
[May 02 15:51:28] Power save mode 22 nodes

Using these logs, you can easily see the effect of SLURM's power saving support. You can also configure SLURM without SuspendProgram or ResumeProgram values to assess the potential impact of power saving mode before enabling it: idle nodes are still identified and logged, but no action is taken on them.
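One simple way to review that history is to filter the slurmctld log for these messages. The log file location below is an assumption; check where your site directs slurmctld output:

# Summarize power save activity over time; the log path shown is an
# assumption and depends on your slurmctld configuration.
grep "Power save mode" /var/log/slurm/slurmctld.log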

Last modified 9 May 2007
