Power Saving Guide
SLURM provides an integrated power saving mechanism beginning with version 1.2.7. Nodes that remain idle for a configurable period of time can be placed in a power saving mode. The nodes will be restored to normal operation once work is assigned to them. Power saving is accomplished using a cpufreq governor that can change CPU frequency and voltage. Note that the cpufreq driver must be enabled in the Linux kernel configuration.
While the "ondemand" governor can be configured to operate at all times to automatically alter the CPU performance based upon workload, SLURM provides somewhat greater flexibility for power management on a cluster. Of particular note, SLURM can alter the governors across the cluster at a configurable rate to prevent rapid changes in power demands. For example, starting a 1000 node job on an idle cluster could result in an instantaneous surge in power demand of multiple megawatts, whereas SLURM's rate limiting lets power demand increase gradually.
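Before enabling power saving, it can help to confirm that the cpufreq driver is active on a node and to see which governor each CPU currently uses. The following is a minimal sketch, assuming the standard Linux sysfs layout (/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor). The function name show_governors and its optional sysfs root argument are illustrative and not part of SLURM.

```shell
#!/bin/bash
# Print the current cpufreq governor of every CPU on this node.
# The sysfs root is a parameter (default /sys) only so the function can be
# tried against a scratch directory; SLURM does not define it.
show_governors() {
    local sysfs_root="${1:-/sys}"
    local f cpu
    for f in "$sysfs_root"/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$f" ] || continue      # nothing to read if cpufreq is not enabled
        cpu="${f%/cpufreq/scaling_governor}"
        echo "${cpu##*/}: $(cat "$f")"
    done
}
```

If the function prints nothing, the node most likely has no cpufreq driver loaded, and changing governors through SLURM would have no effect there.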
Configuration
Rather than changing SLURM's configuration file (and data structures) after SLURM version 1.2 was released, we decided to temporarily put the configuration parameters directly in the src/slurmctld/power_save.c file. These parameters will all be moved into the slurm.conf configuration file when SLURM version 1.3 is released. Until that time, please edit the code directly to use this feature. The following configuration parameters are available:
- IdleTime: Nodes become eligible for power saving mode after being idle for this number of seconds. A negative number disables power saving mode. The default value is -1 (disabled).
- SuspendRate: Maximum number of nodes to be placed into power saving mode per minute. A value of zero results in no limits being imposed. The default value is 60. Use this to prevent rapid drops in power requirements.
- ResumeRate: Maximum number of nodes to be removed from power saving mode per minute. A value of zero results in no limits being imposed. The default value is 60. Use this to prevent rapid increases in power requirements.
- SuspendProgram: Program to be executed to place nodes into power saving mode. The program executes as SlurmUser (as configured in slurm.conf). The argument to the program will be the names of nodes to be placed into power saving mode (using SLURM's hostlist expression format).
- ResumeProgram: Program to be executed to remove nodes from power saving mode. The program executes as SlurmUser (as configured in slurm.conf). The argument to the program will be the names of nodes to be removed from power saving mode (using SLURM's hostlist expression format).
- ExcludeSuspendNodes: List of nodes to never place in power saving mode. Use SLURM's hostlist expression format. By default, no nodes are excluded.
- ExcludeSuspendPartitions: List of partitions with nodes to never place in power saving mode. Multiple partitions may be specified using a comma separator. By default, no nodes are excluded.
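To get a feel for SuspendRate and ResumeRate, the arithmetic can be sketched as below. This is a rough estimate only, assuming the rate acts as a strict per-minute cap. The function name minutes_to_transition is hypothetical and does not appear in SLURM.

```shell
#!/bin/bash
# Rough estimate of how many minutes it takes to move node_count nodes into
# or out of power saving mode at rate_per_minute (SuspendRate or ResumeRate).
# A rate of zero means no limit is imposed, so the estimate is 0.
minutes_to_transition() {
    local node_count="$1" rate_per_minute="$2"
    if [ "$rate_per_minute" -eq 0 ]; then
        echo 0
    else
        # Integer ceiling of node_count / rate_per_minute
        echo $(( (node_count + rate_per_minute - 1) / rate_per_minute ))
    fi
}
```

Under that assumption, `minutes_to_transition 1000 60` prints 17, so with the default ResumeRate a 1000 node job on a fully suspended cluster would have its power demand ramp up over roughly 17 minutes rather than all at once.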
Note that SuspendProgram and ResumeProgram execute as SlurmUser. These programs can take advantage of this to execute commands directly on the nodes as user root through the SLURM infrastructure. Example scripts are shown below:
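    #!/bin/bash
    # Example SuspendProgram for cluster where every node has two CPUs.
    # The command is passed to bash -c so the redirection happens on the
    # compute node rather than in the local shell.
    srun --uid=0 --no-allocate --nodelist="$1" bash -c 'echo powersave >/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor'
    srun --uid=0 --no-allocate --nodelist="$1" bash -c 'echo powersave >/sys/devices/system/cpu/cpu1/cpufreq/scaling_governor'

    #!/bin/bash
    # Example ResumeProgram for cluster where every node has two CPUs.
    # The command is passed to bash -c so the redirection happens on the
    # compute node rather than in the local shell.
    srun --uid=0 --no-allocate --nodelist="$1" bash -c 'echo performance >/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor'
    srun --uid=0 --no-allocate --nodelist="$1" bash -c 'echo performance >/sys/devices/system/cpu/cpu1/cpufreq/scaling_governor'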
The srun --no-allocate option permits only SlurmUser and user root to spawn tasks directly on the compute nodes without actually creating a SLURM job. No other users have this permission (their requests will generate an invalid credential error message and the event will be logged). The srun --uid option permits only SlurmUser and user root to execute a job as some other user. No other users have this permission (their requests will generate an invalid user id error message and the event will be logged). When SlurmUser uses the srun --uid option, the srun command will try to set its own user ID to that value in order to fully operate as the specified user. This will fail and srun will report an error to that effect. This does not prevent the spawned programs from running as user root.
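The example scripts hardcode two CPUs per node. A variation that handles any CPU count might look like the sketch below. It assumes the standard Linux sysfs path cpuN/cpufreq/scaling_governor and passes the whole loop to bash -c so that the redirection happens on the compute node. The helper names build_governor_cmd and set_governor are hypothetical, and only the srun options described above are used.

```shell
#!/bin/bash
# Hypothetical helper: build the command string that will run on each node.
# The loop covers every cpuN directory that exposes a cpufreq governor.
build_governor_cmd() {
    local governor="$1"
    printf 'for f in /sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor; do echo %s > "$f"; done' "$governor"
}

# Hypothetical wrapper: run that command as root on the given nodes.
# nodelist is a SLURM hostlist expression, as passed to SuspendProgram
# and ResumeProgram.
set_governor() {
    local governor="$1" nodelist="$2"
    srun --uid=0 --no-allocate --nodelist="$nodelist" \
        bash -c "$(build_governor_cmd "$governor")"
}

# A SuspendProgram could then call:  set_governor powersave "$1"
# A ResumeProgram could then call:   set_governor performance "$1"
```

Keeping the command construction separate from the srun call makes it possible to inspect the remote command before running anything as root.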
The slurmctld daemon will periodically (every 10 minutes) log how many nodes are in power save mode using messages of this sort:
    [May 02 15:31:25] Power save mode 0 nodes
    ...
    [May 02 15:41:26] Power save mode 10 nodes
    ...
    [May 02 15:51:28] Power save mode 22 nodes
Using these logs you can easily see the effect of SLURM's power saving support. You can also configure SLURM without SuspendProgram or ResumeProgram values to assess the potential impact of power saving mode before enabling it.
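To track these counts over time, the messages can be pulled out of the slurmctld log with a small filter. The sketch below assumes the message format shown in the sample above; the function name power_save_counts is hypothetical.

```shell
#!/bin/bash
# Read slurmctld log text on stdin and print "timestamp count" for every
# "Power save mode N nodes" message. Lines in any other format are ignored.
power_save_counts() {
    sed -n 's/^\[\([^]]*\)] Power save mode \([0-9][0-9]*\) nodes.*$/\1 \2/p'
}
```

It would be used as `power_save_counts < slurmctld.log`, where the log file name is a placeholder for whatever location your slurm.conf configures.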
Last modified 9 May 2007