Download
SLURM source can be downloaded from
http://www.schedmd.com/#repos
SLURM has also been packaged for Debian and Ubuntu (both named slurm-llnl).
A SLURM simulator is available to assess various scheduling policies. Under simulation, jobs are not actually executed. Instead, a job execution trace from a real system or a synthetic trace is used.
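To illustrate what replaying a trace means, here is a minimal sketch. It is not the SLURM simulator itself. It assumes a hypothetical trace format (job ID, submit time, node count, run time) and a strict first-come-first-served policy with no backfill; the names `Job` and `simulate_fcfs` are illustrative only.

```python
import heapq
from collections import namedtuple

# Hypothetical trace record; a real trace would carry many more fields.
Job = namedtuple("Job", "job_id submit nodes runtime")


def simulate_fcfs(trace, total_nodes):
    """Return {job_id: start_time} under strict first-come-first-served.

    No job is executed; only the clock and the free-node count are tracked.
    """
    running = []  # min-heap of (end_time, nodes) for simulated running jobs
    free = total_nodes
    clock = 0
    starts = {}
    for job in sorted(trace, key=lambda j: j.submit):
        if job.nodes > total_nodes:
            raise ValueError(f"job {job.job_id} can never fit on this cluster")
        # A job cannot start before it is submitted or before an earlier job.
        clock = max(clock, job.submit)
        # Release the nodes of jobs that have already finished.
        while running and running[0][0] <= clock:
            free += heapq.heappop(running)[1]
        # Advance the clock until enough nodes are free.
        while free < job.nodes:
            end, nodes = heapq.heappop(running)
            clock = max(clock, end)
            free += nodes
        starts[job.job_id] = clock
        free -= job.nodes
        heapq.heappush(running, (clock + job.runtime, job.nodes))
    return starts


trace = [
    Job("a", submit=0, nodes=2, runtime=10),
    Job("b", submit=1, nodes=4, runtime=5),
    Job("c", submit=2, nodes=1, runtime=3),
]
starts = simulate_fcfs(trace, total_nodes=4)
waits = {j.job_id: starts[j.job_id] - j.submit for j in trace}
```

Comparing the `waits` produced by different policy functions over the same trace is the kind of assessment a simulator makes possible.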
Related software available from various sources includes:
- Authentication plugins identify the user originating a message.
- authd
- MUNGE (recommended)
In order to compile the "auth/munge" authentication plugin for SLURM, you will need to build and install MUNGE, available from http://munge.googlecode.com/ and also packaged for Debian, Fedora, and Ubuntu.
- Authentication tools for users that work with SLURM.
- AUKS
AUKS is a utility designed to ease the addition of Kerberos V credential support to non-interactive applications, like batch systems (SLURM, LSF, Torque, etc.). It includes a plugin for the SLURM workload manager. AUKS is not used as an authentication plugin by the SLURM code itself, but provides a mechanism for the application to manage Kerberos V credentials.
- Databases can be used to store accounting information. See our Accounting web page for more information.
- MySQL (recommended)
- PostgreSQL (Not fully functional)
- Debuggers and debugging tools
- TotalView is a GUI-based source code debugger well suited for parallel applications.
- Padb is a job inspection tool for examining and debugging parallel programs. Primarily, it simplifies the process of gathering stack traces, but it also supports a wide range of other functions. It is an open source, non-interactive, command line, scriptable tool intended for use by programmers and system administrators alike.
- Digital signatures (Crypto plugin) are used to ensure messages are not altered.
- MUNGE (recommended)
MUNGE can be used as an alternative to OpenSSL. MUNGE is available under the GNU General Public License. See MUNGE download information above.
- OpenSSL
OpenSSL may be used as an alternative to MUNGE for generation of digital signatures. Download it from http://www.openssl.org/.
- DRMAA (Distributed Resource Management Application API)
PSNC DRMAA for SLURM is an implementation of the Open Grid Forum DRMAA 1.0 (Distributed Resource Management Application API) specification for submission and control of jobs to SLURM. Using DRMAA, grid application builders, portal developers and ISVs can use the same high-level API to link their software with different cluster/resource management systems.
- Hostlist
A Python program used for manipulation of SLURM hostlists including functions such as intersection and difference. Download the code from:
http://www.nsc.liu.se/~kent/python-hostlist
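As a rough illustration of what hostlist manipulation involves, the following is a minimal, self-contained sketch. It is not the python-hostlist code or its API; the function names `expand`, `intersection`, and `difference` are hypothetical, and it handles only simple bracketed numeric ranges, not the IBM Bluegene naming convention.

```python
import itertools
import re


def _expand_range_list(body):
    """Expand one bracket body, e.g. "01-03,7" -> ["01", "02", "03", "7"]."""
    values = []
    for part in body.split(","):
        if "-" in part:
            low, high = part.split("-", 1)
            width = len(low)  # preserve zero padding of the lower bound
            values.extend(str(n).zfill(width) for n in range(int(low), int(high) + 1))
        else:
            values.append(part)
    return values


def _split_top_level(expr):
    """Split on commas that are not inside square brackets."""
    parts, depth, current = [], 0, []
    for ch in expr:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return [p for p in parts if p]


def expand(expr):
    """Expand a hostlist expression into an ordered list of host names."""
    hosts = []
    for item in _split_top_level(expr):
        # With a capture group, re.split alternates literal text and bracket bodies.
        pieces = re.split(r"\[([^\]]*)\]", item)
        choices = [
            _expand_range_list(p) if i % 2 else [p]
            for i, p in enumerate(pieces)
        ]
        hosts.extend("".join(combo) for combo in itertools.product(*choices))
    return hosts


def intersection(a, b):
    """Hosts named by both expressions, in the order they appear in a."""
    in_b = set(expand(b))
    return [h for h in expand(a) if h in in_b]


def difference(a, b):
    """Hosts named by a but not by b, in the order they appear in a."""
    in_b = set(expand(b))
    return [h for h in expand(a) if h not in in_b]
```

For example, `expand("n[1-3],m5")` yields `["n1", "n2", "n3", "m5"]`, and `intersection("n[1-5]", "n[4-8]")` yields `["n4", "n5"]`.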
Lua bindings for hostlist functions are also available here:
https://github.com/grondo/lua-hostlist
NOTE: The Lua hostlist functions do not support the IBM Bluegene naming convention or bracketed numeric ranges anywhere except at the end of the name (e.g. "bgq[0000x1333]" and "rack[0-3]_blade[0-63]" are not supported).
- Interactive Script
A wrapper script that makes it very simple to get an interactive shell on a cluster. Download the code from:
https://github.com/alanorth/hpc_infrastructure_scripts/blob/master/slurm/interactive
- Interconnect plugins (Switch plugin)
- Infiniband
The topology.conf file for an Infiniband switch can be automatically generated using the ib2slurm tool found here: https://github.com/fintler/ib2slurm.
- I/O Watchdog
A facility for monitoring user applications, most notably parallel jobs, for hangs, which typically have the side effect of ceasing all write activity. This facility attempts to monitor all write activity of an application and trigger a set of user-defined actions when write activity has ceased for a configurable period of time. A SPANK plugin is provided for use with SLURM. See the README and man page in the package for more details. Download the latest source from:
http://io-watchdog.googlecode.com/files/io-watchdog-0.6.tar.bz2
- MPI versions supported
- ChaMPIon, MPI Software Technology
- HP-MPI
- Intel-MPI
- LAM/MPI
- MPICH1
- MPICH2
- MPICH-GM
- MPICH-MX
- MVAPICH
- MVAPICH2
- Open MPI
- Quadrics MPI
- PAM Module (pam_slurm)
Pluggable Authentication Module (PAM) for restricting access to compute nodes where SLURM performs workload management. Access to the node is restricted to user root and users who have been allocated resources on that node. NOTE: pam_slurm is included within the SLURM distribution for version 2.1 or higher. For earlier SLURM versions, pam_slurm is available for download from
http://www.schedmd.com/download/extras/pam_slurm-1.6.tar.bz2
SLURM's PAM module has also been packaged for Debian and Ubuntu (both named libpam-slurm).
- Schedulers offering control over the workload
- Catalina, a scheduler supporting the Open Grid Forum Advance Reservation API
- The StarCluster cloud computing toolkit has a SLURM port available.
- Load Sharing Facility (LSF)
- Maui Scheduler
- Moab Cluster Suite
- Command wrappers
There is a wrapper for Maui/Moab's showq command at https://github.com/pedmon/slurm_showq.
- Scripting interfaces
- A Perl interface is included in the SLURM distribution in the contribs/perlapi directory and packaged in the perlapi RPM.
- PySlurm is a Python/Pyrex module to interface with SLURM. There is also a Python module to expand and collect hostlist expressions available at http://www.nsc.liu.se/~kent/python-hostlist/.
- Lua may be used to implement a SLURM process tracking plugin. The Lua script available in contribs/lua/proctrack.lua implements containers using CPUSETs.
- SPANK Plugins
SPANK provides a very generic interface for stackable plug-ins which may be used to dynamically modify the job launch code in SLURM. SPANK plugins may be built without access to SLURM source code. They need only be compiled against SLURM's spank.h header file and added to the SPANK config file plugstack.conf, and they will be loaded at runtime during the next job launch. Thus, the SPANK infrastructure provides administrators and other developers a low cost, low effort ability to dynamically modify the runtime behavior of SLURM job launch. An assortment of SPANK plugins is available from
http://code.google.com/p/slurm-spank-plugins/.
The current source for the plugins can be checked out of the subversion repository with the following command:
svn checkout http://slurm-spank-plugins.googlecode.com/svn/trunk/ slurm-plugins
- Sqlog
A set of scripts that leverages SLURM's job completion logging facility to provide information about what jobs were running at any point in the past as well as what resources they used. Download the code from:
http://sqlog.googlecode.com
- Task Affinity plugins
- Node Health Check
Node Health Check is probably the most comprehensive yet lightweight health check tool available. It integrates with both the Slurm and Torque resource managers.
- Accounting Tools
UBMoD is a web based tool for displaying accounting data from various resource managers. It aggregates the accounting data from sacct into a MySQL data warehouse and provides a front end web interface for browsing the data. For more information, see the UBMoD home page and source code.
- Slurmmon
Slurmmon is a system for gathering and plotting data about Slurm scheduling and job characteristics. It currently simply sends the data to ganglia, but it includes some custom reports and a web page for an organized summary. It collects all the data from sdiag as well as total counts of running and pending jobs in the system and the maximum such values for any single user. It can also submit probe jobs to various partitions in order to trend the times spent pending in them, which is often a good bellwether of scheduling problems.
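The job counts described above can be sketched as follows. This is not Slurmmon's code; the input format (pairs of user name and job state) and the function name `summarize` are assumptions made for illustration.

```python
from collections import Counter


def summarize(jobs):
    """Summarize (user, state) pairs, where state is "RUNNING" or "PENDING".

    Returns the total count per state and the largest count held by any
    single user in each state. Other states are ignored.
    """
    totals = Counter()
    per_user = {"RUNNING": Counter(), "PENDING": Counter()}
    for user, state in jobs:
        if state in per_user:
            totals[state] += 1
            per_user[state][user] += 1
    return {
        "running_total": totals["RUNNING"],
        "pending_total": totals["PENDING"],
        "running_max_single_user": max(per_user["RUNNING"].values(), default=0),
        "pending_max_single_user": max(per_user["PENDING"].values(), default=0),
    }


sample = [
    ("alice", "RUNNING"),
    ("alice", "RUNNING"),
    ("bob", "RUNNING"),
    ("bob", "PENDING"),
    ("carol", "COMPLETED"),
]
summary = summarize(sample)
```

A large gap between the total and the single-user maximum for pending jobs suggests the backlog is spread across users rather than caused by one submitter.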
Slurmmon code
- MSlurm
MSlurm provides a superstructure for managing multiple Slurm environments. With it, several Slurm clusters, even across multiple Slurm databases, can run in parallel on a Slurm master and can be administered in an easy and elegant manner.
- Overview
- Installation Instructions
- Code
Last modified 24 January 2014