SLURM Job Accounting Plugin API

Overview

This document describes SLURM job accounting plugins and the API that defines them. It is intended as a resource for programmers who wish to write their own SLURM job accounting plugins. This is version 1 of the API.

SLURM job accounting plugins must conform to the SLURM Plugin API with the following specifications:

const char plugin_name[]="full text name"

A free-formatted ASCII text string that identifies the plugin.

const char plugin_type[]="major/minor"

The major type must be "jobacct". The minor type can be any suitable name for the type of accounting package. We currently use

The sacct program can be used to display gathered data from regular accounting and from these plugins.

The programmer is urged to study src/plugins/jobacct/linux and src/plugins/jobacct/common for a sample implementation of a SLURM job accounting plugin.
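As a minimal sketch of the two identification symbols described above, the following declares them for an imaginary plugin. The name text, the minor type "example", and the helper has_jobacct_major() are hypothetical and exist only for illustration; a real plugin also has to satisfy the rest of the generic SLURM Plugin API.

```c
#include <string.h>

/* Identification symbols described above. The name text and the minor
 * type "example" are hypothetical placeholders, not a real SLURM plugin. */
const char plugin_name[] = "Example job accounting plugin";
const char plugin_type[] = "jobacct/example";

/* Hypothetical helper (not part of SLURM): returns 1 if the type string
 * has the major type "jobacct" followed by a non-empty minor type. */
int has_jobacct_major(const char *type)
{
    static const char prefix[] = "jobacct/";
    size_t n = sizeof(prefix) - 1;

    return strncmp(type, prefix, n) == 0 && type[n] != '\0';
}
```

The helper only illustrates the "major/minor" convention; how SLURM itself matches plugin types is not shown here.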

API Functions

The job accounting API uses hooks in the slurmctld, slurmd, and slurmstepd.

All of the following functions are required. Functions which are not implemented must be stubbed.
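A stub can be as small as a function that reports success and does nothing else. The sketch below assumes a plugin that does no polling; SLURM_SUCCESS is redefined locally as a stand-in because the SLURM headers are not included.

```c
/* Stand-in for the SLURM return code, which normally comes from the
 * SLURM headers (assumption for this sketch). */
#define SLURM_SUCCESS 0

/* A plugin that does no polling can stub the polling hooks. */
int jobacct_p_startpoll(int frequency)
{
    (void) frequency;   /* unused in a stub */
    return SLURM_SUCCESS;
}

int jobacct_p_endpoll(void)
{
    return SLURM_SUCCESS;
}

void jobacct_p_suspend_poll(void)
{
    /* nothing to suspend */
}
```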

Functions called by all slurmstepd processes

int jobacct_p_startpoll(int frequency)

Description: jobacct_p_startpoll() is called when the slurmstepd starts. It starts a thread that polls for accounting information, which can then be queried at any time until the process ends. Put global initialization here.

Arguments: frequency (input) poll frequency for polling thread.

Returns: SLURM_SUCCESS on success, or SLURM_FAILURE on failure.

int jobacct_p_endpoll()

Description: jobacct_p_endpoll() is called when the process is finished to stop the polling thread.

Arguments: none

Returns: SLURM_SUCCESS on success, or SLURM_FAILURE on failure.

void jobacct_p_suspend_poll()

Description: jobacct_p_suspend_poll() is called when the process is suspended. This causes the polling thread to halt until the process is resumed.

Arguments: none

Returns: none

void jobacct_p_resume_poll()

Description: jobacct_p_resume_poll() is called when the process is resumed. This causes the polling thread to resume operation.

Arguments: none

Returns: none
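The four polling calls above amount to a small state machine around the polling thread. The sketch below models only that state and deliberately leaves out the thread itself; poll_once() is a hypothetical stand-in for one iteration of the thread body, and the return code values are local stand-ins for the SLURM definitions.

```c
#include <stdbool.h>

/* Stand-ins for the SLURM return codes (assumption for this sketch). */
#define SLURM_SUCCESS 0
#define SLURM_FAILURE -1

static bool poll_running = false;     /* set by startpoll, cleared by endpoll */
static bool poll_suspended = false;   /* toggled by suspend/resume */
static int poll_interval = 0;         /* value passed as "frequency" */
static unsigned long poll_count = 0;  /* number of samples taken so far */

int jobacct_p_startpoll(int frequency)
{
    if (poll_running)
        return SLURM_FAILURE;
    poll_interval = frequency;
    poll_suspended = false;
    poll_count = 0;
    poll_running = true;
    /* A real plugin would start its polling thread here. */
    return SLURM_SUCCESS;
}

int jobacct_p_endpoll(void)
{
    if (!poll_running)
        return SLURM_FAILURE;
    poll_running = false;
    /* A real plugin would stop its polling thread here. */
    return SLURM_SUCCESS;
}

void jobacct_p_suspend_poll(void)
{
    poll_suspended = true;
}

void jobacct_p_resume_poll(void)
{
    poll_suspended = false;
}

/* Hypothetical body of one polling iteration; returns true if a sample
 * was taken, false if polling is stopped or suspended. */
bool poll_once(void)
{
    if (!poll_running || poll_suspended)
        return false;
    poll_count++;
    return true;
}
```

In a threaded plugin the shared flags would need to be protected, for example with a mutex; that is omitted here to keep the state transitions visible.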

int jobacct_p_add_task(pid_t pid, uint16_t tid)

Description: jobacct_p_add_task() is used to add a task to the poller.

Arguments:
pid (input) process id.
tid (input) slurm global task id.

Returns: SLURM_SUCCESS on success, or SLURM_FAILURE on failure.

jobacctinfo_t *jobacct_p_stat_task(pid_t pid)

Description: jobacct_p_stat_task() is used to get the most recent information about a task. You need to free the information returned by this function!

Arguments: pid (input) Process id

Returns: jobacctinfo structure pointer on success, or NULL on failure.

jobacctinfo_t *jobacct_p_remove_task(pid_t pid)

Description: jobacct_p_remove_task() is used to remove a task from the poller. You need to free the information returned by this function!

Arguments: pid (input) Process id

Returns: Pointer to removed jobacctinfo_t structure on success, or NULL on failure.
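The ownership rules for the three task functions can be sketched with a simple linked list. This is an illustration under stated assumptions, not the layout used by the SLURM source: sketch_jobacct_t is a hypothetical stand-in for jobacctinfo_t, its fields are invented, and plain free() stands in for whatever release routine a real plugin uses.

```c
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>

/* Stand-ins for the SLURM return codes (assumption for this sketch). */
#define SLURM_SUCCESS 0
#define SLURM_FAILURE -1

/* Hypothetical stand-in for jobacctinfo_t. */
typedef struct sketch_jobacct {
    pid_t pid;
    uint16_t tid;
    uint32_t max_rss;               /* illustrative field a poller might update */
    struct sketch_jobacct *next;
} sketch_jobacct_t;

static sketch_jobacct_t *task_list = NULL;

int jobacct_p_add_task(pid_t pid, uint16_t tid)
{
    sketch_jobacct_t *t;

    if (pid <= 0)
        return SLURM_FAILURE;
    t = calloc(1, sizeof(*t));
    if (!t)
        return SLURM_FAILURE;
    t->pid = pid;
    t->tid = tid;
    t->next = task_list;
    task_list = t;
    return SLURM_SUCCESS;
}

/* Returns a copy, so the caller can free it without touching the list. */
sketch_jobacct_t *jobacct_p_stat_task(pid_t pid)
{
    sketch_jobacct_t *t, *copy;

    for (t = task_list; t; t = t->next) {
        if (t->pid == pid) {
            copy = malloc(sizeof(*copy));
            if (copy) {
                *copy = *t;
                copy->next = NULL;
            }
            return copy;
        }
    }
    return NULL;
}

/* Unlinks the entry and hands ownership of it to the caller. */
sketch_jobacct_t *jobacct_p_remove_task(pid_t pid)
{
    sketch_jobacct_t **pp, *t;

    for (pp = &task_list; (t = *pp) != NULL; pp = &t->next) {
        if (t->pid == pid) {
            *pp = t->next;
            t->next = NULL;
            return t;
        }
    }
    return NULL;
}
```

In both the stat and remove cases the caller ends up owning the returned structure, which is why the descriptions above insist that it be freed.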

Functions called by the slurmctld process

int jobacct_p_init_slurmctld(char *job_acct_log)

Description: jobacct_p_init_slurmctld() is called when the slurmctld starts. It opens the logfile to be written to. Put global initialization here.

Arguments: job_acct_log (input) logfile name.

Returns: SLURM_SUCCESS on success, or SLURM_FAILURE on failure.

int jobacct_p_fini_slurmctld()

Description: jobacct_p_fini_slurmctld() is called when the slurmctld ends. It closes the logfile.

Arguments: none

Returns: SLURM_SUCCESS on success, or SLURM_FAILURE on failure.

int jobacct_p_job_start_slurmctld(struct job_record *job_ptr)

Description: jobacct_p_job_start_slurmctld() is called when a new job is allocated in the slurmctld. It prints out beginning information about the job.

Arguments: job_ptr (input) information about the job in slurmctld.

Returns: SLURM_SUCCESS on success, or SLURM_FAILURE on failure.

int jobacct_p_job_complete_slurmctld(struct job_record *job_ptr)

Description: jobacct_p_job_complete_slurmctld() is called when a job ends in the slurmctld. It prints out ending information about the job.

Arguments: job_ptr (input) information about the job in slurmctld.

Returns: SLURM_SUCCESS on success, or SLURM_FAILURE on failure.

int jobacct_p_step_start_slurmctld(struct step_record *step_ptr)

Description: jobacct_p_step_start_slurmctld() is called when a new step is allocated in the slurmctld. It prints out beginning information about the step.

Arguments: step_ptr (input) information about the step in slurmctld.

Returns: SLURM_SUCCESS on success, or SLURM_FAILURE on failure.

int jobacct_p_step_complete_slurmctld(struct step_record *step_ptr)

Description: jobacct_p_step_complete_slurmctld() is called when a step ends in the slurmctld. It prints out ending information about the step.

Arguments: step_ptr (input) information about the step in slurmctld.

Returns: SLURM_SUCCESS on success, or SLURM_FAILURE on failure.

int jobacct_p_suspend_slurmctld(struct job_record *job_ptr)

Description: jobacct_p_suspend_slurmctld() is called when a job is suspended or resumed in the slurmctld. It prints out information about the suspension of the job to the logfile.

Arguments: job_ptr (input) information about the job in slurmctld.

Returns: SLURM_SUCCESS on success, or SLURM_FAILURE on failure.
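The slurmctld hooks above share one logfile that is opened at init and closed at fini. The sketch below shows that life cycle for the job start and job complete calls only. The record format ("job id, event name") is invented for illustration, struct sketch_job_record is a hypothetical stand-in for struct job_record, and the return codes are local stand-ins.

```c
#include <stdint.h>
#include <stdio.h>

/* Stand-ins for the SLURM return codes (assumption for this sketch). */
#define SLURM_SUCCESS 0
#define SLURM_FAILURE -1

/* Hypothetical stand-in for struct job_record. */
struct sketch_job_record {
    uint32_t job_id;
};

static FILE *log_fp = NULL;

int jobacct_p_init_slurmctld(char *job_acct_log)
{
    if (!job_acct_log || log_fp)
        return SLURM_FAILURE;
    log_fp = fopen(job_acct_log, "a");
    return log_fp ? SLURM_SUCCESS : SLURM_FAILURE;
}

int jobacct_p_fini_slurmctld(void)
{
    int rc;

    if (!log_fp)
        return SLURM_FAILURE;
    rc = fclose(log_fp);
    log_fp = NULL;
    return rc == 0 ? SLURM_SUCCESS : SLURM_FAILURE;
}

/* Writes one record; the "<job id> <event>" format is invented here. */
static int write_record(const char *event, struct sketch_job_record *job_ptr)
{
    if (!log_fp || !job_ptr)
        return SLURM_FAILURE;
    if (fprintf(log_fp, "%lu %s\n",
                (unsigned long) job_ptr->job_id, event) < 0)
        return SLURM_FAILURE;
    return fflush(log_fp) == 0 ? SLURM_SUCCESS : SLURM_FAILURE;
}

int jobacct_p_job_start_slurmctld(struct sketch_job_record *job_ptr)
{
    return write_record("JOB_START", job_ptr);
}

int jobacct_p_job_complete_slurmctld(struct sketch_job_record *job_ptr)
{
    return write_record("JOB_COMPLETE", job_ptr);
}
```

The step and suspend hooks would follow the same pattern with their own record types.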

Functions common to all processes

int jobacct_p_init_struct(jobacctinfo_t *jobacct, uint16_t tid)

Description: jobacct_p_init_struct() is called to set the values of a jobacctinfo_t to initial values.

Arguments:
jobacct (input/output) structure to be altered.
tid (input) id of the task; send in (uint16_t)NO_VAL if there is no specific task.

Returns: SLURM_SUCCESS on success, or SLURM_FAILURE on failure.

jobacctinfo_t *jobacct_p_alloc(uint16_t tid)

Description: jobacct_p_alloc() is used to allocate and initialize a new jobacctinfo structure. You will need to free the information returned by this function!

Arguments: tid (input) id of the task; send in (uint16_t)NO_VAL if there is no specific task.

Returns: jobacctinfo structure pointer on success, or NULL on failure.

void jobacct_p_free(jobacctinfo_t *jobacct)

Description: jobacct_p_free() is used to free the allocation made by jobacct_p_alloc().

Arguments: jobacct (input) structure to be freed.

Returns: none
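A minimal sketch of how the init, alloc, and free calls relate to one another follows. The structure sketch_jobacct_t and its fields are hypothetical stand-ins for jobacctinfo_t, and NO_VAL and the return codes are defined locally as stand-ins for the SLURM definitions.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for SLURM definitions (assumptions for this sketch). */
#define SLURM_SUCCESS 0
#define SLURM_FAILURE -1
#define NO_VAL 0xfffffffe

/* Hypothetical stand-in for jobacctinfo_t. */
typedef struct {
    uint16_t tid;       /* task id, or (uint16_t)NO_VAL for no specific task */
    uint32_t max_rss;   /* illustrative counters */
    uint32_t tot_cpu;
} sketch_jobacct_t;

int jobacct_p_init_struct(sketch_jobacct_t *jobacct, uint16_t tid)
{
    if (!jobacct)
        return SLURM_FAILURE;
    memset(jobacct, 0, sizeof(*jobacct));
    jobacct->tid = tid;
    return SLURM_SUCCESS;
}

/* Allocates a structure and sets it to initial values. */
sketch_jobacct_t *jobacct_p_alloc(uint16_t tid)
{
    sketch_jobacct_t *jobacct = malloc(sizeof(*jobacct));

    if (!jobacct)
        return NULL;
    jobacct_p_init_struct(jobacct, tid);
    return jobacct;
}

void jobacct_p_free(sketch_jobacct_t *jobacct)
{
    free(jobacct);      /* free(NULL) is a no-op */
}
```

Having alloc delegate to init keeps a single definition of what "initial values" means.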

int jobacct_p_setinfo(jobacctinfo_t *jobacct, enum jobacct_data_type type, void *data)

Description: jobacct_p_setinfo() is called to set the values of a jobacctinfo_t to specific values based on inputs.

Arguments:
jobacct (input/output) structure to be altered.
type (input) enum of specific part of jobacct to alter.
data (input) corresponding data to set jobacct part to.

Returns: SLURM_SUCCESS on success, or SLURM_FAILURE on failure.

int jobacct_p_getinfo(jobacctinfo_t *jobacct, enum jobacct_data_type type, void *data)

Description: jobacct_p_getinfo() is called to get specific values from a jobacctinfo_t based on inputs.

Arguments:
jobacct (input) structure to be queried.
type (input) enum of specific part of jobacct to get.
data (output) corresponding data from jobacct part.

Returns: SLURM_SUCCESS on success, or SLURM_FAILURE on failure.
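The set and get calls both dispatch on the type argument and interpret the untyped data pointer accordingly. The sketch below shows that pattern with two invented selectors. The enum values, the structure, and its fields are hypothetical; the real enum jobacct_data_type and jobacctinfo_t are defined in the SLURM source.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the SLURM return codes (assumption for this sketch). */
#define SLURM_SUCCESS 0
#define SLURM_FAILURE -1

/* Hypothetical selectors standing in for enum jobacct_data_type. */
enum sketch_data_type {
    SKETCH_DATA_MAX_RSS,
    SKETCH_DATA_TOT_CPU
};

/* Hypothetical stand-in for jobacctinfo_t. */
typedef struct {
    uint32_t max_rss;
    uint32_t tot_cpu;
} sketch_jobacct_t;

int jobacct_p_setinfo(sketch_jobacct_t *jobacct,
                      enum sketch_data_type type, void *data)
{
    uint32_t *value = data;     /* both selectors carry a uint32_t here */

    if (!jobacct || !data)
        return SLURM_FAILURE;
    switch (type) {
    case SKETCH_DATA_MAX_RSS:
        jobacct->max_rss = *value;
        break;
    case SKETCH_DATA_TOT_CPU:
        jobacct->tot_cpu = *value;
        break;
    default:
        return SLURM_FAILURE;   /* unknown selector */
    }
    return SLURM_SUCCESS;
}

int jobacct_p_getinfo(sketch_jobacct_t *jobacct,
                      enum sketch_data_type type, void *data)
{
    uint32_t *value = data;

    if (!jobacct || !data)
        return SLURM_FAILURE;
    switch (type) {
    case SKETCH_DATA_MAX_RSS:
        *value = jobacct->max_rss;
        break;
    case SKETCH_DATA_TOT_CPU:
        *value = jobacct->tot_cpu;
        break;
    default:
        return SLURM_FAILURE;
    }
    return SLURM_SUCCESS;
}
```

Because data is a void pointer, caller and plugin must agree on the pointed-to type for each selector; nothing in the signature enforces it.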

void jobacct_p_aggregate(jobacctinfo_t *dest, jobacctinfo_t *from)

Description: jobacct_p_aggregate() is called to aggregate and get max values from two different jobacctinfo structures.

Arguments:
dest (input/output) initial structure to be applied to.
from (input) new info to apply to dest.

Returns: none
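One plausible reading of "aggregate and get max values" is that running totals are summed while maxima keep the larger of the two values, along with the task that produced it. The sketch below implements that reading; the structure and its field names are hypothetical, and the actual fields combined by a real plugin may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for jobacctinfo_t. */
typedef struct {
    uint32_t max_rss;       /* largest resident set size seen */
    uint16_t max_rss_tid;   /* task that produced max_rss */
    uint32_t tot_rss;       /* running total, summed on aggregation */
    uint32_t tot_cpu;       /* running total, summed on aggregation */
} sketch_jobacct_t;

void jobacct_p_aggregate(sketch_jobacct_t *dest, sketch_jobacct_t *from)
{
    if (!dest || !from)
        return;

    /* Maxima: keep the larger value and remember where it came from. */
    if (dest->max_rss < from->max_rss) {
        dest->max_rss = from->max_rss;
        dest->max_rss_tid = from->max_rss_tid;
    }

    /* Totals: accumulate. */
    dest->tot_rss += from->tot_rss;
    dest->tot_cpu += from->tot_cpu;
}
```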

void jobacct_p_2_sacct(sacct_t *sacct, jobacctinfo_t *jobacct)

Description: jobacct_p_2_sacct() is called to transfer information from data structure jobacct to structure sacct.

Arguments:
sacct (input/output) initial structure to be applied to.
jobacct (input) jobacctinfo_t structure containing information to apply to sacct.

Returns: none

void jobacct_p_pack(jobacctinfo_t *jobacct, Buf buffer)

Description: jobacct_p_pack() packs a jobacctinfo_t into a buffer to send across the network.

Arguments:
jobacct (input) structure to pack.
buffer (input/output) buffer to pack structure into.

Returns: none

int jobacct_p_unpack(jobacctinfo_t **jobacct, Buf buffer)

Description: jobacct_p_unpack() unpacks a jobacctinfo_t from a buffer received from the network. You will need to free the jobacctinfo_t returned by this function!

Arguments:
jobacct (input/output) structure to fill.
buffer (input) buffer to unpack structure from.

Returns: SLURM_SUCCESS on success, or SLURM_FAILURE on failure.
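A round trip through pack and unpack can be sketched with a fixed-size byte buffer and big-endian encoding. Everything here is a stand-in: sketch_buf_t replaces the SLURM Buf type, put32() and get32() replace the SLURM pack routines, and the structure fields are invented. Because the text says the caller frees the unpacked structure and a status is returned, the sketch's unpack allocates the structure and hands it back through a pointer-to-pointer; that shape is an assumption.

```c
#include <stdint.h>
#include <stdlib.h>

/* Stand-ins for the SLURM return codes (assumption for this sketch). */
#define SLURM_SUCCESS 0
#define SLURM_FAILURE -1

/* Hypothetical stand-ins for jobacctinfo_t and Buf. */
typedef struct {
    uint32_t max_rss;
    uint32_t tot_cpu;
} sketch_jobacct_t;

typedef struct {
    unsigned char data[64];
    size_t len;     /* bytes written so far */
    size_t pos;     /* read offset */
} sketch_buf_t;

/* Appends a 32-bit value in network (big-endian) byte order. */
static int put32(sketch_buf_t *b, uint32_t v)
{
    if (b->len + 4 > sizeof(b->data))
        return SLURM_FAILURE;
    b->data[b->len++] = (unsigned char) (v >> 24);
    b->data[b->len++] = (unsigned char) (v >> 16);
    b->data[b->len++] = (unsigned char) (v >> 8);
    b->data[b->len++] = (unsigned char) v;
    return SLURM_SUCCESS;
}

/* Reads a 32-bit big-endian value and advances the read offset. */
static int get32(sketch_buf_t *b, uint32_t *v)
{
    if (b->pos + 4 > b->len)
        return SLURM_FAILURE;
    *v = ((uint32_t) b->data[b->pos] << 24) |
         ((uint32_t) b->data[b->pos + 1] << 16) |
         ((uint32_t) b->data[b->pos + 2] << 8) |
         (uint32_t) b->data[b->pos + 3];
    b->pos += 4;
    return SLURM_SUCCESS;
}

/* Returns nothing, so a full buffer is silently ignored in this sketch. */
void jobacct_p_pack(sketch_jobacct_t *jobacct, sketch_buf_t *buffer)
{
    (void) put32(buffer, jobacct->max_rss);
    (void) put32(buffer, jobacct->tot_cpu);
}

/* Allocates *jobacct; the caller frees it after a successful call. */
int jobacct_p_unpack(sketch_jobacct_t **jobacct, sketch_buf_t *buffer)
{
    sketch_jobacct_t *j = malloc(sizeof(*j));

    *jobacct = NULL;
    if (!j)
        return SLURM_FAILURE;
    if (get32(buffer, &j->max_rss) != SLURM_SUCCESS ||
        get32(buffer, &j->tot_cpu) != SLURM_SUCCESS) {
        free(j);
        return SLURM_FAILURE;
    }
    *jobacct = j;
    return SLURM_SUCCESS;
}
```

Fields must be unpacked in exactly the order they were packed, since the buffer carries no field tags.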

Parameters

Rather than proliferate slurm.conf parameters for new or evolved plugins, the job accounting API relies on three parameters:

JobAcctType
Specifies which plugin should be used.
JobAcctFrequency
Tells the plugin how long to wait between polls.
JobAcctLogFile
Tells the plugin the name of the logfile to use.
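
As an illustration, a slurm.conf fragment setting these three parameters might look like the following. The values are assumptions chosen for the example: the plugin type is inferred from the sample implementation directory src/plugins/jobacct/linux, the frequency is assumed to be in seconds, and the logfile path is hypothetical.

```
# slurm.conf fragment (illustrative values only)
JobAcctType=jobacct/linux
JobAcctFrequency=30
JobAcctLogFile=/var/log/slurm_jobacct.log
```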

Versioning

This document describes version 1 of the SLURM Job Accounting API. Future releases of SLURM may revise this API. A job accounting plugin conveys its ability to implement a particular API version using the mechanism outlined for SLURM plugins.

Last modified 31 January 2007

Lawrence Livermore National Laboratory
7000 East Avenue • Livermore, CA 94550
Operated by Lawrence Livermore National Security, LLC, for the Department of Energy's
National Nuclear Security Administration