Slurm Job Scheduling#

Learning Objectives#

By the end of this section, learners will be able to:

  • Explain the purpose of job scheduling in HPC systems and why Slurm is used.

  • Create, edit, and submit Slurm job scripts that specify resources and commands for execution.

  • Use key Slurm commands to manage jobs, including sbatch, squeue, scancel, and srun.

  • Interpret job status outputs from squeue and understand job states such as running, pending, and cancelled.

  • Configure resource requests in job scripts, including CPUs, GPUs, memory, partitions, and time limits.

  • Cancel jobs individually or in bulk, and verify job cancellations using Slurm commands.

  • Implement job dependencies with --dependency to control execution order in multi-step workflows.

  • Differentiate between batch jobs (sbatch) and interactive jobs (srun) and identify when each is appropriate.

  • Apply practical strategies for working efficiently with Slurm, including setting output files, realistic resource requests, and monitoring running jobs.

  • Collect and interpret system information from Slurm job outputs to better understand available compute and GPU resources.

Overview#

When working on a multi-user HPC system (like a cluster or supercomputer), you typically don’t run big GPU jobs directly on the login node. Instead, you use a scheduler to queue up your work. Slurm (Simple Linux Utility for Resource Management) is a widely used job scheduler for HPC environments. It allocates compute resources (CPUs, GPUs, memory) to user jobs and queues those jobs to run as resources become available, taking other users’ workloads into account.

This section will introduce Slurm for beginners, covering how to submit and manage jobs using commands like sbatch, squeue, and scancel, and how to set up job dependencies so jobs run in a certain order.

What is a Slurm job?#

A job in Slurm is a unit of work, usually defined by a job script. The job script is a bash script (or another shell) that specifies resources needed (via special #SBATCH directives) and the commands to execute. When you submit this script, Slurm will find an available compute node (or nodes) that meet your requirements (CPU cores, GPUs, time, etc.) and run the script there, not on the login machine.

Creating a Simple Job Script#

Here’s a fundamental example of a Slurm job script, which we could call myjob.slurm:

#!/bin/bash
#SBATCH --job-name=testjob          # Name of the job
#SBATCH --output=job_%j.out         # Output file (%j will be replaced with job ID)
#SBATCH --error=job_%j.err          # Error file (a separate file for stderr, optional)
#SBATCH --time=0-00:05              # Wall time (DD-HH:MM) here 5 minutes
#SBATCH --partition=gpu             # Partition/queue name, e.g., 'gpu' or as configured
#SBATCH --gres=gpu:1                # Request 1 GPU (generic resource)
#SBATCH --cpus-per-task=4           # Request 4 CPU cores
#SBATCH --mem=16G                   # Request 16 GB of RAM

echo "Hello from job $SLURM_JOB_ID running on $SLURM_NODELIST"
sleep 60  # Simulate work by sleeping for 60 seconds

The script uses #SBATCH lines to request resources:

  • A job name for easy identification

  • Output/error file names

  • A time limit of 5 minutes

  • The partition (queue) to run on (often clusters have a special partition for GPU jobs)

  • 1 GPU (--gres=gpu:1 means one GPU)

  • 4 CPU cores and 16 GB of memory

The body of the script prints a message and sleeps for 60 seconds (as a placeholder for real work). $SLURM_JOB_ID and $SLURM_NODELIST are environment variables that Slurm sets for your job.
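Slurm sets a number of other useful environment variables inside a job as well. A minimal sketch of a job body that records a few of them (these variable names are standard Slurm variables, but the exact set available depends on your Slurm version and on what you requested):

echo "Job ID:        $SLURM_JOB_ID"
echo "Node list:     $SLURM_JOB_NODELIST"   # newer name for the same information as $SLURM_NODELIST
echo "CPUs per task: $SLURM_CPUS_PER_TASK"  # only set if --cpus-per-task was requested
echo "Submit dir:    $SLURM_SUBMIT_DIR"     # directory sbatch was run from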

Submitting Jobs with sbatch#

To submit the above job script to Slurm, use the sbatch command:

$ sbatch myjob.slurm
Submitted batch job 123456

Slurm will respond with a job ID (in this example, 123456). At this point, your job is in the queue. It might start immediately if resources are free or wait in line if the cluster is busy.

Key points about sbatch: it queues the job and then returns you to your shell prompt. The job runs asynchronously in the background on the cluster. The sbatch command is non-interactive; it just submits the job. You won’t see the job output live on your screen; it goes to the files defined by --output/--error in the script.
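For example, you can follow a job’s output file as it is written (the file name below assumes the --output pattern from the script above and a job ID of 123456):

$ tail -f job_123456.out    # press Ctrl+C to stop following; the job keeps running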

Checking Job Status with squeue#

Once a job is submitted, you’ll want to check its status (queued, running, finished). Use squeue to view the job queue:

$ squeue -u your_username

This shows all of your jobs. Running squeue with no options shows everyone’s jobs, which can be a long list on busy systems. Typical squeue output columns include:

  • JOBID: the job ID (e.g. 123456).

  • PARTITION: which partition/queue the job is in.

  • NAME: the job name.

  • USER: who submitted it.

  • ST: state (R = running, PD = pending/waiting, CG = completing, CD = completed, CA = cancelled, F = failed, etc.).

  • TIME: how long it has been running (or pending).

  • NODES: number of nodes allocated.

  • NODELIST(REASON): which node(s) it is running on, or the reason it is pending (e.g. Resources, Priority).

For example, if your job is waiting, you might see PD and REASON might be “Resources”, meaning it is waiting for resources to free up.

Note

You can also filter by job ID (squeue -j 123456) or other criteria. Slurm has many options, but checking by username is simplest to see all your jobs.
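If the default columns are not enough, squeue also accepts a custom output format. A sketch using standard squeue format specifiers (the column widths are just an example):

$ squeue -u your_username -o "%.10i %.9P %.20j %.2t %.10M %R"

Here %i is the job ID, %P the partition, %j the job name, %t the compact state, %M the elapsed time, and %R the node list or pending reason.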

Canceling a job with scancel#

If you need to stop a job (maybe you realised there’s a bug or it’s taking too long), you can cancel it:

$ scancel 123456

Replace 123456 with the job ID you want to cancel. This will terminate the job if it’s running or remove it from the queue if it hasn’t started yet. After cancelling, use squeue to verify it’s gone or see it marked as CA (cancelled).

You can cancel all your jobs with scancel -u your_username. You can also cancel an entire job array or a range of jobs if needed.
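A couple of examples of more selective cancellation (both are standard scancel options, though it is worth checking scancel --help on your system):

$ scancel --name=testjob -u your_username       # cancel your jobs with a given job name
$ scancel --state=PENDING -u your_username      # cancel only your jobs still waiting in the queue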

Job Dependencies: Ordering Jobs#

Slurm allows you to chain jobs so that one doesn’t start until the other is complete (and, optionally, only if it succeeds). This is done with the --dependency option of sbatch.

Use case: Suppose you have two jobs, and Job2 should run only after Job1 finishes successfully. It could be the case that Job1 generates data that Job2 will process. You can submit Job1 normally, then submit Job2 with a dependency on Job1.

Submit the first job and note its job ID:

$ sbatch job1.slurm
Submitted batch job 111111

Submit the second job with dependency:

$ sbatch --dependency=afterok:111111 job2.slurm
Submitted batch job 111112

The --dependency=afterok:111111 option means “run this job only after job 111111 has finished OK (exit code 0).” In other words, job2 will wait until job1 completes successfully. If job1 fails (non-zero exit), job2 will not run (it will be cancelled due to the dependency failure).

There are a number of other dependency types, including:

  • afterany:<jobid>: run after job finishes regardless of success or failure.

  • after:<jobid>: run after job starts (not commonly used; afterok is more typical).

  • singleton: ensure only one job with the same name/user runs at a time (to avoid duplicates).

You can also chain multiple job IDs like --dependency=afterok:ID1:ID2 (the job runs after both ID1 and ID2 succeed).

Dependencies are powerful for building task pipelines. For instance, you could have a preprocessing job, then a training job, then a post-processing job, each submitted with appropriate --dependency so they execute in sequence without manual intervention.
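As a sketch of such a pipeline, you can capture each job ID with sbatch --parsable (a standard sbatch flag that prints just the job ID) and feed it into the next submission. The script names here are placeholders:

#!/bin/bash
# Submit a three-step pipeline in which each stage waits for the previous one to succeed
pre_id=$(sbatch --parsable preprocess.slurm)
train_id=$(sbatch --parsable --dependency=afterok:${pre_id} train.slurm)
post_id=$(sbatch --parsable --dependency=afterok:${train_id} postprocess.slurm)
echo "Submitted pipeline: ${pre_id} -> ${train_id} -> ${post_id}"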

Interactive Jobs (srun)#

While sbatch is for batch submission, Slurm also has srun for running tasks interactively (especially useful for debugging or running short tests on compute nodes). For example, srun --pty bash will give you an interactive shell on a compute node. This course focuses on batch jobs for GPU tasks, but keep in mind that srun exists for interactive use.
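For example, a short GPU debugging session might be requested like this (the partition name and limits are cluster-specific assumptions):

$ srun --partition=gpu --gres=gpu:1 --cpus-per-task=4 --mem=16G --time=00:30:00 --pty bash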

Practical Tips for Slurm#

  • Default Behaviour: If you don’t specify an output file, Slurm, by default, writes output to a file like slurm-<jobid>.out. It’s better to set --output to something meaningful.

  • Resource Requests: Always request resources (time, memory, GPUs) realistically. If you ask for too little, your job might be killed for exceeding memory or time. If you ask for too much, you could wait longer in the queue.

  • Partition/Queues: Clusters often have multiple partitions (e.g. GPU, CPU, long, debug). Make sure to use an appropriate one, as each has limits (a debug queue might only allow short 30-minute jobs, for example). You can list the available partitions with sinfo (see the sketch after this list).

  • Monitoring: You can monitor usage of a running job with commands like sstat (for stats), or by logging into the node (if interactive) and using tools like nvidia-smi to see GPU usage; a short example of sstat follows this list.
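A short sketch of these two tips in practice (sinfo and sstat are standard Slurm commands; the sstat field names are common ones but can vary with your Slurm configuration):

$ sinfo                                                # list partitions, their time limits and node states
$ sstat -j 123456.batch --format=JobID,AveCPU,MaxRSS   # CPU time and peak memory of a running job's batch step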

Summary: Key Slurm Commands#

  • sbatch <script>: Submit a job script to the queue.

  • squeue: Check job status in the queue (use -u <user> to filter by your username).

  • scancel <jobid>: Cancel a job.

  • Dependencies: Use sbatch --dependency=afterok:<jobid> to chain jobs.

  • Interactive: srun for interactive jobs (not covered in depth here, but useful to know).

Details of all of the #SBATCH directives can be found in the sbatch documentation.
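For reference, a few other directives you will often see in job scripts (all are standard sbatch options; the account name and email address below are placeholders):

#SBATCH --nodes=1                    # Number of nodes
#SBATCH --ntasks=1                   # Number of tasks (processes) to launch
#SBATCH --account=myproject          # Account/allocation to charge (cluster-specific)
#SBATCH --mail-type=END,FAIL         # Email notification when the job ends or fails
#SBATCH --mail-user=you@example.com  # Where to send the notification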

Exercise#

Exercise: Collect and Interpret System Information via SLURM#

In this exercise, you’ll run the provided SLURM submission script to gather detailed information about your compute node.

Prepare the SLURM Script#

Within the directory slurm_submissions_scripts, you will find a .slurm script called system_info.slurm. Open it and update the cluster-specific directives (e.g. partition, account, time limit) so that it will run in your HPC environment.
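The exact contents of system_info.slurm are provided in the course materials, but as a rough illustration (not the provided script), such a script typically wraps standard inspection commands in a job like this:

#!/bin/bash
#SBATCH --job-name=system_info
#SBATCH --output=%j.out        # matches the <JOBID>.out naming used below
#SBATCH --time=0-00:05
#SBATCH --partition=gpu        # adjust to your cluster
#SBATCH --gres=gpu:1

lscpu          # CPU model, core and thread counts
free -h        # total and available RAM
nvidia-smi     # GPU model, driver version and GPU memory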

Submit the Job#

Execute the SLURM script to collect system info:

sbatch system_info.slurm

This assumes you are in the slurm_submissions_scripts directory.

Inspect the Results#

Open the SLURM output file, which will be named something like <JOBID>.out. Based on the output, can you determine:

  • Compute capacity: How many CPU cores and how much RAM are available?

  • GPU capabilities: What GPU(s) are present, and what are their key specs (CUDA cores, memory)? A command for cross-checking the GPU details is sketched below.
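If you want to cross-check the GPU details yourself from a job or an interactive session, nvidia-smi can report the name and memory directly (CUDA core counts are not reported by nvidia-smi and usually have to be looked up from the GPU model):

$ nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv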

Quiz#