Submitting a job
================

For those familiar with GridEngine, the Slurm documentation provides a `Rosetta Stone for schedulers `_ to ease the transition.

Slurm commands
--------------

:term:`Slurm` allows requesting resources and submitting jobs in a variety of ways. The main Slurm commands to submit jobs are:

* srun

  * Requests resources and **runs a command** on the allocated compute node(s)
  * **Blocking**: will not return until the command ends

* sbatch

  * Requests resources and **runs a script** on the allocated compute node(s)
  * **Asynchronous**: will return as soon as the job is submitted

.. TIP::
    **Slurm Basics**

    .. _slurm_basics:

    * **Job**

      A Job is an allocation of resources (CPUs, RAM, time, etc.) reserved for the execution of a specific process:

      * The allocation is defined in the submission script as the number of Tasks (``--ntasks``) multiplied by the number of CPUs per Task (``--cpus-per-task``) and corresponds to the maximum resources that can be used in parallel,
      * The submission script, via ``sbatch``, creates one or more Job Steps and manages the distribution of Tasks on Compute Nodes.

    * **Tasks**

      A Task is a process to which the resources defined in the script (via the ``--cpus-per-task``, ``--mem`` and ``--mem-per-cpu`` options) are allocated. A Task can use these resources like any other process (creating threads, or sub-processes that may themselves be multi-threaded). This is the Job's resource allocation unit. CPUs not used by a Task are **lost**, not usable by any other Task or Step. If the Task creates more processes/threads than allocated CPUs, these threads share the allocation.

    * **Job Steps**

      A Job Step represents a stage, or section, of the processing performed by the Job. It executes one or more Tasks. This division into Job Steps offers great flexibility in the organization of the steps in the Job and in the management, and analysis, of the allocated resources:

      * Steps can be executed sequentially or in parallel,
      * one Step can initiate one or more Tasks, executed sequentially or in parallel,
      * Steps are tracked by the ``sstat``/``sacct`` commands, allowing both Step-by-Step progress tracking of a Job during its execution, and detailed resource usage statistics for each Step (during and after execution).

      Using ``srun`` for a single task, inside a submission script, is not mandatory.

    * **Partition**

      A Partition is a logical grouping of Compute Nodes. This grouping makes it possible to specialize and optimize each partition for a particular type of job. See :doc:`computing_resources` and :doc:`partitions_overview` for more details.

.. _job_script:

Job script
----------

To run a job on the system you need to create a ``submission script`` (or job script, or batch script). This script is a regular shell script (bash) with some directives specifying the number of CPUs, memory, etc., that will be interpreted by the scheduling system upon submission.

A very simple example:

.. code-block:: bash

    #!/bin/bash
    #
    #SBATCH --job-name=test

    hostname -s
    sleep 60s

Writing submission scripts can be tricky; see :doc:`batch_scripts` for more details. See also our `repository of examples scripts `_.

First job
---------

Submit your job script with:

.. code-block:: bash

    $ sbatch myfirstjob.sh
    Submitted batch job 623

:term:`Slurm` will return a ``$JOBID`` if the job is accepted, else an error message. Without any output option, the job's output will default to ``slurm-$JOBID.out`` (``slurm-623.out`` with the above example), in the submission directory.
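For reference, a slightly fuller submission script might look like the sketch below. The resource amounts, time limit and partition are placeholders to adapt to your needs (see :doc:`batch_scripts` and :doc:`partitions_overview`); the ``%j`` filename pattern is replaced by the job ID.

.. code-block:: bash

    #!/bin/bash
    #
    #SBATCH --job-name=test
    #SBATCH --output=test-%j.out     # overrides the default slurm-$JOBID.out
    #SBATCH --partition=Lake         # placeholder: pick a partition suited to the job
    #SBATCH --ntasks=1               # one Task...
    #SBATCH --cpus-per-task=1        # ...using one CPU
    #SBATCH --mem=1G                 # placeholder memory request
    #SBATCH --time=01:00:00          # placeholder time limit (hh:mm:ss)

    hostname -s
    sleep 60s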
Once submitted, the job enters the queue in the *PENDING* (PD) state. When resources become available and the job has sufficient priority, an allocation is created for it and it moves to the *RUNNING* (R) state. If the job completes correctly, it goes to the *COMPLETED* state; otherwise, its state is set to *FAILED*.

.. TIP::
    **You can submit jobs from any login node to any partition. Login nodes are only segregated for build (CPU µarch) and scratch access.**

Monitor your jobs
-----------------

You can monitor your job using either its name (``#SBATCH --job-name``) or its ``$JOBID`` with Slurm's ``squeue`` [#squeue]_ command:

.. code-block:: bash

    $ squeue -j 623
    JOBID PARTITION  NAME     USER ST  TIME  NODES NODELIST(REASON)
      623        E5  test ltaulell  R  0:04      1 c82gluster2

By default, ``squeue`` shows every pending and running job. You can filter for your own jobs only, using the ``-u $USER`` or ``--me`` option:

.. code-block:: bash

    $ squeue --me
    JOBID PARTITION  NAME     USER ST  TIME  NODES NODELIST(REASON)
      623        E5  test ltaulell  R  0:04      1 c82gluster2

If needed, you can modify the output of ``squeue`` [#squeue]_. Here's an example (adding CPUs to the default output):

.. code-block:: bash

    $ squeue --me --format="%.7i %.9P %.8j %.8u %.2t %.10M %.6D %.4C %N"
    JOBID PARTITION  NAME     USER ST  TIME  NODES CPUS NODELIST
    38956      Lake  test ltaulell  R  0:41      1    1 c6420node172

Useful bash aliases:

.. code-block:: bash

    alias pending='squeue --me --states=PENDING --sort=S,Q --format="%.10i %.12P %.8j %.8u %.6D %.4C %.20R %Q %.19S"'  # my pending jobs
    alias running='squeue --me --states=RUNNING --format="%.10i %.12P %.8j %.8u %.2t %.10M %.6D %.4C %R %.19e"'  # my running jobs

Analyzing currently running jobs
--------------------------------

The ``sstat`` [#sstat]_ command allows users to easily pull up status information about their currently running jobs. This includes information about **CPU usage**, **task information**, **node information**, **resident set size (RSS)**, and **virtual memory (VM)**.

You can invoke the ``sstat`` command as follows:

.. code-block:: bash

    $ sstat --jobs=$JOB_ID

By default, ``sstat`` prints significantly more information than is usually needed. To remedy this, use the ``--format`` flag to choose what appears in the output (see the format options in ``man sstat``). Some relevant variables are listed in the table below:

+-----------+----------------------------------------------------------+
| Variable  | Description                                              |
+===========+==========================================================+
| avecpu    | Average CPU time of all tasks in job.                    |
+-----------+----------------------------------------------------------+
| averss    | Average resident set size of all tasks.                  |
+-----------+----------------------------------------------------------+
| avevmsize | Average virtual memory of all tasks in a job.            |
+-----------+----------------------------------------------------------+
| jobid     | The id of the Job.                                       |
+-----------+----------------------------------------------------------+
| maxrss    | Maximum resident set size of all tasks in the job.       |
+-----------+----------------------------------------------------------+
| maxvsize  | Maximum virtual memory size of all tasks in the job.     |
+-----------+----------------------------------------------------------+
| ntasks    | Number of tasks in a job.                                |
+-----------+----------------------------------------------------------+
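The table above is not exhaustive; ``sstat`` can list every field name it accepts. The snippet below is only a convenience sketch using its ``--helpformat`` and ``--parsable2`` options:

.. code-block:: bash

    # list every field name accepted by --format (see also man sstat)
    $ sstat --helpformat

    # pipe-separated output, convenient when post-processing in scripts
    $ sstat --jobs=$JOB_ID --format=jobid,avecpu,maxrss,ntasks --parsable2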
For example, let's print out a job's ID, CPU time, maximum RSS, and number of tasks:

.. code-block:: bash

    sstat --jobs=$JOB_ID --format=jobid,cputime,maxrss,ntasks

You can obtain more detailed information about a job using Slurm's ``scontrol`` [#scontrol]_ command. This can be very useful for troubleshooting.

.. code-block:: bash

    $ scontrol show jobid $JOB_ID

    $ scontrol show jobid 38956
    JobId=38956 JobName=test
       UserId=ltaulell(*****) GroupId=psmn(*****) MCS_label=N/A
       Priority=8628 Nice=0 Account=staff QOS=normal
       JobState=RUNNING Reason=None Dependency=(null)
       Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
       RunTime=00:00:08 TimeLimit=8-00:00:00 TimeMin=N/A
       SubmitTime=2022-07-08T12:00:20 EligibleTime=2022-07-08T12:00:20
       AccrueTime=2022-07-08T12:00:20
       StartTime=2022-07-08T12:00:22 EndTime=2022-07-16T12:00:22 Deadline=N/A
       SuspendTime=None SecsPreSuspend=0 LastSchedEval=2022-07-08T12:00:22
       Partition=Lake AllocNode:Sid=x5570comp2:446203
       ReqNodeList=(null) ExcNodeList=(null)
       NodeList=c6420node172
       BatchHost=c6420node172
       NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
       TRES=cpu=1,mem=385582M,node=1,billing=1
       Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
       MinCPUsNode=1 MinMemoryNode=385582M MinTmpDiskNode=0
       Features=(null) DelayBoot=00:00:00
       OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
       Command=/home/ltaulell/tests/env.sh
       WorkDir=/home/ltaulell/tests
       StdErr=/home/ltaulell/tests/slurm-38956.out
       StdIn=/dev/null
       StdOut=/home/ltaulell/tests/slurm-38956.out
       Power=
       NtasksPerTRES:0

.. [#squeue] You can get the complete list of parameters by referring to the ``squeue`` manual page (``man squeue``).

.. [#scontrol] You can get the complete list of parameters by referring to the ``scontrol`` manual page (``man scontrol``).

.. [#sstat] You can get the complete list of parameters by referring to the ``sstat`` manual page (``man sstat``).
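.. TIP::
    The full ``scontrol`` output above is verbose. When only a few fields matter (state, run time, allocated TRES), a plain shell filter on its output is often enough. This is only a convenience sketch; the field names are taken from the example output above:

    .. code-block:: bash

        # keep only the state, timing and resource lines
        $ scontrol show jobid 38956 | grep -E 'JobState|RunTime|TRES='

        # if watch is available, refresh the same view every 30 seconds
        $ watch -n 30 "scontrol show jobid 38956 | grep -E 'JobState|RunTime|TRES='"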