GridEngine

The job scheduler (or batch-queuing system) used on the PSMN cluster is SGE (formerly Sun Grid Engine, now Son of Grid Engine); it manages the execution of non-interactive jobs.

Infrastructure:

  • the compilation servers are described here
  • the cluster hardware configuration is described here
  • the queues for job submission are described here

Optimum use of resources

To make the best use of resources, it is important to fill up the servers. There are two strategies for this:

  • fill “at best” (a best-effort strategy),
  • fill in multiples of n cores (where n is the number of physical cores per server).

Best-effort filling quickly fragments parallel (MPI) applications across many servers.

Best-effort filling is therefore implemented on only a few queues for parallel applications. The other queues fill entire servers, in multiples of n cores.
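On queues that fill entire servers, the number of cores to request is a multiple of n. A minimal bash sketch of the rounding, assuming you know n for the target queue (16 below is only an example value, not a PSMN figure; `full_node_slots` is a hypothetical helper, not a cluster tool):

```shell
#!/bin/bash
# Round a core count up to the next multiple of n, the number of
# physical cores per server (hypothetical helper).
full_node_slots() {
  local wanted=$1 n=$2
  # Integer ceiling division: number of whole servers needed.
  local nodes=$(( (wanted + n - 1) / n ))
  echo $(( nodes * n ))
}

# Example: 40 cores on servers with 16 cores each -> 3 servers, 48 slots.
full_node_slots 40 16
```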

Priorities

Job priority is:

  • inversely proportional to the computing time already consumed,
  • proportional to the waiting time and to the number of cores requested.

This aims to distribute the available resources more equitably.

GridEngine: Submitting jobs

qsub -i input -o output script
qsub -V -e /path/to/workdir/ -o /path/to/workdir/ -q $QUEUE script

-V : export the current environment variables to the job
-e : where to put the standard error file
-o : where to put the standard output file
-i : file to use as the job's standard input
-q : queue

It is simpler to submit a script to GridEngine, since the script can contain all the options.

Some options do not work directly on the command line; you have to use a script (for example, sending e-mails at the beginning and end of the job).

See the complete documentation on submitting a job, as well as the queue list.
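A minimal sketch of a submission script carrying the same options as the command-line example above, plus e-mail notification at the beginning and end of the job. The job name, queue name, paths and e-mail address are placeholders to replace; `-N`, `-cwd`, `-m` and `-M` are standard qsub options, but check the queue list for valid queue names:

```shell
#!/bin/bash
# Minimal SGE job script sketch. qsub reads the lines starting with "#$"
# as submission options; bash treats them as ordinary comments.
# Job name, queue, paths and e-mail address are placeholders.
#
# Job name (available in the job as $JOB_NAME)
#$ -N my_job
# Queue (replace with a queue from the queue list)
#$ -q queue_name
# Export the current environment variables to the job
#$ -V
# Start the job in the directory it was submitted from
#$ -cwd
# Directories for the standard output and error files
#$ -o /path/to/workdir/
#$ -e /path/to/workdir/
# Send an e-mail at the beginning (b) and end (e) of the job
#$ -m be
#$ -M first.last@example.org

# Set by GridEngine inside a job; outside a job they fall back to "none".
job_label="Job ${JOB_ID:-none} (${JOB_NAME:-none})"
echo "$job_label running on $HOSTNAME"

# Replace with your own commands, e.g. ./program < input > output
```

Submit it with qsub followed by the script file name. In SGE, options given on the qsub command line normally take precedence over those embedded in the script.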

How to choose the right queue for my needs?

Because compute nodes were bought in successive batches, with CPUs of different generations and architectures, it was not possible to define a single queue. Having a separate queue for each architecture gives consistent performance within each queue.

In concrete terms, the choice of the “production” queue should be made according to the desired objective:

  • if the main criterion is getting the job executed quickly, look at which queues are currently available to accept it; a command such as qstat -g c should help you choose the queue,
  • if the main criterion is a large amount of resources (e.g. a job with many cores or a lot of RAM), choose queues whose nodes offer at least the resources requested by the job, even if the waiting time in the queue is longer.

In both cases, the qstat -g c command and the list of queues should guide your choice.
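For the speed criterion, the queue with the most free slots can be picked out of the qstat -g c output. A minimal bash/awk sketch, assuming AVAIL is the fourth column from the end of each line (layout CLUSTER QUEUE, CQLOAD, USED, RES, AVAIL, TOTAL, then the two state-count columns); check this against the actual output on the cluster. `best_available_queue` is a hypothetical helper:

```shell
#!/bin/bash
# Print the cluster queue with the most available slots, reading
# "qstat -g c" output on stdin (hypothetical helper).
# Assumes AVAIL is the 4th field from the end of each data line.
best_available_queue() {
  awk '
    # Data lines only: enough fields and a numeric AVAIL value
    # (this skips the header and the dashed separator line).
    NF >= 4 && $(NF-3) ~ /^[0-9]+$/ {
      avail = $(NF-3) + 0
      if (!found || avail > best) { best = avail; queue = $1; found = 1 }
    }
    END { if (found) print queue, best }
  '
}

# Usage on the cluster:
#   qstat -g c | best_available_queue
```

Free slots alone do not guarantee that the job fits: the queue must also offer the cores and memory the job requests.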

And, of course, when tuning your code, choose a test queue as close as possible to the intended “production” queue (i.e. with the same type of compute nodes). For example, if r815lin128ib was chosen as the production queue, run your tests on r815_ib_test.

GridEngine: Other useful commands

Checking job status

  • display the status of a specific user's jobs:
qstat -u login 
  • display the status of the queues (and the list of queues):
qstat -g c 
  • display the status of the nodes in a given queue:
qstat -q <queue_name> -f 
  • display the running jobs of all users:
qstat -u "*" -s r 
  • display the pending jobs of all users:
qstat -u "*" -s p 
  • display the status of a job in progress:
qstat -j <job_id> | less 
  • display the status of a job in progress with more details (longer):
qstat -j <job_id> -g t | less 
  • display the status of a job in progress with even more details (even longer):
qstat -j <job_id> -g t -s r | less 
  • display information on a job after its completion (long):
qacct -j <job_id> -f /gridware/psmn/accounting | less
  • delete a job:
qdel <job_id> 
  • delete a job (forced deletion):
qdel -f <job_id> 
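The job listings above can also be summarized. A minimal sketch that counts a user's jobs per state (r, qw, ...), assuming the default qstat layout with two header lines and the state in the fifth column; `count_job_states` is a hypothetical helper:

```shell
#!/bin/bash
# Count jobs per state from the default qstat job listing on stdin
# (hypothetical helper). Assumes two header lines, then columns:
#   job-ID prior name user state submit/start-at queue slots ja-task-ID
count_job_states() {
  # Tally the 5th field of every job line, then sort for stable output.
  awk 'NR > 2 { n[$5]++ } END { for (s in n) print s, n[s] }' | sort
}

# Usage on the cluster:
#   qstat -u login | count_job_states
```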

Accounting

The accounting file is available at /gridware/psmn/accounting.
  • Job details for the last 30 days:
qacct -f /gridware/psmn/accounting -d 30 -o <login> -j 
  • CPU hours consumed (utime over the last 30 days):
qacct -f /gridware/psmn/accounting -d 30 -o <login> | tail -1 | awk '{print $3/3600}'

or

qacct -f /gridware/psmn/accounting -q "*" -o <login> -d 30 | awk '{ SUM += $5} END {print SUM/3600}'
  • CPU hours consumed (utime between two dates; in this example, the year 2012):
qacct -f /gridware/psmn/accounting -b 201201010000 -e 201212312359 -o <login> | tail -1 | awk '{print $3/3600}'
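The qacct pipelines above can be wrapped in small shell functions. A minimal sketch, assuming (as the awk commands above already do) that the last line printed by qacct -o <login> has UTIME, in seconds, as its third column; `year_range` and `utime_hours` are hypothetical helpers:

```shell
#!/bin/bash
# Print the -b/-e arguments covering a whole calendar year, in the
# CCYYMMDDhhmm format used in the 2012 example (hypothetical helper).
year_range() {
  printf '%s\n' "-b ${1}01010000 -e ${1}12312359"
}

# Convert the UTIME (3rd column, in seconds) of the last line of a
# "qacct -o <login>" summary read on stdin into hours.
utime_hours() {
  tail -n 1 | awk '{ print $3 / 3600 }'
}

# Usage on the cluster ($(year_range ...) is left unquoted on purpose
# so that it splits into four separate arguments):
#   qacct -f /gridware/psmn/accounting $(year_range 2012) -o <login> | utime_hours
```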

Troubleshooting:

Run the command:

qstat -g c 

In the output, look at the last two columns:

  • aoACD : Number of slots/cores that are in at least one of the following states:
    • a Load threshold alarm
    • o Orphaned
    • A Suspend threshold alarm
    • C Suspended by calendar
    • D Disabled by calendar
  • cdsuE : Number of slots/cores that are in at least one of the following states:
    • c Configuration ambiguous
    • d Disabled
    • s Suspended
    • u Unknown
    • E Error
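Checking these two columns by eye is error-prone on a long queue list. A minimal sketch that prints only the queues where either of the last two columns of qstat -g c is non-zero, taking the column positions from the description above; `problem_queues` is a hypothetical helper:

```shell
#!/bin/bash
# List the cluster queues whose last two "qstat -g c" columns (the
# state counts) are not zero, reading the output on stdin
# (hypothetical helper).
problem_queues() {
  awk '
    # Data lines only: both last fields must be numbers, which skips
    # the header and the dashed separator line.
    NF >= 2 && $(NF-1) ~ /^[0-9]+$/ && $NF ~ /^[0-9]+$/ {
      if ($(NF-1) + 0 > 0 || $NF + 0 > 0) print $1, $(NF-1), $NF
    }
  '
}

# Usage on the cluster:
#   qstat -g c | problem_queues
```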

Possible job status:

  • d(eletion),
  • E(rror),
  • h(old),
  • r(unning),
  • R(estarted),
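  • s(uspended),
  • S(uspended, because the job's queue is suspended),
  • t(ransferring),
  • T(hreshold: suspended because a suspend threshold of the queue was exceeded),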
  • w(aiting).
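These states are often combined in qstat output (for example qw for a job waiting in the queue, Eqw for a job in error). A minimal sketch that prints the IDs of jobs whose state contains a given letter, assuming the default qstat layout with two header lines and the state in the fifth column; `jobs_in_state` is a hypothetical helper:

```shell
#!/bin/bash
# Print the IDs of jobs whose state (5th column of the default qstat
# listing, after two header lines) contains a given letter, e.g. E for
# error states such as "Eqw" (hypothetical helper). Case-sensitive,
# so s and S are distinguished.
jobs_in_state() {
  local letter=$1
  awk -v l="$letter" 'NR > 2 && index($5, l) { print $1 }'
}

# Usage on the cluster, then inspect each job with qstat -j <job_id>:
#   qstat -u login | jobs_in_state E
```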

GridEngine: Environment variables

Lines starting with #$ in a job script are reserved for GridEngine, which reads them as submission options (e.g. #$ -cwd or #$ -V).
  • SGE_O_WORKDIR : directory from which the job was submitted, reusable in scripts
  • NSLOTS : number of slots/cores granted to the job
  • JOB_ID : unique job ID assigned by GridEngine
  • JOB_NAME : name of the job (-N)
  • PE_HOSTFILE : host file listing the nodes and slots allocated (for MPI jobs)
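For MPI jobs, the file named by PE_HOSTFILE lists one host per line, followed by the number of slots allocated on it (then the queue and a processor range). A minimal sketch converting it into an Open MPI style hostfile, assuming that layout; `pe_to_machinefile`, the output file name and the program name are hypothetical, and other MPI implementations may expect a different format:

```shell
#!/bin/bash
# Convert a GridEngine PE_HOSTFILE (lines: "hostname slots queue
# processor-range") into Open MPI style "hostname slots=N" lines
# (hypothetical helper).
pe_to_machinefile() {
  awk '{ print $1 " slots=" $2 }' "$1"
}

# Typical use inside a parallel job script (names are placeholders):
#   cd "$SGE_O_WORKDIR"
#   pe_to_machinefile "$PE_HOSTFILE" > "machines.$JOB_ID"
#   mpirun -np "$NSLOTS" -hostfile "machines.$JOB_ID" ./program
```

An Open MPI built with GridEngine support can usually read the allocation directly, in which case this conversion is unnecessary.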


en/documentation/tools/sge.txt · Last modified: 2020/05/11 18:34 by fleroux