CPU Compute Cluster Resources Available (Jan 2020)

There are 17 CPU compute nodes installed for HPC cluster jobs.  All nodes are configured with:

CPU cores   Memory    Network ports      Local SSDs   Operating System
40          384 GiB   2x25Gb/s + 1Gb/s   480GB        RHEL 7.x

GPU Compute Cluster Resources Available (Nov 2018)

There are 2 GPU compute nodes.  Each node is configured with:

CPU cores   Memory    GPU cards   GPU memory      SSH Network   NFS Network   Local SSDs    Operating System
24          192 GiB   2 x V100    16GB per card   1Gb/s         10Gb/s        480GB+960GB   Ubuntu 16.04

JCU has purchased GPU capacity from QCIF, giving access to 36 V100 cards (32 GiB of memory per card).  The existing GPU servers will be repurposed sometime after we gain access to the UQ-managed resource.

Configuration details for job management system (Dec 2018)

Walltime Requested       Queue
0:00:00 - 24:00:00       short
24:00:01 - 168:00:00     normal
168:00:01 - 2160:00:00   long

The maximum walltime for each queue may be changed (to match usage patterns).  Note: 2160:00:00 = 90 days.
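
The walltime ranges above can be expressed as a small lookup.  The following is a minimal Python sketch, assuming the queue is determined purely by the requested walltime as the table suggests; the function names parse_walltime and queue_for_walltime are hypothetical and are not part of PBSPro or the cluster configuration.

```python
# Illustrative only: mirrors the walltime table, not the scheduler's real routing logic.

# Upper walltime bound (in hours) for each queue, in increasing order.
QUEUE_LIMITS_HOURS = [("short", 24), ("normal", 168), ("long", 2160)]


def parse_walltime(walltime: str) -> int:
    """Convert an 'HHH:MM:SS' walltime string to a total number of seconds."""
    hours, minutes, seconds = (int(part) for part in walltime.split(":"))
    if hours < 0 or not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError(f"invalid walltime: {walltime!r}")
    return hours * 3600 + minutes * 60 + seconds


def queue_for_walltime(walltime: str) -> str:
    """Return the first queue whose maximum walltime covers the request."""
    requested = parse_walltime(walltime)
    for queue, max_hours in QUEUE_LIMITS_HOURS:
        if requested <= max_hours * 3600:
            return queue
    raise ValueError(f"walltime {walltime!r} exceeds the longest queue limit")


print(queue_for_walltime("24:00:00"))   # short
print(queue_for_walltime("24:00:01"))   # normal
print(parse_walltime("2160:00:00") // (24 * 3600))  # 90 (days)
```

Because the limits may be changed to match usage patterns, the values in QUEUE_LIMITS_HOURS should be treated as a snapshot of the table rather than fixed constants.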

The values in the table below may be changed (to match usage patterns).   Note that the HPC cluster has a maximum of 600 CPU cores available (as of 10-Dec-2019).

Queue    Max. jobs in queue   Max. CPUs in use   Max. job array size
short    1000                 540                200
normal   1000                 400                120
long     160                  80                 40
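
As a rough illustration of how these limits interact, the Python sketch below checks a planned job array against the per-queue values (short: 1000/540/200, normal: 1000/400/120, long: 160/80/40).  It assumes that "Max. CPUs in use" caps the total number of cores running at once; the page does not say how the scheduler applies the limit.  The names QueueLimits, check_job_array and max_concurrent_subjobs are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class QueueLimits:
    max_jobs_in_queue: int
    max_cpus_in_use: int
    max_job_array_size: int


# Snapshot of the limits table; these values may be changed by HPC staff.
QUEUE_LIMITS = {
    "short": QueueLimits(1000, 540, 200),
    "normal": QueueLimits(1000, 400, 120),
    "long": QueueLimits(160, 80, 40),
}


def check_job_array(queue: str, array_size: int, cpus_per_subjob: int) -> List[str]:
    """Return a list of problems with the planned job array (empty if none)."""
    limits = QUEUE_LIMITS[queue]
    problems = []
    if array_size > limits.max_job_array_size:
        problems.append(
            f"array size {array_size} exceeds the {queue} limit of "
            f"{limits.max_job_array_size}"
        )
    if cpus_per_subjob > limits.max_cpus_in_use:
        problems.append(
            f"{cpus_per_subjob} CPUs per subjob exceeds the {queue} limit of "
            f"{limits.max_cpus_in_use} CPUs in use"
        )
    return problems


def max_concurrent_subjobs(queue: str, array_size: int, cpus_per_subjob: int) -> int:
    """Estimate how many subjobs could run at once under the CPU limit."""
    limits = QUEUE_LIMITS[queue]
    return min(array_size, limits.max_cpus_in_use // cpus_per_subjob)


print(check_job_array("normal", 150, 1))      # one problem: array too large
print(max_concurrent_subjobs("long", 40, 4))  # 20
```

In this sketch, a 40-element array in the long queue with 4 cores per subjob would run at most 20 subjobs at a time, because 20 x 4 reaches the 80-core limit.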

Resource Requirements

More accurate resource requests equate to a higher return on investment.  Organisations such as NCI and AWS charge for resources allocated/requested.

Resource under-specification

PBSPro has been configured to kill jobs that consume more resource than they request.  In some cases, HPC staff can increase the limits, depending on the resource and the situation.

Resource under-specification often leads to inefficiency (the impact is worse for memory than for CPU).  Resource under-specification can also lead to compute node(s) crashing - potentially affecting other users' jobs.

Resource over-specification

The resources you request for a job are dedicated to your job - unused components are not available for other jobs.
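
To make the cost of over-specification concrete, the Python sketch below estimates how many identical jobs fit on one CPU compute node (40 cores, 384 GiB).  It is a simplification: it assumes the whole node is available to jobs and ignores any memory reserved for the operating system.  The function name jobs_per_node is hypothetical.

```python
# Per-node capacity of the CPU compute nodes, as listed on this page.
NODE_CPU_CORES = 40
NODE_MEMORY_GIB = 384


def jobs_per_node(ncpus: int, mem_gib: int) -> int:
    """Number of identical jobs that fit on one node for a given request."""
    if ncpus < 1 or mem_gib < 1:
        raise ValueError("ncpus and mem_gib must be at least 1")
    if ncpus > NODE_CPU_CORES or mem_gib > NODE_MEMORY_GIB:
        return 0  # a single job would not fit on one node
    fit_by_cpu = NODE_CPU_CORES // ncpus
    fit_by_memory = NODE_MEMORY_GIB // mem_gib
    # The scarcer resource decides how many jobs can share the node.
    return min(fit_by_cpu, fit_by_memory)


print(jobs_per_node(1, 8))    # 40: limited by cores
print(jobs_per_node(1, 64))   # 6: limited by memory, 34 cores left idle
print(jobs_per_node(8, 8))    # 5: limited by cores
```

Under these assumptions, single-core jobs that each request 64 GiB allow only 6 jobs per node, so 34 cores sit idle even if each job actually uses far less memory.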

Users who repeatedly over-specify CPU and/or memory resource requirements will be contacted by ICT/HPC staff to change their behaviour.

Most jobs will only use 1 CPU core - requesting more will not see your job complete more quickly unless the software you are using is written to support execution on multiple CPU cores.
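
For illustration, a single-core request can be written as PBS directive lines at the top of a job script.  The Python sketch below only builds those lines as text.  It assumes the standard PBS Professional "select" syntax; the helper name pbs_resource_lines and the default values (4 GB of memory, 24 hours of walltime) are hypothetical and are not recommendations from HPC staff.

```python
from typing import List


def pbs_resource_lines(ncpus: int = 1, mem_gb: int = 4,
                       walltime: str = "24:00:00") -> List[str]:
    """Build '#PBS -l' directive lines for one chunk on a single node."""
    if ncpus < 1 or mem_gb < 1:
        raise ValueError("ncpus and mem_gb must be at least 1")
    return [
        # One chunk with the requested cores and memory.
        f"#PBS -l select=1:ncpus={ncpus}:mem={mem_gb}gb",
        # Maximum run time in HH:MM:SS.
        f"#PBS -l walltime={walltime}",
    ]


for line in pbs_resource_lines():
    print(line)
```

Raising ncpus above 1 only makes sense when the software is known to run on multiple cores; otherwise the extra cores are reserved but unused.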

HPC staff realise that many people do not know the memory requirements of their jobs - e.g., memory requirement can vary based on input data or type of analysis performed.

The more resource you request, the more likely your usage will be scrutinised.