The JCU HPC job management system (PBSPro) works with 2 login nodes, 16 compute nodes, and 1 test node (mostly for TS staff use). Each node has 40 CPU cores, 384GiB of RAM, 480GB of RAID1 SSD storage, ~7TB of RAID0 SSD storage, and 51Gb/s of network connectivity. All nodes run the Red Hat Enterprise Linux 7 operating system.
Three "walltime request" job queues (FIFO) have been configured to accept researcher workflows. On 1-Mar-2021, the following configuration was operational:
| Queue | Walltime (min.) | Walltime (max.) | Max. CPU cores (per user) | Max. CPU cores (all jobs) |
|---|---|---|---|---|
| tiny | 0:00:01 | 2:00:00 | 360 | all |
| short | 2:00:01 | 24:00:00 | 240 | all |
| normal | 24:00:01 | 168:00:00 | 160 | 400 |
| long | 168:00:01 | 720:00:00 | 80 | 240 |
Job array limits match the per-user limits in the table above. The limits mentioned above are reviewed regularly and have been changed on multiple occasions (to match researchers' usage patterns).  Note: 720:00:00 is equivalent to 30 days.
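For example, under the configuration above a job only needs to request a walltime that falls within the desired queue's range. The script below is a minimal sketch of a single-node PBSPro job; the job name, program, and resource values are placeholders, and the assumption that jobs are routed to queues by requested walltime should be confirmed with HPC staff.

```bash
#!/bin/bash
#PBS -N example_job                  # job name (placeholder)
#PBS -l select=1:ncpus=4:mem=16gb    # one node, 4 CPU cores, 16GiB of memory
#PBS -l walltime=12:00:00            # 12 hours falls within the "short" queue range

cd "$PBS_O_WORKDIR"                  # run from the directory qsub was called from
./my_analysis input.dat              # placeholder for the actual workload
```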
Play nice: The JCU HPC has not been architected for multi-node MPI jobs; no researcher should be submitting job requests that involve more than one node (`select=1`). Researchers who need to run jobs that require more than 40 cores should seek time on QCIF, NCI, or Pawsey HPC facilities (or public cloud if you have sufficient budget). The storage platforms that house JCU HPC filesystems were purchased for capacity, not performance.
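As a rough illustration, the largest request that fits on a single node would look something like the command below (the resource values and script name are examples only, not recommendations):

```bash
# A single node tops out at 40 cores; anything larger belongs on QCIF/NCI/Pawsey.
qsub -l select=1:ncpus=40:mem=300gb -l walltime=24:00:00 my_job.pbs
```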
Historically, idle CPU cores have been available on the HPC cluster about 70% of the time. As a result, a FIFO queue configuration was deemed to be a minimum viable product. Fairshare queues will be configured if there is evidence of near-100% utilisation, with jobs waiting in queues, for a period of at least 3 months. At maximum compute cluster capacity, only 680 CPU cores are available to accept jobs/workflows.
There are several factors to consider when requesting resources for your job(s).
Researchers who are found to repeatedly under-request or significantly over-request job/workflow resources will be contacted in an attempt to change their behaviour. HPC staff realise that many people do not know the memory requirements of their jobs - e.g., memory requirements can vary based on the input data or the type of analysis performed.
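One practical way to calibrate memory requests is to check what a completed job actually used and add a sensible safety margin. Assuming job history is enabled on the cluster, PBSPro's qstat can report this for a finished job (the job ID below is a placeholder):

```bash
# Inspect the resources a finished job actually consumed (requires job history).
qstat -xf 123456 | grep -E 'resources_used\.(mem|ncpus|walltime)'
# Use the reported resources_used.mem, plus a margin, when sizing the next request.
```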