- The HPC cluster is built upon the Linux operating system, not Windows. There may be a version of the software you want to use available for Linux (features/functionality may differ).
- Researchers who need computational research software that only works on Windows can request a virtual machine. HPC have very limited resources to meet such requirements - VM size is limited to 2 CPU cores, 16GiB of memory, and 100GiB of disk space.
- HPC have commenced work on a high-end graphical environment (Virtual Desktop Infrastructure) for researchers.
No. HPC does not have any resources that could be used to run computational research software written only for OSX.
However, there may be a Linux version of the software you wish to use (features/functionality may differ).
Ultimately it is your choice. However, be aware that the Australian government and their research funding bodies do not consider use of personal computing devices (e.g., desktop/laptop computers) to be responsible research. Also be aware that I have heard several stories of researchers (PhD students mostly) having to "start from scratch" after a disk failure.
The following list contains some key advantages of using HPC resources:
- You can run many jobs in parallel - e.g., we have had users submit thousands of jobs in a single day.
- Some software has been written to run on many CPU cores - all HPC compute nodes have 40 CPU cores.
- Each HPC compute node (1-Jan-2020) has 384GiB of memory installed, an amount you won't see in any personal computing device.
- HPC infrastructure uses ECC memory - single-bit errors will be reported and corrected, multiple-bit errors will result in server shutdown, so your results won't be corrupted by events invisible to you.
As of 1-Jan-2020, the CPU cluster has:
- 2 login nodes; each with 40 CPU cores (Intel Xeon 6248) and 384GiB of memory.
- 17 compute nodes; each with 40 CPU cores (Intel Xeon 6148/6248) and 384GiB of memory.
Login nodes may be used for interactive workflows (e.g., where GUIs are required), testing and/or development purposes, and for short (<4 hours), single-core jobs (no more than 4 at a time).
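As a quick illustration of scale, the compute node figures above can be totalled with a few lines of Python. This is only arithmetic on the numbers quoted in the list; the function and constant names are made up for this sketch, and login nodes are left out because they are not intended for large jobs.

```python
# Figures taken from the compute node list above (as of 1-Jan-2020).
COMPUTE_NODES = 17
CORES_PER_NODE = 40
MEMORY_PER_NODE_GIB = 384


def cluster_totals(nodes: int, cores_per_node: int, mem_per_node_gib: int) -> tuple[int, int]:
    """Return (total CPU cores, total memory in GiB) for a set of identical nodes."""
    return nodes * cores_per_node, nodes * mem_per_node_gib


total_cores, total_mem_gib = cluster_totals(
    COMPUTE_NODES, CORES_PER_NODE, MEMORY_PER_NODE_GIB
)
print(f"{total_cores} cores, {total_mem_gib} GiB across compute nodes")
print(f"{MEMORY_PER_NODE_GIB / CORES_PER_NODE:.1f} GiB of memory per core")
```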
As at 1-Jan-2020, NVIDIA GPU resources are available as follows:
- One GPU node at JCU (tesla.hpc.jcu.edu.au), with 2 NVIDIA V100 cards (16GiB).
- Thirty NVIDIA V100 cards (32GiB) at UQ. Contact email@example.com to arrange access to this resource.
Note: The last of the GPU nodes at JCU is likely to be decommissioned or repurposed in 2020. Researchers wishing to use accelerators may access the UQ resource we have paid for or find an alternative (e.g., public cloud).
As of 1-Jan-2020, there are two storage platforms for JCU researcher consumption. They provide multiple filesystems:
- /home - 512TiB of space for researchers' home directories.
- /scratch - 80TiB of "scratch" space (similar to /tmp in terms of usage).
- /sw - 200GiB of space to house software installed by HPRC staff, available for all researchers.
- /gpfs01 - 516TiB of cache space for "nationally significant" ARDC collections. The primary copy of all ARDC collections is held by QCIF in South-East Queensland.
One of the most important things for you to note is that there is insufficient space for all researchers to actually consume their default quota.
Quota enforcement is in place on all research filesystems:
- /home - 5TiB per researcher, 250,000 inodes per researcher.
- /scratch - 5TiB per researcher, 1,000,000 inodes per researcher. Not suitable for long-term data housing.
- /gpfs01 - ARDC/QCIF have a merit-based approval process for research projects that request storage. JCU's cache quota per collection will be lower than the quota set at QCIF.
Individual jobs/workflows can also use SSD space available on each server. Each CPU node should have a 300TiB /tmp filesystem for such requirements. Scheduled processes regularly clean up (delete) old files in this filesystem (on all nodes).
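To get a rough idea of how close a directory tree is to the default /home quota, something like the following sketch could be used. It is an approximation only: it sums apparent file sizes and counts directory entries, which may not match how the filesystem itself accounts for quota (hard links, for example, are counted once per name here). The function names and constants are invented for this example.

```python
import os

# Default /home quota from the list above; other filesystems differ.
HOME_QUOTA_BYTES = 5 * 1024**4  # 5 TiB
HOME_QUOTA_INODES = 250_000


def usage(root: str) -> tuple[int, int]:
    """Walk `root` and return (total bytes, inode count); symlinks are not followed."""
    total_bytes = 0
    inodes = 1  # count the root directory itself
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            total_bytes += st.st_size
            inodes += 1
    return total_bytes, inodes


def percent_of_quota(used: int, quota: int) -> float:
    """Express `used` as a percentage of `quota`."""
    return 100.0 * used / quota
```

For example, `usage(os.path.expanduser("~"))` returns the two totals for a home directory, which can then be passed to `percent_of_quota` with the matching constant. Walking a very large tree generates a lot of metadata IO on shared storage, so it is best run sparingly.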
Perception of filesystem performance decreases with increasing inode count.
JCU replicate all research data to an offsite location. Our current total inode count is the primary reason that a replication process (all data) will take well over a month, even if there has been no/little change to data.
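One common way to reduce inode count is to pack directories of many small, rarely accessed files into a single archive. The sketch below shows the idea with Python's standard `tarfile` module. It is not an HPC-endorsed procedure, the function name is hypothetical, and it deliberately leaves the original files in place so you can verify the archive before deleting anything.

```python
import tarfile
from pathlib import Path


def bundle_directory(src: str, archive: str) -> int:
    """Pack `src` into a gzip-compressed tar file at `archive`.

    Returns the number of members stored (the directory itself plus its contents).
    `archive` should live outside `src` so the archive does not include itself.
    """
    src_path = Path(src)
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths in the archive relative to the directory name.
        tar.add(str(src_path), arcname=src_path.name)
    # Re-open the archive to confirm it is readable and count what it holds.
    with tarfile.open(archive, "r:gz") as tar:
        return len(tar.getnames())
```

A directory holding thousands of small output files then occupies a single inode once the originals are removed, at the cost of having to extract the archive before using the files again.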
The default quotas can be thought of as "what a researcher gets with an HPC account". Until the processes mentioned below are introduced, reasonable quota increases will be actioned. However, such quota increases may be removed at a later date (at short notice).
Mechanisms are likely to be introduced that allow research groups to pay for increased quota.
Disk (space) quota increases without extra payment will likely be subject to an approval process.
Firstly, "slow" is subjective - what you consider slow others may consider fast. Note that use of personal computers for research computing (or data retention) isn't regarded as "responsible research" and I'm not aware of any personal computing device that can scale up to 2PB, so comparisons with speed of an SSD on your personal computer are meaningless.
There are numerous factors affecting the speed of uploads/downloads:
- I am not aware of any shared network that provides a performance guarantee. Think about it this way - on your home NBN/ADSL link, do you always get the speed you pay for? I definitely don't.
- Storage is often the biggest bottleneck to perceived network performance. Once again, all HPC storage is shared across users and the jobs they are running (there could be hundreds of parallel IO operations at any given time).
- HPC houses over 2PiB (2038TiB) of research data, almost wholly on 7200RPM NL-SAS disks. Performance at that scale comes at an extremely high price - way above HPC budget.
- JCU HPC filesystem (home directories) performance caps out at about 12Gb/s (theoretical). Filesystem performance decreases with increasing inode (file) count. Also, the smaller the file the worse the maximum speed you'll see.
Ultimately, it really comes down to cost. If you are unhappy with the performance provided by HPC, there are other options - most of which you will have to find budget and/or time to set up and use.
This is a very difficult question to answer. Some of the factors involved are:
- Size of file. Small files will never see high transfer speeds (even when there is no network involved).
- Number of concurrent IO requests active on the system/environment or number of concurrent users and jobs using the infrastructure.
- Time of day (related to the previous point). The biggest factor in performance seems to be time of day - highest speeds are achieved outside normal working hours.
- Anti-virus software, encryption level, security devices, etc. In a world where more people are having their accounts compromised, think very carefully before bypassing or not using such devices/methods.
For uploads of old research data to AWS, I see average speeds ranging between 10MB/s and 240MB/s. At a given instant though, I have seen transfer rates that are less than 1MB/s. Generally speaking, I don't like seeing <10MB/s, happy with 30MB/s, but love seeing more.
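To put those rates in perspective, the sketch below estimates how long a transfer would take at a sustained speed. It assumes "MB/s" means decimal megabytes (1,000,000 bytes) per second and that the rate stays constant, which real transfers rarely do; the function name and the 1TiB example size are illustrative only.

```python
def transfer_hours(size_gib: float, rate_mb_per_s: float) -> float:
    """Hours needed to move `size_gib` GiB at a sustained `rate_mb_per_s` MB/s.

    Assumes decimal megabytes (10**6 bytes) for the rate and binary
    gibibytes (2**30 bytes) for the size.
    """
    size_bytes = size_gib * 1024**3
    seconds = size_bytes / (rate_mb_per_s * 1_000_000)
    return seconds / 3600


# Rates mentioned above, applied to a 1 TiB (1024 GiB) upload.
for rate in (1, 10, 30, 240):
    print(f"{rate:>3} MB/s -> {transfer_hours(1024, rate):7.1f} hours for 1 TiB")
```

Under these assumptions, 1TiB takes a little over 30 hours at 10MB/s and about 10 hours at 30MB/s, which is why sustained rates below 10MB/s are a concern for large datasets.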
Note: There are ways/tools to improve network performance, however many come with a high price tag or risk.
There are numerous options available to JCU researchers:
- QCIF/QCISCloud - There is a short-term merit allocation scheme in place. Pay-for-service options exist.
- NCI - There is a merit allocation scheme in place. Pay-for-service options may exist.
- Public cloud (e.g., AWS or Azure) - This is a pay-for-service option in most cases.
If you are collaborating with researchers from other institutions, you may consider requesting access to their resources.