
General Questions

 Why use HPC, when using my PC/Mac is easier?

The Australian government and its research funding bodies do not consider the use of personal computing devices (e.g., desktop/laptop computers) to be responsible research.  Using personal computing devices for research could result in funding not being awarded, or a post-graduate degree not being conferred.  For career researchers, it could also lead to your research being labelled as untrustworthy.

During my time at JCU, I have heard several stories of researchers (PhD students mostly) having to "start from scratch" after a disk/laptop has failed.  HPC storage has significant levels of protection against hardware failures. 

Some key advantages of using HPC resources:

  1. You can run many jobs in parallel, increasing your research productivity (a job-array sketch follows this list).
  2. Some software has been written to run on many CPU cores - all HPC compute nodes have 40 CPU cores, again increasing your research productivity.
  3. Each HPC compute node (1-Jan-2020) has 384GiB of memory installed, an amount you won't see in any personal computing device, so you can perform research on the HPC infrastructure that isn't possible on a personal computing device.
  4. HPC infrastructure uses ECC memory - single-bit errors are reported and corrected, and multiple-bit errors result in a server shutdown, so your results won't be corrupted by events invisible to you.  Unless you are using workstation-class hardware, you cannot guarantee that the results of computational research done on personal computing devices can be trusted.
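For example, most schedulers let you run the same analysis over many inputs as a "job array".  The sketch below assumes a PBS-style scheduler (this page doesn't name the scheduler in use, so check the local HPC documentation for exact syntax) and a hypothetical analyse program:

    #!/bin/bash
    # run_array.pbs - hypothetical example: run the same analysis on 20 inputs in parallel
    #PBS -N my_analysis
    #PBS -l select=1:ncpus=1:mem=4gb
    #PBS -l walltime=02:00:00
    #PBS -J 1-20                    # PBS Pro job-array syntax: one sub-job per input

    cd "$PBS_O_WORKDIR"
    # Each sub-job selects its own input file via the array index.
    ./analyse --input "input_${PBS_ARRAY_INDEX}.dat" --output "result_${PBS_ARRAY_INDEX}.dat"

Submitted once (qsub run_array.pbs), the scheduler then runs the 20 sub-jobs concurrently as resources allow - something no single PC/Mac can do.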


JCU CPU Cluster

 How large is the CPU cluster?

As of 1-Jan-2020, the CPU cluster has:

  • 2 login nodes; each with 40 CPU cores (Intel Xeon 6248) and 384GiB of memory.
  • 17 compute nodes; each with 40 CPU cores (Intel Xeon 6148/6248) and 384GiB of memory.

Login nodes may be used for interactive workflows (e.g., where GUIs are required), testing and/or development purposes, and for short (<4 hours), single-core jobs (no more than 4 at a time).
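Anything larger than that should be submitted to a compute node through the batch system.  A minimal sketch of a whole-node request, again assuming a PBS-style scheduler (the exact resource names are assumptions; confirm against local documentation):

    #!/bin/bash
    # whole_node.pbs - hypothetical example: request one full compute node
    #PBS -l select=1:ncpus=40:mem=376gb   # all 40 cores; leave some of the 384GiB for the OS
    #PBS -l walltime=24:00:00

    cd "$PBS_O_WORKDIR"
    ./my_parallel_program --threads 40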

 Can I run Windows software on HPC infrastructure?

The HPC cluster is built upon the Linux operating system, not Windows.

However, there may be a version of the software you want to use available for Linux (features/functionality may differ).

Researchers using computational research software that only works under Windows can request a virtual machine (see the Virtual Machines section below).

 Can I run macOS software on HPC infrastructure?

No.  HPC does not have any resources that could be used to run computational research software written only for macOS.

However, there may be a Linux or Windows version of the software you wish to use (features/functionality may differ).


JCU GPU Cluster

 Is there a GPU resource that I can use?

Yes.  There is one (1) remaining server with two NVIDIA V100 (Volta) GPU cards installed - this server will hit end of life in Aug-2020 and is unlikely to be replaced.

In 2019, JCU purchased three (3) years of access to 10% of the GPU capacity at the Queensland Brain Institute (QBI).  Please contact chantelle.pinnington@jcu.edu.au to gain access to the QBI GPU cluster.

JCU Virtual Machines

 I wish to learn more about Virtual Machine Resources

JCU HPC run a very small number of servers that provide virtual machines to support JCU research activities.  As of 1-Jan-2020, the HPC ESXi cluster consists of:

  • 2 servers; each with 28 CPU cores and 512GiB of memory.
  • 1 server capable of providing virtual Quadro graphics card capabilities (GPU backed Virtual Desktop Infrastructure).

Virtual machines can be requested to perform tasks or provide services that cannot be done on the HPC CPU cluster.  As a general rule, the maximum virtual resource that will be provided is 4 vCPUs, 32GB of vRAM, and/or 100GB of disk space.  Note that virtual CPU and memory resources are shared - you will not get physical server performance.

 What operating system choices are there?

JCU Technology Solutions provide platform(s) to researchers.  Development of a service-providing platform will be based on one of the following operating systems:

  • Microsoft Windows Server
  • RedHat Enterprise Linux


Computational Research Software

 Can I run Windows software on HPC infrastructure?

Yes, conditionally.

  1. The HPC cluster is built upon the Linux operating system, not Windows.  There may be a version of the software you want to use available for Linux (features/functionality may differ).
  2. Researchers with requirements to use computational research software that only works on Windows can request a virtual machine.  HPC have very limited resources to meet such requirements - VM size is limited to 2 CPU cores, 16GiB of memory, and 100GiB of disk space.
  3. HPC have commenced work on a high-end graphical environment (Virtual Desktop Infrastructure) for researchers.

 Does JCU provide accelerators?

As at 1-Jan-2020, NVIDIA GPU resources are available as follows:

  • One GPU node at JCU (tesla.hpc.jcu.edu.au), with 2 NVIDIA V100 cards (16GiB).
  • Thirty NVIDIA V100 cards (32GiB) at UQ.  Contact chantelle.pinnington@jcu.edu.au to arrange access to this resource.

Note:  The last of the GPU nodes at JCU is likely to be decommissioned or repurposed in 2020.  Researchers wishing to use accelerators may access the UQ resource we have paid for, or find another alternative (e.g., public cloud).
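While the JCU GPU node remains in service, a GPU job on a PBS-style scheduler would typically be requested along the following lines.  This is a sketch only - the ngpus resource name and suitable limits are assumptions, not confirmed details of JCU's configuration:

    #!/bin/bash
    # gpu_job.pbs - hypothetical example: request one V100 GPU
    #PBS -l select=1:ncpus=4:ngpus=1:mem=32gb   # 'ngpus' is site-configurable; verify locally
    #PBS -l walltime=12:00:00

    cd "$PBS_O_WORKDIR"
    nvidia-smi          # confirm the GPU is visible to the job
    ./my_cuda_program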

Research Storage

 How much storage does HPC have (total)?

As of 1-Jan-2020, there are two storage platforms available to JCU researchers.  They provide multiple filesystems:

  • /home - 512TiB of space for researchers' home directories.
  • /scratch - 80TiB of "scratch" space (similar to /tmp in terms of usage).
  • /sw - 200GiB of space to house software installed by HPRC staff, available to all researchers.
  • /gpfs01 - 516TiB of cache space for "nationally significant" ARDC collections.  The primary copy of all ARDC collections is held by QCIF in South-East Queensland.

One of the most important things to note is that storage is oversubscribed: there is insufficient physical space for all researchers to actually consume their default quotas.
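To see how full these filesystems are, standard Linux tools work from a login node.  A quick sketch (whether per-user quotas are exposed via the quota command here is an assumption; GPFS sites often use mmlsquota instead):

    # Overall capacity and usage of the research filesystems
    df -h /home /scratch /gpfs01

    # Your own quota and usage, in human-readable units (if enabled)
    quota -s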

 How much storage is available for me?

Quota enforcement is in place on all research filesystems:

  • /home - 5TiB per researcher.  250,000 inodes per researcher.
  • /scratch - 5TiB per researcher, 1,000,000 inodes per researcher.  Not suitable for long-term data housing.
  • /gpfs01 - ARDC/QCIF have a merit-based approval process for research projects that request storage.  JCU's cache quota per collection will be lower than the quota set at QCIF.

Individual jobs/workflows can also use local SSD space available on each server.  Each CPU node should have a 300GiB /tmp filesystem for such requirements.  Scheduled processes regularly clean up (delete) old files in this filesystem (on all nodes).
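A common pattern is to stage data into node-local /tmp, compute there, then copy results back - this avoids hammering the shared filesystems with many small IO operations.  A minimal sketch (paths and program names are hypothetical):

    #!/bin/bash
    # Inside a batch job: use node-local scratch for IO-heavy work
    WORKDIR="/tmp/${USER}_${PBS_JOBID:-$$}"   # unique per-job directory
    mkdir -p "$WORKDIR"

    cp ~/project/input.dat "$WORKDIR/"        # stage in from /home
    cd "$WORKDIR"
    ./my_io_heavy_program input.dat output.dat
    cp output.dat ~/project/results/          # stage results back out to /home

    rm -rf "$WORKDIR"                         # clean up; /tmp is purged regularly anyway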

 What is an inode? Why is there an inode quota?

In short, an inode is the filesystem record holding a file's metadata; every file and directory consumes at least one.  See https://en.wikipedia.org/wiki/Inode for details.

Perception of filesystem performance decreases with increasing inode count.

JCU replicate all research data to an offsite location.  Our current total inode count is the primary reason that a full replication pass (all data) takes well over a month, even if there has been little or no change to the data.
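To check your own inode consumption, simply count the files and directories you own.  A quick sketch:

    # Every file and directory consumes at least one inode
    find ~ | wc -l

    # With a recent GNU coreutils, du can report inode usage directly
    du --inodes -s ~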

 Can my quota be increased?

The default quotas can be thought of as "what a researcher gets with an HPC account".  Until the processes mentioned below are introduced, reasonable quota increases will be actioned.  However, such quota increases may be removed at a later date (at short notice).

Future: 

Mechanisms are likely to be introduced that allow research groups to pay for increased quota.

Disk (space) quota increases without extra payment will likely be subject to an approval process.

Performance

 Why is my upload/download speed so slow?

Firstly, "slow" is subjective - what you consider slow others may consider fast.  Note that use of personal computers for research computing (or data retention) isn't regarded as "responsible research" and I'm not aware of any personal computing device that can scale up to 2PB, so comparisons with speed of an SSD on your personal computer are meaningless.

There are numerous factors affecting upload/download speed:

  1. I am not aware of any shared network that provides a performance guarantee.  Think about it this way - on your home NBN/ADSL link, do you always get the speed you pay for?  I definitely don't.
  2. Storage is often the biggest bottleneck to perceived network performance.  Once again, all HPC storage is shared across users and the jobs they are running (there could be hundreds of parallel IO operations at any given time).
  3. HPC houses over 2PiB (2048TiB) of research data, almost wholly on 7200RPM NL-SAS disks.  Performance at that scale comes at an extremely high price - way above the HPC budget.
  4. JCU HPC filesystem (home directories) performance caps out at about 12Gb/s (theoretical).  Filesystem performance decreases with increasing inode (file) count.  Also, the smaller the file, the worse the maximum speed you'll see (see the sketch below for a common workaround).
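Because per-file overhead dominates small-file transfers, bundling many small files into a single archive before moving them usually helps considerably.  A minimal sketch (host and path names are placeholders, not real JCU endpoints):

    # Bundle a directory of many small files into one compressed archive...
    tar -czf results.tar.gz results/

    # ...then transfer one large file instead of thousands of small ones
    scp results.tar.gz username@login-node:projects/   # substitute your own host/path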

Ultimately, it really comes down to cost.  If you are unhappy with the performance provided by HPC, there are other options, though for most of them you will have to find the budget and/or time to set them up and use them.

 What IO performance should I expect when accessing HPC?

This is a very difficult question to answer; some of the factors involved are:

  1. Size of file.  Small files will never see high transfer speeds (even when there is no network involved).
  2. Number of concurrent IO requests active on the system/environment or number of concurrent users and jobs using the infrastructure.
  3. Time of day (related to 2).  The biggest factor to performance seems to be time of day - highest speeds are achieved outside normal working hours.
  4. Anti-virus software, encryption level, security devices, etc.  In a world where more people are having their accounts compromised, think very carefully before bypassing or not using such devices/methods.

For uploads of old research data to AWS, I see average speeds ranging between 10MB/s and 240MB/s.  At a given instant, though, I have seen transfer rates of less than 1MB/s.  Generally speaking, I don't like seeing <10MB/s, I'm happy with 30MB/s, and love seeing more.

Note:  There are ways/tools to improve network performance, however many come with a high price tag or risk.
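If you want a rough number for your own situation, time the transfer of a single large file - that isolates network/storage throughput from per-file overhead.  A quick sketch (sizes, hosts, and paths are placeholders):

    # Create a 1GiB test file on the HPC filesystem
    dd if=/dev/zero of=testfile bs=1M count=1024

    # Time pulling it to your machine; MB/s is roughly 1024 divided by the elapsed seconds
    time scp username@login-node:testfile .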

Alternatives

 What other options exist for my computational research?

There are numerous options available to JCU researchers:

  1. QCIF/QRIScloud - There is a short-term merit allocation scheme in place.  Pay-for-service options exist.
  2. NCI - There is a merit allocation scheme in place.  Pay-for-service options may exist.
  3. Public cloud (e.g., AWS or Azure) - This is a pay-for-service option in most cases.

If you are collaborating with researchers from other institutions, you may consider requesting access to their resources.
