The following table shows bulk storage appliances purchased for JCU research.
|Owner||Operator||Usable Size||Vendor Model||End-of-Life|
The DELL storage array is fully populated, so its capacity cannot be increased. The DDN storage can be expanded if budget is available.
Each HPC server has internal SSD storage, as follows:
|Server(s) purpose||Raw size|
|Login/Compute nodes||480GB (RAID-1)|
|GPU servers||240GB (RAID-1) + 960GB (RAID-0)|
|ESXi servers||2400GB (RAID-0)|
Storage inside all servers could be increased, but there is currently little evidence of a need for such an upgrade (Apr-2020). Note: GPU nodes are likely to be decommissioned in 2020.
The following filesystems are configured specifically for JCU research use:
|Size||Type||Purpose|
|516TiB||GPFS||Cache for ARDC/RDSI/QCIF approved allocations|
|512TiB||NFS||Researchers' home directories|
|80TiB||NFS||Scratch space|
|200GiB||NFS||Software (read-only for researchers)|
|300GiB||-||Local (to node) SSD scratch space|
For efficiency across JCU, computational research jobs should be run under /tmp or /scratch. Once completed, only files that are likely to have long-term value should be moved to your home directory or ARDC/RDSI/QCIF approved allocation.
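A minimal sketch of that workflow in Python, assuming a per-user directory under /scratch and hypothetical file names (adjust paths to your own layout):

```python
import getpass
import shutil
from pathlib import Path

# Assumed layout: a per-user working directory under /scratch.
scratch = Path("/scratch") / getpass.getuser() / "myjob"
scratch.mkdir(parents=True, exist_ok=True)

# ... run the IO-heavy parts of the job inside `scratch` ...
result = scratch / "summary.csv"          # hypothetical job output
result.write_text("placeholder output\n")

# Keep only files with long-term value; copy them to your home directory.
keep_dir = Path.home() / "results"
keep_dir.mkdir(exist_ok=True)
shutil.copy2(result, keep_dir / result.name)

# Clean up scratch space once the job is finished.
shutil.rmtree(scratch)
```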
JCU's DELL storage array is an entry-level unit, purchased to provide capacity rather than performance.
While the DDN storage array has much higher performance potential, its performance is limited by the number of disks (64) and SSDs (12) installed in it.
The /gpfs01 filesystem is a MeDiCI cache; the primary copy of all data is held by QCIF (at locations in Brisbane and Springfield). There is no guaranteed recall time for offline files (files not currently on JCU cache space). Outages at QCIF, or network issues between JCU and QCIF, will lead to IO errors whenever you try to access/use offline files. If this happens, retry your task(s) at a later time.
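Because an offline file surfaces as an IO error rather than a delay, scripted workloads can wrap reads in a simple retry. A minimal sketch, where the attempt count and wait time are illustrative values, not site policy:

```python
import time

def read_with_retry(path, attempts=3, wait_seconds=600):
    """Read a file that may be offline in the MeDiCI cache.

    Waiting between attempts gives the cache time to recall the
    primary copy from QCIF; tune attempts/wait_seconds to taste.
    """
    for attempt in range(1, attempts + 1):
        try:
            with open(path, "rb") as f:
                return f.read()
        except OSError as err:
            if attempt == attempts:
                raise  # still failing; give up and report the error
            print(f"attempt {attempt} failed ({err}); retrying in {wait_seconds}s")
            time.sleep(wait_seconds)
```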
Perceived filesystem performance decreases as inode (file) count grows. In mid-2019, HPC held over 500,000,000 inodes across 8 filesystems; keeping each filesystem below 10,000,000 inodes is desirable.
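To gauge your own contribution to a filesystem's inode count, counting the files and directories under your top-level directories is enough. A minimal sketch (symbolic links are counted, not followed):

```python
import os

def count_inodes(top):
    """Roughly count the inodes (files, directories, links) under top."""
    total = 1  # the top directory itself
    for _dirpath, dirnames, filenames in os.walk(top):
        total += len(dirnames) + len(filenames)
    return total

print(count_inodes(os.path.expanduser("~")))
```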
As of 1-Jan-2020, the following default quotas are configured:
|Filesystem||Account type||Default Quotas|
|-||JCU||5TiB & 250,000 inodes|
|-||-||5TiB & 1,000,000 inodes|
|-||-||100GiB & 100,000 inodes|
|-||-||100GiB & 100,000 inodes|
A research data management strategy is being considered. Until that strategy is in place, individual quotas may be increased on request. Longer term, there may be a merit-approval and/or purchasing process for individual users to obtain an increased quota.
There is no data protection in place for external or delegate accounts. External accounts are generally allocated for at most 12 months of use, while delegate accounts are intended for short-term use (a few months at most). Both can be problematic when it comes to long-term identification of data ownership. If you are collaborating with people using external/delegate accounts and you want long-term data retention, the data should be housed in a space allocated to a JCU staff member or postgraduate student. HPC staff can configure directory/sub-directory access for your collaborators within your home directory or ARDC/RDSI/QCIF allocation.
Many JCU researchers have applied for, and been awarded, an RDSI storage allocation for their research. There is a default inode quota of 1,000,000 on all allocations; however, increases up to the QCIF upper limit of 10,000,000 are possible. Note: inode quotas above 10M in the table below are temporary; the data owners will need to bring their inode usage below 10M.
|Cache Quota (TiB)||5,6||8,12||48,70||55,59||5,6||20,21||5,6||4,5||2,2||4,5||4,5||8,10||1,1||5,6||5,6|
|QCIF Quota (TiB)||15||20||1024||100||30||100||12||10||2||5||10||35||1||20||12|
|Cache Quota (TiB)||5,5||1,1||8,10||5,6||5,6||0.5,0.5||5,5||27,30||4,5||4,5||4,5||8,10||40,42||0.2,0.2||25,30|
|QCIF Quota (TiB)||5||1||45||16||20||0.5||5||15||10||10||10||150||150||0.2||40|
|Cache Quota (TiB)||2,2||8,10||60,62||4,5||5,6||2,2||8,10||0.5,0.5||1,1||4,5||0.5,0.5||0.5,0.5||1,1||2,2||3,4|
|QCIF Quota (TiB)||2||50||60||8||20||2||60||0.5||1||10||0.5||0.5||1||2||4|
|Cache Quota (TiB)||4,5||7,8||4,5||8,10||3,3|
|QCIF Quota (TiB)||7||15||5||40||15||50||35||120||5||25||25||3||40|
Email email@example.com if you need an increase in quota or are interested in obtaining a new allocation. ARDC/RDSI/QCIF allocations are recommended for any researcher or research group with a storage requirement in excess of the default HPC user quotas.
Q2031 – 3TiB, 10M inodes
HPC provides a location for the primary/trusted copy of your research data, but it is not the only such location. Other examples of responsible locations for the primary copy of your research data are:
- Public cloud - e.g., OneDrive and AWS.
- Education/Private Cloud - e.g., AARNet CloudStor and ARDC services.
- Institutional/other facilities that offer protection mechanisms equivalent to, or better than, JCU HPC.
Personal computing devices (e.g., PCs, USB disks, personal NAS appliances) are definitely NOT a responsible location for the primary/trusted copy of your research data. From a technical perspective, this is because changes can occur on disk without being noticed and corrected (e.g., search the internet for "silent data corruption"). One of the bigger issues that has been raised with authorities is the inability to verify/validate research when personal computing devices are involved (e.g., when they house the primary copy of data).
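One practical defence against silent data corruption is to record checksums for your primary copies and re-verify them later; a digest that changes for a file you did not modify signals corruption. A minimal sketch, assuming a hypothetical data/ directory:

```python
import hashlib
from pathlib import Path

def sha256sum(path, chunk_size=1 << 20):
    """Return the SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Record these once, store them safely, and re-run periodically to compare.
for path in sorted(Path("data").rglob("*")):   # hypothetical directory
    if path.is_file():
        print(sha256sum(path), path)
```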
The task of research data management is far more likely to be successful if data owners/curators are involved, as HPC staff may have little/no understanding of the data.
Archiving: To cap the cost of providing storage for researchers, JCU is archiving old files to AWS Glacier Deep Archive. A ServiceNow request can be submitted to recall files from this archive.
Classification: A data classification exercise began in 2019 with the goal of identifying and removing (from HPC) files that are not permitted on HPC storage, files that can be archived, etc.
Data Protection: JCU protects research data by replicating it to a remote location. This replication does not protect against accidental file deletion, and not all files will be protected (e.g., user-installed software). Should any researcher or research group desire backup-level protection for their data on HPC, ICT can provide technical advice; however, you will be responsible for paying for the solution.
Quotas: Researchers are expected to make every effort to work within the default quota limits. Quota increases are more likely if your files are all research data and there is no duplication. Software (archivemount) has been installed on all HPC login/compute nodes to allow you to access/modify files within a tarball without extracting it. HPC staff are willing to assist you with data management tasks.
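Packing many small files into one tarball is an easy way to cut inode usage; archivemount then lets you work with the archive contents in place. As a portable illustration of the same idea, Python's standard tarfile module can list and read members directly (the archive and member names below are hypothetical):

```python
import tarfile

# One tarball holding thousands of small files costs a single inode on disk.
with tarfile.open("results.tar.gz", "r:gz") as tar:      # hypothetical archive
    for member in tar.getmembers()[:5]:                  # peek at the first few
        print(member.name, member.size)
    fh = tar.extractfile("results/run01.log")            # hypothetical member
    if fh is not None:
        print(fh.read().decode())
```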
JCU HPC staff are working to turn off all desktop file shares (SMB protocol). Alternative options for researchers are:
- Many national/international research areas already have mature facilities/services for housing, protecting, and/or publishing your research data. Ideally, these options should be your first choice.
- AARNet CloudStor provides a storage option that can be seen by a greater spread of personal computing devices than HPC.
- OneDrive provides up to 5TiB of storage to all JCU staff/students. Like CloudStor, this option is compatible with a very large range of personal computing devices.
- Australian Research Data Commons (ARDC) will continue to provide resources for Australian researchers.
- Several researchers and HPC staff are involved in a trial of Mountain Duck software as a replacement for traditional file shares (to personal computing devices). So far, all experiences have been positive, so this option is likely to be made available to anyone interested soon.
- Public cloud resources are definitely a good option, although control of usage becomes extremely important in this case.
Using JCU's VPN service is the safest way to access HPC resources when not on a JCU campus.
Currently, a JCU firewall exception allows you to connect to HPC login nodes via SSH from off-campus; this exception may be removed in the future.
HPC systems are patched automatically, and scheduled reboots will occur. Generally speaking, HPC staff will try to avoid killing your jobs, but we may be directed to do work that will kill jobs/connections.
ESSENTIAL UNDERSTANDING (Policy)
All JCU researchers need to read and understand the information at https://www.jcu.edu.au/policy/information-and-communications-technology/information-communication-technology-acceptable-use-policy