 HPC Storage Appliances

The following table shows bulk storage appliances purchased for JCU research.

Owner    Operator    Usable Size    Vendor Model    End-of-Life
JCU      JCU         600TiB         DELL SC4020     Q4-2021
JCU      QCIF        516TiB         DDN SFA7990E    Q1-2024

The DELL storage array is fully populated, so its capacity cannot be increased.  The DDN storage array can be expanded if budget can be found/provided.

 In-server Storage

Each HPC login/compute node has internal SSD storage, as follows:

Server(s) purpose      Raw size
Login/Compute nodes    480GB (RAID-1)
GPU servers            240GB (RAID-1) + 960GB (RAID-0)
ESXi servers           2400GB (RAID-0)

Storage inside all servers could be increased, but there is little evidence supporting such an upgrade (Apr-2020).  Note: GPU nodes are likely to be decommissioned in 2020.

 HPC Filesystems

The following filesystems are configured specifically for JCU research use:

Filesystem    Size      Shared via    Details
/gpfs01       516TiB    GPFS          Cache for ARDC/RDSI/QCIF approved allocations
/home         512TiB    NFS           Researchers' home directories
/scratch      80TiB     NFS           Scratch space, shared using NFS
/sw           200GiB    NFS           Software (read-only for researchers)
/tmp          300GiB    -             Local (to node) SSD scratch space

For efficiency across JCU, it is best if your computational research jobs are run under /tmp or /scratch.  Once a job has completed, only files that are likely to have long-term value should be moved to your home directory or ARDC/RDSI/QCIF approved allocation.
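
As a rough illustration of this workflow, the sketch below stages an input file into node-local /tmp, runs a job there, and copies back only the output worth keeping.  The paths and the analysis command are hypothetical placeholders; substitute your own inputs, outputs, and executable.

#!/usr/bin/env python3
"""Minimal sketch of the "work in /tmp, keep results in /home" pattern above."""
import shutil
import subprocess
import tempfile
from pathlib import Path

INPUT_FILE = Path.home() / "inputs" / "data.csv"    # hypothetical input file
HOME_RESULTS = Path.home() / "results"              # long-term home-directory location

# Work inside node-local /tmp so the heavy I/O never touches /home over NFS.
with tempfile.TemporaryDirectory(dir="/tmp") as workdir:
    work = Path(workdir)
    shutil.copy2(INPUT_FILE, work / "data.csv")

    # Hypothetical analysis step; replace with your real command.
    subprocess.run(["my_analysis", "data.csv", "-o", "summary.txt"],
                   cwd=work, check=True)

    # Copy back only files with long-term value; bulky intermediates stay in /tmp.
    HOME_RESULTS.mkdir(parents=True, exist_ok=True)
    shutil.copy2(work / "summary.txt", HOME_RESULTS / "summary.txt")
# The temporary working directory (and everything left in it) is removed here.

The same pattern applies when working under /scratch; only the working-directory location changes.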


 Filesystem Performance

JCU's DELL storage array is very much entry-level - purchased to provide capacity, not performance.

While the DDN storage array has much higher performance potential, its performance is limited by the number of disks (64) and SSDs (12) installed in it.

The /gpfs01 filesystem is a MeDiCI cache - the primary copy of all data is held by QCIF (at sites in Brisbane and Springfield).  There is no guaranteed recall time for offline files (files not currently on JCU cache space).  Outages at QCIF, or network issues between JCU and QCIF, will lead to IO errors whenever you try to access/use offline files.  Retrying your task(s) at a later time is the suggested workaround.
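
If offline files cause your job to fail with IO errors, the simplest recovery is to rerun later.  A minimal retry sketch is shown below; the file path and retry interval are illustrative only, and a long outage at QCIF may still require trying again much later.

import time
from pathlib import Path

def read_with_retries(path, attempts=5, wait_seconds=600):
    """Retry a read that may fail while a /gpfs01 file is still offline."""
    for attempt in range(1, attempts + 1):
        try:
            return Path(path).read_bytes()
        except OSError as exc:
            print(f"attempt {attempt} failed ({exc})")
            if attempt < attempts:
                time.sleep(wait_seconds)
    raise RuntimeError(f"{path} was still unavailable after {attempts} attempts")

# data = read_with_retries("/gpfs01/Q0033/some/offline/file.nc")   # hypothetical path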

Perceived filesystem performance decreases as the inode count (number of files) increases.  In mid-2019, HPC held over 500,000,000 inodes across 8 filesystems - fewer than 10,000,000 inodes per filesystem is desirable.
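
To see how many inodes a directory tree is consuming, you can simply count its entries.  A small sketch follows; it counts files, directories, and symlinks (one inode each), so hard-linked files are counted once per name.

import os

def count_inodes(top):
    """Count directory entries below 'top' - files, directories, and symlinks each use an inode."""
    total = 1                                   # 'top' itself
    for _path, dirnames, filenames in os.walk(top, followlinks=False):
        total += len(dirnames) + len(filenames)
    return total

if __name__ == "__main__":
    home = os.path.expanduser("~")
    print(f"{home}: roughly {count_inodes(home):,} inodes in use")

Bundling directories of many small files into tarballs (see Research Data Management below) is the usual way to bring a large inode count down.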

 Default User Quotas

As of 1-Jan-2020, the following default quotas are configured:

Filesystem    Account type    Default Quotas
/home         JCU             5TiB & 250,000 inodes
/scratch      JCU             5TiB & 1,000,000 inodes
/scratch      External        100GiB & 100,000 inodes
/scratch      Delegate        100GiB & 100,000 inodes

A research data management strategy is being considered.  Until this strategy is in place, individual quotas may be increased.  Longer term, there may be a merit-approval and/or purchasing process for individual users obtaining an increased quota.

There is no data protection in place for external or delegate accounts.  External accounts are generally allocated for 12 months or less, while delegate accounts are intended for short-term use (a few months at most).  Both can be problematic when it comes to identifying data ownership in the long term.  If you are collaborating with people using external/delegate accounts and you want long-term data retention, the data should be housed in a space allocated to a JCU staff member or postgraduate student.  HPC staff can configure directory/sub-directory access for people you are collaborating with, within your home directory or ARDC/RDSI/QCIF allocation.

 Project (ARDC/RDSI/QCIF) Quotas

Many JCU researchers have applied for, and been awarded, an RDSI storage allocation for their research.  There is a default inode quota of 1,000,000 on all allocations; however, increases up to the QCIF upper limit of 10,000,000 are possible.

AllocationQ0033Q0036Q0037Q0042Q0043Q0044Q0050Q0114Q0123Q0124Q0125Q0145Q0148Q0149Q0150
Cache Quota (TiB)5,68,1248,7055,595,620,215,64,52,24,54,58,101,15,65,6
QCIF Quota (TiB)1520102410030100121025103512012
inode Quota10M1M11M11M1M10M1M10M1M1M1M2M1M1M1M

AllocationQ0166Q0171Q0184Q0188Q0189Q0195Q0199Q0200Q0201Q0202Q0203Q0208Q0210Q0213Q0214
Cache Quota (TiB)5,51,18,105,65,60.5,0.55,527,304,54,54,58,1040.420.225,30
QCIF Quota (TiB)514516200.55151010101501500.240
inode Quota1M1M1M1M1M5M1M20M1M1M1M1M2M1M10M

AllocationQ0217Q0219Q0220Q0222Q0230Q0252Q0261Q0262Q0266Q0269Q0308Q0309Q0365Q0477Q0478
Cache Quota (TiB)2,28,1060,624,55,62,28,100.5,0.51,14,50.5,0.50.5,0.51,12,23,4
QCIF Quota (TiB)250608202600.51100.50.5124
inode Quota5M1M10M1M1M1M1M1M1M4M1M1M1M1M1M

AllocationQ0634Q0638Q0750Q1116Q2024Q2025Q2026Q2027Q2028Q2029Q2030Q2031Q2536

Cache Quota (TiB)4,57,84,58,10






3,3


QCIF Quota (TiB)715540155035120525253T40T

inode Quota1M1M1M1M5M10M3M9M10M10M10M10M10M

Contact chantelle.pinnington@jcu.edu.au if you need an increase in quota or are interested in obtaining a new allocation.  ARDC/RDSI/QCIF allocations are recommended for any researcher or research group that has a storage requirement in excess of the default HPC user quotas.

Q2031 – 3TiB, 10M inodes

 Responsible Research Guidelines (Storage)

HPC provides a location for the primary/trusted copy of your research data only.  Other examples of responsible locations for the primary copy of your research data are:

  1. Public cloud - e.g., OneDrive and AWS.
  2. Education/Private Cloud - e.g., AARNet CloudStor and ARDC services.
  3. Institutional/Other facilities that offer equivalent or higher level protection mechanisms than JCU HPC.

Personal computing devices (e.g., PCs, USB disks, personal NAS appliances) are definitely NOT a responsible location for the primary/trusted copy of your research data.  From a technical perspective, this is because changes can occur on disks without being noticed and corrected (e.g., search the internet for "silent data corruption").  One of the bigger issues raised with authorities is the inability to verify/validate research when personal computing devices are involved (e.g., when they house the primary copy of the data).
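
Silent corruption can be detected by recording checksums and re-verifying them later.  A minimal sketch using SHA-256 is shown below; the dataset path is a hypothetical example.

import hashlib
from pathlib import Path

def write_sha256_manifest(top, manifest_name="MANIFEST.sha256"):
    """Record a SHA-256 checksum for every file under 'top'.

    Re-running this later and comparing manifests reveals files that have
    changed without anyone touching them (silent data corruption).
    """
    top = Path(top)
    with open(top / manifest_name, "w") as manifest:
        for path in sorted(p for p in top.rglob("*") if p.is_file()):
            if path.name == manifest_name:
                continue
            digest = hashlib.sha256()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    digest.update(chunk)
            manifest.write(f"{digest.hexdigest()}  {path.relative_to(top)}\n")

# write_sha256_manifest("/path/to/your/dataset")   # hypothetical location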

 Research Data Management

The task of research data management is far more likely to be successful if data owners/curators are involved, as HPC staff may have little/no understanding of the data.

Archiving:  To cap costs of providing storage for researchers, JCU is archiving old files to AWS Glacier Deep Archive.  A Service-Now request can be submitted to recall files from this archive.

Classification:  A data classification exercise began in 2019 with the goal of identifying and removing (from HPC) files that are not permitted on HPC storage, files that can be archived, etc.

Data Protection:  JCU protects research data by replicating it to a remote location.  This replication does not protect against accidental file deletion.  Not all files will be protected (e.g., user-installed software).  Should any researcher or research group desire backup-level protection for their data on HPC, ICT can provide technical advice - however, you will be responsible for paying for the solution.

Quotas:  Researchers are expected to make every effort to work within the default quota limits.  Quota increases are more likely to be approved if your files are all research data and there is no duplication.  Software (archivemount) has been installed on all HPC login/compute nodes to allow you to access/modify files within a tarball.  HPC staff are willing to assist you with data management tasks.
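
One practical way to stay inside an inode quota is to bundle completed directories of many small files into a single tarball (one inode), which archivemount can then present back to you as a browsable filesystem.  A minimal packing/listing sketch using Python's tarfile module is shown below; the directory and tarball names are hypothetical.

import tarfile
from pathlib import Path

def pack_directory(src_dir, tarball):
    """Bundle a finished directory of small files into one compressed tarball."""
    src_dir = Path(src_dir)
    with tarfile.open(tarball, "w:gz") as tar:
        tar.add(src_dir, arcname=src_dir.name)

def list_tarball(tarball):
    """List member names without unpacking anything."""
    with tarfile.open(tarball, "r:gz") as tar:
        return tar.getnames()

# pack_directory("run_0042", "run_0042.tar.gz")   # hypothetical job directory
# print(list_tarball("run_0042.tar.gz"))

Once the tarball has been verified, the original directory can be removed to release its inodes.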


 IT Security

JCU HPC staff are working to turn off all desktop file shares (SMB protocol).  Alternative options for researchers are:

  1. Many national/international research areas already have mature facilities/services for housing, protecting, and/or publishing your research data.  Ideally, these options should be your first choice.
  2. AARNet CloudStor provides a storage option that can be seen by a greater spread of personal computing devices than HPC.
  3. OneDrive provides up to 5TiB of storage to all JCU staff/students.  Like CloudStor, this option is compatible with a very large range of personal computing devices.
  4. Australian Research Data Commons (ARDC) will continue to provide resources for Australian researchers.
  5. Several researchers and HPC staff are involved in a trial of Mountain Duck software as a replacement for traditional file shares (to personal computing devices).  So far, all experiences have been positive, so this option is likely to be made available to anyone interested soon.
  6. Public cloud resources are definitely a good option - control of usage becomes extremely important in this case.

Using JCU's VPN service is the safest way to access HPC resources when not on a JCU campus.

Currently, there is a JCU firewall exception that allows you to connect to HPC login nodes by SSH from off-campus - this exception may be removed in future.

HPC systems are automatically patched and scheduled reboots will occur.  Generally speaking, HPC staff will try to avoid killing your jobs - but we may be directed to do work that will kill jobs/connections.


ESSENTIAL UNDERSTANDING (Policy)

All JCU researchers need to read and understand the information within https://www.jcu.edu.au/policy/information-and-communications-technology/information-communication-technology-acceptable-use-policy
