
Before reporting performance issues with HPRC fileshares, please realize that there is very little HPRC staff can do in most cases. We will not kill HPRC cluster jobs to free up bandwidth for fileshare use. Most fileshare use is interactive, so a slowdown costs minutes or hours of productivity. Killing an HPRC cluster job could mean the loss of months of work to a researcher - consider a PhD student with 1 month left to submission whose final job run, started 3 months earlier, was just killed by HPRC staff, and you will understand why we avoid killing jobs.

A number of JCU researchers based in Cairns have complained about the performance of fileshares. There are a number of factors to consider when talking about the performance of fileshares over the WAN (versus the LAN). This document has been written so that you may understand some of the key factors involved in slow file transfers.

  1. JCU has a 2.5Gb/s link between the Townsville and Cairns campuses; however, some 1Gb/s network hardware remains in use. ITR/HPC staff have tested the link (using iperf) and confirmed that 970Mb/s is possible outside working hours.
    1. The active HPC NAS server has, on several occasions, been observed to send data to UQ at over 700Mb/s (same link characteristics).
    2. WAN optimization (on your laptop/desktop) will probably be needed to obtain high-speed file transfers. Tuning the TCP window size based on expected latency is key to this (see the window-size sketch after this list). However, if network usage changes the link latency, expect the bandwidth you obtain to change as well.
  2. The smaller the file, the slower the maximum rate of transfer (see the small-file sketch after this list). Consider this a "fact of life" - it also holds true outside the realms of IP networking.
  3. Almost all network links back to JCU data-centres are oversubscribed - e.g., a 10Gb/s link providing networking for 96 systems with 1Gb/s connectivity.
  4. There are many protocols that can be used for high-bandwidth (minimizing time to completion) file transfers, all of which have different performance characteristics.
    1. A file transfer speed of 30MB/s (240Mb/s) represents about 25% of a 1Gb/s network connection. Given the number of staff/students at JCU, you should be extremely happy if you get data transfer rates this high during work hours.
    2. FileZilla is the recommended method for connecting desktop/laptop computers to HPRC storage. It has been tested (by ITR/HPC staff) to achieve file transfer rates in excess of 30MB/s at times (see above point).
    3. The CIFS (Windows filesharing) protocol is not generally optimized for WAN traffic. If you choose to use CIFS, the maximum transfer speed you can expect is about 10MB/s to the HPRC Linux NAS servers. Note that optimizing WAN performance will most likely lead to a decrease in LAN performance due to the differing latencies.
    4. Other protocols exist for more demanding transfer tasks - e.g., Aspera or GridFTP. HPRC staff will be investigating these and similar technologies in 2013.
  5. As at 14-Feb-2013, HPRC storage contained slightly over 500TB of researcher files. There is no way to back up this amount of data. HPRC have been using HSM (Hierarchical Storage Management) software for over 10 years to meet government guidelines/legislation for research data retention.
    1. Our HSM system places all files onto LTO-5 tape media (2 copies). Files also remain on disk until a space and age policy determines that these files would be better off on tape.
    2. From a client perspective, when you request to download a file that is on tape, your client will not see any data transfer until the file has been recalled (un-migrated) to HPRC disk. Some file transfer protocols will time out while waiting for the data to be recalled from tape; just wait a few minutes and try again (see the retry sketch after this list).
    3. Individually, our LTO-5 tape drives are faster than our disks for streaming data. However, tape drives incur a significant overhead for loading, seeking, and unloading.
    4. The HPRC tape library only contains 8 tape drives. This means only 8 concurrent read requests (or 4 write requests, since each write produces 2 tape copies) can be handled at any one time. Extended delays in file retrieval will almost always be due to tape drive contention.
  6. The HPRC compute cluster contains 34 nodes, each connected to the HPRC NAS servers at 40Gb/s. The compute cluster has more than enough power to bring our entry-level storage arrays to a crawl.
    1. Users should understand that the HPRC compute cluster is the primary reason HPRC exists. Fileshares are available for convenience.
  7. Enterprise storage is very expensive - about $700/TB in the last HPRC infrastructure upgrade, all 7200RPM disks.
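
To illustrate the window-size tuning mentioned in point 1.2 above, the short Python sketch below computes the bandwidth-delay product - the minimum TCP window needed to keep a link full at a given round-trip time. The latency figures used are illustrative assumptions only, not measured Townsville-Cairns values.

    # Bandwidth-delay product (BDP): the TCP window must hold at least
    # bandwidth * round-trip time worth of data to keep a link full.
    # The RTT values below are assumptions for illustration only.

    def required_window_bytes(bandwidth_bps, rtt_seconds):
        """Minimum TCP window (in bytes) needed to saturate a link."""
        return bandwidth_bps * rtt_seconds / 8

    for rtt_ms in (5, 20, 50):
        window = required_window_bytes(1_000_000_000, rtt_ms / 1000)
        print(f"1Gb/s link, {rtt_ms}ms RTT -> window >= {window / 1024:.0f} KiB")

    # If congestion raises the actual latency while the window stays fixed,
    # achievable throughput falls to roughly window / RTT.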
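
The effect described in point 2 (small files transfer slowly) can be seen with a little arithmetic. The sketch below assumes a fixed 50ms per-file overhead and a 30MB/s link; both numbers are assumptions for illustration, not measured values.

    # Effective throughput when each file carries a fixed per-file overhead
    # (protocol round trips, open/close, metadata). The 50ms overhead and
    # 30MB/s link rate are assumed values for illustration only.

    def effective_rate_mb_s(file_size_mb, link_rate_mb_s=30.0, per_file_overhead_s=0.05):
        transfer_time_s = file_size_mb / link_rate_mb_s + per_file_overhead_s
        return file_size_mb / transfer_time_s

    for size_mb in (0.1, 1, 10, 100, 1000):
        print(f"{size_mb:>7} MB file -> ~{effective_rate_mb_s(size_mb):.1f} MB/s effective")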
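
For the tape-recall time-outs mentioned in point 5.2, the approach is simply to wait and retry. The sketch below shows one way to automate that; fetch_file is a placeholder for whatever transfer call you actually use, and the 3-minute wait is an assumed figure, since recall times depend on tape drive contention.

    # Retry a transfer that times out while the file is being recalled from
    # tape. fetch_file is a placeholder callable; the wait time is an
    # assumption - recall times vary with tape drive contention.
    import time

    def download_with_retries(fetch_file, attempts=5, wait_seconds=180):
        for attempt in range(1, attempts + 1):
            try:
                return fetch_file()
            except TimeoutError:
                if attempt == attempts:
                    raise
                print(f"Attempt {attempt} timed out; waiting {wait_seconds}s for tape recall")
                time.sleep(wait_seconds)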

Recommendations

  1. Try using different software (protocol).
  2. Start your big file transfers outside normal working hours.
  3. Try to get access to a wired network rather than using wireless networks.
  4. Avoid practices such as HD video editing directly on an HPRC fileshare - it will never be a good experience. An alternative in this case is to download the source video from HPRC, perform your video editing while the file is on your local disk(s), then upload the result to HPRC storage (see the workflow sketch after this list).
  5. Remember, having more than 1 copy of your research data is a wise move.
  6. RDSI (Research Data Storage Infrastructure) is a national programme to provide massive data storage facilities across Australia. They provide merit-allocated storage to Australian researchers. After a successful application for an RDSI allocation you may receive a dedicated filesystem or storage array for your requirements. Aspects such as WAN optimization for your specific purposes could be investigated in that case.
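
As a sketch of the download / edit locally / upload workflow from recommendation 4: the host name and paths below are placeholders rather than real HPRC endpoints, and scp is used only as an example transfer tool (FileZilla or another SFTP client works equally well).

    # Download the source, work on a local copy, then upload the result.
    # Host name and paths are placeholders - substitute your own.
    import subprocess

    source_remote = "username@hprc-storage.example.edu.au:/path/to/project/source.mov"
    source_local = "/scratch/source.mov"
    result_local = "/scratch/edited.mov"
    result_remote = "username@hprc-storage.example.edu.au:/path/to/project/edited.mov"

    subprocess.run(["scp", source_remote, source_local], check=True)  # 1. download the source video
    # 2. ... edit the local copy with your video software ...
    subprocess.run(["scp", result_local, result_remote], check=True)  # 3. upload the finished result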