
Storage

The following suggestions should be considered when running jobs concurrently:

  • HPRC storage was purchased in an attempt to meet capacity requirements (capacity was a critical issue at the time).
  • If you are reading or writing large files (say, greater than 10 MB), limit the number of concurrent jobs you run to fewer than 10 at a time. Users running concurrent tar, zip, gzip, and/or bzip2 commands cause most of the incidents in which people complain about 'slow storage'.
  • Avoid using parallel utilities such as pbzip2 when running concurrent jobs.
  • Create a separate, sequential job for all your post-run I/O transactions (such as the tar/compression commands mentioned above); see the sketch after this list.
  • Compression tools provide little benefit on binary data files; it is generally better to leave binary files uncompressed.
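
A minimal sketch of such a sequential post-run I/O job follows, assuming a Slurm-style scheduler (adapt the directives if your cluster uses a different batch system); the job name, time limit, and directory names are placeholders:

#!/bin/bash
# Hypothetical Slurm directives: a single sequential task and a modest time limit.
#SBATCH --job-name=post_io
#SBATCH --ntasks=1
#SBATCH --time=01:00:00

# Archive and compress the results one directory at a time, rather than
# running several tar/gzip commands concurrently.
cd "$SCRATCH"                     # assumes your site defines $SCRATCH
tar cfz results.tar.gz results/   # single-threaded gzip keeps the storage load low

Submit it with a dependency on the compute job, for example sbatch --dependency=afterok:<jobid> post_io.sh, so the archiving starts only after the computation finishes. Similarly, if you submit many jobs as a Slurm job array, a throttle such as #SBATCH --array=1-100%10 caps how many run at once, which satisfies the fewer-than-10 guideline above.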

Finally, a warning for zip users: do not use zip when you are working with more than 2 GB of files, or you may have difficulty with unzip (archives that large require the Zip64 format extension, which older versions of unzip do not support). If you have already created a large zip file, one solution that has worked for me is to perform the unzip on OS X (Mac), though I cannot guarantee this work-around is robust. Generally, I recommend using tar for grouping multiple files into a single file, with gzip or bzip2 for compression if desired or required. Note that compression can be done as part of the tar command, for example:

tar cfz proj1.tar.gz proj1/

will tar and compress the contents of the proj1 subdirectory.
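
To extract such an archive later, the matching command is:

tar xfz proj1.tar.gz

which recreates the proj1 subdirectory under the current directory.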
