The following points/suggestions should be considered when running jobs concurrently:
- HPRC storage was purchased in an attempt to meet capacity requirements (capacity was a critical issue at the time).
- If you are reading/writing large files (say, greater than 10 MB), limit the number of concurrent jobs you run to fewer than 10 at a time. Users running concurrent bzip2 commands are the cause of most 'slow storage' complaints.
- Avoid parallel utilities such as pbzip2 when running concurrent jobs.
- Create a separate, sequential job for all your post-run I/O transactions (such as those mentioned above).
- Compression tools provide little benefit on binary data files; it is generally better to leave binary files uncompressed.
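As a sketch of the "separate, sequential post-run job" advice above, a single cleanup script can compress result files one at a time instead of calling bzip2 inside every concurrent job. The run_*.out file names here are hypothetical placeholders, not names used by HPRC:

```shell
#!/bin/sh
# Sketch only: compress result files sequentially in one post-run job,
# rather than running bzip2 inside each of many concurrent jobs.
# The run_*.out names are hypothetical result files created for the demo.
printf 'example result data\n' > run_1.out
printf 'more result data\n'    > run_2.out

for f in run_*.out; do
    bzip2 -f "$f"   # single-threaded; one file at a time
done
```

Running the compression serially like this keeps the load on shared storage low while the compute jobs themselves stay I/O-light.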
Finally, a warning for zip users: do not use zip when you are working with more than 2 GB of files, or you may have difficulty with unzip. If you have already created a large zip file, one work-around that has worked for me is to perform the unzip on macOS; I cannot guarantee this work-around is robust. Generally, I'd recommend using
tar for grouping multiple files into a single file, and bzip2 for compression if wanted or required. Note that compression can be done as part of the tar command. For example (mydir is a placeholder directory name):

    tar -cjf mydir.tar.bz2 mydir

will tar and compress (with bzip2) the contents of a directory into a single file.
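A minimal round trip of the tar + bzip2 approach described above, with placeholder file and directory names, looks like this:

```shell
#!/bin/sh
# Round trip: create, inspect, and extract a bzip2-compressed tar archive.
# mydir and data.txt are placeholder names for this sketch.
mkdir -p mydir
printf 'hello\n' > mydir/data.txt

tar -cjf mydir.tar.bz2 mydir    # -c create, -j bzip2 compression, -f archive name
tar -tjf mydir.tar.bz2          # -t list contents without extracting

rm -r mydir                     # remove the original directory
tar -xjf mydir.tar.bz2          # -x extract; restores mydir/data.txt
```

Listing with -t before extracting is a cheap way to confirm the archive is intact without writing anything to storage.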