Storage
The following points/suggestions should be considered when running jobs concurrently:
- HPRC storage was purchased in an attempt to meet capacity requirements (capacity was a critical issue at the time).
- If you are reading or writing large files (say, larger than 10 MB), limit the number of jobs you run concurrently to fewer than 10. Users running concurrent tar, zip, gzip, and/or bzip2 commands are the cause of most incidents where people complain about 'slow storage'.
- Avoid parallel utilities such as pbzip2 when running concurrent jobs.
- Create a separate, sequential job for all your post-run I/O operations (such as the archiving and compression commands mentioned above).
- Compression tools don't provide much benefit when working on binary data files - it's generally better to leave binary files uncompressed.
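The suggestion above to move post-run I/O into its own sequential job can be sketched as a small shell function. This is only an illustration: the run_* directory naming and the archive_runs function are hypothetical, and the scheduler directives needed to submit it as a job are left out because they depend on the cluster.

```shell
#!/bin/sh
# Sketch: archive each run directory one after another. Nothing is sent to
# the background, so only one tar/gzip stream touches storage at a time.
# The run_* naming convention is a hypothetical example.
archive_runs() {
    results_dir=$1
    for run in "$results_dir"/run_*/; do
        [ -d "$run" ] || continue          # skip if the pattern matched nothing
        name=$(basename "$run")
        # -C keeps the paths inside the archive relative to results_dir
        tar -czf "$results_dir/$name.tar.gz" -C "$results_dir" "$name"
    done
}

# Small demonstration in a throwaway directory
demo=$(mktemp -d)
mkdir "$demo/run_1" "$demo/run_2"
echo "sample output" > "$demo/run_1/out.txt"
echo "sample output" > "$demo/run_2/out.txt"
archive_runs "$demo"
ls "$demo"
```

Because the loop waits for each tar to finish before starting the next, the load on storage stays at one archive stream no matter how many run directories there are.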
Finally, a warning for zip users: do not use zip when you are working with more than 2 GB of files, because you may have difficulty with unzip. If you have already created a large zip file, one solution that has worked for me is to perform the unzip on OS X (Mac). I cannot guarantee that this workaround is robust.
Generally, I'd recommend using tar to group multiple files into a single file. Use gzip or bzip2 for compression, if wanted or required. Note that compression can be done as part of the tar command, for example:
tar cfz proj1.tar.gz proj1/
will tar and compress the contents of a proj1 subdirectory.
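For completeness, here is a hedged sketch of the same workflow end to end: creating the archive in one step and in two steps, listing its contents, and extracting it. The proj1 contents and the restore directory are made up for the demonstration; the flags used (c, t, x, f, z, and -C) are standard in GNU tar and bsdtar.

```shell
#!/bin/sh
# Work in a throwaway directory with a small sample project.
work=$(mktemp -d)
cd "$work"
mkdir proj1
echo "example data" > proj1/data.txt

# One step: archive and compress together (the command shown above).
tar cfz proj1.tar.gz proj1/

# Two steps: archive first, then compress with gzip.
tar cf proj1_copy.tar proj1/
gzip proj1_copy.tar        # writes proj1_copy.tar.gz and removes the .tar

# List the contents without extracting anything.
tar tfz proj1.tar.gz

# Extract into a separate directory to avoid overwriting the original.
mkdir restore
tar xfz proj1.tar.gz -C restore
```

With GNU tar, replacing z with j selects bzip2 instead of gzip, provided bzip2 is installed.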
Performance versus block size (read/write)
The following output shows the performance achievable, on SATA SSDs, for reads/writes of differing block sizes.
Block size 1K Write: 256000000 bytes (256 MB, 244 MiB) copied, 10.0142 s, 25.6 MB/s
Block size 1K Read : 256000000 bytes (256 MB, 244 MiB) copied, 9.18361 s, 27.9 MB/s
Block size 2K Write: 256000000 bytes (256 MB, 244 MiB) copied, 5.29912 s, 48.3 MB/s
Block size 2K Read : 256000000 bytes (256 MB, 244 MiB) copied, 4.87071 s, 52.6 MB/s
Block size 4K Write: 256000000 bytes (256 MB, 244 MiB) copied, 2.97716 s, 86.0 MB/s
Block size 4K Read : 256000000 bytes (256 MB, 244 MiB) copied, 2.54667 s, 101 MB/s
Block size 8K Write: 256000000 bytes (256 MB, 244 MiB) copied, 1.51576 s, 169 MB/s
Block size 8K Read : 256000000 bytes (256 MB, 244 MiB) copied, 1.3195 s, 194 MB/s
Block size 16K Write: 256000000 bytes (256 MB, 244 MiB) copied, 0.843084 s, 304 MB/s
Block size 16K Read : 256000000 bytes (256 MB, 244 MiB) copied, 0.676777 s, 378 MB/s
Block size 32K Write: 255983616 bytes (256 MB, 244 MiB) copied, 0.487291 s, 525 MB/s
Block size 32K Read : 255983616 bytes (256 MB, 244 MiB) copied, 0.411139 s, 623 MB/s
Block size 64K Write: 255983616 bytes (256 MB, 244 MiB) copied, 0.287594 s, 890 MB/s
Block size 64K Read : 255983616 bytes (256 MB, 244 MiB) copied, 0.248797 s, 1.0 GB/s
Block size 128K Write: 255983616 bytes (256 MB, 244 MiB) copied, 0.324746 s, 788 MB/s
Block size 128K Read : 255983616 bytes (256 MB, 244 MiB) copied, 0.153213 s, 1.7 GB/s
Block size 256K Write: 255852544 bytes (256 MB, 244 MiB) copied, 0.705291 s, 363 MB/s
Block size 256K Read : 255852544 bytes (256 MB, 244 MiB) copied, 0.108035 s, 2.4 GB/s
Block size 512K Write: 255852544 bytes (256 MB, 244 MiB) copied, 0.739421 s, 346 MB/s
Block size 512K Read : 255852544 bytes (256 MB, 244 MiB) copied, 0.0776258 s, 3.3 GB/s
Block size 1024K Write: 255852544 bytes (256 MB, 244 MiB) copied, 0.748344 s, 342 MB/s
Block size 1024K Read : 255852544 bytes (256 MB, 244 MiB) copied, 0.0622923 s, 4.1 GB/s
Block size 2048K Write: 255852544 bytes (256 MB, 244 MiB) copied, 0.801381 s, 319 MB/s
Block size 2048K Read : 255852544 bytes (256 MB, 244 MiB) copied, 0.0554749 s, 4.6 GB/s
Block size 4096K Write: 255852544 bytes (256 MB, 244 MiB) copied, 0.781268 s, 327 MB/s
Block size 4096K Read : 255852544 bytes (256 MB, 244 MiB) copied, 0.0500531 s, 5.1 GB/s
Block size 8192K Write: 251658240 bytes (252 MB, 240 MiB) copied, 0.781807 s, 322 MB/s
Block size 8192K Read : 251658240 bytes (252 MB, 240 MiB) copied, 0.0461191 s, 5.5 GB/s
Using an 8 MB block size yields about 200 times the read performance achieved with a 1K block size (5.5 GB/s versus 27.9 MB/s). Write throughput, by contrast, peaks at 890 MB/s with a 64K block size and settles at roughly 320-360 MB/s for larger blocks. The situation is similar when moving small and large files across a network: the smaller the file, the lower the speed, unless you have already hit line rate.
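The output above has the shape of dd status lines, and its byte counts are consistent with writing about 256000000 bytes per test (count = 256000000 divided by the block size, rounded down). The exact commands and dd flags used are not given, so the following is only a sketch of how a similar sweep could be run. The bs_sweep function and the TESTDIR variable are hypothetical names, and the demonstration uses a much smaller total so that it finishes quickly.

```shell
#!/bin/sh
# Sketch of a block-size sweep with dd (assumed tool; flags are a guess).
# Usage: bs_sweep TOTAL_BYTES SIZE_IN_KIB...
# Set TESTDIR to a directory on the file system you want to measure.
# Note: without direct or synchronous I/O flags, both passes go through
# the operating system's page cache, which can inflate the numbers.
bs_sweep() {
    total=$1
    shift
    testfile=$(mktemp "${TESTDIR:-/tmp}/bs_sweep.XXXXXX")
    for kib in "$@"; do
        bs=$((kib * 1024))
        count=$((total / bs))              # integer division, rounds down
        printf 'Block size %sK Write: ' "$kib"
        dd if=/dev/zero of="$testfile" bs="$bs" count="$count" 2>&1 | tail -n 1
        printf 'Block size %sK Read : ' "$kib"
        dd if="$testfile" of=/dev/null bs="$bs" 2>&1 | tail -n 1
    done
    rm -f "$testfile"
}

# Quick demonstration: about 4 MB per test at three block sizes.
report=$(bs_sweep 4096000 4 64 1024)
printf '%s\n' "$report"
```

To approximate the test shown above, the call would be something like bs_sweep 256000000 1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192, which takes noticeably longer at the small block sizes.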