
Important:  Most software will only use 1 CPU core - e.g., requesting 8 CPU cores for a PAUP job prevents other people from using the 7 idle cores.  Most HPC users will have scripts similar to Example 1 below.

Example 1:

The following PBS script requests 1 CPU core, 2GB of memory, and 24 hours of walltime for running "paup -n input.nex".

#!/bin/bash
#PBS -j oe
#PBS -m ae
#PBS -N JobName1
#PBS -l pmem=2gb
#PBS -l nodes=1:ppn=1
#PBS -l walltime=24:00:00
#PBS -M your.name@jcu.edu.au

cd $PBS_O_WORKDIR
shopt -s expand_aliases
source /etc/profile.d/modules.sh
echo "Job identifier is $PBS_JOBID"
echo "Working directory is $PBS_O_WORKDIR"

module load paup
paup -n input.nex

If the file containing the above content has a name of JobName1.pbs, you simply execute qsub JobName1.pbs to place it into the queueing system.
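
Once the job has been submitted, standard Torque commands can be used to keep an eye on it. The commands below are a minimal sketch; the job identifier shown is just a placeholder for whatever qsub prints back.

qsub JobName1.pbs    # submit the job; qsub prints the new job identifier
qstat -u $USER       # list your queued and running jobs
qdel 123456          # remove the job (123456 is a placeholder job identifier)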

Example 2:

The following PBS script requests 8 CPU cores, 24GB of memory, and 3 hours of walltime for running 8 MATLAB jobs in parallel.

#!/bin/bash
#PBS -j oe
#PBS -m ae
#PBS -N JobName2
#PBS -l pmem=3gb
#PBS -l nodes=1:ppn=8
#PBS -l walltime=3:00:00
#PBS -M your.name@jcu.edu.au

cd $PBS_O_WORKDIR
shopt -s expand_aliases
source /etc/profile.d/modules.sh
echo "Job identifier is $PBS_JOBID"
echo "Working directory is $PBS_O_WORKDIR"

module load matlab
matlab -r myjob1 &
matlab -r myjob2 &
matlab -r myjob3 &
matlab -r myjob4 &
matlab -r myjob5 &
matlab -r myjob6 &
matlab -r myjob7 &
matlab -r myjob8 &
wait    # Wait for all background jobs to finish.

If the file containing the above content has a name of JobName2.pbs, you simply execute qsub JobName2.pbs to place it into the queueing system.
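
If you would rather not list each MATLAB run explicitly, the eight matlab lines above could be replaced with a bash loop. The sketch below assumes the same myjob1.m through myjob8.m scripts used in Example 2; adjust the range to suit your own jobs.

module load matlab
for i in $(seq 1 8); do
    matlab -r myjob$i &    # start each MATLAB run in the background
done
wait                       # wait for all background runs to finish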

Example 3:

The following PBS script requests 20 CPU cores, 40GB of memory, and 10 days of walltime for running an MPI job.

#!/bin/bash
#PBS -j oe
#PBS -m ae
#PBS -N JobName3
#PBS -l pmem=2gb
#PBS -l nodes=1:ppn=20
#PBS -l walltime=240:00:00
#PBS -M your.name@my.jcu.edu.au

cd $PBS_O_WORKDIR
shopt -s expand_aliases
source /etc/profile.d/modules.sh
echo "Job identifier is $PBS_JOBID"
echo "Working directory is $PBS_O_WORKDIR"

module load openmpi
module load migrate
mpirun -np 20 -machinefile $PBS_NODEFILE migrate-n-mpi ...

If the file containing the above content has a name of JobName3.pbs, you simply execute qsub JobName3.pbs to place it into the queueing system.
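
Rather than hard-coding 20 on the mpirun line, the core count can be derived from the node file that Torque creates for the job. This is a sketch only; the migrate-n-mpi arguments are elided exactly as in Example 3.

NPROCS=$(wc -l < $PBS_NODEFILE)    # one line per CPU core allocated to this job
mpirun -np $NPROCS -machinefile $PBS_NODEFILE migrate-n-mpi ...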

Example 4:

The following PBS script uses job arrays. If you aren't proficient with bash scripting, using job arrays can be painful. In the example below, each sub-job requests 1 CPU core, 1GB of memory, and 20 minutes of walltime.

#!/bin/bash
#PBS -j oe
#PBS -m ae
#PBS -N ArrayJob
#PBS -l pmem=1gb
#PBS -l nodes=1:ppn=1
#PBS -l walltime=20:00
#PBS -M your.name@my.jcu.edu.au

cd $PBS_O_WORKDIR
shopt -s expand_aliases
source /etc/profile.d/modules.sh

module load matlab
matlab -r myjob$PBS_ARRAYID

If the file containing the above content has a name of ArrayJob.pbs and you wish to run 32 sub-jobs, you simply execute qsub -t 1-32 ArrayJob.pbs to place them into the queueing system.
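
Job arrays become more useful when each sub-job selects its own input. The fragment below is one common pattern, shown as a sketch only; it assumes a plain-text file named inputs.txt (one input filename per line), which is not part of the example above.

# Pick the line of inputs.txt that matches this sub-job's array index.
INPUT=$(sed -n "${PBS_ARRAYID}p" inputs.txt)
module load paup
paup -n "$INPUT"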

    PBS Directives

    The filenames, paths, email addresses, and some values below are things you will probably need to change. In some cases, values/names have been used to demonstrate possibilities that you could employ (in a slightly different way). Apart from the -l options, no option should appear on multiple lines.

    Directive(s)

    Description of purpose

    #PBS -d /fast/jc000000

    Defines the working directory path to be used for the job.

    #PBS -j oe
    #PBS -o /tmp/output.$PBS_JOBID

    Merge standard output and standard error streams into the named output file.

    #PBS -l pmem=4gb
    #PBS -l nodes=1:ppn=2
    #PBS -l walltime=24:00:00

    Request that 4GB of memory per CPU core be reserved for the batch job.
    Request that 2 CPU cores on 1 host be reserved for the batch job.
    Advise the scheduler that this job will have completed within 24 hours.

    #PBS -l nodes=2 -I -X

    Request 2 CPU cores for an interactive job (-I) with X11 forwarding (-X).
    Note: Our 2 login nodes each provide 18 CPU cores and 64GB of memory for running interactive jobs (without qsub).

    #PBS -m ae
    #PBS -M john.doe@jcu.edu.au
    #PBS -M joe.blogg@my.jcu.edu.au

    Send mail at batch job abort/exit to the email address(es) provided.

    #PBS -N job_name

    Assign a name (job_name) to the batch job.

    #PBS -V

    Export all environment variables in your submission environment to the batch job.

    While defaults exist for many options, HPC staff ask researchers to specify CPU core, memory, and walltime requirements as accurately as possible.

    A -W option can be used for more complicated tasks such as job dependencies, stage-in, and stage-out. Researchers may wish to consult HPC staff regarding use of the -W options. Running man qsub will provide more information and more options than are covered above.
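
    As one illustration of a -W option, Torque job dependencies can be expressed with depend=. The commands below are a sketch only; stage1.pbs and stage2.pbs are hypothetical script names.

    FIRST=$(qsub stage1.pbs)                   # qsub prints the identifier of the new job
    qsub -W depend=afterok:$FIRST stage2.pbs   # run stage2 only if stage1 finishes successfully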

    Users interested in protecting their job runs with checkpointing should realize that this feature comes at a cost (additional I/O operations). Checkpoint/restart of a job (using BLCR) will not work for all job types. HPC staff advise users to test this feature on a typical job before using it on other similar jobs. Generally speaking, checkpointing will only be a real benefit to jobs that run for over a week.

    The variables listed in the table below are commonly used within a PBS script file.

    Variable

    Description

    PBS_JOBNAME

    Job name specified by the user

    PBS_O_WORKDIR

    Working directory from which the job was submitted

    PBS_O_HOME

    Home directory of user submitting the job

    PBS_O_LOGNAME

    Name of user submitting the job

    PBS_O_SHELL

    Script shell

    PBS_JOBID

    Unique PBS job identifier

    PBS_O_HOST

    Host from which the job was submitted

    PBS_QUEUE

    Name of the job queue

    PBS_NODEFILE

    File containing a line-delimited list of the nodes allocated to the job

    PBS_O_PATH

    Path variable used to locate executables within the job script

    Note: On multi-core systems, each line in PBS_NODEFILE corresponds to one allocated CPU core, so a hostname appears once for every core allocated on that host.
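
    If you are unsure what these variables contain, a few echo lines near the top of your script will record them in the job's output file. A minimal sketch:

    echo "Job name:        $PBS_JOBNAME"
    echo "Job identifier:  $PBS_JOBID"
    echo "Queue:           $PBS_QUEUE"
    echo "Submission dir:  $PBS_O_WORKDIR"
    echo "Cores allocated: $(wc -l < $PBS_NODEFILE)"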


    This example runs PAUP on the input file input.nex that resides in the current working directory. A file (here we'll name it pbsjob) is created with the contents shown in Example 1 above.

    To submit the job for execution on a HPRC compute node simply enter the command:

    qsub pbsjob

    Do It Yourself

    There are several legitimate reasons for wanting to run multiple single-processor jobs in parallel within a single PBS script. For example, you may want to run 8 MATLAB jobs that require a toolbox with only 4 licensed users. Only 1 MATLAB license is checked out if all 8 jobs are run on the same system. An example PBS script for this task would look like the one shown in Example 2 above.

    To submit the job for execution on a HPRC compute node simply enter the command:

    qsub pbsjob

    Note that the above job would be allocated 8 CPU cores and 24GB of memory.

    Note: The echo commands in the PBS script example above are informational only.

    Using Job Arrays

    Users with a knowledge of shell scripting (e.g., bash) may choose to take advantage of job arrays. This feature significantly reduces the load on our Torque/Maui server (compared to lots of individual job submissions). The example below (assume the file name is pbsjob) will only be useful as a guide.
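
    The script below is a reconstruction offered as a sketch only; it assumes the same myjob1.m to myjob8.m MATLAB scripts used in the "Do It Yourself" section, with each sub-job given the resources of a single MATLAB run.

    #!/bin/bash
    #PBS -j oe
    #PBS -m ae
    #PBS -N ArrayJob
    #PBS -l pmem=3gb
    #PBS -l nodes=1:ppn=1
    #PBS -l walltime=3:00:00
    #PBS -M your.name@jcu.edu.au

    cd $PBS_O_WORKDIR
    source /etc/profile.d/modules.sh

    module load matlab
    matlab -r myjob$PBS_ARRAYID    # PBS_ARRAYID is set by qsub -t, one value per sub-job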

    Issuing the command

    qsub -S /bin/bash -t 1-8 pbsjob

    will see 8 jobs run under one major identifier.  The above example is identical (in terms of what jobs would be executed) to the one in the "Do It Yourself" section above.

    Chances are you may need more advanced features of the scripting language than what is shown above. HPRC staff will endeavour to provide assistance with job arrays, if requested.

    Note that the MPI job in Example 3 above would be allocated 20 CPU cores and 40GB of memory.  Users with MPI/PVM/OpenMP jobs should test the efficiency of their software.  Not all software scales well in the parallel computing space - e.g., the number of CPUs MrBayes can scale to depends on the type of analysis being done.  Comparing the walltime and CPU time reported in the job completion Email is one way to determine efficiency.  For example, assuming there's no CPU or memory under-requesting happening, running an 8 CPU core job where the elapsed CPU time is 7.5 times the walltime means your run is suitably efficient.  If that ratio were only 6, your job is actually preventing 2 CPU cores from being used.  In this case, you might want to rerun your job with only 6 cores (say) and see if the efficiency increases.
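
    As a concrete illustration of the check described above (all numbers are made up), the efficiency of a finished job can be estimated from the figures in the completion Email:

    # 8 CPU cores requested, walltime used = 10 hours, CPU time used = 75 hours
    CORES=8; WALLTIME_HOURS=10; CPUT_HOURS=75
    # efficiency = cput / (cores x walltime) = 75 / 80, i.e. roughly 94% - acceptable
    echo "scale=2; $CPUT_HOURS / ($CORES * $WALLTIME_HOURS)" | bc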

    A standard compute node in the JCU HPC cluster now has approximately 3GB of memory per configured core.  The following table contains a number of examples of PBS options/directives that should be used for the given memory requirement of the job in question.

    Resources Required for job                PBS Resources Request
    1 CPU core, 3GB memory                    -l nodes=1:ppn=1 -l pmem=3gb
    1 CPU core, 8GB memory                    -l nodes=1:ppn=1 -l pmem=8gb
    1 CPU core, 20GB memory                   -l nodes=1:ppn=1 -l pmem=20gb
    2 CPU cores, 6GB memory                   -l nodes=1:ppn=2 -l pmem=3gb
    2 CPU cores, 10GB memory                  -l nodes=1:ppn=2 -l pmem=5gb
    4 CPU cores, 12GB memory                  -l nodes=1:ppn=4 -l pmem=3gb
    6 CPU cores, 24GB memory                  -l nodes=1:ppn=6 -l pmem=4gb
    12 CPU cores, 60GB memory                 -l nodes=1:ppn=12 -l pmem=5gb
    20 CPU cores, 60GB memory                 -l nodes=1:ppn=20 -l pmem=3gb

    Note that the above table only contains a small number of examples.  HPC cluster compute nodes have been configured to provide only 20 CPU cores and 60GB of memory to users' jobs.  This was done in June 2014 to preserve resources for critical system processes.

    Big memory nodes have approximately 5.5GB of memory per CPU core configured inside Torque.  The bigmem queue will need to be used when your PBS job requires more than 60GB of memory.

    Resources Required for job                PBS Resources Request
    1 CPU core, 128GB memory                  -l nodes=1:ppn=1 -l pmem=128gb
    4 CPU cores, 96GB memory                  -l nodes=1:ppn=4 -l pmem=24gb
    12 CPU cores, 120GB memory                -l nodes=1:ppn=12 -l pmem=10gb
    24 CPU cores, 240GB memory                -l nodes=1:ppn=24 -l pmem=10gb

    Use mb units if you want/need a more precise memory-per-core ratio.
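
    For example, 3 CPU cores sharing roughly 10GB in total could be requested with mb units, and a job needing more than 60GB must be sent to the bigmem queue with -q. This is a sketch only; check the exact queue policy with HPC staff.

    #PBS -l nodes=1:ppn=3 -l pmem=3413mb    # 3 cores x ~3.3GB = roughly 10GB in total

    #PBS -q bigmem
    #PBS -l nodes=1:ppn=4 -l pmem=24gb      # 96GB in total, so the bigmem queue is required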
