Compile and execute CLaMS programs on JUROPA

Compiler

To use the Intel MPI compiler, the following command is required:

  • module load parastation/intel 
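
Whether the environment is set up correctly can be checked with the standard Environment Modules commands, for example:

  • module load parastation/intel
    module list    # confirm that parastation/intel appears among the loaded modules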

Compilation

All required libraries and module files are linked under the following directories:

  • /lustre/jhome4/jicg11/jicg1108/clams_lib
    /lustre/jhome4/jicg11/jicg1108/clams_mod
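
If a program is compiled by hand rather than through the CLaMS makefiles, these directories would typically be passed as module and library search paths. A minimal sketch, assuming the mpif90 compiler wrapper and a netCDF library named libnetcdf (neither is confirmed on this page; myprog.f90 is a placeholder):

  • mpif90 -I/lustre/jhome4/jicg11/jicg1108/clams_mod \
        -L/lustre/jhome4/jicg11/jicg1108/clams_lib \
        myprog.f90 -lnetcdf -o myprog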

The CLaMS makefiles are adjusted for compilation of CLaMS programs on JUROPA with the Intel compiler (mkincl/config/config.Linux_ifc).

netCDF version 3.6.3 is used. For netCDF 4, see Used Libraries (netCDF3/netCDF4).

The CLaMS programs can be compiled with:

  • gmake useComp=ifc [useMPI=true] progname
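
The bracketed useMPI=true switch is optional; it selects the MPI-parallel build needed for the mpiexec run below. For example (progname stands for an actual CLaMS program name):

  • gmake useComp=ifc progname
    gmake useComp=ifc useMPI=true progname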
    

Create batch script

Write a batch script that includes the mpiexec command for the newly compiled executable:

  • #!/bin/bash -x
    #MSUB -l nodes=<no of nodes>:ppn=<no of procs/node>
    #MSUB -l walltime=<hh:mm:ss>
    #MSUB -e <full path for error file>
    #MSUB -o <full path for output file>
    #MSUB -M <mail-address>
    #      official mail address
    #MSUB -m n|a|b|e
    #      send mail: never, on abort, beginning or end of job

    ### start of jobscript
    cd $PBS_O_WORKDIR
    echo "workdir: $PBS_O_WORKDIR"
    NP=$(cat $PBS_NODEFILE | wc -l) # nodes*ppn
    echo "running on $NP cpus ..."
    mpiexec -np $NP <executable>

Job Limits

Batch jobs:

  • max. number of nodes: 256 (default: 1)
  • max. wall-clock time: 12 hours (default: 30 min)
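
For example, a request close to these limits corresponds to the following lines in the job script header (the values are illustrative; ppn=8 uses all cores of a JUROPA node):

  • #MSUB -l nodes=256:ppn=8
    #MSUB -l walltime=12:00:00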

Interactive jobs:

  • max. number of nodes: 2 (default: 1)
  • max. wall-clock time: 30 min (default: 10 min)

Submit job

Submit the job with:

  • msub <jobscript>

On success msub returns the job ID of the submitted job.
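
Since the job ID goes to standard output, it can be captured for use with the commands below (jobscript.sh is a placeholder; tr strips any surrounding whitespace from the msub output):

  • JOBID=$(msub jobscript.sh | tr -d '[:space:]')
    echo "submitted job $JOBID"
    checkjob -v $JOBID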

Useful commands

  • checkjob -v <jobid>
    get detailed information about a job

  • showq [-r]
    show status of all (running) jobs

  • canceljob <jobid>
    cancel a job

  • mjobctl -q starttime <jobid>
    show the estimated start time of the specified job

  • mjobctl --help
    show detailed information about this command

Problems with timeouts using MPI

Currently, jobs requesting more than 128 nodes (> 1024 cores) fail with PMI barrier timeouts. This is caused by a problem in the ParaStation process management that has not yet been fixed. As a workaround, ParTec recommends setting the following environment variables in the job script of large MPI applications:

  export __PSI_LOGGER_TIMEOUT=3600  
  export PMI_BARRIER_TMOUT=1800  

just before the 'mpiexec' call. Similar problems have been observed on workstations as well.
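
Combined with the job script from above, the placement looks like this:

  • ### start of jobscript
    cd $PBS_O_WORKDIR
    NP=$(cat $PBS_NODEFILE | wc -l) # nodes*ppn
    # workaround for PMI barrier timeouts on jobs with > 128 nodes
    export __PSI_LOGGER_TIMEOUT=3600
    export PMI_BARRIER_TMOUT=1800
    mpiexec -np $NP <executable>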

Interactive Session

  • Allocate interactive partition with X11 forwarding:
    msub -I -X -l nodes=#:ppn=8
    possible number of nodes: 1 or 2
  • Start your applications
  • Deallocate interactive partition with:
    exit
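
Put together, a minimal interactive test could look like this (the executable name is a placeholder):

  • msub -I -X -l nodes=1:ppn=8
    module load parastation/intel
    mpiexec -np 8 <executable>
    exit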
