Glotzer:Aon Documentation

From NSDL Materials Digital Library Soft Matter Wiki


AON consists of 220 dual-processor nodes (440 total processors). The two RAID volumes, /home and /home1, provide 2.8 TB of storage.

  • Do not run large CPU- or memory-intensive jobs on the headnode. Doing so can cause erratic performance and may crash the system.



System Specs

Total Nodes:

  • 220


Total Processors:

  • 440


Cluster Node Specs:

  • # Processors : 2
  • Processor Type : Apple G5
  • Processor Speed : 2.0 GHz
  • Physical Memory : 1.0 GB - 2.0 GB


Head Node Specs:

  • aon-login
  • # Processors : 2
  • Processor Type : Apple G5
  • Processor Speed : 2.0 GHz
  • Physical Memory : 3.0 GB


Storage:

  • /glotzer : 500 GB
  • /kieffer : 200 GB
  • /larson : 500 GB
  • /falk : 200 GB
  • /home : 1 TB
  • /home1 : 1.8 TB


Networking:

  • 95 Nodes connected via low-latency Myrinet interconnects
  • 58 Nodes connected via gigabit ethernet
  • 162 Nodes connected via Fast ethernet (this includes the service network for Myrinet-connected nodes)


PBS Batch System

Portable Batch System (PBS) is used just like on NYX.

Submitting Jobs

The general command for submitting a job is "qsub". For example, one could save a sample script from below to the filename "script.sh", and submit it for execution as follows:

qsub script.sh

Checking on Jobs

The general command for checking on jobs is "qstat". The basic state identifiers are: R = running, Q = queued, H = held, C = canceling, E = ending.

  • List all jobs running or queued:
qstat
  • List only my jobs running or queued:
qstat -u USERNAME
  • List jobs with nodes they are running on:
qstat -n

One can also use built-in Unix tools such as "grep" to filter this information.

  • Show only jobs running in the short queue:
qstat -n | grep short
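
The same kind of filtering can produce a quick tally of jobs per state. The sketch below is only an illustration: "count_states" is a hypothetical helper name, not an AON command, and it assumes the column layout of the "qstat -u" line shown on this page (job lines start with a numeric job ID, and the state letter is the second-to-last column).

```shell
# count_states: read "qstat -u" / "qstat -n" style lines on stdin and
# print how many jobs are in each state (R, Q, H, ...).
# Assumption: job lines begin with a numeric job ID and carry the state
# letter in the second-to-last column; header and node lines do not
# begin with a digit and are skipped.
count_states() {
    awk '$1 ~ /^[0-9]/ && NF >= 2 { n[$(NF-1)]++ }
         END { for (s in n) print s, n[s] }' | sort
}

# On AON one would run:
#   qstat -u USERNAME | count_states
```

If node names on your system start with a digit, the node lines printed by "qstat -n" would be miscounted, so prefer "qstat -u" as the input in that case.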

The command "alloc" parses some of the information about executing jobs and groups it by user group. Certain functions broke during the last major change to the system; they will be corrected at a future date.

Deleting a Job

To delete a job from the queue, issue the command:

qdel JOBID

where JOBID is the ID number of the job.
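
When several jobs need to be removed at once, the job IDs can be pulled out of the qstat listing instead of typed by hand. This is a minimal sketch under the same layout assumption as the qstat line shown on this page (numeric job ID first, state letter second to last); "ids_in_state" is a hypothetical helper name.

```shell
# ids_in_state STATE: read qstat-style lines on stdin and print the
# numeric ID of every job whose state column equals STATE.
ids_in_state() {
    awk -v want="$1" '$1 ~ /^[0-9]/ && $(NF-1) == want {
        split($1, part, ".")   # "128.aon.engin.umic" -> "128"
        print part[1]
    }'
}

# On AON, review the list first, then delete:
#   qstat -u USERNAME | ids_in_state Q
#   qstat -u USERNAME | ids_in_state Q | xargs qdel
```

Printing the IDs before piping them to qdel is deliberate: a deleted job cannot be recovered.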

Running Interactively

The AON headnode (the node you log into) handles all scheduling and interactive duties. The headnode should be used for:

  • submitting jobs
  • browsing directories
  • compiling code
  • debugging code (only very short runs, approximately 1 minute or less).

If longer interactive debugging is needed, you can run interactively on a cluster node in the short queue. Do not run code for any extended period of time on the headnode, as it will degrade performance for all other users. Additionally, a user-spawned process has a maximum wall-clock time and will be killed when it exceeds that limit, to guard against runaway processes. Running interactively in the short queue gives you a dedicated node for up to 24 hours of debugging.

To run interactively on the short queue, you will first request a node:

qsub -q short -I

This will produce an interactive shell similar to the standard terminal. It may take a few moments while PBS finds you an available node:

qsub: waiting for job JOBID.aon.engin.umich.edu to start

When PBS has found a node for you, you will see:

qsub: job JOBID.aon.engin.umich.edu ready

You can then cd to the appropriate directory and run as normal. When you are done, simply type exit. Your job will show up to others in the queue as:

JOBID.aon.engin.umic USERID      short    STDIN       15442   --   --    --  24:00 R 00:01

Queues

The route queue will direct your jobs to the appropriate resources related to your user group. The short queue will allow you to run on any available nodes, not just your group resources. DO NOT ABUSE THE SHORT QUEUE.

route queue

  • Name : "route"
  • Max Wall Time : None
  • Max Nodes per Job : 8
  • Follows group allocation

short queue

  • Name : "short"
  • Max Wall Time : 24 Hours
  • Max Nodes per Job : 8
  • Runs on available free nodes


Sample PBS Scripts

Serial Jobs

Sample PBS script:

#!/bin/sh
#PBS -V
#PBS -N testjobname
#PBS -l nodes=1:ppn=2,walltime=60:00:00
#PBS -q route
#PBS -M username@umich.edu
#PBS -m abe
#PBS -j oe

cd /home/username/testcode
./test &
cd /home/username/testcode2
./test &
wait
  • "-V" exports your current environment, including the currently loaded modules, to the job. If you do not include this, your code will either crash or simply not run.
  • "testjobname" is the name given to the specific run.
  • "nodes=1" refers to the number of nodes requested.
  • "ppn=2" refers to the number of processors requested per node. You can request an individual processor on a node, or both processors on a node. If you do not use this modifier, your code will only get 1 processor or crash.
  • "walltime=60:00:00" is the maximum wall time for the run. If the job exceeds this time, it will be terminated by PBS.
  • "route" is the queue name; this can also be "workq" or "short".
  • "username@umich.edu" is the email address to which notification messages are sent.
  • "abe" sends an email when the job is aborted (a), begins (b), or ends (e).
  • "oe" joins the standard output and standard error into a single file.
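
If you have many pairs of serial runs, the script above can be generated rather than edited by hand. The following is a sketch only: "make_pair_script" is a hypothetical helper, the executable name "./test" is taken from the sample above, and the directory paths in the example are placeholders.

```shell
# make_pair_script NAME DIR1 DIR2: print a PBS script that runs ./test
# in DIR1 and DIR2 at the same time on one dual-processor node.
make_pair_script() {
    cat <<EOF
#!/bin/sh
#PBS -V
#PBS -N $1
#PBS -l nodes=1:ppn=2,walltime=60:00:00
#PBS -q route
#PBS -j oe

(cd "$2" && ./test) &
(cd "$3" && ./test) &
wait
EOF
}

# Example: write the script to a file, then submit it.
#   make_pair_script pair01 /home/username/run01 /home/username/run02 > pair01.sh
#   qsub pair01.sh
```

Each run is started in its own subshell so that a failed "cd" skips only that run, and the final "wait" keeps the job alive until both background runs have finished.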

Parallel Jobs:

  • Parallel jobs that have large amounts of message passing between processors (lots of small messages) will benefit most from Myrinet.
  • Parallel jobs that communicate less often but send large messages will typically run well on Myrinet or gigabit (efficiency often drops off sharply when more than 3 gigabit nodes are used).
  • Loosely coupled parallel jobs, such as parallel tempering, will often receive no benefit from using higher-bandwidth (gigabit) or low-latency (Myrinet) interconnects.
  • 2-processor jobs (running on a single node) communicate via the memory bus and, as such, will not benefit from the gigabit or Myrinet networks.

LAM-MPI

LAM-MPI is the default module. Development of LAM-MPI has stopped, and OPEN-MPI will be replacing it. LAM-MPI does not support Myrinet on AON.

Sample LAM-MPI PBS script:

#!/bin/sh
#PBS -V
#PBS -N testjobname
#PBS -l nodes=2:ppn=2,walltime=60:00:00
#PBS -q route
#PBS -M username@umich.edu
#PBS -m abe
#PBS -j oe

#
echo "I ran on:"
cat $PBS_NODEFILE
#
# Change to your execution directory.
cd ~/your-run-dir
#
lamboot $PBS_NODEFILE
#
# Run on 4 processors (2 nodes x 2 processors each); the 60-hour limit is set by walltime above
mpirun -np 4 ./your-mpi-program
#
lamhalt 

OPEN-MPI

If possible, please use OPEN-MPI as all development work has transitioned to this MPI implementation. To switch to OPEN-MPI, issue the following command (or add it to your .bashrc file):

module swap lam openmpi

When submitting jobs for OPEN-MPI you do not need to issue a lamboot or lamhalt command.

Sample OPEN-MPI PBS script:

#!/bin/sh
#PBS -V
#PBS -N testjobname
#PBS -l nodes=2:ppn=2,walltime=60:00:00
#PBS -q route
#PBS -M username@umich.edu
#PBS -m abe
#PBS -j oe

#
echo "I ran on:"
cat $PBS_NODEFILE
#
# Change to your execution directory.
cd ~/your-run-dir
#
# Run on 4 processors (2 nodes x 2 processors each); the 60-hour limit is set by walltime above
mpirun -np 4 ./your-mpi-program
#
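
The "-np 4" in the sample scripts must be kept in step with "nodes" and "ppn" by hand. As a hedged alternative, the processor count can be read from the node file, since PBS writes one line to $PBS_NODEFILE per requested processor. "slot_count" is a hypothetical helper name.

```shell
# slot_count FILE: number of processor slots listed in a PBS node file
# (one non-empty line per requested processor).
slot_count() {
    awk 'NF { n++ } END { print n + 0 }' "$1"
}

# Inside a job script, in place of the hard-coded "-np 4":
#   mpirun -np "$(slot_count "$PBS_NODEFILE")" ./your-mpi-program
```

With "nodes=2:ppn=2" this evaluates to 4, and it keeps working unchanged if you later edit the "-l" line.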


Selecting Networking

By default, your job will preferentially try to run on nodes that are NOT connected via gigabit or Myrinet. This is to ensure that these resources are available to users that need them.

  • Parallel jobs that have large amounts of message passing between processors (lots of small messages) will benefit most from Myrinet.
  • Parallel jobs that communicate less often but send large messages will typically run well on Myrinet or gigabit (efficiency often drops off sharply when more than 3 gigabit nodes are used).
  • Loosely coupled parallel jobs, such as parallel tempering, will often receive no benefit from using higher-bandwidth (gigabit) or low-latency (Myrinet) interconnects.
  • 2-processor jobs (running on a single node) communicate via the memory bus and, as such, will not benefit from the gigabit or Myrinet networks.

Gigabit

To select gigabit, modify the "-l" line in your PBS script as follows:

#PBS -l nodes=2:ppn=2:gigabit,walltime=60:00:00

Myrinet

To select Myrinet, modify the "-l" line in your PBS script as follows:

#PBS -l nodes=2:ppn=2:myrinet,walltime=60:00:00

NOTE: LAM-MPI does not support Myrinet on AON. Your code will run using fast Ethernet instead to communicate.


Selecting Memory

To select between nodes with 1 GB total (512 MB per processor) and 2 GB total (1 GB per processor), modify the "-l" line in your PBS script as follows:

#PBS -l nodes=1:ppn=2:pmem=1000mb,walltime=60:00:00

This will select a 2 GB memory node, since "pmem" refers to the amount of memory per processor.
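
The network and memory selections all follow the same pattern: an extra property appended to the "nodes=...:ppn=..." part of the "-l" line. The sketch below assembles such a string in the form used on this page; "pbs_resources" is a hypothetical helper, and passing the result to qsub on the command line instead of editing the script is shown only as one possible use.

```shell
# pbs_resources NODES PPN WALLTIME [PROPERTY...]
# Print a "-l" resource string in the form used on this page. Each extra
# argument (gigabit, myrinet, pmem=1000mb) is appended as a node property.
pbs_resources() {
    spec="nodes=$1:ppn=$2"
    walltime=$3
    shift 3
    for prop in "$@"; do
        spec="$spec:$prop"
    done
    printf '%s,walltime=%s\n' "$spec" "$walltime"
}

# Examples:
#   pbs_resources 2 2 60:00:00 myrinet
#   qsub -l "$(pbs_resources 1 2 60:00:00 pmem=1000mb)" script.sh
```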

Modules

AON allows users to selectively load different modules, such as the MPI implementation. The following commands can also be placed in your .bashrc file so that your preferred modules load by default.

To list available modules:

module avail

To list loaded modules:

module list

To load a module called NEW_MODULE:

module load NEW_MODULE

To swap modules from CURRENTLY_LOADED to NEW_MODULE:

module swap CURRENTLY_LOADED NEW_MODULE
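
If you share one .bashrc between AON and machines that do not have the module system, the swap can be guarded so that it runs only where "module" exists. This is a sketch, not a required setup; "use_openmpi" is a hypothetical function name, and the module names "lam" and "openmpi" are the ones used elsewhere on this page.

```shell
# use_openmpi: swap the default LAM-MPI module for OPEN-MPI, but only
# when the "module" command exists in the current shell.
use_openmpi() {
    if command -v module >/dev/null 2>&1; then
        module swap lam openmpi
    fi
}

# Call it once from .bashrc; on machines without "module" it does nothing.
use_openmpi
```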

Lammps Module

Lammps has been built on AON as a module; once the module is loaded, you will be able to use the Lammps binary lmp_fink.

Lammps was built using the following modules:

  • openmpi/1.2b1-xlf
  • fftw/2.1.5-xlf

The current version built is from the 8 May 2007 source code. The date of the source code is reflected in the name of the module (e.g. lammps/8May07). I will make an effort to keep the build up to date; if you find the current version out of date or insufficient, contact me about building a new one. A list of bug fixes is available on the Lammps bug page.

  • Example:
$ module swap lam openmpi  
$ module load lammps/1Apr08
$ lmp_fink

Job Management

To provide equitable resource management within each user group, users may wish to employ some of the advanced features of PBS, specifically the "depend" option of qsub.

Depend

For example, "user_A", a member of "group_A", wishes to submit 20 jobs, but "group_A" has only 10 nodes allocated to it. Submitting all 20 jobs at once would fill every available node and then the queue, barring other group members from running until those jobs have completed. Instead, "user_A" can make future jobs depend on the state of previous jobs.

  • If "user_A" has 2 jobs already running,
111.aon Job_1 user_A 10:00:01 R cac_serial
123.aon Job_2 user_A 09:00:00 R cac_serial
  • "user_A" can tell newly submitted jobs to wait until jobs "111" and "123" have finished:
% qsub -W depend=afterany:111 Job_3.sh
% qsub -W depend=afterany:123 Job_4.sh
  • The result is that each new job is held (state "H") until the job it depends on has completed:
111.aon Job_1 user_A 10:00:01 R cac_serial
123.aon Job_2 user_A 09:00:00 R cac_serial
128.aon Job_3 user_A 0 H cac_serial
129.aon Job_4 user_A 0 H cac_serial
  • And following this, Job_5 can depend on Job_3 ending, etc...
% qsub -W depend=afterany:128 Job_5.sh
% qsub -W depend=afterany:129 Job_6.sh
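
The manual steps above can be automated by capturing the job ID that qsub prints and passing it to the next submission. The following is a minimal sketch: "submit_chain" is a hypothetical helper, and it assumes that qsub prints the new job ID on standard output and that "depend=afterany:" accepts that ID as printed.

```shell
# submit_chain SCRIPT...: submit the scripts one after another so that
# each job waits (afterany) for the job submitted just before it.
submit_chain() {
    prev=""
    for script in "$@"; do
        if [ -z "$prev" ]; then
            prev=$(qsub "$script") || return 1
        else
            prev=$(qsub -W "depend=afterany:$prev" "$script") || return 1
        fi
        echo "$script -> $prev"
    done
}

# Example: two independent chains, so at most two jobs run at a time.
#   submit_chain Job_1.sh Job_3.sh Job_5.sh
#   submit_chain Job_2.sh Job_4.sh Job_6.sh
```

The number of chains you start sets how many of your jobs can run at once, which leaves the remaining group nodes free for other users.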


Data Storage

AON has two disk arrays, "/home" and "/home1". Use the "pwd" command to see which array your home directory lives on. The machine will crash if the drives become very close to full (>90%), so please keep only current data on the machine.

Total Free Space

To check the amount of free space on each of the disk arrays, issue the following command and look for the lines that correspond to "/home" and "/home1":

df -h
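
Since the machine becomes unstable above roughly 90% usage, it can help to have the check done for you. This sketch reads the portable "df -P" format, where the fifth column is the use percentage and the sixth is the mount point; "nearly_full" is a hypothetical helper name, and the check assumes filesystem names without spaces.

```shell
# nearly_full LIMIT: read "df -P" output on stdin and report every
# filesystem whose use percentage is above LIMIT.
nearly_full() {
    awk -v limit="$1" 'NR > 1 {
        used = $5
        sub(/%/, "", used)          # "95%" -> "95"
        if (used + 0 > limit + 0) print $6 " is " used "% full"
    }'
}

# On AON:
#   df -P /home /home1 | nearly_full 90
```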

Calculate File Size

To check on the space in your individual account, use the following command at the top level of your account (your home directory):

du -s -h

You can specifically calculate the amount of space used by an individual folder or file using the command:

du -s -h FILE_NAME

This may take several minutes depending on the amount of data you have stored in your account.
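
To find out which folders are taking up the space, the per-item totals can be sorted by size. This is a small sketch built only on du, sort, and head; "largest_items" is a hypothetical helper name. Sizes are reported in kilobytes ("-k") so that they sort numerically.

```shell
# largest_items [DIR] [COUNT]: list the COUNT largest files and folders
# directly inside DIR (default: current directory, top 10), biggest
# first, with sizes in kilobytes. Hidden (dot) entries are not included.
largest_items() {
    dir=${1:-.}
    count=${2:-10}
    du -s -k "$dir"/* 2>/dev/null | sort -n -r | head -n "$count"
}

# Example, from your home directory:
#   largest_items . 5
```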

Compressing Data

Please use tools such as gzip to compress ASCII data; this typically cuts file size to about half of the original. Rasmol can read gzipped files without decompressing them first.

gzip -9 filename       # compress a file (produces filename.gz)
gzip -d filename.gz    # decompress a file
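
For a finished run it is usually easier to compress a whole directory at once. The sketch below is one way to do that with find and gzip; "compress_tree" is a hypothetical helper name and "results.txt.gz" in the comment is a placeholder file name.

```shell
# compress_tree DIR: gzip every regular file under DIR that is not
# already compressed. gzip replaces each file with FILE.gz.
compress_tree() {
    find "$1" -type f ! -name '*.gz' -exec gzip -9 {} +
}

# Example:
#   compress_tree ~/your-run-dir
# To look at a compressed file without unpacking it on disk:
#   gzip -dc results.txt.gz | head
```

Only run this on directories whose jobs have finished, since a running program will not find its input files under their original names.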

Please be responsible with your data; otherwise the cluster may crash, which is bad for everyone.


Compiling

C/C++ :

gcc version 3.3 20030304 (Apple Computer, Inc. build 1666)

  • Serial code is compiled using "gcc" (C) or "g++" (C++)
  • Parallel code is compiled using "mpicc" (C) or "mpiCC" (C++)

Recommended Flags:

  • -O3 -funroll-loops -mtune=G5 -mcpu=G5
  • -fast
    • this often works well, however be aware of the following:
      • this may have problems with fstream when using C++
      • this uses the flag -ffast-math, which may produce incorrect results in some codes. Anecdotally, we found that Monte Carlo codes were affected while Molecular Dynamics codes were not.

For more info: Apple gcc flags

Fortran

IBM XL Fortran Advanced Edition Version 8.1 for Mac OS X (Absoft)

Invoked using:

  • xlf, xlf_r, f77, or fort77 (Fortran 77)
  • xlf90, xlf90_r, or f90 (Fortran 90)
  • xlf95, xlf95_r, or f95 (Fortran 95)

For more information: Absoft general Info or Absoft compiling Info

Altivec (Velocity Engine) Enabled Code

When using gcc, preliminary tests indicate that the "-fast" flag produces inconsistent behavior. Instead of "-fast", pass the flags it bundles individually, leaving out "-mpowerpc64". The remaining flags are:

  • -O3
  • -funroll-loops
  • -fstrict-aliasing
  • -fsched-interblock
  • -falign-loops=16
  • -falign-jumps=16
  • -falign-functions=16
  • -falign-jumps-max-skip=15
  • -falign-loops-max-skip=15
  • -malign-natural
  • -ffast-math
  • -mdynamic-no-pic
  • -mpowerpc-gpopt
  • -force_cpusubtype_ALL
  • -mtune=G5
  • -mcpu=G5
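
To avoid retyping the list, the flags can be kept in one variable and the unwanted one filtered out. This is a sketch only: "drop_flag", "FAST_FLAGS", and "ALTIVEC_CFLAGS" are hypothetical names, "your-code.c" is a placeholder, and the flag list is copied from the list above with "-mpowerpc64" added back so that the filter has something to remove.

```shell
# Flags bundled by "-fast" as listed above, plus the one to leave out.
FAST_FLAGS="-O3 -funroll-loops -fstrict-aliasing -fsched-interblock \
-falign-loops=16 -falign-jumps=16 -falign-functions=16 \
-falign-jumps-max-skip=15 -falign-loops-max-skip=15 -malign-natural \
-ffast-math -mdynamic-no-pic -mpowerpc-gpopt -force_cpusubtype_ALL \
-mtune=G5 -mcpu=G5 -mpowerpc64"

# drop_flag UNWANTED FLAG...: print the flags with UNWANTED removed.
drop_flag() {
    unwanted=$1
    shift
    out=""
    for f in "$@"; do
        [ "$f" = "$unwanted" ] || out="$out${out:+ }$f"
    done
    printf '%s\n' "$out"
}

# Word splitting of $FAST_FLAGS is intentional here.
ALTIVEC_CFLAGS=$(drop_flag -mpowerpc64 $FAST_FLAGS)

# Example compile line:
#   gcc $ALTIVEC_CFLAGS -c your-code.c
```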

Altivec Documentation : Apple developer info, Motorola tech info
