Quick & Dirty Introduction to
Compiling and Running on Sooner
For help, please contact us.
If you aren't sure whether your program uses the
Message
Passing Interface (MPI),
then it probably doesn't.
But, here's how you can find out for sure:
-
Log in
to
sooner.oscer.ou.edu .
-
At the Unix prompt,
change directory
(
cd )
to the subdirectory where your source code lives.
-
At the Unix prompt,
type this command:
grep "mpi.h" *.[cChH]*
-
At the Unix prompt,
type this command:
grep -i "mpif.h" *.[fF]*
-
At the Unix prompt,
type this command:
grep -i "use mpi" *.[fF]*
If, in response to any of these
grep
commands,
you get anything back other than the Unix prompt
(and especially anything that has the word "include"
or the word "use" in it),
then very probably your code uses MPI.
If not, then very probably your code doesn't use MPI.
If you're still unsure,
please contact us
and we'll help.
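The three checks above can be rolled into one small script. This is only a sketch, assuming a Bourne-style shell such as bash; the function name uses_mpi is hypothetical, and the grep patterns are the ones from the steps above.

```shell
#!/bin/bash
# Sketch: report whether the source files in a directory appear to use MPI.
# uses_mpi is a hypothetical helper name; $1 is the source directory
# (defaults to the current directory).
uses_mpi() {
    # C/C++ sources and headers that mention mpi.h
    if grep -s "mpi.h" "${1:-.}"/*.[cChH]* > /dev/null; then return 0; fi
    # Fortran sources that mention mpif.h (upper or lower case)
    if grep -s -i "mpif.h" "${1:-.}"/*.[fF]* > /dev/null; then return 0; fi
    # Fortran sources that contain "use mpi" (upper or lower case)
    if grep -s -i "use mpi" "${1:-.}"/*.[fF]* > /dev/null; then return 0; fi
    return 1
}

# Check the current working directory.
if uses_mpi .; then
    echo "very probably uses MPI"
else
    echo "very probably doesn't use MPI"
fi
```

The -s option only silences the "No such file" complaint that grep prints when a directory has no files of a given kind.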
If you aren't sure whether your program uses
OpenMP,
then it probably doesn't.
But, here's how you can find out for sure:
-
Log in
to
sooner.oscer.ou.edu .
-
At the Unix prompt,
change directory
(
cd )
to the subdirectory where your source code lives.
-
At the Unix prompt,
type this command:
grep "pragma" *.[cChH]* | grep "omp"
-
At the Unix prompt,
type this command:
egrep -e '\!\$[oO][mM][pP]' *.[fF]*
If, in response to any of these
grep
or
egrep
commands,
you get anything back other than the Unix prompt
(and especially anything that has the word "pragma"
or the word "parallel" in it),
then very probably your code uses
OpenMP.
If not, then very probably your code doesn't use
OpenMP.
If you're still unsure,
please contact us
and we'll help.
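The two OpenMP checks above can be combined in the same way. This is a sketch, assuming bash; the function name uses_openmp is hypothetical. The backslash before the ! in the egrep command above is presumably there for tcsh's history substitution; inside a bash script it isn't needed, so the sketch drops it.

```shell
#!/bin/bash
# Sketch: report whether the source files in a directory appear to use OpenMP.
# uses_openmp is a hypothetical helper name; $1 is the source directory
# (defaults to the current directory).
uses_openmp() {
    # C/C++: lines that contain both "pragma" and "omp"
    if grep -s -h "pragma" "${1:-.}"/*.[cChH]* | grep "omp" > /dev/null; then
        return 0
    fi
    # Fortran: the !$omp sentinel, in upper or lower case
    if grep -s -E '!\$[oO][mM][pP]' "${1:-.}"/*.[fF]* > /dev/null; then
        return 0
    fi
    return 1
}

# Check the current working directory.
if uses_openmp .; then
    echo "very probably uses OpenMP"
else
    echo "very probably doesn't use OpenMP"
fi
```

As with the original commands, this is a text search and can be fooled (for example, by a pragma line that happens to contain the word "compute"), so treat the answer as a strong hint rather than proof.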
If you aren't sure whether your program uses
POSIX
threads (pthreads),
then it probably doesn't.
But, here's how you can find out for sure:
-
Log in
to
sooner.oscer.ou.edu .
-
At the Unix prompt,
change directory
(
cd )
to the subdirectory where your source code lives.
-
At the Unix prompt,
type this command:
grep -i "pthread" *.[cChHfF]*
If, in response to this
grep
command,
you get anything back other than the Unix prompt
(and especially anything that has the word "pthread" in it),
then very probably your code uses
POSIX
threads (pthreads).
Another possibility is that your code uses
the
Basic
Linear Algebra Subprograms (BLAS).
In that case,
very probably the implementation of the BLAS that you use
will have
POSIX
threads (pthreads)
enabled.
Note that if your code uses any of the following
(and this list isn't exhaustive),
then it probably uses the BLAS or otherwise needs
POSIX
threads (pthreads):
If not, then very probably your code doesn't use
POSIX
threads (pthreads).
If you're still unsure,
please contact us
and we'll help.
On sooner.oscer.ou.edu ,
we support the following compilers:
-
Intel
compiler family
(all version 10.1)
-
icc
for C
-
icpc
for C++
-
ifort
(formerly known as ifc )
for Fortran 77/90/95
-
Portland
Group compiler family
(all version 7.0-7)
-
pgcc
for C
-
pgCC
for C++
-
pgf77
for Fortran 77
-
pgf90
for Fortran 90/95
-
pghpf
for High Performance Fortran
-
GNU
compiler family
(all version 4.1.2)
-
gcc
for C
-
g++
for C++
-
gfortran
for Fortran 77/90/95
-
g77
for Fortran 77
NOTE:
In Red Hat Enterprise Linux 5
(and presumably in future releases),
g77
has been supplanted by
gfortran ,
but for completeness
we've provided a
soft link
(like a shortcut in Microsoft Windows)
named
g77
that points to
gfortran
so that makefiles and other build scripts
that expect to use
g77
will seem to find
g77
(but will actually use
gfortran ).
-
Numerical
Algorithms Group (NAG) compiler family
-
nagf95
for Fortran 77/90/95 (version 5.2)
NOTE:
The
nagf95
compiler
is actually a Fortran-to-C translator
that then calls
gcc
(see above).
NOTE:
The
nagf95
compiler
hasn't yet been installed on Sooner
(as of August 18 2008).
Each family of compilers has its own unique compiler options;
you can find complete lists both in their
man pages
(online manuals)
and in their
manuals.
So, the following are merely the recommended options
for reasonably vanilla source codes.
If you've determined that your program uses no parallelism
—
that it's a "serial" (non-parallel) program
—
then here's how to compile.
-
Choose the compiler that you want.
-
Try the recommended compiler options,
using the
EXACT SAME COMPILER OPTIONS
for
EVERY SINGLE COMPILE COMMAND
in your compilation procedure
(including the link command).
For example:
ifort -O -march=core2 -mtune=core2 -c mysourcefile1.f90
ifort -O -march=core2 -mtune=core2 -c mysourcefile2.f90
...
ifort -O -march=core2 -mtune=core2 -c mysourcefileN.f90
ifort -O -march=core2 -mtune=core2 -c mymainprogram.f90
ifort -O -march=core2 -mtune=core2 -o myexecutable \
    mymainprogram.o mysourcefile1.o ... mysourcefileN.o
NOTE:
The backslash \ at the end of the first line
of the final compile (link) command
is the Unix/Linux continuation character:
it means that the command continues onto the next line.
You don't absolutely need it
—
you can just type the entire link command on a single line
—
but it makes the link command more readable,
which is good.
-
You're also welcome to try other compiler options,
as described in the
compiler
manuals
and
man pages
(online manuals).
-
If you have several source files,
we
STRONGLY RECOMMEND
creating a
library archive file.
For example:
ifort -O -march=core2 -mtune=core2 -c mysourcefile1.f90
ifort -O -march=core2 -mtune=core2 -c mysourcefile2.f90
...
ifort -O -march=core2 -mtune=core2 -c mysourcefileN.f90
ifort -O -march=core2 -mtune=core2 -c mymainprogram.f90
ar ru libmysource.a mysourcefile1.o ... mysourcefileN.o
ranlib libmysource.a
ifort -O -march=core2 -mtune=core2 -o myexecutable \
    mymainprogram.o -L. -lmysource
NOTE:
The backslash \ at the end of the first line
of the final compile (link) command
is the Unix/Linux continuation character:
it means that the command continues onto the next line.
You don't absolutely need it
—
you can just type the entire link command on a single line
—
but it makes the link command more readable,
which is good.
NOTE:
The
ar
command
creates the library archive file
libmysource.a ,
containing the binary versions of
all of the routines in
all of the object files
(mysourcefileK.o for K from 1 to N).
NOTE:
The
ranlib
command
makes the library archive file
libmysource.a
smarter about its own contents,
so we
STRONGLY RECOMMEND
using it.
NOTE:
In the final compile (link) command,
the -L. option means:
"Look for library archive files named
lib*.a
or
lib*.so*
in the current working directory."
NOTE:
In the final compile (link) command,
the -lmysource option means:
"Link to the library archive file named
libmysource.a
(if linking statically)
or
libmysource.so*
(if linking dynamically)."
Note that the second character is a lower case L,
NOT a one.
For example,
if you write C or C++ code,
you may have used the following:
-lm
which means "link to the C/C++ standard math library,"
which includes functions such as
sqrt .
-
NOTE:
When compiling (actually linking) with the
Intel
compiler family,
a common warning is:
/usr/intel10/lib/libimf.so: warning: warning: feupdateenv is not implemented and will always fail
You may
IGNORE
this warning;
to our knowledge,
no one has ever been harmed by it.
Rumor has it that adding the following compiler option
to ALL compile/link commands
will make this warning go away:
-shared-intel
You can instead use the following:
-i-dynamic
However,
Intel now
deprecates
(severely frowns on)
the second option,
so we recommend the first.
-
You may find it valuable to try compiling and running your code
with each of the compiler families,
and with various combinations of compiler options,
to find the best combination for your code.
-
Please bear in mind that some codes can only be compiled by
a subset of the compilers that are available on Sooner.
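One way to guarantee that every compile command and the link command use exactly the same options is to keep the compiler and its options in one place. The following is a sketch, assuming bash and source files that live in the current working directory. The names FC, FFLAGS, DRY_RUN, run and build_all are hypothetical, not anything that Sooner provides.

```shell
#!/bin/bash
# Sketch: compile, archive, and link with identical options everywhere.
# FC, FFLAGS, DRY_RUN, run, and build_all are hypothetical names.
FC="${FC:-ifort}"
FFLAGS="${FFLAGS:--O -march=core2 -mtune=core2}"

# Print a command, then run it unless DRY_RUN is set to a non-empty value.
run() {
    echo "$*"
    if [ -z "${DRY_RUN:-}" ]; then "$@"; fi
}

# $1 = main program source file; remaining arguments = other source files.
build_all() {
    main_src="$1"; shift
    objs=""
    for src in "$@"; do
        run $FC $FFLAGS -c "$src" || return 1
        objs="$objs ${src%.*}.o"      # mysourcefile1.f90 -> mysourcefile1.o
    done
    run $FC $FFLAGS -c "$main_src" || return 1
    run ar ru libmysource.a $objs || return 1
    run ranlib libmysource.a || return 1
    run $FC $FFLAGS -o myexecutable "${main_src%.*}.o" -L. -lmysource
}
```

For example, typing DRY_RUN=1 and then build_all mymainprogram.f90 mysourcefile1.f90 mysourcefile2.f90 prints the commands without running them, so you can check them before doing a real build. To try another compiler family, change FC and FFLAGS in one place rather than in every command.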
OpenMP
is supported by the following compilers,
using the associated compiler option
for
EVERY SINGLE COMPILE COMMAND
in your compilation procedure
(including the link command):
Compiling an
OpenMP
program is exactly the same as
compiling
a non-parallel (serial) program,
except that you add
the appropriate
OpenMP
compiler option
to EVERY compile/link command.
For example:
ifort -O -march=core2 -mtune=core2 -openmp -c mysourcefile1.f90
ifort -O -march=core2 -mtune=core2 -openmp -c mysourcefile2.f90
...
ifort -O -march=core2 -mtune=core2 -openmp -c mysourcefileN.f90
ifort -O -march=core2 -mtune=core2 -openmp -c mymainprogram.f90
ar ru libmysource.a mysourcefile1.o ... mysourcefileN.o
ranlib libmysource.a
ifort -O -march=core2 -mtune=core2 -openmp -o myexecutable \
    mymainprogram.o -L. -lmysource
POSIX
threads (pthreads)
are supported
by all of the compilers installed on Sooner.
To compile with
POSIX
threads (pthreads),
all you need to do is,
in the final link step of compiling,
append the following at the very end of the link command:
-lpthread
This means,
"link to the
POSIX
threads (pthreads)
library."
Note that the second character is a lower case L,
NOT a one.
For example,
if you write C or C++ code,
you may have used the following:
-lm
which means "link to the C/C++ standard math library,"
which includes functions such as
sqrt .
1. Before Compiling an
MPI
Program
Before compiling an
MPI
program,
you need to set some
environment variables:
-
MPI_COMPILER
This environment variable indicates
the family of compilers that you're using.
NOTE:
MPIENV ,
an equivalent environment variable,
is deprecated
(frowned upon for future use).
Please transition your builds and batch scripts to
MPI_COMPILER
instead of
MPIENV .
-
MPI_INTERCONNECT
This environment variable indicates
the communication hardware that you're using
to send MPI messages between MPI processes.
NOTE:
MPIDEV ,
an equivalent environment variable,
is deprecated
(frowned upon for future use).
Please transition your builds and batch scripts to
MPI_INTERCONNECT
instead of
MPIDEV .
-
MPI_VENDOR
This environment variable indicates
a subcategory of communication that you're using;
you only need to set this environment variable
under a few certain specific circumstances.
A quick discussion of how to set environment variables is below.
MPI_COMPILER
This environment variable indicates
the family of compilers that you're using.
OSCER supports
multiple
compiler families on Sooner (see above).
Possible values that you can set
MPI_COMPILER
to are:
-
intel
or
intel10
Either of these values indicates that
you want to compile using
the
Intel
compiler family,
version 10.1 (the default version).
-
intel9
This value indicates that
you want to compile using
the
Intel
compiler family,
version 9.1 (an older version).
NOTE:
We
STRONGLY RECOMMEND AGAINST
using older compiler versions such as
intel9 ,
which is included for backward compatibility only
and which we'll shut down at the earliest opportunity
(so it may not even exist by the time you read this).
-
pgi
This value indicates that
you want to compile using
the
Portland
Group compiler family
(version 7.1-7).
-
gnu
or
gcc
Either of these values indicates that
you want to compile using
the
GNU
compiler family
(version 4.1.2).
-
nag
This value indicates that you want to compile using the
NAG
compiler
(version 5.1)
for Fortran 77/90/95
and the
GNU
compiler family
(version 4.1.2)
for C and C++.
NOTE:
The
NAG
compiler family
includes only a Fortran 90/95 compiler,
which can also be used to compile Fortran 77 code,
but for C and C++ the
GNU
compilers
are used.
(The nagf95 compiler
is a Fortran-to-C translator that then invokes
gcc ,
so these compilers should be fully compatible.)
MPI_INTERCONNECT
This environment variable indicates
the communication hardware that you're using
to send MPI messages between MPI processes.
Possible values you can set
MPI_INTERCONNECT
to are:
-
ib
This value means that
MPI messages will be sent over
the high performance Infiniband interconnect,
using the native Infiniband software drivers,
known as
MVAPICH.
This is the fastest way to communicate among multiple nodes.
This is the default,
and in general you should use this
unless you have a
VERY GOOD REASON
to do otherwise
(for example, if Infiniband is incompatible with
the compiler you're using,
as shown in the
compatibility matrix
below).
-
gige
This value means that
MPI messages will be sent
over Gigabit Ethernet,
using the slower TCP/IP software driver
that is used for standard Ethernet
(e.g., for sending data on the Internet).
This is a substantially slower way
to communicate among multiple nodes,
and should be used only if you have a very good reason
(such as needing the PGI or NAG compilers).
-
shmem
This value means that
MPI messages will be sent
entirely inside RAM.
This value should only be used if
your MPI executable is
guaranteed
to be run inside
a single individual node
(i.e., on at most 8 MPI processes).
MPI_VENDOR
This environment variable indicates
some more specifics
of how to send MPI messages between MPI processes
via Infiniband.
The only possible value you can set
MPI_VENDOR
to is:
-
openmpi
This value means that
MPI messages will be sent
using
OpenMPI,
rather than
MVAPICH.
If the
MPI_VENDOR
environment variable
is not defined,
but
MPI_INTERCONNECT
is set to
ib ,
then
MVAPICH
will be used
as the Infiniband communication mechanism.
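Because a mistyped value is easy to miss, a build script can check these variables before compiling. The following is a sketch, assuming bash; the function name check_mpi_env is hypothetical. It accepts only the values listed above, and it insists that MPI_COMPILER and MPI_INTERCONNECT both be set explicitly, which is an assumption of this sketch rather than a rule on Sooner.

```shell
#!/bin/bash
# Sketch: check the MPI environment variables against the values listed
# above. check_mpi_env is a hypothetical helper name.
check_mpi_env() {
    case "${MPI_COMPILER:-}" in
        intel|intel10|intel9|pgi|gnu|gcc|nag) ;;
        *) echo "MPI_COMPILER is unset or not a listed value: '${MPI_COMPILER:-}'" >&2
           return 1 ;;
    esac
    case "${MPI_INTERCONNECT:-}" in
        ib|gige|shmem) ;;
        *) echo "MPI_INTERCONNECT is unset or not a listed value: '${MPI_INTERCONNECT:-}'" >&2
           return 1 ;;
    esac
    # MPI_VENDOR is optional; if set, the only listed value is openmpi.
    case "${MPI_VENDOR:-}" in
        ""|openmpi) ;;
        *) echo "MPI_VENDOR is not a listed value: '${MPI_VENDOR}'" >&2
           return 1 ;;
    esac
    return 0
}
```

Calling check_mpi_env near the top of a build script, and stopping if it fails, catches a typo such as "intle" before any compile command runs.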
2. Compiler/Interconnect Compatibility Matrix
INTERCONNECT columns:
ib         = Infiniband hardware via MVAPICH software
ib/openmpi = Infiniband hardware via OpenMPI software
gige       = Gigabit Ethernet hardware via TCP/IP software
shmem      = using RAM inside a single node

COMPILER                      | ib | ib/openmpi | gige    | shmem
gcc                           | X  | X          | X       | X
g++                           | X  | X          | X       | X
g77 (soft linked to gfortran) | X  | X          | X       | X
gfortran                      | X  | X          | X       | X
icc                           | X  | X          | X       | X
icpc                          | X  | X          | X       | X
ifort                         | X  | X          | X       | X
pgcc                          | X  | X          | X       | X
pgCC                          | X  | X          | X       | X
pgf77                         | X  | X          | X       | X
pgf90                         | X  | X          | X       | X
pghpf                         |    |            |         |
nagf95                        |    |            | NOT YET | NOT YET
1. Before Compiling a
CUDA
Program
Before compiling a
CUDA
program,
you need to set some
environment variables:
-
CUDA_PATH
This environment variable indicates
the location of the version of CUDA you're using.
-
PATH
This environment variable indicates
the location(s) of executables that you're using.
-
LD_LIBRARY_PATH
This environment variable indicates
the location(s) of libraries that you're using.
A quick discussion of how to set the environment variables is below.
CUDA_PATH
This environment variable indicates
the location of the version of CUDA you're using.
OSCER supports
multiple
CUDA versions on Sooner (see above).
Possible values that you can set
CUDA_PATH
to are:
-
/home/software/CUDA/2.3/install/toolkit/cuda
-
/home/software/CUDA/3.0/install/cuda
-
/home/software/CUDA/3.2/install/cuda
-
/home/software/CUDA/4.0/cuda
NOTE:
As newer versions of CUDA become available, older
versions will be deprecated and eventually removed
from the system.
PATH
This environment variable indicates
the location(s) of executables that you're using.
Simply edit the existing
PATH
for CUDA compilation by adding:
-
${CUDA_PATH}/bin:${CUDA_PATH}/open64/bin:${PATH}
NOTE:
The "${CUDA_PATH}" means: take the value of CUDA_PATH and substitute it here.
LD_LIBRARY_PATH
This environment variable indicates
the location(s) of libraries that you're using.
The only possible value you can set
LD_LIBRARY_PATH
to is:
-
${CUDA_PATH}/lib:${CUDA_PATH}/open64/lib:${CUDA_PATH}/lib64:${LD_LIBRARY_PATH}
This value means that the NVIDIA CUDA compiler
will look for the libraries needed for
successful compilation in the indicated locations.
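Put together, the three CUDA settings might look like this for bash, ksh or zsh. This is a sketch: it picks the CUDA 4.0 location from the list above purely as an example, so substitute whichever version you actually want.

```shell
# Sketch for bash, ksh, or zsh: set the three CUDA environment variables.
# The CUDA 4.0 location is just one of the values listed above.
export CUDA_PATH=/home/software/CUDA/4.0/cuda

# Put the CUDA executables in front of whatever PATH already contains.
export PATH="${CUDA_PATH}/bin:${CUDA_PATH}/open64/bin:${PATH}"

# Put the CUDA libraries in front of whatever LD_LIBRARY_PATH already
# contains. The ${VAR:+...} form adds the old value (and its colon) only
# if LD_LIBRARY_PATH was already set, which avoids a stray trailing colon.
export LD_LIBRARY_PATH="${CUDA_PATH}/lib:${CUDA_PATH}/open64/lib:${CUDA_PATH}/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
```

Set CUDA_PATH first, because the other two values are built from it.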
3. How to Set Environment Variables
How to set these environment variables
depends on which
Unix shell
you are using.
If you're not sure which
Unix shell
you're using
(or if you're not sure what a "Unix shell" is),
then type this at the Unix prompt
and then press the "Enter" key:
ps
The response will look something like this:
PID TTY TIME CMD
28826 pts/19 00:00:00 tcsh
30567 pts/19 00:00:00 ps
If your shell is
tcsh ,
then to set the environment variables,
type this
(perhaps with different values for the environment variables)
at the Unix prompt:
setenv MPI_COMPILER intel
setenv MPI_INTERCONNECT ib
If your shell is
bash
or
ksh
or
zsh ,
then to set the environment variables,
type this at the Unix prompt:
export MPI_COMPILER=intel
export MPI_INTERCONNECT=ib
If your shell is
sh ,
then to set the environment variables,
type this at the Unix prompt:
MPI_COMPILER=intel
export MPI_COMPILER
MPI_INTERCONNECT=ib
export MPI_INTERCONNECT
If your shell is something else,
then please contact us
and we'll try to help.
4. How to Compile an MPI Program
After You've Set the Environment Variables
After you've set the environment variables,
compile using one of the following compile commands:
-
mpicc
for C code
NOTE:
Use this INSTEAD OF
icc ,
gcc
or
pgcc .
-
mpiCC
for C++ code
NOTE:
Use this INSTEAD OF
icpc ,
g++
or
pgCC .
-
mpif77
for Fortran 77 code
NOTE:
Use this INSTEAD OF
ifort ,
g77 ,
gfortran ,
nagf95 ,
or
pgf77 .
-
mpif90
for Fortran 90 code
NOTE:
Use this INSTEAD OF
ifort ,
gfortran ,
nagf95 ,
or
pgf90 .
In all other details,
the usage is exactly the same
as when
compiling
a non-parallel code.
In particular, you will use the same compiler options
as you would have used for the compiler family
that you chose when setting
MPI_COMPILER .
For example,
to compile an MPI code written in Fortran 90
using the
Intel
compiler family,
the compile command might start with:
mpif90 -O -march=core2 -mtune=core2 ...
WARNING!
Be sure that
EVERY SINGLE COMPILE COMMAND
in your compilation procedure,
including the final compile command for linking the executable,
uses the EXACT SAME COMPILER OPTIONS!
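If a build script handles source files in more than one language, a small helper can pick the MPI compile command for each file. The following is a sketch, assuming bash; the function name mpi_wrapper_for is hypothetical, and the mapping from filename extension to language is a common convention that this sketch assumes, not something Sooner enforces.

```shell
#!/bin/bash
# Sketch: choose the MPI compile command from a source file's extension.
# mpi_wrapper_for is a hypothetical helper name; the extension-to-language
# mapping below is an assumed convention.
mpi_wrapper_for() {
    case "$1" in
        *.c)                      echo mpicc ;;
        *.C|*.cc|*.cpp|*.cxx)     echo mpiCC ;;
        *.f|*.F|*.f77)            echo mpif77 ;;
        *.f90|*.F90|*.f95|*.F95)  echo mpif90 ;;
        *) echo "unrecognized source file: $1" >&2
           return 1 ;;
    esac
}
```

For example, $(mpi_wrapper_for mysourcefile1.f90) -O -march=core2 -mtune=core2 -c mysourcefile1.f90 expands to an mpif90 compile command with the same options as the example above.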
Except for very small, very brief non-MPI runs
on a few CPU cores within a single node,
ALL
runs should be performed using the LSF batch queue system.
To start,
you'll want to make a copy of one of the following
batch script files:
-
~hneeman/example_nonparallel.bsub
Use this batch script file
to run any batch job that uses
EXACTLY ONE CPU CORE
inside a single compute node
(that is, a non-parallel job).
-
~hneeman/example_parallel_sharedmem.bsub
Use this batch script file
to run any
batch job that uses
shared memory parallelism
via either
OpenMP
or
POSIX
threads (pthreads)
on up to 8 CPU cores inside a single compute node.
-
~hneeman/example_parallel_mpi.bsub
Use this batch script file
to run any batch job that uses
purely distributed parallelism via
MPI
on any number of nodes and any number of CPU cores per node
(up to 8).
-
~hneeman/example_parallel_hybrid.bsub
Use this batch script file
to run any batch job that uses
a hybrid of two kinds of parallelism:
(a) distributed parallelism via
MPI
on any number of nodes and any number of CPU cores per node
(up to 8, though we recommend an upper limit of 4),
and
(b) shared memory parallelism
via either
OpenMP
or
POSIX
threads (pthreads).
We recommend that the total number of threads per node
be at most 8,
because you almost certainly don't want more threads
than cores per node.
-
~hneeman/example_fat.bsub
Use this batch script file
to run any nonparallel (serial) or shared memory parallel
batch job that uses
1 to 16 CPU cores inside a single fat node.
The batch job can be either
(a) non-parallel (serial)
or
(b) shared memory parallel
via either
OpenMP
or
POSIX
threads (pthreads).
-
~hneeman/example_nonparallel_cuda.bsub
Use this batch script file
to run any nonparallel CUDA batch job that uses
1 CPU core inside a single node but could use up
to two CUDA devices.
-
~hneeman/example_parallel_mpi_cuda.bsub
Use this batch script file
to run any parallel CUDA batch job that uses
multiple CPU cores across multiple nodes
AND uses multiple CUDA devices.
Once you've identified which batch script file to use,
do this:
cp ~hneeman/example_nonparallel.bsub whatever.bsub
where
whatever.bsub
is the name of the batch script file that you want to create;
typically,
you'll replace
whatever
with the name of your executable or the experiment or something.
Note that you could use any of the other
example_*.bsub
files
instead of
example_nonparallel.bsub .
You should be able to modify your copy of the batch script file
to suit your needs.
It contains detailed information
about how to set up and run a batch job.
IMPORTANT NOTE!!!
In your batch script,
you MUST
use the absolute FULL PATH
for your executable!
DON'T
use a relative path or leave out the path!
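If you aren't sure what the full path of your executable is, the shell can work it out for you. The following is a sketch, assuming bash; the function name abs_path is hypothetical.

```shell
#!/bin/bash
# Sketch: print the absolute full path of an existing file.
# abs_path is a hypothetical helper name.
abs_path() {
    # Change to the file's directory in a subshell and ask for its full
    # name; fail if the directory doesn't exist.
    dir=$(cd "$(dirname "$1")" && pwd) || return 1
    echo "$dir/$(basename "$1")"
}
```

For example, running abs_path myexecutable in the directory that contains myexecutable prints a path that starts with a slash; that full path is what belongs in the batch script.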