OSCER Benchmarks

Aspen Systems Linux Pentium4 Xeon Cluster (boomer.oscer.ou.edu)

These benchmarks are provided with no claims as to correctness or relevance.


High Performance Linpack (HPL)

HPL is the de facto standard for benchmarking supercomputers. It is used to determine the rankings on the TOP500 list of supercomputers.

Sustained performance on 256 processors: 606.9 GFLOP/s (57.5% of peak)
(We used 256 processors instead of 264 because HPL runs faster when the number of processors is a perfect square.)

This number was obtained using Kazushige Goto's DGEMM kernel for Pentium4.
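
As a quick sanity check on the figures above, the theoretical peak implied by the sustained rate and the quoted percentage can be recovered by simple division. This is only a sketch: the function name is hypothetical, the percentage is rounded to one decimal place, and the page does not state which processor count the peak figure is based on, so the result is approximate.

```python
# Figures quoted in the text above.
sustained_gflops = 606.9   # measured HPL result on 256 processors
fraction_of_peak = 0.575   # "57.5% of peak"


def implied_peak(sustained: float, fraction: float) -> float:
    """Return the theoretical peak implied by a sustained rate and its fraction of peak."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return sustained / fraction


peak = implied_peak(sustained_gflops, fraction_of_peak)
print(f"Implied peak: {peak:.1f} GFLOP/s")  # prints "Implied peak: 1055.5 GFLOP/s"
```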

Input file
Output file

Benchmarks on various numbers of processors
Raw performance (GFLOP/s) PostScript PDF
Percent of theoretical peak PostScript PDF

STREAM

The STREAM benchmark measures memory bandwidth on one or more processors within a cluster node or an SMP.

All data are in MB/sec.

Note that these benchmarks are based on the generic version of the STREAM benchmark. For each entry, only the maximum values are provided.

Using gcc -O
Procs  Copy       Scale      Add        Triad
1      1481.9591  1481.0690  1653.8636  1665.6851

Using ecgs -O3 -funroll-all-loops -fno-f2c -fomit-frame-pointer
Procs  Copy       Scale      Add        Triad
1      1276.2237  1269.3378  1452.8294  1450.1064

Using pgcc -O4 -Munroll -Mnoframe -Mnobounds -Mnodepchk -Mcache_align -Mdalign -Mvect=sse
Procs  Copy       Scale      Add        Triad
1      1365.6556  1364.3167  1533.3977  1535.1661

Using pgf77 -O4 -Munroll -Mnoframe -Mnobounds -Mnodepchk -Mcache_align -Mdalign -Mvect=sse -mp
Procs  Copy       Scale      Add        Triad
1      1362.3381  1360.8911  1529.9287  1534.9195

Using icc -O3 -tpp7 -xW -static
Procs  Copy       Scale      Add        Triad
1      1372.0924  1370.3309  1535.6075  1535.6532
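
For readers unfamiliar with the four columns above, the following sketch shows what each STREAM kernel computes and how an elapsed time turns into a MB/sec figure. It is an illustration, not the benchmark itself: the function names are hypothetical, the scalar q = 3.0 and the array length are assumptions for the example, and pure Python lists measure interpreter overhead rather than memory bandwidth, so the rates it prints are not comparable to the table.

```python
import time

BYTES_PER_WORD = 8  # STREAM operates on double-precision (8-byte) values

# Arrays touched per loop iteration by each kernel (reads plus writes).
WORDS_PER_ITERATION = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def run_kernel(name, a, b, c, q=3.0):
    """Execute one STREAM kernel in place on the lists a, b and c."""
    n = len(a)
    if name == "Copy":       # c = a
        for i in range(n):
            c[i] = a[i]
    elif name == "Scale":    # b = q * c
        for i in range(n):
            b[i] = q * c[i]
    elif name == "Add":      # c = a + b
        for i in range(n):
            c[i] = a[i] + b[i]
    elif name == "Triad":    # a = b + q * c
        for i in range(n):
            a[i] = b[i] + q * c[i]
    else:
        raise ValueError(f"unknown kernel: {name}")


def bandwidth_mb_per_sec(name, n, seconds):
    """Convert an elapsed time into MB/sec, with 1 MB = 10**6 bytes."""
    return WORDS_PER_ITERATION[name] * BYTES_PER_WORD * n / seconds / 1.0e6


def measure(n=100_000, trials=3):
    """Run each kernel several times and keep the best (maximum) rate."""
    a, b, c = [1.0] * n, [2.0] * n, [0.0] * n
    best = {}
    for name in WORDS_PER_ITERATION:
        rates = []
        for _ in range(trials):
            start = time.perf_counter()
            run_kernel(name, a, b, c)
            elapsed = time.perf_counter() - start
            rates.append(bandwidth_mb_per_sec(name, n, elapsed))
        best[name] = max(rates)
    return best


if __name__ == "__main__":
    for name, rate in measure().items():
        print(f"{name:<6s} {rate:12.4f} MB/sec")
```

Copy and Scale touch two arrays per iteration while Add and Triad touch three, which is why the byte count differs between the two pairs of columns.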

LLCbench

LLCbench contains three benchmarks: BLASBench, CacheBench and MPBench.

BLASBench
BLASBench measures numerical performance using the Basic Linear Algebra Subprograms. The tested routines are: DAXPY (Double precision a*x+y), DGEMV (Double precision GEneral Matrix-Vector multiply) and DGEMM (Double precision GEneral Matrix-Matrix multiply).
Using Netlib BLAS
(gcc -O3 -funroll-all-loops -fno-f2c -fomit-frame-pointer)
PDF PostScript
Using ATLAS
(gcc -O3 -funroll-all-loops -fno-f2c -fomit-frame-pointer)
PDF PostScript
Using ATLAS
(icc -O2 -mp -prec_div -pc64 -unroll -tpp7 -xW -vec_report2 -opt_report)
PDF PostScript
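
To make the three routine names above concrete, here is a minimal pure-Python reference for what each one computes, together with the usual approximate operation counts used to turn a run time into a FLOP rate. This is a simplified sketch: the real BLAS DGEMV and DGEMM also take scaling factors (y := alpha*A*x + beta*y and C := alpha*A*B + beta*C), which are fixed at alpha = 1 and beta = 0 here, and the function names and signatures are illustrative rather than the BLAS calling interface.

```python
def daxpy(alpha, x, y):
    """Return alpha*x + y for equal-length vectors x and y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def dgemv(a, x):
    """Return the matrix-vector product A*x, with A given as a list of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def dgemm(a, b):
    """Return the matrix-matrix product A*B, with both given as lists of rows."""
    columns = list(zip(*b))  # transpose B so each column can be paired with a row of A
    return [[sum(p * q for p, q in zip(row, col)) for col in columns] for row in a]


def flop_count(routine, n):
    """Approximate operation count for length-n vectors or n x n matrices."""
    return {"DAXPY": 2 * n, "DGEMV": 2 * n * n, "DGEMM": 2 * n ** 3}[routine]


print(daxpy(2.0, [1.0, 2.0], [3.0, 4.0]))        # [5.0, 8.0]
print(dgemv([[1, 2], [3, 4]], [1, 1]))           # [3, 7]
print(dgemm([[1, 2], [3, 4]], [[5, 6], [7, 8]])) # [[19, 22], [43, 50]]
```

DGEMM performs on the order of n^3 operations on n^2 data, whereas DAXPY performs only 2n operations on about 3n memory accesses, so DGEMM can generally run much closer to a processor's peak rate than DAXPY can.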

CacheBench
CacheBench measures memory bandwidth, much as STREAM does, but it also determines the bandwidth of the cache(s).
Using
(icc -O2 -mp -prec_div -pc64 -unroll -tpp7 -xW -vec_report2 -opt_report)
PDF PostScript
Using
(gcc -O3 -funroll-all-loops -fno-f2c -fomit-frame-pointer)
PDF PostScript
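
The general idea behind a cache bandwidth measurement can be sketched as a sweep over buffer sizes: small buffers fit in cache and tend to show higher rates, while large buffers spill to main memory. The code below is an assumption-laden illustration of that idea, not CacheBench's own method. The function names and default sizes are hypothetical, it times only a buffer copy, and at small sizes Python call and allocation overhead will dominate the result.

```python
import time


def copy_bandwidth_mb_per_sec(size_bytes, total_bytes=1 << 26):
    """Time repeated copies of one buffer and return MB/sec (1 MB = 10**6 bytes)."""
    src = bytearray(size_bytes)
    reps = max(1, total_bytes // size_bytes)  # keep total traffic roughly constant
    start = time.perf_counter()
    for _ in range(reps):
        dst = bytes(src)  # one pass: read src, write a freshly allocated copy
    elapsed = time.perf_counter() - start
    # Count both the bytes read and the bytes written.
    return 2 * size_bytes * reps / elapsed / 1.0e6


def sweep(min_exp=10, max_exp=24, total_bytes=1 << 26):
    """Return (size_bytes, MB/sec) pairs for power-of-two buffer sizes."""
    return [(1 << e, copy_bandwidth_mb_per_sec(1 << e, total_bytes))
            for e in range(min_exp, max_exp + 1)]


if __name__ == "__main__":
    for size, rate in sweep():
        print(f"{size:>10d} bytes  {rate:12.1f} MB/sec")
```

Plotting the rate against buffer size on a logarithmic axis usually makes the steps between cache levels easier to see than the raw numbers do.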

MPBench: coming soon