Baseline C:   cc -arch ev7 -fast +CFB ONESTEP
Baseline C++: cxx -arch ev7 -O2 ONESTEP
Peak:
The following benchmarks use: -g3 -arch ev7 ONESTEP
175.vpr 181.mcf 197.parser 253.perlbmk
The following benchmarks use: -g3 -arch ev6 ONESTEP
164.gzip 176.gcc 254.gap 255.vortex 256.bzip2 300.twolf
Individual benchmark tuning:
164.gzip: -fast -O4 -non_shared +CFB
175.vpr: -fast -O4 -assume restricted_pointers +CFB
176.gcc: -fast -O4 -xtaso_short -all -ldensemalloc -none +CFB +IFB
181.mcf: -fast -xtaso_short +CFB +IFB +PFB
186.crafty: same as base
197.parser: -fast -O4 -xtaso_short -non_shared +CFB
252.eon: -arch ev7 -O2 -all -ldensemalloc -none
253.perlbmk: -fast -non_shared +CFB +IFB
254.gap: -fast -O4 -non_shared +CFB +IFB +PFB
255.vortex: -fast -non_shared +CFB +IFB
256.bzip2: -fast -O4 -non_shared +CFB
300.twolf: -fast -O4 -ldensemalloc -non_shared +CFB +IFB
Most benchmarks are built using one or more types of
profile-driven feedback. The types used are designated
by abbreviations in the notes:
+CFB: Code generation is optimized by the compiler, using
feedback from a training run. These commands are
run before the first compile (in phase "fdo_pre0"):
mkdir /tmp/pp
rm -f /tmp/pp/${baseexe}*
and these flags are added to the first and second compiles, respectively:
PASS1_CFLAGS = -prof_gen_noopt -prof_dir /tmp/pp
PASS2_CFLAGS = -prof_use -prof_dir /tmp/pp
(Peak builds use /tmp/pp above; base builds use /tmp/pb.)
+IFB: Icache usage is improved by the post-link-time optimizer
Spike, using feedback from a training run. These commands
are used (in phase "fdo_postN"):
mv ${baseexe} oldexe
spike oldexe -feedback oldexe -o ${baseexe}
+PFB: Prefetches are improved by the post-link-time optimizer
Spike, using feedback from a training run. These
commands are used (in phase "fdo_post_makeN"):
rm -f *Counts*
mv ${baseexe} oldexe
pixie -stats dstride oldexe 1>pixie.out 2>pixie.err
mv oldexe.pixie ${baseexe}
A training run is carried out (in phase "fdo_runN"), and
then this command (in phase "fdo_postN"):
spike oldexe -fb oldexe -stride_prefetch -o ${baseexe}
When Spike is used for both Icache and Prefetch improvements,
only one spike command is actually issued, with the Icache
options followed by the Prefetch options.
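The three feedback types combine per benchmark; for example, 181.mcf uses all three. The sketch below is a hypothetical helper, fdo_plan, that only prints the steps each tag implies, in the order described above. The phase names and flags come from these notes; the step wording and the executable name "mcf" are illustrative assumptions. It does not run cc, pixie or spike.

```shell
#!/bin/sh
# Sketch only: prints (does not run) the feedback steps implied by a
# benchmark's tags, following the +CFB/+IFB/+PFB notes.
# fdo_plan is a hypothetical helper; step wording is illustrative.
fdo_plan() {
    exe=$1; profdir=$2; shift 2    # ${baseexe}, then /tmp/pp (peak) or /tmp/pb (base)
    cfb=no; ifb=no; pfb=no
    for tag in "$@"; do
        case $tag in
            +CFB) cfb=yes ;;
            +IFB) ifb=yes ;;
            +PFB) pfb=yes ;;
        esac
    done
    if [ "$cfb" = yes ]; then
        # Compiler feedback: clean profile dir, instrumented build, train, rebuild
        echo "fdo_pre0: mkdir $profdir; rm -f $profdir/${exe}*"
        echo "compile 1: add -prof_gen_noopt -prof_dir $profdir"
        echo "training run"
        echo "compile 2: add -prof_use -prof_dir $profdir"
    fi
    if [ "$pfb" = yes ]; then
        # Prefetch feedback needs a pixie-instrumented binary and its own training run
        echo "fdo_post_makeN: instrument with pixie -stats dstride"
        echo "fdo_runN: training run"
    fi
    # Spike is issued once, even when both Icache and Prefetch feedback apply
    if [ "$ifb" = yes ] && [ "$pfb" = yes ]; then
        echo "fdo_postN: one spike command (Icache options, then Prefetch options)"
    elif [ "$ifb" = yes ]; then
        echo "fdo_postN: spike with Icache options"
    elif [ "$pfb" = yes ]; then
        echo "fdo_postN: spike with Prefetch options"
    fi
}

# 181.mcf (peak) is built with all three feedback types
fdo_plan mcf /tmp/pp +CFB +IFB +PFB
```

For 181.mcf this prints seven steps, ending with the single spike command; passing /tmp/pb instead of /tmp/pp gives the base-build variant of the +CFB steps.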
Portability:
gcc: -Dalloca=__builtin_alloca
crafty: -DALPHA
perlbmk: -DSPEC_CPU2000_DUNIX
vortex: -DSPEC_CPU2000_LP64
gap: -DSYS_HAS_CALLOC_PROTO -DSYS_IS_BSD -DSYS_HAS_IOCTL_PROTO -DSPEC_CPU2000_LP64
Information on UNIX V5.1B patches can be found at
http://ftp1.service.digital.com/public/unix/v5.1b/
Kernel subsystem tuning:
vm:
vm_bigpg_enabled = 1
vm_bigpg_thresh = 16
vm_swap_eager = 0
proc:
max_per_proc_address_space = 0x40000000000
max_per_proc_data_size = 0x40000000000
max_per_proc_stack_size = 0x40000000000
max_proc_per_user = 2048
max_threads_per_user = 0
maxusers = 16384
per_proc_address_space = 0x40000000000
per_proc_data_size = 0x40000000000
per_proc_stack_size = 0x40000000000
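The address-space, data-size and stack-size limits above all use the same hexadecimal value, 0x40000000000. A quick shell check, assuming a shell with 64-bit arithmetic (typical on current systems), shows that it is 2^42 bytes, i.e. 4 TiB:

```shell
#!/bin/sh
# Decode the per-process limit value used in the proc: stanza.
# Assumes 64-bit shell arithmetic; POSIX arithmetic accepts hex constants.
limit=0x40000000000
bytes=$(( $limit ))          # decimal byte count
tib=$(( $limit >> 40 ))      # 1 TiB = 2^40 bytes
echo "bytes: $bytes"         # prints "bytes: 4398046511104"
echo "TiB: $tib"             # prints "TiB: 4"
```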
In the ES80, there are two CPUs per shelf, and each CPU has
its own 4GB of memory. Neither CPU can be physically
removed. For 1-CPU results measured on a 2-CPU system, one
CPU was turned off at boot time using the /etc/sysconfigtab
setting "cpu_enabled_mask=0". The disabled CPU's 4GB of
memory was also physically removed.