Baseline:
C: cc -arch ev7 -fast +CFB ONESTEP
C++: cxx -arch ev7 -O2 ONESTEP
Peak:
The following use: -g3 -arch ev7 ONESTEP
175.vpr 181.mcf 197.parser 253.perlbmk
The following use: -g3 -arch ev6 ONESTEP
164.gzip 176.gcc 254.gap 255.vortex 256.bzip2 300.twolf
Individual benchmark tuning:
164.gzip: -fast -O4 -non_shared +CFB
175.vpr: -fast -O4 -assume restricted_pointers +CFB
176.gcc: -fast -O4 -xtaso_short -all -ldensemalloc -none +CFB +IFB
181.mcf: -fast -xtaso_short +CFB +IFB +PFB
186.crafty: same as base
197.parser: -fast -O4 -xtaso_short -non_shared +CFB
252.eon: -arch ev7 -O2 -all -ldensemalloc -none
253.perlbmk: -fast -non_shared +CFB +IFB
254.gap: -fast -O4 -non_shared +CFB +IFB +PFB
255.vortex: -fast -non_shared +CFB +IFB
256.bzip2: -fast -O4 -non_shared +CFB
300.twolf: -fast -O4 -ldensemalloc -non_shared +CFB +IFB
Most benchmarks are built using one or more types of
profile-driven feedback. The types used are designated
by abbreviations in the notes:
+CFB: Code generation is optimized by the compiler, using
feedback from a training run. These commands are
run before the first compile (in phase "fdo_pre0"):
mkdir /tmp/pp
rm -f /tmp/pp/${baseexe}*
and these flags are added to the first and second compiles:
PASS1_CFLAGS = -prof_gen_noopt -prof_dir /tmp/pp
PASS2_CFLAGS = -prof_use -prof_dir /tmp/pp
(Peak builds use /tmp/pp above; base builds use /tmp/pb.)
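The +CFB steps above can be sketched as a dry run that prints the
commands in order rather than executing them. This is only an
illustration: the function name cfb_plan, the source file src.c, and
the explicit -o arguments are assumptions made for the example, and
the per-benchmark tuning flags are left out.

```shell
#!/bin/sh
# Dry-run sketch of the +CFB sequence: prints commands, runs nothing.
# Assumptions (not from the notes): the cfb_plan name, the src.c
# source file, and the explicit -o output arguments.
cfb_plan() {
    baseexe=$1      # benchmark executable name, e.g. gzip
    profdir=$2      # /tmp/pp for peak builds, /tmp/pb for base builds
    # Phase fdo_pre0: create the profile directory, clear stale data.
    echo "mkdir $profdir"
    echo "rm -f $profdir/${baseexe}*"
    # First compile: instrumented build (PASS1_CFLAGS).
    echo "cc -prof_gen_noopt -prof_dir $profdir -o $baseexe src.c"
    # Training run, which produces the feedback data.
    echo "# training run of ./$baseexe"
    # Second compile: rebuild using the feedback (PASS2_CFLAGS).
    echo "cc -prof_use -prof_dir $profdir -o $baseexe src.c"
}

cfb_plan gzip /tmp/pp
```

Passing /tmp/pb as the second argument gives the base-build variant.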
+IFB: Icache usage is improved by the post-link-time optimizer
Spike, using feedback from a training run. These commands
are used (in phase "fdo_postN"):
mv ${baseexe} oldexe
spike oldexe -feedback oldexe -o ${baseexe}
+PFB: Prefetches are improved by the post-link-time optimizer
Spike, using feedback from a training run. These
commands are used (in phase "fdo_post_makeN"):
rm -f *Counts*
mv ${baseexe} oldexe
pixie -stats dstride oldexe 1>pixie.out 2>pixie.err
mv oldexe.pixie ${baseexe}
A training run is carried out (in phase "fdo_runN"), and
then this command (in phase "fdo_postN"):
spike oldexe -fb oldexe -stride_prefetch -o ${baseexe}
When Spike is used for both Icache and Prefetch improvements,
only one spike command is actually issued, with the Icache
options followed by the Prefetch options.
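The +PFB sequence spans three phases, so it may help to see the
commands listed in phase order. The sketch below only prints them;
the function name pfb_plan and the comment line standing in for the
training run are assumptions made for the example. It does not show
the combined Icache-plus-Prefetch command, since the notes do not
give its exact form.

```shell
#!/bin/sh
# Dry-run sketch of the +PFB sequence, grouped by the phases named
# in the notes. Prints commands, runs nothing.
# Assumptions (not from the notes): the pfb_plan name and the
# placeholder line for the training run.
pfb_plan() {
    baseexe=$1
    echo "# fdo_post_makeN: instrument the executable with pixie"
    echo "rm -f *Counts*"
    echo "mv $baseexe oldexe"
    echo "pixie -stats dstride oldexe 1>pixie.out 2>pixie.err"
    echo "mv oldexe.pixie $baseexe"
    echo "# fdo_runN: training run of the instrumented $baseexe"
    echo "# fdo_postN: optimize the original executable with Spike"
    echo "spike oldexe -fb oldexe -stride_prefetch -o $baseexe"
}

pfb_plan mcf
```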
vm:
vm_bigpg_enabled = 1
vm_bigpg_thresh = 16
vm_swap_eager = 0
proc:
max_per_proc_address_space = 0x40000000000
max_per_proc_data_size = 0x40000000000
max_per_proc_stack_size = 0x40000000000
max_proc_per_user = 2048
max_threads_per_user = 0
maxusers = 16384
per_proc_address_space = 0x40000000000
per_proc_data_size = 0x40000000000
per_proc_stack_size = 0x40000000000
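The 0x40000000000 limits in the proc settings above are easier to
read in binary units. A small worked conversion, assuming the values
are byte counts (the notes do not state the unit):

```shell
#!/bin/sh
# Convert the per-process limit from the notes into TiB.
# Assumption: the value is a byte count.
limit=0x40000000000
# Shifting right by 40 bits divides by 2^40, i.e. one TiB.
tib=$(( $limit >> 40 ))
echo "$limit bytes = $tib TiB"
```

Under that assumption, each of these limits is 2^42 bytes, or 4 TiB.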
Portability:
gcc: -Dalloca=__builtin_alloca
crafty: -DALPHA
perlbmk: -DSPEC_CPU2000_DUNIX
vortex: -DSPEC_CPU2000_LP64
gap: -DSYS_HAS_CALLOC_PROTO -DSYS_IS_BSD -DSYS_HAS_IOCTL_PROTO -DSPEC_CPU2000_LP64
Information on UNIX V5.1B Patches can be found at
http://ftp1.service.digital.com/public/unix/v5.1b/
Processes were bound to CPUs using 'runon'.
In the GS1280, there are two CPUs per shelf. Each CPU
has its own 4 GB of memory. Neither CPU can be
physically removed. For 1-CPU result measurements,
one CPU was turned off at boot time using the
/etc/sysconfigtab setting "cpu_enabled_mask=0", and the
second CPU's 4 GB of memory was physically removed.