Base notes:
ONESTEP=yes for all benchmarks in base
C flags: -fast -xopenmp -xalias_level=std -xipo=2
-xprefetch_level=2 -m64 -lmtmalloc -xprofile
f90 flags: -fast -autopar -openmp -xipo=2 -xprefetch_level=2
-m64 -xprefetch=latx:3 -xprofile
Extra Base C flags: -Xc
328.fma3d_m srcalt: ompm2001-fma3dsqrtinit-20070912, fix race condition
Peak notes:
ONESTEP=yes for all benchmarks in peak
310.wupwise: -fast -autopar -openmp -xipo=2 -m64
-xprefetch=latx:3 -xprofile
312.swim: -fast -openmp -autopar -xunroll=7 -m64 -xipo=2
-xpagesize=4m -xprefetch=latx:4 -xpad=common:1921
-Qoption iropt -Atile:skewp
314.mgrid: -fast -openmp -xipo=2 -xprefetch_level=3 -m64
-xpagesize=4m -xprefetch=latx:3 -xcode=abs32
-xunroll=8 -xprofile
OMP_NUM_THREADS=128
SUNW_MP_PROCBIND=true
316.applu: -fast -openmp -autopar -xipo=2 -xprefetch_level=3
-m64 -xprefetch=latx:4 -xcode=abs32 -xunroll=2
-xpagesize_heap=4m -xlinkopt=2
OMP_NUM_THREADS = 80
SUNW_MP_PROCBIND= 2 3 4 5 6 10 11 12 13 14 18 19 20 21 22
26 27 28 29 30 34 35 36 37 38 42 43 44 45 46 50 51 52
53 54 58 59 60 61 62 66 67 68 69 70 74 75 76 77 78 82
83 84 85 86 90 91 92 93 94 98 99 100 101 102 106 107
108 109 110 114 115 116 117 118 122 123 124 125 126
318.galgel: -fast -xipo=2 -openmp -autopar -prefetch=latx:3.0
-xunroll=8 -dbl_align_all=yes -stackvar -xlinkopt=2
-xlic_lib=sunperf -m64 -xprofile
RM_SOURCES=lapak.f90
OMP_NUM_THREADS = 64
SUNW_MP_PROCBIND= 3 4 5 6 11 12 13 14 19 20 21 22 27 28
29 30 35 36 37 38 43 44 45 46 51 52 53 54 59 60 61 62
67 68 69 70 75 76 77 78 83 84 85 86 91 92 93 94 99
100 101 102 107 108 109 110 115 116 117 118 123 124
125 126
320.equake: -fast -xopenmp -xipo=2 -xprefetch=latx:2
-xprefetch_level=3 -m64 -xunroll=4 -lmtmalloc
-xpagesize=64K -xautopar -xprofile
srcalt = ompl.32
OMP_NUM_THREADS = 96
SUNW_MP_PROCBIND= 1 2 3 4 5 6 9 10 11 12 13 14 17 18 19 20
21 22 25 26 27 28 29 30 33 34 35 36 37 38 41 42 43 44 45
46 49 50 51 52 53 54 57 58 59 60 61 62 65 66 67 68 69 70
73 74 75 76 77 78 81 82 83 84 85 86 89 90 91 92 93 94 97
98 99 100 101 102 105 106 107 108 109 110 113 114 115
116 117 118 121 122 123 124 125 126
324.apsi: -fast -openmp -xipo=2 -m64 -xpagesize=4M
-xprefetch=latx:5 -xunroll=5 -xprofile
OMP_NUM_THREADS = 128
SUNW_MP_PROCBIND= 3 11 19 27 35 43 51 59 67 75 83 91 99
107 115 123 4 12 20 28 36 44 52 60 68 76 84 92 100 108
116 124 5 13 21 29 37 45 53 61 69 77 85 93 101 109 117
125 6 14 22 30 38 46 54 62 70 78 86 94 102 110 118 126
2 10 18 26 34 42 50 58 66 74 82 90 98 106 114 122 7 15
23 31 39 47 55 63 71 79 87 95 103 111 119 127 1 9 17
25 33 41 49 57 65 73 81 89 97 105 113 121 0 8 16 24 32
40 48 56 64 72 80 88 96 104 112 120
326.gafort: -fast -autopar -openmp -xipo=2 -xprefetch_level=3
-m64 -xpagesize=4m -xprefetch=latx:5 -xunroll=6
-dbl_align_all=yes -stackvar -xprofile
OMP_NUM_THREADS = 127
SUNW_MP_PROCBIND= 3 11 19 27 35 43 51 59 67 75 83 91 99
107 115 123 4 12 20 28 36 44 52 60 68 76 84 92 100 108
116 124 5 13 21 29 37 45 53 61 69 77 85 93 101 109 117
125 6 14 22 30 38 46 54 62 70 78 86 94 102 110 118 126
2 10 18 26 34 42 50 58 66 74 82 90 98 106 114 122 7 15
23 31 39 47 55 63 71 79 87 95 103 111 119 127 1 9 17
25 33 41 49 57 65 73 81 89 97 105 113 121 0 8 16 24 32
40 48 56 64 72 80 88 96 104 112 120
328.fma3d: -fast -autopar -openmp -xipo=2 -xprefetch_level=3
-xprefetch=latx:4 -m64 -xcode=abs32 -xprofile
srcalt = ompl.32.sqrt.init
ompm2001-fma3dsqrtinit-20070912, fix race condition
330.art: -fast -xopenmp -xautopar -xipo=2 -m64 -xprofile
OMP_NUM_THREADS = 127
SUNW_MP_PROCBIND= 3 11 19 27 35 43 51 59 67 75 83 91 99
107 115 123 4 12 20 28 36 44 52 60 68 76 84 92 100 108
116 124 5 13 21 29 37 45 53 61 69 77 85 93 101 109 117
125 6 14 22 30 38 46 54 62 70 78 86 94 102 110 118 126
2 10 18 26 34 42 50 58 66 74 82 90 98 106 114 122 7 15
23 31 39 47 55 63 71 79 87 95 103 111 119 127 1 9 17
25 33 41 49 57 65 73 81 89 97 105 113 121 0 8 16 24 32
40 48 56 64 72 80 88 96 104 112 120
332.ammp: -fast -xipo=2 -xopenmp -xalias_level=strong -lm
-xprefetch=latx:2 -xlinkopt=2 -xpagesize_stack=8K
-xpagesize_heap=4M
-xprefetch_auto_type=indirect_array_access
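The explicit SUNW_MP_PROCBIND lists in the peak notes follow two regular patterns when the IDs 0..127 are viewed as 16 blocks of 8 consecutive numbers. The lists for 316.applu, 318.galgel and 320.equake take a fixed range of offsets inside each block, while the lists for 324.apsi, 326.gafort and 330.art walk one offset across all blocks before moving to the next offset. The sketch below regenerates both kinds of list. The function names are hypothetical and are not part of the original run scripts, and the block size of 8 is inferred from the numbers themselves rather than stated in the notes.

```sh
#!/bin/sh
# Hypothetical helpers (assumption: IDs 0..127 in blocks of 8).

# Block-major list: offsets FIRST..LAST inside every block of 8.
procbind_blocks() {
    first=$1
    last=$2
    out=""
    base=0
    while [ "$base" -lt 128 ]; do
        off=$first
        while [ "$off" -le "$last" ]; do
            out="$out $((base + off))"
            off=$((off + 1))
        done
        base=$((base + 8))
    done
    # Drop the leading space before printing.
    echo "${out# }"
}

# Offset-major list: for each given offset, visit every block in turn.
procbind_strided() {
    out=""
    for off in "$@"; do
        base=0
        while [ "$base" -lt 128 ]; do
            out="$out $((base + off))"
            base=$((base + 8))
        done
    done
    echo "${out# }"
}

# 316.applu: offsets 2..6 (80 IDs); 318.galgel would be 3..6, 320.equake 1..6.
procbind_blocks 2 6
# 324.apsi, 326.gafort, 330.art: offsets in the order 3 4 5 6 2 7 1 0 (128 IDs).
procbind_strided 3 4 5 6 2 7 1 0
```

A run script could then use, for example, SUNW_MP_PROCBIND="$(procbind_blocks 2 6)" followed by export SUNW_MP_PROCBIND instead of spelling the list out by hand.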
Other notes:
318.galgel_m portability flags: -e -fixed
330.art_m extra flags: -DINTS_PER_CACHELINE=16 -DDBLS_PER_CACHELINE=8
Feedback optimization (-xprofile) is done as follows,
unless otherwise noted:
fdo_pre0: rm -rf `pwd`/feedback.profile
PASS1: -xprofile=collect:./feedback
PASS2: -xprofile=use:./feedback
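As a rough illustration of the sequence above, the following dry-run sketch prints the commands of a two-pass feedback build rather than executing them. The compiler name, the extra flags, and the source and executable names are placeholders chosen for the example. The training run between the two passes is not listed in these notes and is included here as an assumption about how the profile data gets produced.

```sh
#!/bin/sh
# Dry-run sketch: prints the steps of a two-pass feedback build.
# CC, OPT, SRC and EXE are placeholders, not values from the tested setup.
CC=${CC:-cc}
OPT="-fast -m64"
SRC=example.c
EXE=example

fdo_plan() {
    # fdo_pre0: remove any stale profile directory.
    echo "rm -rf $(pwd)/feedback.profile"
    # PASS1: build an instrumented binary.
    echo "$CC $OPT -xprofile=collect:./feedback -o $EXE $SRC"
    # Assumed step: run the instrumented binary on a training workload.
    echo "./$EXE"
    # PASS2: rebuild using the collected profile.
    echo "$CC $OPT -xprofile=use:./feedback -o $EXE $SRC"
}

fdo_plan
```

Printing the plan instead of running it keeps the sketch usable on a machine without the Sun Studio compilers installed.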
The following user environment was used for base and peak runs
except as previously noted:
ulimit -s 32768 (in /bin/sh)
export OMP_DYNAMIC=FALSE
export OMP_NUM_THREADS=127
export SUNW_MP_PROCBIND="1-127"
export SUNW_MP_THR_IDLE=SPIN
export STACKSIZE=16384
/etc/system parameters:
autoup=600
Causes pages older than the listed number of seconds to
be written by fsflush.
bufhwm=3000
Memory byte limit for caching I/O buffers
segmap_percent=1
Sets the maximum percentage of memory used for the file system cache
tune_t_fsflushr=10
Controls how many seconds elapse between runs of the
page flush daemon, fsflush.
tsb_rss_factor=128
Suggests that the size of the TSB (Translation Storage Buffer)
may be increased if it is more than 25% (128/512) full. Doing so
may reduce TSB traps, at the cost of additional kernel memory.
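For reference, entries in /etc/system are written with the "set" keyword, so the parameters listed above would appear roughly as printed by the sketch below. The helper names are hypothetical. The second function only reproduces the 128/512 arithmetic quoted for tsb_rss_factor and takes the divisor of 512 from the note above.

```sh
#!/bin/sh
# Sketch: print the tunables above in /etc/system "set" syntax.
etc_system_lines() {
    for kv in autoup=600 bufhwm=3000 segmap_percent=1 \
              tune_t_fsflushr=10 tsb_rss_factor=128; do
        echo "set $kv"
    done
}

# Percentage threshold implied by a tsb_rss_factor value, using the
# divisor of 512 quoted in the note above.
tsb_threshold_percent() {
    echo $(( $1 * 100 / 512 ))
}

etc_system_lines
tsb_threshold_percent 128
```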
The "webconsole" service was turned off using
svcadm disable webconsole
Sun Studio compiler patches are available at
The tested configuration included patches 124867-07,
124861-08, 124863-07, and 127000-06.
For a description of the Sun Studio 12 compiler flags, portability flags,
and system parameters used to generate this result, please refer to the
SUN-20080714-Studio-Solaris-sparc.txt file in the flags directory.