Sun SPEC CPU2000 Flag Descriptions
Sun ONE Studio 8
Last updated: 11-Feb-2004
Note: This flags file is alphabetized by command
or switch name,
without regard to upper/lower case, without regard to the presence
or absence of a leading "-", and without regard to the software
component that uses the command or switch. The component is mentioned in
(parentheses) immediately after the name of the command or switch.
It is hoped that this order of presentation will make it easier to look
up commands or switches even if the reader does not already know what
software component they belong to.
-Abcopy (optimizer)
Increase the probability that the compiler will perform
memcpy/memset transformations.
-Addint:ignore_parallel (optimizer)
Ignore parallelization factors in loop
interchange heuristics.
-Addint:sf=<n> (optimizer)
When considering whether to interchange loops,
set memory store operation weight to n.
A higher value of n indicates a greater performance
cost for stores.
-Ainline[:cp=<n>][:cs=<n>][:inc=<n>][:irs=<n>]
[:mi][:recursion=1]
(optimizer)
Control the optimizer's loop inliner:
- cp=<n> The minimum call site frequency count
required for a routine to be considered for inlining.
- cs=<n> Set inline callee size limit to n. The unit
roughly corresponds to the number of instructions.
- inc=<n> The inliner is allowed to increase the
size of the program by up to n%.
- irs=<n> Allow routines to increase in size by up to n. The
unit roughly corresponds to the number of instructions.
- mi Perform maximum inlining (without considering code
size increase).
- recursion=1 Allow routines that are called recursively to still be
eligible for inlining.
-Aivsub3 (optimizer)
Increase the probability that loop induction variables will be replaced,
so that some extraneous code can be eliminated from loops.
-Aloop_dist:ignore_parallel (optimizer)
Ignore parallelization factors in loop
distribution heuristics.
-Amemopt:arrayloc (optimizer)
Reconstruct array subscripts during memory allocation merging and
data layout program transformation.
-Apf:llist=<n>:noinnerllist (optimizer)
Do speculative prefetching for linked-list data structures:
- llist=<n> perform prefetching n
iterations ahead.
- noinnerllist do not attempt this for innermost loops.
-Apf:pdl=1 (optimizer)
Do prefetching for one-level indirect memory references.
-array_pad_rows,<n> (Fortran)
Enable padding of arrays by n.
-Ashort_ldst (optimizer)
Convert multiple short memory operations into single long
memory operations.
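As a rough illustration of the idea (not of the optimizer's actual implementation), the sketch below writes two adjacent 16-bit values first with two short stores and then with one combined 32-bit store. Big-endian byte order is used because SPARC is big-endian. The function names are hypothetical.

```python
import struct

def two_short_stores(hi: int, lo: int) -> bytes:
    """Write two 16-bit values with two separate big-endian stores."""
    buf = bytearray(4)
    struct.pack_into(">H", buf, 0, hi)   # first short store at offset 0
    struct.pack_into(">H", buf, 2, lo)   # second short store at offset 2
    return bytes(buf)

def one_long_store(hi: int, lo: int) -> bytes:
    """Write the same two values with one combined 32-bit big-endian store."""
    buf = bytearray(4)
    struct.pack_into(">I", buf, 0, (hi << 16) | lo)
    return bytes(buf)
```

Both functions leave the same four bytes in memory; the second simply issues one memory operation instead of two.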
-Atile:skewp[:b<n>] (optimizer)
Perform loop tiling which is enabled by loop skewing. Loop skewing is a
transformation that transforms a non-fully interchangeable loop nest
to a fully interchangeable loop nest. The optional b<n>
sets the tiling block size to n.
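The following sketch shows only what tiling does to iteration order for a nest that is already fully interchangeable; it does not model loop skewing or the optimizer's heuristics. The function names and the block size parameter b are illustrative assumptions.

```python
def untiled(n: int, m: int) -> list:
    """Visit an n-by-m iteration space row by row."""
    return [(i, j) for i in range(n) for j in range(m)]

def tiled(n: int, m: int, b: int) -> list:
    """Visit the same iteration space in b-by-b blocks (tiles)."""
    order = []
    for ii in range(0, n, b):                        # tile row origin
        for jj in range(0, m, b):                    # tile column origin
            for i in range(ii, min(ii + b, n)):      # rows inside the tile
                for j in range(jj, min(jj + b, m)):  # columns inside the tile
                    order.append((i, j))
    return order
```

Tiling visits exactly the same iterations, but groups nearby ones together so that data touched within a tile is more likely to stay in cache.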
-Aujam:inner=g (optimizer)
Increase the probability that small-trip-count inner loops will
be fully unrolled.
autoup=<n> (Unix)
When the file system flush daemon fsflush runs, it
will write to disk all modified file buffers that are more than
n seconds old.
cc (C compiler)
Invoke the Sun ONE Studio 8 Compiler C.
CC (C++ compiler)
Invoke the Sun ONE Studio 8 Compiler C++.
cpu_bringup_set=<n> (Unix /etc/system)
Specifies which processors will be enabled at boot time.
<n> represents a bitmap of the
processors that will be brought online.
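A minimal sketch of how such a bitmap could be built, assuming that bit k of the value corresponds to processor ID k (this mapping and the helper name are assumptions for illustration; consult the system documentation for the exact format):

```python
def bringup_bitmap(cpu_ids) -> int:
    """Return an integer with bit k set for every processor ID k in cpu_ids."""
    mask = 0
    for cpu in cpu_ids:
        mask |= 1 << cpu   # assumed mapping: bit position == processor ID
    return mask
```

Under this assumption, bringup_bitmap([0]) yields 1 (only processor 0 is enabled, as in a 1-cpu test), and bringup_bitmap([0, 1, 2, 3]) yields 0xF.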
-crit (optimizer)
Enable optimization of critical control paths.
-dalign (C, C++, Fortran)
Assume data is naturally aligned.
-Dalloca=__builtin_alloca (Portability: SPEC Tools)
Portability switch, used for 176.gcc: allow use of compiler's internal
builtin alloca.
-depend (Fortran)
Synonym for -xdepend.
-DHOST_WORDS_BIG_ENDIAN (Portability: SPEC Tools)
Portability switch, used for 176.gcc: controls how bytes are numbered within
a word.
disablecomponent (System Management Services)
This command is used prior to booting the system for a 1-cpu test.
The tester uses disablecomponent to add all other CPUs
to the "blacklist",
which is a list of components that cannot be used at boot time.
-D__MATHERR_ERRNO_DONTCARE (C)
Allows the compiler to assume that your code does not rely on setting
of the errno variable.
-DSPEC_CPU2000_SOLARIS (Portability: SPEC Tools)
Portability switch, used for 253.perlbmk: selects header files and
code paths compatible with Solaris.
-DSUN (Portability: SPEC Tools)
Portability switch, used for 186.crafty: selects header files and code paths
compatible with Solaris.
-DSYS_HAS_CALLOC_PROTO (Portability: SPEC Tools)
Portability switch, used for 254.gap: allows use of the designated prototype.
-DSYS_HAS_IOCTL_PROTO (Portability: SPEC Tools)
Portability switch, used for 254.gap: allows use of the designated prototype.
-DSYS_HAS_SIGNAL_PROTO (Portability: SPEC Tools)
Portability switch, used for 254.gap: allows use of the designated prototype.
-DSYS_HAS_TIME_PROTO (Portability: SPEC Tools)
Portability switch, used for 254.gap: allows use of the designated prototype.
-DSYS_IS_USG (Portability: SPEC Tools)
Portability switch, used for 254.gap: selects code compatible with
USG-based systems.
-e (Portability, Fortran)
Portability switch, used for 178.galgel: allows source lines to be
up to 132 characters long.
f90 (Fortran compiler)
Invoke the Sun ONE Studio 8 Compiler Fortran 90.
-fast (C)
A convenience option, this switch selects the following switches that
are defined elsewhere in this page:
-D__MATHERR_ERRNO_DONTCARE
-dalign
-fns
-fsimple=2
-fsingle
-ftrap=%none
-xalias_level=basic
-xbuiltin=%all
-xdepend
-xlibmil
-xO5
-xprefetch=auto,explicit
-xtarget=native
-fast (C++)
A convenience option, this switch selects the following switches that
are defined elsewhere in this page:
-dalign
-fns
-fsimple=2
-ftrap=%none
-xbuiltin=%all
-xlibmil
-xlibmopt
-xO5
-xtarget=native
-fast (Fortran)
A convenience option, this switch selects the following switches that
are defined elsewhere in this page:
-dalign
-depend
-fns
-fsimple=2
-ftrap=common
-xlibmil
-xlibmopt
-xO5
-xpad=local
-xprefetch=auto,explicit
-xtarget=native
-xvector=yes
-fixed (Portability, Fortran)
Portability switch, used for 178.galgel: assume fixed-format source input.
-fns (C, C++, Fortran)
Selects faster (but nonstandard) handling of floating point
arithmetic exceptions and gradual underflow.
-fsimple=<n> (C, C++, Fortran)
Controls simplifying assumptions for floating point arithmetic:
- -fsimple=0 permits no simplifying assumptions.
Preserves strict IEEE 754 conformance.
- -fsimple=1 allows the optimizer to assume:
- The IEEE 754 default rounding/trapping modes do not change
after process initialization.
- Computations producing no visible result other than potential
floating-point exceptions may be deleted.
- Computations with Infinity or NaNs as operands need not
propagate NaNs to their results. For example, x*0 may be replaced
by 0.
- Computations do not depend on sign of zero.
- -fsimple=2 permits more aggressive floating point
optimizations that may cause
programs to produce different numeric results due to changes in
rounding. Even with -fsimple=2, the optimizer
still is not permitted to introduce a floating point exception
in a program that otherwise produces none.
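The strict IEEE 754 behaviors that these levels relax can be observed in any language that uses IEEE double precision. The sketch below uses Python floats for that purpose; it does not invoke the Sun compilers, and reassociation is shown only as one example of a rounding-sensitive rewrite, since the description above does not list the specific transformations performed at -fsimple=2.

```python
import math

inf = float("inf")

# Strict IEEE 754: Infinity * 0 is NaN, so replacing x*0 by 0 changes the result.
nan_from_inf = inf * 0.0

# Strict IEEE 754: the sign of zero is preserved (-1.0 * 0.0 is -0.0).
neg_zero = -1.0 * 0.0

# Reassociating a sum can change the rounded result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
```

Here nan_from_inf is NaN, neg_zero carries a negative sign, and left differs from right in the last bit.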
-fsingle (C)
Evaluate float expressions as single precision.
-ftrap=common (C, C++, Fortran)
Sets the IEEE 754 trapping mode to common exceptions (invalid, division
by zero, and overflow).
-ftrap=%none (C, C++, Fortran)
Turns off all IEEE 754 trapping modes.
LD_LIBRARY_PATH=<directories> (linker)
LD_LIBRARY_PATH controls the search order for both the compile-time
and run-time linkers. Usually, it can be defaulted; but testers may
sometimes choose to explicitly set it (as documented in the notes in the
submission), in order to ensure that the correct versions of libraries
are picked up.
LD_PRELOAD=mpss.so.1 (Unix)
Allow use of the mpss.so.1 shared object, which provides a means
by which preferred stack and/or heap page sizes can be selected.
-library=iostream (Portability, C++)
Portability switch, used for 252.eon: allow use of the classic iostream
library.
-ll2amm (linker)
Include a library containing chip specific memory routines.
-lm (linker)
Include the math library.
-lmopt (linker)
Include the optimized math library. This option usually generates
faster code, but may produce slightly different results. Usually
these results will differ only in the last bit.
MPSSHEAP=<n> (Unix)
Specify the preferred page size for heap. The specified page size is
applied to all created processes.
MPSSSTACK=<n> (Unix)
Specify the preferred page size for stack. The specified page size is
applied to all created processes.
-noex (C++)
Do not allow C++ exceptions. A throw specification on a function is
accepted but ignored; the compiler does not generate exception code.
-O (Fortran)
A synonym for -xO3.
PARALLEL=<n> (Unix)
Specify the requested number of processors for running programs
that have been compiled with -xautopar.
priocntl -e -c RT -p 15 -t 20 (Unix)
Requests that the benchmarks be run at high priority,
specifically in the Real Time scheduling category. -p
n indicates the priority, by default a number in the
range of 0 to 59; -t n indicates the time
quantum given to a process (if not preempted by a higher
priority process), in units of milliseconds.
-Qdepgraph-early_cross_call=1 (code generator)
There are several scheduling passes in the compiler. This option
allows early passes to move instructions across call instructions.
-Qeps:enabled=1 (code generator)
Use enhanced pipeline scheduling (EPS) and selective scheduling
algorithms for instruction scheduling.
-Qeps:rp_filtering_margin=100 (code generator)
Turn off register pressure heuristics in EPS.
-Qeps:ws=<n> (code generator)
Set the EPS window size, that is, the number of instructions it will
consider across all paths when trying to find independent instructions
to schedule a parallel group. Larger values may result in better
run time, at the cost of increased compile time.
-Qgsched-T<n> (code generator)
Sets the aggressiveness of the trace formation, where n
is 4, 5, or 6. The higher the value of n, the lower
the branch probability needed to include a basic block in a trace.
-Qicache-chbab=1 (code generator)
Turn on optimization to reduce branch after branch penalty: nops
will be inserted to prevent one branch from occupying the delay slot of
another branch.
-Qipa:valueprediction (code generator)
Use profile feedback data to predict values and attempt to
generate faster code along these control paths, even at the
expense of possibly slower code along paths leading to different
values. Correct code is generated for all paths.
-Qiselect-funcalign=<n> (code generator)
Do function entry alignment at n-byte boundaries.
-Qiselect-sw_pf_tbl_th=<n> (code generator)
Peels the most frequent test branches/cases off a switch until
the branch probability reaches less than 1/n. This is effective
only when profile feedback is used.
-Qlp=<n>[-av=<n>][-t=<n>][-fa=<n>][-fl=<n>] (code generator)
Control irregular loop prefetching:
- lp=<n> Turns the module on (1) or off (0). The default is on for F90
and off for C/C++.
- -av=<n> Sets the prefetch look-ahead distance, in bytes. The default is 256.
- -t=<n> Sets the number of attempts at prefetching. If not
specified, t=2 if -xprefetch_level=3 has been
set; otherwise, defaults to t=1.
- -fa=<n> 1=Force user settings to override internally computed values.
- -fl=<n> 1=Force the optimization to be turned on for all languages.
-Qms_pipe+alldoall (code generator)
Specifies that all loops can be pipelined without needing to
be concerned about loop-carried dependencies.
-Qms_pipe+intdivusefp (code generator)
In pipelined loops, use floating point divide instructions
for signed integer division.
-Qms_pipe+prefolim=<n> (code generator)
Set the number of outstanding prefetches in pipelined loops to <n>.
-Qms_pipe+unoovf (code generator)
Assert (to the pipeliner) that unsigned int computations will not overflow.
-Qms_pipe-prefst (code generator)
Turn off prefetching for stores in the pipeliner.
-Qoption cg -switch[,-switch...] (C++, Fortran)
Send the listed switch(es) to the code generator. See the definitions
of the individual switches elsewhere in this page (alphabetically
ordered).
-Qoption f90comp -switch[,-switch...] (Fortran)
Send the listed switch(es) to the Fortran 90 front end. See the definitions
of the individual switches elsewhere in this page (alphabetically
ordered).
-Qoption iropt -switch[,-switch...] (C++, Fortran)
Send the listed switch(es) to the global optimizer. See the definitions
of the individual switches elsewhere in this page (alphabetically
ordered).
-Qpeep-Sh0 (code generator)
Reduce the probability that the compiler will hoist sethi instructions
out of loops.
RM_SOURCES = lapak.f90 (SPEC tools)
This option allows building the benchmark 178.galgel without its
copy of the lapak sources; instead, the lapak entry points in
the sunperf library are used.
rm -rf ./feedback.profile ./SunWS_cache (Unix)
Remove any profile feedback information from previous runs.
STACKSIZE=<n> (Unix)
Set the size of the stack (temporary storage area) for each slave
thread of a multithreaded program.
-stackvar (Fortran)
Allocate routine local variables on the stack.
submit=echo 'pbind -b...' > dobmk; sh dobmk (SPEC tools, Unix)
When running multiple copies of benchmarks, the SPEC config file feature
submit is sometimes used to cause individual jobs to be
bound to specific processors:
- submit= causes the SPEC tools to use this line
when submitting jobs.
- echo ...> dobmk causes the generated commands
to be written to a file, namely dobmk.
- pbind -b causes this copy's processes to be bound to
the CPU specified by the expression that follows it. See the
config file used in the submission for the exact syntax, which
tends to be cumbersome because of the need to carefully quote
parts of the expression. When all expressions are evaluated,
each CPU ends up with exactly one copy of each benchmark.
The pbind expression may include:
- $SPECUSERNUM: the SPEC tools-assigned number for
this copy of the benchmark.
- expr: Calculate simple arithmetic expressions.
For example, the effect of binding jobs to a
(quote-resolved) expression such as:
expr ( $SPECUSERNUM / 4 ) * 8 + ( $SPECUSERNUM % 4 )
would be to send the jobs to processors whose numbers are:
0,1,2,3, 8,9,10,11, 16,17,18,19 ...
- psrinfo: find out what processors are available
- grep on-line: search the psrinfo
output for information regarding on-line cpus
- awk...print \$1: Pick out the
line corresponding to this copy of the benchmark
and use the CPU number mentioned at the start of this line.
- sh dobmk actually runs the benchmark.
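The arithmetic in the expr example above can be checked with a short sketch. It reproduces only the integer calculation, not the pbind invocation or the shell quoting; the function name is hypothetical, and copy numbers are assumed to start at 0.

```python
def target_cpu(spec_user_num: int) -> int:
    """Map a SPEC copy number to a processor number, as in the expr example."""
    return (spec_user_num // 4) * 8 + (spec_user_num % 4)

# Processor numbers for the first twelve copies.
cpus = [target_cpu(n) for n in range(12)]
```

For copies 0 through 11 this gives 0, 1, 2, 3, 8, 9, 10, 11, 16, 17, 18, 19: each group of four copies lands on the first four processors of a block of eight.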
tune_t_fsflushr=<n> (Unix)
Controls the number of seconds between runs of the file system
flush daemon, fsflush.
ulimit -s unlimited (Unix)
Allow stack size to grow without limit.
-W2,-switch[,-switch...] (C)
Send the listed switch(es) to the global optimizer. See the definitions
of the individual switches elsewhere in this page (alphabetically
ordered).
-Wc,-switch[,-switch...] (C)
Send the listed switch(es) to the code generator. See the definitions
of the individual switches elsewhere in this page (alphabetically
ordered).
-xalias_level=[basic|std|strong] (C)
Allows the compiler to perform type-based alias analysis at the
specified alias level:
- basic assume that memory references
that involve different C basic types do not alias each
other.
- std assume aliasing rules described in
the ISO 1999 C standard.
- strong in addition to the restrictions
at the std level, assume that pointers of
type char * are used only to access an object of
type char; and assume that there are no interior pointers.
-xalias_level=compatible (C++)
Allows the compiler to assume that layout-incompatible types
are not aliased.
-xarch=v8plusb (C, C++, Fortran)
Allow the compiler to use instructions from architecture level v8plusb
(UltraSPARC III, 32-bit mode).
-xautopar (C, Fortran)
Turn on automatic parallelization for multiple processors.
-xbuiltin=%all (C, C++)
Substitute intrinsic functions or inline system functions where
profitable for performance.
-xchip=ultra3 (C, C++, Fortran)
Specify that the target processor will be an UltraSPARC-III.
-xdepend (C, Fortran)
Analyze loops for inter-iteration data dependencies, and do loop
restructuring.
-xinline= (C, C++, Fortran)
Turn off inlining.
-xipo[=2] (C, C++, Fortran)
Perform optimizations across all object files in the link step:
0=off
1=on
2=performs whole-program detection and analysis.
At -xipo=2, the compiler performs inter-procedural
aliasing analysis as well as optimization of memory
allocation and layout to improve cache performance.
-xlibmil (C, C++, Fortran)
Use inline expansion for math library, libm.
-xlibmopt (C++, Fortran)
Select the optimized math library.
-xlic_lib=sunperf (C, C++, Fortran)
Link with Sun supplied licensed sunperf library.
-xlinkopt (C, C++, Fortran)
Perform link-time optimizations, such as branch optimization
and cache coloring.
-xO<n> (C, C++, Fortran)
Specify optimization level n:
- -xO1 Do only basic local optimizations (peephole).
- -xO2 Do basic local and
global optimizations, such as induction variable
elimination, common subexpression elimination, constant
propagation, register allocation, and basic block merging.
- -xO3 Add global
optimizations at the function level, loop unrolling,
and software pipelining.
- -xO4 Add automatic
inlining of functions in the same file.
- -xO5 Use optimization
algorithms that may take significantly more compilation
time or that do not have as high a probability of improving
execution time, such as speculative code motion.
-xpad=common[:<n>] (Fortran)
If multiple same-sized arrays are placed in common,
insert padding between them for better use of cache.
n specifies the amount of padding to apply,
in units that are the same size as the array elements.
If no parameter is
specified then the compiler selects one automatically.
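To see why padding between same-sized arrays can help, consider a toy direct-mapped cache. The cache geometry below (64 KB, 32-byte lines), the 8-byte element size, and the helper names are all assumptions chosen for illustration; they do not describe any particular processor or the compiler's padding algorithm.

```python
LINE_BYTES = 32                        # assumed cache line size
CACHE_BYTES = 64 * 1024                # assumed direct-mapped cache size
NUM_SETS = CACHE_BYTES // LINE_BYTES

def cache_set(addr: int) -> int:
    """Cache set that a byte address maps to in the toy cache."""
    return (addr // LINE_BYTES) % NUM_SETS

def base_addresses(num_arrays: int, array_bytes: int,
                   pad_elems: int, elem_bytes: int = 8) -> list:
    """Start offsets of consecutive arrays with pad_elems elements between them."""
    stride = array_bytes + pad_elems * elem_bytes
    return [k * stride for k in range(num_arrays)]

unpadded = [cache_set(a) for a in base_addresses(3, 64 * 1024, 0)]
padded = [cache_set(a) for a in base_addresses(3, 64 * 1024, 4)]
```

Without padding, the three arrays start in the same cache set, so corresponding elements evict each other; with four elements of padding they start in sets 0, 1, and 2.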
-xpad=local (Fortran)
Pad local variables, for better use of cache.
-xpagesize=<n> (C, Fortran)
Set the preferred page size for running the program.
-xprefetch=auto,explicit (C, C++, Fortran)
Allow generation of prefetch instructions. -xprefetch=yes is a
synonym for -xprefetch=auto,explicit.
-xprefetch=latx:<n> (C, C++, Fortran)
Adjust the compiler's assumptions about prefetch latency by
the specified factor. Typically values in the range of
0.5 to 2.0 will be useful. A lower number might indicate
that data will usually be cache resident; a higher number
might indicate a relatively larger gap between the processor
speed and the memory speed (compared to the assumptions built
into the compiler).
-xprefetch=no%auto (C, C++, Fortran)
Turn off prefetch instruction generation.
-xprefetch_level=<n> (C, C++, Fortran)
Control the level of searching that the compiler does for prefetch
opportunities by setting n to 1, 2, or 3, where higher
numbers mean to do more searching. The default is 2.
-xprofile=collect:./feedback (C, C++, Fortran)
Collect profile data for feedback-directed optimization, and store it in
a subdirectory of the current directory, named ./feedback.
-xprofile=use:./feedback (C, C++, Fortran)
Use data collected for profile feedback. Look for it in
a subdirectory of the current directory, named ./feedback.
-xreduction (C, Fortran)
Analyze loops for reductions such as dot products, maximum and
minimum finding.
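A reduction combines all loop iterations into a single value. Once a loop is recognized as a reduction, it can in principle be split into independent partial results that are combined at the end, which is what makes it a candidate for parallel execution. The sketch below illustrates this with a dot product; the chunking scheme and function names are illustrative assumptions, not the compiler's method.

```python
def dot_serial(x, y):
    """Dot product computed as one sequential reduction."""
    return sum(a * b for a, b in zip(x, y))

def dot_chunked(x, y, chunks: int):
    """Dot product computed as independent partial sums, then combined."""
    n = len(x)
    step = max(1, (n + chunks - 1) // chunks)   # elements per chunk
    partials = [dot_serial(x[s:s + step], y[s:s + step])
                for s in range(0, n, step)]     # each could run on its own CPU
    return sum(partials)
```

With integer data the two versions agree exactly; with floating point data the different summation order may change the rounding of the result.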
-xrestrict (C)
Treat pointer-valued function parameters as restricted pointers.
-xsafe=mem (C, C++, Fortran)
Enables the use of non-faulting loads when used in conjunction
with -xarch=v8plus. Assumes that no memory
based traps will occur.
-xtarget=native (C, C++, Fortran)
Selects options appropriate for the system where the compile is
taking place, including architecture, chip, and cache sizes. (These
can also be controlled separately, via -xarch, -xchip, and -xcache,
respectively.)
-xvector (C, Fortran)
Allow the compiler to transform math library calls within loops
into calls to the vector math library. Specifying
-xvector is equivalent to -xvector=yes.