Sun SPEC CPU2000 Flags

Sun SPEC CPU2000 Flag Descriptions

Sun ONE Studio 8
Last updated: 11-Feb-2004

Note: This flags file is alphabetized by command or switch name, without regard to upper/lower case, without regard to the presence or absence of a leading "-", and without regard to the software component that uses the command or switch. The component is mentioned in (parentheses) immediately after the name of the command or switch.

It is hoped that this order of presentation will make it easier to look up commands or switches even if the reader does not already know what software component they belong to.

-Abcopy (optimizer)
Increase the probability that the compiler will perform memcpy/memset transformations.

-Addint:ignore_parallel (optimizer)
Ignore parallelization factors in loop interchange heuristics.

-Addint:sf=<n> (optimizer)
When considering whether to interchange loops, set memory store operation weight to n. A higher value of n indicates a greater performance cost for stores.

-Ainline[:cp=<n>][:cs=<n>][:inc=<n>][:irs=<n>][:mi][:recursion=1] (optimizer)

Control the optimizer's loop inliner:

    cp=<n> Set the minimum call site frequency count required for a routine to be considered for inlining.

    cs=<n> Set inline callee size limit to n. The unit roughly corresponds to the number of instructions.

    inc=<n> The inliner is allowed to increase the size of the program by up to n%.

    irs=<n> Allow routines to increase in size by up to n. The unit roughly corresponds to the number of instructions.

    mi Perform maximum inlining (without considering code size increase).

    recursion=1 Allow routines that are called recursively to still be eligible for inlining.

-Aivsub3 (optimizer)
Increase the probability that loop induction variables will be replaced, so that some extraneous code can be eliminated from loops.

-Aloop_dist:ignore_parallel (optimizer)
Ignore parallelization factors in loop distribution heuristics.

-Amemopt:arrayloc (optimizer)
Reconstruct array subscripts during memory allocation merging and data layout program transformation.

-Apf:llist=<n>:noinnerllist (optimizer)
Do speculative prefetching for linked-list data structures:
llist=<n> perform prefetching n iterations ahead.
noinnerllist do not attempt prefetching for innermost loops.

-Apf:pdl=1 (optimizer)
Do prefetching for one-level indirect memory references.

-array_pad_rows,<n> (Fortran)
Enable padding of arrays by n.

-Ashort_ldst (optimizer)
Convert multiple short memory operations into single long memory operations.

-Atile:skewp[:b<n>] (optimizer)
Perform loop tiling which is enabled by loop skewing. Loop skewing is a transformation that transforms a non-fully interchangeable loop nest to a fully interchangeable loop nest. The optional b<n> sets the tiling block size to n.

-Aujam:inner=g (optimizer)
Increase the probability that small-trip-count inner loops will be fully unrolled.

autoup=<n> (Unix)
When the file system flush daemon fsflush runs, it will write to disk all modified file buffers that are more than n seconds old.

cc (C compiler)
Invoke the Sun ONE Studio 8 Compiler C.

CC (C++ compiler)
Invoke the Sun ONE Studio 8 Compiler C++.

cpu_bringup_set=<n> (Unix /etc/system)
Specifies which processors will be enabled at boot time. <n> represents a bitmap of the processors that will be brought online.
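As a rough illustration of how such a bitmap could be derived, the following Python sketch builds a mask from a list of processor IDs. It assumes that bit k of the value corresponds to processor ID k, which this page does not state, and the helper name bringup_mask is hypothetical. Check the Solaris documentation for the system in question before relying on this layout.

```python
def bringup_mask(cpu_ids):
    """Return an integer with bit k set for each processor ID k.

    Assumption: bit k of cpu_bringup_set maps to processor ID k.
    """
    mask = 0
    for cpu in cpu_ids:
        if cpu < 0:
            raise ValueError("processor IDs must be non-negative")
        mask |= 1 << cpu
    return mask


if __name__ == "__main__":
    # Enable only processor 0, as one might for a 1-CPU test.
    print(hex(bringup_mask([0])))
    # Enable processors 0 through 3.
    print(hex(bringup_mask(range(4))))
```

Under the stated assumption, the two calls print 0x1 and 0xf respectively.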

-crit (optimizer)
Enable optimization of critical control paths.

-dalign (C, C++, Fortran)
Assume data is naturally aligned.

-Dalloca=__builtin_alloca (Portability: SPEC Tools)
Portability switch, used for 176.gcc: allow use of compiler's internal builtin alloca.

-depend (Fortran)
Synonym for -xdepend.

-DHOST_WORDS_BIG_ENDIAN (Portability: SPEC Tools)
Portability switch, used for 176.gcc: controls how bytes are numbered within a word.

disablecomponent (System Management Services)
This command is used prior to booting the system for a 1-cpu test. The tester uses disablecomponent to add all other CPUs to the "blacklist", which is a list of components that cannot be used at boot time.

-D__MATHERR_ERRNO_DONTCARE (C)
Allows the compiler to assume that your code does not rely on setting of the errno variable.

-DSPEC_CPU2000_SOLARIS (Portability: SPEC Tools)
Portability switch, used for 253.perlbmk: selects header files and code paths compatible with Solaris.

-DSUN (Portability: SPEC Tools)
Portability switch, used for 186.crafty: selects header files and code paths compatible with Solaris.

-DSYS_HAS_CALLOC_PROTO (Portability: SPEC Tools)
Portability switch, used for 254.gap: allows use of the designated prototype.

-DSYS_HAS_IOCTL_PROTO (Portability: SPEC Tools)
Portability switch, used for 254.gap: allows use of the designated prototype.

-DSYS_HAS_SIGNAL_PROTO (Portability: SPEC Tools)
Portability switch, used for 254.gap: allows use of the designated prototype.

-DSYS_HAS_TIME_PROTO (Portability: SPEC Tools)
Portability switch, used for 254.gap: allows use of the designated prototype.

-DSYS_IS_USG (Portability: SPEC Tools)
Portability switch, used for 254.gap: selects code compatible with USG-based systems.

-e (Portability, Fortran)
Portability switch, used for 178.galgel: allows source lines to be up to 132 characters long.

f90 (Fortran compiler)
Invoke the Sun ONE Studio 8 Compiler Fortran 90.

-fast (C)
A convenience option, this switch selects the following switches that are defined elsewhere in this page:

     -D__MATHERR_ERRNO_DONTCARE 
     -dalign
     -fns 
     -fsimple=2 
     -fsingle 
     -ftrap=%none 
     -xalias_level=basic 
     -xbuiltin=%all 
     -xdepend 
     -xlibmil 
     -xO5 
     -xprefetch=auto,explicit 
     -xtarget=native

-fast (C++)
A convenience option, this switch selects the following switches that are defined elsewhere in this page:

     -dalign
     -fns
     -fsimple=2 
     -ftrap=%none 
     -xbuiltin=%all 
     -xlibmil 
     -xlibmopt 
     -xO5 
     -xtarget=native

-fast (Fortran)
A convenience option, this switch selects the following switches that are defined elsewhere in this page:

     -dalign 
     -depend
     -fns
     -fsimple=2 
     -ftrap=common 
     -xlibmil 
     -xlibmopt 
     -xO5 
     -xpad=local 
     -xprefetch=auto,explicit 
     -xtarget=native
     -xvector=yes

-fixed (Portability, Fortran)
Portability switch, used for 178.galgel: assume fixed-format source input.

-fns (C, C++, Fortran)
Selects faster (but nonstandard) handling of floating point arithmetic exceptions and gradual underflow.

-fsimple=<n> (C, C++, Fortran)
Controls simplifying assumptions for floating point arithmetic:

-fsimple=0 permits no simplifying assumptions. Preserves strict IEEE 754 conformance.
-fsimple=1 allows the optimizer to assume:
- The IEEE 754 default rounding/trapping modes do not change after process initialization.
- Computations producing no visible result other than potential floating-point exceptions may be deleted.
- Computations with Infinity or NaNs as operands need not propagate NaNs to their results. For example, x*0 may be replaced by 0.
- Computations do not depend on sign of zero.
-fsimple=2 permits more aggressive floating point optimizations that may cause programs to produce different numeric results due to changes in rounding. Even with -fsimple=2, the optimizer still is not permitted to introduce a floating point exception in a program that otherwise produces none.
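The short Python sketch below shows why these assumptions matter numerically. Python floats are IEEE 754 doubles, so the same arithmetic rules apply. The first part shows cases where replacing x*0 by 0 changes the result. The second part uses reassociation of additions only as a familiar example of a rewrite that changes rounding; this page does not say which specific rewrites the compiler performs at -fsimple=2.

```python
import math

nan = float("nan")
inf = float("inf")

# Under strict IEEE 754 rules, x*0 is not always +0.
nan_times_zero = nan * 0.0    # NaN propagates to the result
inf_times_zero = inf * 0.0    # invalid operation, result is NaN
neg_times_zero = -1.0 * 0.0   # result is negative zero

# Regrouping additions can change the rounded result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

if __name__ == "__main__":
    print(math.isnan(nan_times_zero), math.isnan(inf_times_zero))
    print(math.copysign(1.0, neg_times_zero))  # sign of the zero result
    print(left == right)
```

A program whose results depend on NaN propagation, the sign of zero, or a particular summation order may therefore behave differently at -fsimple=1 or -fsimple=2 than at -fsimple=0.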

-fsingle (C)
Evaluate float expressions as single precision.

-ftrap=common (C, C++, Fortran)
Sets the IEEE 754 trapping mode to common exceptions (invalid, division by zero, and overflow).

-ftrap=%none (C, C++, Fortran)
Turns off all IEEE 754 trapping modes.

LD_LIBRARY_PATH=<directories> (linker)
LD_LIBRARY_PATH controls the search order for both the compile-time and run-time linkers. The default is usually sufficient, but testers may sometimes set it explicitly (as documented in the notes in the submission) to ensure that the correct versions of libraries are picked up.

LD_PRELOAD=mpss.so.1 (Unix)
Allow use of the mpss.so.1 shared object, which provides a means by which preferred stack and/or heap page sizes can be selected.

-library=iostream (Portability, C++)
Portability switch, used for 252.eon: allow use of the classic iostream library.

-ll2amm (linker)
Include a library containing chip specific memory routines.

-lm (linker)
Include the math library.

-lmopt (linker)
Include the optimized math library. This option usually generates faster code, but may produce slightly different results. Usually these results will differ only in the last bit.

MPSSHEAP=<n> (Unix)
Specify the preferred page size for heap. The specified page size is applied to all created processes.

MPSSSTACK=<n> (Unix)
Specify the preferred page size for stack. The specified page size is applied to all created processes.

-noex (C++)
Do not allow C++ exceptions. A throw specification on a function is accepted but ignored; the compiler does not generate exception code.

-O (Fortran)
A synonym for -xO3.

PARALLEL=<n> (Unix)
Specify the requested number of processors for running programs that have been compiled with -xautopar.

priocntl -e -c RT -p 15 -t 20 (Unix)
Requests that the benchmarks be run at high priority, specifically in the Real Time scheduling category. -p n indicates the priority, by default a number in the range of 0 to 59; -t n indicates the time quantum given to a process (if not preempted by a higher priority process), in units of milliseconds.

-Qdepgraph-early_cross_call=1 (code generator)
There are several scheduling passes in the compiler. This option allows early passes to move instructions across call instructions.

-Qeps:enabled=1 (code generator)
Use enhanced pipeline scheduling (EPS) and selective scheduling algorithms for instruction scheduling.

-Qeps:rp_filtering_margin=100 (code generator)
Turn off register pressure heuristics in EPS.

-Qeps:ws=<n> (code generator)
Set the EPS window size, that is, the number of instructions it will consider across all paths when trying to find independent instructions to schedule a parallel group. Larger values may result in better run time, at the cost of increased compile time.

-Qgsched-T<n> (code generator)
Sets the aggressiveness of the trace formation, where n is 4, 5, or 6. The higher the value of n, the lower the branch probability needed to include a basic block in a trace.

-Qicache-chbab=1 (code generator)
Turn on an optimization to reduce the branch-after-branch penalty: nops are inserted to prevent one branch from occupying the delay slot of another branch.

-Qipa:valueprediction (code generator)
Use profile feedback data to predict values and attempt to generate faster code along these control paths, even at the expense of possibly slower code along paths leading to different values. Correct code is generated for all paths.

-Qiselect-funcalign=<n> (code generator)
Do function entry alignment at n-byte boundaries.

-Qiselect-sw_pf_tbl_th=<n> (code generator)
Peels the most frequent test branches/cases off a switch until the branch probability reaches less than 1/n. This is effective only when profile feedback is used.

-Qlp=<n>[-av=<n>][-t=<n>][-fa=<n>][-fl=<n>] (code generator)

Control irregular loop prefetching:

    lp=<n> Turns the module on (1) or off (0) (default is on for F90; off for C/C++)

    -av=<n> Sets the prefetch look ahead distance, in bytes. Default is 256.

    -t=<n> Sets the number of attempts at prefetching. If not specified, t=2 if -xprefetch_level=3 has been set; otherwise, defaults to t=1.

    -fa=<n> 1=Force user settings to override internally computed values.

    -fl=<n> 1=Force the optimization to be turned on for all languages.

-Qms_pipe+alldoall (code generator)
Specifies that all loops can be pipelined without needing to be concerned about loop-carried dependencies.

-Qms_pipe+intdivusefp (code generator)
In pipelined loops, use floating point divide instructions for signed integer division.

-Qms_pipe+prefolim=<n> (code generator)
Set the number of outstanding prefetches in pipelined loops to <n>.

-Qms_pipe+unoovf (code generator)
Assert (to the pipeliner) that unsigned int computations will not overflow.

-Qms_pipe-prefst (code generator)
Turn off prefetching for stores in the pipeliner.

-Qoption cg -switch[,-switch...] (C++, Fortran)
Send the listed switch(es) to the code generator. See the definitions of the individual switches elsewhere in this page (alphabetically ordered).

-Qoption f90comp -switch[,-switch...] (Fortran)
Send the listed switch(es) to the Fortran 90 front end. See the definitions of the individual switches elsewhere in this page (alphabetically ordered).

-Qoption iropt -switch[,-switch...] (C++, Fortran)
Send the listed switch(es) to the global optimizer. See the definitions of the individual switches elsewhere in this page (alphabetically ordered).

-Qpeep-Sh0 (code generator)
Reduce the probability that the compiler will hoist sethi instructions out of loops.

RM_SOURCES = lapak.f90 (SPEC tools)
This option allows building the benchmark 178.galgel without its copy of the lapak sources; instead, the lapak entry points in the sunperf library are used.

rm -rf ./feedback.profile ./SunWS_cache (Unix)
Remove any profile feedback information from previous runs.

STACKSIZE=<n> (Unix)
Set the size of the stack (temporary storage area) for each slave thread of a multithreaded program.

-stackvar (Fortran)
Allocate routine local variables on the stack.

submit=echo 'pbind -b...' > dobmk; sh dobmk (SPEC tools, Unix)
When running multiple copies of benchmarks, the SPEC config file feature submit is sometimes used to cause individual jobs to be bound to specific processors:

submit= causes the SPEC tools to use this line when submitting jobs.
echo ...> dobmk causes the generated commands to be written to a file, namely dobmk.
pbind -b causes this copy's processes to be bound to the CPU specified by the expression that follows it. See the config file used in the submission for the exact syntax, which tends to be cumbersome because of the need to carefully quote parts of the expression. When all expressions are evaluated, each CPU ends up with exactly one copy of each benchmark. The pbind expression may include:
- $SPECUSERNUM: the SPEC tools-assigned number for this copy of the benchmark.
- expr: Calculate simple arithmetic expressions. For example, the effect of binding jobs to a (quote-resolved) expression such as:
  expr ( $SPECUSERNUM / 4 ) * 8 + ( $SPECUSERNUM % 4 )
  would be to send the jobs to processors whose numbers are:
  0,1,2,3, 8,9,10,11, 16,17,18,19 ...
- psrinfo: find out what processors are available
- grep on-line: search the psrinfo output for information regarding on-line CPUs
- awk...print \$1: Pick out the line corresponding to this copy of the benchmark and use the CPU number mentioned at the start of this line.
sh dobmk actually runs the benchmark.
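The arithmetic in the expr example above can be checked with a small Python sketch. The function name cpu_for_copy and its group and stride parameters are hypothetical names introduced only for this illustration; the values 4 and 8 are taken from the example expression, and integer division is used because expr performs integer arithmetic.

```python
def cpu_for_copy(specusernum, group=4, stride=8):
    """Map a SPEC copy number to a CPU number, mirroring the expr example.

    (specusernum / group) * stride + (specusernum % group), using
    integer division as expr does.
    """
    return (specusernum // group) * stride + (specusernum % group)


if __name__ == "__main__":
    # First twelve copies: 0,1,2,3, 8,9,10,11, 16,17,18,19
    print([cpu_for_copy(n) for n in range(12)])
```

Each group of four consecutive copies lands on four consecutive CPU numbers, and each new group starts eight CPU numbers after the previous one.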

tune_t_fsflushr=<n> (Unix)
Controls the number of seconds between runs of the file system flush daemon, fsflush.

ulimit -s unlimited (Unix)
Allow stack size to grow without limit.

-W2,-switch[,-switch...] (C)
Send the listed switch(es) to the global optimizer. See the definitions of the individual switches elsewhere in this page (alphabetically ordered).

-Wc,-switch[,-switch...] (C)
Send the listed switch(es) to the code generator. See the definitions of the individual switches elsewhere in this page (alphabetically ordered).

-xalias_level=[basic|std|strong] (C)
Allows the compiler to perform type-based alias analysis at the specified alias level:

basic assume that memory references that involve different C basic types do not alias each other.
std assume aliasing rules described in the ISO 1999 C standard.
strong in addition to the restrictions at the std level, assume that pointers of type char * are used only to access an object of type char; and assume that there are no interior pointers.

-xalias_level=compatible (C++)
Allows the compiler to assume that layout-incompatible types are not aliased.

-xarch=v8plusb (C, C++, Fortran)
Allow the compiler to use instructions from architecture level v8plusb (UltraSPARC III, 32-bit mode).

-xautopar (C, Fortran)
Turn on automatic parallelization for multiple processors.

-xbuiltin=%all (C, C++)
Substitute intrinsic functions or inline system functions where profitable for performance.

-xchip=ultra3 (C, C++, Fortran)
Specify that the target processor will be an UltraSPARC-III.

-xdepend (C, Fortran)
Analyze loops for inter-iteration data dependencies, and do loop restructuring.

-xinline= (C, C++, Fortran)
Turn off inlining.

-xipo[=2] (C, C++, Fortran)
Perform optimizations across all object files in the link step:
0=off
1=on
2=performs whole-program detection and analysis. At -xipo=2, the compiler performs inter-procedural aliasing analysis as well as optimization of memory allocation and layout to improve cache performance.

-xlibmil (C, C++, Fortran)
Use inline expansion for math library, libm.

-xlibmopt (C++, Fortran)
Select the optimized math library.

-xlic_lib=sunperf (C, C++, Fortran)
Link with Sun supplied licensed sunperf library.

-xlinkopt (C, C++, Fortran)
Perform link-time optimizations, such as branch optimization and cache coloring.

-xO<n> (C, C++, Fortran)
Specify optimization level n:

-xO1 Does only basic local optimizations (peephole).
-xO2 Does basic local and global optimizations, such as induction variable elimination, common subexpression elimination, constant propagation, register allocation, and basic block merging.
-xO3 Adds global optimizations at the function level, loop unrolling, and software pipelining.
-xO4 Adds automatic inlining of functions in the same file.
-xO5 Uses optimization algorithms that may take significantly more compilation time or that do not have as high a probability of improving execution time, such as speculative code motion.

-xpad=common[:<n>] (Fortran)
If multiple same-sized arrays are placed in common, insert padding between them for better use of cache. n specifies the amount of padding to apply, in units that are the same size as the array elements. If no parameter is specified then the compiler selects one automatically.
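To see why padding between same-sized arrays can help, consider the following Python sketch of a simple direct-mapped cache model. The cache size, line size, and array length are hypothetical values chosen so that each array is exactly one cache size long; they do not describe any particular UltraSPARC cache, and the model ignores associativity.

```python
CACHE_BYTES = 64 * 1024   # hypothetical direct-mapped cache size
LINE_BYTES = 32           # hypothetical cache line size
ELEM_BYTES = 8            # one double-precision element
N_ELEMS = 8192            # each array occupies exactly CACHE_BYTES


def cache_set(addr):
    """Return the cache set index for a byte address in this simple model."""
    return (addr // LINE_BYTES) % (CACHE_BYTES // LINE_BYTES)


def conflicting_elements(pad_elems):
    """Count indices i where A(i) and B(i) map to the same cache set.

    A and B are laid out back to back, as in a common block, with
    pad_elems array elements of padding inserted between them.
    """
    base_a = 0
    base_b = base_a + (N_ELEMS + pad_elems) * ELEM_BYTES
    return sum(
        cache_set(base_a + i * ELEM_BYTES) == cache_set(base_b + i * ELEM_BYTES)
        for i in range(N_ELEMS)
    )


if __name__ == "__main__":
    print(conflicting_elements(0))  # no padding
    print(conflicting_elements(4))  # pad by 4 elements (one cache line)
```

In this model, with no padding every pair A(i), B(i) competes for the same cache set, while padding by one cache line's worth of elements removes those collisions.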

-xpad=local (Fortran)
Pad local variables, for better use of cache.

-xpagesize=<n> (C, Fortran)
Set the preferred page size for running the program.

-xprefetch=auto,explicit (C, C++, Fortran)
Allow generation of prefetch instructions. -xprefetch=yes is a synonym for -xprefetch=auto,explicit.

-xprefetch=latx:<n> (C, C++, Fortran)
Adjust the compiler's assumptions about prefetch latency by the specified factor. Typically values in the range of 0.5 to 2.0 will be useful. A lower number might indicate that data will usually be cache resident; a higher number might indicate a relatively larger gap between the processor speed and the memory speed (compared to the assumptions built into the compiler).

-xprefetch=no%auto (C, C++, Fortran)
Turn off prefetch instruction generation.

-xprefetch_level=<n> (C, C++, Fortran)
Control the level of searching that the compiler does for prefetch opportunities by setting n to 1, 2, or 3, where higher numbers mean to do more searching. The default is 2.

-xprofile=collect:./feedback (C, C++, Fortran)
Collect profile data for feedback-directed optimization, and store it in a subdirectory of the current directory, named ./feedback.

-xprofile=use:./feedback (C, C++, Fortran)
Use data collected for profile feedback. Look for it in a subdirectory of the current directory, named ./feedback.

-xreduction (C, Fortran)
Analyze loops for reductions such as dot products, maximum and minimum finding.
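A dot product is a reduction because every iteration updates the same accumulator. One common way to restructure such a loop is to give each worker a private partial sum and combine the partial sums at the end. The Python sketch below illustrates that idea only; the function names and the workers parameter are hypothetical, and this page does not describe the strategy the compiler actually uses.

```python
def dot_serial(x, y):
    """Plain dot product: one accumulator updated on every iteration."""
    total = 0.0
    for a, b in zip(x, y):
        total += a * b
    return total


def dot_partial_sums(x, y, workers=4):
    """Split the index range into chunks, sum each chunk privately, then combine."""
    n = len(x)
    chunk = (n + workers - 1) // workers  # ceiling division
    partials = []
    for w in range(workers):
        lo, hi = w * chunk, min((w + 1) * chunk, n)
        partials.append(dot_serial(x[lo:hi], y[lo:hi]))
    return sum(partials)


if __name__ == "__main__":
    x = [float(i) for i in range(1, 9)]
    y = [2.0] * 8
    print(dot_serial(x, y), dot_partial_sums(x, y))
```

Because the partial sums are added in a different order than in the serial loop, floating point results can differ in rounding for general inputs, even though they agree for the small exactly representable values used here.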

-xrestrict (C)
Treat pointer-valued function parameters as restricted pointers.

-xsafe=mem (C, C++, Fortran)
Enables the use of non-faulting loads when used in conjunction with -xarch=v8plus. Assumes that no memory based traps will occur.

-xtarget=native (C, C++, Fortran)
Selects options appropriate for the system where the compile is taking place, including architecture, chip, and cache sizes. (These can also be controlled separately, via -xarch, -xchip, and -xcache, respectively.)

-xvector (C, Fortran)
Allow the compiler to transform math library calls within loops into calls to the vector math library. Specifying -xvector is equivalent to -xvector=yes.