Tru64 UNIX Switch Disclosure
SPEC CPU2000
Hewlett-Packard Company
Revised 23 January 2003

This SPEC CPU2000 switch disclosure is for Tru64 UNIX (formerly known as Digital UNIX). This document was originally written in November 1999, and will be updated to add new switches used in later submissions. An attempt is made to be cumulative, so some switches listed from earlier submissions might not be used in later submissions.

Switches are given in alphabetical order rather than by product or benchmark. It is hoped that this order will be convenient for the reader of the NOTES section of a SPEC CPU2000 disclosure who wants to look up a specific command or switch. The collating sequence ignores upper/lower case, hyphens, and the presence of "no" for negation. That is, if you are looking for "-nomumble", try looking under "-mumble".

Note: some switches in this disclosure statement are not used directly, but are generated by other switches (e.g. "-fast").

-aggressive= (KAP Fortran)
    a   Pads COMMON blocks to avoid cache line collisions.
    c   Allows inlining of routines that contain static (SAVE or DATA) variables by promoting these variables to members of a COMMON block introduced into the program.

-align commons (Compaq Fortran)
    Aligns all COMMON block entities on natural boundaries up to 4 bytes.

-align dcommons (Compaq Fortran)
    Aligns COMMON block entities on natural boundaries up to 8 bytes.

-align sequence (Compaq Fortran)
    Specifies that components of a SEQUENCEd derived type are to be aligned according to the alignment rules set by the user (which by default cause components to be aligned on natural boundaries).

-all -ldensemalloc -none (linker)
    The "dense malloc" library provides a memory allocation strategy that packs memory more tightly, at a slight cost in execution speed of malloc and free. The "-all" and "-none" options surrounding the reference to libdensemalloc cause all symbols in the library to be included in the images.

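The collating sequence described in the introduction can be sketched as a sort key. This is only an illustration, not a tool used in any submission: the function name and the `negatable` parameter are hypothetical. The explicit set of negatable names is an assumption added so that switches such as "-non_shared" are not mistaken for negated forms. Dropping a leading "+" (as in "+CFB") is inferred from where those abbreviations appear in this listing.

```python
def collation_key(switch, negatable=frozenset()):
    """Return a lookup key for a switch name (illustrative sketch only).

    `negatable` is a hypothetical set of base names (lowercase, hyphens
    removed) that are known to accept a "no" prefix.
    """
    # Ignore upper/lower case and hyphens; also drop a leading "+".
    name = switch.lower().replace("-", "").lstrip("+")
    # Ignore "no" only when the remainder is a known negatable switch,
    # so that "-non_shared" keeps its place under "n".
    if name.startswith("no") and name[2:] in negatable:
        name = name[2:]
    return name


switches = ["-tune", "-notransform_loops", "-non_shared", "-O4", "-fast"]
ordered = sorted(switches, key=lambda s: collation_key(s, {"transform_loops"}))
print(ordered)
```

With this key, "-notransform_loops" sorts next to "-tune" rather than under "n", which is where a reader following the rule above would look for it.
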
-ansi_alias (Compaq C)
    Directs the compiler to assume the ANSI C aliasing rules.

-ansi_args (Compaq C)
    Tells the compiler that the source code follows all ANSI rules about arguments; that is, whether the type of an argument matches the type of the parameter in the called function, or whether a function prototype is present so the compiler can automatically perform the expected type conversion.

-arch (Compaq C, Compaq Fortran)
    Generate code that may include instructions which are newly introduced with the specified architecture. For example, "ev56" adds byte/word load and store, and "ev6" adds sqrt. See also -tune, below.

-arl=n (KAP C)
    Informs KAP what level of data aliasing may be present in the program:
    0   kapc makes no assumptions about data aliasing.
    1   A pointer will not contain its own address.
    2   No objects represented by function parameters overlap in memory.
    3   Globals, function parameters, and locals form distinct groups.
    4   No aliases for objects.

-assume bigarrays (Compaq Fortran)
    Suppresses run-time checking for distributed small array dimensions, for increased performance if using the -wsf option.

-assume noaccuracy_sensitive (Compaq Fortran)
    Same as -fp_reorder.

-assume nomath_errno (Compaq C)
    Allows the compiler to reorder or combine computations to improve the performance of those math functions that it recognizes as intrinsic functions.

-assume restricted_pointers (Compaq C)
    During the lifetime of any given pointer "p", the memory locations accessed through it are not accessed by any other memory references.

-assume trusted_short_alignment (Compaq C)
    Specifies that this is a strictly-conforming ANSI C program with respect to the dereferencing of pointer-to-short variables. This allows the compiler to assume that any short accessed through a pointer is naturally aligned (as the C language requires).

-assume nozsize (Compaq Fortran)
    Omits run-time checking for zero-sized array sections for increased performance if using the -wsf option.

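The no-overlap assumptions described above under -arl=2 and -assume restricted_pointers matter because they let an optimizer reorder loads and stores. The following is a conceptual sketch only: a Python list stands in for memory, index ranges stand in for pointers, and both function names are hypothetical. It does not show what either compiler actually generates.

```python
def add_in_order(buf, dst, src, n):
    """Update element by element, exactly as the source loop is written."""
    for i in range(n):
        buf[dst + i] += buf[src + i]


def add_assuming_no_overlap(buf, dst, src, n):
    """Same loop with every load hoisted ahead of the stores.

    This reordering gives the same answer only when the source and
    destination ranges do not overlap.
    """
    loaded = buf[src:src + n]      # all loads first
    for i in range(n):
        buf[dst + i] += loaded[i]  # then all stores


# Disjoint ranges: both versions agree.
x, y = [1, 2, 3, 4, 0, 0, 0, 0], [1, 2, 3, 4, 0, 0, 0, 0]
add_in_order(x, 4, 0, 4)
add_assuming_no_overlap(y, 4, 0, 4)
print(x == y)

# Overlapping ranges: the reordered version gives a different result.
p, q = [1, 1, 1, 1, 1], [1, 1, 1, 1, 1]
add_in_order(p, 1, 0, 4)
add_assuming_no_overlap(q, 1, 0, 4)
print(p, q)
```

If a program does contain such overlap, asserting that it does not can change the program's results, which is why these switches are statements about the source code rather than pure tuning knobs.
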
-assume whole_program (Compaq C)
    Specifies that no occurrences of the address-of operator (&) are being applied outside of the current compilation unit to extern variables that are declared inside the current compilation unit.

-automatic (Compaq Fortran)
    Places local variables on the run-time stack.

cc (compiler)
    If the Developer's Toolkit is NOT installed, this command invokes the system C compiler. If the toolkit has been installed, then it invokes the compiler in /usr/lib/cmplrs/cc.dtk

-cachesize= (KAP Fortran)
    Informs KAP Fortran of the size in kilobytes of the cache memory. The first argument gives the size of the primary cache, and the second gives the size of the secondary cache.

-call_shared (Compaq C)
    Produces a dynamic executable file that uses shareable objects during run time. The loader uses shareable objects to resolve undefined symbols.

-ckapargs='' (KAP C)
    Pass the switches between apostrophes to the KAP optimizer.

-Dalloca=__builtin_alloca (Compaq C)
    Portability switch, used for gcc; specifies that the builtin version of alloca is to be used.

+CFB (notes only)
    As explained in the notes section, "+CFB" is merely an abbreviation to help readability of the notes. Look elsewhere in the notes section for a description of what this abbreviation means.

cpu_enable_mask (Tru64 Unix)
    This parameter is set in /etc/sysconfigtab to determine, at boot time, which cpus are started in a multi-cpu system. The value 0 indicates that only the master (boot) cpu is enabled.

cxx (compiler)
    Invokes the C++ compiler.

-D_INTRINSICS (Compaq C)
    Declares certain functions to be intrinsic. When a function is intrinsic, the compiler is free to generate faster code that provides the same function behavior (but may not actually call the function).

-D_INLINE_INTRINSICS (Compaq C)
    Directs the compiler to inline some of the intrinsic functions, avoiding the overhead of a function call.

-D_FASTMATH (Compaq C)
    Redefines the names of certain common math routines so that faster but slightly less accurate functions are used.

-DALPHA (crafty)
    Portability switch for crafty. Specifies that longs are 64 bits, that we do not need to say "long long" to get 64-bit quantities, and that the architecture is little endian.

-DSPEC_CPU2000_DUNIX (perlbmk)
    Portability switch for perlbmk - see source code for exact effect in module benchspec/CINT2000/253.perlbmk/src/spec_config.h. Sets items such as number of bytes in a long, little endian byte order, how to invoke the C preprocessor, says that "fcntl" is available.

-DSPEC_CPU2000_LP64 (gap, vortex)
    Specifies that longs and pointers are 64 bits.

-DSYS_HAS_CALLOC_PROTO (gap)
    Specifies that the system already defines the function calloc.

-DSYS_HAS_IOCTL_PROTO (gap)
    Specifies that the system already defines ioctl.

-DSYS_IS_BSD (gap)
    Specifies that the system is compatible with BSD Unix, using conventions such as "/" for directory separation, Unix signals, string concatenation, etc.

-fb name (spike)
    Causes spike to look for feedback information stored in files name.Counts.*

f77 (compiler)
    EITHER invokes the f90 compiler with some flags set that increase compatibility with the older f77 standard, OR invokes the older compiler, if the link in /bin/f77 has been set as specified in the release notes. Initial CPU2000 submissions set the link for the older compiler.

f90 (compiler)
    Invokes the f90 compiler.

-fast (Compaq C)
    Provides a single method for turning on a collection of optimizations for increased performance, namely:
        -ansi_alias
        -ansi_args
        -assume nomath_errno
        -assume trusted_short_alignment
        -D_INTRINSICS
        -D_INLINE_INTRINSICS
        -D_FASTMATH
        -float
        -fp_reorder
        -ifo
        -intrinsics
        -O3
        -readonly_strings

-fast (Compaq Fortran)
    Provides a single method for turning on a collection of optimizations for increased performance, namely:
        -align dcommons
        -arch host
        -assume noaccuracy_sensitive
        -math_library fast
        -O4 (the default)
        -tune host
    For f90 and f95, -fast also sets:
        -align sequence
        -assume bigarrays
        -assume nozsize

fdo_pre0 = mkdir /tmp/pb; rm -f /tmp/pb/${baseexe}* (CPU2000 config file)
    Causes the SPEC tools to clean the directories where feedback is accumulated, to remove data from any previous compiles.

-feedback file (spike)
    Causes spike to use the feedback database in the named file.

-fixed (Compaq Fortran)
    Portability switch, used by galgel, indicates that source code is in fixed (72 column) format.

-fkapargs='...' (KAP Fortran)
    Pass the switches between apostrophes to the KAP optimizer.

-float (Compaq C)
    Tells the compiler that it is not necessary to promote expressions of type float to type double.

-fp_reorder (Compaq C, Compaq Fortran)
    Allows floating-point operations to be reordered during optimization based on algebraic identities.

-fuse (KAP Fortran)
    Tells KAP Fortran to perform loop fusion. Loop fusion transforms two adjacent loops into a single loop.

-fuselevel= (KAP Fortran)
    Further controls the level of loop fusion.
    0   Performs standard fusion techniques.
    1   Instructs the fusion pass to move nonadjacent loops to adjacent positions.
    2   Instructs the fusion pass to attempt loop-iteration space reversal and loop peeling to provide additional opportunities to fuse loops.

+GEMFB (Compaq C)
    Use GEM (i.e. compiler) feedback. This is an abbreviation used to make the notes section simpler, and not an actual switch.
    Look elsewhere in the notes section for the details.

-g3 (Compaq C, Compaq C++, Compaq Fortran)
    Allow symbols in optimized code.

+IFB (notes only)
    As explained in the notes section, "+IFB" is merely an abbreviation to help readability of the notes. Look elsewhere in the notes section for a description of what this abbreviation means.

-ifo (Compaq C)
    Performs inter-file optimizations.

-inline speed (Compaq C, Compaq Fortran)
    Provides inline expansion of function calls even when doing so may significantly increase the size of the program.

-[no]interleave (KAP Fortran)
    Controls loop unrolling and rescheduling. Interleaved unrolling can help the compiler recognize quad-word loads and stores, which are more efficient than ordinary loads and stores.

-[no]intrinsics
    The -intrinsics option causes the compiler to recognize intrinsic functions wherever it can automatically, based only on name and call signature.

kcc (compiler)
    This command invokes the KAP C high-level optimizer and then invokes the Compaq C compiler.
    Note: switches are passed to the KAP C optimizer within the -ckapargs="" phrase; other switches are directed to the Compaq C compiler.

kf77 (compiler)
    This command invokes the KAP Fortran 77 high-level optimizer and then invokes the Fortran 77 compiler. When the f77 compiler is invoked, KAP adds the following switches:
        -fast
        -non_shared (single CPU only)
        -tune host
    Note: Optimization switches are passed to the KAP Fortran 77 optimizer within the -fkapargs="" phrase; other switches are directed to the Compaq Fortran 77 compiler.

kf90 (compiler)
    This command invokes the KAP Fortran 90 high-level optimizer and then invokes the Fortran 90 compiler. When the f90 compiler is invoked, KAP adds the following switches:
        -fast
        -non_shared (single CPU only)
        -tune host
    Note: Optimization switches are passed to the KAP Fortran 90 optimizer within the -fkapargs="" phrase; other switches are directed to the Compaq Fortran 90 compiler.

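Several entries above (-fast, -fp_reorder, -assume noaccuracy_sensitive) permit floating-point operations to be reordered using algebraic identities. A minimal sketch of why that can change results: floating-point addition is not associative, so the same three values summed in a different order can give a different answer. The values are chosen for illustration and are not taken from any benchmark.

```python
a, b, c = 1.0e20, -1.0e20, 1.0

# Order as written in the source: the large terms cancel first.
as_written = (a + b) + c

# Algebraically identical order: c is absorbed by b before the cancellation.
reordered = a + (b + c)

print(as_written, reordered)  # prints "1.0 0.0"
```

Whether a particular compiler would choose this particular reordering is not implied. The sketch only shows that such reordering is not always result-preserving.
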
-ldensemalloc
    Please see above, under "-all -ldensemalloc -none".

-ldxml (library)
    Specifies that the program should be linked with the Compaq extended math library, cxml, which includes optimized BLAS functions.

-math_library fast (Compaq Fortran)
    Select math library routines that provide faster performance. For certain ranges of input values, the faster routines may not provide a result that is as accurate as provided by the default.

max_per_proc_address_space (Compaq Tru64 Unix)
    Current maximum amount, in bytes, of user process address space.

max_per_proc_data_size (Compaq Tru64 Unix)
    Maximum size, in bytes, of a data segment for each process.

max_proc_per_user (Compaq Tru64 Unix)
    This parameter is set in /etc/sysconfigtab to control how many processes are allowed per user.

max_thread_per_user (Compaq Tru64 Unix)
    This parameter is set in /etc/sysconfigtab to control how many threads are allowed per user.

-non_shared (ld)
    Directs the linker to produce a static executable. The output object created by the linker will not use any shared objects during execution.

-none
    Please see above, under "-all -ldensemalloc -none".

-noporder
    Disables procedure ordering.

-O0 through -O5 (Compaq Fortran)
    Fortran's general optimization level.
    O0  disable all optimizations
    O1  local optimizations and common subexpressions
    O2  global optimizations such as code motion, strength reduction, lifetime analysis, and code scheduling
    O3  additional global optimizations that may cost more space, such as loop unrolling and code replication
    O4  inline expansion
    O5  software pipelining and loop transformation

-O0 through -O4 (Compaq C)
    Compaq C's general optimization level.
    O0  disable all optimizations
    O1  local optimizations and common subexpressions
        global optimizations such as code motion, strength reduction, lifetime analysis, and code scheduling
    O2  additional global optimizations that may cost more space, such as loop unrolling and code replication
    O3  inline expansion
    O4  software pipelining
    NOTE: when kcc is used, optimization levels are effectively one less than stated in the command line, for historical reasons. That is, "kcc -O4" eventually invokes the compiler backend with the same optimization level as would be used by "cc -O3".

-O0 through -O4 (Compaq C++)
    C++ General optimization level:
    O0  no optimization
    O1  Optimize for space
    O2  Optimize for time
    O3  Same as O2
    O4  Additional speed optimizations at the expense of space

-o file (spike)
    Names the desired output file.

-o=n (KAP Fortran)
    KAP's general optimization level.
    0   No optimization
    1   Induction variables recognized, loop interchanging
    2   Lifetime analysis
    3   Additional loop interchanging, wraparound variables
    4   Loop interchanging around reductions
    5   Array expansion

-o=n (KAP C)
    KAP C's general optimization level.
    0   No loop optimization performed
        Only simple analysis performed
    1   Simple loop optimization performed
        Loops distributed to optimize only part of loop
    2   Loops in a loop nest optimized
        Lifetime analysis performed
        More powerful data dependence tests performed
    3   Special techniques used to break data dependence cycles
        Triangular loops recognized
        Loop interchanging performed to improve memory referencing
        Special case data dependence tests used
    4   Two versions of a loop generated to break data dependence arc when necessary
        Exact data dependence tests used
        Wraparound variables recognized
    5   Array expansion and loop fusion enabled

ONESTEP (SPEC CPU2000 config file)
    Setting ONESTEP=YES tells the SPEC tools to build from all sources in one step. For more information, search for "ONESTEP" in the run rules.

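Loop fusion, which is enabled at KAP C level 5 above and by the KAP Fortran -fuse switch, merges two adjacent loops over the same range into one. The sketch below is written by hand to show the idea. It is not KAP output, and the function names and the particular arithmetic are hypothetical.

```python
def unfused(a, b):
    """Two adjacent loops over the same index range."""
    n = len(a)
    c = [0.0] * n
    d = [0.0] * n
    for i in range(n):      # first loop produces c
        c[i] = a[i] + b[i]
    for i in range(n):      # second loop consumes c
        d[i] = c[i] * 2.0
    return c, d


def fused(a, b):
    """The same work as a single loop: one pass, c[i] reused immediately."""
    n = len(a)
    c = [0.0] * n
    d = [0.0] * n
    for i in range(n):
        c[i] = a[i] + b[i]
        d[i] = c[i] * 2.0
    return c, d


print(fused([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

Fusion is legal here because iteration i of the second loop depends only on iteration i of the first. A loop pair without that property could not be merged this simply.
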
per_proc_data_size (Compaq Tru64 Unix)
    Current maximum size, in bytes, of a data segment for each process.

+PFB (notes only)
    As explained in the notes section, "+PFB" is merely an abbreviation to help readability of the notes. Look elsewhere in the notes section for a description of what this abbreviation means.

-pipeline (Compaq C, Compaq Fortran)
    Enables software pipelining, that is, "wrap around" of loop iterations to reduce latency.

pixie (Program Analysis Tool)
    Invokes the profiling tool to instrument an executable image.

-prof_dir (Compaq C)
    Specifies a location to which the profiling data files (.Counts and .Addrs) are written.

-prof_gen_noopt (Compaq C)
    Generates an executable image that has profiling code added to it and which is not optimized (this may improve the profile accuracy).

-prof_use_feedback (Compaq C)
    Uses profiling feedback to improve runtime performance.

-r= (KAP Fortran) [alternative spelling: -roundoff=]
    Allows the user to specify the change from serial roundoff error that is tolerable.
    1   Expression simplification and code floating enabled
        Arithmetic reductions recognized
        Loop interchanging around arithmetic reductions allowed if -optimize >= 4, and loop unrolling if -so >= 1

-readonly_strings (Compaq C)
    Makes string literals read-only for improved performance.

RM_SOURCES= (SPEC CPU2000 config file)
    Tells the SPEC tools not to use a certain source file, normally because it will be replaced by a cxml library.

-so= (KAP Fortran) [alternative spelling: -scalaropt=]
    Sets the level at which dusty-deck and other scalar transformations are performed.
    0   No scalar optimizations performed
    2   Full range of scalar optimizations enabled

spike (optimizer)
    Performs code optimization after a program has been linked, such as code layout for efficient cache usage, deleting unreachable code, and optimization of address computations. Spike is most effective when it uses profile information to guide optimization.
    Example: the first usage of this tool in a SPEC CPU2000 submission was in November, 2000, as:
        spike -feedback ${baseexe} -o tmp ${baseexe}; mv tmp ${baseexe}
    In the above commands, the SPEC tools use ${baseexe} to refer to the filename of the executable without any directory specifiers or extensions that the tools will add later. (In this instance, "base" does not refer to the concept of base as used in the run rules.) Spike optimizes the executable, looking for feedback data in the executable itself (placed there by the compiler). The output of spike is written to a temporary file, which is then immediately moved to replace the original executable.

+SPIKEFB (optimizer)
    Use SPIKE feedback. This is an abbreviation used to make the notes section simpler, and not an actual switch. Look elsewhere in the notes section for the details.

-split_threshold n
-splitThresh n
    Adjusts the threshold used by procedure splitting in code layout to decide which code is frequently and infrequently executed. The default is .99, which means that the most frequently executed basic blocks that make up at least 99 percent of the estimated execution time are considered frequently executed and the rest are marked as infrequently executed. Increasing the threshold can help when the profile is not representative. For example, try a value of .999.

-stats dstride (pixie)
    Causes pixie to instrument the program to examine memory access strides.

-stride_prefetch (spike)
    Causes spike to optimize prefetches, based on feedback collected as to useful strides.

-transform_loops (Compaq Fortran)
-notransform_loops
    Enables/Disables a group of loop transformation optimizations that apply to array references within loops, including loop blocking, distribution, fusion, and interchange.

-tune (Compaq C, Compaq Fortran)
    Generate code that is optimized for a particular cpu model.
    This switch preferentially tunes for the specified model, but assumes that the code may be run on any processor that implements the instruction set called for in -arch. For example, the combination "-tune ev6 -arch ev56" specifies that the code should be scheduled for ev6 class machines while still preserving the ability to run quickly on machines that lack the sqrt instruction. See also -arch, above.

-unroll n (Compaq C, Compaq Fortran)
    Specify the depth of loop unrolling.

-ur= (KAP Fortran) [alternate spelling: -unroll]
    The maximum number of iterations to unroll inner loops. Used within -fkapargs="".

-ur2= (KAP Fortran) [alternate spelling: -unroll2]
    Sets the upper limit for unrolling. If the estimate of work is greater than this limit, then the loop will not be unrolled.

-ur3= (KAP Fortran) [alternate spelling: -unroll3]
    Sets the lower limit for unrolling. If the estimate of work is less than this limit, then the loop will not be unrolled.

-v (all compilers)
    Turn on verbose mode, so the compiler driver will print its steps as it goes. Has no effect on the generated executable.

vm_bigpg_enabled (Tru64 Unix)
    Master switch that enables (1) or disables (0/default) memory allocation for user processes in "big pages" mode.

vm_bigpg_thresh (Tru64 Unix)
    The percentage of physical memory that should be maintained on the free page list for each of the four possible page sizes (8, 64, 512, 4096 Kbytes). Default: 6%

-xtaso_short (Compaq C)
    Directs the compiler to allocate 32-bit pointers by default. You can still use 64-bit pointers, but only by the use of pragmas. Using this switch can cause conflicts between the compiler's assumptions about pointer sizes and the assumptions in the system libraries. Diagnostic messages will be generated at compile time unless the installation option "protect_headers_setup(8)" has been used [run: /usr/lib/cmplrs/cc/protect_headers_setup.sh -l]. (It is, in fact, used in the CPU2000 submissions, as requested by the manpage).
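
The selection rule described earlier under -split_threshold can be sketched as follows. This is an interpretation of the prose description only, not spike's actual implementation: blocks are ranked by estimated execution weight and marked frequently executed until the chosen fraction of the total is covered. The function name, block names, and weights are all hypothetical.

```python
def split_hot_cold(block_weights, threshold=0.99):
    """Partition basic blocks into (hot, cold) lists of names.

    block_weights maps a block name to its estimated execution weight
    (hypothetical profile data, in any non-negative unit).
    """
    total = sum(block_weights.values())
    # Most frequently executed blocks first.
    ranked = sorted(block_weights.items(), key=lambda kv: kv[1], reverse=True)
    hot, cold, covered = [], [], 0
    for name, weight in ranked:
        # Keep marking blocks hot until the threshold share is covered.
        if total > 0 and covered / total < threshold:
            hot.append(name)
            covered += weight
        else:
            cold.append(name)
    return hot, cold


profile = {"loop_body": 9000, "loop_test": 950, "init": 45, "error_path": 5}
print(split_hot_cold(profile))          # default threshold of .99
print(split_hot_cold(profile, 0.999))   # the higher value suggested above
```

With these made-up weights, raising the threshold from .99 to .999 moves one additional block into the frequently executed set. That is the effect the entry describes for profiles that are not representative.
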