The following text (courtesy of Intel Corp.) describes the compiler flags used for Unisys CPU2000 results generated with the Intel 7.1 compilers on Window-based sysetms. Description of compiler flags for Intel C++ Compiler 7.0 and 7.1 ---------------------------------------------------------------- -O1 optimize for speed, but disable some optimizations which increase code size for a small speed benefit. Includes inline expansion except for intrinsic functions, global optimizations, string pooling optimizations. -O2 This is the default level of optimization. Optimizes for speed. The -O2 option includes O1 optimizations and in addition enables inlining of intrinsics and more speed optimizations. -O3: Builds on -01 and -02 optimizations by enabling high-level optimization. This level does not guarantee higher performance unless loop and memory access transformation take place. In conjunction with -QaxK/-QxK and QaxW/QxW, this switch causes the compiler to perform more aggressive data dependency analysis than for -O2. This may result in longer compilation times. -Oa[-] assume [do not assume] no aliasing in program -Qax generate code specialized for processor extensions specified by while also generating generic IA-32 code. includes one or more of the following characters: i Pentium Pro and Pentium II processor instructions M MMX(TM) instructions K streaming SIMD extensions (implies i and M above) W Pentium 4 processor with Streaming SIMD Extensions 2 (implies i, M and K) -Qx generate specialized code to run exclusively on processors supporting the extensions indicated by as described above. -Ob{0|1|2} Controls the compiler's inline expansion. 0: disable inlining. 1: disables inlining unless -Qip or -Ob2 are specified. 2: enables inlining of any function. However, the compiler decides which functions are inlined. This option enables interprocedural optimizations and has the same effect as specifying the -Qip option. -Oi[-] enables [disables] inline expansion of standard library functions -Qfp_port Round fp results at assignments and casts. -Qip enable single-file IP optimizations (within files, same as -Ob2) -Qipo multi-file ip optimizations that includes: - inline function expansion - interprocedural constant propogation - dead code elimination - propagation of function characteristics - passing arguments in registers - loop-invariant code motion -Qprof_gen instrument program for profiling for the first phase of two-phase profile guided otimization -Qprof_use Instructs the compiler to produce a profile-optimized executable and merges available dynamic information (.dyn) files into a pgopti.dpi file. If you perform multiple executions of the instrumented program, -Qprof_use merges the dynamic information files again and overwrites the previous pgopti.dpi file. Without any other options, the current directory is searched for .dyn files -Qrcd The Intel compiler uses the -Qrcd option to improve the performance of code that requires floating-point-to-integer conversions. The system default floating point rounding mode is round-to-nearest. This means that values are rounded during floating point calculations. However, the C language requires floating point values to be truncated when a conversion to an integer is involved. To do this, the compiler must change the rounding mode to truncation before each floating point-to-integer conversion and change it back afterwards. The -Qrcd option disables the change to truncation of the rounding mode for all floating point calculations, including floating point-to-integer conversions. Turning on this option can improve performance, but floating point conversions to integer will not conform to C semantics. -Qunroll[n] Specifies the maximum number of times to unroll a loop. Omit n to let the compiler decide whether to perform unrolling or not. Use n = 0 to disable unroller. If n is not specified, the compiler automatically chooses the maximum number of times to unroll a loop. -GX Enables the full C++ Exception Handling unwind semantics. -GR Enables C++ Runtime Type Information (RTTI). -Zp{1|2|4|8|16} Specifies the strictest alignment constraint for structure and union types as one of the following: 1, 2, 4, 8, or 16 (default) bytes. shlW32M.lib: MicroQuill SmartHeap Library 6.0 available from http://www.microquill.com/ Description of compiler flags for Intel FORTRAN Compiler 7.0 and 7.1 -------------------------------------------------------------------- -O1 optimize for speed, but disable some optimizations which increase code size for a small speed benefit. Includes inline expansion except for intrinsic functions, global optimizations, string pooling optimizations. -O2 This is the default level of optimization. Optimizes for speed. The -O2 option includes O1 optimizations and in addition enables inlining of intrinsics and more speed optimizations. -O3: Builds on -01 and -02 optimizations by enabling high-level optimization. This level does not guarantee higher performance unless loop and memory access transformation take place. In conjunction with -QaxK/-QxK and QaxW/QxW, this switch causes the compiler to perform more aggressive data dependency analysis than for -O2. This may result in longer compilation times. -Qax generate code specialized for processor extensions specified by while also generating generic IA-32 code. includes one or more of the following characters: i Pentium Pro and Pentium II processor instructions M MMX(TM) instructions K streaming SIMD extensions (implies i and M above) W Pentium 4 processor with Streaming SIMD Extensions 2 (implies i, M and K above) -Qx generate specialized code to run exclusively on processors supporting the extensions indicated by as described above. -Qip enable single-file IP optimizations (within files, same as -Ob2) -Qipo multi-file ip optimizations that includes: - inline function expansion - interprocedural constant propogation - dead code elimination - propagation of function characteristics - passing arguments in registers - loop-invariant code motion -Qprof_gen instrument program for profiling for the first phase of two-phase profile guided otimization -Qprof_use Instructs the compiler to produce a profile-optimized executable and merges available dynamic information (.dyn) files into a pgopti.dpi file. If you perform multiple executions of the instrumented program, -Qprof_use merges the dynamic information files again and overwrites the previous pgopti.dpi file. Without any other options, the current directory is searched for .dyn files -Qrcd Enables fast float-to-int conversion. -Qscalar_rep(-) Enables(disables) scalar replacement performed during loop transformations (requires /O3). -Qauto Causes all variables to be allocated on the stack, rather than in local static storage. Does not affect variables that appear in an EQUIVALENCE or SAVE statement, or those that are in COMMON. Makes all local variables AUTOMATIC, same as /4Ya. Other Notes: ------------ "/" and "-" are both allowable starting tokens for flags passed to the compiler i.e. -QxK and /QxK are identical switches. Portability options for CPU2000: ------------------------------- 176.gcc: -Dalloca=_alloca : so as to use the built-in optimized alloca -Fn : 176.gcc uses alloca and this options tells the linker to pre-allocate n bytes of stack. The default amount of stack allocated is not enough and 176.gcc crashes with a run-time error 178.galgel: -FI : Fixed-format F90 source code. -F32000000 : Same as with 176.gcc, pre-allocates a 32MB stack 186.crafty: -DNT_i386 : Specifies that it is a Windows NT Intel processor-based system which makes the compiler use "long long" as the 64-bit variable that 186.crafty needs. 253.perlbmk: -DSPEC_CPU2000_NTOS : This enables the code changes for porting to Windows get included -DPERLDLL : On Windows, we need a perl.exe instead of a perl.exe and perl.dll. This pre-define ensures that the changes necessary to get a single, UNIX-style executible without getting the indirect calls that can cause a 10% performance degradation. This allows the Windows-based executible to be as close as possible to the Unix-based one. -MT : Use the static multi-threaded library else it will not compile. 254.gap: -DSYS_HAS_CALLOC_PROTO : -DSYS_HAS_MALLOC_PROTO : These two pre-defines tell of the existence of malloc and calloc prototypes.