Description of compiler flags for Intel C Compiler 5.0
------------------------------------------------------

/O1
    Optimize for speed, but disable some optimizations that increase
    code size for a small speed benefit. Includes inline expansion
    (except for intrinsic functions), global optimizations, and string
    pooling optimizations.

/O2
    Optimizes for speed. The -O2 option includes the -O1 optimizations
    and in addition enables inlining of intrinsics and more speed
    optimizations.

/O3
    Builds on the -O1 and -O2 optimizations by enabling high-level
    optimization. This level does not guarantee higher performance
    unless loop and memory access transformations take place. In
    conjunction with -QaxK/-QxK and -QaxW/-QxW, this switch causes the
    compiler to perform more aggressive data dependency analysis than
    for -O2. This may result in longer compilation times.

/Oa[-]
    Assume [do not assume] no aliasing in the program.

/Qax
    Generate code specialized for the processor extensions specified by
    the appended letter(s) (e.g. /QaxK), while also generating generic
    IA-32 code. The letters are one or more of:
      i  Pentium Pro and Pentium II processor instructions
      M  MMX(TM) instructions
      K  Streaming SIMD Extensions (implies i and M above)
      W  Pentium 4 processor with Streaming SIMD Extensions 2
         (implies i, M and K above)

/Qx
    Generate specialized code to run exclusively on processors
    supporting the extensions indicated by the appended letter(s), as
    described above. -QxK and -QaxK ensure consistent floating-point
    arithmetic.

/Ob{0|1|2}
    Controls the compiler's inline expansion:
      0: disable inlining.
      1: disable inlining unless /Qip or /Ob2 is specified.
      2: enable inlining of any function; however, the compiler decides
         which functions are inlined. This option enables
         interprocedural optimizations and has the same effect as
         specifying the /Qip option.
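To illustrate why the /Oa[-] aliasing assumption matters, here is a minimal C sketch. It is not taken from the Intel documentation, and the function name add_scalar is hypothetical. Under default C rules the compiler must re-read *src on every iteration, because a store to dst[i] might change it; under a "no aliasing" assumption it may load *src once and keep it in a register.

```c
/* Hypothetical example: adds *src to each of the n elements of dst.
   Without an aliasing guarantee, *src must be reloaded on every
   iteration, since writing dst[i] may modify the value it points to.
   If the compiler is told to assume no aliasing, it may hoist the
   load of *src out of the loop, which changes the result when src
   points into dst. */
void add_scalar(int *dst, const int *src, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] += *src;
}
```

For example, with int b[4] = {1, 1, 1, 1}, the call add_scalar(b, &b[1], 4) leaves b as {2, 2, 3, 3} under default C rules, because b[1] becomes 2 partway through the loop; a compiler that hoisted the load of *src would produce {2, 2, 2, 2} instead.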
/Qip
    Enable single-file IP optimizations (within files; same as /Ob2).

/Qipo
    Multi-file IP optimizations, which include:
      - inline function expansion
      - interprocedural constant propagation
      - dead code elimination
      - propagation of function characteristics
      - passing arguments in registers
      - loop-invariant code motion

/Qprof_gen
    Instrument the program for profiling, for the first phase of
    two-phase profile-guided optimization.

/Qprof_use
    Instructs the compiler to produce a profile-optimized executable
    and merges the available dynamic information (.dyn) files into a
    pgopti.dpi file. If you perform multiple executions of the
    instrumented program, -Qprof_use merges the dynamic information
    files again and overwrites the previous pgopti.dpi file. Without
    any other options, the current directory is searched for .dyn
    files.

/Qrcd
    The -Qrcd option disables the change to truncation of the rounding
    mode for all floating-point calculations, including floating-point
    to integer conversions. Turning on this option can improve
    performance.

/GX
    Enables full C++ exception handling unwind semantics.

/GR
    Enables C++ Run-Time Type Information (RTTI).

/Qfp_port
    Round floating-point results at assignments and casts (some speed
    impact).

shlW32M.lib: MicroQuill SmartHeap Library 5.0, available from
    http://www.microquill.com/


Description of compiler flags for Intel FORTRAN Compiler 5.0
------------------------------------------------------------

/O1
    Optimize for speed, but disable some optimizations that increase
    code size for a small speed benefit. Includes inline expansion
    (except for intrinsic functions), global optimizations, and string
    pooling optimizations.

/O2
    Optimizes for speed. The -O2 option includes the -O1 optimizations
    and in addition enables inlining of intrinsics and more speed
    optimizations.

/O3
    Builds on the -O1 and -O2 optimizations by enabling high-level
    optimization. This level does not guarantee higher performance
    unless loop and memory access transformations take place.
    In conjunction with -QaxK/-QxK and -QaxW/-QxW, this switch causes
    the compiler to perform more aggressive data dependency analysis
    than for -O2. This may result in longer compilation times.

/Qax
    Generate code specialized for the processor extensions specified by
    the appended letter(s) (e.g. /QaxK), while also generating generic
    IA-32 code. The letters are one or more of:
      i  Pentium Pro and Pentium II processor instructions
      M  MMX(TM) instructions
      K  Streaming SIMD Extensions (implies i and M above)
      W  Pentium 4 processor with Streaming SIMD Extensions 2
         (implies i, M and K above)

/Qx
    Generate specialized code to run exclusively on processors
    supporting the extensions indicated by the appended letter(s), as
    described above.

/Qip
    Enable single-file IP optimizations (within files; same as /Ob2).

/Qipo
    Multi-file IP optimizations, which include:
      - inline function expansion
      - interprocedural constant propagation
      - dead code elimination
      - propagation of function characteristics
      - passing arguments in registers
      - loop-invariant code motion

/Qwp_ipo
    Enable multi-file IP optimizations (between files) and make the
    "whole program" assumption that all variables and functions seen in
    the compiled sources are referenced only within those sources; the
    user must guarantee that this assumption is safe.

/Qprof_gen
    Instrument the program for profiling, for the first phase of
    two-phase profile-guided optimization.

/Qprof_use
    Instructs the compiler to produce a profile-optimized executable
    and merges the available dynamic information (.dyn) files into a
    pgopti.dpi file. If you perform multiple executions of the
    instrumented program, -Qprof_use merges the dynamic information
    files again and overwrites the previous pgopti.dpi file. Without
    any other options, the current directory is searched for .dyn
    files.

/Qrcd
    The -Qrcd option disables the change to truncation of the rounding
    mode for all floating-point calculations, including floating-point
    to integer conversions.
    Turning on this option can improve performance.

/Oi[-]
    Enable/disable inline expansion of intrinsic functions.


Other Notes:
------------
"/" and "-" are both allowable starting tokens for flags passed to the
compiler, i.e. -QxK and /QxK are identical switches.


Portability options for CPU2000:
-------------------------------
176.gcc:
    -Dalloca=_alloca : maps alloca to _alloca so as to use the built-in
          optimized alloca.
    /Fn : 176.gcc uses alloca, and this option tells the linker to
          pre-allocate n bytes of stack. The default amount of stack
          allocated is not enough, and 176.gcc crashes with a run-time
          error.

178.galgel:
    -FI : Fixed-format F90 source code.
    -F32000000 : Same as with 176.gcc; pre-allocates a 32 MB
          (32,000,000-byte) stack.

186.crafty:
    -DNT_i386 : Specifies a Windows NT system on an Intel processor,
          which makes 186.crafty use "long long" as the 64-bit integer
          type it needs.

253.perlbmk:
    -DSPEC_CPU2000_NTOS : Includes the code changes needed for the
          Windows port.
    -DPERLDLL : On Windows, we need a single perl.exe instead of a
          perl.exe plus perl.dll. This pre-define enables the changes
          necessary to build a single, UNIX-style executable without
          the indirect calls that can cause a 10% performance
          degradation. This keeps the Windows executable as close as
          possible to the UNIX one.
    /MT : Use the static multi-threaded run-time library; otherwise
          the benchmark will not compile.

254.gap:
    -DSYS_HAS_CALLOC_PROTO
    -DSYS_HAS_MALLOC_PROTO : These two pre-defines indicate that
          prototypes for calloc and malloc already exist.


Flag disclosure for the Compaq Visual Fortran 6.5
*********************************************************************

/[no]optimize
    Syntax: /optimize[:level], /nooptimize, /Od, /Ox, or /Oxp

    The /optimize option controls the level of optimization performed
    by the compiler. To provide efficient run-time performance, Visual
    Fortran increases compile time in favor of decreasing run time.
    If an operation can be performed, eliminated, or simplified at
    compile time, the compiler does so rather than have it done at run
    time. Also, the size of the object file usually increases when
    certain optimizations occur (such as more loop unrolling and more
    inlined procedures).

    In the visual development environment, specify the Optimization
    Level in the General or Optimizations Compiler Option Category.

    The /optimize options are:
      /optimize:0 or /Od
      /optimize:1
      /optimize:2
      /optimize:3
      /optimize:4, /Ox, and /Oxp
      /optimize:5

    /optimize:0 or /Od
        Disables nearly all optimizations. This is the default if you
        specify /debug (with no keyword). Specifying this option causes
        certain /warn options to be ignored. Specifying /Od sets the
        /optimize:0 and /math_library:check options.

    /optimize:1
        Enables local optimizations within the source program unit,
        recognition of common subexpressions, and expansion of integer
        multiplication and division (using shifts).

    /optimize:2
        Enables global optimization. This includes data-flow analysis,
        code motion, strength reduction and test replacement,
        split-lifetime analysis, and instruction scheduling. Specifying
        /optimize:2 includes the optimizations performed by
        /optimize:1.

    /optimize:3
        Enables additional global optimizations that improve speed (at
        the cost of extra code size). These optimizations include:
          - loop unrolling, including instruction scheduling
          - code replication to eliminate branches
          - padding the size of certain power-of-two arrays to allow
            more efficient cache use (see Use Arrays Efficiently)
        Specifying /optimize:3 includes the optimizations performed by
        /optimize:1 and /optimize:2.

    /optimize:4, /Ox, and /Oxp
        Enables interprocedural analysis and automatic inlining of
        small procedures (with heuristics limiting the amount of extra
        code). Specifying /optimize:4 includes the optimizations
        performed by /optimize:1, /optimize:2, and /optimize:3.
        For the DF command, /optimize:4 is the default unless you
        specify /debug (with no keyword). Specifying /Ox sets
        /optimize:4, /math_library:fast, and /assume:nodummy_aliases.
        Specifying /Oxp sets /optimize:4, /math_library:check,
        /assume:nodummy_aliases, and /fpconsistency (x86 systems).

    /optimize:5
        On x86 systems, activates the loop transformation optimizations
        (also set by /transform_loops). The loop transformation
        optimizations are a group of optimizations that apply to array
        references within loops. These optimizations can improve the
        performance of the memory system and can apply to multiple
        nested loops. Loop transformation optimizations include loop
        blocking, loop distribution, loop fusion, loop interchange,
        loop scalar replacement, and outer loop unrolling. You can
        specify loop transformation optimizations without software
        pipelining (see /[no]transform_loops). On x86 systems,
        specifying /optimize:5 activates /transform_loops.

        To determine whether using /optimize:5 benefits your particular
        program, compare execution timings for the same program (or
        subprogram) compiled at levels /optimize:4 and /optimize:5.

        Specifying /optimize:5 includes the optimizations performed by
        /optimize:1, /optimize:2, /optimize:3, and /optimize:4.

    For detailed information on these optimizations, see Optimization
    Levels: the /optimize Option. For information about timing your
    program, see Analyze Program Performance. To compile your
    application for efficient run-time performance, see Compile With
    Appropriate Options and Multiple Source Files.

/fast
    Syntax: /fast

    The /fast option sets several options that generate optimized code
    for fast run-time performance.
    Specifying this option is equivalent to specifying:
      /alignment:(dcommons, records, sequence)
      /architecture:host
      /assume:noaccuracy_sensitive
      /math_library:fast (which changes the default of /check:[no]power)
      /tune:host

    In the visual development environment, specify Generate Most
    Optimized Code in the Code Generation Compiler Option Category.

    If you omit /fast, these performance-related options will not be
    set.

/[no]alignment
    Syntax: /alignment[:keyword...], /noalignment, or /Zpn

    The /alignment option specifies the alignment of data items in
    common blocks, record structures, and derived-type structures. The
    /Zpn option specifies the alignment of data items in derived-type
    or record structures.

    The /alignment options are:

    /align:[no]commons
        The /align:commons option aligns the data items of all COMMON
        data blocks on natural boundaries up to four bytes. The default
        is /align:nocommons (unless /fast is specified), which does not
        align data blocks on natural boundaries.

        In the visual development environment, specify the Common
        Element Alignment as 4 in the Fortran Data Compiler Option
        Category.

    /align:dcommons
        The /align:dcommons option aligns the data items of all COMMON
        data blocks on natural boundaries up to eight bytes. The
        default is /align:nocommons (unless /fast is specified), which
        does not align data blocks on natural boundaries. Specifying
        /fast sets /align:dcommons.

        In the visual development environment, specify the Common
        Element Alignment as 8 in the Fortran Data Compiler Option
        Category.

    /align:[no]records
        The /align:records option (the default) requests that
        components of derived types and fields of records be aligned
        on natural boundaries up to 8 bytes (for derived types with the
        SEQUENCE statement, see /align:[no]sequence below). The
        /align:norecords option requests that components and fields be
        aligned on arbitrary byte boundaries instead of on natural
        boundaries up to 8 bytes.
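C has no /align switches, but the layout difference between natural alignment (/align:records) and arbitrary byte boundaries (/align:norecords or /Zp1) can be sketched with two C structures. This is an analogy, not CVF behavior: the struct and function names are hypothetical, #pragma pack is a compiler extension (accepted by Microsoft C, GCC and Clang), and the natural offset of a double depends on the target ABI.

```c
#include <stddef.h>  /* offsetof, size_t */

/* Natural alignment: the compiler inserts padding after 'tag' so that
   'value' starts on a boundary suited to a double (8 bytes on most
   64-bit targets), comparable to /align:records. */
struct natural {
    char   tag;
    double value;
};

/* Packed: no padding, 'value' starts immediately after 'tag',
   comparable to /align:norecords or /Zp1. */
#pragma pack(push, 1)
struct packed {
    char   tag;
    double value;
};
#pragma pack(pop)

size_t natural_value_offset(void) { return offsetof(struct natural, value); }
size_t packed_value_offset(void)  { return offsetof(struct packed, value); }
```

On a typical 64-bit target natural_value_offset() returns 8 while packed_value_offset() returns 1; the packed layout saves space but makes the double straddle its natural boundary, which is the access cost that natural alignment avoids.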
        In the visual development environment, specify the Structure
        Element Alignment in the Fortran Data Compiler Option Category.

    /align:[no]sequence
        The /align:sequence option requests that components of derived
        types with the SEQUENCE statement obey whatever alignment rules
        are currently in use (default alignment rules align unsequenced
        components on natural boundaries). The default (unless /fast is
        specified) is /align:nosequence, which means that components of
        derived types with the SEQUENCE property are packed, regardless
        of the alignment rules currently in use. Specifying /fast sets
        /align:sequence.

        In the visual development environment, specify Allow SEQUENCE
        Types to be Padded for Alignment in the Fortran Data Compiler
        Option Category.

    /align:recNbyte or /Zpn
        The /align:recNbyte or /Zpn options request that fields of
        records and components of derived types be aligned on the
        smaller of:
          - the byte boundary size (N) specified, or
          - the boundary that will naturally align them.
        Specifying /align:recNbyte, /Zpn, or /align:[no]records does
        not affect whether common block fields are naturally aligned or
        packed.

        In the visual development environment, specify the Structure
        Element Alignment in the Fortran Data Compiler Option Category.

    Specifying          Is the Same as Specifying
    ----------------    ---------------------------------------------
    /Zp                 /alignment:records or /align:rec8byte
    /Zp1                /alignment:norecords or /align:rec1byte
    /Zp2                /align:rec2byte
    /Zp4                /align:rec4byte
    /alignment          /Zp8 with /align:dcommons, /alignment:all, or
                        /alignment:(dcommons, records)
    /noalignment        /Zp1, /alignment:none, or
                        /alignment:(nocommons,nodcommons,norecords)
    /align:rec1byte     /align:norecords
    /align:rec8byte     /align:records

    When you omit the /alignment option, records and components of
    derived types are naturally aligned, but fields in common blocks
    are packed.
    This default is equivalent to:
      /alignment=(nocommons,nodcommons,records,nosequence)

    You can also control the alignment of components in records and
    derived types, and of data items in common blocks, by using the
    cDEC$ OPTIONS directive.

/architecture
    Syntax: /architecture:keyword

    The /architecture (/arch) option controls the types of
    processor-specific instructions generated for this program unit.
    The /arch:keyword option uses the same keywords as the
    /tune:keyword option.

    All processors of a certain architecture type (Alpha or x86)
    implement a core set of instructions. Certain (more recent)
    processor versions include additional instruction extensions.

    Whereas the /tune:keyword option is primarily used by certain
    higher-level optimizations for instruction scheduling purposes,
    the /arch:keyword option determines the type of machine-code
    instructions generated for the program unit being compiled.

    In the visual development environment, specify Generate Code For
    in the Code Generation Compiler Option Category.

    For x86 (Intel and AMD) systems, the supported /arch keywords are:

    /arch:generic
        Generates code (sometimes called blended code) that is
        appropriate for all processor generations of the architecture
        type in use. This is the default. Programs compiled on an x86
        system with the generic keyword will run on all x86 (486 and
        Pentium series) systems.

    /arch:host
        Generates code for the processor generation in use on the
        system being used for compilation. Depending on the host system
        used on x86 systems, the program may or may not run on other
        x86 systems:
          - Programs compiled on a 486 system with the host keyword
            will run on all x86 systems.
          - Programs compiled on a Pentium (586) system with the host
            keyword should not be run on 486 systems.
          - Programs compiled on a Pentium Pro, Pentium II, or AMD K6
            system with the host keyword should not be run on 486 or
            Pentium systems.
          - Programs compiled on a Pentium III system with the host
            keyword should not be run on 486, Pentium, Pentium Pro,
            Pentium II, or AMD K6 systems.
          - Programs compiled on an AMD K6_2 or AMD K6_III system with
            the host keyword should not be run on 486, Pentium,
            Pentium Pro, Pentium II, AMD K6, or Pentium III systems.
          - Programs compiled on an AMD Athlon system with the host
            keyword should not be run on 486, Pentium, Pentium Pro,
            Pentium II, AMD K6, Pentium III, AMD K6_2, or AMD K6_III
            systems.

    /arch:p5
        Generates code for Pentium processor systems. Programs compiled
        with the p5 keyword will run correctly on Pentium, Pentium Pro,
        Pentium II, AMD K6, and higher processors, but should not be
        run on 486 processors.

    /arch:p6
        Generates code for Pentium Pro, Pentium II, and AMD K6
        processor systems only. Programs compiled with the p6 or k6
        keyword will run correctly on Pentium II, AMD K6, Pentium III,
        and higher processors, but should not be run on 486 or Pentium
        processors.

    /arch:k6
        Generates code for AMD K6 (same as Pentium II) processor
        systems only. Programs compiled with the k6 or p6 keyword will
        run correctly on Pentium II, AMD K6, Pentium III, and higher
        processors, but should not be run on 486 or Pentium processors.

    /arch:p6p
        Generates code for Pentium III, AMD K6_2, and AMD K6_III
        processor systems only. Programs compiled with the p6p keyword
        will run correctly on Pentium III, AMD K6_2, AMD K6_III, and
        higher processors, but should not be run on 486, Pentium,
        Pentium Pro, or Pentium II (same as AMD K6) processors.

    /arch:k6_2
        Generates code for AMD K6_2 and AMD K6_III processor systems.
        Programs compiled with the k6_2 keyword will run correctly on
        AMD K6_2, AMD K6_III, and AMD Athlon processors, but should not
        be run on 486, Pentium, Pentium Pro, Pentium II (same as
        AMD K6), or Pentium III processors.

    /arch:k7
        Generates code for AMD Athlon processor systems only.
        Programs compiled with the k7 keyword will run correctly on
        AMD Athlon processors, but should not be run on 486, Pentium,
        Pentium Pro, Pentium II (same as AMD K6), Pentium III,
        AMD K6_2, or AMD K6_III processors.

    Other processors (not listed) that have instruction-level
    compatibility with the processors listed above will have results
    similar to those processors.

    Specifying /fast sets /arch:host.

    For information about timing program execution, see Analyze
    Program Performance.

/assume
    Syntax: /assume:keyword

    The /assume option specifies assumptions made by the Fortran
    syntax analyzer, optimizer, and code generator. These option
    keywords are:
      /assume:[no]accuracy_sensitive
      /assume:[no]buffered_io
      /assume:[no]byterecl
      /assume:[no]dummy_aliases
      /assume:[no]minus0
      /assume:[no]source_include
      /assume:[no]underscore

    The /assume options are:

    /assume:[no]accuracy_sensitive
        Specifying /assume:noaccuracy_sensitive allows the compiler to
        reorder code based on algebraic identities (inverses,
        associativity, and distribution) to improve performance.

        In the visual development environment, specify Allow Reordering
        of Floating-Point Operations in the Optimizations Compiler
        Option Category.

        The numeric results can be slightly different from the default
        (/assume:accuracy_sensitive) because of the way intermediate
        results are rounded. Numeric results with
        /assume:noaccuracy_sensitive are not categorically less
        accurate. They can produce more accurate results for certain
        floating-point calculations, such as dot product summations.
        For example, the following expressions are mathematically
        equivalent but may not compute the same value using finite
        precision arithmetic:

          X = (A + B) - C
          X = A + (B - C)

        If you omit /assume:noaccuracy_sensitive and omit /fast, the
        compiler uses a limited number of rules for calculations, which
        might prevent some optimizations.
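The two Fortran expressions above can be reproduced in C to show how regrouping changes a result. This is a minimal sketch with hypothetical function names; it assumes IEEE double arithmetic and a C compiler that is not itself allowed to reorder floating-point operations (no fast-math style options).

```c
/* Both functions compute A + B - C; only the grouping differs.
   In exact arithmetic they are equal, but in floating point the
   intermediate result is rounded, so they can disagree. */
double sum_then_subtract(double a, double b, double c)
{
    return (a + b) - c;   /* X = (A + B) - C */
}

double subtract_then_sum(double a, double b, double c)
{
    return a + (b - c);   /* X = A + (B - C) */
}
```

With A = 1.0 and B = C = 1e20, (A + B) - C evaluates to 0.0, because 1.0 is far below the spacing between doubles near 1e20 and is lost when added to B, while A + (B - C) evaluates to 1.0.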
        If you specify /assume:noaccuracy_sensitive, or if you specify
        /fast and omit /assume:accuracy_sensitive, the compiler can
        reorder code based on algebraic identities to improve
        performance.

        For more information on /assume:noaccuracy_sensitive, see
        Arithmetic Reordering Optimizations.

    /assume:[no]buffered_io
        The /assume:buffered_io option controls whether records are
        written (flushed) to disk as each record is written (the
        default) or accumulated in the buffer.

        For disk devices, /assume:buffered_io (or the equivalent OPEN
        statement BUFFERED='YES' specifier) requests that the internal
        buffer be filled, possibly by many record output statements
        (WRITE), before it is written to disk by the Fortran run-time
        system. If a file is opened for direct access, I/O buffering is
        ignored.

        Using buffered writes usually makes disk I/O more efficient by
        writing larger blocks of data to the disk less often. However,
        if you specify /assume:buffered_io or BUFFERED='YES', records
        not yet written to disk may be lost in the event of a system
        failure.

        The default is BUFFERED='NO' and /assume:nobuffered_io for all
        I/O, in which case the Fortran run-time system empties its
        internal buffer for each WRITE (or similar record output
        statement).

        The OPEN statement BUFFERED specifier takes precedence over the
        /assume:[no]buffered_io option.

        In the visual development environment, specify Enable I/O
        Buffering in the Optimizations Compiler Option Category.

        For more information on /assume:buffered_io, see Efficient Use
        of Record Buffers and Disk I/O.

    /assume:[no]byterecl
        The /assume:byterecl option applies only to unformatted files.
        In the visual development environment, specify Use Bytes as
        Unit for Unformatted Files in the Fortran Data Compiler Option
        Category.

        Specifying the /assume:byterecl option:
          - indicates that the units for an explicit OPEN statement
            RECL specifier value are in bytes;
          - forces the record length value returned by an INQUIRE by
            output list to be in byte units.
        Specifying /assume:nobyterecl indicates that the units for RECL
        values with unformatted files are four-byte (longword) units.
        This is the default.

    /assume:[no]dummy_aliases
        Specifying the /assume:dummy_aliases option requires that the
        compiler assume that dummy (formal) arguments to procedures
        share memory locations with other dummy arguments or with
        variables shared through use association, host association, or
        common block use. The default is /assume:nodummy_aliases.

        In the visual development environment, specify Enable Dummy
        Argument Aliasing in the Fortran Data (or Optimizations)
        Compiler Option Category.

        These program semantics do not strictly obey the Fortran 90
        Standard, and they slow performance. If you omit
        /assume:dummy_aliases, the compiler does not need to make these
        assumptions, which results in better run-time performance.
        However, omitting /assume:dummy_aliases can cause some programs
        that depend on such aliases to fail or produce wrong answers.

        You only need to compile the called subprogram with
        /assume:dummy_aliases.

        If you compile a program that uses dummy aliasing with
        /assume:nodummy_aliases in effect, the run-time behavior of the
        program will be unpredictable. In such programs, the results
        will depend on the exact optimizations that are performed. In
        some cases, normal results will occur; in other cases, results
        will differ because the values used in computations involving
        the offending aliases will differ.

        For more information, see Dummy Aliasing Assumption.

    /assume:[no]minus0
        This option controls whether the compiler uses Fortran 95
        standard semantics for the IEEE floating-point value -0.0
        (minus zero) in the SIGN intrinsic, if the processor is capable
        of distinguishing between -0.0 and +0.0. The default is
        /assume:nominus0, which uses Fortran 90 and FORTRAN 77
        semantics, where the value -0.0 or +0.0 in the SIGN function is
        treated as 0.0.
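The difference between the two SIGN semantics described for /assume:[no]minus0 can be modelled in C. This is a sketch, not CVF code: fortran_sign is a hypothetical helper, and it assumes Fortran's SIGN(A, B) returns |A| with the sign of B. It uses the C99 signbit macro because an ordinary comparison cannot tell -0.0 from +0.0.

```c
#include <math.h>   /* signbit (C99) */

/* Hypothetical model of Fortran SIGN(a, b).
   minus0 == 0: /assume:nominus0 behavior; a zero b of either sign is
                treated as 0.0, so the result is +|a|.
   minus0 != 0: /assume:minus0 (Fortran 95) behavior; the IEEE sign
                bit of b is honored, so b == -0.0 gives -|a|. */
double fortran_sign(double a, double b, int minus0)
{
    double mag = signbit(a) ? -a : a;   /* |a|; also maps -0.0 to +0.0 */

    if (minus0)
        return signbit(b) ? -mag : mag;
    return (b >= 0.0) ? mag : -mag;     /* -0.0 >= 0.0 is true */
}
```

For example, fortran_sign(2.0, -0.0, 0) returns 2.0, whereas fortran_sign(2.0, -0.0, 1) returns -2.0; for nonzero b both modes agree.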
        To request Fortran 95 semantics, which allow use of the IEEE
        value -0.0 in the SIGN intrinsic, specify /assume:minus0.

        In the visual development environment, specify Enable IEEE
        Minus Zero Support in the Floating Point Compiler Option
        Category.

    /assume:[no]source_include
        This option controls the directory searched for module files
        specified by a USE statement or source files specified by an
        INCLUDE statement:
          - Specifying /assume:source_include requests a search for
            module or include files in the directory where the source
            file being compiled resides. This is the default.
          - Specifying /assume:nosource_include requests a search for
            module or include files in the current (default) directory.

        In the visual development environment, specify the Default
        INCLUDE and USE Paths in the Preprocessor Compiler Option
        Category.

    /assume:[no]underscore
        The /assume:underscore option controls the appending of an
        underscore character to external user-defined names: the main
        program name, named COMMON, BLOCK DATA, and names implicitly or
        explicitly declared EXTERNAL. The name of blank COMMON remains
        _BLNK__, and Fortran intrinsic names are not affected.

        In the visual development environment, specify Append
        Underscore to External Names in the External Procedures (or
        Fortran Data) Compiler Option Category.

        Specifying /assume:nounderscore does not append an underscore
        character to external user-defined names. This is the default.

    For example, the following command requests the noaccuracy_sensitive
    and nosource_include keywords and accepts the defaults for the other
    /assume keywords:

      df /assume:(noaccuracy_sensitive,nosource_include) testfile.f90

/math_library
    Syntax: /math_library:keyword

    The /math_library option specifies whether argument checking of
    math routines is done on x86 systems, and the type of math library
    routines used on Alpha systems.

    In the visual development environment, specify the Math Library in
    the Optimizations (or Code Generation) Compiler Option Category.
    The /math_library options are /math_library:fast and
    /math_library:check:

    /math_library:fast
        On x86 systems, /math_library:fast improves performance by not
        checking the arguments to the math routines. Using
        /math_library:fast makes it difficult to trace the cause of
        unexpected exceptional values. On x86 systems,
        /math_library:fast does not change the accuracy of calculated
        floating-point numbers.

    /math_library:check
        On x86 systems, /math_library:check validates the arguments to
        and results from calls to the Fortran math routines. This
        provides slower run-time performance than /math_library:fast on
        x86 systems, but with earlier detection of exceptional values.
        This is the default on x86 systems.

/tune
    Syntax: /tune:keyword

    The /tune option specifies the type of processor-specific
    machine-code instruction tuning for implementations of the
    processor architecture in use (either x86 or Alpha). Tuning for a
    specific implementation can improve run-time performance; it is
    also possible that code tuned for a specific processor may run
    slower on another processor. Regardless of the /tune:keyword option
    you use, the generated code runs correctly on all implementations
    of the processor architecture. If you omit /tune:keyword,
    /tune:generic is used.

    In the visual development environment, specify Optimize For in the
    Optimizations Compiler Option Category.

    The /tune keywords have meanings specific to x86 systems or Alpha
    systems. For x86 (Intel and AMD) systems, the /tune keywords are:

    /tune:generic
        Generates and schedules code (sometimes called blended code)
        that will execute well on all x86 systems. This provides
        generally efficient code for applications where all x86
        processor generations are likely to be used. This is the
        default.

    /tune:host
        Generates and schedules code optimized for the processor type
        in use on the x86 system being used for compilation.

    /tune:p5 (x86 only)
        Generates and schedules code optimized for Pentium (586)
        processor systems.
    /tune:p6 (x86 only)
        Generates and schedules code optimized for Pentium Pro,
        Pentium II, and AMD K6 processor systems.

    /tune:k6 (x86 only)
        Generates and schedules code optimized for AMD K6, Pentium Pro,
        and Pentium II processor systems (/tune:p6 and /tune:k6 are the
        same).

    /tune:p6p (x86 only)
        Generates and schedules code optimized for Pentium III,
        AMD K6_2, and AMD K6_III processor systems.

    /tune:k6_2 (x86 only)
        Generates and schedules code optimized for AMD K6_2 and
        AMD K6_III processor systems.

    /tune:k7 (x86 only)
        Generates and schedules code optimized for AMD Athlon processor
        systems.

    Specifying /fast sets /tune:host.

    For more information about this option, see Requesting Optimized
    Code for a Specific Processor Generation. For information about
    timing program execution, see Analyze Program Performance. To
    control the processor-specific type of machine-code instructions
    being generated, see the /architecture:keyword option.