Last updated: 8-Apr-2008
This flags disclosure file describes the compiler flags associated with the following Intel compilers:
Enables optimizations for speed and disables some optimizations that increase code size and affect speed. To limit code size, this option:
The O1 option may improve performance for applications with very large code size, many branches, and execution time not dominated by code within loops.
On IA-32 Mac OSX platforms, -O1 sets the following:
Enables optimizations for speed. This is the generally recommended optimization level. This option also enables:
Enables O2 optimizations plus more aggressive optimizations, such as prefetching, scalar replacement, and loop and memory access transformations.
Enables optimizations for maximum speed, such as:
On IA-32 and Intel EM64T processors, when O3 is used with options -ax or -x (Linux/Mac OSX), the compiler performs more aggressive data dependency analysis than for O2, which may result in longer compilation times.
The O3 optimizations may not improve performance unless loop and memory access transformations take place. In some cases they may slow code down compared to O2 optimizations.
The O3 option is recommended for applications that have loops that heavily use floating-point calculations and process large data sets.
This option enables additional interprocedural optimizations for single file compilation. These optimizations are a subset of full intra-file interprocedural optimizations. One of these optimizations enables the compiler to perform inline function expansion for calls to functions defined within the current source file.
This option enables multi-file interprocedural optimizations that include:
When you specify this option, the compiler performs inline function expansion for calls to functions defined in separate files.
The -fast option enhances execution speed across the entire program by including the following options that can improve run-time performance:
Options set by -fast cannot be overridden; to change the behavior, list the options separately. The options set by -fast may change from release to release.
The -xT option tells the compiler to generate optimized code for the Intel Core 2 Duo processor family. It can generate SSSE3, SSE3, SSE2, and SSE instructions for the Intel processors.
Tells the compiler to generate code for IA-32 architecture. If this flag is not specified, the compiler generates code based on whether 32-bit or the 64-bit compiler is in the search path.
Tells the compiler to generate code for EM64T architecture. If this flag is not specified, the compiler generates code based on whether 32-bit or the 64-bit compiler is in the search path.
Enables the compiler to generate runtime control code for effective automatic parallelization.
Tells the auto-parallelizer to generate multithreaded code for loops that can be safely executed in parallel. To use this option, you must also use option O2 or O3.
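As a rough illustration of what "loops that can be safely executed in parallel" means, the Python sketch below contrasts a loop whose iterations are independent with one that carries a dependency from one iteration to the next. The function names are hypothetical, and the thread pool only stands in for compiler-generated threads; it does not reproduce the compiler's actual analysis or code generation.

```python
from concurrent.futures import ThreadPoolExecutor


def scale_serial(values, factor):
    """Each iteration touches only its own element: safe to parallelize."""
    return [v * factor for v in values]


def scale_parallel(values, factor, workers=4):
    """Split the independent iterations into chunks and run them on threads."""
    chunk = max(1, (len(values) + workers - 1) // workers)
    chunks = [values[i:i + chunk] for i in range(0, len(values), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda c: [v * factor for v in c], chunks))
    # Chunks come back in submission order, so the result matches the serial loop.
    return [v for part in parts for v in part]


def running_total(values):
    """Iteration i needs the result of iteration i - 1 (a loop-carried
    dependency), so the iterations cannot simply be split across threads."""
    totals, acc = [], 0
    for v in values:
        acc += v
        totals.append(acc)
    return totals
```

The first loop is the kind of candidate an auto-parallelizer looks for; the second would need to be restructured before it could be split across threads.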
Tells the compiler to link in the optimized malloc implementation that resides under /usr/lib.
Links the 32-bit Intel C++ compiler libraries.
Links the 64-bit Intel C++ compiler libraries.
Links the 32-bit Intel Fortran compiler libraries.
Links the 64-bit Intel Fortran compiler libraries.
Code is not relocatable, but external references are relocatable.
This option improves precision of floating-point divides. It has a slight impact on speed.
With some optimizations, such as -xN and -xB (Linux) or /QxN and /QxB (Windows), the compiler may change floating-point division computations into multiplication by the reciprocal of the denominator. For example, A/B is computed as A * (1/B) to improve the speed of the computation.
However, sometimes the value produced by this transformation is not as accurate as full IEEE division. When it is important to have fully precise IEEE division, use this option to disable the floating-point division-to-multiplication optimization. The result is more accurate, with some loss of performance.
Specifying -no-prec-div (Linux and Mac OSX) enables optimizations that give slightly less precise results than full IEEE division. The default is -prec-div.
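The accuracy difference between A/B and A * (1/B) can be seen with a small numeric experiment. The sketch below uses Python floats, which are IEEE 754 double-precision values; the function names are hypothetical, and the code only mimics the arithmetic of the transformation, not the compiler's code generation.

```python
def divide_ieee(a, b):
    """A single, correctly rounded IEEE division."""
    return a / b


def divide_via_reciprocal(a, b):
    """Division rewritten as multiplication by the reciprocal.

    Two roundings occur (one for 1/b, one for the product), so the result
    can differ from the single-rounding division in the last bit.
    """
    return a * (1.0 / b)


if __name__ == "__main__":
    for a, b in [(3.0, 10.0), (1.0, 4.0)]:
        print(a, b, divide_ieee(a, b), divide_via_reciprocal(a, b))
```

For 3.0 and 10.0 the two results differ in the last bit, because 1/10 is not exactly representable in binary. For 1.0 and 4.0 they agree, because 1/4 is exact.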
Instruments the program for the first phase of two-phase profile-guided optimization. This instrumentation gathers information about a program's execution paths and data values but does not gather information from hardware performance counters. The profile instrumentation also gathers data for optimizations which are unique to profile-feedback optimization.
Instructs the compiler to produce a profile-optimized executable and merges available dynamic information (.dyn) files into a pgopti.dpi file. If you perform multiple executions of the instrumented program, -Qprof_use merges the dynamic information files again and overwrites the previous pgopti.dpi file.
Without any other options, the current directory is searched for .dyn files.
This option causes the Intel-provided libraries to be linked in statically. It is the opposite of -shared-intel. Note that when this option is provided, libguide is also linked in statically.
Tells the compiler the maximum number of times (n) to unroll loops.
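To make the idea of unrolling concrete, the sketch below shows a summation loop and a hand-unrolled version with an unroll factor of 4. This is a source-level illustration only, with hypothetical function names; the compiler applies the transformation to generated code and chooses the actual factor itself, up to the limit n.

```python
def sum_rolled(values):
    """Original loop: one element per iteration."""
    total = 0
    for v in values:
        total += v
    return total


def sum_unrolled_by_4(values):
    """Same loop unrolled 4 times: fewer loop-control steps per element."""
    total = 0
    n = len(values)
    limit = n - n % 4  # largest multiple of 4 that fits
    i = 0
    while i < limit:
        total += values[i]
        total += values[i + 1]
        total += values[i + 2]
        total += values[i + 3]
        i += 4
    # Remainder loop handles the last n % 4 elements.
    while i < n:
        total += values[i]
        i += 1
    return total
```

The additions are performed in the same order in both versions, so the results are identical; only the amount of loop overhead changes.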
Disables inline expansion of all intrinsic functions.
Disables conformance to the ANSI C and IEEE 754 standards for floating-point arithmetic.
Allows use of EBP as a general-purpose register in optimizations.
Places each function in its own COMDAT section.
For mixed-language benchmarks, tells the compiler that the main program is not written in Fortran.
icc invokes the Intel C++ compiler. It is invoked as:
icc [ options ] file1 [ file2 ... ]
where,
Invoking the compiler using icc compiles .c and .i files as C. Using icc only links in C++ libraries if C++ source is provided on the command line.
Invokes the Intel C compiler for Intel 64 applications.
Invokes the Intel C++ compiler for Intel 64 applications.
Invokes the Intel C compiler for Intel 32 applications.
Invokes the Intel C++ compiler for Intel 32 applications.
Invokes the Intel Fortran compiler for Intel 32 applications.
Invokes the Intel Fortran compiler for Intel 64 applications.
The icpc command uses the same compiler options as the icc command. Invoking the compiler using icpc compiles .c and .i files as C++. Using icpc always links in C++ libraries.
ifort invokes the Intel Fortran compiler. It is invoked as:
ifort [ options ] file1 [ file2 ... ]
where,
Invokes the Intel C++ compiler in C99 mode for Mac OSX.
Compiler option to set the path for include files. Used in some integer peak benchmarks which were built using the Intel 64-bit C++ compiler.
Compiler option to set the path for library files. Used in some integer peak benchmarks which were built using the Intel 64-bit C++ compiler.
Compiler option to set the path for include files. Used in some peak benchmarks which were built using the Intel 32-bit C++ compiler.
Compiler option to set the path for library files. Used in some integer peak benchmarks which were built using the Intel 32-bit C++ compiler.
Compiler option to set the path for include files. Used in some peak benchmarks which were built using the Intel 32-bit Fortran compiler.
Compiler option to set the path for library files. Used in some integer peak benchmarks which were built using the Intel 32-bit Fortran compiler.
Compiler option to set the path for include files. Used in some peak benchmarks which were built using the Intel 64-bit Fortran compiler.
Compiler option to set the path for library files. Used in some integer peak benchmarks which were built using the Intel 64-bit Fortran compiler.
Generates static binaries. Libraries are statically linked into the executable. The default behavior on Mac OS X is to produce dynamically linked binaries. This flag has been deprecated in the 10.x compiler; use -static-intel instead.
Pass options o1, o2, etc. to the linker for processing.
Specifies the initial address of the stack pointer value, where value is a hexadecimal number rounded to the segment alignment. The default segment alignment is the target pagesize (currently, 1000 hexadecimal for the PowerPC and for i386). If -stack_size is specified and -stack_addr is not, a default stack address specific for the architecture being linked will be used and its value printed as a warning message. This creates a segment named __UNIXSTACK. Note that the initial stack address will be either at the high address of the segment or the low address of the segment depending on which direction the stack grows for the architecture being linked.
Specifies the size of the stack segment value, where value is a hexadecimal number rounded to the segment alignment. The default segment alignment is the target pagesize (currently, 1000 hexadecimal for the PowerPC and for i386). If -stack_addr is specified and -stack_size is not, a default stack size specific for the architecture being linked will be used and its value printed as a warning message. This creates a segment named __UNIXSTACK.
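The arithmetic behind "rounded to the segment alignment" can be sketched as follows, using the default alignment of 1000 hexadecimal (4096 bytes) quoted above. The helper names are hypothetical, and the sketch assumes the value is rounded up to the next multiple of the alignment; the text above does not state the rounding direction, so treat that as an assumption rather than documented linker behavior.

```python
PAGE_SIZE = 0x1000  # default segment alignment quoted above (4096 bytes)


def parse_hex(text):
    """Interpret a linker-style hexadecimal argument such as '10000000'."""
    return int(text, 16)


def is_aligned(value, alignment=PAGE_SIZE):
    """True if value is already a multiple of the segment alignment."""
    return value % alignment == 0


def round_up_to_alignment(value, alignment=PAGE_SIZE):
    """Assumption: rounding goes up to the next multiple of the alignment."""
    return (value + alignment - 1) // alignment * alignment


if __name__ == "__main__":
    requested = parse_hex("10000001")
    print(hex(requested), is_aligned(requested),
          hex(round_up_to_alignment(requested)))
```

A stack size that is already a multiple of 1000 hexadecimal is left unchanged by this rounding.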
Platform settings
The system under test is made reasonably quiet by turning off the following in the System Preferences panel:
OMP_NUM_THREADS
Sets the maximum number of threads to use for OpenMP* parallel regions if no other value is specified in the application. This environment variable applies to both -openmp and -parallel (Linux and Mac OS X).
Example syntax on a Mac OS X system with 8 cores:
export OMP_NUM_THREADS=8
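The precedence described above (a value set in the application wins, otherwise OMP_NUM_THREADS applies) can be sketched in Python as follows. The function name and parameters are hypothetical, and the fallback to the processor count when the variable is unset is an assumption, since the default is left to the OpenMP runtime. The sketch also handles only a single positive integer value.

```python
import os


def resolve_max_threads(app_value=None, environ=None, fallback=None):
    """Pick a thread limit following the precedence described above."""
    env = os.environ if environ is None else environ
    # 1. A value specified by the application takes priority.
    if app_value is not None:
        return app_value
    # 2. Otherwise use OMP_NUM_THREADS if it holds a positive integer.
    raw = env.get("OMP_NUM_THREADS", "").strip()
    if raw.isdigit() and int(raw) > 0:
        return int(raw)
    # 3. Assumed default: an explicit fallback, else the processor count.
    if fallback is not None:
        return fallback
    return os.cpu_count() or 1


if __name__ == "__main__":
    print(resolve_max_threads(environ={"OMP_NUM_THREADS": "8"}))
```

With the example setting above, the environment mapping contains OMP_NUM_THREADS=8 and the function returns 8 unless the application supplies its own value.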