Copyright © 2016 Intel Corporation. All Rights Reserved.
Invoke the Intel C compiler driver
Invoke the Intel C++ compiler driver
Invoke the Intel Fortran compiler driver
Invoke the Intel C compiler driver
Invoke the Intel C++ compiler driver
Invoke the Intel Fortran compiler driver
This macro specifies that the target system uses the LP64 data model; specifically, that integers are 32 bits, while longs and pointers are 64 bits.
This macro indicates that the benchmark is being compiled on an AMD64-compatible system running the Linux operating system.
This option is used to indicate that the host system's integers are 32-bits wide, and longs and pointers are 64-bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.
This option is used to indicate that the host system's integers are 32-bits wide, and longs and pointers are 64-bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.
This option is used to indicate that the host system's integers are 32-bits wide, and longs and pointers are 64-bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.
This option is used to indicate that the host system's integers are 32-bits wide, and longs and pointers are 64-bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.
This flag can be set for SPEC compilation for LINUX using default compiler.
This option is used to indicate that the host system's integers are 32-bits wide, and longs and pointers are 64-bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.
This option is used to indicate that the host system's integers are 32-bits wide, and longs and pointers are 64-bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.
This option is used to indicate that the host system's integers are 32-bits wide, and longs and pointers are 64-bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.
This option is used to indicate that the host system's integers are 32-bits wide, and longs and pointers are 64-bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.
This option is used to indicate that the host system's integers are 32-bits wide, and longs and pointers are 64-bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.
This macro specifies that the target system uses the LP64 data model; specifically, that integers are 32 bits, while longs and pointers are 64 bits.
This macro indicates that the benchmark is being compiled on an AMD64-compatible system running the Linux operating system.
This macro determines which file system interface will be used. Common file i/o calls like stat() and readdir() return off_t data that may or may not fit within a 32bit data structure if this flag is not used. With _FILE_OFFSET_BITS=64, types like off_t have a size of 64 bits. The truncation that happens without _FILE_OFFSET_BITS=64 has been observed to yield intermittent failures.
Ex: RHEL7 distributions format partitions using xfs. Runtime errors are observed on such systems because sometimes returned values will not fit into 32bit data types that are a mismatch for xfs.
See the gnuc feature test macros article for more information.
This option is used to indicate that the host system's integers are 32-bits wide, and longs and pointers are 64-bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.
This option is used to indicate that the host system's integers are 32-bits wide, and longs and pointers are 64-bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.
This macro determines which file system interface will be used. Common file i/o calls like stat() and readdir() return off_t data that may or may not fit within a 32bit data structure if this flag is not used. With _FILE_OFFSET_BITS=64, types like off_t have a size of 64 bits. The truncation that happens without _FILE_OFFSET_BITS=64 has been observed to yield intermittent failures.
Ex: RHEL7 distributions format partitions using xfs. Runtime errors are observed on such systems because sometimes returned values will not fit into 32bit data types that are a mismatch for xfs.
See the gnuc feature test macros article for more information.
This flag can be set for SPEC compilation for LINUX using default compiler.
This option is used to indicate that the host system's integers are 32-bits wide, and longs and pointers are 64-bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.
This option is used to indicate that the host system's integers are 32-bits wide, and longs and pointers are 64-bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.
This option is used to indicate that the host system's integers are 32-bits wide, and longs and pointers are 64-bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.
This option is used to indicate that the host system's integers are 32-bits wide, and longs and pointers are 64-bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.
This option is used to indicate that the host system's integers are 32-bits wide, and longs and pointers are 64-bits wide. Not all benchmarks recognize this macro, but the preferred practice for data model selection applies the flags to all benchmarks; this flag description is a placeholder for those benchmarks that do not recognize this macro.
Enable SmartHeap and/or other library usage by forcing the linker to ignore multiple definitions if present
Code is optimized for Intel(R) processors with support for CORE-AVX512 instructions. The resulting code may contain unconditional use of features that are not supported on other processors. This option also enables new optimizations in addition to Intel processor-specific optimizations including advanced data layout and code restructuring optimizations to improve memory accesses for Intel processors.
Do not use this option if you are executing a program on a processor that is not an Intel processor. If you use this option on a non-compatible processor to compile the main program (in Fortran) or the function main() in C/C++, the program will display a fatal run-time error if they are executed on unsupported processors.
Multi-file ip optimizations that includes:
- inline function expansion
- interprocedural constant propogation
- dead code elimination
- propagation of function characteristics
- passing arguments in registers
- loop-invariant code motion
Enables O2 optimizations plus more aggressive optimizations, such as prefetching, scalar replacement, and loop and memory access transformations. Enables optimizations for maximum speed, such as:
On IA-32 and Intel EM64T processors, when O3 is used with options -ax or -x (Linux) or with options /Qax or /Qx (Windows), the compiler performs more aggressive data dependency analysis than for O2, which may result in longer compilation times. The O3 optimizations may not cause higher performance unless loop and memory access transformations take place. The optimizations may slow down code in some cases compared to O2 optimizations. The O3 option is recommended for applications that have loops that heavily use floating-point calculations and process large data sets.
-no-prec-div enables optimizations that give slightly less precise results than full IEEE division.
When you specify -no-prec-div along with some optimizations, such as -xN and -xB (Linux) or /QxN and /QxB (Windows), the compiler may change floating-point division computations into multiplication by the reciprocal of the denominator. For example, A/B is computed as A * (1/B) to improve the speed of the computation.
However, sometimes the value produced by this transformation is not as accurate as full IEEE division. When it is important to have fully precise IEEE division, do not use -no-prec-div. This will enable the default -prec-div and the result will be more accurate, with some loss of performance.
Controls the level of memory layout transformations performed by the compiler. This option can improve cache reuse and cache locality.
Specify build time link path for jemalloc 64bit built to support the CPU2017/2006 build. See jemalloc.net for more information.
Linker toggle to specify jemalloc linker library. See jemalloc.net for more information.
Enable SmartHeap and/or other library usage by forcing the linker to ignore multiple definitions if present
Code is optimized for Intel(R) processors with support for CORE-AVX512 instructions. The resulting code may contain unconditional use of features that are not supported on other processors. This option also enables new optimizations in addition to Intel processor-specific optimizations including advanced data layout and code restructuring optimizations to improve memory accesses for Intel processors.
Do not use this option if you are executing a program on a processor that is not an Intel processor. If you use this option on a non-compatible processor to compile the main program (in Fortran) or the function main() in C/C++, the program will display a fatal run-time error if they are executed on unsupported processors.
Multi-file ip optimizations that includes:
- inline function expansion
- interprocedural constant propogation
- dead code elimination
- propagation of function characteristics
- passing arguments in registers
- loop-invariant code motion
Enables O2 optimizations plus more aggressive optimizations, such as prefetching, scalar replacement, and loop and memory access transformations. Enables optimizations for maximum speed, such as:
On IA-32 and Intel EM64T processors, when O3 is used with options -ax or -x (Linux) or with options /Qax or /Qx (Windows), the compiler performs more aggressive data dependency analysis than for O2, which may result in longer compilation times. The O3 optimizations may not cause higher performance unless loop and memory access transformations take place. The optimizations may slow down code in some cases compared to O2 optimizations. The O3 option is recommended for applications that have loops that heavily use floating-point calculations and process large data sets.
-no-prec-div enables optimizations that give slightly less precise results than full IEEE division.
When you specify -no-prec-div along with some optimizations, such as -xN and -xB (Linux) or /QxN and /QxB (Windows), the compiler may change floating-point division computations into multiplication by the reciprocal of the denominator. For example, A/B is computed as A * (1/B) to improve the speed of the computation.
However, sometimes the value produced by this transformation is not as accurate as full IEEE division. When it is important to have fully precise IEEE division, do not use -no-prec-div. This will enable the default -prec-div and the result will be more accurate, with some loss of performance.
Controls the level of memory layout transformations performed by the compiler. This option can improve cache reuse and cache locality.
Specify build time link path for jemalloc 64bit built to support the CPU2017/2006 build. See jemalloc.net for more information.
Linker toggle to specify jemalloc linker library. See jemalloc.net for more information.
Enable SmartHeap and/or other library usage by forcing the linker to ignore multiple definitions if present
Code is optimized for Intel(R) processors with support for CORE-AVX512 instructions. The resulting code may contain unconditional use of features that are not supported on other processors. This option also enables new optimizations in addition to Intel processor-specific optimizations including advanced data layout and code restructuring optimizations to improve memory accesses for Intel processors.
Do not use this option if you are executing a program on a processor that is not an Intel processor. If you use this option on a non-compatible processor to compile the main program (in Fortran) or the function main() in C/C++, the program will display a fatal run-time error if they are executed on unsupported processors.
Multi-file ip optimizations that includes:
- inline function expansion
- interprocedural constant propogation
- dead code elimination
- propagation of function characteristics
- passing arguments in registers
- loop-invariant code motion
Enables O2 optimizations plus more aggressive optimizations, such as prefetching, scalar replacement, and loop and memory access transformations. Enables optimizations for maximum speed, such as:
On IA-32 and Intel EM64T processors, when O3 is used with options -ax or -x (Linux) or with options /Qax or /Qx (Windows), the compiler performs more aggressive data dependency analysis than for O2, which may result in longer compilation times. The O3 optimizations may not cause higher performance unless loop and memory access transformations take place. The optimizations may slow down code in some cases compared to O2 optimizations. The O3 option is recommended for applications that have loops that heavily use floating-point calculations and process large data sets.
-no-prec-div enables optimizations that give slightly less precise results than full IEEE division.
When you specify -no-prec-div along with some optimizations, such as -xN and -xB (Linux) or /QxN and /QxB (Windows), the compiler may change floating-point division computations into multiplication by the reciprocal of the denominator. For example, A/B is computed as A * (1/B) to improve the speed of the computation.
However, sometimes the value produced by this transformation is not as accurate as full IEEE division. When it is important to have fully precise IEEE division, do not use -no-prec-div. This will enable the default -prec-div and the result will be more accurate, with some loss of performance.
Controls the level of memory layout transformations performed by the compiler. This option can improve cache reuse and cache locality.
Option standard-realloc-lhs (the default), tells the compiler that when the left-hand side of an assignment is an allocatable object, it should be reallocated to the shape of the right-hand side of the assignment before the assignment occurs. This is the current Fortran Standard definition. This feature may cause extra overhead at run time. This option has the same effect as option assume realloc_lhs.
If you specify nostandard-realloc-lhs, the compiler uses the old Fortran 2003 rules when interpreting assignment statements. The left-hand side is assumed to be allocated with the correct shape to hold the right-hand side. If it is not, incorrect behavior will occur. This option has the same effect as option assume norealloc_lhs.
The align toggle changes how data elements are aligned. Variables and arrays are analyzed and memory layout can be altered. Specifying array32byte will look for opportunities to transform and reailgn arrays to 32byte boundaries.
Specify build time link path for jemalloc 64bit built to support the CPU2017/2006 build. See jemalloc.net for more information.
Linker toggle to specify jemalloc linker library. See jemalloc.net for more information.
Enable SmartHeap and/or other library usage by forcing the linker to ignore multiple definitions if present
Instrument program for profiling for the first phase of two-phase profile guided otimization. This instrumentation gathers information about a program's execution paths and data values but does not gather information from hardware performance counters. The profile instrumentation also gathers data for optimizations which are unique to profile-feedback optimization.
-profgen:threadsafe option collects profile guided optimization data with guards for threaded applications.
Instructs the compiler to produce a profile-optimized
executable and merges available dynamic information (.dyn)
files into a pgopti.dpi file. If you perform multiple
executions of the instrumented program, -prof-use merges
the dynamic information files again and overwrites the
previous pgopti.dpi file.
Without any other options, the current directory is
searched for .dyn files
Multi-file ip optimizations that includes:
- inline function expansion
- interprocedural constant propogation
- dead code elimination
- propagation of function characteristics
- passing arguments in registers
- loop-invariant code motion
Code is optimized for Intel(R) processors with support for CORE-AVX512 instructions. The resulting code may contain unconditional use of features that are not supported on other processors. This option also enables new optimizations in addition to Intel processor-specific optimizations including advanced data layout and code restructuring optimizations to improve memory accesses for Intel processors.
Do not use this option if you are executing a program on a processor that is not an Intel processor. If you use this option on a non-compatible processor to compile the main program (in Fortran) or the function main() in C/C++, the program will display a fatal run-time error if they are executed on unsupported processors.
Enables O2 optimizations plus more aggressive optimizations, such as prefetching, scalar replacement, and loop and memory access transformations. Enables optimizations for maximum speed, such as:
On IA-32 and Intel EM64T processors, when O3 is used with options -ax or -x (Linux) or with options /Qax or /Qx (Windows), the compiler performs more aggressive data dependency analysis than for O2, which may result in longer compilation times. The O3 optimizations may not cause higher performance unless loop and memory access transformations take place. The optimizations may slow down code in some cases compared to O2 optimizations. The O3 option is recommended for applications that have loops that heavily use floating-point calculations and process large data sets.
-no-prec-div enables optimizations that give slightly less precise results than full IEEE division.
When you specify -no-prec-div along with some optimizations, such as -xN and -xB (Linux) or /QxN and /QxB (Windows), the compiler may change floating-point division computations into multiplication by the reciprocal of the denominator. For example, A/B is computed as A * (1/B) to improve the speed of the computation.
However, sometimes the value produced by this transformation is not as accurate as full IEEE division. When it is important to have fully precise IEEE division, do not use -no-prec-div. This will enable the default -prec-div and the result will be more accurate, with some loss of performance.
Controls the level of memory layout transformations performed by the compiler. This option can improve cache reuse and cache locality.
Tells the compiler to remove the assumption that source code follows c99 signed overflow rules.
Specify build time link path for jemalloc 64bit built to support the CPU2017/2006 build. See jemalloc.net for more information.
Linker toggle to specify jemalloc linker library. See jemalloc.net for more information.
Enable the use of the 32-bit compiler by passing the directory name for the compiler link libraries.
Enable SmartHeap and/or other library usage by forcing the linker to ignore multiple definitions if present
Instrument program for profiling for the first phase of two-phase profile guided otimization. This instrumentation gathers information about a program's execution paths and data values but does not gather information from hardware performance counters. The profile instrumentation also gathers data for optimizations which are unique to profile-feedback optimization.
-profgen:threadsafe option collects profile guided optimization data with guards for threaded applications.
Instructs the compiler to produce a profile-optimized
executable and merges available dynamic information (.dyn)
files into a pgopti.dpi file. If you perform multiple
executions of the instrumented program, -prof-use merges
the dynamic information files again and overwrites the
previous pgopti.dpi file.
Without any other options, the current directory is
searched for .dyn files
Multi-file ip optimizations that includes:
- inline function expansion
- interprocedural constant propogation
- dead code elimination
- propagation of function characteristics
- passing arguments in registers
- loop-invariant code motion
Code is optimized for Intel(R) processors with support for CORE-AVX512 instructions. The resulting code may contain unconditional use of features that are not supported on other processors. This option also enables new optimizations in addition to Intel processor-specific optimizations including advanced data layout and code restructuring optimizations to improve memory accesses for Intel processors.
Do not use this option if you are executing a program on a processor that is not an Intel processor. If you use this option on a non-compatible processor to compile the main program (in Fortran) or the function main() in C/C++, the program will display a fatal run-time error if they are executed on unsupported processors.
Enables O2 optimizations plus more aggressive optimizations, such as prefetching, scalar replacement, and loop and memory access transformations. Enables optimizations for maximum speed, such as:
On IA-32 and Intel EM64T processors, when O3 is used with options -ax or -x (Linux) or with options /Qax or /Qx (Windows), the compiler performs more aggressive data dependency analysis than for O2, which may result in longer compilation times. The O3 optimizations may not cause higher performance unless loop and memory access transformations take place. The optimizations may slow down code in some cases compared to O2 optimizations. The O3 option is recommended for applications that have loops that heavily use floating-point calculations and process large data sets.
-no-prec-div enables optimizations that give slightly less precise results than full IEEE division.
When you specify -no-prec-div along with some optimizations, such as -xN and -xB (Linux) or /QxN and /QxB (Windows), the compiler may change floating-point division computations into multiplication by the reciprocal of the denominator. For example, A/B is computed as A * (1/B) to improve the speed of the computation.
However, sometimes the value produced by this transformation is not as accurate as full IEEE division. When it is important to have fully precise IEEE division, do not use -no-prec-div. This will enable the default -prec-div and the result will be more accurate, with some loss of performance.
Controls the level of memory layout transformations performed by the compiler. This option can improve cache reuse and cache locality.
Specify build time link path for jemalloc 32bit built to support the CPU2017/CPU2006 build. See jemalloc.net for more information.
Linker toggle to specify jemalloc linker library. See jemalloc.net for more information.
Enable SmartHeap and/or other library usage by forcing the linker to ignore multiple definitions if present
Code is optimized for Intel(R) processors with support for CORE-AVX512 instructions. The resulting code may contain unconditional use of features that are not supported on other processors. This option also enables new optimizations in addition to Intel processor-specific optimizations including advanced data layout and code restructuring optimizations to improve memory accesses for Intel processors.
Do not use this option if you are executing a program on a processor that is not an Intel processor. If you use this option on a non-compatible processor to compile the main program (in Fortran) or the function main() in C/C++, the program will display a fatal run-time error if they are executed on unsupported processors.
Multi-file ip optimizations that includes:
- inline function expansion
- interprocedural constant propogation
- dead code elimination
- propagation of function characteristics
- passing arguments in registers
- loop-invariant code motion
Enables O2 optimizations plus more aggressive optimizations, such as prefetching, scalar replacement, and loop and memory access transformations. Enables optimizations for maximum speed, such as:
On IA-32 and Intel EM64T processors, when O3 is used with options -ax or -x (Linux) or with options /Qax or /Qx (Windows), the compiler performs more aggressive data dependency analysis than for O2, which may result in longer compilation times. The O3 optimizations may not cause higher performance unless loop and memory access transformations take place. The optimizations may slow down code in some cases compared to O2 optimizations. The O3 option is recommended for applications that have loops that heavily use floating-point calculations and process large data sets.
-no-prec-div enables optimizations that give slightly less precise results than full IEEE division.
When you specify -no-prec-div along with some optimizations, such as -xN and -xB (Linux) or /QxN and /QxB (Windows), the compiler may change floating-point division computations into multiplication by the reciprocal of the denominator. For example, A/B is computed as A * (1/B) to improve the speed of the computation.
However, sometimes the value produced by this transformation is not as accurate as full IEEE division. When it is important to have fully precise IEEE division, do not use -no-prec-div. This will enable the default -prec-div and the result will be more accurate, with some loss of performance.
Controls the level of memory layout transformations performed by the compiler. This option can improve cache reuse and cache locality.
This options tells the compiler to assume no aliasing in the program.
Specify build time link path for jemalloc 64bit built to support the CPU2017/2006 build. See jemalloc.net for more information.
Linker toggle to specify jemalloc linker library. See jemalloc.net for more information.
Enable the use of the 32-bit compiler by passing the directory name for the compiler link libraries.
Enable SmartHeap and/or other library usage by forcing the linker to ignore multiple definitions if present
Instrument program for profiling for the first phase of two-phase profile guided otimization. This instrumentation gathers information about a program's execution paths and data values but does not gather information from hardware performance counters. The profile instrumentation also gathers data for optimizations which are unique to profile-feedback optimization.
-profgen:threadsafe option collects profile guided optimization data with guards for threaded applications.
Instructs the compiler to produce a profile-optimized
executable and merges available dynamic information (.dyn)
files into a pgopti.dpi file. If you perform multiple
executions of the instrumented program, -prof-use merges
the dynamic information files again and overwrites the
previous pgopti.dpi file.
Without any other options, the current directory is
searched for .dyn files
Multi-file ip optimizations that includes:
- inline function expansion
- interprocedural constant propogation
- dead code elimination
- propagation of function characteristics
- passing arguments in registers
- loop-invariant code motion
Code is optimized for Intel(R) processors with support for CORE-AVX512 instructions. The resulting code may contain unconditional use of features that are not supported on other processors. This option also enables new optimizations in addition to Intel processor-specific optimizations including advanced data layout and code restructuring optimizations to improve memory accesses for Intel processors.
Do not use this option if you are executing a program on a processor that is not an Intel processor. If you use this option on a non-compatible processor to compile the main program (in Fortran) or the function main() in C/C++, the program will display a fatal run-time error if they are executed on unsupported processors.
Enables O2 optimizations plus more aggressive optimizations, such as prefetching, scalar replacement, and loop and memory access transformations. Enables optimizations for maximum speed, such as:
On IA-32 and Intel EM64T processors, when O3 is used with options -ax or -x (Linux) or with options /Qax or /Qx (Windows), the compiler performs more aggressive data dependency analysis than for O2, which may result in longer compilation times. The O3 optimizations may not cause higher performance unless loop and memory access transformations take place. The optimizations may slow down code in some cases compared to O2 optimizations. The O3 option is recommended for applications that have loops that heavily use floating-point calculations and process large data sets.
-no-prec-div enables optimizations that give slightly less precise results than full IEEE division.
When you specify -no-prec-div along with some optimizations, such as -xN and -xB (Linux) or /QxN and /QxB (Windows), the compiler may change floating-point division computations into multiplication by the reciprocal of the denominator. For example, A/B is computed as A * (1/B) to improve the speed of the computation.
However, sometimes the value produced by this transformation is not as accurate as full IEEE division. When it is important to have fully precise IEEE division, do not use -no-prec-div. This will enable the default -prec-div and the result will be more accurate, with some loss of performance.
Controls the level of memory layout transformations performed by the compiler. This option can improve cache reuse and cache locality.
Specify build time link path for jemalloc 32bit built to support the CPU2017/CPU2006 build. See jemalloc.net for more information.
Linker toggle to specify jemalloc linker library. See jemalloc.net for more information.
Enable SmartHeap and/or other library usage by forcing the linker to ignore multiple definitions if present
Instrument program for profiling for the first phase of two-phase profile guided otimization. This instrumentation gathers information about a program's execution paths and data values but does not gather information from hardware performance counters. The profile instrumentation also gathers data for optimizations which are unique to profile-feedback optimization.
-profgen:threadsafe option collects profile guided optimization data with guards for threaded applications.
Instructs the compiler to produce a profile-optimized
executable and merges available dynamic information (.dyn)
files into a pgopti.dpi file. If you perform multiple
executions of the instrumented program, -prof-use merges
the dynamic information files again and overwrites the
previous pgopti.dpi file.
Without any other options, the current directory is
searched for .dyn files
Multi-file ip optimizations that includes:
- inline function expansion
- interprocedural constant propogation
- dead code elimination
- propagation of function characteristics
- passing arguments in registers
- loop-invariant code motion
Code is optimized for Intel(R) processors with support for CORE-AVX512 instructions. The resulting code may contain unconditional use of features that are not supported on other processors. This option also enables new optimizations in addition to Intel processor-specific optimizations including advanced data layout and code restructuring optimizations to improve memory accesses for Intel processors.
Do not use this option if you are executing a program on a processor that is not an Intel processor. If you use this option on a non-compatible processor to compile the main program (in Fortran) or the function main() in C/C++, the program will display a fatal run-time error if they are executed on unsupported processors.
Enables O2 optimizations plus more aggressive optimizations, such as prefetching, scalar replacement, and loop and memory access transformations. Enables optimizations for maximum speed, such as:
On IA-32 and Intel EM64T processors, when O3 is used with options -ax or -x (Linux) or with options /Qax or /Qx (Windows), the compiler performs more aggressive data dependency analysis than for O2, which may result in longer compilation times. The O3 optimizations may not cause higher performance unless loop and memory access transformations take place. The optimizations may slow down code in some cases compared to O2 optimizations. The O3 option is recommended for applications that have loops that heavily use floating-point calculations and process large data sets.
-no-prec-div enables optimizations that give slightly less precise results than full IEEE division.
When you specify -no-prec-div along with some optimizations, such as -xN and -xB (Linux) or /QxN and /QxB (Windows), the compiler may change floating-point division computations into multiplication by the reciprocal of the denominator. For example, A/B is computed as A * (1/B) to improve the speed of the computation.
However, sometimes the value produced by this transformation is not as accurate as full IEEE division. When it is important to have fully precise IEEE division, do not use -no-prec-div. This will enable the default -prec-div and the result will be more accurate, with some loss of performance.
Controls the level of memory layout transformations performed by the compiler. This option can improve cache reuse and cache locality.
Specify build time link path for jemalloc 64bit built to support the CPU2017/2006 build. See jemalloc.net for more information.
Linker toggle to specify jemalloc linker library. See jemalloc.net for more information.
Specify the uarch target for the compiled executable to be intel64. Seperate, further toggles may add various other instruction set features. Particularly useful in 32bit mode.
Invoke the Intel C compiler in C99 mode
Specify the uarch target for the compiled executable to be intel64. Seperate, further toggles may add various other instruction set features. Particularly useful in 32bit mode.
Specify the uarch target for the compiled executable to be intel64. Seperate, further toggles may add various other instruction set features. Particularly useful in 32bit mode.
Specify the uarch target for the compiled executable to be intel64. Seperate, further toggles may add various other instruction set features. Particularly useful in 32bit mode.
Invoke the Intel C compiler in C99 mode
Specify the uarch target for the compiled executable to be ia32. Seperate, further toggles may add various other instruction set features. Particularly useful in 64bit mode.
Invoke the Intel C compiler in C99 mode
Specify the uarch target for the compiled executable to be intel64. Seperate, further toggles may add various other instruction set features. Particularly useful in 32bit mode.
Specify the uarch target for the compiled executable to be ia32. Seperate, further toggles may add various other instruction set features. Particularly useful in 64bit mode.
Specify the uarch target for the compiled executable to be intel64. Seperate, further toggles may add various other instruction set features. Particularly useful in 32bit mode.
This section contains descriptions of flags that were included implicitly by other flags, but which do not have a permanent home at SPEC.
Enables optimizations for speed. This is the generally recommended
optimization level. This option also enables:
- Inlining of intrinsics
- Intra-file interprocedural optimizations, which include:
- inlining
- constant propagation
- forward substitution
- routine attribute propagation
- variable address-taken analysis
- dead static function elimination
- removal of unreferenced variables
- The following capabilities for performance gain:
- constant propagation
- copy propagation
- dead-code elimination
- global register allocation
- global instruction scheduling and control speculation
- loop unrolling
- optimized code selection
- partial redundancy elimination
- strength reduction/induction variable simplification
- variable renaming
- exception handling optimizations
- tail recursions
- peephole optimizations
- structure assignment lowering and optimizations
- dead store elimination
Enables optimizations for speed and disables some optimizations that increase code size and affect speed.
To limit code size, this option:
The O1 option may improve performance for applications with very large code size, many branches, and execution time not dominated by code within loops.
-O1 sets the following options:Tells the compiler the maximum number of times to unroll loops. For example -funroll-loops0 would disable unrolling of loops.
-fno-builtin disables inline expansion for all intrinsic functions.
This option trades off floating-point precision for speed by removing the restriction to conform to the IEEE standard.
EBP is used as a general-purpose register in optimizations.
Places each function in its own COMDAT section.
Flushes denormal results to zero.
Hardware Prefetcher:
This BIOS option allows the enabling/disabling of a processor mechanism to prefetch data into the cache according to a pattern-recognition algorithm. This default setting is "Enable".
In some cases, setting this option to Disabled may improve performance. Users should only disable this option after performing application benchmarking to verify improved performance in their environment.
Adjacent Cache Prefetch:
This BIOS option allows the enabling/disabling of a processor mechanism to fetch the adjacent cache line within a 128-byte sector that contains the data needed due to a cache line miss. This default setting is "Enable".
In some cases, setting this option to Disabled may improve performance. Users should only disable this option after performing application benchmarking to verify improved performance in their environment.
DCU Streamer Prefetcher:
This BIOS option allows enabling/disabling the function of Data Cache Unit (DCU) Streamer prefetcher. This default setting is "Enable".
If this option sets to "Enable", when the DCU Streamer prefetcher detects multiple loads from the same line done within a time limit, it prefetches the next line into the L1 data cache.
Hyper-Threading [ALL]:
Disabling Intel's Hyper-Threading Technology reduces the number of threads per core to 1. The default is "Enable"; in this case each core provides additional resources for executing up to 2 threads in parallel.
SNC:
Sub NUMA Cluster (SNC) breaks up the last-level cache (LLC) into two disjoint clusters based on address range, with each cluster bound to one memory controllers in the system. SNC improves average latency to the LLC and memory. SNC is a replacement for the cluster on die (COD) feature found in previous processor families. For a multi-socketed system, all SNC clusters are mapped to unique NUMA (Non Uniform Memory Access) domains. If this option is disabled, the LLC is treated as one cluster. If this option is enabled and IMC Interleaving is 1-way Interleave, utilizes LLC capacity more efficiently and reduces latency due to core/IMC proximity. This may provide performance improvement on NUMA-aware operating systems. This default setting is "Enable".
IMC Interleaving:
This BIOS option controls the interleaving between the Integrated Memory Controllers (IMCs). There are two IMCs per socket in Skylake. If IMC Interleaving is set to 2-way Interleave, addresses will be interleaved between the two IMCs. If IMC Interleaving is set to 1-way Interleave, there will be no interleaving. If SNC is enabled, IMC Interleaving should be set to 1-way Interleave. This default setting is "Auto".
Stale AtoS:
The in-memory directory has three states: invalid (I), snoopAll (A), and shared (S). Invalid (I) state means the data is clean and does not exist in any other socket`s cache. The snoopAll (A) state means the data may exist in another socket in exclusive or modified state. Shared (S) state means the data is clean and may be shared across one or more socket`s caches. When doing a read to memory, if the directory line is in the A state we must snoop all the other sockets because another socket may have the line in modified state. If this is the case, the snoop will return the modified data. However, it may be the case that a line is read in A state and all the snoops come back a miss. This can happen if another socket read the line earlier and then silently dropped it from its cache without modifying it. If this option is enabled, in the situation where a line in A state returns only snoop misses, the line will transition to S state. That way, subsequent reads to the line will encounter it in S state and not have to snoop, saving latency and snoop bandwidth. This default setting is "Disable".
LLC dead Line Alloc:
In the Skylake non-inclusive cache scheme, mid-level cache (MLC) evictions are filled into the last-level cache (LLC). If a line is evicted from the MLC to the LLC, the core can flag the evicted MLC lines as "dead." This means that the lines are not likely to be read again. This option allows dead lines to be dropped and never fill the LLC if this option is disabled. However, if this option is enabled, the LLC can opportunistically fill dead lines into the LLC if there is free space available. This default setting is "Enable".
ENERGY_PERF_BIAS_CFG mode:
This item decides the energy efficiency policy which is the energy per performance rate. If the highest performance is preferred, this option should be set to "Performance". This default setting is "Balanced Performance".
Patrol Scrub:
Patrol Scrub is a mechanism for memory controller to periodically read all memory. Corrected read data is written back to memory when a correctable error is detected. This default setting is "Enable".
Demand Scrub:
Demand Scrub is a mechanism for memory controller to correct a correctable error in memory. Corrected read data is sent to the requestor and written back to memory. This default setting is "Enable".
Link Frequency Select:
This option selects the upper limit of the UPI link speed. This default setting is "Auto", which configures the highest supported UPI link speed automatically.
Flag description origin markings:
For questions about the meanings of these flags, please contact the tester.
For other inquiries, please contact info@spec.org
Copyright 2017-2018 Standard Performance Evaluation Corporation
Tested with SPEC CPU2017 v1.0.2.
Report generated on 2018-10-31 17:24:56 by SPEC CPU2017 flags formatter v5178.