Intel Compiler Flags for Cray XC40 KNL

Optimization Flags

-fast
- -O3
  - -finline-functions
  - -frename-registers
- -funroll-loops
  - -fstrength-reduce
  - -frerun-cse-after-loop
- -fstrict-aliasing
- -fsched-interblock
- -falign-loops=16
- -falign-jumps=16
- -falign-functions=16
- -falign-jumps-max-skip=15
- -falign-loops-max-skip=15
- -malign-natural
- -ffast-math
- -mdynamic-no-pic
- -mpowerpc-gpopt
- -force_cpusubtype_ALL
- -fstrict-aliasing
- -mtune=G5
- -mcpu=G5
- -mpowerpc64
-O3
- -finline-functions
- -frename-registers
-mcpu=7450, -mcpu=G5
-falign-functions, -falign-functions=n
-falign-loops, -falign-loops=n
-falign-loops-max-skip, -falign-loops-max-skip=n
-falign-jumps, -falign-jumps=n
-falign-jumps-max-skip, -falign-jumps-max-skip=n
-force_cpusubtype_ALL
-fsched-interblock
-fstrict-aliasing
-funroll-loops
- -fstrength-reduce
- -frerun-cse-after-loop
-ffast-math
- -fno-math-errno
- -funsafe-math-optimizations
- -fno-trapping-math
  - -fno-signaling-nans
- -ffinite-math-only
- -fno-signaling-nans
-mpowerpc64
-malign-natural
-mpowerpc-gpopt
-mtune=7450, -mtune=G5
-mdynamic-no-pic
-fstrength-reduce
-fno-trapping-math
- -fno-signaling-nans
-fno-signaling-nans
-funsafe-math-optimizations
-ffinite-math-only
-fno-math-errno
-frerun-cse-after-loop
-finline-functions
-frename-registers
-g
-qopenmp
-qopenmpoffhost
-avx512
-optreport5

- -fast
- (?:^|(?<=\s))-fast(?:$[^$]+\))?(?:=\S*)?(?=\s|$)
- Optimize for maximum performance. -fast changes the overall optimization strategy of GCC in order to produce the fastest possible running code for PPC7450 and G5 architectures. By default, -fast optimizes for G5. Programs optimized for G5 will not run on PPC7450.
  
  -fast currently enables the following optimization flags. These flags may change in the future. You cannot override any of these options if you use -fast except by setting -mcpu=7450. Note that -ffast-math, -fstrict-aliasing and -malign-natural are unsafe in some situations.
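- Includes:
  - -O3
  - -funroll-loops
  - -fstrict-aliasing
  - -fsched-interblock
  - -falign-loops=16
  - -falign-jumps=16
  - -falign-functions=16
  - -falign-jumps-max-skip=15
  - -falign-loops-max-skip=15
  - -malign-natural
  - -ffast-math
  - -mdynamic-no-pic
  - -mpowerpc-gpopt
  - -force_cpusubtype_ALL
  - -mtune=G5
  - -mcpu=G5
  - -mpowerpc64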
- -O3
- (?:^|(?<=\s))-O3(?:$[^$]+\))?(?:=\S*)?(?=\s|$)
- Optimize yet more. -O3 turns on all optimizations specified by -O2 and also turns on the -finline-functions and -frename-registers options.
  
  Okay, I (cds) made an extreme tactical error when choosing gcc as the basis of an allegedly "simple" flags example. I don't want this example to grow to the size of the GCC man page so let me just leave off by saying that a formal reference to -O2 should be included here, and that the description of -O2 must also contain references to the 25 flags that it turns on.
- Includes:
  - -finline-functions
  - -frename-registers
- -mcpu=7450, -mcpu=G5
- -mcpu=(\S+)\b
- Set architecture type, register usage, choice of mnemonics, and instruction scheduling parameters for a particular machine type.
  
  Supported values for this flag are
  - rios
  - rios1
  - rsc
  - rios2
  - rs64a
  - 601
  - 602
  - 603
  - 603e
  - 604
  - 604e
  - 620
  - 630
  - 740
  - 7400
  - 7450
  - 750
  - power
  - power2
  - powerpc
  - 403
  - 505
  - 801
  - 821
  - 823
  - 860
  - common
- -falign-functions, -falign-functions=n
- (?:^|(?<=\s))-falign-functions(?:$[^$]+\))?(?:=\S*)?(?=\s|$)
- Align the start of functions to the next power-of-two greater than n, skipping up to n bytes. For instance, -falign-functions=32 aligns functions to the next 32-byte boundary, but -falign-functions=24 would align to the next 32-byte boundary only if this can be done by skipping 23 bytes or less.
  
  -fno-align-functions and -falign-functions=1 are equivalent and mean that functions will not be aligned.
  
  Some assemblers only support this flag when n is a power of two; in that case, it is rounded up.
  
  If n is not specified, use a machine-dependent default.
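  The worked example above (n=32 versus n=24) can be modelled in a few lines of C. This is only a sketch of the rule as the description states it, not the compiler's implementation; the helper names pow2_at_least and aligned_start are hypothetical, and the model assumes the boundary is the smallest power of two that is at least n, with at most n-1 bytes skipped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: smallest power of two that is >= n. */
size_t pow2_at_least(size_t n)
{
    size_t p = 1;
    assert(n >= 1);             /* n = 0 is not meaningful in this model */
    while (p < n)
        p <<= 1;
    return p;
}

/* Hypothetical model of the rule described above: return the address at
   which a function placed at `addr` would start under -falign-functions=n.
   The function is moved to the next boundary only if the padding needed
   is at most n - 1 bytes; otherwise it stays where it is. */
size_t aligned_start(size_t addr, size_t n)
{
    size_t boundary = pow2_at_least(n);
    size_t pad = (boundary - addr % boundary) % boundary;
    return (pad <= n - 1) ? addr + pad : addr;
}
```

  With these assumptions, a function at address 100 moves to 128 for n=32 (28 bytes skipped), but stays at 100 for n=24 because 28 exceeds the 23-byte limit.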
- -falign-loops, -falign-loops=n
- (?:^|(?<=\s))-falign-loops(?:$[^$]+\))?(?:=\S*)?(?=\s|$)
- Align loops to a power-of-two boundary, skipping up to n bytes like -falign-functions. The hope is that the loop will be executed many times, which will make up for any execution of the dummy operations.
- -falign-loops-max-skip, -falign-loops-max-skip=n
- (?:^|(?<=\s))-falign-loops-max-skip(?:$[^$]+\))?(?:=\S*)?(?=\s|$)
- When aligning loops to a power-of-two boundary, only do so if this can be done by skipping at most n bytes.
  
  If n is not specified, use a machine-dependent default.
- -falign-jumps, -falign-jumps=n
- (?:^|(?<=\s))-falign-jumps(?:$[^$]+\))?(?:=\S*)?(?=\s|$)
- Align branch targets to a power-of-two boundary, for branch targets where the targets can only be reached by jumping, skipping up to n bytes like -falign-functions. In this case, no dummy operations need be executed.
- -falign-jumps-max-skip, -falign-jumps-max-skip=n
- (?:^|(?<=\s))-falign-jumps-max-skip(?:$[^$]+\))?(?:=\S*)?(?=\s|$)
- When aligning branch targets to a power-of-two boundary, only do so if this can be done by skipping at most n bytes.
  
  If n is not specified, use a machine-dependent default.
- -force_cpusubtype_ALL
- (?:^|(?<=\s))-force_cpusubtype_ALL(?:$[^$]+\))?(?:=\S*)?(?=\s|$)
- Hey! What does this flag do? It's not in the man page.
  
  Well, I know that you, being well informed and well connected (with your compiler vendor), will be able to document ALL of your implicitly included flags.
- -fsched-interblock
- (?:^|(?<=\s))-fsched-interblock(?:$[^$]+\))?(?:=\S*)?(?=\s|$)
- Schedule instructions across basic blocks. This is enabled by default when scheduling before register allocation, i.e. with -fschedule-insns or at -O2 or higher.
- -fstrict-aliasing
- (?:^|(?<=\s))-fstrict-aliasing(?:$[^$]+\))?(?:=\S*)?(?=\s|$)
- Allows the compiler to assume the strictest aliasing rules applicable to the language being compiled. For C (and C++), this activates optimizations based on the type of expressions. In particular, an object of one type is assumed never to reside at the same address as an object of a different type, unless the types are almost the same. For example, an "unsigned int" can alias an "int", but not a "void*" or a "double". A character type may alias any other type.
  
  Pay special attention to code like this:
```
   union a_union {
     int i;
     double d;
   };

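   int f() {
     union a_union t;   /* 'union' keyword keeps this valid C as well as C++ */
     t.d = 3.0;
     return t.i;        /* read through the union type itself */
   }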
```
  The practice of reading from a different union member than the one most recently written to (called "type-punning") is common. Even with -fstrict-aliasing, type-punning is allowed, provided the memory is accessed through the union type. So, the code above will work as expected. However, this code might not:
```
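   int f() {
     union a_union t;
     int* ip;
     t.d = 3.0;
     ip = &t.i;         /* pointer to a member: access no longer goes via the union type */
     return *ip;        /* the compiler may assume this int does not alias t.d */
   }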
```
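  One widely used way to reinterpret an object's bytes without depending on the aliasing rules is to copy them with memcpy. The following is a minimal sketch rather than anything taken from the compiler documentation; the function names float_bits and float_bits_union are hypothetical, and the expected bit pattern assumes an IEEE 754 32-bit float.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the bit pattern of a float as a 32-bit integer.
   memcpy copies bytes, so no object is read through a pointer of another
   type and the result does not depend on -fstrict-aliasing. */
uint32_t float_bits(float f)
{
    uint32_t u;
    assert(sizeof f == sizeof u);   /* assumption: float is 32 bits wide */
    memcpy(&u, &f, sizeof u);
    return u;
}

/* The same value obtained through a union, the form described above as
   allowed because the access goes through the union type. */
uint32_t float_bits_union(float f)
{
    union { float f; uint32_t u; } pun;
    pun.f = f;
    return pun.u;
}
```

  Both functions should agree on any input; for 1.0f an IEEE 754 target yields 0x3F800000.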
- -funroll-loops
- (?:^|(?<=\s))-funroll-loops(?:$[^$]+\))?(?:=\S*)?(?=\s|$)
- Unroll loops whose number of iterations can be determined at compile time or upon entry to the loop. -funroll-loops implies both -fstrength-reduce and -frerun-cse-after-loop. This option makes code larger, and may or may not make it run faster.
- Includes:
  - -fstrength-reduce
  - -frerun-cse-after-loop
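  To make the size-versus-speed trade-off concrete, here is a hand-written sketch of what unrolling by a factor of four looks like at the source level. It only illustrates the shape of the transformation; the factor, the function names and the code the compiler actually emits are assumptions that will differ in practice.

```c
#include <assert.h>
#include <stddef.h>

/* Straightforward loop: one add and one loop test per element. */
long sum_simple(const int *a, size_t n)
{
    long s = 0;
    assert(a != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Hand-unrolled by 4: one loop test per four elements, at the cost of
   a larger loop body plus a remainder loop. */
long sum_unrolled4(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    assert(a != NULL || n == 0);
    for (; i + 4 <= n; i += 4)
        s += (long)a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    for (; i < n; i++)          /* remainder iterations */
        s += a[i];
    return s;
}
```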
- -ffast-math
- (?:^|(?<=\s))-ffast-math(?:$[^$]+\))?(?:=\S*)?(?=\s|$)
- Sets the following flags:
  - -fno-math-errno
  - -funsafe-math-optimizations
  - -fno-trapping-math
  - -ffinite-math-only
  - -fno-signaling-nans
- -mpowerpc64
- (?:^|(?<=\s))-mpowerpc64(?:$[^$]+\))?(?:=\S*)?(?=\s|$)
- The -mpowerpc64 option allows GCC to generate the additional 64-bit instructions that are found in the full PowerPC64 architecture and to treat GPRs as 64-bit, doubleword quantities. GCC defaults to -mno-powerpc64.
- -malign-natural
- (?:^|(?<=\s))-malign-natural(?:$[^$]+\))?(?:=\S*)?(?=\s|$)
- Aligns larger data types such as doubles on their natural boundaries.
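  The effect of alignment on structure layout can be observed with offsetof. The sketch below only reports what the current target does; struct sample and value_offset are hypothetical names, and whether the offset changes under -malign-natural depends on the target's default layout rules.

```c
#include <assert.h>
#include <stddef.h>

struct sample {
    char   tag;     /* 1 byte */
    double value;   /* padding may be inserted before this member */
};

/* Offset of the double inside the struct on the current target.
   With natural alignment this is typically sizeof(double). */
size_t value_offset(void)
{
    size_t off = offsetof(struct sample, value);
    assert(off + sizeof(double) <= sizeof(struct sample));
    return off;
}
```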
- -mpowerpc-gpopt
- (?:^|(?<=\s))-mpowerpc-gpopt(?:$[^$]+\))?(?:=\S*)?(?=\s|$)
- Allows GCC to use the optional PowerPC architecture instructions in the General Purpose group, including floating-point square root.
- -mtune=7450, -mtune=G5
- (?:^|(?<=\s))-mtune(?:$[^$]+\))?(?:=\S*)?(?=\s|$)
- Sets the instruction scheduling parameters for a particular machine type, but does not set the architecture type, register usage, or choice of mnemonics, as -mcpu=cpu_type would. The same values for cpu_type are used for -mtune as for -mcpu. If both are specified, the code generated will use the architecture, registers, and mnemonics set by -mcpu, but the scheduling parameters set by -mtune.
- -mdynamic-no-pic
- (?:^|(?<=\s))-mdynamic-no-pic(?:$[^$]+\))?(?:=\S*)?(?=\s|$)
- Compile code so that it is not relocatable, but that its external references are relocatable. The resulting code is suitable for applications, but not shared libraries.
- -fstrength-reduce
- (?:^|(?<=\s))-fstrength-reduce(?:$[^$]+\))?(?:=\S*)?(?=\s|$)
- Perform the optimizations of loop strength reduction and elimination of iteration variables.
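  As an illustration of both halves of that description, the sketch below shows a loop before and after a hand-applied strength reduction: the per-iteration multiply becomes a pointer increment, and the index variable disappears. The function names are hypothetical, and the second form assumes stride is at least 1 and that the array holds at least count * stride elements.

```c
#include <assert.h>
#include <stddef.h>

/* Before: each iteration multiplies the index by the stride. */
long sum_strided_mul(const int *a, size_t count, size_t stride)
{
    long s = 0;
    for (size_t i = 0; i < count; i++)
        s += a[i * stride];
    return s;
}

/* After (by hand): the multiply is replaced by a running pointer
   increment, and the index i is eliminated in favour of comparing the
   pointer against a precomputed end address. */
long sum_strided_add(const int *a, size_t count, size_t stride)
{
    long s = 0;
    assert(stride >= 1);                    /* assumption stated above */
    const int *end = a + count * stride;    /* computed once, outside the loop */
    for (const int *p = a; p != end; p += stride)
        s += *p;
    return s;
}
```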
- -fno-trapping-math
- (?:^|(?<=\s))-fno-trapping-math(?:$[^$]+\))?(?:=\S*)?(?=\s|$)
- Compile code assuming that floating-point operations cannot generate user-visible traps. These traps include division by zero, overflow, underflow, inexact result and invalid operation. This option implies -fno-signaling-nans. Setting this option may allow faster code if one relies on 'non-stop' IEEE arithmetic, for example.
  
  Use of this option can result in incorrect output for programs which depend on an exact implementation of IEEE or ISO rules/specifications for math functions.
- Includes:
  - -fno-signaling-nans
- -fno-signaling-nans
- (?:^|(?<=\s))-fno-signaling-nans(?:$[^$]+\))?(?:=\S*)?(?=\s|$)
- Compile code assuming that IEEE signaling NaNs may not generate user-visible traps during floating-point operations. Setting this option enables optimizations that may change the number of exceptions visible with signaling NaNs.
- -funsafe-math-optimizations
- (?:^|(?<=\s))-funsafe-math-optimizations(?:$[^$]+\))?(?:=\S*)?(?=\s|$)
- Allow optimizations for floating-point arithmetic that (a) assume that arguments and results are valid and (b) may violate IEEE or ANSI standards. When used at link-time, it may include libraries or startup files that change the default FPU control word or other similar optimizations.
  
  Use of this option may result in incorrect output for programs which depend on an exact implementation of IEEE or ISO rules/specifications for math functions.
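  One reason such optimizations can change results is that floating-point addition is not associative, so regrouping a sum may alter its value. The sketch below demonstrates this with ordinary IEEE double arithmetic; the function names are hypothetical, and it does not claim to show which regroupings a particular compiler performs.

```c
#include <assert.h>

/* Evaluates (a + b) + c. The volatile temporary forces the intermediate
   sum to be computed exactly as written. */
double sum_left(double a, double b, double c)
{
    volatile double t = a + b;
    return t + c;
}

/* Evaluates a + (b + c), the regrouped form. */
double sum_right(double a, double b, double c)
{
    volatile double t = b + c;
    return a + t;
}

/* Usage example: the two groupings disagree because 1.0 is far too
   small to survive being added to -1e30. */
void regroup_demo(void)
{
    assert(sum_left(1e30, -1e30, 1.0) == 1.0);
    assert(sum_right(1e30, -1e30, 1.0) == 0.0);
}
```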
- -ffinite-math-only
- (?:^|(?<=\s))-ffinite-math-only(?:$[^$]+\))?(?:=\S*)?(?=\s|$)
- Allow optimizations for floating-point arithmetic that assume that arguments and results are not NaNs or +-Infs.
  
  Use of this option may result in incorrect output for programs which depend on an exact implementation of IEEE or ISO rules/specifications for math functions.
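  Code that tests for NaN explicitly is a typical example of a program that depends on this behaviour. In the sketch below (hypothetical function names), the comparison x != x is true only for a NaN under IEEE rules; a compiler that is allowed to assume no NaNs occur may simplify such a test to a constant, so the check should not be relied on under -ffinite-math-only.

```c
#include <assert.h>

/* Under IEEE rules a NaN compares unequal to everything, itself included. */
int is_nan_check(double x)
{
    return x != x;
}

/* Produces a NaN at run time as 0.0 / 0.0; volatile keeps the division
   from being folded away at compile time. */
double make_nan(void)
{
    volatile double zero = 0.0;
    return zero / zero;
}

/* Usage example, valid when compiled without -ffinite-math-only. */
void nan_check_demo(void)
{
    assert(is_nan_check(make_nan()));
    assert(!is_nan_check(1.5));
}
```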
- -fno-math-errno
- (?:^|(?<=\s))-fno-math-errno(?:$[^$]+\))?(?:=\S*)?(?=\s|$)
- Do not set ERRNO after calling math functions that are executed with a single instruction, e.g., sqrt. A program that relies on IEEE exceptions for math error handling may want to use this flag for speed while maintaining IEEE arithmetic compatibility.
  
  Use of this option may result in incorrect output for programs which depend on an exact implementation of IEEE or ISO rules/specifications for math functions.
- -frerun-cse-after-loop
- (?:^|(?<=\s))-frerun-cse-after-loop(?:$[^$]+\))?(?:=\S*)?(?=\s|$)
- Re-run common subexpression elimination after loop optimizations have been performed.
- -finline-functions
- (?:^|(?<=\s))-finline-functions(?:$[^$]+\))?(?:=\S*)?(?=\s|$)
- Integrate all simple functions into their callers. The compiler heuristically decides which functions are simple enough to be worth integrating in this way.
  
  If all calls to a given function are integrated, and the function is declared "static", then the function is normally not output as assembler code in its own right.
- -frename-registers
- (?:^|(?<=\s))-frename-registers(?:$[^$]+\))?(?:=\S*)?(?=\s|$)
- Attempt to avoid false dependencies in scheduled code by making use of registers left over after register allocation. This optimization will most benefit processors with lots of registers. It can, however, make debugging impossible, since variables will no longer stay in a 'home register'.
- -g
- (?:^|(?<=\s))-g(?:$[^$]+\))?(?:=\S*)?(?=\s|$)
- Produce debugging information in the operating system's native format (stabs, COFF, XCOFF, or DWARF 2). GDB can work with this debugging information.
  
  On most systems that use stabs format, -g enables use of extra debugging information that only GDB can use; this extra information makes debugging work better in GDB but will probably make other debuggers crash or refuse to read the program.
- -qopenmp
- -qopenmp\s
- Enables OpenMP.
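  For readers unfamiliar with what the flag switches on, here is a minimal sketch of an OpenMP parallel loop. The function name sum_to is hypothetical. When the file is built without an OpenMP flag the pragma is ignored and the loop runs serially, giving the same result.

```c
#include <assert.h>

/* Sum the integers 1..n. With OpenMP enabled, iterations are divided
   among threads and the per-thread partial sums are combined by the
   reduction clause. */
long sum_to(long n)
{
    long total = 0;
    assert(n >= 0);
    #pragma omp parallel for reduction(+:total)
    for (long i = 1; i <= n; i++)
        total += i;
    return total;
}
```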
- -qopenmpoffhost
- -qopenmp-offload=host\b
- Enables OpenMP self-offloading.
- -avx512
- -xMIC-AVX512\b
- Generates Intel AVX-512 instructions for the Xeon Phi (Knights Landing) processor.
- -optreport5
- -qopt-report=5\b
- Generates an optimization report at detail level 5, the most verbose setting.

The following expression was used for the submit command: 'aprun -n 1 -d 256 -j 4 -cc depth -q numactl -m 1 $command'.
