The OpenMPI C driver configured for use with the NVIDIA HPC C compiler (nvc).
The OpenMPI Fortran driver configured for use with the NVIDIA HPC Fortran compiler (nvfortran).
Use blank instead of the OpenMP "nothing" clause for compilers that do not support this feature.
-mcmodel=medium is equivalent to -mcmodel=large.
Linker option to disable global optimizations that become possible when the linker resolves addressing in the program. Linker relaxation may cause linking errors when using large data objects (>2GB).
Chooses generally optimal flags for the target platform.
Instructs the compiler to use relaxed precision in the calculation of some intrinsic functions. Can result in improved performance at the expense of numerical accuracy.
Place automatic arrays on the stack.
Statically link with the NVIDIA runtime libraries. System libraries may still be dynamically linked.
Select Arm Neoverse V2 architecture (SVE x 128) as the target processor.
Chooses generally optimal flags for the target platform.
Instructs the compiler to use relaxed precision in the calculation of some intrinsic functions. Can result in improved performance at the expense of numerical accuracy.
Place automatic arrays on the stack.
Statically link with the NVIDIA runtime libraries. System libraries may still be dynamically linked.
Select Arm Neoverse V2 architecture (SVE x 128) as the target processor.
Chooses generally optimal flags for the target platform.
Instructs the compiler to use relaxed precision in the calculation of some intrinsic functions. Can result in improved performance at the expense of numerical accuracy.
Place automatic arrays on the stack.
Statically link with the NVIDIA runtime libraries. System libraries may still be dynamically linked.
Select Arm Neoverse V2 architecture (SVE x 128) as the target processor.
Don't include Fortran main program object module.
Chooses generally optimal flags for the target platform.
Instructs the compiler to use relaxed precision in the calculation of some intrinsic functions. Can result in improved performance at the expense of numerical accuracy.
Place automatic arrays on the stack.
Statically link with the NVIDIA runtime libraries. System libraries may still be dynamically linked.
Select Arm Neoverse V2 architecture (SVE x 128) as the target processor.
This section contains descriptions of flags that were included implicitly by other flags, but which do not have a permanent home at SPEC.
Enable support for 64-bit indexing and single static data objects larger than 2GB in size. This option is enabled by default when -mcmodel=medium is specified. It can also be used on its own with the default small memory model for certain 64-bit applications that manage their own memory space.
Generate code for the large memory model. The default small memory model limits the combined area for a user's object or executable to 1GB, with the Linux kernel managing usage of the second 1GB of address for system routines, shared libraries, stacks, etc. Programs are started at a fixed address, and the program can use a single instruction to make most memory references. The large memory model allows for larger than 2GB data areas, or .bss sections. Program units compiled using either -mcmodel=medium, -mcmodel=large, or -fpic require additional instructions to reference memory. The effect on performance is a function of the data-use of the application. The -mcmodel=large switch must be used at both compile time and link time to create 64-bit executables. Program units compiled for the default small memory model can be linked into the large memory model executables as long as they are compiled with -fpic, that is, as position-independent code.
All level 1 and 2 optimizations are performed. In addition, this level enables more aggressive code hoisting and scalar replacement optimizations that may or may not be profitable.
Level-two optimization (-O2 or -O) specifies global optimization. The -fast option generally will specify global optimization; however, the -fast switch will vary from release to release depending on a reasonable selection of switches for any one particular release. The -O or -O2 level performs all level-one local optimizations as well as global optimizations. Control flow analysis is applied and global registers are allocated for all functions and subroutines. Loop regions are given special consideration. This optimization level is a good choice when the program contains loops, the loops are short, and the structure of the code is regular.
The NVHPC compilers perform many different types of global optimizations, including but not limited to:
Level-one optimization specifies local optimization (-O1). The compiler performs scheduling of basic blocks as well as register allocation. This optimization level is a good choice when the code is very irregular; that is, it contains many short statements containing IF statements and the program does not contain loops (DO or DO WHILE statements). For certain types of code, this optimization level may perform better than level-two (-O2), although this case rarely occurs.
The NVHPC compilers perform many different types of local optimizations, including but not limited to:
Align "unconstrained" data objects of size greater than or equal to 16 bytes on cache-line boundaries. An "unconstrained" object is a variable or array that is not a member of an aggregate structure or common block, is not allocatable, and is not an automatic array. On by default on 64-bit Linux systems.
Control automatic vector pipelining using SIMD instructions.
Enable automatic vector pipelining.
Instructs the vectorizer to enable certain associativity conversions that can change the result of a computation due to round-off error. A typical optimization replaces an arithmetic operation with one that is mathematically equivalent but can produce a different computed result because of round-off error.
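The effect of an associativity conversion can be illustrated outside the compiler. The sketch below is written in Python purely to show the IEEE double-precision arithmetic; it is not NVHPC code, and the helper names sequential_sum and two_lane_sum are hypothetical. It assumes, for illustration only, that a vectorized reduction accumulates even and odd elements in two separate lanes before combining them.

```python
def sequential_sum(values):
    """Add the values strictly left to right, as the source code specifies."""
    total = 0.0
    for v in values:
        total += v
    return total


def two_lane_sum(values):
    """Accumulate even and odd elements separately, then combine.

    This mimics the reordering a 2-lane SIMD reduction might perform.
    """
    lanes = [0.0, 0.0]
    for i, v in enumerate(values):
        lanes[i % 2] += v
    return lanes[0] + lanes[1]


values = [1e16, 1.0, -1e16, 1.0]

# Left to right, the first 1.0 is absorbed by 1e16 and lost.
print(sequential_sum(values))  # 1.0
# Lane 0 cancels 1e16 exactly; lane 1 keeps both 1.0 values.
print(two_lane_sum(values))    # 2.0
```

Both orderings are mathematically the same sum, yet they round differently, which is why this kind of conversion can change benchmark output.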
Instructs the vectorizer to generate alternate code for vectorized loops when appropriate. For each vectorized loop the compiler decides whether to generate altcode and what type or types to generate, which may be any or all of:
The compiler also determines suitable loop count and array alignment conditions for executing the altcode.
Set SSE to flush-to-zero mode; if a floating-point underflow occurs, the value is set to zero.
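As a rough illustration of what flush-to-zero means for results, the following Python sketch emulates the behavior in software. It does not change any hardware mode, and the helper flush_to_zero is a hypothetical name introduced here; the only assumption is that a value whose magnitude falls below the smallest normal double is replaced by a zero of the same sign.

```python
import math
import sys


def flush_to_zero(x):
    """Emulate flush-to-zero: subnormal magnitudes become a signed zero."""
    if x != 0.0 and abs(x) < sys.float_info.min:
        return math.copysign(0.0, x)
    return x


smallest_normal = sys.float_info.min
subnormal = smallest_normal / 2  # underflows into the subnormal range

# Under default IEEE gradual underflow the value is tiny but nonzero.
print(subnormal > 0.0)
# With flush-to-zero semantics the same value would be replaced by zero.
print(flush_to_zero(subnormal))
```

Code that depends on gradual underflow, for example a loop that tests whether a difference is exactly zero, may therefore behave differently when this mode is enabled.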
Place automatic arrays on the stack.
Use the -mp option to instruct the compiler to interpret user-inserted OpenMP shared-memory parallel programming directives and generate an executable file which will utilize multiple processors in a shared-memory parallel system. When used strictly as a linker flag, the NVHPC OpenMP runtime will be linked and users can use the environment variables MP_BIND and MP_BLIST to bind a serial program to a CPU.
Instructs the compiler to use relaxed precision in the calculation of floating-point reciprocal square root (1/sqrt). Can result in improved performance at the expense of numerical accuracy.
Instructs the compiler to use relaxed precision in the calculation of floating-point square root. Can result in improved performance at the expense of numerical accuracy.
Instructs the compiler to use relaxed precision in the calculation of floating-point division. Can result in improved performance at the expense of numerical accuracy.
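One way relaxed division can lose accuracy is when a quotient is computed as a multiplication by a reciprocal. The Python sketch below illustrates only that arithmetic effect; whether and how the compiler applies such a transformation is an assumption made for illustration, and the function names are hypothetical.

```python
def divide_exact(x, y):
    """Correctly rounded IEEE division."""
    return x / y


def divide_via_reciprocal(x, y):
    """Multiply by a rounded reciprocal: two roundings instead of one."""
    return x * (1.0 / y)


x, y = 49.0, 49.0
print(divide_exact(x, y) == 1.0)           # True
print(divide_via_reciprocal(x, y) == 1.0)  # False: 1/49 is rounded before the multiply
```

The difference is at most a unit or so in the last place, but it is enough to change bitwise comparisons of output.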
Instructs the compiler to allow floating-point expression reordering, including factoring. Can result in improved performance at the expense of numerical accuracy.
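Factoring is a reordering that is exact in real arithmetic but not in floating point. The Python sketch below, with values chosen only for illustration and hypothetical function names, compares a*b + a*c with the factored form a*(b + c) in IEEE double precision.

```python
def unfactored(a, b, c):
    """Evaluate a*b + a*c as written."""
    return a * b + a * c


def factored(a, b, c):
    """Evaluate the algebraically equivalent a*(b + c)."""
    return a * (b + c)


a, b, c = 3.0, 2.0 ** 53, 1.0

# In the factored form, b + c rounds back to 2**53, so c is lost entirely.
print(factored(a, b, c))
# In the unfactored form, a*c = 3 survives long enough to affect rounding.
print(unfactored(a, b, c))
print(unfactored(a, b, c) - factored(a, b, c))  # 4.0
```

Neither result is wrong in isolation; they are different roundings of the same mathematical value, which is the accuracy trade-off this flag accepts.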
Flag description origin markings:
For questions about the meanings of these flags, please contact the tester.
For other inquiries, please contact info@spec.org
Copyright 2023 Standard Performance Evaluation Corporation
Tested with SPEC accel2023 v2.0.17.
Report generated on 2023-12-06 13:07:17 by SPEC accel2023 flags formatter v112.