IBM XL Compiler Flags and Common Unix Commands and Environment Settings

Optimization Flags

-O5
- -O4
  - -O3
    - -O2
      - -O
    - -qhot=level=0
  - -qipa=level=1
  - -qarch=auto
  - -qtune=auto
  - -qsimd=auto
- -qipa=level=2
-O4
- -O3
  - -O2
    - -O
  - -qhot=level=0
- -qipa=level=1
- -qarch=auto
- -qtune=auto
- -qsimd=auto
-O3
- -O2
  - -O
- -qhot=level=0
-O2
- -O
-O
- -O2
-qarch
-qtune
-qnoinline
-qinlglue
-qhot, -qhot=level=1, -qhot=simd -qhot=novector
-qipa=level
-qnoipa
-qpdf1
-qpdf2
-qfdpr
-qnothreaded
-qnoxlcompatmacros
-qxlf90=nosignedzero
-q64
-qsmallstack=dynlenonheap
-qsave
-qsimd -qnosimd -qsimd=noauto
-qenablevmx -qnoenablevmx
-qvecnvol
-lmass
-lessl
-qessl
-qrtti
-qalias=noansi, -qalias=nostd
-qalign=natural
-qassert=refalign -qassert=contig
-qprefetch=aggressive
-qprefetch=dscr=42
-qnoprefetch
-qrestrict
-qsmp=auto
-qsmp=omp
-qstrict, -qnostrict
-D__extern_always_inline=inline
-qinline=40
-qipa=inline=limit=1000 -qipa=inline=threshold=100
-qipa=partition=large
-qipa=threads
-ltcmalloc
-lhugetlbfs
-tl
"-Wl,--wholearchive /usr/lib/libhugetlbfs.a"
"/usr/lib/libdl.a"
-link_no_whole_archive
-link_mul_defs
-hugetlbfs_BDT
-hugetlbfs_align
-B/opt/at8.0/share/libhugetlbfs/
-link_emit_relocation
-lstd8d
-Lstd
-Rstd

- -O5
- -O5\b
- Performs optimizations for maximum performance, including maximum interprocedural analysis on all of the objects presented at the "link" step. This optimization level increases the compiler's memory usage and compile-time requirements. -O5 provides all of the functionality of the -O4 option, plus the functionality of the -qipa=level=2 option.
  -O5 is equivalent to the following flags:
  - -O4
  - -qipa=level=2
- Includes:
  - -O4
    - -O3
      - -O2
        - -O
      - -qhot=level=0
    - -qipa=level=1
    - -qarch=auto
    - -qtune=auto
    - -qsimd=auto
  - -qipa=level=2
- -O4
- -O4\b
- Performs optimizations for maximum performance, including interprocedural analysis on all of the objects presented at the "link" step.
  -O4 is equivalent to the following flags:
  - -O3
  - -qipa=level=1
  - -qarch=auto
  - -qtune=auto
  - -qsimd=auto
- Includes:
  - -O3
    - -O2
      - -O
    - -qhot=level=0
  - -qipa=level=1
  - -qarch=auto
  - -qtune=auto
  - -qsimd=auto
- -O3
- -O3\b
- -O3 performs additional optimizations that are memory-intensive and compile-time-intensive, and that may slightly change the semantics of the program unless -qstrict is specified. We recommend these optimizations when the desire for run-time speed improvements outweighs the concern for limiting compile-time resources. The optimizations provided include:
  - In-depth memory access analysis
  - Better loop scheduling
  - High-order loop analysis and transformations (-qhot=level=0)
  - Inlining of small procedures within a compilation unit by default
  - Eliminating implicit compile-time memory usage limits
  - Widening, which merges adjacent load/stores and other operations
  - Pointer aliasing improvements to enhance other optimizations
  -O3 is equivalent to the following flags:
  - -O2
  - -qhot=level=0
- Includes:
  - -O2
    - -O
  - -qhot=level=0
- -O2
- -O2\b
- -O2 performs a set of optimizations that are intended to offer improved performance without an unreasonable increase in the time or storage required for compilation, including:
  - Elimination of redundant code
  - Basic loop optimization
  - Code structuring that takes advantage of the -qarch and -qtune settings
- Includes:
  - -O
- -O
- -O\b
- -O enables the level of optimization that represents the best tradeoff between compilation speed and run-time performance. If you need a specific level of optimization, specify the appropriate numeric value. Currently, -O is equivalent to -O2.
- Includes:
  - -O2
- -qarch
- -qarch=(\S+)\b
- Produces object code containing instructions that will run on the specified processor.
  
  Supported values for this flag are:
  - auto - Use the processor on which the program is compiled.
  - pwr8 - POWER8 processor-based systems.
  - pwr7 - POWER7 processor-based systems.
  - pwr6e - POWER6 processor-based systems running in "Enhanced" mode.
  - pwr6 - POWER6 processor-based systems.
  - pwr5x - POWER5+ processor-based systems.
  - pwr5 - POWER5 processor-based systems.
  - pwr4 - POWER4 processor-based systems.
  - ppc970 - PPC970 processor-based systems.
- -qtune
- -qtune=(\S+)\b
- Specifies the system architecture for which the executable program is optimized, including instruction scheduling and cache settings.
  
  Supported values for this suboption are:
  - auto - Use the processor on which the program is compiled.
  - pwr8 - POWER8 processor-based systems.
  - pwr7 - POWER7 processor-based systems.
  - pwr6e - POWER6 processor-based systems running in "Enhanced" mode.
  - pwr6 - POWER6 processor-based systems.
  - pwr5x - POWER5+ processor-based systems.
  - pwr5 - POWER5 processor-based systems.
  - pwr4 - POWER4 processor-based systems.
  - ppc970 - PPC970 processor-based systems.
- -qnoinline
- -qnoinline\b
- This option specifies that no functions are to be inlined.
- -qinlglue
- -qinlglue\b
- This option inlines glue code that optimizes external function calls when compiling.
- -qhot, -qhot=level=1, -qhot=simd -qhot=novector
- -qhot(=arraypad|=simd|=(no)?vector|=level=[01])?\b
- Performs high-order transformations on loops during optimization. Supported suboptions are:
  - arraypad - The compiler pads any arrays where it infers that there may be a benefit.
  - level=0 - The compiler performs a limited set of high-order loop transformations.
  - level=1 - The compiler performs its full set of high-order loop transformations.
  - simd - Replaces certain instruction sequences with vector instructions.
  - vector - Replaces certain instruction sequences with calls to the MASS library.
  Specifying -qhot without suboptions implies -qhot=nosimd, -qhot=noarraypad, -qhot=vector, and -qhot=level=1. The -qhot option is also implied by -O4 and -O5.
- -qipa=level
- -qipa=level=[012]\b
- Enhances optimization by performing detailed analysis across procedures (interprocedural analysis, or IPA). The level determines the amount of interprocedural analysis and optimization that is performed:
  - level=0 - Performs only minimal interprocedural analysis and optimization.
  - level=1 - Turns on inlining, limited alias analysis, and limited call-site tailoring.
  - level=2 - Turns on full interprocedural data flow and alias analysis.
- -qnoipa
- -qnoipa\b
- Suppresses interprocedural analysis (IPA), which is enabled by default at optimization levels -O4 and -O5.
- -qpdf1
- -qpdf1\b
- Used in the first pass of a profile-directed feedback (PDF) compile to cause PDF information to be generated. Profile-directed feedback optimization gathers data on both execution paths and data values. It does not use hardware counters, nor does it gather any data other than path and data values for PDF-specific optimizations.
- -qpdf2
- -qpdf2\b
- Used in the second pass of a profile-directed feedback compile to cause the collected PDF information to be used during optimization.
- -qfdpr
- -qfdpr\b
- The compiler generates additional symbol information for use by the "fdpr" binary optimization tool.
- -qnothreaded
- -qnothreaded\b
- Indicates that the compiler does not need to generate thread-safe code.
- -qnoxlcompatmacros
- -qnoxlcompatmacros\b
- Does not define the XL compiler compatibility macros.
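The nested "Includes" lists for the optimization levels at the start of this list (-O5 through -O) are hard to follow, partly because -O and -O2 refer to each other. The Python sketch below encodes only the "is equivalent to" relationships stated in those entries and flattens them into the component flags. The names `EQUIVALENTS` and `expand` are hypothetical and are not part of any IBM tool.

```python
# Sketch: flatten the documented "-Ox is equivalent to" relationships.
# EQUIVALENTS and expand() are illustrative only, not an IBM API.
EQUIVALENTS = {
    "-O5": ["-O4", "-qipa=level=2"],
    "-O4": ["-O3", "-qipa=level=1", "-qarch=auto", "-qtune=auto", "-qsimd=auto"],
    "-O3": ["-O2", "-qhot=level=0"],
    "-O": ["-O2"],  # "Currently, -O is equivalent to -O2."
}


def expand(flag: str) -> list:
    """Return the leaf flags implied by `flag`, in order, without duplicates.

    -O2 is treated as a leaf, so the -O <-> -O2 alias cannot loop forever.
    """
    if flag not in EQUIVALENTS:
        return [flag]
    result = []
    for part in EQUIVALENTS[flag]:
        for leaf in expand(part):
            if leaf not in result:
                result.append(leaf)
    return result


if __name__ == "__main__":
    for level in ("-O5", "-O4", "-O3", "-O"):
        print(level, "->", " ".join(expand(level)))
```

For -O5 the result lists both -qipa=level=1 (inherited from -O4) and -qipa=level=2; per the -O5 entry, the level=2 analysis is what -O5 adds on top of -O4.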

-qxlf90=nosignedzero
-qxlf90=(signedzero|nosignedzero|autodealloc|noautodealloc|oldpad|nooldpad)\b

         -qxlf90=<suboption>
                Determines whether the compiler provides the
                Fortran 90 or the Fortran 95 level of support for
                certain aspects of the language. <suboption> can be
                one of the following:

                signedzero | nosignedzero
                     Determines how the SIGN(A,B) function handles
                     signed real 0.0. In addition, determines
                     whether negative internal values will be
                     prefixed with a minus when formatted output
                     would produce a negative sign zero.
                autodealloc | noautodealloc
                     Determines whether the compiler deallocates
                     allocatable arrays that are declared locally
                     without either the SAVE or the STATIC
                     attribute and have a status of currently
                     allocated when the subprogram terminates.
                oldpad | nooldpad
                     When the PAD= specifier is present in the
                     INQUIRE statement, specifying -qxlf90=nooldpad
                     returns UNDEFINED when there is no connection,
                     or when the connection is for unformatted I/O.
                     This behavior conforms with the Fortran 95
                     standard and above. Specifying -qxlf90=oldpad
                     preserves the Fortran 90 behavior.

                Default:
                     o signedzero, autodealloc and nooldpad for the
                     xlf95, xlf95_r, xlf95_r7 and f95 invocation
                     commands.
                     o nosignedzero, noautodealloc and oldpad for
                     all other invocation commands.
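The defaults above depend on the invocation command, and an explicit -qxlf90 suboption then overrides one of the three settings. The following Python sketch models that rule. It assumes one suboption per -qxlf90 flag and that a later flag overrides an earlier one; the regular expression is a simplified pattern written for this sketch rather than the one in the entry, and `xlf90_settings` is a hypothetical helper.

```python
import re

# Simplified pattern for this sketch: exactly one suboption per -qxlf90 flag.
XLF90_RE = re.compile(r"-qxlf90=(no)?(signedzero|autodealloc|oldpad)")

# Invocation commands whose defaults are signedzero, autodealloc and nooldpad.
F95_COMMANDS = {"xlf95", "xlf95_r", "xlf95_r7", "f95"}


def xlf90_settings(command, flags):
    """Return {suboption: enabled} for an invocation command plus its flags."""
    if command in F95_COMMANDS:
        settings = {"signedzero": True, "autodealloc": True, "oldpad": False}
    else:
        settings = {"signedzero": False, "autodealloc": False, "oldpad": True}
    for flag in flags:  # assumption: later flags override earlier ones
        m = XLF90_RE.fullmatch(flag)
        if m:
            # A "no" prefix turns the suboption off; otherwise it is turned on.
            settings[m.group(2)] = m.group(1) is None
    return settings


if __name__ == "__main__":
    print(xlf90_settings("xlf95", ["-qxlf90=nosignedzero"]))
```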

- -q64
- -q64\b
- Generates 64-bit ABI binaries. The default is to generate 32-bit ABI binaries.
- -qsmallstack=dynlenonheap
- -qsmallstack=dynlenonheap\b
- Causes the Fortran compiler to allocate dynamic arrays on the heap instead of the stack.
- -qsave
- -qsave\b
- Specifies that all local variables be treated as STATIC.
- -qsimd -qnosimd -qsimd=noauto
- -q(no)?simd(=auto|=noauto)?\b
- Enables the generation of vector instructions for processors that support them.
- -qenablevmx -qnoenablevmx
- -q(no)?enablevmx\b
- Enables the generation of vector instructions for processors that support them.
- -qvecnvol
- -qvecnvol\b
- Specifies whether to use volatile or non-volatile vector registers. Volatile vector registers are registers whose values are not preserved across function calls, so the compiler cannot rely on values held in them across a call.
- -lmass
- -lmass\b
- Link the Mathematical Acceleration Subsystem (MASS) libraries, which contain tuned mathematical intrinsic functions.
- -lessl
- -lessl\b
- Link the Engineering and Scientific Subroutine Library (ESSL).
- -qessl
- -qessl\b
- Specifies that, if either -lessl or -lesslsmp are also specified, then Engineering and Scientific Subroutine Library (ESSL) routines should be used in place of some Fortran 90 intrinsic procedures when there is a safe opportunity to do so.
- -qrtti
- -qrtti\b
- Causes the C++ compiler to generate run-time type identification (RTTI) code.

-qalias=noansi, -qalias=nostd
-qalias=(noansi|nostd)\b

 qalias=ansi | noansi
   If ansi is specified, type-based aliasing is
   used during optimization, which restricts the
   lvalues that can be safely used to access a
   data object. The default is ansi for the xlc,
   xlC, and c89 commands. This option has no
   effect unless you also specify the -O option.

 qalias=std | nostd
   Indicates whether the compilation units contain
   any non-standard aliasing (see Compiler Reference
   for more information). If so, specify nostd.

-qalign=natural
-qalign=(\S+)\b

                Specifies what aggregate alignment rules the
                compiler uses for file compilation, where the
                alignment options are:

                bit_packed
                     The compiler uses the bit_packed alignment
                     rules.
                full
                     The compiler uses the RISC System/6000
                     alignment rules. This is the same as power.
                mac68k
                     The compiler uses the Macintosh alignment
                     rules.  This suboption is valid only for 32-
                     bit compilations.
                natural
                     The compiler maps structure members to their
                     natural boundaries.
                packed
                     The compiler uses the packed alignment rules.
                power
                     The compiler uses the RISC System/6000
                     alignment rules.
                twobyte
                     The compiler uses the Macintosh alignment
                     rules.  This suboption is valid only for 32-
                     bit compilations.  The mac68k option is the
                     same as twobyte.

                The default is -qalign=full.

-qassert=refalign -qassert=contig
-qassert=(refalign|contig)?\b

 qassert=refalign | norefalign | contig
   refalign specifies that all pointers inside the compilation
   unit only point to data that is naturally aligned
   according to the length of the pointer types.

   contig specifies the compiler can perform optimizations
   according to the memory layout of the objects occupying
   contiguous blocks of memory.

-qprefetch=aggressive
-qprefetch=aggressive\b

 qprefetch=aggressive
   Aggressively prefetch data

- -qprefetch=dscr=42
- -qprefetch=dscr=(\S+)\b
- The prefetch=dscr option causes the Data Stream Control Register (DSCR) to be set to the specified value when the program runs.
- -qnoprefetch
- -qnoprefetch\b
- The noprefetch option causes the compiler to generate no prefetch instructions and to not adjust the DSCR when executing this program.
- -qrestrict
- -qrestrict\b
- ```
 qrestrict
   TBD
```
- -qsmp=auto
- -qsmp=auto\b
- Yes
- Causes the compiler to automatically generate parallel code using OMP controls when possible.
- -qsmp=omp
- -qsmp=omp\b
- Yes
- Tells the compiler that OpenMP (OMP) directives are used to identify parallel code.

-qstrict, -qnostrict
-q(no)?strict\b

                Ensures that optimizations done by default at
                optimization levels -O3 and higher, and, optionally
                at -O2, do not alter the semantics of a program.

                The -qstrict=all, -qstrict=precision,
                -qstrict=exceptions, -qstrict=ieeefp, and
                -qstrict=order suboptions and their negative forms
                are group suboptions that affect multiple,
                individual suboptions. Group suboptions act as if
                either the positive or the no form of every
                suboption of the group is specified.

                Default:

                     o Always -qstrict or -qstrict=all when the
                     -qnoopt or -O0 optimization level is in effect
                     o -qstrict or -qstrict=all is the default when
                     the -O2 or -O optimization level is in effect
                     o -qnostrict or -qstrict=none is the default
                     when -O3 or a higher optimization level is in
                     effect

                <suboptions_list> is a colon-separated list of one
                or more of the following:

                all | none
                     all disables all semantics-changing
                     transformations, including those controlled by
                     the ieeefp, order, library, precision, and
                     exceptions suboptions.  none enables these
                     transformations.
                precision | noprecision
                     precision disables all transformations that
                     are likely to affect floating-point precision,
                     including those controlled by the subnormals,
                     operationprecision, association,
                     reductionorder, and library suboptions.
                     noprecision enables these transformations.
                exceptions | noexceptions
                     exceptions disables all transformations likely
                     to affect exceptions or be affected by them,
                     including those controlled by the nans,
                     infinities, subnormals, guards, and library
                     suboptions. noexceptions enables these
                     transformations.
                ieeefp | noieeefp
                     ieeefp disables transformations that affect
                     IEEE floating-point compliance, including
                     those controlled by the nans, infinities,
                     subnormals, zerosigns, and operationprecision
                     suboptions. noieeefp enables these
                     transformations.
                nans | nonans
                     nans disables transformations that may produce
                     incorrect results in the presence of, or that
                     may incorrectly produce IEEE floating-point
                     signaling NaN (not-a-number) values. nonans
                     enables these transformations.
                infinities | noinfinities
                     infinities disables transformations that may
                     produce incorrect results in the presence of,
                     or that may incorrectly produce floating-point
                     infinities.  noinfinities enables these
                     transformations.
                subnormals | nosubnormals
                     subnormals disables transformations that may
                     produce incorrect results in the presence of,
                     or that may incorrectly produce IEEE
                     floating-point subnormals (formerly known as
                     denorms). nosubnormals enables these
                     transformations.
                zerosigns | nozerosigns
                     zerosigns disables transformations that may
                     affect or be affected by whether the sign of a
                     floating-point zero is correct. nozerosigns
                     enables these transformations.
                operationprecision | nooperationprecision
                     operationprecision disables transformations
                     that produce approximate results for
                     individual floating-point operations.
                     nooperationprecision enables these
                     transformations.
                order | noorder
                     order disables all code reordering between
                     multiple operations that may affect results or
                     exceptions, including those controlled by the
                     association, reductionorder, and guards
                     suboptions. noorder enables code reordering.
                association | noassociation
                     association disables reordering operations
                     within an expression. noassociation enables
                     reordering operations.
                reductionorder | noreductionorder
                     reductionorder disables parallelizing
                     floating-point reductions. noreductionorder
                     enables these reductions.
                guards | noguards
                     guards disables moving operations past guards
                     or calls which control whether the operation
                     should be executed or not. noguards enables
                     these transformations.
                library | nolibrary
                     library disables transformations that affect
                     floating-point library functions. nolibrary
                     enables these transformations.
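Because group suboptions stand for several individual suboptions at once, it can help to see a suboption list resolved into its nine individual settings. The Python sketch below uses only the groupings and defaults listed above. It assumes that the colon-separated list is applied left to right, with later entries overriding earlier ones; the names `GROUPS`, `leaves` and `strict_settings` are hypothetical.

```python
# Group suboptions and the suboptions they control, as listed above.
GROUPS = {
    "all": ["ieeefp", "order", "library", "precision", "exceptions"],
    "precision": ["subnormals", "operationprecision", "association",
                  "reductionorder", "library"],
    "exceptions": ["nans", "infinities", "subnormals", "guards", "library"],
    "ieeefp": ["nans", "infinities", "subnormals", "zerosigns",
               "operationprecision"],
    "order": ["association", "reductionorder", "guards"],
}


def leaves(name):
    """Expand a (possibly nested) group suboption into individual suboptions."""
    if name not in GROUPS:
        return [name]
    out = []
    for member in GROUPS[name]:
        for leaf in leaves(member):
            if leaf not in out:
                out.append(leaf)
    return out


INDIVIDUAL = leaves("all")  # the nine individual suboptions


def strict_settings(suboptions, opt_level):
    """Map each individual suboption to True (its transformations disabled).

    The starting point follows the defaults above: -qstrict=all below -O3,
    -qstrict=none at -O3 and higher.
    """
    strict = opt_level < 3
    settings = {name: strict for name in INDIVIDUAL}
    for item in suboptions.split(":"):
        if not item:
            continue
        if item == "none":
            name, value = "all", False
        elif item.startswith("no"):
            name, value = item[2:], False  # e.g. "noorder" -> "order"
        else:
            name, value = item, True
        if name not in GROUPS and name not in INDIVIDUAL:
            raise ValueError(f"unknown -qstrict suboption: {item}")
        for leaf in leaves(name):
            settings[leaf] = value
    return settings


if __name__ == "__main__":
    print(strict_settings("none:precision", opt_level=3))
```

For example, `strict_settings("none:precision", opt_level=3)` disables the transformations controlled by subnormals, operationprecision, association, reductionorder and library, while leaving those controlled by nans, infinities, zerosigns and guards enabled.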

- -D__extern_always_inline=inline
- -D__extern_always_inline=inline\b
- Defines the preprocessor macro __extern_always_inline, which is used in glibc system headers, to expand to inline.
- -qinline=40
- -qinline=(\S+)\b
- The inline option specifies the threshold and limit for function inlining.
- -qipa=inline=limit=1000 -qipa=inline=threshold=100
- -qipa=inline=(\S+)\b
- The inline suboption specifies the threshold and limit for function inlining.
- -qipa=partition=large
- -qipa=partition=large\b
- The partition suboption specifies the size of the program sections that are analyzed together. Larger partitions may produce better analysis but require more storage. The default is medium.
- -qipa=threads
- -qipa=threads(=\d+)?\b
- The threads suboption allows the IPA optimizer to run portions of the optimization process in parallel threads, which can speed up the compilation process on multi-processor systems. All the available threads, or the number specified by N, may be used. N must be a positive integer. Specifying nothreads does not run any parallel threads; this is equivalent to running one serial thread. This option does not affect the code in the final binary created.
- -ltcmalloc
- (?:^|(?<=\s))-ltcmalloc(?:=\S*)?(?=\s|$)
- Link with the tcmalloc library for Linux on POWER. This library provides optimized implementations of new, delete, malloc, and free.
- -lhugetlbfs
- (?:^|(?<=\s))-lhugetlbfs(?:=\S*)?(?=\s|$)
- Link with libhugetlbfs.so. This allows the heap to be backed by 16 MB huge pages.
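The second line of each entry appears to be a regular expression for recognizing the flag in a compile or link command line (an inference from the layout of the entries). The Python sketch below applies the -ltcmalloc and -lhugetlbfs patterns exactly as given. The anchors around each pattern require the flag to be a whole whitespace-delimited token, so a different library such as -ltcmalloc_minimal is not counted. `flags_found` is a hypothetical helper.

```python
import re

# Patterns copied verbatim from the -ltcmalloc and -lhugetlbfs entries above.
PATTERNS = {
    "-ltcmalloc": re.compile(r"(?:^|(?<=\s))-ltcmalloc(?:=\S*)?(?=\s|$)"),
    "-lhugetlbfs": re.compile(r"(?:^|(?<=\s))-lhugetlbfs(?:=\S*)?(?=\s|$)"),
}


def flags_found(command_line):
    """Return the names of the patterns that match somewhere in command_line."""
    return [name for name, pattern in PATTERNS.items()
            if pattern.search(command_line)]


if __name__ == "__main__":
    print(flags_found("xlc -O3 app.o -ltcmalloc -lhugetlbfs"))
    print(flags_found("xlc -O3 app.o -ltcmalloc_minimal"))
```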

-tl
(?:^|(?<=\s))-tl(?:=\S*)?(?=\s|$)

Applies the prefix specified by the -B option to the designated components.

| Parameter | Description | Executable name |
|-----------|-------------|-----------------|
| a | Assembler | as |
| b | Low-level optimizer | xlfcode |
| c | Compiler front end | xlfentry |
| d | Disassembler | dis |
| F | C preprocessor | cpp |
| h | Array language optimizer | xlfhot |
| I | High-level optimizer, compile step | ipa |
| l | Linker | ld |
| z | Binder | bolt |
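Together, -B and -t determine which executables the compiler driver substitutes. The Python sketch below assumes that the -B prefix is joined directly to the program name with no slash inserted, which is consistent with the -B value later in this document ending in "/". The component letters come from the table above, and `substituted_paths` is a hypothetical helper.

```python
# Component letters accepted by -t, from the table above.
COMPONENTS = {
    "a": "as", "b": "xlfcode", "c": "xlfentry", "d": "dis", "F": "cpp",
    "h": "xlfhot", "I": "ipa", "l": "ld", "z": "bolt",
}


def substituted_paths(b_flag, t_flag):
    """Return {program: substituted path} for a -B<prefix> and -t<letters> pair."""
    if not (b_flag.startswith("-B") and t_flag.startswith("-t")):
        raise ValueError("expected a -B<prefix> flag and a -t<letters> flag")
    prefix = b_flag[2:]
    paths = {}
    for letter in t_flag[2:]:
        if letter not in COMPONENTS:
            raise ValueError(f"unknown -t component letter: {letter}")
        program = COMPONENTS[letter]
        paths[program] = prefix + program  # assumption: no "/" is inserted
    return paths


if __name__ == "__main__":
    print(substituted_paths("-B/opt/at8.0/share/libhugetlbfs/", "-tl"))
```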

- "-Wl,--whole-archive /usr/lib/libhugetlbfs.a"
- -Wl,--whole-archive\s/\S*
- Instructs the linker to include every object file in the specified library, rather than searching the library for the required object files.
- "/usr/lib/libdl.a"
- /usr/lib/libdl.a
- Instructs the linker to link the static library libdl.a, which provides the dynamic linking loader interface (dlopen, dlsym, and related functions).
- -link_no_whole_archive
- -Wl,--no-whole-archive
- Turn off the effect of the --whole-archive flag for archives that follow it on the command line.
- -link_mul_defs
- -Wl,-z,muldefs
- Instructs the linker to allow multiple definitions of a symbol; the first definition is used. Normally, when a symbol is defined multiple times, the linker reports a fatal error.
- -hugetlbfs_BDT
- -Wl,--hugetlbfs-link=BDT
- Pass the --hugetlbfs-link=BDT flag to the linker so that the text, initialized data, and BSS segments of the application are backed by hugepages.
- -hugetlbfs_align
- -Wl,--hugetlbfs-align
- Pass the --hugetlbfs-align flag to the linker so that the HUGETLB_ELFMAP environment variable can control which program segments are placed in hugepages.
- -B/opt/at8.0/share/libhugetlbfs/
- -B/\S*
- Determines substitute path names for XL Fortran executables such as the compiler, assembler, linker, and preprocessor. It can be used in combination with the -t option, which determines which of these components are affected by -B.
- -link_emit_relocation
- -Wl,-q\b
- Pass the -q (--emit-relocs) flag to the linker so that the final executable retains its relocation information.
- -lstd8d
- (?:^|(?<=\s))-lstd8d(?:=\S*)?(?=\s|$)
- Link with the Apache C++ Standard Library ("stdcxx"). "libstd8d.so" is a 32-bit shared library with optimization enabled.
- -Lstd
- -L\s*[^ ]*stdcxx[^ ]*
- Adds the directory for the Apache C++ Standard Library to the search path at link time.
- -Rstd
- -R\s*[^ ]*stdcxx[^ ]*
- Specifies library search directory for the Apache C++ Standard Library for use by the runtime linker. The information is recorded in the object file and passed to the runtime linker.
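Several entries above are driver options that pass flags through to the linker with -Wl. The Python sketch below assumes the common compiler-driver convention that each comma in a -Wl, argument separates individual linker arguments (so -Wl,-z,muldefs reaches the linker as -z muldefs). It keeps object files and archives in command-line order and ignores all other driver options. `linker_args` is a hypothetical helper.

```python
def linker_args(driver_args):
    """Return the arguments a linker would see from -Wl options and input files."""
    out = []
    for arg in driver_args:
        if arg.startswith("-Wl,"):
            # Commas separate individual linker arguments.
            out.extend(arg[len("-Wl,"):].split(","))
        elif not arg.startswith("-"):
            # Object files and archives keep their position on the line.
            out.append(arg)
    return out


if __name__ == "__main__":
    cmd = ["-O3", "app.o",
           "-Wl,--whole-archive", "/usr/lib/libhugetlbfs.a",
           "-Wl,--no-whole-archive",
           "-Wl,-z,muldefs", "-Wl,-q"]
    print(" ".join(linker_args(cmd)))
```

Order matters here. --whole-archive applies to the archives that follow it until --no-whole-archive, which is why the two flags bracket libhugetlbfs.a.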
