Compilers: IBM XL C/C++ Enterprise Edition Version 8.0 for AIX
Compilers: IBM XL Fortran Enterprise Edition Version 10.1 for AIX
Compilers: IBM XL C/C++ Enterprise Edition Version 9.0 for AIX
Compilers: IBM XL Fortran Enterprise Edition Version 11.1 for AIX
OS: IBM AIX 5L V5.3
Last updated: 13-Jun-2007
Perform optimizations for maximum performance. This includes maximum interprocedural analysis on all of the objects presented on the "link" step. This level of optimization will increase the compiler's memory usage and compile time requirements. -O5 Provides all of the functionality of the -O4 option, but also provides the functionality of the -qipa=level=2 option.
-O5 is equivalent to the following flags
Perform optimizations for maximum performance. This includes interprocedural analysis on all of the objects presented on the "link" step.
-O4 is equivalent to the following flags
-O3 Performs additional optimizations that are memory intensive, compile-time intensive, and may change the semantics of the program slightly, unless -qstrict is specified. We recommend these optimizations when the desire for run-time speed improvements outweighs the concern for limiting compile-time resources.
-O2 is equivalent to the following flags
Produces object code containing instructions that will run on the specified processors. "auto" selects the processor the compile is being done on. "pwr5x" is the POWER5+ processor.
Supported values for this flag are
Specifies the architecture system for which the executable program is optimized. This includes instruction scheduling and cache setting. The supported values for suboption are:
This option inlines glue code that optimizes external function calls when compiling.
Performs high-order transformations on loops during optimization.
Enhances optimization by doing detailed analysis across procedures (interprocedural analysis or IPA). The level determines the amount of interprocedural analysis and optimization that is performed.
level=0 Does only minimal interprocedural analysis and optimization
level=1 turns on inlining, limited alias analysis, and limited call-site tailoring
level=2 turns on full interprocedural data flow and alias analysis
The option used in the first pass of a profile directed feedback (PDF) compile that causes PDF information to be generated. The profile directed feedback optimization gathers data on both execution path and data values. It does not use hardware counters, nor gather any data other than path and data values for PDF specific optimizations.
The option used in the second pass of a profile directed feedback compile that causes PDF information to be utilized during optimization.
-qxlf90=<suboption> Determines whether the compiler provides the Fortran 90 or the Fortran 95 level of support for certain aspects of the language. <suboption> can be one of the following:
  signedzero | nosignedzero
    Determines how the SIGN(A,B) function handles signed real 0.0. In addition, determines whether negative internal values will be prefixed with a minus when formatted output would produce a negative sign zero.
  autodealloc | noautodealloc
    Determines whether the compiler deallocates allocatable arrays that are declared locally without either the SAVE or the STATIC attribute and have a status of currently allocated when the subprogram terminates.
  oldpad | nooldpad
    When the PAD= specifier is present in the INQUIRE statement, specifying -qxlf90=nooldpad returns UNDEFINED when there is no connection, or when the connection is for unformatted I/O. This behavior conforms with the Fortran 95 standard and above. Specifying -qxlf90=oldpad preserves the Fortran 90 behavior.
  Default:
    o signedzero, autodealloc and nooldpad for the xlf95, xlf95_r, xlf95_r7 and f95 invocation commands.
    o nosignedzero, noautodealloc and oldpad for all other invocation commands.
Generates 64 bit ABI binaries. The default is to generate 32 bit ABI binaries.
Causes the system loader to put the heap in its own segment of the size specified. This is only required for 32-bit applications, as their segments are 256M. If the last digit of the value is "C", then it also turns off the malloc pool option for that executable.
Sets the bit in the file's XCOFF header indicating that this executable will request the use of large pages when they are available on the system and when the user has the appropriate privilege.
Indicates that a program, designed to execute in a large page memory environment, can take advantage of large 16 MB pages provided on POWER4 and higher based systems.
Indicates that the compiler understands how to do alloca().
Causes the Fortran compiler to allocate dynamic arrays on the heap instead of the stack
Enables the generation of vector instructions for processors that support them.
Specifies whether to use volatile or non-volatile vector registers. Volatile vector registers are registers whose value is not preserved across function calls so the compiler will not depend on values in them across function calls.
The __IBM_FAST_VECTOR macro defines a different iterator for the std::vector template class. This iterator results in faster code, but is not compatible with code using the default iterator for a std::vector template class. All uses of std::vector for a data type must use the same iterator. Add -D__IBM_FAST_VECTOR to the compile line, or "#define __IBM_FAST_VECTOR 1" to your source code to use the faster iterator for std::vector template class. You must compile all sources with this macro.
Causes AIX to define "ischar()" (and friends) as macros and not subroutines.
Causes the C++ compiler to generate Run Time Type Identification code.
-qalias=ansi | noansi
    If ansi is specified, type-based aliasing is used during optimization, which restricts the lvalues that can be safely used to access a data object. The default is ansi for the xlc, xlC, and c89 commands. This option has no effect unless you also specify the -O option.
-qalias=std | nostd
    Indicates whether the compilation units contain any non-standard aliasing (see Compiler Reference for more information). If so, specify nostd.
Specifies what aggregate alignment rules the compiler uses for file compilation, where the alignment options are:
    bit_packed  The compiler uses the bit_packed alignment rules.
    full        The compiler uses the RISC System/6000 alignment rules. This is the same as power.
    mac68k      The compiler uses the Macintosh alignment rules. This suboption is valid only for 32-bit compilations.
    natural     The compiler maps structure members to their natural boundaries.
    packed      The compiler uses the packed alignment rules.
    power       The compiler uses the RISC System/6000 alignment rules.
    twobyte     The compiler uses the Macintosh alignment rules. This suboption is valid only for 32-bit compilations. The mac68k option is the same as twobyte.
The default is -qalign=full.
Causes the compiler to treat "char" variables as signed instead of the default of unsigned.
Indicates that the input Fortran source program is in fixed form.
Adds an underscore to global entities to match the C compiler ABI.
Invoke the IBM XL C compiler. 32-bit binaries are produced by default.
Invoke the IBM XL C++ compiler. 32-bit binaries are produced by default.
Invoke the IBM XL Fortran compiler. 32-bit binaries are produced by default.
Allows almost any C dialect.
Turns off aggressive optimizations which have the potential to alter the semantics of your program. -qstrict sets -qfloat=nofltint:norsqrt. -qnostrict sets -qfloat=rsqrt. This option is only valid with -O2 or higher optimization levels.
Default:
    o -qnostrict at -O3 or higher.
    o -qstrict otherwise.
Specifies whether to include standard object code in the object files. The noobject suboption can substantially reduce overall compilation time, by not generating object code during the first IPA phase. This option does not affect the code in the final binary created.
The threads suboption allows the IPA optimizer to run portions of the optimization process in parallel threads, which can speed up the compilation process on multi-processor systems. All the available threads, or the number specified by N, may be used. N must be a positive integer. Specifying nothreads does not run any parallel threads; this is equivalent to running one serial thread. This option does not affect the code in the final binary created.
Causes the compiler to output a traceback if it abends.
Suppresses the message with the message number specified.
Usage: chsyscfg -r lpar | prof | sys | sysprof | frame
                -m <managed system> | -e <managed frame>
                -f <configuration file> | -i "<configuration data>"
                [--help]

Changes partitions, partition profiles, system profiles, or the attributes of a managed system or a managed frame.

    -r                         - the type of resource(s) to be changed:
                                   lpar    - partition
                                   prof    - partition profile
                                   sys     - managed system
                                   sysprof - system profile
                                   frame   - managed frame
    -m <managed system>        - the managed system's name
    -e <managed frame>         - the managed frame's name
    -f <configuration file>    - the name of the file containing the configuration data for this command. The format is:
                                   attr_name1=value,attr_name2=value,...
                                 or
                                   "attr_name1=value1,value2,...",...
    -i "<configuration data>"  - the configuration data for this command. The format is:
                                   "attr_name1=value,attr_name2=value,..."
                                 or
                                   ""attr_name1=value1,value2,...",..."
    --help                     - prints this help

The valid attribute names for this command are:
    -r prof    required: name, lpar_id | lpar_name
               optional: ... lpar_proc_compat_mode (default | POWER6_enhanced)
Environment variables set before the run:
Usage: fdpr [options] -p program [-x invocation]

where
  -p specifies the input program, in a form of executable, shared object or archive file
  -x specifies how to invoke the program
  [options] can be one or more of the following:

Action Options:
  -123
      Specifies which actions/phases to run, where:
        -1 generates instrumented program for profile gathering
        -2 runs the instrumented program and updates profile data (requires -x <invocation>)
        -3 generates optimized program
      Default is set to run all three phases (-123)
  -a/--action [action]
      Specifies customized actions where [action] can be one of the following:
        anl         analyze program
        instr       generate instrumented program for profile gathering (same as -1)
        opt         generate optimized program (same as -3)
        check_sign  check fdpr signature in the input program

Analysis Options:
  -esa, --extra-safe-analysis
      Do not attempt to analyze unconventional CSects containing hand-written assembly code (when used, must be specified at both instrumentation and optimization phases)
  -aawc/-noaawc, --analyze-assembly-written-csects/--noanalyze-assembly-written-csects
      Analyze/Do not analyze objects written in assembly (when used/not used, must be specified at both instrumentation and optimization phases). The default is set to analyze assembly written modules
  -iinf, --ignore-info
      Ignore .info sections produced with the -qfdpr option during compile time
  -fca, --funcsect-analysis
      Apply special analysis for an input executable that was compiled with the -qfuncsect compiler option
  -ff <string>, --file-format <string>
      Input file format: can be LM (load module) or PO (program object)

Instrumentation Options:
  -ei, --embedded-instrumentation
      Perform embedded instrumentation. Profile will be collected into global variables
  -infp, --ignore-not-found-procedures
      Ignore not found procedures
  -fd <Fdesc>, --file-descriptor <Fdesc>
      Set file descriptor number to be used when opening the profile file that is mapped to the shared memory area during profiling. The default of <Fdesc> is set to the maximum-allowed open files
  -M <addr>, --profile-map <addr>
      Set shared memory segment address for profiling. Alternate shared memory addresses are needed when the instrumented program application creates a conflict with the shared-memory addresses preserved for the profiling. Typical alternative values are 0x40000000, 0x50000000, ... up to 0xC0000000. Default is set to 0x3000000
  -ri/-nori, --register-instrumentation/--noregister-instrumentation
      Instrument/Do not instrument the input program file to collect profile information about indirect branches via registers (applicable only with the -a instr option). The default is set to collect the profile information
  -sfp/-nosfp, --save-floating-point-registers/--nosave-floating-point-registers
      Save/Do not save floating point registers in instrumented code (the default is set to save floating point registers)
  -ipnvr, --instrumentation-preserve-non-volatile-registers
      Preserve non volatile registers while calling stubs
  -iplr/-noiplr, --instrumentation-preserve-link-register/--noinstrumentation-preserve-link-register
      Preserve/Do not preserve link register while calling stubs
  -ipcr/-noipcr, --instrumentation-preserve-condition-register/--noinstrumentation-preserve-condition-register
      Preserve/Do not preserve Condition Register while calling stubs
  -ipctr/-noipctr, --instrumentation-preserve-count-register/--noinstrumentation-preserve-count-register
      Preserve/Do not preserve Count Register while calling stubs
  -ipxer/-noipxer, --instrumentation-preserve-fixed-point-exception-register/--noinstrumentation-preserve-fixed-point-exception-register
      Preserve/Do not preserve Fixed-Point Exception Register while calling stubs
  -ipspr/-noipspr, --instrumentation-preserve-special-registers/--noinstrumentation-preserve-special-registers
      Preserve/Do not preserve special purpose registers while calling stubs
  -ipvr/-noipvr, --instrumentation-preserve-volatile-registers/--noinstrumentation-preserve-volatile-registers
      Preserve/Do not preserve volatile registers while calling stubs. -noipvr implies -noipnvr and -nosfp
  -ipe/-noipe, --instrumentation-preserve-environment/--noinstrumentation-preserve-environment
      Do not preserve registers that are not overwritten while calling stubs. -noipe implies -noipvr -noipspr
  -spescr <0-127>, --spe-scratch-register <0-127>
      Specify a global SPE scratch register, decreasing instrumentation overhead, in order to minimize possibility of local store overflow

Profile Files Options:
  -af <prof_file>, --ascii-profile-file <prof_file>
      Set the name of an ASCII profile file containing profile information given by three different XML entry options: <Simple .. >, <Cond .. > and <Reg .. > for profiling data on regular, conditional or branch via registers instructions accordingly
  -aop, --accept-old-profile
      Accept old profile file collected on previous versions of the input program file (requires the -f flag)
  -f <prof_file>, --profile-file <prof_file>
      Set the profile file name. The profile file is created during the instrumentation phase when issued with -a instr option. The file is read by fdpr when issued with the -a opt or -a anl options. Note, the profile file is updated automatically when running the instrumented program
  -spefdir <directory>, --spe-profile-directory <directory>
      Set the directory where SPE profiles are located in integrated mode (see -cell). Default is where <program> is located

Optimization Options:
  -A <num_of_bytes>, --align-code <num_of_bytes>
      Align program code according to given <num_of_bytes>
  -bldcg, --build-dcg
      Build a DCG (data connectivity graph) for enhanced data reordering (applicable only with the -RD flag)
  -btcar, --branch-table-csect-anchor-removal
      Eliminate load instructions related to the usage of branch tables in the code
  -cRD, --conservativeRD
      Perform conservative static data reordering by packing all frequently referenced static variables together
  -cbtd, --convert-bss-to-data
      Convert bss section into a data section (useful for more aggressive tocload and RD optimizations)
  -dce, --dead-code-elimination
      Eliminate instructions related to unused local variables within frequently executed functions (useful mainly after applying function inlining optimization)
  -dp, --data-prefetch
      Insert dcbt instructions to improve data-cache performance
  -dpht <threshold>, --data-placement-hotness-threshold <threshold>
      Set data placement algorithm hotness threshold between (0,1), where 0 will reorder the static variables in large groups based on the control flow, and 1 reorders the variables in very small groups based on their access frequency. (applicable only with the -RD flag)
  -dpnf <factor>, --data-placement-normalization-factor <factor>
      Set data placement algorithm normalization factor between (0,1), where 0 causes static variables to be reordered regardless of their size, and 1 locates only small sized variables first. (applicable only with the -RD flag)
  -ece, --epilog-code-eliminate
      Reduce code size by grouping common instructions in functions' epilogs, into a single unified code
  -fc, --function-cloning
      Enable only function cloning phase during function inlining optimizations (applicable only with function inlining flags: -i, -si, -ihf, -isf, -shci)
  -hr, --hco-reschedule
      Relocate instructions from frequently executed code to rarely executed code areas, when possible
  -hrf <factor>, --hco-resched-factor <factor>
      Set the aggressiveness of the -hr optimization option according to a factor value between (0,1), where 0 is the least aggressive factor (applicable only with the -hr option)
  -i, --inline
      Same as --selective-inline with --inline-small-funcs 12
  -ihf <pct>, --inline-hot-functions <pct>
      Inline all function call sites to functions that have a frequency count greater than the given <pct> frequency percentage
  -isf <size>, --inline-small-funcs <size>
      Inline all functions that are smaller or equal to the given <size> in bytes
  -kr, --killed-registers
      Eliminate stores and restores of registers that are killed (overwritten) after frequently executed function calls
  -lap, --load-address-propagation
      Eliminate load instructions of variables' addresses by re-using pre-loaded addresses of adjacent variables
  -las, --load-after-store
      Add NOP instructions to place each load instruction further apart following a store instruction that reference the same memory address
  -lro, --link-register-optimization
      Eliminate saves and restores of the link register in frequently-executed functions
  -lu <aggressiveness_factor>, --loop-unroll <aggressiveness_factor>
      Unroll short loops containing of one to several basic blocks according to an aggressiveness factor between (1,9), where 1 is the least aggressive unrolling option for very hot and short loops
  -lun <unrolling_number>, --loop-unrolling-number <unrolling_number>
      Set the number of unrolled iterations in each unrolled loop. The allowed range is between (2,50). Default is set to 2. (applicable only with the -lu flag)
  -O
      Switch on basic optimizations only. Same as -RC -nop -bp -bf
  -O2
      Switch on less aggressive optimization flags. Same as -O -hr -pto -isf 8 -tlo -kr
  -O3
      Switch on aggressive optimization flags. Same as -O2 -RD -isf 12 -si -dp -lro -las -vro -btcar -lu 9 -rt 0
  -O4
      Switch on aggressive optimization flags together with aggressive function inlining. Same as -O3 -sidf 50 -ihf 20
  -pc, --preserve-csects
      Preserve CSects' boundaries in reordered code
  -pca, --propagate-constant-area
      Relocate the constant variables area to the top of the code section when possible
  -pfb, --preserve-first-bb
      Preserve original location of the entry point basic block in program
  -pp, --preserve-functions
      Preserve functions' boundaries in reordered code
  -pr/-nopr, --ptrgl-r11/--noptrgl-r11
      Perform/Do not perform removal of R11 load instruction in _ptrgl csect (the default is to perform the optimization)
  -pto, --ptrgl-optimization
      Perform optimization of indirect call instructions via registers by replacing them with conditional direct jumps
  -ptosl <limit_size>, --ptrgl-optimization-size-limit <limit_size>
      Set the limit of the number of conditional statements generated by -pto optimization. Allowed values are between 1..100. Default value set to 3. (applicable only with the -pto flag)
  -ptoht <heatness_threshold>, --ptrgl-optimization-heatness-threshold <heatness_threshold>
      Set the frequency threshold for indirect calls that are to be optimized by -pto optimization. Allowed range between 0..1. Default is set to 0.8. (applicable only with -pto flag)
  -rcaf <aggressiveness_factor>, --reorder-code-aggressivenes-factor <aggressiveness_factor>
      Set the aggressiveness of code reordering optimization. Allowed values are 1 and 2, where 1 is less aggressive. Default is set to 1. (applicable only with the -RC flag)
  -rcctf <termination_factor>, --reorder-code-chain-termination-factor <termination_factor>
      Set the threshold fraction which determines when to terminate each chain of basic blocks during code reordering. Allowed input range is between 0.0 to 1.0 where 0.0 generates long chains and 1.0 creates single basic block chains. Default is set to 0.05. (applicable only with the -RC flag)
  -rccrf <reversal_factor>, --reorder-code-condition-reversal-factor <reversal_factor>
      Set the threshold fraction which determines when to enable condition reversal for each conditional branch during code reordering. Allowed input range is between 0.0 to 1.0 when 0.0 tries to preserve original condition direction and 1.0 ignores it. Default is set to 0.8 (applicable only with the -RC flag)
  -RD, --reorder-data
      Perform static data reordering
  -rmte, --remove-multiple-toc-entries
      Remove multiple TOC entries pointing to the same location in the input program file
  -rt <removal_factor>, --reduce-toc <removal_factor>
      Perform removal of TOC entries according to a removal factor between (0,1), where 0 removes non-accessed TOC entries only, and 1 removes all possible TOC entries
  -sdp <aggressiveness_factor>, --stride-data-prefetch <aggressiveness_factor>
      Perform data prefetching within frequently executed loops based on stride analysis, according to an aggressiveness factor between (1,9), where 1 is least aggressive
  -sdpla <iterations_number>, --stride-data-prefetch-look-ahead <iterations_number>
      Set the number of iterations for which data is prefetched into the cache ahead of time. Default value is set to 4 iterations. (applicable only with the -sdp flag)
  -sdpms <stride_min_size>, --stride-data-prefetch-min-size <stride_min_size>
      Set the minimal stride size in bytes, for which data will be considered as a candidate for prefetching. Default value is set to 128 bytes. (applicable only with the -sdp flag)
  -shci <pct>, --selective-hot-code-inline <pct>
      Perform selective inlining of functions in order to decrease the total execution counts
  -si, --selective-inline
      Perform selective inlining of dominant hot function calls
  -sll <Lib1:Prof1,...,LibN:ProfN>, --static-link-libraries <Lib1:Prof1,...,LibN:ProfN>
      Statically link hot code from specified dynamically linked libraries to the input program. The parameter consists of comma-separated list of libraries and their profiles. IMPORTANT: licensing rights of specified libraries should be observed when applying this copying optimization
  -sllht <hotness_threshold>, --static-link-libraries-hotness-threshold <hotness_threshold>
      Set hotness threshold for the --static-link-libraries optimization. The allowed input range is between 0 (least aggressive) to 1, or -1, which does not require profile and selects all code that might be called by the input program from the given libraries. Default is 0.5
  -sidf <percentage_factor>, --selective-inline-dominant-factor <percentage_factor>
      Set a dominant factor percentage for selective inline optimization. The allowed range is between (0,100). Default is set to 80 (applicable only with the -si and -pbsi flags)
  -siht <frequency_factor>, --selective-inline-hotness-threshold <frequency_factor>
      Set a hotness threshold factor percentage for selective inline optimization to inline all dominant function calls that have a frequency count greater than the given frequency percentage. Default is set to 100 (applicable only with the -si -pbsi flags)
  -so, --stack-optimization
      Reduce the stack frame size of functions which are called with a small number of arguments
  -stf, --stack-flattening
      Merge the stack frames of inlined functions with the frames of the calling functions
  -tb, --preserve-traceback-tables
      Force the restructuring of traceback tables in reordered code. If -tb option is omitted, traceback tables are automatically included only for C++ applications which use the Try & Catch mechanism
  -rtb, --remove-traceback-tables
      Remove traceback tables in reordered code
  -tlo, --tocload-optimization
      Replace each load instruction that references the TOC with a corresponding add-immediate instruction via the TOC anchor register, when possible
  -ucde, --unreachable-code-data-elimination
      Remove unreachable code and non-accessed static data
  -vro, --volatile-registers-optimization
      Eliminate stores and restores of non-volatile registers in frequently executed functions by using available volatile registers

Output Options:
  -d, --disassemble-text
      Print the disassembled text section of the output program into <output_file>.dis_text file
  -dap, --dump-ascii-profile
      Dump profile information in ASCII format into <program>.aprof (requires the -f flag)
  -db, --disassemble-bss
      Print the disassembled bss section of the output program into <output_file>.dis_bss file
  -dd, --disassemble-data
      Print the disassembled data section of the output program into <output_file>.dis_data file
  -diap, --dump-initial-ascii-profile
      Dump initial profile information in ASCII format into <program>.aprof.init (requires the -f flag)
  -dim, --dump-instruction-mix
      Dump instruction mix statistics based on gathered profile information
  -dm, --dump-mapper
      Print a map of basic blocks and static variables with their respective new -> old addresses into a <program>.mapper file
  -o <output_file>, --output-file <output_file>
      Set the name of the output file. The default instrumented file is <program>.instr. The default optimized file is <program>.fdpr
  -pif, --print-inlined-funcs
      Print the list of inlined functions along with their corresponding calling functions, in ASCII format into a <program>.inlined file (requires the -si or -i or -isf flags)
  -ppcf, --print-prof-counts-file
      Print the profiling counters in ASCII format into a <program>.counts file (requires the -f flag)
  -simo, --single-input-multiple-outputs
      Optimize in parallel into multiple outputs as specified by option sets read from stdin
  -sf, --strip-file
      Strip the optimized output file
  -spe, --speculative-profile-enhancement
      Complements given partial profile information of basic blocks' frequencies, i.e., transforms basic block profile to a complete edge profile
  -spedir <directory>, --spe-directory <directory>
      Set the directory into which SPE executables will be extracted and from which they will be encapsulated
  -enc, --encapsulate
      Encapsulate SPE executables present in the PPE input (see --spe-directory)

General Options:
  -gro, --generate-relinkable-output
      Generate relinkable output
  -h, --help
      Print online usage help
  -m <machine-model>, --machine <machine-model>
      Generate code for the specified machine model. Target machine can be one of the following models: power2, power3, ppc405, ppc440, power4, ppc970, power5, power6, spe, spe_edp. Default is set to no machine
  -q, --quiet
      Set quiet output mode, suppressing informational messages
  -st <stat_file>, --statistics <stat_file>
      Output statistics information to <stat_file>. If <stat_file> is '-', output goes to standard output. See --verbose for the default
  -V, --version
      Print version
  -v <level>, --verbose <level>
      Set verbose output mode level. When set, various statistics about the target optimized program are printed into file <program>.stat. Allowed level range is between (0,3). Default is set to 0
  -cell, --cell-supervisor
      Integrated PPE/SPE processing. Perform SPE extraction, processing, and encapsulation automatically prior to PPE processing
  -armember
      For archive files - list of archive members to be optimized, if -armember is not specified, all members will be optimized