IBM AIX Flag Disclosure
SPEC CPU2000 & OMP2001

For use with AIX submissions with the IBM XL compilers.
Last Revised 16 July, 2006

Notes
=====

The IBM C/C++ & Fortran compilers produce 32-bit binaries by default. Flags
are described below which cause the compilers to produce 64-bit binaries.

Source Level Portability Options
================================

-DHOST_WORDS_BIG_ENDIAN (176.gcc)
        Host system is big-endian.

-DAIX (186.crafty)
        Sets some basic parameters like endianness, OS type, and ANSI
        language extensions to be compatible with an AIX system.

-DNDEBUG (252.eon)
        SPEC default for C++ compiler but also needed explicitly by some
        linkers. Defining this disables any assert macros used for
        debugging.

-DNEED_EXPLICIT_SPECIALIZATION (252.eon)
        Supply function definitions with explicit types in two cases where
        templatized versions fail to compile.

-DSPEC_CPU2000_AIX (253.perlbmk)
        Compile the SPEC CPU2000 modified perl for an AIX system.

-DSYS_IS_BSD (254.gap)
        Compile gap for a BSD-like system.

-DSYS_STRING_H (254.gap)
        Do not explicitly include string.h.

-DSYS_HAS_TIME_PROTO (254.gap)
        Do not supply prototypes for the time(), times() and getrusage()
        functions.

-DSYS_HAS_MALLOC_PROTO (254.gap)
        Do not supply prototypes for malloc() and free().

-DSYS_HAS_CALLOC_PROTO (254.gap)
        Do not supply a prototype for calloc().

-DHAVE_SIGNED_CHAR (300.twolf)
        System allows signed char type.

Compiler Invocation
===================

xlc
        Invokes the compiler for C source files with a default language
        level of ansi and specifies that it allow type-based aliasing.

cc
        Invokes the compiler for C source files with a default language
        level of extended and specifies that it provide compatibility with
        older IBM compilers and allow placement of string literals or
        constant values in read/write storage. cc does not conform to the
        ISO/ANSI C standard.

xlC
        Invokes the compiler for C++ source files with a default language
        level of ansi and specifies that it allow type-based aliasing.
xlc_r
        The same as "xlc" except that it generates a threadsafe executable,
        compliant with the POSIX pthreads API.

xlf
        Invokes the compiler for Fortran source files with a default
        language of Fortran 77.

xlf90
        Invokes the compiler for Fortran source files with a default
        language of Fortran 90.

xlf90_r
        The same as "xlf90" except that it generates a threadsafe
        executable, compliant with the POSIX pthreads API.

cleanpdf
        Erase the information in the PDF directory, if any exists, to
        ensure no feedback information is reused between compilations.

Compiler Options
================

-ma
        Use built-in alloca() function.

-O
        Performs optimizations that the compiler developers considered the
        best combination for compilation speed and runtime performance.

-O3
        Perform some memory and compile time intensive optimizations in
        addition to those executed with -O. The -O3 specific optimizations
        have the potential to slightly alter the semantics of a user's
        program. Optimizations may include, but are not limited to:
        aggressive code motion and scheduling of computations that have the
        potential to raise an exception (no valid exceptions will be
        suppressed); relaxed conformance to IEEE rules in cases where the
        difference in the results is not important to an application; and
        rewriting of floating point expressions.

-O4
        Equivalent to -O3 -qipa -qhot with automatic generation of
        architecture (-qarch=) and tuning (-qtune=) options ideal for that
        platform. The -qipa level defaults to level=1.

-O5
        Equivalent to -O3 -qipa=level=2 -qhot with automatic generation of
        architecture (-qarch=) and tuning (-qtune=) options ideal for that
        platform.

-Dfloor=__floor
        Causes the XL compiler to inline the floor() function whenever
        possible.

-D_ILS_MACROS
        Selects, in /usr/include/ctype.h, the macro versions of the
        character classification functions (e.g. isupper()).
-Q, -qinline
        The -Q option without any list inlines all appropriate procedures,
        subject to limits on the number of inlined calls and the amount of
        code size increase as a result. -qinline is an alias for -Q.

-Q=xxx
        Inline all functions that contain fewer than xxx abstract code
        units.

-q64
        Selects 64-bit compiler mode.

-qalign=struct=natural
-qalign=natural
        The compiler maps structure members to their natural boundaries.
        The first form is used by the Fortran compiler; the second form is
        used by the C compiler and is a deprecated form for the Fortran
        compiler.

-qansialias
        Use type-based aliasing during optimization.

-qarch=ppc
        Produces object code containing instructions that will run on any
        of the 32-bit PowerPC hardware platforms.

-qarch=ppc970
        Produces object code containing instructions that will run on
        PPC970 processors.

-qarch=pwr3
        Produces object code containing instructions that will run on
        POWER3 processors.

-qarch=pwr4
        Produces object code containing instructions that will run on
        POWER4/POWER4+ processors.

-qarch=pwr5
        Produces object code containing instructions that will run on
        POWER5 processors.

-qarch=pwr5x
        Produces object code containing instructions that will run on
        POWER5+ processors.

-qarch=rs64b
        Produces object code containing instructions that will run on
        RS64-II processors.

-qarch=auto
        Produces object code containing instructions that will run on the
        hardware platform on which the program is compiled.

-qdatalocal
        Changes the default to assume that all variables are local.

-qenablevmx
        On PPC970 processors, the binary can contain instructions for the
        vector arithmetic (VMX) unit.

-qessl
        Specifies that, if either -lessl or -lesslsmp is also specified,
        then Engineering and Scientific Subroutine Library (ESSL) routines
        should be used in place of some Fortran 90 intrinsic procedures
        when there is a safe opportunity to do so.

-qlibessl
        Specifies that all functions whose names match ESSL library
        functions are, in fact, the library functions.
-qfdpr
        Collect information about programs for use with the AIX fdpr
        (Feedback Directed Program Restructuring) performance-tuning
        utility.

-qfixed
        Indicates that the input source program is in fixed form. Allows
        fixed format Fortran 77 programs to be compiled using the xlf90
        compiler invocation.

-qfixed=
        States that Fortran code is in fixed source form, with an optional
        argument specifying the maximum line length.

-qfloat=rsqrt
        Changes a division by the result of a square root operation into a
        multiply by the reciprocal of the square root.

-qhot
        Perform high-order transformations on loops during optimization.

-qhot=arraypad
        Pad the sizes of arrays to align better in cache.

-qipa=level=1
        Turns on interprocedural analysis with inlining, limited alias
        analysis, and limited call-site tailoring. This is the default
        level of -qipa.

-qipa=level=2
        Turns on interprocedural analysis with inlining, cloning, full
        alias analysis, constant propagation, call-site tailoring, and dead
        code removal.

-qipa=noobject
        Do not generate object files during the first stage of
        interprocedural analysis.

-qinline
        Alias for -Q. See -Q.

-qipa=partition=large
        Specifies the size of the regions within the program to analyze.
        Larger partitions contain more procedures, which result in better
        interprocedural analysis but require more storage to optimize.

-qipa=threads=n
        Tells the compiler it can use n threads during interprocedural
        analysis and code generation.

-qlanglvl=ansi
        Compilation conforms to the ANSI standard.

-qlargepage
        Indicates that a program, designed to execute in a large page
        memory environment, can take advantage of large 16 MB pages
        provided on POWER4 or better CPUs.

-qmaxmem=-1
        Allows the compiler to use as much memory as it needs to execute.

-qpdf1/pdf2
        Profile directed feedback optimization.

-qsave
        Sets the default storage class for local variables to STATIC.

-qsmp=omp
        Enable OpenMP parallelization directives.

-qsuffix=f=f90
        Sets the suffix for source files to be .f90.
        The .f90 suffix is required by xlf90 to compile Fortran 90
        programs.

-qtune=604
        Instruction selection, scheduling, and other implementation
        dependent performance enhancements for the PowerPC 604/604e
        processor.

-qtune=pwr3
        Instruction selection, scheduling, and other implementation
        dependent performance enhancements for the POWER3 processor.

-qtune=pwr4
        Instruction selection, scheduling, and other implementation
        dependent performance enhancements for the POWER4/POWER4+
        processors.

-qtune=pwr5
        Instruction selection, scheduling, and other implementation
        dependent performance enhancements for the POWER5 processors.

-qtune=pwr5x
        Instruction selection, scheduling, and other implementation
        dependent performance enhancements for the POWER5+ processors.

-qtune=rs64b
        Instruction selection, scheduling, and other implementation
        dependent performance enhancements for the RS64-II processor.

-qtune=auto
        Instruction selection, scheduling, and other implementation
        dependent performance enhancements for the hardware platform on
        which the program is compiled.

-qunroll=n
        Unrolls inner loops in the program by a factor of n.

-w
        Suppress warning messages from the C, C++, and Fortran compilers.

Linker Options
==============

-Ldir
        The linker searches for libraries in the directory specified by
        "dir".

-lblacssmp
        Link the Parallel ESSL SMP BLACS Library.

-lessl
        Link the Engineering and Scientific Subroutine Library (ESSL).

-lesslsmp
        Link the threadsafe version of the ESSL library.

-lpesslsmp
        Link the threadsafe, parallelized version of the ESSL library.

-lmass
        Link the mathematical acceleration subsystem libraries (MASS),
        which contain libraries of tuned mathematical intrinsic functions.
        See http://techsupport.services.ibm.com/server/mass?fetch=home.html

-lhmu
        Link fast malloc libraries. These libraries are part of the memdbg
        package that is included with IBM C compilers.

-lpdf
        Routines used in the first pass of the profile directed feedback
        process.
        Routines from this library are not used in building the final
        executable. In newer compilers, -qpdf1 does this automatically, so
        using this in conjunction with -qpdf1 is redundant.

-blpdata
        Sets the bit in the file's XCOFF header indicating that this
        executable will request the use of large pages when they are
        available on the system and when the user has an appropriate
        privilege.

-bdatapsize:64K
-bstackpsize:64K
-btextpsize:64K
        These flags set the page sizes of the data, stack, and text
        segments to 64K.

-bmaxdata:0x........
        Sets the maximum combined size of the program's stack and data
        segments to this number of bytes, specified in hexadecimal, when
        the default is too small.

-bmaxdata:0x......../dsa
        The "dsa" causes shared library read-only segments to be mapped
        into the user address space only if needed.

-bnso
        Brings referenced library procedures into the object file.

-bI:/lib/syscalls.exp
        Create statically linked object files (syscalls.exp supplies the
        names of the routines that can be imported).

FDPR:
=====

The fdpr (feedback directed program restructuring) program optimizes the
executable image of a program by collecting information on the behavior of
the program while the program is used for some typical workload, and then
creating a new version. It is available on AIX Version 4 and 5 systems as
part of the Performance Toolbox for AIX.

Options:

    -o OutFile
            Specifies the name of the output file from the optimizer.

    -p ProgramName
            The name of the executable program to optimize.

    -q
            Processing/compilation produces no output to STDOUT.

    -v
            Selects verbose output during processing/compilation.

    -x Command
            Specifies the command used for invoking the instrumented
            program. All the arguments after the -x flag are used for the
            invocation.

    -O2
            Employ a program-reordering technique in which the original
            structure of the program, including traceback entries, is
            preserved.

    -O3
            Employ global reordering techniques that do not preserve debug
            information.
The compilers include an optional "-qfdpr" flag that assists FDPR analysis
but is not required for it.

Large Page Settings:
====================

vmo command options (AIX 5.2 & above):

    -r
            Apply changes at the system boot.

    -o lgpg_regions=#
            Specifies the number of large pages to reserve.
            Example: #=200 allows 200 large pages to be reserved.

    -o lgpg_size=#
            Specifies the size in bytes of the hardware-supported large
            pages. Example: #=16777216 is a 16M page size.

vmtune/vmtune64 command options (AIX 5.1 Only):

    -g
            Sets the page size for the large page.
            Example: -g 16777216 for 16M page.

    -L
            Sets the number of large pages.
            Example: -L 200 allows 200 large pages.

    -y1
            Enables memory affinity.

chuser capabilities=CAP_BYPASS_RAC_VMM,CAP_PROPAGATE $USER
        Allows $USER (non-root ID) to access the large pages that are
        available. It takes effect on next login.

bosboot -a
        Creates a boot image used on the next system reboot.

shutdown -rF
        Halt the operating system and reboot.

Shared Memory Pinning:
======================

vmo command options (AIX 5.2 & above):

    -r
            Apply changes at the system boot.

    -o v_pinshm=1
            Shared memory segments are "pinned" in the sense that the
            allocated pages cannot be swapped out of memory.

Memory Affinity:
================

vmo command options (AIX 5.2 & above):

    -r
            Apply changes at the system boot.

    -o memory_affinity=1
            Enable the VMM to restrict the memory frames attached to the
            executing MCM.

Note that the system needs to be rebooted to activate the Memory Affinity
feature, i.e. "bosboot -a; shutdown -r" as described above, and the Large
Page, Shared Memory Pinning, and Memory Affinity options can be used
together. Memory Affinity is active by default in AIX Version 5.2 5765-H62
(05/2003) and above. In all cases the "MEMORY_AFFINITY" environment
variable, defined below, needs to be set for the job that is running.
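The large page, pinning, and memory affinity settings above can be combined
into one sequence. The following is a dry-run sketch only: it prints the
commands named in the text and does not execute them. The helper names
(large_page_count, print_vmo_plan) and the 3200 MB target are hypothetical
and chosen for illustration; the 16777216-byte page size comes from the
example above.

```shell
#!/bin/sh
# Dry-run sketch: prints the AIX commands described above, runs none of them.
LGPG_SIZE=16777216    # 16M large page, in bytes (value from the text)

# Round a byte count up to a whole number of large pages.
# (hypothetical helper, not part of AIX)
large_page_count() {
    echo $(( ($1 + LGPG_SIZE - 1) / LGPG_SIZE ))
}

# Print the tuning sequence for a given number of large pages.
# (hypothetical helper, not part of AIX)
print_vmo_plan() {
    echo "vmo -r -o lgpg_regions=$1 -o lgpg_size=$LGPG_SIZE"
    echo "vmo -r -o v_pinshm=1"
    echo "vmo -r -o memory_affinity=1"
    echo "bosboot -a"
    echo "shutdown -rF"
}

# Assumed target for illustration: 3200 MB of large-page memory.
pages=$(large_page_count $((3200 * 1024 * 1024)))
print_vmo_plan "$pages"
```

With the assumed 3200 MB target the sketch asks for 200 large pages, which
matches the lgpg_regions example given above.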
AIX Environment Variables:
==========================

LDR_CNTRL=LARGE_PAGE_DATA=M
        Asserts that there are sufficient large pages available for program
        data, allowing them to be allocated on first reference, instead of
        allocating all of them at load time.

MEMORY_AFFINITY=MCM
        Turn on Memory Affinity, which has been enabled with the vmo
        command.

MALLOCMULTIHEAP=1
        Maintains multiple heaps in the process, for servicing simultaneous
        "malloc" requests.

OMP_DYNAMIC=FALSE
        Disables dynamic adjustment of the number of available threads.

OMP_NUM_THREADS=...
        The exact number of threads available to be used, or if OMP_DYNAMIC
        is TRUE, the upper limit on the number of available threads.

XLFRTEOPTS=NAMELIST=OLD
        Allows a newly compiled program to read the namelist from a binary
        compiled with the older namelist format.

XLFRTEOPTS=intrinthds={num_threads}
        Specifies the number of threads for parallel execution of the
        MATMUL and RANDOM_NUMBER intrinsic procedures. The default value
        for num_threads when using the MATMUL intrinsic equals the number
        of processors online. The default value for num_threads when using
        the RANDOM_NUMBER intrinsic is equal to the number of processors
        online*2. Changing the number of threads available to the MATMUL
        and RANDOM_NUMBER intrinsic procedures can influence performance.

XLSMPOPTS
        A list of runtime settings affecting SMP execution. Here are some
        of the possibilities:

        SCHEDULE=STATIC
                Work is scheduled to threads round-robin.

        SPINS=0
                Allows work-requests to spin indefinitely without the
                thread having to yield the time-slice.

        STACK=....
                Specifies the largest allowable size of a thread's stack,
                in bytes.

        YIELDS=0
                Allows the thread to yield an indefinite number of times
                without being driven into a sleep state.

        STARTPROC=0
                When assigning threads to CPUs, begin with thread 0 on
                CPU 0.

        STRIDE=X
                When assigning the next thread to a CPU, add X to the
                current CPU index instead of using (CPU+1).
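As an illustration of how the variables above might be set together before
a run, here is a minimal sketch. The thread count of 4 and STRIDE=1 are
assumed values, not taken from any submission. Joining several XLSMPOPTS or
XLFRTEOPTS settings with colons is how the IBM runtime is understood to
accept multiple suboptions; treat that separator as an assumption to check
against the compiler documentation.

```shell
#!/bin/sh
# Sketch only: variable names come from the list above, values are examples.
NTHREADS=4    # assumed thread count, for illustration

export OMP_DYNAMIC=FALSE
export OMP_NUM_THREADS=$NTHREADS
export MEMORY_AFFINITY=MCM
export MALLOCMULTIHEAP=1
export LDR_CNTRL=LARGE_PAGE_DATA=M

# Several suboptions in one variable, joined with ':' (assumed separator).
export XLSMPOPTS="SCHEDULE=STATIC:SPINS=0:YIELDS=0:STARTPROC=0:STRIDE=1"
export XLFRTEOPTS="NAMELIST=OLD:intrinthds=$NTHREADS"

# Show what a job started from this shell would inherit.
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
echo "XLSMPOPTS=$XLSMPOPTS"
echo "XLFRTEOPTS=$XLFRTEOPTS"
```

Keeping the thread count in one shell variable avoids OMP_NUM_THREADS and
intrinthds drifting apart when the value is changed.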
System & Process Management:
============================

The following commands are used to bind processes to processors in SPEC/CPU
runs. The SPEC/CPU harness uses the $SPECUSERNUM variable to enumerate the
different processes in a rate-run; in the text of the SPEC/CPU config-file,
this is expressed as "\$SPECUSERNUM" in order for the variable-name to be
evaluated at runtime.

bindprocessor X Y
        AIX command, binding process X to CPU Y.

smtctl -m on -w boot
smtctl -m off -w boot
        AIX commands enabling & disabling SMT (Simultaneous
        Multi-Threading), which allows a single CPU core to process
        multiple execution threads simultaneously. These forms of the
        command must be followed by a "bosboot -a" command and a
        "shutdown -r" reboot.

drmgr -r -c cpu
        AIX command, deallocating one processor from the Operating System
        partition so it is not available for computation. The processors
        are reallocated on system reboot.
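The mapping from a rate-run copy number to a bindprocessor call can be
sketched as below. This is a dry run that only prints commands. It assumes
that copy numbers ($SPECUSERNUM) start at 0 and that copies are spread
across CPUs with a fixed stride; the helper name bind_command, the stride
of 2, and the process IDs are hypothetical.

```shell
#!/bin/sh
# Dry-run sketch: prints one bindprocessor command per copy, binds nothing.

# bind_command PID COPYNUM STRIDE
# Prints the command that would bind process PID to CPU COPYNUM*STRIDE.
# (hypothetical helper, not part of AIX or the SPEC tools)
bind_command() {
    echo "bindprocessor $1 $(( $2 * $3 ))"
}

# Assumed rate run with 4 copies on every second logical CPU.
copy=0
while [ "$copy" -lt 4 ]; do
    # 1000+copy stands in for the real process ID of each copy.
    bind_command $((1000 + copy)) "$copy" 2
    copy=$((copy + 1))
done
```

A stride of 1 reduces this to binding copy N to CPU N.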