IBM Linux Flag Disclosure
SPEC CPU2000 & OMP2001
Last Revised 17 October, 2006

Source Level Portability Options
================================

-DHOST_WORDS_BIG_ENDIAN (176.gcc)
    Host system is big-endian.

-DLINUX_PPC32 (186.crafty)
    Sets basic parameters such as endianness, OS type, and ANSI
    language extensions to be compatible with a Linux system.

-DHAS_ERRLIST (252.eon)
    Indicates that the system provides the "sys_nerr" and
    "sys_errlist[]" variables.

-DSPEC_CPU2000_LINUX_PPC32 (253.perlbmk)
    Compile the SPEC CPU2000 modified perl for a Linux system.

-DSPEC_CPU2000_NEED_BOOL (253.perlbmk)
    Use the SPEC-provided definition of the boolean type.

-DSPEC_CPU2000_LP64 (252.eon and 253.perlbmk - peak)
    Allow compilation of the program in 64-bit mode.

-DSYS_STRING_H (254.gap)
    Do not explicitly include string.h.

-DSYS_IS_USG (254.gap)
    Indicates that the operating system is USG compliant.

-DSYS_HAS_IOTCL_PROTO (254.gap)
    Do not explicitly declare ioctl().

-DSYS_HAS_CALLOC_PROTO (254.gap)
    Do not supply a prototype for calloc().

-DHAVE_SIGNED_CHAR (300.twolf)
    System allows the signed char type.

Compiler Invocation
===================

xlc
    Invokes the compiler for C source files with a default language
    level of ansi and specifies that it allow type-based aliasing.

cc
    Invokes the compiler for C source files with a default language
    level of extended and specifies that it provide compatibility
    with older IBM compilers and allow placement of string literals
    or constant values in read/write storage. cc does not conform to
    the ISO/ANSI C standard.

xlC
    Invokes the compiler for C++ source files with a default language
    level of ansi and specifies that it allow type-based aliasing.

xlc_r
    The same as "xlc" except that it generates a threadsafe
    executable, compliant with the POSIX pthreads API.

xlf
    Invokes the compiler for Fortran source files with a default
    language of Fortran 77.

xlf_r
    The same as "xlf" except that it generates a threadsafe
    executable, compliant with the POSIX pthreads API.
xlf90
    Invokes the compiler for Fortran source files with a default
    language of Fortran 90.

xlf90_r
    The same as "xlf90" except that it generates a threadsafe
    executable, compliant with the POSIX pthreads API.

cleanpdf
    Erases any information in the PDF directory to ensure that no
    feedback information is reused between compilations.

Compiler Options
================

-ma
    Use the built-in alloca() function.

-O
    Performs the optimizations that the compiler developers
    considered the best combination for compilation speed and
    runtime performance.

-O3
    Performs some memory and compile time intensive optimizations in
    addition to those executed with -O. The -O3 specific
    optimizations have the potential to slightly alter the semantics
    of a user's program. Optimizations may include, but are not
    limited to:
        Aggressive code motion, and scheduling on computations that
        have the potential to raise an exception, but no valid
        exceptions will be suppressed;
        Relaxed conformance to IEEE rules in cases where the
        difference in the results is not important to an
        application;
        Rewriting of floating point expressions.

-O4
    Equivalent to -O3 -qipa -qhot with automatic generation of the
    architecture ( -qarch= ) and tuning ( -qtune= ) options ideal for
    that platform. The -qipa level defaults to level=1.

-O5
    Equivalent to -O3 -qipa=level=2 -qhot with automatic generation
    of the architecture ( -qarch= ) and tuning ( -qtune= ) options
    ideal for that platform.

-Q, -qinline
    The -Q option without any list inlines all appropriate
    procedures, subject to limits on the number of inlined calls and
    on the resulting increase in code size. -qinline is an alias for
    -Q.

-Q=xxx
    Inline all functions that contain fewer than xxx lines of
    abstract code units.

-q64
    Selects 64-bit compiler mode.

-q32
    Selects 32-bit compiler mode.

-qalign=natural
    The compiler maps structure members to their natural boundaries.
-qalign=linuxppc
    The compiler maps structure members according to the default
    GNU C/C++ alignment rules.

-qansialias
    Use type-based aliasing during optimization.

-qarch=ppc
    Produces object code containing instructions that will run on
    any of the 32-bit PowerPC hardware platforms.

-qarch=pwr3
    Produces object code containing instructions that will run on
    POWER3 processors.

-qarch=pwr4
    Produces object code containing instructions that will run on
    POWER4/POWER4+ processors.

-qarch=pwr5
    Produces object code containing instructions that will run on
    POWER5 processors.

-qarch=pwr5x
    Produces object code containing instructions that will run on
    POWER5+ processors.

-qarch=rs64b
    Produces object code containing instructions that will run on
    RS64-II processors.

-qarch=auto
    Produces object code containing instructions that will run on
    the hardware platform on which the program is compiled.

-qdatalocal
    Changes the default to assume that all variables are local.

-qessl
    Specifies that, if either -lessl or -lesslsmp is also specified,
    Engineering and Scientific Subroutine Library (ESSL) routines
    should be used in place of some Fortran 90 intrinsic procedures
    when there is a safe opportunity to do so.

-qlibessl
    Specifies that all functions whose names match ESSL library
    functions are, in fact, the library functions.

-qfdpr
    Collect information about programs for use with the AIX fdpr
    (Feedback Directed Program Restructuring) performance-tuning
    utility.

-qfixed
    Indicates that the input source program is in fixed form. Allows
    fixed format Fortran 77 programs to be compiled using the xlf90
    compiler invocation.

-qfixed=
    States that the Fortran code is in fixed source form, with an
    optional argument specifying the maximum line length.

-qfloat=rsqrt
    Changes a division by the result of a square root operation into
    a multiplication by the reciprocal of the square root.

-qhot
    Perform high-order transformations on loops during optimization.
-qhot=arraypad
    Pad the sizes of arrays to align better in cache.

-qipa=level=1
    Turns on interprocedural analysis with inlining, limited alias
    analysis, and limited call-site tailoring. This is the default
    level of -qipa.

-qipa=level=2
    Turns on interprocedural analysis with inlining, cloning, full
    alias analysis, constant propagation, call-site tailoring, and
    dead code removal.

-qipa=noobject
    Do not generate object files during the first stage of
    interprocedural analysis.

-qinline
    Alias for -Q. See -Q.

-qipa=partition=large
    Specifies the size of the regions within the program to analyze.
    Larger partitions contain more procedures, which results in
    better interprocedural analysis but requires more storage to
    optimize.

-qipa=threads=n
    Tells the compiler it can use n threads during interprocedural
    analysis and code generation.

-qlanglvl=ansi
    Compilation conforms to the ANSI standard.

-qlargepage
    Indicates that a program designed to execute in a large page
    memory environment can take advantage of the large 16 MB pages
    provided on POWER4 or better CPUs.

-qmaxmem=-1
    Allows the compiler to use as much memory as it needs to
    execute.

-qpdf1/pdf2
    Profile directed feedback optimization.

-qsave
    Sets the default storage class for local variables to STATIC.

-qsmp=omp
    Enable OpenMP parallelization directives.

-qstaticlink
    Objects generated with this compiler option will link only with
    static libraries.

-qsuffix=f=f90
    Sets the suffix for source files to .f90. The .f90 suffix is
    required by xlf90 to compile Fortran 90 programs.

-qtune=604
    Instruction selection, scheduling, and other implementation
    dependent performance enhancements for the PowerPC 604/604e
    processor.

-qtune=pwr3
    Instruction selection, scheduling, and other implementation
    dependent performance enhancements for the POWER3 processor.

-qtune=pwr4
    Instruction selection, scheduling, and other implementation
    dependent performance enhancements for the POWER4/POWER4+
    processors.
-qtune=pwr5
    Instruction selection, scheduling, and other implementation
    dependent performance enhancements for the POWER5 processors.

-qtune=pwr5x
    Instruction selection, scheduling, and other implementation
    dependent performance enhancements for the POWER5+ processors.

-qtune=rs64b
    Instruction selection, scheduling, and other implementation
    dependent performance enhancements for the RS64-II processor.

-qtune=auto
    Instruction selection, scheduling, and other implementation
    dependent performance enhancements for the hardware platform on
    which the program is compiled.

-qunroll=n
    Unrolls inner loops in the program by a factor of n.

-w
    Suppress warning messages from the C, C++, and Fortran
    compilers.

Linker Options
==============

-B/usr/share/libhugetlbfs/ -tl -Wl,--hugetlbfs-link=BDT
    Enables use of the libhugetlbfs "ld" script in place of the
    normal linker. BDT links the application so that text,
    initialized data, and bss data are stored in hugepages.
    libhugetlbfs is installed with
        make install PREFIX=/usr

-lessl
    Link the Engineering and Scientific Subroutine Library (ESSL).

-lmass
    Link the Mathematical Acceleration Subsystem (MASS) libraries,
    which contain tuned mathematical intrinsic functions.

-lpdf
    Routines used in the first pass of the profile directed feedback
    process. Routines from this library are not used in building the
    final executable. In newer compilers, -qpdf1 does this
    automatically, so using this in conjunction with -qpdf1 is
    redundant.

Linux Environment Variables
===========================

HUGETLB_MORECORE=yes
    Enables the libhugetlbfs hugepage malloc() feature, instructing
    libhugetlbfs to override libc's normal morecore() function with
    a hugepage version and use it for malloc(). From the SourceForge
    libhugetlbfs Version 1 product.

LD_PRELOAD=libhugetlbfs.so
    Tells the dynamic linker to load the libhugetlbfs shared
    library, even though the program wasn't originally linked
    against it. Enables HUGETLB_MORECORE processing.
MALLOCMULTIHEAP=1
    Maintains multiple heaps in the process, for servicing
    simultaneous "malloc" requests.

OMP_DYNAMIC=FALSE
    Disables dynamic adjustment of the number of available threads.

OMP_NUM_THREADS=...
    The exact number of threads available to be used, or if
    OMP_DYNAMIC is TRUE, the upper limit on the number of available
    threads.

XLFRTEOPTS=NAMELIST=OLD
    Allows a newly compiled program to read the namelist from a
    binary compiled with the older namelist format.

XLFRTEOPTS=intrinthds={num_threads}
    Specifies the number of threads for parallel execution of the
    MATMUL and RANDOM_NUMBER intrinsic procedures. The default value
    for num_threads when using the MATMUL intrinsic equals the
    number of processors online. The default value for num_threads
    when using the RANDOM_NUMBER intrinsic equals the number of
    processors online*2. Changing the number of threads available to
    the MATMUL and RANDOM_NUMBER intrinsic procedures can influence
    performance.

XLSMPOPTS
    A list of runtime settings affecting SMP execution. Here are
    some of the possibilities:

    SCHEDULE=STATIC
        Work is scheduled to threads round-robin.

    SPINS=0
        Allows work-requests to spin indefinitely without the thread
        having to yield the time-slice.

    STACK=....
        Specifies the largest allowable size of a thread's stack, in
        bytes.

    YIELDS=0
        Allows the thread to yield an indefinite number of times
        without being driven into a sleep state.

    STARTPROC=0
        When assigning threads to CPUs, begin with thread 0 on
        CPU 0.

    STRIDE=X
        When assigning the next thread to a CPU, add X to the
        current CPU index instead of using (CPU+1).

Stack Size Information:
=======================

Stack size set to unlimited using the command "ulimit -s unlimited".
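
One way to confirm that the stack limit took effect in the shell that launches a run is to query the soft limit from a child process. The following is a minimal sketch and not part of the disclosed configuration; it assumes a Linux system with Python available, and the helper name stack_limit_description is hypothetical.

```python
import resource


def stack_limit_description():
    """Return the soft stack limit of this process as a readable string.

    A child process inherits the limit set by "ulimit -s" in its parent
    shell, so running this from that shell reflects the shell's setting.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    if soft == resource.RLIM_INFINITY:
        return "unlimited"
    # getrlimit reports bytes; "ulimit -s" reports units of 1024 bytes.
    return f"{soft // 1024} KB"


if __name__ == "__main__":
    print("stack size:", stack_limit_description())
```

Run from the same shell after "ulimit -s unlimited", the script is expected to report "unlimited"; any other value suggests the limit was not raised in that shell.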