OMP2012 Flag Description
Supermicro SuperServer SYS-221H-TN24R (INTEL XEON PLATINUM 8592+)

Copyright © 2012 Intel Corporation. All Rights Reserved.


Base Compiler Invocation

C benchmarks

C++ benchmarks

Fortran benchmarks


Base Portability Flags

350.md

357.bt331

363.swim

367.imagick


Base Optimization Flags

C benchmarks

C++ benchmarks

Fortran benchmarks


Implicitly Included Flags

This section contains descriptions of flags that were included implicitly by other flags, but which do not have a permanent home at SPEC.


Shell, Environment, and Other Software Settings

This result has been formatted using multiple flags files. The "sw environment" from each of them appears next.


Sw environment from Supermicro-ic2022.linux64-oneAPI

SPEC OMP2012 Flag Description for the Intel(R) C/C++ Compiler for IA32 and Intel 64 applications and Intel(R) Fortran Compiler for IA32 and Intel 64 applications

OpenMP Tuning Flags

  • KMP_AFFINITY

    The KMP_AFFINITY environment variable uses the following general syntax:

    Syntax

    KMP_AFFINITY=[<modifier>,...]<type>[,<permute>][,<offset>]

    For example, to list a machine topology map, specify KMP_AFFINITY=verbose,none to use a modifier of verbose and a type of none.
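    As an illustration of the syntax above, the following sketch assembles a KMP_AFFINITY value from its parts. The helper name build_kmp_affinity is hypothetical and is not part of any Intel tool; the sketch assumes the compact or scatter types when integers are supplied.

```python
def build_kmp_affinity(affinity_type, modifiers=(), permute=None, offset=None):
    """Assemble [<modifier>,...]<type>[,<permute>][,<offset>] as one string.

    Hypothetical helper for illustration only; it performs no validation
    of the modifier or type names.
    """
    parts = list(modifiers) + [affinity_type]
    if offset is not None and permute is None:
        # An offset is the second integer, so emit an explicit permute of 0 first.
        permute = 0
    if permute is not None:
        parts.append(str(permute))
    if offset is not None:
        parts.append(str(offset))
    return ",".join(parts)


print(build_kmp_affinity("none", modifiers=["verbose"]))
print(build_kmp_affinity("compact", modifiers=["granularity=fine"], permute=1, offset=0))
```

    The resulting string would typically be exported as the KMP_AFFINITY environment variable before the OpenMP program starts.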

    The following table describes the supported specific arguments.

    Argument: modifier
    Default: noverbose, respect, granularity=core
    Description: Optional. String consisting of keyword and specifier.

    • granularity=<specifier>
      takes the following specifiers: fine, thread, and core
    • norespect
    • noverbose
    • nowarnings
    • proclist={<proc-list>}
    • respect
    • verbose
    • warnings

    Argument: type
    Default: none
    Description: Required string. Indicates the thread affinity to use.

    • compact
    • disabled
    • explicit
    • none
    • scatter
    • logical (deprecated; instead use compact, but omit any permute value)
    • physical (deprecated; instead use scatter, possibly with an offset value)

    The logical and physical types are deprecated but supported for backward compatibility.

    Argument: permute
    Default: 0
    Description: Optional. Positive integer value. Not valid with type values of explicit, none, or disabled.

    Argument: offset
    Default: 0
    Description: Optional. Positive integer value. Not valid with type values of explicit, none, or disabled.

    Affinity Types

    Type is the only required argument.

    type = none (default)

    Does not bind OpenMP threads to particular thread contexts; however, if the operating system supports affinity, the compiler still uses the OpenMP thread affinity interface to determine machine topology. Specify KMP_AFFINITY=verbose,none to list a machine topology map.

    type = compact

    Specifying compact assigns OpenMP thread <n>+1 to a free thread context as close as possible to the thread context where OpenMP thread <n> was placed. For example, in a topology map, the nearer a node is to the root, the more significance the node has when sorting the threads.

    type = disabled

    Specifying disabled completely disables the thread affinity interfaces. This forces the OpenMP run-time library to behave as if the affinity interface was not supported by the operating system. This includes the low-level API interfaces such as kmp_set_affinity and kmp_get_affinity, which have no effect and will return a nonzero error code.

    type = explicit

    Specifying explicit assigns OpenMP threads to a list of OS proc IDs that have been explicitly specified by using the proclist= modifier, which is required for this affinity type.

    type = scatter

    Specifying scatter distributes the threads as evenly as possible across the entire system. scatter is the opposite of compact; so the leaves of the node are most significant when sorting through the machine topology map.

    Deprecated Types: logical and physical

    Types logical and physical are deprecated and may become unsupported in a future release. Both are supported for backward compatibility.

    For logical and physical affinity types, a single trailing integer is interpreted as an offset specifier instead of a permute specifier. In contrast, with compact and scatter types, a single trailing integer is interpreted as a permute specifier.

    Specifying logical assigns OpenMP threads to consecutive logical processors, which are also called hardware thread contexts. The type is equivalent to compact, except that the permute specifier is not allowed. Thus, KMP_AFFINITY=logical,n is equivalent to KMP_AFFINITY=compact,0,n (this equivalence holds whether or not a granularity=fine modifier is present).

    Permute and offset combinations

    For both compact and scatter, permute and offset are allowed; however, if you specify only one integer, the compiler interprets the value as a permute specifier. Both permute and offset default to 0.  

    The permute specifier controls which levels are most significant when sorting the machine topology map. A value for permute forces the mappings to make the specified number of most significant levels of the sort the least significant, and it inverts the order of significance. The root node of the tree is not considered a separate level for the sort operations.

    The offset specifier indicates the starting position for thread assignment.
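    The rules for trailing integers described in this section can be sketched as follows. The function name split_trailing_integers is hypothetical; the sketch only encodes the rules stated in this document and is not the run-time library's actual parser.

```python
def split_trailing_integers(affinity_type, integers):
    """Return (permute, offset) for the integers that follow the type.

    Illustrative only: compact/scatter read a single integer as permute,
    logical/physical read a single integer as offset, and the remaining
    types accept no integers at all.
    """
    integers = list(integers)
    if affinity_type in ("compact", "scatter"):
        if len(integers) > 2:
            raise ValueError("at most permute and offset may be given")
        permute = integers[0] if len(integers) >= 1 else 0
        offset = integers[1] if len(integers) == 2 else 0
        return permute, offset
    if affinity_type in ("logical", "physical"):
        if len(integers) > 1:
            raise ValueError("permute is not allowed with logical or physical")
        offset = integers[0] if integers else 0
        return 0, offset
    if integers:
        raise ValueError("permute and offset are not valid with " + affinity_type)
    return 0, 0


print(split_trailing_integers("compact", [3]))
print(split_trailing_integers("logical", [3]))
```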

    Modifier Values for Affinity Types

    Modifiers are optional arguments that precede type. If you do not specify a modifier, the noverbose, respect, and granularity=core modifiers are used automatically.

    Modifiers are interpreted in order from left to right, and can negate each other. For example, KMP_AFFINITY=verbose,noverbose,scatter is equivalent to KMP_AFFINITY=noverbose,scatter, or just KMP_AFFINITY=scatter.
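    The left-to-right interpretation of modifiers can be illustrated with a minimal sketch. The name resolve_modifiers and the dictionary layout are assumptions made for this example; only the three defaults named in this document are pre-populated.

```python
def resolve_modifiers(modifiers):
    """Apply modifiers left to right so that later ones override earlier ones.

    Illustrative only. Starts from the documented defaults
    (noverbose, respect, granularity=core).
    """
    settings = {"verbose": False, "respect": True, "granularity": "core"}
    for modifier in modifiers:
        if "=" in modifier:
            # Keyword with specifier, e.g. granularity=fine or proclist={...}
            key, value = modifier.split("=", 1)
            settings[key] = value
        elif modifier.startswith("no"):
            # noverbose, norespect, nowarnings switch the setting off
            settings[modifier[2:]] = False
        else:
            settings[modifier] = True
    return settings


print(resolve_modifiers(["verbose", "noverbose"]))
print(resolve_modifiers(["granularity=fine", "verbose"]))
```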

    modifier = noverbose (default)

    Does not print verbose messages.

    modifier = verbose

    Prints messages concerning the supported affinity. The messages include information about the number of packages, number of cores in each package, number of thread contexts for each core, and OpenMP thread bindings to physical thread contexts.

    Information about binding OpenMP threads to physical thread contexts is indirectly shown in the form of the mappings between hardware thread contexts and the operating system (OS) processor (proc) IDs. The affinity mask for each OpenMP thread is printed as a set of OS processor IDs.

  • KMP_LIBRARY

    KMP_LIBRARY = { throughput | turnaround | serial }. Selects the OpenMP run-time library execution mode. The options for the variable value are throughput, turnaround, and serial.

    Execution modes

    The compiler with OpenMP enables you to run an application under different execution modes that can be specified at run time. The libraries support the serial, turnaround, and throughput modes.

    Serial

    The serial mode forces parallel applications to run on a single processor.

    Turnaround

    In a dedicated (batch or single user) parallel environment where all processors are exclusively allocated to the program for its entire run, it is most important to effectively utilize all of the processors all of the time. The turnaround mode is designed to keep active all of the processors involved in the parallel computation in order to minimize the execution time of a single job. In this mode, the worker threads actively wait for more parallel work, without yielding to other threads.

    Avoid over-allocating system resources. This occurs if either too many threads have been specified, or if too few processors are available at run time. If system resources are over-allocated, this mode will cause poor performance. The throughput mode should be used instead if this occurs.

    Throughput

    In a multi-user environment where the load on the parallel machine is not constant or where the job stream is not predictable, it may be better to design and tune for throughput. This minimizes the total time to run multiple jobs simultaneously. In this mode, the worker threads will yield to other threads while waiting for more parallel work.

    The throughput mode is designed to make the program aware of its environment (that is, the system load) and to adjust its resource usage to produce efficient execution in a dynamic environment. This mode is the default.

  • KMP_BLOCKTIME

    KMP_BLOCKTIME = value. Sets the time, in milliseconds, that a thread should wait, after completing the execution of a parallel region, before sleeping. Use the optional character suffixes s (seconds), m (minutes), h (hours), or d (days) to specify the units. Specify infinite for an unlimited wait time.
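    To make the unit suffixes concrete, the following sketch converts a KMP_BLOCKTIME value to milliseconds. The function name blocktime_to_ms is hypothetical, and representing infinite as math.inf is a choice made for this example only.

```python
import math

# Milliseconds per unit suffix, as listed in the description above.
UNIT_MS = {"s": 1000, "m": 60 * 1000, "h": 3600 * 1000, "d": 86400 * 1000}


def blocktime_to_ms(value):
    """Convert a KMP_BLOCKTIME string to milliseconds (illustrative only)."""
    text = value.strip().lower()
    if text == "infinite":
        return math.inf
    if text and text[-1] in UNIT_MS:
        return int(text[:-1]) * UNIT_MS[text[-1]]
    # No suffix: the value is already in milliseconds.
    return int(text)


print(blocktime_to_ms("200"))
print(blocktime_to_ms("2s"))
```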

  • KMP_STACKSIZE

    KMP_STACKSIZE = value. Sets the number of bytes to allocate for each OpenMP* thread to use as the private stack for the thread. Recommended size is 16m. Use the optional suffixes: b (bytes), k (kilobytes), m (megabytes), g (gigabytes), or t (terabytes) to specify the units. This variable does not affect the native operating system threads created by the user program nor the thread executing the sequential part of an OpenMP* program or parallel programs created using -parallel.
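    The size suffixes can be illustrated in the same way. This sketch assumes binary multiples (1k = 1024 bytes) and requires an explicit suffix, because the unit applied to a bare number is not stated here; the function name stacksize_to_bytes is hypothetical.

```python
# Bytes per unit suffix. Binary multiples are an assumption of this sketch.
UNIT_BYTES = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def stacksize_to_bytes(value):
    """Convert a suffixed KMP_STACKSIZE string to bytes (illustrative only)."""
    text = value.strip().lower()
    if not text or text[-1] not in UNIT_BYTES:
        raise ValueError("this sketch requires one of the suffixes b, k, m, g, t")
    return int(text[:-1]) * UNIT_BYTES[text[-1]]


print(stacksize_to_bytes("16m"))
```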

  • OMP_NUM_THREADS

    Sets the maximum number of threads to use for OpenMP* parallel regions if no other value is specified in the application. This environment variable applies to both -openmp and -parallel. Example syntax on a Linux system with 8 cores: export OMP_NUM_THREADS=8
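    When a benchmark is launched from a script rather than an interactive shell, the variable can be placed in the child process environment. The sketch below only prepares that environment; the helper name openmp_env is hypothetical, and falling back to os.cpu_count() is an assumption of this example, not a documented default.

```python
import os


def openmp_env(num_threads=None, base_env=None):
    """Return a copy of the environment with OMP_NUM_THREADS set (illustrative only)."""
    env = dict(os.environ if base_env is None else base_env)
    if num_threads is None:
        # Assumption: use the number of logical CPUs reported by the OS.
        num_threads = os.cpu_count() or 1
    env["OMP_NUM_THREADS"] = str(num_threads)
    return env


env = openmp_env(8)
print(env["OMP_NUM_THREADS"])
```

    The returned dictionary could then be passed as the env argument of subprocess.run when starting the OpenMP program.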

  • OMP_DYNAMIC

    OMP_DYNAMIC={ 1 | 0 }. Enables (1, true) or disables (0, false) the dynamic adjustment of the number of threads.

  • OMP_SCHEDULE

    OMP_SCHEDULE={ type[,chunk size] }. Controls the scheduling of the for-loop work-sharing construct. The type can be one of static, dynamic, guided, or runtime. The chunk size, if given, should be a positive integer.
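    A minimal sketch of how such a value could be validated before a run is shown below. The function name parse_omp_schedule is hypothetical, and the set of accepted types is taken directly from the description above.

```python
# Schedule types as listed in this document.
SCHEDULE_TYPES = ("static", "dynamic", "guided", "runtime")


def parse_omp_schedule(value):
    """Split 'type[,chunk]' into (type, chunk or None); illustrative only."""
    kind, _, chunk = value.partition(",")
    kind = kind.strip().lower()
    if kind not in SCHEDULE_TYPES:
        raise ValueError("unknown schedule type: " + kind)
    if not chunk.strip():
        return kind, None
    chunk_size = int(chunk)
    if chunk_size <= 0:
        raise ValueError("chunk size should be a positive integer")
    return kind, chunk_size


print(parse_omp_schedule("dynamic,4"))
print(parse_omp_schedule("static"))
```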

  • OMP_NESTED

    OMP_NESTED={ 1 | 0 }. Enables creation of new teams in case of nested parallel regions (1, true) or serializes all nested parallel regions (0, false). Default is 0.


Sw environment from Supermicro-Platform-Settings-V1.2-SPR-revF

    SPEC CPU2017 Platform Settings for Supermicro Systems

    Operating System Tuning settings

    One or more of the following settings may have been applied to the testbed. If so, the "Platform Notes" section of the report will say so; and you can read below to find out more about what these settings mean.

    LD_LIBRARY_PATH=<directories> (linker)
    LD_LIBRARY_PATH controls the search order for both the compile-time and run-time linkers. Usually, it can be defaulted; but testers may sometimes choose to explicitly set it (as documented in the notes in the submission), in order to ensure that the correct versions of libraries are picked up.

    STACKSIZE=<n>
    Set the size of the stack (temporary storage area) for each slave thread of a multithreaded program.

    ulimit -s <n>
    Sets the stack size to n kbytes, or "unlimited" to allow the stack size to grow without limit.
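    The limit that ulimit -s reports can also be inspected from a script. The following sketch uses Python's standard resource module (available on Unix-like systems) and converts the byte value it returns to kbytes; the function name stack_limit_kbytes is hypothetical.

```python
import resource


def stack_limit_kbytes():
    """Return the soft stack limit in kbytes, or 'unlimited' (illustrative only)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    if soft == resource.RLIM_INFINITY:
        return "unlimited"
    # getrlimit reports bytes; ulimit -s works in units of 1024 bytes.
    return soft // 1024


print(stack_limit_kbytes())
```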


    Operating System Tuning Parameters

    Transparent Hugepages (THP)
    THP is an abstraction layer that automates most aspects of creating, managing, and using huge pages. It is designed to hide much of the complexity in using huge pages from system administrators and developers. Huge pages increase the memory page size from 4 kilobytes to 2 megabytes. This provides significant performance advantages on systems with highly contended resources and large memory workloads. If memory utilization is too high or memory is badly fragmented which prevents hugepages being allocated, the kernel will assign smaller 4k pages instead. Most recent Linux OS releases have THP enabled by default.
    THP usage is controlled by the sysfs setting /sys/kernel/mm/transparent_hugepage/enabled.
    Possible values: always, madvise, never.
    THP creation is controlled by the sysfs setting /sys/kernel/mm/transparent_hugepage/defrag.
    Possible values: always, defer, defer+madvise, madvise, never. An application that "always" requests THP often can benefit from waiting for an allocation until those huge pages can be assembled.
    For more information see the Linux transparent hugepage documentation.
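    On a running Linux system, these sysfs files list every possible value on one line and mark the active one with square brackets. The sketch below extracts that active value; the function names are hypothetical, and the reader returns None when the file is absent (for example on a kernel built without THP).

```python
from pathlib import Path


def active_sysfs_choice(text):
    """Return the bracketed token from a line such as 'always [madvise] never'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def read_thp_setting(name="enabled"):
    """Read the active THP value for 'enabled' or 'defrag' (illustrative only)."""
    path = Path("/sys/kernel/mm/transparent_hugepage") / name
    try:
        return active_sysfs_choice(path.read_text())
    except OSError:
        return None


print(read_thp_setting("enabled"))
print(read_thp_setting("defrag"))
```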
    CPUFreq scaling governor:
    Governors are pre-configured, in-kernel power schemes for the CPU that allow the clock speed of the CPUs to be changed on the fly. On Linux systems, the governor for all CPUs can be set through the cpupower utility, for example with cpupower frequency-set -g <governor>. Governors in the Linux kernel include performance, powersave, ondemand, conservative, userspace, and schedutil.
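    The governor currently in effect can be checked without cpupower by reading the cpufreq files under sysfs. The following sketch assumes the usual Linux layout /sys/devices/system/cpu/cpu<N>/cpufreq/scaling_governor; the function name current_governors is hypothetical, and an empty result simply means no cpufreq driver is exposed.

```python
from pathlib import Path


def current_governors(cpu_root="/sys/devices/system/cpu"):
    """Map each CPU directory name to its active governor (illustrative only)."""
    governors = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for path in sorted(Path(cpu_root).glob(pattern)):
        try:
            governors[path.parent.parent.name] = path.read_text().strip()
        except OSError:
            # Skip CPUs whose file cannot be read (for example, offline CPUs).
            continue
    return governors


print(current_governors())
```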
    tuned-adm:
    A commandline interface for switching between different tuning profiles available in supported Linux distributions. The distribution provided profiles are located in /usr/lib/tuned and the user defined profiles in /etc/tuned. To set a profile, one can issue the command "tuned-adm profile (profile_name)".
    Below are details about some relevant profiles:
    drop_caches
    Writing to this will cause the kernel to drop clean caches, as well as reclaimable slab objects like dentries and inodes. Once dropped, their memory becomes free. Set through "sysctl -w vm.drop_caches=3" to free slab objects and pagecache.

    Firmware / BIOS / Microcode Settings

    Hyper-Threading [ALL]: (Default="Enable")
    Enabled for Windows and Linux (OS optimized for Hyper-Threading Technology) and Disabled for other OS (OS not optimized for Hyper-Threading Technology). When Disabled only one thread per enabled core is enabled.
    Intel Virtualization Technology: (Default = "Enable")
    When enabled, a VMM can utilize the additional hardware capabilities provided by Vanderpool Technology.
    LLC Prefetch: (Default = "Disable")
    The LLC prefetcher is an additional prefetch mechanism on top of the existing prefetchers that prefetch data into the core Data Cache Unit (DCU) and Mid-Level Cache (MLC). Enabling LLC prefetch gives the core prefetcher the ability to prefetch data directly into the LLC without necessarily filling into the MLC.
    DCU IP Prefetcher: (Default = "Enable")
    This L1-cache prefetcher looks for sequential load history (based on the Instruction Pointer of previous loads) and attempts on this basis to determine the next data to be expected and, if necessary, to prefetch this data from the L2 cache or the main memory into the L1 cache.
    DCU Streamer Prefetcher: (Default = "Enable")
    This prefetcher is a L1 data cache prefetcher, which detects multiple loads from the same cache line done within a time limit, in order to then prefetch the next line from the L2 cache or the main memory into the L1 cache based on the assumption that the next cache line will also be needed.
    Power Technology: (Default = "Custom")
    Switches processor power management features. The options are Disable, Energy Efficient, and Custom. Select Disable to disable power-saving settings. Select Energy Efficient to support power-saving mode. Select Custom to define the values of all power management setup items individually.
    Power Performance Tuning: (Default = "OS Controls EPB")
    Allows the OS or BIOS to control the Energy Performance Bias.
    Available options are:
    ENERGY_PERF_BIAS_CFG mode (Energy Performance Bias Setting): (Default = "Balanced Performance")
    This BIOS option allows for processor performance and power optimization.
    Available options are:
    CPU C6 Report: (Default = "Auto")
    Controls whether the BIOS reports the CPU C6 State (ACPI C3) to the operating system. During the CPU C6 State, the power to all caches is turned off.
    Available options are:
    Enhanced Halt State (C1E): (Default = "Enable")
    Power saving feature where, when enabled, idle processor cores will halt.
    Hardware P-states: (Default = "Disable")
    The Hardware P-State setting allows the user to select between OS and hardware-controlled P-states. Selecting Native Mode allows the OS to choose a P-state. Selecting Out of Band Mode allows the hardware to autonomously choose a P-state without OS guidance. Selecting Native Mode with No Legacy Support functions as Native Mode with no support for older hardware.
    SNC: (Default = "Auto")
    Sub-NUMA Clusters (SNC) is a feature that provides similar localization benefits as Cluster-On-Die (COD), without some of COD's downsides. SNC breaks up the LLC into disjoint clusters based on address range, with each cluster bound to a subset of the memory controllers in the system. SNC improves average latency to the LLC.
    KTI Prefetch: (Default = "Auto")
    When KTI Prefetch is set to Enable, the Ultra-Path Interconnect (UPI) Prefetcher will allow the memory read from a remote socket to start earlier, in an effort to reduce latency. Available options are "Auto", "Disable" and "Enable".
    Stale AtoS: (Default = "Auto")
    The in-memory directory has three states: I, A, and S. The I (invalid) state means the data is clean and does not exist in any other socket's cache. The A (snoopAll) state means the data may exist in another socket in exclusive or modified state. The S (Shared) state means the data is clean and may be shared across one or more sockets' caches. When doing a read to memory, if the directory line is in the A state, all the other sockets must be snooped because another socket may have the line in modified state. If this is the case, the snoop will return the modified data. However, it may be the case that a line is read in A state and all the snoops come back as misses. This can happen if another socket read the line earlier and then silently dropped it from its cache without modifying it.
    Available options are:
    LLC Dead Line Alloc: (Default = "Enable")
    In the Skylake-SP non-inclusive cache scheme, MLC evictions are filled into the LLC. When lines are evicted from the MLC, the core can flag them as "dead" (i.e., not likely to be read again). The LLC has the option to drop dead lines and not fill them in the LLC. If the LLC Dead Line Alloc feature is disabled, dead lines will always be dropped and will never fill into the LLC. This can help save space in the LLC and prevent the LLC from evicting useful data. However, if the LLC Dead Line Alloc feature is enabled, the LLC can opportunistically fill dead lines into the LLC if there is free space available. Available options are "Auto", "Enable" and "Disable".
    Enforce DDR Memory Frequency POR: (Default = "POR")
    Setting POR enforces Plan Of Record restrictions for DDR5 frequency and voltage programming. Memory speeds will be capped at Intel guidelines. Disabling allows user selection of additional supported memory speeds. Available options are "POR" and "Disable".
    Memory Frequency: (Default = "Auto")
    Set the maximum memory frequency for onboard memory modules. Available options are "Auto", "3200", "3600", "4000", "4400", "4800", "5200", "5600".
    ADDDC Sparing: (Default = "Enabled")
    Adaptive Double Device Data Correction (ADDDC) Sparing detects the predetermined threshold for correctable errors, copying the contents of the failing DIMM to spare memory. The failing DIMM or memory rank will then be disabled.
    Available options are:
    Patrol Scrub: (Default = "Enable at End of POST")
    Enable or disable the ability to proactively search the system memory, repairing correctable errors.
    Turbo Mode: (Default = "Enable")
    Select Enable to allow the CPU to operate at the manufacturer-defined turbo speed by increasing CPU clock frequency. This feature is available when it is supported by the processors used in the system. The options are Disable and Enable.
    NUMA: (Default = "Enabled")
    Use this feature to enable Non-Uniform Memory Access (NUMA) to enhance system performance. The options are Disabled and Enabled.
    UMA-Based Clustering: (Default = "Quadrant (4-clusters)")
    When this feature is set to Hemisphere, Uniform Memory Access (UMA)-based clustering will support 2-cluster configuration for system performance enhancement. The options are Disabled (All2All), Hemisphere (2-clusters), and Quadrant (4-clusters).

    Flag description origin markings:

    [user] Indicates that the flag description came from the user flags file.
    [suite] Indicates that the flag description came from the suite-wide flags file.
    [benchmark] Indicates that the flag description came from a per-benchmark flags file.

    The flags files that were used to format this result can be browsed at
    http://www.spec.org/omp2012/flags/Supermicro-ic2022.linux64-oneAPI.html,
    http://www.spec.org/omp2012/flags/Supermicro-Platform-Settings-V1.2-SPR-revF.html.

    You can also download the XML flags sources by saving the following links:
    http://www.spec.org/omp2012/flags/Supermicro-ic2022.linux64-oneAPI.xml,
    http://www.spec.org/omp2012/flags/Supermicro-Platform-Settings-V1.2-SPR-revF.xml.


    For questions about the meanings of these flags, please contact the tester.
    For other inquiries, please contact webmaster@spec.org
    Copyright 2012-2023 Standard Performance Evaluation Corporation
    Tested with SPEC OMP2012 v1.1.
    Report generated on Thu Dec 14 09:35:33 2023 by SPEC OMP2012 flags formatter v538.