Sets the stack size to n kbytes, or unlimited to allow the stack size to grow without limit.
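As a minimal sketch, the stack limit can be inspected and raised with the shell's ulimit builtin. The subshell is an illustrative choice so the change does not persist in the calling shell, and raising the limit can fail when the hard limit is lower than the requested value.

```shell
# Show the current soft stack limit (in kbytes, or "unlimited").
current=$(ulimit -s)
echo "current stack limit: $current"

# Try to lift the limit inside a subshell so the parent shell is unaffected.
# This fails if the hard limit is lower than the requested value.
(
    if ulimit -s unlimited 2>/dev/null; then
        echo "stack limit in subshell: $(ulimit -s)"
    else
        echo "could not raise the stack limit (hard limit: $(ulimit -H -s))"
    fi
)
```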
Launching a process with numactl --interleave=all sets the memory interleave policy so that memory is allocated round-robin across nodes. When memory cannot be allocated on the current interleave target, allocation falls back to other nodes.
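The launch line described above might look like the following sketch. The helper name interleaved and the binary ./my_benchmark are hypothetical; the helper prints the command instead of running it, so it can be inspected on a machine without numactl installed.

```shell
# Print the launch line for a command under the interleave policy.
# Printing instead of executing keeps the sketch safe on machines
# without numactl or without NUMA support.
interleaved() {
    echo "numactl --interleave=all $*"
}

# ./my_benchmark is a hypothetical binary name.
interleaved ./my_benchmark
```

To actually launch the process, run the printed line directly in a shell on the target system.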
The command "echo 1 > /proc/sys/vm/drop_caches" is used to free up the filesystem page cache.
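The space between the 1 and the redirection operator matters: without it, the shell reads "1>" as a redirection of file descriptor 1 and echo writes only an empty line. The sketch below shows the difference on a temporary file and wraps the real command in a hypothetical helper, drop_page_cache, which is defined but not called because it needs root.

```shell
tmp=$(mktemp)

echo 1> "$tmp"          # "1>" is a redirection of fd 1: an empty line is written
without_space=$(cat "$tmp")

echo 1 > "$tmp"         # the character 1 is written
with_space=$(cat "$tmp")

rm -f "$tmp"
echo "without space: '$without_space'  with space: '$with_space'"

# Hypothetical helper; must be run as root.
drop_page_cache() {
    sync                                  # write dirty pages out first
    echo 1 > /proc/sys/vm/drop_caches     # then free the page cache
}
```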
For multi-copy runs, or single-copy runs on systems with multiple sockets, it is advantageous to bind a process to a particular core. Otherwise, the OS may arbitrarily move your process from one core to another, which can affect performance. To help, SPEC allows the use of a "submit" command where users can specify a utility to use to bind processes. We have found the utility 'numactl' to be the best choice.
numactl runs processes with a specific NUMA scheduling or memory placement policy. The policy is set for a command and inherited by all of its children. The numactl flag "--physcpubind" specifies which core(s) to bind the process to. "-l" instructs numactl to keep a process's memory on the local node, while "-m" specifies on which node(s) to place a process's memory. For full details on using numactl, please refer to your Linux documentation, 'man numactl'.
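A minimal sketch of per-copy binding follows. The helper name bind_to_core, the binary ./my_benchmark, and the copy count of 4 are assumptions for illustration; the helper prints each command line instead of executing it.

```shell
# Print the bound launch line for one benchmark copy.
# $1 = core number, remaining arguments = command to run.
bind_to_core() {
    core=$1
    shift
    echo "numactl --physcpubind=$core -l $*"
}

# Bind four copies of a hypothetical ./my_benchmark to cores 0 through 3.
copy=0
while [ "$copy" -lt 4 ]; do
    bind_to_core "$copy" ./my_benchmark
    copy=$((copy + 1))
done
```

In a SPEC config file the same idea is typically written on the submit line, using the $SPECCOPYNUM and $command variables in place of the loop counter and the binary name.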
To take advantage of large pages, your system must be configured to use them. To configure your system for huge pages, perform the following steps:
Create a mount point for the huge pages: "mkdir /mnt/hugepages"
The huge page file system needs to be mounted when the system reboots. Add the following to a system boot configuration file before any services are started: "mount -t hugetlbfs nodev /mnt/hugepages"
Set vm/nr_hugepages=N in your /etc/sysctl.conf file, where N is the maximum number of pages the system may allocate.
Reboot to have the changes take effect. (Not necessary on some operating systems like RedHat Enterprise Linux 5.5.)
Note that further information about huge pages may be found in your Linux documentation file: /usr/src/linux/Documentation/vm/hugetlbpage.txt
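The steps above can be sketched as a pair of shell helpers. The function names are hypothetical, setup_hugepages must be run as root and is only defined here, and it uses "sysctl -w" to apply the page count immediately instead of editing /etc/sysctl.conf and rebooting.

```shell
# Hypothetical helper; must be run as root. $1 = number of huge pages.
setup_hugepages() {
    mkdir -p /mnt/hugepages
    mount -t hugetlbfs nodev /mnt/hugepages
    sysctl -w vm.nr_hugepages="$1"
}

# Print the HugePages_Total value from a meminfo-format file
# (defaults to /proc/meminfo).
hugepages_total() {
    awk '/^HugePages_Total:/ {print $2}' "${1:-/proc/meminfo}"
}

# Report how many huge pages the kernel has reserved, if readable.
if [ -r /proc/meminfo ]; then
    echo "huge pages reserved: $(hugepages_total)"
fi
```

Checking HugePages_Total after setup is a quick way to confirm that the kernel was able to reserve the requested pages.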
Transparent Huge Pages
On RedHat EL 6 and later, Transparent Hugepages increase the memory page size from 4 kilobytes to 2 megabytes. Transparent Hugepages provide significant performance advantages on systems with highly contended resources and large memory workloads. If memory utilization is too high, or memory is so badly fragmented that hugepages cannot be allocated, the kernel will assign smaller 4k pages instead. Hugepages are used by default if /sys/kernel/mm/redhat_transparent_hugepage/enabled is set to always.
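The current setting can be read back from the same file, where the active value is shown in square brackets. The sketch below uses a hypothetical helper, thp_mode, and also checks /sys/kernel/mm/transparent_hugepage/enabled, the path used by kernels without the RedHat prefix.

```shell
# Print the active mode from a THP "enabled" file, for example
# "always" from a line such as "[always] madvise never".
thp_mode() {
    sed -n 's/.*\[\(.*\)\].*/\1/p' "$1"
}

for f in /sys/kernel/mm/redhat_transparent_hugepage/enabled \
         /sys/kernel/mm/transparent_hugepage/enabled; do
    if [ -r "$f" ]; then
        echo "$f: $(thp_mode "$f")"
    fi
done
```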
Set this environment variable to "yes" to enable applications to use large pages.
Setting this environment variable is necessary to enable applications to use large pages.
Specify stack size to be allocated for each thread.
KMP_AFFINITY = < physical | logical >, starting-core-id specifies the static mapping of user threads to physical cores. For example, if you have a system configured with 8 cores, OMP_NUM_THREADS=8 and KMP_AFFINITY=physical,0, then thread 0 will be mapped to core 0, thread 1 will be mapped to core 1, and so on in a round-robin fashion.
KMP_AFFINITY = granularity=fine,scatter: The value of the environment variable KMP_AFFINITY affects how the threads from an auto-parallelized program are scheduled across processors. Specifying granularity=fine selects the finest granularity level and causes each OpenMP thread to be bound to a single thread context. This ensures that there is only one thread per core on cores supporting HyperThreading Technology. Specifying scatter distributes the threads as evenly as possible across the entire system. Hence, a combination of these two options will spread the threads evenly across sockets, with one thread per physical core.
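As a short sketch, the two settings described above would be exported in the shell before launching the program. The thread count of 8 follows the 8-core example, and ./my_openmp_program is a hypothetical binary name.

```shell
# One OpenMP thread per core on the 8-core example system.
export OMP_NUM_THREADS=8

# Bind each thread to a single thread context and spread the
# threads as evenly as possible across the system.
export KMP_AFFINITY=granularity=fine,scatter

echo "OMP_NUM_THREADS=$OMP_NUM_THREADS KMP_AFFINITY=$KMP_AFFINITY"
# ./my_openmp_program    # hypothetical binary, launched with the settings above
```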
Sets the maximum number of threads to use for OpenMP* parallel regions if no other value is specified in the application. This environment variable applies to both -openmp and -parallel (Linux and Mac OS X) or /Qopenmp and /Qparallel (Windows). Example syntax on a Linux system with 8 cores: export OMP_NUM_THREADS=8
Enabling this option allows the processor cores to automatically increase their frequency, and therefore performance, when they are running below their power and temperature limits.
Enabling this option allows processor resources to be used more efficiently: multiple threads can run on each core, which increases processor throughput and improves overall performance on threaded software.
Enabling this option allows the system to dynamically adjust processor voltage and core frequency. This technology can result in decreased average power consumption and decreased average heat production.
This option specifies the number of logical processor cores that can run on the server. It sets the state of logical processor cores in a package. If you disable this setting, Hyper Threading is also disabled.
This option applies if the processor uses Intel Virtualization Technology, which allows a platform to run multiple operating systems and applications in independent partitions. Users should disable this option when performing application benchmarking.
Enabling this option allows processors to increase I/O performance by placing data from I/O devices directly into the processor cache. This setting helps to reduce cache misses.
This BIOS option configures the CPU power management settings, such as Enhanced Intel SpeedStep Technology, Intel Turbo Boost Technology, and Processor Power State C6. Custom allows you to change the individual CPU power management settings. Energy Efficient determines the best settings for these BIOS parameters. Disabled performs no CPU power management and ignores any settings for these BIOS parameters.
Enabling this option allows the processor to transition to its minimum frequency upon entering C1. This setting does not take effect until after you have rebooted the server. In the disabled state, the CPU continues to run at its maximum frequency in the C1 state. Users should disable this option when performing application benchmarking.
Enabling this option allows the processor to send the C6 report to the operating system. Users should disable this option when performing application benchmarking.
This BIOS option allows you to determine whether system performance or energy efficiency is more important on the server. It can be one of the following: Balanced Energy, Balanced Performance, Energy Efficient, and Performance.
This BIOS option selects one of three modes for a processor mechanism: Enterprise, High-Throughput, and HPC. Enterprise and High-Throughput modes enable all the prefetchers and disable Data Reuse technology. HPC mode enables all the prefetchers and enables Data Reuse technology.
This BIOS option controls the DIMM power savings mode policy. When set to Disabled, DIMMs do not enter power saving mode. When set to Slow, DIMMs can enter power saving mode, but the requirements are higher, so DIMMs enter power saving mode less frequently. When set to Fast, DIMMs enter power saving mode as often as possible. When set to Auto, the BIOS controls when a DIMM enters power saving mode based on the DIMM configuration.
This BIOS option selects how memory operations are prioritized. Power-saving mode prioritizes low-voltage memory operations over high-frequency memory operations, and may lower memory frequency in order to keep the voltage low. Performance mode prioritizes high-frequency operations over low-voltage operations.
This BIOS option enables or disables the temperature-based memory throttling feature. By default this BIOS option is enabled. When it is enabled, the system BIOS initiates memory throttling to manage memory performance by limiting bandwidth to the DIMMs, thereby capping the power consumption and preventing the DIMMs from overheating.
This BIOS option configures memory reliability, availability, and serviceability (RAS). In maximum performance mode, system performance is optimized. In mirroring mode, system reliability is optimized by using half the system memory as backup. In lockstep mode, which can be enabled if the DIMM pairs in the server have an identical type, size, and organization and are populated across the SMI channels, memory access latency is minimized for better performance. In sparing mode, system reliability is enhanced with a degree of memory redundancy while making more memory available to the operating system than mirroring.
This option controls the refresh interval rate for internal memory. By default, the refresh interval rate is set to Auto, which is 2X: DRAM cells are refreshed every 32 ms. When set to 1X, DRAM cells are refreshed every 64 ms.
There are 4 snoop mode options for how to maintain cache coherency across the Intel QPI fabric, each with varying memory latency and bandwidth characteristics depending on how the snoop traffic is generated.
Cluster on Die (COD) mode logically splits a socket into 2 NUMA domains that are exposed to the OS, with half of the cores and LLC assigned to each NUMA domain in a socket. This mode utilizes an on-die directory cache and in-memory directory bits to determine whether a snoop needs to be sent. Use this mode for highly NUMA-optimized workloads to get the lowest local memory latency and highest local memory bandwidth.
Home Directory Snoop with OSB is the Opportunistic Snoop Broadcast (OSB) directory mode. In this mode, the home agent (HA) can choose to do a speculative home snoop broadcast under very lightly loaded conditions, even before the directory information has been collected and checked.
In Home Snoop and Early Snoop modes, snoops are always sent; they just originate from different places: the caching agent (earlier) in Early Snoop mode and the home agent (later) in Home Snoop mode.
Enabling this option allows the chipset to defer memory transactions and process them out of order for optimal performance.
When running multiple copies of benchmarks, the SPEC config file feature submit is sometimes used to cause individual jobs to be bound to specific processors. This specific submit command is used for Linux. The elements of the command are described below:
/usr/bin/taskset [options] [mask] [pid | command [arg] ... ] :This perl script is used to ensure that for a system with N cores the first N/2 benchmark copies are bound to a core that does not share its L2 cache with any of the other copies. The script does this by retrieving and using CPU data from /proc/cpuinfo. Note this script will only work for 6-core CPUs.
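As a rough illustration of the taskset syntax above, the mask argument is a bitmask in which bit n selects core n. The helpers core_mask and pin_to_core and the binary ./my_benchmark are hypothetical, and this sketch does not reproduce the cache-aware core selection of the script described above; it only prints the command line for a given core.

```shell
# Print the hexadecimal affinity mask that selects a single core.
core_mask() {
    printf '0x%x\n' $((1 << $1))
}

# Print the taskset launch line for a command pinned to one core.
# $1 = core number, remaining arguments = command to run.
pin_to_core() {
    core=$1
    shift
    echo "taskset $(core_mask "$core") $*"
}

pin_to_core 2 ./my_benchmark
```

taskset also accepts a core list in place of a mask through its -c option, for example "taskset -c 2 command".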