SPEC CPU2006 Platform Settings for SGI Intel-SNB-2S-based systems
- dplace -c cpulist -r bl $command
- dplace is a tool for binding processes to cpus
Here is a brief description of options used in the config file:
- -c cpulist: pin processes to the specified comma-separated list or range of cpus. These are logical cpus, relative to the enclosing cpuset.
- -r bl: specifies that text should be replicated on the NUMA node or nodes where the process is running. 'b' indicates that binary (a.out) text should be replicated; 'l' indicates that library text should be replicated.
- For full details on using dplace, please refer to your Linux documentation, 'man dplace'.
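As a sketch of the cpulist syntax accepted by -c, the following shell fragment expands a comma-separated list of ranges into the individual logical cpu numbers it names. The expand_cpulist helper is our own illustration of the syntax, not part of dplace itself:

```shell
# Expand a dplace-style cpulist (e.g. "0-3,8") into individual logical
# cpu numbers. Each comma-separated item is either a single cpu or an
# inclusive range "lo-hi".
expand_cpulist() {
    local item out=""
    IFS=',' read -ra items <<< "$1"
    for item in "${items[@]}"; do
        if [[ "$item" == *-* ]]; then
            out+="$(seq -s ' ' "${item%-*}" "${item#*-}") "
        else
            out+="$item "
        fi
    done
    echo "${out% }"
}

expand_cpulist "0-3,8"   # prints: 0 1 2 3 8
```

Remember that these numbers are logical cpus relative to the enclosing cpuset, not absolute hardware cpu ids.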
- Transparent Huge Pages
- On SLES11 SP2 and later, Transparent Hugepages allow the kernel to back process memory with 2 megabyte pages instead of the regular 4 kilobyte pages. Transparent Hugepages provide significant performance advantages on systems with highly contended resources and large memory workloads.
If memory utilization is too high, or memory is too fragmented for hugepages to be allocated, the kernel falls back to the regular 4k pages.
Hugepages are used by default unless the /sys/kernel/mm/transparent_hugepage/enabled field is changed from its default of 'always' to 'never' or 'madvise'.
It is also possible to limit the VM's defragmentation effort when generating hugepages: defragment only for madvise regions when hugepages are not immediately free, or never defragment and simply fall back to regular pages unless hugepages are immediately available.
If we spend CPU time defragmenting memory, we expect the later use of hugepages instead of regular pages to more than repay that cost.
This is not always guaranteed, but it is more likely when the allocation is for an MADV_HUGEPAGE region.
- echo always >/sys/kernel/mm/transparent_hugepage/defrag
- echo madvise >/sys/kernel/mm/transparent_hugepage/defrag
- echo never >/sys/kernel/mm/transparent_hugepage/defrag
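To see why the larger page size helps, a quick back-of-the-envelope calculation (ours, not from the documentation above): mapping the same 1 GiB of memory takes far fewer page-table entries, and therefore far fewer TLB entries, with 2 MiB pages than with 4 KiB pages:

```shell
# Pages needed to map a 1 GiB region at each page size.
GiB=$((1024 * 1024 * 1024))
echo $((GiB / 4096))                 # 4 KiB pages: 262144 entries
echo $((GiB / (2 * 1024 * 1024)))    # 2 MiB pages: 512 entries
```

A 512x reduction in entries means the TLB can cover a much larger working set, which is where the performance advantage on large-memory workloads comes from.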
- sysctl vm.numa_zonelist_order=N
- This sysctl applies only to NUMA systems.
Where memory is allocated from is controlled by zonelists.
On a NUMA system, you can think of the following two types of order.
Assume a 2-node NUMA system; below are two possible zonelists for Node(0)'s GFP_KERNEL allocations:
- Node(0) ZONE_NORMAL -> Node(0) ZONE_DMA -> Node(1) ZONE_NORMAL
- Node(0) ZONE_NORMAL -> Node(1) ZONE_NORMAL -> Node(0) ZONE_DMA.
Type (1) offers the best locality for processes on Node(0), but ZONE_DMA
will be used before ZONE_NORMAL is exhausted. This increases the possibility of
out-of-memory (OOM) in ZONE_DMA, because ZONE_DMA tends to be small.
Type (2) cannot offer the best locality but is more robust against OOM of
the DMA zone.
Type (1) is called "Node" order; Type (2) is "Zone" order.
"Node" order orders the zonelists by node, then by zone within each node.
Specify "[Nn]ode" for node order.
"Zone" order orders the zonelists by zone type, then by node within each
zone. Specify "[Zz]one" for zone order.
Specify "[Dd]efault" to request automatic configuration. Autoconfiguration
will select "node" order in the following cases:
- if the DMA zone does not exist or
- if the DMA zone comprises greater than 50% of the available memory or
- if any node's DMA zone comprises greater than 60% of its local memory and the amount of local memory is big enough.
The above material is excerpted from Linux kernel file
Documentation/sysctl/vm.txt, copyright Rik van Riel and
Peter W. Morreale.
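The node and zone orderings described above can be written out as plain lists for the 2-node example; this is our own illustrative sketch, not kernel code:

```shell
# Zones available in the 2-node example, from Node(0)'s point of view.
local_zones="Node(0)/ZONE_NORMAL Node(0)/ZONE_DMA"
remote_zones="Node(1)/ZONE_NORMAL"

# "Node" order: every zone of the local node first, then the remote node's.
node_order="$local_zones $remote_zones"

# "Zone" order: all ZONE_NORMAL zones first (local node first), then ZONE_DMA,
# so the small DMA zone is touched last.
zone_order="Node(0)/ZONE_NORMAL Node(1)/ZONE_NORMAL Node(0)/ZONE_DMA"

echo "$node_order"
echo "$zone_order"
```

Reading the two lists left to right shows the trade-off directly: "node" order reaches Node(0)'s ZONE_DMA before leaving the node, while "zone" order exhausts every ZONE_NORMAL first.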
- Hardware Prefetch:
- This BIOS option allows the enabling/disabling of a processor mechanism
to prefetch data into the cache according to a pattern-recognition algorithm.
In some cases, setting this option to Disabled may improve performance.
Users should only disable this option after performing application benchmarking
to verify improved performance in their environment.
- Adjacent Sector Prefetch:
- This BIOS option allows the enabling/disabling of a processor mechanism
to fetch the adjacent cache line within the 128-byte sector that contains the
data needed due to a cache line miss.
In some cases, setting this option to Disabled may improve performance.
Users should only disable this option after performing application benchmarking
to verify improved performance in their environment.
- High Bandwidth:
- Enabling this option allows the chipset to defer memory transactions and
process them out of order for optimal performance.