SPEChpc™ 2021 Benchmark Suites
Run and Reporting Rules
SPEC High Performance Group

ABSTRACT
This document provides guidelines required to build,
run, and report on the SPEChpc 2021 benchmark suite.


Version 1.0
July 2021


Abbreviated Contents

1. Philosophy

1.1 A SPEChpc 2021 Benchmark Suites Result Is An Observation

1.2 A Published SPEChpc 2021 Benchmark Suites Result Is a Declaration of Expected Performance

1.3 A SPEChpc 2021 Benchmark Suites Result is a Claim About Maturity of Performance Methods

1.4 Peak and base builds and runs

1.5 Estimates

1.6 About SPEC

1.7 Compliance and Compatibility Commitments

1.8 Usage of the Philosophy Section

2. Building SPEChpc 2021 Benchmark Suites

2.1 Build Procedures

2.2 General Rules for Selecting Compilation Flags

2.3 Base Optimization Rules

2.4 Peak Optimization Rules

3. Running SPEChpc 2021 Benchmark Suites

3.1 System Configuration

3.2 Controlling Benchmark Jobs

3.3 Run-time environment

3.4 Continuous Run Requirement

3.5 Base, peak, and basepeak

3.6 Run-Time Dynamic Optimization

4. Results Disclosure

4.1 Rules regarding availability dates and systems not yet shipped

4.2 Configuration Disclosure

4.3 Test Results Disclosure

4.4 Required Disclosures

4.5 Research and Academic usage of SPEChpc 2021 Benchmark Suites

4.6 Fair Use

5. Run Rule Exceptions

(Click on an item number above to go to the detailed contents for that item.)



Detailed Contents

1. Philosophy

1.1. A SPEChpc 2021 Benchmark Suites Result Is An Observation

1.1.1. Test Methods

1.1.2. Conditions of Observation

1.1.3. Assumptions About the Tester

1.1.4. A SPEChpc 2021 Benchmark Suites Result is a measurement using MPI and, optionally, a node level model as the parallel paradigm

1.2. A Published SPEChpc 2021 Benchmark Suites Result Is a Declaration of Expected Performance

1.2.1. Reproducibility

1.2.2. Obtaining Components

1.2.2.1 Hardware, Operating System and Compilers

1.2.2.2 MPI Implementations

1.2.2.3 Interconnect Software Layer

1.3. A SPEChpc 2021 Benchmark Suites Result is a Claim About Maturity of Performance Methods

1.4. Peak and base builds and runs

1.5. Estimates

1.6. About SPEC

1.6.1. Publication on SPEC's web site is encouraged

1.6.2. Publication on SPEC's web site is not required

1.6.3. SPEC May Require New Tests

1.6.4. SPEC May Adapt the Suites

1.7. Compliance and Compatibility Commitments

1.7.1. 32- and 64-Bit Systems

1.7.2. Target Languages

1.7.3. Supported Operating Systems

1.7.4. MPI Standards

1.7.5. Node-level Parallel Model Standards

1.8. Usage of the Philosophy Section

2. Building SPEChpc 2021 Benchmark Suites

2.1. Build Procedures

2.1.1 SPEC's tools must be used

2.1.2 The runhpc build environment

2.1.3 Continuous Build requirement

2.1.4 Changes to the runhpc build environment

2.1.5 Cross-compilation allowed

2.1.6 Individual builds allowed

2.1.7 Tester's assertion of equivalence between build types

2.2 General Rules for Selecting Compilation Flags

2.2.1 Must not use names

2.2.2 Limitations on substitution flags

2.2.3 Limitations on size changes

2.2.4 Portability Flags

2.2.5 Source Code Modification for Portability

2.2.6 Feedback-directed optimization is NOT allowed

2.3 Base Optimization Rules

2.3.1 Safety and Standards Conformance

2.3.2 Same for all

2.3.2.1 Same for all benchmarks of a given language

2.3.2.2 Same MPI Version

2.3.2.3 Node level parallel model selection

2.3.3 Base build environment

2.3.4 Assertion flags must NOT be used in base

2.3.5 Floating point reordering allowed

2.3.6 Alignment switches are allowed

2.4 Peak Optimization Rules

2.4.1 Safety and Standards Conformance

2.4.2 Peak Tuning per Benchmark

2.4.2.1 Individual Benchmark Tuning

2.4.2.2 Node level parallel model selection

2.4.3 Peak build environment

2.4.4 Assertion flags may be used in peak

2.4.5 Directive Modifications

3. Running SPEChpc 2021 Benchmark Suites

3.1. System Configuration

3.1.1 Operating System State

3.1.2 File Systems and File Servers

3.1.3 Interconnects for MPI, Memory and File System Communication

3.2. Controlling Benchmark Jobs

3.2.1 Number of runs in a reportable result

3.2.2 Number of ranks in base

3.2.3 Number of ranks in peak

3.2.4 The ranks variable

3.2.5 Thread selection in base

3.2.6 Thread selection in peak

3.2.7 The threads variable

3.2.8 The submit directive

3.2.9 MPI program startup/launch

3.3. Run-time environment

3.4. Continuous Run Requirement

3.5. Base, peak, and basepeak

3.6. Run-Time Dynamic Optimization

3.6.1 Definitions and Background

3.6.2 RDO Is Allowed, Subject to Certain Conditions

3.6.3 RDO Disclosure and Resources

3.6.4 RDO Settings Cannot Be Changed At Run-time

3.6.5 RDO and safety in base

3.6.6 RDO carry-over by program is not allowed

4. Results Disclosure

4.1. Rules regarding availability dates and systems not yet shipped

4.1.1 Pre-production software can be used

4.1.2 Software component names

4.1.3 Specifying dates

4.1.4 If dates are not met

4.1.5 Performance changes for pre-production systems

4.2. Configuration Disclosure

4.2.1 Identification of System, Manufacturer and Tester

4.2.2 Node Configuration

4.2.3 Accelerator Configuration

4.2.4 Adaptor Configurations

4.2.5 Interconnect Configuration

4.2.6 Software Configuration

4.2.7 Tuning Configuration

4.2.8 Description of Portability and Tuning Options ("Flags File")

4.2.9 Configuration Disclosure for User Built Systems

4.3. Test Results Disclosure

4.3.1 SPEChpc 2021 Benchmark Suites Performance Metrics

4.3.2 Metric Selection

4.3.3 Estimates are allowed

4.3.4 Performance changes for production systems

4.4. Required Disclosures

4.5. Research and Academic usage of SPEChpc 2021 Benchmark Suites

4.6. Fair Use

5. Run Rule Exceptions



1. Philosophy

This section is an overview of the purpose, definitions, methods, and assumptions for the SPEChpc 2021 benchmark suites run rules. The purpose of the SPEChpc 2021 benchmark suites and its run rules is to further the cause of fair and objective benchmarking of high-performance computing systems. The rules help ensure that published results are meaningful, comparable to other results, and reproducible. SPEC believes that the user community benefits from an objective series of tests which serve as a common reference.

Per the SPEC license agreement, all SPEChpc benchmark suite results disclosed in public -- whether in writing or in verbal form -- must adhere to the Run and Reporting Rules, or be clearly described as estimates.

A published SPEChpc 2021 Benchmark Suites result means three things:

  1. A performance observation;
  2. A declaration of expected performance; and
  3. A claim about maturity of performance methods.

1.1 A SPEChpc 2021 Benchmark Suites Result Is An Observation

A published SPEChpc 2021 benchmark suites result is an empirical report of performance observed when carrying out certain computation- and communication-intensive tasks.

1.1.1 Test Methods

SPEC supplies the benchmarks in the form of source code, which testers are not allowed to modify except under certain very restricted circumstances. The SPEChpc 2021 benchmark suites include 9 benchmarks, organized into a sequence of suites:

SPEChpc 2021 Tiny Benchmark Suite
SPEChpc 2021 Small Benchmark Suite
SPEChpc 2021 Medium Benchmark Suite
SPEChpc 2021 Large Benchmark Suite

The tester supplies compilers, MPI libraries, and the System Under Test (SUT). In addition, the tester provides a config file, in which appropriate optimization flags are set and, where needed, portability flags. SPEC provides example config files in the config subtree, as well as documentation on how to create a config file in Docs/config.html. SPEC supplies tools which automatically:

  1. archive the user-selected configuration file,
  2. generate Makefiles which are used to compile the benchmarks,
  3. run all the benchmarks in the suite in a single continuous run,
  4. validate each benchmark output to ensure that the benchmark generated acceptable outputs,
  5. pick the median across the benchmark runs,
  6. and compute metrics, such as SPEChpc 2021-Sml_base, SPEChpc 2021-Med_peak.
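
As a rough illustration of such a config file (a sketch only -- the compiler commands and the optimization flag below are placeholders rather than recommendations; see Docs/config.html for the authoritative syntax):

# Illustrative minimal config file fragment
default=default=default=default:
CC  = mpicc          # C compiler wrapper (placeholder)
CXX = mpicxx         # C++ compiler wrapper (placeholder)
FC  = mpif90         # Fortran compiler wrapper (placeholder)

default=base:
OPTIMIZE = -O2       # one common set of optimization flags for all base builds
pmodel   = MPI       # node-level parallel model: MPI-only (the default)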

1.1.2 Conditions of Observation

The report that certain performance has been observed is meaningful only if the conditions of observation are stated. SPEC therefore requires that a published result include a description of all performance-relevant conditions.

1.1.3 Assumptions About the Tester

It is assumed that the tester:

  1. is willing to describe the observation and its conditions clearly;
  2. is able to learn how to operate the SUT in ways that comply with the rules in this document, for example by selecting compilation options that meet SPEC's requirements;
  3. knows the SUT better than those who have only indirect contact with it;
  4. is honest: SPEC/HPG does not employ an independent auditor process, though it does have requirements for reproducibility and does encourage use of a peer review process.

The person who actually carries out the test is, therefore, the first and the most important audience for these run rules. The rules attempt to help the tester by trying to be clear about what is and what is not allowed.

1.1.4 A SPEChpc 2021 Benchmark Suite Result is a measurement using MPI and, optionally, a node level model as the parallel paradigm

The intention of the four benchmark suites is to measure the performance of applications using the Message Passing Interface (MPI) as the means to implement parallel computation. Optionally, hybrid node-level parallel models may be used in addition to MPI, such as explicit thread-level parallelism or offloading computation to an accelerator device.

1.2 A Published SPEChpc 2021 Benchmark Suite Result Is a Declaration of Expected Performance

A published SPEChpc 2021 benchmark suite result is a declaration that the observed level of performance can be obtained by others. Such declarations are widely used by vendors in their marketing literature, and are expected to be meaningful to ordinary customers.

1.2.1 Reproducibility

It is expected that later testers can obtain a copy of the SPEChpc 2021 Benchmark Suites, obtain the components described in the original result, and reproduce the claimed performance, within a small range to allow for run-to-run variation.

1.2.2 Obtaining Components

Therefore, it is expected that the components used in a published result can in fact be obtained, with the level of quality commonly expected for products sold to ordinary customers. Different components are subject to different standards, described below:

1.2.2.1 Hardware, Operating System and Compilers

Subcomponents are required to:

  1. be specified using customer-recognizable names,
  2. be generally available within certain time frames,
  3. provide documentation,
  4. provide an option for customer support,
  5. be of production quality, and
  6. provide a suitable environment for programming.

The judgment of whether a component meets the above list may sometimes pose difficulty, and various references are given in these rules to provide guidelines for such judgment. By way of introduction, imagine a vendor-internal version of a compiler, designated only by an internal code name, unavailable to customers, which frequently generates incorrect code. Such a compiler would fail to provide a suitable environment for general programming, and would not be ready for use in a result.

1.2.2.2 MPI Implementations

Such message-passing interface components are required to:

  1. be generally available within certain time frames,
  2. provide documentation,
  3. provide an option for customer support,
  4. be of production quality, and
  5. be obtained from a vendor, or through an industry or academic consortium which has collaborated to provide a proprietary or Open Source form of the product.

The judgment of whether a component meets the above list may sometimes pose difficulty, and various references are given in these rules as guidelines for such judgment. By way of introduction, imagine a university produced MPI implementation, designated only by an internal code name, unavailable for general download, which frequently causes non-SPEC programs to terminate abnormally. Such an MPI implementation would fail to provide a suitable environment for general programming, and would not be ready for use in a result.

The MPI implementation used is expected to be safe, and it is expected that system or software vendors would endorse the general use of this implementation by customers who seek to achieve good application performance.

1.2.2.3 Interconnect Software Layer

Such components are required to:

  1. be installed on the system using supported and reproducible methods,
  2. be generally available within certain time frames,
  3. provide documentation,
  4. provide an option for customer support, and
  5. be of production quality

The judgment of whether a component meets the above list may sometimes pose difficulty, and various references are given in these rules to guidelines for such judgment. For example, suppose you had available a vendor-internal version of an Infiniband driver, designated only by an internal code name, unavailable to customers, which frequently generates message delivery failures. Such a driver/library would fail to provide a suitable environment for general applications, and would not be ready for use in a result.

1.3 A SPEChpc 2021 Benchmark Suites Result is a Claim About Maturity of Performance Methods

A published SPEChpc 2021 benchmark suites result carries an implicit claim that the performance methods it employs are more than just "prototype" or "experimental" or "research" methods; it is a claim that there is a certain level of maturity and general applicability in its methods. Unless clearly described as an estimate, a published result is a claim that the performance methods employed (whether hardware or software, compiler, MPI library, or other):

  1. Generate correct code for a class of programs larger than the included benchmarks,
  2. Improve performance for a class of programs larger than the included benchmarks,
  3. Are recommended by the vendor for a specified class of programs larger than the included benchmarks,
  4. Are generally available, documented, supported, and
  5. If used as part of base, are safe. (Refer to section 2.3.1 on safety issues).

SPEC is aware of the importance of optimizations in producing the best performance. SPEC is also aware that it is sometimes hard to draw an exact line between legitimate optimizations that happen to benefit SPEC benchmarks, versus optimizations that exclusively target the SPEC benchmarks. However, with the list above, SPEC wants to raise the awareness of implementers and end users about unwanted benchmark-specific optimizations that would be incompatible with SPEC's goal of fair benchmarking.

The tester must describe the performance methods that are used in terms that a performance-aware user can follow, so that users can understand how the performance was obtained and can determine whether the methods may be applicable to their own applications. The tester must be able to make a credible public claim that a class of applications in the real world may benefit from these methods.

1.4 Peak and base builds and runs

"Peak" metrics may be produced by building each benchmark in a suite with a set of optimizations and/or node level parallel models individually selected for that benchmark, and running them with environment settings individually selected for that benchmark. The optimizations selected must adhere to the set of general benchmark optimization rules described in section 2.2 below. This may also be referred to as "aggressive compilation".

"Base" metrics must be produced by building all the benchmarks in a suite with a common set of optimizations, a node level parallel model, and running them with environment settings common to all the benchmarks in the suite. In addition to the general benchmark optimization rules (section 2.2), base optimizations must adhere to a stricter set of rules described in section 2.3.

These additional rules serve to form a "baseline" of performance that can be obtained with a single set of compiler switches, single-pass make process, and a high degree of portability, safety, and performance.

  1. The choice of a single set of switches is intended to reflect the performance that may be attained by a user who is interested in performance, but who prefers not to invest the time required for tuning of individual programs.
  2. SPEC allows base builds to assume that the program follows the relevant language standard (i.e. it is portable). But this assumption may be made only where it does not interfere with getting the expected answer. For all testing, SPEC requires that benchmark outputs match an expected set of outputs, typically within a benchmark-defined tolerance to allow for implementation differences among systems.

    Because the benchmarks are drawn from real applications, some of them use popular practices that compilers must commonly cater to, even if those practices are non-compliant with the language standard. In particular, some of the programs (and, therefore, all of base) may have to be compiled with settings that do not exploit all of the optimizations that would be possible for programs with perfect standards compliance.
  3. In base, the compiler may not make unsafe assumptions that are more aggressive than what the language standard allows.
  4. In base, a single node level parallel model may be selected.
  5. Finally, because these are performance suites, SPEC/HPG has throughout its history nevertheless allowed certain common optimizations in base, such as reordering of operands in accordance with algebraic identities.
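
As a brief sketch of the distinction (the flag values are placeholders and 605.lbm_s serves only as an example benchmark name; detailed rules and authoritative examples appear in section 2), a config file might use one common optimization level for base while raising it for a single benchmark in peak:

default=base:
OPTIMIZE = -O2                  # common flags for every benchmark in base

default=peak:
OPTIMIZE = -O2                  # starting point for peak

605.lbm_s=peak:
OPTIMIZE = -O3                  # more aggressive tuning for this one benchmark only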

Rules for building the benchmarks are described in section 2.

1.5 Estimates

SPEChpc 2021 benchmark suites metrics may be estimated. All estimates must be clearly designated as such.

This philosophy section has described how a "result" has certain characteristics: e.g. a result is an empirical report of performance, includes a full disclosure of performance-relevant conditions, can be reproduced, uses mature performance methods. By contrast, estimates may fail to provide one or even all of these characteristics.

Nevertheless, estimates have long been seen as valuable for SPEC benchmarks. Estimates are set at inception of a new chip design and are tracked carefully through analytic, simulation, and HDL (Hardware Description Language) models. They are validated against prototype hardware and, eventually, production hardware. With chip designs taking years, and requiring very large investments, estimates are central to corporate roadmaps. Such roadmaps may compare SPEChpc estimates for several generations of processors, and, explicitly or by implication, contrast one company's products and plans with another's.

SPEC wants these benchmarks to be useful, and part of that usefulness is allowing the metrics to be estimated.

The key philosophical point is simply that estimates must be clearly distinguished from results.

1.6 About SPEC

1.6.1 Publication on SPEC's web site is encouraged

SPEC encourages the review of results by the SPEC/HPG committee, and subsequent publication on SPEC's web site (http://www.spec.org/hpc2021). SPEC uses a peer-review process prior to publication, in order to improve consistency in the understanding, application, and interpretation of these run rules.

1.6.2 Publication on SPEC's web site is not required

Review by SPEC is not required. Testers may publish rule-compliant results independently. No matter where published, all results publicly disclosed must adhere to the SPEC Run and Reporting Rules, or be clearly marked as estimates. SPEC may take action if the rules are not followed.

1.6.3 SPEC May Require New Tests

In cases where it appears that the run rules have not been followed, SPEC may investigate such a claim and require that a result be regenerated, or may require that the tester correct the deficiency (e.g. make the optimization more general purpose or correct problems with code generation).

1.6.4 SPEC May Adapt the Suites

The SPEC High Performance Group reserves the right to adapt the SPEChpc 2021 benchmark suites as it deems necessary to preserve its goal of fair benchmarking. Such adaptations might include (but are not limited to) removing benchmarks, modifying codes or workloads, adapting metrics, republishing old results adapted to a new metric, or requiring retesting by the original tester.

1.7 Compliance and Compatibility Commitments

1.7.1 32- and 64-Bit Systems

While previous SPEC/HPG suites could be run as 32-bit binaries, the SPEChpc 2021 benchmark suites have not been tested as, and are not expected to run as, 32-bit binaries.

1.7.2 Target Languages

The SPEChpc 2021 benchmarks are written in Fortran 2008, C11, and C++14. If benchmarks fail due to non-compliance with the appropriate language standard, the SPEC/HPG committee will be inclined to approve performance-neutral source-code changes.

1.7.3 Supported Operating Systems

The SPEChpc 2021 benchmark suites have been tested on Linux/UNIX systems. It is not SPEC's intent to exclude other operating systems; however, the burden of porting the benchmarks and tools to other operating systems is likely to fall on the tester who decides to submit results from such platforms. Section 5 provides for exceptional cases where the standard run rules cannot be followed.

1.7.4 MPI Standards

The SPEChpc 2021 benchmark suites are written to comply with the MPI 3.0 Standard. If benchmarks fail due to non-compliance with the MPI Standard, the SPEC/HPG committee will be inclined to approve performance-neutral source-code changes. In cases where the library is non-compliant or imposes some fundamental limitation, such as the number of processes or communicators, the SPEC/HPG Committee is inclined to advocate fixing the library rather than accept changes to the benchmark source.

1.7.5 Node-level Parallel Model Standards

The SPEChpc 2021 benchmark suites include support for the OpenMP and OpenACC node-level parallel models.

The OpenACC ports are written to comply with the OpenACC 2.6 Standard.

The OpenMP ports are written to comply with the OpenMP 5.0 Standard. However, the benchmarks include two separate OpenMP ports, "Thread" and "Target". The "Thread" ports are written for thread- or task-based parallelism. The "Target" ports also use newer features such as the "target" and "distribute" constructs and the "map" clause, which allow, but do not require, offloading computation to accelerator devices.

If benchmarks fail due to non-compliance with these standards, the SPEC/HPG committee will be inclined to approve performance-neutral source-code changes.

1.8 Usage of the Philosophy Section

This philosophy section is intended to introduce concepts of fair benchmarking. It is understood that in some cases, this section uses terms that may require judgment, or which may lack specificity. For more specific requirements, please see the sections below.

In case of a conflict between this philosophy section and a run rule in one of the sections below, normally the run rule found below takes priority.

Nevertheless, there are several conditions under which questions should be resolved by reference to this section: (a) self-conflict: if rules below are found to impose incompatible requirements; (b) ambiguity: if they are unclear or silent with respect to a question that affects how a result is obtained, published, or interpreted; (c) obsolescence: if the rules below are made obsolete by changing technical circumstances or by directives from superior entities within SPEC.

When questions arise as to interpretation of the run rules:

  1. Interested parties should seek first to resolve questions based on the rules as written in the sections that follow. If this is not practical (because of problems of contradiction, ambiguity, or obsolescence), then the principles of the philosophy section should be used to resolve the issue.
  2. The SPEC/HPG committee should be notified of the issue. Contact information may be found via the SPEC web site, www.spec.org.
  3. SPEC may choose to declare a ruling on the issue at hand, and may choose to amend the rules to avoid future such issues.


2. Building SPEChpc 2021 Benchmark Suites

SPEC has adopted a set of rules defining how SPEChpc 2021 benchmark suites must be built and run to produce peak and base metrics.

2.1 Build Procedures

2.1.1 SPEC's tools must be used

With the release of the SPEChpc 2021 benchmark suites, a set of tools based on GNU Make and Perl5 is supplied to build and run the benchmarks. To produce publication-quality results, these SPEC tools must be used. This helps ensure reproducibility of results by requiring that all individual benchmarks in the suites are run in the same way and that a configuration file is available that defines the optimizations used.

The primary tool is called runhpc. It is described in the runhpc documentation in the Docs subdirectory of the SPEC root directory -- in a Bourne shell, that directory would be ${SPEC}/Docs/.

Some Fortran programs need to be preprocessed, for example to choose variable sizes depending on whether -DSPEC_OPENMP has been set. Fortran preprocessing must be done using the SPEC-supplied preprocessor, even if the vendor's compiler has its own preprocessor. The runhpc tool will automatically enforce this requirement by invoking the SPEC preprocessor.

SPEC supplies pre-compiled versions of the tools for a variety of platforms. If a new platform is used, please see tools-build.html in the Docs directory for information on how to build the tools and how to obtain approval for them. SPEC's approval is required for the tools build, so a log must be generated during the build.

For more complex ways of compilation, SPEC has provided hooks in the tools so that such compilation and execution is possible (see the tools documentation for details). If, for some reason, building and running with the SPEC tools does not work for your environment, the test sponsor may ask for permission to use performance-neutral alternatives (see section 5).

2.1.2 The runhpc build environment

When runhpc is used to build the SPEChpc 2021 benchmark suites, anything that contributes to performance must be disclosed (section 4) and must meet the usual general availability tests, as described in the Philosophy (section 1): supported, documented, product quality, recommended, and so forth. These requirements apply to all aspects of the build environment, including but not limited to:

  1. The operating system and any tuning thereof.
  2. Performance-enhancing software, firmware, or hardware.
  3. Resource management.
  4. Environment variables

2.1.3 Continuous Build requirement

For a reportable run, a suite of benchmarks is compiled (for example, 9 benchmarks for the Small suite). Optional peak tuning doubles the number of compiles (making 18 for the example of the Small suite). If a result is made public, then it must be possible for new testers to use its config file to compile all the benchmarks (both base and peak, if peak was used) in a single invocation of runhpc, and obtain executable binaries that are, from a performance point of view, equivalent to the binaries used by the original tester.

Of course, the new tester may need to set up the system to match the original build environment, as described just above (rule 2.1.2) and may need to make minor config file adjustments, e.g. for directory paths.

Note that this rule does not require that the original tester actually build all the benchmarks in a single invocation. Instead, it requires that the tester ensure that nothing would prevent a continuous build. The simplest and least error-prone way to meet this requirement is simply to do a complete build of all the benchmarks in a single invocation of runhpc. Nevertheless, SPEC recognizes that there is a cost to benchmarking and that it may be convenient to build benchmarks individually, perhaps as part of a tuning project.

Examples of practices that this rule prohibits include building some benchmarks under one compiler installation and the rest under a different installation that could not coexist on a single build host, or relying on environment changes between individual builds that could not all be in effect during a single invocation of runhpc.

2.1.4 Changes to the runhpc build environment

The SPEChpc 2021 benchmark suites base binaries must be built using the environment rules of section 2.1.2, and must not rely upon any changes to the environment during the build.

Note 1: Base cross compilations using multiple hosts are allowed (2.1.5), but the performance of the resulting binaries must not depend upon environmental differences among the hosts. It must be possible to build performance-equivalent base binaries with one set of switches (2.3.1), in one execution of runhpc (2.1.3), on one host, with one environment (2.1.2).

For a peak build, the environment may be changed, subject to the following constraints:

  1. The environment change must be accomplished using the SPEC-provided config file hooks (such as ENV_).
  2. The environment change must be fully disclosed to SPEC (see section 4).
  3. The environment change must not be incompatible with a Continuous Build (see section 2.1.3).
  4. The environment change must be accomplished using simple shell commands. It is not permitted to invoke a more complex entity unless that entity is provided as part of a generally-available software package.
    For example, setting an environment variable with a simple shell export command is acceptable, whereas invoking a locally written script that is not part of a generally-available software package is not.

Note 2: Peak cross compilations using multiple hosts are allowed (2.1.5), but the performance of the resulting binaries must not depend upon environmental differences among the hosts. It must be possible to build performance-equivalent peak binaries with one config file, in one execution of runhpc (2.1.3), in the same execution of runhpc that built the base binaries, on one host, starting from the environment used for the base build (2.1.2), and changing that environment only through config file hooks (2.1.4).

2.1.5 Cross-compilation allowed

It is permitted to use more than one host in a cross-compilation. If more than one host is used in a cross-compilation, they must be sufficiently equivalent so as not to violate rule 2.1.3. That is, it must be possible to build the entire suite on a single host and obtain binaries that are equivalent to the binaries produced using multiple hosts.

The purpose of allowing multiple hosts is so that testers can save time when recompiling many programs. Multiple hosts must NOT be used in order to gain performance advantages due to environmental differences among the hosts. In fact, the tester must exercise great care to ensure that any environment differences are performance neutral among the hosts, for example by ensuring that each has the same version of the operating system, the same performance software, the same compilers, and the same libraries. The tester must exercise due diligence to ensure that differences that appear to be performance neutral - such as differing MHz or differing memory amounts on the build hosts - are in fact truly neutral.

Multiple hosts must NOT be used in order to work around system or compiler incompatibilities (e.g. compiling the C benchmarks on a different OS version than the Fortran benchmarks in order to meet the different compilers' respective OS requirements), since that would violate the Continuous Build rule (2.1.3).

2.1.6 Individual builds allowed

It is permitted to build the benchmarks with multiple invocations of runhpc, for example during a tuning effort. But the executables must be built using a consistent set of software. If a change to the software environment is introduced (for example, installing a new version of the C compiler which is expected to improve the performance of one of the benchmarks), then all affected benchmarks must be rebuilt (in this example, all the C benchmarks).

2.1.7 Tester's assertion of equivalence between build types

The previous 4 rules may appear to contradict each other (2.1.3 through 2.1.6), but the key word in 2.1.3 is the word "possible".

Consider the following sequence of events:

  1. A tester has built a complete set of benchmark executable images ("binaries") on her usual host system.
  2. A new SUT arrives for a limited period of time. It has no compilers installed.
  3. A SPEChpc 2021 benchmark suites tree is installed on the SUT, along with the binaries and config file generated on the usual host.
  4. It is learned that performance could be improved if the peak version of 901.sluggard were compiled with -O5 instead of -O4.
  5. On the host system, the tester edits the config file to change to -O5 for 901.sluggard, and issues the command:
    runhpc -c myconfig -D -a build -T peak sluggard 
  6. The tester copies the new binary and config file to the SUT
  7. A complete run is started by issuing the command:
    runhpc -c myconfig -a validate all 
  8. Performance is as expected, and the results are published at SPEC (including the config file).

In this example, the tester is taken to be asserting that the above sequence of events produces binaries that are, from a performance point of view, equivalent to binaries that could have been built in a single invocation of the tools.

If there is some optimization that can only be applied to individual benchmark builds, but which it is not possible to apply in a continuous build, the optimization must not be used.

Rule 2.1.7 is intended to provide some guidance about the kinds of practices that are reasonable, but the ultimate responsibility for result reproducibility lies with the tester. If the tester is uncertain whether a cross-compile or an individual benchmark build is equivalent to a full build on the SUT, then a full build on the SUT is required (or, in the case of a true cross-compilation which is documented as such, then a single runhpc -a build is required on a single host.) Although full builds add to the cost of benchmarking, in some instances a full build in a single runhpc may be the only way to ensure that results will be reproducible.

2.2 General Rules for Selecting Compilation Flags

The following rules apply to compiler flag selection for the SPEChpc 2021 Benchmark Peak and Base Metrics. Additional rules for Base Metrics follow in section 2.3.

2.2.1 Must not use names

Benchmark source file, variable, and subroutine names must not be used within optimization flags or compiler/build options.

Identifiers used in preprocessor directives to select alternative source code are also forbidden, except for a rule-compliant library substitution (2.2.2), an approved portability flag (2.2.4), or a specifically provided SPEC-approved alternate source (src.alt).

For example, if a benchmark source code uses one of:

#ifdef IDENTIFIER
#ifndef IDENTIFIER
#if defined IDENTIFIER
#if !defined IDENTIFIER

to provide alternative source code under the control of a compiler option such as -DIDENTIFIER, such a switch may not be used unless it meets the criteria of 2.2.2 or 2.2.4.

2.2.2 Limitations on substitution flags

Flags which substitute pre-computed (e.g. library-based) routines for routines defined in the benchmark on the basis of the routine's name must not be used. Exceptions are:

  1. the function alloca. It is permitted to use a flag that substitutes the system's builtin_alloca. Such a flag may be applied to individual benchmarks (in both base and peak).

Note: This rule does not forbid flags that select alternative implementations of library functions defined in an ANSI/ISO language standard. For example, such flags might select an optimized library of these functions, or allow them to be inlined.

2.2.3 Limitations on size changes

Flags that change a data type size to a size different from the default size of the compilation system are not allowed to be used in base builds. Exceptions are:

  1. C long can be set to 32 or greater bits.
  2. Pointer sizes can be set different from the default size.
  3. Fortran REAL and INTEGER can be promoted to 8 bytes.

which are acceptable as portability flags in base builds, and may be used as (or to facilitate) optimizations in peak builds.

2.2.4 Portability Flags

Rule 2.3.2 requires that all benchmarks use the same flags in base. Portability flags are an exception to this rule: they may differ from one benchmark to another, even in base. Such flags are subject to two major requirements:

  1. They must be used via the provided config file PORTABILITY flags (such as CPORTABILITY, FPORTABILITY, etc).
  2. They must be approved by the SPEC/HPG committee.

Exception: portability flags used to select a node-level parallel model (SPEC_OPENMP, SPEC_OPENMP_TARGET, and SPEC_OPENACC) must be the same for all benchmarks in base.

The initial published results will include a reviewed set of portability flags on several operating systems; later users who propose to apply additional portability flags must prepare a justification for their use.

A proposed portability flag will normally be approved if one of the following conditions holds:

  1. The flag selects a performance-neutral alternate benchmark source, and the benchmark cannot build and execute correctly on the given platform unless the alternate source is selected. (Examples might be flags such as -DHOST_WORDS_BIG_ENDIAN, -DSPEC_HAVE_SIGNED_CHAR).
  2. The flag selects a compiler mode that allows basic parsing of the input source program, and it is not possible to set that flag for all programs of the given language in the suite. (An example might be -fixedform, to select Fortran source code fixed format).
  3. The flag selects features from a certain version of the language, and it is not possible to set that flag for all programs of the given language in the suite. (An example might be -language:c89).
  4. The flag adjusts a resource limit, and it is not possible to set that flag for all programs of the given language in the suite.

A proposed portability flag will normally not be approved unless it is essential in order to successfully build and run the benchmark.

If more than one solution can be used for a problem, the SPEC/HPG committee will review attributes such as precedent from previously published results, performance neutrality, standards compliance, amount of code affected, impact on the expressed original intent of the program, and good coding practices (in rough order of priority).

If a benchmark is discovered to violate the relevant standard, that may or may not be a reason for the committee to grant a portability flag. If the justification for a portability flag is standards compliance, the tester must include a specific reference to the offending source code module and line number, and a specific reference to the relevant sections of the appropriate standard. The tester should also address the impact on the other attributes mentioned in the previous paragraph.

If a given portability problem (within a given language) occurs in multiple places within a suite, then, in base, the same method(s) must be applied to solve all instances of the problem.
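
As an illustration of how such flags are supplied (a sketch only; the -DSPEC_EXAMPLE_WORKAROUND identifier is hypothetical, -fixedform is taken from the examples above, and the benchmark names are reused from examples elsewhere in this document), per-benchmark portability flags are placed in the PORTABILITY variables of the config file:

605.lbm_s=base:
CPORTABILITY = -DSPEC_EXAMPLE_WORKAROUND    # hypothetical flag selecting approved alternate source for this benchmark only

997.CFmix1=base:
FPORTABILITY = -fixedform                   # select Fortran fixed-format source for this benchmark only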

2.2.5 Source Code Modification for Portability

While SPEChpc 2021 has been tested on a variety of systems, SPEC acknowledges that new systems may require limited porting efforts to enable a successful build of a benchmark. Such changes may be allowed when they are limited to what is needed to build and run the benchmark correctly, are performance neutral, and are reviewed and approved by the SPEC/HPG committee (for example, as a SPEC-approved alternate source, or src.alt; see sections 2.2.1 and 2.4.5).

2.2.6 Feedback-directed optimization is NOT allowed

Feedback-directed optimization must not be used with the SPEChpc 2021 benchmark suites. As examples, the config file directives

PASS1_FFLAGS   =
PASS1_CFLAGS   =
PASS1_CXXFLAGS =
PASS2_FFLAGS   =

fdo_run2  =
fdo_post2 =

from the SPEC CPU 2006 and SPEC OMP 2001 benchmark suites are not allowed to be used.

2.3 Base Optimization Rules

In addition to the rules listed in section 2.2 above, the selection of optimizations to be used to produce SPEChpc 2021 Benchmark Base Metrics includes the following:

2.3.1 Safety and Standards Conformance

The optimizations used are expected to be safe, and it is expected that system or compiler vendors would endorse the general use of these optimizations by customers who seek to achieve good application performance.

The requirements that optimizations be safe, and that they generate correct code for a class of programs larger than the suites themselves (rule 1.4), are normally interpreted as requiring that the system, as used in base, implement the language correctly. "The language" is defined by the appropriate ANSI/ISO standard (C11, Fortran 2008, C++ 14).

The principle of standards conformance is not automatically applied, because SPEC has historically allowed certain exceptions:

  1. Section 2.3.5 allows reordering of arithmetic operands.
  2. SPEC has not insisted on conformance to the C standard in the setting of errno.
  3. SPEC has not dealt with (and does not intend to deal with) language standard violations that are performance neutral.
  4. When a more recent language standard modifies a requirement imposed by an earlier standard, SPEC will also accept systems that adhere to the more recent ANSI/ISO language standard.

Otherwise, a deviation from the standard that is not performance neutral, and that gives the particular implementation a performance advantage over standard-conforming implementations, is considered an indication that the requirements about "safe" and "correct code" optimizations are probably not met. Such a deviation may be a reason for SPEC to find a result not rule-conforming.

If an optimization causes any benchmark to fail to validate, and if the relevant portion of this benchmark's code is within the language standard, then the failure is taken as additional evidence that an optimization is not safe.

Regarding C++: Note that for C++ applications, the standard calls for support of both run-time type information (RTTI) and exception handling. The compiler, as used in base, must enable these.

For example, a compiler enables exception handling by default; it can be turned off with --noexcept. The switch --noexcept is not allowed in base.

For example, a compiler defaults to no run time type information, but allows it to be turned on via --rtti. The switch --rtti must be used in base.

Regarding accuracy: Because language standards generally do not set specific requirements for accuracy, SPEC has also chosen not to do so. Nevertheless:

  1. Optimizations are expected to generate code that provides appropriate accuracy for a class of problems, where that class is larger than the SPEC benchmarks themselves.
  2. Implementations are encouraged to clearly document any accuracy limitations.
  3. Implementations are encouraged to adhere to the principle of "no surprises"; this can be achieved both by predictable algorithms and by documentation.

In cases where the class of appropriate applications appears to be so narrowly drawn as to constitute a "benchmark special", that may be a reason for SPEC to find a result non-conforming.

2.3.2 Same for all

2.3.2.1 Same for all benchmarks of a given language

In base, the same compiler must be used for all modules of a given language within a benchmark suite. Except for portability flags (see 2.2.4 above), all flags or options that affect the transformation process from SPEC-supplied source to completed executable must be the same, including but not limited to:

  1. compiler options
  2. linker options
  3. preprocessor options
  4. libraries, including compiler, runtime, and optional math
  5. flags that set warning levels (typically -w)
  6. Flags that create object files (typically -c, -o).
  7. flags that affect the verbosity level of the compiler driver (typically -v)
  8. language dialect selection switches (e.g. -ansi99, -std)
  9. flags that assert standards compliance by the benchmarks (see 2.4.4, below)
  10. flags that are set at installation time
  11. flags that are set on a system-global basis

All flags must be applied in the same order for all compiles of a given language.

Note that the SPEC tools provide methods to set flags on a per-language basis.

For example, if a tester sets:

default=base:
COPTIMIZE = -O4
FOPTIMIZE = -O5

then the C benchmarks will be compiled with -O4 and the Fortran benchmarks with -O5. (This is legal: there is no requirement to compile C codes with the same optimization level as Fortran codes).

Regarding benchmarks that have been written in more than one language:

In a mixed-language benchmark, the tools automatically compile each source module with the options that have been set for its language.

Continuing the example just above, a benchmark that uses both C and Fortran would have its C modules compiled with -O4 and its Fortran modules with -O5. This, too, is legal.

In order to link an executable for a mixed-language benchmark, the tools need to decide which link options to apply (e.g. those defined in CLD/CLDOPT vs. those in FLD/FLDOPT vs. those in CXXLD/CXXLDOPT). This decision is based on benchmark classifications that were determined during development of the benchmark suites. For reasons of link time library inclusion, the classifications were not made based on percentage of code nor on the language of the main routine; rather, the classifications had been set to either F (for mixed Fortran/C benchmarks) or CXX (for benchmarks that include C++).

Link options must be consistent in a base build. For example, if FLD is set to /usr/opt/advanced/ld for pure Fortran benchmarks, the same setting must be used for any mixed language benchmarks that have been classified, for purpose of linking, as Fortran.

Additional compiler runtime libraries may be added to link individual benchmarks when using mixed languages. For example, if linking with a C compiler with a mixed C and Fortran source base, the Fortran compiler's runtime libraries may be added to the link.

Inter-module optimization and mixed-language benchmarks:

For mixed-language benchmarks, if the compilers have an incompatible inter-module optimization format, flags that require inter-module format compatibility may be dropped from base optimization of mixed-language benchmarks. The same flags must be dropped from all benchmarks that use the same combination of languages. All other base optimization flags for a given language must be retained for the modules of that language.

For example, suppose that a suite has exactly two benchmarks that employ both C and Fortran, namely 997.CFmix1 and 998.CFmix2. A tester uses a C compiler and Fortran compiler that are sufficiently compatible to be able to allow their object modules to be linked together - but not sufficiently compatible to allow inter-module optimization. The C compiler spells its intermodule optimization switch -ifo, and the Fortran compiler spells its switch --intermodule_optimize. In this case, the following would be legal:

default=base:
COPTIMIZE = -fast -O4 -ur=8 -ifo
FOPTIMIZE = --prefetch:all --optimize:5 --intermodule_optimize
FLD=/usr/opt/advanced/ld
FLDOPT=--nocompress --lazyload --intermodule_optimize

997.CFmix1,998.CFmix2=base:
COPTIMIZE = -fast -O4 -ur=8
FOPTIMIZE = --prefetch:all --optimize:5
FLD=/usr/opt/advanced/ld
FLDOPT=--nocompress --lazyload

Following the precedence rules as explained in config.html, the above section specifiers set default tuning for the C and Fortran benchmarks, but the tuning is modified for the two mixed-language benchmarks to remove switches that would have attempted inter-module optimization.

2.3.2.2 Same MPI Version

In base, the same MPI version (MPICH, MVAPICH, OpenMPI, etc.) must be used for all benchmarks in the suite.  Furthermore, the same revision of the software (1.6, 1.5.1, etc.) must also be used for all benchmarks in the suite.  If different MPI libraries must be built to satisfy needs of using multiple compilers, the requirements of same version and same revision of the MPI software must be met. Additionally, the MPI must use the same interconnect software layer (e.g. for MPICH, they must use only one of ch_p4, ch_shmem, etc.).

2.3.2.3 Node level parallel model selection

Users may optionally select a node level parallel model using the appropriate pmodel setting.

For base, the same node level parallel model must be used for all benchmarks. The selected pmodel may be one of the following:

pmodel   Description
MPI      MPI-Only (default)
ACC      MPI+OpenACC
OMP      MPI+OpenMP using task/thread directives
TGT      MPI+OpenMP using target directives
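
For example (a sketch; the choice of ACC here is arbitrary), a single model is selected for all of base in the config file:

default=base:
pmodel = ACC         # every benchmark in base is built and run as MPI+OpenACC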

2.3.3 Base build environment

The system environment must not be manipulated during a build of the base binaries and the same environment must be used for all benchmarks. For example, suppose that an environment variable called BIGPAGES can be set to yes or no, and the default is no. The tester must not change the choice during the build or run of the base binaries. See section 2.1.4.

2.3.4 Assertion flags must NOT be used in base

An assertion flag is one that supplies semantic information that the compilation system did not derive from the source statements of the benchmark.

With an assertion flag, the programmer asserts to the compiler that the program has certain nice properties that allow the compiler to apply more aggressive optimization techniques (for example, that there is no aliasing via C pointers). The problem is that there can be legal programs (possibly strange, but still standard-conforming programs) where such a property does not hold. These programs could crash or give incorrect results if an assertion flag is used. This is the reason why such flags are sometimes also called "unsafe flags". Assertion flags should never be applied to a production program without previous careful checks; therefore they must not be used for base.

Exception: a tester is free to turn on a flag that asserts that the benchmark source code complies with the relevant standard (e.g. -ansi_alias). Note, however, that if such a flag is used, it must be applied to all compiles of the given language (C, C++, or Fortran), while still passing SPEC's validation tools with correct answers for all the affected programs.
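
As a sketch of this exception (the -O3 level is a placeholder, and -ansi_alias is only the example spelling given above; compilers may spell such a flag differently), the assertion flag would be added to the common flags for the whole language rather than to any single benchmark:

default=base:
COPTIMIZE = -O3 -ansi_alias     # asserted for every C benchmark in the suite, not per benchmark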

2.3.5 Floating point reordering allowed

Base results may use flags which affect the numerical accuracy or sensitivity by reordering floating-point operations based on algebraic identities, provided that the result validates.

2.3.6 Alignment switches are allowed

Switches that cause data to be aligned on natural boundaries may be used in base.


2.4 Peak Optimization Rules

In addition to the rules listed in section 2.2 above, the selection of optimizations to be used to produce SPEChpc 2021 Benchmark Peak Metrics includes the following:

2.4.1 Safety and Standards Conformance

More aggressive optimizations may be used in peak and need not be considered safe for general usage, provided that the benchmark passes validation. However, such optimizations should still be recommended by the vendor for use under certain conditions, and for a wider body of codes than the SPEChpc 2021 benchmark suites.

2.4.2 Peak Tuning per Benchmark

2.4.2.1 Individual Benchmark Tuning

In peak, different compilers may be used for each benchmark. Options that may be tuned per benchmark include:

  1. compiler options
  2. linker options
  3. preprocessor options
  4. libraries, including compiler, runtime, and optional math
  5. flags that set warning levels (typically -w)
  6. Flags that create object files (typically -c, -o).
  7. flags that affect the verbosity level of the compiler driver (typically -v)
  8. language dialect selection switches (e.g. -ansi99, -std)

2.4.2.2 Node level parallel model selection

For peak, different node level parallel models may be selected for each individual benchmark using the appropriate pmodel setting. The selected model may be one of the following:

pmodel   Description
MPI      MPI-Only (default)
ACC      MPI+OpenACC
OMP      MPI+OpenMP using task/thread directives
TGT      MPI+OpenMP using target directives
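
For example (a sketch; the benchmark name and model choices are illustrative only), peak might keep MPI-only as the default while switching one benchmark to the OpenMP thread port:

default=peak:
pmodel = MPI

605.lbm_s=peak:
pmodel = OMP         # this benchmark alone uses MPI+OpenMP threads in peak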

2.4.3 Peak build environment

The system environment may be manipulated during a build of the peak binaries. For example, suppose that an environment variable called BIGPAGES can be set to yes or no, and the default is no. The tester may change the choice during the build/run of the peak binaries via setting the "ENV_" option (See: envars) in the configuration file's section for a particular benchmark.
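
As a sketch (BIGPAGES is the hypothetical environment variable from the example above, and the benchmark name is illustrative), the change would be requested in that benchmark's own peak section of the config file:

605.lbm_s=peak:
ENV_BIGPAGES = yes   # set only for this benchmark's peak build and run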

2.4.4 Assertion flags may be used in peak

Assertion flags (that is, flags that supply semantic information that the compilation system did not derive from the source statements of the benchmark) may be used in peak, provided that the benchmark validates. However, such flags must still adhere to the general rules listed in section 2.2 above.

2.4.5 Directive Modification

Submitters may modify a benchmark's OpenACC or OpenMP directives in peak runs for architecture tuning with the following restrictions:

The alternate source code package must be submitted to SPEC/HPG for review and approval before it can be used in compliant results. The src.alt will be made available for other SPEC license holders to use.


3. Running SPEChpc 2021 Benchmark Suites

3.1 System Configuration

3.1.1 Operating System State

The operating system state (multi-user, single-user, init level N) may be selected by the tester. This state along with any changes in the default configuration of daemon processes or system tuning parameters must be documented in the notes section of the results disclosure.

3.1.2 File Systems and File Servers

The SPEChpc 2021 benchmark suites require that a single file system be used to contain the installed directory tree. Additional file systems may be used to store temporary build and run directories. A single shared run directory must be used for each benchmark in a base run. Peak runs are allowed to replicate run directories, typically where each node in a cluster stores a private copy in its local file system, and the directories and file systems can be arranged differently for different benchmarks.

SPEC allows any type of file system (disk-based, memory-based, NFS, DFS, FAT, NTFS etc.) to be used. The type and arrangement of the directories and file systems must be disclosed in reported results.

3.1.3 Interconnects for MPI, Memory and File System Communication

A system may use different interconnects to support MPI, memory, and file system communication operations. For peak runs, these may vary between benchmarks. A system may manage communication in a sufficiently general way that different interconnects are used at different times; this is permitted so long as, for base, no special build or run-time settings are made per benchmark.

3.2 Controlling Benchmark Jobs

3.2.1 Number of runs in a reportable result

A reportable run consists of at least two runs of the suite. The reportable result will be the median of an odd number of runs, or the lower median of an even number of runs. For an even number 2N of runs, the lower median is the Nth smallest value (for example, with 4 runs, the 2nd smallest).

3.2.2 Number of ranks in base

The tester must select a single value to use as the number of ranks to be applied to all benchmarks in the suite.

3.2.3 Number of ranks in peak

The tester is free to choose the number of ranks for each individual benchmark independently of the other benchmarks, and this number may be less than, equal to, or greater than the number of ranks specified for base.

3.2.4 The ranks variable

The config file supplies a variable called ranks, which is read using the form $ranks and is set in the config file using the form

ranks = 32

or on the runhpc command-line using the form

runhpc ... -ranks 32 ...

The ranks variable is captured in the run result file, and is used to report the number of ranks used in the overall result. In a reportable run, this variable must always be set to the number of ranks that are running. When the peak run of a benchmark uses a different number of ranks than base, the ranks variable must be reassigned in the peak rule for that benchmark:

ranks = 32
605.lbm_s=peak=default=default:
ranks=16

Further, in a reportable run, the ranks variable must be used in the config file to explicitly control the number of ranks used in the run, so that the number of ranks reported in the result is necessarily the number that was actually used. For example:

submit = mpirun -np $ranks $command                   # Specification on the invocation line.

Note that the command-line "-ranks" option will take precedence over the "ranks" variable set in a config file. Hence if using the "ranks" variable in peak, it is advisable to not use the command-line option.

The submit directive is discussed below.

3.2.5 Thread selection in base

When using the OpenMP or OpenACC node-level parallel models targeting multicore CPUs, the tester must select a single value to use as the number of CPU threads to be applied to all benchmarks using these models in base.

3.2.6 Thread selection in peak

When using the OpenMP or OpenACC node-level parallel models targeting multicore CPUs, the tester is free to choose the number of CPU threads for each individual benchmark independently of the other benchmarks, and this number may be less than, equal to, or greater than the number of CPU threads specified for base.

3.2.7 The threads variable

The config file supplies a variable called threads, which is used to set the OpenMP "OMP_NUM_THREADS" and OpenACC "ACC_NUM_CORES" environment variables. The same number of threads will be used with each MPI rank. The threads variable is set in the config file using a form

threads = 32

or on the runhpc command-line using the form

runhpc ... -threads 32 ...

The threads variable is captured in the run result file, and is used to report the number of threads used in the overall result. In a reportable run, this variable must always be set to the number of threads that are running. When the peak run of a benchmark uses a different number of threads than base, the threads variable must be reassigned in the peak rule for that benchmark:

threads = 32
605.lbm_s=peak=default=default:
threads=16

Note that the command-line "-threads" option will take precedence over the "threads" variable set in a config file. Hence if using the "threads" variable in peak, it is advisable to not use the command-line option.

3.2.8 The submit directive

The config file directive submit is the preferred means to assign work to processors. The tester may, if desired:

  1. place benchmarks on desired processors;
  2. place the benchmark memory on a desired memory unit;
  3. do arithmetic (e.g. via shell commands or scripts) to derive a valid processor number for each rank;
  4. cause the tools to write each copy's benchmark invocation lines to a file, which is then sent to its processor;
  5. reference a testbed description provided by the tester (such as a topology file).

The submit directive can be used to change the run time environment (see section 3.3). In addition, if a testbed description is referenced by a submit directive, the same description must be used by all benchmarks in a base run. This means that in base, the submit directive may only differ between benchmarks in the suite for portability reasons.

In peak, different benchmarks may use different submit directives.
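
As an illustration only, a submit directive might look like the following minimal sketch. It assumes an Open MPI-style mpirun launcher; the binding options and the benchmark name are purely illustrative and do not represent required or recommended settings:

  # Base: a single submit directive applies to all benchmarks.
  submit = mpirun -np $ranks --bind-to core $command

  # Peak: a hypothetical per-benchmark override using a different placement.
  605.lbm_s=peak=default=default:
  submit = mpirun -np $ranks --map-by socket --bind-to socket $command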

3.2.9 MPI program startup/launch

Typically, MPI implementations provide an execution tool (mpirun, mpiexec, or other) that initiates process creation and placement on nodes. The elapsed time reported for one application in the SPEChpc 2021 benchmark suites shall be the time it takes to launch, run the application, and complete the MPI job. No prior knowledge about an upcoming application run started by the 'submit' command may be used to gain a performance advantage, nor may such knowledge be carried over between different application runs in the benchmark suite. Examples of such information include, but are not limited to: the number of MPI ranks to be started, which binaries will be started, and the specific nodes on which the applications will run. This information shall be used only by the job submission, i.e., in the 'submit' command, and the time spent using it must be included in the reported elapsed time.

SPEChpc 2021 Benchmark Suites allow various options to the mpirun/mpiexec (or equivalent) launch command, including but not limited to:

  1. number of MPI ranks to launch
  2. command line to affect process placement
  3. interconnect selection
  4. tunable thresholds for interconnects
  5. enablement of communication optimizations
  6. environment variable propagation
  7. enable portability features within an MPI library such as 'spawn' or '1sided'
  8. binding options

For base submissions, the options will be identical for all benchmarks. Peak submissions may vary the options on a per benchmark basis.
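
For illustration, assuming an Open MPI-style launcher (the option spellings vary by MPI implementation and are shown here only as examples of the categories listed above), a launch command used via submit might look like:

  # -np                  number of MPI ranks to launch
  # --map-by/--bind-to   process placement and binding
  # --mca btl            interconnect/transport selection
  # -x                   environment variable propagation
  submit = mpirun -np $ranks --map-by ppr:2:node --bind-to core --mca btl self,vader,tcp -x OMP_NUM_THREADS $command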

3.3 Run-time environment

Run-time environment settings are treated similarly to compilation options. The rules are as follows, from highest precedence to lowest:

  1. Environment settings may be written into the submit line of the config file, i.e.
    submit = export MP_PROCS=$ranks; ....
    
    Settings are hard to discern if too many details are packed into the submit line. One advantage, however, is that changing the settings does not cause anything to rebuild.
  2. Environment settings may be made in the header section of the config file using entries of the form
    preENV_LD_LIBRARY_PATH = /path/to/library/dir/lib
    
    "preENV" environment settings get applied at the start of the invocation of runhpc and are set globally for the duration of the run. "preENV" environment settings may be overridden in peak using the "ENV" setting (see next).
  3. Environment settings may be made in the config file using entries of the form
    env_vars = 1
    ENV_MP_PROCS = $ranks
    
    The settings are quite clear from reading the text of the config file. A disadvantage is that the settings also apply to the build phase, and changing a setting will cause the affected benchmarks to rebuild.
  4. Environment settings may be made outside the invocation of runhpc:
    export MP_PROCS=...
    runhpc ...
    
    The settings are invisible to the automatic report generation and must be carefully documented.
  5. For base runs, environment settings should be consistent for all benchmarks in the suite. If this is not possible, the same acceptance criteria apply as for portability flags.
  6. Environment settings may be varied between benchmarks in peak runs, as illustrated in the sketch after this list. Note that this is not possible for type-4 settings made prior to the runhpc invocation.
  7. The semantics of the settings must be documented.
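
As a minimal sketch of how these forms can be combined (the environment variable, its values, and the benchmark name are illustrative only, not recommended settings):

  # Set globally when runhpc starts (form 2):
  preENV_OMP_PLACES = cores

  # Hypothetical per-benchmark override in a peak section (form 3):
  605.lbm_s=peak=default=default:
  env_vars = 1
  ENV_OMP_PLACES = sockets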

3.4 Continuous Run Requirement

All benchmark executions, including the validation steps, contributing to a particular submittable report must occur continuously, that is, in one execution of runhpc.

3.5 Base, peak, and basepeak

If a submittable report will contain both base and peak measurements, a single runhpc invocation must be used for the runs. When both base and peak are run, the tools run the base executables first, followed by the peak executables.

It is permitted to publish base results as peak. This can be accomplished in various ways, all of which are allowed:

  1. Set basepeak=yes in the config file for individual benchmarks.
     In this case, the tools will run the same binary for both base and peak; however, the base times will be reported for both base and peak. (The reason for running the binary during both base and peak is to remove the possibility that skipping a benchmark altogether might somehow change the performance of some other benchmark.)
  2. Set basepeak=yes in the config file for an entire suite.
     In this case, the peak runs will be skipped and base results will be reported as both base and peak for the suite.
  3. Select the --basepeak option when using the rawformat utility.
     Doing so will cause a new rawfile to be written, with base results copied to peak. It is permitted to use this feature to copy all of the base results to peak, or just the results for selected benchmarks.
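
For example, the first two forms above might appear in a config file as follows (the benchmark name is illustrative, and the exact placement of the suite-wide setting is governed by the config file documentation):

  # Per benchmark: report base results as peak for this benchmark only.
  605.lbm_s:
  basepeak = yes

  # Or suite-wide: report base results as peak for all benchmarks.
  basepeak = yes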

Note: It is permitted but not required to compile in the same runhpc invocation as the execution. See rule 2.1.5 regarding cross compilation.

3.6 Run-Time Dynamic Optimization

3.6.1 Definitions and Background

As used in these run rules, the term "run-time dynamic optimization" (RDO) refers broadly to any method by which a system adapts to improve performance of an executing program based upon observation of its behavior as it runs. The definition is intentionally broad and is intended to cover a wide range of techniques.

RDO may be under control of hardware, software, or both.

Understood this broadly, RDO is already commonly in use, and usage can be expected to increase. SPEC believes that RDO is useful, and does not wish to prevent its development. Furthermore, SPEC views at least some RDO techniques as appropriate for base, on the grounds that some techniques may require no special settings or user intervention; the system simply learns about the workload and adapts.

However, benchmarking a system that includes RDO presents a challenge. A central idea of SPEC benchmarking is to create tests that are repeatable: if you run a benchmark suite multiple times, it is expected that results will be similar, although there will be a small degree of run-to-run variation. But an adaptive system may recognize the program that it is asked to run, and "carry over" lessons learned in the previous execution; therefore, it might complete a benchmark more quickly each time it is run. Furthermore, unlike in real life, the programs in the benchmark suites are presented with the same inputs each time they are run: value prediction is too easy if the inputs never change. In the extreme case, an adaptive system could be imagined that notices which program is about to run, notices what the inputs are, and which reduces the entire execution to a print statement. In the interest of benchmarking that is both repeatable and representative of real-life usage, it is therefore necessary to place limits on RDO carry-over.

3.6.2 RDO Is Allowed, Subject to Certain Conditions

Run time dynamic optimization is allowed, subject to the usual provisions that the techniques must be generally available, documented, and supported. It is also subject to the conditions listed in the rules immediately following.

3.6.3 RDO Disclosure and Resources

Rule 4.2 applies to run-time dynamic optimization: any settings which the tester has set to non-default values must be disclosed. Resources consumed by RDO must be included in the description of the hardware configuration as used by the benchmark suite.

For example, suppose that a system can be described as a 64-core system. After experimenting for a while, the tester decides that the optimum performance is achieved by dedicating 4 cores to the run-time dynamic optimizer, and running the benchmarks with only 60 ranks. The system under test is still correctly described as a 64-core system, even though only 60 cores were used to run SPEC code.

3.6.4 RDO Settings Cannot Be Changed At Run-time

Run time dynamic optimization is subject to rule 3.4: settings cannot be changed at run-time. But Note 2 of rule 3.4 also applies to RDO: for example, in peak it would be acceptable to compile a subset of the benchmarks with a flag that suggests to the run-time dynamic optimizer that code rearrangement should be attempted. Of course, rule 2.1.1 also would apply: such a flag could not tell RDO which routines to rearrange.

3.6.5 RDO and safety in base

If run-time dynamic optimization is effectively enabled for base (after taking into account the system state at run-time and any compilation flags that interact with the run-time state), then RDO must comply with 2.2.1, the safety rule. It is understood that the safety rule has sometimes required judgment, including deliberation by SPEC in order to determine its applicability. The following is intended as guidance for the tester and for SPEC:

3.6.6 RDO carry-over by program is not allowed

As described in section 3.6.1, SPEC has an interest in preventing carry-over of information from run to run. Specifically, no information may be carried over which identifies the specific program or executable image. The following principles distinguish behavior that is, and is not, allowed.

It does not matter whether the information is intentionally stored or just "left over"; if it is about a specific program, it is not allowed.

If information left over from a previous run is not associated with a specific program, that is allowed.

Any form of RDO that uses memory about a specific program is forbidden.

The system is allowed to respond to the currently running program, and to the overall workload.



4. Results Disclosure

SPEC requires a full disclosure of results and configuration details sufficient to reproduce the results. For results published on its web site, SPEC also requires that base results be published whenever peak results are published. If peak results are published outside of the SPEC web site (http://www.spec.org/hpc2021/) in a publicly available medium, the tester must supply base results on request. Results published under non-disclosure, for company-internal use, or as company-confidential material are not considered "publicly" available.

A full disclosure of results must include:

  1. The components of the disclosure page, as generated by the SPEC tools.
  2. The tester's configuration file and any supplemental files needed to build the executables used to generate the results.
  3. A flags definition disclosure.

A full disclosure of results must include sufficient information to allow a result to be independently reproduced. If a tester is aware that a configuration choice affects performance, then they must document it in the full disclosure.

Note: this rule is not meant to imply that the tester must describe irrelevant details or provide massively redundant information.

For example, if the SuperHero Model 1 comes with a write-through cache, and the SuperHero Model 2 comes with a write-back cache, then specifying the model number is sufficient, and no additional steps need to be taken to document the cache protocol. But if the Model 3 is available with both write-through and write-back caches, then a full disclosure must specify which cache is used.

For information on how to publish a result on SPEC's web site, contact the SPEC office. Contact information is maintained at the SPEC web site, http://www.spec.org/.

4.1 Rules regarding availability dates and systems not yet shipped

If a tester publishes results for a hardware or software configuration that has not yet shipped,

  1. The component suppliers must have firm plans to make production versions of all components generally available within 90 days of the first public release of the result (whether first published by the tester or by SPEC); and
  2. The tester must specify the general availability dates that are planned.

Note 1: "Generally available" is defined in the SPEC High Performance Group Policy document, which can be found at http://www.spec.org/hpg/policy.html.

Note 2: It is acceptable to test larger configurations than customers are currently ordering, provided that the larger configurations can be ordered and the company is prepared to ship them.

For example, if the SuperHero is available in configurations of 1 to 1000 CPUs, but the largest order received to date is for 128 CPUs, the tester would still be at liberty to test a 1000 CPU configuration and publish the result.

4.1.1 Pre-production software can be used

A "pre-production", "alpha", "beta", or other pre-release version of a compiler (or other software) can be used in a test, provided that the performance-related features of the software are committed for inclusion in the final product.

The tester must practice due diligence to ensure that the tests do not use an uncommitted prototype with no particular shipment plans. An example of due diligence would be a memo from the compiler Project Leader which asserts that the tester's version accurately represents the planned product, and that the product will ship on date X.

The final, production version of all components must be generally available within 90 days after first public release of the result.

4.1.2 Software component names

When specifying a software component name in the results disclosure, the component name that should be used is the name that customers are expected to be able to use to order the component, as best as can be determined by the tester. It is understood that sometimes this may not be known with full accuracy; for example, the tester may believe that the component will be called "TurboUnix V5.1.1" and later find out that it has been renamed "TurboUnix V5.2", or even "Nirvana 1.0". In such cases, an editorial request can be made to update the result after publication.

Some testers may wish to also specify the exact identifier of the version actually used in the test (for example, "build 20070604"). Such additional identifiers may aid in later result reproduction, but are not required; the key point is to include the name that customers will be able to use to order the component.

4.1.3 Specifying dates

The configuration disclosure includes fields for both "Hardware Availability" and "Software Availability". In both cases, the date which must be used is the date of the component which is the last of the respective type to become generally available. The date is specified as Mmm-YYYY, as in the following examples: Jan-2007, Feb-2007. The Month is abbreviated to three letters with the first letter capitalized. A hyphen separates the Month and Year fields. The Year field is specified with four digits.

Since all components must be available within 90 days of the first public release of the result, the first day of the specified Month (and Year) must fall within this 90 day window.
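
For example (using hypothetical dates): if a result is first publicly released on 15-Aug-2021, the 90-day window ends on 13-Nov-2021. A specified availability date of Nov-2021 is acceptable, because the first day of that month (01-Nov-2021) falls within the window; Dec-2021 would not be acceptable.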

4.1.4 If dates are not met

If a software or hardware date changes, but still falls within 90 days of first publication, a result page may be updated on request to SPEC.

If a software or hardware date changes to more than 90 days after first publication, the result is considered Non Compliant. For procedures regarding Non Compliant results, see the SPEC High Performance Group Policy Document, http://www.spec.org/hpg/policy.html.

4.1.5 Performance changes for pre-production systems

SPEC is aware that performance results for pre-production systems may sometimes be subject to change, for example when a last-minute bugfix reduces the final performance.

For results measured on pre-production systems, if the tester becomes aware of something that will reduce production system performance by more than 5% on an overall metric, the tester is required to republish the result, and the original result shall be considered Non Compliant.

Analogous rules apply to performance changes across post-production upgrades (Section 4.3.4).

4.2 Configuration Disclosure

The SPEC website posts a set of reports, each consisting of a measured result and the description of the system being measured. A SPEChpc 2021 Benchmark Suite report is broken down into separate sections as follows:

  1. The System Model name, and a description of the Manufacturer and Tester.
  2. The measured Results.
  3. The description(s) of the Node(s).
  4. The description(s) of the Accelerator(s).
  5. The description(s) of the Network(s).
  6. The description of how the Benchmarks were built and run.

The measured Results are captured in the submission file by the runhpc script when a run is made. The other description information is read from the config file and captured in the preamble of the submission file. This preamble can be edited after the run is made, if the information in the config file was incorrect or incomplete. The config file document describes how the fields are to be written. The following subsections describe the information to provide for each piece of the disclosure report, giving examples in some cases.

4.2.1 Identification of System, Manufacturer and Tester

Details are

  1. Model/System Name
  2. System Class: Hetero, Homo, or SMP to differentiate between Heterogeneous or Homogeneous clusters versus Symmetric Multi-Processors.
  3. Test Date: Month, Year
  4. Hardware Vendor
  5. Test sponsor: the entity sponsoring the testing (defaults to hardware vendor).
  6. Tester: the entity actually carrying out the tests (defaults to test sponsor).
  7. HPC 2021 license number of the test sponsor or the tester, for example HPG0001.
  8. Hardware Availability Date: Month, Year
  9. Software Availability Date: Month, Year

4.2.2 Node Configuration

The system will consist of one (in the case of an SMP) or more (in the case of a cluster) nodes. The nodes may be of different types and each type must be described separately. Further, most nodes are used for computation but others may act as file servers or provide other utilities. Both the Hardware and the Software are described:

  1. Number of enabled nodes of this type
  2. Model Name
  3. Hardware Vendor
  4. Purpose: compute (for compute nodes), file server (for file server nodes), head (for head nodes), other (for other types), or combinations of these for multi-purpose nodes.
  5. CPU Name: A manufacturer-determined processor formal name.
  6. CPU Characteristics: Technical characteristics to help identify the processor.
    1. This field must be used to disambiguate which processor is used, unless the CPU is already unambiguously designated by the combination of the fields "CPU Name", "CPU MHz", and "Level (n) Cache".
    2. In addition, SPEC encourages use of this field to make it easier for the reader to identify a processor, even if the processor choice is not, technically, ambiguous.
    3. SPEC does not require that HPC 2021 results be published on the SPEC web site, although such publication is encouraged. For results that are published on its web site, SPEC is likely to use this field to note CPU technical characteristics that SPEC may deem useful for queries, and may adjust its contents from time to time.
    4. Some processor differences may not be relevant to performance, such as differences in packaging, distribution channels, or CPU revision levels that affect a SPEChpc 2021 benchmark suites overall performance metric by less than 5%. In those cases, SPEC does not require disambiguation as to which processor was tested.

    An example may help to clarify these four points:

    For example, when first introduced, the TurboBlaster series is available with only one instruction set, and runs at speeds up to 2GHz. Later, a second instruction set (known as "Arch2") is introduced and older processors are commonly, but informally, referred to as having employed "Arch1", even though they were not sold with that term at the time. Chips with Arch2 are sold at speeds of 2GHz and higher. The manufacturer has chosen to call both Arch1 and Arch2 chips by the same formal chip name (TurboBlaster).

    1. A 2.0GHz TurboBlaster result is published. Since the formal chip name is the same, and since both Arch1 and Arch2 are available at 2.0GHz, the CPU Characteristics field must be used to identify whether this is an Arch1 or Arch2 chip.
    2. A 2.2GHz TurboBlaster result is published. In this case, there is technically no ambiguity, since all 2.2GHz results use Arch2. Nevertheless, the tester is encouraged to note that the chip uses Arch2, to help the reader disambiguate the processors.
    3. As an aid to technical readers doing queries, SPEC may decide to adjust all the TurboBlaster results that have been posted on its website by adding either "Arch1" or "Arch2" to all posted results.
    4. The 2.2GHz TurboBlaster is available in an OEM package and a Consumer package. These are highly similar, although the OEM version has additional testing features for use by OEMs. But these are both 2.2GHz TurboBlasters, with the same cache structure, same instruction set, and, within run-to-run variation, the same HPC 2021 performance. In this case, it is not necessary to specify whether the OEM or Consumer version was tested.
  7. CPU MHz: a numeric value expressed in megahertz. That is, do not say "1.0 GHz", say "1000". The value here is to be the speed at which the CPU is run, even if the chip itself is sold at a different clock rate. That is, if you "over-clock" or "under-clock" the part, disclose here the actual speed used.
  8. Level 1 (primary) Cache: Size, location, number of instances (e.g. "32 KB I + 64 KB D on chip per core").
  9. Level 2 (secondary) Cache: Size, location, number of instances.
  10. Level 3 (tertiary) Cache: Size, location, number of instances.
  11. Other Cache: Size, location, number of instances.
  12. Number of CPUs in the node. It is assumed that systems can be described as a cluster of one or more compute "nodes", built from one or more processor "chips", each of which contains some number of "cores", each of which can run some number of hardware "threads". Fields are provided in the results disclosure for each of these. If industry practice evolves such that these terms are no longer sufficient to describe processors, SPEC may adjust the field set.
  13. The current fields, for each node of type YYY, are:

    1. node_YYY_hw_ncores: number of processor cores enabled during this test
    2. node_YYY_hw_nchips: number of processor chips enabled during this test
    3. node_YYY_hw_ncoresperchip: number of cores that are manufactured into a chip
    4. node_YYY_hw_nthreadspercore: number of hardware threads enabled (per core) during this test
    5. node_YYY_count: number of processing nodes of type YYY enabled in the system.

    Regarding the fields in the above list that mention the word "enabled": if a node, chip, core, or thread is available for use during the test, then it must be counted. If one of these resources is disabled - for example by a firmware setting prior to boot - then it need not be counted, but the tester must exercise due diligence to ensure that disabled resources are truly disabled, and not silently giving help to the result.

    Regarding the field (hw_ncoresperchip), the tester must count the cores irrespective of whether they are enabled.

    Example: In the following tests, the SUT is a Turboblaster Model 32-64-256, which contains 8 nodes of 4 chips each. Each chip has 2 cores. Each core can run 4 hardware threads.

    1. A 256-rank SPEChpc 2021-Tny test uses all the available resources. It is reported as:
      node_Turbo_hw_nnodes:          8
      node_Turbo_hw_ncores:          8
      node_Turbo_hw_nchips:          4
      node_Turbo_hw_ncoresperchip:   2
      node_Turbo_hw_nthreadspercore: 4
      				
    2. The same system is tested with a 32-rank SPEChpc 2021-Tny test, without changing the system configuration. Even though they are now only lightly loaded, all the above resources are still configured into the SUT; therefore the SUT must still be described as:
      node_Turbo_hw_nnodes:          8
      node_Turbo_hw_ncores:          8
      node_Turbo_hw_nchips:          4
      node_Turbo_hw_ncoresperchip:   2
      node_Turbo_hw_nthreadspercore: 4
      		
    3. The system is halted, and firmware commands are entered to disable all but 2 of the nodes. All resources are available on the remaining 8 chips. The system is rebooted and a 32-rank test is run once more. This time, the resources are:
      node_Turbo_hw_nnodes:          2
      node_Turbo_hw_ncores:          8
      node_Turbo_hw_nchips:          4
      node_Turbo_hw_ncoresperchip:   2
      node_Turbo_hw_nthreadspercore: 4
      		
    4. The system is halted, and firmware commands are entered to enable 4 nodes (for a total of 16 chips); but only 1 core is enabled per chip, and hardware threading is turned off. The system is booted, and a 16-rank test is run. The resources this time are:
      node_Turbo_hw_nnodes:          4
      node_Turbo_hw_ncores:          4
      node_Turbo_hw_nchips:          4
      node_Turbo_hw_ncoresperchip:   2
      node_Turbo_hw_nthreadspercore: 1
      		

    Note: if resources are disabled, the method(s) used for such disabling must be documented and supported.

  14. Number of CPUs orderable. Specify the number of processors that can be ordered per node, using whatever units the customer would use when placing an order. If necessary, provide a mapping from that unit to the nodes/chips/cores units just above. For example:
    1 to 8 TurboCabinets. Each TurboCabinet contains 4 chips.
    		
  15. Memory: Size in MB/GB. Performance-relevant information as to the memory configuration must be included, either in the field or in the notes section. If there is one and only one way to configure memory of the stated size, then no additional detail need be disclosed. But if a buyer of the system has choices to make, then the result page must document the choices that were made by the tester.
    For example, the tester may need to document number of memory carriers, size of DIMMs, banks, interleaving, access time, or even arrangement of modules: which sockets were used, which were left empty, which sockets had the bigger DIMMs.
    Exception: if the tester has evidence that a memory configuration choice does not affect performance, then SPEC does not require disclosure of the choice made by the tester.
    For example, if an 8GB system is known to perform identically whether configured with 8 x 1GB DIMMs or 4 x 2GB DIMMs, then SPEC does not require disclosure of which choice was made.
  16. Disk Subsystem: Describes disks attached to processor nodes. File Server disks are documented as below. Details are Size (GB/TB), Type (SCSI, Fast SCSI etc.), and any other performance-relevant characteristics. The disk subsystem used for the SPEChpc 2021 benchmark suites run directories must be described. If other disks are also performance relevant, then they must also be described.
  17. Other Hardware: Additional equipment added to improve performance (special disk controller, NVRAM file system accelerator etc.).
  18. Operating System: (Name and Version)
  19. System State: On Linux systems with multiple run levels, the system state must be described by stating the run level and a very brief description of the meaning of that run level, for example:

    System State: Run level 4 (multi-user with display manager)

    On other systems, describe the state using that operating system's own terminology.

    Note: some Unix (and Unix-like) systems have deprecated the concept of "run levels", preferring other terminology for state description. In such cases, the system state field should use the vocabulary recommended by the operating system vendor.

  20. Additional detail about system state may be added in free form notes.

  21. Local File System Type(s): the type of the file system, or list of types of file systems, local to the node.
  22. Shared File System Type(s): the types of any file systems shared by the node.
  23. Other Software: Additional software added to improve performance.

4.2.3 Accelerator Configuration

This section describes the configuration of the target device. The following information needs to be supplied per node. Note: if more than one model of accelerator is used on a node, use a comma-delimited list for each field (where appropriate). If the node does not use an accelerator, use "N/A" for the field description. Leaving a field blank is discouraged because it is unclear whether an accelerator was not installed or the tester forgot to include the description. Details are:

  1. Model Name:  Stand-alone product name (what you would buy)
  2. Hardware Vendor: Company that makes the Model
  3. Count:  Number of accelerators installed on a node
  4. Type: GPU, APU, FPGA, Co-Processor, CPU, MIC, Manycore CPU, etc.
  5. Connection to Host: PCIe, PCIe external, SoC, N/A
  6. ECC enabled:  yes/no
  7. Hardware Description:  Free form notes (sysinfo) detailing specific, performance-relevant information about the accelerator.
  8. Driver:  Version and name of the driver software

4.2.4 Adaptor Configurations

In a Cluster system, each node will use one or more adaptors to attach to the available interconnect(s). For each node type, then, any number of adaptors can be described with the following attributes:

  1. Vendor and Model
  2. Count: How many of this kind of adaptor are attached to the node.
  3. Slot Type: PCIe x8, HTX, etc.
  4. Data rate: The per-port nominal data-rate, in some units like GB/s.
  5. Ports used: The number of cables attaching the adaptor to the interconnect.
  6. Which interconnect that the adaptor attaches to.
  7. Driver and firmware levels

4.2.5 Interconnect Configuration

The nodes in a compute-cluster will be linked via one or more interconnects that will carry various kinds of communication. The report will contain one section for each interconnect. An SMP system with no external file server will not use an interconnect for communication and file transfers, so none of the below fields will be filled in and the report will not contain any interconnect sections. The details are:

  1. Vendor and model
  2. Purpose: what kind of communication traffic crosses this interconnect.
  3. Topology: a brief description of the interconnect arrangement, including the branch and root switches.
  4. Switch vendor and model
  5. Switch count: How many switches of this type compose the interconnect.
  6. Switch ports: The number of cables attached to this switch.
  7. Switch data rate
  8. Switch firmware level

4.2.6 Software Configuration

This section describes the compiler invocation and running of the benchmarks. Details are:

  1. Compilers: (Name, Version and Vendor)
  2. MPI library: (Name, Version and Vendor) If the MPI is provided as a source distribution, then the disclosure should describe how to obtain the software and how to configure, build and install.
  3. Drivers: Device Driver Names and Versions
  4. Other Software: Additional software added to improve performance.
  5. System Services: If performance relevant system services or daemons are shut down (e.g. remote management service, disk indexer / defragmenter, spyware defender, screen savers) these must be documented in the notes section. Incidental services that are not performance relevant may be shut down without being disclosed, such as the print service on a system with no printers attached. The tester remains responsible for the results being reproducible as described.
  6. Scripted Installations and Pre-configured Software: In order to reduce the cost of benchmarking, test systems are sometimes installed using automatic scripting, or installed as preconfigured system images. A tester might use a set of scripts that configure the corporate-required customizations for IT Standards, or might install by copying a disk image that includes Best Practices of the performance community. SPEC understands that there is a cost to benchmarking, and does not forbid such installations, with the proviso that the tester is responsible to disclose how end users can achieve the claimed performance (using appropriate fields above).

    Example: the Corporate Standard Jumpstart Installation Script has 73 documented customizations and 278 undocumented customizations, 34 of which no one remembers. Of the various customizations, 17 are performance relevant for SPEChpc 2021 - and 4 of these are in the category "no one remembers". The tester is nevertheless responsible for finding and documenting all 17. Therefore, to remove doubt, the tester prudently decides that it is less error-prone and more straightforward to simply start from customer media, rather than the Corporate Jumpstart.

4.2.7 Tuning Configuration

  1. Base flags list.
  2. Peak flags for each benchmark.
  3. Portability flags used for any benchmark.
  4. Any additional notes such as listing any use of SPEC-approved alternate sources or tool changes.

  5. System Tuning: System tuning must be documented. The free form notes must mention the tuning parameters (including BIOS settings) that have been applied. The definition of the parameters may be in the free form notes or in the flags file. Tuning parameters must also be documented and supported by the vendor.
  6. Selection of SPEC-approved alternate benchmark source-codes or tool changes.
  7. If a change is planned for the spelling of a tuning string, both spellings should be documented in the notes section.

    For example, suppose the tester uses a pre-release compiler with:

    f90 -O4 --newcodegen --loopunroll:outerloop:alldisable
    	

    but the tester knows that the new code generator will be automatically applied in the final product, and that the spelling of the unroll switch will be simpler than the spelling used here. The recommended spelling for customers who wish to achieve the effect of the above command will be:

    f90 -O4 -no-outer-unroll
    	

    In this case, the flags report will include the actual spelling used by the tester, but a note should be added to document the spelling that will be recommended for customers.

  8. Mapping of Ranks to Cores on the System. Utilities used to perform the mapping as well as the topology should be described.
  9. Any additional notes including Non-default BIOS settings.

4.2.8 Description of Portability and Tuning Options ("Flags File")

SPEChpc 2021 benchmark suites provides benchmarks in source code form, which are compiled under control of SPEC's toolset. The SPEC tools automatically detect the use of Compilation and Linkage flags in the config file and document them in the Benchmark Configuration section of the final report. Both portability and optimization flags will be captured in the report subsection.

The SPEC tools require a flag description file which provides information about the syntax of the flags and their meanings. A result will be marked "invalid" unless it has an associated flag description file. A description of how to write one may be found at www.spec.org/hpc2021/Docs .

The level of detail in the description of a flag is expected to be sufficient so that an interested technical reader can form a preliminary judgment of whether he or she would also want to apply the option.

It is acceptable, and even common practice, for testers to build on each other's flags files, copying all or part of flags files posted by others into their own flags files; but doing so does not relieve an individual tester of the responsibility to ensure that the description is accurate.

Although these descriptions have historically been called "flags files", they must also include descriptions of other performance-relevant options that have been selected, including but not limited to environment variables, kernel options, file system tuning options, BIOS options, and options for any other performance-relevant software packages.

4.2.9 Configuration Disclosure for User Built Systems

SPEChpc 2021 Benchmark Results are for systems, not just for chips: it is required that a user be able to obtain the system described in the result page and reproduce the result (within a small range for run-to-run variation).

Nevertheless, SPEC recognizes that chip and motherboard suppliers have a legitimate interest in HPC benchmarking. For those suppliers, the performance-relevant hardware components typically are the cpu chip, motherboard, memory, and interconnect; but users would not be able to reproduce a result using only those four. To actually run the benchmarks, the user has to supply other components, such as a case, power supply, and disk; perhaps also a specialized CPU cooler, extra fans, a disk controller, graphics card, network adapter, BIOS, and configuration software.

Such systems are sometimes referred to as "white box", "home built", "kit built", or by various informal terms. For SPEC purposes, the key point is that the user has to do extra work in order to reproduce the performance of the tested components; therefore, this document refers to such systems as "user built".

For user built systems, the configuration disclosure must supply a parts list sufficient to reproduce the result. As of the listed availability dates in the disclosure, the user should be able to obtain the items described in the disclosure, spread them out on an anti-static work area, and, by following the instructions supplied with the components, plus any special instructions in the SPEC disclosure, build a working system that reproduces the result. It is acceptable to describe components using a generic name (e.g. "Any ATX case"), but the recipe must also give specific model names or part numbers that the user could order (e.g. "such as a Mimble Company ATX3 case").

Component settings that are listed in the disclosure must be within the supported ranges for those components. For example, if the memory timings are manipulated in the BIOS, the selected timings must be supported for the chosen type of memory.

Components for a user built system may be divided into two kinds: performance-relevant and non-performance-relevant. For example, benchmark scores are affected by memory speed, and motherboards often support more than one choice for memory; therefore, the choice of memory type is performance-relevant. By contrast, the motherboard needs to be mounted in a case. Which case is chosen is not normally performance-relevant; it simply has to be the correct size (e.g. ATX, microATX, etc.).

Example:

hw_cpu_name    = Frooble 1500 
hw_memory      = 2 GB (2x 1GB Mumble Inc Z12 DDR2 1066) 
sw_other       = SnailBios 17
notes_plat_000 = 
notes_plat_005 = The BIOS is the Mumble Inc SnailBios Version 17,
notes_plat_010 = which is required in order to set memory timings
notes_plat_015 = manually to DDR2-800 5-5-5-15.  The 2 DIMMs were
notes_plat_020 = configured in dual-channel mode. 
notes_plat_025 = 
notes_plat_030 = A standard ATX case is required, along with a 500W
notes_plat_035 = (minimum) ATX power supply [4-pin (+12V), 8-pin (+12V)
notes_plat_040 = and 24-pin are required].  An AGP or PCI graphics
notes_plat_045 = adapter is required in order to configure the system.
notes_plat_050 =
notes_plat_055 = The Frooble 1500 CPU chip is available in a retail box,
notes_plat_060 = part 12-34567, with appropriate heatsinks and fan assembly.  
notes_plat_065 =
notes_plat_070 = As tested, the system used a Mimble Company ATX3 case,
notes_plat_075 = a Frimble Ltd PS500 power supply, and a Frumble
notes_plat_080 = Corporation PCIe Z19 graphics adapter.
notes_plat_085 = 

Additional notes:

Note 1: Regarding graphics adapters:

Note 2: Regarding power modes: Sometimes processors are capable of running with differing performance characteristics according to how much power the user would like to spend. If non-default power choices are made for a user built system, those choices must be documented in the notes section.

Note 3: Regarding cooling systems: Sometimes processors are capable of running with degraded performance if the cooling system (fans, heatsinks, etc.) is inadequate. When describing user built systems, the notes section must describe how to provide cooling that allows the processor to achieve the measured performance.

4.3 Test Results Disclosure

The actual test results consist of the elapsed times and ratios for the individual benchmarks and the overall SPEC metric produced by running the benchmarks via the SPEC tools. The required use of the SPEC tools ensures that the results generated are based on benchmarks built, run, and validated according to the SPEC run rules.

4.3.1 SPEChpc 2021 Performance Metrics

Below is a list of the measurement components for each SPEChpc 2021 benchmark suite and its metrics:

  Tiny:   SPEChpc 2021-Tny_base and SPEChpc 2021-Tny_peak
  Small:  SPEChpc 2021-Sml_base and SPEChpc 2021-Sml_peak
  Medium: SPEChpc 2021-Med_base and SPEChpc 2021-Med_peak
  Large:  SPEChpc 2021-Lrg_base and SPEChpc 2021-Lrg_peak

These are calculated as follows:

  1. For the given suite (Tiny, Small, Medium, Large), the elapsed time in seconds for each of its benchmark runs is reported.
  2. The ratio of the reference system (TU-Dresden's Taurus System) time divided by the corresponding measured time is reported.
  3. Separately for base and peak, the median or lower median of these ratios is reported per benchmark.
  4. The "base" metric is the geometric mean of medians of the base ratios, and the "peak" metric is the geometric mean of medians of the peak ratios.

All runs of a specific benchmark when using the SPEC tools are required to have validated correctly. The benchmark executables must have been built according to the rules described in section 2 above.
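
As a purely hypothetical illustration of the calculation above: suppose a suite contained only three benchmarks, and the medians of their measured base ratios (reference time divided by measured time) were 4.0, 2.0, and 8.0. The base metric would then be the geometric mean (4.0 * 2.0 * 8.0)^(1/3) = 64^(1/3) = 4.0. The peak metric is computed in the same way from the peak ratios.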

4.3.2 Metric Selection

Publication of peak results is considered optional by SPEC, so the tester may choose to publish only base results. Since by definition base results adhere to all the rules that apply to peak results, the tester may choose to refer to these results by either the base or peak metric names (e.g. SPEChpc 2021-Med_base or SPEChpc 2021-Lrg_peak) or the name SPEChpc 2021_Sml whose value is the greater of SPEChpc 2021-Sml_base and SPEChpc 2021-Sml_peak.

It is permitted to publish base-only results. Alternatively, the use of the flag basepeak is permitted, as described in section 3.5.

4.3.3 Estimates are allowed

SPEChpc 2021 benchmark suites metrics may be estimated. All estimates must be clearly identified as such. It is acceptable to estimate a single metric (for example, SPEChpc 2021-Sml_base, or SPEChpc 2021-Med_peak, or the elapsed seconds for 605.lbm_s). Note that it is permitted to estimate a peak metric without being required to provide a corresponding estimate for base.

SPEC requires that every use of an estimated number be clearly marked with "est." or "estimated" next to each estimated number, rather than burying a footnote at the bottom of a page.

For example, say that the JumboFast will achieve estimated performance of:

Model 1   SPEChpc 2021-Sml_base    50 est.
          SPEChpc 2021-Sml_peak    60 est.
Model 2   SPEChpc 2021-Sml_base    70 est.
          SPEChpc 2021-Sml_peak    80 est.

If estimates are used in graphs, the word "estimated" or "est." must be plainly visible within the graph, for example in the title, the scale, the legend, or next to each individual result that is estimated.

Note: the term "plainly visible" in this rule is not defined; it is intended as a call for responsible design of graphical elements. Nevertheless, for the sake of giving at least rough guidance: the right way is to mark each estimated value where it appears in the graph (for example, in the legend or next to the bar); the wrong way is to rely on a footnote placed elsewhere on the page.

Licensees are encouraged to give a rationale or methodology for any estimates, together with other information that may help the reader assess the accuracy of the estimate. For example:

  1. "This is a measured estimate: SPEChpc 2021 Small Benchmark Suite was run on pre-production hardware. Customer systems, planned for Q4, are expected to be similar."
  2. "Performance estimates are modeled using the cycle simulator GrokSim Mark IV. It is likely that actual hardware, if built, would include significant differences."

Those who publish estimates are encouraged to publish actual SPEC metrics as soon as possible.

4.3.4 Performance changes for production systems

As mentioned previously, performance may sometimes change for pre-production systems; but this is also true of production systems (that is, systems that have already begun shipping). For example, a later revision to the firmware, or a mandatory OS bugfix, might reduce performance.

For production systems, if the tester becomes aware of something that reduces performance by more than 5% on an overall metric (for example, SPEChpc 2021-Sml_base or SPEChpc 2021-Med_base), the tester is encouraged but not required to republish the result. In such cases, the original result is not considered Non Compliant. The tester is also encouraged, but not required, to include a reference to the change that makes the results different (e.g. "with OS patch 20020604-02").

4.4 Required Disclosures

If a SPEC licensee publicly discloses a SPEChpc 2021 benchmark suites result (for example in a press release, academic paper, magazine article, or public web site), and does not clearly mark the result as an estimate, any SPEC member may request that the rawfile(s) from the run(s) be sent to SPEC. The rawfiles must be made available to all interested members no later than 10 working days after the request. The rawfile is expected to be complete, including configuration information (section 4.2 above).

A required disclosure is considered public information as soon as it is provided, including the configuration description.

For example, Company A claims a result of 1000 SPEChpc 2021-Sml_base. A rawfile is requested, and supplied. Company B notices that the result was achieved by stringing together 50 chips in single-user mode. Company B is free to use this information in public (e.g. it could compare the Company A system vs. a Company B system that scores 999 using only 25 chips in multi-user mode).

Review of the result: Any SPEC member may request that a required disclosure be reviewed by the SPEC/HPG committee. At the conclusion of the review period, if the tester does not wish to have the result posted on the SPEC result pages, the result will not be posted. Nevertheless, as described above, the details of the disclosure are public information.

When public claims are made about a SPEChpc 2021 benchmark suites result, whether by vendors or by academic researchers, SPEC reserves the right to take action if the rawfile is not made available, or shows different performance than the tester's claim, or has other rule violations.

4.5 Research and Academic usage of SPEChpc 2021 benchmark suites

SPEC encourages use of the SPEChpc 2021 benchmark suites in academic and research environments. It is understood that experiments in such environments may be conducted in a less formal fashion than that demanded of testers who publish on the SPEC web site. For example, a research environment may use early prototype hardware that simply cannot be expected to stay up for the length of time required to meet the Continuous Run requirement (see section 3.4), or may use research compilers that are unsupported and are not generally available (see section 1).

Nevertheless, SPEC would like to encourage researchers to obey as many of the run rules as practical, even for informal research. SPEC respectfully suggests that following the rules will improve the clarity, reproducibility, and comparability of research results.

Where the rules cannot be followed, SPEC requires that the deviations from the rules be clearly disclosed, and that any SPEC metrics (such as SPEChpc 2021-Sml_base) be clearly marked as estimated.

It is especially important to clearly distinguish results that do not comply with the run rules when the areas of non-compliance are major, such as not using the reference workload, or only being able to correctly validate a subset of the benchmarks.

4.6 Fair Use

Consistency and fairness are guiding principles for SPEC. To help assure that these principles are met, any organization or individual who makes public use of SPEC benchmark results must do so in accordance with the SPEC Fair Use Rule, as posted at http://www.spec.org/fairuse.html.



5. Run Rule Exceptions

If for some reason, the test sponsor cannot run the benchmarks as specified in these rules, the test sponsor can seek SPEC approval for performance-neutral alternatives. No publication may be done without such approval. The SPEC High Performance Group (HPG) maintains a Policies and Procedures document that defines the procedures for such exceptions.



Copyright © 1999-2021 Standard Performance Evaluation Corporation
All Rights Reserved