SPEC Run and Reporting Rules for SPEChpc
Date:
This document provides the guidelines required to build, run, and report on the benchmarks in the SPEChpc suite. It specifies how the benchmarks are to be run for measuring and publicly reporting performance results of commercially available computer systems, so that results generated with this suite are meaningful for cross-platform comparisons, comparable to results generated on other platforms, and repeatable (with documentation covering the factors pertinent to duplicating the results).
Per the SPEChpc license agreement, all results publicly disclosed must adhere to the SPEChpc Run and Reporting Rules. The following basics are expected:
Each of these points is discussed in further detail below.
Suggestions for improving this run methodology should be made to SPEC for consideration in future releases.
The SPEC High Performance Group (HPG) Benchmarking Philosophy is congruent with the basic principles of SPEC. Specifically, the SPEC High Performance Steering Committee believes that fair and consistent comparisons across high performance computing platforms are of significant benefit to the computing industry.
In addition, and somewhat different from the SPEC Open Systems Group, SPEC HPG recognizes that "out of the box" performance is not an expectation in the high performance computing marketplace. Rather, it is the norm that systems and codes are extensively tuned to achieve the best possible performance, yielding improved turnaround times for results. This aspect of high performance computing stems from the higher cost of the compute engines: more efficient, faster code makes better use of very expensive machine resources. Furthermore, it is the intent of SPEC HPG to provide benchmarks that measure sustained application performance, which vendors will preferably use in the future instead of the still widely quoted peak performance numbers.
Agreeing to a set of benchmark rules for SPEChpc has been a much more difficult task than it had been for the existing SPEC benchmarks due to the lack of agreed software standards for parallel systems and the non-uniformity of the high performance platforms. A further complication is that fundamentally different code implementations (scalar vs. vector, shared-memory vs. message passing, etc.) may be required for different high performance systems.
To ensure that results are relevant to end-users, SPEC HPG expects that the hardware, software, and benchmark implementations used for obtaining SPEChpc results adhere to the following conventions:
In the case where it appears that the above guidelines have not been followed, SPEC HPG may investigate such a claim and request that the offending optimization (e.g. recognizing a SPEC-benchmark specific pattern and replacing it with optimized code) be backed off and the results resubmitted. Or, SPEC HPG may request that the vendor correct the deficiency (e.g. make the optimization more general purpose or correct problems with code generation) before submitting results based on the optimization.
SPEC reserves the right to adapt the SPEChpc suite as it deems necessary to preserve its goal of fair benchmarking (e.g. remove benchmark, modify benchmark code or workload, etc). If a change is made to the suite, SPEC will notify the appropriate parties (i.e. members and licensees). SPEC will redesignate the metrics (e.g. changing the metric from SPECseis96 to SPECseis96a). In the case that a benchmark is removed, SPEC reserves the right to republish in summary form "adapted" results for previously published systems, converted to the new metric. In the case of other changes, such a republication may necessitate retesting and may require support from the original test sponsor.
Based on these considerations, what follows are run and reporting rules intended to provide results that are consistent and comparable across the different architectures currently represented by SPEC HPG member vendors.
SPEC HPG has adopted a set of rules defining how the SPEChpc benchmark suite must be prepared, run, and measured to produce benchmark metrics. These rules are discussed in the following sections.
A set of run/make tools is supplied with the SPEChpc suite to build and run the benchmarks. To produce "publication-quality" results, SPEC HPG recommends use of these tools. This facilitates reproducibility of results by ensuring that all individual benchmarks in the suite are run in the same environment and that a configuration file defining the optimizations used is available. SPEC does recognize that some members may find it more appropriate to use a different tool set to build and run the suite. This is permitted, and the run environment will be reviewed at submission time by the SPEC Review Subcommittee to ensure compliance with the intent of the run and reporting rules.
Compiler command line options are referred to as switches or flags. Any standard flag is allowed if it is supported and documented by the compiler supplier. This includes porting, optimization, and preprocessor invocation flags.
Two sets of compile line options per language per code per report are allowed. As an example, consider the SPECseis96_sm benchmark. Since it has both Fortran and C components, four different sets of command line options may be used. That is, the files containing Fortran code (*.f) may be divided into two sets, the first compiled with one set of compiler flags and the second with another. The same may be done with the files containing C-language code (*.c).
This accommodates flag-level optimizations such as vector versus scalar, parallel versus serial, or other selective compilation of code segments. The restriction is intended to provide a means for optimization while limiting the number of flag combinations that an unrestricted rule could introduce.
Any flag used must be generally available, documented, and supported by the vendor, and it should improve performance for a class of programs larger than the single benchmark.
Finally, all compiler switches and flags used must be disclosed in the results submission.
The integrity of the released benchmark source code will be maintained. All source code modifications must be disclosed in the results submission. All modifications to the source code and any supplied data sets are subject to review and approval by SPEC HPG before public release.
SPEC HPG acknowledges the non-uniformity of high performance computer systems and that embedded source program directives are common industry practice on these systems to reach acceptable performance levels. Therefore, SPEC HPG has made provision for the use of source code directives in the SPEChpc benchmark codes. Vendors who support language extensions instead of directives (such as PARALLEL DO instead of C$OMP PARALLEL) may use them accordingly.
Examples of acceptable directives include:
The goal is to provide a benchmark code which exploits the key architectural features of a system, but not necessarily optimally.
Compiler directives must not be uniquely applicable to the execution of the SPEChpc benchmark codes.
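For illustration only, the following sketch shows the kind of generally applicable directive these provisions contemplate; the routine, loop, and variable names are hypothetical and are not taken from any SPEChpc benchmark source:

    /* Hypothetical example: a generic OpenMP work-sharing directive of the
       kind permitted by these rules.  A compiler that does not support the
       directive simply ignores the pragma and compiles the serial loop. */
    void scale_array(double *x, double a, int n)
    {
        int i;
    #pragma omp parallel for
        for (i = 0; i < n; i++)
            x[i] = a * x[i];
    }

Because such a directive applies to any loop of this form, it is not uniquely applicable to the SPEChpc benchmark codes.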
SPEC HPG acknowledges the widespread practice of using general purpose mathematical libraries when building large applications. Such libraries may include functions such as Fourier transforms, BLAS (basic linear algebra subprograms), convolution and correlation, LINPACK (linear equation solution), EISPACK (eigensystem solution), or the like.
These libraries are commonly obtained from several sources, most often supplied by system vendors as supported products or purchased from independent software vendors (e.g. IMSL(TM) or NAG(TM)). In some cases, libraries are publicly available from other sources (e.g. LAPACK).
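As an illustration (a sketch only, assuming a CBLAS-style C interface; the header name and calling convention vary from vendor to vendor), replacing a hand-coded matrix multiply with a supported library routine is the kind of substitution these provisions address:

    #include <cblas.h>   /* C interface to a vendor- or source-supplied BLAS */

    /* Hypothetical example: compute C = A * B for n-by-n row-major matrices
       with a supported BLAS routine rather than a hand-coded triple loop. */
    void matmul(int n, const double *A, const double *B, double *C)
    {
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    n, n, n, 1.0, A, n, B, n, 0.0, C, n);
    }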
Acceptable use of such libraries is subject to the following rules:
Any other modification required for porting of benchmark codes is subject to review by the SPEC HPG.
The system state (multi-user, single-user, init level N) may be selected by the benchmarker. This state along with any changes in the default configuration of daemon processes or system tuning parameters must be documented in the notes section of the results disclosure.
All benchmark executions, including the validation steps, contributing to a particular result page must occur continuously and without interruption. Exceptions must be approved by SPEC HPG prior to public release.
If, for some reason, the test sponsor cannot run the benchmarks as specified in these rules, the sponsor may seek SPEC HPG approval for performance-neutral alternatives. No publication may be done without such approval.
All performance results of commercially available computer systems that refer to SPEChpc benchmarks and metrics must conform to the rules defined in this document.
If a benchmarker wishes to publish SPEChpc results in a public forum/medium, they are required to submit a full disclosure as described in this section to SPEC HPG for prior review and approval. Results designated as company confidential, used in non-disclosure situations, or used internally within the company are not considered public. These rules are intended to protect the use, and therefore the integrity, of the SPEChpc metrics.
A full disclosure of how the SPEChpc benchmark was prepared, run, and measured is required for publication on the SPEC web page. This information is intended as a specification of the hardware and software environment in which the benchmark was run such that it is adequate for replicating results.
Refer to Section 4.3 for instructions on how to submit the full disclosure to SPEC HPG for publication on the SPEC web page and the form the disclosure must take.
Note: All components of the tested system must be available within 6 months of the publication of results.
Note: If any of the configuration elements listed above are not relevant to the system being benchmarked, that should be noted in the disclosure. Conversely, hardware configuration elements that are relevant but not specifically listed above should be disclosed.
Description of system tuning, including any special OS parameters set, changes to standard daemons, etc. The default is defined by those parameters that take effect given no intervention by "end user" system management.
Refer to section 2 of this document for rules concerning what flags, directives, and libraries are allowed.
The benchmarker's configuration file (for example, the file produced by the SPEC HPG run tools) and any supplemental makefiles needed to build the executables used to generate the results.
The actual test results consist of benchmark elapsed times. The SPEChpc benchmark metric is defined as 86400 seconds, the number of seconds in one day, divided by the elapsed time to run the benchmark code in seconds. For example, if the benchmark elapsed time is 1000 seconds, then the SPEChpc benchmark metric is 86.4. The metric number must always appear with the complete SPEChpc metric name, such as SPECchem96_MD. The disclosure must include the output file produced by the benchmark run, which indicates the time of the benchmark run, the elapsed time, and the validation result.
Prior to publication on the SPEC Web page, all SPEChpc results/configuration disclosures will be reviewed by SPEC HPG. To facilitate this process, disclosures should be sent to SPEC via email to info@spec.org or submitted via the Web-based submission form to be found on the SPEC HPG internal Web pages.
When submitting results/configuration disclosures to SPEC HPG, include the following elements:
All ASCII text files submitted should include a header block with the name of the submitting company, date, and contact information (email address and phone number).
Within the SPEChpc suite, individual benchmark applications represent specific areas of industrial interest and scientific research. The application areas, benchmarks in these areas, and their SPEC names are defined in the SPEC/HPG charter (section "Resolutions of the High-Performance Steering Committee") and described on the SPEC/HPG public web page.
Benchmark metrics are reported per application area, per problem size. Four problem sizes are provided. Thus, for each benchmark there are the following metric names:
Reporting pages (see Appendix A) correspond to the metrics listed above, and benchmarkers may report on any subset of them.
Keeping the system configuration (hardware and software) the same, the benchmarker may report a table of results for a particular metric in which only the number of processors (or threads) used by the benchmark code is varied. The reporting page provides a table for this.
The metrics are unitless. They are defined as follows:
                        86400 seconds
    metric  =  ------------------------------------
               Elapsed time of benchmark in seconds
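A minimal sketch of this arithmetic, using the 1000-second elapsed time from the example given earlier:

    #include <stdio.h>

    /* Minimal sketch: 86400 seconds (one day) divided by the benchmark
       elapsed time in seconds.  An elapsed time of 1000 seconds yields a
       metric of 86.4. */
    int main(void)
    {
        double elapsed_seconds = 1000.0;
        double metric = 86400.0 / elapsed_seconds;
        printf("SPEChpc metric: %.1f\n", metric);
        return 0;
    }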
It is left to the benchmarker to determine what number of CPUs to report upon.