ABSTRACT
This document describes the contents and arrangement of a SPEC MPI2007 result disclosure.
This document refers to the arrangement of fields in the HTML
format of a SPEC MPI2007 result disclosure, since the arrangement of
result fields may differ among the text, HTML, and
other report formats (CSV, PDF, PS, etc.).
While the reports are formatted in a way that is intended to be self-explanatory,
readers may still want a formal statement of what a field means, or have technical questions
about the information provided in the fields.
Further, the SPEC website contains links from the fields of the published reports
to their descriptions in this document.
The contents of the result reports are generated by the runs of the benchmarks, extracted from the configuration file that controls the building and running of the benchmarks, or provided in descriptive fields filled in by the tester. These follow conventions that are specified in the separate documents on the Run Rules, Config Files, and XML Flags Files. Reports published on the SPEC website have been peer-reviewed by the members of the SPEC/HPG committee and are expected to be correct in every detail.
(To check for possible updates to this document, please see http://www.spec.org/mpi2007/Docs/)
Selecting one of the following will take you to the detailed table of contents for that section or subsection:
1. SPEC MPI2007 Benchmarks
1.1 Benchmarks by suite
1.2 Benchmarks by language
2. Result and Configuration Summary
2.1 Top bar
2.2 Results table
2.3 System Summary
2.4 Benchmark Configuration
3. Hardware Description
3.1 Node Description(s)
3.2 Interconnect Description(s)
4. Compilation Description
4.1 Base & Peak Unknown Flags
4.2 Base & Peak Forbidden Flags
4.3 Base & Peak Compiler Invocation
4.4 Base & Peak Portability Flags
4.5 Base & Peak Optimization Flags
4.6 Base & Peak Other Flags
5. General Notes
6. Errors
1. SPEC MPI2007 Benchmarks
1.1 Benchmarks by suite
1.1.1 Benchmarks in the medium suite
1.1.2 Benchmarks in the large suite
1.2 Benchmarks by language
1.2.1 C Benchmarks
1.2.2 C++ Benchmarks
1.2.3 Fortran Benchmarks
1.2.4 Benchmarks using both Fortran and C
2. Result and Configuration Summary
2.1 Top bar
2.1.1 MPIM2007 Result
2.1.2 System Vendor
2.1.3 System Name
2.1.4 SPECmpiM_peak2007
2.1.5 SPECmpiM_base2007
2.1.6 SPECmpiL_peak2007
2.1.7 SPECmpiL_base2007
2.1.8 MPI2007 license #
2.1.9 Hardware Availability
2.1.10 Software Availability
2.1.11 Test date
2.1.12 Test sponsor
2.1.13 Tested by
2.2 Results table
2.2.1 Benchmark
2.2.2 Ranks
2.2.3 Seconds
2.2.4 Ratio
2.2.5 Identifying the Median results
2.2.6 Significance of the run order
2.3 System Summary
2.3.1 Type of System
2.3.2 Compute Node
2.3.3 Interconnects
2.3.4 File Server Node
2.3.5 Head Node
2.3.6 Other Node
2.3.7 Total Compute Nodes
2.3.8 Total Chips
2.3.9 Total Cores
2.3.10 Total Threads
2.3.11 Total Memory
2.3.12 Maximum Ranks Run
2.3.13 Minimum Peak Ranks
2.3.14 Maximum Peak Ranks
2.4 Benchmark Configuration
2.4.1 C Compiler
2.4.2 C++ Compiler
2.4.3 Fortran Compiler
2.4.4 Auto Parallel
2.4.5 Base Pointers
2.4.6 Peak Pointers
2.4.7 MPI Library
2.4.8 Other MPI Information
2.4.9 Pre-processors
2.4.10 Other Software
3. Hardware Description
3.1 Node Description(s)
3.1.1 Hardware
3.1.1.1 Node class
3.1.1.2 Number of nodes
3.1.1.3 Vendor
3.1.1.4 Model
3.1.1.5 CPU Name
3.1.1.6 CPU(s) orderable
3.1.1.7 Chip(s) enabled
3.1.1.8 Core(s) enabled
3.1.1.9 Cores per Chip
3.1.1.10 Threads per Core
3.1.1.11 CPU Characteristics
3.1.1.12 CPU MHz
3.1.1.13 Primary Cache
3.1.1.14 Secondary Cache
3.1.1.15 L3 Cache
3.1.1.16 Other Cache
3.1.1.17 Memory
3.1.1.18 Disk Subsystem
3.1.1.19 Other Hardware
3.1.1.20 Adapter Card Model Name
3.1.1.20.1 Adapter Count
3.1.1.20.2 Adapter Slot Type
3.1.1.20.3 Data Rate
3.1.1.20.4 Ports Used
3.1.1.20.5 Adapter Interconnect
3.1.2 Software
3.1.2.1 Adapter Driver
3.1.2.2 Firmware
3.1.2.3 Operating System
3.1.2.4 Local File System
3.1.2.5 Shared File System
3.1.2.6 System State
3.1.2.7 Other Software
3.1.3 Node Notes
3.2 Interconnect Description(s)
3.2.1 Vendor
3.2.2 Model
3.2.3 Switch Model(s)
3.2.3.1 Number of switches
3.2.3.2 Ports per switch
3.2.3.3 Data Rate
3.2.3.4 Firmware
3.2.4 Topology
3.2.5 Primary Use
3.2.6 Notes
4. Compilation Description
4.1 Base & Peak Unknown Flags
4.2 Base & Peak Forbidden Flags
4.3 Base & Peak Compiler Invocation
4.3.1 Notes
4.4 Base & Peak Portability Flags
4.4.1 C benchmarks
4.4.2 C++ benchmarks
4.4.3 Fortran benchmarks
4.4.4 Benchmarks using both Fortran and C
4.4.5 Notes
4.5 Base & Peak Optimization Flags
4.5.1 C benchmarks
4.5.2 C++ benchmarks
4.5.3 Fortran benchmarks
4.5.4 Benchmarks using both Fortran and C
4.5.5 Notes
4.6 Base & Peak Other Flags
4.6.1 C benchmarks
4.6.2 C++ benchmarks
4.6.3 Fortran benchmarks
4.6.4 Benchmarks using both Fortran and C
4.6.5 Notes
5. General Notes
6. Errors
The MPIM2007 suite comprises 13 floating-point compute-intensive codes: 4 in Fortran, 2 in C, 1 in C++, and 6 that contain both Fortran and C.
The MPIL2007 suite comprises 12 floating-point compute-intensive codes: 4 in Fortran, 3 in C, 1 in C++, and 4 that contain both Fortran and C.
Two benchmarks in the MPIM2007 suite are written in C: 104.milc and 122.tachyon.
Three benchmarks in the MPIL2007 suite are written in C: 122.tachyon, 125.RAxML, and 142.dmilc.
One benchmark in the MPIM2007 and MPIL2007 suites is written in C++: 126.lammps.
Four benchmarks in the MPIM2007 suite are written in Fortran: 107.leslie3d, 113.GemsFDTD, 129.tera_tf, and 137.lu.
Four benchmarks in the MPIL2007 suite are written in Fortran: 129.tera_tf, 137.lu, 143.dleslie, and 145.lGemsFDTD.
Six benchmarks in the MPIM2007 suite are written using both Fortran and C: 115.fds4, 121.pop2, 127.wrf2, 128.GAPgeofem, 130.socorro, and 132.zeusmp2.
Four benchmarks in the MPIL2007 suite are written using both Fortran and C: 121.pop2, 128.GAPgeofem, 132.zeusmp2, and 147.l2wrf2.
More detailed information about metrics is in sections 4.3.1 and 4.3.2 of the MPI2007 Run and Reporting Rules.
This result is from the MPIM2007 suite using the medium (or mref) dataset.
The vendor of the system under test.
The name of the system under test.
The geometric mean of thirteen normalized ratios (one for each benchmark) when compiled with aggressive optimization for each benchmark.
More detailed information about this metric is in section 4.3.1 of the MPI2007 Run and Reporting Rules.
The geometric mean of thirteen normalized ratios when compiled with conservative optimization for each benchmark.
More detailed information about this metric is in section 4.3.1 of the MPI2007 Run and Reporting Rules.
The geometric mean of twelve normalized ratios (one for each benchmark, run on the large (lref) workloads) when compiled with aggressive optimization for each benchmark.
More detailed information about this metric is in section 4.3.2 of the MPI2007 Run and Reporting Rules.
The geometric mean of twelve normalized ratios (computed from run times and reference times on the lref workloads) when compiled with conservative optimization for each benchmark.
More detailed information about this metric is in section 4.3.2 of the MPI2007 Run and Reporting Rules.
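To make the computation concrete, here is a minimal sketch in Python. The benchmark names, reference times, and run times are invented for illustration; the real reference times are fixed by the suite.

    from math import prod  # Python 3.8+

    # Hypothetical (benchmark, reference_seconds, measured_seconds) triples.
    runs = [("910.aaa", 1000.0, 250.0),
            ("920.bbb", 1500.0, 300.0),
            ("930.ccc", 2000.0, 800.0)]

    # A benchmark's ratio is its reference time divided by its measured run
    # time, so a system faster than the reference platform scores above 1.
    ratios = [ref / sec for (_name, ref, sec) in runs]

    # The suite metric is the geometric mean of the per-benchmark ratios
    # (thirteen for MPIM2007, twelve for MPIL2007; three here for brevity).
    geomean = prod(ratios) ** (1.0 / len(ratios))
    print(round(geomean, 2))  # 3.68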
The SPEC MPI2007 license number of the organization or individual that ran the result.
The date when all the hardware necessary to run the result is generally available. For example, if the CPU is available in Aug-2007, but the memory is not available until Oct-2007, then the hardware availability date is Oct-2007 (unless some other component pushes it out farther).
The date when all the software necessary to run the result is generally available. For example, if the operating system is available in Aug-2007, but the compiler or other libraries are not available until Oct-2007, then the software availability date is Oct-2007 (unless some other component pushes it out farther).
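In other words, the availability date is the latest availability date among the required components. A small sketch, using the hypothetical dates from the examples above:

    from datetime import date

    # Hypothetical component availability dates, per the examples above.
    components = {"CPU": date(2007, 8, 1), "memory": date(2007, 10, 1)}

    # The reported availability date is the latest of the component dates.
    availability = max(components.values())
    print(availability.strftime("%b-%Y"))  # Oct-2007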
The date when the test was run. This value is supplied by the tester; the time reported by the system under test is recorded in the raw result file.
The name of the organization or individual that sponsored the test. Generally, this is the name of the license holder.
The name of the organization or individual that ran the test. If there are installations in multiple geographic locations, sometimes that will also be listed in this field.
In addition to the graph, the results of the individual benchmark runs are also presented in table form.
The name of the benchmarks making up this MPI2007 suite.
This column indicates the number of MPI ranks (processes) that were used during the running of the benchmark.
This is the amount of elapsed (wall) time in seconds that the benchmark took to run from job submit to job completion.
This is the ratio of the benchmark's run time on the reference platform to its run time on the system under test, so larger ratios indicate faster runs.
For a reportable MPI2007 run, at least two iterations of each benchmark are run, and the median of the runs (the lower of the middle two, if there is an even number) is selected to be part of the overall metric. In output formats that support it, the medians in the results table are underlined in bold. The ".txt" report marks each median score with an asterisk "*".
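The selection rule can be stated precisely in a few lines of Python; the scores below are invented:

    def spec_median(scores):
        # With an odd count this is the middle score; with an even
        # count it is the lower of the middle two.
        ordered = sorted(scores)
        return ordered[(len(ordered) - 1) // 2]

    print(spec_median([10.1, 10.5, 10.3]))        # middle of three: 10.3
    print(spec_median([10.1, 10.5, 10.3, 10.4]))  # lower of middle two: 10.3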
Each iteration in the MPI2007 benchmark suite will run each benchmark once, in order. For example, given benchmarks "910.aaa", "920.bbb", and "930.ccc", here's what you might see as the benchmarks were run if they were part of each suite:
MPI2007
    Running (#1) 910.aaa ref base oct09a default
    Running (#1) 920.bbb ref base oct09a default
    Running (#1) 930.ccc ref base oct09a default
    Running (#2) 910.aaa ref base oct09a default
    Running (#2) 920.bbb ref base oct09a default
    Running (#2) 930.ccc ref base oct09a default
    Running (#3) 910.aaa ref base oct09a default
    Running (#3) 920.bbb ref base oct09a default
    Running (#3) 930.ccc ref base oct09a default
The results in the results table are listed in the order in which they were run, in column-major order. In other words, if you are interested in the base scores as they were produced, start at the upper left and read down the first column, then the middle column, then the right column.
If the benchmarks were run with both base and peak tuning, all base runs were completed before starting peak.
Collective hardware details across the whole system. Run rules relating to these items can be found in section 4.2 of the MPI2007 Run and Reporting Rules.
Description of the system being benchmarked: SMP, Homogeneous Cluster or Heterogeneous Cluster.
These systems are used as compute nodes during the run of the benchmark.
These devices are used for the interconnects during the benchmark run.
These systems are used as the file server for the benchmark run.
This system is the head node, a node that serves as the lead for the benchmark run.
This system is part of the entire configuration and has a purpose other than those previously mentioned.
Number of compute nodes used to execute the benchmark.
The total number of chips in the compute nodes available to execute the benchmark.
The total number of cores in the compute nodes available to execute the benchmark.
The total number of threads in the compute nodes available to execute the benchmark.
The total amount of memory in all of the compute nodes available to execute the benchmark.
The number of MPI ranks used to execute the benchmark in the base optimization runs.
The smallest number of ranks used to execute the benchmark runs using peak optimizations.
The largest number of ranks used to execute the peak version of the benchmark.
Information on how the benchmark binaries are constructed. Run rules relating to these items can be found in section 4.2 of the MPI2007 Run and Reporting Rules.
The names and versions of C compilers, preprocessors, and performance libraries used to generate the result.
The names and versions of C++ compilers, preprocessors, and performance libraries used to generate the result.
The names and versions of Fortran compilers, preprocessors, and performance libraries used to generate the result.
Indicates whether automatic parallelization by the compilers was used when building the benchmarks.
Indicates whether all the benchmarks in base used 32-bit pointers, 64-bit pointers, or a mixture. For example, if the C and C++ benchmarks used 32-bit pointers, and the Fortran benchmarks used 64-bit pointers, then "32/64-bit" would be reported here.
Indicates whether all the benchmarks in peak used 32-bit pointers, 64-bit pointers, or a mixture.
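As an illustration of how the reported value follows from the per-benchmark pointer sizes, here is a hypothetical helper (not part of the SPEC tools):

    def pointer_label(widths_in_bits):
        # Collapse per-benchmark pointer widths (32 or 64) into one label.
        widths = set(widths_in_bits)
        if widths == {32}:
            return "32-bit"
        if widths == {64}:
            return "64-bit"
        return "32/64-bit"

    # C and C++ benchmarks built 32-bit, Fortran benchmarks built 64-bit:
    print(pointer_label([32, 32, 64, 64]))  # 32/64-bit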
The names and versions of MPI Libraries used to generate the result.
Any performance-relevant MPI information used to generate the result.
Any performance-relevant pre-processors that were used to generate the result.
Any performance-relevant non-compiler software used, including third-party libraries, accelerators, etc.
SPEC MPI2007 is capable of running on large heterogeneous clusters containing different kinds of nodes linked by different kinds of interconnects. The report format contains a separate section for each kind of node and each kind of interconnect. Section 4.2 of the Run Rules document describes what information is to be provided by the tester.
For example, an SMP will consist of one node and no interconnect. Homogeneous cluster systems will typically consist of one kind of compute node and one or two kinds of interconnects. There will also often be a file server node. It is possible that the node and interconnect components are available from their respective vendors but no vendor sells the configured system as a whole; in this case the report is intended to provide enough detail to reconstruct an equivalent system with equivalent performance.
Description of the hardware and software configuration of the node.
Hardware configuration of the node.
The number of nodes of this type in the system.
The purpose of this type of node: compute node, file server, head node, etc.
The manufacturer of this kind of node.
The model name of this kind of node.
A manufacturer-determined formal name of the processor used in this node type.
The number of CPUs that can be ordered in this kind of node.
The number of Chips (CPUs) that were enabled and active in the node during the benchmark run.
The number of cores that were enabled and active in the node during the benchmark run.
The number of cores in each chip that were enabled and active in the node during the benchmark run.
The number of threads in each core that were enabled and active in the node during the benchmark run.
Technical characteristics to help identify the processor type used in the node.
The clock frequency of the CPU used in the node, expressed in megahertz.
Description (size and organization) of the CPU's primary cache. This cache is also referred to as "L1 cache".
Description (size and organization) of the CPU's secondary cache. This cache is also referred to as "L2 cache".
Description (size and organization) of the CPU's tertiary, or "Level 3" cache.
Description (size and organization) of any other levels of cache memory.
Description of the system main memory configuration. End-user options that affect performance, such as the arrangement of memory modules, interleaving, latency, etc., are documented here.
A description of the disk subsystem (size, type, and RAID level if any) of the storage used to hold the benchmark tree during the run.
Any additional equipment added to improve performance.
There will be one of these groups of entries for each kind of network adapter -- also known as a Host Channel Adapter (HCA) or Network Interface Card (NIC) -- used to connect to an interconnect that carries MPI or file server traffic.
This field contains this adapter's vendor and model name.
How many of these adapters attach to the node.
The type of slot used to attach the adapter card to the node.
The per-port, nominal data transfer rate of the adapter.
The number of ports used to run the benchmark on the adapter (especially for those which have multiple ports available).
In general terms, the type of interconnect (Ethernet, InfiniBand, etc.) attached to this adapter.
Software configuration of the node.
The driver type and level for this adapter.
The firmware type and level for this device.
The operating system name and version. If there are patches applied that affect performance, they must be disclosed in the Notes.
The type of the file system local to each compute node.
The type of the file system used to contain the run directories.
The state (sometimes called "run level") of the system while the benchmarks were being run. Generally, this is "single user", "multi-user", "default", etc.
Any performance-relevant non-compiler software used, including third-party libraries, accelerators, etc.
Free-form notes about other hardware or software details of the node, such as changes to the default operating system state, changes to the default hardware state of the node, or other tuning information.
Description of the configuration of the interconnect.
The manufacturer(s) of this interconnect.
The model name(s) of the interconnect as a whole, or components of it -- not including the switch model, which is the next field.
The model and manufacturer of the switching element(s) of this interconnect. There may be more than one kind declared.
The number of switches of this type in the interconnect.
The number of ports per switch available for carrying the type of traffic noted in the "Primary Use" field.
The per-port, nominal data transfer rate of the switch.
The firmware type and level for the switch(es).
Description of the arrangement of switches and links in the interconnect.
The kind of data traffic carried by the interconnect: MPI, file server, etc.
Free-form notes about other hardware or software details of the interconnect.
This section describes how the benchmarks are compiled. The HTML and PDF reports contain links from the settings that are listed, to the descriptions of those settings in the XML flags file report.
Much of this information is derived from the compilation rules written in the config file and interpreted according to the rules specified in the XML flags file, and free-form notes can be added to it. A section appears only if the corresponding flags are used (for example, peak optimization flags); otherwise it is not printed. Section 2 of the MPI2007 Run and Reporting Rules document gives rules on how these items can be used in reportable runs.
This section lists flags, used in the base or peak compiles, that were not recognized by the report generation. Results with unknown flags are marked "invalid" and may not be published.
This usually means that the flagsurl parameter was not set correctly, or that details need to be added to the XML flags file. The "invalid" marking may be removed by reformatting the result using a flags file that describes all of the unknown flags.
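For illustration, a config file points the tools at a flags file with the flagsurl parameter; a hypothetical setting might look like this (the URL is a placeholder, not a real flags file):

    # Hypothetical config-file excerpt referencing an XML flags file.
    flagsurl = http://www.spec.org/mpi2007/flags/EXAMPLE-platform.xml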
This section lists flags, used in the base or peak compiles, that are designated as Forbidden in the XML flags file for the benchmark or the platform. Results with forbidden flags are marked "invalid" and may not be published.
This section describes how the compilers are invoked, whether any special paths had to be used or flags were passed, etc.
Additional free-form notes explaining the compiler usage.
This section describes the portability settings that are used to build the benchmarks. Optimization settings are not listed here.
Portability settings specific to the C benchmarks.
Portability settings specific to the C++ benchmarks.
Portability settings specific to the Fortran benchmarks.
Portability settings specific to the mixed-language benchmarks.
Additional free-form notes explaining the portability settings.
This section describes the optimization settings that are used to build the benchmark binaries for the base and peak runs.
Optimization settings specific to the C benchmarks.
Optimization settings specific to the C++ benchmarks.
Optimization settings specific to the Fortran benchmarks.
Optimization settings specific to the mixed-language benchmarks.
Additional free-form notes explaining the optimization settings.
This section describes the other settings that are used to build or run the benchmark binaries for the base and peak runs. These are classified as being neither portability nor optimization settings.
Settings specific to the C benchmarks.
Settings specific to the C++ benchmarks.
Settings specific to the Fortran benchmarks.
Settings specific to the mixed-language benchmarks.
Additional free-form notes explaining the other settings.
This section is where the tester provides notes about things not covered in the other notes sections.
This section is automatically inserted by the benchmark tools when there are errors present that prevent the result from being a valid reportable result.
Copyright © 2007-2010 Standard Performance Evaluation Corporation
All Rights Reserved