-
Q1: What is SPEC CPU2000?
-
A1: SPEC CPU2000 is a software benchmark product produced by the Standard
Performance Evaluation Corp. (SPEC), a non-profit group that includes
computer vendors, systems integrators, universities, research
organizations, publishers and consultants from around the world. It is
designed to provide performance measurements that can be used to compare
compute-intensive workloads on different computer systems.
SPEC CPU2000 contains two benchmark suites: CINT2000 for measuring and
comparing compute-intensive integer performance, and CFP2000 for
measuring and comparing compute-intensive floating point performance.
-
Q2: What is a benchmark?
-
A2: A benchmark is a standard of measurement or evaluation. SPEC is a
non-profit organization formed to establish and maintain computer
benchmarks for measuring and comparing component- and system-level
computer performance.
-
Q3: What does the "C" in CINT2000 and CFP2000 stand
for?
-
A3: The "C" denotes that these are component-level benchmarks
as opposed to whole system benchmarks.
-
Q4: What components do CINT2000 and CFP2000 measure?
-
A4: Being compute-intensive benchmarks, they measure performance of the
computer's processor, memory architecture and compiler. It is
important to remember the contribution of the latter two components --
performance is more than just the processor.
-
Q5: What component performance is not measured by CINT2000 and
CFP2000?
-
A5: The CINT2000 and CFP2000 benchmarks do not stress I/O (disk drives),
networking or graphics. It might be possible to configure a system in
such a way that one or more of these components impact the performance of
CINT2000 and CFP2000, but that is not the intent of the suites.
-
Q6: What is included in the SPEC CPU2000 package?
-
A6: SPEC provides the following in its CPU2000 package:
- CPU2000 tools for compiling, running and validating the benchmarks for a variety of operating systems
- source code for the tools, so that they can be built for systems not covered by the pre-compiled tools
- source code for the benchmarks
- tools for generating performance reports
- run and reporting rules defining how the benchmarks should be used to produce standard results
- CPU2000 documentation
SPEC CPU2000 includes tools for most UNIX operating systems and Windows
NT. Additional products for other operating systems will be released
later if SPEC detects enough demand. CPU2000 will be shipped on a
single CD-ROM disk.
-
Q7: What does the SPEC CPU2000 user have to provide?
-
A7: The user must have a computer system running UNIX or Windows NT with
C, C++ and FORTRAN90 compilers (FORTRAN77 may be used for some
benchmarks). A CD-ROM drive must also be available. Depending on the system
under test, approximately 1GB will be needed on a hard drive to install,
build and run SPEC CPU2000.
It is also assumed that the system has at least 256MB of RAM to ensure
that the benchmarks remain compute-intensive (SPEC is requiring a
larger memory size to measure the performance of larger applications).
While it is possible to run the suites on a system with less memory,
doing so would introduce paging effects into the measurement, making it
less compute-intensive.
-
Q8: What are the basic steps in running
the benchmarks?
-
A8: Installation and use are covered in detail in the SPEC CPU2000 User
Documentation. The basic steps are as follows:
- Install CPU2000 from media.
- Run the installation scripts, specifying your operating system.
- Compile the tools if executables are not provided in CPU2000.
- Determine which metric you want to run.
- Create a configuration file for that metric. In this file, you specify compiler flags and other system-dependent information.
- Run the SPEC tools to build (compile), run and validate the benchmarks.
- If the above steps are successful, generate a report based on the run times and metric equations.
-
Q9: What source code is provided? What exactly makes up these
suites?
-
A9: CINT2000 and CFP2000 are based on compute-intensive applications
provided as source code. CINT2000 contains 11 applications written in C
and one in C++ (252.eon) that are used as benchmarks:
Name          | Brief Description
164.gzip      | Data compression utility
175.vpr       | FPGA circuit placement and routing
176.gcc       | C compiler
181.mcf       | Minimum cost network flow solver
186.crafty    | Chess program
197.parser    | Natural language processing
252.eon       | Ray tracing
253.perlbmk   | Perl
254.gap       | Computational group theory
255.vortex    | Object-oriented database
256.bzip2     | Data compression utility
300.twolf     | Place and route simulator
CFP2000 contains 14 applications (six FORTRAN77, four FORTRAN90 and
four C) that are used as benchmarks:
Name          | Brief Description
168.wupwise   | Quantum chromodynamics
171.swim      | Shallow water modeling
172.mgrid     | Multi-grid solver in 3D potential field
173.applu     | Parabolic/elliptic partial differential equations
177.mesa      | 3D graphics library
178.galgel    | Fluid dynamics: analysis of oscillatory instability
179.art       | Neural network simulation: adaptive resonance theory
183.equake    | Finite element simulation: earthquake modeling
187.facerec   | Computer vision: recognizes faces
188.ammp      | Computational chemistry
189.lucas     | Number theory: primality testing
191.fma3d     | Finite-element crash simulation
200.sixtrack  | Particle accelerator model
301.apsi      | Solves problems regarding temperature, wind and distribution of pollutants
The numbers in the benchmarks' names serve as identifiers to
distinguish programs from one another (for example, some programs were
updated from SPEC CPU95 and must be distinguished from their previous
versions).
More detailed descriptions of the benchmarks (with references to papers,
web sites, etc.) can be found in the individual benchmark directories
in the SPEC benchmark tree.
-
Q10: What metrics can be measured?
-
A10: The CINT2000 and CFP2000 suites can be used to measure and calculate
the following metrics:
CINT2000:
- SPECint2000: the geometric mean of 12 normalized ratios (one for each integer benchmark) when each benchmark is compiled with "aggressive" optimization.
- SPECint_base2000: the geometric mean of 12 normalized ratios when each benchmark is compiled with "conservative" optimization.
- SPECint_rate2000: the geometric mean of 12 normalized throughput ratios when each benchmark is compiled with "aggressive" optimization.
- SPECint_rate_base2000: the geometric mean of 12 normalized throughput ratios when each benchmark is compiled with "conservative" optimization.
CFP2000:
- SPECfp2000: the geometric mean of 14 normalized ratios (one for each floating point benchmark) when each benchmark is compiled with "aggressive" optimization.
- SPECfp_base2000: the geometric mean of 14 normalized ratios when each benchmark is compiled with "conservative" optimization.
- SPECfp_rate2000: the geometric mean of 14 normalized throughput ratios when each benchmark is compiled with "aggressive" optimization.
- SPECfp_rate_base2000: the geometric mean of 14 normalized throughput ratios when each benchmark is compiled with "conservative" optimization.
The ratio for each of the benchmarks is calculated using a
SPEC-determined reference time and the actual run time of the benchmark.
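As an illustration only, here is a minimal sketch in Python of how such a
metric can be computed, assuming the normalized ratio is the reference time
divided by the measured run time (scaled so the reference machine scores
100) and that the overall metric is the geometric mean of the per-benchmark
ratios. The benchmark times shown are made up, not actual SPEC data.

    from math import exp, log

    # Hypothetical reference times and measured run times (seconds);
    # these numbers are illustrative, not real SPEC data.
    reference_times = {"164.gzip": 1400.0, "175.vpr": 1400.0, "176.gcc": 1100.0}
    measured_times  = {"164.gzip":  280.0, "175.vpr":  350.0, "176.gcc":  200.0}

    def spec_ratio(ref, run):
        # Normalized ratio, scaled by 100 so the reference machine scores 100.
        return 100.0 * ref / run

    ratios = [spec_ratio(reference_times[b], measured_times[b])
              for b in reference_times]

    # Overall metric: geometric mean of the per-benchmark ratios.
    overall = exp(sum(log(r) for r in ratios) / len(ratios))
    print("per-benchmark ratios:", [round(r, 1) for r in ratios])
    print("overall metric (geometric mean): %.1f" % overall)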
-
Q11: What is the difference between a "conservative" (base)
metric and an "aggressive" (non-base) metric?
-
A11: In order to provide comparisons across different computer hardware,
SPEC provides benchmarks as source code. This means they must be compiled
before they can be run. There was agreement within SPEC that the
benchmarks should be compiled the way users compile programs.
But how do users compile programs? On one side, people might just
compile with the general high-performance options suggested by the
compiler vendor. On the other side, people might experiment with many
different compilers and compiler flags to achieve the best performance.
So, while SPEC cannot match exactly how everyone uses compilers, it can
provide metrics that represent the general characteristics of these two
groups.
The base metrics (e.g., SPECint_base2000) are required for all reported
results and have set guidelines for compilation (e.g., the same flags
must be used in the same order for all benchmarks of the same language,
no more than four optimization flags, no assertion flags). The assumed
model uses the performance compiler flags that a compiler vendor would
suggest for a given program knowing only the language it is written in. The non-base
metrics (e.g., SPECint2000) are optional and have less strict
requirements (e.g., different compiler options may be used on each
benchmark).
A full description of the distinctions can be found in the SPEC CPU2000
run and reporting rules.
-
Q12: What is the difference between a "rate" and a
"non-rate" metric?
-
A12: There are several different ways to measure computer performance.
One way is to measure how fast the computer completes a single task; this
is a speed measurement. Another way is to measure how many tasks a
computer can accomplish in a certain amount of time; this is called a
throughput, capacity or rate measurement.
The SPEC speed metrics, or non-rate metrics (e.g., SPECint2000), are
used for comparing the ability of a computer to complete single tasks.
The SPEC rate metrics (e.g., SPECint_rate2000) measure the throughput
or rate of a machine carrying out a number of similar tasks.
Traditionally, the rate metrics have been used to demonstrate the
performance of multi-processor systems.
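The following simplified Python sketch contrasts the two views. The numbers
are made up, and the exact rate formula (including any scaling constants) is
defined in the SPEC run and reporting rules rather than here.

    # Made-up numbers for illustration only.
    reference_time = 1400.0    # SPEC-determined reference time for one benchmark (s)
    single_copy_time = 350.0   # elapsed time for one copy on the system under test (s)

    # Speed (non-rate) view: how quickly a single task completes.
    speed_ratio = reference_time / single_copy_time      # 4.0x the reference machine

    # Rate view: run several copies at once and credit the work done per unit time.
    copies = 4
    elapsed_with_copies = 400.0   # wall-clock time until all 4 copies finish (s)
    throughput_ratio = copies * reference_time / elapsed_with_copies   # 14.0

    print("speed ratio:", speed_ratio)
    print("throughput ratio:", throughput_ratio)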
-
Q13: How should I use SPEC CPU2000?
-
A13: Typically, the best measurement of a system is the performance of
your own application with your own workload. Unfortunately, due to time,
money or other constraints, it is often very difficult and expensive to
get a wide base of reliable, repeatable and comparable measurements of
this kind on different systems.
Benchmarks act as a reference point for comparison. It's the same
reason that gas mileage ratings exist, although probably no driver gets
exactly the same mileage as listed in the ratings. If you understand
what benchmarks measure, they're useful.
It's important to know that CINT2000 and CFP2000 are CPU-focused,
not system-focused benchmarks. They concentrate on only one portion of
the factors that contribute to application performance. A graphics or
network performance bottleneck within an application, for example, will
not be reflected in these benchmarks.
Understanding your own needs helps determine the relevance of the
benchmarks.
-
Q14: Which SPEC CPU2000 metric(s) should be used to determine
performance?
-
A14: It depends on your needs. SPEC provides the benchmarks and results
as tools for you to use. You need to determine how you use a computer or
what your performance requirements are and then choose the appropriate
SPEC metric(s).
A single user running a compute-intensive integer program, for example,
might only be interested in SPECint2000 or SPECint_base2000. On the
other hand, a person who maintains a machine used by multiple
scientists running floating point simulations might be more concerned
with SPECfp_rate2000 or SPECfp_rate_base2000.
-
Q15: SPEC CPU95 is already an available product. Why create SPEC
CPU2000? Will it show anything different from SPEC CPU95?
-
A15: Technology is always improving. As the technology improves, the
benchmarks should improve as well. SPEC needed to address the following
issues:
Run-time:
Several of the CPU95 benchmarks were finishing in less than a minute
on leading-edge processors/systems. Given the SPEC measurement tools,
small changes or fluctuations in the measurements were having a
significant impact on the percentage improvements being reported. SPEC
chose to make run times for the CPU2000 benchmarks longer, taking future
performance into account, so that this will not become an issue during
the lifetime of the suites.
Application size:
Many comments received by SPEC indicated that applications had grown
in complexity and size and that CPU95 was becoming less
representative of what runs on current systems. For CPU2000, SPEC
selected programs with larger resource requirements to provide a mix
alongside some of the smaller programs.
Application type:
SPEC felt that there were additional application areas that should
be included in CPU2000 to increase variety and representation within
the suites. Areas such as 3D and image recognition have been added
and data compression has been expanded.
Moving target:
CPU95 has been available for five years and much improvement in
hardware and software has occurred during this time. Benchmarks need
to evolve to keep pace with improvements.
Education:
As the computer industry grows, benchmark results are quoted more
often. With the release of new benchmark suites such as CPU2000,
there is a fresh opportunity to discuss benchmark results and their
significance.
-
Q16: What happens to SPEC CPU95 after SPEC CPU2000 is
released?
-
A16: SPEC will begin the process of retiring CPU95. Three months after
the announcement of CPU2000, SPEC will require all CPU95 submissions to
be accompanied by CPU2000 results. After six months, SPEC will stop
accepting CPU95 results for its web site. SPEC will also stop selling
CPU95 at a date yet to be determined.
-
Q17: Is there a way to translate SPEC CPU95 results to SPEC CPU2000
results or vice versa?
-
A17: There is no formula for converting CPU95 results to CPU2000 results
or vice versa; they are different products. There probably will be some
correlation between CPU95 and CPU2000 results (i.e., machines with higher
CPU95 results often will have higher CPU2000 results), but there is no
universal formula for all systems.
SPEC strongly encourages SPEC licensees to publish CPU2000 numbers on
older platforms to provide a historical perspective on performance.
-
Q18: What criteria were used to select the benchmarks?
-
A18: In the process of selecting applications to use as benchmarks, SPEC
considered the following criteria:
- portability to all SPEC hardware architectures (32- and 64-bit, including Alpha, Intel Architecture, PA-RISC, Rxx00, SPARC, etc.)
- portability to various operating systems, particularly UNIX and NT
- benchmarks should not include measurable I/O
- benchmarks should not include networking or graphics
- benchmarks should run in 256MB RAM without swapping (SPEC is assuming this will be a minimal memory requirement for the life of CPU2000, and the emphasis is on compute-intensive performance, not disk activity)
- no more than five percent of benchmarking time should be spent processing code not provided by SPEC.
-
Q19: Weren't some of the SPEC CPU2000 benchmarks in SPEC
CPU95? How are they different?
-
A19: Although some of the benchmarks from CPU95 are included in CPU2000,
they all have been given different (usually larger) workloads or modified
to improve their coding style or use of resources.
The revised benchmarks have been assigned different identifying numbers
to distinguish them from versions in previous suites and to indicate
that they are not comparable with their predecessors.
-
Q20: Why were some of the benchmarks not carried over from
CPU95?
-
A20: There are several reasons why SPEC did not vote to carry over
certain benchmarks. Some benchmarks were not retained because it was not
possible to create a longer-running or more robust workload. Others were
left out because SPEC felt that they did not add significant performance
information compared to the other benchmarks under consideration.
-
Q21: Why does SPEC use a reference machine for determining
performance metrics? What machine is used for SPEC CPU2000 benchmark
suites?
-
A21: SPEC uses a reference machine to normalize the performance metrics
used in the CPU2000 suites. Each benchmark is run and measured on this
machine to establish a reference time for that benchmark. These times are
then used in the SPEC calculations.
SPEC uses a Sun Ultra5_10 with a 300MHz processor as the reference
machine. It takes approximately two days to do a SPEC-conforming run of
CINT2000 and CFP2000 on this machine.
The performance relation between two systems measured with the CPU2000
benchmarks would remain the same even if a different reference machine
was used. This is a consequence of the mathematics involved in
calculating the individual and overall (geometric mean) metrics.
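A small Python check makes this concrete (the run times and reference
times below are made up): whichever set of reference times is used, the
ratio between the two systems' overall metrics comes out the same, because
the reference times cancel when the two geometric means are divided.

    from math import exp, log

    def geo_mean(xs):
        return exp(sum(log(x) for x in xs) / len(xs))

    def metric(ref_times, run_times):
        # Geometric mean of per-benchmark ratios (reference / measured).
        return geo_mean([r / t for r, t in zip(ref_times, run_times)])

    run_a = [200.0, 350.0, 500.0]      # made-up run times on system A (s)
    run_b = [400.0, 300.0, 800.0]      # made-up run times on system B (s)

    ref_1 = [1400.0, 1400.0, 1100.0]   # one hypothetical set of reference times
    ref_2 = [900.0, 2000.0, 750.0]     # a completely different set

    # The A-to-B ratio is identical for either reference machine,
    # because the reference times cancel out of the ratio.
    print(metric(ref_1, run_a) / metric(ref_1, run_b))
    print(metric(ref_2, run_a) / metric(ref_2, run_b))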
-
Q22: How long does it take to run the SPEC CPU2000 benchmark
suites?
-
A22: This depends on the suite and the machine that is running the
benchmarks. As mentioned above, on the reference machine it takes two
days for a SPEC-conforming run (at least three iterations of each
benchmark to ensure that results can be reproduced).
-
Q23: What if the tools cannot be run or built on a system? Can
they be run manually?
-
A23: To generate SPEC-compliant results, the tools used must be approved
by SPEC. If several attempts at using the SPEC tools are not successful
for the operating system for which you purchased CPU2000, you should
contact SPEC for technical support. SPEC will work with you to correct
the problem and/or investigate SPEC-compliant alternatives.
-
Q24: Where are SPEC CPU2000 results available?
-
A24: Results for all measurements submitted to SPEC are available at http://www.spec.org.
-
Q25: Can SPEC CPU2000 results be published outside of the SPEC
web site?
-
A25: Yes, SPEC CPU2000 results can be freely published if all the run and
reporting rules have been followed. The CPU2000 license agreement binds
every purchaser of the suite to the run and reporting rules if results
are quoted in public. A full disclosure of the details of a performance
measurement must be provided to anyone who asks.
SPEC strongly encourages that results be submitted to the web site,
since this ensures a peer review process and uniform presentation of all
results.
The run and reporting rules contain an exemption clause for research
and academic use of SPEC CPU2000. Results obtained in this context need
not comply with all the requirements for other measurements. It is
required, however, that research and academic results be clearly
distinguished from results submitted officially to SPEC.
-
Q26: How do I contact SPEC?
-
A26: Send an e-mail to info@spec.org.
Questions and answers were prepared by Kaivalya Dixit of IBM and Jeff
Reilly of Intel Corp. Dixit is president of SPEC and Reilly is release
manager for SPEC CPU2000.