-
Q1: What is SPEC CPU2000?
-
A1: SPEC CPU2000 is a software benchmark product produced by the Standard
Performance Evaluation Corp. (SPEC), a non-profit group that includes
computer vendors, systems integrators, universities, research
organizations, publishers and consultants from around the world. It is
designed to provide performance measurements that can be used to compare
compute-intensive workloads on different computer systems.
SPEC CPU2000 contains two benchmark suites: CINT2000 for measuring and
comparing compute-intensive integer performance, and CFP2000 for
measuring and comparing compute-intensive floating point performance.
-
Q2: What is a benchmark?
-
A2: A benchmark is a standard of measurement or evaluation. SPEC is a
non-profit organization formed to establish and maintain computer
benchmarks for measuring and comparing component- and system-level
computer performance.
-
Q3: What does the "C" in CINT2000 and CFP2000 stand
for?
-
A3: The "C" denotes that these are component-level benchmarks
as opposed to whole system benchmarks.
-
Q4: What components do CINT2000 and CFP2000 measure?
-
A4: Being compute-intensive benchmarks, they measure performance of the
computer's processor, memory architecture and compiler. It is
important to remember the contribution of the latter two components --
performance is more than just the processor.
-
Q5: What component performance is not measured by CINT2000 and
CFP2000?
-
A5: The CINT2000 and CFP2000 benchmarks do not stress I/O (disk drives),
networking or graphics. It might be possible to configure a system in
such a way that one or more of these components impact the performance of
CINT2000 and CFP2000, but that is not the intent of the suites.
-
Q6: What is included in the SPEC CPU2000 package?
-
A6: SPEC provides the following in its CPU2000 package:
- CPU2000 tools for compiling, running and validating the benchmarks for a variety of operating systems
- source code for the tools, so that they can be built for systems not covered by the pre-compiled tools
- source code for the benchmarks
- tools for generating performance reports
- run and reporting rules defining how the benchmarks should be used to produce standard results
- CPU2000 documentation
SPEC CPU2000 includes tools for most UNIX operating systems and Windows
NT. Additional products for other operating systems will be released
later if SPEC detects enough demand. CPU2000 will be shipped on a
single CD-ROM disk.
-
Q7: What does the SPEC CPU2000 user have to provide?
-
A7: The user must have a computer system running UNIX or Windows NT with
C, C++ and FORTRAN90 compilers (FORTRAN77 may be used for some
benchmarks). A CD-ROM drive must also be available. Depending on the system
under test, approximately 1GB will be needed on a hard drive to install,
build and run SPEC CPU2000.
It is also assumed that the system has at least 256MB of RAM to ensure
that the benchmarks remain compute-intensive (SPEC is requiring a
larger memory size to measure the performance of larger applications).
While it is possible to run the suites on a system with less memory,
doing so would introduce paging effects into the measurement, making it
less compute-intensive.
-
Q8: What are the basic steps in running
the benchmarks?
-
A8: Installation and use are covered in detail in the SPEC CPU2000 User
Documentation. The basic steps are as follows:
- Install CPU2000 from media.
- Run the installation scripts, specifying your operating system.
- Compile the tools if executables are not provided in CPU2000.
- Determine which metric you want to run.
- Create a configuration file for that metric. In this file, you specify compiler flags and other system-dependent information.
- Run the SPEC tools to build (compile), run and validate the benchmarks.
- If the above steps are successful, generate a report based on the run times and metric equations.
-
Q9: What source code is provided? What exactly makes up these
suites?
-
A9: CINT2000 and CFP2000 are based on compute-intensive applications
provided as source code. CINT2000 contains 11 applications written in C
and one in C++ (252.eon) that are used as benchmarks:
Name          | Brief Description
164.gzip      | Data compression utility
175.vpr       | FPGA circuit placement and routing
176.gcc       | C compiler
181.mcf       | Minimum cost network flow solver
186.crafty    | Chess program
197.parser    | Natural language processing
252.eon       | Ray tracing
253.perlbmk   | Perl
254.gap       | Computational group theory
255.vortex    | Object-oriented database
256.bzip2     | Data compression utility
300.twolf     | Place and route simulator
CFP2000 contains 14 applications (six FORTRAN77, four FORTRAN90 and
four C) that are used as benchmarks:
Name          | Brief Description
168.wupwise   | Quantum chromodynamics
171.swim      | Shallow water modeling
172.mgrid     | Multi-grid solver in 3D potential field
173.applu     | Parabolic/elliptic partial differential equations
177.mesa      | 3D graphics library
178.galgel    | Fluid dynamics: analysis of oscillatory instability
179.art       | Neural network simulation: adaptive resonance theory
183.equake    | Finite element simulation: earthquake modeling
187.facerec   | Computer vision: recognizes faces
188.ammp      | Computational chemistry
189.lucas     | Number theory: primality testing
191.fma3d     | Finite-element crash simulation
200.sixtrack  | Particle accelerator model
301.apsi      | Solves problems regarding temperature, wind and distribution of pollutants
The numbers in the benchmarks' names serve as identifiers to
distinguish programs from one another (for example, some programs were
updated from SPEC CPU95 and must be distinguished from their previous
versions).
More detailed descriptions of the benchmarks (with references to papers,
web sites, etc.) can be found in the individual benchmark directories
in the SPEC benchmark tree.
-
Q10: What metrics can be measured?
-
A10: The CINT2000 and CFP2000 suites can be used to measure and calculate
the following metrics:
CINT2000:
- SPECint2000: the geometric mean of 12 normalized ratios (one for each integer benchmark) when each benchmark is compiled with "aggressive" optimization.
- SPECint_base2000: the geometric mean of 12 normalized ratios when each benchmark is compiled with "conservative" optimization.
- SPECint_rate2000: the geometric mean of 12 normalized throughput ratios when each benchmark is compiled with "aggressive" optimization.
- SPECint_rate_base2000: the geometric mean of 12 normalized throughput ratios when each benchmark is compiled with "conservative" optimization.
CFP2000:
- SPECfp2000: the geometric mean of 14 normalized ratios (one for each floating point benchmark) when each benchmark is compiled with "aggressive" optimization.
- SPECfp_base2000: the geometric mean of 14 normalized ratios when each benchmark is compiled with "conservative" optimization.
- SPECfp_rate2000: the geometric mean of 14 normalized throughput ratios when each benchmark is compiled with "aggressive" optimization.
- SPECfp_rate_base2000: the geometric mean of 14 normalized throughput ratios when each benchmark is compiled with "conservative" optimization.
The ratio for each of the benchmarks is calculated using a
SPEC-determined reference time and the actual run time of the benchmark.
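As an illustration only, here is a minimal sketch in Python of how such a
metric can be computed, assuming the normalized ratio is the reference time
divided by the measured run time (scaled so the reference machine scores
100) and that the overall metric is the geometric mean of the per-benchmark
ratios. The benchmark times shown are made up, not actual SPEC data.

    from math import exp, log

    # Hypothetical reference times and measured run times (seconds);
    # these numbers are illustrative, not real SPEC data.
    reference_times = {"164.gzip": 1400.0, "175.vpr": 1400.0, "176.gcc": 1100.0}
    measured_times  = {"164.gzip":  280.0, "175.vpr":  350.0, "176.gcc":  200.0}

    def spec_ratio(ref, run):
        # Normalized ratio, scaled by 100 so the reference machine scores 100.
        return 100.0 * ref / run

    ratios = [spec_ratio(reference_times[b], measured_times[b])
              for b in reference_times]

    # Overall metric: geometric mean of the per-benchmark ratios.
    overall = exp(sum(log(r) for r in ratios) / len(ratios))
    print("per-benchmark ratios:", [round(r, 1) for r in ratios])
    print("overall metric (geometric mean): %.1f" % overall)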
-
Q11: What is the difference between a "conservative" (base)
metric and an "aggressive" (non-base) metric?
-
A11: In order to provide comparisons across different computer hardware,
SPEC provides benchmarks as source code. This means they must be compiled
before they can be run. There was agreement within SPEC that the
benchmarks should be compiled the way users compile programs.
But how do users compile programs? On one side, people might just
compile with the general high-performance options suggested by the
compiler vendor. On the other side, people might experiment with many
different compilers and compiler flags to achieve the best performance.
So, while SPEC cannot match exactly how everyone uses compilers, it can
provide metrics that represent the general characteristics of these two
groups.
The base metrics (e.g., SPECint_base2000) are required for all reported
results and have set guidelines for compilation (e.g., the same flags
must be used in the same order for all benchmarks of the same language,
no more than four optimization flags, no assertion flags). The assumed
model uses the performance compiler flags that a compiler vendor would
suggest for a given program knowing only the language it is written in. The non-base
metrics (e.g., SPECint2000) are optional and have less strict
requirements (e.g., different compiler options may be used on each
benchmark).
A full description of the distinctions can be found in the SPEC CPU2000
run and reporting rules.
-
Q12: What is the difference between a "rate" and a
"non-rate" metric?
-
A12: There are several different ways to measure computer performance.
One way is to measure how fast the computer completes a single task; this
is a speed measurement. Another way is to measure how many tasks a
computer can accomplish in a certain amount of time; this is called a
throughput, capacity or rate measurement.
The SPEC speed metrics, or non-rate metrics (e.g., SPECint2000), are
used for comparing the ability of a computer to complete single tasks.
The SPEC rate metrics (e.g., SPECint_rate2000) measure the throughput
or rate of a machine carrying out a number of similar tasks.
Traditionally, the rate metrics have been used to demonstrate the
performance of multi-processor systems.
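The following simplified Python sketch contrasts the two views. The numbers
are made up, and the exact rate formula (including any scaling constants) is
defined in the SPEC run and reporting rules rather than here.

    # Made-up numbers for illustration only.
    reference_time = 1400.0    # SPEC-determined reference time for one benchmark (s)
    single_copy_time = 350.0   # elapsed time for one copy on the system under test (s)

    # Speed (non-rate) view: how quickly a single task completes.
    speed_ratio = reference_time / single_copy_time      # 4.0x the reference machine

    # Rate view: run several copies at once and credit the work done per unit time.
    copies = 4
    elapsed_with_copies = 400.0   # wall-clock time until all 4 copies finish (s)
    throughput_ratio = copies * reference_time / elapsed_with_copies   # 14.0

    print("speed ratio:", speed_ratio)
    print("throughput ratio:", throughput_ratio)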
-
Q13: How should I use SPEC CPU2000?
-
A13: Typically, the best measurement of a system is the performance of
your own application with your own workload. Unfortunately, due to time,
money or other constraints, it is often very difficult and expensive to
get a wide base of reliable, repeatable and comparable measurements of
this kind on different systems.
Benchmarks act as a reference point for comparison. It's the same
reason that gas mileage ratings exist, although probably no driver gets
exactly the same mileage as listed in the ratings. If you understand
what benchmarks measure, they're useful.
It's important to know that CINT2000 and CFP2000 are CPU-focused,
not system-focused benchmarks. They concentrate on only one portion of
the factors that contribute to application performance. A graphics or
network performance bottleneck within an application, for example, will
not be reflected in these benchmarks.
Understanding your own needs helps determine the relevance of the
benchmarks.
-
Q14: Which SPEC CPU2000 metric(s) should be used to determine
performance?
-
A14: It depends on your needs. SPEC provides the benchmarks and results
as tools for you to use. You need to determine how you use a computer or
what your performance requirements are and then choose the appropriate
SPEC metric(s).
A single user running a compute-intensive integer program, for example,
might only be interested in SPECint2000 or SPECint_base2000. On the
other hand, a person who maintains a machine used by multiple
scientists running floating point simulations might be more concerned
with SPECfp_rate2000 or SPECfp_rate_base2000.
-
Q15: SPEC CPU95 is already an available product. Why create SPEC
CPU2000? Will it show anything different from SPEC CPU95?
-
A15: Technology is always improving. As the technology improves, the
benchmarks should improve as well. SPEC needed to address the following
issues:
Run-time:
Several of the CPU95 benchmarks were finishing in less than a minute
on leading-edge processors/systems. Given the SPEC measurement tools,
small changes or fluctuations in the measurements were having a
significant impact on the percentage improvements being reported. SPEC
chose to make run times for the CPU2000 benchmarks longer, taking future
performance into account, so that this will not become an issue during
the lifetime of the suites.
Application size:
Many comments received by SPEC indicated that applications had grown
in complexity and size and that CPU95 was becoming less
representative of what runs on current systems. For CPU2000, SPEC
selected programs with larger resource requirements to provide a mix
alongside some of the smaller programs.
Application type:
SPEC felt that there were additional application areas that should
be included in CPU2000 to increase variety and representation within
the suites. Areas such as 3D and image recognition have been added
and data compression has been expanded.
Moving target:
CPU95 has been available for five years and much improvement in
hardware and software has occurred during this time. Benchmarks need
to evolve to keep pace with improvements.
Education:
As the computer industry grows, benchmark results are quoted more
often. With the release of new benchmark suites such as CPU2000,
there is a fresh opportunity to discuss benchmark results and their
significance.
-
Q16: What happens to SPEC CPU95 after SPEC CPU2000 is
released?
-
A16: SPEC will begin the process of retiring CPU95. Three months after
the announcement of CPU2000, SPEC will require all CPU95 submissions to
be accompanied by CPU2000 results. After six months, SPEC will stop
accepting CPU95 results for its web site. SPEC will also stop selling
CPU95 at a date yet to be determined.
-
Q17: Is there a way to translate SPEC CPU95 results to SPEC CPU2000
results or vice versa?
-
A17: There is no formula for converting CPU95 results to CPU2000 results
or vice versa; they are different products. There probably will be some
correlation between CPU95 and CPU2000 results (i.e., machines with higher
CPU95 results often will have higher CPU2000 results), but there is no
universal formula for all systems.
SPEC strongly encourages SPEC licensees to publish CPU2000 numbers on
older platforms to provide a historical perspective on performance.
-
Q18: What criteria were used to select the benchmarks?
-
A18: In the process of selecting applications to use as benchmarks, SPEC
considered the following criteria:
- portability to all SPEC hardware architectures (32- and 64-bit, including Alpha, Intel Architecture, PA-RISC, Rxx00, SPARC, etc.)
- portability to various operating systems, particularly UNIX and NT
- benchmarks should not include measurable I/O
- benchmarks should not include networking or graphics
- benchmarks should run in 256MB RAM without swapping (SPEC is assuming this will be a minimal memory requirement for the life of CPU2000, and the emphasis is on compute-intensive performance, not disk activity)
- no more than five percent of benchmarking time should be spent processing code not provided by SPEC.
-
Q19: Weren't some of the SPEC CPU2000 benchmarks in SPEC
CPU95? How are they different?
-
A19: Although some of the benchmarks from CPU95 are included in CPU2000,
they all have been given different (usually larger) workloads or modified
to improve their coding style or use of resources.
The revised benchmarks have been assigned different identifying numbers
to distinguish them from versions in previous suites and to indicate
that they are not comparable with their predecessors.
-
Q20: Why were some of the benchmarks not carried over from
CPU95?
-
A20: There are several reasons why SPEC did not vote to carry over
certain benchmarks. Some benchmarks were not retained because it was not
possible to create a longer-running or more robust workload. Others were
left out because SPEC felt that they did not add significant performance
information compared to the other benchmarks under consideration.
-
Q21: Why does SPEC use a reference machine for determining
performance metrics? What machine is used for SPEC CPU2000 benchmark
suites?
-
A21: SPEC uses a reference machine to normalize the performance metrics
used in the CPU2000 suites. Each benchmark is run and measured on this
machine to establish a reference time for that benchmark. These times are
then used in the SPEC calculations.
SPEC uses a Sun Ultra5_10 with a 300MHz processor as the reference
machine. It takes approximately two days to do a SPEC-conforming run of
CINT2000 and CFP2000 on this machine.
The performance relation between two systems measured with the CPU2000
benchmarks would remain the same even if a different reference machine
was used. This is a consequence of the mathematics involved in
calculating the individual and overall (geometric mean) metrics.
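A small Python check makes this concrete (the run times and reference
times below are made up): whichever set of reference times is used, the
ratio between the two systems' overall metrics comes out the same, because
the reference times cancel when the two geometric means are divided.

    from math import exp, log

    def geo_mean(xs):
        return exp(sum(log(x) for x in xs) / len(xs))

    def metric(ref_times, run_times):
        # Geometric mean of per-benchmark ratios (reference / measured).
        return geo_mean([r / t for r, t in zip(ref_times, run_times)])

    run_a = [200.0, 350.0, 500.0]      # made-up run times on system A (s)
    run_b = [400.0, 300.0, 800.0]      # made-up run times on system B (s)

    ref_1 = [1400.0, 1400.0, 1100.0]   # one hypothetical set of reference times
    ref_2 = [900.0, 2000.0, 750.0]     # a completely different set

    # The A-to-B ratio is identical for either reference machine,
    # because the reference times cancel out of the ratio.
    print(metric(ref_1, run_a) / metric(ref_1, run_b))
    print(metric(ref_2, run_a) / metric(ref_2, run_b))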
-
Q22: How long does it take to run the SPEC CPU2000 benchmark
suites?
-
A22: This depends on the suite and the machine that is running the
benchmarks. As mentioned above, on the reference machine it takes two
days for a SPEC-conforming run (at least three iterations of each
benchmark to ensure that results can be reproduced).
-
Q23: What if the tools cannot be run or built on a system? Can
they be run manually?
-
A23: To generate SPEC-compliant results, the tools used must be approved
by SPEC. If several attempts at using the SPEC tools are not successful
for the operating system for which you purchased CPU2000, you should
contact SPEC for technical support. SPEC will work with you to correct
the problem and/or investigate SPEC-compliant alternatives.
-
Q24: Where are SPEC CPU2000 results available?
-
A24: Results for all measurements submitted to SPEC are available at http://www.spec.org.
-
Q25: Can SPEC CPU2000 results be published outside of the SPEC
web site?
-
A25: Yes, SPEC CPU2000 results can be freely published if all the run and
reporting rules have been followed. The CPU2000 license agreement binds
every purchaser of the suite to the run and reporting rules if results
are quoted in public. A full disclosure of the details of a performance
measurement must be provided to anyone who asks.
SPEC strongly encourages that results be submitted to the web site,
since this ensures a peer review process and uniform presentation of all
results.
The run and reporting rules contain an exemption clause for research
and academic use of SPEC CPU2000. Results obtained in this context need
not comply with all the requirements for other measurements. It is
required, however, that research and academic results be clearly
distinguished from results submitted officially to SPEC.
-
Q26: How do I contact SPEC?
-
A26: Send an e-mail to info@spec.org.
Questions and answers were prepared by Kaivalya Dixit of IBM and Jeff
Reilly of Intel Corp. Dixit is president of SPEC and Reilly is release
manager for SPEC CPU2000.