Frequently Asked Questions (FAQs)
About the SPECjvm98 Benchmark
- What is SPECjvm98?
- What is the price of a SPECjvm98 license and when will it be available?
- What specific aspects of performance does SPECjvm98 measure?
- What metrics does SPECjvm98 use to report performance?
- What is the difference between a "base" and "non-base" metric?
- How are the numerical values for these numbers calculated?
- Which metric should be used to measure performance?
- What programs make up the test suite?
- Which tests represent real applications?
- Can I use these applications for non-benchmarking purposes?
- What do you get when you order SPECjvm98?
- What are the requirements for running SPECjvm98?
- Do I need Java capabilities on my server to run the benchmark?
- Can I see what the benchmark suite looks like?
- What criteria were used to select the benchmarks?
- Where can official SPECjvm98 results be obtained?
- Can SPECjvm98 results be compared to results from other SPEC benchmarks?
- What is the reference machine and why was it chosen?
- Does the choice of reference machine affect the metrics?
- Why did you choose bytecode execution times instead of JITted execution times as the reference?
- Does garbage collection affect SPECjvm98 results?
- Why do you separate results into different memory sizes? What are these classifications supposed to represent?
- What is the smallest amount of memory needed to run the benchmarks?
- How much JVM heap memory is required?
- How much memory would it take to avoid garbage collection altogether?
- Are SPECjvm98 results repeatable?
- Why must the benchmarks be run as applets from a web server?
- Why in some cases do results differ significantly when the server is local as opposed to remote? Are "server local" results (when the web server is run on the client machine) comparable to "server remote" results?
- What does "problem size 100" mean? Is the benchmark scalable?
- Does this benchmark suite replace CINT95 or CFP95?
- How do SPECjvm98 benchmark results relate to SPECweb96?
- Is this the multimedia benchmark suite I read about from SPEC/GPC?
- Do you provide source code for all the benchmarks?
- Is there a "rate" metric for SPECjvm98?
- Why is it necessary to read all the SPECjvm98 numbers to get a good idea of performance?
- Can SPECjvm98 help users compare Java and C++ performance?
- How long does it take to run the SPECjvm98 benchmarks?
- What factors affecting Java performance are not measured by SPECjvm98?
- Why doesn't the SPECjvm98 suite cover AWT/graphics performance?
- How do you recommend I measure graphics performance?
- I don't have a web server of my own. Can I run the benchmark?
- Can SPECjvm98 run under JDK 1.2?
- Is this suite suitable for measuring server performance?
- Can I use this suite for embedded processor platforms?
- Can this benchmark be used to compare performance for JVMs running on the same x86 based systems? Won't whether or not a JVM uses 80-bit mode affect the benchmark results?
- How do I contact SPEC for more information?
Q1:
What is SPECjvm98?
A1: SPECjvm98 is a benchmark suite that measures
performance for Java virtual machine (JVM) client platforms. It contains
eight different tests, five of which are real applications or are derived
from real applications. Seven tests are used for computing performance
metrics. One test validates some of the features of Java, such as testing
for loop bounds.
Q2: What is the price of a SPECjvm98 license and when
will it be available?
A2: SPECjvm98 is available now for $100.
Q3: What specific aspects of performance does SPECjvm98
measure?
A3: SPECjvm98 measures the time it takes to load the
program, verify the class files, compile on the fly if a just-in-time
(JIT) compiler is used, and execute the test. From the software
perspective, these tests measure the efficiency of JVM, JIT compiler and
operating system implementations on a given hardware platform. From the
hardware perspective, the benchmark measures CPU (integer and
floating-point), cache, memory, and other platform-specific hardware
performance.
Q4: What metrics does SPECjvm98 use to report
performance?
A4: SPECjvm98 and SPECjvm_base98 are the two performance
metrics.
Q5: What is the difference between a "base"
and "non-base" metric?
A5: Each benchmark is run a number of times, and the fastest and slowest times are used to compute the metrics. The base metric is computed from SPEC ratios of the worst elapsed time, and the non-base ("peak") metric is computed from SPEC ratios of the best elapsed time. The first time a benchmark is executed there is, in addition to the time to run the benchmark itself, overhead that does not typically occur for subsequent executions, including:
- time to load the benchmark classes
- time to verify classes and perform security checks
- time to compile bytecodes to native code (JIT)
- time to initialize static class variables
- time to load the benchmark's input data
Because of these overheads, the worst execution time will typically be
from the first execution, and the best time will be from a later test
run. This is not always the case, however, since garbage collection might
occur, slowing down a later run.
Q6: How are the numerical values for these numbers
calculated?
A6: The elapsed time (run time) for the system under
test is captured for each benchmark. That time is divided into the
elapsed time of the reference machine (reference time) to give a
"SPEC ratio." A SPEC ratio, therefore, is the ratio of the
speed of the system under test to the speed of the reference machine; in
other words, it's how many times faster the test system is compared
to the reference machine. Composite metrics are calculated as the
geometric means of the SPEC ratios. The geometric mean of N numbers is
the Nth root of the product of the numbers. So, for SPECjvm98, the
composite metrics are calculated as below, where the ^ sign denotes
exponentiation:
( SPECratioOf_201_compress *
SPECratioOf_202_jess *
SPECratioOf_209_db *
SPECratioOf_213_javac *
SPECratioOf_222_mpegaudio *
SPECratioOf_227_mtrt *
SPECratioOf_228_jack ) ^ (1/7)
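For illustration only, the short Java sketch below computes base and non-base composites from elapsed times. It is not part of the SPEC tool harness: the reference times are those listed in Q18, but the measured "worst" and "best" times are invented placeholders.

public class CompositeMetric {
    // SPEC ratio = reference time / measured elapsed time.
    static double specRatio(double referenceSeconds, double measuredSeconds) {
        return referenceSeconds / measuredSeconds;
    }

    // Geometric mean = Nth root of the product of N ratios.
    static double geometricMean(double[] ratios) {
        double product = 1.0;
        for (int i = 0; i < ratios.length; i++) {
            product *= ratios[i];
        }
        return Math.pow(product, 1.0 / ratios.length);
    }

    public static void main(String[] args) {
        // Reference times from Q18 (seconds), in suite order:
        // compress, jess, db, javac, mpegaudio, mtrt, jack.
        double[] reference = {1175, 380, 505, 425, 1100, 460, 455};
        // Hypothetical measured times: worst (first run) and best (later run).
        double[] worst = {300, 110, 150, 140, 250, 120, 130};
        double[] best  = {250,  90, 130, 120, 200, 100, 110};

        double[] baseRatios = new double[7];
        double[] peakRatios = new double[7];
        for (int i = 0; i < 7; i++) {
            baseRatios[i] = specRatio(reference[i], worst[i]); // base uses worst time
            peakRatios[i] = specRatio(reference[i], best[i]);  // peak uses best time
        }
        System.out.println("SPECjvm_base98 (illustrative) = " + geometricMean(baseRatios));
        System.out.println("SPECjvm98 (illustrative)      = " + geometricMean(peakRatios));
    }
}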
Q7: Which metric should be used to measure performance?
A7: It depends on your needs. SPEC provides benchmarks
and results as tools. Users need to determine which benchmarks and
results are most relevant to them. Someone who will run a program only once, for example, might be interested mainly in the base metric, whereas someone who will run the same program several times in a row might be more interested in the non-base (peak) metric. SPEC encourages vendors and
publications to publish all numbers, along with complete hardware and
software configuration information. A single-number characterization is
strongly discouraged. Both base and peak numbers must be reported to give
an accurate indication of performance.
Q8: What programs make up the test suite?
A8: The following eight programs make up the test suite:
- _200_check - checks JVM and Java features
- _201_compress - a popular utility used to compress/uncompress files
- _202_jess - a Java expert system shell
- _209_db - a small data management program
- _213_javac - the Java compiler, compiling 225,000 lines of code
- _222_mpegaudio - an MPEG Layer-3 audio stream decoder
- _227_mtrt - a dual-threaded program that ray traces an image file
- _228_jack - a parser generator with lexical analysis
Q9: Which tests represent real applications?
A9:
- _202_jess - a Java version of NASA's popular CLIPS rule-based expert system; it is distributed freely by Sandia National Labs at http://herzberg.ca.sandia.gov/jess/
- _201_compress - a Java version of the LZW file compression utilities in wide distribution as freeware
- _222_mpegaudio - an MPEG Layer-3 audio stream decoder from Fraunhofer Institut fuer Integrierte Schaltungen, a leading international research lab involved in multimedia standards; more information is available at http://www.iis.fhg.de/audio
- _228_jack - a parser generator from Sun Microsystems, now named the Java Compiler Compiler; it is distributed freely at http://www.suntest.com/JavaCC/
- _213_javac - a Java compiler from Sun Microsystems that is distributed freely with the Java Development Kit at http://java.sun.com/products
Q10: Can I use these applications for non-benchmarking
purposes?
A10: No.
Q11: What do you get when you order SPECjvm98?
A11: You get a CD-ROM as a SPEC-licensed product and
paper documentation.
Q12: What are the requirements for running SPECjvm98?
A12: The user needs a Java client with a minimum of 32MB
memory (this might vary) and a JVM environment supporting the 1.1 Java
API. SPECjvm98 is installed on a server, which needs 32MB or more of disk
space for the installed software. To report results, you need a SPEC tool
harness, which requires a graphics display. For reportable results, a web
server is required to store the benchmark suite and to serve class and
data files to the benchmark applet. The web server can be on another
machine networked to the client system under test, or it can be located on the client machine (http://localhost), in which case no network is required. You can use a web browser or "appletviewer" to run the benchmark. A JIT compiler is optional.
Q13: Do I need Java capabilities on my server to run the
benchmark?
A13: The preferred method of installation uses
InstallShield Java Edition, which requires Java capability. You can also
install using a "tar/gzip" archive, or skip installation and
run directly from the CD-ROM. During benchmark execution, only http (web)
service - and no Java services - is required from the server.
Q14: Can I see what the benchmark suite looks like?
A14: You can read the documentation and run a
demonstration subset of SPECjvm98 on a trial basis from http://www.spec.org/jvm98/demo. Running SPECjvm98
requires a Java 1.1.X-compatible browser. The demo includes only the
benchmark harness and the _200_check program to test that you have a
suitable Java platform to run the benchmarks.
Q15: What criteria were used to select the benchmarks?
A15: The benchmark applications were selected and
developed by member companies using criteria such as:
- bytecode content (needed high bytecode content to test the JVM)
- execution profile (looked for a flat profile)
- result validation (same results without code changes)
- heap usage up to 24MB
- either I-cache or D-cache misses on the reference platform
Q16: Where can official SPECjvm98 results be obtained?
A16: Results are available on http://www.spec.org. SPEC licensees may
also publish their own results in accordance with the SPEC run and
reporting rules.
Q17: Can SPECjvm98 results be compared to results from
other SPEC benchmarks?
A17: No. SPECjvm98 exercises both integer and floating-point computation, library code, some I/O, and JVM activities such as dynamic compilation and resource management. Platforms with high SPECint ratings might do better on the integer-intensive tests, and platforms with high SPECfp ratings might do better on the floating-point-intensive tests, _222_mpegaudio and _227_mtrt. There is no
logical way, however, to translate results directly from one benchmark to
another.
Q18: What is the reference machine and why was it
chosen?
A18: We selected the IBM PowerPC 604@133 as the
reference machine because it was available and it is a midrange system.
Here are specifics on the machine:
- Architecture: PowerPC, Implementation: 604
- Number of CPUs: 1
- Separate I & D caches
- L1 I-cache: 16KB, 4-way associative, 32-byte block size, 32-byte line size
- L1 D-cache: 16KB, 4-way associative, 32-byte block size, 32-byte line size
- L2 cache: 512KB, 1-way associative
- Separate I & D TLBs
- ITLB: 128 entries, 2-way associative
- DTLB: 128 entries, 2-way associative
- Memory: 96MB
- Disks: 2 x 1GB (SCSI)
- Operating System: AIX 4.2.1.0
- JDK Version: JDK 1.1.4 (JIT: off)
Benchmark      | Reference Time (seconds)
_201_compress  | 1175
_202_jess      | 380
_209_db        | 505
_213_javac     | 425
_222_mpegaudio | 1100
_227_mtrt      | 460
_228_jack      | 455
Q19: Does the choice of reference machine affect the
metrics?
A19: No. SPECjvm98 metrics are calculated using the
geometric mean, so relative rankings of different systems are independent
of the reference machine.
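A small illustration of why this holds (hypothetical times, not SPEC tooling): dividing one system's composite by another's cancels the reference times out of the geometric mean, so the relative standing of the two systems is the same no matter which reference machine is used.

public class ReferenceIndependence {
    // Composite metric = geometric mean of the per-benchmark SPEC ratios.
    static double composite(double[] reference, double[] measured) {
        double product = 1.0;
        for (int i = 0; i < measured.length; i++) {
            product *= reference[i] / measured[i];
        }
        return Math.pow(product, 1.0 / measured.length);
    }

    public static void main(String[] args) {
        double[] refA = {1175, 380, 505};  // one hypothetical reference machine
        double[] refB = { 600, 200, 900};  // a different hypothetical reference machine
        double[] sys1 = { 300, 100, 150};  // measured times, system 1
        double[] sys2 = { 450, 120, 250};  // measured times, system 2

        // The ratio of system 1 to system 2 is identical under either reference.
        System.out.println(composite(refA, sys1) / composite(refA, sys2));
        System.out.println(composite(refB, sys1) / composite(refB, sys2));
    }
}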
Q20: Why did you choose bytecode execution times instead of JITted execution times as the reference?
A20: This issue was debated, but we based our decision
on two principal factors: (1) some platforms do not use JIT due to its
larger memory footprint, and (2) JITted execution times could lead to a
SPEC ratio of less than 1, a score that SPEC believes discourages people
from reporting results.
Q21: Does garbage collection affect SPECjvm98 results?
A21: Yes. The above reference times were derived using a
heap of 24MB. There are many different types of garbage collectors and a
larger heap might or might not be better. The initial results published
at the SPEC web site show some differences between small- and
large-memory configurations.
Q22: Why do you separate results into different memory
sizes? What are these classifications supposed to represent?
A22: The classifications reflect general industry
categories. The "0-48MB" classification is a small memory
configuration. In this category, there will be considerable garbage
collection while running the benchmark. Most results submitted to SPEC
will likely fall in the "48-256MB" category, where there is
sufficient memory that garbage collection will not normally be a major
influence on performance. The "over 256MB" category represents
a large memory configuration.
Q23: What is the smallest amount of memory needed to run
the benchmarks?
A23: About 32MB, depending on your JVM, OS and heap
size.
Q24: How much JVM heap memory is required?
A24: A 24MB heap is sufficient to run all the benchmark
tests. The _202_jess test has the smallest heap size - it needs only 2MB.
Q25: How much memory would it take to avoid garbage
collection altogether?
A25: _202_jess is also the benchmark that allocates the most object data: 748MB in total. To run it three times without
reclaiming space, plus have room for the JVM and OS, would take about 2.5
GB.
Q26: Are SPECjvm98 results repeatable?
A26: Yes. The threshold for repeatability can be set
(default setting is three percent). On the reference machine the
difference between successive runs was less than one percent. Users can
set a minimum and a maximum number of iterations (runs) and the threshold
of repeatability in the properties file.
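The check works roughly as sketched below. This is only an illustration of the percentage-difference idea described in the answer; it is not the SPECjvm98 harness code, and the actual property names and logic may differ.

public class RepeatabilityCheck {
    // True if two successive elapsed times agree within the given threshold.
    static boolean withinThreshold(double previousSeconds, double currentSeconds,
                                   double thresholdPercent) {
        double diff = Math.abs(currentSeconds - previousSeconds);
        return (diff / previousSeconds) * 100.0 <= thresholdPercent;
    }

    public static void main(String[] args) {
        double[] runTimes = {305.0, 298.0, 296.5, 296.0}; // hypothetical elapsed times
        int minRuns = 2;                                   // minimum iterations
        int maxRuns = runTimes.length;                     // maximum iterations
        double threshold = 3.0;                            // default threshold, percent

        for (int run = 1; run < maxRuns; run++) {
            boolean stable = withinThreshold(runTimes[run - 1], runTimes[run], threshold);
            if (run + 1 >= minRuns && stable) {
                System.out.println("Repeatable after " + (run + 1) + " runs");
                break;
            }
        }
    }
}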
Q27: Why must the benchmarks be run as applets from a
web server?
A27: Loading applets from the server ensures that all
systems tested perform the required applet security verification.
Q28: Why in some cases do results differ significantly
when the server is local as opposed to remote? Are "server
local" results (when the web server is run on the client machine)
comparable to "server remote" results?
A28: Network speed, congestion, number of hops from
server to client, and other factors could influence results when the
server is remote. When the server is local, the loopback interface is, of
course, much faster than a network connection. But, when the server is
local there is extra overhead from running the web server. Although web
server CPU overhead is not great - a maximum of one to three percent
depending on benchmark and execution phase - it still might take a
significant toll on smaller memory clients. For "server remote"
configurations, the cost of the network transfer will be greatest on the
first (base metric) execution, and might best be ameliorated by using a
fast network in a controlled test environment without competing network
traffic.
Q29: What does "problem size 100" mean? Is the
benchmark scalable?
A29: No, the benchmark is not scalable. Results can be
published only for the run with the problem size of 100. Other problem
sizes (1 and 10) are provided to quickly test the system (shorter
run-times) to verify that it runs. The designated numbers (1, 10 and 100)
are labels only and do not describe scaling.
Q30: Does this benchmark suite replace CINT95 or CFP95?
A30: No.
Q31: How do SPECjvm98 benchmark results relate to
SPECweb96?
A31: SPECjvm98 and SPECweb96 results are not comparable.
Q32: Is this the multimedia benchmark suite I read about
from SPEC/GPC?
A32: No, the multimedia benchmark suite from
SPEC/GPC's Multimedia Benchmarking Committee is currently in
development and will measure different performance aspects.
Q33: Do you provide source code for all the benchmarks?
A33: We have provided source code for all the benchmarks
except _213_javac, _222_mpegaudio and _228_jack, which are commercial
applications.
Q34: Is there a "rate" metric for SPECjvm98?
A34: No, there is no "rate" metric at this
time. SPEC might consider offering this at a later date.
Q35: Why is it necessary to read all the SPECjvm98
numbers to get a good idea of performance?
A35: SPECjvm98 measures the efficiency of the JVM and
the JIT compiler when running seven tests on a given platform. It
provides data to show how well that system performs with respect to the
reference platform. Each test has a unique profile. Some tests incur more instruction-cache misses and others more data-cache misses. The JIT compiler
might be more effective on some tests than on others. Informed comparison
requires reading all the numbers.
Q36: Can SPECjvm98 help users compare Java and C++
performance?
A36: No. SPECjvm98 is a JVM client platform benchmark,
not a language benchmark.
Q37: How long does it take to run the SPECjvm98
benchmarks?
A37: A single run on the reference machine takes close
to three hours. The run-time on your machine is a function of its JVM and
JIT compiler performance.
Q38: What factors affecting Java performance are not
measured by SPECjvm98?
A38: AWT, network, database, and graphics performance
are not measured by the benchmark.
Q39: Why doesn't the SPECjvm98 suite cover
AWT/graphics performance?
A39: We wanted very much to measure AWT/graphics
performance, but we discovered several pitfalls:
- It was difficult to validate that all platforms did the same amount of work.
- AWT is not stressed by displaying existing GIF or MPEG files.
- It's difficult to validate the workload at the pixel level.
- Some graphics subsystems are more powerful than the platform CPU.
- Many graphics tests use "sleep" to match the speed differences between the platform's CPU and graphics subsystem.
- Graphics performance is a function of "drivers" that have almost no bytecode content.
- Display resolution, color, quality of image, and many other graphics subsystem attributes determine graphics performance.
SPEC is working on solutions to these problems for a future suite.
Q40: How do you recommend I measure graphics
performance?
A40: SPEC's GPC Group has a wide range of graphics
benchmarks to measure various aspects of graphics performance: http://www.spec.org/gpc
Q41: I don't have a web server of my own. Can I run
the benchmark?
A41: You can run the benchmark but your results will not
be compliant nor comparable to those run under the SPECjvm98 Run and
Reporting Rules.
Q42: Can SPECjvm98 run under JDK 1.2?
A42: The answer is a qualified yes. At the time of this
release, JDK 1.2 was in the beta test stage, so we have little experience
with it. SPECjvm98 will not test many of the new features included in JDK
1.2, such as security, JFC, 2D APIs, and 3D APIs.
Q43: Is this suite suitable for measuring server
performance?
A43: Using SPECjvm98 on a server might not stress
servers adequately nor demonstrate their strengths. It can be used,
however, as a tool to study the JVM and JIT compiler. You could, for
example, increase the workloads by running SPECjvm98 as multiple copies
and study its effect on the server. SPEC's Java Server group is
developing a server benchmark suite - contact SPEC for more information.
Q44: Can I use this suite for embedded processor
platforms?
A44: SPECjvm98 might have too heavy a workload for
entry-level systems. It might require more physical memory than some
smaller systems have, and might run too long to easily measure slower
processors. In addition, SPECjvm98 does not address the key performance
concerns of many embedded system designers, such as real-time response
and speed of core algorithms such as digital signal processing. SPEC
welcomes input from anyone who wants to help define performance metrics
for Java-based embedded systems.
Q45: Can this benchmark be used to compare performance
for JVMs running on the same x86 based systems? Won't whether or not
a JVM uses 80-bit mode affect the benchmark results?
A45: Yes, the benchmark can be used to compare
performance for JVMs on x86-based systems. Some JVMs might include a
switch to select 64- or 80-bit mode. This issue should fade away with the
release of the 1.2 specifications.
Q46: How do I contact SPEC for more information?
A46: For more information, contact: SPEC, 6585 Merchant
Place, Suite 100, Warrenton, VA, 20187; tel: (540) 349-7878; fax: (540)
349-5992; e-mail: info@spec.org.
This FAQ was prepared by Kaivalya Dixit, SPEC president,
and Walter Bays, SPECjvm98 Project Manager.