Frequently Asked Questions (FAQs)
About the SPECjvm98 Benchmark
- What is SPECjvm98?
- What is the price of a SPECjvm98 license and when will it be available?
- What specific aspects of performance does SPECjvm98 measure?
- What metrics does SPECjvm98 use to report performance?
- What is the difference between a "base" and "non-base" metric?
- How are the numerical values for these numbers calculated?
- Which metric should be used to measure performance?
- What programs make up the test suite?
- Which tests represent real applications?
- Can I use these applications for non-benchmarking purposes?
- What do you get when you order SPECjvm98?
- What are the requirements for running SPECjvm98?
- Do I need Java capabilities on my server to run the benchmark?
- Can I see what the benchmark suite looks like?
- What criteria were used to select the benchmarks?
- Where can official SPECjvm98 results be obtained?
- Can SPECjvm98 results be compared to results from other SPEC benchmarks?
- What is the reference machine and why was it chosen?
- Does the choice of reference machine affect the metrics?
- Why did you choose bytecode execution times instead of JITted execution times as the reference?
- Does garbage collection affect SPECjvm98 results?
- Why do you separate results into different memory sizes? What are these classifications supposed to represent?
- What is the smallest amount of memory needed to run the benchmarks?
- How much JVM heap memory is required?
- How much memory would it take to avoid garbage collection altogether?
- Are SPECjvm98 results repeatable?
- Why must the benchmarks be run as applets from a web server?
- Why in some cases do results differ significantly when the server is local as opposed to remote? Are "server local" results (when the web server is run on the client machine) comparable to "server remote" results?
- What does "problem size 100" mean? Is the benchmark scalable?
- Does this benchmark suite replace CINT95 or CFP95?
- How do SPECjvm98 benchmark results relate to SPECweb96?
- Is this the multimedia benchmark suite I read about from SPEC/GPC?
- Do you provide source code for all the benchmarks?
- Is there a "rate" metric for SPECjvm98?
- Why is it necessary to read all the SPECjvm98 numbers to get a good idea of performance?
- Can SPECjvm98 help users compare Java and C++ performance?
- How long does it take to run the SPECjvm98 benchmarks?
- What factors affecting Java performance are not measured by SPECjvm98?
- Why doesn't the SPECjvm98 suite cover AWT/graphics performance?
- How do you recommend I measure graphics performance?
- I don't have a web server of my own. Can I run the benchmark?
- Can SPECjvm98 run under JDK 1.2?
- Is this suite suitable for measuring server performance?
- Can I use this suite for embedded processor platforms?
- Can this benchmark be used to compare performance for JVMs running on the same x86 based systems? Won't whether or not a JVM uses 80-bit mode affect the benchmark results?
- How do I contact SPEC for more information?
Q1:
What is SPECjvm98?
A1: SPECjvm98 is a benchmark suite that measures
performance for Java virtual machine (JVM) client platforms. It contains
eight different tests, five of which are real applications or are derived
from real applications. Seven tests are used for computing performance
metrics. One test validates some of the features of Java, such as testing
for loop bounds.
Q2: What is the price of a SPECjvm98 license and when
will it be available?
A2: SPECjvm98 is available now for $100.
Q3: What specific aspects of performance does SPECjvm98
measure?
A3: SPECjvm98 measures the time it takes to load the
program, verify the class files, compile on the fly if a just-in-time
(JIT) compiler is used, and execute the test. From the software
perspective, these tests measure the efficiency of JVM, JIT compiler and
operating system implementations on a given hardware platform. From the
hardware perspective, the benchmark measures CPU (integer and
floating-point), cache, memory, and other platform-specific hardware
performance.
Q4: What metrics does SPECjvm98 use to report
performance?
A4: SPECjvm98 and SPECjvm_base98 are the two performance
metrics.
Q5: What is the difference between a "base"
and "non-base" metric?
A5: Each benchmark is run a number of times, and the fastest and slowest times are used to compute the metrics. The base metric is computed from SPEC ratios of the worst elapsed time, and the non-base ("peak") metric is computed from SPEC ratios of the best elapsed time. The first time a benchmark is executed there is, in addition to the time to run the benchmark itself, overhead that does not typically occur for subsequent executions, including:
- time to load the benchmark classes
- time to verify classes and perform security checks
- time to compile bytecodes to native code (JIT)
- time to initialize static class variables
- time to load the benchmark's input data
Because of these overheads, the worst execution time will typically be
from the first execution, and the best time will be from a later test
run. This is not always the case, however, since garbage collection might
occur, slowing down a later run.
Q6: How are the numerical values for these numbers
calculated?
A6: The elapsed time (run time) for the system under
test is captured for each benchmark. That time is divided into the
elapsed time of the reference machine (reference time) to give a
"SPEC ratio." A SPEC ratio, therefore, is the ratio of the
speed of the system under test to the speed of the reference machine; in
other words, it's how many times faster the test system is compared
to the reference machine. Composite metrics are calculated as the
geometric means of the SPEC ratios. The geometric mean of N numbers is
the Nth root of the product of the numbers. So, for SPECjvm98, the
composite metrics are calculated as below, where the ^ sign denotes
exponentiation:
( SPECratioOf_201_compress *
SPECratioOf_202_jess *
SPECratioOf_209_db *
SPECratioOf_213_javac *
SPECratioOf_222_mpegaudio *
SPECratioOf_227_mtrt *
SPECratioOf_228_jack ) ^ (1/7)
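For illustration only, the short Java sketch below computes base and non-base composites from elapsed times. It is not part of the SPEC tool harness: the reference times are those listed in Q18, but the measured "worst" and "best" times are invented placeholders.

public class CompositeMetric {
    // SPEC ratio = reference time / measured elapsed time.
    static double specRatio(double referenceSeconds, double measuredSeconds) {
        return referenceSeconds / measuredSeconds;
    }

    // Geometric mean = Nth root of the product of N ratios.
    static double geometricMean(double[] ratios) {
        double product = 1.0;
        for (int i = 0; i < ratios.length; i++) {
            product *= ratios[i];
        }
        return Math.pow(product, 1.0 / ratios.length);
    }

    public static void main(String[] args) {
        // Reference times from Q18 (seconds), in suite order:
        // compress, jess, db, javac, mpegaudio, mtrt, jack.
        double[] reference = {1175, 380, 505, 425, 1100, 460, 455};
        // Hypothetical measured times: worst (first run) and best (later run).
        double[] worst = {300, 110, 150, 140, 250, 120, 130};
        double[] best  = {250,  90, 130, 120, 200, 100, 110};

        double[] baseRatios = new double[7];
        double[] peakRatios = new double[7];
        for (int i = 0; i < 7; i++) {
            baseRatios[i] = specRatio(reference[i], worst[i]); // base uses worst time
            peakRatios[i] = specRatio(reference[i], best[i]);  // peak uses best time
        }
        System.out.println("SPECjvm_base98 (illustrative) = " + geometricMean(baseRatios));
        System.out.println("SPECjvm98 (illustrative)      = " + geometricMean(peakRatios));
    }
}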
Q7: Which metric should be used to measure performance?
A7: It depends on your needs. SPEC provides benchmarks
and results as tools. Users need to determine which benchmarks and
results are most relevant to them. Someone who will run a program only once, for example, might be interested mainly in the base metric, whereas someone who will run the same program several times in a row might be more interested in the non-base (peak) metric. SPEC encourages vendors and
publications to publish all numbers, along with complete hardware and
software configuration information. A single-number characterization is
strongly discouraged. Both base and peak numbers must be reported to give
an accurate indication of performance.
Q8: What programs make up the test suite?
A8: The following eight programs make up the test suite:
- _200_check - checks JVM and Java features
- _201_compress - a popular utility used to compress/uncompress files
- _202_jess - a Java expert system shell
- _209_db - a small data management program
- _213_javac - the Java compiler, compiling 225,000 lines of code
- _222_mpegaudio - an MPEG Layer-3 audio stream decoder
- _227_mtrt - a dual-threaded program that ray traces an image file
- _228_jack - a parser generator with lexical analysis
Q9: Which tests represent real applications?
A9:
- _202_jess - a Java version of NASA's popular CLIPS rule-based expert system; it is distributed freely by Sandia National Labs at http://herzberg.ca.sandia.gov/jess/
- _201_compress - a Java version of the LZW file compression utilities in wide distribution as freeware
- _222_mpegaudio - an MPEG Layer-3 audio stream decoder from Fraunhofer Institut fuer Integrierte Schaltungen, a leading international research lab involved in multimedia standards; more information is available at http://www.iis.fhg.de/audio
- _228_jack - a parser generator from Sun Microsystems, now named the Java Compiler Compiler; it is distributed freely at http://www.suntest.com/JavaCC/
- _213_javac - a Java compiler from Sun Microsystems that is distributed freely with the Java Development Kit at http://java.sun.com/products
Q10: Can I use these applications for non-benchmarking
purposes?
A10: No.
Q11: What do you get when you order SPECjvm98?
A11: You get a CD-ROM as a SPEC-licensed product and
paper documentation.
Q12: What are the requirements for running SPECjvm98?
A12: The user needs a Java client with a minimum of 32MB
memory (this might vary) and a JVM environment supporting the 1.1 Java
API. SPECjvm98 is installed on a server, which needs 32MB or more of disk
space for the installed software. To report results, you need a SPEC tool
harness, which requires a graphics display. For reportable results, a web
server is required to store the benchmark suite and to serve class and
data files to the benchmark applet. The web server can be on another
machine networked to the client system under test, or it can be located on the client machine (http://localhost), in which case no network is required. You can use a web browser or "appletviewer" to run the benchmark. A JIT compiler is optional.
Q13: Do I need Java capabilities on my server to run the
benchmark?
A13: The preferred method of installation uses
InstallShield Java Edition, which requires Java capability. You can also
install using a "tar/gzip" archive, or skip installation and
run directly from the CD-ROM. During benchmark execution, only http (web)
service - and no Java services - is required from the server.
Q14: Can I see what the benchmark suite looks like?
A14: You can read the documentation and run a
demonstration subset of SPECjvm98 on a trial basis from http://www.spec.org/jvm98/demo. Running SPECjvm98
requires a Java 1.1.X-compatible browser. The demo includes only the
benchmark harness and the _200_check program to test that you have a
suitable Java platform to run the benchmarks.
Q15: What criteria were used to select the benchmarks?
A15: The benchmark applications were selected and
developed by member companies using criteria such as:
- bytecode content (needed high bytecode content to test the JVM)
- execution profile (looked for a flat profile)
- result validation (same results without code changes)
- heap usage up to 24MB
- either I-cache or D-cache misses on the reference platform
Q16: Where can official SPECjvm98 results be obtained?
A16: Results are available on http://www.spec.org. SPEC licensees may
also publish their own results in accordance with the SPEC run and
reporting rules.
Q17: Can SPECjvm98 results be compared to results from
other SPEC benchmarks?
A17: No. SPECjvm98 exercises both integer and floating-point computation, library code, some I/O, and JVM activities such as dynamic compilation and resource management. Platforms with high SPECint ratings might do better on the integer-intensive tests, and platforms with high SPECfp ratings might do better on the floating-point-intensive tests, _222_mpegaudio and _227_mtrt. There is no
logical way, however, to translate results directly from one benchmark to
another.
Q18: What is the reference machine and why was it
chosen?
A18: We selected the IBM PowerPC 604@133 as the
reference machine because it was available and it is a midrange system.
Here are specifics on the machine:
- Architecture: PowerPC, Implementation: 604
- Number of CPUs: 1
- Separate I & D caches
- L1 I-cache: 16KB, 4-way associative, 32-byte block size, 32-byte line size
- L1 D-cache: 16KB, 4-way associative, 32-byte block size, 32-byte line size
- L2 cache: 512KB, 1-way associative
- Separate I & D TLBs
- ITLB: 128 entries, 2-way associative
- DTLB: 128 entries, 2-way associative
- Memory: 96MB
- Disks: 2 x 1GB (SCSI)
- Operating System: AIX 4.2.1.0
- JDK Version: JDK 1.1.4 (JIT: off)
Benchmark      | Reference Time (seconds)
_201_compress  | 1175
_202_jess      | 380
_209_db        | 505
_213_javac     | 425
_222_mpegaudio | 1100
_227_mtrt      | 460
_228_jack      | 455
Q19: Does the choice of reference machine affect the
metrics?
A19: No. SPECjvm98 metrics are calculated using the
geometric mean, so relative rankings of different systems are independent
of the reference machine.
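A small illustration of why this holds (hypothetical times, not SPEC tooling): dividing one system's composite by another's cancels the reference times out of the geometric mean, so the relative standing of the two systems is the same no matter which reference machine is used.

public class ReferenceIndependence {
    // Composite metric = geometric mean of the per-benchmark SPEC ratios.
    static double composite(double[] reference, double[] measured) {
        double product = 1.0;
        for (int i = 0; i < measured.length; i++) {
            product *= reference[i] / measured[i];
        }
        return Math.pow(product, 1.0 / measured.length);
    }

    public static void main(String[] args) {
        double[] refA = {1175, 380, 505};  // one hypothetical reference machine
        double[] refB = { 600, 200, 900};  // a different hypothetical reference machine
        double[] sys1 = { 300, 100, 150};  // measured times, system 1
        double[] sys2 = { 450, 120, 250};  // measured times, system 2

        // The ratio of system 1 to system 2 is identical under either reference.
        System.out.println(composite(refA, sys1) / composite(refA, sys2));
        System.out.println(composite(refB, sys1) / composite(refB, sys2));
    }
}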
Q20: Why did you choose bytecode execution times instead of JITted execution times as the reference?
A20: This issue was debated, but we based our decision
on two principal factors: (1) some platforms do not use JIT due to its
larger memory footprint, and (2) JITted execution times could lead to a
SPEC ratio of less than 1, a score that SPEC believes discourages people
from reporting results.
Q21: Does garbage collection affect SPECjvm98 results?
A21: Yes. The above reference times were derived using a
heap of 24MB. There are many different types of garbage collectors and a
larger heap might or might not be better. The initial results published
at the SPEC web site show some differences between small- and
large-memory configurations.
Q22: Why do you separate results into different memory
sizes? What are these classifications supposed to represent?
A22: The classifications reflect general industry
categories. The "0-48MB" classification is a small memory
configuration. In this category, there will be considerable garbage
collection while running the benchmark. Most results submitted to SPEC
will likely fall in the "48-256MB" category, where there is
sufficient memory that garbage collection will not normally be a major
influence on performance. The "over 256MB" category represents
a large memory configuration.
Q23: What is the smallest amount of memory needed to run
the benchmarks?
A23: About 32MB, depending on your JVM, OS and heap
size.
Q24: How much JVM heap memory is required?
A24: A 24MB heap is sufficient to run all the benchmark
tests. The _202_jess test has the smallest heap size - it needs only 2MB.
Q25: How much memory would it take to avoid garbage
collection altogether?
A25: _202_jess is also the benchmark that allocates the most object data: 748MB in total. To run it three times without
reclaiming space, plus have room for the JVM and OS, would take about 2.5
GB.
Q26: Are SPECjvm98 results repeatable?
A26: Yes. The threshold for repeatability can be set
(default setting is three percent). On the reference machine the
difference between successive runs was less than one percent. Users can
set a minimum and a maximum number of iterations (runs) and the threshold
of repeatability in the properties file.
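The check works roughly as sketched below. This is only an illustration of the percentage-difference idea described in the answer; it is not the SPECjvm98 harness code, and the actual property names and logic may differ.

public class RepeatabilityCheck {
    // True if two successive elapsed times agree within the given threshold.
    static boolean withinThreshold(double previousSeconds, double currentSeconds,
                                   double thresholdPercent) {
        double diff = Math.abs(currentSeconds - previousSeconds);
        return (diff / previousSeconds) * 100.0 <= thresholdPercent;
    }

    public static void main(String[] args) {
        double[] runTimes = {305.0, 298.0, 296.5, 296.0}; // hypothetical elapsed times
        int minRuns = 2;                                   // minimum iterations
        int maxRuns = runTimes.length;                     // maximum iterations
        double threshold = 3.0;                            // default threshold, percent

        for (int run = 1; run < maxRuns; run++) {
            boolean stable = withinThreshold(runTimes[run - 1], runTimes[run], threshold);
            if (run + 1 >= minRuns && stable) {
                System.out.println("Repeatable after " + (run + 1) + " runs");
                break;
            }
        }
    }
}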
Q27: Why must the benchmarks be run as applets from a
web server?
A27: Loading applets from the server ensures that all
systems tested perform the required applet security verification.
Q28: Why in some cases do results differ significantly
when the server is local as opposed to remote? Are "server
local" results (when the web server is run on the client machine)
comparable to "server remote" results?
A28: Network speed, congestion, number of hops from
server to client, and other factors could influence results when the
server is remote. When the server is local, the loopback interface is, of
course, much faster than a network connection. But, when the server is
local there is extra overhead from running the web server. Although web
server CPU overhead is not great - a maximum of one to three percent
depending on benchmark and execution phase - it still might take a
significant toll on smaller memory clients. For "server remote"
configurations, the cost of the network transfer will be greatest on the
first (base metric) execution, and might best be ameliorated by using a
fast network in a controlled test environment without competing network
traffic.
Q29: What does "problem size 100" mean? Is the
benchmark scalable?
A29: No, the benchmark is not scalable. Results can be
published only for the run with the problem size of 100. Other problem
sizes (1 and 10) are provided to quickly test the system (shorter
run-times) to verify that it runs. The designated numbers (1, 10 and 100)
are labels only and do not describe scaling.
Q30: Does this benchmark suite replace CINT95 or CFP95?
A30: No.
Q31: How do SPECjvm98 benchmark results relate to
SPECweb96?
A31: SPECjvm98 and SPECweb96 results are not comparable.
Q32: Is this the multimedia benchmark suite I read about
from SPEC/GPC?
A32: No, the multimedia benchmark suite from
SPEC/GPC's Multimedia Benchmarking Committee is currently in
development and will measure different performance aspects.
Q33: Do you provide source code for all the benchmarks?
A33: We have provided source code for all the benchmarks
except _213_javac, _222_mpegaudio and _228_jack, which are commercial
applications.
Q34: Is there a "rate" metric for SPECjvm98?
A34: No, there is no "rate" metric at this
time. SPEC might consider offering this at a later date.
Q35: Why is it necessary to read all the SPECjvm98
numbers to get a good idea of performance?
A35: SPECjvm98 measures the efficiency of the JVM and
the JIT compiler when running seven tests on a given platform. It
provides data to show how well that system performs with respect to the
reference platform. Each test has a unique profile. Some tests incur more instruction-cache misses and others more data-cache misses. The JIT compiler
might be more effective on some tests than on others. Informed comparison
requires reading all the numbers.
Q36: Can SPECjvm98 help users compare Java and C++
performance?
A36: No. SPECjvm98 is a JVM client platform benchmark,
not a language benchmark.
Q37: How long does it take to run the SPECjvm98
benchmarks?
A37: A single run on the reference machine takes close
to three hours. The run-time on your machine is a function of its JVM and
JIT compiler performance.
Q38: What factors affecting Java performance are not
measured by SPECjvm98?
A38: AWT, network, database, and graphics performance
are not measured by the benchmark.
Q39: Why doesn't the SPECjvm98 suite cover
AWT/graphics performance?
A39: We wanted very much to measure AWT/graphics
performance, but we discovered several pitfalls:
- It was difficult to validate that all platforms did the same amount of work.
- AWT is not stressed by displaying existing GIF or MPEG files.
- It's difficult to validate the workload at the pixel level.
- Some graphics subsystems are more powerful than the platform CPU.
- Many graphics tests use "sleep" to match the speed differences between the platform's CPU and graphics subsystem.
- Graphics performance is a function of "drivers" that have almost no bytecode content.
- Display resolution, color, quality of image, and many other graphics subsystem attributes determine graphics performance.
SPEC is working on solutions to these problems for a future suite.
Q40: How do you recommend I measure graphics
performance?
A40: SPEC's GPC Group has a wide range of graphics
benchmarks to measure various aspects of graphics performance: http://www.spec.org/gpc
Q41: I don't have a web server of my own. Can I run
the benchmark?
A41: You can run the benchmark but your results will not
be compliant nor comparable to those run under the SPECjvm98 Run and
Reporting Rules.
Q42: Can SPECjvm98 run under JDK 1.2?
A42: The answer is a qualified yes. At the time of this
release, JDK 1.2 was in the beta test stage, so we have little experience
with it. SPECjvm98 will not test many of the new features included in JDK
1.2, such as security, JFC, 2D APIs, and 3D APIs.
Q43: Is this suite suitable for measuring server
performance?
A43: Using SPECjvm98 on a server might not stress
servers adequately nor demonstrate their strengths. It can be used,
however, as a tool to study the JVM and JIT compiler. You could, for
example, increase the workloads by running SPECjvm98 as multiple copies
and study its effect on the server. SPEC's Java Server group is
developing a server benchmark suite - contact SPEC for more information.
Q44: Can I use this suite for embedded processor
platforms?
A44: SPECjvm98 might have too heavy a workload for
entry-level systems. It might require more physical memory than some
smaller systems have, and might run too long to easily measure slower
processors. In addition, SPECjvm98 does not address the key performance
concerns of many embedded system designers, such as real-time response
and speed of core algorithms such as digital signal processing. SPEC
welcomes input from anyone who wants to help define performance metrics
for Java-based embedded systems.
Q45: Can this benchmark be used to compare performance
for JVMs running on the same x86 based systems? Won't whether or not
a JVM uses 80-bit mode affect the benchmark results?
A45: Yes, the benchmark can be used to compare
performance for JVMs on x86-based systems. Some JVMs might include a
switch to select 64- or 80-bit mode. This issue should fade away with the
release of the 1.2 specifications.
Q46: How do I contact SPEC for more information?
A46: For more information, contact: SPEC, 6585 Merchant
Place, Suite 100, Warrenton, VA, 20187; tel: (540) 349-7878; fax: (540)
349-5992; e-mail: info@spec.org.
This FAQ was prepared by Kaivalya Dixit, SPEC president,
and Walter Bays, SPECjvm98 Project Manager.