Version 1.1
Last modified: July 30, 2014
This document specifies how the SPECjvm2008 benchmark is to be run for measuring and publicly reporting performance results. These rules abide by the norms laid down by SPEC. This ensures that results generated with this benchmark are meaningful, comparable to other generated results, and are repeatable (with documentation covering factors pertinent to duplicating the results).
Per the SPEC license agreement, all results publicly disclosed must adhere to these Run and Reporting Rules.
SPEC intends that this benchmark measure the overall performance of systems and Java Virtual Machines running on those systems.
The general philosophy behind the rules for running the SPECjvm2008 benchmark is to ensure that an independent party can reproduce the reported results.
For results to be publishable, SPEC expects:
Proper use of the SPEC benchmark tools as provided.
Availability of an appropriate full disclosure report.
Support for all of the appropriate APIs.
SPEC is aware of the importance of optimizations in producing the best system performance. SPEC is also aware that it is sometimes hard to draw an exact line between legitimate optimizations that happen to benefit SPEC benchmarks and optimizations that specifically target the SPEC benchmarks. However, with the rules below, SPEC wants to increase the awareness by implementers and end users of issues of unwanted benchmark-specific optimizations that would be incompatible with SPEC's goal of fair benchmarking.
Hardware and software used to run the SPECjvm2008 benchmark must provide a suitable environment for running typical Java applications.
Optimizations must generate correct code for all Java programs.
Optimizations must improve performance for a class of programs, where the class of programs must be larger than a single SPEC benchmark.
The vendor encourages the implementation for general use.
The implementation is generally available, documented and supported by the providing vendor.
Furthermore, SPEC expects that any public use of results from this benchmark shall be for configurations that are appropriate for public consumption and comparison.
In the case where it appears that the above guidelines have not been followed, SPEC may investigate such a claim and request that the offending optimization (e.g. a SPEC-benchmark specific pattern matching) be backed off and the results resubmitted. Or, SPEC may request that the vendor correct the deficiency (e.g. make the optimization more general purpose or correct problems with code generation) before submitting results based on the optimization.
SPECjvm2008 does permit Open Source Applications outside of a commercial distribution or support contract with some limitations. The following are the rules that govern the admissibility of the Open Source Application in the context of a benchmark run or implementation. Open Source Applications do not include shareware and freeware, where the source is not part of the distribution.
1. Open Source Application rules do not apply to Open Source operating systems, which would still require a commercial distribution and support.
2. Only a "stable" release can be used in the benchmark environment; non-"stable" releases (alpha, beta, or release candidates) cannot be used. Reason: An open source project is not contractually bound, and volunteer resources make predictable future release dates unlikely (i.e. it may be more likely to miss SPEC's 90 day General Availability window). A "stable" release is one that is clearly denoted as a stable release or a release that is available and recommended for general use. It must be a release that is not on the development fork and is not designated as an alpha, beta, test, preliminary, pre-release, prototype, release candidate, or by any other term that indicates it may not be suitable for general use.
3. The initial "stable" release of the application must be a minimum of 12 months old. Reason: This helps ensure that the software has real application to the intended user base and is not a benchmark special that is put out with a benchmark result and only available for the first three months to meet SPEC's forward availability window.
4. At least two additional stable releases (major, minor, or bug fix) must have been completed, announced, and shipped beyond the initial stable release. Reason: This helps establish a track record for the project and shows that it is actively maintained.
5. An established online support forum must be in place and clearly active, "usable", and "useful". It is expected that there be at least one posting within the last 90 days. Postings from the benchmarkers or their representatives, or from members of the Web subcommittee, will not be included in the count. Reason: Another aspect that establishes that support is available for the software. However, benchmarkers must not cause the forum to appear active when it otherwise would not be. A "useful" support forum is defined as one that provides useful responses to users' questions, such that if a previously unreported problem is reported with sufficient detail, it is responded to by a project developer or community member with sufficient information that the user ends up with a solution, a workaround, notification that the issue will be addressed in a future release, or notification that it is outside the scope of the project. The archive of the problem-reporting tool must have examples of this level of conversation. A "usable" support forum is defined as one where the problem-reporting tool is available without restriction, has a simple user interface, and lets users access old reports.
6. The project must have at least 2 identified developers contributing to and maintaining the application. Reason: To help ensure that this is a real application with real developers and not a fly-by-night benchmark special.
7. The application must use a standard open source license such as one of those listed at http://www.opensource.org/licenses/.
8. The "stable" release used in the actual test run must be the current stable release at the time the test is run, or the prior "stable" release if the superseding/current "stable" release will be less than 90 days old at the time the result is made public.
9. The "stable" release used in the actual test run must be no older than 18 months. If there has not been a "stable" release within 18 months, then the open source project may no longer be active and as such may no longer meet these requirements. An exception may be made for "mature" projects (see below).
10. In rare cases, open source projects may reach "maturity", where the software requires little or no maintenance and there may no longer be active development. If it can be demonstrated that the software is still in general use and recommended either by commercial organizations or by active open source projects or user forums, and the source code for the software is less than 20,000 lines, then a request can be made to the subcommittee to grant this software "mature" status. This status may be reviewed semi-annually.
There are two run categories in SPECjvm2008: base and peak. In a peak run, the Java Virtual Machine (JVM) can be configured to obtain the best score possible, using command line parameters, property files, or other means, all of which must be explained in the submission. The user may also increase the warmup and iteration times.
In a base run, no configuration or hand tuning of the JVM or benchmark runtime is allowed.
SPEC reserves the right to adapt the benchmark codes, workloads, and rules of SPECjvm2008 as deemed necessary to preserve the goal of fair benchmarking. SPEC will notify members and licensees whenever it makes changes to the benchmark and may rename the metrics. In the event that the workload or metric is changed, SPEC reserves the right to republish in summary form "adapted" results for previously published systems, converted to the new metric. In the case of other changes, a republication may necessitate retesting and may require support from the original test sponsor.
Tested systems must provide an environment suitable for running typical Java SE applications and must be generally available for that purpose. Any tested system must include an implementation of the Java (TM) Virtual Machine as described by the following references, or as amended by SPEC for later Java versions:
Java Virtual Machine Specification Second Edition (ISBN: 0201432943)
The following are specifically allowed, within the bounds of the Java Platform:
Precompilation and on-disk storage of compiled executables are specifically allowed. However, support for dynamic loading is required. Additional rules are defined in section 2.1.1.
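To illustrate what "support for dynamic loading" means in practice, the minimal sketch below resolves a class by name at run time and instantiates it reflectively; a conforming JVM must handle this even when other code has been precompiled. The class name used is an arbitrary stand-in and is not taken from the benchmark kit.

    // Minimal sketch: dynamic class loading must keep working even if other code
    // has been precompiled and stored on disk. The default class name is an
    // arbitrary stand-in, not part of the SPECjvm2008 kit.
    public class DynamicLoadSketch {
        public static void main(String[] args) throws Exception {
            String className = (args.length > 0) ? args[0] : "java.util.ArrayList";
            Class<?> cls = Class.forName(className);                       // resolve by name at run time
            Object instance = cls.getDeclaredConstructor().newInstance();  // instantiate reflectively
            System.out.println("Dynamically loaded: " + instance.getClass().getName());
        }
    }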
The system must include a complete implementation of those classes that are referenced by this benchmark as in the Java SE 1.5.0, Java SE 6 or Java SE 7 specifications. Other Java SE specifications are not supported.
SPEC does not check for implementation of APIs not used in the benchmark.
Feedback-directed optimization targeting the SPECjvm2008 benchmark is allowed only in a peak submission, subject to the restrictions regarding benchmark-specific optimizations in section 1.2. A base submission must be possible to produce in the first run using a product. Feedback optimization performed before the measured invocation of the benchmark is therefore allowed only in the peak category. Steps taken to produce such a result must be fully disclosed.
The SPECjvm2008 benchmark binaries are provided in jar files containing the Java classes and in resource files containing input and validation data. Compliant runs must use the provided jar files and resource files and these files must not be updated or modified in any way. While the source code of the benchmark is provided for reference, any runs that use recompiled class files and/or modified resource files are not compliant.
In addition to the other run rule requirements, a compliant run must pass the checks that the harness performs automatically.
Two run categories, base and peak, are described in Section 1.4.
In a peak run, the Java Virtual Machine (JVM) can be configured in any way the benchmarker likes, by using command line parameters, property files, system variables, or other means, which must be explained in the submission. These changes must be "generally available", i.e. available, supported and documented.
In a base run, no configuration or hand tuning of the JVM is allowed. Any installation options or other settings affecting the result must be disclosed to provide full reproducibility. A base submission must be possible to produce in the first run using a product.
Any deviations from the standard, default configuration of the SUT must be documented so that an independent party is able to reproduce the result without further assistance. Changes to hardware, BIOS, and operating system are allowed in both base and peak.
These changes must be "generally available", i.e. available, supported and documented. For example, if a special tool is needed to change the OS state, it must be provided to users and documented.
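As a hedged sketch of the kind of peak-run tuning and disclosure described above, the example below launches a separate JVM with ordinary, documented HotSpot heap and garbage collector flags. The launcher jar name SPECjvm2008.jar is an assumption about the local installation; whatever flags are actually used must be disclosed in the submission.

    import java.util.Arrays;
    import java.util.List;

    // Sketch of driving a peak-style run from Java. The heap and GC flags are
    // standard, documented HotSpot options of the kind a peak submission must
    // disclose; the launcher jar name is an assumption about the installation.
    public class LaunchPeakRunSketch {
        public static void main(String[] args) throws Exception {
            List<String> cmd = Arrays.asList(
                    "java",
                    "-Xms4g", "-Xmx4g",         // fixed heap size (disclose in the submission)
                    "-XX:+UseParallelGC",       // collector choice (disclose in the submission)
                    "-jar", "SPECjvm2008.jar"); // assumed launcher jar name
            Process p = new ProcessBuilder(cmd)
                    .inheritIO()                // stream harness output to this console
                    .start();
            System.exit(p.waitFor());           // propagate the harness exit code
        }
    }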
There are several parameters that control the operation of the SPECjvm2008 harness. These are checked at startup of the harness.
[Table: configuration in base and peak]
Analyzers may be used when running, but the analyzers may not noticeably affect the results, i.e. it must be possible to reproduce the result without the analyzers. So, for example, even if it is possible to write an analyzer that invokes a garbage collection before each benchmark, doing so is not allowed. If analyzers are used that are not part of the SPECjvm2008 kit, the source code for them should be included with the submission for review.
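To make the prohibition concrete, the hypothetical analyzer sketched below forces a garbage collection before each operation; it would visibly change the measurement and is exactly the kind of analyzer the rule above disallows. The class and method names are invented for illustration and are not the harness's analyzer API.

    // Hypothetical, NON-COMPLIANT analyzer sketch. The hook names are invented
    // for illustration and do not reflect the SPECjvm2008 analyzer API.
    public class GcBeforeOperationAnalyzer {
        // Imagined hook called before each benchmark operation: forcing a GC
        // here noticeably affects the result, which the rule above forbids.
        public void beforeOperation(String benchmarkName) {
            System.gc();
        }

        // A compliant analyzer would only observe (e.g. read counters) without
        // perturbing the run.
        public void afterOperation(String benchmarkName) {
            // passive observation only
        }
    }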
SPECjvm2008 produces these throughput metrics in operations per minute (ops/m):
The method used to compute the overall throughput result is discussed in the User's Guide.
A throughput result that does not qualify as one of the metrics above may be referred to using the metric ops/m.
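The User's Guide is authoritative for how the composite metric is computed; as a rough sketch only, the example below assumes the overall result is the geometric mean of per-benchmark ops/m scores and uses made-up values.

    import java.util.Arrays;
    import java.util.List;

    // Rough sketch only: assumes the overall ops/m is the geometric mean of the
    // per-benchmark ops/m scores. See the User's Guide for the actual aggregation,
    // including how sub-benchmark scores are grouped. Values are made up.
    public class CompositeScoreSketch {
        static double geometricMean(List<Double> scores) {
            double logSum = 0.0;
            for (double s : scores) {
                logSum += Math.log(s);
            }
            return Math.exp(logSum / scores.size());
        }

        public static void main(String[] args) {
            List<Double> opsPerMinute = Arrays.asList(250.0, 310.0, 180.0); // example per-benchmark scores
            System.out.printf("Composite (geometric mean): %.2f ops/m%n",
                    geometricMean(opsPerMinute));
        }
    }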
The run must meet all requirements described in section 2 to be a compliant run. This includes producing correct result for the full run.
All components, both hardware and software, must be generally available within 3 months of the publication date in order to be a valid publication. However, if JVM licensing issues cause a change in software availability date after publication date, the change will be allowed to be made without penalty, subject to subcommittee review.
If a new or updated version of any software product is released causing earlier versions of said product to no longer be supported or encouraged by the providing vendor(s), new publications or submissions occurring after four complete review cycles have elapsed must use a version of the product encouraged by the providing vendor(s).
For example, with result review cycles ending April 16th, April 30th, May 14th, May 28th, June 11th, and June 25th, if a new JDK version released between April 16th and April 29th contains critical fixes causing earlier versions of the JDK to no longer be supported or encouraged by the providing vendor(s), results submitted or published on June 25th must use the new JDK version.
If pre-release hardware or software is tested, then the test sponsor represents that the performance measured is generally representative of the performance to be expected on the same configuration of the released system. If the sponsor later finds the performance of the released system to be 5% lower than that reported for the pre-release system, then the sponsor is obligated to report a corrected test result.
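A minimal arithmetic sketch of this 5% threshold, using illustrative numbers (the rule's "5% lower" is read here as a drop of 5% or more):

    // Illustrative numbers only: checks whether a released system's score has
    // dropped 5% or more below the reported pre-release score, in which case a
    // corrected result must be reported.
    public class PreReleaseCheckSketch {
        public static void main(String[] args) {
            double preReleaseScore = 1000.0; // reported ops/m on the pre-release system
            double releasedScore   = 940.0;  // measured ops/m on the released system
            double drop = (preReleaseScore - releasedScore) / preReleaseScore;
            if (drop >= 0.05) {
                System.out.printf("Drop of %.1f%%: a corrected result must be reported%n", drop * 100);
            } else {
                System.out.printf("Drop of %.1f%%: within the 5%% tolerance%n", drop * 100);
            }
        }
    }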
In order to publicly disclose SPECjvm2008 results, the tester must adhere to these reporting rules in addition to having followed the run rules above. The goal of the reporting rules is to ensure the system under test is sufficiently documented such that someone could reproduce the test and its results.
SPEC encourages the submission of results to SPEC for review by the relevant subcommittee and subsequent publication on SPEC's website. Vendors may publish compliant results independently. However, any SPEC member may request a full disclosure report for that result, and the tester must comply within 10 business days. Issues raised concerning a result's compliance with the run and reporting rules will be taken up by the relevant subcommittee regardless of whether or not the result was formally submitted to SPEC.
Any SPECjvm2008 result produced in compliance with these run and reporting rules may be publicly disclosed and represented as valid SPECjvm2008 results.
Any test result not in full compliance with the run and reporting rules must not be represented using the SPECjvm2008 base ops/m or SPECjvm2008 peak ops/m metrics.
Once you have a compliant run and wish to submit it to SPEC for review, you will need to provide the raw file created by the run.
How to submit results is described in the user guide.
Estimated results are not allowed and may not be publicly disclosed.
SPECjvm2008 results must not be publicly compared to results from any other benchmark. This would be a violation of the SPECjvm2008 reporting rules.
SPECjvm2008 requires adherence to the SPEC Fair Use Rules, which can be found at http://www.spec.org/fairuse.html.
SPECjvm2008 requires full disclosure of all software configuration and tuning applied to the software stack, including all modifications to the system firmware or BIOS, OS, JRE, and other dependent software. A detailed description is necessary to allow SPEC members to reproduce results within 5% of a reported score.
SPECjvm2008 requires a detailed description and full disclosure of the hardware configuration of the system under test. A detailed description of the CPU and of system memory, with disclosure of rank, geometry, density, vendor, and model number, is required. Disclosure of any additional hardware dependencies is required. A detailed description is necessary to allow SPEC members to reproduce results within 5% of a reported score.
SPECjvm2008 allows individual subtest competitive comparisons. It is expected that the information essential to reproduce the reported score is publicly disclosed. It is acceptable to provide a URL with the necessary details to reproduce the result in lieu of complete disclosure in marketing materials.
SPEC encourages use of the SPECjvm2008 benchmark in academic and research environments. It is understood that experiments in such environments may be conducted in a less formal fashion than that demanded of licensees submitting to the SPEC web site. For example, a research environment may use early prototype hardware that simply cannot be expected to stay up for the length of time required to run the required number of points, or may use research compilers that are unsupported and are not generally available.
Nevertheless, SPEC encourages researchers to obey as many of the run rules as practical, even for informal research. SPEC respectfully suggests that following the rules will improve the clarity, reproducibility, and comparability of research results. Where the rules cannot be followed, SPEC requires the results be clearly distinguished from results officially submitted to SPEC, by disclosing the deviations from the rules.
Copyright 2008 Standard Performance Evaluation Corporation