This document specifies how the benchmarks in the SPECweb96 Release 1.0
suite are to be run for measuring and publicly reporting performance
results. These rules follow the norms laid down by the SPEC Web
Subcommittee and approved by the SPEC Open Systems Steering Committee. They
ensure that results generated with this suite are meaningful, comparable to
other generated results, and repeatable (with documentation covering the
factors pertinent to duplicating the results).
Per the SPEC license agreement, all results publicly disclosed must adhere
to these Run and Reporting Rules.
The general philosophy behind the rules for running the SPECweb96 Release
1.0 benchmark is to ensure that an independent party can reproduce the
reported results.
The following attributes are expected:
Furthermore, SPEC expects that any public use of results from this benchmark suite shall be for servers and configurations that are appropriate for public consumption and comparison. Thus, it is also expected that:
SPEC reserves the right to adapt the benchmark codes, workloads, and rules
of SPECweb96 Release 1.0 as deemed necessary to preserve the goal of fair
benchmarking. SPEC will notify members and licensees whenever it makes
changes to the suite and will rename the metrics (e.g. from SPECweb96 to
SPECweb97a). In the event that a workload is removed, SPEC reserves the
right to republish in summary form "adapted" results for
previously published systems, converted to the new metric. In the case of
other changes, a republication may necessitate retesting and may require
support from the original test sponsor.
Relevant standards are cited in these run rules as URL references, and are
current as of the date of publication. Changes or updates to these
referenced documents or URL's may necessitate repairs to the links
and/or amendment of the run rules. The current run rules will be available
at the SPEC web site at http://www.spec.org. SPEC will notify
members and licensees whenever it makes changes to the suite.
As the WWW is defined by its interoperable protocol definitions, SPECweb requires adherence to the related protocol standards. The benchmark environment shall be governed by the following standards:
For further explanation of these protocols, the following might be helpful:
For a run to be valid, the following attributes must hold true:
Any deviation from the standard, default configuration of the server must be documented so that an independent party can reproduce the result without further assistance.
The benchmark will make references to files located on the server. The
range of files accessed will be determined by the particular level of
requested load for each measurement. The particular files referenced shall
be determined by the random workload generation in the benchmark
itself.
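For illustration only, the weighted selection of a target file class might
be sketched in C as follows. The class weights shown are placeholders
echoing the 0.35 target-mix example used later in these rules; the
normative mix is defined solely by the benchmark's own workload generator.

    #include <stdlib.h>

    /*
     * Hedged sketch: select a target file class by weighted random choice,
     * in the spirit of the benchmark's internal workload generation.  The
     * weights below are illustrative placeholders only; the normative mix
     * is defined by the benchmark itself.
     */
    static int pick_class(void)
    {
        static const double weight[4] = { 0.35, 0.50, 0.14, 0.01 };
        double r = (double)rand() / (double)RAND_MAX;  /* uniform in [0,1] */
        double cum = 0.0;
        int c;

        for (c = 0; c < 4; c++) {
            cum += weight[c];
            if (r <= cum)
                return c;
        }
        return 3;   /* guard against floating-point rounding at the top */
    }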
The benchmark suite provides tools for the creation of the files to be
used. It is the responsibility of the benchmarker to ensure that these
files are placed on the server so that they can be accessed properly by the
benchmark. These files, and only these files, shall be used as the target
file set. The benchmark shall perform internal validations to verify the
expected file(s); no modification or bypassing of this validation is
allowed.
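For illustration, a validation pass of this kind might resemble the
following C sketch. The path format and expected-size argument are
hypothetical stand-ins; the real file-set layout and validation checks are
defined by the SPEC-provided tools.

    #include <stdio.h>
    #include <sys/stat.h>

    /*
     * Hedged sketch: verify that one expected target file is present with
     * the expected size.  The path format and size argument are
     * hypothetical; the real layout is defined by the SPEC-provided
     * file creation tools.
     */
    static int check_file(const char *docroot, int dir, int file,
                          long expected_size)
    {
        char path[1024];
        struct stat st;

        sprintf(path, "%s/dir%05d/file%d", docroot, dir, file);
        if (stat(path, &st) != 0)
            return -1;                      /* missing file: invalid set */
        if ((long)st.st_size != expected_size)
            return -1;                      /* wrong size: invalid set */
        return 0;
    }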
Each benchmark run consists of a set of requested load levels for which an
actual measurement is made. The benchmark measures the actual level
achieved and the associated average response time for each of the requested
levels.
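For illustration, the quantities recorded for each requested load level can
be pictured as a record like the following C sketch; the field names and
units are illustrative, not taken from the benchmark sources.

    /*
     * Hedged sketch: the quantities named above, recorded per requested
     * load level.  Field names and units are illustrative only.
     */
    struct data_point {
        double requested_load;  /* requested load level (HTTP ops/sec)  */
        double actual_load;     /* level actually achieved (ops/sec)    */
        double avg_resp_time;   /* average response time per operation  */
    };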
The measurement of all data points defining a performance curve is made
within a single benchmark run, starting with the lowest requested load
level and proceeding to the highest requested load level. The requested
load levels are specified in a list, from lowest to highest, from left to
right, respectively, in the parameter file.
If any requested load level must be rerun for any reason, the entire
benchmark run must be restarted and the series of requested load levels
repeated. No server or testbed configuration changes, server reboots, or
file system initializations (e.g., "newfs") are allowed between
requested load levels.
The performance curve must consist of a minimum of 10 data points of
requested load, uniformly distributed across the range from zero to the
maximum requested load. Points in addition to these 10 uniformly
distributed points may also be reported.
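For example, with an illustrative maximum requested load of 500 operations
per second, ten uniformly distributed levels would be 50, 100, ..., 500, as
the following C sketch computes:

    #include <stdio.h>

    /*
     * Hedged sketch: derive 10 requested load levels uniformly distributed
     * across the range from zero to the maximum requested load.  With an
     * illustrative maximum of 500 ops/sec this prints 50, 100, ..., 500.
     */
    int main(void)
    {
        const double max_load = 500.0;  /* illustrative maximum, ops/sec */
        int i;

        for (i = 1; i <= 10; i++)
            printf("level %2d: %.0f ops/sec\n", i, max_load * i / 10.0);
        return 0;
    }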
All benchmark parameter values must be left at their default values when generating reportable SPECweb96 results, except as noted in the following list:
In particular, there are several settings that cannot be changed without invalidating the result.
The report of results for the SPECweb96 benchmark is generated in ASCII and
HTML format by the provided SPEC tools. These tools may not be changed,
except for portability reasons with prior SPEC approval. This section
describes the report generated by those tools. The tools perform error
checking and will flag many error conditions as resulting in an
"invalid run". However, these automatic checks are only there for
your convenience, and do not relieve you of your responsibility to check
your own results and follow the run and reporting rules.
While SPEC believes that a full performance curve best describes a
server's performance, the need for a single figure of merit is
recognized. The benchmark single figure of merit, SPECweb96, is the peak
throughput measured during the run (reported in operations per second). For
a result to be valid, the peak throughput must be within 5% of the
corresponding requested load. The results of a benchmark run, composed of
several load levels, are plotted on a performance curve on the results
reporting page. The data values for the points on the curve are also
enumerated in a table.
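For illustration, the 5% rule can be expressed as the following C sketch
(an interpretation of the rule as stated above, not code from the SPEC
tools):

    #include <math.h>

    /*
     * Hedged sketch of the 5% rule: the peak measured throughput must lie
     * within 5% of the requested load at which it was measured.  E.g. a
     * peak of 487 ops/sec against a requested 500 ops/sec passes
     * (|487 - 500| = 13 <= 25).
     */
    static int peak_is_valid(double requested_load, double measured_peak)
    {
        return fabs(measured_peak - requested_load)
                   <= 0.05 * requested_load;
    }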
No data point within 25% of the maximum reported throughput may be
reported where the number of failed requests for any file class is greater
than 1% of total requests for that file class, plus one. No data point
within 25% of the maximum reported throughput may be reported whose
"Actual Mix Pcnt" versus "Target Mix Pcnt" differs by
more than 10% of the "Target Mix Pcnt" for any workload class.
E.g., if the target mix percent is 0.35 then valid actual mix percents are
0.35 +/- 0.035.
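For illustration, these two reportability checks might be expressed as the
following C sketch; the function and variable names are hypothetical and
the code is not the SPEC tools' implementation:

    /*
     * Hedged sketch of the per-class reportability checks applied to data
     * points within 25% of the maximum reported throughput.
     */
    static int class_is_reportable(long failed, long total,
                                   double actual_mix, double target_mix)
    {
        /* Failed requests may not exceed 1% of the class total, plus one. */
        if ((double)failed > 0.01 * (double)total + 1.0)
            return 0;
        /* Actual mix must be within 10% of the target mix, e.g.
         * 0.35 +/- 0.035 for a 0.35 target. */
        if (actual_mix < 0.90 * target_mix ||
            actual_mix > 1.10 * target_mix)
            return 0;
        return 1;
    }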
The server performance graph is constructed from a table containing the data points from a single run of the benchmark. The table consists of two columns:
Server performance is depicted in a plot with the following format:
All data points of the plot must be enumerated in the table described in paragraph 3.1.1.
The SPEC tools optionally allow verbose output to be selected, in which case additional data are reported in a table:
The system configuration information that is required to duplicate published performance results must be reported. This list is not intended to be all-inclusive, nor is each feature in the list required to be described. The rule of thumb is: if it affects performance or the feature is required to duplicate the results, describe it. All components must be generally available within 6 months of the original publication of a performance result.
The following server hardware components must be reported:
The following server software components must be reported:
A brief description of the network configuration used to achieve the benchmark results is required. The minimum information to be supplied is:
The following load generator hardware components must be reported:
The dates of general customer availability, by month and year, must be listed for the major components: hardware, HTTP server, and operating system. All system, hardware, and software features must be generally available within 6 months of the date of test.
The reporting page must list the date the test was performed (month and year), the organization that performed the test and is reporting the results, and the SPEC license number of that organization.
This section is used to document:
The following additional information is also required to appear on the results reporting page for SPECweb96 Release 1.0 results:
The following additional information may be required to be provided for SPEC's results review:
SPEC provides client driver software, which includes tools for running the
benchmark and reporting its results. This software implements various
checks for conformance with these run and reporting rules. Therefore the
SPEC software must be used except that necessary substitution of equivalent
functionality (e.g. file set generation) may be done only with prior
approval from SPEC. Any such substitution must be reviewed and deemed
"performance-neutral" by the OSSC.
You may not change this software without prior approval from SPEC. SPEC
permits minimal portability changes, but only after they have been
reviewed and deemed "performance-neutral" by the OSSC. Source code
changes required
for standards compliance must be reported to SPEC, citing appropriate
standards documents. SPEC will consider incorporating such changes in
future releases. Whenever possible, SPEC will strive to develop and enhance
the benchmark to be standards-compliant. The portability change will be
allowed if, without the change, the:
Special libraries may be used in conjunction with the benchmark code as
long as they do not replace routines in the benchmark source code, and they
are not "benchmark-specific".
The driver software includes ANSI C code and Perl 5 scripts. SPEC
will provide prebuilt versions of perl and the driver code, or these may be
recompiled from the provided source. SPEC requires the user to provide OS
and server software to support HTTP 1.0 as described in section 2.