Version 1.08
Last Modified: December 14, 2006
Section 2 - Running SPECjAppServer2004
Appendix A - Isolation Level Definitions
This document specifies how the SPECjAppServer2004 benchmark is to be run for measuring and publicly reporting performance results. These rules abide by the norms laid down by SPEC. This ensures that results generated with this benchmark are meaningful, comparable to other generated results, and are repeatable (with documentation covering factors pertinent to duplicating the results).
Per the SPEC license agreement, all results publicly disclosed must adhere to these Run and Reporting Rules.
The general philosophy behind the rules for running the SPECjAppServer2004 benchmark is to ensure that an independent party can reproduce the reported results.
For results to be publishable, SPEC expects:
SPEC is aware of the importance of optimizations in producing the best system performance. SPEC is also aware that it is sometimes hard to draw an exact line between legitimate optimizations that happen to benefit SPEC benchmarks and optimizations that specifically target the SPEC benchmarks. However, with the rules below, SPEC wants to increase the awareness by implementers and end users of issues of unwanted benchmark-specific optimizations that would be incompatible with SPEC's goal of fair benchmarking.
Results must be reviewed and accepted by SPEC prior to public disclosure. The submitter must have a valid SPEC license for this benchmark to submit results. Furthermore, SPEC expects that any public use of results from this benchmark shall follow the SPEC OSG Fair Use Policy and those specific to this benchmark (see the Fair Use section below). In the case where it appears that these guidelines have been violated, SPEC may investigate and request that the offense be corrected or the results resubmitted.
SPEC reserves the right to modify the benchmark codes, workloads, and rules of SPECjAppServer2004 as deemed necessary to preserve the goal of fair benchmarking. SPEC will notify members and licensees whenever it makes changes to the benchmark and may rename the metrics. In the event that the workload or metric is changed, SPEC reserves the right to republish in summary form adapted results for previously published systems.
The Components refer to J2EE elements such as EJBs, JSPs, and Servlets.
The Deployment Unit refers to a J2EE Server or set of Servers in which the components from a particular domain are deployed.
The SPECjAppServer2004 Application (or just "Application") refers to the implementation of the Components provided for the SPECjAppServer2004 workload.
The Driver refers to the provided client software used to generate the benchmark workload.
The System Under Test (SUT) comprises all components which are being tested. This includes the J2EE Application Servers, Database Servers, network connections, etc. It does not include the Driver or the Supplier Emulator.
The Supplier Emulator refers to the provided software used to emulate the external parts supplier (outside the SUT).
The SPECjAppServer2004 Kit (or just "Kit") refers to the complete kit provided for SPECjAppServer2004. This includes the SPECjAppServer2004 Application, Driver, Supplier Emulator, load programs, and reporter.
The "SPECjAppServer2004 JOPS" is the primary SPECjAppServer2004 metric and denotes the average number of successful "jAppServer Operations Per Second" completed during the Measurement Interval." "SPECjAppServer2004 JOPS" is composed of the total number of Business Transactions completed in the Dealer Domain, added to the total number of work orders completed in the Manufacturing Domain, normalized per second.
The Resource Manager is the software that manages a database and is the same as a Database Manager.
The Test Submitter (or just "Submitter") refers to the organization that is submitting a benchmark result and is responsible for the accuracy of the submission.
The EJB Container is the runtime environment that controls the life cycle of the enterprise beans of the SPECjAppServer2004 workload. Refer to the J2EE 1.3 or later and EJB 2.0 or later specifications for more details.
The Web Container is the runtime environment that controls the execution of Servlets and JSPs. Refer to the J2EE 1.3 specifications for more details.
The jASclient refers to a thread that sends requests to components in a Deployment Unit. A jASclient does not necessarily map into a single connection to the J2EE Server.
An ECtransaction is a remote method call on an Enterprise Java Bean of the Application.
A Web Interaction is an HTTP request to the Web-based portion of the Application.
A Business Transaction is a unit of work initiated by the Driver and may involve one or more ECtransactions and/or Web Interactions.
A Database Transaction (as used in this specification) is a unit of work on the database with full ACID properties. A Database Transaction is initiated by the EJB Container or an enterprise bean as part of a Business Transaction.
The Measurement Interval refers to a steady state period during the execution of the benchmark for which the test submitter is reporting a performance metric.
The Response Time refers to the time elapsed from when the first transaction in the Business Transaction is sent from the Driver to the SUT until the response from the last transaction in the Business Transaction is received by the Driver from the SUT.
The Injection Rate (Ir) refers to the rate at which Business Transaction requests from the Dealer application in the Dealer Domain are injected into the SUT.
The Delay Time refers to the time elapsed from the last byte received by the Driver to complete a Business Transaction until the first byte sent by the Driver to request the next Business Transaction. The Delay Time is a function of the Response Time and the Injection Rate. For a required Injection Rate, the Delay Time will be smaller for larger Response Times.
The Cycle Time refers to the time elapsed from the first byte sent by the Driver to request a Business Transaction until the first byte sent by the Driver to request the next Business Transaction. The Cycle Time is the sum of the Response Time and Delay Time.
The D_Driver refers to the part of the Driver that connects to the Dealer Domain of the Application.
The M_Driver refers to the part of the Driver that connects to the Manufacturing Domain of the Application.
All hardware required to run the SPECjAppServer2004 Application must be generally available, supported and documented (see the General Availability section for details on general availability rules).
All software required to run the benchmark in the System Under Test (SUT) must be implemented by products that are generally available, supported and documented. These include but are not limited to:
The J2EE server must provide a runtime environment that meets the requirements of the Java 2 Platform, Enterprise Edition, (J2EE) Version 1.3 or later specifications during the benchmark run.
A major new version (i.e. 1.0, 2.0, etc.) of a J2EE server must have passed the J2EE Compatibility Test Suite (CTS) by the product's general availability date.
A J2EE Server that has passed the J2EE Compatibility Test Suite (CTS) satisfies the J2EE compliance requirements for this benchmark regardless of the underlying hardware and other software used to run the benchmark on a specific configuration, provided the runtime configuration options result in behavior consistent with the J2EE specification. For example, using an option that violates J2EE argument passing semantics by enabling a pass-by-reference optimization would not meet the J2EE compliance requirement.
Comment: The intent of this requirement is to ensure that the J2EE server is a complete implementation satisfying all requirements of the J2EE specification and to prevent any advantage gained by a server that implements only an incomplete or incompatible subset of the J2EE specification.
The class files provided in the driver.jar file of the SPECjAppServer2004 kit must be used as is. No source code recompilation is allowed.
The throughput of the SPECjAppServer2004 benchmark is driven by the activity of the Dealer and Manufacturing applications. The throughput of both applications is directly related to the chosen Injection Rate. To increase the throughput, the Injection Rate needs to be increased. The benchmark also requires a number of rows to be populated in the various tables. The scaling requirements are used to maintain the ratio between the Business Transaction load presented to the SUT, the cardinality of the tables accessed by the Business Transactions, the Injection Rate and the number of jASclients generating the load.
Database scaling is defined by the Load Injection Rate LIr, which is a step function of the Dealer Injection Rate Ir. The Load Injection Rate is defined to be:
LIr = (CEILING( Ir / step)) * step
where the step is:
step = 10 ^ (INT(LOG(Ir)))    (LOG is base 10)
For example:
Ir | Step | LIr |
---|---|---|
1-10 | 1 | 1, 2, 3 ... 10 |
11-100 | 10 | 20, 30, 40 ... 100 |
101-1000 | 100 | 200, 300, 400 ... 1000 |
etc. | | |
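As an illustration only, the step function above can be written out in code; the sketch below assumes integer Injection Rates of at least 1, and the class and method names are hypothetical.

```java
// Hypothetical sketch of the Load Injection Rate (LIr) step function above:
// step = 10 ^ INT(LOG10(Ir)), LIr = CEILING(Ir / step) * step.
public final class LoadScaling {

    static int loadInjectionRate(int ir) {
        if (ir < 1) {
            throw new IllegalArgumentException("Injection Rate must be >= 1");
        }
        int step = (int) Math.pow(10, (int) Math.floor(Math.log10(ir)));
        return (int) Math.ceil((double) ir / step) * step;
    }

    public static void main(String[] args) {
        // Example values: Ir = 7 -> LIr = 7; Ir = 42 -> LIr = 50; Ir = 250 -> LIr = 300
        for (int ir : new int[] {7, 42, 250}) {
            System.out.println("Ir=" + ir + " -> LIr=" + loadInjectionRate(ir));
        }
    }
}
```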
The cardinality (the number of rows in the table) of the C_Site, C_Supplier, S_Site, and S_Supplier tables is fixed. The M_Largeorder table is empty at the start of the run but will be populated during the course of the run as noted in table 1. The cardinality of the remaining tables will increase as functions of the Ir as depicted in table 1.
The following scaling requirements represent the initial configuration of the tables in the various domains.
Domain | Table Name | Cardinality (in rows) | Comments |
---|---|---|---|
Corporate | C_Site | 1 | Only used at load-time |
Corporate | C_Supplier | 10 | Only used at load-time |
Corporate | C_Customer | 7500 * LIr | |
Corporate | C_Parts | (1100 * LIr)¹ | 100 * LIr assemblies + avg. of 10 components per assembly; only used at load-time |
Corporate | C_CustomerInventory | (7.5 * 7500 * LIr)¹ | Avg. of 7.5 cars per customer |
Dealer | O_Item | 100 * LIr | 100 * LIr assemblies (we only sell assemblies) |
Dealer | O_Orders | 750 * LIr | |
Dealer | O_Orderline | (2250 * LIr)¹ | Avg. of 3 per order |
Manufacturing | M_Parts | (1100 * LIr)¹ | 100 * LIr assemblies + avg. of 10 components per assembly |
Manufacturing | M_BOM | (1000 * LIr)¹ | |
Manufacturing | M_Workorder | 100 * LIr | Load 1 work order per assembly |
Manufacturing | M_Inventory | (1100 * LIr)¹ | 1-to-1 with M_Parts and C_Parts |
Manufacturing | M_Largeorder | 0 | Insertion rate: 0.07 * Ir rows/sec |
Supplier | S_Site | 1 | |
Supplier | S_Supplier | 10 | |
Supplier | S_Component | (1000 * LIr)¹ | Avg. of 10 components per assembly |
Supplier | S_Supp_Component | (10000 * LIr)¹ | Relationship between supplier and component |
Supplier | S_Purchase_Order | (20 * LIr)¹ | 2% of components |
Supplier | S_Purchase_OrderLine | (100 * LIr)¹ | Avg. of 5 per purchase order |

¹ These sizes may vary depending on actual random numbers generated.
To satisfy the requirements of a wide variety of customers, the SPECjAppServer2004 benchmark can be run in Standard or Distributed mode. The SUT consists of one or more nodes; the number of nodes is freely chosen by the implementer. The databases and J2EE Servers can be mapped to nodes as required. The implementation must not, however, take special advantage of the co-location of databases and J2EE Servers, other than the inherent elimination of WAN/LAN traffic.
In the Standard version of the workload, all four domains may be combined. This means that the benchmark implementer can choose to run a single Deployment Unit that accesses a single database containing the tables of all the domains. However, a benchmark implementer is free to separate the domains into their own Deployment Units, with one or more database instances.
The Distributed version of the workload is intended to model application performance where the world-wide enterprise that SPECjAppServer2004 models performs Business Transactions across business domains employing heterogeneous resource managers. In this model, the workload requires a separate Deployment Unit and a separate DBMS instance for each domain. XA-compliant recoverable 2-phase commits (see The Open Group XA Specification: http://www.opengroup.org/public/pubs/catalog/c193.htm) are required in Business Transactions that span multiple domains. The configuration for this 2-phase commit is required to be done in a way that would support heterogeneous systems. Even though implementations are likely to use the same Resource Manager for all the domains, the J2EE Servers and Resource Managers cannot take advantage of the knowledge of homogeneous Resource Managers to optimize the 2-phase commits.
Comment: A submitter meeting the requirements of the distributed version may choose to publish on both categories.
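For illustration only, the hedged sketch below shows the shape of a unit of work that spans two domains in the Distributed category. The JNDI names, column names, and SQL are hypothetical, and a real implementation would normally rely on container-managed transactions rather than explicit UserTransaction calls; the point is simply that the single commit drives an XA two-phase commit across both Resource Managers.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import javax.naming.InitialContext;
import javax.sql.DataSource;
import javax.transaction.UserTransaction;

// Hypothetical sketch of one unit of work touching two domain databases.
// In the Distributed category each domain has its own XA-capable DataSource,
// so the single commit below is completed with a two-phase commit.
public class CrossDomainSketch {

    public void updateOrderAndScheduleWork(int orderId) throws Exception {
        InitialContext ctx = new InitialContext();
        UserTransaction utx = (UserTransaction) ctx.lookup("java:comp/UserTransaction");
        DataSource dealerDb = (DataSource) ctx.lookup("java:comp/env/jdbc/DealerDB"); // hypothetical JNDI name
        DataSource mfgDb = (DataSource) ctx.lookup("java:comp/env/jdbc/MfgDB");       // hypothetical JNDI name

        utx.begin();
        try {
            try (Connection dealer = dealerDb.getConnection();
                 Connection mfg = mfgDb.getConnection()) {
                try (PreparedStatement ps = dealer.prepareStatement(
                        "UPDATE O_ORDERS SET O_STATUS = ? WHERE O_ID = ?")) { // illustrative SQL
                    ps.setInt(1, 1);
                    ps.setInt(2, orderId);
                    ps.executeUpdate();
                }
                try (PreparedStatement ps = mfg.prepareStatement(
                        "INSERT INTO M_WORKORDER (WO_O_ID) VALUES (?)")) {    // illustrative SQL
                    ps.setInt(1, orderId);
                    ps.executeUpdate();
                }
            }
            utx.commit(); // transaction manager runs prepare/commit on both Resource Managers
        } catch (Exception e) {
            utx.rollback();
            throw e;
        }
    }
}
```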
To stress the ability of the J2EE Server to handle concurrent sessions, the benchmark implements a fixed number of jASclients equal to 10 * Ir where Ir is the chosen Injection Rate. The number does not change over the course of a benchmark run.
The Manufacturing Application scales in a similar manner to the Dealer Application. Since the goal is just-in-time manufacturing, as the number of orders increases, the rate at which widgets are manufactured must increase correspondingly. This is achieved by increasing the number of Planned Lines p proportionally to Ir as
p = 3 * Ir
Since the arrival of large orders automatically determines the LargeOrder Lines, nothing special needs to be done about these.
All tables must have the properly scaled number of rows as defined by the database population requirements.
Additional database objects or DDL modifications made to the reference schema scripts in the schema/sql directory in the SPECjAppServer2004 Kit must be disclosed along with the specific reason for the modifications. The base tables and indexes in the reference scripts cannot be replaced or deleted. Views are not allowed. The data types of fields can be modified provided they are semantically equivalent to the standard types specified in the scripts.
Comment: Replacing CHAR with VARCHAR would be considered semantically equivalent. Changing the size of a field (for example: increasing the size of a char field from 8 to 10) would not be considered semantically equivalent. Replacing CHAR with INTEGER (for example: zip code) would not be considered semantically equivalent.
Modifications that a customer may make for compatibility with a particular database server are allowed. Changes may also be necessary to allow the benchmark to run without the database becoming a bottleneck, subject to approval by SPEC. Examples of such changes include:
In any committed state the primary key values must be unique within each table. For example, in the case of a horizontally partitioned table, primary key values of rows across all partitions must be unique.
The databases must be populated using the supplied load programs prior to the start of each benchmark run. That is, after running the benchmark, the databases must be reloaded prior to a subsequent run. Modifications to the load programs are permitted for porting purposes. All such modifications must be disclosed in the Submission File. The database must be accessible at all times for external updates (updates via SQL external to the Application Server).
The submitter must run the SPECjAppServer2004 Application provided in the kit. The reference deployment descriptors supplied with the SPECjAppServer2004 kit must be used without modification as input to the deployment process on the target Application Server.
The deployment must assume that the database could be modified by external applications.
The Dealer Domain is exercised using three Business Transaction types: Purchase, Manage, and Browse.
Business Transactions are selected by the Driver based on the mix shown in Table 2. The actual mix achieved in the benchmark must be within 5% of the targeted mix for each type of Business Transaction. For example, the Browse transactions can vary between 47.5% and 52.5% of the total mix. The Driver checks and reports on whether the mix requirement was met.
Business Transaction Type | Percent Mix |
---|---|
Purchase | 25% |
Manage | 25% |
Browse | 50% |
The Driver measures and records the Response Time of the different types of Business Transactions. Only successfully completed Business Transactions in the Measurement Interval are included. At least 90% of the Business Transactions of each type must have a Response Time of less than the constraint specified in Table 3 below. The average Response Time of each Business Transaction type must not be greater than 0.1 seconds more than the 90% Response Time. This requirement ensures that all users will see reasonable response times. For example, if the 90% Response Time of purchase transactions is 1 second, then the average cannot be greater than 1.1 seconds. The Driver checks and reports on whether the response time requirements were met.
Business Transaction Type | 90% RT (in seconds) |
---|---|
Purchase | 2 |
Manage | 2 |
Browse | 2 |
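As a sketch of the rule only (the Driver in the Kit performs the authoritative checks), the response-time criteria above reduce to a check of the following form; the class and method names are illustrative.

```java
import java.util.Arrays;

// Illustrative restatement of the response-time criteria described above.
public final class ResponseTimeCheck {

    // responseTimes: seconds, for one Business Transaction type during the Measurement Interval.
    // limit: the 90% Response Time constraint from Table 3 (2 seconds for all types).
    static boolean meetsRequirement(double[] responseTimes, double limit) {
        if (responseTimes.length == 0) {
            return false;
        }
        double[] sorted = responseTimes.clone();
        Arrays.sort(sorted);

        // 90% Response Time: the value at or below which 90% of the transactions fall.
        int index90 = (int) Math.ceil(0.9 * sorted.length) - 1;
        double rt90 = sorted[index90];

        double average = Arrays.stream(sorted).average().orElse(0.0);

        // At least 90% must complete in less than the constraint, and the average
        // must not exceed the 90% Response Time by more than 0.1 seconds.
        return rt90 < limit && average <= rt90 + 0.1;
    }
}
```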
For each Business Transaction, the Driver selects cycle times from a negative exponential distribution, computed from the following equation:
Tc = -ln(x) * 10
where:
Tc = Cycle Time
ln = natural log (base e)
x = random number with at least 31 bits of precision,
from a uniform distribution such that (0 < x <= 1)
The distribution is truncated at 5 times the mean. For each Business Transaction, the Driver measures the Response Time Tr and computes the Delay Time Td as Td = Tc - Tr. If Td > 0, the Driver will sleep for this time before beginning the next Business Transaction. If the chosen cycle time Tc is smaller than Tr, then the actual cycle time (Ta) is larger than the chosen one.
The average actual cycle time is allowed to deviate from the targeted one by no more than 5%. The Driver checks and reports on whether the cycle time requirements were met.
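A minimal sketch of this cycle-time selection, under the definitions above (class and method names are illustrative and not taken from the Driver in the Kit):

```java
import java.util.Random;

// Illustrative sketch of the Driver's cycle-time selection described above;
// not the actual kit code. All times are in seconds.
public final class CycleTimeSketch {

    private static final double MEAN_CYCLE_TIME = 10.0; // seconds
    private final Random random = new Random();

    // Tc = -ln(x) * 10, with x uniform in (0, 1], truncated at 5 times the mean.
    double chooseCycleTime() {
        double x = 1.0 - random.nextDouble();      // nextDouble() is in [0, 1), so x is in (0, 1]
        double tc = -Math.log(x) * MEAN_CYCLE_TIME;
        return Math.min(tc, 5 * MEAN_CYCLE_TIME);  // truncate the distribution at 5x the mean
    }

    // Td = Tc - Tr; the Driver sleeps only when the Business Transaction finished early.
    double delayTime(double tc, double responseTime) {
        return Math.max(0.0, tc - responseTime);
    }
}
```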
The table below shows the range of values allowed for various quantities in the application. The Driver will check and report on whether these requirements were met.
Quantity | Targeted Value | Min. Allowed | Max. Allowed |
---|---|---|---|
Average Vehicles per Order | 26.6 | 25.27 | 27.93 |
Vehicle Purchasing Rate (/sec) | 6.65 * Ir | 6.32 * Ir | 6.98 * Ir |
Percent Purchases that are Large Orders | 10 | 9.5 | 10.5 |
Large Order Vehicle Purchasing Rate (/sec) | 3.5 * Ir | 3.33 * Ir | 3.68 * Ir |
Average # of Vehicles per Large Order | 140 | 133 | 147 |
Regular Order Vehicle Purchasing Rate (/sec) | 3.15 * Ir | 2.99 * Ir | 3.31 * Ir |
Average # of Vehicles per Regular Order | 14 | 13.3 | 14.7 |
The metric for the Dealer Domain is Dealer Transactions/sec, composed of the total count of all Business Transactions successfully completed during the measurement interval divided by the length of the measurement interval in seconds.
The M_Driver measures and records the time taken for a work order to complete. Only successfully completed work orders in the Measurement Interval are included. At least 90% of the work orders must have a Response Time of less than 5 seconds. The average Response Time must not be greater than 0.1 seconds more than the 90% Response Time.
The table below shows the range of values allowed for various quantities in the Manufacturing Application. The M_Driver will check and report on whether the run meets these requirements.
Quantity | Targeted Value | Min. Allowed | Max. Allowed |
---|---|---|---|
LargeOrderline Widget Rate/sec | 3.5 * Ir | 3.15 * Ir | 3.85 * Ir |
Planned Line Widget Rate/sec | 3.15 * Ir | 2.835 * Ir | 3.465 * Ir |
The metric for the Manufacturing Domain is Workorders/sec, whether produced on the Planned lines or on the LargeOrder lines.
The Driver is provided as part of the SPECjAppServer2004 kit. Submitters are required to use this Driver to run the SPECjAppServer2004 benchmark.
The Driver must reside on one or more systems that are not part of the SUT.
Comment: The intent of this requirement is that the communication between the Driver and the SUT be accomplished over the network.
The D_Driver communicates with the SUT using HTTP. The D_Driver uses
a single URL to establish a connection with the web tier. If more than
one Driver system is used all Driver systems must have the same URL for
the D_Driver.
The M_Driver communicates with the SUT using the RMI interface over a protocol supported by the J2EE Server (RMI/JRMP, RMI/IIOP, RMI/T3, etc.). The M_Driver may use one or more URLs to establish a connection with the EJB tier, but if more than one Driver system is used all Driver systems must have an identical list of URLs for the M_Driver. EJB object stubs invoked by the M_Driver on the Driver systems are limited to data marshaling functions, load-balancing and fail-over capabilities.
Comment: The purpose of the identical URL requirement is to ensure that any load balancing is done by the SUT and is transparent to the Driver systems.
As part of the run, the Driver checks many statistics and audits that the run has been properly executed. The Driver tests the statistics and audit results against the requirements specified in this document and marks each criterion as "PASS" or "FAIL" in the summary reports. A compliant run must not report failure of any criterion. Only results from compliant runs may be submitted for review and publication; non-compliant runs must not be published.
Pre-configured Driver decisions, based on specific knowledge of SPECjAppServer2004 and/or the benchmark configuration, are disallowed.
The Driver systems may not perform any processing ordinarily performed by the SUT, as defined in section 2.12. This includes, but is not limited to:
The Driver records all exceptions in error logs. The only expected errors are those related to transaction consistency, where a transaction may occasionally roll back due to conflicts. Any other errors that appear in the logs must be explained in the Submission File.
The Dealer and Manufacturing Application must be started simultaneously at the start of a benchmark run. The Measurement Interval must be preceded by a ramp-up period of at least 10 minutes at the end of which a steady state throughput level must be reached. At the end of the Measurement Interval, the steady state throughput level must be maintained for at least 5 minutes, after which the run can terminate.
The reported metric must be computed over a Measurement Interval during which the throughput level is in a steady state condition that represents the true sustainable performance of the SUT. Each Measurement Interval must be at least 60 minutes long and should be representative of a 24-hour run.
Memory usage must be in a steady state during the Measurement Interval.
At least two database checkpoints must take place during the Measurement Interval.
Comment: The intent is that any periodic fluctuations in the throughput or any cyclical activities, e.g. JVM garbage collection, database checkpoints, etc. be included as part of the Measurement Interval.
To demonstrate the reproducibility of the steady state condition during the Measurement Interval, a minimum of one additional (and non-overlapping) Measurement Interval of the same duration as the reported Measurement Interval must be measured, and its "SPECjAppServer2004 JOPS" must be within 5% of the reported "SPECjAppServer2004 JOPS".
The Atomicity, Consistency, Isolation and Durability (ACID) properties of transaction processing systems must be supported by the SUT during the running of this benchmark.
The System Under Test must guarantee that all transactions are atomic, meeting at least XA's atomicity requirements; the system will either perform all individual operations on the data, or will assure that no partially-completed operations leave any effects on the data. The tests described below are used to assist in determining whether the SUT meets the transactional atomicity requirements. Passing these tests is a necessary, but not sufficient, condition for meeting the atomicity requirements. If the SUT reports a result of "FAILED" for any of these tests, then the SUT is immediately known to have failed the requirements for atomicity. If the SUT reports results of "PASSED" for all tests, this alone does not establish that the SUT meets the requirements for atomicity. The submitter must disclose how the requirements for atomicity are achieved.
This test checks that the proper transaction atomicity is upheld in transactions associated with the benchmark. This test case drives placing an order for immediate insertion into the dealership's inventory. An exception is raised after placing the order and while adding the inventory to the dealer's inventory table. This should cause the transaction changes to be removed from the database and all other items returned to how they existed before the transaction was attempted. This test case consists of the following three steps:
This test transaction simply tests that the application server is
working properly by inserting an order as in Atomicity test 1 but
without causing the exception, and verifying that
it shows up in the database.
This test checks that the proper transaction atomicity is upheld in transactions associated with the messaging subsystem. This test case drives placing an order which contains a large order and an item to be inserted immediately into the dealership's inventory. An exception is raised after placing the order while adding the inventory to the dealer's inventory table. This should cause the transaction changes to be removed from the database, messages to be removed from the queue, and all other items returned to how they existed before the transaction was attempted. This test case has three steps as follows:
This section describes the consistency and isolation requirements for Transactional Resources (currently this consists of but is not limited to Database and Messaging). Submitters can choose to implement the requirements in this section by any mechanism supported by the SUT. The isolation levels READ_COMMITTED and REPEATABLE_READ as used in this benchmark are defined in Appendix A.
For any committed transaction in which a JMS PERSISTENT message is
produced (sent or published), the message must eventually be delivered
once and only once. If a JMS PERSISTENT message is produced within a
transaction which is subsequently rolled back, the message must not be
delivered.
A message is considered to have been "delivered" if and only if it is consumed by a Message Driven Bean using a committed container managed transaction.
Logical isolation levels are defined on a per-entity basis. The logical isolation levels do not imply the use of the corresponding database isolation level. For example, it is possible to use the READ_COMMITTED database isolation level and optimistic techniques such as verified finders, reads, updates and deletes, or pessimistic locking using SELECT FOR UPDATE type semantics to implement these logical isolation levels.
In all cases where a logical isolation level is specified, this is the minimum required. Use of a higher logical isolation level is permitted.
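The following hedged sketch illustrates the two techniques named above in plain JDBC; the column names are illustrative only, and a J2EE Server would implement the equivalent logic inside its container-managed persistence machinery rather than in application code.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hedged illustration of optimistic (verified update) and pessimistic
// (SELECT FOR UPDATE) techniques for meeting a logical isolation level.
public final class IsolationTechniques {

    // Optimistic: re-verify the previously read value in the UPDATE's WHERE clause.
    // Zero rows updated means another transaction changed the row, so this one must roll back.
    static boolean verifiedUpdate(Connection con, int partId, int oldQty, int newQty)
            throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(
                "UPDATE M_INVENTORY SET IN_QTY = ? WHERE IN_P_ID = ? AND IN_QTY = ?")) { // illustrative columns
            ps.setInt(1, newQty);
            ps.setInt(2, partId);
            ps.setInt(3, oldQty);
            return ps.executeUpdate() == 1;
        }
    }

    // Pessimistic: lock the row when it is first read, so the read is repeatable
    // for the remainder of the transaction even at READ_COMMITTED.
    static int lockingRead(Connection con, int partId) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(
                "SELECT IN_QTY FROM M_INVENTORY WHERE IN_P_ID = ? FOR UPDATE")) { // illustrative columns
            ps.setInt(1, partId);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getInt(1) : -1;
            }
        }
    }
}
```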
The following entities are infrequently updated with no concurrent updates and can be configured to run with a logical isolation level of READ_COMMITTED:
All other entities must run with a logical isolation level of REPEATABLE_READ.
Because ItemEnt is assumed to be infrequently updated by an external application, ItemEnt query results and/or row state may be cached, provided that stale information is refreshed with a time-out interval of no more than 20 minutes using a logical isolation level of READ_COMMITTED. In other words, no transaction may commit if it has used cached ItemEnt query results and/or row state that was obtained from the database more than 20 minutes previously. The effects of any item insertion, item deletion, or update of any item's details are thereby ensured to be visible to all transactions that commit 20 minutes later, or thereafter.

In order to preserve referential integrity between OrderEnt and OrderLineEnt, all access to order lines within a given Business Transaction is preceded by access to the corresponding order. The same restriction applies to CustomerEnt and CustomerInventory; all access to the customer inventory within a given Business Transaction is preceded by access to the corresponding customer.
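A minimal sketch of the 20-minute staleness rule described above for cached ItemEnt state, assuming a simple time-stamped cache (types and method names are placeholders, not part of the Kit):

```java
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch of the 20-minute staleness rule for cached ItemEnt state.
public final class ItemCacheSketch {

    private static final long MAX_AGE_MS = 20L * 60L * 1000L; // 20 minutes

    private static final class Entry {
        final Object itemState;
        final long loadedAtMillis;
        Entry(Object itemState, long loadedAtMillis) {
            this.itemState = itemState;
            this.loadedAtMillis = loadedAtMillis;
        }
    }

    private final ConcurrentHashMap<String, Entry> cache = new ConcurrentHashMap<>();

    // Returns cached state only if it was read from the database within the last 20 minutes;
    // otherwise the caller must reload it before the transaction may use (and commit with) it.
    Object getIfFresh(String itemId, long nowMillis) {
        Entry e = cache.get(itemId);
        if (e == null || nowMillis - e.loadedAtMillis > MAX_AGE_MS) {
            return null;
        }
        return e.itemState;
    }

    void put(String itemId, Object itemState, long nowMillis) {
        cache.put(itemId, new Entry(itemState, nowMillis));
    }
}
```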
The method used to achieve the requirements in this section must be disclosed. Database Transaction rollbacks caused by conflicts when using concurrency control techniques are permitted.
Comment: Implementations using optimistic concurrency (e.g. read from cache, with DBMS-verified reads) are clearly permitted by the above definitions, as long as the verification mechanism does not permit any phenomena that are disallowed by the selected isolation level.
The cache validity test is run by the benchmark driver to ensure
that cache invalidation rules for ItemEnt are followed. The two parts of
the test are conducted during the ramp-up and ramp-down period of a
benchmark run as follows:
All transactions must take the database from one consistent state to another. All transactions must have an isolation level of READ_COMMITTED or higher; i.e., dirty reads are not allowed.
If an entity is deployed with a logical isolation of READ_COMMITTED, and if that entity is not changed in a given Database Transaction, then the J2EE Server must not issue a database update that would have the effect of losing any external updates that are applied while the Database Transaction is executing. If the J2EE Server does not have the ability to suppress unnecessary updates that could interfere with external updates, then all entities must be deployed using the REPEATABLE_READ isolation level (or higher).
On an entity with an isolation level of REPEATABLE_READ, optimizations to avoid database updates to an entity that has not been changed in a given transaction are not valid if the suppression of updates result in an effective isolation level lower than REPEATABLE_READ. Additionally, if the J2EE Server pre-loads entity state while executing finder methods (to avoid re-selecting the data at ejbLoad time), the mechanism normally used to ensure REPEATABLE_READ must still be effective, unless another mechanism is provided to ensure REPEATABLE_READ in this case. For example, if SELECT FOR UPDATE would normally be used at ejbLoad time, then SELECT FOR UPDATE should be used when executing those finder methods which pre-load entity state.
Transactions must be durable from any single point of failure on the SUT.
Comment: Durability from a single point of failure can be achieved by ensuring that the logs can withstand a single point of failure. This is typically implemented by mirroring the logs onto a separate set of disks.
The Supplier Emulator is provided as part of the SPECjAppServer2004 Kit and can be deployed on any Web Server that supports Servlets 2.1 or higher.
The Supplier Emulator must reside on a system that is not part of the SUT. The Supplier Emulator may reside on one of the Driver systems.
Comment: The intent of this section is that the communication between the Supplier Emulator and the SUT be accomplished over the network.
The SUT comprises all components which are being tested. This includes network connections, Web Servers, J2EE Application Servers, Database Servers, etc. The Web Server must support HTTP v1.1.
The SUT consists of:
Comment 1: Any components which are required to form the physical TCP/IP connections (commonly known as the NIC, Network Interface Card) from the host system(s) to the client machines are considered part of the SUT.
Comment 2: A basic configuration consisting of one or more switches between the Driver and the SUT is not considered part of the SUT. However, if any software/hardware is used to influence the flow of traffic beyond basic IP routing and switching, it is considered part of the SUT. For example, if DNS Round Robin is used to implement load balancing, the DNS server is considered part of the SUT and therefore it must not run on a driver client.
The SUT services HTTP requests and remote method calls from the Driver and returns results generated by the SPECjAppServer2004 Application which may involve information retrieval from a RDBMS. The database must be accessible via JDBC.
The SUT must have sufficient on-line disk storage to support any
expanding system files and the durable database population resulting
from executing the SPECjAppServer2004 Business Transaction mix for 24
(twenty four) hours at the reported "SPECjAppServer2004 JOPS".
To ensure that SPECjAppServer2004 results are correctly obtained and requirements are met, the Driver will make explicit audit checks by calling certain components on the SUT both at the beginning and at the end of the run. The Driver will include the audit results with the run results. The table below lists the individual auditing activities the Driver performs, the pass/fail criteria, and the specific purpose of each audit:
Description | Purpose | Criteria |
---|---|---|
Check initial database cardinalities | Ensures proper database loading | Database is loaded according to section 2.3.1, Database Scaling Rules |
Check LargeOrder categories and agents | Ensures each LargeOrder category is processed by a LargeOrder Agent | numLargeOrderAgent == numLargeOrderCategory |
Check new order transaction count against database | Ensures that all successful new orders have been persisted to the database | newOrderTxCount <= newOrderDBCount |
Check work order transaction count against database | Ensures that all successful work orders have been persisted to the database | workorderTxCount <= workOrderDBCount |
Check purchase order count in database against the emulator | Ensures the purchase orders have been sent to the supplier emulator | poDBCount within 5% of emulatorPOCount |
Check PO line and delivery count | Ensures most PO lines created are actually delivered, issues warning in case of discrepancy | deliveryCount within 10% of poLineDBCount |
Check component replenishment | Ensures depleted components are timely replenished, limits the number of non-replenished components and issues warning for excessive depletion | depletedComponentCount <= Ir * 36 |
Check LargeOrder Processing | Ensures LargeOrders are processed without excessive backlog, issues warning if backlog is beyond criteria | pendingLargeOrderCount <= Ir * 25 |
Perform atomicity tests | Ensures atomicity requirements as documented in section 2.10.1.1 are fulfilled | See section 2.10.1.1 |
Perform cache validation test | Ensures cached ItemEnt entity beans are refreshed at least once during the run | See section 2.10.2.2.1 |
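For illustration, the relative-tolerance criteria in the table above reduce to checks of the following form. The actual checks are built into the Driver; the helper names here are hypothetical.

```java
// Illustrative helpers for the audit criteria above (for example,
// "poDBCount within 5% of emulatorPOCount" and "depletedComponentCount <= Ir * 36").
public final class AuditChecks {

    // true if actual is within the given relative tolerance (e.g. 0.05 for 5%) of expected
    static boolean withinPercent(long actual, long expected, double tolerance) {
        if (expected == 0) {
            return actual == 0;
        }
        return Math.abs(actual - expected) <= tolerance * expected;
    }

    // Example: the component replenishment criterion from the table above
    static boolean componentReplenishmentOk(long depletedComponentCount, int ir) {
        return depletedComponentCount <= 36L * ir;
    }
}
```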
The primary metric of the SPECjAppServer2004 benchmark is jAppServer Operations Per Second ("SPECjAppServer2004 JOPS").
The primary metric for the SPECjAppServer 2004 benchmark is calculated by adding the metrics of the Dealership Management Application in the Dealer Domain and the Manufacturing Application in the Manufacturing Domain as:
SPECjAppServer2004 JOPS = Dealer Transactions/sec + Workorders/sec
All reported "SPECjAppServer2004 JOPS" must be measured, rather than estimated, and expressed to exactly two decimal places, rounded up to the hundredth place.
The performance metric must be reported with the category of the SUT that was used to generate the result (@Standard or @Distributed). For example, if a measurement yielded 123.45 "SPECjAppServer2004 JOPS" on a Standard system, this must be reported as 123.45 SPECjAppServer2004 JOPS@Standard.
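As an illustrative sketch only (not the reporter's actual code), the metric and its reported form can be computed as follows; the rounding mode shown is an assumption, and the counts and interval length would come from the Driver's summary reports.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Sketch of the metric computation and reporting format described above.
public final class JopsMetric {

    static String report(long dealerBusinessTransactions, long workOrders,
                         long measurementIntervalSeconds, String category) {
        double dealerTxPerSec = (double) dealerBusinessTransactions / measurementIntervalSeconds;
        double workOrdersPerSec = (double) workOrders / measurementIntervalSeconds;
        double jops = dealerTxPerSec + workOrdersPerSec;

        // Expressed to exactly two decimal places, e.g. "123.45 SPECjAppServer2004 JOPS@Standard"
        // (the rounding mode here is illustrative).
        BigDecimal rounded = BigDecimal.valueOf(jops).setScale(2, RoundingMode.HALF_UP);
        return rounded.toPlainString() + " SPECjAppServer2004 JOPS@" + category;
    }
}
```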
A graph of the workorder throughput versus elapsed time (i.e., wall clock time) must be reported for the Manufacturing Application for the entire Test Run. The x-axis represents the elapsed time from the start of the run. The y-axis represents the throughput in Business Transactions. At least 60 different intervals must be used, with a maximum interval size of 30 seconds. The opening and the closing of the Measurement Interval must also be reported. An example of such a graph is shown below.
FIGURE 1: Sample Throughput Graph
Benchmark specific optimization is not allowed. Any optimization of either the configuration or products used on the SUT must improve performance for a larger class of workloads than that defined by this benchmark and must be supported and recommended by the provider. Optimizations must have the vendor's endorsement and should be suitable for production use in an environment comparable to the one represented by the benchmark. Optimizations that take advantage of the benchmark's specific features are forbidden.
An example of an inappropriate optimization is one that requires access to the source code of the application.
Comment: The intent of this section is to encourage optimizations to be done automatically by the products.
All hardware and software used must be orderable by customers. For any product not already generally released, the Submission File must include a committed general delivery date. That date must not exceed 3 months beyond the result's publication date. However, if Java and/or J2EE related licensing issues cause a change in software availability date after publication date, the change will be allowed to be made without penalty, subject to subcommittee review.
All products used must be the proposed final versions and not prototypes. When the product is finally released, the product performance must not decrease by more than 5% of the published "SPECjAppServer2004 JOPS". If the submitter later finds the performance of the released system to be 5% lower than that reported for the pre-release system, then the submitter is required to report a corrected test result.
Comment 1: The intent is to test products that customers will use, not prototypes. Beta versions of products can be used, provided that General Availability (GA) of the final product is within 3 months. If a beta version is used, the date reported in the results must be the GA date.
Comment 2: The 5% degradation limit only applies to a difference in performance between the tested product and the GA product. Subsequent GA releases (to fix bugs, etc.) are not subject to this restriction.
In order to publicly disclose SPECjAppServer2004 results, the submitter must adhere to these reporting rules in addition to having followed the run rules described in this document. The goal of the reporting rules is to ensure the system under test is sufficiently documented such that someone could reproduce the test and its results.
Compliant runs need to be submitted to SPEC for review and must be accepted prior to public disclosure. Submissions must include the Submission File, a Configuration Diagram, and the Full Disclosure Archive for the run (see section 5.3). See section 5.3 of the SPECjAppServer2004 User Guide for details on submitting results to SPEC.
Test results that have not been accepted and published by SPEC must not be publicly disclosed except as noted in Section 3.7, Research and Academic Usage. Research and academic usage test results that have not been accepted and published by SPEC must not use the SPECjAppServer metrics ("SPECjAppServer2004 JOPS").
SPECjAppServer2004 results must always be quoted using the performance metric and the category in which the results were generated.
Estimates are not allowed.
SPECjAppServer2004 results must not be publicly compared to results from any other benchmark. This would be a violation of the SPECjAppServer2004 Run and Reporting Rules and, in the case of the TPC benchmarks, a serious violation of the TPC "fair use policy."
Results between different categories (see the Standard Vs. Distributed section) within SPECjAppServer2004 may not be compared; any attempt to do so will be considered a violation of SPEC Fair Use Rules.
SPEC's fair use rules are available at the SPEC web site: http://www.spec.org/fairuse.html.
Performance comparisons may be based only upon the SPEC defined metric (SPECjAppServer2004 JOPS@Category). Other information from the result page may be used to differentiate systems, i.e., used to define a basis for comparing a subset of systems based on some attribute such as number of CPUs or memory size.
When competitive comparisons are made using SPECjAppServer2004 benchmark results, SPEC expects that the following template be used:
SPECjAppServer is a trademark of the Standard Performance Evaluation Corp. (SPEC). Competitive numbers shown reflect results published on www.spec.org as of (date). [The comparison presented is based on (basis for comparison).] For the latest SPECjAppServer2004 results visit http://www.spec.org/osg/jAppServer2004. (Note: [...] above is required only if selective comparisons are used.)
Example:
SPECjAppServer is a trademark of the Standard Performance Evaluation Corp. (SPEC). Competitive numbers shown reflect results published on www.spec.org as of August 12, 2003. The comparison presented is based on best performing 4-CPU servers currently shipping by Vendor 1, Vendor 2 and Vendor 3. For the latest SPECjAppServer2004 results visit http://www.spec.org/osg/jAppServer2004.
The rationale for the template is to provide fair comparisons, by ensuring that:
SPEC encourages use of the SPECjAppServer2004 benchmark in academic and research environments. The researcher is responsible for compliance with the terms of any underlying licenses (Application Server, DB Server, hardware, etc.).
It is understood that experiments in such environments may be conducted in a less formal fashion than that demanded of licensees submitting to the SPEC web site. SPEC encourages researchers to obey as many of the run rules as practical, even for informal research. A special research workload, EAStress2004, may be used in such informal environments. If research results are being published, SPEC requires:
SPEC reserves the right to ask for a full disclosure of any published results.
Public use of SPECjAppServer2004 benchmark results is bound by the SPEC OSSC Fair Use Guidelines and the SPECjAppServer2004 specific Run and Reporting Rules (this document). All publications must clearly state that these results have not been reviewed or accepted by SPEC, using text equivalent to this:
SPECjAppServer is a trademark of the Standard Performance Evaluation Corp. (SPEC). The SPECjAppServer2004 results or findings in this publication have not been reviewed or accepted by SPEC, therefore no comparison nor performance inference can be made against any published SPEC result. The official web site for SPECjAppServer2004 is located at http://www.spec.org/osg/jAppServer2004.
This disclosure must precede any results quoted from the tests. It must be displayed in the same font as the results being quoted.
The intent of the BOM rules is to enable a reviewer to confirm that the tested configuration satisfies the run rule requirements and to document the components used with sufficient detail to enable a customer to reproduce the tested configuration and obtain pricing information from the supplying vendors for each component of the SUT.
The suppliers for all components must be disclosed. All items supplied by a third party (i.e. not the Test Submitter) must be explicitly stated. Each third party supplier's items must be listed separately.
The Bill of Materials must reflect the level of detail a customer would see on an itemized bill (that is, it should list individual items in the SUT that are not part of a standard package).
For each item, the BOM should include the item's supplier,
description, the item's ID (the code used by the vendor when ordering
the item), and the quantity of that item in the SUT.
For ease of benchmarking, the BOM may include hardware components
that are different from the tested system, as long as the substituted
components perform equivalently or better in the benchmark. Any
substitutions must be disclosed in the BOM. For example, disk drives
with lower capacity or speed in the tested system can be replaced by
faster ones in the BOM. However, it is not permissible to replace key
components such as CPU, memory or any software.
All components of the SUT (see section 2.12.1) must be included, including all hardware, software, and support for a three year period.
All hardware components included must be new and not reconditioned or previously owned. The software may use term limited licenses (i.e., software leasing), provided there are no additional conditions associated with the term limited licensing. If term limited licensing is used, the licensing must be for a minimum of three years. The three year support period must cover both hardware maintenance and software support.
The number of users for SPECjAppServer2004 is 13 * Ir (where 10 * Ir are Internet users and 3 * Ir are Intranet users). Any usage based licensing for the above number of users should be based on the licensing policy of the company supplying the licensed component.
Additional components such as operator consoles and backup devices must also be included, if explicitly required for the installation, operation, administration, or maintenance, of the SUT.
If software needs to be loaded from a particular device either during installation or for updates, the device must be included.
Hardware maintenance and software support must include 7 days/week, 24 hours/day coverage, either on-site, or if available as standard offering, via a central support facility.
If a central support facility is utilized, then all hardware and software required to connect to the central support must be installed and functional on the SUT during the measurement run and included.
The response time for hardware maintenance requests must not exceed
4 hours on any component whose replacement is necessary for the SUT to
return to the tested configuration.
Software support requests must include problem acknowledgment within 4 hours. No additional charges will be incurred for the resolution of software defects. Problem resolution for more than 10 non-defect problems per year is permitted to incur additional charges. Software support must include all available maintenance updates over the support period.
A Full Disclosure is required in order for results to be considered compliant with the SPECjAppServer2004 benchmark specification.
Comment 1: The intent of this disclosure is to be able to replicate the results of a submission of this benchmark given the equivalent hardware, software, and documentation.
Comment 2: In the sections below, when there is no specific reference to where the disclosure must occur, it must occur in the Submission File. Disclosures in the Archive are explicitly called out.
The term Full Disclosure refers to the information that must be provided when a benchmark result is reported.
The term Configuration Diagram refers to the picture in a common graphics format that depicts the configuration of the SUT. The Configuration Diagram is part of a Full Disclosure.
The term Full Disclosure Archive (FDA) refers to the soft-copy archive of files that is part of a Full Disclosure.
The term Submission File refers to the ASCII file that contains the information specified in this section, to which the "result.props" file from the run must be appended. The Submission File is part of a Full Disclosure.
The term Benchmark Results Page refers to the report in HTML or ASCII format that is generated from the Submission File. The Benchmark Results Page is the format used when displaying results on the SPEC web site. The Benchmark Results Page in HTML format will provide a link to the Configuration Diagram and the Full Disclosure Archive.
A Configuration Diagram of the entire configuration (including the SUT, Supplier Emulator, and load drivers) must be provided in PNG, JPEG or GIF format. The diagram should include, but is not limited to:
The Full Disclosure Archive (FDA) must be in ZIP, TAR or JAR format.
All scripts/programs used to create any logical volumes for the database devices must be included as part of the FDA. The distribution of tables and logs across all media must be explicitly depicted.
All table definition statements and all other statements used to set-up the database must be included as part of the FDA. The scripts used to create the database should be included in the "Schema" sub-directory.
If the load programs in the SPECjAppServer2004 kit were modified (see the Database Requirements section), all such modifications must be disclosed in the benchmark.load_program_modifications section of the Submission File and the modified programs must be included in the FDA.
All deployment descriptors used must be included in the FDA under the "Deploy" sub-directory. The deployed EAR file must also be in the "Deploy" directory.
Any vendor-specific tools, flags or properties used to perform ejbStore optimizations that are not transparent to the user must be disclosed in the system.sw.J2EE_Server.tuning section of the Submission File.
All steps used to build and deploy the SPECjAppServer2004 EJBs must be disclosed in a file called "deployCmds.txt" within the "Deploy" sub-directory of the FDA.
The input parameters to the Driver must be disclosed by including the following files used to run the benchmark in the FDA:
If the Launcher package was modified, its source must be included in the FDA.
The entire output directory from the run must be included in the FDA under the "FinalRun" sub-directory. Use the result file "result.props" from this run for the Submission File.
The entire output directory from the reproducibility run (see Reproducibility section) must be included in the FDA under the "RepeatRun" sub-directory.
A graph, in PNG, JPEG or GIF format, of the workorder throughput versus elapsed time (see Required Reporting section) must be included in the FDA under the "FinalRun" sub-directory.
The Audit.report file generated by the Driver for the final run and the reproducibility run must be included in the FDA under the "FinalRun" and the "RepeatRun" sub-directories.
Moreover, the modules that this instance configuration deploys and
runs must be disclosed in the system.sw.J2EE[#].web.* and system.sw.J2EE[#].EJB.*
sections by marking the modules as true or false.
If the xerces.jar package in the jars sub-directory of the SPECjAppServer2004 Kit was not used in the SUT, the reason for this should be disclosed in the system.sw.J2EE[#].detail section of the Submission File. Likewise, if the provided xerces.jar was not used in the Emulator, the reason must be disclosed in the system.sw.Emulator[#].detail section. The version and source of the actual packages used must also be disclosed.
The Dealer Injection Rate used to load the database(s) must be disclosed in the benchmark.load.injection_rate section of the Submission File.
If the schema was changed from the reference one provided in the Kit
(see the Database Requirements section),
the reason for the modifications must be disclosed in the benchmark.schema_modifications
section of the Submission File.
If the load program was changed from the reference one provided in the Kit, the reason for the modifications must be disclosed in the benchmark.load_program_modifications section of the Submission File.
The method used to meet the isolation requirements in section 2.10.4 must be disclosed in the benchmark.isolation_requirement_info section of the Submission File.
The method used to meet the durability requirements in section 2.10.5 must be disclosed in
the benchmark.durability_requirement_info section of the
Submission File.
Any errors that appear in the Driver error logs must be explained in
the notes section of the Submission File.
The number and types of systems used must be disclosed in the system.hw[#] section of the Submission File. The following information is required for each system configuration:
The method used to meet the storage requirements of section 2.12.3 must be disclosed in the benchmark.storage_requirement_info section of the Submission File.
If any software/hardware is used to influence the flow of network traffic beyond basic IP routing and switching, the additional software/hardware and settings (see section 2.12) must be disclosed in the benchmark.other section of the Submission File.
The bandwidth of the network used in the tested configuration must be disclosed in the benchmark.other section of the Submission File.
The protocol used by the Driver to communicate with the Manufacturing domain on the SUT (the driver communicates directly with the Manufacturing domain's EJBs, it does not use the web interface) must be disclosed in the system.sw.J2EE_Server.protocol section of the Submission File.
The hardware and software used to perform load balancing must be disclosed in the benchmark.other section of the Submission File. If the driver systems perform any load-balancing functions as defined in the Driver Rules section, the details of these functions must also be disclosed.
The version number of the SPECjAppServer2004 Kit used to run the benchmark must be included in the Submission File. The version number is written to the result.props file with the configuration and result information.
The "SPECjAppServer2004 JOPS" from the reproducibility run (see Reproducibility section) must be disclosed in the result.reproducibility_run.jops field of the Submission File.
The Bill of Materials, which contains the hardware and software used in the SUT, must be disclosed in the bom.* section of the Submission File. See section 4 for what is required in the Bill Of Materials.
The various isolation levels are described in ANSI SQL and J2SE documentation for java.sql.Connection. ANSI SQL defines isolation levels in terms of phenomena (P1, P2, P3) that are or are not possible under a specific isolation level. The interpretations of P1, P2 and P3 used in this benchmark are as follows:
- P1 (Dirty read)
- Transaction T1 modifies a row. Another transaction T2 then reads that row and obtains the modified value, before T1 has completed a COMMIT or ROLLBACK. Transaction T2 eventually commits successfully; it does not matter whether T1 commits or rolls back and whether it does so before or after T2 commits.
- P2 (Non-repeatable read)
- Transaction T1 reads a row. Another transaction T2 then modifies or deletes that row, before T1 has completed a COMMIT. Both transactions eventually commit successfully.
- P3 (Phantom read)
- Transaction T1 reads the set of rows N that satisfy some <search condition>. Transaction T2 then generates one or more rows that satisfy the <search condition> used by T1, before T1 has completed a COMMIT. Both transactions eventually commit successfully.
An isolation level of READ_COMMITTED is defined to disallow P1 but allow P2 and P3.
An isolation level of REPEATABLE_READ is defined to disallow P1 and P2 but allow P3.
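For reference, these two levels correspond to constants on java.sql.Connection, as noted at the start of this appendix. The sketch below simply shows how a connection's isolation level would be set; the method names are illustrative.

```java
import java.sql.Connection;
import java.sql.SQLException;

// Sketch mapping the two isolation levels defined above onto java.sql.Connection constants.
public final class IsolationLevelDemo {

    static void useReadCommitted(Connection con) throws SQLException {
        // Disallows P1 (dirty reads); P2 and P3 remain possible.
        con.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
    }

    static void useRepeatableRead(Connection con) throws SQLException {
        // Disallows P1 and P2; P3 (phantom reads) remains possible.
        con.setTransactionIsolation(Connection.TRANSACTION_REPEATABLE_READ);
    }
}
```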
The ANSI SQL definition for REPEATABLE_READ disallows P2. Disallowing P2 also disallows the anomaly known as Read Skew. Read Skew arises in situations such as the following:
If P2 is disallowed, transaction T2, upon trying to modify X, would either be blocked until T1 completes, or it would be allowed to proceed but transaction T1 would eventually have to be rolled back.
Disallowing P2 also disallows the anomaly known as Write Skew. Write Skew arises in situations such as the following:
If P2 is disallowed, either T1 or T2 would have to be rolled back.