EQNTOTT-Specific Optimizations

Abstract

The EQNTOTT benchmark is a compute-intensive program that spends the majority of its execution time in the function cmppt(). Some compilers use EQNTOTT-specific optimizations to achieve the best possible SPEC92 performance. Although EQNTOTT was not included in the SPEC95 benchmark suite (in part due to the problems discussed below), "benchmark-specific optimizations" remain a subject of debate.

Source Code

The examples discussed below have been codified into three test programs that exhibit the EQNTOTT-specific optimizations. The source code is available in eqntott-tests.tar.Z.

Overview

The EQNTOTT benchmark is a compute-intensive program that spends the majority of its execution time in the function cmppt(). The function cmppt() contains a loop that compares two strings of short integers. It is similar to strncmp(), except that it compares short integers instead of characters and treats the value 2 as equal to the value 0. Some compilers use EQNTOTT-specific optimizations to achieve the best possible run-time performance. The examples below describe the behavior of such compilers.
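
For orientation, the comparison described in the Overview can be sketched in C roughly as follows. This is a minimal illustration written from the prose description only, not the benchmark's actual source: the names cmppt_sketch() and fold_two(), the pointer-plus-length signature, and the NULL check are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: map the value 2 onto 0, since the comparison
 * treats the two values as equal. All other values pass through. */
static int fold_two(short v)
{
    return (v == 2) ? 0 : v;
}

/* Sketch of a cmppt()-style comparison over elements 0 .. size-1.
 * Returns -1, 0, or 1, in the manner of strncmp(). The signature is
 * an assumption made for illustration. */
int cmppt_sketch(const short *a, const short *b, size_t size)
{
    assert(a != NULL && b != NULL);

    for (size_t i = 0; i < size; i++) {
        int aa = fold_two(a[i]);
        int bb = fold_two(b[i]);

        if (aa != bb)
            return (aa < bb) ? -1 : 1;
    }
    return 0; /* every compared element matched after folding 2 to 0 */
}
```

With this sketch, the arrays {0, 1, 2} and {2, 1, 0} compare as equal, because each 2 is treated as 0 before the comparison. The return values and the loop bounds are exactly the details that the examples below vary.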

Example 1

The function cmppt() returns a value in the set {1, 0, -1} based on the comparison of two short integer strings. These return values are largely independent of the inner loop. However, some compilers generate exceptionally good code for the loop when the return values are {1, 0, -1}, but poor code when the return values are {2, 0, -1}. The source code that exhibits this is included in test1.c.

Example 2

Some compilers use pattern matching to recognize the function cmppt(). However, these compilers sometimes match other programs that are very similar to EQNTOTT but not semantically equivalent, and thus generate incorrect code. For example, the cmppt() inner loop bounds are 0 to size, but some compilers emit the EQNTOTT code even when the loop bounds are other values, such as 1 to size. The source code that exhibits this defect is included in test2.c.

Example 3

Consider the following code fragment.

    if (aa == 2) aa = 0;

To improve the performance of the EQNTOTT benchmark, some compilers move the comparison of aa and bb ahead of the comparisons with the constants, as in:

    if (aa == bb) ...

While this transformation is legal in the EQNTOTT benchmark, some compilers also apply it to similar programs where it is not correct. If the types of aa and bb are changed to short and unsigned short respectively, and if aa and bb are assigned the constant -1 instead of 0, the transformation described above is not legal. The source code that exhibits this defect is included in test3.c.

Example 4

The input data for cmppt() in EQNTOTT does not cover a wide range of possible values, and some compilers generate code for cmppt() that yields correct results for EQNTOTT but not for all possible input data. This test compares the results of two semantically equivalent routines, cmppt() and cmppt_reference(). The routine cmppt() will be recognized by EQNTOTT-specific optimizers.
The reference routine has extra code that does not alter the semantics, but is difficult to vectorize and will not be recognized by EQNTOTT-specific optimizers. In addition, the input data includes a wider range of values than the EQNTOTT benchmark, and the test reports a failure if cmppt() and cmppt_reference() do not return the same values for the same input data. The source code that exhibits this defect is included in test4.c.

Example 5

EQNTOTT-specific optimizers use 32-bit and 64-bit load instructions to load two or four short ints per memory fetch. Some architectures require 32-bit and 64-bit loads to have 32-bit and 64-bit alignment, respectively. Some compilers assume that the EQNTOTT input arrays have an alignment greater than that of short int, and the generated code will fail if the input arrays have only 16-bit alignment. This test varies the starting addresses of the two input arrays, and will fail if the compiler does not emit code to ensure correct alignment before using load-multiple-short-int instructions. The source code that exhibits this defect is included in test5.c.

© 1990-2012 Nullstone Corporation. All Rights Reserved.