________________________________________________________________________

PYBENCH - A Python Benchmark Suite
________________________________________________________________________

     Extendable suite of low-level benchmarks for measuring
          the performance of the Python implementation
             (interpreter, compiler or VM).

pybench is a collection of tests that provides a standardized way to
measure the performance of Python implementations. It takes a very
close look at different aspects of Python programs and lets you
decide which factors are more important to you than others, rather
than wrapping everything up in one number, like other performance
tests do (e.g. pystone, which is included in the Python Standard
Library).

pybench has been used in the past by several Python developers to
track down performance bottlenecks or to demonstrate the impact of
optimizations and new features in Python.

The command line interface for pybench is the file pybench.py. Run
this script with option '--help' to get a listing of the possible
options. Without options, pybench will simply execute the benchmark
and then print out a report to stdout.

Micro-Manual
------------

Run 'pybench.py -h' to see the help screen. Run 'pybench.py' to run
the benchmark suite using default settings and 'pybench.py -f <file>'
to have it store the results in a file as well.

It is usually a good idea to run pybench.py multiple times to see
whether the environment, timers and benchmark run-times are suitable
for doing benchmark tests.

You can use the comparison feature of pybench.py ('pybench.py -c
<file>') to check how well the system behaves in comparison to a
reference run.

If the differences are well below 10% for each test, then you have a
system that is good for doing benchmark testing. If you get random
differences of more than 10%, or significant differences between the
values for minimum and average time, then you likely have some
background processes running which cause the readings to become
inconsistent. Examples include: web browsers, email clients, RSS
readers, music players, backup programs, etc.
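
The 10% rule of thumb above can be sketched as a quick check. This is a
hypothetical helper, not part of pybench; the function names are made up
for illustration:

```python
# Hypothetical helper (not part of pybench): decide whether two timing
# readings for the same test are close enough for reliable benchmarking.

def relative_difference(a, b):
    """Relative difference between two timings, as a fraction."""
    return abs(a - b) / min(a, b)

def stable_enough(minimum, average, threshold=0.10):
    """Readings count as stable when minimum and average time differ
    by less than the threshold (10% by default)."""
    return relative_difference(minimum, average) < threshold

# 126ms minimum vs. 145ms average differ by about 15% -> too noisy
print(stable_enough(0.126, 0.145))  # False
# 109ms vs. 110ms differ by under 1% -> fine
print(stable_enough(0.109, 0.110))  # True
```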

If you are only interested in a few tests of the whole suite, you can
use the filtering option, e.g. 'pybench.py -t string' will only
run/show the tests that have 'string' in their name.

This is the current output of pybench.py --help:

| 57 | """
|
---|
| 58 | ------------------------------------------------------------------------
|
---|
| 59 | PYBENCH - a benchmark test suite for Python interpreters/compilers.
|
---|
| 60 | ------------------------------------------------------------------------
|
---|
| 61 |
|
---|
| 62 | Synopsis:
|
---|
| 63 | pybench.py [option] files...
|
---|
| 64 |
|
---|
| 65 | Options and default settings:
|
---|
| 66 | -n arg number of rounds (10)
|
---|
| 67 | -f arg save benchmark to file arg ()
|
---|
| 68 | -c arg compare benchmark with the one in file arg ()
|
---|
| 69 | -s arg show benchmark in file arg, then exit ()
|
---|
| 70 | -w arg set warp factor to arg (10)
|
---|
| 71 | -t arg run only tests with names matching arg ()
|
---|
| 72 | -C arg set the number of calibration runs to arg (20)
|
---|
| 73 | -d hide noise in comparisons (0)
|
---|
| 74 | -v verbose output (not recommended) (0)
|
---|
| 75 | --with-gc enable garbage collection (0)
|
---|
| 76 | --with-syscheck use default sys check interval (0)
|
---|
| 77 | --timer arg use given timer (time.time)
|
---|
| 78 | -h show this help text
|
---|
| 79 | --help show this help text
|
---|
| 80 | --debug enable debugging
|
---|
| 81 | --copyright show copyright
|
---|
| 82 | --examples show examples of usage
|
---|
| 83 |
|
---|
| 84 | Version:
|
---|
| 85 | 2.0
|
---|
| 86 |
|
---|
| 87 | The normal operation is to run the suite and display the
|
---|
| 88 | results. Use -f to save them for later reuse or comparisons.
|
---|
| 89 |
|
---|
| 90 | Available timers:
|
---|
| 91 |
|
---|
| 92 | time.time
|
---|
| 93 | time.clock
|
---|
| 94 | systimes.processtime
|
---|
| 95 |
|
---|
| 96 | Examples:
|
---|
| 97 |
|
---|
| 98 | python2.1 pybench.py -f p21.pybench
|
---|
| 99 | python2.5 pybench.py -f p25.pybench
|
---|
| 100 | python pybench.py -s p25.pybench -c p21.pybench
|
---|
| 101 | """
|
---|
| 102 |
|
---|
License
-------

See LICENSE file.


Sample output
-------------

| 112 | """
|
---|
| 113 | -------------------------------------------------------------------------------
|
---|
| 114 | PYBENCH 2.0
|
---|
| 115 | -------------------------------------------------------------------------------
|
---|
| 116 | * using Python 2.4.2
|
---|
| 117 | * disabled garbage collection
|
---|
| 118 | * system check interval set to maximum: 2147483647
|
---|
| 119 | * using timer: time.time
|
---|
| 120 |
|
---|
| 121 | Calibrating tests. Please wait...
|
---|
| 122 |
|
---|
| 123 | Running 10 round(s) of the suite at warp factor 10:
|
---|
| 124 |
|
---|
| 125 | * Round 1 done in 6.388 seconds.
|
---|
| 126 | * Round 2 done in 6.485 seconds.
|
---|
| 127 | * Round 3 done in 6.786 seconds.
|
---|
| 128 | ...
|
---|
| 129 | * Round 10 done in 6.546 seconds.
|
---|
| 130 |
|
---|
| 131 | -------------------------------------------------------------------------------
|
---|
| 132 | Benchmark: 2006-06-12 12:09:25
|
---|
| 133 | -------------------------------------------------------------------------------
|
---|
| 134 |
|
---|
| 135 | Rounds: 10
|
---|
| 136 | Warp: 10
|
---|
| 137 | Timer: time.time
|
---|
| 138 |
|
---|
| 139 | Machine Details:
|
---|
| 140 | Platform ID: Linux-2.6.8-24.19-default-x86_64-with-SuSE-9.2-x86-64
|
---|
| 141 | Processor: x86_64
|
---|
| 142 |
|
---|
| 143 | Python:
|
---|
| 144 | Executable: /usr/local/bin/python
|
---|
| 145 | Version: 2.4.2
|
---|
| 146 | Compiler: GCC 3.3.4 (pre 3.3.5 20040809)
|
---|
| 147 | Bits: 64bit
|
---|
| 148 | Build: Oct 1 2005 15:24:35 (#1)
|
---|
| 149 | Unicode: UCS2
|
---|
| 150 |
|
---|
| 151 |
|
---|
| 152 | Test minimum average operation overhead
|
---|
| 153 | -------------------------------------------------------------------------------
|
---|
| 154 | BuiltinFunctionCalls: 126ms 145ms 0.28us 0.274ms
|
---|
| 155 | BuiltinMethodLookup: 124ms 130ms 0.12us 0.316ms
|
---|
| 156 | CompareFloats: 109ms 110ms 0.09us 0.361ms
|
---|
| 157 | CompareFloatsIntegers: 100ms 104ms 0.12us 0.271ms
|
---|
| 158 | CompareIntegers: 137ms 138ms 0.08us 0.542ms
|
---|
| 159 | CompareInternedStrings: 124ms 127ms 0.08us 1.367ms
|
---|
| 160 | CompareLongs: 100ms 104ms 0.10us 0.316ms
|
---|
| 161 | CompareStrings: 111ms 115ms 0.12us 0.929ms
|
---|
| 162 | CompareUnicode: 108ms 128ms 0.17us 0.693ms
|
---|
| 163 | ConcatStrings: 142ms 155ms 0.31us 0.562ms
|
---|
| 164 | ConcatUnicode: 119ms 127ms 0.42us 0.384ms
|
---|
| 165 | CreateInstances: 123ms 128ms 1.14us 0.367ms
|
---|
| 166 | CreateNewInstances: 121ms 126ms 1.49us 0.335ms
|
---|
| 167 | CreateStringsWithConcat: 130ms 135ms 0.14us 0.916ms
|
---|
| 168 | CreateUnicodeWithConcat: 130ms 135ms 0.34us 0.361ms
|
---|
| 169 | DictCreation: 108ms 109ms 0.27us 0.361ms
|
---|
| 170 | DictWithFloatKeys: 149ms 153ms 0.17us 0.678ms
|
---|
| 171 | DictWithIntegerKeys: 124ms 126ms 0.11us 0.915ms
|
---|
| 172 | DictWithStringKeys: 114ms 117ms 0.10us 0.905ms
|
---|
| 173 | ForLoops: 110ms 111ms 4.46us 0.063ms
|
---|
| 174 | IfThenElse: 118ms 119ms 0.09us 0.685ms
|
---|
| 175 | ListSlicing: 116ms 120ms 8.59us 0.103ms
|
---|
| 176 | NestedForLoops: 125ms 137ms 0.09us 0.019ms
|
---|
| 177 | NormalClassAttribute: 124ms 136ms 0.11us 0.457ms
|
---|
| 178 | NormalInstanceAttribute: 110ms 117ms 0.10us 0.454ms
|
---|
| 179 | PythonFunctionCalls: 107ms 113ms 0.34us 0.271ms
|
---|
| 180 | PythonMethodCalls: 140ms 149ms 0.66us 0.141ms
|
---|
| 181 | Recursion: 156ms 166ms 3.32us 0.452ms
|
---|
| 182 | SecondImport: 112ms 118ms 1.18us 0.180ms
|
---|
| 183 | SecondPackageImport: 118ms 127ms 1.27us 0.180ms
|
---|
| 184 | SecondSubmoduleImport: 140ms 151ms 1.51us 0.180ms
|
---|
| 185 | SimpleComplexArithmetic: 128ms 139ms 0.16us 0.361ms
|
---|
| 186 | SimpleDictManipulation: 134ms 136ms 0.11us 0.452ms
|
---|
| 187 | SimpleFloatArithmetic: 110ms 113ms 0.09us 0.571ms
|
---|
| 188 | SimpleIntFloatArithmetic: 106ms 111ms 0.08us 0.548ms
|
---|
| 189 | SimpleIntegerArithmetic: 106ms 109ms 0.08us 0.544ms
|
---|
| 190 | SimpleListManipulation: 103ms 113ms 0.10us 0.587ms
|
---|
| 191 | SimpleLongArithmetic: 112ms 118ms 0.18us 0.271ms
|
---|
| 192 | SmallLists: 105ms 116ms 0.17us 0.366ms
|
---|
| 193 | SmallTuples: 108ms 128ms 0.24us 0.406ms
|
---|
| 194 | SpecialClassAttribute: 119ms 136ms 0.11us 0.453ms
|
---|
| 195 | SpecialInstanceAttribute: 143ms 155ms 0.13us 0.454ms
|
---|
| 196 | StringMappings: 115ms 121ms 0.48us 0.405ms
|
---|
| 197 | StringPredicates: 120ms 129ms 0.18us 2.064ms
|
---|
| 198 | StringSlicing: 111ms 127ms 0.23us 0.781ms
|
---|
| 199 | TryExcept: 125ms 126ms 0.06us 0.681ms
|
---|
| 200 | TryRaiseExcept: 133ms 137ms 2.14us 0.361ms
|
---|
| 201 | TupleSlicing: 117ms 120ms 0.46us 0.066ms
|
---|
| 202 | UnicodeMappings: 156ms 160ms 4.44us 0.429ms
|
---|
| 203 | UnicodePredicates: 117ms 121ms 0.22us 2.487ms
|
---|
| 204 | UnicodeProperties: 115ms 153ms 0.38us 2.070ms
|
---|
| 205 | UnicodeSlicing: 126ms 129ms 0.26us 0.689ms
|
---|
| 206 | -------------------------------------------------------------------------------
|
---|
| 207 | Totals: 6283ms 6673ms
|
---|
| 208 | """
|
---|
________________________________________________________________________

Writing New Tests
________________________________________________________________________

pybench tests are simple modules defining one or more pybench.Test
subclasses.

Writing a test essentially boils down to providing two methods:
.test(), which runs for .rounds rounds, executing .operations test
operations each, and .calibrate(), which does the same except that it
doesn't actually execute the operations.


Here's an example:
------------------

from pybench import Test

class IntegerCounting(Test):

    # Version number of the test as float (x.yy); this is important
    # for comparisons of benchmark runs - tests with unequal version
    # numbers will not get compared.
    version = 1.0

    # The number of abstract operations done in each round of the
    # test. An operation is the basic unit of what you want to
    # measure. The benchmark will output the amount of run-time per
    # operation. Note that in order to raise the measured timings
    # significantly above noise level, it is often necessary to repeat
    # sets of operations more than once per test round. The measured
    # overhead per test round should be less than 1 second.
    operations = 20

    # Number of rounds to execute per test run. This should be
    # adjusted to a figure that results in a test run-time of between
    # 1-2 seconds (at warp 1).
    rounds = 100000

    def test(self):

        """ Run the test.

            The test needs to run for self.rounds rounds, executing
            self.operations operations each.

        """
        # Init the test
        a = 1

        # Run test rounds
        #
        # NOTE: Use xrange() for all test loops unless you want to face
        #       a 20MB process!
        #
        for i in xrange(self.rounds):

            # Repeat the operations per round to raise the run-time
            # per operation significantly above the noise level of the
            # for-loop overhead.

            # Execute 20 operations (a += 1):
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1

    def calibrate(self):

        """ Calibrate the test.

            This method should execute everything that is needed to
            set up and run the test - except for the actual operations
            that you intend to measure. pybench uses this method to
            measure the test implementation overhead.

        """
        # Init the test
        a = 1

        # Run test rounds (without actually doing any operation)
        for i in xrange(self.rounds):

            # Skip the actual execution of the operations, since we
            # only want to measure the test's administration overhead.
            pass

Registering a new test module
-----------------------------

To register a test module with pybench, the classes need to be
imported into the pybench.Setup module. pybench will then scan all the
symbols defined in that module for subclasses of pybench.Test and
automatically add them to the benchmark suite.
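
The scanning step can be sketched roughly like this. This is a
simplified stand-in for illustration, not pybench's actual code; the
Test and IntegerCounting classes here are local dummies:

```python
import inspect

class Test:
    """Stand-in for pybench.Test."""

class IntegerCounting(Test):
    """An example test class, as it would be imported into Setup."""

def find_tests(namespace):
    """Return the names of all Test subclasses in a module namespace,
    similar to how pybench collects tests from the Setup module."""
    return sorted(
        name for name, obj in namespace.items()
        if inspect.isclass(obj) and issubclass(obj, Test) and obj is not Test
    )

print(find_tests({'Test': Test, 'IntegerCounting': IntegerCounting}))
# ['IntegerCounting']
```

Because discovery works on the module namespace, a plain import of the
test class into Setup is all the registration that is needed.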

Breaking Comparability
----------------------

If a change is made to any individual test that means it is no
longer strictly comparable with previous runs, the '.version' class
variable should be updated. Thereafter, comparisons with previous
versions of the test will be listed as "n/a" to reflect the change.

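That comparison rule can be sketched as follows. This is a hypothetical
helper for illustration, not pybench's actual code:

```python
def compare_cell(old, new, old_version, new_version):
    """Format the percentage change between two timings, or "n/a"
    when the test versions differ and results are not comparable."""
    if old_version != new_version:
        return "n/a"
    return "%+.1f%%" % ((new - old) / old * 100.0)

# Same test version: the change is reported as a percentage.
print(compare_cell(0.126, 0.145, 2.0, 2.0))  # +15.1%
# Version was bumped: the results are no longer comparable.
print(compare_cell(0.126, 0.145, 1.0, 2.0))  # n/a
```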
Version History
---------------

 2.0: rewrote parts of pybench which resulted in more repeatable
      timings:
      - made the timer a parameter
      - changed the platform default timer to use high-resolution
        timers rather than process timers (which have a much lower
        resolution)
      - added an option to select the timer
      - added a process time timer (using systimes.py)
      - changed to use min() as timing estimator (the average
        is still taken as well to provide an idea of the difference)
      - garbage collection is turned off by default
      - sys check interval is set to the highest possible value
      - calibration is now a separate step, done using
        a different strategy that allows measuring the test
        overhead more accurately
      - modified the tests so that each gives a run-time of between
        100-200ms using warp 10
      - changed the default warp factor to 10 (from 20)
      - compared results with timeit.py and confirmed measurements
      - bumped all test versions to 2.0
      - updated platform.py to the latest version
      - changed the output format a bit to make it look
        nicer
      - refactored the APIs somewhat
 1.3+: Steve Holden added the NewInstances test and the filtering
      option during the NeedForSpeed sprint; this also triggered a long
      discussion on how to improve benchmark timing and finally
      resulted in the release of 2.0
 1.3: initial checkin into the Python SVN repository


Have fun,
--
Marc-Andre Lemburg
mal@lemburg.com