________________________________________________________________________

PYBENCH - A Python Benchmark Suite
________________________________________________________________________

Extendable suite of low-level benchmarks for measuring
the performance of the Python implementation
(interpreter, compiler or VM).

pybench is a collection of tests that provides a standardized way to
measure the performance of Python implementations. It takes a very
close look at different aspects of Python programs and lets you
decide which factors are more important to you than others, rather
than wrapping everything up in one number, like other performance
tests do (e.g. pystone, which is included in the Python Standard
Library).

pybench has been used in the past by several Python developers to
track down performance bottlenecks or to demonstrate the impact of
optimizations and new features in Python.

The command line interface for pybench is the file pybench.py. Run
this script with option '--help' to get a listing of the possible
options. Without options, pybench will simply execute the benchmark
and then print out a report to stdout.


Micro-Manual
------------

Run 'pybench.py -h' to see the help screen. Run 'pybench.py' to run
the benchmark suite using default settings and 'pybench.py -f <file>'
to have it store the results in a file as well.

It is usually a good idea to run pybench.py multiple times to see
whether the environment, timers and benchmark run-times are suitable
for doing benchmark tests.

You can use the comparison feature of pybench.py ('pybench.py -c
<file>') to check how well the system behaves in comparison to a
reference run.
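For example, you can save a reference run to a file and later compare
new runs against it (the file name here is just an example):

    python pybench.py -f reference.pybench
    python pybench.py -c reference.pybench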
If the differences are well below 10% for each test, then you have a
system that is good for doing benchmark testing. If you get random
differences of more than 10% or significant differences between the
values for minimum and average time, then you likely have some
background processes running which cause the readings to become
inconsistent. Examples include: web-browsers, email clients, RSS
readers, music players, backup programs, etc.

If you are only interested in a few tests of the whole suite, you can
use the filtering option, e.g. 'pybench.py -t string' will only
run/show the tests that have 'string' in their name.

This is the current output of pybench.py --help:

"""
------------------------------------------------------------------------
PYBENCH - a benchmark test suite for Python interpreters/compilers.
------------------------------------------------------------------------

Synopsis:
 pybench.py [option] files...

Options and default settings:
  -n arg           number of rounds (10)
  -f arg           save benchmark to file arg ()
  -c arg           compare benchmark with the one in file arg ()
  -s arg           show benchmark in file arg, then exit ()
  -w arg           set warp factor to arg (10)
  -t arg           run only tests with names matching arg ()
  -C arg           set the number of calibration runs to arg (20)
  -d               hide noise in comparisons (0)
  -v               verbose output (not recommended) (0)
  --with-gc        enable garbage collection (0)
  --with-syscheck  use default sys check interval (0)
  --timer arg      use given timer (time.time)
  -h               show this help text
  --help           show this help text
  --debug          enable debugging
  --copyright      show copyright
  --examples       show examples of usage

Version:
 2.0

The normal operation is to run the suite and display the
results. Use -f to save them for later reuse or comparisons.

Available timers:

   time.time
   time.clock
   systimes.processtime

Examples:

python2.1 pybench.py -f p21.pybench
python2.5 pybench.py -f p25.pybench
python pybench.py -s p25.pybench -c p21.pybench
"""

License
-------

See LICENSE file.


Sample output
-------------

"""
-------------------------------------------------------------------------------
PYBENCH 2.0
-------------------------------------------------------------------------------
* using Python 2.4.2
* disabled garbage collection
* system check interval set to maximum: 2147483647
* using timer: time.time

Calibrating tests. Please wait...

Running 10 round(s) of the suite at warp factor 10:

* Round 1 done in 6.388 seconds.
* Round 2 done in 6.485 seconds.
* Round 3 done in 6.786 seconds.
...
* Round 10 done in 6.546 seconds.

-------------------------------------------------------------------------------
Benchmark: 2006-06-12 12:09:25
-------------------------------------------------------------------------------

    Rounds: 10
    Warp:   10
    Timer:  time.time

    Machine Details:
       Platform ID: Linux-2.6.8-24.19-default-x86_64-with-SuSE-9.2-x86-64
       Processor:   x86_64

    Python:
       Executable: /usr/local/bin/python
       Version:    2.4.2
       Compiler:   GCC 3.3.4 (pre 3.3.5 20040809)
       Bits:       64bit
       Build:      Oct 1 2005 15:24:35 (#1)
       Unicode:    UCS2


Test                             minimum  average  operation  overhead
-------------------------------------------------------------------------------
         BuiltinFunctionCalls:    126ms    145ms    0.28us    0.274ms
          BuiltinMethodLookup:    124ms    130ms    0.12us    0.316ms
                CompareFloats:    109ms    110ms    0.09us    0.361ms
        CompareFloatsIntegers:    100ms    104ms    0.12us    0.271ms
              CompareIntegers:    137ms    138ms    0.08us    0.542ms
       CompareInternedStrings:    124ms    127ms    0.08us    1.367ms
                 CompareLongs:    100ms    104ms    0.10us    0.316ms
               CompareStrings:    111ms    115ms    0.12us    0.929ms
               CompareUnicode:    108ms    128ms    0.17us    0.693ms
                ConcatStrings:    142ms    155ms    0.31us    0.562ms
                ConcatUnicode:    119ms    127ms    0.42us    0.384ms
              CreateInstances:    123ms    128ms    1.14us    0.367ms
           CreateNewInstances:    121ms    126ms    1.49us    0.335ms
      CreateStringsWithConcat:    130ms    135ms    0.14us    0.916ms
      CreateUnicodeWithConcat:    130ms    135ms    0.34us    0.361ms
                 DictCreation:    108ms    109ms    0.27us    0.361ms
            DictWithFloatKeys:    149ms    153ms    0.17us    0.678ms
          DictWithIntegerKeys:    124ms    126ms    0.11us    0.915ms
           DictWithStringKeys:    114ms    117ms    0.10us    0.905ms
                     ForLoops:    110ms    111ms    4.46us    0.063ms
                   IfThenElse:    118ms    119ms    0.09us    0.685ms
                  ListSlicing:    116ms    120ms    8.59us    0.103ms
               NestedForLoops:    125ms    137ms    0.09us    0.019ms
         NormalClassAttribute:    124ms    136ms    0.11us    0.457ms
      NormalInstanceAttribute:    110ms    117ms    0.10us    0.454ms
          PythonFunctionCalls:    107ms    113ms    0.34us    0.271ms
            PythonMethodCalls:    140ms    149ms    0.66us    0.141ms
                    Recursion:    156ms    166ms    3.32us    0.452ms
                 SecondImport:    112ms    118ms    1.18us    0.180ms
          SecondPackageImport:    118ms    127ms    1.27us    0.180ms
        SecondSubmoduleImport:    140ms    151ms    1.51us    0.180ms
      SimpleComplexArithmetic:    128ms    139ms    0.16us    0.361ms
       SimpleDictManipulation:    134ms    136ms    0.11us    0.452ms
        SimpleFloatArithmetic:    110ms    113ms    0.09us    0.571ms
     SimpleIntFloatArithmetic:    106ms    111ms    0.08us    0.548ms
      SimpleIntegerArithmetic:    106ms    109ms    0.08us    0.544ms
       SimpleListManipulation:    103ms    113ms    0.10us    0.587ms
         SimpleLongArithmetic:    112ms    118ms    0.18us    0.271ms
                   SmallLists:    105ms    116ms    0.17us    0.366ms
                  SmallTuples:    108ms    128ms    0.24us    0.406ms
        SpecialClassAttribute:    119ms    136ms    0.11us    0.453ms
     SpecialInstanceAttribute:    143ms    155ms    0.13us    0.454ms
               StringMappings:    115ms    121ms    0.48us    0.405ms
             StringPredicates:    120ms    129ms    0.18us    2.064ms
                StringSlicing:    111ms    127ms    0.23us    0.781ms
                    TryExcept:    125ms    126ms    0.06us    0.681ms
               TryRaiseExcept:    133ms    137ms    2.14us    0.361ms
                 TupleSlicing:    117ms    120ms    0.46us    0.066ms
              UnicodeMappings:    156ms    160ms    4.44us    0.429ms
            UnicodePredicates:    117ms    121ms    0.22us    2.487ms
            UnicodeProperties:    115ms    153ms    0.38us    2.070ms
               UnicodeSlicing:    126ms    129ms    0.26us    0.689ms
-------------------------------------------------------------------------------
                       Totals:   6283ms   6673ms
"""
________________________________________________________________________

Writing New Tests
________________________________________________________________________

pybench tests are simple modules defining one or more pybench.Test
subclasses.

Writing a test essentially boils down to providing two methods:
.test(), which runs .rounds rounds of .operations test operations
each, and .calibrate(), which does the same except that it doesn't
actually execute the operations.

Here's an example:
------------------

from pybench import Test

class IntegerCounting(Test):

    # Version number of the test as float (x.yy); this is important
    # for comparisons of benchmark runs - tests with unequal version
    # number will not get compared.
    version = 1.0

    # The number of abstract operations done in each round of the
    # test. An operation is the basic unit of what you want to
    # measure. The benchmark will output the amount of run-time per
    # operation. Note that in order to raise the measured timings
    # significantly above noise level, it is often required to repeat
    # sets of operations more than once per test round. The measured
    # overhead per test round should be less than 1 second.
    operations = 20

    # Number of rounds to execute per test run. This should be
    # adjusted to a figure that results in a test run-time of between
    # 1-2 seconds (at warp 1).
    rounds = 100000

    def test(self):

        """ Run the test.

            The test needs to run self.rounds times, executing
            self.operations operations in each round.

        """
        # Initialize the test
        a = 1

        # Run test rounds
        #
        # NOTE: Use xrange() for all test loops unless you want to face
        # a 20MB process!
        #
        for i in xrange(self.rounds):

            # Repeat the operations per round to raise the run-time
            # per operation significantly above the noise level of the
            # for-loop overhead.

            # Execute 20 operations (a += 1):
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1

    def calibrate(self):

        """ Calibrate the test.

            This method should execute everything that is needed to
            set up and run the test - except for the actual operations
            that you intend to measure. pybench uses this method to
            measure the test implementation overhead.

        """
        # Initialize the test
        a = 1

        # Run test rounds (without actually doing any operation)
        for i in xrange(self.rounds):

            # Skip the actual execution of the operations, since we
            # only want to measure the test's administration overhead.
            pass

Registering a new test module
-----------------------------

To register a test module with pybench, the classes need to be
imported into the pybench.Setup module. pybench will then scan all the
symbols defined in that module for subclasses of pybench.Test and
automatically add them to the benchmark suite.
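For example, if the IntegerCounting test above were saved in a file
called MyTests.py (a hypothetical module name), a wildcard import
added to Setup.py would register it:

    # In pybench's Setup.py (MyTests is a hypothetical module name);
    # the wildcard import makes the Test subclasses visible to the
    # scan of the Setup module's symbols.
    from MyTests import *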


Breaking Comparability
----------------------

If a change is made to any individual test that means it is no
longer strictly comparable with previous runs, the '.version' class
variable should be updated. Thereafter, comparisons with previous
versions of the test will list as "n/a" to reflect the change.
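A minimal sketch, reusing the IntegerCounting example from above:

    class IntegerCounting(Test):

        # Bumped from 1.0 after changing the test's operations;
        # comparisons against runs recorded with version 1.0 will
        # now show "n/a" for this test.
        version = 1.1

        # ... rest of the test definition unchanged ...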


Version History
---------------

 2.0: rewrote parts of pybench which resulted in more repeatable
      timings:
      - made timer a parameter
      - changed the platform default timer to use high-resolution
        timers rather than process timers (which have a much lower
        resolution)
      - added option to select timer
      - added process time timer (using systimes.py)
      - changed to use min() as timing estimator (the average
        is still taken as well to provide an idea of the difference)
      - garbage collection is turned off by default
      - sys check interval is set to the highest possible value
      - calibration is now a separate step and done using
        a different strategy that allows measuring the test
        overhead more accurately
      - modified the tests to each give a run-time of between
        100-200ms using warp 10
      - changed default warp factor to 10 (from 20)
      - compared results with timeit.py and confirmed measurements
      - bumped all test versions to 2.0
      - updated platform.py to the latest version
      - changed the output format a bit to make it look nicer
      - refactored the APIs somewhat
 1.3+: Steve Holden added the NewInstances test and the filtering
      option during the NeedForSpeed sprint; this also triggered a
      long discussion on how to improve benchmark timing and finally
      resulted in the release of 2.0
 1.3: initial checkin into the Python SVN repository


Have fun,
--
Marc-Andre Lemburg
mal@lemburg.com