This document describes some caveats about the use of Valgrind with
Python. Valgrind is used periodically by Python developers to try
to ensure there are no memory leaks or invalid memory reads/writes.

If you don't want to read about the details of using Valgrind, there
are still two things you must do to suppress the warnings. First,
you must use a suppressions file. One is supplied in
Misc/valgrind-python.supp. Second, you must do one of the following:

  * Uncomment Py_USING_MEMORY_DEBUGGER in Objects/obmalloc.c,
    then rebuild Python
  * Uncomment the lines in Misc/valgrind-python.supp that
    suppress the warnings for PyObject_Free and PyObject_Realloc

If you want to use Valgrind more effectively and catch even more
memory leaks, you will need to configure Python with --without-pymalloc.
PyMalloc obtains memory from malloc in a few big blocks, and most object
allocations don't call malloc at all; they use chunks doled out by
PyMalloc from those big blocks. This means Valgrind can't detect
many allocations (and frees), except for those that are forwarded
to the system malloc. Note: configuring Python with --without-pymalloc
makes Python run much slower, especially when running under Valgrind.
You may need to run the tests in batches under Valgrind to keep
memory usage low enough for the tests to complete. A run
--without-pymalloc seems to take about 5 times longer.

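The effect described above can be illustrated with a toy model. This is
only a sketch: SystemHeap and ToyPool are hypothetical names, the block
and chunk sizes are made up, and none of it is PyMalloc's actual code.
SystemHeap stands in for the malloc layer that Valgrind watches; ToyPool
hands out chunks from one big block the way a pool allocator would.

```python
class SystemHeap:
    """Stand-in for the C library malloc/free, the layer a heap checker tracks."""

    def __init__(self):
        self.live = {}          # address -> size of every outstanding malloc
        self.next_addr = 0x1000

    def malloc(self, size):
        addr = self.next_addr
        self.next_addr += size
        self.live[addr] = size
        return addr

    def free(self, addr):
        del self.live[addr]


class ToyPool:
    """Hypothetical pool allocator: one big block carved into fixed-size chunks."""

    def __init__(self, heap, block_size=4096, chunk_size=32):
        # This is the only request the system heap ever sees.
        self.base = heap.malloc(block_size)
        self.free_chunks = [self.base + off
                            for off in range(0, block_size, chunk_size)]
        self.in_use = set()

    def alloc(self):
        addr = self.free_chunks.pop()
        self.in_use.add(addr)
        return addr

    def release(self, addr):
        self.in_use.remove(addr)
        self.free_chunks.append(addr)


heap = SystemHeap()
pool = ToyPool(heap)

objects = [pool.alloc() for _ in range(100)]
for addr in objects[:90]:       # "forget" to release the last 10 objects
    pool.release(addr)

print("mallocs visible at the heap level:", len(heap.live))
print("chunks leaked inside the pool:", len(pool.in_use))
```

A tool that only watches SystemHeap sees a single live block, so the ten
chunks that were never released inside the pool go unnoticed. Building
--without-pymalloc corresponds to sending every allocation through the
system heap instead.
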
Apr 15, 2006:
  test_ctypes causes Valgrind 3.1.1 to fail (crash).
  test_socket_ssl should be skipped when running under Valgrind.
    The reason is that it purposely uses uninitialized memory.
    This causes many spurious warnings, so it's easier to just skip it.


Details:
--------
Python uses its own small-object allocation scheme on top of malloc,
called PyMalloc.

Valgrind may show some unexpected results when PyMalloc is used.
Starting with Python 2.3, PyMalloc is used by default. You can disable
PyMalloc when configuring Python by adding the --without-pymalloc option.
If you disable PyMalloc, most of the information in this document and
the supplied suppressions file will not be useful. As discussed above,
disabling PyMalloc lets Valgrind catch more problems.

If you use Valgrind on a default build of Python, you will see
many errors like:

  ==6399== Use of uninitialised value of size 4
  ==6399==    at 0x4A9BDE7E: PyObject_Free (obmalloc.c:711)
  ==6399==    by 0x4A9B8198: dictresize (dictobject.c:477)

These are expected and not a problem. Tim Peters explains
the situation:

    PyMalloc needs to know whether an arbitrary address is one
    that's managed by it, or is managed by the system malloc.
    The current scheme allows this to be determined in constant
    time, regardless of how many memory areas are under pymalloc's
    control.

    The memory pymalloc manages itself is in one or more "arenas",
    each a large contiguous memory area obtained from malloc.
    The base address of each arena is saved by pymalloc
    in a vector. Each arena is carved into "pools", and a field at
    the start of each pool contains the index of that pool's arena's
    base address in that vector.

    Given an arbitrary address, pymalloc computes the pool base
    address corresponding to it, then looks at "the index" stored
    near there. If the index read up is out of bounds for the
    vector of arena base addresses pymalloc maintains, then
    pymalloc knows for certain that this address is not under
    pymalloc's control. Otherwise the index is in bounds, and
    pymalloc compares

        the arena base address stored at that index in the vector

    to

        the arbitrary address pymalloc is investigating

    pymalloc controls this arbitrary address if and only if it lies
    in the arena the address's pool's index claims it lies in.

    It doesn't matter whether the memory pymalloc reads up ("the
    index") is initialized. If it's not initialized, then
    whatever trash gets read up will lead pymalloc to conclude
    (correctly) that the address isn't controlled by it, either
    because the index is out of bounds, or the index is in bounds
    but the arena it represents doesn't contain the address.

    This determination has to be made on every call to one of
    pymalloc's free/realloc entry points, so its speed is critical
    (Python allocates and frees dynamic memory at a ferocious rate
    -- everything in Python, from integers to "stack frames",
    lives in the heap).
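The check Tim describes can be simulated in a few lines. This is a rough
model, not the code in Objects/obmalloc.c: memory is a bytearray filled
with random bytes to stand in for uninitialized memory, POOL_SIZE,
ARENA_SIZE and MEMORY_SIZE are made-up values, and new_arena and
address_in_range are hypothetical names. For simplicity the model stamps
the arena index into every pool header when the arena is created.

```python
import os

POOL_SIZE = 4096                 # assumed size, for illustration only
ARENA_SIZE = 16 * POOL_SIZE      # assumed size, for illustration only
MEMORY_SIZE = 8 * ARENA_SIZE

# Random bytes model memory that nobody has initialized.
memory = bytearray(os.urandom(MEMORY_SIZE))
arena_bases = []                 # the vector of arena base addresses


def new_arena(base):
    """Register a pool-aligned arena and write its index into each pool header."""
    assert base % POOL_SIZE == 0
    index = len(arena_bases)
    arena_bases.append(base)
    for pool_base in range(base, base + ARENA_SIZE, POOL_SIZE):
        memory[pool_base:pool_base + 4] = index.to_bytes(4, "little")


def address_in_range(addr):
    """Return True if addr lies in an arena, reading a possibly-trash index."""
    pool_base = addr - addr % POOL_SIZE
    # For a pool outside every arena these four bytes are trash.
    index = int.from_bytes(memory[pool_base:pool_base + 4], "little")
    if index >= len(arena_bases):
        return False             # out of bounds: certainly not ours
    base = arena_bases[index]
    return base <= addr < base + ARENA_SIZE


new_arena(2 * ARENA_SIZE)
new_arena(5 * ARENA_SIZE)

print(address_in_range(arena_bases[0] + 5000))   # address inside arena 0
print(address_in_range(12345))                   # address outside every arena
```

Whatever trash sits at the pool base of an outside address, the answer
is still correct: either the index is out of bounds, or it selects an
arena that does not contain the address. The read of that trash is what
Valgrind reports as a use of an uninitialised value.
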