\section{\module{pickle} --- Python object serialization}

\declaremodule{standard}{pickle}
\modulesynopsis{Convert Python objects to streams of bytes and back.}
% Substantial improvements by Jim Kerr <jbkerr@sr.hp.com>.
% Rewritten by Barry Warsaw <barry@zope.com>

\index{persistence}
\indexii{persistent}{objects}
\indexii{serializing}{objects}
\indexii{marshalling}{objects}
\indexii{flattening}{objects}
\indexii{pickling}{objects}

The \module{pickle} module implements a fundamental, but powerful
algorithm for serializing and de-serializing a Python object
structure. ``Pickling'' is the process whereby a Python object
hierarchy is converted into a byte stream, and ``unpickling'' is the
inverse operation, whereby a byte stream is converted back into an
object hierarchy. Pickling (and unpickling) is alternatively known as
``serialization'', ``marshalling'',\footnote{Don't confuse this with
the \refmodule{marshal} module.} or ``flattening'';
however, to avoid confusion, the terms used here are ``pickling'' and
``unpickling''.

This documentation describes both the \module{pickle} module and the
\refmodule{cPickle} module.

\subsection{Relationship to other Python modules}

The \module{pickle} module has an optimized cousin called the
\module{cPickle} module. As its name implies, \module{cPickle} is
written in C, so it can be up to 1000 times faster than
\module{pickle}. However, it does not support subclassing of the
\function{Pickler()} and \function{Unpickler()} classes, because in
\module{cPickle} these are functions, not classes. Most applications
have no need for this functionality, and can benefit from the improved
performance of \module{cPickle}. Other than that, the interfaces of
the two modules are nearly identical; the common interface is
described in this manual and differences are pointed out where
necessary. In the following discussions, we use the term ``pickle''
to collectively describe the \module{pickle} and
\module{cPickle} modules.

The data streams the two modules produce are guaranteed to be
interchangeable.

Python has a more primitive serialization module called
\refmodule{marshal}, but in general
\module{pickle} should be the preferred way to serialize Python
objects. \module{marshal} exists primarily to support Python's
\file{.pyc} files.

The \module{pickle} module differs from \refmodule{marshal} in several
significant ways:

\begin{itemize}

\item The \module{pickle} module keeps track of the objects it has
      already serialized, so that later references to the same object
      won't be serialized again. \module{marshal} doesn't do this.

      This has implications both for recursive objects and object
      sharing. Recursive objects are objects that contain references
      to themselves. These are not handled by \module{marshal}, and in
      fact, attempting to marshal recursive objects will crash your
      Python interpreter. Object sharing happens when there are multiple
      references to the same object in different places in the object
      hierarchy being serialized. \module{pickle} stores such objects
      only once, and ensures that all other references point to the
      master copy. Shared objects remain shared, which can be very
      important for mutable objects.

\item \module{marshal} cannot be used to serialize user-defined
      classes and their instances. \module{pickle} can save and
      restore class instances transparently; however, the class
      definition must be importable and live in the same module as
      when the object was stored.

\item The \module{marshal} serialization format is not guaranteed to
      be portable across Python versions. Because its primary job in
      life is to support \file{.pyc} files, the Python implementers
      reserve the right to change the serialization format in
      non-backwards compatible ways should the need arise. The
      \module{pickle} serialization format is guaranteed to be
      backwards compatible across Python releases.

\end{itemize}
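The effect of this bookkeeping on shared and recursive objects can be
illustrated with a short sketch. It is only an illustration, written
for a current Python interpreter; the names \code{shared},
\code{container}, and \code{loop} are hypothetical sample data, not
part of the module.

```python
import pickle

# Hypothetical sample data: one list referenced from two places.
shared = [1, 2, 3]
container = {"first": shared, "second": shared}

# A recursive object: a list that contains itself.
loop = []
loop.append(loop)

restored = pickle.loads(pickle.dumps(container, 2))
restored_loop = pickle.loads(pickle.dumps(loop, 2))

# The shared list is stored once, so both keys still name one object.
assert restored["first"] is restored["second"]
restored["first"].append(4)
assert restored["second"] == [1, 2, 3, 4]

# The self-reference survives the round trip as well.
assert restored_loop[0] is restored_loop
```

Because sharing is preserved, a change made through one reference after
unpickling is visible through the other, just as it was before pickling.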

\begin{notice}[warning]
The \module{pickle} module is not intended to be secure against
erroneous or maliciously constructed data. Never unpickle data
received from an untrusted or unauthenticated source.
\end{notice}

Note that serialization is a more primitive notion than persistence;
although \module{pickle} reads and writes file objects, it does not
handle the issue of naming persistent objects, nor the (even more
complicated) issue of concurrent access to persistent objects. The
\module{pickle} module can transform a complex object into a byte
stream and it can transform the byte stream into an object with the
same internal structure. Perhaps the most obvious thing to do with
these byte streams is to write them onto a file, but it is also
conceivable to send them across a network or store them in a database.
The module \refmodule{shelve} provides a simple interface
to pickle and unpickle objects on DBM-style database files.

\subsection{Data stream format}

The data format used by \module{pickle} is Python-specific. This has
the advantage that there are no restrictions imposed by external
standards such as XDR\index{XDR}\index{External Data Representation}
(which can't represent pointer sharing); however, it means that
non-Python programs may not be able to reconstruct pickled Python
objects.

By default, the \module{pickle} data format uses a printable \ASCII{}
representation. This is slightly more voluminous than a binary
representation. The big advantage of using printable \ASCII{} (and of
some other characteristics of \module{pickle}'s representation) is that
for debugging or recovery purposes it is possible for a human to read
the pickled file with a standard text editor.

There are currently three different protocols which can be used for pickling.

\begin{itemize}

\item Protocol version 0 is the original ASCII protocol and is backwards
compatible with earlier versions of Python.

\item Protocol version 1 is the old binary format which is also compatible
with earlier versions of Python.

\item Protocol version 2 was introduced in Python 2.3. It provides
much more efficient pickling of new-style classes.

\end{itemize}

Refer to PEP 307 for more information.

If a \var{protocol} is not specified, protocol 0 is used.
If \var{protocol} is specified as a negative value
or \constant{HIGHEST_PROTOCOL},
the highest protocol version available will be used.

\versionchanged[Introduced the \var{protocol} parameter]{2.3}

A binary format, which is slightly more efficient, can be chosen by
specifying a \var{protocol} version >= 1.

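As a rough illustration of protocol selection, the following sketch
pickles one object with each of the three protocols and reports the
size of each result. The \code{record} dictionary is hypothetical
sample data, and the sizes printed depend on the interpreter, so none
are quoted here.

```python
import pickle

# Hypothetical sample data; any picklable object would do.
record = {"name": "spam", "values": list(range(10)), "ratio": 0.25}

# Pickle the same object with each protocol described above.
sizes = {}
for protocol in (0, 1, 2):
    data = pickle.dumps(record, protocol)
    sizes[protocol] = len(data)
    # Every protocol must round-trip to an equal object.
    assert pickle.loads(data) == record

# A negative protocol selects the highest one the interpreter supports.
highest = pickle.dumps(record, pickle.HIGHEST_PROTOCOL)
assert pickle.dumps(record, -1) == highest

for protocol in sorted(sizes):
    print("protocol %d: %d bytes" % (protocol, sizes[protocol]))
```

The unpickling side never names a protocol; it is detected from the
data stream itself.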
\subsection{Usage}

To serialize an object hierarchy, you first create a pickler, then you
call the pickler's \method{dump()} method. To de-serialize a data
stream, you first create an unpickler, then you call the unpickler's
\method{load()} method. The \module{pickle} module provides the
following constant:

\begin{datadesc}{HIGHEST_PROTOCOL}
The highest protocol version available. This value can be passed
as a \var{protocol} value.
\versionadded{2.3}
\end{datadesc}

\note{Be sure to always open pickle files created with protocols >= 1 in
binary mode. For the old ASCII-based pickle protocol 0 you can use
either text mode or binary mode as long as you stay consistent.

A pickle file written with protocol 0 in binary mode will contain
lone linefeeds as line terminators and therefore will look ``funny''
when viewed in Notepad or other editors which do not support this
format.}

The \module{pickle} module provides the
following functions to make the pickling process more convenient:

\begin{funcdesc}{dump}{obj, file\optional{, protocol}}
Write a pickled representation of \var{obj} to the open file object
\var{file}. This is equivalent to
\code{Pickler(\var{file}, \var{protocol}).dump(\var{obj})}.

If the \var{protocol} parameter is omitted, protocol 0 is used.
If \var{protocol} is specified as a negative value
or \constant{HIGHEST_PROTOCOL},
the highest protocol version will be used.

\versionchanged[Introduced the \var{protocol} parameter]{2.3}

\var{file} must have a \method{write()} method that accepts a single
string argument. It can thus be a file object opened for writing, a
\refmodule{StringIO} object, or any other custom
object that meets this interface.
\end{funcdesc}

\begin{funcdesc}{load}{file}
Read a string from the open file object \var{file} and interpret it as
a pickle data stream, reconstructing and returning the original object
hierarchy. This is equivalent to \code{Unpickler(\var{file}).load()}.

\var{file} must have two methods, a \method{read()} method that takes
an integer argument, and a \method{readline()} method that requires no
arguments. Both methods should return a string. Thus \var{file} can
be a file object opened for reading, a
\module{StringIO} object, or any other custom
object that meets this interface.

This function automatically determines whether the data stream was
written in binary mode or not.
\end{funcdesc}
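A minimal sketch of \function{dump()} and \function{load()} used
together on a real file follows. It assumes a current Python
interpreter and a scratch file created with \module{tempfile}; the
\code{inventory} data and the \file{.pkl} suffix are hypothetical
choices made for the illustration.

```python
import os
import pickle
import tempfile

# Hypothetical sample data used only for this illustration.
inventory = {"spam": 3, "eggs": [1, 2, 3]}

# Create a scratch file; its location is an assumption of this sketch.
fd, path = tempfile.mkstemp(suffix=".pkl")
os.close(fd)

try:
    # Protocols >= 1 are binary, so the file is opened in binary mode.
    with open(path, "wb") as out:
        pickle.dump(inventory, out, 2)

    # load() detects the protocol itself; only the file mode must match.
    with open(path, "rb") as src:
        restored = pickle.load(src)
finally:
    os.remove(path)

assert restored == inventory
assert restored is not inventory  # a reconstructed copy, not the original
```

Opening the file in binary mode for both writing and reading keeps the
two sides consistent, as the note on file modes requires.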

\begin{funcdesc}{dumps}{obj\optional{, protocol}}
Return the pickled representation of the object as a string, instead
of writing it to a file.

If the \var{protocol} parameter is omitted, protocol 0 is used.
If \var{protocol} is specified as a negative value
or \constant{HIGHEST_PROTOCOL},
the highest protocol version will be used.

\versionchanged[The \var{protocol} parameter was added]{2.3}

\end{funcdesc}

\begin{funcdesc}{loads}{string}
Read a pickled object hierarchy from a string. Characters in the
string past the pickled object's representation are ignored.
\end{funcdesc}
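The string-based functions can be sketched the same way. The example
assumes a current Python interpreter, where the ``string'' is a
\code{bytes} object and \code{io.BytesIO} plays the role that a
\module{StringIO} object plays in the descriptions above; \code{point}
is hypothetical sample data.

```python
import io
import pickle

point = (3, 4)

# dumps() returns the pickle itself instead of writing it to a file.
data = pickle.dumps(point, 2)
assert pickle.loads(data) == point

# Anything after the end of the pickled object is ignored by loads().
assert pickle.loads(data + b"trailing junk") == point

# dump()/load() accept any object with the right methods, such as BytesIO.
buf = io.BytesIO()
pickle.dump(point, buf, 2)
buf.seek(0)
assert pickle.load(buf) == point
```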

The \module{pickle} module also defines three exceptions:

\begin{excdesc}{PickleError}
A common base class for the other exceptions defined below. This
inherits from \exception{Exception}.
\end{excdesc}

\begin{excdesc}{PicklingError}
This exception is raised when an unpicklable object is passed to
the \method{dump()} method.
\end{excdesc}

\begin{excdesc}{UnpicklingError}
This exception is raised when there is a problem unpickling an object.
Note that other exceptions may also be raised during unpickling,
including (but not necessarily limited to) \exception{AttributeError},
\exception{EOFError}, \exception{ImportError}, and \exception{IndexError}.
\end{excdesc}
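One way to handle these exceptions is sketched below. The helper names
\function{try_loads()} and \function{try_dumps()} are hypothetical, and
the exception lists are deliberately broader than the module's own
classes because, as noted above, other exceptions may be raised too.
The sketch assumes a current Python interpreter.

```python
import pickle

def try_loads(data):
    """Return the unpickled object, or None if the stream is unusable."""
    try:
        return pickle.loads(data)
    except (pickle.UnpicklingError, AttributeError, EOFError,
            ImportError, IndexError) as exc:
        print("could not unpickle: %s" % type(exc).__name__)
        return None

def try_dumps(obj):
    """Return the pickle for obj, or None if obj cannot be pickled."""
    try:
        return pickle.dumps(obj, 2)
    except (pickle.PicklingError, AttributeError, TypeError) as exc:
        # Other exception types are possible; these are the likely ones.
        print("could not pickle: %s" % type(exc).__name__)
        return None

good = try_dumps([1, 2, 3])
assert try_loads(good) == [1, 2, 3]
assert try_loads(b"") is None            # an empty stream holds no pickle
assert try_dumps(lambda: None) is None   # a lambda has no importable name
```

Catching exceptions this way does not make unpickling untrusted data
safe; it only keeps a damaged or incomplete stream from ending the
program.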

The \module{pickle} module also exports two callables\footnote{In the
\module{pickle} module these callables are classes, which you could
subclass to customize the behavior. However, in the \refmodule{cPickle}
module these callables are factory functions and so cannot be
subclassed. One common reason to subclass is to control what
objects can actually be unpickled. See section~\ref{pickle-sub} for
more details.}, \class{Pickler} and \class{Unpickler}:

\begin{classdesc}{Pickler}{file\optional{, protocol}}
This takes a file-like object to which it will write a pickle data
stream.

If the \var{protocol} parameter is omitted, protocol 0 is used.
If \var{protocol} is specified as a negative value,
the highest protocol version will be used.

\versionchanged[Introduced the \var{protocol} parameter]{2.3}

\var{file} must have a \method{write()} method that accepts a single
string argument. It can thus be an open file object, a
\module{StringIO} object, or any other custom
object that meets this interface.
\end{classdesc}

\class{Pickler} objects define one (or two) public methods:

\begin{methoddesc}[Pickler]{dump}{obj}
Write a pickled representation of \var{obj} to the open file object
given in the constructor. Either the binary or \ASCII{} format will
be used, depending on the value of the \var{protocol} argument passed to the
constructor.
\end{methoddesc}

\begin{methoddesc}[Pickler]{clear_memo}{}
Clears the pickler's ``memo''. The memo is the data structure that
remembers which objects the pickler has already seen, so that shared
or recursive objects are pickled by reference and not by value. This
method is useful when re-using picklers.

\begin{notice}
Prior to Python 2.3, \method{clear_memo()} was only available on the
picklers created by \refmodule{cPickle}. In the \module{pickle} module,
picklers have an instance variable called \member{memo} which is a
Python dictionary. So to clear the memo for a \module{pickle} module
pickler, you could do the following:

\begin{verbatim}
mypickler.memo.clear()
\end{verbatim}

Code that does not need to support older versions of Python should
simply use \method{clear_memo()}.
\end{notice}
\end{methoddesc}

It is possible to make multiple calls to the \method{dump()} method of
the same \class{Pickler} instance. These must then be matched to the
same number of calls to the \method{load()} method of the
corresponding \class{Unpickler} instance. If the same object is
pickled by multiple \method{dump()} calls, the \method{load()} calls
will all yield references to the same object.\footnote{\emph{Warning}: this
is intended for pickling multiple objects without intervening
modifications to the objects or their parts. If you modify an object
and then pickle it again using the same \class{Pickler} instance, the
object is not pickled again --- a reference to it is pickled and the
\class{Unpickler} will return the old value, not the modified one.
There are two problems here: (1) detecting changes, and (2)
marshalling a minimal set of changes. Garbage Collection may also
become a problem here.}

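The behavior described in the warning, and the effect of
\method{clear_memo()}, can be sketched as follows. The example assumes
a current Python interpreter and uses \code{io.BytesIO} as the file
object; \code{items} is hypothetical sample data.

```python
import io
import pickle

buf = io.BytesIO()
pickler = pickle.Pickler(buf, 2)

items = [1, 2, 3]
pickler.dump(items)       # pickled in full; the pickler memoizes it
items.append(4)
pickler.dump(items)       # only a memo reference is written this time

pickler.clear_memo()      # forget what has been pickled so far
pickler.dump(items)       # pickled in full again, with the new content

# Three dump() calls are matched by three load() calls on one unpickler.
buf.seek(0)
unpickler = pickle.Unpickler(buf)
first = unpickler.load()
second = unpickler.load()
third = unpickler.load()

assert first == [1, 2, 3]
assert second is first             # the old value, not the modified list
assert third == [1, 2, 3, 4]
```

The second \method{load()} returns the object built by the first one,
so the change made between the two \method{dump()} calls is lost unless
the memo is cleared first.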
\class{Unpickler} objects are defined as:

\begin{classdesc}{Unpickler}{file}
This takes a file-like object from which it will read a pickle data
stream. This class automatically determines whether the data stream
was written in binary mode or not, so it does not need a flag as in
the \class{Pickler} factory.

\var{file} must have two methods, a \method{read()} method that takes
an integer argument, and a \method{readline()} method that requires no
arguments. Both methods should return a string. Thus \var{file} can
be a file object opened for reading, a
\module{StringIO} object, or any other custom
object that meets this interface.
\end{classdesc}

\class{Unpickler} objects have one (or two) public methods:

\begin{methoddesc}[Unpickler]{load}{}
Read a pickled object representation from the open file object given
in the constructor, and return the reconstituted object hierarchy
specified therein.
\end{methoddesc}

\begin{methoddesc}[Unpickler]{noload}{}
This is just like \method{load()} except that it doesn't actually
create any objects. This is useful primarily for finding what are
called ``persistent ids'' that may be referenced in a pickle data
stream. See section~\ref{pickle-protocol} below for more details.

\strong{Note:} the \method{noload()} method is currently only
available on \class{Unpickler} objects created with the
\module{cPickle} module. \module{pickle} module \class{Unpickler}s do
not have the \method{noload()} method.
\end{methoddesc}

\subsection{What can be pickled and unpickled?}

The following types can be pickled:

\begin{itemize}

\item \code{None}, \code{True}, and \code{False}

\item integers, long integers, floating point numbers, complex numbers

\item normal and Unicode strings

\item tuples, lists, sets, and dictionaries containing only picklable objects

\item functions defined at the top level of a module

\item built-in functions defined at the top level of a module

\item classes that are defined at the top level of a module

\item instances of such classes whose \member{__dict__} or
\method{__setstate__()} is picklable (see
section~\ref{pickle-protocol} for details)

\end{itemize}

Attempts to pickle unpicklable objects will raise the
\exception{PicklingError} exception; when this happens, an unspecified
number of bytes may have already been written to the underlying file.
Trying to pickle a highly recursive data structure may exceed the
maximum recursion depth; a \exception{RuntimeError} will be raised
in this case. You can carefully raise this limit with
\function{sys.setrecursionlimit()}.

Note that functions (built-in and user-defined) are pickled by ``fully
qualified'' name reference, not by value. This means that only the
function name is pickled, along with the name of the module the function
is defined in. Neither the function's code, nor any of its function
attributes are pickled. Thus the defining module must be importable
in the unpickling environment, and the module must contain the named
object, otherwise an exception will be raised.\footnote{The exception
raised will likely be an \exception{ImportError} or an
\exception{AttributeError} but it could be something else.}

Similarly, classes are pickled by named reference, so the same
restrictions in the unpickling environment apply. Note that none of
the class's code or data is pickled, so in the following example the
class attribute \code{attr} is not restored in the unpickling
environment:

\begin{verbatim}
import pickle

class Foo:
    attr = 'a class attr'

# Only the name of Foo is pickled, not its attributes.
picklestring = pickle.dumps(Foo)
\end{verbatim}

These restrictions are why picklable functions and classes must be
defined in the top level of a module.

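Pickling by name reference can be observed directly in a protocol 0
stream, as in the following sketch. It assumes a current Python
interpreter and that the code runs as a script or importable module, so
that the names can be looked up again; \class{Foo} and
\function{greet()} are hypothetical.

```python
import pickle

class Foo(object):
    attr = 'a class attr'

def greet():
    return 'hello'

# Protocol 0 is printable, which makes the name references easy to spot.
class_pickle = pickle.dumps(Foo, 0)
func_pickle = pickle.dumps(greet, 0)

# Only the names travel in the stream, not the class body or function code.
assert b'Foo' in class_pickle
assert b'a class attr' not in class_pickle
assert b'greet' in func_pickle
assert b'hello' not in func_pickle

# Unpickling looks the names up again and returns the very same objects.
assert pickle.loads(class_pickle) is Foo
assert pickle.loads(func_pickle) is greet
```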
Similarly, when class instances are pickled, their class's code and
data are not pickled along with them. Only the instance data are
pickled. This is done on purpose, so you can fix bugs in a class or
add methods to the class and still load objects that were created with
an earlier version of the class. If you plan to have long-lived
objects that will see many versions of a class, it may be worthwhile
to put a version number in the objects so that suitable conversions
can be made by the class's \method{__setstate__()} method.

\subsection{The pickle protocol
\label{pickle-protocol}}\setindexsubitem{(pickle protocol)}

This section describes the ``pickling protocol'' that defines the
interface between the pickler/unpickler and the objects that are being
serialized. This protocol provides a standard way for you to define,
customize, and control how your objects are serialized and
de-serialized. The description in this section doesn't cover specific
customizations that you can employ to make the unpickling environment
slightly safer from untrusted pickle data streams; see section~\ref{pickle-sub}
for more details.

\subsubsection{Pickling and unpickling normal class
    instances\label{pickle-inst}}

When a pickled class instance is unpickled, its \method{__init__()}
method is normally \emph{not} invoked. If it is desirable that the
\method{__init__()} method be called on unpickling, an old-style class
can define a method \method{__getinitargs__()}, which should return a
\emph{tuple} containing the arguments to be passed to the class
constructor (\method{__init__()} for example). The
\method{__getinitargs__()} method is called at
pickle time; the tuple it returns is incorporated in the pickle for
the instance.
\withsubitem{(copy protocol)}{\ttindex{__getinitargs__()}}
\withsubitem{(instance constructor)}{\ttindex{__init__()}}

\withsubitem{(copy protocol)}{\ttindex{__getnewargs__()}}

New-style types can provide a \method{__getnewargs__()} method that is
used for protocol 2. Implementing this method is needed if the type
establishes some internal invariants when the instance is created, or
if the memory allocation is affected by the values passed to the
\method{__new__()} method for the type (as it is for tuples and
strings). Instances of a new-style type \class{C} are created using

\begin{alltt}
obj = C.__new__(C, *\var{args})
\end{alltt}

where \var{args} is the result of calling \method{__getnewargs__()} on
the original object; if there is no \method{__getnewargs__()}, an
empty tuple is assumed.

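A minimal sketch of \method{__getnewargs__()} follows, assuming a
current Python interpreter and protocol 2. The \class{Point} class is
hypothetical; it subclasses \class{tuple}, so its contents must be
supplied to \method{__new__()} when the instance is created.

```python
import pickle

class Point(tuple):
    """Hypothetical immutable type whose __new__ needs arguments."""

    def __new__(cls, x, y):
        return tuple.__new__(cls, (x, y))

    def __getnewargs__(self):
        # Returned at pickling time; passed to __new__ when unpickling.
        return (self[0], self[1])

original = Point(3, 4)
restored = pickle.loads(pickle.dumps(original, 2))

assert type(restored) is Point
assert restored == (3, 4)
```

Without \method{__getnewargs__()}, an empty argument tuple would be
assumed, which would not satisfy this \method{__new__()} signature.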
\withsubitem{(copy protocol)}{
  \ttindex{__getstate__()}\ttindex{__setstate__()}}
\withsubitem{(instance attribute)}{
  \ttindex{__dict__}}

Classes can further influence how their instances are pickled; if the
class defines the method \method{__getstate__()}, it is called and the
returned state is pickled as the contents for the instance, instead of
the contents of the instance's dictionary. If there is no
\method{__getstate__()} method, the instance's \member{__dict__} is
pickled.

Upon unpickling, if the class also defines the method
\method{__setstate__()}, it is called with the unpickled
state.\footnote{These methods can also be used to implement copying
class instances.} If there is no \method{__setstate__()} method, the
pickled state must be a dictionary and its items are assigned to the
new instance's dictionary. If a class defines both
\method{__getstate__()} and \method{__setstate__()}, the state object
needn't be a dictionary and these methods can do what they
want.\footnote{This protocol is also used by the shallow and deep
copying operations defined in the
\refmodule{copy} module.}

\begin{notice}[warning]
  For new-style classes, if \method{__getstate__()} returns a false
  value, the \method{__setstate__()} method will not be called.
\end{notice}
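The two methods can be sketched together as follows. The
\class{Counter} class, its \member{cache} attribute, and the
\code{"version"} key are hypothetical; the sketch assumes a current
Python interpreter.

```python
import pickle

class Counter(object):
    """Hypothetical class that keeps a cache out of its pickles."""

    def __init__(self, start):
        self.count = start
        self.cache = {}          # cheap to rebuild, so not worth pickling

    def __getstate__(self):
        # Called at pickling time; the return value replaces __dict__.
        return {"version": 1, "count": self.count}

    def __setstate__(self, state):
        # Called at unpickling time; __init__ is not invoked.
        self.count = state["count"]
        self.cache = {}

original = Counter(5)
original.cache["expensive"] = 42

restored = pickle.loads(pickle.dumps(original, 2))
assert restored.count == 5
assert restored.cache == {}
```

Storing a version number in the state, as done here, gives a later
\method{__setstate__()} a way to convert pickles written by older
versions of the class.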


\subsubsection{Pickling and unpickling extension types}

When the \class{Pickler} encounters an object of a type it knows
nothing about --- such as an extension type --- it looks in two places
for a hint of how to pickle it. One alternative is for the object to
implement a \method{__reduce__()} method. If provided, at pickling
time \method{__reduce__()} will be called with no arguments, and it
must return either a string or a tuple.

If a string is returned, it names a global variable whose contents are
pickled as normal. The string returned by \method{__reduce__} should
be the object's local name relative to its module; the pickle module
searches the module namespace to determine the object's module.

When a tuple is returned, it must be between two and five elements
long. Optional elements can either be omitted, or \code{None} can be provided
as their value. The semantics of each element are:

\begin{itemize}

\item A callable object that will be called to create the initial
version of the object. The next element of the tuple will provide
arguments for this callable, and later elements provide additional
state information that will subsequently be used to fully reconstruct
the pickled data.

In the unpickling environment this object must be either a class, a
callable registered as a ``safe constructor'' (see below), or it must
have an attribute \member{__safe_for_unpickling__} with a true value.
Otherwise, an \exception{UnpicklingError} will be raised in the
unpickling environment. Note that as usual, the callable itself is
pickled by name.

\item A tuple of arguments for the callable object.
\versionchanged[Formerly, this argument could also be \code{None}]{2.5}

\item Optionally, the object's state, which will be passed to
      the object's \method{__setstate__()} method as described in
      section~\ref{pickle-inst}. If the object has no
      \method{__setstate__()} method, then, as above, the value must
      be a dictionary and it will be added to the object's
      \member{__dict__}.

\item Optionally, an iterator (and not a sequence) yielding successive
list items. These list items will be pickled, and appended to the
object using either \code{obj.append(\var{item})} or
\code{obj.extend(\var{list_of_items})}. This is primarily used for
list subclasses, but may be used by other classes as long as they have
\method{append()} and \method{extend()} methods with the appropriate
signature. (Whether \method{append()} or \method{extend()} is used
depends on which pickle protocol version is used as well as the number
of items to append, so both must be supported.)

\item Optionally, an iterator (not a sequence)
yielding successive dictionary items, which should be tuples of the
form \code{(\var{key}, \var{value})}. These items will be pickled
and stored to the object using \code{obj[\var{key}] = \var{value}}.
This is primarily used for dictionary subclasses, but may be used by
other classes as long as they implement \method{__setitem__}.

\end{itemize}
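The first three tuple elements can be illustrated with a short sketch.
The \class{Temperature} class is hypothetical, it is used as its own
callable, and the example assumes a current Python interpreter.

```python
import pickle

class Temperature(object):
    """Hypothetical class that describes its own reconstruction."""

    def __init__(self, celsius):
        self.celsius = celsius
        self.label = None

    def __reduce__(self):
        # (callable, argument tuple, optional state dictionary)
        return (Temperature, (self.celsius,), {"label": self.label})

original = Temperature(21.5)
original.label = "office"

restored = pickle.loads(pickle.dumps(original, 2))
assert isinstance(restored, Temperature)
assert restored.celsius == 21.5
assert restored.label == "office"
```

Since the class defines no \method{__setstate__()} method, the state
dictionary is added to the new object's \member{__dict__} after the
callable has created it.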

It is sometimes useful to know the protocol version when implementing
\method{__reduce__}. This can be done by implementing a method named
\method{__reduce_ex__} instead of \method{__reduce__}.
\method{__reduce_ex__}, when it exists, is called in preference over
\method{__reduce__} (you may still provide \method{__reduce__} for
backwards compatibility). The \method{__reduce_ex__} method will be
called with a single integer argument, the protocol version.

The \class{object} class implements both \method{__reduce__} and
\method{__reduce_ex__}; however, if a subclass overrides
\method{__reduce__} but not \method{__reduce_ex__}, the
\method{__reduce_ex__} implementation detects this and calls
\method{__reduce__}.

An alternative to implementing a \method{__reduce__()} method on the
object to be pickled is to register the callable with the
\refmodule[copyreg]{copy_reg} module. This module provides a way
for programs to register ``reduction functions'' and constructors for
user-defined types. Reduction functions have the same semantics and
interface as the \method{__reduce__()} method described above, except
that they are called with a single argument, the object to be pickled.

The registered constructor is deemed a ``safe constructor'' for purposes
of unpickling as described above.

585 |
|
---|
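Registering a reduction function might look like the sketch below.
The \class{Celsius} class, the \function{reduce_celsius()} function and
the \code{calls} list are hypothetical names used only for
illustration.  The import fallback is an assumption made so that the
code also runs on Python 3, where the module is named
\module{copyreg}.

```python
try:
    import copyreg                # name used by Python 3
except ImportError:
    import copy_reg as copyreg    # name used in this documentation
import pickle


class Celsius(object):
    """Hypothetical user-defined type."""

    def __init__(self, degrees):
        self.degrees = degrees


calls = []  # records each object handed to the reduction function


def reduce_celsius(obj):
    # Same return convention as __reduce__, but the object to be
    # pickled arrives as the single argument.
    calls.append(obj)
    return (Celsius, (obj.degrees,))


copyreg.pickle(Celsius, reduce_celsius)

restored = pickle.loads(pickle.dumps(Celsius(21.5)))
```

This approach leaves the class itself untouched, which can be
convenient for types defined in code you do not control.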
\subsubsection{Pickling and unpickling external objects}

For the benefit of object persistence, the \module{pickle} module
supports the notion of a reference to an object outside the pickled
data stream.  Such objects are referenced by a ``persistent id'',
which is just an arbitrary string of printable \ASCII{} characters.
The resolution of such names is not defined by the \module{pickle}
module; it will delegate this resolution to user-defined functions on
the pickler and unpickler.\footnote{The actual mechanism for
associating these user-defined functions is slightly different for
\module{pickle} and \module{cPickle}.  The description given here
works the same for both implementations.  Users of the \module{pickle}
module could also use subclassing to effect the same results,
overriding the \method{persistent_id()} and \method{persistent_load()}
methods in the derived classes.}

To define external persistent id resolution, you need to set the
\member{persistent_id} attribute of the pickler object and the
\member{persistent_load} attribute of the unpickler object.

To pickle objects that have an external persistent id, the pickler
must have a custom \function{persistent_id()} function that takes an
object as an argument and returns either \code{None} or the persistent
id for that object.  When \code{None} is returned, the pickler simply
pickles the object as normal.  When a persistent id string is
returned, the pickler will pickle that string, along with a marker
so that the unpickler will recognize the string as a persistent id.

To unpickle external objects, the unpickler must have a custom
\function{persistent_load()} function that takes a persistent id
string and returns the referenced object.

Here's a silly example that \emph{might} shed more light:

\begin{verbatim}
import pickle
from cStringIO import StringIO

src = StringIO()
p = pickle.Pickler(src)

def persistent_id(obj):
    if hasattr(obj, 'x'):
        return 'the value %d' % obj.x
    else:
        return None

p.persistent_id = persistent_id

class Integer:
    def __init__(self, x):
        self.x = x
    def __str__(self):
        return 'My name is integer %d' % self.x

i = Integer(7)
print i
p.dump(i)

datastream = src.getvalue()
print repr(datastream)
dst = StringIO(datastream)

up = pickle.Unpickler(dst)

class FancyInteger(Integer):
    def __str__(self):
        return 'I am the integer %d' % self.x

def persistent_load(persid):
    if persid.startswith('the value '):
        value = int(persid.split()[2])
        return FancyInteger(value)
    else:
        raise pickle.UnpicklingError, 'Invalid persistent id'

up.persistent_load = persistent_load

j = up.load()
print j
\end{verbatim}

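The subclassing route mentioned in the footnote above could be sketched
as follows.  The \class{Account} class, the \code{ACCOUNTS} table and
the two subclasses are hypothetical names, and the code assumes a
Python 3 interpreter with \module{io.BytesIO} standing in for
\module{cStringIO}.

```python
import io
import pickle


class Account(object):
    """Hypothetical object that lives outside the pickle stream."""

    def __init__(self, number):
        self.number = number


# Hypothetical external store that persistent ids are resolved against.
ACCOUNTS = {42: Account(42)}


class AccountPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Return a persistent id for external objects, None otherwise.
        if isinstance(obj, Account):
            return "account %d" % obj.number
        return None


class AccountUnpickler(pickle.Unpickler):
    def persistent_load(self, persid):
        # Map the persistent id back to the externally stored object.
        if persid.startswith("account "):
            return ACCOUNTS[int(persid.split()[1])]
        raise pickle.UnpicklingError("Invalid persistent id")


buf = io.BytesIO()
AccountPickler(buf).dump(["payment", ACCOUNTS[42]])
buf.seek(0)
restored = AccountUnpickler(buf).load()
```

Here the unpickled list refers to the very same \class{Account} object
held in the store, since only its id travelled through the stream.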
In the \module{cPickle} module, the unpickler's
\member{persistent_load} attribute can also be set to a Python
list, in which case, when the unpickler reaches a persistent id, the
persistent id string will simply be appended to this list.  This
functionality exists so that a pickle data stream can be ``sniffed''
for object references without actually instantiating all the objects
in a pickle.\footnote{We'll leave you with the image of Guido and Jim
sitting around sniffing pickles in their living rooms.}  Setting
\member{persistent_load} to a list is usually used in conjunction with
the \method{noload()} method on the Unpickler.

% BAW: Both pickle and cPickle support something called
% inst_persistent_id() which appears to give unknown types a second
% shot at producing a persistent id.  Since Jim Fulton can't remember
% why it was added or what it's for, I'm leaving it undocumented.

\subsection{Subclassing Unpicklers \label{pickle-sub}}

By default, unpickling will import any class that it finds in the
pickle data.  You can control exactly what gets unpickled and what
gets called by customizing your unpickler.  Unfortunately, exactly how
you do this is different depending on whether you're using
\module{pickle} or \module{cPickle}.\footnote{A word of caution: the
mechanisms described here use internal attributes and methods, which
are subject to change in future versions of Python.  We intend to
someday provide a common interface for controlling this behavior,
which will work in either \module{pickle} or \module{cPickle}.}

In the \module{pickle} module, you need to derive a subclass from
\class{Unpickler}, overriding the \method{load_global()}
method.  \method{load_global()} should read two lines from the pickle
data stream, where the first line will be the name of the module
containing the class and the second line will be the name of the
instance's class.  It then looks up the class, possibly importing the
module and digging out the attribute, and appends what it finds to
the unpickler's stack.  Later on, this class will be assigned to the
\member{__class__} attribute of an instance of an empty class, as a
way of magically creating an instance without calling its class's
\method{__init__()}.
Your job (should you choose to accept it) would be to have
\method{load_global()} push onto the unpickler's stack a known safe
version of any class you deem safe to unpickle.  It is up to you to
produce such a class.  Or you could raise an error if you want to
disallow all unpickling of instances.  If this sounds like a hack,
you're right.  Refer to the source code to make this work.

Things are a little cleaner with \module{cPickle}, but not by much.
To control what gets unpickled, you can set the unpickler's
\member{find_global} attribute to a function or \code{None}.  If it is
\code{None} then any attempts to unpickle instances will raise an
\exception{UnpicklingError}.  If it is a function,
then it should accept a module name and a class name, and return the
corresponding class object.  It is responsible for looking up the
class and performing any necessary imports, and it may raise an
error to prevent instances of the class from being unpickled.

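The idea of a lookup hook that takes a module name and a class name
can be sketched as below.  This is an illustration only: it assumes a
Python 3 interpreter, where the hook is the \method{find_class()}
method of \class{Unpickler} rather than the \method{load_global()} and
\member{find_global} mechanisms described here.  The \code{ALLOWED}
set, \class{RestrictedUnpickler} and \function{restricted_loads()} are
hypothetical names.

```python
import collections
import io
import pickle

# Hypothetical whitelist of (module name, class name) pairs.
ALLOWED = {("collections", "OrderedDict")}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called with the module and class names read from the stream.
        if (module, name) in ALLOWED:
            return pickle.Unpickler.find_class(self, module, name)
        # Refuse everything else instead of importing it.
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name))


def restricted_loads(data):
    """Unpickle a byte string, allowing only whitelisted classes."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


ok = restricted_loads(pickle.dumps(collections.OrderedDict(a=1)))

try:
    restricted_loads(pickle.dumps(collections.Counter("aab")))
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```

A whitelist such as this limits which classes may be looked up, but it
is not a substitute for unpickling only data from a trusted source.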
The moral of the story is that you should be really careful about the
source of the strings your application unpickles.

\subsection{Example \label{pickle-example}}

For the simplest code, use the \function{dump()} and \function{load()}
functions.  Note that a self-referencing list is pickled and restored
correctly.

\begin{verbatim}
import pickle

data1 = {'a': [1, 2.0, 3, 4+6j],
         'b': ('string', u'Unicode string'),
         'c': None}

selfref_list = [1, 2, 3]
selfref_list.append(selfref_list)

output = open('data.pkl', 'wb')

# Pickle dictionary using protocol 0.
pickle.dump(data1, output)

# Pickle the list using the highest protocol available.
pickle.dump(selfref_list, output, -1)

output.close()
\end{verbatim}

The following example reads the resulting pickled data.  When reading
a pickle-containing file, you should open the file in binary mode
because you can't be sure whether the \ASCII{} or binary format was used.

\begin{verbatim}
import pprint, pickle

pkl_file = open('data.pkl', 'rb')

data1 = pickle.load(pkl_file)
pprint.pprint(data1)

data2 = pickle.load(pkl_file)
pprint.pprint(data2)

pkl_file.close()
\end{verbatim}

Here's a larger example that shows how to modify pickling behavior for a
class.  The \class{TextReader} class opens a text file, and returns
the line number and line contents each time its \method{readline()}
method is called.  If a \class{TextReader} instance is pickled, all
attributes \emph{except} the file object member are saved.  When the
instance is unpickled, the file is reopened, and reading resumes from
the last location.  The \method{__setstate__()} and
\method{__getstate__()} methods are used to implement this behavior.

\begin{verbatim}
class TextReader:
    """Print and number lines in a text file."""
    def __init__(self, file):
        self.file = file
        self.fh = open(file)
        self.lineno = 0

    def readline(self):
        self.lineno = self.lineno + 1
        line = self.fh.readline()
        if not line:
            return None
        if line.endswith("\n"):
            line = line[:-1]
        return "%d: %s" % (self.lineno, line)

    def __getstate__(self):
        odict = self.__dict__.copy() # copy the dict since we change it
        del odict['fh']              # remove filehandle entry
        return odict

    def __setstate__(self, dict):
        fh = open(dict['file'])      # reopen file
        count = dict['lineno']       # read from file...
        while count:                 # until line count is restored
            fh.readline()
            count = count - 1
        self.__dict__.update(dict)   # update attributes
        self.fh = fh                 # save the file object
\end{verbatim}

A sample usage might be something like this:

\begin{verbatim}
>>> import TextReader
>>> obj = TextReader.TextReader("TextReader.py")
>>> obj.readline()
'1: #!/usr/local/bin/python'
>>> # (more invocations of obj.readline() here)
... obj.readline()
'7: class TextReader:'
>>> import pickle
>>> pickle.dump(obj, open('save.p', 'wb'))
\end{verbatim}

If you want to see that \refmodule{pickle} works across Python
processes, start another Python session before continuing.  What
follows can happen from either the same process or a new process.

\begin{verbatim}
>>> import pickle
>>> reader = pickle.load(open('save.p', 'rb'))
>>> reader.readline()
'8: "Print and number lines in a text file."'
\end{verbatim}


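The session above depends on a \file{TextReader.py} file being present.
For experimenting, a self-contained variant of the same
\method{__getstate__()}/\method{__setstate__()} technique might look
like this sketch.  It assumes a Python 3 interpreter; the
\class{LineReader} class and the temporary file contents are
hypothetical, and unlike \class{TextReader} it only advances the line
counter when a line was actually read.

```python
import os
import pickle
import tempfile


class LineReader(object):
    """Hypothetical, condensed variant of the TextReader class."""

    def __init__(self, path):
        self.path = path
        self.fh = open(path)
        self.lineno = 0

    def readline(self):
        line = self.fh.readline()
        if not line:
            return None
        self.lineno += 1
        return "%d: %s" % (self.lineno, line.rstrip("\n"))

    def __getstate__(self):
        state = self.__dict__.copy()  # copy so the instance is unchanged
        del state["fh"]               # file objects cannot be pickled
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.fh = open(self.path)     # reopen the file
        for _ in range(self.lineno):  # skip the lines already consumed
            self.fh.readline()


# Create a small temporary file to read from.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("alpha\nbeta\ngamma\n")

reader = LineReader(path)
first = reader.readline()
clone = pickle.loads(pickle.dumps(reader))  # round trip in one process
second = clone.readline()

reader.fh.close()
clone.fh.close()
os.remove(path)
```

The unpickled copy resumes at the second line because the saved line
count is replayed against the reopened file.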
\begin{seealso}
  \seemodule[copyreg]{copy_reg}{Pickle interface constructor
                                registration for extension types.}

  \seemodule{shelve}{Indexed databases of objects; uses \module{pickle}.}

  \seemodule{copy}{Shallow and deep object copying.}

  \seemodule{marshal}{High-performance serialization of built-in types.}
\end{seealso}


\section{\module{cPickle} --- A faster \module{pickle}}

\declaremodule{builtin}{cPickle}
\modulesynopsis{Faster version of \refmodule{pickle}, but not subclassable.}
\moduleauthor{Jim Fulton}{jim@zope.com}
\sectionauthor{Fred L. Drake, Jr.}{fdrake@acm.org}

The \module{cPickle} module supports serialization and
de-serialization of Python objects, providing an interface and
functionality nearly identical to the
\refmodule{pickle}\refstmodindex{pickle} module.  There are several
differences, the most important being performance and subclassability.

First, \module{cPickle} can be up to 1000 times faster than
\module{pickle} because the former is implemented in C.  Second, in
the \module{cPickle} module the callables \function{Pickler()} and
\function{Unpickler()} are functions, not classes.  This means that
you cannot use them to derive custom pickling and unpickling
subclasses.  Most applications have no need for this functionality and
should benefit from the greatly improved performance of the
\module{cPickle} module.

The pickle data streams produced by \module{pickle} and
\module{cPickle} are compatible, so it is possible to use
\module{pickle} and \module{cPickle} interchangeably with existing
pickles.\footnote{Since the pickle data format is actually a tiny
stack-oriented programming language, and some freedom is taken in the
encodings of certain objects, it is possible that the two modules
produce different data streams for the same input objects.  However, it
is guaranteed that they will always be able to read each other's
data streams.}

There are additional minor differences in API between \module{cPickle}
and \module{pickle}; however, for most applications, they are
interchangeable.  More documentation is provided in the
\module{pickle} module documentation, which
includes a list of the documented differences.


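Since the two modules are interchangeable for most applications, code
that does not subclass \class{Pickler} or \class{Unpickler} can prefer
\module{cPickle} and fall back to \module{pickle}.  The sketch below
shows one common way to do this; the sample \code{data} value is
hypothetical, and on interpreters without a \module{cPickle} module the
fallback branch is simply taken.

```python
try:
    import cPickle as pickle   # C implementation, when available
except ImportError:
    import pickle              # same interface for basic use

data = {"a": [1, 2.0, 3], "b": ("string", None)}

# dumps() and loads() behave the same whichever module was imported.
restored = pickle.loads(pickle.dumps(data, 2))
```

Code written this way should avoid the features that differ between
the two modules, such as subclassing.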