pickletools.py

Last change on this file was 391, checked in by dmik, 11 years ago
python: Merge vendor 2.7.6 to trunk.
Property svn:eol-style set to `native`
File size: 72.8 KB

	1	'''"Executable documentation" for the pickle module.
	2
	3	Extensive comments about the pickle protocols and pickle-machine opcodes
	4	can be found here. Some functions meant for external use:
	5
	6	genops(pickle)
	7	Generate all the opcodes in a pickle, as (opcode, arg, position) triples.
	8
	9	dis(pickle, out=None, memo=None, indentlevel=4)
	10	Print a symbolic disassembly of a pickle.
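
A minimal sketch of calling the two functions listed above. It assumes a
Python 3 interpreter, whose stdlib pickletools keeps the same genops() and
dis() entry points; the sample object and the variable names are made up
for illustration.

```python
import io
import pickle
import pickletools

# Assumption: any small picklable object works for the demonstration.
data = pickle.dumps([1, 2], protocol=0)

# genops() yields (opcode, arg, position) triples; opcode is an OpcodeInfo.
ops = [(op.name, arg, pos) for op, arg, pos in pickletools.genops(data)]

# dis() writes to sys.stdout unless an output file object is passed in.
listing = io.StringIO()
pickletools.dis(data, out=listing)
text = listing.getvalue()
```

Passing out= keeps the disassembly in a string, which is handy when the
listing has to be logged or compared rather than printed.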
	11	'''
	12
	13	__all__ = ['dis', 'genops', 'optimize']
	14
	15	# Other ideas:
	16	#
	17	# - A pickle verifier: read a pickle and check it exhaustively for
	18	# well-formedness. dis() does a lot of this already.
	19	#
	20	# - A protocol identifier: examine a pickle and return its protocol number
	21	# (== the highest .proto attr value among all the opcodes in the pickle).
	22	# dis() already prints this info at the end.
	23	#
	24	# - A pickle optimizer: for example, tuple-building code is sometimes more
	25	# elaborate than necessary, catering for the possibility that the tuple
	26	# is recursive. Or lots of times a PUT is generated that's never accessed
	27	# by a later GET.
	28
	29
	30	"""
	31	"A pickle" is a program for a virtual pickle machine (PM, but more accurately
	32	called an unpickling machine). It's a sequence of opcodes, interpreted by the
	33	PM, building an arbitrarily complex Python object.
	34
	35	For the most part, the PM is very simple: there are no looping, testing, or
	36	conditional instructions, no arithmetic and no function calls. Opcodes are
	37	executed once each, from first to last, until a STOP opcode is reached.
	38
	39	The PM has two data areas, "the stack" and "the memo".
	40
	41	Many opcodes push Python objects onto the stack; e.g., INT pushes a Python
	42	integer object on the stack, whose value is gotten from a decimal string
	43	literal immediately following the INT opcode in the pickle bytestream. Other
	44	opcodes take Python objects off the stack. The result of unpickling is
	45	whatever object is left on the stack when the final STOP opcode is executed.
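
The stack behaviour described above can be seen with a hand-written
pickle. This is only a sketch, assuming Python 3's stdlib pickle, which
still accepts the protocol 0 INT opcode; the variable names are
illustrative.

```python
import pickle

# Hand-written protocol 0 pickle: the INT opcode "I", its decimal argument
# terminated by a newline, then the STOP opcode ".".
program = b"I42\n."

# Each opcode runs once; STOP returns the object left on top of the stack.
result = pickle.loads(program)
```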
	46
	47	The memo is simply an array of objects, or it can be implemented as a dict
	48	mapping little integers to objects. The memo serves as the PM's "long term
	49	memory", and the little integers indexing the memo are akin to variable
	50	names. Some opcodes pop a stack object into the memo at a given index,
	51	and others push a memo object at a given index onto the stack again.
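
One way to watch the memo at work is to pickle a list that holds the same
object twice and look for the PUT and GET opcodes. A sketch, assuming
Python 3's stdlib pickle and pickletools; the names used are illustrative.

```python
import pickle
import pickletools

shared = []
data = pickle.dumps([shared, shared], protocol=0)

# Keep only the opcodes that store into or fetch from the memo.
memo_ops = [(op.name, arg) for op, arg, pos in pickletools.genops(data)
            if op.name in ("PUT", "GET")]

# The second reference is rebuilt by a GET, so sharing is preserved.
restored = pickle.loads(data)
```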
	52
	53	At heart, that's all the PM has. Subtleties arise for these reasons:
	54
	55	+ Object identity. Objects can be arbitrarily complex, and subobjects
	56	may be shared (for example, the list [a, a] refers to the same object a
	57	twice). It can be vital that unpickling recreate an isomorphic object
	58	graph, faithfully reproducing sharing.
	59
	60	+ Recursive objects. For example, after "L = []; L.append(L)", L is a
	61	list, and L[0] is the same list. This is related to the object identity
	62	point, and some sequences of pickle opcodes are subtle in order to
	63	get the right result in all cases.
	64
	65	+ Things pickle doesn't know everything about. Examples of things pickle
	66	does know everything about are Python's builtin scalar and container
	67	types, like ints and tuples. They generally have opcodes dedicated to
	68	them. For things like module references and instances of user-defined
	69	classes, pickle's knowledge is limited. Historically, many enhancements
	70	have been made to the pickle protocol in order to do a better (faster,
	71	and/or more compact) job on those.
	72
	73	+ Backward compatibility and micro-optimization. As explained below,
	74	pickle opcodes never go away, not even when better ways to do a thing
	75	get invented. The repertoire of the PM just keeps growing over time.
	76	For example, protocol 0 had two opcodes for building Python integers (INT
	77	and LONG), protocol 1 added three more for more-efficient pickling of short
	78	integers, and protocol 2 added two more for more-efficient pickling of
	79	long integers (before protocol 2, the only ways to pickle a Python long
	80	took time quadratic in the number of digits, for both pickling and
	81	unpickling). "Opcode bloat" isn't so much a subtlety as a source of
	82	wearying complication.
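
The recursive-object point above can be checked directly. The following
is a sketch under the assumption of Python 3's stdlib pickle; it
round-trips the self-referencing list from the example under protocols 0
through 2, and the result names are illustrative.

```python
import pickle

L = []
L.append(L)  # L[0] is now L itself

# Record, per protocol, whether the copy still refers to itself.
self_reference_kept = {}
for proto in (0, 1, 2):
    copy = pickle.loads(pickle.dumps(L, protocol=proto))
    self_reference_kept[proto] = copy[0] is copy and copy is not L
```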
	83
	84
	85	Pickle protocols:
	86
	87	For compatibility, the meaning of a pickle opcode never changes. Instead new
	88	pickle opcodes get added, and each version's unpickler can handle all the
	89	pickle opcodes in all protocol versions to date. So old pickles continue to
	90	be readable forever. The pickler can generally be told to restrict itself to
	91	the subset of opcodes available under previous protocol versions too, so that
	92	users can create pickles under the current version readable by older
	93	versions. However, a pickle does not contain its version number embedded
	94	within it. If an older unpickler tries to read a pickle using a later
	95	protocol, the result is most likely an exception due to seeing an unknown (in
	96	the older unpickler) opcode.
	97
	98	The original pickle used what's now called "protocol 0", and what was called
	99	"text mode" before Python 2.3. The entire pickle bytestream is made up of
	100	printable 7-bit ASCII characters, plus the newline character, in protocol 0.
	101	That's why it was called text mode. Protocol 0 is small and elegant, but
	102	sometimes painfully inefficient.
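
A quick check of the "printable ASCII plus newline" claim, as a sketch
only. It assumes Python 3's stdlib pickle and a sample object holding
nothing but ASCII strings and small ints; other contents may be escaped
differently.

```python
import pickle

# Assumption: the sample holds only ASCII strings and small ints.
sample = {"name": "spam", "counts": [1, 2, 3]}
text_pickle = pickle.dumps(sample, protocol=0)

# Every byte should be printable 7-bit ASCII or the newline character.
printable = all(32 <= byte < 127 or byte == 10 for byte in text_pickle)
```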
	103
	104	The second major set of additions is now called "protocol 1", and was called
	105	"binary mode" before Python 2.3. This added many opcodes with arguments
	106	consisting of arbitrary bytes, including NUL bytes and unprintable "high bit"
	107	bytes. Binary mode pickles can be substantially smaller than equivalent
	108	text mode pickles, and sometimes faster too; e.g., BININT represents a 4-byte
	109	int as 4 bytes following the opcode, which is cheaper to unpickle than the
	110	(perhaps) 11-character decimal string attached to INT. Protocol 1 also added
	111	a number of opcodes that operate on many stack elements at once (like APPENDS
	112	and SETITEMS), and "shortcut" opcodes (like EMPTY_DICT and EMPTY_TUPLE).
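
The INT versus BININT comparison above can be sketched by writing both
forms by hand. This assumes Python 3's stdlib pickle and struct; a STOP
opcode is appended to each form so that it is a complete pickle.

```python
import pickle
import struct

value = -2**31  # its decimal string is 11 characters long

# Protocol 0: INT opcode, decimal digits, newline, then STOP.
text_form = b"I" + str(value).encode("ascii") + b"\n" + b"."

# Protocol 1: BININT opcode, 4 little-endian bytes, then STOP.
binary_form = b"J" + struct.pack("<i", value) + b"."

same_value = pickle.loads(text_form) == pickle.loads(binary_form) == value
sizes = (len(text_form), len(binary_form))
```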
	113
	114	The third major set of additions came in Python 2.3, and is called "protocol
	115	2". This added:
	116
	117	- A better way to pickle instances of new-style classes (NEWOBJ).
	118
	119	- A way for a pickle to identify its protocol (PROTO).
	120
	121	- Time- and space-efficient pickling of long ints (LONG{1,4}).
	122
	123	- Shortcuts for small tuples (TUPLE{1,2,3}).
	124
	125	- Dedicated opcodes for bools (NEWTRUE, NEWFALSE).
	126
	127	- The "extension registry", a vector of popular objects that can be pushed
	128	efficiently by index (EXT{1,2,4}). This is akin to the memo and GET, but
	129	the registry contents are predefined (there's nothing akin to the memo's
	130	PUT).
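
Several of the protocol 2 additions listed above show up in even a tiny
pickle. A sketch, assuming Python 3's stdlib pickle and pickletools; the
sample tuple is arbitrary, and the last line is the "protocol identifier"
idea computed from the .proto attribute of each opcode.

```python
import pickle
import pickletools

# A bool and an int too large for 4 bytes, wrapped in a 2-tuple.
data = pickle.dumps((True, 2**70), protocol=2)

names = [op.name for op, arg, pos in pickletools.genops(data)]

# Highest .proto value among the opcodes actually used.
highest = max(op.proto for op, arg, pos in pickletools.genops(data))
```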
	131
	132	Another independent change with Python 2.3 is the abandonment of any
	133	pretense that it might be safe to load pickles received from untrusted
	134	parties -- no sufficient security analysis has been done to guarantee
	135	this and there isn't a use case that warrants the expense of such an
	136	analysis.
	137
	138	To this end, all tests for __safe_for_unpickling__ or for
	139	copy_reg.safe_constructors are removed from the unpickling code.
	140	References to these variables in the descriptions below are to be seen
	141	as describing unpickling in Python 2.2 and before.
	142	"""
	143
	144	# Meta-rule: Descriptions are stored in instances of descriptor objects,
	145	# with plain constructors. No meta-language is defined from which
	146	# descriptors could be constructed. If you want, e.g., XML, write a little
	147	# program to generate XML from the objects.
	148
	149	##############################################################################
	150	# Some pickle opcodes have an argument, following the opcode in the
	151	# bytestream. An argument is of a specific type, described by an instance
	152	# of ArgumentDescriptor. These are not to be confused with arguments taken
	153	# off the stack -- ArgumentDescriptor applies only to arguments embedded in
	154	# the opcode stream, immediately following an opcode.
	155
	156	# Represents the number of bytes consumed by an argument delimited by the
	157	# next newline character.
	158	UP_TO_NEWLINE = -1
	159
	160	# Represents the number of bytes consumed by a two-argument opcode where
	161	# the first argument gives the number of bytes in the second argument.
	162	TAKEN_FROM_ARGUMENT1 = -2 # num bytes is 1-byte unsigned int
	163	TAKEN_FROM_ARGUMENT4 = -3 # num bytes is 4-byte signed little-endian int
	164
	165	class ArgumentDescriptor(object):
	166	__slots__ = (
	167	# name of descriptor record, also a module global name; a string
	168	'name',
	169
	170	# length of argument, in bytes; an int; UP_TO_NEWLINE and
	171	# TAKEN_FROM_ARGUMENT{1,4} are negative values for variable-length
	172	# cases
	173	'n',
	174
	175	# a function taking a file-like object, reading this kind of argument
	176	# from the object at the current position, advancing the current
	177	# position by n bytes, and returning the value of the argument
	178	'reader',
	179
	180	# human-readable docs for this arg descriptor; a string
	181	'doc',
	182	)
	183
	184	def __init__(self, name, n, reader, doc):
	185	assert isinstance(name, str)
	186	self.name = name
	187
	188	assert isinstance(n, int) and (n >= 0 or
	189	n in (UP_TO_NEWLINE,
	190	TAKEN_FROM_ARGUMENT1,
	191	TAKEN_FROM_ARGUMENT4))
	192	self.n = n
	193
	194	self.reader = reader
	195
	196	assert isinstance(doc, str)
	197	self.doc = doc
	198
	199	from struct import unpack as _unpack
	200
	201	def read_uint1(f):
	202	r"""
	203	>>> import StringIO
	204	>>> read_uint1(StringIO.StringIO('\xff'))
	205	255
	206	"""
	207
	208	data = f.read(1)
	209	if data:
	210	return ord(data)
	211	raise ValueError("not enough data in stream to read uint1")
	212
	213	uint1 = ArgumentDescriptor(
	214	name='uint1',
	215	n=1,
	216	reader=read_uint1,
	217	doc="One-byte unsigned integer.")
	218
	219
	220	def read_uint2(f):
	221	r"""
	222	>>> import StringIO
	223	>>> read_uint2(StringIO.StringIO('\xff\x00'))
	224	255
	225	>>> read_uint2(StringIO.StringIO('\xff\xff'))
	226	65535
	227	"""
	228
	229	data = f.read(2)
	230	if len(data) == 2:
	231	return _unpack("<H", data)[0]
	232	raise ValueError("not enough data in stream to read uint2")
	233
	234	uint2 = ArgumentDescriptor(
	235	name='uint2',
	236	n=2,
	237	reader=read_uint2,
	238	doc="Two-byte unsigned integer, little-endian.")
	239
	240
	241	def read_int4(f):
	242	r"""
	243	>>> import StringIO
	244	>>> read_int4(StringIO.StringIO('\xff\x00\x00\x00'))
	245	255
	246	>>> read_int4(StringIO.StringIO('\x00\x00\x00\x80')) == -(2**31)
	247	True
	248	"""
	249
	250	data = f.read(4)
	251	if len(data) == 4:
	252	return _unpack("<i", data)[0]
	253	raise ValueError("not enough data in stream to read int4")
	254
	255	int4 = ArgumentDescriptor(
	256	name='int4',
	257	n=4,
	258	reader=read_int4,
	259	doc="Four-byte signed integer, little-endian, 2's complement.")
	260
	261
	262	def read_stringnl(f, decode=True, stripquotes=True):
	263	r"""
	264	>>> import StringIO
	265	>>> read_stringnl(StringIO.StringIO("'abcd'\nefg\n"))
	266	'abcd'
	267
	268	>>> read_stringnl(StringIO.StringIO("\n"))
	269	Traceback (most recent call last):
	270	...
	271	ValueError: no string quotes around ''
	272
	273	>>> read_stringnl(StringIO.StringIO("\n"), stripquotes=False)
	274	''
	275
	276	>>> read_stringnl(StringIO.StringIO("''\n"))
	277	''
	278
	279	>>> read_stringnl(StringIO.StringIO('"abcd"'))
	280	Traceback (most recent call last):
	281	...
	282	ValueError: no newline found when trying to read stringnl
	283
	284	Embedded escapes are undone in the result.
	285	>>> read_stringnl(StringIO.StringIO(r"'a\n\\b\x00c\td'" + "\n'e'"))
	286	'a\n\\b\x00c\td'
	287	"""
	288
	289	data = f.readline()
	290	if not data.endswith('\n'):
	291	raise ValueError("no newline found when trying to read stringnl")
	292	data = data[:-1] # lose the newline
	293
	294	if stripquotes:
	295	for q in "'\"":
	296	if data.startswith(q):
	297	if not data.endswith(q):
	298	raise ValueError("string quote %r not found at both "
	299	"ends of %r" % (q, data))
	300	data = data[1:-1]
	301	break
	302	else:
	303	raise ValueError("no string quotes around %r" % data)
	304
	305	# I'm not sure when 'string_escape' was added to the std codecs; it's
	306	# crazy not to use it if it's there.
	307	if decode:
	308	data = data.decode('string_escape')
	309	return data
	310
	311	stringnl = ArgumentDescriptor(
	312	name='stringnl',
	313	n=UP_TO_NEWLINE,
	314	reader=read_stringnl,
	315	doc="""A newline-terminated string.
	316
	317	This is a repr-style string, with embedded escapes, and
	318	bracketing quotes.
	319	""")
	320
	321	def read_stringnl_noescape(f):
	322	return read_stringnl(f, decode=False, stripquotes=False)
	323
	324	stringnl_noescape = ArgumentDescriptor(
	325	name='stringnl_noescape',
	326	n=UP_TO_NEWLINE,
	327	reader=read_stringnl_noescape,
	328	doc="""A newline-terminated string.
	329
	330	This is a str-style string, without embedded escapes,
	331	or bracketing quotes. It should consist solely of
	332	printable ASCII characters.
	333	""")
	334
	335	def read_stringnl_noescape_pair(f):
	336	r"""
	337	>>> import StringIO
	338	>>> read_stringnl_noescape_pair(StringIO.StringIO("Queue\nEmpty\njunk"))
	339	'Queue Empty'
	340	"""
	341
	342	return "%s %s" % (read_stringnl_noescape(f), read_stringnl_noescape(f))
	343
	344	stringnl_noescape_pair = ArgumentDescriptor(
	345	name='stringnl_noescape_pair',
	346	n=UP_TO_NEWLINE,
	347	reader=read_stringnl_noescape_pair,
	348	doc="""A pair of newline-terminated strings.
	349
	350	These are str-style strings, without embedded
	351	escapes, or bracketing quotes. They should
	352	consist solely of printable ASCII characters.
	353	The pair is returned as a single string, with
	354	a single blank separating the two strings.
	355	""")
	356
	357	def read_string4(f):
	358	r"""
	359	>>> import StringIO
	360	>>> read_string4(StringIO.StringIO("\x00\x00\x00\x00abc"))
	361	''
	362	>>> read_string4(StringIO.StringIO("\x03\x00\x00\x00abcdef"))
	363	'abc'
	364	>>> read_string4(StringIO.StringIO("\x00\x00\x00\x03abcdef"))
	365	Traceback (most recent call last):
	366	...
	367	ValueError: expected 50331648 bytes in a string4, but only 6 remain
	368	"""
	369
	370	n = read_int4(f)
	371	if n < 0:
	372	raise ValueError("string4 byte count < 0: %d" % n)
	373	data = f.read(n)
	374	if len(data) == n:
	375	return data
	376	raise ValueError("expected %d bytes in a string4, but only %d remain" %
	377	(n, len(data)))
	378
	379	string4 = ArgumentDescriptor(
	380	name="string4",
	381	n=TAKEN_FROM_ARGUMENT4,
	382	reader=read_string4,
	383	doc="""A counted string.
	384
	385	The first argument is a 4-byte little-endian signed int giving
	386	the number of bytes in the string, and the second argument is
	387	that many bytes.
	388	""")
	389
	390
	391	def read_string1(f):
	392	r"""
	393	>>> import StringIO
	394	>>> read_string1(StringIO.StringIO("\x00"))
	395	''
	396	>>> read_string1(StringIO.StringIO("\x03abcdef"))
	397	'abc'
	398	"""
	399
	400	n = read_uint1(f)
	401	assert n >= 0
	402	data = f.read(n)
	403	if len(data) == n:
	404	return data
	405	raise ValueError("expected %d bytes in a string1, but only %d remain" %
	406	(n, len(data)))
	407
	408	string1 = ArgumentDescriptor(
	409	name="string1",
	410	n=TAKEN_FROM_ARGUMENT1,
	411	reader=read_string1,
	412	doc="""A counted string.
	413
	414	The first argument is a 1-byte unsigned int giving the number
	415	of bytes in the string, and the second argument is that many
	416	bytes.
	417	""")
	418
	419
	420	def read_unicodestringnl(f):
	421	r"""
	422	>>> import StringIO
	423	>>> read_unicodestringnl(StringIO.StringIO("abc\uabcd\njunk"))
	424	u'abc\uabcd'
	425	"""
	426
	427	data = f.readline()
	428	if not data.endswith('\n'):
	429	raise ValueError("no newline found when trying to read "
	430	"unicodestringnl")
	431	data = data[:-1] # lose the newline
	432	return unicode(data, 'raw-unicode-escape')
	433
	434	unicodestringnl = ArgumentDescriptor(
	435	name='unicodestringnl',
	436	n=UP_TO_NEWLINE,
	437	reader=read_unicodestringnl,
	438	doc="""A newline-terminated Unicode string.
	439
	440	This is raw-unicode-escape encoded, so consists of
	441	printable ASCII characters, and may contain embedded
	442	escape sequences.
	443	""")
	444
	445	def read_unicodestring4(f):
	446	r"""
	447	>>> import StringIO
	448	>>> s = u'abcd\uabcd'
	449	>>> enc = s.encode('utf-8')
	450	>>> enc
	451	'abcd\xea\xaf\x8d'
	452	>>> n = chr(len(enc)) + chr(0) * 3 # little-endian 4-byte length
	453	>>> t = read_unicodestring4(StringIO.StringIO(n + enc + 'junk'))
	454	>>> s == t
	455	True
	456
	457	>>> read_unicodestring4(StringIO.StringIO(n + enc[:-1]))
	458	Traceback (most recent call last):
	459	...
	460	ValueError: expected 7 bytes in a unicodestring4, but only 6 remain
	461	"""
	462
	463	n = read_int4(f)
	464	if n < 0:
	465	raise ValueError("unicodestring4 byte count < 0: %d" % n)
	466	data = f.read(n)
	467	if len(data) == n:
	468	return unicode(data, 'utf-8')
	469	raise ValueError("expected %d bytes in a unicodestring4, but only %d "
	470	"remain" % (n, len(data)))
	471
	472	unicodestring4 = ArgumentDescriptor(
	473	name="unicodestring4",
	474	n=TAKEN_FROM_ARGUMENT4,
	475	reader=read_unicodestring4,
	476	doc="""A counted Unicode string.
	477
	478	The first argument is a 4-byte little-endian signed int
	479	giving the number of bytes in the string, and the second
	480	argument -- the UTF-8 encoding of the Unicode string --
	481	contains that many bytes.
	482	""")
	483
	484
	485	def read_decimalnl_short(f):
	486	r"""
	487	>>> import StringIO
	488	>>> read_decimalnl_short(StringIO.StringIO("1234\n56"))
	489	1234
	490
	491	>>> read_decimalnl_short(StringIO.StringIO("1234L\n56"))
	492	Traceback (most recent call last):
	493	...
	494	ValueError: trailing 'L' not allowed in '1234L'
	495	"""
	496
	497	s = read_stringnl(f, decode=False, stripquotes=False)
	498	if s.endswith("L"):
	499	raise ValueError("trailing 'L' not allowed in %r" % s)
	500
	501	# It's not necessarily true that the result fits in a Python short int:
	502	# the pickle may have been written on a 64-bit box. There's also a hack
	503	# for True and False here.
	504	if s == "00":
	505	return False
	506	elif s == "01":
	507	return True
	508
	509	try:
	510	return int(s)
	511	except OverflowError:
	512	return long(s)
	513
	514	def read_decimalnl_long(f):
	515	r"""
	516	>>> import StringIO
	517
	518	>>> read_decimalnl_long(StringIO.StringIO("1234\n56"))
	519	Traceback (most recent call last):
	520	...
	521	ValueError: trailing 'L' required in '1234'
	522
	523	Someday the trailing 'L' will probably go away from this output.
	524
	525	>>> read_decimalnl_long(StringIO.StringIO("1234L\n56"))
	526	1234L
	527
	528	>>> read_decimalnl_long(StringIO.StringIO("123456789012345678901234L\n6"))
	529	123456789012345678901234L
	530	"""
	531
	532	s = read_stringnl(f, decode=False, stripquotes=False)
	533	if not s.endswith("L"):
	534	raise ValueError("trailing 'L' required in %r" % s)
	535	return long(s)
	536
	537
	538	decimalnl_short = ArgumentDescriptor(
	539	name='decimalnl_short',
	540	n=UP_TO_NEWLINE,
	541	reader=read_decimalnl_short,
	542	doc="""A newline-terminated decimal integer literal.
	543
	544	This never has a trailing 'L', and the integer fit
	545	in a short Python int on the box where the pickle
	546	was written -- but there's no guarantee it will fit
	547	in a short Python int on the box where the pickle
	548	is read.
	549	""")
	550
	551	decimalnl_long = ArgumentDescriptor(
	552	name='decimalnl_long',
	553	n=UP_TO_NEWLINE,
	554	reader=read_decimalnl_long,
	555	doc="""A newline-terminated decimal integer literal.
	556
	557	This has a trailing 'L', and can represent integers
	558	of any size.
	559	""")
	560
	561
	562	def read_floatnl(f):
	563	r"""
	564	>>> import StringIO
	565	>>> read_floatnl(StringIO.StringIO("-1.25\n6"))
	566	-1.25
	567	"""
	568	s = read_stringnl(f, decode=False, stripquotes=False)
	569	return float(s)
	570
	571	floatnl = ArgumentDescriptor(
	572	name='floatnl',
	573	n=UP_TO_NEWLINE,
	574	reader=read_floatnl,
	575	doc="""A newline-terminated decimal floating literal.
	576
	577	In general this requires 17 significant digits for roundtrip
	578	identity, and pickling then unpickling infinities, NaNs, and
	579	minus zero doesn't work across boxes, or on some boxes even
	580	on itself (e.g., Windows can't read the strings it produces
	581	for infinities or NaNs).
	582	""")
	583
	584	def read_float8(f):
	585	r"""
	586	>>> import StringIO, struct
	587	>>> raw = struct.pack(">d", -1.25)
	588	>>> raw
	589	'\xbf\xf4\x00\x00\x00\x00\x00\x00'
	590	>>> read_float8(StringIO.StringIO(raw + "\n"))
	591	-1.25
	592	"""
	593
	594	data = f.read(8)
	595	if len(data) == 8:
	596	return _unpack(">d", data)[0]
	597	raise ValueError("not enough data in stream to read float8")
	598
	599
	600	float8 = ArgumentDescriptor(
	601	name='float8',
	602	n=8,
	603	reader=read_float8,
	604	doc="""An 8-byte binary representation of a float, big-endian.
	605
	606	The format is unique to Python, and shared with the struct
	607	module (format string '>d') "in theory" (the struct and cPickle
	608	implementations don't share the code -- they should). It's
	609	strongly related to the IEEE-754 double format, and, in normal
	610	cases, is in fact identical to the big-endian 754 double format.
	611	On other boxes the dynamic range is limited to that of a 754
	612	double, and "add a half and chop" rounding is used to reduce
	613	the precision to 53 bits. However, even on a 754 box,
	614	infinities, NaNs, and minus zero may not be handled correctly
	615	(may not survive roundtrip pickling intact).
	616	""")
	617
	618	# Protocol 2 formats
	619
	620	from pickle import decode_long
	621
	622	def read_long1(f):
	623	r"""
	624	>>> import StringIO
	625	>>> read_long1(StringIO.StringIO("\x00"))
	626	0L
	627	>>> read_long1(StringIO.StringIO("\x02\xff\x00"))
	628	255L
	629	>>> read_long1(StringIO.StringIO("\x02\xff\x7f"))
	630	32767L
	631	>>> read_long1(StringIO.StringIO("\x02\x00\xff"))
	632	-256L
	633	>>> read_long1(StringIO.StringIO("\x02\x00\x80"))
	634	-32768L
	635	"""
	636
	637	n = read_uint1(f)
	638	data = f.read(n)
	639	if len(data) != n:
	640	raise ValueError("not enough data in stream to read long1")
	641	return decode_long(data)
	642
	643	long1 = ArgumentDescriptor(
	644	name="long1",
	645	n=TAKEN_FROM_ARGUMENT1,
	646	reader=read_long1,
	647	doc="""A binary long, little-endian, using 1-byte size.
	648
	649	This first reads one byte as an unsigned size, then reads that
	650	many bytes and interprets them as a little-endian 2's-complement long.
	651	If the size is 0, that's taken as a shortcut for the long 0L.
	652	""")
	653
	654	def read_long4(f):
	655	r"""
	656	>>> import StringIO
	657	>>> read_long4(StringIO.StringIO("\x02\x00\x00\x00\xff\x00"))
	658	255L
	659	>>> read_long4(StringIO.StringIO("\x02\x00\x00\x00\xff\x7f"))
	660	32767L
	661	>>> read_long4(StringIO.StringIO("\x02\x00\x00\x00\x00\xff"))
	662	-256L
	663	>>> read_long4(StringIO.StringIO("\x02\x00\x00\x00\x00\x80"))
	664	-32768L
	665	>>> read_long4(StringIO.StringIO("\x00\x00\x00\x00"))
	666	0L
	667	"""
	668
	669	n = read_int4(f)
	670	if n < 0:
	671	raise ValueError("long4 byte count < 0: %d" % n)
	672	data = f.read(n)
	673	if len(data) != n:
	674	raise ValueError("not enough data in stream to read long4")
	675	return decode_long(data)
	676
	677	long4 = ArgumentDescriptor(
	678	name="long4",
	679	n=TAKEN_FROM_ARGUMENT4,
	680	reader=read_long4,
	681	doc="""A binary representation of a long, little-endian.
	682
	683	This first reads four bytes as a signed size (but requires the
	684	size to be >= 0), then reads that many bytes and interprets them
	685	as a little-endian 2's-complement long. If the size is 0, that's taken
	686	as a shortcut for the long 0L, although LONG1 should really be used
	687	then instead (and in any case where # of bytes < 256).
	688	""")
	689
	690
	691	##############################################################################
	692	# Object descriptors. The stack used by the pickle machine holds objects,
	693	# and in the stack_before and stack_after attributes of OpcodeInfo
	694	# descriptors we need names to describe the various types of objects that can
	695	# appear on the stack.
	696
	697	class StackObject(object):
	698	__slots__ = (
	699	# name of descriptor record, for info only
	700	'name',
	701
	702	# type of object, or tuple of type objects (meaning the object can
	703	# be of any type in the tuple)
	704	'obtype',
	705
	706	# human-readable docs for this kind of stack object; a string
	707	'doc',
	708	)
	709
	710	def __init__(self, name, obtype, doc):
	711	assert isinstance(name, str)
	712	self.name = name
	713
	714	assert isinstance(obtype, type) or isinstance(obtype, tuple)
	715	if isinstance(obtype, tuple):
	716	for contained in obtype:
	717	assert isinstance(contained, type)
	718	self.obtype = obtype
	719
	720	assert isinstance(doc, str)
	721	self.doc = doc
	722
	723	def __repr__(self):
	724	return self.name
	725
	726
	727	pyint = StackObject(
	728	name='int',
	729	obtype=int,
	730	doc="A short (as opposed to long) Python integer object.")
	731
	732	pylong = StackObject(
	733	name='long',
	734	obtype=long,
	735	doc="A long (as opposed to short) Python integer object.")
	736
	737	pyinteger_or_bool = StackObject(
	738	name='int_or_bool',
	739	obtype=(int, long, bool),
	740	doc="A Python integer object (short or long), or "
	741	"a Python bool.")
	742
	743	pybool = StackObject(
	744	name='bool',
	745	obtype=(bool,),
	746	doc="A Python bool object.")
	747
	748	pyfloat = StackObject(
	749	name='float',
	750	obtype=float,
	751	doc="A Python float object.")
	752
	753	pystring = StackObject(
	754	name='str',
	755	obtype=str,
	756	doc="A Python string object.")
	757
	758	pyunicode = StackObject(
	759	name='unicode',
	760	obtype=unicode,
	761	doc="A Python Unicode string object.")
	762
	763	pynone = StackObject(
	764	name="None",
	765	obtype=type(None),
	766	doc="The Python None object.")
	767
	768	pytuple = StackObject(
	769	name="tuple",
	770	obtype=tuple,
	771	doc="A Python tuple object.")
	772
	773	pylist = StackObject(
	774	name="list",
	775	obtype=list,
	776	doc="A Python list object.")
	777
	778	pydict = StackObject(
	779	name="dict",
	780	obtype=dict,
	781	doc="A Python dict object.")
	782
	783	anyobject = StackObject(
	784	name='any',
	785	obtype=object,
	786	doc="Any kind of object whatsoever.")
	787
	788	markobject = StackObject(
	789	name="mark",
	790	obtype=StackObject,
	791	doc="""'The mark' is a unique object.
	792
	793	Opcodes that operate on a variable number of objects
	794	generally don't embed the count of objects in the opcode,
	795	or pull it off the stack. Instead the MARK opcode is used
	796	to push a special marker object on the stack, and then
	797	some other opcodes grab all the objects from the top of
	798	the stack down to (but not including) the topmost marker
	799	object.
	800	""")
	801
	802	stackslice = StackObject(
	803	name="stackslice",
	804	obtype=StackObject,
	805	doc="""An object representing a contiguous slice of the stack.
	806
	807	This is used in conjunction with markobject, to represent all
	808	of the stack following the topmost markobject. For example,
	809	the POP_MARK opcode changes the stack from
	810
	811	[..., markobject, stackslice]
	812	to
	813	[...]
	814
	815	No matter how many objects are on the stack after the topmost
	816	markobject, POP_MARK gets rid of all of them (including the
	817	topmost markobject too).
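
A sketch of the mark in action, assuming Python 3's stdlib pickle, which
still accepts these hand-written protocol 0 and 1 opcodes. The byte
strings are illustrative, not something a pickler would normally emit.

```python
import pickle

# MARK "(" pushes the marker; TUPLE "t" collects everything above it.
built = pickle.loads(b"(I1\nI2\nI3\nt.")

# POP_MARK "1" discards the marker and everything above it, so the 7
# pushed before the MARK is what STOP returns.
left = pickle.loads(b"I7\n(I1\nI2\n1.")
```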
	818	""")
	819
	820	##############################################################################
	821	# Descriptors for pickle opcodes.
	822
	823	class OpcodeInfo(object):
	824
	825	__slots__ = (
	826	# symbolic name of opcode; a string
	827	'name',
	828
	829	# the code used in a bytestream to represent the opcode; a
	830	# one-character string
	831	'code',
	832
	833	# If the opcode has an argument embedded in the byte string, an
	834	# instance of ArgumentDescriptor specifying its type. Note that
	835	# arg.reader(s) can be used to read and decode the argument from
	836	# the bytestream s, and arg.doc documents the format of the raw
	837	# argument bytes. If the opcode doesn't have an argument embedded
	838	# in the bytestream, arg should be None.
	839	'arg',
	840
	841	# what the stack looks like before this opcode runs; a list
	842	'stack_before',
	843
	844	# what the stack looks like after this opcode runs; a list
	845	'stack_after',
	846
	847	# the protocol number in which this opcode was introduced; an int
	848	'proto',
	849
	850	# human-readable docs for this opcode; a string
	851	'doc',
	852	)
	853
	854	def __init__(self, name, code, arg,
	855	stack_before, stack_after, proto, doc):
	856	assert isinstance(name, str)
	857	self.name = name
	858
	859	assert isinstance(code, str)
	860	assert len(code) == 1
	861	self.code = code
	862
	863	assert arg is None or isinstance(arg, ArgumentDescriptor)
	864	self.arg = arg
	865
	866	assert isinstance(stack_before, list)
	867	for x in stack_before:
	868	assert isinstance(x, StackObject)
	869	self.stack_before = stack_before
	870
	871	assert isinstance(stack_after, list)
	872	for x in stack_after:
	873	assert isinstance(x, StackObject)
	874	self.stack_after = stack_after
	875
	876	assert isinstance(proto, int) and 0 <= proto <= 2
	877	self.proto = proto
	878
	879	assert isinstance(doc, str)
	880	self.doc = doc
	881
	882	I = OpcodeInfo
	883	opcodes = [
	884
	885	# Ways to spell integers.
	886
	887	I(name='INT',
	888	code='I',
	889	arg=decimalnl_short,
	890	stack_before=[],
	891	stack_after=[pyinteger_or_bool],
	892	proto=0,
	893	doc="""Push an integer or bool.
	894
	895	The argument is a newline-terminated decimal literal string.
	896
	897	The intent may have been that this always fit in a short Python int,
	898	but INT can be generated in pickles written on a 64-bit box that
	899	require a Python long on a 32-bit box. The difference between this
	900	and LONG then is that INT skips a trailing 'L', and produces a short
	901	int whenever possible.
	902
	903	Another difference arises because, when bool was introduced as a
	904	distinct type in 2.3, builtin names True and False were also added to
	905	2.2.2, mapping to ints 1 and 0. For compatibility in both directions,
	906	True gets pickled as INT + "01\\n", and False as INT + "00\\n".
	907	Leading zeroes are never produced for a genuine integer. The 2.3
	908	(and later) unpicklers special-case these and return bool instead;
	909	earlier unpicklers ignore the leading "0" and return the int.
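
The bool special case can be sketched as follows, assuming Python 3's
stdlib pickle, whose protocol 0 pickler and unpickler keep this
convention; the variable names are illustrative.

```python
import pickle

# Under protocol 0, True is written as INT with the argument "01".
true_pickle = pickle.dumps(True, protocol=0)

# The leading zero is what distinguishes a bool from a genuine integer.
as_bool = pickle.loads(b"I01\n.")
as_int = pickle.loads(b"I1\n.")
```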
	910	"""),
	911
	912	I(name='BININT',
	913	code='J',
	914	arg=int4,
	915	stack_before=[],
	916	stack_after=[pyint],
	917	proto=1,
	918	doc="""Push a four-byte signed integer.
	919
	920	This handles the full range of Python (short) integers on a 32-bit
	921	box, directly as binary bytes (1 for the opcode and 4 for the integer).
	922	If the integer is non-negative and fits in 1 or 2 bytes, pickling via
	923	BININT1 or BININT2 saves space.
	924	"""),
	925
	926	I(name='BININT1',
	927	code='K',
	928	arg=uint1,
	929	stack_before=[],
	930	stack_after=[pyint],
	931	proto=1,
	932	doc="""Push a one-byte unsigned integer.
	933
	934	This is a space optimization for pickling very small non-negative ints,
	935	in range(256).
	936	"""),
	937
	938	I(name='BININT2',
	939	code='M',
	940	arg=uint2,
	941	stack_before=[],
	942	stack_after=[pyint],
	943	proto=1,
	944	doc="""Push a two-byte unsigned integer.
	945
	946	This is a space optimization for pickling small positive ints, in
	947	range(256, 2**16). Integers in range(256) can also be pickled via
	948	BININT2, but BININT1 instead saves a byte.
	949	"""),
	950
	951	I(name='LONG',
	952	code='L',
	953	arg=decimalnl_long,
	954	stack_before=[],
	955	stack_after=[pylong],
	956	proto=0,
	957	doc="""Push a long integer.
	958
	959	The same as INT, except that the literal ends with 'L', and always
	960	unpickles to a Python long. There doesn't seem to be a real purpose to the
	961	trailing 'L'.
	962
	963	Note that LONG takes time quadratic in the number of digits when
	964	unpickling (this is simply due to the nature of decimal->binary
	965	conversion). Proto 2 added linear-time (in C; still quadratic-time
	966	in Python) LONG1 and LONG4 opcodes.
	967	"""),
	968
	969	I(name="LONG1",
	970	code='\x8a',
	971	arg=long1,
	972	stack_before=[],
	973	stack_after=[pylong],
	974	proto=2,
	975	doc="""Long integer using one-byte length.
	976
	977	A more efficient encoding of a Python long; the long1 encoding
	978	says it all."""),
	979
	980	I(name="LONG4",
	981	code='\x8b',
	982	arg=long4,
	983	stack_before=[],
	984	stack_after=[pylong],
	985	proto=2,
	986	doc="""Long integer using four-byte length.
	987
	988	A more efficient encoding of a Python long; the long4 encoding
	989	says it all."""),
	990
	991	# Ways to spell strings (8-bit, not Unicode).
	992
	993	I(name='STRING',
	994	code='S',
	995	arg=stringnl,
	996	stack_before=[],
	997	stack_after=[pystring],
	998	proto=0,
	999	doc="""Push a Python string object.
	1000
	1001	The argument is a repr-style string, with bracketing quote characters,
	1002	and perhaps embedded escapes. The argument extends until the next
	1003	newline character.
	1004	"""),
	1005
	1006	I(name='BINSTRING',
	1007	code='T',
	1008	arg=string4,
	1009	stack_before=[],
	1010	stack_after=[pystring],
	1011	proto=1,
	1012	doc="""Push a Python string object.
	1013
	1014	There are two arguments: the first is a 4-byte little-endian signed int
	1015	giving the number of bytes in the string, and the second is that many
	1016	bytes, which are taken literally as the string content.
	1017	"""),
	1018
	1019	I(name='SHORT_BINSTRING',
	1020	code='U',
	1021	arg=string1,
	1022	stack_before=[],
	1023	stack_after=[pystring],
	1024	proto=1,
	1025	doc="""Push a Python string object.
	1026
	1027	There are two arguments: the first is a 1-byte unsigned int giving
	1028	the number of bytes in the string, and the second is that many bytes,
	1029	which are taken literally as the string content.
	1030	"""),
	1031
	1032	# Ways to spell None.
	1033
	1034	I(name='NONE',
	1035	code='N',
	1036	arg=None,
	1037	stack_before=[],
	1038	stack_after=[pynone],
	1039	proto=0,
	1040	doc="Push None on the stack."),
	1041
	1042	# Ways to spell bools, starting with proto 2. See INT for how this was
	1043	# done before proto 2.
	1044
	1045	I(name='NEWTRUE',
	1046	code='\x88',
	1047	arg=None,
	1048	stack_before=[],
	1049	stack_after=[pybool],
	1050	proto=2,
	1051	doc="""True.
	1052
	1053	Push True onto the stack."""),
	1054
	1055	I(name='NEWFALSE',
	1056	code='\x89',
	1057	arg=None,
	1058	stack_before=[],
	1059	stack_after=[pybool],
	1060	proto=2,
	1061	doc="""False.
	1062
	1063	Push False onto the stack."""),
	1064
	1065	# Ways to spell Unicode strings.
	1066
	1067	I(name='UNICODE',
	1068	code='V',
	1069	arg=unicodestringnl,
	1070	stack_before=[],
	1071	stack_after=[pyunicode],
	1072	proto=0, # this may be pure-text, but it's a later addition
	1073	doc="""Push a Python Unicode string object.
	1074
	1075	The argument is a raw-unicode-escape encoding of a Unicode string,
	1076	and so may contain embedded escape sequences. The argument extends
	1077	until the next newline character.
	1078	"""),
	1079
	1080	I(name='BINUNICODE',
	1081	code='X',
	1082	arg=unicodestring4,
	1083	stack_before=[],
	1084	stack_after=[pyunicode],
	1085	proto=1,
	1086	doc="""Push a Python Unicode string object.
	1087
	1088	There are two arguments: the first is a 4-byte little-endian signed int
	1089	giving the number of bytes in the string. The second is that many
	1090	bytes, and is the UTF-8 encoding of the Unicode string.
	1091	"""),
	1092
	1093	# Ways to spell floats.
	1094
	1095	I(name='FLOAT',
	1096	code='F',
	1097	arg=floatnl,
	1098	stack_before=[],
	1099	stack_after=[pyfloat],
	1100	proto=0,
	1101	doc="""Newline-terminated decimal float literal.
	1102
	1103	The argument is repr(a_float), and in general requires 17 significant
	1104	digits for roundtrip conversion to be an identity (this is so for
	1105	IEEE-754 double precision values, which is what Python float maps to
	1106	on most boxes).
	1107
	1108	In general, FLOAT cannot be used to transport infinities, NaNs, or
	1109	minus zero across boxes (or even on a single box, if the platform C
	1110	library can't read the strings it produces for such things -- Windows
	1111	is like that), but may do less damage than BINFLOAT on boxes with
	1112	greater precision or dynamic range than IEEE-754 double.
	1113	"""),
	1114
	1115	I(name='BINFLOAT',
	1116	code='G',
	1117	arg=float8,
	1118	stack_before=[],
	1119	stack_after=[pyfloat],
	1120	proto=1,
	1121	doc="""Float stored in binary form, with 8 bytes of data.
	1122
	1123	This generally requires less than half the space of FLOAT encoding.
	1124	In general, BINFLOAT cannot be used to transport infinities, NaNs, or
	1125	minus zero, raises an exception if the exponent exceeds the range of
	1126	an IEEE-754 double, and retains no more than 53 bits of precision (if
	1127	there are more than that, "add a half and chop" rounding is used to
	1128	cut it back to 53 significant bits).
	1129	"""),
	1130
	1131	# Ways to build lists.
	1132
	1133	I(name='EMPTY_LIST',
	1134	code=']',
	1135	arg=None,
	1136	stack_before=[],
	1137	stack_after=[pylist],
	1138	proto=1,
	1139	doc="Push an empty list."),
	1140
	1141	I(name='APPEND',
	1142	code='a',
	1143	arg=None,
	1144	stack_before=[pylist, anyobject],
	1145	stack_after=[pylist],
	1146	proto=0,
	1147	doc="""Append an object to a list.
	1148
	1149	Stack before: ... pylist anyobject
	1150	Stack after: ... pylist+[anyobject]
	1151
	1152	although pylist is really extended in-place.
	1153	"""),
	1154
	1155	I(name='APPENDS',
	1156	code='e',
	1157	arg=None,
	1158	stack_before=[pylist, markobject, stackslice],
	1159	stack_after=[pylist],
	1160	proto=1,
	1161	doc="""Extend a list by a slice of stack objects.
	1162
	1163	Stack before: ... pylist markobject stackslice
	1164	Stack after: ... pylist+stackslice
	1165
	1166	although pylist is really extended in-place.
	1167	"""),
	1168
	1169	I(name='LIST',
	1170	code='l',
	1171	arg=None,
	1172	stack_before=[markobject, stackslice],
	1173	stack_after=[pylist],
	1174	proto=0,
	1175	doc="""Build a list out of the topmost stack slice, after markobject.
	1176
	1177	All the stack entries following the topmost markobject are placed into
	1178	a single Python list, which single list object replaces all of the
	1179	stack from the topmost markobject onward. For example,
	1180
	1181	Stack before: ... markobject 1 2 3 'abc'
	1182	Stack after: ... [1, 2, 3, 'abc']
	1183	"""),
	1184
	1185	# Ways to build tuples.
	1186
	1187	I(name='EMPTY_TUPLE',
	1188	code=')',
	1189	arg=None,
	1190	stack_before=[],
	1191	stack_after=[pytuple],
	1192	proto=1,
	1193	doc="Push an empty tuple."),
	1194
	1195	I(name='TUPLE',
	1196	code='t',
	1197	arg=None,
	1198	stack_before=[markobject, stackslice],
	1199	stack_after=[pytuple],
	1200	proto=0,
	1201	doc="""Build a tuple out of the topmost stack slice, after markobject.
	1202
	1203	All the stack entries following the topmost markobject are placed into
	1204	a single Python tuple, which single tuple object replaces all of the
	1205	stack from the topmost markobject onward. For example,
	1206
	1207	Stack before: ... markobject 1 2 3 'abc'
	1208	Stack after: ... (1, 2, 3, 'abc')
	1209	"""),
	1210
	1211	I(name='TUPLE1',
	1212	code='\x85',
	1213	arg=None,
	1214	stack_before=[anyobject],
	1215	stack_after=[pytuple],
	1216	proto=2,
	1217	doc="""Build a one-tuple out of the topmost item on the stack.
	1218
	1219	This code pops one value off the stack and pushes a tuple of
	1220	length 1 whose one item is that value back onto it. In other
	1221	words:
	1222
	1223	stack[-1] = tuple(stack[-1:])
	1224	"""),
	1225
	1226	I(name='TUPLE2',
	1227	code='\x86',
	1228	arg=None,
	1229	stack_before=[anyobject, anyobject],
	1230	stack_after=[pytuple],
	1231	proto=2,
	1232	doc="""Build a two-tuple out of the top two items on the stack.
	1233
	1234	This code pops two values off the stack and pushes a tuple of
	1235	length 2 whose items are those values back onto it. In other
	1236	words:
	1237
	1238	stack[-2:] = [tuple(stack[-2:])]
	1239	"""),
	1240
	1241	I(name='TUPLE3',
	1242	code='\x87',
	1243	arg=None,
	1244	stack_before=[anyobject, anyobject, anyobject],
	1245	stack_after=[pytuple],
	1246	proto=2,
	1247	doc="""Build a three-tuple out of the top three items on the stack.
	1248
	1249	This code pops three values off the stack and pushes a tuple of
	1250	length 3 whose items are those values back onto it. In other
	1251	words:
	1252
	1253	stack[-3:] = [tuple(stack[-3:])]
	1254	"""),
	1255
	1256	# Ways to build dicts.
	1257
	1258	I(name='EMPTY_DICT',
	1259	code='}',
	1260	arg=None,
	1261	stack_before=[],
	1262	stack_after=[pydict],
	1263	proto=1,
	1264	doc="Push an empty dict."),
	1265
	1266	I(name='DICT',
	1267	code='d',
	1268	arg=None,
	1269	stack_before=[markobject, stackslice],
	1270	stack_after=[pydict],
	1271	proto=0,
	1272	doc="""Build a dict out of the topmost stack slice, after markobject.
	1273
	1274	All the stack entries following the topmost markobject are placed into
	1275	a single Python dict, which single dict object replaces all of the
	1276	stack from the topmost markobject onward. The stack slice alternates
	1277	key, value, key, value, .... For example,
	1278
	1279	Stack before: ... markobject 1 2 3 'abc'
	1280	Stack after: ... {1: 2, 3: 'abc'}
	1281	"""),
	1282
	1283	I(name='SETITEM',
	1284	code='s',
	1285	arg=None,
	1286	stack_before=[pydict, anyobject, anyobject],
	1287	stack_after=[pydict],
	1288	proto=0,
	1289	doc="""Add a key+value pair to an existing dict.
	1290
	1291	Stack before: ... pydict key value
	1292	Stack after: ... pydict
	1293
	1294	where pydict has been modified via pydict[key] = value.
	1295	"""),
	1296
	1297	I(name='SETITEMS',
	1298	code='u',
	1299	arg=None,
	1300	stack_before=[pydict, markobject, stackslice],
	1301	stack_after=[pydict],
	1302	proto=1,
	1303	doc="""Add an arbitrary number of key+value pairs to an existing dict.
	1304
	1305	The slice of the stack following the topmost markobject is taken as
	1306	an alternating sequence of keys and values, added to the dict
	1307	immediately under the topmost markobject. Everything at and after the
	1308	topmost markobject is popped, leaving the mutated dict at the top
	1309	of the stack.
	1310
	1311	Stack before: ... pydict markobject key_1 value_1 ... key_n value_n
	1312	Stack after: ... pydict
	1313
	1314	where pydict has been modified via pydict[key_i] = value_i for i in
	1315	1, 2, ..., n, and in that order.
	1316	"""),
	1317
	1318	# Stack manipulation.
	1319
	1320	I(name='POP',
	1321	code='0',
	1322	arg=None,
	1323	stack_before=[anyobject],
	1324	stack_after=[],
	1325	proto=0,
	1326	doc="Discard the top stack item, shrinking the stack by one item."),
	1327
	1328	I(name='DUP',
	1329	code='2',
	1330	arg=None,
	1331	stack_before=[anyobject],
	1332	stack_after=[anyobject, anyobject],
	1333	proto=0,
	1334	doc="Push the top stack item onto the stack again, duplicating it."),
	1335
	1336	I(name='MARK',
	1337	code='(',
	1338	arg=None,
	1339	stack_before=[],
	1340	stack_after=[markobject],
	1341	proto=0,
	1342	doc="""Push markobject onto the stack.
	1343
	1344	markobject is a unique object, used by other opcodes to identify a
	1345	region of the stack containing a variable number of objects for them
	1346	to work on. See markobject.doc for more detail.
	1347	"""),
	1348
	1349	I(name='POP_MARK',
	1350	code='1',
	1351	arg=None,
	1352	stack_before=[markobject, stackslice],
	1353	stack_after=[],
	1354	proto=1,
	1355	doc="""Pop all the stack objects at and above the topmost markobject.
	1356
	1357	When an opcode using a variable number of stack objects is done,
	1358	POP_MARK is used to remove those objects, and to remove the markobject
	1359	that delimited their starting position on the stack.
	1360	"""),
	1361
	1362	# Memo manipulation. There are really only two operations (get and put),
	1363	# each in all-text, "short binary", and "long binary" flavors.
	1364
	1365	I(name='GET',
	1366	code='g',
	1367	arg=decimalnl_short,
	1368	stack_before=[],
	1369	stack_after=[anyobject],
	1370	proto=0,
	1371	doc="""Read an object from the memo and push it on the stack.
	1372
	1373	The index of the memo object to push is given by the newline-terminated
	1374	decimal string following. BINGET and LONG_BINGET are space-optimized
	1375	versions.
	1376	"""),
	1377
	1378	I(name='BINGET',
	1379	code='h',
	1380	arg=uint1,
	1381	stack_before=[],
	1382	stack_after=[anyobject],
	1383	proto=1,
	1384	doc="""Read an object from the memo and push it on the stack.
	1385
	1386	The index of the memo object to push is given by the 1-byte unsigned
	1387	integer following.
	1388	"""),
	1389
	1390	I(name='LONG_BINGET',
	1391	code='j',
	1392	arg=int4,
	1393	stack_before=[],
	1394	stack_after=[anyobject],
	1395	proto=1,
	1396	doc="""Read an object from the memo and push it on the stack.
	1397
	1398	The index of the memo object to push is given by the 4-byte signed
	1399	little-endian integer following.
	1400	"""),
	1401
	1402	I(name='PUT',
	1403	code='p',
	1404	arg=decimalnl_short,
	1405	stack_before=[],
	1406	stack_after=[],
	1407	proto=0,
	1408	doc="""Store the stack top into the memo. The stack is not popped.
	1409
	1410	The index of the memo location to write into is given by the newline-
	1411	terminated decimal string following. BINPUT and LONG_BINPUT are
	1412	space-optimized versions.
	1413	"""),
	1414
	1415	I(name='BINPUT',
	1416	code='q',
	1417	arg=uint1,
	1418	stack_before=[],
	1419	stack_after=[],
	1420	proto=1,
	1421	doc="""Store the stack top into the memo. The stack is not popped.
	1422
	1423	The index of the memo location to write into is given by the 1-byte
	1424	unsigned integer following.
	1425	"""),
	1426
	1427	I(name='LONG_BINPUT',
	1428	code='r',
	1429	arg=int4,
	1430	stack_before=[],
	1431	stack_after=[],
	1432	proto=1,
	1433	doc="""Store the stack top into the memo. The stack is not popped.
	1434
	1435	The index of the memo location to write into is given by the 4-byte
	1436	signed little-endian integer following.
	1437	"""),
	1438
	1439	# Access the extension registry (predefined objects). Akin to the GET
	1440	# family.
	1441
	1442	I(name='EXT1',
	1443	code='\x82',
	1444	arg=uint1,
	1445	stack_before=[],
	1446	stack_after=[anyobject],
	1447	proto=2,
	1448	doc="""Extension code.
	1449
	1450	This code and the similar EXT2 and EXT4 allow using a registry
	1451	of popular objects that are pickled by name, typically classes.
	1452	It is envisioned that through a global negotiation and
	1453	registration process, third parties can set up a mapping between
	1454	ints and object names.
	1455
	1456	In order to guarantee pickle interchangeability, the extension
	1457	code registry ought to be global, although a range of codes may
	1458	be reserved for private use.
	1459
	1460	EXT1 has a 1-byte integer argument. This is used to index into the
	1461	extension registry, and the object at that index is pushed on the stack.
	1462	"""),
	1463
	1464	I(name='EXT2',
	1465	code='\x83',
	1466	arg=uint2,
	1467	stack_before=[],
	1468	stack_after=[anyobject],
	1469	proto=2,
	1470	doc="""Extension code.
	1471
	1472	See EXT1. EXT2 has a two-byte integer argument.
	1473	"""),
	1474
	1475	I(name='EXT4',
	1476	code='\x84',
	1477	arg=int4,
	1478	stack_before=[],
	1479	stack_after=[anyobject],
	1480	proto=2,
	1481	doc="""Extension code.
	1482
	1483	See EXT1. EXT4 has a four-byte integer argument.
	1484	"""),
	1485
	1486	# Push a class object, or module function, on the stack, via its module
	1487	# and name.
	1488
	1489	I(name='GLOBAL',
	1490	code='c',
	1491	arg=stringnl_noescape_pair,
	1492	stack_before=[],
	1493	stack_after=[anyobject],
	1494	proto=0,
	1495	doc="""Push a global object (module.attr) on the stack.
	1496
	1497	Two newline-terminated strings follow the GLOBAL opcode. The first is
	1498	taken as a module name, and the second as a class name. The class
	1499	object module.class is pushed on the stack. More accurately, the
	1500	object returned by self.find_class(module, class) is pushed on the
	1501	stack, so unpickling subclasses can override this form of lookup.
	1502	"""),
	1503
	1504	# Ways to build objects of classes pickle doesn't know about directly
	1505	# (user-defined classes). I despair of documenting this accurately
	1506	# and comprehensibly -- you really have to read the pickle code to
	1507	# find all the special cases.
	1508
	1509	I(name='REDUCE',
	1510	code='R',
	1511	arg=None,
	1512	stack_before=[anyobject, anyobject],
	1513	stack_after=[anyobject],
	1514	proto=0,
	1515	doc="""Push an object built from a callable and an argument tuple.
	1516
	1517	The opcode is named as a reminder of the __reduce__() method.
	1518
	1519	Stack before: ... callable pytuple
	1520	Stack after: ... callable(*pytuple)
	1521
	1522	The callable and the argument tuple are the first two items returned
	1523	by a __reduce__ method. Applying the callable to the argtuple is
	1524	supposed to reproduce the original object, or at least get it started.
	1525	If the __reduce__ method returns a 3-tuple, the last component is an
	1526	argument to be passed to the object's __setstate__, and then the REDUCE
	1527	opcode is followed by code to create setstate's argument, and then a
	1528	BUILD opcode to apply __setstate__ to that argument.
	1529
	1530	If type(callable) is not ClassType, REDUCE complains unless the
	1531	callable has been registered with the copy_reg module's
	1532	safe_constructors dict, or the callable has a magic
	1533	'__safe_for_unpickling__' attribute with a true value. I'm not sure
	1534	why it does this, but I've sure seen this complaint often enough when
	1535	I didn't want to <wink>.
	1536	"""),
	1537
	1538	I(name='BUILD',
	1539	code='b',
	1540	arg=None,
	1541	stack_before=[anyobject, anyobject],
	1542	stack_after=[anyobject],
	1543	proto=0,
	1544	doc="""Finish building an object, via __setstate__ or dict update.
	1545
	1546	Stack before: ... anyobject argument
	1547	Stack after: ... anyobject
	1548
	1549	where anyobject may have been mutated, as follows:
	1550
	1551	If the object has a __setstate__ method,
	1552
	1553	anyobject.__setstate__(argument)
	1554
	1555	is called.
	1556
	1557	Else the argument must be a dict, the object must have a __dict__, and
	1558	the object is updated via
	1559
	1560	anyobject.__dict__.update(argument)
	1561
	1562	This may raise RuntimeError in restricted execution mode (which
	1563	disallows access to __dict__ directly); in that case, the object
	1564	is updated instead via
	1565
	1566	for k, v in argument.items():
	1567	anyobject[k] = v
	1568	"""),
	1569
	1570	I(name='INST',
	1571	code='i',
	1572	arg=stringnl_noescape_pair,
	1573	stack_before=[markobject, stackslice],
	1574	stack_after=[anyobject],
	1575	proto=0,
	1576	doc="""Build a class instance.
	1577
	1578	This is the protocol 0 version of protocol 1's OBJ opcode.
	1579	INST is followed by two newline-terminated strings, giving a
	1580	module and class name, just as for the GLOBAL opcode (and see
	1581	GLOBAL for more details about that). self.find_class(module, name)
	1582	is used to get a class object.
	1583
	1584	In addition, all the objects on the stack following the topmost
	1585	markobject are gathered into a tuple and popped (along with the
	1586	topmost markobject), just as for the TUPLE opcode.
	1587
	1588	Now it gets complicated. If all of these are true:
	1589
	1590	+ The argtuple is empty (markobject was at the top of the stack
	1591	at the start).
	1592
	1593	+ It's an old-style class object (the type of the class object is
	1594	ClassType).
	1595
	1596	+ The class object does not have a __getinitargs__ attribute.
	1597
	1598	then we want to create an old-style class instance without invoking
	1599	its __init__() method (pickle has waffled on this over the years; not
	1600	calling __init__() is current wisdom). In this case, an instance of
	1601	an old-style dummy class is created, and then we try to rebind its
	1602	__class__ attribute to the desired class object. If this succeeds,
	1603	the new instance object is pushed on the stack, and we're done. In
	1604	restricted execution mode it can fail (assignment to __class__ is
	1605	disallowed), and I'm not really sure what happens then -- it looks
	1606	like the code ends up calling the class object's __init__ anyway,
	1607	via falling into the next case.
	1608
	1609	Else (the argtuple is not empty, it's not an old-style class object,
	1610	or the class object does have a __getinitargs__ attribute), the code
	1611	first insists that the class object have a __safe_for_unpickling__
	1612	attribute. Unlike the __safe_for_unpickling__ check in REDUCE,
	1613	it doesn't matter whether this attribute has a true or false value, it
	1614	only matters whether it exists (XXX this is a bug; cPickle
	1615	requires the attribute to be true). If __safe_for_unpickling__
	1616	doesn't exist, UnpicklingError is raised.
	1617
	1618	Else (the class object does have a __safe_for_unpickling__ attr),
	1619	the class object obtained from INST's arguments is applied to the
	1620	argtuple obtained from the stack, and the resulting instance object
	1621	is pushed on the stack.
	1622
	1623	NOTE: checks for __safe_for_unpickling__ went away in Python 2.3.
	1624	"""),
	1625
	1626	I(name='OBJ',
	1627	code='o',
	1628	arg=None,
	1629	stack_before=[markobject, anyobject, stackslice],
	1630	stack_after=[anyobject],
	1631	proto=1,
	1632	doc="""Build a class instance.
	1633
	1634	This is the protocol 1 version of protocol 0's INST opcode, and is
	1635	very much like it. The major difference is that the class object
	1636	is taken off the stack, allowing it to be retrieved from the memo
	1637	repeatedly if several instances of the same class are created. This
	1638	can be much more efficient (in both time and space) than repeatedly
	1639	embedding the module and class names in INST opcodes.
	1640
	1641	Unlike INST, OBJ takes no arguments from the opcode stream. Instead
	1642	the class object is taken off the stack, immediately above the
	1643	topmost markobject:
	1644
	1645	Stack before: ... markobject classobject stackslice
	1646	Stack after: ... new_instance_object
	1647
	1648	As for INST, the remainder of the stack above the markobject is
	1649	gathered into an argument tuple, and then the logic seems identical,
	1650	except that no __safe_for_unpickling__ check is done (XXX this is
	1651	a bug; cPickle does test __safe_for_unpickling__). See INST for
	1652	the gory details.
	1653
	1654	NOTE: In Python 2.3, INST and OBJ are identical except for how they
	1655	get the class object. That was always the intent; the implementations
	1656	had diverged for accidental reasons.
	1657	"""),
	1658
	1659	I(name='NEWOBJ',
	1660	code='\x81',
	1661	arg=None,
	1662	stack_before=[anyobject, anyobject],
	1663	stack_after=[anyobject],
	1664	proto=2,
	1665	doc="""Build an object instance.
	1666
	1667	The stack before should be thought of as containing a class
	1668	object followed by an argument tuple (the tuple being the stack
	1669	top). Call these cls and args. They are popped off the stack,
	1670	and the value returned by cls.__new__(cls, *args) is pushed back
	1671	onto the stack.
	1672	"""),
	1673
	1674	# Machine control.
	1675
	1676	I(name='PROTO',
	1677	code='\x80',
	1678	arg=uint1,
	1679	stack_before=[],
	1680	stack_after=[],
	1681	proto=2,
	1682	doc="""Protocol version indicator.
	1683
	1684	For protocol 2 and above, a pickle must start with this opcode.
	1685	The argument is the protocol version, an int in range(2, 256).
	1686	"""),
	1687
	1688	I(name='STOP',
	1689	code='.',
	1690	arg=None,
	1691	stack_before=[anyobject],
	1692	stack_after=[],
	1693	proto=0,
	1694	doc="""Stop the unpickling machine.
	1695
	1696	Every pickle ends with this opcode. The object at the top of the stack
	1697	is popped, and that's the result of unpickling. The stack should be
	1698	empty then.
	1699	"""),
	1700
	1701	# Ways to deal with persistent IDs.
	1702
	1703	I(name='PERSID',
	1704	code='P',
	1705	arg=stringnl_noescape,
	1706	stack_before=[],
	1707	stack_after=[anyobject],
	1708	proto=0,
	1709	doc="""Push an object identified by a persistent ID.
	1710
	1711	The pickle module doesn't define what a persistent ID means. PERSID's
	1712	argument is a newline-terminated str-style (no embedded escapes, no
	1713	bracketing quote characters) string, which is "the persistent ID".
	1714	The unpickler passes this string to self.persistent_load(). Whatever
	1715	object that returns is pushed on the stack. There is no implementation
	1716	of persistent_load() in Python's unpickler: it must be supplied by an
	1717	unpickler subclass.
	1718	"""),
	1719
	1720	I(name='BINPERSID',
	1721	code='Q',
	1722	arg=None,
	1723	stack_before=[anyobject],
	1724	stack_after=[anyobject],
	1725	proto=1,
	1726	doc="""Push an object identified by a persistent ID.
	1727
	1728	Like PERSID, except the persistent ID is popped off the stack (instead
	1729	of being a string embedded in the opcode bytestream). The persistent
	1730	ID is passed to self.persistent_load(), and whatever object that
	1731	returns is pushed on the stack. See PERSID for more detail.
	1732	"""),
	1733	]
	1734	del I
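# Illustrative aside (not part of the module): a minimal sketch of how the
# integer opcodes documented above are chosen by value size. It assumes a
# Python 3 interpreter and its standard pickle/pickletools modules, which work
# on bytes rather than the 2.x str used in this listing. opcode_names is a
# hypothetical helper introduced only for this example.

```python
import pickle
import pickletools


def opcode_names(obj, protocol=2):
    """Return the names of the opcodes pickle emits for obj."""
    data = pickle.dumps(obj, protocol)
    return [op.name for op, arg, pos in pickletools.genops(data)]


# Small non-negative ints use BININT1 or BININT2, other 32-bit values use
# BININT, and anything wider falls through to LONG1 under protocol 2.
for value in (5, 300, 70000, -1, 2 ** 40):
    print(value, opcode_names(value))
```

# The same helper can be pointed at True/False to see NEWTRUE and NEWFALSE
# under protocol 2.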
	1735
	1736	# Verify uniqueness of .name and .code members.
	1737	name2i = {}
	1738	code2i = {}
	1739
	1740	for i, d in enumerate(opcodes):
	1741	if d.name in name2i:
	1742	raise ValueError("repeated name %r at indices %d and %d" %
	1743	(d.name, name2i[d.name], i))
	1744	if d.code in code2i:
	1745	raise ValueError("repeated code %r at indices %d and %d" %
	1746	(d.code, code2i[d.code], i))
	1747
	1748	name2i[d.name] = i
	1749	code2i[d.code] = i
	1750
	1751	del name2i, code2i, i, d
	1752
	1753	##############################################################################
	1754	# Build a code2op dict, mapping opcode characters to OpcodeInfo records.
	1755	# Also ensure we've got the same stuff as pickle.py, although the
	1756	# introspection here is dicey.
	1757
	1758	code2op = {}
	1759	for d in opcodes:
	1760	code2op[d.code] = d
	1761	del d
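# Illustrative aside (not part of the module): a small sketch of looking up
# opcode records through code2op. It assumes Python 3's pickletools, where
# code2op is keyed by one-character str values. describe is a hypothetical
# helper, not something this file defines.

```python
import pickletools


def describe(code):
    """Summarize the OpcodeInfo record registered for a one-character code."""
    op = pickletools.code2op[code]
    # Report the name, the protocol that introduced the opcode, and how many
    # stack entries the opcode description lists before and after it runs.
    return (op.name, op.proto, len(op.stack_before), len(op.stack_after))


for code in ('K', '\x86', '.'):
    print(repr(code), describe(code))
```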
	1762
	1763	def assure_pickle_consistency(verbose=False):
	1764	import pickle, re
	1765
	1766	copy = code2op.copy()
	1767	for name in pickle.__all__:
	1768	if not re.match("[A-Z][A-Z0-9_]+$", name):
	1769	if verbose:
	1770	print "skipping %r: it doesn't look like an opcode name" % name
	1771	continue
	1772	picklecode = getattr(pickle, name)
	1773	if not isinstance(picklecode, str) or len(picklecode) != 1:
	1774	if verbose:
	1775	print ("skipping %r: value %r doesn't look like a pickle "
	1776	"code" % (name, picklecode))
	1777	continue
	1778	if picklecode in copy:
	1779	if verbose:
	1780	print "checking name %r w/ code %r for consistency" % (
	1781	name, picklecode)
	1782	d = copy[picklecode]
	1783	if d.name != name:
	1784	raise ValueError("for pickle code %r, pickle.py uses name %r "
	1785	"but we're using name %r" % (picklecode,
	1786	name,
	1787	d.name))
	1788	# Forget this one. Any left over in copy at the end are a problem
	1789	# of a different kind.
	1790	del copy[picklecode]
	1791	else:
	1792	raise ValueError("pickle.py appears to have a pickle opcode with "
	1793	"name %r and code %r, but we don't" %
	1794	(name, picklecode))
	1795	if copy:
	1796	msg = ["we appear to have pickle opcodes that pickle.py doesn't have:"]
	1797	for code, d in copy.items():
	1798	msg.append(" name %r with code %r" % (d.name, code))
	1799	raise ValueError("\n".join(msg))
	1800
	1801	assure_pickle_consistency()
	1802	del assure_pickle_consistency
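# Illustrative aside (not part of the module): a rough sketch of the same
# cross-check, written against Python 3's pickle and pickletools as an
# assumption. There pickle's opcode constants are one-byte bytes objects, so
# the comparison encodes each pickletools code first. mismatched_opcodes is a
# hypothetical name used only here.

```python
import pickle
import pickletools


def mismatched_opcodes():
    """Return opcode names whose code differs from pickle's constant."""
    bad = []
    for op in pickletools.opcodes:
        # A missing attribute yields None, which never equals a bytes object.
        expected = getattr(pickle, op.name, None)
        if expected != op.code.encode('latin-1'):
            bad.append(op.name)
    return bad


print(mismatched_opcodes())
```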
	1803
	1804	##############################################################################
	1805	# A pickle opcode generator.
	1806
	1807	def genops(pickle):
	1808	"""Generate all the opcodes in a pickle.
	1809
	1810	'pickle' is a file-like object, or string, containing the pickle.
	1811
	1812	Each opcode in the pickle is generated, from the current pickle position,
	1813	stopping after a STOP opcode is delivered. A triple is generated for
	1814	each opcode:
	1815
	1816	opcode, arg, pos
	1817
	1818	opcode is an OpcodeInfo record, describing the current opcode.
	1819
	1820	If the opcode has an argument embedded in the pickle, arg is its decoded
	1821	value, as a Python object. If the opcode doesn't have an argument, arg
	1822	is None.
	1823
	1824	If the pickle has a tell() method, pos was the value of pickle.tell()
	1825	before reading the current opcode. If the pickle is a string object,
	1826	it's wrapped in a StringIO object, and the latter's tell() result is
	1827	used. Else (the pickle doesn't have a tell(), and it's not obvious how
	1828	to query its current position) pos is None.
	1829	"""
	1830
	1831	import cStringIO as StringIO
	1832
	1833	if isinstance(pickle, str):
	1834	pickle = StringIO.StringIO(pickle)
	1835
	1836	if hasattr(pickle, "tell"):
	1837	getpos = pickle.tell
	1838	else:
	1839	getpos = lambda: None
	1840
	1841	while True:
	1842	pos = getpos()
	1843	code = pickle.read(1)
	1844	opcode = code2op.get(code)
	1845	if opcode is None:
	1846	if code == "":
	1847	raise ValueError("pickle exhausted before seeing STOP")
	1848	else:
	1849	raise ValueError("at position %s, opcode %r unknown" % (
	1850	pos is None and "<unknown>" or pos,
	1851	code))
	1852	if opcode.arg is None:
	1853	arg = None
	1854	else:
	1855	arg = opcode.arg.reader(pickle)
	1856	yield opcode, arg, pos
	1857	if code == '.':
	1858	assert opcode.name == 'STOP'
	1859	break
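# Illustrative aside (not part of the module): a minimal usage sketch for
# genops, assuming Python 3's pickletools (bytes input instead of the 2.x str
# handled above). The pickle bytes are assembled by hand so the opcode
# sequence does not depend on what a particular pickler chooses to emit.

```python
import pickletools

# Hand-assembled protocol 2 pickle for the tuple (1, 2):
# PROTO 2, BININT1 1, BININT1 2, TUPLE2, STOP.
data = b'\x80\x02K\x01K\x02\x86.'

triples = [(op.name, arg, pos) for op, arg, pos in pickletools.genops(data)]
for name, arg, pos in triples:
    print(pos, name, arg)

# Dropping the final STOP byte makes genops run off the end of the data,
# which it reports as a ValueError.
try:
    list(pickletools.genops(data[:-1]))
    truncated_error = None
except ValueError as exc:
    truncated_error = str(exc)
print(truncated_error)
```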
	1860
	1861	##############################################################################
	1862	# A pickle optimizer.
	1863
	1864	def optimize(p):
	1865	'Optimize a pickle string by removing unused PUT opcodes'
	1866	gets = set() # set of args used by a GET opcode
	1867	puts = [] # (arg, startpos, stoppos) for the PUT opcodes
	1868	prevpos = None # set to pos if previous opcode was a PUT
	1869	for opcode, arg, pos in genops(p):
	1870	if prevpos is not None:
	1871	puts.append((prevarg, prevpos, pos))
	1872	prevpos = None
	1873	if 'PUT' in opcode.name:
	1874	prevarg, prevpos = arg, pos
	1875	elif 'GET' in opcode.name:
	1876	gets.add(arg)
	1877
	1878	# Copy the pickle string except for PUTS without a corresponding GET
	1879	s = []
	1880	i = 0
	1881	for arg, start, stop in puts:
	1882	j = stop if (arg in gets) else start
	1883	s.append(p[i:j])
	1884	i = stop
	1885	s.append(p[i:])
	1886	return ''.join(s)
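# Illustrative aside (not part of the module): a sketch of what optimize does
# to PUT opcodes, assuming Python 3's pickle and pickletools. Its optimize is
# implemented differently from the 2.7 version above but has the same
# purpose. put_get_names is a hypothetical helper for inspecting the result.

```python
import pickle
import pickletools


def put_get_names(data):
    """List the PUT/GET-family opcode names appearing in a pickle."""
    return [op.name for op, arg, pos in pickletools.genops(data)
            if 'PUT' in op.name or 'GET' in op.name]


# Hand-assembled protocol 2 pickle for [1, 2]; its BINPUT is never read back.
unused = b'\x80\x02]q\x00(K\x01K\x02e.'
slim = pickletools.optimize(unused)

# A list holding the same sublist twice needs a PUT/GET pair to keep identity.
inner = [1]
shared = pickle.dumps([inner, inner], 2)
kept = pickletools.optimize(shared)
restored = pickle.loads(kept)

print(put_get_names(unused), put_get_names(slim))
print(put_get_names(shared), put_get_names(kept))
```

# Only PUTs that some later GET refers to survive, so sharing between objects
# is preserved while unreferenced memo writes are dropped.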
	1887
	1888	##############################################################################
	1889	# A symbolic pickle disassembler.
	1890
	1891	def dis(pickle, out=None, memo=None, indentlevel=4):
	1892	"""Produce a symbolic disassembly of a pickle.
	1893
	1894	'pickle' is a file-like object, or string, containing a (at least one)
	1895	pickle. The pickle is disassembled from the current position, through
	1896	the first STOP opcode encountered.
	1897
	1898	Optional arg 'out' is a file-like object to which the disassembly is
	1899	printed. It defaults to sys.stdout.
	1900
	1901	Optional arg 'memo' is a Python dict, used as the pickle's memo. It
	1902	may be mutated by dis(), if the pickle contains PUT, BINPUT, or LONG_BINPUT opcodes.
	1903	Passing the same memo object to another dis() call then allows disassembly
	1904	to proceed across multiple pickles that were all created by the same
	1905	pickler with the same memo. Ordinarily you don't need to worry about this.
	1906
	1907	Optional arg indentlevel is the number of blanks by which to indent
	1908	a new MARK level. It defaults to 4.
	1909
	1910	In addition to printing the disassembly, some sanity checks are made:
	1911
	1912	+ All embedded opcode arguments "make sense".
	1913
	1914	+ Explicit and implicit pop operations have enough items on the stack.
	1915
	1916	+ When an opcode implicitly refers to a markobject, a markobject is
	1917	actually on the stack.
	1918
	1919	+ A memo entry isn't referenced before it's defined.
	1920
	1921	+ The markobject isn't stored in the memo.
	1922
	1923	+ A memo entry isn't redefined.
	1924	"""
	1925
	1926	# Most of the hair here is for sanity checks, but most of it is needed
	1927	# anyway to detect when a protocol 0 POP takes a MARK off the stack
	1928	# (which in turn is needed to indent MARK blocks correctly).
	1929
	1930	stack = [] # crude emulation of unpickler stack
	1931	if memo is None:
[391]	1932	memo = {} # crude emulation of unpickler memo
[2]	1933	maxproto = -1 # max protocol number seen
	1934	markstack = [] # bytecode positions of MARK opcodes
	1935	indentchunk = ' ' * indentlevel
	1936	errormsg = None
	1937	for opcode, arg, pos in genops(pickle):
	1938	if pos is not None:
	1939	print >> out, "%5d:" % pos,
	1940
	1941	line = "%-4s %s%s" % (repr(opcode.code)[1:-1],
	1942	indentchunk * len(markstack),
	1943	opcode.name)
	1944
	1945	maxproto = max(maxproto, opcode.proto)
	1946	before = opcode.stack_before # don't mutate
	1947	after = opcode.stack_after # don't mutate
	1948	numtopop = len(before)
	1949
	1950	# See whether a MARK should be popped.
	1951	markmsg = None
	1952	if markobject in before or (opcode.name == "POP" and
	1953	stack and
	1954	stack[-1] is markobject):
	1955	assert markobject not in after
	1956	if __debug__:
	1957	if markobject in before:
	1958	assert before[-1] is stackslice
	1959	if markstack:
	1960	markpos = markstack.pop()
	1961	if markpos is None:
	1962	markmsg = "(MARK at unknown opcode offset)"
	1963	else:
	1964	markmsg = "(MARK at %d)" % markpos
	1965	# Pop everything at and after the topmost markobject.
	1966	while stack[-1] is not markobject:
	1967	stack.pop()
	1968	stack.pop()
	1969	# Stop later code from popping too much.
	1970	try:
	1971	numtopop = before.index(markobject)
	1972	except ValueError:
	1973	assert opcode.name == "POP"
	1974	numtopop = 0
	1975	else:
	1976	errormsg = markmsg = "no MARK exists on stack"
	1977
	1978	# Check for correct memo usage.
	1979	if opcode.name in ("PUT", "BINPUT", "LONG_BINPUT"):
	1980	assert arg is not None
	1981	if arg in memo:
	1982	errormsg = "memo key %r already defined" % arg
	1983	elif not stack:
	1984	errormsg = "stack is empty -- can't store into memo"
	1985	elif stack[-1] is markobject:
	1986	errormsg = "can't store markobject in the memo"
	1987	else:
	1988	memo[arg] = stack[-1]
	1989
	1990	elif opcode.name in ("GET", "BINGET", "LONG_BINGET"):
	1991	if arg in memo:
	1992	assert len(after) == 1
	1993	after = [memo[arg]] # for better stack emulation
	1994	else:
	1995	errormsg = "memo key %r has never been stored into" % arg
	1996
	1997	if arg is not None or markmsg:
	1998	# make a mild effort to align arguments
	1999	line += ' ' * (10 - len(opcode.name))
	2000	if arg is not None:
	2001	line += ' ' + repr(arg)
	2002	if markmsg:
	2003	line += ' ' + markmsg
	2004	print >> out, line
	2005
	2006	if errormsg:
	2007	# Note that we delayed complaining until the offending opcode
	2008	# was printed.
	2009	raise ValueError(errormsg)
	2010
	2011	# Emulate the stack effects.
	2012	if len(stack) < numtopop:
	2013	raise ValueError("tries to pop %d items from stack with "
	2014	"only %d items" % (numtopop, len(stack)))
	2015	if numtopop:
	2016	del stack[-numtopop:]
	2017	if markobject in after:
	2018	assert markobject not in before
	2019	markstack.append(pos)
	2020
	2021	stack.extend(after)
	2022
	2023	print >> out, "highest protocol among opcodes =", maxproto
	2024	if stack:
	2025	raise ValueError("stack not empty after STOP: %r" % stack)
	2026
# For use in the doctest, simply as an example of a class to pickle.
class _Example:
    def __init__(self, value):
        self.value = value

_dis_test = r"""
>>> import pickle
>>> x = [1, 2, (3, 4), {'abc': u"def"}]
>>> pkl = pickle.dumps(x, 0)
>>> dis(pkl)
    0: (    MARK
    1: l        LIST       (MARK at 0)
    2: p    PUT        0
    5: I    INT        1
    8: a    APPEND
    9: I    INT        2
   12: a    APPEND
   13: (    MARK
   14: I        INT        3
   17: I        INT        4
   20: t        TUPLE      (MARK at 13)
   21: p    PUT        1
   24: a    APPEND
   25: (    MARK
   26: d        DICT       (MARK at 25)
   27: p    PUT        2
   30: S    STRING     'abc'
   37: p    PUT        3
   40: V    UNICODE    u'def'
   45: p    PUT        4
   48: s    SETITEM
   49: a    APPEND
   50: .    STOP
highest protocol among opcodes = 0

Try again with a "binary" pickle.

>>> pkl = pickle.dumps(x, 1)
>>> dis(pkl)
    0: ]    EMPTY_LIST
    1: q    BINPUT     0
    3: (    MARK
    4: K        BININT1    1
    6: K        BININT1    2
    8: (        MARK
    9: K            BININT1    3
   11: K            BININT1    4
   13: t            TUPLE      (MARK at 8)
   14: q        BINPUT     1
   16: }        EMPTY_DICT
   17: q        BINPUT     2
   19: U        SHORT_BINSTRING 'abc'
   24: q        BINPUT     3
   26: X        BINUNICODE u'def'
   34: q        BINPUT     4
   36: s        SETITEM
   37: e        APPENDS    (MARK at 3)
   38: .    STOP
highest protocol among opcodes = 1

Exercise the INST/OBJ/BUILD family.

>>> import pickletools
>>> dis(pickle.dumps(pickletools.dis, 0))
    0: c    GLOBAL     'pickletools dis'
   17: p    PUT        0
   20: .    STOP
highest protocol among opcodes = 0

>>> from pickletools import _Example
>>> x = [_Example(42)] * 2
>>> dis(pickle.dumps(x, 0))
    0: (    MARK
    1: l        LIST       (MARK at 0)
    2: p    PUT        0
    5: (    MARK
    6: i        INST       'pickletools _Example' (MARK at 5)
   28: p    PUT        1
   31: (    MARK
   32: d        DICT       (MARK at 31)
   33: p    PUT        2
   36: S    STRING     'value'
   45: p    PUT        3
   48: I    INT        42
   52: s    SETITEM
   53: b    BUILD
   54: a    APPEND
   55: g    GET        1
   58: a    APPEND
   59: .    STOP
highest protocol among opcodes = 0

>>> dis(pickle.dumps(x, 1))
    0: ]    EMPTY_LIST
    1: q    BINPUT     0
    3: (    MARK
    4: (        MARK
    5: c            GLOBAL     'pickletools _Example'
   27: q            BINPUT     1
   29: o            OBJ        (MARK at 4)
   30: q        BINPUT     2
   32: }        EMPTY_DICT
   33: q        BINPUT     3
   35: U        SHORT_BINSTRING 'value'
   42: q        BINPUT     4
   44: K        BININT1    42
   46: s        SETITEM
   47: b        BUILD
   48: h        BINGET     2
   50: e        APPENDS    (MARK at 3)
   51: .    STOP
highest protocol among opcodes = 1

Try "the canonical" recursive-object test.

>>> L = []
>>> T = L,
>>> L.append(T)
>>> L[0] is T
True
>>> T[0] is L
True
>>> L[0][0] is L
True
>>> T[0][0] is T
True
>>> dis(pickle.dumps(L, 0))
    0: (    MARK
    1: l        LIST       (MARK at 0)
    2: p    PUT        0
    5: (    MARK
    6: g        GET        0
    9: t        TUPLE      (MARK at 5)
   10: p    PUT        1
   13: a    APPEND
   14: .    STOP
highest protocol among opcodes = 0

>>> dis(pickle.dumps(L, 1))
    0: ]    EMPTY_LIST
    1: q    BINPUT     0
    3: (    MARK
    4: h        BINGET     0
    6: t        TUPLE      (MARK at 3)
    7: q    BINPUT     1
    9: a    APPEND
   10: .    STOP
highest protocol among opcodes = 1

Note that, in the protocol 0 pickle of the recursive tuple, the disassembler
has to emulate the stack in order to realize that the POP opcode at 16 gets
rid of the MARK at 0.

>>> dis(pickle.dumps(T, 0))
    0: (    MARK
    1: (        MARK
    2: l            LIST       (MARK at 1)
    3: p        PUT        0
    6: (        MARK
    7: g            GET        0
   10: t            TUPLE      (MARK at 6)
   11: p        PUT        1
   14: a        APPEND
   15: 0        POP
   16: 0        POP        (MARK at 0)
   17: g    GET        1
   20: .    STOP
highest protocol among opcodes = 0

>>> dis(pickle.dumps(T, 1))
    0: (    MARK
    1: ]        EMPTY_LIST
    2: q        BINPUT     0
    4: (        MARK
    5: h            BINGET     0
    7: t            TUPLE      (MARK at 4)
    8: q        BINPUT     1
   10: a        APPEND
   11: 1        POP_MARK   (MARK at 0)
   12: h    BINGET     1
   14: .    STOP
highest protocol among opcodes = 1

Try protocol 2.

>>> dis(pickle.dumps(L, 2))
    0: \x80 PROTO      2
    2: ]    EMPTY_LIST
    3: q    BINPUT     0
    5: h    BINGET     0
    7: \x85 TUPLE1
    8: q    BINPUT     1
   10: a    APPEND
   11: .    STOP
highest protocol among opcodes = 2

>>> dis(pickle.dumps(T, 2))
    0: \x80 PROTO      2
    2: ]    EMPTY_LIST
    3: q    BINPUT     0
    5: h    BINGET     0
    7: \x85 TUPLE1
    8: q    BINPUT     1
   10: a    APPEND
   11: 0    POP
   12: h    BINGET     1
   14: .    STOP
highest protocol among opcodes = 2
"""

_memo_test = r"""
>>> import pickle
>>> from StringIO import StringIO
>>> f = StringIO()
>>> p = pickle.Pickler(f, 2)
>>> x = [1, 2, 3]
>>> p.dump(x)
>>> p.dump(x)
>>> f.seek(0)
>>> memo = {}
>>> dis(f, memo=memo)
    0: \x80 PROTO      2
    2: ]    EMPTY_LIST
    3: q    BINPUT     0
    5: (    MARK
    6: K        BININT1    1
    8: K        BININT1    2
   10: K        BININT1    3
   12: e        APPENDS    (MARK at 5)
   13: .    STOP
highest protocol among opcodes = 2
>>> dis(f, memo=memo)
   14: \x80 PROTO      2
   16: h    BINGET     0
   18: .    STOP
highest protocol among opcodes = 2
"""

__test__ = {'disassembler_test': _dis_test,
            'disassembler_memo_test': _memo_test,
           }

def _test():
    import doctest
    return doctest.testmod()

if __name__ == "__main__":
    _test()
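
The listing above is Python 2 code. As a rough sketch of the PUT/GET bookkeeping that optimize() performs, the snippet below uses the Python 3 standard-library pickletools module, which also provides genops() and optimize(). The helper name unused_puts and the sample data are illustrative assumptions and are not part of this file; protocol 2 is chosen so that the pickler emits BINPUT/BINGET rather than the newer MEMOIZE opcode.

```python
import pickle
import pickletools


def unused_puts(data):
    """Return memo keys stored by a PUT-family opcode but never read by a GET.

    Hypothetical helper for illustration; it mirrors the first loop of
    optimize() but only reports the unused keys instead of rewriting bytes.
    """
    gets = set()   # memo keys read by GET, BINGET, or LONG_BINGET
    puts = []      # memo keys written by PUT, BINPUT, or LONG_BINPUT
    for opcode, arg, pos in pickletools.genops(data):
        if opcode.name in ("PUT", "BINPUT", "LONG_BINPUT"):
            puts.append(arg)
        elif opcode.name in ("GET", "BINGET", "LONG_BINGET"):
            gets.add(arg)
    return [arg for arg in puts if arg not in gets]


# Sample data (an assumption): the inner list is referenced twice, so its
# memo entry is read back; the outer list's memo entry is never read.
shared = [1, 2]
data = pickle.dumps([shared, shared], protocol=2)

slim = pickletools.optimize(data)   # drops PUTs that no GET refers to
restored = pickle.loads(slim)

print("unused before:", unused_puts(data))
print("unused after: ", unused_puts(slim))
print("bytes:", len(data), "->", len(slim))
```

The optimized pickle still round-trips to an equal object, and the shared inner list is still a single object after loading, because only memo writes that nothing reads are removed.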
