\section{\module{struct} ---
Interpret strings as packed binary data}
\declaremodule{builtin}{struct}

\modulesynopsis{Interpret strings as packed binary data.}

\indexii{C}{structures}
\indexiii{packing}{binary}{data}

This module performs conversions between Python values and C
structs represented as Python strings.  It uses \dfn{format strings}
(explained below) as compact descriptions of the layout of the C
structs and the intended conversion to/from Python values.  This can
be used in handling binary data stored in files or from network
connections, among other sources.

The module defines the following exception and functions:

\begin{excdesc}{error}
  Exception raised on various occasions; argument is a string
  describing what is wrong.
\end{excdesc}

\begin{funcdesc}{pack}{fmt, v1, v2, \textrm{\ldots}}
  Return a string containing the values
  \code{\var{v1}, \var{v2}, \textrm{\ldots}} packed according to the given
  format.  The arguments must match the values required by the format
  exactly.
\end{funcdesc}

\begin{funcdesc}{unpack}{fmt, string}
  Unpack the string (presumably packed by \code{pack(\var{fmt},
  \textrm{\ldots})}) according to the given format.  The result is a
  tuple even if it contains exactly one item.  The string must contain
  exactly the amount of data required by the format
  (\code{len(\var{string})} must equal \code{calcsize(\var{fmt})}).
\end{funcdesc}

\begin{funcdesc}{calcsize}{fmt}
  Return the size of the struct (and hence of the string)
  corresponding to the given format.
\end{funcdesc}

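As a minimal sketch of how \function{pack()}, \function{unpack()} and
\function{calcsize()} fit together, the following packs three integers,
checks the size, and unpacks them again.  It assumes a Python 3
interpreter, where packed data is a \code{bytes} object rather than the
string described here, and it uses the \character{<} prefix (described
later in this section) so that the result does not depend on the host
platform.  The variable names are illustrative only.

```python
import struct

# Assumption: Python 3, so pack() returns a bytes object.
fmt = '<hhl'                          # two shorts and a long, little-endian
packed = struct.pack(fmt, 1, 2, 3)    # b'\x01\x00\x02\x00\x03\x00\x00\x00'
size = struct.calcsize(fmt)           # 8
values = struct.unpack(fmt, packed)   # (1, 2, 3)

# unpack() rejects data whose length differs from calcsize(fmt).
try:
    struct.unpack(fmt, packed + b'\x00')
    length_checked = False
except struct.error:
    length_checked = True
```

The extra byte in the last call makes the data one byte longer than
\code{calcsize(fmt)}, which is why \exception{error} is raised.
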
Format characters have the following meaning; the conversion between
C and Python values should be obvious given their types:

\begin{tableiv}{c|l|l|c}{samp}{Format}{C Type}{Python}{Notes}
  \lineiv{x}{pad byte}{no value}{}
  \lineiv{c}{\ctype{char}}{string of length 1}{}
  \lineiv{b}{\ctype{signed char}}{integer}{}
  \lineiv{B}{\ctype{unsigned char}}{integer}{}
  \lineiv{h}{\ctype{short}}{integer}{}
  \lineiv{H}{\ctype{unsigned short}}{integer}{}
  \lineiv{i}{\ctype{int}}{integer}{}
  \lineiv{I}{\ctype{unsigned int}}{long}{}
  \lineiv{l}{\ctype{long}}{integer}{}
  \lineiv{L}{\ctype{unsigned long}}{long}{}
  \lineiv{q}{\ctype{long long}}{long}{(1)}
  \lineiv{Q}{\ctype{unsigned long long}}{long}{(1)}
  \lineiv{f}{\ctype{float}}{float}{}
  \lineiv{d}{\ctype{double}}{float}{}
  \lineiv{s}{\ctype{char[]}}{string}{}
  \lineiv{p}{\ctype{char[]}}{string}{}
  \lineiv{P}{\ctype{void *}}{integer}{}
\end{tableiv}

\noindent
Notes:

\begin{description}
\item[(1)]
  The \character{q} and \character{Q} conversion codes are available in
  native mode only if the platform C compiler supports C \ctype{long long},
  or, on Windows, \ctype{__int64}.  They are always available in standard
  modes.
  \versionadded{2.2}
\end{description}

A format character may be preceded by an integral repeat count.  For
example, the format string \code{'4h'} means exactly the same as
\code{'hhhh'}.

Whitespace characters between formats are ignored; however, no
whitespace may appear between a count and its format character.

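The repeat count and whitespace rules can be checked with a short
sketch.  It assumes a Python 3 interpreter and uses the \character{<}
prefix so that the packed bytes are the same on every platform; the
variable names are illustrative only.

```python
import struct

# Assumption: Python 3; '<' selects standard sizes so results are portable.
same_bytes = struct.pack('<4h', 1, 2, 3, 4) == struct.pack('<hhhh', 1, 2, 3, 4)

# Whitespace between two formats is ignored.
spaced_ok = struct.pack('<2h l', 1, 2, 3) == struct.pack('<2hl', 1, 2, 3)

# Whitespace between a count and its format character is rejected.
try:
    struct.calcsize('<2 h')
    split_count_rejected = False
except struct.error:
    split_count_rejected = True
```
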
For the \character{s} format character, the count is interpreted as the
size of the string, not a repeat count as for the other format
characters; for example, \code{'10s'} means a single 10-byte string, while
\code{'10c'} means 10 characters.  For packing, the string is
truncated or padded with null bytes as appropriate to make it fit.
For unpacking, the resulting string always has exactly the specified
number of bytes.  As a special case, \code{'0s'} means a single, empty
string (while \code{'0c'} means 0 characters).

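A small sketch of the difference between \character{s} and
\character{c}, assuming a Python 3 interpreter (so string values are
written as \code{bytes} literals); the variable names are illustrative
only.

```python
import struct

# Assumption: Python 3, where 's' and 'c' take and return bytes objects.
padded = struct.pack('5s', b'ab')            # b'ab\x00\x00\x00'
truncated = struct.pack('3s', b'abcdef')     # b'abc'

one_string = struct.unpack('3s', b'abc')     # (b'abc',)
three_chars = struct.unpack('3c', b'abc')    # (b'a', b'b', b'c')

empty_string = struct.unpack('0s', b'')      # (b'',)
no_chars = struct.unpack('0c', b'')          # ()
```
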
The \character{p} format character encodes a ``Pascal string'', meaning
a short variable-length string stored in a fixed number of bytes.
The count is the total number of bytes stored.  The first byte stored is
the length of the string, or 255, whichever is smaller.  The bytes
of the string follow.  If the string passed in to \function{pack()} is too
long (longer than the count minus 1), only the leading count-1 bytes of the
string are stored.  If the string is shorter than count-1, it is padded
with null bytes so that exactly count bytes in all are used.  Note that
for \function{unpack()}, the \character{p} format character consumes count
bytes, but that the string returned can never contain more than 255
characters.

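The Pascal string layout can be sketched as follows, assuming a
Python 3 interpreter; the variable names are illustrative only.

```python
import struct

# Assumption: Python 3, where 'p' takes and returns bytes objects.
# Count 5 means 1 length byte followed by at most 4 data bytes.
short = struct.pack('5p', b'ab')            # b'\x02ab\x00\x00'
too_long = struct.pack('5p', b'abcdefgh')   # b'\x04abcd'

# unpack() consumes all 5 bytes but returns only the stored string.
recovered = struct.unpack('5p', short)      # (b'ab',)
```

In both cases the packed result is exactly 5 bytes long, and the first
byte records how many of the following bytes belong to the string.
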
For the \character{I}, \character{L}, \character{q} and \character{Q}
format characters, the return value is a Python long integer.

For the \character{P} format character, the return value is a Python
integer or long integer, depending on the size needed to hold a
pointer when it has been cast to an integer type.  A \NULL{} pointer will
always be returned as the Python integer \code{0}.  When packing pointer-sized
values, Python integer or long integer objects may be used.  For
example, the Alpha and Merced processors use 64-bit pointer values,
meaning a Python long integer will be used to hold the pointer; other
platforms use 32-bit pointers and will use a Python integer.

By default, C numbers are represented in the machine's native format
and byte order, and properly aligned by skipping pad bytes if
necessary (according to the rules used by the C compiler).

Alternatively, the first character of the format string can be used to
indicate the byte order, size and alignment of the packed data,
according to the following table:

\begin{tableiii}{c|l|l}{samp}{Character}{Byte order}{Size and alignment}
  \lineiii{@}{native}{native}
  \lineiii{=}{native}{standard}
  \lineiii{<}{little-endian}{standard}
  \lineiii{>}{big-endian}{standard}
  \lineiii{!}{network (= big-endian)}{standard}
\end{tableiii}

If the first character is not one of these, \character{@} is assumed.

Native byte order is big-endian or little-endian, depending on the
host system.  For example, Motorola and Sun processors are big-endian;
Intel and DEC processors are little-endian.

Native size and alignment are determined using the C compiler's
\keyword{sizeof} expression.  This is always combined with native byte
order.

Standard size and alignment are as follows: no alignment is required
for any type (so you have to use pad bytes);
\ctype{short} is 2 bytes;
\ctype{int} and \ctype{long} are 4 bytes;
\ctype{long long} (\ctype{__int64} on Windows) is 8 bytes;
\ctype{float} and \ctype{double} are 32-bit and 64-bit
IEEE floating point numbers, respectively.

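The byte order prefixes and the standard sizes can be illustrated with
a short sketch, assuming a Python 3 interpreter; the variable names are
illustrative only.

```python
import struct

# Assumption: Python 3, so pack() returns a bytes object.
little = struct.pack('<h', 1)     # b'\x01\x00'
big = struct.pack('>h', 1)        # b'\x00\x01'
network = struct.pack('!h', 1)    # same bytes as big-endian

# Standard sizes of short, int, long, long long, float and double.
standard_sizes = [struct.calcsize('=' + code) for code in 'hilqfd']
# [2, 4, 4, 8, 4, 8]
```
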
Note the difference between \character{@} and \character{=}: both use
native byte order, but the size and alignment of the latter is
standardized.

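The difference shows up as soon as a format needs padding.  The sketch
below assumes a Python 3 interpreter; the native size is
platform-dependent, so only the standard size is stated as a fixed
number.  The variable names are illustrative only.

```python
import struct
import sys

# '=' uses standard sizes and no alignment: 1 byte + 4 bytes.
standard_size = struct.calcsize('=bl')    # 5

# '@' uses the C compiler's sizes and alignment; the result is
# platform-dependent and is usually larger because of pad bytes.
native_size = struct.calcsize('@bl')

# Both prefixes use the host byte order, reported by sys.byteorder.
explicit = '<' if sys.byteorder == 'little' else '>'
same_order = struct.pack('=l', 1) == struct.pack(explicit + 'l', 1)
```
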
The form \character{!} is available for those poor souls who claim they
can't remember whether network byte order is big-endian or
little-endian.

There is no way to indicate non-native byte order (force
byte-swapping); use the appropriate choice of \character{<} or
\character{>}.

The \character{P} format character is only available for the native
byte ordering (selected as the default or with the \character{@} byte
order character).  The byte order character \character{=} chooses to
use little- or big-endian ordering based on the host system.  The
\module{struct} module does not interpret this as native ordering, so the
\character{P} format is not available with it.

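A brief sketch of this restriction, assuming a Python 3 interpreter;
the pointer size depends on the platform, and the variable names are
illustrative only.

```python
import struct

# Native mode: the size of a pointer is platform-dependent (e.g. 4 or 8).
pointer_size = struct.calcsize('P')

# A NULL pointer round-trips as the integer 0.
null_pointer = struct.unpack('P', struct.pack('P', 0))   # (0,)

# With '=' (or '<', '>', '!') the 'P' format is rejected.
try:
    struct.calcsize('=P')
    available_in_standard_mode = True
except struct.error:
    available_in_standard_mode = False
```
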
Examples (all using native byte order, size and alignment, on a
big-endian machine):

\begin{verbatim}
>>> from struct import *
>>> pack('hhl', 1, 2, 3)
'\x00\x01\x00\x02\x00\x00\x00\x03'
>>> unpack('hhl', '\x00\x01\x00\x02\x00\x00\x00\x03')
(1, 2, 3)
>>> calcsize('hhl')
8
\end{verbatim}

Hint: to align the end of a structure to the alignment requirement of
a particular type, end the format with the code for that type with a
repeat count of zero.  For example, the format \code{'llh0l'}
specifies two pad bytes at the end, assuming longs are aligned on
4-byte boundaries.  This only works when native size and alignment are
in effect; standard size and alignment do not enforce any alignment.

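The effect of the zero repeat count can be observed with
\function{calcsize()}.  The sketch below assumes a Python 3
interpreter; the native sizes depend on the platform's C compiler, so
only the standard-mode result is given as a fixed number.  The variable
names are illustrative only.

```python
import struct

# Native mode: sizes depend on the platform's long size and alignment.
unpadded = struct.calcsize('llh')
padded = struct.calcsize('llh0l')      # end rounded up for a long

# Standard mode enforces no alignment, so '0l' adds nothing: 4 + 4 + 2.
standard = struct.calcsize('=llh0l')   # 10
```

On a platform with 4-byte longs aligned on 4-byte boundaries,
\code{unpadded} would be 10 and \code{padded} would be 12, matching the
two pad bytes described above.
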
\begin{seealso}
  \seemodule{array}{Packed binary storage of homogeneous data.}
  \seemodule{xdrlib}{Packing and unpacking of XDR data.}
\end{seealso}