:mod:`struct` --- Interpret strings as packed binary data
=========================================================

.. module:: struct
   :synopsis: Interpret strings as packed binary data.

.. index::
   pair: C; structures
   triple: packing; binary; data

This module performs conversions between Python values and C structs represented
as Python strings. This can be used in handling binary data stored in files or
from network connections, among other sources. It uses
:ref:`struct-format-strings` as compact descriptions of the layout of the C
structs and the intended conversion to/from Python values.

.. note::

   By default, the result of packing a given C struct includes pad bytes in
   order to maintain proper alignment for the C types involved; similarly,
   alignment is taken into account when unpacking. This behavior is chosen so
   that the bytes of a packed struct correspond exactly to the layout in memory
   of the corresponding C struct. To handle platform-independent data formats
   or omit implicit pad bytes, use ``standard`` size and alignment instead of
   ``native`` size and alignment: see :ref:`struct-alignment` for details.

Functions and Exceptions
------------------------

The module defines the following exception and functions:


.. exception:: error

   Exception raised on various occasions; argument is a string describing what
   is wrong.


.. function:: pack(fmt, v1, v2, ...)

   Return a string containing the values ``v1, v2, ...`` packed according to the
   given format. The arguments must match the values required by the format
   exactly.


.. function:: pack_into(fmt, buffer, offset, v1, v2, ...)

   Pack the values ``v1, v2, ...`` according to the given format and write the
   packed bytes into the writable *buffer* starting at *offset*. Note that the
   offset is a required argument.

   .. versionadded:: 2.5


.. function:: unpack(fmt, string)

   Unpack the string (presumably packed by ``pack(fmt, ...)``) according to the
   given format. The result is a tuple even if it contains exactly one item.
   The string must contain exactly the amount of data required by the format
   (``len(string)`` must equal ``calcsize(fmt)``).


.. function:: unpack_from(fmt, buffer[,offset=0])

   Unpack the *buffer* according to the given format. The result is a tuple even
   if it contains exactly one item. The *buffer* must contain at least the
   amount of data required by the format (``len(buffer[offset:])`` must be at
   least ``calcsize(fmt)``).

   .. versionadded:: 2.5


.. function:: calcsize(fmt)

   Return the size of the struct (and hence of the string) corresponding to the
   given format.

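As a rough sketch of how these functions fit together, the snippet below packs three integers, checks the size, and round-trips the data through a buffer. The variable names are illustrative only. It assumes an interpreter where ``b'...'`` literals and ``bytearray`` are available and where ``pack_into`` accepts a ``bytearray``; on older releases another writable buffer (for example one from ``array`` or ``ctypes``) may be needed. The ``'<'`` prefix is used so the result does not depend on the host platform.

```python
import struct

# '<' selects little-endian byte order with standard sizes and no padding,
# so the values below are the same on every platform.
fmt = '<hhl'

packed = struct.pack(fmt, 1, 2, 3)
size = struct.calcsize(fmt)          # 2 + 2 + 4 bytes
values = struct.unpack(fmt, packed)  # always a tuple

# pack_into() writes into an existing writable buffer at a given offset.
buf = bytearray(4 + size)            # 4 spare leading bytes, zero-filled
struct.pack_into(fmt, buf, 4, 1, 2, 3)

# unpack_from() starts reading at an offset and tolerates extra data
# after the packed struct, unlike unpack().
from_offset = struct.unpack_from(fmt, bytes(buf), 4)
```

Note that ``unpack`` would reject ``buf`` as a whole, because its length differs from ``calcsize(fmt)``; ``unpack_from`` only requires that enough data follows the offset.
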

.. _struct-format-strings:

Format Strings
--------------

Format strings are the mechanism used to specify the expected layout when
packing and unpacking data. They are built up from :ref:`format-characters`,
which specify the type of data being packed/unpacked. In addition, there are
special characters for controlling the :ref:`struct-alignment`.


.. _struct-alignment:

Byte Order, Size, and Alignment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

By default, C types are represented in the machine's native format and byte
order, and properly aligned by skipping pad bytes if necessary (according to the
rules used by the C compiler).

Alternatively, the first character of the format string can be used to indicate
the byte order, size and alignment of the packed data, according to the
following table:

+-----------+------------------------+----------+-----------+
| Character | Byte order             | Size     | Alignment |
+===========+========================+==========+===========+
| ``@``     | native                 | native   | native    |
+-----------+------------------------+----------+-----------+
| ``=``     | native                 | standard | none      |
+-----------+------------------------+----------+-----------+
| ``<``     | little-endian          | standard | none      |
+-----------+------------------------+----------+-----------+
| ``>``     | big-endian             | standard | none      |
+-----------+------------------------+----------+-----------+
| ``!``     | network (= big-endian) | standard | none      |
+-----------+------------------------+----------+-----------+

If the first character is not one of these, ``'@'`` is assumed.

Native byte order is big-endian or little-endian, depending on the host
system. For example, Intel x86 and AMD64 (x86-64) are little-endian;
Motorola 68000 and PowerPC G5 are big-endian; ARM and Intel Itanium feature
switchable endianness (bi-endian). Use ``sys.byteorder`` to check the
endianness of your system.

Native size and alignment are determined using the C compiler's
``sizeof`` expression. This is always combined with native byte order.

Standard size depends only on the format character; see the table in
the :ref:`format-characters` section.

Note the difference between ``'@'`` and ``'='``: both use native byte order, but
the size and alignment of the latter are standardized.

The form ``'!'`` is available for those poor souls who claim they can't remember
whether network byte order is big-endian or little-endian.

There is no way to indicate non-native byte order (force byte-swapping); use the
appropriate choice of ``'<'`` or ``'>'``.

Notes:

(1) Padding is only automatically added between successive structure members.
    No padding is added at the beginning or the end of the encoded struct.

(2) No padding is added when using non-native size and alignment, e.g.
    with ``'<'``, ``'>'``, ``'='``, and ``'!'``.

(3) To align the end of a structure to the alignment requirement of a
    particular type, end the format with the code for that type with a repeat
    count of zero. See :ref:`struct-examples`.

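The effect of the first character can be checked with a small sketch like the one below. The value ``0x12131415`` and the variable names are arbitrary choices for illustration, and ``b'...'`` literals are assumed to be available. Only the native-mode size is platform-dependent, so no fixed value is claimed for it.

```python
import struct
import sys

value = 0x12131415

little = struct.pack('<i', value)   # least significant byte first
big = struct.pack('>i', value)      # most significant byte first
network = struct.pack('!i', value)  # network order is big-endian

# '=' uses the host byte order, but standard sizes and no padding.
native_order = struct.pack('=i', value)
expected = little if sys.byteorder == 'little' else big

# Standard modes never pad; native mode ('@') may insert pad bytes
# before the int so that it is properly aligned.
standard_size = struct.calcsize('=ci')   # 1 + 4
native_size = struct.calcsize('@ci')     # platform-dependent
```

On a typical platform with a 4-byte, 4-byte-aligned ``int``, ``native_size`` would be 8 while ``standard_size`` is always 5.
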

.. _format-characters:

Format Characters
^^^^^^^^^^^^^^^^^

Format characters have the following meaning; the conversion between C and
Python values should be obvious given their types. The 'Standard size' column
refers to the size of the packed value in bytes when using standard size; that
is, when the format string starts with one of ``'<'``, ``'>'``, ``'!'`` or
``'='``. When using native size, the size of the packed value is
platform-dependent.

+--------+--------------------------+--------------------+----------------+------------+
| Format | C Type                   | Python type        | Standard size  | Notes      |
+========+==========================+====================+================+============+
| ``x``  | pad byte                 | no value           |                |            |
+--------+--------------------------+--------------------+----------------+------------+
| ``c``  | :c:type:`char`           | string of length 1 | 1              |            |
+--------+--------------------------+--------------------+----------------+------------+
| ``b``  | :c:type:`signed char`    | integer            | 1              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``B``  | :c:type:`unsigned char`  | integer            | 1              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``?``  | :c:type:`_Bool`          | bool               | 1              | \(1)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``h``  | :c:type:`short`          | integer            | 2              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``H``  | :c:type:`unsigned short` | integer            | 2              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``i``  | :c:type:`int`            | integer            | 4              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``I``  | :c:type:`unsigned int`   | integer            | 4              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``l``  | :c:type:`long`           | integer            | 4              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``L``  | :c:type:`unsigned long`  | integer            | 4              | \(3)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``q``  | :c:type:`long long`      | integer            | 8              | \(2), \(3) |
+--------+--------------------------+--------------------+----------------+------------+
| ``Q``  | :c:type:`unsigned long   | integer            | 8              | \(2), \(3) |
|        | long`                    |                    |                |            |
+--------+--------------------------+--------------------+----------------+------------+
| ``f``  | :c:type:`float`          | float              | 4              | \(4)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``d``  | :c:type:`double`         | float              | 8              | \(4)       |
+--------+--------------------------+--------------------+----------------+------------+
| ``s``  | :c:type:`char[]`         | string             |                |            |
+--------+--------------------------+--------------------+----------------+------------+
| ``p``  | :c:type:`char[]`         | string             |                |            |
+--------+--------------------------+--------------------+----------------+------------+
| ``P``  | :c:type:`void \*`        | integer            |                | \(5), \(3) |
+--------+--------------------------+--------------------+----------------+------------+

Notes:

(1)
   The ``'?'`` conversion code corresponds to the :c:type:`_Bool` type defined by
   C99. If this type is not available, it is simulated using a :c:type:`char`. In
   standard mode, it is always represented by one byte.

   .. versionadded:: 2.6

(2)
   The ``'q'`` and ``'Q'`` conversion codes are available in native mode only if
   the platform C compiler supports C :c:type:`long long`, or, on Windows,
   :c:type:`__int64`. They are always available in standard modes.

   .. versionadded:: 2.2

(3)
   When attempting to pack a non-integer using any of the integer conversion
   codes, if the non-integer has a :meth:`__index__` method then that method is
   called to convert the argument to an integer before packing. If no
   :meth:`__index__` method exists, or the call to :meth:`__index__` raises
   :exc:`TypeError`, then the :meth:`__int__` method is tried. However, the use
   of :meth:`__int__` is deprecated, and will raise :exc:`DeprecationWarning`.

   .. versionchanged:: 2.7
      Use of the :meth:`__index__` method for non-integers is new in 2.7.

   .. versionchanged:: 2.7
      Prior to version 2.7, not all integer conversion codes would use the
      :meth:`__int__` method to convert, and :exc:`DeprecationWarning` was
      raised only for float arguments.

(4)
   For the ``'f'`` and ``'d'`` conversion codes, the packed representation uses
   the IEEE 754 binary32 (for ``'f'``) or binary64 (for ``'d'``) format,
   regardless of the floating-point format used by the platform.

(5)
   The ``'P'`` format character is only available for the native byte ordering
   (selected as the default or with the ``'@'`` byte order character). The byte
   order character ``'='`` chooses to use little- or big-endian ordering based
   on the host system. The struct module does not interpret this as native
   ordering, so the ``'P'`` format is not available.

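Notes (4) and (5) can be illustrated with a short sketch. The variable names are hypothetical, and ``b'...'`` literals are assumed to be available. The expected byte patterns in the comments are the IEEE 754 encodings of ``1.0``.

```python
import struct

# Note (4): 'f' and 'd' use IEEE 754 binary32 / binary64.
single = struct.pack('>f', 1.0)   # 0x3F800000
double = struct.pack('>d', 1.0)   # 0x3FF0000000000000

# binary32 cannot represent 0.1 exactly, so a round trip through 'f'
# returns a nearby value rather than 0.1 itself.
(roundtrip,) = struct.unpack('>f', struct.pack('>f', 0.1))

# Note (5): 'P' is accepted in native mode only, so asking for it
# together with '=' is expected to raise struct.error.
pointer_size = struct.calcsize('P')
try:
    struct.calcsize('=P')
    p_allowed_in_standard_mode = True
except struct.error:
    p_allowed_in_standard_mode = False
```
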

A format character may be preceded by an integral repeat count. For example,
the format string ``'4h'`` means exactly the same as ``'hhhh'``.

Whitespace characters between formats are ignored; a count and its format
character must not be separated by whitespace, though.

For the ``'s'`` format character, the count is interpreted as the size of the
string, not a repeat count as for the other format characters; for example,
``'10s'`` means a single 10-byte string, while ``'10c'`` means 10 characters.
If a count is not given, it defaults to 1. For packing, the string is
truncated or padded with null bytes as appropriate to make it fit. For
unpacking, the resulting string always has exactly the specified number of
bytes. As a special case, ``'0s'`` means a single, empty string (while
``'0c'`` means 0 characters).

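The difference between a size count (``'s'``) and a repeat count (other codes) can be seen in a minimal sketch such as the following. The sample data and variable names are made up for illustration, and ``b'...'`` literals are assumed to be available.

```python
import struct

padded = struct.pack('5s', b'abc')         # short strings are null-padded
truncated = struct.pack('2s', b'abcdef')   # long strings are cut to fit

# '10s' yields one 10-byte string; '10c' yields ten 1-byte strings.
data = b'0123456789'
one_string = struct.unpack('10s', data)
ten_chars = struct.unpack('10c', data)

# For other codes the count simply repeats the code.
same_size = struct.calcsize('<4h') == struct.calcsize('<hhhh')

# '0s' is a single empty string, while '0c' is no items at all.
empty_string = struct.unpack('0s', b'')
no_items = struct.unpack('0c', b'')
```
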

The ``'p'`` format character encodes a "Pascal string", meaning a short
variable-length string stored in a *fixed number of bytes*, given by the count.
The first byte stored is the length of the string, or 255, whichever is smaller.
The bytes of the string follow. If the string passed in to :func:`pack` is too
long (longer than the count minus 1), only the leading ``count-1`` bytes of the
string are stored. If the string is shorter than ``count-1``, it is padded with
null bytes so that exactly count bytes in all are used. Note that for
:func:`unpack`, the ``'p'`` format character consumes count bytes, but that the
string returned can never contain more than 255 characters.

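A small sketch of the Pascal-string layout, using made-up sample strings and assuming ``b'...'`` literals are available:

```python
import struct

# Count 5: one length byte plus up to four bytes of string data.
short = struct.pack('5p', b'ab')          # length byte, data, null padding
too_long = struct.pack('3p', b'abcdef')   # only count-1 == 2 bytes are kept

# Unpacking consumes all count bytes but returns only the stored string.
(recovered,) = struct.unpack('5p', short)
total_size = struct.calcsize('5p')
```

Here ``short`` starts with the length byte ``\x02``, and the padding bytes are dropped again when the value is unpacked.
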

For the ``'P'`` format character, the return value is a Python integer or long
integer, depending on the size needed to hold a pointer when it has been cast to
an integer type. A *NULL* pointer will always be returned as the Python integer
``0``. When packing pointer-sized values, Python integer or long integer objects
may be used. For example, the Alpha and Merced processors use 64-bit pointer
values, meaning a Python long integer will be used to hold the pointer; other
platforms use 32-bit pointers and will use a Python integer.

For the ``'?'`` format character, the return value is either :const:`True` or
:const:`False`. When packing, the truth value of the argument object is used.
Either 0 or 1 in the native or standard bool representation will be packed, and
any non-zero value will be True when unpacking.

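The behaviour of ``'?'`` and of a *NULL* pointer with ``'P'`` can be sketched as follows. The arguments are arbitrary examples, ``b'...'`` literals are assumed to be available, and ``'<?'`` is used so that the one-byte standard representation applies.

```python
import struct

# Packing uses the truth value of the argument, whatever its type.
packed_true = struct.pack('<?', 'non-empty string')
packed_false = struct.pack('<?', [])

# Any non-zero byte unpacks as True.
(flag,) = struct.unpack('<?', b'\x05')

# 'P' (native mode only): a NULL pointer round-trips as the integer 0.
(null_pointer,) = struct.unpack('P', struct.pack('P', 0))
```
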

.. _struct-examples:

Examples
^^^^^^^^

.. note::
   All examples assume a native byte order, size, and alignment with a
   big-endian machine.

A basic example of packing/unpacking three integers::

   >>> from struct import *
   >>> pack('hhl', 1, 2, 3)
   '\x00\x01\x00\x02\x00\x00\x00\x03'
   >>> unpack('hhl', '\x00\x01\x00\x02\x00\x00\x00\x03')
   (1, 2, 3)
   >>> calcsize('hhl')
   8

Unpacked fields can be named by assigning them to variables or by wrapping
the result in a named tuple::

   >>> record = 'raymond   \x32\x12\x08\x01\x08'
   >>> name, serialnum, school, gradelevel = unpack('<10sHHb', record)

   >>> from collections import namedtuple
   >>> Student = namedtuple('Student', 'name serialnum school gradelevel')
   >>> Student._make(unpack('<10sHHb', record))
   Student(name='raymond   ', serialnum=4658, school=264, gradelevel=8)

The ordering of format characters may have an impact on size since the padding
needed to satisfy alignment requirements is different::

   >>> pack('ci', '*', 0x12131415)
   '*\x00\x00\x00\x12\x13\x14\x15'
   >>> pack('ic', 0x12131415, '*')
   '\x12\x13\x14\x15*'
   >>> calcsize('ci')
   8
   >>> calcsize('ic')
   5

The following format ``'llh0l'`` specifies two pad bytes at the end, assuming
longs are aligned on 4-byte boundaries::

   >>> pack('llh0l', 1, 2, 3)
   '\x00\x00\x00\x01\x00\x00\x00\x02\x00\x03\x00\x00'

This only works when native size and alignment are in effect; standard size and
alignment do not enforce any alignment.

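Because the exact native sizes vary between platforms, a portable way to observe the zero-repeat-count trick is to compare sizes rather than bytes. This is only a sketch with illustrative variable names; the native values are not fixed, so only the standard-mode size is stated in the comments.

```python
import struct

# Native mode: '0l' adds no field, but pads the struct so that its end
# meets the alignment requirement of a C long.
without_tail = struct.calcsize('@llh')
with_tail = struct.calcsize('@llh0l')
packed = struct.pack('@llh0l', 1, 2, 3)   # '0l' consumes no argument

# Standard modes never align, so the same trick has no effect there.
std_without_tail = struct.calcsize('<llh')   # 4 + 4 + 2
std_with_tail = struct.calcsize('<llh0l')
```
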

.. seealso::

   Module :mod:`array`
      Packed binary storage of homogeneous data.

   Module :mod:`xdrlib`
      Packing and unpacking of XDR data.


.. _struct-objects:

Classes
-------

The :mod:`struct` module also defines the following type:


.. class:: Struct(format)

   Return a new Struct object which writes and reads binary data according to
   the format string *format*. Creating a Struct object once and calling its
   methods is more efficient than calling the :mod:`struct` functions with the
   same format since the format string only needs to be compiled once.

   .. versionadded:: 2.5

   Compiled Struct objects support the following methods and attributes:


   .. method:: pack(v1, v2, ...)

      Identical to the :func:`pack` function, using the compiled format.
      (``len(result)`` will equal :attr:`self.size`.)


   .. method:: pack_into(buffer, offset, v1, v2, ...)

      Identical to the :func:`pack_into` function, using the compiled format.


   .. method:: unpack(string)

      Identical to the :func:`unpack` function, using the compiled format.
      (``len(string)`` must equal :attr:`self.size`).


   .. method:: unpack_from(buffer, offset=0)

      Identical to the :func:`unpack_from` function, using the compiled format.
      (``len(buffer[offset:])`` must be at least :attr:`self.size`).


   .. attribute:: format

      The format string used to construct this Struct object.

   .. attribute:: size

      The calculated size of the struct (and hence of the string) corresponding
      to :attr:`format`.

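A minimal sketch of reusing one compiled ``Struct`` for several records follows. The record layout is borrowed from the student example earlier in this document; the second record, the buffer handling, and the variable names are invented for illustration. It assumes ``b'...'`` literals and a ``bytearray`` that ``pack_into`` accepts.

```python
import struct

# Compile the format once and reuse it for many records.
student = struct.Struct('<10sHHb')

record = student.pack(b'raymond   ', 4658, 264, 8)
fields = student.unpack(record)

# Two records stored back to back in one writable buffer.
buf = bytearray(2 * student.size)
student.pack_into(buf, 0, b'raymond   ', 4658, 264, 8)
student.pack_into(buf, student.size, b'guido     ', 1, 2, 3)

# Read the second record by starting at an offset of one record size.
second = student.unpack_from(bytes(buf), student.size)
```

Using ``student.size`` as the stride keeps the offsets consistent with the compiled format, which is the main convenience over repeating the format string in each module-level call.
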