\section{\module{zlib} ---
         Compression compatible with \program{gzip}}

\declaremodule{builtin}{zlib}
\modulesynopsis{Low-level interface to compression and decompression
                routines compatible with \program{gzip}.}

For applications that require data compression, the functions in this
module allow compression and decompression, using the zlib library.
The zlib library has its own home page at \url{http://www.zlib.net}.
There are known incompatibilities between the Python module and
versions of the zlib library earlier than 1.1.3; 1.1.3 has a security
vulnerability, so we recommend using 1.1.4 or later.

zlib's functions have many options and often need to be used in a
particular order. This documentation doesn't attempt to cover all of
the permutations; consult the zlib manual at
\url{http://www.zlib.net/manual.html} for authoritative information.

The available exception and functions in this module are:

\begin{excdesc}{error}
Exception raised on compression and decompression errors.
\end{excdesc}

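For instance, passing data that is not a valid zlib stream to
\function{decompress()} raises \exception{error}. The sketch below
assumes Python 3, where the module works on \class{bytes} objects rather
than strings, and uses a hypothetical helper name, \code{try_decompress},
that reports failure by returning \code{None}.

```python
import zlib

def try_decompress(blob):
    """Hypothetical helper: return the decompressed bytes, or None if blob is not valid zlib data."""
    try:
        return zlib.decompress(blob)
    except zlib.error:
        # Raised for bad headers, corrupt data, truncated streams, and similar problems.
        return None

assert try_decompress(zlib.compress(b"hello")) == b"hello"
assert try_decompress(b"definitely not zlib data") is None
```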
\begin{funcdesc}{adler32}{string\optional{, value}}
Computes an Adler-32 checksum of \var{string}. (An Adler-32
checksum is almost as reliable as a CRC32 but can be computed much
more quickly.) If \var{value} is present, it is used as the
starting value of the checksum; otherwise, a fixed default value is
used. This allows computing a running checksum over the
concatenation of several input strings. The algorithm is not
cryptographically strong, and should not be used for
authentication or digital signatures. Since the algorithm is
designed for use as a checksum algorithm, it is not suitable for
use as a general hash algorithm.
\end{funcdesc}

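The running-checksum behaviour can be illustrated with a short sketch.
It assumes Python 3, where \function{adler32()} takes \class{bytes}
objects, and the helper name \code{running_adler32} is hypothetical.
Each chunk's result is passed back in as \var{value}, so the final
checksum should agree with a single call over the concatenated input.

```python
import zlib

def running_adler32(chunks):
    """Hypothetical helper: Adler-32 of the concatenation of `chunks`."""
    # The checksum of empty input equals the fixed default starting value.
    checksum = zlib.adler32(b"")
    for chunk in chunks:
        # Pass the previous result as `value` to continue the running checksum.
        checksum = zlib.adler32(chunk, checksum)
    return checksum

parts = [b"The quick brown fox ", b"jumps over ", b"the lazy dog"]
assert running_adler32(parts) == zlib.adler32(b"".join(parts))
```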
\begin{funcdesc}{compress}{string\optional{, level}}
Compresses the data in \var{string}, returning a string containing
compressed data. \var{level} is an integer from \code{1} to
\code{9} controlling the level of compression; \code{1} is fastest
and produces the least compression, \code{9} is slowest and produces
the most. The default value is \code{6}. Raises the
\exception{error} exception if any error occurs.
\end{funcdesc}

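As a rough illustration (Python 3, so the input is a \class{bytes}
object), the sketch below compresses the same repetitive data at the
fastest and the strongest level and checks that \function{decompress()}
restores it. The actual sizes depend on the input and on the zlib
version, so no particular sizes are claimed here.

```python
import zlib

data = b"zlib compresses repetitive input well. " * 200

fast = zlib.compress(data, 1)   # level 1: fastest, least compression
best = zlib.compress(data, 9)   # level 9: slowest, most compression

# Both results are complete zlib streams that decompress to the original.
assert zlib.decompress(fast) == data
assert zlib.decompress(best) == data
print(len(data), len(fast), len(best))
```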
\begin{funcdesc}{compressobj}{\optional{level}}
Returns a compression object, to be used for compressing data streams
that won't fit into memory at once. \var{level} is an integer from
\code{1} to \code{9} controlling the level of compression; \code{1} is
fastest and produces the least compression, \code{9} is slowest and
produces the most. The default value is \code{6}.
\end{funcdesc}

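A minimal streaming sketch, assuming Python 3 and an iterable of
\class{bytes} chunks (in practice these might be successive reads from a
file; the helper name \code{compress_chunks} is hypothetical): each chunk
goes through the object's \method{compress()} method, and a final
\method{flush()} ends the stream, so the joined output is a single stream
that \function{decompress()} accepts.

```python
import zlib

def compress_chunks(chunks, level=6):
    """Hypothetical helper: compress an iterable of byte chunks into one zlib stream."""
    compressor = zlib.compressobj(level)
    parts = []
    for chunk in chunks:
        # May return b"" while the compressor buffers input internally.
        parts.append(compressor.compress(chunk))
    # flush() defaults to Z_FINISH, which writes the remaining output and ends the stream.
    parts.append(compressor.flush())
    return b"".join(parts)

chunks = [b"first block of data\n" * 100, b"second block of data\n" * 100]
assert zlib.decompress(compress_chunks(chunks)) == b"".join(chunks)
```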
\begin{funcdesc}{crc32}{string\optional{, value}}
Computes a CRC (Cyclic Redundancy Check)%
\index{Cyclic Redundancy Check}
\index{checksum!Cyclic Redundancy Check}
checksum of \var{string}. If
\var{value} is present, it is used as the starting value of the
checksum; otherwise, a fixed default value is used. This allows
computing a running checksum over the concatenation of several
input strings. The algorithm is not cryptographically strong, and
should not be used for authentication or digital signatures. Since
the algorithm is designed for use as a checksum algorithm, it is not
suitable for use as a general hash algorithm.
\end{funcdesc}

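Because \var{value} carries the checksum forward, a file can be checked
without reading it into memory all at once. The following is a sketch
under these assumptions: Python 3, a binary file-like object, and a
hypothetical helper \code{crc32_of_stream}. The final mask to 32 bits
keeps the result unsigned; this matters on Python 2, where
\function{crc32()} can return a negative integer.

```python
import io
import zlib

def crc32_of_stream(stream, chunk_size=65536):
    """Hypothetical helper: CRC-32 of everything readable from a binary stream."""
    crc = 0  # the CRC-32 of empty input, i.e. the default starting value
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        crc = zlib.crc32(chunk, crc)  # continue the running checksum
    return crc & 0xFFFFFFFF  # force an unsigned 32-bit result

# An in-memory stream; a file opened with open(path, "rb") works the same way.
print(hex(crc32_of_stream(io.BytesIO(b"123456789"), chunk_size=4)))  # prints 0xcbf43926
```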
\begin{funcdesc}{decompress}{string\optional{, wbits\optional{, bufsize}}}
Decompresses the data in \var{string}, returning a string containing
the uncompressed data. The \var{wbits} parameter controls the size of
the window buffer. If \var{bufsize} is given, it is used as the
initial size of the output buffer. Raises the \exception{error}
exception if any error occurs.

The absolute value of \var{wbits} is the base two logarithm of the
size of the history buffer (the ``window size'') used when compressing
data. Its absolute value should be between 8 and 15 for the most
recent versions of the zlib library, larger values resulting in better
compression at the expense of greater memory usage. The default value
is 15. When \var{wbits} is negative, the standard zlib header and
trailer are suppressed and the input is treated as a raw deflate
stream; this feature of the zlib library is used for compatibility
with \program{unzip}'s compression file format.

\var{bufsize} is the initial size of the buffer used to hold
decompressed data. If more space is required, the buffer size will be
increased as needed, so you don't have to get this value exactly
right; tuning it will only save a few calls to \cfunction{malloc()}. The
default size is 16384.
\end{funcdesc}

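The effect of a negative \var{wbits} can be seen in a short sketch. It
assumes Python 3, where \function{compressobj()} also accepts
\var{method} and \var{wbits} arguments after \var{level} (this section
does not document them). The sketch produces a raw deflate stream and
decompresses it with a matching negative \var{wbits}; decompressing the
same bytes with the default \var{wbits} is expected to raise
\exception{error}, because there is no zlib header to validate.

```python
import zlib

data = b"raw deflate example " * 50

# Python 3: compressobj(level, method, wbits); a negative wbits omits the zlib header and trailer.
compressor = zlib.compressobj(9, zlib.DEFLATED, -15)
raw = compressor.compress(data) + compressor.flush()

# Raw data must be decompressed with a matching negative wbits value.
assert zlib.decompress(raw, -15) == data

# With the default wbits, zlib looks for a header that is not there.
try:
    zlib.decompress(raw)
except zlib.error as exc:
    print("rejected:", exc)
```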
\begin{funcdesc}{decompressobj}{\optional{wbits}}
Returns a decompression object, to be used for decompressing data
streams that won't fit into memory at once. The \var{wbits}
parameter controls the size of the window buffer.
\end{funcdesc}

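A matching streaming sketch, again assuming Python 3 and a hypothetical
helper name, \code{decompress_chunks}: compressed input arrives in
arbitrary pieces, each piece is passed to the object's
\method{decompress()} method, and \method{flush()} collects whatever
output remains.

```python
import zlib

def decompress_chunks(chunks):
    """Hypothetical helper: decompress a zlib stream delivered as byte chunks."""
    decompressor = zlib.decompressobj()
    # Chunk boundaries need not line up with anything in the compressed format.
    parts = [decompressor.decompress(chunk) for chunk in chunks]
    parts.append(decompressor.flush())  # any output still held internally
    return b"".join(parts)

original = b"streamed payload " * 500
compressed = zlib.compress(original)
pieces = [compressed[i:i + 7] for i in range(0, len(compressed), 7)]
assert decompress_chunks(pieces) == original
```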
Compression objects support the following methods:

\begin{methoddesc}[Compress]{compress}{string}
Compress \var{string}, returning a string containing compressed data
for at least part of the data in \var{string}. This data should be
concatenated to the output produced by any preceding calls to the
\method{compress()} method. Some input may be kept in internal buffers
for later processing.
\end{methoddesc}

\begin{methoddesc}[Compress]{flush}{\optional{mode}}
All pending input is processed, and a string containing the remaining
compressed output is returned. \var{mode} can be selected from the
constants \constant{Z_SYNC_FLUSH}, \constant{Z_FULL_FLUSH}, or
\constant{Z_FINISH}, defaulting to \constant{Z_FINISH}.
\constant{Z_SYNC_FLUSH} and \constant{Z_FULL_FLUSH} allow compressing
further strings of data, while \constant{Z_FINISH} finishes the
compressed stream and prevents compressing any more data. After calling
\method{flush()} with \var{mode} set to \constant{Z_FINISH}, the
\method{compress()} method cannot be called again; the only realistic
action is to delete the object.
\end{methoddesc}

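The difference between the flush modes matters when compressed data is
sent incrementally, for example over a network connection. In this
Python 3 sketch, \constant{Z_SYNC_FLUSH} makes everything compressed so
far decodable by the receiver while keeping the stream open, and
\constant{Z_FINISH} then ends it.

```python
import zlib

compressor = zlib.compressobj()
decompressor = zlib.decompressobj()

# Z_SYNC_FLUSH emits all pending output, so the receiver can decode the first
# message now, while the stream stays open for more data.
first = compressor.compress(b"message one\n") + compressor.flush(zlib.Z_SYNC_FLUSH)
assert decompressor.decompress(first) == b"message one\n"

# Z_FINISH writes the rest and ends the stream; compress() must not be called again.
second = compressor.compress(b"message two\n") + compressor.flush(zlib.Z_FINISH)
assert decompressor.decompress(second) == b"message two\n"
```

Concatenated, \code{first} and \code{second} form one complete zlib
stream.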
\begin{methoddesc}[Compress]{copy}{}
Returns a copy of the compression object. This can be used to efficiently
compress a set of data that share a common initial prefix.
\versionadded{2.5}
\end{methoddesc}

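A sketch of the shared-prefix use case, assuming Python 3 and a
hypothetical helper \code{compress_with_prefix}: the prefix is
compressed once, and each \method{copy()} continues independently from
that state, so every result is a complete stream for the prefix followed
by one suffix.

```python
import zlib

def compress_with_prefix(prefix, suffixes, level=6):
    """Hypothetical helper: compress prefix + suffix for each suffix, sharing the prefix work."""
    base = zlib.compressobj(level)
    head = base.compress(prefix)  # output emitted so far; may be empty
    results = []
    for suffix in suffixes:
        branch = base.copy()  # independent compressor in the same state as base
        results.append(head + branch.compress(suffix) + branch.flush())
    return results

prefix = b"common header shared by every message\n" * 20
outputs = compress_with_prefix(prefix, [b"body A", b"body B"])
assert [zlib.decompress(o) for o in outputs] == [prefix + b"body A", prefix + b"body B"]
```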
Decompression objects support the following methods and two attributes:

\begin{memberdesc}{unused_data}
A string which contains any bytes past the end of the compressed data.
That is, this remains \code{""} until the last byte that contains
compression data is available. If the whole string turned out to
contain compressed data, this is \code{""}, the empty string.

The only way to determine where a string of compressed data ends is by
actually decompressing it. This means that when compressed data is
only part of a larger file, you can only find the end of it by
reading data and feeding it, followed by some non-empty string, into a
decompression object's \method{decompress()} method until the
\member{unused_data} attribute is no longer the empty string.
\end{memberdesc}

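The procedure can be sketched as follows, assuming Python 3 and a
hypothetical helper \code{split_zlib_stream} that reads an in-memory
byte string in small chunks, standing in for reads from a file. It stops
feeding input as soon as \member{unused_data} becomes non-empty and
returns the decompressed payload together with every byte that follows
the stream.

```python
import zlib

def split_zlib_stream(blob, chunk_size=64):
    """Hypothetical helper: decompress the zlib stream at the start of blob.

    Returns (payload, trailing), where trailing holds every byte after the stream.
    """
    decompressor = zlib.decompressobj()
    parts = []
    pos = 0
    # Keep feeding input until bytes past the end of the stream show up in unused_data.
    while not decompressor.unused_data and pos < len(blob):
        parts.append(decompressor.decompress(blob[pos:pos + chunk_size]))
        pos += chunk_size
    parts.append(decompressor.flush())
    # unused_data holds trailing bytes already fed in; blob[pos:] is the unread rest.
    return b"".join(parts), decompressor.unused_data + blob[pos:]

payload = b"compressed payload " * 40
blob = zlib.compress(payload) + b"TRAILING RECORD"
assert split_zlib_stream(blob) == (payload, b"TRAILING RECORD")
```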
\begin{memberdesc}{unconsumed_tail}
A string that contains any data that was not consumed by the last
\method{decompress()} call because it exceeded the limit for the
uncompressed data buffer. This data has not yet been seen by the zlib
machinery, so you must feed it (possibly with further data
concatenated to it) back to a subsequent \method{decompress()} method
call in order to get correct output.
\end{memberdesc}

\begin{methoddesc}[Decompress]{decompress}{string\optional{, max_length}}
Decompress \var{string}, returning a string containing the
uncompressed data corresponding to at least part of the data in
\var{string}. This data should be concatenated to the output produced
by any preceding calls to the
\method{decompress()} method. Some of the input data may be preserved
in internal buffers for later processing.

If the optional parameter \var{max_length} is supplied then the return value
will be no longer than \var{max_length}. This may mean that not all of the
compressed input can be processed; in that case, the unconsumed data is
stored in the attribute \member{unconsumed_tail}. This string must be passed
to a subsequent call to \method{decompress()} if decompression is to
continue. If \var{max_length} is not supplied then the whole input is
decompressed, and \member{unconsumed_tail} is an empty string.
\end{methoddesc}

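A sketch of bounded decompression, assuming Python 3 and a hypothetical
generator name, \code{iter_decompress}: each call returns at most
\var{max_length} bytes, and \member{unconsumed_tail} is fed back in on
the next call. The loop keeps calling \method{decompress()} even after
the tail is empty, because output derived from input that has already
been consumed may still be pending; it stops once a call yields nothing
and no input is left.

```python
import zlib

def iter_decompress(compressed, max_length=1024):
    """Hypothetical helper: yield decompressed data in pieces of at most max_length bytes."""
    decompressor = zlib.decompressobj()
    data = compressed
    while True:
        piece = decompressor.decompress(data, max_length)
        # Input that could not be processed within the limit must be passed back.
        data = decompressor.unconsumed_tail
        if piece:
            yield piece
        elif not data:
            break  # no new output and no input left

payload = b"a" * 100000 + bytes(range(256)) * 40
pieces = list(iter_decompress(zlib.compress(payload), 1000))
assert b"".join(pieces) == payload
```

Memory use stays bounded by \var{max_length} per piece, at the cost of
more calls for highly compressible input.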
\begin{methoddesc}[Decompress]{flush}{\optional{length}}
All pending input is processed, and a string containing the remaining
uncompressed output is returned. After calling \method{flush()}, the
\method{decompress()} method cannot be called again; the only realistic
action is to delete the object.

The optional parameter \var{length} sets the initial size of the
output buffer.
\end{methoddesc}

\begin{methoddesc}[Decompress]{copy}{}
Returns a copy of the decompression object. This can be used to save the
state of the decompressor midway through the data stream in order to speed up
random seeks into the stream at a future point.
\versionadded{2.5}
\end{methoddesc}

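A small checkpointing sketch, assuming Python 3: after the first half of
the input has been fed in, \method{copy()} takes a snapshot, and both the
original object and the snapshot then produce identical output from the
remaining input. A real seek index would keep several such snapshots
keyed by input offset; that bookkeeping is left out here.

```python
import zlib

original = b"".join(b"record %d\n" % i for i in range(1000))
compressed = zlib.compress(original)
split = len(compressed) // 2

decompressor = zlib.decompressobj()
head = decompressor.decompress(compressed[:split])
checkpoint = decompressor.copy()  # independent snapshot of the current state

# Both objects continue from the same point and yield the same remaining output.
rest = decompressor.decompress(compressed[split:]) + decompressor.flush()
rest_again = checkpoint.decompress(compressed[split:]) + checkpoint.flush()
assert rest == rest_again
assert head + rest == original
```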
\begin{seealso}
\seemodule{gzip}{Reading and writing \program{gzip}-format files.}
\seeurl{http://www.zlib.net}{The zlib library home page.}
\seeurl{http://www.zlib.net/manual.html}{The zlib manual explains
the semantics and usage of the library's many functions.}
\end{seealso}