\section{\module{rfc822} ---
         Parse RFC 2822 mail headers}

\declaremodule{standard}{rfc822}
\modulesynopsis{Parse \rfc{2822} style mail messages.}

\deprecated{2.3}{The \refmodule{email} package should be used in
                 preference to the \module{rfc822} module.  This
                 module is present only to maintain backward
                 compatibility.}

This module defines a class, \class{Message}, which represents an
``email message'' as defined by the Internet standard
\rfc{2822}.\footnote{This module originally conformed to \rfc{822},
hence the name.  Since then, \rfc{2822} has been released as an
update to \rfc{822}.  This module should be considered
\rfc{2822}-conformant, especially in cases where the
syntax or semantics have changed since \rfc{822}.}  Such messages
consist of a collection of message headers, and a message body.  This
module also defines a helper class
\class{AddressList} for parsing \rfc{2822} addresses.  Please refer to
the RFC for information on the specific syntax of \rfc{2822} messages.

The \refmodule{mailbox}\refstmodindex{mailbox} module provides classes
to read mailboxes produced by various end-user mail programs.

\begin{classdesc}{Message}{file\optional{, seekable}}
A \class{Message} instance is instantiated with an input object as
parameter.  \class{Message} relies only on the input object having a
\method{readline()} method; in particular, ordinary file objects
qualify.  Instantiation reads headers from the input object up to a
delimiter line (normally a blank line) and stores them in the
instance.  The message body, following the headers, is not consumed.

This class can work with any input object that supports a
\method{readline()} method.  If the input object has seek and tell
capability, the \method{rewindbody()} method will work; also, illegal
lines will be pushed back onto the input stream.  If the input object
lacks seek but has an \method{unread()} method that can push back a
line of input, \class{Message} will use that to push back illegal
lines.  Thus this class can be used to parse messages coming from a
buffered stream.

The optional \var{seekable} argument is provided as a workaround for
certain stdio libraries in which \cfunction{tell()} discards buffered
data before discovering that the \cfunction{lseek()} system call
doesn't work.  For maximum portability, you should set the
\var{seekable} argument to zero to prevent that initial \method{tell()}
when passing in an unseekable object such as a file object created
from a socket object.

Input lines as read from the file may be terminated either by CR-LF or
by a single linefeed; a terminating CR-LF is replaced by a single
linefeed before the line is stored.

All header matching is case-insensitive;
e.g.\ \code{\var{m}['From']}, \code{\var{m}['from']} and
\code{\var{m}['FROM']} all yield the same result.
\end{classdesc}

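The reading behaviour described for \class{Message} can be sketched
without the module itself.  The following is a minimal illustration
and not the module's implementation: \code{read_headers()} is a
hypothetical helper that reads from any object with a
\method{readline()} method, replaces a terminating CR-LF by a single
linefeed, stops at the blank delimiter line, and keys headers by
lower-cased name so that lookups ignore case.  Unlike \class{Message},
it keeps only the last of several same-named headers and does not push
illegal lines back.

```python
import io


def read_headers(fp):
    """Hypothetical helper: read headers from fp up to the blank line.

    fp only needs a readline() method.  Returns a dict keyed by the
    lower-cased field name; the message body is left unread.
    """
    headers = {}
    last = None
    while True:
        line = fp.readline()
        if not line:
            break  # input ended before a delimiter line was seen
        if line.endswith("\r\n"):
            line = line[:-2] + "\n"  # CR-LF becomes a single linefeed
        if line == "\n":
            break  # blank delimiter line is consumed, body is not
        if line[0] in " \t" and last is not None:
            headers[last] += " " + line.strip()  # continuation line
            continue
        name, sep, value = line.partition(":")
        if not sep:
            break  # illegal line; this sketch simply stops here
        last = name.lower()
        headers[last] = value.strip()
    return headers


# Made-up sample message with CR-LF line ends and a folded header.
fp = io.StringIO(
    "From: jack@cwi.nl\r\nSubject: a long\r\n subject\r\n\r\nbody text\n"
)
headers = read_headers(fp)
body = fp.read()  # the body is still available on the input object
print(headers["from"])
print(headers["subject"])
print(repr(body))
```

Because the helper stops right after the delimiter line, a later
\code{fp.read()} returns only the body, which mirrors the statement
that the body is not consumed.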
\begin{classdesc}{AddressList}{field}
You may instantiate the \class{AddressList} helper class using a single
string parameter, a comma-separated list of \rfc{2822} addresses to be
parsed.  (The parameter \code{None} yields an empty list.)
\end{classdesc}

\begin{funcdesc}{quote}{str}
Return a new string with backslashes in \var{str} replaced by two
backslashes and double quotes replaced by backslash-double quote.
\end{funcdesc}

\begin{funcdesc}{unquote}{str}
Return a new string which is an \emph{unquoted} version of \var{str}.
If \var{str} begins and ends with double quotes, they are stripped
off.  Likewise if \var{str} begins and ends with angle brackets, they
are stripped off.
\end{funcdesc}

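A short sketch of \function{quote()} and \function{unquote()}.  It
assumes that on Python versions which no longer ship \module{rfc822}
the same-named functions in \module{email.utils} behave alike for
these inputs; the sample strings are made up.

```python
try:
    from rfc822 import quote, unquote        # the module documented here
except ImportError:
    # Assumption: same-named stand-ins on Pythons without rfc822.
    from email.utils import quote, unquote

raw = 'C:\\mail "inbox"'        # the text  C:\mail "inbox"
quoted = quote(raw)             # backslashes doubled, quotes escaped
print(quoted)

name = unquote('"Jack Jansen"')     # surrounding double quotes removed
address = unquote('<jack@cwi.nl>')  # surrounding angle brackets removed
print(name)
print(address)
```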
\begin{funcdesc}{parseaddr}{address}
Parse \var{address}, which should be the value of some
address-containing field such as \mailheader{To} or \mailheader{Cc},
into its constituent ``realname'' and ``email address'' parts.
Returns a 2-tuple \code{(\var{realname}, \var{email_address})}, unless
the parse fails, in which case a 2-tuple \code{(None, None)} is
returned.
\end{funcdesc}

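As an illustration, the two common address spellings parse to the same
pair.  This sketch assumes that \function{email.utils.parseaddr()} can
stand in for \function{parseaddr()} where \module{rfc822} is not
installed; only well-formed input is shown, since the two functions
may report a failed parse differently.

```python
try:
    from rfc822 import parseaddr
except ImportError:
    # Assumption: same-named stand-in on Pythons without rfc822.
    from email.utils import parseaddr

results = []
for value in ('Jack Jansen <jack@cwi.nl>', 'jack@cwi.nl (Jack Jansen)'):
    realname, address = parseaddr(value)
    results.append((realname, address))
    print('%s | %s' % (realname, address))
```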
\begin{funcdesc}{dump_address_pair}{pair}
The inverse of \function{parseaddr()}, this takes a 2-tuple of the form
\code{(\var{realname}, \var{email_address})} and returns the string
value suitable for a \mailheader{To} or \mailheader{Cc} header.  If
the first element of \var{pair} is false, then the second element is
returned unmodified.
\end{funcdesc}

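The inverse relationship can be checked with a round trip.  This is a
sketch under an assumption: where \module{rfc822} is missing it
substitutes \function{email.utils.formataddr()}, which plays the same
role but may quote the name differently, so the formatted string is
printed without claiming its exact form.

```python
try:
    from rfc822 import dump_address_pair, parseaddr
except ImportError:
    # Assumption: formataddr is used as a stand-in; its exact quoting
    # of the real name may differ from dump_address_pair.
    from email.utils import formataddr as dump_address_pair, parseaddr

pair = ('Jack Jansen', 'jack@cwi.nl')
header_value = dump_address_pair(pair)
print(header_value)

round_trip = parseaddr(header_value)           # back to the original pair
bare = dump_address_pair(('', 'jack@cwi.nl'))  # false name: address only
print(bare)
```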
\begin{funcdesc}{parsedate}{date}
Attempts to parse a date according to the rules in \rfc{2822}.
However, some mailers don't follow that format as specified, so
\function{parsedate()} tries to guess correctly in such cases.
\var{date} is a string containing an \rfc{2822} date, such as
\code{'Mon, 20 Nov 1995 19:12:08 -0500'}.  If it succeeds in parsing
the date, \function{parsedate()} returns a 9-tuple that can be passed
directly to \function{time.mktime()}; otherwise \code{None} will be
returned.  Note that fields 6, 7, and 8 of the result tuple are not
usable.
\end{funcdesc}

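A minimal sketch of \function{parsedate()} on the sample date above,
assuming \function{email.utils.parsedate()} as a stand-in where
\module{rfc822} is unavailable.  Only the first six fields are
inspected, because fields 6, 7, and 8 are not usable.

```python
import time

try:
    from rfc822 import parsedate
except ImportError:
    # Assumption: same-named stand-in on Pythons without rfc822.
    from email.utils import parsedate

fields = parsedate('Mon, 20 Nov 1995 19:12:08 -0500')
print(fields[:6])   # year, month, day, hour, minute, second

# The 9-tuple can be handed to time.mktime(), which treats it as local
# time; the zone offset in the string plays no part here.
timestamp = time.mktime(fields)

failed = parsedate('not a date')   # unparsable input
print(failed)
```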
\begin{funcdesc}{parsedate_tz}{date}
Performs the same function as \function{parsedate()}, but returns
either \code{None} or a 10-tuple; the first 9 elements make up a tuple
that can be passed directly to \function{time.mktime()}, and the tenth
is the offset of the date's timezone from UTC (which is the official
term for Greenwich Mean Time).  (Note that the sign of the timezone
offset is the opposite of the sign of the \code{time.timezone}
variable for the same timezone; the latter variable follows the
\POSIX{} standard while this module follows \rfc{2822}.)  If the input
string has no timezone, the last element of the tuple returned is
\code{None}.  Note that fields 6, 7, and 8 of the result tuple are not
usable.
\end{funcdesc}

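The sign convention can be seen on a date with an explicit zone.  This
sketch assumes \function{email.utils.parsedate_tz()} as a stand-in
where \module{rfc822} is unavailable; it deliberately avoids input
without a timezone, where the stand-in may not return \code{None} as
the last element.

```python
try:
    from rfc822 import parsedate_tz
except ImportError:
    # Assumption: same-named stand-in on Pythons without rfc822.
    from email.utils import parsedate_tz

result = parsedate_tz('Mon, 20 Nov 1995 19:12:08 -0500')
offset = result[9]          # tenth element: offset from UTC in seconds
print(len(result))
print(offset)

# "-0500" is five hours behind UTC, so the offset is negative, whereas
# time.timezone for such a zone would be positive.
hours = offset / 3600.0
print(hours)
```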
\begin{funcdesc}{mktime_tz}{tuple}
Turn a 10-tuple as returned by \function{parsedate_tz()} into a UTC
timestamp.  If the timezone item in the tuple is \code{None}, assume
local time.  Minor deficiency: \function{mktime_tz()} first interprets
the first 8 elements as a local time and then compensates for the
timezone difference; this may yield a slight error around daylight
saving time switch dates, which is not enough to worry about for
common use.
\end{funcdesc}


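Combining the two functions turns a header date into a UTC timestamp.
A minimal sketch, assuming the same-named \module{email.utils}
functions as stand-ins where \module{rfc822} is unavailable; the
result is converted back with \function{time.gmtime()} so that it can
be read without knowing the local timezone.

```python
import time

try:
    from rfc822 import mktime_tz, parsedate_tz
except ImportError:
    # Assumption: same-named stand-ins on Pythons without rfc822.
    from email.utils import mktime_tz, parsedate_tz

stamp = mktime_tz(parsedate_tz('Mon, 20 Nov 1995 19:12:08 -0500'))

# 19:12:08 at five hours behind UTC is 00:12:08 UTC on the next day.
utc_fields = tuple(time.gmtime(stamp)[:6])
print(utc_fields)
```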
\begin{seealso}
  \seemodule{email}{Comprehensive email handling package; supersedes
                    the \module{rfc822} module.}
  \seemodule{mailbox}{Classes to read various mailbox formats produced
                      by end-user mail programs.}
  \seemodule{mimetools}{Subclass of \class{rfc822.Message} that
                        handles MIME encoded messages.}
\end{seealso}


\subsection{Message Objects \label{message-objects}}

A \class{Message} instance has the following methods:

\begin{methoddesc}{rewindbody}{}
Seek to the start of the message body.  This only works if the file
object is seekable.
\end{methoddesc}

\begin{methoddesc}{isheader}{line}
Returns a line's canonicalized fieldname (the dictionary key that will
be used to index it) if the line is a legal \rfc{2822} header; otherwise
returns \code{None} (implying that parsing should stop here and the
line be pushed back on the input stream).  It is sometimes useful to
override this method in a subclass.
\end{methoddesc}

\begin{methoddesc}{islast}{line}
Return true if the given line is a delimiter on which \class{Message}
should stop.  The delimiter line is consumed, and the file object's read
location positioned immediately after it.  By default this method just
checks that the line is blank, but you can override it in a subclass.
\end{methoddesc}

\begin{methoddesc}{iscomment}{line}
Return \code{True} if the given line should be skipped entirely.
By default this is a stub that always returns \code{False}, but you can
override it in a subclass.
\end{methoddesc}

\begin{methoddesc}{getallmatchingheaders}{name}
Return a list of lines consisting of all headers matching
\var{name}, if any.  Each physical line, whether it is a continuation
line or not, is a separate list item.  Return the empty list if no
header matching \var{name} is found.
\end{methoddesc}

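The ``one item per physical line'' behaviour can be sketched on a
plain list of header lines.  \code{matching_lines()} is a hypothetical
helper written for this illustration, not the method itself; it
assumes each line keeps its trailing newline and that continuation
lines start with whitespace.

```python
def matching_lines(header_lines, name):
    """Hypothetical helper: all physical lines of headers named name."""
    prefix = name.lower() + ':'
    matched = []
    in_match = False
    for line in header_lines:
        if line[:len(prefix)].lower() == prefix:
            in_match = True       # first line of a matching header
        elif not line[:1].isspace():
            in_match = False      # first line of some other header
        # A line starting with whitespace continues the previous header
        # and therefore leaves in_match unchanged.
        if in_match:
            matched.append(line)
    return matched


# Made-up header lines; the first Received header is folded.
header_lines = [
    'Received: from a.example\n',
    '\tby b.example\n',
    'Subject: hello\n',
    'Received: from c.example\n',
]
received = matching_lines(header_lines, 'received')
print(received)
```

Here the folded \mailheader{Received} header contributes two list
items and the second \mailheader{Received} header one more.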
\begin{methoddesc}{getfirstmatchingheader}{name}
Return a list of lines comprising the first header matching
\var{name}, and its continuation line(s), if any.  Return
\code{None} if there is no header matching \var{name}.
\end{methoddesc}

\begin{methoddesc}{getrawheader}{name}
Return a single string consisting of the text after the colon in the
first header matching \var{name}.  This includes leading whitespace,
the trailing linefeed, and internal linefeeds and whitespace if
any continuation line(s) were present.  Return \code{None} if there is
no header matching \var{name}.
\end{methoddesc}

\begin{methoddesc}{getheader}{name\optional{, default}}
Like \code{getrawheader(\var{name})}, but strip leading and trailing
whitespace.  Internal whitespace is not stripped.  The optional
\var{default} argument can be used to specify a value other than
\code{None} to be returned when there is no header matching \var{name}.
\end{methoddesc}

\begin{methoddesc}{get}{name\optional{, default}}
An alias for \method{getheader()}, to make the interface more compatible
with regular dictionaries.
\end{methoddesc}

\begin{methoddesc}{getaddr}{name}
Return a pair \code{(\var{full name}, \var{email address})} parsed
from the string returned by \code{getheader(\var{name})}.  If no
header matching \var{name} exists, return \code{(None, None)};
otherwise both the full name and the address are (possibly empty)
strings.

Example: If \var{m}'s first \mailheader{From} header contains the
string \code{'jack@cwi.nl (Jack Jansen)'}, then
\code{m.getaddr('From')} will yield the pair
\code{('Jack Jansen', 'jack@cwi.nl')}.
If the header contained
\code{'Jack Jansen <jack@cwi.nl>'} instead, it would yield the
exact same result.
\end{methoddesc}

\begin{methoddesc}{getaddrlist}{name}
This is similar to \code{getaddr(\var{name})}, but parses a header
containing a list of email addresses (e.g.\ a \mailheader{To} header) and
returns a list of \code{(\var{full name}, \var{email address})} pairs
(even if there was only one address in the header).  If there is no
header matching \var{name}, return an empty list.

If multiple headers exist that match the named header (e.g.\ if there
are several \mailheader{Cc} headers), all are parsed for addresses.
Any continuation lines the named headers contain are also parsed.
\end{methoddesc}

\begin{methoddesc}{getdate}{name}
Retrieve a header using \method{getheader()} and parse it into a 9-tuple
compatible with \function{time.mktime()}; note that fields 6, 7, and 8
are not usable.  If there is no header matching
\var{name}, or it is unparsable, return \code{None}.

Date parsing appears to be a black art, and not all mailers adhere to
the standard.  While this method has been tested and found correct on a
large collection of email from many sources, it is still possible that
it may occasionally yield an incorrect result.
\end{methoddesc}

\begin{methoddesc}{getdate_tz}{name}
Retrieve a header using \method{getheader()} and parse it into a
10-tuple; the first 9 elements will make a tuple compatible with
\function{time.mktime()}, and the 10th is a number giving the offset
of the date's timezone from UTC.  Note that fields 6, 7, and 8
are not usable.  Similarly to \method{getdate()}, if
there is no header matching \var{name}, or it is unparsable, return
\code{None}.
\end{methoddesc}

\class{Message} instances also support a limited mapping interface.
In particular: \code{\var{m}[\var{name}]} is like
\code{\var{m}.getheader(\var{name})} but raises \exception{KeyError} if
there is no matching header; and \code{len(\var{m})},
\code{\var{m}.get(\var{name}\optional{, \var{default}})},
\code{\var{m}.has_key(\var{name})}, \code{\var{m}.keys()},
\code{\var{m}.values()}, \code{\var{m}.items()}, and
\code{\var{m}.setdefault(\var{name}\optional{, \var{default}})} act as
expected, with the one difference that \method{setdefault()} uses
an empty string as the default value.  \class{Message} instances
also support the writable mapping interface
\code{\var{m}[\var{name}] = \var{value}} and
\code{del \var{m}[\var{name}]}.  \class{Message} objects do not
support the \method{clear()}, \method{copy()}, \method{popitem()}, or
\method{update()} methods of the mapping interface.  (Support for
\method{get()} and \method{setdefault()} was only added in Python
2.2.)

Finally, \class{Message} instances have some public instance variables:

\begin{memberdesc}{headers}
A list containing the entire set of header lines, in the order in
which they were read (except that item assignments may disturb this
order).  Each line contains a trailing newline.  The
blank line terminating the headers is not contained in the list.
\end{memberdesc}

\begin{memberdesc}{fp}
The file or file-like object passed at instantiation time.  This can
be used to read the message content.
\end{memberdesc}

\begin{memberdesc}{unixfrom}
The \UNIX{} \samp{From~} line, if the message had one, or an empty
string.  This is needed to regenerate the message in some contexts,
such as an \code{mbox}-style mailbox file.
\end{memberdesc}


\subsection{AddressList Objects \label{addresslist-objects}}

An \class{AddressList} instance has the following methods:

\begin{methoddesc}{__len__}{}
Return the number of addresses in the address list.
\end{methoddesc}

\begin{methoddesc}{__str__}{}
Return a canonicalized string representation of the address list.
Addresses are rendered in \code{"name" <host@domain>} form,
comma-separated.
\end{methoddesc}

\begin{methoddesc}{__add__}{alist}
Return a new \class{AddressList} instance that contains all addresses
in both \class{AddressList} operands, with duplicates removed (set
union).
\end{methoddesc}

\begin{methoddesc}{__iadd__}{alist}
In-place version of \method{__add__()}; turns this \class{AddressList}
instance into the union of itself and the right-hand instance,
\var{alist}.
\end{methoddesc}

\begin{methoddesc}{__sub__}{alist}
Return a new \class{AddressList} instance that contains every address
in the left-hand \class{AddressList} operand that is not present in
the right-hand operand (set difference).
\end{methoddesc}

\begin{methoddesc}{__isub__}{alist}
In-place version of \method{__sub__()}, removing addresses in this
list which are also in \var{alist}.
\end{methoddesc}


Finally, \class{AddressList} instances have one public instance variable:

\begin{memberdesc}{addresslist}
A list of 2-tuples of strings, one per address.  In each tuple, the
first item is the canonicalized name part and the second is the
actual route-address (\character{@}-separated username-host.domain
pair).
\end{memberdesc}
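
The set-like operations can be pictured on plain lists of
\code{(\var{name}, \var{address})} pairs shaped like
\member{addresslist}.  This is only a sketch of the documented
semantics, not the class itself: \code{union()} and
\code{difference()} are hypothetical helpers, the pairs are produced
with \function{email.utils.getaddresses()} rather than
\class{AddressList}, and duplicates are assumed to be detected by
comparing whole pairs.

```python
from email.utils import getaddresses


def union(left, right):
    """Hypothetical helper: left plus the pairs of right not in left."""
    return left + [pair for pair in right if pair not in left]


def difference(left, right):
    """Hypothetical helper: the pairs of left that are not in right."""
    return [pair for pair in left if pair not in right]


# Made-up comma-separated address fields.
team = getaddresses(['Jack Jansen <jack@cwi.nl>, guido@python.org'])
cc = getaddresses(['guido@python.org, barry@python.org'])

both = union(team, cc)
only_team = difference(team, cc)
print(len(team))
print(both)
print(only_team)
```

Since whole pairs are compared in this sketch, the same address listed
once with and once without a real name would count as two distinct
entries.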