:mod:`rfc822` --- Parse RFC 2822 mail headers
=============================================

.. module:: rfc822
   :synopsis: Parse 2822 style mail messages.
   :deprecated:


.. deprecated:: 2.3
   The :mod:`email` package should be used in preference to the :mod:`rfc822`
   module. This module is present only to maintain backward compatibility, and
   has been removed in Python 3.

This module defines a class, :class:`Message`, which represents an "email
message" as defined by the Internet standard :rfc:`2822`. [#]_ Such messages
consist of a collection of message headers, and a message body. This module
also defines a helper class :class:`AddressList` for parsing :rfc:`2822`
addresses. Please refer to the RFC for information on the specific syntax of
:rfc:`2822` messages.

.. index:: module: mailbox

The :mod:`mailbox` module provides classes to read mailboxes produced by
various end-user mail programs.


.. class:: Message(file[, seekable])

   A :class:`Message` instance is instantiated with an input object as parameter.
   :class:`Message` relies only on the input object having a :meth:`readline`
   method; in particular, ordinary file objects qualify. Instantiation reads
   headers from the input object up to a delimiter line (normally a blank line)
   and stores them in the instance. The message body, following the headers, is
   not consumed.

   This class can work with any input object that supports a :meth:`readline`
   method. If the input object has seek and tell capability, the
   :meth:`rewindbody` method will work; also, illegal lines will be pushed back
   onto the input stream. If the input object lacks seek but has an :meth:`unread`
   method that can push back a line of input, :class:`Message` will use that to
   push back illegal lines. Thus this class can be used to parse messages coming
   from a buffered stream.

   The optional *seekable* argument is provided as a workaround for certain stdio
   libraries in which :c:func:`tell` discards buffered data before discovering that
   the :c:func:`lseek` system call doesn't work. For maximum portability, you
   should set the *seekable* argument to zero to prevent that initial :meth:`tell`
   when passing in an unseekable object such as a file object created from a socket
   object.

   Input lines as read from the file may either be terminated by CR-LF or by a
   single linefeed; a terminating CR-LF is replaced by a single linefeed before the
   line is stored.

   All header matching is done independent of upper or lower case; e.g.
   ``m['From']``, ``m['from']`` and ``m['FROM']`` all yield the same result.


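The header-reading behaviour described for :class:`Message` can be sketched in a
few lines of Python. This is only an illustration under stated assumptions, not
the module's implementation: ``read_headers`` is a hypothetical helper, it keeps
only the last value of a repeated header, and it ignores the push-back and
*seekable* handling.

```python
import io


def read_headers(fp):
    """Hypothetical helper: read headers from fp up to the first blank line.

    Only fp.readline() is used, so any object with that method qualifies.
    """
    headers = {}
    last = None
    while True:
        line = fp.readline()
        # A terminating CR-LF is replaced by a single linefeed.
        if line.endswith("\r\n"):
            line = line[:-2] + "\n"
        # Stop at end of input or at the blank delimiter line.
        if line in ("", "\n"):
            break
        # A line starting with whitespace continues the previous header.
        if line[0] in " \t" and last is not None:
            headers[last] += " " + line.strip()
            continue
        name, _, value = line.partition(":")
        last = name.strip().lower()  # lower-case keys give case-insensitive lookup
        headers[last] = value.strip()
    return headers


fp = io.StringIO("From: jack@cwi.nl\r\nSubject: Hi\r\n there\r\n\r\nbody text\r\n")
headers = read_headers(fp)
print(headers["From".lower()], "|", headers["FROM".lower()])

# The body was not consumed; it is still available from the input object.
body = fp.read()
print(repr(body))
```

Lower-casing the key on both storage and lookup is one simple way to obtain the
case-insensitive matching described above; the real class may do it differently.
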
.. class:: AddressList(field)

   You may instantiate the :class:`AddressList` helper class using a single string
   parameter, a comma-separated list of :rfc:`2822` addresses to be parsed. (The
   parameter ``None`` yields an empty list.)


.. function:: quote(str)

   Return a new string with backslashes in *str* replaced by two backslashes and
   double quotes replaced by backslash-double quote.


.. function:: unquote(str)

   Return a new string which is an *unquoted* version of *str*. If *str* begins
   and ends with double quotes, they are stripped off. Likewise if *str* begins
   and ends with angle brackets, they are stripped off.


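A short illustration of the quoting rules follows. Because :mod:`rfc822` cannot
be imported on Python 3, the sketch assumes that the same-named functions in
``email.utils`` behave like the ones documented here for these inputs; treat it
as an approximation rather than a statement about this module's code.

```python
# Assumption: email.utils.quote/unquote stand in for rfc822.quote/unquote.
from email.utils import quote, unquote

raw = 'C:\\mail "inbox"'           # the text: C:\mail "inbox"
escaped = quote(raw)               # backslash doubled, double quote escaped
print(escaped)

name = unquote('"Jack Jansen"')    # surrounding double quotes stripped
addr = unquote('<jack@cwi.nl>')    # surrounding angle brackets stripped
print(name, addr)
```
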
.. function:: parseaddr(address)

   Parse *address*, which should be the value of some address-containing field such
   as :mailheader:`To` or :mailheader:`Cc`, into its constituent "realname" and
   "email address" parts. Returns a tuple of that information, unless the parse
   fails, in which case a 2-tuple ``(None, None)`` is returned.


.. function:: dump_address_pair(pair)

   The inverse of :func:`parseaddr`, this takes a 2-tuple of the form ``(realname,
   email_address)`` and returns the string value suitable for a :mailheader:`To` or
   :mailheader:`Cc` header. If the first element of *pair* is false, then the
   second element is returned unmodified.


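The round trip between a header value and a ``(realname, email_address)`` pair
might look like the sketch below. It assumes ``email.utils.parseaddr`` and
``email.utils.formataddr`` as Python 3 stand-ins for :func:`parseaddr` and
:func:`dump_address_pair`; the exact string produced by the stand-in may differ
from what this module returns, so only the round trip is relied upon.

```python
# Assumption: email.utils functions stand in for the rfc822 ones.
from email.utils import formataddr, parseaddr

pair = parseaddr('Jack Jansen <jack@cwi.nl>')
print(pair)

# The older "address (comment)" spelling parses to the same pair.
same = parseaddr('jack@cwi.nl (Jack Jansen)')

# Turning the pair back into a header value and parsing it again is lossless.
rebuilt = formataddr(pair)
print(rebuilt)

# With a false first element, the address is returned unmodified.
bare = formataddr(('', 'jack@cwi.nl'))
print(bare)
```
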
.. function:: parsedate(date)

   Attempts to parse a date according to the rules in :rfc:`2822`. However, some
   mailers don't follow that format as specified, so :func:`parsedate` tries to
   guess correctly in such cases. *date* is a string containing an :rfc:`2822`
   date, such as ``'Mon, 20 Nov 1995 19:12:08 -0500'``. If it succeeds in parsing
   the date, :func:`parsedate` returns a 9-tuple that can be passed directly to
   :func:`time.mktime`; otherwise ``None`` will be returned. Note that indexes 6,
   7, and 8 of the result tuple are not usable.


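As a rough illustration of the 9-tuple, the sketch below parses the example date
from the description. It assumes ``email.utils.parsedate`` as the Python 3
stand-in for :func:`parsedate`, since :mod:`rfc822` itself is not importable
there.

```python
import time

# Assumption: email.utils.parsedate stands in for rfc822.parsedate.
from email.utils import parsedate

t = parsedate('Mon, 20 Nov 1995 19:12:08 -0500')

# Only the first six fields (year through second) come from the string;
# indexes 6, 7 and 8 are placeholders and should not be relied upon.
fields = t[:6]
print(fields)

# The tuple can be handed to time.mktime(), which treats it as local time
# and therefore ignores the "-0500" zone in the original string.
stamp = time.mktime(t)
print(stamp)
```
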
.. function:: parsedate_tz(date)

   Performs the same function as :func:`parsedate`, but returns either ``None`` or
   a 10-tuple; the first 9 elements make up a tuple that can be passed directly to
   :func:`time.mktime`, and the tenth is the offset of the date's timezone from UTC
   (which is the official term for Greenwich Mean Time). (Note that the sign of
   the timezone offset is the opposite of the sign of the ``time.timezone``
   variable for the same timezone; the latter variable follows the POSIX standard
   while this module follows :rfc:`2822`.) If the input string has no timezone,
   the last element of the tuple returned is ``None``. Note that indexes 6, 7, and
   8 of the result tuple are not usable.


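The sign convention of the tenth element is easy to get wrong, so a small sketch
may help. It assumes ``email.utils.parsedate_tz`` as the Python 3 stand-in for
:func:`parsedate_tz`; the stand-in is only used here with a date that carries an
explicit zone, because its handling of a missing zone is not assumed to match.

```python
# Assumption: email.utils.parsedate_tz stands in for rfc822.parsedate_tz.
from email.utils import parsedate_tz

t = parsedate_tz('Mon, 20 Nov 1995 19:12:08 -0500')

# The offset is in seconds and follows the sign written in the header:
# "-0500" (five hours west of UTC) gives a negative number, whereas
# time.timezone for the same zone would be positive.
offset = t[9]
print(offset)           # -18000
print(offset // 3600)   # -5
```
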
.. function:: mktime_tz(tuple)

   Turn a 10-tuple as returned by :func:`parsedate_tz` into a UTC timestamp. If
   the timezone item in the tuple is ``None``, assume local time. Minor
   deficiency: this first interprets the first 8 elements as a local time and then
   compensates for the timezone difference; this may yield a slight error around
   daylight saving time switch dates. Not enough to worry about for common use.


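To make the conversion concrete, the sketch below turns the example date into a
UTC timestamp and checks it against a value computed by hand. It assumes
``email.utils.mktime_tz`` and ``email.utils.parsedate_tz`` as Python 3 stand-ins;
the stand-in is not assumed to share the daylight saving deficiency noted above.

```python
import calendar

# Assumption: email.utils functions stand in for the rfc822 ones.
from email.utils import mktime_tz, parsedate_tz

t = parsedate_tz('Mon, 20 Nov 1995 19:12:08 -0500')
stamp = mktime_tz(t)

# 19:12:08 at five hours west of UTC is 00:12:08 UTC on the following day.
expected = calendar.timegm((1995, 11, 21, 0, 12, 8, 0, 0, 0))
print(stamp == expected)
```
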
.. seealso::

   Module :mod:`email`
      Comprehensive email handling package; supersedes the :mod:`rfc822` module.

   Module :mod:`mailbox`
      Classes to read various mailbox formats produced by end-user mail programs.

   Module :mod:`mimetools`
      Subclass of :class:`rfc822.Message` that handles MIME encoded messages.


.. _message-objects:

Message Objects
---------------

A :class:`Message` instance has the following methods:


.. method:: Message.rewindbody()

   Seek to the start of the message body. This only works if the file object is
   seekable.


.. method:: Message.isheader(line)

   Returns a line's canonicalized fieldname (the dictionary key that will be used
   to index it) if the line is a legal :rfc:`2822` header; otherwise returns
   ``None`` (implying that parsing should stop here and the line be pushed back on
   the input stream). It is sometimes useful to override this method in a
   subclass.


.. method:: Message.islast(line)

   Return true if the given line is a delimiter on which :class:`Message` should
   stop. The delimiter line is consumed, and the file object's read location
   positioned immediately after it. By default this method just checks that the
   line is blank, but you can override it in a subclass.


.. method:: Message.iscomment(line)

   Return ``True`` if the given line should be ignored entirely, just skipped. By
   default this is a stub that always returns ``False``, but you can override it in
   a subclass.


.. method:: Message.getallmatchingheaders(name)

   Return a list of lines consisting of all headers matching *name*, if any. Each
   physical line, whether it is a continuation line or not, is a separate list
   item. Return the empty list if no header matches *name*.


.. method:: Message.getfirstmatchingheader(name)

   Return a list of lines comprising the first header matching *name*, and its
   continuation line(s), if any. Return ``None`` if there is no header matching
   *name*.


.. method:: Message.getrawheader(name)

   Return a single string consisting of the text after the colon in the first
   header matching *name*. This includes leading whitespace, the trailing
   linefeed, and internal linefeeds and whitespace if any continuation
   line(s) were present. Return ``None`` if there is no header matching *name*.


.. method:: Message.getheader(name[, default])

   Return a single string consisting of the last header matching *name*,
   but strip leading and trailing whitespace.
   Internal whitespace is not stripped. The optional *default* argument can be
   used to specify a different default to be returned when there is no header
   matching *name*; it defaults to ``None``.
   This is the preferred way to get parsed headers.


.. method:: Message.get(name[, default])

   An alias for :meth:`getheader`, to make the interface more compatible with
   regular dictionaries.


.. method:: Message.getaddr(name)

   Return a pair ``(full name, email address)`` parsed from the string returned by
   ``getheader(name)``. If no header matching *name* exists, return ``(None,
   None)``; otherwise both the full name and the address are (possibly empty)
   strings.

   Example: If *m*'s first :mailheader:`From` header contains the string
   ``'jack@cwi.nl (Jack Jansen)'``, then ``m.getaddr('From')`` will yield the pair
   ``('Jack Jansen', 'jack@cwi.nl')``. If the header contained ``'Jack Jansen
   <jack@cwi.nl>'`` instead, it would yield the exact same result.


.. method:: Message.getaddrlist(name)

   This is similar to ``getaddr(name)``, but parses a header containing a list of
   email addresses (e.g. a :mailheader:`To` header) and returns a list of ``(full
   name, email address)`` pairs (even if there was only one address in the header).
   If there is no header matching *name*, return an empty list.

   If multiple headers exist that match the named header (e.g. if there are several
   :mailheader:`Cc` headers), all are parsed for addresses. Any continuation lines
   the named headers contain are also parsed.


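Collecting the addresses from several headers of the same name can be sketched
as follows. The example assumes ``email.utils.getaddresses`` as a Python 3
stand-in for the parsing that :meth:`getaddrlist` performs; the header values
and the ``example.org`` addresses are made up for illustration.

```python
# Assumption: email.utils.getaddresses stands in for Message.getaddrlist.
from email.utils import getaddresses

# Two hypothetical Cc header values, as if the message had two Cc headers.
cc_headers = [
    'Jack Jansen <jack@cwi.nl>, ann@example.org',
    'bob@example.org (Bob Example)',
]

# All values are parsed, and every address becomes one (name, address) pair.
pairs = getaddresses(cc_headers)
for realname, address in pairs:
    print(repr(realname), address)
```

An address written without a name yields an empty string as its first element,
matching the "possibly empty" wording used for :meth:`getaddr`.
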
.. method:: Message.getdate(name)

   Retrieve a header using :meth:`getheader` and parse it into a 9-tuple compatible
   with :func:`time.mktime`; note that fields 6, 7, and 8 are not usable. If
   there is no header matching *name*, or it is unparsable, return ``None``.

   Date parsing appears to be a black art, and not all mailers adhere to the
   standard. While it has been tested and found correct on a large collection of
   email from many sources, it is still possible that this function may
   occasionally yield an incorrect result.


.. method:: Message.getdate_tz(name)

   Retrieve a header using :meth:`getheader` and parse it into a 10-tuple; the
   first 9 elements will make a tuple compatible with :func:`time.mktime`, and the
   10th is a number giving the offset of the date's timezone from UTC. Note that
   fields 6, 7, and 8 are not usable. Similarly to :meth:`getdate`, if there is
   no header matching *name*, or it is unparsable, return ``None``.

:class:`Message` instances also support a limited mapping interface. In
particular: ``m[name]`` is like ``m.getheader(name)`` but raises :exc:`KeyError`
if there is no matching header; and ``len(m)``, ``m.get(name[, default])``,
``name in m``, ``m.keys()``, ``m.values()``, ``m.items()``, and
``m.setdefault(name[, default])`` act as expected, with the one difference
that :meth:`setdefault` uses an empty string as the default value.
:class:`Message` instances also support the mapping writable interface ``m[name]
= value`` and ``del m[name]``. :class:`Message` objects do not support the
:meth:`clear`, :meth:`copy`, :meth:`popitem`, or :meth:`update` methods of the
mapping interface. (Support for :meth:`get` and :meth:`setdefault` was only
added in Python 2.2.)

Finally, :class:`Message` instances have some public instance variables:


.. attribute:: Message.headers

   A list containing the entire set of header lines, in the order in which they
   were read (except that setitem calls may disturb this order). Each line contains
   a trailing newline. The blank line terminating the headers is not contained in
   the list.


.. attribute:: Message.fp

   The file or file-like object passed at instantiation time. This can be used to
   read the message content.


.. attribute:: Message.unixfrom

   The Unix ``From`` line, if the message had one, or an empty string. This is
   needed to regenerate the message in some contexts, such as an ``mbox``\ -style
   mailbox file.


.. _addresslist-objects:

AddressList Objects
-------------------

An :class:`AddressList` instance has the following methods:


.. method:: AddressList.__len__()

   Return the number of addresses in the address list.


.. method:: AddressList.__str__()

   Return a canonicalized string representation of the address list. Addresses are
   rendered in ``"name" <host@domain>`` form, comma-separated.


.. method:: AddressList.__add__(alist)

   Return a new :class:`AddressList` instance that contains all addresses in both
   :class:`AddressList` operands, with duplicates removed (set union).


.. method:: AddressList.__iadd__(alist)

   In-place version of :meth:`__add__`; turns this :class:`AddressList` instance
   into the union of itself and the right-hand instance, *alist*.


.. method:: AddressList.__sub__(alist)

   Return a new :class:`AddressList` instance that contains every address in the
   left-hand :class:`AddressList` operand that is not present in the right-hand
   address operand (set difference).


.. method:: AddressList.__isub__(alist)

   In-place version of :meth:`__sub__`, removing addresses in this list which are
   also in *alist*.

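The set-like semantics of the operators above can be pictured with plain lists
of ``(name, address)`` pairs. This is a sketch only: ``union`` and ``difference``
are hypothetical helpers, and keeping the left-hand order is an assumption made
here, not something the descriptions above guarantee.

```python
def union(left, right):
    """Hypothetical helper: pairs from both lists, duplicates removed."""
    result = list(left)
    for pair in right:
        if pair not in result:
            result.append(pair)
    return result


def difference(left, right):
    """Hypothetical helper: pairs in left that do not occur in right."""
    return [pair for pair in left if pair not in right]


# Made-up address lists for illustration.
a = [('Jack Jansen', 'jack@cwi.nl'), ('', 'ann@example.org')]
b = [('', 'ann@example.org'), ('Bob Example', 'bob@example.org')]

both = union(a, b)        # corresponds to a + b
only_a = difference(a, b) # corresponds to a - b
print(both)
print(only_a)
```
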
Finally, :class:`AddressList` instances have one public instance variable:


.. attribute:: AddressList.addresslist

   A list of 2-tuples of strings, one per address. In each member, the first is the
   canonicalized name part, the second is the actual route-address (``'@'``\
   -separated username-host.domain pair).

.. rubric:: Footnotes

.. [#] This module originally conformed to :rfc:`822`, hence the name. Since then,
   :rfc:`2822` has been released as an update to :rfc:`822`. This module should be
   considered :rfc:`2822`\ -conformant, especially in cases where the syntax or
   semantics have changed since :rfc:`822`.