:mod:`email.header`: Internationalized headers
----------------------------------------------

.. module:: email.header
   :synopsis: Representing non-ASCII headers

:rfc:`2822` is the base standard that describes the format of email messages.
It derives from the older :rfc:`822` standard, which came into widespread use at
a time when most email was composed of ASCII characters only. :rfc:`2822` is a
specification written assuming email contains only 7-bit ASCII characters.

Of course, as email has been deployed worldwide, it has become
internationalized, such that language-specific character sets can now be used in
email messages. The base standard still requires email messages to be
transferred using only 7-bit ASCII characters, so a series of RFCs have been
written describing how to encode email containing non-ASCII characters into
:rfc:`2822`\ -compliant format. These RFCs include :rfc:`2045`, :rfc:`2046`,
:rfc:`2047`, and :rfc:`2231`. The :mod:`email` package supports these standards
in its :mod:`email.header` and :mod:`email.charset` modules.

If you want to include non-ASCII characters in your email headers, say in the
:mailheader:`Subject` or :mailheader:`To` fields, you should use the
:class:`Header` class and assign the field in the :class:`~email.message.Message`
object to an instance of :class:`Header` instead of using a string for the header
value. Import the :class:`Header` class from the :mod:`email.header` module.
For example::

   >>> from email.message import Message
   >>> from email.header import Header
   >>> msg = Message()
   >>> h = Header('p\xf6stal', 'iso-8859-1')
   >>> msg['Subject'] = h
   >>> print msg.as_string()
   Subject: =?iso-8859-1?q?p=F6stal?=

Notice that we wanted the :mailheader:`Subject` field to contain a non-ASCII
character. We did this by creating a :class:`Header` instance and passing in
the character set that the byte string was encoded in. When the subsequent
:class:`~email.message.Message` instance was flattened, the :mailheader:`Subject`
field was properly :rfc:`2047` encoded. MIME-aware mail readers would decode this
header and display the ISO-8859-1 character it contains.

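To check that the flattened header really is 7-bit clean, the following sketch rebuilds the same message under Python 3. It assumes Python 3's :mod:`email` package, which keeps :class:`Header` and :func:`decode_header` but uses ``str`` for text and ``bytes`` for byte strings, unlike the 2.x semantics described on this page. It confirms the output is pure ASCII and then reverses the :rfc:`2047` encoding the way a MIME-aware reader would.

```python
from email.header import Header, decode_header
from email.message import Message

# Same message as in the example above, written with Python 3 syntax.
msg = Message()
h = Header('p\xf6stal', 'iso-8859-1')
msg['Subject'] = h
flat = msg.as_string()

# The flattened message contains only 7-bit ASCII characters.
assert all(ord(ch) < 128 for ch in flat)

# A MIME-aware reader reverses the RFC 2047 encoding to recover the text.
raw, charset = decode_header(h.encode())[0]
text = raw.decode(charset)
```
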
.. versionadded:: 2.2.2

Here is the :class:`Header` class description:

.. class:: Header([s[, charset[, maxlinelen[, header_name[, continuation_ws[, errors]]]]]])

   Create a MIME-compliant header that can contain strings in different character
   sets.

   Optional *s* is the initial header value. If ``None`` (the default), the
   initial header value is not set. You can later append to the header with
   :meth:`append` method calls. *s* may be a byte string or a Unicode string, but
   see the :meth:`append` documentation for semantics.

   Optional *charset* serves two purposes: it has the same meaning as the *charset*
   argument to the :meth:`append` method. It also sets the default character set
   for all subsequent :meth:`append` calls that omit the *charset* argument. If
   *charset* is not provided in the constructor (the default), the ``us-ascii``
   character set is used both as *s*'s initial charset and as the default for
   subsequent :meth:`append` calls.

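   A minimal sketch of these defaults, assuming Python 3's :mod:`email.header`, which handles this case the same way: without a *charset*, ASCII text is left unencoded, and a *charset* given to the constructor is reused by later :meth:`append` calls that omit one.

   ```python
   from email.header import Header

   # No charset given: us-ascii is used, so plain ASCII text stays unencoded.
   plain = Header('hello')

   # A charset given to the constructor is reused by append() calls that omit it.
   h = Header(charset='iso-8859-1')
   h.append('p\xf6stal')
   encoded = h.encode()
   ```
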
   The maximum line length can be specified explicitly via *maxlinelen*. To
   split the first line at a shorter length, accounting for the field name
   (e.g. :mailheader:`Subject`) that isn't included in *s*, pass the name of
   the field in *header_name*. The default *maxlinelen* is 76, and the default
   value for *header_name* is ``None``, meaning it is not taken into account
   for the first line of a long, split header.

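   The effect of *header_name* can be seen in a sketch like the following. It assumes Python 3, whose default *maxlinelen* is 78 rather than 76, so the value is passed explicitly; the sample text is made up for illustration.

   ```python
   from email.header import Header

   # A long ASCII value that needs no encoding (made-up sample text).
   value = ' '.join(['word'] * 30)

   # maxlinelen=76 is passed explicitly because Python 3 defaults to 78.
   h = Header(value, maxlinelen=76, header_name='Subject')
   lines = h.encode().split('\n')
   ```

   The first line is kept short enough that ``Subject: `` plus its contents fits within 76 characters; later lines may use the full width.
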
   Optional *continuation_ws* must be :rfc:`2822`\ -compliant folding whitespace,
   and is usually either a space or a hard tab character. It is prepended to
   continuation lines. *continuation_ws* defaults to a single space character
   (``' '``).

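   A sketch of *continuation_ws* in action, assuming Python 3: a long ISO-8859-1 value has to be split across several :rfc:`2047` encoded words, and each continuation line should begin with the tab passed as *continuation_ws*. The repeated character is arbitrary sample data.

   ```python
   from email.header import Header

   # Enough non-ASCII text that the encoded form must be folded.
   h = Header('\xf6' * 60, 'iso-8859-1', continuation_ws='\t')
   lines = h.encode().split('\n')
   ```
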
   Optional *errors* is passed straight through to the :meth:`append` method.

   .. method:: append(s[, charset[, errors]])

      Append the string *s* to the MIME header.

      Optional *charset*, if given, should be a :class:`~email.charset.Charset`
      instance (see :mod:`email.charset`) or the name of a character set, which
      will be converted to a :class:`~email.charset.Charset` instance. A value
      of ``None`` (the default) means that the *charset* given in the constructor
      is used.

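      For example, passing a :class:`~email.charset.Charset` instance or just its name should produce the same encoded header. This sketch assumes Python 3's :mod:`email` package.

      ```python
      from email.charset import Charset
      from email.header import Header

      # Charset given by name: converted to a Charset instance internally.
      by_name = Header()
      by_name.append('p\xf6stal', 'iso-8859-1')

      # Charset given as an instance: used as-is.
      by_instance = Header()
      by_instance.append('p\xf6stal', Charset('iso-8859-1'))
      ```
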
      *s* may be a byte string or a Unicode string. If it is a byte string
      (i.e. ``isinstance(s, str)`` is true), then *charset* is the encoding of
      that byte string, and a :exc:`UnicodeError` will be raised if the string
      cannot be decoded with that character set.

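      A sketch of both outcomes under Python 3, where a byte string is a ``bytes`` object rather than the 2.x ``str`` checked above: valid ISO-8859-1 bytes are decoded and re-encoded, while a byte that is not valid UTF-8 is rejected.

      ```python
      from email.header import Header

      # Bytes plus the charset they are encoded in: decoded, then RFC 2047 encoded.
      h = Header()
      h.append(b'p\xf6stal', 'iso-8859-1')

      # 0xFF never appears in valid UTF-8, so strict decoding fails.
      rejected = False
      try:
          Header().append(b'\xff', 'utf-8')
      except UnicodeError:  # UnicodeDecodeError is a subclass of UnicodeError
          rejected = True
      ```
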
      If *s* is a Unicode string, then *charset* is a hint specifying the
      character set of the characters in the string. In this case, when
      producing an :rfc:`2822`\ -compliant header using :rfc:`2047` rules, the
      Unicode string will be encoded using the following charsets in order:
      ``us-ascii``, the *charset* hint, ``utf-8``. The first character set to
      not provoke a :exc:`UnicodeError` is used.

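      The selection order can be sketched in plain Python 3. ``pick_charset`` is a hypothetical helper written only to illustrate the rule described above; it is not part of the :mod:`email` package, and Python 3's own :meth:`append` does not follow this exact order.

      ```python
      def pick_charset(text, hint):
          """Return the first of us-ascii, hint, utf-8 that can encode text.

          Hypothetical helper for illustration only; not part of the email package.
          """
          for candidate in ('us-ascii', hint, 'utf-8'):
              try:
                  text.encode(candidate)
              except UnicodeError:
                  # This charset cannot represent every character; try the next.
                  continue
              return candidate
          # Only reachable for text that even utf-8 rejects (e.g. lone surrogates).
          raise UnicodeError('no candidate charset can encode the text')
      ```

      With a hint of ``'iso-8859-1'``, ``'hello'`` selects ``us-ascii``, ``'p\xf6stal'`` selects ``iso-8859-1``, and a euro sign (absent from ISO-8859-1) falls through to ``utf-8``.
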
      Optional *errors* is passed through to any :func:`unicode` or
      :meth:`unicode.encode` call, and defaults to ``'strict'``.

   .. method:: encode([splitchars])

      Encode a message header into an RFC-compliant format, possibly wrapping
      long lines and encapsulating non-ASCII parts in base64 or quoted-printable
      encodings. Optional *splitchars* is a string containing characters to
      split long ASCII lines on, in rough support of :rfc:`2822`'s *highest
      level syntactic breaks*. This doesn't affect :rfc:`2047` encoded lines.

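      A sketch of *splitchars*, assuming Python 3. The parameter-style value is made up for illustration, and the small *maxlinelen* of 40 forces it to be folded; with ``splitchars=';'`` the folds should land right after semicolons.

      ```python
      from email.header import Header

      # Made-up parameter-style value; maxlinelen=40 forces folding.
      value = '; '.join('key%d=value%d' % (i, i) for i in range(10))
      h = Header(value, maxlinelen=40)
      lines = h.encode(splitchars=';').split('\n')
      ```
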
   The :class:`Header` class also provides a number of methods to support
   standard operators and built-in functions.

   .. method:: __str__()

      A synonym for :meth:`Header.encode`. Useful for ``str(aHeader)``.

   .. method:: __unicode__()

      A helper for the built-in :func:`unicode` function. Returns the header as
      a Unicode string.

   .. method:: __eq__(other)

      This method allows you to compare two :class:`Header` instances for
      equality.

   .. method:: __ne__(other)

      This method allows you to compare two :class:`Header` instances for
      inequality.

The :mod:`email.header` module also provides the following convenient functions.

.. function:: decode_header(header)

   Decode the message header value *header* without converting its character
   set.

   This function returns a list of ``(decoded_string, charset)`` pairs containing
   each of the decoded parts of the header. *charset* is ``None`` for non-encoded
   parts of the header, otherwise a lowercase string containing the name of the
   character set specified in the encoded string.

   Here's an example::

      >>> from email.header import decode_header
      >>> decode_header('=?iso-8859-1?q?p=F6stal?=')
      [('p\xf6stal', 'iso-8859-1')]

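   Because :func:`decode_header` does not convert the character set, getting readable text means decoding each part yourself. A minimal sketch, assuming Python 3, where encoded parts come back as ``bytes`` and a header with no encoded words comes back as a single ``str`` part; ``header_to_text`` is a hypothetical helper name.

   ```python
   from email.header import decode_header


   def header_to_text(value):
       """Hypothetical helper: decode every part of a header into one str."""
       pieces = []
       for chunk, charset in decode_header(value):
           # Encoded parts are bytes in Python 3; unencoded headers come back as str.
           if isinstance(chunk, bytes):
               chunk = chunk.decode(charset or 'us-ascii')
           pieces.append(chunk)
       return ''.join(pieces)
   ```
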
.. function:: make_header(decoded_seq[, maxlinelen[, header_name[, continuation_ws]]])

   Create a :class:`Header` instance from a sequence of pairs as returned by
   :func:`decode_header`.

   :func:`decode_header` takes a header value string and returns a sequence of
   pairs of the format ``(decoded_string, charset)`` where *charset* is the name of
   the character set.

   This function takes one of those sequences of pairs and returns a
   :class:`Header` instance. Optional *maxlinelen*, *header_name*, and
   *continuation_ws* are as in the :class:`Header` constructor.

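   Pairing the two functions gives a round trip. A short sketch, assuming Python 3, that decodes an encoded word into ``(bytes, charset)`` pairs and rebuilds an equivalent :class:`Header` from them:

   ```python
   from email.header import decode_header, make_header

   encoded = '=?iso-8859-1?q?p=F6stal?='

   # Decode into (bytes, charset) pairs, then rebuild a Header from them.
   h = make_header(decode_header(encoded))

   # Re-encoding should reproduce the original RFC 2047 form.
   round_trip = h.encode()
   ```
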