\declaremodule{standard}{email.header}
\modulesynopsis{Representing non-ASCII headers}

\rfc{2822} is the base standard that describes the format of email
messages. It derives from the older \rfc{822} standard which came
into widespread use at a time when most email was composed of \ASCII{}
characters only. \rfc{2822} is a specification written assuming email
contains only 7-bit \ASCII{} characters.

Of course, as email has been deployed worldwide, it has become
internationalized, such that language-specific character sets can now
be used in email messages. The base standard still requires email
messages to be transferred using only 7-bit \ASCII{} characters, so a
number of RFCs have been written describing how to encode email
containing non-\ASCII{} characters into \rfc{2822}-compliant format.
These RFCs include \rfc{2045}, \rfc{2046}, \rfc{2047}, and \rfc{2231}.
The \module{email} package supports these standards in its
\module{email.header} and \module{email.charset} modules.

If you want to include non-\ASCII{} characters in your email headers,
say in the \mailheader{Subject} or \mailheader{To} fields, you should
use the \class{Header} class and assign the field in the
\class{Message} object to an instance of \class{Header} instead of
using a string for the header value. Import the \class{Header} class from the
\module{email.header} module. For example:

\begin{verbatim}
>>> from email.message import Message
>>> from email.header import Header
>>> msg = Message()
>>> h = Header('p\xf6stal', 'iso-8859-1')
>>> msg['Subject'] = h
>>> print msg.as_string()
Subject: =?iso-8859-1?q?p=F6stal?=


\end{verbatim}

Notice that here we wanted the \mailheader{Subject} field to contain a
non-\ASCII{} character. We did this by creating a \class{Header}
instance and passing in the character set that the byte string was
encoded in. When the subsequent \class{Message} instance was
flattened, the \mailheader{Subject} field was properly \rfc{2047}
encoded. MIME-aware mail readers would show this header using the
embedded ISO-8859-1 character.

\versionadded{2.2.2}

Here is the \class{Header} class description:

\begin{classdesc}{Header}{\optional{s\optional{, charset\optional{,
maxlinelen\optional{, header_name\optional{, continuation_ws\optional{,
errors}}}}}}}
Create a MIME-compliant header that can contain strings in different
character sets.

Optional \var{s} is the initial header value. If \code{None} (the
default), the initial header value is not set. You can later append
to the header with \method{append()} method calls. \var{s} may be a
byte string or a Unicode string, but see the \method{append()}
documentation for semantics.

Optional \var{charset} serves two purposes: it has the same meaning as
the \var{charset} argument to the \method{append()} method, and it
also sets the default character set for all subsequent \method{append()}
calls that omit the \var{charset} argument. If \var{charset} is not
provided in the constructor (the default), the \code{us-ascii}
character set is used both as \var{s}'s initial charset and as the
default for subsequent \method{append()} calls.

The maximum line length can be specified explicitly via
\var{maxlinelen}. To split the first line at a shorter length (to
account for the field name, such as \mailheader{Subject}, which is not
included in \var{s}), pass the name of the field in \var{header_name}.
The default \var{maxlinelen} is 76, and the default value for
\var{header_name} is \code{None}, meaning it is not taken into account
for the first line of a long, split header.

Optional \var{continuation_ws} must be \rfc{2822}-compliant folding
whitespace, and is usually either a space or a hard tab character.
This character will be prepended to continuation lines.

Optional \var{errors} is passed straight through to the
\method{append()} method.
\end{classdesc}
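
To see \var{maxlinelen} and \var{header_name} at work, here is a minimal
sketch. It assumes Python 3's \module{email.header}, whose \class{Header}
constructor accepts the same arguments (byte strings there are
\class{bytes} and Unicode strings are \class{str}); the subject text and
the 30-column limit are arbitrary values chosen only to force folding.

```python
from email.header import Header

# Arbitrary ASCII text, long enough that it must fold at 30 columns.
text = 'one two three four five six seven eight nine ten'

# header_name reserves room for "Subject: " on the first line only.
h = Header(text, maxlinelen=30, header_name='Subject')
lines = h.encode().split('\n')

for line in lines:
    print(repr(line))
```

Because \var{header_name} is \code{'Subject'}, the first line is kept
short enough to leave room for the \samp{Subject: } prefix; without it,
the first line could use the full 30 columns.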

\begin{methoddesc}[Header]{append}{s\optional{, charset\optional{, errors}}}
Append the string \var{s} to the MIME header.

Optional \var{charset}, if given, should be a \class{Charset} instance
(see \refmodule{email.charset}) or the name of a character set, which
will be converted to a \class{Charset} instance. A value of
\code{None} (the default) means that the \var{charset} given in the
constructor is used.

\var{s} may be a byte string or a Unicode string. If it is a byte
string (i.e. \code{isinstance(s, str)} is true), then
\var{charset} is the encoding of that byte string, and a
\exception{UnicodeError} will be raised if the string cannot be
decoded with that character set.

If \var{s} is a Unicode string, then \var{charset} is a hint
specifying the character set of the characters in the string. In this
case, when producing an \rfc{2822}-compliant header using \rfc{2047}
rules, the Unicode string will be encoded using the following charsets
in order: \code{us-ascii}, the \var{charset} hint, \code{utf-8}. The
first character set that does not provoke a \exception{UnicodeError}
is used.

Optional \var{errors} is passed through to any \function{unicode()} or
\function{ustr.encode()} call, and defaults to ``strict''.
\end{methoddesc}
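
The rules for \var{s} and \var{charset} can be tried out with the sketch
below. It assumes Python 3, where a byte string is \class{bytes} and a
Unicode string is \class{str}. Python 3 also differs in one detail: when
a \class{str} cannot be encoded with the default \code{us-ascii} charset,
\method{append()} switches that chunk to \code{utf-8} immediately rather
than at encoding time. The sample strings are arbitrary.

```python
from email.charset import Charset
from email.header import Header, decode_header

# A byte string with an explicit charset: the bytes are decoded using it.
h1 = Header()
h1.append(b'p\xf6stal', 'iso-8859-1')
print(h1.encode())

# Bytes that the charset cannot decode raise UnicodeError
# (UnicodeDecodeError in Python 3, which is a subclass).
try:
    Header(b'caf\xe9')  # default charset is us-ascii
    rejected = False
except UnicodeError:
    rejected = True
print('undecodable bytes rejected:', rejected)

# A str that us-ascii cannot encode: Python 3 switches this chunk to utf-8.
h2 = Header()
h2.append('caf\xe9')
utf8_encoded = h2.encode()
print(utf8_encoded, decode_header(utf8_encoded))

# A Charset instance is accepted in place of a charset name.
h3 = Header()
h3.append('caf\xe9', Charset('iso-8859-1'))
print(h3.encode())
```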

\begin{methoddesc}[Header]{encode}{\optional{splitchars}}
Encode a message header into an RFC-compliant format, possibly
wrapping long lines and encapsulating non-\ASCII{} parts in base64 or
quoted-printable encodings. Optional \var{splitchars} is a string
containing characters to split long \ASCII{} lines on, in rough support
of \rfc{2822}'s \emph{highest level syntactic breaks}. This doesn't
affect \rfc{2047} encoded lines.
\end{methoddesc}
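
A \class{Header} may hold chunks in several character sets. In the
Python 3 sketch below (sample text arbitrary), \method{encode()} leaves
the \ASCII{} chunk as plain text and turns only the ISO-8859-1 chunk into
an \rfc{2047} encoded word; decoding the result with
\function{decode_header()} recovers the original bytes.

```python
from email.header import Header, decode_header

# One us-ascii chunk followed by one iso-8859-1 chunk.
h = Header('Hello ', 'us-ascii')
h.append('p\xf6stal', 'iso-8859-1')
encoded = h.encode()
print(encoded)

# Decode again: the encoded word comes back as iso-8859-1 bytes.
parts = decode_header(encoded)
print(parts)
```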

The \class{Header} class also provides a number of methods to support
standard operators and built-in functions.

\begin{methoddesc}[Header]{__str__}{}
A synonym for \method{Header.encode()}. Useful for
\code{str(aHeader)}.
\end{methoddesc}

\begin{methoddesc}[Header]{__unicode__}{}
A helper for the built-in \function{unicode()} function. Returns the
header as a Unicode string.
\end{methoddesc}

\begin{methoddesc}[Header]{__eq__}{other}
This method allows you to compare two \class{Header} instances for equality.
\end{methoddesc}

\begin{methoddesc}[Header]{__ne__}{other}
This method allows you to compare two \class{Header} instances for inequality.
\end{methoddesc}
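
The sketch below exercises these operators, but note that it targets
Python 3, where the behavior differs from the description above:
\class{Header} has no \method{__unicode__}, \function{str()} returns the
decoded text rather than the \rfc{2047} form, \method{encode()} must be
called to get the encoded form, and equality compares the decoded text.
The sample values are arbitrary.

```python
from email.header import Header

h = Header('p\xf6stal', 'iso-8859-1')

# Python 3: str() returns the decoded text; encode() returns the RFC 2047 form.
text = str(h)
wire = h.encode()
print(text, wire)

# Equality compares decoded text, so an identical header compares equal.
same = h == Header('p\xf6stal', 'iso-8859-1')
different = h != Header('postal')
print(same, different)
```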

The \module{email.header} module also provides the following
convenience functions.

\begin{funcdesc}{decode_header}{header}
Decode a message header value without converting the character set.
The header value is in \var{header}.

This function returns a list of \code{(decoded_string, charset)} pairs
containing each of the decoded parts of the header. \var{charset} is
\code{None} for non-encoded parts of the header; otherwise it is a
lowercase string containing the name of the character set specified in
the encoded string.

Here's an example:

\begin{verbatim}
>>> from email.header import decode_header
>>> decode_header('=?iso-8859-1?q?p=F6stal?=')
[('p\xf6stal', 'iso-8859-1')]
\end{verbatim}
\end{funcdesc}
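
In Python 3, \function{decode_header()} returns \class{bytes} for the
decoded parts whenever the header contains encoded words (and the
original \class{str} when it contains none), so callers usually decode
the parts themselves. The helper below, \function{header_to_text()}, is
a hypothetical name used only for illustration, and treating parts
without a charset as \code{us-ascii} is an assumption.

```python
from email.header import decode_header


def header_to_text(value):
    """Hypothetical helper: decode a raw header value into one str."""
    pieces = []
    for part, charset in decode_header(value):
        if isinstance(part, bytes):
            # Assumption: parts without a charset are plain ASCII.
            part = part.decode(charset or 'us-ascii', errors='replace')
        pieces.append(part)
    return ''.join(pieces)


# Charset names come back lowercased, even if written in upper case.
print(decode_header('=?ISO-8859-1?Q?p=F6stal?='))
print(header_to_text('=?iso-8859-1?q?p=F6stal?='))
print(header_to_text('plain ASCII subject'))
```

\function{header_to_text()} simply concatenates the parts, so for headers
that mix plain and encoded text, \code{str(make_header(decode_header(value)))}
under Python 3 may be preferable, since \class{Header} inserts spaces
between chunks where needed.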

\begin{funcdesc}{make_header}{decoded_seq\optional{, maxlinelen\optional{,
header_name\optional{, continuation_ws}}}}
Create a \class{Header} instance from a sequence of pairs as returned
by \function{decode_header()}.

\function{decode_header()} takes a header value string and returns a
sequence of pairs of the format \code{(decoded_string, charset)} where
\var{charset} is the name of the character set.

This function takes one of those sequences of pairs and returns a
\class{Header} instance. Optional \var{maxlinelen},
\var{header_name}, and \var{continuation_ws} are as in the
\class{Header} constructor.
\end{funcdesc}
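
A common Python 3 idiom combines the two functions to turn a raw header
value into text: \function{decode_header()} splits it into pairs and
\function{make_header()} rebuilds a \class{Header}, whose \function{str()}
in Python 3 is the decoded text. The sample value below is arbitrary.

```python
from email.header import decode_header, make_header

raw = 'Hello =?iso-8859-1?q?p=F6stal?='  # arbitrary sample value

pairs = decode_header(raw)
h = make_header(pairs, header_name='Subject')

# Python 3: str() gives the decoded text; encode() rebuilds the RFC 2047 form.
print(str(h))
print(h.encode())
```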