\declaremodule{standard}{email.parser}
\modulesynopsis{Parse flat text email messages to produce a message
object structure.}

Message object structures can be created in one of two ways: they can be
created from whole cloth by instantiating \class{Message} objects and
stringing them together via \method{attach()} and
\method{set_payload()} calls, or they can be created by parsing a flat text
representation of the email message.

The \module{email} package provides a standard parser that understands
most email document structures, including MIME documents. You can
pass the parser a string or a file object, and the parser will return
to you the root \class{Message} instance of the object structure. For
simple, non-MIME messages the payload of this root object will likely
be a string containing the text of the message. For MIME
messages, the root object will return \code{True} from its
\method{is_multipart()} method, and the subparts can be accessed via
the \method{get_payload()} and \method{walk()} methods.

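As a rough illustration of the two cases described above, the following
sketch parses one plain message and one \mimetype{multipart/mixed} message.
The addresses, the boundary string \code{XYZ}, and all variable names are
made up for this example, and a reasonably recent Python is assumed.

```python
import email

# Sample messages invented for illustration only.
simple_text = "From: a@example.com\nSubject: hi\n\nHello, world.\n"
mime_text = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "first part\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "second part\n"
    "--XYZ--\n"
)

simple = email.message_from_string(simple_text)
mime = email.message_from_string(mime_text)

# Non-MIME message: the payload is a plain string.
simple_payload = simple.get_payload()

# Multipart message: the payload is a list of sub-message objects,
# and walk() visits the root followed by each subpart.
subparts = mime.get_payload()
walked_types = [part.get_content_type() for part in mime.walk()]

print(simple.is_multipart(), mime.is_multipart())
print(walked_types)
```

Here \method{walk()} is used only to collect content types; in practice it
is the usual way to visit every part of a nested message.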
There are actually two parser interfaces available for use: the classic
\class{Parser} API and the incremental \class{FeedParser} API. The classic
\class{Parser} API is fine if you have the entire text of the message in
memory as a string, or if the entire message lives in a file on the file
system. \class{FeedParser} is more appropriate when you're reading the
message from a stream which might block waiting for more input (e.g. reading
an email message from a socket). The \class{FeedParser} can consume and parse
the message incrementally, and only returns the root object when you close the
parser\footnote{As of email package version 3.0, introduced in
Python 2.4, the classic \class{Parser} was re-implemented in terms of the
\class{FeedParser}, so the semantics and results are identical between the two
parsers.}.

Note that the parser can be extended in limited ways, and of course
you can implement your own parser completely from scratch. There is
no magical connection between the \module{email} package's bundled
parser and the \class{Message} class, so your custom parser can create
message object trees any way it finds necessary.

\subsubsection{FeedParser API}

\versionadded{2.4}

The \class{FeedParser}, imported from the \module{email.feedparser} module,
provides an API that is conducive to incremental parsing of email messages,
such as would be necessary when reading the text of an email message from a
source that can block (e.g. a socket). The
\class{FeedParser} can of course be used to parse an email message fully
contained in a string or a file, but the classic \class{Parser} API may be
more convenient for such use cases. The semantics and results of the two
parser APIs are identical.

The \class{FeedParser}'s API is simple; you create an instance, feed it a
bunch of text until there's no more to feed it, then close the parser to
retrieve the root message object. The \class{FeedParser} is extremely
accurate when parsing standards-compliant messages, and it does a very good
job of parsing non-compliant messages, providing information about how a
message was deemed broken. It will populate a message object's \var{defects}
attribute with a list of any problems it found in a message. See the
\refmodule{email.errors} module for the list of defects that it can find.

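To make the \var{defects} attribute more concrete, here is a small sketch
that feeds the parser a deliberately broken message: it declares a
\mimetype{multipart/mixed} boundary but never uses it. The message text and
variable names are invented for illustration, and the exact set of defects
reported may vary between Python versions.

```python
from email.feedparser import FeedParser

# Hypothetical broken message: a boundary is declared but never appears.
broken_text = (
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "\n"
    "This body never uses the declared boundary.\n"
)

parser = FeedParser()
parser.feed(broken_text)
msg = parser.close()

# Each defect is an instance of a class defined in email.errors.
defect_names = [type(defect).__name__ for defect in msg.defects]
for name in defect_names:
    print(name)
```

Parsing does not raise an exception here; the problems are only recorded on
the returned message object, so callers that care should inspect
\var{defects} explicitly.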
Here is the API for the \class{FeedParser}:

\begin{classdesc}{FeedParser}{\optional{_factory}}
Create a \class{FeedParser} instance. Optional \var{_factory} is a
no-argument callable that will be called whenever a new message object is
needed. It defaults to the \class{email.message.Message} class.
\end{classdesc}

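One possible use of \var{_factory} is to have every message object in the
tree created from your own subclass. The sketch below assumes a hypothetical
\class{TaggedMessage} subclass that exists only for this example; the
message text and boundary are likewise made up.

```python
from email.feedparser import FeedParser
from email.message import Message


class TaggedMessage(Message):
    """Hypothetical Message subclass, used only for illustration."""

    parsed_by = "FeedParser"


text = (
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "only part\n"
    "--XYZ--\n"
)

# The factory is called for the root message and for every subpart.
parser = FeedParser(TaggedMessage)
parser.feed(text)
root = parser.close()

part_classes = [type(part).__name__ for part in root.walk()]
print(part_classes)
```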
\begin{methoddesc}[FeedParser]{feed}{data}
Feed the \class{FeedParser} some more data. \var{data} should be a
string containing one or more lines. The lines can be partial and the
\class{FeedParser} will stitch such partial lines together properly. The
lines in the string can have any of the common three line endings, carriage
return, newline, or carriage return and newline (they can even be mixed).
\end{methoddesc}

\begin{methoddesc}[FeedParser]{close}{}
Closing a \class{FeedParser} completes the parsing of all previously fed data,
and returns the root message object. It is undefined what happens if you feed
more data to a closed \class{FeedParser}.
\end{methoddesc}

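The \method{feed()} and \method{close()} methods described above might be
used as in the following sketch. The list of chunks stands in for data
arriving from a socket and is deliberately split in the middle of header
lines; the chunk contents and names are assumptions made for this example.

```python
from email.feedparser import FeedParser

# Simulated network reads: note that lines are split at arbitrary points.
chunks = [
    "From: a@exam",
    "ple.com\nSubj",
    "ect: chunked\n\nline",
    " one\nline two\n",
]

parser = FeedParser()
for chunk in chunks:
    # Partial lines are buffered until the rest of the line arrives.
    parser.feed(chunk)

# The root message object only becomes available on close().
msg = parser.close()

print(msg["From"])
print(msg["Subject"])
print(repr(msg.get_payload()))
```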
\subsubsection{Parser class API}

The \class{Parser} class, imported from the \module{email.parser} module,
provides an API that can be used to parse a message when the complete contents
of the message are available in a string or file. The
\module{email.parser} module also provides a second class, called
\class{HeaderParser}, which can be used if you're only interested in
the headers of the message. \class{HeaderParser} can be much faster in
these situations, since it does not attempt to parse the message body,
instead setting the payload to the raw body as a string.
\class{HeaderParser} has the same API as the \class{Parser} class.

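The difference between the two classes can be seen by parsing the same text
with each of them. The following is only a sketch; the subject line,
boundary, and variable names are invented for the example.

```python
from email.parser import HeaderParser, Parser

text = (
    "Subject: report\n"
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "part one\n"
    "--XYZ--\n"
)

# Parser builds the full tree of sub-message objects.
full = Parser().parsestr(text)

# HeaderParser reads the headers and leaves the body as one raw string.
headers_only = HeaderParser().parsestr(text)
raw_body = headers_only.get_payload()

print(full.is_multipart(), headers_only.is_multipart())
print(headers_only["Subject"])
```

Because the body is left unparsed, the boundary lines are still present in
\code{raw_body}.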
\begin{classdesc}{Parser}{\optional{_class}}
The constructor for the \class{Parser} class takes an optional
argument \var{_class}. This must be a callable factory (such as a
function or a class), and it is used whenever a sub-message object
needs to be created. It defaults to \class{Message} (see
\refmodule{email.message}). The factory will be called without
arguments.

The optional \var{strict} flag is ignored. \deprecated{2.4}{Because the
\class{Parser} class is a backward compatible API wrapper around the
new-in-Python 2.4 \class{FeedParser}, \emph{all} parsing is effectively
non-strict. You should simply stop passing a \var{strict} flag to the
\class{Parser} constructor.}

\versionchanged[The \var{strict} flag was added]{2.2.2}
\versionchanged[The \var{strict} flag was deprecated]{2.4}
\end{classdesc}

The other public \class{Parser} methods are:

\begin{methoddesc}[Parser]{parse}{fp\optional{, headersonly}}
Read all the data from the file-like object \var{fp}, parse the
resulting text, and return the root message object. \var{fp} must
support both the \method{readline()} and the \method{read()} methods
on file-like objects.

The text contained in \var{fp} must be formatted as a block of \rfc{2822}
style headers and header continuation lines, optionally preceded by an
envelope header. The header block is terminated either by the
end of the data or by a blank line. Following the header block is the
body of the message (which may contain MIME-encoded subparts).

Optional \var{headersonly} is as with the \method{parsestr()} method.

\versionchanged[The \var{headersonly} flag was added]{2.2.2}
\end{methoddesc}

\begin{methoddesc}[Parser]{parsestr}{text\optional{, headersonly}}
Similar to the \method{parse()} method, except it takes a string
object instead of a file-like object. Calling this method on a string
is exactly equivalent to wrapping \var{text} in a \class{StringIO}
instance first and calling \method{parse()}.

Optional \var{headersonly} is a flag specifying whether to stop
parsing after reading the headers or not. The default is \code{False},
meaning it parses the entire contents of the text.

\versionchanged[The \var{headersonly} flag was added]{2.2.2}
\end{methoddesc}

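The relationship between \method{parse()} and \method{parsestr()}, and the
effect of \var{headersonly}, can be sketched as follows. The example assumes
Python 3, where \class{StringIO} lives in the \module{io} module; the message
texts and names are made up.

```python
from io import StringIO
from email.parser import Parser

text = "From: a@example.com\nSubject: equivalence\n\nSame result either way.\n"

# parsestr(text) and parse(StringIO(text)) should give the same message.
from_string = Parser().parsestr(text)
from_file = Parser().parse(StringIO(text))
same_headers = from_string.items() == from_file.items()
same_payload = from_string.get_payload() == from_file.get_payload()

multipart_text = (
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "\n"
    "a part\n"
    "--XYZ--\n"
)

# With headersonly set, the body is not split into subparts.
unparsed = Parser().parsestr(multipart_text, headersonly=True)
parsed = Parser().parsestr(multipart_text)

print(same_headers, same_payload)
print(parsed.is_multipart(), unparsed.is_multipart())
```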
Since creating a message object structure from a string or a file
object is such a common task, two functions are provided as a
convenience. They are available in the top-level \module{email}
package namespace.

\begin{funcdesc}{message_from_string}{s\optional{, _class\optional{, strict}}}
Return a message object structure from a string. This is exactly
equivalent to \code{Parser().parsestr(s)}. Optional \var{_class} and
\var{strict} are interpreted as with the \class{Parser} class constructor.

\versionchanged[The \var{strict} flag was added]{2.2.2}
\end{funcdesc}

\begin{funcdesc}{message_from_file}{fp\optional{, _class\optional{, strict}}}
Return a message object structure tree from an open file object. This
is exactly equivalent to \code{Parser().parse(fp)}. Optional
\var{_class} and \var{strict} are interpreted as with the
\class{Parser} class constructor.

\versionchanged[The \var{strict} flag was added]{2.2.2}
\end{funcdesc}

Here's an example of how you might use this at an interactive Python
prompt, with a small sample message standing in for \var{myString}:

\begin{verbatim}
>>> import email
>>> myString = 'Subject: Hello\n\nThis is the body.\n'
>>> msg = email.message_from_string(myString)
\end{verbatim}

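The file-based convenience function can be used in much the same way. The
sketch below writes a sample message to a temporary file purely so that
there is something to open; the file handling and message text are
assumptions of this example rather than anything the package requires.

```python
import email
import os
import tempfile

text = "Subject: From a file\n\nRead from disk.\n"

# Create a throwaway file holding the sample message.
with tempfile.NamedTemporaryFile("w", suffix=".eml", delete=False) as tmp:
    tmp.write(text)
    path = tmp.name

try:
    # message_from_file() expects an already open file object.
    with open(path) as fp:
        msg = email.message_from_file(fp)
finally:
    os.remove(path)

print(msg["Subject"])
print(repr(msg.get_payload()))
```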
\subsubsection{Additional notes}

Here are some notes on the parsing semantics:

\begin{itemize}
\item Most non-\mimetype{multipart} type messages are parsed as a single
message object with a string payload. These objects will return
\code{False} for \method{is_multipart()}. Their
\method{get_payload()} method will return a string object.

\item All \mimetype{multipart} type messages will be parsed as a
container message object with a list of sub-message objects for
their payload. The outer container message will return
\code{True} for \method{is_multipart()} and its
\method{get_payload()} method will return the list of
\class{Message} subparts.

\item Most messages with a content type of \mimetype{message/*}
(e.g. \mimetype{message/delivery-status} and
\mimetype{message/rfc822}) will also be parsed as a container
object containing a list payload of length 1. Their
\method{is_multipart()} method will return \code{True}. The
single element in the list payload will be a sub-message object.

\item Some non-standards-compliant messages may not be internally consistent
about their \mimetype{multipart}-edness. Such messages may have a
\mailheader{Content-Type} header of type \mimetype{multipart}, but their
\method{is_multipart()} method may return \code{False}. If such
messages were parsed with the \class{FeedParser}, they will have an
instance of the \class{MultipartInvariantViolationDefect} class in their
\var{defects} attribute list. See \refmodule{email.errors} for
details.
\end{itemize}
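The notes above can be checked with a few small messages. This is only a
sketch: the three message texts are invented, and the last one is
intentionally inconsistent (it claims to be \mimetype{multipart} but
declares no boundary and contains no parts).

```python
import email
from email import errors

# A non-multipart message: string payload.
plain = email.message_from_string("Content-Type: text/plain\n\njust text\n")

# A message/rfc822 container: list payload holding one sub-message.
wrapped = email.message_from_string(
    "Content-Type: message/rfc822\n"
    "\n"
    "Subject: inner\n"
    "\n"
    "inner body\n"
)
inner = wrapped.get_payload(0)

# An inconsistent message: multipart content type, but no actual parts.
broken = email.message_from_string(
    "Content-Type: multipart/mixed\n\nno parts here\n"
)
violations = [
    defect for defect in broken.defects
    if isinstance(defect, errors.MultipartInvariantViolationDefect)
]

print(plain.is_multipart(), wrapped.is_multipart(), broken.is_multipart())
print(inner["Subject"], len(violations))
```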