source: python/trunk/Doc/library/shlex.rst

Last change on this file was 391, checked in by dmik, 11 years ago

python: Merge vendor 2.7.6 to trunk.

:mod:`shlex` --- Simple lexical analysis
========================================

.. module:: shlex
   :synopsis: Simple lexical analysis for Unix shell-like languages.
.. moduleauthor:: Eric S. Raymond <esr@snark.thyrsus.com>
.. moduleauthor:: Gustavo Niemeyer <niemeyer@conectiva.com>
.. sectionauthor:: Eric S. Raymond <esr@snark.thyrsus.com>
.. sectionauthor:: Gustavo Niemeyer <niemeyer@conectiva.com>


.. versionadded:: 1.5.2

**Source code:** :source:`Lib/shlex.py`

--------------

The :class:`~shlex.shlex` class makes it easy to write lexical analyzers for
simple syntaxes resembling that of the Unix shell. This will often be useful
for writing minilanguages (for example, in run control files for Python
applications) or for parsing quoted strings.

Prior to Python 2.7.3, this module did not support Unicode input.

The :mod:`shlex` module defines the following functions:

.. function:: split(s[, comments[, posix]])

   Split the string *s* using shell-like syntax. If *comments* is :const:`False`
   (the default), the parsing of comments in the given string will be disabled
   (setting the :attr:`~shlex.commenters` attribute of the
   :class:`~shlex.shlex` instance to the empty string). This function operates
   in POSIX mode by default, but uses non-POSIX mode if the *posix* argument is
   false.

   .. versionadded:: 2.3

   .. versionchanged:: 2.6
      Added the *posix* parameter.

   .. note::

      Since the :func:`split` function instantiates a :class:`~shlex.shlex`
      instance, passing ``None`` for *s* will read the string to split from
      standard input.

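A minimal sketch of :func:`split` on a made-up command line, showing the effect of the *comments* and *posix* arguments:

```python
import shlex

cmd = 'cp "my file.txt" /tmp # copy it'

# Default: POSIX mode, comments not parsed, so '#' is just another word.
posix_words = shlex.split(cmd)

# comments=True keeps '#' as a comment character; the rest of the line is dropped.
no_comments = shlex.split(cmd, comments=True)

# posix=False selects compatibility mode; the quotes stay on the token.
compat_words = shlex.split(cmd, posix=False)
```

With this input, ``posix_words`` is ``['cp', 'my file.txt', '/tmp', '#', 'copy', 'it']``, ``no_comments`` stops after ``'/tmp'``, and ``compat_words`` keeps the double quotes around ``my file.txt``.
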
The :mod:`shlex` module defines the following class:


.. class:: shlex([instream[, infile[, posix]]])

   A :class:`~shlex.shlex` instance or subclass instance is a lexical analyzer
   object. The first initialization argument, *instream*, if present, specifies
   where to read characters from. It must be a file-/stream-like object with
   :meth:`~io.TextIOBase.read` and :meth:`~io.TextIOBase.readline` methods, or
   a string (strings are accepted since Python 2.3). If no argument is given,
   input will be taken from ``sys.stdin``. The second optional argument,
   *infile*, is a filename string, which sets the initial value of the
   :attr:`~shlex.infile` attribute. If the *instream* argument is omitted or
   equal to ``sys.stdin``, this second argument defaults to "stdin". The
   *posix* argument was introduced in Python 2.3 and defines the operational
   mode. When *posix* is not true (the default), the :class:`~shlex.shlex`
   instance will operate in compatibility mode. When operating in POSIX mode,
   :class:`~shlex.shlex` will try to be as close as possible to the POSIX shell
   parsing rules.

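As a sketch, the same made-up input tokenized in both modes. The string is passed directly as *instream*, and ``list()`` drains the lexer by iterating over it (iteration calls :meth:`~shlex.get_token` until the end-of-file token appears):

```python
import shlex

text = 'set name = "John Smith"'

# Compatibility mode (posix not given): quote characters stay on the token.
compat_tokens = list(shlex.shlex(text))

# POSIX mode: the quotes are removed from the token.
posix_tokens = list(shlex.shlex(text, posix=True))
```

``compat_tokens`` is ``['set', 'name', '=', '"John Smith"']`` while ``posix_tokens`` ends with ``'John Smith'``. Because ``=`` is not a word character, it comes back as a token of its own in both modes.
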
.. seealso::

   Module :mod:`ConfigParser`
      Parser for configuration files similar to the Windows :file:`.ini` files.


.. _shlex-objects:

shlex Objects
-------------

A :class:`~shlex.shlex` instance has the following methods:

.. method:: shlex.get_token()

   Return a token. If tokens have been stacked using :meth:`push_token`, pop a
   token off the stack. Otherwise, read one from the input stream. If reading
   encounters an immediate end-of-file, :attr:`eof` is returned: the empty
   string (``''``) in non-POSIX mode, and ``None`` in POSIX mode.

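A minimal loop that calls :meth:`~shlex.get_token` until it sees the mode-dependent end-of-file value. Comparing against the instance's own :attr:`~shlex.eof` attribute keeps the loop correct in both modes; the helper name ``drain`` is only for illustration:

```python
import shlex

def drain(lexer):
    """Collect tokens from get_token() until the lexer's eof value appears."""
    tokens = []
    while True:
        tok = lexer.get_token()
        if tok == lexer.eof:   # '' in non-POSIX mode, None in POSIX mode
            return tokens
        tokens.append(tok)

compat = shlex.shlex("a 'b c'")
posix = shlex.shlex("a 'b c'", posix=True)
compat_tokens = drain(compat)   # ['a', "'b c'"]
posix_tokens = drain(posix)     # ['a', 'b c']
```
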
.. method:: shlex.push_token(str)

   Push the argument onto the token stack.

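A small sketch of one-token lookahead built on :meth:`~shlex.push_token`: read a token, inspect it, then push it back so the next :meth:`~shlex.get_token` call returns it again. It also shows that the pushback area behaves as a stack. The input string is illustrative:

```python
import shlex

lexer = shlex.shlex('width 10 height')

first = lexer.get_token()          # 'width'

# Peek: read one token, then push it back so it is returned again.
peeked = lexer.get_token()         # '10'
lexer.push_token(peeked)
again = lexer.get_token()          # '10' once more

# The last token pushed is the first one returned.
lexer.push_token('x')
lexer.push_token('y')
stacked = [lexer.get_token(), lexer.get_token()]   # ['y', 'x']

rest = list(lexer)                 # ['height']
```
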
.. method:: shlex.read_token()

   Read a raw token. Ignore the pushback stack, and do not interpret source
   requests. (This is not ordinarily a useful entry point, and is documented here
   only for the sake of completeness.)

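A tiny sketch of the difference from :meth:`~shlex.get_token`: a token waiting on the pushback stack is skipped by :meth:`~shlex.read_token` and only returned by the next :meth:`~shlex.get_token` call:

```python
import shlex

lexer = shlex.shlex('alpha beta')
lexer.push_token('pushed')

raw = lexer.read_token()         # 'alpha': the pushed token is ignored
cooked = lexer.get_token()       # 'pushed': get_token() checks the stack first
remaining = lexer.get_token()    # 'beta'
```
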
.. method:: shlex.sourcehook(filename)

   When :class:`~shlex.shlex` detects a source request (see :attr:`source`
   below), this method is given the following token as argument, and is
   expected to return a tuple consisting of a filename and an open file-like
   object.

   Normally, this method first strips any surrounding double quotes off the
   argument. If the result is an absolute pathname, or there was no previous
   source request in effect, or the previous source was a stream (such as
   ``sys.stdin``), the result is left alone. Otherwise, if the result is a
   relative pathname, the directory part of the name of the file immediately
   before it on the source inclusion stack is prepended (this behavior is like
   the way the C preprocessor handles ``#include "file.h"``).

   The result of the manipulations is treated as a filename, and returned as the
   first component of the tuple, with :func:`open` called on it to yield the
   second component. (Note: this is the reverse of the order of arguments in
   instance initialization!)

   This hook is exposed so that you can use it to implement directory search
   paths, addition of file extensions, and other namespace hacks. There is no
   corresponding 'close' hook, but a shlex instance will call the
   :meth:`~io.IOBase.close` method of the sourced input stream when it returns
   EOF.

   For more explicit control of source stacking, use the :meth:`push_source` and
   :meth:`pop_source` methods.

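As one illustration of overriding the hook, the sketch below serves source requests from an in-memory dictionary instead of the file system. ``DictLexer`` and ``FILES`` are hypothetical names; the only contract relied on is the one described above, returning a ``(filename, file_object)`` tuple:

```python
import io
import shlex

# Hypothetical in-memory "files" standing in for real files on disk.
FILES = {
    'defaults': u'color red\n',
}

class DictLexer(shlex.shlex):
    """A shlex subclass whose source requests are served from FILES."""

    def sourcehook(self, newfile):
        # Strip surrounding double quotes, roughly as the default hook does.
        if newfile.startswith('"') and newfile.endswith('"'):
            newfile = newfile[1:-1]
        # Return (filename, open file-like object), like the base class.
        return newfile, io.StringIO(FILES[newfile])

lexer = DictLexer('begin\nsource defaults\nend\n')
lexer.source = 'source'   # the word "source" now triggers an inclusion
tokens = list(lexer)
```

Here ``tokens`` comes out as ``['begin', 'color', 'red', 'end']``: the included stream is read until it is exhausted and closed, and lexing then resumes in the original input. A search-path variant could try several directories inside the same method before opening a file.
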
.. method:: shlex.push_source(stream[, filename])

   Push an input source stream onto the input stack. If the *filename* argument
   is specified, it will later be available for use in error messages. This is
   the same method used internally to push the stream returned by
   :meth:`sourcehook`.

   .. versionadded:: 2.1


.. method:: shlex.pop_source()

   Pop the last-pushed input source from the input stack. This is the same method
   used internally when the lexer reaches EOF on a stacked input stream.

   .. versionadded:: 2.1

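A sketch of manual source stacking; the stream contents and the ``inner.txt`` label are made up, and no file of that name is opened:

```python
import io
import shlex

lexer = shlex.shlex('outer1 outer2')
first = lexer.get_token()                     # 'outer1'

# Put a second stream in front of the remaining input.
lexer.push_source(io.StringIO(u'inner1 inner2'), 'inner.txt')
pushed_name = lexer.infile                    # 'inner.txt'
inner = lexer.get_token()                     # 'inner1'

# Abandon the rest of the inner stream and return to the outer one.
lexer.pop_source()
restored_name = lexer.infile                  # None: no infile was given
rest = list(lexer)                            # ['outer2']
```
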
.. method:: shlex.error_leader([file[, line]])

   This method generates an error message leader in the format of a Unix C
   compiler error label; the format is ``'"%s", line %d: '``, where the ``%s``
   is replaced with the name of the current source file and the ``%d`` with the
   current input line number (the optional arguments can be used to override
   these).

   This convenience is provided to encourage :mod:`shlex` users to generate error
   messages in the standard, parseable format understood by Emacs and other Unix
   tools.

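A short sketch of building a compiler-style message. The name ``app.rc`` is only a label passed as *infile*; no such file is read:

```python
import shlex

lexer = shlex.shlex('color = mauve', infile='app.rc')
tokens = [lexer.get_token() for _ in range(3)]   # ['color', '=', 'mauve']

# The default leader uses the current infile and lineno.
message = lexer.error_leader() + 'unknown color: ' + tokens[2]

# Both parts can be overridden explicitly.
custom = lexer.error_leader('other.rc', 7)
```

This gives ``message == '"app.rc", line 1: unknown color: mauve'`` and ``custom == '"other.rc", line 7: '``.
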
Instances of :class:`~shlex.shlex` subclasses have some public instance
variables which either control lexical analysis or can be used for debugging:

.. attribute:: shlex.commenters

   The string of characters that are recognized as comment beginners. All
   characters from the comment beginner to the end of the line are ignored.
   Includes just ``'#'`` by default.

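A short sketch of changing the comment character; the input strings are illustrative:

```python
import shlex

# Default: '#' starts a comment, so "note" is never seen.
default_tokens = list(shlex.shlex('a b # note\nc'))      # ['a', 'b', 'c']

# ';' is not a comment character by default; it becomes a one-character token.
plain_tokens = list(shlex.shlex('a b ; note\nc'))        # ['a', 'b', ';', 'note', 'c']

# Make ';' the comment character instead.
lexer = shlex.shlex('a b ; note\nc')
lexer.commenters = ';'
semicolon_tokens = list(lexer)                            # ['a', 'b', 'c']
```
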
.. attribute:: shlex.wordchars

   The string of characters that will accumulate into multi-character tokens. By
   default, includes all ASCII alphanumerics and underscore.

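A sketch of widening :attr:`~shlex.wordchars` so that a file name survives as a single token (the input is made up):

```python
import shlex

text = 'open data-2024.csv'

# Default wordchars: '-' and '.' end a word and come back as separate tokens.
default_tokens = list(shlex.shlex(text))   # ['open', 'data', '-', '2024', '.', 'csv']

# Adding them to wordchars keeps the file name in one token.
lexer = shlex.shlex(text)
lexer.wordchars += '-.'
name_tokens = list(lexer)                  # ['open', 'data-2024.csv']
```
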
.. attribute:: shlex.whitespace

   Characters that will be considered whitespace and skipped. Whitespace bounds
   tokens. By default, includes space, tab, linefeed and carriage-return.

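For example, adding a comma to :attr:`~shlex.whitespace` turns a comma-separated list into separate tokens. A minimal sketch with made-up input:

```python
import shlex

# Default: each comma is returned as a one-character token.
default_tokens = list(shlex.shlex('red,green, blue'))   # ['red', ',', 'green', ',', 'blue']

# Treat commas as separators by adding them to whitespace.
lexer = shlex.shlex('red,green, blue')
lexer.whitespace += ','
colors = list(lexer)                                     # ['red', 'green', 'blue']
```
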
.. attribute:: shlex.escape

   Characters that will be considered as escape characters. This is only used in
   POSIX mode, and includes just ``'\'`` by default.

   .. versionadded:: 2.3

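A sketch contrasting the two modes on the same input, plus a POSIX lexer whose escape character has been changed to ``^`` (an arbitrary choice for illustration):

```python
import shlex

text = r'a\ b c'

# POSIX mode: the backslash makes the following space part of the word.
posix_tokens = list(shlex.shlex(text, posix=True))    # ['a b', 'c']

# Compatibility mode ignores escapes; the backslash is a one-character token.
compat_tokens = list(shlex.shlex(text))               # ['a', '\\', 'b', 'c']

# A different escape character, used the same way.
caret = shlex.shlex('a^ b c', posix=True)
caret.escape = '^'
caret_tokens = list(caret)                            # ['a b', 'c']
```
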
.. attribute:: shlex.quotes

   Characters that will be considered string quotes. The token accumulates until
   the same quote is encountered again (thus, different quote types protect each
   other as in the shell). By default, includes ASCII single and double quotes.

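A minimal sketch of quote types protecting each other in POSIX mode; the command line is made up:

```python
import shlex

text = """echo "it's" 'say "hi"'"""
tokens = list(shlex.shlex(text, posix=True))
```

The single quote inside ``"it's"`` and the double quotes inside ``'say "hi"'`` are kept as ordinary characters, giving ``['echo', "it's", 'say "hi"']``.
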
.. attribute:: shlex.escapedquotes

   Characters in :attr:`quotes` that will interpret escape characters defined in
   :attr:`escape`. This is only used in POSIX mode, and includes just ``'"'`` by
   default.

   .. versionadded:: 2.3

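A sketch of the difference between a quote that is listed in :attr:`~shlex.escapedquotes` (``"``) and one that is not (``'``), in POSIX mode. Raw string literals keep the backslashes visible to the lexer:

```python
import shlex

# Inside double quotes a backslash escapes '"'; before any other
# character (here 'n') the backslash itself is kept.
double = list(shlex.shlex(r'"say \"hi\" \n"', posix=True))

# Inside single quotes the backslash is always literal.
single = list(shlex.shlex(r"'C:\temp'", posix=True))
```

Here ``double`` is ``['say "hi" \\n']`` (a literal backslash followed by ``n``) and ``single`` is ``['C:\\temp']``.
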
.. attribute:: shlex.whitespace_split

   If ``True``, tokens will only be split on whitespace. This is useful, for
   example, for parsing command lines with :class:`~shlex.shlex`, getting
   tokens in a similar way to shell arguments.

   .. versionadded:: 2.3

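A sketch comparing the default tokenization of a made-up compiler command line with the result when :attr:`~shlex.whitespace_split` is enabled:

```python
import shlex

cmd = 'gcc -o out main.c'

# Default: '-' and '.' are not word characters, so they split words apart.
default_tokens = list(shlex.shlex(cmd, posix=True))

# whitespace_split: only whitespace ends a token, much like shell arguments.
lexer = shlex.shlex(cmd, posix=True)
lexer.whitespace_split = True
arg_tokens = list(lexer)
```

``default_tokens`` is ``['gcc', '-', 'o', 'out', 'main', '.', 'c']``, while ``arg_tokens`` is ``['gcc', '-o', 'out', 'main.c']``.
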
.. attribute:: shlex.infile

   The name of the current input file, as initially set at class instantiation
   time or stacked by later source requests. It may be useful to examine this
   when constructing error messages.

.. attribute:: shlex.instream

   The input stream from which this :class:`~shlex.shlex` instance is reading
   characters.

.. attribute:: shlex.source

   This attribute is ``None`` by default. If you assign a string to it, that
   string will be recognized as a lexical-level inclusion request similar to the
   ``source`` keyword in various shells. That is, the immediately following token
   will be opened as a filename and input will be taken from that stream until
   EOF, at which point the :meth:`~io.IOBase.close` method of that stream will
   be called and the input source will again become the original input stream.
   Source requests may be stacked any number of levels deep.

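A sketch of a source request handled by the default :meth:`~shlex.sourcehook`. It writes a throwaway file with :mod:`tempfile`; the file names ``colors.rc`` and ``main.rc`` and the keyword ``include`` are arbitrary choices for illustration. Because *infile* names a file in the same directory, the relative name in the request is resolved against that directory:

```python
import os
import shlex
import shutil
import tempfile

# A throwaway directory holding the file to be included.
tmpdir = tempfile.mkdtemp()
try:
    with open(os.path.join(tmpdir, 'colors.rc'), 'w') as f:
        f.write('red green\n')

    # infile names the main file (never opened here); relative source
    # requests are resolved against its directory.
    lexer = shlex.shlex('begin include "colors.rc" end',
                        infile=os.path.join(tmpdir, 'main.rc'))
    lexer.source = 'include'
    # Non-POSIX mode keeps the quotes on the file name token;
    # the default sourcehook() strips them before opening the file.
    tokens = list(lexer)
finally:
    shutil.rmtree(tmpdir)
```

The included file is opened, read to EOF and closed, so ``tokens`` is ``['begin', 'red', 'green', 'end']``.
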
.. attribute:: shlex.debug

   If this attribute is numeric and ``1`` or more, a :class:`~shlex.shlex`
   instance will print verbose progress output on its behavior. If you need
   to use this, you can read the module source code to learn the details.

.. attribute:: shlex.lineno

   Source line number (count of newlines seen so far plus one).

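Because :attr:`~shlex.lineno` counts newlines already read, a token that ends at a line break is reported with the following line's number. A small sketch that records the value after each token:

```python
import shlex

lexer = shlex.shlex('one\ntwo\nthree')

# Pair each token with lineno as it stands right after the token was read.
seen = [(tok, lexer.lineno) for tok in lexer]
```

Here ``seen`` is ``[('one', 2), ('two', 3), ('three', 3)]``: the newline that terminates ``one`` has already been counted, while ``three`` ends at end-of-file.
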
.. attribute:: shlex.token

   The token buffer. It may be useful to examine this when catching exceptions.

.. attribute:: shlex.eof

   Token used to determine end of file. This will be set to the empty string
   (``''``) in non-POSIX mode, and to ``None`` in POSIX mode.

   .. versionadded:: 2.3

.. _shlex-parsing-rules:

Parsing Rules
-------------

When operating in non-POSIX mode, :class:`~shlex.shlex` will try to obey the
following rules:

* Quote characters are not recognized within words (``Do"Not"Separate`` is
  parsed as the single word ``Do"Not"Separate``);

* Escape characters are not recognized;

* Enclosing characters in quotes preserve the literal value of all characters
  within the quotes;

* Closing quotes separate words (``"Do"Separate`` is parsed as ``"Do"`` and
  ``Separate``);

* If :attr:`~shlex.whitespace_split` is ``False``, any character not
  declared to be a word character, whitespace, or a quote will be returned as
  a single-character token. If it is ``True``, :class:`~shlex.shlex` will only
  split words on whitespace;

* EOF is signaled with an empty string (``''``);

* It's not possible to parse empty strings, even if quoted.

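A sketch that checks several of the rules above in non-POSIX mode; the helper ``compat`` is a hypothetical name:

```python
import shlex

def compat(text):
    """Tokenize text with a compatibility-mode (non-POSIX) lexer."""
    return list(shlex.shlex(text))

quote_inside = compat('Do"Not"Separate')   # ['Do"Not"Separate']
quote_closes = compat('"Do"Separate')      # ['"Do"', 'Separate']
punctuation = compat('a=b')                # ['a', '=', 'b']
empty_quoted = compat("x '' y")            # ['x', "''", 'y']
```

Note that the quoted empty string comes back as the two quote characters ``"''"``, not as an empty token.
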
When operating in POSIX mode, :class:`~shlex.shlex` will try to obey the
following parsing rules:

* Quotes are stripped out, and do not separate words (``"Do"Not"Separate"`` is
  parsed as the single word ``DoNotSeparate``);

* Non-quoted escape characters (e.g. ``'\'``) preserve the literal value of the
  character that follows;

* Enclosing characters in quotes which are not part of
  :attr:`~shlex.escapedquotes` (e.g. ``"'"``) preserve the literal value
  of all characters within the quotes;

* Enclosing characters in quotes which are part of
  :attr:`~shlex.escapedquotes` (e.g. ``'"'``) preserve the literal value
  of all characters within the quotes, with the exception of the characters
  mentioned in :attr:`~shlex.escape`. The escape characters retain their
  special meaning only when followed by the quote in use, or the escape
  character itself. Otherwise the escape character will be considered a
  normal character;

* EOF is signaled with a :const:`None` value;

* Quoted empty strings (``''``) are allowed.

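The corresponding sketch for POSIX mode (``posix_tokens`` is again a hypothetical helper):

```python
import shlex

def posix_tokens(text):
    """Tokenize text with a POSIX-mode lexer."""
    return list(shlex.shlex(text, posix=True))

joined = posix_tokens('"Do"Not"Separate"')   # ['DoNotSeparate']
escaped = posix_tokens(r'a\"b')             # ['a"b']
empty_quoted = posix_tokens("x '' y")         # ['x', '', 'y']

# At end of input, get_token() returns None in POSIX mode.
end = shlex.shlex('', posix=True).get_token()
```
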