:mod:`shlex` --- Simple lexical analysis
========================================

.. module:: shlex
   :synopsis: Simple lexical analysis for Unix shell-like languages.
.. moduleauthor:: Eric S. Raymond <esr@snark.thyrsus.com>
.. moduleauthor:: Gustavo Niemeyer <niemeyer@conectiva.com>
.. sectionauthor:: Eric S. Raymond <esr@snark.thyrsus.com>
.. sectionauthor:: Gustavo Niemeyer <niemeyer@conectiva.com>


.. versionadded:: 1.5.2

**Source code:** :source:`Lib/shlex.py`

--------------

The :class:`~shlex.shlex` class makes it easy to write lexical analyzers for
simple syntaxes resembling that of the Unix shell.  This is often useful
for writing minilanguages (for example, in run control files for Python
applications) or for parsing quoted strings.

Prior to Python 2.7.3, this module did not support Unicode input.

The :mod:`shlex` module defines the following function:


.. function:: split(s[, comments[, posix]])

   Split the string *s* using shell-like syntax. If *comments* is :const:`False`
   (the default), the parsing of comments in the given string will be disabled
   (setting the :attr:`~shlex.commenters` attribute of the
   :class:`~shlex.shlex` instance to the empty string).  This function operates
   in POSIX mode by default, but uses non-POSIX mode if the *posix* argument is
   false.

   .. versionadded:: 2.3

   .. versionchanged:: 2.6
      Added the *posix* parameter.

   .. note::

      Since the :func:`split` function instantiates a :class:`~shlex.shlex`
      instance, passing ``None`` for *s* will read the string to split from
      standard input.

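As a rough illustration of the *comments* and *posix* arguments described
above, the following sketch splits a made-up command line three ways. The
sample strings are assumptions chosen for the example; only :func:`split`
itself comes from the module.

```python
import shlex

# Hypothetical command line with a quoted argument and a trailing comment.
command = 'cp "my file.txt" /tmp/backup  # nightly copy'

# Default: POSIX mode with comment parsing disabled, so '#' is ordinary text.
tokens = shlex.split(command)
# ['cp', 'my file.txt', '/tmp/backup', '#', 'nightly', 'copy']

# comments=True drops everything from '#' to the end of the line.
tokens_no_comment = shlex.split(command, comments=True)
# ['cp', 'my file.txt', '/tmp/backup']

# posix=False keeps the quote characters as part of the token.
tokens_compat = shlex.split('say "hello world"', posix=False)
# ['say', '"hello world"']
```

Note that the default keeps the comment text as tokens, so callers that
accept user-written lines may want to pass ``comments=True`` explicitly.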
The :mod:`shlex` module defines the following class:


.. class:: shlex([instream[, infile[, posix]]])

   A :class:`~shlex.shlex` instance or subclass instance is a lexical analyzer
   object.  The first argument, *instream*, if present, specifies where to read
   characters from.  It must be a file-like or stream-like object with
   :meth:`~io.TextIOBase.read` and :meth:`~io.TextIOBase.readline` methods, or
   a string (strings are accepted since Python 2.3).  If no argument is given,
   input will be taken from ``sys.stdin``.  The second optional argument,
   *infile*, is a filename string, which sets the initial value of the
   :attr:`~shlex.infile` attribute.  If the *instream* argument is omitted or
   equal to ``sys.stdin``, *infile* defaults to ``"stdin"``.  The *posix*
   argument was introduced in Python 2.3, and defines the operational mode.
   When *posix* is not true (the default), the :class:`~shlex.shlex` instance
   will operate in compatibility mode.  When operating in POSIX mode,
   :class:`~shlex.shlex` will try to be as close as possible to the POSIX shell
   parsing rules.


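A minimal sketch of constructing a lexer from a string in each operational
mode may help here. The input text and the file name ``settings.rc`` are
invented for the example; the sketch relies on iterating over the lexer to
collect tokens.

```python
import shlex

# Hypothetical run-control line.
text = 'width = 80; title = "Main Window"'

# Default (non-POSIX, "compatibility") mode: quotes stay part of the token.
compat = shlex.shlex(text, infile='settings.rc')  # made-up file name
compat_tokens = list(compat)
# ['width', '=', '80', ';', 'title', '=', '"Main Window"']

# POSIX mode: quotes are stripped from the returned token.
posix = shlex.shlex(text, posix=True)
posix_tokens = list(posix)
# ['width', '=', '80', ';', 'title', '=', 'Main Window']
```

In both modes, characters such as ``=`` and ``;`` come back as
single-character tokens because they are not word characters.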
.. seealso::

   Module :mod:`ConfigParser`
      Parser for configuration files similar to the Windows :file:`.ini` files.


.. _shlex-objects:

shlex Objects
-------------

A :class:`~shlex.shlex` instance has the following methods:


.. method:: shlex.get_token()

   Return a token.  If tokens have been stacked using :meth:`push_token`, pop a
   token off the stack.  Otherwise, read one from the input stream.  If reading
   encounters an immediate end-of-file, :attr:`eof` is returned (the empty
   string (``''``) in non-POSIX mode, and ``None`` in POSIX mode).


.. method:: shlex.push_token(str)

   Push the argument onto the token stack.


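The interplay between :meth:`get_token` and :meth:`push_token` can be
sketched as a one-token lookahead. The input string is arbitrary, and the
variable names are chosen only for this illustration.

```python
import shlex

lexer = shlex.shlex('alpha beta')   # non-POSIX mode by default

first = lexer.get_token()    # reads 'alpha' from the input
lexer.push_token(first)      # put it back on the token stack
again = lexer.get_token()    # popped from the stack: 'alpha' again
second = lexer.get_token()   # 'beta'
at_end = lexer.get_token()   # input exhausted: the eof value, '' here

# In POSIX mode the end-of-file value is None instead.
posix_end = shlex.shlex('', posix=True).get_token()
```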
.. method:: shlex.read_token()

   Read a raw token.  Ignore the pushback stack, and do not interpret source
   requests.  (This is not ordinarily a useful entry point, and is documented here
   only for the sake of completeness.)


.. method:: shlex.sourcehook(filename)

   When :class:`~shlex.shlex` detects a source request (see :attr:`source`
   below) this method is given the following token as argument, and is expected
   to return a tuple consisting of a filename and an open file-like object.

   Normally, this method first strips any quotes off the argument.  If the result
   is an absolute pathname, or there was no previous source request in effect, or
   the previous source was a stream (such as ``sys.stdin``), the result is left
   alone.  Otherwise, if the result is a relative pathname, the directory part of
   the name of the file immediately before it on the source inclusion stack is
   prepended (this behavior is like the way the C preprocessor handles ``#include
   "file.h"``).

   The result of the manipulations is treated as a filename, and returned as the
   first component of the tuple, with :func:`open` called on it to yield the second
   component.  (Note: this is the reverse of the order of arguments in instance
   initialization!)

   This hook is exposed so that you can use it to implement directory search paths,
   addition of file extensions, and other namespace hacks.  There is no
   corresponding 'close' hook, but a shlex instance will call the
   :meth:`~io.IOBase.close` method of the sourced input stream when it returns
   EOF.

   For more explicit control of source stacking, use the :meth:`push_source` and
   :meth:`pop_source` methods.


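One way to use this hook is to resolve inclusion names against something
other than the file system. The sketch below is an illustration only: the
``INCLUDES`` dictionary and the ``DictSourceLexer`` class are hypothetical
names, and it assumes Python 3, where ``io.StringIO`` accepts ordinary
strings.

```python
import io
import shlex

# Hypothetical in-memory "files", keyed by inclusion name.
INCLUDES = {'defaults': 'color blue'}

class DictSourceLexer(shlex.shlex):
    """Illustrative subclass that serves source requests from INCLUDES."""

    def sourcehook(self, newfile):
        name = newfile.strip('"')            # drop surrounding quotes, if any
        # Return (filename, open file-like object), in that order.
        return name, io.StringIO(INCLUDES[name])

lexer = DictSourceLexer('size 10 source defaults shape round')
lexer.source = 'source'                      # enable inclusion requests
tokens = list(lexer)
# ['size', '10', 'color', 'blue', 'shape', 'round']
```

The included text is tokenized in place of the request, after which lexing
resumes with the remainder of the original input.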
.. method:: shlex.push_source(stream[, filename])

   Push an input source stream onto the input stack.  If the *filename* argument
   is specified it will later be available for use in error messages.  This is the
   same method used internally by the :meth:`sourcehook` method.

   .. versionadded:: 2.1


.. method:: shlex.pop_source()

   Pop the last-pushed input source from the input stack.  This is the same method
   used internally when the lexer reaches EOF on a stacked input stream.

   .. versionadded:: 2.1


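A small sketch of explicit source stacking follows. The stream contents and
the name ``inner.rc`` are made up, and the use of ``io.StringIO`` with a
plain string assumes Python 3.

```python
import io
import shlex

lexer = shlex.shlex('outer1 outer2')
first = lexer.get_token()                       # 'outer1'

# Stack a second stream; tokens now come from it until it is exhausted.
lexer.push_source(io.StringIO('inner1 inner2'), 'inner.rc')
name_while_stacked = lexer.infile               # 'inner.rc'

# At EOF on the stacked stream the lexer pops it and resumes the outer input.
rest = list(lexer)
# ['inner1', 'inner2', 'outer2']
```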
.. method:: shlex.error_leader([file[, line]])

   This method generates an error message leader in the format of a Unix C compiler
   error label; the format is ``'"%s", line %d: '``, where the ``%s`` is replaced
   with the name of the current source file and the ``%d`` with the current input
   line number (the optional arguments can be used to override these).

   This convenience is provided to encourage :mod:`shlex` users to generate error
   messages in the standard, parseable format understood by Emacs and other Unix
   tools.

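For instance, the leader can be prefixed to a diagnostic message. In this
sketch the file names and the message text are invented, and the optional
arguments are passed positionally.

```python
import shlex

lexer = shlex.shlex('first\nsecond', infile='demo.rc')  # made-up file name

# Before any token is read, the current line number is 1.
leader = lexer.error_leader()
# '"demo.rc", line 1: '

# Both the file name and the line number can be overridden.
explicit = lexer.error_leader('other.rc', 7)
# '"other.rc", line 7: '

message = leader + 'unexpected token'   # hypothetical diagnostic text
```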
Instances of :class:`~shlex.shlex` subclasses have some public instance
variables which either control lexical analysis or can be used for debugging:


.. attribute:: shlex.commenters

   The string of characters that are recognized as comment beginners.  All
   characters from the comment beginner to end of line are ignored.  Includes just
   ``'#'`` by default.


.. attribute:: shlex.wordchars

   The string of characters that will accumulate into multi-character tokens.  By
   default, includes all ASCII alphanumerics and underscore.


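Both attributes can be reassigned on an instance before tokenizing. The
following sketch, with an invented input line, lets hyphenated names form a
single token and switches the comment character to ``;``.

```python
import shlex

lexer = shlex.shlex('log-level = debug ; keep quiet')
lexer.wordchars += '-'    # allow '-' inside multi-character tokens
lexer.commenters = ';'    # ';' now starts a comment instead of '#'

tokens = list(lexer)
# ['log-level', '=', 'debug']
```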
.. attribute:: shlex.whitespace

   Characters that will be considered whitespace and skipped.  Whitespace bounds
   tokens.  By default, includes space, tab, linefeed and carriage-return.


.. attribute:: shlex.escape

   Characters that will be considered escape characters.  This is only used in
   POSIX mode, and includes just ``'\'`` by default.

   .. versionadded:: 2.3


.. attribute:: shlex.quotes

   Characters that will be considered string quotes.  The token accumulates until
   the same quote is encountered again (thus, different quote types protect each
   other as in the shell).  By default, includes ASCII single and double quotes.


.. attribute:: shlex.escapedquotes

   Characters in :attr:`quotes` that will interpret escape characters defined in
   :attr:`escape`.  This is only used in POSIX mode, and includes just ``'"'`` by
   default.

   .. versionadded:: 2.3


.. attribute:: shlex.whitespace_split

   If ``True``, tokens will only be split on whitespace.  This is useful, for
   example, for parsing command lines with :class:`~shlex.shlex`, getting
   tokens in a similar way to shell arguments.

   .. versionadded:: 2.3


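The effect is easiest to see by tokenizing the same line twice. The command
line below is an arbitrary example; with the default setting, punctuation
such as ``-``, ``=`` and ``/`` is returned as separate single-character
tokens.

```python
import shlex

line = 'ls -l --color=auto /var/log'

# Default: non-word characters become single-character tokens.
default_tokens = list(shlex.shlex(line))

# whitespace_split=True: only whitespace separates tokens.
ws_lexer = shlex.shlex(line)
ws_lexer.whitespace_split = True
ws_tokens = list(ws_lexer)
# ['ls', '-l', '--color=auto', '/var/log']
```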
.. attribute:: shlex.infile

   The name of the current input file, as initially set at class instantiation time
   or stacked by later source requests.  It may be useful to examine this when
   constructing error messages.


.. attribute:: shlex.instream

   The input stream from which this :class:`~shlex.shlex` instance is reading
   characters.


.. attribute:: shlex.source

   This attribute is ``None`` by default.  If you assign a string to it, that
   string will be recognized as a lexical-level inclusion request similar to the
   ``source`` keyword in various shells.  That is, the immediately following token
   will be opened as a filename and input taken from that stream until EOF, at
   which point the :meth:`~io.IOBase.close` method of that stream will be called
   and the input source will again become the original input stream.  Source
   requests may be stacked any number of levels deep.


.. attribute:: shlex.debug

   If this attribute is numeric and ``1`` or more, a :class:`~shlex.shlex`
   instance will print verbose progress output on its behavior.  If you need
   to use this, you can read the module source code to learn the details.


.. attribute:: shlex.lineno

   Source line number (count of newlines seen so far plus one).


.. attribute:: shlex.token

   The token buffer.  It may be useful to examine this when catching exceptions.


.. attribute:: shlex.eof

   Token used to determine end of file.  This will be set to the empty string
   (``''``) in non-POSIX mode, and to ``None`` in POSIX mode.

   .. versionadded:: 2.3


.. _shlex-parsing-rules:

Parsing Rules
-------------

When operating in non-POSIX mode, :class:`~shlex.shlex` will try to obey the
following rules.

* Quote characters are not recognized within words (``Do"Not"Separate`` is
  parsed as the single word ``Do"Not"Separate``);

* Escape characters are not recognized;

* Enclosing characters in quotes preserve the literal value of all characters
  within the quotes;

* Closing quotes separate words (``"Do"Separate`` is parsed as ``"Do"`` and
  ``Separate``);

* If :attr:`~shlex.whitespace_split` is ``False``, any character not
  declared to be a word character, whitespace, or a quote will be returned as
  a single-character token.  If it is ``True``, :class:`~shlex.shlex` will only
  split words on whitespace;

* EOF is signaled with an empty string (``''``);

* It's not possible to parse empty strings, even if quoted.

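Several of the non-POSIX rules above can be checked directly. This is a
small sketch: the helper ``compat_tokens`` is a hypothetical convenience
function, not part of the module.

```python
import shlex

def compat_tokens(text):
    """Tokenize text with a default (non-POSIX) lexer."""
    return list(shlex.shlex(text))

# Quote characters inside a word do not split it.
inside_word = compat_tokens('Do"Not"Separate')   # ['Do"Not"Separate']

# A closing quote ends the token, and the quotes are kept.
after_quote = compat_tokens('"Do"Separate')      # ['"Do"', 'Separate']

# Characters that are not word characters come back one at a time.
punctuation = compat_tokens('a+b')               # ['a', '+', 'b']

# End of input is signaled with the empty string.
end_marker = shlex.shlex('').get_token()         # ''
```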
When operating in POSIX mode, :class:`~shlex.shlex` will try to obey the
following parsing rules.

* Quotes are stripped out, and do not separate words (``"Do"Not"Separate"`` is
  parsed as the single word ``DoNotSeparate``);

* Non-quoted escape characters (e.g. ``'\'``) preserve the literal value of the
  next character that follows;

* Enclosing characters in quotes which are not part of
  :attr:`~shlex.escapedquotes` (e.g. ``"'"``) preserves the literal value
  of all characters within the quotes;

* Enclosing characters in quotes which are part of
  :attr:`~shlex.escapedquotes` (e.g. ``'"'``) preserves the literal value
  of all characters within the quotes, with the exception of the characters
  mentioned in :attr:`~shlex.escape`.  An escape character retains its
  special meaning only when followed by the quote in use, or the escape
  character itself.  Otherwise the escape character will be considered a
  normal character;

* EOF is signaled with a :const:`None` value;

* Quoted empty strings (``''``) are allowed.

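The POSIX-mode rules above can be exercised in the same way. Again this is
only a sketch: ``posix_tokens`` is a hypothetical helper, and raw string
literals are used so that each backslash reaches the lexer unchanged.

```python
import shlex

def posix_tokens(text):
    """Tokenize text with a POSIX-mode lexer."""
    return list(shlex.shlex(text, posix=True))

# Quotes are stripped and do not separate words.
joined = posix_tokens('"Do"Not"Separate"')       # ['DoNotSeparate']

# An unquoted backslash makes the next character literal.
escaped_space = posix_tokens(r'one\ word')       # ['one word']

# Inside single quotes the backslash is an ordinary character.
single = posix_tokens(r"'a\b'")                  # ['a\\b']

# Inside double quotes it escapes only the quote or another backslash.
double_quote = posix_tokens(r'"say \"hi\""')     # ['say "hi"']
double_other = posix_tokens(r'"a\nb"')           # ['a\\nb'], backslash kept

# A quoted empty string is a real token; end of input is None.
empty = posix_tokens("''")                       # ['']
end_marker = shlex.shlex('', posix=True).get_token()   # None
```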