:mod:`parser` --- Access Python parse trees
===========================================

.. module:: parser
   :synopsis: Access parse trees for Python source code.
.. moduleauthor:: Fred L. Drake, Jr. <fdrake@acm.org>
.. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org>


.. Copyright 1995 Virginia Polytechnic Institute and State University and Fred
   L. Drake, Jr. This copyright notice must be distributed on all copies, but
   this document otherwise may be distributed as part of the Python
   distribution. No fee may be charged for this document in any representation,
   either on paper or electronically. This restriction does not affect other
   elements in a distributed package in any way.

.. index:: single: parsing; Python source code

The :mod:`parser` module provides an interface to Python's internal parser and
byte-code compiler. The primary purpose for this interface is to allow Python
code to edit the parse tree of a Python expression and create executable code
from this. This is better than trying to parse and modify an arbitrary Python
code fragment as a string because parsing is performed in a manner identical to
the code forming the application. It is also faster.

.. note::

   From Python 2.5 onward, it's much more convenient to cut in at the Abstract
   Syntax Tree (AST) generation and compilation stage, using the :mod:`ast`
   module.

   The :mod:`parser` module exports the names documented here also with "st"
   replaced by "ast"; this is a legacy from the time when there was no other
   AST and has nothing to do with the AST found in Python 2.5. This is also the
   reason for the functions' keyword arguments being called *ast*, not *st*.
   The "ast" functions have been removed in Python 3.

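As a rough illustration of the :mod:`ast` route mentioned in the note, the
following sketch parses an expression, rewrites one constant in the tree, and
compiles the result. The ``ReplaceFive`` class and the choice of replacing
``5`` with ``7`` are made up for this example. The sketch also assumes a Python
version in which ``ast.Constant`` exists (3.8 or later); older versions
represent numeric literals with a different node class.

```python
import ast


class ReplaceFive(ast.NodeTransformer):
    """Hypothetical transformer: rewrite the literal 5 as 7."""

    def visit_Constant(self, node):
        if node.value == 5:
            # Build a replacement node that keeps the original position.
            return ast.copy_location(ast.Constant(value=7), node)
        return node


# Parse in 'eval' mode, edit the tree, then hand it to compile().
tree = ast.parse('a + 5', 'file.py', 'eval')
tree = ast.fix_missing_locations(ReplaceFive().visit(tree))
code = compile(tree, 'file.py', 'eval')
result = eval(code, {'a': 5})
print(result)
```

The edited tree is compiled directly, so no source text has to be regenerated.
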

There are a few things to note about this module which are important to making
use of the data structures created. This is not a tutorial on editing the parse
trees for Python code, but some examples of using the :mod:`parser` module are
presented.

Most importantly, a good understanding of the Python grammar processed by the
internal parser is required. For full information on the language syntax, refer
to :ref:`reference-index`. The parser
itself is created from a grammar specification defined in the file
:file:`Grammar/Grammar` in the standard Python distribution. The parse trees
stored in the ST objects created by this module are the actual output from the
internal parser when created by the :func:`expr` or :func:`suite` functions,
described below. The ST objects created by :func:`sequence2st` faithfully
simulate those structures. Be aware that the values of the sequences which are
considered "correct" will vary from one version of Python to another as the
formal grammar for the language is revised. However, transporting code from one
Python version to another as source text will always allow correct parse trees
to be created in the target version, with the only restriction being that
migrating to an older version of the interpreter will not support more recent
language constructs. The parse trees are not typically compatible from one
version to another, whereas source code has always been forward-compatible.

Each element of the sequences returned by :func:`st2list` or :func:`st2tuple`
has a simple form. Sequences representing non-terminal elements in the grammar
always have a length greater than one. The first element is an integer which
identifies a production in the grammar. These integers are given symbolic names
in the C header file :file:`Include/graminit.h` and the Python module
:mod:`symbol`. Each additional element of the sequence represents a component
of the production as recognized in the input string: these are always sequences
which have the same form as the parent. An important aspect of this structure
which should be noted is that keywords used to identify the parent node type,
such as the keyword :keyword:`if` in an :const:`if_stmt`, are included in the
node tree without any special treatment. For example, the :keyword:`if` keyword
is represented by the tuple ``(1, 'if')``, where ``1`` is the numeric value
associated with all :const:`NAME` tokens, including variable and function names
defined by the user. In an alternate form returned when line number information
is requested, the same token might be represented as ``(1, 'if', 12)``, where
the ``12`` represents the line number at which the terminal symbol was found.

Terminal elements are represented in much the same way, but without any child
elements and the addition of the source text which was identified. The example
of the :keyword:`if` keyword above is representative. The various types of
terminal symbols are defined in the C header file :file:`Include/token.h` and
the Python module :mod:`token`.

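The tuple layout described above can be walked with a few lines of Python. The
sketch below collects the terminal elements of a small tree, using
``token.ISTERMINAL`` to tell terminals from non-terminals. The tree is written
by hand for illustration: the non-terminal numbers ``300`` and ``301`` are
placeholders rather than real grammar symbols, and ``iter_terminals`` is a
hypothetical helper, not part of the :mod:`parser` module.

```python
import token

# Hand-written tree in the tuple form described above.  The non-terminal
# numbers (300 and 301) are placeholders; only the terminal number 1
# (NAME) is taken from the text.
tree = (300,
        (1, 'if', 12),
        (301, (1, 'x', 12)),
        (1, 'spam', 13))


def iter_terminals(node):
    """Yield (type, text, line_or_None) for each terminal, left to right."""
    if token.ISTERMINAL(node[0]):
        # Terminals carry the source text and, optionally, a line number.
        line = node[2] if len(node) > 2 else None
        yield (node[0], node[1], line)
    else:
        # Non-terminals: every element after the first is a child sequence.
        for child in node[1:]:
            for leaf in iter_terminals(child):
                yield leaf


terminals = list(iter_terminals(tree))
print(terminals)
```

Note that the keyword ``if`` and the user-defined name ``spam`` are both
reported with the same token type, as the text explains.
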

The ST objects are not required to support the functionality of this module,
but are provided for three purposes: to allow an application to amortize the
cost of processing complex parse trees, to provide a parse tree representation
which conserves memory space when compared to the Python list or tuple
representation, and to ease the creation of additional modules in C which
manipulate parse trees. A simple "wrapper" class may be created in Python to
hide the use of ST objects.

The :mod:`parser` module defines functions for a few distinct purposes. The
most important purposes are to create ST objects and to convert ST objects to
other representations such as parse trees and compiled code objects, but there
are also functions which serve to query the type of parse tree represented by an
ST object.


.. seealso::

   Module :mod:`symbol`
      Useful constants representing internal nodes of the parse tree.

   Module :mod:`token`
      Useful constants representing leaf nodes of the parse tree and functions for
      testing node values.


.. _creating-sts:

Creating ST Objects
-------------------

ST objects may be created from source code or from a parse tree. When creating
an ST object from source, different functions are used to create the ``'eval'``
and ``'exec'`` forms.


.. function:: expr(source)

   The :func:`expr` function parses the parameter *source* as if it were an input
   to ``compile(source, 'file.py', 'eval')``. If the parse succeeds, an ST object
   is created to hold the internal parse tree representation; otherwise an
   appropriate exception is raised.


.. function:: suite(source)

   The :func:`suite` function parses the parameter *source* as if it were an input
   to ``compile(source, 'file.py', 'exec')``. If the parse succeeds, an ST object
   is created to hold the internal parse tree representation; otherwise an
   appropriate exception is raised.


.. function:: sequence2st(sequence)

   This function accepts a parse tree represented as a sequence and builds an
   internal representation if possible. If it can validate that the tree conforms
   to the Python grammar and all nodes are valid node types in the host version of
   Python, an ST object is created from the internal representation and returned
   to the caller. If there is a problem creating the internal representation, or
   if the tree cannot be validated, a :exc:`ParserError` exception is raised. An
   ST object created this way should not be assumed to compile correctly; normal
   exceptions raised by compilation may still be raised when the ST object is
   passed to :func:`compilest`. This may indicate problems not related to syntax
   (such as a :exc:`MemoryError` exception), but may also be due to constructs such
   as the result of parsing ``del f(0)``, which escapes the Python parser but is
   checked by the bytecode compiler.

   Sequences representing terminal tokens may be represented as either two-element
   sequences of the form ``(1, 'name')`` or as three-element sequences of the form
   ``(1, 'name', 56)``. If the third element is present, it is assumed to be a
   valid line number. The line number may be specified for any subset of the
   terminal symbols in the input tree.

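Because line numbers are optional on each terminal, a tree that mixes both
forms can be normalized before further processing. The following sketch removes
the line numbers from every terminal of a tuple tree. ``strip_line_numbers`` is
a hypothetical helper, and the tree is hand-written with a placeholder
non-terminal number (``300``), so it is not a tree that :func:`sequence2st`
would be expected to validate.

```python
import token


def strip_line_numbers(node):
    """Return a copy of a tuple tree with line numbers removed from terminals."""
    if token.ISTERMINAL(node[0]):
        # Keep only the token type and the source text.
        return tuple(node[:2])
    # Rebuild the non-terminal with each child normalized in turn.
    return (node[0],) + tuple(strip_line_numbers(child) for child in node[1:])


# Hypothetical tree mixing both terminal forms; 300 is a placeholder.
mixed = (300, (1, 'name', 56), (1, 'other'))
stripped = strip_line_numbers(mixed)
print(stripped)
```
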

.. function:: tuple2st(sequence)

   This is the same function as :func:`sequence2st`. This entry point is
   maintained for backward compatibility.


.. _converting-sts:

Converting ST Objects
---------------------

ST objects, regardless of the input used to create them, may be converted to
parse trees represented as list or tuple trees, or may be compiled into
executable code objects. Parse trees may be extracted with or without line
numbering information.


.. function:: st2list(ast[, line_info])

   This function accepts an ST object from the caller in *ast* and returns a
   Python list representing the equivalent parse tree. The resulting list
   representation can be used for inspection or the creation of a new parse tree in
   list form. This function does not fail so long as memory is available to build
   the list representation. If the parse tree will only be used for inspection,
   :func:`st2tuple` should be used instead to reduce memory consumption and
   fragmentation. When the list representation is required, this function is
   significantly faster than retrieving a tuple representation and converting that
   to nested lists.

   If *line_info* is true, line number information will be included for all
   terminal tokens as a third element of the list representing the token. Note
   that the line number provided specifies the line on which the token *ends*.
   This information is omitted if the flag is false or omitted.

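To make the comparison above concrete, this is roughly what "retrieving a tuple
representation and converting that to nested lists" looks like when done by
hand in Python. It is the slower alternative that :func:`st2list` avoids, shown
only as a sketch: ``tuple_tree_to_lists`` is a hypothetical helper, and the tree
uses placeholder non-terminal numbers (``300`` and ``301``).

```python
def tuple_tree_to_lists(node):
    """Recursively copy a nested tuple tree into nested lists."""
    return [tuple_tree_to_lists(item) if isinstance(item, tuple) else item
            for item in node]


# Hand-written tuple tree; 1 is NAME and 2 is NUMBER in the token module,
# while 300 and 301 are placeholder non-terminal numbers.
tree = (300, (1, 'a'), (301, (2, '5')))
as_lists = tuple_tree_to_lists(tree)

# Unlike tuples, the list form can be edited in place.
as_lists[1][1] = 'b'
print(as_lists)
```
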

.. function:: st2tuple(ast[, line_info])

   This function accepts an ST object from the caller in *ast* and returns a
   Python tuple representing the equivalent parse tree. Other than returning a
   tuple instead of a list, this function is identical to :func:`st2list`.

   If *line_info* is true, line number information will be included for all
   terminal tokens as a third element of the tuple representing the token. This
   information is omitted if the flag is false or omitted.


.. function:: compilest(ast, filename='<syntax-tree>')

   .. index:: builtin: eval

   The Python byte compiler can be invoked on an ST object to produce code objects
   which can be used as part of an :keyword:`exec` statement or a call to the
   built-in :func:`eval` function. This function provides the interface to the
   compiler, passing the internal parse tree from *ast* to the compiler, using the
   source file name specified by the *filename* parameter. The default value
   supplied for *filename* indicates that the source was an ST object.

   Compiling an ST object may result in exceptions related to compilation; an
   example would be a :exc:`SyntaxError` caused by the parse tree for ``del f(0)``:
   this statement is considered legal within the formal grammar for Python but is
   not a legal language construct. The :exc:`SyntaxError` raised for this
   condition is actually generated by the Python byte-compiler normally, which is
   why it can be raised at this point by the :mod:`parser` module. Most causes of
   compilation failure can be diagnosed programmatically by inspection of the parse
   tree.

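The ``del f(0)`` case can be observed without an ST object, since the same
:exc:`SyntaxError` is reported when the source is compiled directly. The sketch
below uses the built-in :func:`compile` as a stand-in for :func:`compilest`;
``compiles_cleanly`` is a hypothetical helper, and the exact wording of the
error message varies between Python versions.

```python
def compiles_cleanly(source, filename='<example>'):
    """Return (True, None) if *source* compiles as 'exec', else (False, message)."""
    try:
        compile(source, filename, 'exec')
    except SyntaxError as exc:
        return False, str(exc)
    return True, None


# Rejected even though it is a "del" followed by an expression.
ok, message = compiles_cleanly('del f(0)')
print(ok)

# An ordinary deletion target compiles without complaint.
print(compiles_cleanly('del x')[0])
```

Code that compiles ST objects built with :func:`sequence2st` would wrap the
call to :func:`compilest` in a similar ``try`` block.
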

.. _querying-sts:

Queries on ST Objects
---------------------

Two functions are provided which allow an application to determine if an ST was
created as an expression or a suite. Neither of these functions can be used to
determine if an ST was created from source code via :func:`expr` or
:func:`suite` or from a parse tree via :func:`sequence2st`.


.. function:: isexpr(ast)

   .. index:: builtin: compile

   When *ast* represents an ``'eval'`` form, this function returns true; otherwise
   it returns false. This is useful, since code objects normally cannot be queried
   for this information using existing built-in functions. Note that the code
   objects created by :func:`compilest` cannot be queried like this either, and
   are identical to those created by the built-in :func:`compile` function.


.. function:: issuite(ast)

   This function mirrors :func:`isexpr` in that it reports whether an ST object
   represents an ``'exec'`` form, commonly known as a "suite." It is not safe to
   assume that this function is equivalent to ``not isexpr(ast)``, as additional
   syntactic fragments may be supported in the future.


.. _st-errors:

Exceptions and Error Handling
-----------------------------

The parser module defines a single exception, but may also pass other built-in
exceptions from other portions of the Python runtime environment. See each
function for information about the exceptions it can raise.


.. exception:: ParserError

   Exception raised when a failure occurs within the parser module. This is
   generally produced for validation failures rather than the built-in
   :exc:`SyntaxError` raised during normal parsing. The exception argument is
   either a string describing the reason for the failure or a tuple containing a
   sequence causing the failure from a parse tree passed to :func:`sequence2st`
   and an explanatory string. Calls to :func:`sequence2st` need to be able to
   handle either form of the argument, while calls to other functions in the
   module will only need to be aware of the simple string values.

Note that the functions :func:`compilest`, :func:`expr`, and :func:`suite` may
raise exceptions which are normally raised by the parsing and compilation
process. These include the built-in exceptions :exc:`MemoryError`,
:exc:`OverflowError`, :exc:`SyntaxError`, and :exc:`SystemError`. In these
cases, these exceptions carry all the meaning normally associated with them.
Refer to the descriptions of each function for detailed information.

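A handler that copes with both forms of the :exc:`ParserError` argument might
look like the sketch below. It assumes that the two forms show up in the
exception's ``args`` attribute as a single string or as a
``(sequence, message)`` pair; that mapping is an assumption of this example and
should be checked against the Python version in use. ``describe_parser_error``
is a hypothetical helper, and plain :exc:`Exception` instances stand in for
real :exc:`ParserError` objects.

```python
def describe_parser_error(exc):
    """Return (message, offending_sequence_or_None) for an exception."""
    args = exc.args
    if len(args) == 2 and isinstance(args[0], (list, tuple)):
        # Assumed tuple form: (offending sequence, explanatory string).
        return args[1], args[0]
    # Assumed simple form: a single explanatory string.
    return (args[0] if args else ''), None


# Stand-in exceptions built by hand; a real ParserError would be raised
# by the parser module itself.
simple = Exception('parse failed')
detailed = Exception((1, 'name', 56), 'illegal terminal')
print(describe_parser_error(simple))
print(describe_parser_error(detailed))
```
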

.. _st-objects:

ST Objects
----------

Ordered and equality comparisons are supported between ST objects. Pickling of
ST objects (using the :mod:`pickle` module) is also supported.


.. data:: STType

   The type of the objects returned by :func:`expr`, :func:`suite` and
   :func:`sequence2st`.

ST objects have the following methods:


.. method:: ST.compile([filename])

   Same as ``compilest(st, filename)``.


.. method:: ST.isexpr()

   Same as ``isexpr(st)``.


.. method:: ST.issuite()

   Same as ``issuite(st)``.


.. method:: ST.tolist([line_info])

   Same as ``st2list(st, line_info)``.


.. method:: ST.totuple([line_info])

   Same as ``st2tuple(st, line_info)``.


Example: Emulation of :func:`compile`
-------------------------------------

While many useful operations may take place between parsing and bytecode
generation, the simplest operation is to do nothing. For this purpose, using
the :mod:`parser` module to produce an intermediate data structure is equivalent
to the code ::

   >>> code = compile('a + 5', 'file.py', 'eval')
   >>> a = 5
   >>> eval(code)
   10

The equivalent operation using the :mod:`parser` module is somewhat longer, and
allows the intermediate internal parse tree to be retained as an ST object::

   >>> import parser
   >>> st = parser.expr('a + 5')
   >>> code = st.compile('file.py')
   >>> a = 5
   >>> eval(code)
   10

An application which needs both ST and code objects can package this code into
readily available functions::

   import parser

   def load_suite(source_string):
       """Parse an 'exec' form; return the ST object and its code object."""
       st = parser.suite(source_string)
       return st, st.compile()

   def load_expression(source_string):
       """Parse an 'eval' form; return the ST object and its code object."""
       st = parser.expr(source_string)
       return st, st.compile()