Timestamp:
Mar 19, 2014, 11:31:01 PM
Author:
dmik
Message:

python: Merge vendor 2.7.6 to trunk.

Location:
python/trunk
Files:
2 edited

  • python/trunk

  • python/trunk/Doc/library/parser.rst

    r2 r391  
    35 35    AST and has nothing to do with the AST found in Python 2.5.  This is also the
    36 36    reason for the functions' keyword arguments being called *ast*, not *st*.
    37    The "ast" functions will be removed in Python 3.0.
     37    The "ast" functions have been removed in Python 3.
    38 38
    39 39 There are a few things to note about this module which are important to making
     
    122 122    to ``compile(source, 'file.py', 'eval')``.  If the parse succeeds, an ST object
    123 123    is created to hold the internal parse tree representation, otherwise an
    124    appropriate exception is thrown.
     124    appropriate exception is raised.
    125 125
    126 126
     
    130 130    to ``compile(source, 'file.py', 'exec')``.  If the parse succeeds, an ST object
    131 131    is created to hold the internal parse tree representation, otherwise an
    132    appropriate exception is thrown.
     132    appropriate exception is raised.
    133 133
    134 134
     
    140 140    Python, an ST object is created from the internal representation and returned
    141 141    to the called.  If there is a problem creating the internal representation, or
    142    if the tree cannot be validated, a :exc:`ParserError` exception is thrown.  An
     142    if the tree cannot be validated, a :exc:`ParserError` exception is raised.  An
    143 143    ST object created this way should not be assumed to compile correctly; normal
    144    exceptions thrown by compilation may still be initiated when the ST object is
     144    exceptions raised by compilation may still be initiated when the ST object is
    145 145    passed to :func:`compilest`.  This may indicate problems not related to syntax
    146 146    (such as a :exc:`MemoryError` exception), but may also be due to constructs such
     
    201 201
    202 202
    203 .. function:: compilest(ast[, filename='<syntax-tree>'])
     203 .. function:: compilest(ast, filename='<syntax-tree>')
    204 204
    205 205    .. index:: builtin: eval
     
    265 265
    266 266    Exception raised when a failure occurs within the parser module.  This is
    267    generally produced for validation failures rather than the built in
    268    :exc:`SyntaxError` thrown during normal parsing. The exception argument is
     267    generally produced for validation failures rather than the built-in
     268    :exc:`SyntaxError` raised during normal parsing. The exception argument is
    269 269    either a string describing the reason of the failure or a tuple containing a
    270 270    sequence causing the failure from a parse tree passed to :func:`sequence2st`
     
    274 274
    275 275 Note that the functions :func:`compilest`, :func:`expr`, and :func:`suite` may
    276 throw exceptions which are normally thrown by the parsing and compilation
     276 raise exceptions which are normally raised by the parsing and compilation
    277 277 process.  These include the built in exceptions :exc:`MemoryError`,
    278 278 :exc:`OverflowError`, :exc:`SyntaxError`, and :exc:`SystemError`.  In these
     
    323 323
    324 324
    325 .. _st-examples:
    326 
    327 Examples
    328 --------
    329 
    330 .. index:: builtin: compile
    331 
    332 The parser modules allows operations to be performed on the parse tree of Python
    333 source code before the :term:`bytecode` is generated, and provides for inspection of the
    334 parse tree for information gathering purposes. Two examples are presented.  The
    335 simple example demonstrates emulation of the :func:`compile` built-in function
    336 and the complex example shows the use of a parse tree for information discovery.
    337 
    338 
    339 Emulation of :func:`compile`
    340 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     325 Example: Emulation of :func:`compile`
     326 -------------------------------------
    341 327
    342 328 While many useful operations may take place between parsing and bytecode
     
    372 358        st = parser.expr(source_string)
    373 359        return st, st.compile()
    374 
    375 
    376 Information Discovery
    377 ^^^^^^^^^^^^^^^^^^^^^
    378 
    379 .. index::
    380    single: string; documentation
    381    single: docstrings
    382 
    383 Some applications benefit from direct access to the parse tree.  The remainder
    384 of this section demonstrates how the parse tree provides access to module
    385 documentation defined in docstrings without requiring that the code being
    386 examined be loaded into a running interpreter via :keyword:`import`.  This can
    387 be very useful for performing analyses of untrusted code.
    388 
    389 Generally, the example will demonstrate how the parse tree may be traversed to
    390 distill interesting information.  Two functions and a set of classes are
    391 developed which provide programmatic access to high level function and class
    392 definitions provided by a module.  The classes extract information from the
    393 parse tree and provide access to the information at a useful semantic level, one
    394 function provides a simple low-level pattern matching capability, and the other
    395 function defines a high-level interface to the classes by handling file
    396 operations on behalf of the caller.  All source files mentioned here which are
    397 not part of the Python installation are located in the :file:`Demo/parser/`
    398 directory of the distribution.
    399 
    400 The dynamic nature of Python allows the programmer a great deal of flexibility,
    401 but most modules need only a limited measure of this when defining classes,
    402 functions, and methods.  In this example, the only definitions that will be
    403 considered are those which are defined in the top level of their context, e.g.,
    404 a function defined by a :keyword:`def` statement at column zero of a module, but
    405 not a function defined within a branch of an :keyword:`if` ... :keyword:`else`
    406 construct, though there are some good reasons for doing so in some situations.
    407 Nesting of definitions will be handled by the code developed in the example.
    408 
    409 To construct the upper-level extraction methods, we need to know what the parse
    410 tree structure looks like and how much of it we actually need to be concerned
    411 about.  Python uses a moderately deep parse tree so there are a large number of
    412 intermediate nodes.  It is important to read and understand the formal grammar
    413 used by Python.  This is specified in the file :file:`Grammar/Grammar` in the
    414 distribution. Consider the simplest case of interest when searching for
    415 docstrings: a module consisting of a docstring and nothing else.  (See file
    416 :file:`docstring.py`.) ::
    417 
    418    """Some documentation.
    419    """
    420 
    421 Using the interpreter to take a look at the parse tree, we find a bewildering
    422 mass of numbers and parentheses, with the documentation buried deep in nested
    423 tuples. ::
    424 
    425    >>> import parser
    426    >>> import pprint
    427    >>> st = parser.suite(open('docstring.py').read())
    428    >>> tup = st.totuple()
    429    >>> pprint.pprint(tup)
    430    (257,
    431     (264,
    432      (265,
    433       (266,
    434        (267,
    435         (307,
    436          (287,
    437           (288,
    438            (289,
    439             (290,
    440              (292,
    441               (293,
    442                (294,
    443                 (295,
    444                  (296,
    445                   (297,
    446                    (298,
    447                     (299,
    448                      (300, (3, '"""Some documentation.\n"""'))))))))))))))))),
    449       (4, ''))),
    450     (4, ''),
    451     (0, ''))
    452 
    453 The numbers at the first element of each node in the tree are the node types;
    454 they map directly to terminal and non-terminal symbols in the grammar.
    455 Unfortunately, they are represented as integers in the internal representation,
    456 and the Python structures generated do not change that.  However, the
    457 :mod:`symbol` and :mod:`token` modules provide symbolic names for the node types
    458 and dictionaries which map from the integers to the symbolic names for the node
    459 types.
    460 
    461 In the output presented above, the outermost tuple contains four elements: the
    462 integer ``257`` and three additional tuples.  Node type ``257`` has the symbolic
    463 name :const:`file_input`.  Each of these inner tuples contains an integer as the
    464 first element; these integers, ``264``, ``4``, and ``0``, represent the node
    465 types :const:`stmt`, :const:`NEWLINE`, and :const:`ENDMARKER`, respectively.
    466 Note that these values may change depending on the version of Python you are
    467 using; consult :file:`symbol.py` and :file:`token.py` for details of the
    468 mapping.  It should be fairly clear that the outermost node is related primarily
    469 to the input source rather than the contents of the file, and may be disregarded
    470 for the moment.  The :const:`stmt` node is much more interesting.  In
    471 particular, all docstrings are found in subtrees which are formed exactly as
    472 this node is formed, with the only difference being the string itself.  The
    473 association between the docstring in a similar tree and the defined entity
    474 (class, function, or module) which it describes is given by the position of the
    475 docstring subtree within the tree defining the described structure.
    476 
    477 By replacing the actual docstring with something to signify a variable component
    478 of the tree, we allow a simple pattern matching approach to check any given
    479 subtree for equivalence to the general pattern for docstrings.  Since the
    480 example demonstrates information extraction, we can safely require that the tree
    481 be in tuple form rather than list form, allowing a simple variable
    482 representation to be ``['variable_name']``.  A simple recursive function can
    483 implement the pattern matching, returning a Boolean and a dictionary of variable
    484 name to value mappings.  (See file :file:`example.py`.) ::
    485 
    486    from types import ListType, TupleType
    487 
    488    def match(pattern, data, vars=None):
    489        if vars is None:
    490            vars = {}
    491        if type(pattern) is ListType:
    492            vars[pattern[0]] = data
    493            return 1, vars
    494        if type(pattern) is not TupleType:
    495            return (pattern == data), vars
    496        if len(data) != len(pattern):
    497            return 0, vars
    498        for pattern, data in map(None, pattern, data):
    499            same, vars = match(pattern, data, vars)
    500            if not same:
    501                break
    502        return same, vars
    503 
    504 Using this simple representation for syntactic variables and the symbolic node
    505 types, the pattern for the candidate docstring subtrees becomes fairly readable.
    506 (See file :file:`example.py`.) ::
    507 
    508    import symbol
    509    import token
    510 
    511    DOCSTRING_STMT_PATTERN = (
    512        symbol.stmt,
    513        (symbol.simple_stmt,
    514         (symbol.small_stmt,
    515          (symbol.expr_stmt,
    516           (symbol.testlist,
    517            (symbol.test,
    518             (symbol.and_test,
    519              (symbol.not_test,
    520               (symbol.comparison,
    521                (symbol.expr,
    522                 (symbol.xor_expr,
    523                  (symbol.and_expr,
    524                   (symbol.shift_expr,
    525                    (symbol.arith_expr,
    526                     (symbol.term,
    527                      (symbol.factor,
    528                       (symbol.power,
    529                        (symbol.atom,
    530                         (token.STRING, ['docstring'])
    531                         )))))))))))))))),
    532         (token.NEWLINE, '')
    533         ))
    534 
    535 Using the :func:`match` function with this pattern, extracting the module
    536 docstring from the parse tree created previously is easy::
    537 
    538    >>> found, vars = match(DOCSTRING_STMT_PATTERN, tup[1])
    539    >>> found
    540    1
    541    >>> vars
    542    {'docstring': '"""Some documentation.\n"""'}
    543 
    544 Once specific data can be extracted from a location where it is expected, the
    545 question of where information can be expected needs to be answered.  When
    546 dealing with docstrings, the answer is fairly simple: the docstring is the first
    547 :const:`stmt` node in a code block (:const:`file_input` or :const:`suite` node
    548 types).  A module consists of a single :const:`file_input` node, and class and
    549 function definitions each contain exactly one :const:`suite` node.  Classes and
    550 functions are readily identified as subtrees of code block nodes which start
    551 with ``(stmt, (compound_stmt, (classdef, ...`` or ``(stmt, (compound_stmt,
    552 (funcdef, ...``.  Note that these subtrees cannot be matched by :func:`match`
    553 since it does not support multiple sibling nodes to match without regard to
    554 number.  A more elaborate matching function could be used to overcome this
    555 limitation, but this is sufficient for the example.
    556 
    557 Given the ability to determine whether a statement might be a docstring and
    558 extract the actual string from the statement, some work needs to be performed to
    559 walk the parse tree for an entire module and extract information about the names
    560 defined in each context of the module and associate any docstrings with the
    561 names.  The code to perform this work is not complicated, but bears some
    562 explanation.
    563 
    564 The public interface to the classes is straightforward and should probably be
    565 somewhat more flexible.  Each "major" block of the module is described by an
    566 object providing several methods for inquiry and a constructor which accepts at
    567 least the subtree of the complete parse tree which it represents.  The
    568 :class:`ModuleInfo` constructor accepts an optional *name* parameter since it
    569 cannot otherwise determine the name of the module.
    570 
    571 The public classes include :class:`ClassInfo`, :class:`FunctionInfo`, and
    572 :class:`ModuleInfo`.  All objects provide the methods :meth:`get_name`,
    573 :meth:`get_docstring`, :meth:`get_class_names`, and :meth:`get_class_info`.  The
    574 :class:`ClassInfo` objects support :meth:`get_method_names` and
    575 :meth:`get_method_info` while the other classes provide
    576 :meth:`get_function_names` and :meth:`get_function_info`.
    577 
    578 Within each of the forms of code block that the public classes represent, most
    579 of the required information is in the same form and is accessed in the same way,
    580 with classes having the distinction that functions defined at the top level are
    581 referred to as "methods." Since the difference in nomenclature reflects a real
    582 semantic distinction from functions defined outside of a class, the
    583 implementation needs to maintain the distinction. Hence, most of the
    584 functionality of the public classes can be implemented in a common base class,
    585 :class:`SuiteInfoBase`, with the accessors for function and method information
    586 provided elsewhere. Note that there is only one class which represents function
    587 and method information; this parallels the use of the :keyword:`def` statement
    588 to define both types of elements.
    589 
    590 Most of the accessor functions are declared in :class:`SuiteInfoBase` and do not
    591 need to be overridden by subclasses.  More importantly, the extraction of most
    592 information from a parse tree is handled through a method called by the
    593 :class:`SuiteInfoBase` constructor.  The example code for most of the classes is
    594 clear when read alongside the formal grammar, but the method which recursively
    595 creates new information objects requires further examination.  Here is the
    596 relevant part of the :class:`SuiteInfoBase` definition from :file:`example.py`::
    597 
    598    class SuiteInfoBase:
    599        _docstring = ''
    600        _name = ''
    601 
    602        def __init__(self, tree = None):
    603            self._class_info = {}
    604            self._function_info = {}
    605            if tree:
    606                self._extract_info(tree)
    607 
    608        def _extract_info(self, tree):
    609            # extract docstring
    610            if len(tree) == 2:
    611                found, vars = match(DOCSTRING_STMT_PATTERN[1], tree[1])
    612            else:
    613                found, vars = match(DOCSTRING_STMT_PATTERN, tree[3])
    614            if found:
    615                self._docstring = eval(vars['docstring'])
    616            # discover inner definitions
    617            for node in tree[1:]:
    618                found, vars = match(COMPOUND_STMT_PATTERN, node)
    619                if found:
    620                    cstmt = vars['compound']
    621                    if cstmt[0] == symbol.funcdef:
    622                        name = cstmt[2][1]
    623                        self._function_info[name] = FunctionInfo(cstmt)
    624                    elif cstmt[0] == symbol.classdef:
    625                        name = cstmt[2][1]
    626                        self._class_info[name] = ClassInfo(cstmt)
    627 
    628 After initializing some internal state, the constructor calls the
    629 :meth:`_extract_info` method.  This method performs the bulk of the information
    630 extraction which takes place in the entire example.  The extraction has two
    631 distinct phases: the location of the docstring for the parse tree passed in, and
    632 the discovery of additional definitions within the code block represented by the
    633 parse tree.
    634 
    635 The initial :keyword:`if` test determines whether the nested suite is of the
    636 "short form" or the "long form."  The short form is used when the code block is
    637 on the same line as the definition of the code block, as in ::
    638 
    639    def square(x): "Square an argument."; return x ** 2
    640 
    641 while the long form uses an indented block and allows nested definitions::
    642 
    643    def make_power(exp):
    644        "Make a function that raises an argument to the exponent `exp`."
    645        def raiser(x, y=exp):
    646            return x ** y
    647        return raiser
    648 
    649 When the short form is used, the code block may contain a docstring as the
    650 first, and possibly only, :const:`small_stmt` element.  The extraction of such a
    651 docstring is slightly different and requires only a portion of the complete
    652 pattern used in the more common case.  As implemented, the docstring will only
    653 be found if there is only one :const:`small_stmt` node in the
    654 :const:`simple_stmt` node. Since most functions and methods which use the short
    655 form do not provide a docstring, this may be considered sufficient.  The
    656 extraction of the docstring proceeds using the :func:`match` function as
    657 described above, and the value of the docstring is stored as an attribute of the
    658 :class:`SuiteInfoBase` object.
    659 
    660 After docstring extraction, a simple definition discovery algorithm operates on
    661 the :const:`stmt` nodes of the :const:`suite` node.  The special case of the
    662 short form is not tested; since there are no :const:`stmt` nodes in the short
    663 form, the algorithm will silently skip the single :const:`simple_stmt` node and
    664 correctly not discover any nested definitions.
    665 
    666 Each statement in the code block is categorized as a class definition, function
    667 or method definition, or something else.  For the definition statements, the
    668 name of the element defined is extracted and a representation object appropriate
    669 to the definition is created with the defining subtree passed as an argument to
    670 the constructor.  The representation objects are stored in instance variables
    671 and may be retrieved by name using the appropriate accessor methods.
    672 
    673 The public classes provide any accessors required which are more specific than
    674 those provided by the :class:`SuiteInfoBase` class, but the real extraction
    675 algorithm remains common to all forms of code blocks.  A high-level function can
    676 be used to extract the complete set of information from a source file.  (See
    677 file :file:`example.py`.) ::
    678 
    679    def get_docs(fileName):
    680        import os
    681        import parser
    682 
    683        source = open(fileName).read()
    684        basename = os.path.basename(os.path.splitext(fileName)[0])
    685        st = parser.suite(source)
    686        return ModuleInfo(st.totuple(), basename)
    687 
    688 This provides an easy-to-use interface to the documentation of a module.  If
    689 information is required which is not extracted by the code of this example, the
    690 code may be extended at clearly defined points to provide additional
    691 capabilities.
    692 
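The ``match`` helper in the removed *Information Discovery* section relies on Python 2 only constructs (``types.ListType``, ``map(None, ...)``). The following is a minimal sketch of the same pattern-matching idea in Python 3 syntax. It is an illustration, not code from this changeset: the parameter name ``variables`` and the small integer node types in the usage lines are assumptions chosen for the example, not real grammar symbols.

```python
def match(pattern, data, variables=None):
    """Match a nested-tuple pattern against nested-tuple data.

    A one-element list such as ['name'] in the pattern acts as a variable
    that captures whatever sits at the same position in the data.
    Returns (matched, captured_variables).
    """
    if variables is None:
        variables = {}
    if isinstance(pattern, list):
        # Variable slot: record the data found at this position.
        variables[pattern[0]] = data
        return True, variables
    if not isinstance(pattern, tuple):
        # Leaf value: must compare equal.
        return pattern == data, variables
    if not isinstance(data, tuple) or len(data) != len(pattern):
        return False, variables
    for sub_pattern, sub_data in zip(pattern, data):
        same, variables = match(sub_pattern, sub_data, variables)
        if not same:
            return False, variables
    return True, variables


# Hypothetical node types 1..4; real values come from the grammar tables.
PATTERN = (1, (2, (3, ['docstring'])), (4, ''))
tree = (1, (2, (3, '"""Some documentation.\n"""')), (4, ''))
found, captured = match(PATTERN, tree)
```

Unlike the removed version, this sketch returns ``True``/``False`` rather than ``1``/``0`` and rejects data that is not a tuple where the pattern expects one.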
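The removed example extracted docstrings from source code without importing it. As a hedged alternative, and explicitly a different technique from the ``parser``-based one shown in the diff, the standard ``ast`` module can do the same job. The function name ``get_docs`` is borrowed from the removed text, but its signature here (taking source text and returning a dictionary) and the ``"<module>"`` key are assumptions made for this sketch. Like the removed example, it only considers definitions at the top level of their context.

```python
import ast


def get_docs(source):
    """Map qualified names to docstrings without importing the code."""
    tree = ast.parse(source)
    docs = {"<module>": ast.get_docstring(tree)}

    def visit(node, prefix):
        # Only direct children are inspected, so definitions nested inside
        # an if/else branch are skipped, as in the removed example.
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef,
                                  ast.ClassDef)):
                name = prefix + child.name
                docs[name] = ast.get_docstring(child)
                visit(child, name + ".")

    visit(tree, "")
    return docs


example_source = (
    '"""Mod doc."""\n'
    'class A:\n'
    '    "A doc."\n'
    '    def m(self):\n'
    '        "m doc."\n'
    '\n'
    'def f():\n'
    '    pass\n'
)
docs = get_docs(example_source)
```

Entries without a docstring map to ``None``, so callers can tell an undocumented definition apart from a missing one.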