\section{\module{parser} ---
         Access Python parse trees}

% Copyright 1995 Virginia Polytechnic Institute and State University
% and Fred L. Drake, Jr. This copyright notice must be distributed on
% all copies, but this document otherwise may be distributed as part
% of the Python distribution. No fee may be charged for this document
% in any representation, either on paper or electronically. This
% restriction does not affect other elements in a distributed package
% in any way.

\declaremodule{builtin}{parser}
\modulesynopsis{Access parse trees for Python source code.}
\moduleauthor{Fred L. Drake, Jr.}{fdrake@acm.org}
\sectionauthor{Fred L. Drake, Jr.}{fdrake@acm.org}


\index{parsing!Python source code}

The \module{parser} module provides an interface to Python's internal
parser and byte-code compiler. The primary purpose for this interface
is to allow Python code to edit the parse tree of a Python expression
and create executable code from this. This is better than trying
to parse and modify an arbitrary Python code fragment as a string
because parsing is performed in a manner identical to the code
forming the application. It is also faster.

There are a few things to note about this module which are important
to making use of the data structures created. This is not a tutorial
on editing the parse trees for Python code, but some examples of using
the \module{parser} module are presented.

Most importantly, a good understanding of the Python grammar processed
by the internal parser is required. For full information on the
language syntax, refer to the \citetitle[../ref/ref.html]{Python
Language Reference}. The parser itself is created from a grammar
specification defined in the file \file{Grammar/Grammar} in the
standard Python distribution. The parse trees stored in the AST
objects created by this module are the actual output from the internal
parser when created by the \function{expr()} or \function{suite()}
functions, described below. The AST objects created by
\function{sequence2ast()} faithfully simulate those structures. Be
aware that the values of the sequences which are considered
``correct'' will vary from one version of Python to another as the
formal grammar for the language is revised. However, transporting
code from one Python version to another as source text will always
allow correct parse trees to be created in the target version, with
the only restriction being that migrating to an older version of the
interpreter will not support more recent language constructs. The
parse trees are not typically compatible from one version to another,
whereas source code has always been forward-compatible.

Each element of the sequences returned by \function{ast2list()} or
\function{ast2tuple()} has a simple form. Sequences representing
non-terminal elements in the grammar always have a length greater than
one. The first element is an integer which identifies a production in
the grammar. These integers are given symbolic names in the C header
file \file{Include/graminit.h} and the Python module
\refmodule{symbol}. Each additional element of the sequence represents
a component of the production as recognized in the input string: these
are always sequences which have the same form as the parent. An
important aspect of this structure which should be noted is that
keywords used to identify the parent node type, such as the keyword
\keyword{if} in an \constant{if_stmt}, are included in the node tree without
any special treatment. For example, the \keyword{if} keyword is
represented by the tuple \code{(1, 'if')}, where \code{1} is the
numeric value associated with all \constant{NAME} tokens, including
variable and function names defined by the user. In an alternate form
returned when line number information is requested, the same token
might be represented as \code{(1, 'if', 12)}, where the \code{12}
represents the line number at which the terminal symbol was found.

Terminal elements are represented in much the same way, but without
any child elements and the addition of the source text which was
identified. The example of the \keyword{if} keyword above is
representative. The various types of terminal symbols are defined in
the C header file \file{Include/token.h} and the Python module
\refmodule{token}.

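The node layout described above can be illustrated with a small helper
that tells terminal and non-terminal nodes apart. This is only a
sketch: the function name \code{describe()} is hypothetical, the trees
are written by hand rather than produced by the \module{parser} module,
and the threshold \code{256} is assumed to be the \constant{NT_OFFSET}
value from \file{Include/token.h}, below which node types are tokens.

```python
import token

# Assumption: node types below NT_OFFSET are terminal tokens.
NT_OFFSET = 256

def describe(node):
    """Return a short description of one parse-tree node (hypothetical helper)."""
    node_type = node[0]
    if node_type < NT_OFFSET:
        # Terminal: (type, text) or, with line information, (type, text, lineno).
        text = node[1]
        lineno = node[2] if len(node) == 3 else None
        return ("terminal", token.tok_name[node_type], text, lineno)
    # Non-terminal: (type, child, child, ...); every child is itself a node.
    return ("non-terminal", node_type, len(node) - 1)

print(describe((1, 'if')))
print(describe((1, 'if', 12)))
print(describe((257, (4, ''), (0, ''))))
```

The last call reports only the number of children; a real application
would recurse into them in the same way.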
The AST objects are not required to support the functionality of this
module, but are provided for three purposes: to allow an application
to amortize the cost of processing complex parse trees, to provide a
parse tree representation which conserves memory space when compared
to the Python list or tuple representation, and to ease the creation
of additional modules in C which manipulate parse trees. A simple
``wrapper'' class may be created in Python to hide the use of AST
objects.

The \module{parser} module defines functions for a few distinct
purposes. The most important purposes are to create AST objects and
to convert AST objects to other representations such as parse trees
and compiled code objects, but there are also functions which serve to
query the type of parse tree represented by an AST object.


\begin{seealso}
\seemodule{symbol}{Useful constants representing internal nodes of
the parse tree.}
\seemodule{token}{Useful constants representing leaf nodes of the
parse tree and functions for testing node values.}
\end{seealso}


\subsection{Creating AST Objects \label{Creating ASTs}}

AST objects may be created from source code or from a parse tree.
When creating an AST object from source, different functions are used
to create the \code{'eval'} and \code{'exec'} forms.

\begin{funcdesc}{expr}{source}
The \function{expr()} function parses the parameter \var{source}
as if it were an input to \samp{compile(\var{source}, 'file.py',
'eval')}. If the parse succeeds, an AST object is created to hold the
internal parse tree representation; otherwise an appropriate exception
is raised.
\end{funcdesc}

\begin{funcdesc}{suite}{source}
The \function{suite()} function parses the parameter \var{source}
as if it were an input to \samp{compile(\var{source}, 'file.py',
'exec')}. If the parse succeeds, an AST object is created to hold the
internal parse tree representation; otherwise an appropriate exception
is raised.
\end{funcdesc}

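The difference between the \code{'eval'} and \code{'exec'} forms can
be seen with the built-in \function{compile()} function alone, which
the two descriptions above use as their reference point. The sketch
below does not call \function{expr()} or \function{suite()}; the
variable names are illustrative only.

```python
# 'eval' form: a single expression, the kind of input expr() accepts.
expr_code = compile('a + 5', 'file.py', 'eval')
result = eval(expr_code, {'a': 5})

# 'exec' form: a sequence of statements, the kind of input suite() accepts.
suite_code = compile('b = a + 5', 'file.py', 'exec')
namespace = {'a': 5}
exec(suite_code, namespace)

# A statement is not valid input for the 'eval' form.
try:
    compile('b = a + 5', 'file.py', 'eval')
    statement_accepted = True
except SyntaxError:
    statement_accepted = False

print(result, namespace['b'], statement_accepted)
```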
\begin{funcdesc}{sequence2ast}{sequence}
This function accepts a parse tree represented as a sequence and
builds an internal representation if possible. If it can validate
that the tree conforms to the Python grammar and all nodes are valid
node types in the host version of Python, an AST object is created
from the internal representation and returned to the caller. If there
is a problem creating the internal representation, or if the tree
cannot be validated, a \exception{ParserError} exception is raised. An AST
object created this way should not be assumed to compile correctly;
the normal compilation exceptions may still be raised when
the AST object is passed to \function{compileast()}. This may indicate
problems not related to syntax (such as a \exception{MemoryError}
exception), but may also be due to constructs such as the result of
parsing \code{del f(0)}, which escapes the Python parser but is
checked by the bytecode compiler.

Sequences representing terminal tokens may be represented as either
two-element sequences of the form \code{(1, 'name')} or as three-element
sequences of the form \code{(1, 'name', 56)}. If the third element is
present, it is assumed to be a valid line number. The line number
may be specified for any subset of the terminal symbols in the input
tree.
\end{funcdesc}

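Because line numbers are optional on every terminal, a tree can be
normalized before it is compared with another tree or with a pattern.
The following sketch removes the optional third element from each
terminal of a tuple tree. The function name
\code{strip_line_numbers()} is hypothetical and the input tree is
written by hand; the only assumption made is that a terminal is
recognizable by having a string as its second element.

```python
def strip_line_numbers(tree):
    """Return a copy of a tuple tree with line numbers removed from terminals."""
    if isinstance(tree[1], str):
        # Terminal token: keep only (type, text).
        return tree[:2]
    # Non-terminal: rebuild the node with each child processed recursively.
    return (tree[0],) + tuple(strip_line_numbers(child) for child in tree[1:])

# Line numbers may be present on any subset of the terminals.
with_lines = (257, (4, '', 1), (0, ''))
print(strip_line_numbers(with_lines))
```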
\begin{funcdesc}{tuple2ast}{sequence}
This is the same function as \function{sequence2ast()}. This entry point
is maintained for backward compatibility.
\end{funcdesc}


\subsection{Converting AST Objects \label{Converting ASTs}}

AST objects, regardless of the input used to create them, may be
converted to parse trees represented as list- or tuple- trees, or may
be compiled into executable code objects. Parse trees may be
extracted with or without line numbering information.

\begin{funcdesc}{ast2list}{ast\optional{, line_info}}
This function accepts an AST object from the caller in
\var{ast} and returns a Python list representing the
equivalent parse tree. The resulting list representation can be used
for inspection or the creation of a new parse tree in list form. This
function does not fail so long as memory is available to build the
list representation. If the parse tree will only be used for
inspection, \function{ast2tuple()} should be used instead to reduce memory
consumption and fragmentation. When the list representation is
required, this function is significantly faster than retrieving a
tuple representation and converting that to nested lists.

If \var{line_info} is true, line number information will be
included for all terminal tokens as a third element of the list
representing the token. Note that the line number provided specifies
the line on which the token \emph{ends}. This information is
omitted if the flag is false or omitted.
\end{funcdesc}

\begin{funcdesc}{ast2tuple}{ast\optional{, line_info}}
This function accepts an AST object from the caller in
\var{ast} and returns a Python tuple representing the
equivalent parse tree. Other than returning a tuple instead of a
list, this function is identical to \function{ast2list()}.

If \var{line_info} is true, line number information will be
included for all terminal tokens as a third element of the tuple
representing the token. This information is omitted if the flag is
false or omitted.
\end{funcdesc}

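For comparison, the conversion that \function{ast2list()} makes
unnecessary, turning a tuple tree into nested lists, can be sketched
in pure Python as follows. The function name
\code{tuple_tree_to_lists()} is hypothetical and the tree is a
hand-written fragment, not real parser output.

```python
def tuple_tree_to_lists(tree):
    """Recursively copy a nested-tuple parse tree into nested lists."""
    return [tuple_tree_to_lists(item) if isinstance(item, tuple) else item
            for item in tree]

tuple_tree = (257, (4, ''), (0, ''))
list_tree = tuple_tree_to_lists(tuple_tree)

# Unlike the tuple form, the list form can be edited in place.
list_tree[1][1] = '\n'
print(list_tree)
```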
\begin{funcdesc}{compileast}{ast\optional{, filename\code{ = '<ast>'}}}
The Python byte compiler can be invoked on an AST object to produce
code objects which can be used as part of an \keyword{exec} statement or
a call to the built-in \function{eval()}\bifuncindex{eval} function.
This function provides the interface to the compiler: it passes the
internal parse tree from \var{ast} to the byte compiler, using the
source file name specified by the \var{filename} parameter.
The default value supplied for \var{filename} indicates that
the source was an AST object.

Compiling an AST object may result in exceptions related to
compilation; an example would be a \exception{SyntaxError} caused by the
parse tree for \code{del f(0)}: this statement is considered legal
within the formal grammar for Python but is not a legal language
construct. The \exception{SyntaxError} raised for this condition is
actually generated by the Python byte-compiler normally, which is why
it can be raised at this point by the \module{parser} module. Most
causes of compilation failure can be diagnosed programmatically by
inspection of the parse tree.
\end{funcdesc}


\subsection{Queries on AST Objects \label{Querying ASTs}}

Two functions are provided which allow an application to determine if
an AST was created as an expression or a suite. Neither of these
functions can be used to determine if an AST was created from source
code via \function{expr()} or \function{suite()} or from a parse tree
via \function{sequence2ast()}.

\begin{funcdesc}{isexpr}{ast}
When \var{ast} represents an \code{'eval'} form, this function
returns true, otherwise it returns false. This is useful, since code
objects normally cannot be queried for this information using existing
built-in functions. Note that the code objects created by
\function{compileast()} cannot be queried like this either, and are
identical to those created by the built-in
\function{compile()}\bifuncindex{compile} function.
\end{funcdesc}


\begin{funcdesc}{issuite}{ast}
This function mirrors \function{isexpr()} in that it reports whether an
AST object represents an \code{'exec'} form, commonly known as a
``suite.'' It is not safe to assume that this function is equivalent
to \samp{not isexpr(\var{ast})}, as additional syntactic fragments may
be supported in the future.
\end{funcdesc}


\subsection{Exceptions and Error Handling \label{AST Errors}}

The parser module defines a single exception, but may also pass other
built-in exceptions from other portions of the Python runtime
environment. See each function for information about the exceptions
it can raise.

\begin{excdesc}{ParserError}
Exception raised when a failure occurs within the parser module. This
is generally produced for validation failures rather than the built-in
\exception{SyntaxError} raised during normal parsing.
The exception argument is either a string describing the reason for the
failure or a tuple containing a sequence causing the failure from a parse
tree passed to \function{sequence2ast()} and an explanatory string. Calls to
\function{sequence2ast()} need to be able to handle either form of the
exception argument, while calls to other functions in the module will
only need to be aware of the simple string values.
\end{excdesc}

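A caller of \function{sequence2ast()} can treat the two forms of the
exception argument uniformly with a small helper. The sketch below is
an assumption-laden illustration: \code{explain_parser_error()} is a
hypothetical name, the \class{ParserError} class defined here is a
stand-in for \exception{parser.ParserError} so that the example is
self-contained, and the tuple form is assumed to arrive as the
exception's \member{args} in the order (sequence, string) given above.

```python
class ParserError(Exception):
    """Stand-in for parser.ParserError, used only for this illustration."""

def explain_parser_error(exc):
    """Return (message, offending_sequence_or_None) for either argument form."""
    arg = exc.args[0] if len(exc.args) == 1 else exc.args
    if isinstance(arg, str):
        # Simple form: the argument is just the explanatory string.
        return arg, None
    # Tuple form: the failing sequence followed by the explanatory string.
    sequence, message = arg
    return message, sequence

print(explain_parser_error(ParserError("could not parse string")))
print(explain_parser_error(ParserError((1, 'x'), "illegal node")))
```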
Note that the functions \function{compileast()}, \function{expr()}, and
\function{suite()} may raise exceptions which are normally raised by the
parsing and compilation process. These include the built-in
exceptions \exception{MemoryError}, \exception{OverflowError},
\exception{SyntaxError}, and \exception{SystemError}. In these cases, these
exceptions carry all the meaning normally associated with them. Refer
to the descriptions of each function for detailed information.


\subsection{AST Objects \label{AST Objects}}

Ordered and equality comparisons are supported between AST objects.
Pickling of AST objects (using the \refmodule{pickle} module) is also
supported.

\begin{datadesc}{ASTType}
The type of the objects returned by \function{expr()},
\function{suite()} and \function{sequence2ast()}.
\end{datadesc}


AST objects have the following methods:


\begin{methoddesc}[AST]{compile}{\optional{filename}}
Same as \code{compileast(\var{ast}, \var{filename})}.
\end{methoddesc}

\begin{methoddesc}[AST]{isexpr}{}
Same as \code{isexpr(\var{ast})}.
\end{methoddesc}

\begin{methoddesc}[AST]{issuite}{}
Same as \code{issuite(\var{ast})}.
\end{methoddesc}

\begin{methoddesc}[AST]{tolist}{\optional{line_info}}
Same as \code{ast2list(\var{ast}, \var{line_info})}.
\end{methoddesc}

\begin{methoddesc}[AST]{totuple}{\optional{line_info}}
Same as \code{ast2tuple(\var{ast}, \var{line_info})}.
\end{methoddesc}


\subsection{Examples \label{AST Examples}}

The \module{parser} module allows operations to be performed on the parse tree
of Python source code before the bytecode is generated, and provides
for inspection of the parse tree for information gathering purposes.
Two examples are presented. The simple example demonstrates emulation
of the \function{compile()}\bifuncindex{compile} built-in function and
the complex example shows the use of a parse tree for information
discovery.

\subsubsection{Emulation of \function{compile()}}

While many useful operations may take place between parsing and
bytecode generation, the simplest operation is to do nothing. For
this purpose, using the \module{parser} module to produce an
intermediate data structure is equivalent to the code

\begin{verbatim}
>>> code = compile('a + 5', 'file.py', 'eval')
>>> a = 5
>>> eval(code)
10
\end{verbatim}

The equivalent operation using the \module{parser} module is somewhat
longer, and allows the intermediate internal parse tree to be retained
as an AST object:

\begin{verbatim}
>>> import parser
>>> ast = parser.expr('a + 5')
>>> code = ast.compile('file.py')
>>> a = 5
>>> eval(code)
10
\end{verbatim}

An application which needs both AST and code objects can package this
code into readily available functions:

\begin{verbatim}
import parser

def load_suite(source_string):
    ast = parser.suite(source_string)
    return ast, ast.compile()

def load_expression(source_string):
    ast = parser.expr(source_string)
    return ast, ast.compile()
\end{verbatim}

\subsubsection{Information Discovery}

Some applications benefit from direct access to the parse tree. The
remainder of this section demonstrates how the parse tree provides
access to module documentation defined in
docstrings\index{string!documentation}\index{docstrings} without
requiring that the code being examined be loaded into a running
interpreter via \keyword{import}. This can be very useful for
performing analyses of untrusted code.

Generally, the example will demonstrate how the parse tree may be
traversed to distill interesting information. Two functions and a set
of classes are developed which provide programmatic access to high
level function and class definitions provided by a module. The
classes extract information from the parse tree and provide access to
the information at a useful semantic level; one function provides a
simple low-level pattern matching capability, and the other function
defines a high-level interface to the classes by handling file
operations on behalf of the caller. All source files mentioned here
which are not part of the Python installation are located in the
\file{Demo/parser/} directory of the distribution.

The dynamic nature of Python allows the programmer a great deal of
flexibility, but most modules need only a limited measure of this when
defining classes, functions, and methods. In this example, the only
definitions that will be considered are those which are defined in the
top level of their context, e.g., a function defined by a \keyword{def}
statement at column zero of a module, but not a function defined
within a branch of an \keyword{if} ... \keyword{else} construct, though
there are some good reasons for doing so in some situations. Nesting
of definitions will be handled by the code developed in the example.

To construct the upper-level extraction methods, we need to know what
the parse tree structure looks like and how much of it we actually
need to be concerned about. Python uses a moderately deep parse tree
so there are a large number of intermediate nodes. It is important to
read and understand the formal grammar used by Python. This is
specified in the file \file{Grammar/Grammar} in the distribution.
Consider the simplest case of interest when searching for docstrings:
a module consisting of a docstring and nothing else. (See file
\file{docstring.py}.)

\begin{verbatim}
"""Some documentation.
"""
\end{verbatim}

Using the interpreter to take a look at the parse tree, we find a
bewildering mass of numbers and parentheses, with the documentation
buried deep in nested tuples.

\begin{verbatim}
>>> import parser
>>> import pprint
>>> ast = parser.suite(open('docstring.py').read())
>>> tup = ast.totuple()
>>> pprint.pprint(tup)
(257,
 (264,
  (265,
   (266,
    (267,
     (307,
      (287,
       (288,
        (289,
         (290,
          (292,
           (293,
            (294,
             (295,
              (296,
               (297,
                (298,
                 (299,
                  (300, (3, '"""Some documentation.\n"""'))))))))))))))))),
   (4, ''))),
 (4, ''),
 (0, ''))
\end{verbatim}

The numbers at the first element of each node in the tree are the node
types; they map directly to terminal and non-terminal symbols in the
grammar. Unfortunately, they are represented as integers in the
internal representation, and the Python structures generated do not
change that. However, the \refmodule{symbol} and \refmodule{token} modules
provide symbolic names for the node types and dictionaries which map
from the integers to the symbolic names for the node types.

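Such a mapping can be applied to a whole tree to make it easier to
read. In the sketch below, \code{name_nodes()} is a hypothetical
helper; token names come from \code{token.tok_name}, while the
non-terminal names are supplied by hand for the two values
\code{257} (\constant{file_input}) and \code{264} (\constant{stmt})
that appear in the sample output, standing in for the dictionary the
\refmodule{symbol} module would provide. The input tree is abbreviated
by hand and is not a valid parse tree.

```python
import token

# Stand-in for the symbol module's integer-to-name dictionary; only the
# two non-terminals named in this section are listed.
NONTERMINAL_NAMES = {257: 'file_input', 264: 'stmt'}

def name_nodes(tree, names):
    """Return a copy of a tuple tree with known node types replaced by names."""
    head = names.get(tree[0], tree[0])
    rest = tuple(name_nodes(item, names) if isinstance(item, tuple) else item
                 for item in tree[1:])
    return (head,) + rest

names = dict(token.tok_name)
names.update(NONTERMINAL_NAMES)

# Abbreviated tree: the long chain of intermediate nodes is left out.
small_tree = (257, (264, (4, '')), (4, ''), (0, ''))
print(name_nodes(small_tree, names))
```

Node types missing from the dictionary are left as integers, so the
helper remains usable when only part of the grammar is of interest.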
In the output presented above, the outermost tuple contains four
elements: the integer \code{257} and three additional tuples. Node
type \code{257} has the symbolic name \constant{file_input}. Each of
these inner tuples contains an integer as the first element; these
integers, \code{264}, \code{4}, and \code{0}, represent the node types
\constant{stmt}, \constant{NEWLINE}, and \constant{ENDMARKER},
respectively.
Note that these values may change depending on the version of Python
you are using; consult \file{symbol.py} and \file{token.py} for
details of the mapping. It should be fairly clear that the outermost
node is related primarily to the input source rather than the contents
of the file, and may be disregarded for the moment. The \constant{stmt}
node is much more interesting. In particular, all docstrings are
found in subtrees which are formed exactly as this node is formed,
with the only difference being the string itself. The association
between the docstring in a similar tree and the defined entity (class,
function, or module) which it describes is given by the position of
the docstring subtree within the tree defining the described
structure.

By replacing the actual docstring with something to signify a variable
component of the tree, we allow a simple pattern matching approach to
check any given subtree for equivalence to the general pattern for
docstrings. Since the example demonstrates information extraction, we
can safely require that the tree be in tuple form rather than list
form, allowing a simple variable representation to be
\code{['variable_name']}. A simple recursive function can implement
the pattern matching, returning a Boolean and a dictionary of variable
name to value mappings. (See file \file{example.py}.)

\begin{verbatim}
from types import ListType, TupleType

def match(pattern, data, vars=None):
    if vars is None:
        vars = {}
    if type(pattern) is ListType:
        vars[pattern[0]] = data
        return 1, vars
    if type(pattern) is not TupleType:
        return (pattern == data), vars
    if len(data) != len(pattern):
        return 0, vars
    for pattern, data in map(None, pattern, data):
        same, vars = match(pattern, data, vars)
        if not same:
            break
    return same, vars
\end{verbatim}

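The listing above relies on \code{types.ListType} and
\code{map(None, ...)}, which exist only in Python 2. For readers
experimenting on a later interpreter, the same matching idea can be
sketched as follows. This is an adaptation, not the code from
\file{example.py}: it returns \code{True}/\code{False} instead of
\code{1}/\code{0}, and, unlike the original, it reports a mismatch
rather than raising an error when \var{data} is not a tuple.

```python
def match(pattern, data, variables=None):
    """Match data against pattern; return (matched, variables)."""
    if variables is None:
        variables = {}
    if isinstance(pattern, list):
        # A one-element list such as ['name'] is a variable: bind it.
        variables[pattern[0]] = data
        return True, variables
    if not isinstance(pattern, tuple):
        # Plain values (node types, token text) must compare equal.
        return pattern == data, variables
    if not isinstance(data, tuple) or len(data) != len(pattern):
        return False, variables
    for sub_pattern, sub_data in zip(pattern, data):
        matched, variables = match(sub_pattern, sub_data, variables)
        if not matched:
            return False, variables
    return True, variables

# 3 is the STRING token type, as in the sample output shown earlier.
print(match((3, ['docstring']), (3, '"""Some documentation."""')))
print(match((3, ['docstring']), (4, '')))
```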
Using this simple representation for syntactic variables and the symbolic
node types, the pattern for the candidate docstring subtrees becomes
fairly readable. (See file \file{example.py}.)

\begin{verbatim}
import symbol
import token

DOCSTRING_STMT_PATTERN = (
    symbol.stmt,
    (symbol.simple_stmt,
     (symbol.small_stmt,
      (symbol.expr_stmt,
       (symbol.testlist,
        (symbol.test,
         (symbol.and_test,
          (symbol.not_test,
           (symbol.comparison,
            (symbol.expr,
             (symbol.xor_expr,
              (symbol.and_expr,
               (symbol.shift_expr,
                (symbol.arith_expr,
                 (symbol.term,
                  (symbol.factor,
                   (symbol.power,
                    (symbol.atom,
                     (token.STRING, ['docstring'])
                     )))))))))))))))),
     (token.NEWLINE, '')
     ))
\end{verbatim}

Using the \function{match()} function with this pattern, extracting the
module docstring from the parse tree created previously is easy:

\begin{verbatim}
>>> found, vars = match(DOCSTRING_STMT_PATTERN, tup[1])
>>> found
1
>>> vars
{'docstring': '"""Some documentation.\n"""'}
\end{verbatim}

Once specific data can be extracted from a location where it is
expected, the question of where information can be expected
needs to be answered. When dealing with docstrings, the answer is
fairly simple: the docstring is the first \constant{stmt} node in a code
block (\constant{file_input} or \constant{suite} node types). A module
consists of a single \constant{file_input} node, and class and function
definitions each contain exactly one \constant{suite} node. Classes and
functions are readily identified as subtrees of code block nodes which
start with \code{(stmt, (compound_stmt, (classdef, ...} or
\code{(stmt, (compound_stmt, (funcdef, ...}. Note that these subtrees
cannot be matched by \function{match()} since it does not support multiple
sibling nodes to match without regard to number. A more elaborate
matching function could be used to overcome this limitation, but this
is sufficient for the example.

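One way to recognize such subtrees without a full pattern is to check
only the leading node types and ignore however many children follow.
The sketch below is an illustration under stated assumptions:
\code{definition_node()} is a hypothetical helper, the trees are
abbreviated by hand, and the four node-type numbers are placeholders
chosen for the example; real values come from the \refmodule{symbol}
module and differ between Python versions.

```python
# Placeholder node-type numbers for illustration only; they are not the
# values of any particular Python version.
STMT, COMPOUND_STMT, FUNCDEF, CLASSDEF = 264, 900, 901, 902

def definition_node(stmt_node):
    """Return the funcdef or classdef subtree of a stmt node, or None."""
    if stmt_node[0] != STMT or len(stmt_node) != 2:
        return None
    compound = stmt_node[1]
    if compound[0] != COMPOUND_STMT or len(compound) != 2:
        return None
    definition = compound[1]
    if definition[0] in (FUNCDEF, CLASSDEF):
        # Any number of children may follow the node type here.
        return definition
    return None

# Abbreviated funcdef: the 'def' keyword and the name, nothing else.
func = (FUNCDEF, (1, 'def'), (1, 'square'))
found = definition_node((STMT, (COMPOUND_STMT, func)))
print(found[2][1])
```

The defined name is the text of the second child, which is the same
position the example code reads with \code{cstmt[2][1]}.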
Given the ability to determine whether a statement might be a
docstring and extract the actual string from the statement, some work
needs to be performed to walk the parse tree for an entire module and
extract information about the names defined in each context of the
module and associate any docstrings with the names. The code to
perform this work is not complicated, but bears some explanation.

The public interface to the classes is straightforward and should
probably be somewhat more flexible. Each ``major'' block of the
module is described by an object providing several methods for inquiry
and a constructor which accepts at least the subtree of the complete
parse tree which it represents. The \class{ModuleInfo} constructor
accepts an optional \var{name} parameter since it cannot
otherwise determine the name of the module.

The public classes include \class{ClassInfo}, \class{FunctionInfo},
and \class{ModuleInfo}. All objects provide the
methods \method{get_name()}, \method{get_docstring()},
\method{get_class_names()}, and \method{get_class_info()}. The
\class{ClassInfo} objects support \method{get_method_names()} and
\method{get_method_info()} while the other classes provide
\method{get_function_names()} and \method{get_function_info()}.

Within each of the forms of code block that the public classes
represent, most of the required information is in the same form and is
accessed in the same way, with classes having the distinction that
functions defined at the top level are referred to as ``methods.''
Since the difference in nomenclature reflects a real semantic
distinction from functions defined outside of a class, the
implementation needs to maintain the distinction.
Hence, most of the functionality of the public classes can be
implemented in a common base class, \class{SuiteInfoBase}, with the
accessors for function and method information provided elsewhere.
Note that there is only one class which represents function and method
information; this parallels the use of the \keyword{def} statement to
define both types of elements.

Most of the accessor functions are declared in \class{SuiteInfoBase}
and do not need to be overridden by subclasses. More importantly, the
extraction of most information from a parse tree is handled through a
method called by the \class{SuiteInfoBase} constructor. The example
code for most of the classes is clear when read alongside the formal
grammar, but the method which recursively creates new information
objects requires further examination. Here is the relevant part of
the \class{SuiteInfoBase} definition from \file{example.py}:

\begin{verbatim}
class SuiteInfoBase:
    _docstring = ''
    _name = ''

    def __init__(self, tree = None):
        self._class_info = {}
        self._function_info = {}
        if tree:
            self._extract_info(tree)

    def _extract_info(self, tree):
        # extract docstring
        if len(tree) == 2:
            found, vars = match(DOCSTRING_STMT_PATTERN[1], tree[1])
        else:
            found, vars = match(DOCSTRING_STMT_PATTERN, tree[3])
        if found:
            self._docstring = eval(vars['docstring'])
        # discover inner definitions
        for node in tree[1:]:
            found, vars = match(COMPOUND_STMT_PATTERN, node)
            if found:
                cstmt = vars['compound']
                if cstmt[0] == symbol.funcdef:
                    name = cstmt[2][1]
                    self._function_info[name] = FunctionInfo(cstmt)
                elif cstmt[0] == symbol.classdef:
                    name = cstmt[2][1]
                    self._class_info[name] = ClassInfo(cstmt)
\end{verbatim}

After initializing some internal state, the constructor calls the
\method{_extract_info()} method. This method performs the bulk of the
information extraction which takes place in the entire example. The
extraction has two distinct phases: the location of the docstring for
the parse tree passed in, and the discovery of additional definitions
within the code block represented by the parse tree.

The initial \keyword{if} test determines whether the nested suite is of
the ``short form'' or the ``long form.'' The short form is used when
the code block is on the same line as the definition of the code
block, as in

\begin{verbatim}
def square(x): "Square an argument."; return x ** 2
\end{verbatim}

while the long form uses an indented block and allows nested
definitions:

\begin{verbatim}
def make_power(exp):
    "Make a function that raises an argument to the exponent `exp'."
    def raiser(x, y=exp):
        return x ** y
    return raiser
\end{verbatim}

When the short form is used, the code block may contain a docstring as
the first, and possibly only, \constant{small_stmt} element. The
extraction of such a docstring is slightly different and requires only
a portion of the complete pattern used in the more common case. As
implemented, the docstring will only be found if there is only
one \constant{small_stmt} node in the \constant{simple_stmt} node.
Since most functions and methods which use the short form do not
provide a docstring, this may be considered sufficient. The
extraction of the docstring proceeds using the \function{match()} function
as described above, and the value of the docstring is stored as an
attribute of the \class{SuiteInfoBase} object.

After docstring extraction, a simple definition discovery
algorithm operates on the \constant{stmt} nodes of the
\constant{suite} node. The special case of the short form is not
tested; since there are no \constant{stmt} nodes in the short form,
the algorithm will silently skip the single \constant{simple_stmt}
node and correctly not discover any nested definitions.

Each statement in the code block is categorized as
a class definition, function or method definition, or
something else. For the definition statements, the name of the
element defined is extracted and a representation object
appropriate to the definition is created with the defining subtree
passed as an argument to the constructor. The representation objects
are stored in instance variables and may be retrieved by name using
the appropriate accessor methods.

The public classes provide any accessors required which are more
specific than those provided by the \class{SuiteInfoBase} class, but
the real extraction algorithm remains common to all forms of code
blocks. A high-level function can be used to extract the complete set
of information from a source file. (See file \file{example.py}.)

\begin{verbatim}
def get_docs(fileName):
    import os
    import parser

    source = open(fileName).read()
    basename = os.path.basename(os.path.splitext(fileName)[0])
    ast = parser.suite(source)
    return ModuleInfo(ast.totuple(), basename)
\end{verbatim}

This provides an easy-to-use interface to the documentation of a
module. If information is required which is not extracted by the code
of this example, the code may be extended at clearly defined points to
provide additional capabilities.
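On interpreters that do not ship the \module{parser} module (it was
removed in Python 3.10), the same goal of reading docstrings without
importing the code can be approached with the standard \module{ast}
module. The sketch below uses a different technique from the example
developed in this section: it works on abstract syntax trees rather
than on the concrete parse trees described here, and
\code{get_docstrings()} is a hypothetical name. Like the example, it
considers only definitions at the top level of the module.

```python
import ast

def get_docstrings(source):
    """Map the module and its top-level definitions to their docstrings."""
    module = ast.parse(source)
    docs = {'<module>': ast.get_docstring(module)}
    for node in module.body:
        # Only top-level def and class statements are considered.
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            docs[node.name] = ast.get_docstring(node)
    return docs

sample = (
    '"""Some documentation."""\n'
    'def square(x):\n'
    '    "Square an argument."\n'
    '    return x ** 2\n'
)
print(get_docstrings(sample))
```

The source is parsed but never executed, so the approach keeps the
property that motivated the example: the code under examination is not
loaded into the running interpreter.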
