% Source: vendor/python/2.5/Doc/tut/tut.tex (Python 2.5); last change 3225, checked in by bird.
\documentclass{manual}
\usepackage[T1]{fontenc}
\usepackage{textcomp}

% Things to do:
% Should really move the Python startup file info to an appendix

\title{Python Tutorial}

\input{boilerplate}

\makeindex

\begin{document}

\maketitle

\ifhtml
\chapter*{Front Matter\label{front}}
\fi

\input{copyright}

\begin{abstract}

\noindent
Python is an easy to learn, powerful programming language. It has
efficient high-level data structures and a simple but effective
approach to object-oriented programming. Python's elegant syntax and
dynamic typing, together with its interpreted nature, make it an ideal
language for scripting and rapid application development in many areas
on most platforms.

The Python interpreter and the extensive standard library are freely
available in source or binary form for all major platforms from the
Python Web site, \url{http://www.python.org/}, and may be freely
distributed. The same site also contains distributions of and
pointers to many free third-party Python modules, programs and tools,
and additional documentation.

The Python interpreter is easily extended with new functions and data
types implemented in C or \Cpp{} (or other languages callable from C).
Python is also suitable as an extension language for customizable
applications.

This tutorial introduces the reader informally to the basic concepts
and features of the Python language and system. It helps to have a
Python interpreter handy for hands-on experience, but all examples are
self-contained, so the tutorial can be read off-line as well.

For a description of standard objects and modules, see the
\citetitle[../lib/lib.html]{Python Library Reference} document. The
\citetitle[../ref/ref.html]{Python Reference Manual} gives a more
formal definition of the language. To write extensions in C or
\Cpp, read \citetitle[../ext/ext.html]{Extending and Embedding the
Python Interpreter} and \citetitle[../api/api.html]{Python/C API
Reference}. There are also several books covering Python in depth.

This tutorial does not attempt to be comprehensive and cover every
single feature, or even every commonly used feature. Instead, it
introduces many of Python's most noteworthy features, and will give
you a good idea of the language's flavor and style. After reading it,
you will be able to read and write Python modules and programs, and
you will be ready to learn more about the various Python library
modules described in the \citetitle[../lib/lib.html]{Python Library
Reference}.

\end{abstract}

\tableofcontents


\chapter{Whetting Your Appetite \label{intro}}

If you do much work on computers, eventually you find that there's
some task you'd like to automate. For example, you may wish to
perform a search-and-replace over a large number of text files, or
rename and rearrange a bunch of photo files in a complicated way.
Perhaps you'd like to write a small custom database, or a specialized
GUI application, or a simple game.

If you're a professional software developer, you may have to work with
several C/\Cpp/Java libraries but find the usual
write/compile/test/re-compile cycle is too slow. Perhaps you're
writing a test suite for such a library and find writing the testing
code a tedious task. Or maybe you've written a program that could use
an extension language, and you don't want to design and implement a
whole new language for your application.

Python is just the language for you.

You could write a {\UNIX} shell script or Windows batch files for some
of these tasks, but shell scripts are best at moving around files and
changing text data, and are not well-suited for GUI applications or games.
You could write a C/{\Cpp}/Java program, but it can take a lot of
development time to get even a first-draft program. Python is simpler
to use, available on Windows, Mac OS X, and {\UNIX} operating systems,
and will help you get the job done more quickly.

Python is simple to use, but it is a real programming language,
offering much more structure and support for large programs than shell
scripts or batch files can offer. On the other hand, Python also
offers much more error checking than C, and, being a
\emph{very-high-level language}, it has high-level data types built
in, such as flexible arrays and dictionaries. Because of its more
general data types Python is applicable to a much larger problem
domain than Awk or even Perl, yet many things are at
least as easy in Python as in those languages.

Python allows you to split your program into modules that can be
reused in other Python programs. It comes with a large collection of
standard modules that you can use as the basis of your programs --- or
as examples to start learning to program in Python. Some of these
modules provide things like file I/O, system calls,
sockets, and even interfaces to graphical user interface toolkits like Tk.

Python is an interpreted language, which can save you considerable time
during program development because no compilation and linking is
necessary. The interpreter can be used interactively, which makes it
easy to experiment with features of the language, to write throw-away
programs, or to test functions during bottom-up program development.
It is also a handy desk calculator.

Python enables programs to be written compactly and readably. Programs
written in Python are typically much shorter than equivalent C,
\Cpp{}, or Java programs, for several reasons:
\begin{itemize}
\item
the high-level data types allow you to express complex operations in a
single statement;
\item
statement grouping is done by indentation instead of beginning and ending
brackets;
\item
no variable or argument declarations are necessary.
\end{itemize}
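
As a rough illustration of these points, the sketch below counts word
frequencies using nothing but a built-in dictionary. The function name
\function{count_words()} and the sample sentence are invented for this
example and do not appear elsewhere in the tutorial.

```python
def count_words(text):
    """Return a dictionary mapping each word in text to its frequency."""
    counts = {}
    for word in text.split():
        # dict.get() supplies 0 the first time a word is seen
        counts[word] = counts.get(word, 0) + 1
    return counts


counts = count_words("spam eggs spam spam")
```

Neither \code{counts} nor \code{word} is declared anywhere, and the loop
body is delimited by indentation alone.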

Python is \emph{extensible}: if you know how to program in C it is easy
to add a new built-in function or module to the interpreter, either to
perform critical operations at maximum speed, or to link Python
programs to libraries that may only be available in binary form (such
as a vendor-specific graphics library). Once you are really hooked,
you can link the Python interpreter into an application written in C
and use it as an extension or command language for that application.

By the way, the language is named after the BBC show ``Monty Python's
Flying Circus'' and has nothing to do with nasty reptiles. Making
references to Monty Python skits in documentation is not only allowed,
it is encouraged!

%\section{Where From Here \label{where}}

Now that you are all excited about Python, you'll want to examine it
in some more detail. Since the best way to learn a language is
to use it, the tutorial invites you to play with the Python interpreter
as you read.

In the next chapter, the mechanics of using the interpreter are
explained. This is rather mundane information, but essential for
trying out the examples shown later.

The rest of the tutorial introduces various features of the Python
language and system through examples, beginning with simple
expressions, statements and data types, through functions and modules,
and finally touching upon advanced concepts like exceptions
and user-defined classes.

\chapter{Using the Python Interpreter \label{using}}

\section{Invoking the Interpreter \label{invoking}}

The Python interpreter is usually installed as
\file{/usr/local/bin/python} on those machines where it is available;
putting \file{/usr/local/bin} in your \UNIX{} shell's search path
makes it possible to start it by typing the command

\begin{verbatim}
python
\end{verbatim}

to the shell. Since the choice of the directory where the interpreter
lives is an installation option, other places are possible; check with
your local Python guru or system administrator. (E.g.,
\file{/usr/local/python} is a popular alternative location.)

On Windows machines, the Python installation is usually placed in
\file{C:\e Python25}, though you can change this when you're running
the installer. To add this directory to your path,
you can type the following command at the command prompt in a DOS box:

\begin{verbatim}
set path=%path%;C:\Python25
\end{verbatim}


Typing an end-of-file character (\kbd{Control-D} on \UNIX,
\kbd{Control-Z} on Windows) at the primary prompt causes the
interpreter to exit with a zero exit status. If that doesn't work,
you can exit the interpreter by typing the following commands:
\samp{import sys; sys.exit()}.

The interpreter's line-editing features usually aren't very
sophisticated. On \UNIX, whoever installed the interpreter may have
enabled support for the GNU readline library, which adds more
elaborate interactive editing and history features. Perhaps the
quickest check to see whether command line editing is supported is
typing Control-P to the first Python prompt you get. If it beeps, you
have command line editing; see Appendix \ref{interacting} for an
introduction to the keys. If nothing appears to happen, or if
\code{\^P} is echoed, command line editing isn't available; you'll
only be able to use backspace to remove characters from the current
line.

The interpreter operates somewhat like the \UNIX{} shell: when called
with standard input connected to a tty device, it reads and executes
commands interactively; when called with a file name argument or with
a file as standard input, it reads and executes a \emph{script} from
that file.

A second way of starting the interpreter is
\samp{\program{python} \programopt{-c} \var{command} [arg] ...}, which
executes the statement(s) in \var{command}, analogous to the shell's
\programopt{-c} option. Since Python statements often contain spaces
or other characters that are special to the shell, it is best to quote
\var{command} in its entirety with double quotes.

Some Python modules are also useful as scripts. These can be invoked using
\samp{\program{python} \programopt{-m} \var{module} [arg] ...}, which
executes the source file for \var{module} as if you had spelled out its
full name on the command line.

Note that there is a difference between \samp{python file} and
\samp{python <file}. In the latter case, input requests from the
program, such as calls to \function{input()} and \function{raw_input()}, are
satisfied from \emph{file}. Since this file has already been read
until the end by the parser before the program starts executing, the
program will encounter end-of-file immediately. In the former case
(which is usually what you want) they are satisfied from whatever file
or device is connected to standard input of the Python interpreter.

When a script file is used, it is sometimes useful to be able to run
the script and enter interactive mode afterwards. This can be done by
passing \programopt{-i} before the script. (This does not work if the
script is read from standard input, for the same reason as explained
in the previous paragraph.)

\subsection{Argument Passing \label{argPassing}}

When known to the interpreter, the script name and additional
arguments thereafter are passed to the script in the variable
\code{sys.argv}, which is a list of strings. Its length is at least
one; when no script and no arguments are given, \code{sys.argv[0]} is
an empty string. When the script name is given as \code{'-'} (meaning
standard input), \code{sys.argv[0]} is set to \code{'-'}. When
\programopt{-c} \var{command} is used, \code{sys.argv[0]} is set to
\code{'-c'}. When \programopt{-m} \var{module} is used, \code{sys.argv[0]}
is set to the full name of the located module. Options found after
\programopt{-c} \var{command} or \programopt{-m} \var{module} are not consumed
by the Python interpreter's option processing but left in \code{sys.argv} for
the command or module to handle.
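
The \programopt{-c} case can be observed with a small experiment. The
sketch below is only an illustration: it assumes that
\code{sys.executable} names a working interpreter and that the standard
\module{subprocess} module may start a child process. The arguments
\code{one} and \code{-x} are arbitrary placeholders.

```python
import subprocess
import sys

# Child program: write its own sys.argv, joined with '|', to standard output.
child_code = "import sys; sys.stdout.write('|'.join(sys.argv))"

proc = subprocess.Popen(
    [sys.executable, "-c", child_code, "one", "-x"],
    stdout=subprocess.PIPE,
    universal_newlines=True,  # return text rather than bytes
)
output, _ = proc.communicate()
child_argv = output.split("|")
```

The child sees \code{'-c'} as \code{sys.argv[0]}, and the trailing
\code{-x} reaches it untouched instead of being treated as an interpreter
option.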

\subsection{Interactive Mode \label{interactive}}

When commands are read from a tty, the interpreter is said to be in
\emph{interactive mode}. In this mode it prompts for the next command
with the \emph{primary prompt}, usually three greater-than signs
(\samp{>>>~}); for continuation lines it prompts with the
\emph{secondary prompt}, by default three dots (\samp{...~}).
The interpreter prints a welcome message stating its version number
and a copyright notice before printing the first prompt:

\begin{verbatim}
python
Python 1.5.2b2 (#1, Feb 28 1999, 00:02:06) [GCC 2.8.1] on sunos5
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>>
\end{verbatim}

Continuation lines are needed when entering a multi-line construct.
As an example, take a look at this \keyword{if} statement:

\begin{verbatim}
>>> the_world_is_flat = 1
>>> if the_world_is_flat:
...     print "Be careful not to fall off!"
...
Be careful not to fall off!
\end{verbatim}


\section{The Interpreter and Its Environment \label{interp}}

\subsection{Error Handling \label{error}}

When an error occurs, the interpreter prints an error
message and a stack trace. In interactive mode, it then returns to
the primary prompt; when input came from a file, it exits with a
nonzero exit status after printing
the stack trace. (Exceptions handled by an \keyword{except} clause in a
\keyword{try} statement are not errors in this context.) Some errors are
unconditionally fatal and cause an exit with a nonzero exit status; this
applies to internal inconsistencies and some cases of running out of
memory. All error messages are written to the standard error stream;
normal output from executed commands is written to standard
output.

Typing the interrupt character (usually Control-C or DEL) to the
primary or secondary prompt cancels the input and returns to the
primary prompt.\footnote{
 A problem with the GNU Readline package may prevent this.
}
Typing an interrupt while a command is executing raises the
\exception{KeyboardInterrupt} exception, which may be handled by a
\keyword{try} statement.
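
A minimal sketch of such a handler follows. The names
\function{run_steps()} and \function{fake_step()} are hypothetical, and
the interrupt is simulated by raising the exception directly so that the
example can be run without anyone pressing a key.

```python
def run_steps(step, max_steps):
    """Call step(i) up to max_steps times; stop quietly on an interrupt."""
    completed = 0
    try:
        for i in range(max_steps):
            step(i)
            completed += 1
    except KeyboardInterrupt:
        # Reached when the interrupt character is typed during the run
        pass
    return completed


def fake_step(i):
    # Stand-in for real work: simulate the user interrupting step 3
    if i == 3:
        raise KeyboardInterrupt


completed = run_steps(fake_step, 10)
```

Because the exception is caught, the program can report how far it got
rather than ending with a stack trace.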

\subsection{Executable Python Scripts \label{scripts}}

On BSD'ish \UNIX{} systems, Python scripts can be made directly
executable, like shell scripts, by putting the line

\begin{verbatim}
#! /usr/bin/env python
\end{verbatim}

(assuming that the interpreter is on the user's \envvar{PATH}) at the
beginning of the script and giving the file an executable mode. The
\samp{\#!} must be the first two characters of the file. On some
platforms, this first line must end with a \UNIX-style line ending
(\character{\e n}), not a Mac OS (\character{\e r}) or Windows
(\character{\e r\e n}) line ending. Note that
the hash, or pound, character, \character{\#}, is used to start a
comment in Python.

The script can be given an executable mode, or permission, using the
\program{chmod} command:

\begin{verbatim}
$ chmod +x myscript.py
\end{verbatim} % $ <-- bow to font-lock


\subsection{Source Code Encoding}

It is possible to use encodings other than \ASCII{} in Python source
files. The best way to do it is to put one more special comment line
right after the \code{\#!} line to define the source file encoding:

\begin{alltt}
# -*- coding: \var{encoding} -*-
\end{alltt}

With that declaration, all characters in the source file will be treated as
having the encoding \var{encoding}, and it will be
possible to directly write Unicode string literals in the selected
encoding. The list of possible encodings can be found in the
\citetitle[../lib/lib.html]{Python Library Reference}, in the section
on \ulink{\module{codecs}}{../lib/module-codecs.html}.

For example, to write Unicode literals including the Euro currency
symbol, the ISO-8859-15 encoding can be used, with the Euro symbol
having the ordinal value 164. This script will print the value 8364
(the Unicode codepoint corresponding to the Euro symbol) and then
exit:

\begin{alltt}
# -*- coding: iso-8859-15 -*-

currency = u"\texteuro"
print ord(currency)
\end{alltt}
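
The two ordinals mentioned above can be checked without saving a file in
any particular encoding. This sketch spells the Euro sign with a Unicode
escape and converts it explicitly with the \method{encode()} and
\method{decode()} string methods; the variable names are arbitrary.

```python
euro = u"\u20ac"                       # the Euro sign as a Unicode string
encoded = euro.encode("iso-8859-15")   # a single byte in this encoding
byte_value = ord(encoded[0:1])         # ordinal of that byte: 164
code_point = ord(encoded.decode("iso-8859-15"))  # Unicode ordinal: 8364
```

The encoding declaration makes the interpreter perform the same decoding
step on every Unicode literal in the source file.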

If your editor supports saving files as \code{UTF-8} with a UTF-8
\emph{byte order mark} (aka BOM), you can use that instead of an
encoding declaration. IDLE supports this capability if
\code{Options/General/Default Source Encoding/UTF-8} is set. Notice
that this signature is not understood in older Python releases (2.2
and earlier), and also not understood by the operating system for
script files with \code{\#!} lines (only used on \UNIX{} systems).

By using UTF-8 (either through the signature or an encoding
declaration), characters of most languages in the world can be used
simultaneously in string literals and comments. Using non-\ASCII{}
characters in identifiers is not supported. To display all these
characters properly, your editor must recognize that the file is
UTF-8, and it must use a font that supports all the characters in the
file.

\subsection{The Interactive Startup File \label{startup}}

% XXX This should probably be dumped in an appendix, since most people
% don't use Python interactively in non-trivial ways.

When you use Python interactively, it is frequently handy to have some
standard commands executed every time the interpreter is started. You
can do this by setting an environment variable named
\envvar{PYTHONSTARTUP} to the name of a file containing your start-up
commands. This is similar to the \file{.profile} feature of the
\UNIX{} shells.

This file is only read in interactive sessions, not when Python reads
commands from a script, and not when \file{/dev/tty} is given as the
explicit source of commands (which otherwise behaves like an
interactive session). It is executed in the same namespace where
interactive commands are executed, so that objects that it defines or
imports can be used without qualification in the interactive session.
You can also change the prompts \code{sys.ps1} and \code{sys.ps2} in
this file.
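
A start-up file might look like the following sketch. The prompt strings
and the choice of imported module are arbitrary examples, not
recommendations from this tutorial.

```python
# Hypothetical contents of the file named by PYTHONSTARTUP
import os
import sys

sys.ps1 = "py> "    # primary prompt
sys.ps2 = "... "    # secondary prompt, used for continuation lines

# Because os is imported here, the interactive session can use it
# without a further import statement.
```

With this file in place, each interactive session would start with
\code{os} and \code{sys} already imported and with the customized prompts.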

If you want to read an additional start-up file from the current
directory, you can program this in the global start-up file using code
like \samp{if os.path.isfile('.pythonrc.py'):
execfile('.pythonrc.py')}. If you want to use the startup file in a
script, you must do this explicitly in the script:

\begin{verbatim}
import os
filename = os.environ.get('PYTHONSTARTUP')
if filename and os.path.isfile(filename):
    execfile(filename)
\end{verbatim}


\chapter{An Informal Introduction to Python \label{informal}}

In the following examples, input and output are distinguished by the
presence or absence of prompts (\samp{>>>~} and \samp{...~}): to repeat
the example, you must type everything after the prompt, when the
prompt appears; lines that do not begin with a prompt are output from
the interpreter. %
%\footnote{
% I'd prefer to use different fonts to distinguish input
% from output, but the amount of LaTeX hacking that would require
% is currently beyond my ability.
%}
Note that a secondary prompt on a line by itself in an example means
you must type a blank line; this is used to end a multi-line command.

Many of the examples in this manual, even those entered at the
interactive prompt, include comments. Comments in Python start with
the hash character, \character{\#}, and extend to the end of the
physical line. A comment may appear at the start of a line or
following whitespace or code, but not within a string literal. A hash
character within a string literal is just a hash character.

Some examples:

\begin{verbatim}
# this is the first comment
SPAM = 1                 # and this is the second comment
                         # ... and now a third!
STRING = "# This is not a comment."
\end{verbatim}


\section{Using Python as a Calculator \label{calculator}}

Let's try some simple Python commands. Start the interpreter and wait
for the primary prompt, \samp{>>>~}. (It shouldn't take long.)

\subsection{Numbers \label{numbers}}

The interpreter acts as a simple calculator: you can type an
expression at it and it will write the value. Expression syntax is
straightforward: the operators \code{+}, \code{-}, \code{*} and
\code{/} work just like in most other languages (for example, Pascal
or C); parentheses can be used for grouping. For example:

\begin{verbatim}
>>> 2+2
4
>>> # This is a comment
... 2+2
4
>>> 2+2  # and a comment on the same line as code
4
>>> (50-5*6)/4
5
>>> # Integer division returns the floor:
... 7/3
2
>>> 7/-3
-3
\end{verbatim}
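
The last two results follow from rounding toward negative infinity rather
than toward zero. The sketch below makes this explicit with the \code{//}
operator and the built-in \function{divmod()}. Neither has been introduced
in this tutorial so far; they are used here as an assumption of the
example because \code{//} always performs floor division.

```python
# Floor division rounds toward negative infinity, not toward zero.
q1 = 7 // 3      # 2
q2 = 7 // -3     # -3, because -2.33... is floored to -3

# divmod() returns quotient and remainder together; the remainder takes
# the sign of the divisor, so that q * d + r == n holds.
q, r = divmod(7, -3)
check = q * -3 + r
```

Here \code{q} is -3 and \code{r} is -2, and \code{check} recovers the
original dividend 7.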

The equal sign (\character{=}) is used to assign a value to a variable.
Afterwards, no result is displayed before the next interactive prompt:

\begin{verbatim}
>>> width = 20
>>> height = 5*9
>>> width * height
900
\end{verbatim}

A value can be assigned to several variables simultaneously:

\begin{verbatim}
>>> x = y = z = 0  # Zero x, y and z
>>> x
0
>>> y
0
>>> z
0
\end{verbatim}

There is full support for floating point; operators with mixed type
operands convert the integer operand to floating point:

\begin{verbatim}
>>> 3 * 3.75 / 1.5
7.5
>>> 7.0 / 2
3.5
\end{verbatim}

Complex numbers are also supported; imaginary numbers are written with
a suffix of \samp{j} or \samp{J}. Complex numbers with a nonzero
real component are written as \samp{(\var{real}+\var{imag}j)}, or can
be created with the \samp{complex(\var{real}, \var{imag})} function.

\begin{verbatim}
>>> 1j * 1J
(-1+0j)
>>> 1j * complex(0,1)
(-1+0j)
>>> 3+1j*3
(3+3j)
>>> (3+1j)*3
(9+3j)
>>> (1+2j)/(1+1j)
(1.5+0.5j)
\end{verbatim}

Complex numbers are always represented as two floating point numbers,
the real and imaginary part. To extract these parts from a complex
number \var{z}, use \code{\var{z}.real} and \code{\var{z}.imag}.

\begin{verbatim}
>>> a=1.5+0.5j
>>> a.real
1.5
>>> a.imag
0.5
\end{verbatim}

The conversion functions to floating point and integer
(\function{float()}, \function{int()} and \function{long()}) don't
work for complex numbers --- there is no one correct way to convert a
complex number to a real number. Use \code{abs(\var{z})} to get its
magnitude (as a float) or \code{z.real} to get its real part.

\begin{verbatim}
>>> a=3.0+4.0j
>>> float(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: can't convert complex to float; use abs(z)
>>> a.real
3.0
>>> a.imag
4.0
>>> abs(a)  # sqrt(a.real**2 + a.imag**2)
5.0
>>>
\end{verbatim}
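
One way to see why no single conversion exists is to split a complex
number into a magnitude and an angle, both of which are ordinary floats.
The sketch below uses \function{hypot()}, \function{atan2()},
\function{cos()} and \function{sin()} from the standard \module{math}
module; the variable names are arbitrary.

```python
import math

z = 3.0 + 4.0j
magnitude = abs(z)                            # length of the vector (3.0, 4.0)
same_magnitude = math.hypot(z.real, z.imag)   # the same value, computed by hand
angle = math.atan2(z.imag, z.real)            # phase angle in radians

# Rebuild the number from its polar form to check the decomposition.
rebuilt = complex(magnitude * math.cos(angle), magnitude * math.sin(angle))
```

The magnitude alone discards the angle, and the real part alone discards
the imaginary part, so either choice loses information.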

In interactive mode, the last printed expression is assigned to the
variable \code{_}. This means that when you are using Python as a
desk calculator, it is somewhat easier to continue calculations, for
example:

\begin{verbatim}
>>> tax = 12.5 / 100
>>> price = 100.50
>>> price * tax
12.5625
>>> price + _
113.0625
>>> round(_, 2)
113.06
>>>
\end{verbatim}

This variable should be treated as read-only by the user. Don't
explicitly assign a value to it --- you would create an independent
local variable with the same name masking the built-in variable with
its magic behavior.

\subsection{Strings \label{strings}}

Besides numbers, Python can also manipulate strings, which can be
expressed in several ways. They can be enclosed in single quotes or
double quotes:

\begin{verbatim}
>>> 'spam eggs'
'spam eggs'
>>> 'doesn\'t'
"doesn't"
>>> "doesn't"
"doesn't"
>>> '"Yes," he said.'
'"Yes," he said.'
>>> "\"Yes,\" he said."
'"Yes," he said.'
>>> '"Isn\'t," she said.'
'"Isn\'t," she said.'
\end{verbatim}

String literals can span multiple lines in several ways. Continuation
lines can be used, with a backslash as the last character on the line
indicating that the next line is a logical continuation of the line:

\begin{verbatim}
hello = "This is a rather long string containing\n\
several lines of text just as you would do in C.\n\
    Note that whitespace at the beginning of the line is\
 significant."

print hello
\end{verbatim}

Note that newlines still need to be embedded in the string using
\code{\e n}; the newline following the trailing backslash is
discarded. This example would print the following:

\begin{verbatim}
This is a rather long string containing
several lines of text just as you would do in C.
    Note that whitespace at the beginning of the line is significant.
\end{verbatim}

If we make the string literal a ``raw'' string, however, the
\code{\e n} sequences are not converted to newlines, but the backslash
at the end of the line, and the newline character in the source, are
both included in the string as data. Thus, the example:

\begin{verbatim}
hello = r"This is a rather long string containing\n\
several lines of text much as you would do in C."

print hello
\end{verbatim}

would print:

\begin{verbatim}
This is a rather long string containing\n\
several lines of text much as you would do in C.
\end{verbatim}
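
The difference is easy to measure with \function{len()}. The two short
literals in this sketch were chosen only for illustration.

```python
cooked = "a\nb"      # \n is converted to a single newline character
raw = r"a\nb"        # the backslash and the 'n' are kept as two characters

cooked_length = len(cooked)    # 3
raw_length = len(raw)          # 4
same_as_escaped = (raw == "a\\nb")
```

A raw string is therefore equal to the ordinary literal in which every
backslash has been doubled.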

Or, strings can be surrounded in a pair of matching triple-quotes:
\code{"""} or \code{'\code{'}'}. End of lines do not need to be escaped
when using triple-quotes, but they will be included in the string.

\begin{verbatim}
print """
Usage: thingy [OPTIONS]
     -h                        Display this usage message
     -H hostname               Hostname to connect to
"""
\end{verbatim}

produces the following output:

\begin{verbatim}
Usage: thingy [OPTIONS]
     -h                        Display this usage message
     -H hostname               Hostname to connect to
\end{verbatim}

The interpreter prints the result of string operations in the same way
as they are typed for input: inside quotes, and with quotes and other
funny characters escaped by backslashes, to show the precise
value. The string is enclosed in double quotes if the string contains
a single quote and no double quotes, else it's enclosed in single
quotes. (The \keyword{print} statement, described later, can be used
to write strings without quotes or escapes.)
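
The quoting rule can be tried outside the interactive prompt with the
built-in \function{repr()}, which returns the same quoted form as a
string. The three sample strings below repeat earlier examples from this
section.

```python
examples = ["doesn't", '"Yes," he said.', '"Isn\'t," she said.']

# repr() returns the quoted form that the interactive prompt echoes.
quoted = [repr(s) for s in examples]

# The first character of each result shows which quote was chosen.
chosen_quotes = [q[0] for q in quoted]
```

Only the first string, which contains a single quote and no double
quotes, is wrapped in double quotes.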

Strings can be concatenated (glued together) with the
\code{+} operator, and repeated with \code{*}:

\begin{verbatim}
>>> word = 'Help' + 'A'
>>> word
'HelpA'
>>> '<' + word*5 + '>'
'<HelpAHelpAHelpAHelpAHelpA>'
\end{verbatim}

Two string literals next to each other are automatically concatenated;
the first line above could also have been written \samp{word = 'Help'
'A'}; this only works with two literals, not with arbitrary string
expressions:

\begin{verbatim}
>>> 'str' 'ing'             # <- This is ok
'string'
>>> 'str'.strip() + 'ing'   # <- This is ok
'string'
>>> 'str'.strip() 'ing'     # <- This is invalid
  File "<stdin>", line 1, in ?
    'str'.strip() 'ing'
                      ^
SyntaxError: invalid syntax
\end{verbatim}

Strings can be subscripted (indexed); like in C, the first character
of a string has subscript (index) 0. There is no separate character
type; a character is simply a string of size one. Like in Icon,
substrings can be specified with the \emph{slice notation}: two indices
separated by a colon.

\begin{verbatim}
>>> word[4]
'A'
>>> word[0:2]
'He'
>>> word[2:4]
'lp'
\end{verbatim}

Slice indices have useful defaults; an omitted first index defaults to
zero, an omitted second index defaults to the size of the string being
sliced.

\begin{verbatim}
>>> word[:2]    # The first two characters
'He'
>>> word[2:]    # Everything except the first two characters
'lpA'
\end{verbatim}

Unlike C strings, Python strings cannot be changed. Assigning to an
indexed position in the string results in an error:

\begin{verbatim}
>>> word[0] = 'x'
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: object doesn't support item assignment
>>> word[:1] = 'Splat'
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: object doesn't support slice assignment
\end{verbatim}

However, creating a new string with the combined content is easy and
efficient:

\begin{verbatim}
>>> 'x' + word[1:]
'xelpA'
>>> 'Splat' + word[4]
'SplatA'
\end{verbatim}
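
The same idea can be wrapped in a small helper. The function
\function{replace_char()} below is hypothetical and assumes a
non-negative index that lies inside the string.

```python
def replace_char(s, index, new_char):
    """Return a copy of s with the character at index replaced."""
    return s[:index] + new_char + s[index + 1:]


word = "HelpA"
changed = replace_char(word, 0, "x")

# Direct assignment is rejected; the original string is left untouched.
try:
    word[0] = "x"
    assignment_failed = False
except TypeError:
    assignment_failed = True
```

After this runs, \code{changed} holds the new string while \code{word}
still holds the original one.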

Here's a useful invariant of slice operations:
\code{s[:i] + s[i:]} equals \code{s}.

\begin{verbatim}
>>> word[:2] + word[2:]
'HelpA'
>>> word[:3] + word[3:]
'HelpA'
\end{verbatim}
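
The invariant is not limited to split points inside the string. As a
quick check, the following sketch tries every integer from -10 to 10 on
the sample string; the range is an arbitrary choice.

```python
word = "HelpA"

# Check s[:i] + s[i:] == s for every split point, including negative
# values and values beyond the end of the string.
holds = True
for i in range(-10, 11):
    if word[:i] + word[i:] != word:
        holds = False
```

The loop leaves \code{holds} true, since whatever the first slice omits
is always supplied by the second.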

Degenerate slice indices are handled gracefully: an index that is too
large is replaced by the string size, an upper bound smaller than the
lower bound returns an empty string.

\begin{verbatim}
>>> word[1:100]
'elpA'
>>> word[10:]
''
>>> word[2:1]
''
\end{verbatim}

Indices may be negative numbers, to start counting from the right.
For example:

\begin{verbatim}
>>> word[-1]     # The last character
'A'
>>> word[-2]     # The last-but-one character
'p'
>>> word[-2:]    # The last two characters
'pA'
>>> word[:-2]    # Everything except the last two characters
'Hel'
\end{verbatim}

But note that -0 is really the same as 0, so it does not count from
the right!

\begin{verbatim}
>>> word[-0]     # (since -0 equals 0)
'H'
\end{verbatim}

Out-of-range negative slice indices are truncated, but don't try this
for single-element (non-slice) indices:

\begin{verbatim}
>>> word[-100:]
'HelpA'
>>> word[-10]    # error
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
IndexError: string index out of range
\end{verbatim}

The best way to remember how slices work is to think of the indices as
pointing \emph{between} characters, with the left edge of the first
character numbered 0. Then the right edge of the last character of a
string of \var{n} characters has index \var{n}, for example:

\begin{verbatim}
 +---+---+---+---+---+
 | H | e | l | p | A |
 +---+---+---+---+---+
 0   1   2   3   4   5
-5  -4  -3  -2  -1
\end{verbatim}

The first row of numbers gives the position of the indices 0...5 in
the string; the second row gives the corresponding negative indices.
The slice from \var{i} to \var{j} consists of all characters between
the edges labeled \var{i} and \var{j}, respectively.

For non-negative indices, the length of a slice is the difference of
the indices, if both are within bounds. For example, the length of
\code{word[1:3]} is 2.
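
Combining this rule with the handling of degenerate indices gives a
formula for the length of any slice with non-negative indices. The helper
\function{slice_length()} below is a hypothetical sketch of that formula,
compared against real slices of the sample string.

```python
def slice_length(n, i, j):
    """Length of s[i:j] for a string of n characters, with i, j >= 0."""
    # Indices past the end are clamped to n; an upper bound below the
    # lower bound yields an empty slice.
    start = min(i, n)
    stop = min(j, n)
    return max(0, stop - start)


word = "HelpA"
mismatches = []
for i in range(0, 8):
    for j in range(0, 8):
        if slice_length(len(word), i, j) != len(word[i:j]):
            mismatches.append((i, j))
```

The list \code{mismatches} stays empty, so the formula agrees with the
interpreter for every pair of indices tried.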

The built-in function \function{len()} returns the length of a string:

\begin{verbatim}
>>> s = 'supercalifragilisticexpialidocious'
>>> len(s)
34
\end{verbatim}


\begin{seealso}
  \seetitle[../lib/typesseq.html]{Sequence Types}%
    {Strings, and the Unicode strings described in the next
     section, are examples of \emph{sequence types}, and
     support the common operations supported by such types.}
  \seetitle[../lib/string-methods.html]{String Methods}%
    {Both strings and Unicode strings support a large number of
     methods for basic transformations and searching.}
  \seetitle[../lib/typesseq-strings.html]{String Formatting Operations}%
    {The formatting operations invoked when strings and Unicode
     strings are the left operand of the \code{\%} operator are
     described in more detail here.}
\end{seealso}
860
861
862\subsection{Unicode Strings \label{unicodeStrings}}
863\sectionauthor{Marc-Andre Lemburg}{mal@lemburg.com}
864
865Starting with Python 2.0 a new data type for storing text data is
866available to the programmer: the Unicode object. It can be used to
867store and manipulate Unicode data (see \url{http://www.unicode.org/})
868and integrates well with the existing string objects, providing
869auto-conversions where necessary.
870
871Unicode has the advantage of providing one ordinal for every character
872in every script used in modern and ancient texts. Previously, there
873were only 256 possible ordinals for script characters. Texts were
874typically bound to a code page which mapped the ordinals to script
characters.  This led to a great deal of confusion, especially with respect
876to internationalization (usually written as \samp{i18n} ---
877\character{i} + 18 characters + \character{n}) of software. Unicode
878solves these problems by defining one code page for all scripts.
879
880Creating Unicode strings in Python is just as simple as creating
881normal strings:
882
883\begin{verbatim}
884>>> u'Hello World !'
885u'Hello World !'
886\end{verbatim}
887
888The small \character{u} in front of the quote indicates that a
889Unicode string is supposed to be created. If you want to include
890special characters in the string, you can do so by using the Python
891\emph{Unicode-Escape} encoding. The following example shows how:
892
893\begin{verbatim}
894>>> u'Hello\u0020World !'
895u'Hello World !'
896\end{verbatim}
897
The escape sequence \code{\e u0020} inserts the Unicode
character with the ordinal value 0x0020 (the space character) at the
given position.
901
902Other characters are interpreted by using their respective ordinal
903values directly as Unicode ordinals. If you have literal strings
904in the standard Latin-1 encoding that is used in many Western countries,
905you will find it convenient that the lower 256 characters
906of Unicode are the same as the 256 characters of Latin-1.
907
908For experts, there is also a raw mode just like the one for normal
909strings. You have to prefix the opening quote with 'ur' to have
910Python use the \emph{Raw-Unicode-Escape} encoding. It will only apply
the above \code{\e uXXXX} conversion if there is an odd number of
backslashes in front of the small 'u'.
913
914\begin{verbatim}
915>>> ur'Hello\u0020World !'
916u'Hello World !'
917>>> ur'Hello\\u0020World !'
918u'Hello\\\\u0020World !'
919\end{verbatim}
920
921The raw mode is most useful when you have to enter lots of
922backslashes, as can be necessary in regular expressions.
923
924Apart from these standard encodings, Python provides a whole set of
925other ways of creating Unicode strings on the basis of a known
926encoding.
927
928The built-in function \function{unicode()}\bifuncindex{unicode} provides
access to all registered Unicode codecs (COders and DECoders).  Some of
the better-known encodings which these codecs can convert are
931\emph{Latin-1}, \emph{ASCII}, \emph{UTF-8}, and \emph{UTF-16}.
932The latter two are variable-length encodings that store each Unicode
933character in one or more bytes. The default encoding is
934normally set to \ASCII, which passes through characters in the range
9350 to 127 and rejects any other characters with an error.
936When a Unicode string is printed, written to a file, or converted
937with \function{str()}, conversion takes place using this default encoding.
938
\begin{verbatim}
>>> u"abc"
u'abc'
>>> str(u"abc")
'abc'
>>> u"äöü"
u'\xe4\xf6\xfc'
>>> str(u"äöü")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
\end{verbatim}
951
952To convert a Unicode string into an 8-bit string using a specific
953encoding, Unicode objects provide an \function{encode()} method
954that takes one argument, the name of the encoding. Lowercase names
955for encodings are preferred.
956
957\begin{verbatim}
958>>> u"äöü".encode('utf-8')
959'\xc3\xa4\xc3\xb6\xc3\xbc'
960\end{verbatim}
961
962If you have data in a specific encoding and want to produce a
963corresponding Unicode string from it, you can use the
964\function{unicode()} function with the encoding name as the second
965argument.
966
967\begin{verbatim}
968>>> unicode('\xc3\xa4\xc3\xb6\xc3\xbc', 'utf-8')
969u'\xe4\xf6\xfc'
970\end{verbatim}
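
A round trip through \method{encode()} can be sketched as follows.
This is only an illustration: it uses the \method{decode()} method of
8-bit strings, which is not discussed above, in place of
\function{unicode()}, and the variable names are made up for the
example.

\begin{verbatim}
text = u'\xe4\xf6\xfc'            # the three characters used above

utf8 = text.encode('utf-8')      # variable-length: two bytes each here
latin1 = text.encode('latin-1')  # one byte per character

assert len(text) == 3
assert len(utf8) == 6
assert len(latin1) == 3

# Decoding with the matching codec restores the original Unicode string.
assert utf8.decode('utf-8') == text
assert latin1.decode('latin-1') == text
\end{verbatim}

The two encoded strings differ in length, which is why the encoding
must be known before 8-bit data can be turned back into Unicode.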
971
972\subsection{Lists \label{lists}}
973
974Python knows a number of \emph{compound} data types, used to group
975together other values. The most versatile is the \emph{list}, which
976can be written as a list of comma-separated values (items) between
977square brackets. List items need not all have the same type.
978
979\begin{verbatim}
980>>> a = ['spam', 'eggs', 100, 1234]
981>>> a
982['spam', 'eggs', 100, 1234]
983\end{verbatim}
984
985Like string indices, list indices start at 0, and lists can be sliced,
986concatenated and so on:
987
988\begin{verbatim}
989>>> a[0]
990'spam'
991>>> a[3]
9921234
993>>> a[-2]
994100
995>>> a[1:-1]
996['eggs', 100]
997>>> a[:2] + ['bacon', 2*2]
998['spam', 'eggs', 'bacon', 4]
999>>> 3*a[:3] + ['Boo!']
1000['spam', 'eggs', 100, 'spam', 'eggs', 100, 'spam', 'eggs', 100, 'Boo!']
1001\end{verbatim}
1002
1003Unlike strings, which are \emph{immutable}, it is possible to change
1004individual elements of a list:
1005
1006\begin{verbatim}
1007>>> a
1008['spam', 'eggs', 100, 1234]
1009>>> a[2] = a[2] + 23
1010>>> a
1011['spam', 'eggs', 123, 1234]
1012\end{verbatim}
1013
1014Assignment to slices is also possible, and this can even change the size
1015of the list or clear it entirely:
1016
1017\begin{verbatim}
1018>>> # Replace some items:
1019... a[0:2] = [1, 12]
1020>>> a
1021[1, 12, 123, 1234]
1022>>> # Remove some:
1023... a[0:2] = []
1024>>> a
1025[123, 1234]
1026>>> # Insert some:
1027... a[1:1] = ['bletch', 'xyzzy']
1028>>> a
1029[123, 'bletch', 'xyzzy', 1234]
1030>>> # Insert (a copy of) itself at the beginning
1031>>> a[:0] = a
1032>>> a
1033[123, 'bletch', 'xyzzy', 1234, 123, 'bletch', 'xyzzy', 1234]
1034>>> # Clear the list: replace all items with an empty list
1035>>> a[:] = []
1036>>> a
1037[]
1038\end{verbatim}
1039
1040The built-in function \function{len()} also applies to lists:
1041
\begin{verbatim}
>>> a = ['a', 'b', 'c', 'd']
>>> len(a)
4
\end{verbatim}
1046
1047It is possible to nest lists (create lists containing other lists),
1048for example:
1049
1050\begin{verbatim}
1051>>> q = [2, 3]
1052>>> p = [1, q, 4]
1053>>> len(p)
10543
1055>>> p[1]
1056[2, 3]
1057>>> p[1][0]
10582
1059>>> p[1].append('xtra') # See section 5.1
1060>>> p
1061[1, [2, 3, 'xtra'], 4]
1062>>> q
1063[2, 3, 'xtra']
1064\end{verbatim}
1065
1066Note that in the last example, \code{p[1]} and \code{q} really refer to
1067the same object! We'll come back to \emph{object semantics} later.
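
The following sketch, an illustration with made-up names rather than
part of the original session, uses the \keyword{is} operator to confirm
that both expressions refer to one list object, and shows that a slice
copy is independent of it:

\begin{verbatim}
q = [2, 3]
p = [1, q, 4]

snapshot = q[:]          # a slice makes a new list with the same items
q.append('xtra')

assert p[1] is q                   # one object, two ways to reach it
assert p == [1, [2, 3, 'xtra'], 4]
assert snapshot == [2, 3]          # the copy did not change
assert snapshot is not q
\end{verbatim}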
1068
1069\section{First Steps Towards Programming \label{firstSteps}}
1070
1071Of course, we can use Python for more complicated tasks than adding
1072two and two together. For instance, we can write an initial
1073sub-sequence of the \emph{Fibonacci} series as follows:
1074
\begin{verbatim}
>>> # Fibonacci series:
... # the sum of two elements defines the next
... a, b = 0, 1
>>> while b < 10:
...     print b
...     a, b = b, a+b
...
1
1
2
3
5
8
\end{verbatim}
1090
1091This example introduces several new features.
1092
1093\begin{itemize}
1094
1095\item
1096The first line contains a \emph{multiple assignment}: the variables
1097\code{a} and \code{b} simultaneously get the new values 0 and 1. On the
1098last line this is used again, demonstrating that the expressions on
1099the right-hand side are all evaluated first before any of the
1100assignments take place. The right-hand side expressions are evaluated
1101from the left to the right.
1102
1103\item
1104The \keyword{while} loop executes as long as the condition (here:
1105\code{b < 10}) remains true. In Python, like in C, any non-zero
1106integer value is true; zero is false. The condition may also be a
1107string or list value, in fact any sequence; anything with a non-zero
1108length is true, empty sequences are false. The test used in the
1109example is a simple comparison. The standard comparison operators are
1110written the same as in C: \code{<} (less than), \code{>} (greater than),
1111\code{==} (equal to), \code{<=} (less than or equal to),
1112\code{>=} (greater than or equal to) and \code{!=} (not equal to).
1113
1114\item
1115The \emph{body} of the loop is \emph{indented}: indentation is Python's
1116way of grouping statements. Python does not (yet!) provide an
1117intelligent input line editing facility, so you have to type a tab or
1118space(s) for each indented line. In practice you will prepare more
1119complicated input for Python with a text editor; most text editors have
1120an auto-indent facility. When a compound statement is entered
1121interactively, it must be followed by a blank line to indicate
1122completion (since the parser cannot guess when you have typed the last
1123line). Note that each line within a basic block must be indented by
1124the same amount.
1125
1126\item
1127The \keyword{print} statement writes the value of the expression(s) it is
1128given. It differs from just writing the expression you want to write
1129(as we did earlier in the calculator examples) in the way it handles
1130multiple expressions and strings. Strings are printed without quotes,
1131and a space is inserted between items, so you can format things nicely,
1132like this:
1133
1134\begin{verbatim}
1135>>> i = 256*256
1136>>> print 'The value of i is', i
1137The value of i is 65536
1138\end{verbatim}
1139
1140A trailing comma avoids the newline after the output:
1141
\begin{verbatim}
>>> a, b = 0, 1
>>> while b < 1000:
...     print b,
...     a, b = b, a+b
...
1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987
\end{verbatim}
1150
1151Note that the interpreter inserts a newline before it prints the next
1152prompt if the last line was not completed.
1153
1154\end{itemize}
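
To illustrate the first point in the list above, here is a small sketch
(the values are chosen only for this example) showing that every
expression on the right-hand side is evaluated before any name is
rebound:

\begin{verbatim}
a, b = 1, 2

# Swap: both right-hand values are read before either name is assigned.
a, b = b, a
assert (a, b) == (2, 1)

# The same holds for the Fibonacci step: a+b uses the old value of a.
a, b = 0, 1
a, b = b, a+b
assert (a, b) == (1, 1)
a, b = b, a+b
assert (a, b) == (1, 2)
\end{verbatim}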
1155
1156
1157\chapter{More Control Flow Tools \label{moreControl}}
1158
1159Besides the \keyword{while} statement just introduced, Python knows
1160the usual control flow statements known from other languages, with
1161some twists.
1162
1163\section{\keyword{if} Statements \label{if}}
1164
1165Perhaps the most well-known statement type is the
1166\keyword{if} statement. For example:
1167
\begin{verbatim}
>>> x = int(raw_input("Please enter an integer: "))
>>> if x < 0:
...     x = 0
...     print 'Negative changed to zero'
... elif x == 0:
...     print 'Zero'
... elif x == 1:
...     print 'Single'
... else:
...     print 'More'
...
\end{verbatim}
1181
1182There can be zero or more \keyword{elif} parts, and the
1183\keyword{else} part is optional. The keyword `\keyword{elif}' is
1184short for `else if', and is useful to avoid excessive indentation. An
1185\keyword{if} \ldots\ \keyword{elif} \ldots\ \keyword{elif} \ldots\ sequence
1186% Weird spacings happen here if the wrapping of the source text
1187% gets changed in the wrong way.
1188is a substitute for the \keyword{switch} or
1189\keyword{case} statements found in other languages.
1190
1191
1192\section{\keyword{for} Statements \label{for}}
1193
The \keyword{for}\stindex{for} statement in Python differs a bit from
what you may be used to in C or Pascal.  Rather than always
iterating over an arithmetic progression of numbers (as in Pascal),
or giving the user the ability to define both the iteration step and
halting condition (as in C), Python's
\keyword{for}\stindex{for} statement iterates over the items of any
sequence (such as a list or a string), in the order that they appear in
the sequence.  For example (no pun intended):
1202% One suggestion was to give a real C example here, but that may only
1203% serve to confuse non-C programmers.
1204
\begin{verbatim}
>>> # Measure some strings:
... a = ['cat', 'window', 'defenestrate']
>>> for x in a:
...     print x, len(x)
...
cat 3
window 6
defenestrate 12
\end{verbatim}
1215
1216It is not safe to modify the sequence being iterated over in the loop
1217(this can only happen for mutable sequence types, such as lists). If
1218you need to modify the list you are iterating over (for example, to
1219duplicate selected items) you must iterate over a copy. The slice
1220notation makes this particularly convenient:
1221
\begin{verbatim}
>>> for x in a[:]: # make a slice copy of the entire list
...     if len(x) > 6: a.insert(0, x)
...
>>> a
['defenestrate', 'cat', 'window', 'defenestrate']
\end{verbatim}
1229
1230
1231\section{The \function{range()} Function \label{range}}
1232
1233If you do need to iterate over a sequence of numbers, the built-in
1234function \function{range()} comes in handy. It generates lists
1235containing arithmetic progressions:
1236
1237\begin{verbatim}
1238>>> range(10)
1239[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
1240\end{verbatim}
1241
1242The given end point is never part of the generated list;
1243\code{range(10)} generates a list of 10 values, the legal
1244indices for items of a sequence of length 10. It is possible to let
1245the range start at another number, or to specify a different increment
1246(even negative; sometimes this is called the `step'):
1247
1248\begin{verbatim}
1249>>> range(5, 10)
1250[5, 6, 7, 8, 9]
1251>>> range(0, 10, 3)
1252[0, 3, 6, 9]
1253>>> range(-10, -100, -30)
1254[-10, -40, -70]
1255\end{verbatim}
1256
1257To iterate over the indices of a sequence, combine
1258\function{range()} and \function{len()} as follows:
1259
\begin{verbatim}
>>> a = ['Mary', 'had', 'a', 'little', 'lamb']
>>> for i in range(len(a)):
...     print i, a[i]
...
0 Mary
1 had
2 a
3 little
4 lamb
\end{verbatim}
1271
1272
1273\section{\keyword{break} and \keyword{continue} Statements, and
1274 \keyword{else} Clauses on Loops
1275 \label{break}}
1276
1277The \keyword{break} statement, like in C, breaks out of the smallest
1278enclosing \keyword{for} or \keyword{while} loop.
1279
1280The \keyword{continue} statement, also borrowed from C, continues
1281with the next iteration of the loop.
1282
1283Loop statements may have an \code{else} clause; it is executed when
1284the loop terminates through exhaustion of the list (with
1285\keyword{for}) or when the condition becomes false (with
1286\keyword{while}), but not when the loop is terminated by a
1287\keyword{break} statement. This is exemplified by the following loop,
1288which searches for prime numbers:
1289
\begin{verbatim}
>>> for n in range(2, 10):
...     for x in range(2, n):
...         if n % x == 0:
...             print n, 'equals', x, '*', n/x
...             break
...     else:
...         # loop fell through without finding a factor
...         print n, 'is a prime number'
...
2 is a prime number
3 is a prime number
4 equals 2 * 2
5 is a prime number
6 equals 2 * 3
7 is a prime number
8 equals 2 * 4
9 equals 3 * 3
\end{verbatim}
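
As a variation on the example above (a sketch, with the list name
\code{primes} chosen for illustration), the \keyword{else} clause can
collect results instead of printing them.  Note that the
\keyword{else} belongs to the inner \keyword{for} loop, not to the
\keyword{if} statement:

\begin{verbatim}
primes = []
for n in range(2, 20):
    for x in range(2, n):
        if n % x == 0:
            break              # a factor was found: skip the else clause
    else:
        primes.append(n)       # the inner loop ran to completion

assert primes == [2, 3, 5, 7, 11, 13, 17, 19]
\end{verbatim}

For \code{n} equal to 2 the inner range is empty, so the loop finishes
at once without a \keyword{break} and the \keyword{else} clause still
runs.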
1309
1310
1311\section{\keyword{pass} Statements \label{pass}}
1312
1313The \keyword{pass} statement does nothing.
1314It can be used when a statement is required syntactically but the
1315program requires no action.
1316For example:
1317
\begin{verbatim}
>>> while True:
...     pass # Busy-wait for keyboard interrupt
...
\end{verbatim}
1323
1324
1325\section{Defining Functions \label{functions}}
1326
1327We can create a function that writes the Fibonacci series to an
1328arbitrary boundary:
1329
\begin{verbatim}
>>> def fib(n):    # write Fibonacci series up to n
...     """Print a Fibonacci series up to n."""
...     a, b = 0, 1
...     while b < n:
...         print b,
...         a, b = b, a+b
...
>>> # Now call the function we just defined:
... fib(2000)
1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597
\end{verbatim}
1342
1343The keyword \keyword{def} introduces a function \emph{definition}. It
1344must be followed by the function name and the parenthesized list of
1345formal parameters. The statements that form the body of the function
1346start at the next line, and must be indented. The first statement of
1347the function body can optionally be a string literal; this string
1348literal is the function's \index{documentation strings}documentation
1349string, or \dfn{docstring}.\index{docstrings}\index{strings, documentation}
1350
1351There are tools which use docstrings to automatically produce online
1352or printed documentation, or to let the user interactively browse
1353through code; it's good practice to include docstrings in code that
1354you write, so try to make a habit of it.
1355
1356The \emph{execution} of a function introduces a new symbol table used
1357for the local variables of the function. More precisely, all variable
1358assignments in a function store the value in the local symbol table;
1359whereas variable references first look in the local symbol table, then
1360in the global symbol table, and then in the table of built-in names.
1361Thus, global variables cannot be directly assigned a value within a
1362function (unless named in a \keyword{global} statement), although
1363they may be referenced.
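
The lookup rules above can be sketched as follows.  All names in this
example are invented for the illustration, and it is written as a
script to be run at module level; it also uses the \keyword{return}
statement, which is described later in this section.

\begin{verbatim}
counter = 0

def read_it():
    return counter        # no local 'counter': the global one is found

def shadow_it():
    counter = 99          # assignment creates a new local variable
    return counter

def bump_it():
    global counter        # assignments now rebind the global name
    counter = counter + 1

assert read_it() == 0
assert shadow_it() == 99
assert counter == 0       # the global was not touched by shadow_it()
bump_it()
assert counter == 1
\end{verbatim}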
1364
1365The actual parameters (arguments) to a function call are introduced in
1366the local symbol table of the called function when it is called; thus,
1367arguments are passed using \emph{call by value} (where the
1368\emph{value} is always an object \emph{reference}, not the value of
1369the object).\footnote{
1370 Actually, \emph{call by object reference} would be a better
1371 description, since if a mutable object is passed, the caller
1372 will see any changes the callee makes to it (items
1373 inserted into a list).
1374} When a function calls another function, a new local symbol table is
1375created for that call.
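
A short sketch of the difference between changing an object and
rebinding a name may help here (the function and variable names are
hypothetical):

\begin{verbatim}
def add_item(items):
    items.append('new')      # changes the object the caller passed in

def rebind(items):
    items = ['replaced']     # rebinds the local name only

shopping = ['eggs']
add_item(shopping)
assert shopping == ['eggs', 'new']

rebind(shopping)
assert shopping == ['eggs', 'new']   # the caller's list is unaffected
\end{verbatim}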
1376
1377A function definition introduces the function name in the current
1378symbol table. The value of the function name
1379has a type that is recognized by the interpreter as a user-defined
1380function. This value can be assigned to another name which can then
1381also be used as a function. This serves as a general renaming
1382mechanism:
1383
1384\begin{verbatim}
1385>>> fib
1386<function fib at 10042ed0>
1387>>> f = fib
1388>>> f(100)
13891 1 2 3 5 8 13 21 34 55 89
1390\end{verbatim}
1391
1392You might object that \code{fib} is not a function but a procedure. In
1393Python, like in C, procedures are just functions that don't return a
1394value. In fact, technically speaking, procedures do return a value,
1395albeit a rather boring one. This value is called \code{None} (it's a
1396built-in name). Writing the value \code{None} is normally suppressed by
1397the interpreter if it would be the only value written. You can see it
1398if you really want to:
1399
1400\begin{verbatim}
1401>>> print fib(0)
1402None
1403\end{verbatim}
1404
1405It is simple to write a function that returns a list of the numbers of
1406the Fibonacci series, instead of printing it:
1407
\begin{verbatim}
>>> def fib2(n): # return Fibonacci series up to n
...     """Return a list containing the Fibonacci series up to n."""
...     result = []
...     a, b = 0, 1
...     while b < n:
...         result.append(b)    # see below
...         a, b = b, a+b
...     return result
...
>>> f100 = fib2(100)    # call it
>>> f100                # write the result
[1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
\end{verbatim}
1422
1423This example, as usual, demonstrates some new Python features:
1424
1425\begin{itemize}
1426
1427\item
1428The \keyword{return} statement returns with a value from a function.
1429\keyword{return} without an expression argument returns \code{None}.
1430Falling off the end of a procedure also returns \code{None}.
1431
1432\item
1433The statement \code{result.append(b)} calls a \emph{method} of the list
1434object \code{result}. A method is a function that `belongs' to an
1435object and is named \code{obj.methodname}, where \code{obj} is some
1436object (this may be an expression), and \code{methodname} is the name
1437of a method that is defined by the object's type. Different types
1438define different methods. Methods of different types may have the
1439same name without causing ambiguity. (It is possible to define your
1440own object types and methods, using \emph{classes}, as discussed later
1441in this tutorial.)
1442The method \method{append()} shown in the example is defined for
1443list objects; it adds a new element at the end of the list. In this
1444example it is equivalent to \samp{result = result + [b]}, but more
1445efficient.
1446
1447\end{itemize}
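
The remark about \method{append()} can be illustrated with a sketch
(names invented for the example).  Both forms produce the same
contents, but \method{append()} changes the existing list while
\code{+} builds a new one:

\begin{verbatim}
result = [1, 1]
same = result                # second name for the same list

result.append(2)             # modifies the list in place
assert same == [1, 1, 2]
assert same is result

result = result + [3]        # builds a new list and rebinds 'result'
assert result == [1, 1, 2, 3]
assert same == [1, 1, 2]     # the old list is left as it was
assert same is not result
\end{verbatim}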
1448
1449\section{More on Defining Functions \label{defining}}
1450
1451It is also possible to define functions with a variable number of
1452arguments. There are three forms, which can be combined.
1453
1454\subsection{Default Argument Values \label{defaultArgs}}
1455
1456The most useful form is to specify a default value for one or more
1457arguments. This creates a function that can be called with fewer
1458arguments than it is defined to allow. For example:
1459
\begin{verbatim}
def ask_ok(prompt, retries=4, complaint='Yes or no, please!'):
    while True:
        ok = raw_input(prompt)
        if ok in ('y', 'ye', 'yes'): return True
        if ok in ('n', 'no', 'nop', 'nope'): return False
        retries = retries - 1
        if retries < 0: raise IOError, 'refusenik user'
        print complaint
\end{verbatim}
1470
1471This function can be called either like this:
1472\code{ask_ok('Do you really want to quit?')} or like this:
1473\code{ask_ok('OK to overwrite the file?', 2)}.
1474
1475This example also introduces the \keyword{in} keyword. This tests
1476whether or not a sequence contains a certain value.
1477
1478The default values are evaluated at the point of function definition
1479in the \emph{defining} scope, so that
1480
\begin{verbatim}
i = 5

def f(arg=i):
    print arg

i = 6
f()
\end{verbatim}
1490
1491will print \code{5}.
1492
1493\strong{Important warning:} The default value is evaluated only once.
1494This makes a difference when the default is a mutable object such as a
1495list, dictionary, or instances of most classes. For example, the
1496following function accumulates the arguments passed to it on
1497subsequent calls:
1498
\begin{verbatim}
def f(a, L=[]):
    L.append(a)
    return L

print f(1)
print f(2)
print f(3)
\end{verbatim}
1508
1509This will print
1510
1511\begin{verbatim}
1512[1]
1513[1, 2]
1514[1, 2, 3]
1515\end{verbatim}
1516
1517If you don't want the default to be shared between subsequent calls,
1518you can write the function like this instead:
1519
\begin{verbatim}
def f(a, L=None):
    if L is None:
        L = []
    L.append(a)
    return L
\end{verbatim}
1527
1528\subsection{Keyword Arguments \label{keywordArgs}}
1529
1530Functions can also be called using
1531keyword arguments of the form \samp{\var{keyword} = \var{value}}. For
1532instance, the following function:
1533
\begin{verbatim}
def parrot(voltage, state='a stiff', action='voom', type='Norwegian Blue'):
    print "-- This parrot wouldn't", action,
    print "if you put", voltage, "volts through it."
    print "-- Lovely plumage, the", type
    print "-- It's", state, "!"
\end{verbatim}
1541
1542could be called in any of the following ways:
1543
1544\begin{verbatim}
1545parrot(1000)
1546parrot(action = 'VOOOOOM', voltage = 1000000)
1547parrot('a thousand', state = 'pushing up the daisies')
1548parrot('a million', 'bereft of life', 'jump')
1549\end{verbatim}
1550
1551but the following calls would all be invalid:
1552
1553\begin{verbatim}
1554parrot() # required argument missing
1555parrot(voltage=5.0, 'dead') # non-keyword argument following keyword
1556parrot(110, voltage=220) # duplicate value for argument
1557parrot(actor='John Cleese') # unknown keyword
1558\end{verbatim}
1559
1560In general, an argument list must have any positional arguments
1561followed by any keyword arguments, where the keywords must be chosen
1562from the formal parameter names. It's not important whether a formal
1563parameter has a default value or not. No argument may receive a
1564value more than once --- formal parameter names corresponding to
1565positional arguments cannot be used as keywords in the same calls.
1566Here's an example that fails due to this restriction:
1567
\begin{verbatim}
>>> def function(a):
...     pass
...
>>> function(0, a=0)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: function() got multiple values for keyword argument 'a'
\end{verbatim}
1577
1578When a final formal parameter of the form \code{**\var{name}} is
1579present, it receives a \ulink{dictionary}{../lib/typesmapping.html}
1580containing all keyword arguments except for those corresponding to
1581a formal parameter. This may be
1582combined with a formal parameter of the form
1583\code{*\var{name}} (described in the next subsection) which receives a
1584tuple containing the positional arguments beyond the formal parameter
1585list. (\code{*\var{name}} must occur before \code{**\var{name}}.)
1586For example, if we define a function like this:
1587
\begin{verbatim}
def cheeseshop(kind, *arguments, **keywords):
    print "-- Do you have any", kind, '?'
    print "-- I'm sorry, we're all out of", kind
    for arg in arguments: print arg
    print '-'*40
    keys = keywords.keys()
    keys.sort()
    for kw in keys: print kw, ':', keywords[kw]
\end{verbatim}
1598
1599It could be called like this:
1600
\begin{verbatim}
cheeseshop('Limburger', "It's very runny, sir.",
           "It's really very, VERY runny, sir.",
           client='John Cleese',
           shopkeeper='Michael Palin',
           sketch='Cheese Shop Sketch')
\end{verbatim}
1608
1609and of course it would print:
1610
1611\begin{verbatim}
1612-- Do you have any Limburger ?
1613-- I'm sorry, we're all out of Limburger
1614It's very runny, sir.
1615It's really very, VERY runny, sir.
1616----------------------------------------
1617client : John Cleese
1618shopkeeper : Michael Palin
1619sketch : Cheese Shop Sketch
1620\end{verbatim}
1621
1622Note that the \method{sort()} method of the list of keyword argument
1623names is called before printing the contents of the \code{keywords}
1624dictionary; if this is not done, the order in which the arguments are
1625printed is undefined.
1626
1627
1628\subsection{Arbitrary Argument Lists \label{arbitraryArgs}}
1629
1630Finally, the least frequently used option is to specify that a
1631function can be called with an arbitrary number of arguments. These
1632arguments will be wrapped up in a tuple. Before the variable number
1633of arguments, zero or more normal arguments may occur.
1634
\begin{verbatim}
def fprintf(file, format, *args):
    file.write(format % args)
\end{verbatim}
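
For instance, a function built along the same lines might look like
this.  It is a minimal sketch: \code{concat} is a hypothetical name,
and the extra arguments are assumed to be strings.

\begin{verbatim}
def concat(separator, *words):
    # 'words' is a tuple holding every argument after the first.
    return separator.join(words)

assert concat('/') == ''
assert concat('/', 'usr') == 'usr'
assert concat('.', 'earth', 'mars', 'venus') == 'earth.mars.venus'
\end{verbatim}

When no extra arguments are given, \code{words} is simply an empty
tuple, so the function needs no special case for that call.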
1639
1640
1641\subsection{Unpacking Argument Lists \label{unpacking-arguments}}
1642
1643The reverse situation occurs when the arguments are already in a list
1644or tuple but need to be unpacked for a function call requiring separate
1645positional arguments. For instance, the built-in \function{range()}
1646function expects separate \var{start} and \var{stop} arguments. If they
1647are not available separately, write the function call with the
1648\code{*}-operator to unpack the arguments out of a list or tuple:
1649
1650\begin{verbatim}
1651>>> range(3, 6) # normal call with separate arguments
1652[3, 4, 5]
1653>>> args = [3, 6]
1654>>> range(*args) # call with arguments unpacked from a list
1655[3, 4, 5]
1656\end{verbatim}
1657
1658In the same fashion, dictionaries can deliver keyword arguments with the
1659\code{**}-operator:
1660
\begin{verbatim}
>>> def parrot(voltage, state='a stiff', action='voom'):
...     print "-- This parrot wouldn't", action,
...     print "if you put", voltage, "volts through it.",
...     print "E's", state, "!"
...
>>> d = {"voltage": "four million", "state": "bleedin' demised", "action": "VOOM"}
>>> parrot(**d)
-- This parrot wouldn't VOOM if you put four million volts through it. E's bleedin' demised !
\end{verbatim}
1671
1672
1673\subsection{Lambda Forms \label{lambda}}
1674
1675By popular demand, a few features commonly found in functional
1676programming languages like Lisp have been added to Python. With the
1677\keyword{lambda} keyword, small anonymous functions can be created.
1678Here's a function that returns the sum of its two arguments:
1679\samp{lambda a, b: a+b}. Lambda forms can be used wherever function
1680objects are required. They are syntactically restricted to a single
1681expression. Semantically, they are just syntactic sugar for a normal
1682function definition. Like nested function definitions, lambda forms
1683can reference variables from the containing scope:
1684
\begin{verbatim}
>>> def make_incrementor(n):
...     return lambda x: x + n
...
>>> f = make_incrementor(42)
>>> f(0)
42
>>> f(1)
43
\end{verbatim}
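
Another common use is passing a small function as an argument.  The
sketch below is an illustration only: it relies on the \var{key}
argument of the built-in \function{sorted()}, which is not covered in
this section, and the data is made up.

\begin{verbatim}
pairs = [(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]

# Sort by the second item of each pair instead of the first.
by_name = sorted(pairs, key=lambda pair: pair[1])

assert by_name == [(4, 'four'), (1, 'one'), (3, 'three'), (2, 'two')]
\end{verbatim}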
1695
1696
1697\subsection{Documentation Strings \label{docstrings}}
1698
1699There are emerging conventions about the content and formatting of
1700documentation strings.
1701\index{docstrings}\index{documentation strings}
1702\index{strings, documentation}
1703
1704The first line should always be a short, concise summary of the
1705object's purpose. For brevity, it should not explicitly state the
1706object's name or type, since these are available by other means
1707(except if the name happens to be a verb describing a function's
1708operation). This line should begin with a capital letter and end with
1709a period.
1710
1711If there are more lines in the documentation string, the second line
1712should be blank, visually separating the summary from the rest of the
1713description. The following lines should be one or more paragraphs
1714describing the object's calling conventions, its side effects, etc.
1715
1716The Python parser does not strip indentation from multi-line string
1717literals in Python, so tools that process documentation have to strip
1718indentation if desired. This is done using the following convention.
1719The first non-blank line \emph{after} the first line of the string
1720determines the amount of indentation for the entire documentation
1721string. (We can't use the first line since it is generally adjacent
1722to the string's opening quotes so its indentation is not apparent in
1723the string literal.) Whitespace ``equivalent'' to this indentation is
1724then stripped from the start of all lines of the string. Lines that
1725are indented less should not occur, but if they occur all their
1726leading whitespace should be stripped. Equivalence of whitespace
1727should be tested after expansion of tabs (to 8 spaces, normally).
1728
1729Here is an example of a multi-line docstring:
1730
1731\begin{verbatim}
1732>>> def my_function():
1733... """Do nothing, but document it.
1734...
1735... No, really, it doesn't do anything.
1736... """
1737... pass
1738...
1739>>> print my_function.__doc__
1740Do nothing, but document it.
1741
1742 No, really, it doesn't do anything.
1743
1744\end{verbatim}
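The stripping convention described above can be sketched as a small
function. This is only an illustration of the rules as stated in this
section, not the code used by any particular documentation tool; the
name \code{trim_docstring} is made up for the example.

```python
def trim_docstring(doc):
    """Strip indentation from a docstring, following the convention above."""
    if not doc:
        return ''
    # Expand tabs (to 8 spaces) before comparing whitespace.
    lines = doc.expandtabs(8).split('\n')
    # The first non-blank line after the first line sets the indentation.
    indent = 0
    for line in lines[1:]:
        stripped = line.lstrip()
        if stripped:
            indent = len(line) - len(stripped)
            break
    trimmed = [lines[0].strip()]
    for line in lines[1:]:
        stripped = line.lstrip()
        if len(line) - len(stripped) >= indent:
            # Normal case: remove exactly the common indentation.
            trimmed.append(line[indent:].rstrip())
        else:
            # Under-indented line: remove all of its leading whitespace.
            trimmed.append(stripped.rstrip())
    return '\n'.join(trimmed)


doc = "Do nothing, but document it.\n\n    No, really, it doesn't do anything.\n    "
print(trim_docstring(doc))
```

Applied to the docstring of \code{my_function} shown above, the second
paragraph comes out flush left.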



\chapter{Data Structures \label{structures}}

This chapter describes some things you've learned about already in
more detail, and adds some new things as well.


\section{More on Lists \label{moreLists}}

The list data type has some more methods. Here are all of the methods
of list objects:

\begin{methoddesc}[list]{append}{x}
Add an item to the end of the list;
equivalent to \code{a[len(a):] = [\var{x}]}.
\end{methoddesc}

\begin{methoddesc}[list]{extend}{L}
Extend the list by appending all the items in the given list;
equivalent to \code{a[len(a):] = \var{L}}.
\end{methoddesc}

\begin{methoddesc}[list]{insert}{i, x}
Insert an item at a given position. The first argument is the index
of the element before which to insert, so \code{a.insert(0, \var{x})}
inserts at the front of the list, and \code{a.insert(len(a), \var{x})}
is equivalent to \code{a.append(\var{x})}.
\end{methoddesc}

\begin{methoddesc}[list]{remove}{x}
Remove the first item from the list whose value is \var{x}.
It is an error if there is no such item.
\end{methoddesc}

\begin{methoddesc}[list]{pop}{\optional{i}}
Remove the item at the given position in the list, and return it. If
no index is specified, \code{a.pop()} removes and returns the last item
in the list. (The square brackets
around the \var{i} in the method signature denote that the parameter
is optional, not that you should type square brackets at that
position. You will see this notation frequently in the
\citetitle[../lib/lib.html]{Python Library Reference}.)
\end{methoddesc}

\begin{methoddesc}[list]{index}{x}
Return the index in the list of the first item whose value is \var{x}.
It is an error if there is no such item.
\end{methoddesc}

\begin{methoddesc}[list]{count}{x}
Return the number of times \var{x} appears in the list.
\end{methoddesc}

\begin{methoddesc}[list]{sort}{}
Sort the items of the list, in place.
\end{methoddesc}

\begin{methoddesc}[list]{reverse}{}
Reverse the elements of the list, in place.
\end{methoddesc}

An example that uses most of the list methods:

\begin{verbatim}
>>> a = [66.25, 333, 333, 1, 1234.5]
>>> print a.count(333), a.count(66.25), a.count('x')
2 1 0
>>> a.insert(2, -1)
>>> a.append(333)
>>> a
[66.25, 333, -1, 333, 1, 1234.5, 333]
>>> a.index(333)
1
>>> a.remove(333)
>>> a
[66.25, -1, 333, 1, 1234.5, 333]
>>> a.reverse()
>>> a
[333, 1234.5, 1, 333, -1, 66.25]
>>> a.sort()
>>> a
[-1, 1, 66.25, 333, 333, 1234.5]
\end{verbatim}
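The equivalences given in the method descriptions can be checked
directly. The following sketch is written with assertions rather than
interpreter output so that it runs unchanged as a script; the lists
\code{a} and \code{b} are arbitrary sample data.

```python
a = [1, 2, 3]
b = [1, 2, 3]

# append(x) behaves like assigning a one-item list to the empty slice
# at the end of the list.
a.append(4)
b[len(b):] = [4]
assert a == b == [1, 2, 3, 4]

# extend(L) behaves like assigning L to that same slice.
a.extend([5, 6])
b[len(b):] = [5, 6]
assert a == b == [1, 2, 3, 4, 5, 6]

# insert(len(a), x) is the same as append(x); insert(0, x) adds at the front.
a.insert(len(a), 7)
b.append(7)
a.insert(0, 0)
b.insert(0, 0)
assert a == b == [0, 1, 2, 3, 4, 5, 6, 7]

# remove() and index() report a missing value by raising ValueError.
for method in (a.remove, a.index):
    try:
        method('no such item')
    except ValueError:
        pass
    else:
        raise AssertionError('expected ValueError')

# pop() with no argument takes the last item; pop(i) takes the item at i.
assert a.pop() == 7
assert a.pop(0) == 0
assert a == [1, 2, 3, 4, 5, 6]
```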


\subsection{Using Lists as Stacks \label{lists-as-stacks}}
\sectionauthor{Ka-Ping Yee}{ping@lfw.org}

The list methods make it very easy to use a list as a stack, where the
last element added is the first element retrieved (``last-in,
first-out''). To add an item to the top of the stack, use
\method{append()}. To retrieve an item from the top of the stack, use
\method{pop()} without an explicit index. For example:

\begin{verbatim}
>>> stack = [3, 4, 5]
>>> stack.append(6)
>>> stack.append(7)
>>> stack
[3, 4, 5, 6, 7]
>>> stack.pop()
7
>>> stack
[3, 4, 5, 6]
>>> stack.pop()
6
>>> stack.pop()
5
>>> stack
[3, 4]
\end{verbatim}


\subsection{Using Lists as Queues \label{lists-as-queues}}
\sectionauthor{Ka-Ping Yee}{ping@lfw.org}

You can also use a list conveniently as a queue, where the first
element added is the first element retrieved (``first-in,
first-out''). To add an item to the back of the queue, use
\method{append()}. To retrieve an item from the front of the queue,
use \method{pop()} with \code{0} as the index. For example:

\begin{verbatim}
>>> queue = ["Eric", "John", "Michael"]
>>> queue.append("Terry") # Terry arrives
>>> queue.append("Graham") # Graham arrives
>>> queue.pop(0)
'Eric'
>>> queue.pop(0)
'John'
>>> queue
['Michael', 'Terry', 'Graham']
\end{verbatim}
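One caveat that goes beyond the text above: \code{pop(0)} has to shift
every remaining item one position forward, so it becomes slow for long
lists. As a possible alternative, the \module{collections} module
(available since Python 2.4) provides \class{deque}, which is designed
for adding and removing items at both ends. A minimal sketch of the
same queue:

```python
from collections import deque

queue = deque(["Eric", "John", "Michael"])
queue.append("Terry")           # Terry arrives
queue.append("Graham")          # Graham arrives

first = queue.popleft()         # the first to arrive now leaves
second = queue.popleft()        # the second to arrive now leaves

assert first == 'Eric'
assert second == 'John'
assert list(queue) == ['Michael', 'Terry', 'Graham']
```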


\subsection{Functional Programming Tools \label{functional}}

There are three built-in functions that are very useful when used with
lists: \function{filter()}, \function{map()}, and \function{reduce()}.

\samp{filter(\var{function}, \var{sequence})} returns a sequence
consisting of those items from the
sequence for which \code{\var{function}(\var{item})} is true.
If \var{sequence} is a \class{string} or \class{tuple}, the result will
be of the same type; otherwise, it is always a \class{list}.
For example, to compute some primes:

\begin{verbatim}
>>> def f(x): return x % 2 != 0 and x % 3 != 0
...
>>> filter(f, range(2, 25))
[5, 7, 11, 13, 17, 19, 23]
\end{verbatim}

\samp{map(\var{function}, \var{sequence})} calls
\code{\var{function}(\var{item})} for each of the sequence's items and
returns a list of the return values. For example, to compute some
cubes:

\begin{verbatim}
>>> def cube(x): return x*x*x
...
>>> map(cube, range(1, 11))
[1, 8, 27, 64, 125, 216, 343, 512, 729, 1000]
\end{verbatim}

More than one sequence may be passed; the function must then have as
many arguments as there are sequences and is called with the
corresponding item from each sequence (or \code{None} if some sequence
is shorter than another). For example:

\begin{verbatim}
>>> seq = range(8)
>>> def add(x, y): return x+y
...
>>> map(add, seq, seq)
[0, 2, 4, 6, 8, 10, 12, 14]
\end{verbatim}

\samp{reduce(\var{function}, \var{sequence})} returns a single value
constructed by calling the binary function \var{function} on the first two
items of the sequence, then on the result and the next item, and so
on. For example, to compute the sum of the numbers 1 through 10:

\begin{verbatim}
>>> def add(x,y): return x+y
...
>>> reduce(add, range(1, 11))
55
\end{verbatim}

If there's only one item in the sequence, its value is returned; if
the sequence is empty, an exception is raised.

A third argument can be passed to indicate the starting value. In this
case the starting value is returned for an empty sequence, and the
function is first applied to the starting value and the first sequence
item, then to the result and the next item, and so on. For example,

\begin{verbatim}
>>> def sum(seq):
...     def add(x,y): return x+y
...     return reduce(add, seq, 0)
...
>>> sum(range(1, 11))
55
>>> sum([])
0
\end{verbatim}

Don't use this example's definition of \function{sum()}: since summing
numbers is such a common need, a built-in function
\code{sum(\var{sequence})} is already provided, and works exactly like
this.
\versionadded{2.3}
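The behaviour of \function{reduce()} described above, with and without
a starting value, can be sketched as a plain loop. This is an
illustration of the description only, not the built-in's actual
implementation; \code{my_reduce} and \code{_missing} are names invented
for the example.

```python
_missing = object()     # hypothetical marker meaning "no starting value"

def my_reduce(function, sequence, initial=_missing):
    items = list(sequence)
    if initial is _missing:
        if not items:
            # Empty sequence and no starting value: nothing to return.
            raise TypeError('my_reduce() of empty sequence with no initial value')
        # Without a starting value, the first item plays that role.
        result, items = items[0], items[1:]
    else:
        result = initial
    for item in items:
        # Combine the running result with the next item.
        result = function(result, item)
    return result

def add(x, y):
    return x + y

assert my_reduce(add, range(1, 11)) == 55
assert my_reduce(add, [7]) == 7          # a single item is returned as is
assert my_reduce(add, [], 0) == 0        # starting value for an empty sequence
```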

\subsection{List Comprehensions}

List comprehensions provide a concise way to create lists without resorting
to use of \function{map()}, \function{filter()} and/or \keyword{lambda}.
The resulting list definition is often clearer than lists built
using those constructs. Each list comprehension consists of an expression
followed by a \keyword{for} clause, then zero or more \keyword{for} or
\keyword{if} clauses. The result will be a list resulting from evaluating
the expression in the context of the \keyword{for} and \keyword{if} clauses
which follow it. If the expression would evaluate to a tuple, it must be
parenthesized.

\begin{verbatim}
>>> freshfruit = [' banana', ' loganberry ', 'passion fruit ']
>>> [weapon.strip() for weapon in freshfruit]
['banana', 'loganberry', 'passion fruit']
>>> vec = [2, 4, 6]
>>> [3*x for x in vec]
[6, 12, 18]
>>> [3*x for x in vec if x > 3]
[12, 18]
>>> [3*x for x in vec if x < 2]
[]
>>> [[x,x**2] for x in vec]
[[2, 4], [4, 16], [6, 36]]
>>> [x, x**2 for x in vec] # error - parens required for tuples
  File "<stdin>", line 1, in ?
    [x, x**2 for x in vec]
               ^
SyntaxError: invalid syntax
>>> [(x, x**2) for x in vec]
[(2, 4), (4, 16), (6, 36)]
>>> vec1 = [2, 4, 6]
>>> vec2 = [4, 3, -9]
>>> [x*y for x in vec1 for y in vec2]
[8, 6, -18, 16, 12, -36, 24, 18, -54]
>>> [x+y for x in vec1 for y in vec2]
[6, 5, -7, 8, 7, -5, 10, 9, -3]
>>> [vec1[i]*vec2[i] for i in range(len(vec1))]
[8, 12, -54]
\end{verbatim}

List comprehensions are much more flexible than \function{map()} and can be
applied to complex expressions and nested functions:

\begin{verbatim}
>>> [str(round(355/113.0, i)) for i in range(1,6)]
['3.1', '3.14', '3.142', '3.1416', '3.14159']
\end{verbatim}
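It may help to read a list comprehension as shorthand for nested
\keyword{for} and \keyword{if} statements, taken in the same
left-to-right order, with the expression appended in the innermost
block. The sketch below spells out two of the comprehensions from the
example above; the names \code{products} and \code{tripled} are
introduced only for illustration.

```python
vec1 = [2, 4, 6]
vec2 = [4, 3, -9]

# [x*y for x in vec1 for y in vec2] written out as nested loops:
products = []
for x in vec1:
    for y in vec2:
        products.append(x*y)
assert products == [x*y for x in vec1 for y in vec2]
assert products == [8, 6, -18, 16, 12, -36, 24, 18, -54]

# [3*x for x in vec1 if x > 3] written out as a loop with a test:
tripled = []
for x in vec1:
    if x > 3:
        tripled.append(3*x)
assert tripled == [3*x for x in vec1 if x > 3]
assert tripled == [12, 18]
```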


\section{The \keyword{del} statement \label{del}}

There is a way to remove an item from a list given its index instead
of its value: the \keyword{del} statement. This differs from the
\method{pop()} method, which returns a value. The \keyword{del}
statement can also be used to remove slices from a list or clear the
entire list (which we did earlier by assignment of an empty list to
the slice). For example:

\begin{verbatim}
>>> a = [-1, 1, 66.25, 333, 333, 1234.5]
>>> del a[0]
>>> a
[1, 66.25, 333, 333, 1234.5]
>>> del a[2:4]
>>> a
[1, 66.25, 1234.5]
>>> del a[:]
>>> a
[]
\end{verbatim}

\keyword{del} can also be used to delete entire variables:

\begin{verbatim}
>>> del a
\end{verbatim}

Referencing the name \code{a} hereafter is an error (at least until
another value is assigned to it). We'll find other uses for
\keyword{del} later.
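The difference between \method{pop()} and \keyword{del}, and the error
raised after a name has been deleted, can be sketched as follows. The
variable names \code{last} and \code{name_is_gone} are invented for the
example.

```python
a = [-1, 1, 66.25, 333]

last = a.pop()      # pop() is a method call: it removes the item and returns it
assert last == 333

del a[0]            # del is a statement: it removes the item and yields no value
assert a == [1, 66.25]

del a               # unbind the name itself
try:
    a
except NameError:
    # Referencing a deleted name raises NameError.
    name_is_gone = True
else:
    name_is_gone = False
assert name_is_gone
```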


\section{Tuples and Sequences \label{tuples}}

We saw that lists and strings have many common properties, such as
indexing and slicing operations. They are two examples of
\ulink{\emph{sequence} data types}{../lib/typesseq.html}. Since
Python is an evolving language, other sequence data types may be
added. There is also another standard sequence data type: the
\emph{tuple}.

A tuple consists of a number of values separated by commas, for
instance:

\begin{verbatim}
>>> t = 12345, 54321, 'hello!'
>>> t[0]
12345
>>> t
(12345, 54321, 'hello!')
>>> # Tuples may be nested:
... u = t, (1, 2, 3, 4, 5)
>>> u
((12345, 54321, 'hello!'), (1, 2, 3, 4, 5))
\end{verbatim}

As you see, on output tuples are always enclosed in parentheses, so
that nested tuples are interpreted correctly; they may be input with
or without surrounding parentheses, although often parentheses are
necessary anyway (if the tuple is part of a larger expression).

Tuples have many uses. For example: (x, y) coordinate pairs, employee
records from a database, etc. Tuples, like strings, are immutable: it
is not possible to assign to the individual items of a tuple (you can
simulate much of the same effect with slicing and concatenation,
though). It is also possible to create tuples which contain mutable
objects, such as lists.
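A short sketch of what immutability means in practice; the values are
arbitrary, and \code{assignment_failed}, \code{t2} and \code{v} are
names chosen for the illustration.

```python
t = (12345, 54321, 'hello!')

# Assigning to an item of a tuple is rejected with TypeError.
try:
    t[0] = 88888
except TypeError:
    assignment_failed = True
else:
    assignment_failed = False
assert assignment_failed

# "Changing" an item means building a new tuple by slicing and concatenation.
t2 = (88888,) + t[1:]
assert t2 == (88888, 54321, 'hello!')
assert t == (12345, 54321, 'hello!')      # the original is untouched

# A tuple may hold mutable objects, and those can still be modified.
v = ([1, 2, 3], [3, 2, 1])
v[0].append(4)
assert v == ([1, 2, 3, 4], [3, 2, 1])
```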

A special problem is the construction of tuples containing 0 or 1
items: the syntax has some extra quirks to accommodate these. Empty
tuples are constructed by an empty pair of parentheses; a tuple with
one item is constructed by following a value with a comma
(it is not sufficient to enclose a single value in parentheses).
Ugly, but effective. For example:

\begin{verbatim}
>>> empty = ()
>>> singleton = 'hello', # <-- note trailing comma
>>> len(empty)
0
>>> len(singleton)
1
>>> singleton
('hello',)
\end{verbatim}

The statement \code{t = 12345, 54321, 'hello!'} is an example of
\emph{tuple packing}: the values \code{12345}, \code{54321} and
\code{'hello!'} are packed together in a tuple. The reverse operation
is also possible:

\begin{verbatim}
>>> x, y, z = t
\end{verbatim}

This is called, appropriately enough, \emph{sequence unpacking}.
Sequence unpacking requires the list of variables on the left to
have the same number of elements as the length of the sequence. Note
that multiple assignment is really just a combination of tuple packing
and sequence unpacking!

There is a small bit of asymmetry here: packing multiple values
always creates a tuple, and unpacking works for any sequence.
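The points about packing and unpacking can be illustrated with a few
assertions. This is a sketch with arbitrary sample values; the name
\code{mismatch} is invented for the example.

```python
t = 12345, 54321, 'hello!'      # tuple packing
x, y, z = t                     # sequence unpacking
assert (x, y, z) == (12345, 54321, 'hello!')

# Unpacking works for any sequence, for example a list or a string.
a, b = [1, 2]
c, d, e = 'abc'
assert (a, b, c, d, e) == (1, 2, 'a', 'b', 'c')

# Multiple assignment packs the right-hand side into a tuple and then
# unpacks it, which is why two names can be swapped in one statement.
a, b = b, a
assert (a, b) == (2, 1)

# The number of names must match the length of the sequence.
try:
    x, y = t
except ValueError:
    mismatch = True
else:
    mismatch = False
assert mismatch
```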

% XXX Add a bit on the difference between tuples and lists.


\section{Sets \label{sets}}

Python also includes a data type for \emph{sets}. A set is an unordered
collection with no duplicate elements. Basic uses include membership
testing and eliminating duplicate entries. Set objects also support
mathematical operations like union, intersection, difference, and
symmetric difference.

Here is a brief demonstration:

\begin{verbatim}
>>> basket = ['apple', 'orange', 'apple', 'pear', 'orange', 'banana']
>>> fruit = set(basket) # create a set without duplicates
>>> fruit
set(['orange', 'pear', 'apple', 'banana'])
>>> 'orange' in fruit # fast membership testing
True
>>> 'crabgrass' in fruit
False

>>> # Demonstrate set operations on unique letters from two words
...
>>> a = set('abracadabra')
>>> b = set('alacazam')
>>> a # unique letters in a
set(['a', 'r', 'b', 'c', 'd'])
>>> a - b # letters in a but not in b
set(['r', 'd', 'b'])
>>> a | b # letters in either a or b
set(['a', 'c', 'r', 'd', 'b', 'm', 'z', 'l'])
>>> a & b # letters in both a and b
set(['a', 'c'])
>>> a ^ b # letters in a or b but not both
set(['r', 'd', 'b', 'm', 'z', 'l'])
\end{verbatim}


\section{Dictionaries \label{dictionaries}}

Another useful data type built into Python is the
\ulink{\emph{dictionary}}{../lib/typesmapping.html}.
Dictionaries are sometimes found in other languages as ``associative
memories'' or ``associative arrays''. Unlike sequences, which are
indexed by a range of numbers, dictionaries are indexed by \emph{keys},
which can be any immutable type; strings and numbers can always be
keys. Tuples can be used as keys if they contain only strings,
numbers, or tuples; if a tuple contains any mutable object either
directly or indirectly, it cannot be used as a key. You can't use
lists as keys, since lists can be modified in place using
index assignments, slice assignments, or methods like
\method{append()} and \method{extend()}.

It is best to think of a dictionary as an unordered set of
\emph{key: value} pairs, with the requirement that the keys are unique
(within one dictionary).
A pair of braces creates an empty dictionary: \code{\{\}}.
Placing a comma-separated list of key:value pairs within the
braces adds initial key:value pairs to the dictionary; this is also the
way dictionaries are written on output.

The main operations on a dictionary are storing a value with some key
and extracting the value given the key. It is also possible to delete
a key:value pair
with \code{del}.
If you store using a key that is already in use, the old value
associated with that key is forgotten. It is an error to extract a
value using a non-existent key.

The \method{keys()} method of a dictionary object returns a list of all
the keys used in the dictionary, in arbitrary order (if you want it
sorted, just apply the \method{sort()} method to the list of keys). To
check whether a single key is in the dictionary, either use the dictionary's
\method{has_key()} method or the \keyword{in} keyword.

Here is a small example using a dictionary:

\begin{verbatim}
>>> tel = {'jack': 4098, 'sape': 4139}
>>> tel['guido'] = 4127
>>> tel
{'sape': 4139, 'guido': 4127, 'jack': 4098}
>>> tel['jack']
4098
>>> del tel['sape']
>>> tel['irv'] = 4127
>>> tel
{'guido': 4127, 'irv': 4127, 'jack': 4098}
>>> tel.keys()
['guido', 'irv', 'jack']
>>> tel.has_key('guido')
True
>>> 'guido' in tel
True
\end{verbatim}
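The rules about keys and about missing entries stated earlier in this
section can be sketched with a few assertions. The dictionary
\code{grid} and the flag names are invented for the example; wrapping
\code{keys()} in \code{list()} is redundant here but harmless.

```python
tel = {'jack': 4098, 'sape': 4139}

# Storing under a key that is already in use replaces the old value.
tel['jack'] = 4100
assert tel['jack'] == 4100

# Extracting a value with a non-existent key raises KeyError.
try:
    tel['nobody']
except KeyError:
    missing = True
else:
    missing = False
assert missing

# A tuple of immutable items can be a key; a list cannot (TypeError).
grid = {(0, 0): 'origin'}
assert grid[(0, 0)] == 'origin'
try:
    grid[[0, 1]] = 'oops'
except TypeError:
    unhashable = True
else:
    unhashable = False
assert unhashable

# A sorted list of keys: take the key list, then sort it in place.
names = list(tel.keys())
names.sort()
assert names == ['jack', 'sape']
```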

The \function{dict()} constructor builds dictionaries directly from
lists of key-value pairs stored as tuples. When the pairs form a
pattern, list comprehensions can compactly specify the key-value list.

\begin{verbatim}
>>> dict([('sape', 4139), ('guido', 4127), ('jack', 4098)])
{'sape': 4139, 'jack': 4098, 'guido': 4127}
>>> dict([(x, x**2) for x in (2, 4, 6)]) # use a list comprehension
{2: 4, 4: 16, 6: 36}
\end{verbatim}

Later in the tutorial, we will learn about generator expressions,
which are even better suited for the task of supplying key-value pairs to
the \function{dict()} constructor.
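As a brief preview, here is a minimal sketch of the same dictionary
built from a generator expression. The pairs are produced one at a
time, so no intermediate list is created; the name \code{squares} is
chosen for the example.

```python
# The generator expression is the sole argument, so it needs no extra
# parentheses of its own.
squares = dict((x, x**2) for x in (2, 4, 6))
assert squares == {2: 4, 4: 16, 6: 36}

# Same result as the list-comprehension form shown above.
assert squares == dict([(x, x**2) for x in (2, 4, 6)])
```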

When the keys are simple strings, it is sometimes easier to specify
pairs using keyword arguments:

\begin{verbatim}
>>> dict(sape=4139, guido=4127, jack=4098)
{'sape': 4139, 'jack': 4098, 'guido': 4127}
\end{verbatim}


\section{Looping Techniques \label{loopidioms}}

When looping through dictionaries, the key and corresponding value can
be retrieved at the same time using the \method{iteritems()} method.

\begin{verbatim}
>>> knights = {'gallahad': 'the pure', 'robin': 'the brave'}
>>> for k, v in knights.iteritems():
...     print k, v
...
gallahad the pure
robin the brave
\end{verbatim}

When looping through a sequence, the position index and corresponding
value can be retrieved at the same time using the
\function{enumerate()} function.

\begin{verbatim}
>>> for i, v in enumerate(['tic', 'tac', 'toe']):
...     print i, v
...
0 tic
1 tac
2 toe
\end{verbatim}

To loop over two or more sequences at the same time, the entries
can be paired with the \function{zip()} function.

\begin{verbatim}
>>> questions = ['name', 'quest', 'favorite color']
>>> answers = ['lancelot', 'the holy grail', 'blue']
>>> for q, a in zip(questions, answers):
...     print 'What is your %s? It is %s.' % (q, a)
...
What is your name? It is lancelot.
What is your quest? It is the holy grail.
What is your favorite color? It is blue.
\end{verbatim}

To loop over a sequence in reverse, first specify the sequence
in a forward direction and then call the \function{reversed()}
function.

\begin{verbatim}
>>> for i in reversed(xrange(1,10,2)):
...     print i
...
9
7
5
3
1
\end{verbatim}

To loop over a sequence in sorted order, use the \function{sorted()}
function which returns a new sorted list while leaving the source
unaltered.

\begin{verbatim}
>>> basket = ['apple', 'orange', 'apple', 'pear', 'orange', 'banana']
>>> for f in sorted(set(basket)):
...     print f
...
apple
banana
orange
pear
\end{verbatim}

\section{More on Conditions \label{conditions}}

The conditions used in \code{while} and \code{if} statements can
contain any operators, not just comparisons.

The comparison operators \code{in} and \code{not in} check whether a value
occurs (does not occur) in a sequence. The operators \code{is} and
\code{is not} compare whether two objects are really the same object; this
only matters for mutable objects like lists. All comparison operators
have the same priority, which is lower than that of all numerical
operators.

Comparisons can be chained. For example, \code{a < b == c} tests
whether \code{a} is less than \code{b} and moreover \code{b} equals
\code{c}.
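A few assertions may make these operators concrete. The values are
arbitrary sample data chosen for the illustration.

```python
a, b, c = 1, 2, 2

# A chained comparison is the conjunction of its adjacent pairs.
assert a < b == c
assert (a < b == c) == (a < b and b == c)

# It is not the same as comparing the result of the first test with c:
# (a < b) is True, and True == 2 is false.
assert ((a < b) == c) is False

# 'is' tests identity, '==' tests equality.
x = [1, 2]
y = [1, 2]          # equal to x, but a distinct list object
z = x               # another name for the very same object
assert x == y and x is not y
assert x is z

# 'in' and 'not in' test membership in a sequence.
assert 2 in x and 3 not in x
```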

Comparisons may be combined using the Boolean operators \code{and} and
\code{or}, and the outcome of a comparison (or of any other Boolean
expression) may be negated with \code{not}. These have lower
priorities than comparison operators; between them, \code{not} has
the highest priority and \code{or} the lowest, so that
\code{A and not B or C} is equivalent to \code{(A and (not B)) or C}.
As always, parentheses can be used to express the desired composition.

The Boolean operators \code{and} and \code{or} are so-called
\emph{short-circuit} operators: their arguments are evaluated from
left to right, and evaluation stops as soon as the outcome is
determined. For example, if \code{A} and \code{C} are true but
\code{B} is false, \code{A and B and C} does not evaluate the
expression \code{C}. When used as a general value and not as a
Boolean, the return value of a short-circuit operator is the last
evaluated argument.
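Short-circuit evaluation and the priority rule can be observed with a
helper that records which operands were actually evaluated. The helper
\code{operand} and the list \code{evaluated} are invented for this
sketch.

```python
evaluated = []

def operand(name, value):
    # Record that this operand was evaluated, then hand back its value.
    evaluated.append(name)
    return value

# A and C are true but B is false: C is never evaluated.
result = operand('A', True) and operand('B', False) and operand('C', True)
assert result is False
assert evaluated == ['A', 'B']

# 'not' binds tightest and 'or' loosest, for every combination of values.
for A in (True, False):
    for B in (True, False):
        for C in (True, False):
            assert (A and not B or C) == ((A and (not B)) or C)

# The value of the expression is the last argument that was evaluated.
assert ('' or 'Trondheim') == 'Trondheim'
assert ('spam' and 0) == 0
```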

It is possible to assign the result of a comparison or other Boolean
expression to a variable. For example,

\begin{verbatim}
>>> string1, string2, string3 = '', 'Trondheim', 'Hammer Dance'
>>> non_null = string1 or string2 or string3
>>> non_null
'Trondheim'
\end{verbatim}

Note that in Python, unlike C, assignment cannot occur inside expressions.
C programmers may grumble about this, but it avoids a common class of
problems encountered in C programs: typing \code{=} in an expression when
\code{==} was intended.


\section{Comparing Sequences and Other Types \label{comparing}}

Sequence objects may be compared to other objects with the same
sequence type. The comparison uses \emph{lexicographical} ordering:
first the first two items are compared, and if they differ this
determines the outcome of the comparison; if they are equal, the next
two items are compared, and so on, until either sequence is exhausted.
If two items to be compared are themselves sequences of the same type,
the lexicographical comparison is carried out recursively. If all
items of two sequences compare equal, the sequences are considered
equal. If one sequence is an initial sub-sequence of the other, the
shorter sequence is the smaller (lesser) one. Lexicographical
ordering for strings uses the \ASCII{} ordering for individual
characters. Some examples of comparisons between sequences of the
same type:

\begin{verbatim}
(1, 2, 3) < (1, 2, 4)
[1, 2, 3] < [1, 2, 4]
'ABC' < 'C' < 'Pascal' < 'Python'
(1, 2, 3, 4) < (1, 2, 4)
(1, 2) < (1, 2, -1)
(1, 2, 3) == (1.0, 2.0, 3.0)
(1, 2, ('aa', 'ab')) < (1, 2, ('abc', 'a'), 4)
\end{verbatim}
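The lexicographical rule can be written out as a short function. This
is a sketch of the rule as described above, not the interpreter's own
comparison code; \code{lex_less} is a made-up name, and nested
sequences are handled by letting the ordinary \code{<} operator compare
the differing items.

```python
def lex_less(s, t):
    """Return true if sequence s sorts before sequence t."""
    for x, y in zip(s, t):
        if x != y:
            # The first pair of items that differ decides the outcome.
            return x < y
    # All common items are equal: the shorter sequence is the smaller one.
    return len(s) < len(t)

# The examples from the text, checked against the sketch.
assert lex_less((1, 2, 3), (1, 2, 4))
assert lex_less([1, 2, 3], [1, 2, 4])
assert lex_less('ABC', 'C') and lex_less('C', 'Pascal')
assert lex_less('Pascal', 'Python')
assert lex_less((1, 2, 3, 4), (1, 2, 4))
assert lex_less((1, 2), (1, 2, -1))
assert lex_less((1, 2, ('aa', 'ab')), (1, 2, ('abc', 'a'), 4))

# Equal sequences: neither one sorts before the other.
assert not lex_less((1, 2, 3), (1.0, 2.0, 3.0))
assert not lex_less((1.0, 2.0, 3.0), (1, 2, 3))
```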

Note that comparing objects of different types is legal. The outcome
is deterministic but arbitrary: the types are ordered by their name.
Thus, a list is always smaller than a string, a string is always
smaller than a tuple, etc. \footnote{
        The rules for comparing objects of different types should
        not be relied upon; they may change in a future version of
        the language.
} Mixed numeric types are compared according to their numeric value, so
0 equals 0.0, etc.


\chapter{Modules \label{modules}}

If you quit from the Python interpreter and enter it again, the
definitions you have made (functions and variables) are lost.
Therefore, if you want to write a somewhat longer program, you are
better off using a text editor to prepare the input for the interpreter
and running it with that file as input instead. This is known as creating a
\emph{script}. As your program gets longer, you may want to split it
into several files for easier maintenance. You may also want to use a
handy function that you've written in several programs without copying
its definition into each program.

To support this, Python has a way to put definitions in a file and use
them in a script or in an interactive instance of the interpreter.
Such a file is called a \emph{module}; definitions from a module can be
\emph{imported} into other modules or into the \emph{main} module (the
collection of variables that you have access to in a script
executed at the top level
and in calculator mode).

A module is a file containing Python definitions and statements. The
file name is the module name with the suffix \file{.py} appended. Within
a module, the module's name (as a string) is available as the value of
the global variable \code{__name__}. For instance, use your favorite text
editor to create a file called \file{fibo.py} in the current directory
with the following contents:

\begin{verbatim}
# Fibonacci numbers module

def fib(n): # write Fibonacci series up to n
    a, b = 0, 1
    while b < n:
        print b,
        a, b = b, a+b

def fib2(n): # return Fibonacci series up to n
    result = []
    a, b = 0, 1
    while b < n:
        result.append(b)
        a, b = b, a+b
    return result
\end{verbatim}

Now enter the Python interpreter and import this module with the
following command:

\begin{verbatim}
>>> import fibo
\end{verbatim}

This does not enter the names of the functions defined in \code{fibo}
directly in the current symbol table; it only enters the module name
\code{fibo} there.
Using the module name you can access the functions:

\begin{verbatim}
>>> fibo.fib(1000)
1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987
>>> fibo.fib2(100)
[1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
>>> fibo.__name__
'fibo'
\end{verbatim}

If you intend to use a function often you can assign it to a local name:

\begin{verbatim}
>>> fib = fibo.fib
>>> fib(500)
1 1 2 3 5 8 13 21 34 55 89 144 233 377
\end{verbatim}


\section{More on Modules \label{moreModules}}

A module can contain executable statements as well as function
definitions.
These statements are intended to initialize the module.
They are executed only the
\emph{first} time the module is imported somewhere.\footnote{
        In fact function definitions are also `statements' that are
        `executed'; the execution enters the function name in the
        module's global symbol table.
}

Each module has its own private symbol table, which is used as the
global symbol table by all functions defined in the module.
Thus, the author of a module can use global variables in the module
without worrying about accidental clashes with a user's global
variables.
On the other hand, if you know what you are doing you can touch a
module's global variables with the same notation used to refer to its
functions,
\code{modname.itemname}.
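These properties can be observed with a throwaway module. The sketch
below writes a small file into a temporary directory and imports it;
the module name \code{initdemo}, its contents, and the use of a
temporary directory are all assumptions made for the illustration.

```python
import os
import sys
import tempfile

# Create a hypothetical module 'initdemo' in a fresh temporary directory.
directory = tempfile.mkdtemp()
source = open(os.path.join(directory, 'initdemo.py'), 'w')
source.write("counter = 0    # initialization statement, a module global\n"
             "def bump():\n"
             "    global counter\n"
             "    counter += 1\n")
source.close()
sys.path.insert(0, directory)   # make the directory importable

import initdemo
initdemo.bump()                 # the function updates its own module's global
assert initdemo.counter == 1

counter = 'unrelated'           # a same-named variable here does not clash
initdemo.bump()
assert initdemo.counter == 2 and counter == 'unrelated'

import initdemo                 # a second import does not re-run the module
assert initdemo.counter == 2    # it would be 0 if 'counter = 0' ran again

initdemo.counter = 10           # modname.itemname reaches the module's globals
initdemo.bump()
assert initdemo.counter == 11
```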

Modules can import other modules. It is customary but not required to
place all \keyword{import} statements at the beginning of a module (or
script, for that matter). The imported module names are placed in the
importing module's global symbol table.

There is a variant of the \keyword{import} statement that imports
names from a module directly into the importing module's symbol
table. For example:

\begin{verbatim}
>>> from fibo import fib, fib2
>>> fib(500)
1 1 2 3 5 8 13 21 34 55 89 144 233 377
\end{verbatim}

This does not introduce the module name from which the imports are taken
in the local symbol table (so in the example, \code{fibo} is not
defined).

There is even a variant to import all names that a module defines:

\begin{verbatim}
>>> from fibo import *
>>> fib(500)
1 1 2 3 5 8 13 21 34 55 89 144 233 377
\end{verbatim}

This imports all names except those beginning with an underscore
(\code{_}).


\subsection{The Module Search Path \label{searchPath}}

\indexiii{module}{search}{path}
When a module named \module{spam} is imported, the interpreter searches
for a file named \file{spam.py} in the current directory,
and then in the list of directories specified by
the environment variable \envvar{PYTHONPATH}. This has the same syntax as
the shell variable \envvar{PATH}, that is, a list of
directory names. When \envvar{PYTHONPATH} is not set, or when the file
is not found there, the search continues in an installation-dependent
default path; on \UNIX, this is usually \file{.:/usr/local/lib/python}.

Actually, modules are searched in the list of directories given by the
variable \code{sys.path} which is initialized from the directory
containing the input script (or the current directory),
\envvar{PYTHONPATH} and the installation-dependent default. This allows
Python programs that know what they're doing to modify or replace the
module search path. Note that because the directory containing the
script being run is on the search path, it is important that the
script not have the same name as a standard module, or Python will
attempt to load the script as a module when that module is imported.
This will generally be an error. See section~\ref{standardModules},
``Standard Modules,'' for more information.
2549
2550
2551\subsection{``Compiled'' Python files}
2552
2553As an important speed-up of the start-up time for short programs that
2554use a lot of standard modules, if a file called \file{spam.pyc} exists
2555in the directory where \file{spam.py} is found, this is assumed to
2556contain an already-``byte-compiled'' version of the module \module{spam}.
2557The modification time of the version of \file{spam.py} used to create
2558\file{spam.pyc} is recorded in \file{spam.pyc}, and the
2559\file{.pyc} file is ignored if these don't match.
2560
2561Normally, you don't need to do anything to create the
2562\file{spam.pyc} file. Whenever \file{spam.py} is successfully
2563compiled, an attempt is made to write the compiled version to
2564\file{spam.pyc}. It is not an error if this attempt fails; if for any
2565reason the file is not written completely, the resulting
2566\file{spam.pyc} file will be recognized as invalid and thus ignored
2567later. The contents of the \file{spam.pyc} file are platform
2568independent, so a Python module directory can be shared by machines of
2569different architectures.
2570
2571Some tips for experts:
2572
2573\begin{itemize}
2574
2575\item
2576When the Python interpreter is invoked with the \programopt{-O} flag,
2577optimized code is generated and stored in \file{.pyo} files. The
2578optimizer currently doesn't help much; it only removes
2579\keyword{assert} statements. When \programopt{-O} is used, \emph{all}
2580bytecode is optimized; \code{.pyc} files are ignored and \code{.py}
2581files are compiled to optimized bytecode.
2582
2583\item
2584Passing two \programopt{-O} flags to the Python interpreter
2585(\programopt{-OO}) will cause the bytecode compiler to perform
2586optimizations that could in some rare cases result in malfunctioning
2587programs. Currently only \code{__doc__} strings are removed from the
2588bytecode, resulting in more compact \file{.pyo} files. Since some
2589programs may rely on having these available, you should only use this
2590option if you know what you're doing.
2591
2592\item
2593A program doesn't run any faster when it is read from a \file{.pyc} or
2594\file{.pyo} file than when it is read from a \file{.py} file; the only
2595thing that's faster about \file{.pyc} or \file{.pyo} files is the
2596speed with which they are loaded.
2597
2598\item
2599When a script is run by giving its name on the command line, the
2600bytecode for the script is never written to a \file{.pyc} or
2601\file{.pyo} file. Thus, the startup time of a script may be reduced
2602by moving most of its code to a module and having a small bootstrap
2603script that imports that module. It is also possible to name a
2604\file{.pyc} or \file{.pyo} file directly on the command line.
2605
2606\item
2607It is possible to have a file called \file{spam.pyc} (or
2608\file{spam.pyo} when \programopt{-O} is used) without a file
2609\file{spam.py} for the same module. This can be used to distribute a
2610library of Python code in a form that is moderately hard to reverse
2611engineer.
2612
2613\item
2614The module \ulink{\module{compileall}}{../lib/module-compileall.html}%
2615{} \refstmodindex{compileall} can create \file{.pyc} files (or
2616\file{.pyo} files when \programopt{-O} is used) for all modules in a
2617directory.
2618
2619\end{itemize}
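
As a minimal sketch of the bytecode files discussed above, the standard
\module{py_compile} module can compile a single source file on request.
The scratch directory and the file names are assumptions made for this
illustration, and the explicit \code{cfile} argument is passed only so
that the location of the result does not depend on the Python version
in use.

```python
import os
import py_compile
import tempfile

# Hypothetical scratch directory and module name, used only for this sketch.
workdir = tempfile.mkdtemp()
source_path = os.path.join(workdir, 'spam.py')
compiled_path = os.path.join(workdir, 'spam.pyc')

# Write a tiny module to compile.
source = open(source_path, 'w')
source.write('def greet():\n    return "spam"\n')
source.close()

# Byte-compile it explicitly instead of waiting for the first import.
py_compile.compile(source_path, cfile=compiled_path)

compiled_exists = os.path.exists(compiled_path)
```

The \module{compileall} module mentioned in the last item does the same
job for every module in a directory.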
2620
2621
2622\section{Standard Modules \label{standardModules}}
2623
Python comes with a library of standard modules, described in a separate
document, the \citetitle[../lib/lib.html]{Python Library Reference}
(``Library Reference'' hereafter). Some modules are built into the
interpreter; these provide access to operations that are not part of
the core of the language but are nevertheless built in, either for
efficiency or to provide access to operating system primitives such as
system calls. The set of such modules is a configuration option which
also depends on the underlying platform. For example,
the \module{amoeba} module is only provided on systems that somehow
support Amoeba primitives. One particular module deserves some
attention: \ulink{\module{sys}}{../lib/module-sys.html}%
\refstmodindex{sys}, which is built into every
Python interpreter. The variables \code{sys.ps1} and
\code{sys.ps2} define the strings used as primary and secondary
prompts:
2639
2640\begin{verbatim}
2641>>> import sys
2642>>> sys.ps1
2643'>>> '
2644>>> sys.ps2
2645'... '
2646>>> sys.ps1 = 'C> '
2647C> print 'Yuck!'
2648Yuck!
2649C>
2650
2651\end{verbatim}
2652
2653These two variables are only defined if the interpreter is in
2654interactive mode.
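
Because the prompts exist only in interactive mode, a script that wants
to look at them should not assume they are present. The following is
one possible way to check, offered as a sketch; the variable names are
made up for the example.

```python
import sys

# sys.ps1 exists only when the interpreter is running interactively.
is_interactive = hasattr(sys, 'ps1')

# getattr() with a default avoids an AttributeError inside a script.
primary_prompt = getattr(sys, 'ps1', None)
secondary_prompt = getattr(sys, 'ps2', None)
```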
2655
2656The variable \code{sys.path} is a list of strings that determines the
2657interpreter's search path for modules. It is initialized to a default
2658path taken from the environment variable \envvar{PYTHONPATH}, or from
2659a built-in default if \envvar{PYTHONPATH} is not set. You can modify
2660it using standard list operations:
2661
2662\begin{verbatim}
2663>>> import sys
2664>>> sys.path.append('/ufs/guido/lib/python')
2665\end{verbatim}
2666
2667\section{The \function{dir()} Function \label{dir}}
2668
2669The built-in function \function{dir()} is used to find out which names
2670a module defines. It returns a sorted list of strings:
2671
2672\begin{verbatim}
2673>>> import fibo, sys
2674>>> dir(fibo)
2675['__name__', 'fib', 'fib2']
2676>>> dir(sys)
2677['__displayhook__', '__doc__', '__excepthook__', '__name__', '__stderr__',
2678 '__stdin__', '__stdout__', '_getframe', 'api_version', 'argv',
2679 'builtin_module_names', 'byteorder', 'callstats', 'copyright',
2680 'displayhook', 'exc_clear', 'exc_info', 'exc_type', 'excepthook',
2681 'exec_prefix', 'executable', 'exit', 'getdefaultencoding', 'getdlopenflags',
2682 'getrecursionlimit', 'getrefcount', 'hexversion', 'maxint', 'maxunicode',
2683 'meta_path', 'modules', 'path', 'path_hooks', 'path_importer_cache',
2684 'platform', 'prefix', 'ps1', 'ps2', 'setcheckinterval', 'setdlopenflags',
2685 'setprofile', 'setrecursionlimit', 'settrace', 'stderr', 'stdin', 'stdout',
2686 'version', 'version_info', 'warnoptions']
2687\end{verbatim}
2688
2689Without arguments, \function{dir()} lists the names you have defined
2690currently:
2691
2692\begin{verbatim}
2693>>> a = [1, 2, 3, 4, 5]
2694>>> import fibo
2695>>> fib = fibo.fib
2696>>> dir()
2697['__builtins__', '__doc__', '__file__', '__name__', 'a', 'fib', 'fibo', 'sys']
2698\end{verbatim}
2699
2700Note that it lists all types of names: variables, modules, functions, etc.
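
Since \function{dir()} returns an ordinary list of strings, the result
can be processed like any other list. The sketch below, which uses
\module{sys} only because it is always available, keeps the names that
do not start with an underscore; the variable names are assumptions of
the example.

```python
import sys

# dir() returns a sorted list of strings naming the module's attributes.
all_names = dir(sys)

# Keep only the names that do not start with an underscore.
public_names = [name for name in all_names if not name.startswith('_')]
```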
2701
2702\function{dir()} does not list the names of built-in functions and
2703variables. If you want a list of those, they are defined in the
2704standard module \module{__builtin__}\refbimodindex{__builtin__}:
2705
2706\begin{verbatim}
2707>>> import __builtin__
2708>>> dir(__builtin__)
2709['ArithmeticError', 'AssertionError', 'AttributeError', 'DeprecationWarning',
2710 'EOFError', 'Ellipsis', 'EnvironmentError', 'Exception', 'False',
2711 'FloatingPointError', 'FutureWarning', 'IOError', 'ImportError',
2712 'IndentationError', 'IndexError', 'KeyError', 'KeyboardInterrupt',
2713 'LookupError', 'MemoryError', 'NameError', 'None', 'NotImplemented',
2714 'NotImplementedError', 'OSError', 'OverflowError',
2715 'PendingDeprecationWarning', 'ReferenceError', 'RuntimeError',
2716 'RuntimeWarning', 'StandardError', 'StopIteration', 'SyntaxError',
2717 'SyntaxWarning', 'SystemError', 'SystemExit', 'TabError', 'True',
2718 'TypeError', 'UnboundLocalError', 'UnicodeDecodeError',
2719 'UnicodeEncodeError', 'UnicodeError', 'UnicodeTranslateError',
2720 'UserWarning', 'ValueError', 'Warning', 'WindowsError',
2721 'ZeroDivisionError', '_', '__debug__', '__doc__', '__import__',
2722 '__name__', 'abs', 'apply', 'basestring', 'bool', 'buffer',
2723 'callable', 'chr', 'classmethod', 'cmp', 'coerce', 'compile',
2724 'complex', 'copyright', 'credits', 'delattr', 'dict', 'dir', 'divmod',
2725 'enumerate', 'eval', 'execfile', 'exit', 'file', 'filter', 'float',
2726 'frozenset', 'getattr', 'globals', 'hasattr', 'hash', 'help', 'hex',
2727 'id', 'input', 'int', 'intern', 'isinstance', 'issubclass', 'iter',
2728 'len', 'license', 'list', 'locals', 'long', 'map', 'max', 'min',
2729 'object', 'oct', 'open', 'ord', 'pow', 'property', 'quit', 'range',
2730 'raw_input', 'reduce', 'reload', 'repr', 'reversed', 'round', 'set',
2731 'setattr', 'slice', 'sorted', 'staticmethod', 'str', 'sum', 'super',
2732 'tuple', 'type', 'unichr', 'unicode', 'vars', 'xrange', 'zip']
2733\end{verbatim}
2734
2735
2736\section{Packages \label{packages}}
2737
2738Packages are a way of structuring Python's module namespace
2739by using ``dotted module names''. For example, the module name
2740\module{A.B} designates a submodule named \samp{B} in a package named
2741\samp{A}. Just like the use of modules saves the authors of different
2742modules from having to worry about each other's global variable names,
2743the use of dotted module names saves the authors of multi-module
2744packages like NumPy or the Python Imaging Library from having to worry
2745about each other's module names.
2746
2747Suppose you want to design a collection of modules (a ``package'') for
2748the uniform handling of sound files and sound data. There are many
2749different sound file formats (usually recognized by their extension,
2750for example: \file{.wav}, \file{.aiff}, \file{.au}), so you may need
2751to create and maintain a growing collection of modules for the
2752conversion between the various file formats. There are also many
2753different operations you might want to perform on sound data (such as
2754mixing, adding echo, applying an equalizer function, creating an
2755artificial stereo effect), so in addition you will be writing a
2756never-ending stream of modules to perform these operations. Here's a
2757possible structure for your package (expressed in terms of a
2758hierarchical filesystem):
2759
\begin{verbatim}
Sound/                          Top-level package
      __init__.py               Initialize the sound package
      Formats/                  Subpackage for file format conversions
              __init__.py
              wavread.py
              wavwrite.py
              aiffread.py
              aiffwrite.py
              auread.py
              auwrite.py
              ...
      Effects/                  Subpackage for sound effects
              __init__.py
              echo.py
              surround.py
              reverse.py
              ...
      Filters/                  Subpackage for filters
              __init__.py
              equalizer.py
              vocoder.py
              karaoke.py
              ...
\end{verbatim}
2785
2786When importing the package, Python searches through the directories
2787on \code{sys.path} looking for the package subdirectory.
2788
2789The \file{__init__.py} files are required to make Python treat the
2790directories as containing packages; this is done to prevent
2791directories with a common name, such as \samp{string}, from
2792unintentionally hiding valid modules that occur later on the module
2793search path. In the simplest case, \file{__init__.py} can just be an
2794empty file, but it can also execute initialization code for the
2795package or set the \code{__all__} variable, described later.
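
To make the role of \file{__init__.py} concrete, the following sketch
builds a cut-down version of the \module{Sound} package in a temporary
directory and imports a submodule from it. The helper
\code{write_file()}, the temporary location, and the body given to
\function{echofilter()} are all assumptions made for the illustration;
only the directory layout follows the text.

```python
import os
import sys
import tempfile

def write_file(path, text):
    # Small helper for this sketch: create a file with the given text.
    f = open(path, 'w')
    f.write(text)
    f.close()

root = tempfile.mkdtemp()
effects_dir = os.path.join(root, 'Sound', 'Effects')
os.makedirs(effects_dir)

# Empty __init__.py files mark both directories as packages.
write_file(os.path.join(root, 'Sound', '__init__.py'), '')
write_file(os.path.join(effects_dir, '__init__.py'), '')

# A stand-in submodule; the real echo effect is not part of the tutorial.
write_file(os.path.join(effects_dir, 'echo.py'),
           'def echofilter(samples, delay=0.7, atten=4):\n'
           '    return [s / float(atten) for s in samples]\n')

# The package is found because its parent directory is on sys.path.
sys.path.insert(0, root)
import Sound.Effects.echo

quieter = Sound.Effects.echo.echofilter([4, 8], atten=4)
```

Removing either \file{__init__.py} file before the import is a simple
way to see what the files are for.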
2796
2797Users of the package can import individual modules from the
2798package, for example:
2799
2800\begin{verbatim}
2801import Sound.Effects.echo
2802\end{verbatim}
2803
2804This loads the submodule \module{Sound.Effects.echo}. It must be referenced
2805with its full name.
2806
2807\begin{verbatim}
2808Sound.Effects.echo.echofilter(input, output, delay=0.7, atten=4)
2809\end{verbatim}
2810
2811An alternative way of importing the submodule is:
2812
2813\begin{verbatim}
2814from Sound.Effects import echo
2815\end{verbatim}
2816
2817This also loads the submodule \module{echo}, and makes it available without
2818its package prefix, so it can be used as follows:
2819
2820\begin{verbatim}
2821echo.echofilter(input, output, delay=0.7, atten=4)
2822\end{verbatim}
2823
2824Yet another variation is to import the desired function or variable directly:
2825
2826\begin{verbatim}
2827from Sound.Effects.echo import echofilter
2828\end{verbatim}
2829
2830Again, this loads the submodule \module{echo}, but this makes its function
2831\function{echofilter()} directly available:
2832
2833\begin{verbatim}
2834echofilter(input, output, delay=0.7, atten=4)
2835\end{verbatim}
2836
2837Note that when using \code{from \var{package} import \var{item}}, the
2838item can be either a submodule (or subpackage) of the package, or some
2839other name defined in the package, like a function, class or
2840variable. The \code{import} statement first tests whether the item is
2841defined in the package; if not, it assumes it is a module and attempts
2842to load it. If it fails to find it, an
2843\exception{ImportError} exception is raised.
2844
2845Contrarily, when using syntax like \code{import
2846\var{item.subitem.subsubitem}}, each item except for the last must be
2847a package; the last item can be a module or a package but can't be a
2848class or function or variable defined in the previous item.
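
The difference between the two forms can be tried out with the standard
\module{os.path} module, as in this sketch: naming a function as the
last item of a dotted \code{import} fails, while the \code{from} form
accepts it. The variable names are assumptions of the example.

```python
import os.path

# "import a.b.c" requires the last item to be a module or package,
# so naming a function there raises ImportError.
try:
    import os.path.join
    dotted_import_worked = True
except ImportError:
    dotted_import_worked = False

# The from form accepts any name defined in the module.
from os.path import join

joined = join('Sound', 'Effects')
```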
2849
2850\subsection{Importing * From a Package \label{pkg-import-star}}
2851%The \code{__all__} Attribute
2852
2853\ttindex{__all__}
2854Now what happens when the user writes \code{from Sound.Effects import
2855*}? Ideally, one would hope that this somehow goes out to the
2856filesystem, finds which submodules are present in the package, and
2857imports them all. Unfortunately, this operation does not work very
2858well on Mac and Windows platforms, where the filesystem does not
2859always have accurate information about the case of a filename! On
2860these platforms, there is no guaranteed way to know whether a file
2861\file{ECHO.PY} should be imported as a module \module{echo},
2862\module{Echo} or \module{ECHO}. (For example, Windows 95 has the
2863annoying practice of showing all file names with a capitalized first
2864letter.) The DOS 8+3 filename restriction adds another interesting
2865problem for long module names.
2866
The only solution is for the package author to provide an explicit
index of the package. The import statement uses the following
convention: if a package's \file{__init__.py} code defines a list
named \code{__all__}, it is taken to be the list of module names that
should be imported when \code{from \var{package} import *} is
encountered. It is up to the package author to keep this list
up-to-date when a new version of the package is released. Package
authors may also decide not to support it, if they don't see a use for
importing * from their package. For example, the file
\file{Sound/Effects/__init__.py} could contain the following code:

\begin{verbatim}
__all__ = ["echo", "surround", "reverse"]
\end{verbatim}

This would mean that \code{from Sound.Effects import *} would
import the three named submodules of the \module{Sound.Effects} package.
2884
2885If \code{__all__} is not defined, the statement \code{from Sound.Effects
2886import *} does \emph{not} import all submodules from the package
2887\module{Sound.Effects} into the current namespace; it only ensures that the
2888package \module{Sound.Effects} has been imported (possibly running any
2889initialization code in \file{__init__.py}) and then imports whatever names are
2890defined in the package. This includes any names defined (and
2891submodules explicitly loaded) by \file{__init__.py}. It also includes any
2892submodules of the package that were explicitly loaded by previous
2893import statements. Consider this code:
2894
2895\begin{verbatim}
2896import Sound.Effects.echo
2897import Sound.Effects.surround
2898from Sound.Effects import *
2899\end{verbatim}
2900
2901In this example, the echo and surround modules are imported in the
2902current namespace because they are defined in the
2903\module{Sound.Effects} package when the \code{from...import} statement
2904is executed. (This also works when \code{__all__} is defined.)
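
The effect of \code{__all__} can be observed with a small experiment.
In this sketch a flat, hypothetical package called \module{SoundDemo}
stands in for \module{Sound.Effects}; it has three submodules but lists
only two of them in \code{__all__}. The star import is run with
\code{exec} in a separate dictionary purely so that the imported names
are easy to inspect.

```python
import os
import sys
import tempfile

def write_file(path, text):
    # Small helper for this sketch: create a file with the given text.
    f = open(path, 'w')
    f.write(text)
    f.close()

root = tempfile.mkdtemp()
package_dir = os.path.join(root, 'SoundDemo')
os.mkdir(package_dir)

# The package index names only two of the three submodules.
write_file(os.path.join(package_dir, '__init__.py'),
           '__all__ = ["echo", "surround"]\n')
for submodule in ('echo', 'surround', 'reverse'):
    write_file(os.path.join(package_dir, submodule + '.py'),
               'NAME = %r\n' % submodule)

sys.path.insert(0, root)

# Run the star import in its own namespace dictionary.
namespace = {}
exec('from SoundDemo import *', namespace)

imported = sorted(name for name in namespace if not name.startswith('_'))
```

Only the submodules named in \code{__all__} end up in \code{imported};
\module{reverse} is never loaded.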
2905
2906Note that in general the practice of importing \code{*} from a module or
2907package is frowned upon, since it often causes poorly readable code.
2908However, it is okay to use it to save typing in interactive sessions,
2909and certain modules are designed to export only names that follow
2910certain patterns.
2911
2912Remember, there is nothing wrong with using \code{from Package
2913import specific_submodule}! In fact, this is the
2914recommended notation unless the importing module needs to use
2915submodules with the same name from different packages.
2916
2917
2918\subsection{Intra-package References}
2919
2920The submodules often need to refer to each other. For example, the
2921\module{surround} module might use the \module{echo} module. In fact,
2922such references are so common that the \keyword{import} statement
2923first looks in the containing package before looking in the standard
2924module search path. Thus, the \module{surround} module can simply use
2925\code{import echo} or \code{from echo import echofilter}. If the
2926imported module is not found in the current package (the package of
2927which the current module is a submodule), the \keyword{import}
2928statement looks for a top-level module with the given name.
2929
When packages are structured into subpackages (as with the
\module{Sound} package in the example), there's no shortcut to refer
to submodules of sibling packages; the full name of the subpackage
must be used. For example, if the module
\module{Sound.Filters.vocoder} needs to use the \module{echo} module
in the \module{Sound.Effects} package, it can use \code{from
Sound.Effects import echo}.
2937
Starting with Python 2.5, in addition to the implicit relative imports
described above, you can write explicit relative imports with the
\code{from module import name} form of the import statement. These
explicit relative imports use leading dots to indicate the current and
parent packages involved in the relative import. From the
\module{surround} module for example, you might use:
2944
2945\begin{verbatim}
2946from . import echo
2947from .. import Formats
2948from ..Filters import equalizer
2949\end{verbatim}
2950
2951Note that both explicit and implicit relative imports are based on the
2952name of the current module. Since the name of the main module is always
2953\code{"__main__"}, modules intended for use as the main module of a
2954Python application should always use absolute imports.
2955
2956\subsection{Packages in Multiple Directories}
2957
2958Packages support one more special attribute, \member{__path__}. This
2959is initialized to be a list containing the name of the directory
2960holding the package's \file{__init__.py} before the code in that file
2961is executed. This variable can be modified; doing so affects future
2962searches for modules and subpackages contained in the package.
2963
2964While this feature is not often needed, it can be used to extend the
2965set of modules found in a package.
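
As a sketch of how \member{__path__} can extend a package, the example
below creates a hypothetical package \module{toolbox_demo} in one
temporary directory and a module \file{bonus.py} in another, then makes
the second directory part of the package's search path. All names and
locations are assumptions made for the illustration.

```python
import os
import sys
import tempfile

def write_file(path, text):
    # Small helper for this sketch: create a file with the given text.
    f = open(path, 'w')
    f.write(text)
    f.close()

root = tempfile.mkdtemp()
extra_dir = tempfile.mkdtemp()

package_dir = os.path.join(root, 'toolbox_demo')
os.mkdir(package_dir)
write_file(os.path.join(package_dir, '__init__.py'), '')

# This module lives outside the package directory.
write_file(os.path.join(extra_dir, 'bonus.py'), 'VALUE = 42\n')

sys.path.insert(0, root)
import toolbox_demo

# Initially __path__ holds just the directory containing __init__.py.
original_path = list(toolbox_demo.__path__)

# Extend the search path, then import the submodule found there.
toolbox_demo.__path__.append(extra_dir)
import toolbox_demo.bonus

value = toolbox_demo.bonus.VALUE
```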
2966
2967
2968
2969\chapter{Input and Output \label{io}}
2970
2971There are several ways to present the output of a program; data can be
2972printed in a human-readable form, or written to a file for future use.
2973This chapter will discuss some of the possibilities.
2974
2975
2976\section{Fancier Output Formatting \label{formatting}}
2977
2978So far we've encountered two ways of writing values: \emph{expression
2979statements} and the \keyword{print} statement. (A third way is using
2980the \method{write()} method of file objects; the standard output file
2981can be referenced as \code{sys.stdout}. See the Library Reference for
2982more information on this.)
2983
2984Often you'll want more control over the formatting of your output than
2985simply printing space-separated values. There are two ways to format
2986your output; the first way is to do all the string handling yourself;
2987using string slicing and concatenation operations you can create any
2988layout you can imagine. The standard module
2989\module{string}\refstmodindex{string} contains some useful operations
2990for padding strings to a given column width; these will be discussed
2991shortly. The second way is to use the \code{\%} operator with a
2992string as the left argument. The \code{\%} operator interprets the
2993left argument much like a \cfunction{sprintf()}-style format
2994string to be applied to the right argument, and returns the string
2995resulting from this formatting operation.
2996
2997One question remains, of course: how do you convert values to strings?
2998Luckily, Python has ways to convert any value to a string: pass it to
2999the \function{repr()} or \function{str()} functions. Reverse quotes
3000(\code{``}) are equivalent to \function{repr()}, but they are no
3001longer used in modern Python code and will likely not be in future
3002versions of the language.
3003
The \function{str()} function is meant to return representations of
values which are fairly human-readable, while \function{repr()} is
meant to generate representations which can be read by the interpreter
(or will force a \exception{SyntaxError} if there is no equivalent
syntax). For objects which don't have a particular representation for
human consumption, \function{str()} will return the same value as
\function{repr()}. Many values, such as numbers or structures like
lists and dictionaries, have the same representation using either
function. Strings and floating point numbers, in particular, have two
distinct representations.
3014
3015Some examples:
3016
3017\begin{verbatim}
3018>>> s = 'Hello, world.'
3019>>> str(s)
3020'Hello, world.'
3021>>> repr(s)
3022"'Hello, world.'"
3023>>> str(0.1)
3024'0.1'
3025>>> repr(0.1)
3026'0.10000000000000001'
3027>>> x = 10 * 3.25
3028>>> y = 200 * 200
3029>>> s = 'The value of x is ' + repr(x) + ', and y is ' + repr(y) + '...'
3030>>> print s
3031The value of x is 32.5, and y is 40000...
3032>>> # The repr() of a string adds string quotes and backslashes:
3033... hello = 'hello, world\n'
3034>>> hellos = repr(hello)
3035>>> print hellos
3036'hello, world\n'
3037>>> # The argument to repr() may be any Python object:
3038... repr((x, y, ('spam', 'eggs')))
3039"(32.5, 40000, ('spam', 'eggs'))"
3040>>> # reverse quotes are convenient in interactive sessions:
3041... `x, y, ('spam', 'eggs')`
3042"(32.5, 40000, ('spam', 'eggs'))"
3043\end{verbatim}
3044
3045Here are two ways to write a table of squares and cubes:
3046
\begin{verbatim}
>>> for x in range(1, 11):
...     print repr(x).rjust(2), repr(x*x).rjust(3),
...     # Note trailing comma on previous line
...     print repr(x*x*x).rjust(4)
...
 1   1    1
 2   4    8
 3   9   27
 4  16   64
 5  25  125
 6  36  216
 7  49  343
 8  64  512
 9  81  729
10 100 1000
>>> for x in range(1,11):
...     print '%2d %3d %4d' % (x, x*x, x*x*x)
...
 1   1    1
 2   4    8
 3   9   27
 4  16   64
 5  25  125
 6  36  216
 7  49  343
 8  64  512
 9  81  729
10 100 1000
\end{verbatim}
3077
3078(Note that one space between each column was added by the way
3079\keyword{print} works: it always adds spaces between its arguments.)
3080
3081This example demonstrates the \method{rjust()} method of string objects,
3082which right-justifies a string in a field of a given width by padding
3083it with spaces on the left. There are similar methods
3084\method{ljust()} and \method{center()}. These
3085methods do not write anything, they just return a new string. If
3086the input string is too long, they don't truncate it, but return it
3087unchanged; this will mess up your column lay-out but that's usually
3088better than the alternative, which would be lying about a value. (If
3089you really want truncation you can always add a slice operation, as in
3090\samp{x.ljust(n)[:n]}.)
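
The other two justification methods, and the truncation idiom just
mentioned, can be sketched as follows. The sample strings and the field
width of 10 are arbitrary choices for the example.

```python
name = 'Sjoerd'

# Pad on the right, or on both sides, to a field of 10 characters.
left = name.ljust(10)
centered = name.center(10)

# A string longer than the field is returned unchanged ...
too_long = 'a very long name'.ljust(10)

# ... unless a slice is added to cut it down to the field width.
truncated = 'a very long name'.ljust(10)[:10]
```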
3091
3092There is another method, \method{zfill()}, which pads a
3093numeric string on the left with zeros. It understands about plus and
3094minus signs:
3095
3096\begin{verbatim}
3097>>> '12'.zfill(5)
3098'00012'
3099>>> '-3.14'.zfill(7)
3100'-003.14'
3101>>> '3.14159265359'.zfill(5)
3102'3.14159265359'
3103\end{verbatim}
3104
3105Using the \code{\%} operator looks like this:
3106
3107\begin{verbatim}
3108>>> import math
3109>>> print 'The value of PI is approximately %5.3f.' % math.pi
3110The value of PI is approximately 3.142.
3111\end{verbatim}
3112
3113If there is more than one format in the string, you need to pass a
3114tuple as right operand, as in this example:
3115
\begin{verbatim}
>>> table = {'Sjoerd': 4127, 'Jack': 4098, 'Dcab': 7678}
>>> for name, phone in table.items():
...     print '%-10s ==> %10d' % (name, phone)
...
Jack       ==>       4098
Dcab       ==>       7678
Sjoerd     ==>       4127
\end{verbatim}
3125
3126Most formats work exactly as in C and require that you pass the proper
3127type; however, if you don't you get an exception, not a core dump.
3128The \code{\%s} format is more relaxed: if the corresponding argument is
3129not a string object, it is converted to string using the
3130\function{str()} built-in function. Using \code{*} to pass the width
3131or precision in as a separate (integer) argument is supported. The
3132C formats \code{\%n} and \code{\%p} are not supported.
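
The points made in this paragraph can be checked with a few short
expressions. This is only a sketch; the values are arbitrary, and the
\keyword{try} statement is used here simply to record that a mismatched
type raises an exception instead of crashing the interpreter.

```python
# '*' takes the field width, or the precision, from the argument tuple.
padded = '%*d' % (5, 42)
rounded = '%.*f' % (2, 3.14159)

# %s converts arguments that are not strings by calling str() on them.
relaxed = '%s and %s' % ([1, 2], 3.5)

# A mismatched type raises an exception rather than causing a core dump.
try:
    '%d' % 'forty-two'
    mismatch_raised = False
except TypeError:
    mismatch_raised = True
```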
3133
If you have a really long format string that you don't want to split
up, it would be nice if you could reference the variables to be
formatted by name instead of by position. This can be done by using
the form \code{\%(name)format}, as shown here:
3138
3139\begin{verbatim}
3140>>> table = {'Sjoerd': 4127, 'Jack': 4098, 'Dcab': 8637678}
3141>>> print 'Jack: %(Jack)d; Sjoerd: %(Sjoerd)d; Dcab: %(Dcab)d' % table
3142Jack: 4098; Sjoerd: 4127; Dcab: 8637678
3143\end{verbatim}
3144
3145This is particularly useful in combination with the new built-in
3146\function{vars()} function, which returns a dictionary containing all
3147local variables.
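
A minimal sketch of that combination is shown below. The function
\function{describe()} and its arguments are hypothetical; the point is
only that the names in the format string are looked up among the local
variables.

```python
def describe(name, phone):
    # Inside a function, vars() returns the local variables as a dictionary.
    return '%(name)s can be reached at extension %(phone)d.' % vars()

message = describe('Jack', 4098)
```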
3148
3149\section{Reading and Writing Files \label{files}}
3150
3151% Opening files
3152\function{open()}\bifuncindex{open} returns a file
3153object\obindex{file}, and is most commonly used with two arguments:
3154\samp{open(\var{filename}, \var{mode})}.
3155
3156\begin{verbatim}
3157>>> f=open('/tmp/workfile', 'w')
3158>>> print f
3159<open file '/tmp/workfile', mode 'w' at 80a0960>
3160\end{verbatim}
3161
The first argument is a string containing the filename. The second
argument is another string containing a few characters describing the
way in which the file will be used. \var{mode} can be \code{'r'} when
the file will only be read, \code{'w'} for only writing (an existing
file with the same name will be erased), or \code{'a'} for appending
(any data written to the file is automatically added to the end).
\code{'r+'} opens the file for both reading and writing.
The \var{mode} argument is optional; \code{'r'} will be assumed if
it's omitted.
3171
3172On Windows and the Macintosh, \code{'b'} appended to the
3173mode opens the file in binary mode, so there are also modes like
3174\code{'rb'}, \code{'wb'}, and \code{'r+b'}. Windows makes a
3175distinction between text and binary files; the end-of-line characters
3176in text files are automatically altered slightly when data is read or
3177written. This behind-the-scenes modification to file data is fine for
3178\ASCII{} text files, but it'll corrupt binary data like that in \file{JPEG} or
3179\file{EXE} files. Be very careful to use binary mode when reading and
3180writing such files.
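
The following sketch writes a short byte string in binary mode and
reads it back, so that the end-of-line characters reach the disk
unchanged on every platform. The file name is an assumption of the
example, and \method{encode()} is used only to obtain a byte string in
a way that also works on later Python versions.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, 'sample.bin')

# Data containing carriage returns and newlines, as raw bytes.
payload = 'line one\r\nline two\r\n'.encode('ascii')

# 'wb' and 'rb' turn off any end-of-line translation.
out = open(path, 'wb')
out.write(payload)
out.close()

inp = open(path, 'rb')
round_trip = inp.read()
inp.close()

unchanged = (round_trip == payload)
size_on_disk = os.path.getsize(path)
```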
3181
3182\subsection{Methods of File Objects \label{fileMethods}}
3183
3184The rest of the examples in this section will assume that a file
3185object called \code{f} has already been created.
3186
To read a file's contents, call \code{f.read(\var{size})}, which reads
some quantity of data and returns it as a string. \var{size} is an
optional numeric argument. When \var{size} is omitted or negative,
the entire contents of the file will be read and returned; it's your
problem if the file is twice as large as your machine's memory.
Otherwise, at most \var{size} bytes are read and returned. If the end
of the file has been reached, \code{f.read()} will return an empty
string (\code{""}).

3195\begin{verbatim}
3196>>> f.read()
3197'This is the entire file.\n'
3198>>> f.read()
3199''
3200\end{verbatim}
3201
3202\code{f.readline()} reads a single line from the file; a newline
3203character (\code{\e n}) is left at the end of the string, and is only
3204omitted on the last line of the file if the file doesn't end in a
3205newline. This makes the return value unambiguous; if
3206\code{f.readline()} returns an empty string, the end of the file has
3207been reached, while a blank line is represented by \code{'\e n'}, a
3208string containing only a single newline.
3209
3210\begin{verbatim}
3211>>> f.readline()
3212'This is the first line of the file.\n'
3213>>> f.readline()
3214'Second line of the file\n'
3215>>> f.readline()
3216''
3217\end{verbatim}
3218
3219\code{f.readlines()} returns a list containing all the lines of data
3220in the file. If given an optional parameter \var{sizehint}, it reads
3221that many bytes from the file and enough more to complete a line, and
3222returns the lines from that. This is often used to allow efficient
3223reading of a large file by lines, but without having to load the
3224entire file in memory. Only complete lines will be returned.
3225
3226\begin{verbatim}
3227>>> f.readlines()
3228['This is the first line of the file.\n', 'Second line of the file\n']
3229\end{verbatim}
3230
3231An alternate approach to reading lines is to loop over the file object.
3232This is memory efficient, fast, and leads to simpler code:
3233
\begin{verbatim}
>>> for line in f:
...     print line,
...
This is the first line of the file.
Second line of the file
\end{verbatim}
3241
3242The alternative approach is simpler but does not provide as fine-grained
3243control. Since the two approaches manage line buffering differently,
3244they should not be mixed.
3245
3246\code{f.write(\var{string})} writes the contents of \var{string} to
3247the file, returning \code{None}.
3248
3249\begin{verbatim}
3250>>> f.write('This is a test\n')
3251\end{verbatim}
3252
3253To write something other than a string, it needs to be converted to a
3254string first:
3255
3256\begin{verbatim}
3257>>> value = ('the answer', 42)
3258>>> s = str(value)
3259>>> f.write(s)
3260\end{verbatim}
3261
3262\code{f.tell()} returns an integer giving the file object's current
3263position in the file, measured in bytes from the beginning of the
3264file. To change the file object's position, use
3265\samp{f.seek(\var{offset}, \var{from_what})}. The position is
3266computed from adding \var{offset} to a reference point; the reference
3267point is selected by the \var{from_what} argument. A
3268\var{from_what} value of 0 measures from the beginning of the file, 1
3269uses the current file position, and 2 uses the end of the file as the
3270reference point. \var{from_what} can be omitted and defaults to 0,
3271using the beginning of the file as the reference point.
3272
3273\begin{verbatim}
3274>>> f = open('/tmp/workfile', 'r+')
3275>>> f.write('0123456789abcdef')
3276>>> f.seek(5) # Go to the 6th byte in the file
3277>>> f.read(1)
3278'5'
3279>>> f.seek(-3, 2) # Go to the 3rd byte before the end
3280>>> f.read(1)
3281'd'
3282\end{verbatim}
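
To see how \method{tell()} and the three reference points fit together,
here is a further sketch. It opens the file in binary mode, an
assumption made so that relative seeks behave the same on every
platform and Python version; the file name is likewise made up.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'workfile')

f = open(path, 'wb')
f.write('0123456789abcdef'.encode('ascii'))
f.close()

f = open(path, 'rb')
f.seek(5)                  # absolute: 5 bytes from the beginning
after_absolute = f.tell()
f.seek(2, 1)               # relative: 2 bytes past the current position
after_relative = f.tell()
f.seek(-3, 2)              # from the end: 3 bytes before the end
after_from_end = f.tell()
last_three = f.read()      # everything from here to the end of the file
f.close()
```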
3283
3284When you're done with a file, call \code{f.close()} to close it and
3285free up any system resources taken up by the open file. After calling
3286\code{f.close()}, attempts to use the file object will automatically fail.
3287
\begin{verbatim}
>>> f.close()
>>> f.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: I/O operation on closed file
\end{verbatim}
3295
File objects have some additional methods, such as
\method{isatty()} and \method{truncate()}, which are less frequently
used; consult the Library Reference for a complete guide to file
objects.
3300
3301\subsection{The \module{pickle} Module \label{pickle}}
3302\refstmodindex{pickle}
3303
3304Strings can easily be written to and read from a file. Numbers take a
3305bit more effort, since the \method{read()} method only returns
3306strings, which will have to be passed to a function like
3307\function{int()}, which takes a string like \code{'123'} and
3308returns its numeric value 123. However, when you want to save more
3309complex data types like lists, dictionaries, or class instances,
3310things get a lot more complicated.
3311
3312Rather than have users be constantly writing and debugging code to
3313save complicated data types, Python provides a standard module called
3314\ulink{\module{pickle}}{../lib/module-pickle.html}. This is an
3315amazing module that can take almost
3316any Python object (even some forms of Python code!), and convert it to
3317a string representation; this process is called \dfn{pickling}.
3318Reconstructing the object from the string representation is called
3319\dfn{unpickling}. Between pickling and unpickling, the string
3320representing the object may have been stored in a file or data, or
3321sent over a network connection to some distant machine.
3322
3323If you have an object \code{x}, and a file object \code{f} that's been
3324opened for writing, the simplest way to pickle the object takes only
3325one line of code:
3326
3327\begin{verbatim}
3328pickle.dump(x, f)
3329\end{verbatim}
3330
3331To unpickle the object again, if \code{f} is a file object which has
3332been opened for reading:
3333
3334\begin{verbatim}
3335x = pickle.load(f)
3336\end{verbatim}
3337
3338(There are other variants of this, used when pickling many objects or
3339when you don't want to write the pickled data to a file; consult the
3340complete documentation for
3341\ulink{\module{pickle}}{../lib/module-pickle.html} in the
3342\citetitle[../lib/]{Python Library Reference}.)
3343
3344\ulink{\module{pickle}}{../lib/module-pickle.html} is the standard way
3345to make Python objects which can be stored and reused by other
3346programs or by a future invocation of the same program; the technical
3347term for this is a \dfn{persistent} object. Because
3348\ulink{\module{pickle}}{../lib/module-pickle.html} is so widely used,
3349many authors who write Python extensions take care to ensure that new
3350data types such as matrices can be properly pickled and unpickled.
3351
3352
3353
3354\chapter{Errors and Exceptions \label{errors}}
3355
3356Until now error messages haven't been more than mentioned, but if you
3357have tried out the examples you have probably seen some. There are
3358(at least) two distinguishable kinds of errors:
3359\emph{syntax errors} and \emph{exceptions}.
3360
3361\section{Syntax Errors \label{syntaxErrors}}
3362
3363Syntax errors, also known as parsing errors, are perhaps the most common
3364kind of complaint you get while you are still learning Python:
3365
\begin{verbatim}
>>> while True print 'Hello world'
  File "<stdin>", line 1, in ?
    while True print 'Hello world'
                   ^
SyntaxError: invalid syntax
\end{verbatim}
3373
3374The parser repeats the offending line and displays a little `arrow'
3375pointing at the earliest point in the line where the error was
3376detected. The error is caused by (or at least detected at) the token
3377\emph{preceding} the arrow: in the example, the error is detected at
3378the keyword \keyword{print}, since a colon (\character{:}) is missing
3379before it. File name and line number are printed so you know where to
3380look in case the input came from a script.
3381
3382\section{Exceptions \label{exceptions}}
3383
3384Even if a statement or expression is syntactically correct, it may
3385cause an error when an attempt is made to execute it.
3386Errors detected during execution are called \emph{exceptions} and are
3387not unconditionally fatal: you will soon learn how to handle them in
3388Python programs. Most exceptions are not handled by programs,
3389however, and result in error messages as shown here:
3390
\begin{verbatim}
>>> 10 * (1/0)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ZeroDivisionError: integer division or modulo by zero
>>> 4 + spam*3
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
NameError: name 'spam' is not defined
>>> '2' + 2
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: cannot concatenate 'str' and 'int' objects
\end{verbatim}
3405
3406The last line of the error message indicates what happened.
3407Exceptions come in different types, and the type is printed as part of
3408the message: the types in the example are
3409\exception{ZeroDivisionError}, \exception{NameError} and
3410\exception{TypeError}.
3411The string printed as the exception type is the name of the built-in
3412exception that occurred. This is true for all built-in
3413exceptions, but need not be true for user-defined exceptions (although
3414it is a useful convention).
3415Standard exception names are built-in identifiers (not reserved
3416keywords).
3417
3418The rest of the line provides detail based on the type of exception
3419and what caused it.
3420
The preceding part of the error message shows the context where the
exception happened, in the form of a stack traceback.
In general the traceback lists source lines; however,
it does not display lines read from standard input.
3425
3426The \citetitle[../lib/module-exceptions.html]{Python Library
3427Reference} lists the built-in exceptions and their meanings.
3428
3429
3430\section{Handling Exceptions \label{handling}}
3431
3432It is possible to write programs that handle selected exceptions.
3433Look at the following example, which asks the user for input until a
3434valid integer has been entered, but allows the user to interrupt the
3435program (using \kbd{Control-C} or whatever the operating system
3436supports); note that a user-generated interruption is signalled by
3437raising the \exception{KeyboardInterrupt} exception.
3438
\begin{verbatim}
>>> while True:
...     try:
...         x = int(raw_input("Please enter a number: "))
...         break
...     except ValueError:
...         print "Oops! That was no valid number. Try again..."
...
\end{verbatim}
3448
3449The \keyword{try} statement works as follows.
3450
3451\begin{itemize}
3452\item
3453First, the \emph{try clause} (the statement(s) between the
3454\keyword{try} and \keyword{except} keywords) is executed.
3455
3456\item
3457If no exception occurs, the \emph{except\ clause} is skipped and
3458execution of the \keyword{try} statement is finished.
3459
3460\item
3461If an exception occurs during execution of the try clause, the rest of
3462the clause is skipped. Then if its type matches the exception named
3463after the \keyword{except} keyword, the except clause is executed, and
3464then execution continues after the \keyword{try} statement.
3465
3466\item
3467If an exception occurs which does not match the exception named in the
3468except clause, it is passed on to outer \keyword{try} statements; if
3469no handler is found, it is an \emph{unhandled exception} and execution
3470stops with a message as shown above.
3471
3472\end{itemize}
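
The steps above can be traced with a small sketch.  The helper
\function{to_int()} and its \code{log} list are hypothetical names used
only to record which clause ran; nothing is printed.

```python
def to_int(text, log):
    """Convert text to an integer, recording which clauses execute."""
    try:
        log.append('try started')
        value = int(text)            # may raise ValueError
        log.append('try finished')   # skipped if int() raised
    except ValueError:
        log.append('except ran')     # only for a matching exception
        value = None
    return value

good_log, bad_log = [], []
good = to_int('42', good_log)    # no exception: the except clause is skipped
bad = to_int('spam', bad_log)    # ValueError: rest of the try clause is skipped
```

Calling \code{to_int(None, [])} raises \exception{TypeError}, which does
not match the except clause and is therefore passed on to the caller.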
3473
3474A \keyword{try} statement may have more than one except clause, to
3475specify handlers for different exceptions. At most one handler will
3476be executed. Handlers only handle exceptions that occur in the
3477corresponding try clause, not in other handlers of the same
3478\keyword{try} statement. An except clause may name multiple exceptions
3479as a parenthesized tuple, for example:
3480
\begin{verbatim}
... except (RuntimeError, TypeError, NameError):
...     pass
\end{verbatim}
3485
3486The last except clause may omit the exception name(s), to serve as a
3487wildcard. Use this with extreme caution, since it is easy to mask a
3488real programming error in this way! It can also be used to print an
3489error message and then re-raise the exception (allowing a caller to
3490handle the exception as well):
3491
\begin{verbatim}
import sys

try:
    f = open('myfile.txt')
    s = f.readline()
    i = int(s.strip())
except IOError, (errno, strerror):
    print "I/O error(%s): %s" % (errno, strerror)
except ValueError:
    print "Could not convert data to an integer."
except:
    print "Unexpected error:", sys.exc_info()[0]
    raise
\end{verbatim}
3507
3508The \keyword{try} \ldots\ \keyword{except} statement has an optional
3509\emph{else clause}, which, when present, must follow all except
3510clauses. It is useful for code that must be executed if the try
3511clause does not raise an exception. For example:
3512
\begin{verbatim}
import sys

for arg in sys.argv[1:]:
    try:
        f = open(arg, 'r')
    except IOError:
        print 'cannot open', arg
    else:
        print arg, 'has', len(f.readlines()), 'lines'
        f.close()
\end{verbatim}
3523
3524The use of the \keyword{else} clause is better than adding additional
3525code to the \keyword{try} clause because it avoids accidentally
3526catching an exception that wasn't raised by the code being protected
3527by the \keyword{try} \ldots\ \keyword{except} statement.
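
A small sketch of the difference, using a hypothetical helper
\function{lookup_and_process()}: only the dictionary lookup sits in the
try clause, so a \exception{KeyError} raised later by \code{process} is
not mistaken for a missing key.

```python
def lookup_and_process(table, key, process):
    try:
        value = table[key]        # the only statement being protected
    except KeyError:
        return 'missing'
    else:
        # Runs only if the lookup succeeded.  Exceptions raised here,
        # including KeyError, are not caught by the handler above.
        return process(value)

def broken(value):
    # Hypothetical processing step with a bug of its own.
    raise KeyError('bug inside process')

table = {'a': 1}
found = lookup_and_process(table, 'a', str)     # lookup and processing succeed
absent = lookup_and_process(table, 'z', str)    # lookup fails, handler runs
```

Had the call to \code{process} been placed inside the try clause,
\code{lookup_and_process(table, 'a', broken)} would report
\code{'missing'} and hide the bug; as written, the \exception{KeyError}
reaches the caller.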
3528
3529
3530When an exception occurs, it may have an associated value, also known as
3531the exception's \emph{argument}.
3532The presence and type of the argument depend on the exception type.
3533
3534The except clause may specify a variable after the exception name (or tuple).
3535The variable is bound to an exception instance with the arguments stored
3536in \code{instance.args}. For convenience, the exception instance
3537defines \method{__getitem__} and \method{__str__} so the arguments can
3538be accessed or printed directly without having to reference \code{.args}.
3539
But use of \code{.args} is discouraged.  Instead, the preferred use is to pass
a single argument to an exception (which can be a tuple if multiple arguments
are needed) and have it bound to the \code{message} attribute.  One may also
instantiate an exception before raising it and add any attributes to it
as desired.
3545
\begin{verbatim}
>>> try:
...     raise Exception('spam', 'eggs')
... except Exception, inst:
...     print type(inst)     # the exception instance
...     print inst.args      # arguments stored in .args
...     print inst           # __str__ allows args to be printed directly
...     x, y = inst          # __getitem__ allows args to be unpacked directly
...     print 'x =', x
...     print 'y =', y
...
<type 'exceptions.Exception'>
('spam', 'eggs')
('spam', 'eggs')
x = spam
y = eggs
\end{verbatim}
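
Instantiating an exception before raising it and attaching attributes
to it might look like the following sketch.  The attribute name
\code{line_number} is invented for illustration, and the instance is
fetched with \function{sys.exc_info()} only to keep the snippet
independent of the \keyword{except} syntax for binding a variable.

```python
import sys

error = ValueError('bad input')
error.line_number = 17          # hypothetical extra attribute

try:
    raise error                 # raise the prepared instance
except ValueError:
    caught = sys.exc_info()[1]  # the exception instance being handled

detail = (caught.args, caught.line_number)
```

The handler receives the very same object that was raised, so any
attribute set beforehand is available to it.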
3563
3564If an exception has an argument, it is printed as the last part
3565(`detail') of the message for unhandled exceptions.
3566
3567Exception handlers don't just handle exceptions if they occur
3568immediately in the try clause, but also if they occur inside functions
3569that are called (even indirectly) in the try clause.
3570For example:
3571
\begin{verbatim}
>>> def this_fails():
...     x = 1/0
...
>>> try:
...     this_fails()
... except ZeroDivisionError, detail:
...     print 'Handling run-time error:', detail
...
Handling run-time error: integer division or modulo by zero
\end{verbatim}
3583
3584
3585\section{Raising Exceptions \label{raising}}
3586
3587The \keyword{raise} statement allows the programmer to force a
3588specified exception to occur.
3589For example:
3590
\begin{verbatim}
>>> raise NameError, 'HiThere'
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
NameError: HiThere
\end{verbatim}
3597
3598The first argument to \keyword{raise} names the exception to be
3599raised. The optional second argument specifies the exception's
3600argument. Alternatively, the above could be written as
3601\code{raise NameError('HiThere')}. Either form works fine, but there
3602seems to be a growing stylistic preference for the latter.
3603
3604If you need to determine whether an exception was raised but don't
3605intend to handle it, a simpler form of the \keyword{raise} statement
3606allows you to re-raise the exception:
3607
\begin{verbatim}
>>> try:
...     raise NameError, 'HiThere'
... except NameError:
...     print 'An exception flew by!'
...     raise
...
An exception flew by!
Traceback (most recent call last):
  File "<stdin>", line 2, in ?
NameError: HiThere
\end{verbatim}
3620
3621
3622\section{User-defined Exceptions \label{userExceptions}}
3623
3624Programs may name their own exceptions by creating a new exception
3625class. Exceptions should typically be derived from the
3626\exception{Exception} class, either directly or indirectly. For
3627example:
3628
\begin{verbatim}
>>> class MyError(Exception):
...     def __init__(self, value):
...         self.value = value
...     def __str__(self):
...         return repr(self.value)
...
>>> try:
...     raise MyError(2*2)
... except MyError, e:
...     print 'My exception occurred, value:', e.value
...
My exception occurred, value: 4
>>> raise MyError, 'oops!'
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
__main__.MyError: 'oops!'
\end{verbatim}
3647
3648In this example, the default \method{__init__} of \class{Exception}
3649has been overridden. The new behavior simply creates the \var{value}
3650attribute. This replaces the default behavior of creating the
3651\var{args} attribute.
3652
3653Exception classes can be defined which do anything any other class can
3654do, but are usually kept simple, often only offering a number of
3655attributes that allow information about the error to be extracted by
3656handlers for the exception. When creating a module that can raise
3657several distinct errors, a common practice is to create a base class
3658for exceptions defined by that module, and subclass that to create
3659specific exception classes for different error conditions:
3660
\begin{verbatim}
class Error(Exception):
    """Base class for exceptions in this module."""
    pass

class InputError(Error):
    """Exception raised for errors in the input.

    Attributes:
        expression -- input expression in which the error occurred
        message -- explanation of the error
    """

    def __init__(self, expression, message):
        self.expression = expression
        self.message = message

class TransitionError(Error):
    """Raised when an operation attempts a state transition that's not
    allowed.

    Attributes:
        previous -- state at beginning of transition
        next -- attempted new state
        message -- explanation of why the specific transition is not allowed
    """

    def __init__(self, previous, next, message):
        self.previous = previous
        self.next = next
        self.message = message
\end{verbatim}
3693
3694Most exceptions are defined with names that end in ``Error,'' similar
3695to the naming of the standard exceptions.
3696
3697Many standard modules define their own exceptions to report errors
3698that may occur in functions they define. More information on classes
3699is presented in chapter \ref{classes}, ``Classes.''
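
A handler for a module's base exception class also catches every
exception derived from it, which is the main benefit of such a
hierarchy.  The sketch below repeats cut-down versions of \class{Error}
and \class{InputError}; \function{parse()} is a hypothetical function
that always fails.

```python
import sys

class Error(Exception):
    """Base class for exceptions in this (imaginary) module."""

class InputError(Error):
    """Raised for errors in the input."""
    def __init__(self, expression, message):
        Exception.__init__(self, expression, message)
        self.expression = expression
        self.message = message

def parse(expression):
    # Hypothetical parser that rejects everything it is given.
    raise InputError(expression, 'unexpected token')

try:
    parse('1 +* 2')
except Error:                        # matches InputError as well
    caught = sys.exc_info()[1]

summary = '%s: %s' % (caught.expression, caught.message)
```

Callers that care about one particular failure can still name
\class{InputError} in an earlier except clause.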
3700
3701
3702\section{Defining Clean-up Actions \label{cleanup}}
3703
3704The \keyword{try} statement has another optional clause which is
3705intended to define clean-up actions that must be executed under all
3706circumstances. For example:
3707
\begin{verbatim}
>>> try:
...     raise KeyboardInterrupt
... finally:
...     print 'Goodbye, world!'
...
Goodbye, world!
Traceback (most recent call last):
  File "<stdin>", line 2, in ?
KeyboardInterrupt
\end{verbatim}
3719
A \emph{finally clause} is always executed before leaving the
\keyword{try} statement, whether an exception has occurred or not.
When an exception has occurred in the \keyword{try} clause and has not
been handled by an \keyword{except} clause (or it has occurred in an
\keyword{except} or \keyword{else} clause), it is re-raised after the
\keyword{finally} clause has been executed.  The \keyword{finally} clause
is also executed ``on the way out'' when any other clause of the
\keyword{try} statement is left via a \keyword{break}, \keyword{continue}
or \keyword{return} statement.  A more complicated example:
3729
\begin{verbatim}
>>> def divide(x, y):
...     try:
...         result = x / y
...     except ZeroDivisionError:
...         print "division by zero!"
...     else:
...         print "result is", result
...     finally:
...         print "executing finally clause"
...
>>> divide(2, 1)
result is 2
executing finally clause
>>> divide(2, 0)
division by zero!
executing finally clause
>>> divide("2", "1")
executing finally clause
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 3, in divide
TypeError: unsupported operand type(s) for /: 'str' and 'str'
\end{verbatim}
3754
As you can see, the \keyword{finally} clause is executed in any
event.  The \exception{TypeError} raised by dividing two strings
is not handled by the \keyword{except} clause and is therefore
re-raised after the \keyword{finally} clause has been executed.
3759
3760In real world applications, the \keyword{finally} clause is useful
3761for releasing external resources (such as files or network connections),
3762regardless of whether the use of the resource was successful.
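
As a sketch of that use, the hypothetical \class{Connection} class
below stands in for an external resource; the \keyword{finally} clause
releases it whether or not the work succeeds.

```python
class Connection:
    """Stand-in for an external resource such as a network connection."""
    def __init__(self):
        self.is_open = True
    def close(self):
        self.is_open = False

def use(connection, fail):
    try:
        if fail:
            raise RuntimeError('operation failed')
        return 'done'
    finally:
        connection.close()      # runs on return and on exception alike

ok_conn, bad_conn = Connection(), Connection()
result = use(ok_conn, False)
try:
    use(bad_conn, True)
except RuntimeError:
    failed = True
```

Both connections end up closed, although only the first call returned
normally.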
3763
3764
3765\section{Predefined Clean-up Actions \label{cleanup-with}}
3766
Some objects define standard clean-up actions to be undertaken when
the object is no longer needed, regardless of whether the
operation using the object succeeded or failed.
Look at the following example, which tries to open a file and print
its contents to the screen.
3772
\begin{verbatim}
for line in open("myfile.txt"):
    print line,    # trailing comma: each line already ends in a newline
\end{verbatim}
3777
3778The problem with this code is that it leaves the file open for an
3779indeterminate amount of time after the code has finished executing.
3780This is not an issue in simple scripts, but can be a problem for
3781larger applications. The \keyword{with} statement allows
3782objects like files to be used in a way that ensures they are
3783always cleaned up promptly and correctly.
3784
\begin{verbatim}
from __future__ import with_statement    # needed to enable with in Python 2.5

with open("myfile.txt") as f:
    for line in f:
        print line,
\end{verbatim}
3790
3791After the statement is executed, the file \var{f} is always closed,
3792even if a problem was encountered while processing the lines. Other
3793objects which provide predefined clean-up actions will indicate
3794this in their documentation.
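
The claim can be checked with a short sketch that raises an exception
inside the \keyword{with} block.  It assumes an interpreter where
\keyword{with} is enabled (in Python 2.5 that requires
\code{from __future__ import with_statement} as the first statement of
the module); the temporary file is created only for the demonstration.

```python
import os
import tempfile

# Create a small scratch file to read back.
handle, path = tempfile.mkstemp()
os.write(handle, 'first\nsecond\n'.encode('ascii'))
os.close(handle)

try:
    with open(path) as f:
        first = f.readline()
        raise RuntimeError('problem while processing')
except RuntimeError:
    pass

closed_after = f.closed     # the file was closed despite the exception
os.remove(path)
```

The name \code{f} stays bound after the block, which is what makes it
possible to inspect \code{f.closed} afterwards.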
3795
3796
3797\chapter{Classes \label{classes}}
3798
3799Python's class mechanism adds classes to the language with a minimum
3800of new syntax and semantics. It is a mixture of the class mechanisms
3801found in \Cpp{} and Modula-3. As is true for modules, classes in Python
3802do not put an absolute barrier between definition and user, but rather
3803rely on the politeness of the user not to ``break into the
3804definition.'' The most important features of classes are retained
3805with full power, however: the class inheritance mechanism allows
3806multiple base classes, a derived class can override any methods of its
3807base class or classes, and a method can call the method of a base class with the
3808same name. Objects can contain an arbitrary amount of private data.
3809
3810In \Cpp{} terminology, all class members (including the data members) are
3811\emph{public}, and all member functions are \emph{virtual}. There are
3812no special constructors or destructors. As in Modula-3, there are no
3813shorthands for referencing the object's members from its methods: the
3814method function is declared with an explicit first argument
3815representing the object, which is provided implicitly by the call. As
3816in Smalltalk, classes themselves are objects, albeit in the wider
3817sense of the word: in Python, all data types are objects. This
3818provides semantics for importing and renaming. Unlike
3819\Cpp{} and Modula-3, built-in types can be used as base classes for
3820extension by the user. Also, like in \Cpp{} but unlike in Modula-3, most
3821built-in operators with special syntax (arithmetic operators,
3822subscripting etc.) can be redefined for class instances.
3823
3824\section{A Word About Terminology \label{terminology}}
3825
Lacking universally accepted terminology to talk about classes, I will
make occasional use of Smalltalk and \Cpp{} terms.  (I would use Modula-3
terms, since its object-oriented semantics are closer to those of
Python than those of \Cpp{} are, but I expect that few readers have heard of it.)
3830
3831Objects have individuality, and multiple names (in multiple scopes)
3832can be bound to the same object. This is known as aliasing in other
3833languages. This is usually not appreciated on a first glance at
3834Python, and can be safely ignored when dealing with immutable basic
3835types (numbers, strings, tuples). However, aliasing has an
3836(intended!) effect on the semantics of Python code involving mutable
3837objects such as lists, dictionaries, and most types representing
3838entities outside the program (files, windows, etc.). This is usually
3839used to the benefit of the program, since aliases behave like pointers
3840in some respects. For example, passing an object is cheap since only
3841a pointer is passed by the implementation; and if a function modifies
3842an object passed as an argument, the caller will see the change --- this
3843eliminates the need for two different argument passing mechanisms as in
3844Pascal.
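
A short sketch of aliasing with a mutable and an immutable object; all
names are arbitrary.

```python
def add_item(sequence, item):
    sequence.append(item)    # modifies the caller's object in place

a = [1, 2]
b = a                        # b is an alias: a second name for the same list
add_item(b, 3)               # the change is visible through a as well

t = (1, 2)
u = t
u = u + (3,)                 # tuples are immutable: u is rebound to a new tuple
```

After this runs, \code{a} and \code{b} still name one list, which now
has three items, while \code{t} is unchanged because the concatenation
built a new tuple.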
3845
3846
\section{Python Scopes and Namespaces \label{scopes}}
3848
3849Before introducing classes, I first have to tell you something about
3850Python's scope rules. Class definitions play some neat tricks with
3851namespaces, and you need to know how scopes and namespaces work to
3852fully understand what's going on. Incidentally, knowledge about this
3853subject is useful for any advanced Python programmer.
3854
3855Let's begin with some definitions.
3856
3857A \emph{namespace} is a mapping from names to objects. Most
3858namespaces are currently implemented as Python dictionaries, but
3859that's normally not noticeable in any way (except for performance),
3860and it may change in the future. Examples of namespaces are: the set
3861of built-in names (functions such as \function{abs()}, and built-in
3862exception names); the global names in a module; and the local names in
a function invocation.  In a sense the set of attributes of an object
also forms a namespace.  The important thing to know about namespaces
is that there is absolutely no relation between names in different
namespaces; for instance, two different modules may both define a
function ``maximize'' without confusion, because users of the modules
must prefix it with the module name.
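
The standard modules \module{math} and \module{cmath} illustrate this:
both define a function named \function{sqrt()}, and the module prefix
selects which one is meant.

```python
import math
import cmath

real_root = math.sqrt(4)         # 2.0, from the math module's namespace
complex_root = cmath.sqrt(-1)    # 1j, from the cmath module's namespace
```

The two functions are unrelated objects that merely share a name.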
3869
3870By the way, I use the word \emph{attribute} for any name following a
3871dot --- for example, in the expression \code{z.real}, \code{real} is
3872an attribute of the object \code{z}. Strictly speaking, references to
3873names in modules are attribute references: in the expression
3874\code{modname.funcname}, \code{modname} is a module object and
3875\code{funcname} is an attribute of it. In this case there happens to
3876be a straightforward mapping between the module's attributes and the
3877global names defined in the module: they share the same namespace!
3878\footnote{
3879 Except for one thing. Module objects have a secret read-only
3880 attribute called \member{__dict__} which returns the dictionary
3881 used to implement the module's namespace; the name
3882 \member{__dict__} is an attribute but not a global name.
3883 Obviously, using this violates the abstraction of namespace
3884 implementation, and should be restricted to things like
3885 post-mortem debuggers.
3886}
3887
3888Attributes may be read-only or writable. In the latter case,
3889assignment to attributes is possible. Module attributes are writable:
3890you can write \samp{modname.the_answer = 42}. Writable attributes may
3891also be deleted with the \keyword{del} statement. For example,
3892\samp{del modname.the_answer} will remove the attribute
3893\member{the_answer} from the object named by \code{modname}.
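
A sketch using a scratch module object rather than a real module:
\function{types.ModuleType} creates an empty module, and the name
\code{'scratch'} is arbitrary.

```python
import types

mod = types.ModuleType('scratch')   # an empty module object
mod.the_answer = 42                 # module attributes are writable
had_it = hasattr(mod, 'the_answer')
in_namespace = 'the_answer' in mod.__dict__   # same namespace, seen as a dict

del mod.the_answer                  # removes the attribute again
has_it_now = hasattr(mod, 'the_answer')
```

Looking inside \member{__dict__} is done here only to show that the
attribute and the module's global name are one and the same.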
3894
Namespaces are created at different moments and have different
3896lifetimes. The namespace containing the built-in names is created
3897when the Python interpreter starts up, and is never deleted. The
3898global namespace for a module is created when the module definition
3899is read in; normally, module namespaces also last until the
3900interpreter quits. The statements executed by the top-level
3901invocation of the interpreter, either read from a script file or
3902interactively, are considered part of a module called
3903\module{__main__}, so they have their own global namespace. (The
3904built-in names actually also live in a module; this is called
3905\module{__builtin__}.)
3906
3907The local namespace for a function is created when the function is
3908called, and deleted when the function returns or raises an exception
3909that is not handled within the function. (Actually, forgetting would
3910be a better way to describe what actually happens.) Of course,
3911recursive invocations each have their own local namespace.
3912
3913A \emph{scope} is a textual region of a Python program where a
3914namespace is directly accessible. ``Directly accessible'' here means
3915that an unqualified reference to a name attempts to find the name in
3916the namespace.
3917
Although scopes are determined statically, they are used dynamically.
At any time during execution, there are at least three nested scopes whose
namespaces are directly accessible: the innermost scope, which is searched
first, contains the local names; the scopes of any enclosing functions
are searched next, starting with the nearest enclosing scope; the
next-to-last scope, referred to here as the middle scope, contains the
current module's global names; and the outermost scope (searched last)
is the namespace containing built-in names.
3926
3927If a name is declared global, then all references and assignments go
3928directly to the middle scope containing the module's global names.
3929Otherwise, all variables found outside of the innermost scope are read-only
3930(an attempt to write to such a variable will simply create a \emph{new}
3931local variable in the innermost scope, leaving the identically named
3932outer variable unchanged).
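
The effect of assignment with and without a \keyword{global}
declaration, and a read from an enclosing function's scope, can be
sketched as follows; \code{counter}, \function{shadow()},
\function{bump()} and \function{make_adder()} are illustrative names.

```python
counter = 0

def shadow():
    counter = 10        # creates a new local name; the global is untouched
    return counter

def bump():
    global counter      # assignments now go to the module's global scope
    counter = counter + 1

def make_adder(n):
    def add(x):
        return x + n    # n is found in the enclosing function's scope
    return add

local_value = shadow()
after_shadow = counter  # still 0
bump()
after_bump = counter    # now 1
add_five = make_adder(5)
```

Note that \function{add()} only reads \code{n}; an assignment to
\code{n} inside \function{add()} would create a new local variable
instead.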
3933
3934Usually, the local scope references the local names of the (textually)
3935current function. Outside functions, the local scope references
3936the same namespace as the global scope: the module's namespace.
3937Class definitions place yet another namespace in the local scope.
3938
3939It is important to realize that scopes are determined textually: the
3940global scope of a function defined in a module is that module's
3941namespace, no matter from where or by what alias the function is
3942called. On the other hand, the actual search for names is done
3943dynamically, at run time --- however, the language definition is
3944evolving towards static name resolution, at ``compile'' time, so don't
3945rely on dynamic name resolution! (In fact, local variables are
3946already determined statically.)
3947
3948A special quirk of Python is that assignments always go into the
3949innermost scope. Assignments do not copy data --- they just
3950bind names to objects. The same is true for deletions: the statement
3951\samp{del x} removes the binding of \code{x} from the namespace
3952referenced by the local scope. In fact, all operations that introduce
3953new names use the local scope: in particular, import statements and
3954function definitions bind the module or function name in the local
3955scope. (The \keyword{global} statement can be used to indicate that
3956particular variables live in the global scope.)
3957
3958
3959\section{A First Look at Classes \label{firstClasses}}
3960
3961Classes introduce a little bit of new syntax, three new object types,
3962and some new semantics.
3963
3964
3965\subsection{Class Definition Syntax \label{classDefinition}}
3966
3967The simplest form of class definition looks like this:
3968
\begin{verbatim}
class ClassName:
    <statement-1>
    .
    .
    .
    <statement-N>
\end{verbatim}
3977
Class definitions, like function definitions
(\keyword{def} statements), must be executed before they have any
effect.  (You could conceivably place a class definition in a branch
of an \keyword{if} statement, or inside a function.)
3982
3983In practice, the statements inside a class definition will usually be
3984function definitions, but other statements are allowed, and sometimes
3985useful --- we'll come back to this later. The function definitions
3986inside a class normally have a peculiar form of argument list,
3987dictated by the calling conventions for methods --- again, this is
3988explained later.
3989
3990When a class definition is entered, a new namespace is created, and
3991used as the local scope --- thus, all assignments to local variables
3992go into this new namespace. In particular, function definitions bind
3993the name of the new function here.
3994
3995When a class definition is left normally (via the end), a \emph{class
3996object} is created. This is basically a wrapper around the contents
3997of the namespace created by the class definition; we'll learn more
3998about class objects in the next section. The original local scope
3999(the one in effect just before the class definition was entered) is
4000reinstated, and the class object is bound here to the class name given
4001in the class definition header (\class{ClassName} in the example).
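
Because the class body is executed like any other block of statements,
with its own local namespace, it may contain more than function
definitions.  In this sketch the class name \class{Settings} and its
attributes are invented for illustration.

```python
class Settings:
    base = 10
    doubled = base * 2          # an ordinary statement, run once at definition

    def describe(self):
        return 'base=%d doubled=%d' % (self.base, self.doubled)

# The names bound in the class body became attributes of the class object.
names = sorted(n for n in Settings.__dict__ if not n.startswith('__'))
text = Settings().describe()
```

Here \code{names} holds the three names bound in the class body:
\code{base}, \code{describe} and \code{doubled}.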
4002
4003
4004\subsection{Class Objects \label{classObjects}}
4005
4006Class objects support two kinds of operations: attribute references
4007and instantiation.
4008
4009\emph{Attribute references} use the standard syntax used for all
4010attribute references in Python: \code{obj.name}. Valid attribute
4011names are all the names that were in the class's namespace when the
4012class object was created. So, if the class definition looked like
4013this:
4014
\begin{verbatim}
class MyClass:
    "A simple example class"
    i = 12345
    def f(self):
        return 'hello world'
\end{verbatim}
4022
4023then \code{MyClass.i} and \code{MyClass.f} are valid attribute
4024references, returning an integer and a function object, respectively.
4025Class attributes can also be assigned to, so you can change the value
4026of \code{MyClass.i} by assignment. \member{__doc__} is also a valid
4027attribute, returning the docstring belonging to the class: \code{"A
4028simple example class"}.
4029
4030Class \emph{instantiation} uses function notation. Just pretend that
4031the class object is a parameterless function that returns a new
4032instance of the class. For example (assuming the above class):
4033
\begin{verbatim}
x = MyClass()
\end{verbatim}
4037
4038creates a new \emph{instance} of the class and assigns this object to
4039the local variable \code{x}.
4040
The instantiation operation (``calling'' a class object) creates an
empty object.  Many classes like to create instances
customized to a specific initial state.
Therefore a class may define a special method named
\method{__init__()}, like this:
4046
\begin{verbatim}
    def __init__(self):
        self.data = []
\end{verbatim}
4051
4052When a class defines an \method{__init__()} method, class
4053instantiation automatically invokes \method{__init__()} for the
4054newly-created class instance. So in this example, a new, initialized
4055instance can be obtained by:
4056
\begin{verbatim}
x = MyClass()
\end{verbatim}
4060
4061Of course, the \method{__init__()} method may have arguments for
4062greater flexibility. In that case, arguments given to the class
4063instantiation operator are passed on to \method{__init__()}. For
4064example,
4065
\begin{verbatim}
>>> class Complex:
...     def __init__(self, realpart, imagpart):
...         self.r = realpart
...         self.i = imagpart
...
>>> x = Complex(3.0, -4.5)
>>> x.r, x.i
(3.0, -4.5)
\end{verbatim}
4076
4077
4078\subsection{Instance Objects \label{instanceObjects}}
4079
4080Now what can we do with instance objects? The only operations
4081understood by instance objects are attribute references. There are
4082two kinds of valid attribute names, data attributes and methods.
4083
\emph{Data attributes} correspond to
``instance variables'' in Smalltalk, and to ``data members'' in
\Cpp.  Data attributes need not be declared; like local variables,
they spring into existence when they are first assigned to.  For
example, if \code{x} is the instance of \class{MyClass} created above,
the following piece of code will print the value \code{16}, without
leaving a trace:
4091
\begin{verbatim}
x.counter = 1
while x.counter < 10:
    x.counter = x.counter * 2
print x.counter
del x.counter
\end{verbatim}
4099
4100The other kind of instance attribute reference is a \emph{method}.
4101A method is a function that ``belongs to'' an
4102object. (In Python, the term method is not unique to class instances:
4103other object types can have methods as well. For example, list objects have
4104methods called append, insert, remove, sort, and so on. However,
4105in the following discussion, we'll use the term method exclusively to mean
4106methods of class instance objects, unless explicitly stated otherwise.)
4107
4108Valid method names of an instance object depend on its class. By
4109definition, all attributes of a class that are function
4110objects define corresponding methods of its instances. So in our
4111example, \code{x.f} is a valid method reference, since
4112\code{MyClass.f} is a function, but \code{x.i} is not, since
4113\code{MyClass.i} is not. But \code{x.f} is not the same thing as
4114\code{MyClass.f} --- it is a \obindex{method}\emph{method object}, not
4115a function object.
4116
4117
\subsection{Method Objects \label{methodObjects}}

Usually, a method is called right after it is bound:

\begin{verbatim}
x.f()
\end{verbatim}

In the \class{MyClass} example, this will return the string \code{'hello world'}.
However, it is not necessary to call a method right away:
\code{x.f} is a method object, and can be stored away and called at a
later time. For example:

\begin{verbatim}
xf = x.f
while True:
    print xf()
\end{verbatim}

will continue to print \samp{hello world} until the end of time.

What exactly happens when a method is called? You may have noticed
that \code{x.f()} was called without an argument above, even though
the function definition for \method{f} specified an argument. What
happened to the argument? Surely Python raises an exception when a
function that requires an argument is called without any --- even if
the argument isn't actually used...

Actually, you may have guessed the answer: the special thing about
methods is that the object is passed as the first argument of the
function. In our example, the call \code{x.f()} is exactly equivalent
to \code{MyClass.f(x)}. In general, calling a method with a list of
\var{n} arguments is equivalent to calling the corresponding function
with an argument list that is created by inserting the method's object
before the first argument.

If you still don't understand how methods work, a look at the
implementation can perhaps clarify matters. When an instance
attribute is referenced that isn't a data attribute, its class is
searched. If the name denotes a valid class attribute that is a
function object, a method object is created by packing (pointers to)
the instance object and the function object just found together in an
abstract object: this is the method object. When the method object is
called with an argument list, it is unpacked again, a new argument
list is constructed from the instance object and the original argument
list, and the function object is called with this new argument list.


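The equivalence between \code{x.f()} and \code{MyClass.f(x)} can be
checked directly. The following is only a minimal sketch: it restates
the \class{MyClass} definition used earlier in this chapter so that it
is self-contained, and the variable names \code{bound_result} and
\code{explicit_result} are chosen here for illustration. It uses
\keyword{assert} instead of \keyword{print}, so it is silent when the
two calls agree.

```python
# MyClass as used earlier in this chapter, restated so the sketch
# is self-contained.
class MyClass:
    "A simple example class"
    i = 12345
    def f(self):
        return 'hello world'

x = MyClass()

# Binding: x.f packs the instance x and the function f into a method object.
xf = x.f

# Calling the method object puts x in front of the (empty) argument list.
bound_result = xf()

# The same call, written out with the instance passed explicitly.
explicit_result = MyClass.f(x)

assert bound_result == explicit_result == 'hello world'
```
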
\section{Random Remarks \label{remarks}}

% [These should perhaps be placed more carefully...]


Data attributes override method attributes with the same name; to
avoid accidental name conflicts, which may cause hard-to-find bugs in
large programs, it is wise to use some kind of convention that
minimizes the chance of conflicts. Possible conventions include
capitalizing method names, prefixing data attribute names with a small
unique string (perhaps just an underscore), or using verbs for methods
and nouns for data attributes.


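A short sketch of such a name conflict follows. The class
\class{Sensor} and its attribute \code{reading} are invented for this
illustration and appear nowhere else in the tutorial.

```python
class Sensor:
    def reading(self):
        return 42

s = Sensor()
from_method = s.reading()        # the name resolves to the method

# Assigning a data attribute with the same name on the instance
# hides the method for this instance.
s.reading = 7
from_data = s.reading            # now the data attribute is found first

# The function itself is still reachable through the class.
from_class = Sensor.reading(s)
```

After the assignment, \code{s.reading()} would fail because an integer
is not callable, which is the kind of hard-to-find bug a naming
convention helps to avoid.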
Data attributes may be referenced by methods as well as by ordinary
users (``clients'') of an object. In other words, classes are not
usable to implement pure abstract data types. In fact, nothing in
Python makes it possible to enforce data hiding --- it is all based
upon convention. (On the other hand, the Python implementation,
written in C, can completely hide implementation details and control
access to an object if necessary; this can be used by extensions to
Python written in C.)


Clients should use data attributes with care --- clients may mess up
invariants maintained by the methods by stamping on their data
attributes. Note that clients may add data attributes of their own to
an instance object without affecting the validity of the methods, as
long as name conflicts are avoided --- again, a naming convention can
save a lot of headaches here.


There is no shorthand for referencing data attributes (or other
methods!) from within methods. I find that this actually increases
the readability of methods: there is no chance of confusing local
variables and instance variables when glancing through a method.


Often, the first argument of a method is called
\code{self}. This is nothing more than a convention: the name
\code{self} has absolutely no special meaning to Python. (Note,
however, that by not following the convention your code may be less
readable to other Python programmers, and it is also conceivable that
a \emph{class browser} program might be written that relies upon such a
convention.)


Any function object that is a class attribute defines a method for
instances of that class. It is not necessary that the function
definition is textually enclosed in the class definition: assigning a
function object to a local variable in the class is also ok. For
example:

\begin{verbatim}
# Function defined outside the class
def f1(self, x, y):
    return min(x, x+y)

class C:
    f = f1
    def g(self):
        return 'hello world'
    h = g
\end{verbatim}

Now \code{f}, \code{g} and \code{h} are all attributes of class
\class{C} that refer to function objects, and consequently they are all
methods of instances of \class{C} --- \code{h} being exactly equivalent
to \code{g}. Note that this practice usually only serves to confuse
the reader of a program.


Methods may call other methods by using method attributes of the
\code{self} argument:

\begin{verbatim}
class Bag:
    def __init__(self):
        self.data = []
    def add(self, x):
        self.data.append(x)
    def addtwice(self, x):
        self.add(x)
        self.add(x)
\end{verbatim}

Methods may reference global names in the same way as ordinary
functions. The global scope associated with a method is the module
containing the class definition. (The class itself is never used as a
global scope!) While one rarely encounters a good reason for using
global data in a method, there are many legitimate uses of the global
scope: for one thing, functions and modules imported into the global
scope can be used by methods, as well as functions and classes defined
in it. Usually, the class containing the method is itself defined in
this global scope, and in the next section we'll find some good
reasons why a method would want to reference its own class!


\section{Inheritance \label{inheritance}}

Of course, a language feature would not be worthy of the name ``class''
without supporting inheritance. The syntax for a derived class
definition looks like this:

\begin{verbatim}
class DerivedClassName(BaseClassName):
    <statement-1>
    .
    .
    .
    <statement-N>
\end{verbatim}

The name \class{BaseClassName} must be defined in a scope containing
the derived class definition. In place of a base class name, other
arbitrary expressions are also allowed. This can be useful, for
example, when the base class is defined in another module:

\begin{verbatim}
class DerivedClassName(modname.BaseClassName):
\end{verbatim}

Execution of a derived class definition proceeds the same as for a
base class. When the class object is constructed, the base class is
remembered. This is used for resolving attribute references: if a
requested attribute is not found in the class, the search proceeds to the
base class. This rule is applied recursively if the base class itself
is derived from some other class.

There's nothing special about instantiation of derived classes:
\code{DerivedClassName()} creates a new instance of the class. Method
references are resolved as follows: the corresponding class attribute
is searched, descending the chain of base classes if necessary,
and the method reference is valid if this yields a function object.

Derived classes may override methods of their base classes. Because
methods have no special privileges when calling other methods of the
same object, a method of a base class that calls another method
defined in the same base class may end up calling a method of
a derived class that overrides it. (For \Cpp{} programmers: all methods
in Python are effectively \keyword{virtual}.)

An overriding method in a derived class may in fact want to extend
rather than simply replace the base class method of the same name.
There is a simple way to call the base class method directly: just
call \samp{BaseClassName.methodname(self, arguments)}. This is
occasionally useful to clients as well. (Note that this only works if
the base class is defined or imported directly in the global scope.)


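Both points, overriding and extending, can be seen in a small sketch.
The classes \class{Base} and \class{Derived} and their methods are
made up for this illustration.

```python
class Base:
    def describe(self):
        # Calls self.name(), which a derived class may override.
        return 'I am ' + self.name()
    def name(self):
        return 'Base'

class Derived(Base):
    def name(self):
        # Overrides Base.name(); Base.describe() will pick this up.
        return 'Derived'
    def describe(self):
        # Extends the base class method instead of replacing it.
        return Base.describe(self) + '!'

base_text = Base().describe()
derived_text = Derived().describe()
```

Here \code{base_text} is \code{'I am Base'} and \code{derived_text} is
\code{'I am Derived!'}: the call to \code{self.name()} inside
\method{Base.describe()} reaches the overriding method of the derived
class.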
\subsection{Multiple Inheritance \label{multiple}}

Python supports a limited form of multiple inheritance as well. A
class definition with multiple base classes looks like this:

\begin{verbatim}
class DerivedClassName(Base1, Base2, Base3):
    <statement-1>
    .
    .
    .
    <statement-N>
\end{verbatim}

The only rule necessary to explain the semantics is the resolution
rule used for class attribute references. This is depth-first,
left-to-right. Thus, if an attribute is not found in
\class{DerivedClassName}, it is searched in \class{Base1}, then
(recursively) in the base classes of \class{Base1}, and only if it is
not found there, it is searched in \class{Base2}, and so on.

(To some people breadth first --- searching \class{Base2} and
\class{Base3} before the base classes of \class{Base1} --- looks more
natural. However, this would require you to know whether a particular
attribute of \class{Base1} is actually defined in \class{Base1} or in
one of its base classes before you can figure out the consequences of
a name conflict with an attribute of \class{Base2}. The depth-first
rule makes no difference between direct and inherited attributes of
\class{Base1}.)

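The resolution rule can be illustrated with a small hierarchy. This is
a sketch under stated assumptions: the class \class{Root} and the
method \method{origin()} are invented names, and the hierarchy
deliberately has no base class shared by \class{Base1} and
\class{Base2}, which is the case where other resolution orders could
give a different answer.

```python
class Root:
    def origin(self):
        return 'Root'

class Base1(Root):
    pass                      # inherits origin() from Root

class Base2:
    def origin(self):
        return 'Base2'

class Derived(Base1, Base2):
    pass

# Search order: Derived, Base1, then the base classes of Base1 (Root),
# and only after that Base2.
found = Derived().origin()
```

The attribute is found in \class{Root}, a base class of \class{Base1},
before \class{Base2} is ever consulted.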
It is clear that indiscriminate use of multiple inheritance is a
maintenance nightmare, given the reliance in Python on conventions to
avoid accidental name conflicts. A well-known problem with multiple
inheritance is a class derived from two classes that happen to have a
common base class. While it is easy enough to figure out what happens
in this case (the instance will have a single copy of ``instance
variables'' or data attributes used by the common base class), it is
not clear that these semantics are in any way useful.

%% XXX Add rules for new-style MRO?

\section{Private Variables \label{private}}

There is limited support for class-private
identifiers. Any identifier of the form \code{__spam} (at least two
leading underscores, at most one trailing underscore) is textually
replaced with \code{_classname__spam}, where \code{classname} is the
current class name with leading underscore(s) stripped. This mangling
is done without regard to the syntactic position of the identifier, so
it can be used to define class-private instance and class variables,
methods, variables stored in globals, and even variables, private to
this class, that are stored on instances of \emph{other} classes. Truncation
may occur when the mangled name would be longer than 255 characters.
Outside classes, or when the class name consists of only underscores,
no mangling occurs.

Name mangling is intended to give classes an easy way to define
``private'' instance variables and methods, without having to worry
about instance variables defined by derived classes, or mucking with
instance variables by code outside the class. Note that the mangling
rules are designed mostly to avoid accidents; it still is possible for
a determined soul to access or modify a variable that is considered
private. This can even be useful in special circumstances, such as in
the debugger, and that's one reason why this loophole is not closed.
(Buglet: derivation of a class with the same name as the base class
makes use of private variables of the base class possible.)

Notice that code passed to \code{exec}, \code{eval()} or
\code{execfile()} does not consider the classname of the invoking
class to be the current class; this is similar to the effect of the
\code{global} statement, the effect of which is likewise restricted to
code that is byte-compiled together. The same restriction applies to
\code{getattr()}, \code{setattr()} and \code{delattr()}, as well as
when referencing \code{__dict__} directly.


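The following sketch shows the mangling at work. The classes
\class{Account} and \class{AuditedAccount} and the identifier
\code{__balance} are hypothetical names chosen for this illustration.

```python
class Account:
    def __init__(self):
        self.__balance = 0            # stored as _Account__balance
    def deposit(self, amount):
        self.__balance = self.__balance + amount
    def balance(self):
        return self.__balance

class AuditedAccount(Account):
    def __init__(self):
        Account.__init__(self)
        # Stored as _AuditedAccount__balance, so it does not clash
        # with the attribute used by the base class.
        self.__balance = 'not a number'

acct = AuditedAccount()
acct.deposit(25)

base_value = acct.balance()                     # uses _Account__balance
derived_value = acct._AuditedAccount__balance   # the derived class's own copy
loophole = acct._Account__balance               # access from outside still works
unmangled = getattr(acct, '__balance', None)    # getattr() does no mangling
```

The derived class cannot accidentally overwrite the base class's
variable, a determined client can still reach it under its mangled
name, and the string passed to \function{getattr()} is looked up as
written, so no attribute is found there.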
\section{Odds and Ends \label{odds}}

Sometimes it is useful to have a data type similar to the Pascal
``record'' or C ``struct'', bundling together a few named data
items. An empty class definition will do nicely:

\begin{verbatim}
class Employee:
    pass

john = Employee() # Create an empty employee record

# Fill the fields of the record
john.name = 'John Doe'
john.dept = 'computer lab'
john.salary = 1000
\end{verbatim}

A piece of Python code that expects a particular abstract data type
can often be passed a class that emulates the methods of that data
type instead. For instance, if you have a function that formats some
data from a file object, you can define a class with methods
\method{read()} and \method{readline()} that get the data from a string
buffer instead, and pass it as an argument.% (Unfortunately, this
%technique has its limitations: a class can't define operations that
%are accessed by special syntax such as sequence subscripting or
%arithmetic operators, and assigning such a ``pseudo-file'' to
%\code{sys.stdin} will not cause the interpreter to read further input
%from it.)


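A minimal sketch of such a stand-in for a file object follows. The
class \class{StringFile} and the function \function{format_header()}
are hypothetical; only the method names \method{read()} and
\method{readline()} come from the file object interface.

```python
class StringFile:
    "Minimal stand-in for a file object, backed by a string"
    def __init__(self, text):
        self.lines = text.splitlines(True)   # True keeps the line endings
        self.position = 0
    def readline(self):
        if self.position >= len(self.lines):
            return ''                        # empty string signals end of data
        line = self.lines[self.position]
        self.position = self.position + 1
        return line
    def read(self):
        # Return everything that has not been read yet.
        rest = ''.join(self.lines[self.position:])
        self.position = len(self.lines)
        return rest

def format_header(f):
    "Works on any object that has a readline() method"
    return f.readline().strip().upper()

source = StringFile('title line\nbody line 1\nbody line 2\n')
header = format_header(source)
remainder = source.read()
```

\function{format_header()} never checks the type of its argument; it
only calls \method{readline()}, so a real file object and the
emulating class are interchangeable here.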
Instance method objects have attributes, too: \code{m.im_self} is the
instance object with the method \method{m}, and \code{m.im_func} is the
function object corresponding to the method.


\section{Exceptions Are Classes Too\label{exceptionClasses}}

User-defined exceptions are identified by classes as well. Using this
mechanism it is possible to create extensible hierarchies of exceptions.

There are two new valid (semantic) forms for the raise statement:

\begin{verbatim}
raise Class, instance

raise instance
\end{verbatim}

In the first form, \code{instance} must be an instance of
\class{Class} or of a class derived from it. The second form is a
shorthand for:

\begin{verbatim}
raise instance.__class__, instance
\end{verbatim}

A class in an except clause is compatible with an exception if it is the same
class or a base class thereof (but not the other way around --- an
except clause listing a derived class is not compatible with a base
class). For example, the following code will print B, C, D in that
order:

\begin{verbatim}
class B:
    pass
class C(B):
    pass
class D(C):
    pass

for c in [B, C, D]:
    try:
        raise c()
    except D:
        print "D"
    except C:
        print "C"
    except B:
        print "B"
\end{verbatim}

Note that if the except clauses were reversed (with
\samp{except B} first), it would have printed B, B, B --- the first
matching except clause is triggered.

When an error message is printed for an unhandled exception, the
exception's class name is printed, then a colon and a space, and
finally the instance converted to a string using the built-in function
\function{str()}.


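As a rough sketch of a small exception hierarchy, the following
defines two hypothetical classes, \exception{ConfigError} and
\exception{MissingKey}, derived from the built-in
\exception{Exception}. It catches the derived exception through its
base class and then rebuilds by hand the ``class name, colon, space,
\function{str()}'' layout described above; the interpreter's own
message may differ in detail. The instance is fetched with
\function{sys.exc_info()} only to keep the sketch independent of the
syntax used to name the exception in the except clause.

```python
import sys

class ConfigError(Exception):
    "Base class for the errors of a hypothetical configuration reader"

class MissingKey(ConfigError):
    def __init__(self, key):
        Exception.__init__(self, key)
        self.key = key
    def __str__(self):
        return 'missing key %r' % self.key

try:
    raise MissingKey('port')
except ConfigError:
    # A base class in the except clause matches the derived exception.
    instance = sys.exc_info()[1]
    message = '%s: %s' % (instance.__class__.__name__, str(instance))
```
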
\section{Iterators\label{iterators}}

By now you have probably noticed that most container objects can be looped
over using a \keyword{for} statement:

\begin{verbatim}
for element in [1, 2, 3]:
    print element
for element in (1, 2, 3):
    print element
for key in {'one':1, 'two':2}:
    print key
for char in "123":
    print char
for line in open("myfile.txt"):
    print line
\end{verbatim}

This style of access is clear, concise, and convenient. The use of iterators
pervades and unifies Python. Behind the scenes, the \keyword{for}
statement calls \function{iter()} on the container object. The
function returns an iterator object that defines the method
\method{next()} which accesses elements in the container one at a
time. When there are no more elements, \method{next()} raises a
\exception{StopIteration} exception which tells the \keyword{for} loop
to terminate. This example shows how it all works:

\begin{verbatim}
>>> s = 'abc'
>>> it = iter(s)
>>> it
<iterator object at 0x00A1DB50>
>>> it.next()
'a'
>>> it.next()
'b'
>>> it.next()
'c'
>>> it.next()

Traceback (most recent call last):
  File "<stdin>", line 1, in ?
    it.next()
StopIteration
\end{verbatim}

Having seen the mechanics behind the iterator protocol, it is easy to add
iterator behavior to your classes. Define a \method{__iter__()} method
which returns an object with a \method{next()} method. If the class defines
\method{next()}, then \method{__iter__()} can just return \code{self}:

\begin{verbatim}
class Reverse:
    "Iterator for looping over a sequence backwards"
    def __init__(self, data):
        self.data = data
        self.index = len(data)
    def __iter__(self):
        return self
    def next(self):
        if self.index == 0:
            raise StopIteration
        self.index = self.index - 1
        return self.data[self.index]

>>> for char in Reverse('spam'):
...     print char
...
m
a
p
s
\end{verbatim}


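The \class{Reverse} object above is its own iterator, so it can be
looped over only once. The other option mentioned in the text, an
\method{__iter__()} method that returns a separate object with a
\method{next()} method, might be sketched as follows. The classes
\class{Countdown} and \class{CountdownIterator} are invented for this
illustration. The alias \code{__next__ = next} is not needed by the
Python version this tutorial describes; it is an addition made here
only so that the sketch also runs on later versions, which renamed the
method.

```python
class Countdown:
    "Container-like object; each __iter__() call returns a fresh iterator"
    def __init__(self, start):
        self.start = start
    def __iter__(self):
        return CountdownIterator(self.start)

class CountdownIterator:
    "Holds the position of one loop over a Countdown"
    def __init__(self, current):
        self.current = current
    def __iter__(self):
        return self
    def next(self):
        if self.current <= 0:
            raise StopIteration
        self.current = self.current - 1
        return self.current + 1
    __next__ = next        # assumption: alias for later Python versions

c = Countdown(3)
first_pass = list(c)       # list() drives the iterator protocol, like for
second_pass = list(c)      # a new iterator, so the values appear again
```

Because the loop state lives in the iterator rather than in the
container, both passes produce \code{[3, 2, 1]}.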
\section{Generators\label{generators}}

Generators are a simple and powerful tool for creating iterators. They are
written like regular functions but use the \keyword{yield} statement whenever
they want to return data. Each time \method{next()} is called, the
generator resumes where it left off (it remembers all the data values and
which statement was last executed). An example shows that generators can
be trivially easy to create:

\begin{verbatim}
def reverse(data):
    for index in range(len(data)-1, -1, -1):
        yield data[index]

>>> for char in reverse('golf'):
...     print char
...
f
l
o
g
\end{verbatim}

Anything that can be done with generators can also be done with class based
iterators as described in the previous section. What makes generators so
compact is that the \method{__iter__()} and \method{next()} methods are
created automatically.

Another key feature is that the local variables and execution state
are automatically saved between calls. This makes the function easier to write
and much clearer than an approach using instance variables like
\code{self.index} and \code{self.data}.

In addition to automatic method creation and saving program state, when
generators terminate, they automatically raise \exception{StopIteration}.
In combination, these features make it easy to create iterators with no
more effort than writing a regular function.

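The saving of local variables between calls is easiest to see in a
generator that carries more than one piece of state. The function
\function{fibonacci()} below is an illustrative sketch and not part of
any library.

```python
def fibonacci(limit):
    "Yield the Fibonacci numbers smaller than limit"
    a, b = 0, 1
    while a < limit:
        yield a            # execution pauses here; a and b are kept
        a, b = b, a + b
    # Falling off the end raises StopIteration automatically.

values = list(fibonacci(50))
```

A class based iterator for the same sequence would have to store
\code{a} and \code{b} as instance variables and write
\method{__iter__()} and \method{next()} by hand.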
\section{Generator Expressions\label{genexps}}

Some simple generators can be coded succinctly as expressions using a syntax
similar to list comprehensions but with parentheses instead of brackets. These
expressions are designed for situations where the generator is used right
away by an enclosing function. Generator expressions are more compact but
less versatile than full generator definitions and tend to be more memory
friendly than equivalent list comprehensions.

Examples:

\begin{verbatim}
>>> sum(i*i for i in range(10)) # sum of squares
285

>>> xvec = [10, 20, 30]
>>> yvec = [7, 5, 3]
>>> sum(x*y for x,y in zip(xvec, yvec)) # dot product
260

>>> from math import pi, sin
>>> sine_table = dict((x, sin(x*pi/180)) for x in range(0, 91))

>>> unique_words = set(word for line in page for word in line.split())

>>> valedictorian = max((student.gpa, student.name) for student in graduates)

>>> data = 'golf'
>>> list(data[i] for i in range(len(data)-1,-1,-1))
['f', 'l', 'o', 'g']

\end{verbatim}



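The difference from a list comprehension can be sketched as follows,
using made-up sample data. The list comprehension builds all its
values at once, while the generator expression produces them on demand
and is used up after one pass.

```python
data = [3, 1, 4, 1, 5]

squares_list = [n * n for n in data]    # list comprehension: built immediately
squares_gen = (n * n for n in data)     # generator expression: produced on demand

total = sum(squares_gen)                # consumes the generator
leftover = list(squares_gen)            # nothing is left the second time
```

This is why generator expressions suit the case named above, where the
result is used right away by an enclosing function such as
\function{sum()}.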
\chapter{Brief Tour of the Standard Library \label{briefTour}}


\section{Operating System Interface\label{os-interface}}

The \ulink{\module{os}}{../lib/module-os.html}
module provides dozens of functions for interacting with the
operating system:

\begin{verbatim}
>>> import os
>>> os.system('time 0:02')
0
>>> os.getcwd() # Return the current working directory
'C:\\Python24'
>>> os.chdir('/server/accesslogs')
\end{verbatim}

Be sure to use the \samp{import os} style instead of
\samp{from os import *}. This will keep \function{os.open()} from
shadowing the builtin \function{open()} function which operates much
differently.

\bifuncindex{help}
The builtin \function{dir()} and \function{help()} functions are useful
as interactive aids for working with large modules like \module{os}:

\begin{verbatim}
>>> import os
>>> dir(os)
<returns a list of all module functions>
>>> help(os)
<returns an extensive manual page created from the module's docstrings>
\end{verbatim}

For daily file and directory management tasks, the
\ulink{\module{shutil}}{../lib/module-shutil.html}
module provides a higher level interface that is easier to use:

\begin{verbatim}
>>> import shutil
>>> shutil.copyfile('data.db', 'archive.db')
>>> shutil.move('/build/executables', 'installdir')
\end{verbatim}


\section{File Wildcards\label{file-wildcards}}

The \ulink{\module{glob}}{../lib/module-glob.html}
module provides a function for making file lists from directory
wildcard searches:

\begin{verbatim}
>>> import glob
>>> glob.glob('*.py')
['primes.py', 'random.py', 'quote.py']
\end{verbatim}


\section{Command Line Arguments\label{command-line-arguments}}

Common utility scripts often need to process command line arguments.
These arguments are stored in the
\ulink{\module{sys}}{../lib/module-sys.html}\ module's \var{argv}
attribute as a list. For instance, the following output results from
running \samp{python demo.py one two three} at the command line:

\begin{verbatim}
>>> import sys
>>> print sys.argv
['demo.py', 'one', 'two', 'three']
\end{verbatim}

The \ulink{\module{getopt}}{../lib/module-getopt.html}
module processes \var{sys.argv} using the conventions of the \UNIX{}
\function{getopt()} function. More powerful and flexible command line
processing is provided by the
\ulink{\module{optparse}}{../lib/module-optparse.html} module.


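A minimal sketch of \module{optparse} in use follows. The option
names, the default file name and the argument list are invented for
this illustration. Normally \method{parse_args()} reads
\code{sys.argv[1:]}; an explicit list is passed here so that the
sketch behaves the same wherever it is run.

```python
from optparse import OptionParser

parser = OptionParser(usage='%prog [options] file...')
parser.add_option('-o', '--output', dest='output', default='out.txt',
                  help='write the result to FILE', metavar='FILE')
parser.add_option('-v', '--verbose', action='store_true', dest='verbose',
                  default=False, help='print progress messages')

# Hypothetical command line: demo.py -v --output report.txt one two
options, args = parser.parse_args(['-v', '--output', 'report.txt',
                                   'one', 'two'])

output_name = options.output      # value given after --output
verbose = options.verbose         # set by the -v flag
positional = args                 # whatever is not an option
```

The parser also generates a \samp{--help} message from the
\code{help} strings, which is part of what makes it more convenient
than processing \var{sys.argv} by hand.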
\section{Error Output Redirection and Program Termination\label{stderr}}

The \ulink{\module{sys}}{../lib/module-sys.html}
module also has attributes for \var{stdin}, \var{stdout}, and
\var{stderr}. The latter is useful for emitting warnings and error
messages to make them visible even when \var{stdout} has been redirected:

\begin{verbatim}
>>> sys.stderr.write('Warning, log file not found starting a new one\n')
Warning, log file not found starting a new one
\end{verbatim}

The most direct way to terminate a script is to use \samp{sys.exit()}.


\section{String Pattern Matching\label{string-pattern-matching}}

The \ulink{\module{re}}{../lib/module-re.html}
module provides regular expression tools for advanced string processing.
For complex matching and manipulation, regular expressions offer succinct,
optimized solutions:

\begin{verbatim}
>>> import re
>>> re.findall(r'\bf[a-z]*', 'which foot or hand fell fastest')
['foot', 'fell', 'fastest']
>>> re.sub(r'(\b[a-z]+) \1', r'\1', 'cat in the the hat')
'cat in the hat'
\end{verbatim}

When only simple capabilities are needed, string methods are preferred
because they are easier to read and debug:

\begin{verbatim}
>>> 'tea for too'.replace('too', 'two')
'tea for two'
\end{verbatim}

\section{Mathematics\label{mathematics}}

The \ulink{\module{math}}{../lib/module-math.html} module gives
access to the underlying C library functions for floating point math:

\begin{verbatim}
>>> import math
>>> math.cos(math.pi / 4.0)
0.70710678118654757
>>> math.log(1024, 2)
10.0
\end{verbatim}

The \ulink{\module{random}}{../lib/module-random.html}
module provides tools for making random selections:

\begin{verbatim}
>>> import random
>>> random.choice(['apple', 'pear', 'banana'])
'apple'
>>> random.sample(xrange(100), 10) # sampling without replacement
[30, 83, 16, 4, 8, 81, 41, 50, 18, 33]
>>> random.random() # random float
0.17970987693706186
>>> random.randrange(6) # random integer chosen from range(6)
4
\end{verbatim}


\section{Internet Access\label{internet-access}}

There are a number of modules for accessing the internet and processing
internet protocols. Two of the simplest are
\ulink{\module{urllib2}}{../lib/module-urllib2.html}
for retrieving data from URLs and
\ulink{\module{smtplib}}{../lib/module-smtplib.html}
for sending mail:

\begin{verbatim}
>>> import urllib2
>>> for line in urllib2.urlopen('http://tycho.usno.navy.mil/cgi-bin/timer.pl'):
...     if 'EST' in line or 'EDT' in line: # look for Eastern Time
...         print line

<BR>Nov. 25, 09:43:32 PM EST

>>> import smtplib
>>> server = smtplib.SMTP('localhost')
>>> server.sendmail('soothsayer@example.org', 'jcaesar@example.org',
"""To: jcaesar@example.org
From: soothsayer@example.org

Beware the Ides of March.
""")
>>> server.quit()
\end{verbatim}


\section{Dates and Times\label{dates-and-times}}

The \ulink{\module{datetime}}{../lib/module-datetime.html} module
supplies classes for manipulating dates and times in both simple
and complex ways. While date and time arithmetic is supported, the
focus of the implementation is on efficient member extraction for
output formatting and manipulation. The module also supports objects
that are timezone aware.

\begin{verbatim}
# dates are easily constructed and formatted
>>> from datetime import date
>>> now = date.today()
>>> now
datetime.date(2003, 12, 2)
>>> now.strftime("%m-%d-%y. %d %b %Y is a %A on the %d day of %B.")
'12-02-03. 02 Dec 2003 is a Tuesday on the 02 day of December.'

# dates support calendar arithmetic
>>> birthday = date(1964, 7, 31)
>>> age = now - birthday
>>> age.days
14368
\end{verbatim}


\section{Data Compression\label{data-compression}}

Common data archiving and compression formats are directly supported
by modules including:
\ulink{\module{zlib}}{../lib/module-zlib.html},
\ulink{\module{gzip}}{../lib/module-gzip.html},
\ulink{\module{bz2}}{../lib/module-bz2.html},
\ulink{\module{zipfile}}{../lib/module-zipfile.html}, and
\ulink{\module{tarfile}}{../lib/module-tarfile.html}.

\begin{verbatim}
>>> import zlib
>>> s = 'witch which has which witches wrist watch'
>>> len(s)
41
>>> t = zlib.compress(s)
>>> len(t)
37
>>> zlib.decompress(t)
'witch which has which witches wrist watch'
>>> zlib.crc32(s)
226805979
\end{verbatim}


\section{Performance Measurement\label{performance-measurement}}

Some Python users develop a deep interest in knowing the relative
performance of different approaches to the same problem.
Python provides a measurement tool that answers those questions
immediately.

For example, it may be tempting to use the tuple packing and unpacking
feature instead of the traditional approach to swapping arguments.
The \ulink{\module{timeit}}{../lib/module-timeit.html} module
quickly demonstrates a modest performance advantage:

\begin{verbatim}
>>> from timeit import Timer
>>> Timer('t=a; a=b; b=t', 'a=1; b=2').timeit()
0.57535828626024577
>>> Timer('a,b = b,a', 'a=1; b=2').timeit()
0.54962537085770791
\end{verbatim}

In contrast to \module{timeit}'s fine level of granularity, the
\ulink{\module{profile}}{../lib/module-profile.html} and \module{pstats}
modules provide tools for identifying time critical sections in larger blocks
of code.


\section{Quality Control\label{quality-control}}

One approach for developing high quality software is to write tests for
each function as it is developed and to run those tests frequently during
the development process.

The \ulink{\module{doctest}}{../lib/module-doctest.html} module provides
a tool for scanning a module and validating tests embedded in a program's
docstrings. Test construction is as simple as cutting-and-pasting a
typical call along with its results into the docstring. This improves
the documentation by providing the user with an example and it allows the
doctest module to make sure the code remains true to the documentation:

\begin{verbatim}
def average(values):
    """Computes the arithmetic mean of a list of numbers.

    >>> print average([20, 30, 70])
    40.0
    """
    return sum(values, 0.0) / len(values)

import doctest
doctest.testmod() # automatically validate the embedded tests
\end{verbatim}

The \ulink{\module{unittest}}{../lib/module-unittest.html} module is not
as effortless as the \module{doctest} module, but it allows a more
comprehensive set of tests to be maintained in a separate file:

\begin{verbatim}
import unittest

class TestStatisticalFunctions(unittest.TestCase):

    def test_average(self):
        self.assertEqual(average([20, 30, 70]), 40.0)
        self.assertEqual(round(average([1, 5, 7]), 1), 4.3)
        self.assertRaises(ZeroDivisionError, average, [])
        self.assertRaises(TypeError, average, 20, 30, 70)

unittest.main() # Calling from the command line invokes all tests
\end{verbatim}

4929\section{Batteries Included\label{batteries-included}}
4930
4931Python has a ``batteries included'' philosophy. This is best seen
4932through the sophisticated and robust capabilities of its larger
4933packages. For example:
4934
4935\begin{itemize}
4936\item The \ulink{\module{xmlrpclib}}{../lib/module-xmlrpclib.html} and
4937 \ulink{\module{SimpleXMLRPCServer}}{../lib/module-SimpleXMLRPCServer.html}
4938 modules make implementing remote procedure calls into an almost trivial task.
4939 Despite the modules names, no direct knowledge or handling of XML is needed.
4940\item The \ulink{\module{email}}{../lib/module-email.html} package is a library
4941 for managing email messages, including MIME and other RFC 2822-based message
4942 documents. Unlike \module{smtplib} and \module{poplib} which actually send
4943 and receive messages, the email package has a complete toolset for building
4944 or decoding complex message structures (including attachments) and for
4945 implementing internet encoding and header protocols.
4946\item The \ulink{\module{xml.dom}}{../lib/module-xml.dom.html} and
4947 \ulink{\module{xml.sax}}{../lib/module-xml.sax.html} packages provide robust
4948 support for parsing this popular data interchange format. Likewise, the
4949 \ulink{\module{csv}}{../lib/module-csv.html} module supports direct reads and
4950 writes in a common database format. Together, these modules and packages
4951 greatly simplify data interchange between python applications and other
4952 tools.
4953\item Internationalization is supported by a number of modules including
4954 \ulink{\module{gettext}}{../lib/module-gettext.html},
4955 \ulink{\module{locale}}{../lib/module-locale.html}, and the
4956 \ulink{\module{codecs}}{../lib/module-codecs.html} package.
4957\end{itemize}

\chapter{Brief Tour of the Standard Library -- Part II\label{briefTourTwo}}

This second tour covers more advanced modules that support professional
programming needs. These modules rarely occur in small scripts.


\section{Output Formatting\label{output-formatting}}

The \ulink{\module{repr}}{../lib/module-repr.html} module provides a
version of \function{repr()} customized for abbreviated displays of large
or deeply nested containers:

\begin{verbatim}
    >>> import repr
    >>> repr.repr(set('supercalifragilisticexpialidocious'))
    "set(['a', 'c', 'd', 'e', 'f', 'g', ...])"
\end{verbatim}

The \ulink{\module{pprint}}{../lib/module-pprint.html} module offers
more sophisticated control over printing both built-in and user-defined
objects in a way that is readable by the interpreter. When the result
is longer than one line, the ``pretty printer'' adds line breaks and
indentation to more clearly reveal data structure:

\begin{verbatim}
    >>> import pprint
    >>> t = [[[['black', 'cyan'], 'white', ['green', 'red']], [['magenta',
    ...     'yellow'], 'blue']]]
    ...
    >>> pprint.pprint(t, width=30)
    [[[['black', 'cyan'],
       'white',
       ['green', 'red']],
      [['magenta', 'yellow'],
       'blue']]]
\end{verbatim}

The \ulink{\module{textwrap}}{../lib/module-textwrap.html} module
formats paragraphs of text to fit a given screen width:

\begin{verbatim}
    >>> import textwrap
    >>> doc = """The wrap() method is just like fill() except that it returns
    ... a list of strings instead of one big string with newlines to separate
    ... the wrapped lines."""
    ...
    >>> print textwrap.fill(doc, width=40)
    The wrap() method is just like fill()
    except that it returns a list of strings
    instead of one big string with newlines
    to separate the wrapped lines.
\end{verbatim}
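
As the sample text itself says, \function{wrap()} returns the wrapped lines
as a list instead of a single string. The following is a minimal sketch of
that relationship; the variable names are chosen only for illustration:

```python
import textwrap

doc = ("The wrap() method is just like fill() except that it returns "
       "a list of strings instead of one big string with newlines to "
       "separate the wrapped lines.")

lines = textwrap.wrap(doc, width=40)    # list of lines without newlines
joined = "\n".join(lines)               # same text that fill() would return
longest = max([len(line) for line in lines])
```

A list of lines is convenient when each line needs further processing, such
as adding a prefix, before it is written out.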

The \ulink{\module{locale}}{../lib/module-locale.html} module accesses
a database of culture-specific data formats. The \var{grouping} argument
of locale's \function{format()} function provides a direct way of formatting
numbers with group separators:

\begin{verbatim}
    >>> import locale
    >>> locale.setlocale(locale.LC_ALL, 'English_United States.1252')
    'English_United States.1252'
    >>> conv = locale.localeconv()          # get a mapping of conventions
    >>> x = 1234567.8
    >>> locale.format("%d", x, grouping=True)
    '1,234,567'
    >>> locale.format("%s%.*f", (conv['currency_symbol'],
    ...               conv['frac_digits'], x), grouping=True)
    '$1,234,567.80'
\end{verbatim}


\section{Templating\label{templating}}

The \ulink{\module{string}}{../lib/module-string.html} module includes a
versatile \class{Template} class with a simplified syntax suitable for
editing by end-users. This allows users to customize their applications
without having to alter the application.

The format uses placeholder names formed by \samp{\$} followed by a valid
Python identifier (alphanumeric characters and underscores). Surrounding the
placeholder with braces allows it to be followed by more alphanumeric letters
with no intervening spaces. Writing \samp{\$\$} creates a single escaped
\samp{\$}:

\begin{verbatim}
>>> from string import Template
>>> t = Template('${village}folk send $$10 to $cause.')
>>> t.substitute(village='Nottingham', cause='the ditch fund')
'Nottinghamfolk send $10 to the ditch fund.'
\end{verbatim}

The \method{substitute} method raises a \exception{KeyError} when a
placeholder is not supplied in a dictionary or a keyword argument. For
mail-merge style applications, user-supplied data may be incomplete and the
\method{safe_substitute} method may be more appropriate, since it leaves
placeholders unchanged if data is missing:

\begin{verbatim}
>>> t = Template('Return the $item to $owner.')
>>> d = dict(item='unladen swallow')
>>> t.substitute(d)
Traceback (most recent call last):
  . . .
KeyError: 'owner'
>>> t.safe_substitute(d)
'Return the unladen swallow to $owner.'
\end{verbatim}

Template subclasses can specify a custom delimiter. For example, a batch
renaming utility for a photo browser may elect to use percent signs for
placeholders such as the current date, image sequence number, or file format:

\begin{verbatim}
>>> import time, os.path
>>> photofiles = ['img_1074.jpg', 'img_1076.jpg', 'img_1077.jpg']
>>> class BatchRename(Template):
...     delimiter = '%'
>>> fmt = raw_input('Enter rename style (%d-date %n-seqnum %f-format): ')
Enter rename style (%d-date %n-seqnum %f-format): Ashley_%n%f

>>> t = BatchRename(fmt)
>>> date = time.strftime('%d%b%y')
>>> for i, filename in enumerate(photofiles):
...     base, ext = os.path.splitext(filename)
...     newname = t.substitute(d=date, n=i, f=ext)
...     print '%s --> %s' % (filename, newname)

img_1074.jpg --> Ashley_0.jpg
img_1076.jpg --> Ashley_1.jpg
img_1077.jpg --> Ashley_2.jpg
\end{verbatim}

Another application for templating is separating program logic from the
details of multiple output formats. This makes it possible to substitute
custom templates for XML files, plain text reports, and HTML web reports.
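
One way to picture this separation is a table of templates keyed by output
format, with the program logic reduced to a single lookup. The sketch below
is only an illustration: the record, the template strings, and the
\function{render()} helper are all invented for this example.

```python
from string import Template

# Hypothetical record and templates, invented for this illustration.
record = dict(name='img_1074.jpg', size='2048')

formats = {
    'text': Template('$name ($size bytes)'),
    'html': Template('<li>$name <em>$size bytes</em></li>'),
    'xml':  Template('<file name="$name" size="$size"/>'),
}

def render(kind, data):
    """Format data with the template registered under kind."""
    return formats[kind].substitute(data)

text_line = render('text', record)
html_line = render('html', record)
```

Adding a new output format then means adding one entry to the table; the
code that gathers the data does not change.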


\section{Working with Binary Data Record Layouts\label{binary-formats}}

The \ulink{\module{struct}}{../lib/module-struct.html} module provides
\function{pack()} and \function{unpack()} functions for working with
variable length binary record formats. The following example shows how
to loop through header information in a ZIP file. Pack codes
\code{"H"} and \code{"I"} represent two and four byte unsigned
numbers respectively, and the leading \code{"<"} selects standard sizes
and little-endian byte order, so the layout does not depend on the
platform's native sizes and alignment:

\begin{verbatim}
    import struct

    data = open('myfile.zip', 'rb').read()
    start = 0
    for i in range(3):                      # show the first 3 file headers
        start += 14
        fields = struct.unpack('<IIIHH', data[start:start+16])
        crc32, comp_size, uncomp_size, filenamesize, extra_size = fields

        start += 16
        filename = data[start:start+filenamesize]
        start += filenamesize
        extra = data[start:start+extra_size]
        print filename, hex(crc32), comp_size, uncomp_size

        start += extra_size + comp_size     # skip to the next header
\end{verbatim}
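
The fixed-size part of such a record can be checked in isolation. The
following sketch assumes a 16-byte little-endian layout of three four-byte
and two two-byte unsigned fields (the \code{"<"} prefix selects standard
sizes regardless of platform); the field values are made up for the round
trip:

```python
import struct

fmt = '<IIIHH'                        # three 4-byte and two 2-byte unsigned fields
record_size = struct.calcsize(fmt)    # 3*4 + 2*2 bytes

# Made-up field values, used only to show the round trip.
fields = (0xDEADBEEF, 1024, 4096, 8, 0)
packed = struct.pack(fmt, *fields)
unpacked = struct.unpack(fmt, packed)
```

Calling \function{calcsize()} before slicing the data is a cheap way to
confirm that a format string really describes the number of bytes expected.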


\section{Multi-threading\label{multi-threading}}

Threading is a technique for decoupling tasks which are not sequentially
dependent. Threads can be used to improve the responsiveness of
applications that accept user input while other tasks run in the
background. A related use case is running I/O in parallel with
computations in another thread.

The following code shows how the high-level
\ulink{\module{threading}}{../lib/module-threading.html} module can run
tasks in the background while the main program continues to run:

\begin{verbatim}
    import threading, zipfile

    class AsyncZip(threading.Thread):
        def __init__(self, infile, outfile):
            threading.Thread.__init__(self)
            self.infile = infile
            self.outfile = outfile
        def run(self):
            f = zipfile.ZipFile(self.outfile, 'w', zipfile.ZIP_DEFLATED)
            f.write(self.infile)
            f.close()
            print 'Finished background zip of: ', self.infile

    background = AsyncZip('mydata.txt', 'myarchive.zip')
    background.start()
    print 'The main program continues to run in foreground.'

    background.join()    # Wait for the background task to finish
    print 'Main program waited until background was done.'
\end{verbatim}

The principal challenge of multi-threaded applications is coordinating
threads that share data or other resources. To that end, the threading
module provides a number of synchronization primitives including locks,
events, condition variables, and semaphores.

While those tools are powerful, minor design errors can result in
problems that are difficult to reproduce. So, the preferred approach
to task coordination is to concentrate all access to a resource
in a single thread and then use the
\ulink{\module{Queue}}{../lib/module-Queue.html} module to feed that
thread with requests from other threads. Applications using
\class{Queue} objects for inter-thread communication and coordination
are easier to design, more readable, and more reliable.
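
A minimal sketch of that pattern follows. It assumes a single worker thread
that owns the \code{results} list and a \code{None} sentinel to signal the
end of the work; both conventions are choices made for this illustration,
not requirements of the module. The import is written so that the sketch
also runs where the module is named \module{queue}.

```python
import threading
try:
    import queue              # name used by later Python versions
except ImportError:
    import Queue as queue     # module name in this release

requests = queue.Queue()
results = []                  # only the worker thread appends to this list

def worker():
    while True:
        item = requests.get()
        if item is None:      # sentinel: no more work
            requests.task_done()
            break
        results.append(item * item)
        requests.task_done()

t = threading.Thread(target=worker)
t.start()

for n in range(5):            # other threads would call put() the same way
    requests.put(n)
requests.put(None)

requests.join()               # block until every request has been processed
t.join()
```

Because only the worker touches \code{results}, no lock is needed; the queue
itself handles the hand-off between threads.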


\section{Logging\label{logging}}

The \ulink{\module{logging}}{../lib/module-logging.html} module offers
a full-featured and flexible logging system. At its simplest, log
messages are sent to a file or to \code{sys.stderr}:

\begin{verbatim}
    import logging
    logging.debug('Debugging information')
    logging.info('Informational message')
    logging.warning('Warning:config file %s not found', 'server.conf')
    logging.error('Error occurred')
    logging.critical('Critical error -- shutting down')
\end{verbatim}

This produces the following output:

\begin{verbatim}
    WARNING:root:Warning:config file server.conf not found
    ERROR:root:Error occurred
    CRITICAL:root:Critical error -- shutting down
\end{verbatim}

By default, informational and debugging messages are suppressed and the
output is sent to standard error. Other output options include routing
messages through email, datagrams, sockets, or to an HTTP server. New
filters can select different routing based on message priority:
\constant{DEBUG}, \constant{INFO}, \constant{WARNING}, \constant{ERROR},
and \constant{CRITICAL}.

The logging system can be configured directly from Python or can be
loaded from a user-editable configuration file for customized logging
without altering the application.
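
As a sketch of configuring the system directly from Python, the following
attaches a handler and a level to a named logger. The \class{ListHandler}
class and the logger name are hypothetical, invented here so the effect of
the level setting can be inspected; a real application would more likely use
one of the handlers supplied by the module.

```python
import logging

class ListHandler(logging.Handler):
    """Hypothetical handler that keeps formatted messages in a list."""
    def __init__(self):
        logging.Handler.__init__(self)
        self.messages = []
    def emit(self, record):
        self.messages.append(self.format(record))

log = logging.getLogger('tutorial.example')   # illustrative logger name
log.setLevel(logging.INFO)                    # let INFO and above through
log.propagate = False                         # keep output away from the root logger

handler = ListHandler()
handler.setFormatter(logging.Formatter('%(levelname)s:%(name)s:%(message)s'))
log.addHandler(handler)

log.debug('Debugging information')            # below INFO, so dropped
log.info('Informational message')
log.warning('Warning:config file %s not found', 'server.conf')
```

Lowering the level to \constant{DEBUG} would let the first message through
as well, without any change to the calls that produce the messages.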


\section{Weak References\label{weak-references}}

Python does automatic memory management (reference counting for most
objects and garbage collection to eliminate cycles). The memory is
freed shortly after the last reference to it has been eliminated.

This approach works fine for most applications but occasionally there
is a need to track objects only as long as they are being used by
something else. Unfortunately, just tracking them creates a reference
that makes them permanent. The
\ulink{\module{weakref}}{../lib/module-weakref.html} module provides
tools for tracking objects without creating a reference. When the
object is no longer needed, it is automatically removed from a weakref
table and a callback is triggered for weakref objects. Typical
applications include caching objects that are expensive to create:

\begin{verbatim}
    >>> import weakref, gc
    >>> class A:
    ...     def __init__(self, value):
    ...         self.value = value
    ...     def __repr__(self):
    ...         return str(self.value)
    ...
    >>> a = A(10)                   # create a reference
    >>> d = weakref.WeakValueDictionary()
    >>> d['primary'] = a            # does not create a reference
    >>> d['primary']                # fetch the object if it is still alive
    10
    >>> del a                       # remove the one reference
    >>> gc.collect()                # run garbage collection right away
    0
    >>> d['primary']                # entry was automatically removed
    Traceback (most recent call last):
      File "<pyshell#108>", line 1, in -toplevel-
        d['primary']                # entry was automatically removed
      File "C:/PY24/lib/weakref.py", line 46, in __getitem__
        o = self.data[key]()
    KeyError: 'primary'
\end{verbatim}
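
The callback mentioned above can be seen with a plain weak reference. In
this sketch the \class{Expensive} class and the \function{on_finalize()}
function are placeholders invented for the example, and the immediate
cleanup after \keyword{del} relies on reference counting as done by the
standard interpreter:

```python
import gc
import weakref

class Expensive(object):
    """Stand-in for an object that is costly to create."""
    pass

collected = []

def on_finalize(reference):
    # Called with the (now dead) weak reference object.
    collected.append('gone')

obj = Expensive()
ref = weakref.ref(obj, on_finalize)
alive_before = ref() is obj      # the referent is still reachable

del obj                          # drop the only strong reference
gc.collect()                     # not needed under reference counting, but harmless
alive_after = ref() is not None
```

Calling the weak reference returns the object while it is alive and
\code{None} afterwards, so callers must always check the result.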

\section{Tools for Working with Lists\label{list-tools}}

Many data structure needs can be met with the built-in list type.
However, sometimes there is a need for alternative implementations
with different performance trade-offs.

The \ulink{\module{array}}{../lib/module-array.html} module provides an
\class{array()} object that is like a list that stores only homogeneous
data and stores it more compactly. The following example shows an array
of numbers stored as two byte unsigned binary numbers (typecode
\code{"H"}) rather than the usual 16 bytes per entry for regular lists
of Python int objects:

\begin{verbatim}
    >>> from array import array
    >>> a = array('H', [4000, 10, 700, 22222])
    >>> sum(a)
    26932
    >>> a[1:3]
    array('H', [10, 700])
\end{verbatim}

The \ulink{\module{collections}}{../lib/module-collections.html} module
provides a \class{deque()} object that is like a list with faster
appends and pops from the left side but slower lookups in the middle.
These objects are well suited for implementing queues and breadth first
tree searches:

\begin{verbatim}
    >>> from collections import deque
    >>> d = deque(["task1", "task2", "task3"])
    >>> d.append("task4")
    >>> print "Handling", d.popleft()
    Handling task1

    unsearched = deque([starting_node])
    def breadth_first_search(unsearched):
        node = unsearched.popleft()
        for m in gen_moves(node):
            if is_goal(m):
                return m
            unsearched.append(m)
\end{verbatim}
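
The search fragment above leaves the move generator and the goal test
undefined. A self-contained sketch of the same idea is shown below; the
small graph, the \code{seen} set, and the choice to return the whole path
are assumptions made for this illustration:

```python
from collections import deque

# Hypothetical graph: each node maps to the nodes reachable in one move.
graph = {
    'a': ['b', 'c'],
    'b': ['d'],
    'c': ['d', 'e'],
    'd': ['f'],
    'e': ['f'],
    'f': [],
}

def breadth_first_search(start, goal):
    """Return a shortest path from start to goal as a list, or None."""
    unsearched = deque([[start]])       # queue of partial paths
    seen = set([start])
    while unsearched:
        path = unsearched.popleft()     # oldest path first
        node = path[-1]
        if node == goal:
            return path
        for m in graph[node]:
            if m not in seen:
                seen.add(m)
                unsearched.append(path + [m])
    return None

path = breadth_first_search('a', 'f')
```

Taking paths from the left and adding them on the right is what makes the
search breadth first; with a plain list, each pop from the left would have
to shift every remaining element.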

In addition to alternative list implementations, the library also offers
other tools such as the \ulink{\module{bisect}}{../lib/module-bisect.html}
module with functions for manipulating sorted lists:

\begin{verbatim}
    >>> import bisect
    >>> scores = [(100, 'perl'), (200, 'tcl'), (400, 'lua'), (500, 'python')]
    >>> bisect.insort(scores, (300, 'ruby'))
    >>> scores
    [(100, 'perl'), (200, 'tcl'), (300, 'ruby'), (400, 'lua'), (500, 'python')]
\end{verbatim}
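
Besides inserting, the module can locate where a value belongs in a sorted
list, which is handy for table lookups. The sketch below uses
\function{bisect.bisect()} for that purpose; the grade boundaries are
illustrative values only:

```python
import bisect

breakpoints = [60, 70, 80, 90]      # illustrative grade boundaries
grades = 'FDCBA'

def grade(score):
    """Map a numeric score to a letter using binary search."""
    return grades[bisect.bisect(breakpoints, score)]

letters = [grade(s) for s in [33, 99, 77, 70, 89, 90, 100]]
```

A score equal to a boundary falls into the higher band because
\function{bisect()} returns the position to the right of any equal entries.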

The \ulink{\module{heapq}}{../lib/module-heapq.html} module provides
functions for implementing heaps based on regular lists. The lowest
valued entry is always kept at position zero. This is useful for
applications which repeatedly access the smallest element but do not
want to run a full list sort:

\begin{verbatim}
    >>> from heapq import heapify, heappop, heappush
    >>> data = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]
    >>> heapify(data)                      # rearrange the list into heap order
    >>> heappush(data, -5)                 # add a new entry
    >>> [heappop(data) for i in range(3)]  # fetch the three smallest entries
    [-5, 0, 1]
\end{verbatim}
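
Because tuples compare element by element, a heap of (priority, task) pairs
behaves as a simple priority queue. A small sketch, with task names invented
for the example:

```python
import heapq

tasks = []
heapq.heappush(tasks, (2, 'write report'))
heapq.heappush(tasks, (1, 'fix bug'))
heapq.heappush(tasks, (3, 'refactor'))

# Pop everything: entries come out in order of increasing priority number.
order = [heapq.heappop(tasks)[1] for i in range(3)]
```

Entries with equal priority numbers fall back to comparing the second tuple
element, so the task values themselves must then be comparable.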


\section{Decimal Floating Point Arithmetic\label{decimal-fp}}

The \ulink{\module{decimal}}{../lib/module-decimal.html} module offers a
\class{Decimal} datatype for decimal floating point arithmetic. Compared to
the built-in \class{float} implementation of binary floating point, the new
class is especially helpful for financial applications and other uses which
require exact decimal representation, control over precision, control over
rounding to meet legal or regulatory requirements, tracking of significant
decimal places, or for applications where the user expects the results to
match calculations done by hand.

For example, calculating a 5\%{} tax on a 70 cent phone charge gives
different results in decimal floating point and binary floating point.
The difference becomes significant if the results are rounded to the
nearest cent:

\begin{verbatim}
>>> from decimal import *
>>> Decimal('0.70') * Decimal('1.05')
Decimal("0.7350")
>>> .70 * 1.05
0.73499999999999999
\end{verbatim}

The \class{Decimal} result keeps a trailing zero, automatically inferring
four-place significance from multiplicands with two-place significance.
\class{Decimal} reproduces mathematics as done by hand and avoids issues
that can arise when binary floating point cannot exactly represent decimal
quantities.
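
To make the rounding difference concrete, the sketch below rounds both
results to the nearest cent. It uses \method{quantize()} with an explicit
half-up rounding mode, which is one possible choice and is stated here as an
assumption about the required rounding rule:

```python
from decimal import Decimal, ROUND_HALF_UP

exact = Decimal('0.70') * Decimal('1.05')           # exactly 0.7350
cents = exact.quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)

binary = .70 * 1.05                                 # stored slightly below 0.735
binary_cents = round(binary, 2)
```

The decimal result rounds up to 0.74 while the binary result rounds down to
0.73, a one cent difference that comes entirely from the representation.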

Exact representation enables the \class{Decimal} class to perform
modulo calculations and equality tests that are unsuitable for binary
floating point:

\begin{verbatim}
>>> Decimal('1.00') % Decimal('.10')
Decimal("0.00")
>>> 1.00 % 0.10
0.09999999999999995

>>> sum([Decimal('0.1')]*10) == Decimal('1.0')
True
>>> sum([0.1]*10) == 1.0
False
\end{verbatim}

The \module{decimal} module provides arithmetic with as much precision as
needed:

\begin{verbatim}
>>> getcontext().prec = 36
>>> Decimal(1) / Decimal(7)
Decimal("0.142857142857142857142857142857142857")
\end{verbatim}



\chapter{What Now? \label{whatNow}}

Reading this tutorial has probably reinforced your interest in using
Python; you should be eager to apply Python to solving your
real-world problems. Where should you go to learn more?

This tutorial is part of Python's documentation set.
Some other documents in the set are:

\begin{itemize}

\item \citetitle[../lib/lib.html]{Python Library Reference}:

You should browse through this manual, which gives complete (though
terse) reference material about types, functions, and the modules in
the standard library. The standard Python distribution includes a
\emph{lot} of additional code. There are modules to read \UNIX{}
mailboxes, retrieve documents via HTTP, generate random numbers, parse
command-line options, write CGI programs, compress data, and perform
many other tasks. Skimming through the Library Reference will give you
an idea of what's available.

\item \citetitle[../inst/inst.html]{Installing Python Modules}
explains how to install external modules written by other Python
users.

\item \citetitle[../ref/ref.html]{Language Reference}: A detailed
explanation of Python's syntax and semantics. It's heavy reading,
but is useful as a complete guide to the language itself.

\end{itemize}

More Python resources:

\begin{itemize}

\item \url{http://www.python.org}: The major Python Web site. It contains
code, documentation, and pointers to Python-related pages around the
Web. This Web site is mirrored in various places around the
world, such as Europe, Japan, and Australia; a mirror may be faster
than the main site, depending on your geographical location.

\item \url{http://docs.python.org}: Fast access to Python's
documentation.

\item \url{http://cheeseshop.python.org}:
The Python Package Index, nicknamed the Cheese Shop,
is an index of user-created Python modules that are available for
download. Once you begin releasing code, you can register it
here so that others can find it.

\item \url{http://aspn.activestate.com/ASPN/Python/Cookbook/}: The
Python Cookbook is a sizable collection of code examples, larger
modules, and useful scripts. Particularly notable contributions are
collected in a book also titled \citetitle{Python Cookbook} (O'Reilly
\& Associates, ISBN 0-596-00797-3).

\end{itemize}


For Python-related questions and problem reports, you can post to the
newsgroup \newsgroup{comp.lang.python}, or send them to the mailing
list at \email{python-list@python.org}. The newsgroup and mailing list
are gatewayed, so messages posted to one will automatically be
forwarded to the other. There are around 120 postings a day (with peaks
up to several hundred),
% Postings figure based on average of last six months activity as
% reported by www.egroups.com; Jan. 2000 - June 2000: 21272 msgs / 182
% days = 116.9 msgs / day and steadily increasing.
asking (and answering) questions, suggesting new features, and
announcing new modules. Before posting, be sure to check the list of
\ulink{Frequently Asked Questions}{http://www.python.org/doc/faq/}
(also called the FAQ), or look for it in the
\file{Misc/} directory of the Python source distribution. Mailing
list archives are available at \url{http://mail.python.org/pipermail/}.
The FAQ answers many of the questions that come up again and again,
and may already contain the solution for your problem.


\appendix

\chapter{Interactive Input Editing and History Substitution\label{interacting}}

Some versions of the Python interpreter support editing of the current
input line and history substitution, similar to facilities found in
the Korn shell and the GNU Bash shell. This is implemented using the
\emph{GNU Readline} library, which supports Emacs-style and vi-style
editing. This library has its own documentation which I won't
duplicate here; however, the basics are easily explained. The
interactive editing and history described here are optionally
available in the \UNIX{} and Cygwin versions of the interpreter.

This chapter does \emph{not} document the editing facilities of Mark
Hammond's PythonWin package or the Tk-based environment, IDLE,
distributed with Python. The command line history recall which
operates within DOS boxes on NT and some other DOS and Windows flavors
is yet another beast.

\section{Line Editing \label{lineEditing}}

If supported, input line editing is active whenever the interpreter
prints a primary or secondary prompt. The current line can be edited
using the conventional Emacs control characters. The most important
of these are: \kbd{C-A} (Control-A) moves the cursor to the beginning
of the line, \kbd{C-E} to the end, \kbd{C-B} moves it one position to
the left, \kbd{C-F} to the right. Backspace erases the character to
the left of the cursor, \kbd{C-D} the character to its right.
\kbd{C-K} kills (erases) the rest of the line to the right of the
cursor, \kbd{C-Y} yanks back the last killed string.
\kbd{C-underscore} undoes the last change you made; it can be repeated
for cumulative effect.

\section{History Substitution \label{history}}

History substitution works as follows. All non-empty input lines
issued are saved in a history buffer, and when a new prompt is given
you are positioned on a new line at the bottom of this buffer.
\kbd{C-P} moves one line up (back) in the history buffer,
\kbd{C-N} moves one down. Any line in the history buffer can be
edited; an asterisk appears in front of the prompt to mark a line as
modified. Pressing the \kbd{Return} key passes the current line to
the interpreter. \kbd{C-R} starts an incremental reverse search;
\kbd{C-S} starts a forward search.

\section{Key Bindings \label{keyBindings}}

The key bindings and some other parameters of the Readline library can
be customized by placing commands in an initialization file called
\file{\~{}/.inputrc}. Key bindings have the form

\begin{verbatim}
key-name: function-name
\end{verbatim}

or

\begin{verbatim}
"string": function-name
\end{verbatim}

and options can be set with

\begin{verbatim}
set option-name value
\end{verbatim}

For example:

\begin{verbatim}
# I prefer vi-style editing:
set editing-mode vi

# Edit using a single line:
set horizontal-scroll-mode On

# Rebind some keys:
Meta-h: backward-kill-word
"\C-u": universal-argument
"\C-x\C-r": re-read-init-file
\end{verbatim}

Note that the default binding for \kbd{Tab} in Python is to insert a
\kbd{Tab} character instead of Readline's default filename completion
function. If you insist, you can override this by putting

\begin{verbatim}
Tab: complete
\end{verbatim}

in your \file{\~{}/.inputrc}. (Of course, this makes it harder to
type indented continuation lines if you're accustomed to using
\kbd{Tab} for that purpose.)

Automatic completion of variable and module names is optionally
available. To enable it in the interpreter's interactive mode, add
the following to your startup file:\footnote{
  Python will execute the contents of a file identified by the
  \envvar{PYTHONSTARTUP} environment variable when you start an
  interactive interpreter.}
\refstmodindex{rlcompleter}\refbimodindex{readline}

\begin{verbatim}
import rlcompleter, readline
readline.parse_and_bind('tab: complete')
\end{verbatim}

This binds the \kbd{Tab} key to the completion function, so hitting
the \kbd{Tab} key twice suggests completions; it looks at Python
statement names, the current local variables, and the available module
names. For dotted expressions such as \code{string.a}, it will
evaluate the expression up to the final \character{.} and then
suggest completions from the attributes of the resulting object. Note
that this may execute application-defined code if an object with a
\method{__getattr__()} method is part of the expression.

A more capable startup file might look like this example. Note that
this deletes the names it creates once they are no longer needed; this
is done since the startup file is executed in the same namespace as
the interactive commands, and removing the names avoids creating side
effects in the interactive environment. You may find it convenient
to keep some of the imported modules, such as
\ulink{\module{os}}{../lib/module-os.html}, which turn
out to be needed in most sessions with the interpreter.

\begin{verbatim}
# Add auto-completion and a stored history file of commands to your Python
# interactive interpreter. Requires Python 2.0+, readline. Autocomplete is
# bound to the Esc key by default (you can change it - see readline docs).
#
# Store the file in ~/.pystartup, and set an environment variable to point
# to it: "export PYTHONSTARTUP=/max/home/itamar/.pystartup" in bash.
#
# Note that PYTHONSTARTUP does *not* expand "~", so you have to put in the
# full path to your home directory.

import atexit
import os
import readline
import rlcompleter

historyPath = os.path.expanduser("~/.pyhistory")

def save_history(historyPath=historyPath):
    import readline
    readline.write_history_file(historyPath)

if os.path.exists(historyPath):
    readline.read_history_file(historyPath)

atexit.register(save_history)
del os, atexit, readline, rlcompleter, save_history, historyPath
\end{verbatim}


\section{Commentary \label{commentary}}

This facility is an enormous step forward compared to earlier versions
of the interpreter; however, some wishes are left: It would be nice if
the proper indentation were suggested on continuation lines (the
parser knows if an indent token is required next). The completion
mechanism might use the interpreter's symbol table. A command to
check (or even suggest) matching parentheses, quotes, etc., would also
be useful.


\chapter{Floating Point Arithmetic: Issues and Limitations\label{fp-issues}}
\sectionauthor{Tim Peters}{tim_one@users.sourceforge.net}

Floating-point numbers are represented in computer hardware as
base 2 (binary) fractions. For example, the decimal fraction

\begin{verbatim}
0.125
\end{verbatim}

has value 1/10 + 2/100 + 5/1000, and in the same way the binary fraction

\begin{verbatim}
0.001
\end{verbatim}

has value 0/2 + 0/4 + 1/8. These two fractions have identical values,
the only real difference being that the first is written in base 10
fractional notation, and the second in base 2.

Unfortunately, most decimal fractions cannot be represented exactly as
binary fractions. A consequence is that, in general, the decimal
floating-point numbers you enter are only approximated by the binary
floating-point numbers actually stored in the machine.

The problem is easier to understand at first in base 10. Consider the
fraction 1/3. You can approximate that as a base 10 fraction:

\begin{verbatim}
0.3
\end{verbatim}

or, better,

\begin{verbatim}
0.33
\end{verbatim}

or, better,

\begin{verbatim}
0.333
\end{verbatim}

and so on. No matter how many digits you're willing to write down, the
result will never be exactly 1/3, but will be an increasingly better
approximation of 1/3.

In the same way, no matter how many base 2 digits you're willing to
use, the decimal value 0.1 cannot be represented exactly as a base 2
fraction. In base 2, 1/10 is the infinitely repeating fraction

\begin{verbatim}
0.0001100110011001100110011001100110011001100110011...
\end{verbatim}

Stop at any finite number of bits, and you get an approximation. This
is why you see things like:

\begin{verbatim}
>>> 0.1
0.10000000000000001
\end{verbatim}

On most machines today, that is what you'll see if you enter 0.1 at
a Python prompt. You may not, though, because the number of bits
used by the hardware to store floating-point values can vary across
machines, and Python only prints a decimal approximation to the true
decimal value of the binary approximation stored by the machine. On
most machines, if Python were to print the true decimal value of
the binary approximation stored for 0.1, it would have to display

\begin{verbatim}
>>> 0.1
0.1000000000000000055511151231257827021181583404541015625
\end{verbatim}

instead! The Python prompt uses the builtin
\function{repr()} function to obtain a string version of everything it
displays. For floats, \code{repr(\var{float})} rounds the true
decimal value to 17 significant digits, giving

\begin{verbatim}
0.10000000000000001
\end{verbatim}

\code{repr(\var{float})} produces 17 significant digits because it
turns out that's enough (on most machines) so that
\code{eval(repr(\var{x})) == \var{x}} exactly for all finite floats
\var{x}, but rounding to 16 digits is not enough to make that true.
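
The digit counts can be checked with ordinary string formatting. The sketch
below assumes IEEE 754 double precision hardware and a formatting routine
that prints all requested digits exactly; the variable names are chosen for
this illustration:

```python
x = 0.1

seventeen = '%.17g' % x      # 17 significant digits
exact = '%.55f' % x          # every digit of the stored binary value

round_trips = float(seventeen) == x

# A value for which 16 significant digits are not enough.
y = 0.1 + 0.2
sixteen = '%.16g' % y
sixteen_fails = float(sixteen) != y
```

Here the 16-digit string for \code{y} is simply \code{'0.3'}, which converts
back to a neighbouring float rather than to \code{y} itself.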
5713
5714Note that this is in the very nature of binary floating-point: this is
5715not a bug in Python, and it is not a bug in your code either. You'll
5716see the same kind of thing in all languages that support your
5717hardware's floating-point arithmetic (although some languages may
5718not \emph{display} the difference by default, or in all output modes).

Python's builtin \function{str()} function produces only 12
significant digits, and you may wish to use that instead.  It's
unusual for \code{eval(str(\var{x}))} to reproduce \var{x}, but the
output may be more pleasant to look at:

\begin{verbatim}
>>> print str(0.1)
0.1
\end{verbatim}

It's important to realize that this is, in a real sense, an illusion:
the value in the machine is not exactly 1/10; you're simply rounding
the \emph{display} of the true machine value.
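
A short sketch of that illusion, assuming IEEE-754 doubles: two floats
can display identically at 12 significant digits and still be different
values.  The \code{'\%.12g'} format is used here as a stand-in for the
12-digit rounding that \function{str()} performs, and the values
\code{a} and \code{b} are illustrative.

```python
a = 1.1 + 2.2   # stored value is slightly above 3.3
b = 3.3         # stored value is slightly below 3.3

# Round only the display, to 12 significant digits.
same_display = ('%.12g' % a == '%.12g' % b)   # True: both display as 3.3
same_value = (a == b)                         # False: the stored values differ
```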

Other surprises follow from this one.  For example, after seeing

\begin{verbatim}
>>> 0.1
0.10000000000000001
\end{verbatim}

you may be tempted to use the \function{round()} function to chop it
back to the single digit you expect.  But that makes no difference:

\begin{verbatim}
>>> round(0.1, 1)
0.10000000000000001
\end{verbatim}

The problem is that the binary floating-point value stored for ``0.1''
was already the best possible binary approximation to 1/10, so trying
to round it again can't make it better: it was already as good as it
gets.
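
The same point as a quick check, assuming IEEE-754 doubles: rounding
hands back the very float it was given, so its 17-digit display cannot
change.

```python
x = 0.1
rounded = round(x, 1)

# round() lands on the same stored double it started from.
unchanged = (rounded == x)        # True

# So the 17-digit display is the same before and after rounding.
shown = '%.17g' % rounded         # '0.10000000000000001'
```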

Another consequence is that since 0.1 is not exactly 1/10,
summing ten values of 0.1 may not yield exactly 1.0, either:

\begin{verbatim}
>>> sum = 0.0
>>> for i in range(10):
...     sum += 0.1
...
>>> sum
0.99999999999999989
\end{verbatim}
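
Two common ways of coping with this are sketched below, again assuming
IEEE-754 doubles.  The first compares against a tolerance instead of
testing for equality; the value \code{1e-9} is an arbitrary choice for
illustration, not a recommended constant.  The second uses the standard
\module{decimal} module, where a value built from the string
\code{'0.1'} really is one tenth.

```python
from decimal import Decimal

total = 0.0
for i in range(10):
    total += 0.1

exact_match = (total == 1.0)               # False: rounding errors accumulated
close_enough = (abs(total - 1.0) < 1e-9)   # True: equal within the chosen tolerance

# Decimal values built from strings hold 0.1 exactly, so the sum is exact.
dec_total = Decimal('0.0')
for i in range(10):
    dec_total += Decimal('0.1')
dec_match = (dec_total == Decimal('1.0'))  # True
```

Decimal arithmetic is slower than hardware floating point, so it is
usually reserved for cases, such as money, where exact decimal results
matter.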

Binary floating-point arithmetic holds many surprises like this.  The
problem with ``0.1'' is explained in precise detail below, in the
``Representation Error'' section.  See
\citetitle[http://www.lahey.com/float.htm]{The Perils of Floating
Point} for a more complete account of other common surprises.

As that article says near the end, ``there are no easy answers.''  Still,
don't be unduly wary of floating-point!  The errors in Python float
operations are inherited from the floating-point hardware, and on most
machines are on the order of no more than 1 part in 2**53 per
operation.  That's more than adequate for most tasks, but you do need
to keep in mind that it's not decimal arithmetic, and that every float
operation can suffer a new rounding error.

While pathological cases do exist, for most casual use of
floating-point arithmetic you'll see the result you expect in the end
if you simply round the display of your final results to the number of
decimal digits you expect.  \function{str()} usually suffices, and for
finer control see the discussion of Python's \code{\%} format
operator: the \code{\%g}, \code{\%f} and \code{\%e} format codes
supply flexible and easy ways to round float results for display.
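
A brief sketch of the three format codes applied to one result.  The
value \code{x} and the precisions are illustrative choices, and
IEEE-754 doubles are assumed.

```python
x = 0.1 * 3                # stored as roughly 0.30000000000000004

fixed = '%.2f' % x         # two digits after the decimal point: '0.30'
general = '%.12g' % x      # at most 12 significant digits: '0.3'
scientific = '%.3e' % x    # exponent notation; exponent width can vary by platform
```

Only the strings are rounded; \code{x} itself keeps its full stored
value for any further arithmetic.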


\section{Representation Error
         \label{fp-error}}

This section explains the ``0.1'' example in detail, and shows how
you can perform an exact analysis of cases like this yourself.  Basic
familiarity with binary floating-point representation is assumed.

\dfn{Representation error} refers to the fact that some (most, actually)
decimal fractions cannot be represented exactly as binary (base 2)
fractions.  This is the chief reason why Python (or Perl, C, \Cpp,
Java, Fortran, and many others) often won't display the exact decimal
number you expect:

\begin{verbatim}
>>> 0.1
0.10000000000000001
\end{verbatim}

Why is that?  1/10 is not exactly representable as a binary fraction.
Almost all machines today (November 2000) use IEEE-754 floating-point
arithmetic, and almost all platforms map Python floats to IEEE-754
``double precision''.  754 doubles contain 53 bits of precision, so on
input the computer strives to convert 0.1 to the closest fraction it can
of the form \var{J}/2**\var{N} where \var{J} is an integer containing
exactly 53 bits.  Rewriting

\begin{verbatim}
 1 / 10 ~= J / (2**N)
\end{verbatim}

as

\begin{verbatim}
J ~= 2**N / 10
\end{verbatim}

and recalling that \var{J} has exactly 53 bits (is \code{>= 2**52} but
\code{< 2**53}), the best value for \var{N} is 56:

\begin{verbatim}
>>> 2**52
4503599627370496L
>>> 2**53
9007199254740992L
>>> 2**56/10
7205759403792793L
\end{verbatim}

That is, 56 is the only value for \var{N} that leaves \var{J} with
exactly 53 bits.  The best possible value for \var{J} is then that
quotient rounded:

\begin{verbatim}
>>> q, r = divmod(2**56, 10)
>>> r
6L
\end{verbatim}

Since the remainder is more than half of 10, the best approximation is
obtained by rounding up:

\begin{verbatim}
>>> q+1
7205759403792794L
\end{verbatim}

Therefore the best possible approximation to 1/10 in 754 double
precision is that over 2**56, or

\begin{verbatim}
7205759403792794 / 72057594037927936
\end{verbatim}

Note that since we rounded up, this is actually a little bit larger than
1/10; if we had not rounded up, the quotient would have been a little
bit smaller than 1/10.  But in no case can it be \emph{exactly} 1/10!
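
The steps above can be gathered into one sketch that uses only integer
arithmetic, so no rounding can creep into the analysis itself.  The
search range for \code{n} and the variable names are illustrative
choices; \code{//} is floor division.

```python
# Find every n for which the quotient 2**n // 10 has exactly 53 bits.
candidates = [n for n in range(40, 70)
              if 2**52 <= 2**n // 10 < 2**53]      # [56]

n = candidates[0]
q, r = divmod(2**n, 10)

# Round to nearest: go up when the remainder exceeds half the divisor.
# (A remainder of exactly half does not occur in this example.)
if 2 * r > 10:
    j = q + 1
else:
    j = q

# Compare j/2**n with 1/10 by cross-multiplying; no floats are involved.
error_up = j * 10 - 2**n            # 4: the rounded-up fraction is slightly too large
error_down = 2**n - (j - 1) * 10    # 6: its lower neighbour is farther from 1/10
```

Neither error is zero, which is the integer form of the statement that
no fraction with a power-of-two denominator equals 1/10.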

So the computer never ``sees'' 1/10:  what it sees is the exact
fraction given above, the best 754 double approximation it can get:

\begin{verbatim}
>>> .1 * 2**56
7205759403792794.0
\end{verbatim}
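
The same numerator and exponent can be recovered from the stored float
itself with \function{math.frexp()}.  This is a sketch that assumes
doubles with 53 bits of precision; the names \code{j} and \code{n}
mirror \var{J} and \var{N} in the text.

```python
import math

m, e = math.frexp(0.1)    # 0.1 == m * 2**e, with 0.5 <= m < 1
j = int(m * 2**53)        # scaling by a power of two is exact
n = 53 - e                # so 0.1 == j / 2**n exactly
```

On such a platform this yields \code{j == 7205759403792794} and
\code{n == 56}, matching the values derived by hand.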

If we multiply that fraction by 10**30, we can see the (truncated)
value of its 30 most significant decimal digits:

\begin{verbatim}
>>> 7205759403792794 * 10**30 / 2**56
100000000000000005551115123125L
\end{verbatim}

meaning that the exact number stored in the computer is approximately
equal to the decimal value 0.100000000000000005551115123125.  Rounding
that to 17 significant digits gives the 0.10000000000000001 that Python
displays (well, will display on any 754-conforming platform that does
best-possible input and output conversions in its C library; yours may
not!).
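
The truncation to 30 digits is not essential.  Because
\code{1/2**N == 5**N/10**N}, every binary fraction has a finite decimal
expansion, and integer arithmetic can produce all of it.  The sketch
below does so for the fraction derived above; the variable names are
illustrative, and the final line assumes a platform whose float
formatting is correctly rounded.

```python
j = 7205759403792794
n = 56

# j / 2**n == j * 5**n / 10**n, so the digits after the decimal point
# are those of j * 5**n, left-padded with zeros to n places.
digits = str(j * 5**n).zfill(n)
exact = '0.' + digits.rstrip('0')

# Rounding the stored value to 17 significant digits for display.
rounded17 = '%.17g' % 0.1
```

Here \code{exact} is the 55-digit value
0.1000000000000000055511151231257827021181583404541015625 quoted
earlier, and \code{rounded17} is the familiar 0.10000000000000001.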

\chapter{History and License}
\input{license}

\input{glossary}

\input{tut.ind}

\end{document}