functional.rst

Last change on this file was 391, checked in by dmik, 11 years ago
python: Merge vendor 2.7.6 to trunk.
Property svn:eol-style set to `native`
File size: 46.5 KB

********************************
  Functional Programming HOWTO
********************************

:Author: A. M. Kuchling
:Release: 0.31

In this document, we'll take a tour of Python's features suitable for
implementing programs in a functional style. After an introduction to the
concepts of functional programming, we'll look at language features such as
:term:`iterator`\s and :term:`generator`\s and relevant library modules such as
:mod:`itertools` and :mod:`functools`.


Introduction
============

This section explains the basic concept of functional programming; if you're
just interested in learning about Python language features, skip to the next
section.

Programming languages support decomposing problems in several different ways:

* Most programming languages are procedural: programs are lists of
  instructions that tell the computer what to do with the program's input. C,
  Pascal, and even Unix shells are procedural languages.

* In declarative languages, you write a specification that describes the
  problem to be solved, and the language implementation figures out how to
  perform the computation efficiently. SQL is the declarative language you're
  most likely to be familiar with; a SQL query describes the data set you want
  to retrieve, and the SQL engine decides whether to scan tables or use indexes,
  which subclauses should be performed first, etc.

* Object-oriented programs manipulate collections of objects. Objects have
  internal state and support methods that query or modify this internal state in
  some way. Smalltalk and Java are object-oriented languages. C++ and Python
  are languages that support object-oriented programming, but don't force the
  use of object-oriented features.

* Functional programming decomposes a problem into a set of functions.
  Ideally, functions only take inputs and produce outputs, and don't have any
  internal state that affects the output produced for a given input. Well-known
  functional languages include the ML family (Standard ML, OCaml, and other
  variants) and Haskell.

The designers of some computer languages choose to emphasize one particular
approach to programming. This often makes it difficult to write programs that
use a different approach. Other languages are multi-paradigm languages that
support several different approaches. Lisp, C++, and Python are
multi-paradigm; you can write programs or libraries that are largely
procedural, object-oriented, or functional in all of these languages. In a
large program, different sections might be written using different approaches;
the GUI might be object-oriented while the processing logic is procedural or
functional, for example.

In a functional program, input flows through a set of functions. Each function
operates on its input and produces some output. Functional style discourages
functions with side effects that modify internal state or make other changes
that aren't visible in the function's return value. Functions that have no side
effects at all are called purely functional. Avoiding side effects means
not using data structures that get updated as a program runs; every function's
output must only depend on its input.
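
To make the distinction concrete, here is a minimal sketch contrasting a
function with a side effect against a purely functional one. The names
``running_total``, ``add_to_total`` and ``add`` are invented for this
illustration and don't come from any library.

```python
# Impure: the result depends on, and changes, state outside the function.
running_total = 0

def add_to_total(amount):
    global running_total
    running_total += amount      # side effect: mutates a module-level variable
    return running_total

# Pure: the output depends only on the arguments; nothing else is touched.
def add(total, amount):
    return total + amount

first = add_to_total(5)      # 5
second = add_to_total(5)     # 10 -- same argument, different result
pure_first = add(0, 5)       # 5
pure_second = add(0, 5)      # 5 -- same arguments, same result
```

Calling ``add_to_total(5)`` twice gives two different answers because of the
hidden state, while ``add(0, 5)`` can be called any number of times with the
same outcome.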

Some languages are very strict about purity and don't even have assignment
statements such as ``a=3`` or ``c = a + b``, but it's difficult to avoid all
side effects. Printing to the screen or writing to a disk file are side
effects, for example. In Python, neither a ``print`` statement nor a
``time.sleep(1)`` call returns a useful value; they're only invoked for their
side effects of sending some text to the screen or pausing execution for a
second.

Python programs written in functional style usually won't go to the extreme of
avoiding all I/O or all assignments; instead, they'll provide a
functional-appearing interface but will use non-functional features internally.
For example, the implementation of a function will still use assignments to
local variables, but won't modify global variables or have other side effects.

Functional programming can be considered the opposite of object-oriented
programming. Objects are little capsules containing some internal state along
with a collection of method calls that let you modify this state, and programs
consist of making the right set of state changes. Functional programming wants
to avoid state changes as much as possible and works with data flowing between
functions. In Python you might combine the two approaches by writing functions
that take and return instances representing objects in your application (e-mail
messages, transactions, etc.).

Functional design may seem like an odd constraint to work under. Why should you
avoid objects and side effects? There are theoretical and practical advantages
to the functional style:

* Formal provability.
* Modularity.
* Composability.
* Ease of debugging and testing.


Formal provability
------------------

A theoretical benefit is that it's easier to construct a mathematical proof that
a functional program is correct.

For a long time researchers have been interested in finding ways to
mathematically prove programs correct. This is different from testing a program
on numerous inputs and concluding that its output is usually correct, or reading
a program's source code and concluding that the code looks right; the goal is
instead a rigorous proof that a program produces the right result for all
possible inputs.

The technique used to prove programs correct is to write down invariants,
properties of the input data and of the program's variables that are always
true. For each line of code, you then show that if invariants X and Y are true
before the line is executed, the slightly different invariants X' and Y' are
true after the line is executed. This continues until you reach the end of
the program, at which point the invariants should match the desired conditions
on the program's output.

Functional programming's avoidance of assignments arose because assignments are
difficult to handle with this technique; assignments can break invariants that
were true before the assignment without producing any new invariants that can be
propagated onward.

Unfortunately, proving programs correct is largely impractical and not relevant
to Python software. Even trivial programs require proofs that are several pages
long; the proof of correctness for a moderately complicated program would be
enormous, and few or none of the programs you use daily (the Python interpreter,
your XML parser, your web browser) could be proven correct. Even if you wrote
down or generated a proof, there would then be the question of verifying the
proof; maybe there's an error in it, and you wrongly believe you've proved the
program correct.


Modularity
----------

A more practical benefit of functional programming is that it forces you to
break apart your problem into small pieces. Programs are more modular as a
result. It's easier to specify and write a small function that does one thing
than a large function that performs a complicated transformation. Small
functions are also easier to read and to check for errors.


Ease of debugging and testing
-----------------------------

Testing and debugging a functional-style program is easier.

Debugging is simplified because functions are generally small and clearly
specified. When a program doesn't work, each function is an interface point
where you can check that the data are correct. You can look at the intermediate
inputs and outputs to quickly isolate the function that's responsible for a bug.

Testing is easier because each function is a potential subject for a unit test.
Functions don't depend on system state that needs to be replicated before
running a test; instead you only have to synthesize the right input and then
check that the output matches expectations.
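
As a rough illustration of how little scaffolding such a test needs, consider
the sketch below. ``normalize_whitespace`` is a hypothetical helper written for
this example; the test builds an input, calls the function, and compares the
result, with no setup or teardown.

```python
def normalize_whitespace(text):
    """Collapse runs of whitespace into single spaces (hypothetical helper)."""
    return ' '.join(text.split())

def test_normalize_whitespace():
    # No system state to prepare: synthesize input, call, check output.
    assert normalize_whitespace('  a \t b\n') == 'a b'
    assert normalize_whitespace('') == ''

test_normalize_whitespace()
```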


Composability
-------------

As you work on a functional-style program, you'll write a number of functions
with varying inputs and outputs. Some of these functions will be unavoidably
specialized to a particular application, but others will be useful in a wide
variety of programs. For example, a function that takes a directory path and
returns all the XML files in the directory, or a function that takes a filename
and returns its contents, can be applied to many different situations.

Over time you'll form a personal library of utilities. Often you'll assemble
new programs by arranging existing functions in a new configuration and writing
a few functions specialized for the current task.


Iterators
=========

I'll start by looking at a Python language feature that's an important
foundation for writing functional-style programs: iterators.

An iterator is an object representing a stream of data; this object returns the
data one element at a time. A Python iterator must support a method called
``next()`` that takes no arguments and always returns the next element of the
stream. If there are no more elements in the stream, ``next()`` must raise the
``StopIteration`` exception. Iterators don't have to be finite, though; it's
perfectly reasonable to write an iterator that produces an infinite stream of
data.

The built-in :func:`iter` function takes an arbitrary object and tries to return
an iterator that will return the object's contents or elements, raising
:exc:`TypeError` if the object doesn't support iteration. Several of Python's
built-in data types support iteration, the most common being lists and
dictionaries. An object is called an iterable object if you can get an
iterator for it.

You can experiment with the iteration interface manually:

    >>> L = [1,2,3]
    >>> it = iter(L)
    >>> print it
    <...iterator object at ...>
    >>> it.next()
    1
    >>> it.next()
    2
    >>> it.next()
    3
    >>> it.next()
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    StopIteration
    >>>

Python expects iterable objects in several different contexts, the most
important being the ``for`` statement. In the statement ``for X in Y``, Y must
be an iterator or some object for which ``iter()`` can create an iterator.
These two statements are equivalent::

    for i in iter(obj):
        print i

    for i in obj:
        print i

Iterators can be materialized as lists or tuples by using the :func:`list` or
:func:`tuple` constructor functions:

    >>> L = [1,2,3]
    >>> iterator = iter(L)
    >>> t = tuple(iterator)
    >>> t
    (1, 2, 3)

Sequence unpacking also supports iterators: if you know an iterator will return
N elements, you can unpack them into an N-tuple:

    >>> L = [1,2,3]
    >>> iterator = iter(L)
    >>> a,b,c = iterator
    >>> a,b,c
    (1, 2, 3)

Built-in functions such as :func:`max` and :func:`min` can take a single
iterator argument and will return the largest or smallest element. The ``"in"``
and ``"not in"`` operators also support iterators: ``X in iterator`` is true if
X is found in the stream returned by the iterator. You'll run into obvious
problems if the iterator is infinite; ``max()`` and ``min()``
will never return, and if the element X never appears in the stream, the
``"in"`` and ``"not in"`` operators won't return either.

Note that you can only go forward in an iterator; there's no way to get the
previous element, reset the iterator, or make a copy of it. Iterator objects
can optionally provide these additional capabilities, but the iterator protocol
only specifies the ``next()`` method. Functions may therefore consume all of
the iterator's output, and if you need to do something different with the same
stream, you'll have to create a new iterator.
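
The forward-only behaviour is easy to observe. The following sketch (the
variable names are arbitrary) uses only built-ins and should behave the same
on Python 2 and Python 3:

```python
numbers = [1, 2, 3]
it = iter(numbers)

largest = max(it)          # max() consumes the whole iterator
leftover = list(it)        # nothing remains: []

# To traverse the data again, ask the underlying list for a new iterator.
fresh = iter(numbers)
found = 2 in fresh         # True; the search consumed 1 and 2
rest = list(fresh)         # only the elements after the match remain: [3]
```

Note that the ``in`` test stops as soon as it finds a match, so the iterator
is left positioned just past the matching element.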



Data Types That Support Iterators
---------------------------------

We've already seen how lists and tuples support iterators. In fact, any Python
sequence type, such as strings, will automatically support creation of an
iterator.

Calling :func:`iter` on a dictionary returns an iterator that will loop over the
dictionary's keys:

.. not a doctest since dict ordering varies across Pythons

::

    >>> m = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6,
    ...      'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}
    >>> for key in m:
    ...     print key, m[key]
    Mar 3
    Feb 2
    Aug 8
    Sep 9
    Apr 4
    Jun 6
    Jul 7
    Jan 1
    May 5
    Nov 11
    Dec 12
    Oct 10

Note that the order is essentially random, because it's based on the hash
ordering of the objects in the dictionary.

Applying ``iter()`` to a dictionary always loops over the keys, but dictionaries
have methods that return other iterators. If you want to iterate over keys,
values, or key/value pairs, you can explicitly call the ``iterkeys()``,
``itervalues()``, or ``iteritems()`` methods to get an appropriate iterator.

The :func:`dict` constructor can accept an iterator that returns a finite stream
of ``(key, value)`` tuples:

    >>> L = [('Italy', 'Rome'), ('France', 'Paris'), ('US', 'Washington DC')]
    >>> dict(iter(L))
    {'Italy': 'Rome', 'US': 'Washington DC', 'France': 'Paris'}

Files also support iteration by calling the ``readline()`` method until there
are no more lines in the file. This means you can read each line of a file like
this::

    for line in file:
        # do something for each line
        ...

Sets can take their contents from an iterable and let you iterate over the set's
elements::

    S = set((2, 3, 5, 7, 11, 13))
    for i in S:
        print i



Generator expressions and list comprehensions
=============================================

Two common operations on an iterator's output are 1) performing some operation
for every element, and 2) selecting a subset of elements that meet some
condition. For example, given a list of strings, you might want to strip off
trailing whitespace from each line or extract all the strings containing a given
substring.

List comprehensions and generator expressions (short form: "listcomps" and
"genexps") are a concise notation for such operations, borrowed from the
functional programming language Haskell (http://www.haskell.org/). You can strip
the leading and trailing whitespace from each string in a stream with the
following code::

    line_list = ['  line 1\n', 'line 2  \n', ...]

    # Generator expression -- returns iterator
    stripped_iter = (line.strip() for line in line_list)

    # List comprehension -- returns list
    stripped_list = [line.strip() for line in line_list]

You can select only certain elements by adding an ``"if"`` condition::

    stripped_list = [line.strip() for line in line_list
                     if line != ""]

With a list comprehension, you get back a Python list; ``stripped_list`` is a
list containing the resulting lines, not an iterator. Generator expressions
return an iterator that computes the values as necessary, not needing to
materialize all the values at once. This means that list comprehensions aren't
useful if you're working with iterators that return an infinite stream or a very
large amount of data. Generator expressions are preferable in these situations.
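
Here is a small sketch of that difference, assuming the standard
:mod:`itertools` functions ``count()`` and ``islice()``. The generator
expression wraps an infinite stream, yet only the values actually requested
are computed:

```python
import itertools

# itertools.count() yields 0, 1, 2, ... without end.
squares = (n * n for n in itertools.count())

# Only the five values requested are ever computed.
first_five = list(itertools.islice(squares, 5))   # [0, 1, 4, 9, 16]

# By contrast, [n * n for n in itertools.count()] would never finish,
# because a list comprehension tries to build the whole list up front.
```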

Generator expressions are surrounded by parentheses ("()") and list
comprehensions are surrounded by square brackets ("[]"). Generator expressions
have the form::

    ( expression for expr1 in sequence1
                 if condition1
                 for expr2 in sequence2
                 if condition2
                 for expr3 in sequence3 ...
                 if condition3
                 for exprN in sequenceN
                 if conditionN )

Again, for a list comprehension only the outside brackets are different (square
brackets instead of parentheses).

The elements of the generated output will be the successive values of
``expression``. The ``if`` clauses are all optional; if present, ``expression``
is only evaluated and added to the result when ``condition`` is true.

Generator expressions always have to be written inside parentheses, but the
parentheses signalling a function call also count. If you want to create an
iterator that will be immediately passed to a function you can write::

    obj_total = sum(obj.count for obj in list_all_objects())

The ``for...in`` clauses contain the sequences to be iterated over. The
sequences do not have to be the same length, because they are iterated over from
left to right, not in parallel. For each element in ``sequence1``,
``sequence2`` is looped over from the beginning. ``sequence3`` is then looped
over for each resulting pair of elements from ``sequence1`` and ``sequence2``.

To put it another way, a list comprehension or generator expression is
equivalent to the following Python code::

    for expr1 in sequence1:
        if not (condition1):
            continue   # Skip this element
        for expr2 in sequence2:
            if not (condition2):
                continue   # Skip this element
            ...
            for exprN in sequenceN:
                if not (conditionN):
                    continue   # Skip this element

                # Output the value of
                # the expression.

This means that when there are multiple ``for...in`` clauses but no ``if``
clauses, the length of the resulting output will be equal to the product of the
lengths of all the sequences. If you have two lists of length 3, the output
list is 9 elements long:

.. doctest::
    :options: +NORMALIZE_WHITESPACE

    >>> seq1 = 'abc'
    >>> seq2 = (1,2,3)
    >>> [(x,y) for x in seq1 for y in seq2]
    [('a', 1), ('a', 2), ('a', 3),
     ('b', 1), ('b', 2), ('b', 3),
     ('c', 1), ('c', 2), ('c', 3)]

To avoid introducing an ambiguity into Python's grammar, if ``expression`` is
creating a tuple, it must be surrounded with parentheses. The first list
comprehension below is a syntax error, while the second one is correct::

    # Syntax error
    [ x,y for x in seq1 for y in seq2]
    # Correct
    [ (x,y) for x in seq1 for y in seq2]


Generators
==========

Generators are a special class of functions that simplify the task of writing
iterators. Regular functions compute a value and return it, but generators
return an iterator that returns a stream of values.

You're doubtless familiar with how regular function calls work in Python or C.
When you call a function, it gets a private namespace where its local variables
are created. When the function reaches a ``return`` statement, the local
variables are destroyed and the value is returned to the caller. A later call
to the same function creates a new private namespace and a fresh set of local
variables. But, what if the local variables weren't thrown away on exiting a
function? What if you could later resume the function where it left off? This
is what generators provide; they can be thought of as resumable functions.

Here's the simplest example of a generator function:

.. testcode::

    def generate_ints(N):
        for i in range(N):
            yield i

Any function containing a ``yield`` keyword is a generator function; this is
detected by Python's :term:`bytecode` compiler which compiles the function
specially as a result.

When you call a generator function, it doesn't return a single value; instead it
returns a generator object that supports the iterator protocol. On executing
the ``yield`` expression, the generator outputs the value of ``i``, similar to a
``return`` statement. The big difference between ``yield`` and a ``return``
statement is that on reaching a ``yield`` the generator's state of execution is
suspended and local variables are preserved. On the next call to the
generator's ``.next()`` method, the function will resume executing.

Here's a sample usage of the ``generate_ints()`` generator:

    >>> gen = generate_ints(3)
    >>> gen
    <generator object generate_ints at ...>
    >>> gen.next()
    0
    >>> gen.next()
    1
    >>> gen.next()
    2
    >>> gen.next()
    Traceback (most recent call last):
      File "stdin", line 1, in ?
      File "stdin", line 2, in generate_ints
    StopIteration

You could equally write ``for i in generate_ints(5)``, or ``a,b,c =
generate_ints(3)``.

Inside a generator function, the ``return`` statement can only be used without a
value, and signals the end of the procession of values; after executing a
``return`` the generator cannot return any further values. ``return`` with a
value, such as ``return 5``, is a syntax error inside a generator function. The
end of the generator's results can also be indicated by raising
``StopIteration`` manually, or by just letting the flow of execution fall off
the bottom of the function.

You could achieve the effect of generators manually by writing your own class
and storing all the local variables of the generator as instance variables. For
example, returning a list of integers could be done by setting ``self.count`` to
0, and having the ``next()`` method increment ``self.count`` and return it.
However, for a moderately complicated generator, writing a corresponding class
can be much messier.
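
For comparison, here is one way such a class might look for the
``generate_ints()`` example. This is only a sketch: the class name
``GenerateInts`` is made up, and the method is defined as ``__next__`` with a
``next`` alias so that the same code runs on Python 3 as well as on Python 2,
where the protocol method is spelled ``next()``.

```python
class GenerateInts(object):
    """Hand-written iterator equivalent to generate_ints(N) (illustrative name)."""

    def __init__(self, N):
        self.N = N
        self.count = 0          # plays the role of the generator's local ``i``

    def __iter__(self):
        return self

    def __next__(self):
        if self.count >= self.N:
            raise StopIteration
        value = self.count
        self.count += 1
        return value

    next = __next__             # Python 2 spells the method next()

values = list(GenerateInts(3))  # [0, 1, 2]
```

Even for this trivial case the class needs explicit state and an explicit
end-of-stream check, both of which the three-line generator gets for free.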

The test suite included with Python's library, ``test_generators.py``, contains
a number of more interesting examples. Here's one generator that implements an
in-order traversal of a tree using generators recursively. ::

    # A recursive generator that generates Tree leaves in in-order.
    def inorder(t):
        if t:
            for x in inorder(t.left):
                yield x

            yield t.label

            for x in inorder(t.right):
                yield x

Two other examples in ``test_generators.py`` produce solutions for the N-Queens
problem (placing N queens on an NxN chess board so that no queen threatens
another) and the Knight's Tour (finding a route that takes a knight to every
square of an NxN chessboard without visiting any square twice).



Passing values into a generator
-------------------------------

In Python 2.4 and earlier, generators only produced output. Once a generator's
code was invoked to create an iterator, there was no way to pass any new
information into the function when its execution was resumed. You could hack
together this ability by making the generator look at a global variable or by
passing in some mutable object that callers then modify, but these approaches
are messy.

In Python 2.5 there's a simple way to pass values into a generator.
:keyword:`yield` became an expression, returning a value that can be assigned to
a variable or otherwise operated on::

    val = (yield i)

I recommend that you always put parentheses around a ``yield`` expression
when you're doing something with the returned value, as in the above example.
The parentheses aren't always necessary, but it's easier to always add them
instead of having to remember when they're needed.

(PEP 342 explains the exact rules, which are that a ``yield``-expression must
always be parenthesized except when it occurs at the top-level expression on the
right-hand side of an assignment. This means you can write ``val = yield i``
but have to use parentheses when there's an operation, as in ``val = (yield i)
+ 12``.)

Values are sent into a generator by calling its ``send(value)`` method. This
method resumes the generator's code and the ``yield`` expression returns the
specified value. If the regular ``next()`` method is called, the ``yield``
returns ``None``.

Here's a simple counter that increments by 1 and allows changing the value of
the internal counter.

.. testcode::

    def counter(maximum):
        i = 0
        while i < maximum:
            val = (yield i)
            # If value provided, change counter
            if val is not None:
                i = val
            else:
                i += 1

And here's an example of changing the counter:

    >>> it = counter(10)
    >>> print it.next()
    0
    >>> print it.next()
    1
    >>> print it.send(8)
    8
    >>> print it.next()
    9
    >>> print it.next()
    Traceback (most recent call last):
      File "t.py", line 15, in ?
        print it.next()
    StopIteration

Because ``yield`` will often be returning ``None``, you should always check for
this case. Don't just use its value in expressions unless you're sure that the
``send()`` method will be the only method used to resume your generator
function.

In addition to ``send()``, there are two other new methods on generators:

* ``throw(type, value=None, traceback=None)`` is used to raise an exception
  inside the generator; the exception is raised by the ``yield`` expression
  where the generator's execution is paused.

* ``close()`` raises a :exc:`GeneratorExit` exception inside the generator to
  terminate the iteration. On receiving this exception, the generator's code
  must either raise :exc:`GeneratorExit` or :exc:`StopIteration`; catching the
  exception and doing anything else is illegal and will trigger a
  :exc:`RuntimeError`. ``close()`` will also be called by Python's garbage
  collector when the generator is garbage-collected.

If you need to run cleanup code when a :exc:`GeneratorExit` occurs, I suggest
using a ``try: ... finally:`` suite instead of catching :exc:`GeneratorExit`.
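
A minimal sketch of that suggestion follows. The generator ``read_items`` and
the ``events`` list are invented for the example; the list only records what
happened so the effect of ``close()`` can be inspected afterwards.

```python
events = []   # records what happened, for illustration only

def read_items(items):
    events.append('opened')
    try:
        for item in items:
            yield item
    finally:
        # Runs on normal exhaustion and when close() raises GeneratorExit.
        events.append('closed')

gen = read_items(['a', 'b', 'c'])
first = next(gen)    # 'a'; the generator is now paused at the yield
gen.close()          # raises GeneratorExit at the paused yield
```

After ``close()`` returns, ``events`` holds ``['opened', 'closed']`` even
though the loop never reached ``'b'`` or ``'c'``.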

The cumulative effect of these changes is to turn generators from one-way
producers of information into both producers and consumers.

Generators also become coroutines, a more generalized form of subroutines.
Subroutines are entered at one point and exited at another point (the top of the
function, and a ``return`` statement), but coroutines can be entered, exited,
and resumed at many different points (the ``yield`` statements).
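
As a rough illustration of a generator acting as both consumer and producer,
the sketch below keeps a running average of the numbers sent into it. The name
``averager`` is hypothetical, and ``next(avg)`` uses the built-in
:func:`next` (available from Python 2.6) rather than the ``.next()`` method so
that the code also runs on Python 3.

```python
def averager():
    """Coroutine sketch: receives numbers via send(), yields the running mean."""
    total = 0.0
    count = 0
    average = None
    while True:
        value = (yield average)
        if value is None:
            continue          # resumed by next(); nothing new to record
        total += value
        count += 1
        average = total / count

avg = averager()
next(avg)                 # advance to the first yield before sending values
a = avg.send(10)          # 10.0
b = avg.send(20)          # 15.0
c = avg.send(30)          # 20.0
```

The initial ``next(avg)`` call is needed because a value can only be sent to a
generator that is already paused at a ``yield``.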


Built-in functions
==================

Let's look in more detail at built-in functions often used with iterators.

Two of Python's built-in functions, :func:`map` and :func:`filter`, are somewhat
obsolete; they duplicate the features of list comprehensions but return actual
lists instead of iterators.

``map(f, iterA, iterB, ...)`` returns a list containing ``f(iterA[0], iterB[0]),
f(iterA[1], iterB[1]), f(iterA[2], iterB[2]), ...``.

    >>> def upper(s):
    ...     return s.upper()

    >>> map(upper, ['sentence', 'fragment'])
    ['SENTENCE', 'FRAGMENT']

    >>> [upper(s) for s in ['sentence', 'fragment']]
    ['SENTENCE', 'FRAGMENT']

As shown above, you can achieve the same effect with a list comprehension. The
:func:`itertools.imap` function does the same thing but can handle infinite
iterators; it'll be discussed later, in the section on the :mod:`itertools` module.
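
The multi-argument form described above can be sketched as follows. The data
are made up; the two lists are deliberately the same length, and the result is
wrapped in ``list()`` so the snippet gives the same answer on Python 3, where
``map()`` returns an iterator rather than a list.

```python
import operator

prices = [2.5, 4.0, 1.25]
quantities = [4, 1, 8]

# list() is harmless in Python 2 and forces evaluation in Python 3.
totals = list(map(operator.mul, prices, quantities))       # [10.0, 4.0, 10.0]

# The same computation as a list comprehension over zip().
totals_lc = [p * q for p, q in zip(prices, quantities)]
```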

``filter(predicate, iter)`` returns a list that contains all the sequence
elements that meet a certain condition, and is similarly duplicated by list
comprehensions. A predicate is a function that returns the truth value of
some condition; for use with :func:`filter`, the predicate must take a single
value.

    >>> def is_even(x):
    ...     return (x % 2) == 0

    >>> filter(is_even, range(10))
    [0, 2, 4, 6, 8]

This can also be written as a list comprehension:

    >>> [x for x in range(10) if is_even(x)]
    [0, 2, 4, 6, 8]

:func:`filter` also has a counterpart in the :mod:`itertools` module,
:func:`itertools.ifilter`, that returns an iterator and can therefore handle
infinite sequences just as :func:`itertools.imap` can.

``reduce(func, iter, [initial_value])`` doesn't have a counterpart in the
:mod:`itertools` module because it cumulatively performs an operation on all the
iterable's elements and therefore can't be applied to infinite iterables.
``func`` must be a function that takes two elements and returns a single value.
:func:`reduce` takes the first two elements A and B returned by the iterator and
calculates ``func(A, B)``. It then requests the third element, C, calculates
``func(func(A, B), C)``, combines this result with the fourth element returned,
and continues until the iterable is exhausted. If the iterable returns no
values at all, a :exc:`TypeError` exception is raised. If the initial value is
supplied, it's used as a starting point and ``func(initial_value, A)`` is the
first calculation.

    >>> import operator
    >>> reduce(operator.concat, ['A', 'BB', 'C'])
    'ABBC'
    >>> reduce(operator.concat, [])
    Traceback (most recent call last):
      ...
    TypeError: reduce() of empty sequence with no initial value
    >>> reduce(operator.mul, [1,2,3], 1)
    6
    >>> reduce(operator.mul, [], 1)
    1
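
To see the folding process spelled out, here is an illustrative pure-Python
re-implementation. It is a sketch of the behaviour described above, not the
built-in's actual code; ``my_reduce`` and the ``_missing`` sentinel are names
invented for this example.

```python
import operator

_missing = object()   # sentinel meaning "no initial value given"

def my_reduce(func, iterable, initial=_missing):
    """Illustrative re-implementation of reduce(); not the library's code."""
    it = iter(iterable)
    if initial is _missing:
        try:
            result = next(it)            # A becomes the starting value
        except StopIteration:
            raise TypeError('my_reduce() of empty sequence with no initial value')
    else:
        result = initial
    for element in it:
        result = func(result, element)   # func(func(A, B), C), and so on
    return result

joined = my_reduce(operator.concat, ['A', 'BB', 'C'])   # 'ABBC'
product = my_reduce(operator.mul, [], 1)                # 1
```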
	686
	687	If you use :func:`operator.add` with :func:`reduce`, you'll add up all the
	688	elements of the iterable. This case is so common that there's a special
	689	built-in called :func:`sum` to compute it:
	690
	691	>>> reduce(operator.add, [1,2,3,4], 0)
	692	10
	693	>>> sum([1,2,3,4])
	694	10
	695	>>> sum([])
	696	0
	697
	698	For many uses of :func:`reduce`, though, it can be clearer to just write the
	699	obvious :keyword:`for` loop::
	700
	701	# Instead of:
	702	product = reduce(operator.mul, [1,2,3], 1)
	703
	704	# You can write:
	705	product = 1
	706	for i in [1,2,3]:
	707	product *= i
	708
	709
	710	``enumerate(iter)`` counts off the elements in the iterable, returning 2-tuples
	711	containing the count and each element.
	712
	713	>>> for item in enumerate(['subject', 'verb', 'object']):
	714	... print item
	715	(0, 'subject')
	716	(1, 'verb')
	717	(2, 'object')
	718
	719	:func:`enumerate` is often used when looping through a list and recording the
	720	indexes at which certain conditions are met::
	721
	722	f = open('data.txt', 'r')
	723	for i, line in enumerate(f):
	724	if line.strip() == '':
	725	print 'Blank line at line #%i' % i
	726
	727	``sorted(iterable, [cmp=None], [key=None], [reverse=False])`` collects all the
	728	elements of the iterable into a list, sorts the list, and returns the sorted
	729	result. The ``cmp``, ``key``, and ``reverse`` arguments are passed through to
	730	the constructed list's ``.sort()`` method. ::
	731
	732	>>> import random
>>> # Generate 8 distinct random numbers in the range [0, 10000)
	734	>>> rand_list = random.sample(range(10000), 8)
	735	>>> rand_list
	736	[769, 7953, 9828, 6431, 8442, 9878, 6213, 2207]
	737	>>> sorted(rand_list)
	738	[769, 2207, 6213, 6431, 7953, 8442, 9828, 9878]
	739	>>> sorted(rand_list, reverse=True)
	740	[9878, 9828, 8442, 7953, 6431, 6213, 2207, 769]
	741
	742	(For a more detailed discussion of sorting, see the Sorting mini-HOWTO in the
	743	Python wiki at http://wiki.python.org/moin/HowTo/Sorting.)
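
The session above doesn't exercise ``key``. As a small sketch (the ``words``
list is invented for illustration), ``key`` names a one-argument function whose
return value is used for the comparisons:

```python
words = ['banana', 'Cherry', 'apple', 'fig']

# Sort by length; ties keep their original order because the sort is stable.
by_length = sorted(words, key=len)

# Sort case-insensitively, in descending order.
by_name_desc = sorted(words, key=str.lower, reverse=True)
```

``by_length`` is ``['fig', 'apple', 'banana', 'Cherry']`` and ``by_name_desc``
is ``['fig', 'Cherry', 'banana', 'apple']``.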
	744
	745	The ``any(iter)`` and ``all(iter)`` built-ins look at the truth values of an
	746	iterable's contents. :func:`any` returns True if any element in the iterable is
	747	a true value, and :func:`all` returns True if all of the elements are true
	748	values:
	749
	750	>>> any([0,1,0])
	751	True
	752	>>> any([0,0,0])
	753	False
	754	>>> any([1,1,1])
	755	True
	756	>>> all([0,1,0])
	757	False
	758	>>> all([0,0,0])
	759	False
	760	>>> all([1,1,1])
	761	True
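
Because both built-ins accept any iterable, they combine naturally with
generator expressions. The following sketch uses a made-up ``numbers`` list; it
also shows the edge case of an empty iterable, where :func:`any` returns False
and :func:`all` returns True.

```python
numbers = [3, 8, 12, 7]

has_even = any(n % 2 == 0 for n in numbers)   # can stop as soon as it sees 8
all_positive = all(n > 0 for n in numbers)
all_even = all(n % 2 == 0 for n in numbers)   # can stop as soon as it sees 3

# Edge case: nothing is true in an empty iterable, and nothing is false either.
empty_any = any([])
empty_all = all([])
```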
	762
	763
	764	Small functions and the lambda expression
	765	=========================================
	766
	767	When writing functional-style programs, you'll often need little functions that
	768	act as predicates or that combine elements in some way.
	769
	770	If there's a Python built-in or a module function that's suitable, you don't
	771	need to define a new function at all::
	772
    stripped_lines = [line.strip() for line in lines]
    existing_files = filter(os.path.exists, file_list)
	775
If the function you need doesn't exist, you need to write it. One way to write
small functions is to use the ``lambda`` expression. ``lambda`` takes a number
of parameters and an expression combining these parameters, and creates an
anonymous function that returns the value of the expression::
	780
    lowercase = lambda x: x.lower()

    print_assign = lambda name, value: name + '=' + str(value)

    adder = lambda x, y: x+y
	786
	787	An alternative is to just use the ``def`` statement and define a function in the
	788	usual way::
	789
    def lowercase(x):
        return x.lower()

    def print_assign(name, value):
        return name + '=' + str(value)

    def adder(x, y):
        return x + y
	798
	799	Which alternative is preferable? That's a style question; my usual course is to
	800	avoid using ``lambda``.
	801
One reason for my preference is that ``lambda`` is quite limited in the
functions it can define. The result has to be computable as a single
expression, which means you can't have multiway ``if... elif... else``
comparisons or ``try... except`` statements. If you try to do too much in a
``lambda`` expression, you'll end up with an overly complicated expression
that's hard to read. Quick, what's the following code doing?
	808
	809	::
	810
    total = reduce(lambda a, b: (0, a[1] + b[1]), items)[1]
	812
You can figure it out, but it takes time to disentangle the expression to figure
out what's going on. Using a short nested ``def`` statement makes things a
little bit better::
	816
    def combine(a, b):
        return 0, a[1] + b[1]

    total = reduce(combine, items)[1]
	821
	822	But it would be best of all if I had simply used a ``for`` loop::
	823
    total = 0
    for a, b in items:
        total += b
	827
	828	Or the :func:`sum` built-in and a generator expression::
	829
    total = sum(b for a,b in items)
	831
	832	Many uses of :func:`reduce` are clearer when written as ``for`` loops.
	833
	834	Fredrik Lundh once suggested the following set of rules for refactoring uses of
	835	``lambda``:
	836
1) Write a lambda function.
2) Write a comment explaining what the heck that lambda does.
3) Study the comment for a while, and think of a name that captures the essence
   of the comment.
4) Convert the lambda to a def statement, using that name.
5) Remove the comment.
	843
	844	I really like these rules, but you're free to disagree
	845	about whether this lambda-free style is better.
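
To make the rules concrete, here's one way they might play out. The ``records``
data and the name ``is_in_stock`` are invented for this sketch.

```python
records = [('widget', 4), ('gadget', 0), ('doohickey', 7)]

# Steps 1 and 2: a lambda, plus a comment saying what it does.
# Keep only the records whose quantity is non-zero.
in_stock_v1 = list(filter(lambda rec: rec[1] != 0, records))

# Steps 3 to 5: name the idea, turn the lambda into a def, drop the comment.
def is_in_stock(rec):
    return rec[1] != 0

in_stock_v2 = list(filter(is_in_stock, records))
```

Both versions produce the same list; the second simply carries its explanation
in the function name.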
	846
	847
	848	The itertools module
	849	====================
	850
	851	The :mod:`itertools` module contains a number of commonly-used iterators as well
	852	as functions for combining several iterators. This section will introduce the
	853	module's contents by showing small examples.
	854
	855	The module's functions fall into a few broad classes:
	856
	857	* Functions that create a new iterator based on an existing iterator.
	858	* Functions for treating an iterator's elements as function arguments.
	859	* Functions for selecting portions of an iterator's output.
	860	* A function for grouping an iterator's output.
	861
	862	Creating new iterators
	863	----------------------
	864
	865	``itertools.count(n)`` returns an infinite stream of integers, increasing by 1
	866	each time. You can optionally supply the starting number, which defaults to 0::
	867
    itertools.count() =>
      0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ...
    itertools.count(10) =>
      10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ...
	872
	873	``itertools.cycle(iter)`` saves a copy of the contents of a provided iterable
	874	and returns a new iterator that returns its elements from first to last. The
	875	new iterator will repeat these elements infinitely. ::
	876
    itertools.cycle([1,2,3,4,5]) =>
      1, 2, 3, 4, 5, 1, 2, 3, 4, 5, ...
	879
	880	``itertools.repeat(elem, [n])`` returns the provided element ``n`` times, or
	881	returns the element endlessly if ``n`` is not provided. ::
	882
    itertools.repeat('abc') =>
      abc, abc, abc, abc, abc, abc, abc, abc, abc, abc, ...
    itertools.repeat('abc', 5) =>
      abc, abc, abc, abc, abc
	887
	888	``itertools.chain(iterA, iterB, ...)`` takes an arbitrary number of iterables as
	889	input, and returns all the elements of the first iterator, then all the elements
	890	of the second, and so on, until all of the iterables have been exhausted. ::
	891
    itertools.chain(['a', 'b', 'c'], (1, 2, 3)) =>
      a, b, c, 1, 2, 3
	894
	895	``itertools.izip(iterA, iterB, ...)`` takes one element from each iterable and
	896	returns them in a tuple::
	897
    itertools.izip(['a', 'b', 'c'], (1, 2, 3)) =>
      ('a', 1), ('b', 2), ('c', 3)
	900
	901	It's similar to the built-in :func:`zip` function, but doesn't construct an
	902	in-memory list and exhaust all the input iterators before returning; instead
	903	tuples are constructed and returned only if they're requested. (The technical
	904	term for this behaviour is `lazy evaluation
	905	<http://en.wikipedia.org/wiki/Lazy_evaluation>`__.)
	906
	907	This iterator is intended to be used with iterables that are all of the same
	908	length. If the iterables are of different lengths, the resulting stream will be
	909	the same length as the shortest iterable. ::
	910
    itertools.izip(['a', 'b'], (1, 2, 3)) =>
      ('a', 1), ('b', 2)
	913
	914	You should avoid doing this, though, because an element may be taken from the
	915	longer iterators and discarded. This means you can't go on to use the iterators
	916	further because you risk skipping a discarded element.
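
The sketch below shows how an element can go missing. It assumes CPython's
behaviour of pulling from the iterators in argument order, and it falls back to
the built-in :func:`zip` on Python 3, where ``izip`` no longer exists. The
variable names are made up for the example.

```python
try:
    from itertools import izip   # Python 2
except ImportError:
    izip = zip                   # Python 3: the built-in zip() is already lazy

numbers = iter([1, 2, 3])
letters = iter(['a', 'b'])

# The longer iterator comes first, so 3 is fetched from it before
# izip() notices that letters has run out.
pairs = list(izip(numbers, letters))

# The 3 has been consumed and thrown away; nothing is left to read.
leftover = list(numbers)
```

``pairs`` is ``[(1, 'a'), (2, 'b')]`` and ``leftover`` is empty, so the ``3``
was silently discarded.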
	917
	918	``itertools.islice(iter, [start], stop, [step])`` returns a stream that's a
	919	slice of the iterator. With a single ``stop`` argument, it will return the
	920	first ``stop`` elements. If you supply a starting index, you'll get
	921	``stop-start`` elements, and if you supply a value for ``step``, elements will
	922	be skipped accordingly. Unlike Python's string and list slicing, you can't use
	923	negative values for ``start``, ``stop``, or ``step``. ::
	924
    itertools.islice(range(10), 8) =>
      0, 1, 2, 3, 4, 5, 6, 7
    itertools.islice(range(10), 2, 8) =>
      2, 3, 4, 5, 6, 7
    itertools.islice(range(10), 2, 8, 2) =>
      2, 4, 6
	931
	932	``itertools.tee(iter, [n])`` replicates an iterator; it returns ``n``
	933	independent iterators that will all return the contents of the source iterator.
	934	If you don't supply a value for ``n``, the default is 2. Replicating iterators
	935	requires saving some of the contents of the source iterator, so this can consume
	936	significant memory if the iterator is large and one of the new iterators is
	937	consumed more than the others. ::
	938
    itertools.tee( itertools.count() ) =>
       iterA, iterB

    where iterA ->
       0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ...

    and   iterB ->
       0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ...
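
Here is a small runnable sketch of :func:`tee` on a finite source (the
five-element ``source`` is just an example). It shows that the two iterators
advance independently, which is why :func:`tee` has to buffer whatever one
iterator has consumed and the other hasn't yet.

```python
import itertools

source = iter(range(5))  # a small made-up source
iter_a, iter_b = itertools.tee(source)

# Advance only iter_a by three elements.
first_three = [next(iter_a) for _ in range(3)]

# iter_b has not advanced, so tee() kept 0, 1, 2 around for it.
all_of_b = list(iter_b)
rest_of_a = list(iter_a)
```

After this runs, ``first_three`` is ``[0, 1, 2]``, ``all_of_b`` is
``[0, 1, 2, 3, 4]``, and ``rest_of_a`` is ``[3, 4]``.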
	947
	948
	949	Calling functions on elements
	950	-----------------------------
	951
	952	Two functions are used for calling other functions on the contents of an
	953	iterable.
	954
	955	``itertools.imap(f, iterA, iterB, ...)`` returns a stream containing
	956	``f(iterA[0], iterB[0]), f(iterA[1], iterB[1]), f(iterA[2], iterB[2]), ...``::
	957
    itertools.imap(operator.add, [5, 6, 5], [1, 2, 3]) =>
      6, 8, 8
	960
	961	The ``operator`` module contains a set of functions corresponding to Python's
	962	operators. Some examples are ``operator.add(a, b)`` (adds two values),
	963	``operator.ne(a, b)`` (same as ``a!=b``), and ``operator.attrgetter('id')``
	964	(returns a callable that fetches the ``"id"`` attribute).
	965
``itertools.starmap(func, iter)`` assumes that the iterable will return a stream
of tuples, and calls ``func()`` using these tuples as the arguments::
	968
    itertools.starmap(os.path.join,
                      [('/usr', 'bin', 'java'), ('/bin', 'python'),
                       ('/usr', 'bin', 'perl'),('/usr', 'bin', 'ruby')])
    =>
      /usr/bin/java, /bin/python, /usr/bin/perl, /usr/bin/ruby
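
For comparison, here is a runnable sketch of both functions side by side. The
``imap`` fallback to the built-in :func:`map` is only there so the snippet also
runs on Python 3, and the use of :func:`pow` is just an example.

```python
import itertools
import operator

try:
    imap = itertools.imap   # Python 2
except AttributeError:
    imap = map              # Python 3: the built-in map() is already lazy

# imap() takes one argument from each iterable per call.
sums = list(imap(operator.add, [5, 6, 5], [1, 2, 3]))

# starmap() unpacks each tuple into positional arguments:
# pow(2, 3), pow(3, 2), pow(10, 0)
powers = list(itertools.starmap(pow, [(2, 3), (3, 2), (10, 0)]))
```

``sums`` is ``[6, 8, 8]`` and ``powers`` is ``[8, 9, 1]``.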
	974
	975
	976	Selecting elements
	977	------------------
	978
	979	Another group of functions chooses a subset of an iterator's elements based on a
	980	predicate.
	981
	982	``itertools.ifilter(predicate, iter)`` returns all the elements for which the
	983	predicate returns true::
	984
    def is_even(x):
        return (x % 2) == 0

    itertools.ifilter(is_even, itertools.count()) =>
      0, 2, 4, 6, 8, 10, 12, 14, ...
	990
	991	``itertools.ifilterfalse(predicate, iter)`` is the opposite, returning all
	992	elements for which the predicate returns false::
	993
    itertools.ifilterfalse(is_even, itertools.count()) =>
      1, 3, 5, 7, 9, 11, 13, 15, ...
	996
	997	``itertools.takewhile(predicate, iter)`` returns elements for as long as the
	998	predicate returns true. Once the predicate returns false, the iterator will
	999	signal the end of its results.
	1000
	1001	::
	1002
    def less_than_10(x):
        return (x < 10)

    itertools.takewhile(less_than_10, itertools.count()) =>
      0, 1, 2, 3, 4, 5, 6, 7, 8, 9

    itertools.takewhile(is_even, itertools.count()) =>
      0
	1011
	1012	``itertools.dropwhile(predicate, iter)`` discards elements while the predicate
	1013	returns true, and then returns the rest of the iterable's results.
	1014
	1015	::
	1016
    itertools.dropwhile(less_than_10, itertools.count()) =>
      10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ...

    itertools.dropwhile(is_even, itertools.count()) =>
      1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...
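
A finite input makes the difference from filtering easier to see. In this
sketch (the ``data`` list is made up), both functions only care about the
leading run of elements that satisfy the predicate; later matches are not
treated specially.

```python
import itertools

def is_even(x):
    return (x % 2) == 0

data = [2, 4, 5, 6, 8]

# Stops at 5, even though 6 and 8 would also pass the predicate.
taken = list(itertools.takewhile(is_even, data))

# Drops only the leading 2 and 4; everything from 5 onward is kept.
dropped = list(itertools.dropwhile(is_even, data))
```

``taken`` is ``[2, 4]`` and ``dropped`` is ``[5, 6, 8]``.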
	1022
	1023
	1024	Grouping elements
	1025	-----------------
	1026
	1027	The last function I'll discuss, ``itertools.groupby(iter, key_func=None)``, is
	1028	the most complicated. ``key_func(elem)`` is a function that can compute a key
	1029	value for each element returned by the iterable. If you don't supply a key
	1030	function, the key is simply each element itself.
	1031
	1032	``groupby()`` collects all the consecutive elements from the underlying iterable
	1033	that have the same key value, and returns a stream of 2-tuples containing a key
	1034	value and an iterator for the elements with that key.
	1035
	1036	::
	1037
    city_list = [('Decatur', 'AL'), ('Huntsville', 'AL'), ('Selma', 'AL'),
                 ('Anchorage', 'AK'), ('Nome', 'AK'),
                 ('Flagstaff', 'AZ'), ('Phoenix', 'AZ'), ('Tucson', 'AZ'),
                 ...
                ]

    def get_state(city_state):
        # Each element is a (city, state) tuple; the state is the key.
        return city_state[1]

    itertools.groupby(city_list, get_state) =>
      ('AL', iterator-1),
      ('AK', iterator-2),
      ('AZ', iterator-3), ...

    where
    iterator-1 =>
      ('Decatur', 'AL'), ('Huntsville', 'AL'), ('Selma', 'AL')
    iterator-2 =>
      ('Anchorage', 'AK'), ('Nome', 'AK')
    iterator-3 =>
      ('Flagstaff', 'AZ'), ('Phoenix', 'AZ'), ('Tucson', 'AZ')
	1059
	1060	``groupby()`` assumes that the underlying iterable's contents will already be
	1061	sorted based on the key. Note that the returned iterators also use the
	1062	underlying iterable, so you have to consume the results of iterator-1 before
	1063	requesting iterator-2 and its corresponding key.
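
Both caveats can be handled in a few lines. The following sketch uses a
shortened, deliberately unsorted ``city_list`` and a hypothetical
``cities_by_state`` dictionary: it sorts by the same key first, and it turns
each group into a list before moving on to the next key.

```python
import itertools

city_list = [('Phoenix', 'AZ'), ('Nome', 'AK'), ('Selma', 'AL'),
             ('Tucson', 'AZ'), ('Anchorage', 'AK')]

def get_state(city_state):
    return city_state[1]

# Sort by the same key first so that equal keys are adjacent.
ordered = sorted(city_list, key=get_state)

cities_by_state = {}
for state, group in itertools.groupby(ordered, get_state):
    # Consume each group before advancing to the next key.
    cities_by_state[state] = [city for city, _ in group]
```

``cities_by_state`` ends up as
``{'AK': ['Nome', 'Anchorage'], 'AL': ['Selma'], 'AZ': ['Phoenix', 'Tucson']}``.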
	1064
	1065
	1066	The functools module
	1067	====================
	1068
The :mod:`functools` module in Python 2.5 contains some higher-order functions.
A higher-order function takes one or more functions as input, returns a new
function, or both. The most useful tool in this module is the
:func:`functools.partial` function.
	1073
	1074	For programs written in a functional style, you'll sometimes want to construct
	1075	variants of existing functions that have some of the parameters filled in.
	1076	Consider a Python function ``f(a, b, c)``; you may wish to create a new function
	1077	``g(b, c)`` that's equivalent to ``f(1, b, c)``; you're filling in a value for
	1078	one of ``f()``'s parameters. This is called "partial function application".
	1079
The constructor for ``partial`` takes the arguments ``(function, arg1, arg2,
..., kwarg1=value1, kwarg2=value2)``. The resulting object is callable, so you
can just call it to invoke ``function`` with the filled-in arguments.
	1083
	1084	Here's a small but realistic example::
	1085
    import functools

    def log(message, subsystem):
        "Write the contents of 'message' to the specified subsystem."
        print '%s: %s' % (subsystem, message)
        ...

    server_log = functools.partial(log, subsystem='server')
    server_log('Unable to open socket')
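
The ``f(a, b, c)`` case described earlier, where a positional argument is
filled in, can be sketched the same way. The body of ``f`` here is a
placeholder that just returns its arguments so the effect is visible.

```python
import functools

def f(a, b, c):
    # Placeholder body: return the arguments so we can see what was passed.
    return (a, b, c)

# Fill in a=1 positionally; g now takes the remaining two arguments.
g = functools.partial(f, 1)

result = g(2, 3)

# The partial object remembers what it wraps and which arguments are frozen.
wrapped, frozen_args = g.func, g.args
```

``result`` is ``(1, 2, 3)``, ``wrapped`` is ``f`` itself, and ``frozen_args`` is
``(1,)``.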
	1095
	1096
	1097	The operator module
	1098	-------------------
	1099
	1100	The :mod:`operator` module was mentioned earlier. It contains a set of
	1101	functions corresponding to Python's operators. These functions are often useful
	1102	in functional-style code because they save you from writing trivial functions
	1103	that perform a single operation.
	1104
	1105	Some of the functions in this module are:
	1106
* Math operations: ``add()``, ``sub()``, ``mul()``, ``div()``, ``floordiv()``,
  ``abs()``, ...
	1109	* Logical operations: ``not_()``, ``truth()``.
	1110	* Bitwise operations: ``and_()``, ``or_()``, ``invert()``.
	1111	* Comparisons: ``eq()``, ``ne()``, ``lt()``, ``le()``, ``gt()``, and ``ge()``.
	1112	* Object identity: ``is_()``, ``is_not()``.
	1113
	1114	Consult the operator module's documentation for a complete list.
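
As a brief sketch of how these functions stand in for trivial lambdas (the
``prices`` data is invented, and ``itemgetter()`` is another function from the
same module that isn't listed above):

```python
import operator
from functools import reduce  # also available as a built-in in Python 2

prices = [('pen', 2), ('book', 12), ('bag', 30)]

# itemgetter(1) replaces "lambda item: item[1]".
amounts = list(map(operator.itemgetter(1), prices))

# operator.add replaces "lambda x, y: x + y".
total = reduce(operator.add, amounts, 0)

# operator.lt(a, b) is the function form of "a < b".
cheaper = operator.lt(amounts[0], amounts[1])
```

``amounts`` is ``[2, 12, 30]``, ``total`` is ``44``, and ``cheaper`` is True.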
	1115
	1116
	1117	Revision History and Acknowledgements
	1118	=====================================
	1119
	1120	The author would like to thank the following people for offering suggestions,
	1121	corrections and assistance with various drafts of this article: Ian Bicking,
	1122	Nick Coghlan, Nick Efford, Raymond Hettinger, Jim Jewett, Mike Krell, Leandro
	1123	Lameiro, Jussi Salmela, Collin Winter, Blake Winton.
	1124
	1125	Version 0.1: posted June 30 2006.
	1126
	1127	Version 0.11: posted July 1 2006. Typo fixes.
	1128
	1129	Version 0.2: posted July 10 2006. Merged genexp and listcomp sections into one.
	1130	Typo fixes.
	1131
	1132	Version 0.21: Added more references suggested on the tutor mailing list.
	1133
	1134	Version 0.30: Adds a section on the ``functional`` module written by Collin
	1135	Winter; adds short section on the operator module; a few other edits.
	1136
	1137
	1138	References
	1139	==========
	1140
	1141	General
	1142	-------
	1143
	1144	Structure and Interpretation of Computer Programs, by Harold Abelson and
	1145	Gerald Jay Sussman with Julie Sussman. Full text at
	1146	http://mitpress.mit.edu/sicp/. In this classic textbook of computer science,
	1147	chapters 2 and 3 discuss the use of sequences and streams to organize the data
	1148	flow inside a program. The book uses Scheme for its examples, but many of the
	1149	design approaches described in these chapters are applicable to functional-style
	1150	Python code.
	1151
	1152	http://www.defmacro.org/ramblings/fp.html: A general introduction to functional
	1153	programming that uses Java examples and has a lengthy historical introduction.
	1154
	1155	http://en.wikipedia.org/wiki/Functional_programming: General Wikipedia entry
	1156	describing functional programming.
	1157
	1158	http://en.wikipedia.org/wiki/Coroutine: Entry for coroutines.
	1159
	1160	http://en.wikipedia.org/wiki/Currying: Entry for the concept of currying.
	1161
	1162	Python-specific
	1163	---------------
	1164
	1165	http://gnosis.cx/TPiP/: The first chapter of David Mertz's book
	1166	:title-reference:`Text Processing in Python` discusses functional programming
	1167	for text processing, in the section titled "Utilizing Higher-Order Functions in
	1168	Text Processing".
	1169
Mertz also wrote a 3-part series of articles on functional programming
for IBM's DeveloperWorks site; see
`part 1 <http://www.ibm.com/developerworks/linux/library/l-prog/index.html>`__,
`part 2 <http://www.ibm.com/developerworks/linux/library/l-prog2/index.html>`__, and
`part 3 <http://www.ibm.com/developerworks/linux/library/l-prog3/index.html>`__.
[2]	1176
[391]	1177
[2]	1178	Python documentation
	1179	--------------------
	1180
	1181	Documentation for the :mod:`itertools` module.
	1182
	1183	Documentation for the :mod:`operator` module.
	1184
	1185	:pep:`289`: "Generator Expressions"
	1186
	1187	:pep:`342`: "Coroutines via Enhanced Generators" describes the new generator
	1188	features in Python 2.5.
	1189
	1190	.. comment
	1191
    Topics to place
    -----------------------------

    XXX os.walk()

    XXX Need a large example.

    But will an example add much? I'll post a first draft and see
    what the comments say.
	1201
	1202	.. comment
	1203
    Original outline:
    Introduction
        Idea of FP
            Programs built out of functions
            Functions are strictly input-output, no internal state
        Opposed to OO programming, where objects have state

        Why FP?
            Formal provability
                Assignment is difficult to reason about
                Not very relevant to Python
            Modularity
                Small functions that do one thing
            Debuggability:
                Easy to test due to lack of state
                Easy to verify output from intermediate steps
            Composability
                You assemble a toolbox of functions that can be mixed

    Tackling a problem
        Need a significant example

    Iterators
    Generators
    The itertools module
    List comprehensions
    Small functions and the lambda statement
    Built-in functions
        map
        filter
        reduce
	1235
	1236	.. comment
	1237
    Handy little function for printing part of an iterator -- used
    while writing this document.

    import itertools
    import sys

    def print_iter(it):
        # islice objects can't be sliced, so collect the first 10 elements.
        elems = list(itertools.islice(it, 10))
        for elem in elems[:-1]:
            sys.stdout.write(str(elem))
            sys.stdout.write(', ')
        if elems:
            sys.stdout.write(str(elems[-1]) + '\n')
	1248
	1249
