Context Navigation

← Previous Revision
Next Revision →
Normal
Revision Log

difflib.rst

Last change on this file was 391, checked in by dmik, 11 years ago
python: Merge vendor 2.7.6 to trunk.
Property svn:eol-style set to `native`
File size: 29.9 KB

Rev	Line
[2]	1	:mod:`difflib` --- Helpers for computing deltas
	2	===============================================
	3
	4	.. module:: difflib
	5	:synopsis: Helpers for computing differences between objects.
	6	.. moduleauthor:: Tim Peters <tim_one@users.sourceforge.net>
	7	.. sectionauthor:: Tim Peters <tim_one@users.sourceforge.net>
	8	.. Markup by Fred L. Drake, Jr. <fdrake@acm.org>
	9
	10	.. testsetup::
	11
	12	import sys
	13	from difflib import *
	14
	15	.. versionadded:: 2.1
	16
	17	This module provides classes and functions for comparing sequences. It
	18	can be used for example, for comparing files, and can produce difference
	19	information in various formats, including HTML and context and unified
	20	diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
	21
	22	.. class:: SequenceMatcher
	23
	24	This is a flexible class for comparing pairs of sequences of any type, so long
	25	as the sequence elements are :term:`hashable`. The basic algorithm predates, and is a
	26	little fancier than, an algorithm published in the late 1980's by Ratcliff and
	27	Obershelp under the hyperbolic name "gestalt pattern matching." The idea is to
	28	find the longest contiguous matching subsequence that contains no "junk"
	29	elements (the Ratcliff and Obershelp algorithm doesn't address junk). The same
	30	idea is then applied recursively to the pieces of the sequences to the left and
	31	to the right of the matching subsequence. This does not yield minimal edit
	32	sequences, but does tend to yield matches that "look right" to people.
	33
	34	Timing: The basic Ratcliff-Obershelp algorithm is cubic time in the worst
	35	case and quadratic time in the expected case. :class:`SequenceMatcher` is
	36	quadratic time for the worst case and has expected-case behavior dependent in a
	37	complicated way on how many elements the sequences have in common; best case
	38	time is linear.
	39
[391]	40	Automatic junk heuristic: :class:`SequenceMatcher` supports a heuristic that
	41	automatically treats certain sequence items as junk. The heuristic counts how many
	42	times each individual item appears in the sequence. If an item's duplicates (after
	43	the first one) account for more than 1% of the sequence and the sequence is at least
	44	200 items long, this item is marked as "popular" and is treated as junk for
	45	the purpose of sequence matching. This heuristic can be turned off by setting
	46	the ``autojunk`` argument to ``False`` when creating the :class:`SequenceMatcher`.
[2]	47
[391]	48	.. versionadded:: 2.7.1
	49	The autojunk parameter.
	50
[2]	51	.. class:: Differ
	52
	53	This is a class for comparing sequences of lines of text, and producing
	54	human-readable differences or deltas. Differ uses :class:`SequenceMatcher`
	55	both to compare sequences of lines, and to compare sequences of characters
	56	within similar (near-matching) lines.
	57
	58	Each line of a :class:`Differ` delta begins with a two-letter code:
	59
	60	+----------+-------------------------------------------+
	61	\| Code \| Meaning \|
	62	+==========+===========================================+
	63	\| ``'- '`` \| line unique to sequence 1 \|
	64	+----------+-------------------------------------------+
	65	\| ``'+ '`` \| line unique to sequence 2 \|
	66	+----------+-------------------------------------------+
	67	\| ``' '`` \| line common to both sequences \|
	68	+----------+-------------------------------------------+
	69	\| ``'? '`` \| line not present in either input sequence \|
	70	+----------+-------------------------------------------+
	71
	72	Lines beginning with '``?``' attempt to guide the eye to intraline differences,
	73	and were not present in either input sequence. These lines can be confusing if
	74	the sequences contain tab characters.
	75
	76
	77	.. class:: HtmlDiff
	78
	79	This class can be used to create an HTML table (or a complete HTML file
	80	containing the table) showing a side by side, line by line comparison of text
	81	with inter-line and intra-line change highlights. The table can be generated in
	82	either full or contextual difference mode.
	83
	84	The constructor for this class is:
	85
	86
[391]	87	.. function:: __init__(tabsize=8, wrapcolumn=None, linejunk=None, charjunk=IS_CHARACTER_JUNK)
[2]	88
	89	Initializes instance of :class:`HtmlDiff`.
	90
	91	tabsize is an optional keyword argument to specify tab stop spacing and
	92	defaults to ``8``.
	93
	94	wrapcolumn is an optional keyword to specify column number where lines are
	95	broken and wrapped, defaults to ``None`` where lines are not wrapped.
	96
	97	linejunk and charjunk are optional keyword arguments passed into ``ndiff()``
	98	(used by :class:`HtmlDiff` to generate the side by side HTML differences). See
	99	``ndiff()`` documentation for argument default values and descriptions.
	100
	101	The following methods are public:
	102
	103
	104	.. function:: make_file(fromlines, tolines [, fromdesc][, todesc][, context][, numlines])
	105
	106	Compares fromlines and tolines (lists of strings) and returns a string which
	107	is a complete HTML file containing a table showing line by line differences with
	108	inter-line and intra-line changes highlighted.
	109
	110	fromdesc and todesc are optional keyword arguments to specify from/to file
	111	column header strings (both default to an empty string).
	112
	113	context and numlines are both optional keyword arguments. Set context to
	114	``True`` when contextual differences are to be shown, else the default is
	115	``False`` to show the full files. numlines defaults to ``5``. When context
	116	is ``True`` numlines controls the number of context lines which surround the
	117	difference highlights. When context is ``False`` numlines controls the
	118	number of lines which are shown before a difference highlight when using the
	119	"next" hyperlinks (setting to zero would cause the "next" hyperlinks to place
	120	the next difference highlight at the top of the browser without any leading
	121	context).
	122
	123
	124	.. function:: make_table(fromlines, tolines [, fromdesc][, todesc][, context][, numlines])
	125
	126	Compares fromlines and tolines (lists of strings) and returns a string which
	127	is a complete HTML table showing line by line differences with inter-line and
	128	intra-line changes highlighted.
	129
	130	The arguments for this method are the same as those for the :meth:`make_file`
	131	method.
	132
	133	:file:`Tools/scripts/diff.py` is a command-line front-end to this class and
	134	contains a good example of its use.
	135
	136	.. versionadded:: 2.4
	137
	138
	139	.. function:: context_diff(a, b[, fromfile][, tofile][, fromfiledate][, tofiledate][, n][, lineterm])
	140
	141	Compare a and b (lists of strings); return a delta (a :term:`generator`
	142	generating the delta lines) in context diff format.
	143
	144	Context diffs are a compact way of showing just the lines that have changed plus
	145	a few lines of context. The changes are shown in a before/after style. The
	146	number of context lines is set by n which defaults to three.
	147
	148	By default, the diff control lines (those with ``***`` or ``---``) are created
	149	with a trailing newline. This is helpful so that inputs created from
	150	:func:`file.readlines` result in diffs that are suitable for use with
	151	:func:`file.writelines` since both the inputs and outputs have trailing
	152	newlines.
	153
	154	For inputs that do not have trailing newlines, set the lineterm argument to
	155	``""`` so that the output will be uniformly newline free.
	156
	157	The context diff format normally has a header for filenames and modification
	158	times. Any or all of these may be specified using strings for fromfile,
[391]	159	tofile, fromfiledate, and tofiledate. The modification times are normally
	160	expressed in the ISO 8601 format. If not specified, the
[2]	161	strings default to blanks.
	162
	163	>>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
	164	>>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
	165	>>> for line in context_diff(s1, s2, fromfile='before.py', tofile='after.py'):
	166	... sys.stdout.write(line) # doctest: +NORMALIZE_WHITESPACE
	167	*** before.py
	168	--- after.py
	169	***************
	170	* 1,4 **
	171	! bacon
	172	! eggs
	173	! ham
	174	guido
	175	--- 1,4 ----
	176	! python
	177	! eggy
	178	! hamster
	179	guido
	180
	181	See :ref:`difflib-interface` for a more detailed example.
	182
	183	.. versionadded:: 2.3
	184
	185
	186	.. function:: get_close_matches(word, possibilities[, n][, cutoff])
	187
	188	Return a list of the best "good enough" matches. word is a sequence for which
	189	close matches are desired (typically a string), and possibilities is a list of
	190	sequences against which to match word (typically a list of strings).
	191
	192	Optional argument n (default ``3``) is the maximum number of close matches to
	193	return; n must be greater than ``0``.
	194
	195	Optional argument cutoff (default ``0.6``) is a float in the range [0, 1].
	196	Possibilities that don't score at least that similar to word are ignored.
	197
	198	The best (no more than n) matches among the possibilities are returned in a
	199	list, sorted by similarity score, most similar first.
	200
	201	>>> get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy'])
	202	['apple', 'ape']
	203	>>> import keyword
	204	>>> get_close_matches('wheel', keyword.kwlist)
	205	['while']
	206	>>> get_close_matches('apple', keyword.kwlist)
	207	[]
	208	>>> get_close_matches('accept', keyword.kwlist)
	209	['except']
	210
	211
	212	.. function:: ndiff(a, b[, linejunk][, charjunk])
	213
	214	Compare a and b (lists of strings); return a :class:`Differ`\ -style
	215	delta (a :term:`generator` generating the delta lines).
	216
	217	Optional keyword parameters linejunk and charjunk are for filter functions
	218	(or ``None``):
	219
	220	linejunk: A function that accepts a single string argument, and returns true
	221	if the string is junk, or false if not. The default is (``None``), starting with
	222	Python 2.3. Before then, the default was the module-level function
	223	:func:`IS_LINE_JUNK`, which filters out lines without visible characters, except
	224	for at most one pound character (``'#'``). As of Python 2.3, the underlying
	225	:class:`SequenceMatcher` class does a dynamic analysis of which lines are so
	226	frequent as to constitute noise, and this usually works better than the pre-2.3
	227	default.
	228
	229	charjunk: A function that accepts a character (a string of length 1), and
	230	returns if the character is junk, or false if not. The default is module-level
	231	function :func:`IS_CHARACTER_JUNK`, which filters out whitespace characters (a
	232	blank or tab; note: bad idea to include newline in this!).
	233
	234	:file:`Tools/scripts/ndiff.py` is a command-line front-end to this function.
	235
	236	>>> diff = ndiff('one\ntwo\nthree\n'.splitlines(1),
	237	... 'ore\ntree\nemu\n'.splitlines(1))
	238	>>> print ''.join(diff),
	239	- one
	240	? ^
	241	+ ore
	242	? ^
	243	- two
	244	- three
	245	? -
	246	+ tree
	247	+ emu
	248
	249
	250	.. function:: restore(sequence, which)
	251
	252	Return one of the two sequences that generated a delta.
	253
	254	Given a sequence produced by :meth:`Differ.compare` or :func:`ndiff`, extract
	255	lines originating from file 1 or 2 (parameter which), stripping off line
	256	prefixes.
	257
	258	Example:
	259
	260	>>> diff = ndiff('one\ntwo\nthree\n'.splitlines(1),
	261	... 'ore\ntree\nemu\n'.splitlines(1))
	262	>>> diff = list(diff) # materialize the generated delta into a list
	263	>>> print ''.join(restore(diff, 1)),
	264	one
	265	two
	266	three
	267	>>> print ''.join(restore(diff, 2)),
	268	ore
	269	tree
	270	emu
	271
	272
	273	.. function:: unified_diff(a, b[, fromfile][, tofile][, fromfiledate][, tofiledate][, n][, lineterm])
	274
	275	Compare a and b (lists of strings); return a delta (a :term:`generator`
	276	generating the delta lines) in unified diff format.
	277
	278	Unified diffs are a compact way of showing just the lines that have changed plus
	279	a few lines of context. The changes are shown in a inline style (instead of
	280	separate before/after blocks). The number of context lines is set by n which
	281	defaults to three.
	282
	283	By default, the diff control lines (those with ``---``, ``+++``, or ``@@``) are
	284	created with a trailing newline. This is helpful so that inputs created from
	285	:func:`file.readlines` result in diffs that are suitable for use with
	286	:func:`file.writelines` since both the inputs and outputs have trailing
	287	newlines.
	288
	289	For inputs that do not have trailing newlines, set the lineterm argument to
	290	``""`` so that the output will be uniformly newline free.
	291
	292	The context diff format normally has a header for filenames and modification
	293	times. Any or all of these may be specified using strings for fromfile,
[391]	294	tofile, fromfiledate, and tofiledate. The modification times are normally
	295	expressed in the ISO 8601 format. If not specified, the
[2]	296	strings default to blanks.
	297
	298	>>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
	299	>>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
	300	>>> for line in unified_diff(s1, s2, fromfile='before.py', tofile='after.py'):
	301	... sys.stdout.write(line) # doctest: +NORMALIZE_WHITESPACE
	302	--- before.py
	303	+++ after.py
	304	@@ -1,4 +1,4 @@
	305	-bacon
	306	-eggs
	307	-ham
	308	+python
	309	+eggy
	310	+hamster
	311	guido
	312
	313	See :ref:`difflib-interface` for a more detailed example.
	314
	315	.. versionadded:: 2.3
	316
	317
	318	.. function:: IS_LINE_JUNK(line)
	319
	320	Return true for ignorable lines. The line line is ignorable if line is
	321	blank or contains a single ``'#'``, otherwise it is not ignorable. Used as a
	322	default for parameter linejunk in :func:`ndiff` before Python 2.3.
	323
	324
	325	.. function:: IS_CHARACTER_JUNK(ch)
	326
	327	Return true for ignorable characters. The character ch is ignorable if ch
	328	is a space or tab, otherwise it is not ignorable. Used as a default for
	329	parameter charjunk in :func:`ndiff`.
	330
	331
	332	.. seealso::
	333
	334	`Pattern Matching: The Gestalt Approach <http://www.ddj.com/184407970?pgno=5>`_
	335	Discussion of a similar algorithm by John W. Ratcliff and D. E. Metzener. This
	336	was published in `Dr. Dobb's Journal <http://www.ddj.com/>`_ in July, 1988.
	337
	338
	339	.. _sequence-matcher:
	340
	341	SequenceMatcher Objects
	342	-----------------------
	343
	344	The :class:`SequenceMatcher` class has this constructor:
	345
	346
[391]	347	.. class:: SequenceMatcher(isjunk=None, a='', b='', autojunk=True)
[2]	348
	349	Optional argument isjunk must be ``None`` (the default) or a one-argument
	350	function that takes a sequence element and returns true if and only if the
	351	element is "junk" and should be ignored. Passing ``None`` for isjunk is
	352	equivalent to passing ``lambda x: 0``; in other words, no elements are ignored.
	353	For example, pass::
	354
	355	lambda x: x in " \t"
	356
	357	if you're comparing lines as sequences of characters, and don't want to synch up
	358	on blanks or hard tabs.
	359
	360	The optional arguments a and b are sequences to be compared; both default to
	361	empty strings. The elements of both sequences must be :term:`hashable`.
	362
[391]	363	The optional argument autojunk can be used to disable the automatic junk
	364	heuristic.
	365
	366	.. versionadded:: 2.7.1
	367	The autojunk parameter.
	368
[2]	369	:class:`SequenceMatcher` objects have the following methods:
	370
	371	.. method:: set_seqs(a, b)
	372
	373	Set the two sequences to be compared.
	374
	375	:class:`SequenceMatcher` computes and caches detailed information about the
	376	second sequence, so if you want to compare one sequence against many
	377	sequences, use :meth:`set_seq2` to set the commonly used sequence once and
	378	call :meth:`set_seq1` repeatedly, once for each of the other sequences.
	379
	380
	381	.. method:: set_seq1(a)
	382
	383	Set the first sequence to be compared. The second sequence to be compared
	384	is not changed.
	385
	386
	387	.. method:: set_seq2(b)
	388
	389	Set the second sequence to be compared. The first sequence to be compared
	390	is not changed.
	391
	392
	393	.. method:: find_longest_match(alo, ahi, blo, bhi)
	394
	395	Find longest matching block in ``a[alo:ahi]`` and ``b[blo:bhi]``.
	396
	397	If isjunk was omitted or ``None``, :meth:`find_longest_match` returns
	398	``(i, j, k)`` such that ``a[i:i+k]`` is equal to ``b[j:j+k]``, where ``alo
	399	<= i <= i+k <= ahi`` and ``blo <= j <= j+k <= bhi``. For all ``(i', j',
	400	k')`` meeting those conditions, the additional conditions ``k >= k'``, ``i
	401	<= i'``, and if ``i == i'``, ``j <= j'`` are also met. In other words, of
	402	all maximal matching blocks, return one that starts earliest in a, and
	403	of all those maximal matching blocks that start earliest in a, return
	404	the one that starts earliest in b.
	405
	406	>>> s = SequenceMatcher(None, " abcd", "abcd abcd")
	407	>>> s.find_longest_match(0, 5, 0, 9)
	408	Match(a=0, b=4, size=5)
	409
	410	If isjunk was provided, first the longest matching block is determined
	411	as above, but with the additional restriction that no junk element appears
	412	in the block. Then that block is extended as far as possible by matching
	413	(only) junk elements on both sides. So the resulting block never matches
	414	on junk except as identical junk happens to be adjacent to an interesting
	415	match.
	416
	417	Here's the same example as before, but considering blanks to be junk. That
	418	prevents ``' abcd'`` from matching the ``' abcd'`` at the tail end of the
	419	second sequence directly. Instead only the ``'abcd'`` can match, and
	420	matches the leftmost ``'abcd'`` in the second sequence:
	421
	422	>>> s = SequenceMatcher(lambda x: x==" ", " abcd", "abcd abcd")
	423	>>> s.find_longest_match(0, 5, 0, 9)
	424	Match(a=1, b=0, size=4)
	425
	426	If no blocks match, this returns ``(alo, blo, 0)``.
	427
	428	.. versionchanged:: 2.6
	429	This method returns a :term:`named tuple` ``Match(a, b, size)``.
	430
	431
	432	.. method:: get_matching_blocks()
	433
	434	Return list of triples describing matching subsequences. Each triple is of
	435	the form ``(i, j, n)``, and means that ``a[i:i+n] == b[j:j+n]``. The
	436	triples are monotonically increasing in i and j.
	437
	438	The last triple is a dummy, and has the value ``(len(a), len(b), 0)``. It
	439	is the only triple with ``n == 0``. If ``(i, j, n)`` and ``(i', j', n')``
	440	are adjacent triples in the list, and the second is not the last triple in
	441	the list, then ``i+n != i'`` or ``j+n != j'``; in other words, adjacent
	442	triples always describe non-adjacent equal blocks.
	443
	444	.. XXX Explain why a dummy is used!
	445
	446	.. versionchanged:: 2.5
	447	The guarantee that adjacent triples always describe non-adjacent blocks
	448	was implemented.
	449
	450	.. doctest::
	451
	452	>>> s = SequenceMatcher(None, "abxcd", "abcd")
	453	>>> s.get_matching_blocks()
	454	[Match(a=0, b=0, size=2), Match(a=3, b=2, size=2), Match(a=5, b=4, size=0)]
	455
	456
	457	.. method:: get_opcodes()
	458
	459	Return list of 5-tuples describing how to turn a into b. Each tuple is
	460	of the form ``(tag, i1, i2, j1, j2)``. The first tuple has ``i1 == j1 ==
	461	0``, and remaining tuples have i1 equal to the i2 from the preceding
	462	tuple, and, likewise, j1 equal to the previous j2.
	463
	464	The tag values are strings, with these meanings:
	465
	466	+---------------+---------------------------------------------+
	467	\| Value \| Meaning \|
	468	+===============+=============================================+
	469	\| ``'replace'`` \| ``a[i1:i2]`` should be replaced by \|
	470	\| \| ``b[j1:j2]``. \|
	471	+---------------+---------------------------------------------+
	472	\| ``'delete'`` \| ``a[i1:i2]`` should be deleted. Note that \|
	473	\| \| ``j1 == j2`` in this case. \|
	474	+---------------+---------------------------------------------+
	475	\| ``'insert'`` \| ``b[j1:j2]`` should be inserted at \|
	476	\| \| ``a[i1:i1]``. Note that ``i1 == i2`` in \|
	477	\| \| this case. \|
	478	+---------------+---------------------------------------------+
	479	\| ``'equal'`` \| ``a[i1:i2] == b[j1:j2]`` (the sub-sequences \|
	480	\| \| are equal). \|
	481	+---------------+---------------------------------------------+
	482
	483	For example:
	484
	485	>>> a = "qabxcd"
	486	>>> b = "abycdf"
	487	>>> s = SequenceMatcher(None, a, b)
	488	>>> for tag, i1, i2, j1, j2 in s.get_opcodes():
	489	... print ("%7s a[%d:%d] (%s) b[%d:%d] (%s)" %
	490	... (tag, i1, i2, a[i1:i2], j1, j2, b[j1:j2]))
	491	delete a[0:1] (q) b[0:0] ()
	492	equal a[1:3] (ab) b[0:2] (ab)
	493	replace a[3:4] (x) b[2:3] (y)
	494	equal a[4:6] (cd) b[3:5] (cd)
	495	insert a[6:6] () b[5:6] (f)
	496
	497
	498	.. method:: get_grouped_opcodes([n])
	499
	500	Return a :term:`generator` of groups with up to n lines of context.
	501
	502	Starting with the groups returned by :meth:`get_opcodes`, this method
	503	splits out smaller change clusters and eliminates intervening ranges which
	504	have no changes.
	505
	506	The groups are returned in the same format as :meth:`get_opcodes`.
	507
	508	.. versionadded:: 2.3
	509
	510
	511	.. method:: ratio()
	512
	513	Return a measure of the sequences' similarity as a float in the range [0,
	514	1].
	515
	516	Where T is the total number of elements in both sequences, and M is the
	517	number of matches, this is 2.0\*M / T. Note that this is ``1.0`` if the
	518	sequences are identical, and ``0.0`` if they have nothing in common.
	519
	520	This is expensive to compute if :meth:`get_matching_blocks` or
	521	:meth:`get_opcodes` hasn't already been called, in which case you may want
	522	to try :meth:`quick_ratio` or :meth:`real_quick_ratio` first to get an
	523	upper bound.
	524
	525
	526	.. method:: quick_ratio()
	527
	528	Return an upper bound on :meth:`ratio` relatively quickly.
	529
	530
	531	.. method:: real_quick_ratio()
	532
	533	Return an upper bound on :meth:`ratio` very quickly.
	534
	535
	536	The three methods that return the ratio of matching to total characters can give
	537	different results due to differing levels of approximation, although
	538	:meth:`quick_ratio` and :meth:`real_quick_ratio` are always at least as large as
	539	:meth:`ratio`:
	540
	541	>>> s = SequenceMatcher(None, "abcd", "bcde")
	542	>>> s.ratio()
	543	0.75
	544	>>> s.quick_ratio()
	545	0.75
	546	>>> s.real_quick_ratio()
	547	1.0
	548
	549
	550	.. _sequencematcher-examples:
	551
	552	SequenceMatcher Examples
	553	------------------------
	554
	555	This example compares two strings, considering blanks to be "junk:"
	556
	557	>>> s = SequenceMatcher(lambda x: x == " ",
	558	... "private Thread currentThread;",
	559	... "private volatile Thread currentThread;")
	560
	561	:meth:`ratio` returns a float in [0, 1], measuring the similarity of the
	562	sequences. As a rule of thumb, a :meth:`ratio` value over 0.6 means the
	563	sequences are close matches:
	564
	565	>>> print round(s.ratio(), 3)
	566	0.866
	567
	568	If you're only interested in where the sequences match,
	569	:meth:`get_matching_blocks` is handy:
	570
	571	>>> for block in s.get_matching_blocks():
	572	... print "a[%d] and b[%d] match for %d elements" % block
	573	a[0] and b[0] match for 8 elements
	574	a[8] and b[17] match for 21 elements
	575	a[29] and b[38] match for 0 elements
	576
	577	Note that the last tuple returned by :meth:`get_matching_blocks` is always a
	578	dummy, ``(len(a), len(b), 0)``, and this is the only case in which the last
	579	tuple element (number of elements matched) is ``0``.
	580
	581	If you want to know how to change the first sequence into the second, use
	582	:meth:`get_opcodes`:
	583
	584	>>> for opcode in s.get_opcodes():
	585	... print "%6s a[%d:%d] b[%d:%d]" % opcode
	586	equal a[0:8] b[0:8]
	587	insert a[8:8] b[8:17]
	588	equal a[8:29] b[17:38]
	589
	590	.. seealso::
	591
	592	* The :func:`get_close_matches` function in this module which shows how
	593	simple code building on :class:`SequenceMatcher` can be used to do useful
	594	work.
	595
	596	* `Simple version control recipe
	597	<http://code.activestate.com/recipes/576729/>`_ for a small application
	598	built with :class:`SequenceMatcher`.
	599
	600
	601	.. _differ-objects:
	602
	603	Differ Objects
	604	--------------
	605
	606	Note that :class:`Differ`\ -generated deltas make no claim to be minimal
	607	diffs. To the contrary, minimal diffs are often counter-intuitive, because they
	608	synch up anywhere possible, sometimes accidental matches 100 pages apart.
	609	Restricting synch points to contiguous matches preserves some notion of
	610	locality, at the occasional cost of producing a longer diff.
	611
	612	The :class:`Differ` class has this constructor:
	613
	614
	615	.. class:: Differ([linejunk[, charjunk]])
	616
	617	Optional keyword parameters linejunk and charjunk are for filter functions
	618	(or ``None``):
	619
	620	linejunk: A function that accepts a single string argument, and returns true
	621	if the string is junk. The default is ``None``, meaning that no line is
	622	considered junk.
	623
	624	charjunk: A function that accepts a single character argument (a string of
	625	length 1), and returns true if the character is junk. The default is ``None``,
	626	meaning that no character is considered junk.
	627
	628	:class:`Differ` objects are used (deltas generated) via a single method:
	629
	630
	631	.. method:: Differ.compare(a, b)
	632
	633	Compare two sequences of lines, and generate the delta (a sequence of lines).
	634
[391]	635	Each sequence must contain individual single-line strings ending with
	636	newlines. Such sequences can be obtained from the
	637	:meth:`~file.readlines` method of file-like objects. The delta
	638	generated also consists of newline-terminated strings, ready to be
	639	printed as-is via the :meth:`~file.writelines` method of a
	640	file-like object.
[2]	641
	642
	643	.. _differ-examples:
	644
	645	Differ Example
	646	--------------
	647
	648	This example compares two texts. First we set up the texts, sequences of
	649	individual single-line strings ending with newlines (such sequences can also be
[391]	650	obtained from the :meth:`~file.readlines` method of file-like objects):
[2]	651
	652	>>> text1 = ''' 1. Beautiful is better than ugly.
	653	... 2. Explicit is better than implicit.
	654	... 3. Simple is better than complex.
	655	... 4. Complex is better than complicated.
	656	... '''.splitlines(1)
	657	>>> len(text1)
	658	4
	659	>>> text1[0][-1]
	660	'\n'
	661	>>> text2 = ''' 1. Beautiful is better than ugly.
	662	... 3. Simple is better than complex.
	663	... 4. Complicated is better than complex.
	664	... 5. Flat is better than nested.
	665	... '''.splitlines(1)
	666
	667	Next we instantiate a Differ object:
	668
	669	>>> d = Differ()
	670
	671	Note that when instantiating a :class:`Differ` object we may pass functions to
	672	filter out line and character "junk." See the :meth:`Differ` constructor for
	673	details.
	674
	675	Finally, we compare the two:
	676
	677	>>> result = list(d.compare(text1, text2))
	678
	679	``result`` is a list of strings, so let's pretty-print it:
	680
	681	>>> from pprint import pprint
	682	>>> pprint(result)
	683	[' 1. Beautiful is better than ugly.\n',
	684	'- 2. Explicit is better than implicit.\n',
	685	'- 3. Simple is better than complex.\n',
	686	'+ 3. Simple is better than complex.\n',
	687	'? ++\n',
	688	'- 4. Complex is better than complicated.\n',
	689	'? ^ ---- ^\n',
	690	'+ 4. Complicated is better than complex.\n',
	691	'? ++++ ^ ^\n',
	692	'+ 5. Flat is better than nested.\n']
	693
	694	As a single multi-line string it looks like this:
	695
	696	>>> import sys
	697	>>> sys.stdout.writelines(result)
	698	1. Beautiful is better than ugly.
	699	- 2. Explicit is better than implicit.
	700	- 3. Simple is better than complex.
	701	+ 3. Simple is better than complex.
	702	? ++
	703	- 4. Complex is better than complicated.
	704	? ^ ---- ^
	705	+ 4. Complicated is better than complex.
	706	? ++++ ^ ^
	707	+ 5. Flat is better than nested.
	708
	709
	710	.. _difflib-interface:
	711
	712	A command-line interface to difflib
	713	-----------------------------------
	714
	715	This example shows how to use difflib to create a ``diff``-like utility.
	716	It is also contained in the Python source distribution, as
	717	:file:`Tools/scripts/diff.py`.
	718
	719	.. testcode::
	720
	721	""" Command line interface to difflib.py providing diffs in four formats:
	722
	723	* ndiff: lists every line and highlights interline changes.
	724	* context: highlights clusters of changes in a before/after format.
	725	* unified: highlights clusters of changes in an inline format.
	726	* html: generates side by side comparison with change highlights.
	727
	728	"""
	729
	730	import sys, os, time, difflib, optparse
	731
	732	def main():
	733	# Configure the option parser
	734	usage = "usage: %prog [options] fromfile tofile"
	735	parser = optparse.OptionParser(usage)
	736	parser.add_option("-c", action="store_true", default=False,
	737	help='Produce a context format diff (default)')
	738	parser.add_option("-u", action="store_true", default=False,
	739	help='Produce a unified format diff')
	740	hlp = 'Produce HTML side by side diff (can use -c and -l in conjunction)'
	741	parser.add_option("-m", action="store_true", default=False, help=hlp)
	742	parser.add_option("-n", action="store_true", default=False,
	743	help='Produce a ndiff format diff')
	744	parser.add_option("-l", "--lines", type="int", default=3,
	745	help='Set number of context lines (default 3)')
	746	(options, args) = parser.parse_args()
	747
	748	if len(args) == 0:
	749	parser.print_help()
	750	sys.exit(1)
	751	if len(args) != 2:
	752	parser.error("need to specify both a fromfile and tofile")
	753
	754	n = options.lines
	755	fromfile, tofile = args # as specified in the usage string
	756
	757	# we're passing these as arguments to the diff function
	758	fromdate = time.ctime(os.stat(fromfile).st_mtime)
	759	todate = time.ctime(os.stat(tofile).st_mtime)
	760	fromlines = open(fromfile, 'U').readlines()
	761	tolines = open(tofile, 'U').readlines()
	762
	763	if options.u:
	764	diff = difflib.unified_diff(fromlines, tolines, fromfile, tofile,
	765	fromdate, todate, n=n)
	766	elif options.n:
	767	diff = difflib.ndiff(fromlines, tolines)
	768	elif options.m:
	769	diff = difflib.HtmlDiff().make_file(fromlines, tolines, fromfile,
	770	tofile, context=options.c,
	771	numlines=n)
	772	else:
	773	diff = difflib.context_diff(fromlines, tolines, fromfile, tofile,
	774	fromdate, todate, n=n)
	775
	776	# we're using writelines because diff is a generator
	777	sys.stdout.writelines(diff)
	778
	779	if __name__ == '__main__':
	780	main()

Note: See TracBrowser for help on using the repository browser.

Context Navigation

source: python/trunk/Doc/library/difflib.rst

Download in other formats: