Context Navigation

← Previous Revision
Latest Revision
Next Revision →
Blame
Revision Log

difflib.py@ 381

Last change on this file since 381 was 2, checked in by Yuri Dario, 15 years ago
Initial import for vendor code.
Property svn:eol-style set to `native`
File size: 79.2 KB

Line
1	#! /usr/bin/env python
2
3	"""
4	Module difflib -- helpers for computing deltas between objects.
5
6	Function get_close_matches(word, possibilities, n=3, cutoff=0.6):
7	Use SequenceMatcher to return list of the best "good enough" matches.
8
9	Function context_diff(a, b):
10	For two lists of strings, return a delta in context diff format.
11
12	Function ndiff(a, b):
13	Return a delta: the difference between `a` and `b` (lists of strings).
14
15	Function restore(delta, which):
16	Return one of the two sequences that generated an ndiff delta.
17
18	Function unified_diff(a, b):
19	For two lists of strings, return a delta in unified diff format.
20
21	Class SequenceMatcher:
22	A flexible class for comparing pairs of sequences of any type.
23
24	Class Differ:
25	For producing human-readable deltas from sequences of lines of text.
26
27	Class HtmlDiff:
28	For producing HTML side by side comparison with change highlights.
29	"""
30
31	__all__ = ['get_close_matches', 'ndiff', 'restore', 'SequenceMatcher',
32	'Differ','IS_CHARACTER_JUNK', 'IS_LINE_JUNK', 'context_diff',
33	'unified_diff', 'HtmlDiff', 'Match']
34
35	import heapq
36	from collections import namedtuple as _namedtuple
37	from functools import reduce
38
39	Match = _namedtuple('Match', 'a b size')
40
41	def _calculate_ratio(matches, length):
42	if length:
43	return 2.0 * matches / length
44	return 1.0
45
46	class SequenceMatcher:
47
48	"""
49	SequenceMatcher is a flexible class for comparing pairs of sequences of
50	any type, so long as the sequence elements are hashable. The basic
51	algorithm predates, and is a little fancier than, an algorithm
52	published in the late 1980's by Ratcliff and Obershelp under the
53	hyperbolic name "gestalt pattern matching". The basic idea is to find
54	the longest contiguous matching subsequence that contains no "junk"
55	elements (R-O doesn't address junk). The same idea is then applied
56	recursively to the pieces of the sequences to the left and to the right
57	of the matching subsequence. This does not yield minimal edit
58	sequences, but does tend to yield matches that "look right" to people.
59
60	SequenceMatcher tries to compute a "human-friendly diff" between two
61	sequences. Unlike e.g. UNIX(tm) diff, the fundamental notion is the
62	longest contiguous & junk-free matching subsequence. That's what
63	catches peoples' eyes. The Windows(tm) windiff has another interesting
64	notion, pairing up elements that appear uniquely in each sequence.
65	That, and the method here, appear to yield more intuitive difference
66	reports than does diff. This method appears to be the least vulnerable
67	to synching up on blocks of "junk lines", though (like blank lines in
68	ordinary text files, or maybe "<P>" lines in HTML files). That may be
69	because this is the only method of the 3 that has a concept of
70	"junk" <wink>.
71
72	Example, comparing two strings, and considering blanks to be "junk":
73
74	>>> s = SequenceMatcher(lambda x: x == " ",
75	... "private Thread currentThread;",
76	... "private volatile Thread currentThread;")
77	>>>
78
79	.ratio() returns a float in [0, 1], measuring the "similarity" of the
80	sequences. As a rule of thumb, a .ratio() value over 0.6 means the
81	sequences are close matches:
82
83	>>> print round(s.ratio(), 3)
84	0.866
85	>>>
86
87	If you're only interested in where the sequences match,
88	.get_matching_blocks() is handy:
89
90	>>> for block in s.get_matching_blocks():
91	... print "a[%d] and b[%d] match for %d elements" % block
92	a[0] and b[0] match for 8 elements
93	a[8] and b[17] match for 21 elements
94	a[29] and b[38] match for 0 elements
95
96	Note that the last tuple returned by .get_matching_blocks() is always a
97	dummy, (len(a), len(b), 0), and this is the only case in which the last
98	tuple element (number of elements matched) is 0.
99
100	If you want to know how to change the first sequence into the second,
101	use .get_opcodes():
102
103	>>> for opcode in s.get_opcodes():
104	... print "%6s a[%d:%d] b[%d:%d]" % opcode
105	equal a[0:8] b[0:8]
106	insert a[8:8] b[8:17]
107	equal a[8:29] b[17:38]
108
109	See the Differ class for a fancy human-friendly file differencer, which
110	uses SequenceMatcher both to compare sequences of lines, and to compare
111	sequences of characters within similar (near-matching) lines.
112
113	See also function get_close_matches() in this module, which shows how
114	simple code building on SequenceMatcher can be used to do useful work.
115
116	Timing: Basic R-O is cubic time worst case and quadratic time expected
117	case. SequenceMatcher is quadratic time for the worst case and has
118	expected-case behavior dependent in a complicated way on how many
119	elements the sequences have in common; best case time is linear.
120
121	Methods:
122
123	__init__(isjunk=None, a='', b='')
124	Construct a SequenceMatcher.
125
126	set_seqs(a, b)
127	Set the two sequences to be compared.
128
129	set_seq1(a)
130	Set the first sequence to be compared.
131
132	set_seq2(b)
133	Set the second sequence to be compared.
134
135	find_longest_match(alo, ahi, blo, bhi)
136	Find longest matching block in a[alo:ahi] and b[blo:bhi].
137
138	get_matching_blocks()
139	Return list of triples describing matching subsequences.
140
141	get_opcodes()
142	Return list of 5-tuples describing how to turn a into b.
143
144	ratio()
145	Return a measure of the sequences' similarity (float in [0,1]).
146
147	quick_ratio()
148	Return an upper bound on .ratio() relatively quickly.
149
150	real_quick_ratio()
151	Return an upper bound on ratio() very quickly.
152	"""
153
154	def __init__(self, isjunk=None, a='', b=''):
155	"""Construct a SequenceMatcher.
156
157	Optional arg isjunk is None (the default), or a one-argument
158	function that takes a sequence element and returns true iff the
159	element is junk. None is equivalent to passing "lambda x: 0", i.e.
160	no elements are considered to be junk. For example, pass
161	lambda x: x in " \\t"
162	if you're comparing lines as sequences of characters, and don't
163	want to synch up on blanks or hard tabs.
164
165	Optional arg a is the first of two sequences to be compared. By
166	default, an empty string. The elements of a must be hashable. See
167	also .set_seqs() and .set_seq1().
168
169	Optional arg b is the second of two sequences to be compared. By
170	default, an empty string. The elements of b must be hashable. See
171	also .set_seqs() and .set_seq2().
172	"""
173
174	# Members:
175	# a
176	# first sequence
177	# b
178	# second sequence; differences are computed as "what do
179	# we need to do to 'a' to change it into 'b'?"
180	# b2j
181	# for x in b, b2j[x] is a list of the indices (into b)
182	# at which x appears; junk elements do not appear
183	# fullbcount
184	# for x in b, fullbcount[x] == the number of times x
185	# appears in b; only materialized if really needed (used
186	# only for computing quick_ratio())
187	# matching_blocks
188	# a list of (i, j, k) triples, where a[i:i+k] == b[j:j+k];
189	# ascending & non-overlapping in i and in j; terminated by
190	# a dummy (len(a), len(b), 0) sentinel
191	# opcodes
192	# a list of (tag, i1, i2, j1, j2) tuples, where tag is
193	# one of
194	# 'replace' a[i1:i2] should be replaced by b[j1:j2]
195	# 'delete' a[i1:i2] should be deleted
196	# 'insert' b[j1:j2] should be inserted
197	# 'equal' a[i1:i2] == b[j1:j2]
198	# isjunk
199	# a user-supplied function taking a sequence element and
200	# returning true iff the element is "junk" -- this has
201	# subtle but helpful effects on the algorithm, which I'll
202	# get around to writing up someday <0.9 wink>.
203	# DON'T USE! Only __chain_b uses this. Use isbjunk.
204	# isbjunk
205	# for x in b, isbjunk(x) == isjunk(x) but much faster;
206	# it's really the __contains__ method of a hidden dict.
207	# DOES NOT WORK for x in a!
208	# isbpopular
209	# for x in b, isbpopular(x) is true iff b is reasonably long
210	# (at least 200 elements) and x accounts for more than 1% of
211	# its elements. DOES NOT WORK for x in a!
212
213	self.isjunk = isjunk
214	self.a = self.b = None
215	self.set_seqs(a, b)
216
217	def set_seqs(self, a, b):
218	"""Set the two sequences to be compared.
219
220	>>> s = SequenceMatcher()
221	>>> s.set_seqs("abcd", "bcde")
222	>>> s.ratio()
223	0.75
224	"""
225
226	self.set_seq1(a)
227	self.set_seq2(b)
228
229	def set_seq1(self, a):
230	"""Set the first sequence to be compared.
231
232	The second sequence to be compared is not changed.
233
234	>>> s = SequenceMatcher(None, "abcd", "bcde")
235	>>> s.ratio()
236	0.75
237	>>> s.set_seq1("bcde")
238	>>> s.ratio()
239	1.0
240	>>>
241
242	SequenceMatcher computes and caches detailed information about the
243	second sequence, so if you want to compare one sequence S against
244	many sequences, use .set_seq2(S) once and call .set_seq1(x)
245	repeatedly for each of the other sequences.
246
247	See also set_seqs() and set_seq2().
248	"""
249
250	if a is self.a:
251	return
252	self.a = a
253	self.matching_blocks = self.opcodes = None
254
255	def set_seq2(self, b):
256	"""Set the second sequence to be compared.
257
258	The first sequence to be compared is not changed.
259
260	>>> s = SequenceMatcher(None, "abcd", "bcde")
261	>>> s.ratio()
262	0.75
263	>>> s.set_seq2("abcd")
264	>>> s.ratio()
265	1.0
266	>>>
267
268	SequenceMatcher computes and caches detailed information about the
269	second sequence, so if you want to compare one sequence S against
270	many sequences, use .set_seq2(S) once and call .set_seq1(x)
271	repeatedly for each of the other sequences.
272
273	See also set_seqs() and set_seq1().
274	"""
275
276	if b is self.b:
277	return
278	self.b = b
279	self.matching_blocks = self.opcodes = None
280	self.fullbcount = None
281	self.__chain_b()
282
283	# For each element x in b, set b2j[x] to a list of the indices in
284	# b where x appears; the indices are in increasing order; note that
285	# the number of times x appears in b is len(b2j[x]) ...
286	# when self.isjunk is defined, junk elements don't show up in this
287	# map at all, which stops the central find_longest_match method
288	# from starting any matching block at a junk element ...
289	# also creates the fast isbjunk function ...
290	# b2j also does not contain entries for "popular" elements, meaning
291	# elements that account for more than 1% of the total elements, and
292	# when the sequence is reasonably large (>= 200 elements); this can
293	# be viewed as an adaptive notion of semi-junk, and yields an enormous
294	# speedup when, e.g., comparing program files with hundreds of
295	# instances of "return NULL;" ...
296	# note that this is only called when b changes; so for cross-product
297	# kinds of matches, it's best to call set_seq2 once, then set_seq1
298	# repeatedly
299
300	def __chain_b(self):
301	# Because isjunk is a user-defined (not C) function, and we test
302	# for junk a LOT, it's important to minimize the number of calls.
303	# Before the tricks described here, __chain_b was by far the most
304	# time-consuming routine in the whole module! If anyone sees
305	# Jim Roskind, thank him again for profile.py -- I never would
306	# have guessed that.
307	# The first trick is to build b2j ignoring the possibility
308	# of junk. I.e., we don't call isjunk at all yet. Throwing
309	# out the junk later is much cheaper than building b2j "right"
310	# from the start.
311	b = self.b
312	n = len(b)
313	self.b2j = b2j = {}
314	populardict = {}
315	for i, elt in enumerate(b):
316	if elt in b2j:
317	indices = b2j[elt]
318	if n >= 200 and len(indices) * 100 > n:
319	populardict[elt] = 1
320	del indices[:]
321	else:
322	indices.append(i)
323	else:
324	b2j[elt] = [i]
325
326	# Purge leftover indices for popular elements.
327	for elt in populardict:
328	del b2j[elt]
329
330	# Now b2j.keys() contains elements uniquely, and especially when
331	# the sequence is a string, that's usually a good deal smaller
332	# than len(string). The difference is the number of isjunk calls
333	# saved.
334	isjunk = self.isjunk
335	junkdict = {}
336	if isjunk:
337	for d in populardict, b2j:
338	for elt in d.keys():
339	if isjunk(elt):
340	junkdict[elt] = 1
341	del d[elt]
342
343	# Now for x in b, isjunk(x) == x in junkdict, but the
344	# latter is much faster. Note too that while there may be a
345	# lot of junk in the sequence, the number of unique junk
346	# elements is probably small. So the memory burden of keeping
347	# this dict alive is likely trivial compared to the size of b2j.
348	self.isbjunk = junkdict.__contains__
349	self.isbpopular = populardict.__contains__
350
351	def find_longest_match(self, alo, ahi, blo, bhi):
352	"""Find longest matching block in a[alo:ahi] and b[blo:bhi].
353
354	If isjunk is not defined:
355
356	Return (i,j,k) such that a[i:i+k] is equal to b[j:j+k], where
357	alo <= i <= i+k <= ahi
358	blo <= j <= j+k <= bhi
359	and for all (i',j',k') meeting those conditions,
360	k >= k'
361	i <= i'
362	and if i == i', j <= j'
363
364	In other words, of all maximal matching blocks, return one that
365	starts earliest in a, and of all those maximal matching blocks that
366	start earliest in a, return the one that starts earliest in b.
367
368	>>> s = SequenceMatcher(None, " abcd", "abcd abcd")
369	>>> s.find_longest_match(0, 5, 0, 9)
370	Match(a=0, b=4, size=5)
371
372	If isjunk is defined, first the longest matching block is
373	determined as above, but with the additional restriction that no
374	junk element appears in the block. Then that block is extended as
375	far as possible by matching (only) junk elements on both sides. So
376	the resulting block never matches on junk except as identical junk
377	happens to be adjacent to an "interesting" match.
378
379	Here's the same example as before, but considering blanks to be
380	junk. That prevents " abcd" from matching the " abcd" at the tail
381	end of the second sequence directly. Instead only the "abcd" can
382	match, and matches the leftmost "abcd" in the second sequence:
383
384	>>> s = SequenceMatcher(lambda x: x==" ", " abcd", "abcd abcd")
385	>>> s.find_longest_match(0, 5, 0, 9)
386	Match(a=1, b=0, size=4)
387
388	If no blocks match, return (alo, blo, 0).
389
390	>>> s = SequenceMatcher(None, "ab", "c")
391	>>> s.find_longest_match(0, 2, 0, 1)
392	Match(a=0, b=0, size=0)
393	"""
394
395	# CAUTION: stripping common prefix or suffix would be incorrect.
396	# E.g.,
397	# ab
398	# acab
399	# Longest matching block is "ab", but if common prefix is
400	# stripped, it's "a" (tied with "b"). UNIX(tm) diff does so
401	# strip, so ends up claiming that ab is changed to acab by
402	# inserting "ca" in the middle. That's minimal but unintuitive:
403	# "it's obvious" that someone inserted "ac" at the front.
404	# Windiff ends up at the same place as diff, but by pairing up
405	# the unique 'b's and then matching the first two 'a's.
406
407	a, b, b2j, isbjunk = self.a, self.b, self.b2j, self.isbjunk
408	besti, bestj, bestsize = alo, blo, 0
409	# find longest junk-free match
410	# during an iteration of the loop, j2len[j] = length of longest
411	# junk-free match ending with a[i-1] and b[j]
412	j2len = {}
413	nothing = []
414	for i in xrange(alo, ahi):
415	# look at all instances of a[i] in b; note that because
416	# b2j has no junk keys, the loop is skipped if a[i] is junk
417	j2lenget = j2len.get
418	newj2len = {}
419	for j in b2j.get(a[i], nothing):
420	# a[i] matches b[j]
421	if j < blo:
422	continue
423	if j >= bhi:
424	break
425	k = newj2len[j] = j2lenget(j-1, 0) + 1
426	if k > bestsize:
427	besti, bestj, bestsize = i-k+1, j-k+1, k
428	j2len = newj2len
429
430	# Extend the best by non-junk elements on each end. In particular,
431	# "popular" non-junk elements aren't in b2j, which greatly speeds
432	# the inner loop above, but also means "the best" match so far
433	# doesn't contain any junk or popular non-junk elements.
434	while besti > alo and bestj > blo and \
435	not isbjunk(b[bestj-1]) and \
436	a[besti-1] == b[bestj-1]:
437	besti, bestj, bestsize = besti-1, bestj-1, bestsize+1
438	while besti+bestsize < ahi and bestj+bestsize < bhi and \
439	not isbjunk(b[bestj+bestsize]) and \
440	a[besti+bestsize] == b[bestj+bestsize]:
441	bestsize += 1
442
443	# Now that we have a wholly interesting match (albeit possibly
444	# empty!), we may as well suck up the matching junk on each
445	# side of it too. Can't think of a good reason not to, and it
446	# saves post-processing the (possibly considerable) expense of
447	# figuring out what to do with it. In the case of an empty
448	# interesting match, this is clearly the right thing to do,
449	# because no other kind of match is possible in the regions.
450	while besti > alo and bestj > blo and \
451	isbjunk(b[bestj-1]) and \
452	a[besti-1] == b[bestj-1]:
453	besti, bestj, bestsize = besti-1, bestj-1, bestsize+1
454	while besti+bestsize < ahi and bestj+bestsize < bhi and \
455	isbjunk(b[bestj+bestsize]) and \
456	a[besti+bestsize] == b[bestj+bestsize]:
457	bestsize = bestsize + 1
458
459	return Match(besti, bestj, bestsize)
460
461	def get_matching_blocks(self):
462	"""Return list of triples describing matching subsequences.
463
464	Each triple is of the form (i, j, n), and means that
465	a[i:i+n] == b[j:j+n]. The triples are monotonically increasing in
466	i and in j. New in Python 2.5, it's also guaranteed that if
467	(i, j, n) and (i', j', n') are adjacent triples in the list, and
468	the second is not the last triple in the list, then i+n != i' or
469	j+n != j'. IOW, adjacent triples never describe adjacent equal
470	blocks.
471
472	The last triple is a dummy, (len(a), len(b), 0), and is the only
473	triple with n==0.
474
475	>>> s = SequenceMatcher(None, "abxcd", "abcd")
476	>>> s.get_matching_blocks()
477	[Match(a=0, b=0, size=2), Match(a=3, b=2, size=2), Match(a=5, b=4, size=0)]
478	"""
479
480	if self.matching_blocks is not None:
481	return self.matching_blocks
482	la, lb = len(self.a), len(self.b)
483
484	# This is most naturally expressed as a recursive algorithm, but
485	# at least one user bumped into extreme use cases that exceeded
486	# the recursion limit on their box. So, now we maintain a list
487	# ('queue`) of blocks we still need to look at, and append partial
488	# results to `matching_blocks` in a loop; the matches are sorted
489	# at the end.
490	queue = [(0, la, 0, lb)]
491	matching_blocks = []
492	while queue:
493	alo, ahi, blo, bhi = queue.pop()
494	i, j, k = x = self.find_longest_match(alo, ahi, blo, bhi)
495	# a[alo:i] vs b[blo:j] unknown
496	# a[i:i+k] same as b[j:j+k]
497	# a[i+k:ahi] vs b[j+k:bhi] unknown
498	if k: # if k is 0, there was no matching block
499	matching_blocks.append(x)
500	if alo < i and blo < j:
501	queue.append((alo, i, blo, j))
502	if i+k < ahi and j+k < bhi:
503	queue.append((i+k, ahi, j+k, bhi))
504	matching_blocks.sort()
505
506	# It's possible that we have adjacent equal blocks in the
507	# matching_blocks list now. Starting with 2.5, this code was added
508	# to collapse them.
509	i1 = j1 = k1 = 0
510	non_adjacent = []
511	for i2, j2, k2 in matching_blocks:
512	# Is this block adjacent to i1, j1, k1?
513	if i1 + k1 == i2 and j1 + k1 == j2:
514	# Yes, so collapse them -- this just increases the length of
515	# the first block by the length of the second, and the first
516	# block so lengthened remains the block to compare against.
517	k1 += k2
518	else:
519	# Not adjacent. Remember the first block (k1==0 means it's
520	# the dummy we started with), and make the second block the
521	# new block to compare against.
522	if k1:
523	non_adjacent.append((i1, j1, k1))
524	i1, j1, k1 = i2, j2, k2
525	if k1:
526	non_adjacent.append((i1, j1, k1))
527
528	non_adjacent.append( (la, lb, 0) )
529	self.matching_blocks = non_adjacent
530	return map(Match._make, self.matching_blocks)
531
532	def get_opcodes(self):
533	"""Return list of 5-tuples describing how to turn a into b.
534
535	Each tuple is of the form (tag, i1, i2, j1, j2). The first tuple
536	has i1 == j1 == 0, and remaining tuples have i1 == the i2 from the
537	tuple preceding it, and likewise for j1 == the previous j2.
538
539	The tags are strings, with these meanings:
540
541	'replace': a[i1:i2] should be replaced by b[j1:j2]
542	'delete': a[i1:i2] should be deleted.
543	Note that j1==j2 in this case.
544	'insert': b[j1:j2] should be inserted at a[i1:i1].
545	Note that i1==i2 in this case.
546	'equal': a[i1:i2] == b[j1:j2]
547
548	>>> a = "qabxcd"
549	>>> b = "abycdf"
550	>>> s = SequenceMatcher(None, a, b)
551	>>> for tag, i1, i2, j1, j2 in s.get_opcodes():
552	... print ("%7s a[%d:%d] (%s) b[%d:%d] (%s)" %
553	... (tag, i1, i2, a[i1:i2], j1, j2, b[j1:j2]))
554	delete a[0:1] (q) b[0:0] ()
555	equal a[1:3] (ab) b[0:2] (ab)
556	replace a[3:4] (x) b[2:3] (y)
557	equal a[4:6] (cd) b[3:5] (cd)
558	insert a[6:6] () b[5:6] (f)
559	"""
560
561	if self.opcodes is not None:
562	return self.opcodes
563	i = j = 0
564	self.opcodes = answer = []
565	for ai, bj, size in self.get_matching_blocks():
566	# invariant: we've pumped out correct diffs to change
567	# a[:i] into b[:j], and the next matching block is
568	# a[ai:ai+size] == b[bj:bj+size]. So we need to pump
569	# out a diff to change a[i:ai] into b[j:bj], pump out
570	# the matching block, and move (i,j) beyond the match
571	tag = ''
572	if i < ai and j < bj:
573	tag = 'replace'
574	elif i < ai:
575	tag = 'delete'
576	elif j < bj:
577	tag = 'insert'
578	if tag:
579	answer.append( (tag, i, ai, j, bj) )
580	i, j = ai+size, bj+size
581	# the list of matching blocks is terminated by a
582	# sentinel with size 0
583	if size:
584	answer.append( ('equal', ai, i, bj, j) )
585	return answer
586
587	def get_grouped_opcodes(self, n=3):
588	""" Isolate change clusters by eliminating ranges with no changes.
589
590	Return a generator of groups with upto n lines of context.
591	Each group is in the same format as returned by get_opcodes().
592
593	>>> from pprint import pprint
594	>>> a = map(str, range(1,40))
595	>>> b = a[:]
596	>>> b[8:8] = ['i'] # Make an insertion
597	>>> b[20] += 'x' # Make a replacement
598	>>> b[23:28] = [] # Make a deletion
599	>>> b[30] += 'y' # Make another replacement
600	>>> pprint(list(SequenceMatcher(None,a,b).get_grouped_opcodes()))
601	[[('equal', 5, 8, 5, 8), ('insert', 8, 8, 8, 9), ('equal', 8, 11, 9, 12)],
602	[('equal', 16, 19, 17, 20),
603	('replace', 19, 20, 20, 21),
604	('equal', 20, 22, 21, 23),
605	('delete', 22, 27, 23, 23),
606	('equal', 27, 30, 23, 26)],
607	[('equal', 31, 34, 27, 30),
608	('replace', 34, 35, 30, 31),
609	('equal', 35, 38, 31, 34)]]
610	"""
611
612	codes = self.get_opcodes()
613	if not codes:
614	codes = [("equal", 0, 1, 0, 1)]
615	# Fixup leading and trailing groups if they show no changes.
616	if codes[0][0] == 'equal':
617	tag, i1, i2, j1, j2 = codes[0]
618	codes[0] = tag, max(i1, i2-n), i2, max(j1, j2-n), j2
619	if codes[-1][0] == 'equal':
620	tag, i1, i2, j1, j2 = codes[-1]
621	codes[-1] = tag, i1, min(i2, i1+n), j1, min(j2, j1+n)
622
623	nn = n + n
624	group = []
625	for tag, i1, i2, j1, j2 in codes:
626	# End the current group and start a new one whenever
627	# there is a large range with no changes.
628	if tag == 'equal' and i2-i1 > nn:
629	group.append((tag, i1, min(i2, i1+n), j1, min(j2, j1+n)))
630	yield group
631	group = []
632	i1, j1 = max(i1, i2-n), max(j1, j2-n)
633	group.append((tag, i1, i2, j1 ,j2))
634	if group and not (len(group)==1 and group[0][0] == 'equal'):
635	yield group
636
637	def ratio(self):
638	"""Return a measure of the sequences' similarity (float in [0,1]).
639
640	Where T is the total number of elements in both sequences, and
641	M is the number of matches, this is 2.0*M / T.
642	Note that this is 1 if the sequences are identical, and 0 if
643	they have nothing in common.
644
645	.ratio() is expensive to compute if you haven't already computed
646	.get_matching_blocks() or .get_opcodes(), in which case you may
647	want to try .quick_ratio() or .real_quick_ratio() first to get an
648	upper bound.
649
650	>>> s = SequenceMatcher(None, "abcd", "bcde")
651	>>> s.ratio()
652	0.75
653	>>> s.quick_ratio()
654	0.75
655	>>> s.real_quick_ratio()
656	1.0
657	"""
658
659	matches = reduce(lambda sum, triple: sum + triple[-1],
660	self.get_matching_blocks(), 0)
661	return _calculate_ratio(matches, len(self.a) + len(self.b))
662
663	def quick_ratio(self):
664	"""Return an upper bound on ratio() relatively quickly.
665
666	This isn't defined beyond that it is an upper bound on .ratio(), and
667	is faster to compute.
668	"""
669
670	# viewing a and b as multisets, set matches to the cardinality
671	# of their intersection; this counts the number of matches
672	# without regard to order, so is clearly an upper bound
673	if self.fullbcount is None:
674	self.fullbcount = fullbcount = {}
675	for elt in self.b:
676	fullbcount[elt] = fullbcount.get(elt, 0) + 1
677	fullbcount = self.fullbcount
678	# avail[x] is the number of times x appears in 'b' less the
679	# number of times we've seen it in 'a' so far ... kinda
680	avail = {}
681	availhas, matches = avail.__contains__, 0
682	for elt in self.a:
683	if availhas(elt):
684	numb = avail[elt]
685	else:
686	numb = fullbcount.get(elt, 0)
687	avail[elt] = numb - 1
688	if numb > 0:
689	matches = matches + 1
690	return _calculate_ratio(matches, len(self.a) + len(self.b))
691
692	def real_quick_ratio(self):
693	"""Return an upper bound on ratio() very quickly.
694
695	This isn't defined beyond that it is an upper bound on .ratio(), and
696	is faster to compute than either .ratio() or .quick_ratio().
697	"""
698
699	la, lb = len(self.a), len(self.b)
700	# can't have more matches than the number of elements in the
701	# shorter sequence
702	return _calculate_ratio(min(la, lb), la + lb)
703
704	def get_close_matches(word, possibilities, n=3, cutoff=0.6):
705	"""Use SequenceMatcher to return list of the best "good enough" matches.
706
707	word is a sequence for which close matches are desired (typically a
708	string).
709
710	possibilities is a list of sequences against which to match word
711	(typically a list of strings).
712
713	Optional arg n (default 3) is the maximum number of close matches to
714	return. n must be > 0.
715
716	Optional arg cutoff (default 0.6) is a float in [0, 1]. Possibilities
717	that don't score at least that similar to word are ignored.
718
719	The best (no more than n) matches among the possibilities are returned
720	in a list, sorted by similarity score, most similar first.
721
722	>>> get_close_matches("appel", ["ape", "apple", "peach", "puppy"])
723	['apple', 'ape']
724	>>> import keyword as _keyword
725	>>> get_close_matches("wheel", _keyword.kwlist)
726	['while']
727	>>> get_close_matches("apple", _keyword.kwlist)
728	[]
729	>>> get_close_matches("accept", _keyword.kwlist)
730	['except']
731	"""
732
733	if not n > 0:
734	raise ValueError("n must be > 0: %r" % (n,))
735	if not 0.0 <= cutoff <= 1.0:
736	raise ValueError("cutoff must be in [0.0, 1.0]: %r" % (cutoff,))
737	result = []
738	s = SequenceMatcher()
739	s.set_seq2(word)
740	for x in possibilities:
741	s.set_seq1(x)
742	if s.real_quick_ratio() >= cutoff and \
743	s.quick_ratio() >= cutoff and \
744	s.ratio() >= cutoff:
745	result.append((s.ratio(), x))
746
747	# Move the best scorers to head of list
748	result = heapq.nlargest(n, result)
749	# Strip scores for the best n matches
750	return [x for score, x in result]
751
752	def _count_leading(line, ch):
753	"""
754	Return number of `ch` characters at the start of `line`.
755
756	Example:
757
758	>>> _count_leading(' abc', ' ')
759	3
760	"""
761
762	i, n = 0, len(line)
763	while i < n and line[i] == ch:
764	i += 1
765	return i
766
767	class Differ:
768	r"""
769	Differ is a class for comparing sequences of lines of text, and
770	producing human-readable differences or deltas. Differ uses
771	SequenceMatcher both to compare sequences of lines, and to compare
772	sequences of characters within similar (near-matching) lines.
773
774	Each line of a Differ delta begins with a two-letter code:
775
776	'- ' line unique to sequence 1
777	'+ ' line unique to sequence 2
778	' ' line common to both sequences
779	'? ' line not present in either input sequence
780
781	Lines beginning with '? ' attempt to guide the eye to intraline
782	differences, and were not present in either input sequence. These lines
783	can be confusing if the sequences contain tab characters.
784
785	Note that Differ makes no claim to produce a minimal diff. To the
786	contrary, minimal diffs are often counter-intuitive, because they synch
787	up anywhere possible, sometimes accidental matches 100 pages apart.
788	Restricting synch points to contiguous matches preserves some notion of
789	locality, at the occasional cost of producing a longer diff.
790
791	Example: Comparing two texts.
792
793	First we set up the texts, sequences of individual single-line strings
794	ending with newlines (such sequences can also be obtained from the
795	`readlines()` method of file-like objects):
796
797	>>> text1 = ''' 1. Beautiful is better than ugly.
798	... 2. Explicit is better than implicit.
799	... 3. Simple is better than complex.
800	... 4. Complex is better than complicated.
801	... '''.splitlines(1)
802	>>> len(text1)
803	4
804	>>> text1[0][-1]
805	'\n'
806	>>> text2 = ''' 1. Beautiful is better than ugly.
807	... 3. Simple is better than complex.
808	... 4. Complicated is better than complex.
809	... 5. Flat is better than nested.
810	... '''.splitlines(1)
811
812	Next we instantiate a Differ object:
813
814	>>> d = Differ()
815
816	Note that when instantiating a Differ object we may pass functions to
817	filter out line and character 'junk'. See Differ.__init__ for details.
818
819	Finally, we compare the two:
820
821	>>> result = list(d.compare(text1, text2))
822
823	'result' is a list of strings, so let's pretty-print it:
824
825	>>> from pprint import pprint as _pprint
826	>>> _pprint(result)
827	[' 1. Beautiful is better than ugly.\n',
828	'- 2. Explicit is better than implicit.\n',
829	'- 3. Simple is better than complex.\n',
830	'+ 3. Simple is better than complex.\n',
831	'? ++\n',
832	'- 4. Complex is better than complicated.\n',
833	'? ^ ---- ^\n',
834	'+ 4. Complicated is better than complex.\n',
835	'? ++++ ^ ^\n',
836	'+ 5. Flat is better than nested.\n']
837
838	As a single multi-line string it looks like this:
839
840	>>> print ''.join(result),
841	1. Beautiful is better than ugly.
842	- 2. Explicit is better than implicit.
843	- 3. Simple is better than complex.
844	+ 3. Simple is better than complex.
845	? ++
846	- 4. Complex is better than complicated.
847	? ^ ---- ^
848	+ 4. Complicated is better than complex.
849	? ++++ ^ ^
850	+ 5. Flat is better than nested.
851
852	Methods:
853
854	__init__(linejunk=None, charjunk=None)
855	Construct a text differencer, with optional filters.
856
857	compare(a, b)
858	Compare two sequences of lines; generate the resulting delta.
859	"""
860
861	def __init__(self, linejunk=None, charjunk=None):
862	"""
863	Construct a text differencer, with optional filters.
864
865	The two optional keyword parameters are for filter functions:
866
867	- `linejunk`: A function that should accept a single string argument,
868	and return true iff the string is junk. The module-level function
869	`IS_LINE_JUNK` may be used to filter out lines without visible
870	characters, except for at most one splat ('#'). It is recommended
871	to leave linejunk None; as of Python 2.3, the underlying
872	SequenceMatcher class has grown an adaptive notion of "noise" lines
873	that's better than any static definition the author has ever been
874	able to craft.
875
876	- `charjunk`: A function that should accept a string of length 1. The
877	module-level function `IS_CHARACTER_JUNK` may be used to filter out
878	whitespace characters (a blank or tab; note: bad idea to include
879	newline in this!). Use of IS_CHARACTER_JUNK is recommended.
880	"""
881
882	self.linejunk = linejunk
883	self.charjunk = charjunk
884
885	def compare(self, a, b):
886	r"""
887	Compare two sequences of lines; generate the resulting delta.
888
889	Each sequence must contain individual single-line strings ending with
890	newlines. Such sequences can be obtained from the `readlines()` method
891	of file-like objects. The delta generated also consists of newline-
892	terminated strings, ready to be printed as-is via the writeline()
893	method of a file-like object.
894
895	Example:
896
897	>>> print ''.join(Differ().compare('one\ntwo\nthree\n'.splitlines(1),
898	... 'ore\ntree\nemu\n'.splitlines(1))),
899	- one
900	? ^
901	+ ore
902	? ^
903	- two
904	- three
905	? -
906	+ tree
907	+ emu
908	"""
909
910	cruncher = SequenceMatcher(self.linejunk, a, b)
911	for tag, alo, ahi, blo, bhi in cruncher.get_opcodes():
912	if tag == 'replace':
913	g = self._fancy_replace(a, alo, ahi, b, blo, bhi)
914	elif tag == 'delete':
915	g = self._dump('-', a, alo, ahi)
916	elif tag == 'insert':
917	g = self._dump('+', b, blo, bhi)
918	elif tag == 'equal':
919	g = self._dump(' ', a, alo, ahi)
920	else:
921	raise ValueError, 'unknown tag %r' % (tag,)
922
923	for line in g:
924	yield line
925
926	def _dump(self, tag, x, lo, hi):
927	"""Generate comparison results for a same-tagged range."""
928	for i in xrange(lo, hi):
929	yield '%s %s' % (tag, x[i])
930
931	def _plain_replace(self, a, alo, ahi, b, blo, bhi):
932	assert alo < ahi and blo < bhi
933	# dump the shorter block first -- reduces the burden on short-term
934	# memory if the blocks are of very different sizes
935	if bhi - blo < ahi - alo:
936	first = self._dump('+', b, blo, bhi)
937	second = self._dump('-', a, alo, ahi)
938	else:
939	first = self._dump('-', a, alo, ahi)
940	second = self._dump('+', b, blo, bhi)
941
942	for g in first, second:
943	for line in g:
944	yield line
945
946	def _fancy_replace(self, a, alo, ahi, b, blo, bhi):
947	r"""
948	When replacing one block of lines with another, search the blocks
949	for similar lines; the best-matching pair (if any) is used as a
950	synch point, and intraline difference marking is done on the
951	similar pair. Lots of work, but often worth it.
952
953	Example:
954
955	>>> d = Differ()
956	>>> results = d._fancy_replace(['abcDefghiJkl\n'], 0, 1,
957	... ['abcdefGhijkl\n'], 0, 1)
958	>>> print ''.join(results),
959	- abcDefghiJkl
960	? ^ ^ ^
961	+ abcdefGhijkl
962	? ^ ^ ^
963	"""
964
965	# don't synch up unless the lines have a similarity score of at
966	# least cutoff; best_ratio tracks the best score seen so far
967	best_ratio, cutoff = 0.74, 0.75
968	cruncher = SequenceMatcher(self.charjunk)
969	eqi, eqj = None, None # 1st indices of equal lines (if any)
970
971	# search for the pair that matches best without being identical
972	# (identical lines must be junk lines, & we don't want to synch up
973	# on junk -- unless we have to)
974	for j in xrange(blo, bhi):
975	bj = b[j]
976	cruncher.set_seq2(bj)
977	for i in xrange(alo, ahi):
978	ai = a[i]
979	if ai == bj:
980	if eqi is None:
981	eqi, eqj = i, j
982	continue
983	cruncher.set_seq1(ai)
984	# computing similarity is expensive, so use the quick
985	# upper bounds first -- have seen this speed up messy
986	# compares by a factor of 3.
987	# note that ratio() is only expensive to compute the first
988	# time it's called on a sequence pair; the expensive part
989	# of the computation is cached by cruncher
990	if cruncher.real_quick_ratio() > best_ratio and \
991	cruncher.quick_ratio() > best_ratio and \
992	cruncher.ratio() > best_ratio:
993	best_ratio, best_i, best_j = cruncher.ratio(), i, j
994	if best_ratio < cutoff:
995	# no non-identical "pretty close" pair
996	if eqi is None:
997	# no identical pair either -- treat it as a straight replace
998	for line in self._plain_replace(a, alo, ahi, b, blo, bhi):
999	yield line
1000	return
1001	# no close pair, but an identical pair -- synch up on that
1002	best_i, best_j, best_ratio = eqi, eqj, 1.0
1003	else:
1004	# there's a close pair, so forget the identical pair (if any)
1005	eqi = None
1006
1007	# a[best_i] very similar to b[best_j]; eqi is None iff they're not
1008	# identical
1009
1010	# pump out diffs from before the synch point
1011	for line in self._fancy_helper(a, alo, best_i, b, blo, best_j):
1012	yield line
1013
1014	# do intraline marking on the synch pair
1015	aelt, belt = a[best_i], b[best_j]
1016	if eqi is None:
1017	# pump out a '-', '?', '+', '?' quad for the synched lines
1018	atags = btags = ""
1019	cruncher.set_seqs(aelt, belt)
1020	for tag, ai1, ai2, bj1, bj2 in cruncher.get_opcodes():
1021	la, lb = ai2 - ai1, bj2 - bj1
1022	if tag == 'replace':
1023	atags += '^' * la
1024	btags += '^' * lb
1025	elif tag == 'delete':
1026	atags += '-' * la
1027	elif tag == 'insert':
1028	btags += '+' * lb
1029	elif tag == 'equal':
1030	atags += ' ' * la
1031	btags += ' ' * lb
1032	else:
1033	raise ValueError, 'unknown tag %r' % (tag,)
1034	for line in self._qformat(aelt, belt, atags, btags):
1035	yield line
1036	else:
1037	# the synch pair is identical
1038	yield ' ' + aelt
1039
1040	# pump out diffs from after the synch point
1041	for line in self._fancy_helper(a, best_i+1, ahi, b, best_j+1, bhi):
1042	yield line
1043
1044	def _fancy_helper(self, a, alo, ahi, b, blo, bhi):
1045	g = []
1046	if alo < ahi:
1047	if blo < bhi:
1048	g = self._fancy_replace(a, alo, ahi, b, blo, bhi)
1049	else:
1050	g = self._dump('-', a, alo, ahi)
1051	elif blo < bhi:
1052	g = self._dump('+', b, blo, bhi)
1053
1054	for line in g:
1055	yield line
1056
1057	def _qformat(self, aline, bline, atags, btags):
1058	r"""
1059	Format "?" output and deal with leading tabs.
1060
1061	Example:
1062
1063	>>> d = Differ()
1064	>>> results = d._qformat('\tabcDefghiJkl\n', '\tabcdefGhijkl\n',
1065	... ' ^ ^ ^ ', ' ^ ^ ^ ')
1066	>>> for line in results: print repr(line)
1067	...
1068	'- \tabcDefghiJkl\n'
1069	'? \t ^ ^ ^\n'
1070	'+ \tabcdefGhijkl\n'
1071	'? \t ^ ^ ^\n'
1072	"""
1073
1074	# Can hurt, but will probably help most of the time.
1075	common = min(_count_leading(aline, "\t"),
1076	_count_leading(bline, "\t"))
1077	common = min(common, _count_leading(atags[:common], " "))
1078	common = min(common, _count_leading(btags[:common], " "))
1079	atags = atags[common:].rstrip()
1080	btags = btags[common:].rstrip()
1081
1082	yield "- " + aline
1083	if atags:
1084	yield "? %s%s\n" % ("\t" * common, atags)
1085
1086	yield "+ " + bline
1087	if btags:
1088	yield "? %s%s\n" % ("\t" * common, btags)
1089
1090	# With respect to junk, an earlier version of ndiff simply refused to
1091	# start a match with a junk element. The result was cases like this:
1092	# before: private Thread currentThread;
1093	# after: private volatile Thread currentThread;
1094	# If you consider whitespace to be junk, the longest contiguous match
1095	# not starting with junk is "e Thread currentThread". So ndiff reported
1096	# that "e volatil" was inserted between the 't' and the 'e' in "private".
1097	# While an accurate view, to people that's absurd. The current version
1098	# looks for matching blocks that are entirely junk-free, then extends the
1099	# longest one of those as far as possible but only with matching junk.
1100	# So now "currentThread" is matched, then extended to suck up the
1101	# preceding blank; then "private" is matched, and extended to suck up the
1102	# following blank; then "Thread" is matched; and finally ndiff reports
1103	# that "volatile " was inserted before "Thread". The only quibble
1104	# remaining is that perhaps it was really the case that " volatile"
1105	# was inserted after "private". I can live with that <wink>.
1106
1107	import re
1108
1109	def IS_LINE_JUNK(line, pat=re.compile(r"\s#?\s$").match):
1110	r"""
1111	Return 1 for ignorable line: iff `line` is blank or contains a single '#'.
1112
1113	Examples:
1114
1115	>>> IS_LINE_JUNK('\n')
1116	True
1117	>>> IS_LINE_JUNK(' # \n')
1118	True
1119	>>> IS_LINE_JUNK('hello\n')
1120	False
1121	"""
1122
1123	return pat(line) is not None
1124
1125	def IS_CHARACTER_JUNK(ch, ws=" \t"):
1126	r"""
1127	Return 1 for ignorable character: iff `ch` is a space or tab.
1128
1129	Examples:
1130
1131	>>> IS_CHARACTER_JUNK(' ')
1132	True
1133	>>> IS_CHARACTER_JUNK('\t')
1134	True
1135	>>> IS_CHARACTER_JUNK('\n')
1136	False
1137	>>> IS_CHARACTER_JUNK('x')
1138	False
1139	"""
1140
1141	return ch in ws
1142
1143
1144	def unified_diff(a, b, fromfile='', tofile='', fromfiledate='',
1145	tofiledate='', n=3, lineterm='\n'):
1146	r"""
1147	Compare two sequences of lines; generate the delta as a unified diff.
1148
1149	Unified diffs are a compact way of showing line changes and a few
1150	lines of context. The number of context lines is set by 'n' which
1151	defaults to three.
1152
1153	By default, the diff control lines (those with ---, +++, or @@) are
1154	created with a trailing newline. This is helpful so that inputs
1155	created from file.readlines() result in diffs that are suitable for
1156	file.writelines() since both the inputs and outputs have trailing
1157	newlines.
1158
1159	For inputs that do not have trailing newlines, set the lineterm
1160	argument to "" so that the output will be uniformly newline free.
1161
1162	The unidiff format normally has a header for filenames and modification
1163	times. Any or all of these may be specified using strings for
1164	'fromfile', 'tofile', 'fromfiledate', and 'tofiledate'. The modification
1165	times are normally expressed in the format returned by time.ctime().
1166
1167	Example:
1168
1169	>>> for line in unified_diff('one two three four'.split(),
1170	... 'zero one tree four'.split(), 'Original', 'Current',
1171	... 'Sat Jan 26 23:30:50 1991', 'Fri Jun 06 10:20:52 2003',
1172	... lineterm=''):
1173	... print line
1174	--- Original Sat Jan 26 23:30:50 1991
1175	+++ Current Fri Jun 06 10:20:52 2003
1176	@@ -1,4 +1,4 @@
1177	+zero
1178	one
1179	-two
1180	-three
1181	+tree
1182	four
1183	"""
1184
1185	started = False
1186	for group in SequenceMatcher(None,a,b).get_grouped_opcodes(n):
1187	if not started:
1188	yield '--- %s %s%s' % (fromfile, fromfiledate, lineterm)
1189	yield '+++ %s %s%s' % (tofile, tofiledate, lineterm)
1190	started = True
1191	i1, i2, j1, j2 = group[0][1], group[-1][2], group[0][3], group[-1][4]
1192	yield "@@ -%d,%d +%d,%d @@%s" % (i1+1, i2-i1, j1+1, j2-j1, lineterm)
1193	for tag, i1, i2, j1, j2 in group:
1194	if tag == 'equal':
1195	for line in a[i1:i2]:
1196	yield ' ' + line
1197	continue
1198	if tag == 'replace' or tag == 'delete':
1199	for line in a[i1:i2]:
1200	yield '-' + line
1201	if tag == 'replace' or tag == 'insert':
1202	for line in b[j1:j2]:
1203	yield '+' + line
1204
1205	# See http://www.unix.org/single_unix_specification/
1206	def context_diff(a, b, fromfile='', tofile='',
1207	fromfiledate='', tofiledate='', n=3, lineterm='\n'):
1208	r"""
1209	Compare two sequences of lines; generate the delta as a context diff.
1210
1211	Context diffs are a compact way of showing line changes and a few
1212	lines of context. The number of context lines is set by 'n' which
1213	defaults to three.
1214
1215	By default, the diff control lines (those with *** or ---) are
1216	created with a trailing newline. This is helpful so that inputs
1217	created from file.readlines() result in diffs that are suitable for
1218	file.writelines() since both the inputs and outputs have trailing
1219	newlines.
1220
1221	For inputs that do not have trailing newlines, set the lineterm
1222	argument to "" so that the output will be uniformly newline free.
1223
1224	The context diff format normally has a header for filenames and
1225	modification times. Any or all of these may be specified using
1226	strings for 'fromfile', 'tofile', 'fromfiledate', and 'tofiledate'.
1227	The modification times are normally expressed in the format returned
1228	by time.ctime(). If not specified, the strings default to blanks.
1229
1230	Example:
1231
1232	>>> print ''.join(context_diff('one\ntwo\nthree\nfour\n'.splitlines(1),
1233	... 'zero\none\ntree\nfour\n'.splitlines(1), 'Original', 'Current',
1234	... 'Sat Jan 26 23:30:50 1991', 'Fri Jun 06 10:22:46 2003')),
1235	*** Original Sat Jan 26 23:30:50 1991
1236	--- Current Fri Jun 06 10:22:46 2003
1237	***************
1238	* 1,4 **
1239	one
1240	! two
1241	! three
1242	four
1243	--- 1,4 ----
1244	+ zero
1245	one
1246	! tree
1247	four
1248	"""
1249
1250	started = False
1251	prefixmap = {'insert':'+ ', 'delete':'- ', 'replace':'! ', 'equal':' '}
1252	for group in SequenceMatcher(None,a,b).get_grouped_opcodes(n):
1253	if not started:
1254	yield '*** %s %s%s' % (fromfile, fromfiledate, lineterm)
1255	yield '--- %s %s%s' % (tofile, tofiledate, lineterm)
1256	started = True
1257
1258	yield '***************%s' % (lineterm,)
1259	if group[-1][2] - group[0][1] >= 2:
1260	yield '* %d,%d **%s' % (group[0][1]+1, group[-1][2], lineterm)
1261	else:
1262	yield '* %d **%s' % (group[-1][2], lineterm)
1263	visiblechanges = [e for e in group if e[0] in ('replace', 'delete')]
1264	if visiblechanges:
1265	for tag, i1, i2, _, _ in group:
1266	if tag != 'insert':
1267	for line in a[i1:i2]:
1268	yield prefixmap[tag] + line
1269
1270	if group[-1][4] - group[0][3] >= 2:
1271	yield '--- %d,%d ----%s' % (group[0][3]+1, group[-1][4], lineterm)
1272	else:
1273	yield '--- %d ----%s' % (group[-1][4], lineterm)
1274	visiblechanges = [e for e in group if e[0] in ('replace', 'insert')]
1275	if visiblechanges:
1276	for tag, _, _, j1, j2 in group:
1277	if tag != 'delete':
1278	for line in b[j1:j2]:
1279	yield prefixmap[tag] + line
1280
1281	def ndiff(a, b, linejunk=None, charjunk=IS_CHARACTER_JUNK):
1282	r"""
1283	Compare `a` and `b` (lists of strings); return a `Differ`-style delta.
1284
1285	Optional keyword parameters `linejunk` and `charjunk` are for filter
1286	functions (or None):
1287
1288	- linejunk: A function that should accept a single string argument, and
1289	return true iff the string is junk. The default is None, and is
1290	recommended; as of Python 2.3, an adaptive notion of "noise" lines is
1291	used that does a good job on its own.
1292
1293	- charjunk: A function that should accept a string of length 1. The
1294	default is module-level function IS_CHARACTER_JUNK, which filters out
1295	whitespace characters (a blank or tab; note: bad idea to include newline
1296	in this!).
1297
1298	Tools/scripts/ndiff.py is a command-line front-end to this function.
1299
1300	Example:
1301
1302	>>> diff = ndiff('one\ntwo\nthree\n'.splitlines(1),
1303	... 'ore\ntree\nemu\n'.splitlines(1))
1304	>>> print ''.join(diff),
1305	- one
1306	? ^
1307	+ ore
1308	? ^
1309	- two
1310	- three
1311	? -
1312	+ tree
1313	+ emu
1314	"""
1315	return Differ(linejunk, charjunk).compare(a, b)
1316
1317	def _mdiff(fromlines, tolines, context=None, linejunk=None,
1318	charjunk=IS_CHARACTER_JUNK):
1319	r"""Returns generator yielding marked up from/to side by side differences.
1320
1321	Arguments:
1322	fromlines -- list of text lines to compared to tolines
1323	tolines -- list of text lines to be compared to fromlines
1324	context -- number of context lines to display on each side of difference,
1325	if None, all from/to text lines will be generated.
1326	linejunk -- passed on to ndiff (see ndiff documentation)
1327	charjunk -- passed on to ndiff (see ndiff documentation)
1328
1329	This function returns an interator which returns a tuple:
1330	(from line tuple, to line tuple, boolean flag)
1331
1332	from/to line tuple -- (line num, line text)
1333	line num -- integer or None (to indicate a context separation)
1334	line text -- original line text with following markers inserted:
1335	'\0+' -- marks start of added text
1336	'\0-' -- marks start of deleted text
1337	'\0^' -- marks start of changed text
1338	'\1' -- marks end of added/deleted/changed text
1339
1340	boolean flag -- None indicates context separation, True indicates
1341	either "from" or "to" line contains a change, otherwise False.
1342
1343	This function/iterator was originally developed to generate side by side
1344	file difference for making HTML pages (see HtmlDiff class for example
1345	usage).
1346
1347	Note, this function utilizes the ndiff function to generate the side by
1348	side difference markup. Optional ndiff arguments may be passed to this
1349	function and they in turn will be passed to ndiff.
1350	"""
1351	import re
1352
1353	# regular expression for finding intraline change indices
1354	change_re = re.compile('(\++\|\-+\|\^+)')
1355
1356	# create the difference iterator to generate the differences
1357	diff_lines_iterator = ndiff(fromlines,tolines,linejunk,charjunk)
1358
1359	def _make_line(lines, format_key, side, num_lines=[0,0]):
1360	"""Returns line of text with user's change markup and line formatting.
1361
1362	lines -- list of lines from the ndiff generator to produce a line of
1363	text from. When producing the line of text to return, the
1364	lines used are removed from this list.
1365	format_key -- '+' return first line in list with "add" markup around
1366	the entire line.
1367	'-' return first line in list with "delete" markup around
1368	the entire line.
1369	'?' return first line in list with add/delete/change
1370	intraline markup (indices obtained from second line)
1371	None return first line in list with no markup
1372	side -- indice into the num_lines list (0=from,1=to)
1373	num_lines -- from/to current line number. This is NOT intended to be a
1374	passed parameter. It is present as a keyword argument to
1375	maintain memory of the current line numbers between calls
1376	of this function.
1377
1378	Note, this function is purposefully not defined at the module scope so
1379	that data it needs from its parent function (within whose context it
1380	is defined) does not need to be of module scope.
1381	"""
1382	num_lines[side] += 1
1383	# Handle case where no user markup is to be added, just return line of
1384	# text with user's line format to allow for usage of the line number.
1385	if format_key is None:
1386	return (num_lines[side],lines.pop(0)[2:])
1387	# Handle case of intraline changes
1388	if format_key == '?':
1389	text, markers = lines.pop(0), lines.pop(0)
1390	# find intraline changes (store change type and indices in tuples)
1391	sub_info = []
1392	def record_sub_info(match_object,sub_info=sub_info):
1393	sub_info.append([match_object.group(1)[0],match_object.span()])
1394	return match_object.group(1)
1395	change_re.sub(record_sub_info,markers)
1396	# process each tuple inserting our special marks that won't be
1397	# noticed by an xml/html escaper.
1398	for key,(begin,end) in sub_info[::-1]:
1399	text = text[0:begin]+'\0'+key+text[begin:end]+'\1'+text[end:]
1400	text = text[2:]
1401	# Handle case of add/delete entire line
1402	else:
1403	text = lines.pop(0)[2:]
1404	# if line of text is just a newline, insert a space so there is
1405	# something for the user to highlight and see.
1406	if not text:
1407	text = ' '
1408	# insert marks that won't be noticed by an xml/html escaper.
1409	text = '\0' + format_key + text + '\1'
1410	# Return line of text, first allow user's line formatter to do its
1411	# thing (such as adding the line number) then replace the special
1412	# marks with what the user's change markup.
1413	return (num_lines[side],text)
1414
1415	def _line_iterator():
1416	"""Yields from/to lines of text with a change indication.
1417
1418	This function is an iterator. It itself pulls lines from a
1419	differencing iterator, processes them and yields them. When it can
1420	it yields both a "from" and a "to" line, otherwise it will yield one
1421	or the other. In addition to yielding the lines of from/to text, a
1422	boolean flag is yielded to indicate if the text line(s) have
1423	differences in them.
1424
1425	Note, this function is purposefully not defined at the module scope so
1426	that data it needs from its parent function (within whose context it
1427	is defined) does not need to be of module scope.
1428	"""
1429	lines = []
1430	num_blanks_pending, num_blanks_to_yield = 0, 0
1431	while True:
1432	# Load up next 4 lines so we can look ahead, create strings which
1433	# are a concatenation of the first character of each of the 4 lines
1434	# so we can do some very readable comparisons.
1435	while len(lines) < 4:
1436	try:
1437	lines.append(diff_lines_iterator.next())
1438	except StopIteration:
1439	lines.append('X')
1440	s = ''.join([line[0] for line in lines])
1441	if s.startswith('X'):
1442	# When no more lines, pump out any remaining blank lines so the
1443	# corresponding add/delete lines get a matching blank line so
1444	# all line pairs get yielded at the next level.
1445	num_blanks_to_yield = num_blanks_pending
1446	elif s.startswith('-?+?'):
1447	# simple intraline change
1448	yield _make_line(lines,'?',0), _make_line(lines,'?',1), True
1449	continue
1450	elif s.startswith('--++'):
1451	# in delete block, add block coming: we do NOT want to get
1452	# caught up on blank lines yet, just process the delete line
1453	num_blanks_pending -= 1
1454	yield _make_line(lines,'-',0), None, True
1455	continue
1456	elif s.startswith(('--?+', '--+', '- ')):
1457	# in delete block and see a intraline change or unchanged line
1458	# coming: yield the delete line and then blanks
1459	from_line,to_line = _make_line(lines,'-',0), None
1460	num_blanks_to_yield,num_blanks_pending = num_blanks_pending-1,0
1461	elif s.startswith('-+?'):
1462	# intraline change
1463	yield _make_line(lines,None,0), _make_line(lines,'?',1), True
1464	continue
1465	elif s.startswith('-?+'):
1466	# intraline change
1467	yield _make_line(lines,'?',0), _make_line(lines,None,1), True
1468	continue
1469	elif s.startswith('-'):
1470	# delete FROM line
1471	num_blanks_pending -= 1
1472	yield _make_line(lines,'-',0), None, True
1473	continue
1474	elif s.startswith('+--'):
1475	# in add block, delete block coming: we do NOT want to get
1476	# caught up on blank lines yet, just process the add line
1477	num_blanks_pending += 1
1478	yield None, _make_line(lines,'+',1), True
1479	continue
1480	elif s.startswith(('+ ', '+-')):
1481	# will be leaving an add block: yield blanks then add line
1482	from_line, to_line = None, _make_line(lines,'+',1)
1483	num_blanks_to_yield,num_blanks_pending = num_blanks_pending+1,0
1484	elif s.startswith('+'):
1485	# inside an add block, yield the add line
1486	num_blanks_pending += 1
1487	yield None, _make_line(lines,'+',1), True
1488	continue
1489	elif s.startswith(' '):
1490	# unchanged text, yield it to both sides
1491	yield _make_line(lines[:],None,0),_make_line(lines,None,1),False
1492	continue
1493	# Catch up on the blank lines so when we yield the next from/to
1494	# pair, they are lined up.
1495	while(num_blanks_to_yield < 0):
1496	num_blanks_to_yield += 1
1497	yield None,('','\n'),True
1498	while(num_blanks_to_yield > 0):
1499	num_blanks_to_yield -= 1
1500	yield ('','\n'),None,True
1501	if s.startswith('X'):
1502	raise StopIteration
1503	else:
1504	yield from_line,to_line,True
1505
1506	def _line_pair_iterator():
1507	"""Yields from/to lines of text with a change indication.
1508
1509	This function is an iterator. It itself pulls lines from the line
1510	iterator. Its difference from that iterator is that this function
1511	always yields a pair of from/to text lines (with the change
1512	indication). If necessary it will collect single from/to lines
1513	until it has a matching pair from/to pair to yield.
1514
1515	Note, this function is purposefully not defined at the module scope so
1516	that data it needs from its parent function (within whose context it
1517	is defined) does not need to be of module scope.
1518	"""
1519	line_iterator = _line_iterator()
1520	fromlines,tolines=[],[]
1521	while True:
1522	# Collecting lines of text until we have a from/to pair
1523	while (len(fromlines)==0 or len(tolines)==0):
1524	from_line, to_line, found_diff =line_iterator.next()
1525	if from_line is not None:
1526	fromlines.append((from_line,found_diff))
1527	if to_line is not None:
1528	tolines.append((to_line,found_diff))
1529	# Once we have a pair, remove them from the collection and yield it
1530	from_line, fromDiff = fromlines.pop(0)
1531	to_line, to_diff = tolines.pop(0)
1532	yield (from_line,to_line,fromDiff or to_diff)
1533
1534	# Handle case where user does not want context differencing, just yield
1535	# them up without doing anything else with them.
1536	line_pair_iterator = _line_pair_iterator()
1537	if context is None:
1538	while True:
1539	yield line_pair_iterator.next()
1540	# Handle case where user wants context differencing. We must do some
1541	# storage of lines until we know for sure that they are to be yielded.
1542	else:
1543	context += 1
1544	lines_to_write = 0
1545	while True:
1546	# Store lines up until we find a difference, note use of a
1547	# circular queue because we only need to keep around what
1548	# we need for context.
1549	index, contextLines = 0, [None]*(context)
1550	found_diff = False
1551	while(found_diff is False):
1552	from_line, to_line, found_diff = line_pair_iterator.next()
1553	i = index % context
1554	contextLines[i] = (from_line, to_line, found_diff)
1555	index += 1
1556	# Yield lines that we have collected so far, but first yield
1557	# the user's separator.
1558	if index > context:
1559	yield None, None, None
1560	lines_to_write = context
1561	else:
1562	lines_to_write = index
1563	index = 0
1564	while(lines_to_write):
1565	i = index % context
1566	index += 1
1567	yield contextLines[i]
1568	lines_to_write -= 1
1569	# Now yield the context lines after the change
1570	lines_to_write = context-1
1571	while(lines_to_write):
1572	from_line, to_line, found_diff = line_pair_iterator.next()
1573	# If another change within the context, extend the context
1574	if found_diff:
1575	lines_to_write = context-1
1576	else:
1577	lines_to_write -= 1
1578	yield from_line, to_line, found_diff
1579
1580
1581	_file_template = """
1582	<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
1583	"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
1584
1585	<html>
1586
1587	<head>
1588	<meta http-equiv="Content-Type"
1589	content="text/html; charset=ISO-8859-1" />
1590	<title></title>
1591	<style type="text/css">%(styles)s
1592	</style>
1593	</head>
1594
1595	<body>
1596	%(table)s%(legend)s
1597	</body>
1598
1599	</html>"""
1600
1601	_styles = """
1602	table.diff {font-family:Courier; border:medium;}
1603	.diff_header {background-color:#e0e0e0}
1604	td.diff_header {text-align:right}
1605	.diff_next {background-color:#c0c0c0}
1606	.diff_add {background-color:#aaffaa}
1607	.diff_chg {background-color:#ffff77}
1608	.diff_sub {background-color:#ffaaaa}"""
1609
1610	_table_template = """
1611	<table class="diff" id="difflib_chg_%(prefix)s_top"
1612	cellspacing="0" cellpadding="0" rules="groups" >
1613	<colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup>
1614	<colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup>
1615	%(header_row)s
1616	<tbody>
1617	%(data_rows)s </tbody>
1618	</table>"""
1619
1620	_legend = """
1621	<table class="diff" summary="Legends">
1622	<tr> <th colspan="2"> Legends </th> </tr>
1623	<tr> <td> <table border="" summary="Colors">
1624	<tr><th> Colors </th> </tr>
1625	<tr><td class="diff_add"> Added </td></tr>
1626	<tr><td class="diff_chg">Changed</td> </tr>
1627	<tr><td class="diff_sub">Deleted</td> </tr>
1628	</table></td>
1629	<td> <table border="" summary="Links">
1630	<tr><th colspan="2"> Links </th> </tr>
1631	<tr><td>(f)irst change</td> </tr>
1632	<tr><td>(n)ext change</td> </tr>
1633	<tr><td>(t)op</td> </tr>
1634	</table></td> </tr>
1635	</table>"""
1636
1637	class HtmlDiff(object):
1638	"""For producing HTML side by side comparison with change highlights.
1639
1640	This class can be used to create an HTML table (or a complete HTML file
1641	containing the table) showing a side by side, line by line comparison
1642	of text with inter-line and intra-line change highlights. The table can
1643	be generated in either full or contextual difference mode.
1644
1645	The following methods are provided for HTML generation:
1646
1647	make_table -- generates HTML for a single side by side table
1648	make_file -- generates complete HTML file with a single side by side table
1649
1650	See tools/scripts/diff.py for an example usage of this class.
1651	"""
1652
1653	_file_template = _file_template
1654	_styles = _styles
1655	_table_template = _table_template
1656	_legend = _legend
1657	_default_prefix = 0
1658
1659	def __init__(self,tabsize=8,wrapcolumn=None,linejunk=None,
1660	charjunk=IS_CHARACTER_JUNK):
1661	"""HtmlDiff instance initializer
1662
1663	Arguments:
1664	tabsize -- tab stop spacing, defaults to 8.
1665	wrapcolumn -- column number where lines are broken and wrapped,
1666	defaults to None where lines are not wrapped.
1667	linejunk,charjunk -- keyword arguments passed into ndiff() (used to by
1668	HtmlDiff() to generate the side by side HTML differences). See
1669	ndiff() documentation for argument default values and descriptions.
1670	"""
1671	self._tabsize = tabsize
1672	self._wrapcolumn = wrapcolumn
1673	self._linejunk = linejunk
1674	self._charjunk = charjunk
1675
1676	def make_file(self,fromlines,tolines,fromdesc='',todesc='',context=False,
1677	numlines=5):
1678	"""Returns HTML file of side by side comparison with change highlights
1679
1680	Arguments:
1681	fromlines -- list of "from" lines
1682	tolines -- list of "to" lines
1683	fromdesc -- "from" file column header string
1684	todesc -- "to" file column header string
1685	context -- set to True for contextual differences (defaults to False
1686	which shows full differences).
1687	numlines -- number of context lines. When context is set True,
1688	controls number of lines displayed before and after the change.
1689	When context is False, controls the number of lines to place
1690	the "next" link anchors before the next change (so click of
1691	"next" link jumps to just before the change).
1692	"""
1693
1694	return self._file_template % dict(
1695	styles = self._styles,
1696	legend = self._legend,
1697	table = self.make_table(fromlines,tolines,fromdesc,todesc,
1698	context=context,numlines=numlines))
1699
1700	def _tab_newline_replace(self,fromlines,tolines):
1701	"""Returns from/to line lists with tabs expanded and newlines removed.
1702
1703	Instead of tab characters being replaced by the number of spaces
1704	needed to fill in to the next tab stop, this function will fill
1705	the space with tab characters. This is done so that the difference
1706	algorithms can identify changes in a file when tabs are replaced by
1707	spaces and vice versa. At the end of the HTML generation, the tab
1708	characters will be replaced with a nonbreakable space.
1709	"""
1710	def expand_tabs(line):
1711	# hide real spaces
1712	line = line.replace(' ','\0')
1713	# expand tabs into spaces
1714	line = line.expandtabs(self._tabsize)
1715	# relace spaces from expanded tabs back into tab characters
1716	# (we'll replace them with markup after we do differencing)
1717	line = line.replace(' ','\t')
1718	return line.replace('\0',' ').rstrip('\n')
1719	fromlines = [expand_tabs(line) for line in fromlines]
1720	tolines = [expand_tabs(line) for line in tolines]
1721	return fromlines,tolines
1722
1723	def _split_line(self,data_list,line_num,text):
1724	"""Builds list of text lines by splitting text lines at wrap point
1725
1726	This function will determine if the input text line needs to be
1727	wrapped (split) into separate lines. If so, the first wrap point
1728	will be determined and the first line appended to the output
1729	text line list. This function is used recursively to handle
1730	the second part of the split line to further split it.
1731	"""
1732	# if blank line or context separator, just add it to the output list
1733	if not line_num:
1734	data_list.append((line_num,text))
1735	return
1736
1737	# if line text doesn't need wrapping, just add it to the output list
1738	size = len(text)
1739	max = self._wrapcolumn
1740	if (size <= max) or ((size -(text.count('\0')*3)) <= max):
1741	data_list.append((line_num,text))
1742	return
1743
1744	# scan text looking for the wrap point, keeping track if the wrap
1745	# point is inside markers
1746	i = 0
1747	n = 0
1748	mark = ''
1749	while n < max and i < size:
1750	if text[i] == '\0':
1751	i += 1
1752	mark = text[i]
1753	i += 1
1754	elif text[i] == '\1':
1755	i += 1
1756	mark = ''
1757	else:
1758	i += 1
1759	n += 1
1760
1761	# wrap point is inside text, break it up into separate lines
1762	line1 = text[:i]
1763	line2 = text[i:]
1764
1765	# if wrap point is inside markers, place end marker at end of first
1766	# line and start marker at beginning of second line because each
1767	# line will have its own table tag markup around it.
1768	if mark:
1769	line1 = line1 + '\1'
1770	line2 = '\0' + mark + line2
1771
1772	# tack on first line onto the output list
1773	data_list.append((line_num,line1))
1774
1775	# use this routine again to wrap the remaining text
1776	self._split_line(data_list,'>',line2)
1777
1778	def _line_wrapper(self,diffs):
1779	"""Returns iterator that splits (wraps) mdiff text lines"""
1780
1781	# pull from/to data and flags from mdiff iterator
1782	for fromdata,todata,flag in diffs:
1783	# check for context separators and pass them through
1784	if flag is None:
1785	yield fromdata,todata,flag
1786	continue
1787	(fromline,fromtext),(toline,totext) = fromdata,todata
1788	# for each from/to line split it at the wrap column to form
1789	# list of text lines.
1790	fromlist,tolist = [],[]
1791	self._split_line(fromlist,fromline,fromtext)
1792	self._split_line(tolist,toline,totext)
1793	# yield from/to line in pairs inserting blank lines as
1794	# necessary when one side has more wrapped lines
1795	while fromlist or tolist:
1796	if fromlist:
1797	fromdata = fromlist.pop(0)
1798	else:
1799	fromdata = ('',' ')
1800	if tolist:
1801	todata = tolist.pop(0)
1802	else:
1803	todata = ('',' ')
1804	yield fromdata,todata,flag
1805
1806	def _collect_lines(self,diffs):
1807	"""Collects mdiff output into separate lists
1808
1809	Before storing the mdiff from/to data into a list, it is converted
1810	into a single line of text with HTML markup.
1811	"""
1812
1813	fromlist,tolist,flaglist = [],[],[]
1814	# pull from/to data and flags from mdiff style iterator
1815	for fromdata,todata,flag in diffs:
1816	try:
1817	# store HTML markup of the lines into the lists
1818	fromlist.append(self._format_line(0,flag,*fromdata))
1819	tolist.append(self._format_line(1,flag,*todata))
1820	except TypeError:
1821	# exceptions occur for lines where context separators go
1822	fromlist.append(None)
1823	tolist.append(None)
1824	flaglist.append(flag)
1825	return fromlist,tolist,flaglist
1826
1827	def _format_line(self,side,flag,linenum,text):
1828	"""Returns HTML markup of "from" / "to" text lines
1829
1830	side -- 0 or 1 indicating "from" or "to" text
1831	flag -- indicates if difference on line
1832	linenum -- line number (used for line number column)
1833	text -- line text to be marked up
1834	"""
1835	try:
1836	linenum = '%d' % linenum
1837	id = ' id="%s%s"' % (self._prefix[side],linenum)
1838	except TypeError:
1839	# handle blank lines where linenum is '>' or ''
1840	id = ''
1841	# replace those things that would get confused with HTML symbols
1842	text=text.replace("&","&").replace(">",">").replace("<","<")
1843
1844	# make space non-breakable so they don't get compressed or line wrapped
1845	text = text.replace(' ',' ').rstrip()
1846
1847	return '<td class="diff_header"%s>%s</td><td nowrap="nowrap">%s</td>' \
1848	% (id,linenum,text)
1849
1850	def _make_prefix(self):
1851	"""Create unique anchor prefixes"""
1852
1853	# Generate a unique anchor prefix so multiple tables
1854	# can exist on the same HTML page without conflicts.
1855	fromprefix = "from%d_" % HtmlDiff._default_prefix
1856	toprefix = "to%d_" % HtmlDiff._default_prefix
1857	HtmlDiff._default_prefix += 1
1858	# store prefixes so line format method has access
1859	self._prefix = [fromprefix,toprefix]
1860
1861	def _convert_flags(self,fromlist,tolist,flaglist,context,numlines):
1862	"""Makes list of "next" links"""
1863
1864	# all anchor names will be generated using the unique "to" prefix
1865	toprefix = self._prefix[1]
1866
1867	# process change flags, generating middle column of next anchors/links
1868	next_id = ['']*len(flaglist)
1869	next_href = ['']*len(flaglist)
1870	num_chg, in_change = 0, False
1871	last = 0
1872	for i,flag in enumerate(flaglist):
1873	if flag:
1874	if not in_change:
1875	in_change = True
1876	last = i
1877	# at the beginning of a change, drop an anchor a few lines
1878	# (the context lines) before the change for the previous
1879	# link
1880	i = max([0,i-numlines])
1881	next_id[i] = ' id="difflib_chg_%s_%d"' % (toprefix,num_chg)
1882	# at the beginning of a change, drop a link to the next
1883	# change
1884	num_chg += 1
1885	next_href[last] = '<a href="#difflib_chg_%s_%d">n</a>' % (
1886	toprefix,num_chg)
1887	else:
1888	in_change = False
1889	# check for cases where there is no content to avoid exceptions
1890	if not flaglist:
1891	flaglist = [False]
1892	next_id = ['']
1893	next_href = ['']
1894	last = 0
1895	if context:
1896	fromlist = ['<td></td><td> No Differences Found </td>']
1897	tolist = fromlist
1898	else:
1899	fromlist = tolist = ['<td></td><td> Empty File </td>']
1900	# if not a change on first line, drop a link
1901	if not flaglist[0]:
1902	next_href[0] = '<a href="#difflib_chg_%s_0">f</a>' % toprefix
1903	# redo the last link to link to the top
1904	next_href[last] = '<a href="#difflib_chg_%s_top">t</a>' % (toprefix)
1905
1906	return fromlist,tolist,flaglist,next_href,next_id
1907
1908	def make_table(self,fromlines,tolines,fromdesc='',todesc='',context=False,
1909	numlines=5):
1910	"""Returns HTML table of side by side comparison with change highlights
1911
1912	Arguments:
1913	fromlines -- list of "from" lines
1914	tolines -- list of "to" lines
1915	fromdesc -- "from" file column header string
1916	todesc -- "to" file column header string
1917	context -- set to True for contextual differences (defaults to False
1918	which shows full differences).
1919	numlines -- number of context lines. When context is set True,
1920	controls number of lines displayed before and after the change.
1921	When context is False, controls the number of lines to place
1922	the "next" link anchors before the next change (so click of
1923	"next" link jumps to just before the change).
1924	"""
1925
1926	# make unique anchor prefixes so that multiple tables may exist
1927	# on the same page without conflict.
1928	self._make_prefix()
1929
1930	# change tabs to spaces before it gets more difficult after we insert
1931	# markkup
1932	fromlines,tolines = self._tab_newline_replace(fromlines,tolines)
1933
1934	# create diffs iterator which generates side by side from/to data
1935	if context:
1936	context_lines = numlines
1937	else:
1938	context_lines = None
1939	diffs = _mdiff(fromlines,tolines,context_lines,linejunk=self._linejunk,
1940	charjunk=self._charjunk)
1941
1942	# set up iterator to wrap lines that exceed desired width
1943	if self._wrapcolumn:
1944	diffs = self._line_wrapper(diffs)
1945
1946	# collect up from/to lines and flags into lists (also format the lines)
1947	fromlist,tolist,flaglist = self._collect_lines(diffs)
1948
1949	# process change flags, generating middle column of next anchors/links
1950	fromlist,tolist,flaglist,next_href,next_id = self._convert_flags(
1951	fromlist,tolist,flaglist,context,numlines)
1952
1953	s = []
1954	fmt = ' <tr><td class="diff_next"%s>%s</td>%s' + \
1955	'<td class="diff_next">%s</td>%s</tr>\n'
1956	for i in range(len(flaglist)):
1957	if flaglist[i] is None:
1958	# mdiff yields None on separator lines skip the bogus ones
1959	# generated for the first line
1960	if i > 0:
1961	s.append(' </tbody> \n <tbody>\n')
1962	else:
1963	s.append( fmt % (next_id[i],next_href[i],fromlist[i],
1964	next_href[i],tolist[i]))
1965	if fromdesc or todesc:
1966	header_row = '<thead><tr>%s%s%s%s</tr></thead>' % (
1967	'<th class="diff_next"><br /></th>',
1968	'<th colspan="2" class="diff_header">%s</th>' % fromdesc,
1969	'<th class="diff_next"><br /></th>',
1970	'<th colspan="2" class="diff_header">%s</th>' % todesc)
1971	else:
1972	header_row = ''
1973
1974	table = self._table_template % dict(
1975	data_rows=''.join(s),
1976	header_row=header_row,
1977	prefix=self._prefix[1])
1978
1979	return table.replace('\0+','<span class="diff_add">'). \
1980	replace('\0-','<span class="diff_sub">'). \
1981	replace('\0^','<span class="diff_chg">'). \
1982	replace('\1','</span>'). \
1983	replace('\t',' ')
1984
1985	del re
1986
1987	def restore(delta, which):
1988	r"""
1989	Generate one of the two sequences that generated a delta.
1990
1991	Given a `delta` produced by `Differ.compare()` or `ndiff()`, extract
1992	lines originating from file 1 or 2 (parameter `which`), stripping off line
1993	prefixes.
1994
1995	Examples:
1996
1997	>>> diff = ndiff('one\ntwo\nthree\n'.splitlines(1),
1998	... 'ore\ntree\nemu\n'.splitlines(1))
1999	>>> diff = list(diff)
2000	>>> print ''.join(restore(diff, 1)),
2001	one
2002	two
2003	three
2004	>>> print ''.join(restore(diff, 2)),
2005	ore
2006	tree
2007	emu
2008	"""
2009	try:
2010	tag = {1: "- ", 2: "+ "}[int(which)]
2011	except KeyError:
2012	raise ValueError, ('unknown delta choice (must be 1 or 2): %r'
2013	% which)
2014	prefixes = (" ", tag)
2015	for line in delta:
2016	if line[:2] in prefixes:
2017	yield line[2:]
2018
2019	def _test():
2020	import doctest, difflib
2021	return doctest.testmod(difflib)
2022
2023	if __name__ == "__main__":
2024	_test()

Note: See TracBrowser for help on using the repository browser.

Context Navigation

source: python/trunk/Lib/difflib.py@ 381

Download in other formats: