| 1 | #!/bin/sh
|
|---|
| 2 | exec perl -w -x $0 ${1+"$@"} # -*- mode: perl; perl-indent-level: 2; -*-
|
|---|
| 3 | #!perl -w
|
|---|
| 4 |
|
|---|
| 5 |
|
|---|
| 6 | ##############################################################
|
|---|
| 7 | ### ###
|
|---|
| 8 | ### cvs2cl.pl: produce ChangeLog(s) from `cvs log` output. ###
|
|---|
| 9 | ### ###
|
|---|
| 10 | ##############################################################
|
|---|
| 11 |
|
|---|
| 12 | ## $Revision: 1.1 $
|
|---|
| 13 | ## $Date: 2003/01/31 17:12:18 $
|
|---|
| 14 | ## $Author: wlestes $
|
|---|
| 15 | ##
|
|---|
| 16 | ## (C) 2001,2002,2003 Martyn J. Pearce <fluffy@cpan.org>, under the GNU GPL.
|
|---|
| 17 | ## (C) 1999 Karl Fogel <kfogel@red-bean.com>, under the GNU GPL.
|
|---|
| 18 | ##
|
|---|
| 19 | ## (Extensively hacked on by Melissa O'Neill <oneill@cs.sfu.ca>.)
|
|---|
| 20 | ##
|
|---|
| 21 | ## cvs2cl.pl is free software; you can redistribute it and/or modify
|
|---|
| 22 | ## it under the terms of the GNU General Public License as published by
|
|---|
| 23 | ## the Free Software Foundation; either version 2, or (at your option)
|
|---|
| 24 | ## any later version.
|
|---|
| 25 | ##
|
|---|
| 26 | ## cvs2cl.pl is distributed in the hope that it will be useful,
|
|---|
| 27 | ## but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|---|
| 28 | ## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
|---|
| 29 | ## GNU General Public License for more details.
|
|---|
| 30 | ##
|
|---|
| 31 | ## You may have received a copy of the GNU General Public License
|
|---|
| 32 | ## along with cvs2cl.pl; see the file COPYING. If not, write to the
|
|---|
| 33 | ## Free Software Foundation, Inc., 59 Temple Place - Suite 330,
|
|---|
| 34 | ## Boston, MA 02111-1307, USA.
|
|---|
| 35 |
|
|---|
| 36 | |
|---|
| 37 |
|
|---|
| 38 | use strict;
|
|---|
| 39 | use Text::Wrap;
|
|---|
| 40 | use Time::Local;
|
|---|
| 41 | use File::Basename;
|
|---|
| 42 |
|
|---|
| 43 | |
|---|
| 44 |
|
|---|
| 45 | # The Plan:
|
|---|
| 46 | #
|
|---|
| 47 | # Read in the logs for multiple files, spit out a nice ChangeLog that
|
|---|
| 48 | # mirrors the information entered during `cvs commit'.
|
|---|
| 49 | #
|
|---|
| 50 | # The problem presents some challenges. In an ideal world, we could
|
|---|
| 51 | # detect files with the same author, log message, and checkin time --
|
|---|
| 52 | # each <filelist, author, time, logmessage> would be a changelog entry.
|
|---|
| 53 | # We'd sort them; and spit them out. Unfortunately, CVS is *not atomic*
|
|---|
| 54 | # so checkins can span a range of times. Also, the directory structure
|
|---|
| 55 | # could be hierarchical.
|
|---|
| 56 | #
|
|---|
| 57 | # Another question is whether we really want to have the ChangeLog
|
|---|
| 58 | # exactly reflect commits. An author could issue two related commits,
|
|---|
| 59 | # with different log entries, reflecting a single logical change to the
|
|---|
| 60 | # source. GNU style ChangeLogs group these under a single author/date.
|
|---|
| 61 | # We try to do the same.
|
|---|
| 62 | #
|
|---|
| 63 | # So, we parse the output of `cvs log', storing log messages in a
|
|---|
| 64 | # multilevel hash that stores the mapping:
|
|---|
| 65 | # directory => author => time => message => filelist
|
|---|
| 66 | # As we go, we notice "nearby" commit times and store them together
|
|---|
| 67 | # (i.e., under the same timestamp), so they appear in the same log
|
|---|
| 68 | # entry.
|
|---|
| 69 | #
|
|---|
| 70 | # When we've read all the logs, we twist this mapping into
|
|---|
| 71 | # a time => author => message => filelist mapping for each directory.
|
|---|
| 72 | #
|
|---|
| 73 | # If we're not using the `--distributed' flag, the directory is always
|
|---|
| 74 | # considered to be `./', even as descend into subdirectories.
|
|---|
| 75 |
|
|---|
| 76 | |
|---|
| 77 |
|
|---|
| 78 | ############### Globals ################
|
|---|
| 79 |
|
|---|
| 80 | # What we run to generate it:
|
|---|
| 81 | my $Log_Source_Command = "cvs log";
|
|---|
| 82 |
|
|---|
| 83 | # In case we have to print it out:
|
|---|
| 84 | my $VERSION = '$Revision: 1.1 $';
|
|---|
| 85 | $VERSION =~ s/\S+\s+(\S+)\s+\S+/$1/;
|
|---|
| 86 |
|
|---|
| 87 | ## Vars set by options:
|
|---|
| 88 |
|
|---|
| 89 | # Print debugging messages?
|
|---|
| 90 | my $Debug = 0;
|
|---|
| 91 |
|
|---|
| 92 | # Just show version and exit?
|
|---|
| 93 | my $Print_Version = 0;
|
|---|
| 94 |
|
|---|
| 95 | # Just print usage message and exit?
|
|---|
| 96 | my $Print_Usage = 0;
|
|---|
| 97 |
|
|---|
| 98 | # Single top-level ChangeLog, or one per subdirectory?
|
|---|
| 99 | my $Distributed = 0;
|
|---|
| 100 |
|
|---|
| 101 | # What file should we generate (defaults to "ChangeLog")?
|
|---|
| 102 | my $Log_File_Name = "ChangeLog";
|
|---|
| 103 |
|
|---|
| 104 | # Grab most recent entry date from existing ChangeLog file, just add
|
|---|
| 105 | # to that ChangeLog.
|
|---|
| 106 | my $Cumulative = 0;
|
|---|
| 107 |
|
|---|
| 108 | # Expand usernames to email addresses based on a map file?
|
|---|
| 109 | my $User_Map_File = "";
|
|---|
| 110 |
|
|---|
| 111 | # Output to a file or to stdout?
|
|---|
| 112 | my $Output_To_Stdout = 0;
|
|---|
| 113 |
|
|---|
| 114 | # Eliminate empty log messages?
|
|---|
| 115 | my $Prune_Empty_Msgs = 0;
|
|---|
| 116 |
|
|---|
| 117 | # Tags of which not to output
|
|---|
| 118 | my @ignore_tags;
|
|---|
| 119 |
|
|---|
| 120 | # Don't call Text::Wrap on the body of the message
|
|---|
| 121 | my $No_Wrap = 0;
|
|---|
| 122 |
|
|---|
| 123 | # Separates header from log message. Code assumes it is either " " or
|
|---|
| 124 | # "\n\n", so if there's ever an option to set it to something else,
|
|---|
| 125 | # make sure to go through all conditionals that use this var.
|
|---|
| 126 | my $After_Header = " ";
|
|---|
| 127 |
|
|---|
| 128 | # XML Encoding
|
|---|
| 129 | my $XML_Encoding = '';
|
|---|
| 130 |
|
|---|
| 131 | # Format more for programs than for humans.
|
|---|
| 132 | my $XML_Output = 0;
|
|---|
| 133 |
|
|---|
| 134 | # Do some special tweaks for log data that was written in FSF
|
|---|
| 135 | # ChangeLog style.
|
|---|
| 136 | my $FSF_Style = 0;
|
|---|
| 137 |
|
|---|
| 138 | # Show times in UTC instead of local time
|
|---|
| 139 | my $UTC_Times = 0;
|
|---|
| 140 |
|
|---|
| 141 | # Show day of week in output?
|
|---|
| 142 | my $Show_Day_Of_Week = 0;
|
|---|
| 143 |
|
|---|
| 144 | # Show revision numbers in output?
|
|---|
| 145 | my $Show_Revisions = 0;
|
|---|
| 146 |
|
|---|
| 147 | # Show tags (symbolic names) in output?
|
|---|
| 148 | my $Show_Tags = 0;
|
|---|
| 149 |
|
|---|
| 150 | # Show tags separately in output?
|
|---|
| 151 | my $Show_Tag_Dates = 0;
|
|---|
| 152 |
|
|---|
| 153 | # Show branches by symbolic name in output?
|
|---|
| 154 | my $Show_Branches = 0;
|
|---|
| 155 |
|
|---|
| 156 | # Show only revisions on these branches or their ancestors.
|
|---|
| 157 | my @Follow_Branches;
|
|---|
| 158 |
|
|---|
| 159 | # Don't bother with files matching this regexp.
|
|---|
| 160 | my @Ignore_Files;
|
|---|
| 161 |
|
|---|
| 162 | # How exactly we match entries. We definitely want "o",
|
|---|
| 163 | # and user might add "i" by using --case-insensitive option.
|
|---|
| 164 | my $Case_Insensitive = 0;
|
|---|
| 165 |
|
|---|
| 166 | # Maybe only show log messages matching a certain regular expression.
|
|---|
| 167 | my $Regexp_Gate = "";
|
|---|
| 168 |
|
|---|
| 169 | # Pass this global option string along to cvs, to the left of `log':
|
|---|
| 170 | my $Global_Opts = "";
|
|---|
| 171 |
|
|---|
| 172 | # Pass this option string along to the cvs log subcommand:
|
|---|
| 173 | my $Command_Opts = "";
|
|---|
| 174 |
|
|---|
| 175 | # Read log output from stdin instead of invoking cvs log?
|
|---|
| 176 | my $Input_From_Stdin = 0;
|
|---|
| 177 |
|
|---|
| 178 | # Don't show filenames in output.
|
|---|
| 179 | my $Hide_Filenames = 0;
|
|---|
| 180 |
|
|---|
| 181 | # Max checkin duration. CVS checkin is not atomic, so we may have checkin
|
|---|
| 182 | # times that span a range of time. We assume that checkins will last no
|
|---|
| 183 | # longer than $Max_Checkin_Duration seconds, and that similarly, no
|
|---|
| 184 | # checkins will happen from the same users with the same message less
|
|---|
| 185 | # than $Max_Checkin_Duration seconds apart.
|
|---|
| 186 | my $Max_Checkin_Duration = 180;
|
|---|
| 187 |
|
|---|
| 188 | # What to put at the front of [each] ChangeLog.
|
|---|
| 189 | my $ChangeLog_Header = "";
|
|---|
| 190 |
|
|---|
| 191 | # Whether to enable 'delta' mode, and for what start/end tags.
|
|---|
| 192 | my $Delta_Mode = 0;
|
|---|
| 193 | my $Delta_From = "";
|
|---|
| 194 | my $Delta_To = "";
|
|---|
| 195 |
|
|---|
| 196 | ## end vars set by options.
|
|---|
| 197 |
|
|---|
| 198 | # latest observed times for the start/end tags in delta mode
|
|---|
| 199 | my $Delta_StartTime = 0;
|
|---|
| 200 | my $Delta_EndTime = 0;
|
|---|
| 201 |
|
|---|
| 202 | # In 'cvs log' output, one long unbroken line of equal signs separates
|
|---|
| 203 | # files:
|
|---|
| 204 | my $file_separator = "======================================="
|
|---|
| 205 | . "======================================";
|
|---|
| 206 |
|
|---|
| 207 | # In 'cvs log' output, a shorter line of dashes separates log messages
|
|---|
| 208 | # within a file:
|
|---|
| 209 | my $logmsg_separator = "----------------------------";
|
|---|
| 210 |
|
|---|
| 211 | ############### End globals ############
|
|---|
| 212 |
|
|---|
| 213 | |
|---|
| 214 |
|
|---|
| 215 | &parse_options ();
|
|---|
| 216 | &derive_change_log ();
|
|---|
| 217 |
|
|---|
| 218 | |
|---|
| 219 |
|
|---|
| 220 | ### Everything below is subroutine definitions. ###
|
|---|
| 221 |
|
|---|
| 222 | # If accumulating, grab the boundary date from pre-existing ChangeLog.
|
|---|
| 223 | sub maybe_grab_accumulation_date ()
|
|---|
| 224 | {
|
|---|
| 225 | if (! $Cumulative) {
|
|---|
| 226 | return "";
|
|---|
| 227 | }
|
|---|
| 228 |
|
|---|
| 229 | # else
|
|---|
| 230 |
|
|---|
| 231 | open (LOG, "$Log_File_Name")
|
|---|
| 232 | or die ("trouble opening $Log_File_Name for reading ($!)");
|
|---|
| 233 |
|
|---|
| 234 | my $boundary_date;
|
|---|
| 235 | while (<LOG>)
|
|---|
| 236 | {
|
|---|
| 237 | if (/^(\d\d\d\d-\d\d-\d\d\s+\d\d:\d\d)/)
|
|---|
| 238 | {
|
|---|
| 239 | $boundary_date = "$1";
|
|---|
| 240 | last;
|
|---|
| 241 | }
|
|---|
| 242 | }
|
|---|
| 243 |
|
|---|
| 244 | close (LOG);
|
|---|
| 245 | return $boundary_date;
|
|---|
| 246 | }
|
|---|
| 247 |
|
|---|
| 248 | # Fills up a ChangeLog structure in the current directory.
|
|---|
| 249 | sub derive_change_log ()
|
|---|
| 250 | {
|
|---|
| 251 | # See "The Plan" above for a full explanation.
|
|---|
| 252 |
|
|---|
| 253 | my %grand_poobah;
|
|---|
| 254 |
|
|---|
| 255 | my $file_full_path;
|
|---|
| 256 | my $time;
|
|---|
| 257 | my $revision;
|
|---|
| 258 | my $author;
|
|---|
| 259 | my $msg_txt;
|
|---|
| 260 | my $detected_file_separator;
|
|---|
| 261 |
|
|---|
| 262 | my %tag_date_printed;
|
|---|
| 263 |
|
|---|
| 264 | # Might be adding to an existing ChangeLog
|
|---|
| 265 | my $accumulation_date = &maybe_grab_accumulation_date ();
|
|---|
| 266 | if ($accumulation_date) {
|
|---|
| 267 | # Insert -d immediately after 'cvs log'
|
|---|
| 268 | my $Log_Date_Command = "-d\'>${accumulation_date}\'";
|
|---|
| 269 | $Log_Source_Command =~ s/(^.*log\S*)/$1 $Log_Date_Command/;
|
|---|
| 270 | &debug ("(adding log msg starting from $accumulation_date)\n");
|
|---|
| 271 | }
|
|---|
| 272 |
|
|---|
| 273 | # We might be expanding usernames
|
|---|
| 274 | my %usermap;
|
|---|
| 275 |
|
|---|
| 276 | # In general, it's probably not very maintainable to use state
|
|---|
| 277 | # variables like this to tell the loop what it's doing at any given
|
|---|
| 278 | # moment, but this is only the first one, and if we never have more
|
|---|
| 279 | # than a few of these, it's okay.
|
|---|
| 280 | my $collecting_symbolic_names = 0;
|
|---|
| 281 | my %symbolic_names; # Where tag names get stored.
|
|---|
| 282 | my %branch_names; # We'll grab branch names while we're at it.
|
|---|
| 283 | my %branch_numbers; # Save some revisions for @Follow_Branches
|
|---|
| 284 | my @branch_roots; # For showing which files are branch ancestors.
|
|---|
| 285 |
|
|---|
| 286 | # Bleargh. Compensate for a deficiency of custom wrapping.
|
|---|
| 287 | if (($After_Header ne " ") and $FSF_Style)
|
|---|
| 288 | {
|
|---|
| 289 | $After_Header .= "\t";
|
|---|
| 290 | }
|
|---|
| 291 |
|
|---|
| 292 | if (! $Input_From_Stdin) {
|
|---|
| 293 | &debug ("(run \"${Log_Source_Command}\")\n");
|
|---|
| 294 | open (LOG_SOURCE, "$Log_Source_Command |")
|
|---|
| 295 | or die "unable to run \"${Log_Source_Command}\"";
|
|---|
| 296 | }
|
|---|
| 297 | else {
|
|---|
| 298 | open (LOG_SOURCE, "-") or die "unable to open stdin for reading";
|
|---|
| 299 | }
|
|---|
| 300 |
|
|---|
| 301 | binmode LOG_SOURCE;
|
|---|
| 302 |
|
|---|
| 303 | %usermap = &maybe_read_user_map_file ();
|
|---|
| 304 |
|
|---|
| 305 | while (<LOG_SOURCE>)
|
|---|
| 306 | {
|
|---|
| 307 | # Canonicalize line endings
|
|---|
| 308 | s/\r$//;
|
|---|
| 309 | # If on a new file and don't see filename, skip until we find it, and
|
|---|
| 310 | # when we find it, grab it.
|
|---|
| 311 | if ((! (defined $file_full_path)) and /^Working file: (.*)/)
|
|---|
| 312 | {
|
|---|
| 313 | $file_full_path = $1;
|
|---|
| 314 | if (@Ignore_Files)
|
|---|
| 315 | {
|
|---|
| 316 | my $base;
|
|---|
| 317 | ($base, undef, undef) = fileparse ($file_full_path);
|
|---|
| 318 | # Ouch, I wish trailing operators in regexps could be
|
|---|
| 319 | # evaluated on the fly!
|
|---|
| 320 | if ($Case_Insensitive) {
|
|---|
| 321 | if (grep ($file_full_path =~ m|$_|i, @Ignore_Files)) {
|
|---|
| 322 | undef $file_full_path;
|
|---|
| 323 | }
|
|---|
| 324 | }
|
|---|
| 325 | elsif (grep ($file_full_path =~ m|$_|, @Ignore_Files)) {
|
|---|
| 326 | undef $file_full_path;
|
|---|
| 327 | }
|
|---|
| 328 | }
|
|---|
| 329 | next;
|
|---|
| 330 | }
|
|---|
| 331 |
|
|---|
| 332 | # Just spin wheels if no file defined yet.
|
|---|
| 333 | next if (! $file_full_path);
|
|---|
| 334 |
|
|---|
| 335 | # Collect tag names in case we're asked to print them in the output.
|
|---|
| 336 | if (/^symbolic names:$/) {
|
|---|
| 337 | $collecting_symbolic_names = 1;
|
|---|
| 338 | next; # There's no more info on this line, so skip to next
|
|---|
| 339 | }
|
|---|
| 340 | if ($collecting_symbolic_names)
|
|---|
| 341 | {
|
|---|
| 342 | # All tag names are listed with whitespace in front in cvs log
|
|---|
| 343 | # output; so if see non-whitespace, then we're done collecting.
|
|---|
| 344 | if (/^\S/) {
|
|---|
| 345 | $collecting_symbolic_names = 0;
|
|---|
| 346 | }
|
|---|
| 347 | else # we're looking at a tag name, so parse & store it
|
|---|
| 348 | {
|
|---|
| 349 | # According to the Cederqvist manual, in node "Tags", tag
|
|---|
| 350 | # names must start with an uppercase or lowercase letter and
|
|---|
| 351 | # can contain uppercase and lowercase letters, digits, `-',
|
|---|
| 352 | # and `_'. However, it's not our place to enforce that, so
|
|---|
| 353 | # we'll allow anything CVS hands us to be a tag:
|
|---|
| 354 | /^\s+([^:]+): ([\d.]+)$/;
|
|---|
| 355 | my $tag_name = $1;
|
|---|
| 356 | my $tag_rev = $2;
|
|---|
| 357 |
|
|---|
| 358 | # A branch number either has an odd number of digit sections
|
|---|
| 359 | # (and hence an even number of dots), or has ".0." as the
|
|---|
| 360 | # second-to-last digit section. Test for these conditions.
|
|---|
| 361 | my $real_branch_rev = "";
|
|---|
| 362 | if (($tag_rev =~ /^(\d+\.\d+\.)+\d+$/) # Even number of dots...
|
|---|
| 363 | and (! ($tag_rev =~ /^(1\.)+1$/))) # ...but not "1.[1.]1"
|
|---|
| 364 | {
|
|---|
| 365 | $real_branch_rev = $tag_rev;
|
|---|
| 366 | }
|
|---|
| 367 | elsif ($tag_rev =~ /(\d+\.(\d+\.)+)0.(\d+)/) # Has ".0."
|
|---|
| 368 | {
|
|---|
| 369 | $real_branch_rev = $1 . $3;
|
|---|
| 370 | }
|
|---|
| 371 | # If we got a branch, record its number.
|
|---|
| 372 | if ($real_branch_rev)
|
|---|
| 373 | {
|
|---|
| 374 | $branch_names{$real_branch_rev} = $tag_name;
|
|---|
| 375 | if (@Follow_Branches) {
|
|---|
| 376 | if (grep ($_ eq $tag_name, @Follow_Branches)) {
|
|---|
| 377 | $branch_numbers{$tag_name} = $real_branch_rev;
|
|---|
| 378 | }
|
|---|
| 379 | }
|
|---|
| 380 | }
|
|---|
| 381 | else {
|
|---|
| 382 | # Else it's just a regular (non-branch) tag.
|
|---|
| 383 | push (@{$symbolic_names{$tag_rev}}, $tag_name);
|
|---|
| 384 | }
|
|---|
| 385 | }
|
|---|
| 386 | }
|
|---|
| 387 | # End of code for collecting tag names.
|
|---|
| 388 |
|
|---|
| 389 | # If have file name, but not revision, and see revision, then grab
|
|---|
| 390 | # it. (We collect unconditionally, even though we may or may not
|
|---|
| 391 | # ever use it.)
|
|---|
| 392 | if ((! (defined $revision)) and (/^revision (\d+\.[\d.]+)/))
|
|---|
| 393 | {
|
|---|
| 394 | $revision = $1;
|
|---|
| 395 |
|
|---|
| 396 | if (@Follow_Branches)
|
|---|
| 397 | {
|
|---|
| 398 | foreach my $branch (@Follow_Branches)
|
|---|
| 399 | {
|
|---|
| 400 | # Special case for following trunk revisions
|
|---|
| 401 | if (($branch =~ /^trunk$/i) and ($revision =~ /^[0-9]+\.[0-9]+$/))
|
|---|
| 402 | {
|
|---|
| 403 | goto dengo;
|
|---|
| 404 | }
|
|---|
| 405 |
|
|---|
| 406 | my $branch_number = $branch_numbers{$branch};
|
|---|
| 407 | if ($branch_number)
|
|---|
| 408 | {
|
|---|
| 409 | # Are we on one of the follow branches or an ancestor of
|
|---|
| 410 | # same?
|
|---|
| 411 | #
|
|---|
| 412 | # If this revision is a prefix of the branch number, or
|
|---|
| 413 | # possibly is less in the minormost number, OR if this
|
|---|
| 414 | # branch number is a prefix of the revision, then yes.
|
|---|
| 415 | # Otherwise, no.
|
|---|
| 416 | #
|
|---|
| 417 | # So below, we determine if any of those conditions are
|
|---|
| 418 | # met.
|
|---|
| 419 |
|
|---|
| 420 | # Trivial case: is this revision on the branch?
|
|---|
| 421 | # (Compare this way to avoid regexps that screw up Emacs
|
|---|
| 422 | # indentation, argh.)
|
|---|
| 423 | if ((substr ($revision, 0, ((length ($branch_number)) + 1)))
|
|---|
| 424 | eq ($branch_number . "."))
|
|---|
| 425 | {
|
|---|
| 426 | goto dengo;
|
|---|
| 427 | }
|
|---|
| 428 | # Non-trivial case: check if rev is ancestral to branch
|
|---|
| 429 | elsif ((length ($branch_number)) > (length ($revision)))
|
|---|
| 430 | {
|
|---|
| 431 | $revision =~ /^((?:\d+\.)+)(\d+)$/;
|
|---|
| 432 | my $r_left = $1; # still has the trailing "."
|
|---|
| 433 | my $r_end = $2;
|
|---|
| 434 |
|
|---|
| 435 | $branch_number =~ /^((?:\d+\.)+)(\d+)\.\d+$/;
|
|---|
| 436 | my $b_left = $1; # still has trailing "."
|
|---|
| 437 | my $b_mid = $2; # has no trailing "."
|
|---|
| 438 |
|
|---|
| 439 | if (($r_left eq $b_left)
|
|---|
| 440 | && ($r_end <= $b_mid))
|
|---|
| 441 | {
|
|---|
| 442 | goto dengo;
|
|---|
| 443 | }
|
|---|
| 444 | }
|
|---|
| 445 | }
|
|---|
| 446 | }
|
|---|
| 447 | }
|
|---|
| 448 | else # (! @Follow_Branches)
|
|---|
| 449 | {
|
|---|
| 450 | next;
|
|---|
| 451 | }
|
|---|
| 452 |
|
|---|
| 453 | # Else we are following branches, but this revision isn't on the
|
|---|
| 454 | # path. So skip it.
|
|---|
| 455 | undef $revision;
|
|---|
| 456 | dengo:
|
|---|
| 457 | next;
|
|---|
| 458 | }
|
|---|
| 459 |
|
|---|
| 460 | # If we don't have a revision right now, we couldn't possibly
|
|---|
| 461 | # be looking at anything useful.
|
|---|
| 462 | if (! (defined ($revision))) {
|
|---|
| 463 | $detected_file_separator = /^$file_separator$/o;
|
|---|
| 464 | if ($detected_file_separator) {
|
|---|
| 465 | # No revisions for this file; can happen, e.g. "cvs log -d DATE"
|
|---|
| 466 | goto CLEAR;
|
|---|
| 467 | }
|
|---|
| 468 | else {
|
|---|
| 469 | next;
|
|---|
| 470 | }
|
|---|
| 471 | }
|
|---|
| 472 |
|
|---|
| 473 | # If have file name but not date and author, and see date or
|
|---|
| 474 | # author, then grab them:
|
|---|
| 475 | unless (defined $time)
|
|---|
| 476 | {
|
|---|
| 477 | if (/^date: .*/)
|
|---|
| 478 | {
|
|---|
| 479 | ($time, $author) = &parse_date_and_author ($_);
|
|---|
| 480 | if (defined ($usermap{$author}) and $usermap{$author}) {
|
|---|
| 481 | $author = $usermap{$author};
|
|---|
| 482 | }
|
|---|
| 483 | }
|
|---|
| 484 | else {
|
|---|
| 485 | $detected_file_separator = /^$file_separator$/o;
|
|---|
| 486 | if ($detected_file_separator) {
|
|---|
| 487 | # No revisions for this file; can happen, e.g. "cvs log -d DATE"
|
|---|
| 488 | goto CLEAR;
|
|---|
| 489 | }
|
|---|
| 490 | }
|
|---|
| 491 | # If the date/time/author hasn't been found yet, we couldn't
|
|---|
| 492 | # possibly care about anything we see. So skip:
|
|---|
| 493 | next;
|
|---|
| 494 | }
|
|---|
| 495 |
|
|---|
| 496 | # A "branches: ..." line here indicates that one or more branches
|
|---|
| 497 | # are rooted at this revision. If we're showing branches, then we
|
|---|
| 498 | # want to show that fact as well, so we collect all the branches
|
|---|
| 499 | # that this is the latest ancestor of and store them in
|
|---|
| 500 | # @branch_roots. Just for reference, the format of the line we're
|
|---|
| 501 | # seeing at this point is:
|
|---|
| 502 | #
|
|---|
| 503 | # branches: 1.5.2; 1.5.4; ...;
|
|---|
| 504 | #
|
|---|
| 505 | # Okay, here goes:
|
|---|
| 506 |
|
|---|
| 507 | if (/^branches:\s+(.*);$/)
|
|---|
| 508 | {
|
|---|
| 509 | if ($Show_Branches)
|
|---|
| 510 | {
|
|---|
| 511 | my $lst = $1;
|
|---|
| 512 | $lst =~ s/(1\.)+1;|(1\.)+1$//; # ignore the trivial branch 1.1.1
|
|---|
| 513 | if ($lst) {
|
|---|
| 514 | @branch_roots = split (/;\s+/, $lst);
|
|---|
| 515 | }
|
|---|
| 516 | else {
|
|---|
| 517 | undef @branch_roots;
|
|---|
| 518 | }
|
|---|
| 519 | next;
|
|---|
| 520 | }
|
|---|
| 521 | else
|
|---|
| 522 | {
|
|---|
| 523 | # Ugh. This really bothers me. Suppose we see a log entry
|
|---|
| 524 | # like this:
|
|---|
| 525 | #
|
|---|
| 526 | # ----------------------------
|
|---|
| 527 | # revision 1.1
|
|---|
| 528 | # date: 1999/10/17 03:07:38; author: jrandom; state: Exp;
|
|---|
| 529 | # branches: 1.1.2;
|
|---|
| 530 | # Intended first line of log message begins here.
|
|---|
| 531 | # ----------------------------
|
|---|
| 532 | #
|
|---|
| 533 | # The question is, how we can tell the difference between that
|
|---|
| 534 | # log message and a *two*-line log message whose first line is
|
|---|
| 535 | #
|
|---|
| 536 | # "branches: 1.1.2;"
|
|---|
| 537 | #
|
|---|
| 538 | # See the problem? The output of "cvs log" is inherently
|
|---|
| 539 | # ambiguous.
|
|---|
| 540 | #
|
|---|
| 541 | # For now, we punt: we liberally assume that people don't
|
|---|
| 542 | # write log messages like that, and just toss a "branches:"
|
|---|
| 543 | # line if we see it but are not showing branches. I hope no
|
|---|
| 544 | # one ever loses real log data because of this.
|
|---|
| 545 | next;
|
|---|
| 546 | }
|
|---|
| 547 | }
|
|---|
| 548 |
|
|---|
| 549 | # If have file name, time, and author, then we're just grabbing
|
|---|
| 550 | # log message texts:
|
|---|
| 551 | $detected_file_separator = /^$file_separator$/o;
|
|---|
| 552 | if ($detected_file_separator && ! (defined $revision)) {
|
|---|
| 553 | # No revisions for this file; can happen, e.g. "cvs log -d DATE"
|
|---|
| 554 | goto CLEAR;
|
|---|
| 555 | }
|
|---|
| 556 | unless ($detected_file_separator || /^$logmsg_separator$/o)
|
|---|
| 557 | {
|
|---|
| 558 | $msg_txt .= $_; # Normally, just accumulate the message...
|
|---|
| 559 | next;
|
|---|
| 560 | }
|
|---|
| 561 | # ... until a msg separator is encountered:
|
|---|
| 562 | # Ensure the message contains something:
|
|---|
| 563 | if ((! $msg_txt)
|
|---|
| 564 | || ($msg_txt =~ /^\s*\.\s*$|^\s*$/)
|
|---|
| 565 | || ($msg_txt =~ /\*\*\* empty log message \*\*\*/))
|
|---|
| 566 | {
|
|---|
| 567 | if ($Prune_Empty_Msgs) {
|
|---|
| 568 | goto CLEAR;
|
|---|
| 569 | }
|
|---|
| 570 | # else
|
|---|
| 571 | $msg_txt = "[no log message]\n";
|
|---|
| 572 | }
|
|---|
| 573 |
|
|---|
| 574 | ### Store it all in the Grand Poobah:
|
|---|
| 575 | {
|
|---|
| 576 | my $dir_key; # key into %grand_poobah
|
|---|
| 577 | my %qunk; # complicated little jobbie, see below
|
|---|
| 578 |
|
|---|
| 579 | # Each revision of a file has a little data structure (a `qunk')
|
|---|
| 580 | # associated with it. That data structure holds not only the
|
|---|
| 581 | # file's name, but any additional information about the file
|
|---|
| 582 | # that might be needed in the output, such as the revision
|
|---|
| 583 | # number, tags, branches, etc. The reason to have these things
|
|---|
| 584 | # arranged in a data structure, instead of just appending them
|
|---|
| 585 | # textually to the file's name, is that we may want to do a
|
|---|
| 586 | # little rearranging later as we write the output. For example,
|
|---|
| 587 | # all the files on a given tag/branch will go together, followed
|
|---|
| 588 | # by the tag in parentheses (so trunk or otherwise non-tagged
|
|---|
| 589 | # files would go at the end of the file list for a given log
|
|---|
| 590 | # message). This rearrangement is a lot easier to do if we
|
|---|
| 591 | # don't have to reparse the text.
|
|---|
| 592 | #
|
|---|
| 593 | # A qunk looks like this:
|
|---|
| 594 | #
|
|---|
| 595 | # {
|
|---|
| 596 | # filename => "hello.c",
|
|---|
| 597 | # revision => "1.4.3.2",
|
|---|
| 598 | # time => a timegm() return value (moment of commit)
|
|---|
| 599 | # tags => [ "tag1", "tag2", ... ],
|
|---|
| 600 | # branch => "branchname" # There should be only one, right?
|
|---|
| 601 | # branchroots => [ "branchtag1", "branchtag2", ... ]
|
|---|
| 602 | # }
|
|---|
| 603 |
|
|---|
| 604 | if ($Distributed) {
|
|---|
| 605 | # Just the basename, don't include the path.
|
|---|
| 606 | ($qunk{'filename'}, $dir_key, undef) = fileparse ($file_full_path);
|
|---|
| 607 | }
|
|---|
| 608 | else {
|
|---|
| 609 | $dir_key = "./";
|
|---|
| 610 | $qunk{'filename'} = $file_full_path;
|
|---|
| 611 | }
|
|---|
| 612 |
|
|---|
| 613 | # This may someday be used in a more sophisticated calculation
|
|---|
| 614 | # of what other files are involved in this commit. For now, we
|
|---|
| 615 | # don't use it much except for delta mode, because the
|
|---|
| 616 | # common-commit-detection algorithm is hypothesized to be
|
|---|
| 617 | # "good enough" as it stands.
|
|---|
| 618 | $qunk{'time'} = $time;
|
|---|
| 619 |
|
|---|
| 620 | # We might be including revision numbers and/or tags and/or
|
|---|
| 621 | # branch names in the output. Most of the code from here to
|
|---|
| 622 | # loop-end deals with organizing these in qunk.
|
|---|
| 623 |
|
|---|
| 624 | $qunk{'revision'} = $revision;
|
|---|
| 625 |
|
|---|
| 626 | # Grab the branch, even though we may or may not need it:
|
|---|
| 627 | $qunk{'revision'} =~ /((?:\d+\.)+)\d+/;
|
|---|
| 628 | my $branch_prefix = $1;
|
|---|
| 629 | $branch_prefix =~ s/\.$//; # strip off final dot
|
|---|
| 630 | if ($branch_names{$branch_prefix}) {
|
|---|
| 631 | $qunk{'branch'} = $branch_names{$branch_prefix};
|
|---|
| 632 | }
|
|---|
| 633 |
|
|---|
| 634 | # If there's anything in the @branch_roots array, then this
|
|---|
| 635 | # revision is the root of at least one branch. We'll display
|
|---|
| 636 | # them as branch names instead of revision numbers, the
|
|---|
| 637 | # substitution for which is done directly in the array:
|
|---|
| 638 | if (@branch_roots) {
|
|---|
| 639 | my @roots = map { $branch_names{$_} } @branch_roots;
|
|---|
| 640 | $qunk{'branchroots'} = \@roots;
|
|---|
| 641 | }
|
|---|
| 642 |
|
|---|
| 643 | # Save tags too.
|
|---|
| 644 | if (defined ($symbolic_names{$revision})) {
|
|---|
| 645 | $qunk{'tags'} = $symbolic_names{$revision};
|
|---|
| 646 | delete $symbolic_names{$revision};
|
|---|
| 647 |
|
|---|
| 648 | # If we're in 'delta' mode, update the latest observed
|
|---|
| 649 | # times for the beginning and ending tags, and
|
|---|
| 650 | # when we get around to printing output, we will simply restrict
|
|---|
| 651 | # ourselves to that timeframe...
|
|---|
| 652 |
|
|---|
| 653 | if ($Delta_Mode) {
|
|---|
| 654 | if (($time > $Delta_StartTime) &&
|
|---|
| 655 | (grep { $_ eq $Delta_From } @{$qunk{'tags'}}))
|
|---|
| 656 | {
|
|---|
| 657 | $Delta_StartTime = $time;
|
|---|
| 658 | }
|
|---|
| 659 |
|
|---|
| 660 | if (($time > $Delta_EndTime) &&
|
|---|
| 661 | (grep { $_ eq $Delta_To } @{$qunk{'tags'}}))
|
|---|
| 662 | {
|
|---|
| 663 | $Delta_EndTime = $time;
|
|---|
| 664 | }
|
|---|
| 665 | }
|
|---|
| 666 | }
|
|---|
| 667 |
|
|---|
| 668 | # Add this file to the list
|
|---|
| 669 | # (We use many spoonfuls of autovivication magic. Hashes and arrays
|
|---|
| 670 | # will spring into existence if they aren't there already.)
|
|---|
| 671 |
|
|---|
| 672 | &debug ("(pushing log msg for ${dir_key}$qunk{'filename'})\n");
|
|---|
| 673 |
|
|---|
| 674 | # Store with the files in this commit. Later we'll loop through
|
|---|
| 675 | # again, making sure that revisions with the same log message
|
|---|
| 676 | # and nearby commit times are grouped together as one commit.
|
|---|
| 677 | push (@{$grand_poobah{$dir_key}{$author}{$time}{$msg_txt}}, \%qunk);
|
|---|
| 678 | }
|
|---|
| 679 |
|
|---|
| 680 | CLEAR:
|
|---|
| 681 | # Make way for the next message
|
|---|
| 682 | undef $msg_txt;
|
|---|
| 683 | undef $time;
|
|---|
| 684 | undef $revision;
|
|---|
| 685 | undef $author;
|
|---|
| 686 | undef @branch_roots;
|
|---|
| 687 |
|
|---|
| 688 | # Maybe even make way for the next file:
|
|---|
| 689 | if ($detected_file_separator) {
|
|---|
| 690 | undef $file_full_path;
|
|---|
| 691 | undef %branch_names;
|
|---|
| 692 | undef %branch_numbers;
|
|---|
| 693 | undef %symbolic_names;
|
|---|
| 694 | }
|
|---|
| 695 | }
|
|---|
| 696 |
|
|---|
| 697 | close (LOG_SOURCE);
|
|---|
| 698 |
|
|---|
| 699 | ### Process each ChangeLog
|
|---|
| 700 |
|
|---|
| 701 | while (my ($dir,$authorhash) = each %grand_poobah)
|
|---|
| 702 | {
|
|---|
| 703 | &debug ("DOING DIR: $dir\n");
|
|---|
| 704 |
|
|---|
| 705 | # Here we twist our hash around, from being
|
|---|
| 706 | # author => time => message => filelist
|
|---|
| 707 | # in %$authorhash to
|
|---|
| 708 | # time => author => message => filelist
|
|---|
| 709 | # in %changelog.
|
|---|
| 710 | #
|
|---|
| 711 | # This is also where we merge entries. The algorithm proceeds
|
|---|
| 712 | # through the timeline of the changelog with a sliding window of
|
|---|
| 713 | # $Max_Checkin_Duration seconds; within that window, entries that
|
|---|
| 714 | # have the same log message are merged.
|
|---|
| 715 | #
|
|---|
| 716 | # (To save space, we zap %$authorhash after we've copied
|
|---|
| 717 | # everything out of it.)
|
|---|
| 718 |
|
|---|
| 719 | my %changelog;
|
|---|
| 720 | while (my ($author,$timehash) = each %$authorhash)
|
|---|
| 721 | {
|
|---|
| 722 | my $lasttime;
|
|---|
| 723 | my %stamptime;
|
|---|
| 724 | foreach my $time (sort {$main::a <=> $main::b} (keys %$timehash))
|
|---|
| 725 | {
|
|---|
| 726 | my $msghash = $timehash->{$time};
|
|---|
| 727 | while (my ($msg,$qunklist) = each %$msghash)
|
|---|
| 728 | {
|
|---|
| 729 | my $stamptime = $stamptime{$msg};
|
|---|
| 730 | if ((defined $stamptime)
|
|---|
| 731 | and (($time - $stamptime) < $Max_Checkin_Duration)
|
|---|
| 732 | and (defined $changelog{$stamptime}{$author}{$msg}))
|
|---|
| 733 | {
|
|---|
| 734 | push(@{$changelog{$stamptime}{$author}{$msg}}, @$qunklist);
|
|---|
| 735 | }
|
|---|
| 736 | else {
|
|---|
| 737 | $changelog{$time}{$author}{$msg} = $qunklist;
|
|---|
| 738 | $stamptime{$msg} = $time;
|
|---|
| 739 | }
|
|---|
| 740 | }
|
|---|
| 741 | }
|
|---|
| 742 | }
|
|---|
| 743 | undef (%$authorhash);
|
|---|
| 744 |
|
|---|
| 745 | ### Now we can write out the ChangeLog!
|
|---|
| 746 |
|
|---|
| 747 | my ($logfile_here, $logfile_bak, $tmpfile);
|
|---|
| 748 |
|
|---|
| 749 | if (! $Output_To_Stdout) {
|
|---|
| 750 | $logfile_here = $dir . $Log_File_Name;
|
|---|
| 751 | $logfile_here =~ s/^\.\/\//\//; # fix any leading ".//" problem
|
|---|
| 752 | $tmpfile = "${logfile_here}.cvs2cl$$.tmp";
|
|---|
| 753 | $logfile_bak = "${logfile_here}.bak";
|
|---|
| 754 |
|
|---|
| 755 | open (LOG_OUT, ">$tmpfile") or die "Unable to open \"$tmpfile\"";
|
|---|
| 756 | }
|
|---|
| 757 | else {
|
|---|
| 758 | open (LOG_OUT, ">-") or die "Unable to open stdout for writing";
|
|---|
| 759 | }
|
|---|
| 760 |
|
|---|
| 761 | print LOG_OUT $ChangeLog_Header;
|
|---|
| 762 |
|
|---|
| 763 | if ($XML_Output) {
|
|---|
| 764 | my $encoding =
|
|---|
| 765 | length $XML_Encoding ? qq'encoding="$XML_Encoding"' : '';
|
|---|
| 766 | my $version = 'version="1.0"';
|
|---|
| 767 | my $declaration =
|
|---|
| 768 | sprintf '<?xml %s?>', join ' ', grep length, $version, $encoding;
|
|---|
| 769 | my $root =
|
|---|
| 770 | '<changelog xmlns="http://www.red-bean.com/xmlns/cvs2cl/">';
|
|---|
| 771 | print LOG_OUT "$declaration\n\n$root\n\n";
|
|---|
| 772 | }
|
|---|
| 773 |
|
|---|
| 774 | foreach my $time (sort {$main::b <=> $main::a} (keys %changelog))
|
|---|
| 775 | {
|
|---|
| 776 | next if ($Delta_Mode &&
|
|---|
| 777 | (($time <= $Delta_StartTime) ||
|
|---|
| 778 | ($time > $Delta_EndTime && $Delta_EndTime)));
|
|---|
| 779 |
|
|---|
| 780 | # Set up the date/author line.
|
|---|
| 781 | # kff todo: do some more XML munging here, on the header
|
|---|
| 782 | # part of the entry:
|
|---|
| 783 | my ($ignore,$min,$hour,$mday,$mon,$year,$wday)
|
|---|
| 784 | = $UTC_Times ? gmtime($time) : localtime($time);
|
|---|
| 785 |
|
|---|
| 786 | # XML output includes everything else, we might as well make
|
|---|
| 787 | # it always include Day Of Week too, for consistency.
|
|---|
| 788 | if ($Show_Day_Of_Week or $XML_Output) {
|
|---|
| 789 | $wday = ("Sunday", "Monday", "Tuesday", "Wednesday",
|
|---|
| 790 | "Thursday", "Friday", "Saturday")[$wday];
|
|---|
| 791 | $wday = ($XML_Output) ? "<weekday>${wday}</weekday>\n" : " $wday";
|
|---|
| 792 | }
|
|---|
| 793 | else {
|
|---|
| 794 | $wday = "";
|
|---|
| 795 | }
|
|---|
| 796 |
|
|---|
| 797 | my $authorhash = $changelog{$time};
|
|---|
| 798 | if ($Show_Tag_Dates) {
|
|---|
| 799 | my %tags;
|
|---|
| 800 | while (my ($author,$mesghash) = each %$authorhash) {
|
|---|
| 801 | while (my ($msg,$qunk) = each %$mesghash) {
|
|---|
| 802 | foreach my $qunkref2 (@$qunk) {
|
|---|
| 803 | if (defined ($$qunkref2{'tags'})) {
|
|---|
| 804 | foreach my $tag (@{$$qunkref2{'tags'}}) {
|
|---|
| 805 | $tags{$tag} = 1;
|
|---|
| 806 | }
|
|---|
| 807 | }
|
|---|
| 808 | }
|
|---|
| 809 | }
|
|---|
| 810 | }
|
|---|
| 811 | foreach my $tag (keys %tags) {
|
|---|
| 812 | if (!defined $tag_date_printed{$tag}) {
|
|---|
| 813 | $tag_date_printed{$tag} = $time;
|
|---|
| 814 | if ($XML_Output) {
|
|---|
| 815 | # NOT YET DONE
|
|---|
| 816 | }
|
|---|
| 817 | else {
|
|---|
| 818 | printf LOG_OUT ("%4u-%02u-%02u${wday} %02u:%02u tag %s\n\n",
|
|---|
| 819 | $year+1900, $mon+1, $mday, $hour, $min, $tag);
|
|---|
| 820 | }
|
|---|
| 821 | }
|
|---|
| 822 | }
|
|---|
| 823 | }
|
|---|
| 824 | while (my ($author,$mesghash) = each %$authorhash)
|
|---|
| 825 | {
|
|---|
| 826 | # If XML, escape in outer loop to avoid compound quoting:
|
|---|
| 827 | if ($XML_Output) {
|
|---|
| 828 | $author = &xml_escape ($author);
|
|---|
| 829 | }
|
|---|
| 830 |
|
|---|
| 831 | FOOBIE:
|
|---|
| 832 | while (my ($msg,$qunklist) = each %$mesghash)
|
|---|
| 833 | {
|
|---|
| 834 | ## MJP: 19.xii.01 : Exclude @ignore_tags
|
|---|
| 835 | for my $ignore_tag (@ignore_tags) {
|
|---|
| 836 | next FOOBIE
|
|---|
| 837 | if grep $_ eq $ignore_tag, map(@{$_->{tags}},
|
|---|
| 838 | grep(defined $_->{tags},
|
|---|
| 839 | @$qunklist));
|
|---|
| 840 | }
|
|---|
| 841 | ## MJP: 19.xii.01 : End exclude @ignore_tags
|
|---|
| 842 |
|
|---|
| 843 | my $files = &pretty_file_list ($qunklist);
|
|---|
| 844 | my $header_line; # date and author
|
|---|
| 845 | my $body; # see below
|
|---|
| 846 | my $wholething; # $header_line + $body
|
|---|
| 847 |
|
|---|
| 848 | if ($XML_Output) {
|
|---|
| 849 | $header_line =
|
|---|
| 850 | sprintf ("<date>%4u-%02u-%02u</date>\n"
|
|---|
| 851 | . "${wday}"
|
|---|
| 852 | . "<time>%02u:%02u</time>\n"
|
|---|
| 853 | . "<author>%s</author>\n",
|
|---|
| 854 | $year+1900, $mon+1, $mday, $hour, $min, $author);
|
|---|
| 855 | }
|
|---|
| 856 | else {
|
|---|
| 857 | $header_line =
|
|---|
| 858 | sprintf ("%4u-%02u-%02u${wday} %02u:%02u %s\n\n",
|
|---|
| 859 | $year+1900, $mon+1, $mday, $hour, $min, $author);
|
|---|
| 860 | }
|
|---|
| 861 |
|
|---|
| 862 | $Text::Wrap::huge = 'overflow'
|
|---|
| 863 | if $Text::Wrap::VERSION >= 2001.0130;
|
|---|
| 864 | # Reshape the body according to user preferences.
|
|---|
| 865 | if ($XML_Output)
|
|---|
| 866 | {
|
|---|
| 867 | $msg = &preprocess_msg_text ($msg);
|
|---|
| 868 | $body = $files . $msg;
|
|---|
| 869 | }
|
|---|
| 870 | elsif ($No_Wrap)
|
|---|
| 871 | {
|
|---|
| 872 | $msg = &preprocess_msg_text ($msg);
|
|---|
| 873 | $files = wrap ("\t", " ", "$files");
|
|---|
| 874 | $msg =~ s/\n(.*)/\n\t$1/g;
|
|---|
| 875 | unless ($After_Header eq " ") {
|
|---|
| 876 | $msg =~ s/^(.*)/\t$1/g;
|
|---|
| 877 | }
|
|---|
| 878 | $body = $files . $After_Header . $msg;
|
|---|
| 879 | }
|
|---|
| 880 | else # do wrapping, either FSF-style or regular
|
|---|
| 881 | {
|
|---|
| 882 | if ($FSF_Style)
|
|---|
| 883 | {
|
|---|
| 884 | $files = wrap ("\t", " ", "$files");
|
|---|
| 885 |
|
|---|
| 886 | my $files_last_line_len = 0;
|
|---|
| 887 | if ($After_Header eq " ")
|
|---|
| 888 | {
|
|---|
| 889 | $files_last_line_len = &last_line_len ($files);
|
|---|
| 890 | $files_last_line_len += 1; # for $After_Header
|
|---|
| 891 | }
|
|---|
| 892 |
|
|---|
| 893 | $msg = &wrap_log_entry
|
|---|
| 894 | ($msg, "\t", 69 - $files_last_line_len, 69);
|
|---|
| 895 | $body = $files . $After_Header . $msg;
|
|---|
| 896 | }
|
|---|
| 897 | else # not FSF-style
|
|---|
| 898 | {
|
|---|
| 899 | $msg = &preprocess_msg_text ($msg);
|
|---|
| 900 | $body = $files . $After_Header . $msg;
|
|---|
| 901 | $body = wrap ("\t", " ", "$body");
|
|---|
| 902 | }
|
|---|
| 903 | }
|
|---|
| 904 |
|
|---|
| 905 | $wholething = $header_line . $body;
|
|---|
| 906 |
|
|---|
| 907 | if ($XML_Output) {
|
|---|
| 908 | $wholething = "<entry>\n${wholething}</entry>\n";
|
|---|
| 909 | }
|
|---|
| 910 |
|
|---|
| 911 | # One last check: make sure it passes the regexp test, if the
|
|---|
| 912 | # user asked for that. We have to do it here, so that the
|
|---|
| 913 | # test can match against information in the header as well
|
|---|
| 914 | # as in the text of the log message.
|
|---|
| 915 |
|
|---|
| 916 | # How annoying to duplicate so much code just because I
|
|---|
| 917 | # can't figure out a way to evaluate scalars on the trailing
|
|---|
| 918 | # operator portion of a regular expression. Grrr.
|
|---|
| 919 | if ($Case_Insensitive) {
|
|---|
| 920 | unless ($Regexp_Gate && ($wholething !~ /$Regexp_Gate/oi)) {
|
|---|
| 921 | print LOG_OUT "${wholething}\n";
|
|---|
| 922 | }
|
|---|
| 923 | }
|
|---|
| 924 | else {
|
|---|
| 925 | unless ($Regexp_Gate && ($wholething !~ /$Regexp_Gate/o)) {
|
|---|
| 926 | print LOG_OUT "${wholething}\n";
|
|---|
| 927 | }
|
|---|
| 928 | }
|
|---|
| 929 | }
|
|---|
| 930 | }
|
|---|
| 931 | }
|
|---|
| 932 |
|
|---|
| 933 | if ($XML_Output) {
|
|---|
| 934 | print LOG_OUT "</changelog>\n";
|
|---|
| 935 | }
|
|---|
| 936 |
|
|---|
| 937 | close (LOG_OUT);
|
|---|
| 938 |
|
|---|
| 939 | if (! $Output_To_Stdout)
|
|---|
| 940 | {
|
|---|
| 941 | # If accumulating, append old data to new before renaming. But
|
|---|
| 942 | # don't append the most recent entry, since it's already in the
|
|---|
| 943 | # new log due to CVS's idiosyncratic interpretation of "log -d".
|
|---|
| 944 | if ($Cumulative && -f $logfile_here)
|
|---|
| 945 | {
|
|---|
| 946 | open (NEW_LOG, ">>$tmpfile")
|
|---|
| 947 | or die "trouble appending to $tmpfile ($!)";
|
|---|
| 948 |
|
|---|
| 949 | open (OLD_LOG, "<$logfile_here")
|
|---|
| 950 | or die "trouble reading from $logfile_here ($!)";
|
|---|
| 951 |
|
|---|
| 952 | my $started_first_entry = 0;
|
|---|
| 953 | my $passed_first_entry = 0;
|
|---|
| 954 | while (<OLD_LOG>)
|
|---|
| 955 | {
|
|---|
| 956 | if (! $passed_first_entry)
|
|---|
| 957 | {
|
|---|
| 958 | if ((! $started_first_entry)
|
|---|
| 959 | && /^(\d\d\d\d-\d\d-\d\d\s+\d\d:\d\d)/) {
|
|---|
| 960 | $started_first_entry = 1;
|
|---|
| 961 | }
|
|---|
| 962 | elsif (/^(\d\d\d\d-\d\d-\d\d\s+\d\d:\d\d)/) {
|
|---|
| 963 | $passed_first_entry = 1;
|
|---|
| 964 | print NEW_LOG $_;
|
|---|
| 965 | }
|
|---|
| 966 | }
|
|---|
| 967 | else {
|
|---|
| 968 | print NEW_LOG $_;
|
|---|
| 969 | }
|
|---|
| 970 | }
|
|---|
| 971 |
|
|---|
| 972 | close (NEW_LOG);
|
|---|
| 973 | close (OLD_LOG);
|
|---|
| 974 | }
|
|---|
| 975 |
|
|---|
| 976 | if (-f $logfile_here) {
|
|---|
| 977 | rename ($logfile_here, $logfile_bak);
|
|---|
| 978 | }
|
|---|
| 979 | rename ($tmpfile, $logfile_here);
|
|---|
| 980 | }
|
|---|
| 981 | }
|
|---|
| 982 | }
|
|---|
| 983 |
|
|---|
| 984 | sub parse_date_and_author ()
|
|---|
| 985 | {
|
|---|
| 986 | # Parses the date/time and author out of a line like:
|
|---|
| 987 | #
|
|---|
| 988 | # date: 1999/02/19 23:29:05; author: apharris; state: Exp;
|
|---|
| 989 |
|
|---|
| 990 | my $line = shift;
|
|---|
| 991 |
|
|---|
| 992 | my ($year, $mon, $mday, $hours, $min, $secs, $author) = $line =~
|
|---|
| 993 | m#(\d+)/(\d+)/(\d+)\s+(\d+):(\d+):(\d+);\s+author:\s+([^;]+);#
|
|---|
| 994 | or die "Couldn't parse date ``$line''";
|
|---|
| 995 | die "Bad date or Y2K issues" unless ($year > 1969 and $year < 2258);
|
|---|
| 996 | # Kinda arbitrary, but useful as a sanity check
|
|---|
| 997 | my $time = timegm($secs,$min,$hours,$mday,$mon-1,$year-1900);
|
|---|
| 998 |
|
|---|
| 999 | return ($time, $author);
|
|---|
| 1000 | }
|
|---|
| 1001 |
|
|---|
| 1002 | # Here we take a bunch of qunks and convert them into printed
|
|---|
| 1003 | # summary that will include all the information the user asked for.
|
|---|
| 1004 | sub pretty_file_list ()
|
|---|
| 1005 | {
|
|---|
| 1006 | if ($Hide_Filenames and (! $XML_Output)) {
|
|---|
| 1007 | return "";
|
|---|
| 1008 | }
|
|---|
| 1009 |
|
|---|
| 1010 | my $qunksref = shift;
|
|---|
| 1011 | my @qunkrefs = @$qunksref;
|
|---|
| 1012 | my @filenames;
|
|---|
| 1013 | my $beauty = ""; # The accumulating header string for this entry.
|
|---|
| 1014 | my %non_unanimous_tags; # Tags found in a proper subset of qunks
|
|---|
| 1015 | my %unanimous_tags; # Tags found in all qunks
|
|---|
| 1016 | my %all_branches; # Branches found in any qunk
|
|---|
| 1017 | my $common_dir = undef; # Dir prefix common to all files ("" if none)
|
|---|
| 1018 | my $fbegun = 0; # Did we begin printing filenames yet?
|
|---|
| 1019 |
|
|---|
| 1020 | # First, loop over the qunks gathering all the tag/branch names.
|
|---|
| 1021 | # We'll put them all in non_unanimous_tags, and take out the
|
|---|
| 1022 | # unanimous ones later.
|
|---|
| 1023 | QUNKREF:
|
|---|
| 1024 | foreach my $qunkref (@qunkrefs)
|
|---|
| 1025 | {
|
|---|
| 1026 | ## MJP: 19.xii.01 : Exclude @ignore_tags
|
|---|
| 1027 | for my $ignore_tag (@ignore_tags) {
|
|---|
| 1028 | next QUNKREF
|
|---|
| 1029 | if grep $_ eq $ignore_tag, @{$$qunkref{'tags'}};
|
|---|
| 1030 | }
|
|---|
| 1031 | ## MJP: 19.xii.01 : End exclude @ignore_tags
|
|---|
| 1032 |
|
|---|
| 1033 | # Keep track of whether all the files in this commit were in the
|
|---|
| 1034 | # same directory, and memorize it if so. We can make the output a
|
|---|
| 1035 | # little more compact by mentioning the directory only once.
|
|---|
| 1036 | if ((scalar (@qunkrefs)) > 1)
|
|---|
| 1037 | {
|
|---|
| 1038 | if (! (defined ($common_dir)))
|
|---|
| 1039 | {
|
|---|
| 1040 | my ($base, $dir);
|
|---|
| 1041 | ($base, $dir, undef) = fileparse ($$qunkref{'filename'});
|
|---|
| 1042 |
|
|---|
| 1043 | if ((! (defined ($dir))) # this first case is sheer paranoia
|
|---|
| 1044 | or ($dir eq "")
|
|---|
| 1045 | or ($dir eq "./")
|
|---|
| 1046 | or ($dir eq ".\\"))
|
|---|
| 1047 | {
|
|---|
| 1048 | $common_dir = "";
|
|---|
| 1049 | }
|
|---|
| 1050 | else
|
|---|
| 1051 | {
|
|---|
| 1052 | $common_dir = $dir;
|
|---|
| 1053 | }
|
|---|
| 1054 | }
|
|---|
| 1055 | elsif ($common_dir ne "")
|
|---|
| 1056 | {
|
|---|
| 1057 | # Already have a common dir prefix, so how much of it can we preserve?
|
|---|
| 1058 | $common_dir = &common_path_prefix ($$qunkref{'filename'}, $common_dir);
|
|---|
| 1059 | }
|
|---|
| 1060 | }
|
|---|
| 1061 | else # only one file in this entry anyway, so common dir not an issue
|
|---|
| 1062 | {
|
|---|
| 1063 | $common_dir = "";
|
|---|
| 1064 | }
|
|---|
| 1065 |
|
|---|
| 1066 | if (defined ($$qunkref{'branch'})) {
|
|---|
| 1067 | $all_branches{$$qunkref{'branch'}} = 1;
|
|---|
| 1068 | }
|
|---|
| 1069 | if (defined ($$qunkref{'tags'})) {
|
|---|
| 1070 | foreach my $tag (@{$$qunkref{'tags'}}) {
|
|---|
| 1071 | $non_unanimous_tags{$tag} = 1;
|
|---|
| 1072 | }
|
|---|
| 1073 | }
|
|---|
| 1074 | }
|
|---|
| 1075 |
|
|---|
| 1076 | # Any tag held by all qunks will be printed specially... but only if
|
|---|
| 1077 | # there are multiple qunks in the first place!
|
|---|
| 1078 | if ((scalar (@qunkrefs)) > 1) {
|
|---|
| 1079 | foreach my $tag (keys (%non_unanimous_tags)) {
|
|---|
| 1080 | my $everyone_has_this_tag = 1;
|
|---|
| 1081 | foreach my $qunkref (@qunkrefs) {
|
|---|
| 1082 | if ((! (defined ($$qunkref{'tags'})))
|
|---|
| 1083 | or (! (grep ($_ eq $tag, @{$$qunkref{'tags'}})))) {
|
|---|
| 1084 | $everyone_has_this_tag = 0;
|
|---|
| 1085 | }
|
|---|
| 1086 | }
|
|---|
| 1087 | if ($everyone_has_this_tag) {
|
|---|
| 1088 | $unanimous_tags{$tag} = 1;
|
|---|
| 1089 | delete $non_unanimous_tags{$tag};
|
|---|
| 1090 | }
|
|---|
| 1091 | }
|
|---|
| 1092 | }
|
|---|
| 1093 |
|
|---|
| 1094 | if ($XML_Output)
|
|---|
| 1095 | {
|
|---|
| 1096 | # If outputting XML, then our task is pretty simple, because we
|
|---|
| 1097 | # don't have to detect common dir, common tags, branch prefixing,
|
|---|
| 1098 | # etc. We just output exactly what we have, and don't worry about
|
|---|
| 1099 | # redundancy or readability.
|
|---|
| 1100 |
|
|---|
| 1101 | foreach my $qunkref (@qunkrefs)
|
|---|
| 1102 | {
|
|---|
| 1103 | my $filename = $$qunkref{'filename'};
|
|---|
| 1104 | my $revision = $$qunkref{'revision'};
|
|---|
| 1105 | my $tags = $$qunkref{'tags'};
|
|---|
| 1106 | my $branch = $$qunkref{'branch'};
|
|---|
| 1107 | my $branchroots = $$qunkref{'branchroots'};
|
|---|
| 1108 |
|
|---|
| 1109 | $filename = &xml_escape ($filename); # probably paranoia
|
|---|
| 1110 | $revision = &xml_escape ($revision); # definitely paranoia
|
|---|
| 1111 |
|
|---|
| 1112 | $beauty .= "<file>\n";
|
|---|
| 1113 | $beauty .= "<name>${filename}</name>\n";
|
|---|
| 1114 | $beauty .= "<revision>${revision}</revision>\n";
|
|---|
| 1115 | if ($branch) {
|
|---|
| 1116 | $branch = &xml_escape ($branch); # more paranoia
|
|---|
| 1117 | $beauty .= "<branch>${branch}</branch>\n";
|
|---|
| 1118 | }
|
|---|
| 1119 | foreach my $tag (@$tags) {
|
|---|
| 1120 | $tag = &xml_escape ($tag); # by now you're used to the paranoia
|
|---|
| 1121 | $beauty .= "<tag>${tag}</tag>\n";
|
|---|
| 1122 | }
|
|---|
| 1123 | foreach my $root (@$branchroots) {
|
|---|
| 1124 | $root = &xml_escape ($root); # which is good, because it will continue
|
|---|
| 1125 | $beauty .= "<branchroot>${root}</branchroot>\n";
|
|---|
| 1126 | }
|
|---|
| 1127 | $beauty .= "</file>\n";
|
|---|
| 1128 | }
|
|---|
| 1129 |
|
|---|
| 1130 | # Theoretically, we could go home now. But as long as we're here,
|
|---|
| 1131 | # let's print out the common_dir and utags, as a convenience to
|
|---|
| 1132 | # the receiver (after all, earlier code calculated that stuff
|
|---|
| 1133 | # anyway, so we might as well take advantage of it).
|
|---|
| 1134 |
|
|---|
| 1135 | if ((scalar (keys (%unanimous_tags))) > 1) {
|
|---|
| 1136 | foreach my $utag ((keys (%unanimous_tags))) {
|
|---|
| 1137 | $utag = &xml_escape ($utag); # the usual paranoia
|
|---|
| 1138 | $beauty .= "<utag>${utag}</utag>\n";
|
|---|
| 1139 | }
|
|---|
| 1140 | }
|
|---|
| 1141 | if ($common_dir) {
|
|---|
| 1142 | $common_dir = &xml_escape ($common_dir);
|
|---|
| 1143 | $beauty .= "<commondir>${common_dir}</commondir>\n";
|
|---|
| 1144 | }
|
|---|
| 1145 |
|
|---|
| 1146 | # That's enough for XML, time to go home:
|
|---|
| 1147 | return $beauty;
|
|---|
| 1148 | }
|
|---|
| 1149 |
|
|---|
| 1150 | # Else not XML output, so complexly compactify for chordate
|
|---|
| 1151 | # consumption. At this point we have enough global information
|
|---|
| 1152 | # about all the qunks to organize them non-redundantly for output.
|
|---|
| 1153 |
|
|---|
| 1154 | if ($common_dir) {
|
|---|
| 1155 | # Note that $common_dir still has its trailing slash
|
|---|
| 1156 | $beauty .= "$common_dir: ";
|
|---|
| 1157 | }
|
|---|
| 1158 |
|
|---|
| 1159 | if ($Show_Branches)
|
|---|
| 1160 | {
|
|---|
| 1161 | # For trailing revision numbers.
|
|---|
| 1162 | my @brevisions;
|
|---|
| 1163 |
|
|---|
| 1164 | foreach my $branch (keys (%all_branches))
|
|---|
| 1165 | {
|
|---|
| 1166 | foreach my $qunkref (@qunkrefs)
|
|---|
| 1167 | {
|
|---|
| 1168 | if ((defined ($$qunkref{'branch'}))
|
|---|
| 1169 | and ($$qunkref{'branch'} eq $branch))
|
|---|
| 1170 | {
|
|---|
| 1171 | if ($fbegun) {
|
|---|
| 1172 | # kff todo: comma-delimited in XML too? Sure.
|
|---|
| 1173 | $beauty .= ", ";
|
|---|
| 1174 | }
|
|---|
| 1175 | else {
|
|---|
| 1176 | $fbegun = 1;
|
|---|
| 1177 | }
|
|---|
| 1178 | my $fname = substr ($$qunkref{'filename'}, length ($common_dir));
|
|---|
| 1179 | $beauty .= $fname;
|
|---|
| 1180 | $$qunkref{'printed'} = 1; # Just setting a mark bit, basically
|
|---|
| 1181 |
|
|---|
| 1182 | if ($Show_Tags && (defined @{$$qunkref{'tags'}})) {
|
|---|
| 1183 | my @tags = grep ($non_unanimous_tags{$_}, @{$$qunkref{'tags'}});
|
|---|
| 1184 |
|
|---|
| 1185 | if (@tags) {
|
|---|
| 1186 | $beauty .= " (tags: ";
|
|---|
| 1187 | $beauty .= join (', ', @tags);
|
|---|
| 1188 | $beauty .= ")";
|
|---|
| 1189 | }
|
|---|
| 1190 | }
|
|---|
| 1191 |
|
|---|
| 1192 | if ($Show_Revisions) {
|
|---|
| 1193 | # Collect the revision numbers' last components, but don't
|
|---|
| 1194 | # print them -- they'll get printed with the branch name
|
|---|
| 1195 | # later.
|
|---|
| 1196 | $$qunkref{'revision'} =~ /.+\.([\d]+)$/;
|
|---|
| 1197 | push (@brevisions, $1);
|
|---|
| 1198 |
|
|---|
| 1199 | # todo: we're still collecting branch roots, but we're not
|
|---|
| 1200 | # showing them anywhere. If we do show them, it would be
|
|---|
| 1201 | # nifty to just call them revision "0" on a the branch.
|
|---|
| 1202 | # Yeah, that's the ticket.
|
|---|
| 1203 | }
|
|---|
| 1204 | }
|
|---|
| 1205 | }
|
|---|
| 1206 | $beauty .= " ($branch";
|
|---|
| 1207 | if (@brevisions) {
|
|---|
| 1208 | if ((scalar (@brevisions)) > 1) {
|
|---|
| 1209 | $beauty .= ".[";
|
|---|
| 1210 | $beauty .= (join (',', @brevisions));
|
|---|
| 1211 | $beauty .= "]";
|
|---|
| 1212 | }
|
|---|
| 1213 | else {
|
|---|
| 1214 | # Square brackets are spurious here, since there's no range to
|
|---|
| 1215 | # encapsulate
|
|---|
| 1216 | $beauty .= ".$brevisions[0]";
|
|---|
| 1217 | }
|
|---|
| 1218 | }
|
|---|
| 1219 | $beauty .= ")";
|
|---|
| 1220 | }
|
|---|
| 1221 | }
|
|---|
| 1222 |
|
|---|
| 1223 | # Okay; any qunks that were done according to branch are taken care
|
|---|
| 1224 | # of, and marked as printed. Now print everyone else.
|
|---|
| 1225 |
|
|---|
| 1226 | foreach my $qunkref (@qunkrefs)
|
|---|
| 1227 | {
|
|---|
| 1228 | next if (defined ($$qunkref{'printed'})); # skip if already printed
|
|---|
| 1229 |
|
|---|
| 1230 | if ($fbegun) {
|
|---|
| 1231 | $beauty .= ", ";
|
|---|
| 1232 | }
|
|---|
| 1233 | else {
|
|---|
| 1234 | $fbegun = 1;
|
|---|
| 1235 | }
|
|---|
| 1236 | $beauty .= substr ($$qunkref{'filename'}, length ($common_dir));
|
|---|
| 1237 | # todo: Shlomo's change was this:
|
|---|
| 1238 | # $beauty .= substr ($$qunkref{'filename'},
|
|---|
| 1239 | # (($common_dir eq "./") ? "" : length ($common_dir)));
|
|---|
| 1240 | $$qunkref{'printed'} = 1; # Set a mark bit.
|
|---|
| 1241 |
|
|---|
| 1242 | if ($Show_Revisions || $Show_Tags)
|
|---|
| 1243 | {
|
|---|
| 1244 | my $started_addendum = 0;
|
|---|
| 1245 |
|
|---|
| 1246 | if ($Show_Revisions) {
|
|---|
| 1247 | $started_addendum = 1;
|
|---|
| 1248 | $beauty .= " (";
|
|---|
| 1249 | $beauty .= "$$qunkref{'revision'}";
|
|---|
| 1250 | }
|
|---|
| 1251 | if ($Show_Tags && (defined $$qunkref{'tags'})) {
|
|---|
| 1252 | my @tags = grep ($non_unanimous_tags{$_}, @{$$qunkref{'tags'}});
|
|---|
| 1253 | if ((scalar (@tags)) > 0) {
|
|---|
| 1254 | if ($started_addendum) {
|
|---|
| 1255 | $beauty .= ", ";
|
|---|
| 1256 | }
|
|---|
| 1257 | else {
|
|---|
| 1258 | $beauty .= " (tags: ";
|
|---|
| 1259 | }
|
|---|
| 1260 | $beauty .= join (', ', @tags);
|
|---|
| 1261 | $started_addendum = 1;
|
|---|
| 1262 | }
|
|---|
| 1263 | }
|
|---|
| 1264 | if ($started_addendum) {
|
|---|
| 1265 | $beauty .= ")";
|
|---|
| 1266 | }
|
|---|
| 1267 | }
|
|---|
| 1268 | }
|
|---|
| 1269 |
|
|---|
| 1270 | # Unanimous tags always come last.
|
|---|
| 1271 | if ($Show_Tags && %unanimous_tags)
|
|---|
| 1272 | {
|
|---|
| 1273 | $beauty .= " (utags: ";
|
|---|
| 1274 | $beauty .= join (', ', sort keys (%unanimous_tags));
|
|---|
| 1275 | $beauty .= ")";
|
|---|
| 1276 | }
|
|---|
| 1277 |
|
|---|
| 1278 | # todo: still have to take care of branch_roots?
|
|---|
| 1279 |
|
|---|
| 1280 | $beauty = "* $beauty:";
|
|---|
| 1281 |
|
|---|
| 1282 | return $beauty;
|
|---|
| 1283 | }
|
|---|
| 1284 |
|
|---|
| 1285 | sub common_path_prefix ()
|
|---|
| 1286 | {
|
|---|
| 1287 | my $path1 = shift;
|
|---|
| 1288 | my $path2 = shift;
|
|---|
| 1289 |
|
|---|
| 1290 | my ($dir1, $dir2);
|
|---|
| 1291 | (undef, $dir1, undef) = fileparse ($path1);
|
|---|
| 1292 | (undef, $dir2, undef) = fileparse ($path2);
|
|---|
| 1293 |
|
|---|
| 1294 | # Transmogrify Windows filenames to look like Unix.
|
|---|
| 1295 | # (It is far more likely that someone is running cvs2cl.pl under
|
|---|
| 1296 | # Windows than that they would genuinely have backslashes in their
|
|---|
| 1297 | # filenames.)
|
|---|
| 1298 | $dir1 =~ tr#\\#/#;
|
|---|
| 1299 | $dir2 =~ tr#\\#/#;
|
|---|
| 1300 |
|
|---|
| 1301 | my $accum1 = "";
|
|---|
| 1302 | my $accum2 = "";
|
|---|
| 1303 | my $last_common_prefix = "";
|
|---|
| 1304 |
|
|---|
| 1305 | while ($accum1 eq $accum2)
|
|---|
| 1306 | {
|
|---|
| 1307 | $last_common_prefix = $accum1;
|
|---|
| 1308 | last if ($accum1 eq $dir1);
|
|---|
| 1309 | my ($tmp1) = split (/\//, (substr ($dir1, length ($accum1))));
|
|---|
| 1310 | my ($tmp2) = split (/\//, (substr ($dir2, length ($accum2))));
|
|---|
| 1311 | $accum1 .= "$tmp1/" if (defined $tmp1 and $tmp1 ne '');
|
|---|
| 1312 | $accum2 .= "$tmp2/" if (defined $tmp2 and $tmp2 ne '');
|
|---|
| 1313 | }
|
|---|
| 1314 |
|
|---|
| 1315 | return $last_common_prefix;
|
|---|
| 1316 | }
|
|---|
| 1317 |
|
|---|
| 1318 | sub preprocess_msg_text ()
|
|---|
| 1319 | {
|
|---|
| 1320 | my $text = shift;
|
|---|
| 1321 |
|
|---|
| 1322 | # Strip out carriage returns (as they probably result from DOSsy editors).
|
|---|
| 1323 | $text =~ s/\r\n/\n/g;
|
|---|
| 1324 |
|
|---|
| 1325 | # If it *looks* like two newlines, make it *be* two newlines:
|
|---|
| 1326 | $text =~ s/\n\s*\n/\n\n/g;
|
|---|
| 1327 |
|
|---|
| 1328 | if ($XML_Output)
|
|---|
| 1329 | {
|
|---|
| 1330 | $text = &xml_escape ($text);
|
|---|
| 1331 | $text = "<msg>${text}</msg>\n";
|
|---|
| 1332 | }
|
|---|
| 1333 | elsif (! $No_Wrap)
|
|---|
| 1334 | {
|
|---|
| 1335 | # Strip off lone newlines, but only for lines that don't begin with
|
|---|
| 1336 | # whitespace or a mail-quoting character, since we want to preserve
|
|---|
| 1337 | # that kind of formatting. Also don't strip newlines that follow a
|
|---|
| 1338 | # period; we handle those specially next. And don't strip
|
|---|
| 1339 | # newlines that precede an open paren.
|
|---|
| 1340 | 1 while ($text =~ s/(^|\n)([^>\s].*[^.\n])\n([^>\n])/$1$2 $3/g);
|
|---|
| 1341 |
|
|---|
| 1342 | # If a newline follows a period, make sure that when we bring up the
|
|---|
| 1343 | # bottom sentence, it begins with two spaces.
|
|---|
| 1344 | 1 while ($text =~ s/(^|\n)([^>\s].*)\n([^>\n])/$1$2 $3/g);
|
|---|
| 1345 | }
|
|---|
| 1346 |
|
|---|
| 1347 | return $text;
|
|---|
| 1348 | }
|
|---|
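| |
|
|---|
| | # For illustration (not from the original script): with wrapping enabled,
|
|---|
| | # a message such as "Fix the frobnicator.\nIt was broken." comes back with
|
|---|
| | # the lone newline folded away, so both sentences end up on one line,
|
|---|
| | # while ">"-quoted lines and blank-line paragraph breaks are preserved.
|
|---|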
| 1349 |
|
|---|
| 1350 | sub last_line_len ()
|
|---|
| 1351 | {
|
|---|
| 1352 | my $files_list = shift;
|
|---|
| 1353 | my @lines = split (/\n/, $files_list);
|
|---|
| 1354 | my $last_line = pop (@lines);
|
|---|
| 1355 | return length ($last_line);
|
|---|
| 1356 | }
|
|---|
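| |
|
|---|
| | # For illustration (not from the original script):
|
|---|
| | #   &last_line_len ("foo.c, bar.c,\nbaz.c")   returns 5 (the length of "baz.c").
|
|---|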
| 1357 |
|
|---|
| 1358 | # A custom wrap function, sensitive to some common constructs used in
|
|---|
| 1359 | # log entries.
|
|---|
| 1360 | sub wrap_log_entry ()
|
|---|
| 1361 | {
|
|---|
| 1362 | my $text = shift; # The text to wrap.
|
|---|
| 1363 | my $left_pad_str = shift; # String to pad with on the left.
|
|---|
| 1364 |
|
|---|
| 1365 | # These do NOT take left_pad_str into account:
|
|---|
| 1366 | my $length_remaining = shift; # Amount left on current line.
|
|---|
| 1367 | my $max_line_length = shift; # Amount left for a blank line.
|
|---|
| 1368 |
|
|---|
| 1369 | my $wrapped_text = ""; # The accumulating wrapped entry.
|
|---|
| 1370 | my $user_indent = ""; # Inherited user_indent from prev line.
|
|---|
| 1371 |
|
|---|
| 1372 | my $first_time = 1; # First iteration of the loop?
|
|---|
| 1373 | my $suppress_line_start_match = 0; # Set to disable line start checks.
|
|---|
| 1374 |
|
|---|
| 1375 | my @lines = split (/\n/, $text);
|
|---|
| 1376 | while (@lines) # Don't use `foreach' here, it won't work.
|
|---|
| 1377 | {
|
|---|
| 1378 | my $this_line = shift (@lines);
|
|---|
| 1379 | chomp $this_line;
|
|---|
| 1380 |
|
|---|
| 1381 | if ($this_line =~ /^(\s+)/) {
|
|---|
| 1382 | $user_indent = $1;
|
|---|
| 1383 | }
|
|---|
| 1384 | else {
|
|---|
| 1385 | $user_indent = "";
|
|---|
| 1386 | }
|
|---|
| 1387 |
|
|---|
| 1388 | # If it matches any of the line-start regexps, print a newline now...
|
|---|
| 1389 | if ($suppress_line_start_match)
|
|---|
| 1390 | {
|
|---|
| 1391 | $suppress_line_start_match = 0;
|
|---|
| 1392 | }
|
|---|
| 1393 | elsif (($this_line =~ /^(\s*)\*\s+[a-zA-Z0-9]/)
|
|---|
| 1394 | || ($this_line =~ /^(\s*)\* [a-zA-Z0-9_\.\/\+-]+/)
|
|---|
| 1395 | || ($this_line =~ /^(\s*)\([a-zA-Z0-9_\.\/\+-]+(\)|,\s*)/)
|
|---|
| 1396 | || ($this_line =~ /^(\s+)(\S+)/)
|
|---|
| 1397 | || ($this_line =~ /^(\s*)- +/)
|
|---|
| 1398 | || ($this_line =~ /^()\s*$/)
|
|---|
| 1399 | || ($this_line =~ /^(\s*)\*\) +/)
|
|---|
| 1400 | || ($this_line =~ /^(\s*)[a-zA-Z0-9](\)|\.|\:) +/))
|
|---|
| 1401 | {
|
|---|
| 1402 | # Make a line break immediately, unless header separator is set
|
|---|
| 1403 | # and this line is the first line in the entry, in which case
|
|---|
| 1404 | # we're getting the blank line for free already and shouldn't
|
|---|
| 1405 | # add an extra one.
|
|---|
| 1406 | unless (($After_Header ne " ") and ($first_time))
|
|---|
| 1407 | {
|
|---|
| 1408 | if ($this_line =~ /^()\s*$/) {
|
|---|
| 1409 | $suppress_line_start_match = 1;
|
|---|
| 1410 | $wrapped_text .= "\n${left_pad_str}";
|
|---|
| 1411 | }
|
|---|
| 1412 |
|
|---|
| 1413 | $wrapped_text .= "\n${left_pad_str}";
|
|---|
| 1414 | }
|
|---|
| 1415 |
|
|---|
| 1416 | $length_remaining = $max_line_length - (length ($user_indent));
|
|---|
| 1417 | }
|
|---|
| 1418 |
|
|---|
| 1419 | # Now that any user_indent has been preserved, strip off leading
|
|---|
| 1420 | # whitespace, so up-folding has no ugly side-effects.
|
|---|
| 1421 | $this_line =~ s/^\s*//;
|
|---|
| 1422 |
|
|---|
| 1423 | # Accumulate the line, and adjust parameters for next line.
|
|---|
| 1424 | my $this_len = length ($this_line);
|
|---|
| 1425 | if ($this_len == 0)
|
|---|
| 1426 | {
|
|---|
| 1427 | # Blank lines should cancel any user_indent level.
|
|---|
| 1428 | $user_indent = "";
|
|---|
| 1429 | $length_remaining = $max_line_length;
|
|---|
| 1430 | }
|
|---|
| 1431 | elsif ($this_len >= $length_remaining) # Line too long, try breaking it.
|
|---|
| 1432 | {
|
|---|
| 1433 | # Walk backwards from the end. At first acceptable spot, break
|
|---|
| 1434 | # a new line.
|
|---|
| 1435 | my $idx = $length_remaining - 1;
|
|---|
| 1436 | if ($idx < 0) { $idx = 0 };
|
|---|
| 1437 | while ($idx > 0)
|
|---|
| 1438 | {
|
|---|
| 1439 | if (substr ($this_line, $idx, 1) =~ /\s/)
|
|---|
| 1440 | {
|
|---|
| 1441 | my $line_now = substr ($this_line, 0, $idx);
|
|---|
| 1442 | my $next_line = substr ($this_line, $idx);
|
|---|
| 1443 | $this_line = $line_now;
|
|---|
| 1444 |
|
|---|
| 1445 | # Clean whitespace off the end.
|
|---|
| 1446 | chomp $this_line;
|
|---|
| 1447 |
|
|---|
| 1448 | # The current line is ready to be printed.
|
|---|
| 1449 | $this_line .= "\n${left_pad_str}";
|
|---|
| 1450 |
|
|---|
| 1451 | # Make sure the next line is allowed full room.
|
|---|
| 1452 | $length_remaining = $max_line_length - (length ($user_indent));
|
|---|
| 1453 |
|
|---|
| 1454 | # Strip next_line, but then preserve any user_indent.
|
|---|
| 1455 | $next_line =~ s/^\s*//;
|
|---|
| 1456 |
|
|---|
| 1457 | # Sneak a peek at the user_indent of the upcoming line, so
|
|---|
| 1458 | # $next_line (which will now precede it) can inherit that
|
|---|
| 1459 | # indent level. Otherwise, use whatever user_indent level
|
|---|
| 1460 | # we currently have, which might be none.
|
|---|
| 1461 | my $next_next_line = shift (@lines);
|
|---|
| 1462 | if ((defined ($next_next_line)) && ($next_next_line =~ /^(\s+)/)) {
|
|---|
| 1463 | $next_line = $1 . $next_line if (defined ($1));
|
|---|
| 1464 | # $length_remaining = $max_line_length - (length ($1));
|
|---|
| 1465 | $next_next_line =~ s/^\s*//;
|
|---|
| 1466 | }
|
|---|
| 1467 | else {
|
|---|
| 1468 | $next_line = $user_indent . $next_line;
|
|---|
| 1469 | }
|
|---|
| 1470 | if (defined ($next_next_line)) {
|
|---|
| 1471 | unshift (@lines, $next_next_line);
|
|---|
| 1472 | }
|
|---|
| 1473 | unshift (@lines, $next_line);
|
|---|
| 1474 |
|
|---|
| 1475 | # Our new next line might, coincidentally, begin with one of
|
|---|
| 1476 | # the line-start regexps, so we temporarily turn off
|
|---|
| 1477 | # sensitivity to that until we're past the line.
|
|---|
| 1478 | $suppress_line_start_match = 1;
|
|---|
| 1479 |
|
|---|
| 1480 | last;
|
|---|
| 1481 | }
|
|---|
| 1482 | else
|
|---|
| 1483 | {
|
|---|
| 1484 | $idx--;
|
|---|
| 1485 | }
|
|---|
| 1486 | }
|
|---|
| 1487 |
|
|---|
| 1488 | if ($idx == 0)
|
|---|
| 1489 | {
|
|---|
| 1490 | # We bottomed out because the line is longer than the
|
|---|
| 1491 | # available space. But that could be because the space is
|
|---|
| 1492 | # small, or because the line is longer than even the maximum
|
|---|
| 1493 | # possible space. Handle both cases below.
|
|---|
| 1494 |
|
|---|
| 1495 | if ($length_remaining == ($max_line_length - (length ($user_indent))))
|
|---|
| 1496 | {
|
|---|
| 1497 | # The line is simply too long -- there is no hope of ever
|
|---|
| 1498 | # breaking it nicely, so just insert it verbatim, with
|
|---|
| 1499 | # appropriate padding.
|
|---|
| 1500 | $this_line = "\n${left_pad_str}${this_line}";
|
|---|
| 1501 | }
|
|---|
| 1502 | else
|
|---|
| 1503 | {
|
|---|
| 1504 | # Can't break it here, but may be able to on the next round...
|
|---|
| 1505 | unshift (@lines, $this_line);
|
|---|
| 1506 | $length_remaining = $max_line_length - (length ($user_indent));
|
|---|
| 1507 | $this_line = "\n${left_pad_str}";
|
|---|
| 1508 | }
|
|---|
| 1509 | }
|
|---|
| 1510 | }
|
|---|
| 1511 | else # $this_len < $length_remaining, so tack on what we can.
|
|---|
| 1512 | {
|
|---|
| 1513 | # Leave a note for the next iteration.
|
|---|
| 1514 | $length_remaining = $length_remaining - $this_len;
|
|---|
| 1515 |
|
|---|
| 1516 | if ($this_line =~ /\.$/)
|
|---|
| 1517 | {
|
|---|
| 1518 | $this_line .= "  ";
|
|---|
| 1519 | $length_remaining -= 2;
|
|---|
| 1520 | }
|
|---|
| 1521 | else # not a sentence end
|
|---|
| 1522 | {
|
|---|
| 1523 | $this_line .= " ";
|
|---|
| 1524 | $length_remaining -= 1;
|
|---|
| 1525 | }
|
|---|
| 1526 | }
|
|---|
| 1527 |
|
|---|
| 1528 | # Unconditionally indicate that loop has run at least once.
|
|---|
| 1529 | $first_time = 0;
|
|---|
| 1530 |
|
|---|
| 1531 | $wrapped_text .= "${user_indent}${this_line}";
|
|---|
| 1532 | }
|
|---|
| 1533 |
|
|---|
| 1534 | # One last bit of padding.
|
|---|
| 1535 | $wrapped_text .= "\n";
|
|---|
| 1536 |
|
|---|
| 1537 | return $wrapped_text;
|
|---|
| 1538 | }
|
|---|
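| |
|
|---|
| | # An illustrative call (the values below are invented for this example,
|
|---|
| | # not copied from the real call site): wrap $message with a tab of left
|
|---|
| | # padding, 60 columns left on the current line and 69 on fresh lines:
|
|---|
| | #
|
|---|
| | #   my $wrapped = &wrap_log_entry ($message, "\t", 60, 69);
|
|---|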
| 1539 |
|
|---|
| 1540 | sub xml_escape ()
|
|---|
| 1541 | {
|
|---|
| 1542 | my $txt = shift;
|
|---|
| 1543 | $txt =~ s/&/&amp;/g;
|
|---|
| 1544 | $txt =~ s/</&lt;/g;
|
|---|
| 1545 | $txt =~ s/>/&gt;/g;
|
|---|
| 1546 | return $txt;
|
|---|
| 1547 | }
|
|---|
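| |
|
|---|
| | # For illustration (not from the original script):
|
|---|
| | #   &xml_escape ("<foo> & bar")   returns   "&lt;foo&gt; &amp; bar"
|
|---|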
| 1548 |
|
|---|
| 1549 | sub maybe_read_user_map_file ()
|
|---|
| 1550 | {
|
|---|
| 1551 | my %expansions;
|
|---|
| 1552 |
|
|---|
| 1553 | if ($User_Map_File)
|
|---|
| 1554 | {
|
|---|
| 1555 | open (MAPFILE, "<$User_Map_File")
|
|---|
| 1556 | or die ("Unable to open $User_Map_File ($!)");
|
|---|
| 1557 |
|
|---|
| 1558 | while (<MAPFILE>)
|
|---|
| 1559 | {
|
|---|
| 1560 | next if /^\s*#/; # Skip comment lines.
|
|---|
| 1561 | next if not /:/; # Skip lines without colons.
|
|---|
| 1562 |
|
|---|
| 1563 | # It is now safe to split on ':'.
|
|---|
| 1564 | my ($username, $expansion) = split ':';
|
|---|
| 1565 | chomp $expansion;
|
|---|
| 1566 | $expansion =~ s/^'(.*)'$/$1/;
|
|---|
| 1567 | $expansion =~ s/^"(.*)"$/$1/;
|
|---|
| 1568 |
|
|---|
| 1569 | # If it looks like the expansion has a real name already, then
|
|---|
| 1570 | # we toss the username we got from CVS log. Otherwise, keep
|
|---|
| 1571 | # it to use in combination with the email address.
|
|---|
| 1572 |
|
|---|
| 1573 | if ($expansion =~ /^\s*<{0,1}\S+@.*/) {
|
|---|
| 1574 | # Also, add angle brackets if none present
|
|---|
| 1575 | if (! ($expansion =~ /<\S+@\S+>/)) {
|
|---|
| 1576 | $expansions{$username} = "$username <$expansion>";
|
|---|
| 1577 | }
|
|---|
| 1578 | else {
|
|---|
| 1579 | $expansions{$username} = "$username $expansion";
|
|---|
| 1580 | }
|
|---|
| 1581 | }
|
|---|
| 1582 | else {
|
|---|
| 1583 | $expansions{$username} = $expansion;
|
|---|
| 1584 | }
|
|---|
| 1585 | }
|
|---|
| 1586 |
|
|---|
| 1587 | close (MAPFILE);
|
|---|
| 1588 | }
|
|---|
| 1589 |
|
|---|
| 1590 | return %expansions;
|
|---|
| 1591 | }
|
|---|
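| |
|
|---|
| | # For illustration (not from the original script), using the CVSROOT/users
|
|---|
| | # examples from the usage text below: a map-file line such as
|
|---|
| | #   jrandom:jrandom@red-bean.com
|
|---|
| | # expands jrandom to "jrandom <jrandom@red-bean.com>", while
|
|---|
| | #   jrandom:'Jesse Q. Random <jrandom@red-bean.com>'
|
|---|
| | # expands jrandom to "Jesse Q. Random <jrandom@red-bean.com>".
|
|---|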
| 1592 |
|
|---|
| 1593 | sub parse_options ()
|
|---|
| 1594 | {
|
|---|
| 1595 | # Check this internally before setting the global variable.
|
|---|
| 1596 | my $output_file;
|
|---|
| 1597 |
|
|---|
| 1598 | # If this gets set, we encountered unknown options and will exit at
|
|---|
| 1599 | # the end of this subroutine.
|
|---|
| 1600 | my $exit_with_admonishment = 0;
|
|---|
| 1601 |
|
|---|
| 1602 | while (my $arg = shift (@ARGV))
|
|---|
| 1603 | {
|
|---|
| 1604 | if ($arg =~ /^-h$|^-help$|^--help$|^--usage$|^-\?$/) {
|
|---|
| 1605 | $Print_Usage = 1;
|
|---|
| 1606 | }
|
|---|
| 1607 | elsif ($arg =~ /^--delta$/) {
|
|---|
| 1608 | my $narg = shift(@ARGV) || die "$arg needs argument.\n";
|
|---|
| 1609 | if ($narg =~ /^([A-Za-z][A-Za-z0-9_\-]*):([A-Za-z][A-Za-z0-9_\-]*)$/) {
|
|---|
| 1610 | $Delta_From = $1;
|
|---|
| 1611 | $Delta_To = $2;
|
|---|
| 1612 | $Delta_Mode = 1;
|
|---|
| 1613 | } else {
|
|---|
| 1614 | die "--delta FROM_TAG:TO_TAG is what you meant to say.\n";
|
|---|
| 1615 | }
|
|---|
| 1616 | }
|
|---|
| 1617 | elsif ($arg =~ /^--debug$/) { # unadvertised option, heh
|
|---|
| 1618 | $Debug = 1;
|
|---|
| 1619 | }
|
|---|
| 1620 | elsif ($arg =~ /^--version$/) {
|
|---|
| 1621 | $Print_Version = 1;
|
|---|
| 1622 | }
|
|---|
| 1623 | elsif ($arg =~ /^-g$|^--global-opts$/) {
|
|---|
| 1624 | my $narg = shift (@ARGV) || die "$arg needs argument.\n";
|
|---|
| 1625 | # Don't assume CVS is called "cvs" on the user's system:
|
|---|
| 1626 | $Log_Source_Command =~ s/(^\S*)/$1 $narg/;
|
|---|
| 1627 | }
|
|---|
| 1628 | elsif ($arg =~ /^-l$|^--log-opts$/) {
|
|---|
| 1629 | my $narg = shift (@ARGV) || die "$arg needs argument.\n";
|
|---|
| 1630 | $Log_Source_Command .= " $narg";
|
|---|
| 1631 | }
|
|---|
| 1632 | elsif ($arg =~ /^-f$|^--file$/) {
|
|---|
| 1633 | my $narg = shift (@ARGV) || die "$arg needs argument.\n";
|
|---|
| 1634 | $output_file = $narg;
|
|---|
| 1635 | }
|
|---|
| 1636 | elsif ($arg =~ /^--accum$/) {
|
|---|
| 1637 | $Cumulative = 1;
|
|---|
| 1638 | }
|
|---|
| 1639 | elsif ($arg =~ /^--fsf$/) {
|
|---|
| 1640 | $FSF_Style = 1;
|
|---|
| 1641 | }
|
|---|
| 1642 | elsif ($arg =~ /^-U$|^--usermap$/) {
|
|---|
| 1643 | my $narg = shift (@ARGV) || die "$arg needs argument.\n";
|
|---|
| 1644 | $User_Map_File = $narg;
|
|---|
| 1645 | }
|
|---|
| 1646 | elsif ($arg =~ /^-W$|^--window$/) {
|
|---|
| 1647 | defined(my $narg = shift (@ARGV)) || die "$arg needs argument.\n";
|
|---|
| 1648 | $Max_Checkin_Duration = $narg;
|
|---|
| 1649 | }
|
|---|
| 1650 | elsif ($arg =~ /^-I$|^--ignore$/) {
|
|---|
| 1651 | my $narg = shift (@ARGV) || die "$arg needs argument.\n";
|
|---|
| 1652 | push (@Ignore_Files, $narg);
|
|---|
| 1653 | }
|
|---|
| 1654 | elsif ($arg =~ /^-C$|^--case-insensitive$/) {
|
|---|
| 1655 | $Case_Insensitive = 1;
|
|---|
| 1656 | }
|
|---|
| 1657 | elsif ($arg =~ /^-R$|^--regexp$/) {
|
|---|
| 1658 | my $narg = shift (@ARGV) || die "$arg needs argument.\n";
|
|---|
| 1659 | $Regexp_Gate = $narg;
|
|---|
| 1660 | }
|
|---|
| 1661 | elsif ($arg =~ /^--stdout$/) {
|
|---|
| 1662 | $Output_To_Stdout = 1;
|
|---|
| 1663 | }
|
|---|
| 1667 | elsif ($arg =~ /^-d$|^--distributed$/) {
|
|---|
| 1668 | $Distributed = 1;
|
|---|
| 1669 | }
|
|---|
| 1670 | elsif ($arg =~ /^-P$|^--prune$/) {
|
|---|
| 1671 | $Prune_Empty_Msgs = 1;
|
|---|
| 1672 | }
|
|---|
| 1673 | elsif ($arg =~ /^-S$|^--separate-header$/) {
|
|---|
| 1674 | $After_Header = "\n\n";
|
|---|
| 1675 | }
|
|---|
| 1676 | elsif ($arg =~ /^--no-wrap$/) {
|
|---|
| 1677 | $No_Wrap = 1;
|
|---|
| 1678 | }
|
|---|
| 1679 | elsif ($arg =~ /^--gmt$|^--utc$/) {
|
|---|
| 1680 | $UTC_Times = 1;
|
|---|
| 1681 | }
|
|---|
| 1682 | elsif ($arg =~ /^-w$|^--day-of-week$/) {
|
|---|
| 1683 | $Show_Day_Of_Week = 1;
|
|---|
| 1684 | }
|
|---|
| 1685 | elsif ($arg =~ /^-r$|^--revisions$/) {
|
|---|
| 1686 | $Show_Revisions = 1;
|
|---|
| 1687 | }
|
|---|
| 1688 | elsif ($arg =~ /^-t$|^--tags$/) {
|
|---|
| 1689 | $Show_Tags = 1;
|
|---|
| 1690 | }
|
|---|
| 1691 | elsif ($arg =~ /^-T$|^--tagdates$/) {
|
|---|
| 1692 | $Show_Tag_Dates = 1;
|
|---|
| 1693 | }
|
|---|
| 1694 | elsif ($arg =~ /^-b$|^--branches$/) {
|
|---|
| 1695 | $Show_Branches = 1;
|
|---|
| 1696 | }
|
|---|
| 1697 | elsif ($arg =~ /^-F$|^--follow$/) {
|
|---|
| 1698 | my $narg = shift (@ARGV) || die "$arg needs argument.\n";
|
|---|
| 1699 | push (@Follow_Branches, $narg);
|
|---|
| 1700 | }
|
|---|
| 1701 | elsif ($arg =~ /^--stdin$/) {
|
|---|
| 1702 | $Input_From_Stdin = 1;
|
|---|
| 1703 | }
|
|---|
| 1704 | elsif ($arg =~ /^--header$/) {
|
|---|
| 1705 | my $narg = shift (@ARGV) || die "$arg needs argument.\n";
|
|---|
| 1706 | $ChangeLog_Header = &slurp_file ($narg);
|
|---|
| 1707 | if (! defined ($ChangeLog_Header)) {
|
|---|
| 1708 | $ChangeLog_Header = "";
|
|---|
| 1709 | }
|
|---|
| 1710 | }
|
|---|
| 1711 | elsif ($arg =~ /^--xml-encoding$/) {
|
|---|
| 1712 | my $narg = shift (@ARGV) || die "$arg needs argument.\n";
|
|---|
| 1713 | $XML_Encoding = $narg ;
|
|---|
| 1714 | }
|
|---|
| 1715 | elsif ($arg =~ /^--xml$/) {
|
|---|
| 1716 | $XML_Output = 1;
|
|---|
| 1717 | }
|
|---|
| 1718 | elsif ($arg =~ /^--hide-filenames$/) {
|
|---|
| 1719 | $Hide_Filenames = 1;
|
|---|
| 1720 | $After_Header = "";
|
|---|
| 1721 | }
|
|---|
| 1722 | elsif ($arg =~ /^--ignore-tag$/ ) {
|
|---|
| 1723 | die "$arg needs argument.\n"
|
|---|
| 1724 | unless @ARGV;
|
|---|
| 1725 | push @ignore_tags, shift @ARGV;
|
|---|
| 1726 | }
|
|---|
| 1727 | else {
|
|---|
| 1728 | # Just add a filename as argument to the log command
|
|---|
| 1729 | $Log_Source_Command .= " '$arg'";
|
|---|
| 1730 | }
|
|---|
| 1731 | }
|
|---|
| 1732 |
|
|---|
| 1733 | ## Check for contradictions...
|
|---|
| 1734 |
|
|---|
| 1735 | if ($Output_To_Stdout && $Distributed) {
|
|---|
| 1736 | print STDERR "cannot pass both --stdout and --distributed\n";
|
|---|
| 1737 | $exit_with_admonishment = 1;
|
|---|
| 1738 | }
|
|---|
| 1739 |
|
|---|
| 1740 | if ($Output_To_Stdout && $output_file) {
|
|---|
| 1741 | print STDERR "cannot pass both --stdout and --file\n";
|
|---|
| 1742 | $exit_with_admonishment = 1;
|
|---|
| 1743 | }
|
|---|
| 1744 |
|
|---|
| 1745 | if ($XML_Output && $Cumulative) {
|
|---|
| 1746 | print STDERR "cannot pass both --xml and --accum\n";
|
|---|
| 1747 | $exit_with_admonishment = 1;
|
|---|
| 1748 | }
|
|---|
| 1749 |
|
|---|
| 1750 | # Or if any other error message has already been printed out, we
|
|---|
| 1751 | # just leave now:
|
|---|
| 1752 | if ($exit_with_admonishment) {
|
|---|
| 1753 | &usage ();
|
|---|
| 1754 | exit (1);
|
|---|
| 1755 | }
|
|---|
| 1756 | elsif ($Print_Usage) {
|
|---|
| 1757 | &usage ();
|
|---|
| 1758 | exit (0);
|
|---|
| 1759 | }
|
|---|
| 1760 | elsif ($Print_Version) {
|
|---|
| 1761 | &version ();
|
|---|
| 1762 | exit (0);
|
|---|
| 1763 | }
|
|---|
| 1764 |
|
|---|
| 1765 | ## Else no problems, so proceed.
|
|---|
| 1766 |
|
|---|
| 1767 | if ($output_file) {
|
|---|
| 1768 | $Log_File_Name = $output_file;
|
|---|
| 1769 | }
|
|---|
| 1770 | }
|
|---|
| 1771 |
|
|---|
| 1772 | sub slurp_file ()
|
|---|
| 1773 | {
|
|---|
| 1774 | my $filename = shift || die ("no filename passed to slurp_file()");
|
|---|
| 1775 | my $retstr;
|
|---|
| 1776 |
|
|---|
| 1777 | open (SLURPEE, "<${filename}") or die ("unable to open $filename ($!)");
|
|---|
| 1778 | my $saved_sep = $/;
|
|---|
| 1779 | undef $/;
|
|---|
| 1780 | $retstr = <SLURPEE>;
|
|---|
| 1781 | $/ = $saved_sep;
|
|---|
| 1782 | close (SLURPEE);
|
|---|
| 1783 | return $retstr;
|
|---|
| 1784 | }
|
|---|
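| |
|
|---|
| | # For illustration (not from the original script; the filename is invented):
|
|---|
| | #   my $header = &slurp_file ("header.txt");   # whole file as one string
|
|---|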
| 1785 |
|
|---|
| 1786 | sub debug ()
|
|---|
| 1787 | {
|
|---|
| 1788 | if ($Debug) {
|
|---|
| 1789 | my $msg = shift;
|
|---|
| 1790 | print STDERR $msg;
|
|---|
| 1791 | }
|
|---|
| 1792 | }
|
|---|
| 1793 |
|
|---|
| 1794 | sub version ()
|
|---|
| 1795 | {
|
|---|
| 1796 | print "cvs2cl.pl version ${VERSION}; distributed under the GNU GPL.\n";
|
|---|
| 1797 | }
|
|---|
| 1798 |
|
|---|
| 1799 | sub usage ()
|
|---|
| 1800 | {
|
|---|
| 1801 | &version ();
|
|---|
| 1802 | print <<'END_OF_INFO';
|
|---|
| 1803 | Generate GNU-style ChangeLogs in CVS working copies.
|
|---|
| 1804 |
|
|---|
| 1805 | Notes about the output format(s):
|
|---|
| 1806 |
|
|---|
| 1807 | The default output of cvs2cl.pl is designed to be compact, formally
|
|---|
| 1808 | unambiguous, but still easy for humans to read. It is largely
|
|---|
| 1809 | self-explanatory, I hope; the one abbreviation that might not be
|
|---|
| 1810 | obvious is "utags". That stands for "universal tags" -- a
|
|---|
| 1811 | universal tag is one held by all the files in a given change entry.
|
|---|
| 1812 |
|
|---|
| 1813 | If you need output that's easy for a program to parse, use the
|
|---|
| 1814 | --xml option. Note that with XML output, just about all available
|
|---|
| 1815 | information is included with each change entry, whether you asked
|
|---|
| 1816 | for it or not, on the theory that your parser can ignore anything
|
|---|
| 1817 | it's not looking for.
|
|---|
| 1818 |
|
|---|
| 1819 | Notes about the options and arguments (the actual options are listed
|
|---|
| 1820 | last in this usage message):
|
|---|
| 1821 |
|
|---|
| 1822 | * The -I and -F options may appear multiple times.
|
|---|
| 1823 |
|
|---|
| 1824 | * To follow trunk revisions, use "-F trunk" ("-F TRUNK" also works).
|
|---|
| 1825 | This is okay because no one would ever, ever be crazy enough to name a
|
|---|
| 1826 | branch "trunk", right? Right.
|
|---|
| 1827 |
|
|---|
| 1828 | * For the -U option, the UFILE should be formatted like
|
|---|
| 1829 | CVSROOT/users. That is, each line of UFILE looks like this
|
|---|
| 1830 | jrandom:jrandom@red-bean.com
|
|---|
| 1831 | or maybe even like this
|
|---|
| 1832 | jrandom:'Jesse Q. Random <jrandom@red-bean.com>'
|
|---|
| 1833 | Don't forget to quote the portion after the colon if necessary.
|
|---|
| 1834 |
|
|---|
| 1835 | * Many people want to filter by date. To do so, invoke cvs2cl.pl
|
|---|
| 1836 | like this:
|
|---|
| 1837 | cvs2cl.pl -l "-d'DATESPEC'"
|
|---|
| 1838 | where DATESPEC is any date specification valid for "cvs log -d".
|
|---|
| 1839 | (Note that CVS 1.10.7 and below requires there be no space between
|
|---|
| 1840 | -d and its argument).
|
|---|
| 1841 |
|
|---|
| 1842 | Options/Arguments:
|
|---|
| 1843 |
|
|---|
| 1844 | -h, -help, --help, or -? Show this usage and exit
|
|---|
| 1845 | --version Show version and exit
|
|---|
| 1846 | -r, --revisions Show revision numbers in output
|
|---|
| 1847 | -b, --branches Show branch names in revisions when possible
|
|---|
| 1848 | -t, --tags Show tags (symbolic names) in output
|
|---|
| 1849 | -T, --tagdates Show tags in output on their first occurrence
|
|---|
| 1850 | --stdin Read from stdin, don't run cvs log
|
|---|
| 1851 | --stdout Output to stdout not to ChangeLog
|
|---|
| 1852 | -d, --distributed Put ChangeLogs in subdirs
|
|---|
| 1853 | -f FILE, --file FILE Write to FILE instead of "ChangeLog"
|
|---|
| 1854 | --fsf Use this if log data is in FSF ChangeLog style
|
|---|
| 1855 | -W SECS, --window SECS Window of time within which log entries unify
|
|---|
| 1856 | -U UFILE, --usermap UFILE Expand usernames to email addresses from UFILE
|
|---|
| 1857 | -R REGEXP, --regexp REGEXP Include only entries that match REGEXP
|
|---|
| 1858 | -I REGEXP, --ignore REGEXP Ignore files whose names match REGEXP
|
|---|
| 1859 | -C, --case-insensitive Any regexp matching is done case-insensitively
|
|---|
| 1860 | -F BRANCH, --follow BRANCH Show only revisions on or ancestral to BRANCH
|
|---|
| 1861 | -S, --separate-header Blank line between each header and log message
|
|---|
| 1862 | --no-wrap Don't auto-wrap log message (recommend -S also)
|
|---|
| 1863 | --gmt, --utc Show times in GMT/UTC instead of local time
|
|---|
| 1864 | --accum Add to an existing ChangeLog (incompat w/ --xml)
|
|---|
| 1865 | -w, --day-of-week Show day of week
|
|---|
| 1866 | --header FILE Get ChangeLog header from FILE ("-" means stdin)
|
|---|
| 1867 | --xml Output XML instead of ChangeLog format
|
|---|
| 1868 | --xml-encoding ENCODING Insert encoding clause in XML header
|
|---|
| 1869 | --hide-filenames Don't show filenames (ignored for XML output)
|
|---|
| 1870 | -P, --prune Don't show empty log messages
|
|---|
| 1871 | -g OPTS, --global-opts OPTS Invoke like this "cvs OPTS log ..."
|
|---|
| 1872 | -l OPTS, --log-opts OPTS Invoke like this "cvs ... log OPTS"
|
|---|
| 1873 | FILE1 [FILE2 ...] Show only log information for the named FILE(s)
|
|---|
| 1874 |
|
|---|
| 1875 | See http://www.red-bean.com/cvs2cl for maintenance and bug info.
|
|---|
| 1876 | END_OF_INFO
|
|---|
| 1877 | }
|
|---|
| 1878 |
|
|---|
| 1879 | __END__
|
|---|
| 1880 |
|
|---|
| 1881 | =head1 NAME
|
|---|
| 1882 |
|
|---|
| 1883 | cvs2cl.pl - produces GNU-style ChangeLogs in CVS working copies, by
|
|---|
| 1884 | running "cvs log" and parsing the output. Shared log entries are
|
|---|
| 1885 | unified in an intuitive way.
|
|---|
| 1886 |
|
|---|
| 1887 | =head1 DESCRIPTION
|
|---|
| 1888 |
|
|---|
| 1889 | This script generates GNU-style ChangeLog files from CVS log
|
|---|
| 1890 | information. Basic usage: just run it inside a working copy and a
|
|---|
| 1891 | ChangeLog will appear. It requires repository access (i.e., 'cvs log'
|
|---|
| 1892 | must work). Run "cvs2cl.pl --help" to see more advanced options.
|
|---|
| 1893 |
|
|---|
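| | A typical run, from the top of a working copy (the options shown here
|
|---|
| | are only an example; any of the options listed by "cvs2cl.pl --help"
|
|---|
| | may be combined):
|
|---|
| |
|
|---|
| |   cvs2cl.pl -r -t -b
|
|---|
| |
|
|---|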
| 1894 | See http://www.red-bean.com/cvs2cl for updates, and for instructions
|
|---|
| 1895 | on getting anonymous CVS access to this script.
|
|---|
| 1896 |
|
|---|
| 1897 | Maintainer: Karl Fogel <kfogel@red-bean.com>
|
|---|
| 1898 | Please report bugs to <bug-cvs2cl@red-bean.com>.
|
|---|
| 1899 |
|
|---|
| 1900 | =head1 README
|
|---|
| 1901 |
|
|---|
| 1902 | This script generates GNU-style ChangeLog files from CVS log
|
|---|
| 1903 | information. Basic usage: just run it inside a working copy and a
|
|---|
| 1904 | ChangeLog will appear. It requires repository access (i.e., 'cvs log'
|
|---|
| 1905 | must work). Run "cvs2cl.pl --help" to see more advanced options.
|
|---|
| 1906 |
|
|---|
| 1907 | See http://www.red-bean.com/cvs2cl for updates, and for instructions
|
|---|
| 1908 | on getting anonymous CVS access to this script.
|
|---|
| 1909 |
|
|---|
| 1910 | Maintainer: Karl Fogel <kfogel@red-bean.com>
|
|---|
| 1911 | Please report bugs to <bug-cvs2cl@red-bean.com>.
|
|---|
| 1912 |
|
|---|
| 1913 | =head1 PREREQUISITES
|
|---|
| 1914 |
|
|---|
| 1915 | This script requires C<Text::Wrap>, C<Time::Local>, and
|
|---|
| 1916 | C<File::Basename>.
|
|---|
| 1917 | It also seems to require C<Perl 5.004_04> or higher.
|
|---|
| 1918 |
|
|---|
| 1919 | =pod OSNAMES
|
|---|
| 1920 |
|
|---|
| 1921 | any
|
|---|
| 1922 |
|
|---|
| 1923 | =pod SCRIPT CATEGORIES
|
|---|
| 1924 |
|
|---|
| 1925 | Version_Control/CVS
|
|---|
| 1926 |
|
|---|
| 1927 | =cut
|
|---|
| 1928 |
|
|---|
| 1929 | -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*-
|
|---|
| 1930 |
|
|---|
| 1931 | Note about a bug-slash-opportunity:
|
|---|
| 1932 | -----------------------------------
|
|---|
| 1933 |
|
|---|
| 1934 | There's a bug in Text::Wrap, which affects cvs2cl. This script
|
|---|
| 1935 | reveals it:
|
|---|
| 1936 |
|
|---|
| 1937 | #!/usr/bin/perl -w
|
|---|
| 1938 |
|
|---|
| 1939 | use Text::Wrap;
|
|---|
| 1940 |
|
|---|
| 1941 | my $test_text =
|
|---|
| 1942 | "This script demonstrates a bug in Text::Wrap. The very long line
|
|---|
| 1943 | following this paragraph will be relocated relative to the surrounding
|
|---|
| 1944 | text:
|
|---|
| 1945 |
|
|---|
| 1946 | ====================================================================
|
|---|
| 1947 |
|
|---|
| 1948 | See? When the bug happens, we'll get the line of equal signs below
|
|---|
| 1949 | this paragraph, even though it should be above.";
|
|---|
| 1950 |
|
|---|
| 1951 | # Print out the test text with no wrapping:
|
|---|
| 1952 | print "$test_text";
|
|---|
| 1953 | print "\n";
|
|---|
| 1954 | print "\n";
|
|---|
| 1955 |
|
|---|
| 1956 | # Now print it out wrapped, and see the bug:
|
|---|
| 1957 | print wrap ("\t", " ", "$test_text");
|
|---|
| 1958 | print "\n";
|
|---|
| 1959 | print "\n";
|
|---|
| 1960 |
|
|---|
| 1961 | If the line of equal signs were one character shorter, the bug wouldn't
|
|---|
| 1962 | happen. Interesting.
|
|---|
| 1963 |
|
|---|
| 1964 | Anyway, rather than fix this in Text::Wrap, we might as well write a
|
|---|
| 1965 | new wrap() which has the following much-needed features:
|
|---|
| 1966 |
|
|---|
| 1967 | * initial indentation, like current Text::Wrap()
|
|---|
| 1968 | * subsequent line indentation, like current Text::Wrap()
|
|---|
| 1969 | * user chooses among: force-break long words, leave them alone, or die()?
|
|---|
| 1970 | * preserve existing indentation: chopped chunks from an indented line
|
|---|
| 1971 | are indented by same (like this line, not counting the asterisk!)
|
|---|
| 1972 | * optional list of things to preserve on line starts, default ">"
|
|---|
| 1973 |
|
|---|
| 1974 | Note that the last two are essentially the same concept, so unify in
|
|---|
| 1975 | implementation and give a good interface to controlling them.
|
|---|
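| |
|
|---|
| | A rough, illustrative sketch of just those last two items unified (the
|
|---|
| | code and names below are invented for this note; it is not the wrapper
|
|---|
| | cvs2cl.pl actually uses, and it ignores the other wishlist items):
|
|---|
| |
|
|---|
| | use Text::Wrap ();
|
|---|
| |
|
|---|
| | # Re-wrap each line to $width columns, re-using the line's own leading
|
|---|
| | # whitespace as both the initial and subsequent indent, and leaving
|
|---|
| | # ">"-quoted lines and blank lines untouched.  Long words are left to
|
|---|
| | # Text::Wrap's default handling.
|
|---|
| | sub indent_preserving_wrap {
|
|---|
| |   my ($text, $width) = @_;
|
|---|
| |   local $Text::Wrap::columns = $width || 70;
|
|---|
| |   my @out;
|
|---|
| |   foreach my $line (split (/\n/, $text, -1)) {
|
|---|
| |     if ($line =~ /^>/ or $line !~ /\S/) {
|
|---|
| |       push (@out, $line);   # preserve quoted and blank lines as-is
|
|---|
| |       next;
|
|---|
| |     }
|
|---|
| |     my ($indent, $rest) = ($line =~ /^(\s*)(.*)$/);
|
|---|
| |     push (@out, Text::Wrap::wrap ($indent, $indent, $rest));
|
|---|
| |   }
|
|---|
| |   return join ("\n", @out);
|
|---|
| | }
|
|---|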
| 1976 |
|
|---|
| 1977 | And how about:
|
|---|
| 1978 |
|
|---|
| 1979 | Optionally, when encountering a line pre-indented by the same amount as
|
|---|
| 1980 | the previous line, strip the newline and refill, but keep that same
|
|---|
| 1981 | indent. Yeah...
|
|---|
| 1982 |
|
|---|