source: trunk/essentials/sys-apps/sed/BUGS@ 3472

Last change on this file since 3472 was 3083, checked in by bird, 19 years ago

sed 4.1.5

* ABOUT BUGS

Before reporting a bug, please check the list of known bugs
and the list of oft-reported non-bugs (below).

Bugs and comments may be sent to bonzini@gnu.org; please
include in the Subject: header the first line of the output of
``sed --version''.

Please do not send a bug report like this:

        [while building frobme-1.3.4]
        $ configure
        sed: file sedscr line 1: Unknown option to 's'

If sed doesn't configure your favorite package, take a few extra
minutes to identify the specific problem and make a stand-alone test
case.

A stand-alone test case includes all the data necessary to perform the
test, and the specific invocation of sed that causes the problem. The
smaller a stand-alone test case is, the better. A test case should
not involve something as far removed from sed as ``try to configure
frobme-1.3.4''. Yes, that is in principle enough information to look
for the bug, but that is not a very practical prospect.

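As a rough illustration of what a stand-alone test case might look
like, the sketch below keeps the input, the exact sed invocation and
the version line together. The input and the script are made up for
this example (the replacement contains an unescaped `/', so sed reads
`c/' as flags), and it assumes a sed that accepts `--version'.

```shell
# Hypothetical stand-alone test case: everything needed to reproduce
# the problem is in these few lines. Assumes GNU sed (--version).
version_line=$(sed --version 2>/dev/null | head -n 1)

# One line of input and the exact script. The unescaped `/' in the
# replacement makes sed treat `c/' as (invalid) flags to `s'.
repro_output=$(printf 'a\n' | sed 's/a/b/c/' 2>&1)
repro_status=$?

# What a report could contain: version line, output, exit status.
printf 'Subject: %s\n' "$version_line"
printf 'output: %s\n' "$repro_output"
printf 'exit status: %s\n' "$repro_status"
```

A report built this way can be rerun by anyone without having frobme
at hand.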


* NON-BUGS

`N' command on the last line

  Most versions of sed exit without printing anything when the `N'
  command is issued on the last line of a file. GNU sed instead
  prints pattern space before exiting unless of course the `-n'
  command switch has been specified. More information on the reason
  behind this choice can be found in the Info manual.

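  A small sketch of the difference, assuming GNU sed in its default
  mode; the three-line input is made up for illustration. With an odd
  number of lines, `N' is issued on the last line with nothing left
  to append.

```shell
# Join pairs of lines with `+'. On the third (last) line `N' has no
# next line to read; GNU sed prints the pattern space before exiting.
n_output=$(printf 'one\ntwo\nthree\n' | sed 'N;s/\n/+/')
printf '%s\n' "$n_output"
# GNU sed prints:
#   one+two
#   three
```

  A sed that exits without printing would show only `one+two'.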

regex syntax clashes (problems with backslashes)

  sed uses the POSIX basic regular expression syntax. According to
  the standard, the meaning of some escape sequences is undefined in
  this syntax; notable in the case of GNU sed are `\|', `\+', `\?',
  `\`', `\'', `\<', `\>', `\b', `\B', `\w', and `\W'.

  As in all GNU programs that use POSIX basic regular expressions, sed
  interprets these escape sequences as meta-characters. So, `x\+'
  matches one or more occurrences of `x'. `abc\|def' matches either
  `abc' or `def'.

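  The two cases just described can be tried directly. This is only a
  sketch that assumes GNU sed; the input strings are made up.

```shell
# `x\+' matches one or more `x' (GNU interpretation of `\+').
plus_output=$(printf 'axxxb\n' | sed 's/x\+/Y/')
printf '%s\n' "$plus_output"      # GNU sed prints: aYb

# `abc\|def' matches either `abc' or `def'.
alt_output=$(printf 'abc def ghi\n' | sed 's/abc\|def/_/g')
printf '%s\n' "$alt_output"       # GNU sed prints: _ _ ghi
```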
  This syntax may cause problems when running scripts written for other
  seds. Some sed programs have been written with the assumption that
  `\|' and `\+' match the literal characters `|' and `+'. Such scripts
  must be modified by removing the spurious backslashes if they are to
  be used with recent versions of sed (not only GNU sed).

  On the other hand, some scripts use `s|abc\|def||g' to remove occurrences
  of _either_ `abc' or `def'. While this worked until sed 4.0.x, newer
  versions interpret this as removing the string `abc|def'. This is
  again undefined behavior according to POSIX, but this interpretation
  is arguably more robust: the older one, for example, required that
  the regex matcher parsed `\/' as `/' in the common case of escaping
  a slash, which is again undefined behavior; the new behavior avoids
  this, and this is good because the regex matcher is only partially
  under our control.

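  To see how the choice of delimiter changes the meaning of `\|',
  compare the same expression written with `|' and with `/' as the
  delimiter. This is a sketch assuming a GNU sed newer than 4.0.x; the
  input line is made up.

```shell
# With `|' as delimiter, `\|' is an escaped delimiter, so the regex
# is the literal string `abc|def'.
delim_output=$(printf 'abc def abc|def\n' | sed 's|abc\|def||g')
printf '[%s]\n' "$delim_output"   # GNU sed prints: [abc def ]

# With `/' as delimiter, `\|' keeps its GNU meaning of alternation,
# so every `abc' and every `def' is removed.
slash_output=$(printf 'abc def abc|def\n' | sed 's/abc\|def//g')
printf '[%s]\n' "$slash_output"   # GNU sed prints: [  |]
```

  Scripts that relied on the old behavior can switch to a delimiter
  that does not appear in the regex.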
  In addition, GNU sed supports several escape characters (some of
  which are multi-character) to insert non-printable characters
  in scripts (`\a', `\c', `\d', `\o', `\r', `\t', `\v', `\x'). These
  can cause similar problems with scripts written for other seds.

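  For instance, `\t' in the replacement below is taken as a tab. This
  is a sketch assuming GNU sed; a script that relies on it may behave
  differently under another sed.

```shell
# Replace the first space with a tab using the GNU `\t' escape.
tab_output=$(printf 'a b\n' | sed 's/ /\t/')

# Compare against a tab produced by printf, which is portable.
expected=$(printf 'a\tb')
if [ "$tab_output" = "$expected" ]; then
    echo "this sed expands \\t to a tab"
else
    echo "this sed does not expand \\t to a tab"
fi
```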

-i clobbers read-only files

  In short, `sed d -i' will let one delete the contents of
  a read-only file, and in general the `-i' option will let
  one clobber protected files. This is not a bug, but rather a
  consequence of how the Unix filesystem works.

  The permissions on a file say what can happen to the data
  in that file, while the permissions on a directory say what can
  happen to the list of files in that directory. `sed -i'
  will not ever open for writing a file that is already on disk;
  rather, it will work on a temporary file that is finally renamed
  to the original name. If you rename or delete files, you're actually
  modifying the contents of the directory, so the operation depends on
  the permissions of the directory, not of the file. For this same
  reason, sed will not let one use `-i' on a writeable file in a
  read-only directory (but unbelievably nobody reports that as a
  bug...).

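  The read-only file case can be reproduced in a scratch directory.
  This is a sketch assuming GNU sed and mktemp(1); the file name and
  contents are made up. The directory stays writable, which is what
  allows the rename.

```shell
# Work in a throwaway directory that we own and can write to.
workdir=$(mktemp -d)
target="$workdir/readonly.txt"
printf 'precious data\n' > "$target"
chmod 444 "$target"               # the file itself is read-only

# `d' deletes every line; -i writes a temporary file in $workdir and
# renames it over the original, so only directory permissions matter.
sed -i d "$target"

size_after=$(wc -c < "$target" | tr -d ' ')
printf 'bytes left: %s\n' "$size_after"   # prints: bytes left: 0

rm -rf "$workdir"
```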

`0a' does not work (gives an error)

  There is no line 0. 0 is a special address that is only used to treat
  addresses like `0,/RE/' as active when the script starts: if you
  write `1,/abc/d' and the first line includes the word `abc', then
  that match would be ignored because address ranges must span at least
  two lines (barring the end of the file); but what you probably wanted is
  to delete every line up to the first one including `abc', and this
  is obtained with `0,/abc/d'.

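  The difference between the two ranges shows up when the first line
  already matches. A sketch assuming GNU sed (the `0,/RE/' form is
  specific to it); the four-line input is made up.

```shell
input='abc one
two
abc three
four'

# `1,/abc/d': the match on line 1 is ignored, so the range only ends
# at the next match (line 3) and lines 1-3 are deleted.
from_one=$(printf '%s\n' "$input" | sed '1,/abc/d')
printf '%s\n' "$from_one"         # prints: four

# `0,/abc/d': the range can end on line 1, so only line 1 is deleted.
from_zero=$(printf '%s\n' "$input" | sed '0,/abc/d')
printf '%s\n' "$from_zero"
# prints:
#   two
#   abc three
#   four
```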

`[a-z]' is case insensitive

  You are encountering problems with locales. POSIX mandates that `[a-z]'
  uses the current locale's collation order -- in C parlance, that means
  strcoll(3) instead of strcmp(3). Some locales have a case insensitive
  strcoll, others don't: one of those that have problems is Estonian.

  Another problem is that [a-z] tries to use collation symbols. This
  only happens if you are on the GNU system, using GNU libc's regular
  expression matcher instead of compiling the one supplied with GNU sed.
  In a Danish locale, for example, the regular expression `^[a-z]$'
  matches the string `aa', because aa is a single collating symbol that
  comes after `a' and before `b'; `ll' behaves similarly in Spanish
  locales, or `ij' in Dutch locales.

  To work around these problems, which may cause bugs in shell scripts,
  set the LC_ALL environment variable to `C', or set the locale on a
  more fine-grained basis with the other LC_* environment variables.
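
  A minimal sketch of the workaround, setting LC_ALL only for the one
  sed invocation rather than for the whole script; the input string is
  made up.

```shell
# In the C locale `[a-z]' covers exactly the ASCII lowercase letters,
# so the uppercase letters survive.
c_output=$(printf 'aBcD\n' | LC_ALL=C sed 's/[a-z]//g')
printf '%s\n' "$c_output"         # prints: BD
```

  Scoping the variable to a single command leaves the locale of the
  rest of the script untouched.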