Changeset 3613 for trunk/src/sed/BUGS


Timestamp: Sep 19, 2024, 2:34:43 AM
Author: bird
Message: src/sed: Merged in changes between 4.1.5 and 4.9 from the vendor branch. (svn merge /vendor/sed/4.1.5 /vendor/sed/current .)
Location: trunk/src/sed
Files: 2 edited

  • trunk/src/sed

  • trunk/src/sed/BUGS

--- trunk/src/sed/BUGS (r599)
+++ trunk/src/sed/BUGS (r3613)
@@ -10,7 +10,7 @@
 Please do not send a bug report like this:
 
-        [while building frobme-1.3.4]
-        $ configure
-        sed: file sedscr line 1: Unknown option to 's'
+        [while building frobme-1.3.4]
+        $ configure
+        sed: file sedscr line 1: Unknown option to 's'
 
 If sed doesn't configure your favorite package, take a few extra
@@ -77,5 +77,5 @@
   a read-only file, and in general the `-i' option will let
   one clobber protected files.  This is not a bug, but rather a
-  consequence of how the Unix filesystem works.
+  consequence of how the Unix file system works.
 
   The permissions on a file say what can happen to the data
@@ -88,6 +88,6 @@
   the permissions of the directory, not of the file).  For this same
   reason, sed will not let one use `-i' on a writeable file in a
-  read-only directory (but unbelievably nobody reports that as a
-  bug...).
+  read-only directory, and will break hard or symbolic links when
+  `-i' is used on such a file.
 
 
@@ -104,9 +104,10 @@
 
 `[a-z]' is case insensitive
+`s/.*//' does not clear pattern space
 
   You are encountering problems with locales.  POSIX mandates that `[a-z]'
   uses the current locale's collation order -- in C parlance, that means
   strcoll(3) instead of strcmp(3).  Some locales have a case insensitive
-  strcoll, others don't: one of those that have problems is Estonian.
+  strcoll, others don't.
 
   Another problem is that [a-z] tries to use collation symbols.  This
@@ -114,9 +115,18 @@
   expression matcher instead of compiling the one supplied with GNU sed.
   In a Danish locale, for example, the regular expression `^[a-z]$'
-  matches the string `aa', because aa is a single collating symbol that
+  matches the string `aa', because `aa' is a single collating symbol that
   comes after `a' and before `b'; `ll' behaves similarly in Spanish
   locales, or `ij' in Dutch locales.
 
-  To work around these problems, which may cause bugs in shell scripts,
-  set the LC_ALL environment variable to `C', or set the locale on a
-  more fine-grained basis with the other LC_* environment variables.
+  Another common localization-related problem happens if your input stream
+  includes invalid multibyte sequences.  POSIX mandates that such
+  sequences are _not_ matched by `.', so that `s/.*//' will not clear
+  pattern space as you would expect.  In fact, there is no way to clear
+  sed's buffers in the middle of the script in most multibyte locales
+  (including UTF-8 locales).  For this reason, GNU sed provides a `z'
+  command (for `zap') as an extension.
+
+  However, to work around both of these problems, which may cause bugs
+  in shell scripts, you can set the LC_ALL environment variable to `C',
+  or set the locale on a more fine-grained basis with the other LC_*
+  environment variables.
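The following minimal shell sketch shows the behaviour that the revised BUGS text describes. It assumes GNU sed 4.x, where `z' and the suffix-less `-i' are GNU extensions, and a POSIX sh with mktemp(1). The function names are hypothetical helpers chosen for illustration. They are not part of sed or of this changeset.

```shell
# Illustrative sketch only: assumes GNU sed 4.x and a POSIX sh.
# All function names below are hypothetical helpers, not part of sed.

# `z' (GNU extension) empties pattern space even when the line holds a
# byte that is invalid in a multibyte locale; printf '\377' emits the
# single byte 0xFF, which is never valid UTF-8.  Prints one empty line.
zap_demo() {
  printf 'abc\377def\n' | sed 'z'
}

# The workaround from the BUGS text: in the C locale every byte is a
# character, so `.' also matches 0xFF and `s/.*//' clears the line.
clear_in_c_locale() {
  printf 'abc\377def\n' | LC_ALL=C sed 's/.*//'
}

# In the C locale `[a-z]' is a plain byte range, so the uppercase `B'
# is not matched.  Prints `a' and `c' on separate lines.
lowercase_only() {
  printf 'a\nB\nc\n' | LC_ALL=C sed -n '/^[a-z]$/p'
}

# `sed -i' writes a new file and renames it over the original, so a
# second hard link keeps pointing at the old contents.  Prints `old'.
hardlink_demo() {
  dir=$(mktemp -d) || return 1
  printf 'old\n' > "$dir/a"
  ln "$dir/a" "$dir/b"          # b is a second name for the same inode
  sed -i 's/old/new/' "$dir/a"  # a now names a new file; b does not
  cat "$dir/b"
  rm -rf "$dir"
}
```

If you drop `LC_ALL=C' from clear_in_c_locale and run it under a UTF-8 locale, `.*' may stop at the invalid byte and leave part of the line in pattern space. That is the problem the new `s/.*//' entry describes.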