source: trunk/essentials/sys-apps/gawk/README_d/README.multibyte

Last change on this file was 3076, checked in by bird, 18 years ago

gawk 3.1.5

Fri Jun 3 12:20:17 IDT 2005
============================

As noted in the NEWS file, as of version 3.1.5 gawk counts and indexes
by characters rather than by bytes in `index', `length', `substr' and
`match'. This works in multibyte locales, including UTF-8 (Unicode)
locales.
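To make the character/byte distinction concrete, here is a minimal Python sketch. It is not gawk code. The helper names (char_length, byte_index, and so on) are hypothetical, and they only approximate awk's 1-based string functions. It assumes UTF-8 text, and its substr mimic handles only a start of at least 1 and a non-negative length. The char_* helpers count characters, as the paragraph above describes for gawk 3.1.5. The byte_* helpers count bytes of the UTF-8 encoding, roughly what a byte-oriented implementation sees.

```python
import re

# Hypothetical helpers, not part of gawk: they mimic awk's 1-based string
# functions. char_* work on characters; byte_* work on UTF-8 bytes.

def char_length(s: str) -> int:
    return len(s)

def byte_length(s: str) -> int:
    return len(s.encode("utf-8"))

def char_index(s: str, t: str) -> int:
    # 1-based position of t in s, or 0 if absent (like awk's index()).
    return s.find(t) + 1

def byte_index(s: str, t: str) -> int:
    return s.encode("utf-8").find(t.encode("utf-8")) + 1

def char_substr(s: str, m: int, n: int) -> str:
    # Simplified substr(): assumes m >= 1 and n >= 0.
    return s[m - 1 : m - 1 + n]

def byte_substr(s: str, m: int, n: int) -> bytes:
    # Same slice taken on bytes; it can cut a multibyte character in half.
    return s.encode("utf-8")[m - 1 : m - 1 + n]

def char_match(s: str, regex: str) -> tuple[int, int]:
    # Returns (RSTART, RLENGTH); awk uses (0, -1) when there is no match.
    m = re.search(regex, s)
    return (m.start() + 1, m.end() - m.start()) if m else (0, -1)

def byte_match(s: str, regex: str) -> tuple[int, int]:
    m = re.search(regex.encode("utf-8"), s.encode("utf-8"))
    return (m.start() + 1, m.end() - m.start()) if m else (0, -1)

s = "café au lait"  # the "é" is 2 bytes in UTF-8
print(char_length(s), byte_length(s))              # 12 13
print(char_index(s, "au"), byte_index(s, "au"))    # 6 7
print(char_substr(s, 1, 4), byte_substr(s, 1, 4))  # café b'caf\xc3'
print(char_match(s, "au"), byte_match(s, "au"))    # (6, 2) (7, 2)
```

Every result after the "é" is shifted by one in the byte-based versions. A byte-based substr can also return half of a character, as the b'caf\xc3' result shows.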

Wed Jun 18 16:47:31 IDT 2003
============================

Multibyte locales can occasionally produce surprising results, in
particular with ranges inside bracket expressions: /[....]/. A program
that works fine for ASCII data can break in a locale such as
en_US.UTF-8. One such program is test/gsubtst5.awk.

By default, the test suite runs with LC_ALL=C and LANG=C. You
can change this by running (from a Bourne-style shell):

	$ GAWKLOCALE=some_locale make check

The test suite then sets LC_ALL and LANG to the given locale.

As of this writing, this works for en_US.UTF-8, and all tests
pass except gsubtst5.

For the normal case of RS = "\n", the locale is largely irrelevant.
For other single-byte record separators, using LC_ALL=C gives much
better performance when reading records. Otherwise, gawk has to make
several function calls *per input character* to find the record
terminator. You have been warned.