README.multibyte

Last change on this file was 3076, checked in by bird, 19 years ago
gawk 3.1.5

Fri Jun 3 12:20:17 IDT 2005
============================

As noted in the NEWS file, as of 3.1.5, gawk uses character values instead
of byte values for `index', `length', `substr' and `match'. This works
in multibyte and Unicode locales.
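
To see the difference between character and byte counting, the following
sketch feeds the same two-byte UTF-8 sequence (U+00E9, written as octal
escapes so the script itself stays ASCII) to gawk under two locales. It
assumes gawk 3.1.5 or later is on the PATH and that en_US.UTF-8 is
installed; the function name is made up for illustration.

```shell
# Count "characters" in each input line under a given locale.
# $1 is the locale name; input is read from stdin.
count_chars() {
    LC_ALL=$1 gawk '{ print length($0) }'
}

# U+00E9 encoded in UTF-8 is the two bytes 0xC3 0xA9 (octal 303 251).
printf '\303\251\n' | count_chars C             # byte semantics: prints 2
printf '\303\251\n' | count_chars en_US.UTF-8   # character semantics, if the locale exists
```

If the UTF-8 locale is not installed, the second call will most likely
fall back to byte semantics as well, so check `locale -a' first.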

Wed Jun 18 16:47:31 IDT 2003
============================

Multibyte locales can cause occasional weirdness, in particular with
ranges inside brackets: /[....]/. Something that works great for ASCII
will choke for, e.g., en_US.UTF-8. One such program is test/gsubtst5.awk.
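
One possible hedge against range surprises is to use a POSIX character
class, which names the set of characters instead of relying on the
collating order between two endpoints. This is a general regexp technique,
not something this file prescribes, and it is not claimed to fix
gsubtst5; the function name and sample input are made up.

```shell
# Replace lowercase letters without relying on the collating order of a-z.
# $1 is the locale name; input is read from stdin.
mask_lower() {
    LC_ALL=$1 gawk '{ gsub(/[[:lower:]]/, "x"); print }'
}

printf 'aBc\n' | mask_lower C    # prints xBx

# For comparison, the same substitution written as a range, run under the
# locale of interest. The result may differ by locale and gawk version.
printf 'aBc\n' | LC_ALL=en_US.UTF-8 gawk '{ gsub(/[a-z]/, "x"); print }'
```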

By default, the test suite runs with LC_ALL=C and LANG=C. You
can change this by running (from a Bourne-style shell):

	$ GAWKLOCALE=some_locale make check

Then the test suite will set LC_ALL and LANG to the given locale.
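
The defaulting behavior can be sketched with ordinary shell parameter
expansion. This only illustrates the effect described above; it is not
the actual Makefile text, and the function name is made up.

```shell
# Print the locale settings the tests would see:
# the value of GAWKLOCALE if it is set and non-empty, otherwise C.
show_test_locale() {
    loc=${GAWKLOCALE:-C}
    echo "LC_ALL=$loc LANG=$loc"
}

( unset GAWKLOCALE; show_test_locale )         # prints LC_ALL=C LANG=C
( GAWKLOCALE=en_US.UTF-8; show_test_locale )   # prints LC_ALL=en_US.UTF-8 LANG=en_US.UTF-8
```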

As of this writing, this works for en_US.UTF-8, and all tests
pass except gsubtst5.

For the normal case of RS = "\n", the locale is largely irrelevant.
For other single-byte record separators, using LC_ALL=C will give you
much better performance when reading records. Otherwise, gawk has to
make several function calls per input character to find the record
terminator. You have been warned.
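
For a single-byte separator other than newline, the locale can be forced
for just the one gawk invocation. A minimal sketch, with `;' as an
arbitrary separator; the function name and input are made up.

```shell
# Split input into records on ';' with the C locale forced for this command only.
split_records() {
    LC_ALL=C gawk 'BEGIN { RS = ";" } { print NR ": " $0 }'
}

printf 'a;b;c' | split_records
# prints:
# 1: a
# 2: b
# 3: c
```

Keep in mind that LC_ALL=C also puts `index', `length', `substr' and
`match' back on byte semantics, so this suits input that is ASCII or
that the program treats as opaque bytes.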