Fri Jun 3 12:20:17 IDT 2005
============================

As noted in the NEWS file, as of 3.1.5, gawk uses character values instead
of byte values for `index', `length', `substr' and `match'. This works
in multibyte and Unicode locales.
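
A quick way to see the difference is to measure the same two bytes under
two locales. The sketch below assumes the bytes are the UTF-8 encoding of
"é", that a locale named en_US.UTF-8 is installed, and that falling back
to the system awk is acceptable when gawk is not on the PATH. The variable
names are illustrative only.

```shell
# Illustrative only: prefer gawk, fall back to the system awk if it is missing.
AWK=$(command -v gawk || command -v awk)

# Two bytes, 0xC3 0xA9: the UTF-8 encoding of "é" (an assumed sample input).
# In the C locale each byte is one character, so length() reports 2.
bytes=$(printf '\303\251\n' | LC_ALL=C "$AWK" '{ print length($0) }')
echo "C locale: $bytes"

# Assumes en_US.UTF-8 is installed. gawk 3.1.5 or later should count
# characters here rather than bytes.
chars=$(printf '\303\251\n' | LC_ALL=en_US.UTF-8 "$AWK" '{ print length($0) }')
echo "en_US.UTF-8: $chars"
```

If the UTF-8 locale is not installed, the second command is likely to
behave as in the C locale, so the two numbers will match.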

Wed Jun 18 16:47:31 IDT 2003
============================

Multibyte locales can cause occasional weirdness, in particular with
ranges inside brackets: /[....]/. Something that works great for ASCII
will choke for, e.g., en_US.UTF-8. One such program is test/gsubtst5.awk.

By default, the test suite runs with LC_ALL=C and LANG=C. You
can change this by doing (from a Bourne-style shell):

        $ GAWKLOCALE=some_locale make check

Then the test suite will set LC_ALL and LANG to the given locale.

As of this writing, this works for en_US.UTF-8, and all tests
pass except gsubtst5.

For the normal case of RS = "\n", the locale is largely irrelevant.
For other single-byte record separators, using LC_ALL=C will give you
much better performance when reading records. Otherwise, gawk has to
make several function calls *per input character* to find the record
terminator. You have been warned.
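
For a single-byte separator, the locale override can be scoped to one
command. A minimal sketch, assuming a hypothetical input that uses ';'
as the record separator (the sample data and variable names are not from
the gawk distribution, and the fallback to the system awk is only there
so the example runs without gawk):

```shell
# Illustrative only: prefer gawk, fall back to the system awk if it is missing.
AWK=$(command -v gawk || command -v awk)

# Hypothetical input: three records separated by ';' instead of newlines.
# LC_ALL=C applies to this one command, so the record terminator is
# searched for byte by byte, without multibyte handling.
count=$(printf 'a;b;c' | LC_ALL=C "$AWK" 'BEGIN { RS = ";" } END { print NR }')
echo "records read: $count"
```

Setting LC_ALL on the command line affects only that invocation, so the
rest of the session keeps its usual locale.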