| 1 | To generate or modify mapping headers | 
|---|
| 2 | ------------------------------------- | 
|---|
| 3 | Mapping headers are imported from CJKCodecs in pre-generated form. | 
|---|
| 4 | If you need to tweak or add something to them, please look at the | 
|---|
| 5 | tools/ subdirectory of the CJKCodecs distribution. | 
|---|
| 6 |  | 
|---|
| 7 |  | 
|---|
| 8 |  | 
|---|
| 9 | Notes on implementation characteristics of each codec | 
|---|
| 10 | ----------------------------------------------------- | 
|---|
| 11 |  | 
|---|
| 12 | 1) Big5 codec | 
|---|
| 13 |  | 
|---|
| 14 | The big5 codec maps the following characters as cp950 does, rather | 
|---|
| 15 | than following Unicode.org's mapping, which maps them to 0xFFFD. | 
|---|
| 16 |  | 
|---|
| 17 | BIG5        Unicode     Description | 
|---|
| 18 |  | 
|---|
| 19 | 0xA15A      0x2574      SPACING UNDERSCORE | 
|---|
| 20 | 0xA1C3      0xFFE3      SPACING HEAVY OVERSCORE | 
|---|
| 21 | 0xA1C5      0x02CD      SPACING HEAVY UNDERSCORE | 
|---|
| 22 | 0xA1FE      0xFF0F      LT DIAG UP RIGHT TO LOW LEFT | 
|---|
| 23 | 0xA240      0xFF3C      LT DIAG UP LEFT TO LOW RIGHT | 
|---|
| 24 | 0xA2CC      0x5341      HANGZHOU NUMERAL TEN | 
|---|
| 25 | 0xA2CE      0x5345      HANGZHOU NUMERAL THIRTY | 
|---|
| 26 |  | 
|---|
| 27 | Because Unicode 0x5341, 0x5345, 0xFF0F and 0xFF3C are already mapped | 
|---|
| 28 | to other Big5 codes, round-trip compatibility is not guaranteed for | 
|---|
| 29 | them. | 
|---|
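The round-trip caveat can be observed with a minimal sketch like the one below. It assumes the `big5` codec of your Python build behaves as these notes describe; `big5_round_trip` is a hypothetical helper name, not part of CJKCodecs.

```python
# Sketch: decode a Big5 code, encode the result again, and compare.
# Assumption: Python's built-in "big5" codec follows the table above.

def big5_round_trip(raw):
    """Return (decoded text, bytes obtained by re-encoding that text)."""
    text = raw.decode("big5")
    return text, text.encode("big5")

# 0xA2CC and 0xA2CE are the HANGZHOU NUMERAL entries from the table.
for raw in (b"\xa2\xcc", b"\xa2\xce"):
    text, back = big5_round_trip(raw)
    status = "round-trips" if back == raw else "does not round-trip"
    print(f"0x{raw.hex().upper()} -> U+{ord(text):04X} -> 0x{back.hex().upper()} ({status})")
```

If the codec matches these notes, the re-encoded bytes differ from the input, because the encoder picks the Big5 code that already owned the Unicode code point.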
| 30 |  | 
|---|
| 31 |  | 
|---|
| 32 | 2) cp932 codec | 
|---|
| 33 |  | 
|---|
| 34 | To conform to Windows's actual mapping, the cp932 codec maps the | 
|---|
| 35 | following code points in addition to the official cp932 mapping. | 
|---|
| 36 |  | 
|---|
| 37 | CP932     Unicode     Description | 
|---|
| 38 |  | 
|---|
| 39 | 0x80      0x80        UNDEFINED | 
|---|
| 40 | 0xA0      0xF8F0      UNDEFINED | 
|---|
| 41 | 0xFD      0xF8F1      UNDEFINED | 
|---|
| 42 | 0xFE      0xF8F2      UNDEFINED | 
|---|
| 43 | 0xFF      0xF8F3      UNDEFINED | 
|---|
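A quick way to check the extra single-byte mappings is sketched below. The dictionary simply restates the table above; `CP932_EXTRAS` and `check_cp932_extras` are hypothetical names chosen for this example.

```python
# Sketch: verify the Windows-compatibility mappings in both directions.
# The expected values are copied from the table in these notes.
CP932_EXTRAS = {
    b"\x80": "\x80",
    b"\xa0": "\uf8f0",
    b"\xfd": "\uf8f1",
    b"\xfe": "\uf8f2",
    b"\xff": "\uf8f3",
}

def check_cp932_extras():
    """Return the table entries whose decode or encode result differs."""
    mismatches = []
    for raw, expected in CP932_EXTRAS.items():
        if raw.decode("cp932") != expected or expected.encode("cp932") != raw:
            mismatches.append(raw)
    return mismatches

print(check_cp932_extras() or "all extra mappings behave as documented")
```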
| 44 |  | 
|---|
| 45 |  | 
|---|
| 46 | 3) euc-jisx0213 codec | 
|---|
| 47 |  | 
|---|
| 48 | The euc-jisx0213 codec maps JIS X 0213 Plane 1 code 0x2140 to | 
|---|
| 49 | Unicode U+FF3C instead of U+005C as in unicode.org's mapping. | 
|---|
| 50 | Because euc-jisx0213 already has REVERSE SOLIDUS at 0x5c and A1C0 | 
|---|
| 51 | is displayed as a full-width character, mapping to U+FF3C makes | 
|---|
| 52 | more sense. | 
|---|
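The two backslash forms can be told apart with a short sketch like this one. It assumes Python's `euc_jisx0213` codec; the variable names are illustrative only.

```python
# Sketch: the ASCII byte 0x5c and the two-byte EUC form of JIS 0x2140
# (0xA1 0xC0) decode to different code points.
ascii_backslash = b"\x5c".decode("euc_jisx0213")
fullwidth_backslash = b"\xa1\xc0".decode("euc_jisx0213")

print(f"0x5C   -> U+{ord(ascii_backslash):04X}")
print(f"0xA1C0 -> U+{ord(fullwidth_backslash):04X}")

# Encoding the full-width character should give the two-byte form back.
reencoded_fullwidth = fullwidth_backslash.encode("euc_jisx0213")
```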
| 53 |  | 
|---|
| 54 | The euc-jisx0213 codec can also decode JIS X 0212 codes on | 
|---|
| 55 | codeset 2. Because JIS X 0212 and JIS X 0213 Plane 2 do not | 
|---|
| 56 | overlap each other, this does not break standard conformance | 
|---|
| 57 | (and JIS X 0213 Plane 2 is intended to be used this way). When | 
|---|
| 58 | encoding, the codec tries to encode kanji characters in this | 
|---|
| 59 | order: | 
|---|
| 60 |  | 
|---|
| 61 | JIS X 0213 Plane 1 -> JIS X 0213 Plane 2 -> JIS X 0212 | 
|---|
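The decoding fallback and the encoding order can be explored with the sketch below. The sample bytes are an assumption made for this example: 0x8F 0xB0 0xA1 is taken to be a JIS X 0212 code (row 16, cell 1), a row that JIS X 0213 Plane 2 does not use.

```python
# Sketch: decode a JIS X 0212 code with euc_jisx0213, then encode it again.
# JISX0212_SAMPLE is a hypothetical sample, not taken from these notes.
JISX0212_SAMPLE = b"\x8f\xb0\xa1"

decoded = JISX0212_SAMPLE.decode("euc_jisx0213")
reencoded = decoded.encode("euc_jisx0213")

print(f"decoded:   U+{ord(decoded):04X}")
print(f"reencoded: 0x{reencoded.hex().upper()}")

# Because the encoder tries the JIS X 0213 planes before JIS X 0212, the
# re-encoded bytes may differ from the input while the text stays the same.
same_text = reencoded.decode("euc_jisx0213") == decoded
```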
| 62 |  | 
|---|
| 63 |  | 
|---|
| 64 | 4) euc-jp codec | 
|---|
| 65 |  | 
|---|
| 66 | The euc-jp codec deviates from the standard for compatibility on these points: | 
|---|
| 67 | - U+FF3C FULLWIDTH REVERSE SOLIDUS is mapped to EUC-JP A1C0 (and vice versa) | 
|---|
| 68 | - U+00A5 YEN SIGN is mapped to EUC-JP 0x5c. (one way) | 
|---|
| 69 | - U+203E OVERLINE is mapped to EUC-JP 0x7e. (one way) | 
|---|
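The three points above can be checked with a small sketch, assuming Python's `euc_jp` codec. The variable names are illustrative.

```python
# Sketch: two-way mapping for the full-width backslash.
fullwidth_bytes = "\uff3c".encode("euc_jp")
fullwidth_text = fullwidth_bytes.decode("euc_jp")

# One-way mappings: YEN SIGN and OVERLINE encode to ASCII bytes, but those
# bytes decode back to the ASCII characters, not to the originals.
yen_bytes = "\u00a5".encode("euc_jp")
overline_bytes = "\u203e".encode("euc_jp")
yen_back = yen_bytes.decode("euc_jp")
overline_back = overline_bytes.decode("euc_jp")

print(f"U+00A5 -> 0x{yen_bytes.hex().upper()} -> U+{ord(yen_back):04X}")
print(f"U+203E -> 0x{overline_bytes.hex().upper()} -> U+{ord(overline_back):04X}")
```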
| 70 |  | 
|---|
| 71 |  | 
|---|
| 72 | 5) shift-jis codec | 
|---|
| 73 |  | 
|---|
| 74 | For compatibility, the shift-jis codec maps the 0x20-0x7e area | 
|---|
| 75 | directly to U+20-U+7E instead of using JIS X 0201. The differences are: | 
|---|
| 76 | - U+005C REVERSE SOLIDUS is mapped to SHIFT-JIS 0x5c. | 
|---|
| 77 | - U+007E TILDE is mapped to SHIFT-JIS 0x7e. | 
|---|
| 78 | - U+FF3C FULLWIDTH REVERSE SOLIDUS is mapped to SHIFT-JIS 815f. | 
|---|
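These differences can be confirmed with a brief sketch, assuming Python's `shift_jis` codec. The names below are illustrative only.

```python
# Sketch: ASCII backslash and tilde keep their ASCII byte values, while the
# full-width backslash uses the two-byte code 0x815F.
backslash_bytes = "\\".encode("shift_jis")
tilde_bytes = "~".encode("shift_jis")
fullwidth_bytes = "\uff3c".encode("shift_jis")

for label, raw in (("U+005C", backslash_bytes),
                   ("U+007E", tilde_bytes),
                   ("U+FF3C", fullwidth_bytes)):
    print(f"{label} -> 0x{raw.hex().upper()}")

# Decoding the single bytes yields the ASCII characters again.
backslash_back = b"\x5c".decode("shift_jis")
tilde_back = b"\x7e".decode("shift_jis")
```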
| 79 |  | 
|---|