To generate or modify mapping headers
-------------------------------------
Mapping headers are imported from CJKCodecs in pre-generated form.
If you need to tweak them or add something to them, please look at the
tools/ subdirectory of the CJKCodecs distribution.



Notes on implementation characteristics of each codec
-----------------------------------------------------

1) Big5 codec

The big5 codec maps the following characters as cp950 does, rather
than following Unicode.org's mapping, which maps them to 0xFFFD.

    BIG5        Unicode     Description

    0xA15A      0x2574      SPACING UNDERSCORE
    0xA1C3      0xFFE3      SPACING HEAVY OVERSCORE
    0xA1C5      0x02CD      SPACING HEAVY UNDERSCORE
    0xA1FE      0xFF0F      LT DIAG UP RIGHT TO LOW LEFT
    0xA240      0xFF3C      LT DIAG UP LEFT TO LOW RIGHT
    0xA2CC      0x5341      HANGZHOU NUMERAL TEN
    0xA2CE      0x5345      HANGZHOU NUMERAL THIRTY

Because Unicode 0x5341, 0x5345, 0xFF0F and 0xFF3C are already mapped
to other Big5 codes, round-trip compatibility is not guaranteed for
them.
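
The round-trip caveat can be observed with a short script. This is a
minimal sketch, assuming the codec described here is the one exposed
as "big5" by the Python interpreter in use; SAMPLES and
roundtrip_report are illustrative names, not part of CJKCodecs.

```python
# Big5 codes from the notes whose Unicode targets are, per the notes,
# already reachable from other Big5 codes.
SAMPLES = [b"\xa1\xfe", b"\xa2\x40", b"\xa2\xcc", b"\xa2\xce"]


def roundtrip_report(codec="big5"):
    """Decode each sample, encode it again, and record the outcome."""
    report = {}
    for raw in SAMPLES:
        decoded = raw.decode(codec)
        reencoded = decoded.encode(codec)
        # The last field is True only when the original bytes come back.
        report[raw] = (decoded, reencoded, reencoded == raw)
    return report


if __name__ == "__main__":
    for raw, (decoded, reencoded, same) in roundtrip_report().items():
        print(raw.hex(), "U+%04X" % ord(decoded), reencoded.hex(), same)
```

Rows whose last column is False are the ones for which decoding and
then re-encoding does not return the original bytes.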


2) cp932 codec

To conform to Windows's actual mapping, the cp932 codec maps the
following code points in addition to the official cp932 mapping.

    CP932       Unicode     Description

    0x80        0x80        UNDEFINED
    0xA0        0xF8F0      UNDEFINED
    0xFD        0xF8F1      UNDEFINED
    0xFE        0xF8F2      UNDEFINED
    0xFF        0xF8F3      UNDEFINED
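
The extra mappings can be checked directly. This is a minimal sketch,
assuming the codec is available as "cp932" in the Python interpreter
in use; CP932_EXTRA and check_cp932_extras are illustrative names.

```python
# Single-byte cp932 codes and the code points the table assigns to them.
CP932_EXTRA = {
    b"\x80": "\x80",
    b"\xa0": "\uf8f0",
    b"\xfd": "\uf8f1",
    b"\xfe": "\uf8f2",
    b"\xff": "\uf8f3",
}


def check_cp932_extras(codec="cp932"):
    """Return, per byte, whether decoding yields the code point in the table."""
    return {raw: raw.decode(codec) == expected
            for raw, expected in CP932_EXTRA.items()}


if __name__ == "__main__":
    for raw, ok in check_cp932_extras().items():
        print("0x%s" % raw.hex().upper(), "matches table" if ok else "differs")
```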


3) euc-jisx0213 codec

The euc-jisx0213 codec maps JIS X 0213 Plane 1 code 0x2140 to
Unicode U+FF3C instead of U+005C as in unicode.org's mapping.
Because euc-jisx0213 already has REVERSE SOLIDUS at 0x5c and A1C0
is displayed as a full-width character, mapping to U+FF3C makes
more sense.

The euc-jisx0213 codec can also decode JIS X 0212 codes on
codeset 2. Because JIS X 0212 and JIS X 0213 Plane 2 do not
overlap each other, this does not break standard conformance
(and JIS X 0213 Plane 2 is intended to be used this way). When
encoding, the codec tries to encode kanji characters in this
order:

    JIS X 0213 Plane 1 -> JIS X 0213 Plane 2 -> JIS X 0212
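
The reverse solidus handling can be illustrated as follows. This is a
minimal sketch, assuming the codec is available as "euc_jisx0213" in
the Python interpreter in use; the variable names are illustrative.

```python
CODEC = "euc_jisx0213"

# JIS X 0213 Plane 1 code 0x2140 is the EUC byte pair A1 C0
# (each byte of the JIS code with the high bit set).
decoded_fullwidth = b"\xa1\xc0".decode(CODEC)

# The ASCII backslash stays a single byte at 0x5c.
encoded_ascii = "\\".encode(CODEC)

# Encoding the full-width character goes back to the two-byte form.
encoded_fullwidth = "\uff3c".encode(CODEC)

if __name__ == "__main__":
    print("A1C0 decodes to U+%04X" % ord(decoded_fullwidth))
    print("U+005C encodes to", encoded_ascii.hex())
    print("U+FF3C encodes to", encoded_fullwidth.hex())
```

Keeping the two characters on separate byte sequences is what lets
both U+005C and U+FF3C round-trip through this codec.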


4) euc-jp codec

The euc-jp codec is a compatibility-oriented variant on these points:
- U+FF3C FULLWIDTH REVERSE SOLIDUS is mapped to EUC-JP A1C0 (and vice versa).
- U+00A5 YEN SIGN is mapped to EUC-JP 0x5c (one way).
- U+203E OVERLINE is mapped to EUC-JP 0x7e (one way).
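
The three points above can be exercised like this. It is a minimal
sketch, assuming the codec is available as "euc_jp" in the Python
interpreter in use; euc_jp_compat_report is an illustrative name.

```python
def euc_jp_compat_report(codec="euc_jp"):
    """Collect the encodings and decodings named in the notes above."""
    return {
        # Two-way mapping: U+FF3C <-> A1C0.
        "U+FF3C encoded": "\uff3c".encode(codec),
        "A1C0 decoded": b"\xa1\xc0".decode(codec),
        # One-way mappings: encoding only.
        "U+00A5 encoded": "\u00a5".encode(codec),
        "U+203E encoded": "\u203e".encode(codec),
        # Decoding the same bytes gives the plain ASCII characters back.
        "0x5c decoded": b"\x5c".decode(codec),
        "0x7e decoded": b"\x7e".decode(codec),
    }


if __name__ == "__main__":
    for label, value in euc_jp_compat_report().items():
        print(label, "->", repr(value))
```

"One way" here means YEN SIGN and OVERLINE can be encoded, but bytes
0x5c and 0x7e decode to U+005C and U+007E rather than back to them.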


5) shift-jis codec

For compatibility, the shift-jis codec maps the 0x20-0x7e range
directly to U+20-U+7E instead of using JIS X 0201. The differences are:
- U+005C REVERSE SOLIDUS is mapped to SHIFT-JIS 0x5c.
- U+007E TILDE is mapped to SHIFT-JIS 0x7e.
- U+FF3C FULL-WIDTH REVERSE SOLIDUS is mapped to SHIFT-JIS 815f.
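
The listed differences can be confirmed with a few calls. This is a
minimal sketch, assuming the codec is available as "shift_jis" in the
Python interpreter in use; shift_jis_ascii_report is an illustrative
name.

```python
def shift_jis_ascii_report(codec="shift_jis"):
    """Encode and decode the characters named in the notes above."""
    return {
        "backslash": "\\".encode(codec),          # U+005C
        "tilde": "~".encode(codec),               # U+007E
        "fullwidth": "\uff3c".encode(codec),      # U+FF3C
        "0x5c decoded": b"\x5c".decode(codec),
        "815f decoded": b"\x81\x5f".decode(codec),
    }


if __name__ == "__main__":
    for label, value in shift_jis_ascii_report().items():
        print(label, "->", repr(value))
```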