<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE chapter PUBLIC "-//Samba-Team//DTD DocBook V4.2-Based Variant V1.0//EN" "http://www.samba.org/samba/DTD/samba-doc">
<chapter id="unicode">
<chapterinfo>
&author.jelmer;
&author.jht;
<author>
<firstname>TAKAHASHI</firstname><surname>Motonobu</surname>
<affiliation>
<address><email>monyo@home.monyo.com</email></address>
</affiliation>
<contrib>Japanese character support</contrib>
</author>
<pubdate>25 March 2003</pubdate>
</chapterinfo>

<title>Unicode/Charsets</title>

<sect1>
<title>Features and Benefits</title>

<para>
<indexterm><primary>use computer anywhere</primary></indexterm>
Every industry eventually matures. One sign of maturation in computing is the
effort made over the past decade to make it possible for anyone, anywhere, to
use a computer. It has not always been that way. Not so long ago, it was common
for software to be written for exclusive use in its country of origin.
</para>

<para>
Of all the effort that has been brought to bear on providing native
language support for all computer users, the work of the
<ulink url="http://www.openi18n.org/">Openi18n organization</ulink>
deserves special mention.
</para>

<para>
<indexterm><primary>codepages</primary></indexterm>
Samba-2.x supported a single locale through a mechanism called
<emphasis>codepages</emphasis>. Samba-3 is destined to become a truly transglobal
file- and printer-sharing platform.
</para>

</sect1>

<sect1>
<title>What Are Charsets and Unicode?</title>

<para>
<indexterm><primary>character set</primary></indexterm>
Computers communicate in numbers. In text, each number is
translated to a corresponding letter. The meaning assigned
to a given number depends on the <emphasis>character set (charset)</emphasis>
that is used.
</para>

<para>
<indexterm><primary>charset</primary></indexterm>
<indexterm><primary>ASCII</primary></indexterm>
A charset can be seen as a table that is used to translate numbers to
letters. Not all computers use the same charset (there are charsets
with German umlauts, Japanese characters, and so on). The American Standard Code
for Information Interchange (ASCII) has long been the baseline character
encoding scheme used by computers. ASCII itself defines 128 characters; the
single-byte charsets that extend it, such as the DOS codepages, contain up to
256 characters. In either case, each character takes exactly one byte.
</para>

<para>
<indexterm><primary>multibyte charsets</primary></indexterm>
<indexterm><primary>extended characters</primary></indexterm>
There are also charsets that support extended characters, but those need two or
more bytes of storage for each extended character. A charset that uses two bytes
per character can contain up to <command>256 * 256 = 65536</command> characters,
far more than any single-byte charset. They are called multibyte charsets because
they use more than one byte to store one character.
</para>

<para>
<indexterm><primary>unicode</primary></indexterm>
One standardized multibyte charset encoding scheme is known as
<ulink url="http://www.unicode.org/">Unicode</ulink>. A big advantage of using a
multibyte charset such as Unicode is that you only need one. There is no need to
make sure two computers use the same charset when they are communicating.
</para>
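
<para>
The idea of a charset as a lookup table can be illustrated with a minimal Python sketch.
Python is used here only for illustration and is not part of Samba; the letter and the
three codec names are assumptions chosen for the example.
</para>

<programlisting><![CDATA[
```python
# U+00E4 is LATIN SMALL LETTER A WITH DIAERESIS (a-umlaut).
letter = "\u00e4"

# Each charset is a different table, so the number stored for the
# same letter differs from one charset to the next.
encoded = {}
for charset in ("cp850", "latin-1", "utf-8"):
    encoded[charset] = letter.encode(charset)
    print(charset, encoded[charset].hex())
```
]]></programlisting>

<para>
The two single-byte charsets store the letter as one byte each, but as different
bytes, whereas UTF-8 stores it as two bytes. A reader that assumes the wrong table
therefore displays the wrong character.
</para>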

<para>
<indexterm><primary>single-byte charsets</primary></indexterm>
<indexterm><primary>SMB/CIFS</primary></indexterm>
<indexterm><primary>negotiating the charset</primary></indexterm>
Old Windows clients use single-byte charsets, which Microsoft calls
<parameter>codepages</parameter>. However, the SMB/CIFS protocol has no support for
negotiating the charset to be used. Thus, you
have to make sure you are using the same charset when talking to an older client.
Newer clients (Windows NT, 200x, XP) talk Unicode over the wire.
</para>
</sect1>

<sect1>
<title>Samba and Charsets</title>

<para>
<indexterm><primary>Unicode</primary></indexterm>
<indexterm><primary>character sets</primary></indexterm>
As of Samba-3, Samba can (and will) talk Unicode over the wire. Internally,
Samba knows of three kinds of character sets:
</para>

<variablelist>
<varlistentry>
<term><smbconfoption name="unix charset"/></term>
<listitem><para>
<indexterm><primary>UTF-8</primary></indexterm>
<indexterm><primary>CP850</primary></indexterm>
This is the charset used internally by your operating system.
The default is <constant>UTF-8</constant>, which is fine for most
systems and covers all characters in all languages. The default
in previous Samba releases was to save filenames in the encoding of the
clients &smbmdash; for example, CP850 for Western European countries.
</para></listitem>
</varlistentry>

<varlistentry>
<term><smbconfoption name="display charset"/></term>
<listitem><para>This is the charset Samba uses to print messages
on your screen. It should generally be the same as the <parameter>unix charset</parameter>.
</para></listitem>
</varlistentry>

<varlistentry>
<term><smbconfoption name="dos charset"/></term>
<listitem><para>This is the charset Samba uses when communicating with
DOS and Windows 9x/Me clients. It will talk Unicode to all newer clients.
The default depends on the charsets you have installed on your system.
Run <command>testparm -v | grep "dos charset"</command> to see
what the default is on your system.
</para></listitem>
</varlistentry>
</variablelist>

</sect1>

<sect1>
<title>Conversion from Old Names</title>

<para>
<indexterm><primary>charset conversion</primary></indexterm>
Because previous Samba versions did not do any charset conversion,
characters in filenames written by those versions are usually not correct in the
UNIX charset; they are correct only in the local charset used by the DOS/Windows clients.
</para>

<para>Bjoern Jacke has written a utility named <ulink url="http://j3e.de/linux/convmv/">convmv</ulink>
that can convert whole directory structures to a different charset with a single command.
</para>
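
<para>
What such a conversion means for a single filename can be sketched in a few lines of Python.
This is only an illustration of the idea, not how convmv is implemented; the helper
<literal>reencode_name</literal>, the sample filename, and the choice of CP850 and UTF-8
are assumptions made for the example.
</para>

<programlisting><![CDATA[
```python
def reencode_name(raw_name: bytes, old_charset: str, new_charset: str) -> bytes:
    """Return raw_name converted from old_charset to new_charset."""
    return raw_name.decode(old_charset).encode(new_charset)


# A German filename as an old Samba release would have stored it
# for clients using codepage 850 (u-umlaut and sharp s are non-ASCII).
old_name = "Gr\u00fc\u00dfe.txt".encode("cp850")

# The same name as it should be stored when unix charset is UTF-8.
new_name = reencode_name(old_name, "cp850", "utf-8")

print(old_name.hex(" "))
print(new_name.hex(" "))
```
]]></programlisting>

<para>
The characters stay the same; only the bytes that represent the non-ASCII letters change.
A real migration also has to rename every file and directory in the tree, which is the
part a dedicated tool takes care of.
</para>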

</sect1>

<sect1>
<title>Japanese Charsets</title>

<para>
Setting up Japanese charsets is quite difficult. This is mainly because:
</para>

<itemizedlist>
<listitem><para>
<indexterm><primary>JIS X 0208</primary></indexterm>
The Windows character set is extended from the original legacy Japanese
standard (JIS X 0208) and is not standardized. This means that a strictly
standard-conforming implementation cannot support the full Windows character set.
</para></listitem>

<listitem><para>
<indexterm><primary>Shift_JIS</primary></indexterm>
<indexterm><primary>EUC-JP</primary></indexterm>
<indexterm><primary>CAP</primary></indexterm>
<indexterm><primary>HEX</primary></indexterm>
<indexterm><primary>Japanese</primary></indexterm>
Mainly for historical reasons, there are several encoding methods for
Japanese, which are not fully compatible with each other. There are
two major encoding methods. One is the Shift_JIS series used in Windows
and some UNIXes. The other is the EUC-JP series used in most UNIXes
and Linux. Moreover, Samba previously also offered several unique encoding
methods, named CAP and HEX, to keep interoperability with CAP/NetAtalk and
UNIXes that cannot use Japanese filenames. Some implementations of the
EUC-JP series cannot support the full Windows character set.
</para></listitem>

<listitem><para>There are several code conversion tables between Unicode and legacy
Japanese character sets. One is compatible with Windows, another
is based on the reference of the Unicode consortium, and others are
mixed implementations. The Unicode consortium does not officially
define any conversion tables between Unicode and legacy character
sets, so there cannot be a standard one.
</para></listitem>

<listitem><para>The character sets and conversion tables available in iconv() depend
on the iconv library that is available. In addition, the Japanese locale
names may differ between systems. This means that the value of
the charset parameters depends on the implementation of iconv() you are using.
</para>

<para>
<indexterm><primary>UCS-2</primary></indexterm>
<indexterm><primary>Shift_JIS</primary></indexterm>
<indexterm><primary>ASCII</primary></indexterm>
<indexterm><primary>English</primary></indexterm>
Though 2-byte fixed UCS-2 encoding is used in Windows internally,
Shift_JIS series encoding is usually used in Japanese environments,
just as ASCII encoding is in English environments.
</para></listitem>
</itemizedlist>
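
<para>
The difference between conversion tables can be made concrete with a short Python sketch.
It uses the codecs bundled with Python, named <literal>shift_jis</literal> and
<literal>cp932</literal>, as stand-ins for a standard-based table and a Windows-compatible
table; the tables in your iconv implementation are an assumption here and may behave differently.
</para>

<programlisting><![CDATA[
```python
# One two-byte character from the Shift_JIS family.
raw = b"\x81\x60"

# Decode the same bytes with two different conversion tables.
strict = raw.decode("shift_jis")  # table that follows the JIS standard
windows = raw.decode("cp932")     # table compatible with Windows

print(hex(ord(strict)), hex(ord(windows)))
print(strict == windows)
```
]]></programlisting>

<para>
The same two bytes end up as two different Unicode characters, so a name converted with
one table does not necessarily match the name that a Windows client expects.
</para>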

<sect2><title>Basic Parameter Setting</title>

<para>
<indexterm><primary>CP932</primary></indexterm>
The <smbconfoption name="dos charset"/> and
<smbconfoption name="display charset"/>
should be set to the locale compatible with the character set
and encoding method used on Windows. This is usually CP932
but sometimes has a different name.
</para>

<para>
<indexterm><primary>Shift_JIS</primary></indexterm>
<indexterm><primary>UTF-8</primary></indexterm>
<indexterm><primary>EUC-JP</primary></indexterm>
The <smbconfoption name="unix charset"/> can be either Shift_JIS series,
EUC-JP series, or UTF-8. UTF-8 is always available, but the availability of the other locales
and their names depend on the system.
</para>

<para>
Additionally, you can consider using the Shift_JIS series as the
value of the <smbconfoption name="unix charset"/>
parameter together with the vfs_cap module, which does the same thing as
setting <quote>coding system = CAP</quote> in the Samba 2.2 series.
</para>

<para>
Which value to choose for <smbconfoption name="unix charset"/>
is a difficult question. Here is a list of details, advantages, and
disadvantages of each value.
</para>

<variablelist>
<varlistentry><term>Shift_JIS series</term>
<listitem><para>
Shift_JIS series means a locale that is equivalent to <constant>Shift_JIS</constant>,
used as a standard on Japanese Windows. In the case of <constant>Shift_JIS</constant>,
for example, if a Japanese filename consists of 0x8ba4 and 0x974c
(a 4-byte Japanese character string meaning <quote>share</quote>) and <quote>.txt</quote>
is written from Windows on Samba, the filename on UNIX becomes
0x8ba4, 0x974c, <quote>.txt</quote> (an 8-byte BINARY string), the same as on Windows.
</para>
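
<para>
The byte values above can be reproduced with a small Python sketch. It assumes that
Python's <literal>cp932</literal> codec matches the Shift_JIS series locale on your
system for these two characters; the variable names are illustrative only.
</para>

<programlisting><![CDATA[
```python
# U+5171 U+6709 are the two kanji of the word for "share".
name = "\u5171\u6709.txt"

# Bytes stored on UNIX when unix charset is a Shift_JIS series locale.
on_disk = name.encode("cp932")

print(on_disk.hex(" "))
print(len(on_disk), "bytes")
```
]]></programlisting>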

<para>The Shift_JIS series is usually used as the Japanese locale on some commercial
UNIXes, such as HP-UX and AIX (although it is also possible
to use the EUC-JP locale series there). If you use the Shift_JIS series on these platforms,
Japanese filenames created from Windows can also be referred to on
UNIX.</para>

<para>
If your UNIX is already working with Shift_JIS and there is a user
who needs to use Japanese filenames written from Windows, the
Shift_JIS series is the best choice. However, broken filenames
may be displayed, and some commands that cannot handle non-ASCII
filenames may abort while parsing filenames. In particular, the second byte of
a two-byte character may be <quote>\ (0x5c)</quote>, which needs to be handled carefully.
It is best not to touch filenames written from Windows on UNIX.
</para>
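
<para>
A hedged Python sketch of how such names could be detected follows. The helper
<literal>has_backslash_trail_byte</literal> is hypothetical, and the check assumes that
Python's <literal>cp932</literal> codec corresponds to the Shift_JIS series locale in use.
</para>

<programlisting><![CDATA[
```python
def has_backslash_trail_byte(name: str, charset: str = "cp932") -> bool:
    """Return True if a two-byte character in name ends with the byte 0x5c."""
    return any(
        len(encoded) == 2 and encoded[1] == 0x5C
        for encoded in (ch.encode(charset) for ch in name)
    )


# U+8868 is stored as 0x95 0x5c, so its second byte looks like a backslash.
risky = has_backslash_trail_byte("\u8868.txt")

# The two kanji for "share" (0x8ba4, 0x974c) contain no 0x5c byte.
safe = has_backslash_trail_byte("\u5171\u6709.txt")

print(risky, safe)
```
]]></programlisting>

<para>
Tools that treat every 0x5c byte as an escape character or path separator can mangle
names for which this check is true.
</para>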

<para>
Note that most Japanized free software actually works with EUC-JP
only. It is good practice to verify that the Japanized free software can work
with Shift_JIS.
</para>
</listitem>
</varlistentry>

<varlistentry><term>EUC-JP series</term>
<listitem><para>
<indexterm><primary>EUC-JP</primary></indexterm>
<indexterm><primary>Japanese UNIX</primary></indexterm>
EUC-JP series means a locale that is equivalent to the industry
standard called EUC-JP, widely used in Japanese UNIX (although EUC
contains specifications for languages other than Japanese, such as
EUC-KR). In the case of EUC-JP series, for example, if a Japanese
filename consists of 0x8ba4 and 0x974c and <quote>.txt</quote> is written from
Windows on Samba, the filename on UNIX becomes 0xb6a6, 0xcdad,
<quote>.txt</quote> (an 8-byte BINARY string).
</para>
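
<para>
The conversion described above can be sketched in Python as a decode from the client
charset followed by an encode to the UNIX charset. Python's <literal>cp932</literal> and
<literal>euc_jp</literal> codecs are assumed to stand in for the locales used by Samba;
for these two characters they give the byte values quoted in the text.
</para>

<programlisting><![CDATA[
```python
# Filename bytes as a Japanese Windows client would express them in CP932.
from_windows = b"\x8b\xa4\x97\x4c.txt"

# Convert through Unicode to the EUC-JP series encoding used on disk.
on_disk = from_windows.decode("cp932").encode("euc_jp")

print(on_disk.hex(" "))
print(len(on_disk), "bytes")
```
]]></programlisting>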

<para>
<indexterm><primary>EUC-JP</primary></indexterm>
<indexterm><primary>UNIX</primary></indexterm>
<indexterm><primary>Linux</primary></indexterm>
<indexterm><primary>FreeBSD</primary></indexterm>
<indexterm><primary>Solaris</primary></indexterm>
<indexterm><primary>IRIX</primary></indexterm>
<indexterm><primary>Tru64 UNIX</primary></indexterm>
<indexterm><primary>Japanese locale</primary></indexterm>
<indexterm><primary>Shift_JIS</primary></indexterm>
<indexterm><primary>UTF-8</primary></indexterm>
EUC-JP is usually used as the Japanese locale on open source UNIX systems such as Linux and FreeBSD,
and on commercial UNIX systems such as Solaris, IRIX, and Tru64 UNIX (although it is also possible
to use Shift_JIS and UTF-8 on Solaris, and Shift_JIS on Tru64 UNIX). If you use the EUC-JP series,
most Japanese filenames created from Windows can also be referred to on UNIX. Also, most Japanized
free software works mainly with EUC-JP only.
</para>

<para>
It is recommended to choose EUC-JP series when using Japanese filenames on UNIX.
</para>

<para>
Although there is no character that needs to be carefully treated
like <quote>\ (0x5c)</quote>, broken filenames may be displayed and some
commands that cannot handle non-ASCII filenames may abort
while parsing filenames.
</para>

<para>
<indexterm><primary>eucJP-ms locale</primary></indexterm>
Moreover, if you built Samba using a separately installed libiconv,
the eucJP-ms locale included in libiconv and the EUC-JP series locale
included in the operating system may not be compatible. In this case, you may need to
avoid using incompatible characters for filenames.
</para>
</listitem>
</varlistentry>

<varlistentry><term>UTF-8</term>
<listitem><para>
UTF-8 means a locale equivalent to UTF-8, the international standard defined by the Unicode consortium. In
UTF-8, a <parameter>character</parameter> is expressed using 1 to 4 bytes. In the Japanese language,
most characters are expressed using 3 bytes. Since Shift_JIS, in which a character is expressed with 1
or 2 bytes, is used to express Japanese on Windows, the byte length of a UTF-8 string is roughly
1.5 times that of the original Shift_JIS string. In the case of UTF-8, for example, if a Japanese
filename consists of 0x8ba4 and 0x974c, and <quote>.txt</quote> is written from Windows on Samba, the filename
on UNIX becomes 0xe585, 0xb1e6, 0x9c89, <quote>.txt</quote> (a 10-byte BINARY string).
</para>
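
<para>
The byte values and the factor of 1.5 can be checked with a short Python sketch. It assumes
Python's <literal>cp932</literal> codec for the Windows side; the variable names are
illustrative only.
</para>

<programlisting><![CDATA[
```python
kanji = "\u5171\u6709"  # the two kanji stored as 0x8ba4, 0x974c in Shift_JIS
name = kanji + ".txt"

# Bytes stored on UNIX when unix charset is UTF-8.
utf8_name = name.encode("utf-8")
print(utf8_name.hex(" "))
print(len(utf8_name), "bytes")

# Growth of the Japanese part: 3 bytes per kanji instead of 2.
ratio = len(kanji.encode("utf-8")) / len(kanji.encode("cp932"))
print(ratio)
```
]]></programlisting>

<para>
The ASCII part of the name keeps its length, so the overall growth of a filename depends on
how much of it is Japanese.
</para>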

<para>
For systems where iconv() is not available or where iconv()'s locales
are not compatible with Windows, UTF-8 is the only locale available.
</para>

<para>
At the time of writing, there are no systems that use UTF-8 as the default locale for Japanese.
</para>

<para>
Some broken filenames may be displayed, and some commands that
cannot handle non-ASCII filenames may abort while parsing
filenames. Unlike Shift_JIS, a multibyte UTF-8 sequence never contains the byte
<quote>\ (0x5c)</quote>, but it is still best not to touch filenames
written from Windows on UNIX.
</para>

<para>
<indexterm><primary>Windows</primary></indexterm>
<indexterm><primary>Java</primary></indexterm>
<indexterm><primary>Unicode UTF-8</primary></indexterm>
In addition, although it is not directly related to Samba, there are
subtle differences between the iconv() function, which is
generally used on UNIX, and the conversion functions used on other platforms,
such as Windows and Java. Conversion between
Shift_JIS and Unicode UTF-8 must therefore be done with care and with recognition
of the limitations involved in the process.
</para>

<para>
<indexterm><primary>Mac OS X </primary></indexterm>
Although Mac OS X uses UTF-8 as its encoding method for filenames,
it uses an extended UTF-8 specification that Samba cannot handle, so
the UTF-8 locale is not available for Mac OS X.
</para>
</listitem>
</varlistentry>

<varlistentry><term>Shift_JIS series + vfs_cap (CAP encoding)</term>
<listitem><para>
<indexterm><primary>CAP</primary></indexterm>
<indexterm><primary>NetAtalk</primary></indexterm>
<indexterm><primary>Macintosh</primary></indexterm>
CAP encoding means a specification used in CAP and NetAtalk, file
server software for Macintosh. In the case of CAP encoding, for
example, if a Japanese filename consists of 0x8ba4 and 0x974c, and
<quote>.txt</quote> is written from Windows on Samba, the filename on UNIX
becomes <quote>:8b:a4:97L.txt</quote> (a 14-byte ASCII string).
</para>

<para>
For CAP encoding, a byte that cannot be expressed as an ASCII
character (0x80 or above) is encoded in an <quote>:xx</quote> form. You still need to take
care with filenames that contain <quote>\ (0x5c)</quote>, but filenames are not
broken on a system that cannot handle non-ASCII filenames.
</para>
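
<para>
The encoding rule just described can be sketched in Python as follows. The function
<literal>cap_encode</literal> is a hypothetical illustration of the rule as stated in the
text, not the code used by vfs_cap, and it assumes Python's <literal>cp932</literal> codec
for the Shift_JIS series input.
</para>

<programlisting><![CDATA[
```python
def cap_encode(raw_name: bytes) -> str:
    """Write bytes of 0x80 or above as ':xx'; keep ASCII bytes unchanged."""
    return "".join(
        f":{byte:02x}" if byte >= 0x80 else chr(byte)
        for byte in raw_name
    )


# The Shift_JIS bytes 0x8b 0xa4 0x97 0x4c followed by ".txt".
sjis_name = "\u5171\u6709.txt".encode("cp932")
encoded = cap_encode(sjis_name)

print(encoded)
print(len(encoded), "bytes")
```
]]></programlisting>

<para>
The fourth byte, 0x4c, is below 0x80 and therefore stays as the ASCII letter
<quote>L</quote>, which is why the encoded name mixes <quote>:xx</quote> groups with plain letters.
</para>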

<para>
The greatest merit of CAP encoding is the compatibility of encoded
filenames with CAP or NetAtalk. These are, respectively, the Columbia AppleTalk
Package and the NetAtalk Open Source software project.
Since these software applications write filenames on UNIX with CAP encoding, if a
directory is shared with both Samba and NetAtalk, you need to use
CAP encoding to prevent non-ASCII filenames from being broken.
</para>

<para>
However, NetAtalk has recently been
patched on some systems to write filenames with EUC-JP (for example, the Japanese distribution Vine Linux).
In this case, you need to choose EUC-JP series instead of CAP encoding.
</para>

<para>
vfs_cap itself can also be used with non-Shift_JIS series locales on
systems that cannot handle non-ASCII characters or on systems that
share files with NetAtalk.
</para>

<para>
To use CAP encoding on Samba-3, you should use the unix charset parameter and VFS
as in <link linkend="vfscap-intl">the VFS CAP smb.conf file</link>.
</para>

<example id="vfscap-intl">
<title>VFS CAP</title>
<smbconfblock>
<smbconfsection name="[global]"/>
<smbconfcomment>the locale name "CP932" may be different</smbconfcomment>
<smbconfoption name="dos charset">CP932</smbconfoption>
<smbconfoption name="unix charset">CP932</smbconfoption>

<smbconfsection name="[cap-share]"/>
<smbconfoption name="vfs objects">cap</smbconfoption>
</smbconfblock>
</example>

<para>
<indexterm><primary>CP932</primary></indexterm>
<indexterm><primary>libiconv</primary></indexterm>
<indexterm><primary>unix charset</primary></indexterm>
<indexterm><primary>cap-share</primary></indexterm>
If you are using GNU libiconv, set unix charset to CP932. With this setting,
filenames in the <quote>cap-share</quote> share are written with CAP encoding.
</para>
</listitem>
</varlistentry>
</variablelist>

</sect2>

<sect2><title>Individual Implementations</title>

<para>
Here is some additional information regarding individual implementations:
</para>

<variablelist>
<varlistentry><term>GNU libiconv</term>
<listitem><para>
To handle Japanese correctly, you should apply the patch
<ulink url="http://www2d.biglobe.ne.jp/~msyk/software/libiconv-patch.html">libiconv-1.8-cp932-patch.diff.gz</ulink>
to libiconv-1.8.
</para>

<para>
Using the patched libiconv-1.8, these settings are available:
</para>

<programlisting>
dos charset = CP932
unix charset = CP932 / eucJP-ms / UTF-8
                |       |
                |       +-- EUC-JP series
                +-- Shift_JIS series
display charset = CP932
</programlisting>

<para>
Other Japanese locales (for example, Shift_JIS and EUC-JP) should not
be used because they lack compatibility with Windows.
</para>
</listitem>
</varlistentry>

<varlistentry><term>GNU glibc</term>
<listitem><para>
To handle Japanese correctly, you should apply a <ulink url="http://www2d.biglobe.ne.jp/~msyk/software/glibc/">patch</ulink>
to glibc-2.2.5/2.3.1/2.3.2 or use one of the patch-merged versions, glibc-2.3.3 or later.
</para>

<para>
Using the above glibc, these settings are available:
<smbconfblock>
<smbconfoption name="dos charset">CP932</smbconfoption>
<smbconfoption name="unix charset">CP932 / eucJP-ms / UTF-8</smbconfoption>
<smbconfoption name="display charset">CP932</smbconfoption>
</smbconfblock>
</para>

<para>
Other Japanese locales (for example, Shift_JIS and EUC-JP) should not
be used because they lack compatibility with Windows.
</para>
</listitem>
</varlistentry>
</variablelist>

</sect2>

<sect2>
<title>Migration from Samba-2.2 Series</title>

<para>
In the Samba-2.2 series and earlier, the <quote>coding system</quote> parameter was used. The default codepage in Samba
2.x was code page 850. In the Samba-3 series this has been replaced with the
<smbconfoption name="unix charset"/> parameter. <link linkend="japancharsets">Japanese Character Sets in Samba-2.2 and Samba-3</link>
shows the mapping table when migrating from the Samba-2.2 series to Samba-3.
</para>

<table frame="all" id="japancharsets">
<title>Japanese Character Sets in Samba-2.2 and Samba-3</title>

<tgroup cols="2" align="center">
<colspec align="center"/>
<colspec align="center"/>
<thead>
<row><entry>Samba-2.2 Coding System</entry><entry>Samba-3 unix charset</entry></row>
</thead>
<tbody>
<row><entry>SJIS</entry><entry>Shift_JIS series</entry></row>
<row><entry>EUC</entry><entry>EUC-JP series</entry></row>
<row><entry>EUC3<footnote><para>Only exists in Japanese Samba version</para></footnote></entry><entry>EUC-JP series</entry></row>
<row><entry>CAP</entry><entry>Shift_JIS series + VFS</entry></row>
<row><entry>HEX</entry><entry>currently none</entry></row>
<row><entry>UTF8</entry><entry>UTF-8</entry></row>
<row><entry>UTF8-Mac<footnote><para>Only exists in Japanese Samba version</para></footnote></entry><entry>currently none</entry></row>
<row><entry>others</entry><entry>none</entry></row>
</tbody>
</tgroup>
</table>

</sect2>

</sect1>

<sect1>
<title>Common Errors</title>

<sect2>
<title>CP850.so Can't Be Found</title>

<para><quote>Samba is complaining about a missing <filename>CP850.so</filename> file.</quote></para>

<para>
CP850 is the default <smbconfoption name="dos charset"/>.
The <smbconfoption name="dos charset"/> is used to convert data to the codepage used by your DOS clients.
If you do not have any DOS clients, you can safely ignore this message.</para>

<para>
CP850 should be supported by your local iconv implementation. Make sure you have all the required packages installed.
If you compiled Samba from source, make sure that the configure process found iconv. This can be
confirmed by checking the <filename>config.log</filename> file that is generated when
<command>configure</command> is executed.</para>
</sect2>
</sect1>

</chapter>