1 | <html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>Chapter 30. Unicode/Charsets</title><link rel="stylesheet" href="../samba.css" type="text/css"><meta name="generator" content="DocBook XSL Stylesheets V1.74.0"><link rel="home" href="index.html" title="The Official Samba 3.0.x HOWTO and Reference Guide"><link rel="up" href="optional.html" title="Part III. Advanced Configuration"><link rel="prev" href="integrate-ms-networks.html" title="Chapter 29. Integrating MS Windows Networks with Samba"><link rel="next" href="Backup.html" title="Chapter 31. Backup Techniques"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Chapter 30. Unicode/Charsets</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="integrate-ms-networks.html">Prev</a> </td><th width="60%" align="center">Part III. Advanced Configuration</th><td width="20%" align="right"> <a accesskey="n" href="Backup.html">Next</a></td></tr></table><hr></div><div class="chapter" lang="en"><div class="titlepage"><div><div><h2 class="title"><a name="unicode"></a>Chapter 30. 
Unicode/Charsets</h2></div><div><div class="author"><h3 class="author"><span class="firstname">Jelmer</span> <span class="othername">R.</span> <span class="orgname">The Samba Team</span> <span class="surname">Vernooij</span></h3><div class="affiliation"><span class="orgname">The Samba Team<br></span><div class="address"><p><code class="email"><<a class="email" href="mailto:jelmer@samba.org">jelmer@samba.org</a>></code></p></div></div></div></div><div><div class="author"><h3 class="author"><span class="firstname">John</span> <span class="othername">H.</span> <span class="orgname">Samba Team</span> <span class="surname">Terpstra</span></h3><div class="affiliation"><span class="orgname">Samba Team<br></span><div class="address"><p><code class="email"><<a class="email" href="mailto:jht@samba.org">jht@samba.org</a>></code></p></div></div></div></div><div><div class="author"><h3 class="author"><span class="firstname">TAKAHASHI</span> <span class="surname">Motonobu</span></h3><span class="contrib">Japanese character support</span> <div class="affiliation"><div class="address"><p><code class="email"><<a class="email" href="mailto:monyo@home.monyo.com">monyo@home.monyo.com</a>></code></p></div></div></div></div><div><p class="pubdate">25 March 2003</p></div></div></div><div class="toc"><p><b>Table of Contents</b></p><dl><dt><span class="sect1"><a href="unicode.html#id2669832">Features and Benefits</a></span></dt><dt><span class="sect1"><a href="unicode.html#id2669883">What Are Charsets and Unicode?</a></span></dt><dt><span class="sect1"><a href="unicode.html#id2670017">Samba and Charsets</a></span></dt><dt><span class="sect1"><a href="unicode.html#id2670152">Conversion from Old Names</a></span></dt><dt><span class="sect1"><a href="unicode.html#id2670184">Japanese Charsets</a></span></dt><dd><dl><dt><span class="sect2"><a href="unicode.html#id2670324">Basic Parameter Setting</a></span></dt><dt><span class="sect2"><a href="unicode.html#id2670972">Individual 
Implementations</a></span></dt><dt><span class="sect2"><a href="unicode.html#id2671095">Migration from Samba-2.2 Series</a></span></dt></dl></dd><dt><span class="sect1"><a href="unicode.html#id2671241">Common Errors</a></span></dt><dd><dl><dt><span class="sect2"><a href="unicode.html#id2671247">CP850.so Can't Be Found</a></span></dt></dl></dd></dl></div><div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="id2669832"></a>Features and Benefits</h2></div></div></div><p>
|
---|
2 | <a class="indexterm" name="id2669840"></a>
|
---|
Every industry eventually matures. One sign of maturity in computing is the effort
made over the past decade to let anyone, anywhere, use a computer in their own language.
It has not always been that way: not so long ago, software was commonly written for use
only in its country of origin.
|
---|
8 | </p><p>
|
---|
Of all the efforts to provide native-language support for all computer users, those of the
<a class="ulink" href="http://www.openi18n.org/" target="_top">Openi18n organization</a>
deserve special mention.
|
---|
13 | </p><p>
|
---|
14 | <a class="indexterm" name="id2669868"></a>
|
---|
15 | Samba-2.x supported a single locale through a mechanism called
|
---|
16 | <span class="emphasis"><em>codepages</em></span>. Samba-3 is destined to become a truly transglobal
|
---|
17 | file- and printer-sharing platform.
|
---|
18 | </p></div><div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="id2669883"></a>What Are Charsets and Unicode?</h2></div></div></div><p>
|
---|
19 | <a class="indexterm" name="id2669891"></a>
|
---|
Computers communicate in numbers. In text, each number stands for a letter or other
character, and which character a given number means depends on the
<span class="emphasis"><em>character set (charset)</em></span> in use.
|
---|
24 | </p><p>
|
---|
25 | <a class="indexterm" name="id2669909"></a>
|
---|
26 | <a class="indexterm" name="id2669915"></a>
|
---|
A charset can be seen as a table that is used to translate numbers to
letters. Not all computers use the same charset (there are charsets
with German umlauts, Japanese characters, and so on). The American Standard Code
for Information Interchange (ASCII) has long been the baseline character encoding
used by computers. Strictly, ASCII is a 7-bit code that defines 128 characters; the
single-byte charsets built on it use the eighth bit to reach 256 characters. With such
an encoding, each character takes exactly one byte.
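</p><p>
The table idea can be made concrete with a short sketch. The Python fragment below is an illustration
added here, not part of Samba. It assumes only the standard Python codecs named cp850, latin-1, and utf-8,
and shows that one letter (a with umlaut, written as an escape so the listing stays pure ASCII) is stored
as different numbers under different charsets.
</p><pre class="programlisting">
# One character, three charsets: the stored bytes differ.
LETTER = "\u00e4"  # LATIN SMALL LETTER A WITH DIAERESIS

def encoded_forms(text):
    """Return a dict mapping charset name to the hex bytes of text."""
    charsets = ("cp850", "latin-1", "utf-8")
    return {name: text.encode(name).hex() for name in charsets}

forms = encoded_forms(LETTER)
for name, hex_bytes in forms.items():
    print(name, hex_bytes)
</pre><p>
Reading a byte back with the wrong table yields a different letter, which is why both ends of a
connection must agree on the charset.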
|
---|
33 | </p><p>
|
---|
34 | <a class="indexterm" name="id2669933"></a>
|
---|
35 | <a class="indexterm" name="id2669940"></a>
|
---|
There are also charsets that support many more characters, but they need more storage
per character than ASCII does. A charset that uses two bytes per character can contain up to
<code class="literal">256 * 256 = 65536</code> characters, enough for most of the
characters in common use. They are called multibyte charsets because they use
more than one byte to store one character.
|
---|
41 | </p><p>
|
---|
42 | <a class="indexterm" name="id2669962"></a>
|
---|
One standardized multibyte charset is known as
<a class="ulink" href="http://www.unicode.org/" target="_top">Unicode</a>. A big advantage of such a
universal charset is that you only need one: when both sides use Unicode, there is no need to make
sure that two communicating computers have been configured with the same charset.
|
---|
47 | </p><p>
|
---|
48 | <a class="indexterm" name="id2669982"></a>
|
---|
49 | <a class="indexterm" name="id2669989"></a>
|
---|
50 | <a class="indexterm" name="id2669996"></a>
|
---|
Old Windows clients use single-byte charsets, which Microsoft calls
<em class="parameter"><code>codepages</code></em>. However, the SMB/CIFS protocol has no support for
negotiating the charset to be used. Thus, you have to make sure you are using the same
charset when talking to an older client.
Newer clients (Windows NT, 200x, XP) talk Unicode over the wire.
|
---|
56 | </p></div><div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="id2670017"></a>Samba and Charsets</h2></div></div></div><p>
|
---|
57 | <a class="indexterm" name="id2670025"></a>
|
---|
58 | <a class="indexterm" name="id2670032"></a>
|
---|
59 | As of Samba-3, Samba can (and will) talk Unicode over the wire. Internally,
|
---|
60 | Samba knows of three kinds of character sets:
|
---|
61 | </p><div class="variablelist"><dl><dt><span class="term"><a class="link" href="smb.conf.5.html#UNIXCHARSET" target="_top">unix charset</a></span></dt><dd><p>
|
---|
62 | <a class="indexterm" name="id2670064"></a>
|
---|
63 | <a class="indexterm" name="id2670070"></a>
|
---|
This is the charset used internally by your operating system.
The default is <code class="constant">UTF-8</code>, which is fine for most
systems and covers all characters in all languages. The default
in previous Samba releases was to save filenames in the encoding of the
clients; for example, CP850 for Western European countries.
|
---|
69 | </p></dd><dt><span class="term"><a class="link" href="smb.conf.5.html#DISPLAYCHARSET" target="_top">display charset</a></span></dt><dd><p>This is the charset Samba uses to print messages
|
---|
70 | on your screen. It should generally be the same as the <em class="parameter"><code>unix charset</code></em>.
|
---|
71 | </p></dd><dt><span class="term"><a class="link" href="smb.conf.5.html#DOSCHARSET" target="_top">dos charset</a></span></dt><dd><p>This is the charset Samba uses when communicating with
|
---|
72 | DOS and Windows 9x/Me clients. It will talk Unicode to all newer clients.
|
---|
73 | The default depends on the charsets you have installed on your system.
|
---|
74 | Run <code class="literal">testparm -v | grep "dos charset"</code> to see
|
---|
75 | what the default is on your system.
|
---|
76 | </p></dd></dl></div></div><div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="id2670152"></a>Conversion from Old Names</h2></div></div></div><p>
|
---|
77 | <a class="indexterm" name="id2670160"></a>
|
---|
Because previous Samba versions did not do any charset conversion,
filenames written by those versions are usually stored in the local charset of the
DOS/Windows clients rather than in the UNIX charset, so non-ASCII characters in them
may display incorrectly on UNIX.
|
---|
81 | </p><p>Bjoern Jacke has written a utility named <a class="ulink" href="http://j3e.de/linux/convmv/" target="_top">convmv</a>
|
---|
82 | that can convert whole directory structures to different charsets with one single command.
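</p><p>
What such a conversion involves for a single name can be sketched in a few lines of Python. This is only
an illustration of the idea, not how convmv is implemented. The function name convert_name and the sample
byte string are hypothetical, and CP850 and UTF-8 are assumed as the old and new charsets.
</p><pre class="programlisting">
# Re-encode one filename from the old client charset to the new unix charset.
def convert_name(raw_name, old_charset, new_charset):
    """Decode raw_name (bytes) using old_charset and re-encode it in new_charset."""
    return raw_name.decode(old_charset).encode(new_charset)

# 0x84 is a-umlaut in CP850; in UTF-8 the same letter is 0xc3 0xa4.
old_name = b"\x84.txt"
new_name = convert_name(old_name, "cp850", "utf-8")
print(new_name)
</pre><p>
A real migration also has to rename the files on disk and walk whole directory trees, which is the work
a dedicated tool is better suited to.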
|
---|
83 | </p></div><div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="id2670184"></a>Japanese Charsets</h2></div></div></div><p>
|
---|
84 | Setting up Japanese charsets is quite difficult. This is mainly because:
|
---|
85 | </p><div class="itemizedlist"><ul type="disc"><li><p>
|
---|
86 | <a class="indexterm" name="id2670200"></a>
|
---|
The Windows character set is an extension of the original legacy Japanese
standard (JIS X 0208), and the extension itself is not standardized. This means that
an implementation that strictly follows the standard cannot support the full Windows character set.
|
---|
90 | </p></li><li><p>
|
---|
91 | <a class="indexterm" name="id2670215"></a>
|
---|
92 | <a class="indexterm" name="id2670221"></a>
|
---|
93 | <a class="indexterm" name="id2670228"></a>
|
---|
94 | <a class="indexterm" name="id2670235"></a>
|
---|
95 | <a class="indexterm" name="id2670242"></a>
|
---|
96 | Mainly for historical reasons, there are several encoding methods in
|
---|
97 | Japanese, which are not fully compatible with each other. There are
|
---|
98 | two major encoding methods. One is the Shift_JIS series used in Windows
|
---|
99 | and some UNIXes. The other is the EUC-JP series used in most UNIXes
|
---|
100 | and Linux. Moreover, Samba previously also offered several unique encoding
|
---|
101 | methods, named CAP and HEX, to keep interoperability with CAP/NetAtalk and
|
---|
102 | UNIXes that can't use Japanese filenames. Some implementations of the
|
---|
103 | EUC-JP series can't support the full Windows character set.
|
---|
104 | </p></li><li><p>There are some code conversion tables between Unicode and legacy
|
---|
105 | Japanese character sets. One is compatible with Windows, another one
|
---|
106 | is based on the reference of the Unicode consortium, and others are
|
---|
107 | a mixed implementation. The Unicode consortium does not officially
|
---|
108 | define any conversion tables between Unicode and legacy character
|
---|
sets, so there cannot be a standard one.
|
---|
110 | </p></li><li><p>The character set and conversion tables available in iconv() depend
|
---|
on the iconv library that is installed. In addition, Japanese locale
names may differ from system to system. This means that the correct values for
the charset parameters depend on the implementation of iconv() you are using.
|
---|
114 | </p><p>
|
---|
115 | <a class="indexterm" name="id2670291"></a>
|
---|
116 | <a class="indexterm" name="id2670298"></a>
|
---|
117 | <a class="indexterm" name="id2670305"></a>
|
---|
118 | <a class="indexterm" name="id2670312"></a>
|
---|
Although Windows uses the fixed-width 2-byte UCS-2 encoding internally,
Shift_JIS series encoding is normally used in Japanese environments,
much as ASCII encoding is in English environments.
|
---|
122 | </p></li></ul></div><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="id2670324"></a>Basic Parameter Setting</h3></div></div></div><p>
|
---|
123 | <a class="indexterm" name="id2670331"></a>
|
---|
124 | The <a class="link" href="smb.conf.5.html#DOSCHARSET" target="_top">dos charset</a> and
|
---|
125 | <a class="link" href="smb.conf.5.html#DISPLAYCHARSET" target="_top">display charset</a>
|
---|
126 | should be set to the locale compatible with the character set
|
---|
127 | and encoding method used on Windows. This is usually CP932
|
---|
128 | but sometimes has a different name.
|
---|
129 | </p><p>
|
---|
130 | <a class="indexterm" name="id2670367"></a>
|
---|
131 | <a class="indexterm" name="id2670374"></a>
|
---|
132 | <a class="indexterm" name="id2670380"></a>
|
---|
The <a class="link" href="smb.conf.5.html#UNIXCHARSET" target="_top">unix charset</a> can be either Shift_JIS series,
EUC-JP series, or UTF-8. UTF-8 is always available, but the availability of the other locales
and their exact names depend on the system.
|
---|
136 | </p><p>
|
---|
137 | Additionally, you can consider using the Shift_JIS series as the
|
---|
138 | value of the <a class="link" href="smb.conf.5.html#UNIXCHARSET" target="_top">unix charset</a>
|
---|
139 | parameter by using the vfs_cap module, which does the same thing as
|
---|
140 | setting “<span class="quote">coding system = CAP</span>” in the Samba 2.2 series.
|
---|
141 | </p><p>
|
---|
Which value to choose for <a class="link" href="smb.conf.5.html#UNIXCHARSET" target="_top">unix charset</a>
is a difficult question. The following list gives the details, advantages, and
disadvantages of each value.
|
---|
145 | </p><div class="variablelist"><dl><dt><span class="term">Shift_JIS series</span></dt><dd><p>
|
---|
Shift_JIS series means a locale that is equivalent to <code class="constant">Shift_JIS</code>,
used as a standard on Japanese Windows. In the case of <code class="constant">Shift_JIS</code>,
for example, if a Japanese filename consisting of 0x8ba4 and 0x974c
(a 4-byte Japanese character string meaning “<span class="quote">share</span>”) followed by “<span class="quote">.txt</span>”
is written from Windows to Samba, the filename on UNIX becomes
0x8ba4, 0x974c, “<span class="quote">.txt</span>” (an 8-byte binary string), the same as on Windows.
|
---|
</p><p>The Shift_JIS series is commonly used as the Japanese locale on some commercial
UNIXes, such as HP-UX and AIX (although the EUC-JP series can also be used there).
When the Shift_JIS series is used on these platforms,
Japanese filenames created from Windows can also be referred to on
UNIX.</p><p>
|
---|
If your UNIX is already working with Shift_JIS and there is a user
who needs to use Japanese filenames written from Windows, the
Shift_JIS series is the best choice. However, broken filenames
may be displayed, and some commands that cannot handle non-ASCII
filenames may abort while parsing filenames. In particular, filenames
may contain the byte “<span class="quote">\ (0x5c)</span>” as the second byte of a character, which needs to be handled carefully.
It is best not to touch filenames written from Windows on UNIX.
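</p><p>
The 0x5c problem can be seen with a short Python sketch. It is an illustration only. It assumes Python's
cp932 and euc_jp codecs as stand-ins for the Windows Shift_JIS variant and for EUC-JP, and the sample
character U+8868 is chosen here because its second byte in Shift_JIS is 0x5c.
</p><pre class="programlisting">
# In Shift_JIS, the second byte of a two-byte character can be 0x5c (backslash).
SAMPLE = "\u8868"  # a common kanji, chosen for its trail byte

def contains_backslash_byte(text, charset):
    """Return True if the encoded form of text contains the byte 0x5c."""
    return b"\x5c" in text.encode(charset)

sjis_hit = contains_backslash_byte(SAMPLE, "cp932")
eucjp_hit = contains_backslash_byte(SAMPLE, "euc_jp")
print(SAMPLE.encode("cp932").hex(), sjis_hit, eucjp_hit)
</pre><p>
Tools that treat 0x5c as an escape character can therefore split or mangle such a name, while the same
character stored in EUC-JP contains no such byte.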
|
---|
164 | </p><p>
|
---|
Note that most Japanized free software actually works with EUC-JP
only. It is good practice to verify that the Japanized free software you rely on can work
with Shift_JIS.
|
---|
168 | </p></dd><dt><span class="term">EUC-JP series</span></dt><dd><p>
|
---|
169 | <a class="indexterm" name="id2670515"></a>
|
---|
170 | <a class="indexterm" name="id2670522"></a>
|
---|
EUC-JP series means a locale that is equivalent to the industry
standard called EUC-JP, widely used in Japanese UNIX (although EUC
also has specifications for languages other than Japanese, such as
EUC-KR). In the case of EUC-JP series, for example, if a Japanese
filename consisting of 0x8ba4 and 0x974c followed by “<span class="quote">.txt</span>” is written from
Windows to Samba, the filename on UNIX becomes 0xb6a6, 0xcdad,
“<span class="quote">.txt</span>” (an 8-byte binary string).
|
---|
178 | </p><p>
|
---|
179 | <a class="indexterm" name="id2670546"></a>
|
---|
180 | <a class="indexterm" name="id2670553"></a>
|
---|
181 | <a class="indexterm" name="id2670560"></a>
|
---|
182 | <a class="indexterm" name="id2670567"></a>
|
---|
183 | <a class="indexterm" name="id2670574"></a>
|
---|
184 | <a class="indexterm" name="id2670580"></a>
|
---|
185 | <a class="indexterm" name="id2670587"></a>
|
---|
186 | <a class="indexterm" name="id2670594"></a>
|
---|
187 | <a class="indexterm" name="id2670601"></a>
|
---|
188 | <a class="indexterm" name="id2670608"></a>
|
---|
The EUC-JP series is commonly used as the Japanese locale on open source systems such as Linux and FreeBSD,
and on commercial UNIXes such as Solaris, IRIX, and Tru64 UNIX (although Solaris can also use Shift_JIS and UTF-8,
and Tru64 UNIX can also use Shift_JIS). When the EUC-JP series is used, most Japanese filenames created from
Windows can also be referred to on UNIX. In addition, most Japanized free software works only with EUC-JP.
|
---|
193 | </p><p>
|
---|
194 | It is recommended to choose EUC-JP series when using Japanese filenames on UNIX.
|
---|
195 | </p><p>
|
---|
Although no character needs the careful treatment that
“<span class="quote">\ (0x5c)</span>” does in Shift_JIS, broken filenames may be displayed, and some
commands that cannot handle non-ASCII filenames may abort
while parsing filenames.
|
---|
200 | </p><p>
|
---|
201 | <a class="indexterm" name="id2670641"></a>
|
---|
Moreover, if you built Samba using a separately installed libiconv,
the eucJP-ms locale included in libiconv and the EUC-JP series locale
included in the operating system may not be compatible. In this case, you may need to
avoid using incompatible characters in filenames.
|
---|
206 | </p></dd><dt><span class="term">UTF-8</span></dt><dd><p>
|
---|
UTF-8 means a locale equivalent to UTF-8, the international standard defined by the Unicode consortium. In
UTF-8, a <em class="parameter"><code>character</code></em> is expressed using 1 to 4 bytes; characters in the Basic
Multilingual Plane, which covers nearly all Japanese text, need at most 3. In the case of the Japanese language,
most characters are expressed using 3 bytes. Since Windows uses Shift_JIS, in which a character is expressed with 1
or 2 bytes, for Japanese, the byte length of a UTF-8 string is typically about 1.5 times that of the original
Shift_JIS string. In the case of UTF-8, for example, if a Japanese
filename consisting of 0x8ba4 and 0x974c followed by “<span class="quote">.txt</span>” is written from Windows to Samba, the filename
on UNIX becomes 0xe585, 0xb1e6, 0x9c89, “<span class="quote">.txt</span>” (a 10-byte binary string).
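</p><p>
The byte values quoted for the three candidate charsets can be reproduced with a short Python sketch.
It is an illustration only. It assumes Python's cp932, euc_jp, and utf-8 codecs as stand-ins for the
locales that iconv would provide, and the filename is written with escapes so the listing stays pure ASCII.
</p><pre class="programlisting">
# The same filename in the three candidate values of unix charset.
FILENAME = "\u5171\u6709.txt"  # the two-kanji word for "share" plus .txt

def encode_all(name):
    """Return a dict mapping codec name to the encoded bytes of name."""
    return {codec: name.encode(codec) for codec in ("cp932", "euc_jp", "utf-8")}

encoded = encode_all(FILENAME)
for codec, raw in encoded.items():
    print(codec, raw.hex(), len(raw))
</pre><p>
The two kanji take 4 bytes in Shift_JIS and 6 bytes in UTF-8, which is where the factor of about 1.5 comes from.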
|
---|
214 | </p><p>
|
---|
215 | For systems where iconv() is not available or where iconv()'s locales
|
---|
216 | are not compatible with Windows, UTF-8 is the only locale available.
|
---|
217 | </p><p>
|
---|
218 | There are no systems that use UTF-8 as the default locale for Japanese.
|
---|
219 | </p><p>
|
---|
Some broken filenames may be displayed, and some commands that
cannot handle non-ASCII filenames may abort while parsing
filenames. Unlike Shift_JIS, UTF-8 never uses the byte “<span class="quote">\ (0x5c)</span>” inside a
multibyte character, but it is still better not to touch filenames
written from Windows on UNIX.
|
---|
225 | </p><p>
|
---|
226 | <a class="indexterm" name="id2670722"></a>
|
---|
227 | <a class="indexterm" name="id2670728"></a>
|
---|
228 | <a class="indexterm" name="id2670735"></a>
|
---|
In addition, although it is not directly related to Samba, there are
subtle differences between the iconv() function, which is
generally used on UNIX, and the conversion functions used on other platforms,
such as Windows and Java. As far as this is concerned, conversion between
Shift_JIS and Unicode or UTF-8 must be done with care and with recognition
of the limitations involved in the process.
|
---|
235 | </p><p>
|
---|
236 | <a class="indexterm" name="id2670752"></a>
|
---|
237 | Although Mac OS X uses UTF-8 as its encoding method for filenames,
|
---|
238 | it uses an extended UTF-8 specification that Samba cannot handle, so
|
---|
239 | UTF-8 locale is not available for Mac OS X.
|
---|
240 | </p></dd><dt><span class="term">Shift_JIS series + vfs_cap (CAP encoding)</span></dt><dd><p>
|
---|
241 | <a class="indexterm" name="id2670773"></a>
|
---|
242 | <a class="indexterm" name="id2670780"></a>
|
---|
243 | <a class="indexterm" name="id2670786"></a>
|
---|
244 | CAP encoding means a specification used in CAP and NetAtalk, file
|
---|
245 | server software for Macintosh. In the case of CAP encoding, for
|
---|
246 | example, if a Japanese filename consists of 0x8ba4 and 0x974c, and
|
---|
247 | “<span class="quote">.txt</span>” is written from Windows on Samba, the filename on UNIX
|
---|
becomes “<span class="quote">:8b:a4:97L.txt</span>” (a 14-byte ASCII string).
|
---|
249 | </p><p>
|
---|
For CAP encoding, a byte that cannot be expressed as an ASCII
character (0x80 or above) is encoded in an “<span class="quote">:xx</span>” form. You still need to take
care with filenames that contain “<span class="quote">\(0x5c)</span>”, but filenames are not
broken on a system that cannot handle non-ASCII filenames.
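</p><p>
The rule just described can be sketched in Python. This is a minimal illustration of the :xx rule as
stated in the text, not the code of the vfs_cap module. The function name cap_encode is hypothetical, and
Python's cp932 codec is assumed as a stand-in for the Shift_JIS series locale.
</p><pre class="programlisting">
def cap_encode(name, charset="cp932"):
    """Encode name in charset, then write each byte from 0x80 to 0xff as :xx."""
    pieces = []
    for byte in name.encode(charset):
        if byte in range(0x80, 0x100):
            pieces.append(":%02x" % byte)   # non-ASCII byte becomes :xx
        else:
            pieces.append(chr(byte))        # ASCII byte is kept as is
    return "".join(pieces)

cap_name = cap_encode("\u5171\u6709.txt")
print(cap_name, len(cap_name))
</pre><p>
Note that the byte 0x4c survives as the letter L, so an ASCII-range trail byte of a Shift_JIS character
stays visible in the encoded name; a trail byte of 0x5c would likewise survive as a backslash.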
|
---|
254 | </p><p>
|
---|
The greatest merit of CAP encoding is that its filenames are compatible
with those written by CAP or NetAtalk. These are, respectively, the Columbia AppleTalk
Package and the NetAtalk open source software project.
Since these applications write filenames on UNIX with CAP encoding, if a
directory is shared with both Samba and NetAtalk, you need to use
CAP encoding to prevent non-ASCII filenames from being broken.
|
---|
261 | </p><p>
|
---|
However, NetAtalk has recently been
patched on some systems to write filenames with EUC-JP (e.g., on Vine Linux, a distribution of Japanese origin).
In this case, you need to choose EUC-JP series instead of CAP encoding.
|
---|
265 | </p><p>
|
---|
vfs_cap can also be used with non-Shift_JIS series locales, on
systems that cannot handle non-ASCII characters or systems that
share files with NetAtalk.
|
---|
269 | </p><p>
|
---|
270 | To use CAP encoding on Samba-3, you should use the unix charset parameter and VFS
|
---|
271 | as in <a class="link" href="unicode.html#vfscap-intl" title="Example 30.1. VFS CAP">the VFS CAP smb.conf file</a>.
|
---|
</p><div class="example"><a name="vfscap-intl"></a><p class="title"><b>Example 30.1. VFS CAP</b></p><div class="example-contents"><table class="simplelist" border="0" summary="Simple list"><tr><td> </td></tr><tr><td><em class="parameter"><code>[global]</code></em></td></tr><tr><td># the locale name "CP932" may be different</td></tr><tr><td><a class="indexterm" name="id2670886"></a><em class="parameter"><code>dos charset = CP932</code></em></td></tr><tr><td><a class="indexterm" name="id2670897"></a><em class="parameter"><code>unix charset = CP932</code></em></td></tr><tr><td> </td></tr><tr><td><em class="parameter"><code>[cap-share]</code></em></td></tr><tr><td><a class="indexterm" name="id2670918"></a><em class="parameter"><code>vfs objects = cap</code></em></td></tr></table></div></div><br class="example-break"><p>
|
---|
273 | <a class="indexterm" name="id2670933"></a>
|
---|
274 | <a class="indexterm" name="id2670940"></a>
|
---|
275 | <a class="indexterm" name="id2670946"></a>
|
---|
276 | <a class="indexterm" name="id2670953"></a>
|
---|
If you are using GNU libiconv, set unix charset to CP932. With this setting,
filenames in the “<span class="quote">cap-share</span>” share are written with CAP encoding.
|
---|
279 | </p></dd></dl></div></div><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="id2670972"></a>Individual Implementations</h3></div></div></div><p>
|
---|
280 | Here is some additional information regarding individual implementations:
|
---|
281 | </p><div class="variablelist"><dl><dt><span class="term">GNU libiconv</span></dt><dd><p>
|
---|
282 | To handle Japanese correctly, you should apply the patch
|
---|
283 | <a class="ulink" href="http://www2d.biglobe.ne.jp/~msyk/software/libiconv-patch.html" target="_top">libiconv-1.8-cp932-patch.diff.gz</a>
|
---|
284 | to libiconv-1.8.
|
---|
285 | </p><p>
|
---|
286 | Using the patched libiconv-1.8, these settings are available:
|
---|
287 | </p><pre class="programlisting">
|
---|
288 | dos charset = CP932
|
---|
289 | unix charset = CP932 / eucJP-ms / UTF-8
|
---|
290 | | |
|
---|
291 | | +-- EUC-JP series
|
---|
292 | +-- Shift_JIS series
|
---|
293 | display charset = CP932
|
---|
294 | </pre><p>
|
---|
Other Japanese locales (for example, Shift_JIS and EUC-JP) should not
be used because they are not fully compatible with Windows.
|
---|
297 | </p></dd><dt><span class="term">GNU glibc</span></dt><dd><p>
|
---|
298 | To handle Japanese correctly, you should apply a <a class="ulink" href="http://www2d.biglobe.ne.jp/~msyk/software/glibc/" target="_top">patch</a>
|
---|
299 | to glibc-2.2.5/2.3.1/2.3.2 or should use the patch-merged versions, glibc-2.3.3 or later.
|
---|
300 | </p><p>
|
---|
Using the above glibc, these settings are available:
|
---|
302 | </p><table class="simplelist" border="0" summary="Simple list"><tr><td><a class="indexterm" name="id2671048"></a><em class="parameter"><code>dos charset = CP932</code></em></td></tr><tr><td><a class="indexterm" name="id2671060"></a><em class="parameter"><code>unix charset = CP932 / eucJP-ms / UTF-8</code></em></td></tr><tr><td><a class="indexterm" name="id2671072"></a><em class="parameter"><code>display charset = CP932</code></em></td></tr></table><p>
|
---|
303 | </p><p>
|
---|
Other Japanese locales (for example, Shift_JIS and EUC-JP) should not
be used because they are not fully compatible with Windows.
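</p><p>
One well-known instance of this incompatibility can be shown with a short Python sketch. It is an
illustration only. Python's shift_jis and cp932 codecs are assumed here as stand-ins for a standard
conversion table and the Windows-compatible one, and the tables shipped with a given iconv may differ in detail.
</p><pre class="programlisting">
# The byte pair 0x81 0x60 maps to different Unicode characters.
RAW = b"\x81\x60"

def code_point(raw, codec):
    """Decode raw with codec and return the code point as U+XXXX."""
    return "U+%04X" % ord(raw.decode(codec))

standard = code_point(RAW, "shift_jis")
windows = code_point(RAW, "cp932")
print(standard, windows)
</pre><p>
A filename containing this character written by a Windows client may therefore fail to convert, or
convert to a different character, when a table other than the Windows-compatible one is used.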
|
---|
306 | </p></dd></dl></div></div><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="id2671095"></a>Migration from Samba-2.2 Series</h3></div></div></div><p>
|
---|
Up to and including the Samba-2.2 series, the “<span class="quote">coding system</span>” parameter was used. The default codepage in Samba
2.x was code page 850. In the Samba-3 series this has been replaced with the <a class="link" href="smb.conf.5.html#UNIXCHARSET" target="_top">unix charset</a> parameter. <a class="link" href="unicode.html#japancharsets" title="Table 30.1. Japanese Character Sets in Samba-2.2 and Samba-3">Japanese Character Sets in Samba-2.2 and Samba-3</a>
shows the mapping table when migrating from the Samba-2.2 series to Samba-3.
|
---|
310 | </p><div class="table"><a name="japancharsets"></a><p class="title"><b>Table 30.1. Japanese Character Sets in Samba-2.2 and Samba-3</b></p><div class="table-contents"><table summary="Japanese Character Sets in Samba-2.2 and Samba-3" border="1"><colgroup><col align="center"><col align="center"></colgroup><thead><tr><th align="center">Samba-2.2 Coding System</th><th align="center">Samba-3 unix charset</th></tr></thead><tbody><tr><td align="center">SJIS</td><td align="center">Shift_JIS series</td></tr><tr><td align="center">EUC</td><td align="center">EUC-JP series</td></tr><tr><td align="center">EUC3<sup>[<a name="id2671191" href="#ftn.id2671191" class="footnote">a</a>]</sup></td><td align="center">EUC-JP series</td></tr><tr><td align="center">CAP</td><td align="center">Shift_JIS series + VFS</td></tr><tr><td align="center">HEX</td><td align="center">currently none</td></tr><tr><td align="center">UTF8</td><td align="center">UTF-8</td></tr><tr><td align="center">UTF8-Mac<sup>[<a name="id2671222" href="#ftn.id2671222" class="footnote">b</a>]</sup></td><td align="center">currently none</td></tr><tr><td align="center">others</td><td align="center">none</td></tr></tbody><tbody class="footnotes"><tr><td colspan="2"><div class="footnote"><p><sup>[<a name="ftn.id2671191" href="#id2671191" class="para">a</a>] </sup>Only exists in Japanese Samba version</p></div><div class="footnote"><p><sup>[<a name="ftn.id2671222" href="#id2671222" class="para">b</a>] </sup>Only exists in Japanese Samba version</p></div></td></tr></tbody></table></div></div><br class="table-break"></div></div><div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="id2671241"></a>Common Errors</h2></div></div></div><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="id2671247"></a>CP850.so Can't Be Found</h3></div></div></div><p>“<span class="quote">Samba is complaining about a missing <code 
class="filename">CP850.so</code> file.</span>”</p><p>
|
---|
311 | CP850 is the default <a class="link" href="smb.conf.5.html#DOSCHARSET" target="_top">dos charset</a>.
|
---|
312 | The <a class="link" href="smb.conf.5.html#DOSCHARSET" target="_top">dos charset</a> is used to convert data to the codepage used by your DOS clients.
|
---|
313 | If you do not have any DOS clients, you can safely ignore this message. </p><p>
|
---|
314 | CP850 should be supported by your local iconv implementation. Make sure you have all the required packages installed.
|
---|
315 | If you compiled Samba from source, make sure that the configure process found iconv. This can be
|
---|
316 | confirmed by checking the <code class="filename">config.log</code> file that is generated when
|
---|
317 | <code class="literal">configure</code> is executed.</p></div></div></div><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="integrate-ms-networks.html">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="optional.html">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="Backup.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">Chapter 29. Integrating MS Windows Networks with Samba </td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top"> Chapter 31. Backup Techniques</td></tr></table></div></body></html>
|
---|