<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE chapter PUBLIC "-//Samba-Team//DTD DocBook V4.2-Based Variant V1.0//EN" "http://www.samba.org/samba/DTD/samba-doc">
<chapter id="SambaHA">
<chapterinfo>
&author.jht;
&author.jeremy;
</chapterinfo>

<title>High Availability</title>

<sect1>
<title>Features and Benefits</title>

<para>
<indexterm><primary>availability</primary></indexterm>
<indexterm><primary>intolerance</primary></indexterm>
<indexterm><primary>vital task</primary></indexterm>
Network administrators are often concerned about the availability of file and print
services. Network users have little tolerance for failure of the services they depend
on to perform vital tasks.
</para>

<para>
A sign in a computer room served to remind staff of their responsibilities. It read:
</para>

<blockquote>
<para>
<indexterm><primary>fail</primary></indexterm>
<indexterm><primary>managed by humans</primary></indexterm>
<indexterm><primary>economically wise</primary></indexterm>
<indexterm><primary>anticipate failure</primary></indexterm>
All humans fail; in both great and small ways we fail continually. Machines fail too.
Computers are machines that are managed by humans, and the fallout from failure
can be spectacular. Your responsibility is to deal with failure, to anticipate it,
and to eliminate it as far as is humanly and economically wise to achieve.
Are your actions part of the problem or part of the solution?
</para>
</blockquote>

<para>
If we are to deal with failure in a planned and productive manner, then first we must
understand the problem. That is the purpose of this chapter.
</para>

<para>
<indexterm><primary>high availability</primary></indexterm>
<indexterm><primary>CIFS/SMB</primary></indexterm>
<indexterm><primary>state of knowledge</primary></indexterm>
Parenthetically, in the following discussion there are seeds of information on how to
provision a network infrastructure against failure. Our purpose here is not to provide
a lengthy dissertation on the subject of high availability. Additionally, we have made
a conscious decision not to provide detailed working examples of high-availability
solutions; instead, we present an overview of the issues in the hope that someone will
rise to the challenge of providing a detailed document that is focused purely on
presentation of the current state of knowledge and practice in high availability as it
applies to the deployment of Samba and other CIFS/SMB technologies.
</para>

</sect1>

<sect1>
<title>Technical Discussion</title>

<para>
<indexterm><primary>SambaXP conference</primary></indexterm>
<indexterm><primary>Germany</primary></indexterm>
<indexterm><primary>inspired structure</primary></indexterm>
The following summary was part of a presentation by Jeremy Allison at the SambaXP 2003
conference that was held at Goettingen, Germany, in April 2003. Material has been added
from other sources, but it was Jeremy who inspired the structure that follows.
</para>

<sect2>
<title>The Ultimate Goal</title>

<para>
<indexterm><primary>clustering technologies</primary></indexterm>
<indexterm><primary>affordable power</primary></indexterm>
<indexterm><primary>unstoppable services</primary></indexterm>
All clustering technologies aim to achieve one or more of the following:
</para>

<itemizedlist>
<listitem><para>Obtain the maximum affordable computational power.</para></listitem>
<listitem><para>Obtain faster program execution.</para></listitem>
<listitem><para>Deliver unstoppable services.</para></listitem>
<listitem><para>Avert points of failure.</para></listitem>
<listitem><para>Achieve the most effective utilization of resources.</para></listitem>
</itemizedlist>

<para>
A clustered file server ideally has the following properties:
<indexterm><primary>clustered file server</primary></indexterm>
<indexterm><primary>connect transparently</primary></indexterm>
<indexterm><primary>transparently reconnected</primary></indexterm>
<indexterm><primary>distributed file system</primary></indexterm>
</para>

<itemizedlist>
<listitem><para>All clients can connect transparently to any server.</para></listitem>
<listitem><para>A server can fail and clients are transparently reconnected to another server.</para></listitem>
<listitem><para>All servers serve out the same set of files.</para></listitem>
<listitem><para>All file changes are immediately seen on all servers.</para>
<itemizedlist><listitem><para>Requires a distributed file system.</para></listitem></itemizedlist></listitem>
<listitem><para>Infinite ability to scale by adding more servers or disks.</para></listitem>
</itemizedlist>

</sect2>

<sect2>
<title>Why Is This So Hard?</title>

<para>
In short, the problem is one of <emphasis>state</emphasis>.
</para>

<itemizedlist>
<listitem>
<para>
<indexterm><primary>state information</primary></indexterm>
All TCP/IP connections are dependent on state information.
</para>
<para>
<indexterm><primary>TCP failover</primary></indexterm>
The TCP connection involves a packet sequence number. This
sequence number would need to be dynamically updated on all
machines in the cluster to effect seamless TCP failover.
</para>
</listitem>
<listitem>
<para>
<indexterm><primary>CIFS/SMB</primary></indexterm>
<indexterm><primary>TCP</primary></indexterm>
CIFS/SMB (the Windows networking protocols) uses TCP connections.
</para>
<para>
This means that from a basic design perspective, failover is not
seriously considered.
<itemizedlist>
<listitem><para>
All current SMB clusters are failover solutions
&smbmdash; they rely on the clients to reconnect. They provide server
failover, but clients can lose information due to a server failure.
<indexterm><primary>server failure</primary></indexterm>
</para></listitem>
</itemizedlist>
</para>
</listitem>
<listitem>
<para>
Servers keep state information about client connections.
<itemizedlist>
<indexterm><primary>state</primary></indexterm>
<listitem><para>CIFS/SMB involves a lot of state.</para></listitem>
<listitem><para>Every file open must be compared with other open files
to check share modes.</para></listitem>
</itemizedlist>
</para>
</listitem>
</itemizedlist>
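
<para>
<indexterm><primary>smbstatus</primary></indexterm>
The state referred to above is easy to observe on any running Samba server. The
<command>smbstatus</command> utility is not a clustering tool; it simply reports the
session, share connection, and file locking records that a transparent cluster would
somehow have to keep coherent across all of its nodes. For example (output omitted,
since its format varies between Samba releases):
</para>

<para><screen>
root#  smbstatus
root#  smbstatus -L
</screen>
The first command lists active sessions, share connections, and locked files; the
<option>-L</option> form restricts the report to the locking records alone.
</para>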

<sect3>
<title>The Front-End Challenge</title>

<para>
<indexterm><primary>cluster servers</primary></indexterm>
<indexterm><primary>single server</primary></indexterm>
<indexterm><primary>TCP data streams</primary></indexterm>
<indexterm><primary>front-end virtual server</primary></indexterm>
<indexterm><primary>virtual server</primary></indexterm>
<indexterm><primary>de-multiplex</primary></indexterm>
<indexterm><primary>SMB</primary></indexterm>
To make it possible for a cluster of file servers to appear as a single server that has one
name and one IP address, the incoming TCP data streams from clients must be processed by the
front-end virtual server. This server must de-multiplex the incoming packets at the SMB protocol
layer and then feed the SMB packets to the different servers in the cluster.
</para>

<para>
<indexterm><primary>IPC$ connections</primary></indexterm>
<indexterm><primary>RPC calls</primary></indexterm>
One could split all IPC$ connections and RPC calls to one server to handle printing and user
lookup requirements. RPC printing handles are shared between different IPC$ sessions &smbmdash; it is
hard to split this across clustered servers!
</para>

<para>
Conceptually speaking, all other servers would then provide only file services. This is a simpler
problem to concentrate on.
</para>

</sect3>

<sect3>
<title>Demultiplexing SMB Requests</title>

<para>
<indexterm><primary>SMB requests</primary></indexterm>
<indexterm><primary>SMB state information</primary></indexterm>
<indexterm><primary>front-end virtual server</primary></indexterm>
<indexterm><primary>complicated problem</primary></indexterm>
De-multiplexing of SMB requests requires knowledge of SMB state information,
all of which must be held by the front-end <emphasis>virtual</emphasis> server.
This is a perplexing and complicated problem to solve.
</para>

<para>
<indexterm><primary>vuid</primary></indexterm>
<indexterm><primary>tid</primary></indexterm>
<indexterm><primary>fid</primary></indexterm>
Windows XP and later have changed semantics so state information (vuid, tid, fid)
must match for a successful operation. This makes things simpler than before and is a
positive step forward.
</para>

<para>
<indexterm><primary>SMB requests</primary></indexterm>
<indexterm><primary>Terminal Server</primary></indexterm>
SMB requests are sent by vuid to their associated server. No code exists today to
effect this solution. This problem is conceptually similar to the problem of
correctly handling requests from the multiple users of a Windows 2000
Terminal Server in Samba.
</para>

<para>
<indexterm><primary>de-multiplexing</primary></indexterm>
One possibility is to start by exposing the server pool to clients directly.
This could eliminate the de-multiplexing step.
</para>

</sect3>

<sect3>
<title>The Distributed File System Challenge</title>

<para>
<indexterm><primary>Distributed File Systems</primary></indexterm>
There exist many distributed file systems for UNIX and Linux.
</para>

<para>
<indexterm><primary>backend</primary></indexterm>
<indexterm><primary>SMB semantics</primary></indexterm>
<indexterm><primary>share modes</primary></indexterm>
<indexterm><primary>locking</primary></indexterm>
<indexterm><primary>oplock</primary></indexterm>
<indexterm><primary>distributed file systems</primary></indexterm>
Many could be adopted to backend our cluster, so long as awareness of SMB
semantics is kept in mind (share modes, locking, and oplock issues in particular).
Common free distributed file systems include:
<indexterm><primary>NFS</primary></indexterm>
<indexterm><primary>AFS</primary></indexterm>
<indexterm><primary>OpenGFS</primary></indexterm>
<indexterm><primary>Lustre</primary></indexterm>
</para>

<itemizedlist>
<listitem><para>NFS</para></listitem>
<listitem><para>AFS</para></listitem>
<listitem><para>OpenGFS</para></listitem>
<listitem><para>Lustre</para></listitem>
</itemizedlist>

<para>
<indexterm><primary>server pool</primary></indexterm>
The server pool (cluster) can use any distributed file system backend if all SMB
semantics are performed within this pool.
</para>

</sect3>

<sect3>
<title>Restrictive Constraints on Distributed File Systems</title>

<para>
<indexterm><primary>SMB services</primary></indexterm>
<indexterm><primary>oplock handling</primary></indexterm>
<indexterm><primary>server pool</primary></indexterm>
<indexterm><primary>backend file system pool</primary></indexterm>
Where a clustered server provides purely SMB services, oplock handling
may be done within the server pool without imposing a need for this to
be passed to the backend file system pool.
</para>

<para>
<indexterm><primary>NFS</primary></indexterm>
<indexterm><primary>interoperability</primary></indexterm>
On the other hand, where the server pool also provides NFS or other file services,
it will be essential that the implementation be oplock-aware so it can
interoperate with SMB services. This is a significant challenge today. A failure
to provide this interoperability will result in a significant loss of performance that will be
sorely noted by users of Microsoft Windows clients.
</para>
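
<para>
<indexterm><primary>kernel oplocks</primary></indexterm>
<indexterm><primary>smb.conf</primary></indexterm>
Although this chapter deliberately avoids worked solutions, the following sketch shows the
kind of blunt compromise administrators make today when the same files are reached both via
SMB and via NFS or another node. The parameters shown are standard <filename>smb.conf</filename>
options; whether the trade-off is acceptable depends entirely on the workload, and the share
name and path are only placeholders:
</para>

<para><screen>
[global]
   # Allow the kernel to break oplocks when a local or NFS process
   # touches a file. Only effective on platforms that support it.
   kernel oplocks = yes

[data]
   path = /export/data
   # Blunt workaround: surrender oplocks entirely on a share whose
   # files are also accessed outside of Samba, trading client-side
   # caching performance for data safety.
   oplocks = no
   level2 oplocks = no
</screen></para>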

<para>
Last, all state information must be shared across the server pool.
</para>

</sect3>

<sect3>
<title>Server Pool Communications</title>

<para>
<indexterm><primary>POSIX semantics</primary></indexterm>
<indexterm><primary>SMB</primary></indexterm>
<indexterm><primary>POSIX locks</primary></indexterm>
<indexterm><primary>SMB locks</primary></indexterm>
Most backend file systems support POSIX file semantics. This makes it difficult
to push SMB semantics back into the file system. POSIX locks have different properties
and semantics from SMB locks.
</para>

<para>
<indexterm><primary>smbd</primary></indexterm>
<indexterm><primary>tdb</primary></indexterm>
<indexterm><primary>Clustered smbds</primary></indexterm>
All <command>smbd</command> processes in the server pool must of necessity communicate
very quickly. For this, the current <parameter>tdb</parameter> file structure that Samba
uses is not suitable for use across a network. Clustered <command>smbd</command>s must use something else.
</para>
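
<para>
<indexterm><primary>tdbdump</primary></indexterm>
<indexterm><primary>locking.tdb</primary></indexterm>
This state is easy to locate on a running server: the locking, share mode, and connection
databases are ordinary tdb files kept in Samba's lock directory on local disk, which is
precisely why they cannot simply be placed on a network file system and shared by the pool.
As a sketch (the directory shown is illustrative; <command>smbd -b</command> reports the
LOCKDIR compiled into a particular build, and file names vary between releases):
</para>

<para><screen>
root#  smbd -b | grep LOCKDIR
root#  tdbdump /var/lib/samba/locking.tdb
</screen>
The <command>tdbdump</command> utility ships with Samba and prints the raw records held
in a tdb file.
</para>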

</sect3>

<sect3>
<title>Server Pool Communications Demands</title>

<para>
High-speed interserver communication in the server pool is a design prerequisite
for a fully functional system. Possibilities for this include:
</para>

<itemizedlist>
<indexterm><primary>Myrinet</primary></indexterm>
<indexterm><primary>scalable coherent interface</primary><see>SCI</see></indexterm>
<listitem><para>
Proprietary shared memory bus (example: Myrinet or SCI [scalable coherent interface]).
These are high-cost items.
</para></listitem>

<listitem><para>
Gigabit Ethernet (now quite affordable).
</para></listitem>

<listitem><para>
Raw Ethernet framing (to bypass TCP and UDP overheads).
</para></listitem>
</itemizedlist>

<para>
We have yet to identify metrics for performance demands to enable this to happen
effectively.
</para>

</sect3>

<sect3>
<title>Required Modifications to Samba</title>

<para>
Samba needs to be significantly modified to work with a high-speed server interconnect
system to permit transparent failover clustering.
</para>

<para>
Particular functions inside Samba that will be affected include:
</para>

<itemizedlist>
<listitem><para>
The locking database, oplock notifications,
and the share mode database.
</para></listitem>

<listitem><para>
<indexterm><primary>failure semantics</primary></indexterm>
<indexterm><primary>oplock messages</primary></indexterm>
Failure semantics need to be defined. Samba behaves the same way as Windows.
When oplock messages fail, a file open request is allowed, but this is
potentially dangerous in a clustered environment. So how should interserver
pool failure semantics function, and how should such functionality be implemented?
</para></listitem>

<listitem><para>
Should this be implemented using a point-to-point lock manager, or can this
be done using multicast techniques?
</para></listitem>

</itemizedlist>

</sect3>
</sect2>

<sect2>
<title>A Simple Solution</title>

<para>
<indexterm><primary>failover servers</primary></indexterm>
<indexterm><primary>exported file system</primary></indexterm>
<indexterm><primary>distributed locking protocol</primary></indexterm>
Allowing failover servers to handle different functions within the exported file system
removes the problem of requiring a distributed locking protocol.
</para>

<para>
<indexterm><primary>high-speed server interconnect</primary></indexterm>
<indexterm><primary>complex file name space</primary></indexterm>
If only one server is active in a pair, the need for high-speed server interconnect is avoided.
This allows the use of existing high-availability solutions, instead of inventing a new one.
This simpler solution comes at a price &smbmdash; the need to manage a more
complex file name space. Since there is no longer a single file system, administrators
must remember where all services are located &smbmdash; a complexity not easily dealt with.
</para>

<para>
<indexterm><primary>virtual server</primary></indexterm>
The <emphasis>virtual server</emphasis> is still needed to redirect requests to backend
servers. Backend file space integrity is the responsibility of the administrator.
</para>

</sect2>

<sect2>
<title>High-Availability Server Products</title>

<para>
<indexterm><primary>resource failover</primary></indexterm>
<indexterm><primary>high-availability services</primary></indexterm>
<indexterm><primary>dedicated heartbeat</primary></indexterm>
<indexterm><primary>LAN</primary></indexterm>
<indexterm><primary>failover process</primary></indexterm>
Failover servers must communicate in order to handle resource failover. This is essential
for high-availability services. The use of a dedicated heartbeat is a common technique to
introduce some intelligence into the failover process. This is often done over a dedicated
link (LAN or serial).
</para>

<para>
<indexterm><primary>SCSI</primary></indexterm>
<indexterm><primary>Red Hat Cluster Manager</primary></indexterm>
<indexterm><primary>Microsoft Wolfpack</primary></indexterm>
<indexterm><primary>Fiber Channel</primary></indexterm>
<indexterm><primary>failover communication</primary></indexterm>
Many failover solutions (like Red Hat Cluster Manager and Microsoft Wolfpack)
can use a shared SCSI or Fiber Channel disk storage array for failover communication.
Information regarding Red Hat high availability solutions for Samba may be obtained from
<ulink url="http://www.redhat.com/docs/manuals/enterprise/RHEL-AS-2.1-Manual/cluster-manager/s1-service-samba.html">www.redhat.com</ulink>.
</para>

<para>
<indexterm><primary>Linux High Availability project</primary></indexterm>
The Linux High Availability project is a resource worthy of consultation if your desire is
to build a highly available Samba file server solution. Please consult the home page at
<ulink url="http://www.linux-ha.org/">www.linux-ha.org/</ulink>.
</para>
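
<para>
<indexterm><primary>heartbeat</primary></indexterm>
<indexterm><primary>haresources</primary></indexterm>
Purely by way of illustration, a two-node active/passive Samba failover built with the
Linux-HA <command>heartbeat</command> package (version 1 style configuration) reduces to a
floating IP address plus the Samba init script, listed together as resources. The node name,
address, and init script name below are placeholders and vary by distribution:
</para>

<para><screen>
# /etc/ha.d/haresources (must be identical on both nodes)
# Format: preferred-node  resources...
# Here: take over the floating IP, then start the Samba init script.
node1 192.168.1.100 smb
</screen>
The heartbeat media and node list are configured in <filename>/etc/ha.d/ha.cf</filename>,
and node authentication in <filename>/etc/ha.d/authkeys</filename>; consult the Linux-HA
documentation for working examples.
</para>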

<para>
<indexterm><primary>backend failures</primary></indexterm>
<indexterm><primary>continuity of service</primary></indexterm>
Front-end server complexity remains a challenge for high availability because it must deal
gracefully with backend failures, while at the same time providing continuity of service
to all network clients.
</para>

</sect2>

<sect2>
<title>MS-DFS: The Poor Man's Cluster</title>

<para>
<indexterm><primary>MS-DFS</primary></indexterm>
<indexterm><primary>DFS</primary><see>MS-DFS, Distributed File Systems</see></indexterm>
MS-DFS links can be used to redirect clients to disparate backend servers. This pushes
complexity back to the network client; support for this is already built into Microsoft clients.
MS-DFS creates the illusion of a simple, continuous file system name space that works even
at the file level.
</para>
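
<para>
<indexterm><primary>msdfs root</primary></indexterm>
<indexterm><primary>host msdfs</primary></indexterm>
As a sketch of what this looks like in practice (the server, share, and path names here are
invented), one Samba host is made the DFS root and hands out referrals that point at the
real backend servers:
</para>

<para><screen>
# smb.conf on the host acting as the DFS root
[global]
   host msdfs = yes

[dfsroot]
   path = /export/dfsroot
   msdfs root = yes
</screen>
Within <filename>/export/dfsroot</filename>, each junction is an ordinary symbolic link whose
target names the backend server and share:
<screen>
root#  cd /export/dfsroot
root#  ln -s 'msdfs:serverb\engineering' engineering
root#  ln -s 'msdfs:serverc\sales,serverd\sales' sales
</screen>
A client opening <filename>\\dfshost\dfsroot\engineering</filename> is transparently referred
to serverb; the second link lists two targets so that the client receives an alternative
referral if one backend is unavailable. The MSDFS chapter elsewhere in this HOWTO collection
covers the details.
</para>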

<para>
Above all, at the cost of greater management complexity, a distributed system (pseudo-cluster) can
be created using existing Samba functionality.
</para>

</sect2>

<sect2>
<title>Conclusions</title>

<itemizedlist>
<listitem><para>Transparent SMB clustering is hard to do!</para></listitem>
<listitem><para>Client failover is the best we can do today.</para></listitem>
<listitem><para>Much more work is needed before a practical and manageable high-availability transparent cluster solution will be possible.</para></listitem>
<listitem><para>MS-DFS can be used to create the illusion of a single transparent cluster.</para></listitem>
</itemizedlist>

</sect2>

</sect1>
</chapter>