<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE chapter PUBLIC "-//Samba-Team//DTD DocBook V4.2-Based Variant V1.0//EN" "http://www.samba.org/samba/DTD/samba-doc">
<chapter id="SambaHA">
<chapterinfo>
	&author.jht;
	&author.jeremy;
</chapterinfo>

<title>High Availability</title>

<sect1>
<title>Features and Benefits</title>

<para>
<indexterm><primary>availability</primary></indexterm>
<indexterm><primary>intolerance</primary></indexterm>
<indexterm><primary>vital task</primary></indexterm>
Network administrators are often concerned about the availability of file and print
services. Network users are inclined to be intolerant of failures in the services
on which they depend to perform vital tasks.
</para>

<para>
A sign in a computer room served to remind staff of their responsibilities. It read:
</para>

<blockquote>
<para>
<indexterm><primary>fail</primary></indexterm>
<indexterm><primary>managed by humans</primary></indexterm>
<indexterm><primary>economically wise</primary></indexterm>
<indexterm><primary>anticipate failure</primary></indexterm>
All humans fail; in both great and small ways we fail continually. Machines fail too.
Computers are machines that are managed by humans, and the fallout from failure
can be spectacular. Your responsibility is to deal with failure, to anticipate it,
and to eliminate it as far as is humanly and economically wise to achieve.
Are your actions part of the problem or part of the solution?
</para>
</blockquote>

<para>
If we are to deal with failure in a planned and productive manner, then first we must
understand the problem. That is the purpose of this chapter.
</para>

<para>
<indexterm><primary>high availability</primary></indexterm>
<indexterm><primary>CIFS/SMB</primary></indexterm>
<indexterm><primary>state of knowledge</primary></indexterm>
Parenthetically, the following discussion contains the seeds of information on how to
provision a network infrastructure against failure. Our purpose here is not to provide
a lengthy dissertation on the subject of high availability. Additionally, we have made
a conscious decision not to provide detailed working examples of high-availability
solutions; instead, we present an overview of the issues in the hope that someone will
rise to the challenge of providing a detailed document focused purely on
presentation of the current state of knowledge and practice in high availability as it
applies to the deployment of Samba and other CIFS/SMB technologies.
</para>

</sect1>

<sect1>
<title>Technical Discussion</title>

<para>
<indexterm><primary>SambaXP conference</primary></indexterm>
<indexterm><primary>Germany</primary></indexterm>
<indexterm><primary>inspired structure</primary></indexterm>
The following summary was part of a presentation by Jeremy Allison at the SambaXP 2003
conference that was held at Goettingen, Germany, in April 2003. Material has been added
from other sources, but it was Jeremy who inspired the structure that follows.
</para>

<sect2>
<title>The Ultimate Goal</title>

<para>
<indexterm><primary>clustering technologies</primary></indexterm>
<indexterm><primary>affordable power</primary></indexterm>
<indexterm><primary>unstoppable services</primary></indexterm>
All clustering technologies aim to achieve one or more of the following:
</para>

<itemizedlist>
	<listitem><para>Obtain the maximum affordable computational power.</para></listitem>
	<listitem><para>Obtain faster program execution.</para></listitem>
	<listitem><para>Deliver unstoppable services.</para></listitem>
	<listitem><para>Avert points of failure.</para></listitem>
	<listitem><para>Achieve the most effective utilization of resources.</para></listitem>
</itemizedlist>

<para>
A clustered file server ideally has the following properties:
<indexterm><primary>clustered file server</primary></indexterm>
<indexterm><primary>connect transparently</primary></indexterm>
<indexterm><primary>transparently reconnected</primary></indexterm>
<indexterm><primary>distributed file system</primary></indexterm>
</para>

<itemizedlist>
	<listitem><para>All clients can connect transparently to any server.</para></listitem>
	<listitem><para>A server can fail and clients are transparently reconnected to another server.</para></listitem>
	<listitem><para>All servers serve out the same set of files.</para></listitem>
	<listitem><para>All file changes are immediately seen on all servers.</para>
		<itemizedlist><listitem><para>Requires a distributed file system.</para></listitem></itemizedlist></listitem>
	<listitem><para>Infinite ability to scale by adding more servers or disks.</para></listitem>
</itemizedlist>

</sect2>

<sect2>
<title>Why Is This So Hard?</title>

<para>
In short, the problem is one of <emphasis>state</emphasis>.
</para>

<itemizedlist>
	<listitem>
		<para>
		<indexterm><primary>state information</primary></indexterm>
		All TCP/IP connections are dependent on state information.
		</para>
		<para>
		<indexterm><primary>TCP failover</primary></indexterm>
		The TCP connection involves a packet sequence number. This
		sequence number would need to be dynamically updated on all
		machines in the cluster to effect seamless TCP failover.
		</para>
	</listitem>
	<listitem>
		<para>
		<indexterm><primary>CIFS/SMB</primary></indexterm>
		<indexterm><primary>TCP</primary></indexterm>
		CIFS/SMB (the Windows networking protocols) uses TCP connections.
		</para>
		<para>
		This means that from a basic design perspective, failover is not
		seriously considered.
		<itemizedlist>
			<listitem><para>
			All current SMB clusters are failover solutions
			&smbmdash; they rely on the clients to reconnect. They provide server
			failover, but clients can lose information due to a server failure.
			<indexterm><primary>server failure</primary></indexterm>
			</para></listitem>
		</itemizedlist>
		</para>
	</listitem>
	<listitem>
		<para>
		Servers keep state information about client connections
		(see the sketch following this list).
		<itemizedlist>
		<indexterm><primary>state</primary></indexterm>
			<listitem><para>CIFS/SMB involves a lot of state.</para></listitem>
			<listitem><para>Every file open must be compared with other open files
			to check share modes.</para></listitem>
		</itemizedlist>
		</para>
	</listitem>
</itemizedlist>
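
<para>
To make the state problem concrete, here is a minimal sketch, in C, of the kind of
per-client state that a transparent failover cluster would have to replicate between
pool members. The structure and field names are purely illustrative assumptions, not
Samba internals.
</para>

<programlisting><![CDATA[
/* Illustrative only: the kinds of state a transparent SMB-over-TCP
 * failover cluster would need to mirror for every client.  The
 * names here are hypothetical, not Samba internals. */
#include <stdint.h>
#include <stddef.h>

#define MAX_OPEN_FILES 1024

struct cluster_client_state {
	/* TCP layer: sequence/acknowledgment numbers and window,
	 * which every pool member would need in order to take over
	 * a live connection seamlessly. */
	uint32_t tcp_snd_seq;
	uint32_t tcp_rcv_ack;
	uint16_t tcp_window;

	/* SMB session layer. */
	uint16_t vuid;		/* virtual user id (session)  */
	uint16_t tid;		/* tree id (share connection) */

	/* Per-open-file state: share modes must be cross-checked
	 * against every other open anywhere in the cluster. */
	struct {
		uint16_t fid;		/* file id                  */
		int	 share_mode;	/* deny modes, etc.         */
		int	 oplock_level;	/* none/level2/exclusive    */
	} open_files[MAX_OPEN_FILES];
	size_t num_open_files;
};
]]></programlisting>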

<sect3>
<title>The Front-End Challenge</title>

<para>
<indexterm><primary>cluster servers</primary></indexterm>
<indexterm><primary>single server</primary></indexterm>
<indexterm><primary>TCP data streams</primary></indexterm>
<indexterm><primary>front-end virtual server</primary></indexterm>
<indexterm><primary>virtual server</primary></indexterm>
<indexterm><primary>de-multiplex</primary></indexterm>
<indexterm><primary>SMB</primary></indexterm>
To make it possible for a cluster of file servers to appear as a single server that has one
name and one IP address, the incoming TCP data streams from clients must be processed by the
front-end virtual server. This server must de-multiplex the incoming packets at the SMB protocol
layer level and then feed the SMB packets to different servers in the cluster.
</para>

<para>
<indexterm><primary>IPC$ connections</primary></indexterm>
<indexterm><primary>RPC calls</primary></indexterm>
One could split all IPC$ connections and RPC calls to one server to handle printing and user
lookup requirements. RPC printing handles are shared between different IPC$ sessions &smbmdash; it is
hard to split this across clustered servers!
</para>

<para>
Conceptually speaking, all other servers would then provide only file services. This is a simpler
problem to concentrate on.
</para>

</sect3>

<sect3>
<title>Demultiplexing SMB Requests</title>

<para>
<indexterm><primary>SMB requests</primary></indexterm>
<indexterm><primary>SMB state information</primary></indexterm>
<indexterm><primary>front-end virtual server</primary></indexterm>
<indexterm><primary>complicated problem</primary></indexterm>
De-multiplexing of SMB requests requires knowledge of SMB state information,
all of which must be held by the front-end <emphasis>virtual</emphasis> server.
This is a perplexing and complicated problem to solve.
</para>

<para>
<indexterm><primary>vuid</primary></indexterm>
<indexterm><primary>tid</primary></indexterm>
<indexterm><primary>fid</primary></indexterm>
Windows XP and later have changed semantics so state information (vuid, tid, fid)
must match for a successful operation. This makes things simpler than before and is a
positive step forward.
</para>

<para>
<indexterm><primary>SMB requests</primary></indexterm>
<indexterm><primary>Terminal Server</primary></indexterm>
SMB requests are sent by vuid to their associated server. No code exists today to
effect this solution. This problem is conceptually similar to that of
correctly handling requests from the multiple users of a Windows 2000
Terminal Server in Samba.
</para>
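
<para>
As an indication of what such code would have to do, the following hypothetical sketch
parses the fixed-format SMB1 header of an incoming request and forwards it to the pool
member that owns its vuid. Both helper functions are assumptions, not existing Samba
interfaces.
</para>

<programlisting><![CDATA[
/* Hypothetical front-end demultiplexer: route each SMB1 request
 * to the pool member that owns its vuid.  The two helpers are
 * assumed, not existing Samba code. */
#include <stdint.h>
#include <stddef.h>

#define SMB_OFF_UID 28	/* vuid offset within the 32-byte SMB1 header */

int  backend_for_vuid(uint16_t vuid);				  /* assumed */
void forward_to_backend(int backend, const uint8_t *pkt, size_t len); /* assumed */

static uint16_t smb_get_u16(const uint8_t *pkt, size_t off)
{
	/* SMB1 header fields are little-endian. */
	return (uint16_t)(pkt[off] | (pkt[off + 1] << 8));
}

void route_smb_request(const uint8_t *pkt, size_t len)
{
	/* A valid SMB1 request starts with 0xFF 'S' 'M' 'B' and
	 * carries a 32-byte header. */
	if (len < 32 || pkt[0] != 0xFF)
		return;

	uint16_t vuid = smb_get_u16(pkt, SMB_OFF_UID);
	forward_to_backend(backend_for_vuid(vuid), pkt, len);
}
]]></programlisting>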

<para>
<indexterm><primary>de-multiplexing</primary></indexterm>
One possibility is to start by exposing the server pool to clients directly.
This could eliminate the de-multiplexing step.
</para>

</sect3>

<sect3>
<title>The Distributed File System Challenge</title>

<para>
<indexterm><primary>Distributed File Systems</primary></indexterm>
There exist many distributed file systems for UNIX and Linux.
</para>

<para>
<indexterm><primary>backend</primary></indexterm>
<indexterm><primary>SMB semantics</primary></indexterm>
<indexterm><primary>share modes</primary></indexterm>
<indexterm><primary>locking</primary></indexterm>
<indexterm><primary>oplock</primary></indexterm>
<indexterm><primary>distributed file systems</primary></indexterm>
Many could be adapted to provide the backend for our cluster, so long as awareness of SMB
semantics is kept in mind (share modes, locking, and oplock issues in particular).
Common free distributed file systems include:
<indexterm><primary>NFS</primary></indexterm>
<indexterm><primary>AFS</primary></indexterm>
<indexterm><primary>OpenGFS</primary></indexterm>
<indexterm><primary>Lustre</primary></indexterm>
</para>

<itemizedlist>
	<listitem><para>NFS</para></listitem>
	<listitem><para>AFS</para></listitem>
	<listitem><para>OpenGFS</para></listitem>
	<listitem><para>Lustre</para></listitem>
</itemizedlist>

<para>
<indexterm><primary>server pool</primary></indexterm>
The server pool (cluster) can use any distributed file system backend if all SMB
semantics are performed within this pool.
</para>

</sect3>

<sect3>
<title>Restrictive Constraints on Distributed File Systems</title>

<para>
<indexterm><primary>SMB services</primary></indexterm>
<indexterm><primary>oplock handling</primary></indexterm>
<indexterm><primary>server pool</primary></indexterm>
<indexterm><primary>backend file system pool</primary></indexterm>
Where a clustered server provides purely SMB services, oplock handling
may be done within the server pool without imposing a need for this to
be passed to the backend file system pool.
</para>

<para>
<indexterm><primary>NFS</primary></indexterm>
<indexterm><primary>interoperability</primary></indexterm>
On the other hand, where the server pool also provides NFS or other file services,
it will be essential that the implementation be oplock-aware so it can
interoperate with SMB services. This is a significant challenge today. A failure
to provide this interoperability will result in a significant loss of performance that will be
sorely noted by users of Microsoft Windows clients.
</para>
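
<para>
<indexterm><primary>kernel oplocks</primary></indexterm>
Samba does already provide one building block for this kind of interoperability: on
platforms whose kernel supports it (Linux and IRIX, for example), the
<parameter>kernel oplocks</parameter> parameter allows the operating system to break
an oplock when a non-SMB process, such as an NFS daemon or a local UNIX process,
accesses an oplocked file. A minimal illustrative <filename>smb.conf</filename>
fragment follows:
</para>

<programlisting>
[global]
	# Break SMB oplocks when NFS or local processes access
	# the same files (requires kernel support, e.g., Linux or IRIX).
	kernel oplocks = yes
</programlisting>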

<para>
Last, all state information must be shared across the server pool.
</para>

</sect3>

<sect3>
<title>Server Pool Communications</title>

<para>
<indexterm><primary>POSIX semantics</primary></indexterm>
<indexterm><primary>SMB</primary></indexterm>
<indexterm><primary>POSIX locks</primary></indexterm>
<indexterm><primary>SMB locks</primary></indexterm>
Most backend file systems support POSIX file semantics. This makes it difficult
to push SMB semantics back into the file system. POSIX locks have different properties
and semantics from SMB locks.
</para>
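
<para>
One concrete mismatch can be sketched as follows: a POSIX byte-range lock taken with
<command>fcntl()</command> is owned by the process, and closing <emphasis>any</emphasis>
file descriptor that refers to the file drops all of the process's locks on it, whereas
an SMB byte-range lock is tied to one specific open file handle (fid). The example below
is illustrative, not Samba code.
</para>

<programlisting><![CDATA[
/* Minimal POSIX byte-range lock via fcntl().  Note the semantics
 * that clash with SMB: the lock belongs to the process, and closing
 * ANY descriptor that refers to this file silently releases every
 * lock the process holds on it -- unlike an SMB lock, which is tied
 * to a single fid. */
#include <fcntl.h>

int lock_first_100_bytes(int fd)
{
	struct flock fl = {
		.l_type   = F_WRLCK,	/* exclusive write lock */
		.l_whence = SEEK_SET,
		.l_start  = 0,
		.l_len    = 100,	/* bytes 0..99 */
	};
	return fcntl(fd, F_SETLK, &fl);	/* non-blocking; -1 on conflict */
}
]]></programlisting>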

<para>
<indexterm><primary>smbd</primary></indexterm>
<indexterm><primary>tdb</primary></indexterm>
<indexterm><primary>Clustered smbds</primary></indexterm>
All <command>smbd</command> processes in the server pool must of necessity communicate
very quickly. For this, the current <parameter>tdb</parameter> file structure that Samba
uses is not suitable for use across a network. Clustered <command>smbd</command>s must use something else.
</para>
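
<para>
For context, the sketch below shows the classic tdb usage pattern (error handling
trimmed; the key and value contents are invented). A tdb is a fast, locally
memory-mapped key/value store shared by the <command>smbd</command> processes on one
host, which is precisely why it cannot simply be stretched across a network.
</para>

<programlisting><![CDATA[
/* Classic tdb usage pattern: a fast, mmap-based key/value store
 * shared by processes on ONE host.  Clustering needs a network-
 * capable replacement for this.  Key/value contents are invented. */
#include <tdb.h>
#include <fcntl.h>
#include <string.h>

int store_lock_record(void)
{
	TDB_CONTEXT *tdb = tdb_open("locking.tdb", 0, TDB_DEFAULT,
				    O_RDWR | O_CREAT, 0644);
	if (tdb == NULL)
		return -1;

	TDB_DATA key  = { .dptr = (unsigned char *)"fileid:1234",
			  .dsize = strlen("fileid:1234") };
	TDB_DATA data = { .dptr = (unsigned char *)"lock-info",
			  .dsize = strlen("lock-info") };

	int ret = tdb_store(tdb, key, data, TDB_REPLACE);
	tdb_close(tdb);
	return ret;
}
]]></programlisting>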

</sect3>

<sect3>
<title>Server Pool Communications Demands</title>

<para>
High-speed interserver communication in the server pool is a design prerequisite
for a fully functional system. Possibilities for this include:
</para>

<itemizedlist>
<indexterm><primary>Myrinet</primary></indexterm>
<indexterm><primary>scalable coherent interface</primary><see>SCI</see></indexterm>
	<listitem><para>
	A proprietary shared memory bus (for example, Myrinet or SCI [scalable coherent interface]).
	These are high-cost items.
	</para></listitem>

	<listitem><para>
	Gigabit Ethernet (now quite affordable).
	</para></listitem>

	<listitem><para>
	Raw Ethernet framing, to bypass TCP and UDP overheads (see the sketch
	following this list).
	</para></listitem>
</itemizedlist>
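
<para>
As a hedged illustration of the last option, on Linux an <literal>AF_PACKET</literal>
socket can carry pool traffic in bare Ethernet frames with no IP, TCP, or UDP headers
at all. The EtherType and function names below are invented for the example; a real
interconnect would need error handling and a matching receive path.
</para>

<programlisting><![CDATA[
/* Sketch: send a bare Ethernet frame between pool members on Linux,
 * bypassing IP, TCP, and UDP entirely.  Requires CAP_NET_RAW. */
#include <sys/types.h>
#include <sys/socket.h>
#include <linux/if_packet.h>
#include <net/if.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

#define POOL_ETHERTYPE 0x88B5	/* IEEE 802 "local experimental" */

int send_pool_frame(const char *ifname, const unsigned char dst_mac[6],
		    const void *payload, size_t len)
{
	/* SOCK_DGRAM on AF_PACKET: the kernel builds the Ethernet
	 * header for us from the sockaddr_ll below. */
	int fd = socket(AF_PACKET, SOCK_DGRAM, htons(POOL_ETHERTYPE));
	if (fd < 0)
		return -1;

	struct sockaddr_ll addr = {
		.sll_family   = AF_PACKET,
		.sll_protocol = htons(POOL_ETHERTYPE),
		.sll_ifindex  = (int)if_nametoindex(ifname),
		.sll_halen    = 6,
	};
	memcpy(addr.sll_addr, dst_mac, 6);

	ssize_t n = sendto(fd, payload, len, 0,
			   (struct sockaddr *)&addr, sizeof(addr));
	close(fd);
	return n < 0 ? -1 : 0;
}
]]></programlisting>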

<para>
We have yet to identify the performance metrics that such an interconnect must meet
for this to work effectively.
</para>

</sect3>

<sect3>
<title>Required Modifications to Samba</title>

<para>
Samba needs to be significantly modified to work with a high-speed server interconnect
system to permit transparent failover clustering.
</para>

<para>
Particular functions inside Samba that will be affected include:
</para>

<itemizedlist>
	<listitem><para>
	The locking database, oplock notifications,
	and the share mode database.
	</para></listitem>

	<listitem><para>
	<indexterm><primary>failure semantics</primary></indexterm>
	<indexterm><primary>oplock messages</primary></indexterm>
	Failure semantics need to be defined. Samba behaves the same way as Windows:
	when oplock messages fail, a file open request is allowed, but this is
	potentially dangerous in a clustered environment. So how should interserver
	pool failure semantics function, and how should such functionality be implemented?
	</para></listitem>

	<listitem><para>
	Should this be implemented using a point-to-point lock manager, or can this
	be done using multicast techniques?
	</para></listitem>

</itemizedlist>

</sect3>
</sect2>

<sect2>
<title>A Simple Solution</title>

<para>
<indexterm><primary>failover servers</primary></indexterm>
<indexterm><primary>exported file system</primary></indexterm>
<indexterm><primary>distributed locking protocol</primary></indexterm>
Allowing failover servers to handle different functions within the exported file system
removes the problem of requiring a distributed locking protocol.
</para>

<para>
<indexterm><primary>high-speed server interconnect</primary></indexterm>
<indexterm><primary>complex file name space</primary></indexterm>
If only one server is active in a pair, the need for high-speed server interconnect is avoided.
This allows the use of existing high-availability solutions, instead of inventing a new one.
This simpler solution comes at a price &smbmdash; the need to manage a more
complex file name space. Since there is no longer a single file system, administrators
must remember where all services are located &smbmdash; a complexity not easily dealt with.
</para>

<para>
<indexterm><primary>virtual server</primary></indexterm>
The <emphasis>virtual server</emphasis> is still needed to redirect requests to backend
servers. Backend file space integrity is the responsibility of the administrator.
</para>

</sect2>

<sect2>
<title>High-Availability Server Products</title>

<para>
<indexterm><primary>resource failover</primary></indexterm>
<indexterm><primary>high-availability services</primary></indexterm>
<indexterm><primary>dedicated heartbeat</primary></indexterm>
<indexterm><primary>LAN</primary></indexterm>
<indexterm><primary>failover process</primary></indexterm>
Failover servers must communicate in order to handle resource failover. This is essential
for high-availability services. The use of a dedicated heartbeat is a common technique to
introduce some intelligence into the failover process. This is often done over a dedicated
link (LAN or serial).
</para>

<para>
<indexterm><primary>SCSI</primary></indexterm>
<indexterm><primary>Red Hat Cluster Manager</primary></indexterm>
<indexterm><primary>Microsoft Wolfpack</primary></indexterm>
<indexterm><primary>Fiber Channel</primary></indexterm>
<indexterm><primary>failover communication</primary></indexterm>
Many failover solutions (like Red Hat Cluster Manager and Microsoft Wolfpack)
can use a shared SCSI or Fiber Channel disk storage array for failover communication.
Information regarding Red Hat high-availability solutions for Samba may be obtained from
<ulink url="http://www.redhat.com/docs/manuals/enterprise/RHEL-AS-2.1-Manual/cluster-manager/s1-service-samba.html">www.redhat.com</ulink>.
</para>

<para>
<indexterm><primary>Linux High Availability project</primary></indexterm>
The Linux High Availability project is a resource worthy of consultation if your desire is
to build a highly available Samba file server solution. Please consult the home page at
<ulink url="http://www.linux-ha.org/">www.linux-ha.org/</ulink>.
</para>
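
<para>
For a flavor of what such a solution involves, the fragment below sketches a classic
two-node Heartbeat (Linux-HA release 1 style) configuration. The node names, interface,
and address are invented; consult the project documentation for authoritative syntax.
</para>

<programlisting>
# /etc/ha.d/ha.cf -- heartbeat links and cluster membership
bcast  eth1                  # dedicated heartbeat LAN
serial /dev/ttyS0            # optional serial heartbeat link
keepalive 2                  # heartbeat interval (seconds)
deadtime 30                  # declare a node dead after 30 seconds
node   node1.example.com
node   node2.example.com

# /etc/ha.d/haresources -- resources owned by the primary node:
# the floating IP that clients connect to, plus the smb service.
node1.example.com 192.168.1.100 smb
</programlisting>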

<para>
<indexterm><primary>backend failures</primary></indexterm>
<indexterm><primary>continuity of service</primary></indexterm>
Front-end server complexity remains a challenge for high availability because it must deal
gracefully with backend failures, while at the same time providing continuity of service
to all network clients.
</para>

</sect2>

<sect2>
<title>MS-DFS: The Poor Man's Cluster</title>

<para>
<indexterm><primary>MS-DFS</primary></indexterm>
<indexterm><primary>DFS</primary><see>MS-DFS, Distributed File Systems</see></indexterm>
MS-DFS links can be used to redirect clients to disparate backend servers. This pushes
complexity back to the network client, something the Microsoft client software already
supports. MS-DFS creates the illusion of a simple, continuous file system name space
that works even at the file level.
</para>
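
<para>
To make this concrete, the following sketch shows the existing Samba MS-DFS support,
with invented server and share names: the server advertises itself as a DFS host, one
share is marked as a DFS root, and symbolic links of the form
<filename>msdfs:server\share</filename> inside that share become referrals that clients
follow transparently.
</para>

<programlisting>
# smb.conf fragment on the DFS root server (names are illustrative)
[global]
	host msdfs = yes

[dfsroot]
	path = /export/dfsroot
	msdfs root = yes
</programlisting>

<para>
Within the share directory, each referral is then created as a symbolic link, for
example <command>ln -s 'msdfs:serverA\shareA' docs</command>. Spreading such links
across several backend servers yields the pseudo-cluster described below.
</para>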

<para>
Above all, at the cost of complexity of management, a distributed system (pseudo-cluster) can
be created using existing Samba functionality.
</para>

</sect2>

<sect2>
<title>Conclusions</title>

<itemizedlist>
	<listitem><para>Transparent SMB clustering is hard to do!</para></listitem>
	<listitem><para>Client failover is the best we can do today.</para></listitem>
	<listitem><para>Much more work is needed before a practical and manageable high-availability transparent cluster solution will be possible.</para></listitem>
	<listitem><para>MS-DFS can be used to create the illusion of a single transparent cluster.</para></listitem>
</itemizedlist>

</sect2>

</sect1>
</chapter>