<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE chapter PUBLIC "-//Samba-Team//DTD DocBook V4.2-Based Variant V1.0//EN" "http://www.samba.org/samba/DTD/samba-doc">
<chapter id="SambaHA">
<chapterinfo>
        &author.jht;
        &author.jeremy;
</chapterinfo>

<title>High Availability</title>

<sect1>
<title>Features and Benefits</title>

<para>
<indexterm><primary>availability</primary></indexterm>
<indexterm><primary>intolerance</primary></indexterm>
<indexterm><primary>vital task</primary></indexterm>
Network administrators are often concerned about the availability of file and print
services. Network users tend to have little tolerance for failure of the services
they depend on to perform vital tasks.
</para>

<para>
A sign in a computer room served to remind staff of their responsibilities. It read:
</para>

<blockquote>
<para>
<indexterm><primary>fail</primary></indexterm>
<indexterm><primary>managed by humans</primary></indexterm>
<indexterm><primary>economically wise</primary></indexterm>
<indexterm><primary>anticipate failure</primary></indexterm>
All humans fail; in both great and small ways, we fail continually. Machines fail too.
Computers are machines that are managed by humans, and the fallout from failure
can be spectacular. Your responsibility is to deal with failure, to anticipate it,
and to eliminate it as far as is humanly and economically wise to achieve.
Are your actions part of the problem or part of the solution?
</para>
</blockquote>

<para>
If we are to deal with failure in a planned and productive manner, then first we must
understand the problem. That is the purpose of this chapter.
</para>

<para>
<indexterm><primary>high availability</primary></indexterm>
<indexterm><primary>CIFS/SMB</primary></indexterm>
<indexterm><primary>state of knowledge</primary></indexterm>
Parenthetically, the following discussion contains the seeds of information on how to
provision a network infrastructure against failure. Our purpose here is not to provide
a lengthy dissertation on the subject of high availability. Additionally, we have made
a conscious decision not to provide detailed working examples of high-availability
solutions; instead, we present an overview of the issues in the hope that someone will
rise to the challenge of providing a detailed document focused purely on the current
state of knowledge and practice in high availability as it applies to the deployment
of Samba and other CIFS/SMB technologies.
</para>

</sect1>

<sect1>
<title>Technical Discussion</title>

<para>
<indexterm><primary>SambaXP conference</primary></indexterm>
<indexterm><primary>Germany</primary></indexterm>
<indexterm><primary>inspired structure</primary></indexterm>
The following summary was part of a presentation by Jeremy Allison at the SambaXP 2003
conference that was held at Goettingen, Germany, in April 2003. Material has been added
from other sources, but it was Jeremy who inspired the structure that follows.
</para>

        <sect2>
        <title>The Ultimate Goal</title>

        <para>
<indexterm><primary>clustering technologies</primary></indexterm>
<indexterm><primary>affordable power</primary></indexterm>
<indexterm><primary>unstoppable services</primary></indexterm>
        All clustering technologies aim to achieve one or more of the following:
        </para>

        <itemizedlist>
                <listitem><para>Obtain the maximum affordable computational power.</para></listitem>
                <listitem><para>Obtain faster program execution.</para></listitem>
                <listitem><para>Deliver unstoppable services.</para></listitem>
                <listitem><para>Avert points of failure.</para></listitem>
                <listitem><para>Achieve the most effective utilization of resources.</para></listitem>
        </itemizedlist>

        <para>
        A clustered file server ideally has the following properties:
<indexterm><primary>clustered file server</primary></indexterm>
<indexterm><primary>connect transparently</primary></indexterm>
<indexterm><primary>transparently reconnected</primary></indexterm>
<indexterm><primary>distributed file system</primary></indexterm>
        </para>

        <itemizedlist>
                <listitem><para>All clients can connect transparently to any server.</para></listitem>
                <listitem><para>A server can fail and clients are transparently reconnected to another server.</para></listitem>
                <listitem><para>All servers serve out the same set of files.</para></listitem>
                <listitem><para>All file changes are immediately seen on all servers.</para>
                        <itemizedlist><listitem><para>Requires a distributed file system.</para></listitem></itemizedlist></listitem>
                <listitem><para>Infinite ability to scale by adding more servers or disks.</para></listitem>
        </itemizedlist>

        </sect2>

        <sect2>
        <title>Why Is This So Hard?</title>

        <para>
        In short, the problem is one of <emphasis>state</emphasis>.
        </para>

        <itemizedlist>
                <listitem>
                        <para>
<indexterm><primary>state information</primary></indexterm>
                        All TCP/IP connections are dependent on state information.
                        </para>
                        <para>
<indexterm><primary>TCP failover</primary></indexterm>
                        The TCP connection involves a packet sequence number. This
                        sequence number would need to be dynamically updated on all
                        machines in the cluster to effect seamless TCP failover.
                        </para>
                </listitem>
                <listitem>
                        <para>
<indexterm><primary>CIFS/SMB</primary></indexterm>
<indexterm><primary>TCP</primary></indexterm>
                        CIFS/SMB (the Windows networking protocols) uses TCP connections.
                        </para>
                        <para>
                        This means that from a basic design perspective, failover is not
                        seriously considered.
                        <itemizedlist>
                                <listitem><para>
                                All current SMB clusters are failover solutions
                                &smbmdash; they rely on the clients to reconnect. They provide server
                                failover, but clients can lose information due to a server failure.
<indexterm><primary>server failure</primary></indexterm>
                                </para></listitem>
                        </itemizedlist>
                        </para>
                </listitem>
                <listitem>
                        <para>
                        Servers keep state information about client connections
                        (see the sketch following this list).
                        <itemizedlist>
<indexterm><primary>state</primary></indexterm>
                                <listitem><para>CIFS/SMB involves a lot of state.</para></listitem>
                                <listitem><para>Every file open must be compared with other open files
                                                to check share modes.</para></listitem>
                        </itemizedlist>
                        </para>
                </listitem>
        </itemizedlist>
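
        <para>
        To make the state problem concrete, the following minimal sketch (illustrative
        Python only; the names bear no relation to actual Samba internals) shows the
        kind of per-connection state an SMB server holds. For transparent failover,
        all of it would have to be replicated across the cluster in real time:
        </para>

<programlisting>
# Illustrative sketch only: the kind of per-connection state an SMB
# server must hold. None of these names are actual Samba internals.

def share_modes_conflict(existing, requested):
    # Toy rule: DENY_ALL conflicts with everything.
    return "DENY_ALL" in (existing, requested)

class SMBConnectionState:
    def __init__(self, client_addr):
        self.client_addr = client_addr  # client endpoint
        self.vuids = set()              # authenticated sessions (vuid)
        self.tids = set()               # connected shares (tid)
        self.fids = {}                  # open files: fid to (path, share mode)

    def open_file(self, fid, path, share_mode):
        # Every open must be checked against all existing opens for
        # share-mode conflicts; in a cluster, this check must span the
        # state held by every node in the pool.
        for other_path, other_mode in self.fids.values():
            if other_path == path and share_modes_conflict(other_mode, share_mode):
                raise IOError("sharing violation")
        self.fids[fid] = (path, share_mode)
</programlisting>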

                <sect3>
                <title>The Front-End Challenge</title>

                <para>
<indexterm><primary>cluster servers</primary></indexterm>
<indexterm><primary>single server</primary></indexterm>
<indexterm><primary>TCP data streams</primary></indexterm>
<indexterm><primary>front-end virtual server</primary></indexterm>
<indexterm><primary>virtual server</primary></indexterm>
<indexterm><primary>de-multiplex</primary></indexterm>
<indexterm><primary>SMB</primary></indexterm>
                To make it possible for a cluster of file servers to appear as a single server that has one
                name and one IP address, the incoming TCP data streams from clients must be processed by a
                front-end virtual server. This server must de-multiplex the incoming packets at the SMB
                protocol layer and then feed each SMB packet to the appropriate server in the cluster.
                </para>

                <para>
<indexterm><primary>IPC$ connections</primary></indexterm>
<indexterm><primary>RPC calls</primary></indexterm>
                One could direct all IPC$ connections and RPC calls to a single server that handles printing
                and user lookup requirements. However, RPC printing handles are shared between different IPC$
                sessions &smbmdash; it is hard to split these across clustered servers!
                </para>

                <para>
                Conceptually speaking, all other servers would then provide only file services. This is a simpler
                problem to concentrate on.
                </para>

                </sect3>

                <sect3>
                <title>Demultiplexing SMB Requests</title>

                <para>
<indexterm><primary>SMB requests</primary></indexterm>
<indexterm><primary>SMB state information</primary></indexterm>
<indexterm><primary>front-end virtual server</primary></indexterm>
<indexterm><primary>complicated problem</primary></indexterm>
                De-multiplexing of SMB requests requires knowledge of SMB state information,
                all of which must be held by the front-end <emphasis>virtual</emphasis> server.
                This is a perplexing and complicated problem to solve.
                </para>

                <para>
<indexterm><primary>vuid</primary></indexterm>
<indexterm><primary>tid</primary></indexterm>
<indexterm><primary>fid</primary></indexterm>
                Windows XP and later have changed semantics so state information (vuid, tid, fid)
                must match for a successful operation. This makes things simpler than before and is a
                positive step forward.
                </para>

                <para>
<indexterm><primary>SMB requests</primary></indexterm>
<indexterm><primary>Terminal Server</primary></indexterm>
                SMB requests could then be routed by vuid to their associated server, as in the
                sketch below. No code exists today to effect this solution. This problem is
                conceptually similar to the problem of correctly handling requests from multiple
                users who connect to Samba through a Windows 2000 Terminal Server.
                </para>
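
                <para>
                A minimal sketch of this routing idea follows (illustrative Python; the pool
                member names and the request format are assumptions, and nothing here reflects
                actual Samba code):
                </para>

<programlisting>
# Illustrative sketch only: a front-end that routes SMB requests to
# backend servers by vuid. No such code exists in Samba today.
import hashlib

BACKENDS = ["server1", "server2", "server3"]  # hypothetical pool members

def backend_for_vuid(vuid):
    # Stable mapping: every request carrying the same vuid must reach
    # the same backend, because that backend holds the session state.
    digest = hashlib.sha1(str(vuid).encode()).digest()
    return BACKENDS[digest[0] % len(BACKENDS)]

def route(smb_request):
    # The request is assumed to expose the vuid parsed from its header.
    target = backend_for_vuid(smb_request["vuid"])
    print("would forward request with vuid", smb_request["vuid"], "to", target)

route({"vuid": 0x1234})
</programlisting>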

                <para>
<indexterm><primary>de-multiplexing</primary></indexterm>
                One possibility is to start by exposing the server pool to clients directly.
                This could eliminate the de-multiplexing step.
                </para>

                </sect3>

                <sect3>
                <title>The Distributed File System Challenge</title>

                <para>
<indexterm><primary>Distributed File Systems</primary></indexterm>
                There exist many distributed file systems for UNIX and Linux.
                </para>

                <para>
<indexterm><primary>backend</primary></indexterm>
<indexterm><primary>SMB semantics</primary></indexterm>
<indexterm><primary>share modes</primary></indexterm>
<indexterm><primary>locking</primary></indexterm>
<indexterm><primary>oplock</primary></indexterm>
<indexterm><primary>distributed file systems</primary></indexterm>
                Many could be adapted to act as the backend for our cluster, so long as SMB
                semantics are kept in mind (share modes, locking, and oplock issues in particular).
                Common free distributed file systems include:
<indexterm><primary>NFS</primary></indexterm>
<indexterm><primary>AFS</primary></indexterm>
<indexterm><primary>OpenGFS</primary></indexterm>
<indexterm><primary>Lustre</primary></indexterm>
                </para>

                <itemizedlist>
                        <listitem><para>NFS</para></listitem>
                        <listitem><para>AFS</para></listitem>
                        <listitem><para>OpenGFS</para></listitem>
                        <listitem><para>Lustre</para></listitem>
                </itemizedlist>

                <para>
<indexterm><primary>server pool</primary></indexterm>
                The server pool (cluster) can use any distributed file system backend if all SMB
                semantics are performed within this pool.
                </para>

                </sect3>

                <sect3>
                <title>Restrictive Constraints on Distributed File Systems</title>

                <para>
<indexterm><primary>SMB services</primary></indexterm>
<indexterm><primary>oplock handling</primary></indexterm>
<indexterm><primary>server pool</primary></indexterm>
<indexterm><primary>backend file system pool</primary></indexterm>
                Where a clustered server provides purely SMB services, oplock handling
                may be done within the server pool without imposing a need for this to
                be passed to the backend file system pool.
                </para>

                <para>
<indexterm><primary>NFS</primary></indexterm>
<indexterm><primary>interoperability</primary></indexterm>
                On the other hand, where the server pool also provides NFS or other file services,
                it will be essential that the implementation be oplock-aware so it can
                interoperate with SMB services. This is a significant challenge today. A failure
                to provide this interoperability will result in a serious loss of performance that
                will be sorely noted by users of Microsoft Windows clients.
                </para>

                <para>
                Lastly, all state information must be shared across the server pool.
                </para>

                </sect3>

                <sect3>
                <title>Server Pool Communications</title>

                <para>
<indexterm><primary>POSIX semantics</primary></indexterm>
<indexterm><primary>SMB</primary></indexterm>
<indexterm><primary>POSIX locks</primary></indexterm>
<indexterm><primary>SMB locks</primary></indexterm>
                Most backend file systems support POSIX file semantics. This makes it difficult
                to push SMB semantics back into the file system. POSIX locks have different properties
                and semantics from SMB locks.
                </para>
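
                <para>
                The mismatch is easy to demonstrate. POSIX byte-range locks belong to the
                process rather than to the open file handle, so closing <emphasis>any</emphasis>
                descriptor for a file drops all of the process's locks on it &smbmdash; behavior
                no SMB client expects. A minimal illustration (Python, POSIX systems only):
                </para>

<programlisting>
# POSIX-only illustration of why POSIX locks map poorly onto SMB locks.
import fcntl, os

path = "/tmp/locktest"
fd1 = os.open(path, os.O_RDWR | os.O_CREAT)
fd2 = os.open(path, os.O_RDWR)

# Take an exclusive byte-range lock on the first 10 bytes via fd1.
fcntl.lockf(fd1, fcntl.LOCK_EX, 10, 0)

# Closing a *different* descriptor for the same file silently releases
# the lock held via fd1. SMB locks, by contrast, belong to the open
# file handle, so an SMB server built on POSIX locks must compensate.
os.close(fd2)
</programlisting>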

                <para>
<indexterm><primary>smbd</primary></indexterm>
<indexterm><primary>tdb</primary></indexterm>
<indexterm><primary>Clustered smbds</primary></indexterm>
                All <command>smbd</command> processes in the server pool must of necessity communicate
                very quickly. For this, the current <parameter>tdb</parameter> file structure that Samba
                uses is not suitable for use across a network. Clustered <command>smbd</command>s must use something else.
                </para>

                </sect3>

                <sect3>
                <title>Server Pool Communications Demands</title>

                <para>
                High-speed interserver communication in the server pool is a design prerequisite
                for a fully functional system. Possibilities for this include:
                </para>

                <itemizedlist>
<indexterm><primary>Myrinet</primary></indexterm>
<indexterm><primary>scalable coherent interface</primary><see>SCI</see></indexterm>
                        <listitem><para>
                        A proprietary shared memory bus (examples: Myrinet or SCI [scalable coherent interface]).
                        These are high-cost items.
                        </para></listitem>

                        <listitem><para>
                        Gigabit Ethernet (now quite affordable).
                        </para></listitem>

                        <listitem><para>
                        Raw Ethernet framing (to bypass TCP and UDP overheads).
                        </para></listitem>
                </itemizedlist>

                <para>
                We have yet to identify metrics for the performance demands that must be met
                to make this work effectively.
                </para>

                </sect3>

                <sect3>
                <title>Required Modifications to Samba</title>

                <para>
                Samba needs to be significantly modified to work with a high-speed server interconnect
                system to permit transparent failover clustering.
                </para>

                <para>
                Particular functions inside Samba that will be affected include:
                </para>

                <itemizedlist>
                        <listitem><para>
                        The locking database, oplock notifications,
                        and the share mode database.
                        </para></listitem>

                        <listitem><para>
<indexterm><primary>failure semantics</primary></indexterm>
<indexterm><primary>oplock messages</primary></indexterm>
                        Failure semantics need to be defined. Samba behaves the same way as Windows:
                        when oplock messages fail, a file open request is allowed, but this is
                        potentially dangerous in a clustered environment. So how should interserver
                        pool failure semantics function, and how should such functionality be implemented?
                        </para></listitem>

                        <listitem><para>
                        Should this be implemented using a point-to-point lock manager, or can this
                        be done using multicast techniques?
                        </para></listitem>

                </itemizedlist>

                </sect3>
        </sect2>

        <sect2>
        <title>A Simple Solution</title>

        <para>
<indexterm><primary>failover servers</primary></indexterm>
<indexterm><primary>exported file system</primary></indexterm>
<indexterm><primary>distributed locking protocol</primary></indexterm>
        Allowing failover servers to handle different functions within the exported file system
        removes the problem of requiring a distributed locking protocol.
        </para>

        <para>
<indexterm><primary>high-speed server interconnect</primary></indexterm>
<indexterm><primary>complex file name space</primary></indexterm>
        If only one server is active in a pair, the need for high-speed server interconnect is avoided.
        This allows the use of existing high-availability solutions, instead of inventing a new one.
        This simpler solution comes at a price &smbmdash; the need to manage a more complex file
        name space. Since there is no longer a single file system, administrators
        must remember where all services are located &smbmdash; a complexity not easily dealt with.
        </para>

        <para>
<indexterm><primary>virtual server</primary></indexterm>
        The <emphasis>virtual server</emphasis> is still needed to redirect requests to backend
        servers. Backend file space integrity is the responsibility of the administrator.
        </para>

        </sect2>

        <sect2>
        <title>High-Availability Server Products</title>

        <para>
<indexterm><primary>resource failover</primary></indexterm>
<indexterm><primary>high-availability services</primary></indexterm>
<indexterm><primary>dedicated heartbeat</primary></indexterm>
<indexterm><primary>LAN</primary></indexterm>
<indexterm><primary>failover process</primary></indexterm>
        Failover servers must communicate in order to handle resource failover. This is essential
        for high-availability services. The use of a dedicated heartbeat is a common technique to
        introduce some intelligence into the failover process. This is often done over a dedicated
        link (LAN or serial).
        </para>

        <para>
<indexterm><primary>SCSI</primary></indexterm>
<indexterm><primary>Red Hat Cluster Manager</primary></indexterm>
<indexterm><primary>Microsoft Wolfpack</primary></indexterm>
<indexterm><primary>Fiber Channel</primary></indexterm>
<indexterm><primary>failover communication</primary></indexterm>
        Many failover solutions (like Red Hat Cluster Manager and Microsoft Wolfpack)
        can use a shared SCSI or Fiber Channel disk storage array for failover communication.
        Information regarding Red Hat high-availability solutions for Samba may be obtained from
        <ulink url="http://www.redhat.com/docs/manuals/enterprise/RHEL-AS-2.1-Manual/cluster-manager/s1-service-samba.html">www.redhat.com</ulink>.
        </para>

        <para>
<indexterm><primary>Linux High Availability project</primary></indexterm>
        The Linux High Availability project is a resource worthy of consultation if you wish
        to build a highly available Samba file server solution. Please consult the home page at
        <ulink url="http://www.linux-ha.org/">www.linux-ha.org/</ulink>.
        </para>
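
        <para>
        As a taste of what such a solution involves, the fragments below sketch a two-node
        active/passive Samba failover pair using the Linux-HA heartbeat package. The node
        names, interface, serial device, and floating IP address are illustrative
        assumptions only, and a samba resource script must exist on both nodes:
        </para>

<programlisting>
# /etc/ha.d/ha.cf (sketch): heartbeat over a dedicated serial link
# plus a dedicated Ethernet segment between the two nodes.
serial /dev/ttyS0
bcast eth1
keepalive 2
deadtime 30
node node1
node node2

# /etc/ha.d/haresources (sketch): node1 normally owns the floating IP
# address that clients connect to, together with the Samba service.
node1 192.168.1.100 samba
</programlisting>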

        <para>
<indexterm><primary>backend failures</primary></indexterm>
<indexterm><primary>continuity of service</primary></indexterm>
        Front-end server complexity remains a challenge for high availability because it must deal
        gracefully with backend failures, while at the same time providing continuity of service
        to all network clients.
        </para>

        </sect2>

        <sect2>
        <title>MS-DFS: The Poor Man's Cluster</title>

        <para>
<indexterm><primary>MS-DFS</primary></indexterm>
<indexterm><primary>DFS</primary><see>MS-DFS, Distributed File Systems</see></indexterm>
        MS-DFS links can be used to redirect clients to disparate backend servers. This pushes
        complexity back to the network client, a capability already built into Microsoft Windows
        clients. MS-DFS creates the illusion of a simple, continuous file system name space that
        works even at the file level.
        </para>
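
        <para>
        By way of illustration, the sketch below makes a Samba host an MS-DFS root whose
        links refer clients to two backend servers. The server, share, and path names
        are placeholders only:
        </para>

<programlisting>
# smb.conf on the front-end (MS-DFS root) server
[global]
	host msdfs = yes

[dfs]
	path = /export/dfsroot
	msdfs root = yes

# Populate the DFS root with links to the backend servers:
root#  cd /export/dfsroot
root#  ln -s msdfs:backend1\\data1 data1
root#  ln -s msdfs:backend2\\data2 data2
</programlisting>

        <para>
        Clients that browse the <parameter>dfs</parameter> share are then transparently
        referred to the backend servers that actually hold the data.
        </para>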

        <para>
        Above all, at the cost of increased management complexity, a distributed system
        (pseudo-cluster) can be created using existing Samba functionality.
        </para>

        </sect2>

        <sect2>
        <title>Conclusions</title>

        <itemizedlist>
                <listitem><para>Transparent SMB clustering is hard to do!</para></listitem>
                <listitem><para>Client failover is the best we can do today.</para></listitem>
                <listitem><para>Much more work is needed before a practical and manageable high-availability transparent cluster solution will be possible.</para></listitem>
                <listitem><para>MS-DFS can be used to create the illusion of a single transparent cluster.</para></listitem>
        </itemizedlist>

        </sect2>

</sect1>
</chapter>