<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE chapter PUBLIC "-//Samba-Team//DTD DocBook V4.2-Based Variant V1.0//EN" "http://www.samba.org/samba/DTD/samba-doc">
<chapter id="SambaHA">
<chapterinfo>
	&author.jht;
	&author.jeremy;
</chapterinfo>

<title>High Availability</title>

<sect1>
<title>Features and Benefits</title>

<para>
<indexterm><primary>availability</primary></indexterm>
<indexterm><primary>intolerance</primary></indexterm>
<indexterm><primary>vital task</primary></indexterm>
Network administrators are often concerned about the availability of file and print
services. Network users are inclined to be intolerant of failures in the services
on which they depend to perform vital tasks.
</para>

<para>
A sign in a computer room served to remind staff of their responsibilities. It read:
</para>

<blockquote>
<para>
<indexterm><primary>fail</primary></indexterm>
<indexterm><primary>managed by humans</primary></indexterm>
<indexterm><primary>economically wise</primary></indexterm>
<indexterm><primary>anticipate failure</primary></indexterm>
All humans fail; in both great and small ways we fail continually. Machines fail too.
Computers are machines that are managed by humans, and the fallout from failure
can be spectacular. Your responsibility is to deal with failure, to anticipate it,
and to eliminate it as far as is humanly and economically wise to achieve.
Are your actions part of the problem or part of the solution?
</para>
</blockquote>

<para>
If we are to deal with failure in a planned and productive manner, then first we must
understand the problem. That is the purpose of this chapter.
</para>

<para>
<indexterm><primary>high availability</primary></indexterm>
<indexterm><primary>CIFS/SMB</primary></indexterm>
<indexterm><primary>state of knowledge</primary></indexterm>
Parenthetically, the following discussion contains the seeds of information on how to
provision a network infrastructure against failure. Our purpose here is not to provide
a lengthy dissertation on the subject of high availability. Additionally, we have made
a conscious decision not to provide detailed working examples of high-availability
solutions; instead, we present an overview of the issues in the hope that someone will
rise to the challenge of providing a detailed document focused purely on
presentation of the current state of knowledge and practice in high availability as it
applies to the deployment of Samba and other CIFS/SMB technologies.
</para>

</sect1>

<sect1>
<title>Technical Discussion</title>

<para>
<indexterm><primary>SambaXP conference</primary></indexterm>
<indexterm><primary>Germany</primary></indexterm>
<indexterm><primary>inspired structure</primary></indexterm>
The following summary was part of a presentation by Jeremy Allison at the SambaXP 2003
conference that was held at Goettingen, Germany, in April 2003. Material has been added
from other sources, but it was Jeremy who inspired the structure that follows.
</para>

<sect2>
<title>The Ultimate Goal</title>

<para>
<indexterm><primary>clustering technologies</primary></indexterm>
<indexterm><primary>affordable power</primary></indexterm>
<indexterm><primary>unstoppable services</primary></indexterm>
All clustering technologies aim to achieve one or more of the following:
</para>

<itemizedlist>
	<listitem><para>Obtain the maximum affordable computational power.</para></listitem>
	<listitem><para>Obtain faster program execution.</para></listitem>
	<listitem><para>Deliver unstoppable services.</para></listitem>
	<listitem><para>Avert points of failure.</para></listitem>
	<listitem><para>Achieve the most effective utilization of resources.</para></listitem>
</itemizedlist>

<para>
A clustered file server ideally has the following properties:
<indexterm><primary>clustered file server</primary></indexterm>
<indexterm><primary>connect transparently</primary></indexterm>
<indexterm><primary>transparently reconnected</primary></indexterm>
<indexterm><primary>distributed file system</primary></indexterm>
</para>

<itemizedlist>
	<listitem><para>All clients can connect transparently to any server.</para></listitem>
	<listitem><para>A server can fail and clients are transparently reconnected to another server.</para></listitem>
	<listitem><para>All servers serve out the same set of files.</para></listitem>
	<listitem><para>All file changes are immediately seen on all servers.</para>
		<itemizedlist><listitem><para>Requires a distributed file system.</para></listitem></itemizedlist></listitem>
	<listitem><para>Infinite ability to scale by adding more servers or disks.</para></listitem>
</itemizedlist>

</sect2>

<sect2>
<title>Why Is This So Hard?</title>

<para>
In short, the problem is one of <emphasis>state</emphasis>.
</para>

<itemizedlist>
	<listitem>
		<para>
		<indexterm><primary>state information</primary></indexterm>
		All TCP/IP connections are dependent on state information.
		</para>
		<para>
		<indexterm><primary>TCP failover</primary></indexterm>
		The TCP connection involves a packet sequence number. This
		sequence number would need to be dynamically updated on all
		machines in the cluster to effect seamless TCP failover.
		</para>
	</listitem>
	<listitem>
		<para>
		<indexterm><primary>CIFS/SMB</primary></indexterm>
		<indexterm><primary>TCP</primary></indexterm>
		CIFS/SMB (the Windows networking protocols) uses TCP connections.
		</para>
		<para>
		This means that from a basic design perspective, failover is not
		seriously considered.
		<itemizedlist>
			<listitem><para>
			All current SMB clusters are failover solutions
			&smbmdash; they rely on the clients to reconnect. They provide server
			failover, but clients can lose information due to a server failure.
			<indexterm><primary>server failure</primary></indexterm>
			</para></listitem>
		</itemizedlist>
		</para>
	</listitem>
	<listitem>
		<para>
		Servers keep state information about client connections
		(see the sketch following this list).
		<itemizedlist>
		<indexterm><primary>state</primary></indexterm>
			<listitem><para>CIFS/SMB involves a lot of state.</para></listitem>
			<listitem><para>Every file open must be compared with other open files
			to check share modes.</para></listitem>
		</itemizedlist>
		</para>
	</listitem>
</itemizedlist>
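
<para>
To make the state problem concrete, here is a minimal sketch, in C, of the kind of
per-client state that a transparent failover cluster would have to replicate between
pool members. The structure and field names are purely illustrative assumptions, not
Samba internals.
</para>

<programlisting><![CDATA[
/* Illustrative only: the kinds of state a transparent SMB-over-TCP
 * failover cluster would need to mirror for every client.  The
 * names here are hypothetical, not Samba internals. */
#include <stdint.h>
#include <stddef.h>

#define MAX_OPEN_FILES 1024

struct cluster_client_state {
	/* TCP layer: sequence/acknowledgment numbers and window,
	 * which every pool member would need in order to take over
	 * a live connection seamlessly. */
	uint32_t tcp_snd_seq;
	uint32_t tcp_rcv_ack;
	uint16_t tcp_window;

	/* SMB session layer. */
	uint16_t vuid;		/* virtual user id (session)  */
	uint16_t tid;		/* tree id (share connection) */

	/* Per-open-file state: share modes must be cross-checked
	 * against every other open anywhere in the cluster. */
	struct {
		uint16_t fid;		/* file id                  */
		int	 share_mode;	/* deny modes, etc.         */
		int	 oplock_level;	/* none/level2/exclusive    */
	} open_files[MAX_OPEN_FILES];
	size_t num_open_files;
};
]]></programlisting>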

<sect3>
<title>The Front-End Challenge</title>

<para>
<indexterm><primary>cluster servers</primary></indexterm>
<indexterm><primary>single server</primary></indexterm>
<indexterm><primary>TCP data streams</primary></indexterm>
<indexterm><primary>front-end virtual server</primary></indexterm>
<indexterm><primary>virtual server</primary></indexterm>
<indexterm><primary>de-multiplex</primary></indexterm>
<indexterm><primary>SMB</primary></indexterm>
To make it possible for a cluster of file servers to appear as a single server that has one
name and one IP address, the incoming TCP data streams from clients must be processed by the
front-end virtual server. This server must de-multiplex the incoming packets at the SMB protocol
layer level and then feed the SMB packets to different servers in the cluster.
</para>

<para>
<indexterm><primary>IPC$ connections</primary></indexterm>
<indexterm><primary>RPC calls</primary></indexterm>
One could split all IPC$ connections and RPC calls to one server to handle printing and user
lookup requirements. RPC printing handles are shared between different IPC$ sessions &smbmdash; it is
hard to split this across clustered servers!
</para>

<para>
Conceptually speaking, all other servers would then provide only file services. This is a simpler
problem to concentrate on.
</para>

</sect3>

<sect3>
<title>Demultiplexing SMB Requests</title>

<para>
<indexterm><primary>SMB requests</primary></indexterm>
<indexterm><primary>SMB state information</primary></indexterm>
<indexterm><primary>front-end virtual server</primary></indexterm>
<indexterm><primary>complicated problem</primary></indexterm>
De-multiplexing of SMB requests requires knowledge of SMB state information,
all of which must be held by the front-end <emphasis>virtual</emphasis> server.
This is a perplexing and complicated problem to solve.
</para>

<para>
<indexterm><primary>vuid</primary></indexterm>
<indexterm><primary>tid</primary></indexterm>
<indexterm><primary>fid</primary></indexterm>
Windows XP and later have changed semantics so state information (vuid, tid, fid)
must match for a successful operation. This makes things simpler than before and is a
positive step forward.
</para>

<para>
<indexterm><primary>SMB requests</primary></indexterm>
<indexterm><primary>Terminal Server</primary></indexterm>
SMB requests are sent by vuid to their associated server. No code exists today to
effect this solution. This problem is conceptually similar to that of
correctly handling requests from the multiple users of a Windows 2000
Terminal Server in Samba.
</para>
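
<para>
As an indication of what such code would have to do, the following hypothetical sketch
parses the fixed-format SMB1 header of an incoming request and forwards it to the pool
member that owns its vuid. Both helper functions are assumptions, not existing Samba
interfaces.
</para>

<programlisting><![CDATA[
/* Hypothetical front-end demultiplexer: route each SMB1 request
 * to the pool member that owns its vuid.  The two helpers are
 * assumed, not existing Samba code. */
#include <stdint.h>
#include <stddef.h>

#define SMB_OFF_UID 28	/* vuid offset within the 32-byte SMB1 header */

int  backend_for_vuid(uint16_t vuid);				  /* assumed */
void forward_to_backend(int backend, const uint8_t *pkt, size_t len); /* assumed */

static uint16_t smb_get_u16(const uint8_t *pkt, size_t off)
{
	/* SMB1 header fields are little-endian. */
	return (uint16_t)(pkt[off] | (pkt[off + 1] << 8));
}

void route_smb_request(const uint8_t *pkt, size_t len)
{
	/* A valid SMB1 request starts with 0xFF 'S' 'M' 'B' and
	 * carries a 32-byte header. */
	if (len < 32 || pkt[0] != 0xFF)
		return;

	uint16_t vuid = smb_get_u16(pkt, SMB_OFF_UID);
	forward_to_backend(backend_for_vuid(vuid), pkt, len);
}
]]></programlisting>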

<para>
<indexterm><primary>de-multiplexing</primary></indexterm>
One possibility is to start by exposing the server pool to clients directly.
This could eliminate the de-multiplexing step.
</para>

</sect3>

<sect3>
<title>The Distributed File System Challenge</title>

<para>
<indexterm><primary>Distributed File Systems</primary></indexterm>
There exist many distributed file systems for UNIX and Linux.
</para>

<para>
<indexterm><primary>backend</primary></indexterm>
<indexterm><primary>SMB semantics</primary></indexterm>
<indexterm><primary>share modes</primary></indexterm>
<indexterm><primary>locking</primary></indexterm>
<indexterm><primary>oplock</primary></indexterm>
<indexterm><primary>distributed file systems</primary></indexterm>
Many could be adapted to provide the backend for our cluster, so long as awareness of SMB
semantics is kept in mind (share modes, locking, and oplock issues in particular).
Common free distributed file systems include:
<indexterm><primary>NFS</primary></indexterm>
<indexterm><primary>AFS</primary></indexterm>
<indexterm><primary>OpenGFS</primary></indexterm>
<indexterm><primary>Lustre</primary></indexterm>
</para>

<itemizedlist>
	<listitem><para>NFS</para></listitem>
	<listitem><para>AFS</para></listitem>
	<listitem><para>OpenGFS</para></listitem>
	<listitem><para>Lustre</para></listitem>
</itemizedlist>

<para>
<indexterm><primary>server pool</primary></indexterm>
The server pool (cluster) can use any distributed file system backend if all SMB
semantics are performed within this pool.
</para>

</sect3>

<sect3>
<title>Restrictive Constraints on Distributed File Systems</title>

<para>
<indexterm><primary>SMB services</primary></indexterm>
<indexterm><primary>oplock handling</primary></indexterm>
<indexterm><primary>server pool</primary></indexterm>
<indexterm><primary>backend file system pool</primary></indexterm>
Where a clustered server provides purely SMB services, oplock handling
may be done within the server pool without imposing a need for this to
be passed to the backend file system pool.
</para>

<para>
<indexterm><primary>NFS</primary></indexterm>
<indexterm><primary>interoperability</primary></indexterm>
On the other hand, where the server pool also provides NFS or other file services,
it will be essential that the implementation be oplock-aware so it can
interoperate with SMB services. This is a significant challenge today. A failure
to provide this interoperability will result in a significant loss of performance that will be
sorely noted by users of Microsoft Windows clients.
</para>
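
<para>
<indexterm><primary>kernel oplocks</primary></indexterm>
Samba does already provide one building block for this kind of interoperability: on
platforms whose kernel supports it (Linux and IRIX, for example), the
<parameter>kernel oplocks</parameter> parameter allows the operating system to break
an oplock when a non-SMB process, such as an NFS daemon or a local UNIX process,
accesses an oplocked file. A minimal illustrative <filename>smb.conf</filename>
fragment follows:
</para>

<programlisting>
[global]
	# Break SMB oplocks when NFS or local processes access
	# the same files (requires kernel support, e.g., Linux or IRIX).
	kernel oplocks = yes
</programlisting>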

<para>
Last, all state information must be shared across the server pool.
</para>

</sect3>

<sect3>
<title>Server Pool Communications</title>

<para>
<indexterm><primary>POSIX semantics</primary></indexterm>
<indexterm><primary>SMB</primary></indexterm>
<indexterm><primary>POSIX locks</primary></indexterm>
<indexterm><primary>SMB locks</primary></indexterm>
Most backend file systems support POSIX file semantics. This makes it difficult
to push SMB semantics back into the file system. POSIX locks have different properties
and semantics from SMB locks.
</para>
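
<para>
One concrete mismatch can be sketched as follows: a POSIX byte-range lock taken with
<command>fcntl()</command> is owned by the process, and closing <emphasis>any</emphasis>
file descriptor that refers to the file drops all of the process's locks on it, whereas
an SMB byte-range lock is tied to one specific open file handle (fid). The example below
is illustrative, not Samba code.
</para>

<programlisting><![CDATA[
/* Minimal POSIX byte-range lock via fcntl().  Note the semantics
 * that clash with SMB: the lock belongs to the process, and closing
 * ANY descriptor that refers to this file silently releases every
 * lock the process holds on it -- unlike an SMB lock, which is tied
 * to a single fid. */
#include <fcntl.h>

int lock_first_100_bytes(int fd)
{
	struct flock fl = {
		.l_type   = F_WRLCK,	/* exclusive write lock */
		.l_whence = SEEK_SET,
		.l_start  = 0,
		.l_len    = 100,	/* bytes 0..99 */
	};
	return fcntl(fd, F_SETLK, &fl);	/* non-blocking; -1 on conflict */
}
]]></programlisting>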

<para>
<indexterm><primary>smbd</primary></indexterm>
<indexterm><primary>tdb</primary></indexterm>
<indexterm><primary>Clustered smbds</primary></indexterm>
All <command>smbd</command> processes in the server pool must of necessity communicate
very quickly. For this, the current <parameter>tdb</parameter> file structure that Samba
uses is not suitable for use across a network. Clustered <command>smbd</command>s must use something else.
</para>
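
<para>
For context, the sketch below shows the classic tdb usage pattern (error handling
trimmed; the key and value contents are invented). A tdb is a fast, locally
memory-mapped key/value store shared by the <command>smbd</command> processes on one
host, which is precisely why it cannot simply be stretched across a network.
</para>

<programlisting><![CDATA[
/* Classic tdb usage pattern: a fast, mmap-based key/value store
 * shared by processes on ONE host.  Clustering needs a network-
 * capable replacement for this.  Key/value contents are invented. */
#include <tdb.h>
#include <fcntl.h>
#include <string.h>

int store_lock_record(void)
{
	TDB_CONTEXT *tdb = tdb_open("locking.tdb", 0, TDB_DEFAULT,
				    O_RDWR | O_CREAT, 0644);
	if (tdb == NULL)
		return -1;

	TDB_DATA key  = { .dptr = (unsigned char *)"fileid:1234",
			  .dsize = strlen("fileid:1234") };
	TDB_DATA data = { .dptr = (unsigned char *)"lock-info",
			  .dsize = strlen("lock-info") };

	int ret = tdb_store(tdb, key, data, TDB_REPLACE);
	tdb_close(tdb);
	return ret;
}
]]></programlisting>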

</sect3>

<sect3>
<title>Server Pool Communications Demands</title>

<para>
High-speed interserver communication in the server pool is a design prerequisite
for a fully functional system. Possibilities for this include:
</para>

<itemizedlist>
<indexterm><primary>Myrinet</primary></indexterm>
<indexterm><primary>scalable coherent interface</primary><see>SCI</see></indexterm>
	<listitem><para>
	A proprietary shared memory bus (for example, Myrinet or SCI [scalable coherent interface]).
	These are high-cost items.
	</para></listitem>

	<listitem><para>
	Gigabit Ethernet (now quite affordable).
	</para></listitem>

	<listitem><para>
	Raw Ethernet framing, to bypass TCP and UDP overheads (see the sketch
	following this list).
	</para></listitem>
</itemizedlist>
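
<para>
As a hedged illustration of the last option, on Linux an <literal>AF_PACKET</literal>
socket can carry pool traffic in bare Ethernet frames with no IP, TCP, or UDP headers
at all. The EtherType and function names below are invented for the example; a real
interconnect would need error handling and a matching receive path.
</para>

<programlisting><![CDATA[
/* Sketch: send a bare Ethernet frame between pool members on Linux,
 * bypassing IP, TCP, and UDP entirely.  Requires CAP_NET_RAW. */
#include <sys/types.h>
#include <sys/socket.h>
#include <linux/if_packet.h>
#include <net/if.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

#define POOL_ETHERTYPE 0x88B5	/* IEEE 802 "local experimental" */

int send_pool_frame(const char *ifname, const unsigned char dst_mac[6],
		    const void *payload, size_t len)
{
	/* SOCK_DGRAM on AF_PACKET: the kernel builds the Ethernet
	 * header for us from the sockaddr_ll below. */
	int fd = socket(AF_PACKET, SOCK_DGRAM, htons(POOL_ETHERTYPE));
	if (fd < 0)
		return -1;

	struct sockaddr_ll addr = {
		.sll_family   = AF_PACKET,
		.sll_protocol = htons(POOL_ETHERTYPE),
		.sll_ifindex  = (int)if_nametoindex(ifname),
		.sll_halen    = 6,
	};
	memcpy(addr.sll_addr, dst_mac, 6);

	ssize_t n = sendto(fd, payload, len, 0,
			   (struct sockaddr *)&addr, sizeof(addr));
	close(fd);
	return n < 0 ? -1 : 0;
}
]]></programlisting>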

<para>
We have yet to identify the performance metrics that such an interconnect must meet
for this to work effectively.
</para>

</sect3>

<sect3>
<title>Required Modifications to Samba</title>

<para>
Samba needs to be significantly modified to work with a high-speed server interconnect
system to permit transparent failover clustering.
</para>

<para>
Particular functions inside Samba that will be affected include:
</para>

<itemizedlist>
	<listitem><para>
	The locking database, oplock notifications,
	and the share mode database.
	</para></listitem>

	<listitem><para>
	<indexterm><primary>failure semantics</primary></indexterm>
	<indexterm><primary>oplock messages</primary></indexterm>
	Failure semantics need to be defined. Samba behaves the same way as Windows:
	when oplock messages fail, a file open request is allowed, but this is
	potentially dangerous in a clustered environment. So how should interserver
	pool failure semantics function, and how should such functionality be implemented?
	</para></listitem>

	<listitem><para>
	Should this be implemented using a point-to-point lock manager, or can this
	be done using multicast techniques?
	</para></listitem>

</itemizedlist>

</sect3>
</sect2>

<sect2>
<title>A Simple Solution</title>

<para>
<indexterm><primary>failover servers</primary></indexterm>
<indexterm><primary>exported file system</primary></indexterm>
<indexterm><primary>distributed locking protocol</primary></indexterm>
Allowing failover servers to handle different functions within the exported file system
removes the problem of requiring a distributed locking protocol.
</para>

<para>
<indexterm><primary>high-speed server interconnect</primary></indexterm>
<indexterm><primary>complex file name space</primary></indexterm>
If only one server is active in a pair, the need for high-speed server interconnect is avoided.
This allows the use of existing high-availability solutions, instead of inventing a new one.
This simpler solution comes at a price &smbmdash; the need to manage a more
complex file name space. Since there is no longer a single file system, administrators
must remember where all services are located &smbmdash; a complexity not easily dealt with.
</para>

<para>
<indexterm><primary>virtual server</primary></indexterm>
The <emphasis>virtual server</emphasis> is still needed to redirect requests to backend
servers. Backend file space integrity is the responsibility of the administrator.
</para>

</sect2>

<sect2>
<title>High-Availability Server Products</title>

<para>
<indexterm><primary>resource failover</primary></indexterm>
<indexterm><primary>high-availability services</primary></indexterm>
<indexterm><primary>dedicated heartbeat</primary></indexterm>
<indexterm><primary>LAN</primary></indexterm>
<indexterm><primary>failover process</primary></indexterm>
Failover servers must communicate in order to handle resource failover. This is essential
for high-availability services. The use of a dedicated heartbeat is a common technique to
introduce some intelligence into the failover process. This is often done over a dedicated
link (LAN or serial).
</para>

<para>
<indexterm><primary>SCSI</primary></indexterm>
<indexterm><primary>Red Hat Cluster Manager</primary></indexterm>
<indexterm><primary>Microsoft Wolfpack</primary></indexterm>
<indexterm><primary>Fiber Channel</primary></indexterm>
<indexterm><primary>failover communication</primary></indexterm>
Many failover solutions (like Red Hat Cluster Manager and Microsoft Wolfpack)
can use a shared SCSI or Fiber Channel disk storage array for failover communication.
Information regarding Red Hat high-availability solutions for Samba may be obtained from
<ulink url="http://www.redhat.com/docs/manuals/enterprise/RHEL-AS-2.1-Manual/cluster-manager/s1-service-samba.html">www.redhat.com</ulink>.
</para>

<para>
<indexterm><primary>Linux High Availability project</primary></indexterm>
The Linux High Availability project is a resource worthy of consultation if your desire is
to build a highly available Samba file server solution. Please consult the home page at
<ulink url="http://www.linux-ha.org/">www.linux-ha.org/</ulink>.
</para>
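
<para>
For a flavor of what such a solution involves, the fragment below sketches a classic
two-node Heartbeat (Linux-HA release 1 style) configuration. The node names, interface,
and address are invented; consult the project documentation for authoritative syntax.
</para>

<programlisting>
# /etc/ha.d/ha.cf -- heartbeat links and cluster membership
bcast  eth1                  # dedicated heartbeat LAN
serial /dev/ttyS0            # optional serial heartbeat link
keepalive 2                  # heartbeat interval (seconds)
deadtime 30                  # declare a node dead after 30 seconds
node   node1.example.com
node   node2.example.com

# /etc/ha.d/haresources -- resources owned by the primary node:
# the floating IP that clients connect to, plus the smb service.
node1.example.com 192.168.1.100 smb
</programlisting>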

<para>
<indexterm><primary>backend failures</primary></indexterm>
<indexterm><primary>continuity of service</primary></indexterm>
Front-end server complexity remains a challenge for high availability because it must deal
gracefully with backend failures, while at the same time providing continuity of service
to all network clients.
</para>

</sect2>

<sect2>
<title>MS-DFS: The Poor Man's Cluster</title>

<para>
<indexterm><primary>MS-DFS</primary></indexterm>
<indexterm><primary>DFS</primary><see>MS-DFS, Distributed File Systems</see></indexterm>
MS-DFS links can be used to redirect clients to disparate backend servers. This pushes
complexity back to the network client, something the Microsoft client software already
supports. MS-DFS creates the illusion of a simple, continuous file system name space
that works even at the file level.
</para>
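
<para>
To make this concrete, the following sketch shows the existing Samba MS-DFS support,
with invented server and share names: the server advertises itself as a DFS host, one
share is marked as a DFS root, and symbolic links of the form
<filename>msdfs:server\share</filename> inside that share become referrals that clients
follow transparently.
</para>

<programlisting>
# smb.conf fragment on the DFS root server (names are illustrative)
[global]
	host msdfs = yes

[dfsroot]
	path = /export/dfsroot
	msdfs root = yes
</programlisting>

<para>
Within the share directory, each referral is then created as a symbolic link, for
example <command>ln -s 'msdfs:serverA\shareA' docs</command>. Spreading such links
across several backend servers yields the pseudo-cluster described below.
</para>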

<para>
Above all, at the cost of complexity of management, a distributed system (pseudo-cluster) can
be created using existing Samba functionality.
</para>

</sect2>

<sect2>
<title>Conclusions</title>

<itemizedlist>
	<listitem><para>Transparent SMB clustering is hard to do!</para></listitem>
	<listitem><para>Client failover is the best we can do today.</para></listitem>
	<listitem><para>Much more work is needed before a practical and manageable high-availability transparent cluster solution will be possible.</para></listitem>
	<listitem><para>MS-DFS can be used to create the illusion of a single transparent cluster.</para></listitem>
</itemizedlist>

</sect2>

</sect1>
</chapter>