On Fri, 2010-02-19 at 18:41 +0100, Andrew Beekhof wrote:
> On Fri, Feb 19, 2010 at 5:36 PM, Dietmar Maurer <diet...@proxmox.com> wrote:
> > Hi all, I just found a whitepaper from XenServer - it seems they implement
> > some kind of self-fencing:
> >
> > -----text from XenServer High Availability Whitepaper-------
> > The worst-case scenario for HA is the situation where a host is thought
> > to be off-line but is actually still writing to the shared storage,
> > because this can result in corruption of persistent data. To prevent
> > this situation without requiring active power strip controls, XenServer
> > employs hypervisor-level fencing. This is a Xen modification which
> > hard-powers off the host at a very low-level if it does not hear
> > regularly from a watchdog process running in the control domain.
> > Because it is implemented at a very low-level, this also protects the
> > storage in the case where the control domain becomes unresponsive for
> > some reason.
> > --------------
> >
> > Does that really make sense? That seems to be a very unreliable solution,
> > because there is no guarantee that a failed node will 'self-fence'
> > itself. Or am I missing something?
> 
> Do you trust a host that has already failed in some way to now start
> behaving correctly and fence itself?  I wouldn't.

It really depends on the fencing model and what you believe to be more
reliable.  One model says "tell node X to fence" (power fencing) while
the alternative model says "if I don't tell you my health is good,
please self-fence" (watchdog fencing).

There are millions of lines of C code involved in directing a power
fencing device to fence a node.  Generally in this case, the system
directing the fencing is operating from a known good state.

There are several hundred lines of C code that trigger a reboot when a
watchdog timer isn't fed.  Generally in this case, the system directing
the fencing (itself) has entered an undefined failure state.

So a quick matrix:
model            LOC       operating environment  
power fencing    millions  well-defined
self fencing     hundreds  undefined

Knowing well how software works, I personally would trust the code with
roughly four orders of magnitude fewer LOC, even when operating in an
undefined state.  The watchdog code (softdog) in the kernel is super
simple, and relies only on timer interrupts.  It is possible the timer
interrupts won't be delivered, in which case an NMI watchdog timer
(which is hardware based) can be used to watch for that situation.  It
is possible for errant kernel code to corrupt the timer list that the
kernel uses to expire timers.  If this happens, self-fencing using
software watchdogs will fail gloriously.
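
To make the watchdog model concrete, here is a minimal Python sketch of
the "feed me or I fence" logic.  It is a toy simulation, not the softdog
code: the class and function names, the 10 second timeout, and the
5 second check interval are all assumptions made up for illustration.

```python
class SimulatedWatchdog:
    """Toy model of a watchdog timer (hypothetical, not the kernel softdog)."""

    def __init__(self, timeout, now=0.0):
        self.timeout = timeout         # seconds allowed between feeds (assumed)
        self.deadline = now + timeout  # time at which the node self-fences
        self.fenced = False

    def tick(self, now):
        """Stand-in for the timer interrupt: fence once the deadline passes."""
        if now >= self.deadline:
            self.fenced = True
        return self.fenced

    def feed(self, now):
        """A health check passed: push the deadline out, unless already fenced."""
        if not self.fenced:
            self.deadline = now + self.timeout


def run(health_reports, timeout=10.0, interval=5.0):
    """Replay one health report per interval.

    Returns the simulated time at which the node self-fenced, or None if
    the watchdog was always fed in time.
    """
    dog = SimulatedWatchdog(timeout)
    now = 0.0
    for healthy in health_reports:
        now += interval
        if dog.tick(now):   # the timer fires regardless of node health
            return now
        if healthy:
            dog.feed(now)   # only a healthy node feeds the watchdog
    return None
```

Note that nothing in the failure path depends on the node doing anything
right: the fence happens because the feed did not arrive.  The sketch
does assume tick() keeps being called, which is exactly the timer
interrupt dependency described above.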

When considering hardware watchdog timer devices, the decision becomes
even clearer, since a hardware watchdog timer has almost complete
isolation from the system in which it is integrated.  It is also
designed and hardened around one purpose: to power-cycle a system if it
is not fed a health check.

Expanding the matrix:
model              LOC       operating environment
power fencing      millions  well-defined
software watchdog  hundreds  undefined
hardware watchdog  ASIC      well-defined

In the case of a hardware watchdog, the LOC is hidden behind a
self-contained ASIC.  This ASIC could be defective in some way.  But it
is also isolated from the rest of the system so that it operates in a
well-defined environment.

Compare those with the failure scenarios of power fencing:
1) the power fencing device could have failed in some way
2) the power fencing device could process a request incorrectly
3) the code that interfaces with the power fencing device could be
defective in some conditions
4) the power fencing hardware could fail to reset its relays for the
node to be rebooted
5) the fencing system directing the fencing could fail in its
communication to the fencing device
6) the network switch connecting the fencing device to the host systems
could have a transient failure to the particular port on which the power
fencing device is configured
... think up your own ...

There are thousands of interactions with power fencing and every one of
them needs to work perfectly for power fencing to work.  On the plus
side, the system is operating in a known good state rather than an
undefined failure condition.
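
As a rough illustration of why a long chain of interactions matters,
the sketch below multiplies out the success probability of a chain in
which every step must work.  The per-step reliability of 0.9999 and the
assumption that steps fail independently are invented for the example,
not measurements of any real fencing system.

```python
def chain_success_probability(per_step_reliability, steps):
    """Probability that all `steps` independent steps succeed.

    Assumes every step has the same reliability and that failures are
    independent, which is a simplification.
    """
    return per_step_reliability ** steps


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the trend.
    for steps in (1, 100, 1000):
        p = chain_success_probability(0.9999, steps)
        print(f"{steps:5d} steps -> {p:.4f}")
```

Under those assumptions, even very reliable individual steps add up to
a noticeable chance of failure once the chain is a thousand steps long.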

Neither system is perfect, and it is likely a matter of opinion which
you choose.  At the moment there are no good watchdog-based cluster
fencing implementations available in the community, but it is something
I'd like to tackle.

Regards
-steve


> _______________________________________________
> Openais mailing list
> Openais@lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/openais
