On Fri, 2010-02-19 at 18:41 +0100, Andrew Beekhof wrote:
> On Fri, Feb 19, 2010 at 5:36 PM, Dietmar Maurer <diet...@proxmox.com> wrote:
> > Hi all, I just found a whitepaper from XenServer - seem they implement
> > some kind of self-fencing:
> >
> > -----text from XenServer High Availability Whitepaper-------
> > The worst-case scenario for HA is the situation where a host is thought
> > to be off-line but is actually still writing to the shared storage,
> > because this can result in corruption of persistent data. To prevent
> > this situation without requiring active power strip controls, XenServer
> > employs hypervisor-level fencing. This is a Xen modification which
> > hard-powers off the host at a very low-level if it does not hear
> > regularly from a watchdog process running in the control domain.
> > Because it is implemented at a very low-level, this also protects the
> > storage in the case where the control domain becomes unresponsive for
> > some reason.
> > --------------
> >
> > Does that really make sense? That seem to be a very unreliable solution,
> > because there is no guarantee that a failed node 'self-fence' itself? Or
> > do I miss something?
>
> Do you trust a host, that has already failed in some way, to now start
> behaving correctly and fence itself? I wouldn't.

It really depends on the fencing model and what you believe to be more
reliable. One model says "tell node X to fence" (power fencing), while the
alternative model says "if I don't tell you my health is good, please
self-fence" (watchdog fencing).

There are millions of lines of C code involved in directing a power
fencing device to fence a node. Generally in this case, the system
directing the fencing is operating from a known good state.

There are several hundred lines of C code that trigger a reboot when a
watchdog timer isn't fed. Generally in this case, the system directing the
fencing (itself) has entered an undefined failure state.

So a quick matrix:

model              LOC        operating environment
power fencing      millions   well-defined
self fencing       hundreds   undefined

Knowing well how software works, I personally would trust the code with
several orders of magnitude fewer LOC, even when operating in an undefined
state.

The watchdog code (softdog) in the kernel is super simple, and relies only
on timer interrupts. It is possible the timer interrupts won't be
delivered, in which case an NMI watchdog timer (which is hardware based)
can be used to watch for that situation. It is also possible for errant
kernel code to corrupt the timer list that the kernel uses to expire
timers. If this happens, self-fencing using software watchdogs will fail
gloriously.

When considering hardware watchdog timer devices, the decision becomes even
clearer, since a hardware watchdog timer has almost complete isolation
from the system in which it is integrated. It is also designed and
hardened around one purpose: to power-cycle a system if it is not fed a
health check.

Expanding the matrix:

model              LOC        operating environment
power fencing      millions   well-defined
software watchdog  hundreds   undefined
hardware watchdog  ASIC       well-defined

In the case of a hardware watchdog, the LOC is hidden behind a
self-contained ASIC. This ASIC could be defective in some way.
But it is also isolated from the rest of the system, so it operates in a
well-defined environment.

Compare those with the failure scenarios of power fencing:

1) the power fencing device could have failed in some way
2) the power fencing device could process a request incorrectly
3) the code that interfaces with the power fencing device could be
   defective in some conditions
4) the power fencing hardware could fail to reset its relays for the node
   to be rebooted
5) the fencing system directing the fencing could fail in its
   communication to the fencing device
6) the network switch connecting the fencing device to the host systems
   could have a transient failure on the particular port on which the
   power fencing device is configured
... think up your own ...

There are thousands of interactions with power fencing and every one of
them needs to work perfectly for power fencing to work. On the plus side,
the system is operating in a known good state rather than an undefined
failure condition.

Neither system is perfect, and it is likely a matter of opinion which you
choose. ATM there are no good watchdog-based cluster fencing
implementations available in the community, but that is something I'd
like to tackle.

Regards
-steve

> _______________________________________________
> Openais mailing list
> Openais@lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/openais
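
P.S. To make the "hundreds of lines" point concrete, here is a rough toy
model of the feed-or-reset logic a watchdog implements. It is only a
sketch of the idea, not the kernel's softdog code: the names
(struct toy_watchdog, wd_init, wd_feed, wd_expired) are made up for
illustration, and time is modelled as a plain tick counter rather than
real timer interrupts.

```c
/* Toy model of watchdog fencing. NOT the kernel's softdog driver;
 * every name below is hypothetical and exists only for illustration. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_watchdog {
    uint64_t timeout;   /* ticks allowed between two feeds */
    uint64_t deadline;  /* tick at which the node must be reset */
};

/* Arm the watchdog at tick `now` with the given timeout. */
static void wd_init(struct toy_watchdog *wd, uint64_t now, uint64_t timeout)
{
    assert(wd != NULL && timeout > 0);
    wd->timeout = timeout;
    wd->deadline = now + timeout;
}

/* A passing health check "feeds" the watchdog, pushing the deadline out. */
static void wd_feed(struct toy_watchdog *wd, uint64_t now)
{
    wd->deadline = now + wd->timeout;
}

/* Checked on every timer tick; true means "self-fence (reset) now". */
static bool wd_expired(const struct toy_watchdog *wd, uint64_t now)
{
    return now >= wd->deadline;
}
```

The point of the sketch is how little state is involved: one deadline and
one comparison. A node that stops calling wd_feed(), for whatever reason,
is reset once the deadline passes, without any other node having to reach
it over the network.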