avoiding double failover using Pacemaker

2020-09-25 · Computing

Imagine this: It’s deep in the middle of the night, and disaster strikes: a mission-critical server goes down. Perhaps there’s a moth in the machine, or perhaps someone you’re related to tripped over the power cord. (Okay, this is unlikely to happen in a datacentre.) No matter! You have a high-availability setup, the robots realise something’s wrong, there’s a failover, and other services detect loss of connectivity and redial. Perhaps you’re called out—or perhaps you just carry on sleeping, since it can now wait until morning. Either way, disaster is averted, and you can investigate and fix the miscreant at leisure. Later, you bring it back online, the cluster lets it join the party—and then rebalances, causing another failover and an entirely unnecessary second outage!

The examples use: CentOS 8.2.2004, Pacemaker 2.0.3-5.el8_2.1, Corosync 3.0.3, PCS 0.10.4.

feeling unbalanced

With default settings, Pacemaker is susceptible to double failover, multiplying your downtime needlessly. This is because Pacemaker assigns zero cost to moving resources across the cluster, letting it stop and start services on different servers as it sees fit in order to achieve what it considers a better allocation. There are likely very few cases where moving a resource is, in fact, zero cost. Certainly, if the resource is a VIP for an active-passive database cluster, moving it will trigger a failover, which could send ripples out across your infrastructure pond. But even if the resource is a queue processor, rebalancing might be far from what you want, as it will cause a momentary drop in robotic productivity. And although I mention a double failover, failovers could in fact happen many more times than this: during restarts of cluster services, server reboots, or whatever else is necessary to restore full health. What to do‽ Stickiness to the rescue!

the right amount of stickiness

Pacemaker supports the notion of stickiness, which assigns a non-zero cost to a resource in order to make moving it less attractive. You can assign different costs to different resources in order to prioritise the rebalancing, in the interests of keeping your database or load-balancer VIP in place, for example. It’s also possible, however, to assign a high cost to everything, which basically says: only move this if you really have to. Failing over to prevent a long outage after a disaster is fine, but obsessively shuffling resources around the cluster is not. Using PCS, it’s possible to add a meta section when the resource is created (or updated), in order to set a non-zero cost through the resource-stickiness meta attribute.

pcs resource create power-plant-reactor \
    service:over-reactor \
    meta \
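
As a hedged sketch of what the meta section might look like in full: the attribute is Pacemaker’s resource-stickiness, but the value 100 is an arbitrary placeholder rather than a recommendation, and the run helper is a hypothetical wrapper added here so that the commands are only printed unless you explicitly opt in.

```shell
# Hypothetical helper (not part of pcs): dry run by default, so each
# command is printed instead of executed. Set APPLY=1 on a cluster node
# to run the commands for real.
run() { if [ "${APPLY:-0}" = 1 ]; then "$@"; else echo "$@"; fi; }

STICKINESS=100  # assumed value, chosen only for illustration

# Create the resource with a non-zero cost for moving it...
run pcs resource create power-plant-reactor \
    service:over-reactor \
    meta resource-stickiness="$STICKINESS"

# ...or set the same attribute on a resource that already exists.
run pcs resource update power-plant-reactor \
    meta resource-stickiness="$STICKINESS"
```

Only the relative sizes of the scores matter when Pacemaker weighs one placement against another, so pick numbers that make sense alongside any location constraints you already have.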

If you have lots of resources, setting metadata on each one is tedious, even if you’re scripting it. Not only that, but if you find yourself wanting to rescale the costs, you’ll have to update each resource. Far better is to set the stickiness as a resource default, which only needs specifying once. This means that resources will, by default, be sticky. You can even do this when the cluster is created, before any resources are created or enabled.

pcs resource defaults \
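
A minimal sketch of setting the default, assuming the PCS 0.10.4 syntax used for this post and, again, a placeholder value of 100. The run helper is hypothetical and only prints the commands unless APPLY=1 is set.

```shell
# Hypothetical helper (not part of pcs): dry run by default, so each
# command is printed instead of executed. Set APPLY=1 on a cluster node
# to run the commands for real.
run() { if [ "${APPLY:-0}" = 1 ]; then "$@"; else echo "$@"; fi; }

STICKINESS=100  # assumed value, chosen only for illustration

# Make every resource sticky unless it says otherwise.
run pcs resource defaults resource-stickiness="$STICKINESS"

# With no arguments, pcs lists the defaults currently configured.
run pcs resource defaults
```

Newer PCS releases, as far as I know, prefer the spelling pcs resource defaults update for the first command, so check the man page for the version you have installed.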

destickifying stoniths

This is all very well, but it also leaves Pacemaker unable to rebalance much at all. This is arguably less important for active-passive databases, but if you’re using a cluster to run dynamically allocated queue processors, as the Isoxya web crawler does to manage its cupboard of spiders, you might want to override the cost for certain resources: to destickify them, if you will. Even in the active-passive database or load-balancer VIP case, you might like to do this for stonith resources, since rebalancing them is unlikely to have any noticeable effect on your main software, and allowing this leeway can reduce load on your active node automatically. Stoniths support metadata just like regular resources, meaning you can reset their stickiness and get the best of both worlds.

pcs stonith create fence_like_your_neighbours_watching \
    fence_virsh \
    identity_file=secrets/id_rsa \
    ip=example.com \
    username=007 \
        meta \
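
Spelled out in full, a destickified stonith might look like the sketch below. It assumes that resetting means a resource-stickiness of 0, which overrides any sticky resource default for this one device; the run helper is a hypothetical wrapper that only prints the command unless APPLY=1 is set.

```shell
# Hypothetical helper (not part of pcs): dry run by default, so the
# command is printed instead of executed. Set APPLY=1 on a cluster node
# to run it for real.
run() { if [ "${APPLY:-0}" = 1 ]; then "$@"; else echo "$@"; fi; }

# A stickiness of 0 (assumed here) lets Pacemaker move the fence device
# freely, whatever the resource default says.
run pcs stonith create fence_like_your_neighbours_watching \
    fence_virsh \
    identity_file=secrets/id_rsa \
    ip=example.com \
    username=007 \
    meta resource-stickiness=0
```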