User-friendly failover mechanism via removable media
I've recently realized that USB memory keys make great switches.

Take the example of a small organization that has a primary file server and several backup servers. The backup servers periodically retrieve updates from the primary and have exactly the same configuration as the primary, except for their IP addresses and hostnames. If the primary server fails (e.g. due to a hard disk malfunction), one of the backup servers needs to be promoted to replace it. Let's restrict ourselves to situations where this failover doesn't need to be immediate, and a few minutes of downtime is acceptable.

To do this with a USB memory key, configure each server to periodically check for the existence of a special file, /media/usbdisk/PRIMARY-SERVER. If the file is present, the server promotes itself to act as the primary server and reconfigures its network settings accordingly (taking over the primary's IP address and hostname). Otherwise, the server demotes itself to a backup server that retrieves updates from the primary. The key point is that on many Linux distributions (e.g. Debian, Ubuntu), /media/usbdisk is the mount point for USB memory devices: when a USB memory device is inserted into the computer, it is mounted automatically and its contents show up at /media/usbdisk. Now all you need to do is buy a cheapo memory key from the local computer store and create an empty file on it called PRIMARY-SERVER, and you have an instant switch that lets you choose which machine to promote to the primary server.
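Here is a minimal Python sketch of the polling logic described above. It is an illustration under stated assumptions, not a tested failover daemon. The marker path comes from the post; `promote` and `demote` are hypothetical hooks standing in for whatever scripts actually change the IP address, hostname and sync schedule, and the 30-second poll interval is an arbitrary assumption. A hook runs only when the detected role changes, so network settings are not re-applied on every poll.

```python
import os
import time
from typing import Callable, Optional

# Marker path from the post; the real auto-mount point varies by distribution and release.
SWITCH_FILE = "/media/usbdisk/PRIMARY-SERVER"


def decide_role(switch_path: str = SWITCH_FILE) -> str:
    """Return "primary" if the marker file exists, otherwise "backup"."""
    return "primary" if os.path.isfile(switch_path) else "backup"


def step(current_role: Optional[str],
         promote: Callable[[], None],
         demote: Callable[[], None],
         switch_path: str = SWITCH_FILE) -> str:
    """Check the switch once; call a hook only when the role changes."""
    new_role = decide_role(switch_path)
    if new_role != current_role:
        if new_role == "primary":
            # Hypothetical hook: take over the primary's IP address and hostname.
            promote()
        else:
            # Hypothetical hook: revert to this machine's backup IP and hostname
            # and resume pulling updates from the primary.
            demote()
    return new_role


def run(promote: Callable[[], None],
        demote: Callable[[], None],
        poll_seconds: float = 30.0,
        switch_path: str = SWITCH_FILE) -> None:
    """Poll forever; the 30-second default interval is an assumption."""
    # Role is unknown at startup, so the first check always runs one hook.
    role: Optional[str] = None
    while True:
        role = step(role, promote, demote, switch_path)
        time.sleep(poll_seconds)
```

On a real server, `run()` would presumably be started at boot by the init system, with the two hooks wired to local scripts that are not shown here. Keeping `step()` separate from the loop makes the switch logic easy to exercise by creating and deleting a marker file in a temporary directory.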

So why does this matter? Probably the best feature of a scheme like this is that even the least technical person can trigger the failover in a matter of seconds. It's simple and intuitive. If you're using software that relies on the file server and you notice that the server isn't responding, you just walk over to the server room and move the "switch" to another machine.

The reason I care is that I help run the computing infrastructure for a small organization. They have a handful of technically unsavvy employees and need a file and database server. I'm happy to set up and configure the servers for them at my leisure, but I can't be on call if a server fails and they need it replaced immediately. Now, when a server does fail, they simply move the switch to another machine and disconnect the failed one. Since computers are cheap now, they can get triple redundancy with a simple failover mechanism for less than $1,000. When a server needs to be replaced, they send it my way and I set it up when I have time. Then I just send it back and they plug it in.

Since the failover is so simple, it would actually be good practice to switch primary servers periodically, just to ensure that the backups are still working. One of the biggest problems with server failures is that the backups don't work when they're most needed, precisely because they're not regularly tested. In a system like this, I would rotate the primary server switch every week or so to make sure that all systems are functional.
