Fri, 01 Feb 2008

New hardware for the IMAP server

I mentioned recently that we've received new hardware for the lab's IMAP server. I've begun work toward migrating to the new system, but I think it's going to be more difficult than originally planned. I simply haven't figured out a way to synchronize all the mailboxes quickly enough. The IMAP service needs to be shut down completely for a period of time so we can make sure the filesystem on the new system matches the filesystem on the old system. Unfortunately, I don't see how we can keep this outage to an acceptably short period of time. There are simply too many files.

I've tried a few different approaches to synchronize the filesystem. It's tempting to try rsync, since it only copies the files that actually change, and in this case would leave most of the filesystem completely untouched. Unfortunately, rsync needs to construct a detailed index of the files before it can work, and this operation takes a very long time when dealing with ~14 million files. I've tried to optimize this by splitting the filesystem into smaller chunks (typically individual users' mailboxes) and synchronizing them individually. I've tried running rsync over multiple chunks in parallel with varying numbers of rsync processes. This has helped, but not nearly enough.
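For illustration, the chunked, parallel approach described above might look roughly like the sketch below. This is not the script I actually ran. The function names, the one-directory-per-mailbox layout, the worker count, and the `newhost:` destination are all assumptions made for the example. The only rsync options used are `-a` (archive mode) and `--delete`.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_rsync_command(mailbox, dest_root):
    """Build an rsync invocation for one mailbox directory (one chunk).

    `mailbox` is a Path to a hypothetical per-user directory and
    `dest_root` is a hypothetical rsync destination such as "newhost:/path".
    """
    # The trailing slash on the source tells rsync to copy the directory's
    # contents into the same-named directory on the destination.
    return ["rsync", "-a", "--delete", f"{mailbox}/", f"{dest_root}/{mailbox.name}/"]


def sync_chunks(mailboxes, dest_root, workers=4, run=subprocess.call):
    """Run one rsync per mailbox, with at most `workers` running at once.

    Returns a dict mapping mailbox name to the exit status from `run`.
    `run` is injectable so the function can be exercised without rsync.
    """
    commands = {m.name: build_rsync_command(m, dest_root) for m in mailboxes}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so statuses line up with names.
        statuses = pool.map(run, commands.values())
        return dict(zip(commands, statuses))
```

Each rsync only has to index a single mailbox, so the per-process index stays small, but the total number of files to examine is unchanged. That is consistent with this helping only somewhat.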

Doing a complete filesystem copy avoids the rsync overhead, but requires that we copy the entire filesystem contents. That's not a cheap operation, either, since the filesystem contains over half a terabyte of small files.
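A quick back-of-the-envelope calculation shows why the file count matters more than the raw size. The sketch below uses the rough figures from this post (about 14 million files, half a terabyte taken as a round 500 GB). The files-per-second rate is purely a hypothetical placeholder, not something I measured.

```python
def average_file_size(total_bytes, file_count):
    """Mean size of a file, in bytes."""
    return total_bytes / file_count


def hours_to_process(file_count, files_per_second):
    """Hours needed to touch every file once at a steady rate."""
    return file_count / files_per_second / 3600


TOTAL_BYTES = 500e9              # "over half a terabyte", rounded down
FILE_COUNT = 14e6                # "~14 million files"
ASSUMED_FILES_PER_SECOND = 1000  # hypothetical rate, for illustration only

if __name__ == "__main__":
    size_kb = average_file_size(TOTAL_BYTES, FILE_COUNT) / 1000
    hours = hours_to_process(FILE_COUNT, ASSUMED_FILES_PER_SECOND)
    print(f"average file size: about {size_kb:.0f} kB")
    print(f"time at the assumed rate: about {hours:.1f} hours")
```

With files this small, the per-file cost of each open, read, and create is likely to dominate, so the copy time tracks the number of files rather than the number of bytes.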

There are a couple of options left available to us, but they both involve Real Work. Kcr has been advocating that we switch to a Cyrus Murder configuration, which could help us here. With the new server and the old server configured as backend IMAP servers, we could serve IMAP mailboxes from both machines at the same time, taking individual (or small groups of) mailboxes offline to move to the new server. This would likely still involve downtime, but would allow us to spread the downtime across several days or weeks, keeping the individual outages very short. If we could find a way to limit the outages only to the specific mailboxes being copied, that would be even better, but I'm not sure that's possible.

Another option might be to disconnect the current IMAP server's RAID array and plug it into the new server. The new server would then take over as the IMAP server, and we could synchronize to local disk via the direct fibre-channel connection, rather than over the network. I don't think this will help, though, because the network is not the bottleneck in the current setup. The bottleneck seems to be the RAID array itself (and the configuration of the filesystem on it), which we'd carry right over to the new system with us.

So the point of all this, I think, is that we're not as close to rolling out the new server as I'd hoped.
