Wednesday, March 25, 2009

The great Mailstore migration

Today's guest blogger is Bryan Allen, our Operations head. As the number of accounts has grown, Mailstore has been undergoing a makeover, and Bryan has been working for quite some time on making it a top-notch service. Tonight's hardware roll-out is one of the final steps in his plan, so he's taken a few minutes to chronicle the whole journey. Take it away, Bryan!

Pobox doesn't often talk about the technology we use to ensure the spice continues flowing, and technical posts that don't deal with a specific (typically obnoxious to the person who solved it) problem tend to be pretty dry, but hopefully you'll find this somewhat informative.

Tonight's outage is the next step in what I've come to refer to as The Great Mailstore Migration.

Previously on Pobox...

Two years ago, it became obvious that our infrastructure needed a major change. We needed to consolidate hardware, and we needed to move from the x86 Linux whiteboxes we were using to something beefier and better built. After a fair amount of testing, we decided to move to Solaris 10 on Sun hardware.

Changing platforms gave us a lot: all the awesomeness of ZFS (checksumming, cheap snapshots, etc.), and gobs of introspection thanks to DTrace. It wasn't a trivial process, but in the end it was absolutely worth it. Perhaps most importantly to how we provision services now, we got Solaris Zones. Pretty much any Pobox service you use lives in a Solaris container. Encapsulating services in this way lets us fine-tune resource controls, quickly migrate a zone to bigger hardware, and see at a glance how many resources a given service is consuming (as it's all running in a zone, there's no hunting for ancillary processes amongst, perhaps, thousands).

It's also very useful conceptually when provisioning, knowing that you have services which require x CPU, y RAM, z disk I/O. You can think of the services (which themselves may consist of several services with different requirements) as boxes (or, in Sun or IBM parlance, containers) and assign them to hosts with the appropriate available resources. The same could be done for services in a single flat topology, but I've found it a very useful mental tool, if nothing else.
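To make the "boxes onto hosts" idea concrete, here is a minimal sketch in Python. It is not how Pobox actually provisions zones; the Host and Zone classes, the field names, and the simple first-fit strategy (biggest RAM consumers placed first) are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Zone:
    """A service 'box' with its resource needs (hypothetical units)."""
    name: str
    cpu: float
    ram_gb: float
    disk_iops: float


@dataclass
class Host:
    """A physical host and the resources it still has free."""
    name: str
    cpu: float
    ram_gb: float
    disk_iops: float
    zones: list = field(default_factory=list)

    def fits(self, zone):
        # The zone fits only if every resource it needs is still available.
        return (zone.cpu <= self.cpu
                and zone.ram_gb <= self.ram_gb
                and zone.disk_iops <= self.disk_iops)

    def place(self, zone):
        # Reserve the zone's resources on this host.
        self.cpu -= zone.cpu
        self.ram_gb -= zone.ram_gb
        self.disk_iops -= zone.disk_iops
        self.zones.append(zone.name)


def assign(zones, hosts):
    """First-fit placement, largest RAM consumers first.

    Returns a dict of zone name -> host name, or None if nothing fits.
    """
    placement = {}
    for zone in sorted(zones, key=lambda z: z.ram_gb, reverse=True):
        for host in hosts:
            if host.fits(zone):
                host.place(zone)
                placement[zone.name] = host.name
                break
        else:
            # No host had enough of every resource for this zone.
            placement[zone.name] = None
    return placement
```

Calling assign() with a list of zones and a list of hosts returns the placement and leaves each Host holding whatever capacity is left over, which is the "simple view" of resource use that makes containers a handy mental tool.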

The Great Mailstore Migration

The first step of the Great Migration, undertaken last year, moved us:

  • from generic x86 servers to Sun X4100 M2s
  • from Linux to Solaris 10
  • from ReiserFS to ZFS
  • from SATA to SCSI

There were a few hiccups here, mainly relating to the ZFS Adaptive Replacement Cache. Essentially, ZFS does a lot of really smart things when it comes to prefetching data and caching it in memory. Depending on the size of your dataset, the ZFS ARC will want lots and lots of RAM. In Mailstore's case, the pretty point is pinned at 6GB min, but tends to hover between 7 and 8GB.

Next, we switched mail backends, migrating from Courier to Cyrus. This move was greatly simplified thanks to Gilles Lamiral's imapsync tool. We also deployed nginx in front of Mailstore, in IMAP proxy mode. These are both very cool, very useful pieces of technology, and we're very happy to have them in our toolkit.

The primary reason for moving to Cyrus was its binary indexes. These are compacted databases which greatly speed access to metadata about the messages in your Mailstore folders. We saw major performance increases here, especially relating to Webmail. We also got push notifications for free here, whereas with Courier we had to utilize FAM at such a performance cost that it became untenable.

(Note: this type of push doesn't work with iPhones. As pretty much everyone in the office has one, we really wish it did.)

In Tonight's Thrilling Episode...

This outage is a two-part upgrade. We are deploying Sun J4200 SATA arrays to replace the SCSI arrays Mailstore data currently lives on. We're also upgrading the Mailstore servers to the most recent revision of Solaris 10. The latter gets us on ZFS root pools, greatly reducing the amount of time it can take to upgrade a system. We're also getting a newer version of ZFS, in which, if it becomes necessary, we can build Hybrid Storage Pools on the J4200s.

Moving to the J4200s and SATA also increases our disk capacity by quite a lot. To the customer, this means we can start storing snapshots for longer. Snapshots are what we use to quickly restore user data when requested. At the moment, we store about a week's worth of snapshots. With the new storage systems in place, storing a month or more becomes reasonable. It has happened, though somewhat rarely, that a customer will ask us if we can restore a specific piece of mail they deleted a few days ago. More rarely still, they want to restore something from more than a week past.

Well, now we'll be able to with a minimum of fuss. This increased snapshot capacity may also save some of our POP users who suffer local hard drive crashes (though we highly recommend moving to IMAP!)
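As a rough illustration of why a longer retention window matters for restores, here is a small Python sketch. It assumes one snapshot per day and identifies each snapshot only by its date; the function names are hypothetical, and this is not the tooling we actually run against ZFS.

```python
from datetime import date, timedelta


def split_by_retention(snapshot_dates, today, keep_days):
    """Split snapshots into those inside the retention window and those past it."""
    cutoff = today - timedelta(days=keep_days)
    keep = [d for d in snapshot_dates if d >= cutoff]
    expire = [d for d in snapshot_dates if d < cutoff]
    return keep, expire


def latest_snapshot_before(snapshot_dates, deleted_on):
    """Return the newest snapshot taken before the deletion date, or None.

    Assumes the message already existed when that snapshot was taken.
    """
    candidates = [d for d in snapshot_dates if d < deleted_on]
    return max(candidates) if candidates else None
```

With a week of retention, a message deleted ten days ago has no surviving snapshot to restore from; with a month of retention, the snapshot from the day before the deletion is still on disk.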

Something I've been thinking about for a while is wrapping a web interface around the snapshots and letting customers restore their own mail. This feature may have to wait for ZFS to get a "diff" ability, but email pobox@pobox.com if you think it's an interesting idea.

Coming Up...

In the final planned Mailstore upgrade task, we'll be moving from the legacy version of Cyrus to the latest version. This move will make it easier to build mailbox databases incrementally (for faster searching, primarily in Webmail), make replication easier, and give us another major performance boost due to a database backend change.

So there it is: the reason for the planned outages in the last year, and where the Mailstore backend is going.

As always, if you have any questions, please email pobox@pobox.com.

Thanks, Bryan!

-----

Tax time is coming up. Beware phishing attempts! The IRS does not request information via email.
