Friday, October 30, 2009

How and Why Downtimes are Scheduled at Pobox

Very early next Tuesday morning (or late Monday night for you night owls), Pobox has a Mailstore downtime scheduled from midnight to 4 AM (which is a really long outage for us!). I asked our Operations Head, Bryan Allen, to explain a little bit about downtimes, including the many, many changes that don't require downtimes, as well as what kinds of changes do.

Here at Pobox we try to minimize outages. In fact, we kind of have a thing about it. We keep the spice flowing, and we try to ensure you can access your mail. Periodically, we generate planned outages for software or hardware upgrades.

The vast majority of the Pobox service infrastructure is redundant and, to an extent, self-healing; it requires no human interaction when badness occurs. The rest is replicated but requires manual intervention for failover. The core databases are an example of that: We have failover replicas, and we can fail over to any one of them within minutes with a minimum of service impact and no performance degradation once the failover has finished.

Our software upgrades are relatively benign these days. We push new Perl code to production several times a week, and we patch our operating systems regularly (thanks to the magic of Solaris Live Upgrade, this would create an outage of only a few minutes even if most services weren't already redundant), and so on. We very, very rarely have service outages that are caused by core software.

On rare occasion (especially in the last several years), we'll hit an unplanned outage on a unique service. Those are Big Fires, and there's only one goal there: To fix the problem and restore service. Hardware outages, simply due to intrinsic orneriness, are harder to both plan for and recover from. Sometimes bugs can generate an outage: Recently a bug in the Mailstore authentication code made it so new clients could not authenticate and access their mail. That was a regression, and the fix was trivial.

Conversely, a planned outage is a declaration of intent: It says we are going to create an interruption of service for some specific, defined reason. At times, the intention is to avoid an unplanned outage (a fire) at some point in the future. Usually, it's to improve the service in some way.

Tuesday's outage is for a relatively major hardware upgrade.

The X4100 M2s running the Mailstore storage are somewhat older dual-core Opterons. We haven't quite hit the wall for their CPU running Mailstore, but we can see the dust on the horizon. Very early Tuesday morning, we're going to be swapping the X4100s out for X4150s (dual quad-core Xeons). That change alone will see us through for quite a long time. In addition to faster CPU and bus, however, the X4150s can take double the RAM (doubling the filesystem cache) and have 8 SAS bays (four free, currently, per-system). This will let us build a Hybrid Storage Pool, redirecting the filesystem journal writes to a write-optimized SSD, and building an L2 filesystem cache on a read-optimized SSD. To put it very mildly: Zoom. In the future we'll want a way to upgrade the storage head nodes without taking the service offline, but currently the architecture doesn't allow for it.

In addition to snapshots (which we utilize for data recovery and streaming replication), ZFS comes with built-in compression. The bottleneck for disk access is, well, spinning rust. Regardless of the speed of the disks you use, and the size of your filesystem cache (which is currently 16GB for Mailstore), you still have to retrieve bits from a platter. And that's slow. So why is compression a good thing? Won't compressing files consume CPU? Isn't CPU still a valuable resource? It is, but these days your fileservers' CPUs are likely to be sitting relatively idle, while their disks are thrashing. If you compress the bits you write to disk, you have less to read and write, and get far more I/O per second, basically for free.

(Back when we first refactored the discards storage system, it was taking forever to write the user indexes to disk. Enabling compression increased performance by at least threefold.)
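For the curious, here is a rough sketch of why compression means less disk traffic. It is only an illustration, not how Mailstore works: ZFS compresses transparently inside the filesystem with its own algorithms, and Python's zlib module stands in for that here. The sample record is made up and is far more repetitive than real mail, so treat the ratio it prints as an upper-end toy number.

```python
import zlib

# Hypothetical sample data: index-like text made of repeated header lines.
# This is invented for illustration and is not real Pobox data.
record = b"From: alice@example.com\r\nTo: bob@example.com\r\nSubject: Hello\r\n"
raw = record * 1000

# Compress at a middling level, trading a little CPU for fewer bytes.
compressed = zlib.compress(raw, 6)

ratio = len(raw) / len(compressed)
print(f"raw bytes:        {len(raw)}")
print(f"compressed bytes: {len(compressed)}")
print(f"ratio:            {ratio:.1f}x fewer bytes to move to and from disk")

# Compression here is lossless: decompressing returns the original bytes.
restored = zlib.decompress(compressed)
print("round trip ok:", restored == raw)
```

The CPU does a bit of extra work on each write and read, but the disks have far fewer bytes to move, which is the trade described above.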

For this upgrade, the vast majority of work can be done before the maintenance window even starts. We use Puppet from Reductive Labs to manage our systems, and we encapsulate services in Solaris Zones. Put that together, and you have the ability to quickly provision services on new hardware without actually doing any work. So the new zones are all already running and configured, just waiting for the storage pools to be mounted. It really is as easy as it sounds.

Regardless of how easy it sounds, for every outage you want a pullback plan. If your upgrade totally fails for some reason, you want the ability to just put things back the way they were. In this case, the plan is: Plug the hardware into the old box.

We scheduled this window for four hours because I'm paranoid. You'll see this a lot when any business is updating a core piece of infrastructure. When our datacenters are updating their routers, they announce a six-hour window where connectivity may flap. You may have noticed that our connectivity does not actually flap regularly, because they're pushing out updates they've tested in their labs already, and are reasonably sure everything will be full of joy. Badness does occur, though, and when it does it has a very bad habit of avalanching. So you want a defined window to try to resolve the problem and still complete your task, before you have to give up, put the old pieces back into place, and try again another day.

To recap the services that will be affected by this downtime: Mailstore customers will not be able to access webmail or their Mailstore folders, or receive new mail in their Mailstore Inbox, during the outage window. If you are a Mailstore customer, and also forward your mail to another address, your forwarded copy will be delivered throughout the outage without delay. You will also be able to send mail. If you are an IMAP user, and keep a local copy of your mail on your computer, you will be able to read your local copies, and any changes you make (deleting, moving messages, etc.) will be synced to Mailstore when the downtime ends.

We apologize for any inconvenience this may cause, and Bryan did ask me to tell you that, if everything goes smoothly, the downtime will be much shorter than the scheduled window. He just doesn't like to bank on that.

Thursday, October 15, 2009

Sharing Files: Alternatives to Attachments

Fun fact: email is not a file transfer protocol. When you use it to send files around, weird (and sometimes bad) stuff happens. Attachments can contain viruses. They take a lot of bandwidth to send (especially if you're CCing them to a lot of people), and that means that everything is slow, and that makes admins suspicious. (Sure, they're a suspicious lot by nature, but they make the Internet go, so we try not to make them mad as a rule.)

But, for a lot of people, attaching a file to an email is the only way they know to send a file to someone else. How else can you get a file to someone besides email?

The very best way to transfer a file is to open a connection directly from the computer that has it, to the computer that wants it. That way, only the two computers that need to handle it do, by talking directly to each other. (When you email a file, three, four or maybe more different computers will handle that message.)

If you're talking to someone on IM, you may not realize it, but you're already using one of the easiest file transfer mechanisms around! Just drop the file you want to share into your chat, and you'll immediately prompt them to start downloading the file from you. It works for pictures, music, Word documents, even short movies. Drag-and-drop!

But, what if you aren't both online at the same time? Well, there are still plenty of options out there.

As digital cameras become more and more sophisticated, even emailing pictures can generate huge messages. Using one of the photo sharing services or social networking sites, like Flickr, Picasa, Kodak Gallery, Snapfish or Facebook, means you get access to their handy tools (like ordering prints!) It also means that the people who you're sharing photos with have an easy place to see all your pictures over time (so that adorable picture from last Thanksgiving can be looked at again when you email them a link to this year's pumpkin pictures.) With an extensive range of privacy options, it's also really easy to make sure people have to log in with a password to see your pictures.

Just want to share some regular old files? Apple's iDisk or Joyent's BingoDisk let you share files publicly or privately (and provide an off-site backup, should your computer fail.) A third option, Dropbox, even provides 2GB of free storage to anyone who sets up an account -- plenty for any basic file-sharing you may need to do.

There are other benefits to sharing this way, too. It's easier to make changes, and know that someone is looking at or downloading your most up-to-date version. Some sites will provide you with statistics, so you can see how often a file is being looked at.

Know other easy, non-email ways to share files? Leave a comment!

Friday, September 25, 2009

Email Etiquette: a background picture is worth a thousand groans

Email Etiquette is a series of blog posts that was nearly titled "Things you've tried to tell your family a million times, and have gotten tired of repeating." Have an email pet peeve that you'd like to see in a future blog post? Send an email to pobox@pobox.com or leave a comment!

Once upon a time, my mother got an email program that had clip art built in. For a period of several months, no message was too large or small to go unembellished with clip art. Failing to find any clip art in her collection that would suit her message content, she could always just fall back to a picture of a cannon.

By and large, email is still a medium that values content over style, at least when you hear from a human and not a company. But, from time to time, we still get that message with pink text on a paisley tiled image background. So, here are Pobox's tips for maximum email enjoyment for all.

Sending images in your messages is great. Grandparents the world over love receiving pictures of their grandbabies. Images are less great the more people you send them to, though. (If you want to spread pictures far and wide, put them on Flickr.) Background pictures, by virtue of going out on every email you send, are thus the worst offenders of the email world. Save background images for your web pages.

When considering your email text, think of it this way. Every email you send is asking someone for something, even if it is only, "Please read this message." When you ask someone for a favor, you want to make it easy for them to do it. So, choose a clear, easy-to-read font, preferably at least 12 pixels large (10 is ok for print, but too small for the screen.) Black text is the easiest to read in a variety of formats. If you need to add color, it should be an accent, like in your signature, not for the whole message text.

What are your email formatting pet peeves?

Monday, September 21, 2009

Keeping Tabs on Released Messages

Since we began holding messages we caught as spam (many, many years ago now!), a frequent question has been, "Where is the message I released?" To help answer this question, we have added Delivery Status to the Released Messages page. We hope this will help give a little insight about what's happening behind the scenes!

To view Delivery Status, just click "Edit Columns" on the Released Messages page, and check "Delivery Status". There are 3 possible states we display: Bounced, Queued or Sent.

If a message is marked Bounced, we tried to deliver it to the address you selected, and it was rejected by your ISP. This could be due to an overfull mailbox, or a problem with your account. If your release bounced, other messages may also be bouncing, so we recommend adding another forwarding address where we can send your messages, and then re-releasing the message. If your other mail is not bouncing, it's possible that the message you're releasing is actually a phishing attempt or virus, and your ISP is rejecting that message, to protect you from a dangerous email.

If a message is marked Queued, it means that Pobox has it flagged for release, but the release hasn't been processed yet. In order to keep the Spam system running smoothly, releases are processed in a batch. But if you have a message that's been marked Queued for more than 20 minutes, please email Customer Support and let us know, and we'll do what we can to get it released, pronto!

A message marked Sent means that your message is being sent. This is what you should nearly always see. However, this is actually the most common "missing" group!

In most cases, the message characteristics that caused us to think the message was spam also caused your ISP to think the message is spam. Usually, these messages can be found in your ISP's Junk Mail or Spam folder. If they aren't, we recommend emailing your ISP and asking them to locate the missing message. If you also add the "Released Time" column to your Released Messages page, you can see what time we sent you the message.

Another reason why a message marked Sent might go missing is if you have multiple forwarding addresses on your account, and you released the message to one other than the one you're checking. In that case, the simplest thing to do is just release the message again.

Released mail is usually delivered immediately, because there's such a small number of messages sent out from those servers. However, from time to time, mail has been backed up by someone releasing a large amount of spam. In those cases, we have seen messages to other users at that ISP get deferred (or temporarily rejected, with a request to retry delivery later.) This is the major reason that we monitor releases; releasing spam can cause other people's legitimate mail to be delayed.
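If you're wondering how a deferral differs from a bounce, the receiving server says which one it means in the first digit of its SMTP reply code: 2xx means accepted, 4xx means "not right now, try again later," and 5xx means a permanent rejection. The function below is a minimal sketch of that rule. Its name and labels are made up for illustration, and it is not Pobox's actual delivery code.

```python
def classify_smtp_reply(code: int) -> str:
    """Map an SMTP reply code to a rough delivery outcome.

    The function name and the returned labels are hypothetical; they are
    not Pobox's status names or logic.
    """
    if 200 <= code <= 299:
        return "accepted"   # the receiving server took the message
    if 400 <= code <= 499:
        return "deferred"   # temporary failure: the sender should retry later
    if 500 <= code <= 599:
        return "rejected"   # permanent failure: the message bounces
    return "unknown"


for code in (250, 451, 550):
    print(code, classify_smtp_reply(code))
```

A deferred message stays in the sending server's queue and is retried, which is why a burst of released spam can slow down other people's mail to the same ISP without losing it.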

We hope that you'll find this additional little piece of information about Released Messages useful!

Friday, September 4, 2009

Zombies walk the Internet: Today's Pobox mail delay

This morning, Pobox mail saw processing and forwarding delays. Most messages were delayed no more than 10 or 20 minutes, but we did get reports of a few messages taking an hour or more to be delivered to their final destination. In general, we try to keep delays for your mail to under 5 minutes; most messages are handled within seconds.

Today's delay was caused by a huge surge in traffic from a botnet, which we've actually been dealing with for over a week. Botnets are massive numbers of computers (also known as zombies), typically people's virus-infected home computers, controlled by remote software for nefarious purposes. Some estimates say as many as one in four personal computers connected to the Internet are running botnet software.

This software can be used for different purposes. In our case, the botnet is being used to send spam. Botnets are also commonly used for denial-of-service attacks, where huge amounts of traffic are targeted at servers or a company, with the goal of effectively blocking all legitimate traffic, and for phishing attacks, where credit card or bank information is collected.

We are making a number of network and security changes to deal with this ongoing attack. There will be a series of brief outages this evening for the website, webmail, outbound SMTP and POP3/IMAP services, as we make upgrades and networking changes to prevent further delays.

Running a PC at home? Make sure that you have up-to-date anti-virus software, and run it regularly. Using a home firewall is also a good preventative step to keep your computer from being used as part of a botnet. If you're running a Mac, you're probably safe. Thus far, there seems to have only been one Mac botnet, and it came from people downloading "shared" copies of iWork '09 and Photoshop CS4.