How To Respond To Downtime

by Christopher Paul on January 4, 2012

From Pinboard:

Pinboard sets a high bar for how it runs its site. I don’t use it as often as I should, but I supported it well before users from Delicious flocked to it as a bookmarking alternative. Because of that, I didn’t notice the downtime mentioned in the blog. After reading the post, it was clear to me that Pinboard is destined for great success (if it hasn’t already achieved it).

The reason?

The time and care put into the users, both by making the site great and by doing great things when things go wrong:

“Any kind of data loss on this site is unacceptable, but it’s especially bad when it’s due to completely preventable operator error. I’ll be making some changes to make sure I can’t repeat this kind of mistake:

  • First, I’ve written out all important service deadlines in the little notebook where I write my daily work notes. This is a low-tech but effective way to get me to pay attention to stuff.
  • Second, I’ve worked out a formal backup policy for everything that lives on the filesystem (logs, configuration files, notes, uploaded files, feeds from other services etc). Now they will be backed up automatically with the same thoroughness as the database and user archives.
  • Third, I’m working on better automating server configuration and setup. Though the site was back up quickly, it took me most of the day to get services like search and feed import working smoothly. This should be something I can do with one command.
  • Finally, I’m going to spiff up the status page so that it shows the actual status of specific services. I have this stuff in my admin console, and there’s no reason not to share it with users.”

I wish all companies would respond to downtime this way.
