Dreamhost Outage
As some of you might be aware, Broken Kode was down for a few hours on Sunday, along with most of the Internet (seems everyone cool hosts their sites with either Dreamhost or Mediatemple, noooch). I don’t mind that, since it’s happened to me before, countless times, with my previous host. Dreamhost have been blogging about it. They also eventually sent an email to everyone trying to explain everything.
The real reason I’m writing this is to call out the bullshit excuse their building services maintenance people have given. See, part of my job is to actually design the electrical backup systems for buildings. Some buildings are considered ‘mission critical’: usually financial institutions, hospitals and so on. One of the areas I claim a certain amount of knowledge about is UPS systems (I’ve helped design and maintain several all over London).
A UPS is an Uninterruptible Power Supply. When power goes down in a building, it takes a while for the generators to kick in, so there’s an interim period where the UPS acts as the electrical supply. Usually it’s some electrical panels and some batteries (the preferred system in America as far as I’m aware, although in the UK other systems are used extensively). Everyone with me? Good, let’s see what they said, shall we:
During this period of 12:55 p.m. to 1:05 p.m., two of the five generators failed. The remaining three generators were unable to sustain the power requirements of the building causing the emergency electrical systems to transfer into a load shedding mode and the buildings UPS system to turn itself off, thus preventing permanent UPS and related equipment damage.
For those of you glossing over this little part of the email: generators will fail, that’s a given. They might not have been checked properly, they could choke up; it’s a very common thing, which is why you always have several, to provide resilience in the system. One fails, you’ve got another one to handle the load. There are different degrees of resilience, and that’s where the term n+1 (which they also talk about later) comes into play: N is the nominal number of generators they need to supply the building completely, and the plus 1 is the additional generator in case one fails.
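To put numbers on the n+1 idea, here’s a quick sketch in Python. The figures are made up (I have no idea what the real building load or generator ratings are); I’ve only assumed five identical generators, as in the email, and picked a load that makes the arrangement n+1.

```python
import math


def units_needed(load_kw, unit_kw):
    """N: the smallest number of generators that can carry the full load."""
    return math.ceil(load_kw / unit_kw)


def spare_units(installed, load_kw, unit_kw):
    """The 'x' in N+x: installed generators beyond the N required."""
    return installed - units_needed(load_kw, unit_kw)


def can_carry_load(installed, failed, load_kw, unit_kw):
    """True if the generators still running can supply the whole load."""
    return (installed - failed) * unit_kw >= load_kw


# Hypothetical figures, NOT taken from the Dreamhost email.
LOAD_KW = 2000   # assumed total building load
UNIT_KW = 500    # assumed rating of each generator
INSTALLED = 5    # the email mentions five generators

if __name__ == "__main__":
    print("N =", units_needed(LOAD_KW, UNIT_KW))
    print("spares =", spare_units(INSTALLED, LOAD_KW, UNIT_KW))
    for failed in (1, 2):
        ok = can_carry_load(INSTALLED, failed, LOAD_KW, UNIT_KW)
        print(failed, "failed, full load carried:", ok)
```

With those assumed numbers, losing one generator is survivable and losing two is not, which is exactly the limit of an n+1 design.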
As they’ve said, the 3 generators couldn’t handle the load, which is fine. Load shedding is when the building is actually smart enough to make sure it provides power to the most important equipment before it completely STOPS. So it seems there’s something more important in the building than the servers. Unless they’re sharing their space with other people, I don’t see what’s more important, and surely not all the servers would be down, since 3 of the 5 generators were still operating, apparently.
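Here’s roughly what I mean by load shedding, as a small Python sketch. The load names, priorities and kW figures are all invented for illustration, and a real building management system is far more involved than this; the point is only that the most important loads are kept and the rest dropped when capacity is short.

```python
def shed_loads(loads, available_kw):
    """Decide which loads stay on when capacity is limited.

    loads: list of (name, priority, kw); a lower priority number means
    a more important load. Returns (kept, shed) as lists of names.
    Simplification: a small low-priority load may still be kept after a
    bigger, more important one has been shed, if it happens to fit.
    """
    kept, shed = [], []
    remaining = available_kw
    # sorted() is stable, so loads with equal priority keep their listed order.
    for name, _priority, kw in sorted(loads, key=lambda load: load[1]):
        if kw <= remaining:
            kept.append(name)
            remaining -= kw
        else:
            shed.append(name)
    return kept, shed


# Hypothetical loads, NOT taken from the Dreamhost email.
BUILDING_LOADS = [
    ("life safety", 1, 200),
    ("cooling plant", 2, 600),
    ("server hall A", 3, 700),
    ("server hall B", 3, 700),
    ("office floors", 4, 300),
]

if __name__ == "__main__":
    # Assume three generators of 500 kW each are still running.
    kept, shed = shed_loads(BUILDING_LOADS, available_kw=1500)
    print("kept:", kept)
    print("shed:", shed)
```

In this made-up case the building keeps life safety, cooling and one server hall, and drops the second hall and the offices. That’s the behaviour I’d expect: some servers lost, not all of them.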
The thing that made me laugh is their UPS statement. The UPS will only work for a total of 5-10 minutes depending on the actual generators, so I have NO idea what the hell they’re talking about when they say ‘permanent UPS’. Depending on how long you guys used them, the batteries would have been DRAINED.
I asked on the blog if they have diesel rotary UPS systems, which work slightly differently, but I got no response. It’s nice that they’re very open about it, and I do appreciate the effort they put into it, and so far I actually really like DH, but their building services people and their consultants need a good beating. The one time the building needed to utilise the thousands of dollars’ worth of electrical kit they’ve got in the place (probably the reason they’re in that building in the first place), it all failed on them when it actually mattered.