Data centre heating effects

by Jack Hughes on November 19, 2007

One side effect of the recent Rackspace outage at their Dallas/Fort Worth data centre has been finding out just how quickly a data centre heats up when the air conditioning system fails. Rackspace’s announcement explains:

Our backup generators kicked in instantaneously, but the transfer to backup power triggered the chillers to stop cycling and then to begin cycling back up again—a process that would take on average 30 minutes. Those additional 30 minutes without chillers meant temperatures would rise to levels that could result in data loss and irreparably damage customers’ servers and devices. We made the decision to gradually pull servers offline before that would happen.

WOW! 30 minutes from air-con failure until temperatures reach a level where servers start being damaged. I knew the temperatures would go up fast, but I didn’t think they’d go up that fast.

2 comments

Ronald December 10, 2007 at 12:52 PM

This is true; I recently experienced a similar situation. Within 5 minutes the temperature rise is noticeable to a person, and within 45 minutes you could cook eggs.
But should servers suffer irreparable damage? All modern servers have settings to perform a critical shutdown at high temperature. In the situation above, some servers started shutting down after 30 minutes.
The cause and the solution are not as complex as the Rackspace announcement makes out. I assume Rackspace has the same issue.
1. Chillers aren’t connected to the UPS; they draw too much juice! During a failure they are powered by the generator. The UPS is there for the server and communications equipment.
2. The chiller controllers must be wired to the UPS!!! The controller starts the chiller and monitors its function, temperature, etc. If the chiller controller is on generator power as well, then the startup happens incorrectly.
3. The chiller startup needs to be staggered. If you have 10 chillers starting at once, they overload the generator, which then protects its circuits, and the chillers cycle. Switching everything on at once is not a good idea due to the power spike. Do you remember the scene in Apollo 13 where Mattingly had to experiment to perfect the power-on sequencing of the command module? Same thing.
4. The chiller startup is time-staggered (even more reason to have the controller on the UPS). The chillers are started in batches every 5 minutes. A batch should not contain 2 chillers next to each other, or a portion of the data centre might hot-spot if the chillers in that area are all in the last batch to start up.
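
To make points 3 and 4 concrete, here is a rough sketch of how such a staggered schedule could be worked out. It is only an illustration, not how any real chiller controller does it: the function name, the assumption that chillers sit in a single row numbered 0 upwards, and the batch size are all made up for the example. Only the 5-minute interval comes from the description above.

```python
from math import ceil


def stagger_batches(n_chillers, batch_size, interval_min=5):
    """Return a list of (start_minute, [chiller positions]) tuples.

    Hypothetical helper: assumes chillers are laid out in one row and
    numbered 0..n_chillers-1 by position. Chillers are dealt out
    round-robin, so neighbours land in different batches whenever
    there is more than one batch.
    """
    if n_chillers < 1 or batch_size < 1:
        raise ValueError("n_chillers and batch_size must be at least 1")

    n_batches = ceil(n_chillers / batch_size)
    schedule = []
    for b in range(n_batches):
        # Every n_batches-th chiller, starting at position b.
        members = list(range(b, n_chillers, n_batches))
        schedule.append((b * interval_min, members))
    return schedule


if __name__ == "__main__":
    for start, members in stagger_batches(10, 2):
        print(f"t+{start:>2} min: start chillers {members}")
```

With 10 chillers in batches of 2, this gives five batches, the last one starting 20 minutes after the first, and no batch holds two neighbouring chillers. A real floor plan is not a single row, so the spacing rule would have to be adapted to the actual layout.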

Jack Hughes December 10, 2007 at 1:12 PM

@Ronald: thanks for dropping by. With Intel’s new experimental prototype, the Polaris chip, generating 260 watts @ 5.6 GHz, it’s hard to see the problem getting much easier any time soon. ;)
