Is your air-conditioner the single most critical piece of kit in your data centre, yet you don’t control it?

You are not alone. It is quite common for IT staff to have very little control over their data centre or server room air conditioning systems.

Facilities people often don’t feel they need to know about the day-to-day operation of their air conditioning systems. Even if they do, there is no guarantee they will tell IT people about system failures in a timely manner.

This can cause considerable problems.

If the temperature in the server room is heading very high, you need to take steps to mitigate any possible damage to your equipment.

At worst, you may have to switch off some or all of your equipment in order to prevent possible long term damage.

Data centre environments can be quite hostile places to work in. We often hear horror stories from companies where people have switched off the air-conditioning in order to work in the data centre and then forgotten to switch it back on. The server room then inevitably starts to warm up, sometimes to very high temperatures.

How can environment monitoring help?

By installing your own monitoring equipment you don’t need to rely on anybody else informing you of air-con failures. You find out first!

You can alert the duty IT staff, who can then take action to resolve the problem.

But the benefits don’t stop there. By monitoring the environment in the data centre around the clock, you can build up a historical record of how it behaves.

You can tell whether the environment, even when the air-conditioning is working properly, is within the range you expect. You may be keeping the environment too cold, in which case you can save money on your electricity bills. If the environment is too hot you can take steps to cool things down.

Historical information allows you to examine your data for signs of longer term trends.

A gradual rise in temperature may indicate that your air conditioning system is unable to cope with a rising heat load. Typically, air conditioner upgrade cycles are longer than server upgrade cycles.
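One simple way to spot a gradual rise is to fit a straight line through your logged temperatures and look at the slope. The sketch below is only an illustration: the function name `trend_per_day` and the sample readings are made up, and your own monitor will have its own way of exporting history.

```python
from statistics import mean

def trend_per_day(readings):
    """Least-squares slope of temperature against time, in degrees C per day.

    readings: list of (day, temperature_c) pairs, where day is a number
    such as days elapsed since logging started.
    """
    days = [d for d, _ in readings]
    temps = [t for _, t in readings]
    d_bar, t_bar = mean(days), mean(temps)
    denom = sum((d - d_bar) ** 2 for d in days)
    if denom == 0:
        raise ValueError("need readings from at least two different times")
    return sum((d - d_bar) * (t - t_bar) for d, t in readings) / denom

# Hypothetical daily averages over four weeks, creeping up by 0.05 °C a day.
history = [(day, 21.0 + 0.05 * day) for day in range(28)]
slope = trend_per_day(history)
print(f"Trend: {slope:+.3f} °C per day")
```

A slope that stays positive month after month is a hint that the heat load is outgrowing the cooling, even if no alarm has fired yet.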

As many data centre managers will attest, servers aren’t getting any cooler.

How to monitor your air-conditioning system

There are two basic strategies. You can monitor the air conditioners themselves, monitor the environment they condition or, preferably, do both.

Both options are discussed below in more detail.

Monitor the air-conditioner itself

Monitoring the air conditioner is the obvious thing to do. The air conditioner knows exactly what’s going on, so it should be able to tell you when things are going wrong.

One of the problems with monitoring the air conditioner is that there isn’t a single simple way of doing it.

Air conditioners used in data centres range from the very simple all the way up to very sophisticated network-enabled devices.

Where your air conditioner sits on that scale determines how you should go about monitoring it.

Many air conditioners have a simple relay switch. When the air conditioner unit has a fault or is switched off, the relay changes state. An environment monitor can watch the relay and detect the change.

You can then notify appropriate people of the air conditioner failure.
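The logic behind relay monitoring is just "compare each sample with the last one and report a change". Here is a minimal sketch, assuming the relay is closed (True) when the air conditioner is healthy. The function `relay_events` and the sample data are hypothetical; a real environment monitor does this for you in its firmware.

```python
def relay_events(samples, normal_state=True):
    """Yield (index, message) each time the relay state changes.

    samples: iterable of booleans read from the relay input.
    normal_state: the state the relay is in when the unit is healthy
    (an assumption here; check your unit's documentation).
    """
    previous = normal_state
    for index, state in enumerate(samples):
        if state != previous:
            if state == normal_state:
                yield index, "air conditioner back to normal"
            else:
                yield index, "air conditioner fault or switched off"
            previous = state

# Hypothetical samples: healthy, healthy, fault, fault, healthy again.
samples = [True, True, False, False, True]
events = list(relay_events(samples))
for index, message in events:
    print(f"sample {index}: {message}")
```

Reporting the return to normal as well as the fault means the duty staff know when they can stand down.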

More sophisticated (read: expensive) air conditioners are network-enabled and support SNMP.

SNMP is a widely used network management protocol supported by a vast array of network management software.

If you already have a network management system in operation you can integrate your air conditioning system alerting into your existing SNMP polling and alarming architecture.

Your SNMP-enabled air conditioning system may well be able to proactively send you SNMP-based alerts. These alerts come in the form of SNMP traps.

A normal SNMP monitoring system polls a device for information. If the data is found to be out of a normal range the network management system can then raise an alarm.
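The polling approach boils down to "read a value, compare it with a range, raise an alarm if it is outside". The sketch below shows only that logic. The `read_temperature` argument is a stand-in for whatever SNMP GET your tooling performs (the OID depends on your vendor's MIB), and the 18 to 28 °C range is an assumed example, not a recommendation.

```python
def poll_once(read_temperature, low_c, high_c):
    """Poll the device once; return an alarm message, or None if in range.

    read_temperature: any zero-argument callable returning degrees C.
    In a real system this would wrap an SNMP GET against the device.
    """
    value = read_temperature()
    if value > high_c:
        return f"ALARM: {value:.1f} °C is above {high_c:.1f} °C"
    if value < low_c:
        return f"ALARM: {value:.1f} °C is below {low_c:.1f} °C"
    return None

# Hypothetical readings standing in for three successive polls.
readings = iter([22.5, 23.0, 31.2])
results = [
    poll_once(lambda: next(readings), low_c=18.0, high_c=28.0)
    for _ in range(3)
]
for result in results:
    print(result or "OK")
```

Note that the decision is made by the monitoring system here, not by the air conditioner.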

With SNMP traps, the device itself decides (according to your configuration) when an alarm condition has been reached. It then sends a trap to the configured trap server. The trap server then receives the trap and raises an alarm.

Trap servers can be very sophisticated event correlation engines or they can be very simple. Most network management or network monitoring tools have a built-in trap server.

Monitor the environment

Even if you can’t monitor your air conditioning unit directly, you can always monitor the effect the air conditioner has on your server room environment.

In fact, monitoring your server room environment is likely to be something you want to do anyway.

Monitoring your server room environment won’t detect air conditioner failure as quickly as a direct air conditioner monitor.

The environment in your data centre is going to take at least a few minutes to warm up sufficiently to set off your environmental alarm system.

Why? Two reasons.

Firstly, the air making up the environment in your data centre has to be heated up. Heating the volume of air in your data centre is going to take time.

Secondly, you will want to avoid false alarms, so you need to set the upper temperature threshold above the normal everyday fluctuations in your data centre. An alarm is only generated once the temperature rises above that threshold.
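To put a rough number on the first reason, you can estimate how long the air alone takes to warm up once cooling stops. This is a back-of-envelope sketch using standard figures for air (about 1.2 kg per cubic metre and about 1005 joules per kg per degree). The room size and heat load are invented examples, and the function name `minutes_to_rise` is just for illustration.

```python
AIR_DENSITY = 1.2          # kg per cubic metre, at roughly 20 °C
AIR_SPECIFIC_HEAT = 1005   # joules per kg per degree C

def minutes_to_rise(room_volume_m3, heat_load_watts, rise_c):
    """Minutes for the room air to warm by rise_c with no cooling at all.

    Counts only the air. Racks, equipment and walls also soak up heat,
    so a real room usually warms more slowly than this estimate.
    """
    joules_per_degree = room_volume_m3 * AIR_DENSITY * AIR_SPECIFIC_HEAT
    seconds = joules_per_degree * rise_c / heat_load_watts
    return seconds / 60

# Hypothetical room: 100 cubic metres of air, 10 kW of equipment.
minutes = minutes_to_rise(room_volume_m3=100, heat_load_watts=10_000, rise_c=10)
print(f"About {minutes:.1f} minutes for a 10 °C rise")
```

Treat the answer as a lower bound on your response time rather than a prediction.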

The trick is to place the upper threshold low enough so that you have plenty of time to respond to a problem, yet high enough to avoid false alarms. Each data centre is different so you need to find the level that is right for you.

You can find the appropriate upper temperature threshold through experimentation and by reviewing the trends in your historical data.
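One possible starting point, sketched below, is to take the highest temperature you have seen during normal operation and add some headroom, then check how often a candidate threshold would have fired against your history. The 2 °C headroom, the function names and the sample readings are all assumptions for illustration.

```python
def suggest_upper_threshold(history_c, headroom_c=2.0):
    """Highest normal reading plus some headroom, in degrees C."""
    return max(history_c) + headroom_c

def count_exceedances(history_c, threshold_c):
    """How many past readings would have been above the threshold."""
    return sum(1 for t in history_c if t > threshold_c)

# Hypothetical daily peak temperatures from a week of normal operation.
history = [21.4, 22.0, 22.8, 23.1, 22.5, 21.9, 23.4]

threshold = suggest_upper_threshold(history)
print(f"Suggested upper threshold: {threshold:.1f} °C")
print("Readings above 23.0 °C:", count_exceedances(history, 23.0))
```

If a candidate threshold would have fired during a perfectly normal week, it is too low; if it sits many degrees above anything you have recorded, you are giving away response time.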

Unfortunately, trending information takes a while to collect; I would suggest at least a week. You may need to tweak your upper threshold from time to time as the information slowly builds up.
