Server Room Monitoring - An Everyday Essential?
Environmental monitoring is essential in your server rooms and data centers. Environmental conditions have a huge impact on how reliable and long-lived your servers, switches and routers will be. Bad environmental conditions can shorten the life of components, decrease reliability, and cause intermittent, hard-to-diagnose problems in the long term.
Your aim is to establish environmental conditions that remain as stable as possible. A common fault is to install an oversized air conditioning unit. Not only is this more expensive to install and run, but it can cause rapid temperature and humidity swings each time the unit switches on. It is better to have a smaller unit working at a low rate most of the time.
Server Room Temperature
Computer equipment ages faster when it gets hot. In fact equipment manufacturers use this property to help eliminate faulty components: batches are 'baked' to test for failing units. The idea is that if a component survives this process then it stands a good chance of being reliable in service. In general computers operate more reliably and have a longer life in cooler conditions. The effects of prolonged running at high temperatures can be unpredictable and are not always characterised by catastrophic failures. How many of us have noticed unreliability and intermittent equipment problems in warm weather?
For individual machines, in domestic or small office conditions, and for single servers, the internal fans and cooling mechanisms are usually sufficient to keep the temperature within safe operating limits, but in data centers this is rarely the case. Modern servers, switches, routers etc. generate an enormous amount of heat, and in all but the smallest installations a separate cooling or air conditioning system is required.
Of course you have made sure that your electrical plant can handle the extra load of an air conditioning system! Remember that today’s machines run faster, and hotter, than those of yesteryear so just because your cooling system worked last summer doesn’t mean that it will cope this year. If there happens to be a heat wave it’s not going to make things any easier.
If people have to work in the Server Room then aim for a comfortable temperature of around 20°C (68°F). Bear in mind though that human perception of temperature is very subjective; what is comfortable for one person can be too hot or too cold for another. As a general rule you should design your Server Rooms for equipment, not for people.
If people don't need to work in the Server Room you can keep the temperature lower. The danger is that if someone has to work in the room and finds it too cold, they will switch the cooling off, and may forget to turn it back on again when they leave. The first anyone knows of it is when the equipment begins to fry!
Air conditioning units are complex and have a tendency to fail suddenly. In many organisations responsibility for the unit lies with the facilities team, so the data managers may be low down the list of people to be informed. Specialist temperature monitors for air conditioning are complex and expensive, and usually not under the control of the data people.
Temperature Gradient and Hot Spots
It is important to measure the temperature gradient in the Server Room: the difference in temperature low down and high up. Poor equipment layout, or poor airflow (see below), means that the warmed exhaust from one unit feeds directly into the intake of another, causing devices higher up to receive less effective cooling. If you locate one sensor 2 feet (0.6 metres) off the floor and a second at 5 feet (1.5 metres) you can measure the temperature differential. If the difference is more than 10°F (about 5.5°C) you should consider increasing the airflow or cooling things a bit more.
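As a minimal sketch of that differential check, assuming your monitoring system can supply readings in Celsius from the two sensors (the sensor heights, threshold and `gradient_alert` helper are illustrative, not part of any particular product):

```python
FLOOR_SENSOR_HEIGHT_M = 0.6    # ~2 ft off the floor
UPPER_SENSOR_HEIGHT_M = 1.5    # ~5 ft off the floor
MAX_GRADIENT_F = 10.0          # threshold from the text (~5.5 degC)

def c_to_f(celsius):
    """Convert a Celsius temperature to Fahrenheit."""
    return celsius * 9.0 / 5.0 + 32.0

def gradient_alert(low_temp_c, high_temp_c, max_gradient_f=MAX_GRADIENT_F):
    """Return True if the low/high differential exceeds the threshold,
    suggesting the airflow or cooling needs attention."""
    gradient_f = c_to_f(high_temp_c) - c_to_f(low_temp_c)
    return gradient_f > max_gradient_f

# Example: 21 degC near the floor, 28 degC higher up is a 12.6 degF
# differential, well over the 10 degF threshold.
print(gradient_alert(21.0, 28.0))  # True
```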
Positioning temperature sensors in various locations gives you an idea of where problems might occur. In addition, extra sensors provide redundancy should a sensor fail or become damaged.
Hot spots can be a problem in equipment rooms. Often heat builds up behind equipment racks or near larger machines, and if it is not removed it can cause premature failures.
Temperature, although very important, is not the only environmental factor to take into account.
Server Room Airflow
Moving air cools more than still air, so maintaining good airflow is important. Forced convection, usually fans of some kind, is commonly deployed. Failure of fans, both internal to equipment and external in the room, can cause localised hot spots and sudden rises in temperature.
Sometimes simply repositioning machines can have a beneficial effect. Fitting baffles or deflectors can also help avoid channelling hot air from one machine into another. Measuring an increase in the exhaust temperatures of key equipment often gives early warning of problems.
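One simple way to catch such an exhaust-temperature rise, sketched here with illustrative window and threshold values (the `ExhaustMonitor` class is a hypothetical helper, not a real monitoring API), is to compare each new reading against a rolling baseline:

```python
from collections import deque

class ExhaustMonitor:
    """Track a rolling baseline of a machine's exhaust temperature and
    flag readings that rise well above it (possible early warning)."""

    def __init__(self, window=60, rise_threshold_c=5.0):
        self.readings = deque(maxlen=window)   # recent readings only
        self.rise_threshold_c = rise_threshold_c

    def update(self, temp_c):
        """Record a reading; return True if it exceeds the rolling
        average of previous readings by more than the threshold."""
        alarm = False
        if self.readings:
            baseline = sum(self.readings) / len(self.readings)
            alarm = (temp_c - baseline) > self.rise_threshold_c
        self.readings.append(temp_c)
        return alarm

# Steady readings around 30 degC establish the baseline; a jump to
# 36 degC then trips the alarm.
monitor = ExhaustMonitor(window=5, rise_threshold_c=5.0)
for reading in (30.0, 30.0, 30.0, 36.0):
    print(monitor.update(reading))
```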
The heat load in a Server Room is considerably higher than in a normal office, so more changes of air per hour will be required. Typically an office might need 2 or 3 air changes per hour; a Server Room might easily need ten times that amount.
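Air changes per hour (ACH) is just the hourly volumetric airflow divided by the room volume. A quick worked example, with purely illustrative room dimensions and airflow figures:

```python
def air_changes_per_hour(airflow_m3_per_h, room_volume_m3):
    """ACH = volumetric airflow (m^3/h) divided by room volume (m^3)."""
    return airflow_m3_per_h / room_volume_m3

# Illustrative figures: a 5 m x 4 m room with a 3 m ceiling.
room_volume = 5 * 4 * 3   # 60 m^3
# At ~3 ACH an office of this size needs ~180 m^3/h of airflow;
# a server room at ~30 ACH needs ten times that.
print(air_changes_per_hour(1800, room_volume))  # 30.0
```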
Moving air also feels cooler to people than still air, but air flowing too fast will be felt as a draught. Try to avoid all the heat collecting at the top of the room; slow fans can help push the air around, and pulling hot air up is better than trying to push it down.
Server Room Humidity
Computers can, and do, operate within a wide humidity range. The key thing is to prevent quick humidity changes and wide variations, and to avoid environmental conditions where condensation is possible. Humidity that changes from season to season is much easier to handle than humidity that changes hourly.
Condensation is always the enemy: any conditions that cause moisture to be deposited on equipment will sooner or later destroy it, either through corrosion or by water entering something vital. That's why machine specifications always call for a non-condensing atmosphere.
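Condensation forms when a surface is at or below the dew point of the surrounding air. A sketch of a dew-point check using the well-known Magnus approximation (the safety margin and `condensation_risk` helper are illustrative assumptions, not part of any standard):

```python
import math

def dew_point_c(temp_c, rel_humidity_pct):
    """Approximate dew point via the Magnus formula
    (constants a=17.62, b=243.12 degC; valid roughly 0-60 degC)."""
    a, b = 17.62, 243.12
    gamma = (a * temp_c) / (b + temp_c) + math.log(rel_humidity_pct / 100.0)
    return (b * gamma) / (a - gamma)

def condensation_risk(surface_temp_c, air_temp_c, rel_humidity_pct,
                      margin_c=2.0):
    """Flag a risk if a surface is within margin_c of the dew point."""
    return surface_temp_c <= dew_point_c(air_temp_c, rel_humidity_pct) + margin_c

# Air at 22 degC and 55% RH has a dew point of roughly 12.5 degC,
# so a chilled surface at 13 degC would be flagged.
print(round(dew_point_c(22.0, 55.0), 1))
print(condensation_risk(13.0, 22.0, 55.0))
```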
For people to work comfortably the relative humidity should be between 45% and 60%. Too dry and there's a lot of dust; too wet and it's just plain sticky and hard to work in.
Bear in mind that air conditioning equipment failures may also involve water leakage or spills. Maybe positioning that server or comms cabinet underneath the air conditioning unit isn't such a good idea?
Just because your laptop works fine in a warm room does not mean that a room full of servers will do the same. Too much heat will shorten the life of the equipment and, in extreme cases, cause it to fail catastrophically, costing you time and money. You will need to install a cooling or air conditioning unit, and you will need to monitor temperatures, ideally 24/7, in different parts of the room, behind racks, and near critical equipment.
It is worth having an independent system checking on the air conditioning unit. It is unwise to rely on the facilities management people to let you know when the air conditioning fails.