Future Facilities Ltd.

Prediction Is Better Than Cure – CFD Simulation For Data Center Operation

This paper was written to support/reflect a seminar presented at the ASHRAE Winter meeting 2014, January 21st, by Mark Seymour, Future Facilities.

Mark Seymour

1/21/2014

Table of Contents

Introduction
Why Do Data Centers Fail to Achieve The Design Goals?
  Figure 1. Room Filled with Notional Equipment to Reflect Design Assumptions
  Figure 2. Uniform Load: All Cabinets 4.25kW / Cabinet (408kW total)
  Figure 3. Uniform Airflow Requirement 240l/s/cabinet (23m3/s total)
  Figure 4. All IT Equipment Operates In the ASHRAE Temperature Compliance Recommended Range
  Figure 5. Typical Enterprise Varied Equipment Configuration
  Figure 6. Cabinet Loads Vary from 0kW to 13.2kW
  Figure 7. ASHRAE Temperature Compliance Is Not Achieved For Varied Equipment
Do Core DCIM Tools Address This Loss In Performance?
Prediction Is Better Than Cure
  Simplifications In The Modeling Toolset
  Figure 8. Small Datacom Hall with 8x No. 5kW Cabinets
  Figure 9. Comparison of Elevation of Temperature In Cold Aisle
  Figure 10. Flow in the Raised Floor Predicted by RANS CFD (Left) and PFM CFD (Right)
  Figure 11. Flow from In-Row Coolers Predicted by RANS CFD (Left) and PFM CFD (Right)
  Simplifications/Assumptions When Creating the Model
  Figure 12. Temperatures When 1U Server Are Above Blade Center
  Figure 13. Temperatures When 1U Server Are Below Blade Center
Modeling for Operation
In Conclusion

Mark Seymour Future Facilities Ltd.


Introduction

Advancing technology has brought us more powerful computer hardware for our data

centers. In addition, it brings the opportunity to instrument and monitor the Datacom halls

to better understand where the IT load is and the resulting environment, at least in terms of

air temperature at an array of selected locations.

Instrumentation and monitoring is, without doubt, a step forward that helps the operator in

their quest to understand and control energy use of their undoubtedly complicated asset.

But is measurement enough?

Data center operators typically populate their white space slowly over time. They do so

assuming that the design intent (characterised by very high level parameters such as Total

IT kW and IT kW/cabinet or IT kW/sqft) is always valid.

The problem with this assumption is that it does not reflect the reality of the operational

configurations that will exist in the future. In essence, a data center designer has a

challenging task – to design:

- The infrastructure for a room of unknown and varying electronics
- With unknown and varying load
- To be placed in an ill-defined climate – the local climate varying throughout the year.

So, the designer can only consider:

- A range of design scenarios for simplified configurations e.g. Day 1 50%, 100%, using generic assumed loads and selected ambient conditions
- A range of failure scenarios to check the design is resilient in the event of the unexpected

One thing we know is this: the configurations considered by the designer are unlikely ever

to occur in practice!

This is not to say that using CFD in design to assess the design against these assumptions is futile. On the contrary, such an assessment is the key to checking, selecting and optimising the underlying concepts and strategies.

Further, it allows sensitivity studies for variations in IT load density, IT equipment type and configuration… in order to avoid design flaws that are subsequently exposed by small deviations from the design assumptions during operation. However, such a design assessment can never be considered a true prediction, because the configurations that will occur over time will be unique to that facility and probably to a particular day.

Given that these conceptual design simulations cannot guarantee performance in normal operation, are the advances in measurement and monitoring the saving grace?


Why Do Data Centers Fail to Achieve The Design Goals?

First, we must answer the question: what factors cause the data center to fail?

We have discussed that configurations may vary. Consider a room cooled by perimeter

down-flow units supplying air via a raised floor to contained cold aisles. In a design

scenario, the room with uniformly loaded racks/cabinets might look something like that

shown in Figure 1.

Figure 1. Room Filled with Notional Equipment to Reflect Design Assumptions

The design assumption is that the power distribution and heat production in the room are uniform (Figure 2). Similarly, the assumption is that the airflow requirement of the IT equipment is uniform (Figure 3).

A design scenario CFD simulation is run to determine whether sufficient cooling is delivered to each item of IT equipment, given a specified cooling capacity and control scenario. One way of measuring whether there is sufficient cooling for the IT equipment is to look at the ASHRAE Temperature Compliance to see if the conditions fall within the recommended range during normal operation. For the above scenario, the design cooling strategy achieves conditions for all equipment that satisfy ASHRAE temperature compliance for normal operation, as indicated by the cabinets being colored green (Figure 4).


Figure 2. Uniform Load: All Cabinets 4.25kW / Cabinet (408kW total)

Figure 3. Uniform Airflow Requirement 240l/s/cabinet (23m3/s total)

Figure 4. All IT Equipment Operates In the ASHRAE Temperature Compliance Recommended Range
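
As a quick consistency check on the design figures quoted in the captions above, the sketch below recovers the cabinet count and total airflow, and estimates the air temperature rise across one cabinet. The air density and specific heat are assumed typical values (they are not given in the paper), and the function names are purely illustrative.

```python
# Assumed air properties near 20 degC; these are NOT taken from the paper.
AIR_DENSITY_KG_M3 = 1.2
AIR_CP_J_KG_K = 1005.0


def cabinet_count(total_kw, kw_per_cabinet):
    """Number of cabinets implied by a uniform per-cabinet load."""
    return round(total_kw / kw_per_cabinet)


def total_airflow_m3s(cabinets, litres_per_second_each):
    """Total airflow in m3/s for a uniform per-cabinet airflow in l/s."""
    return cabinets * litres_per_second_each / 1000.0


def air_temperature_rise_k(load_w, airflow_m3s):
    """Steady-state air temperature rise: dT = P / (rho * cp * V)."""
    return load_w / (AIR_DENSITY_KG_M3 * AIR_CP_J_KG_K * airflow_m3s)


# Figures quoted in the captions: 408kW total, 4.25kW and 240 l/s per cabinet.
cabinets = cabinet_count(408.0, 4.25)
airflow = total_airflow_m3s(cabinets, 240.0)
delta_t = air_temperature_rise_k(4250.0, 0.240)

print(f"cabinets: {cabinets}")
print(f"total airflow: {airflow:.2f} m3/s")
print(f"temperature rise per cabinet: {delta_t:.1f} K")
```

Under these assumptions the quoted totals are mutually consistent, and the implied air temperature rise across a uniformly loaded cabinet is a little under 15 K.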


In practice, the equipment distribution that is installed in most enterprise data centers will

not be a uniform and homogeneous layout. Consider a more typical installation of a variety

of different equipment types, Figure 5.

Figure 5. Typical Enterprise Varied Equipment Configuration

The total power/heat load in the room is the same, but the distribution is non-uniform, with some cabinets/racks having no power and others more than three times the average power (Figure 6).

Figure 6. Cabinet Loads Vary from 0kW to 13.2kW


A similar variation applies to airflow requirement. This non-uniformity and deviation from

the conceptual configuration has consequences: some IT equipment is now receiving air

that meets only the “Allowable” (orange) range on ASHRAE’s Temperature Compliance

scale, rather than the “Recommended” range. Further, some inlet temperatures fall outside

the “Recommended” and “Allowable” ranges, as indicated in Red, Figure 7.

Figure 7. ASHRAE Temperature Compliance Is Not Achieved For Varied Equipment
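
The green/orange/red classification described above can be sketched as a simple threshold check on inlet temperature. The limits used here are an assumption: the ASHRAE TC 9.9 (2011) recommended envelope of 18 to 27°C and the class A1 allowable envelope of 15 to 32°C. The paper does not state which class or edition its figures use, and the inlet temperatures below are hypothetical.

```python
# Assumed limits (dry-bulb, degC): ASHRAE TC 9.9 (2011) recommended envelope
# and class A1 allowable envelope. The paper does not state the class used.
RECOMMENDED_C = (18.0, 27.0)
ALLOWABLE_C = (15.0, 32.0)


def compliance(inlet_c):
    """Classify one IT inlet temperature against the assumed envelopes."""
    if RECOMMENDED_C[0] <= inlet_c <= RECOMMENDED_C[1]:
        return "recommended"   # shown green in the paper's figures
    if ALLOWABLE_C[0] <= inlet_c <= ALLOWABLE_C[1]:
        return "allowable"     # shown orange
    return "out_of_range"      # shown red


def summarize(inlet_temps_c):
    """Count how many inlets fall into each compliance band."""
    counts = {"recommended": 0, "allowable": 0, "out_of_range": 0}
    for temp in inlet_temps_c:
        counts[compliance(temp)] += 1
    return counts


# Hypothetical inlet temperatures, for illustration only.
hypothetical_inlets_c = [19.5, 22.0, 26.8, 29.5, 34.0]
print(summarize(hypothetical_inlets_c))
```

The same check applies whether the inlet temperatures come from a simulation or from sensors; only the simulation can supply them for equipment that has not yet been installed.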

In simple terms, configured in this way, the data center cannot be filled to capacity without risking loss of availability. To avoid equipment overheating, the management team will stop installing new equipment as soon as equipment starts to exhibit temperature warnings/alarms. The result? Capacity is lost.


Do Core DCIM Tools Address This Loss In Performance?

In reality, many enterprise-scale data centers undergo changes on a daily basis and the

impact of change can result in equipment being put at risk in the way described. To

manage these frequent changes to complex modern datacom environments the industry

has turned to Data Center Infrastructure Management, or DCIM for short.

A key aspect of DCIM is monitoring. Monitoring systems are becoming more widespread, largely because of improvements in price and availability, and better access to the data they provide. The data is used to raise specific alarms, but it also provides views of power and temperature throughout the IT hall.

While some temperature data is given for the IT inlets (IT is what you really care about),

many systems have sensors at locations other than the inlets to the IT equipment. In such

a configuration, the sensors will not necessarily indicate there is a problem at all. For

example, when recirculation is the source of overheating and the limited number of sensors

means that sensors are not included in the recirculation path, no alarm may ever be issued.

Perhaps much more importantly, even if every inlet were monitored, the data provided by sensors can only tell you what is happening now or what has happened in the past. Critically, it does not tell you what will happen in the future when you make your next installation. In practice, the fact that equipment already installed is cool is no guarantee that there is sufficient air to cool new equipment. In simple terms, core DCIM tools such as monitoring simply look at the past or the present and NOT the future.

Perhaps worse still, it is also quite often the case that when a new installation is made the

adverse effect may be on items of IT equipment in another location entirely. As a result,

previously installed IT equipment that has been operating satisfactorily may suddenly be

adversely affected.

Without any foresight, the first signs of this are equipment alarms indicating that the environment the equipment is experiencing is close to the limits for hardware use. In fact, many data centers start to see thermal alarms from the IT equipment as early in their life as 60-70% of the design capacity. In order to avoid this lost capacity, deployment decisions need to be assessed in terms of their future impact.
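
To put the 60-70% figure in perspective, the sketch below estimates how much design capacity would be stranded if deployment stopped at that point. Applying it to the 408kW example room is an assumption made here for illustration; the paper does not combine the two numbers.

```python
def stranded_capacity_kw(design_kw, usable_fraction):
    """Design capacity left unused if deployment stops at usable_fraction."""
    return design_kw * (1.0 - usable_fraction)


# Assumption: the 408kW design load of the example room is reused here.
DESIGN_KW = 408.0
stranded_if_stopped_at_70 = stranded_capacity_kw(DESIGN_KW, 0.70)
stranded_if_stopped_at_60 = stranded_capacity_kw(DESIGN_KW, 0.60)

print(f"stopping at 70%: {stranded_if_stopped_at_70:.1f} kW stranded")
print(f"stopping at 60%: {stranded_if_stopped_at_60:.1f} kW stranded")
```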


Prediction Is Better Than Cure

Given that strategic measurements only indicate the symptoms, that is, they look backwards not forwards, the obvious way forward is to use the same simulation tools used in design (CFD simulation) for the operational configuration, as an extension to the design.

The difference now is that, unlike the design scenario where CFD is used to model

conceptual configurations, the model must now consider the actual configurations allowing

for the as-built facility, infrastructure, the IT systems, and the deployment practices

actually in use.

The use of simulation represents the only practical way to predict the likely performance,

short of building a mock-up or installing the equipment in test mode in the real facility.

However, it is important to realise that the model must not only use the real equipment in

the chosen locations, but must also reflect the actual installation and practices.

When using CFD as a prediction tool in operation, there are many details that if ignored may

lead you to the wrong deployment decision. These fall into two categories:

1. Simplifications in the modeling toolset

2. Simplifications/assumptions when creating the model

Simplifications In The Modeling Toolset

Given the computational expense of traditional CFD tools, it is tempting to identify elements

of the physics that can be ignored and thus simplify and speed up the simulation process.

Consider a small equipment room, Figure 8.

Figure 8. Small Datacom Hall with 8x No. 5kW Cabinets

The room is cooled by a perimeter down-flow unit distributing cool air via a raised floor plenum to two short rows of 5kW cabinets. The simulation was performed with and without thermal buoyancy being accounted for (i.e. with and without the physics that hot air rises), Figure 9. It is clear that, even in the high flows produced in a data center, the change in flow and temperature distribution in the ‘cold aisle’ is significant, such that IT equipment inlet conditions are very different.

Figure 9. Comparison of Elevation of Temperature In Cold Aisle
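
One common way to judge whether buoyancy can safely be neglected is the Richardson number, the ratio of buoyancy to inertial forces. The sketch below is not part of the paper's analysis, and the temperature difference, height and air speed are assumed order-of-magnitude values chosen only for illustration.

```python
G_M_S2 = 9.81  # gravitational acceleration


def richardson_number(delta_t_k, height_m, velocity_m_s, mean_temp_k=293.15):
    """Ri = g * beta * dT * L / U^2, with beta = 1/T for an ideal gas."""
    beta = 1.0 / mean_temp_k
    return G_M_S2 * beta * delta_t_k * height_m / velocity_m_s ** 2


# Assumed values: 10 K hot/cold air difference, 2 m cabinet height, and
# 1 m/s air speed in the aisle. None of these come from the paper.
ri = richardson_number(delta_t_k=10.0, height_m=2.0, velocity_m_s=1.0)
print(f"Ri = {ri:.2f}")
```

A value of order one, as these assumed inputs give, indicates that buoyancy and inertia are comparable, which is in line with the paper's observation that ignoring buoyancy changes the predicted inlet conditions.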

Now consider another alternative. Consider the possibility of using a simpler methodology

for determining the flow. The examples below compare a traditional finite volume RANS

(Reynolds Averaged Navier Stokes) solution and a potential flow solution. Figure 10

compares the flow predicted in the raised floor.

Figure 10. Flow in the Raised Floor Predicted by RANS CFD (Left) and PFM CFD (Right)

The streamlines show that using a potential flow solution (which, by its nature, does not conserve momentum) results in a very different airflow pattern, typically characterized by a lack of recirculation or separation in the flow. This results in a very different airflow distribution through the perforated tiles.

In the body of the room the picture is similar. Figure 11 shows the flow from in-row cooling

units.

Figure 11. Flow from In-Row Coolers Predicted by RANS CFD (Left) and PFM CFD (Right)

The failure to conserve momentum and to predict separation or recirculation results in very different flow patterns and consequent IT equipment inlet temperatures.
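
The absence of recirculation in a potential flow solution follows from how the velocity is defined. A velocity field that is the gradient of a potential has zero circulation around any closed loop, so it cannot contain a closed recirculating streamline. The toy sketch below, which is not either of the solvers compared in the paper, demonstrates this on a unit grid with an arbitrary potential and contrasts it with a simple rotating flow.

```python
import math


def potential_circulation(phi, i, j):
    """Circulation around grid cell (i, j) when velocity = grad(phi).

    Along each edge the integral of velocity is the difference in potential
    between the edge's end points, so the sum around the loop telescopes.
    """
    loop = [(i, j), (i + 1, j), (i + 1, j + 1), (i, j + 1), (i, j)]
    return sum(phi[c][d] - phi[a][b] for (a, b), (c, d) in zip(loop, loop[1:]))


def vortex_circulation(x0, y0):
    """Circulation around the unit cell at (x0, y0) for u = -y, v = x."""
    bottom = -y0       # u integrated along +x at y = y0
    right = x0 + 1     # v integrated along +y at x = x0 + 1
    top = y0 + 1       # u integrated along -x at y = y0 + 1
    left = -x0         # v integrated along -y at x = x0
    return bottom + right + top + left


# An arbitrary smooth potential on a 6 x 6 grid (illustrative only).
n = 6
phi = [[math.sin(0.7 * i) * math.cos(0.3 * j) + 0.1 * i * j for j in range(n)]
       for i in range(n)]
max_potential = max(abs(potential_circulation(phi, i, j))
                    for i in range(n - 1) for j in range(n - 1))

print(f"largest circulation in potential flow: {max_potential:.1e}")
print(f"circulation in rotating flow: {vortex_circulation(2, 3)}")
```

Whatever potential is chosen, the circulation is zero to rounding error, whereas the rotating flow has a fixed non-zero circulation around every cell.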

In summary, to model real operational scenarios, it is important to include the full physics

and conserve all variables.

Simplifications/Assumptions When Creating the Model

Data centers are very complex. It is impossible to model them in full detail. The big

questions are, what do we include and how do we include it? There are many potential

simplifications:

- Obstructions – cables, pipes,…
- Cooling devices – controls, discharge/fan characteristics
- Airflow distribution – perforated tile characteristics, cable penetrations, containment details…
- Equipment configuration – rack construction, installation details, operational characteristics,…

In conceptual models, many of these are grossly simplified. For example, cables are modeled as a distributed resistance. Another classic simplification is to lump the IT equipment in a rack/cabinet together and capture only the total heat load and bulk airflow. But is this reasonable when hoping to predict the true performance in operation?


Consider a rack/cabinet which is to have a blade center and three 1U servers installed.

Does it matter how the two equipment types are installed in this cabinet?

Option 1 – Three 1U servers placed on top of a blade center (Figure 12). The configuration is poorly blanked and hot air recirculates under the blade center. This results in recirculated air that is over 27°C entering the IT equipment inlets.

Option 2 – The alternative is to install the 1U servers at the bottom with the blade center above (Figure 13). In this configuration, while there is still recirculation, the temperature of the recirculated air is less than 22°C. Clearly, a simplified model cannot capture both conditions – the detail is essential.


Figure 12. Temperatures When 1U Servers Are Above Blade Center

Figure 13. Temperatures When 1U Servers Are Below Blade Center


Modeling for Operation

So, when we use CFD for operational deployment decisions, it is critical that we:

- Ensure the model includes the key details
- Survey/monitor the facility to check the model, i.e. perform a calibration (modern monitoring lends itself to more frequent checking)
- Check if the model reflects reality sufficiently for engineering decisions. If it does not, we MUST review and update the model to complete the calibration

Calibration is the process of measuring/monitoring in the data center to ensure that the

modelling simplifications adopted still allow the model to predict reality.

The calibration process warrants a separate document to fully describe it and so is not fully

documented here. However, for the purpose of this paper it is sufficient to recognise that

the calibration process is not a matter of actions like fixing grille flows to measured data

settings rather than predicting them, or any similar artificial adjustments of the model.

To make such adjustments fundamentally flaws the predictive methodology, since by fixing

the flows, the consequences of any change to the configuration can no longer be predicted

for any other scenario where the flow may vary.

Calibration is instead the use of the measured/monitored data to establish whether or not the model of the current configuration predicts reality accurately enough to be an adequate reference model for considering future changes to the installation. If it does not, the measured/monitored data can be used as an indicator of where to refine the model representation to capture the necessary physics.
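
The comparison step of calibration can be sketched as follows. This is a minimal illustration, not Future Facilities' procedure: the sensor names, temperatures and the 2 K tolerance are all hypothetical. In keeping with the description above, the function only reports where prediction and measurement disagree; it does not adjust the model to force agreement.

```python
def calibration_report(predicted_c, measured_c, tolerance_c=2.0):
    """Return {sensor: predicted - measured} where the gap exceeds tolerance.

    The output points to where the model representation may need refining;
    it deliberately does not alter any model inputs.
    """
    mismatches = {}
    for sensor, measured in measured_c.items():
        error = predicted_c[sensor] - measured
        if abs(error) > tolerance_c:
            mismatches[sensor] = error
    return mismatches


# Hypothetical predicted and measured inlet temperatures in degC.
predicted = {"rack_A1_inlet": 21.0, "rack_A2_inlet": 22.5, "rack_B1_inlet": 24.0}
measured = {"rack_A1_inlet": 21.5, "rack_A2_inlet": 22.0, "rack_B1_inlet": 27.5}

mismatches = calibration_report(predicted, measured)
for sensor, error in mismatches.items():
    print(f"{sensor}: model differs from measurement by {error:+.1f} K")
```

An empty result would suggest the model is an adequate reference for the chosen tolerance; a non-empty one indicates which locations to inspect, for example for missing obstructions or unmodeled leakage.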

Once calibrated, the model can be used to test new deployments in order to check their impact on Availability, Capacity and Efficiency.


In Conclusion

- Data centers are prone to losing capacity compared with design because no design can allow for the infinite number of potential installations;
- Sufficient detail for successful calibration must be added for effective operational predictive modeling;
- Predictive modeling can and should be used to avoid availability, capacity and efficiency losses alongside traditional DCIM.