


Stabilizing Path Modification of Power-Aware On/Off Interconnection Networks

Jose Miguel Montanana (NII, Japan), Michihiro Koibuchi (NII, Japan), Hiroki Matsutani (U of Tokyo, Japan), Hideharu Amano (Keio U/NII, Japan)

Outline

• HPC networks (InfiniBand, GbE)
• On/off link activation method
  – Reducing power consumption of HPC networks
  – Paths are updated to avoid deactivated links
• Applying network reconfiguration to switches
• Evaluations
  – Cycle-accurate network simulator
  – Behavior of the network during the path change

Networks for High-Performance Computing

[Figure: interconnect families on the Top500 list, Jun 2003 to Jun 2008 (Other technologies, Myrinet, InfiniBand (IBA), Gigabit Ethernet); axes: number of supercomputers on the Top500 list (0 to 300) and percentage of the Top500 list (0% to 60%)]

Examples (2008)

• Virginia Tech's System X: 2,200 cores, 280th on Top500 (IBA)
• ABE (NCSA): 9,600 cores, 23rd on Top500 (IBA)
• ASCI-Q (LANL): 8,192 cores (Quadrics)
• BlueGene/L (LLNL): 212,992 processors, 2nd on Top500 (proprietary)
• RoadRunner (LANL): 122,400 cores, 1st on Top500 (IBA)
• TACC (Univ. of Texas): 251,904 cores, 5th on Top500 (IBA)

HPC Networks

• Small switches (24/48-port) provide the lowest cost per port
• When 100,000 cores are connected, a large number of small switches are needed
  – This drastically increases the number of links
  – Unused and rarely used links should be deactivated in power-aware HPC systems

[Figure: switches and hosts 0 to 15 connected through four trees (TREE 1 to TREE 4); link aggregation using 3 links; 4 paths]
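To give a rough sense of how quickly switch and link counts grow, the sketch below counts the elements of a three-level k-ary fat tree built only from k-port switches. This topology is a stand-in assumption, not the four-tree network drawn on the slide: it connects k³/4 hosts using 5k²/4 switches and k³/2 switch-to-switch links, i.e. about 5/k switches and 2 inter-switch links per host.

```python
def fat_tree_counts(k: int) -> dict:
    """Element counts of a three-level k-ary fat tree built only from k-port switches.

    Assumed topology (not the one on the slide): k pods, each with k/2 edge and
    k/2 aggregation switches, plus (k/2)**2 core switches.
    """
    if k < 2 or k % 2:
        raise ValueError("k must be an even number >= 2")
    half = k // 2
    edge = aggregation = k * half          # k pods x k/2 switches of each kind
    core = half * half
    return {
        "hosts": k * half * half,          # each edge switch serves k/2 hosts
        "switches": edge + aggregation + core,
        # edge->aggregation links plus aggregation->core links
        "switch_links": k * half * half + aggregation * half,
    }


for ports in (24, 48):
    print(ports, fat_tree_counts(ports))
```

With 48-port switches this gives 27,648 hosts, 2,880 switches and 55,296 inter-switch links under these assumptions; all of those links stay powered unless they are explicitly deactivated.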

• Power consumption is almost constant regardless of traffic load
• The number of activated ports dominates the power consumption of switches
  – The power consumption of a port is reduced to zero by a port-shutdown operation

Power consumption of HPC switches (unit: W)

Type  Product        Per port  Other (Xbar)  Total  Ports' share of total
GbE   PC5324         1.2       14.9           42.9  65%
GbE   PC6224         2.0       42.5           91.1  53%
GbE   PC6248         2.1       56.8          155.2  63%
IB    SF-420         1.0       32.6           55.4  41%
IB    SFS7000D-SK9   1.0       43.4           66.1  34%
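The ports' share column appears to be the power not attributed to "Other (Xbar)", divided by the total, i.e. (Total − Other) / Total. The sketch below recomputes it from the table under that reading, which reproduces every percentage shown; the dictionary name is illustrative.

```python
# (Other/crossbar, Total) power per switch in watts, copied from the table above.
SWITCHES = {
    "PC5324": (14.9, 42.9),
    "PC6224": (42.5, 91.1),
    "PC6248": (56.8, 155.2),
    "SF-420": (32.6, 55.4),
    "SFS7000D-SK9": (43.4, 66.1),
}


def port_share(other_w: float, total_w: float) -> float:
    """Fraction of total switch power attributed to ports (everything except 'Other')."""
    return (total_w - other_w) / total_w


for name, (other, total) in SWITCHES.items():
    print(f"{name}: ports draw {total - other:.1f} W = {port_share(other, total):.0%} of total")
```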

Outline

• HPC networks (InfiniBand, GbE)
• On/off link activation method
  – Reducing power consumption of HPC networks
  – Paths are updated to avoid deactivated links
• Applying network reconfiguration to switches
• Evaluations
  – Cycle-accurate network simulator
  – Behavior of the network during the path change

Overview of the on/off link method

[Figure: the four-tree network (TREE 1 to TREE 4, hosts 0 to 15) before and after a part of the links is turned off when the traffic load becomes low]

• Network load is not always high (e.g., during computation time)
• Switch ports consume 40-60% of the total power of a switch
• A runtime on/off link method (traffic monitoring via, e.g., a port monitor, IPTraf, or pilot execution):
  – Traffic monitoring
  – Do low- or high-load links appear? No: keep monitoring. Yes: continue
  – Selection of on/off links and paths
  – Update of link status and paths
• How is the network stabilized during the path update? This is a very crucial factor
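The monitoring loop above can be sketched as a simple threshold policy. The thresholds, link names and the `keep_on` set are assumptions for illustration; the slide names possible sources of utilization data but does not specify a selection rule, and this sketch does not check connectivity. The returned sets would feed the "update of link status and paths" step, which is where network reconfiguration is needed.

```python
def select_links(utilization, inactive, keep_on, low=0.10, high=0.70):
    """One decision step of a threshold-based on/off policy (hypothetical thresholds).

    utilization: measured load in [0, 1] for every currently active link.
    inactive:    links that are currently switched off.
    keep_on:     active links that must never be switched off (e.g. one tree that
                 keeps every host pair connected); connectivity is NOT verified here.
    Returns (turn_off, turn_on); two empty sets correspond to the "No" branch.
    """
    if any(load > high for load in utilization.values()):
        # High-load links appeared: reactivate everything that was switched off.
        return set(), set(inactive)
    turn_off = {
        link for link, load in utilization.items()
        if load < low and link not in keep_on
    }
    return turn_off, set()


# Link "b" carries most of the traffic; "a" and "c" are nearly idle, "c" is protected.
off, on = select_links({"a": 0.05, "b": 0.50, "c": 0.02}, inactive={"d"}, keep_on={"c"})
print(sorted(off), sorted(on))  # prints ['a'] []
```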

Low traffic load is detected

[Figure: the four-tree network (TREE 1 to TREE 4, hosts 0 to 15) showing the paths before and after the old path is deactivated]

Stabilizing the Network during the Path Update: Network Reconfiguration (Deadlock Avoidance)

[Figure: a network of switches 0 to 6 connected by links, shown with routing Rold and, after NW reconfiguration, with routing Rnew]

• Rold = routing table before the update
• Rnew = routing table after the update
• Rold is deadlock-free and Rnew is deadlock-free, but Rold + Rnew may deadlock
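Why Rold + Rnew may deadlock can be made concrete with the channel dependency graph: a deterministic routing function is deadlock-free if its channel dependency graph is acyclic (Dally and Seitz). The sketch below uses four hypothetical channels c0 to c3 on a ring; each routing function alone has an acyclic dependency graph, but the union, which exists while old and new packets coexist in the network, contains a cycle.

```python
from collections import defaultdict


def has_cycle(dependencies):
    """Detect a cycle in a channel dependency graph given as (from, to) channel pairs."""
    graph = defaultdict(list)
    nodes = set()
    for src, dst in dependencies:
        graph[src].append(dst)
        nodes.update((src, dst))

    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited, on the DFS stack, finished
    color = dict.fromkeys(nodes, WHITE)

    def visit(channel):
        color[channel] = GRAY
        for nxt in graph[channel]:
            if color[nxt] == GRAY:        # back edge: cyclic dependency
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[channel] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in nodes)


r_old = {("c0", "c1"), ("c1", "c2")}      # dependencies created by Rold (hypothetical)
r_new = {("c2", "c3"), ("c3", "c0")}      # dependencies created by Rnew (hypothetical)
print(has_cycle(r_old), has_cycle(r_new), has_cycle(r_old | r_new))  # prints False False True
```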

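Network Reconfiguration

[Figure: the same network during reconfiguration from Rold to Rnew]

• Rold is deadlock-free and Rnew is deadlock-free, but Rold + Rnew may cause deadlock
• Deadlock: packets routed by Rold wait behind packets routed by Rnew (old behind new), and vice versa (new behind old)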

Existing network reconfiguration techniques for fault-tolerant networks

• Static reconfiguration (ST)
  – Traffic is stopped, the new routing is applied, and traffic is resumed
  – Drawback: high latencies
• Dynamic reconfiguration (Double Scheme, Simple Reconfiguration)
  – Traffic is not stopped; old and new routing coexist
  – Drawback: difficulty in avoiding deadlock
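Static reconfiguration avoids deadlock by never letting the two routing functions coexist, at the cost of stalling injection until the network drains, which is where its high latencies come from. A minimal sketch, assuming every in-flight packet advances one hop per cycle; the function and parameter names are hypothetical.

```python
def static_reconfigure(in_flight_hops, install_new_routing):
    """Toy static reconfiguration: stop injection, drain, install the new routing, resume.

    in_flight_hops:      remaining hop counts of packets already in the network
                         (one hop per cycle is an assumption of this model).
    install_new_routing: callback that swaps Rold for Rnew in every switch.
    Returns the number of cycles during which injection was stopped.
    """
    stalled_cycles = 0
    pending = list(in_flight_hops)
    while pending:                      # injection stopped; old packets drain under Rold
        pending = [hops - 1 for hops in pending if hops > 1]
        stalled_cycles += 1
    install_new_routing()               # network is empty, so Rold and Rnew never coexist
    return stalled_cycles               # injection resumes only now


tables = {"current": "Rold"}
stall = static_reconfigure([3, 1, 2], lambda: tables.update(current="Rnew"))
print(stall, tables["current"])  # prints 3 Rnew
```

Every packet generated during those stalled cycles waits at its source, so the stall grows with the longest remaining path in the network.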

Current NW Reconfigurations

• SR-PDA: Simple Reconfiguration, Packet Dropping Aware [Lysne08, IEEE TC]
  – Tokens are sent before the routing is updated
  – Packets are sent after the routing tables are updated
• SR-LA: Simple Reconfiguration, Latency Aware [Lysne08, IEEE TC]
  – All new tables are distributed before the new one is used
  – Latency due to the tokens is reduced
• DS: Double Scheme [Pinkston03, TPDS]
  – Requires 2 virtual channels
  – One channel has to be drained
• ST: Static Reconfiguration
  – Traffic injection is completely stopped
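A toy model of the Double Scheme idea as summarized above: each of the two virtual channel sets carries packets routed by a single routing function, new packets enter the set that holds Rnew while old packets keep draining in the other set, and a further reconfiguration must wait until the old set is empty. The class and method names and the one-hop-per-cycle drain are assumptions for illustration, not the protocol of [Pinkston03].

```python
from dataclasses import dataclass, field


@dataclass
class DoubleScheme:
    """Toy model: two virtual channel (VC) sets, each used by exactly one routing function."""
    routing: dict = field(default_factory=lambda: {0: "Rold", 1: None})
    active: int = 0                      # VC set that receives newly injected packets
    in_flight: dict = field(default_factory=lambda: {0: [], 1: []})

    def inject(self, hops: int) -> None:
        # Traffic is never stopped: new packets always enter the active set.
        self.in_flight[self.active].append(hops)

    def step(self) -> None:
        # Every packet advances one hop per cycle; packets with one hop left are delivered.
        for vc in (0, 1):
            self.in_flight[vc] = [h - 1 for h in self.in_flight[vc] if h > 1]

    def drained(self, vc: int) -> bool:
        return not self.in_flight[vc]

    def reconfigure(self, new_routing: str) -> None:
        idle = 1 - self.active
        if not self.drained(idle):
            # The set that held the previous routing must be empty before it is reused.
            raise RuntimeError(f"VC set {idle} has not drained yet")
        self.routing[idle] = new_routing
        self.active = idle


net = DoubleScheme()
net.inject(3)                # routed by Rold in VC set 0
net.reconfigure("Rnew")      # new packets now use VC set 1; set 0 keeps draining
net.inject(2)                # routed by Rnew in VC set 1
for _ in range(3):
    net.step()
print(net.drained(0), net.drained(1))  # prints True True
```

Because Rold and Rnew never share a virtual channel in this model, their channel dependencies cannot combine into a cycle, which is the property the single-set dynamic schemes lack.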

Outline

• HPC interconnects (InfiniBand, GbE)
• On/off link activation method
  – Reducing power consumption of HPC networks
  – Paths are updated to avoid deactivated links
• Applying network reconfiguration to switches
• Evaluations
  – Cycle-accurate network simulator
  – Behavior of the network during the path change

Simulation Environment

• Switch model (InfiniBand)
  – Buffered input (1 KB per VL) and output (1 KB per VL) ports
  – Non-multiplexed crossbar with separate ports per VL
  – FIFO-based crossbar arbiter per output crossbar port
  – Round-robin arbiter per output port
  – 100 ns routing time
• Link model
  – Link speed = 2.5 Gbps (1X links)
• Topologies
  – 2D mesh networks
• Traffic model
  – Packet length: 58 bytes
  – Uniform traffic
  – Full range of traffic, from low load to saturation
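A back-of-the-envelope check of these parameters: a 58-byte packet is 464 bits, so on a 2.5 Gbps link it takes 185.6 ns to serialize, comparable to the 100 ns routing time, and a 1 KB VL buffer holds 17 such packets. Whether the simulator models 8b/10b line coding (which leaves 2 Gbit/s of data bandwidth on a 1X InfiniBand link) is not stated, so the sketch shows both cases; the constant and function names are illustrative.

```python
PACKET_BYTES = 58        # packet length used in the evaluation
LINK_GBPS = 2.5          # 1X link speed from the slide
BUFFER_BYTES = 1024      # 1 KB buffer per virtual lane (VL)
ROUTING_NS = 100         # routing time per switch


def serialization_ns(packet_bytes: int, link_gbps: float, efficiency: float = 1.0) -> float:
    """Time to push one packet onto the link; 1 Gbit/s moves 1 bit per ns."""
    return packet_bytes * 8 / (link_gbps * efficiency)


packets_per_vl_buffer = BUFFER_BYTES // PACKET_BYTES

print(f"raw 2.5 Gbps:        {serialization_ns(PACKET_BYTES, LINK_GBPS):.1f} ns per packet")
print(f"with 8b/10b (x0.8):  {serialization_ns(PACKET_BYTES, LINK_GBPS, 0.8):.1f} ns per packet")
print(f"packets per 1 KB VL buffer: {packets_per_vl_buffer}")
print(f"routing time per switch:    {ROUTING_NS} ns")
```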

Evaluation Results

We apply the network reconfiguration process twice in each execution:

• Deactivating links after the traffic injection decreases
• Reactivating links after the traffic injection increases

We evaluated the full range of initial traffic injection rates, from low traffic to near congestion.

Static Reconfiguration (ST)

[Figure: (a) Low Traffic Load; (b) High Traffic Load. In both panels the traffic load first decreases, then increases]

• Traffic decreases and a link is deactivated: latency is high
• Traffic increases and a link is reactivated: latency is high

At each on/off link operation, traffic is not stabilized in ST!!

SR-LA (dynamic reconfiguration)

[Figure: (a) Low Traffic Load; (b) High Traffic Load]

Also, at each on/off link operation, traffic is not stabilized in SR-LA!!

SR-PDA (dynamic reconfiguration)

[Figure: (a) Low Traffic Load; (b) High Traffic Load]

Also, at each on/off link operation, traffic is not stabilized in SR-PDA!!

Double Scheme (dynamic reconfiguration)

[Figure: (a) Low Traffic Load; (b) High Traffic Load. In both panels the traffic load first decreases, then increases]

• Latency is constant in both panels

The path update is stabilized only in Double Scheme!!

Larger Network (8x8 Mesh)

[Figure: results for DS, ST and SRL]

Similar behavior!!

Only Double Scheme stabilizes networks during the path update!!

Conclusions

• We apply network reconfiguration techniques to power-aware on/off networks for HPC
  – Links consume ~63% of switch power
• On/off link activation reduces power, but it must tolerate topology changes
  – Network reconfiguration smoothly supports the path update
    » Stabilizing the update of new/old paths
    » Avoiding deadlocks of new/old paths
• Cycle-accurate simulation shows its impact on power-aware on/off networks
• Double Scheme (dynamic NW reconfiguration) maintains performance, stabilizes the network, and avoids deadlock
• Network reconfiguration is essential for realizing power-aware on/off networks for HPC systems

Acknowledgment

This work was partially supported by JST CREST (ULP-HPC: Ultra Low-Power, High-Performance Computing via Modelling and Optimization of Next Generation HPC Technologies)