EE382C Final Project: Crouching Tiger, Hidden Dragonfly (cva.stanford.edu/classes/ee382c/projects/alex_camilo_matthew_ziyad.pdf)

EE382C Final Project

Crouching Tiger, Hidden Dragonfly

Alexander Neckar

Camilo Moreno

Matthew Murray

Ziyad Abdel Khaleq

Outline

• Topology, consideration and layout

• Routing solution

• Mirroring and simulation

• Results and conclusion

Dragonfly Topology

Fully-connected local groups

Low hop count

Fast access to global links

Dragonfly Topology

Load balance:

Endpoints/router >= global links per router

~All traffic is bound for other groups. BW should fit.

Local links per router >= endpoints+global links

~All traffic needs to traverse a local link before and after the global link.

Adaptive Routing helps deal with adversarial traffic.

As long as overall BW is sufficient

And we have good backpressure

Considerations

Costs

Optical links drive cost

Minimize number, good utilization

Local links much cheaper

Overprovisioning helps feed global links

Physical layout

fully-connected group size limit (5m cables)

Considerations

Power

Links dominate power

Traffic

Mostly limited in throughput by the send window (RPC).

Some (RDMA) very large packets.

Hotspots.

So... what?

Layout Considerations

Maybe as many as 60 racks per group!

Layout Considerations

Realistically, 34ish

Layout Considerations

Maximize racks per group?

routers on bottom slots, wire diagonally

Actually not a constraint

Balance / cost issues with very large groups.

100m optical cables

~70m square: 147 x 50 racks: >200K rack slots

Chips

Channels:

5 GB/s = 4 differential pairs @ 10 Gb/s

1 optical cable

4 elec. cable pairs each direction

Chip size is perimeter-driven

Buffers + crossbar are only a few mm².

High-radix requires large perimeter for I/O

Exploring options

Lots of guesstimation!

Basic

Topology: 13x26x13
Cost: 6.16M
Power: 68 kW
Router radix: 51
Optical links: 57291
Electrical links: 110175
Groups: 339
Endpoints/group: 338

>114k nodes

Balanced for uniform random
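The figures on this slide can be reproduced from the three topology numbers. The sketch below is not from the original deck; it assumes the topology string is ordered p x a x h (endpoints per router, routers per group, global links per router), that the network is built out to its maximum size of a*h + 1 groups, and that "Elect. Links" counts only router-to-router links inside groups. The class name `Dragonfly` and its property names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dragonfly:
    p: int  # endpoints per router
    a: int  # routers per group
    h: int  # global links per router

    @property
    def radix(self) -> int:
        # endpoint ports + links to the other a-1 routers in the group + global ports
        return self.p + (self.a - 1) + self.h

    @property
    def groups(self) -> int:
        # assumption: maximum-size network, one global link per pair of groups
        return self.a * self.h + 1

    @property
    def endpoints_per_group(self) -> int:
        return self.p * self.a

    @property
    def nodes(self) -> int:
        return self.groups * self.endpoints_per_group

    @property
    def optical_links(self) -> int:
        # each group has a*h global ports; every link is shared by two groups
        return self.groups * self.a * self.h // 2

    @property
    def electrical_links(self) -> int:
        # fully connected group: a*(a-1)/2 local links, repeated in every group
        return self.groups * self.a * (self.a - 1) // 2


basic = Dragonfly(p=13, a=26, h=13)
print(basic.radix, basic.groups, basic.endpoints_per_group, basic.nodes)
# prints: 51 339 338 114582
print(basic.optical_links, basic.electrical_links)
# prints: 57291 110175
```

Under these assumptions the computed values match the slide: radix 51, 339 groups, 338 endpoints per group, and just over 114k nodes.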

Cheaper, better?

Topology: 10x32x10
Cost: 5.64M
Power: 70.7 kW
Router radix: 51
Optical links: 51360
Electrical links: 159216
Groups: 321
Endpoints/group: 320

Fewer optical cables

Overprovisioned in-group links

8.5% cheaper

4% higher power
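As a quick check of the two percentages, the sketch below compares this topology with the basic 13x26x13 design using the rounded cost and power figures printed on the slides. The helper name `percent_change` is hypothetical, and the inputs are the rounded slide values rather than the authors' underlying cost model.

```python
def percent_change(new: float, base: float) -> float:
    """Relative change of `new` with respect to `base`, in percent."""
    return (new - base) / base * 100.0


# Rounded figures from the slides: 13x26x13 (base) vs 10x32x10 (new)
cost_change = percent_change(5.64, 6.16)    # cost in millions
power_change = percent_change(70.7, 68.0)   # power in kW

print(f"cost: {cost_change:+.1f}%  power: {power_change:+.1f}%")
```

With the rounded inputs the cost difference comes out at roughly 8.4%, slightly under the quoted 8.5%, which presumably was computed from unrounded costs; the power increase rounds to 4%.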

A little more savings

Topology: 10x34x9
Cost: 5.22M
Power: 70.5 kW
Router radix: 52
Optical links: 46971
Electrical links: 172227
Groups: 307
Endpoints/group: 340

90% of normal global links

Overprovisioned in-group links

Even cheaper

Any good?

What if...?

Topology: 10x45x5
Cost: 3.11M
Power: 65.9 kW
Router radix: 59
Optical links: 25425
Electrical links: 223740
Groups: 226
Endpoints/group: 450

Half the “necessary” global links

Very overprovisioned in-group links

Otherwise not 100K

Almost half the price!

Improving Global Adaptive Routing

I feel the need…the need for speed.

Challenges

Quick congestion detection

Quick and accurate return to minimal

Tricks with credits, etc., can provide stiff backpressure

How do we avoid incorrectly taking the non-minimal route?

Solution idea

Use the rate of change of the queue to provide quick congestion detection and quick return to minimal

Potential advantages:

More accurate representation of network performance

Rapid detection

Potential problems:

Sensitivity to burstiness

Our Work

ROC = 0.99*prev_ROC + 0.01*cur_ROC

Developed two new routing algorithms, each taking the minimal route when its condition holds:

ROC: min_queue_rate < 2*nonmin_queue_rate || min_queue_rate < 0

Combo: old algorithm || min_queue_rate < 0
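The smoothing formula and the two conditions above can be sketched as follows. This is an illustration under stated assumptions, not the authors' Booksim code: `cur_ROC` is taken to be the change in queue occupancy since the previous sample, a true condition is taken to mean "route minimally", and the unspecified "old algorithm" is stood in for by a UGAL-style queue-length comparison. All class and function names here are hypothetical.

```python
from dataclasses import dataclass

ALPHA = 0.01  # weight of the newest sample, from the 0.99 / 0.01 split on the slide


@dataclass
class QueueRate:
    """Tracks a smoothed rate of change (ROC) for one output queue."""
    prev_len: int = 0
    roc: float = 0.0

    def update(self, queue_len: int) -> float:
        # Assumption: the raw ROC is the occupancy change since the last sample.
        cur_roc = queue_len - self.prev_len
        # ROC = 0.99*prev_ROC + 0.01*cur_ROC
        self.roc = (1 - ALPHA) * self.roc + ALPHA * cur_roc
        self.prev_len = queue_len
        return self.roc


def roc_prefers_minimal(min_rate: float, nonmin_rate: float) -> bool:
    # First new algorithm: compare growth rates, or go minimal if that queue is draining.
    return min_rate < 2 * nonmin_rate or min_rate < 0


def old_prefers_minimal(min_queue: int, nonmin_queue: int) -> bool:
    # Assumption: placeholder for the "old algorithm" (queue-length comparison).
    return min_queue <= 2 * nonmin_queue


def combo_prefers_minimal(min_queue: int, nonmin_queue: int, min_rate: float) -> bool:
    # Second new algorithm: old decision, overridden when the minimal queue is draining.
    return old_prefers_minimal(min_queue, nonmin_queue) or min_rate < 0


tracker = QueueRate()
for occupancy in (4, 9, 15, 15, 12):
    tracker.update(occupancy)
print(f"smoothed ROC after 5 samples: {tracker.roc:.4f}")
```

The small weight on the newest sample is what addresses the burstiness concern: a single spike moves the smoothed rate by only 1% of its size, at the price of slower reaction to sustained congestion.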

Results

1024 nodes, 2*p = 2*h = a = 8, injection

Uniform:

2% increase in average latency, 5% increase in max latency for both ROC and Combo

Bad_dragon:

ROC = 69% of original avg. latency, 82% of original max

Combo = 72% of original avg. latency, 90% of original max

Bad Dragon Results

[Bar chart: Ave Latency, Max Latency, and Hops for Original, ROC, and Combo; y-axis 0 to 100]

Simulation Challenge

Booksim's cycle-accurate nature is at odds with simulating our very large system

std::bad_alloc...

Solution: Slicing

Do a fraction of the work and get all of the results!

How do we not include components in our simulation and still effectively simulate the entire network?

Slicing idea 1: Scaledown

A = 8, H = 2

Idea: Relationships

Forget about hotspots for a minute...

Slicing Idea 2: Mirroring

Routing

Mirroring with Hotspots

Results for Different Topologies

p/a/h

p: Endpoints per switch

a: Switches per group

h: Global links per switch

100,000 nodes with “Project Traffic”

Best from 10/32/10 @ 3.0277 Million Cycles

Simulation Results For 13 / 26 / 13

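[Bar chart: Average Latency and Hops for Original, ROC, and Combo; y-axis 90 to 110]

Bar labels, in extraction order: 100, 97, 97.4 (first group); 100, 108.14, 106.43 (second group)

Runtimes, in extraction order: 3,217,516 cycles; 3,209,757 cycles; 3,247,934 cycles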

Simulation Results For 10 / 32 / 10

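[Bar chart: Average Latency and Hops for Original, ROC, and Combo; y-axis 85 to 115]

Bar labels, in extraction order: 100, 99.3, 97.44 (first group); 100, 112.89, 110.9 (second group)

Runtimes, in extraction order: 3,064,421 cycles; 3,027,714 cycles; 3,054,955 cycles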

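Simulation Results for 10 / 32 / 10 with 10 Hotspots

[Bar chart: Average Latency and Hops for Original, ROC, and Combo; y-axis 85 to 115]

Bar labels, in extraction order: 100, 97.37, 98.58 (first group); 100, 113.071, 111.1 (second group)

Runtimes, in extraction order: 3,057,401 cycles; 3,025,221 cycles; 3,063,628 cycles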

Other Simulation Results

16 / 28 / 8:

Runtime 4,130,224

Average Latency 519.74 (too big)

10 / 45 / 5 (half global links)

Runtime 4,190,192

Average latency 528.51

Conclusion

ROC always wins in average latency and runtime cycles.

At a small cost in additional power (4%) over the basic 13 / 26 / 13, we can get higher performance more cheaply with the 10 / 32 / 10 topology.

The simulated hotspot scenario is pessimistic; the numbers are fine.

Questions