
Design of Scalable Network Considering Diameter and Cable Delay

Kentaro Sano

Tohoku University, JAPAN

Design of Scalable Network, K Sano

Agenda

Introduction

Assumption

Preliminary evaluation & candidate networks

Cable length and delay

Simulator & emulator

Summary


Introduction

Feasibility study: 2012-2014

3 teams working for next-gen supercomputers

Tohoku-NEC-JAMSTEC team

Working group for interconnection network subsystem

[Figure: team organization; labels: Osaka University, Tohoku University, W.G. for interconnection subsystem]

Background and Objective

More nodes with higher performance

Requiring high performance and scalable network

Application demands

Global/collective communication

Local communication (ex: p2p w/ 3D decomposition)

Usability, performance robustness

Scalability

Goal: find NW for next-gen supercomputers

Exploring design space with application demands and technology constraints

Small-diameter NW using high-radix SWs, which is also good at local p2p communication

Performance, cost, power, usability, reliability

Assumption for Design Space Exploration

System scale

~65536 SMP nodes

Technology

~64x64 full crossbar switch

10 GB/s or more per link

[Figure: 64 x 64 switch (virtual cut-through with virtual channels): input queues 1 to 64, full crossbar, output buffers]

[Figure: System overview: nodes 1 to 65536 connected by n network planes (for SMP), each plane a fat tree or hybrid NW]

[Figure: IB technology roadmap]

Preliminary Evaluation

Typical topologies

Full fat tree

3D / 5D torus

Dragonfly

[Figure: example topologies built from nodes (N) and switches (SW): full fat tree, n-D torus, Dragonfly]

Comparison of Topologies

Too large diameter for low-D torus

Too many links for high-D torus / dragonfly

Fat tree looks good, but long cables?

Topology | Full fat-tree | 3D Torus | 5D Torus | Dragonfly
Nodes | 65,536 | 65,536 | 65,536 | 65,536
Organization | 3 stages, 64 x 32 x 32 | 64 x 32 x 32 | 16 x 8 x 8 x 8 x 8 | all-to-all (1D 16, 2D 16 x 16)
Node injection BW [GB/s] | 10 (all topologies)
Bisection BW [TB/s] | 320 | 20 | 80 | 160
Min to max hops | 2 to 6 | 1 to 63 | 1 to 23 | 2 to 5
Min to max delay [ns] (no cable delay considered) | 100 to 500 | 100 to 6300 | 100 to 2300 | 100 to 400
Links | 196,608 | 196,608 | 1,310,720 | 468,736
Switches | 5,120 | within nodes | within nodes | 4,096
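The fat-tree and torus entries of the comparison can be cross-checked with a few lines of arithmetic. The sketch below is not from the slides: the function names are hypothetical, it assumes the standard full fat-tree construction from k-port switches, it counts one direction of each link crossing the bisection, and it takes 1 TB = 1024 GB, an assumption chosen because it reproduces the tabulated values.

```python
from math import prod

LINK_BW_GB_S = 10   # GB/s per link, from the assumption slide
GB_PER_TB = 1024    # assumption: binary TB, which matches the table


def fat_tree_size(radix: int) -> dict:
    """Size of a full 3-stage fat tree built from radix-port switches."""
    half = radix // 2
    nodes = half * half * radix   # k^3 / 4
    switches = 5 * half * half    # k^2/2 leaf + k^2/2 second stage + k^2/4 spine
    links = 3 * nodes             # three link layers, each with `nodes` links
    return {"nodes": nodes, "switches": switches, "links": links}


def fat_tree_bisection_tb_s(nodes: int) -> float:
    """Full bisection: half of the nodes send at link rate across the cut."""
    return nodes // 2 * LINK_BW_GB_S / GB_PER_TB


def torus_bisection_tb_s(dims: tuple) -> float:
    """Halve the longest ring: the wraparound gives two cut planes."""
    cross_section = prod(dims) // max(dims)
    return 2 * cross_section * LINK_BW_GB_S / GB_PER_TB


print(fat_tree_size(64))
print(fat_tree_bisection_tb_s(65536))
print(torus_bisection_tb_s((64, 32, 32)))
print(torus_bisection_tb_s((16, 8, 8, 8, 8)))
```

With radix 64 this gives 65,536 nodes, 5,120 switches, and 196,608 links for the fat tree, and bisection bandwidths of 320, 20, and 80 TB/s for the fat tree, 3D torus, and 5D torus. The Dragonfly column and the link counts of the tori are not modeled here.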


Full Fat Tree

Small diameter, but big latency via spine SWs

Max # of hops is limited especially with high-radix SWs.

Cable length grows with # of nodes.

[Figure: full fat tree: 32 nodes per leaf SW, two stages of 32 SWs per island, 1024 nodes per island, 64 islands (65536 nodes) joined by spine SWs; 64 links per SW at 10 GB/s per link; max 6 hops]

Another Candidate: FTT Hybrid

Hierarchical network

Local fat tree (group)

256 nodes

2-stage fat tree

Only short cables in a small fat tree

Global 2D torus

16x16 of 256-node groups

Short cables to connect adjacent groups

512 links between groups

Expected advantages

Shorter cables

Expandable & flexible

[Figure: FTT (Fat Tree & Torus): the global NW is a 2D torus of 16x16 groups (G); each group is a local fat tree of 256 nodes; numeric labels 128, 128, 128 on the figure]
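The group-level structure of the FTT hybrid can be sketched as a small model. This is an illustration only, with hypothetical function names. It counts pairs of adjacent groups in the 16x16 torus, which comes out to 512; whether the slide's "512 links between groups" refers to this count or to physical cables per group is an assumption.

```python
GROUPS_X = GROUPS_Y = 16   # global 2D torus of 16x16 groups
NODES_PER_GROUP = 256      # local 2-stage fat tree


def torus_neighbors(gx: int, gy: int) -> list:
    """The four groups adjacent to group (gx, gy), with wraparound."""
    return [
        ((gx + 1) % GROUPS_X, gy),
        ((gx - 1) % GROUPS_X, gy),
        (gx, (gy + 1) % GROUPS_Y),
        (gx, (gy - 1) % GROUPS_Y),
    ]


def adjacent_group_pairs() -> set:
    """Every unordered pair of adjacent groups, counted once."""
    pairs = set()
    for gx in range(GROUPS_X):
        for gy in range(GROUPS_Y):
            for neighbor in torus_neighbors(gx, gy):
                pairs.add(frozenset([(gx, gy), neighbor]))
    return pairs


def torus_hops(a: tuple, b: tuple) -> int:
    """Minimal number of group-to-group hops between groups a and b."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    return min(dx, GROUPS_X - dx) + min(dy, GROUPS_Y - dy)


total_nodes = GROUPS_X * GROUPS_Y * NODES_PER_GROUP
max_hops = max(
    torus_hops((0, 0), (x, y)) for x in range(GROUPS_X) for y in range(GROUPS_Y)
)
print(total_nodes, len(adjacent_group_pairs()), max_hops)
```

The model gives 65,536 nodes in total and a worst case of 16 group-to-group hops in the torus.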


Comparison Summary

Detailed & quantitative evaluation

Full fat tree and FTT hybrid

Consider more details about implementation & apps

Topology | Features | Diameter | # of links | Note
Fat tree | General-purpose, high usability | ◎ | ○ | High cable delay?
Low-D torus | - | × | ○ | -
High-D torus | - | ○ | × | -
Dragonfly | Pseudo high-radix NW | ◎ | × | -
FTT-hybrid | Combination of fat tree and torus | ○ | ○ | Low cable delay? Good cost performance, extendability

(◎ excellent, ○ good, × poor)

Cable Length and Delay

Preliminary estimation based on expected implementation

Boards (node, switch)

Cabinets (node, switch)

Floor layout

Cabling

[Figure: FTT-hybrid layout example: cabinets C0, C1, C2, C3]

Preliminary Result

[Figure: Fat tree (max 6 hops): nodes A, B, D, E under stage 1, stage 2, and stage 3 (spine) switches. Cables: node to stage 1, 2 m (10 ns); stage 1 to stage 2, 20 m (100 ns); stage 2 to spine, 80 m (400 ns). Percentages shown for the three stages: 0.05 %, 1.5 %, 98.4 %]

[Figure: FTT-hybrid (max 20 hops): nodes A, B, D, E under two switch stages plus the 2D torus. Cables: node to stage 1, 2 m (10 ns); stage 1 to stage 2, 10 m (50 ns); between groups, 15 m (75 ns); 1 to 16 hops in the 2D torus. Percentages shown: 0.05 %, 0.33 %, 99.6 %]

No big difference in Max cable delay

Fat tree = 1020ns + (5 SW-delay)

Hybrid = 1395ns + (19 SW-delay)

Hybrid can have shorter delay for local p2p communication.
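The cable-delay figures follow from summing per-segment delays along a path. The sketch below is a minimal illustration with hypothetical names; it assumes 5 ns per meter, the ratio implied by the cable labels (2 m at 10 ns, 20 m at 100 ns, 80 m at 400 ns), and it assumes the segment order written in the comments.

```python
NS_PER_M = 5  # implied by the cable labels: 2 m -> 10 ns, 80 m -> 400 ns


def cable_delay_ns(segments_m: list) -> int:
    """Total cable delay of a path given its cable lengths in meters."""
    return sum(NS_PER_M * length for length in segments_m)


def switches_on_path(segments_m: list) -> int:
    """A path of n cables between two nodes crosses n - 1 switches."""
    return len(segments_m) - 1


# Fat tree worst case: node, stage 1, stage 2, spine, stage 2, stage 1, node
FAT_TREE_MAX = [2, 20, 80, 80, 20, 2]

# FTT-hybrid inside one group: node, stage 1, stage 2, stage 1, node
HYBRID_LOCAL = [2, 10, 10, 2]

for name, path in [("fat tree max", FAT_TREE_MAX), ("hybrid local", HYBRID_LOCAL)]:
    print(name, cable_delay_ns(path), "ns +", switches_on_path(path), "SW-delay")
```

The fat-tree worst case evaluates to 1020 ns over 5 switches, and a path that stays inside one hybrid group to 120 ns over 3 switches.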


Example of 3D Mesh Communication

3D decomposition and adjacent communication

Data exchange among 3D subgrids

[Figure: 3D subgrid axes x, y, z mapped onto the global 2D torus of 16x16 groups; labels: x, y, and "z (x & y can be assigned)"]

Latency (4 hops)

= 195ns + (4 SW-delay) : x, y

= 120ns + (3 SW-delay) : z

Much shorter than Fat tree

= 1020ns + (5 SW-delay) : x, y, z
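To compare these expressions numerically, a value for the per-switch delay is needed. The sketch below plugs in 100 ns, which is an assumption borrowed from the per-hop delay used in the preliminary topology comparison, not a measured switch latency; the names are hypothetical.

```python
from typing import NamedTuple


class Path(NamedTuple):
    cable_ns: int   # total cable delay from the slide
    switches: int   # number of switch traversals from the slide


PATHS = {
    "FTT-hybrid x, y": Path(195, 4),
    "FTT-hybrid z": Path(120, 3),
    "Fat tree x, y, z": Path(1020, 5),
}


def latency_ns(path: Path, sw_delay_ns: int) -> int:
    """Neighbor-to-neighbor latency: cable delay plus per-switch delay."""
    return path.cable_ns + path.switches * sw_delay_ns


SW_DELAY_NS = 100  # assumption, not taken from this slide

for name, path in PATHS.items():
    print(f"{name}: {latency_ns(path, SW_DELAY_NS)} ns")
```

Under that assumption the hybrid needs 595 ns (x, y) and 420 ns (z) per neighbor exchange, against 1520 ns for the fat tree. The cable-delay gap of 825 ns or more stays fixed while the switch-count gap scales with the chosen switch delay.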


Quantitative Evaluation (On-going)

Software simulator (OPNET-based)

Purpose

Get rough results quickly

Validate collective comm.

Rough SW model

Simple arbitration

No back pressure

Limited NW size

~8192 nodes

Hardware emulator

FPGA-based emulator

Obtain detailed results

Cycle accurate model

Real arbitration, flit-level transmission, back pressure

Large NW : ~65536 nodes

[Figure: Switch structure and delay model: input ports 0 to 63, routing, switching, output ports 0 to 63. Switch delay components: routing delay, switching delay, transferring delay, buffering delay. Tx & Rx delay, with Rx delay given by the sending SW]

Hardware Emulator Overview

FPGA cluster

4 x host PCs

4 x FPGAs / PC

4 x 10G SFP+ ports / FPGA

Implementation

SW for nodes (on Linux)

HW for switches (on FPGA)

[Figure: Node of FPGA cluster: host PC with 4 x FPGA boards (DE5-NET, ALTERA Stratix V FPGA 5SGXEA7N2F45C2); other nodes not installed yet]

Per FPGA board:

QDR II+ SRAM A to D: 18 Mbits each (20-bit addressing for 18-bit data), x18 @ 500 MHz, 1 GB/s for read/write

DDR3 DRAM A and B: PC3-12800 (DDR3-1600), 2 GB as default (up to 8 GB), x64 @ 800 MHz (DDR), up to 1066 MHz, 12.8 GB/s each

10G SFP+ A to D (10G Ether): 10 Gbps+ each (Tx, Rx)

PCI-Express: PCIe 3.0 x8, 8 GB/s (Tx, Rx)

Hardware Emulator Overview

[Photo: node of FPGA cluster with 4 x FPGA boards, SFP+ 10GbE ports, and a 64-port 10GbE switch; other nodes not installed yet]

Summary

Design space exploration for small diameter NWs with high-radix switches

Technology constraint

Application demands

global and local-p2p communication

Two candidates after topology comparison

Full fat tree & FTT-hybrid

Preliminary evaluation for cable length & delay

Future (on-going) work

Quantitative evaluation with simulation & emulation

Application performance estimation


Thank you!