silicon photonics for extreme scale computing challenges...

32
Rev PA1 Rev PA1 1 Keren Bergman Department of Electrical Engineering Lightwave Research Lab, Columbia University New York, NY, USA Silicon Photonics for Extreme Scale Computing Challenges & Opportunities

Upload: others

Post on 27-Jun-2020

13 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Silicon Photonics for Extreme Scale Computing Challenges ...press3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · Silicon Photonics for Extreme Scale Computing ... Photonic Computing

Rev PA1Rev PA1 1

Keren BergmanDepartment of Electrical EngineeringLightwave Research Lab, Columbia UniversityNew York, NY, USA

Silicon Photonics for Extreme Scale Computing

–Challenges & Opportunities

Page 2: Silicon Photonics for Extreme Scale Computing Challenges ...press3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · Silicon Photonics for Extreme Scale Computing ... Photonic Computing

Rev PA1Rev PA1 2

Trends in extreme HPC

• Evolution of the top10 in the last six years:– Average total compute power:

• 0.86 PFlops 21 PFlops• ~24x increase

– Average nodal compute power:• 31GFlops 600GFlops• ~19x increase

– Average number of nodes• 28k 35k• ~1.3x increase

Node compute power main contributor to performance growth

2010 2011 2012 2013 2014 2015 20160.1

1

10

100

Node computer power (FLOPs)Number of nodes

Total system compute power (FLOPs)

Average of top 10 sytems, relative to 2010

24x

19x

1.3x

[top500.org, S. Rumley, et al. Optical Interconnects for ExtremeScale Computing Systems, Elsevier PARCO 64, 2017]

Page 3: Silicon Photonics for Extreme Scale Computing Challenges ...press3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · Silicon Photonics for Extreme Scale Computing ... Photonic Computing

Rev PA1Rev PA1 3

Interconnect trends

• Top 10 average node levelevolutions:– Average node compute power:

• 31GFlops 600GFlops• ~19x increase

– Average bandwidth availableper node

• 2.7GB/s 7.8GB/s• ~3.2x increase

– Average byte-per-flop ratio• 0.06 B/Flop 0.01 B/Flop• ~6x decrease• Sunway TaihuLight (#1) shows 0.004 B/Flop !!

Growing gap in interconnect bandwidth

2010 2011 2012 2013 2014 2015 20160.1

1

10

Node computer power (FLOPs)Node bandwidth (bit/s)

Byte-per-flop ratio

Average of top 10 sytems, relative to 2010

19x

3.2x

0.17x

[top500.org, S. Rumley, et al. Optical Interconnects for ExtremeScale Computing Systems, Elsevier PARCO 64, 2017]

Page 4: Silicon Photonics for Extreme Scale Computing Challenges ...press3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · Silicon Photonics for Extreme Scale Computing ... Photonic Computing

Rev PA1Rev PA1 4

• Real Exascale goal: reaching an Exaflop… • …while satisfying constraints (20MW, $200M)• …with reasonably useful applications

• Assume 15% of $ budget for interconnect:– 15% x $200M / 500 Pb/s = 6 ¢/Gb/s– Bi-directional links must thus be sold for ~10 ¢/Gb/s

• Today: optical 10$/Gb/s electrical 0.1-1 $/Gb/s

• Assume 15% of power budget for interconnect:– 15% x 20MW / 125 Pb/s = 24 mW/Gb/s = 24 pJ/bit

= budget for communicating a bit end-to-end

6 pJ/bit per hop 4 pJ/bit for switching today ~20 pJ/bit 2 pJ/bit for transmission today ~10 pJ/bit (elec) ~20 pJ/bit (optical)

Exascale interconnects – power and cost constraints1.25 ExaFLOP

X 0.01 B/FLOP

= 125 Pb/s injection BWX 4 hops

= 500 Pb/s installed BW

Page 5: Silicon Photonics for Extreme Scale Computing Challenges ...press3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · Silicon Photonics for Extreme Scale Computing ... Photonic Computing

5

The Photonic Opportunity for Data Movement Energy efficient, low-latency, high-bandwidth data interconnectivity is the core

challenge to continued scalability across computing platforms Energy consumption completely dominated by costs of data movement

Bandwidth taper from chip to system forces extreme locality

Reduce Energy Consumption Eliminate Bandwidth Taper

10

100

1,000

10,000

100,000

1,000,000

Band

wid

th: G

b/se

c/m

m

Electronic

Photonic

0.01

0.1

1

10

100

1000

Chip Edge PCB Rack System

Data

Mov

emen

t Ene

rgy

(pJ/

bit)

Electronic

Photonic

Page 6: Silicon Photonics for Extreme Scale Computing Challenges ...press3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · Silicon Photonics for Extreme Scale Computing ... Photonic Computing

Rev PA1Rev PA1 6

Default interconnect architecture

Node Router Router Node

Long distance(router-router) link

Short distance(router-router) link

Electrical transceivers

Optical transceivers(currently 10-20% of router-router links)

Router

Node

Short distanceNR (node-router) link

N computenodes in system

Node

Node

Node

One rack

Page 7: Silicon Photonics for Extreme Scale Computing Challenges ...press3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · Silicon Photonics for Extreme Scale Computing ... Photonic Computing

Rev PA1Rev PA1 7

Router

Core

Exascale optical interconnect ?

• Requirements for that to happen: – Divide cost by 1.5 orders of magnitude at least– Improve energy-efficiency by one order of magnitude at least

Node Router

Core

Router

Core

Node

Long distance RR link

Electrical transceivers

Optical transceivers

Node

N computenodes in system

Node

Node

Page 8: Silicon Photonics for Extreme Scale Computing Challenges ...press3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · Silicon Photonics for Extreme Scale Computing ... Photonic Computing

Rev PA1Rev PA1 8

Exascale supercomputing node

• Consider 50k nodes• Total injection bandwidth ~ 100 Pb/s

• 4 ZB per year at 1% utilization• Total cumulated unidirectional bandwidth: ~ 500 Pb/s

Compute power: From 10 to 30 Teraflop (TF)

Interconnect bandwidth:0.01 B/F 0.8 – 2.4 Tb/s

Bulk memory bandwidth: 0.1 B/F 8 - 24 Tb/s

Near memory bandwidth: 10-30 TF x 8bit x 0.5B/F = 40 - 120Tb/s

Requirements for next-generation interconnects!

Ideal case:Back to 0.01B/F toensure well fed nodes

Page 9: Silicon Photonics for Extreme Scale Computing Challenges ...press3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · Silicon Photonics for Extreme Scale Computing ... Photonic Computing

Silicon Photonics: all the parts

• Silicon as core material– High refractive index; high contrast;

sub micron cross-section, small bend radius.

• Small footprint devices– 10 μm – 1 mm scale compared to

cm-level scale for telecom

• Low power consumption– Can reach <1 pJ/bit per link

• Aggressive WDM platform– Bandwidth densities 1-2Tb/s pin IO

Switching

WDM Modulation & Demultiplexing

Modulation

Detection

•Silicon wafer-scale CMOS– Integration, density scaling– CMOS fabrication tools– 2.5D and 3D platforms

S. Rumley et al. "Silicon Photonics for Exascale Systems”, IEEE JLT 33 (4), 2015.

Page 10: Silicon Photonics for Extreme Scale Computing Challenges ...press3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · Silicon Photonics for Extreme Scale Computing ... Photonic Computing

Photonic Computing Architectures: Beyond Wires10

• Leverage dense WDM bandwidth density• Photonic switching• Distance-independent, cut-through, bufferless• Bandwidth-energy optimized interconnects

On chipShort distance PCBLong distance PCBOptical link

Conventional hop-by-hop data movement

Fully flattened end-to-end data movement

12conversions!

No conversion!

Page 11: Silicon Photonics for Extreme Scale Computing Challenges ...press3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · Silicon Photonics for Extreme Scale Computing ... Photonic Computing

Silicon Photonic Link Design

clk

data

TIA

clk

data

TIAclk clk clk

data

R

C

clk

data

TIAclk gen

Silicon waveguide Silicon waveguide

• Co-existence of Electronics and Photonics• Energy-Bandwidth optimization

Clock DistributionSerialization of Data

Driver

Comb Laser

Vertical Grating Coupler

Coupled Waveguides

OOK Modulation

Fiber

Thermal Tuning

Phot

odiode

“0”

“1”

λ

Deserialization

Demultiplexing Filter

Amplifier

[1] M. Bahadori et al. “Comprehensive Design Space Exploration of Silicon Photonic Interconnects," IEEE JLT 34 (12), 2015.

Page 12: Silicon Photonics for Extreme Scale Computing Challenges ...press3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · Silicon Photonics for Extreme Scale Computing ... Photonic Computing

Rev PA1Rev PA1 12

Utilization of Optical Power Budget

Photonic ElementsFrom Electronics To Electronics

Coupler Loss

Sensitivity level of receiver

Demux Array Penalty

Coupler Loss

Coupler Loss

Modulator Array Penalty

Available Laser power

Maximum Available Budget per λMax 5 dBm per λ

Extra Budget

Maximum link utilization

Page 13: Silicon Photonics for Extreme Scale Computing Challenges ...press3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · Silicon Photonics for Extreme Scale Computing ... Photonic Computing

Rev PA1Rev PA1 13

All-Parameter Optimization: Min Energy Design

Photonic ElementsFrom Electronics To Electronics

Optimal design based on physical dimensions Breakdown of consumption for given bandwidth

1 pJ/bit

Page 14: Silicon Photonics for Extreme Scale Computing Challenges ...press3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · Silicon Photonics for Extreme Scale Computing ... Photonic Computing

Rev PA1Rev PA1 14

3 5 10 15 20 25

Optical channel ( ) linerate [Gb/s]

0

1

2

3

Link

ene

rgy

effic

ienc

y [p

J/bi

t]10% Laser eff.

30% Laser eff.

Highly parallel, “SERDES-less” links

• Investigate “many-channel” architectures with low bitrate wavelengths– Leads to poor laser utilization

• Only possible with high laser efficiency– But may allow drastic simplification of drivers and SERDES blocks

400G

Page 15: Silicon Photonics for Extreme Scale Computing Challenges ...press3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · Silicon Photonics for Extreme Scale Computing ... Photonic Computing

Rev PA1Rev PA1 15

0 10 20 300

4

8

12

Cost per bandwidth – declining but slowly

• Today (2017):– 100G (EDR) best

$/Gb/s figure

– Copper cable have shorter reaches due to higher bit-rate

– Optics: Not even ½ order of magnitude price drop over 4 years

– But electrical-optical gap is shrinking

[Besta et al. “Slim Fly: A Cost Effective Low-Diameter Network Topology”,

SuperComputing 2014]

[may 2017]

Mellanox EDR100 Gb/s

Length [m]

Cos

t [$/

Gb/

s]

Page 16: Silicon Photonics for Extreme Scale Computing Challenges ...press3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · Silicon Photonics for Extreme Scale Computing ... Photonic Computing

Rev PA1Rev PA1 16

Packaging and connector drive costs

[Molex] [Barwicz/IBM, OFC 2017]

Page 17: Silicon Photonics for Extreme Scale Computing Challenges ...press3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · Silicon Photonics for Extreme Scale Computing ... Photonic Computing

Rev PA1Rev PA1 17

Toward integrated optical transceivers

• Fundamental step to reduce cost and power of optics: co-integration

Node Router

Core

Router

Core

Node

Long distance RR link

Electrical transceivers

Optical transceivers

Router

Core

Node

Short distanceNR link

Page 18: Silicon Photonics for Extreme Scale Computing Challenges ...press3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · Silicon Photonics for Extreme Scale Computing ... Photonic Computing

Rev PA1Rev PA1 18

Towards integrated, general-purpose optical transceivers

Node Router

Core

Router

Core

Node

Electrical transceivers

Integrated optical transceivers

Router

Core

Node

Short distanceNR link

• Fundamental step to reduce cost and power of optics

Page 19: Silicon Photonics for Extreme Scale Computing Challenges ...press3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · Silicon Photonics for Extreme Scale Computing ... Photonic Computing

Rev PA1Rev PA1 19

Cost vs. energy

• Transceiver procurement cost dominates TCO– Same energy/procurement ratio as Cori (30% / 70%) with 0.4 $/Gb/s

• This is a factor of 10 from the current cost figure

Hypotetical 0.1$/Gb/s transceivers (8 years)

Hypotetical 1$/Gb/s transceivers (8 years)

Typical 100G transceivers (8 years, 2.5W, $500)

Titan (6 years, 8.2 MW, M$97)

Cori (8 years, 3.9 MW, M$100)

Roadrunner (5 years, 2.35 MW, M$100)

0% 20% 40% 60% 80% 100%

4

8

Desktop PC (5 years, 100W, $2000)

Cost of energy over lifetime ($100/MWh)

Procurement cost

Page 20: Silicon Photonics for Extreme Scale Computing Challenges ...press3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · Silicon Photonics for Extreme Scale Computing ... Photonic Computing

Rev PA1Rev PA1 20

The bright side of optical switching• Optical switches can have astonishing aggregate bandwidths

– Absence of signal introspection– Possibility to receive dense WDM signals (> 1 Tb/s) on each port

• Translates into alluring $/Gb/s figures:

Optical switching >10 cheaper than electrical packet routing…

Type # ports Bandwidth per port

Total bandwidth

Price Price per Gb/s

Calient S320Mems switch

320 ports 400Gbps (with 16 wavelengths at 25G –CDAUI-16 signaling)

128 Tb/s $40k 0.3 $/Gb/s

Mellanox SB7790

36 ports 100 Gbps (4xEDR Infiniband)

3.6 Tb/s $12k 3.3 $/Gb/s

Page 21: Silicon Photonics for Extreme Scale Computing Challenges ...press3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · Silicon Photonics for Extreme Scale Computing ... Photonic Computing

Rev PA1Rev PA1 21

Optical switch vs. Electrical packet router

• (Random access) buffering plays a crucial role. – Without buffering, end-to-end scheduling is required – With buffering, scheduling made link-after-link– Without buffering, no back-to-back transmissions

• Packet routers also allow differentiated QoS, error correction, etc. Packet router offers much higher “value” than optical switches

My personal guess: At least 30x more value (for 10ns optical switching time)Optical switches not competing with packet routers for “daily switching”

Optical switch: Bufferless Electrical packet router: Buffered

Page 22: Silicon Photonics for Extreme Scale Computing Challenges ...press3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · Silicon Photonics for Extreme Scale Computing ... Photonic Computing

Rev PA1Rev PA1 22

Our FPGA-Controlled Switch Test-Bed

• 16 high-speed DACs enable test and control of integrated photonic switch circuit

PIN shifter Thermal tuner

3dB coupler

Grating coupler array

FPGA with 16 high-speed DACs/ADCs

On-

chip

Los

s [d

B]

1

2

3

4

5

Path category

B: Bar

C:cross

W: Waveguide

X: Crossing

Page 23: Silicon Photonics for Extreme Scale Computing Challenges ...press3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · Silicon Photonics for Extreme Scale Computing ... Photonic Computing

Rev PA1Rev PA1 23

Transitioning to Novel Modular Architectures…

• Modular architecture and control plane

• Avoids on chip crossings• Fully non-blocking• Path independent insertion loss• Low crosstalk

[Dessislava Nikolova*, David M. Calhoun*, Yang Liu, Sébastien Rumley, Ari Novack, Tom Baehr-Jones, Michael Hochberg, Keren Bergman , Modular architecture for fully non-blocking silicon photonic switch fabric , Nature Microsystems & Nanoengineering 3 (1607) (Jan 2017).]

Page 24: Silicon Photonics for Extreme Scale Computing Challenges ...press3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · Silicon Photonics for Extreme Scale Computing ... Photonic Computing

Rev PA1Rev PA1 24

AIM Datacom 2nd Tapeout Run

• More efforts poured into monolithically-integrated photonic devices

• Novel switch devices and test structures submitted to AIM platform

Device Area

1 4x4x4 λ Space-and-wavelength switch

1.9mm x 2.6mm

2 4x4 Si space switch 1.4mm x 2.3mm

3 4x4 Si/SiN two-layered space switch

1.5mm x 2.3mm

4 2x2 double-gated/single-gated ring switch

0.8mm x 1.4mm

5 Crossing and escalator test structure

0.6mm x 1mm

6 1x2x8 λ MUX with rings 1.2mm x 0.2mm

7 1x2x4 λ MUX with micro-disks

0.6mm x 0.2mm

8 2x2 double-gated MZM switch

3mm x 0.4mm

Page 25: Silicon Photonics for Extreme Scale Computing Challenges ...press3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · Silicon Photonics for Extreme Scale Computing ... Photonic Computing

Rev PA1Rev PA1 25

What optical networks are good at: bandwidth steering• Put your fibers where traffic is• Example: 3D stencil traffic over Dragonfly

– 20 groups, 462 nodes per group, 24x24x16 stencil

• Per workload reconfiguration – avoids switching time overhead• No need for ultra-large radixes – 8x8 is sufficient

[1] K. Wen, et al. “Flexfly: Enabling a Reconfigurable Dragonfly through Silicon Photonics”, SC 2016

Page 26: Silicon Photonics for Extreme Scale Computing Challenges ...press3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · Silicon Photonics for Extreme Scale Computing ... Photonic Computing

Rev PA1Rev PA1 26

GPU3GPU1 GPU2 GPU4

CMP1 CMP2 NIC1 NIC2MEMMEM

MEMMEM

Intra-node bandwidth steering• Emerging architectural concept:

unified interconnect fabric

– AMD’s Infinitiy fabric– HPE’s “The Machine”– Gen-Z

• Highly flexible!

• But involves many hops

– Latency– Cost/power of routers– Many chip-to-chip PHYs

Page 27: Silicon Photonics for Extreme Scale Computing Challenges ...press3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · Silicon Photonics for Extreme Scale Computing ... Photonic Computing

Rev PA1Rev PA1 27

GPU3GPU1 GPU2 GPU4

CMP1 CMP2 NIC1 NIC2MEMMEM

MEMMEM

Intra-node bandwidth steering• Alternative concept: use many low-

radix optical switches– 8x8 realizable with today’s

technology– Tens of switches can be

collocated on a single chip

• Somehow less flexible than the packet routed counterpart

– Not all-to-all– Reconfiguration takes

microseconds

• But transparent for packets– Latency of point-to-point– Energy efficient

Page 28: Silicon Photonics for Extreme Scale Computing Challenges ...press3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · Silicon Photonics for Extreme Scale Computing ... Photonic Computing

Rev PA1Rev PA1 28

GPU3GPU1 GPU2 GPU4

CMP1 CMP2 NIC1 NIC2MEMMEM

MEMMEM

GPU3GPU1 GPU2 GPU4

CMP1 CMP2

NIC1 NIC2

MM

Conventional architecture

MM MM MM

M M

Page 29: Silicon Photonics for Extreme Scale Computing Challenges ...press3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · Silicon Photonics for Extreme Scale Computing ... Photonic Computing

Rev PA1Rev PA1 29

GPU3GPU1 GPU2 GPU4

CMP1 CMP2 NIC1 NIC2MEMMEM

MEMMEM

GPU3GPU1

GPU2 GPU4

CMP1 CMP2

NIC1 NIC2

MEMMEM MEMMEM

GPU centric / CMPs as data-accelerators

Page 30: Silicon Photonics for Extreme Scale Computing Challenges ...press3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · Silicon Photonics for Extreme Scale Computing ... Photonic Computing

Rev PA1Rev PA1 30

Conclusions• Lack of bandwidth is threatening scalability

• For Exascale, need to work on (priority-sorted)– Costs of the optical part

• Automated packaging and testing• Increased integration• Larger market

– Power of electrical part• Packet routers

– Finely cost/performance-optimizedtopologies [1]

• Taper optical bandwidth, but not too much• Get as much as we can from optical cables

– Power of optical part• Energy-wise optimized designs• Improved laser efficiency (technology or “tricks”)• Technological advances

– Costs of electrical part

[1] M.Y. Teh, et al. “Design space exploration of the Dragonfly topology”, Exacomm workshop (best paper), 2017

Page 31: Silicon Photonics for Extreme Scale Computing Challenges ...press3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · Silicon Photonics for Extreme Scale Computing ... Photonic Computing

Rev PA1Rev PA1 31

Conclusions

• All-optical interconnects• Optical switches and packets routers not directly comparable• Bandwidth-steering based architectures should be explored

• Optical switches used in addition to regular routers

GPU3GPU1 GPU2 GPU4

CMP1 CMP2

NIC1 NIC2

MM MM MM MM

M M

GPU3GPU1

GPU2 GPU4

CMP1 CMP2

NIC1 NIC2

MEMMEM MEMMEM

Page 32: Silicon Photonics for Extreme Scale Computing Challenges ...press3.mcs.anl.gov/atpesc/files/2017/08/ATPESC... · Silicon Photonics for Extreme Scale Computing ... Photonic Computing

Rev PA1Rev PA1 32

Thank you!