IntraChip Optical Networks 29 February, 2008
© 2008 IBM Corporation
IntraChip Optical Networks for a Future Supercomputer-on-a-Chip
Jeffrey Kash, IBM Research
Acknowledgements
IBM Research: Yurii Vlasov, Clint Schow, Will Green, Fengnian Xia, Jose Moreira, Eugen Schenfeld, Jose Tierno, Alexander Rylyakov
Columbia University: Keren Bergman, Luca Carloni, Rick Osgood
Cornell University: David Albonesi, Alyssa Apsel, Michal Lipson, Jose Martinez
UC Santa Barbara: Daniel Blumenthal, John Bowers
Outline
Optics in today’s HPCs
Trends in microprocessor design
– Multi-core designs for power efficiency
Vision for future IntraChip Optical Networks (ICON)
– 3D stack of logic, memory, and global optical interconnects
Required devices and processes
– Low power and small footprint
Today’s High Performance Server Clusters: Racks are mainly electrically connected, but going optical
Real systems are 10-100s of server racks and several racks of switches
– Rack-to-rack interconnects (≤100m) now moving to optics
– Interconnects within racks (≤5m) now primarily copper
Over time, optics will increasingly replace copper at shorter and shorter distances
• Backplane and card interconnects (≤1m) after rack-to-rack
– Trend will accelerate as bitrates (in the media) increase and costs come down
• 2.5Gb/s → 5Gb/s → 10Gb/s → 20Gb/s(?)
• Target ~$1/Gb/s
[Photos, arranged from all electrical to all optical:]
– NEC Earth Simulator (during installation): all copper; next gen: optics
– IBM Federation Switch for ASCI Purple (LLNL) (backside of a switch rack): copper (bulk, bend, weight, air cooling) and optical (very organized but more expensive)
– Snap 12 module: 12 Tx or Rx at 2.5Gb/s, placement at back of rack
Electrical Transmission Lines and Optical Waveguides
Cables
[Figure: high-speed copper cabling compared with fiber ribbon; labels include 35 x 35μm, 62.5μm pitch, 400μm]
Connectors
– HM-Zd 10Gbps connector: 40 differential pairs (25mm wide)
– MT fiber ferrule: 48 fibers, extendable to 72 or 96 (7mm wide)
Beyond bitrate, density is a major driver of optics.
But optics must be packaged deep within the system to achieve density improvements
Packaging of Optical Interconnects is Critical
• Better to put optics close to logic rather than at the card edge
– Avoids distortion, power, & cost of electrical link on each end of optical link
– Breaks through pin-count limitation of multi-chip modules (MCMs)
[Figure: link cross-sections for optics on-card (“Good”) and optics on-MCM. Component labels: optical bulkhead connector, ceramic, organic card, opto module, NIC, laser + driver IC, fiber, 1cm flex, 1.7cm traces. Trace-length labels: >12.5cm traces (with or w/o via stubs); ~2cm traces. Annotations: “Operation at 10 Gb/s: equalization required”; “Operation to >15 Gb/s: no equalization required”; “Bandwidth limited by # of pins”.]
Colgan, et al., “Direct integration of dense parallel optical interconnects on a first level package for high-end servers”, ECTC 2005, 55th, pp. 228-233, Vol. 1, 31 May-3 June 2005.
Current architecture: Electronic Packet Switching
Current architecture (electronic switch chips, interconnected by electrical or optical links, in multi-stage networks) works well now
– Scalable BW & application-optimized cost
• Multiple switches in parallel
– Modular building blocks
• Many identical switch chips & links
but is challenging in the future
– Switch chip throughput stresses the hardest aspects of chip design
• I/O & packaging
– Multi-stage networks will require multiple E-O-E conversions
• N-stage Exabyte/s network = N*Exabytes/s of cost and N*Exabytes/s of power
[Photo: central switch racks, Mare Nostrum, Barcelona Supercomputing Center]
Possible new architecture: Optical Circuit Switching
(Optics is not electronics, maybe a different architecture can use it better)
All-Optical Packet Switches are hard
– e.g., IBM/Corning OSMOSIS project
• Expensive, and required complex electrical control network
– No optical memory or optical logic
– Probably not cost-competitive against electronic packet switches, even in 2015-2020
But Optical Circuit Switches (~10 millisecond switching time) are available today
– Several technologies (MEMS, piezo-, thermo-,..)
– Low power
• OCS power essentially zero, compared to electronic switch
• No extra O-E-O conversion
– But require single-mode optics
– In ~2015, with silicon photonics, ~1nsec switching time
• Does 6 orders of magnitude make approach more suitable to general-purpose computing?
Scalable Optical Circuit Switch (OCS)
MEMS-based OCS HW is commercially available (Calient, Glimmerglass,..)
• 20 ms switching time
• <100 Watts
[Figure: OCS concept. Input fiber (one channel shown), 2-axis MEMS mirror, output fibers]
Chip MultiProcessors (CMPs)
IBM Cell, Sun Niagara, Intel Montecito, …
(note that the processors on the chip are not identical)
IBM Cell:
Parameter | Value
Technology process | 90nm SOI with low-κ dielectrics and 8 metal layers of copper interconnect
Chip area | 235mm^2
Number of transistors | ~234M
Operating clock frequency | 4GHz
Power dissipation | ~100W
Percentage of power dissipation due to global interconnect | 30-50%
Intra-chip, inter-core communication bandwidth | 1.024 Tbps, 2Gb/sec/lane (four shared buses, 128 bits data + 64 bits address each)
I/O communication bandwidth | 0.819 Tbps (includes external memory)
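The 1.024 Tbps inter-core figure can be reproduced from the bus parameters quoted for the Cell. The sketch below is only an illustration of that arithmetic; it assumes that the figure counts the 128 data lanes of each bus and not the 64 address lanes, which the slide does not state explicitly. The function name is hypothetical.

```python
# Minimal sketch: aggregate inter-core bandwidth from the quoted bus parameters.
# Assumption (not stated on the slide): only the data lanes are counted.

BUSES = 4                 # four shared buses
DATA_BITS_PER_BUS = 128   # data lanes per bus
LANE_RATE_GBPS = 2.0      # 2 Gb/sec/lane


def aggregate_bandwidth_tbps(buses, data_bits, lane_rate_gbps):
    """Return total data bandwidth in Tb/s for parallel buses of equal width."""
    return buses * data_bits * lane_rate_gbps / 1000.0


if __name__ == "__main__":
    total = aggregate_bandwidth_tbps(BUSES, DATA_BITS_PER_BUS, LANE_RATE_GBPS)
    print(f"{total:.3f} Tbps")
```

Under that assumption the result matches the 1.024 Tbps in the table.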
…but perhaps a hierarchical design of several cores grouped into a supercore will emerge
~2017: Multiple “supercores” on a chip
– Electrical communication within supercore
– Optical communications between supercores
After Moray McLaren, HP Labs
Theme: How to continue to get exponential performance increase over time (Moore’s Law extension) from silicon ICs even though CMOS scaling by itself is no longer enough
BW requirements must scale with System Performance, ~1Byte/FLOP
IBM Cell Processor: 9 processors, ~200GFLOPs
On- and off-chip BW ~100GB/sec (0.5B/FLOP)
Can Si photonics provide this performance increase?
[Figure: performance (log) vs. time (linear). Curve labels: uniprocessor performance; transistors (original Moore’s Law applies here); increased # of processors; communications and architecture (Moore’s Law extension). Milestones: Tera-scale (today), Peta-scale (~2012), Exa-scale (~2017).]
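The bytes-per-FLOP ratio quoted for the Cell follows directly from its bandwidth and compute figures. The sketch below shows that arithmetic and the bandwidth a ~1 Byte/FLOP target would imply; the function names are hypothetical and the inputs are the approximate values from the slide.

```python
# Minimal sketch: communication-to-compute ratio from the approximate Cell figures.

def bytes_per_flop(bandwidth_gb_per_s, gflops):
    """Bytes moved per floating-point operation."""
    return bandwidth_gb_per_s / gflops


def required_bandwidth_gb_per_s(gflops, target_bytes_per_flop=1.0):
    """Bandwidth needed to sustain a target bytes/FLOP ratio."""
    return gflops * target_bytes_per_flop


if __name__ == "__main__":
    print(bytes_per_flop(100.0, 200.0))          # Cell today
    print(required_bandwidth_gb_per_s(200.0))    # GB/s needed at ~1 Byte/FLOP
```

With the slide's numbers, the Cell sits at half of the ~1 Byte/FLOP target, so bandwidth would have to grow faster than compute to close the gap.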
Inter-core communication trends – network on chip
INTEL Polaris 2007 Research Chip: 100 Million Transistors ● 80 cores (tiles) ● 275mm^2
i.e., 3D Integration (why not go to optical plane, too?)
Higher BW and lower power with optics?
Photonics in Multi-Core Processors: Intra-Chip Communications Network
Photonics changes the rules
OPTICS:
• Modulate/receive ultra-high bandwidth data stream once per communication event
• Broadband switch fabric uses very little power
○ Highly scalable
• Off-chip and on-chip can use essentially the same technology
○ Much more off-chip BW available
ELECTRONICS:
• Buffer, receive and re-transmit at every switch
• Off-chip is pin-limited and really power hungry
[Figure: TX and RX blocks along the optical and electronic communication paths]
Integration Concept
3D layer stacking will be prevalent in the 22nm timeframe
Intra-chip optics can take advantage of this technology
Photonics layer (with supporting electrical circuits) more easily integrated with high performance logic and memory layers
Layers can be separately optimized for performance and yield
[Figure: Processor System Stack. Layer labels:]
– Photonic Network Interconnect Plane (includes optical devices, electronic drivers & amplifiers and electronic control network)
– Optical off-chip interconnects
– Memory Plane (three shown)
– BEOL vertical electrical interconnects
– Processor Plane w/ local memory cache
Vision for Silicon Photonics: Intra-Chip Optical Networks
Pack ~36 IBM Cell processor “supercores” on a single ~600mm^2 die in 22nm CMOS
• In each Cell supercore, there are 9 cores (PPE + 8 SPEs)
• 324 processors in one chip
• Power and area dramatically lower than today at comparable clock speeds
• Each supercore is electrically interconnected
• Communication between supercores and off-chip is optical
• BW between supercores is similar to today’s off-Cell BW (i.e., 1-2Tbps per Cell)
Intra-Chip Optical Network: Fundamentally alters the roadmap to scaling high-performance multi-core processors
– Communications sub-system and architecture that leap-frogs equivalent electronic systems
• Use photonics for communications, not logic
• May require new network architecture; not just a point to point replacement of electrical network
– Silicon Nanophotonics: Enormous capacity and fundamentally low power consumption
• Estimate optical network requires 25 Watts vs. 640 Watts for equivalent electrical network
• Off-chip power advantage is even more compelling, by more than an order of magnitude
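The processor count and the on-chip power advantage on this slide are simple products and ratios of the quoted estimates. The sketch below just makes that arithmetic explicit; the constant and function names are hypothetical, and the wattages are the slide's estimates rather than measurements.

```python
# Minimal sketch: core count and network power ratio from the slide's estimates.

SUPERCORES = 36
CORES_PER_SUPERCORE = 1 + 8      # one PPE plus eight SPEs
OPTICAL_NETWORK_W = 25.0         # estimated optical network power
ELECTRICAL_NETWORK_W = 640.0     # estimated equivalent electrical network power


def total_processors(supercores, cores_per_supercore):
    """Number of processor cores on the die."""
    return supercores * cores_per_supercore


def power_advantage(electrical_w, optical_w):
    """How many times less power the optical network is estimated to need."""
    return electrical_w / optical_w


if __name__ == "__main__":
    print(total_processors(SUPERCORES, CORES_PER_SUPERCORE))
    print(power_advantage(ELECTRICAL_NETWORK_W, OPTICAL_NETWORK_W))
```

The on-chip ratio comes out a little over 25x, consistent with the slide's remark that the off-chip advantage, at more than an order of magnitude, is a separate and additional gain.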
Possible On-Chip Optical Network Architecture
Bufferless, deflection-switch based (OCS on a chip)
[Figure: nine nodes labeled PG joined by the photonic network. Legend:]
– Cell Core (on processor plane)
– Gateway to ICON (on processor and photonic plane)
– Thin Electrical Control Network (~1% BW, also sends small messages)
– Photonic Network
– Deflection Switch
On-chip Network Implementation
[Diagram labels: Subsystems; Architecture with Improved Application Performance; Possible Devices (e.g., ring resonator array)]
• Supercore gateway to optical network
– ~2Tbps including over-provisioning and coding
– Combine WDM, TDM, SDM, for example:
• 480Gbps optical channels: 6 λ’s at 80Gbps or 12 λ’s at 40Gbps
• 4 parallel optical channels
• Optical devices
– Metrics determined by system needs
– Ultra-dense: 30x area improvement compared to EPIC program
– CMOS compatible (22nm node)
– Low power
– Functions include: Transmitter, Receiver, Switch, Transport (waveguides, gain)
• Bufferless optical switch network
– Over-provisioned to optimize throughput and latency
– Simple electric control plane with block transfer
– Integration with μProc via 3D layer stacking
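The gateway's ~2Tbps target can be checked against the example multiplexing plan on this slide: wavelengths per channel (WDM) times per-wavelength bitrate (TDM) times parallel channels (SDM). The sketch below is only that arithmetic, with hypothetical function names.

```python
# Minimal sketch: gateway bandwidth from the example WDM/TDM/SDM combination.

def channel_rate_gbps(wavelengths, rate_per_wavelength_gbps):
    """Bitrate of one optical channel carrying several wavelengths."""
    return wavelengths * rate_per_wavelength_gbps


def gateway_rate_gbps(parallel_channels, wavelengths, rate_per_wavelength_gbps):
    """Total gateway bitrate across all parallel optical channels."""
    return parallel_channels * channel_rate_gbps(wavelengths, rate_per_wavelength_gbps)


if __name__ == "__main__":
    # Both example options give the same per-channel rate.
    print(channel_rate_gbps(6, 80), channel_rate_gbps(12, 40))
    # Four such channels in parallel.
    print(gateway_rate_gbps(4, 6, 80))
```

Both wavelength plans give 480Gbps per channel, and four channels give 1920Gbps, which is the ~2Tbps figure once rounded.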
Devices for Implementation
[Figure panels: ultradense waveguides (1μm bend radius); ring resonator; total internal reflection switch; optical gain block; WDM: lattice filter (L), ring resonator (R)]
Ultradense Si waveguides and optical components:
– 90nm and beyond CMOS generations enable:
• ~20x smaller bend radius than today’s EPIC designs
• ~30x smaller area
On-chip modulators: ring resonator or MZI based
2x2 optical deflection switches:
– Broadband (all λ’s switched simultaneously)
– Temperature variation tolerant (~20C)
– MZI, TIR or MMI devices
Integrated InP layers:
– Provide optical gain to overcome network losses
Optical or electronic TDM: mux up to high bitrate from logic with fast modulators/detectors or by on-chip OTDM
Optical WDM mux-demux:
– Rings, MZI, MMI, AWG are choices
Detectors: e.g., integrated WG Ge photodetectors
Supporting electronics: high performance, low power CMOS based drivers, amplifiers, control network/logic
Optical source: off-chip lasers
Device designs and estimated future performance based on work at IBM, Cornell, UC Santa Barbara and Columbia
– Aggressive (but not implausible) performance extrapolation
Off-chip optical coupling
Coupling from fiber to silicon photonic wire
[Figure labels: fiber; polymer waveguide (~3μm by 2μm); Si photonic wire (~500nm by 220nm); ~500nm]
Coupling loss <1dB (with a lensed fiber)
Cornell, IBM, NTT
S. McNab et al., Optics Express 2003 (IBM)
WDM passives
(i) Multimode Interferometer based WDM devices
An imaging device: an input field is reproduced in single or multiple images at periodic intervals along propagation direction
Concept: Soldano et al., J. Lightwave Technol., 13, 615-627, 1995.
F. Xia et al., OFC 2007 (IBM)
[Figure: MMI device with input labeled “In”; 6μm scale]
(i, continued) MMI-MZI based λ demultiplexer
F. Xia et al., OFC 2007 (IBM)
Footprint: ~40μm×130μm (~0.005mm^2)
• 10 times smaller than AWG on same SOI platform
• 100 times smaller than III-V AWG
[Figure: device micrograph, 20μm scale; input λ1, λ2, λ3, λ4 separated onto four outputs λ1, λ2, λ3, λ4; labels “R” and “2μm”]
[Plot: response (dB, 0 to -20) vs. wavelength (nm, 1540 to 1554) for channels 1 to 4]
• As-grown
• No active tuning
• Loss: 3dB
• Pass band: 0.3nm
• Limited by crosstalk
• Crosstalk: -12dB
• Channel spacing: 3.2±0.1nm
• Designed channel spacing: 3.2nm
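To put the demultiplexer numbers in more familiar units, the sketch below converts the quoted footprint to mm^2 and the quoted dB figures to linear power ratios. It is a generic unit conversion, not a model of the device; the function names are hypothetical.

```python
# Minimal sketch: unit conversions for the quoted footprint, loss and crosstalk.

def footprint_mm2(width_um, length_um):
    """Rectangular footprint in mm^2 from dimensions given in micrometres."""
    return (width_um * 1e-3) * (length_um * 1e-3)


def db_to_power_ratio(db):
    """Linear power ratio for a value in dB (negative dB means attenuation)."""
    return 10 ** (db / 10)


if __name__ == "__main__":
    print(f"footprint: {footprint_mm2(40, 130):.4f} mm^2")
    print(f"3dB loss passes {db_to_power_ratio(-3):.2f} of the power")
    print(f"-12dB crosstalk leaks {db_to_power_ratio(-12):.3f} of the power")
```

In linear terms, a 3dB loss means about half the power reaches the output, and -12dB crosstalk means roughly 6% of a neighbouring channel's power leaks in.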
(ii) WDM based on detuned ring resonators
F. Xia et al., OFC 2007 (IBM)
[Figure: device micrograph, 20μm scale; input λ1, λ2, λ3, λ4 dropped onto four outputs λ1, λ2, λ3, λ4]
[Plot: relative response (dB, 0 to -30) vs. wavelength (nm, 1542 to 1556) for Channel #1 to Channel #4]
• Crosstalk < -20dB
• Limited transmission bandwidth
• Designed channel spacing: 3.2nm
• Experimental value: 2.2nm to 3.1nm
• Difference in resonance wavelength is due to different perimeters of the resonators
• Δλ=3.2nm ↔ ΔL=180nm
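The link between perimeter change and channel spacing can be sketched with the textbook ring resonance condition m·λ = n_eff·L at fixed order m, which to first order gives Δλ ≈ λ·(n_eff/n_g)·ΔL/L. This is an assumption brought in for illustration, not a formula from the slides. The 1550nm centre wavelength is read off the plot range, and the n_eff/n_g ratio is a hypothetical parameter that defaults to 1 (dispersion ignored), so the implied perimeter is an order-of-magnitude check only.

```python
# Rough sketch: first-order ring detuning, delta_lambda ~ lambda * (n_eff/n_g) * dL / L.
# Assumed values (not from the slides): 1550 nm centre wavelength, n_eff/n_g = 1.

def wavelength_shift_nm(delta_perimeter_nm, perimeter_nm,
                        wavelength_nm=1550.0, neff_over_ng=1.0):
    """Resonance shift produced by a small change in ring perimeter."""
    return wavelength_nm * neff_over_ng * delta_perimeter_nm / perimeter_nm


def implied_perimeter_nm(delta_lambda_nm, delta_perimeter_nm,
                         wavelength_nm=1550.0, neff_over_ng=1.0):
    """Ring perimeter for which delta_perimeter_nm yields delta_lambda_nm."""
    return wavelength_nm * neff_over_ng * delta_perimeter_nm / delta_lambda_nm


if __name__ == "__main__":
    perimeter = implied_perimeter_nm(3.2, 180.0)
    print(f"implied perimeter: {perimeter / 1000:.1f} um")
```

A real silicon wire has a group index noticeably larger than its effective index, so passing a ratio below 1 would shrink the implied perimeter accordingly.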
Fast modulators
Silicon Ring Modulator at 12.5 Gb/s
Xu, Schmidt, and Lipson, Nature, 2005 (Cornell)
[Figure: ring and waveguide, with a 0.2µm label; measurement with PRBS 2^10-1]
>9dB modulation depth
Appears extendable to 40Gb/s
Broadband Optical Deflection Switches
(all wavelengths simultaneously deflected)
Broadband ring-resonator switch
ON state:
– carrier injection → coupling into ring → signal switched
OFF state:
– passive waveguide crossover
– negligible power
[Figure: switches in the OFF and ON states, each with a CMOS driver]
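The ON/OFF behaviour described here can be captured in a very small routing model: an OFF switch acts as a passive crossover (light continues straight through), and an ON switch deflects it to the other output. The sketch below is a hypothetical abstraction with invented port numbering, not the device or network design from the talk. It shows how a control network could set a path through a chain of such switches with no buffering.

```python
# Minimal sketch of a 2x2 deflection switch as a routing element.
# Port numbering (0 and 1) and function names are hypothetical.

def route(input_port, switch_on):
    """Return the output port for light entering a 2x2 switch.

    OFF: passive crossover, light keeps its port index (straight through).
    ON:  light is deflected to the other port.
    """
    if input_port not in (0, 1):
        raise ValueError("input_port must be 0 or 1")
    return 1 - input_port if switch_on else input_port


def trace_path(input_port, switch_states):
    """Follow light through a chain of switches; returns the final port."""
    port = input_port
    for on in switch_states:
        port = route(port, on)
    return port


if __name__ == "__main__":
    # Only switches in the ON state draw power in this picture.
    print(trace_path(0, [False, True, False]))
```

Because the switch is broadband, every wavelength on the waveguide follows the same path, so one routing decision applies to the whole WDM stream.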
Broadband, thermally stable deflection switch from multiple resonators
(Multiple rings broaden the passband for thermal stability)
Xia, et al., CLEO 2007; Green, et al., OFC 2008 (IBM)
[Figure: multi-ring switch with IN, THRU and DROP ports; wavelengths λ2, λ3, λ4 marked; apodization flattens the passband]
[Plot: Switch Performance (multiple λs). Transmission (dBm, 0 to -35) vs. wavelength detuning (nm, -6 to 6) for a 5-ring device (w/ apodization) and a 4-ring device (w/o apodization); 2.5nm marked]
Photodetectors
Ge-on-SOI Detector Design
Dehlinger, et al., PTL, 2004 and Schow et al., PTL, 2006 (IBM)
• Lateral PIN design, direct Ge growth on thin SOI.
• Dimensions: Wi = 300 nm, Wm = 200 nm, S = 0.4 – 1.0 μm, tGe = 350 nm
[Figure: detector cross-section. Labels: Ti/Al; n+ and p+ regions; Ge; Si; SiO2; dimensions S, Wi, Wm, tGe]
• Design Features:
– Epitaxy using UHV-CVD
– Buried oxide isolates carriers generated in substrate
− Eliminates low-frequency “tail”
− 20GHz bandwidth
– Lateral p-i-n design for low capacitance/dark current
Adiabatically coupled, high bandwidth waveguide Ge photodiodes on SOI substrate
On-chip optics requires waveguide geometry (e.g., Luxtera, announced)
[Figure labels: light in Si waveguide; Si strip waveguide; oxide layer; Ge adiabatic taper; L taper; L diode]
On-chip gain: Evanescent Optical Amplifiers
Park, Bowers, et al., PTL 19, p. 210 (2007) (UCSB)
Intrachip networks have many nodes. Network size is limited by loss in waveguides, switches, and waveguide crossings
Solution: Silicon evanescent optical amplifiers (III-V gain medium)
Initial results:
– Device dimensions: H = 0.7 um, W = 2 um, L = 1.36 mm
– Amplifier gain: 13 dB
• Evanescent design allows higher saturation output powers than conventional III-V amp
• Heating effects can be minimized with package design
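The role of on-chip gain in extending network size can be illustrated with a simple loss budget: one amplifier's gain in dB, divided by the loss per network element, gives the number of elements it can compensate. The 13 dB gain and 1.36 mm length are from the slide. The per-element loss used below is a purely hypothetical placeholder, since the slides do not quote one.

```python
# Minimal sketch: loss budget covered by one amplifier.
# loss_per_element_db is a hypothetical placeholder, not a value from the slides.
import math


def db_to_linear(gain_db):
    """Linear power gain for a gain expressed in dB."""
    return 10 ** (gain_db / 10)


def gain_per_mm(gain_db, length_mm):
    """Average gain per unit length of the amplifier, in dB/mm."""
    return gain_db / length_mm


def elements_covered(gain_db, loss_per_element_db):
    """Whole number of lossy elements whose total loss the gain offsets."""
    return math.floor(gain_db / loss_per_element_db)


if __name__ == "__main__":
    print(f"linear gain: {db_to_linear(13.0):.1f}x")
    print(f"gain density: {gain_per_mm(13.0, 1.36):.2f} dB/mm")
    print(elements_covered(13.0, loss_per_element_db=0.5))  # hypothetical 0.5 dB each
```

The useful number for network sizing is the last one, and it scales inversely with whatever per-switch and per-crossing losses the final devices achieve.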
3D Integration
Development of 3DI Process for Si Photonics
Currently developing 3DI processes based upon both Cu-Cu compression and oxide fusion bonding.
[Figure panels: oxide bonding (SOI, face-to-back); Cu-Cu bonding (bulk, face-to-back)]
Additional process technology development will be necessary to adapt each approach for photonics integration. Challenges include:
– Photonic devices and off-chip optical coupling compatible with 3D integration
– Thermal management
Major Challenges
Achieving required device performance, in particular for:
– Optical bandwidth of modulators and switches
– High per-channel bitrates from direct modulation
– Device density
– WDM stability against temperature variations
Low power operation
Integration and path to manufacturability
– InP with Si
– Electronic support circuits with Si nanophotonics devices
– Compatibility with 3D layer stacking technologies and CMOS processing
Network performance: demonstrate system/application advantages
– Low latency
– High throughput (avoid congestion)
Summary
Multi-core uProcessor architectures are emerging as a key concept to provide power efficient high performance computing capability
– On-chip optical network can overcome the intra-chip and off-chip communications power bottleneck to scaling these architectures
An on-chip optical network is not just a point to point replacement of electrical network
– No optical logic or buffers, so not packet switched
Silicon Nanophotonics can provide the enormous capacity and fundamentally low power consumption required for future multi-core microprocessors
– Work on required devices and demonstration of the utility of the architecture is just beginning
[Figure label: Optical I/O]