7 joshi presentation

Upload: ajit-ashok-ninjazz

Post on 07-Apr-2018

234 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/3/2019 7 Joshi Presentation

    1/40

    Building manycore processor-to-DRAMnetworks using monolithic silicon photonics

    Ajay Joshi , Christopher Batten , Vladimir Stojanovi , Krste Asanovi

    MIT, 77 Massachusetts Ave, Cambridge MA 02139 UC Berkeley, 430 Soda Hall, MC #1776, Berkeley, CA 94720{joshi, cbatten, vlada}@mit.edu, [email protected]

    High Performance Embedded Computing(HPEC) Workshop

    23-25 September 2008

  • 8/3/2019 7 Joshi Presentation

    2/40

    MIT/UCB

    Manycore systems design space

  • 8/3/2019 7 Joshi Presentation

    3/40

    MIT/UCB

    Manycore system bandwidth requirements

  • 8/3/2019 7 Joshi Presentation

    4/40

    MIT/UCB

    Manycore systems bandwidth, pin count and power scaling

    4

    1 Byte/Flop,8 Flops/core

    @ 5GHz S e r

    v e r & H P

    C

    M o b ile C lie n t

  • 8/3/2019 7 Joshi Presentation

    5/40

    MIT/UCB

    Interconnect bottlenecks

    CPU

    Cache

    D R A M

    D I M M

    Manycore system

    cores

    Cache

    D R A M

    D I M M

    Cache

    D R A M

    D I M M

    CPU CPU

    InterconnectNetwork

    Interconnect

    Network

    Bottlenecks dueto energy and

    bandwidthdensity limitations

  • 8/3/2019 7 Joshi Presentation

    6/40

  • 8/3/2019 7 Joshi Presentation

    7/40 MIT/UCB

    OutlineMotivation

    Monolithic silicon photonic technologyProcessor-memory network architecture explorationManycore system using silicon photonics

    Conclusion

  • 8/3/2019 7 Joshi Presentation

    8/40 MIT/UCB

    Unified on-chip/off-chip photonic link

    Supports dense wavelength-division multiplexingthat improves bandwidth densityUses monolithic integration that reduces energyconsumptionUtilizes the standard bulk CMOS flow

  • 8/3/2019 7 Joshi Presentation

    9/40 MIT/UCB

    Optical link components

    65 nm bulk CMOS chip designed to test various optical devices

  • 8/3/2019 7 Joshi Presentation

    10/40 MIT/UCB

    Silicon photonics area and energy advantage

    Metric Energy(pJ/b)

    Bandwidth density(Gb/s/)

    Global on-chip photonic link 0.25 160-320

    Global on-chip optimally repeated electrical link 1 5

    Off-chip photonic link (50 coupler pitch) 0.25 13-26Off-chip electrical SERDES (100 pitch) 5 0.1

    On-chip/off-chip seamless photonic link 0.25

  • 8/3/2019 7 Joshi Presentation

    11/40 MIT/UCB

    OutlineMotivation

    Monolithic silicon photonic technologyProcessor-memory network architecture exploration

    Baseline electrical mesh topology

    Electrical mesh with optical global crossbar topologyManycore system using silicon photonicsConclusion

  • 8/3/2019 7 Joshi Presentation

    12/40 MIT/UCB

    Baseline electrical system architecture

    Access point per DM distributed across the chip

    Two on-chip electrical mesh networksRequest path core access point DRAM moduleResponse path DRAM module access point core

    Mesh physical view Mesh logical view

    C = core, DM = DRAM module

  • 8/3/2019 7 Joshi Presentation

    13/40 MIT/UCB

    Interconnect network design methodology

    Ideal throughput and zero load latency used as

    design metricsEnergy constrained approach is adoptedEnergy components in a network

    Mesh energy ( E m ) (router-to-router links (RRL), routers)IO energy ( E io ) (logic-to-memory links (LML))

    F l i t w

    i d t h

    Calculate on-chipRRL energy

    Calculate on-chiprouter energy

    Calculate mesh

    throughput

    Calculate totalmesh energy Calculate energybudget for LML

    Total energy budget

    Calculate LMLwidth

    Calculate I/O

    throughput

    Calculate zero

    load latency

  • 8/3/2019 7 Joshi Presentation

    14/40

    MIT/UCB

    Network throughput and zero load latency

    System throughput limited by on-chip mesh or I/O linksOn-chip mesh could be over-provisioned to overcome meshbottleneckZero load latency limited by data serialization

    (22nm tech, 256 cores @ 2.5 GHz, 8 nJ/cyc energy budget)

  • 8/3/2019 7 Joshi Presentation

    15/40

    MIT/UCB

    Network throughput and zero load latency

    System throughput limited by on-chip mesh or I/O linksOn-chip mesh could be over-provisioned to overcome meshbottleneckZero load latency limited by data serialization

    (22nm tech, 256 cores @ 2.5 GHz, 8 nJ/cyc energy budget)

  • 8/3/2019 7 Joshi Presentation

    16/40

    MIT/UCB

    Network throughput and zero load latency

    System throughput limited by on-chip mesh or I/O linksOn-chip mesh could be over-provisioned to overcome meshbottleneckZero load latency limited by data serialization

    (22nm tech, 256 cores @ 2.5 GHz, 8 nJ/cyc energy budget)

    O P F : 1

    O P F : 2

    O P F : 4

  • 8/3/2019 7 Joshi Presentation

    17/40

    MIT/UCB

    Network throughput and zero load latency

    System throughput limited by on-chip mesh or I/O linksOn-chip mesh could be over-provisioned to overcome meshbottleneckZero load latency limited by data serialization

    On-chip

    serialization

    Off-chipserialization

    (22nm tech, 256 cores @ 2.5 GHz, 8 nJ/cyc energy budget)

    O P F : 1

    O P F : 2

    O P F : 4

  • 8/3/2019 7 Joshi Presentation

    18/40

    MIT/UCB

    OutlineMotivation

    Monolithic silicon photonic technologyProcessor-memory network architecture exploration

    Baseline electrical mesh topology

    Electrical mesh with optical global crossbar topologyManycore system using silicon photonicsConclusion

  • 8/3/2019 7 Joshi Presentation

    19/40

    MIT/UCB

    Optical system architecture

    Off-chip electrical links replaced with optical linksElectrical to optical conversion at access pointWavelengths in each optical link distributedacross various core-DRAM module pairs

    Mesh physical view Mesh logical view

    C = core, DM = DRAM module

  • 8/3/2019 7 Joshi Presentation

    20/40

    MIT/UCB

    Network throughput and zero load latency

    Reduced I/O cost improvessystem bandwidthReduction in latency due to

    lower serialization latencyOn-chip network is the newbottleneck

  • 8/3/2019 7 Joshi Presentation

    21/40

    MIT/UCB

    Network throughput and zero load latency

    Reduced I/O cost improvessystem bandwidthReduction in latency due to

    lower serialization latencyOn-chip network is the newbottleneck

  • 8/3/2019 7 Joshi Presentation

    22/40

    MIT/UCB

    Optical multi-group system architecture

    Break the single on-chip electrical mesh into several groupsEach group has its own smaller meshEach group still has one AP for each DMMore APs each AP is narrower (uses less s)

    Use optical network as a very efficient global crossbar Need a crossbar switch at the memory for arbitration

    Ci = core in group i , DM = DRAM module, S = global crossbar switch

  • 8/3/2019 7 Joshi Presentation

    23/40

    MIT/UCB

    Network throughput vs zero load latency

    Grouping moves traffic

    from energy-inefficientmesh channels to energy-efficient photonicchannelsGrouping and siliconphotonics provides 10x-15x throughputimprovementGrouping reduces ZLL inphotonic range, butincreases ZLL in electricalrange

    A

    B

    1 0 x - 1 5 x

  • 8/3/2019 7 Joshi Presentation

    24/40

    MIT/UCB

    Simulation results

    Grouping2x improvement in bandwidth at comparable latency

    Overprovisioning2x-3x improvement in bandwidth for small group count atcomparable latency

    Minimal improvement for large group count

    256 cores,16 DM

    Uniform random traffic

    256 cores,16 DM

    Uniform randomtraffic

  • 8/3/2019 7 Joshi Presentation

    25/40

    MIT/UCB

    Simulation results

    Replacing off-chip electrical with photonics (Eg1x4 Og1x4)2x improvement in bandwidth at comparable latencyUsing opto-electrical global crossbar (Eg4x2 Og16x1)

    8x-10x improvement in bandwidth at comparable latency

    256 cores,16 DM

    Uniform randomtraffic

    256 cores

    16 DMUniformrandomtraffic

  • 8/3/2019 7 Joshi Presentation

    26/40

    MIT/UCB

    OutlineMotivation

    Monolithic silicon photonic technologyProcessor-memory network architecture explorationManycore system using silicon photonics

    Conclusion

  • 8/3/2019 7 Joshi Presentation

    27/40

    MIT/UCB

    Simplified 16-core system design

  • 8/3/2019 7 Joshi Presentation

    28/40

    MIT/UCB

    Simplified 16-core system design

  • 8/3/2019 7 Joshi Presentation

    29/40

    MIT/UCB

    Simplified 16-core system design

  • 8/3/2019 7 Joshi Presentation

    30/40

    MIT/UCB

    Simplified 16-core system design

  • 8/3/2019 7 Joshi Presentation

    31/40

    MIT/UCB

    Simplified 16-core system design

  • 8/3/2019 7 Joshi Presentation

    32/40

    MIT/UCB

    Full 256-core system design

  • 8/3/2019 7 Joshi Presentation

    33/40

    MIT/UCB

    OutlineMotivation

    Monolithic silicon photonic technologyProcessor-memory network architecture explorationManycore system using silicon photonics

    Conclusion

  • 8/3/2019 7 Joshi Presentation

    34/40

    MIT/UCB

    ConclusionOn-chip network design and memory bandwidth will

    limit manycore system performanceUnified on-chip/off-chip photonic link is proposed tosolve this problemGrouping with optical global crossbar improvessystem throughputFor an energy-constrained approach, photonicsprovide 8-10x improvement in throughput at

    comparable latency

  • 8/3/2019 7 Joshi Presentation

    35/40

    MIT/UCB

    Backup

  • 8/3/2019 7 Joshi Presentation

    36/40

    MIT/UCB

    MIT Eos1 65 nm test chip

    Texas Instruments

    standard 65 nmbulk CMOSprocessFirst ever photonicchip in sub-100nmCMOS

    Automatedphotonic devicelayout

    Monolithicintegration withelectricalmodulator drivers

  • 8/3/2019 7 Joshi Presentation

    37/40

    MIT/UCB

    Ring modulator

    Paperclips

    Waveguide crossings

    M-Z test structures

    Digital driver

    4 ring filter banks

    Photo detector

    Two-ring filter

    One-ring filter

    Vertical coupler grating

  • 8/3/2019 7 Joshi Presentation

    38/40

    MIT/UCB

    Optical waveguide

    Waveguide made of polysiliconSilicon substrate under waveguide etched away toprovide optical cladding64 wavelengths per waveguide in opposite directions

    SEM image of a poly silicon waveguideCross-sectional view of a photonic chip

  • 8/3/2019 7 Joshi Presentation

    39/40

  • 8/3/2019 7 Joshi Presentation

    40/40

    Photodetectors

    Embedded SiGe used to create photodetectorsMonolithic integration enable good optical couplingSub-100 fJ/bit energy cost required for the receiver