Page 1: Assaf Shacham , Member, IEEE,  Keren  Bergman, Senior Member, IEEE, and

RFAD LAB, YONSEI University

IEEE TRANSACTIONS ON COMPUTERS, VOL. 57, NO. 9, SEPTEMBER 2008

Photonic Networks-on-Chip for Future Generations of Chip Multiprocessors

Assaf Shacham, Member, IEEE, Keren Bergman, Senior Member, IEEE, and Luca P. Carloni, Member, IEEE

A. Shacham is with Aprius Inc., 440 N. Wolfe Rd., Sunnyvale, CA 94085. K. Bergman is with the Department of Electrical Engineering, Columbia University, 500 W. 120th St., 1300 Mudd, New York, NY 10027. L.P. Carloni is with the Department of Computer Science, Columbia University, 466 Computer Science Building, 1214 Amsterdam Avenue, Mail Code: 0401, New York, NY 10027-7003.

2011. 06. 08. Kim Yeo-myung

Page 2:

CONTENTS

I. INTRODUCTION

II. RELATED WORK

III. HYBRID NOC MICROARCHITECTURE

IV. NETWORK DESIGN

V. DESIGN ANALYSIS AND OPTIMIZATION

VI. COMPARATIVE POWER ANALYSIS

VII. CONCLUSION


Page 3:

INTRODUCTION: Parallel Computational Cores

– Parallel cores are the basis of new commercial releases for driving performance.
– The interconnect and its associated global communication infrastructure are becoming central to chip performance.

Issues for the Network-on-Chip (NoC)
– Large bandwidth and stringent latency requirements.
– An electronic NoC can provide enough performance but requires large power consumption → photonic NoC.

Photonic NoCs can deliver a dramatic reduction in the power expended on intrachip global communications while satisfying the high bandwidth requirements of CMPs.

Hybrid NoC architecture – photonic + electronic

Page 4:

RELATED WORK: Relative performance of optical and electrical on-chip interconnects <Collet et al>
– On-chip optical interconnects can be expected to penetrate at lengths larger than 1,000 times the wavelength, where they can have lower power and latency than electronic interconnects.

Multicore processor architecture where remote memory accesses are implemented as transactions on a global on-chip optical bus <Kirman et al>
– A latency reduction as high as 50 percent for some applications and a power reduction of about 30 percent over a baseline electrical bus.


Page 5:

RELATED WORK: An optical NoC based on a wavelength-routed crossbar <Briere et al>
– The crossbar is composed of passive resonator devices, and routing between an input-output pair is achieved by selecting the appropriate wavelength.
– Problem: it requires either widely tunable laser sources or large arrays of fixed-wavelength sources with fast wavelength-selection switches.
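To see why wavelength routing shifts the burden onto the sources, here is a minimal sketch of one common static assignment for an N-port wavelength-routed crossbar, in which the wavelength index is fixed by the source-destination pair as (dst - src) mod N. This cyclic rule and the helper names `wavelength_for` and `check_assignment` are assumptions for illustration, not necessarily the mapping used by Briere et al.

```python
from itertools import permutations

def wavelength_for(src: int, dst: int, n_ports: int) -> int:
    """Hypothetical cyclic rule: the wavelength index is (dst - src) mod N."""
    if src == dst:
        raise ValueError("a port does not route to itself")
    return (dst - src) % n_ports

def check_assignment(n_ports: int) -> tuple[int, bool]:
    """Return (wavelengths each source must emit, receivers conflict-free?)."""
    per_source = {s: set() for s in range(n_ports)}
    per_dest = {d: [] for d in range(n_ports)}
    for src, dst in permutations(range(n_ports), 2):
        lam = wavelength_for(src, dst, n_ports)
        per_source[src].add(lam)
        per_dest[dst].append(lam)
    # Each source needs one distinct wavelength per reachable destination ...
    lasers_per_source = max(len(v) for v in per_source.values())
    # ... and no destination ever receives two sources on the same wavelength.
    conflict_free = all(len(v) == len(set(v)) for v in per_dest.values())
    return lasers_per_source, conflict_free

print(check_assignment(16))  # (15, True)
```

With 16 ports, every source must be able to emit 15 distinct wavelengths, either from a widely tunable laser or from an array of fixed lasers with a fast selector, which is exactly the problem noted above. The receiver side, by contrast, is conflict-free by construction.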

Benefits of optical intrachip interconnects <Intel>
– While optical clock distribution networks are not especially attractive, wavelength division multiplexing (WDM) does offer interesting advantages for intrachip optical interconnects over copper in deep-submicron processes.


Page 6:

HYBRID NOC MICROARCHITECTURE: Meaning of Hybrid
– Optical + electronic.
– Circuit-switched network (bulk messages) + packet-switched network (short messages).

Why hybrid?
– Why not photonic packet switching? The two functions that packet switching requires, buffering and header processing, are very difficult to implement with optical devices.
– What is the problem with an electronic NoC? Electronic NoCs do have many advantages in flexibility and abundant functionality, but they tend to consume high power, which scales up with the transmitted bandwidth.
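The division of labor above can be sketched as a dispatch rule: short messages travel on the electronic packet-switched network, while bulk messages use the electronic network only for small control packets and ride a photonic circuit for the data. The threshold `BULK_THRESHOLD_BYTES` and the action strings are hypothetical; the slides give no cut-off value.

```python
from dataclasses import dataclass

BULK_THRESHOLD_BYTES = 256  # hypothetical cut-off; not given in the slides

@dataclass
class Message:
    src: int
    dst: int
    size_bytes: int

def plan(msg: Message) -> list[str]:
    """Return the network actions used to deliver one message (illustrative)."""
    if msg.size_bytes < BULK_THRESHOLD_BYTES:
        # Short message: buffering and header processing happen electronically, hop by hop.
        return ["electronic: packet-switched delivery"]
    # Bulk message: electronics handle only the small control packets;
    # the data itself crosses an unbuffered photonic circuit.
    return [
        "electronic: path-setup packet",
        "photonic: circuit-switched data transfer",
        "electronic: path-teardown packet",
    ]

print(plan(Message(src=0, dst=5, size_bytes=64)))
print(plan(Message(src=0, dst=5, size_bytes=4096)))
```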


Page 7:

HYBRID NOC MICROARCHITECTURE: Operation of Optical Circuit Switching

1. An electronic control packet is transmitted and routed in the electronic network, setting up a photonic path.

2. Buffering takes place only for the electronic packets, during the path-setup phase; the photonic data itself is not buffered.

3. The established paths are optical circuits between processing cores, enabling low power, low latency, and high bandwidth.

Advantages of the photonic path
– Bit-rate transparency: the ability of a device to process optical signals regardless of their transmission speed (bit rate). In electronics, dynamic (switching) power dissipation scales with the bit rate, but photonic switches switch on and off once per message, and their energy dissipation does not depend on the bit rate.
– Low loss in optical waveguides.
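A minimal sketch of the bit-rate-transparency argument, assuming a hypothetical electronic energy of 1 pJ/bit per hop (`ELECTRONIC_J_PER_BIT`) and taking the 10 mW ON-state PSE power from the power-analysis slides: electronic dynamic power grows linearly with the bit rate, whereas an ON photonic switch draws the same power whatever rate it carries.

```python
ELECTRONIC_J_PER_BIT = 1e-12  # hypothetical per-hop switching energy (1 pJ/bit)
PSE_ON_POWER_W = 10e-3        # ON-state PSE power quoted in the power analysis

def electronic_hop_power_w(bit_rate_bps: float) -> float:
    """Dynamic power of one electronic hop: energy per bit x bits per second."""
    return ELECTRONIC_J_PER_BIT * bit_rate_bps

def photonic_switch_power_w(bit_rate_bps: float) -> float:
    """Power of one ON photonic switch; bit_rate_bps is deliberately unused."""
    return PSE_ON_POWER_W

for rate_bps in (10e9, 40e9, 160e9):
    print(f"{rate_bps / 1e9:5.0f} Gb/s: "
          f"electronic {electronic_hop_power_w(rate_bps) * 1e3:6.1f} mW, "
          f"photonic {photonic_switch_power_w(rate_bps) * 1e3:5.1f} mW")
```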


Page 8:

HYBRID NOC MICROARCHITECTURE: Exploiting Photonics in NoC Design

[Figure: optical switch (microring-resonator structure), modulator, waveguide and fiber-coupling lens]

The photonic NoC is constructed in a single layer, above the metal layers.
* Torus networks
* Off-chip laser
* Optical clock distribution network
* WDM

Page 9:

HYBRID NOC MICROARCHITECTURE: Life of a Message in the Photonic NoC

1. A write operation starts from a processing unit in one core to a memory located in another core.

2. As soon as the write address is known, a path-setup packet is sent on the electronic control network.

3. The control packet is routed in the electronic network, reserving the photonic switches along the path for the photonic message which will follow it.

4. When the path-setup packet reaches the destination port, the photonic path is reserved and is ready to route the message.

5. A short light pulse can then be transmitted onto the waveguide in the opposite direction (from the destination to the source), signaling to the source that the path is open.

6. After the message transmission is completed, a path teardown packet is sent to free the path resources for usage by other messages.
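The reserve, transmit, and tear-down cycle above can be modeled as a toy sketch in which each photonic switch is owned by at most one message. The switch labels, the `PhotonicFabric` class, and the choice to keep partial reservations while a setup packet waits (rather than drop it) are assumptions for illustration, not the simulator used in the paper.

```python
from typing import Optional

class PhotonicFabric:
    """Toy model: each photonic switch is reserved by at most one message."""

    def __init__(self) -> None:
        self.owner: dict[str, int] = {}  # switch label -> id of the message holding it

    def setup(self, msg_id: int, path: list[str]) -> Optional[str]:
        """Reserve switches hop by hop, as the setup packet travels.

        Returns None when the whole path is reserved (the destination can send
        the light pulse back), or the label of the switch where setup is blocked.
        Switches reserved before the blocking point stay held while waiting.
        """
        for switch in path:
            holder = self.owner.get(switch)
            if holder is not None and holder != msg_id:
                return switch
            self.owner[switch] = msg_id
        return None

    def teardown(self, msg_id: int) -> None:
        """Release every switch held by msg_id (the teardown packet's job)."""
        for switch in [s for s, m in self.owner.items() if m == msg_id]:
            del self.owner[switch]

fabric = PhotonicFabric()
first = fabric.setup(1, ["A", "B", "C"])    # None: path open, message 1 transmits
blocked = fabric.setup(2, ["D", "B", "E"])  # "B": still held by message 1
fabric.teardown(1)                          # message 1 done, its path is freed
retry = fabric.setup(2, ["D", "B", "E"])    # None: message 2 can now proceed
print(first, blocked, retry)
```

The second message keeps switch `D` while it waits at `B`; holding partial paths like this is what makes the deadlock and path-setup policies discussed later matter.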


Page 10:

NETWORK DESIGN (Building Blocks)

Photonic Switching Element (PSE)

– Microring-resonator structure (a similar device has been demonstrated with optical pumping).
– OFF state: the resonant frequency of the rings differs from the signal wavelength, so the light passes by.
– ON state: the switch is turned on by injecting electrical current into p-n contacts surrounding the rings, which shifts their resonance onto the signal wavelength.
– Switching time: 30 ps.
– Their merit lies mainly in their extremely small footprint, with ring diameters of approximately 12 µm, and their low power.
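A minimal sketch of the ON/OFF principle, treating the ring as an ideal, lossless resonator whose drop response near resonance follows a Lorentzian. All wavelengths and the 0.1 nm linewidth are hypothetical values chosen for illustration; only the idea that current injection moves the resonance onto the signal wavelength comes from the slide.

```python
SIGNAL_NM = 1550.0    # hypothetical wavelength of the routed message
RING_OFF_NM = 1550.5  # hypothetical resonance with no injected current (detuned)
RING_ON_NM = 1550.0   # hypothetical resonance after current injection shifts it
FWHM_NM = 0.1         # hypothetical resonance linewidth (full width at half maximum)

def drop_fraction(signal_nm: float, resonance_nm: float, fwhm_nm: float = FWHM_NM) -> float:
    """Fraction of power coupled into the ring (turned), Lorentzian approximation."""
    detuning = (signal_nm - resonance_nm) / (fwhm_nm / 2)
    return 1.0 / (1.0 + detuning ** 2)

off = drop_fraction(SIGNAL_NM, RING_OFF_NM)  # light mostly passes straight through
on = drop_fraction(SIGNAL_NM, RING_ON_NM)    # light is coupled into the ring and turns
print(f"OFF: {off:.3f} of the power dropped, ON: {on:.3f}")
```

With these numbers, roughly 1% of the light is diverted in the OFF state and essentially all of it in the ON state.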


Page 11:

NETWORK DESIGN (Building Blocks)

Photonic Switching Element (PSE)

– 4 X 4 switches, controlled by an electronic circuit termed an electronic router (ER).
– Control packets are received in the ER, processed, and sent to their next hop, while the PSEs are switched ON and OFF accordingly.
– Blocking relations exist between some connections inside the switch. (Nonblocking switches would offer improved performance and simplify network management and routing.)


Page 12:

NETWORK DESIGN (Topology)

4 X 4 folded torus network
– The communication requirements of a CMP are best served by a 2D regular topology such as a mesh or a torus.
– A regular 2D topology requires 5 X 5 switches (four neighbor ports plus one local injection/ejection port), which are overly complex to implement using photonic technology.
– Therefore, a folded-torus topology is used as a base and augmented with access points for the gateways.
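Folding is what keeps the torus wraparound links short. A minimal sketch of the standard interleaved layout for one ring of the torus (applied independently along X and Y in 2D): logical nodes 0, 1, ..., n-1 are placed in physical slots so that every link, including the wraparound one, spans at most two slots. The function names are hypothetical, and the paper's actual floorplan may differ in detail.

```python
def folded_positions(n: int) -> list[int]:
    """Physical slot of each logical ring node in a folded (interleaved) layout.

    The ring 0-1-2-...-(n-1)-0 is laid out physically as 0, n-1, 1, n-2, 2, ...
    so that no link, including the wraparound link, spans more than two slots.
    """
    return [2 * i if 2 * i < n else 2 * (n - 1 - i) + 1 for i in range(n)]

def max_link_span(n: int) -> int:
    """Longest link, in physical slots, of the folded ring."""
    pos = folded_positions(n)
    return max(abs(pos[i] - pos[(i + 1) % n]) for i in range(n))

print(folded_positions(6))  # [0, 2, 4, 5, 3, 1]
print(max_link_span(6))     # 2
```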


Page 13:

NETWORK DESIGN (Topology)

4 X 4 folded torus network

– The access points for the gateways are designed with two goals in mind: 1) to facilitate injection and ejection without interfering with the through traffic on the torus, and 2) to avoid blocking between injected and ejected traffic, which may be caused by the switches' internal blocking.

Page 14:

NETWORK DESIGN (Topology)

4 X 4 folded torus network

Page 15:

NETWORK DESIGN (Flow Control)

XY dimension-order routing on the torus network
– Path setup takes time on the order of nanoseconds: the setup packet travels through a number of ERs, undergoes some processing at each hop, and may be blocked.
– The transmission latency of the optical data is very short and depends only on the group velocity of light in a silicon waveguide: about 300 ps for 2 cm.
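A minimal sketch of XY dimension-order routing on a k X k torus: the path first moves along X to the destination column, then along Y, taking the shorter way around each ring. Breaking ties toward the positive direction is an assumption, and the sketch does not address deadlock, which the wraparound links make possible.

```python
def ring_steps(a: int, b: int, k: int) -> list[int]:
    """Coordinates visited going from a to b on a k-node ring, shorter way round."""
    forward = (b - a) % k
    step = 1 if forward <= k - forward else -1  # ties go forward (assumption)
    coords, c = [], a
    while c != b:
        c = (c + step) % k
        coords.append(c)
    return coords

def xy_route(src: tuple[int, int], dst: tuple[int, int], k: int) -> list[tuple[int, int]]:
    """Dimension-order route on a k x k torus: finish X first, then Y."""
    (sx, sy), (dx, dy) = src, dst
    path = [(x, sy) for x in ring_steps(sx, dx, k)]
    path += [(dx, y) for y in ring_steps(sy, dy, k)]
    return path

print(xy_route((0, 0), (5, 2), 6))  # [(5, 0), (5, 1), (5, 2)]
```

The X leg uses the wraparound link (0 to 5 in one hop) because it is shorter than five hops the other way.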


Page 16:

DESIGN ANALYSIS AND OPTIMIZATION: Simulation Setup
– The authors developed POINTS (Photonic On-chip Interconnection Network Traffic Simulator).
– 36-core CMP, 6 X 6 planar layout, 22 nm CMOS technology.
– The chip is assumed to be 20 mm along its edge, so each core is 3.3 X 3.3 mm in size.
– The network is a 6 X 6 folded-torus network augmented with 36 gateway access points, so it uses a matrix of 12 X 12 switches.
– Optical signals propagate in a silicon waveguide at 15.4 ps/mm.
– The inter-PSE delay and interrouter delay are 13 and 220 ps, respectively.
– The PSE setup time is assumed to be 1 ns, and the router processing latency is 600 ps.
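The parameters above can be combined into a rough, contention-free latency budget for a single message. The model is an assumption for illustration: setup costs one router-processing delay plus one interrouter delay per hop plus a single exposed PSE setup time, the acknowledgment pulse and the data each cross the path once at 15.4 ps/mm, and the 960 Gbps bandwidth and 2 Kbyte message size are borrowed from the power-analysis slides. The hop count and path length in the usage line are hypothetical.

```python
OPTICAL_PS_PER_MM = 15.4  # optical propagation in a silicon waveguide
INTER_ROUTER_PS = 220     # interrouter delay
ROUTER_PS = 600           # router processing latency per hop
PSE_SETUP_PS = 1000       # PSE setup time (1 ns)

def zero_load_latency_ps(router_hops: int, path_mm: float, msg_bytes: int,
                         bandwidth_gbps: float = 960.0) -> dict[str, float]:
    """Contention-free latency budget for one photonic message (assumed model)."""
    parts = {
        # setup packet: processing + wire delay at every router, then one PSE switch time
        "setup": router_hops * (ROUTER_PS + INTER_ROUTER_PS) + PSE_SETUP_PS,
        # short light pulse from the destination back to the source
        "ack": path_mm * OPTICAL_PS_PER_MM,
        # bits / (Gb/s) gives ns; multiply by 1e3 for ps
        "serialization": msg_bytes * 8 / bandwidth_gbps * 1e3,
        # last bit crossing the waveguide
        "propagation": path_mm * OPTICAL_PS_PER_MM,
    }
    parts["total"] = sum(parts.values())
    return parts

budget = zero_load_latency_ps(router_hops=6, path_mm=10.0, msg_bytes=2048)
for name, ps in budget.items():
    print(f"{name:>13}: {ps / 1e3:7.2f} ns")
```

Under these assumptions a 2 Kbyte message spends about 17 ns serializing versus about 6 ns in setup; much smaller messages would be dominated by setup, which is the trade-off examined in the message-size slides.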


Page 17:

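DESIGN ANALYSIS AND OPTIMIZATION: Dealing with Deadlock

– Deadlock:
1. Program 1 requests resource A and is allocated it.
2. Program 2 requests resource B and is allocated it.
3. Program 1 additionally requests resource B, but because B is in use by another program, it waits in the queue until B becomes available.
4. Program 2 additionally requests resource A, but because A is in use by another program, it waits in the queue until A becomes available.
Each program now waits for a resource held by the other, so neither can ever proceed.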
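The example above maps directly onto a wait-for graph, where an edge P → Q means P waits for a resource held by Q; a cycle is a deadlock. In the photonic NoC the "programs" would be path-setup packets and the "resources" photonic switches. This sketch shows the general detection technique, not the deadlock strategy of the paper, and the function names are hypothetical.

```python
def wait_for_graph(holds: dict[str, str], requests: dict[str, list[str]]) -> dict[str, set[str]]:
    """Edge P -> Q when P requests a resource currently held by another process Q."""
    graph: dict[str, set[str]] = {p: set() for p in requests}
    for proc, wanted in requests.items():
        for resource in wanted:
            holder = holds.get(resource)
            if holder is not None and holder != proc:
                graph[proc].add(holder)
    return graph

def has_deadlock(graph: dict[str, set[str]]) -> bool:
    """Depth-first search for a cycle in the wait-for graph."""
    visiting: set[str] = set()  # nodes on the current DFS path
    done: set[str] = set()      # nodes fully explored, known cycle-free

    def dfs(node: str) -> bool:
        visiting.add(node)
        for nxt in graph.get(node, set()):
            if nxt in visiting:
                return True  # back edge: cycle found
            if nxt not in done and dfs(nxt):
                return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(dfs(node) for node in graph if node not in done)

holds = {"A": "program1", "B": "program2"}         # steps 1 and 2
requests = {"program1": ["B"], "program2": ["A"]}  # steps 3 and 4
print(has_deadlock(wait_for_graph(holds, requests)))                             # True
print(has_deadlock(wait_for_graph(holds, {"program1": ["B"], "program2": []})))  # False
```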

Page 18:

DESIGN ANALYSIS AND OPTIMIZATION: Optimizing Message Size

– Large messages → Link utilization is compromised and serialization latency is increased.

– Small messages → The relative overhead of the path-setup latency becomes too large and efficiency is degraded.
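The small-message side of this trade-off can be sketched as the share of time spent transmitting data rather than setting up the path. The 10 ns setup overhead is hypothetical and the 960 Gbps bandwidth is borrowed from the power-analysis slides. The large-message penalties (lower link utilization, longer serialization, longer blocking of other messages) depend on contention and need a simulator such as POINTS, so this model alone does not reproduce the optimum reported next.

```python
SETUP_NS = 10.0         # hypothetical path-setup overhead per message
BANDWIDTH_GBPS = 960.0  # borrowed from the power-analysis slides

def efficiency(msg_bytes: int, setup_ns: float = SETUP_NS,
               bandwidth_gbps: float = BANDWIDTH_GBPS) -> float:
    """Share of the setup-plus-transmission time spent moving data."""
    transmit_ns = msg_bytes * 8 / bandwidth_gbps  # bits / (Gb/s) = ns
    return transmit_ns / (setup_ns + transmit_ns)

for size in (256, 1024, 4096, 16384, 65536):
    print(f"{size:>6} bytes: efficiency {efficiency(size):.2f}")
```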

Page 19:

DESIGN ANALYSIS AND OPTIMIZATION: Optimizing Message Size

– The optimal DMA block size for the transactions over the photonic NoC ranges between 4 and 16 Kbytes

Page 20:

DESIGN ANALYSIS AND OPTIMIZATION: Increasing Path Multiplicity

Page 21:

DESIGN ANALYSIS AND OPTIMIZATION: Evaluating Path-Setup Procedures

– Reductions in path-setup latency translate into improved efficiency of the network interfaces and into higher average bandwidth.
– The queuing latency t_q is a major contributor to the overall setup latency.
– Several techniques are proposed to reduce t_q, for example, immediately dropping any path-setup packet that is blocked instead of buffering it.

Page 22:

COMPARATIVE POWER ANALYSIS: Power Analysis
→ Power is the main motivation for the design of a photonic NoC.
– To evaluate it, a comparative high-level power analysis is performed.

Conditions of the power analysis
– Same bandwidth and same number of processing cores for both networks.
– Assume: 22 nm CMOS technology hosting 36 processing cores, each requiring a peak bandwidth of 800 Gbps and an average bandwidth of 512 Gbps.
– Assume: uniform traffic model, mesh topology, XY dimension-order routing.


Page 23:

COMPARATIVE POWER ANALYSIS: Reference Electronic NoC

Each hop of a packet dissipates energy in:
1. Reading from a buffer (high bandwidth requires a large number of parallel lines),
2. Traversing the router's internal crossbar,
3. Transmission across the interrouter link,
4. Writing to a buffer in the subsequent router, and
5. Triggering an arbitration decision.
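The per-hop components above feed a simple power estimate: total power ≈ (offered bits per second) × (average hops per bit) × (energy per bit per hop). This sketch computes the average XY hop count on a 6 X 6 mesh under uniform traffic by enumeration (self-traffic included, which is an assumption) and uses a hypothetical 1 pJ/bit per hop (`E_HOP_J_PER_BIT`) standing in for the sum of the five components; only the 36 cores and the 512 Gbps average bandwidth come from the slides.

```python
from itertools import product

E_HOP_J_PER_BIT = 1e-12  # hypothetical: buffer read + crossbar + link + buffer write + arbitration

def mean_xy_hops(k: int) -> float:
    """Average XY hop count on a k X k mesh for uniform random source/destination pairs."""
    total = sum(abs(sx - dx) + abs(sy - dy)
                for sx, sy, dx, dy in product(range(k), repeat=4))
    return total / k ** 4

def mesh_power_w(cores: int, k: int, avg_bw_bps: float, e_hop_j_per_bit: float) -> float:
    """Offered bits/s x average hops x energy per bit per hop."""
    return cores * avg_bw_bps * mean_xy_hops(k) * e_hop_j_per_bit

hops = mean_xy_hops(6)
power = mesh_power_w(36, 6, 512e9, E_HOP_J_PER_BIT)
print(f"mean hops {hops:.3f}, power {power:.1f} W at the assumed energy per hop")
```

The enumeration matches the closed form 2(k^2 - 1)/(3k), about 3.89 hops for k = 6. Because power is a product of the factors, the result scales directly with whatever per-hop energy is assumed.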

Page 24:

COMPARATIVE POWER ANALYSIS: Proposed Photonic NoC

1. The photonic data-transfer network (6 X 6 CMP)
– Path multiplicity factor: 2 → 12 X 12 photonic mesh (576 PSEs).
– Power of a PSE: ON state → 10 mW; OFF state → no dissipation.
– Total power consumption is computed statistically from the number of PSEs that are ON.

2. Electronic control network (6 X 6 CMP)
– Each photonic message is accompanied by two 32-bit control packets, and the typical size of a message is 2 Kbytes.

Page 25:

COMPARATIVE POWER ANALYSIS: Proposed Photonic NoC

3. Optical transmitters and receivers (E/O and O/E conversion)
– 960 Gbps bandwidth → 40 Gbps X 24 wavelengths → 24 modulators and 24 receivers are required.
– We estimate that, with silicon ring-resonator modulators and SiGe photodetectors, the energy will decrease to about 0.2 pJ/bit in the next 8-10 years.
– Supplementary circuits usually required for implementing optical receivers (CDR, serializer, etc.) are not needed in an ultrashort link in which the modulation rate equals the chip clock rate.
– The off-chip laser sources consume an estimated 10 mW per wavelength. Although a large number of lasers is required to exploit the bandwidth potential of the optical NoC, their power is dissipated off-chip and does not contribute to the chip power density.
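The photonic power terms above can be tallied with the figures quoted on these slides. Several choices here are assumptions: the number of PSEs that are ON (`pses_on`) would come from simulation statistics, so the usage line takes all 576 as a loose upper bound; the 0.2 pJ/bit is applied once to every core's 512 Gbps average traffic as the combined modulator-plus-receiver cost; and each of the 36 gateways is assumed to use its own 24 wavelengths.

```python
NUM_PSES = 576                  # 12 X 12 switches, path multiplicity 2
PSE_ON_W = 10e-3                # ON-state PSE power; OFF dissipates nothing
CONTROL_BITS_PER_MSG = 2 * 32   # two 32-bit control packets per message
MSG_BITS = 2 * 1024 * 8         # typical 2 Kbyte message
EO_J_PER_BIT = 0.2e-12          # projected modulator + receiver energy (assumed combined)
LASER_W_PER_WAVELENGTH = 10e-3  # off-chip laser power per wavelength
WAVELENGTHS_PER_GATEWAY = 24    # 24 X 40 Gbps = 960 Gbps (per gateway: assumption)

def photonic_noc_budget(cores: int, avg_bw_bps: float, pses_on: int) -> dict[str, float]:
    """Rough per-component figures for the photonic NoC (illustrative model)."""
    return {
        "pse_w": pses_on * PSE_ON_W,                          # on-chip
        "control_bit_fraction": CONTROL_BITS_PER_MSG / MSG_BITS,
        "eo_w": cores * avg_bw_bps * EO_J_PER_BIT,            # on-chip
        "laser_offchip_w": cores * WAVELENGTHS_PER_GATEWAY * LASER_W_PER_WAVELENGTH,
    }

budget = photonic_noc_budget(cores=36, avg_bw_bps=512e9, pses_on=NUM_PSES)
for name, value in budget.items():
    print(f"{name:>22}: {value:.4g}")
```

The control packets add only 64 bits to every 16,384-bit message (about 0.4%), which is why the electronic control network is a small term; the laser figure is reported separately because it is dissipated off-chip.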

Page 26:

CONCLUSION: The motivation behind our work

– 1. Multicore processors are stepping into an era where high-bandwidth communication between large numbers of cores is a key driver of computing performance.

– 2. Power dissipation has clearly become the limiting factor in the design of high-performance microprocessors.

– 3. Recent breakthroughs in the field of silicon photonics suggest that the integration of optical elements with CMOS electronics is likely to become viable in the near future.

This paper aims to lay the groundwork for future research progress by providing a complete discussion of the fundamental issues that need to be addressed to design a photonic NoC for CMPs.

