Onchip Interconnect Exploration for Multicore Processors Utilizing FPGAs
Graham Schelle and Dirk Grunwald
University of Colorado at Boulder


DESCRIPTION

Network on Chip defined (in 1 slide!). Power/design concerns in modern processors lead to multicore chips. Transistors are seen as "free", allowing more transistors for non-computational tasks. Networking scales to an infinite number of access points and is well understood. High-speed clocking leads to signals not propagating across the chip in a single cycle.

TRANSCRIPT

Page 1: Onchip Interconnect Exploration for Multicore Processors Utilizing FPGAs Graham Schelle and Dirk Grunwald…

Onchip Interconnect Exploration for Multicore Processors Utilizing FPGAs
Graham Schelle and Dirk Grunwald
University of Colorado at Boulder

Page 2: Onchip Interconnect Exploration for Multicore Processors Utilizing FPGAs Graham Schelle and Dirk Grunwald…

Outline
Network on Chip (NoC) defined
Current onchip interconnect tools
NoCem (NoC Emulator) specification
What else is needed before release
  We want it to be used…and cited
Conclusions

Page 3: Onchip Interconnect Exploration for Multicore Processors Utilizing FPGAs Graham Schelle and Dirk Grunwald…

Network on Chip Defined (in 1 slide!)
Power/design concerns in modern processors lead to multicore chips
Transistors seen as "free", allowing more transistors for non-computational tasks
Network on Chip
  Networking scales to infinite number of access points and is well understood
  High-speed clocking leads to signals not propagating across chip in single cycle

Page 4: Onchip Interconnect Exploration for Multicore Processors Utilizing FPGAs Graham Schelle and Dirk Grunwald…

Onchip Interconnects for FPGAs
Existing Buses on FPGAs
  PLB, OPB, FSL
  Can have multiple masters (e.g. processors)
  Scale well for current uses of FPGAs
Existing NoCs
  Research projects
  Proprietary projects
  Application specific (streaming…)
  Not built for parameterization, some other VALID focus

Page 5: Onchip Interconnect Exploration for Multicore Processors Utilizing FPGAs Graham Schelle and Dirk Grunwald…

NoCem Specification
Synthesizable VHDL
Heavy use of generics / generate statements
Requires minimal Xilinx IP (FIFOs…)
To modify anything
  Change generics, everything automatically generated
  E.g. to go from 2x2 mesh with 16b datawidth to 4x4 torus with 8b datawidth, change 3 lines of code!
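
The "change 3 lines" idea can be sketched in software. The Python model below is only an illustration, not NoCem's VHDL: the field names `dim`, `topology`, and `data_width` are hypothetical stand-ins for whatever generics NoCem actually exposes, and the slide does not say which three lines change. The point is that the link structure is derived from a few parameters rather than written out by hand.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NocConfig:
    # Hypothetical parameter names; NoCem's real VHDL generics may differ.
    dim: int          # the NoC is dim x dim access points
    topology: str     # "mesh" or "torus"
    data_width: int   # datapath width in bits

def neighbors(cfg, node):
    """Return the sorted ids of the nodes directly linked to `node`."""
    row, col = divmod(node, cfg.dim)
    found = set()
    for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + d_row, col + d_col
        if cfg.topology == "torus":
            r, c = r % cfg.dim, c % cfg.dim    # wrap around the edges
        elif not (0 <= r < cfg.dim and 0 <= c < cfg.dim):
            continue                           # a mesh has no link past the edge
        neighbor = r * cfg.dim + c
        if neighbor != node:
            found.add(neighbor)
    return sorted(found)

# The two configurations named on the slide differ in exactly three fields.
small = NocConfig(dim=2, topology="mesh", data_width=16)
large = NocConfig(dim=4, topology="torus", data_width=8)
```

In this model a corner of the 2x2 mesh has two links, while every node of the 4x4 torus has four, because the torus wraps at the edges.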

Page 6: Onchip Interconnect Exploration for Multicore Processors Utilizing FPGAs Graham Schelle and Dirk Grunwald…

NoCem Interface
FIFO-ish
  Enqueue and dequeue path for every access point
Packet Control and Data paths
  Meaning of those paths depends on NoC configuration
Datapath
  Only variable width. Length of packet determined by packet control
Packet control: src, dest, packet length
  Underlying Network reads toplevel packet structure, reads correct fields at correct times
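
A rough software model of the interface described above, assuming one enqueue queue and one dequeue queue per access point. The class and function names (`PacketControl`, `AccessPoint`, `deliver`) are invented for this sketch and are not NoCem signal or entity names; `deliver` is a stand-in for the network itself and ignores routing, arbitration, and timing.

```python
from collections import deque
from dataclasses import dataclass

@dataclass(frozen=True)
class PacketControl:
    src: int      # sending access point
    dest: int     # receiving access point
    length: int   # number of data words in the packet

class AccessPoint:
    """One FIFO-ish port: an enqueue path into the NoC, a dequeue path out."""

    def __init__(self, data_width):
        self.data_width = data_width
        self.to_network = deque()     # enqueue path
        self.from_network = deque()   # dequeue path

    def enqueue(self, control, words):
        # Packet length comes from packet control, not from the datapath.
        if len(words) != control.length:
            raise ValueError("packet control length does not match data")
        if any(not 0 <= w < (1 << self.data_width) for w in words):
            raise ValueError("data word does not fit the datapath width")
        self.to_network.append((control, list(words)))

    def dequeue(self):
        return self.from_network.popleft() if self.from_network else None

def deliver(ports):
    """Stand-in for the network: move each queued packet to its destination."""
    for port in ports:
        while port.to_network:
            control, words = port.to_network.popleft()
            ports[control.dest].from_network.append((control, words))
```

Usage: create one `AccessPoint` per node, enqueue a packet at the source, call `deliver`, then dequeue at the destination.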

Page 7: Onchip Interconnect Exploration for Multicore Processors Utilizing FPGAs Graham Schelle and Dirk Grunwald…

NoCem Bridges
Use Existing Buses, bridge to NoC
  Integration into existing Xilinx tool flows
  NoC can look like memory, SoC, …
Use IPIF interface
  PLB, OPB
  Different bus widths…
  But processors both 32b
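
One thing a bridge plausibly has to do is convert between a 32b processor word and a narrower NoC datawidth. The slide does not describe how NoCem's bridges handle this, so the sketch below is an assumption throughout: the function names are hypothetical, and the most-significant-piece-first ordering is an arbitrary choice.

```python
def split_word(word, data_width, bus_width=32):
    """Split one bus word into data_width-bit pieces, most significant first."""
    if bus_width % data_width != 0:
        raise ValueError("bus width must be a multiple of the NoC datawidth")
    mask = (1 << data_width) - 1
    count = bus_width // data_width
    return [(word >> (data_width * i)) & mask for i in reversed(range(count))]

def join_word(pieces, data_width):
    """Reassemble the pieces produced by split_word."""
    word = 0
    for piece in pieces:
        word = (word << data_width) | piece
    return word
```

With a 16b datawidth a 32b word becomes two pieces, and with an 8b datawidth it becomes four; `join_word` reverses the split on the receiving side.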

Page 8: Onchip Interconnect Exploration for Multicore Processors Utilizing FPGAs Graham Schelle and Dirk Grunwald…

How Big is NoCem?

NoC Dimensions   Datawidth   LUTs     xc2vp30 LUTs used
2x2              16b         4,086    14%
3x3              16b         11,693   42%
4x4              16b         21,570   78%
2x2              32b         5,822    21%
3x3              32b         16,394   59%
4x4              32b         34,370   125%

Mesh, 16-deep channel FIFOs, RR Arbitration
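
The percentage column can be reproduced with a short calculation. The slide does not state the device's LUT count; the figure of 27,392 LUTs for the xc2vp30 used below is an assumption taken from outside the deck. With that figure, truncating to a whole percent matches every row of the table.

```python
# Assumption: the xc2vp30 LUT count is not given on the slide.
XC2VP30_LUTS = 27_392

# (NoC dimensions, datawidth in bits) -> LUTs, copied from the table above.
luts_used = {
    ("2x2", 16): 4_086,
    ("3x3", 16): 11_693,
    ("4x4", 16): 21_570,
    ("2x2", 32): 5_822,
    ("3x3", 32): 16_394,
    ("4x4", 32): 34_370,
}

def utilization_percent(luts, device_luts=XC2VP30_LUTS):
    """Whole-percent utilization, truncated rather than rounded."""
    return luts * 100 // device_luts

def fits(luts, device_luts=XC2VP30_LUTS):
    """True when the design needs no more LUTs than the device provides."""
    return luts <= device_luts

for (dims, width), luts in luts_used.items():
    note = "" if fits(luts) else " (does not fit)"
    print(f"{dims} {width}b: {utilization_percent(luts)}%{note}")
```

Under this assumption the 4x4 mesh at 32b is the only configuration over 100%, so it does not fit on a single xc2vp30.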

Page 9: Onchip Interconnect Exploration for Multicore Processors Utilizing FPGAs Graham Schelle and Dirk Grunwald…

Example Uses
Memory Architecture (in paper)
  Various distributed cache configurations
Asymmetric Processor Configuration
  Using Microblaze, PowerPC
Special Processor Offloads
  Floating Point, Network Processing
All can be emulated over NoC using NoCem…

Page 10: Onchip Interconnect Exploration for Multicore Processors Utilizing FPGAs Graham Schelle and Dirk Grunwald…

For Release
We want NoCem to be used!
  Already in use at CU Boulder
  Full source will be made available online
To do for release
  Clean/zip up code
  Some Documentation
ETA: April 2006

Page 11: Onchip Interconnect Exploration for Multicore Processors Utilizing FPGAs Graham Schelle and Dirk Grunwald…

Conclusions
NoCem as a research tool
  Open source
  Non-proprietary
  Non application Specific
NoCem for multicore processor research
  Allows NoC exploration
  Easy integration into Xilinx EDK flow
  Useful for a variety of research topics in this space

Page 12: Onchip Interconnect Exploration for Multicore Processors Utilizing FPGAs Graham Schelle and Dirk Grunwald…

Any Questions?