Onchip Interconnect Exploration for Multicore Processors Utilizing FPGAs
Graham Schelle and Dirk Grunwald
University of Colorado at Boulder

Outline
- Network on Chip (NoC) defined
- Current onchip interconnect tools
- NoCem (NoC Emulator) specification
- What else is needed before release
  - We want it to be used…and cited
- Conclusions

Network on Chip Defined (in 1 slide!)
- Power/design concerns in modern processors lead to multicore chips
- Transistors seen as “free”, allowing more transistors for non-computational tasks
- Network on Chip
  - Networking scales to an arbitrary number of access points and is well understood
  - High speed clocking leads to signals not propagating across the chip in a single cycle

Onchip Interconnects for FPGAs
- Existing buses on FPGAs
  - PLB, OPB, FSL
  - Can have multiple masters (e.g. processors)
  - Scale well for current uses of FPGAs
- Existing NoCs
  - Research projects
  - Proprietary projects
  - Application specific (streaming…)
  - Not built for parameterization, some other VALID focus

NoCem Specification
- Synthesizable VHDL
  - Heavy use of generics / generate statements
  - Requires minimal Xilinx IP (FIFOs…)
- To modify anything
  - Change generics, everything automatically generated
  - E.g. to go from 2x2 mesh with 16b datawidth to 4x4 torus with 8b datawidth, change 3 lines of code!
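
The idea of regenerating the whole network from a few parameters can be illustrated outside VHDL. The following is a minimal Python sketch, not NoCem's actual generics: the names `NocConfig`, `rows`, `cols`, `topology`, `data_width` and `links` are hypothetical, chosen only for this example. It enumerates the node-to-node links that a mesh or torus of a given size would need, so changing the configuration values is enough to obtain a different network.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NocConfig:
    """Hypothetical stand-in for a handful of top-level generics."""
    rows: int
    cols: int
    topology: str    # "mesh" or "torus"
    data_width: int  # bits; carried along, not used for link enumeration


def links(cfg):
    """Return the set of undirected links between (row, col) nodes."""
    if cfg.topology not in ("mesh", "torus"):
        raise ValueError("unknown topology: " + cfg.topology)
    wrap = cfg.topology == "torus"
    result = set()
    for r in range(cfg.rows):
        for c in range(cfg.cols):
            # Connect each node to its right and lower neighbour.
            for dr, dc in ((0, 1), (1, 0)):
                nr, nc = r + dr, c + dc
                if wrap:
                    nr, nc = nr % cfg.rows, nc % cfg.cols
                elif nr >= cfg.rows or nc >= cfg.cols:
                    continue  # a mesh has no wraparound links
                if (nr, nc) != (r, c):
                    result.add(frozenset(((r, c), (nr, nc))))
    return result


small = NocConfig(rows=2, cols=2, topology="mesh", data_width=16)
large = NocConfig(rows=4, cols=4, topology="torus", data_width=8)
print(len(links(small)), len(links(large)))
```

Only the configuration values differ between `small` and `large`; the generation code is untouched, which is the property the slide attributes to generics and generate statements.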

NoCem Interface
- FIFO-ish
  - Enqueue and dequeue path for every access point
- Packet control and data paths
  - Meaning of those paths depends on NoC configuration
- Datapath
  - Only variable width. Length of packet determined by packet control
- Packet control: src, dest, packet length
  - Underlying network reads toplevel packet structure, reads correct fields at correct times
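
To make the packet control path concrete, here is a small Python sketch of packing the three fields named on the slide (src, dest, packet length) into one control word and reading them back. The field widths and bit ordering are assumptions made for illustration only; the slide does not give NoCem's actual layout, and it notes that the meaning of the paths depends on the NoC configuration.

```python
# Assumed field widths (hypothetical, not taken from NoCem).
SRC_BITS = 4
DEST_BITS = 4
LEN_BITS = 8


def pack_control(src, dest, length):
    """Pack src, dest and packet length into one control word."""
    for name, value, bits in (("src", src, SRC_BITS),
                              ("dest", dest, DEST_BITS),
                              ("length", length, LEN_BITS)):
        if not 0 <= value < (1 << bits):
            raise ValueError(name + " does not fit in its field")
    return (src << (DEST_BITS + LEN_BITS)) | (dest << LEN_BITS) | length


def unpack_control(word):
    """Recover (src, dest, length) from a packed control word."""
    length = word & ((1 << LEN_BITS) - 1)
    dest = (word >> LEN_BITS) & ((1 << DEST_BITS) - 1)
    src = (word >> (DEST_BITS + LEN_BITS)) & ((1 << SRC_BITS) - 1)
    return src, dest, length


word = pack_control(src=1, dest=2, length=3)
print(hex(word), unpack_control(word))
```

A router in such a scheme would only need the `dest` field to forward the packet and the `length` field to know when the packet ends, which matches the slide's point that the network reads the correct fields at the correct times.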

NoCem Bridges
- Use existing buses, bridge to NoC
  - Integration into existing Xilinx tool flows
  - NoC can look like memory, SoC, …
  - Use IPIF interface
- PLB, OPB
  - Different bus widths…
  - But processors both 32b

How Big is NoCem?

NoC dimensions   Datawidth   LUTs     xc2vp30 LUTs used
2x2              16b          4,086    14%
3x3              16b         11,693    42%
4x4              16b         21,570    78%
2x2              32b          5,822    21%
3x3              32b         16,394    59%
4x4              32b         34,370   125%

Configuration: mesh, 16-deep channel FIFOs, round-robin (RR) arbitration
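
The utilization column can be reproduced from the LUT counts with simple arithmetic. The sketch below assumes the xc2vp30 provides 27,392 LUTs; that total is not stated on the slide, but it is consistent with every percentage in the table when the result is truncated to a whole percent. The function names are hypothetical.

```python
# Assumption: total LUTs on the xc2vp30 (not stated on the slide).
XC2VP30_LUTS = 27_392

# LUT counts copied from the table: (dimensions, datawidth in bits) -> LUTs.
LUTS_USED = {
    ("2x2", 16): 4_086,
    ("3x3", 16): 11_693,
    ("4x4", 16): 21_570,
    ("2x2", 32): 5_822,
    ("3x3", 32): 16_394,
    ("4x4", 32): 34_370,
}


def utilization_percent(luts, device_luts=XC2VP30_LUTS):
    """Whole-percent utilization, truncated (not rounded)."""
    return luts * 100 // device_luts


def fits(luts, device_luts=XC2VP30_LUTS):
    """True if the design needs no more LUTs than the device offers."""
    return luts <= device_luts


for (dims, width), luts in LUTS_USED.items():
    print(dims, width, utilization_percent(luts), fits(luts))
```

Under this assumption the 4x4, 32b configuration is the only one that exceeds the device, which is what the 125% entry indicates.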

Example Uses
- Memory architecture (in paper)
  - Various distributed cache configurations
- Asymmetric processor configuration
  - Using MicroBlaze, PowerPC
- Special processor offloads
  - Floating point, network processing
- All can be emulated over NoC using NoCem…

For Release
- We want NoCem to be used!
  - Already in use at CU Boulder
  - Full source will be made available online
- To do for release
  - Clean/zip up code
  - Some documentation
  - ETA: April 2006

Conclusions
- NoCem as a research tool
  - Open source
  - Non-proprietary
  - Non application specific
- NoCem for multicore processor research
  - Allows NoC exploration
  - Easy integration into Xilinx EDK flow
  - Useful for a variety of research topics in this space

Any Questions?