a programmable adaptive router for a gals parallel system
Post on 11-Jan-2016
A Programmable Adaptive Router for a GALS Parallel System
Jian Wu
APT Group, University of Manchester
May 2009
SpiNNaker System for Neural Simulation
• Massively parallel (1 million ARMs)
• Massive neural net simulations (1 billion neurons in real time)
• GALS infrastructure
• Fault-tolerant
Node = SpiNNaker CMP + large off-chip memory
SpiNNaker Chip
Router Requirements
Operation requirements:
• Route multicast, point-to-point and nearest-neighbour packets
• Reprogrammable at run-time
• Provide an external interface to system resources
• Fault-tolerant operation
• Power efficiency
Bandwidth requirements: ~7.4 Gb/s
• On-chip traffic: (20 - 1) procs x 1000 neurons x 72 bit x 1000 Hz = 1.368 Gb/s
• Inter-chip traffic: 1 Gb/s x 6 links = 6 Gb/s
Bandwidth target = 72 bit x 200 MHz = 14.4 Gb/s
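The bandwidth figures above can be re-derived with a few lines of arithmetic. The sketch below only uses the numbers quoted on the slide; the variable names are illustrative and not taken from the design.

```python
# Re-derive the bandwidth figures quoted on the slide.
# All constants come from the slide; the names are illustrative only.
PACKET_BITS = 72              # width of one packet
SENDING_PROCS = 20 - 1        # "(20-1) procs" as written on the slide
NEURONS_PER_PROC = 1000
SPIKE_RATE_HZ = 1000          # rate used on the slide for this estimate
LINKS = 6
LINK_BPS = 1_000_000_000      # 1 Gb/s per inter-chip link
CLOCK_HZ = 200_000_000        # 200 MHz router clock

on_chip_bps = SENDING_PROCS * NEURONS_PER_PROC * PACKET_BITS * SPIKE_RATE_HZ
inter_chip_bps = LINKS * LINK_BPS
required_bps = on_chip_bps + inter_chip_bps
target_bps = PACKET_BITS * CLOCK_HZ   # one packet per clock cycle

for label, value in [
    ("on-chip", on_chip_bps),
    ("inter-chip", inter_chip_bps),
    ("required", required_bps),
    ("target", target_bps),
]:
    print(f"{label:>10}: {value / 1e9:.3f} Gb/s")
```

The required total is 7.368 Gb/s, which the slide rounds to ~7.4 Gb/s, so the 14.4 Gb/s target leaves roughly a factor of two of headroom.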
Router architecture
Packet checking:
- Check packet for errors and enable appropriate routing engine
Multicast (MC) router:
- Route neural spikes according to their source address
Point-to-Point (P2P) router:
- Route system management and control information packets
Nearest-neighbour (NN) router:
- Route system boot-up and debugging info
- Provide external I/F to resources
Adaptive routing:
- Redirect blocked packets
Router Interface to system NoC:
- AHB Master and Slave Interfaces
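As a rough illustration of the packet-checking stage, the sketch below selects one of the three routing engines from a packet's type after an error check. The `Packet` fields, the `parity_ok` flag and the choice to return `None` for a faulty packet are assumptions made for this example; the slides do not describe the packet format or how faulty packets are handled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PacketType(Enum):
    MC = "multicast"
    P2P = "point-to-point"
    NN = "nearest-neighbour"


@dataclass
class Packet:
    kind: PacketType
    key: int
    parity_ok: bool = True  # hypothetical stand-in for the real error checks


# Engine names follow the slide; the mapping itself is illustrative.
ENGINES = {
    PacketType.MC: "MC router",
    PacketType.P2P: "P2P router",
    PacketType.NN: "NN router",
}


def select_engine(packet: Packet) -> Optional[str]:
    """Return the engine to enable, or None if the packet fails checking."""
    if not packet.parity_ok:
        return None  # assumption: a faulty packet enables no routing engine
    return ENGINES[packet.kind]


print(select_engine(Packet(PacketType.MC, key=0x1234)))
print(select_engine(Packet(PacketType.NN, key=0x1, parity_ok=False)))
```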
Multicast Router
Default and Adaptive Routing
• Route packets “across chip” by default (save RT entries!)
• Automatically re-route packets destined for congested or failed links
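The two bullets above can be sketched as a single routing function. Everything structural here is an assumption for illustration: six links numbered 0 to 5 so that the link "across the chip" is `(link + 3) % 6`, a routing table of `(key, mask, outputs)` entries, and a hypothetical fallback that tries the next link round when the chosen one is blocked. The slides do not specify the table format or how the alternative path is chosen.

```python
NUM_LINKS = 6  # assumption: six inter-chip links numbered 0..5


def opposite_link(link):
    """Link directly 'across the chip' from the arrival link (assumed numbering)."""
    return (link + 3) % NUM_LINKS


def mc_lookup(table, key):
    """Return the output links of the first matching entry, or None on a miss."""
    for entry_key, mask, outputs in table:
        if (key & mask) == entry_key:
            return set(outputs)
    return None


def route(table, key, arrival_link, blocked=frozenset()):
    """Pick output links for a multicast packet.

    A table miss falls back to default routing, so packets that simply
    pass straight through a chip need no routing-table entry.
    """
    outputs = mc_lookup(table, key)
    if outputs is None:
        outputs = {opposite_link(arrival_link)}
    chosen = set()
    for link in outputs:
        if link in blocked:
            # Hypothetical alternative: try the neighbouring link instead.
            link = (link + 1) % NUM_LINKS
        chosen.add(link)
    return chosen


table = [(0x1200, 0xFF00, {0, 2})]
print(route(table, 0x1234, arrival_link=3))               # table hit
print(route(table, 0x9999, arrival_link=1))               # default route
print(route(table, 0x9999, arrival_link=1, blocked={4}))  # re-routed
```

This sketch does not handle the case where the alternative link is blocked as well.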
Interfacing with System NoC
• Nearest-Neighbour packets are diverted to the System NoC.
• Programming data is sourced from the System NoC.
Elastic Buffering
The spiking rate for the great majority of neurons is low, just a few Hz, so the pipeline contains “bubbles” between valid packets.
There can be more than one request to the datapath issued in the same clock cycle.
The adaptive routing mechanism stalls the pipeline to find an alternative path for the congested packet.
Simple, synthesisable design: use ordinary flip-flops for data latching and a global, combinatorial circuit to generate stall signals.
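A minimal behavioural model of this idea, assuming each stage holds either a packet or a bubble (`None`) and that a stage stalls only when it holds a valid packet and the stage after it is stalled. The function names and the list-based representation are illustrative; this is not the actual circuit.

```python
def stall_signals(valid, output_blocked):
    """Compute every stage's stall signal in one pass, tail to head.

    Models a global combinatorial circuit: a stage stalls only if it holds
    a valid packet and whatever is downstream of it cannot move.
    """
    stall = [False] * len(valid)
    downstream_stalled = output_blocked
    for i in reversed(range(len(valid))):
        stall[i] = valid[i] and downstream_stalled
        downstream_stalled = stall[i]
    return stall


def step(stages, new_packet, output_blocked):
    """Advance one clock cycle.

    stages[0] is the head of the pipeline, stages[-1] the output stage.
    Returns (next_stages, emitted_packet, input_accepted).
    """
    valid = [s is not None for s in stages]
    stall = stall_signals(valid, output_blocked)
    emitted = None if stall[-1] else stages[-1]
    nxt = list(stages)
    for i in range(len(stages)):
        if stall[i]:
            continue  # stalled stage keeps its data
        nxt[i] = new_packet if i == 0 else stages[i - 1]
    return nxt, emitted, not stall[0]


stages = ["A", None, "B"]          # one bubble between two packets
stages, out, ok = step(stages, "C", output_blocked=True)
print(stages, out, ok)             # bubble is absorbed, input still accepted
stages, out, ok = step(stages, "D", output_blocked=True)
print(stages, out, ok)             # pipeline full: input is refused
stages, out, ok = step(stages, "D", output_blocked=False)
print(stages, out, ok)             # stall removed: everything advances
```

Because bubbles never stall, a blocked output only back-pressures the input once every stage holds a valid packet, which is the elastic behaviour the slide describes.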
Elastic Buffering
[Figure: three pipeline stages (Pipeline1, Pipeline2, Pipeline3), each with a Pipeline Control block, a flag (Flag1, Flag2, Flag3) and a Disable signal; a Back Pressure signal is also shown.]
Input Interchangeable Buffer
Used for flow control at the head of the pipeline.
One register is used in normal operation
The second is used when a stall occurs in the next stage
The delay is re-introduced when the stall is removed
Parallel-Path Synchronizer
Avoid 2-cycle penalty to increase throughput
Packet Drop Rate
Power vs. Traffic Load
Power Distribution
Power distribution under full traffic load
Power distribution under 10% traffic load
Thank you