Configuring a Large-Scale GALS System
M.M. Khan*, J. Navaridas†, L.A. Plana*, M. Luján*,
J.V. Woods*, J. Miguel-Alonso† and S.B. Furber*
*School of Computer Science, The University of Manchester, UK
†University of the Basque Country, Spain
SpiNNaker
• Objectives
  – High-performance
  – Robust
  – Low-power
SpiNNaker CMP
• System RAM
• Boot ROM
• MC Router
• Sys. Controller
• Ethernet
• SDRAM
• 20 Proc. Nodes
Processing Node
• ARM968E-S
• Comm. Ctlr.
• Interrupt Ctlr.
• DMA Ctlr.
• Timer
• TCM (100K)
Communication Network
• MC Router
• Packets
  – MC (multicast)
  – P2P (point-to-point)
  – NN (nearest-neighbour)
• 1Gb/s inter-chip
• 6Gb/s per Node
• Six two-way inter-chip links
*L.A. Plana et al. An On-Chip and Inter-Chip Communications Network for the SpiNNaker Massively-Parallel Neural Net Simulator. In Proc. Second ACM/IEEE International Symposium on Networks-on-Chip (NoCS 2008), pages 215–216, 2008.
Performance
• 64K CMPs
• > 1 million ARM968 processors
• 256 tera IPS computing power
• >8 TB memory
• 6 Gb/s/Node Comm. NoC (spike channel)
• 1 Gb/s System NoC (synaptic channel)
• 10^9 neurons in real time
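The headline figures on this slide can be cross-checked with some quick arithmetic. The sketch below is a back-of-envelope calculation only, assuming that 64K means 2^16 chips, that "tera" means 10^12, and that the ">8 TB" lower bound is 8 × 2^40 bytes; the constant names are illustrative and not taken from the slides.

```python
# Back-of-envelope check of the slide's headline numbers.
# Assumptions (not stated in the slides): 64K = 2**16 chips,
# tera = 10**12, and 8 TB = 8 * 2**40 bytes (a lower bound).
CHIPS = 64 * 1024
PROCESSORS_PER_CHIP = 20            # "20 Proc. Nodes" per CMP
TOTAL_IPS = 256 * 10**12            # "256 tera IPS"
TOTAL_MEMORY_BYTES = 8 * 2**40      # ">8 TB"

processors = CHIPS * PROCESSORS_PER_CHIP
mips_per_processor = TOTAL_IPS / processors / 10**6
memory_per_chip_mib = TOTAL_MEMORY_BYTES / CHIPS / 2**20

print(processors)           # 1310720, consistent with "> 1 million"
print(mips_per_processor)   # 195.3125
print(memory_per_chip_mib)  # 128.0
```

Under these assumptions each processor would need to sustain roughly 195 MIPS, and each chip would carry at least 128 MiB of memory.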
Fault-tolerance
• Redundancy
• Fault-detection and Isolation
• Fault-recovery
• Min. single-point-of-failure
• Run-time configuration
• Run-time recovery
• Run-time application loading
Low-power
• Hardware
  – Asynchronous Communication
  – Low-power ARM968
• Software
  – Asynchronous Event-Driven Model
Standard Application Model
• Sleepy processors
• Event-driven application
• No scheduler
• No software threads
• Only ISRs
• Driven by interrupts
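The application model above can be illustrated with a toy simulation. This is a minimal sketch, not SpiNNaker code: the `SleepyProcessor` class, its method names and the two interrupt names are all hypothetical, and a Python queue stands in for the hardware interrupt controller.

```python
from collections import deque


class SleepyProcessor:
    """Toy model of a core that only runs interrupt service routines (ISRs)."""

    def __init__(self):
        self.handlers = {}      # interrupt name -> ISR
        self.pending = deque()  # interrupts raised by the (simulated) hardware
        self.log = []

    def on(self, interrupt, isr):
        """Register an ISR; there is no scheduler and no thread to create."""
        self.handlers[interrupt] = isr

    def raise_interrupt(self, interrupt, payload=None):
        self.pending.append((interrupt, payload))

    def run(self):
        """Service every pending interrupt, then go back to sleep."""
        while self.pending:
            interrupt, payload = self.pending.popleft()
            self.handlers[interrupt](self, payload)
        self.log.append("sleep")


def on_packet(cpu, payload):
    cpu.log.append(f"packet:{payload}")


def on_timer(cpu, _payload):
    cpu.log.append("timer")


cpu = SleepyProcessor()
cpu.on("packet", on_packet)
cpu.on("timer", on_timer)
cpu.raise_interrupt("packet", 42)
cpu.raise_interrupt("timer")
cpu.run()
print(cpu.log)  # ['packet:42', 'timer', 'sleep']
```

All work happens inside handlers, and the processor returns to sleep as soon as no interrupt is pending, which is where the power saving would come from.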
Configuration Process-I
• Min Boot-ROM code
• POST+chip components initialization
• Batch mode
Flowchart: POST → Load Boot code in TCM → Select Monitor Proc. → Configure Interrupts → Monitor? (yes: Configure Chip; no: Go to Sleep)
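The first configuration stage can be sketched as a small function. This is a simplified illustration under stated assumptions: the slides do not say how the monitor processor is selected, so the sketch simply picks the first processor that passes POST, and the role name `"disabled"` for a processor that fails POST is hypothetical.

```python
def boot_chip(post_results):
    """Return the role of each processor after a simplified boot sequence.

    post_results: list of booleans, one per processor (True = POST passed).
    Assumption: the first healthy processor becomes the monitor.
    """
    roles = []
    monitor_chosen = False
    for passed in post_results:
        if not passed:
            roles.append("disabled")   # failed POST: left out of the system
        elif not monitor_chosen:
            monitor_chosen = True
            roles.append("monitor")    # goes on to configure the chip
        else:
            roles.append("sleeping")   # waits for interrupts
    return roles


print(boot_chip([False, True, True, True]))
# ['disabled', 'monitor', 'sleeping', 'sleeping']
```

Whatever the real selection rule is, the property the sketch preserves is that exactly one healthy processor per chip ends up as monitor.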
Configuration Process-II
• Event-driven Model
• Real-time Configuration
• Processors on Sleep
Flowchart nodes: Recovery; Host Chip? (yes / no); Host System Comm.; Frame + Packet Comm.; Packet Comm.; Assign (0, 0); Assign (x, y); Status to Host Chip; Acc. Status to Host; Conf. Router (on both branches)
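The "Assign (0, 0)" and "Assign (x, y)" steps suggest that the host chip takes the origin and every other chip derives its position relative to a neighbour. The slides do not describe the mechanism, so the following is only one plausible sketch: the link names, the offsets in `LINK_OFFSETS`, the wrap-around (torus) topology and the function name are all assumptions made for illustration.

```python
# Offset of the sender as seen from the receiver, per incoming link.
# Link names and offsets are hypothetical, chosen for illustration.
LINK_OFFSETS = {
    "E": (1, 0), "NE": (1, 1), "N": (0, 1),
    "W": (-1, 0), "SW": (-1, -1), "S": (0, -1),
}


def assign_coordinates(sender_xy, arrived_on, width, height):
    """Derive this chip's (x, y) from a neighbour's coordinates.

    arrived_on names the local link the packet came in on, so the sender
    sits at our position plus LINK_OFFSETS[arrived_on], with wrap-around.
    """
    dx, dy = LINK_OFFSETS[arrived_on]
    sx, sy = sender_xy
    return ((sx - dx) % width, (sy - dy) % height)


# A chip that hears from the host chip (0, 0) on its west link is at (1, 0).
print(assign_coordinates((0, 0), "W", 4, 4))  # (1, 0)
```

Applied repeatedly as packets spread outwards from the host chip, a rule of this kind would give every reachable chip a unique address before its router is configured.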
Flood-fill Mechanism
• Event-driven model
• Droplets of data block to origin chip(s)
• A pipelined wave of data from origin(s) to other chips
Animation panels: 1 Ethernet Connection; 2 Ethernet Connections
animations from http://physics-animations.com/Physics/English/int_ref.htm#Wlb
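The benefit of a pipelined wave can be estimated with a simple idealised model. The sketch assumes that every hop takes one time step per block and that a chip forwards a block as soon as it arrives; neither assumption comes from the slides, and the function names and example numbers are illustrative.

```python
def pipelined_steps(hops, blocks):
    """Steps until the farthest chip holds every block, with pipelining."""
    return hops + blocks - 1


def store_and_forward_steps(hops, blocks):
    """Steps if each chip waits for the whole image before forwarding it."""
    return hops * blocks


print(pipelined_steps(hops=10, blocks=24))          # 33
print(store_and_forward_steps(hops=10, blocks=24))  # 240
```

In this model the pipelined load time grows with the sum of distance and image size rather than their product, which is why a wave of small blocks scales well to large machines.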
Flood-fill Mechanism
• Various mechanisms
  – Broadcast
  – 5 chips forward
  – 3 chips forward
  – 2 chips forward
• Performance vs. robustness
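The trade-off between the forwarding variants can be explored with a small simulation. This is a sketch under several assumptions that the slides do not confirm: chips sit on a square torus with six links each, every chip forwards a block exactly once, and a reduced fan-out keeps the first links in the `DIRECTIONS` list. The function name and the link ordering are hypothetical.

```python
from collections import deque

# Six inter-chip links per chip; the ordering (which links a reduced
# fan-out keeps) is an assumption made for this sketch.
DIRECTIONS = [(1, 0), (0, 1), (1, 1), (-1, 0), (0, -1), (-1, -1)]


def flood_fill(size, fanout, origins=((0, 0),)):
    """Simulate one data block spreading over a size x size torus.

    Every chip forwards the block once, on its first `fanout` links.
    Returns (hops to the farthest chip, messages sent, chips reached).
    """
    links = DIRECTIONS[:fanout]
    arrival = {origin: 0 for origin in origins}
    frontier = deque(origins)
    messages = 0
    while frontier:
        x, y = frontier.popleft()
        for dx, dy in links:
            messages += 1
            neighbour = ((x + dx) % size, (y + dy) % size)
            if neighbour not in arrival:  # duplicate copies are dropped
                arrival[neighbour] = arrival[(x, y)] + 1
                frontier.append(neighbour)
    return max(arrival.values()), messages, len(arrival)


for name, fanout in [("2msg", 2), ("3msg", 3), ("5msg", 5), ("bcast", 6)]:
    print(name, flood_fill(8, fanout))
```

In this model a larger fan-out costs more messages per chip but means each chip hears the block from more neighbours, so a single faulty link is less likely to cut a chip off. Passing two entries in `origins` mimics the case with two Ethernet connections.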
Evaluation
• SystemC system-level model
• Cycle-accurate
• Instruction-accurate
• 129706 cycles for configuration process-I
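To put the cycle count in perspective, it can be converted to wall-clock time for a given clock frequency. The slides do not state the processor clock, so the 200 MHz used below is purely an assumed value, and the helper name is illustrative.

```python
def cycles_to_microseconds(cycles, clock_hz):
    """Convert a cycle count to microseconds at the given clock frequency."""
    return cycles / clock_hz * 1e6


# 129706 cycles is the figure from the slide; 200 MHz is an assumption.
print(round(cycles_to_microseconds(129706, 200e6), 2))  # 648.53
```

At that assumed clock, configuration process-I would take well under a millisecond per chip.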
Evaluation
a) Impact of System Size
Figure: configuration time in cycles (thousands, axis 200 to 800) against system size (32x32, 64x64, 128x128, 256x256), for 1 Ethernet connection and an 8 KB application + 16 KB data. Series: 2msg, 3msg, 5msg, bcast.
Evaluation
b) Impact of Data Size
Figure: booting time (millions of cycles, axis 0 to 20) against application + data size (32KB + 32KB, 32KB + 64KB, 32KB + 128KB, 32KB + 256KB, 32KB + 512KB). Series: 2msg, 3msg, 5msg, bcast.
Conclusions