ReCoSoC 2013 – 8th International Workshop on Reconfigurable Communication-centric Systems-on-Chip
Centralized Traffic Monitoring for online-resizable Clusters in Networks-on-Chip
Philipp Gorski, Dirk Timmermann
Institute of Applied Microelectronics and Computer Engineering University of Rostock, Germany
{philipp.gorski2, dirk.timmermann}@uni-rostock.de
College of Computer Science and Electrical Engineering Institute of Applied Microelectronics and Computer Engineering
Outline
• Introduction
– Networks-on-Chip (NoC)
• Traffic Monitoring
– Basic Concept
– Dual NoC Infrastructure
– Hardware/Software-based Clustering
– Sensing and Flow
– Experimental Results
• Outlook & Future Work
Introduction – Networks-on-Chip
[Figure: NoC with 2D-mesh topology (Nx=4, Ny=4). Each CELL/TILE CORE[x,y] is attached through a Network Interface (NI) to a Router (R) of the 2D-mesh Data-NoC; each router has NORTH, EAST, SOUTH and WEST ports with link I/O (IN/OUT). LINKWIDTH = 8, 16, 32, 64, …]
• NoC Routers (R), Links, Network-Interfaces (NI), Topology
Introduction – Why Traffic Monitoring?
• Increasing parallelism in chip multiprocessors (CMP): 64, 128, 256, 512, … IP cores
• Communication becomes dominant for performance and energy consumption
• Runtime-based management mechanisms need current traffic information
– Application Mapping / Workload Management
– Adaptive Routing / Traffic Management
– Application Profiling / Debugging
– Wear-out and Degradation Management
• To achieve cooperative interaction of these mechanisms, a common information base is needed
Traffic Monitoring – Basic Concept
• Design criteria:
– Monitoring inside a Region of Interest
– Monitoring with a Resolution of Interest
– Reuse of Information/Infrastructure
• Flexible hardware/software solution
– Counter-based activity sensing/aggregation in hardware
– Monitoring data evaluation and management in software
• Centralized collection of all link- and pathloads inside a defined Region
– All loads scaled to 0-100% with resolution of 1, 2 or 4% in hardware (ready to use)
• Adjustable timing for each Region/Cluster
– 10³ to 10⁵ clock cycles for a full data set
Traffic Monitoring – Dual NoC Infrastructure
[Figure: 4x4 mesh (NX CELLs by NY CELLs, tile size atile x btile) with two overlaid networks, Data-NoC and System-NoC. Legend: R = Router Node, DNI = Data-NoC Network Interface, SNI = System-NoC Network Interface; Master Cores are highlighted]
• Data-NoC for application data
• System-NoC for monitoring
• Minimal design and runtime interferences
• Each IP core now has two NIs
– Data-NoC Interface (DNI)
– System-NoC-Interface (SNI)
• Smallest management unit is the CELL/TILE
• Two types of CELLs
– Master (CPU)
– Slave
• Full reuse!
Traffic Monitoring – HW/SW-based Clustering
• Rectangular-shaped group of CELLs
– Lower Left Corner (LLC)
– Upper Right Corner (URC)
– At least one Master CELL
• No overlapping of Clusters
• One Master CELL per Cluster runs the monitoring software
• Each CELL monitors its own loads
– Cluster paths
– Router links
• Hardware extensions needed at each CELL
• Maximum Cluster size is NCLmax CELLs (16 and 64 analysed)
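The clustering rules above can be sketched in software as a small validity check. This is only an illustration, not the monitoring software of the talk: the class and function names, the (x, y) coordinate order and the Master positions are assumptions, while the three Cluster rectangles are the ones labelled in the example figure.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

Coord = Tuple[int, int]  # (x, y) position of a CELL in the mesh (assumed order)


@dataclass(frozen=True)
class Cluster:
    llc: Coord  # Lower Left Corner
    urc: Coord  # Upper Right Corner

    def size(self) -> int:
        """Number of CELLs inside the rectangle."""
        return (self.urc[0] - self.llc[0] + 1) * (self.urc[1] - self.llc[1] + 1)

    def contains(self, cell: Coord) -> bool:
        return (self.llc[0] <= cell[0] <= self.urc[0]
                and self.llc[1] <= cell[1] <= self.urc[1])

    def overlaps(self, other: "Cluster") -> bool:
        """True if the two rectangles share at least one CELL."""
        return not (self.urc[0] < other.llc[0] or other.urc[0] < self.llc[0]
                    or self.urc[1] < other.llc[1] or other.urc[1] < self.llc[1])


def is_valid_cluster(new: Cluster, existing: Iterable[Cluster],
                     masters: Iterable[Coord], ncl_max: int) -> bool:
    """Check the rules: well-formed rectangle, size limit, a Master inside, no overlap."""
    well_formed = new.llc[0] <= new.urc[0] and new.llc[1] <= new.urc[1]
    if not well_formed or new.size() > ncl_max:
        return False
    if not any(new.contains(m) for m in masters):
        return False
    return not any(new.overlaps(c) for c in existing)


# Cluster rectangles from the example figure; the Master positions are made up.
cluster1 = Cluster(llc=(0, 2), urc=(1, 3))
cluster2 = Cluster(llc=(2, 2), urc=(3, 3))
cluster3 = Cluster(llc=(0, 0), urc=(3, 1))
masters = [(0, 2), (2, 2), (0, 0)]
```

With these definitions, `is_valid_cluster(cluster3, [cluster1, cluster2], masters, ncl_max=16)` accepts the 4x2 Cluster, while a rectangle that shares a CELL with Cluster 1 is rejected.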
[Figure: 4x4 dual-NoC mesh (NX CELLs by NY CELLs) partitioned into three Clusters: 2x2 CLUSTER 1 [LLC=(0,2); URC=(1,3)], 2x2 CLUSTER 2 [LLC=(2,2); URC=(3,3)] and 4x2 CLUSTER 3 [LLC=(0,0); URC=(3,1)]]
Traffic Monitoring – Sensing and Flow
• Hierarchical aggregation structure of 3 interacting stages:
– Stage 1 – Traffic Sensors: Activity sensing of links and paths at CELLs
• Path Sensors use the REQ/ACK signals of the DNI for destinations inside the region → REAL LOAD!
• Link Sensors use the lock signals of the arbiter → REAL LOAD + CONGESTION!
• Counting of active clock cycles until an overflow happens at a defined bound
– Stage 2 – SNI-Extension: Periodic checking and reporting at CELLs
• A finite state machine periodically checks the Traffic Sensors for overflows → Sensor Check Period
• Generates and transmits monitoring packets to the Master if overflows happened
– Stage 3 – Event Aggregation Point: Master counts reported overflows
• Each Traffic Sensor has a corresponding overflow counter at the Event Aggregation Point (EAP)
• Event Aggregation Point only at Master CELLs
• Periodic access of counter values via Core Interface (CI) → Monitoring Cycle
Traffic Monitoring – Sensing and Flow
[Figure: monitoring data flow inside a CELL. Stage 1: Traffic Sensors fed by PATH-ENABLE (from the PATH-LUT and the REQ/ACK handshake of the Packetization Unit @ DNI) and by LINK-ENABLES (LINK_BUSY signals of the Data-NoC). Stage 2: SNI-Extension reading the OFGs, resetting the sensors and sending packets through the Packetization Unit @ SNI; CONFIG holds MC-ADR, T-MODES, LLC, URC and GROUP-ID. Stage 3: Event Aggregation Point (Counter Array) behind the Depacketization Unit @ SNI, accessed by the IP-CORE via EAP-CI. Paths I. (Data-NoC) and II. (System-NoC)]
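Traffic Monitoring – Sensing and Flow
• Stage 1 – Traffic Sensors

$$
cnt_j(t_{i+1}) =
\begin{cases}
cnt_j(t_i) + 1 & (s_{ctrl} = 1) \wedge (cnt_j(t_i) < b_{T\text{-}MODE}) \\
cnt_j(t_i) & (s_{ctrl} = 0) \\
0 & (s_{ctrl} = 1) \wedge (cnt_j(t_i) = b_{T\text{-}MODE})
\end{cases}
$$

$$
OFG_j(t_{i+1}) =
\begin{cases}
1 & (s_{ctrl} = 1) \wedge (cnt_j(t_i) = b_{T\text{-}MODE}) \\
0 & (rst = 1) \\
OFG_j(t_i) & \text{else}
\end{cases}
$$

$$
t_{i+1} = t_i + t_{clk}
$$

[Figure: Traffic Sensor built from a 12-bit COUNTER (CLK, RESET, ENABLE) and a COMPARATOR that selects one of the counter bits 6 to 11 according to T-MODE and drives OFG (cleared by OFG-RESET)]

Activity Timing in Clock Cycles
T-MODE 0   2048   (counter bit 11)
T-MODE 1   1024   (counter bit 10)
T-MODE 2   512    (counter bit 9)
T-MODE 3   256    (counter bit 8)
T-MODE 4   128    (counter bit 7)
T-MODE 5   64     (counter bit 6)

Implementation:
• 12-bit counter
• 12-bit comparator
• ENABLE = sctrl
• Six bT-MODE [64 – 2048]
• OFG = Overflow bit
• j < (NCLmax + 5) per CELL (NCLmax path sensors plus 5 link sensors)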
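A behavioural sketch of one Traffic Sensor may help to follow Stage 1. It is not the hardware description from the talk: the class name and method names are hypothetical, and it assumes that the overflow bit is raised, and the counter restarted, in the cycle in which the bT-MODE-th active clock cycle is counted.

```python
class TrafficSensor:
    """Behavioural model of one counter-based Traffic Sensor (illustrative only)."""

    VALID_BOUNDS = (64, 128, 256, 512, 1024, 2048)  # the six bT-MODE values

    def __init__(self, b_t_mode: int) -> None:
        if b_t_mode not in self.VALID_BOUNDS:
            raise ValueError("bT-MODE must be one of %s" % (self.VALID_BOUNDS,))
        self.bound = b_t_mode
        self.cnt = 0      # count of active clock cycles since the last overflow
        self.ofg = False  # overflow bit (OFG)

    def clock(self, s_ctrl: bool) -> None:
        """Advance one clock cycle; s_ctrl is the ENABLE input (link or path active)."""
        if not s_ctrl:
            return  # inactive cycle: counter and OFG keep their values
        self.cnt += 1
        if self.cnt == self.bound:
            self.cnt = 0     # restart counting (assumed timing)
            self.ofg = True  # remember the overflow until it is read

    def read_and_reset_ofg(self) -> bool:
        """Return the overflow bit and clear it, as the SNI-Extension would do."""
        ofg, self.ofg = self.ofg, False
        return ofg
```

For example, a sensor created with `TrafficSensor(64)` raises its overflow bit after 64 active cycles, no matter how many idle cycles lie in between.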
Traffic Monitoring – Sensing and Flow
• Stage 2 – SNI-Extension

$$
check(t_{sp}) = \left( \sum_{j=0}^{N_S - 1} OFG_j(t_{sp}) \right) > 0
$$

$$
t_{sp} = b_{T\text{-}MODE} \cdot t_{clk}
$$

[Figure: SNI-Extension. Traffic Sensors TS – PATH 0 … TS – PATH (NCLmax-1) and TS – LINK NORTH, EAST, SOUTH, WEST and CORE are enabled by PATH-ENABLE (== ACK) with DST and by the NORTH/EAST/SOUTH/WEST/CORE_LINK_BUSY signals. An FSM with TIMER (PERIOD-TRIGGER), OFG-CHECK, CLUSTER-CHECK + ID-GENERATOR (LLC, URC, GROUP-ID) and CONFIG (T-MODES) drives SELECT, RESET and DATA-OUT. FSM: Sensor Check Period]

Implementation:
• Check via XORing all OFG
• Read & reset all OFG
• Packet composition via FSM
• bT-MODE set during Cluster setup
• bT-MODE equal at all Cluster CELLs
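The periodic check of Stage 2 can be sketched as one function call per Sensor Check Period. The function name and the packet layout (a dictionary with a GROUP-ID and the overflow vector) are assumptions made for this illustration; the talk does not specify the packet format here.

```python
from typing import Dict, List, Optional, Tuple


def sensor_check(ofg_bits: List[bool],
                 group_id: int) -> Tuple[List[bool], Optional[Dict[str, object]]]:
    """Model one Sensor Check Period at a CELL.

    Returns the cleared OFG bits and, if at least one overflow was pending,
    a monitoring packet for the Master (hypothetical layout).
    """
    packet = None
    if any(ofg_bits):  # at least one Traffic Sensor overflowed
        packet = {"group_id": group_id, "ofg_vector": list(ofg_bits)}
    cleared = [False] * len(ofg_bits)  # read & reset all OFG
    return cleared, packet
```

A CELL with no pending overflow therefore sends nothing, which keeps the System-NoC idle when the Data-NoC is idle.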
Traffic Monitoring – Sensing and Flow
• Stage 3 – Event Aggregation Point

$$
load_j(t_{sp+1}) =
\begin{cases}
load_j(t_{sp}) + 1 & (OFG_j(t_{sp}) = 1) \wedge (rst = 0) \\
load_j(t_{sp}) & (OFG_j(t_{sp}) = 0) \wedge (rst = 0) \\
0 & (rst = 1)
\end{cases}
$$

[Figure: Event Aggregation Point. An EVENT BUFFER receives incoming packets; GROUP-SELECT routes each packet by its GROUP ID to one of the counter GROUPs 0 … (NCLmax-1), each holding Event Counter 0 … Event Counter (NS-1). The EAP-CI (Registers + Bus-Interface) reads the GROUP VALUES and issues GROUP-RESETS. Output: Unscaled Loads!]

Implementation:
• 7-bit counter GROUPs
• (NCLmax + 5) counters per GROUP
• NCLmax GROUPs (one for each CELL)
• NCLmax = [16, 64] analysed
• Event Buffer stores incoming packets
• Vector order = Counter order
• Periodic read and reset by software
• Intermediate read by software
• Direct access via Core Interface (CI)
• Scaling of loads via Access Timing!
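The counter array of Stage 3 can be modelled in a few lines. The sketch below is an assumption-laden illustration: the class and method names are hypothetical, and the counters are made to saturate at the 7-bit maximum, which the slides do not state.

```python
from typing import List


class EventAggregationPoint:
    """Counter array at a Master CELL (illustrative model, not the RTL)."""

    COUNTER_MAX = 127  # largest value of a 7-bit counter

    def __init__(self, ncl_max: int) -> None:
        self.num_sensors = ncl_max + 5  # NCLmax path sensors + 5 link sensors
        # One GROUP of event counters per CELL of the Cluster.
        self.groups = [[0] * self.num_sensors for _ in range(ncl_max)]

    def report(self, group_id: int, ofg_vector: List[bool]) -> None:
        """Count the overflows reported by one monitoring packet."""
        counters = self.groups[group_id]
        for j, overflowed in enumerate(ofg_vector):
            if overflowed:
                # Saturation is an assumption; real hardware may behave differently.
                counters[j] = min(counters[j] + 1, self.COUNTER_MAX)

    def read(self, group_id: int) -> List[int]:
        """Intermediate read: counters stay untouched."""
        return list(self.groups[group_id])

    def read_and_reset(self, group_id: int) -> List[int]:
        """Periodic read at the end of a Monitoring Cycle."""
        values = self.read(group_id)
        self.groups[group_id] = [0] * self.num_sensors
        return values
```

Because the vector order equals the counter order, a packet can be applied to its GROUP without any lookup beyond the GROUP-ID.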
Traffic Monitoring – Sensing and Flow
• Load Scaling via Access Timing

$$
sload_j =
\begin{cases}
1 \cdot load_j & (k_s = 1) \wedge (N_{SP} = 100) \\
2 \cdot load_j & (k_s = 2) \wedge (N_{SP} = 50) \\
4 \cdot load_j & (k_s = 4) \wedge (N_{SP} = 25)
\end{cases}
$$

$$
T_{MC} = N_{SP} \cdot t_{sp}
$$

$$
N_{SP} = 100 / k_s
$$

– Stepping ks = {1, 2, 4} → sload = 0:ks:100 in % of max BW per Link/Path
– NSP = number of Sensor Periods tsp = f(tclk, bT-MODE) per Monitoring Cycle
– TMC = Monitoring Cycle
Scaled Loads!
• min(bT-MODE) = f(Cluster size)
– Traffic Monitoring → Many-to-One pattern
– Condition: (Injected BW in Cluster < Receivable BW at Master)
• 16 CELLs → min(bT-MODE) = 128
• 64 CELLs → min(bT-MODE) = 1024
– If the condition is met → sload ± emax with emax ≤ 2∙ks
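The scaling rule above amounts to simple integer arithmetic in the monitoring software. The following sketch uses hypothetical function names and assumes that an event count is clamped to 100 %, which the slides do not state explicitly.

```python
VALID_STEPPINGS = (1, 2, 4)  # ks in percent


def sensor_periods(ks: int) -> int:
    """NSP = 100 / ks: number of Sensor Periods per Monitoring Cycle."""
    if ks not in VALID_STEPPINGS:
        raise ValueError("stepping ks must be 1, 2 or 4")
    return 100 // ks


def scaled_load(event_count: int, ks: int) -> int:
    """Convert an EAP event count into a load in % of the maximum bandwidth."""
    sensor_periods(ks)  # validates ks
    return min(100, ks * event_count)  # clamping is an assumption
```

With ks = 4, a counter that saw an overflow in each of the 25 Sensor Periods reads 25 and scales to 100 %; with ks = 1, the counter value is already the load in percent.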
Traffic Monitoring – Experimental Results
• Hardware Overhead of the Traffic Monitoring in 45nm

[Figure: dual NoC infrastructure with Data-NoC, System-NoC and Master Cores, repeated from the Dual NoC Infrastructure slide]

Total Area Overhead per TILE @ 45nm for tclk=1ns, NX=8 and NY=8
NCLmax             16 CELLs           64 CELLs
SNoC Linkwidth     8-bit    16-bit    8-bit    16-bit
Amaster / Atile    0.51%    0.66%     3.11%    3.26%
Aslave / Atile     0.29%    0.44%     0.37%    0.52%
– Aslave = Area(RouterSNoC, LinksSNoC, Traffic Sensors, SNI-Extension)
– Amaster = Area(Anormal, EAP)
– Atile = 3mm x 3mm (estimate 45nm Intel SCC)
• EAP and Traffic Sensors dominate the hardware overhead
• But area estimates remain inside a feasible range!
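To put the percentages into absolute terms, the reported ratios can be multiplied by the estimated tile area of 3mm x 3mm. The snippet below only redoes this arithmetic with the values from the table; the dictionary layout and the function name are made up for the example.

```python
TILE_AREA_MM2 = 3.0 * 3.0  # Atile = 3mm x 3mm (estimate from the slide)

# (CELL type, NCLmax, SNoC linkwidth in bit) -> overhead in % of Atile
OVERHEAD_PERCENT = {
    ("master", 16, 8): 0.51, ("master", 16, 16): 0.66,
    ("master", 64, 8): 3.11, ("master", 64, 16): 3.26,
    ("slave", 16, 8): 0.29, ("slave", 16, 16): 0.44,
    ("slave", 64, 8): 0.37, ("slave", 64, 16): 0.52,
}


def overhead_mm2(cell_type: str, ncl_max: int, linkwidth: int) -> float:
    """Absolute area overhead per TILE in square millimetres."""
    return TILE_AREA_MM2 * OVERHEAD_PERCENT[(cell_type, ncl_max, linkwidth)] / 100.0
```

Under this estimate, the largest case (Master CELL, NCLmax = 64, 16-bit SNoC links) corresponds to roughly 0.29 mm² of the 9 mm² tile.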
Traffic Monitoring – Experimental Results
• Full system simulation with mixed workloads
– Random Uniform Traffic Pattern
• Increasing injection rates up to saturation of the NoC
• Packet size = rand(5,15) flits
• Destination = rand(0, Nx·Ny - 1)
– Random Task Graphs with sequential mapping
• 7 to 70 Tasks per Graph
• 2 to 10 Graphs per Workload
• Packet size = rand(5,50) flits
– Each Workload simulated 10 times with 10 full Monitoring Cycles for each Cluster shape (4x4, 8x2, 8x8, 4x16) and stepping (1, 2, 4)
– Logging of average and maximum errors for pathloads (PL) and linkloads (LL)
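The random uniform traffic pattern of the simulation setup can be sketched as follows. This is not the SystemC simulator used for the results: the function name and the packet representation are hypothetical, and both rand() ranges are assumed to be inclusive.

```python
import random
from typing import Dict


def random_uniform_packet(src: int, nx: int, ny: int,
                          rng: random.Random) -> Dict[str, int]:
    """Draw one packet of the random uniform traffic pattern."""
    return {
        "src": src,
        "dst": rng.randint(0, nx * ny - 1),  # Destination = rand(0, Nx*Ny - 1)
        "flits": rng.randint(5, 15),         # Packet size = rand(5, 15) flits
    }
```

Passing a seeded `random.Random` instance, for example `random.Random(42)`, makes a workload reproducible across the repeated simulation runs.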
Traffic Monitoring – Experimental Results
• Max. error in 4x4 CELL Cluster @ Random: min(bT-MODE) = 128
[Plot: absolute error emax (axis 0 to 9) over injection rate (0 to 0.3 flit per clock cycle per IP core); series PL-1, LL-1, PL-2, LL-2, PL-4, LL-4; curve groups annotated ks = 4, ks = 2, ks = 1]
Traffic Monitoring – Experimental Results
• Max. error, all 16 CELL Cluster scenarios: min(bT-MODE) = 128
[Bar chart: absolute error emax (axis 0 to 9); series "maximum error" and "max. average error"; bar groups annotated ks = 4, ks = 2, ks = 1. Bar labels in extraction order: 1.95, 1.98, 1.88, 1.98; 3.88, 3.97, 3.81, 3.97; 7.94, 7.94, 7.94, 7.94]
Traffic Monitoring – Experimental Results
• Max. error, all 64 CELL Cluster scenarios: min(bT-MODE) = 1024
[Bar chart: absolute error emax (axis 0 to 9); series "maximum error" and "max. average error"; bar groups annotated ks = 4, ks = 2, ks = 1. Bar labels in extraction order: 1.23, 1.62, 1.60, 1.95; 2.81, 3.28, 3.47, 3.97; 5.83, 7.05, 7.14, 7.91]
Conclusion & Future Work
• Flexible HW/SW traffic monitoring proposed
– All path- and linkloads inside a resizable region at a single entity
– Adjustable timing and accuracy
– Hardware overhead and achievable Monitoring Cycles are feasible
• Next steps and investigations → Runtime Mechanisms
– Workload/Application Profiling (Rent's Rule)
• Communication Distributions/Probabilities
• Execution Phase Detection
• Combination with Performance Counter Data
– Flow-based Traffic Management
• Path adaptations inside Clusters
THANK YOU!
Traffic Monitoring – Experimental Results
Parameter              Value
NoC                    SNoC          DNoC
Linkwidth wL           8-, 16-bit    64-bit
Clock Rate tclk        1ns
Port Buffer Depth      1 flit        5 flit
bT-MODE                64 – 2048 (Simulation: 128, 1024)
Cluster Size NCLmax    16 and 64
Cluster Shape          4×4, 8×2 and 16×4, 8×8
Monitor Position       Lower Left Corner (LLC)
Technology             45nm (Nangate FreePDK45)

• 45nm hardware synthesis via Synopsys Design Compiler
– Estimation of hardware overhead
• SystemC-based cycle-accurate NoC simulation
– Measurement of absolute traffic monitoring error (max & avg)
Traffic Monitoring – Basic Concept
[Figure: layered monitoring and control loop. SYSTEM-MONITORING (Sensors, Data Aggregation, Parameter Aggregation, Monitoring Evaluations, Prognostic Services) and SYSTEM-CONTROL (Actors, System Adaptations, Algorithm Reconfiguration) span the levels TILE/CELL, CLUSTER/REGION and GLOBAL SYSTEM-LEVEL, divided into HARDWARE and SOFTWARE]
• Research scope → More software-defined and cooperative runtime optimization
Introduction – Networks-on-Chip
[Figure: Network Interface (NI) between CORE and Router. Core Input Buffer and Core Output Buffer @ NI; Packetization Unit @ NI (PATH-LUT, LOCK, DST) towards the Router Input Port (CORE) and Depacketization Unit @ NI from the Router Output Port (CORE), each with REQ, ACK and FLIT DATA signals. Packet with N flits: 0 header, 1 payload, 2 payload, ..., (N-1) tail]
Traffic Monitoring – Sensing and Flow
Monitoring Cycle TMC [µs] for tclk=1ns
bT-MODE    ks=1     ks=2     ks=4
64         6.4      3.2      1.6
128        12.8     6.4      3.2
256        25.6     12.8     6.4
512        51.2     25.6     12.8
1024       102.4    51.2     25.6
2048       204.8    102.4    51.2

• Timing of Traffic Monitoring offers two options:
– Adjustment of bT-MODE → at all Cluster CELLs
– Adjustment of Stepping ks → at Master CELL only
– Tradeoff: Effort vs. Accuracy
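The table values follow from the Monitoring Cycle being 100/ks Sensor Periods of bT-MODE clock cycles each. A short sketch that recomputes them, with a made-up function name and the clock period as a parameter:

```python
T_MODES = (64, 128, 256, 512, 1024, 2048)  # bT-MODE values in clock cycles
STEPPINGS = (1, 2, 4)                      # ks in percent


def monitoring_cycle_us(b_t_mode: int, ks: int, tclk_ns: int = 1) -> float:
    """Monitoring Cycle in microseconds: (100 / ks) * bT-MODE * tclk."""
    n_sp = 100 // ks  # Sensor Periods per Monitoring Cycle
    return n_sp * b_t_mode * tclk_ns / 1000


# One row per bT-MODE, one column per stepping, as in the table above.
table = {b: [monitoring_cycle_us(b, ks) for ks in STEPPINGS] for b in T_MODES}
```

Halving the resolution (doubling ks) at the Master CELL halves the Monitoring Cycle without touching the sensors, whereas changing bT-MODE has to be applied at all Cluster CELLs.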
Traffic Monitoring – Experimental Results
• Pseudoload, 8x8 CELL Cluster scenarios: min(bT-MODE) = 1024
[Plot: average link load [%] (axis 0 to 50) over average injected load per core [%] (axis 0 to 30); series Linkload Sensors and Pathload Sensors]
Traffic Monitoring – Experimental Results
• Pseudoload, 4x4 CELL Cluster scenarios: min(bT-MODE) = 128, ks = 1
[Plot: average link load [%] (axis 0 to 70) over average injected load per core [%] (axis 0 to 80); series Linkload Sensors and Pathload Sensors; annotations: Pseudoload, Congestion, External traffic]
Introduction – Networks-on-Chip
• Networks-on-Chip (NoC)
– Packet-based and globally asynchronous on-chip communication
– Replacement of bus-based interconnections → Scalability & Parallelism
• Basic elements of the NoC:
– IP core = Computational resource that communicates via the NoC
– Network-Interface (NI) = Connection of IP core and NoC for reception/transmission of packets
– Router (R) = Switching unit that leads packets through the NoC from source to destination IP core
– Link = Bidirectional point-to-point connection between Routers
– Topology = Connection graph of IP cores, Routers and Links
• Scalable on-chip communication for Chip Multiprocessor (CMP) Systems
Introduction – Networks-on-Chip
[Figure: Router pipeline from input to output port (LINK IN → PORT IN → CROSSBAR → PORT OUT → LINK OUT) with the stages incoming transmission, buffering (IBUF) + routing (RL), crossbar traversal (CX), arbitration (ARB) and outgoing transmission; ports NORTH, EAST, SOUTH, WEST and CORE with req/ack/flit signals. Packet with N flits: 0 header, 1 payload, 2 payload, ..., (N-1) tail]
• Here used:
– 2D-Mesh Topology
– XY-Routing
– Wormhole Switching and REQ/ACK Flow Control
– Input Buffering
– Round Robin Arbitration
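As an illustration of the XY-Routing named above, the following sketch computes the output port at each router of a 2D mesh. It is the textbook dimension-ordered scheme, not code from the talk; the function names are hypothetical, and it assumes that x grows towards EAST and y towards NORTH.

```python
from typing import List, Tuple

Coord = Tuple[int, int]  # (x, y) router position in the 2D mesh


def xy_next_port(cur: Coord, dst: Coord) -> str:
    """Output port chosen at router `cur`: resolve the X offset first, then Y."""
    if dst[0] > cur[0]:
        return "EAST"
    if dst[0] < cur[0]:
        return "WEST"
    if dst[1] > cur[1]:
        return "NORTH"
    if dst[1] < cur[1]:
        return "SOUTH"
    return "CORE"  # arrived: deliver to the local IP core


def xy_route(src: Coord, dst: Coord) -> List[str]:
    """List of output ports a packet takes from `src` until it reaches `dst`."""
    step = {"EAST": (1, 0), "WEST": (-1, 0), "NORTH": (0, 1), "SOUTH": (0, -1)}
    ports = []
    cur = src
    while True:
        port = xy_next_port(cur, dst)
        ports.append(port)
        if port == "CORE":
            return ports
        cur = (cur[0] + step[port][0], cur[1] + step[port][1])
```

Because every packet finishes its X movement before turning into the Y dimension, the route between two CELLs is fixed, which is what allows a path load to be attributed to a known set of links.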