
ReCoSoC 2013 – 8th International Workshop on Reconfigurable Communication-centric Systems-on-Chip

Centralized Traffic Monitoring for online-resizable Clusters in Networks-on-Chip

Philipp Gorski, Dirk Timmermann

Institute of Applied Microelectronics and Computer Engineering University of Rostock, Germany

{philipp.gorski2, dirk.timmermann}@uni-rostock.de

College of Computer Science and Electrical Engineering Institute of Applied Microelectronics and Computer Engineering


Outline

• Introduction

– Networks-on-Chip (NoC)

• Traffic Monitoring

– Basic Concept

– Dual NoC Infrastructure

– Hardware/Software-based Clustering

– Sensing and Flow

– Experimental Results

• Outlook & Future Work


Introduction – Networks-on-Chip

[Figure: NoC with 2D-MESH TOPOLOGY (Nx=4, Ny=4). A 4×4 array of CELLs/TILEs, CORE[0,0] to CORE[3,3], where each CORE attaches through a Network Interface (NI) to a Router (R) and neighboring Routers are connected by Links. Detail of one CELL/TILE: the 2D-MESH Data-NoC Router with NORTH, EAST, SOUTH, WEST and CORE ports, each a LINK I/O with IN and OUT directions; LINKWIDTH = 8, 16, 32, 64, …]

• NoC Routers (R), Links, Network-Interfaces (NI), Topology


Introduction – Why Traffic Monitoring?

• Increasing parallelism in CMPs (64, 128, 256, 512, … IP cores)

• Communication increasingly dominates performance and energy consumption

• Runtime management mechanisms need up-to-date traffic information:

– Application Mapping / Workload Management

– Adaptive Routing / Traffic Management

– Application Profiling / Debugging

– Wear-out and Degradation Management

• Cooperative interaction of these mechanisms requires a common information base


Traffic Monitoring – Basic Concept

• Design criteria:

– Monitoring inside a Region of Interest

– Monitoring with a Resolution of Interest

– Reuse of Information/Infrastructure

• Flexible hardware/software solution

– Counter-based activity sensing/aggregation in hardware

– Monitoring data evaluation and management in software

• Centralized collection of all link- and pathloads inside a defined Region

– All loads are scaled in hardware to 0–100% with a resolution of 1, 2 or 4% (ready to use)

• Adjustable timing for each Region/Cluster

– 10³ to 10⁵ clock cycles per full data set


Traffic Monitoring – Dual NoC Infrastructure

[Figure: Dual NoC infrastructure on a mesh of NX × NY CELLs/TILEs (shown for 4×4). Every CORE connects to the Data-NoC through a DNI and to the System-NoC through an SNI; Master Cores are highlighted. Legend: R: Router Node, DNI: Data-NoC Network Interface, SNI: System-NoC Network Interface; tile dimensions a_tile × b_tile]

• Data-NoC for application data

• System-NoC for monitoring

• Minimal interference at design time and at runtime

• Each IP core now has two NIs:

– Data-NoC Interface (DNI)

– System-NoC Interface (SNI)

• The smallest management unit is the CELL/TILE

• Two types of CELLs

– Master (CPU)

– Slave

• Full reuse!


Traffic Monitoring – HW/SW-based Clustering

• Rectangular group of CELLs:

– Lower Left Corner (LLC)

– Upper Right Corner (URC)

– At least one Master CELL

• Clusters do not overlap

• One Master CELL per Cluster runs the monitoring software

• Each CELL monitors its own loads:

– Cluster paths

– Router links

• Hardware extensions are needed at each CELL

• Maximum Cluster size is NCLmax = 16 or 64 CELLs

[Figure: Example clustering of the 4×4 mesh: 2x2 CLUSTER 1 [LLC=(0,2); URC=(1,3)], 2x2 CLUSTER 2 [LLC=(2,2); URC=(3,3)], 4x2 CLUSTER 3 [LLC=(0,0); URC=(3,1)]]
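The clustering rules above can be checked mechanically. The Python sketch below is only an illustration, not the authors' monitoring software. It assumes CELL coordinates (x, y) as used for LLC/URC in the figure, and the names `Cluster` and `validate` are hypothetical. Master CELLs are placed at each LLC, which matches the Monitor Position listed in the simulation parameters.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Cluster:
    """Rectangular Cluster given by two corner CELLs (x, y) (hypothetical type)."""
    llc: tuple[int, int]  # Lower Left Corner
    urc: tuple[int, int]  # Upper Right Corner

    def cells(self) -> set[tuple[int, int]]:
        (x0, y0), (x1, y1) = self.llc, self.urc
        return {(x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)}


def validate(clusters: list[Cluster], masters: set[tuple[int, int]],
             ncl_max: int = 16, nx: int = 4, ny: int = 4) -> list[str]:
    """Return all violated clustering rules; an empty list means valid."""
    errors = []
    for i, c in enumerate(clusters):
        (x0, y0), (x1, y1) = c.llc, c.urc
        if not (0 <= x0 <= x1 < nx and 0 <= y0 <= y1 < ny):
            errors.append(f"cluster {i}: corners outside the mesh or swapped")
            continue
        cells = c.cells()
        if len(cells) > ncl_max:
            errors.append(f"cluster {i}: {len(cells)} CELLs exceed NCLmax={ncl_max}")
        if not cells & masters:
            errors.append(f"cluster {i}: no Master CELL")
    # No overlapping: every pair of Clusters must have disjoint CELL sets.
    for (i, a), (j, b) in combinations(enumerate(clusters), 2):
        if a.cells() & b.cells():
            errors.append(f"clusters {i} and {j} overlap")
    return errors


# The three Clusters from the figure, with Masters assumed at each LLC.
example = [Cluster((0, 2), (1, 3)), Cluster((2, 2), (3, 3)), Cluster((0, 0), (3, 1))]
print(validate(example, masters={(0, 2), (2, 2), (0, 0)}))  # []
```

Together the three example Clusters cover all 16 CELLs of the 4×4 mesh without overlapping.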


Traffic Monitoring – Sensing and Flow

• Hierarchical aggregation structure of 3 interacting stages:

– Stage 1 – Traffic Sensors: Activity sensing of links and paths at CELLs

• Path Sensors use the REQ/ACK signals of the DNI for destinations inside the region → REAL LOAD!

• Link Sensors use the lock signals of the arbiter → REAL LOAD + CONGESTION!

• Active clock cycles are counted until an overflow occurs at a defined bound

– Stage 2 – SNI-Extension: Periodic checking and reporting at CELLs

• A finite state machine periodically checks the Traffic Sensors for overflows → Sensor Check Period

• Generates and transmits monitoring packets to the Master if overflows occurred

– Stage 3 – Event Aggregation Point: Master counts reported overflows

• Each Traffic Sensor has a corresponding overflow counter at the Event Aggregation Point (EAP)

• Event Aggregation Points exist only at Master CELLs

• Periodic access to the counter values via the Core Interface (CI) → Monitoring Cycle


Traffic Monitoring – Sensing and Flow

[Figure: Block diagram of the three stages at one CELL. Stage 1 (Data-NoC side): Traffic Sensors and PATH-LUT, driven by the DNI Packetization Unit (DST, LOCK; REQ/FLIT DATA/ACK at the DNoC Input Port (CORE)), by PATH-ENABLE, and by the LINK-ENABLES derived from the LINK_BUSY signals of the Data-NoC; DNI Depacketization Unit and DNI input/output buffers at the IP-CORE. Stage 2: SNI-Extension, which reads the OFGs, issues RESET and passes monitoring data to the SNI Packetization Unit (SNoC Input Port (CORE)); configuration T-MODES | LLC | URC | GROUP-ID and MC-ADR. Stage 3 (System-NoC side): Event Aggregation Point (Counter Array) behind the SNI Depacketization Unit, with CONFIG and the EAP Core Interface (EAP-CI) to the IP-CORE.]


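Traffic Monitoring – Sensing and Flow • Stage 1 – Traffic Sensors

$$
cnt_j(t_{i+1}) =
\begin{cases}
cnt_j(t_i) + 1 & \text{if } s_{ctrl,j}(t_i) = 1 \wedge cnt_j(t_i) + 1 < b_{T\text{-}MODE} \\
0 & \text{if } s_{ctrl,j}(t_i) = 1 \wedge cnt_j(t_i) + 1 = b_{T\text{-}MODE} \\
cnt_j(t_i) & \text{else}
\end{cases}
$$

$$
OFG_j(t_{i+1}) =
\begin{cases}
1 & \text{if } s_{ctrl,j}(t_i) = 1 \wedge cnt_j(t_i) + 1 = b_{T\text{-}MODE} \\
0 & \text{if } rst = 1 \\
OFG_j(t_i) & \text{else}
\end{cases}
\qquad t_{i+1} = t_i + t_{clk}
$$

[Figure: Traffic Sensor hardware: a 12-bit COUNTER (bits 11 … 0; inputs CLK, RESET, ENABLE) and a COMPARATOR that checks one counter bit selected by the T-MODE: bit 6 for T-MODE 5 [64], bit 7 for T-MODE 4 [128], bit 8 for T-MODE 3 [256], bit 9 for T-MODE 2 [512], bit 10 for T-MODE 1 [1024], bit 11 for T-MODE 0 [2048]; output OFG with OFG-RESET. Time axis: activity timing in clock cycles.]

Implementation:

• 12-bit counter

• 12-bit comparator

• ENABLE = sctrl

• Six bT-MODE values [64 – 2048]

• OFG = overflow bit

• j < (NCLmax + 5) per CELL, i.e., one sensor per Cluster path plus five link sensors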
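A cycle-level model makes the Stage 1 counter behavior concrete. This Python sketch assumes the counter wraps to 0 when it reaches bT-MODE and raises its OFG in that same cycle, so exactly one overflow is produced per bT-MODE active cycles. The class and method names are hypothetical, and the 12-bit register width is not modeled.

```python
from dataclasses import dataclass

# bT-MODE per T-MODE as on the slide: T-MODE 0 -> 2048, ..., T-MODE 5 -> 64.
T_MODE_BOUNDS = {mode: 2048 >> mode for mode in range(6)}


@dataclass
class TrafficSensor:
    """Cycle-level sketch of one Stage 1 Traffic Sensor (counter + OFG)."""
    bound: int    # bT-MODE: active clock cycles per overflow
    cnt: int = 0  # activity counter
    ofg: int = 0  # overflow flag, cleared by the SNI-Extension

    def clock(self, s_ctrl: bool) -> None:
        """Advance one clock cycle (t_i -> t_i + t_clk); s_ctrl drives ENABLE."""
        if not s_ctrl:
            return              # inactive cycle: counter and OFG hold
        if self.cnt + 1 == self.bound:
            self.cnt = 0        # bound reached: wrap the counter ...
            self.ofg = 1        # ... and raise the overflow flag
        else:
            self.cnt += 1

    def read_and_reset(self) -> int:
        """Return the OFG and clear it (rst = 1), as Stage 2 does."""
        flag, self.ofg = self.ofg, 0
        return flag


# A link busy in every second cycle overflows once per 2 * bT-MODE cycles.
# For illustration the OFG is read every cycle here, not every Sensor Check Period.
sensor = TrafficSensor(T_MODE_BOUNDS[4])  # T-MODE 4: bT-MODE = 128
overflows = 0
for cycle in range(4 * 128):
    sensor.clock(s_ctrl=(cycle % 2 == 0))
    overflows += sensor.read_and_reset()
print(overflows)  # 2
```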


Traffic Monitoring – Sensing and Flow • Stage 2 – SNI-Extension

FSM: Sensor Check Period

$$
t_{sp} = b_{T\text{-}MODE} \cdot t_{clk}, \qquad
check(t_{sp}) = \left( \sum_{j=0}^{N_S-1} OFG_j(t_{sp}) \right) > 0
$$

[Figure: SNI-Extension block diagram with FSM, TIMER (PERIOD-TRIGGER), OFG-CHECK, CLUSTER-CHECK + ID-GENERATOR (inputs LLC_CLi and URC_CLi; output GROUP-ID), CONFIG (T-MODES), multiplexers with SELECT, RESET and CTRL, and DATA-OUT. Traffic Sensors: TS – PATH 0 … TS – PATH (NCLmax-1), TS – LINK NORTH, EAST, SOUTH, WEST and TS – LINK CORE, fed by NORTH_, EAST_, SOUTH_, WEST_ and CORE_LINK_BUSY and by PATH-ENABLE == ACK selected via DST.]

Implementation:

• Check via XORing all OFGs

• Read & reset all OFGs

• Packet composition via FSM

• bT-MODE set during Cluster setup

• bT-MODE equal at all Cluster CELLs
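One Sensor Check Period of the SNI-Extension can be sketched as a function that tests whether any OFG bit is set, builds a packet only in that case, and then resets all OFGs. This is a hypothetical Python model. The packet layout (GROUP-ID, OFG vector) and the sensor order (path sensors first, then the five link sensors, as drawn in the block diagram) are assumptions, and the real FSM composes System-NoC flits rather than Python tuples.

```python
from typing import List, Optional, Tuple

Packet = Tuple[int, Tuple[int, ...]]  # (GROUP-ID, OFG vector), hypothetical layout


def sensor_check_period_ns(b_t_mode: int, t_clk_ns: float = 1.0) -> float:
    """tsp = bT-MODE * tclk."""
    return b_t_mode * t_clk_ns


def sensor_check(ofgs: List[int], group_id: int) -> Optional[Packet]:
    """One Sensor Check Period of a hypothetical SNI-Extension FSM.

    ofgs holds the OFG bits of the NCLmax + 5 Traffic Sensors of this CELL
    in counter order. Returns a monitoring packet for the Master, or None if
    no overflow occurred. All OFGs are read and reset in place.
    """
    if not any(ofgs):              # check(tsp) = 0: nothing to report
        return None
    packet = (group_id, tuple(ofgs))
    for j in range(len(ofgs)):     # read & reset all OFGs
        ofgs[j] = 0
    return packet


ncl_max = 16
ofgs = [0] * (ncl_max + 5)         # 16 path sensors + 5 link sensors
ofgs[3] = 1                        # path sensor 3 overflowed
ofgs[ncl_max + 1] = 1              # second link sensor (EAST, by the assumed order)
print(sensor_check(ofgs, group_id=5))
print(sensor_check(ofgs, group_id=5))  # None: the flags were reset
```

Sending nothing when no OFG is set keeps System-NoC traffic proportional to the monitored activity rather than to the number of CELLs.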


Traffic Monitoring – Sensing and Flow • Stage 3 – Event Aggregation Point

$$
load_j(t_{sp+1}) =
\begin{cases}
load_j(t_{sp}) + 1 & \text{if } OFG_j(t_{sp}) = 1 \wedge rst = 0 \\
load_j(t_{sp}) & \text{if } OFG_j(t_{sp}) = 0 \wedge rst = 0 \\
0 & \text{if } rst = 1
\end{cases}
$$

[Figure: EAP counter array with GROUP 0 … GROUP (NCLmax-1), each holding Event Counter 0 … Event Counter (NS-1). An EVENT BUFFER and a MUX controlled by the GROUP ID (GROUP-SELECT) dispatch incoming vectors to the matching GROUP; the EAP-CI (Registers + Bus-Interface) reads the GROUP VALUES and issues GROUP-RESETS.]

Unscaled Loads!

Implementation:

• GROUPs of 7-bit counters

• (NCLmax + 5) counters per GROUP

• NCLmax GROUPs (one for each CELL)

• NCLmax = 16 and 64 analysed

• Event Buffer stores incoming packets

• Vector order = counter order

• Periodic read and reset by software

• Intermediate read by software

• Direct access via Core Interface (CI)

• Scaling of loads via access timing!
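On the EAP side, the design reduces to one counter array per Cluster CELL. In this hypothetical Python sketch, a monitoring packet is a (GROUP-ID, OFG vector) pair, and each set bit increments the counter with the same index, following the rule "vector order = counter order". The class name and packet layout are assumptions. Each sensor reports at most one overflow per Sensor Check Period and NSP ≤ 100, so the 7-bit counters (maximum 127) cannot overflow within one Monitoring Cycle.

```python
from __future__ import annotations


class EventAggregationPoint:
    """Sketch of the EAP counter array at a Master CELL.

    One GROUP of NCLmax + 5 event counters per Cluster CELL; the GROUP-ID
    of an incoming monitoring packet selects the GROUP.
    """

    def __init__(self, ncl_max: int = 16):
        self.n_s = ncl_max + 5
        self.groups = [[0] * self.n_s for _ in range(ncl_max)]

    def receive(self, packet: tuple[int, tuple[int, ...]]) -> None:
        """load_j <- load_j + 1 for every set OFG bit (vector order = counter order)."""
        group_id, ofg_vector = packet
        counters = self.groups[group_id]
        for j, ofg in enumerate(ofg_vector):
            counters[j] += ofg

    def read_and_reset(self) -> list[list[int]]:
        """Periodic software access via the CI: return unscaled loads, then reset."""
        values = [group[:] for group in self.groups]
        for group in self.groups:
            group[:] = [0] * self.n_s
        return values


eap = EventAggregationPoint(ncl_max=16)
vector = tuple(1 if j == 0 else 0 for j in range(21))  # only path sensor 0 set
for _ in range(3):
    eap.receive((2, vector))  # three packets from the CELL with GROUP-ID 2
print(eap.read_and_reset()[2][:3])  # [3, 0, 0]
```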


Traffic Monitoring – Sensing and Flow • Load Scaling via Access Timing

– Stepping ks ∈ {1, 2, 4} → sload ∈ {0, ks, 2·ks, …, 100}, in % of the maximum bandwidth per link/path

– NSP = number of Sensor Periods tsp = f(tclk, bT-MODE) per Monitoring Cycle

– TMC = Monitoring Cycle

$$
N_{SP} = \frac{100}{k_s}, \qquad T_{MC} = N_{SP} \cdot t_{sp}, \qquad
s_{load,j} = k_s \cdot load_j =
\begin{cases}
load_j & \text{if } N_{SP} = 100 \\
2 \cdot load_j & \text{if } N_{SP} = 50 \\
4 \cdot load_j & \text{if } N_{SP} = 25
\end{cases}
$$

Scaled Loads!

• min(bT-MODE) = f(Cluster size)

– Traffic monitoring follows a many-to-one pattern

– Condition: injected bandwidth in the Cluster < receivable bandwidth at the Master

• 16 CELLs → min(bT-MODE) = 128

• 64 CELLs → min(bT-MODE) = 1024

– If the condition is met: sload ± emax with emax ≤ 2·ks!
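The three stages and the access timing can be combined into an end-to-end model of one sensor over one Monitoring Cycle. The Python sketch below is idealized and hypothetical. It assumes Sensor Check Periods aligned to multiples of bT-MODE and ignores System-NoC transport delays, so it shows only the quantization effect of the stepping ks. Under these assumptions the result is ks·⌊active cycles / bT-MODE⌋, which lies below the true load by less than ks. That is inside the emax ≤ 2·ks bound stated above.

```python
from __future__ import annotations

import random


def scaled_load(activity: list[bool], b_t_mode: int, k_s: int) -> int:
    """Scaled load sload in % for one sensor over one Monitoring Cycle.

    activity[c] is True if the link/path is busy in clock cycle c; the list
    must cover exactly one Monitoring Cycle of NSP * bT-MODE cycles.
    """
    n_sp = 100 // k_s                          # NSP = 100 / ks
    if len(activity) != n_sp * b_t_mode:       # TMC = NSP * tsp
        raise ValueError("activity must span exactly one Monitoring Cycle")
    cnt = ofg = load = 0
    for cycle, busy in enumerate(activity, start=1):
        if busy:                               # Stage 1: count active cycles
            cnt += 1
            if cnt == b_t_mode:
                cnt, ofg = 0, 1
        if cycle % b_t_mode == 0:              # Stage 2: check every tsp
            load += ofg                        # Stage 3: EAP event counter
            ofg = 0
    return k_s * load                          # sload = ks * load


rng = random.Random(1)
b, ks = 128, 2
activity = [rng.random() < 0.37 for _ in range((100 // ks) * b)]
true_load = 100 * sum(activity) / len(activity)
print(f"true {true_load:.2f}%  monitored {scaled_load(activity, b, ks)}%")
```

A fully busy link yields exactly NSP overflows and therefore sload = ks · NSP = 100%.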


Traffic Monitoring – Experimental Results • Hardware Overhead of the Traffic Monitoring in 45nm

[Figure: Dual NoC infrastructure on the NX × NY CELL/TILE mesh with Master Cores highlighted (R: Router Node, DNI: Data-NoC Network Interface, SNI: System-NoC Network Interface)]

Total Area Overhead per TILE @ 45nm for tclk = 1 ns, NX = 8 and NY = 8:

NCLmax              16 CELLs          64 CELLs
SNoC Linkwidth      8-bit   16-bit    8-bit   16-bit
Amaster / Atile     0.51%   0.66%     3.11%   3.26%
Aslave / Atile      0.29%   0.44%     0.37%   0.52%

– Aslave = Area(RouterSNoC, LinksSNoC, Traffic Sensors, SNI-Extension)

– Amaster = Area(Anormal, EAP)

– Atile = 3 mm × 3 mm (estimated from the 45nm Intel SCC)

• EAP and Traffic Sensors dominate the hardware overhead

• Nevertheless, the area estimates remain within a feasible range (at most 3.26% of Atile for a Master CELL and 0.52% for a Slave CELL)


Traffic Monitoring – Experimental Results

• Full system simulation with mixed workloads

– Random Uniform Traffic Pattern

• Increasing injection rates up to saturation of the NoC

• Packet size = rand(5, 15) flits

• Destination = rand(0, Nx·Ny - 1)

– Random Task Graphs with sequential mapping

• 7 to 70 tasks per graph

• 2 to 10 graphs per workload

• Packet size = rand(5, 50) flits

– Each workload is simulated 10 times with 10 full Monitoring Cycles for each Cluster shape (4x4, 8x2, 8x8, 4x16) and stepping ks (1, 2, 4)

– Average and maximum errors are logged for pathloads (PL) and linkloads (LL)
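A generator for the Random Uniform Traffic pattern could look like the following. This Python sketch is a hypothetical illustration, not the authors' SystemC setup. Packets start with a per-cycle probability chosen so that each IP core injects `injection_rate` flits per clock cycle on average (the mean of rand(5, 15) is 10 flits). It also assumes rand(a, b) includes both bounds, so, as written on the slide, a core may address itself.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class Packet:
    cycle: int  # injection clock cycle
    src: int    # source IP core index
    dst: int    # destination = rand(0, Nx*Ny - 1)
    flits: int  # packet size = rand(5, 15) flits


def uniform_random_traffic(nx: int, ny: int, injection_rate: float,
                           cycles: int, seed: int = 0) -> list[Packet]:
    """Hypothetical Random Uniform Traffic generator.

    injection_rate is in flits per clock cycle per IP core; a packet starts
    with probability injection_rate / 10 per cycle because the mean of
    rand(5, 15) is 10 flits.
    """
    rng = random.Random(seed)
    n = nx * ny
    p_start = injection_rate / 10.0
    trace = []
    for cycle in range(cycles):
        for src in range(n):
            if rng.random() < p_start:
                # randint includes both ends, matching the assumed rand(a, b)
                trace.append(Packet(cycle, src, rng.randint(0, n - 1), rng.randint(5, 15)))
    return trace


trace = uniform_random_traffic(4, 4, injection_rate=0.2, cycles=5000, seed=3)
offered = sum(p.flits for p in trace) / (16 * 5000)
print(len(trace), round(offered, 3))  # offered load close to 0.2 flit/cycle/core
```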


Traffic Monitoring – Experimental Results • Max. error in 4x4 CELL Cluster @ Random: min(bT-MODE) = 128

[Figure: Absolute error emax (0 to 9) vs. injection rate [flit per clock cycle] per IP core (0 to 0.3); series PL-1, LL-1, PL-2, LL-2, PL-4, LL-4 (pathloads and linkloads for ks = 1, 2, 4)]


Traffic Monitoring – Experimental Results • Max. Error all 16 CELL Cluster Scenarios: min(bT-MODE) = 128

[Figure: Bar chart of the absolute error emax (series: maximum error, max. average error) for ks = 1, 2, 4; bar values as extracted: 1.95, 1.98, 1.88, 1.98; 3.88, 3.97, 3.81, 3.97; 7.94, 7.94, 7.94, 7.94]


Traffic Monitoring – Experimental Results • Max. Error all 64 CELL Cluster Scenarios: min(bT-MODE) = 1024

[Figure: Bar chart of the absolute error emax (series: maximum error, max. average error) for ks = 1, 2, 4; bar values as extracted: 1.23, 1.62, 1.60, 1.95, 2.81, 3.28, 3.47, 3.97, 5.83, 7.05, 7.14, 7.91]


Outline

• Introduction

– Networks-on-Chip (NoC)

• Traffic Monitoring

– Basic Concept

– Dual NoC Infrastructure

– Hardware/Software-based Clustering

– Sensing and Flow

– Experimental Results

• Conclusion & Future Work


Conclusion & Future Work

• Flexible HW/SW traffic monitoring proposed

– All path- and linkloads inside a resizable region are collected at a single entity

– Adjustable timing and accuracy

– Hardware overhead and achievable Monitoring Cycles are feasible

• Next steps and investigations → Runtime Mechanisms

– Workload/Application Profiling (Rent's Rule)

• Communication Distributions/Probabilities

• Execution Phase Detection

• Combination with Performance Counter Data

– Flow-based Traffic Management

• Path adaptations inside Clusters


THANK YOU!


Traffic Monitoring – Experimental Results

Parameter              SNoC           DNoC
Linkwidth wL           8-, 16-bit     64-bit
Port Buffer Depth      1 flit         5 flit

Parameter              Value
Clock Rate tclk        1 ns
bT-MODE                64 – 2048 (Simulation: 128, 1024)
Cluster Size NCLmax    16 and 64
Cluster Shape          4×4, 8×2 (16 CELLs); 16×4, 8×8 (64 CELLs)
Monitor Position       Lower Left Corner (LLC)
Technology             45nm (Nangate FreePDK45)

• 45nm hardware synthesis via Synopsys Design Compiler → estimation of the hardware overhead

• SystemC-based cycle-accurate NoC simulation → measurement of the absolute traffic monitoring error (max & avg)


Traffic Monitoring – Basic Concept

[Figure: Research concept combining SYSTEM-MONITORING and SYSTEM-CONTROL in HARDWARE and SOFTWARE across three levels (TILE / CELL, CLUSTER / REGION, GLOBAL SYSTEM-LEVEL); elements: Sensors, Data Aggregation, Monitoring Evaluations, Parameter Aggregation, Prognostic Services, Algorithm Reconfiguration, System Adaptations, Actors]

• Research scope → more software-defined and cooperative runtime optimization


Introduction – Networks-on-Chip

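[Figure: Network Interface (NI) between CORE and router. The Packetization Unit @ NI (with PATH-LUT, LOCK and DST) sends flits to the Router Input Port (CORE) using REQ, FLIT DATA and ACK; the Depacketization Unit @ NI receives from the Router Output Port (CORE) with the same handshake; Core Input/Output Buffers @ NI connect to the CORE. Packet with N flits: flit 0 = header, flits 1 … N-2 = payload, flit N-1 = tail]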
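The header/payload/tail structure of an N-flit packet can be sketched as a pair of packetization and depacketization functions. The flit representation (Python tuples instead of link-width bit vectors) and the choice to carry the last data word in the tail flit are assumptions for illustration. All names are hypothetical.

```python
from __future__ import annotations

from enum import Enum
from typing import NamedTuple


class FlitType(Enum):
    HEADER = "header"
    PAYLOAD = "payload"
    TAIL = "tail"


class Flit(NamedTuple):
    kind: FlitType
    data: int


def packetize(dst: int, words: list[int]) -> list[Flit]:
    """Build an N-flit packet: header (destination), payload flits, tail.

    Assumption: the last data word travels in the tail flit, so
    N = len(words) + 1 and at least one data word is required (N >= 2).
    """
    if not words:
        raise ValueError("a packet needs at least one data word")
    flits = [Flit(FlitType.HEADER, dst)]
    flits += [Flit(FlitType.PAYLOAD, w) for w in words[:-1]]
    flits.append(Flit(FlitType.TAIL, words[-1]))
    return flits


def depacketize(flits: list[Flit]) -> tuple[int, list[int]]:
    """Inverse of packetize: recover (destination, data words)."""
    if flits[0].kind is not FlitType.HEADER or flits[-1].kind is not FlitType.TAIL:
        raise ValueError("malformed packet")
    return flits[0].data, [f.data for f in flits[1:]]


packet = packetize(dst=9, words=[0xA, 0xB, 0xC])
print([f.kind.value for f in packet])  # ['header', 'payload', 'payload', 'tail']
print(depacketize(packet))             # (9, [10, 11, 12])
```

Only the header carries routing information. This is what lets wormhole routers forward the payload and tail flits along the path that the header has already reserved.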


Traffic Monitoring – Sensing and Flow

Monitoring Cycle TMC [µs] for tclk = 1 ns:

bT-MODE    ks = 1    ks = 2    ks = 4
64         6.4       3.2       1.6
128        12.8      6.4       3.2
256        25.6      12.8      6.4
512        51.2      25.6      12.8
1024       102.4     51.2      25.6
2048       204.8     102.4     51.2

• Timing of Traffic Monitoring offers two options:

– Adjustment of bT-MODE → at all Cluster CELLs

– Adjustment of stepping ks → at the Master CELL only

– Tradeoff: Effort vs. Accuracy
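The table follows directly from TMC = NSP · tsp with NSP = 100/ks and tsp = bT-MODE · tclk. A short Python check (the function name is hypothetical) reproduces it:

```python
def monitoring_cycle_us(b_t_mode: int, k_s: int, t_clk_ns: float = 1.0) -> float:
    """TMC = NSP * tsp with NSP = 100 / ks and tsp = bT-MODE * tclk, in µs."""
    n_sp = 100 // k_s
    return n_sp * b_t_mode * t_clk_ns / 1000.0


print("bT-MODE   ks=1   ks=2   ks=4")
for b in (64, 128, 256, 512, 1024, 2048):
    print(f"{b:7d}", *(f"{monitoring_cycle_us(b, ks):6.1f}" for ks in (1, 2, 4)))
```

At tclk = 1 ns the table spans 1,600 to 204,800 clock cycles per full data set, on the order of 10³ to 10⁵ cycles. Halving ks doubles TMC, just as doubling bT-MODE does; the difference is that only ks can be changed at the Master CELL alone.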


Traffic Monitoring – Experimental Results • Pseudoload 8x8 CELL Cluster Scenarios: min(bT-MODE) = 1024

[Figure: Average link load [%] (0 to 50) vs. average injected load per core [%] (0 to 30); series: Linkload Sensors, Pathload Sensors]


Traffic Monitoring – Experimental Results • Pseudoload 4x4 CELL Cluster Scenarios: min(bT-MODE) = 128

[Figure: Average link load [%] (0 to 70) vs. average injected load per core [%] (0 to 80) for ks = 1; series: Linkload Sensors, Pathload Sensors; annotated regions: Pseudoload, Congestion, External traffic]


Introduction – Networks-on-Chip

• Networks-on-Chip (NoC)

– Packet-based and globally asynchronous communication on-chip

– Replacement of bus-based interconnections → Scalability & Parallelism

• Basic elements of the NoC:

– IP core = Computational resource that communicates via the NoC

– Network Interface (NI) = Connection of IP core and NoC for reception/transmission of packets

– Router (R) = Switching unit that forwards packets through the NoC from the source to the destination IP core

– Link = Bidirectional point-to-point connections between Routers

– Topology = Connection Graph of Ipcores, Routers and Links

• Scalable on-chip communication for Chip Multiprocessor (CMP) Systems


Introduction – Networks-on-Chip

[Figure: Router pipeline from input port to output port for the ports NORTH, EAST, SOUTH, WEST and CORE, with REQ/ACK handshakes and port_input_select; stages as labeled: incoming transmission (LINK IN → PORT IN), Buffer (IBUF) + Routing (RL), Arbitration (ARB), Crossbar Traversal, outgoing transmission (PORT OUT → LINK OUT). Packet with N flits: 0 header, 1 payload, 2 payload, …, (N-1) tail]

• Here used:

– 2D-Mesh Topology

– XY-Routing

– Wormhole Switching and REQ/ACK Flow Control

– Input Buffering

– Round Robin Arbitration
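Dimension-ordered XY routing and round-robin arbitration, as listed above, can each be sketched in a few lines of Python. The coordinate convention (x grows toward EAST, y toward NORTH) and all names are assumptions. Wormhole locking, which holds a granted output port until the tail flit has passed, is not modeled.

```python
from __future__ import annotations

STEP = {"EAST": (1, 0), "WEST": (-1, 0), "NORTH": (0, 1), "SOUTH": (0, -1)}


def xy_route(cur: tuple[int, int], dst: tuple[int, int]) -> str:
    """Output port chosen at router `cur` by XY routing (X first, then Y)."""
    (cx, cy), (dx, dy) = cur, dst
    if dx != cx:
        return "EAST" if dx > cx else "WEST"
    if dy != cy:
        return "NORTH" if dy > cy else "SOUTH"
    return "CORE"  # arrived: eject to the local IP core


def xy_path(src: tuple[int, int], dst: tuple[int, int]) -> list[tuple[int, int]]:
    """Routers visited from src to dst, both inclusive."""
    path = [src]
    while (port := xy_route(path[-1], dst)) != "CORE":
        sx, sy = STEP[port]
        path.append((path[-1][0] + sx, path[-1][1] + sy))
    return path


class RoundRobinArbiter:
    """Grants one requesting input port per call, rotating the priority."""

    def __init__(self, ports: list[str]):
        self.ports = ports
        self.last = len(ports) - 1  # ports[0] has the highest priority first

    def grant(self, requests: set[str]) -> str | None:
        n = len(self.ports)
        for offset in range(1, n + 1):
            idx = (self.last + offset) % n
            if self.ports[idx] in requests:
                self.last = idx  # the granted port gets the lowest priority next
                return self.ports[idx]
        return None


print(xy_path((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
arbiter = RoundRobinArbiter(["NORTH", "EAST", "SOUTH", "WEST", "CORE"])
print([arbiter.grant({"EAST", "CORE"}) for _ in range(3)])  # ['EAST', 'CORE', 'EAST']
```

Because XY routing is deterministic, every source-destination pair uses a fixed path. This is why a path load and the link loads along that path can be related directly.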