communication architecture synthesis alessandro pinto, university of california at berkeley...

Communication Architecture Synthesis

Alessandro Pinto,

University of California at Berkeley

apinto@eecs.berkeley.edu

Introduction: what

Communication synthesis

What is the “best way” of transferring information between entities?

Problem Formulation

Who wants to communicate with whom

How they expect the communication mechanism to behave

What is possible to build today

How to build a communication architecture that is cheap and satisfies the entities’ constraints while being feasible

Introduction: why

On-Chip application

Lots of components connected together

Distance is becoming important

There isn’t a formal approach to the problem

We build tools which are the glue in the platform-based design approach

More or less everything has been done in the design of “functions”

Outline

Overview

Internet vs. On-Chipnet

Standard communication structures

VSIA + Standard comm. Platform

Constraint-Driven Communication Synthesis

A theoretical approach (A. Pinto, L. P. Carloni, ASV)

Current research

Conclusion

Wire Scaling

“As designs scale to newer technologies, they get smaller and their wires get shorter, and the relative change in the speed of wires to the speed of gates is modest.” M. A. Horowitz et al., “The Future of Wires”

“The real wire problem arises with increasing chip complexity and global communication costs.” M. A. Horowitz et al., “The Future of Wires”

Local Wire

Global Wire

What is going to happen

From a few closely connected components to many distributed IPs

A global net is most likely to be traversed in more than one clock cycle

High bandwidth and low power requirements

Fault tolerance requirement

There is a need for a correct-by-construction methodology

Internet Network

Highly distributed

Fault Tolerant

Every node must be reachable

Networks vs. On-Chip Networks

The Internet’s major concern was to guarantee connectivity in case of attack

Cost is mostly static

Buffer memory is not a problem (512 MB is still cheap)

The number and complexity of protocol layers matter only for performance

Full connectivity is not the major concern

Now cost is mostly dynamic (power)

Registers are expensive

The protocol must be very light

In the future error correction might become a reality

The beginning of internet

Back in 1969: the ARPAnet

The beginning of On-Chipnet

A lot of effort on on-chip communication

One of GSRC’s main themes: Communication-based design

Special section in conferences

Different approaches

Not always really networks

Buses are still widely used

Effect of standardization

[Figure: timeline from 1969 to 1991]

1983: TCP/IP was introduced

1987: Commercialization of the Internet

TCP/IP technology is transferred from inventors to vendors

Standardization

VSIA: Virtual Socket Interface Alliance

Specifies the Virtual Component Interface (VCI)

Peripheral VCI

Basic VCI

Advanced VCI

[Figure: VCI Initiator / VCI Target pairs connected through a Bus Master and a Bus Slave on any bus]

http://www.vsi.org

Standard “Bus” Architecture

AMBA (ARM Ltd.)*

CoreFrame (Palmchip)

CoreConnect (IBM)

Wishbone (Silicore)

SiliconBackPlane (Sonics)*

AMBA Bus Hierarchy

HP ARMCore On Chip RAM

Off-Chip RAM

Bridge

Keypad

AHB: Advanced High-Performance Bus

APB: Advanced Peripheral Bus

SiliconBackplane

It’s possible to specify constraints for each point to point communication

It’s possible to specify different OCP

Synthesis of a network that guarantees performance

Today’s Platform: An Example

Processor

USB, MMC, IrDA, Bluetooth

PCMCIA, SDRAM

PCMCIA

Bluetooth

Based on Intel XScale

Where are we headed?

Communication is specified in terms of point to point Constraints

A target implementation technology is abstracted into a Library

A synthesis procedure generates a communication architecture

That satisfies the constraints

That uses only the library elements

A Platform Integrator Environment

ArchPltBuilder

To Silicon

Communication Topology Design

Platform-based design in communication

Application Space

Implementation Space

Topology Design

Protocol Design

Routing

Network Layer

MAC Layer

Physical Level

The Problem

Previous Work

S. Dey et al., “System-Level Performance Analysis for Designing On-Chip Communication Architectures”, TCAD, Vol. 20, No. 6, June 2001

S. Dey et al., “Efficient Exploration of the SOC Communication Architecture Design Space”, Int. Conf. on CAD, August 1999

J. M. Daveau et al., “Protocol Selection and Interface Generation for HW-SW Codesign”, TVLSI, Vol. 5, No. 1, March 1997

J. M. Daveau et al., “Synthesis of System-Level Communication by an Allocation-Based Approach”, Int. Symposium on System Synthesis, 1995

Our Approach

System modules communicate by means of point-to-point unidirectional channels

Each channel is characterized by a set of communication constraints (distance, minimum bandwidth)

The specific functionality of each module is “abstracted away”

Abstract Model

The Goal – Overview of the Method

Application Space

Implementation Space

Constraint Propagation

Performance Abstraction

Abstraction (Intuition)

Length

Relay Station Switches

This is the basic library of Communication Components

Communication-Constraint Graph

$G = (V, A)$

$\forall v \in V$: $p(v)$ is the vertex position in our coordinate system

$\forall a \in A$:

The arc distance $d(a)$

The arc bandwidth $b(a)$

M 1 M 5

Library

$\mathcal{L} = L \cup N$: communication links and communication nodes

$\forall l \in L$:

$d(l)$ is the link length

$b(l)$ is the link bandwidth

$c(l)$ is the link cost

$\forall n \in N$: $c(n)$ is the cost of the communication node
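The constraint graph and the library defined above can be held in a few small records. This is only a sketch: the class and field names are invented for illustration, and it assumes that $d(a)$ is the Euclidean distance between the two port positions, which the slides do not state.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Arc:
    """A point-to-point constraint a = (u, v) of the constraint graph."""
    name: str
    src: tuple            # p(u), position of the source port
    dst: tuple            # p(v), position of the destination port
    bandwidth: float      # b(a), minimum required bandwidth

    @property
    def distance(self) -> float:
        # d(a): ASSUMED to be the Euclidean distance between the ports
        return dist(self.src, self.dst)


@dataclass(frozen=True)
class Link:
    """A library link l with length d(l), bandwidth b(l) and cost c(l)."""
    length: float
    bandwidth: float
    cost: float


@dataclass(frozen=True)
class Node:
    """A library communication node n (relay station, switch) with cost c(n)."""
    kind: str
    cost: float


# Illustrative values only
a1 = Arc("a1", (0.0, 0.0), (0.0, 1.0), bandwidth=1.0)
links = [Link(length=1.0, bandwidth=2.0, cost=1.0)]
nodes = [Node("switch", cost=0.5)]
print(a1.distance, links[0].bandwidth, nodes[0].cost)
```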

Communication Implementation

(b(a),d(a))=(4,5)

(1,2)(1,2)

G=(V,A)

Arc Matching

Arc Segmentation

Arc Duplication + Arc Segmentation

Arc Merging

Implementation Graph

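$G'(G, \mathcal{L}) = (V' \cup N', A')$, with $\mathcal{L}$ the library

$\forall v' \in V'$: $v'$ corresponds to a vertex $v \in V$

$\forall n' \in N'$: $n'$ is an instance of an element of $N$

$\forall a' \in A'$: $a'$ is an instance of an element of $L$

$\forall a = (u, v) \in A$, $\exists P(a)$, a set of paths such that:

$P(a)$ connects $u'$ with $v'$ without passing through any other computational vertex

$P(a)$ satisfies the bandwidth constraint of $a$

The cost of an implementation graph is

$C(G') = \sum_{n' \in N'} c(n') + \sum_{a' \in A'} c(a')$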
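As a rough illustration of arc segmentation, arc duplication and the cost function, the sketch below prices a point-to-point implementation built from copies of a single library link. The function names, the assumption that every joint between two consecutive links needs one relay node, and the numbers in the usage lines are all illustrative and not taken from the slides.

```python
from math import ceil


def point_to_point_cost(d_a, b_a, d_l, b_l, c_l, c_relay=0.0):
    """Cost of implementing one arc with copies of a single library link.

    Segmentation: chain ceil(d_a / d_l) links to cover the distance.
    Duplication:  run ceil(b_a / b_l) such chains in parallel for bandwidth.
    ASSUMPTION: each joint between consecutive links costs one relay node.
    """
    segments = ceil(d_a / d_l)
    chains = ceil(b_a / b_l)
    n_links = segments * chains
    n_relays = (segments - 1) * chains
    return n_links * c_l + n_relays * c_relay


def implementation_cost(node_costs, link_costs):
    """Sum of the node costs plus the sum of the link costs."""
    return sum(node_costs) + sum(link_costs)


# An arc with d(a) = 5 and b(a) = 4, and a link with d(l) = 2, b(l) = 1, c(l) = 1:
# 3 segments per chain, 4 parallel chains, 12 links, 8 relay joints.
print(point_to_point_cost(5, 4, 2, 1, 1.0))          # links only
print(point_to_point_cost(5, 4, 2, 1, 1.0, 0.5))     # links plus relays
print(implementation_cost([0.5] * 8, [1.0] * 12))    # same graph, summed directly
```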

The Optimization Problem

$\min_{G'(G, \mathcal{L})} C(G')$ subject to $G'$ being an implementation graph of $G$

Assumptions

Assumption on the Library

d(a1) > d(a2)   b(a1) > b(a2)   c(a1) > c(a2)

K-Way Merging structure

Candidate Implementations

Generate only k-way mergings that could be part of the final implementation

Derive mergeability conditions based only on positions and bandwidth of the constraint graph and the library

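2-Way Mergeability

[Figure: arcs $a_1 = (u_1, v_1)$ and $a_2 = (u_2, v_2)$, annotated with $\|p(u_1) - p(u_2)\|$ and $\|p(v_1) - p(v_2)\|$]

$d(a_1) + d(a_2) \le \|p(u_1) - p(u_2)\| + \|p(v_1) - p(v_2)\| \;\Rightarrow\; \{a_1, a_2\}$ not mergeable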
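The 2-way distance condition is a single comparison per pair of arcs. The sketch below is one possible reading of it: arcs are given as pairs of port positions, the norm is assumed to be Euclidean, and a small tolerance `EPS` (an addition of this sketch) keeps floating-point round-off from flipping the boundary case. The three arcs reuse the coordinates of the deck's simple example, with the pairing of coordinates to arcs assumed.

```python
from math import dist

EPS = 1e-9  # tolerance for the boundary case (an assumption of this sketch)


def merging_distance(a1, a2):
    """Distance between the two sources plus distance between the two sinks."""
    (u1, v1), (u2, v2) = a1, a2
    return dist(u1, u2) + dist(v1, v2)


def pruned_by_distance(a1, a2):
    """True when the 2-way distance condition rules out merging a1 and a2."""
    d1 = dist(*a1)  # d(a1), ASSUMED Euclidean
    d2 = dist(*a2)  # d(a2)
    return d1 + d2 <= merging_distance(a1, a2) + EPS


a1 = ((0.0, 0.0), (0.0, 1.0))
a2 = ((0.4, 0.0), (0.4, 1.0))
a3 = ((1.4, 0.0), (1.4, 1.0))

print(pruned_by_distance(a1, a2))  # close neighbours: merging stays a candidate
print(pruned_by_distance(a1, a3))  # far apart: merging is pruned
```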

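K-Way Mergeability

[Figure: arcs $a_1, \dots, a_k$, annotated with $\|p(u_1) - p(u_k)\|$ and $\|p(v_1) - p(v_k)\|$]

$(k-1)\,d(a_k) + \sum_{i=1}^{k-1} d(a_i) \le \sum_{i=1}^{k-1} \left( \|p(u_i) - p(u_k)\| + \|p(v_i) - p(v_k)\| \right) \;\Rightarrow\; \{a_1, \dots, a_k\}$ not mergeable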

K-Way Mergeability

$\sum_{i=1}^{k} b(a_i) \ge \max_{l \in L} b(l) + \min_{j \in [1,k]} b(a_j) \;\Rightarrow\; \{a_1, \dots, a_k\}$ not mergeable
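The bandwidth condition needs only the arc bandwidths and the largest link bandwidth in the library. A minimal sketch, with a hypothetical function name and made-up numbers:

```python
def pruned_by_bandwidth(arc_bandwidths, link_bandwidths):
    """True when the k-way bandwidth condition rules out merging the arcs.

    arc_bandwidths:  b(a_1), ..., b(a_k) of the arcs to be merged
    link_bandwidths: b(l) for every link l in the library
    """
    total = sum(arc_bandwidths)
    return total >= max(link_bandwidths) + min(arc_bandwidths)


# One library link with b(l) = 2 and arcs that each need bandwidth 1
print(pruned_by_bandwidth([1, 1], [2]))      # 2 < 2 + 1, still a candidate
print(pruned_by_bandwidth([1, 1, 1], [2]))   # 3 >= 2 + 1, pruned
```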

n-Way Mergeability

Expansion Rule

If $a \in A$ is not mergeable with any set of $k-1$ arcs in $A$,

then $a$ is not mergeable with any set of $k$ arcs in $A$

Summary of Pruning Conditions

K-way mergeability condition over distance

K-way mergeability condition over bandwidth

k → k+1 mergeability condition (expansion rule)

All the pruning conditions are based on the positions of ports and the properties of arcs in the Constraint Graph, and on the maximum bandwidth in the Library

Solving the Optimization Problem

Constraints

Library

Candidate Merging Generation

Non-Linear Optimizer

Unate Covering

Solution

K-way Mergings

Point to point
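The last step of this flow picks, among the priced candidates (k-way mergings and point-to-point implementations), a cheapest subset that covers every arc. The sketch below uses exhaustive search as a stand-in for a real unate covering solver, which is fine only for tiny instances. The function name, the arc labels and all costs are hypothetical.

```python
from itertools import combinations


def min_cost_cover(arcs, candidates):
    """Cheapest set of candidates whose arc sets together cover all arcs.

    candidates: list of (frozenset_of_arc_names, cost) pairs.
    Exhaustive search, used here only to illustrate the covering step.
    """
    required = set(arcs)
    best_cost, best = float("inf"), None
    for r in range(1, len(candidates) + 1):
        for combo in combinations(candidates, r):
            covered = set().union(*(arc_set for arc_set, _ in combo))
            cost = sum(c for _, c in combo)
            if covered >= required and cost < best_cost:
                best_cost, best = cost, combo
    return best_cost, best


arcs = ["a1", "a2", "a3"]
candidates = [
    (frozenset({"a1"}), 1.0),         # point-to-point implementations
    (frozenset({"a2"}), 1.0),
    (frozenset({"a3"}), 1.0),
    (frozenset({"a1", "a2"}), 1.6),   # a 2-way merging (hypothetical cost)
]
cost, chosen = min_cost_cover(arcs, candidates)
print(cost, [sorted(s) for s, _ in chosen])
```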

Constrained Distance Sum Matrix

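$D_{ij} = d(a_i) + d(a_j)$

$D_{kj} + D_{mj} = d(a_k) + d(a_j) + d(a_m) + d(a_j) = 2\,d(a_j) + d(a_k) + d(a_m)$

In general, for a $k$-way merging with pivot $a_j$: $\sum_{i \ne j} D_{ij} = (k-1)\,d(a_j) + \sum_{i \ne j} d(a_i)$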

Merging Distance Sum Matrix

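$\Gamma_{ij} = \|p(u_i) - p(u_j)\| + \|p(v_i) - p(v_j)\|$

$\Gamma_{kj} + \Gamma_{mj} = \|p(u_k) - p(u_j)\| + \|p(v_k) - p(v_j)\| + \|p(u_m) - p(u_j)\| + \|p(v_m) - p(v_j)\|$

In general, for a $k$-way merging with pivot $a_j$: $\sum_{i \ne j} \Gamma_{ij} = \sum_{i \ne j} \left( \|p(u_i) - p(u_j)\| + \|p(v_i) - p(v_j)\| \right)$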
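Both matrices can be filled once from the constraint graph, after which each side of the k-way distance condition is a sum down one column. The sketch below uses plain Python lists, assumes Euclidean distances, and takes its three arcs from the deck's simple example with the pairing of coordinates to arcs assumed. The helper names are hypothetical.

```python
from math import dist


def build_matrices(arcs):
    """Return (D, G): D[i][j] = d(a_i) + d(a_j), and G[i][j] = distance
    between the sources of a_i, a_j plus distance between their sinks."""
    n = len(arcs)
    d = [dist(u, v) for u, v in arcs]  # d(a_i), ASSUMED Euclidean
    D = [[d[i] + d[j] for j in range(n)] for i in range(n)]
    G = [[dist(arcs[i][0], arcs[j][0]) + dist(arcs[i][1], arcs[j][1])
          for j in range(n)] for i in range(n)]
    return D, G


def column_sum(M, rows, j):
    """Sum of column j of M restricted to the given rows."""
    return sum(M[i][j] for i in rows)


arcs = [((0.0, 0.0), (0.0, 1.0)),
        ((0.4, 0.0), (0.4, 1.0)),
        ((1.4, 0.0), (1.4, 1.0))]
D, G = build_matrices(arcs)

# Pivot on the third arc (column 2) and merge it with the first two (rows 0, 1):
lhs = column_sum(D, [0, 1], 2)  # 2*d(a3) + d(a1) + d(a2)
rhs = column_sum(G, [0, 1], 2)  # merging distance of a1 and of a2 to a3
print(lhs, rhs, lhs <= rhs)     # lhs <= rhs means the 3-way merging is pruned
```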

Algorithm

k ← 1; foundCandidateMapping ← TRUE;
while (foundCandidateMapping) {
    for all columns j ∈ col(Γ) {
        kWayMergingNotFound ← TRUE;
        for all subsets of rows R = {i1…ik} ⊆ row(Γ)
            if sum(D,R) > sum(Γ,R)
                if ∃ l ∈ L s.t. b(l) + bmin(R,j) > sum(B,R) + B[j] {
                    S ← S ∪ kmergingimplementation(R,j);
                    kWayMergingNotFound ← FALSE
                }
        if (kWayMergingNotFound)
            col(Γ) ← col(Γ) \ j;
    }
    if col(Γ) = ∅ foundCandidateMapping ← FALSE;
    else k ← k+1;
}
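One way to read the candidate-generation loop is sketched below. It is an interpretation, not the author's implementation: a merging is kept when it survives both pruning conditions (arc lengths exceed the merging distance, and the total bandwidth stays below the largest link bandwidth plus the smallest arc bandwidth), Euclidean distances are assumed, a tolerance `EPS` guards the boundary case, and a `k < n` guard is added so the loop always ends. All names and the sample data are hypothetical.

```python
from itertools import combinations
from math import dist

EPS = 1e-9  # tolerance for the boundary case (an assumption of this sketch)


def candidate_mergings(arcs, bandwidths, link_bandwidths):
    """Return the arc-index sets that survive the distance and bandwidth pruning.

    arcs: list of (p(u), p(v)); bandwidths: b(a_i); link_bandwidths: b(l), l in L.
    """
    n = len(arcs)
    d = [dist(u, v) for u, v in arcs]
    gamma = [[dist(arcs[i][0], arcs[j][0]) + dist(arcs[i][1], arcs[j][1])
              for j in range(n)] for i in range(n)]
    b_max = max(link_bandwidths)
    columns = set(range(n))   # pivots still worth expanding
    found = set()
    k = 1                     # rows merged with the pivot: (k+1)-way mergings
    while columns and k < n:
        for j in sorted(columns):
            merging_found = False
            others = [i for i in range(n) if i != j]
            for rows in combinations(others, k):
                sum_d = k * d[j] + sum(d[i] for i in rows)
                sum_g = sum(gamma[i][j] for i in rows)
                group = rows + (j,)
                total_b = sum(bandwidths[i] for i in group)
                b_min = min(bandwidths[i] for i in group)
                if sum_d > sum_g + EPS and b_max + b_min > total_b:
                    found.add(frozenset(group))
                    merging_found = True
            if not merging_found:
                columns.discard(j)  # expansion rule: stop growing this pivot
        k += 1
    return found


arcs = [((0.0, 0.0), (0.0, 1.0)),
        ((0.4, 0.0), (0.4, 1.0)),
        ((1.4, 0.0), (1.4, 1.0))]
result = candidate_mergings(arcs, bandwidths=[1, 1, 1], link_bandwidths=[2])
print(sorted(sorted(group) for group in result))
```

With these inputs only the two closest arcs remain a candidate merging: the third arc is too far from the others, and any 3-way merging exceeds the bandwidth bound.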

A Simple Example

[Figure: three arcs a1, a2, a3 with endpoints at (0,0), (0.4,0), (1.4,0) and (0,1), (0.4,1), (1.4,1); annotated values 0.8 and 2.8]

Library L: one link with b(l)=2, d(l)=1, c(l)=1


Examples

3 3-way mergings

1 9-way merging

Cost per unit length

C/l Lib1

All p-to-p

1 9-way merging

Cost per unit length

C/l Lib1

Examples

Limitations

u1 u2 u3 v1 v2 v3

Alternation

u1 u2 u3 v1 v2 v3

Bi-directionality

Library Characterization

Main concerns (costs)

Area

Costs are affected by

Geometric and physical properties of wires (our basic communication primitives)

Communication Speed

Wires Geometry and Physics

Thickness

Dielectric

Space

buffer buffer

Depends only on the technology, not on the metal layer

An Example

Based on Intel 0.13um

M1 M2 M3 M4 M5 M6

Energy(pJ)

M1 M2 M3 M4 M5 M6

$s_{opt}$ (× min size)

Energy consumption for a series of $l_{crit}$ and buffer $s_{opt}$

Optimum buffer size in terms of minimum size

$\tau_{crit}$: 17 ps

Wires trade off

$\tau_{crit}$ is constant

for the same length, different metal layers carry different bandwidth

A wire in M(n) is going to cost more in terms of area and power than a wire in M(n-1)

Buffers are bigger

The library is convex

[Figure: axes labeled with $\tau_{crit}$ and $l_{crit}$]

Conclusion

A chip will look like a network of components

Highly distributed

Multi-clock-cycle nets

CAD tools for On-Chipnet design must

Allow specification of communication constraints

Find the minimum-cost network. Costs are:

Power

Area (stateful/stateless repeaters)

Blocking time (queues)

Ensure Network reliability and correctness

Constraint-Driven Communication Synthesis is our approach to the problem

Future Work (Available Projects)

Ad hoc flow control protocol for low power, minimum buffering On-Chipnet

Implementation and performance/cost abstraction of fast low power On-Chip routers

Interface with Metropolis

On-Chipnet VHDL netlist generator
