Evaluating the Raw microprocessor

Michael Bedford Taylor
Raw Architecture Group
Computer Science and AI Laboratory
Massachusetts Institute of Technology

Evaluating the Raw microprocessor

Brief Overview of Raw Architecture

Avenues of Evaluation

Empirical - Comparison with P3

Analytical - Modeling Large-Scale ILP

Experiential - Experimental Systems

The Raw Architecture

Divide the silicon into an array of identical, programmable tiles.

(A signal can pass through a small amount of logic and reach the next tile in one cycle.)
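The tile array and its one-cycle neighbor reach suggest a simple distance model: the cost of moving a value grows with the number of tile boundaries it crosses. The Python sketch below assumes a 4x4 array (matching the 16-tile chip described later), one cycle per hop, and column-first-then-row routing; the routing order and all function names are illustrative assumptions, not a description of Raw's routers.

```python
# Sketch under stated assumptions: a 4x4 grid of identical tiles, one cycle
# per tile-to-tile hop, and hypothetical X-then-Y (column, then row) routing.
COLS = ROWS = 4


def coords(tile: int) -> tuple[int, int]:
    """Map a tile number to (row, col) in row-major order."""
    return divmod(tile, COLS)


def route(src: int, dst: int) -> list[int]:
    """Tiles visited travelling along the row first, then along the column."""
    (r, c), (dr, dc) = coords(src), coords(dst)
    path = [src]
    while c != dc:
        c += 1 if dc > c else -1
        path.append(r * COLS + c)
    while r != dr:
        r += 1 if dr > r else -1
        path.append(r * COLS + c)
    return path


def hop_cycles(src: int, dst: int) -> int:
    """Cycles to reach dst, assuming one cycle per hop."""
    return len(route(src, dst)) - 1


print(route(0, 5), hop_cycles(0, 15))  # prints: [0, 1, 5] 6
```

On this grid the farthest pair of tiles (corners 0 and 15) is six hops, and so six cycles, apart under these assumptions.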

Raw Architecture

Compute Processor

Routers

On-chip networks

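[Figure: compute processor pipeline with stage labels including IF, D, RF, E, M1, M2, A, TL, TV, F, P, U, F4, and WB. Registers r24-r27 connect to the input FIFOs from the static router and to the output FIFOs to the static router.]

Inside the compute processor, the networks are integrated directly into the bypass paths.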
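A toy model helps show what "networks in the bypass paths" means for software: an ordinary ALU instruction can name a network register as a source or destination, so no separate send or receive instruction is needed. The Python sketch below assumes, following the figure, one input FIFO and one output FIFO per register r24-r27, unbounded FIFO depth, and a hypothetical `Stall` exception standing in for a pipeline stall; it is an illustration, not Raw's actual ISA semantics.

```python
from collections import deque

# Register numbers drawn next to the static-router FIFOs in the figure.
NETWORK_REGS = (24, 25, 26, 27)


class Stall(Exception):
    """Stands in for a pipeline stall on an empty input FIFO (hypothetical)."""


class ComputeProcessor:
    """Toy model: reading r24-r27 dequeues from an input FIFO and writing them
    enqueues onto an output FIFO, so ALU instructions consume and produce
    network operands directly."""

    def __init__(self):
        self.regs = [0] * 32
        self.in_fifos = {r: deque() for r in NETWORK_REGS}
        self.out_fifos = {r: deque() for r in NETWORK_REGS}

    def read(self, r):
        if r in NETWORK_REGS:
            if not self.in_fifos[r]:
                raise Stall(f"input FIFO for r{r} is empty")
            return self.in_fifos[r].popleft()
        return self.regs[r]

    def write(self, r, value):
        if r in NETWORK_REGS:
            self.out_fifos[r].append(value)
        else:
            self.regs[r] = value

    def add(self, rd, rs, rt):
        # Read rs, then rt; the sum goes to rd (a register or a network FIFO).
        self.write(rd, self.read(rs) + self.read(rt))


cp = ComputeProcessor()
cp.in_fifos[24].extend([5, 7])  # operands arriving from the static router
cp.regs[3] = 10
cp.add(24, 24, 3)  # dequeue 5, add r3, send 15 out on r24
cp.add(25, 24, 3)  # dequeue 7, add r3, send 17 out on r25
print(list(cp.out_fifos[24]), list(cp.out_fifos[25]))  # prints: [15] [17]
```

The design point the model captures is that a network operand costs no extra issue slots on either end: the receive is folded into the operand read and the send into the result write.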

[Figure: two panels, "Program graph" and "Multiple Raw tiles". The program graph is a dataflow graph of scalar operations (for example pval1=seed.0*3.0, pval0=pval1+2.0, tmp0.1=pval0/2.0, and v0.9=tmp0.1-v1.8); the second panel shows the same operations distributed across Raw tiles.]

Raw's bypass-integrated on-chip networks serve as a Scalar Operand Network, or SON.
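The program graph can be captured as a small dataflow table, which makes it easy to see which operands a given placement forces onto the network. The Python sketch below evaluates the graph in dependency order and counts producer-to-consumer edges that cross tiles for one four-tile placement. The placement, the live-in values, and all function names are our own assumptions, not the partition the Raw compiler produced; the final write-backs such as v0=v0.9 are omitted and results are read from the renamed values.

```python
from graphlib import TopologicalSorter
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

# Program graph: destination -> (op, operand, operand). "mov" copies a live-in
# (seed, v1, v2); string operands name values, float operands are constants.
GRAPH = {
    "seed.0": ("mov", "seed"), "v1.2": ("mov", "v1"), "v2.4": ("mov", "v2"),
    "pval1": ("*", "seed.0", 3.0), "pval0": ("+", "pval1", 2.0),
    "tmp0.1": ("/", "pval0", 2.0),
    "pval2": ("*", "seed.0", "v1.2"), "tmp1.3": ("+", "pval2", 2.0),
    "pval3": ("*", "seed.0", "v2.4"), "tmp2.5": ("+", "pval3", 2.0),
    "pval5": ("*", "seed.0", 6.0), "pval4": ("+", "pval5", 2.0),
    "tmp3.6": ("/", "pval4", 3.0),
    "pval6": ("-", "tmp1.3", "tmp2.5"), "v2.7": ("*", "pval6", 5.0),
    "pval7": ("+", "tmp1.3", "tmp2.5"), "v1.8": ("*", "pval7", 3.0),
    "v0.9": ("-", "tmp0.1", "v1.8"), "v3.10": ("-", "tmp3.6", "v2.7"),
}


def node_deps(node):
    """Operands of `node` that are produced by other nodes in GRAPH."""
    return [a for a in GRAPH[node][1:] if isinstance(a, str) and a in GRAPH]


def evaluate(live_in):
    """Evaluate every node after its producers; returns {name: value}."""
    values = dict(live_in)
    order = TopologicalSorter({n: node_deps(n) for n in GRAPH}).static_order()
    for node in order:
        op, *args = GRAPH[node]
        vals = [values[a] if isinstance(a, str) else a for a in args]
        values[node] = vals[0] if op == "mov" else OPS[op](*vals)
    return values


# Hypothetical four-tile placement: roughly one dependence chain per tile.
PLACEMENT = {
    **dict.fromkeys(["seed.0", "pval1", "pval0", "tmp0.1", "v0.9"], 0),
    **dict.fromkeys(["v1.2", "pval2", "tmp1.3", "pval7", "v1.8"], 1),
    **dict.fromkeys(["v2.4", "pval3", "tmp2.5", "pval6", "v2.7"], 2),
    **dict.fromkeys(["pval5", "pval4", "tmp3.6", "v3.10"], 3),
}


def cross_tile_operands(placement):
    """Count producer->consumer edges whose endpoints sit on different tiles,
    i.e. operands that would travel over the scalar operand network."""
    return sum(1 for node in GRAPH for dep in node_deps(node)
               if placement[dep] != placement[node])


values = evaluate({"seed": 1.0, "v1": 2.0, "v2": 3.0})
print(values["v0.9"], cross_tile_operands(PLACEMENT))  # prints: -24.5 7
```

This count is per edge, so seed.0 feeding three other tiles contributes three transfers; placing everything on one tile drives the count to zero but gives up the parallelism across the four chains.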

Empirical Evaluation: Comparison to P3

Parameter        Raw (IBM ASIC)    P3 (Intel)
Litho            180 nm            180 nm
Process          CMOS 7SF          P858
Metal layers     6 Cu              6 Al
FO1 delay        23 ps             11 ps
Dielectric k     4.1               3.55
Design style     Standard cell     Full custom
Initial freq.    425 MHz           500-733 MHz
Die area         331 mm²           106 mm²
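Because the two chips were built in processes with different gate speeds, one rough way to read the table is to express each clock period in FO1 delays of its own process. The Python arithmetic below is our own back-of-envelope normalization using only the table's values; it is not a result reported in the talk, and the function names are hypothetical.

```python
def period_ps(freq_mhz: float) -> float:
    """Clock period in picoseconds: 1/f microseconds = 1e6/f ps."""
    return 1e6 / freq_mhz


def fo1_per_cycle(freq_mhz: float, fo1_ps: float) -> float:
    """Clock period expressed in FO1 delays of the chip's own process."""
    return period_ps(freq_mhz) / fo1_ps


raw = fo1_per_cycle(425, 23)      # ~102 FO1 delays per cycle
p3_slow = fo1_per_cycle(500, 11)  # ~182
p3_fast = fo1_per_cycle(733, 11)  # ~124
area_ratio = 331 / 106            # Raw die is ~3.1x the P3 die

print(f"Raw {raw:.1f}, P3 {p3_slow:.1f} to {p3_fast:.1f} FO1/cycle; area x{area_ratio:.2f}")
```

The normalization only accounts for raw gate speed; it ignores the other differences listed in the table, such as metal, dielectric, and standard-cell versus full-custom design.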

Analytical Evaluation

Scalar Operand Network (SON) research.

(See HPCA 2003 and forthcoming work.)

Scalar Operand Network

The network and the associated algorithms responsible for matching operands and operations in space.

SON Performance Metric: 5-tuple <send occupancy, send latency, network hop latency, receive latency, receive occupancy>

Conventional distributed multiprocessor: <3, 15, 2, 1, 12>

Superscalar SON: <0, 0, 0, 0, 0> (not scalable)

Raw: a new point in the region.

Raw SON: <0, 1, 1, 1, 0>
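The 5-tuple can be turned into a simple per-operand cost model. The Python sketch below assumes, as a simplification, that the five terms add and that the hop latency is paid once per tile crossed; the class and method names are hypothetical, and the tuples are the ones listed above.

```python
from typing import NamedTuple


class SON5(NamedTuple):
    """<send occupancy, send latency, network hop latency,
    receive latency, receive occupancy>, all in cycles."""
    send_occ: int
    send_lat: int
    hop_lat: int
    recv_lat: int
    recv_occ: int

    def operand_cycles(self, hops: int) -> int:
        """Simplified end-to-end cost of one operand sent `hops` tiles away."""
        return (self.send_occ + self.send_lat + hops * self.hop_lat
                + self.recv_lat + self.recv_occ)

    def occupancy(self) -> int:
        """Issue cycles the sender and receiver spend on the transfer."""
        return self.send_occ + self.recv_occ


MULTIPROCESSOR = SON5(3, 15, 2, 1, 12)
RAW = SON5(0, 1, 1, 1, 0)
SUPERSCALAR = SON5(0, 0, 0, 0, 0)

print(MULTIPROCESSOR.operand_cycles(3), RAW.operand_cycles(3))  # prints: 37 5

# Raw-like networks with growing receive occupancy, i.e. <0, 1, 1, 1, n>.
sweep = [RAW._replace(recv_occ=n).operand_cycles(hops=2) for n in (0, 4, 8, 12, 16)]
print(sweep)  # prints: [4, 8, 12, 16, 20]
```

In this model each extra cycle of receive occupancy both delays every operand and consumes an issue slot on the receiving tile, which is why occupancy terms hurt fine-grained ILP more than latency alone would suggest.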

[Figure: Impact of Receive Occupancy, 64 tiles, i.e., <0, 1, 1, 1, n>. Speedup vs. Raw (y-axis, 0 to 1.2) against cycles (x-axis, 0 to 16) for cholesky, vpenta, mxm, fpppp-kernel, sha, swim, jacobi, and life.]

Experiential Evaluation (i.e., Real Hardware, Real Systems)

Systems Online or in Pipeline

Workstation

Microphone Array

Fabric System

(Software Radio on Raw)

(IP Routing on Raw)

Raw Chip Specifications

IBM SA27E Process: 180 nm, 6-metal copper ASIC process

16-Tile Raw Processor, 18.23 mm x 18.23 mm

1657 pin CCGA package

1152 HSTL signal pins

Clock and Power: 420 MHz (actual)

10 watts (power save mode)

18 watts typical

35 watts max

Twenty-eight 32-bit buses connecting the Raw chip to the I/O and memory system
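A few derived numbers help put these specifications in context. The Python arithmetic below uses the slide's values; the peak I/O bandwidth line rests on an assumption the slide does not state (that every bus moves one 32-bit word per cycle at the 420 MHz core clock), so treat it as an upper-bound illustration rather than a measured figure.

```python
# Back-of-envelope figures from the Raw chip specifications.
# ASSUMPTION (not stated on the slide): each of the 28 buses moves one 32-bit
# word per cycle at the 420 MHz core clock.
DIE_EDGE_MM = 18.23
BUSES, BUS_BITS, CLOCK_HZ = 28, 32, 420e6
TILES, TYPICAL_W = 16, 18.0

die_area_mm2 = DIE_EDGE_MM ** 2                   # ~332.3 mm^2
bytes_per_cycle = BUSES * BUS_BITS // 8           # 112 bytes per cycle
peak_io_gb_s = bytes_per_cycle * CLOCK_HZ / 1e9   # ~47 GB/s under the assumption
watts_per_tile = TYPICAL_W / TILES                # ~1.1 W per tile, typical

print(f"{die_area_mm2:.1f} mm^2, {peak_io_gb_s:.2f} GB/s, {watts_per_tile:.3f} W/tile")
```

The 18.23 mm square works out to about 332 mm², close to the 331 mm² die area listed in the comparison with the P3.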

Raw Motherboard

Microphone Board: 2 microphones, 1 A-to-D, 1 CPLD, 2 connectors

1020-Element Microphone Array

Fabric System Architecture

Design: two distinct board types

Board 1: Quad Raw Board

Board 2: I/O & Memory Board

Replicate and connect

Summary