Evaluating the Raw microprocessor
Michael Bedford Taylor
Raw Architecture Group
Computer Science and AI Laboratory
Massachusetts Institute of Technology
Evaluating the Raw microprocessor
Brief Overview of Raw Architecture
Avenues of Evaluation
Empirical - Comparison with P3
Analytical - Modeling Large-scale ILP
Experiential - Experimental Systems
The Raw Architecture
Divide the silicon into an array of identical, programmable tiles.
(A signal can get through a small amount of logic and to the next tile in one cycle.)
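[Figure: compute processor pipeline. Stage labels: IF, D, RF, A, TL, M1, M2, FP, E, U, TV, F4, WB. Registers r24 to r27 are shown at the input FIFOs from the static router and at the output FIFOs to the static router.]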
Inside the compute processor – networks are integrated directly into the bypass paths
pval5=seed.0*6.0
pval4=pval5+2.0
tmp3.6=pval4/3.0
tmp3=tmp3.6
v3.10=tmp3.6-v2.7
v3=v3.10
v2.4=v2
pval3=seed.0*v2.4
tmp2.5=pval3+2.0
tmp2=tmp2.5
pval6=tmp1.3-tmp2.5
v2.7=pval6*5.0
v2=v2.7
seed.0=seed
pval1=seed.0*3.0
pval0=pval1+2.0
tmp0.1=pval0/2.0
tmp0=tmp0.1
v1.2=v1
pval2=seed.0*v1.2
tmp1.3=pval2+2.0
tmp1=tmp1.3
pval7=tmp1.3+tmp2.5
v1.8=pval7*3.0
v1=v1.8
v0.9=tmp0.1-v1.8
v0=v0.9
Raw’s bypass-integrated on-chip networks serve as a Scalar Operand Network, or SON.
[Figure panels: Program graph; Multiple Raw tiles]
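One way to see how much parallelism the program graph above exposes is to compute its dependence depth. The following is a minimal sketch, not the Raw compiler's method. It assumes that "seed.o" in the listing is a typo for "seed.0", that seed, v1 and v2 hold values defined before the block is entered, and that every statement (including plain copies) costs one step. The names PROGRAM, LIVE_IN, build_deps and depths are hypothetical.

```python
import re

# Statements transcribed from the program graph ("seed.o" read as "seed.0").
PROGRAM = """
seed.0=seed      pval1=seed.0*3.0    pval0=pval1+2.0     tmp0.1=pval0/2.0
tmp0=tmp0.1      v1.2=v1             pval2=seed.0*v1.2   tmp1.3=pval2+2.0
tmp1=tmp1.3      v2.4=v2             pval3=seed.0*v2.4   tmp2.5=pval3+2.0
tmp2=tmp2.5      pval5=seed.0*6.0    pval4=pval5+2.0     tmp3.6=pval4/3.0
tmp3=tmp3.6      pval6=tmp1.3-tmp2.5 v2.7=pval6*5.0      v2=v2.7
pval7=tmp1.3+tmp2.5 v1.8=pval7*3.0   v1=v1.8             v0.9=tmp0.1-v1.8
v0=v0.9          v3.10=tmp3.6-v2.7   v3=v3.10
""".split()

# Assumption: these names are read as block-entry values, so reading them
# never depends on a statement inside the block.
LIVE_IN = {"seed", "v1", "v2"}

# dst=a  or  dst=a<op>b, where names may contain dots (e.g. tmp3.6).
STMT = re.compile(r"^([\w.]+)=([\w.]+)(?:([-+*/])([\w.]+))?$")


def is_constant(token):
    """True for numeric literals such as 6.0, False for names such as seed.0."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def build_deps(program):
    """Map each destination to the in-block values it reads."""
    deps = {}
    for stmt in program:
        dst, a, _op, b = STMT.match(stmt).groups()
        operands = [x for x in (a, b) if x is not None]
        deps[dst] = [x for x in operands
                     if not is_constant(x) and x not in LIVE_IN]
    return deps


def depths(deps):
    """Earliest step at which each value can be produced (1-based)."""
    memo = {}

    def depth(name):
        if name not in memo:
            memo[name] = 1 + max((depth(p) for p in deps[name]), default=0)
        return memo[name]

    for name in deps:
        depth(name)
    return memo


DEPS = build_deps(PROGRAM)
LEVELS = depths(DEPS)
CRITICAL_PATH = max(LEVELS.values())
print(len(PROGRAM), "statements, critical path of", CRITICAL_PATH, "steps")
```

Under these assumptions the 27 statements have a critical path of 7 steps, so an ideal machine with free communication could average a little under 4 statements per step. Spreading the graph over several tiles only approaches that bound if operands move between tiles cheaply, which is the role of the Scalar Operand Network.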
Empirical Evaluation - Comparison to P3
Parameter       Raw (IBM ASIC)    P3 (Intel)
Litho           180 nm            180 nm
Process         CMOS 7SF          P858
Metal Layers    Cu 6              Al 6
FO1 Delay       23 ps             11 ps
Dielectric k    4.1               3.55
Design Style    Standard Cell     Full custom
Initial Freq    425 MHz           500-733 MHz
Die Area        331 mm²           106 mm²
Scalar Operand Network
The network and the associated algorithms that are responsible for matching operands and operations in space.
SON Performance Metric: 5-tuple
Conventional distributed multiprocessor: <3, 15, 2, 1, 12>
Superscalar: <0, 0, 0, 0, 0> (not scalable)
Raw: a new point in the region.
Conventional distributed multiprocessor: <3, 15, 2, 1, 12>
Raw SON: <0, 1, 1, 1, 0>
Superscalar SON: <0, 0, 0, 0, 0> (not scalable)
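The 5-tuples above can be compared with a small cost model. The sketch below assumes the five components are, in order, send occupancy, send latency, network hop latency, receive latency, and receive occupancy; the slides do not spell this out, but the ordering matches the <0,1,1,1,n> receive-occupancy experiment later in the talk. The additive formula is also a simplifying assumption: occupancy is processor time spent on communication rather than wire delay, and a real schedule may overlap some of it. The class name SON and the method operand_cost are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SON:
    """A scalar operand network described by its 5-tuple (all values in cycles)."""
    send_occupancy: int
    send_latency: int
    hop_latency: int
    receive_latency: int
    receive_occupancy: int

    def operand_cost(self, hops):
        """Cycles charged for moving one operand across `hops` network hops,
        under the simple additive model assumed here."""
        return (self.send_occupancy
                + self.send_latency
                + hops * self.hop_latency
                + self.receive_latency
                + self.receive_occupancy)


# The three design points quoted on the slides.
MULTIPROCESSOR = SON(3, 15, 2, 1, 12)
RAW = SON(0, 1, 1, 1, 0)
SUPERSCALAR = SON(0, 0, 0, 0, 0)

for name, son in [("multiprocessor", MULTIPROCESSOR),
                  ("Raw", RAW),
                  ("superscalar", SUPERSCALAR)]:
    print(name, [son.operand_cost(hops) for hops in (1, 2, 4)])
```

With one hop, this model charges 33 cycles for the conventional multiprocessor, 3 for Raw, and 0 for the superscalar. Raising the last field of the Raw tuple from 0 to n reproduces the <0,1,1,1,n> configurations used in the receive-occupancy study.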
[Figure: speedup vs. Raw (y-axis, 0 to 1.2) plotted against cycles (x-axis, 0 to 16) for cholesky, vpenta, mxm, fpppp-kernel, sha, swim, jacobi, and life]
Impact of Receive Occupancy, 64 tiles, i.e., <0, 1, 1, 1, n>
Experiential Evaluation (i.e., Real Hardware, Real Systems)
Systems Online or in Pipeline
Workstation
Microphone Array
Fabric System
(Software Radio on Raw)
(IP Routing on Raw)
Raw Chip Specifications
IBM SA27E Process
180 nm, 6-metal copper ASIC process
16 Tile RAW Processor
18.23 mm x 18.23 mm
1657 pin CCGA package
1152 HSTL signal pins
Clock and Power
420 MHz (actual)
10 watts (power save mode)
18 watts typical
35 watts max
Fabric System Architecture
Design: two distinct board types
Board 1: Quad Raw Board
Board 2: I/O & Memory Board
Replicate and connect