
CS294-6 Reconfigurable Computing

Day 5

September 8, 1998

Comparing Computing Devices

Quotes

• An engineer is a man who can do for a dime what any fool can do for a dollar.

• If it can’t be expressed in figures, it is not science; it is opinion. -- Lazarus Long

Motivation

• Need to understand
  – How costly (big) a solution is
  – How it compares to alternatives
  – Cost and benefit of flexibility

What we really want:

• Complete implementation of our application

• For each architectural alternative
  – in the same implementation technology
  – with multiple area-time points

Reality

• Seldom get it packaged that nicely
  – much work to do so
  – technology keeps moving

• Deal with
  – estimation from components
  – technology differences
  – few area-time points

Today: Empirical

• Start sorting out
  – custom vs. configurable
  – spatial configurable vs. temporal

FPGA Table

How many gates?

“gates” in 2-LUT
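One way to see why counting "gates" in a 2-LUT is slippery: a k-input LUT can hold any of 2^(2^k) Boolean functions, so a 2-LUT covers all 16 two-input functions. The sketch below (illustrative only, not the slide's own counting method) enumerates them and counts how many depend on zero, one, or both inputs. Only the 10 that use both inputs look like a two-input gate, and some of those (XOR, XNOR) take several gates in a NAND-only library, so the "gates per LUT" figure depends on what the design happens to need.

```python
from itertools import product


def lut_functions(k):
    """Enumerate every truth table a k-input LUT can hold (2**(2**k) of them)."""
    rows = list(product((0, 1), repeat=k))  # all input combinations
    return [dict(zip(rows, bits)) for bits in product((0, 1), repeat=len(rows))]


def support(table, k):
    """Return the set of input positions the function actually depends on."""
    deps = set()
    for row, out in table.items():
        for i in range(k):
            flipped = row[:i] + (1 - row[i],) + row[i + 1:]
            if table[flipped] != out:
                deps.add(i)
    return deps


def classify_2lut():
    """Count 2-LUT functions by how many inputs they really use."""
    counts = {0: 0, 1: 0, 2: 0}
    for table in lut_functions(2):
        counts[len(support(table, 2))] += 1
    return counts


print(len(lut_functions(2)))  # 16
print(classify_2lut())        # {0: 2, 1: 4, 2: 10}
```

The two constants and four single-input functions (a, not a, b, not b) use the LUT without doing any "gate" work at all.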

Now how many?

Gates/unit area?

Usable gates?

Gates Required?

Depth=3, Depth=2048?

Gate metric for FPGAs?

• Day 3: several components of computation
  – compute element
  – interconnect:
    • space
    • time
  – instructions

• Not all applications need these in the same balance

• Assigning a single “capacity” number to a device is an oversimplification

Exercise Admin

• Simulation is slow -> see the fastsim alternative

• Exercise 1 took more effort than anticipated

• Drop exercise 4 and rearrange due dates
  – SPACE1: ASAP
  – SPACE2: originally 9/10, try to finish before 9/15
  – CYCLE: 9/24

Density vs. Binding Time

[Figure: density (y-axis) vs. binding time (x-axis) for Full Custom, Gate Array, FPGA, and Processor; binding-time labels: pre-mask, final mask(s), cycle, “startup”]

MPGA vs. Custom?

• AMI CICC’83
  – MPGA 1.0
  – Std-Cell 0.7
  – Custom 0.5

• Toshiba DSP
  – Custom 0.3

• Mosaid RAM
  – Custom 0.2

• GE CICC’86
  – MPGA 1.0
  – Std-Cell 0.4--0.7
    • FF/counter 0.7
    • FullAdder 0.4
    • RAM 0.2
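The MPGA rows read 1.0, so the other figures appear to be relative to an MPGA implementation of the same design; reading them as relative area is an assumption, though it matches the area summary later in the deck. Under that reading, the MPGA overhead is just the reciprocal of each entry. This sketch covers the single-valued AMI, Toshiba, and Mosaid entries.

```python
# Figures from the slide, assumed to be relative to MPGA = 1.0.
RELATIVE_TO_MPGA = {
    "AMI CICC'83 std-cell": 0.7,
    "AMI CICC'83 custom": 0.5,
    "Toshiba DSP custom": 0.3,
    "Mosaid RAM custom": 0.2,
}


def mpga_overhead(relative):
    """Times larger the MPGA is than an alternative at `relative` (MPGA = 1.0)."""
    return 1.0 / relative


for name, rel in RELATIVE_TO_MPGA.items():
    print(f"{name}: MPGA {mpga_overhead(rel):.1f}x")
```

The custom logic entries come out at 2x to 3.3x, in line with "MPGA 2-3x Custom"; the RAM entry sits further out at 5x.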

Metal Programmable Gate Arrays

MPGAs

• Modern -- “Sea of Gates”

• yield 35--70%

• maybe 5Kλ²/gate? (quite a bit of variance)

MPGA vs. FPGA

• MPGA (SOG GA)
  – 5Kλ²/gate
  – 35--70% usable (50%)
  – 7--17Kλ²/gate net

• Xilinx XC4K
  – 1.25Mλ²/CLB
  – 17--48 gates (26?)
  – 26--73Kλ²/gate net

• Ratio (XC4K/MPGA, net area per gate): 2--10 (5)

Adding the ~2x Custom/MPGA gap puts Custom/FPGA at ~10x
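The "net" area figures above come from dividing raw area by how much of it does useful gate work. A minimal sketch of that arithmetic, using the slide's 5Kλ²/gate, 35--70% usable, 1.25Mλ²/CLB, and 17--48 gates/CLB figures; pairing the best FPGA case against the worst MPGA case (and vice versa) to get the range is our assumption about how the range was formed.

```python
MPGA_GATE_AREA = 5e3    # λ² per gate (slide figure)
XC4K_CLB_AREA = 1.25e6  # λ² per CLB (slide figure)


def mpga_net_area(usable_fraction):
    """Area per usable gate: raw gate area over the fraction actually used."""
    return MPGA_GATE_AREA / usable_fraction


def xc4k_net_area(gates_per_clb):
    """Area per gate when one CLB stands in for `gates_per_clb` gates."""
    return XC4K_CLB_AREA / gates_per_clb


typical = xc4k_net_area(26) / mpga_net_area(0.50)  # typical case on both sides
best = xc4k_net_area(48) / mpga_net_area(0.35)     # FPGA at its best, MPGA at its worst
worst = xc4k_net_area(17) / mpga_net_area(0.70)    # the opposite corner
print(f"typical {typical:.1f}x, range {best:.1f}x to {worst:.1f}x")
```

This prints `typical 4.8x, range 1.8x to 10.3x`, matching the slide's "2--10 (5)".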

MPGA vs. FPGA

• MPGA (SOG GA): gate delay (gd) ~1 ns

• Xilinx XC4K: gates in 7 ns
  – 2-3 gates typical

• Ratio: 1--7 (2.5)
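Assuming the 7 ns is the delay of one XC4K logic stage that absorbs the typical 2-3 gates, the delay charged to each gate is 7 ns divided by the gate count. This sketch compares that against the MPGA's ~1 ns gate delay.

```python
MPGA_GATE_DELAY_NS = 1.0  # "gd ~1 ns" from the slide
XC4K_DELAY_NS = 7.0       # assumed: delay of one XC4K logic stage


def per_gate_delay(stage_delay_ns, gates_per_stage):
    """Delay charged to each gate when one stage absorbs several gates."""
    return stage_delay_ns / gates_per_stage


for gates in (2, 3):
    ratio = per_gate_delay(XC4K_DELAY_NS, gates) / MPGA_GATE_DELAY_NS
    print(f"{gates} gates per 7 ns: {ratio:.2f}x the MPGA gate delay")
```

The result is 3.50x at two gates and 2.33x at three, a band that brackets the slide's typical 2.5x.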

Processors and FPGAs

Processors and FPGAs

Degrade from Peak: Processors

• Ops w/ no gate evaluations (interconnect)

• Ops use limited word width

• Stalls waiting for retimed data
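A toy model of the first two bullets. It assumes peak capacity means every issued op uses the full datapath width for logic. Ops that only move data deliver no gate evaluations, and narrow ops leave part of the word idle, so the two losses multiply. The 60% and 8-of-32-bit figures are hypothetical.

```python
def delivered_fraction(compute_fraction, op_bits, datapath_bits):
    """Share of a processor's peak bit-operations actually delivered.

    Toy model (an assumption): compute_fraction is the share of issued ops
    that evaluate logic (the rest only move data, acting as interconnect);
    op_bits / datapath_bits is the share of the word each logic op needs.
    """
    if not 0 < op_bits <= datapath_bits:
        raise ValueError("op_bits must be in 1..datapath_bits")
    return compute_fraction * (op_bits / datapath_bits)


# Hypothetical: 60% of ops do logic, on 8-bit data in a 32-bit datapath.
print(delivered_fraction(0.6, 8, 32))  # 0.15
```

Stalls waiting for retimed data would add a third multiplicative factor, which is left out of this sketch.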

Degrade from Peak: FPGAs

• Long path length --> not running at the full cycle rate

• Limited throughput requirement
  – bottlenecks elsewhere limit the throughput requirement

• Insufficient interconnect

• Insufficient retiming resources (bandwidth)

Degrade from Peak: Custom/MPGA

• Solves a more general problem than required
  – (more gates than really needed)

• Long path length

• Limited throughput requirement

• Not needed or applicable to a problem

Raw Density Summary

• Area
  – MPGA 2-3x Custom
  – FPGA 5x MPGA

• Area-Time
  – Gate Array 6-10x Custom
  – FPGA 15-20x Gate Array
  – Processor 10x FPGA
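The area-time ratios compose by multiplication along the chain custom → gate array → FPGA → processor. This sketch multiplies the slide's low and high ends separately.

```python
# Area-time ratios from the slide, as (low, high).
GA_OVER_CUSTOM = (6, 10)
FPGA_OVER_GA = (15, 20)
PROC_OVER_FPGA = (10, 10)


def chain(*ratios):
    """Multiply (low, high) ratio ranges along a chain of comparisons."""
    low, high = 1, 1
    for lo, hi in ratios:
        low, high = low * lo, high * hi
    return low, high


print(chain(GA_OVER_CUSTOM, FPGA_OVER_GA))                  # (90, 200)
print(chain(GA_OVER_CUSTOM, FPGA_OVER_GA, PROC_OVER_FPGA))  # (900, 2000)
```

That gives FPGA/custom at 90-200x and processor/custom at 900-2000x. These are the order-of-magnitude 100x and 1000x raw densities in the summary.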

Raw Density Caveats

• Processor/FPGA may solve a more specialized problem

• Problems have different resource balance requirements
  – …can lead to low yield of raw density
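A minimal sketch of how a resource-balance mismatch lowers the yield of raw density. It assumes a simple bottleneck model in which the scarcest resource caps how many copies of an application fit on the device and everything else sits partly idle. The resource names and quantities are hypothetical.

```python
def balance_yield(supply, demand_per_unit):
    """Bottleneck model (an assumption): the scarcest resource caps how many
    copies of the application fit; every other resource is partly idle.

    supply:          resources the device offers, by hypothetical category
    demand_per_unit: resources one copy of the application consumes
    Returns (copies that fit, fraction of each resource actually used).
    """
    copies = min(supply[r] / demand_per_unit[r] for r in demand_per_unit)
    used = {r: copies * demand_per_unit[r] / supply[r] for r in demand_per_unit}
    return copies, used


# Hypothetical device with equal compute and interconnect capacity.
device = {"compute": 1000, "interconnect": 1000}

balanced_app = {"compute": 1, "interconnect": 1}
wiring_heavy_app = {"compute": 1, "interconnect": 4}

print(balance_yield(device, balanced_app))
print(balance_yield(device, wiring_heavy_app))
```

The balanced application uses all of both resources. The interconnect-heavy one fits only 250 copies and leaves 75% of the compute idle, even though the device's raw "capacity" is unchanged.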

Broadening the Picture

• Compare larger computations

• For comparison
  – throughput density metric: results/area-time
    • normalizes out the choice of area-time point
    • high throughput density means:
      – most results in a fixed area
      – least area to satisfy a fixed throughput target
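A small sketch of the throughput density metric and its two readings. It assumes an ideal, linear area-time tradeoff (doubling area halves time), which is why the metric is insensitive to which area-time point a design picks. All numbers are hypothetical and the units arbitrary.

```python
def throughput_density(results, area, time):
    """Results per unit area-time."""
    return results / (area * time)


def area_for_throughput(target_rate, density):
    """Area needed to sustain `target_rate` results per unit time at `density`."""
    return target_rate / density


# Hypothetical design at two area-time points: twice the area, half the time.
narrow = throughput_density(results=100, area=10, time=10)
wide = throughput_density(results=100, area=20, time=5)
print(narrow, wide)  # 1.0 1.0

# Fixed target of 50 results per unit time; a second hypothetical
# architecture with 4x the throughput density needs a quarter of the area.
print(area_for_throughput(50, narrow))  # 50.0
print(area_for_throughput(50, 4.0))     # 12.5
```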

Multiply

FIR

IIR/Biquad

DES Keysearch

<http://www.cs.berkeley.edu/~iang/isaac/hardware/>

DNA Sequence Match

• Problem: “cost” of transforming S1 into S2

• Given: cost of insertion, deletion, substitution

• Relevance: similarity of DNA sequences
  – evolutionary similarity
  – structure predicts function

• Typically: new sequence compared against a large database
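The slides do not show an algorithm, but the problem as stated is a weighted edit distance, usually computed with the standard dynamic-programming table. Below is a plain software sketch with the three costs as parameters. Each cell depends only on its left, upper, and upper-left neighbors, so cells on the same anti-diagonal are independent of each other, which is what lets spatial hardware evaluate many of them at once.

```python
def edit_cost(s1, s2, ins_cost=1, del_cost=1, sub_cost=1):
    """Minimum total cost to transform s1 into s2 (weighted edit distance).

    cost[i][j] is the cheapest way to turn s1[:i] into s2[:j].
    """
    n, m = len(s1), len(s2)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * del_cost  # delete everything in s1[:i]
    for j in range(1, m + 1):
        cost[0][j] = j * ins_cost  # insert everything in s2[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = 0 if s1[i - 1] == s2[j - 1] else sub_cost
            cost[i][j] = min(
                cost[i - 1][j] + del_cost,   # delete s1[i-1]
                cost[i][j - 1] + ins_cost,   # insert s2[j-1]
                cost[i - 1][j - 1] + step,   # substitute, or keep a match
            )
    return cost[n][m]


print(edit_cost("ACGT", "AGT"))                # 1 (delete C)
print(edit_cost("ACGT", "ACCT", sub_cost=3))   # 2 (delete G, insert C beats substituting)
```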

DNA Sequence Match

Floating-Point Add (single prec.)

Floating-Point Mpy (single prec.)

Summary

• Raw densities (Area-Time)
  – FPGA/custom = 100x
  – Processor/custom = 1000x

• Special-purpose functional units in processors/DSPs give much lower net benefit, since they still need control and interconnect

• Gap narrows (closes) when the programmable device can be specialized