Page 1: The Reconfigurable Streaming Vector Processor (RSVPTMclass.ece.iastate.edu/tyagi/cpre583/Fall2005/... · 2005-12-05 · 1 The Reconfigurable Streaming Vector Processor (RSVPTM) Silviu

The Reconfigurable Streaming Vector Processor (RSVP™)

Silviu Ciricescu, Ray Essick, Brian Lucas, Phil May, Kent Moat, Jim Norris, Michael Schuette, and Ali Saidi†

Motorola Labs, Motorola, Schaumburg, IL
†The Mitre Corporation, Bedford, MA

Introduction

Coprocessor
    A special-purpose processing unit that assists the CPU in performing certain types of operations

Vector Architecture
    Traditionally used in supercomputers
    Heavily pipelined architecture that operates on vectors and matrices
    Vector processors are machines built primarily to handle large scientific and engineering calculations

Introduction (Cont.)

Multimedia Functions
    Computationally intensive
    Data streaming
    Applications: image/video capture, handwriting recognition, voice recognition
    Many more portable/embedded applications

Streaming Data
    Data is produced/acquired as a stream of elements
    Each element is relevant for only a short period of time
    Every element undergoes the same set of computations
    High degree of spatial locality
    Relatively poor temporal locality
    Data access patterns allow prefetching of data ahead of computation

Introduction to RSVP

RSVP is a vector coprocessor architecture
Accelerates streaming data operations
Targets multimedia functions
RSVP programming model
    Streaming data description – vector shape and size
    Computation description – Data Flow Graphs
    The two descriptions are intuitive and independent of each other
    Machine independent

Motivation for RSVP

Vector Processing Architectures
    In recent years, targeting multimedia rather than supercomputing
    MMX extensions to the Intel IA-32 architecture
    AltiVec extensions to the PowerPC architecture

“Wide Word SIMD”
    RISC-like load-store programming model
    Wide, fixed-size vector registers
    Low level of abstraction

RSVP is a better architecture than “Wide Word SIMD”

Performance gap between memory and processing

Streaming Vector Data Processing using RSVP Architecture

A coprocessor that operates synchronously with an existing host CPU
A programming model that separates the description of data from the description of computation
    Data is described by its location and shape in memory
    Computation is described by a Data Flow Graph

Streaming Computation Model

Decoupled Operand Fetch
Deep Pipelining (function unit chaining)
SIMD Processing

Streaming Computation Model

RSVP architecture uses a stream-oriented approach to vector processing
Decouples and overlaps data access and data processing

Decoupled Operand Fetch
    Vector stream units (VSUs) – independent load/store units
    VSUs communicate with the processing unit through interlocked FIFO queues

Deep Pipelining
    Processing unit split into an N-stage pipeline by chaining multiple function units together
    Allows higher clock frequencies and increased resource utilization

SIMD Processing
    Multiple taps on each of the VSUs
    Parallelism limited by available resources, algorithm characteristics, and vector size
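
The decoupled operand fetch described above can be sketched as a toy software model. This is only an illustration, not the RSVP hardware: the names `vector_stream_unit`, `processing_unit`, `run_stream` and the queue depth of 4 are assumptions. A bounded queue stands in for the interlocked FIFO, so the fetching side stalls when the queue is full and the processing side stalls when it is empty.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream (assumption of this sketch)


def vector_stream_unit(memory, fifo):
    """Hypothetical input VSU: fetches operands and pushes them into the FIFO."""
    for value in memory:
        fifo.put(value)  # blocks while the FIFO is full (interlock)
    fifo.put(_DONE)


def processing_unit(fifo, op):
    """Hypothetical processing unit: consumes operands as they arrive."""
    results = []
    while True:
        value = fifo.get()  # blocks while the FIFO is empty (interlock)
        if value is _DONE:
            return results
        results.append(op(value))


def run_stream(memory, op, depth=4):
    """Run fetch and compute concurrently, coupled only through the FIFO."""
    fifo = queue.Queue(maxsize=depth)
    fetcher = threading.Thread(target=vector_stream_unit, args=(memory, fifo))
    fetcher.start()
    results = processing_unit(fifo, op)
    fetcher.join()
    return results
```

For example, `run_stream([1, 2, 3], lambda x: x * 2)` fetches and doubles the three elements while the two sides run independently of each other.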

Describing Data

Vector processing is handled by the Vector Streaming Unit (VSU)
A vector description consists of a pointer to the first element of the vector in memory and a description of the vector's shape
The shape of a vector consists of three scalar values: Stride, Skip, Span

Vector Shape
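
A small sketch of how a base pointer plus a shape could be expanded into element offsets. The slides name the three shape values but do not define them, so the semantics used here are an assumption: `stride` is the step between consecutive elements, `span` is the number of elements in a run, and `skip` is the step taken after the last element of a run in place of `stride`. The function name `vector_offsets` and the use of element units are also hypothetical.

```python
def vector_offsets(base, stride, span, skip, count):
    """Yield the offset of each of `count` elements of a shaped vector.

    Assumed semantics (not defined on the slides), all in units of elements:
      stride - step between consecutive elements inside a run
      span   - number of elements in a run
      skip   - step from the last element of a run to the first element
               of the next run (applied instead of stride)
    """
    offset = base
    for i in range(count):
        yield offset
        # After the last element of a run, apply skip instead of stride.
        if (i + 1) % span == 0:
            offset += skip
        else:
            offset += stride
```

Under these assumptions, a 4x4 block at the top-left corner of an 8-element-wide image is described by `base=0, stride=1, span=4, skip=5`: four adjacent elements, then a jump to the start of the next row.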

Describing Computation Using Data Flow Graphs

Synchronous DFG language expresses vector operations in a machine-independent manner
Dependencies are explicitly stated to facilitate parallel execution
DFG node description
    Input operands – references to previous nodes rather than named registers (data dependence)
    Operation to be performed
    Minimum precision of its output values
Iteration-to-iteration dependence
    Tunnel nodes – source and sink of the data flow are the same
Order dependence
    Any parallel execution must match the result of sequential execution (linear DFG)
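
To make the node description concrete, here is a minimal sketch of a sequential evaluator for a linear DFG in which operands are named by how many nodes back they were produced rather than by register. The tuple encoding, the `OPS` table, and the function name `run_linear_dfg` are assumptions made for illustration; they are not the RSVP binary format, and tunnel nodes and output precision are left out.

```python
import operator

# Hypothetical operation set for this sketch.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def run_linear_dfg(nodes, streams):
    """Evaluate a linear DFG once per stream element, in node order.

    `nodes` is a list of tuples (hypothetical encoding):
      ("in", k)          - read the current element of input stream k
      (op_name, d1, d2)  - apply the operation to the results of the nodes
                           d1 and d2 positions earlier (back-references)
    The result of the last node is the output of each iteration.
    """
    outputs = []
    for elements in zip(*streams):
        results = []
        for node in nodes:
            if node[0] == "in":
                results.append(elements[node[1]])
            else:
                op_name, d1, d2 = node
                here = len(results)  # index of the node being evaluated
                results.append(OPS[op_name](results[here - d1], results[here - d2]))
        outputs.append(results[-1])
    return outputs
```

With this encoding, `a * b + c` becomes `[("in", 0), ("in", 1), ("mul", 2, 1), ("in", 2), ("add", 2, 1)]`. Because every dependence is explicit in the back-references, a scheduler is free to reorder or overlap nodes as long as the result matches this sequential evaluation.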

Scheduling and Binary Compatibility

The binary form of the linear DFG is executed by RSVP
The linear DFG may not be best suited for direct execution on a particular RSVP implementation
Universal Fat Binaries (UFB) are provided
    Contain more than one binary form of the DFG
    Created by the DFG compiler
    Organized as a linked list with the linear DFG appearing last
    The first binary form in the list that can execute is the one used
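
The selection rule can be sketched as a walk over an ordered list. The pair layout, the form names, and the `can_execute` predicate are hypothetical; the slides state only that the forms are linked in a list, that the linear DFG comes last, and that the first form that executes is used.

```python
def select_binary(fat_binary, can_execute):
    """Return the first form in the list that this implementation can run.

    `fat_binary` is an ordered list of (form_name, payload) pairs with the
    linear DFG last; `can_execute` is a predicate supplied by the
    (hypothetical) RSVP implementation.
    """
    for form_name, payload in fat_binary:
        if can_execute(form_name):
            return form_name, payload
    raise ValueError("no executable form found; the linear DFG should be last")
```

Placing the linear DFG last gives every implementation a fallback, while an implementation that recognizes an earlier, pre-scheduled form never reaches it.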

Quant Programming Example

Compresses video images through quantization
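
The listing shown on this slide is not part of the transcript. As a stand-in, the following is a generic sketch of uniform scalar quantization, the kind of per-element operation such a kernel applies to a stream of coefficients. It is not the authors' code; the function names, the truncate-toward-zero rounding, and the single `step` parameter are all assumptions.

```python
def quantize(coefficients, step):
    """Uniform scalar quantization: divide by `step`, truncating toward zero."""
    if step <= 0:
        raise ValueError("step must be positive")
    levels = []
    for c in coefficients:
        magnitude = abs(c) // step  # integer division on the magnitude
        levels.append(magnitude if c >= 0 else -magnitude)
    return levels


def dequantize(levels, step):
    """Approximate reconstruction: multiply each level by the step size."""
    return [level * step for level in levels]
```

Every element goes through the same short computation and is never revisited, which matches the streaming-data profile described earlier in the deck.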

Quant Programming Example – Data Flow Graph

Architecture

Architecture

Input and output VSUs
    Minimum 1 and 3 respectively
    Maximum 64
    VSUs handle all issues related to loading/storing data

DFG
    No more than 256 nodes
    “Reach back” of no more than 63 nodes

Accumulator and scalar/tunnel registers
    Minimum 2 and 16 respectively
    Maximum 64

Scheduler
    Converts the DFG to a machine-dependent form

Control
    Implements the machine-dependent form of the DFG

Function units
    Should support all operations allowed by the DFG
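
The two DFG limits listed above lend themselves to a simple check. This sketch assumes a DFG is given as a list in which entry `i` holds the reach-back distances used by node `i`; that representation and the name `check_dfg_limits` are hypothetical, while the constants 256 and 63 come from the slide.

```python
MAX_NODES = 256       # "No more than 256 nodes"
MAX_REACH_BACK = 63   # "Reach back" of no more than 63 nodes


def check_dfg_limits(back_references):
    """Return True if the DFG respects the node-count and reach-back limits.

    `back_references[i]` lists the reach-back distances used by node i
    (hypothetical representation).
    """
    if len(back_references) > MAX_NODES:
        return False
    for index, distances in enumerate(back_references):
        for d in distances:
            # A reference must point at an earlier node that is within reach.
            if not (1 <= d <= min(index, MAX_REACH_BACK)):
                return False
    return True
```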

Implementation

First implementation of RSVP
    Low-cost, low-power solution
    Fabricated in TSMC 0.18 µm CMOS technology

Tool set
    Compiler, assembler, and linker for ARM
    Compiler for RSVP linear DFG

Area
    Comparable to the ARM9 host processor
    In effect, doubling the area resulted in a greater than 2x increase in performance

Power dissipation
    Comparable to the ARM9 host processor
    Clock gating is used to reduce system power dissipation

First Implementation

Reconfigurability

Fabric and Queue
    Reconfigurable interconnect
    20 links; each link can transfer 16 bits of data from its source to its destination
    Links can be reconfigured every cycle

Function units
    64 bits wide, sliced on 16-bit boundaries
    Fully pipelined with result latching
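
One way to picture a 64-bit function unit sliced on 16-bit boundaries is as four independent 16-bit lanes. The sketch below models a lane-wise add in that configuration. The wrap-around behaviour on overflow and the name `sliced_add` are assumptions; the slides do not say how overflow is handled or which other lane widths are supported.

```python
LANE_BITS = 16
LANES = 4  # 64-bit word split into four 16-bit slices
LANE_MASK = (1 << LANE_BITS) - 1


def sliced_add(a, b):
    """Add two 64-bit words as four independent 16-bit lanes (wrapping)."""
    result = 0
    for lane in range(LANES):
        shift = lane * LANE_BITS
        lane_sum = ((a >> shift) & LANE_MASK) + ((b >> shift) & LANE_MASK)
        # Mask the sum so a carry never spills into the neighbouring lane.
        result |= (lane_sum & LANE_MASK) << shift
    return result
```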

Results – Kernel Speedups

Magnitude of the speedup results from the large effective instruction width of the RSVP implementation
Issue rate
    ARM9 – 0.78 instructions per cycle (IPC)
    RSVP – 9 IPC
    Wide Word SIMD (WWSIMD) – 4 IPC
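
A short calculation puts the issue rates side by side. The three figures are taken from the slide; the dictionary and the function name `issue_ratio` are illustrative. A ratio of issue rates is only a rough indicator and should not be read as a measured speedup.

```python
# Issue rates in instructions per cycle, as quoted on the slide.
ISSUE_RATE = {"ARM9": 0.78, "WWSIMD": 4.0, "RSVP": 9.0}


def issue_ratio(machine, baseline="ARM9"):
    """Ratio of one machine's issue rate to the baseline's."""
    return ISSUE_RATE[machine] / ISSUE_RATE[baseline]
```

On these numbers RSVP issues roughly 11.5 times as many operations per cycle as the ARM9 and 2.25 times as many as Wide Word SIMD.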

Results – Application Speedups

Conclusion

RSVP is a vector coprocessor/accelerator architecture
Improves general-purpose CPU performance on streaming data applications
Improves time to market because of ease of programmability
Speedups for kernels and applications range from 2 to over 20 times that of the host processor alone

Thank you

Questions ???