Posted on 31-Jan-2016



Evaluating the Imagine Stream Processor

Jung Ho Ahn, William J. Dally, Brucek Khailany, Ujval J. Kapasi, and Abhishek Das

ISCA 2004

Motivation

• Provide the efficiency of an ASIC

• Provide the flexibility of a programmable processor

• Simplify special-purpose processor design

• Lower special-purpose processor design cost

• Provide broader applicability

• Target media applications

Stream Architecture

Development Board

• PowerPC, 150 MHz

• 2 x Imagine, 200 MHz

• FPGA bridge, 66 MHz

• 256 MB of SDRAM per Imagine, 100 MHz

Applications

Mapping

Execution on a Single Stream

[Figure: Kernel 1 reads an input stream from the SRF, processes it over iterations 1 through n, and writes an output stream back to the SRF]

Execution of Multiple Kernels

[Figure: Kernels 1, 2, and 3 each process streams held in the SRF; Streams 1 to 4 pass between the kernels]
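The execution model in these two slides can be sketched functionally. This is a minimal Python model, assuming each kernel is an element-wise function over stream records; it is not Imagine's programming interface (Imagine itself is programmed in StreamC and KernelC), and the kernel bodies and helper names are hypothetical.

```python
from typing import Callable, Iterable, List

# Hypothetical model: a kernel is a per-record function. Real Imagine kernels
# are compiled KernelC loops running on the arithmetic clusters.
Kernel = Callable[[float], float]


def run_kernel(kernel: Kernel, stream: Iterable[float]) -> List[float]:
    """Apply one kernel to every record of an input stream (iterations 1..n)."""
    return [kernel(record) for record in stream]


def run_pipeline(kernels: List[Kernel], stream: List[float]) -> List[List[float]]:
    """Run kernels in order; each kernel's output stream is the next one's input.

    Returns every stream produced (Stream 1 .. Stream k+1), standing in for the
    intermediate streams that stay in the SRF rather than going to DRAM.
    """
    streams = [list(stream)]
    for kernel in kernels:
        streams.append(run_kernel(kernel, streams[-1]))
    return streams


def kernel_1(x: float) -> float:  # hypothetical kernel bodies
    return 2.0 * x


def kernel_2(x: float) -> float:
    return x + 1.0


def kernel_3(x: float) -> float:
    return x * x


if __name__ == "__main__":
    all_streams = run_pipeline([kernel_1, kernel_2, kernel_3], [1.0, 2.0, 3.0])
    for i, s in enumerate(all_streams, start=1):
        print(f"Stream {i}: {s}")
```

With three kernels the sketch yields four streams, matching the labels on the slide; Stream 4 here is [9.0, 25.0, 49.0].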

Application Performance

GOPS: 18%

GFLOPS: 60%

Sources of Overhead

Stream Length Effects

Access Pattern Effects

Energy Efficiency

Energy consumption per FLOP (normalized to a 0.13 µm, 1.2 V process):

Imagine @ 200 MHz: 277 pJ/FLOP

TI C67x DSP @ 225 MHz: 889 pJ/FLOP (3.2x more)

Intel Pentium M @ 1.2 GHz: 3600 pJ/FLOP (13x more)
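The quoted ratios follow directly from the normalized figures. A short check, which also converts pJ/FLOP into GFLOPS/W (1000 divided by the pJ per FLOP); the conversion is plain unit arithmetic, not a number taken from the slides, and the helper names are illustrative.

```python
# Normalized energy per FLOP, in pJ, as listed on the slide (0.13 um, 1.2 V).
ENERGY_PJ_PER_FLOP = {
    "Imagine @ 200 MHz": 277.0,
    "TI C67x DSP @ 225 MHz": 889.0,
    "Intel Pentium M @ 1.2 GHz": 3600.0,
}


def energy_ratios(table: dict, baseline: str) -> dict:
    """Energy per FLOP of each processor relative to the baseline processor."""
    base = table[baseline]
    return {name: pj / base for name, pj in table.items()}


def gflops_per_watt(pj_per_flop: float) -> float:
    """1 FLOP per E pJ is 1e12 / E FLOP per joule, i.e. 1000 / E GFLOPS/W."""
    return 1000.0 / pj_per_flop


if __name__ == "__main__":
    ratios = energy_ratios(ENERGY_PJ_PER_FLOP, "Imagine @ 200 MHz")
    for name, pj in ENERGY_PJ_PER_FLOP.items():
        print(f"{name}: {ratios[name]:.1f}x energy, {gflops_per_watt(pj):.2f} GFLOPS/W")
```

889 / 277 is about 3.2 and 3600 / 277 is about 13.0, consistent with the slide; 277 pJ/FLOP corresponds to roughly 3.6 GFLOPS/W.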

Memory Bandwidth Requirement

Host Processor Bandwidth Requirement

Programming Model

Compiler Optimizations: Stream Ordering

Compiler Optimizations: SRF Overlapping and Packing
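The slide gives only the title of this optimization. One plausible reading, offered as an assumption rather than the compiler's documented algorithm, is that streams whose lifetimes do not overlap can share the same SRF space, while simultaneously live streams are packed tightly. A first-fit sketch of that idea follows; the StreamAlloc record, the lifetime indices, and pack_srf are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class StreamAlloc:
    """Hypothetical description of one stream the SRF must hold."""
    name: str
    size: int       # SRF words needed
    first_use: int  # index of the first operation that touches the stream
    last_use: int   # index of the last operation that touches the stream


def lifetimes_overlap(a: StreamAlloc, b: StreamAlloc) -> bool:
    """True if both streams must be resident at the same time."""
    return a.first_use <= b.last_use and b.first_use <= a.last_use


def pack_srf(streams: List[StreamAlloc], capacity: int) -> Dict[str, int]:
    """First-fit offset assignment; streams with disjoint lifetimes may share space."""
    placed: List[Tuple[StreamAlloc, int]] = []
    offsets: Dict[str, int] = {}
    for s in sorted(streams, key=lambda st: st.size, reverse=True):
        # Address ranges already taken by streams that are live at the same time as s.
        busy = sorted(
            (off, off + other.size) for other, off in placed if lifetimes_overlap(s, other)
        )
        offset = 0
        for lo, hi in busy:
            if offset + s.size <= lo:
                break  # s fits in the gap below this range
            offset = max(offset, hi)
        if offset + s.size > capacity:
            raise ValueError(f"stream {s.name!r} does not fit in {capacity} SRF words")
        placed.append((s, offset))
        offsets[s.name] = offset
    return offsets


if __name__ == "__main__":
    demo = [
        StreamAlloc("A", 100, 0, 1),
        StreamAlloc("B", 100, 1, 2),
        StreamAlloc("C", 100, 2, 3),
    ]
    print(pack_srf(demo, capacity=200))
```

In the demo, stream C reuses A's space because A is dead by the time C becomes live, so three 100-word streams fit in 200 words instead of 300.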

Compiler Optimizations: Strip-mining
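Strip-mining, in the general compiler sense, splits a stream too long to fit in the SRF into strips that do fit and runs the kernel on one strip at a time. A minimal sketch, assuming the kernel treats records independently so per-strip results can simply be concatenated; strip_mine, strip_len, and scale_kernel are hypothetical names, and strip_len stands in for however many records fit in the SRF.

```python
from typing import Callable, List

# Hypothetical kernel type: maps a strip of records to a strip of results.
StripKernel = Callable[[List[float]], List[float]]


def strip_mine(stream: List[float], strip_len: int, kernel: StripKernel) -> List[float]:
    """Run `kernel` over `stream` one SRF-sized strip at a time.

    strip_len is an assumed stand-in for the number of records that fit in
    the SRF. Concatenating the per-strip outputs is only valid when the
    kernel treats each record independently.
    """
    if strip_len <= 0:
        raise ValueError("strip_len must be positive")
    out: List[float] = []
    for start in range(0, len(stream), strip_len):
        strip = stream[start:start + strip_len]  # one strip "loaded" into the SRF
        out.extend(kernel(strip))                # kernel runs on that strip only
    return out


def scale_kernel(strip: List[float]) -> List[float]:
    """Hypothetical record-independent kernel."""
    return [2.0 * x for x in strip]


if __name__ == "__main__":
    print(strip_mine([1.0, 2.0, 3.0, 4.0, 5.0], 2, scale_kernel))
```

With strip_len = 2 the five records are processed as strips [1, 2], [3, 4], and [5], and the result matches running the kernel on the whole stream at once.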

Compiler Optimizations: Loop Unrolling and Software Pipelining
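Both are standard loop transformations for keeping functional units busy. The sketch below illustrates them under simple assumptions: a 4-way unrolled sum with independent accumulators and a remainder loop, and a software-pipelining schedule with an initiation interval of one step, so iteration i starts stage 0 at step i. Neither reflects the actual scheduler of the Imagine kernel compiler; all names are hypothetical.

```python
from typing import List, Tuple


def unrolled_sum(xs: List[float]) -> float:
    """Sum xs with the loop body unrolled 4 ways.

    Four independent accumulators remove the serial dependence on a single
    running total; a remainder loop handles the len(xs) % 4 leftover elements.
    """
    a0 = a1 = a2 = a3 = 0.0
    n_main = len(xs) - len(xs) % 4
    for i in range(0, n_main, 4):
        a0 += xs[i]
        a1 += xs[i + 1]
        a2 += xs[i + 2]
        a3 += xs[i + 3]
    total = (a0 + a1) + (a2 + a3)
    for x in xs[n_main:]:
        total += x
    return total


def pipeline_schedule(n_iters: int, n_stages: int) -> List[List[Tuple[int, int]]]:
    """(iteration, stage) pairs active at each time step of a software pipeline.

    Assumes an initiation interval of one step. The first n_stages - 1 steps
    form the prologue and the last n_stages - 1 the epilogue; in between
    (when n_iters >= n_stages), every stage works on a different iteration.
    """
    steps = []
    for t in range(n_iters + n_stages - 1):
        steps.append([(i, t - i) for i in range(n_iters) if 0 <= t - i < n_stages])
    return steps


if __name__ == "__main__":
    print(unrolled_sum([float(v) for v in range(1, 11)]))
    for t, active in enumerate(pipeline_schedule(5, 3)):
        print(f"step {t}: {active}")
```

For 5 iterations of a 3-stage body, the pipelined loop finishes in 7 steps rather than 15, with steps 2 to 4 running all three stages at once.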

Conclusions

• Provides performance close to that of an ASIC, with flexibility through programmability

• Can sustain between 16% and 60% of peak arithmetic performance

• An exposed two-level register file lets the compiler exploit locality

• Broader applicability

• Requires considerable programming effort

• Limited to media applications with regular control flow

Collab Questions

• How does the performance compare to that of other processors? (Dan, Marko, Jason, Prateeksha, Chris)

• What is the compiler efficiency? (Mario, Liang)

• How were the design decisions motivated? (Jing, Marisabel)

• How does the programming model compare to that of GPUs? (Greg)

Kernels
