evaluating application performance between dsp processor ... · group of architectures and...

30
Evaluating Application Performance Between DSP Processor and GPP Using Reconfigurable Hardware Nicola, Eduardo V. Ruzicki, Julio C.M. Martins, Luis H.J. Mattos, J´ ulio C.B. Group of Architectures and Integrated Circuits - GACI Federal University of Pelotas - UFPel Pelotas - Brasil 1 / 30

Upload: ledien

Post on 03-Sep-2018

231 views

Category:

Documents


0 download

TRANSCRIPT

Evaluating Application Performance BetweenDSP Processor and GPP Using Reconfigurable

Hardware

Nicola, Eduardo V.Ruzicki, Julio C.M.Martins, Luis H.J.Mattos, Julio C.B.

Group of Architectures and Integrated Circuits - GACIFederal University of Pelotas - UFPel

Pelotas - Brasil

1 / 30

Index

1 Introduction

2 The Reconfigurable Hardware

3 Case Study

4 Results and Conclusions

2 / 30

Introduction

The industry and market of electronic devices grows rapidlyeach day

Most products now have builtin microprocessors to handlespecific tasks

This is made possible due to the fast technology development,in response to market demands

General purpose processors (GPPs)ASICs (Application Specific Integrated Circuits)

3 / 30

Introduction

However, it is possible to perform those tasks in other kinds ofprocessors:

GPPs were used before the idealization of DSP processorsASIC were another option of reduced costs and flexibility

4 / 30

Motivation

Flexible Specialization

A field that is growing in research is ReconfigurableArchitecturesThis area of research aims to improve processor performancewithout increasing its clock rateThis is made using specialized hardware structures to performportions of program code

5 / 30

Methodology

The work was divided in stages:

1 Choice of tools to simulate DSP processors and simulate theMips32 processor;

2 Selection of typical DSP applications;

3 Use of simulators to generate execution traces from the DSPapplications;

4 Comparison of results between the DSP processor and theMIPS32, with and without the Reconfigurable Hardware.

6 / 30

Methodology

Three commercial tools were analysed as possible DSP processorsimulators for the work:

7 / 30

Methodology

Code Composer

Features

DevelopmentDebuggerProfilerReal-time-operation system

8 / 30

Methodology

Symphony Studio

Works with DSP563xx/DSP567xx processors

Features

Can be executed on Eclipse environmentGDB (GNU Project Debugger)Project Manager

9 / 30

Methodology

VisualDSP++

Features

Statistical performance measurementsEmulationSimulationCompilation

10 / 30

Methodology

OVPsim

Features

Describing platforms with one or more processorsModels are available for many standard processorsCompilationTrace Generator

11 / 30

The Reconfigurable Hardware

Pipeline Diagram

12 / 30

The Reconfigurable Hardware

Instruction Fetch

13 / 30

The Reconfigurable Hardware

Instruction Decode

14 / 30

The Reconfigurable Hardware

Execution

15 / 30

The Reconfigurable Hardware

Memory Access

16 / 30

The Reconfigurable Hardware

Write Back

17 / 30

The Reconfigurable Hardware

The Reconfigurable System works as a functional unit

18 / 30

The Reconfigurable Hardware

The Binary Translator works parallel with MIPS pipeline

At DI stage, BT reads the PC indexed address

19 / 30

The Reconfigurable Hardware

The Binary Translator works parallel with MIPS pipeline

At DI stage, BT reads the PC indexed addressAfter decoding the instruction, BT sends the instruction toRU, when possible, as a configuration to be made on RU

20 / 30

The Reconfigurable Hardware

The Binary Translator works parallel with MIPS pipeline

At DI stage, BT reads the PC indexed addressAfter decoding the instruction, BT sends the instruction toRU, when possible, as a configuration to be made on RUBT also stores the instruction configuration or group ofinstruction configurations on RCWhen PC points to an existing address on RC, RU takes theexecution from CPU.

21 / 30

Case Study

A generic and customized benchmark was madewith DSP-like operations

1 Performs a multiplication of two real vectors2 Two matrix addition3 Matrix subtraction4 Matrix scalar multiplication5 Calculates the mean of the input array6 Calculates the RMS of the elements in the input

array7 Accumulates the two vector product8 Two matrix addition

22 / 30

Case Study

9 Executes the Discrete Fourier Transform

10 Executes the Inverse Discrete Fourier Transform

11 Executes the Fast Fourier Transform

12 Executes the Inverse Fast Fourier Transform

13 Executes the Finite Impulse Response

23 / 30

Case Study: Methodology

The Work-flow

24 / 30

Case Study: Results

1 2 3 4 5 6 7 8

2

3

4

·105

Clo

ckC

ycle

s

ADSP-BF504 MIPS32-pure MIPS32-array

25 / 30

Case Study: Results

9 10

0

2

4

6

·106

Clo

ckC

ycle

s

ADSP-BF504 MIPS32-pure MIPS32-array

26 / 30

Case Study: Results

11 12 13

0

2

4

6

·105

Clo

ckC

ycle

s

ADSP-BF504 MIPS32-pure MIPS32-array

27 / 30

Case Study: Results

MIPS32 had less clock cycles with the Reconfigurable System

Blackfin ADSP-BF504 is a specialized DSP processor,comparing to MIPS-pure(without array), but MIPS-array hadless clock cycles in most applications

28 / 30

Conclusions

This work presented the simulation of DSP applications tocompare Blackfin and MIPS32

Even Blackfin ADSP-BF504 being a specialized DSPprocessor, MIPS-array had less clock cycles in mostapplications

A GPP processor with Reconfigurable System can optimiseprogram execution, compared to DSP processors

For future works: Other kinds of simulations

1 Power estimates2 Energy consumption

29 / 30

Evaluating Application Performance BetweenDSP Processor and GPP Using

Reconfigurable Hardware

Eduardo NicolaJulio RuzickiLuis MartinsJulio Mattos

{evnicola, jcmruzicki,lhjmartins, julius}@inf.ufpel.edu.br

30 / 30