Re-configurable Parallel Stream Processor with self- assembling and self- restorable micro- architecture Lev Kirischian, Irina Terterian, Pil Woo Chun and Vadim Geurkov Embedded and Re-configurable Systems Lab RYERSON University, CANADA

Post on 08-Jan-2016

Page 1: Lev Kirischian,  Irina Terterian, Pil Woo Chun and Vadim Geurkov

Re-configurable Parallel Stream Processor with self-assembling and self-restorable micro-architecture

Lev Kirischian, Irina Terterian, Pil Woo Chun and Vadim Geurkov

Embedded and Re-configurable Systems Lab

RYERSON University, CANADA

Page 2

Example of Multi-task Data-Flow workload where each task can run in different modes

[Figure: task activity plotted against Time for the Tasks below.
Task 1: Mode 1, Mode 2, Mode 3
Task 2: Mode 1, Mode 2
Task 3
Task 4: Mode 1, Mode 3, Mode 4, Mode 7]
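The workload on this slide can be written down as a small data structure. The sketch below is only an illustration: the dictionary layout, the function name `configurations`, and the "default" placeholder mode for Task 3 (the slide shows no modes for it) are assumptions, not part of the presentation.

```python
# Workload from the slide: each task and the modes it can run in.
WORKLOAD = {
    "Task 1": ["Mode 1", "Mode 2", "Mode 3"],
    "Task 2": ["Mode 1", "Mode 2"],
    "Task 3": ["default"],  # assumption: the slide lists no modes for Task 3
    "Task 4": ["Mode 1", "Mode 3", "Mode 4", "Mode 7"],
}


def configurations(workload):
    """List every (task, mode) pair that may need its own hardware configuration."""
    return [(task, mode) for task, modes in workload.items() for mode in modes]


pairs = configurations(WORKLOAD)
print(len(pairs))  # number of distinct task/mode combinations
```

Counting the pairs gives a rough feel for how many separate configurations a multi-task, multi-mode system has to support.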

Page 3

Usual Approach: Conventional Processors with Software-to-Task Optimization (Compilers + OS)

Software-to-task optimization allows using conventional computing platforms with fixed architecture (Superscalar, VLIW, etc.) coupled with software compilers and an OS.

Limitations of conventional processors

1. If tasks are executed on a sequential computing system, processing time often cannot meet specification requirements.
2. If tasks are executed on a parallel computing system with fixed architecture, the cost-effectiveness of the parallel computer strongly depends on the task's algorithm or data structure.

Page 4

Alternative Approach: Application Specific Processors (ASP) with Static Hardware-to-Task Optimization

ASP allows reaching the required cost-performance parameters because the ASP architecture is optimized for the data-flow graph of the task and the task data structure.

Limitations of Application Specific Processors

1. Decrease of performance if the task algorithm or data structure changes
2. Limited possibility for further modernization
3. High cost for multi-task or multi-mode custom computing systems

Page 5

Proposed Approach: Reconfigurable Processor with Dynamic Architecture-to-Task Optimization

A high-performance computing system for multi-task data-flow applications should contain two major components:

1. Dynamically Re-configurable Computing Platform based on partially-configurable FPGA devices to provide maximum possible hardware flexibility.

2. Library of Application Specific Virtual Processors (ASVP): configuration bit-streams that program the on-chip Application Specific Processor's circuitry for the period of time while the Application (Task) is active.

Page 6

Architecture of Partially Reconfigurable FPGA devices (Xilinx “Virtex” Family)

[Figure labels: I/O Frame (In), I/O Frame (Out), CLBs Frame # 1, CLBs Frame # i, CLBs Frame # N, Block RAM, Internal (Virtual) BUS, Internal Configuration SRAM, Configuration Data Files]

CLB - Configurable Logic Block - Uniform Logic Element of a Frame, smallest individually configurable component in the FPGA

Page 7

Concept of Application Specific Virtual Processor (ASVP)

Application Specific Virtual Processor (ASVP): a group of logic resources dedicated and optimally configured to reflect the algorithm and data structure of the task.

An ASVP is represented as a configuration data file (configuration bit-stream) that is downloaded into the FPGA when the task is to be activated.

Page 8

Life-cycle of Application Specific Virtual Processor

1. The ASVP core is downloaded to the reconfigurable platform before task activation.
2. The ASVP performs the task's data processing for as long as necessary, without interruption or time sharing of its dedicated logic resources with any other task.
3. After task completion, all resources included in the ASVP can be re-configured for any other task.
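The three life-cycle steps can be illustrated with a toy resource model. This is a minimal sketch under stated assumptions: the `Platform` class, its slot-based bookkeeping, and the method names `load_asvp` and `release_asvp` are hypothetical and do not represent the authors' implementation or any FPGA vendor API.

```python
from dataclasses import dataclass, field


@dataclass
class Platform:
    """Toy model of a reconfigurable platform with a fixed number of slots."""
    num_slots: int
    owner: dict = field(default_factory=dict)  # slot index -> task name

    def free_slots(self):
        return [s for s in range(self.num_slots) if s not in self.owner]

    def load_asvp(self, task, slots_needed):
        """Step 1: configure free slots for the task before it is activated."""
        free = self.free_slots()
        if len(free) < slots_needed:
            raise RuntimeError(f"not enough free slots for {task}")
        allocated = free[:slots_needed]
        for s in allocated:
            # Step 2: the slots stay dedicated to this task while it runs.
            self.owner[s] = task
        return allocated

    def release_asvp(self, task):
        """Step 3: after completion the task's slots can be re-configured."""
        for s in [s for s, t in self.owner.items() if t == task]:
            del self.owner[s]


platform = Platform(num_slots=6)
a = platform.load_asvp("task1", 3)
b = platform.load_asvp("task2", 2)
platform.release_asvp("task1")      # task1 completes
c = platform.load_asvp("task3", 4)  # reuses the slots freed by task1
```

The sketch ignores placement constraints such as contiguity of slots; it only shows that resources are exclusive while a task is active and reusable afterwards.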

Page 9

ASVP Architecture-to-Task Optimization in Partially Reconfigurable FPGA

[Figure: a Data-Flow Graph (Data In, XOR, +, Data Out) shown next to an FPGA whose slots 1, 2, 3, ... hold the Input, XOR, + and Output blocks, connected by the Internal (Virtual) BUS. The XOR block is labeled "Virtual Hardware Component XOR".]

Page 10

Micro-architecture of a Virtual Hardware Component

[Figure labels: VHC, Processing Element (PEi), Interface Element (IEj), Local routing, Tri-state Buffers, Global Routing Lines]

Page 11

Virtual Hardware Component & Virtual Bus Interconnection

[Figure labels: Virtual Bus, Virtual Hardware Component Boundary]

Page 12

Micro-architecture of Application Specific Virtual Processor (ASVP)

[Figure labels: Application Specific Virtual Processor (ASVP); VHC i with {MO i} and IE i; VHC n with {MO n} and IE n; VHC k with {MO k} and IE k; I/O BLOCK with Xi, Yi, …, Xn, Yn and Result; Virtual Bus Lines # i, # i+1, # i+2, # n, # n+1, # n+2, # k]

Micro-architecture of ASVP is based on Virtual Hardware Components interconnected via Virtual Bus lines

Page 13

Parallel Task Processing on the Dynamically Re-configurable Stream Processor (DRSP)

[Figure labels: ASVP 1 for Task 1, ASVP 2, ASVP 3; FU 1, FU 2, FU 3, FU 4; I/O 1, I/O 2, I/O 3, I/O 4; RIM 1, RIM 2, RIM 3, RIM 4; Virtual Bus; Data in #1, Data in #2; Data out #1, Data out #2, Data out #3]

Page 14

DRSP: System Level Architecture

[Figure labels: Host PC; Task Memory (Task 1: {Afix + Amodes} … Task h: {Afix + Amodes}); PCI-Bus; PCI-Interface Module; RT-HOS; PRCP-base; Reconfigurable Functional Unit (Afix i + …), shown twice; Cache Memory {Amodes i}, shown twice; Data Stream Source; Configuration & Data Bus; Data Out]

Page 15

Architecture of Reconfigurable Computing Module

[Figure labels:
2 x 3.43 Gbit/s (12 bit * 300 MHz) Input LVDS ports
2 x 3.43 Gbit/s (12 bit * 300 MHz) Output LVDS ports
8.12 Gbit/s LVTTL BUS (64 bit x 133 MHz)
Reconfig. Functional Unit [RFM 0111-002]
Real-Time Hardware Operating System based on XCV50E Virtex FPGA
PCI Interface, 800 Mbit/s
SPI
Config. Files / Data Cache (4 x 512 KB)]
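The port rates quoted on this slide follow from bus width times clock frequency. The sketch below redoes that arithmetic. One point is an assumption: the slide's 3.43 and 8.12 figures are reproduced only if "Gbit" is read as 1000 x 2^20 bits; the slides do not state which convention was used, and the function names are hypothetical.

```python
def raw_rate_bits_per_s(width_bits, clock_hz):
    """Raw throughput of a parallel port: bus width times clock frequency."""
    return width_bits * clock_hz


def to_binary_scaled_gbit(bits_per_s):
    """Express a rate in units of 1000 * 2**20 bits (assumed convention)."""
    return bits_per_s / (1000 * 2**20)


lvds = raw_rate_bits_per_s(12, 300e6)   # 3.6e9 bit/s per LVDS port
lvttl = raw_rate_bits_per_s(64, 133e6)  # 8.512e9 bit/s on the LVTTL bus

print(round(to_binary_scaled_gbit(lvds), 2))
print(round(to_binary_scaled_gbit(lvttl), 2))
```

In plain decimal units the same ports carry 3.6 Gbit/s and about 8.5 Gbit/s respectively.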

Page 16

Reconfigurable Computing Module based on Xilinx “Virtex-E” family of FPGA Devices

Page 17

Restoration of ASVP using spare CLB-column

[Figure labels: AP i; Column # 1 2 3 ...; Input, XOR, +, Output; Communication Field; a second + block]

If a hardware fault occurs, the damaged Virtual Hardware Component can be relocated to the reserved CLB-column.
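Relocation to a reserved column can be sketched as a simple remapping. This is an illustration only, assuming each Virtual Hardware Component occupies exactly one column and that fault detection happens elsewhere; the function `restore` and the dictionary-based placement model are hypothetical.

```python
def restore(placement, faulty_column, spare_columns):
    """Move the component in a faulty column to the first spare column.

    placement: dict mapping column index -> component name (hypothetical model).
    spare_columns: list of reserved column indices, consumed in order.
    Returns a new placement; the faulty column is left unused.
    """
    if faulty_column not in placement:
        return dict(placement)  # nothing was placed there, nothing to move
    if not spare_columns:
        raise RuntimeError("no spare CLB-column left")
    new_placement = dict(placement)
    component = new_placement.pop(faulty_column)
    new_placement[spare_columns.pop(0)] = component
    return new_placement


placement = {1: "Input", 2: "XOR", 3: "+", 4: "Output"}
spares = [5]
restored = restore(placement, faulty_column=3, spare_columns=spares)
print(restored)
```

In a real device the relocated component would also need its connections to the communication field re-established, which this sketch does not model.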

Page 18

When is the proposed technology most beneficial?

• Workload consists of many tasks, where each task can run in different modes.

• Each task requires high-speed data-stream processing.

• Task algorithms may be modified within the life cycle of the system.

• Active tasks must run in parallel and must not be interrupted when one of the tasks switches its mode or terminates.

• The system can be restored remotely or can restore itself even if a hardware fault occurs.

Page 19

DRSP Application for Networked Intelligent Manufacturing Systems

High-performance parallel data-stream processing (up to thousands of billions of operations per second) of large volumes of data (up to hundreds of gigabits) for:

a) Complex image processing and image recognition,

b) Spectrum analysis and digital signal processing,

c) Data transmission via LAN with data compression / decompression and encryption / decryption,

d) Control of high-performance manufacturing equipment and robotic systems.

Page 20

Acceleration of Task / Mode Switching

[Figure: plot of Acceleration (axis range 0 to 25) versus Number of CLB-slots in Virtual Component (1 to 10)]

Acceleration of task or mode switching, compared with a system that reconfigures the entire FPGA, increases as the number of CLB-columns in the ASVP decreases and can exceed 20 times.
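A simple way to see why small components switch faster is to assume that reconfiguration time is proportional to the number of CLB-columns written. The sketch below uses that assumption; the function name and the total of 24 columns are illustrative choices, not values taken from the slides, and the real curve may differ because of fixed configuration overheads.

```python
def switching_acceleration(total_columns, component_columns):
    """Speed-up of reconfiguring only `component_columns` columns instead of
    all `total_columns`, assuming time is proportional to columns written."""
    if not 1 <= component_columns <= total_columns:
        raise ValueError("component must fit within the device")
    return total_columns / component_columns


TOTAL_COLUMNS = 24  # illustrative assumption, not a figure from the slides

for n in range(1, 11):
    print(n, round(switching_acceleration(TOTAL_COLUMNS, n), 1))
```

Under this model a one-column component switches `TOTAL_COLUMNS` times faster than a full reconfiguration, and the advantage shrinks as the component grows.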

Page 21

Minimization of Hardware Resources

Minimization of logic resources in the DRSP approach compared with entire-FPGA-based systems:

Tasks \ Modes      2      4      8     16
4                2.8    4.4    7.6     14
8                5.6    8.8   15.2     28
16              11.2   17.6   30.4     56

As the number of tasks and task modes in a workload increases, the cost-effectiveness of DRSP increases accordingly.
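The trend in the table can be checked numerically. Reading the rows as task counts and the columns as mode counts (an assumption about the flattened layout), every entry happens to equal tasks x (0.3 + 0.2 x modes). This expression is only an empirical fit to the twelve values shown; the slides do not give a formula, and the name `fitted_ratio` is hypothetical.

```python
MODES = [2, 4, 8, 16]
TABLE = {  # task count -> table row, values copied from the slide
    4: [2.8, 4.4, 7.6, 14],
    8: [5.6, 8.8, 15.2, 28],
    16: [11.2, 17.6, 30.4, 56],
}


def fitted_ratio(tasks, modes):
    """Empirical fit to the table: proportional to tasks, linear in modes."""
    return tasks * (0.3 + 0.2 * modes)


max_error = max(
    abs(fitted_ratio(tasks, modes) - value)
    for tasks, row in TABLE.items()
    for modes, value in zip(MODES, row)
)
print(max_error)
```

The fit makes the slide's statement concrete: doubling the number of tasks doubles the table value, and adding modes raises it further.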

Page 22

SUMMARY: DRSP Compared with Conventional CPU, DSP or ASP Platforms

DRSP Conv. CPU DSP ASP

Performance

Flexibility

Reliability

Much lower than DRSP

Lower than DRSP

Much lower than DRSP

Much lower than DRSP

Much lower than DRSP

Lower than DRSP

Somewhat higher

None, or very little

Lower than DRSP

Page 23

Thank you