dynamic hardware software partitioning

37
Dynamic Hardware Software Partitioning A First Approach Komal Kasat Nalini Kumar Gaurav Chitroda

Upload: chelsey

Post on 25-Feb-2016

54 views

Category:

Documents


0 download

DESCRIPTION

Dynamic Hardware Software Partitioning. A First Approach. Komal Kasat Nalini Kumar Gaurav Chitroda. Outline. Hardware-Software Partitioning Motivation Introduction to Dynamic Hw-Sw Partitioning System Architecture Tool Overview Experiments Conclusion. Hardware Software Partitioning. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Dynamic Hardware Software Partitioning

Dynamic Hardware Software PartitioningA First Approach

Komal KasatNalini KumarGaurav Chitroda

Page 2: Dynamic Hardware Software Partitioning

OutlineHardware-Software PartitioningMotivationIntroduction to Dynamic Hw-Sw

PartitioningSystem ArchitectureTool OverviewExperimentsConclusion

Page 3: Dynamic Hardware Software Partitioning

Hardware Software PartitioningGiven an application dividing the tasks into

software on microprocessor and hardware co-processors

Software used for features and flexibilityHardware used for better performanceExample applications:

Microwave Oven Cell Phone Camera

Page 4: Dynamic Hardware Software Partitioning

OutlineHardware-Software PartitioningMotivationIntroduction to Dynamic Hw-Sw

PartitioningSystem ArchitectureTool OverviewExperimentsConclusion

Page 5: Dynamic Hardware Software Partitioning

MotivationEver increasing embedded system

design complexityReduction in energy consumptionBetter performance compared to

using only softwareBetter optimization than dynamic

software optimization techniquesAvailability of Single Chip platform

for microprocessor and FPGA

Page 6: Dynamic Hardware Software Partitioning

OutlineHardware-Software PartitioningMotivationIntroduction to Dynamic Hw-Sw

PartitioningSystem ArchitectureTool OverviewExperimentsConclusion

Page 7: Dynamic Hardware Software Partitioning

IntroductionProblem with current approach

Tool flow problems 1st - designer uses profiler 2nd - use compiler with partitioning capabilities 3rd - apply synthesis tool Integration requires extra designer effort Very complicated compared to typical

software design

Need a more transparent approach – Dynamic Hw-Sw Partitioning

Page 8: Dynamic Hardware Software Partitioning

Dynamic Hw-Sw Partitioning

Page 9: Dynamic Hardware Software Partitioning

Advantages Partitioner entirely on chipTransparent process-

-no extra designer effort-no disruption to standard tool flows

Can use existing compilers while partitioning

Can tune system to actual usage and data values

Can adapt to new usage over timeSupports legacy programs

Page 10: Dynamic Hardware Software Partitioning

OutlineHardware-Software PartitioningMotivationIntroduction to Dynamic Hw-Sw

PartitioningSystem ArchitectureTool OverviewExperimentsConclusion

Page 11: Dynamic Hardware Software Partitioning

System Architecture

Micro - Processor

Dynamic Partitioning

Module

Configurable

Logic

Memory (with

application software)

Page 12: Dynamic Hardware Software Partitioning

Dynamic Partitioning Module (DPM)

Memory

Profiler

Partitioning Co-

Processor

Profiler: to detect most frequently executed application software loops

Partitioning co-processor: to decompile and synthesize the selected binary regions for hardware implementation

Memory:To run the program Dynamic Partitioning

Module

Page 13: Dynamic Hardware Software Partitioning

Issues with DPMSeems to impose much size

overhead compared with the main processor

But this will not pose much of a problem:

Co-processor much leaner than the main processor

Dozens of main processors may share a single DPM

Overhead becomes smaller as platform complexity increases

Page 14: Dynamic Hardware Software Partitioning

Configurable Logic

DMA R0_Input R1_InOut

Configurable Logic Fabric

Page 15: Dynamic Hardware Software Partitioning

Configurable LogicUses DMA controller to access memoryR0_Input – 32 bit input register to store

dataR1_InOut – 32 bit register for input and

outputA fixed 32 bit channel connecting output

of configurable fabric to R1-InOut regStore output data in R1_InOut before DMA

controller writes data back to memory

Page 16: Dynamic Hardware Software Partitioning

Current Architecture LimitationsSimpler than existing commercial platformsConfigurable logic implements

combinational logic onlyLoops must have single cycle

implementationMemory access limited to sequential

addressesNo. of loop iterations determined before

loop execution

Inspite of the above limitations, significant speedup is achieved

Page 17: Dynamic Hardware Software Partitioning

Configurable Logic Fabric (CLF)General configurable logic fabric is

capable of handling most complex designs

But mapping, place and route is very time consuming

Logic to implement typical software inner loops much simpler

Developed a simple CLF to simplify the place and route

Page 18: Dynamic Hardware Software Partitioning

CLF Architecture

LUT

SM SM SM

SM SM SM

LUT

Page 19: Dynamic Hardware Software Partitioning

0 1 2 3

0 1 2 3

0

1

2

3

0

1

2

3

•4 routing channels on each side•Connection from one side to another only on the same channel•This is done to simplify the routing•Special connection matrix at bottom of CLF for switching channels

Switch Matrix (SM)

Page 20: Dynamic Hardware Software Partitioning

Look - Up Table (LUT)

SRAM (8 x 2)

I n p ut s

I n p ut s

Outputs

•3 ip – 2 op LUT

•8 word, 2 bit wide SRAM

•Can connect LUT to the routing channels from either side

•Can connect the outputs of the LUT only at the bottom

Page 21: Dynamic Hardware Software Partitioning

OutlineHardware-Software PartitioningMotivationIntroduction to Dynamic Hw-Sw

PartitioningSystem ArchitectureTool OverviewExperimentsConclusion

Page 22: Dynamic Hardware Software Partitioning

Tool Overview

Page 23: Dynamic Hardware Software Partitioning

Tool Overview

Page 24: Dynamic Hardware Software Partitioning

Loop Profiler Detects regions of software to be implemented as

hardware Monitors instruction addresses on memory bus :

non-intrusive On backward branch update cache entry which

stores branch frequencyDecompilationConverts software loop into high levelConverts each assembly instruction to

corresponding register transfersCreates control flow and data flow graphsApplies standard compiler optimizations

Page 25: Dynamic Hardware Software Partitioning

Tool Overview

Page 26: Dynamic Hardware Software Partitioning

Register Transfer & Logic Synthesis

RT converts each output bit into Boolean expression Logic Synthesis creates DAG of Boolean logic Nodes of DAG correspond to simple logic gates Optimize using logic minimization algorithm

Technology Mapping Traverse DAG backward from op node Combine nodes to create LUT’s Map the final 3ip – 2op LUT’s to the CLF

DMA Configuration Maps memory access of decompiled loop onto DMA Detect read/writes, ++, --, address updates, etc. Remove address calculations, loop counters, exits Starts data transfer

Page 27: Dynamic Hardware Software Partitioning

Tool Overview

Page 28: Dynamic Hardware Software Partitioning

Placement Relative placement, determine critical path, place this

path into single horizontal row Analyze dependency between placed and non-placed

nodes Each node placed above (input to placed node) or below

(uses output from placed node) the dependant node

Routing Uses simple greedy algorithm 3 steps:

between input nodes and LUT’sbetween LUT’s and outputsconnect LUT’s together

Routing done through switch matrices If route not available back track to find alternatives

Page 29: Dynamic Hardware Software Partitioning

Tool Overview

Page 30: Dynamic Hardware Software Partitioning

Combines place and route hw description with DMA configuration

Bitfile is used to initialize the configurable logic

Bitfile Creation

Binary ModificationReplace sw by jump to hw initialization codeWrite to port connected to hw enableCode for putting µP in sleep modeHw asserts completion signal after executionµP resumes from end of original sw loop

Page 31: Dynamic Hardware Software Partitioning

Dynamic Partitioning Tool Details

Tool Code Size (Lines

Binary Size (Kbytes)

Data Size (Kbytes) Time (s)

Decompilation DMA ConfigRT Synthesis

7203 125 452 0.05

Logic Synthesis

Tech. MappingPlace & Route

4695 88 360 1.04

Total 213 1.09

Dynamic hw-sw partitioning is feasible if partitioning module can fit in a small area Overhead due to the partitioner in terms of power and size should be less Sometimes when separate processor not possible, partitioning module may share existing processor

Page 32: Dynamic Hardware Software Partitioning

OutlineHardware-Software PartitioningMotivationIntroduction to Dynamic Hw-Sw

PartitioningSystem ArchitectureTool OverviewExperimentsConclusion

Page 33: Dynamic Hardware Software Partitioning

ExperimentsExampl

eTotal Ins

Loop Ins

Loop Time %

Loop Size %

Ideal Speedu

pBrev 992 104 70.0 10.5 3.3

G3fax1 1094 6 31.4 0.5 1.5G3fax2 1094 6 31.2 0.5 1.5

url 13526 17 79.9 0.1 5.0logmin 8968 38 63.8 0.4 2.8

Avg: 55.3 2.4 2.8

Benchmark Information

Page 34: Dynamic Hardware Software Partitioning

Example

Sw Time

Sw Loop Time

Hw Loop Time

Sw/Hw Time S

Brev 0.05 0.03 0.001 0.02 3.1G3fax1 23.50 7.35 0.82 16.98 1.4G3fax2 23.50 7.39 1.49 17.61 1.3

url 379.90 303.74 13.29 89.45 4.2logmin 16.32 10.42 0.21 6.12 2.7

Avg: 65.78 3.16 26.03 2.6

Dynamic Partitioning Results

On an average hw execution 20 times faster than software

Achieve close to ideal speedup of 2.8

Page 35: Dynamic Hardware Software Partitioning

Determining Hardware PerformanceProduct of total loop iterations and total loop

executionsLoop bodies are single cycleSo product represents total cycles spent in the loop Included initialization and write back timeDetermine delay through all transistors along

critical pathDelay small enough to run configurable logic at

60MHz

Page 36: Dynamic Hardware Software Partitioning

ConclusionDynamic hardware software partitioning

approach better than traditional approachTransparent : benefits of partitioning using

standard software tool flowsAdapt to applications actual usage at run timeObtained close to ideal speedup Future work: extend it to sequential logic and

more complex memory access patterns

Page 37: Dynamic Hardware Software Partitioning

QUESTIONS