Embedded Computer Architecture ASIP Application Specific Instruction-set Processor 5KK73 Bart Mesman and Henk Corporaal

Post on 04-Jan-2016

Page 1: Embedded Computer Architecture

ASIP: Application Specific Instruction-set Processor

5KK73

Bart Mesman and Henk Corporaal

Page 2: Application domain specific processors (ADSP or ASIP)

04/20/23 Embedded Computer Architecture H.Corporaal and B. Mesman

[Figure: flexibility versus efficiency. Labels: DSP, programmable CPU, programmable DSP, application domain specific, application specific processor.]

Page 3: Application domain specific processors (ADSP or ASIP)

04/20/23 Platform Design H.Corporaal and B. Mesman

An ADSP takes a well-defined application domain as a starting point:
• exploits characteristics of the domain (computation kernels)
• still programmable within the domain

e.g. MPEG2 coding uses 8*8 DCT transform, DECT, GSM etc.

• performance: clock speed + ILP; ILP, DLP, tuning to domain
• flexible dev. (new apps.)
• cost effective (high volume)

[Figure: application domain versus implementation, shown for ADSP and for GP.]

Problems:
- specification
- design time and effort
Manual design, large effort => synthesized cores

Page 4: Synthesized core examples (source: www.adelantetech.com)

| Part | Description | Clock (MHz) | Size (gates) | ROM (Kbyte) | RAM (Kbyte) |
|---|---|---|---|---|---|
| Speech Components | | | | | |
| ADPCM | Full duplex ITU-T G.726 compliant and 40 kbit/s speech-compression encoder/decoder. | 4 | 5,100 | 1.3 | 0.128 |
| ADPCM-16 | Full duplex 16 Channel ITU-T G.726 compliant 16, 24, 32 and 40 kbit/s speech-compression encoder/decoder. | 32 | 10,200 | 1.3 | 2.048 |
| IW-ASR Speech Recognition | Template-based speaker-dependent, isolated-word automatic speech recognition | 1.3 | 9,000 | 6 | approx. 1 kbyte/word |
| G.723.1 | Low bit-rate ITU-T G.723.1 compliant speech-compression at 6.3 kbit/s; can be combined with G.723.1A. | 20 | 24,000 | 22 | 2.3 |
| G.723.1A | Extended version of G.723.1 to reduce bit rate by a silence compression scheme. Uses voice activity detection and comfort-noise generation. Fully compliant with Annex A of speech-compression standard CODEC G.723.1. Yields no additional hardware cost. | 20 | 24,000 | 22 | 2.3 |
| Speech Synthesis | Phrase-concatenated speech synthesis | Depends on compression requirements | | | |
| Telecommunications | | | | | |
| Echo Cancellation | High-performance echo-cancellation and suppression processor. | 4 | 6,000 | 2.80 | 0.15 |
| DTMF | Full-duplex DTMF transceiver. | 2 | 4,000 | 1.00 | 0.15 |
| Caller-ID | On-hook and off-hook caller line identification. Includes DTMF and V.23. | 3 | 6,000 | 2.10 | 0.15 |
| Reed-Solomon | Full-duplex Reed-Solomon codec | | 7,000 | 3.75 | 0.15 |
| Viterbi Decoder | Configurable rate, code and constraint-length. Configurable traceback depth. Supports soft & hard decision making. Supports code puncturing. | (depending on throughput) | 5,000 to 9,000 | --- | --- |
| V.23 modem | ITU-T V.23 compliant 1200 baud FSK modem | | 6,000 | 0.80 | 0.15 |
| Other | | | | | |
| Pink Noise Generator | Low-ripple pink noise filter with filter characteristic of -3 ± 0.08 dB per octave over the bandwidth 20 Hz to 20 kHz | | 4,000 | 0.10 | 0.10 |
| CCIR 656/601 | Digital video converter: CCIR to raw-video data and vice versa. | | 1,500 | none | none |

Page 5: Design process

3 phases:
1. exploration
2. hw design (layout) + processing
3. design appl. sw

Fast, accurate and early feedback

[Figure: exploration loop. Application(s) and a processor model (parameters, instance; e.g. VLIW with shared RFs) feed SW (code generation) and HW design. Estimations (cycles/alg, occupation; nsec/cycle, area, power/instr) feed the "OK?" and "more appl.?" decisions; when both are satisfied, go to phase 2.]

Page 6: ASIP/VLIW architectures: list scheduling

[Figure: list scheduling of a data-flow graph with operations *1, +2, *3, *4, *5, +6, +7, *8, *9, +10 onto the resources IPB, ALU, MULT and OPB over cycles 0 to 5. In every cycle a candidate list goes through conflict and priority computation, which yields the scheduled operation.]
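The candidate-list loop in the figure can be sketched in a few lines of C. This is a minimal illustration, not the course's scheduler: it assumes one ALU and one MULT, unit latency, an acyclic dependence graph, and a caller-supplied priority per operation. The names `struct op`, `list_schedule` and `MAX_PRED` are hypothetical.

```c
#include <assert.h>

enum { ALU = 0, MULT = 1, NUM_UNITS = 2 };
#define MAX_PRED 2

/* One operation of the data-flow graph (hypothetical layout). */
struct op {
    int unit;            /* ALU or MULT */
    int npred;           /* number of predecessors, at most MAX_PRED */
    int pred[MAX_PRED];  /* indices of the predecessor operations */
    int priority;        /* larger value = scheduled first */
};

/* Fills cycle[i] with the issue cycle of operation i and returns the
 * schedule length. Assumes unit latency and an acyclic graph. */
int list_schedule(const struct op *ops, int n, int *cycle)
{
    int scheduled = 0, t = 0;

    for (int i = 0; i < n; i++)
        cycle[i] = -1;                      /* -1 = not scheduled yet */

    while (scheduled < n) {
        assert(t < n);                      /* fails on a cyclic graph */
        for (int u = 0; u < NUM_UNITS; u++) {
            int best = -1;                  /* best candidate for unit u */
            for (int i = 0; i < n; i++) {
                int ready = 1;
                if (cycle[i] >= 0 || ops[i].unit != u)
                    continue;
                assert(ops[i].npred <= MAX_PRED);
                for (int k = 0; k < ops[i].npred; k++) {
                    int p = ops[i].pred[k];
                    if (cycle[p] < 0 || cycle[p] >= t)
                        ready = 0;          /* operand not available yet */
                }
                if (ready && (best < 0 || ops[i].priority > ops[best].priority))
                    best = i;
            }
            if (best >= 0) {                /* resource conflict resolved */
                cycle[best] = t;
                scheduled++;
            }
        }
        t++;
    }
    return t;
}
```

With two independent multiplications feeding one addition, plus one unrelated addition, the sketch needs three cycles: the single multiplier serializes the two products, and the dependent addition waits for both.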

Page 7: Problem statement

A compiler is retargetable if it can generate code for a ‘new’ processor architecture specified in a machine description file.

A guarded register transfer pattern (GRTP) is a register transfer pattern (RTP) together with the control bits of the instruction word that control the RTP.

a := b + c | instr = xxxx0101

GRTPs contain all inter-RT-conflict information.

Instruction set extraction (ISE) is the process of generating all possible GRTPs for a specific processor.

Page 8: Problem statement

[Figure: the algorithm spec goes through a front end (FE) to a CDFG; the processor spec (instance) goes through ISE to GRTPs; code generation takes the CDFG and the GRTPs and produces machine code. Note on the slide: in ch 4 this is part of the code generator.]

Page 9: Example: Simple processor [Leupers]

[Figure: datapath with PC (incremented by +1), instruction memory IM, RAM, input Inp, register REG and output outp. Instruction word I.(20:0) with fields I.(20:13), I.(12:5), I.(4), I.(3:2) and I.(1:0).]

Page 10: Example: Simple processor [Leupers]

| Instruction | Instruction bits (20..0) |
|---|---|
| PC := PC + 1 | xxxxxxxxxxxxxxxxxxxxx |
| REG := Inp | xxxxxxxxxxxxxxxxx011x |
| REG := IM[PC].(20..13) | xxxxxxxxxxxxxxxxx001x |
| REG := RAM[IM[PC].(12..5)] | xxxxxxxxxxxxxxxxx1x1x |
| REG := REG - Inp | xxxxxxxxxxxxxxxxx0101 |
| REG := REG - IM[PC].(20..13) | xxxxxxxxxxxxxxxxx0001 |
| REG := REG - RAM[IM[PC].(12..5)] | xxxxxxxxxxxxxxxxx1x01 |
| REG := REG + Inp | xxxxxxxxxxxxxxxxx0100 |
| REG := REG + IM[PC].(20..13) | xxxxxxxxxxxxxxxxx0000 |
| REG := REG + RAM[IM[PC].(12..5)] | xxxxxxxxxxxxxxxxx1x00 |
| RAM[IM[PC].(12..5)] := REG | xxxxxxxxxxxxxxxx1xxxx |
| outp := REG | xxxxxxxxxxxxxxxxxxxxx |
| RAM_NOP | xxxxxxxxxxxxxxxx0xxxx |
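One way to see why the instruction bits carry the inter-RT conflict information: two register transfers can share an instruction word only if no bit position is required to be 0 by one and 1 by the other. The sketch below assumes that reading of `x` as "don't care"; the function name `grtp_compatible` is hypothetical, and the example uses only bits 4..0 of the patterns listed above, since bits 20..5 are `x` in every row.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if two instruction-bit patterns over {'0','1','x'} can be
 * encoded in the same instruction word, 0 if some bit position conflicts. */
int grtp_compatible(const char *a, const char *b)
{
    size_t n = strlen(a);

    assert(n == strlen(b));                 /* patterns must be equally wide */
    for (size_t i = 0; i < n; i++) {
        if (a[i] != 'x' && b[i] != 'x' && a[i] != b[i])
            return 0;                       /* one needs 0, the other 1 */
    }
    return 1;
}

/* Bits 4..0 of four patterns from the simple processor example. */
void grtp_example(void)
{
    const char *reg_inp     = "x011x";      /* REG := Inp */
    const char *reg_sub_inp = "x0101";      /* REG := REG - Inp */
    const char *ram_write   = "1xxxx";      /* RAM[...] := REG */
    const char *ram_nop     = "0xxxx";      /* RAM_NOP */

    assert(grtp_compatible(reg_inp, ram_write));     /* can run in parallel */
    assert(!grtp_compatible(reg_inp, reg_sub_inp));  /* both drive REG */
    assert(!grtp_compatible(ram_write, ram_nop));    /* bit 4 differs */
}
```

A code generator built on this idea would only need the pattern table, not a structural model of the datapath, to decide which transfers may be packed together.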

Page 11: ASIP/VLIW architectures

A|RT designer template as an example (= set of rules, a model)

Differences with GP VLIW processors:

1. Parallel (//) FUs
• ASUs = complex application-specific FUs (beyond subword parallelism), e.g. biquad, median, DCT etc.
• larger grain size, more heterogeneous, more pipelines

2. Rfiles
• many Rfiles (>5 vs. 1 or 2)
• limited # ports (3 vs. 15)
• limited size (<16 vs. 128)

3. Issue slots
• all in parallel vs. 5

Page 12: [Figure: VLIW template with distributed register files. Labels: RF1 to RF8, FU1 to FU4, IR1 to IR4, instruction memory, control, flags.]

Page 13: ASIP/VLIW architectures

Additional characteristics of the A|RT designer template:
• interconnect network: busses + input multiplexers
  - mux control is part of the instruction
  - control can change every clock cycle
  - network can be incomplete
  - busses can be merged
• memories are modeled as FUs
  - separate data in and data out
  - 2 inputs (data in and address) and 1 output
• each FU can generate one or more flags
• instruction format (per issue slot): read address RF 1, write address RF 1, read address RF 2, write address RF 2, mux 1, mux 2, control FU, output drivers

Page 14: ASIP/VLIW architectures: example

[Figure: an ALU and a MAC connected through bus1 and bus2 to register files RF1 to RF4. The instruction word (bit positions marked 19, 10, 9 and 0) contains the fields read RF1, write RF1, read RF2, write RF2, ALU instr., mux 2, read RF3, write RF3, read RF4, write RF4, MAC instr. and mux 3.]

Page 15: ASIP/VLIW architectures: example

| GRTP | Instruction bits 19..10 | Instruction bits 9..0 |
|---|---|---|
| RF1 = ALU (RF1, RF2) | x c c c c x x c c c | x x x x x x x x x x |
| RF2 = ALU (RF1, RF2) | x c x c c c c c c c | x x x x x x x x x x |
| RF3 = ALU (RF1, RF2) | x c x c c x x c c c | c x x c c x x x x x |
| RF3 = MAC (RF3, RF4) | x x x x x x x x x x | c c c c c c x c c c |
| RF4 = MAC (RF3, RF4) | x x x x x x x x x x | x c c x x c c c c c |
| RF2 = MAC (RF3, RF4) | c x x x x c c x x x | x c c x x c x c c c |

Page 16: ASIP/VLIW architectures: design flow

[Figure: the algorithm spec goes through datapath synthesis (producing RTs) and controller synthesis; estimations (area, power, timing) feed an "OK?" check; if the answer is no, change pragmas and repeat.]

RF1 : x = RF2 : y, RF3 : z | ALU = ADD, Inmux = bus2

assign ( a+b, ALU, fu_alu1)
assign ( a+_, ALU, fu_alu2)
assign ( _+_, ALU, fu_alu3)

VLIW makes relatively simple code selection possible.

Page 17: Application examples (1)

#define NTAPS 4

int fir(int in)
{
    int i;
    /* NTAPS + 1 entries: the filter uses x0..x4 and c0..c4 */
    static int state[NTAPS + 1];
    static int coeff[NTAPS + 1];
    int out[NTAPS + 1];

    state[NTAPS] = in;                  /* newest sample enters the delay line */
    out[0] = state[0] * coeff[0];
    for (i = 1; i < NTAPS + 1; i++) {
        out[i] = out[i-1] + state[i] * coeff[i];
        state[i-1] = state[i];          /* shift the delay line */
    }
    return out[NTAPS];
}

[Figure: FIR filter. Samples x0 to x4 along a chain of Z-1 delays, each multiplied by its coefficient c0 to c4 and summed into the output y.]

Page 18: Application examples (1)

Processor Architectures and Program Mapping H. Corporaal, J. van Meerbergen, and B. Mesman

.L1000006:
sll   $3, $2, 2          R3=R2<<2            R3=i-1
addu  $14, $15, $3       R14=R15+R3
lw    $24, 0($14)        R24=load(*R14)      R24=coeff[i-1]
addiu $12, $6, -4        R12=R6-4
addu  $11, $12, $3       R11=R12+R3
lw    $13, 0($11)        R13=load(*R11)      R13=state[i-1]
nop
mult  $24, $13           R24=R24*R13
addu  $25, $sp, $3       R25=sp+R3
lw    $9, -4($25)        R9=load(R25-4)      R9=out[i-1]
addiu $2, $2, 1          R2=R2+1             i=i+1
mflo  $13                R13=move from low mpy reg
addu  $10, $9, $13       R10=R9+R13          R10=out[i]
sw    $10, 0($25)        mem(*R25)=R10
addu  $25, $7, $3        R25=R7+R3
sw    $24, 0($25)        mem(*R25)=R24
slti  $24, $2, 10
bne   $24, $0, .L1000006
addiu $15, $7, -4

19 instructions per tap!!

Page 19: Application examples (2)

Bit level operations: finite field arithmetic

temp1 = input << 1
temp2 = if (bit(input,7) == 1) then 29 else 0
out = temp1 exor temp2

         r1 = LB input          Load byte
         r2 = SLL r1            Shift left logical
         r3 = ANDI r1, mask     AND immediate
         r4 = ADDI r3, -1       ADD immediate
         BNE (r4 != r0)         Branch on != to nonzero
         nop
         r5 = XORI(r1, 29)      Exclusive or immediate
         J common               Jump
         nop
nonzero: r5 = XOR(r1, r0)       Exclusive OR
common:  …

[Figure: in[0] to in[7] wired to out[0] to out[7] through three exor gates.]

10 instructions!! Very simple in hardware.
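For comparison with the ten-instruction sequence, the three-line specification on this slide maps directly to C. The sketch assumes an 8-bit data word, so the shifted value is truncated to 8 bits; the function name `gf_step` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* out = (input << 1) exor (29 if bit 7 of input is set, else 0),
 * computed on 8-bit values as in the slide's specification. */
uint8_t gf_step(uint8_t input)
{
    uint8_t temp1 = (uint8_t)(input << 1);                    /* drop bit 8 */
    uint8_t temp2 = (uint8_t)((input & 0x80u) ? 29u : 0u);    /* test bit 7 */
    return (uint8_t)(temp1 ^ temp2);
}

void gf_step_example(void)
{
    assert(gf_step(0x01u) == 0x02u);   /* bit 7 clear: plain shift */
    assert(gf_step(0x80u) == 29u);     /* bit 7 set: shift gives 0, exor 29 */
    assert(gf_step(0xFFu) == 0xE3u);   /* 0xFE exor 0x1D */
}
```

In hardware the conditional disappears: bit 7 simply drives the exor gates on the output bits where the constant 29 has a 1, which is why the slide calls it very simple.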

Page 20: Application examples (2)

Bit level operations: DES example

srl  $13, $2, 20
andi $25, $13, 1
srl  $14, $2, 21
andi $24, $14, 6
or   $15, $25, $24
srl  $13, $2, 22
andi $14, $13, 56
or   $25, $15, $14
sll  $24, $25, 2

[Figure: bits 20, 22, 23, 25, 26 and 27 of the source register ($2) are gathered into bits 2 to 7 of the destination register ($24).]
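The nine instructions of the DES listing perform a fixed bit gather. A C transcription makes the data movement explicit; it mirrors the shift and mask constants of the listing one for one, and the function name `des_gather` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Gathers source bits 20, 22, 23, 25, 26, 27 into destination bits 2..7,
 * using the same shifts and masks as the MIPS listing. */
uint32_t des_gather(uint32_t src)
{
    uint32_t b0   = (src >> 20) & 1u;    /* bit 20          -> bit 0       */
    uint32_t b12  = (src >> 21) & 6u;    /* bits 22, 23     -> bits 1, 2   */
    uint32_t b345 = (src >> 22) & 56u;   /* bits 25, 26, 27 -> bits 3..5   */
    return (b0 | b12 | b345) << 2;       /* final position: bits 2..7      */
}

void des_gather_example(void)
{
    const uint32_t picked = (1u << 20) | (1u << 22) | (1u << 23) |
                            (1u << 25) | (1u << 26) | (1u << 27);

    assert(des_gather(picked) == 0xFCu);   /* all six selected bits set */
    assert(des_gather(~picked) == 0u);     /* every other bit is ignored */
}
```

On an application-specific datapath this gather is pure wiring, whereas the general-purpose instruction set spends a shift and a mask on every group of adjacent bits.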

Page 21: Application examples (2)

Bit level operations: A5 example (GSM encryption)

srl  $24, $5, 18
srl  $25, $5, 17
xor  $8, $24, $25
srl  $9, $5, 16
xor  $10, $8, $9
srl  $11, $5, 13
xor  $12, $10, $11
andi $13, $12, 1

[Figure: bits 18, 17, 16 and 13 of $5 are combined by xor into a single bit, stored as bit 0 of $13 with all other bits 0.]
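The eight instructions of the A5 listing compute one feedback bit from four taps of a shift register. As a sketch, with the tap positions taken from the listing and the function name `a5_feedback` chosen here for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Exor of bits 18, 17, 16 and 13 of reg, returned in bit 0. */
uint32_t a5_feedback(uint32_t reg)
{
    return ((reg >> 18) ^ (reg >> 17) ^ (reg >> 16) ^ (reg >> 13)) & 1u;
}

void a5_feedback_example(void)
{
    assert(a5_feedback(1u << 18) == 1u);                  /* one tap set  */
    assert(a5_feedback((1u << 18) | (1u << 17)) == 0u);   /* two taps set */
    assert(a5_feedback(1u << 14) == 0u);                  /* not a tap    */
}
```

The result is the parity of the four tapped bits, which in hardware is three exor gates.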

Page 22: ASIP/VLIW architectures: feedback

[Figure: feedback views offered to the designer: architecture view, life-time analysis, resource load, bus load, cycle-count.]

Page 23: Low power aspects

• Estimation

[Figure: Mistral2 flow. Labels: Implementation Independent Design Database, Estimation Database + Architecture; reported quantities: area, speed, power.]

| EXU | ACTIVITY | AREA | POWER |
|---|---|---|---|
| alu_1 | 20% | 261 | 105 |
| acs_asu_1 | 83% | 2382 | 3816 |
| or_asu_1 | 10% | 611 | 122 |
| romctrl_1 | 16% | 65 | 21 |
| acu_1 | 36% | 294 | 205 |
| ipb_1 | 20% | 107 | 43 |
| opb_1 | 11% | 163 | 35 |
| ctrl | | 1864 | 3597 |
| total | | 5747 | 7944 |

Page 24: GSM viterbi decoder: default solution

Cycle count: 13750

| EXU | ACTIV | AREA | POWER |
|---|---|---|---|
| alu_1 | 96% | 3469 | 46196 |
| romctrl_1 | 48% | 39 | 259 |
| acu_1 | 26% | 327 | 1209 |
| ipb_1 | 5% | 131 | 105 |
| opb_1 | 23% | 1804 | 5801 |
| ctrl | | 9821 | 135035 |
| total | | 15591 | 188605 |

• controller responsible for 70% of power consumption
  - maximum resource-sharing
  - heavy decision-making: “main” loop with 16 metrics-computations per iteration
• EXU-numbers include registers for local storage

Page 25: GSM viterbi decoder: no loop-folding

Cycle count: 14247

| EXU | ACTIV | AREA | POWER |
|---|---|---|---|
| alu_1 | 92% | 3411 | 45073 |
| romctrl_1 | 45% | 39 | 255 |
| acu_1 | 25% | 294 | 1087 |
| ipb_1 | 5% | 107 | 86 |
| opb_1 | 22% | 1661 | 5340 |
| ctrl | | 4919 | 70087 |
| total | | 10431 | 121928 |

• area down by 33%
• power down by 35%
• next step: reduce # of program-steps with second ALU

Page 26: GSM viterbi decoder: 2 ALUs

Cycle count: 9739

| EXU | ACTIV | AREA | POWER |
|---|---|---|---|
| alu_1 | 69% | 1797 | 12248 |
| alu_2 | 65% | 1393 | 8916 |
| romctrl_1 | 67% | 39 | 255 |
| acu_1 | 37% | 294 | 1087 |
| ipb_1 | 8% | 149 | 119 |
| opb_1 | 33% | 2136 | 6871 |
| ctrl | | 8957 | 87235 |
| total | | 14766 | 116731 |

• cycle count down 30%
• area up 42%
• power down by 5%
• next step: introduce ASU to reduce ALU-load

Page 27: GSM viterbi decoder: 1 x ACS-ASU

Cycle count: 1930

| EXU | ACTIV | AREA | POWER |
|---|---|---|---|
| alu_1 | 20% | 261 | 105 |
| acs_asu_1 | 83% | 2382 | 3816 |
| or_asu_1 | 10% | 611 | 122 |
| romctrl_1 | 16% | 65 | 21 |
| acu_1 | 36% | 294 | 205 |
| ipb_1 | 20% | 107 | 43 |
| opb_1 | 11% | 163 | 35 |
| ctrl | | 1864 | 3597 |
| total | | 5747 | 7944 |

func ACS ( M1, M2, d ) MS, MS8 =
begin
  MS  = if ( M1+d > M2-d ) -> ( M1+d ) || ( M2-d ) fi;
  MS8 = if ( M1-d > M2+d ) -> ( M1-d ) || ( M2+d ) fi;
end;

• cycle count down 5X
• power down 20X!
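The ACS (add-compare-select) function that the ASU implements as a single operation reads as follows in C. This is a direct transcription of the `func ACS` definition on this slide; the struct and function names are hypothetical.

```c
#include <assert.h>

/* Both outputs of one ACS operation. */
struct acs_result {
    int ms;    /* MS:  larger of M1+d and M2-d */
    int ms8;   /* MS8: larger of M1-d and M2+d */
};

/* Add-compare-select: four additions, two comparisons, two selections. */
struct acs_result acs(int m1, int m2, int d)
{
    struct acs_result r;

    r.ms  = (m1 + d > m2 - d) ? (m1 + d) : (m2 - d);
    r.ms8 = (m1 - d > m2 + d) ? (m1 - d) : (m2 + d);
    return r;
}

void acs_example(void)
{
    struct acs_result r = acs(10, 12, 3);

    assert(r.ms == 13);    /* max(10+3, 12-3) */
    assert(r.ms8 == 15);   /* max(10-3, 12+3) */
}
```

On a plain ALU each of these eight elementary steps is a separate instruction with its own controller activity, which is consistent with the large drop in cycle count and controller power once the whole function becomes one ASU operation.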

Page 28: GSM viterbi decoder: 4 x ACS-ASU

Cycle count: 425

| EXU | ACTIV | AREA | POWER |
|---|---|---|---|
| alu_1 | 94% | 243 | 97 |
| acs_asu_1 | 95% | 1041 | 420 |
| acs_asu_2 | 95% | 1041 | 420 |
| acs_asu_3 | 95% | 1041 | 420 |
| acs_asu_4 | 95% | 1041 | 420 |
| split_asu_1 | 47% | 90 | 18 |
| or_asu_1 | 47% | 592 | 118 |
| romctrl_1 | 28% | 48 | 6 |
| acu_1 | 98% | 212 | 85 |
| ipb_1 | 23% | 60 | 6 |
| opb_1 | 50% | 369 | 80 |
| ctrl | | 1306 | 555 |
| total | | 7084 | 2645 |

• cycle count down another 5X
• area up 23%
• power down another 3X!

Page 29: GSM viterbi example: summary

[Figure: chart produced with Mistral2 showing power, area and cycles (axis 0 to 20000) for the design points default, loop, 2 ALU, 1 ACS and 4 ACS. Annotation: 72x!]

Page 30: Discussion: phase 3

[Figure: two loops. Exploration phase: application(s) and processor model feed SW (code generation) and HW design, with "OK?" and "more appl.?" checks, ending in a frozen processor model. Application software development: application(s) feed SW (code generation) with an "OK?" check, using constraint driven compilation.]

Page 31: [Figure: VLIW template. Labels: RF1 to RF4, FU1 to FU4, IR1 to IR4, instruction memory, control, flags.]

Page 32: Discussion: problems with VLIWs

Code size and instruction bandwidth:

• code compaction = reduce code size after scheduling
  - possible compaction ratio? e.g. p0 = 0.9 and p1 = 0.1
  - information content (entropy) = $-\sum_i p_i \log_2 p_i = 0.47$
  - maximum compression factor 2
• control parallelism during scheduling = switch between different processor models (10% of code = 90% runtime)
• architecture: reduce number of control bits for operand addresses
  - e.g. 128 reg (TM) -> 28 bits/issue slot for addresses only
  - => use stacks and fifos
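As a worked check of the entropy figure on this slide, the numbers can be filled in step by step. This assumes p0 and p1 are the probabilities of the two values of a single instruction bit, which the slide does not state explicitly.

```latex
\begin{align*}
H &= -\sum_{i} p_i \log_2 p_i \\
  &= -\left(0.9 \log_2 0.9 + 0.1 \log_2 0.1\right) \\
  &\approx 0.9 \times 0.152 + 0.1 \times 3.322 \\
  &\approx 0.47 \text{ bits of information per stored bit}, \\
\frac{1}{H} &\approx 2.1
\end{align*}
```

Under that assumption, no lossless code can shrink the instruction stream by much more than a factor of 2, which matches the maximum compression factor quoted on the slide.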

Page 33: Velocity encoding

Classical encoding: fetching many nops (n = nop, one row per instruction word)

Fully serial:
n n A n n n n n
n B n n n n n n
n n n n n C n n
n n n n n D n n
n n n E n n n n
F n n n n n n n
n n n n n n G n
n n n n n n n H

Mixed serial/parallel:
n B A n n C n n
n n n E n D n n
F n n n n n n n
n n n n n n G H

Fully parallel:
A B C D E F G H

Velocity encoding (operations stored without nops, one bit per operation)

Fully serial:
A B C D E F G H
0 0 0 0 0 0 0 0

Mixed serial/parallel:
A B C D E F G H
1 1 0 1 0 0 1 0

Fully parallel:
A B C D E F G H
1 1 1 1 1 1 1 0
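The bit rows of the velocity encoding can be read as follows: a 1 means the next operation issues in the same cycle, a 0 closes the current instruction. That reading is an interpretation, but it reproduces the grouping of the mixed serial/parallel example (A B C, then D E, then F, then G H). A minimal decoder sketch under that assumption, with the hypothetical name `velocity_decode`:

```c
#include <assert.h>

/* Assigns an issue cycle to each of n densely stored operations from its
 * velocity bit and returns the number of cycles. Assumed meaning of the
 * bit: 1 = the next operation issues in the same cycle, 0 = this
 * operation is the last one of its cycle. */
int velocity_decode(const int *bits, int n, int *cycle)
{
    int t = 0;

    assert(n > 0 && bits[n - 1] == 0);   /* the last operation must close */
    for (int i = 0; i < n; i++) {
        cycle[i] = t;
        if (bits[i] == 0)
            t++;                         /* start a new instruction */
    }
    return t;
}
```

For the three bit rows of the slide this gives 8 cycles (fully serial), 4 cycles (mixed) and 1 cycle (fully parallel), while the stored code is 8 operations plus 8 bits in every case, with no nops fetched.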

Page 34: Conclusions

• ASIPs provide efficient solutions for well-defined application domains (2 orders of magnitude higher efficiency).

• The methodology is interesting for IP creation.

• The key problem is retargetable compilation.

• A (distributed) VLIW model is a good compromise between HW and SW.

• Although an automatic process can generate a default solution, the process usually is interactive and iterative for efficiency reasons. The key is fast and accurate feedback.