Page 1

by Dake Liu: [email protected]© Copyright of Linköping University, all rights reserved ®

10/9/2017 Unit 11 of TSEA26 – 2017 –H1 1

Design of Embedded DSP Processors

Unit 11: ASIP design review and applications

Page 2

Review of the course

Page 3

To save your time

Page 4

To save your time

Page 5

What should we get from the course

1. ASIP concept & design flow

2. Profiling and plan for HW design

3. ASM and micro architecture design

– ALU, MAC, and Register file

– PFC and memory addressing

4. Toolchain design

5. FW design and benchmark

6. Integration and verification

7. Advanced ASIP architectures

Page 6

What is an ASIP?

• HW-SW co-design for an application domain
• Accelerate the 10% of the code that runs 90% of the time
• Scoped flexibility
• Usually based on an instruction set template
• Plus a custom architecture for the datapath, data access, and control
• To reach custom-design performance, power, and silicon cost

Page 7

Compare CPU, ASIP, and ASIC

CPU
• All general applications
• x86, ARM
• High end, high cost
• Strong SW ecosystem support

ASIP
• For embedded applications
• Low software ecosystem requirements
• Performance designed for an application domain

ASIC
• A function module, not programmable
• Very high performance
• High design cost and short lifetime

Page 8

Embedded computing performance required by market

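[Figure: embedded computing performance required by the market, on a scale from 1 GOPS through 10 GOPS and 100 GOPS to 1 TOPS and 10 TOPS. Domains: video/image, baseband, learning, and graphics. Applications placed on the scale: H.263, H.264, H.265, 1080p, 8k, 3DTV, high-end ISP for quality cameras; GSM, HSPA, WCDMA, 11g/a, LTE/LTE-A/LTE-Hi terminals, LTE base stations, 5G base stations; word recognition, language learning, deep learning in terminals; 2D graphics, 3D video games, AR/VR; car registration, CT, radar, ultrasonic arrays.]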

Page 9

ASIP on markets (mostly in SoC as IP)

| ASIP | Applications | IP/year | $/year |
|---|---|---|---|
| SDR baseband | Handset, base station | 1B | 3-5B |
| ISP for image and video | Handset, video, camera | 1B | 1-2B |
| Video codec | Handset, surveillance | 0.5B | 2-4B |
| Storage | SSD, memory cards | >100M | ~500M |
| Gateways | Gateway, home gateway | >50M | ~500M |
| Network processors | ISP, router, industrial | >10M | ~100M |
| Industrial control | Motion and motor control | 100M | 300M |
| Robots | Vision, control | 10M | 100M |
| IoT | Communication, sensing | 50B | 50B |
| Deep learning | Server, terminal | ? | ? |
| Defense DFE | Baseband, sensing, ISP | ? | ? |
| Defense AP | Recognition, decision | ? | ? |

… plus video applications, VR, AR, medical, toys, home, and much more.

Page 10

ASIP design flow

1. Source code analysis; decide on the ISA of the ASIP
2. Design the instruction set and toolchain for prototyping
3. Benchmark (kernels) and evaluate the microarchitecture
4. Satisfied? If no, change the ISA and iterate; if yes, continue
5. Microarchitecture design, VLSI design, verification

Page 11

Code profiling to find what to accelerate

• We have given examples, but have not yet systematically presented profiling methods and profiling tools.
– Collect algorithms from the code and related textbooks
• The algorithm scope is related to the product lifetime (your decision)
– Profiling flow (tool selection and use of the tool):
• Select a (static or dynamic) tool and the right data set, and set up the flow
• Find the 10% of the code that runs 90% of the time, and accelerate it!
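The 10%/90% rule can be made concrete with a small C sketch. It assumes a hypothetical per-kernel cycle profile (the `ProfileEntry` type and the kernel names are illustrative, not the output of any specific profiler), picks the hottest kernels that together cover a given share of run time, and uses Amdahl's law to estimate the overall speedup when only those kernels are accelerated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical profile entry: cycles spent in one kernel, as collected by
 * a (dynamic) profiler on a representative data set. */
typedef struct {
    const char *name;
    unsigned long long cycles;
} ProfileEntry;

/* Sort entries by cycle count, hottest first (insertion sort; profiles are short). */
static void sort_by_cycles(ProfileEntry *p, size_t n) {
    for (size_t i = 1; i < n; i++) {
        ProfileEntry key = p[i];
        size_t j = i;
        while (j > 0 && p[j - 1].cycles < key.cycles) {
            p[j] = p[j - 1];
            j--;
        }
        p[j] = key;
    }
}

/* Number of hottest kernels needed to cover at least `percent` % of the
 * total run time. Sorts `p` in place. */
static size_t hotspot_count(ProfileEntry *p, size_t n, unsigned percent) {
    unsigned long long total = 0, acc = 0;
    for (size_t i = 0; i < n; i++) total += p[i].cycles;
    sort_by_cycles(p, n);
    for (size_t i = 0; i < n; i++) {
        acc += p[i].cycles;
        if (acc * 100 >= total * percent) return i + 1;
    }
    return n;
}

/* Amdahl's law: overall speedup when a fraction f of the run time is
 * accelerated by a factor s and the rest is unchanged. */
static double amdahl_speedup(double f, double s) {
    assert(f >= 0.0 && f <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - f) + f / s);
}
```

With 90% of the time in the selected kernels and a 10× acceleration, `amdahl_speedup(0.9, 10.0)` gives about 5.3×; even unlimited acceleration of those kernels cannot exceed 10×, because the untouched 10% remains.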

Page 12

What shall we accelerate?

1. Computing: instruction fusion or "magic" instructions (Eric Anders)
2. Data access: hide (pipeline) the data access cost behind computing (Andreas K)
3. Control: minimize control overhead by hiding it or by using extra control HW (ch. 14)
4. NoC: reduce the SoC/NoC cost before chip integration (special I/O for the core)

Page 13

How do we accelerate

[Figure: ASIP design flow. It starts from the ASIP requirement specification and an early manual partitioning according to application profiling, then splits into two branches: (1) instruction set specification, assembly instruction set simulator, benchmarking of the instruction set, and application SW implementation; (2) processor architecture specification, microarchitecture design, and processor HW implementation. Both branches end in ASIP integration, final function verification, and performance validation. Options for each function: implement it as a subroutine, implement it as an instruction, or design it for HW acceleration.]

Page 14

ISA template selection

A typical ASIP DSP processor assembly instruction set:
– Move instructions: load immediate data; load or store between memory and registers; move between registers
– Arithmetic instructions: general arithmetic, logic, and shift/rotate instructions; long arithmetic operations; bit and bits manipulations; multiplications; MAC and convolution; division and other vector and iterative instructions
– Control: branch and call; other program flow control instructions; repeat instruction
– Accelerate extensions: reserved for acceleration extension
The instruction classes range from RISC instructions to CISC instructions.

Page 15

Design a basic instruction set

• Load/store instructions: between all registers, and to/from memories
• ALU instructions: for single and double precision, signed, unsigned, integer, fractional, arithmetic, logic, and iterative (innermost loop) computing
• Flow control instructions: cover conditional and unconditional jumps, call/return, NOP, loop control/repeat, and custom control

Page 16

Numerical representations (columns): signed integer | unsigned integer | signed fractional | block floating point | floating point

Normal arithmetic Y Y Y Y

Audio, voice Y Y

Image, video Y Y

Normal DSP Y Y Y Y

Logic operations Y

Control flow Y

Addressing Y

HPC Y


Guarding and scaling before computing

Result = truncation (saturation (rounding (scaling (A))))
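A minimal C sketch of this post-processing chain, assuming a 40-bit accumulator held in an `int64_t` with Q31 alignment (8 guard bits above bit 31) and a 16-bit Q15 result taken from accumulator bits 31..16. Rounding here is round-half-up (add half an output LSB); other rounding modes exist, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Portable arithmetic (floor) right shift for signed values. */
static int64_t asr(int64_t x, int k) {
    return x >= 0 ? x >> k : -((-x - 1) >> k) - 1;
}

/* Scaling: shift > 0 shifts left, shift < 0 shifts right (arithmetic). */
static int64_t scale(int64_t acc, int shift) {
    assert(shift >= -8 && shift <= 8);
    return shift >= 0 ? acc * ((int64_t)1 << shift) : asr(acc, -shift);
}

/* Result = truncation(saturation(rounding(scaling(A)))) for a 40-bit
 * accumulator (Q31 alignment, 8 guard bits) producing a Q15 result. */
static int16_t post_process(int64_t acc, int shift) {
    assert(acc >= -((int64_t)1 << 39) && acc < ((int64_t)1 << 39));
    int64_t x = scale(acc, shift);        /* scaling */
    x += (int64_t)1 << 15;                /* rounding: +0.5 LSB of the Q15 result */
    if (x > INT32_MAX) x = INT32_MAX;     /* saturation to the Q31 range */
    if (x < INT32_MIN) x = INT32_MIN;
    return (int16_t)asr(x, 16);           /* truncation: keep bits 31..16 */
}
```

Rounding is applied before saturation so that a value that rounds up past the largest Q15 number is still clamped to 32767 instead of wrapping.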

Page 17

Coding for an instruction set

All micro-operations in an assembly instruction:
– Explicit micro-operations specified in assembly code and binary machine code: operation, operands, destination, data memory addressing, target addressing, explicit specifiers
– Explicit micro-operations specified in the assembly manual
– Implicit micro-operations not specified in assembly code, for example flag operations and PC <= PC + 1
– Implicit micro-operations, for example bus transactions and instruction decoding

• An assembly instruction set in binary format can be executed by HW.
• Orthogonal coding for structural design: easy instruction decoding
• Efficient coding for low program memory cost: may be less flexible

Page 18

A code multiplexing example

Multiplexing (control) codes: Type 1 (2b), Subtype a (2b), Operation code (6b). The multiplexed fields follow them:
– Type 1 (2b) | Subtype a (2b) | Operation code (6b) | Operand A (5b) | Operand B (5b)
– Type 1 (2b) | Subtype a (2b) | Operation code (6b) | Target address (20b), for a 1M program memory space
– Type 1 (2b) | Subtype a (2b) | Operation code (6b) | Operand A (5b) | 16-bit constant
– Type 1 (2b) | Subtype a (2b) | Operation code (6b) | Register (5b) | Memory address (16b)

The trade-off between orthogonality and code density
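The multiplexed formats can be sketched as an encoder/decoder pair in C. The 32-bit word size, the bit positions, and the rule that the type field alone selects the payload layout are assumptions for illustration; the slide only gives the field widths.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit layout (bit positions are assumptions):
 *   [31:30] type  [29:28] subtype  [27:22] opcode  [21:0] multiplexed payload
 * The type field selects how the 22-bit payload is interpreted. */
enum Format { FMT_RR = 0, FMT_JUMP = 1, FMT_IMM = 2, FMT_MEM = 3 };

typedef struct {
    unsigned type, subtype, opcode;
    unsigned a, b;      /* register operands (5 b each) */
    uint32_t target;    /* 20-bit program address */
    uint16_t k;         /* 16-bit constant or data memory address */
} Instr;

/* Extract `width` bits starting at bit `lsb`. */
static uint32_t field(uint32_t w, int lsb, int width) {
    return (w >> lsb) & ((1u << width) - 1u);
}

static uint32_t encode(const Instr *i) {
    assert(i->type < 4 && i->subtype < 4 && i->opcode < 64);
    uint32_t w = (uint32_t)i->type << 30 | (uint32_t)i->subtype << 28
               | (uint32_t)i->opcode << 22;
    switch (i->type) {
    case FMT_RR:   w |= (uint32_t)(i->a & 31u) << 17 | (uint32_t)(i->b & 31u) << 12; break;
    case FMT_JUMP: w |= i->target & 0xFFFFFu; break;
    default:       w |= (uint32_t)(i->a & 31u) << 17 | i->k; break; /* IMM and MEM share a layout */
    }
    return w;
}

static Instr decode(uint32_t w) {
    Instr i = {0};
    i.type = field(w, 30, 2);       /* control codes are decoded first ... */
    i.subtype = field(w, 28, 2);
    i.opcode = field(w, 22, 6);
    switch (i.type) {               /* ... then they select the payload layout */
    case FMT_RR:   i.a = field(w, 17, 5); i.b = field(w, 12, 5); break;
    case FMT_JUMP: i.target = field(w, 0, 20); break;
    default:       i.a = field(w, 17, 5); i.k = (uint16_t)field(w, 0, 16); break;
    }
    return i;
}
```

The decoder must read the type field before it knows how to slice the rest of the word; that extra dependency is the price of the denser, multiplexed coding compared with a fully orthogonal one.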

Page 19

Prepare for the exam

Page 20

Prepare for the exam

1. Check your ASIC and ASIP knowledge
   1. Basic concepts (small questions), design for acceleration
   2. Design of an ALU and a register file
   3. Design of a MAC, convolution, and other operations in the MAC
   4. Design of normal and accelerated (modulo) memory addressing
2. ASIP: PFC, jump, repeat, call, return, and instruction decoding

Page 21

Basic concepts (not limited to the following)

1. Use the Y-chart (behavior, structure, physical) to specify HW
2. Hardware multiplexing (MUX, keeper), pipeline, memory concepts (principle, model, partition)
3. Finite precision and design/verification corners (datapath, data access, and control path corners)
4. Critical path, pipeline balance, and (hidden) fan-out
5. Hazards, delay slots, and pipeline-induced problems (RTL/SIM)
6. Basic concepts of assembly coding tools, and FW design using HW knowledge from the course
7. Anything mentioned during lectures and tutorials as well as lab discussions (low %, yet essential)

Page 22

Register file

• Write ports (how many inputs, from all SRF)
• Registers (operand keepers)
• Operand output ports (to RF, all DP, SRF, M)
• Special registers (where they are and how to access them)
• Hidden critical path from control
• How to design an RF with multiple write ports
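A register file with two write ports can be modeled at the behavioral level as sketched below in C. The 16-register size, the 16-bit data width, and the fixed-priority rule (port 0 wins when both ports write the same register in the same cycle) are assumptions; a real design might instead forbid such conflicts in the instruction set or detect them in the toolchain, which is what the hypothetical `rf_write_conflict` helper is for.

```c
#include <assert.h>
#include <stdint.h>

#define NREGS   16
#define NWPORTS 2

/* One write port: enable, destination register, and data. */
typedef struct {
    int en;
    unsigned addr;
    int16_t data;
} WritePort;

/* One clock edge of a register file with NWPORTS write ports.
 * Each register has its own input MUX; scanning the ports in index
 * order models a fixed-priority MUX (port 0 wins on a conflict). */
static void rf_clock(int16_t rf[NREGS], const WritePort wp[NWPORTS]) {
    for (int p = 0; p < NWPORTS; p++)
        assert(!wp[p].en || wp[p].addr < NREGS);
    for (unsigned r = 0; r < NREGS; r++) {
        for (int p = 0; p < NWPORTS; p++) {
            if (wp[p].en && wp[p].addr == r) {
                rf[r] = wp[p].data;   /* selected input is latched */
                break;                /* lower-numbered port has priority */
            }
        }
    }
}

/* Conflict detector, e.g. for an ISS or assembler to warn about
 * same-cycle writes to the same register. */
static int rf_write_conflict(const WritePort wp[NWPORTS]) {
    for (int p = 0; p < NWPORTS; p++)
        for (int q = p + 1; q < NWPORTS; q++)
            if (wp[p].en && wp[q].en && wp[p].addr == wp[q].addr) return 1;
    return 0;
}
```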

Page 23

ALU

• Learn hardware multiplexing (MUX design)
• Design multiplexer controls (control table)
• Primitive-based design method and portable design (barrel-shifter primitive)
• E.g., special functions such as flags
• E.g., special functions such as ABS and MAX
• E.g., advanced: register forwarding (control)
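A behavioral C sketch of a 16-bit ALU slice, assuming one shared adder for ADD and SUB (SUB computed as A + ~B + 1, i.e. a MUX on the B input and on the carry-in), the usual Z/N/C/V flags, and a saturating ABS. The flag convention (carry = 1 meaning "no borrow" on SUB) and the saturating ABS are assumptions; cores differ here.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int z, n, c, v; } Flags;  /* zero, negative, carry, overflow */

/* ADD and SUB share one adder: SUB is A + ~B + 1 (MUX on B, carry-in = 1).
 * Carry convention (assumed): on SUB, c = 1 means "no borrow". */
static uint16_t alu_addsub(uint16_t a, uint16_t b, int sub, Flags *f) {
    assert(f);
    uint16_t bb = sub ? (uint16_t)~b : b;
    uint32_t sum = (uint32_t)a + bb + (sub ? 1u : 0u);
    uint16_t r = (uint16_t)sum;
    f->c = (int)((sum >> 16) & 1u);
    f->z = (r == 0);
    f->n = (r >> 15) & 1;
    /* signed overflow: both adder inputs have the same sign, result differs */
    f->v = ((~(a ^ bb) & (a ^ r)) & 0x8000) != 0;
    return r;
}

/* ABS with saturation: |-32768| does not fit in 16 bits, so clamp. */
static int16_t alu_abs(int16_t a) {
    if (a == INT16_MIN) return INT16_MAX;
    return a < 0 ? (int16_t)-a : a;
}

static int16_t alu_max(int16_t a, int16_t b) { return a > b ? a : b; }
```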

Page 24

MAC

• Integer/fractional, signed/unsigned MUL
• How to emulate double-precision MUL
• MAC using fractional data; MAC supporting very long iterations
• Arithmetic computing using the accumulator
• R = truncation(saturation(rounding(scaling(A))))
• There will be at least one question on the MAC
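Two of the MAC topics above, sketched in C: a signed 32×32-bit product built from four 16×16-bit products (which is where signed×unsigned multiplier modes become useful), and a fractional Q15×Q15 multiply that saturates the single overflow case, −1 × −1. The function names are hypothetical; each `mul_*` helper stands in for one pass through a 16×16 hardware multiplier.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16x16 multiplier modes: signed x signed, signed x unsigned,
 * unsigned x unsigned. Each call stands for one pass through the multiplier. */
static int64_t mul_ss(int16_t a, int16_t b)   { return (int64_t)a * b; }
static int64_t mul_su(int16_t a, uint16_t b)  { return (int64_t)a * b; }
static int64_t mul_uu(uint16_t a, uint16_t b) { return (int64_t)a * b; }

/* Signed 32x32 -> 64-bit product from four 16x16 products.
 * Split x = xH * 2^16 + xL with xH signed and xL unsigned. */
static int64_t mul32_emulated(int32_t a, int32_t b) {
    uint16_t aL = (uint16_t)a, bL = (uint16_t)b;
    int16_t aH = (int16_t)((a - (int32_t)aL) / 65536);  /* exact division */
    int16_t bH = (int16_t)((b - (int32_t)bL) / 65536);
    assert((int64_t)aH * 65536 + aL == a && (int64_t)bH * 65536 + bL == b);
    int64_t hh  = mul_ss(aH, bH);                      /* high x high */
    int64_t mid = mul_su(aH, bL) + mul_su(bH, aL);     /* cross terms */
    int64_t ll  = mul_uu(aL, bL);                      /* low x low */
    return hh * 65536 * 65536 + mid * 65536 + ll;
}

/* Fractional Q15 x Q15 -> Q31: the integer product is Q30, so double it.
 * -1.0 * -1.0 = +1.0 is not representable and saturates. */
static int32_t mul_q15(int16_t a, int16_t b) {
    int32_t p = (int32_t)a * b;            /* Q30 */
    if (p == 0x40000000) return INT32_MAX;
    return p * 2;                           /* Q31 */
}
```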

Page 25

Data access

• Basic memory subsystem design knowledge: peripherals, multiple blocks, hierarchy
• Design multiple address pointers in parallel
• Modulo addressing: principle and circuit design
• Overflow and underflow checking
• Acceleration for custom data accesses
• Is the addressing circuit signed or unsigned HW?
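A C sketch of modulo (circular buffer) address generation, assuming an unsigned 16-bit address, a signed step, and a buffer occupying the inclusive range `start`..`end`. The two comparisons are the overflow and underflow checks; in hardware, the wrapped and unwrapped addresses can be computed in parallel and selected by the check result.

```c
#include <assert.h>
#include <stdint.h>

/* Next address of a circular buffer occupying [start, end] (inclusive).
 * Addresses are unsigned; the step is a signed offset with |step| <= length. */
static uint16_t modulo_next(uint16_t ptr, int16_t step, uint16_t start, uint16_t end) {
    assert(start <= end && ptr >= start && ptr <= end);
    int32_t len = (int32_t)end - start + 1;
    assert(step <= len && -step <= len);
    int32_t next = (int32_t)ptr + step;   /* signed sum: may leave the buffer */
    if (next > end)   next -= len;        /* overflow past end: wrap back toward start */
    if (next < start) next += len;        /* underflow below start: wrap toward end */
    return (uint16_t)next;
}
```

In this sketch the stored address is unsigned, while the step and the intermediate sum must be treated as signed.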

Page 26

Control path

• Skills to design instruction decoder logic
• Skills to design a basic PC FSM circuit
– Hazards and their handling principles/circuits
– Pipeline execution table of a processor core
– Delay slot design for (conditional) jumps
– Design of flush control when a jump is taken
– Design of repeat control
• Option: control of register forwarding
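The next-PC logic with a hardware repeat can be sketched as a small C state machine. The `Pfc` structure, the four-entry return stack, the inclusive loop range, and the priority order (jump/call/return before loop-back before PC + 1) are assumptions for illustration; delay slots and pipeline flushes are not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical program flow control (PFC) state. */
typedef struct {
    uint16_t pc;
    uint16_t loop_start, loop_end;  /* inclusive body of the active REPEAT */
    uint16_t loop_count;            /* remaining iterations, 0 = no loop */
    uint16_t stack[4];              /* return addresses for CALL/RET */
    int sp;
} Pfc;

typedef enum { NEXT, JUMP, CALL, RET } FlowOp;

/* REPEAT the next `len` instructions n times (zero-overhead loop setup). */
static void pfc_repeat(Pfc *s, uint16_t n, uint16_t len) {
    assert(n > 0 && len > 0);
    s->loop_start = (uint16_t)(s->pc + 1);
    s->loop_end = (uint16_t)(s->pc + len);
    s->loop_count = n;
    s->pc++;
}

/* Next-PC selection. Assumed priority: taken jump/call/return, then
 * loop-back at the end of a REPEAT body, then PC + 1. */
static void pfc_step(Pfc *s, FlowOp op, uint16_t target) {
    switch (op) {
    case JUMP:
        s->pc = target;
        return;
    case CALL:
        assert(s->sp < 4);
        s->stack[s->sp++] = (uint16_t)(s->pc + 1);   /* push return address */
        s->pc = target;
        return;
    case RET:
        assert(s->sp > 0);
        s->pc = s->stack[--s->sp];                   /* pop return address */
        return;
    case NEXT:
        break;
    }
    if (s->loop_count > 0 && s->pc == s->loop_end) {
        s->loop_count--;
        if (s->loop_count > 0) {   /* more iterations: jump back with no branch cost */
            s->pc = s->loop_start;
            return;
        }
    }
    s->pc++;
}
```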

Page 27

Toolchain

• Basic concepts will be in the exam
• The toolchain concept from the user's point of view
• Basic knowledge from the lab on using the tools
• How to design an ISS for programmers
– (Lab 4) FSM, clock counting, hazards, debugging
• How to design a microarchitecture simulator
• How to write a C function for an instruction
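A minimal instruction set simulator in the "one C function per instruction" style, with clock counting and a load-use hazard. The four-instruction ISA, the single-cycle issue, and the one-cycle stall when an instruction reads the destination of the immediately preceding load are hypothetical pipeline properties chosen for the sketch, not those of any specific core.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical processor state and pre-decoded instruction format. */
typedef struct { int16_t r[8]; int16_t dm[64]; unsigned long cycles; } Cpu;
typedef struct { int op, rd, ra, rb; int16_t k; } Ins;
enum { OP_LDI, OP_LD, OP_ADD, OP_MUL, OP_COUNT };

/* One C function per instruction: the behavior of each assembly instruction. */
static void ex_ldi(Cpu *c, const Ins *i) { c->r[i->rd] = i->k; }
static void ex_ld(Cpu *c, const Ins *i)  { c->r[i->rd] = c->dm[i->k & 63]; }
static void ex_add(Cpu *c, const Ins *i) { c->r[i->rd] = (int16_t)(c->r[i->ra] + c->r[i->rb]); }
static void ex_mul(Cpu *c, const Ins *i) { c->r[i->rd] = (int16_t)(c->r[i->ra] * c->r[i->rb]); }

static void (*const exec_table[OP_COUNT])(Cpu *, const Ins *) = {
    ex_ldi, ex_ld, ex_add, ex_mul
};

/* Does instruction i read register `reg` as a source operand? */
static int reads(const Ins *i, int reg) {
    return (i->op == OP_ADD || i->op == OP_MUL) && (i->ra == reg || i->rb == reg);
}

/* Straight-line execution with cycle counting. */
static void run(Cpu *c, const Ins *prog, int n) {
    for (int pc = 0; pc < n; pc++) {
        const Ins *i = &prog[pc];
        assert(i->op >= 0 && i->op < OP_COUNT);
        c->cycles += 1;                                   /* one issue cycle */
        if (pc > 0 && prog[pc - 1].op == OP_LD && reads(i, prog[pc - 1].rd))
            c->cycles += 1;                               /* load-use stall */
        exec_table[i->op](c, i);
    }
}
```

The stall count is what makes this an ISS for programmers rather than a pure functional model: reordering the program to put an independent instruction after the load would remove the extra cycle.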

Page 28

Firmware design

• Algorithm selection and constraints (time and memory costs)
• How to insert gain measurements and gain controls in a program while using finite-precision fixed-point data
• Use register variable lifetimes to optimize code
• Why is cycle checking the last step before compiling?
• Innermost loop subroutines (speed-up and hazard control by scheduling and unrolling)
• FW programming/development flow (with 3 entries)
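Gain measurement and gain control on a block of fixed-point data can be sketched in C as follows: measure the block peak, shift the whole block left as far as possible without overflow, and return the shift as a block exponent that later processing must compensate. The Q15 data format and the function name are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Block gain control for Q15 data:
 * 1) gain measurement: find the peak magnitude of the block;
 * 2) gain control: shift the block left by the largest amount that
 *    cannot overflow, and return that shift as the block exponent. */
static int block_normalize(int16_t *x, int n) {
    assert(x && n > 0);
    int32_t peak = 0;
    for (int i = 0; i < n; i++) {
        /* For negative values, -x-1 has the same headroom as a positive
         * value (e.g. -32768 -> 32767), so one peak covers both signs. */
        int32_t m = x[i] < 0 ? -(int32_t)x[i] - 1 : x[i];
        if (m > peak) peak = m;
    }
    int shift = 0;
    while (shift < 15 && peak < 0x4000) {   /* one more doubling still fits */
        peak <<= 1;
        shift++;
    }
    for (int i = 0; i < n; i++)
        x[i] = (int16_t)(x[i] * (1 << shift));   /* apply the gain */
    return shift;                                 /* block exponent */
}
```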

Page 29

Vector processors

• Acceleration opportunities at the instruction level
– Parallel computing, hidden data accesses and control
– Instruction fusion, magic instructions
• Three kinds of SIMD architectures (our definition)
– Vector (flat), reduce, and 2D datapath
• Three basic SIMD challenges
– Data alignment: permutation-based access on scratchpad memories (SPMs)
– Conditional execution: execute the true and false paths separately
– Compiling: avoid it by using an intrinsics-based programming model
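The "execute true and false separately" approach to SIMD conditional execution can be sketched in scalar C, one loop per vector operation: compare to build a lane mask, compute both paths in all lanes, then merge with a bitwise select. The 8-lane width and the example condition (a dead-zone function: y = x − thr if x > thr, else 0) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8

/* SIMD conditional execution without per-lane branching, written as one
 * scalar loop per vector operation. Hypothetical example condition:
 *   if (x > thr) y = x - thr; else y = 0;   (a dead-zone function) */
static void simd_dead_zone(const int16_t *x, int16_t thr, int16_t *y) {
    int16_t mask[LANES], t[LANES], f[LANES];
    assert(x && y);
    for (int l = 0; l < LANES; l++) mask[l] = (int16_t)(x[l] > thr ? -1 : 0); /* vector compare */
    for (int l = 0; l < LANES; l++) t[l] = (int16_t)(x[l] - thr);             /* true path, all lanes */
    for (int l = 0; l < LANES; l++) f[l] = 0;                                  /* false path, all lanes */
    for (int l = 0; l < LANES; l++)                                            /* bitwise select */
        y[l] = (int16_t)((t[l] & mask[l]) | (f[l] & ~mask[l]));
}
```

Both paths cost cycles in every lane, so this pays off when the paths are short; for long, rarely taken paths a scalar fallback may be cheaper.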

Page 30

Integration and verification

• What to care about during core integration
– Function, structure, and physical
• Verification
– Compliance test
– Corner test (DP, data access, control corners)
– Write ASM code and select data for a corner test
• What a DUT is and how to write a test suite

Page 31

Chapters / sections you can skip

• The following sections are not essential and will not be examined:
– 1.7.3, 2.1.6, 2.1.8, 3.1, 3.2, 3.4, 3.5.1, 3.5.2, 3.5.4, 3.5.5, 4.6
– Chapters 5, 6, 9, 16, and 17 will not be examined
– 7.1.2, 7.1.3, 7.1.4, 7.1.5, 7.4, 7.5, 8.2, 8.3, 8.4, 8.9, 10.1, 10.5, 11.3
– Do not read chapter 14; read my compendium instead
– 18.2.5
– 19.1 is rather old; carefully following my lectures/slides is enough
– 19.2 is OK to read. Chapter 20 is rather old; try to follow my slides
• To reach a high score in the exam, you can skip the listed parts. If you really want to design a processor, you are advised to read through the whole book.

Page 32

Application case studies

Page 33

Baseband function flow

[Figure: baseband function flow, controlled by an MCU. Receiver blocks: ADC interface; decimator and rate matching (Farrow filter); energy measurement and gain control; rotator, notch, and band-pass filters; symbol and frame synchronization; transform (FFT); beamforming; channel estimation (single carrier: finger finder; OFDM: LS or MMSE); matrix pre-processing; single carrier: RAKE receiver, OFDM: LS/MMSE; data detection (soft LLR, hard); de-interleaving; FEC (Turbo, LDPC, Viterbi, RS); CRC; MAC interface. Transmitter blocks: MAC interface; CRC; error correction coding (CC, Turbo, LDPC, RS); rate matching and interleaving; modulation; precoding; beamforming; CDMA: scrambling and channel access multiplexing, or OFDM: FFT and channel access multiplexing; filtering for pulse shaping; DPD for the RF power amplifier; DAC interface. Legend for the functional partition: symbol, FEC, bit, MAC interface.]

Page 34

Baseband subsystem

[Figure: baseband subsystem built around a baseband connection network and an MCU (the baseband controller), with a host interface, a memory interface, an ADC port, and a DAC port. Processors on the network: symbol processors (DFE, matrix, FFT), an LLR processor, a FEC processor, and a bit processor, implemented as different kinds of SIMD processors (SIMD1 to SIMD6).]

Page 35

DFE (Digital Front End), 2 designer-years

• Function: low-pass, band-pass, biquad, and Farrow filters; rotators for I and Q
• Structure: dedicated SIMD; IIR filters transformed to avoid data dependence
• Physical constraints: up to 100 MAC operations per sample (I or Q), two filter chains, low power and low silicon cost

Page 36

Digital Front End


Page 37

FFT/DFT machine, 2 designer-years

• Function: parallel execution of multiple radix-2, radix-4, radix-8, and radix-16 (R2, R4, R8, R16) FFTs, and R3 and R5 DFTs
• Structure: an R16 machine can be divided into 2×R8, 2×R5, 4×R4, 4×R3, or 8×R2
• Physical constraints: the critical path is a 17-bit MUL; internal precision is kept using block floating point; the twiddle-factor memory cost is minimized.
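A Q15 radix-2 butterfly in C, as a sketch of the basic FFT operation. It uses a fixed 1/2 scaling per stage to avoid overflow, which is a simpler alternative to the block floating point used in the machine above; the data format, the round-half-up rule, and the function names are assumptions. The 1/2 scaling is overflow-free only when the inputs lie within the unit circle, so the outputs are also saturated as a safeguard.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int16_t re, im; } Cq15;   /* complex Q15 sample */

/* Portable floor right shift for signed 64-bit values. */
static int64_t asr64(int64_t x, int k) {
    return x >= 0 ? x >> k : -((-x - 1) >> k) - 1;
}

static int16_t sat16(int64_t x) {
    return (int16_t)(x > INT16_MAX ? INT16_MAX : x < INT16_MIN ? INT16_MIN : x);
}

/* (Q30 value) / 2, rounded, back to Q15: divide by 2^16 with +0.5 LSB rounding. */
static int16_t half_to_q15(int64_t q30) {
    return sat16(asr64(q30 + (1 << 15), 16));
}

/* Radix-2 butterfly with 1/2 scaling per stage:
 *   A' = (A + W*B) / 2,   B' = (A - W*B) / 2   (W = twiddle factor, Q15) */
static void butterfly(Cq15 *a, Cq15 *b, Cq15 w) {
    assert(a && b);
    int64_t pr = (int64_t)w.re * b->re - (int64_t)w.im * b->im;        /* Re(W*B), Q30 */
    int64_t pi = (int64_t)w.re * b->im + (int64_t)w.im * b->re;        /* Im(W*B), Q30 */
    int64_t ar = (int64_t)a->re * 32768, ai = (int64_t)a->im * 32768;  /* A in Q30 */
    a->re = half_to_q15(ar + pr);
    a->im = half_to_q15(ai + pi);
    b->re = half_to_q15(ar - pr);
    b->im = half_to_q15(ai - pi);
}
```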

Page 38

Complex data matrix computing, more than 10 designer-years

• Function: +, -, MUL, determinant |A|, Hermitian, transpose, matrix inversion, LUD, QRD, SVD
• Structure: 2D datapath for multiple outputs/reduction; data access with permutation
• Physical constraints: matrix inversion is the critical function; MUL is in the critical path

Page 39

ePUMA for matrix and tensor


Page 40

FEC ASIP based on SIMD, 4 designer-years

• Function: convolutional and block Turbo codes, LDPC, Viterbi (not Reed-Solomon)
• Structure: forward and backward recursion based on BCJR (Bahl, Cocke, Jelinek, Raviv) maximum a posteriori decoding; permuted addressing
• Physical constraints: dual-port memory is the bottleneck; the addressing path might be critical.

Page 41

Merge FBR algorithms into one flow

Page 42

Implement the flow into an ASIP

• Tools: Synopsys Design Compiler, Cadence Encounter
• Technology: STMicroelectronics 65 nm CMOS Low Power, 1.1 V
• Clock: 200 MHz (memory 400 MHz); P = 12, W = 32 (Turbo) or 64 (CC)
• Turbo: currently 12 SISO, 200 MHz, 6 iterations, 186 Mbps; future: 24 SISO, 500 MHz, 4 iterations, 1395 Mbps
• Cost: area 2.12 mm², power consumption 322 mW
• ASIP-FEC: A FEC baseband processor for Turbo, LDPC, and Viterbi, 2014

Page 43

PBIT ASIP based on SIMD, 3 designer-years

• Function: parallel bit manipulation, including all LFSR-based functions (CRC, CC, scrambling) and GF-based functions (Reed-Solomon codec, AES, DEC, ZUC, SNOW 3G, ...)
• Structure: LUT (look-up table) based parallel-bit SIMD GF ALU; permuted addressing
• Physical constraints: few; small memory blocks, table address generation
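LFSR-based functions such as CRC are the kind of bit manipulation a PBIT-style datapath accelerates. As a reference point, the sketch below is the bit-serial form of the standard CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320), processing one shift-register step per loop iteration; a parallel bit-manipulation unit would instead process many bits per cycle, for example with look-up tables. This is a software reference model, not the PBIT implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit-serial CRC-32 (IEEE 802.3, reflected form, polynomial 0xEDB88320).
 * Each inner iteration is one step of the 32-bit LFSR. */
static uint32_t crc32_bitwise(const uint8_t *data, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;                 /* initial register value */
    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];                         /* feed 8 new input bits */
        for (int bit = 0; bit < 8; bit++)       /* shift, XOR the polynomial if LSB was 1 */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;                                /* final inversion */
}
```

For the standard check string "123456789", `crc32_bitwise` returns 0xCBF43926, the published CRC-32 check value, which makes it usable as a golden model when verifying a parallel implementation.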

Page 44

PBIT ASIP based on SIMD

• Performance: 11.6 Gb/s AES, 32.0 Gb/s SNOW 3G, 16.0 Gb/s ZUC, 128.0 Gb/s CRC, 8.0 Gb/s RS(255,239)
• Technology: STMicroelectronics 65 nm Low Power, 1.2 V, 1.0 GHz
• Cost: area 0.77 mm², 207 kGates of logic, power 489 mW
• BP-ASIP: A 128-way parallel baseband processor for parallel bit manipulation covering RS codec, LFSR, and encryption, 2016

Page 45

Summarize what and how to learn

| Concepts \ Skills | System understanding | Plan HW schematic | HW coding | FW coding | Integration, verification | Weight |
|---|---|---|---|---|---|---|
| Skill weight | 5% | 15% | 10% | 20% | 50% | |
| Finite precision | Just enough quality | Where/what | Sat/rnd | Gain ctrl | Corner cases | 10% |
| Microarchitecture | Functions to map | Sharing | Sharing | HW knowledge | Balance | 10% |
| Register file | Write conflict | Critical path | Fan-out | Lifetime | Fan-in, fan-out | 10% |
| ALU: arithmetic and logic | HW sharing | Reuse skill | IP code | Precision | Corner | 10% |
| MAC: MUL and ACC | MAC/LALU/MLU | Reuse skill | IP code | Use MAC | Corner | 10% |
| Memory and data access | Modulo | Pipeline | Pipeline | D-allocate | IP coding | 10% |
| Program flow control | PC and I-decoder | PFC pipeline | PC | Hazard | Pipeline | 10% |
| Assembly coding tools | Behavior/arch SIM | D-hazard | - | Lab 4 | Verification | 10% |
| Firmware plan and design | Bit/mem/cycle | - | - | Plan vs. code | SW vs. HW | 10% |
| Survey of different ASIPs | Efficient | VPU | Tool limited | Critical | Kernels | 10% |

Page 46

Dake Liu, Room 556, corridor B, Hus-B, phone 281256, [email protected]

You are welcome to ask any questions you want:
• I can answer
• Or we can discuss together
• I want to know what you want