tms320 dr. naim dahnoun, bristol university, (c) texas instruments 2004 overview a.a.2009-10

TMS320Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004

Overview

A.A.2009-10

A.A.2009-10

2

VoIP

Texas Instruments’ TMS320 Family

C2000

C5000

C6000

Lowest CostControl Systems- Motor Control- Storage- Digital Ctrl

Systems

Efficiency Best MIPS per Watt /

Dollar / Size- Wireless phones- Internet audio

players- Digital still cameras - Modems- Telephony- VoIP

Performance &Best Ease-of-Use

- Multi Channel and Multi

Function App's- Comm Infrastructure- Wireless Base-

stations- DSL- Imaging- Multi-media Servers- Video

Digital Signal Processor

3

C64x: The C64x fixed-point DSPs offer the industry's highest level of performance to address the demands of the digital age. At clock rates of up to 1 GHz, C64x DSPs can process information at rates up to 8000 MIPS with costs as low as $19.95. In addition to a high clock rate, C64x DSPs can do more work each cycle with built-in extensions. These extensions include new instructions to accelerate performance in key application areas such as digital communications infrastructure and video and image processing.

c62x: These first-generation fixed-point DSPs represent breakthrough technology that enables new equipments and energizes existing implementations for multi-channel, multi-function applications, such as wireless base stations, remote access servers (RAS), digital subscriber loop (xDSL) systems, personalized home security systems, advanced imaging/biometrics, industrial scanners, precision instrumentation and multi-channel telephony systems.

C67x: For designers of high-precision applications, C67x floating-point DSPs offer the speed, precision, power savings and dynamic range to meet a wide variety of design needs. These dynamic DSPs are the ideal solution for demanding applications like audio, medical imaging, instrumentation and automotive.

Digital Signal Processor

A.A.2009-10

C6000

TMS320C6000Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004

Architectural Overview

A.A.2009-10

5

TMS320 C6xx

Y =N

å an xnn = 1

*

Prodotto Scalare tra due vettori

Instruction Set

DSP System Block Diagram

A.A.2009-10

6

TMS320 C6xx

It has been shown in Introduction that SOP is the key element for most

DSP algorithms.

Two basic Operations

are required for this algorithm.

- Multiplication

- Addition Therefore two basic

Instructions are required

Scalar Product of Vectors

The implementation in this module will be done in

assembly for C6xx

= a1 * x1 + a2 * x2 +... + aN * xN

Y =N

å an xnn = 1

*

SOP Algorithm

A.A.2009-10

7

Multiply (MPY)

The multiplication of a1 by x1 is done in assembly by the following instruction:

MPY a1, x1, Y

This instruction is performed by a multiplier

unit that is called “.M”

Y =N

å an xnn = 1

*

= a1 * x1 + a2 * x2 +... + aN * xN

A.A.2009-10

8

Multiply (.M unit)

.M.M

Y =40

å an xnn = 1

*

The .M unit performs multiplications in hardware

MPY .M a1, x1, Y

Note: 16-bit by 16-bit multiplier provides a 32-bit result.

32-bit by 32-bit multiplier provides a 64-bit result.

A.A.2009-10

9

Addition (.L unit)

.M.M

.L.L

Y =40

å an xnn = 1

*

MPY .M a1, x1, prod

ADD .L Y, prod, Y

RISC processors such as the C6000 use registers to hold the operands, so

lets change this code.

RISC processors such as the C6000 use registers to hold the operands, so

lets change this code.A.A.2009-10

10

Register File A

Y =40

å an xnn = 1

*

MPY .M a1, x1, prod

ADD .L Y, prod, Y

MPY .M A0, A1, A3

ADD .L A4, A3, A4

MPY .M A0, A1, A3

ADD .L A4, A3, A4

.M.M

.L.L

A0

A1

A2

A3

A4

A15

Register File A

.

.

.

a1x1

prod

32-bits

Y

.

.

.

Let us correct this by replacing a, x, prod and Y by the

registers as shown above.

Let us correct this by replacing a, x, prod and Y by the

registers as shown above.

A.A.2009-10

11

Specifying Register Names

Register File A contains 16 registers (A0 -A15) which are

32-bits wide.

.M.M

.L.L

A0

A1

A2

A3

A4

A15

Register File A

.

.

.

a1x1

prod

32-bits

Y

.

.

.

A.A.2009-10

12

Data loading (load unit .D)

Q: How do we load the operands into the registers?

.M.M

.L.L

A0

A1

A2

A3

A4

A15

Register File A

.

.

.

a1x1

prod

32-bits

Y

.

.

.

A: The operands are loaded into the registers by loading them from the memory using the .D unit.

.D.D

Data MemoryData Memory

It is worth noting at this stage that the only way to access memory is through

the .D unit.

A.A.2009-10

13

Load Instruction

Q: Which instruction(s) can be used for loading operands from the memory to the registers?.M.M

.L.L

A0

A1

A2

A3

A4

A15

Register File A

.

.

.

a1x1

prod

32-bits

Y

.

.

.

.D.D


A.A.2009-10

14

Load Instructions

A: The load instructions.

Q: Which instruction(s) can be used for loading operands from the memory to the registers?.M.M

.L.L

A0

A1

A2

A3

A4

A15

Register File A

.

.

.

a1x1

prod

32-bits

Y

.

.

.LDB, LDH

LDW,LDDW.D.D


A.A.2009-10

15

Using the Load Instructions

Before using the load unit you have to be

aware that this processor is byte addressable,

which means that each byte is represented by a

unique address.

Also the addresses are 32-bit wide.

00000000

00000002

00000004

00000006

00000008

Data

16-bits

32 bits address

FFFFFFFF

Byte

A.A.2009-10

16

The syntax for the load instruction is:

Where:

Rn is a register that contains the address of the operand to be loaded

and

Rm is the destination register.

LD *Rn,Rm

00000000

00000002

00000004

00000006

00000008

Data

16-bits

address

FFFFFFFF

a1x1

prodY


A.A.2009-10

17


The question now is how many bytes are going to be loaded into the destination register?

LD *Rn,Rm

00000000

00000002

00000004

00000006

00000008

Data

16-bits

address

FFFFFFFF

a1x1

prodY


A.A.2009-10

18


LD *Rn,Rm

The answer, is that it depends on the instruction you choose:• LDB: loads one byte (8-bit)

• LDH: loads half word (16-bit)

• LDW: loads a word (32-bit)

• LDDW: loads a double word (64-bit)

Note: LD on its own does not exist.

00000000

00000002

00000004

00000006

00000008

Data

16-bits

address

FFFFFFFF

a1x1

prodY


A.A.2009-10

19

Example:If we assume that A5 = 0x4 then:

(1) LDB *A5, A7 ;

gives A7 = 0x00000001

(2) LDH *A5,A7;

gives A7 = 0x00000201

(3) LDW *A5,A7;

gives A7 = 0x04030201

(4) LDDW *A5,A7:A6;

gives A7:A6 = 0x0807060504030201

00000000

00000002

00000004

00000006

00000008

Data

16-bits

address

FFFFFFFF

0xB0xA

0xD0xC

0x010x02

0x030x04

0x050x06

0x070x08

01The syntax for the load instruction is:

LD *Rn,Rm

Little Endianla parte meno significativava memorizzata per prima,

all'indirizzo più bassodi memoria


A.A.2009-10

20

Q: If data can only be accessed by the load instruction and the .D unit, how can we load the register pointer Rn in the first place?

00000000

00000002

00000004

00000006

00000008

Data

16-bits

address

FFFFFFFF

0xB0xA

0xD0xC

0x10x2

0x30x4

0x50x6

0x70x8

01The syntax for the load instruction is:

LD *Rn,Rm


A.A.2009-10

21

The instruction MVKL will allow a move of a 16-bit constant into a register as shown below:

MVKL .? a, A5(‘a’ is a constant or label)

How many bits represent a full address?

32 bits

So why does the instruction not allow a 32-bit move?

All instructions are 32-bit wide (see instruction opcode).

Loading the Pointer *Rn

A.A.2009-10

22

To solve this problem another instruction is available: MVKH

eg. MVKH .? a, A5(‘a’ is a constant or label)

ah

ah x

al a

A5

Finally, to move the 32-bit address to a register we can use:

MVKL a, A5

MVKH a, A5


A.A.2009-10

23

Always use MVKL then MVKH, look at the following examples:

MVKL 0x1234FABC, A5

A5 = 0xFFFFFABC Wrong

Example 1 (A5 = 0x87654321)

MVKL 0x1234FABC, A5

A5 = 0xFFFFFABC (sign extension)MVKH 0x1234FABC, A5

A5 = 0x1234FABC OK

Example 2 (A5 = 0x87654321)

MVKH 0x1234FABC, A5

A5 = 0x12344321


A.A.2009-10

24

MVKL pt1, A5 MVKH pt1, A5


LDH .D *A5, A0

LDH .D *A6, A1

MPY .M A0, A1, A3

ADD .L A4, A3, A4

pt1 and pt2 point to some locations

in the data memory.

.M.M

.L.L

A0

A1

A2

A3

A4

A15

Register File A

.

.

.

a1x1

prod

32-bits

Y

.

.

.

Le componentiai e xi sono numeri

interi da 16 bits

.D.D


LDH, MVKL and MVKHIPOTESI

A.A.2009-10

26

Creating a loop



LDH .D *A5, A0

LDH .D *A6, A1

MPY .M A0, A1, A3

ADD .L A4, A3, A4

So far we have only implemented the SOP for one tap only, i.e.

Y= a1 * x1

So let’s create a loop so that we can implement the SOP for N Taps.

A.A.2009-10

27

With the C6000 processors there are no dedicated

instructions such as block repeat. The loop is created

using the B instruction.

So far we have only implemented the SOP for one tap only, i.e.

Y= a1 * x1

So let’s create a loop so that we can implement the SOP for N Taps.

Creating a loop – B instruction

A.A.2009-10

28

Create a label to branch to.

Add a branch instruction, B.

Create a loop counter (LC).

Add an instruction to decrement the LC.

Make the branch conditional based on LC.

What are the steps for creating a loop ?

A.A.2009-10

29

1. Create a label to branch to



loop LDH .D *A5, A0

LDH .D *A6, A1

MPY .M A0, A1, A3

ADD .L A4, A3, A4

A.A.2009-10

30



loop LDH .D *A5, A0

LDH .D *A6, A1

MPY .M A0, A1, A3

ADD .L A4, A3, A4

B .? loop

2. Add a branch instruction, B.

A.A.2009-10

31

Which unit is used by the B instruction?

MVKL .S pt1, A5 MVKH .S pt1, A5


loop LDH .D *A5, A0

LDH .D *A6, A1

MPY .M A0, A1, A3

ADD .L A4, A3, A4

B .S loop

.M.M

.L.L

A0

A1

A2

A3

A15

Register File A

.

.

.

ax

prod

32-bits

Y

.D.D

.M.M

.L.L

A0

A1

A2

A3

A15

Register File A

.

.

.

ax

prod

32-bits

Y

.S.S

.D.D

Data MemoryData MemoryA.A.2009-10

32

3. Create a loop counter.



MVKL .S count, B0

loop LDH .D *A5, A0

LDH .D *A6, A1

MPY .M A0, A1, A3

ADD .L A4, A3, A4

B .S loop

B registers will be introduced later

.M.M

.L.L

A0

A1

A2

A3

A15

Register File A

.

.

.

ax

prod

32-bits

Y

.D.D

.M.M

.L.L

A0

A1

A2

A3

A15

Register File A

.

.

.

ax

prod

32-bits

Y

.S.S

.D.D


33

4. Decrement the loop counter



MVKL .S count, B0

loop LDH .D *A5, A0

LDH .D *A6, A1

MPY .M A0, A1, A3

ADD .L A4, A3, A4

SUB .S B0, 1, B0

B .S loop

.M.M

.L.L

A0

A1

A2

A3

A15

Register File A

.

.

.

ax

prod

32-bits

Y

.D.D

.M.M

.L.L

A0

A1

A2

A3

A15

Register File A

.

.

.

ax

prod

32-bits

Y

.S.S

.D.D


34

What is the syntax for making instruction conditional?

[condition] InstructionLabel

e.g. [B1] B loop

(1) The condition can be one of the following registers: A1, A2, B0, B1, B2.

(2) Any instruction can be conditional.

5. Make the branch conditional based on the value in the loop counter

A.A.2009-10

35

The condition can be inverted by adding the exclamation symbol “!” as follows:

[!condition] InstructionLabel

e.g.

[!B0]B loop; branch if B0 = 0

[B0] B loop; branch if B0 != 0

5. Make the branch conditional based on the value in the loop counter

A.A.2009-10

36

MVKL .S2 pt1, A5 MVKH .S2 pt1, A5


MVKL .S2 count, B0

loop LDH .D *A5, A0

LDH .D *A6, A1

MPY .M A0, A1, A3

ADD .L A4, A3, A4

SUB .S B0, 1, B0

[B0] B .S loop

5. Make the branch conditional

.M.M

.L.L

A0

A1

A2

A3

A15

Register File A

.

.

.

ax

prod

32-bits

Y

.D.D

.M.M

.L.L

A0

A1

A2

A3

A15

Register File A

.

.

.

ax

prod

32-bits

Y

.S.S

.D.D


37

With this processor all the instructions are encoded in a 32-bit.

Therefore the label must have a dynamic range of less than 32-bit as the instruction B has to be coded.

Case 1: B .S1 label Relative branch. Label limited to +/- 220 offset.

More on the Branch Instruction (1)

21-bit relative addressB

32-bit

A.A.2009-10

38

More on the Branch Instruction (2)

By specifying a register as an operand instead of a label, it is possible to have an absolute branch.

This will allow a dynamic range of 232.

Case 2: B .S2 register Absolute branch. Operates on .S2 ONLY!

5-bit register codeB

32-bit

A.A.2009-10

39

Testing the code

This code performs

the following operations:

a0*x0 + a0*x0 + a0*x0 +

… + a0*x0

However, we would like to perform:

a0*x0 + a1*x1 + a2*x2 +

… + aN*xN



MVKL .S2 count, B0

loop LDH .D *A5, A0

LDH .D *A6, A1

MPY .M A0, A1, A3

ADD .L A4, A3, A4

SUB .S B0, 1, B0

[B0] B .S loop

A.A.2009-10

40

Modifying the pointers

The solution is

to modify

the pointers

A5 and A6



MVKL .S2 count, B0

loop LDH .D *A5, A0

LDH .D *A6, A1

MPY .M A0, A1, A3

ADD .L A4, A3, A4

SUB .S B0, 1, B0

[B0] B .S loop

A.A.2009-10

41

Indexing Pointers

Description

Pointer

+ Pre-offset

- Pre-offset

Pre-increment

Pre-decrement

Post-increment

Post-decrement

Syntax PointerModified

*R*+R[disp]*-R[disp]*++R[disp]*--R[disp]*R++[disp]*R--[disp]

No

No

No

Yes

Yes

Yes

Yes

[disp] specifies # elements - size in DW, W, H, or B. disp = R or 5-bit constant. R can be any register.

A.A.2009-10

42

Modify and testing the code

This code now performs


a0*x0 + a1*x1 + a2*x2 +

... + aN*xN



MVKL .S2 count, B0

loop LDH .D *A5++, A0

LDH .D *A6++, A1

MPY .M A0, A1, A3

ADD .L A4, A3, A4

SUB .S B0, 1, B0

[B0] B .S loop

A.A.2009-10

43

Store the final result

This code now performs


a0*x0 + a1*x1 + a2*x2 +

... + aN*xN



MVKL .S2 count, B0


LDH .D *A6++, A1

MPY .M A0, A1, A3

ADD .L A4, A3, A4

SUB .S B0, 1, B0

[B0] B .S loop

STH .D A4, *A7

A.A.2009-10

44

Store the final result

The Pointer A7

is now initialised.



MVKL .S2 pt3, A7MVKH .S2 pt3, A7MVKL .S2 count, B0


LDH .D *A6++, A1

MPY .M A0, A1, A3

ADD .L A4, A3, A4

SUB .S B0, 1, B0

[B0] B .S loop

STH .D A4, *A7

A.A.2009-10

45

What is the initial value of A4?

A4 is used as

an accumulator,

so it needs to be

reset to zero.



MVKL .S2 pt3, A7MVKH .S2 pt3, A7MVKL .S2 count, B0ZERO .L A4


LDH .D *A6++, A1

MPY .M A0, A1, A3

ADD .L A4, A3, A4

SUB .S B0, 1, B0

[B0] B .S loop

STH .D A4, *A7

A.A.2009-10

46 TMS320 C6xx

Codex Complete



MVKL .S2 pt3, A7MVKH .S2 pt3, A7

MVKL .S2 N, B0

ZERO .L1 A4

loop LDH .D1 *A5++, A0

LDH .D1 *A6++, A1

MPY .M1 A0, A1, A3

ADD .L1 A4, A3, A4

SUB .S2 B0, 1, B0

[B0] B .S1 loop

STH .D1 A4, *A7

32 bits 32 bits32 bits16 bits

*A5 = an (16 bits)

16 bits

(pt1)

16 bits

(pt2)

16 bits

(pt3)

*A6 = xn (16 bits)

*A7 = Y (16 bits)

B0 = N (16 bits)

A0

A1A4

A.A.2009-10

47

Y =40å an xn

n = 1*

Code Review(using side A only)

MVK .S1 40, A2 ; A2 = 40, loop countloop: LDH .D1 *A5++, A0 ; A0 = a(n)

LDH .D1 *A6++, A1 ; A1 = x(n)MPY .M1 A0, A1, A3 ; A3 = a(n) * x(n)ADD .L1 A4, A3, A4 ; Y = Y + A3SUB .L1 A2, 1, A2 ; decrement loop count

[A2] B .S1 loop ; if A2 0, branchSTH .D1 A4, *A7 ; *A7 = Y

Note: Assume that A4 was previously cleared and the pointers are initialised. Assume that A2 is B0

A.A.2009-10

tms320 dr. naim dahnoun, bristol university, (c) texas instruments 2004 overview a.a.2009-10

Documents

n x n n

y mpy a1

tms320 c6xx y

c6000 slide

c64x dsps

y note

c texas instruments

digital age