DSP Lecture Series DSP Memory Architecture Dr. E.W. Hu Nov. 28, 2000


Page 1:

DSP Lecture Series

DSP Memory Architecture

Dr. E.W. Hu

Nov. 28, 2000

Page 2:

Computer Architecture and VLSI Technology Pioneer: Lynn Conway

In the mid-1960s, while working at IBM, Lynn Conway conceived the idea of multi-issue processors, a forerunner of today's VLIW processors.

Page 3:

Fixed-point DSP datapath

Page 4:

What is memory architecture

The organization of memory and its interconnection with the processor's datapath are collectively called the memory architecture.

Memory architecture determines the memory bandwidth, which is a critical factor affecting the performance of a DSP.

Page 5:

Memory bandwidth

In general, bandwidth is defined as the rate at which words can be written to (stored in) or read from memory.

For a DSP, it is convenient to think in terms of how many instruction cycles are needed to complete a read or write operation. All else being equal, the fewer the instruction cycles per access, the higher the bandwidth.

Page 6:

Why DSP applications need large memory bandwidth

A high performance datapath is only part of a high-performance processor.

DSP applications are typically computation-intensive, requiring large amounts of data to be moved quickly between the datapath(s) and the memory module(s), as described in the next slide.

Page 7:

Typical DSP applications: the FIR or finite impulse response filter

Page 8:

At each “tap”, four memory accesses are needed for the FIR application:

- Fetch the MAC instruction from memory
- Read the data value from memory (a ‘sample’ from the signal)
- Read the appropriate coefficient from memory (a known constant for a particular filter)
- Write the data value to memory (the next location in the delay line)

Page 9:

The Von Neumann architecture for general-purpose processors

Page 10:

The Harvard architecture: design basis for most DSPs; two or more memory accesses per cycle

Page 11:

Variations of the Harvard architecture allow still more memory accesses per instruction cycle

Page 12:

Typical DSPs with two or three independent memory banks

Analog Devices ADSP-21xx

AT&T DSP 16xx

Zilog Z893xx

Motorola DSP5600x, DSP563xx, DSP96002

Page 13:

Other approaches to achieve multiple accesses to memories per cycle

Examples of some other approaches:

- Multiple, sequential accesses per instruction cycle over a single set of buses (meaning each access takes less than one instruction cycle), e.g., Zoran ZR3800.
- Multi-ported memories that allow multiple concurrent memory accesses over two or more independent sets of buses (Fig. 5.4), e.g., AT&T DSP32xx.
- Allowing a read and a write operation to proceed at the same time under restricted circumstances, e.g., AT&T DSP16xx.

Page 14:

Using cache memory to reduce memory accesses

On-chip program cache reduces memory accesses

There are many different implementations of program caches:

- Single-instruction repeat buffer
- Multiple-instruction cache (e.g., stores a block of 16 instructions)
- Single-sector instruction cache that stores some number of the most recently used instructions

Page 15:

Using “modulo addressing” technique to reduce memory accesses

To be discussed in the next seminar: memory addressing modes

Page 16:

Using “algorithmic approaches” to reduce memory accesses

Algorithms are used to exploit data locality to reduce memory accesses.

DSP algorithms that operate on blocks of input data often fetch the same data from memory multiple times during execution, as in the case of FIR filter computation.

In the example that follows, the filter operates on a block of two input samples. Instead of computing output samples one at a time, the filter instead computes two output samples at a time, allowing it to reuse previously fetched data. This effectively reduces the memory bandwidth required from one instruction fetch and two data fetches to one instruction fetch and one data fetch per instruction cycle.

Page 17:

Illustration of algorithmic approach

Page 18:

Memory wait states

Wait states are states in which the processor cannot execute its program because it is waiting for access to memory, due to, for example:

- Slow memory
- Bus sharing

Page 19:

On-chip ROM for low-cost embedded applications

On-chip ROM (usually small, 256 to 36K words) is used to store small application programs and constant data for low-cost embedded applications.

Page 20:

External memory interfaces

Page 21:

External memory interfaces: manual caching

If a section of often-used program code is stored in slow, off-chip memory, it is the programmer's responsibility to move the code to faster on-chip RAM, either at system start-up or when that section of the program is needed.

Page 22:

Dynamic memory

Most DSPs use static RAM, which is faster and easier to interface than dynamic RAM, but more expensive.

For a low-cost, high-volume product, the designer might need to consider dynamic RAM, especially static-column DRAM.

Page 23:

Direct memory access

DMA allows data transfer to take place (to/from processor’s memory) without the involvement of the processor itself.

It is typically used to improve the performance of I/O transfers.

Page 24:

Customization

Some vendors are flexible enough to customize their chip designs for customers (customizable DSPs).