
Avnet SpeedWay Workshops

1

Accelerating Your Success™

V10_1_2_0

Avnet SpeedWay Design Workshop™

Creating FPGA-based Co-Processors for DSPs Using Model Based Design Techniques

Lecture 5: Creating a Stand-alone Video System


Model-Based Design Flow

1. Develop Executable Spec in Simulink
2. Partition Between DSP and FPGA Co-Processor
3. Design Exploration for Targeting Hardware
4. Verify Hardware in HW Co-simulation
5. Implement Stand-Alone Video System

The final design phase, after verification in simulation, is implementation as a stand-alone system composed of a DSP and an FPGA co-processor.


The Problem We Wish to Solve

Maintaining a complex system involving a DSP and an FPGA co-processor can be tedious and error-prone.

MathWorks model-based design bridges the TI DSP and Xilinx FPGA design flows with automatic code generation, removing the grunt work of manually maintaining the API, including memory maps, function headers and C-code device drivers in Code Composer Studio.

The final system, with the FPGA co-processor offloading work from the DSP, offers better performance than running the whole algorithm in DSP software.


Agenda

• Interfacing the DSP and FPGA Co-Processor

• Avnet Spartan-3A DSP DaVinci Platform with PS Video EXP Module

• Model-Based Infrastructure for Stand-Alone Implementation


[Figure: Design flow from MATLAB® and Simulink® algorithm and system design to the Avnet Spartan-3A DSP FPGA / DaVinci Platform. On the TI DSP side, Real-Time Workshop Embedded Coder (Targets, Links) generates C/ASM for Code Composer, verified with Link for CCS; on the Xilinx FPGA side, Xilinx System Generator for DSP generates HDL for ISE, verified with hardware co-simulation and ChipScope. A video source feeds the platform, which drives an LCD panel.]

Design Flow for Stand-Alone Implementation…

< mouse click >

We begin by examining the connectivity for data transfer between the DM6437 and the FPGA co-processor.

< mouse click >

We continue with automatic code generation of executables for both the DSP and the FPGA, using the Avnet board support package for Simulink for the Avnet Spartan-3A DSP DaVinci Development Kit.

< mouse click >

We conclude with in-system verification techniques for the combined DSP and FPGA co-processor system.

Note that video now flows into the system from a live source, rather than as video frames generated by a Simulink testbench for verification using hardware co-simulation.


[Figure: Simulink video stabilization model (algorithm and system design). Motion estimation by Sum-of-Absolute-Differences (SAD) location estimation produces the relative motion vector from frame to frame, along with an updated template and updated ROI; Image Translate applies the correction.]

Model Partition DSP / FPGA

Recall the steps of labs 3 and 4, where a Simulink model was partitioned between the DSP and the FPGA…

Moving to a stand-alone implementation, we must now bridge the DSP software and the FPGA co-processor hardware.


[Figure: DSP core and FPGA co-processor, with the link between them marked "?"]

• Requires hardware interface and communication protocol
• Managing asynchronous clock domains
• Software API to communicate with hardware

Bridging Software to Hardware

Bridging software (DSP) to hardware (FPGA Co-Processor) requires:

• hardware interface and communication protocol
• managing asynchronous clock domains
• software API to communicate with hardware

< mouse click >

How can this be implemented?

Let's examine these aspects in detail, especially as they relate to exchanging streaming data such as video between the FPGA co-processor and the DSP.


[Figure: DSP core connected to the co-processor over EMIF, with data and control sharing the bus]

• Data and control on common bus (EMIF)
  – Forces burst transfers over a time-shared bus
  – Inefficient for streaming data (e.g. video)
  – Requires inserted syncs, framing in DSP software, handshaking

Bridging Software to Hardware / EMIF

Bridging software on the DSP side to the hardware co-processor requires, first and foremost, a hardware interface and communication protocol. One possibility is EMIF (External Memory Interface), which groups address, data and control signals for interfacing to external devices. EMIF comes in a variety of sizes across different TI processor families, from synchronous 32-bit data on the DM642 to asynchronous 8-bit data on the DM6437.

It is convenient to differentiate between control data and streaming data. Control data is often bursty and not time-critical, while streaming data is constant and requires a fixed bandwidth. Exchanging streaming data such as video between the DSP and the FPGA co-processor over a shared bus such as EMIF requires time-multiplexed burst transactions to accommodate other devices' access to the bus, and control data must be inserted between the streaming data bursts. Furthermore, exchanging video over a bus such as EMIF would necessitate inserted syncs, framing in DSP software, and asynchronous FIFOs in the FPGA. For these reasons, EMIF is not the best choice of interface between the DSP and the FPGA co-processor.


[Figure: DSP core connected to the co-processor with video over the VPFE and VPBE interfaces and control over VLYNQ to the VLYNQ LogiCORE]

• Separate data and control
  – Streaming full-duplex video over dedicated Video Processing Subsystem of DM6437
  – Control over VLYNQ
  – Simple, fast, efficient

Bridging Software to Hardware / VLYNQ

A simple and efficient approach is to transport streaming data over dedicated ports of the Video Processing Subsystem, while control data flows through a separate, non-time-critical link. This offers simple, fast, uninterrupted bi-directional streaming video between the DSP and the FPGA co-processor.

Let's examine the resources on DM6437 to implement separate video and control interfaces to the FPGA co-processor.

------------------------------------------------------

Why does video flow through the FPGA and not directly to the DSP?

… because the board is built to pipe video through the FPGA to and from the DSP.

------------------------------------------------------

Note: Although not officially supported, TI has done some work to allow general-purpose data, not just video, to flow into the VPFE and out of the VPBE ports.

Contact Bernie Thompson at TI.


[Figure: DSP core connected over VLYNQ (control) to the VLYNQ LogiCORE in the co-processor, with video on the VPFE interface]

VLYNQ

• Xilinx and TI collaborating for seamless interconnection between DSPs and FPGAs
• Low pin count, low cost, scalable bandwidth
• DaVinci has on-chip VLYNQ peripheral
• Xilinx VLYNQ LogiCORE™ IP delivered through Xilinx CORE Generator

VLYNQ is a serial (i.e. low pin count) communications interface that extends an internal bus segment to one or more external physical devices (e.g. an FPGA). It does so by serializing bus transactions in one device, transferring the serialized transaction between devices via a VLYNQ port, and de-serializing the transaction in the external device.

The VLYNQ peripheral is offered in DaVinci (DM644x and DM643x devices), Jacinto, Avalanche, Puma, Sangam, Titan, APEX and other TI communication processors.

Xilinx has licensed VLYNQ, which creates a great opportunity to connect FPGAs to TI DSPs, in addition to EMIF (External Memory Interface) and Serial RapidIO (SRIO).
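To make the serialize / transfer / de-serialize sequence concrete, here is a minimal C sketch that flattens a bus write (one address plus a burst of 32-bit words) into a byte stream and rebuilds it on the receiving side. The `struct bus_write` layout, the little-endian byte order and the 1-byte word count are assumptions for illustration only; the real VLYNQ packet format (command fields, 8b/10b symbols, end-of-packet) is defined in TI's VLYNQ Port User's Guide and is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BURST 16

/* Hypothetical bus write transaction: one address plus a burst of words. */
struct bus_write {
    uint32_t addr;
    uint8_t  count;               /* 1..MAX_BURST words */
    uint32_t data[MAX_BURST];
};

static void put_u32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; ++i)
        p[i] = (uint8_t)(v >> (8 * i));   /* little-endian byte order */
}

static uint32_t get_u32(const uint8_t *p)
{
    uint32_t v = 0;
    for (int i = 0; i < 4; ++i)
        v |= (uint32_t)p[i] << (8 * i);
    return v;
}

/* Serialize into `out` (needs 5 + 4 * count bytes); returns bytes written. */
static size_t serialize_write(const struct bus_write *w, uint8_t *out)
{
    assert(w->count >= 1 && w->count <= MAX_BURST);
    out[0] = w->count;
    put_u32(out + 1, w->addr);
    for (uint8_t i = 0; i < w->count; ++i)
        put_u32(out + 5 + 4 * (size_t)i, w->data[i]);
    return 5 + 4 * (size_t)w->count;
}

/* De-serialize on the receiving side; returns bytes consumed. */
static size_t deserialize_write(const uint8_t *in, struct bus_write *w)
{
    w->count = in[0];
    assert(w->count >= 1 && w->count <= MAX_BURST);
    w->addr = get_u32(in + 1);
    for (uint8_t i = 0; i < w->count; ++i)
        w->data[i] = get_u32(in + 5 + 4 * (size_t)i);
    return 5 + 4 * (size_t)w->count;
}
```

A 2-word burst becomes 13 bytes on the wire and comes back out unchanged; the receiving side only needs the count to know where the transaction ends.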


[Figure: VLYNQ link to a Xilinx FPGA with CLOCK, optional CLK REQ, TRANSMIT and RECEIVE lines; packet fields: CMD 1 (10 bits), Byte Count (10 bits), Address (< 4 × 10 bits), End of Packet EOP (10 bits), Data (N × 10 bits), CMD 2 (10 bits), Address Mask (4 bits), Packet Type (4 bits).]

• Scalable to meet bandwidth requirements (3 pins to 10 pins)
• Single-ended, unidirectional I/O
• 8b/10b encoding, in-band signaling
• Memory mapped, master and slave on a single bus
• Software transparent for future device integration
• High-speed, low pin-count, full-duplex, peer-to-peer serial interface
• Extension of an internal bus segment to one or more external devices
• Point-to-point serial interface for other VLYNQ-compatible devices
• External devices are mapped to local physical address space and appear as if they are on the internal bus of the local device

VLYNQ

The Avnet Spartan-3A DSP DaVinci board uses all 4 data pairs (transmit and receive). Individual pins can be used as GPIO if a lower-bandwidth VLYNQ interface is desired or if VLYNQ is not used.


Maximum Effective Throughput, with 99 MHz Clock (100 MHz max clock supported). All benchmarks use 4 VLYNQ transmit/receive pairs.

Burst Size (32-bit words)   Throughput (Mbits/sec)   Throughput (Mbytes/sec)
1                           126.72                   15.84
4                           220.37                   27.55
8                           259.93                   32.49
16                          285.56                   35.7

• 8b/10b coding causes 20% overhead: only 8 bits of data are contained in every 10 bits sent
• Total overhead = protocol overhead + 8b/10b overhead
• Theoretical maximum throughput = 4 data lines × 100 MHz max clock = 400 Mbits/sec (50 Mbytes/sec), before 8b/10b overhead

VLYNQ Performance

The max write rate is the maximum available data rate of the serial interface for transmission, taking the 8b/10b encoding overhead into account. It is calculated as follows:

Max write rate (Mbps) = VLYNQ serial clock (MHz) × number of pins × 8b/10b encoding efficiency (0.8)

The 8b/10b encoding accounts for a 20% overhead, so the effective data throughput is 0.8 times the raw line rate. For example, with the VLYNQ clock running at 99 MHz on a 4-pin-per-direction interface, the raw line rate is 99 × 4 = 396 Mbps. After removing the 8b/10b overhead, the maximum write rate is 396 × 0.8 = 316.8 Mbps.
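The max write rate formula fits in a few lines of C. This sketch (the function name is hypothetical) reproduces the 99 MHz, 4-pin example and, like the formula, ignores packet (protocol) overhead.

```c
#include <assert.h>

/* 8b/10b encoding carries 8 data bits in every 10 line bits. */
#define VLYNQ_8B10B_EFFICIENCY 0.8

/* Maximum write rate in one direction, in Mbit/s:
 * serial clock (MHz) x pins per direction x 8b/10b efficiency.
 * Packet (protocol) overhead is deliberately not included. */
static double vlynq_max_write_rate_mbps(double clock_mhz, int pins_per_direction)
{
    assert(clock_mhz > 0.0 && pins_per_direction > 0);
    return clock_mhz * pins_per_direction * VLYNQ_8B10B_EFFICIENCY;
}
```

`vlynq_max_write_rate_mbps(99.0, 4)` gives 316.8 Mbit/s. At the 100 MHz maximum clock the same interface reaches about 320 Mbit/s (40 Mbytes/sec), which is the 50 Mbytes/sec raw figure less the 20% encoding overhead.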

The total throughput on the VLYNQ interface includes both transmit and receive directions. Because a remote device can simultaneously write to the local device at the same rate, the total throughput for the above configuration is the sum of the transmit and receive rates, or 633.6 Mbps. In addition to the 8b/10b encoding, the packet structure for read/write operations adds further overhead. The VLYNQ module can transfer single 32-bit words or bursts of up to sixteen 32-bit words.
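Packet overhead is why the benchmark table improves with burst size. Each packet carries its own command, address and end-of-packet fields, so longer bursts amortize them. The sketch below counts the packets needed for a transfer and computes the fraction of the post-8b/10b ceiling that each measured value reaches. It assumes the table values are one-direction write throughput at 99 MHz on 4 pins, which the slide does not state explicitly. Under that assumption, single-word transfers reach about 40% of the 316.8 Mbps ceiling and 16-word bursts reach about 90%.

```c
#include <assert.h>
#include <stddef.h>

#define VLYNQ_MAX_BURST_WORDS 16u   /* single word up to sixteen 32-bit words */

/* Packets needed to move `words` 32-bit words with bursts of at most
 * `burst` words (ceiling division). Hypothetical helper. */
static size_t vlynq_packets_needed(size_t words, size_t burst)
{
    assert(burst >= 1 && burst <= VLYNQ_MAX_BURST_WORDS);
    return (words + burst - 1) / burst;
}

/* Measured throughput from the benchmark table (99 MHz clock, 4 pairs). */
struct burst_benchmark {
    unsigned burst_words;
    double   mbps;
};

static const struct burst_benchmark vlynq_benchmarks[] = {
    { 1u, 126.72 }, { 4u, 220.37 }, { 8u, 259.93 }, { 16u, 285.56 },
};

/* Fraction of the 8b/10b-limited rate that remains after packet overhead. */
static double protocol_efficiency(double measured_mbps, double clock_mhz, int pins)
{
    double ceiling = clock_mhz * pins * 0.8;
    assert(ceiling > 0.0);
    return measured_mbps / ceiling;
}
```

With 16-word bursts, a 100-word transfer needs 7 packets. Single-word transfers would need 100 packets, each paying the full packet overhead.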

The data and throughput calculations shown here are sample calculations for ideal conditions. In practice, data rates depend on a variety of other factors, such as the efficiency of read/write burst transactions, the ability to buffer read/write data and shift it out serially without stalling subsequent bursts, and the behavior of remote and local components, both external and internal (device operations, board considerations, etc.).

References:

TMS320DM643x DMP VLYNQ Port User's Guide / TI Literature: spru938b.pdf (Appendix B)


[Figure: DSP core and video processing subsystem with VLYNQ; local map regions in the DSP address space are forwarded over VLYNQ and address-decoded to peripherals in the remote device]

Map region (local)   Local address range      Remote peripheral   Remote address range
Map Region 1         0400:0000 – 07FF:FFFF    Peripheral A        0000:0000 – 03FF:FFFF
Map Region 2         0800:0000 – 0800:00FF    Peripheral B        0400:0000 – 0400:00FF
Map Region 3         0800:0100 – 0801:00FF    Peripheral C        0500:0000 – 0500:FFFF
Map Region 4         0801:0100 – 0841:00FF    Peripheral D        0B00:0000 – 0B3F:FFFF

VLYNQ Remote Memory Mapping

• Remote VLYNQ devices memory mapped to the local (DSP host) device’s address space
• Finer memory-decoding can target smaller address ranges within the FPGA co-processor

Remote VLYNQ device(s) are memory mapped to the local (host) device’s address space when a link is established, and appear as if they are on the internal bus, like any other on-chip peripheral. Enumerating the VLYNQ devices (single or multiple) into a coherent memory map for accessing each device is part of the initialization sequence.

After enumeration, the host (local) device can access the remote device address map using local device addresses. The VLYNQ module in the host device manages the translation of local addresses to remote addresses. A remote VLYNQ device is mapped into the local device’s address space via the address map registers (TX address map, RX address map size n, RX address map offset n, where n = 1 to 4). The transmit side has a contiguous map whose size is the same as the remote device map.
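The translation can be modeled in a few lines of C. This sketch uses a simplified view of the registers. The local window starts at a TX base address (the role of the TX address map register). The regions defined by RX address map size n / offset n are assumed to sit back to back in that window, in order. The example values come from the memory-mapping figure, which pairs Map Regions 1 to 4 with Peripherals A to D; the sizes match. They are illustrative only, not register settings taken from the Avnet BSP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One receive-side map region: size in bytes and remote offset. */
struct rx_region {
    uint32_t size;
    uint32_t offset;
};

/* Translate a local (DSP) address in the VLYNQ window into a remote address.
 * tx_base plays the role of the TX address map register; regions are laid
 * out contiguously in the window, in order (simplifying assumption). */
static bool vlynq_translate(uint32_t local_addr, uint32_t tx_base,
                            const struct rx_region *regions, size_t count,
                            uint32_t *remote_addr)
{
    assert(regions != NULL && remote_addr != NULL);
    if (local_addr < tx_base)
        return false;
    uint32_t rel = local_addr - tx_base;   /* strip the transmit offset */
    for (size_t i = 0; i < count; ++i) {
        if (rel < regions[i].size) {
            *remote_addr = regions[i].offset + rel;
            return true;
        }
        rel -= regions[i].size;            /* move on to the next region */
    }
    return false;                          /* beyond the mapped window */
}

/* Example values from the figure (Map Regions 1-4 -> Peripherals A-D). */
#define EXAMPLE_TX_BASE 0x04000000u
static const struct rx_region example_regions[] = {
    { 0x04000000u, 0x00000000u },  /* Region 1 -> Peripheral A */
    { 0x00000100u, 0x04000000u },  /* Region 2 -> Peripheral B */
    { 0x00010000u, 0x05000000u },  /* Region 3 -> Peripheral C */
    { 0x00400000u, 0x0B000000u },  /* Region 4 -> Peripheral D */
};
```

For example, local address 0800:0010 lies 0x10 bytes into Map Region 2, so it reaches remote address 0400:0010 in Peripheral B. An address past 0841:00FF falls outside every region and is rejected.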

The figure illustrates this mapping.

This capability makes VLYNQ ideal for memory-mapping FPGA-based peripherals. For clarity, only 4 peripherals are shown above; finer memory-decoding can target any number of smaller address ranges to communicate with registers within the FPGA co-processor. The Avnet VLYNQ block allows memory-mapped address spaces down to single-register level using System Generator shared memory registers.

---------------------------------------------------------------------------------------------------------------------

Reference:

In the local device, the address of the VLYNQ remote memory map in the local configuration space is the transmit address accessing remote devices over the serial interface. The address of the VLYNQ remote memory map is programmed in the TX address map register (XAM). When the local device transmits, first it strips off the transmit address offset in the local device memory map. Then the local device sends the


Agenda

• Interfacing the DSP and FPGA Co-Processor

• Avnet Spartan-3A DSP DaVinci Platform with PS Video EXP Module

• Model-Based Infrastructure for Stand-Alone Implementation


Integration of 3 Pieces of Avnet Hardware

PS Video EXP + Spartan-3A DSP DaVinci Evaluation Kit + 6.5” NEC LCD panel

6.5” NEC panel is targeted for $500 resale, but we do not have an established price yet.


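[Figure: Block diagram of the Avnet Spartan-3A DSP DaVinci Evaluation Kit. The DaVinci DM6437 and the Spartan-3A DSP 3SD1800A are linked by VPBE, VPFE, 8-bit EMIF and VLYNQ; board resources include DDR2 for each device, parallel and SPI flash, 10/100 and 10/100/1G PHYs, RS232, USB, I2C, McBSP1, image sensor interface, EXP connector, component video out, audio CODEC, clocks, JTAG, switches and LEDs.]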

Avnet Spartan-3A DSP DaVinci Evaluation Kit

The Avnet Spartan-3A DSP DaVinci Evaluation Platform combines the new Xilinx Spartan-3A DSP FPGA and the TI DaVinci TMS320DM6437 Digital Media Processor on the same baseboard. It is optimized for video applications such as surveillance, automotive and machine vision.

The DM6437 connects to the Spartan-3A DSP over several interfaces: VLYNQ, EMIFA, VPBE and VPFE.

Features:

• Xilinx 3SD1800A-FG676 FPGA
• Programmable LVDS Clock Generator
• On-board 27 MHz LVTTL Oscillator
• On-board LVTTL Oscillator Socket
• 16M x 32-bit DDR2 SDRAM
• 256K x 36-bit ZBT SRAM
• EXP Expansion Slot
• 10/100 PHY
• 64Mb x 2 SPI Configuration Flash
• JTAG Programming/Configuration Port
• RS232 Port
• Two User LEDs
• A 4-position User DIP Switch
• Three User Push Button Switches
• Audio CODEC shared with DM6437

TI DaVinci DSP Processor

• TMS320DM6437 Digital Media Processor
• 128 MB 166 MHz DDR2 SDRAM
• 64 Mb serial SPI Flash program code storage
• 10/100 PHY
• VGA Out
• Audio CODEC shared with FPGA


FPGA

• Xilinx XC3SD1800A-4FG676C FPGA
• Clocks
  – Programmable LVDS clock generator
  – On-board 27 MHz LVTTL oscillator
  – On-board LVTTL oscillator socket
• Memory
  – 128M x 32-bit DDR2 SDRAM
  – 16M x 8 parallel / BPI configuration Flash
  – 64Mb SPI configuration/storage Flash
• Interfaces
  – 10/100/1000 PHY
  – JTAG programming/configuration port
  – RS-232 serial port
  – Image Sensor Interface
  – 2 EXP expansion connectors
• Buttons and switches
  – 4 LEDs
  – Eight 4-position DIP switch
  – 4 push-button switches

DSP

• TI TMS320DM6437 DaVinci Processor
• Memory
  – 128 MB 166 MHz DDR2 SDRAM
  – 128 Mb parallel Flash program code storage
  – 64 Mb serial SPI Flash program code storage
• Interfaces
  – 10/100 Ethernet Port
  – Component and composite video out
  – Audio CODEC shared with FPGA
  – USB
• Buttons and switches
  – 4 User LEDs

Avnet Spartan-3A DSP DaVinci Evaluation Kit


• High-Definition Video Decoder – Texas Instruments TVP7001 (RGB, Component)
• Standard-Definition Video Decoder – Texas Instruments TVP5150 (Composite, S-Video)
• DVI Transmitter – TFP410
• DVI Receiver – AD9887A
• Analog Devices ADV7123 RGB DAC
• Parallel RGB and LVDS interfaces to Flat Panel Displays
• Stereo Audio CODECs

Avnet Pro-Sumer Video EXP Module

The Avnet EXP ProSumer Video (EXP PS Video) Module is a plug-in module designed to interface with compatible Avnet baseboards, including the Avnet Spartan-3A DSP DaVinci Evaluation Platform. The EXP PS Video Module provides a number of video and audio interfaces to its host via two EXP connectors.


• NEC XGA LCD flat panel display (NL10276BC13-01C)
• Super-Transmissive Natural Light TFT
• 1024 x 768 Resolution
• 6.5 inches Diagonal
• 16.77M colors
• LVDS Interface
• LED Backlight

NEC TFT Display


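[Figure: LVDS flat panel path on the Avnet Spartan-3A DSP / DaVinci Evaluation Kit. VPBE output passes through a 2X scaler as 24-bit RGB to the flat panel controller at 62.5 MPixels/sec, which drives the 1024 x 768 XGA flat panel display at 62.5 × 7 = 437.5 Mbps per LVDS data pair.]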

LVDS Flat Panel Controller

Avnet provides a controller for LVDS flat panel displays. It is provided at no extra cost to customers who purchase the PS Video EXP module.

RGB + syncs digital video arrives at the flat panel controller at 62.5 MPixels / sec.

The outputs of the LVDS flat panel controller comprise 5 LVDS transmit pairs:
• a forwarded clock on the LCD_FTXC pair, at 1/7th the bit rate with a 4:3 duty cycle
• 4 data lines, LCD_FTX[3:0], each of which carries a 7:1 serialized bit stream
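The LVDS numbers follow directly from 7:1 serialization. This small sketch (the function names are hypothetical) derives three values from the 62.5 MPixels/sec pixel rate: the per-pair bit rate, the forwarded clock, and the number of bits delivered per pixel clock across the 4 data pairs. That last value is 28 bits, which leaves 4 bits beyond 24-bit RGB for syncs.

```c
#include <assert.h>

/* 7:1 LVDS serialization: each data pair carries 7 bits per pixel clock. */
#define LVDS_BITS_PER_CLOCK 7

/* Bit rate on one data pair, in Mbit/s, for a given pixel clock in MHz. */
static double lvds_lane_rate_mbps(double pixel_clock_mhz)
{
    assert(pixel_clock_mhz > 0.0);
    return pixel_clock_mhz * LVDS_BITS_PER_CLOCK;
}

/* The forwarded clock (LCD_FTXC) runs at 1/7th of the per-pair bit rate. */
static double lvds_forwarded_clock_mhz(double lane_rate_mbps)
{
    return lane_rate_mbps / LVDS_BITS_PER_CLOCK;
}

/* Bits delivered per pixel clock across all data pairs. */
static int lvds_bits_per_pixel_clock(int data_pairs)
{
    return data_pairs * LVDS_BITS_PER_CLOCK;
}
```

`lvds_lane_rate_mbps(62.5)` returns the 437.5 Mbps shown on the slide, and `lvds_forwarded_clock_mhz(437.5)` recovers the 62.5 MHz pixel clock.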

These 5 LVDS transmit pairs originate in the baseboard FPGA and are routed up through the EXP connector to J6 on the Avnet EXP PS Video module.

J6 is a JAE FI-X30S-HF connector that accepts a cable assembly to drive a NEC 6.5” XGA TFT-LCD module.


Agenda

• Interfacing the DSP and FPGA Co-Processor

• Avnet Spartan-3A DSP DaVinci Platform with PS Video EXP Module

• Model-Based Infrastructure for Stand-Alone Implementation


Avnet Board Support Package for Simulink

[Figure: Avnet board support package blocksets for the DM6437 and the Spartan-3A DSP]

Here is an overview of the Avnet board support package for Simulink for the Spartan-3A DSP DaVinci Development Kit. It is subdivided into 3 blocksets.

On the left are Simulink blocks that map to physical peripheral devices within the DM6437, such as UART, CAN and the Video-Processing subsystem.

On the right are blocks that are implemented in the Spartan3A-DSP. These blocks are used in the System Generator portion of the Simulink model.

At the bottom are Simulink blocks that map to the DSP, but which communicate with FPGA functions, or physical board-level circuitry via the FPGA, such as LEDs.

The Avnet board support package for Simulink is the result of collaborative work between Avnet and The MathWorks.


• Library of Simulink blocks supporting features of DM6437 on Avnet Spartan-3A DSP DaVinci Evaluation Kit

• Exposes parameters of each peripheral
• Generates API to DSP/BIOS drivers

Avnet Board Support Library for Simulink

Overview of Simulink blocks in BSP to support DM6437. Note the extensive list of parameters offered for each peripheral.


[Figure: DSP core connected to the co-processor with video over the VPFE and VPBE interfaces and control over VLYNQ to the VLYNQ LogiCORE]

Avnet Board Support Package / VPSS

• VPSS blocks used by automatic code-generation to call DSP/BIOS driver APIs

How is the VPSS connectivity accomplished?

This is accomplished with the Avnet BSP for Simulink, developed in collaboration with The MathWorks. For code generation, the VPFE and VPBE blocks are used by RTW Embedded Coder to call the DSP/BIOS driver API.


[Figure: DSP core connected over VLYNQ (control) to the VLYNQ LogiCORE in the co-processor, with video on the VPFE interface]

Avnet Board Support Library / VLYNQ

• VLYNQ block used by automatic code-generation to call VLYNQ DSP/BIOS driver API

How is the VLYNQ connectivity accomplished on the DSP side?

This is accomplished with the Avnet BSP for Simulink, developed in collaboration with The MathWorks. For code generation, the VLYNQ block is used by RTW Embedded Coder to call the DSP/BIOS driver API.

(Recall the directory structure of the Avnet BSP from lecture 4.)


[Figure: Model-based design flow (MATLAB® and Simulink®, Real-Time Workshop Embedded Coder and Code Composer for the DSP, Xilinx System Generator for DSP and ISE for the FPGA). The FPGA memory map of shared REG, FIFO and RAM elements is exported via MATLAB to create memory-mapped IO in the DaVinci processor, connected to the co-processor over VLYNQ.]

Passing FPGA Memory Map via MATLAB…

Shared memories in the System Generator model destined for the FPGA co-processor are associated with the DM6437 through the ‘DaVinci Processor’ VLYNQ Interface block’s GUI in System Generator. After an association is made, System Generator automatically generates a memory map of all shared memory in the model.

<mouse click>

During code generation, the memory map is exported to Code Composer Studio via the MATLAB workspace. This creates memory-mapped IO in the DM6437 that communicates with the corresponding registers, FIFOs and RAM elements in the FPGA co-processor over VLYNQ.

<mouse click>

On the FPGA side, System Generator project integration with ISE carries memory mapping information to the VLYNQ IP in ISE, where the final bitstream is created.

<mouse click>

The result is an association between memory-mapped IO space in the DM6437 and registers, FIFOs and RAM memory elements in the FPGA co-processor, which appear to the DM6437 as local memory space through VLYNQ.

Push-button automatic code generation removes all the grunt work of manually maintaining the API, including memory maps, function headers and C-code device drivers in Code Composer Studio.
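To show what a generated memory map gives the DSP programmer, here is a hedged sketch of a header-style memory map plus volatile accessors. All names and offsets (`FPGA_REG_CTRL_OFFSET` and the others) are hypothetical; they are not output from the Avnet BSP or Embedded Coder. On the target, `base` would point at the DSP address where the VLYNQ window is mapped, which this lecture does not specify. Passing `base` as a parameter lets the same code run against an ordinary array for testing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical generated memory map (names and offsets are illustrative only).
 * Offsets are in bytes from the start of the FPGA window seen by the DSP. */
#define FPGA_REG_CTRL_OFFSET   0x0000u   /* shared register */
#define FPGA_FIFO_DATA_OFFSET  0x0100u   /* shared FIFO data port */
#define FPGA_RAM_OFFSET        0x1000u   /* shared RAM */
#define FPGA_RAM_WORDS         256u

/* volatile: every access must reach the bus and never be cached in a CPU register. */
static inline void fpga_write32(volatile uint32_t *base, uint32_t byte_offset,
                                uint32_t value)
{
    assert(byte_offset % sizeof(uint32_t) == 0);   /* 32-bit aligned */
    base[byte_offset / sizeof(uint32_t)] = value;
}

static inline uint32_t fpga_read32(volatile uint32_t *base, uint32_t byte_offset)
{
    assert(byte_offset % sizeof(uint32_t) == 0);
    return base[byte_offset / sizeof(uint32_t)];
}
```

Keeping the offsets in a single header means the DSP code and the FPGA design can both be regenerated from the same model without hand-editing addresses.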


[Figure: DSP design (top windows) and FPGA design (bottom windows), with the memory map communicated via MATLAB]

Implementing DSP to FPGA VLYNQ Interface

Here we show the DM6437 VLYNQ Interface blocks in Simulink connecting the DM6437 (top windows) to the FPGA co-processor in System Generator (bottom windows). Note the memory mapping for a single shared register, passed via the MATLAB workspace.


TC6 Automatic Code Generation for DM6437

• VLYNQ DSP/BIOS driver API created by automatic code-generation from Avnet BSP VLYNQ block

Excerpt of code auto-generated by The MathWorks Embedded Coder for TC6 from the VLYNQ block in the Avnet board support library for Simulink.


• Various FPGA infrastructure on different clock domains
• Multiple Subsystem Generator allows multiple asynchronous clock domains in one System Generator model

Clock Domains in System Generator

Multiple clock domains are handled seamlessly by the Avnet board support package using a powerful feature of System Generator: Multiple Subsystem Generator.

This example shows the VLYNQ interface to the DSP on one clock domain, the VPFE for incoming video on another clock domain, and the VPBE for video display on a third clock domain.

Note that the top-level FPGA design is finalized in ISE after project export from System Generator.


Avnet Board Support Package / Demos

• Suite of demos integrated into board support package
• FPGA-based co-processors using model based design

A comprehensive suite of demos is integrated into the Simulink board support package for the Avnet Spartan-3A DSP FPGA DaVinci Development Kit. Demos cover these aspects of creation of FPGA-based co-processors using model based design:

• LCD Demo: generates an image on the LCD panel of the Avnet Spartan-3A DSP FPGA DaVinci Development Kit
• Resizer demo: demonstrates two methods for resizing an image
• NTSC to LCD passthrough: demonstrates how to implement an NTSC to LCD passthrough
• SVGA to LCD passthrough: demonstrates how to implement an SVGA to LCD passthrough
• Video surveillance recording: demonstrates a video surveillance recording application with a motion-detection algorithm on the DM6437 DSP
• LED Demo: using a very simple example, a model-based design is gradually targeted to DSP and FPGA hardware.


Avnet Design Resource Center

• Download Board Support Package for Simulink from DRC


Stand-Alone Video Stabilization System

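[Figure: Stand-alone video stabilization system on the Avnet Xilinx Spartan-3A DSP DaVinci Evaluation Platform. NTSC video from the video source passes through the FPGA's VPFE scaler and VPFE interface to the DM6437 (with DDR2), which runs Image Translate. Template and ROI go over VLYNQ to the SAD engine behind the VLYNQ LogiCORE, which returns the best-match row, column. Output video leaves over the VPBE interface through a 2X scaler as 24-bit RGB to the flat panel controller, driving a 1024 x 768, 60 Hz XGA flat panel.]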

Block diagram of the stand-alone video stabilization system built in lab 5. The architecture of the Avnet Spartan-3A DSP DaVinci board routes video data through the FPGA to the DM6437 over the dedicated VPFE video port.

Template and ROI data are sent to the FPGA every frame for a SAD search of the template within the region of interest (ROI). The best-match result of the SAD search is sent back to the DM6437 over VLYNQ.
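As a behavioral reference for the SAD search performed in the FPGA, here is a minimal exhaustive-search sketch in plain C. It is not the System Generator implementation. The 8-bit luma format, the row-major layout and the tie-breaking rule (the first minimum found wins) are assumptions. The function returns the best-match row and column inside the ROI, which corresponds to the result sent back to the DM6437.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Best match: top-left position of the template inside the ROI, plus score. */
struct sad_match {
    int row;
    int col;
    unsigned long sad;
};

/* Exhaustive SAD search of a tw x th template over a rw x rh ROI.
 * Both images are 8-bit, row-major; requires tw <= rw and th <= rh. */
static struct sad_match sad_search(const uint8_t *roi, int rw, int rh,
                                   const uint8_t *tmpl, int tw, int th)
{
    struct sad_match best = { 0, 0, ULONG_MAX };
    assert(tw <= rw && th <= rh);
    for (int r = 0; r + th <= rh; ++r) {
        for (int c = 0; c + tw <= rw; ++c) {
            unsigned long sad = 0;
            for (int y = 0; y < th; ++y) {
                for (int x = 0; x < tw; ++x) {
                    int d = (int)roi[(r + y) * rw + (c + x)] - (int)tmpl[y * tw + x];
                    sad += (unsigned long)(d < 0 ? -d : d);
                }
            }
            if (sad < best.sad) {          /* strict: keep the first minimum */
                best.row = r;
                best.col = c;
                best.sad = sad;
            }
        }
    }
    return best;
}
```

The cost grows with (ROI positions) × (template pixels), and that inner loop is what makes the search a good candidate for parallel FPGA logic rather than DSP software.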

The motion vector is used as the offset for image translation, stabilizing the video from frame to frame. The video output is sent over the VPBE to the FPGA for display on the XGA flat panel.
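Applying the motion vector as an image-translation offset can be sketched as follows. Two details are assumptions: the sign convention (shifting by the negative of the measured motion to cancel it) and the constant fill for pixels uncovered by the shift. The lab model may handle borders differently.

```c
#include <assert.h>
#include <stdint.h>

/* Shift a w x h 8-bit image by (dy, dx): out(y, x) = in(y - dy, x - dx).
 * Pixels that would come from outside the frame are set to `fill`.
 * To cancel a measured motion (my, mx), call with dy = -my, dx = -mx. */
static void translate_image(const uint8_t *in, uint8_t *out, int w, int h,
                            int dy, int dx, uint8_t fill)
{
    assert(in != out);   /* not designed to work in place */
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            int sy = y - dy;
            int sx = x - dx;
            out[y * w + x] = (sy >= 0 && sy < h && sx >= 0 && sx < w)
                                 ? in[sy * w + sx]
                                 : fill;
        }
    }
}
```

Shifting a 3 x 3 image right by one column fills the left column with `fill`, and each remaining pixel comes from its left neighbor.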


[Figure: Design flow from MATLAB® and Simulink® algorithm and system design to the Avnet Spartan-3A DSP FPGA / DaVinci Platform. On the TI DSP side, Real-Time Workshop Embedded Coder (Targets, Links) generates C/ASM for Code Composer, verified with Link for CCS; on the Xilinx FPGA side, Xilinx System Generator for DSP generates HDL for ISE, with hardware co-simulation and ChipScope for verification. The DSP and FPGA are connected over VLYNQ, with a live video source in and an LCD panel out.]

Integrating the DSP and FPGA Co-processor…

Preview of lab 5:

< mouse click >

1. Implement connectivity in System Generator for data transfer between the DM6437 and FPGA co-processor over VLYNQ.

< mouse click >

2. Continue with automatic code generation of executables for both the DSP and the FPGA, using the Avnet board support package for Simulink for the Avnet Spartan-3A DSP DaVinci Development Kit.

< mouse click >

3. Conclude with in-system verification techniques for the combined DSP and FPGA co-processor system.

Hardware co-simulation was used for functional verification in lab 4. It is not used for stand-alone implementation, and is shown here for reference only.

Note that video now flows into the system from a live source, rather than as video frames generated by Simulink for hardware co-simulation.


Summary

• Interfacing the DSP and FPGA Co-Processor

• Avnet Spartan-3A DSP DaVinci Platform with PS Video EXP Module

• Model-Based Infrastructure for Stand-Alone Implementation

… proceed to lab 5: Integrating the DSP and FPGA Co-processor


Reference Slides


VLYNQ Data Flow

[Figure: VLYNQ data flow from the DSP core (with video processing subsystem) through the local VLYNQ and the remote VLYNQ to a custom interface in the co-processor]

VLYNQ block diagram.

The VLYNQ Remote Memory Mapping slide showed the mapping between the local (host) device’s address space and the remote address space. This is accomplished via the address translation blocks. A remote VLYNQ device is mapped into the local device’s address space via the address map registers (TX address map, RX address map size n, RX address map offset n, where n = 1 to 4). For clarity, the map registers are not shown on the block diagram above.

The data flow between two VLYNQ devices is shown here, in which the write originates from the DM643x slave configuration bus interface towards the outbound command (CMD) FIFO after address translation. Data is subsequently read from the FIFO and encapsulated in a write request packet. The packet is encoded and serialized before being transmitted to the remote VLYNQ in the FPGA.

The remote device then de-serializes and decodes the received data and writes it into the inbound CMD FIFO. After reading the address and data from the FIFO, it initiates a write operation on the FPGA VLYNQ OPB (On-Chip Peripheral Bus) master interface. The 32-bit OPB interface can connect directly to an embedded processor in the FPGA, or to a custom user interface, as shown.

Finally, address decoding can deliver the data to register(s) of the addressed peripheral.
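Address decoding on the FPGA side can be modeled in two steps. First, find which peripheral window claims an address; then convert the offset within that window into a register index. This sketch reuses the example Peripheral A to D ranges from the memory-mapping figure and assumes 32-bit, word-addressed registers. In the actual design, System Generator produces this decoding as hardware; it does not run as C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Remote-side peripheral window: [base, base + size). */
struct periph_window {
    char     name;
    uint32_t base;
    uint32_t size;
};

/* Example windows from the memory-mapping figure (Peripherals A-D). */
static const struct periph_window periph_map[] = {
    { 'A', 0x00000000u, 0x04000000u },
    { 'B', 0x04000000u, 0x00000100u },
    { 'C', 0x05000000u, 0x00010000u },
    { 'D', 0x0B000000u, 0x00400000u },
};

/* Decode a remote address to a peripheral and a 32-bit register index.
 * Returns the peripheral name, or 0 if no peripheral claims the address. */
static char decode_address(uint32_t addr, uint32_t *reg_index)
{
    assert(reg_index != NULL);
    for (size_t i = 0; i < sizeof periph_map / sizeof periph_map[0]; ++i) {
        const struct periph_window *p = &periph_map[i];
        if (addr >= p->base && addr - p->base < p->size) {
            *reg_index = (addr - p->base) / 4u;   /* word-addressed registers */
            return p->name;
        }
    }
    return 0;
}
```

Address 0400:0010, for instance, decodes to register 4 of Peripheral B, while an address in the gap between windows is not claimed by any peripheral.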

The Xilinx VLYNQ serial interface is not directly coupled to the OPB interface; there are asynchronous FIFOs between the two interface domains, and the interfaces operate independently. However, if the OPB fails to generate sufficient commands and data to consume all the VLYNQ interface’s bandwidth, the VLYNQ interface generates idle packets. If the OPB fails to immediately accept all remotely generated commands and data, the FIFOs fill and the VLYNQ interface turns flow control on.
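The flow-control behavior just described can be captured in a small behavioral model of one command FIFO. An empty outbound FIFO means the link has nothing to send, so it emits idle packets. A full inbound FIFO means the link must turn flow control on. The depth and the structure are illustrative assumptions, not the LogiCORE's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CMD_FIFO_DEPTH 8u   /* illustrative depth, not the core's actual value */

/* Behavioral model of a command FIFO between the serial link and the OPB side. */
struct cmd_fifo {
    uint32_t buf[CMD_FIFO_DEPTH];
    unsigned head;    /* index of the next entry to read */
    unsigned count;   /* number of entries currently stored */
};

/* Receive direction: a full inbound FIFO means the link turns flow control on. */
static bool fifo_flow_control(const struct cmd_fifo *f)
{
    return f->count == CMD_FIFO_DEPTH;
}

/* Producer side: store a word; refused while flow control is on. */
static bool fifo_push(struct cmd_fifo *f, uint32_t word)
{
    if (fifo_flow_control(f))
        return false;
    f->buf[(f->head + f->count) % CMD_FIFO_DEPTH] = word;
    ++f->count;
    return true;
}

/* Consumer side: take the oldest word. On the transmit direction, an empty
 * FIFO (false) means the link has nothing to send and emits idle packets. */
static bool fifo_pop(struct cmd_fifo *f, uint32_t *word)
{
    if (f->count == 0)
        return false;
    *word = f->buf[f->head];
    f->head = (f->head + 1) % CMD_FIFO_DEPTH;
    --f->count;
    return true;
}
```

Because the FIFO decouples the two sides, the serial link and the OPB can run on independent clocks. The only coupling between them is the full and empty conditions.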

Reference: TMS320DM643x DMP VLYNQ Port User's Guide, Literature Number: SPRU938B, Section 2.5.1

Xilinx VLYNQ v1.3 / Core Generator 10.1, Literature Number: DS324


http://focus.ti.com/lit/ug/spru938b/spru938b.pdf

www.xilinx.com/products/ipcenter/DO-DI-VLYNQ.htm

VLYNQ References

VLYNQ documentation consists of the TMS320DM643x DMP VLYNQ Port User’s Guide from TI and of the VLYNQ LogiCore datasheet from Xilinx.


vlynq_config.peer_tx_addr = 0;
vlynq_config.local_rtm_cfg_type = no_rtm_cfg;
vlynq_config.peer_rtm_cfg_type = no_rtm_cfg;
vlynq_config.local_tx_fast_path = FALSE;
vlynq_config.peer_tx_fast_path = FALSE;

/* Initialize the VLYNQ control module */
ptr_vlynq = PAL_sysVlynqInitSoc(&vlynq_config);

if (NULL == ptr_vlynq) {
    VLYNQ_DEBUG("VLYNQ :Failed to initialize the vlynq 0x%08x\n\r", vlynq_config.base_addr);
    VLYNQ_DEBUG("VLYNQ :The error msg: %s\n\r", vlynq_config.error_msg);
    goto av_vlynq_init_fail;
}

/* Map memory regions of device for remote/local VLYNQ depending on
   region ID to be mapped and the size and offset. */
while (init_p_region->id > -1) {
    if (VLYNQ_APP_SUCCESS != PAL_sysVlynqMapRegion(ptr_vlynq, init_p_region->remote,
                                                   init_p_region->id, init_p_region->offset,
                                                   init_p_region->size, ptr_vlynq_dev))
    /* (excerpt truncated on the slide) */

VLYNQ DSP/BIOS Driver

On the TI SOC software side, a VLYNQ peripheral is implemented using a set of functions within the API (application programming interface) provided by the VLYNQ device driver.

Shown above are two of the preparatory steps to activate VLYNQ: PAL_sysVlynqInitSoc, which initializes the VLYNQ control module, and PAL_sysVlynqMapRegion, which maps memory regions of the device for the remote or local VLYNQ according to the region ID, size and offset.

Refer to the VLYNQ device driver architecture documentation for a full description of all functions in the API.
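The mapping loop in the excerpt appears to walk a table of region descriptors until it reaches an entry whose `id` is negative. That sentinel-terminated table pattern is sketched below in host-testable C. `region_cfg`, `map_all_regions` and `stub_map_region` are hypothetical names, and the function pointer only stands in for `PAL_sysVlynqMapRegion`, whose real signature and return codes are defined by the VLYNQ driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor with the fields the driver excerpt reads. */
typedef struct {
    int      id;     /* region ID; a negative ID terminates the table */
    int      remote; /* nonzero: region belongs to the remote VLYNQ */
    uint32_t offset; /* region offset */
    uint32_t size;   /* region size in bytes */
} region_cfg;

/* Stand-in for the driver's map call; returns 0 on success. */
typedef int (*map_region_fn)(int remote, int id, uint32_t offset, uint32_t size);

/* Walk a sentinel-terminated table and map each region in order.
   Returns the number of regions mapped, or -1 at the first failure. */
static int map_all_regions(const region_cfg *p, map_region_fn map)
{
    int mapped = 0;
    assert(p != NULL && map != NULL);
    while (p->id > -1) {
        if (map(p->remote, p->id, p->offset, p->size) != 0)
            return -1;
        ++mapped;
        ++p;
    }
    return mapped;
}

/* Host-side stub for testing: rejects zero-sized regions. */
static int stub_map_region(int remote, int id, uint32_t offset, uint32_t size)
{
    (void)remote;
    (void)id;
    (void)offset;
    return size == 0u ? -1 : 0;
}
```

A sentinel entry lets the region list grow without a separate count, at the cost of having to remember the terminating `{-1, ...}` entry.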


Avnet Tools:
- avnet_3adsp_dm6437_0_04

AVNET_S3ADSP_DM6437_INSTALL_DIR => C:\avnet_s3adsp_dm6437_0_04
PSP_EVMDM6437_INSTALLDIR => %AVNET_S3ADSP_DM6437_INSTALL_DIR%\psp
CSLR_DM6437_INSTALLDIR => %AVNET_S3ADSP_DM6437_INSTALL_DIR%\psp\pspdrivers\soc\dm6437\dsp\inc

Modified version of C:\dvsdk_1_01_00_15\psp_1_00_02_00
Modified version of C:\dvsdk_1_01_00_15\ndk_1_92_00_22_eval

DSP drivers (CCS specific)

FPGA logic (ISE specific)

DSP blockset (Target Support Package TC6 & Embedded IDE Link CC specific)

FPGA blockset (SysGen specific)

Avnet BSP Installation Package

Once installed, the Avnet Spartan-3A DSP DaVinci board support package consists of the above directory structure.

Note:

• NDK = Avnet-modified version of the Network Developer’s Kit from the DVSDK for the TI DM6437 EVM: C:\dvsdk_1_01_00_15\ndk_1_92_00_22_eval

• PSP = Avnet-modified version of the PSP DSP/BIOS drivers for the TI DM6437 EVM: C:\dvsdk_1_01_00_15\psp_1_00_02_00
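The install-directory variables listed above are defined in terms of one another: PSP_EVMDM6437_INSTALLDIR and CSLR_DM6437_INSTALLDIR both start with %AVNET_S3ADSP_DM6437_INSTALL_DIR%. The C sketch below only illustrates how such nested %NAME% references resolve; it uses a hard-coded table holding the values above instead of reading the real environment, and `lookup` and `expand` are hypothetical helpers, not part of the BSP.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Values copied from the list above; a real build would read the environment. */
typedef struct { const char *name; const char *value; } env_var;

static const env_var bsp_vars[] = {
    { "AVNET_S3ADSP_DM6437_INSTALL_DIR", "C:\\avnet_s3adsp_dm6437_0_04" },
    { "PSP_EVMDM6437_INSTALLDIR", "%AVNET_S3ADSP_DM6437_INSTALL_DIR%\\psp" },
    { "CSLR_DM6437_INSTALLDIR",
      "%AVNET_S3ADSP_DM6437_INSTALL_DIR%\\psp\\pspdrivers\\soc\\dm6437\\dsp\\inc" },
};

/* Look up a variable whose name is the first len characters of name. */
static const char *lookup(const char *name, size_t len)
{
    for (size_t i = 0; i < sizeof bsp_vars / sizeof bsp_vars[0]; ++i)
        if (strlen(bsp_vars[i].name) == len &&
            strncmp(bsp_vars[i].name, name, len) == 0)
            return bsp_vars[i].value;
    return NULL;
}

/* Expand %NAME% references recursively into out (capacity outsz).
   Unknown names are copied literally. Returns 0 on success, -1 if the
   result does not fit or references nest too deeply. */
static int expand(const char *in, char *out, size_t outsz, int depth)
{
    size_t n = 0;
    assert(in != NULL && out != NULL && outsz > 0);
    if (depth > 8)
        return -1;
    while (*in != '\0') {
        const char *end = NULL;
        const char *val = NULL;
        if (*in == '%' && (end = strchr(in + 1, '%')) != NULL &&
            (val = lookup(in + 1, (size_t)(end - in - 1))) != NULL) {
            if (expand(val, out + n, outsz - n, depth + 1) != 0)
                return -1;
            n += strlen(out + n);
            in = end + 1;
        } else {
            if (n + 1 >= outsz)
                return -1;
            out[n++] = *in++;
        }
    }
    out[n] = '\0';
    return 0;
}
```

Expanding %CSLR_DM6437_INSTALLDIR% with this table yields C:\avnet_s3adsp_dm6437_0_04\psp\pspdrivers\soc\dm6437\dsp\inc, which is why moving the BSP only requires changing the top-level variable.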


DSP blockset (Target Support Package TC6 & Embedded IDE Link CC specific)

FPGA blockset (System Generator specific)

DSP drivers (Code Composer Studio specific)

FPGA logic (ISE specific)

Spartan-3A DSP DaVinci Board Support Package

Network Developer’s Kit (DSP/BIOS)

PSP Drivers for DM6437 (DSP/BIOS)

Ethernet Hardware Co-Simulation support files

Once installed, the Avnet Spartan-3A DSP DaVinci board support package consists of the above directory structure. Here we concentrate on the Ethernet hardware co-simulation support files; the other components of the BSP are presented in the following slides.



Ethernet Hardware Co-Simulation Support Files

• Board appears in list of targets for Ethernet hardware co-simulation

Avnet provides Ethernet hardware co-simulation support files for the Spartan-3A DSP DaVinci board, as well as for several Avnet Virtex-5 evaluation kits. The support files, known as ‘plugins’, are packaged in the standard format used by the System Generator plugin installer ‘xlinstallplugin’. Once installed under the directory tree shown here, the board appears in the target list for Ethernet point-to-point hardware co-simulation.


• Installation Package
• BSL – Board Support Libraries
• MSL – Model Support Libraries
• LED Demo


bsl\dsp\gel:
- avnet_s3adsp_dm6437.ccs => CCS setup for BlackHawk USB510L
- avnet_s3adsp_dm6437.gel => GEL file for Avnet board

bsl\dsp\src:
bsl\dsp\inc:
- dm6437_init.c/.h => various init/config routines
- fpga_interface.c/.h => FPGA device driver (apply/release reset)
- vlynq_interface.c/.h => VLYNQ device driver
- led_interface.c/.h => LED device driver
- dip_interface.c/.h => DIP switch device driver
- vpss_interface.h => contains a number of useful defines

bsl\dsp\dspbios:
- Platform.tci => ??

BSL – DSP drivers


bsl\fpga\rtl:
- pattern => XGA pattern generator (color bars + moving logo)
- lcd => LCD flat panel interface
- picoblaze => PicoBlaze-based I2C controller
- vlynq => VLYNQ interface core
- video => video interfaces (stddef, hidef, vpfe, vpbe)
- debug => ChipScope debug module
- top_level => top-level designs

bsl\fpga\chipscope:
- ChipScope Analyzer project for FPGA debug

bsl\fpga\ucf:
- constraints file for FPGA designs

bsl\fpga\ise:
- davinci_coprocessor_stddef => example design for composite input
- davinci_coprocessor_hidef => example design for VGA input

BSL – FPGA Logic


• Installation Package
• BSL – Board Support Libraries
• MSL – Model Support Libraries
• LED Demo


DIP Switch:
- Reads one of SW10[1:4] switches (cannot be used with VPFE/VPBE)

LED:
- Writes to one of D7, D8, D9, D10 LEDs

VLYNQ Read/Write:
- Reads/writes to FPGA peripherals via VLYNQ

MSL – DSP Logic
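At the register level, the DIP switch and LED blocks reduce to single-bit reads and writes. The C sketch below assumes, purely for illustration, that SW10[1:4] appear in bits 0 to 3 of a status register and that D7 to D10 are driven by bits 0 to 3 of a control register; the actual register map on the board may differ, and `dip_read` and `led_write` are hypothetical helpers rather than functions from dip_interface.c or led_interface.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register layouts, for illustration only:
   - DIP status register: SW10[1:4] in bits 0..3
   - LED control register: D7, D8, D9, D10 in bits 0..3 */
#define DIP_MASK 0x0Fu

/* Return 1 if switch n (1..4) of SW10 reads as on, else 0. */
static int dip_read(uint32_t dip_reg, unsigned n)
{
    assert(n >= 1u && n <= 4u);
    return (int)(((dip_reg & DIP_MASK) >> (n - 1u)) & 1u);
}

/* Return the new LED register value with one LED (0 = D7 ... 3 = D10)
   switched on or off; the other LED bits are preserved. */
static uint32_t led_write(uint32_t led_reg, unsigned index, int on)
{
    uint32_t bit;
    assert(index <= 3u);
    bit = 1u << index;
    return on ? (led_reg | bit) : (led_reg & ~bit);
}
```

Returning the new register value, instead of writing hardware directly, keeps the bit logic testable on a host; on the board the result would be written back through the driver.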


DaVinci Processor:
- similar to Xilinx’s EDK Processor block
- automatically creates VLYNQ bus logic to all shared regs/FIFOs/mems
- creates memory map

I2C Controller:
- PicoBlaze-based I2C controller
- command port via request/response FIFOs

MSL – FPGA Blockset
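The "creates memory map" step of the DaVinci Processor block can be pictured as assigning a base address to each shared register, FIFO and memory. The sketch below uses one common policy: round each item up to a power-of-two span and align it to that span, so that an address decoder can select an item by comparing upper address bits. This policy and the `shared_item` and `build_memory_map` names are assumptions, not a description of how System Generator actually lays out the map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *name;
    uint32_t size; /* bytes requested */
    uint32_t base; /* assigned by build_memory_map */
} shared_item;

/* Smallest power of two >= x (x must be in 1..2^31). */
static uint32_t next_pow2(uint32_t x)
{
    uint32_t p = 1u;
    assert(x > 0u && x <= 0x80000000u);
    while (p < x)
        p <<= 1;
    return p;
}

/* Assign base addresses in declaration order, each aligned to its
   power-of-two span. Returns the total address span used. */
static uint32_t build_memory_map(shared_item *items, size_t count)
{
    uint32_t addr = 0u;
    assert(items != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        uint32_t span = next_pow2(items[i].size);
        addr = (addr + span - 1u) & ~(span - 1u); /* align up to span */
        items[i].base = addr;
        addr += span;
    }
    return addr;
}
```

For instance, a 4-byte register, a 64-byte FIFO and a 1000-byte memory, declared in that order, land at offsets 0, 64 and 1024, for a 2048-byte span. Ordering the items from largest to smallest would reduce the alignment gaps.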


[Figure: FPGA design with automatically created VLYNQ bus logic]

Implementing DSP to FPGA VLYNQ Interface

Memories used in the co-processor are associated with the DaVinci processor through the block’s GUI in System Generator.

After an association is made, System Generator automatically generates an interface that marshals data to and from the processor over VLYNQ. On the DaVinci side, Target for C6000 handles automatic code generation. Keeping the DSP and FPGA co-processor designs in the same development environment removes the grunt work of manually maintaining the API, including memory maps, function headers and C-code device drivers in Code Composer Studio.
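From the DSP's point of view, each shared memory associated with the DaVinci processor block ends up as a range of addresses reached over VLYNQ. The sketch below shows the kind of access routine such a driver reduces to; it is not the code generated by Target for C6000. On the target, `window` would point at the mapped region (an assumption about how the map is exposed), whereas on a host it can point at an ordinary array; `shared_mem_write` and `shared_mem_read` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy n 32-bit words into a shared memory reached through a memory-mapped
   window. volatile keeps the compiler from caching or dropping the stores. */
static void shared_mem_write(volatile uint32_t *window, size_t word_offset,
                             const uint32_t *src, size_t n)
{
    assert(window != NULL && (src != NULL || n == 0));
    for (size_t i = 0; i < n; ++i)
        window[word_offset + i] = src[i];
}

/* Copy n 32-bit words back out of the shared memory. */
static void shared_mem_read(const volatile uint32_t *window, size_t word_offset,
                            uint32_t *dst, size_t n)
{
    assert(window != NULL && (dst != NULL || n == 0));
    for (size_t i = 0; i < n; ++i)
        dst[i] = window[word_offset + i];
}
```

Word-by-word volatile access is the simplest correct approach; a production driver would more likely use DMA for large blocks such as video frames.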


• Installation Package
• BSL – Board Support Libraries
• MSL – Model Support Libraries
• LED Demo


LED Demo – DIP Implementations

[Figure: DIP switch implementations, one for simulation only and one for the DSP build]


LED Demo – LED Implementations

[Figure: LED implementations, one for simulation only and one for the DSP build]
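The LED demo keeps two implementations of each DIP and LED function: one used only in simulation and one used in the DSP build. In Simulink the split is made inside the model; the slides do not say which mechanism the demo uses (an Environment Controller block is one option). A C analogue of the same idea is conditional compilation, sketched below; the `TARGET_DSP_BUILD` macro, the `read_dip_switches` driver function and the switch-to-LED mirroring logic are all hypothetical.

```c
#include <assert.h>

/* Build-time variant selection. Define TARGET_DSP_BUILD (hypothetical macro)
   when building for the board; leave it undefined for host simulation. */
#ifdef TARGET_DSP_BUILD
unsigned read_dip_switches(void); /* hypothetical board-level DIP driver */
static unsigned dip_source(void) { return read_dip_switches(); }
#else
static unsigned sim_dip_value = 0x3u; /* stand-in switch pattern for simulation */
static unsigned dip_source(void) { return sim_dip_value; }

/* Simulation only: change the stand-in switch pattern (4 bits). */
static void sim_set_dips(unsigned v)
{
    assert(v <= 0xFu);
    sim_dip_value = v;
}
#endif

/* Application logic shared by both variants (hypothetical: mirror the
   four switch bits onto the four LED bits). */
static unsigned led_pattern_from_dips(void)
{
    return dip_source() & 0xFu;
}
```

Only the source of the switch values changes between builds; the shared logic in `led_pattern_from_dips` is compiled identically in both, which is the point of keeping the two implementations side by side.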


LED Demo – Simulation only


LED Demo – DSP only


Serial RapidIO™ Enables Increased Bandwidth (TI TMS320C6455, C6474, etc.)

• C6455 Serial RapidIO support
– IEEE 1149.6 compliant
– 1.25, 2.5, 3.125 Gbit/s per link
– Up to four 1x links (each 1x link is bidirectional), OR up to one 4x link (bidirectional pipe), which provides up to 12.5 Gbit/s in each direction
– Resulting range 10 – 25 Gbit/s total, counting both directions (1.25 – 3.125 Gbyte/s)
– Supports DSP-to-DSP on the same board, DSP-to-switch, DSP-to-FPGA, etc.

• Benefits
– 1x link is fast enough to send HD 1080i raw video between devices
– 4x link is easily fast enough to send HD 1080p raw video between devices
– Reduction in chip count, board area and system cost

Serial RapidIO is a high-performance, packet-switched interconnect technology that addresses the embedded industry's need for:

• Reliability
• Increased bandwidth
• Faster bus speeds

Serial RapidIO allows chip-to-chip and board-to-board communications at performance levels scaling to ten gigabits per second and beyond.

TI customers asked for faster I/O performance, and TI listened. TI is bus-agnostic, so let’s first explain why TI chose Serial RapidIO for the C6455:

• High performance for HD video and telecom channel density
• Worldwide standard, multiple applications, broad OEM adoption
• Flexible, scalable rates and widths (1x or 4x)
• Low pin count and low power per link

TI was part of the consortium that defined the standard with other industry leaders.

The theoretical aggregate bandwidth is up to 25 Gbit/s, but any communications protocol carries some overhead (line coding, addresses, acknowledgements, error checking), so in reality the usable payload rate may be ~19 or 20 Gbit/s.
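The 10 to 25 Gbit/s range on the slide is four lanes at 1.25 or 3.125 Gbaud, counted in both directions, and dividing by 8 gives the 1.25 to 3.125 Gbyte/s figures. The sketch below reproduces that arithmetic. It also applies 8b/10b line coding, which Serial RapidIO uses at these lane rates; that step goes beyond the slide's figures and shows that coding alone takes the 25 Gbit/s peak down to 20 Gbit/s before any packet overhead. The function names are illustrative only.

```c
#include <assert.h>

/* Aggregate SRIO signaling rate in Gbit/s: lanes x per-lane baud rate,
   counted in both directions of each bidirectional lane. */
static double srio_total_gbps(int lanes, double gbaud_per_lane)
{
    assert(lanes > 0 && gbaud_per_lane > 0.0);
    return lanes * gbaud_per_lane * 2.0;
}

/* The same figure after 8b/10b line coding (8 data bits per 10 line bits),
   before packet overhead such as headers, acknowledgements or CRC. */
static double srio_coded_data_gbps(int lanes, double gbaud_per_lane)
{
    return srio_total_gbps(lanes, gbaud_per_lane) * 8.0 / 10.0;
}

/* Convert Gbit/s to Gbyte/s. */
static double gbits_to_gbytes(double gbps)
{
    return gbps / 8.0;
}
```

With four lanes, `srio_total_gbps` gives 10 Gbit/s at 1.25 Gbaud and 25 Gbit/s at 3.125 Gbaud, matching the slide.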

From a video infrastructure applications perspective, the 1x link is fast enough to send HD 1080i raw video between devices, and the 4x link can easily send HD 1080p raw video between devices. In infrastructure applications with large “DSP farms”, SRIO may allow our OEMs to reduce their FPGA content (quantity, pin count, size and/or cost).
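The 1080i and 1080p claims can be checked with a quick bandwidth estimate. The sketch below rests on assumptions the slide does not state: 8-bit 4:2:2 sampling (16 bits per pixel), active pixels only, 30 frames/s for 1080i and 60 frames/s for 1080p, and a 3.125 Gbaud lane with 8b/10b coding (2.5 Gbit/s of coded data per direction). The helper names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Raw (uncompressed) video bit rate for active pixels only. */
static uint64_t raw_video_bps(uint32_t width, uint32_t height,
                              uint32_t bits_per_pixel, uint32_t frames_per_sec)
{
    assert(width > 0u && height > 0u);
    return (uint64_t)width * height * bits_per_pixel * frames_per_sec;
}

/* Data rate of one direction of an SRIO link after 8b/10b coding. */
static uint64_t srio_link_data_bps(uint32_t lanes, uint64_t baud_per_lane)
{
    return (uint64_t)lanes * baud_per_lane * 8u / 10u;
}
```

Under these assumptions 1080i needs about 995 Mbit/s and 1080p about 1.99 Gbit/s, both within what the links carry before protocol overhead; blanking intervals, 10-bit samples or 4:4:4 sampling would raise these numbers.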