dsp advisorceva-dsp.mediaroom.com/download/ceva+newsletter+2013_02... · 2013-07-06 · dsp advisor...

DSP advisor

> In this issue

• Multi-core design gets easier with MUST - scalability, flexibility, and precision for next-generation communications

• AMF abstracts DSPs in Android-based systems - offloading multimedia tasks on Android platforms

• Shaky hands make better pictures - CEVA delivers Super-Resolution imaging performance for mobile devices

• Mindspeed leverages CEVA DSPs for next-gen wireless products with multiple CEVA DSPs in wireless infrastructure SoCs

• Antcor targets multiple Wi-Fi markets with a single SoC

• Demos-on-demand: the latest CEVA demo for computational photography and computer vision

CEVA Newsletter | Spring | 2013

Multi-core design gets easier with MUST - scalability, flexibility, and precision for next-generation communications

Whether you’re a wireless modem OEM, a wireless IP provider, or a

maker of semiconductor chips for the wireless market, your job gets

harder with each generation of wireless technology. Not only do

processing demands become increasingly exacting, but deployment

configurations are evolving as well. Where once macrocells dominated

the cellular infrastructure landscape, configurations now vary according

to the location and traffic expected, from small picocells in densely-

packed installations through micro-, metro-, and macrocells. Exploration

into technologies like Cloud RAN promises even more change in

the future.

This means that modem designs must be flexible enough to handle

a variety of traffic and configurations, but they also need to be able

to scale so that a given design platform can be adapted to a specific

modem design as quickly as possible. And that means that the

underlying architecture has to have the flexibility and scalability that

only a well-integrated multicore platform can provide, built around

clusters of DSP processors offering dynamic load balancing. The new

technologies themselves are also demanding much more processing

precision and throughput than current architectures provide.
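To make the idea of dynamic load balancing across a DSP cluster concrete, here is a minimal sketch in Python. It is not CEVA's scheduler: the greedy "least-loaded core first" policy, the function name `balance`, and the task costs are all assumptions chosen for illustration.

```python
import heapq


def balance(task_costs, num_cores):
    """Greedy balancing: each task goes to the core with the least work so far."""
    # Heap entries are (accumulated load, core id); the smallest load pops first.
    heap = [(0, core) for core in range(num_cores)]
    heapq.heapify(heap)
    assignment = {core: [] for core in range(num_cores)}
    for task_id, cost in enumerate(task_costs):
        load, core = heapq.heappop(heap)
        assignment[core].append(task_id)
        heapq.heappush(heap, (load + cost, core))
    loads = {core: sum(task_costs[t] for t in tasks)
             for core, tasks in assignment.items()}
    return assignment, loads


# Six tasks with hypothetical cycle costs, spread over two cores.
assignment, loads = balance([5, 3, 3, 2, 2, 1], num_cores=2)
```

With these costs both cores end up with the same total load. A real cluster would make this decision at run time from queue status rather than from costs known in advance.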

The CEVA-XC4000 DSP core is optimized for wireless communications,

and it has already proven its value with more than 20 designs under

its belt. To increase the scope, scalability, and precision of the XC4000

architecture, CEVA has introduced new MUST multicore technology

and vector floating point capabilities. Combined with a rich set of

accelerators and tightly-coupled extensions (TCEs), the XC4000 becomes

an even more compelling solution to wireless modem complexity.

Multicore means more than simply adding cores. While

the DSPs largely serve the data plane, they must interact with each

other and with host CPUs. Depending on the design decisions, they

may be configured in homogeneous or heterogeneous, symmetric or

asymmetric arrangements. CEVA has facilitated this by using AXI-4 and

FIC (fast interconnect) schemes for easy assembly of multiple DSPs and

ARM cores. On top of the different interconnect options, CEVA offers an advanced Data Traffic Manager that enables data transfers to be automatically managed based on task status and data load levels.

Multicore also creates the need for cache coherency if memory is to be shared between multiple DSP and/or CPU cores. By leveraging the AMBA-4 ACE coherency infrastructure, a variety of caching schemes can be managed transparently by the hardware without software intervention, allowing – but not requiring – software control.

Multicore helps scale a given design, but radio technologies like high-dimension MIMO, a multiple-antenna feature of both LTE-A and 802.11ac, require the high precision that only floating point data provides. A conventional floating point unit would not meet the performance requirements, so CEVA has added vector-level floating point capabilities, allowing up to 32 simultaneous floating point operations per core cycle with IEEE precision. Not only does this speed execution, but it also makes it much easier to port algorithms originally written using the ‘float’ type or developed with tools like Matlab, which rely on floating point math, directly to the software implementation that will be executed on the XC4000 platform.

Meanwhile, many of the functions to be performed on incoming data packets are just begging to be accelerated in hardware, improving performance and offloading the DSP cores. CEVA provides a rich set of accelerators and TCEs for exactly this purpose, including maximum likelihood MIMO detectors (MLD), 3G de-spreaders, DFT and FFT units, Viterbi decoders, and log-likelihood ratio (LLR) and hybrid ARQ (HARQ) units.

Multicore processing also benefits from a hardware scheduling infrastructure that allows optimal, tunable task scheduling on the fly. The CEVA TCEs, as well as any customer proprietary accelerators, are also managed through the Data Traffic Manager, offloading the DSP from data management in the system. A hardware abstraction layer isolates software programmers from the hardware details, making software simpler and easier to port and the different hardware accelerators easier to control.

[Figure: CEVA’s MUST multicore cluster block diagram. Blocks shown: four CEVA-XC DSP cores, each with a Vector FPU, Data Cache, and Cache Coherency; Data Traffic Manager; Queue Managers; Buffer Managers; Tightly Coupled Extensions (MLD, Demapp, FFT, DFT, Viterbi, HARQ Combine, User Defined); System Elements and Peripherals (Peripherals & I/O I/F, L2 Cache, Snoop Filtering, Shared Queues); AMBA 4 Advanced Coherent Bus Interface.]

AMF abstracts DSPs in Android-based systems - offloading multimedia tasks on Android platforms

Multimedia on a phone used to be a big deal. Handling images or video faster and with better resolution was part of the whiz-bang attraction of a new phone. But these days, everyone expects a multimedia experience on their phone. Your phone might stand out if it does a particularly good job of handling media, but you have no chance of success if the phone you design can’t handle multimedia.

High quality multimedia is so pervasive yet demanding that image, video, and audio processing are being bumped from the main CPU to dedicated engines like the CEVA-MM3101 (for photography and computer vision) and the CEVA-TeakLite-4 (for audio and speech). Through a combination of finely-tailored DSPs, judicious hardware acceleration, and optimized software solutions, phones using CEVA cores process multimedia faster and with lower power, and they free up the main CPU to run more user applications.

[Figure: CEVA Android Multimedia Framework. CPU side: OpenCORE / Stagefright; MP3, File Reader, Dolby Mobile, and Super Resolution OMX components; CEVA Host Link Driver; hardware driver (PIU). DSP side: OpenMAX components (OMX MP3 decoder, OMX DM processor, OMX Super-Resolution); CEVA DSP Link Driver; RTOS; hardware drivers; PSU. The two sides are joined by IPC and logical communication. OMX = OpenMAX API; IPC = Inter Processor Communication.]

There has been a catch, however, if you’re building an Android phone:

Android doesn’t understand offloads. As far as it’s concerned, there’s

one processor (a CPU, possibly using multiple cores), and that processor

does all the work, which leaves any multimedia accelerators high and

dry. If you’re writing multimedia – in particular, code leveraging the

open-source OpenMAX API – there’s no way you can tell Android to

take advantage of deeply-embedded accelerators for more efficient

execution.

To address this, CEVA has announced its Android Multimedia Framework,

or AMF. What we’ve done is to build an RTOS-managed DSP subsystem

next to the CPU; this can execute software algorithms and manage

DSP-accelerated functions. In order to make this accessible to software

executing on the CPU, we’ve built a set of drivers for the CPU and an

abstraction layer that implements the OpenMAX API.

When the CPU encounters an OpenMAX multimedia function call,

the function won’t be executed on the CPU; it will call the lower-level

driver – and that driver will engage the DSP subsystem, sharing any

necessary data and returning the results. Furthermore, multimedia

function calls can be chained together via the Android tunneling

mechanism, reducing data transfer overhead.
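The saving from chaining calls can be illustrated with a small model. The sketch below is not the AMF or OpenMAX API: `DspLink`, `run_untunneled`, `run_tunneled`, and the two toy components are hypothetical names invented here, and the only thing counted is how often a buffer crosses between CPU and DSP.

```python
class DspLink:
    """Hypothetical stand-in for the CPU-to-DSP driver; it only counts transfers."""

    def __init__(self):
        self.transfers = 0

    def to_dsp(self, data):
        self.transfers += 1
        return data

    def to_cpu(self, data):
        self.transfers += 1
        return data


def run_untunneled(link, components, data):
    """Each component's result returns to the CPU before the next call."""
    for component in components:
        data = link.to_cpu(component(link.to_dsp(data)))
    return data


def run_tunneled(link, components, data):
    """Components are chained on the DSP; data crosses the link twice in total."""
    data = link.to_dsp(data)
    for component in components:
        data = component(data)
    return link.to_cpu(data)


# Toy "multimedia functions" standing in for DSP-side components.
def decode(samples):
    return [2 * s for s in samples]


def post_process(samples):
    return [s + 1 for s in samples]


plain, tunneled = DspLink(), DspLink()
out_plain = run_untunneled(plain, [decode, post_process], [1, 2, 3])
out_tunneled = run_tunneled(tunneled, [decode, post_process], [1, 2, 3])
```

Both paths produce the same output, but the untunneled path moves the buffer four times and the tunneled path only twice. The gap grows with every component added to the chain.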

The CEVA AMF can work on platforms implemented on a single SoC or

across multiple chips, allowing flexibility in the underlying architecture

and system connectivity. In fact, if the CPU was originally sized assuming

it would have to handle multimedia, you may be able to drop some CPU

cores, lowering overall cost and power. On the CEVA platforms, you

can perform compute-intensive multimedia functions with 95% less power

than you would need if you were to run them on the CPU.

Meanwhile, Android application writers can continue to write high-level

C code as if targeting the CPU. The CEVA AMF makes this sleight of hand

completely transparent, keeping abstraction high while providing the

performance that would otherwise require low-level coding to obtain.

If you’ve been struggling to find a way to manage multimedia better on

your mobile SoC, come to our AMF page http://www.ceva-dsp.com/Android-Multimedia-Framework-AMF

and get the details.

Your OEM customers will thank you.

Shaky hands make better pictures - CEVA delivers Super-Resolution imaging performance for mobile devices

A low-quality camera in the shaky hands of a novice might seem to be

the worst possible scenario for picture-taking. But it’s actually turning

into exactly what’s needed to get a picture that looks like it came from

a much better camera.

This isn’t about stabilizing a shaking camera using optical image

stabilization (OIS), which compensates for the camera movement at

the time the picture is taken. This is different: it’s about taking several

images very quickly, each of which is slightly offset from the others due

to even the tiniest shaking or movement of the device. Algorithms can

combine them to create a single image with apparent resolution far

greater than what the camera sensor actually provides.
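The principle of combining offset frames can be shown with an idealized "shift-and-add" sketch. This is not CEVA's algorithm, which the newsletter does not describe. It assumes the offsets between frames are known exactly and equal to one pixel of the high-resolution grid, and it models a capture as plain subsampling. Real super-resolution also has to estimate the offsets, handle blur and noise, and suppress ghosting. The function names are illustrative.

```python
def downsample(high_res, dy, dx):
    """Simulate one low-res capture: keep every second pixel, starting at (dy, dx)."""
    return [row[dx::2] for row in high_res[dy::2]]


def shift_and_add(frames):
    """Interleave low-res frames, keyed by their known (dy, dx) offset, onto a 2x grid."""
    height = len(frames[(0, 0)])
    width = len(frames[(0, 0)][0])
    out = [[None] * (2 * width) for _ in range(2 * height)]
    for (dy, dx), frame in frames.items():
        for y, row in enumerate(frame):
            for x, value in enumerate(row):
                out[2 * y + dy][2 * x + dx] = value
    return out


# A 4x4 "scene" and four 2x2 captures, each shifted by one high-res pixel.
scene = [[4 * y + x for x in range(4)] for y in range(4)]
offsets = [(0, 0), (0, 1), (1, 0), (1, 1)]
frames = {offset: downsample(scene, *offset) for offset in offsets}
restored = shift_and_add(frames)
```

Four frames with a quarter of the pixels each are enough to rebuild the full grid in this idealized case, the same ratio as combining four 5-Mpixel images into one 20-Mpixel image.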

This technology is called Super-Resolution (SR), and it’s typically been

something you do on a PC after taking the pictures. But with CEVA’s

newly-introduced Super-Resolution algorithm, this capability can be

designed directly into the camera or phone itself.

[Figure: CEVA Multi-image Super Resolution. A low-res sensor feeds the CEVA SR algorithm, which turns four 5-Mpixel images into a single high-quality 20-Mpixel image.]

Low-resolution phones and cameras can greatly benefit from this

technology for a couple of reasons. First, image sensors have to trade

off resolution against the ability to take low-light photos. The total

sensor area is fixed, and higher resolution means more pixels in the

same area. That means smaller pixels, and smaller pixels have less

area for receiving photons. But by putting SR capability directly into a

phone with a lower-resolution sensor, you can get excellent resolution

and good low-light performance, eliminating the tradeoff.
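The pixel-size tradeoff is simple arithmetic, sketched below. The sensor dimensions are a made-up example, not a figure from CEVA. Only the ratio matters, and it does not depend on the sensor size chosen.

```python
def pixel_area_um2(sensor_area_mm2, megapixels):
    """Average sensor area available per pixel, in square micrometres."""
    pixels = megapixels * 1_000_000
    return sensor_area_mm2 * 1_000_000 / pixels  # 1 mm^2 = 1,000,000 um^2


SENSOR_AREA_MM2 = 4.8 * 3.6  # hypothetical sensor size, for illustration only

native_20mp = pixel_area_um2(SENSOR_AREA_MM2, 20)  # 20-Mpixel sensor
sr_5mp = pixel_area_um2(SENSOR_AREA_MM2, 5)        # 5-Mpixel sensor used with SR
ratio = sr_5mp / native_20mp
```

On the same sensor area, each pixel of the 5-Mpixel sensor has four times the light-collecting area of a pixel on the 20-Mpixel sensor. That is why the lower-resolution sensor performs better in low light.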

The other reason why this makes particular sense for smartphones is

that such units are much more likely to be handheld. SR doesn’t work

well (or at all) if there is absolutely no change from image to image

(a static image taken on a tripod, for instance). Handheld cameras

and phones are ideal for providing those small movements needed to

make SR work. The algorithm has an anti-ghosting feature to eliminate

movement artifacts that can appear using other SR algorithms.

The CEVA implementation is optimized to work as a pure software

module on the CEVA-MM3101 platform, executing efficiently and with

low power consumption. On a 28nm process, for example, CEVA’s SR

algorithm can combine four 5-Mpixel images into a single 20-Mpixel

image in less than a second, using less than 30 mW to do so. This

process is useful not only for creating higher-quality pictures, but it can

also be used to improve the quality of a digital zoom feature or even

just to remove image noise – which is of particular value when taking

pictures in low light.

Because the algorithm executes so fast, this can even become the

default mode for the camera – the user might not even need to know

what’s going on; he’ll simply be getting outstanding pictures from a

cost-effective phone. You can find more information on our SR page

http://events.ceva-dsp.com/sr/ and at http://www.ceva-dsp.com/CEVA-MM3101

Mindspeed leverages CEVA DSPs for next-gen wireless products with multiple CEVA DSPs in wireless infrastructure SoCs

Mindspeed Technologies is an established pioneer in system-on-chip

(SoC) design and has used CEVA DSPs in its Transcede family of

SoCs for the past five years. The Transcede family supports NodeB

and eNodeB wireless baseband processing and achieves market-

leading performance by incorporating the latest DSP technology from

CEVA. With the emergence of small cell base stations, volumes can

justify the development of specialized SoCs. CEVA DSPs are the ideal

solution, having the lowest power/performance ratio of any DSP and

an instruction set optimized for wireless modems.

[Figure: Comparison of CEVA multi-image Super-Resolution versus other approaches. Panels: CEVA Super Res versus a leading PC SR application; CEVA Super Res versus bicubic interpolation.]

The CEVA-XC architecture underlies a broad range of Transcede solutions

for wireless base station applications, from femtocells to macrocells.


Mindspeed has leveraged the flexibility and performance of the CEVA

DSPs, bolstered by reference architectures, a complete development

environment for programming in C, and a comprehensive suite of

optimized library functions, including LTE, LTE-A, Wi-Fi and HSPA+.

Architectural features specifically address base station application

requirements, with high precision, strong support of high-dimension

MIMO technology, and advanced support of DSP offloading to

dedicated pre-optimized tightly coupled extensions (TCEs). The

architecture also includes specialized multi-core features such as

advanced data traffic management, fast system interconnect support

for easy integration of DSP clusters, and native connectivity with

ARM® processors.

Transcede SoCs are optimally balanced to meet the performance of their

targeted base station market, whether residential, enterprise, or pico.

As can be seen from the diagram, the number of functional blocks can

be scaled to meet the performance requirements of the application. In

addition, since updates to new versions of the 3GPP standards can be

handled by software upgrades, the solution involving programmable

CPUs and DSPs is “future-proof”.

An array of CEVA DSPs also allows each DSP to be dynamically loaded with the code image needed during frame processing, which makes for a much more flexible design than fixed-function implementations.

The CEVA DSP software development environment, CEVA-ToolBox, allows Mindspeed to write optimal DSP code that can be ported across the CEVA-XC family and across ARM cores running real-time Linux. Highly

optimized DSP libraries from CEVA support code re-use.

Examples of functions that run on CEVA DSPs are symbol processing,

channel estimation, map/de-map, physical signals, and MIMO

processing. The CEVA DSPs are ideal for optimized baseband processing

due to their wide adoption by leading user equipment (UE) vendors.

Based on Mindspeed’s prior success with CEVA DSPs, the company has

decided to use the new CEVA-XC4000 in their next generation Transcede

SoCs. The enhanced power-optimized pipeline and TCEs make this

processor ideally suited for Transcede SoCs as they implement the most

advanced wireless standards, including LTE-Advanced, Wi-Fi 802.11ac,

multi-carrier HSPA+, and 5G Wi-Fi.

Antcor targets multiple Wi-Fi markets with a single SoC

With the proliferation of communication devices and standards, it’s

impossible to design dedicated chips for every combination. CEVA’s

technology lets architects and designers use software instead of

hardware to manage this complexity, mixing and configuring standards

on a single hardware platform. The benefit is that a one-time hardware

investment can be applied to multiple markets.

[Figure: Mindspeed Transcede SoC implementing CEVA-XC DSP array. Blocks shown: ARM MPCore Cluster; L2 Cache; CEVA-XC DSP Array; FEC HW; Filter Processing Array; L2 Memory; Memory-to-Memory DMA Engines; DDR3 Controller; DDR3 PHY; Network Interface; GEMAC; RGMII; Radio Interface; CPRI; PCIe; Expansion Buses to Peripherals incl. USB, GPIO, UART, CoreSight; AXI Multi-Layer CEVA Bus Matrix; AXI Multi-Layer Sys Bus Matrix.]

But for devices intended to support only Wi-Fi, hardware has traditionally

been the methodology of choice. The challenge of a hardware approach

is that time-to-market pressures and cost constraints end up limiting the

scope of markets that each chip can address.

Instead of following the usual hardware methodology, Antcor, designer

of the Proteus family of Wi-Fi IP, has used a software approach,

taking advantage of the CEVA-XC family to develop its Multi-Mode

Extension (MME) capability. MME is a software scheduler that creates

multi-threaded instances of Proteus that share the same DSP and

hardware resources.

By using software instead of hardware, a single platform can manage

a variety of tasks. Proteus can be configured either to implement a

single 4x4 802.11ac access point or multiple 3x3, 2x2, or 1x1 access

points. This allows Antcor customers to address all of these applications

with a single SoC design instead of requiring multiple separate chips.
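A rough sketch of how one DSP might be shared among several access-point instances follows. Antcor's MME scheduler is not documented in this newsletter, so everything here is an assumption made for illustration: the `ApInstance` type, the four-stream budget (suggested by the single 4x4 configuration), and the simple round-robin time slicing.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class ApInstance:
    name: str
    streams: int  # spatial streams, e.g. 2 for a 2x2 access point


def fits(instances, stream_budget=4):
    """Assumed rule: all instances together may not exceed the stream budget."""
    return sum(ap.streams for ap in instances) <= stream_budget


def round_robin(instances, slots):
    """Return which instance owns the DSP in each successive time slot."""
    queue = deque(instances)
    order = []
    for _ in range(slots):
        ap = queue.popleft()
        order.append(ap.name)
        queue.append(ap)  # back of the line until its next turn
    return order


# A simultaneous dual-band configuration: two 2x2 access points on one DSP.
dual_band = [ApInstance("802.11ac 2x2", 2), ApInstance("802.11n 2x2", 2)]
schedule = round_robin(dual_band, slots=4)
```

Under the assumed budget, the dual-band pair fits, and adding a third 1x1 instance would not. A production scheduler would weight time slots by each instance's real-time deadlines rather than rotating evenly.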

The CEVA-XC architecture, with its focus on low power, dedicated support

of such critical features as advanced MIMO technology, and its design

environment and libraries, has given Antcor the power and performance

that allows them to rely on software instead of hardware. Designers using

Antcor’s technology will be able to complete their designs faster – and

address a much wider market than they would otherwise have been able to.

Demos-on-demand: the latest CEVA demo for computational photography and computer vision

CEVA has assembled an extensive collection of computer vision and

computational photography kernels in their CEVA-CV library. This

library is based on OpenCV kernels and algorithms that were ported

and optimized to run on the CEVA-MM3101 for embedded devices. Most

computer vision algorithms utilize pieces of this library to significantly

speed up their development cycle. As an example, by taking advantage

of the CEVA-CV library, the CEVA Super-Resolution algorithm took less

than two months to port and fully optimize for the CEVA-MM3101.

In order to demonstrate how easy the library is to use, CEVA has built

a demo board using an Altera FPGA that implements a single

CEVA-MM3101. You can chain up to three different kernels in any order

and see how they impact a high-definition video clip of your choice.

You can provide the video from a camera or another source, and the

resulting output is sent to a screen or monitor.
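Kernel chaining of this kind can be mimicked on a tiny grayscale image in plain Python. This sketch does not use the CEVA-CV library. The function names, the 3x3 window size, the border clamping, and the |gx| + |gy| form of the Sobel magnitude are illustrative choices.

```python
from statistics import median


def neighborhood(img, y, x):
    """3x3 window around (y, x), clamping coordinates at the image border."""
    h, w = len(img), len(img[0])
    return [img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
            for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def filter3x3(img, window_fn):
    """Apply window_fn to the 3x3 neighborhood of every pixel."""
    return [[window_fn(neighborhood(img, y, x)) for x in range(len(img[0]))]
            for y in range(len(img))]


def median3(img):
    return filter3x3(img, median)


def average3(img):
    return filter3x3(img, lambda win: sum(win) / 9)


def sobel(img):
    def gradient(win):
        # Window order is row-major: indices 0-2 top row, 3-5 middle, 6-8 bottom.
        gx = (win[2] + 2 * win[5] + win[8]) - (win[0] + 2 * win[3] + win[6])
        gy = (win[6] + 2 * win[7] + win[8]) - (win[0] + 2 * win[1] + win[2])
        return abs(gx) + abs(gy)
    return filter3x3(img, gradient)


def run_chain(img, kernels):
    """Run up to three kernels back to back, as in the demo."""
    if len(kernels) > 3:
        raise ValueError("the demo chains at most three kernels")
    for kernel in kernels:
        img = kernel(img)
    return img


# A flat 5x5 image with one bright noise pixel in the middle.
noisy = [[10] * 5 for _ in range(5)]
noisy[2][2] = 255
edges_raw = run_chain(noisy, [sobel])
edges_clean = run_chain(noisy, [median3, sobel])
```

Running Sobel alone reports strong edges around the noise pixel. Running the median filter first removes the outlier, so the same Sobel kernel then finds no edges at all, which shows that the order of the chain matters.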

Another software component working behind the scenes to enable

this demo is CEVA’s SmartFrame Manager. This module handles all the

system-related tasks needed to support the CV kernels, which include

copying incoming data, transferring data into local memories, activating

DMA, and enabling tunneling of consecutive tasks.

[Figure: Antcor Single DSP - Multiple Wi-Fi Access Point Configurations, using the Multi-Mode Extension (MME). High-Profile 802.11ac: 802.11ac 4x4. Triple Simultaneous AP: 802.11ac 2x2, 802.11n 1x1, 802.11b/g. Simultaneous Dual Band: 802.11ac 2x2, 802.11n 2x2.]

CEVA DSP advisor Newsletter © CEVA 2013. All rights reserved

USA: 1943 Landings Drive, Mountain View, CA 94043. Tel: +1 (650) 417 7900. Fax: +1 (650) 417 7995

Hong Kong: Level 43, AIA Tower, 183 Electric Road, North Point, Hong Kong. Tel: +852 3975 1264

Israel: 2 Maskit Street, POBox 2068, Herzelia 46120. Tel: +972 9 961 3700. Fax: +972 9 961 3800

Ireland: 2nd Floor, 8-11 Lower Baggot St., Dublin 2. Tel: +353 1 237 3900. Fax: +353 1 237 3923

Japan: 3014 Shinoharacho, Kohoku-ku, Yokohama, Kanagawa-Ken 222-0026. Tel: +81 45 430 3901. Fax: +81 45 430 3904

South Korea: #478, Hyundai Arion, 147 Gumgok-Dong, Bundang-Gu, Sungnam-Si, Kyunggi-Do, 463-853. Tel: +82 31 704 4471. Fax: +82 31 704 4479

China: Room 517, Apollo Business Center, No.1440, Yan An Road (C), Shanghai 200040. Tel: +86 21 610 31719. Fax: +86 21 610 31720

Taiwan: Room 909, 9F, No.689, Sec.5, Chung Hsio E. Road, Hsin-Yi District, Taipei. Tel: +886 2 8785 8668. Fax: +886 2 8785 1281

Sweden: Klarabergsviadukten 70, Box 70396, 107 24 Stockholm. Tel: +46 (0)8 506 362 24. Fax: +46 (0)8 506 362 20

www.ceva-dsp.com

You can choose from the following kernels and filters:

• Median

• Average

• Sobel

• Gaussian

• FIR

• Laplacian

• Fast9

• Corner Harris

Different combinations of these kernels can be used for applications like:

• Noise reduction

• Scale down

• Deblurring

• Finding the 2nd-order derivative

• Threshold

These applications are used for finding corners, gradients, and

derivatives; finding an inverse using a lookup table; and erosion and

dilation morphology.

The demo is easy to use: Once you choose the kernels, the host

initializes the vector processor and the SmartFrame Manager tool

establishes the optimal block size, configures the DMA requests, and

handles other such housekeeping duties. At that point, the vector

processor can start applying the selected kernels.
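How a block size might be derived from local memory can be sketched as follows. This is not how the SmartFrame Manager actually works, which the newsletter does not detail. The sizing rule is an assumption made for illustration: square tiles, a one-pixel halo for 3x3 kernels, double buffering so DMA can overlap with processing, and a 64 KiB local memory.

```python
import math


def largest_tile(local_mem_bytes, bytes_per_pixel=1, halo=1, buffers=2):
    """Largest square tile edge whose input (with halo) and output buffers,
    each allocated `buffers` times, fit in local memory."""
    edge = 0
    while True:
        candidate = edge + 1
        input_pixels = (candidate + 2 * halo) ** 2  # tile plus border pixels
        output_pixels = candidate ** 2
        needed = buffers * (input_pixels + output_pixels) * bytes_per_pixel
        if needed > local_mem_bytes:
            return edge
        edge = candidate


def tiles_per_frame(width, height, edge):
    """Number of tiles (and hence DMA block requests) needed to cover a frame."""
    return math.ceil(width / edge) * math.ceil(height / edge)


edge = largest_tile(64 * 1024)               # hypothetical 64 KiB local memory
requests = tiles_per_frame(1920, 1080, edge)  # one 8-bit 1080p frame
```

Under these assumptions the tile edge comes out at 126 pixels, and a 1080p frame needs 144 block transfers. A larger halo or wider pixels shrink the tile and raise the transfer count.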

An explanatory video of the demo is available on YouTube at http://youtu.be/HLCqcvVyohk or directly on CEVA’s demo showroom under computer vision at http://events.ceva-dsp.com/showroom/. To learn more about CEVA’s computer vision demo, click: http://www.ceva-dsp.com/CEVA-MM3101

[Figure: CEVA-CV, various combinations of kernels and filters]
