DSP Advisor
> In this issue
Multi-core design gets easier with MUST: scalability, flexibility, and precision for next-generation communications
AMF abstracts DSPs in Android-based systems: offloading multimedia tasks on Android platforms
Shaky hands make better pictures: CEVA delivers Super-Resolution imaging performance for mobile devices
Mindspeed leverages CEVA DSPs for next-gen wireless products: multiple CEVA DSPs in wireless infrastructure SoCs
Antcor targets multiple Wi-Fi markets with a single SoC
Demos-on-demand: the latest CEVA demo for computational photography and computer vision
CEVA Newsletter | Spring | 2013
Multi-core design gets easier with MUST: scalability, flexibility, and precision for next-generation communications
Whether you’re a wireless modem OEM, a wireless IP provider, or a
maker of semiconductor chips for the wireless market, your job gets
harder with each generation of wireless technology. Not only do
processing demands become increasingly exacting, but deployment
configurations are evolving as well. Where once macrocells dominated
the cellular infrastructure landscape, configurations now vary according
to the location and traffic expected, from small picocells in densely packed
installations through micro-, metro-, and macrocells. Exploration
into technologies like Cloud RAN promises even more change in
the future.
This means that modem designs must be flexible enough to handle
a variety of traffic and configurations, but they also need to be able
to scale so that a given design platform can be adapted to a specific
modem design as quickly as possible. And that means that the
underlying architecture has to have the flexibility and scalability that
only a well-integrated multicore platform can provide, built around
clusters of DSP processors offering dynamic load balancing. The new
technologies themselves are also demanding much more processing
precision and throughput than current architectures provide.
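To make "dynamic load balancing" concrete, here is a minimal software model. It assumes a simple greedy policy in which each incoming task goes to the DSP core with the least queued work. This is only a sketch: the task names, cycle estimates, the `Core` class, and the `dispatch` function are hypothetical, and the article does not describe the policy CEVA's hardware actually uses.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Core:
    """Hypothetical model of one DSP core in a cluster."""
    name: str
    queued_cycles: int = 0
    tasks: list = field(default_factory=list)


def dispatch(tasks, num_cores):
    """Greedy least-loaded dispatch: each task goes to the core whose
    queue currently holds the fewest cycles of pending work."""
    cores = [Core(f"dsp{i}") for i in range(num_cores)]
    # Min-heap of (pending cycles, core index); ties go to the lower index.
    heap = [(0, i) for i in range(num_cores)]
    for task_name, cost in tasks:
        load, idx = heapq.heappop(heap)
        cores[idx].tasks.append(task_name)
        cores[idx].queued_cycles = load + cost
        heapq.heappush(heap, (cores[idx].queued_cycles, idx))
    return cores


if __name__ == "__main__":
    # Hypothetical per-subframe workload: (task, estimated cycles).
    work = [("fft_ant0", 400), ("fft_ant1", 400), ("chan_est", 300),
            ("mimo_detect", 700), ("demap", 200), ("harq", 150)]
    for core in dispatch(work, num_cores=4):
        print(core.name, core.queued_cycles, core.tasks)
```

Greedy least-loaded assignment is a classic list-scheduling heuristic. A hardware scheduler could also weigh task priority, deadlines, or data locality, none of which this model considers.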
The CEVA-XC4000 DSP core is optimized for wireless communications,
and it has already proven its value with more than 20 designs under
its belt. To increase the scope, scalability, and precision of the XC4000
architecture, CEVA has introduced new MUST multicore technology
and vector floating point capabilities. Combined with a rich set of
accelerators and tightly-coupled extensions (TCEs), the XC4000 becomes
an even more compelling solution to wireless modem complexity.
Multicore means more than simply adding cores. While
the DSPs largely serve the data plane, they must interact with each
other and with host CPUs. Depending on the design decisions, they
may be configured in homogeneous or heterogeneous, symmetric or
asymmetric arrangements. CEVA has facilitated this by using AXI-4 and
FIC (fast interconnect) schemes for easy assembly of multiple DSPs and
ARM cores. On top of the different interconnect options, CEVA offers
an advanced Data Traffic Manager that automatically manages data
transfers based on task status and data load levels.
Figure: CEVA’s MUST multicore cluster block diagram (four CEVA-XC DSP cores, each with a vector FPU, data cache, cache coherency logic, Data Traffic Manager, Queue Manager, and Buffer Manager; tightly coupled extensions (TCEs) including MLD, demapper, FFT, DFT, Viterbi, HARQ combine, and user-defined units; L2 cache with snoop filtering, shared queues, and peripheral and I/O interfaces; all connected to system elements and peripherals through the AMBA 4 Advanced Coherent bus interface)
Multicore also creates the need for cache coherency if memory is to
be shared between multiple DSP and/or CPU cores. By leveraging the
AMBA 4 ACE coherency infrastructure, the hardware can manage a variety
of caching schemes transparently, without software intervention, while
still allowing (but not requiring) software control.
Multicore helps scale a given design, but radio technologies like
high-dimension MIMO, a multiple-antenna feature of both LTE-A
and 802.11ac, require the high precision that only floating-point
data provides. A conventional floating-point unit would not meet the
performance requirements, so CEVA has added vector-level floating-point
capabilities, allowing up to 32 simultaneous floating-point operations
in a single core cycle with IEEE precision. Not only does this speed
execution, but it also makes it much easier to port algorithms that rely
on floating-point math, whether written using the ‘float’ type or
developed with tools like MATLAB, directly to the software that
will be executed on the XC4000 platform.
Meanwhile, many of the functions to be performed on incoming data
packets are just begging to be accelerated in hardware, which improves
performance and offloads the DSP cores. CEVA provides a rich set
of accelerators and TCEs for exactly this purpose, including maximum
likelihood MIMO detectors (MLD), 3G de-spreaders, DFT and FFT
units, Viterbi decoders, and log-likelihood ratio (LLR) and hybrid ARQ
(HARQ) units.
Multicore processing also benefits from a hardware scheduling
infrastructure that allows optimal, tunable task scheduling on the fly.
The CEVA TCEs, as well as any proprietary customer accelerators, are
also managed through the Data Traffic Manager, relieving the DSP of
data management in the system. A hardware abstraction layer isolates
software programmers from the hardware details, making software
simpler and easier to port and the different hardware accelerators
easier to control.
AMF abstracts DSPs in Android-based systems: offloading multimedia tasks on Android platforms
Multimedia on a phone used to be a big deal. Handling images or video
faster and with better resolution was part of the whiz-bang attraction of
a new phone. But these days, everyone expects a multimedia experience
on their phone. Your phone might stand out if it does a particularly
good job of handling media, but you have no chance of success if the
phone you design can’t handle multimedia.
High-quality multimedia is so pervasive, yet so demanding, that image, video, and audio processing are being bumped from the main CPU to dedicated engines like the CEVA-MM3101 (for photography and computer vision) and the CEVA-TeakLite-4 (for audio and speech). Through a combination of finely tailored DSPs, judicious hardware acceleration, and optimized software solutions, phones using CEVA cores process multimedia faster and with lower power, and they free up the main CPU to run more user applications.
Figure: CEVA Android Multimedia Framework (on the CPU, OpenCORE/Stagefright uses OpenMAX (OMX) components such as MP3, File Reader, Dolby Mobile, and Super-Resolution, above the CEVA Host Link Driver and a PIU hardware driver; on the DSPs, an RTOS with the CEVA DSPLink Driver and PSU hardware drivers runs the OMX MP3 decoder, OMX DM processor, and OMX Super-Resolution components; the two sides communicate through inter-processor communication (IPC))
There has been a catch, however, if you’re building an Android phone:
Android doesn’t understand offloads. As far as it’s concerned, there’s
one processor (a CPU, possibly using multiple cores), and that processor
does all the work, which leaves any multimedia accelerators high and
dry. If you’re writing multimedia code, in particular code that uses the
OpenMAX API (an open, royalty-free standard), there’s no way you can tell
Android to take advantage of deeply embedded accelerators for more
efficient execution.
To address this, CEVA has announced its Android Multimedia Framework,
or AMF. What we’ve done is to build an RTOS-managed DSP subsystem
next to the CPU; this can execute software algorithms and manage
DSP-accelerated functions. In order to make this accessible to software
executing on the CPU, we’ve built a set of drivers for the CPU and an
abstraction layer that implements the OpenMAX API.
When the CPU encounters an OpenMAX multimedia function call,
the function isn’t executed on the CPU. Instead, the call goes to the
lower-level driver, which engages the DSP subsystem, shares any
necessary data, and returns the results. Furthermore, multimedia
function calls can be chained together via the Android tunneling
mechanism, reducing data transfer overhead.
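The call path and the benefit of tunneling can be modeled in a few lines. This is a hypothetical sketch, not CEVA’s driver code and not the OpenMAX API. `DspLink` stands in for the CPU-side and DSP-side drivers, `ProxyComponent` stands in for a CPU-side OpenMAX component, and plain Python functions play the role of the DSP-side codecs. The model counts how many buffers cross between the processors.

```python
class DspLink:
    """Hypothetical stand-in for the CPU<->DSP driver pair.
    Counts every buffer that crosses the processor boundary."""

    def __init__(self):
        self.handlers = {}  # DSP-side processing functions, keyed by name
        self.crossings = 0  # buffer transfers over the link

    def register(self, name, fn):
        self.handlers[name] = fn

    def run_chain(self, names, buffer):
        self.crossings += 1          # input buffer sent to the DSP
        data = buffer
        for name in names:           # every stage runs DSP-side
            data = self.handlers[name](data)
        self.crossings += 1          # final result returned to the CPU
        return data


class ProxyComponent:
    """CPU-side stub: looks like a local multimedia component to the
    caller, but its work is executed on the DSP through the link."""

    def __init__(self, link, name):
        self.link = link
        self.name = name
        self.downstream = None

    def tunnel_to(self, other):
        """Chain this component's output directly into `other`."""
        self.downstream = other
        return other

    def process(self, buffer):
        chain = [self.name]
        node = self.downstream
        while node is not None:
            chain.append(node.name)
            node = node.downstream
        return self.link.run_chain(chain, buffer)


if __name__ == "__main__":
    link = DspLink()
    link.register("decode", lambda b: [x * 2 for x in b])  # toy "decoder"
    link.register("post", lambda b: [x + 1 for x in b])    # toy post-processing
    dec, post = ProxyComponent(link, "decode"), ProxyComponent(link, "post")

    print(post.process(dec.process([1, 2, 3])), link.crossings)  # [3, 5, 7] 4
    link.crossings = 0
    dec.tunnel_to(post)
    print(dec.process([1, 2, 3]), link.crossings)                # [3, 5, 7] 2
```

When the two stages are called separately, the data travels to the DSP and back twice, for four crossings. Once they are tunneled, the intermediate buffer stays on the DSP side and only two crossings remain.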
The CEVA AMF can work on platforms implemented on a single SoC or
across multiple chips, allowing flexibility in the underlying architecture
and system connectivity. In fact, if the CPU was originally sized on the
assumption that it would handle multimedia, you may be able to drop some
CPU cores, lowering overall cost and power. Using the CEVA platforms, you
can run compute-intensive multimedia functions with 95% less power
than they would need on the CPU.
Meanwhile, Android application writers can continue to write high-level
C code as if targeting the CPU. The CEVA AMF makes this sleight of hand
completely transparent, keeping abstraction high while providing the
performance that would otherwise require low-level coding to obtain.
If you’ve been struggling to find a way to manage multimedia better on
your mobile SoC, visit our AMF page at
http://www.ceva-dsp.com/Android-Multimedia-Framework-AMF for the details.
Your OEM customers will thank you.
Shaky hands make better pictures: CEVA delivers Super-Resolution imaging performance for mobile devices
A low-quality camera in the shaky hands of a novice might seem to be
the worst possible scenario for picture-taking. But it’s actually turning
into exactly what’s needed to get a picture that looks like it came from
a much better camera.
This isn’t about stabilizing a shaking camera using optical image
stabilization (OIS), which compensates for the camera movement at
the time the picture is taken. This is different: it’s about taking several
images very quickly, each of which is slightly offset from the others due
to even the tiniest shaking or movement of the device. Algorithms can
combine them to create a single image with apparent resolution far
greater than what the camera sensor actually provides.
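The core idea can be shown with a toy, idealized version of shift-and-add, a textbook multi-frame SR technique. This is not CEVA’s algorithm. The sketch assumes that each low-resolution pixel is a point sample and that the four frames are offset by exactly half a low-resolution pixel horizontally, vertically, and diagonally. Under those assumptions, the samples interleave onto a grid twice as fine in each direction: four frames go in, and one frame with four times the pixels comes out. Real SR must estimate the offsets, cope with arbitrary sub-pixel shifts and sensor blur, and reject moving objects.

```python
def shift_and_add_2x(frames):
    """Idealized shift-and-add: interleave four low-res frames into one
    frame with twice the resolution in each direction.

    `frames` maps a (dy, dx) offset, measured in high-res pixels (half a
    low-res pixel), to a 2D list. The offsets are assumed known and exactly
    (0, 0), (0, 1), (1, 0) and (1, 1); real SR must estimate them.
    """
    if set(frames) != {(0, 0), (0, 1), (1, 0), (1, 1)}:
        raise ValueError("expected frames at offsets (0,0), (0,1), (1,0), (1,1)")
    h, w = len(frames[(0, 0)]), len(frames[(0, 0)][0])
    out = [[0] * (2 * w) for _ in range(2 * h)]
    for (dy, dx), img in frames.items():
        for y in range(h):
            for x in range(w):
                # Each low-res sample fills its own point on the finer grid.
                out[2 * y + dy][2 * x + dx] = img[y][x]
    return out


if __name__ == "__main__":
    # A synthetic 4x4 "scene", point-sampled by a 2x2 sensor at four offsets.
    scene = [[10 * r + c for c in range(4)] for r in range(4)]
    frames = {
        (dy, dx): [[scene[2 * y + dy][2 * x + dx] for x in range(2)] for y in range(2)]
        for dy in (0, 1)
        for dx in (0, 1)
    }
    print(shift_and_add_2x(frames) == scene)  # True
```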
This technology is called Super-Resolution (SR), and it’s typically been
something you do on a PC after taking the pictures. But with CEVA’s
newly-introduced Super-Resolution algorithm, this capability can be
designed directly into the camera or phone itself.
Low-resolution phones and cameras can greatly benefit from this
technology for a couple of reasons. First, image sensors have to trade
off resolution against the ability to take low-light photos. The total
sensor area is fixed, and higher resolution means more pixels in the
same area. That means smaller pixels, and smaller pixels have less
area for receiving photons. But by putting SR capability directly into a
phone with a lower-resolution sensor, you can get excellent resolution
and good low-light performance, eliminating the tradeoff.
Figure: CEVA Multi-image Super Resolution (a low-resolution sensor captures four 5-Mpixel images, which the CEVA SR algorithm combines into a single 20-Mpixel, high-quality image)
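The tradeoff can be quantified with a rough estimate. Assume a fixed sensor area A is divided evenly among N pixels, and ignore fill factor, microlenses, and readout noise. With a as the area of one pixel, moving from 5 to 20 Mpixel on the same sensor shrinks each pixel by a factor of four:

```latex
a = \frac{A}{N}, \qquad
\frac{a_{5\,\mathrm{Mpixel}}}{a_{20\,\mathrm{Mpixel}}}
  = \frac{20 \times 10^{6}}{5 \times 10^{6}} = 4
```

Under the same illumination and exposure, each pixel of the 5-Mpixel sensor therefore collects roughly four times as many photons. That is why the lower-resolution sensor has the low-light advantage in the first place.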
The other reason why this makes particular sense for smartphones is
that such units are much more likely to be handheld. SR doesn’t work
well (or at all) if there is absolutely no change from image to image
(a static image taken on a tripod, for instance). Handheld cameras
and phones are ideal for providing those small movements needed to
make SR work. CEVA’s algorithm also has an anti-ghosting feature that eliminates
the movement artifacts that can appear with other SR algorithms.
The CEVA implementation is optimized to work as a pure software
module on the CEVA-MM3101 platform, executing efficiently and with
low power consumption. On a 28nm process, for example, CEVA’s SR
algorithm can combine four 5-Mpixel images into a single 20-Mpixel
image in less than a second, using less than 30 mW to do so. This
process is useful not only for creating higher-quality pictures but also for
improving the quality of a digital zoom feature, or even just for removing
image noise, which is of particular value when taking pictures in low light.
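Take the quoted 28nm figures at face value: under one second and under 30 mW to produce one 20-Mpixel image. From E = P × t, this gives a simple upper bound on the energy per fused image and per output pixel:

```latex
E = P\,t < (30\ \mathrm{mW})(1\ \mathrm{s}) = 30\ \mathrm{mJ}, \qquad
\frac{E}{20 \times 10^{6}\ \text{pixels}} < 1.5\ \mathrm{nJ/pixel}
```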
Because the algorithm executes so fast, this can even become the
default mode for the camera. The user might not even need to know
what’s going on and will simply get outstanding pictures from a
cost-effective phone. You can find more information on our SR page,
http://events.ceva-dsp.com/sr/, and at http://www.ceva-dsp.com/CEVA-MM3101.
Mindspeed leverages CEVA DSPs for next-gen wireless products: multiple CEVA DSPs in wireless infrastructure SoCs
Mindspeed Technologies is an established pioneer in system-on-chip
(SoC) design and has used CEVA DSPs in its Transcede family of
SoCs for the past five years. The Transcede family supports NodeB
and eNodeB wireless baseband processing and achieves market-leading
performance by incorporating the latest DSP technology from
CEVA. With the emergence of small cell base stations, volumes can
justify the development of specialized SoCs. CEVA DSPs are the ideal
solution, having the lowest power/performance ratio of any DSP and
an instruction set optimized for wireless modems.
The CEVA-XC architecture underlies a broad range of Transcede solutions
for wireless base station applications, from femtocells to macrocells.
Figure: Comparison of CEVA multi-image Super-Resolution versus others’ resolution (CEVA Super Res compared with a leading PC SR application and with bicubic interpolation)
Mindspeed has leveraged the flexibility and performance of the CEVA
DSPs, bolstered by reference architectures, a complete development
environment for programming in C, and a comprehensive suite of
optimized library functions, including LTE, LTE-A, Wi-Fi and HSPA+.
Architectural features specifically address base station application
requirements, with high precision, strong support for high-dimension
MIMO technology, and advanced support for DSP offloading to
dedicated pre-optimized tightly coupled extensions (TCEs). The
architecture also includes specialized multi-core features such as
advanced data traffic management, fast system interconnect support
for easy integration of DSP clusters, and native connectivity with
ARM® processors.
Transcede SoCs are optimally balanced to meet the performance of their
targeted base station market, whether residential, enterprise, or pico.
As can be seen from the diagram, the number of functional blocks can
be scaled to meet the performance requirements of the application. In
addition, since updates to new versions of the 3GPP standards can be
handled by software upgrades, a solution built on programmable
CPUs and DSPs is “future-proof”.
Furthermore, an array of CEVA DSPs allows each DSP to be dynamically loaded with the code image needed during frame processing, which makes the design much more flexible than fixed-function implementations.
The CEVA DSP software development environment, CEVA-ToolBox, allows Mindspeed to write optimal DSP code that can be ported across the CEVA-XC family and across ARM cores running real-time Linux. Highly
optimized DSP libraries from CEVA support code re-use.
Examples of functions that run on CEVA DSPs are symbol processing,
channel estimation, map/de-map, physical signals, and MIMO
processing. The CEVA DSPs are ideal for optimized baseband processing
due to their wide adoption by leading user equipment (UE) vendors.
Based on Mindspeed’s prior success with CEVA DSPs, the company has
decided to use the new CEVA-XC4000 in its next-generation Transcede
SoCs. The enhanced power-optimized pipeline and TCEs make this
processor ideally suited for Transcede SoCs as they implement the most
advanced wireless standards, including LTE-Advanced, Wi-Fi 802.11ac,
multi-carrier HSPA+, and 5G Wi-Fi.
Antcor targets multiple Wi-Fi markets with a single SoC
With the proliferation of communication devices and standards, it’s
impossible to design dedicated chips for every combination. CEVA’s
technology lets architects and designers use software instead of
hardware to manage this complexity, mixing and configuring standards
on a single hardware platform. The benefit is that a one-time hardware
investment can be applied to multiple markets.
Figure: Mindspeed Transcede SoC implementing CEVA-XC DSP array (an ARM MPCore cluster and a CEVA-XC DSP array with FEC hardware, a filter processing array, memory-to-memory DMA engines, L2 memory and L2 cache, a DDR3 controller and PHY, a GEMAC/RGMII network interface, a CPRI radio interface, PCIe, and expansion buses to peripherals including USB, GPIO, UART, and CoreSight, linked by AXI multi-layer CEVA and system bus matrices)
But for devices intended to support only Wi-Fi, hardware has traditionally
been the methodology of choice. The challenge of a hardware approach
is that time-to-market pressures and cost constraints end up limiting the
scope of markets that each chip can address.
Instead of following the usual hardware methodology, Antcor, the designer
of the Proteus family of Wi-Fi IP, has used a software approach,
taking advantage of the CEVA-XC family to develop its Multi-Mode
Extension (MME) capability. MME is a software scheduler that creates
multi-threaded instances of Proteus that share the same DSP and
hardware resources.
By using software instead of hardware, a single platform can manage
a variety of tasks. Proteus can be configured either to implement a
single 4x4 802.11ac access point or multiple 3x3, 2x2, or 1x1 access
points. This allows Antcor customers to address all of these applications
with a single SoC design instead of requiring multiple separate chips.
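MME’s internals are not described here, but the general idea of several access-point instances sharing one DSP can be sketched. The following toy model is entirely hypothetical. It assumes that the shared DSP and radio resources can be expressed as a budget of four antenna chains (one 4x4 AP), and it counts 802.11b/g as a single chain. The model admits a configuration only if it fits that budget, then time-slices the admitted instances round-robin. The budget is chosen to match the configurations in Antcor’s single-DSP diagram; the names `ApInstance`, `admit`, and `round_robin` are illustrative only.

```python
from dataclasses import dataclass
from itertools import cycle, islice


@dataclass(frozen=True)
class ApInstance:
    """Hypothetical Wi-Fi access-point instance sharing one DSP."""
    standard: str
    chains: int  # MIMO order, e.g. 2 for a 2x2 access point


MAX_CHAINS = 4  # assumption: shared resources equal one 4x4 access point


def admit(instances):
    """Accept a mix of AP instances only if their chains fit the budget."""
    return sum(ap.chains for ap in instances) <= MAX_CHAINS


def round_robin(instances, slots):
    """Name the instance served in each of `slots` scheduler time slots."""
    return [ap.standard for ap in islice(cycle(instances), slots)]


HIGH_PROFILE = [ApInstance("802.11ac", 4)]
TRIPLE_AP = [ApInstance("802.11ac", 2), ApInstance("802.11n", 1),
             ApInstance("802.11b/g", 1)]  # assumption: b/g counted as 1 chain
DUAL_BAND = [ApInstance("802.11ac", 2), ApInstance("802.11n", 2)]

if __name__ == "__main__":
    for label, cfg in [("high profile", HIGH_PROFILE),
                       ("triple AP", TRIPLE_AP),
                       ("dual band", DUAL_BAND)]:
        print(label, admit(cfg), round_robin(cfg, 4))
```

Equal round-robin slots are the simplest possible policy. A real scheduler might weight slots by each instance’s chain count or traffic load instead.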
The CEVA-XC architecture, with its focus on low power, dedicated support
of such critical features as advanced MIMO technology, and its design
environment and libraries, has given Antcor the power and performance
needed to rely on software instead of hardware. Designers using
Antcor’s technology will be able to complete their designs faster and
address a much wider market than they otherwise could.
Demos-on-demand: the latest CEVA demo for computational photography and computer vision
CEVA has assembled an extensive collection of computer vision and
computational photography kernels in its CEVA-CV library. This
library is based on OpenCV kernels and algorithms that were ported
and optimized to run on the CEVA-MM3101 for embedded devices. Most
computer vision algorithms can be built partly from pieces of this library,
which significantly speeds up their development cycle. As an example, by taking advantage
of the CEVA-CV library, the CEVA Super-Resolution algorithm took less
than two months to port and fully optimize for the CEVA-MM3101.
To demonstrate how easy the library is to use, CEVA has built
a demo board using an Altera FPGA that implements a single
CEVA-MM3101. You can chain up to three different kernels in any order
and see how they impact a high-definition video clip of your choice.
You can provide the video from a camera or another source, and the
resulting output is sent to a screen or monitor.
Another software component working behind the scenes to enable
this demo is CEVA’s SmartFrame Manager. This module handles all the
system-related tasks needed to support the CV kernels, which include
copying incoming data, transferring data into local memories, activating
DMA, and enabling tunneling of consecutive tasks.
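The kernel chaining and block-based data movement described above can be modeled in plain Python. This is a hypothetical sketch, not the CEVA-CV library or the SmartFrame Manager API. `apply_chain` runs one to three 3x3 kernels in any order; only average, Gaussian, and Laplacian kernels are included here. `apply_chain_tiled` processes the frame in horizontal strips. Each strip is fetched with enough extra border rows (a halo) that the tiled result matches whole-frame processing. The strip height stands in for the block size that a tool like SmartFrame Manager would choose. Clamp-to-edge borders are an arbitrary choice.

```python
# Hypothetical 3x3 kernels: (weights, integer divisor).
KERNELS = {
    "average": ([[1, 1, 1], [1, 1, 1], [1, 1, 1]], 9),
    "gaussian": ([[1, 2, 1], [2, 4, 2], [1, 2, 1]], 16),
    "laplacian": ([[0, 1, 0], [1, -4, 1], [0, 1, 0]], 1),
}


def filter3x3(img, weights, divisor):
    """Apply a 3x3 kernel with clamp-to-edge borders. This computes a
    correlation, which equals convolution for these symmetric kernels."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            acc = 0
            for ky in range(3):
                for kx in range(3):
                    yy = min(max(y + ky - 1, 0), h - 1)
                    xx = min(max(x + kx - 1, 0), w - 1)
                    acc += weights[ky][kx] * img[yy][xx]
            row.append(acc // divisor)  # floor division keeps integer pixels
        out.append(row)
    return out


def apply_chain(img, chain):
    """Run one to three kernels back to back, in the order given."""
    if not 1 <= len(chain) <= 3:
        raise ValueError("chain one to three kernels")
    for name in chain:
        img = filter3x3(img, *KERNELS[name])
    return img


def apply_chain_tiled(img, chain, strip_rows):
    """Process the frame in horizontal strips, as if each strip plus a halo
    were copied into local memory. Each 3x3 stage invalidates one more
    border row, so the halo is one row per kernel in the chain."""
    h = len(img)
    halo = len(chain)
    out = []
    for r0 in range(0, h, strip_rows):
        r1 = min(r0 + strip_rows, h)
        top, bottom = max(0, r0 - halo), min(h, r1 + halo)
        block = apply_chain(img[top:bottom], chain)
        out.extend(block[r0 - top : r1 - top])  # keep only the valid rows
    return out


if __name__ == "__main__":
    frame = [[(7 * x + 13 * y) % 50 for x in range(8)] for y in range(10)]
    chain = ["gaussian", "average", "laplacian"]
    print(apply_chain_tiled(frame, chain, strip_rows=3) == apply_chain(frame, chain))  # True
```

The halo is the cost of tiling: each interior strip reads 2 × (number of kernels) rows that it does not output. Larger strips amortize that overhead, which illustrates why the choice of block size matters.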
Figure: Antcor Single DSP - Multiple Wi-Fi Access Point Configurations (on a single DSP, the Multi-Mode Extension (MME) supports a high-profile 802.11ac 4x4 access point; a triple simultaneous AP of 802.11ac 2x2, 802.11n 1x1, and 802.11b/g; or a simultaneous dual-band pair of 802.11ac 2x2 and 802.11n 2x2)
You can choose from the following kernels and filters:
• Median
• Average
• Sobel
• Gaussian
• FIR
• Laplacian
• Fast9
• Corner Harris
Different combinations of these kernels can be used for applications
like:
• Noise reduction
• Scale down
• Deblurring
• Finding the 2nd-order derivative
• Threshold
These applications are used for finding corners, gradients, and
derivatives; finding an inverse using a lookup table; and erosion and
dilation morphology.
The demo is easy to use. Once you choose the kernels, the host
initializes the vector processor, and the SmartFrame Manager tool
establishes the optimal block size, configures the DMA requests, and
handles other housekeeping duties. At that point, the vector
processor can start applying the selected kernels.
An explanatory video of the demo is available on YouTube at
http://youtu.be/HLCqcvVyohk or directly in CEVA’s demo showroom under
computer vision at http://events.ceva-dsp.com/showroom/. To learn
more about CEVA’s computer vision demo, visit
http://www.ceva-dsp.com/CEVA-MM3101.
Figure: CEVA-CV various combinations of kernels and filters
CEVA DSP Advisor Newsletter © CEVA 2013. All rights reserved.
USA: 1943 Landings Drive, Mountain View, CA 94043. Tel: +1 (650) 417 7900 Fax: +1 (650) 417 7995
Hong Kong: Level 43, AIA Tower, 183 Electric Road, North Point, Hong Kong. Tel: +852 3975 1264
Israel: 2 Maskit Street, P.O. Box 2068, Herzelia 46120. Tel: +972 9 961 3700 Fax: +972 9 961 3800
Ireland: 2nd Floor, 8-11 Lower Baggot St., Dublin 2. Tel: +353 1 237 3900 Fax: +353 1 237 3923
Japan: 3014 Shinoharacho, Kohoku-ku, Yokohama, Kanagawa-Ken 222-0026. Tel: +81 45 430 3901 Fax: +81 45 430 3904
South Korea: #478, Hyundai Arion, 147 Gumgok-Dong, Bundang-Gu, Sungnam-Si, Kyunggi-Do, 463-853. Tel: +82 31 704 4471 Fax: +82 31 704 4479
China: Room 517, Apollo Business Center, No. 1440 Yan An Road (C), Shanghai 200040. Tel: +86 21 610 31719 Fax: +86 21 610 31720
Taiwan: Room 909, 9F, No. 689, Sec. 5, Chung Hsio E. Road, Hsin-Yi District, Taipei. Tel. +886 2 8785 8668 Fax. +886 2 8785 1281
Sweden: Klarabergsviadukten 70, Box 70396, 107 24 Stockholm. Tel: +46 (0)8 506 362 24 Fax: +46 (0)8 506 362 20
www.ceva-dsp.com