audio compression & synthesis technology

Audio Compression & Synthesis Technology Overview

Adam Chang, MSEE

Product Marketing Manager

Contents

Audio Compression Technology Overview

Audio Synthesis Technology Overview

Speech Compression Overview

MXIC Solution to Digital Audio & Speech Applications

Speech Product Offering

Digital Audio Product Offering

Summary


Audio Compression Technologies

A wide range of audio compression technologies is available, but few of them have been truly commercialized.

Owing to Internet music, MPEG-1/Audio Layer 3 (so-called MP3) has become the most successful audio compression technology.

In addition to internet music, Audio compression technologies are applied to:

Portable solid-state Audio recorder

Internet Radio

DAB (Digital Audio Broadcast) system

Audio accessories of portable devices (Cell phone, PDA, …)

MPEG-1/Audio Layer 3 Coding

MPEG-1/Audio Layer 3 is now widely known as MP3

Low bit-rate application: 64 kbps for a mono channel

Sampling frequency: 32, 44.1, or 48 kHz

Lossy compression algorithm: 12-to-1 compression ratio
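As a rough check on the 12-to-1 figure, the sketch below compares the bit rate of uncompressed PCM with the 64 kbps mono rate quoted above. The 16-bit sample width is an assumption (the slide does not state the source format), and the function names are illustrative only.

```python
def pcm_bit_rate_kbps(sample_rate_hz, bits_per_sample=16, channels=1):
    """Bit rate of uncompressed PCM audio in kbit/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0


def compression_ratio(sample_rate_hz, coded_kbps, bits_per_sample=16, channels=1):
    """Ratio of the uncompressed PCM bit rate to the coded bit rate."""
    pcm_kbps = pcm_bit_rate_kbps(sample_rate_hz, bits_per_sample, channels)
    return pcm_kbps / coded_kbps


if __name__ == "__main__":
    # Assumed source: 16-bit mono PCM at each of the three sampling rates.
    for fs in (32000, 44100, 48000):
        ratio = compression_ratio(fs, coded_kbps=64)
        print(f"{fs} Hz: {pcm_bit_rate_kbps(fs):.1f} kbps PCM, ratio {ratio:.2f}:1")
```

Under these assumptions the ratio is exactly 12:1 at 48 kHz and about 11:1 at 44.1 kHz.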

Audio Encoding System Overview

[Block diagram: Digital Audio Input -> Filter Bank -> Bit or Noise Allocation -> Bitstream Formatting -> Encoded Bitstream. The Psychoacoustic Model analyzes the input and supplies the SMR (Signal-to-Mask Ratio) to the Bit or Noise Allocation stage.]

Hybrid Filter Bank

A polyphase filter bank divides the audio signal into 32 equal-width frequency subbands.

The filter outputs are then processed with an MDCT (Modified Discrete Cosine Transform).
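The MDCT step can be illustrated with a minimal sketch. It uses the textbook MDCT definition with a sine window and 50% overlap; the block size of 36 samples (18 coefficients per subband, so that 32 subbands give 576 coefficients) is chosen to match the numbers in this deck, but the code is an unoptimized illustration and not the encoder's actual implementation. All function names are hypothetical.

```python
import math


def sine_window(length):
    """Sine window; satisfies w[n]^2 + w[n + length/2]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(block):
    """Forward MDCT: 2N windowed samples in, N coefficients out."""
    n_coef = len(block) // 2
    return [
        sum(
            block[n] * math.cos(math.pi / n_coef * (n + 0.5 + n_coef / 2) * (k + 0.5))
            for n in range(2 * n_coef)
        )
        for k in range(n_coef)
    ]


def imdct(coefs):
    """Inverse MDCT: N coefficients in, 2N time-aliased samples out."""
    n_coef = len(coefs)
    return [
        (2.0 / n_coef) * sum(
            coefs[k] * math.cos(math.pi / n_coef * (n + 0.5 + n_coef / 2) * (k + 0.5))
            for k in range(n_coef)
        )
        for n in range(2 * n_coef)
    ]


def analyze_and_resynthesize(signal, n_coef=18):
    """Window, MDCT, IMDCT, window again, and overlap-add with 50% overlap."""
    window = sine_window(2 * n_coef)
    output = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n_coef + 1, n_coef):
        block = [signal[start + n] * window[n] for n in range(2 * n_coef)]
        restored = imdct(mdct(block))
        for n in range(2 * n_coef):
            output[start + n] += restored[n] * window[n]
    return output
```

Each block of 2N samples yields only N coefficients, yet the overlap-add cancels the time-domain aliasing, so every sample covered by two blocks is recovered. In a real coder the coefficients are quantized between the forward and inverse transforms.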

Psychoacoustic Model

The incoming signal is transformed from the time domain to the frequency domain for analysis.

The psychoacoustic model calculates the SMR (Signal-to-Mask Ratio) for each band using properties of auditory perception such as simultaneous masking, temporal masking, and the absolute threshold of hearing.

The SMR of each band has a direct impact on the compression rate and the audio quality.

Different psychoacoustic models are chosen according to the trade-off between audio quality and compression rate.

Noise/Bit Allocation

Based on the SMR from the psychoacoustic model and the bit-rate restriction, the 576 frequency coefficients are grouped into scale factor bands.

Each scale factor band performs noise (or bit) allocation by repeatedly adjusting its scale factor and the global gain until distortion is minimized.

The coefficients are then quantized non-uniformly and Huffman coded.
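The rate-control part of this loop can be sketched as follows. This is a simplified illustration, not the Layer 3 algorithm itself: it uses a 3/4 power-law quantizer in the spirit of Layer 3, but the bit-cost estimate is a crude stand-in for the Huffman tables, the scale-factor (distortion) adjustment is omitted, and the function names, starting step, and growth factor are all assumptions.

```python
import math


def quantize(coefs, step):
    """Power-law (non-uniform) quantizer: larger values get coarser steps."""
    return [int(math.copysign(round(abs(c) ** 0.75 / step), c)) for c in coefs]


def dequantize(codes, step):
    """Inverse of quantize(): undo the 3/4 power law."""
    return [math.copysign((abs(q) * step) ** (4.0 / 3.0), q) for q in codes]


def estimated_bits(codes):
    """Crude stand-in for the Huffman coding cost (assumption, not MP3 tables)."""
    return sum(1 if q == 0 else 2 + abs(q).bit_length() for q in codes)


def fit_to_budget(coefs, bit_budget, step=0.05, growth=2 ** 0.25):
    """Coarsen the quantizer step until the coded band fits the bit budget."""
    if bit_budget < len(coefs):
        raise ValueError("budget cannot be met even with all-zero codes")
    while True:
        codes = quantize(coefs, step)
        if estimated_bits(codes) <= bit_budget:
            return step, codes
        step *= growth
```

A tighter bit budget forces a larger step and therefore more quantization noise, which is the trade-off the SMR is used to steer.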

Audio Compression Technologies Comparison

Technology | Bit rate (kbit/s) | Advantages | Drawbacks
MP3 | 128 | (1) Internet music standard; (2) Easy to implement in silicon LSI | (1) Bit rate is too high
MP3PRO | 64 | (1) High compression ratio; (2) Extension of MP3 | (1) IP owned by Thomson Multimedia; (2) No encoder IC available
AAC | 96 | (1) Excellent audio quality; (2) High compression ratio | (1) Not an Internet music standard; (2) No encoder IC available
WMA | 96 | (1) Excellent audio quality; (2) High compression ratio | (1) IP owned by Microsoft; (2) No encoder IC available
ATRAC3 | 72 | (1) Excellent audio quality; (2) High compression ratio | (1) IP owned by Sony; (2) No encoder IC available

Audio Compression Technologies: Brief Summary

MP3 is the most mature technology, and its encoder is easy to implement in silicon LSI

Among newly developed audio compression technologies, MP3PRO is the most promising, because:

It is backward compatible with MP3

It needs the lowest bit rate for the same audio quality as MP3

Its encoder is easier to implement in silicon LSI

Thomson Multimedia is aggressively promoting it as the new Internet music standard


Audio Synthesis Technologies

Audio synthesis is a method of producing sounds in which no acoustic sound source is used

Among audio synthesis technologies, FM (Frequency Modulation) and wavetable synthesis are now the mainstream

Audio synthesis technologies are now widely used in applications such as:

Music Keyboard

Cell phone sound generator

Toys

Melody accessories

Wavetable Synthesis Technology

u-Law Compression

Sound Model

Loop

Envelope Control

Pitch shift

Interpolation

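u-Law Compression

Converts linear 16-bit samples into 8-bit codes

Assume all samples are fractional values between -1 and 1

Compression:

$$F(s) = \operatorname{sign}(s)\,\frac{\log(1 + 255\,|s|)}{\log(1 + 255)}$$

Expansion:

$$F^{-1}(s) = \operatorname{sign}(s)\,\frac{256^{|s|} - 1}{255}$$

[Figure: compression curve, 16-bit linear samples (horizontal axis) versus 8-bit u-Law codes (vertical axis)]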
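The companding formula can be tried out with a short sketch. It applies the continuous u-Law curve with the constant 255 and then rounds to 8 bits; this is an illustration of the formula only and is not bit-exact with the segmented encoding used in telephony standards. The function names and the 0..255 code mapping are assumptions.

```python
import math

MU = 255  # companding constant used in the compression formula


def mu_law_compress(s):
    """Map a sample in [-1, 1] to a companded value in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(s)) / math.log1p(MU), s)


def mu_law_expand(y):
    """Inverse of mu_law_compress()."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)


def encode_16_to_8(sample):
    """Linear 16-bit sample (-32768..32767) to an 8-bit code (0..255)."""
    s = max(-1.0, min(1.0, sample / 32768.0))
    return int(round((mu_law_compress(s) + 1.0) * 127.5))


def decode_8_to_16(code):
    """8-bit code (0..255) back to an approximate linear 16-bit sample."""
    s = mu_law_expand(code / 127.5 - 1.0)
    return int(round(s * 32767))
```

Because the curve is logarithmic, small samples are coded with fine resolution and large samples with coarse resolution, so the relative error stays roughly constant across the range.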

A Typical Waveform of Sound

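Sound Model

ADSR Model

A (Attack), D (Decay), S (Sustain), R (Release)

For non-percussive instruments (e.g. violin)

[Figure: ADSR envelope for a non-percussive instrument. Horizontal axis: time, with note on and note off marked; vertical axis: amplitude attenuation, with 0 dB at the top; segments labeled A, D, S, R]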

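Sound Model

ADSR Model

For percussive instruments (e.g. piano, drum)

[Figure: ADSR envelope for a percussive instrument. Horizontal axis: time, with note on and note off marked; vertical axis: amplitude attenuation, with 0 dB at the top; segments labeled A, D, S, R]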
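A minimal sketch of an ADSR envelope generator is given below. It uses straight-line segments and a linear amplitude scale from 0 to 1, whereas the figures plot attenuation in dB; the parameter names are hypothetical, and modeling a percussive sound with a sustain level of 0 is an assumption about the second figure.

```python
def adsr_envelope(attack_s, decay_s, sustain_level, release_s, held_s, sample_rate):
    """Piecewise-linear ADSR envelope, one amplitude (0..1) per sample."""
    attack = int(round(attack_s * sample_rate))
    decay = int(round(decay_s * sample_rate))
    release = int(round(release_s * sample_rate))
    held = int(round(held_s * sample_rate))

    envelope = []
    for i in range(held):  # from note on to note off
        if i < attack:
            level = i / attack
        elif i < attack + decay:
            level = 1.0 - (1.0 - sustain_level) * (i - attack) / decay
        else:
            level = sustain_level
        envelope.append(level)

    start = envelope[-1] if envelope else 0.0
    for i in range(release):  # after note off
        envelope.append(start * (1.0 - (i + 1) / release))
    return envelope


if __name__ == "__main__":
    violin_like = adsr_envelope(0.01, 0.02, 0.5, 0.03, 0.1, 1000)
    piano_like = adsr_envelope(0.005, 0.05, 0.0, 0.01, 0.1, 1000)
    print(len(violin_like), len(piano_like))
```

Multiplying each output sample of the wavetable by the envelope value at the same index shapes the loudness of the note over time.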

Loop

Envelope Control

Pitch shift

Use one sampled note, or a limited number of them, to generate all the notes to be played

Access the stored sample memory at different rates during playback

[Figure: a pointer stepping through sample memory at rate fs reproduces the original pitch; stepping at 2fs shifts the pitch up by one octave]

Interpolation
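Pitch shifting and interpolation can be sketched together: the read pointer advances through the sample memory by a fractional increment, and linear interpolation fills in values that fall between stored samples. This is a minimal illustration that treats the whole table as one loop; the function names are hypothetical, and real synthesizers may use higher-order interpolation.

```python
import math


def semitone_ratio(semitones):
    """Pointer increment that shifts pitch by the given number of semitones."""
    return 2.0 ** (semitones / 12.0)


def play_wavetable(table, increment, num_samples):
    """Read a looped wavetable at a fractional rate with linear interpolation."""
    output = []
    position = 0.0
    size = len(table)
    for _ in range(num_samples):
        index = int(position)
        fraction = position - index
        current = table[index % size]
        following = table[(index + 1) % size]
        output.append(current + fraction * (following - current))
        position = (position + increment) % size
    return output


if __name__ == "__main__":
    one_cycle = [math.sin(2.0 * math.pi * i / 64) for i in range(64)]
    octave_up = play_wavetable(one_cycle, semitone_ratio(12), 64)
    fifth_up = play_wavetable(one_cycle, semitone_ratio(7), 64)
    print(len(octave_up), len(fifth_up))
```

An increment of 2 skips every other stored sample and raises the pitch by one octave, matching the fs versus 2fs case in the pitch shift slide.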

Wavetable System Implementation

[Block diagram: MIDI IN -> Microprocessor (with RAM and Program ROM) -> Wavetable Synthesizer (with Wavetable ROM) -> DAC -> Audio Out (L) and Audio Out (R)]

FM (Frequency Modulation)

FM is the process of varying the frequency of a signal, often periodically.

FM (Frequency Modulation)

The fundamental principle of an FM sound generator is to synthesize tones by combining a “modulation signal” and a “carrier” signal.

[Figure: FM modulation. A modulator, set by a parameter, modulates a sine-wave carrier to create the output sound; different parameter settings create an oblong (square) wave, a sawtooth wave, or a pyramidal (triangle) wave]

FM (Frequency Modulation)

A device that produces a “carrier” or a “modulator” is called an “operator”

At least two operators are required to generate the sound of a musical instrument.

For percussion instruments, at least four operators are required for decent instrumental sound quality
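The two-operator case can be sketched with the classic formulation in which a sine modulator is added to the phase of a sine carrier. This is a minimal illustration only; the parameter names (`index` for the modulation depth, the 8 kHz sample rate) are assumptions and do not describe any particular FM chip.

```python
import math


def fm_tone(carrier_hz, modulator_hz, index, duration_s, sample_rate=8000):
    """Two-operator FM: a sine modulator varies the phase of a sine carrier."""
    count = int(round(duration_s * sample_rate))
    samples = []
    for i in range(count):
        t = i / sample_rate
        modulator = math.sin(2.0 * math.pi * modulator_hz * t)
        samples.append(math.sin(2.0 * math.pi * carrier_hz * t + index * modulator))
    return samples


if __name__ == "__main__":
    pure = fm_tone(440.0, 220.0, index=0.0, duration_s=0.01)
    bright = fm_tone(440.0, 220.0, index=2.0, duration_s=0.01)
    print(len(pure), len(bright))
```

With an index of 0 the output is the plain carrier; raising the index adds sidebands around the carrier, which is how a small set of parameters can produce very different timbres.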

Audio Synthesis Comparison

Theoretically, FM and wavetable synthesis can achieve the same audio quality.

Technology | Advantages | Drawbacks
Wavetable Synthesis | (1) Easy to implement; (2) Consistent quality | (1) Cost
Frequency Modulation | (1) Cost | (1) Not easy to implement; (2) Inconsistent quality


Speech Compression Technologies

In the last decade, we have seen rapid progress in speech technologies.

Present speech coders tend to be “source-specific” and “hearing-specific” in order to reach low bit rates.

Speech compression technologies are now widely used in applications such as:

Digital telecommunication devices (Cell phone, ISDN, DECT, SST, DAM, …)

Digital voice recording accessories for Cell phone, PDA, DSC, ...

Electronic language learning solutions

Toys

Quality Measures

Unlike audio compression, speech compression has an established quality measure called MOS (Mean Opinion Score)

MOS (Mean Opinion Score) | Impairment scale
5 | Imperceptible
4 | Perceptible, but not annoying
3 | Slightly annoying
2 | Annoying
1 | Very annoying

Major Speech Coders

Type of coder | Bit rate (kbit/s) | MOS
PCM | 64 | 4.3
ADPCM | 32 | 4.1
GSM | 13 | 3.8
CELP | 4.8 | 3.3
LPC | 2.4 | 2.6

Waveform Coding

PCM (Pulse Code Modulation)

[Figure: an analog input waveform is sampled and each sample is quantized to one of the levels 0000 to 0111. Quantized output: 0001 0111 0110 0100 0011 0101 0110 0111 0111 0101 0010 0000 0110 0100]
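PCM quantization can be sketched as a uniform quantizer that maps each sample to the index of the interval it falls in. This is a minimal illustration with hypothetical function names; the 4-bit default is an arbitrary choice. For reference, the 64 kbit/s PCM entry in the coder table corresponds to telephone-grade sampling at 8 kHz with 8 bits per sample.

```python
def pcm_encode(samples, bits=4, full_scale=1.0):
    """Uniformly quantize samples in [-full_scale, full_scale] to integer codes."""
    levels = 1 << bits
    codes = []
    for x in samples:
        x = max(-full_scale, min(full_scale, x))
        index = int((x + full_scale) / (2.0 * full_scale) * levels)
        codes.append(min(index, levels - 1))
    return codes


def pcm_decode(codes, bits=4, full_scale=1.0):
    """Map each code back to the midpoint of its quantization interval."""
    levels = 1 << bits
    step = 2.0 * full_scale / levels
    return [-full_scale + (code + 0.5) * step for code in codes]


if __name__ == "__main__":
    analog = [-0.9, -0.33, 0.2, 0.77]
    codes = pcm_encode(analog)
    print([format(code, "04b") for code in codes], pcm_decode(codes))
```

Each extra bit halves the step size, so the worst-case quantization error is half a step: `full_scale / 2**bits`.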

Waveform Coding

ADPCM (Adaptive Differential Pulse Code Modulation)

Analysis of speech waveforms shows a high sample-to-sample correlation.

ADPCM was developed to further reduce the bit rate while preserving the overall speech quality.

[Block diagram, encoder and decoder: the difference d(n) between the linear input signal X(n) and X(n-1), the estimate of the last input sample held in a Z^-1 delay, is quantized with step size ss(n) to give the ADPCM output sample L(n); a step size calculation block produces the adjusted step size ss(n+1). The decoder mirrors these steps to reconstruct X(n).]
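The idea behind the encoder and decoder can be sketched as follows. This is a simplified adaptive differential coder for illustration: the step-size rule (grow by 1.5 after large codes, shrink by 0.9 after small ones) and all names are assumptions, and it is not compatible with any standardized ADPCM format. With 4-bit codes at an 8 kHz sampling rate the bit rate would be 32 kbit/s, as in the coder table.

```python
def _next_step(step, code, max_code):
    """Grow the step after large codes, shrink it after small ones."""
    step *= 1.5 if abs(code) > max_code // 2 else 0.9
    return max(1.0, min(step, 4096.0))


def adpcm_encode(samples, bits=4, first_step=16.0):
    """Encode each sample as a small signed code of the quantized difference."""
    max_code = (1 << (bits - 1)) - 1
    estimate, step, codes = 0.0, first_step, []
    for x in samples:
        code = int(round((x - estimate) / step))
        code = max(-max_code, min(max_code, code))
        codes.append(code)
        estimate += code * step  # same update the decoder will make
        step = _next_step(step, code, max_code)
    return codes


def adpcm_decode(codes, bits=4, first_step=16.0):
    """Rebuild the signal by accumulating the scaled codes."""
    max_code = (1 << (bits - 1)) - 1
    estimate, step, output = 0.0, first_step, []
    for code in codes:
        estimate += code * step
        output.append(estimate)
        step = _next_step(step, code, max_code)
    return output
```

The encoder predicts from its own reconstructed samples rather than from the true input, so the decoder, which sees only the codes, stays in step with it.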

Source Coding

Speech is produced when air is forced from the lungs through the vocal cords and along the vocal tract.

Voiced sounds are produced when the vocal cords vibrate open and closed, creating quasi-periodic pulses.

Unvoiced sounds result when the excitation is a noise-like turbulence.

[Figure labels: (A) Periodic Signal, (B) Variable Signal, (C) Output sound]

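Source Coding

LPC (Linear Predictive Coder)

[Block diagram: a pulse generator and a white noise generator are selected by a voiced/unvoiced control; the excitation is scaled by an amplitude multiplier and shaped by stages P1, P2, P3, ..., Pn, each set by a formant frequency and bandwidth, and the results are summed to form the speech signal]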
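The source-filter idea behind LPC can be sketched with a pulse or noise excitation driving a recursive filter. Note that this sketch uses a single direct-form all-pole filter rather than the bank of formant stages drawn in the diagram, and that the function names, pitch period, and coefficients are illustrative assumptions.

```python
import random


def excitation(voiced, count, pitch_period=80, seed=0):
    """Pulse train for voiced sounds, white noise for unvoiced sounds."""
    if voiced:
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(count)]
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(count)]


def lpc_synthesize(source, coefficients, gain=1.0):
    """All-pole filter: y[n] = gain * source[n] + sum(a[k] * y[n - 1 - k])."""
    output = []
    for n, value in enumerate(source):
        y = gain * value
        for k, a in enumerate(coefficients):
            if n - 1 - k >= 0:
                y += a * output[n - 1 - k]
        output.append(y)
    return output


if __name__ == "__main__":
    # Hypothetical two-coefficient filter with one resonance (stable poles).
    voiced = lpc_synthesize(excitation(True, 400), [1.3, -0.8], gain=0.5)
    unvoiced = lpc_synthesize(excitation(False, 400), [1.3, -0.8], gain=0.1)
    print(len(voiced), len(unvoiced))
```

Only the voiced/unvoiced flag, the pitch period, the gain, and the filter coefficients need to be transmitted per frame, which is why source coders reach very low bit rates.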

Hybrid Coding

Hybrid coding is an analysis-by-synthesis approach.

The encoder analyzes the input speech by synthesizing many different approximations to it, then transmits information representing the synthesis filter parameters and the excitation to the decoder.

[Block diagram, encoder: Excitation Generation produces u(n), which drives the Synthesis Filter to give s’(n); s’(n) is subtracted from the input speech s(n) to give the error e(n), which passes through Error Weighting to give ew(n); Error Minimization then controls Excitation Generation. Decoder: Excitation Generation -> u(n) -> Synthesis Filter -> s’(n), the reproduced speech]

Speech Compression Technologies: Brief Summary

Typically, waveform coding (like ADPCM) is used at high bit rates and gives very good quality speech.

Source coding (like LPC) operates at very low bit rates but tends to produce speech that sounds synthetic.

Hybrid coding (like CELP) uses techniques from both source and waveform coding and gives good quality speech at intermediate bit rates.

[Figure: MOS (1 to 5) versus bit rate (1, 2, 4, 8, 16, 32, 64 Kbps) for source coding, hybrid coding, and waveform coding]


Speech Product Offering - I: ELL (Electronic Language Learning)

ELL System Block Diagram

MCU core (6502, Z80, 8051)

ROM (Program, data)

SRAM (Data buffer, PIM)

A/D (Voice input, Pen-input)

DSP (LRC, synthesizer)

D/A (Voice output)

Memory Card

Flash

PC

USB

LCD Module (Display)

I/O & Peripherals (Keyboard, battery, ...)

* Red blocks mark the components or technologies that MXIC can provide.

MXIC ELL Product Features: THV - True Human Voice

What is “True Human Voice”?

Record human voice -> Compression in PC -> Code stored in ROM -> DSP decodes and plays back

What can MXIC provide for a THV solution?

MXIC has a 1.2K/2.0Kbps LRC (Low-Rate Coder) with excellent speech quality.

Over 50,000 THV words can be stored in a 64Mb ROM based on the 1.2Kbps LRC.
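The 50,000-word figure can be sanity-checked with simple arithmetic. The sketch assumes that 1 Mb means 2^20 bits and that the whole ROM is used for coded speech; both are assumptions, and the function names are illustrative.

```python
def playback_seconds(rom_megabits, coder_bps, bits_per_megabit=2 ** 20):
    """Seconds of coded speech that fit in a ROM of the given size."""
    return rom_megabits * bits_per_megabit / coder_bps


def seconds_per_word(rom_megabits, coder_bps, word_count):
    """Average speech duration available per stored word."""
    return playback_seconds(rom_megabits, coder_bps) / word_count


if __name__ == "__main__":
    total = playback_seconds(64, 1200)
    print(f"{total / 3600:.1f} hours in total")
    print(f"{seconds_per_word(64, 1200, 50000):.2f} s per word")
```

Under these assumptions a 64Mb ROM holds about 15.5 hours of 1.2Kbps speech, or a little over one second per word for 50,000 words.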

MXIC ELL Product Features: THV - True Human Voice

Why is it so important to have a Sequential ROM interface in ED applications? Because:

ED needs higher and higher Mask ROM density:

Content becomes larger and larger

True Human Voice

The MCU needs just 20 pins to access up to 4Gb of Sequential ROM. This saves pin count, which in turn saves die size

Sequential ROM is the most cost-effective

[Figure: Conventional ROM versus MXIC Sequential ROM]

Worldwide ED Market Size

[Pie chart: China 44%, Taiwan 7%, HK 3%, Korea 2%, Japan 44%]

Region | China | Taiwan | HK | Korea | Japan | Total
Quantity (K sets) | 4,000 | 600 | 300 | 200 | 4,000 | 9,100

Source: MXIC, 2001

Worldwide ED Market Size

[Chart: quantity (0 to 14,000) by year, 1999 to 2003, for Japan, Korea, HK, Taiwan, and China]

Region | 1999 | 2000 | 2001 | 2002 | 2003 | CAGR
China | 2,600 | 3,600 | 4,000 | 4,600 | 5,400 | 20.05%
Taiwan | 500 | 550 | 600 | 630 | 660 | 7.19%
HK | 250 | 280 | 300 | 320 | 350 | 8.78%
Korea | 180 | 200 | 200 | 220 | 240 | 7.46%
Japan | 3,000 | 3,500 | 4,000 | 4,500 | 5,000 | 13.62%
Total | 6,530 | 8,130 | 9,100 | 10,270 | 11,650 | 15.57%

Source: MXIC, 2001
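The CAGR column can be reproduced from the first and last year of each row. The sketch below assumes four yearly periods between 1999 and 2003; the function and variable names are illustrative.

```python
def cagr(first_value, last_value, periods):
    """Compound annual growth rate over the given number of yearly periods."""
    return (last_value / first_value) ** (1.0 / periods) - 1.0


# (1999 value, 2003 value) taken from the ED market table.
ED_MARKET = {
    "China": (2600, 5400),
    "Taiwan": (500, 660),
    "HK": (250, 350),
    "Korea": (180, 240),
    "Japan": (3000, 5000),
    "Total": (6530, 11650),
}

if __name__ == "__main__":
    for region, (first, last) in ED_MARKET.items():
        print(f"{region}: {cagr(first, last, periods=4) * 100:.2f}%")
```

For example, China grows from 2,600 to 5,400 over four periods, which gives (5400 / 2600)^(1/4) - 1, or about 20.05%.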

ELL Product Road Map

Timeline: Q1/2001, Q2/2001, Q3/2001, Q4/2001, Q1/2002, Q2/2002, Q3/2002, Q4/2002

Products shown:

Z80 embedded 3-in-1 ED Controller

All-in-one ED (MCU + DSP + S-ROM I/F)

LRC decoder

MCU & Speech Processor for ED & PDA

MX93L551 DVR Processor with LRC

6502 embedded ED Controller

ARM7TDMI embedded ED Controller

MX93L552 DVR Processor with VR

* Rectangle means existing products, and circle means products under development
* Left edge of a circle is the project start schedule, and the right edge is the commercial sample schedule.
* DVR stands for Digital Voice Recorder, VR stands for Voice Recognition

MXIC ELL Solution Advantage

We can provide a THV (True Human Voice) solution!

We can provide an MCU ASSP with:

An effective Sequential ROM interface for program and data storage in ED with the THV (True Human Voice) feature

We can provide a Sequential ROM family (64Mb ~ 256Mb) for ED and E-Book

Speech Product Offering - II: DVR (Digital Voice Recorder)

[Block diagram: Digital Voice Recorder (DSP Engine Chip) connected to Microcontroller, Flash, LCD Display, Keypad, MIC, and Speaker]


DVR (Digital Voice Recorder)

Message management:

Playback, Fast Forward, Rewind

Forward/backward search within a specific message

Repeat

[Figure: message timeline from 00:00 to 05:30 showing FF, RW, FS (forward search), BS (backward search), and Repeat, with markers at 02:15 and 05:10 and a 200ms label]

DVR (Digital Voice Recorder)

PSA (Playback Speed Adjustment) ranges from 50% to 200%

[Figure: playback speed scale: 50% (Slow Playback), 100% (Normal Playback), 200% (Fast Playback)]

MXIC DVR Solution Advantage

We can provide switchable speech compression rates (4.8K/12.8K/32Kbps) for different speech recording systems

We can provide flexible speech manipulations like:

Folder management

Playback, pause, FF, RW, Repeat, Forward/backward search, append, …

PSA (Playback Speed Adjustment)

We can provide a Total System Solution (MCU, DSP, Flash)

Speech Product Offering - III: DAM (Digital Answering Machine)

[Block diagram: All-in-one DAM Controller connected to Microphone, Speaker, Telephone Line, AFlash (Voice Prompt), MCU (System control code), Display, and Keypad]


DAM (Digital Answering Machine)

The key success factor is an excellent speech CODEC

Switchable compression rate: 4.8K/12.8K/32Kbps

MRC (Multi-Rate Coder): 3.6Kbps ~ 14.2Kbps

Full-duplex speakerphone is highlighted in this application

Telecom signal processing (tone generation/detection) is also included

[Block diagram: DAM Engine Chip with PCM Codec-1 and PCM Codec-2, SPK Driver, MIC Gain, Line Gain, Line Driver, and 4-2 wire coupling; acoustic coupling occurs between Speaker and Microphone, and line coupling at the 4-2 wire interface]

Worldwide DAM Size

[Pie chart: North America 60%, Europe 16%, Japan 19%, Others 5%]

Region | North America | Europe | Japan | Others | Total
Q'ty (M sets) | 22 | 6 | 7 | 2 | 37

Source: MXIC, 2001

Worldwide DAM Size

[Chart: quantity (0 to 45,000) by year, 1999 to 2003, for Others, China, Europe, Japan, and US]

unit: K sets

Region | 1999 | 2000 | 2001 | 2002 | 2003 | CAGR
US | 20,500 | 21,500 | 22,000 | 22,500 | 23,000 | 2.92%
Japan | 6,800 | 7,000 | 7,000 | 7,100 | 7,200 | 1.44%
Europe | 6,200 | 6,000 | 6,000 | 5,800 | 5,800 | -1.65%
China | 180 | 200 | 200 | 600 | 1,000 | 53.53%
Others | 1,800 | 2,000 | 2,000 | 2,100 | 2,200 | 5.14%
Total | 35,480 | 36,700 | 37,200 | 38,100 | 39,200 | 2.52%

Source: MXIC, 2001

DAM Product Road Map

Timeline: Q1/2002, Q2/2002, Q3/2002, Q4/2002, Q1/2003, Q2/2003, Q3/2003, Q4/2003

Products shown:

MX93L108 Entry level DAM Processor

MX93111 5V DAM

MX93L111A 3V MRC DAM

DAM embedded 4Mb Flash

DAM Solution

DAM processor embedded 1Mb MTP

MX93132 5V DAM w/ CID/SPK

MX93L132A 3V MRC DAM w/ CID/SPK

* Rectangle means existing products, and circle means products under development
* Left edge of a circle is the project start schedule, and the right edge is the commercial sample schedule.
* MRC stands for Multi-Rate Coder, CID stands for Caller ID, and SPK stands for Speaker phone

MXIC DAM Solution Advantages

MXIC has different kinds of solutions for each DAM market segment

MXIC is the leader in the mid-range segment, and a Top 2 DAM IC vendor in the world

MXIC provides a one-stop shopping service (DSP, MCU, AFlash) for DAM applications

[Figure: DAM market segments from Low-end to High-end, with MXIC positioned between them; solutions shown: MRC (Multi-Rate Coder) + 8/16Mb AFlash, and 12.8K/32Kbps + 64/128Mb SDRAM]


Audio Product Offering - I: AIR™ (Audio IC Recorder)

[Block diagram: Audio input from Audio Devices, 16-bit Audio Codec, Audio Encoder/Decoder Processor, Host controller, Memory (MMC, CF, SD, Memory Stick, …), Audio ROM, Flash, and Power Amplifier driving Speaker and Headphone]


AIR™ (Audio IC Recorder)

AIR™, a brand new audio product concept!

With built-in S/PDIF, audio data can be saved directly into the MP3 player via real-time MP3 encoding.

Say good-bye to the complicated PC download method!

[Figure: PC method: CD -> Compression -> Download; AIR™ method: Audio Devices -> S/PDIF]

AIR™ (Audio IC Recorder)

Mini Component System and Portable Audio:

Upgrade conventional models to fully digital audio (MP3)

Alignment with the young generation’s portable MP3 players!

[Figure: Mini Component System and Portable Audio with the MX92L600 Audio IC Recorder; labels: Cassette, Memory Cards, Portable MP3 Players]

Audio Product Offering - II: Wavetable Sound Generator

MIDI for Sound Generator:

[Block diagram: Sound Generator ASSP. MIDI IN -> Microprocessor (with SRAM and Program ROM) -> Wavetable Synthesizer (with Wavetable ROM) -> Audio DAC]

Digital Audio Product Road Map

Timeline: Q1/2001, Q2/2001, Q3/2001, Q4/2001, Q1/2002, Q2/2002, Q3/2002, Q4/2002

Products shown:

MX92L600 MP3 Codec

Promotional Singles (8MB embedded)

MP3/AAC Player & Recorder Solution

MX92L500 MP3 decoder

Audio ROM derivatives

* Rectangle means existing products, and circle means products under development
* Left edge of a circle is the project start schedule, and the right edge is the commercial sample schedule.
* DVR stands for Digital Voice Recorder, LRC stands for Low-Rate Coder

MXIC Digital Audio Advantages

Professional MIDI technology (with the General MIDI V1.0 sound set, 32-note polyphony, and 32-part multi-timbre) provides a superior sound generator solution for mobile phone, PDA, ED, and toy applications.

Complete solution for MP3 player and recorder

In-house Sequential ROM, Flash and Memory Card support

Summary

Among audio compression technologies, MP3 is the most mature, while MP3PRO is regarded as a future star.

FM and wavetable synthesis are the mainstream audio synthesis technologies, and wavetable synthesis appears superior in practice.

Different speech technologies suit different applications. Among them, hybrid coding is superior when reinforced by DSP technology.

MXIC focuses on audio and speech technologies, and several audio and speech products were presented.


Moving Toward IA, Moving with Us!
