audio compression & synthesis technology
TRANSCRIPT
1
Audio Compression & Synthesis Technology Overview
Adam Chang, MSEE
Product Marketing Manager
2
Contents
Audio Compression Technology Overview
Audio Synthesis Technology Overview
Speech Compression Overview
MXIC Solution to Digital Audio & Speech Applications
Speech Product Offering
Digital Audio Product Offering
Summary
4
Audio Compression Technologies
A wide range of audio compression technologies is available, but few of them have really been commercialized.
Owing to internet music, MPEG-1/Audio Layer-3 (so-called MP3) has become the most successful audio compression technology.
In addition to internet music, Audio compression technologies are applied to:
Portable solid-state Audio recorder
Internet Radio
DAB (Digital Audio Broadcast) system
Audio accessories of portable devices (Cell phone, PDA, …)
5
MPEG-1/Audio Layer 3 Coding
MPEG/Audio compression Layer 3 is now well known as MP3
Low bit-rate application: 64 Kbps per mono channel
Sampling frequency: 32, 44.1, or 48 kHz
Lossy compression algorithm: 12-to-1 compression ratio
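The 12-to-1 figure can be checked with a little arithmetic. The sketch below assumes 16-bit linear PCM as the uncompressed reference (the slide does not state the sample width); the function names are illustrative only.

```python
def pcm_bit_rate_kbps(sample_rate_hz, bits_per_sample, channels):
    """Bit rate of uncompressed linear PCM in Kbps."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0


def compression_ratio(pcm_kbps, coded_kbps):
    """How many times smaller the coded stream is than the PCM stream."""
    return pcm_kbps / coded_kbps


# One mono channel, 16-bit samples (assumed), coded at 64 Kbps.
for rate_hz in (32000, 44100, 48000):
    pcm = pcm_bit_rate_kbps(rate_hz, 16, 1)
    print(rate_hz, "Hz:", round(compression_ratio(pcm, 64), 2), "to 1")
```

Under these assumptions a 48 kHz mono channel works out to exactly 12-to-1, and a 44.1 kHz channel to roughly 11-to-1.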
6
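Audio Encoding System Overview
[Block diagram: Digital Audio Input -> Filter Bank -> Bit or Noise Allocation -> Bitstream Formatting -> Encoded Bitstream. The Digital Audio Input also feeds the Psychoacoustic Model, which supplies the SMR (Signal-to-Mask Ratio) to the Bit or Noise Allocation block.]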
7
Hybrid Filter Bank
A polyphase filter bank divides the audio signal into 32 equal-width frequency subbands.
The filter outputs are then processed with an MDCT (Modified Discrete Cosine Transform).
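To make the MDCT step concrete, here is a minimal direct-form sketch in plain Python. It is not the MP3 reference implementation: the sine window, the block length and the helper names are assumptions chosen for illustration. It shows the property that makes the MDCT useful here: each block of 2N windowed samples yields only N coefficients, yet overlapping blocks by N samples and adding the inverse transforms reconstructs the signal.

```python
import math


def mdct(block):
    """Direct-form MDCT: 2N windowed samples in, N coefficients out."""
    n = len(block) // 2
    return [sum(block[t] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                for t in range(2 * n))
            for k in range(n)]


def imdct(coeffs):
    """Inverse MDCT: N coefficients in, 2N time-aliased samples out."""
    n = len(coeffs)
    return [2.0 / n * sum(coeffs[k] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                          for k in range(n))
            for t in range(2 * n)]


def sine_window(length):
    """Sine window; it satisfies w[t]^2 + w[t + N]^2 = 1."""
    return [math.sin(math.pi * (t + 0.5) / length) for t in range(length)]


def analysis_synthesis(signal, n):
    """MDCT then IMDCT with 50% overlap-add; len(signal) must be a multiple of n."""
    win = sine_window(2 * n)
    padded = [0.0] * n + list(signal) + [0.0] * n
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - n, n):
        frame = [padded[start + t] * win[t] for t in range(2 * n)]
        rebuilt = imdct(mdct(frame))
        for t in range(2 * n):
            out[start + t] += rebuilt[t] * win[t]
    return out[n:-n]
```

In MP3 the transform is applied to each subband output separately; with 18 coefficients per subband for long blocks, 32 subbands give 576 frequency coefficients per granule.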
8
Psychoacoustic Model
The incoming signal is transformed from the time domain to the frequency domain for analysis.
The psychoacoustic model calculates the SMR (Signal-to-Mask Ratio) of each band using auditory perception effects such as simultaneous masking, temporal masking, and the absolute threshold of hearing.
The SMR of each band has a direct impact on compression rate and audio quality.
Different psychoacoustic models are chosen according to the trade-off between audio quality and compression rate.
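As a rough illustration of what an SMR is, the sketch below subtracts a masking threshold from a band's signal level, both in dB. The absolute-threshold curve uses Terhardt's widely quoted approximation, which is an assumption brought in for this example and not something the slides specify; a real psychoacoustic model also derives the masker threshold from the spectrum, whereas here it is simply passed in.

```python
import math


def absolute_threshold_db(freq_hz):
    """Terhardt's approximation of the threshold in quiet (dB SPL)."""
    f = freq_hz / 1000.0
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


def smr_db(signal_db, masker_threshold_db, freq_hz):
    """Signal-to-Mask Ratio: signal level minus the effective masking threshold."""
    mask = max(masker_threshold_db, absolute_threshold_db(freq_hz))
    return signal_db - mask


# Hypothetical band: 60 dB signal at 1 kHz, nearby maskers raise the threshold to 45 dB.
print(smr_db(60.0, 45.0, 1000.0))
```

A band with a high SMR needs more bits to keep its quantization noise below the mask; a band with a negative SMR is inaudible and can be given none.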
9
Noise/Bit Allocation
Based on the SMR from the psychoacoustic model and the bit-rate restriction, the 576 frequency coefficients are grouped into scale factor bands.
Each scale factor band performs noise (or bit) allocation by repeatedly adjusting its scale factor and the global gain until distortion is minimized.
Non-uniform quantization and Huffman coding.
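The iteration described above can be sketched in simplified form. The code below covers only the global-gain half of the loop: it coarsens a power-law (exponent 0.75) quantizer until a bit estimate fits the budget. The per-band scale factor adjustment is omitted, and `estimated_bits` is a crude stand-in for real Huffman code lengths; all names and constants here are assumptions for illustration.

```python
def quantize(coeffs, step):
    """Non-uniform quantizer: the 0.75 power makes large values coarser."""
    return [(1 if c >= 0 else -1) * int(round((abs(c) / step) ** 0.75))
            for c in coeffs]


def dequantize(codes, step):
    """Inverse of quantize(): raise magnitudes to the 4/3 power."""
    return [(1 if q >= 0 else -1) * (abs(q) ** (4.0 / 3.0)) * step
            for q in codes]


def estimated_bits(codes):
    """Stand-in for Huffman coding: magnitude bits plus a sign bit, zeros are free."""
    return sum(abs(q).bit_length() + 1 for q in codes if q != 0)


def rate_loop(coeffs, bit_budget, step=1.0, growth=2 ** 0.25):
    """Increase the global step size until the estimate fits the budget."""
    while estimated_bits(quantize(coeffs, step)) > bit_budget:
        step *= growth
    return step


coeffs = [100.0, -50.0, 10.0, 1.0]
step = rate_loop(coeffs, bit_budget=12)
print(step, quantize(coeffs, step))
```

A full encoder would wrap this in an outer loop that amplifies individual scale factor bands whose quantization noise exceeds the allowed level, then reruns the rate loop.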
10
Audio Compression Technologies Comparison
MP3 (bit rate: 128 Kbit/sec)
Advantages: (1) Internet music standard (2) Easy to implement in silicon LSI
Drawbacks: (1) Bit rate is too high
MP3PRO (bit rate: 64 Kbit/sec)
Advantages: (1) High compression ratio (2) Extension of MP3
Drawbacks: (1) IP by Thomson Multimedia (2) No encoder IC available
AAC (bit rate: 96 Kbit/sec)
Advantages: (1) Excellent audio quality (2) High compression ratio
Drawbacks: (1) Not internet music standard (2) No encoder IC available
WMA (bit rate: 96 Kbit/sec)
Advantages: (1) Excellent audio quality (2) High compression ratio
Drawbacks: (1) IP by Microsoft (2) No encoder IC available
ATRAC3 (bit rate: 72 Kbit/sec)
Advantages: (1) Excellent audio quality (2) High compression ratio
Drawbacks: (1) IP by Sony (2) No encoder IC available
11
Audio Compression Technologies: Brief Summary
MP3 is the most mature technology, and its encoder is easy to implement in silicon LSI
Among newly developed audio compression technologies, MP3PRO stands out, because:
It is backward compatible with MP3
It needs the lowest bit rate for the same audio quality as MP3
Its encoder is easier to implement in silicon LSI
Thomson Multimedia is aggressively promoting it as the new internet music standard
13
Audio Synthesis Technologies
Audio synthesis is a method of producing sounds in which no acoustic sound source is used
Among audio synthesis technologies, FM (Frequency Modulation) and wavetable synthesis are now the mainstream
Audio synthesis technologies are now widely used in many applications, such as:
Music Keyboard
Cell phone sound generator
Toys
Melody accessories
14
Wavetable Synthesis Technology
u-Law Compression
Sound Model
Loop
Envelope Control
Pitch shift
Interpolation
15
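u-Law Compression
Converts linear 16-bit samples into 8-bit codes
Assume all samples are fractional values between -1 and 1
Compression, with $s$ the linear sample and $\hat{s}$ the compressed value:
$$\hat{s} = \operatorname{sign}(s)\,\frac{\log(1 + 255\,|s|)}{\log(1 + 255)} = \operatorname{sign}(s)\,\frac{\log(1 + 255\,|s|)}{\log 256}$$
Expansion:
$$s = \operatorname{sign}(\hat{s})\,\frac{256^{|\hat{s}|} - 1}{255}$$
[Figure: compression curve mapping 16-bit linear samples to 8-bit u-Law codes]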
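The u-Law curve with a parameter of 255 is easy to try out. The sketch below implements the continuous compression and expansion formulas on fractional samples, then maps them to 8-bit codes with a plain uniform quantizer. That final mapping is an assumption made for illustration; it is not the segmented bit layout used by telephony codecs.

```python
import math

MU = 255


def mu_compress(s):
    """Compress a fractional sample in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(s)) / math.log1p(MU), s)


def mu_expand(y):
    """Inverse of mu_compress()."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)


def encode_16_to_8(sample):
    """16-bit linear sample (-32768..32767) to an 8-bit code (0..255)."""
    y = mu_compress(sample / 32768.0)
    return int(round((y + 1.0) / 2.0 * 255))


def decode_8_to_16(code):
    """8-bit code back to an approximate 16-bit linear sample."""
    y = code / 255.0 * 2.0 - 1.0
    return int(round(mu_expand(y) * 32767))


for sample in (100, 1000, 20000):
    print(sample, "->", encode_16_to_8(sample), "->", decode_8_to_16(encode_16_to_8(sample)))
```

Because the curve is logarithmic, quiet samples are reproduced with small absolute error while loud samples tolerate a larger one, which is what lets 8 bits stand in for 16.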
16
A Typical Waveform of Sound
17
Sound Model
ADSR Model
A (Attack), D (Decay), S (Sustain), R (Release)
For non-percussive instruments (e.g. violin)
[Figure: amplitude attenuation (relative to 0 dB) versus time, from note on to note off, with the A, D, S and R segments marked]
18
Sound Model
ADSR Model
For percussive instruments (e.g. piano, drum)
[Figure: amplitude attenuation (relative to 0 dB) versus time, from note on to note off, with the A, D, S and R segments marked]
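The ADSR model can be sketched as a gain curve applied to the stored waveform. The version below uses straight-line segments on a linear 0..1 gain scale, which is a simplifying assumption (the figures plot attenuation in dB); the parameter names are hypothetical, and attack, decay and release times are assumed to be greater than zero.

```python
def adsr_gain(t, note_off, attack, decay, sustain_level, release):
    """Linear ADSR gain (0..1) at time t seconds after note on."""
    def held(time):
        if time < attack:                       # A: rise from 0 to full level
            return time / attack
        if time < attack + decay:               # D: fall to the sustain level
            return 1.0 - (1.0 - sustain_level) * (time - attack) / decay
        return sustain_level                    # S: hold while the key is down

    if t < note_off:
        return held(t)
    level_at_off = held(note_off)               # R: fade out from wherever we were
    remaining = 1.0 - (t - note_off) / release
    return max(0.0, level_at_off * remaining)


# Violin-like note: sustains at half level until note off at 1.0 s.
for t in (0.05, 0.2, 0.5, 1.15, 2.0):
    print(t, round(adsr_gain(t, 1.0, 0.1, 0.2, 0.5, 0.3), 3))
```

A percussive voice could be approximated with the same function by using a very short attack and a sustain level of 0, so the sound decays even while the key is held.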
19
Loop
20
Envelope Control
21
Pitch shift
Use one sound sample, or a limited number of sampled notes, to generate every note to be played
Access the stored sample memory at different rates during playback
[Figure: a pointer stepping through sample memory. Reading at rate fs plays the stored pitch; reading at rate 2fs plays it shifted up by one octave.]
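Reading sample memory at a different rate can be sketched with a fractional read pointer. The example below also wraps the pointer (the loop) and linearly interpolates between neighboring samples, which ties in the Loop and Interpolation items of the wavetable overview. The function name and the choice of linear interpolation are assumptions for illustration.

```python
def play_wavetable(table, pitch_ratio, num_samples):
    """Read a looped wavetable at pitch_ratio times its stored rate.

    pitch_ratio 2.0 shifts up one octave, 0.5 shifts down one octave.
    """
    out = []
    phase = 0.0
    n = len(table)
    for _ in range(num_samples):
        i = int(phase)
        frac = phase - i
        a = table[i]
        b = table[(i + 1) % n]              # wrap around: the sample is looped
        out.append(a + (b - a) * frac)      # linear interpolation between neighbors
        phase = (phase + pitch_ratio) % n   # advance the read pointer
    return out


ramp = [0.0, 1.0, 2.0, 3.0]
print(play_wavetable(ramp, 2.0, 4))   # every other sample: one octave up
print(play_wavetable(ramp, 0.5, 4))   # in-between values are interpolated
```

Without interpolation, non-integer pointer positions would have to be rounded to the nearest stored sample, which adds audible distortion.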
22
Interpolation
23
Wavetable System Implementation
[Block diagram with the following blocks: MIDI IN, Microprocessor, RAM, Program ROM, Wavetable Synthesizer, Wavetable ROM, DAC, Audio Out (L), Audio Out (R)]
24
FM (Frequency Modulation)
FM is a process of varying the frequency of a signal, often periodically.
25
FM (Frequency Modulation)
The fundamental principle of an FM sound generator is to synthesize tones by combining a “modulation” signal and a “carrier” signal.
[Figure: FM modulation. A modulator and a carrier (sine wave), each controlled by parameters, are combined into the output sound. Depending on the parameters, an oblong (rectangular) wave, a sawtooth wave or a pyramidal (triangular) wave is created.]
26
FM (Frequency Modulation)
A device producing a “carrier” or a “modulator” is called an “operator”
At least two operators are required to generate the sound of a musical instrument.
For percussion instruments, at least 4 operators are required for decent instrumental sound quality
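A two-operator voice, one modulator driving one carrier, can be sketched with the classic formula y(t) = sin(2*pi*fc*t + I*sin(2*pi*fm*t)). The sketch assumes sine operators and a fixed modulation index I; real FM chips also apply an envelope to each operator, which is left out here, and the function name is hypothetical.

```python
import math


def fm_tone(carrier_hz, modulator_hz, index, sample_rate, num_samples):
    """Two-operator FM: a sine modulator varies the phase of a sine carrier."""
    out = []
    for n in range(num_samples):
        t = n / sample_rate
        modulator = index * math.sin(2.0 * math.pi * modulator_hz * t)
        out.append(math.sin(2.0 * math.pi * carrier_hz * t + modulator))
    return out


# index 0 gives a pure sine; raising it adds sidebands spaced at the modulator frequency.
plain = fm_tone(440.0, 220.0, 0.0, 8000, 64)
bright = fm_tone(440.0, 220.0, 3.0, 8000, 64)
```

The ratio between carrier and modulator frequencies decides whether the added partials are harmonic, and the index controls how bright the tone is. This sensitivity to parameters is one reason FM voices are harder to design than wavetable voices.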
27
Audio Synthesis Comparison
Theoretically, FM and wavetable synthesis can achieve the same audio quality.
Wavetable Synthesis
Advantages: (1) Easy to implement (2) Consistent quality
Drawbacks: (1) Cost
Frequency Modulation
Advantages: (1) Cost
Drawbacks: (1) Not easy to implement (2) Inconsistent quality
29
Speech Compression Technologies
In the last decade, we have seen rapid progress in speech technologies.
Current speech coders tend to be “source-specific” and “hearing-specific” in order to reach low bit rates.
Speech compression technologies are now widely used in many applications, such as:
Digital telecommunication devices (Cell phone, ISDN, DECT, SST, DAM, …)
Digital voice recording accessories of Cell phone, PDA, DSC, ...
Electronic language learning solutions
Toys
30
Quality Measures
Unlike audio compression, speech coding does have a commonly accepted quality measure, called MOS (Mean Opinion Score)
MOS (Mean Opinion Score)
Impairment scale
5 Imperceptible
4 Perceptible, but not annoying
3 Slightly annoying
2 Annoying
1 Very annoying
31
Major Speech Coders
Type of coder, Bit rate (Kb/sec), MOS
PCM 64 4.3
ADPCM 32 4.1
GSM 13 3.8
CELP 4.8 3.3
LPC 2.4 2.6
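The table can be read as a trade of bit rate against MOS. The short sketch below uses only the figures from the table and adds two derived columns: the compression ratio relative to 64 Kb/sec PCM and the storage needed per minute of speech. It assumes 1 Kb = 1000 bits and 1 KB = 1000 bytes; the helper names are illustrative.

```python
# (coder, bit rate in Kb/sec, MOS), copied from the table above
CODERS = [
    ("PCM", 64.0, 4.3),
    ("ADPCM", 32.0, 4.1),
    ("GSM", 13.0, 3.8),
    ("CELP", 4.8, 3.3),
    ("LPC", 2.4, 2.6),
]


def ratio_vs_pcm(kbps, reference_kbps=64.0):
    """Compression ratio relative to 64 Kb/sec PCM."""
    return reference_kbps / kbps


def kbytes_per_minute(kbps):
    """Storage for one minute of speech, in KB."""
    return kbps * 1000 * 60 / 8 / 1000


for name, kbps, mos in CODERS:
    print(f"{name:6s} {kbps:5.1f} Kb/sec  MOS {mos}  "
          f"{ratio_vs_pcm(kbps):5.1f}:1  {kbytes_per_minute(kbps):6.1f} KB/min")
```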
32
Waveform Coding
PCM (Pulse Code Modulation)
[Figure: an analog input waveform sampled at regular intervals, with each sample mapped to the nearest quantization level (codes 0000 to 0111) to form the quantized output]
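PCM quantization amounts to picking the nearest of 2^bits levels for each sample. The sketch below assumes input samples normalized to the range -1..1 and a uniform quantizer; the function names are hypothetical.

```python
def pcm_encode(x, bits):
    """Map a sample in [-1, 1] to an integer code in 0 .. 2**bits - 1."""
    levels = 2 ** bits
    x = max(-1.0, min(1.0, x))                 # clip out-of-range input
    code = int((x + 1.0) / 2.0 * levels)
    return min(code, levels - 1)               # x == 1.0 lands in the top level


def pcm_decode(code, bits):
    """Return the mid-point of the quantization interval for a code."""
    levels = 2 ** bits
    return (code + 0.5) / levels * 2.0 - 1.0


code = pcm_encode(0.3, 4)
print(format(code, "04b"), pcm_decode(code, 4))
```

Each extra bit halves the quantization step. The 64 Kb/sec PCM rate in the coder table corresponds to the telephony standard of 8000 samples per second at 8 bits per sample.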
33
Waveform Coding
ADPCM (Adaptive Differential Pulse Code Modulation)
Analysis of speech waveforms shows a high sample-to-sample correlation.
ADPCM was developed to further reduce the bit rate while preserving the overall speech quality.
[Block diagram: ADPCM encoder and decoder, each built around a step size calculation block and unit delays (Z^-1). Signals shown:
X(n): linear input signal
X(n-1): estimate of the last input sample
d(n): difference
ss(n): step size
ss(n+1): adjusted step size
L(n): ADPCM output sample]
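The encoder and decoder loops can be sketched as follows. This is a simplified illustration and not a standard ADPCM codec: the 4-bit code range, the step multipliers and the step limits are all assumptions. It does follow the signal flow in the diagram: the difference d(n) between the input and the running estimate is quantized with step size ss(n), and the step size is then adjusted from the code that was just produced.

```python
import math

# Assumed adaptation table: small codes shrink the step, large codes grow it.
STEP_MULTIPLIERS = [0.9, 0.9, 0.9, 0.9, 1.2, 1.6, 2.0, 2.4]
MIN_STEP, MAX_STEP = 1e-4, 0.5


def next_step(step, code):
    """ss(n+1) from ss(n) and the magnitude of the code just sent."""
    return min(MAX_STEP, max(MIN_STEP, step * STEP_MULTIPLIERS[abs(code)]))


def adpcm_encode(samples, step=0.02):
    """Return one signed 4-bit code (-7..7) per input sample."""
    codes = []
    estimate = 0.0
    for x in samples:
        d = x - estimate                                  # d(n)
        code = max(-7, min(7, int(round(d / step))))      # L(n)
        codes.append(code)
        estimate += code * step                           # same update as the decoder
        step = next_step(step, code)
    return codes


def adpcm_decode(codes, step=0.02):
    """Rebuild the waveform from the codes alone."""
    out = []
    estimate = 0.0
    for code in codes:
        estimate += code * step
        out.append(estimate)
        step = next_step(step, code)
    return out


tone = [0.8 * math.sin(2.0 * math.pi * 200.0 * n / 8000.0) for n in range(400)]
decoded = adpcm_decode(adpcm_encode(tone))
```

Because the encoder updates its estimate exactly as the decoder does, quantization errors do not accumulate: each new difference is measured against what the decoder will actually have.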
34
Source Coding
Speech is produced when air is forced from the lungs through the vocal cords and along the vocal tract.
Voiced sounds are produced when the vocal cords vibrate open and closed, producing quasi-periodic pulses.
Unvoiced sounds result when the excitation is a noise-like turbulence.
[Figure: (A) periodic signal, (B) variable signal, (C) output sound]
35
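Source Coding
LPC (Linear Predictive Coder)
[Block diagram: a pulse generator and a white noise generator, selected by a voiced/unvoiced control, drive parallel branches P1, P2, P3, ..., Pn. Each branch is set by a formant frequency and a bandwidth and is scaled by an amplitude (X); the branch outputs are summed (+) into the speech signal.]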
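The source-filter idea in the diagram can be sketched with a few second-order resonators. This follows one reading of the figure (parallel formant branches, each with a frequency, bandwidth and amplitude) and is not a complete LPC coder, which would derive an all-pole filter from linear prediction of the input speech. The formant values, pitch period and function names below are assumptions.

```python
import math
import random


def excitation(num_samples, voiced, pitch_period=80, seed=0):
    """Pulse train for voiced sounds, white noise for unvoiced sounds."""
    if voiced:
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(num_samples)]
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(num_samples)]


def formant_filter(x, formant_hz, bandwidth_hz, sample_rate):
    """Two-pole resonator centered on formant_hz with the given bandwidth."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)      # pole radius
    theta = 2.0 * math.pi * formant_hz / sample_rate         # pole angle
    a1, a2 = 2.0 * r * math.cos(theta), -r * r
    y1 = y2 = 0.0
    out = []
    for s in x:
        y = s + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesize(num_samples, voiced, formants, sample_rate=8000):
    """Sum parallel formant branches; formants is a list of (hz, bandwidth, amplitude)."""
    source = excitation(num_samples, voiced)
    branches = [[amp * v for v in formant_filter(source, hz, bw, sample_rate)]
                for hz, bw, amp in formants]
    return [sum(values) for values in zip(*branches)]


# Hypothetical vowel-like setting with two formants.
speech = synthesize(160, True, [(500.0, 100.0, 1.0), (1500.0, 150.0, 0.5)])
```

Only the voiced/unvoiced flag, the pitch and a handful of filter parameters need to be transmitted per frame, which is how source coders reach very low bit rates.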
36
Hybrid Coding
Hybrid coding is an analysis-by-synthesis approach.
The encoder analyzes the input speech by synthesizing many different approximations to it, then transmits information representing the synthesis filter parameters and the excitation to the decoder.
[Block diagram, encoder: Excitation Generation produces u(n), which passes through the Synthesis Filter to give s’(n). s’(n) is subtracted from the input speech s(n) to give the error e(n); Error Weighting turns e(n) into ew(n), and Error Minimization selects the excitation.
Decoder: Excitation Generation produces u(n), which passes through the Synthesis Filter to give the reproduced speech s’(n).]
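The analysis-by-synthesis loop can be sketched as a codebook search: every candidate excitation is run through the synthesis filter, and the candidate (with its best gain) that leaves the smallest error is kept. The sketch below makes several assumptions for brevity: a one-pole synthesis filter, a random codebook, and plain squared error without the error weighting stage shown in the diagram.

```python
import random


def synthesis_filter(u, a=0.9):
    """One-pole stand-in for the synthesis filter: y(n) = u(n) + a * y(n-1)."""
    out = []
    prev = 0.0
    for v in u:
        prev = v + a * prev
        out.append(prev)
    return out


def make_codebook(size, length, seed=0):
    """Hypothetical codebook of random excitation vectors."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(length)] for _ in range(size)]


def search(target, codebook):
    """Return (index, gain, error) of the excitation that best matches target."""
    best = (None, 0.0, float("inf"))
    for index, u in enumerate(codebook):
        y = synthesis_filter(u)                               # s'(n) for this candidate
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        gain = sum(s * v for s, v in zip(target, y)) / energy
        error = sum((s - gain * v) ** 2 for s, v in zip(target, y))
        if error < best[2]:
            best = (index, gain, error)
    return best


codebook = make_codebook(16, 40)
target = [0.5 * v for v in synthesis_filter(codebook[3])]
print(search(target, codebook)[:2])
```

Only the winning index and gain need to be sent; the decoder holds the same codebook and filter and regenerates the speech from them.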
37
Speech Compression Technologies: Brief Summary
Typically, waveform coding (like ADPCM) is used at high bit rates and gives very good quality speech.
Source coding (like LPC) operates at very low bit rates, but tends to produce speech that sounds synthetic.
Hybrid coding (like CELP) uses techniques from both source and waveform coding, and gives good quality speech at intermediate bit rates.
[Figure: MOS (1 to 5) versus bit rate (1, 2, 4, 8, 16, 32, 64 Kbps), with curves for source coding, hybrid coding and waveform coding]
39
Speech Product Offering - I: ELL (Electronic Language Learning)
ELL System Block Diagram
MCU core (6502, Z80, 8051)
ROM (Program, data)
SRAM (Data buffer, PIM)
A/D (Voice input, Pen-input)
DSP (LRC, synthesizer)
D/A (Voice output)
Memory Card
Flash
PC
USB
LCD Module (Display)
I/O & Peripherals (Keyboard, battery, ...)
* Red block means the components or technologies that MXIC can provide.
40
MXIC ELL Product Features: THV - True Human Voice
What is “True Human Voice”?
What can MXIC provide for a THV solution?
MXIC has a 1.2K/2.0Kbps LRC (Low-Rate Coder) with excellent speech quality.
Over 50,000 THV words can be stored in a 64Mb ROM using the 1.2Kbps LRC.
Record human voice -> Compression in PC -> Code stored in ROM -> DSP decodes and plays back
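The 50,000-word claim can be sanity-checked with simple arithmetic. The sketch assumes that 64Mb means 64 x 2^20 bits and that the whole ROM holds speech data, neither of which the slide states.

```python
ROM_BITS = 64 * 2 ** 20          # 64Mb, assuming binary megabits
LRC_BITS_PER_SECOND = 1200       # 1.2Kbps Low-Rate Coder
WORDS = 50000


def seconds_of_speech(rom_bits, bits_per_second):
    """Total playback time that fits in the ROM."""
    return rom_bits / bits_per_second


total_seconds = seconds_of_speech(ROM_BITS, LRC_BITS_PER_SECOND)
print(round(total_seconds), "seconds in total")
print(round(total_seconds / WORDS, 2), "seconds available per word")
```

Under these assumptions the ROM holds about 15.5 hours of speech, or a little over one second per word, which is a plausible length for a single spoken word.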
41
MXIC ELL Product Features: THV - True Human Voice
Why is it so important to have a Sequential ROM interface in ED applications? Because:
ED needs higher and higher Mask ROM density:
Content becomes larger and larger
True Human Voice
The MCU needs just 20 pins to address up to 4Gb of Sequential ROM. This saves pin count, which in turn saves die size
Sequential ROM is the most cost-effective
[Figure: Conventional ROM versus MXIC Sequential ROM]
42
Worldwide ED Market Size
Region    Quantity (K sets)   Share
China     4,000               44%
Taiwan    600                 7%
HK        300                 3%
Korea     200                 2%
Japan     4,000               44%
Total     9,100
Source: MXIC, 2001
43
Worldwide ED Market Size
[Chart: stacked quantities per year, 1999 to 2003, for China, Taiwan, HK, Korea and Japan; vertical axis 0 to 14,000]
Region   1999    2000    2001    2002     2003     CAGR
China    2,600   3,600   4,000   4,600    5,400    20.05%
Taiwan   500     550     600     630      660      7.19%
HK       250     280     300     320      350      8.78%
Korea    180     200     200     220      240      7.46%
Japan    3,000   3,500   4,000   4,500    5,000    13.62%
Total    6,530   8,130   9,100   10,270   11,650   15.57%
Source: MXIC, 2001
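The CAGR column follows from the first and last years of each row. A minimal sketch, assuming the rate is compounded over the four year-to-year steps from 1999 to 2003:

```python
def cagr_percent(first, last, periods):
    """Compound annual growth rate in percent over the given number of yearly steps."""
    return ((last / first) ** (1.0 / periods) - 1.0) * 100.0


# 1999 and 2003 quantities (K sets) from the table above.
print(round(cagr_percent(2600, 5400, 4), 2))    # China
print(round(cagr_percent(6530, 11650, 4), 2))   # Total
```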
44
ELL Product Road Map
Timeline: Q1/2001, Q2/2001, Q3/2001, Q4/2001, Q1/2002, Q2/2002, Q3/2002, Q4/2002
Products on the road map:
Z80 embedded 3-in-1 ED Controller
6502 embedded ED Controller
ARM7TDMI embedded ED Controller
All-in-one ED (MCU + DSP + S-ROM I/F)
LRC decoder
MCU & Speech Processor for ED & PDA
MX93L551 DVR Processor with LRC
MX93L552 DVR Processor with VR
* Rectangle means existing products, and circle means products under development
* The left edge of a circle is the project start date, and the right edge is the commercial sample date
* DVR stands for Digital Voice Recorder, VR stands for Voice Recognition
45
MXIC ELL Solution Advantage
We can provide THV (True Human Voice) solution!
We can provide MCU ASSP with:
Effective Sequential ROM interface for program and data storage in ED with THV (True Human Voice) feature
We can provide Sequential ROM family (64Mb ~ 256Mb) for ED and E-Book
46
Speech Product Offering - II: DVR (Digital Voice Recorder)
DVR System Block Diagram
Digital Voice Recorder (DSP Engine Chip)
Microcontroller
Flash
LCD Display
Keypad
MIC
Speaker
* Red block means the components or technologies that MXIC can provide.
47
DVR (Digital Voice Recorder)
Message management:
Playback, Fast Forward, Rewind
Forward/backward Search within specific message
[Figure: a message timeline from 00:00 to 05:30 with positions 02:15 and 05:10 marked, illustrating Repeat, FF, RW, and forward/backward search (FS/BS) in 200ms steps]
48
DVR (Digital Voice Recorder)
PSA (Playback Speed Adjustment) can range from 50% to 200%
50%: Slow Playback
100%: Normal Playback
200%: Fast Playback
49
MXIC DVR Solution Advantage
We can provide switchable speech compression rate (4.8K/12.8K/32Kbps) for different speech recording systems
We can provide flexible speech manipulations like:
Folder management
Playback, pause, FF, RW, Repeat, Forward/backward search, append, …
PSA (Playback Speed Adjustment)
We can provide Total System Solution (MCU, DSP, Flash)
50
Speech Product Offering - III: DAM (Digital Answering Machine)
DAM System Block Diagram
All-in-one DAM Controller
MCU (System control code)
AFlash (Voice Prompt)
Microphone
Speaker
Telephone Line
Display
Keypad
* Red block means the components or technologies that MXIC can provide.
51
DAM (Digital Answering Machine)
The key success factor is an excellent speech CODEC
Switchable compression rate: 4.8K/12.8K/32Kbps
MRC (Multi-Rate Coder): 3.6Kbps ~ 14.2Kbps
Full-duplex speakerphone is highlighted in this application
Telecom signal processing (tone generation/detection) is also included
[Block diagram: DAM Engine Chip connected through PCM Codec-1 to the SPK Driver and MIC Gain (Speaker and Microphone, with acoustic coupling between them), and through PCM Codec-2 to the Line Gain and Line Driver (4-2 wire coupling to the telephone line, with line coupling)]
52
Worldwide DAM Size
Region          Q'ty (M sets)   Share
North America   22              60%
Europe          6               16%
Japan           7               19%
Others          2               5%
Total           37
Source: MXIC, 2001
53
Worldwide DAM Size
[Chart: stacked quantities per year, 1999 to 2003, for US, Japan, Europe, China and Others; vertical axis 0 to 45,000]
unit: K sets
Region   1999     2000     2001     2002     2003     CAGR
US       20,500   21,500   22,000   22,500   23,000   2.92%
Japan    6,800    7,000    7,000    7,100    7,200    1.44%
Europe   6,200    6,000    6,000    5,800    5,800    -1.65%
China    180      200      200      600      1,000    53.53%
Others   1,800    2,000    2,000    2,100    2,200    5.14%
Total    35,480   36,700   37,200   38,100   39,200   2.52%
Source: MXIC, 2001
54
DAM Product Road Map
Timeline: Q1/2002, Q2/2002, Q3/2002, Q4/2002, Q1/2003, Q2/2003, Q3/2003, Q4/2003
Products on the road map:
MX93L108 Entry level DAM Processor
MX93111 5V DAM
MX93L111A 3V MRC DAM
MX93132 5V DAM w/ CID/SPK
MX93L132A 3V MRC DAM w/ CID/SPK
DAM embedded 4Mb Flash
DAM processor embedded 1Mb MTP
DAM Solution
* Rectangle means existing products, and circle means products under development
* The left edge of a circle is the project start date, and the right edge is the commercial sample date
* MRC stands for Multi-Rate Coder, CID stands for Caller ID, and SPK stands for Speaker phone
55
MXIC DAM Solution Advantages
[Figure: DAM market segments from low-end to high-end, with the MXIC solutions marked: MRC (Multi-Rate Coder) + 8/16Mb AFlash; 12.8K/32Kbps + 64/128Mb SDRAM]
MXIC has different kinds of solutions in each DAM market segment
MXIC is the leader in the mid-range segment, and a Top 2 DAM IC vendor in the world
MXIC provides one-stop shopping service (DSP, MCU, AFlash) in DAM applications
57
Audio Product Offering - I: AIR(TM) (Audio IC Recorder)
[Block diagram with the following blocks: Audio Encoder/Decoder Processor, Host controller, Power Amplifier, Memory (MMC, CF, SD, Memory Stick, …), 16-bit Audio Codec, Speaker, Headphone, Audio ROM, Flash, Audio input, Audio Devices]
* Red block means the components or technologies that MXIC can provide.
58
AIR(TM) (Audio IC Recorder)
AIR(TM): a brand new audio product concept!
With built-in S/PDIF, audio data can be saved directly into the MP3 player via its real-time MP3 encoding.
Say good-bye to the complicated PC download method!
[Figure: the PC route (CD -> Compression -> Download) compared with a direct S/PDIF connection from audio devices]
59
AIR(TM) (Audio IC Recorder)
Mini Component System and Portable Audio:
Upgrade conventional models to fully digital audio (MP3)
Alignment with the young generation’s portable MP3 players!
[Figure: the MX92L600 Audio IC Recorder linking Mini Component Systems and Portable Audio, replacing cassettes with memory cards and portable MP3 players]
60
Audio Product Offering - II: Wavetable Sound Generator
MIDI for Sound Generator:
[Block diagram with the following blocks: MIDI IN, Microprocessor, SRAM, Program ROM, Wavetable Synthesizer, Wavetable ROM, Audio DAC; the sound generator blocks are grouped as the Sound Generator ASSP]
* Red block means the components or technologies that MXIC can provide.
61
Digital Audio Product Road Map
Timeline: Q1/2001, Q2/2001, Q3/2001, Q4/2001, Q1/2002, Q2/2002, Q3/2002, Q4/2002
Products on the road map:
MX92L500 MP3 decoder
MX92L600 MP3 Codec
Promotional Singles (8MB embedded)
MP3/AAC Player & Recorder Solution
Audio ROM derivatives
* Rectangle means existing products, and circle means products under development
* The left edge of a circle is the project start date, and the right edge is the commercial sample date
* DVR stands for Digital Voice Recorder, LRC stands for Low-Rate Coder
62
MXIC Digital Audio Advantages
Professional MIDI technology (with General MIDI V1.0 Sound set, 32 Polyphony and 32 Multi-timbre) provides supreme sound generator solution for Mobile phones, PDA, ED, and Toys applications.
Complete solution for MP3 player and recorder
In-house Sequential ROM, Flash and Memory Card support
64
Summary
Among audio compression technologies, MP3 is the most mature one, while MP3PRO is deemed to be a future star.
FM and wavetable synthesis are the mainstream audio synthesis technologies, and wavetable synthesis seems superior in practice.
Different speech technologies suit different applications. Among them, hybrid coding, reinforced by DSP technology, is superior.
MXIC focuses on audio and speech technologies, and several products related to audio and speech were presented.
65
Moving Toward IA, Moving with Us!