![Page 1: Basic Features of Audio Signals ( 音訊的基本特徵 )](https://reader036.vdocuments.mx/reader036/viewer/2022081506/568151c6550346895dbffb23/html5/thumbnails/1.jpg)
Basic Features of Audio Signals (音訊的基本特徵)
Jyh-Shing Roger Jang (張智星), http://www.cs.nthu.edu.tw/~jang
MIR Lab, CS Dept, Tsing Hua Univ., Hsinchu, Taiwan
![Page 2: Basic Features of Audio Signals ( 音訊的基本特徵 )](https://reader036.vdocuments.mx/reader036/viewer/2022081506/568151c6550346895dbffb23/html5/thumbnails/2.jpg)
Audio Features
Four commonly used audio features:
Volume
Pitch
Zero crossing rate
Timbre
Our goal: these features can be perceived subjectively, but we need to compute them quantitatively for further processing and recognition.
![Page 3: Basic Features of Audio Signals ( 音訊的基本特徵 )](https://reader036.vdocuments.mx/reader036/viewer/2022081506/568151c6550346895dbffb23/html5/thumbnails/3.jpg)
Audio Features in Time Domain
Audio features presented in the time domain:
Intensity
Fundamental period (FP)
Timbre: waveform within a fundamental period
![Page 4: Basic Features of Audio Signals ( 音訊的基本特徵 )](https://reader036.vdocuments.mx/reader036/viewer/2022081506/568151c6550346895dbffb23/html5/thumbnails/4.jpg)
Audio Features in Frequency Domain
Volume: magnitude of the spectrum
Pitch: distance between harmonics
Timbre: smoothed spectrum
[Figure: spectrum annotated with intensity, pitch frequency, first formant F1, and second formant F2]
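A minimal MATLAB sketch of the frequency-domain view, assuming a synthetic test signal. The 250 Hz fundamental with two harmonics, the frame length, and the 10% peak threshold are illustrative choices, not values from the slides. It computes the magnitude spectrum with fft and reads the pitch as the spacing between harmonic peaks.

```matlab
fs = 16000;                 % sample rate (Hz), assumed
n = 512;                    % frame length (samples), assumed
t = (0:n-1)/fs;
% Hypothetical test signal: 250 Hz fundamental plus harmonics at 500 and 750 Hz.
y = sin(2*pi*250*t) + 0.5*sin(2*pi*500*t) + 0.25*sin(2*pi*750*t);

mag = abs(fft(y));          % magnitude spectrum
mag = mag(1:n/2+1);         % keep the non-negative frequencies only
freq = (0:n/2)*fs/n;        % frequency of each bin (Hz)

% Crude peak picking: bins above 10% of the largest magnitude (assumed threshold).
harmonics = freq(mag > 0.1*max(mag));
spacing = diff(harmonics);  % distance between adjacent harmonics (Hz)
disp(harmonics);
disp(spacing);
```

The harmonics here fall exactly on FFT bins, so no windowing is needed; real speech would likely call for a window function and a more robust peak picker.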
![Page 5: Basic Features of Audio Signals ( 音訊的基本特徵 )](https://reader036.vdocuments.mx/reader036/viewer/2022081506/568151c6550346895dbffb23/html5/thumbnails/5.jpg)
Demo: Real-time Spectrogram
Try “dspstfft_audio” under MATLAB:
[Figure: spectrogram and spectrum panels from the demo]
![Page 6: Basic Features of Audio Signals ( 音訊的基本特徵 )](https://reader036.vdocuments.mx/reader036/viewer/2022081506/568151c6550346895dbffb23/html5/thumbnails/6.jpg)
Steps for Audio Feature Extraction
Frame blocking: frame duration of 20 ms or so
Feature extraction: volume, zero-crossing rate, pitch, MFCC, etc.
Endpoint detection: usually based on volume and zero-crossing rate
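The endpoint detection step can be sketched as a simple volume threshold. This is only an illustration under stated assumptions: the per-frame volume values are made up, and the rule "a frame is sound if its volume exceeds 10% of the maximum" is a hypothetical choice, not the method prescribed by the slides. The zero-crossing-rate part is left out for brevity.

```matlab
% Hypothetical per-frame volumes (one value per frame); not data from the slides.
volume = [0.1 0.2 0.1 3.5 6.0 5.2 4.1 0.3 0.1 0.2];

% Assumed rule: a frame contains sound if its volume exceeds 10% of the maximum.
volTh = 0.1*max(volume);
soundFrames = find(volume > volTh);

startFrame = soundFrames(1);     % first frame above the threshold
endFrame = soundFrames(end);     % last frame above the threshold
fprintf('Sound from frame %d to frame %d\n', startFrame, endFrame);
```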
![Page 7: Basic Features of Audio Signals ( 音訊的基本特徵 )](https://reader036.vdocuments.mx/reader036/viewer/2022081506/568151c6550346895dbffb23/html5/thumbnails/7.jpg)
Frame Blocking
Sample rate = 11025 Hz
Frame size = 256 samples
Overlap = 84 samples (hop size = 256-84 = 172 samples)
Frame rate = 11025/(256-84) ≈ 64 frames/sec
[Figure: waveform and a zoomed-in view showing one frame and the overlap between adjacent frames]
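A minimal sketch of frame blocking with the parameters listed on this slide. The test signal (one second of a 220 Hz sine) and the variable names are assumptions for illustration; trailing samples that do not fill a complete frame are simply dropped here, which is one possible convention.

```matlab
fs = 11025;                  % sample rate (Hz), from the slide
frameSize = 256;             % samples per frame, from the slide
overlap = 84;                % samples shared by adjacent frames, from the slide
hop = frameSize - overlap;   % hop size: 172 samples

% Hypothetical test signal: 1 second of a 220 Hz sine wave.
t = (0:fs-1)/fs;
y = 0.3*sin(2*pi*220*t);

% Number of complete frames that fit into the signal.
frameCount = floor((length(y) - overlap)/hop);
frames = zeros(frameSize, frameCount);   % one frame per column
for i = 1:frameCount
    startIndex = (i-1)*hop + 1;
    frames(:, i) = y(startIndex:startIndex+frameSize-1);
end

frameRate = fs/hop;          % frames per second
fprintf('%d frames, %.2f frames/sec\n', frameCount, frameRate);
```

At 11025 Hz a 256-sample frame lasts about 23 ms, in line with the "20 ms or so" guideline.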
![Page 8: Basic Features of Audio Signals ( 音訊的基本特徵 )](https://reader036.vdocuments.mx/reader036/viewer/2022081506/568151c6550346895dbffb23/html5/thumbnails/8.jpg)
Intensity (I)
Intensity
Visual cue: amplitude of vibration
Computation:
Volume: $vol = \sum_{i=1}^{n} |s_i|$
Log energy (in decibels): $energy = 10 \log_{10} \sum_{i=1}^{n} s_i^2$
Characteristics
Influenced by microphone types and microphone setups
Perceived volume is influenced by frequency and timbre
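The two intensity measures can be sketched in MATLAB as follows. The frame contents (256 samples of a 220 Hz sine at 11025 Hz) are a hypothetical example, not data from the slides.

```matlab
% Hypothetical frame: 256 samples of a 220 Hz sine at 11025 Hz.
fs = 11025;
n = 256;
s = 0.3*sin(2*pi*220*(0:n-1)/fs);

vol = sum(abs(s));               % volume: sum of absolute sample values
energy = 10*log10(sum(s.^2));    % log energy in decibels
fprintf('volume = %.4f, log energy = %.4f dB\n', vol, energy);
```

Scaling the samples by a factor of 10 multiplies the volume by 10 and adds 20 dB to the log energy, which shows why microphone gain directly affects both measures.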
![Page 9: Basic Features of Audio Signals ( 音訊的基本特徵 )](https://reader036.vdocuments.mx/reader036/viewer/2022081506/568151c6550346895dbffb23/html5/thumbnails/9.jpg)
Intensity (II)
To avoid DC drifting
DC drifting: the vibration is not centered around zero
Computation:
Volume: $vol = \sum_{i=1}^{n} |s_i - median(s)|$
Log energy (in decibels): $energy = 10 \log_{10} \sum_{i=1}^{n} (s_i - mean(s))^2$
Theoretical background (How to prove?)
For $s = [s_1, s_2, \ldots, s_n]$:
$median(s) = \arg\min_x \sum_{i=1}^{n} |s_i - x|$
$mean(s) = \arg\min_x \sum_{i=1}^{n} (s_i - x)^2$
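A small numerical sketch of the DC-corrected measures, assuming a hypothetical frame with a DC offset of 0.2. It also checks the minimizer property on a grid of candidate shifts x; this is a numerical illustration, not a proof.

```matlab
% Hypothetical frame with a DC offset of 0.2 (values chosen for illustration only).
fs = 11025;
n = 256;
s = 0.2 + 0.3*sin(2*pi*220*(0:n-1)/fs);

volRaw = sum(abs(s));                         % volume without DC correction
vol = sum(abs(s - median(s)));                % volume after subtracting the median
energy = 10*log10(sum((s - mean(s)).^2));     % log energy after subtracting the mean

% Evaluate both cost functions on a grid of candidate shifts x.
x = linspace(min(s), max(s), 2001);
absCost = zeros(size(x));
sqCost = zeros(size(x));
for k = 1:length(x)
    absCost(k) = sum(abs(s - x(k)));
    sqCost(k) = sum((s - x(k)).^2);
end
[~, kAbs] = min(absCost);
[~, kSq] = min(sqCost);
fprintf('median = %.4f, grid minimizer of sum|s-x|   = %.4f\n', median(s), x(kAbs));
fprintf('mean   = %.4f, grid minimizer of sum(s-x)^2 = %.4f\n', mean(s), x(kSq));
```

No grid point gives a lower cost than the median (absolute cost) or the mean (squared cost), which is what the two arg min statements claim.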
![Page 10: Basic Features of Audio Signals ( 音訊的基本特徵 )](https://reader036.vdocuments.mx/reader036/viewer/2022081506/568151c6550346895dbffb23/html5/thumbnails/10.jpg)
Intensity (III)
Examples: please refer to the online tutorial.
![Page 11: Basic Features of Audio Signals ( 音訊的基本特徵 )](https://reader036.vdocuments.mx/reader036/viewer/2022081506/568151c6550346895dbffb23/html5/thumbnails/11.jpg)
Pitch
Definition: pitch is known as the fundamental frequency, which equals the number of fundamental periods within a second. The unit used here is hertz (Hz).
More commonly, pitch is expressed in semitones, which can be converted from pitch in hertz:
$semitone = 69 + 12 \log_2\left(\frac{\text{pitch in Hz}}{440}\right)$
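The conversion can be sketched as a pair of MATLAB anonymous functions. The function names are hypothetical, and the inverse is simply the same formula solved for the frequency.

```matlab
% Hz to semitone, following the formula semitone = 69 + 12*log2(Hz/440).
freq2semitone = @(f) 69 + 12*log2(f/440);
% Semitone back to Hz (the formula above solved for the frequency).
semitone2freq = @(st) 440*2.^((st - 69)/12);

a4 = freq2semitone(440);     % 440 Hz maps to semitone 69
a5 = freq2semitone(880);     % one octave up adds 12 semitones
fprintf('440 Hz -> %.2f, 880 Hz -> %.2f\n', a4, a5);
fprintf('semitone 60 -> %.2f Hz\n', semitone2freq(60));
```

With this offset the semitone value coincides with the MIDI note number, where A4 (440 Hz) is note 69.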
![Page 12: Basic Features of Audio Signals ( 音訊的基本特徵 )](https://reader036.vdocuments.mx/reader036/viewer/2022081506/568151c6550346895dbffb23/html5/thumbnails/12.jpg)
Pitch Computation (I)
Pitch of tuning forks
$ff = \frac{16000}{(\text{length of 5 fundamental periods in samples})/5} = 439.56$ Hz
$pitch = 69 + 12 \log_2\left(\frac{ff}{440}\right) = 68.98$ semitone
![Page 13: Basic Features of Audio Signals ( 音訊的基本特徵 )](https://reader036.vdocuments.mx/reader036/viewer/2022081506/568151c6550346895dbffb23/html5/thumbnails/13.jpg)
Pitch Computation (II)
Pitch of speech
$ff = \frac{16000}{(477-75)/3} = 119.403$ Hz
$pitch = 69 + 12 \log_2\left(\frac{ff}{440}\right) = 46.42$ semitone
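The speech example can be reproduced with a few lines of MATLAB. Reading the slide's numbers as 3 fundamental periods spanning sample indices 75 to 477 at a 16000 Hz sample rate is an assumption about the figure; the variable names are hypothetical.

```matlab
fs = 16000;          % sample rate (Hz)
startIndex = 75;     % assumed sample index where the first period starts
endIndex = 477;      % assumed sample index where the third period ends
periodCount = 3;     % number of fundamental periods between the two indices

periodInSamples = (endIndex - startIndex)/periodCount;   % fundamental period (samples)
ff = fs/periodInSamples;                                 % fundamental frequency (Hz)
pitch = 69 + 12*log2(ff/440);                            % pitch in semitones
fprintf('ff = %.3f Hz, pitch = %.2f semitone\n', ff, pitch);
```

Measuring across several periods and dividing by the count reduces the error from picking the period boundaries by eye.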
![Page 14: Basic Features of Audio Signals ( 音訊的基本特徵 )](https://reader036.vdocuments.mx/reader036/viewer/2022081506/568151c6550346895dbffb23/html5/thumbnails/14.jpg)
Statistics of Mandarin Chinese
5401 characters, each associated with at least one base syllable and a tone
411 base syllables, and most syllables have 4 tones, so we have 1501 tonal syllables
Tone is characterized by the pitch curves:
Tone 1: high-high
Tone 2: low-high
Tone 3: high-low-high
Tone 4: high-low
Some examples of tones:
1242: 清華大學 (Tsing Hua University)
1234: 三民主義 (Three Principles of the People), 優柔寡斷 (indecisive), 搭達打大 (da in tones 1 to 4), 依宜以易 (yi in tones 1 to 4), 夫福府負 (fu in tones 1 to 4)
?????: 美麗大教堂 (beautiful cathedral), 滷蛋有夠鹹 (the braised egg is really salty; Taiwanese)
![Page 15: Basic Features of Audio Signals ( 音訊的基本特徵 )](https://reader036.vdocuments.mx/reader036/viewer/2022081506/568151c6550346895dbffb23/html5/thumbnails/15.jpg)
Sinusoidal Signals
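How to generate a stream of sinusoidal signals:
fs=16000;                             % sample rate (Hz)
duration=3;                           % duration (sec)
f=440;                                % frequency of the sinusoid (Hz)
t=(1:fs*duration)/fs;                 % time of each sample (sec)
y=0.8*sin(2*pi*f*t);                  % sinusoid with amplitude 0.8
plot(t,y); axis([0.6, 0.65, -1 1]);   % show a 50 ms portion of the waveform
sound(y, fs);                         % play the signal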
![Page 16: Basic Features of Audio Signals ( 音訊的基本特徵 )](https://reader036.vdocuments.mx/reader036/viewer/2022081506/568151c6550346895dbffb23/html5/thumbnails/16.jpg)
Zero Crossing Rate
Zero crossing rate (ZCR): the number of zero crossings in a frame.
Characteristics:
Noise and unvoiced sounds have high ZCR.
ZCR is commonly used in endpoint detection, especially in detecting the start and end of unvoiced sounds.
To distinguish noise/silence from unvoiced sounds, we usually add a bias before computing ZCR.
![Page 17: Basic Features of Audio Signals ( 音訊的基本特徵 )](https://reader036.vdocuments.mx/reader036/viewer/2022081506/568151c6550346895dbffb23/html5/thumbnails/17.jpg)
ZCR Computations
Two types of ZCR definition:
If a sample with zero value is counted as a zero crossing, then the ZCR is higher; otherwise it is lower.
The choice affects the ZCR, especially when the sample rate is low.
Other considerations:
Zero-justification (centering the signal around zero) is required.
ZCR with shift can be used to distinguish between unvoiced sounds and silence. (How to determine the shift amount?)
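The two ZCR definitions and the shifted variant can be sketched as below. The sample values, the use of the median for zero-justification, and the shift amount are all assumptions chosen for illustration; the slides leave the choice of shift amount open.

```matlab
% Hypothetical integer-valued frame with a DC offset of 5 (illustration only).
raw = [3 1 0 -2 -1 0 0 2 4 1 -1 -3] + 5;

% Zero-justification, here assumed to mean subtracting the frame median.
s = raw - median(raw);

% Definition 1: a zero-valued sample is not counted as a zero crossing.
zcr1 = sum(s(1:end-1).*s(2:end) < 0);
% Definition 2: a zero-valued sample is counted as a zero crossing.
zcr2 = sum(s(1:end-1).*s(2:end) <= 0);

% ZCR with shift: subtract a bias so that low-level noise stops crossing zero.
noise = [1 -1 1 -1 1 -1 1 -1];   % hypothetical low-amplitude background noise
shiftAmount = 2;                 % assumed bias, larger than the noise amplitude
zcrNoise = sum(noise(1:end-1).*noise(2:end) < 0);
shifted = noise - shiftAmount;
zcrNoiseShifted = sum(shifted(1:end-1).*shifted(2:end) < 0);

fprintf('zcr1 = %d, zcr2 = %d\n', zcr1, zcr2);
fprintf('noise ZCR = %d, after shift = %d\n', zcrNoise, zcrNoiseShifted);
```

The gap between the two definitions grows when many samples are exactly zero, which is more likely with integer-valued samples and low sample rates.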
![Page 18: Basic Features of Audio Signals ( 音訊的基本特徵 )](https://reader036.vdocuments.mx/reader036/viewer/2022081506/568151c6550346895dbffb23/html5/thumbnails/18.jpg)
ZCR
Examples: please refer to the online tutorial.