basic audio feature
7/28/2019 Basic Audio Feature
1/18
Basic Features of Audio Signals
Jyh-Shing Roger Jang
http://mirlab.org/jang
MIR Lab, CSIE Dept., National Taiwan Univ., Taiwan
-
Audio Features
Four commonly used audio features
Volume, pitch, zero crossing rate, timbre
These features can be perceived subjectively
(except for zero crossing rate).
Our goal is to compute them quantitatively (and
objectively) for further processing and recognition.
-
Audio Features in Time Domain
Audio features presented in the time domain
Intensity
Fundamental period
Timbre: Waveform within a fundamental period (FP)
-
Audio Features in Frequency Domain
Volume: Magnitude of spectrum
Pitch: Distance between harmonics
Timbre: Smoothed spectrum
(Figure labels: first formant F1, second formant F2, pitch frequency, intensity)
-
General Steps for Audio Analysis
1. Frame blocking
Frame duration of 20 ms or so
2. Feature extraction
Volume, zero-crossing rate, pitch, MFCC, etc.
3. Frame-based Analysis
Pitch contour comparison, HMM evaluation, etc.
-
Frame Blocking
Sample rate = 11025 Hz
Frame size = 256 samples
Overlap = 84 samples
(Hop size = frame size - overlap)
Frame rate = 11025/(256-84) ≈ 64 frames/sec
(Figure: waveform with one frame and its overlap marked, plus a zoomed-in view)
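The arithmetic on this slide can be checked in a few lines. This is a minimal sketch in Python, used here for illustration only; the function name `frame_rate` is hypothetical and does not come from the slides.

```python
def frame_rate(sample_rate, frame_size, overlap):
    """Return (hop_size, frames_per_second) for the given framing parameters."""
    hop_size = frame_size - overlap          # samples between consecutive frame starts
    return hop_size, sample_rate / hop_size  # frames produced per second of audio


hop, rate = frame_rate(sample_rate=11025, frame_size=256, overlap=84)
print(hop)             # 172
print(round(rate, 1))  # 64.1
```

A larger overlap shortens the hop and raises the frame rate, at the cost of more frames to process.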
-
Frame-based Manipulation
For simplicity, we usually pack frames into a
frame matrix for easy manipulation in
MATLAB:
[y, fs, nbits]=wavread('file.wav');   % newer MATLAB releases: [y, fs]=audioread('file.wav');
frameMat=enframe(y, frameSize, overlap);   % pack the frames into a frame matrix
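For readers without access to `enframe`, the framing step can be sketched in plain Python. The function `enframe_sketch` below is a hypothetical stand-in, not the toolbox function: it returns a list of frames, and it assumes that trailing samples which do not fill a whole frame are dropped.

```python
def enframe_sketch(signal, frame_size, overlap):
    """Split a sequence into overlapping frames.

    Assumption: a trailing partial frame is discarded rather than padded.
    """
    hop = frame_size - overlap
    if hop <= 0:
        raise ValueError("overlap must be smaller than frame_size")
    if len(signal) < frame_size:
        return []
    n_frames = (len(signal) - frame_size) // hop + 1
    return [list(signal[i * hop: i * hop + frame_size]) for i in range(n_frames)]


frames = enframe_sketch(list(range(10)), frame_size=4, overlap=2)
print(frames)  # [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```

Each inner list is one frame, so per-frame features can be computed with a simple loop or comprehension over `frames`.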
-
Volume (I)
Loudness of audio signals
Visual cue: Amplitude of vibration
Also known as energy or intensity
Two major ways of computing volume:
Volume: $vol = \sum_{i=1}^{n} |s_i|$
Log energy (in decibel): $energy = 10 \log_{10} \sum_{i=1}^{n} s_i^2$
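Both volume definitions can be sketched for a single frame as follows. The function names are hypothetical, and the small constant `eps` is an added assumption to avoid taking the logarithm of zero on an all-silent frame; it is not part of the slide's formula.

```python
import math


def volume_abs_sum(frame):
    """Volume as the sum of absolute sample values."""
    return sum(abs(s) for s in frame)


def volume_log_energy(frame, eps=1e-12):
    """Volume as log energy in decibels: 10*log10(sum of squared samples).

    eps is an assumption added here so that silent frames do not hit log10(0).
    """
    return 10.0 * math.log10(sum(s * s for s in frame) + eps)


frame = [0.1, -0.2, 0.3, -0.4]
print(round(volume_abs_sum(frame), 4))     # 1.0
print(round(volume_log_energy(frame), 2))  # -5.23
```

The absolute-sum version is linear in amplitude, while the log-energy version compresses the range in a way that is closer to how loudness is perceived.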
-
Volume (II)
Perceived volume is influenced by
Frequency (see equal loudness curves in text)
Timbre (see example in text)
Computed volume is influenced by
Microphone types
Microphone setups
-
Volume (III)
To avoid DC bias (or DC drifting)
DC bias: The vibration is not around zero
Computation:
Volume: $vol = \sum_{i=1}^{n} |s_i - \mathrm{median}(s)|$
Log energy (in decibel): $energy = 10 \log_{10} \sum_{i=1}^{n} (s_i - \mathrm{mean}(s))^2$
Theoretical background (How to prove?)
$\mathrm{median}(s_1, s_2, \ldots, s_n) = \arg\min_x \sum_{i=1}^{n} |s_i - x|$
$\mathrm{mean}(s_1, s_2, \ldots, s_n) = \arg\min_x \sum_{i=1}^{n} (s_i - x)^2$
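The DC-bias-corrected volumes, and a numerical check (not a proof) of the two arg-min statements, can be sketched as below. All function names are hypothetical, `eps` is an added assumption to guard against log10(0), and the grid search only illustrates the claim on one small example frame.

```python
import math
import statistics


def volume_abs_sum_zj(frame):
    """Sum of absolute deviations from the frame median."""
    med = statistics.median(frame)
    return sum(abs(s - med) for s in frame)


def volume_log_energy_zj(frame, eps=1e-12):
    """Log energy (dB) of the frame after subtracting its mean."""
    mu = statistics.mean(frame)
    return 10.0 * math.log10(sum((s - mu) ** 2 for s in frame) + eps)


def abs_cost(frame, x):
    return sum(abs(s - x) for s in frame)


def sq_cost(frame, x):
    return sum((s - x) ** 2 for s in frame)


# Example frame with a DC bias of 0.5 (values chosen for illustration).
frame = [0.5 + v for v in (0.1, -0.2, 0.3, -0.4, 0.0)]

# Brute-force search over candidate offsets x in [0, 1] with step 0.001.
grid = [i / 1000 for i in range(1001)]
best_abs = min(grid, key=lambda x: abs_cost(frame, x))
best_sq = min(grid, key=lambda x: sq_cost(frame, x))

print(best_abs, statistics.median(frame))  # 0.5 0.5
print(best_sq, round(statistics.mean(frame), 6))
```

On this frame the grid minimizer of the absolute cost lands on the median and the minimizer of the squared cost lands on the mean, which is why the median is paired with the absolute-sum volume and the mean with the log energy.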
-
Volume (IV)
Functions for computing volume
Example: volume01
Example: volume02
Example: volume03
Volume depends on
Frequency
Try this equal loudness test
Timbre
Example: volume04
http://mirlab.org/jang/books/audioSignalProcessing/basicFeatureVolume.asp?title=5-2%20Volume%20(%AD%B5%B6q)
http://www.phys.unsw.edu.au/jw/hearing.html
-
Zero Crossing Rate
Zero crossing rate (ZCR)
The number of zero crossings in a frame.
Characteristics
Zero-justification is required.
Noise and unvoiced sound have high ZCR.
ZCR is commonly used in endpoint detection,
especially in detecting the start and end of unvoiced sounds.
To distinguish noise/silence from unvoiced sound, we usually add a shift before computing ZCR.
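A minimal sketch of a per-frame zero crossing count is shown below. It assumes that zero-justification means subtracting the frame mean before counting sign changes; the function name is hypothetical.

```python
import statistics


def zero_crossing_count(frame, zero_justify=True):
    """Count sign changes between consecutive samples in one frame.

    Assumption: zero-justification is done by subtracting the frame mean.
    """
    if zero_justify:
        mu = statistics.mean(frame)
        frame = [s - mu for s in frame]
    return sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)


# A frame oscillating around a DC bias of about 1.0 (illustrative values).
biased = [1.2, 0.8, 1.1, 0.9, 1.3, 0.7]
print(zero_crossing_count(biased, zero_justify=False))  # 0
print(zero_crossing_count(biased))                      # 5
```

Without zero-justification the biased frame never changes sign, so its oscillation is invisible to the count. This is why the slide lists zero-justification as a requirement.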
-
ZCR Computations
Two types of ZCR definitions
If a sample with zero value is counted as a zero crossing, then the value of ZCR is higher. Otherwise
it is lower.
The distinction diminishes when using a higher bit resolution.
Other considerations
ZCR with shift can be used to distinguish between unvoiced sounds and silence. (How to determine the shift amount?)
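The two ZCR definitions and the shifted variant can be compared with a short sketch. The function names are hypothetical. The choice of shift (twice the peak amplitude of a frame assumed to contain only background noise) is one possible heuristic, used here as an assumption; the slide leaves the shift amount as an open question.

```python
def zcr_strict(frame):
    """Zero-valued samples are NOT counted: product must be negative."""
    return sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)


def zcr_inclusive(frame):
    """Zero-valued samples ARE counted: product may be zero or negative."""
    return sum(1 for a, b in zip(frame, frame[1:]) if a * b <= 0)


def zcr_with_shift(frame, shift):
    """Subtract a constant shift from every sample, then count strict crossings."""
    return zcr_strict([s - shift for s in frame])


# Integer samples containing zeros: the two definitions disagree.
frame = [3, 0, -2, 1, 0, 0, 4]
print(zcr_strict(frame), zcr_inclusive(frame))  # 1 6

# Illustrative low-level noise versus a louder unvoiced-like frame.
noise = [1, -1, 2, -2, 1, -1]
unvoiced = [30, -25, 28, -31, 27, -29]
shift = 2 * max(abs(s) for s in noise)  # assumed heuristic for the shift amount
print(zcr_strict(noise), zcr_with_shift(noise, shift))        # 5 0
print(zcr_strict(unvoiced), zcr_with_shift(unvoiced, shift))  # 5 5
```

After the shift, the low-amplitude noise no longer crosses zero while the louder frame keeps its high count, which is the separation the slide describes.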
-
ZCR
ZCR computing
Example: zcr01
Example: zcr02
To use ZCR to distinguish between unvoiced
sounds and environmental noise
Example: zcrWithShift
http://mirlab.org/jang/books/audioSignalProcessing/basicFeatureZeroCrossingRate.asp?title=5-3%20Zero%20Crossing%20Rate%20(%B9L%B9s%B2v)
-
Pitch
Definition
Pitch is also known as fundamental frequency, which is equal to the number of fundamental periods
within a second. The unit used here is Hertz (Hz).
More commonly, pitch is expressed in semitones,
which can be converted from pitch in Hertz:
$semitone = 69 + 12 \log_2 \left( \frac{Hz}{440} \right)$
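The conversion can be sketched as a pair of small functions. The names are hypothetical, and the inverse function `semitone_to_hz` is an addition for illustration that does not appear on the slide. With this formula, 440 Hz maps to semitone 69, which matches the MIDI note number of A4.

```python
import math


def hz_to_semitone(freq_hz):
    """Convert a frequency in Hz to semitones: 69 + 12*log2(f/440)."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)


def semitone_to_hz(semitone):
    """Inverse mapping (added for illustration): 440 * 2^((semitone-69)/12)."""
    return 440.0 * 2.0 ** ((semitone - 69.0) / 12.0)


print(hz_to_semitone(440.0))           # 69.0
print(hz_to_semitone(880.0))           # 81.0
print(round(semitone_to_hz(60.0), 2))  # 261.63
```

Doubling the frequency adds exactly 12 semitones, so equal musical intervals become equal numeric differences on the semitone scale.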
-
Pitch Computation (I)
Pitch of tuning forks
$fp = (189 - 7)/5/16000 = 0.002275$ sec
$ff = 1/fp = 439.56$ Hz
$pitch = 69 + 12 \log_2 (ff/440) = 68.9827$ semitone
-
Pitch Computation (II)
Pitch of speech
$fp = (477 - 75)/3/16000 = 0.008375$ sec
$ff = 1/fp = 119.403$ Hz
$pitch = 69 + 12 \log_2 (ff/440) = 46.42$ semitone
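The two pitch computations (tuning fork and speech) follow the same recipe: measure the sample span covering a whole number of fundamental periods, divide by the number of periods and the sample rate, then invert and convert to semitones. The sketch below reproduces both results; the function name and argument names are hypothetical, and the sample indices and the 16000 Hz sample rate are the ones used on the slides.

```python
import math


def pitch_from_periods(start_idx, end_idx, n_periods, fs):
    """Return (fundamental period in sec, fundamental frequency in Hz, semitone)."""
    fp = (end_idx - start_idx) / n_periods / fs  # duration of one period
    ff = 1.0 / fp                                # periods per second
    semitone = 69.0 + 12.0 * math.log2(ff / 440.0)
    return fp, ff, semitone


fork = pitch_from_periods(7, 189, 5, 16000)
speech = pitch_from_periods(75, 477, 3, 16000)
print([round(v, 6) for v in fork])    # about 0.002275 sec, 439.56 Hz, 68.98 semitone
print([round(v, 6) for v in speech])  # about 0.008375 sec, 119.40 Hz, 46.42 semitone
```

Measuring across several periods rather than one reduces the effect of a one-sample error in locating the period boundaries.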
-
Statistics of Mandarin Chinese
5401 characters, each associated with at least one base syllable and a tone
411 base syllables, and most syllables have 4 tones, so we have 1501 tonal syllables
Tone is characterized by the pitch curves:
Tone 1: high-high
Tone 2: low-high
Tone 3: high-low-high
Tone 4: high-low
Some examples of tones:
1242
1234
?????Taiwanese