basic audio feature

7/28/2019 Basic Audio Feature

1/18

Basic Features of Audio Signals

Jyh-Shing Roger Jang

http://mirlab.org/jang

MIR Lab, CSIE Dept., National Taiwan Univ., Taiwan


2/18

Audio Features

Four commonly used audio features

Volume, pitch, zero crossing rate, timbre

These features can be perceived subjectively (except for zero crossing rate).

Our goal is to compute them quantitatively (and objectively) for further processing and recognition.


3/18

Audio Features in Time Domain

Audio features presented in the time domain

Volume: Intensity

Pitch: Fundamental period (FP)

Timbre: Waveform within a fundamental period


4/18

Audio Features in Frequency Domain

Volume: Magnitude of spectrum

Pitch: Distance between harmonics

Timbre: Smoothed spectrum

[Figure: spectrum annotated with intensity, pitch frequency, first formant (F1), and second formant (F2)]


5/18

General Steps for Audio Analysis

1. Frame blocking

Frame duration of 20 ms or so

2. Feature extraction

Volume, zero-crossing rate, pitch, MFCC, etc

3. Frame-based Analysis

Pitch contour comparison, HMM evaluation, etc


6/18

Frame Blocking

Sample rate = 11025 Hz

Frame size = 256 samples

Overlap = 84 samples

(Hop size = frame size - overlap)

Frame rate = 11025/(256-84) ≈ 64 frames/sec

[Figure: waveform with a frame and the overlap marked, plus a zoomed-in view]
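
The frame-blocking arithmetic above can be sketched in a few lines. This is a minimal illustration in plain Python rather than MATLAB; the helper name `block_frames` is hypothetical, and dropping the trailing partial frame is an assumption, not something the slides specify.

```python
def block_frames(signal, frame_size, overlap):
    """Split a sample sequence into overlapping frames.

    Assumption: a trailing partial frame is dropped.
    """
    hop = frame_size - overlap  # hop size = frame size - overlap
    if hop <= 0:
        raise ValueError("overlap must be smaller than frame_size")
    frames = []
    start = 0
    while start + frame_size <= len(signal):
        frames.append(signal[start:start + frame_size])
        start += hop
    return frames


fs = 11025            # sample rate in Hz
frame_size = 256      # samples per frame
overlap = 84          # samples shared by adjacent frames
hop = frame_size - overlap
frame_rate = fs / hop  # frames per second

# One second of silence, just to count the frames that fit.
frames = block_frames([0.0] * fs, frame_size, overlap)
print(hop, round(frame_rate, 2), len(frames))
```

With these settings the hop size is 172 samples, so the frame rate is slightly above 64 frames/sec, and one second of audio yields 63 complete frames.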


7/18

Frame-based Manipulation

For simplicity, we usually pack frames into a frame matrix for easy manipulation in MATLAB:

[y, fs, nbits] = wavread('file.wav');   % samples, sample rate, bit resolution

frameMat = enframe(y, frameSize, overlap);   % pack overlapping frames into a matrix


8/18

Volume (I)

Loudness of audio signals

Visual cue: Amplitude of vibration

Also known as energy or intensity

Two major ways of computing volume, where $s_i$ is the $i$-th sample of a frame of $n$ samples:

Volume: $vol = \sum_{i=1}^{n} |s_i|$

Log energy (in decibel): $energy = 10 \log_{10} \sum_{i=1}^{n} s_i^2$
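
The two volume definitions can be sketched as follows. This is a minimal plain-Python illustration; the function names are hypothetical, and returning negative infinity for an all-zero frame is an assumption added to avoid taking the logarithm of zero.

```python
import math


def abs_sum_volume(frame):
    """Volume as the sum of absolute sample values."""
    return sum(abs(s) for s in frame)


def log_energy_db(frame):
    """Volume as 10*log10 of the sum of squared samples (decibels)."""
    energy = sum(s * s for s in frame)
    # Assumption: an all-zero frame is reported as -inf dB.
    return 10 * math.log10(energy) if energy > 0 else float("-inf")


frame = [0.5, -0.5, 0.5, -0.5]
print(abs_sum_volume(frame))  # sum of |s_i|
print(log_energy_db(frame))   # sum of squares is 1.0, so 0 dB
```

Both measures grow with amplitude, but the log energy compresses the range, which is closer to how loudness is perceived.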


9/18

Volume (II)

Perceived volume is influenced by

Frequency (see equal loudness curves in text)

Timbre (see example in text)

Computed volume is influenced by

Microphone types

Microphone setups


10/18

Volume (III)

To avoid DC bias (or DC drifting)

DC bias: The vibration is not around zero

Computation:

Volume: $vol = \sum_{i=1}^{n} |s_i - median(s)|$

Log energy (in decibel): $energy = 10 \log_{10} \sum_{i=1}^{n} (s_i - mean(s))^2$

Theoretical background (How to prove?)

$median(s_1, s_2, \ldots, s_n) = \arg\min_x \sum_{i=1}^{n} |s_i - x|$

$mean(s_1, s_2, \ldots, s_n) = \arg\min_x \sum_{i=1}^{n} (s_i - x)^2$
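
A minimal plain-Python sketch of the zero-justified computation follows. The function names are hypothetical, and the all-zero guard is an assumption. The last lines check the two minimization properties numerically on one small frame; this illustrates the claim but does not prove it.

```python
import math
import statistics


def abs_sum_volume_zj(frame):
    """Sum of absolute deviations from the frame median."""
    med = statistics.median(frame)
    return sum(abs(s - med) for s in frame)


def log_energy_db_zj(frame):
    """10*log10 of the sum of squared deviations from the frame mean."""
    mu = statistics.mean(frame)
    energy = sum((s - mu) ** 2 for s in frame)
    # Assumption: a constant frame is reported as -inf dB.
    return 10 * math.log10(energy) if energy > 0 else float("-inf")


# The same vibration as [0.5, -0.5, 0.5, -0.5], but with a DC bias of 1.0.
biased = [1.5, 0.5, 1.5, 0.5]
print(abs_sum_volume_zj(biased), log_energy_db_zj(biased))

# Numerical illustration of the two arg-min properties.
frame = [0, 0, 0, 1, 4]
med, mu = statistics.median(frame), statistics.mean(frame)
abs_cost = lambda x: sum(abs(s - x) for s in frame)
sq_cost = lambda x: sum((s - x) ** 2 for s in frame)
print(abs_cost(med), abs_cost(mu))  # the median gives the smaller absolute cost
print(sq_cost(mu), sq_cost(med))    # the mean gives the smaller squared cost
```

After removing the bias, the biased frame gives the same volume as its unbiased counterpart, which is the point of zero-justification.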


11/18

Volume (IV)

Functions for computing volume

Example: volume01

Example: volume02

Example: volume03

Volume depends on

Frequency

Try this equal loudness test: http://www.phys.unsw.edu.au/jw/hearing.html

Timbre

Example: volume04

Examples: http://mirlab.org/jang/books/audioSignalProcessing/basicFeatureVolume.asp?title=5-2%20Volume%20(%AD%B5%B6q)


12/18

Zero Crossing Rate

Zero crossing rate (ZCR)

The number of zero crossings in a frame.

Characteristics

Zero-justification is required.

Noise and unvoiced sounds have high ZCR.

ZCR is commonly used in endpoint detection, especially in detecting the start and end of unvoiced sounds.

To distinguish noise/silence from unvoiced sounds, we usually add a shift before computing ZCR.


13/18

ZCR Computations

Two types of ZCR definitions

If a sample with zero value is counted as a zero crossing, the value of ZCR is higher; otherwise it is lower.

The distinction diminishes when using a higher bit resolution.

Other considerations

ZCR with shift can be used to distinguish between unvoiced sounds and silence. (How to determine the shift amount?)
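
One way to formalize the two ZCR definitions and the shift is sketched below in plain Python. The sign-of-product test, the function name `zcr`, and the parameters `count_zero` and `shift` are assumptions made for illustration; the slides do not fix a formula, and the shift amount of 0.05 is arbitrary.

```python
def zcr(frame, count_zero=False, shift=0.0):
    """Count zero crossings between adjacent samples of a frame.

    count_zero=False: only a strict sign change counts.
    count_zero=True: a pair containing a zero-valued sample also counts.
    shift: constant subtracted from every sample before counting.
    Assumption: the frame has already been zero-justified.
    """
    shifted = [s - shift for s in frame]
    count = 0
    for a, b in zip(shifted, shifted[1:]):
        product = a * b
        if product < 0 or (count_zero and product == 0):
            count += 1
    return count


frame = [1, 0, -1, 2, -3]
print(zcr(frame), zcr(frame, count_zero=True))

# Low-amplitude noise vs. a louder unvoiced-like frame, with an arbitrary shift.
noise = [0.01, -0.01, 0.01, -0.01, 0.01]
unvoiced = [0.2, -0.2, 0.2, -0.2, 0.2]
print(zcr(noise), zcr(noise, shift=0.05), zcr(unvoiced, shift=0.05))
```

With the shift applied, the low-amplitude frame never crosses the shifted level, while the louder frame keeps its crossings. Note that under this formalization a single zero-valued sample contributes to both of its adjacent pairs when `count_zero=True`.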


14/18

ZCR

ZCR computing

Example: zcr01

Example: zcr02

To use ZCR to distinguish between unvoiced

sounds and environmental noise

Example: zcrWithShift

Examples: http://mirlab.org/jang/books/audioSignalProcessing/basicFeatureZeroCrossingRate.asp?title=5-3%20Zero%20Crossing%20Rate%20(%B9L%B9s%B2v)


15/18

Pitch

Definition

Pitch is also known as fundamental frequency, which is equal to the number of fundamental periods within a second. The unit used here is Hertz (Hz).

More commonly, pitch is expressed in semitones, which can be converted from pitch in Hertz:

$semitone = 69 + 12 \log_2\left(\frac{Hz}{440}\right)$
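
The conversion can be checked with a short plain-Python sketch; the function name `hz_to_semitone` is hypothetical. Each doubling of frequency (one octave) adds 12 semitones, and 440 Hz maps to semitone 69.

```python
import math


def hz_to_semitone(freq_hz):
    """Convert a frequency in Hz to a semitone number (440 Hz -> 69)."""
    return 69 + 12 * math.log2(freq_hz / 440)


for freq in (220, 440, 880):
    print(freq, hz_to_semitone(freq))
```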


16/18

Pitch Computation (I)

Pitch of tuning forks

$fp = (189 - 7)/5/16000 = 0.002275$ sec

$ff = 1/fp = 439.56$ Hz

$pitch = 69 + 12 \log_2\left(\frac{ff}{440}\right) = 68.9827$ semitone


17/18

Pitch Computation (II)

Pitch of speech

$fp = (477 - 75)/3/16000 = 0.008375$ sec

$ff = 1/fp = 119.403$ Hz

$pitch = 69 + 12 \log_2\left(\frac{ff}{440}\right) = 46.42$ semitone
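
Both pitch computations follow the same three steps, which can be sketched in plain Python. The function name `pitch_from_periods` and its parameter names are hypothetical; the sample indices, period counts, and the 16000 Hz sample rate are the ones used on the two slides.

```python
import math


def pitch_from_periods(start_index, end_index, num_periods, fs):
    """Estimate pitch from sample indices spanning several fundamental periods.

    Returns (fundamental period in seconds, fundamental frequency in Hz,
    pitch in semitones).
    """
    fp = (end_index - start_index) / num_periods / fs
    ff = 1 / fp
    semitone = 69 + 12 * math.log2(ff / 440)
    return fp, ff, semitone


# Tuning fork: 5 periods between samples 7 and 189.
print(pitch_from_periods(7, 189, 5, 16000))
# Speech: 3 periods between samples 75 and 477.
print(pitch_from_periods(75, 477, 3, 16000))
```

Measuring across several periods instead of one reduces the effect of picking the period boundaries by eye.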


18/18

Statistics of Mandarin Chinese

5401 characters, each character is associated with at least one base syllable and a tone

411 base syllables, and most syllables have 4 tones, so we have 1501 tonal syllables

Tone is characterized by the pitch curves:

Tone 1: high-high

Tone 2: low-high

Tone 3: high-low-high

Tone 4: high-low

Some examples of tones:

1242

1234

Taiwanese
