basic audio feature

Upload: brian-sanchez

Post on 03-Apr-2018


  • 7/28/2019 Basic Audio Feature

    1/18

    Basic Features of Audio Signals

    Jyh-Shing Roger Jang

    http://mirlab.org/jang

    MIR Lab, CSIE Dept., National Taiwan Univ., Taiwan

    Audio Features

    Four commonly used audio features

    Volume, pitch, zero crossing rate, timbre

    These features can be perceived subjectively
    (except for zero crossing rate).

    Our goal is to compute them quantitatively (and
    objectively) for further processing and recognition.


    Audio Features in Time Domain

    Audio features presented in the time domain

    Intensity

    Fundamental period (FP)

    Timbre: Waveform within one fundamental period


    Audio Features in Frequency Domain

    Volume: Magnitude of spectrum

    Pitch: Distance between harmonics

    Timbre: Smoothed spectrum

    [Figure: spectrum annotated with intensity, pitch frequency, first formant F1, and second formant F2]


    General Steps for Audio Analysis

    1. Frame blocking

    Frame duration of 20 ms or so

    2. Feature extraction

    Volume, zero-crossing rate, pitch, MFCC, etc

    3. Frame-based Analysis

    Pitch contour comparison, HMM evaluation, etc


    Frame Blocking

    Sample rate = 11025 Hz

    Frame size = 256 samples

    Overlap = 84 samples

    (Hop size = frame size - overlap)

    Frame rate = 11025/(256-84) ≈ 64 frames/sec

    [Figure: waveform with a zoomed-in view marking one frame and its overlap with the next frame]
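The frame-blocking arithmetic on this slide can be checked with a short Python sketch. The variable names and the `frame_count` helper are illustrative and not part of the original slides; the helper assumes that a trailing partial frame is dropped.

```python
# Frame-blocking parameters taken from the slide.
sample_rate = 11025   # samples per second
frame_size = 256      # samples per frame
overlap = 84          # samples shared by two consecutive frames

hop_size = frame_size - overlap                      # samples between frame starts
frame_rate = sample_rate / hop_size                  # frames per second
frame_duration_ms = 1000 * frame_size / sample_rate  # length of one frame in ms


def frame_count(num_samples, frame_size, overlap):
    """Number of complete frames in num_samples (assumes partial frames are dropped)."""
    hop = frame_size - overlap
    if num_samples < frame_size:
        return 0
    return 1 + (num_samples - frame_size) // hop


print(hop_size)                               # 172
print(round(frame_rate, 2))                   # 64.1
print(round(frame_duration_ms, 1))            # 23.2
print(frame_count(sample_rate, 256, 84))      # 63 complete frames in one second
```

The frame duration of about 23 ms is in line with the "20 ms or so" guideline given earlier in the deck.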


    Frame-based Manipulation

    For simplicity, we usually pack frames into a

    frame matrix for easy manipulation in

    MATLAB:

    [y, fs, nbits] = wavread('file.wav');        % samples, sample rate, bits per sample

    frameMat = enframe(y, frameSize, overlap);   % pack the frames into a frame matrix
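For readers without MATLAB, the frame-packing step can be sketched in plain Python. This `enframe` is a hypothetical stand-in written for illustration, not the toolbox function used on the slide. It returns a list of frames and drops a trailing partial frame, which may differ from the original function's layout and padding behavior.

```python
def enframe(samples, frame_size, overlap):
    """Split a sequence into overlapping frames (illustrative sketch).

    Assumption: a trailing partial frame is dropped rather than padded.
    """
    if not 0 <= overlap < frame_size:
        raise ValueError("overlap must satisfy 0 <= overlap < frame_size")
    hop = frame_size - overlap
    frames = []
    start = 0
    while start + frame_size <= len(samples):
        frames.append(list(samples[start:start + frame_size]))
        start += hop
    return frames


signal = list(range(10))
frames = enframe(signal, frame_size=4, overlap=2)
print(frames)  # [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```

Once the signal is packed this way, any per-frame feature (volume, ZCR, pitch) can be computed by mapping a function over `frames`.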


    Volume (I)

    Loudness of audio signals

    Visual cue: Amplitude of vibration

    Also known as energy or intensity

    Two major ways of computing volume:

    Volume: $vol = \sum_{i=1}^{n} |s_i|$

    Log energy (in decibels): $energy = 10 \log_{10} \sum_{i=1}^{n} s_i^2$
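The two volume definitions on this slide (sum of absolute sample values, and log energy in decibels) can be sketched in Python as follows. The function names are illustrative, and the `floor` parameter is an assumption added here to avoid taking the logarithm of zero on an all-silent frame; it is not part of the slide.

```python
import math


def volume_abs_sum(frame):
    """Volume as the sum of absolute sample values."""
    return sum(abs(s) for s in frame)


def volume_log_energy(frame, floor=1e-12):
    """Volume as 10*log10 of the sum of squared samples, in decibels.

    `floor` is an assumed safeguard against log10(0) for silent frames.
    """
    energy = sum(s * s for s in frame)
    return 10 * math.log10(max(energy, floor))


frame = [0.5, -0.5, 0.5, -0.5]
print(volume_abs_sum(frame))     # 2.0
print(volume_log_energy(frame))  # 0.0
```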


    Volume (II)

    Perceived volume is influenced by

    Frequency (see equal loudness curves in text)

    Timbre (see example in text)

    Computed volume is influenced by

    Microphone types

    Microphone setups


    Volume (III)

    To avoid DC bias (or DC drifting)

    DC bias: The vibration is not around zero

    Computation:

    Volume: $vol = \sum_{i=1}^{n} |s_i - \mathrm{median}(s)|$

    Log energy (in decibels): $energy = 10 \log_{10} \sum_{i=1}^{n} (s_i - \mathrm{mean}(s))^2$

    Theoretical background (How to prove?)

    $\mathrm{median}(s_1, s_2, \ldots, s_n) = \arg\min_x \sum_{i=1}^{n} |s_i - x|$

    $\mathrm{mean}(s_1, s_2, \ldots, s_n) = \arg\min_x \sum_{i=1}^{n} (s_i - x)^2$
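A minimal Python sketch of the DC-bias-corrected volume, together with a brute-force numerical check (not a proof) of the two minimization properties. The function names, the `floor` safeguard, and the candidate grid are assumptions made for illustration.

```python
import math
import statistics


def volume_abs_sum(frame):
    """Sum of absolute deviations from the median (removes DC bias)."""
    med = statistics.median(frame)
    return sum(abs(s - med) for s in frame)


def volume_log_energy(frame, floor=1e-12):
    """10*log10 of the squared deviations from the mean, in decibels."""
    mu = statistics.mean(frame)
    energy = sum((s - mu) ** 2 for s in frame)
    return 10 * math.log10(max(energy, floor))  # floor avoids log10(0)


# A frame oscillating around 1.0 instead of 0: the bias no longer inflates volume.
biased = [1.5, 0.5, 1.5, 0.5]
print(volume_abs_sum(biased))     # 2.0
print(volume_log_energy(biased))  # 0.0

# Brute-force check: which candidate x minimizes each cost?
frame = [1.0, 2.0, 3.0, 4.0, 10.0]            # median 3.0, mean 4.0
candidates = [i / 10 for i in range(0, 101)]  # 0.0, 0.1, ..., 10.0
best_abs = min(candidates, key=lambda x: sum(abs(s - x) for s in frame))
best_sq = min(candidates, key=lambda x: sum((s - x) ** 2 for s in frame))
print(best_abs, best_sq)          # 3.0 4.0
```

The grid search only illustrates the claim for one frame; the general proof is left as the exercise posed on the slide.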


    Volume (IV)

    Functions for computing volume

    Example: volume01

    Example: volume02

    Example: volume03

    Volume depends on

    Frequency

    Try this equal loudness test

    Timbre

    Example: volume04

    http://mirlab.org/jang/books/audioSignalProcessing/basicFeatureVolume.asp?title=5-2%20Volume%20(%AD%B5%B6q)

    http://www.phys.unsw.edu.au/jw/hearing.html

    Zero Crossing Rate

    Zero crossing rate (ZCR)

    The number of zero crossings in a frame.

    Characteristics

    Zero-justification is required.

    Noise and unvoiced sounds have high ZCR.

    ZCR is commonly used in endpoint detection,
    especially in detecting the start and end of unvoiced sounds.

    To distinguish noise/silence from unvoiced sounds, we usually add a shift before computing ZCR.
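The points above can be sketched in Python. The function below is an illustrative assumption, not the code behind the slides: it zero-justifies the frame by subtracting its mean, optionally adds a constant shift, and counts strict sign changes between neighboring samples. The shift value of 0.05 in the example is arbitrary.

```python
def zero_crossing_rate(frame, shift=0.0):
    """Count sign changes after mean removal and an optional constant shift."""
    if len(frame) < 2:
        return 0
    mean = sum(frame) / len(frame)
    adjusted = [s - mean + shift for s in frame]
    return sum(1 for a, b in zip(adjusted, adjusted[1:]) if a * b < 0)


# Without zero-justification this frame (oscillating around 10) never crosses zero.
biased = [11, 9, 11, 9, 11, 9]
print(zero_crossing_rate(biased))             # 5

# A shift suppresses crossings caused by low-level noise but keeps louder ones.
quiet_noise = [0.01, -0.01, 0.01, -0.01]
unvoiced = [0.3, -0.3, 0.3, -0.3]
print(zero_crossing_rate(quiet_noise))              # 3
print(zero_crossing_rate(quiet_noise, shift=0.05))  # 0
print(zero_crossing_rate(unvoiced, shift=0.05))     # 3
```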


    ZCR Computations

    Two types of ZCR definitions

    If a sample with zero value is counted as a zero crossing, then the value of ZCR is higher. Otherwise
    it's lower.

    The distinction diminishes when using a higher
    bit resolution.

    Other considerations

    ZCR with shift can be used to distinguish between unvoiced sounds and silence. (How to determine the shift amount?)
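The two ZCR definitions can be compared with a small Python sketch. The function names and the `quantize` helper are hypothetical; the helper imitates a coarse versus fine bit resolution by rounding samples to integer multiples of a step size.

```python
def zcr_strict(frame):
    """Zero-valued samples are NOT counted: the product must be negative."""
    return sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)


def zcr_inclusive(frame):
    """Zero-valued samples ARE counted: the product may be zero."""
    return sum(1 for a, b in zip(frame, frame[1:]) if a * b <= 0)


def quantize(frame, step):
    """Round each sample to an integer number of steps (assumed quantizer)."""
    return [round(s / step) for s in frame]


samples = [0.4, 0.2, -0.2, -0.4, -0.2, 0.2]

coarse = quantize(samples, step=1.0)   # every sample rounds to 0
fine = quantize(samples, step=0.1)     # [4, 2, -2, -4, -2, 2]

print(zcr_strict(coarse), zcr_inclusive(coarse))  # 0 5
print(zcr_strict(fine), zcr_inclusive(fine))      # 2 2
```

At the coarse resolution many samples collapse to zero, so the two definitions disagree strongly; at the finer resolution no sample is exactly zero and both definitions give the same count.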


    ZCR

    ZCR computing

    Example: zcr01

    Example: zcr02

    To use ZCR to distinguish between unvoiced

    sounds and environmental noise

    Example: zcrWithShift

    http://mirlab.org/jang/books/audioSignalProcessing/basicFeatureZeroCrossingRate.asp?title=5-3%20Zero%20Crossing%20Rate%20(%B9L%B9s%B2v)

    Pitch

    Definition

    Pitch is also known as fundamental frequency, which is equal to the number of fundamental periods
    within a second. The unit used here is Hertz (Hz).

    More commonly, pitch is expressed in semitones,
    which can be converted from pitch in Hertz:

    $semitone = 69 + 12 \log_2(Hz/440)$
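The Hertz-to-semitone conversion can be written as a short Python sketch. The function names are illustrative, and the inverse conversion is added here for convenience; it does not appear on the slide. With this formula, 440 Hz maps to semitone 69 and each doubling of frequency adds 12 semitones.

```python
import math


def hz_to_semitone(freq_hz):
    """Convert a frequency in Hz to a semitone value (440 Hz maps to 69)."""
    return 69 + 12 * math.log2(freq_hz / 440.0)


def semitone_to_hz(semitone):
    """Inverse conversion, obtained by solving the formula for the frequency."""
    return 440.0 * 2 ** ((semitone - 69) / 12)


print(hz_to_semitone(440.0))          # 69.0
print(hz_to_semitone(880.0))          # 81.0
print(round(semitone_to_hz(60), 2))   # 261.63
```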


    Pitch Computation (I)

    Pitch of tuning forks

    $fp = (189 - 7)/5/16000 = 0.002275 \text{ sec}$

    $ff = 1/fp = 439.56 \text{ Hz}$

    $pitch = 69 + 12 \log_2(ff/440) = 68.9827 \text{ semitones}$


    Pitch Computation (II)

    Pitch of speech

    $fp = (477 - 75)/3/16000 = 0.008375 \text{ sec}$

    $ff = 1/fp = 119.403 \text{ Hz}$

    $pitch = 69 + 12 \log_2(ff/440) = 46.42 \text{ semitones}$
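The two worked examples (tuning fork and speech) follow the same three-step computation, which can be sketched in Python. The function name and its parameters are hypothetical. The sketch also assumes that the numbers in the examples are the start and end sample indices of a span covering a whole number of fundamental periods, recorded at 16000 Hz; the waveform figures that would confirm this did not survive in the transcript.

```python
import math


def pitch_from_periods(start_index, end_index, num_periods, sample_rate):
    """Return (fundamental period in sec, fundamental frequency in Hz, semitone)."""
    fp = (end_index - start_index) / num_periods / sample_rate
    ff = 1.0 / fp
    semitone = 69 + 12 * math.log2(ff / 440.0)
    return fp, ff, semitone


# Tuning fork: 5 periods between samples 7 and 189.
fp, ff, st = pitch_from_periods(7, 189, 5, 16000)
print(round(fp, 6), round(ff, 2), round(st, 4))   # 0.002275 439.56 68.9827

# Speech: 3 periods between samples 75 and 477.
fp, ff, st = pitch_from_periods(75, 477, 3, 16000)
print(round(fp, 6), round(ff, 3), round(st, 2))   # 0.008375 119.403 46.42
```

Measuring across several periods rather than one reduces the effect of the one-sample uncertainty in locating each period boundary.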


    Statistics of Mandarin Chinese

    5401 characters, each associated with at least one base syllable and a tone

    411 base syllables, and most syllables have 4 tones, so we have 1501 tonal syllables

    Tone is characterized by the pitch curves:

    Tone 1: high-high

    Tone 2: low-high

    Tone 3: high-low-high

    Tone 4: high-low

    Some examples of tones:

    1242

    1234

    ?????Taiwanese