changes in ppc[1]

Upload: pallavi-ch

Post on 08-Apr-2018

  • 8/6/2019 Changes in Ppc[1]

    1/31

    Use of spectral autocorrelation in spectral envelope linear prediction for speech recognition

    BY

    B. LALITHA

    (08691D3807)

    Under the guidance of

    Prof. M. B. MANJUNATHA, M.Tech (PhD)

    2/31

    Introduction

    Outline of the project

    1. Introduction to Speech

    i) Speech Production

    ii) Speech Recognition

    2. Implementation

    3. Spectral envelope LPC Analysis

    4. Speech recognition using Dynamic Time Warping (DTW)

    5. Result

    3/31

    1. INTRODUCTION TO SPEECH

    i) Speech Production:

    Speech can be characterized as a signal carrying message information.

    The purpose of speech is communication between humans.

    Speech is an acoustic waveform that conveys information from a speaker to a listener.

    4/31

    Human Speech Communication

    Human Speech signal Human Ear

    5/31

    Speech Production Mechanism

    6/31

    Speech Production Mechanism

    Flow of air from lungs

    Vibrating vocal cords

    Speech production cavities

    Lips

    Sound wave

    Vowels (a, e, i), fricatives (f, s, z) and plosives (p, t, k)

    7/31

    Classification of Speech Signals

    Voiced Sounds :

    Voiced sounds are produced when the vocal cords vibrate. The vibration releases quasi-periodic pulses of air which excite the vocal tract.

    These are labeled as / u /, / d /, / w /, / i /, and / e /.

    Unvoiced or Fricative Sounds :

    Unvoiced sounds are produced by forming a constriction at some point in the vocal tract and forcing air through the constriction at a high velocity to produce turbulence.

    Examples: / sh / is a fricative; / f / and / s / are also fricatives.

    8/31

    ii) Speech Recognition

    The three basic steps in Automatic Speech Recognition (ASR) are:

    1. Parameter Estimation

    2. Parameter Comparison

    3. Decision Making

    9/31

    IMPLEMENTATION

    Pre-emphasis

    Hamming windowing

    Linear prediction

    Spectral autocorrelation

    Dynamic time warping

    Extract total stored vectors

    Vector with minimum value

    Predictive Filter

    Input Speech signal

    LPC Coefficients

    Recognized Word

    Reference Word

    10/31

    Pre-Emphasis

    Boosting the energy in the high frequencies.

    The spectrum for voiced segments has more energy at lower frequencies than at higher frequencies (spectral tilt).

    Spectral tilt is caused by the nature of the glottal pulse.

    Boosting high-frequency energy gives more information to the acoustic model.

    Improves phone recognition performance.

    11/31

    Example of pre-emphasis

    Before and after pre-emphasis: spectral slice from the vowel [aa]

    12/31

    Pre-emphasis

    b = [1 -15/16];            % pre-emphasis filter: y(n) = x(n) - (15/16)*x(n-1)

    x = filter(b, 1, x);

    Len = length(x);

    % Resample (decimation by 4); plain decimation alternative:

    % x = x(1:Fs/8000:Len);

    x = resample(x, 8000, Fs);

    Fs = 8000;
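
    To make the effect of the filter b = [1 -15/16] concrete, the following is a minimal sketch that evaluates its magnitude response directly from the definition, without any toolbox call. The variable names (alpha, gainDC, gainNyquist, boost_dB) are illustrative and do not come from the slides.

```matlab
% Magnitude response of the first-order pre-emphasis filter
% H(z) = 1 - (15/16) z^-1, evaluated directly from its definition.
alpha = 15/16;                          % coefficient taken from b = [1 -15/16]
w = linspace(0, pi, 5);                 % a few frequencies from DC to Nyquist (rad/sample)
H = abs(1 - alpha * exp(-1j * w));      % |H(e^{jw})|

gainDC      = H(1);                     % 1 - 15/16 = 1/16
gainNyquist = H(end);                   % 1 + 15/16 = 31/16
boost_dB    = 20 * log10(gainNyquist / gainDC);   % 20*log10(31), about 29.8 dB

fprintf('gain at DC: %.4f, at Nyquist: %.4f, boost: %.1f dB\n', ...
        gainDC, gainNyquist, boost_dB);
```

    The gain grows monotonically from DC to the Nyquist frequency, which is the high-frequency boost that compensates the spectral tilt of voiced speech.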

    13/31

    Pre-emphasized and Resampled

    14/31

    Windowing

    Speech is a non-stationary signal.

    A window is non-zero inside some region and zero elsewhere.

    The speech extracted from each window is called a frame.

    15/31

    Windowing

    The windowing process, showing the frame shift and frame size

    16/31

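    Applying the window

    % split the speech into nbFrame frames of n samples, with frame shift m
    for i = 1:n
        for j = 1:nbFrame
            M(i, j) = speech(((j - 1) * m) + i);
        end
    end

    h = hamming(n);      % Hamming window of length n

    M2 = diag(h) * M;    % apply the window to every frame (column of M)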
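
    The framing loop above can also be written without loops. The sketch below is one possible vectorized form, under assumed toy values for speech, n and m (none of them from the slides), and with the Hamming window written out from its definition so that no toolbox function is needed. It relies on implicit expansion (MATLAB R2016b or later, or Octave).

```matlab
% Hypothetical example values (not from the slides)
speech = (1:20)';            % toy "signal" so the frames are easy to inspect
n = 8;                       % frame size in samples
m = 4;                       % frame shift in samples
nbFrame = floor((length(speech) - n) / m) + 1;   % number of complete frames

% Index matrix: column j holds the sample indices (j-1)*m + (1:n)
idx = (1:n)' + (0:nbFrame-1) * m;   % implicit expansion
M = speech(idx);                    % n-by-nbFrame matrix of frames

% Hamming window from its definition, w(k) = 0.54 - 0.46*cos(2*pi*k/(n-1))
h = 0.54 - 0.46 * cos(2 * pi * (0:n-1)' / (n - 1));
M2 = M .* h;                        % same result as diag(h) * M
```

    With these values there are four frames, and the second frame holds samples 5 to 12, that is, consecutive frames overlap by n - m = 4 samples.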

    17/31

    Common window shapes

    Rectangular window

    Hamming window

    18/31

    Common window shapes

    19/31

    Linear prediction

    Linear Predictive Coding (LPC) provides:

    a low-dimension representation of the speech signal at one frame

    a representation of the spectral envelope, not the harmonics

    an analytically tractable method

    some ability to identify formants

    LPC models the speech signal at time point n as an approximate linear combination of the previous p samples:

    $$s(n) \approx a_1 s(n-1) + a_2 s(n-2) + \cdots + a_p s(n-p)$$

    where $a_1, a_2, \ldots, a_p$ are constant for each frame of speech.
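
    As an illustration of the prediction model s(n) ≈ a_1 s(n-1) + … + a_p s(n-p), the sketch below builds a toy signal that obeys an order-2 recursion exactly and then predicts each sample from the two previous ones. The coefficient values and variable names are assumptions chosen for the example, not values from the slides.

```matlab
% Toy signal that obeys an exact order-2 recursion (hypothetical values)
a = [1.5 -0.9];                 % a_1, a_2
N = 50;
s = zeros(N, 1);
s(1) = 1;                       % samples before the start are taken as zero
s(2) = a(1) * s(1);
for n = 3:N
    s(n) = a(1) * s(n-1) + a(2) * s(n-2);
end

% Prediction s_hat(n) = a_1 s(n-1) + a_2 s(n-2), computed as an FIR filter
s_hat = filter([0 a], 1, s);
e = s - s_hat;                  % prediction error
```

    Here the error is 1 at the first sample (the impulse that starts the signal) and zero, up to rounding, everywhere else. Real speech is only approximately predictable, so in practice the error is small rather than zero.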

    20/31

    If the error over a segment of speech is defined as

    $$E_n = \sum_{m=M_1}^{M_2} e_n^2(m) = \sum_{m=M_1}^{M_2} \Big( s_n(m) - \sum_{k=1}^{p} a_k\, s_n(m-k) \Big)^2$$

    where $s_n$ is the signal starting at time n, then we can find $a_k$ by setting $\partial E_n / \partial a_k = 0$ for k = 1, 2, …, p, obtaining p equations in p unknowns:

    $$\sum_{k=1}^{p} a_k \sum_{m=M_1}^{M_2} s_n(m-i)\, s_n(m-k) = \sum_{m=M_1}^{M_2} s_n(m-i)\, s_n(m), \qquad 1 \le i \le p$$

    The error is at a minimum (not a maximum) when the derivative is zero, because as any $a_k$ changes away from its optimum value, the error will increase.

    21/31

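    Features: LPC

    $$E_n = \sum_{m=M_1}^{M_2} \Big( s(m) - \sum_{k=1}^{p} a_k\, s(m-k) \Big)^2$$

    $$E_n = \sum_{m=M_1}^{M_2} \Big( s^2(m) - 2 s(m) \sum_{k=1}^{p} a_k\, s(m-k) + \sum_{k=1}^{p} a_k\, s(m-k) \sum_{r=1}^{p} a_r\, s(m-r) \Big)$$

    $$E_n = \sum_{m=M_1}^{M_2} \Big( s^2(m)
        - 2 s(m)\, a_1 s(m-1) + a_1 s(m-1) \sum_{r=1}^{p} a_r\, s(m-r)
        - 2 s(m)\, a_2 s(m-2) + a_2 s(m-2) \sum_{r=1}^{p} a_r\, s(m-r)
        - \cdots
        - 2 s(m)\, a_p s(m-p) + a_p s(m-p) \sum_{r=1}^{p} a_r\, s(m-r) \Big)$$

    $$\frac{\partial E_n}{\partial a_1} = 0 = \frac{\partial}{\partial a_1} \sum_{m=M_1}^{M_2} \Big( s^2(m)
        - 2 s(m)\, a_1 s(m-1) + a_1 s(m-1)\, a_1 s(m-1) + \cdots + a_1 s(m-1)\, a_p s(m-p)
        - 2 s(m)\, a_2 s(m-2) + a_2 s(m-2)\, a_1 s(m-1) + \cdots + a_2 s(m-2)\, a_p s(m-p)
        - \cdots \Big)$$

    $$\sum_{m=M_1}^{M_2} \Big( -2 s(m) s(m-1) + 2 a_1 s(m-1) s(m-1) + a_2 s(m-1) s(m-2) + \cdots + a_p s(m-1) s(m-p)
        + a_2 s(m-2) s(m-1) + a_3 s(m-3) s(m-1) + \cdots + a_p s(m-p) s(m-1) \Big) = 0$$

    $$\sum_{m=M_1}^{M_2} \Big( -2 s(m) s(m-1) + 2 a_1 s(m-1) s(m-1) + 2 a_2 s(m-1) s(m-2) + \cdots + 2 a_p s(m-1) s(m-p) \Big) = 0$$

    Repeat the above equations for $a_2, a_3, \ldots, a_p$.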

    22/31

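    $$\sum_{m=M_1}^{M_2} \Big[ -2 s(m) s(m-i) + 2 \sum_{k=1}^{p} a_k\, s(m-i) s(m-k) \Big] = 0, \qquad 1 \le i \le p$$

    $$-2 \sum_{m=M_1}^{M_2} s(m) s(m-i) + 2 \sum_{k=1}^{p} a_k \sum_{m=M_1}^{M_2} s(m-i) s(m-k) = 0, \qquad 1 \le i \le p$$

    $$\sum_{k=1}^{p} a_k \sum_{m=M_1}^{M_2} s(m-i)\, s(m-k) = \sum_{m=M_1}^{M_2} s(m-i)\, s(m), \qquad 1 \le i \le p$$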
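
    As a small worked case of the result of this derivation, consider p = 1. The sample values below are assumed for illustration only: s(0), s(1), s(2), s(3) = 1, 2, 4, 8, with the error summed over m = 1, …, 3.

```latex
% Normal equation for p = 1 (only i = k = 1 remains):
a_1 \sum_{m=1}^{3} s(m-1)\, s(m-1) = \sum_{m=1}^{3} s(m-1)\, s(m)

% Left-hand sum:  s(0)^2 + s(1)^2 + s(2)^2 = 1 + 4 + 16 = 21
% Right-hand sum: s(0)s(1) + s(1)s(2) + s(2)s(3) = 2 + 8 + 32 = 42

a_1 = \frac{42}{21} = 2
```

    The predictor s(m) ≈ 2 s(m-1) reproduces this toy sequence exactly, so the error E_n is zero for the chosen segment.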

    23/31

    LPC Autocorrelation Method

    Autocorrelation: measure of periodicity in signal

    $$\phi_n(i,k) = \sum_{m=M_1}^{M_2} s_n(m-i)\, s_n(m-k)$$

    With this definition we can re-write the equation as

    $$\sum_{k=1}^{p} a_k\, \phi_n(i,k) = \phi_n(i,0), \qquad 1 \le i \le p$$

    We can solve for $a_k$ using several methods. The most common method in speech processing is the autocorrelation method:

    Force the signal to be zero outside of the interval $0 \le m \le N-1$:

    $$\hat{s}_n(m) = s_n(m)\, w(m)$$

    where w(m) is a finite-length window (e.g. Hamming) of length N that is zero for m less than 0 and greater than N-1, and $\hat{s}_n(m)$ is the windowed signal. As a result,

    24/31

    $$E_n = \sum_{m=0}^{N-1+p} e_n^2(m)$$

    Because the signal is set to zero outside the window, eqn (6) becomes

    $$\phi_n(i,k) = \sum_{m=0}^{N-1+p} s_n(m-i)\, s_n(m-k), \qquad 1 \le i \le p,\; 0 \le k \le p$$

    and this can be expressed as

    $$\phi_n(i,k) = \sum_{m=0}^{N-1-(i-k)} s_n(m)\, s_n(m+(i-k)), \qquad 1 \le i \le p,\; 0 \le k \le p$$

    and this is identical to the autocorrelation function for |i-k|, because the autocorrelation function is symmetric, $R_n(-k) = R_n(k)$:

    $$\phi_n(i,k) = R_n(|i-k|), \qquad \text{where } R_n(k) = \sum_{m=0}^{N-1-k} s_n(m)\, s_n(m+k)$$

    so the set of equations for $a_k$ (eqn (7)) can be obtained by combining (7) and (12):

    $$\sum_{k=1}^{p} a_k\, R_n(|i-k|) = R_n(i), \qquad 1 \le i \le p$$
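
    The autocorrelation-method equations, sum over k of a_k R_n(|i-k|) = R_n(i) for 1 ≤ i ≤ p, form a p-by-p linear system with a Toeplitz matrix. The sketch below computes R_n(k) for one windowed frame and solves the system directly. The test frame (the impulse response of an assumed order-2 recursion), the order p and all variable names are illustrative choices, not taken from the slides.

```matlab
% Hypothetical frame: impulse response of an order-2 recursion, Hamming-windowed
N = 64;
k = (0:N-1)';
frame = filter(1, [1 -1.5 0.9], [1; zeros(N-1, 1)]);
frame = frame .* (0.54 - 0.46 * cos(2 * pi * k / (N - 1)));

p = 2;                                   % prediction order (assumed)
R = zeros(p+1, 1);                       % R(lag+1) holds R_n(lag), lag = 0..p
for lag = 0:p
    R(lag+1) = frame(1:N-lag)' * frame(1+lag:N);
end

Rmat = toeplitz(R(1:p));                 % entries R_n(|i-k|), i,k = 1..p
a = Rmat \ R(2:p+1);                     % solves sum_k a_k R_n(|i-k|) = R_n(i)

residual = max(abs(Rmat * a - R(2:p+1)));   % should be at rounding level
```

    A general solver is used here only to keep the example short. The Toeplitz structure of Rmat is what allows a faster recursive solution.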

    25/31

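    In matrix form, equation (14) looks like this:

    $$\begin{bmatrix}
    R_n(0) & R_n(1) & R_n(2) & \cdots & R_n(p-1) \\
    R_n(1) & R_n(0) & R_n(1) & \cdots & R_n(p-2) \\
    R_n(2) & R_n(1) & R_n(0) & \cdots & R_n(p-3) \\
    \vdots & \vdots & \vdots & \ddots & \vdots \\
    R_n(p-1) & R_n(p-2) & R_n(p-3) & \cdots & R_n(0)
    \end{bmatrix}
    \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_p \end{bmatrix}
    =
    \begin{bmatrix} R_n(1) \\ R_n(2) \\ R_n(3) \\ \vdots \\ R_n(p) \end{bmatrix}$$

    There is a recursive algorithm to solve this: Durbin's solution.

    LPC Durbin's Solution

    Solve a Toeplitz (symmetric, diagonal elements equal) matrix for values of $\alpha$: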

    26/31

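    $$\sum_{k=1}^{p} a_k\, R_n(|i-k|) = R_n(i), \qquad 1 \le i \le p$$

    $$E^{(0)} = R(0)$$

    $$k_i = \frac{R(i) - \sum_{j=1}^{i-1} \alpha_j^{(i-1)}\, R(i-j)}{E^{(i-1)}}, \qquad 1 \le i \le p$$

    $$\alpha_i^{(i)} = k_i$$

    $$\alpha_j^{(i)} = \alpha_j^{(i-1)} - k_i\, \alpha_{i-j}^{(i-1)}, \qquad 1 \le j \le i-1$$

    $$E^{(i)} = (1 - k_i^2)\, E^{(i-1)}$$

    $$a_j = \alpha_j^{(p)}$$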
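
    The Durbin recursion can be sketched in a few lines. The autocorrelation values below are made up for the example (chosen so that every |k_i| < 1), and the result is cross-checked against a direct solve of the same Toeplitz system. All variable names are illustrative.

```matlab
% Hypothetical autocorrelation values R(0..p) (not from the slides)
R = [1.0; 0.8; 0.5; 0.2];        % R(i+1) holds R(i)
p = numel(R) - 1;

E = R(1);                        % E^(0) = R(0)
alpha = zeros(p, 1);             % alpha_j^(i), overwritten at each order i
for i = 1:p
    % k_i = ( R(i) - sum_{j=1}^{i-1} alpha_j^(i-1) R(i-j) ) / E^(i-1)
    acc = R(i+1);
    for j = 1:i-1
        acc = acc - alpha(j) * R(i-j+1);
    end
    k = acc / E;

    prev = alpha;                % copy of alpha^(i-1)
    alpha(i) = k;                % alpha_i^(i) = k_i
    for j = 1:i-1
        alpha(j) = prev(j) - k * prev(i-j);   % alpha_j^(i)
    end
    E = (1 - k^2) * E;           % E^(i) = (1 - k_i^2) E^(i-1)
end
a = alpha;                       % a_j = alpha_j^(p)

% Cross-check against a direct solve of the Toeplitz system
a_direct = toeplitz(R(1:p)) \ R(2:p+1);
```

    The two results agree to rounding error. The recursion needs on the order of p^2 operations, whereas a general linear solver needs on the order of p^3.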

    We can compute the spectral envelope magnitude from the LPC parameters by evaluating the transfer function S(z) for $z = e^{j\omega}$:

    $$S(e^{j\omega}) = \frac{G}{A(e^{j\omega})} = \frac{G}{1 - \sum_{k=1}^{p} a_k\, e^{-j\omega k}}$$
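
    The envelope formula can be evaluated on a frequency grid with plain matrix operations. This is a minimal sketch with assumed coefficients, gain and grid size (none of them from the slides).

```matlab
a = [1.5; -0.9];                 % hypothetical LPC coefficients a_1, a_2
G = 1;                           % gain (assumed)
p = numel(a);
nfft = 256;                      % number of frequency points in [0, pi)
w = (0:nfft-1)' * pi / nfft;     % frequencies in rad/sample

% A(e^{jw}) = 1 - sum_k a_k e^{-jwk}, one row per frequency
A = 1 - exp(-1j * w * (1:p)) * a;
S_dB = 20 * log10(abs(G ./ A));  % envelope magnitude in dB

[~, iPeak] = max(S_dB);
peakFreq = w(iPeak);             % frequency of the envelope peak (rad/sample)
```

    For these coefficients the envelope has a single peak between 0.5 and 0.8 rad/sample, which is the kind of resonance that LPC uses to represent a formant.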

    27/31

    Finding frequency envelope using LPC method

    for col = 1:nbFrame
        % compute Mth-order autocorrelation function:
        rx = zeros(Or+1, 1);
        speech1 = M2(:,col)' + 0.000001;
        for i = 1:Or+1
            rx(i) = rx(i) + speech1(1:n-i+1) * speech1(i:n)';
        end

        % prepare the M by M Toeplitz covariance matrix:
        covmatrix = zeros(Or, Or);
        for i = 1:Or
            covmatrix(i, i:Or) = rx(1:Or-i+1)';
            covmatrix(i:Or, i) = rx(1:Or-i+1);
        end

        % solve "normal equations" for prediction coeffs
        Acoeffs = -covmatrix \ rx(2:Or+1);
        Alp = [1, Acoeffs'];      % LP polynomial A(z)
        % log10 (not the natural log) so that the envelope is in dB
        dbenvlp(:,col) = 20*log10(abs(freqz(1, Alp, n*2)'));
    end

    28/31

    Dynamic Time Warping (DTW)

    The SELP analysis is evaluated using dynamic time warping.

    T = {t1, t2, …, ti, …, tn},  R = {r1, r2, …, ri, …, rm}

    T = test signal or unknown signal

    R = reference signal or known signal

    A matrix of m x n is created

    C(x, y) = MIN [C(x + 1, y), C(x + 1, y + 1), C(x, y + 1)] + D(x, y)

    where D(x, y) is the local distance between reference element x and test element y, and C(x, y) is the accumulated cost.
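
    The recurrence C(x, y) = MIN[C(x+1, y), C(x+1, y+1), C(x, y+1)] + D(x, y) can be filled in from the last cell of the m x n matrix back to the first. The sketch below does this for two short one-dimensional sequences. The sequences, the absolute-difference distance and the Inf border are assumptions made for the example; a real recognizer would compare LPC-based feature vectors frame by frame.

```matlab
% Hypothetical 1-D sequences (illustration only)
T = [1 2 3 4 3 2];               % test signal, length n
R = [1 1 2 3 4 4 3 2];           % reference signal, length m (a stretched T)
n = numel(T);
m = numel(R);

D = abs(R(:) - T(:)');           % m-by-n local distances D(x, y)
C = inf(m + 1, n + 1);           % extra row and column of Inf act as a border
C(m + 1, n + 1) = 0;             % so that C(m, n) reduces to D(m, n)

for x = m:-1:1
    for y = n:-1:1
        best = min([C(x+1, y), C(x+1, y+1), C(x, y+1)]);
        C(x, y) = best + D(x, y);
    end
end
dtwCost = C(1, 1);               % accumulated cost of the best warping path
```

    Because R here is only a time-stretched copy of T, the accumulated cost is 0. For recognition, the same cost would be computed against every stored reference word and the reference with the minimum value selected.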

    29/31

    A matrix of m x n is created

    30/31

    Dynamic Time Warping (DTW)

    31/31

    Dynamic Time Warping (DTW)