ec-512ec512 lecnotes pt2

Upload: nagmani-kumar

Post on 01-Jun-2018


TRANSCRIPT

  • 8/9/2019 EC-512EC512 LecNotes Pt2


    Hypothesis Testing

    Hypothesis testing: the use of statistics to determine the probability that a given hypothesis is true or false.

    Hypothesis: some theory or claim that has been put forward because it is believed to be true, but has not been proved.

    In the present context, a hypothesis is a conjecture about the distribution of some random variables, often a statement about the mean and variance of an r.v.

    Hypothesis testing tests a null hypothesis $H_0$ against the alternative hypothesis $H_1$.


    Decision Errors

    The outcome of a hypothesis test is either "reject $H_0$" or "do not reject $H_0$".

    When we perform a statistical test we hope that our decision will be correct, but sometimes it will be wrong.

    There are two possible errors that can be made in a hypothesis test.

    Truth    Decision: Accept H_0    Decision: Reject H_0
    H_0      Correct                 Type I Error
    H_1      Type II Error           Correct


    Steps in Hypothesis Testing

    Hypothesis testing is a proof by contradiction.

    Step 1: Formulate the null and alternative hypotheses. Assume $H_0$ is true.

    Step 2: Collect data.

    Step 3: Evaluate whether the data are consistent with the statistical hypothesis: identify a test statistic that will be computed from the collected data and used to assess the truth of the null hypothesis.

    Approaches: (1) Frequentist or classical.
                (2) Bayesian.
                (3) Likelihood.


    Some Definitions

    Critical region or rejection region: that region of the data space, or the corresponding range of the test statistic value, for which the null hypothesis will be rejected.

    "Size" or significance level of the test: prob. of incorrectly rejecting $H_0$ = prob. of Type I error = $\alpha$.

    When the value of the test statistic is found in the rejection region, the test is said to be "statistically significant" at the $\alpha$ level.

    Prob. of Type II error = $\beta$.

    "Power" of a test: prob. of correctly rejecting $H_0$ = $1 - \beta$.

    The p-value is the min. significance level for which we would still reject the null hypothesis. The p-value is a measure of how much evidence there is against the null hypothesis.
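As an illustration of these definitions, the following Python sketch computes the size, the Type II error probability and the power of a one-sided test on a single Gaussian observation. The two means, the standard deviation and the threshold are illustrative assumptions, not values taken from the lecture.

```python
from statistics import NormalDist

def size_beta_power(mu0, mu1, sigma, threshold):
    """One-sided test: reject H0 when the observation exceeds `threshold`.

    Assumed model: H0: x ~ N(mu0, sigma^2), H1: x ~ N(mu1, sigma^2), mu1 > mu0.
    """
    alpha = 1.0 - NormalDist(mu0, sigma).cdf(threshold)  # P(reject H0 | H0 true)
    beta = NormalDist(mu1, sigma).cdf(threshold)         # P(do not reject H0 | H1 true)
    power = 1.0 - beta                                   # P(reject H0 | H1 true)
    return alpha, beta, power

alpha, beta, power = size_beta_power(mu0=0.0, mu1=2.0, sigma=1.0, threshold=1.645)
print(f"size = {alpha:.3f}, beta = {beta:.3f}, power = {power:.3f}")
```

Moving the threshold to the right lowers the size but also lowers the power, which is the trade-off these definitions describe.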


    Example (contd.)

    Measure sample value = $x_s$.

    Evaluate
    $$z = \frac{x_s - \mu_0}{\sigma}$$

    If $|z| > 2$, then we decide that the sample has not come from the zero mean Gaussian process under consideration.

    Check that the "size" or "significance level" or the probability of Type I error for the set criterion is $\alpha \approx 0.05$.

    The p-value for the measured sample is
    $$p = 2 \int_{\mu_0 + |x_s - \mu_0|}^{+\infty} f_0(x)\, dx$$
    where $f_0(x) = N(\mu_0, \sigma^2)$.
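A minimal sketch of this test in Python, assuming a two-sided rule that rejects when |z| exceeds 2. The measured value 2.5 and the unit standard deviation are made-up inputs for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # zero mean, unit variance

def two_sided_z_test(x_s, mu0, sigma, z_crit=2.0):
    """Return (z, p_value, reject) for a single measurement x_s."""
    z = (x_s - mu0) / sigma
    # Two-sided p-value: twice the upper-tail mass beyond |z|.
    p_value = 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))
    return z, p_value, abs(z) > z_crit

# Size of the rule |z| > 2: probability of rejecting when H0 is true.
size = 2.0 * (1.0 - STD_NORMAL.cdf(2.0))

z, p_value, reject = two_sided_z_test(x_s=2.5, mu0=0.0, sigma=1.0)
print(f"size = {size:.4f}, z = {z:.2f}, p = {p_value:.4f}, reject H0: {reject}")
```

The computed size is slightly below 0.05 because the exact 5% critical value is about 1.96 rather than 2.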


    Bayesian Hypothesis Testing

    Given prior probabilities $P(H_0)$ and $P(H_1)$.

    Given likelihoods $p(x_s \mid H_0)$ and $p(x_s \mid H_1)$, where $x_s$ is the collected data.

    Calculate posterior odds ratio = Bayes factor × prior odds ratio.

    $$\frac{P(H_0 \mid x_s)}{P(H_1 \mid x_s)} = \frac{p(x_s \mid H_0)}{p(x_s \mid H_1)} \times \frac{P(H_0)}{P(H_1)}$$


    Bayesian Method (contd.)

    Decide for $H_0$ if the posterior odds ratio is greater than 1, else decide for $H_1$.

    Basically based on the MAP criterion.

    Prior odds are modified after observing the data.

    Integrates prior probabilities associated with competing hypotheses into the assessment of which hypothesis is the most likely for the data in hand.

    Said another way, the likelihoods are modified with the prior probabilities.
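The odds-based decision rule can be sketched in a few lines of Python. The two Gaussian likelihoods and the prior of 0.2 on H0 are assumptions chosen only to make the example concrete.

```python
from statistics import NormalDist

def posterior_odds(x_s, h0, h1, prior0):
    """Posterior odds P(H0|x_s) / P(H1|x_s) = Bayes factor * prior odds."""
    bayes_factor = h0.pdf(x_s) / h1.pdf(x_s)
    prior_odds = prior0 / (1.0 - prior0)
    return bayes_factor * prior_odds

def decide(x_s, h0, h1, prior0):
    """Return 0 to decide for H0 (posterior odds > 1), otherwise 1."""
    return 0 if posterior_odds(x_s, h0, h1, prior0) > 1.0 else 1

h0 = NormalDist(0.0, 1.0)  # assumed likelihood under H0
h1 = NormalDist(2.0, 1.0)  # assumed likelihood under H1
print(posterior_odds(0.0, h0, h1, prior0=0.2), decide(0.0, h0, h1, prior0=0.2))
print(posterior_odds(2.0, h0, h1, prior0=0.2), decide(2.0, h0, h1, prior0=0.2))
```

In the first call the data favour H0 strongly enough to overcome prior odds of 1:4; in the second call both the data and the prior point to H1.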


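    Bayesian Method (contd.)

    $$P_E = P(\text{Type I Error})\, P(H_0) + P(\text{Type II Error})\, P(H_1)$$

    $$P(\text{Type I Error}) = \int_{R_1} p(x \mid H_0)\, dx$$

    $$P(\text{Type II Error}) = \int_{R_0} p(x \mid H_1)\, dx$$

    Here $R_0$ is the region in which $H_0$ is accepted and $R_1$ is the region in which $H_0$ is rejected.

    Decision boundary: $p(x \mid H_0)\, P(H_0) = p(x \mid H_1)\, P(H_1)$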
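For a concrete feel of the error probability, the sketch below evaluates the decision boundary and P_E in closed form. It assumes two Gaussian likelihoods with equal variance and mu1 > mu0; these modelling choices and the numbers used are assumptions of the sketch, not part of the lecture.

```python
import math
from statistics import NormalDist

def bayes_error(mu0, mu1, sigma, prior0):
    """Decision boundary and error probability P_E for two equal-variance Gaussians.

    Assumes mu1 > mu0, so R0 = (-inf, boundary) and R1 = (boundary, +inf).
    """
    prior1 = 1.0 - prior0
    # Solve p(x|H0) P(H0) = p(x|H1) P(H1) for x.
    boundary = 0.5 * (mu0 + mu1) + sigma**2 / (mu1 - mu0) * math.log(prior0 / prior1)
    p_type1 = 1.0 - NormalDist(mu0, sigma).cdf(boundary)  # integral of p(x|H0) over R1
    p_type2 = NormalDist(mu1, sigma).cdf(boundary)        # integral of p(x|H1) over R0
    return boundary, p_type1 * prior0 + p_type2 * prior1

boundary, p_e = bayes_error(mu0=0.0, mu1=2.0, sigma=1.0, prior0=0.5)
print(f"boundary = {boundary:.3f}, P_E = {p_e:.4f}")
```

With equal priors the boundary sits midway between the two means; raising the prior on H0 pushes the boundary toward mu1, enlarging R0.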


    Bayesian Method (contd.)

    When the hypotheses are defined over some parameter $\theta$:

    $$\frac{P(H_0 \mid x_s)}{P(H_1 \mid x_s)} = \frac{\int p(x_s \mid \theta, H_0)\, p(\theta \mid H_0)\, d\theta}{\int p(x_s \mid \theta, H_1)\, p(\theta \mid H_1)\, d\theta} \times \frac{P(H_0)}{P(H_1)}$$

    Useful for compound hypothesis testing.

    Example: We toss a coin 100 times and obtain 60 heads and 40 tails. What is the evidence against the hypothesis that the coin is fair?
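One way to work the coin example is sketched below for a record of 60 heads and 40 tails in 100 tosses. The slide does not state a prior over the coin's bias under the alternative, so the uniform prior on [0, 1] used here is purely an assumption of this sketch; a different prior would give a different Bayes factor.

```python
from fractions import Fraction
from math import comb

def coin_bayes_factor(heads, tails):
    """Bayes factor p(x_s|H0) / p(x_s|H1) for a coin-toss record.

    H0: theta = 1/2 (fair coin).
    H1: theta unknown with a uniform prior on [0, 1] (an assumption of this sketch).
    """
    n = heads + tails
    evidence_h0 = Fraction(comb(n, heads), 2**n)
    # Integrating C(n, h) theta^h (1 - theta)^t over a uniform prior gives 1 / (n + 1).
    evidence_h1 = Fraction(1, n + 1)
    return evidence_h0 / evidence_h1

bf = coin_bayes_factor(heads=60, tails=40)
print(f"Bayes factor in favour of the fair coin: {float(bf):.3f}")
```

Under this particular prior the Bayes factor comes out close to 1, so the data barely shift the odds between the two hypotheses.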


    Minimum Prob of Error

    We attach some cost function: $C_{ij}$ = cost incurred by accepting $H_i$ when $H_j$ is true.

    $R_0$ is the region of accepting $H_0$, $R_1$ is the region of rejecting $H_0$ (accepting $H_1$). We have to determine proper decision regions so that the overall cost is minimized.

    Overall cost value:

    $$C = \int_{R_0} \left[ C_{00}\, p(x \mid H_0) P(H_0) + C_{01}\, p(x \mid H_1) P(H_1) \right] dx + \int_{R_1} \left[ C_{10}\, p(x \mid H_0) P(H_0) + C_{11}\, p(x \mid H_1) P(H_1) \right] dx$$


    Minimum Prob of Error (contd.)

    Cost is minimized if we decide for $H_0$ when

    $$C_{00}\, p(x_s \mid H_0) P(H_0) + C_{01}\, p(x_s \mid H_1) P(H_1) < C_{10}\, p(x_s \mid H_0) P(H_0) + C_{11}\, p(x_s \mid H_1) P(H_1)$$

    Considering a zero-one cost function, decide for $H_0$ when

    $$p(x_s \mid H_1) P(H_1) < p(x_s \mid H_0) P(H_0)$$

    This is essentially the Bayes decision criterion.

    In this the overall error (both Type I and Type II) is minimized.
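The minimum-cost rule can be written directly as a comparison of the two expected costs. In this sketch the likelihood values, priors and the second cost matrix are made-up numbers; `cost[i][j]` follows the convention "cost of accepting H_i when H_j is true".

```python
ZERO_ONE_COST = ((0.0, 1.0), (1.0, 0.0))  # cost[i][j]: accept H_i when H_j is true

def min_cost_decision(p_x_h0, p_x_h1, prior0, prior1, cost=ZERO_ONE_COST):
    """Return 0 to decide for H0, 1 to decide for H1."""
    cost_accept_h0 = cost[0][0] * p_x_h0 * prior0 + cost[0][1] * p_x_h1 * prior1
    cost_accept_h1 = cost[1][0] * p_x_h0 * prior0 + cost[1][1] * p_x_h1 * prior1
    return 0 if cost_accept_h0 < cost_accept_h1 else 1

# Zero-one cost: reduces to comparing p(x|H1) P(H1) with p(x|H0) P(H0).
print(min_cost_decision(0.3, 0.1, 0.5, 0.5))
# Making a miss ten times as costly as a false alarm flips the decision.
print(min_cost_decision(0.3, 0.1, 0.5, 0.5, cost=((0.0, 10.0), (1.0, 0.0))))
```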

    Fisherian Hypothesis Testing

    Construct a statistical null hypothesis.

    Choose an appropriate distribution or test statistic.

    Collect the data with random samples.

    Determine the p-value assuming the null hypothesis is true.

    Reject the null hypothesis if p is small.


    Neyman-Pearson Method

    Similar to Fisher's approach, but:

    Set the significance level in advance,

    Focus on Type I and Type II errors, as well as the power of tests.

    For any $\alpha$ there is an infinite number of possible decision rules (infinite number of critical regions).

    Each critical region has a power.


    Neyman-Pearson Method (contd.)

    False Alarm: Wrongly rejecting $H_0$.

    Detection: Rightly detecting that $H_0$ is not true (rejecting $H_0$).

    Miss: Wrongly accepting $H_0$.

    Check that the prob. of false alarm $P_F = \alpha$ and the prob. of miss is $\beta$. Prob. of correct acceptance is $(1 - \alpha)$ and prob. of detection $P_D = (1 - \beta)$.

    So, we have two degrees of freedom.

    Aim to increase $P_D$ while decreasing $P_F$: generally not possible simultaneously.


    Neyman-Pearson Method (contd.)

    The relation between $P_F$ and $P_D$ is commonly given by the receiver operating characteristic (ROC) curve.
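To see how such a curve arises, the sketch below sweeps a threshold over a single Gaussian observation and records the resulting (P_F, P_D) pairs. The Gaussian model, the means and the list of thresholds are illustrative assumptions.

```python
from statistics import NormalDist

def roc_points(mu0, mu1, sigma, thresholds):
    """(P_F, P_D) pairs for the rule "reject H0 when x > threshold"."""
    h0 = NormalDist(mu0, sigma)
    h1 = NormalDist(mu1, sigma)
    # P_F: mass of the H0 density above the threshold; P_D: same for H1.
    return [(1.0 - h0.cdf(t), 1.0 - h1.cdf(t)) for t in thresholds]

points = roc_points(mu0=0.0, mu1=2.0, sigma=1.0, thresholds=[-1.0, 0.0, 1.0, 2.0, 3.0])
for p_f, p_d in points:
    print(f"P_F = {p_f:.3f}, P_D = {p_d:.3f}")
```

Raising the threshold lowers both probabilities together, which is why P_D cannot be increased while P_F is decreased by moving the threshold alone.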


    Neyman-Pearson Lemma


    Hypothesis Testing using LRT

    Recall the criterion used for minimizing the cost function in hypothesis testing (decision making): Reject $H_0$ when

    $$C_{00}\, p(x_s \mid H_0) P(H_0) + C_{01}\, p(x_s \mid H_1) P(H_1) > C_{10}\, p(x_s \mid H_0) P(H_0) + C_{11}\, p(x_s \mid H_1) P(H_1)$$

    This can be rewritten (assuming $C_{10} > C_{00}$) as

    $$\frac{p(x_s \mid H_0)}{p(x_s \mid H_1)} < \frac{C_{01} - C_{11}}{C_{10} - C_{00}} \times \frac{P(H_1)}{P(H_0)}$$

    So, if the threshold $k$ in the LRT equals the right-hand side of this inequality then the cost function is minimized.

    Also, check that if $k = P(H_1)/P(H_0)$ for a zero-one cost function then it is essentially the Bayes criterion based hypothesis testing.
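The likelihood ratio test with a cost-derived threshold can be sketched as follows. The priors and likelihood values are made-up inputs, and the function names are hypothetical; `cost[i][j]` is the cost of accepting H_i when H_j is true.

```python
def lrt_threshold(prior0, prior1, cost=((0.0, 1.0), (1.0, 0.0))):
    """Threshold k = (C01 - C11) / (C10 - C00) * P(H1) / P(H0).

    Assumes cost[1][0] > cost[0][0], i.e. a false alarm costs more than
    a correct acceptance of H0.
    """
    return (cost[0][1] - cost[1][1]) / (cost[1][0] - cost[0][0]) * prior1 / prior0

def lrt_reject_h0(p_x_h0, p_x_h1, k):
    """Reject H0 when the likelihood ratio p(x_s|H0) / p(x_s|H1) falls below k."""
    return p_x_h0 / p_x_h1 < k

k = lrt_threshold(prior0=0.25, prior1=0.75)  # zero-one cost: k = P(H1) / P(H0)
print(k, lrt_reject_h0(0.3, 0.2, k), lrt_reject_h0(0.9, 0.2, k))
```

Keeping the threshold separate from the ratio makes it easy to swap in a different rule for choosing k, for example one fixed by a target false-alarm probability.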


    Maximum Likelihood Estimation

    Assumed that the parametric form of $p(x)$ is known, but depends on some parameters $\theta_1, \theta_2, \theta_3, \ldots$

    So, once we can find (estimate) these parameter values, the density function is uniquely determined.

    Observe a set of i.i.d. training samples $x_1, x_2, \ldots, x_n$.

    Likelihood:

    $$p(X \mid \theta) = p(x_1, x_2, \ldots, x_n \mid \theta) = \prod_{k=1}^{n} p(x_k \mid \theta)$$

    MLE finds the values of the parameters for which the likelihood is maximized.


    MLE (contd.)

    To find the parameter value that maximizes the likelihood we need to differentiate w.r.t. the parameters $\theta$ and then equate to zero.

    Equivalently, we may differentiate the log-likelihood function, which is easier to work with.

    Log-likelihood:

    $$l(\theta) = \ln p(X \mid \theta) = \sum_{k=1}^{n} \ln p(x_k \mid \theta)$$

    Solve for $\theta$:

    $$\nabla_{\theta}\, l(\theta) = \sum_{k=1}^{n} \nabla_{\theta} \ln p(x_k \mid \theta) = 0$$

    Explain with examples: Gaussian case with unknown mean, and with unknown mean and variance.
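For the Gaussian case with unknown mean and variance, setting the gradient of the log-likelihood to zero gives the sample mean and the variance computed with a 1/n factor. The Python sketch below evaluates these closed forms and the log-likelihood on a small made-up data set.

```python
import math

def gaussian_mle(samples):
    """MLE of the mean and variance of a univariate Gaussian."""
    n = len(samples)
    mu_hat = sum(samples) / n
    var_hat = sum((x - mu_hat) ** 2 for x in samples) / n  # divides by n, not n - 1
    return mu_hat, var_hat

def log_likelihood(samples, mu, var):
    """l(theta) = sum over k of ln p(x_k | mu, var)."""
    return sum(-0.5 * math.log(2.0 * math.pi * var) - (x - mu) ** 2 / (2.0 * var)
               for x in samples)

samples = [1.0, 2.0, 3.0, 4.0, 5.0]  # illustrative data
mu_hat, var_hat = gaussian_mle(samples)
print(mu_hat, var_hat, log_likelihood(samples, mu_hat, var_hat))
```

Evaluating `log_likelihood` at nearby parameter values gives a quick numerical check that the closed-form estimates sit at a maximum.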


    MAP Estimation

    The prior probability of different parameter values is given by $p(\theta)$.

    We can determine the posterior prob. $p(\theta \mid X)$ for the given training samples.

    We look for the parameter values that maximize this posterior prob.

    That is, we maximize $p(X \mid \theta)\, p(\theta)$.

    Equivalently, with $l(\theta) = \ln p(X \mid \theta)$ the log-likelihood, we maximize $l(\theta) + \ln p(\theta)$.
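As a small worked case, the sketch below computes the MAP estimate of a Gaussian mean. The Gaussian likelihood with known standard deviation and the Gaussian prior on the mean are assumptions of this sketch, as are the data values.

```python
def log_posterior(mu, samples, sigma, mu0, sigma0):
    """ln p(X|mu) + ln p(mu), dropping terms that do not depend on mu."""
    log_lik = sum(-0.5 * ((x - mu) / sigma) ** 2 for x in samples)
    log_prior = -0.5 * ((mu - mu0) / sigma0) ** 2
    return log_lik + log_prior

def map_mean(samples, sigma, mu0, sigma0):
    """Closed-form maximiser of log_posterior for a Gaussian likelihood and prior."""
    n = len(samples)
    return (sigma0**2 * sum(samples) + sigma**2 * mu0) / (n * sigma0**2 + sigma**2)

samples = [1.0, 2.0, 3.0]  # illustrative data; the MLE of the mean is 2.0
mu_map = map_mean(samples, sigma=1.0, mu0=0.0, sigma0=1.0)
print(mu_map)
```

The MAP estimate lies between the prior mean and the sample mean, and moves toward the sample mean as more data arrive or as the prior becomes broader.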


    Bayesian Estimation

    The parametric form of $p(x)$ is known but the parameter values are not known.

    The basic goal is to compute the density function from a given set of samples, i.e. $p(x \mid X)$, which is close to the unknown $p(x)$.

    $$p(x \mid X) = \int p(x \mid \theta)\, p(\theta \mid X)\, d\theta$$

    So, we need to compute $p(\theta \mid X)$; when it has the same functional form as the prior it is called a reproducing density.

    Initial knowledge about the parameter values is contained in a known prior density $p(\theta)$; a prior that leads to a reproducing density is called a conjugate prior.

    The rest of our knowledge about the parameters is contained in the sample set.


    Bayesian Estimation (contd.)

    Using Bayes formula:

    $$p(\theta \mid X) = \frac{p(X \mid \theta)\, p(\theta)}{\int p(X \mid \theta)\, p(\theta)\, d\theta}$$

    Since the samples are drawn independently according to the unknown prob. density $p(x)$:

    $$p(X \mid \theta) = \prod_{k=1}^{n} p(x_k \mid \theta)$$


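    Bayesian Estimation: Example

    Given:

    $$p(x \mid \mu) \sim N(\mu, \sigma^2), \qquad p(\mu) \sim N(\mu_0, \sigma_0^2)$$

    Reproducing density:

    $$p(\mu \mid X) = \frac{1}{\sqrt{2\pi}\,\sigma_n} \exp\left[ -\frac{1}{2} \left( \frac{\mu - \mu_n}{\sigma_n} \right)^2 \right]$$

    where

    $$\mu_n = \frac{n\sigma_0^2}{n\sigma_0^2 + \sigma^2}\, \hat{\mu}_n + \frac{\sigma^2}{n\sigma_0^2 + \sigma^2}\, \mu_0, \qquad \sigma_n^2 = \frac{\sigma_0^2 \sigma^2}{n\sigma_0^2 + \sigma^2}, \qquad \hat{\mu}_n = \frac{1}{n} \sum_{k=1}^{n} x_k$$

    Finally, density function:

    $$p(x \mid X) \sim N(\mu_n, \sigma^2 + \sigma_n^2)$$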
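A minimal numerical sketch of this example, using the standard closed forms for a Gaussian likelihood with known variance and a Gaussian prior on the mean. The data values and the prior parameters are made-up inputs, and the function names are hypothetical.

```python
def posterior_of_mean(samples, sigma, mu0, sigma0):
    """Return (mu_n, var_n) of the reproducing density p(mu | X)."""
    n = len(samples)
    mu_hat = sum(samples) / n  # sample mean
    denom = n * sigma0**2 + sigma**2
    mu_n = (n * sigma0**2 / denom) * mu_hat + (sigma**2 / denom) * mu0
    var_n = sigma0**2 * sigma**2 / denom
    return mu_n, var_n

def predictive_density(samples, sigma, mu0, sigma0):
    """Return (mean, variance) of p(x | X) ~ N(mu_n, sigma^2 + sigma_n^2)."""
    mu_n, var_n = posterior_of_mean(samples, sigma, mu0, sigma0)
    return mu_n, sigma**2 + var_n

samples = [1.0, 2.0, 3.0]  # illustrative data
print(posterior_of_mean(samples, sigma=1.0, mu0=0.0, sigma0=1.0))
print(predictive_density(samples, sigma=1.0, mu0=0.0, sigma0=1.0))
```

As n grows, the posterior variance shrinks toward zero and the posterior mean approaches the sample mean, so the predictive density approaches a Gaussian with the known variance centred on the sample mean.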