IGEC time coincidence search is taking advantage of “a priori” information


Page 1: IGEC  time coincidence search is taking advantage of “a priori” information

Statistical problems in network data analysis: burst searches by narrowband detectors

L. Baggio and G.A. Prodi
ICRR Tokyo, Univ. Trento and INFN

IGEC time coincidence search is taking advantage of “a priori” information

• template search: matched filters optimized for short and rare transient gw with flat Fourier transform over the detector frequency band

• many trials at once:
  - different detector configurations (9 pairs + 7 triples + 2 four-fold)
  - many target thresholds on the searched gw amplitude (30)
  - directional / non directional searches

narrowband detectors & same directional sensitivity
Cons: probing a smaller volume of the signal parameter space
Pros: simpler problem

GravStat 2005

Page 2: IGEC  time coincidence search is taking advantage of “a priori” information

… IGEC cont'd

• data selection and time coincidence search:
  - control of false dismissal probability
  - balance between efficiency of detection and background fluctuations

• background noise estimation:
  - high statistics: 10^3 time lags for detector pairs, 10^4 – 10^5 for detector triples
  - goodness of fit tests with background model (Poisson)

• blind analysis (“good will”):
  - tuning of procedures on time shifted data by looking at all the observation time (no playground)

  … what if evidence for a claim would appear? “GW candidates will be given special attention …”

  - IGEC-2 agreed on a blind data exchange (secret time lag)
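The time-lag estimate of the accidental background can be sketched as follows. This is only an illustration, not the IGEC pipeline: the function names, the fixed coincidence window, the toy event times and the handling of edges (shifted events are not wrapped around the observation time) are all assumptions made for the example.

```python
import bisect

def count_coincidences(times_a, times_b, window):
    """Count events of detector A with at least one event of detector B
    within +/- window. Both lists must be sorted in increasing order."""
    count = 0
    for t in times_a:
        i = bisect.bisect_left(times_b, t - window)
        if i < len(times_b) and times_b[i] <= t + window:
            count += 1
    return count

def background_counts(times_a, times_b, window, lag_step, n_lags):
    """Accidental coincidence counts at the lags +/- k * lag_step, k = 1..n_lags.
    The zero lag (the true time) is never evaluated here."""
    counts = []
    for k in range(1, n_lags + 1):
        for sign in (1, -1):
            shifted = [t + sign * k * lag_step for t in times_b]
            counts.append(count_coincidences(times_a, shifted, window))
    return counts

# Toy event lists (hypothetical numbers, in seconds)
times_a = [10.0, 20.0, 30.0]
times_b = [10.5, 25.0]
lag_counts = background_counts(times_a, times_b, window=1.0, lag_step=5.0, n_lags=2)
mean_background = sum(lag_counts) / len(lag_counts)
```

Because the zero lag is left out of `background_counts`, the procedures can be tuned on the shifted data alone, which is the sense of the blind analysis above.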


Page 3: IGEC  time coincidence search is taking advantage of “a priori” information

Poisson statistics

For each pair of detectors and amplitude selection, the resampled statistics make it possible to test the Poisson hypothesis for accidental coincidences.

Example: EX-NA background (one-tail χ² p-level 0.71)

Since a fairly large number of tests is performed over all two-fold combinations, the overall agreement of the histogram of p-levels with the uniform distribution has the last word on the goodness of fit.

verified
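One possible way to quantify the agreement of a set of p-levels with the uniform distribution is a Kolmogorov-Smirnov test. The slides do not say which test was used, so the choice of test, the function name and the small-sample correction in the asymptotic formula are assumptions of this sketch.

```python
import math

def ks_uniform(p_levels):
    """Kolmogorov-Smirnov distance between the empirical distribution of the
    p-levels and the uniform distribution on [0, 1], with an asymptotic p-value."""
    p = sorted(p_levels)
    n = len(p)
    d = 0.0
    for i, x in enumerate(p, start=1):
        d = max(d, i / n - x, x - (i - 1) / n)
    # Asymptotic Kolmogorov tail probability, with a common small-sample correction
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    tail, j = 0.0, 1
    while True:
        term = math.exp(-2.0 * (j * lam) ** 2)
        tail += 2.0 * (-1) ** (j - 1) * term
        if term < 1e-12:
            break
        j += 1
    return d, min(1.0, max(0.0, tail))

# Hypothetical p-levels: evenly spread, then all piled up near zero
d_flat, p_flat = ks_uniform([0.1, 0.3, 0.5, 0.7, 0.9])
d_peak, p_peak = ks_uniform([0.01] * 20)
```

A small p-value from `ks_uniform` would point to a background model that does not describe the accidental coincidences, or to p-levels biased by something other than noise.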

Page 4: IGEC  time coincidence search is taking advantage of “a priori” information

A few basics: confidence belts and coverage

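[Figure: confidence belts in the plane of the physical unknown versus the experimental data x, built from the pdf of x given the unknown; each panel also shows the coverage, from 0 to 1, as a function of the physical unknown]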


Page 5: IGEC  time coincidence search is taking advantage of “a priori” information

Freedom of choice of confidence belt

Fixed frequentist coverage, C = CL

I can be chosen arbitrarily within this “horizontal” constraint

Feldman & Cousins (1998) and variations (Giunti 1999, Roe & Woodroofe 1999, ...)

[Figure: coverage, from 0 to 1, versus the physical unknown]


Page 6: IGEC  time coincidence search is taking advantage of “a priori” information

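Confidence intervals from likelihood integral

• Let $\langle N_c \rangle = N_b + N_\Lambda$, with $N_b = \lambda\,T_{obs}$

• Poisson pdf:

$$f(N_c; N_\Lambda) = \frac{e^{-(N_b + N_\Lambda)}}{N_c!}\,(N_b + N_\Lambda)^{N_c}$$

• Likelihood: $\ell(N_c; N_\Lambda) \equiv f(N_c; N_\Lambda)$

• $I$ fixed, solve for $0 \le N_{inf} \le N_{sup}$:

$$I = \left[\int_0^{\infty} \ell(N_c; N_\Lambda)\,dN_\Lambda\right]^{-1} \int_{N_{inf}}^{N_{sup}} \ell(N_c; N_\Lambda)\,dN_\Lambda, \qquad \ell(N_c; N_{inf}) = \ell(N_c; N_{sup})$$

• Compute the coverage:

$$C(N_\Lambda) = \sum_{N_c \,|\, N_{inf} \le N_\Lambda \le N_{sup}} f(N_c; N_\Lambda)$$

In general $C(N_\Lambda) \neq I$.

[Figure: plot of the likelihood integral $I$ vs. minimum (conservative) coverage $\min C(N_\Lambda)$, with background counts $N_b$ = 0.01-10]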
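The recipe on this slide can be sketched numerically. The code below assumes that the interval is the region where the likelihood exceeds a common level (equal likelihood at both bounds, or a lower bound at zero), and that $N_b > 0$. All function names are made up for the example, and `n_gw` stands for the unknown signal counts; this is not the IGEC code.

```python
import math

def poisson_cdf(n, mu):
    """P(N <= n) for a Poisson variable with mean mu > 0."""
    term = math.exp(-mu)
    total = term
    for k in range(1, n + 1):
        term *= mu / k
        total += term
    return total

def likelihood(n_gw, n_c, n_b):
    """Poisson probability of n_c counts when the mean is n_b + n_gw (n_b > 0)."""
    mu = n_b + n_gw
    return math.exp(n_c * math.log(mu) - mu - math.lgamma(n_c + 1))

def likelihood_integral(n_inf, n_sup, n_c, n_b):
    """Likelihood integral over [n_inf, n_sup], normalized to [0, inf).
    Uses d/dmu P(N <= n_c; mu) = -f(n_c; mu), so no quadrature is needed."""
    num = poisson_cdf(n_c, n_b + n_inf) - poisson_cdf(n_c, n_b + n_sup)
    return num / poisson_cdf(n_c, n_b)

def bisect_root(func, lo, hi, iters=60):
    """Root of a monotonic function that changes sign on [lo, hi]."""
    sign_lo = func(lo) > 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (func(mid) > 0) == sign_lo:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def interval(n_c, n_b, integral):
    """[n_inf, n_sup] with the requested likelihood integral, taken as the
    region where the likelihood exceeds a level h (assumed interval choice)."""
    mode = max(0.0, n_c - n_b)
    top = likelihood(mode, n_c, n_b)
    far = mode + 60.0 + 10.0 * math.sqrt(n_c)  # likelihood is negligible here

    def bounds(h):
        excess = lambda x: likelihood(x, n_c, n_b) - h
        n_inf = 0.0 if excess(0.0) >= 0 else bisect_root(excess, 0.0, mode)
        return n_inf, bisect_root(excess, mode, far)

    h = bisect_root(
        lambda h: likelihood_integral(*bounds(h), n_c, n_b) - integral,
        top * 1e-12, top)
    return bounds(h)

def coverage(n_gw, n_b, integral, n_c_max=60):
    """Probability that the interval built from the observed n_c contains n_gw."""
    total = 0.0
    for n_c in range(n_c_max + 1):
        n_inf, n_sup = interval(n_c, n_b, integral)
        if n_inf <= n_gw <= n_sup:
            total += likelihood(n_gw, n_c, n_b)
    return total
```

Scanning `coverage` over a grid of `n_gw` values and taking the minimum would give the conservative coverage associated with a given likelihood integral and background.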


Page 7: IGEC  time coincidence search is taking advantage of “a priori” information

Example: Poisson background $N_b$ = 7.0, with $\langle N_c \rangle = N_b + N_\Lambda$

[Figure: confidence belts for $N_\Lambda$ (vertical axis, 0 to 10) versus coincidence counts $N_c$ (horizontal axis, 0 to 25); curve labels: 50%, 85%, 95%, 99%, 99.9%; legend: likelihood integral]


Page 9: IGEC  time coincidence search is taking advantage of “a priori” information

Multiple configurations/selection/grouping within IGEC analysis


Page 10: IGEC  time coincidence search is taking advantage of “a priori” information

Resampling statistics of accidental claims

[Figure: histogram of counts (0 to 500) versus number of false alarms (0 to 5), from resampling of the event time series]

coverage    “claims” expected    found
0.90        0.866 (0.555)        [1]
0.95        0.404 (0.326)        [1]

Easy to set up a blind search


Page 11: IGEC  time coincidence search is taking advantage of “a priori” information

Keep track of the number of trials (and their correlation) !

IGEC-1 final results consist of a few sets of tens of Confidence Intervals with min{C}=95%

the “false positives” would hide true discoveries, which would require more than 5 two-sided C.I. to reach 0.1% confidence for rejecting H0

the procedure was good for Upper Limits, but NOT optimized for discoveries

Need to decrease the “false alarm probability” (type I error)
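The effect of the number of trials can be illustrated with a binomial tail. The sketch assumes independent trials, which the slide itself warns is not guaranteed, and the values m = 30 and p = 0.05 are hypothetical, chosen only to resemble "tens of C.I." at 95% minimum coverage.

```python
import math

def prob_at_least(k, m, p):
    """P(at least k false positives among m independent trials),
    each trial having false alarm probability p."""
    return sum(math.comb(m, i) * p**i * (1 - p) ** (m - i)
               for i in range(k, m + 1))

# Hypothetical set of m = 30 independent C.I., each with 5% false alarm probability
m, p = 30, 0.05
tail = {k: prob_at_least(k, m, p) for k in range(8)}
```

Reading `tail[k]` for increasing k shows how many simultaneous exclusions of H0 would be needed before the overall false alarm probability becomes small.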


Page 12: IGEC  time coincidence search is taking advantage of “a priori” information

Freedom of choice of confidence belt

Fine tune of the false alarm probability

[Figure: choices of confidence belt labelled “GW enthusiastic”, “fanatic” and “skeptical”]

Page 13: IGEC  time coincidence search is taking advantage of “a priori” information

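Example: confidence belt from likelihood integral

Poisson background $N_b$ = 7.0, with $\langle N_c \rangle = N_b + N_\Lambda$

[Figure: confidence belt with Min{C} = 95% and the corresponding $1 - C(N_\Lambda)$; marked regions: P{false alarm} < 5% and P{false alarm} < 0.1%]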
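As a simplified illustration of the two false alarm levels, a claim can be treated as a plain threshold on the coincidence counts, with the false alarm probability given by the Poisson tail of the background alone. This is an assumption of the sketch and does not reproduce the belt construction; the function names are hypothetical.

```python
import math

def poisson_tail(n, mu):
    """P(N >= n) for a Poisson variable with mean mu."""
    term = math.exp(-mu)
    cdf = 0.0
    for k in range(n):
        cdf += term
        term *= mu / (k + 1)
    return 1.0 - cdf

def min_counts_for_claim(n_b, alpha):
    """Smallest n such that P(N_c >= n | background only) < alpha."""
    n = 0
    while poisson_tail(n, n_b) >= alpha:
        n += 1
    return n

threshold_5pct = min_counts_for_claim(7.0, 0.05)
threshold_01pct = min_counts_for_claim(7.0, 0.001)
```

With a background of 7.0 counts this simplified rule asks for at least 13 coincidences at the 5% level and at least 17 at the 0.1% level.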


Page 14: IGEC  time coincidence search is taking advantage of “a priori” information

What false alarm threshold should be used to claim evidence for rejecting the null H0?


• control the overall false detection probability: Familywise Error Rate < α
  requires single C.I. with P{false alarm} < α/m
  Pro: rare mistakes
  Con: high detection inefficiency

• control the mean False Discovery Rate: <FDR> = <F+ / R> ≤ q
  R = total number of reported discoveries
  F+ = actual number of false positives

Benjamini & Hochberg (JRSS-B (1995) 57:289-300)
Miller et al. (AJ 122: 3492-3505, Dec 2001; http://arxiv.org/abs/astro-ph/0107034)
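The per-trial requirement for the familywise control follows from the union bound. The notation is assumed here: α is the target familywise error rate and m the number of trials. No independence between the trials is needed.

$$\mathrm{FWER} = P\{\text{at least one false alarm in } m \text{ trials}\} \le \sum_{i=1}^{m} P\{\text{false alarm in trial } i\} < m \cdot \frac{\alpha}{m} = \alpha$$

The price is that each single trial must pass a threshold m times stricter, which is where the detection inefficiency comes from.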

Page 15: IGEC  time coincidence search is taking advantage of “a priori” information

FDR control

The p-values are uniformly distributed in [0,1] if the assumed hypothesis is true.

Usually, the alternative hypothesis is not known. However, the presence of a signal would bias the distribution of the p-values: typically, the measured values of p are biased toward 0.

[Figure: pdf of the p-level over [0, 1]: flat for the background, peaked toward 0 for the signal]

Page 16: IGEC  time coincidence search is taking advantage of “a priori” information

Sketch of Benjamini & Hochberg FDR control procedure

• choose your desired bound q on <FDR>;

• compute p-values {p1, p2, … pm} for a set of tests, and sort them in increasing order;

• determine the threshold T = pk by finding the largest index k such that pk ≤ (q/m) k, i.e. pj > (q/m) j for every j > k; reject H0 for every test with p-value ≤ T;

• OK if p-values are independent or positively correlated;

• in case NO signal is present (H0 is true), the procedure is equivalent to the control of the FamilyWise Error Rate at level q

[Figure: sorted p-values and counts (up to m), with the line set by q, the threshold T and the “reject H0” region]
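The steps above can be sketched in a few lines, using the standard step-up form of the Benjamini & Hochberg rule (largest k with pk ≤ (q/m) k). The function name and the list of p-values are hypothetical, chosen only to show the mechanics.

```python
def benjamini_hochberg(p_values, q):
    """Return (threshold T, indices of rejected tests) for a bound q on <FDR>.
    T is None when no test is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        # keep the largest rank whose p-value lies below the line (q/m) * rank
        if p_values[idx] <= q * rank / m:
            k = rank
    if k == 0:
        return None, []
    return p_values[order[k - 1]], sorted(order[:k])

# Hypothetical p-values from m = 10 tests
p_values = [0.212, 0.001, 0.041, 0.008, 0.039, 0.042, 0.06, 0.074, 0.205, 0.216]
threshold, rejected = benjamini_hochberg(p_values, q=0.05)
```

In this toy set only the two smallest p-values fall below the line, so H0 is rejected for those two tests and T = 0.008. A Bonferroni cut at q/m = 0.005 would have kept only the first.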

Page 17: IGEC  time coincidence search is taking advantage of “a priori” information

Open questions

check the fluctuations of the random variable FDR with respect to the mean.

check how the expected uniform distribution of p-values for the null H0 can be biased (systematics, …)

would the colleagues agree that exceeding the threshold chosen to control FDR means, and requires, reporting a rejection of the null hypothesis?

To me, rejection of the null is a claim for an excess correlation in the observatory at the true time, not accounted for in the noise background measured at different time lags. It might NOT be gws, but a paper reporting the H0 rejection is worthwhile and due.
