statistical methods in particle physics day 3: multivariate methods (ii)

59
G. Cowan Statistical Methods in Particle Physics 1 Statistical Methods in Particle Physics Day 3: Multivariate Methods (II) 清清清清清清清清清清清清 2010 清 4 清 12—16 清 Glen Cowan Physics Department Royal Holloway, University of Lond [email protected] www.pp.rhul.ac.uk/~cowan

Upload: helen-thomas

Post on 31-Dec-2015

37 views

Category:

Documents


2 download

DESCRIPTION

Statistical Methods in Particle Physics Day 3: Multivariate Methods (II). 清华大学高能物理研究中心 2010 年 4 月 12—16 日. Glen Cowan Physics Department Royal Holloway, University of London [email protected] www.pp.rhul.ac.uk/~cowan. Outline of lectures. Day #1: Introduction - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics 1

Statistical Methods in Particle PhysicsDay 3: Multivariate Methods (II)

清华大学高能物理研究中心2010 年 4 月 12—16日

Glen CowanPhysics DepartmentRoyal Holloway, University of [email protected]/~cowan

Page 2: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics 2

Outline of lecturesDay #1: Introduction

Review of probability and Monte Carlo Review of statistics: parameter estimation

Day #2: Multivariate methods (I) Event selection as a statistical test Cut-based, linear discriminant, neural networks

Day #3: Multivariate methods (II) More multivariate classifiers: BDT, SVM ,...

Day #4: Significance tests for discovery and limits Including systematics using profile likelihood

Day #5: Bayesian methods Bayesian parameter estimation and model selection

Page 3: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics 3

Day #3: outline

Nonparametric probability density estimation methodshistogramskernel density estimation, K nearest neighbour

Boosted Decision Trees

Support Vector Machines

Page 4: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics 4

Review of the problemSuppose we have events from Monte Carlo of two types, signal and background (each instance of x is multivariate):

Goal is to use these training events to adjust the parametersof a test statistic (“classifier”) y(x), that we can then use onreal data to distinguish between the two types.

Yesterday we saw linear classifiers, neural networks and startedtalking about probability density estimation methods, where wefind the classifier from the ratio of estimated pdfs.

Page 5: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics 5

Page 6: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics 6

Page 7: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics 7

Page 8: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics 8

Page 9: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics 9

Page 10: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics 10

Page 11: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics 11

Page 12: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics 12

Page 13: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics 13

Page 14: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics 14

Page 15: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics 15

Page 16: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics 16

Page 17: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics 17

Page 18: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics 18

Page 19: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics 19

Page 20: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics 20

Page 21: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics 21

Page 22: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics 22

Page 23: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics 23

Page 24: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics 24

Page 25: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics 25

Boosted decision trees

Page 26: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics page 26

Particle i.d. in MiniBooNEDetector is a 12-m diameter tank of mineral oil exposed to a beam of neutrinos and viewed by 1520 photomultiplier tubes:

H.J. Yang, MiniBooNE PID, DNP06H.J. Yang, MiniBooNE PID, DNP06

Search for to e oscillations

required particle i.d. using information from the PMTs.

Page 27: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics page 27

Decision treesOut of all the input variables, find the one for which with a single cut gives best improvement in signal purity:

Example by MiniBooNE experiment,B. Roe et al., NIM 543 (2005) 577

where wi. is the weight of the ith event.

Resulting nodes classified as either signal/background.

Iterate until stop criterion reached based on e.g. purity or minimum number of events in a node.

The set of cuts defines the decision boundary.

Page 28: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics page 28

Page 29: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics page 29

Page 30: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics page 30

Page 31: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics page 31

Page 32: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics page 32

BDT example from MiniBooNE~200 input variables for each event ( interaction producing e, or

Each individual tree is relatively weak, with a misclassification error rate ~ 0.4 – 0.45

B. Roe et al., NIM 543 (2005) 577

Page 33: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics page 33

Monitoring overtraining

From MiniBooNEexample:

Performance stableafter a few hundredtrees.

Page 34: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics page 34

Monitoring overtraining (2)

Page 35: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics page 35

Comparison of boosting algorithmsA number of boosting algorithms on the market; differ in theupdate rule for the weights.

Page 36: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics page 36

Page 37: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics page 37

Single top quark production (CDF/D0)Top quark discovered in pairs, butSM predicts single top production.

Use many inputs based on jet properties, particle i.d., ...

signal(blue +green)

Pair-produced tops are now a background process.

Page 38: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics page 38

Different classifiers for single top

Also Naive Bayes and various approximations to likelihood ratio,....

Final combined result is statistically significant (>5 level) but not easy to understand classifier outputs.

Page 39: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics page 39

Page 40: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics page 40

Page 41: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics page 41

Page 42: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics page 42

Page 43: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics page 43

Page 44: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics page 44

Page 45: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics page 45

Page 46: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics page 46

Page 47: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics page 47

Page 48: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics page 48

Page 49: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics page 49

Page 50: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics page 50

Page 51: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics page 51

Using an SVMTo use an SVM the user must as a minimum choose

a kernel function (e.g. Gaussian)any free parameters in the kernel (e.g. the of the Gaussian)a cost parameter C (plays role of regularization parameter)

The training is relatively straightforward because, in contrast to neuralnetworks, the function to be minimized has a single global minimum.

Furthermore evaluating the classifier only requires that one retainand sum over the support vectors, a relatively small number of points.

The advantages/disadvantages and rationale behind the choices above is not always clear to the particle physicist -- help needed here.

Page 52: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics page 52

SVM in particle physicsSVMs are very popular in the Machine Learning community but haveyet to find wide application in HEP. Here is an early example froma CDF top quark anlaysis (A. Vaiciulis, contribution to PHYSTAT02).

signaleff.

Page 53: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics page 53

Summary on multivariate methodsParticle physics has used several multivariate methods for many years:

linear (Fisher) discriminantneural networksnaive Bayes

and has in the last several years started to use a few morek-nearest neighbourboosted decision treessupport vector machines

The emphasis is often on controlling systematic uncertainties betweenthe modeled training data and Nature to avoid false discovery.

Although many classifier outputs are "black boxes", a discoveryat 5 significance with a sophisticated (opaque) method will win thecompetition if backed up by, say, 4 evidence from a cut-based method.

Page 54: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

Quotes I like

“If you believe in something you don't understand, you suffer,...”

– Stevie Wonder

“Keep it simple.As simple as possible.Not any simpler.”

– A. Einstein

G. Cowan page 54Statistical Methods in Particle Physics

Page 55: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics page 55

Extra slides

Page 56: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics page 56

Page 57: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics page 57

Page 58: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics page 58

Page 59: Statistical Methods in Particle Physics Day 3:   Multivariate Methods (II)

G. Cowan Statistical Methods in Particle Physics page 59