statistical methods in particle physics day 3: multivariate methods (ii)
DESCRIPTION
Statistical Methods in Particle Physics Day 3: Multivariate Methods (II). 清华大学高能物理研究中心 2010 年 4 月 12—16 日. Glen Cowan Physics Department Royal Holloway, University of London [email protected] www.pp.rhul.ac.uk/~cowan. Outline of lectures. Day #1: Introduction - PowerPoint PPT PresentationTRANSCRIPT
G. Cowan Statistical Methods in Particle Physics 1
Statistical Methods in Particle PhysicsDay 3: Multivariate Methods (II)
清华大学高能物理研究中心2010 年 4 月 12—16日
Glen CowanPhysics DepartmentRoyal Holloway, University of [email protected]/~cowan
G. Cowan Statistical Methods in Particle Physics 2
Outline of lecturesDay #1: Introduction
Review of probability and Monte Carlo Review of statistics: parameter estimation
Day #2: Multivariate methods (I) Event selection as a statistical test Cut-based, linear discriminant, neural networks
Day #3: Multivariate methods (II) More multivariate classifiers: BDT, SVM ,...
Day #4: Significance tests for discovery and limits Including systematics using profile likelihood
Day #5: Bayesian methods Bayesian parameter estimation and model selection
G. Cowan Statistical Methods in Particle Physics 3
Day #3: outline
Nonparametric probability density estimation methodshistogramskernel density estimation, K nearest neighbour
Boosted Decision Trees
Support Vector Machines
G. Cowan Statistical Methods in Particle Physics 4
Review of the problemSuppose we have events from Monte Carlo of two types, signal and background (each instance of x is multivariate):
Goal is to use these training events to adjust the parametersof a test statistic (“classifier”) y(x), that we can then use onreal data to distinguish between the two types.
Yesterday we saw linear classifiers, neural networks and startedtalking about probability density estimation methods, where wefind the classifier from the ratio of estimated pdfs.
G. Cowan Statistical Methods in Particle Physics 5
G. Cowan Statistical Methods in Particle Physics 6
G. Cowan Statistical Methods in Particle Physics 7
G. Cowan Statistical Methods in Particle Physics 8
G. Cowan Statistical Methods in Particle Physics 9
G. Cowan Statistical Methods in Particle Physics 10
G. Cowan Statistical Methods in Particle Physics 11
G. Cowan Statistical Methods in Particle Physics 12
G. Cowan Statistical Methods in Particle Physics 13
G. Cowan Statistical Methods in Particle Physics 14
G. Cowan Statistical Methods in Particle Physics 15
G. Cowan Statistical Methods in Particle Physics 16
G. Cowan Statistical Methods in Particle Physics 17
G. Cowan Statistical Methods in Particle Physics 18
G. Cowan Statistical Methods in Particle Physics 19
G. Cowan Statistical Methods in Particle Physics 20
G. Cowan Statistical Methods in Particle Physics 21
G. Cowan Statistical Methods in Particle Physics 22
G. Cowan Statistical Methods in Particle Physics 23
G. Cowan Statistical Methods in Particle Physics 24
G. Cowan Statistical Methods in Particle Physics 25
Boosted decision trees
G. Cowan Statistical Methods in Particle Physics page 26
Particle i.d. in MiniBooNEDetector is a 12-m diameter tank of mineral oil exposed to a beam of neutrinos and viewed by 1520 photomultiplier tubes:
H.J. Yang, MiniBooNE PID, DNP06H.J. Yang, MiniBooNE PID, DNP06
Search for to e oscillations
required particle i.d. using information from the PMTs.
G. Cowan Statistical Methods in Particle Physics page 27
Decision treesOut of all the input variables, find the one for which with a single cut gives best improvement in signal purity:
Example by MiniBooNE experiment,B. Roe et al., NIM 543 (2005) 577
where wi. is the weight of the ith event.
Resulting nodes classified as either signal/background.
Iterate until stop criterion reached based on e.g. purity or minimum number of events in a node.
The set of cuts defines the decision boundary.
G. Cowan Statistical Methods in Particle Physics page 28
G. Cowan Statistical Methods in Particle Physics page 29
G. Cowan Statistical Methods in Particle Physics page 30
G. Cowan Statistical Methods in Particle Physics page 31
G. Cowan Statistical Methods in Particle Physics page 32
BDT example from MiniBooNE~200 input variables for each event ( interaction producing e, or
Each individual tree is relatively weak, with a misclassification error rate ~ 0.4 – 0.45
B. Roe et al., NIM 543 (2005) 577
G. Cowan Statistical Methods in Particle Physics page 33
Monitoring overtraining
From MiniBooNEexample:
Performance stableafter a few hundredtrees.
G. Cowan Statistical Methods in Particle Physics page 34
Monitoring overtraining (2)
G. Cowan Statistical Methods in Particle Physics page 35
Comparison of boosting algorithmsA number of boosting algorithms on the market; differ in theupdate rule for the weights.
G. Cowan Statistical Methods in Particle Physics page 36
G. Cowan Statistical Methods in Particle Physics page 37
Single top quark production (CDF/D0)Top quark discovered in pairs, butSM predicts single top production.
Use many inputs based on jet properties, particle i.d., ...
signal(blue +green)
Pair-produced tops are now a background process.
G. Cowan Statistical Methods in Particle Physics page 38
Different classifiers for single top
Also Naive Bayes and various approximations to likelihood ratio,....
Final combined result is statistically significant (>5 level) but not easy to understand classifier outputs.
G. Cowan Statistical Methods in Particle Physics page 39
G. Cowan Statistical Methods in Particle Physics page 40
G. Cowan Statistical Methods in Particle Physics page 41
G. Cowan Statistical Methods in Particle Physics page 42
G. Cowan Statistical Methods in Particle Physics page 43
G. Cowan Statistical Methods in Particle Physics page 44
G. Cowan Statistical Methods in Particle Physics page 45
G. Cowan Statistical Methods in Particle Physics page 46
G. Cowan Statistical Methods in Particle Physics page 47
G. Cowan Statistical Methods in Particle Physics page 48
G. Cowan Statistical Methods in Particle Physics page 49
G. Cowan Statistical Methods in Particle Physics page 50
G. Cowan Statistical Methods in Particle Physics page 51
Using an SVMTo use an SVM the user must as a minimum choose
a kernel function (e.g. Gaussian)any free parameters in the kernel (e.g. the of the Gaussian)a cost parameter C (plays role of regularization parameter)
The training is relatively straightforward because, in contrast to neuralnetworks, the function to be minimized has a single global minimum.
Furthermore evaluating the classifier only requires that one retainand sum over the support vectors, a relatively small number of points.
The advantages/disadvantages and rationale behind the choices above is not always clear to the particle physicist -- help needed here.
G. Cowan Statistical Methods in Particle Physics page 52
SVM in particle physicsSVMs are very popular in the Machine Learning community but haveyet to find wide application in HEP. Here is an early example froma CDF top quark anlaysis (A. Vaiciulis, contribution to PHYSTAT02).
signaleff.
G. Cowan Statistical Methods in Particle Physics page 53
Summary on multivariate methodsParticle physics has used several multivariate methods for many years:
linear (Fisher) discriminantneural networksnaive Bayes
and has in the last several years started to use a few morek-nearest neighbourboosted decision treessupport vector machines
The emphasis is often on controlling systematic uncertainties betweenthe modeled training data and Nature to avoid false discovery.
Although many classifier outputs are "black boxes", a discoveryat 5 significance with a sophisticated (opaque) method will win thecompetition if backed up by, say, 4 evidence from a cut-based method.
Quotes I like
“If you believe in something you don't understand, you suffer,...”
– Stevie Wonder
“Keep it simple.As simple as possible.Not any simpler.”
– A. Einstein
G. Cowan page 54Statistical Methods in Particle Physics
G. Cowan Statistical Methods in Particle Physics page 55
Extra slides
G. Cowan Statistical Methods in Particle Physics page 56
G. Cowan Statistical Methods in Particle Physics page 57
G. Cowan Statistical Methods in Particle Physics page 58
G. Cowan Statistical Methods in Particle Physics page 59