Multivariate Methods in HEP

Pushpa Bhat, Fermilab ACAT 2007 Apr 23-27, Amsterdam 1 Multivariate Methods in HEP Pushpa Bhat Fermilab


Page 1: Multivariate Methods in HEP

Pushpa Bhat, Fermilab ACAT 2007 Apr 23-27, Amsterdam 1

Multivariate Methods in HEP

Pushpa Bhat Fermilab

Page 2: Multivariate Methods in HEP


Outline

• Introduction/History
• Physics Analysis Examples
• Popular Methods
  • Likelihood Discriminants
  • Neural Networks
  • Bayesian Learning
  • Decision Trees
• Future
• Issues and Concerns
• Summary

Page 3: Multivariate Methods in HEP


Some History

• In 1990 most of the HEP community was skeptical of multivariate methods, particularly of neural networks (NN)
  • NN as a black box
    • Can’t understand weights
    • Nonlinear mapping; higher order correlations
    • Though a mathematical function, can’t explain it in terms of physics
    • Can’t calculate systematic errors reliably
  • Univariate or “cut-based” analysis was the norm
• Some were pursuing application of neural network methods to HEP around 1990
  • Peterson, Lonnblad, Denby, Becks, Seixas, Lindsey, etc.
• First AIHENP (Artificial Intelligence in High Energy & Nuclear Physics) workshop was in 1990.
  • Organizers included D. Perret-Gallix, K.H. Becks, R. Brun, J. Vermaseren. AIHENP metamorphosed into ACAT ten years later, in 2000
• Multivariate methods such as Fisher discriminants were in limited use.
• In 1990, I began to pursue the use of multivariate methods, especially NN, in top quark searches at Dzero.

Page 4: Multivariate Methods in HEP


Mid-1990’s

• LEP experiments had been using NN and likelihood discriminants for particle-ID applications and eventually for signal searches (Steinberger; tau-ID)

• H1 at HERA successfully implemented and used NN for triggering (Kiesling).

• Hardware NN was attempted at Fermilab at CDF

• Fermilab Advanced Analysis Methods Group brought CDF and DØ together for discussion of these methods and applications in physics analyses.

Page 5: Multivariate Methods in HEP


The Top Quark: Post-Evidence, Pre-Discovery!

[Figure: Fisher analysis of the tt e channel. One candidate event; S/B (mt = 180 GeV) = 18 w.r.t. Z, = 10 w.r.t. WW]

[Figure: NN analysis of the tt e+jets channel. Plot labels: tt, W+jets, tt160, Data]

P. Bhat, DPF94

Page 6: Multivariate Methods in HEP


Cut Optimization for Top Discovery, Feb. ’95

[Figure: signal and background distributions with contours of possible NN cuts (Feb. ’95), the Jan. ’95 (Aspen) cut and the Mar. ’95 discovery cut]

[Figure: S/B versus signal efficiency, showing the Jan. ’95 Aspen meeting conventional cut, the Feb-Mar ’95 discovery conventional cut, and the S/B reach with the 2-variable NN analysis for similar efficiency]

Neural network equi-probability contour cuts from a 2-variable analysis compared with the conventional cuts used in Jan. ’95 and in the Observation paper

P. Bhat, H. Prosper, E. Amidi, D0 Top Marathon, Feb. ’95

Page 7: Multivariate Methods in HEP


Measurement of the Top Quark Mass

DØ Lepton+jets

[Figure panels: discriminant variables; the discriminants]

Fit performed in 2-D: (DLB/NN, mfit)

Run I (1996) result with NN and likelihood: mt = 173.3 ± 5.6 (stat.) ± 6.2 (syst.) GeV/c²

First significant physics result using multivariate methods

Recent (CDF+D0) mt measurement: mt = 171.4 ± 2.1 GeV/c²

Page 8: Multivariate Methods in HEP


Higgs, the Holy Grail of HEP
Discovery Reach at the Tevatron

• The challenges are daunting! But using NN provides the same reach with a factor of 2 less luminosity w.r.t. conventional analysis

• Improved bb mass resolution & b-tag efficiency crucial

Run II Higgs study hep-ph/0010338 (Oct-2000)
P.C. Bhat, R. Gilmartin, H. Prosper, Phys. Rev. D 62 (2000) 074022

Page 9: Multivariate Methods in HEP


Then, it got easier

• One of the important steps in getting the NN accepted at the Tevatron experiments was to make the Bayesian connection.

• Another important message to drive home was “the maximal use of information in the event” for the job at hand

• Developed a random grid search technique that can be used as baseline for comparison

• Neural network methods now have become popular due to the ease of use, power and many successful applications

Maybe too easy??

Page 10: Multivariate Methods in HEP


Optimal Event Selection

[Figure: feature space (x, y) with signal (S) and background (B) events, comparing conventional cuts at x0 and y0 with a contour of constant r(x,y)]

r(x,y) = constant defines an optimal decision boundary

$$ r(x,y) = \frac{p(x,y|s)\,p(s)}{p(x,y|b)\,p(b)} = \frac{p(s|x,y)}{p(b|x,y)} $$

Page 11: Multivariate Methods in HEP


The NN-Bayesian Connection

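Output of a feed-forward neural network can approximate the posterior probability P(s|x1,x2).

[Diagram: network with inputs x1, x2 and output y(x1, x2, ŵ)]

$$ y(x,\hat{w}) \approx p(s|x) = \frac{r}{1+r}, \qquad r = \frac{p(x|s)\,p(s)}{p(x|b)\,p(b)} $$

$$ P(C_1|x) = \frac{P(x|C_1)\,P(C_1)}{\sum_i P(x|C_i)\,P(C_i)} $$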
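The relation between the ratio r and the posterior probability can be checked numerically. The sketch below is only an illustration: it assumes one-dimensional Gaussian densities for signal and background and equal priors, and the function names are hypothetical, not taken from the talk.

```python
import math

def gaussian_pdf(x, mean, sigma):
    """Normal density, used here as a stand-in for p(x|s) and p(x|b)."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def bayes_ratio(x, p_s=0.5, p_b=0.5):
    """r(x) = p(x|s) p(s) / (p(x|b) p(b)) for the assumed toy densities."""
    return (gaussian_pdf(x, 1.0, 1.0) * p_s) / (gaussian_pdf(x, -1.0, 1.0) * p_b)

def posterior_signal(x, p_s=0.5, p_b=0.5):
    """p(s|x) from Bayes' theorem, summing over the two classes."""
    num = gaussian_pdf(x, 1.0, 1.0) * p_s
    den = num + gaussian_pdf(x, -1.0, 1.0) * p_b
    return num / den

# r/(1+r) and the directly computed posterior should agree at every x.
for x in (-2.0, 0.0, 2.0):
    r = bayes_ratio(x)
    print(x, r / (1.0 + r), posterior_signal(x))
```

A cut on the network output y is therefore equivalent to a cut on r, since r/(1+r) increases monotonically with r.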

Page 12: Multivariate Methods in HEP


Limitations of “Conventional NN”

• The training yields one set of weights or network parameters
  • Need to look for “best” network, but avoid overfitting
• Heuristic decisions on network architecture
  • Inputs, number of hidden nodes, etc.
• No direct way to compute uncertainties

Page 13: Multivariate Methods in HEP


Ensembles of Networks

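[Diagram: input X fed to M networks NN1, NN2, NN3, ..., NNM with outputs y1, y2, y3, ..., yM]

$$ y(x) = \sum_i a_i\, y_i(x) $$

A decision made by averaging over many networks (a committee of networks) has an error no larger than the average error of the individual networks, and is often better than any single network.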
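A minimal numerical sketch of committee averaging follows. It does not train real networks: each "network" is assumed to be the true function plus independent noise, and equal weights a_i = 1/M are an assumption chosen for simplicity.

```python
import random

random.seed(1)

def truth(x):
    """Hypothetical target function the networks try to approximate."""
    return x * x

# Stand-ins for M trained networks: each member is the truth plus its own noise.
M = 10
xs = [i / 20.0 for i in range(21)]
members = [[truth(x) + random.gauss(0.0, 0.2) for x in xs] for _ in range(M)]

def mse(pred):
    """Mean squared error of a list of predictions against the truth."""
    return sum((p - truth(x)) ** 2 for p, x in zip(pred, xs)) / len(xs)

# Equal weights a_i = 1/M in y(x) = sum_i a_i y_i(x).
committee = [sum(m[j] for m in members) / M for j in range(len(xs))]

member_errors = [mse(m) for m in members]
average_member_error = sum(member_errors) / M
committee_error = mse(committee)
print(committee_error, average_member_error)
```

With equal weights and a squared-error measure, the committee error cannot exceed the average member error; how much it gains depends on how correlated the members' mistakes are.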

Page 14: Multivariate Methods in HEP


Bayesian Learning

• The result of Bayesian training is a posterior density of the network weights, P(w|training data)

• Generate a sequence of weights (network parameters) in the network parameter space, i.e., a sequence of networks. The optimal network is approximated by averaging over the last K points:

$$ \bar{y}(x_{\rm new}) = \frac{1}{K} \sum_{k=1}^{K} y(x_{\rm new}, w_k) $$
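The averaging over the last K points can be sketched with a toy model. Everything specific here is an assumption made for illustration: the "network" is a single sigmoid unit, the prior is a broad Gaussian, and the sampler is a plain random-walk Metropolis algorithm rather than whatever sampler the analyses in the talk used.

```python
import math
import random

random.seed(7)

# Toy training data (assumed): signal (t=1) near x=+1, background (t=0) near x=-1.
data = [(random.gauss(1.0, 1.0), 1) for _ in range(100)] + \
       [(random.gauss(-1.0, 1.0), 0) for _ in range(100)]

def y(x, w):
    """Smallest possible 'network': one sigmoid unit with weights w = (w0, w1)."""
    return 1.0 / (1.0 + math.exp(-(w[0] + w[1] * x)))

def log_posterior(w):
    """log P(w|data) up to a constant: Bernoulli likelihood plus Gaussian prior."""
    eps = 1e-12
    ll = 0.0
    for x, t in data:
        p = min(max(y(x, w), eps), 1.0 - eps)
        ll += t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return ll - 0.5 * (w[0] ** 2 + w[1] ** 2) / 10.0 ** 2

def metropolis(n_steps, step=0.3):
    """Random-walk Metropolis over the weights; returns every visited point."""
    w = [0.0, 0.0]
    lp = log_posterior(w)
    chain = []
    for _ in range(n_steps):
        proposal = [wi + random.gauss(0.0, step) for wi in w]
        lp_new = log_posterior(proposal)
        if random.random() < math.exp(min(0.0, lp_new - lp)):
            w, lp = proposal, lp_new
        chain.append(list(w))
    return chain

chain = metropolis(500)
K = 100
last_k = chain[-K:]  # discard the early part of the chain, keep the last K points

def y_bar(x_new):
    """Average of y(x_new, w_k) over the last K points of the chain."""
    return sum(y(x_new, w) for w in last_k) / K

print(y_bar(2.0), y_bar(-2.0))
```

The spread of y(x_new, w_k) over the retained points also gives a handle on the uncertainty of the prediction, which a single trained network does not provide.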

Page 15: Multivariate Methods in HEP


Bayesian Learning – 2

• Advantages
  • Less prone to over-fitting
  • Less need to optimize the size of the network. Can use a large network! Indeed, number of weights can be greater than number of training events!
  • In principle, provides best estimate of p(t|x)

• Disadvantages
  • Computationally demanding!
    • The dimensionality of the parameter space is, typically, large
    • There could be multiple maxima in the likelihood function p(t|x,w), or, equivalently, multiple minima in the error function E(x,w).

Page 16: Multivariate Methods in HEP


Example: Single Top Search

• Training Data
  • 2000 events (1000 tqb + 1000 Wbb)
  • Standard set of 11 variables

• Network
  • (11, 30, 1) Network (391 parameters!)

• Markov Chain Monte Carlo (MCMC)
  • 500 iterations, but use last 100 iterations
  • 20 MCMC steps per iteration
  • NN-parameters stored after each iteration
  • 10,000 steps
  • ~ 1000 steps / hour (on 1 GHz, Pentium III laptop)
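The numbers on this slide can be cross-checked with a few lines of arithmetic. The sketch assumes a fully connected network with one bias per hidden and output node, which reproduces the quoted parameter count; the function name is hypothetical.

```python
def n_parameters(n_inputs, n_hidden, n_outputs):
    """Weights plus biases of a fully connected network with one hidden layer."""
    hidden = n_inputs * n_hidden + n_hidden    # input-to-hidden weights and hidden biases
    output = n_hidden * n_outputs + n_outputs  # hidden-to-output weights and output biases
    return hidden + output

iterations = 500
steps_per_iteration = 20
total_steps = iterations * steps_per_iteration
hours = total_steps / 1000.0  # at the quoted ~1000 steps per hour

print(n_parameters(11, 30, 1), total_steps, hours)
```

This gives 391 parameters for the (11, 30, 1) network and 10,000 MCMC steps, or roughly ten hours on the laptop quoted above.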

Page 17: Multivariate Methods in HEP


Example: Single Top Search

[Figure: distributions for signal (tqb) and background (Wbb)]

Page 18: Multivariate Methods in HEP


Page 19: Multivariate Methods in HEP


Decision Trees

• Recover events that fail criteria in cut-based analyses
• Start at first “node” with a fraction of the “training sample”
• Select best variable and cut with best separation to produce two “branches” of events, (F)ailed and (P)assed cut
• Repeat recursively on successive nodes
• Stop when improvement stops or when too few events are left
• Terminal node is called a “leaf” with purity = Ns/(Ns+Nb)
• Run remaining events and data through the tree to derive results
• Boosting DT:
  • Boosting is a recently developed technique that improves any weak classifier (decision tree, neural network, etc)
  • Boosting averages the results of many trees, dilutes the discrete nature of the output, improves the performance

[Figure: DØ single top analysis]
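The node-splitting step described above can be sketched for a single variable. The slide does not say which separation measure is used, so the Gini index here is an assumption, and the toy sample and function names are hypothetical.

```python
def purity(events):
    """Leaf purity Ns / (Ns + Nb); events are (value, is_signal) pairs."""
    n_s = sum(1 for _, is_signal in events if is_signal)
    return n_s / len(events) if events else 0.0

def gini(events):
    """Gini index p(1-p), weighted by the number of events in the node."""
    p = purity(events)
    return len(events) * p * (1.0 - p)

def best_cut(events):
    """Scan candidate cuts on one variable; keep the one with the lowest summed Gini."""
    values = sorted({v for v, _ in events})
    best = None
    for lo, hi in zip(values, values[1:]):
        cut = 0.5 * (lo + hi)
        failed = [e for e in events if e[0] < cut]
        passed = [e for e in events if e[0] >= cut]
        score = gini(failed) + gini(passed)
        if best is None or score < best[0]:
            best = (score, cut, failed, passed)
    return best

# Toy training sample (assumed): background at low values, signal at high values.
sample = [(0.1, False), (0.3, False), (0.4, True), (0.5, False),
          (0.7, True), (0.8, True), (0.9, True)]
score, cut, failed, passed = best_cut(sample)
print(cut, purity(failed), purity(passed))
```

A full tree would repeat `best_cut` over all variables and recurse on the (F)ailed and (P)assed branches until a stopping rule is met; the signal event in the failed branch is the kind of event a deeper tree can recover.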

Page 20: Multivariate Methods in HEP


Matrix Element Method
Example: Top mass measurement

• Maximal use of information in each event by calculating event-by-event signal and background probabilities based on the respective matrix element

x: reconstructed kinematic variables of final state objects
JES: jet energy scale from Mw constraint

• Signal and background probabilities from differential cross sections

• Write combined likelihood for all events

• Maximize likelihood w.r.t. mtop, JES
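The formulas on this slide did not survive extraction. As a hedged sketch of the general shape such a likelihood usually takes (the signal fraction $f$, the event count $N$, and the exact arguments of each probability are assumptions, not taken from the talk):

$$ P_{\rm evt}(x; m_{\rm top}, JES) = f\,P_{\rm sig}(x; m_{\rm top}, JES) + (1-f)\,P_{\rm bkg}(x; JES) $$

$$ L(m_{\rm top}, JES) = \prod_{i=1}^{N} P_{\rm evt}(x_i; m_{\rm top}, JES) $$

The estimates of mtop and JES are then the values that maximize $L$, or equivalently minimize $-\ln L$.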

Page 21: Multivariate Methods in HEP


Summary

• Multivariate methods are now used extensively in HEP data analysis

• Neural networks, because of their ease of use and power, are favorites for particle-ID and signal/background discrimination

• Bayesian neural networks take us one step closer to optimization

• Likelihood discriminants and Decision trees are becoming popular because they are easier to “defend” (no “black-box” stigma)

• Many issues remain to be addressed as we get ready to deploy the multivariate methods for discoveries in HEP

Page 22: Multivariate Methods in HEP


Nothing tends so much to the advancement of knowledge as the application of a new instrument - Humphry Davy

No amount of experimentation can ever prove me right; a single experiment can prove me wrong. - Albert Einstein

Page 23: Multivariate Methods in HEP


World’s Highest Energy Laboratory (for now)

[Figure: Fermilab accelerator complex with labels CDF, DØ and Booster]

Page 24: Multivariate Methods in HEP


Our Fancy New Toys

The Large Hadron Collider

Circumference = 27 km
Beam Energy = 7.7 TeV
Luminosity = 1.65×10^34 cm^-2 sec^-1
Startup date: 2007

[Figure: map of the pp collider complex with labels LHC Ring, SPS Ring, PS, TI 2, TI 8 and CMS; photos labeled LHC Magnet and LHC Tunnel]

Page 25: Multivariate Methods in HEP


LHC Environment

14 TeV Proton Proton colliding beams

Parameter                               Value
Bunch-crossing frequency                40 MHz
Average # of collisions / crossing      20
“Interaction rate”                      ~10^9
Average # of charged tracks             1000
Radiation field                         severe

CMS Parameter                           Value
Level-1 trigger rate                    100 kHz
Mean time between triggers              10 μsec
Trigger latency                         3.2 μsec
Solenoid field                          4 T
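Several entries in these tables follow from one another, which the short calculation below makes explicit. The variable names are illustrative; the input values are the ones quoted in the tables.

```python
bunch_crossing_hz = 40e6        # 40 MHz bunch-crossing frequency
collisions_per_crossing = 20    # average number of collisions per crossing
level1_rate_hz = 100e3          # 100 kHz Level-1 trigger rate

# Interactions per second: crossings per second times collisions per crossing.
interaction_rate_hz = bunch_crossing_hz * collisions_per_crossing

# Mean spacing between accepted Level-1 triggers, in microseconds.
mean_time_between_triggers_us = 1e6 / level1_rate_hz

# Fraction of crossings the Level-1 trigger must reject, expressed as a factor.
level1_rejection = bunch_crossing_hz / level1_rate_hz

print(interaction_rate_hz, mean_time_between_triggers_us, level1_rejection)
```

The product 40 MHz × 20 is 8×10^8 per second, consistent with the quoted ~10^9, and a 100 kHz trigger rate corresponds to a mean spacing of 10 μsec and a Level-1 reduction by a factor of 400 in crossing rate.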

Page 26: Multivariate Methods in HEP


CMS Silicon Tracker

Challenges

Page 27: Multivariate Methods in HEP


CMS Si Tracker

[Figure: tracker layout with dimension labels 5.4 m and 2.4 m; components labeled Pixels, Inner Barrel & Disks (TIB & TID), Outer Barrel (TOB)]

Page 28: Multivariate Methods in HEP


Lots of Silicon

214 m² of silicon sensors
11.4 million silicon strips
66 million pixels!

Page 29: Multivariate Methods in HEP


Si Tracker Challenges

• Large and complex system
  • 77.4 million total channels (out of a total of 78.2 M for experiment)
  • Detector monitoring, data organization, data quality monitoring, analysis, visualization, interpretation all daunting!

• Need to monitor every channel and make sure most of the detector is working at all times (live fraction of the detector and efficiencies bound to decrease with time)

• Need to verify data integrity and data quality for physics
• Diagnose and fix problems ASAP
• Keep calibration and alignment parameters current

Page 30: Multivariate Methods in HEP


Detector/Data Monitoring

• Monitor
  • Environmental variables
    • Temperatures, coolant flow rates, interlocks, radiation doses
  • Hardware status
    • Voltages, currents
  • Channel data
    • Readout states, errors, missing data/channels, bad ID for channel/module
    • Many kinds of problems, to be categorized, tracked and displayed
    • Should be able to find rare problems/errors (with low occurrence rate) that may corrupt data (rare problems may indicate a developing failure mode or hidden bad behavior)

• Correlate problem/noisy channels with history, temperature, currents, etc.

Page 31: Multivariate Methods in HEP


Data Quality Monitoring

• Monitor
  • Raw Data
    • Pedestals, noise, ADC counts, occupancies, efficiencies
  • Processed high level objects
    • Clusters, tracks, etc.
• Evaluate thousands of histograms
  • Can’t visually examine all
  • Automatically evaluate histograms by comparing to reference histograms
  • Adaptive, efficient, find evolving patterns over time
  • Quantiles? q-q plots/comparison instead of KS test?
• A variety of 2D “heat” maps
  • Occupancies, # of bad channels/module, # of errors/module, etc.
• Typical occupancy ~ 2% in strip tracker
  • 200,000 channels written out 100 times/sec
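The two comparison options raised above, a KS test and a quantile (q-q) comparison, can be sketched as follows. This is a simplified illustration that works on unbinned samples rather than histograms; the function names and the toy "drifted" sample are hypothetical.

```python
def ks_statistic(sample, reference):
    """Two-sample Kolmogorov-Smirnov statistic: max distance between empirical CDFs."""
    a, b = sorted(sample), sorted(reference)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

def quantile(sorted_values, q):
    """Quantile by linear interpolation between order statistics."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])

def qq_points(sample, reference, qs=(0.1, 0.25, 0.5, 0.75, 0.9)):
    """Pairs (reference quantile, sample quantile); a q-q plot draws these points."""
    a, b = sorted(sample), sorted(reference)
    return [(quantile(b, q), quantile(a, q)) for q in qs]

reference = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
shifted = [v + 0.5 for v in reference]   # hypothetical drifted distribution
print(ks_statistic(reference, reference), ks_statistic(shifted, reference))
print(qq_points(shifted, reference))
```

The KS statistic compresses the comparison into one number, while the q-q points show where along the distribution the sample departs from the reference, which may be more useful for spotting a slowly evolving pattern.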

Page 32: Multivariate Methods in HEP


Module Assembly Precision

Example of a “Heat” map

Page 33: Multivariate Methods in HEP


Need smart approaches

• What are the best techniques for data-mining?
  • To organize data for analysis and data visualization
    • Complex geometry/addressing makes visualization difficult
  • For finding problematic channels quickly, efficiently
    • Clustering, exploratory data-mining
  • For finding anomalies, corrupt data, patterns of behavior
    • Feature-finding algorithms, superpose many events, time evolution, spatial and temporal correlations

• Noise Correlations
  • Via correlation coefficients of defined groups
  • Correlate to history (time variations), environmental variables
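A correlation coefficient between the noise of two channels (or two groups of channels) can be computed as in the sketch below. The channel names and noise values are invented for illustration only.

```python
import math

def correlation(u, v):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)

# Hypothetical noise readings (ADC counts) for three channels over six runs.
noise = {
    "ch_a": [2.0, 2.1, 2.4, 2.2, 2.6, 2.5],
    "ch_b": [4.0, 4.2, 4.8, 4.4, 5.2, 5.0],   # tracks ch_a: common-mode behavior
    "ch_c": [3.0, 2.9, 3.1, 3.0, 2.9, 3.1],
}
names = sorted(noise)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(a, b, round(correlation(noise[a], noise[b]), 3))
```

The same function applies unchanged when one of the sequences is an environmental variable such as a temperature history instead of another channel.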

Page 34: Multivariate Methods in HEP


Data Visualization

• Based on hierarchical/geometrical structure of the tracker
• Display every channel, attach objects/info to each
  • Sub-structures
  • Layers/rings
  • Modules
  • Readout Chips

Page 35: Multivariate Methods in HEP


Multivariate Analysis Issues

• Dimensionality Reduction
  • Choosing variables optimally without losing information
• Choosing the right method for the problem
• Controlling Model Complexity
• Testing Convergence
• Validation
  • Given a limited sample what is the best way?
• Computational Efficiency

Page 36: Multivariate Methods in HEP


Multivariate Analysis Issues

• Correctness of modeling
  • How do we make sure the multivariate modeling is correct?
    • The data used for training or building PDEs represent reality. Is it sufficient to check the modeling in the mapped variable? Pair-wise correlations? Higher order correlations?
  • How do we show that the background is modeled well? How do we quantify the correctness of modeling?
    • In conventional analysis, we normally look for variables that are well modeled in order to apply cuts
  • How well is the background modeled in the signal region?

• Worries about hidden bias
• Worries about underestimating errors

Page 37: Multivariate Methods in HEP


Sociological Issues

• We have been conservative in the use of MV methods for discovery.

• We have been more aggressive in the use of MV methods for setting limits.

• But discovery is more important and needs all the power you can muster!

• This is expected to change at LHC.

Page 38: Multivariate Methods in HEP


Summary

• The next generation of experiments will need to adopt advanced data mining and data analysis techniques

• Conventional/routine tasks such as alignment, detector performance and data quality monitoring and data visualization will be challenging and require new approaches

• Many issues regarding use of multivariate methods of data analysis for discoveries and measurements need to be addressed to make optimal use of data

Page 39: Multivariate Methods in HEP


MV: Where can we use them?

• Almost everywhere since HEP events are multivariate
• Improve several aspects of analysis
  • Event selection
    • Triggering, Real-time Filters, Data Streaming
  • Event reconstruction
    • Tracking/vertexing, particle ID
  • Signal/Background Discrimination
    • Higgs discovery, SUSY discovery, Single top, …
  • Functional Approximation
    • Jet energy corrections, tag rates, fake rates
  • Parameter estimation
    • Top quark mass, Higgs mass, SUSY model parameters
  • Data Exploration
    • Knowledge Discovery via data-mining
    • Data-driven extraction of information, latent structure analysis