
Page 1: (Fast) Machine Learning at the LHC · 09/10/19 Fast ML, ML at the LHC, J.-R. Vlimant 31 ML in Simulation Non-differentiable simulators have large number of parameters Machine learning

(Fast) Machine Learning at the LHC

(or rather at the experiments at the LHC)

Jean-Roch Vlimant, [email protected], @vlimant


Outline

I. Physics at the LHC
II. The Case for Machine Learning
III. Applying ML at the LHC
IV. Fast ML at the LHC


09/10/19 Fast ML, ML at the LHC, J.-R. Vlimant 3

High Energy Physics Endeavor

In a nutshell


Big Science Pipeline

● Large Hadron Collider: 40 MHz of collisions
● CMS Detector: 1 PB/s
● CMS L1 & High-Level Triggers: 50k cores, 1 kHz
● CERN Tier-0 Computing Center: 20k cores dedicated
● CERN Tier-0/Tier-1 Tape Storage: 200 PB total
● LHC Computing Grid: 200k cores pledged to CMS over ~100 sites, remote access to 100 PB of data

Rare signal measurement: ~1 out of 10^6


The Large Hadron Collider

26.7 km accelerator (8.5 km in diameter) colliding 6.5 TeV proton beams. The beams are prepared by the LINAC2, PSB, PS and SPS.

Geneva, Switzerland


Colliding Hadrons

Probing the fundamental laws of physics, since a large spectrum of particles (known and unknown) can be produced.


The Standard Model

A well-demonstrated effective model: we can predict most of the observations, and we can rely on a large amount of simulation.


Simulating Hadron Collisions

Event generator (MadGraph, Pythia, Sherpa, ...): computes predictions of the Standard Model to several orders of the expansion in coupling constants (LO, NLO, NNLO, ...) using parton distribution functions.

Hadronization (Pythia, ...): phenomenological model of how quarks and gluons evolve into hadrons under the effect of QCD.

Material simulator (Geant4, GeantV): transports all particles through meters of detector, using a high-resolution geometrical description of the materials.

Electronic emulator (homegrown software): converts simulated energy deposits in sensitive material into the expected electronic signal, including noise from the detector.

A non-differentiable sequence of complex simulators of the signal expected from the detectors.


Size Of The Challenge

1 event every 500,000 proton collisions

Low probability of producing exotic and interesting signals. Observe rare events from a large amount of data.



CMS Detector


CMS 100 Megapixel Camera


CMS Readout

Highly heterogeneous system. The raw data is 100M channels sampled every 25 ns: ~1 PB/s, or ~50 EB per day in readout and online processing.


Event Filtering

Ultra-fast decision to keep the relevant data, in hardware and software:

40 MHz → L1 (hardware) → 10^5 Hz, 1000 Gb/s → HLT (software) → 1-3 kHz
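The filtering chain is a cascade of rejection factors. The short sketch below just does the arithmetic on the rates quoted on this slide. It takes the lower end (1 kHz) of the 1-3 kHz HLT output and reuses the 50k HLT cores from the Big Science Pipeline slide. The per-event time budget is a rough assumption: one event per core at a time, with no overheads.

```python
# Rates read off the Event Filtering slide: 40 MHz of collisions, ~10^5 Hz after L1,
# 1-3 kHz after the HLT (the lower end, 1 kHz, is used here).
STAGES = [
    ("collisions", 40e6),
    ("L1 accept", 1e5),
    ("HLT accept", 1e3),
]


def rejection_factors(stages):
    """Rejection factor of each stage relative to the stage before it."""
    return [
        (name, prev_rate / rate)
        for (_, prev_rate), (name, rate) in zip(stages, stages[1:])
    ]


def overall_rejection(stages):
    """Input rate divided by the final output rate."""
    return stages[0][1] / stages[-1][1]


def hlt_time_budget_s(cores, input_rate_hz):
    """Average wall time per event if every HLT core handles one event at a time (rough model)."""
    return cores / input_rate_hz


if __name__ == "__main__":
    for name, factor in rejection_factors(STAGES):
        print(f"{name}: keeps 1 event in {factor:.0f}")
    print(f"overall: keeps 1 event in {overall_rejection(STAGES):.0f}")
    # 50k cores is the HLT farm size quoted on the Big Science Pipeline slide.
    print(f"HLT budget: {hlt_time_budget_s(50_000, 1e5):.2f} s per event")
```

Under these assumptions, L1 keeps 1 event in 400 and the HLT keeps 1 in 100, so overall 1 in 40,000 survives. The HLT then has roughly half a second of CPU time per event.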


Looking for the Unknown

Searches for new Physics (SUSY, BSM, ...) scan over a large number of potential signal models:

➢ We don't know exactly what to expect
➢ Tedious search procedure
➢ How to trigger on the unknown?

https://twiki.cern.ch/twiki/bin/view/CMSPublic/PhysicsResultsSUS


Computing Grid

● Tier-0: CERN Computer Center
● Tier-1: IN2P3, INFN, FNAL, ...
● Tier-2 and Tier-3 sites
● Network links: 10–40 to 100 Gb/s, 300–1500 MB/s, 10 to N×10 Gb/s

● Hundreds of computer centers (100-10k cores per site)
● Increased use as a cloud resource (any job anywhere)
● Increasing use of additional cloud and HPC resources
● Real-time data processing at Tier-0
● Data and simulation production at Tier-1 and Tier-2
● High-bandwidth networks between disk storage sites


Take home message :

Measure rare, exotic and unknown processes on top of backgrounds that are orders of magnitude larger.

The Standard Model predicts with precision what to expect from many processes.

Reconstruct, identify and reject large numbers of events within resource constraints.

Operate a complex ensemble of complex systems, from the LHC to analysis.


High Energy Physics Data Representation

With bias on CMS


A Journey Through Matter

Particles leave hints of their passage in the sub-detectors, with a specific (but overlapping) pattern for each particle type.


What is a Jet

Quarks and gluons hadronize as they propagate. Any particle decaying into quarks or gluons results in a “jet” of particles in the direction of the original particle. Ambiguities about the original particle get worse in boosted systems.
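The slides do not name a clustering algorithm. As an illustration, here is a naive O(N^3) version of the widely used anti-kt algorithm, with distances d_ij = min(pt_i^-2, pt_j^-2) ΔR_ij^2 / R^2 and beam distance d_iB = pt_i^-2. Production code uses FastJet. The pt-weighted recombination in `merge` is a simplification of the usual four-vector sum, and the particle list is hypothetical.

```python
import math


def wrap_phi(dphi):
    """Map an azimuthal difference into [-pi, pi)."""
    return (dphi + math.pi) % (2 * math.pi) - math.pi


def delta_r2(a, b):
    """Squared distance in the (rapidity, azimuth) plane."""
    return (a["y"] - b["y"]) ** 2 + wrap_phi(a["phi"] - b["phi"]) ** 2


def merge(a, b):
    """pt-weighted recombination: a simplification of the four-vector (E-scheme) sum."""
    pt = a["pt"] + b["pt"]
    y = (a["pt"] * a["y"] + b["pt"] * b["y"]) / pt
    phi = wrap_phi(a["phi"] + b["pt"] * wrap_phi(b["phi"] - a["phi"]) / pt)
    return {"pt": pt, "y": y, "phi": phi}


def anti_kt(particles, R=0.4):
    """Naive O(N^3) anti-kt clustering; returns jets sorted by decreasing pt."""
    objs = [dict(p) for p in particles]
    jets = []
    while objs:
        # Beam distance d_iB = pt_i^-2 is smallest for the hardest object.
        i_beam = min(range(len(objs)), key=lambda k: objs[k]["pt"] ** -2)
        best_pair, best_d = None, objs[i_beam]["pt"] ** -2
        # Pair distance d_ij = min(pt_i^-2, pt_j^-2) * dR_ij^2 / R^2.
        for i in range(len(objs)):
            for j in range(i + 1, len(objs)):
                soft = min(objs[i]["pt"] ** -2, objs[j]["pt"] ** -2)
                d = soft * delta_r2(objs[i], objs[j]) / R**2
                if d < best_d:
                    best_pair, best_d = (i, j), d
        if best_pair is None:
            jets.append(objs.pop(i_beam))  # nothing closer than the beam: final jet
        else:
            i, j = best_pair
            merged = merge(objs[i], objs[j])
            objs.pop(j)  # j > i, so remove j first to keep index i valid
            objs.pop(i)
            objs.append(merged)
    return sorted(jets, key=lambda jet: -jet["pt"])


if __name__ == "__main__":
    particles = [
        {"pt": 50.0, "y": 0.0, "phi": 0.0},
        {"pt": 10.0, "y": 0.1, "phi": 0.05},
        {"pt": 5.0, "y": -0.05, "phi": 0.1},
        {"pt": 30.0, "y": 1.5, "phi": 2.0},
        {"pt": 8.0, "y": 1.6, "phi": 2.1},
    ]
    for jet in anti_kt(particles):
        print(round(jet["pt"], 3), round(jet["y"], 3), round(jet["phi"], 3))
```

Because soft particles have large pt^-2 and the hardest ones small pt^-2, hard particles absorb their soft neighbours first. This is what gives anti-kt jets their regular, cone-like shape.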


From RAW to High-Level Data

Detector data → Local reconstruction → Particle representation → Jet clustering → High-level features

The reconstruction of an event goes from the digital signal of the individual sub-detectors to a sequence of particles, jets, and high-level features.

Event processing: dimensionality reduction and globalization of information.


What is an Event

One collision event (bunch crossing) every 25 ns, i.e. 40 MHz. Currently about 40 collisions are overlaid on top of each other in each event; up to 200 are expected on the 2025 horizon.
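The 40 and 200 figures are averages. The number of pp interactions in a given crossing fluctuates around them. A minimal sketch, assuming the common model in which this number is Poisson-distributed:

```python
import numpy as np


def sample_pileup(mean_pu, n_crossings, seed=0):
    """Number of pp interactions in each bunch crossing, modeled as Poisson(mean_pu)."""
    rng = np.random.default_rng(seed)
    return rng.poisson(mean_pu, size=n_crossings)


if __name__ == "__main__":
    # 40: the current average quoted above; 200: the 2025-horizon figure.
    for mean_pu in (40, 200):
        n = sample_pileup(mean_pu, 100_000)
        print(f"<PU>={mean_pu}: mean {n.mean():.1f}, std {n.std():.1f}, range [{n.min()}, {n.max()}]")
```

Under the Poisson assumption, the spread grows like the square root of the mean: about 6 interactions at an average of 40, and about 14 at an average of 200.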


Take home message :

Complex detector geometry, in which each particle leaves a characteristic pattern of energy.

Up to 200 overlapping collisions (on average) in a single event snapshot.

Multiple levels of data representation.

Event reconstruction consists mostly of pattern-recognition tasks.


The Case for Machine Learning


Operation Vectorization

ANN ≡ matrix operations ≡ parallelizable

Computing predictions from an artificial neural network model can be vectorized to a large extent.

Multiple computing architectures are well suited for inference (GPU, TPU, FPGA, ...).
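To illustrate "ANN ≡ matrix operations", the sketch below evaluates the same small fully connected network in two ways. The first evaluates a whole batch with one matrix multiply per layer, which is what maps well onto GPUs, TPUs and FPGAs. The second is a reference that loops over events and neurons. Layer sizes and weights are hypothetical and random.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def mlp_forward_batched(x, weights, biases):
    """Forward pass for a whole batch: each layer is a single matrix multiply."""
    h = x
    for k, (w, b) in enumerate(zip(weights, biases)):
        h = h @ w + b
        if k < len(weights) - 1:  # no activation on the output layer
            h = relu(h)
    return h


def mlp_forward_loop(x, weights, biases):
    """Same network, one event and one neuron at a time (slow reference)."""
    out = []
    for event in x:
        h = list(event)
        for k, (w, b) in enumerate(zip(weights, biases)):
            new_h = []
            for j in range(w.shape[1]):
                s = b[j] + sum(h[i] * w[i, j] for i in range(w.shape[0]))
                new_h.append(max(s, 0.0) if k < len(weights) - 1 else s)
            h = new_h
        out.append(h)
    return np.array(out)


rng = np.random.default_rng(1)
sizes = [16, 32, 32, 1]  # hypothetical layer sizes
weights = [rng.normal(size=(m, n)) / np.sqrt(m) for m, n in zip(sizes, sizes[1:])]
biases = [rng.normal(size=n) for n in sizes[1:]]
x = rng.normal(size=(64, 16))  # a batch of 64 "events" with 16 features each
```

Both functions return the same (64, 1) array. Only the batched one exposes the parallelism that accelerators exploit.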


Learning from Complexity

“Simple” machine learning models can extract information from complex datasets.

Their classical algorithmic counterparts may take years to develop.


Physics Knowledge

Machine Learning can help us understand Physics. We can make better models with Physics.

P. Komiske, E. Metodiev, J. Thaler, https://arxiv.org/abs/1810.05165
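The work cited above (arXiv:1810.05165) introduces Energy Flow Networks. As we understand them, they build a jet observable as F(Σ_i z_i Φ(y_i, φ_i)), where z_i is particle i's momentum fraction. A minimal sketch with random, untrained weights standing in for the learned Φ and F shows how this physics-motivated structure builds symmetries into the model. It is invariant under particle reordering, unchanged by adding a zero-momentum particle, and unchanged by a collinear split.

```python
import numpy as np

rng = np.random.default_rng(2)
LATENT = 8
# Hypothetical, untrained weights standing in for the learned networks Phi and F.
W_PHI = rng.normal(size=(2, LATENT))
B_PHI = rng.normal(size=LATENT)
W_F = rng.normal(size=(LATENT, 1))


def phi_net(angles):
    """Per-particle map Phi: (rapidity, azimuth) -> latent vector."""
    return np.tanh(angles @ W_PHI + B_PHI)


def f_net(latent):
    """Map F from the summed latent vector to a single jet-level output."""
    return (latent @ W_F).item()


def efn(pt, angles):
    """F(sum_i z_i * Phi(angles_i)) with z_i = pt_i / sum(pt): permutation invariant and linear in z."""
    z = pt / pt.sum()
    return f_net((z[:, None] * phi_net(angles)).sum(axis=0))
```

Because the particles enter only through a z-weighted sum, soft particles (z → 0) drop out, and two collinear particles act exactly like their sum. These are the infrared and collinear safety properties a physicist expects of a jet observable.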


Possible Utilizations

Accuracy, Speed, Interpretability

➔ Fast surrogate models (trigger, simulation, ...) for computing-restricted algorithms.
➔ Models more accurate than existing algorithms (tagging, ...).
➔ Models performing otherwise impossible tasks (operations, ...).


Take home message :

Machine learning offers the possibility of a better Physics-per-dollar ratio.

Increased sensitivity of algorithms by harnessing the complexity in the data.

Extracting Physics knowledge from data.


Machine Learning at the LHC

Only a few selected examples are flashed here; much more ongoing effort is not shown.


ML in Simulation

● Non-differentiable simulators have a large number of parameters
➢ Machine learning is used for tuning
➢ Machine learning can be used to infer posteriors over parameters

● Simulation is the second most computing-intensive task at the LHC experiments
➢ Lots of development towards using generative models as fast surrogates
➢ Potential for extreme speed-ups (>1000x) for parts of the simulation
➢ Wide range of simulated products: analysis-level features, particle sets, calorimeter, ...


Generative models

● Generative adversarial networks and variational auto-encoders are used to simulate EM calorimeter showers

● On the verge of reaching acceptable fidelity

https://indico.cern.ch/event/708041/contributions/3270775/

One of many ongoing works in simulation
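Of the two generative families named on this slide, the variational auto-encoder has a compact objective. The sketch below shows its two standard ingredients: the reparameterization trick used to sample the latent code, and the closed-form KL divergence to a unit Gaussian prior. It is not a shower model. The squared-error reconstruction term (a Gaussian decoder) and the `beta` weight are assumptions for illustration.

```python
import numpy as np


def reparameterize(mu, log_var, rng):
    """z = mu + sigma * eps with eps ~ N(0, 1): the sampling noise is moved outside the parameters."""
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, sigma^2) || N(0, 1)), summed over the latent dimensions."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)


def vae_loss(x, x_reco, mu, log_var, beta=1.0):
    """Per-sample loss: squared-error reconstruction + beta * KL (Gaussian decoder assumed)."""
    reco = np.sum((x - x_reco) ** 2, axis=-1)
    return reco + beta * kl_to_standard_normal(mu, log_var)
```

In a real model, `mu`, `log_var` and `x_reco` come from encoder and decoder networks, and the loss is minimized with automatic differentiation. Writing the sample as mu + sigma * eps is what lets gradients flow through the sampling step.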


Trigger

● Machine learning models are used to classify/trigger events in real time
➢ LHCb turbo-stream
➢ ML for event reconstruction

● Lots of potential applications of ML
➢ Improved background rejection
➢ Surrogate reconstruction algorithms
➢ New signal detection
➢ Decision control
➢ ...
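An ML trigger must respect a fixed output bandwidth. One simple way to do this, sketched here under the assumption that a calibration sample represents the incoming events, is to place the score cut at the quantile matching the allowed accept fraction. The beta-distributed scores are hypothetical. The 40 MHz to 10^5 Hz figures are the L1 numbers from the Event Filtering slide.

```python
import numpy as np


def threshold_for_rate(scores, input_rate_hz, output_rate_hz):
    """Score cut that keeps the fraction output_rate / input_rate of the calibration events."""
    keep_fraction = output_rate_hz / input_rate_hz
    return np.quantile(scores, 1.0 - keep_fraction)


rng = np.random.default_rng(3)
# Hypothetical classifier scores on an unbiased calibration sample (mostly background).
scores = rng.beta(2.0, 8.0, size=400_000)
# 40 MHz in, 10^5 Hz out: the L1 figures from the Event Filtering slide.
cut = threshold_for_rate(scores, input_rate_hz=40e6, output_rate_hz=1e5)
accepted_fraction = np.mean(scores > cut)
expected_rate_hz = accepted_fraction * 40e6
```

In practice, the score distribution drifts with pileup and detector conditions, so such a cut would need regular recalibration.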


New Physics Mining

● A variational auto-encoder learns a representation of the Standard Model processes

● It can potentially help identify and record unexpected events

https://arxiv.org/abs/1811.10276

One of many ongoing works in trigger
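The underlying idea is to learn what Standard Model events look like and flag the ones that are poorly reconstructed. The cited work uses a variational auto-encoder. For brevity, this sketch swaps in a linear auto-encoder, fitted exactly with an SVD (equivalent to PCA), and uses the reconstruction error as the anomaly score. The "SM-like" events and the odd event are synthetic and hypothetical.

```python
import numpy as np


def fit_linear_autoencoder(x_train, n_latent):
    """A linear auto-encoder's optimum spans the top principal components; fit it with an SVD."""
    mean = x_train.mean(axis=0)
    _, _, vt = np.linalg.svd(x_train - mean, full_matrices=False)
    return mean, vt[:n_latent]  # encoder = projection onto these rows


def anomaly_score(x, mean, components):
    """Reconstruction error: large when x is poorly described by the training data."""
    centered = x - mean
    reco = centered @ components.T @ components
    return np.sum((centered - reco) ** 2, axis=1)


rng = np.random.default_rng(4)
# Hypothetical "SM-like" events: 10 features living near a 3-dim subspace, plus noise.
basis = rng.normal(size=(3, 10))
sm_events = rng.normal(size=(5000, 3)) @ basis + 0.05 * rng.normal(size=(5000, 10))
mean, comps = fit_linear_autoencoder(sm_events, n_latent=3)
weird_event = 3.0 * rng.normal(size=(1, 10))  # does not follow the SM-like structure
```

A trigger built this way could record events whose score exceeds a threshold set on the SM-like score distribution. The threshold can be chosen, for example, to match an allowed rate.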


Reconstruction

● Machine learning is used in energy regression and classification at many levels
➢ Jet tagging as a showcase of deep learning
➢ Local reconstruction, particle ID, ...

● Development of deep learning applications taking advantage of the rawest, most complex data
➢ Local reconstruction (calorimeter, ...)
➢ Object identification (including jet tagging)
➢ Energy regression
➢ Tracking, vertexing, ...
➢ Energy clustering, particle flow, ...
➢ Noise cancelling, pileup mitigation, ...
➢ ...


Charged Particle Tracking

● Tracker hits form a graph, with edges built from simple geometrical constraints
● Graph neural networks and message-passing networks classify the good edges
● Promising approach on the TrackML dataset at 200 pileup (200PU)

https://arxiv.org/abs/1810.06111

One of many ongoing works in reconstruction
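The two steps on this slide can be sketched in a few lines: building the hit graph from geometric cuts, then one message-passing round that scores each edge. The cut values, node features, toy hits and random weights are all hypothetical. A real model is trained on truth labels and runs several iterations. This is not the architecture of the cited paper.

```python
import numpy as np


def wrap(dphi):
    """Map azimuthal differences into [-pi, pi)."""
    return (dphi + np.pi) % (2 * np.pi) - np.pi


def build_graph(layer, phi, z, max_dphi=0.2, max_dz=50.0):
    """Edges between hits on consecutive layers that pass simple geometric cuts (hypothetical values)."""
    edges = [
        (i, j)
        for i in range(len(layer))
        for j in range(len(layer))
        if layer[j] == layer[i] + 1
        and abs(wrap(phi[j] - phi[i])) < max_dphi
        and abs(z[j] - z[i]) < max_dz
    ]
    return np.array(edges, dtype=int).reshape(-1, 2)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def message_passing_step(node_feat, edges, w_edge, w_node):
    """Score every edge from its endpoints, then update nodes with score-weighted neighbour messages."""
    src, dst = edges[:, 0], edges[:, 1]
    edge_in = np.concatenate([node_feat[src], node_feat[dst]], axis=1)
    scores = sigmoid(edge_in @ w_edge).ravel()  # untrained "is a true track segment" score
    messages = np.zeros_like(node_feat)
    np.add.at(messages, dst, scores[:, None] * node_feat[src])
    np.add.at(messages, src, scores[:, None] * node_feat[dst])
    new_feat = np.tanh(np.concatenate([node_feat, messages], axis=1) @ w_node)
    return new_feat, scores


# Toy event: two tracks crossing three layers (hypothetical hit positions).
layer = np.array([0, 1, 2, 0, 1, 2])
phi = np.array([0.10, 0.11, 0.12, 1.00, 1.02, 1.04])
z = np.array([0.0, 5.0, 10.0, 0.0, -5.0, -10.0])
edges = build_graph(layer, phi, z)

rng = np.random.default_rng(6)
node_feat = np.stack([layer / 2.0, phi, z / 100.0], axis=1)
w_edge = rng.normal(size=(6, 1))  # random, untrained weights
w_node = rng.normal(size=(6, 3))
new_feat, scores = message_passing_step(node_feat, edges, w_edge, w_node)
```

After training, edges with a high score would be kept and linked into track candidates.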


Analysis

● Machine learning has long been used in signal-background classification
➢ Increased sensitivity of many analyses, better Physics-per-dollar

● Continuous development of new methods using machine learning for analysis
➢ Multi-signal, multi-background categorisations
➢ Background modeling
➢ New physics searches
➢ Inverse problems
➢ ...


Likelihood-Free Inference

Slide by K. Cranmer

One of many ongoing works in analysis
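One well-known ingredient of likelihood-free inference is the "likelihood-ratio trick". A classifier trained to separate simulated samples from two parameter points θ0 and θ1 (balanced classes) gives s(x)/(1-s(x)) ≈ p(x|θ1)/p(x|θ0). This holds even when the simulator's likelihood cannot be evaluated. A minimal sketch with a toy Gaussian "simulator", where the exact ratio is known and can be compared:

```python
import numpy as np


def fit_logistic(x, y, lr=0.5, steps=2000):
    """Plain gradient-descent logistic regression on one feature: p(y=1|x) = sigmoid(w*x + b)."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(w * x + b)))
        residual = p - y  # gradient of the cross-entropy with respect to the logit
        w -= lr * np.mean(residual * x)
        b -= lr * np.mean(residual)
    return w, b


rng = np.random.default_rng(7)
n = 20_000
# Toy "simulator": x ~ N(theta, 1), sampled at two parameter points theta0=0 and theta1=1.
x0 = rng.normal(0.0, 1.0, n)
x1 = rng.normal(1.0, 1.0, n)
x = np.concatenate([x0, x1])
y = np.concatenate([np.zeros(n), np.ones(n)])
w, b = fit_logistic(x, y)


def log_ratio_estimate(xv):
    """With balanced classes, s/(1-s) estimates p(x|theta1)/p(x|theta0); its log is the classifier logit."""
    return w * xv + b


def log_ratio_true(xv):
    """Exact log ratio for the two unit Gaussians, available only because the toy is tractable."""
    return xv - 0.5
```

For a real simulator, the logistic regression would be replaced by a flexible classifier, often conditioned on θ. The learned ratio can then be used in place of the intractable likelihood.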


Operation

● “Operation” covers a wide range of environments: detector control, data acquisition systems, online monitoring, computing centers, networking, storage, transfers, ...

● Potential for more efficient operation and for automating manpower-intensive tasks

● Large amounts of mostly untouched analytics from many appliances. Difficulties sometimes arise from unstructured data and/or a lack of clear objectives, ...


Online Anomaly Detection

https://arxiv.org/abs/1808.00911

Unsupervised and supervised methods to identify alarming patterns in the muon drift tube chambers.

One of many ongoing works in operation
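As a point of comparison for such methods, here is a simple unsupervised baseline, not the method of the cited work. It flags chambers whose occupancy deviates strongly from the rest, using a robust z-score built from the median and the median absolute deviation (MAD). The occupancies, the dead and hot chambers, and the threshold of 5 are hypothetical.

```python
import numpy as np


def robust_outliers(values, threshold=5.0):
    """Flag entries whose robust z-score (median/MAD based) exceeds the threshold."""
    med = np.median(values)
    mad = np.median(np.abs(values - med))
    if mad == 0:
        return values != med  # degenerate case: anything off the median is flagged
    # 0.6745 ~ 1/1.4826 makes the MAD comparable to a Gaussian standard deviation.
    robust_z = 0.6745 * (values - med) / mad
    return np.abs(robust_z) > threshold


rng = np.random.default_rng(8)
# Hypothetical per-chamber hit counts for one monitoring period: 60 chambers, ~1000 hits each.
occupancy = rng.poisson(1000, size=60).astype(float)
occupancy[7] = 0.0  # a dead chamber
occupancy[42] = 3000.0  # a noisy (hot) chamber
flagged = np.flatnonzero(robust_outliers(occupancy))
```

The median and MAD are barely affected by the faulty chambers themselves, which is why this baseline is preferred over a mean/standard-deviation cut. Learned methods become useful when the alarming patterns are spatial or temporal rather than a single outlying count.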


Take home message :

Machine learning has long been used in several tasks (classification, regression, ...).

Many promising R&D projects apply deep learning to more complex tasks.

They aim at better accuracy, better resource efficiency, or both.

A fast inference engine is a must-have.


Fast Machine Learning


Inference

➢ Adding more machine learning models to event reconstruction may increase the computing requirements of the experiments: fast inference engines on standard CPUs.

➢ Heterogeneous computing facilities (HPC, HLT, ...) are being integrated into the LHC workflows: fast inference engines on hosted accelerators (GPU, FPGA, ...).

➢ Cloud/edge resources might be easier to integrate in large facilities: offloading computation to remote accelerators (GPU, TPU, FPGA, ...).
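On FPGAs, inference typically runs in fixed-point rather than floating-point arithmetic. Tools such as hls4ml follow this approach. The sketch below only shows the numerical effect on one dense layer of rounding the weights to a signed fixed-point grid with 16 total bits and 6 integer bits (sign included). The format, the round-to-nearest and saturate choices, and the random weights are assumptions. HLS fixed-point types let the designer choose both the rounding and the overflow behaviour.

```python
import numpy as np


def to_fixed_point(x, total_bits=16, int_bits=6):
    """Round to a signed fixed-point grid (int_bits include the sign bit) and saturate at its range."""
    frac_bits = total_bits - int_bits
    step = 2.0 ** -frac_bits
    lo = -(2.0 ** (int_bits - 1))
    hi = 2.0 ** (int_bits - 1) - step
    return np.clip(np.round(x / step) * step, lo, hi)


rng = np.random.default_rng(9)
w = rng.normal(scale=0.5, size=(16, 8))  # hypothetical trained weights of one dense layer
x = rng.normal(size=(4, 16))  # a batch of 4 inputs
w_q = to_fixed_point(w)
# How much the layer output moves when only the weights are quantized.
max_output_shift = np.max(np.abs(x @ w_q - x @ w))
```

For in-range values, the rounding error per weight is at most half a step (2^-11 here). The choice of bit widths trades FPGA resources and latency against this error. That trade-off is usually checked against the physics performance of the model.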


Training

➢ Data locality, resource access, training recipes, ... sometimes make training models complicated and limit the development of machine learning in the field: training as a service on dedicated resources.

➢ Complex models needing ever-growing datasets may take days to weeks to converge, which hinders turnaround time and fast development: distributed training and optimization to increase productivity.
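The most common form of distributed training is synchronous data parallelism. Each worker computes the gradient on its own shard of the batch, and the gradients are averaged (an all-reduce) before a shared update. In this sketch, the "workers" are simulated in one process, and the linear model and data are hypothetical. With equal-size shards, the averaged gradient equals the full-batch gradient.

```python
import numpy as np


def mse_grad(w, x, y):
    """Gradient of the mean squared error of the linear model x @ w."""
    return 2.0 * x.T @ (x @ w - y) / len(y)


def data_parallel_grad(w, x, y, n_workers):
    """Synchronous data parallelism: equal shards, local gradients, then an average (all-reduce)."""
    shards = zip(np.array_split(x, n_workers), np.array_split(y, n_workers))
    local_grads = [mse_grad(w, xs, ys) for xs, ys in shards]
    return np.mean(local_grads, axis=0)


rng = np.random.default_rng(10)
x = rng.normal(size=(1024, 5))
true_w = np.arange(1.0, 6.0)
y = x @ true_w + 0.1 * rng.normal(size=1024)

w = np.zeros(5)
for _ in range(200):
    w -= 0.1 * data_parallel_grad(w, x, y, n_workers=4)
```

If the shards have unequal sizes, the local gradients must be weighted by shard size to keep this equivalence. The speed-up comes from the workers running in parallel, and is limited by the cost of the all-reduce communication.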


Summary

● The LHC is a large playground for advanced machine learning

● Specific challenges related to data representation, detector complexity, computation restrictions, ...

● Fast inference is mandatory for triggering systems.
● Affordable inference is needed in order to deploy machine learning in event reconstruction.
● Fast training is desirable to speed up development cycles.