15th September 2005, PHYSTAT 05, Oxford

Statistics in ROOT

René Brun, Anna Kreshuk, Lorenzo Moneta

PH/SFT group, CERN

http://root.cern.ch
ftp://root.cern.ch/root/phystat05.ppt

Contents

• User interface
• Data storage and access
• Analysis
• Visualization
• New Math libraries
• Future plans

ROOT’s user interface

• C++ in batch mode:
    root -b -q myMacro.C > myMacro.log
• C++ interpreted code with CINT, the C++ interpreter
  - on the command line:
    root[0] for (int i=0; i<10; i++) cout<<"hello "<<i<<endl;
  - loading a macro:
    root[1] .L mySmallMacro.C
    root[2] myFunction(1, 2, 3);
• C++ compiled code via CINT:
    root[] .L myScript.C+
    Creating shared library /home/…/myScript_C.so
• Python
  - Access to ROOT from Python:
    >>> from ROOT import TLorentzVector
    >>> l = TLorentzVector()
  - Access to Python from ROOT:
    root [0] TPython::LoadMacro("MyPyClass.py");
    root [1] MyPyClass mpc;

ROOT and external libraries

Using external libraries from ROOT:
• rootcint: a utility to link compiled C/C++ objects with the CINT C/C++ interpreter
• Example:
  - In the Makefile of MyLibrary, rootcint generates the dictionary for MyClass
  - Load and use MyLibrary in a ROOT session:
    root[] .L MyLibrary.so
    root[] MyClass *mc = new MyClass();

Data storage and access

• Allows the analysis of terabytes of data
• Entries can be selected from different physical locations and collected into the analysis dataset
• Branches of a TTree are read independently, so the variables not needed for the analysis are not loaded into memory

[Figure: TTree1, TTree2, …, TTreeN with branches V1, V2, …, V23, …, V99; label: "Dataset to analyze"]

Histograms

• 1-, 2- and 3-dimensional histograms
  - Errors for each bin can be computed:
    - Default: as sqrt(bin content)
    - As sqrt(sum of squares of the weights in the bin)
• 1- and 2-dimensional profile histograms
  - Mean value of Y and its standard deviation for each bin in X
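The two bin-error conventions can be sketched in standalone C++. This is a minimal illustration that does not use the ROOT histogram classes; the type `WeightedHist` and its members are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Fixed-width 1-D histogram that keeps, per bin, the sum of weights and the
// sum of squared weights. Illustrative only; not a ROOT class.
struct WeightedHist {
    double lo, hi;
    std::vector<double> sumw;   // bin content: sum of weights
    std::vector<double> sumw2;  // sum of squared weights

    WeightedHist(std::size_t nbins, double lo_, double hi_)
        : lo(lo_), hi(hi_), sumw(nbins, 0.0), sumw2(nbins, 0.0) {
        assert(nbins > 0 && hi_ > lo_);
    }

    // Add one entry; values outside [lo, hi) are ignored in this sketch.
    void fill(double x, double w = 1.0) {
        if (x < lo || x >= hi) return;
        std::size_t bin = static_cast<std::size_t>((x - lo) / (hi - lo) * sumw.size());
        if (bin >= sumw.size()) bin = sumw.size() - 1;  // guard against rounding
        sumw[bin] += w;
        sumw2[bin] += w * w;
    }

    // Default convention: sqrt(bin content).
    double errorDefault(std::size_t bin) const { return std::sqrt(std::fabs(sumw[bin])); }

    // Weighted convention: sqrt(sum of squared weights).
    double errorWeighted(std::size_t bin) const { return std::sqrt(sumw2[bin]); }
};
```

With unit weights both conventions agree, since the sum of squared weights then equals the bin content; they differ as soon as entries carry weights other than 1.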

Analysis of TTrees

The TTree::Draw method and the TTreeViewer are an easy way to examine the tree:
• Producing histograms of user-defined expressions in up to 4 dimensions
• Expressions: C++ formulas
• Selections: expressions, user-defined macros or graphical cuts

Examples:
    Tree.Draw("sqrt(x):y", "x>0 && y<1");
    Tree.Draw("2*TMath::Log(x)", cut1 || cut2);

Fitting - interface

• Minimization packages: Minuit and Fumili
• Fitting can be done:
  - Directly in those packages, with a user-defined function to minimize
  - Through the general interface of:
    - TH1::Fit (binned data): chi-square and log-likelihood methods
    - TGraph::Fit (unbinned data)
    - TGraphErrors::Fit (data with errors)
    - TGraphAsymmErrors::Fit (taking into account the asymmetry of errors)
    - TTree::Fit and TTree::UnbinnedFit
• RooFit package for object-oriented data modeling, distributed with ROOT starting from version 5.02-00

Linear Fitting (1)

• New class TLinearFitter
• Used to fit functions linear in the parameters
• 10-15 times faster than Minuit, depending on the fitting function
• Simple to use in a multidimensional case
• Example:
    lfitter.SetFormula("1 ++ x0 ++ sqrt(x1) ++ exp(x2) ++ x3 ++ x4");
• Expressions with such syntax can be used in all the Fit interface functions
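When a function is linear in its parameters, the fit needs no iterative minimization: the parameters follow directly from a linear system. The sketch below shows this for a single variable using the normal equations. It is a standalone illustration and not the TLinearFitter implementation; `linearFit` and `Basis` are hypothetical names, and each basis function plays the role of one "++"-separated term.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <utility>
#include <vector>

using Basis = std::vector<std::function<double(double)>>;

// Least-squares fit of y ~ sum_j p[j] * basis[j](x), solved in one step
// through the normal equations (A^T A) p = A^T y.
std::vector<double> linearFit(const std::vector<double>& x,
                              const std::vector<double>& y,
                              const Basis& basis) {
    assert(x.size() == y.size());
    const std::size_t m = basis.size();
    // Augmented matrix [A^T A | A^T y].
    std::vector<std::vector<double>> M(m, std::vector<double>(m + 1, 0.0));
    std::vector<double> f(m);
    for (std::size_t i = 0; i < x.size(); ++i) {
        for (std::size_t j = 0; j < m; ++j) f[j] = basis[j](x[i]);
        for (std::size_t r = 0; r < m; ++r) {
            for (std::size_t c = 0; c < m; ++c) M[r][c] += f[r] * f[c];
            M[r][m] += f[r] * y[i];
        }
    }
    // Gaussian elimination with partial pivoting.
    for (std::size_t k = 0; k < m; ++k) {
        std::size_t piv = k;
        for (std::size_t r = k + 1; r < m; ++r)
            if (std::fabs(M[r][k]) > std::fabs(M[piv][k])) piv = r;
        if (std::fabs(M[piv][k]) < 1e-12) throw std::runtime_error("singular system");
        std::swap(M[k], M[piv]);
        for (std::size_t r = k + 1; r < m; ++r) {
            const double factor = M[r][k] / M[k][k];
            for (std::size_t c = k; c <= m; ++c) M[r][c] -= factor * M[k][c];
        }
    }
    // Back substitution.
    std::vector<double> p(m, 0.0);
    for (std::size_t k = m; k-- > 0;) {
        double s = M[k][m];
        for (std::size_t c = k + 1; c < m; ++c) s -= M[k][c] * p[c];
        p[k] = s / M[k][k];
    }
    return p;
}
```

A basis of `1`, `x` and `sqrt(x)` would correspond to a formula of the form "1 ++ x ++ sqrt(x)". Normal equations are the simplest route but can lose precision for ill-conditioned bases; a decomposition-based solver is the usual remedy.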

Linear Fitting (2)

• Robust least trimmed squares fitting
  - Based on the subset of h cases (out of n) whose least squares fit possesses the smallest sum of squared residuals
  - High breakdown point (the breakdown point is the smallest proportion of outliers that can cause the estimator to produce values arbitrarily far from the true parameters)
• Example:
    Graph.Fit("pol3", "rob=0.75", -2, 2);
  The 2nd parameter gives the fraction of good points, h/n
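The least trimmed squares definition can be followed literally for a straight line by trying every subset of h points. The sketch below does exactly that. It is only practical for a handful of points and is not how a production fitter searches; practical implementations use faster approximate searches. `Line`, `fitSubset` and `ltsFit` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

struct Line { double intercept, slope, ssr; };

// Ordinary least-squares line through the points selected by `mask`
// (bit i set = point i is used), with its sum of squared residuals.
Line fitSubset(const std::vector<double>& x, const std::vector<double>& y, unsigned mask) {
    double n = 0.0, sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (((mask >> i) & 1u) == 0u) continue;
        n += 1.0; sx += x[i]; sy += y[i]; sxx += x[i] * x[i]; sxy += x[i] * y[i];
    }
    Line l{0.0, 0.0, std::numeric_limits<double>::infinity()};
    const double det = n * sxx - sx * sx;
    if (det == 0.0) return l;  // degenerate subset: no unique line
    l.slope = (n * sxy - sx * sy) / det;
    l.intercept = (sy - l.slope * sx) / n;
    l.ssr = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (((mask >> i) & 1u) == 0u) continue;
        const double r = y[i] - l.intercept - l.slope * x[i];
        l.ssr += r * r;
    }
    return l;
}

// Exhaustive LTS: keep the h-point subset whose own least-squares fit has
// the smallest sum of squared residuals. Cost grows as 2^n.
Line ltsFit(const std::vector<double>& x, const std::vector<double>& y, std::size_t h) {
    const std::size_t n = x.size();
    assert(n == y.size() && n < 32 && h >= 2 && h <= n);
    Line best{0.0, 0.0, std::numeric_limits<double>::infinity()};
    for (unsigned mask = 0; mask < (1u << n); ++mask) {
        std::size_t bits = 0;
        for (std::size_t i = 0; i < n; ++i) bits += (mask >> i) & 1u;
        if (bits != h) continue;
        const Line l = fitSubset(x, y, mask);
        if (l.ssr < best.ssr) best = l;
    }
    return best;
}
```

With 8 points and h = 6 (a fraction of 0.75), up to two gross outliers are simply left out of the winning subset, whereas an ordinary fit over all points is pulled away by them.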

Smoothing and peak finding

• TSpectrum class:
  - 1- and 2-dimensional background estimation
  - smoothing
  - deconvolution
  - peak search and fitting
• Graph smoothers:
  - Kernel smoother
  - Lowess
  - "Super smoother"
• Splines: cubic and quintic

Multivariate methods (1)

• Minimum Covariance Determinant estimator: a highly robust estimator of multivariate location and scatter
• Class TRobustEstimator
• High breakdown point
• Algorithm similar to Least Trimmed Squares regression

Multivariate methods (2)

• TPrincipal: principal components analysis
• TMultiDimFit: approximates a multidimensional function with monomials, Chebyshev or Legendre polynomials
• TMultiLayerPerceptron: a neural network class
• All multivariate methods can take input data from a TTree

Confidence intervals

• TLimit: computes 95% C.L. limits using the likelihood ratio semi-Bayesian method
• TRolke: computes confidence intervals for the rate of a Poisson process in the presence of background and efficiency, with a fully frequentist treatment of the uncertainties
• TFeldmanCousins: calculates the C.L. upper limit using the Feldman-Cousins method

Small useful algorithms

In the namespace TMath:
• Most probability distribution functions, their densities and inverses
• Special functions
• Mean and Median (also for weighted datasets), Variance and K-th order statistic
• Kolmogorov-Smirnov test
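A few of these small algorithms are easy to sketch with the C++ standard library alone. The functions below are illustrative stand-ins with hypothetical names, not the TMath signatures, and the Kolmogorov-Smirnov part computes only the distance between two samples, not the associated probability.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Weighted mean: sum(w_i * x_i) / sum(w_i).
double weightedMean(const std::vector<double>& x, const std::vector<double>& w) {
    assert(x.size() == w.size() && !x.empty());
    double sw = 0.0, swx = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) { sw += w[i]; swx += w[i] * x[i]; }
    assert(sw != 0.0);
    return swx / sw;
}

// k-th smallest element (k = 0 is the minimum). Takes a copy because
// std::nth_element reorders its input.
double kthOrderStatistic(std::vector<double> v, std::size_t k) {
    assert(k < v.size());
    std::nth_element(v.begin(), v.begin() + static_cast<std::ptrdiff_t>(k), v.end());
    return v[k];
}

// Median: the middle element, or the average of the two middle elements.
double median(const std::vector<double>& v) {
    assert(!v.empty());
    const std::size_t n = v.size();
    if (n % 2 == 1) return kthOrderStatistic(v, n / 2);
    return 0.5 * (kthOrderStatistic(v, n / 2 - 1) + kthOrderStatistic(v, n / 2));
}

// Two-sample Kolmogorov-Smirnov distance: the largest gap between the two
// empirical cumulative distribution functions.
double ksDistance(std::vector<double> a, std::vector<double> b) {
    assert(!a.empty() && !b.empty());
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    std::size_t i = 0, j = 0;
    double d = 0.0;
    while (i < a.size() && j < b.size()) {
        const double v = std::min(a[i], b[j]);
        while (i < a.size() && a[i] <= v) ++i;  // step past all values <= v
        while (j < b.size() && b[j] <= v) ++j;
        const double gap = std::fabs(static_cast<double>(i) / a.size() -
                                     static_cast<double>(j) / b.size());
        d = std::max(d, gap);
    }
    return d;
}
```

Selecting the k-th element with `std::nth_element` avoids a full sort, which matters when only one order statistic is needed from a large sample.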

Linear algebra and quadratic programming

• Linear algebra package:
  - General, symmetric and sparse matrices
  - Matrix decompositions
  - Eigenvalue analysis
• Quadratic programming library:
  - Dense and sparse data
  - Gondzio and Mehrotra solving methods

Graphs

• 1-d:
  - TGraph
  - TGraphErrors
  - TGraphAsymmErrors
  - TMultiGraph: a collection of graphs
• 2-d:
  - TGraph2D
  - TGraph2DErrors

ROOT Math Packages

MathCore

• Library with the basic Math functionality
• Buildable as a standalone library:
  - no dependency on other ROOT packages
  - no external dependency
• Main content of MathCore:
  - Basic and commonly used mathematical functions
    - Special and statistics (pdf, cdf) functions
  - Interfaces to function and algorithm classes
  - Basic implementation of some numerical algorithms
  - 3D and LorentzVectors
  - Random numbers

MathMore

• Library with extra mathematical functionality
• Current content:
  - C++ interface to functions and algorithms from the GNU Scientific Library (GSL)
  - Mathematical functions implemented using GSL
  - Algorithms currently present: adaptive numerical integration, differentiation, root finders, interpolation, 1D minimization
• Repository for needed and useful extra Math functionality
  - could include other useful math libraries

Summary and Future plans

• First versions of the MathCore and MathMore libraries are being released
  - Transition phase, over in 2-3 months
• The next addition will be a new random number package
• Improvement of the fitting interface
• Statistical algorithms to add:
  - sPlot
  - Loess: locally weighted polynomial regression
  - Cluster analysis
  - Boxplot and spiderplot
• Interface with R?

Mathematical Functions

• Special functions use the proposed C++ standard interface:
    double cyl_bessel_i (double nu, double x);
• Statistical functions:
  - Probability density functions (pdf)
  - Cumulative distributions (lower tail and upper tail)
  - Inverse of cumulative distributions
  - Coherent naming scheme (also proposed to the C++ standard): chisquared_pdf, chisquared_prob, chisquared_quant, chisquared_prob_inv, chisquared_quant_inv
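The family of functions provided per distribution (density, lower tail, upper tail and their inverses) can be illustrated with the exponential distribution, where each has a closed form. This is a sketch under stated assumptions: the function names below are hypothetical and deliberately do not reuse the `_prob` / `_quant` suffixes of the slide, since their exact meaning is not spelled out here.

```cpp
#include <cassert>
#include <cmath>

// Probability density of the exponential distribution with rate lambda > 0.
double exponential_pdf(double x, double lambda) {
    assert(lambda > 0.0);
    return x < 0.0 ? 0.0 : lambda * std::exp(-lambda * x);
}

// Lower tail: P(X <= x). expm1 keeps precision for small lambda * x.
double exponential_lower_tail(double x, double lambda) {
    assert(lambda > 0.0);
    return x < 0.0 ? 0.0 : -std::expm1(-lambda * x);
}

// Upper tail: P(X > x), computed directly rather than as 1 - lower tail.
double exponential_upper_tail(double x, double lambda) {
    assert(lambda > 0.0);
    return x < 0.0 ? 1.0 : std::exp(-lambda * x);
}

// Inverse of the lower tail: x such that P(X <= x) = p, for 0 <= p < 1.
double exponential_lower_tail_inv(double p, double lambda) {
    assert(lambda > 0.0 && p >= 0.0 && p < 1.0);
    return -std::log1p(-p) / lambda;
}

// Inverse of the upper tail: x such that P(X > x) = p, for 0 < p <= 1.
double exponential_upper_tail_inv(double p, double lambda) {
    assert(lambda > 0.0 && p > 0.0 && p <= 1.0);
    return -std::log(p) / lambda;
}
```

Providing the upper tail as its own function, instead of subtracting the lower tail from 1, avoids cancellation when the lower tail is very close to 1.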

Mathematical Functions (cont)

• New functions with better precision than the old ones in ROOT
• Extensive tests of numerical accuracy
• Comparison with other libraries (NAG, Mathematica)

Numerical Algorithm

• New C++ classes and interfaces for describing algorithms and functions
• Integrator classes
  - Implementation based on GSL (QGS) for definite and indefinite integration
• Move of the functionality currently in ROOT TF1 into new classes in MathCore
  - Easier to use for all clients
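The idea of an integrator that accepts any function object can be sketched with adaptive Simpson quadrature. This is a swapped-in textbook technique chosen for brevity; it is not the GSL algorithm and not the MathCore or MathMore interface, and `integrate` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

namespace detail {

// Simpson's rule on [a, b] from the endpoint and midpoint values.
double simpson(double fa, double fm, double fb, double a, double b) {
    return (b - a) / 6.0 * (fa + 4.0 * fm + fb);
}

// Split [a, b] in two, compare with the unsplit estimate, and recurse only
// where the two disagree by more than the tolerance.
double refine(const std::function<double(double)>& f, double a, double b,
              double fa, double fm, double fb, double whole, double tol, int depth) {
    const double m = 0.5 * (a + b);
    const double flm = f(0.5 * (a + m));
    const double frm = f(0.5 * (m + b));
    const double left = simpson(fa, flm, fm, a, m);
    const double right = simpson(fm, frm, fb, m, b);
    const double delta = left + right - whole;
    if (depth <= 0 || std::fabs(delta) <= 15.0 * tol)
        return left + right + delta / 15.0;  // Richardson correction
    return refine(f, a, m, fa, flm, fm, left, 0.5 * tol, depth - 1) +
           refine(f, m, b, fm, frm, fb, right, 0.5 * tol, depth - 1);
}

}  // namespace detail

// Definite integral of f over [a, b] to roughly the requested tolerance.
double integrate(const std::function<double(double)>& f, double a, double b,
                 double tol = 1e-10) {
    assert(tol > 0.0);
    const double fa = f(a), fm = f(0.5 * (a + b)), fb = f(b);
    return detail::refine(f, a, b, fa, fm, fb,
                          detail::simpson(fa, fm, fb, a, b), tol, 50);
}
```

A call such as `integrate([](double x) { return std::sin(x); }, 0.0, 3.14159265358979)` works with a lambda, a free function or any other callable, which is the convenience a function-object interface is meant to give clients. The stopping rule can be fooled when the first three sample points happen to coincide with zeros of the integrand.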

Physics and Geometry Vectors

• Classes for 3D Vectors and LorentzVectors with their operations and transformations
  - Merge of the old ROOT and CLHEP classes
• New classes with cleaner interfaces, generic on the scalar type and on the underlying coordinate system (cartesian, polar, cylindrical, etc.)
• Classes for 3D rotations and Lorentz transformations
  - Rotations based on quaternions are also available
• Work done in collaboration with the Fermilab group
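Being generic on both the scalar type and the coordinate system can be sketched with two template parameters' worth of freedom: the storage class fixes the coordinate system and is itself templated on the scalar. The class names `Cartesian`, `Polar` and `Vector3` below are hypothetical and much simpler than the actual ROOT classes.

```cpp
#include <cassert>
#include <cmath>

// Cartesian storage: (x, y, z).
template <class T>
struct Cartesian {
    using Scalar = T;
    T cx, cy, cz;
    T x() const { return cx; }
    T y() const { return cy; }
    T z() const { return cz; }
};

// Spherical polar storage: (r, theta, phi), theta measured from the z axis.
template <class T>
struct Polar {
    using Scalar = T;
    T r, theta, phi;
    T x() const { return r * std::sin(theta) * std::cos(phi); }
    T y() const { return r * std::sin(theta) * std::sin(phi); }
    T z() const { return r * std::cos(theta); }
};

// Vector whose operations are written once against x(), y(), z(),
// independently of how the coordinates are stored.
template <class Coords>
class Vector3 {
public:
    using Scalar = typename Coords::Scalar;
    explicit Vector3(const Coords& c) : fCoords(c) {}

    Scalar x() const { return fCoords.x(); }
    Scalar y() const { return fCoords.y(); }
    Scalar z() const { return fCoords.z(); }
    Scalar mag2() const { return x() * x() + y() * y() + z() * z(); }

    // Cosine of the polar angle; undefined for the null vector.
    Scalar cosTheta() const {
        assert(mag2() > 0);
        return z() / std::sqrt(mag2());
    }

    // Dot product with a vector stored in any coordinate system.
    template <class Other>
    Scalar dot(const Vector3<Other>& v) const {
        return x() * v.x() + y() * v.y() + z() * v.z();
    }

private:
    Coords fCoords;
};
```

The design choice shown is that a polar-stored vector converts to cartesian components only on demand, so code that mostly needs angles does not pay for conversions, while mixed-system operations such as `dot` still work.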

Minimization

• New C++ version of Minuit being introduced in ROOT
  - Same algorithms translated to C++, plus some added functionality: Fumili minimizer, single-sided bounds
• Undergoing extensive validation tests

[Figure: two plots labelled "before" and "after"]