multivariate methods of data analysis in cosmic ray astrophysics a. chilingarian, a. vardanyan...

22
Multivariate Methods of Data Analysis in Cosmic Ray Astrophysics A. Chilingarian, A. Vardanyan Cosmic Ray Division, Yerevan Physics Institute, Armenia

Post on 22-Dec-2015

230 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Multivariate Methods of Data Analysis in Cosmic Ray Astrophysics A. Chilingarian, A. Vardanyan Cosmic Ray Division, Yerevan Physics Institute, Armenia

Multivariate Methods of Data Analysis in Cosmic Ray Astrophysics

A. Chilingarian, A. Vardanyan

Cosmic Ray Division, Yerevan Physics Institute, Armenia

Page 2: Multivariate Methods of Data Analysis in Cosmic Ray Astrophysics A. Chilingarian, A. Vardanyan Cosmic Ray Division, Yerevan Physics Institute, Armenia

ACAT 2002, 24-28 June, Moscow, Russia A.Chilingarian, A.Vardanyan, CRD-YerPhI

Topics

Main tasks to be solved in cosmic ray astrophysicsAnalysis methods

Preprocessing and indication of the best parametersNeural Networks for the main data analysis

Multi Start Random Search learning algorithmTraining, Validation and Generalization errorsOvertraining control

Page 3: Multivariate Methods of Data Analysis in Cosmic Ray Astrophysics A. Chilingarian, A. Vardanyan Cosmic Ray Division, Yerevan Physics Institute, Armenia

ACAT 2002, 24-28 June, Moscow, Russia A.Chilingarian, A.Vardanyan, CRD-YerPhI

Individual event weights

Results of NN classification and estimation

Examples of applications

Page 4: Multivariate Methods of Data Analysis in Cosmic Ray Astrophysics A. Chilingarian, A. Vardanyan Cosmic Ray Division, Yerevan Physics Institute, Armenia

ACAT 2002, 24-28 June, Moscow, Russia A.Chilingarian, A.Vardanyan, CRD-YerPhI

The MAGIC telescope for detecting

-rays from point sources

Page 5: Multivariate Methods of Data Analysis in Cosmic Ray Astrophysics A. Chilingarian, A. Vardanyan Cosmic Ray Division, Yerevan Physics Institute, Armenia

ACAT 2002, 24-28 June, Moscow, Russia A.Chilingarian, A.Vardanyan, CRD-YerPhI

The MAKET-ANI installation for the registration of Extensive Air Showers

Page 6: Multivariate Methods of Data Analysis in Cosmic Ray Astrophysics A. Chilingarian, A. Vardanyan Cosmic Ray Division, Yerevan Physics Institute, Armenia

ACAT 2002, 24-28 June, Moscow, Russia A.Chilingarian, A.Vardanyan, CRD-YerPhI

The development of an extensive air shower induced by primary cosmic ray particle in the atmosphere

Page 7: Multivariate Methods of Data Analysis in Cosmic Ray Astrophysics A. Chilingarian, A. Vardanyan Cosmic Ray Division, Yerevan Physics Institute, Armenia

ACAT 2002, 24-28 June, Moscow, Russia A.Chilingarian, A.Vardanyan, CRD-YerPhI

The Monte-Carlo Simulation is the key problem of any physical inference in indirect experiments

Page 8: Multivariate Methods of Data Analysis in Cosmic Ray Astrophysics A. Chilingarian, A. Vardanyan Cosmic Ray Division, Yerevan Physics Institute, Armenia

ACAT 2002, 24-28 June, Moscow, Russia A.Chilingarian, A.Vardanyan, CRD-YerPhI

What tasks we want to solve measuring EAS characteristics?

An inverse problem to be solved:

Experimental dataExperimental data Simulated Simulated datadata

?,?(N?,?(Nee,N,Nμμ,N,Nhh,S…),S…) E,A(NE,A(Nee,N,Nμμ,N,Nhh,S…),S…)

Identification of primary particle type Estimation of primary particle energy

Page 9: Multivariate Methods of Data Analysis in Cosmic Ray Astrophysics A. Chilingarian, A. Vardanyan Cosmic Ray Division, Yerevan Physics Institute, Armenia

ACAT 2002, 24-28 June, Moscow, Russia A.Chilingarian, A.Vardanyan, CRD-YerPhI

Why Neural Networks?

Neural Networks belong to the general class of nonparametric methods that do not require any assumption about the parametric form of a statistical model they use

Are appropriate technique for classification and estimation tasks

Are able to treat multidimensional input data

Page 10: Multivariate Methods of Data Analysis in Cosmic Ray Astrophysics A. Chilingarian, A. Vardanyan Cosmic Ray Division, Yerevan Physics Institute, Armenia

ACAT 2002, 24-28 June, Moscow, Russia A.Chilingarian, A.Vardanyan, CRD-YerPhI

The Neural information techniques

The central issue of Neural Networks is a bounded mapping of n-dimensional input to m-dimensional output:

The functional form of is accumulated in -NN parameters (weights) during the NN training process.

The NN training process consists in iterative processing of simulated events,

The aim of the training process consists in finding that provides the minimum of error (quality) function:

( ,ef N NW

f W;;;;;;;;;;;;;;

W;;;;;;;;;;;;;;

2

1

1( ) ( ( ) ) ;

evN

k k kev k

Q W OUT W TRUE VN

;;;;;;;;;;;;;;<<<<<<<<<<<<<<

Page 11: Multivariate Methods of Data Analysis in Cosmic Ray Astrophysics A. Chilingarian, A. Vardanyan Cosmic Ray Division, Yerevan Physics Institute, Armenia

ACAT 2002, 24-28 June, Moscow, Russia A.Chilingarian, A.Vardanyan, CRD-YerPhI

A Feed-Forward Neural Network

Page 12: Multivariate Methods of Data Analysis in Cosmic Ray Astrophysics A. Chilingarian, A. Vardanyan Cosmic Ray Division, Yerevan Physics Institute, Armenia

ACAT 2002, 24-28 June, Moscow, Russia A.Chilingarian, A.Vardanyan, CRD-YerPhI

An example of the NN output distribution in case of classification task

Page 13: Multivariate Methods of Data Analysis in Cosmic Ray Astrophysics A. Chilingarian, A. Vardanyan Cosmic Ray Division, Yerevan Physics Institute, Armenia

ACAT 2002, 24-28 June, Moscow, Russia A.Chilingarian, A.Vardanyan, CRD-YerPhI

Common drawbacks in NN training process

Training only one network can lead to the suboptimal generalization

Insufficient training events and a risk of overtraining

Page 14: Multivariate Methods of Data Analysis in Cosmic Ray Astrophysics A. Chilingarian, A. Vardanyan Cosmic Ray Division, Yerevan Physics Institute, Armenia

ACAT 2002, 24-28 June, Moscow, Russia A.Chilingarian, A.Vardanyan, CRD-YerPhI

Multi start random search algorithm

The Random search learning algorithm implements the following steps:

1. The initial values of NN weights are chosen randomly from Gaussian distribution with =0 and 0.01

2. The random step in the multidimensional space of NN weights is performed from initial point to modify the weights, the alternation of weights is done according to:

where is the NN weight vector at th iteration, is the step size, RNDM is a random number from [0,1] interval, and the term introduces and controls the degree of dependence of the random step value on the already achieved quality function

1 ( 0.5)pi i iW W Q RNDM

;;;;;;;;;;;;;;;;;;;;;;;;;;;;1, iteri N

iW;;;;;;;;;;;;;;

i piQ

Page 15: Multivariate Methods of Data Analysis in Cosmic Ray Astrophysics A. Chilingarian, A. Vardanyan Cosmic Ray Division, Yerevan Physics Institute, Armenia

ACAT 2002, 24-28 June, Moscow, Russia A.Chilingarian, A.Vardanyan, CRD-YerPhI

3. The quality function is calculated at each iteration by presenting all the training events to NN

4. If i ≤ i-1 , then the vector is kept as new weights of NN and the next step is initializing from that point in space of NN weights, otherwise – return to the previous point is implemented and a new random step is performed.

Multi start technique consists in training many Neural Nets starting from different initial weights and using different step size parameters, allowing to scan many points in the multidimensional space of NN weights

iW;;;;;;;;;;;;;;

Page 16: Multivariate Methods of Data Analysis in Cosmic Ray Astrophysics A. Chilingarian, A. Vardanyan Cosmic Ray Division, Yerevan Physics Institute, Armenia

ACAT 2002, 24-28 June, Moscow, Russia A.Chilingarian, A.Vardanyan, CRD-YerPhI

Training and Validation errors,Overtraining control

Page 17: Multivariate Methods of Data Analysis in Cosmic Ray Astrophysics A. Chilingarian, A. Vardanyan Cosmic Ray Division, Yerevan Physics Institute, Armenia

ACAT 2002, 24-28 June, Moscow, Russia A.Chilingarian, A.Vardanyan, CRD-YerPhI

An acceptable procedure to avoid the overtraining

after each successful iteration of the learning process the net error is calculated for the validation sample

if the validation error is less than the one obtained at previous iteration, then the NN weights obtained at the current training iteration are memorized

else, the NN weights obtained at the previous iteration are stored

At the end of training process the weights which provide the minimal error on the validation sample are found and used as the final best weights for NN

Page 18: Multivariate Methods of Data Analysis in Cosmic Ray Astrophysics A. Chilingarian, A. Vardanyan Cosmic Ray Division, Yerevan Physics Institute, Armenia

ACAT 2002, 24-28 June, Moscow, Russia A.Chilingarian, A.Vardanyan, CRD-YerPhI

The multi start RS technique provides a possibility to select the NN with best performance on the control data set

Page 19: Multivariate Methods of Data Analysis in Cosmic Ray Astrophysics A. Chilingarian, A. Vardanyan Cosmic Ray Division, Yerevan Physics Institute, Armenia

ACAT 2002, 24-28 June, Moscow, Russia A.Chilingarian, A.Vardanyan, CRD-YerPhI

Results of energy estimation by NN

Page 20: Multivariate Methods of Data Analysis in Cosmic Ray Astrophysics A. Chilingarian, A. Vardanyan Cosmic Ray Division, Yerevan Physics Institute, Armenia

ACAT 2002, 24-28 June, Moscow, Russia A.Chilingarian, A.Vardanyan, CRD-YerPhI

Results of mass classification by NN

Page 21: Multivariate Methods of Data Analysis in Cosmic Ray Astrophysics A. Chilingarian, A. Vardanyan Cosmic Ray Division, Yerevan Physics Institute, Armenia

ACAT 2002, 24-28 June, Moscow, Russia A.Chilingarian, A.Vardanyan, CRD-YerPhI

Application of NN for gamma/hadron separation task in gamma-ray

astronomy

Page 22: Multivariate Methods of Data Analysis in Cosmic Ray Astrophysics A. Chilingarian, A. Vardanyan Cosmic Ray Division, Yerevan Physics Institute, Armenia

ACAT 2002, 24-28 June, Moscow, Russia A.Chilingarian, A.Vardanyan, CRD-YerPhI

Cosmic Ray differential energy spectra obtained by NN classification and

estimation