Ellipsoidal representations about correlations (2011-11, Tsukuba, KAKENHI symposium)


DESCRIPTION

A fundamental theory in statistics, possibly applicable to data mining and machine learning, as well as epistemology. My own principia mathematica, 2nd version.

TRANSCRIPT

Ellipsoidal representations about correlations (Towards a general correlation theory)

Toshiyuki Shimono, tshimono@05.alumni.u-tokyo.ac.jp

KAKENHI Symposium (Grant-in-Aid for Scientific Research)

University of Tsukuba, 2011-11-08

My profile

• My jobs mainly involve building algorithms over data in large amounts, such as:
  o web access logs
  o newspaper articles
  o POS (Point of Sales) data
  o tags of millions of pictures
  o links among billions of pages
  o psychology test results from a human resource company
  o data produced for recommendation engines
  o data produced by an original search engine

• This presentation touches on those above.

Background

1. Paradoxes of real-world data:
  o Any elaborate regression analysis mostly gives ρ < 0.7.
    (This is when the observation is not very accurate, and 0.7 is an arbitrary threshold.)
    -> So how should we deal with them?
  o Data accuracy seems unimportant for seeing ρ if ρ < 0.7.
    -> Details shown later.

2. My tentative answer:
  o Correlations are very important, so we need interpretation methods.
  o Ellipsoids will give you insights.

3. Then we will:
  o understand a real world dominated by weak correlations,
  o hopefully find new rules and findings across the broad sciences.

Main contents

§1. What is ρ?
  o Shape of the ellipse/ellipsoid
  o Mysterious robustness

§2. Geometry of regression
  o Similarity ratio of ellipses
  o Graduated rulers
  o Linear scalar fields

§1. What is ρ? (ρ: the correlation coefficient)

It was developed by Karl Pearson from a similar but slightly different idea introduced by Francis Galton in the 1880s. 

(quoted from en.wikipedia.org)

The shapes of correlation ellipses (1)

Each panel of the figure shows a 2-dimensional Gaussian distribution, with ρ changing from -1 to +1 in steps of 0.1. (5000 points are plotted for each.)

The shapes of correlation ellipses (2) 

The ellipse is inscribed in the unit square, touching it at the 4 points (±1, ±ρ) and (±ρ, ±1).

The density function of the standardized 2-dim Gaussian distribution is

  f(x,y) = exp( -(x² - 2ρxy + y²) / (2(1-ρ²)) ) / (2π √(1-ρ²)).

Note: for higher dimensions, f(x) ∝ exp(-xᵀR⁻¹x / 2) / √(det R), where R is the correlation matrix.

The shapes of correlation ellipses (3)

When you draw the ellipses above:
1. draw an ellipse whose height and width are √(1-ρ) and √(1+ρ),
2. rotate it by 45 degrees,
3. apply a parallel shift and an axial rescaling.

• Displacement and axial rescaling are allowed. (Rotation, or rescaling along any other direction, is prohibited.)
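The construction above can be checked numerically. A minimal sketch (assuming NumPy; not code from the slides): build the ellipse with half-axes √(1±ρ), rotate it 45 degrees, and confirm it satisfies the standardized contour equation and touches the unit square.

```python
import numpy as np

rho = 0.6
t = np.linspace(0.0, 2.0 * np.pi, 100001)

# Step 1: ellipse with half-axes sqrt(1+rho) and sqrt(1-rho) ...
p = np.vstack([np.sqrt(1 + rho) * np.cos(t), np.sqrt(1 - rho) * np.sin(t)])
# Step 2: ... rotated by 45 degrees.
c, s = np.cos(np.pi / 4), np.sin(np.pi / 4)
x, y = np.array([[c, -s], [s, c]]) @ p

# The rotated ellipse satisfies x^2 - 2*rho*x*y + y^2 = 1 - rho^2 everywhere ...
q = x**2 - 2 * rho * x * y + y**2
assert np.allclose(q, 1 - rho**2)

# ... and its extreme x-value is 1, attained where y = rho: the ellipse is
# inscribed in the unit square, touching it at (1, rho) and so on.
print(round(x.max(), 3), round(y[np.argmax(x)], 3))
```

The step-3 shift and rescaling are omitted here, since the standardized ellipse is the one touching the unit square.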

The shapes of correlation ellipses (4) [Baseball example]

6 teams of the Central League played 130 games in each of the past 31 years. Each dot below corresponds to one team in one year (N = 186 = 6 × 31).

o x: total score gained (G), y: -rank, ρ = 0.419
o x: total score lost (L), y: -rank, ρ = -0.471
o x: total score gained, y: total score lost, ρ = 0.423
o x: -rank predicted from both G & L, y: -rank, ρ = 0.828

(The prediction is through multiple regression analysis.)

The shapes of correlations (5) SKIP

Correlation ellipsoid (higher dimension)

For the 3-dim case, the probability ellipsoid touches the unit cube at the 6 points ±(ρi1, ρi2, ρi3), where i = 1, 2, 3. (For k dimensions, the hyper-ellipsoid touches the unit hyper-cube at 2×k points ±(ρi1, ρi2, ..., ρik), where i = 1, 2, ..., k.)

The ρ-matrix herein is

  1    0.3  0.5
  0.3  1    0.7
  0.5  0.7  1

so the six touch points are ±(1, 0.3, 0.5), ±(0.3, 1, 0.7) and ±(0.5, 0.7, 1).
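The touch points are exactly the (signed) rows of the ρ-matrix. A sketch assuming NumPy verifies this for the matrix above: each row lies on the ellipsoid xᵀR⁻¹x = 1, sits on a face of the unit cube, and the contact is tangential because the ellipsoid's normal there equals the face normal.

```python
import numpy as np

# The probability ellipsoid {x : x^T R^-1 x = 1} for this rho-matrix.
R = np.array([[1.0, 0.3, 0.5],
              [0.3, 1.0, 0.7],
              [0.5, 0.7, 1.0]])
Rinv = np.linalg.inv(R)

for i in range(3):
    p = R[i]                                 # candidate touch point (rho_i1, rho_i2, rho_i3)
    assert np.isclose(p @ Rinv @ p, 1.0)     # lies on the ellipsoid
    assert np.isclose(p[i], 1.0)             # lies on the cube face x_i = 1
    e = np.zeros(3)
    e[i] = 1.0
    # Gradient direction of the quadratic form at p is R^-1 p = e_i,
    # i.e. perpendicular to the face x_i = 1, so the contact is tangential.
    assert np.allclose(Rinv @ p, e)
print("touch points verified")
```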

The mysterious robustness (1)

ρ[X:Y] and ρ[f(X):g(Y)] seem to differ only a little from each other
• when f and g are both increasing functions,
• unless X, Y, f(X) or g(Y) contains `outlier(s)'.

(The sampling fluctuations of ρ are much larger than the effect caused by non-linearity, as well as by the error ε.)

* A function f(·) is increasing iff f(x) ≦ f(y) holds for any x ≦ y.

The mysterious robustness (2)

ρ[X:Y] = 0.557 (original)
ρ[X²:Y] = 0.519 (X squared)
ρ[X:Y²] = 0.536 (Y squared)
ρ[Xrank:Yrank] = 0.537 (X, Y converted to ranks)
ρ[X:log(Y)] = 0.539 (Y log-transformed)
ρ[X(5):Y(5)] = 0.507 (X, Y discretized into 5 levels)
ρ[X(7):Y(7)] = 0.524 (X, Y discretized into 7 levels)

• The deformations have only a small effect on ρ,
• whereas N = 200 ≫ 1 causes bigger fluctuations of ρ.

Even N = 200 gives the sampled correlation rather large fluctuations, whereas the × marks from the deformation experiments concentrate.

(x,y) = (u, 0.5u + 0.707v) with (u,v) drawn from a uniform square.
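The experiment above can be re-run as a sketch (assuming NumPy; the slide's exact sample is unavailable, so the numbers differ slightly, but the pattern — monotone deformations barely move ρ — reproduces):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
u, v = rng.uniform(0, 1, n), rng.uniform(0, 1, n)
x, y = u, 0.5 * u + 0.707 * v           # the (x, y) recipe from the slide

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

def rank(a):                             # rank transform: an increasing map
    return np.argsort(np.argsort(a))

base = corr(x, y)
for fx, fy, label in [(x**2, y, "x^2"),
                      (x, y**2, "y^2"),
                      (rank(x), rank(y), "ranks"),
                      (x, np.log(y), "log y")]:
    r = corr(fx, fy)
    assert abs(r - base) < 0.12, label   # deformation moves rho only a little
    print(f"{label}: {r:.3f} (base {base:.3f})")
```

With u, v in (0, 1), x² and log(y) are increasing maps on the data's range, so all four transforms satisfy the slide's monotonicity condition.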

The mysterious robustness (3)

Sampled ρ are perturbed according to the sample size, N = 30 (blue) or N = 300 (red). The deformation effect of f(·) is smaller.

Where does the champion come from?

If the ρ of a game is not close to 1, the truly strongest player can hardly win. The winner is approximately ρ times as strong as the truly strongest player. (Assuming the results and the potential abilities form a 2-dim, 0-centered Gaussian.)

The champion of a game is often not the true champion.
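This claim can be checked by a small Monte Carlo sketch (assuming NumPy; the parameters ρ = 0.6, 100 players, 4000 tournaments are illustrative, not from the slides). Ability a is standard normal and the observed result is r = ρa + √(1-ρ²)·noise, so corr(a, r) = ρ; since E[a | r] = ρr, the winner's expected ability is ρ times that of the true best.

```python
import numpy as np

rng = np.random.default_rng(1)
rho, players, games = 0.6, 100, 4000

a = rng.standard_normal((games, players))                       # true abilities
r = rho * a + np.sqrt(1 - rho**2) * rng.standard_normal((games, players))

winner_ability = a[np.arange(games), np.argmax(r, axis=1)]      # who actually won
true_best = a.max(axis=1)                                       # the true champion

ratio = winner_ability.mean() / true_best.mean()
print(f"winner is ~{ratio:.2f} times as strong as the true best")
assert abs(ratio - rho) < 0.07    # the ratio comes out close to rho
```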

Summary of `§1. What is ρ?'

• ρ is recognizable as an ellipse.
• A ρ-matrix is recognizable as an ellipsoid.
• ρ seems robust against axial deformations unless outliers exist.
• The ρ of a game is suggested by its champions.

§2. Geometry of Regression

The figures herein show the possible region where (x,y,z)=(ρ[Y:Z],ρ[Z:X],ρ[X:Y]) can exist.
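As a side note not spelled out on the slide, that possible region is exactly where the 3×3 correlation matrix stays positive semidefinite; for a unit-diagonal matrix this reduces to det = 1 + 2xyz - x² - y² - z² ≥ 0. A sketch assuming NumPy:

```python
import numpy as np

# (x, y, z) = (rho[Y:Z], rho[Z:X], rho[X:Y]) is feasible iff the 3x3
# correlation matrix is PSD, i.e. 1 + 2xyz - x^2 - y^2 - z^2 >= 0
# (together with |x|, |y|, |z| <= 1).
def feasible(x, y, z):
    return 1 + 2 * x * y * z - x * x - y * y - z * z >= 0

# Cross-check the determinant formula against numpy for one triple.
x, y, z = 0.2, -0.4, 0.5
R = np.array([[1, z, y], [z, 1, x], [y, x, 1]])
assert np.isclose(np.linalg.det(R), 1 + 2 * x * y * z - x * x - y * y - z * z)

print(feasible(0.9, 0.9, 0.9))    # True: three high correlations are consistent
print(feasible(0.9, 0.9, -0.9))   # False: these three cannot coexist
```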

Multiple-ρ is the similarity ratio of ellipses

(When X is k-dimensional, the hyper-ellipsoid is determined by the k×k matrix whose elements are ρ[Xi:Xj], and the inner point is the k-dimensional vector whose elements are ρ[Xi:Y].)

[ Formulation of MRA ]

[ Multiple-ρ ]

The multiple-ρ (≦ 1) is the similarity ratio of the ellipses.

Examples : Multiple-ρ from the ellipses

Many interesting phenomena would be systematically explained.
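Numerically, the multiple-ρ can be computed by the recipe given in this deck's appendix: invert the correlation matrix, take the reciprocal of each diagonal element, subtract from 1, and take the square root. A sketch assuming NumPy, using the baseball numbers:

```python
import numpy as np

# Correlation matrix in the order Y, X1, X2 (baseball slide values).
R = np.array([[ 1.000, 0.419, -0.471],
              [ 0.419, 1.000,  0.423],
              [-0.471, 0.423,  1.000]])

Rinv = np.linalg.inv(R)
# multiple-rho_i = sqrt(1 - 1 / (R^-1)_ii)
multiple_rho = np.sqrt(1 - 1 / np.diag(Rinv))

# -> 0.829 for Y, matching the slide's 0.828 up to rounding of the inputs.
print(round(multiple_rho[0], 3))
```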

Partial-ρ is read by a ruler in the ellipse

The red ruler,
• parallel to the corresponding axis,
• passing through (r1, r2),
• fully extended inside the ellipse,
• graduated linearly over ±1,

reads the partial-ρ.

The partial correlation r1' comes from the idea of the correlation between X1 and Y while X2 is held fixed.

r1' = 0.75 for this case. r2' is read likewise by turning the ruler to the vertical direction.
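For reference, the quantity the ruler reads can be computed two equivalent ways (a sketch assuming NumPy; the values below are the baseball ones, while the slide's r1' = 0.75 belongs to its own illustrative figure):

```python
import numpy as np

r1, r2, r12 = 0.419, -0.471, 0.423     # rho[Y:X1], rho[Y:X2], rho[X1:X2]

# (a) the textbook partial-correlation formula
partial_a = (r1 - r2 * r12) / np.sqrt((1 - r2**2) * (1 - r12**2))

# (b) from the inverse correlation matrix:
#     partial rho_ij = -(R^-1)_ij / sqrt((R^-1)_ii (R^-1)_jj)
R = np.array([[1, r1, r2], [r1, 1, r12], [r2, r12, 1]])
P = np.linalg.inv(R)
partial_b = -P[0, 1] / np.sqrt(P[0, 0] * P[1, 1])

assert np.isclose(partial_a, partial_b)
print(round(partial_a, 3))             # partial correlation of Y and X1 given X2
```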

Standardized partial regression coefficients

• The ai are called the partial regression coefficients.
• Assume X1, X2, Y are standardized.

Make a scalar field inside the ellipse:
• 1 on the plus-side boundary point of the k-th axis,
• 0 on the boundary points of the other axes,
• interpolate the assigned values linearly.

Then ak is read off as the value at (r1, r2).

Note:
• The extension to higher dimensions is easy.
• The boundary point on each facet is unique.
• This pictorialization may be useful for SEM (Structural Equation Modeling).
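The coefficients the scalar field depicts are, for standardized variables, the solution of the normal equations a = Rxx⁻¹ rxy. A sketch assuming NumPy, again with the baseball numbers:

```python
import numpy as np

r1, r2, r12 = 0.419, -0.471, 0.423     # rho[Y:X1], rho[Y:X2], rho[X1:X2]

Rxx = np.array([[1, r12], [r12, 1]])   # correlations among the predictors
rxy = np.array([r1, r2])               # correlations of the predictors with Y

a = np.linalg.solve(Rxx, rxy)          # standardized coefficients a1, a2
print(np.round(a, 3))

# Consistency check: R^2 = a . rxy reproduces the multiple-rho of ~0.828.
assert abs(np.sqrt(a @ rxy) - 0.828) < 0.005
```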

The elliptical depiction for the baseball example (this page was added after the symposium)

Red: for the multiple-ρ (0.828). Blue: for the two partial-ρ. Magenta: for the partial regression coefficients.

Each value corresponds to the length ratio of the bold part to the whole same-colored line segment.

X1: annual total score gained. X2: annual total score lost. Y: zero minus annual ranking.

(ρ[Y:X1], ρ[Y:X2]) = (0.419, -0.471) is plotted inside the ellipse slanted with ρ[X1:X2] = 0.423.

-> The meaning of the numbers becomes clearer.

Summary and findings of §2 Geometry of regression

• Multiple-ρ is the similarity ratio of two ellipses/ellipsoids.
• Partial-ρ is read by a graduated ruler in the ellipse/ellipsoid.
• Each regression coefficient is given by the scalar field.

So far, the numbers derived from MRA (Multiple Regression Analysis) have often been said to be hard to interpret. But this situation can be changed.

[ Main results ] Using the ellipse or hyper-ellipsoid,
• any correlation matrix is wholly pictorialized,
• multiple regression is translated into geometric quotients.

[ Sub results ]
• ρ seems quite robust against axial deformations unless outliers exist.
• (Spherical trigonometry may give you insights.) <- Not covered today.

[ Next steps ]
• treat the parameter/sampling perturbations
• systematize interesting statistical phenomena
• produce new theories further on
• give new twists to other research areas
• make useful applications to real-world cases
• organize a new logic system for this ambiguous world

Summary as a whole

Refs
1. 岩波数学辞典 (Encyclopedic Dictionary of Mathematics), The Mathematical Society of Japan
2. R, http://www.r-project.org/
3. 共分散構造分析 [事例編] (Covariance Structure Analysis: Case Studies)

  The  author sincerely welcomes any related literature. 

Background of this presentation (SKIP)

1. We make judgements from related things in daily or social life, but this real world is noisy and filled with exceptions.
   e.g. "Do better posture and mental concentration cause better performance?"

2. Real-world data causes paradoxes:
  o Any elaborate regression analysis mostly gives ρ < 0.7; how to deal with this?
  o Data accuracy is not important when ρ < 0.7 (details shown later).
  o Why does subjective sense work in the real world?

3. Geometric interpretations of multiple regression analysis may be useful,
  o wholly taking in any correlation matrix,
  o being geometric through ellipsoids,
  to observe and analyze the background phenomena in detail.

4. Then we will understand the weak correlations that dominate our world.

A primitive question (SKIP)

Question: Why (and how) is data analysis important?

My answer: It gives you inspirations and updates your recognition of the real world. Knowing the numbers μ, σ, ρ, ranking, VaR* from the phenomena you have met is crucially important for your next action in your daily, social or business life!!
  * mean, standard deviation, correlation coefficient, rank order, Value at Risk

And so, the interpretation of the numbers is necessary. (And I provide you that of ρ today!)

Main ideas in more detail (SKIP)

Using the ellipse or hyper-ellipsoid,
• 2nd-order moments are completely imaginable in a picture,
• the numbers from multiple regression are also imaginable.

1. (Pearson's) correlation coefficient
• a basic of statistics (as you know)
• may change greatly when outliers are contained
• however, changes only a little under `monotone' maps
• depicted as a 'correlation ellipse'

2. Multiple regression analysis
• (spherical surface interpretation)
• ellipse interpretation

Main ideas (SKIP)

1. What is the correlation coefficient after all?
2. Geometric interpretations of multiple regression analysis.

The mysterious robustness (3) SKIP

Front figures: x = the original sampling correlation, y = the correlation after 3-valued discretization. Back figures: samples of 100.


When partial-ρ is zero (SKIP)

The condition partial-ρ = 0 ⇔
• The inner angle of the spherical triangle is 90 degrees.
• The two `hyper-planes' cross at 90 degrees at the `hyper-axis'. (The axis corresponds to the fixed variables, and each of the planes contains one of the two variables.)
• On the ellipse/ellipsoid, the characteristic point is at the midpoint of the ruler.

Multiple-ρ is the similarity ratio of ellipses (REDUNDANT)


For an arbitrary number of variables, you calculate: the inverse of the correlation matrix → the reciprocal of each diagonal element → 1 minus each of them → the square root of each → each result is the multiple-ρ of the corresponding variable against the rest of the variables.



Introduction (this page was added after the symposium)

There is a Japanese word `kaizen', which means improvement.

The real world is, however, so ambiguous that it is often hard to know whether a given kaizen action will have a positive effect or not.

Sometimes your action may have a negative or zero effect in an averaged sense, even if you believe it is a good one. Assume a situation where you can control a variable to produce some effect on an outcome variable (the number of control variables will increase in what follows).

The author's hypothetical proposition is that the correlation coefficient indeed plays an important role here. One reason is that when the correlation is positive, your rational action is simply to increase the value of the control variable. And it seems very reasonable that you should select a variable strongly correlated with the output variable.

The problems still existing today are as follows:
- The meaning of a correlation value is not yet well known.
- The meaning of multiple regression analysis is also not yet well known (although when the correlation is weak, the reasonable choice of analysis is multiple regression or one of its elaborate derivatives).

The author found that correlation is very robust against `axial deformations' unless the variables contain outliers; rather, the sampled correlation coefficient is perturbed much more in many cases when N is less than 1000. The author also found geometrical backgrounds of the correlations in multiple regression analysis (perhaps R. A. Fisher already knew this, but nobody around me did), and these produce many insights.

(The robustness is not well analyzed at this moment, with only some pieces of analysis and numerical examples. The geometrical background is analyzed in its basic points, so the author is considering investigating parameter perturbations further.)

This page may need intensive proofreading by the author.
