why we use exploratory data analysis
![Page 1: WHY WE USE EXPLORATORY DATA ANALYSIS](https://reader034.vdocuments.mx/reader034/viewer/2022051316/56815b14550346895dc8c337/html5/thumbnails/1.jpg)
1
WHY WE USE EXPLORATORY DATA ANALYSIS
DATA
DO DATA COME FROM NORMAL DISTRIBUTION? YES / NO
KURTOSIS, SKEWNESS
TRANSFORMATIONS
QUANTILE (ROBUST) ESTIMATES
OUTLIERS, EXTREMES? YES / NO
WHY? CAN WE REMOVE THEM?
ESTIMATES BASED ON NORMAL DISTRIB.
QUANTILE (ROBUST) ESTIMATES
![Page 2: WHY WE USE EXPLORATORY DATA ANALYSIS](https://reader034.vdocuments.mx/reader034/viewer/2022051316/56815b14550346895dc8c337/html5/thumbnails/2.jpg)
2
METHODS OF EDA
Graphical:
dot plot
box plot
notched box plot
QQ plot
histogram
density plots
Tests:
tests of normality
minimal sample size
![Page 3: WHY WE USE EXPLORATORY DATA ANALYSIS](https://reader034.vdocuments.mx/reader034/viewer/2022051316/56815b14550346895dc8c337/html5/thumbnails/3.jpg)
3
DOT PLOT
![Page 4: WHY WE USE EXPLORATORY DATA ANALYSIS](https://reader034.vdocuments.mx/reader034/viewer/2022051316/56815b14550346895dc8c337/html5/thumbnails/4.jpg)
4
BOX PLOT
lower quartile
upper quartile
inner fences
outer fences
interquartile range (H)
number axis
median
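The quantities labelled on the box plot can be computed directly. The sketch below is only an illustration: it assumes inner fences at 1.5 H and outer fences at 3 H beyond the quartiles (a common convention that the slide does not state) and Python's inclusive quartile method. The names `box_plot_summary` and `outside` and the sample values are made up for this example.

```python
import statistics


def box_plot_summary(values, inner=1.5, outer=3.0):
    """Return quartiles, median, interquartile range H and both fence pairs."""
    # Quartile convention is an assumption; other packages interpolate differently.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    h = q3 - q1  # interquartile range, labelled H on the slide
    return {
        "lower_quartile": q1,
        "median": median,
        "upper_quartile": q3,
        "H": h,
        # Fence multipliers 1.5 and 3.0 are assumed, not taken from the slide.
        "inner_fences": (q1 - inner * h, q3 + inner * h),
        "outer_fences": (q1 - outer * h, q3 + outer * h),
    }


def outside(values, fences):
    """List the values lying beyond a (low, high) fence pair."""
    low, high = fences
    return [v for v in values if v < low or v > high]


data = [31, 35, 38, 40, 41, 43, 44, 46, 48, 52, 95]  # made-up sample
summary = box_plot_summary(data)
suspect = outside(data, summary["inner_fences"])
print(summary)
print("outside inner fences:", suspect)
```

Values beyond the inner fences are candidates for outliers, and values beyond the outer fences are candidates for extremes.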
![Page 5: WHY WE USE EXPLORATORY DATA ANALYSIS](https://reader034.vdocuments.mx/reader034/viewer/2022051316/56815b14550346895dc8c337/html5/thumbnails/5.jpg)
5
NOTCHED BOX PLOT
interval estimate of median:
$I_{D,H} = M \pm \dfrac{1.57\, R_F}{\sqrt{n}}$
M: median, $R_F$: interquartile range, n: sample size
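The notch limits follow from the median, the interquartile range and the sample size. The sketch below assumes that R_F is the difference between the upper and lower quartile and uses Python's inclusive quartile method, which may differ slightly from the convention behind the slide. The function name `median_interval` and the sample are hypothetical.

```python
import math
import statistics


def median_interval(values):
    """Return (I_D, I_H) = M -/+ 1.57 * R_F / sqrt(n)."""
    n = len(values)
    # Assumed quartile convention; hinges computed another way shift the result a little.
    q1, m, q3 = statistics.quantiles(values, n=4, method="inclusive")
    r_f = q3 - q1  # interquartile range
    half_width = 1.57 * r_f / math.sqrt(n)
    return m - half_width, m + half_width


data = [31, 35, 38, 40, 41, 43, 44, 46, 48, 52, 95]  # made-up sample
lower, upper = median_interval(data)
print(f"median interval: {lower:.2f} to {upper:.2f}")
```

If the notches of two box plots do not overlap, the two medians are commonly taken to differ.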
![Page 6: WHY WE USE EXPLORATORY DATA ANALYSIS](https://reader034.vdocuments.mx/reader034/viewer/2022051316/56815b14550346895dc8c337/html5/thumbnails/6.jpg)
6
Q-Q PLOT
X: theoretical quantiles of the analysed distribution
Y: sample quantiles
line: ideal coincidence of sample values and theoretical distribution
points: measured values
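The coordinates of a normal Q-Q plot can be sketched as follows. The plotting positions (i - 0.5) / n are one assumed convention among several, and `normal_qq_points` is a hypothetical helper name. Each ordered sample value is paired with the matching quantile of the standard normal distribution.

```python
from statistics import NormalDist


def normal_qq_points(values):
    """Pair standard normal quantiles (X) with ordered sample values (Y)."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting positions (i - 0.5) / n are an assumed convention.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


points = normal_qq_points([44, 31, 52, 38, 47, 41, 35])  # made-up sample
for x, y in points:
    print(f"{x:6.3f}  {y}")
```

For data from a normal distribution the points lie close to a straight line. Systematic curvature indicates skewness or tails that are heavier or lighter than normal.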
![Page 7: WHY WE USE EXPLORATORY DATA ANALYSIS](https://reader034.vdocuments.mx/reader034/viewer/2022051316/56815b14550346895dc8c337/html5/thumbnails/7.jpg)
7
Q-Q PLOT
Figure: normal Q-Q plot; x-axis: observed value (25 to 65); y-axis: expected normal value (-3 to 3)
![Page 8: WHY WE USE EXPLORATORY DATA ANALYSIS](https://reader034.vdocuments.mx/reader034/viewer/2022051316/56815b14550346895dc8c337/html5/thumbnails/8.jpg)
8
Q-Q PLOT
Figure: normal Q-Q plot; x-axis: observed value (-20 to 120); y-axis: expected normal value (-3 to 3)
![Page 9: WHY WE USE EXPLORATORY DATA ANALYSIS](https://reader034.vdocuments.mx/reader034/viewer/2022051316/56815b14550346895dc8c337/html5/thumbnails/9.jpg)
9
Q-Q plot
right-sided: skewed to the left
left-sided: skewed to the right
platykurtic („flat“)
leptokurtic („steep“)
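The shapes seen in a Q-Q plot can be cross-checked numerically with the moment coefficients of skewness and kurtosis. The sketch below uses the plain (biased) moment estimators, which is an assumption, since the slides do not say which estimator is intended. `shape_moments` is a hypothetical name.

```python
def shape_moments(values):
    """Return (skewness, excess kurtosis) from central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return skewness, excess_kurtosis


symmetric = [1, 2, 3, 4, 5]          # made-up, evenly spread values
long_right_tail = [1, 1, 1, 2, 10]   # made-up, one large value

print(shape_moments(symmetric))
print(shape_moments(long_right_tail))
```

A positive skewness means a longer right tail and a negative one a longer left tail. Negative excess kurtosis corresponds to a platykurtic shape and positive excess kurtosis to a leptokurtic one.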
![Page 10: WHY WE USE EXPLORATORY DATA ANALYSIS](https://reader034.vdocuments.mx/reader034/viewer/2022051316/56815b14550346895dc8c337/html5/thumbnails/10.jpg)
10
![Page 11: WHY WE USE EXPLORATORY DATA ANALYSIS](https://reader034.vdocuments.mx/reader034/viewer/2022051316/56815b14550346895dc8c337/html5/thumbnails/11.jpg)
11
![Page 12: WHY WE USE EXPLORATORY DATA ANALYSIS](https://reader034.vdocuments.mx/reader034/viewer/2022051316/56815b14550346895dc8c337/html5/thumbnails/12.jpg)
12
HISTOGRAM
Figure: histogram of the variable TLOUSTKY (thicknesses), Sheet1; x-axis: TLOUSTKY (20 to 70); y-axis: frequency (0 to 30)
![Page 13: WHY WE USE EXPLORATORY DATA ANALYSIS](https://reader034.vdocuments.mx/reader034/viewer/2022051316/56815b14550346895dc8c337/html5/thumbnails/13.jpg)
13
HISTOGRAM
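correct width of interval (set by the number of class intervals L for n observations):
$L = \operatorname{int}\left(2.46\,(n-1)^{0.4}\right)$
$L = \operatorname{int}\left(2\sqrt{n}\right)$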
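Both rules for the number of class intervals are easy to evaluate. The sketch below assumes that "int" means truncation to an integer and that the interval width is the data range divided by L. The function names are made up for illustration.

```python
import math


def class_count_power(n):
    """L = int(2.46 * (n - 1) ** 0.4)."""
    return int(2.46 * (n - 1) ** 0.4)


def class_count_sqrt(n):
    """L = int(2 * sqrt(n))."""
    return int(2 * math.sqrt(n))


def class_width(values, class_count):
    """Interval width, assuming equal-width classes over the data range."""
    return (max(values) - min(values)) / class_count


n = 100
print(class_count_power(n), class_count_sqrt(n))
print(class_width([20, 70], 10))
```

For n = 100 the two rules give 15 and 20 classes, so they should be read as guidance rather than as a unique answer.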
![Page 14: WHY WE USE EXPLORATORY DATA ANALYSIS](https://reader034.vdocuments.mx/reader034/viewer/2022051316/56815b14550346895dc8c337/html5/thumbnails/14.jpg)
14
HISTOGRAM – kernel density function
Figure: density estimate of the variable TLOUSTKY (thicknesses), Sheet1; x-axis: TLOUSTKY (10 to 80); y-axis: density (0.000 to 0.060)
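A kernel density estimate replaces the histogram bars with a smooth curve. The following is a minimal sketch under stated assumptions: a Gaussian kernel and the normal reference rule of thumb for the bandwidth (1.06 · s · n^(-1/5)). The software behind the slide may use a different kernel or bandwidth. `gaussian_kde` is a hypothetical helper, not a library call.

```python
import math
import statistics


def gaussian_kde(values, bandwidth=None):
    """Return a function x -> estimated density, using a Gaussian kernel."""
    n = len(values)
    if bandwidth is None:
        # Normal reference rule of thumb; an assumed default.
        bandwidth = 1.06 * statistics.stdev(values) * n ** -0.2
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values
        )

    return density


thicknesses = [30, 35, 41, 44, 47, 52, 60]  # made-up sample
density = gaussian_kde(thicknesses)
for x in range(10, 81, 10):
    print(x, round(density(x), 4))
```

A smaller bandwidth follows the data more closely and a larger one smooths more, much like the choice of interval width in a histogram.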
![Page 15: WHY WE USE EXPLORATORY DATA ANALYSIS](https://reader034.vdocuments.mx/reader034/viewer/2022051316/56815b14550346895dc8c337/html5/thumbnails/15.jpg)
15
TRANSFORMATION
Aim of transformation:
reduction of variance
better level of symmetry (normality) of data
Transformation function:
non-linear function
monotonic function
![Page 16: WHY WE USE EXPLORATORY DATA ANALYSIS](https://reader034.vdocuments.mx/reader034/viewer/2022051316/56815b14550346895dc8c337/html5/thumbnails/16.jpg)
16
TRANSFORMATION – basic concept
Figure: transformed data (-0.4 to 0.8) plotted against original data (tree-ring widths in mm, 0 to 3.5); marked: mean of original data; transformed mean and its projection to original data set
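The idea of projecting the transformed mean back to the original scale can be sketched in a few lines. The logarithm is assumed as the transformation here purely for illustration, because the slide does not name the function used in the figure. The tree-ring widths below are made up.

```python
import math
import statistics


def retransformed_mean(values, forward=math.log, inverse=math.exp):
    """Mean computed on the transformed scale and projected back."""
    transformed = [forward(v) for v in values]
    return inverse(statistics.fmean(transformed))


widths = [0.4, 0.6, 0.7, 0.9, 1.1, 1.4, 3.2]  # made-up tree-ring widths in mm
plain_mean = statistics.fmean(widths)
back_projected = retransformed_mean(widths)
print(f"mean of original data: {plain_mean:.3f}")
print(f"retransformed mean:    {back_projected:.3f}")
```

For right-skewed data the retransformed mean lies below the arithmetic mean, closer to the bulk of the observations.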
![Page 17: WHY WE USE EXPLORATORY DATA ANALYSIS](https://reader034.vdocuments.mx/reader034/viewer/2022051316/56815b14550346895dc8c337/html5/thumbnails/17.jpg)
17
TRANSFORMATION – logarithmic transformation
Figure: two histograms (y-axis: count); original data x (C2, 0 to 800) and transformed data ln x (C7, 3.0 to 7.0)
![Page 18: WHY WE USE EXPLORATORY DATA ANALYSIS](https://reader034.vdocuments.mx/reader034/viewer/2022051316/56815b14550346895dc8c337/html5/thumbnails/18.jpg)
18
TRANSFORMATION – power transformation
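$$
g(x) =
\begin{cases}
x^{\lambda} & \text{for } \lambda > 0 \\
\ln x & \text{for } \lambda = 0 \\
-x^{\lambda} & \text{for } \lambda < 0
\end{cases}
$$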
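A simple power transformation can be sketched as below. The form used is an assumption based on the usual three-case definition: x^λ for λ > 0, ln x for λ = 0 and -x^λ for λ < 0, where the minus sign keeps the function increasing. `simple_power_transform` is a hypothetical name.

```python
import math


def simple_power_transform(x, lam):
    """Three-case power transformation for positive x (assumed form)."""
    if x <= 0:
        raise ValueError("x must be positive")
    if lam > 0:
        return x ** lam
    if lam == 0:
        return math.log(x)
    # Negative exponent: the minus sign preserves the ordering of the data.
    return -(x ** lam)


for lam in (0.5, 0, -1):
    print(lam, [round(simple_power_transform(x, lam), 3) for x in (1, 4, 9)])
```

Because the function is monotonic for every λ, quantiles such as the median can be transformed and back-transformed without changing their rank.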
![Page 19: WHY WE USE EXPLORATORY DATA ANALYSIS](https://reader034.vdocuments.mx/reader034/viewer/2022051316/56815b14550346895dc8c337/html5/thumbnails/19.jpg)
19
TRANSFORMATION – Box-Cox
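$$
g(x) =
\begin{cases}
\dfrac{x^{\lambda} - 1}{\lambda} & \text{for } \lambda \neq 0 \\
\ln x & \text{for } \lambda = 0
\end{cases}
$$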
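The Box-Cox transformation is short enough to write out directly. The sketch below assumes positive data and the standard two-case definition, (x^λ - 1)/λ for λ ≠ 0 and ln x for λ = 0. The function name `box_cox` is chosen for this example only.

```python
import math


def box_cox(x, lam):
    """Box-Cox transformation of a single positive value."""
    if x <= 0:
        raise ValueError("Box-Cox transformation needs x > 0")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


print(box_cox(4, 0.5))    # square-root-like case
print(box_cox(5, 1e-6))   # close to ln 5
print(math.log(5))
```

Subtracting 1 and dividing by λ makes the family continuous in λ, so the logarithm appears as the limiting case for λ approaching 0.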
![Page 20: WHY WE USE EXPLORATORY DATA ANALYSIS](https://reader034.vdocuments.mx/reader034/viewer/2022051316/56815b14550346895dc8c337/html5/thumbnails/20.jpg)
20
TRANSFORMATION – Box-Cox
![Page 21: WHY WE USE EXPLORATORY DATA ANALYSIS](https://reader034.vdocuments.mx/reader034/viewer/2022051316/56815b14550346895dc8c337/html5/thumbnails/21.jpg)
21
TRANSFORMATION – estimate of optimal λ
logarithm of likelihood function for various values of λ
optimal λ
interval estimate of parameter λ
λ = 1 is not included in the interval estimate of λ. It means that the transformation will probably be successful.
Figure marks: λ = 1.00; cut-off level max LF - 0.5 * quantile of χ²
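The estimate of the optimal λ and its interval can be sketched with a grid search over the profile log-likelihood of the Box-Cox transformation. Several things here are assumptions rather than the method behind the slide: the grid from -3 to 3, the 95 % level, one degree of freedom for the χ² quantile, and the synthetic log-normal-shaped sample. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def box_cox(x, lam):
    """Box-Cox transformation of a single positive value."""
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam


def log_likelihood(values, lam):
    """Profile log-likelihood of lambda (additive constants dropped)."""
    n = len(values)
    y = [box_cox(v, lam) for v in values]
    mean_y = sum(y) / n
    var_y = sum((t - mean_y) ** 2 for t in y) / n  # maximum-likelihood variance
    return -0.5 * n * math.log(var_y) + (lam - 1.0) * sum(math.log(v) for v in values)


def estimate_lambda(values, grid=None, alpha=0.05):
    """Return (optimal lambda, (lower, upper)) from a grid search."""
    if grid is None:
        grid = [i / 100 for i in range(-300, 301)]  # assumed search range
    profile = [(lam, log_likelihood(values, lam)) for lam in grid]
    best_lam, max_ll = max(profile, key=lambda p: p[1])
    # Chi-square quantile with 1 degree of freedom = squared normal quantile.
    chi2_quantile = NormalDist().inv_cdf(1 - alpha / 2) ** 2
    cutoff = max_ll - 0.5 * chi2_quantile
    # Assumes the accepted lambdas form one connected range on the grid.
    inside = [lam for lam, ll in profile if ll >= cutoff]
    return best_lam, (min(inside), max(inside))


# Synthetic sample whose logarithms are exact normal scores (illustration only).
sample = [math.exp(NormalDist().inv_cdf((i - 0.5) / 50)) for i in range(1, 51)]
best, (low, high) = estimate_lambda(sample)
print(f"optimal lambda: {best:.2f}, interval: {low:.2f} to {high:.2f}")
print("lambda = 1 inside interval:", low <= 1.0 <= high)
```

When λ = 1 falls outside the interval, leaving the data untransformed is not supported by the likelihood, which is the situation the slide describes.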