chapter 12 Nonparametric Statistics (nonparametric analysis) › ~hokim › int ›...

Post on 23-Jun-2020








Chapter 12 Nonparametric Statistics (Nonparametric Analysis)


12.1 Introduction

• Parametric approach
– Assumes a distribution for the population.
– The distribution is a function of its parameters, so knowing the parameters determines the distribution (and hence the characteristics of the population) completely.
– Estimation and testing of the parameters are the main problems.
– If the distributional assumption is wrong, the whole argument fails: every result of the analysis becomes questionable.

• Nonparametric approach
– Does not assume a distribution for the population (distribution-free method).
– Uses the order (ranks) of the data.
– If the parametric assumptions are valid, the parametric method is more efficient (smaller variance and higher power, i.e. smaller p-values when an effect is present).

Data                  Mean    Median
1, 2, 3, 4, 5         3       3
1, 2, 3, 4, 5, 100    19.17   3.5

The median is robust to outliers, whereas the mean is sensitive to them: the median stays at 3.5 even if 100 is replaced by 10,000,000.
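The table above can be reproduced with a short Python sketch (Python is our choice for the examples in this chapter; the slides themselves use R and SAS). It uses only the standard `statistics` module, and the helper name `summarize` is our own.

```python
from statistics import mean, median

def summarize(data):
    """Return the (mean, median) of a list of numbers."""
    return mean(data), median(data)

datasets = {
    "1..5": [1, 2, 3, 4, 5],
    "1..5 + 100": [1, 2, 3, 4, 5, 100],
    "1..5 + 10,000,000": [1, 2, 3, 4, 5, 10_000_000],
}

for label, data in datasets.items():
    m, med = summarize(data)
    # the mean jumps with the outlier; the median does not move
    print(f"{label:20s} mean = {m:12.2f}   median = {med}")
```

Replacing 100 by 10,000,000 changes the mean by several orders of magnitude while the median stays at 3.5.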

Nonparametric methods typically use the order (ranks) of the data, not the values themselves.

Parametric vs. nonparametric methods

• Nonparametric methods do not depend on a parametric distributional assumption (such as normality) for the data.

• They typically use ranks rather than the mean and variance of the data.

• If the distributional assumptions (e.g. normality) are satisfied, nonparametric methods are less efficient (larger variance).

• They give robust results: they are insensitive to outliers.

12.2 Measurement scales

• Nominal scale: categories only, e.g. male, female; Seoul, Busan (NY, LA)

• Ordinal scale: order is meaningful, e.g. high, medium, low

• Interval scale: order is meaningful, and differences are meaningful too

• Ratio scale: ratios are meaningful as well

12.3 Sign test

Ex 12.3.1

• Hypotheses: H0: median = 102, Ha: median ≠ 102

Student No.  Score  Student No.  Score

1 75 9 82

2 90 10 103

3 86 11 88

4 110 12 124

5 115 13 110

6 94 14 77

7 132 15 99

8 74

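• Decision rule (a + is a score above 102, a − is a score below 102):
– Ha: P(+) > P(−), i.e. median > 102: reject H0 if there are enough +'s.
– Ha: P(+) < P(−), i.e. median < 102: reject H0 if there are enough −'s.
– Ha: P(+) ≠ P(−), i.e. median ≠ 102: reject H0 if there are enough +'s or enough −'s.

In Ex 12.3.1 (H0: median = 102, Ha: P(+) ≠ P(−)), the number of +'s out of 15 follows Bin(15, 1/2) under H0.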








Scores above (+) or below (−) the hypothesized median (102)

Student No.        1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Observation − 102  − − − + + − + − − + − + + − −

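• Test statistic: K = number of +'s = 6 (and there are 9 −'s). Under H0, K ~ Bin(15, 0.5), so

$$P(K \le 6 \mid n = 15,\ p = 0.5) = \binom{15}{0}(0.5)^0(0.5)^{15} + \binom{15}{1}(0.5)^1(0.5)^{14} + \cdots + \binom{15}{6}(0.5)^6(0.5)^9 = 0.3036$$

For the two-sided alternative the p-value is 2 × 0.3036 = 0.6072 > 0.05, so we cannot reject H0.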
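A minimal sketch of the exact sign test used in Ex 12.3.1, assuming the 15 scores above and H0: median = 102. Values equal to the hypothesized median are dropped (none occur here), and the two-sided p-value doubles the smaller binomial tail, matching the 0.3036 and 0.6072 figures. `sign_test` is a hypothetical helper of our own, not a library function.

```python
from math import comb

def sign_test(data, m0):
    """Exact sign test of H0: median = m0 (values equal to m0 are dropped).
    Returns (n_plus, n_minus, smaller-tail probability, two-sided p-value)."""
    plus = sum(1 for v in data if v > m0)
    minus = sum(1 for v in data if v < m0)
    n = plus + minus
    k = min(plus, minus)
    # P(K <= k) for K ~ Bin(n, 1/2)
    one_tail = sum(comb(n, r) for r in range(k + 1)) / 2 ** n
    return plus, minus, one_tail, min(1.0, 2 * one_tail)

scores = [75, 90, 86, 110, 115, 94, 132, 74, 82, 103, 88, 124, 110, 77, 99]
plus, minus, one_tail, two_sided = sign_test(scores, 102)
print(plus, minus, round(one_tail, 4), round(two_sided, 4))
```

For a one-sided alternative, the tail should be taken in the direction of Ha rather than at the smaller count.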

[Sign test for paired comparisons]

The sign test can also be applied to paired observations (like the paired t-test): it uses only whether the difference within each pair is + or −.

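data sign;
  input score @@;
  datalines;
75 90 86 110 115 94 132 74 82 103 88 124 110 77 99
;
run;

/* mu0=102 sets the hypothesized location for the tests for location, including the sign test */
proc univariate data=sign mu0=102;
  var score;
run;

The sign-test p-value in the output (0.6072) is two-sided; the one-sided p-value is 0.6072/2 = 0.3036.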

• Ex 12.3.2 (comparing paired groups): paired data

• Hypotheses

H0: the median of the differences X_i − Y_i is 0, i.e. P(+) = P(−)

Ha: the median of the differences is negative, i.e. P(+) < P(−)

Dental hygiene score

id  Instructed in toothbrushing (X_i)  Not instructed (Y_i)
1   1.6   2
2   2     2
3   3.7   4.1
4   3.5   2.4
5   3.3   4.2
6   2.4   3.6
7   2     3.5
8   1.5   3
9   1.5   2.5
10  2.1   2.5
11  3.6   2.5
12  2.3   2.5

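• Test statistic: K = number of +'s = 2, out of the n = 11 nonzero differences (pair 2, with X_i − Y_i = 0, is dropped)

$$P(K \le 2 \mid n = 11,\ p = 0.5) = \sum_{r=0}^{2} \binom{11}{r} (0.5)^{11} = 0.0327$$

pbinom(2, 11, 0.5) = 0.0327 < 0.05

→ Reject H0 at α = 0.05.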

[Right-tailed sign test] (sign test using the right tail)

[Sample size]

id                 1 2 3 4 5 6 7 8 9 10 11 12
Sign of X_i − Y_i  − 0 − + − − − − − − + −

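data pair;
  input edu noedu;
  diff = noedu - edu;   /* not instructed minus instructed, i.e. Y - X */
  datalines;
1.5 2.0
2.0 2.0
3.5 4.0
3.0 2.5
3.5 4.0
2.5 3.0
2.0 3.5
1.5 3.0
1.5 2.5
2.0 2.5
3.0 2.5
2.0 2.5
;
run;

/* The Tests for Location table includes the sign test of H0: median(diff) = 0 */
proc univariate data=pair;
  var diff;
run;

The sign-test p-value in the output (0.0654) is two-sided. Because diff = Y − X, the alternative of Ex 12.3.2 corresponds to median(diff) > 0, and the one-sided p-value is 0.0654/2 = 0.0327.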
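The paired version reduces to the same binomial calculation on the signs of X_i − Y_i. This sketch uses the data exactly as listed in the SAS program above (edu = X, noedu = Y), drops the zero difference, and computes the one-sided p-value for Ha: P(+) < P(−). The variable names are our own.

```python
from math import comb

# Ex 12.3.2 data as listed in the SAS program: instructed (X) and not instructed (Y)
edu   = [1.5, 2.0, 3.5, 3.0, 3.5, 2.5, 2.0, 1.5, 1.5, 2.0, 3.0, 2.0]
noedu = [2.0, 2.0, 4.0, 2.5, 4.0, 3.0, 3.5, 3.0, 2.5, 2.5, 2.5, 2.5]

diffs = [x - y for x, y in zip(edu, noedu)]   # X_i - Y_i
plus = sum(d > 0 for d in diffs)
minus = sum(d < 0 for d in diffs)
n = plus + minus                              # zero differences are dropped

# Ha: P(+) < P(-), so a small number of +'s is evidence against H0
p_one_sided = sum(comb(n, r) for r in range(plus + 1)) / 2 ** n
print(plus, minus, n, round(p_one_sided, 4))
```

This is the same value as pbinom(2, 11, 0.5) and as half of the two-sided SAS p-value.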

12.4 Wilcoxon signed-rank test for location (Wilcoxon's signed rank test)

Observation  d_i = x_i − 5.05  Rank of |d_i|  Signed rank
4.90  −0.15   1   −1
4.10  −0.95   7   −7
6.73   1.68   9    9
7.27   2.22  13   13
7.42   2.37  14   14
7.50   2.45  15   15
6.76   1.71  10   10
4.64  −0.41   3   −3
5.98   0.93   6    6
3.14  −1.91  12  −12
3.24  −1.81  11  −11
5.80   0.75   5    5
6.17   1.12   8    8
5.39   0.34   2    2
5.78   0.73   4    4

W+ = 86, W− = 34, W = W+ − W− = 52

H0: μ = 5.05, Ha: μ ≠ 5.05. Test statistic: W = W+ − W− = 52. Reject H0 if W is too large or too small.

> wilcox.test(c(4.90,4.1,6.73,7.27,7.42,7.5,6.76,4.64,5.98,3.14,3.24,5.8,6.17,5.39,5.78), mu=5.05)

The p-value is 0.1514, so H0 is not rejected at α = 0.05.
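To make the signed-rank calculation concrete, here is a sketch that ranks |d_i|, forms W+ and W−, and computes the exact two-sided p-value by enumerating the null distribution of W+ (under H0 each rank is equally likely to carry a + or a − sign). It assumes no zero differences and no ties among |d_i|, which holds for these 15 observations; the function names are our own.

```python
def signed_rank(data, m0):
    """Return (W+, W-) for H0: location = m0.
    Assumes no zero differences and no tied |d_i|."""
    d = [x - m0 for x in data]
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0] * len(d)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    w_minus = sum(r for r, di in zip(ranks, d) if di < 0)
    return w_plus, w_minus

def exact_two_sided_p(w_plus, n):
    """Exact two-sided p-value from the null distribution of W+."""
    max_sum = n * (n + 1) // 2
    counts = [1] + [0] * max_sum   # counts[s] = number of subsets of {1..n} with sum s
    for r in range(1, n + 1):
        for s in range(max_sum, r - 1, -1):
            counts[s] += counts[s - r]
    total = 2 ** n
    upper = sum(counts[w_plus:]) / total       # P(W+ >= w_plus)
    lower = sum(counts[:w_plus + 1]) / total   # P(W+ <= w_plus)
    return min(1.0, 2 * min(upper, lower))

values = [4.90, 4.10, 6.73, 7.27, 7.42, 7.50, 6.76, 4.64,
          5.98, 3.14, 3.24, 5.80, 6.17, 5.39, 5.78]
w_plus, w_minus = signed_rank(values, 5.05)
p = exact_two_sided_p(w_plus, len(values))
print(w_plus, w_minus, w_plus - w_minus, round(p, 4))
```

The exact enumeration agrees with the p-value reported by `wilcox.test` for these data.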

12.5 Median test

• H0: median (rural) = median (urban)

Mental health scores

Urban (n = 16): 35, 26, 27, 21, 27, 38, 23, 25, 25, 27, 45, 46, 33, 26, 46, 41

Rural (n = 12): 29, 50, 43, 22, 42, 47, 42, 32, 50, 37, 34, 31

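The combined median of the 28 scores is 33.5.

                          Urban  Rural  Total
Number above the median     6      8     14
Number below the median    10      4     14
Total                      16     12     28

• Under H0, the rows and columns of the 2×2 table are independent, so the chi-square test of independence applies. With a = 6, b = 8 (first row) and c = 10, d = 4 (second row):

$$\chi^2 = \frac{n(ad - bc)^2}{(a+c)(b+d)(a+b)(c+d)} = \frac{28\,(6 \cdot 4 - 8 \cdot 10)^2}{(16)(12)(14)(14)} = 2.33 < 3.841 = \chi^2_{0.95,\,1}$$

Since 2.33 < 2.706 = χ²(0.90, 1), the p-value is greater than 0.10.

• ∴ Do not reject H0: there is no evidence that the medians of the two groups differ.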
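A sketch of the median test: it computes the pooled median, builds the 2×2 table of counts above and below it, and evaluates the chi-square formula. The urban and rural lists are read from the columns of the mental-health table (16 urban and 12 rural scores); counting values equal to the median as "above" is our assumption and does not matter here, since no score equals the pooled median.

```python
from statistics import median

# Ex 12.5 mental health scores, read from the columns of the table
urban = [35, 26, 27, 21, 27, 38, 23, 25, 25, 27, 45, 46, 33, 26, 46, 41]
rural = [29, 50, 43, 22, 42, 47, 42, 32, 50, 37, 34, 31]

pooled_median = median(urban + rural)

# 2x2 table: first row = at or above the pooled median, second row = below it
a = sum(v >= pooled_median for v in urban)
b = sum(v >= pooled_median for v in rural)
c = sum(v < pooled_median for v in urban)
d = sum(v < pooled_median for v in rural)
n = a + b + c + d

chi2 = n * (a * d - b * c) ** 2 / ((a + c) * (b + d) * (a + b) * (c + d))
print(pooled_median, (a, b, c, d), round(chi2, 2))
```

The statistic is compared with the chi-square critical value for one degree of freedom (3.841 at α = 0.05).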




12.6 Mann-Whitney test

• Assumptions: the two samples have sizes n and m, and
① they are drawn independently and at random;
② the measurement scale is at least ordinal;
③ the two populations have exactly the same distribution shape and differ only in their medians.

• Ex 12.6.1

Weight
Group 1 (X)  Group 2 (Y)  Group 1 (X)  Group 2 (Y)
252  254  185  280
240  164  310  264
205  288  212  270
200  138  238  210
170  240  184  192
170  217  136  126
320  240  200  220
148  302  270  295
214  312

Can we conclude that the population median of group 1 is smaller than that of group 2?

H0: M_X ≥ M_Y vs HA: M_X < M_Y

Ranks of the pooled sample (tied values receive the average of their ranks):

Group 1 (X), n = 17: 136 (2), 148 (4), 170 (6.5), 170 (6.5), 184 (8), 185 (9), 200 (11.5), 200 (11.5), 205 (13), 212 (15), 214 (16), 238 (19), 240 (21), 252 (23), 270 (26.5), 310 (32), 320 (34)

Group 2 (Y), m = 17: 126 (1), 138 (3), 164 (5), 192 (10), 210 (14), 217 (17), 220 (18), 240 (21), 240 (21), 254 (24), 264 (25), 270 (26.5), 280 (28), 288 (29), 295 (30), 302 (31), 312 (33)

Rank sum of X: W = 258.5

$$U = W - \frac{n(n+1)}{2} = 258.5 - \frac{17 \cdot 18}{2} = 105.5$$

Rule: reject H0 if U is small enough. Under H0, E(U) = nm/2 = 144.5 and the tie-corrected standard deviation of U is about 29.02, so z = (105.5 − 144.5)/29.02 ≈ −1.34 and the one-sided p-value is about 0.09. The evidence is not enough to reject H0.

> install.packages('coin')
> library(coin)
> xx <- c(252,240,205,200,170,170,320,148,214,185,310,212,238,184,136,200,270)
> yy <- c(254,164,288,138,240,217,240,302,312,280,264,270,210,192,126,220,295)
> dat <- data.frame(val = c(xx, yy), group = factor(rep(1:2, c(17, 17))))
> wilcox_test(val ~ group, data = dat, distribution = 'exact')   # two-sided by default
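The rank-sum and U computation can be checked with a short sketch. It assigns mid-ranks to ties, computes W and U for group 1, and uses the normal approximation with the standard tie correction for the one-sided alternative M_X < M_Y. The x and y lists are read column by column from the weight table (17 values each); `midranks` is our own helper, and the normal approximation stands in for the exact test.

```python
import math

def midranks(values):
    """Rank the values 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # extend the block while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1   # mean of the 1-based positions start..end
        for pos in range(start, end + 1):
            ranks[order[pos]] = average
        start = end + 1
    return ranks

# Ex 12.6.1, read column by column from the weight table
x = [252, 240, 205, 200, 170, 170, 320, 148, 214, 185, 310, 212, 238, 184, 136, 200, 270]
y = [254, 164, 288, 138, 240, 217, 240, 302, 312, 280, 264, 270, 210, 192, 126, 220, 295]

n, m = len(x), len(y)
N = n + m
pooled = x + y
ranks = midranks(pooled)

W = sum(ranks[:n])              # rank sum of group 1
U = W - n * (n + 1) / 2         # Mann-Whitney U for group 1

# normal approximation with the usual tie correction
tie_sizes = [pooled.count(v) for v in set(pooled)]
var_u = n * m / 12 * ((N + 1) - sum(t ** 3 - t for t in tie_sizes) / (N * (N - 1)))
z = (U - n * m / 2) / math.sqrt(var_u)
p_less = 0.5 * math.erfc(-z / math.sqrt(2))   # P(Z <= z), for Ha: M_X < M_Y

print(W, U, round(z, 3), round(p_less, 3))
```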

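12.7 Kolmogorov-Smirnov (K-S) goodness-of-fit test

• Is the cumulative distribution function of the population the same as the hypothesized one? ⇔ Are the two distributions the same?

• Test statistic

F_S(x): sample (empirical) cumulative distribution function = (number of sample values ≤ x)/n

F_T(x) = Pr(X ≤ x): hypothesized (population) cumulative distribution function

H0: F(x) = F_T(x) for all x

HA: F(x) ≠ F_T(x) for at least one x

$$D = \sup_x \left| F_S(x) - F_T(x) \right|$$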

• Computation, Ex 12.7.1: Is fasting blood glucose normally distributed with mean 80 and standard deviation 6? (Glucose level ~ normal distribution?)

75 92 80 80 83 72

83 77 81 77 75 81

80 92 72 77 78 76

77 86 77 92 80 78

67 78 92 67 80 81

87 76 80 87 77 86

x   Frequency  Cumulative frequency  F_S(x)
67  2   2  0.0556
72  2   4  0.1111
75  2   6  0.1667
76  2   8  0.2222
77  6  14  0.3889
78  3  17  0.4722
80  6  23  0.6389
81  3  26  0.7222
83  2  28  0.7778
86  2  30  0.8333
87  2  32  0.8889
92  4  36  1.0000
Total 36

x   z = (x − 80)/6  F_T(x)
67  −2.17  0.0150
72  −1.33  0.0918
75  −0.83  0.2033
76  −0.67  0.2514
77  −0.50  0.3085
78  −0.33  0.3707
80   0.00  0.5000
81   0.17  0.5675
83   0.50  0.6915
86   1.00  0.8413
87   1.17  0.8790
92   2.00  0.9772

x   F_S(x)  F_T(x)  |F_S(x) − F_T(x)|
67  0.0556  0.0150  0.0406
72  0.1111  0.0918  0.0193
75  0.1667  0.2033  0.0366
76  0.2222  0.2514  0.0292
77  0.3889  0.3085  0.0804
78  0.4722  0.3707  0.1015
80  0.6389  0.5000  0.1389
81  0.7222  0.5675  0.1547
83  0.7778  0.6915  0.0863
86  0.8333  0.8413  0.0080
87  0.8889  0.8790  0.0099
92  1.0000  0.9772  0.0228

D = 0.1547 (at x = 81) < 0.221 (the critical value), so H0 is not rejected.

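> xx <- c(75,92,80,80,83,72,83,77,81,77,75,81,80,92,72,77,78,76,77,86,77,92,80,78,
+         67,78,92,67,80,81,87,76,80,87,77,86)
> ks.test(xx, 'pnorm', mean = 80, sd = 6)

        One-sample Kolmogorov-Smirnov test

data:  xx
D = 0.15604, p-value = 0.3447
alternative hypothesis: two-sided

Warning message:
In ks.test(xx, "pnorm", mean = 80, sd = 6) :
  ties should not be present for the Kolmogorov-Smirnov test

Because the data contain ties, an approximate p-value is used. R's D = 0.15604 differs slightly from the hand value 0.1547 because the hand calculation rounds z to two decimals before looking up F_T(x).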
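The K-S statistic can be computed directly from the sorted sample. This sketch evaluates the hypothesized normal CDF (mean 80, SD 6, via `math.erf`) at each observation and takes the largest gap on either side of each step of F_S; with ties, taking the maximum over the sorted positions still yields the supremum. It reproduces R's D but does not compute a p-value, and the helper names are our own.

```python
import math

def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2) at x."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def ks_statistic(data, cdf):
    """One-sample K-S statistic D = sup |F_S(x) - F_T(x)|."""
    xs = sorted(data)
    n = len(xs)
    d_plus = max((i + 1) / n - cdf(v) for i, v in enumerate(xs))   # F_S above F_T
    d_minus = max(cdf(v) - i / n for i, v in enumerate(xs))        # F_T above F_S just before a step
    return max(d_plus, d_minus)

glucose = [75, 92, 80, 80, 83, 72, 83, 77, 81, 77, 75, 81, 80, 92, 72, 77, 78, 76,
           77, 86, 77, 92, 80, 78, 67, 78, 92, 67, 80, 81, 87, 76, 80, 87, 77, 86]
D = ks_statistic(glucose, lambda v: normal_cdf(v, 80, 6))
print(round(D, 5))
```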

12.8 Kruskal-Wallis one-way ANOVA

• Hypotheses

H0: the k samples come from the same distribution.

HA: at least one sample comes from a distribution with a larger or smaller location parameter than the others.

• Let R_1, R_2, ..., R_k be the rank sums of the k groups and R̄_j = R_j/n_j the mean rank of group j. Under H0 the mean ranks are similar to each other and to the overall mean rank R̄ = (n + 1)/2. H is built from the squared deviations (R̄_j − R̄)², so if the R̄_j are similar, these deviations are small, H is small, and we cannot reject H0.

$$H = \frac{12}{n(n+1)} \sum_{j=1}^{k} \frac{R_j^2}{n_j} - 3(n+1) \;\sim\; \chi^2_{k-1} \quad \text{(approximately, under } H_0\text{)}$$

• Ex 12.8.1



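Original values (groups 1, 2, 3)    Ranks (groups 1, 2, 3)
12.01   3.67  55.63                  5   2  13
29.44   4.05  27.88                  9   3   7
28.02   6.49  66.81                  8   4  15
38.33  21.12  46.27                 11   6  12
55.91   1.11  31.19                 14   1  10
Rank sums R_j                       47  16  57

$$H = \frac{12}{15(15+1)} \left[ \frac{47^2}{5} + \frac{16^2}{5} + \frac{57^2}{5} \right] - 3(15+1) = 9.14$$

From the exact table (page 486), p < 0.009.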

> xx <- c(12.01,3.67,55.63,29.44,4.05,27.88,28.02,6.49,66.81,38.33,21.12,46.27,55.91,1.11,31.19)
> dat <- data.frame(val = xx, group = factor(rep(1:3, 5)))
> library(coin)
> kruskal_test(val ~ group, data = dat)   # coin package; base R's kruskal.test() gives the same statistic

        Asymptotic Kruskal-Wallis Test

data:  val by group (1, 2, 3)
chi-squared = 9.14, df = 2, p-value = 0.01036
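A sketch of the Kruskal-Wallis statistic for Ex 12.8.1: rank the pooled values, sum the ranks in each group, and apply the H formula. Because there are no ties, a rank is simply a position in the sorted pooled sample; for df = 2 the chi-square upper tail is exactly exp(−H/2), which gives the asymptotic p-value in the R output. `kruskal_wallis` is our own helper.

```python
import math

def kruskal_wallis(groups):
    """Return (rank sums, H). Assumes no tied values across the pooled sample."""
    pooled = sorted(v for g in groups for v in g)
    rank = {v: i + 1 for i, v in enumerate(pooled)}   # valid only without ties
    n = len(pooled)
    rank_sums = [sum(rank[v] for v in g) for g in groups]
    h = 12 / (n * (n + 1)) * sum(r ** 2 / len(g) for r, g in zip(rank_sums, groups)) - 3 * (n + 1)
    return rank_sums, h

groups = [
    [12.01, 29.44, 28.02, 38.33, 55.91],   # group 1
    [3.67, 4.05, 6.49, 21.12, 1.11],       # group 2
    [55.63, 27.88, 66.81, 46.27, 31.19],   # group 3
]
rank_sums, H = kruskal_wallis(groups)
p = math.exp(-H / 2)   # chi-square upper tail with k - 1 = 2 degrees of freedom
print(rank_sums, round(H, 2), round(p, 5))
```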

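• Ex 12.8.2

$$H = \frac{12}{41(41+1)} \left[ \frac{68^2}{10} + \frac{246^2}{8} + \frac{124^2}{9} + \frac{159^2}{7} + \frac{264^2}{7} \right] - 3(41+1) = 36.39$$

pchisq(36.39, 4, lower.tail = FALSE) = 2.4 × 10^−7, so H0 is rejected.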

Treatment cost by drug type per bed by hospital type (rank in the pooled sample in parentheses)

Drug type A (n = 10): 17.38(11), 15.20(2), 14.76(1), 16.88(7), 17.02(10), 26.67(17), 15.75(4), 16.02(5), 15.30(3), 16.98(8); R_1 = 68

Drug type B (n = 8): 52.59(35), 44.55(28), 44.80(29), 43.25(27), 50.75(32), 52.25(34), 46.13(30), 48.87(31); R_2 = 246

Drug type C (n = 9): 27.87(20), 24.00(12), 26.55(16), 25.00(13), 27.55(19), 25.92(14), 26.01(15), 16.48(6), 17.00(9); R_3 = 124

Drug type D (n = 7): 34.55(26), 31.15(22), 30.50(21), 31.25(23), 32.75(24), 33.00(25), 27.30(18); R_4 = 159

Drug type E (n = 7): 60.77(40), 59.99(38), 58.94(37), 57.05(36), 60.50(39), 61.50(41), 51.10(33); R_5 = 264

> val<-c(17.38,15.20,14.76,16.88,17.02,26.67,15.75,16.02,15.30,16.98,52.59,44.55,44.80,43.25,50.75, 52.25,46.13,48.87,27.87,24.00,26.55,25.00,27.55,25.92,26.01,16.48,17.00,34.55,31.15,30.50,31.25,32.75,33.00,27.30,60.77,59.99,58.94,57.05,60.50,61.50,51.10)

> group<-factor(rep(c('A','B','C','D','E'),c(10,8,9,7,7)))

> dat<-data.frame(val,group)

> kruskal.test(val~group,data=dat)

Kruskal-Wallis rank sum test

data: val by group

Kruskal-Wallis chi-squared = 36.394, df = 4, p-value = 2.401e-07
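For Ex 12.8.2 the rank sums are already tabulated, so H can be computed from R_j and n_j alone. The sketch also evaluates the chi-square upper tail in closed form, exp(−x/2) Σ_{j<df/2} (x/2)^j / j!, which is valid only for an even number of degrees of freedom (here df = 4); `chi2_sf_even_df` is our own helper.

```python
import math

rank_sums = [68, 246, 124, 159, 264]   # R_1 .. R_5 from the table
sizes = [10, 8, 9, 7, 7]               # n_1 .. n_5
n = sum(sizes)                         # 41 observations in total

# sanity check: all ranks 1..n must be accounted for
assert sum(rank_sums) == n * (n + 1) // 2

H = 12 / (n * (n + 1)) * sum(r ** 2 / k for r, k in zip(rank_sums, sizes)) - 3 * (n + 1)

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df."""
    half = x / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))

p = chi2_sf_even_df(H, len(sizes) - 1)
print(round(H, 3), p)
```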

12.9 Friedman's 2-way ANOVA

• Ex 12.9.1

Physical therapists' ranks of three low-volt electrical stimulators (medical devices A, B, C)

Therapist  A  B  C
1          2  3  1
2          2  3  1
3          2  3  1
4          1  3  2
5          3  2  1
6          1  2  3
7          2  3  1
8          1  3  2
9          1  3  2
R_j       15 25 14

H0: the three devices perform equally (they are equivalent).

HA: at least one device performs differently (they are not equivalent).

With b = 9 therapists (blocks) and k = 3 devices,

$$\chi_r^2 = \frac{12}{bk(k+1)} \sum_{j=1}^{k} R_j^2 - 3b(k+1) = \frac{12}{9 \cdot 3\,(3+1)} \left[ 15^2 + 25^2 + 14^2 \right] - 3(9)(3+1) = 8.222$$

From Table B(a), p = 0.016, so H0 is rejected at significance level 0.05.

> val <- c(2,3,1,2,3,1,2,3,1,1,3,2,3,2,1,1,2,3,2,3,1,1,3,2,1,3,2)
> group <- factor(rep(1:3, 9))
> id <- factor(rep(1:9, each = 3))
> friedman.test(val, group, id)

        Friedman rank sum test

data:  val, group and id
Friedman chi-squared = 8.2222, df = 2, p-value = 0.01639
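The Friedman statistic only needs the column rank sums. This sketch types in the therapists' rankings row by row from the table above and, for k − 1 = 2 degrees of freedom, uses exp(−χ²_r/2) for the chi-square upper tail. The variable names are our own.

```python
import math

# ranks[b][j]: rank given by therapist b to device j (A, B, C)
ranks = [
    [2, 3, 1], [2, 3, 1], [2, 3, 1],
    [1, 3, 2], [3, 2, 1], [1, 2, 3],
    [2, 3, 1], [1, 3, 2], [1, 3, 2],
]
b = len(ranks)      # number of blocks (therapists)
k = len(ranks[0])   # number of treatments (devices)

R = [sum(row[j] for row in ranks) for j in range(k)]   # column rank sums R_j
chi_r = 12 / (b * k * (k + 1)) * sum(r * r for r in R) - 3 * b * (k + 1)
p = math.exp(-chi_r / 2)   # chi-square upper tail, df = k - 1 = 2

print(R, round(chi_r, 4), round(p, 5))
```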

12.10 Spearman rank correlation coefficient

• Two-sided test: H0: X and Y are independent. HA: X and Y are not independent.

• One-sided tests: H0: X and Y are independent. HA: X and Y are positively associated.
  H0: X and Y are independent. HA: X and Y are negatively associated.

• Ex 12.10.1

ID  X    Y      ID  X    Y
1   500  525    10  50   60
2   475  130    11  175  105
3   390  325    12  130  148
4   325  190    13  76   75
5   325  90     14  200  250
6   205  295    15  174  102
7   200  180    16  201  151
8   75   74     17  125  130
9   230  420

ID  rank(X)  rank(Y)  d_i   d_i²
1   17     17     0.0   0.00
2   16     7.5    8.5   72.25
3   15     15     0.0   0.00
4   13.5   12     1.5   2.25
5   13.5   4      9.5   90.25
6   11     14    −3.0   9.00
7   8.5    11    −2.5   6.25
8   2      2      0.0   0.00
9   12     16    −4.0   16.00
10  1      1      0.0   0.00
11  7      6      1.0   1.00
12  5      9     −4.0   16.00
13  3      3      0.0   0.00
14  8.5    13    −4.5   20.25
15  6      5      1.0   1.00
16  10     10     0.0   0.00
17  4      7.5   −3.5   12.25

Σ d_i² = 246.5

• Steps of the test
① Rank X and Y separately.
② Compute d_i = rank(x_i) − rank(y_i).
③ Compute Σ d_i².

• Negative association → Σ d_i² is large → r_s is small.

• Positive association → Σ d_i² is small → r_s is large. A sufficiently large r_s (compared with the critical value in Table C) leads us to reject H0: independence.

$$r_s = 1 - \frac{6\sum d_i^2}{n(n^2-1)} = 1 - \frac{6(246.5)}{17(17^2-1)} = 0.698 > 0.4853$$

∴ We reject H0 and conclude that X and Y are positively associated.
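A sketch of the Spearman calculation in Ex 12.10.1, following the three steps above: rank X and Y separately with mid-ranks for ties, form d_i, and apply the r_s formula. With ties present, this formula is the usual textbook approximation rather than the Pearson correlation of the ranks; the helper names are our own.

```python
def midranks(values):
    """Rank the values 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1   # average 1-based position
        start = end + 1
    return ranks

def spearman_rs(x, y):
    """Return (sum of squared rank differences, Spearman r_s)."""
    rx, ry = midranks(x), midranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return d2, 1 - 6 * d2 / (n * (n * n - 1))

x = [500, 475, 390, 325, 325, 205, 200, 75, 230, 50, 175, 130, 76, 200, 174, 201, 125]
y = [525, 130, 325, 190, 90, 295, 180, 74, 420, 60, 105, 148, 75, 250, 102, 151, 130]
d2, rs = spearman_rs(x, y)
print(d2, round(rs, 3))
```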





• Ex 12.10.2 (when n > 30)

ID  Age (X)  Mineral concentration (Y)    ID  Age (X)  Mineral concentration (Y)

1 82 169.62 19 50 4.48

2 85 48.94 20 71 46.93

3 83 41.16 21 54 30.91

4 64 63.95 22 62 34.27

5 82 21.09 23 47 41.44

6 53 5.40 24 66 109.88

7 26 6.33 25 34 2.78

8 47 4.26 26 46 4.17

9 37 3.62 27 27 6.57

10 49 4.82 28 54 61.73

11 65 108.22 29 72 47.59

12 40 10.20 30 41 10.46

13 32 2.69 31 35 3.06

14 50 6.16 32 75 49.57

15 62 23.87 33 50 5.55

16 33 2.70 34 76 50.23

17 36 3.15 35 28 6.81

18 53 60.59

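• Normal approximation (n > 30):

$$z = r_s\sqrt{n-1} = 0.75\sqrt{35-1} = 4.37 > 1.96$$

→ reject H0.

• If the d_i are large (negative association), r_s is small and z is strongly negative; if the d_i are small (positive association), r_s and z are large.

• Two-sided test: if |z| > z_{1−α/2}, reject H0.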





ID  rank(X)  rank(Y)  d_i  d_i²    |  ID  rank(X)  rank(Y)  d_i  d_i²
1   32.5  35  −2.5   6.25     |  19  17    9    8.0   64.00
2   35    27   8.0   64.00    |  20  28    25   3.0   9.00
3   34    23   11.0  121.00   |  21  21.5  21   0.5   0.25
4   25    32  −7.0   49.00    |  22  23.5  22   1.5   2.25
5   32.5  19   13.5  182.25   |  23  13.5  24  −10.5  110.25
6   19.5  11   8.5   72.25    |  24  27    34  −7.0   49.00
7   1     14  −13.0  169.00   |  25  6     3    3.0   9.00
8   13.5  8    5.5   30.25    |  26  12    7    5.0   25.00
9   9     6    3.0   9.00     |  27  2     15  −13.0  169.00
10  15    10   5.0   25.00    |  28  21.5  31  −9.5   90.25
11  26    33  −7.0   49.00    |  29  29    26   3.0   9.00
12  10    17  −7.0   49.00    |  30  11    18  −7.0   49.00
13  4     1    3.0   9.00     |  31  7     4    3.0   9.00
14  17    13   4.0   16.00    |  32  30    28   2.0   4.00
15  23.5  20   3.5   12.25    |  33  17    12   5.0   25.00
16  5     2    3.0   9.00     |  34  31    29   2.0   4.00
17  8     5    3.0   9.00     |  35  3     16  −13.0  169.00
18  19.5  30  −10.5  110.25

Σ d_i² = 1788.5, so

$$r_s = 1 - \frac{6(1788.5)}{35(35^2-1)} = 0.75$$
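The same procedure scales to the n = 35 example, followed by the large-sample statistic z = r_s √(n − 1). The ages and mineral concentrations are typed in by ID from the data table; the code is a sketch that computes the statistic only and does not look up critical values.

```python
import math

def midranks(values):
    """Rank the values 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1   # average 1-based position
        start = end + 1
    return ranks

# Ex 12.10.2, IDs 1..35
age = [82, 85, 83, 64, 82, 53, 26, 47, 37, 49, 65, 40, 32, 50, 62, 33, 36, 53,
       50, 71, 54, 62, 47, 66, 34, 46, 27, 54, 72, 41, 35, 75, 50, 76, 28]
mineral = [169.62, 48.94, 41.16, 63.95, 21.09, 5.40, 6.33, 4.26, 3.62, 4.82,
           108.22, 10.20, 2.69, 6.16, 23.87, 2.70, 3.15, 60.59, 4.48, 46.93,
           30.91, 34.27, 41.44, 109.88, 2.78, 4.17, 6.57, 61.73, 47.59, 10.46,
           3.06, 49.57, 5.55, 50.23, 6.81]

n = len(age)
rx, ry = midranks(age), midranks(mineral)
d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
rs = 1 - 6 * d2 / (n * (n * n - 1))
z = rs * math.sqrt(n - 1)   # large-sample test statistic
print(d2, round(rs, 4), round(z, 2))
```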

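12.11 Nonparametric regression

• Ex. 12.11.1 [Theil's method]

$$\hat\beta_1 = \operatorname{median}\{S_{12}, \dots, S_{n-1,n}\}, \qquad S_{ij} = \frac{y_j - y_i}{x_j - x_i}, \qquad S_{12} = \frac{164 - 163}{57.4 - 53.9} = 0.285$$

Testosterone (Y): 163 164 156 151 152 167 165 153 155

Citric acid (X): 53.9 57.4 41.0 40.0 42.0 64.4 59.1 49.9 43.2

The n(n − 1)/2 = 36 pairwise slopes S_ij (truncated to three decimals):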

0.285 0.470 0.202

0.542 0.655 0.126

0.487 0.669 0.965

0.863 0.384 1.304

0.747 0.588 0.747

5.000 0.497 0.633

0.924 0.732 − 0.454

0.779 0.760 1.250

− 4.00 0.377 2.500

0.500 2.500 0.566

0.380 1.466 0.628

0.428 − 0.337 − 0.298

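Estimating the intercept

$$\hat\beta_0 = \operatorname{median}\{y_1 - \hat\beta_1 x_1, \dots, y_n - \hat\beta_1 x_n\}$$

or, alternatively,

$$\hat\beta_0 = \operatorname{median}\left\{ \frac{(y_i - \hat\beta_1 x_i) + (y_j - \hat\beta_1 x_j)}{2} : 1 \le i < j \le n \right\}$$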
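Theil's estimator is easy to compute by brute force over all pairs. This sketch assumes all x values are distinct (true here), takes the median of the 36 pairwise slopes, and then computes both intercept estimators given above. On these data the slope is about 0.609 and the first intercept estimate about 128.7; the variable names are our own.

```python
from itertools import combinations
from statistics import median

# Ex 12.11.1: citric acid (x) and testosterone (y)
x = [53.9, 57.4, 41.0, 40.0, 42.0, 64.4, 59.1, 49.9, 43.2]
y = [163, 164, 156, 151, 152, 167, 165, 153, 155]

pairs = list(combinations(range(len(x)), 2))   # all index pairs i < j
# pairwise slopes S_ij; assumes x_i != x_j for every pair
slopes = [(y[j] - y[i]) / (x[j] - x[i]) for i, j in pairs]
b1 = median(slopes)                            # Theil slope estimate

# first intercept estimator: median of y_i - b1 * x_i
resid = [yi - b1 * xi for xi, yi in zip(x, y)]
b0 = median(resid)
# second estimator: median of the pairwise averages of those terms
b0_alt = median([(resid[i] + resid[j]) / 2 for i, j in pairs])

print(len(slopes), round(b1, 4), round(b0, 3), round(b0_alt, 3))
```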

/* File name :
   Nonparametric One-Way Anova */
options pageno=1 nodate ls=130
        ps=60 nocenter;

filename inbrakes


data one;
  infile inbrakes;
  input id vehtype group positn
        speedzn resptime follotme;
  if group=1;                        /* keep only the Light On group */
  label vehtype='Vehicle Type'
        group='Group - Light On=1 Light Off=2'
        positn='Light Position'
        speedzn='Speed Zone'
        resptime='Response Time'
        follotme='Following Time in Video Frames'
        folltmec='Following Time in Categories';
run;

proc sort; by vehtype; run;

/* Let's do one-way ANOVA to see
   the effect of vehicle type */
proc anova;
  class vehtype;
  model resptime=vehtype;
  title 'Parametric ANOVA analysis';
run;

/* What's wrong with this?
   We didn't check the normality.
   Let's do proc univariate to
   check the normality */
proc univariate normal plot;
  var resptime;
  by vehtype;
  title 'Normality Check';
run;

/* NPAR1WAY with WILCOXON: Wilcoxon rank-sum test for two levels of vehtype,
   Kruskal-Wallis test for more than two */
proc npar1way wilcoxon;
  class vehtype;
  var resptime;
  title 'Nonpara One-Way ANOVA for Tail Light Study';
run;

/* The other way is transformation.
   Let's take log transformation
   so that we have normality */
data t;
  set one;
  t=log(resptime);                   /* natural log */
  label t='ln (response time)';
run;

proc sort; by vehtype; run;
proc univariate normal plot;
  var t;
  by vehtype;
  title 'Normality Check for transformed variable';
run;

/* The transformed variable seems to be
   normally distributed. Then we can do
   parametric ANOVA with normality */
proc anova;
  class vehtype;
  model t=vehtype;
  title 'ANOVA for the log transformed response time';
run;


Nonparametric Smoothing (1) Smoothing

• Consider an X–Y scatter plot.

• Draw a regression curve that requires no parametric assumptions.

• The regression curve need not be linear.

• The regression curve is determined entirely by the data.

Two components of smoothing

• Kernel function: determines how the local weighted mean is calculated, i.e. how much weight each nearby point receives.

• Bandwidth: the width of the window (span); it determines the smoothness of the regression curve: the wider the window, the smoother the curve.
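The two components, kernel and bandwidth, can be illustrated with a Nadaraya-Watson kernel smoother, i.e. a locally weighted mean. This is a sketch of one simple kernel smoother, not of lowess (which fits local regressions) or of SAS's spline smoothers; the kernel shapes, the bandwidth values, and the helper names are our own choices. The data are the eight (x, y) points used in the SAS smoothing example of this section.

```python
import math

def kernel_weight(u, kind):
    """Kernel value at the scaled distance u = (x - x0) / h."""
    if kind == "uniform":
        return 1.0 if abs(u) <= 1 else 0.0
    if kind == "triangular":
        return max(0.0, 1 - abs(u))
    if kind == "normal":
        return math.exp(-0.5 * u * u)
    raise ValueError(f"unknown kernel: {kind}")

def kernel_smooth(x, y, x0, h, kind="normal"):
    """Locally weighted mean of y around x0 with bandwidth h."""
    w = [kernel_weight((xi - x0) / h, kind) for xi in x]
    total = sum(w)
    if total == 0:
        return float("nan")   # no points inside the window
    return sum(wi * yi for wi, yi in zip(w, y)) / total

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [4, 9, 20, 25, 1, 5, -4, 12]

# a wider bandwidth averages over more points and gives a smoother curve
for kind in ("uniform", "triangular", "normal"):
    for h in (1.5, 3.0):
        fitted = [round(kernel_smooth(x, y, x0, h, kind), 2) for x0 in x]
        print(kind, h, fitted)
```

With a very large bandwidth every point receives the same weight under the uniform kernel, and the curve collapses to the overall mean of y.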

[Figures, Nonparametric Smoothing (2) to (7): uniform kernel; triangular kernel; normal kernel; default lowess line (span = 0.5); lowess line (span = 0.2); lowess line (span = 0.1)]

data A;
  input x y @@;
  datalines;
1 4 2 9 3 20 4 25 5 1 6 5 7 -4 8 12
;
run;

/* interpol=SMnn fits a smoothing spline; a larger nn gives a smoother curve.
   Note that x is sorted. */
title "sm45 spline smoother";
symbol1 interpol=sm45 value=circle height=2;
proc gplot data=A;
  plot y*x;
run;

title "sm70 spline smoother";
symbol1 interpol=sm70 value=circle height=2;
proc gplot data=A;
  plot y*x;
run;

title "sm20 spline smoother";
symbol1 interpol=sm20 value=circle height=2;
proc gplot data=A;
  plot y*x;
run;
quit;



plot(cars, main = "lowess(cars)")

lines(lowess(cars), col = 2)

lines(lowess(cars, f = .2), col = 3)

legend(5, 120, c(paste("f = ", c("2/3", ".2"))), lty = 1, col = 2:3)

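data <- read.csv("", sep=",")
data$date <- as.Date(data$date)   # assumption: the date column is stored as text such as "2000-01-01"

sl <- subset(data, ccode == 11)   # keep only the rows with ccode 11

boxplot(meanpm10 ~ yy, ylab = expression(PM[10]), axes = TRUE, data = sl)

plot(sl$date, sl$meanpm10, ylab = expression(PM[10]), xaxt = 'n', cex = 0.6)
# tick positions on 1 January of each year (year_ticks is a hypothetical name)
year_ticks <- seq(as.Date("2000-01-01"), as.Date("2007-12-31"), "year")
xname <- c("'00-01-01", "'01-01-01", "'02-01-01", "'03-01-01",
           "'04-01-01", "'05-01-01", "'06-01-01", "'07-01-01")
axis(side = 1, at = year_ticks, labels = xname)

# replace the value in row 829 with the mean of its neighbours (rows 828 and 830)
sl[829, "meanpm10"] <- (sl[828, "meanpm10"] + sl[830, "meanpm10"]) / 2

# lowess fits with three spans f: a smaller f follows the data more closely
plot(sl$date, sl$meanpm10, ylab = expression(PM[10]), xlab = "date", main = "(a)f=.1", xaxt = 'n', cex = 0.6)
lines(lowess(sl$date, sl$meanpm10, f = 0.1), col = "red", lwd = 2)
axis(side = 1, at = year_ticks, labels = xname)

plot(sl$date, sl$meanpm10, ylab = expression(PM[10]), xlab = "date", main = "(b)f=.05", xaxt = 'n', cex = 0.6)
lines(lowess(sl$date, sl$meanpm10, f = 0.05), col = "red", lwd = 2)
axis(side = 1, at = year_ticks, labels = xname)

plot(sl$date, sl$meanpm10, ylab = expression(PM[10]), xlab = "date", main = "(c)f=.5", xaxt = 'n', cex = 0.6)
lines(lowess(sl$date, sl$meanpm10, f = 0.5), col = "red", lwd = 2)
axis(side = 1, at = year_ticks, labels = xname)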

