Nonparametric Statistical Methods
Presented by Guo Cheng, Ning Liu , Faiza Khan, Zhenyu Zhang, Du Huang, Christopher Porcaro, Hongtao Zhao, Wei Huang
Introduction
Definition
Nonparametric (rank-based) methods are used when nothing is known about the population distribution from which the data are sampled.
They are also used for small sample sizes, and when the data are measured on an ordinal scale so that only their ranks are meaningful.
Outline
1. Sign Test
2. Wilcoxon Signed Rank Test
3. Inferences for Two Independent Samples
4. Inferences for Several Independent Samples
5. Friedman Test
6. Spearman’s Rank Correlation
7. Kendall’s Rank Correlation Coefficient
1. Sign Test
Parameter of interest: Median
The median is used as the parameter because, for skewed distributions, it is a better measure of the center of the data than the mean.
Hypothesis test
H0: µ = µ0 vs Ha: µ > µ0, where µ0 is a specified value and µ is the unknown median.
Testing Procedure
Step 1: Given a random sample x1, x2, …, xn from a population with unknown median µ, count the number of xi’s that exceed µ0. Denote this count by s+, and let s- = n - s+.
Step 2: Reject H0 if s+ is large or, equivalently, if s- is small.
How to reject H0?
To determine how large s+ must be in order to reject H0, we need to find out the distribution of the corresponding random variable S+.
Xi: random variable corresponding to the observed value xi
S+: random variable corresponding to s+
S-: random variable corresponding to s-
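Under H0 each Xi exceeds µ0 with probability 1/2 (assuming a continuous distribution, so that no observation equals µ0), which makes S+ a Binomial(n, 1/2) random variable. The following is a minimal Python sketch of the resulting exact p-value. The function name and the counts n = 10 and s+ = 8 are assumptions made for illustration; they are not stated in the slides.

```python
from math import comb

def sign_test_p_value(s_plus, n, alternative="greater"):
    """Exact sign-test p-value, using S+ ~ Binomial(n, 1/2) under H0."""
    def upper_tail(k):
        # P(S+ >= k) when S+ ~ Binomial(n, 1/2)
        return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

    if alternative == "greater":
        return upper_tail(s_plus)
    # Two-sided: double the tail of the larger of s+ and s-, capped at 1.
    s_max = max(s_plus, n - s_plus)
    return min(1.0, 2 * upper_tail(s_max))

# Hypothetical counts: n = 10 observations, 8 of them above mu0.
n, s_plus = 10, 8
p_one_sided = sign_test_p_value(s_plus, n)
p_two_sided = sign_test_p_value(s_plus, n, alternative="two-sided")
print(round(p_one_sided, 4), round(p_two_sided, 4))  # 0.0547 0.1094
```

If the thermostat example in this section does have n = 10 and s+ = 8, these counts give M = (s+ - s-)/2 = 3, and the two-sided value agrees with the Pr >= |M| = 0.1094 reported in the SAS output.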
Distribution of S+ and S-
Calculating P-value
Rejection criteria
Large sample z-test
Confidence Interval
Example
SAS code
DATA thermostat;
INPUT temp;
datalines;
202.2
203.4
…
;
PROC UNIVARIATE DATA=thermostat loccount mu0=200;
VAR temp;
RUN;
SAS Output
Basic Statistical Measures
Location               Variability
Mean     201.7700      Std Deviation         2.41019
Median   201.7500      Variance              5.80900
Mode     .             Range                 8.30000
                       Interquartile Range   2.90000
Tests for Location: Mu0=200
Test           -Statistic-      -----p Value------
Student's t    t  2.322323      Pr > |t|     0.0453
Sign           M  3             Pr >= |M|    0.1094
Signed Rank    S  19.5          Pr >= |S|    0.048
2. Wilcoxon Signed Rank Test
Inventor
Frank Wilcoxon (2 September 1892 in County Cork, Ireland – 18 November 1965, Tallahassee, Florida, USA) was a chemist and statistician, known for development of several statistical tests.
What is it used for?
Two related samples
Matched samples
Repeated measurements on a single sample
Hypothesis
Testing procedure
Example
SAS codes
DATA thermo;
INPUT temp;
datalines;
202.2
203.4
…
;
PROC UNIVARIATE DATA=thermo loccount mu0=200;
TITLE "Wilcoxon signed rank test the thermostat";
VAR temp;
RUN;
SAS outputs (selected results)
Basic Statistical Measures
Location               Variability
Mean     201.7700      Std Deviation         2.41019
Median   201.7500      Variance              5.80900
Mode     .             Range                 8.30000
                       Interquartile Range   2.90000
Tests for Location: Mu0=200
Test           -Statistic-      -----p Value------
Student's t    t  2.322323      Pr > |t|     0.0453
Sign           M  3             Pr >= |M|    0.1094
Signed Rank    S  19.5          Pr >= |S|    0.048
Large sample approximation
Derive E(x) & Var(x)
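The slide content for this derivation did not survive in the transcript. As a hedged illustration, the sketch below assumes the statistic in question is W+, the sum of the ranks attached to positive differences, and that there are no ties. Under H0 each rank 1, …, n is included in W+ independently with probability 1/2, which gives the standard results E(W+) = n(n+1)/4 and Var(W+) = n(n+1)(2n+1)/24. The code checks both by brute-force enumeration; the function name and the choice n = 10 are illustrative assumptions.

```python
from itertools import product

def signed_rank_null_moments(n):
    """Enumerate all 2**n sign patterns to get the exact null mean and variance of W+."""
    values = []
    for signs in product((0, 1), repeat=n):
        # W+ is the sum of the ranks (1..n) whose differences are positive.
        values.append(sum(rank for rank, s in zip(range(1, n + 1), signs) if s))
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var

n = 10  # illustrative sample size, not taken from the slides
mean, var = signed_rank_null_moments(n)
print(mean, n * (n + 1) / 4)                   # enumerated vs. closed-form mean
print(var, n * (n + 1) * (2 * n + 1) / 24)     # enumerated vs. closed-form variance
```

These two moments are what the large-sample normal approximation standardizes W+ with.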
Rejection region:
3. Inferences for Two Independent Samples
Hypothesis
Definition
Definition
Wilcoxon rank sum test
Mann-Whitney U test
Between two tests
Advantages
For large samples
Treatment of ties
Example
To test whether the grades of two classes taught by the same teacher are the same, we randomly pick 7 students from Class A and 9 from Class B. Their scores are as follows:
A: 8.50 9.48 8.65 8.16 8.83 7.76 8.63
B: 8.27 8.20 8.25 8.14 9.00 8.10 7.20 8.32 7.70
Example
Score   7.20  7.70  7.76  8.10  8.14  8.16  8.20  8.25
Class   B     B     A     B     B     A     B     B
Rank    1     2     3     4     5     6     7     8
Score   8.27  8.32  8.50  8.63  8.65  8.83  9.00  9.48
Class   B     B     A     A     A     A     B     A
Rank    9     10    11    12    13    14    15    16
Example
SAS code
Data exam;
Input group $ score @@;
Datalines;
A 8.50 A 9.48 A 8.65 A 8.16 A 8.83 A 7.76 A 8.63
B 8.27 B 8.20 B 8.25 B 8.14 B 9.00 B 8.10 B 7.20 B 8.32 B 7.70
;
Proc npar1way data=exam wilcoxon;
Var score;
Class group;
Exact wilcoxon;
Run;
Output
Wilcoxon Scores (Rank Sums) for Variable score
Classified by Variable group
group   N   Sum of Scores   Expected Under H0   Std Dev Under H0   Mean Score
A       7   75.0            59.50               9.447222           10.714286
B       9   61.0            76.50               9.447222           6.777778
Output
Wilcoxon Two-Sample Test
Statistic (S)                  75.0000
Normal Approximation
Z                              1.5878
One-Sided Pr > Z               0.0562
Two-Sided Pr > |Z|             0.1123
t Approximation
One-Sided Pr > Z               0.0666
Two-Sided Pr > |Z|             0.1332
Exact Test
One-Sided Pr >= S              0.0571
Two-Sided Pr >= |S - Mean|     0.1142
Z includes a continuity correction of 0.5.
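The rank sum, its null mean and standard deviation, and the continuity-corrected Z in the output above can be reproduced by hand. The following Python sketch does so from the Class A and Class B scores of the example; the variable names are illustrative, and the shortcut of ranking by sorted position assumes there are no ties, which holds for this data.

```python
from math import erfc, sqrt

a = [8.50, 9.48, 8.65, 8.16, 8.83, 7.76, 8.63]               # Class A
b = [8.27, 8.20, 8.25, 8.14, 9.00, 8.10, 7.20, 8.32, 7.70]   # Class B

# Rank the pooled sample; with no ties the ranks are simply 1..N.
pooled = sorted(a + b)
rank = {value: i + 1 for i, value in enumerate(pooled)}

n1, n2 = len(a), len(b)
w = sum(rank[v] for v in a)                 # Wilcoxon rank sum for Class A
u = w - n1 * (n1 + 1) / 2                   # Mann-Whitney U for Class A
mean_w = n1 * (n1 + n2 + 1) / 2             # E(W) under H0
sd_w = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)   # SD(W) under H0, no ties
z = (w - mean_w - 0.5) / sd_w               # continuity correction of 0.5
p_one_sided = 0.5 * erfc(z / sqrt(2))       # upper-tail normal probability

print(w, u, mean_w, round(sd_w, 6), round(z, 4), round(p_one_sided, 4))
```

The rank sum of 75 and the Mann-Whitney count of 47 differ only by the constant n1(n1+1)/2 = 28, which is why the two tests always lead to the same decision.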
4. Inferences for Several Independent Samples
Introduction
If the data are normally distributed and the population standard deviations are equal, we can test for a difference among several populations with the one-way ANOVA F test.
When to use Kruskal-Wallis test?
But what happens when the data are not normal? In that case we use the nonparametric Kruskal-Wallis test to compare more than two populations, provided the data come from a continuous distribution.
The idea of the Kruskal-Wallis rank test is to rank all the data from all groups together and then apply one-way ANOVA to the ranks rather than to the original data.
Kruskal-Wallis Test (KW Test)
A nonparametric method for testing whether samples originate from the same distribution.
Used to compare more than two independent samples.
Kruskal-Wallis Test: History
William Henry Kruskal
  October 10th, 1919 – April 21st, 2005
  Obtained Bachelor's and Master's degrees in Mathematics at Harvard University and received his Ph.D. from Columbia University in 1955.
Wilson Allen Wallis
  November 5th, 1912 – October 12th, 1998
  Undergraduate work at the University of Minnesota and graduate work at the University of Chicago in 1933.
Kruskal-Wallis Test: Steps
1. State the hypotheses:
Null Hypothesis (H0): The populations from which the samples come are identical
Alternative Hypothesis (Ha): At least one population is different
2. Rank all the data together. The lowest value gets the lowest rank, and so on. Tied values get the average of the ranks they would have obtained if they were not tied.
3. Add up the ranks within each sample. Label these sums L1, L2, L3, and L4.
4. Find the test statistic:
kw = [12 / (n(n + 1))] Σ (Li² / ni) - 3(n + 1)
n = total number of observations in all samples
ni = number of observations in sample i
Li = total rank of sample i
kw = test statistic
5. Reject H0 if kw is greater than the chi-square table value with (number of samples - 1) degrees of freedom.
Kruskal-Wallis Test: Example
An experiment was done to compare four different ways of teaching a concept to a class of students. In this experiment, 28 tenth-grade classes were randomly assigned to the four methods (7 classes per method). A 45-question test was given to each class. The average test scores of the classes are given in the following table. Apply the Kruskal-Wallis test to the test scores data set.
[Table: given data and ranks of data values]
SAS Input
data test;
input methodname $ scores @@;
cards;
case 14.59 case 23.44 case 25.43 case 18.15 case 20.82 case 14.06 case 14.26
Formula 20.27 Formula 26.84 Formula 14.71 Formula 22.34 Formula 19.49 Formula 24.92 Formula 20.20
Equation 27.82 Equation 24.92 Equation 28.68 Equation 23.32 Equation 32.85 Equation 33.90 Equation 23.42
Unitary 33.16 Unitary 26.93 Unitary 30.43 Unitary 36.43 Unitary 37.04 Unitary 29.76 Unitary 33.88
;
proc npar1way data=test wilcoxon;
class methodname;
var scores;
run;
SAS Output
Wilcoxon Scores (Rank Sums) for Variable scores
Classified by Variable methodname
methodname   N   Sum of Scores   Expected Under H0   Std Dev Under H0   Mean Score
case         7   49.00           101.50              18.845498          7.000000
formula      7   66.50           101.50              18.845498          9.500000
equation     7   125.50          101.50              18.845498          17.928571
unitary      7   165.00          101.50              18.845498          23.571429
Average scores were used for ties.
Kruskal-Wallis Test
Chi-Square         18.1390
DF                 3
Pr > Chi-Square    0.0004
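The rank sums and the chi-square value above can be reproduced from the scores in the SAS input. The Python sketch below is a hedged illustration, not the presenters' code: it assumes 7 scores per method, with 24.92 entered once under Equation, as the N = 7 in the output implies. The helper `midranks` and the variable names are illustrative, and the closed-form p-value used here is valid for 3 degrees of freedom only.

```python
from math import erfc, exp, pi, sqrt

groups = {
    "case":     [14.59, 23.44, 25.43, 18.15, 20.82, 14.06, 14.26],
    "formula":  [20.27, 26.84, 14.71, 22.34, 19.49, 24.92, 20.20],
    "equation": [27.82, 24.92, 28.68, 23.32, 32.85, 33.90, 23.42],
    "unitary":  [33.16, 26.93, 30.43, 36.43, 37.04, 29.76, 33.88],
}

def midranks(values):
    """Map each distinct value to the average of the ranks it occupies."""
    ordered = sorted(values)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        ranks[ordered[i]] = (i + 1 + j) / 2  # average of ranks i+1 .. j
        i = j
    return ranks

pooled = [x for scores in groups.values() for x in scores]
n = len(pooled)
ranks = midranks(pooled)
rank_sums = {name: sum(ranks[x] for x in scores) for name, scores in groups.items()}

# kw = 12 / (n(n+1)) * sum(Li^2 / ni) - 3(n+1)
kw = 12 / (n * (n + 1)) * sum(
    rank_sums[name] ** 2 / len(scores) for name, scores in groups.items()
) - 3 * (n + 1)

# Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n) over tied groups of size t.
tie_sizes = [pooled.count(v) for v in set(pooled)]
correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
kw_corrected = kw / correction

# Upper-tail chi-square probability (closed form for 3 degrees of freedom only).
half = kw_corrected / 2
p_value = erfc(sqrt(half)) + 2 / sqrt(pi) * sqrt(half) * exp(-half)

print(rank_sums)
print(round(kw_corrected, 4), round(p_value, 4))
```

The single tie (24.92 appears once under Formula and once under Equation) is what produces the half-integer rank sums 66.5 and 125.5, and the tie correction moves the statistic from about 18.134 to the reported 18.1390.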
5. Friedman Test
Introduction
The Friedman test is a distribution-free, rank-based test for comparing treatments, named after the Nobel laureate economist Milton Friedman, who proposed it.
It is a version of repeated-measures ANOVA that can be performed on ordinal (ranked) data.
Steps in the Friedman test
Example
Suppose we have 8 treatments arranged in 3 blocks, with α = 0.025.
Define Null and Alternative Hypothesis
H0: There is no difference among the 8 treatments
Ha: There is a difference among the 8 treatments
Rank Sum
Friedman Test
Conclusion
6. Spearman’s Rank Correlation Coefficient
Introduction
From Pearson to Spearman
Spearman’s Rank Correlation Coefficient
Large-Sample Approximation
Hypothesis Test
Examples
From Pearson to Spearman
Pearson’s
  Measures only the degree of linear association
  Based on the assumption of bivariate normality of the two variables
Spearman’s
  Takes into account only the ranks
  Measures the degree of monotone association
  Inferences on the rank correlation coefficients are distribution-free
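To illustrate the contrast between linear and monotone association, the Python sketch below computes both coefficients on hypothetical data (y = x³, which is perfectly monotone but not linear). Spearman's coefficient is computed here as Pearson's correlation applied to the ranks; the data and function names are assumptions made for illustration only.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def midranks(values):
    """Rank each value (1-based), giving tied values their average rank."""
    ordered = sorted(values)
    return [(2 * ordered.index(v) + 1 + ordered.count(v)) / 2 for v in values]

def spearman(x, y):
    """Spearman's rank correlation: Pearson's correlation of the ranks."""
    return pearson(midranks(x), midranks(y))

# Hypothetical data: a monotone but clearly nonlinear relationship.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]

r = pearson(x, y)
rs = spearman(x, y)
print(round(r, 4), rs)  # Pearson falls short of 1; Spearman equals 1
```

Because only the ordering of the values enters the ranks, any strictly increasing transformation of x or y leaves Spearman's coefficient unchanged.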
From Pearson to Spearman
Charles Edward Spearman (10 Sept. 1863 – 17 Sept. 1945)
As a psychologist
  ① General factor of intelligence
  ② the nature and causes of variations in human
As a statistician
  ① Rank correlation
  ② two-way analysis
  ③ Correlation coefficient
Spearman’s Rank Correlation Coefficient
Large sample approximation
Hypothesis testing
Example
Table 5.1 Wine Consumption and Heart Disease Deaths
Table 5.2 Ranks of Wine Consumption and Heart Disease Deaths
7. Kendall’s Rank Correlation Coefficient
Kendall’s Tau
A coefficient used to measure the association between two sets of ranked data.
Named after the British statistician Maurice Kendall, who developed it in 1938.
Ranges from -1.0 to 1.0.
Two versions: Tau-a (with no ties) and Tau-b (with ties).
Formula for Tau-a
Concordant and Discordant
Example 1 Kendall’s tau-a
Raw data for 11 students in 2 exams:
Exam 1   Exam 2
85       85
98       95
90       80
83       75
57       70
63       65
77       73
99       93
80       79
96       88
69       74
Ranks of exam results
Exam 1 (x)   Exam 2 (y)   c   d
1            2            9   1
2            1            9   0
3            3            8   0
4            5            6   1
5            4            6   0
6            7            4   1
7            6            4   0
8            9            2   1
9            8            2   0
10           11           0   1
11           10
C=50 D=5
Calculation for τ
Steps for calculating τ
1. Sort data x in ascending order, pair y ranks with x
2. Count c and d for each y
3. Sum C and D
4. Use formula to calculate τ
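The steps above can be sketched in Python as follows, applied to the raw exam scores of Example 1. This is a minimal illustration that assumes no ties in either variable (the tau-a setting); the function name is hypothetical. It uses the usual tau-a formula, (C - D) divided by the number of pairs n(n - 1)/2.

```python
exam1 = [85, 98, 90, 83, 57, 63, 77, 99, 80, 96, 69]
exam2 = [85, 95, 80, 75, 70, 65, 73, 93, 79, 88, 74]

def kendall_tau_a(x, y):
    """Return (C, D, tau-a) for paired data with no ties."""
    # Step 1: sort the pairs by x, so only the y values need comparing.
    pairs = sorted(zip(x, y))
    c = d = 0
    # Steps 2 and 3: for each y, count the later y values that are larger
    # (concordant) or smaller (discordant), and accumulate the totals.
    for i in range(len(pairs)):
        for j in range(i + 1, len(pairs)):
            diff = pairs[j][1] - pairs[i][1]
            if diff > 0:
                c += 1
            elif diff < 0:
                d += 1
    # Step 4: tau-a = (C - D) / (n(n - 1) / 2).
    n = len(pairs)
    return c, d, (c - d) / (n * (n - 1) / 2)

C, D, tau = kendall_tau_a(exam1, exam2)
print(C, D, round(tau, 5))  # 50 5 0.81818
```

Comparing the raw scores directly gives the same counts as comparing their ranks, since only the ordering within each exam matters.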
Formula for tau-b (with ties)
Example 2 Kendall’s tau-b
Wine Consumption and heart disease deaths data
i    Country       xi    yi    c   d
1    Ireland       0.7   300   0   18
2    Iceland       0.8   211   3   11
3    Norway        0.8   227   2   13
4    Finland       0.8   297   0   15
5    U.S.          1.2   199   5   9
6    U.K           1.3   285   0   13
7    Sweden        1.6   207   3   9
8    Netherlands   1.8   167   5   5
9    N. Z          1.9   266   0   10
10   Canada        2.4   191   2   7
11   Australia     2.5   211   1   7
12   Germany       2.7   172   1   6
13   Belgium       2.9   131   2   4
14   Denmark       2.9   220   0   5
15   Austria       3.9   167   0   4
16   Switzerland   5.8   115   0   3
17   Spain         6.5   86    1   1
18   Italy         7.9   107   0   1
19   France        9.1   71    0   0
C=25 D=141
Calculation for tau-b
Hypothesis Test for τ
Hypothesis test results
Example 1 extension
SAS Code
Data exams;
Input exam1 exam2;
Datalines;
85 85
98 95
…
;
Run;
Proc corr data=exams kendall;
Var exam1 exam2;
Run;
SAS output
The CORR Procedure
2 Variables: exam1 exam2
Simple Statistics
Variable   N    Mean       Std Dev    Median     Minimum    Maximum
exam1      11   81.54545   14.13056   83.00000   57.00000   99.00000
exam2      11   79.72727   9.58218    79.00000   65.00000   95.00000
Kendall Tau b Correlation Coefficients, N = 11
Prob > |tau| under H0: Tau=0
         exam1      exam2
exam1    1.00000    0.81818
                    0.0005
exam2    0.81818    1.00000
         0.0005
8. Conclusion
Summary
Nonparametric tests are very useful when we know nothing about the underlying distributions.
In particular, when the distribution is not normal the t-test may not be appropriate, and nonparametric methods are needed instead.
The median is a better measure of central tendency than the mean for non-normal populations.
The data can be ordinal, and the sample size is usually small.
In summary, we have briefly introduced some of the most common methods in our presentation, including:
Sign test
Wilcoxon rank sum test and signed rank test
Kruskal-Wallis Test
Friedman Test
Spearman’s Rank Correlation
Kendall’s Rank Correlation Coefficient
Questions
The End.
Thank you!