plan statistical basics - unil
Quantitative approaches
Lesson 9:
Statistical basics
Plan
1. Types of analysis
2. Types of variables : nominal, ordinal, interval, metric
3. Measures of central tendency: mode, median, mean
4. Degrees of freedom
5. Measures of variability: variance and standard deviation
6. Central limit theorem and the normal distribution
7. A measure of unreliability : the standard error
8. Confidence intervals
9. Statistical tests
10. Other distributions and tests : T, F, Chi-square, Poisson, Binomial
Useful resources
http://onlinestatbook.com/rvls/index.html
Rice Virtual Lab in Statistics
1. Types of analysis
Types of analysis
- descriptive or inferential
- univariate, bivariate, multivariate
Descriptive vs. inferential analysis
"Descriptive analysis is about the data you have in hand. Inferential analysis involves making statements - inferences - about the world beyond the data you have in hand."
"When you say that the average age of a group of telephone survey respondents is 44.6 years, that's a descriptive analytic statement. When you say that there is a 95% statistical probability that the true mean of the population from which you drew your sample of respondents is between 42.5 and 47.5 years, that's an inferential statement. You infer something about the rest of the world from data in your sample."
(Bernard, 2000: 502)
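The distinction can be illustrated with a minimal Python sketch. The ages are hypothetical (they are not Bernard's data), and the interval relies on a normal approximation, which is an assumption; with a sample this small a t-based interval would usually be preferred.

```python
import math
import statistics

# Hypothetical ages of survey respondents (illustrative data only).
ages = [23, 31, 35, 38, 41, 44, 47, 52, 58, 66]

# Descriptive statement: the mean age of the sample in hand.
sample_mean = statistics.mean(ages)

# Inferential statement: an approximate 95% interval for the population mean.
# Assumption: normal approximation (z is about 1.96).
standard_error = statistics.stdev(ages) / math.sqrt(len(ages))
z = statistics.NormalDist().inv_cdf(0.975)
lower = sample_mean - z * standard_error
upper = sample_mean + z * standard_error

print(f"Descriptive: the mean age in the sample is {sample_mean:.1f}")
print(f"Inferential: the population mean is probably between {lower:.1f} and {upper:.1f}")
```

The first print describes only the ten respondents; the second makes a claim about the population they were drawn from.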
Univariate, bivariate, multivariate
- univariate : uses 1 variable
- bivariate: uses 2 variables
- multivariate: uses 3 and more variables
Univariate, bivariate, multivariate
- "Univariate analysis involves getting to know data intimately by examining variables precisely and in detail. Bivariate analysis involves looking at associations between pairs of variables and trying to understand how those associations work. Multivariate analysis involves, among other things, understanding the effects of more than one independent variable at a time on a dependent variable."
(Bernard, 2000: 502)
Univariate, bivariate, multivariate: how to proceed
1. Look at the variables one by one: what is their range, mean, median, variance (is there variance!?), distribution? (univariate)
2. Inspect associations between pairs of variables. How does the independent variable "influence" the dependent variable? (bivariate)
3. Look at the associations of several variables simultaneously. How do two or more independent variables influence a dependent variable at the same time? (multivariate)
Bivariate analysis: questions to ask
1. How big/important is the covariation? In other words, how much better could we predict the score of a dependent variable in our sample if we knew the score of some independent variable? Covariation coefficients answer this question.
2. Is the covariation statistically significant? Is it due to chance, or is it likely to exist in the overall population to which we want to generalize? Statistical tests answer this question.
3. What is its direction? (look at graphs)
4. What is its shape? Is it linear or nonlinear? (look at graphs)
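As an illustration of the first and third questions, here is a minimal sketch of one common covariation coefficient, Pearson's r, written out by hand. The two variables and their values are hypothetical, and the choice of Pearson's r is an assumption (it suits interval variables with a roughly linear relationship). Statistical significance is not computed here.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of the departures from the two means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical data: an independent and a dependent variable.
years_of_schooling = [8, 10, 12, 12, 14, 16, 18]
income = [21, 25, 30, 28, 36, 41, 50]

r = pearson_r(years_of_schooling, income)
print(f"r = {r:.2f}")  # the sign gives the direction, the absolute value the strength
```

The coefficient always lies between -1 and 1; a graph is still needed to judge the shape of the relationship.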
Multivariate analysis: questions to ask
1. How is a relationship between two variables changed if a third variable is controlled? (Multiple crosstabs, partial correlation, multiple regression, MANOVA)
2. What is the overall variance of a dependent variable that can be explained by several independent variables? What are the relative strengths of different predictors (independent variables)? (Multiple regression)
3. What groups of variables tend to correlate with each other, given a multitude of variables? (Factor analysis)
4. Which individuals tend to be similar concerning selected variables? (Cluster analysis)
2. Types of variables :
nominal, ordinal, interval, metric
Levels of measurement and covariation: analysis
                    Dependent
Independent         Nominal               Ordinal                   Interval/ratio
Nominal             Crosstabs             (ANOVA, Crosstabs)        ANOVA (Means)
Ordinal                                   (Corr/Regr, Crosstabs)    (Corr/Regr)
Interval/ratio      Logistic Regression   (Corr/Regr)               Correlation, Regression
3. Measures of central tendency:
mode, median, mean
Definitions : Mode, Median, Mean
Mode = value in the distribution of the variable that comes up most frequently
Median = value in the distribution that has 50% of the values "to its right" and 50% of the values "to its left"
Mean = sum of the values divided by n
Variables : nominal, ordinal, interval
Variables :
1. Nominal variables have no inherent order
example: party preference, male-female
2. Ordinal variables are ordered, but the distances are not quantifiable (we cannot add or subtract)
example: agree a lot, agree a bit, disagree a bit, disagree a lot
3. Interval variables can be measured numerically; it makes sense to do additions or subtractions
example: height, weight, income, number of cars
Example : Size of 11 dwarfs
Size of 11 dwarfs (cm): 13, 7, 5, 12, 9, 15, 7, 11, 9, 7, 12
Sorted: 5, 7, 7, 7, 9, 9, 11, 12, 12, 13, 15
Mode = 7 (occurs three times)
Median = 9 (the 6th of the 11 sorted values)
Mean = 9.7272
Calculating mean, mode, median
mean: $\bar{y} = \frac{\sum y}{n}$
$\bar{y} = \frac{5 + 7 + 7 + 7 + 9 + 9 + 11 + 12 + 12 + 13 + 15}{11}$
$\bar{y} = \frac{107}{11} = 9.727273$
median = 9 (middle value of 5, 7, 7, 7, 9, 9, 11, 12, 12, 13, 15)
mode = 7 (most frequent value of 5, 7, 7, 7, 9, 9, 11, 12, 12, 13, 15)
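The same three measures can be checked with a short Python sketch using the standard `statistics` module on the 11 dwarf sizes from the example. The variable names are illustrative.

```python
import statistics

# Sizes of the 11 dwarfs (cm), in the order given in the example.
sizes = [13, 7, 5, 12, 9, 15, 7, 11, 9, 7, 12]

mode = statistics.mode(sizes)      # most frequent value
median = statistics.median(sizes)  # middle value of the sorted sizes
mean = statistics.mean(sizes)      # sum of the values divided by n

print(f"mode = {mode}, median = {median}, mean = {mean:.4f}")
```

With an odd number of values the median is simply the middle value after sorting; with an even number, `statistics.median` averages the two middle values.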
4. Degrees of freedom
Degrees of freedom : definition
Degrees of freedom (df) = number of values in the calculation of an estimate (e.g. mean, variance, standard error) that are "free to vary".
Degrees of freedom = number of independent values that go into the estimate (= n) minus the number of parameters estimated
Degrees of freedom : example
We have 5 dwarfs, their mean size is 4.
What is the sum of their sizes? It must be 20, otherwise the mean could not be 4.
So now let's think about each of the five dwarfs in turn.
We are free to choose the first four numbers, but we are not for the last one - what is it?
Sizes chosen for the five dwarfs: 2, 6, 8, 3, ?
(example adapted from Crawley 2005)
Degrees of freedom : example
The last dwarf must have size 1, since the sum must be 20 (20 - (2 + 6 + 8 + 3) = 1).
Therefore, we are not "free to choose" this last number.
This means that we have df = 4 in this case.
Check: n - number of parameters to be estimated: 5 - 1 = 4
Sizes of the five dwarfs: 2, 6, 8, 3, 1
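The argument can be restated as a small Python sketch: once the mean is fixed, the last value follows from the others. The function name `last_value` is hypothetical and only serves the illustration.

```python
def last_value(free_values, n, mean):
    """Return the one value that is forced once n - 1 values and the mean are known."""
    if len(free_values) != n - 1:
        raise ValueError("exactly n - 1 values can be chosen freely")
    # The n values must sum to n * mean, so the last one is whatever is left over.
    return n * mean - sum(free_values)

chosen = [2, 6, 8, 3]                      # the four freely chosen sizes
last = last_value(chosen, n=5, mean=4)     # the fifth size is not free
degrees_of_freedom = 5 - 1                 # n minus one estimated parameter (the mean)

print(f"last dwarf = {last}, df = {degrees_of_freedom}")
```

Any other four starting values would work equally well; only the fifth is ever determined, which is why one degree of freedom is lost.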
5. Measures of variability:
variance and standard deviation
Variance and standard deviation : definitions
Variance and standard deviation are measures of the "variability" of a variable. In other words: how much its values "vary" around the mean.
Variance = the sum of the squares of the individual departures from the mean, divided by the degrees of freedom
Standard deviation = the square root of the variance.
Variance
mean: $\bar{y} = \frac{\sum y}{n}$
variance: $s^2 = \frac{\text{sum of squares}}{\text{degrees of freedom}} = \frac{\sum (y - \bar{y})^2}{n - 1}$
standard deviation: $s = \sqrt{\frac{\sum (y - \bar{y})^2}{n - 1}}$
Example: Dwarfs in 3 gardens
Size of dwarfs in 3 gardens
[Figure: the dwarfs in Garden A, Garden B and Garden C]
Garden    A     B     C
          3     5     3
          4     5     3
          4     6     2
          3     7     1
          2     4    10
          3     4     4
          1     3     3
          3     5    11
          5     6     3
          2     5    10
mean(A) = $\bar{y}_A$ = 3
mean(B) = $\bar{y}_B$ = 5
mean(C) = $\bar{y}_C$ = 5
var(A) = $s_A^2$ = 1.3
var(B) = $s_B^2$ = 1.3
var(C) = $s_C^2$ = 14.2
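These means and variances can be reproduced with a minimal Python sketch that follows the definition directly (sum of squared departures from the mean, divided by n - 1). The function name `sample_variance` and the dictionary layout are illustrative choices.

```python
import math

# Dwarf sizes in the three gardens, copied from the table.
gardens = {
    "A": [3, 4, 4, 3, 2, 3, 1, 3, 5, 2],
    "B": [5, 5, 6, 7, 4, 4, 3, 5, 6, 5],
    "C": [3, 3, 2, 1, 10, 4, 3, 11, 3, 10],
}

def sample_variance(values):
    """Sum of squares of the departures from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    sum_of_squares = sum((y - mean) ** 2 for y in values)
    return sum_of_squares / (n - 1)

for name, sizes in gardens.items():
    mean = sum(sizes) / len(sizes)
    variance = sample_variance(sizes)
    sd = math.sqrt(variance)  # standard deviation = square root of the variance
    print(f"Garden {name}: mean = {mean:.1f}, variance = {variance:.1f}, sd = {sd:.2f}")
```

Gardens B and C share the same mean but differ strongly in variance, which is the point of the example.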
Computing variance of dwarfs in garden C
$s^2 = \frac{\sum (y - \bar{y})^2}{n - 1}$, with $\bar{y}_C = 5$
$s_C^2 = \frac{(3-5)^2 + (3-5)^2 + (2-5)^2 + (1-5)^2 + (10-5)^2 + (4-5)^2 + (3-5)^2 + (11-5)^2 + (3-5)^2 + (10-5)^2}{10 - 1}$
$s_C^2 = \frac{(-2)^2 + (-2)^2 + (-3)^2 + (-4)^2 + 5^2 + (-1)^2 + (-2)^2 + 6^2 + (-2)^2 + 5^2}{9}$
$s_C^2 = \frac{4 + 4 + 9 + 16 + 25 + 1 + 4 + 36 + 4 + 25}{9}$
$s_C^2 = \frac{128}{9} \approx 14.2$
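For comparison, the same steps can be applied to garden A, using the sizes 3, 4, 4, 3, 2, 3, 1, 3, 5, 2 and the mean $\bar{y}_A = 3$ from the table of the three gardens. This is a worked sketch added for illustration; it reproduces the value var(A) = 1.3 reported there.

```latex
s_A^2 = \frac{(3-3)^2 + (4-3)^2 + (4-3)^2 + (3-3)^2 + (2-3)^2 + (3-3)^2 + (1-3)^2 + (3-3)^2 + (5-3)^2 + (2-3)^2}{10 - 1}
      = \frac{0 + 1 + 1 + 0 + 1 + 0 + 4 + 0 + 4 + 1}{9}
      = \frac{12}{9} \approx 1.3
```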
Boxplot = graphical summary of the variability of a variable
[Figure: boxplot with the following elements labelled]
- 75% quartile
- Median (50% quartile)
- 25% quartile
- Whiskers = lowest and highest data points that are not outliers or extreme values
Outliers and extreme values in boxplots
Interquartile range = distance between the 25% and 75% quartiles
Outliers = values that lie between 1.5 and 3 times the interquartile range away from the edge of the box
Extreme values = values that lie more than 3 times the interquartile range away from the edge of the box
In boxplots, outliers and extreme values are represented by circles beyond the whiskers.
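These rules can be sketched in Python as follows. Two assumptions are made: distances are measured from the edges of the box (the 25% and 75% quartiles), and the quartiles are computed with the "inclusive" method of `statistics.quantiles` (statistical packages differ slightly in how they define quartiles). The data and the function name `classify_outliers` are hypothetical.

```python
import statistics

def classify_outliers(values):
    """Split values into outliers (1.5 to 3 IQR beyond the box) and extreme values (more than 3 IQR)."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range = distance between the quartiles
    outliers, extremes = [], []
    for value in sorted(values):
        # Distance beyond the nearer edge of the box; zero or negative inside the box.
        distance = max(q1 - value, value - q3)
        if distance > 3 * iqr:
            extremes.append(value)
        elif distance > 1.5 * iqr:
            outliers.append(value)
    return outliers, extremes

# Hypothetical dwarf sizes with two unusually tall dwarfs.
sizes = [1, 2, 3, 3, 3, 3, 4, 4, 5, 8, 20]
outliers, extremes = classify_outliers(sizes)
print(f"outliers = {outliers}, extreme values = {extremes}")
```

Here the quartiles are 3 and 4.5, so the interquartile range is 1.5: the dwarf of size 8 counts as an outlier and the dwarf of size 20 as an extreme value.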
Showing differences between means and variance graphically with "boxplots"