Methods of Applied Statistics I · 2015-09-19
STA442 / STA2101
Craig Burkett

About me
- Craig Burkett
- Formerly
  - Aerospace engineer
  - High-school teacher
  - Lecturer at UBC
  - Lecturer at UTM
- Now a lecturer at U of T and a statistical consultant

Logistics
- Lecture 2-5pm every Friday
- Quiz at ~3pm, followed by a 10-minute break (when you may retrieve old quizzes)
- All course material (outline, lecture slides, practice problems, quiz solutions) will be posted on the course website: http://utstat.toronto.edu/burkett/sta442f15/
- For all inquiries, come to office hours or speak to me before lecture or during the break
  - Please do not send me an email

Marking Scheme

| Assessment | Weight | Due |
| --- | --- | --- |
| Quizzes (best 10 of 11) | 70% | Sept 25, Oct 2, Oct 9, Oct 16, Oct 23, Oct 30, Nov 6, Nov 13, Nov 20, Nov 27, Dec 4 |
| Final Exam | 30% | ??? |

Hierarchy of Stats Programs
- R / SAS
  - R more popular among academics (free!)
  - SAS more popular in large business & medicine
- Everything else that can be scripted
  - SPSS, Stata, …
- Everything else that is menu-driven
  - Minitab, R Commander, …
- Anything browser-based
  - Statcrunch, …
- Excel

Hierarchy of Operating Systems
- Unix
  - It’s the fastest, most customizable
- Windows power user
  - Because they build their own machines
- Mac power user
  - Benefit mostly to video/audio
- Windows average user
  - Good enough for this course
- Mac hipster user
  - Grrrrrrr!

Ready … ?
Introduction to Everything

Applied Statistics Process
[Flowchart; boxes listed in the order they were extracted]
- Formulate Research Question & Sample size
- Collect Data
- Obtain Funding
- Merge, Sort, Clean, Aggregate
- Compute Descriptive Stats & Graphs
- Obtain Funding
- Choose / Fit Model
- Check Assumptions
- Interpret Results & Follow-up Tests / CIs
- Formal Writeup / Publish
[Flowchart labels: “Our main focus”, “Our secondary focus”]

Types of Statements
- Analytic Propositions
  - Statements that are true by definition
    - All red cars are red
    - All bachelors are unmarried
    - $\sin^2 x + \cos^2 x = 1$
  - Domain of Mathematicians
  - If you reject these types of claims, you are widely considered an irrational person
    - Or, at best, you don’t understand the symbols

Types of Statements
- Scientific Statements
  - Can be verified empirically
    - Tylenol is an effective painkiller
    - Fertilizer A is better than B for crop growth
    - As the force, so the deformation (Hooke’s Law: $F = kx$ for springs)
  - Domain of Scientists (that’s us!)
    - Logical Positivists
  - If you reject these types of claims, you are a very poor scientist

Types of Statements
- Dogmatic Claims
  - Cannot be verified or proven
    - I think, therefore I am
    - There is a God
    - Friends are more important than money
  - Domain of Theologians & Yoga instructors
  - If you reject these claims, you are considered either a cynic, a boor, enlightened, thoughtful, playing devil’s advocate, or a heathen, depending on who is judging you

Types of Statements
- Aesthetic/Ethical Claims
  - Confer an opinion about something
    - That painting is beautiful
    - Students should not text during class
    - Craig should not slurp his cereal
  - Domain of people who are more interesting at parties than the previous three groups
  - If you reject these claims, you might annoy my wife

Scientific Claims
- Must be falsifiable
  - Karl Popper
- Are ‘proven’ true by experiment
  - Evidence-based medicine
- A hypothesis is formed, and either rejected or not rejected based on the results of a properly designed experiment
  - That’s where we come in!
- This is the scientific method
  - If you don’t like it, go back to the Middle Ages

Applied Stats Algorithm
- We have our first steps in a methodology
- Is the research question scientific?
  - Yes: Continue to next step
  - No: Not our problem

Proven? How?
“When you have eliminated the impossible, whatever remains, however improbable, MUST be the truth” - Sherlock Holmes

Experiments
- We prove things in Science by setting up experiments
- In the simplest setup, we form two groups
  - Apply a different Treatment to each group
  - Measure something of interest (Response)
- Ideally, the groups are identical in every possible way, except for the treatment
- If the groups differ in the response, it must be due to the treatment variable
  - Elementary, my dear Watson

Observational Studies
- We also do research this way
  - Cheaper, faster
- Analytical methods basically the same as those used in experiments
- Setup is fundamentally different
  - Treatments not assigned randomly
- You can say a lot about Obs. Studies, but no matter how you slice it, you cannot determine causation
  - (Some folks are working on this …)
  - Will return to this later, in a mathematical way

Experiments and Observational Studies
Similarities and Differences

Experiments vs. Observational Studies
- Two main types of studies that use regression models (i.e. applied statistics)
- Analysis can be similar (at least for MLR models) but conclusions very different
- Experiments can determine Cause and Effect relationships
  - At least they’re our best way to determine C&E thus far

Experiments
- A treatment imposed on experimental units
- Popular in engineering, medicine, forestry, farming, education, biology, psychology …
- Engineering
  - Effect of temperature on failure rate of electronics
  - Which production method produces ‘better’ quality parts?

Experiments
- Medicine
  - Does drug A work ‘better’ than drug B?
  - Do patients receiving surgery live longer than patients on a chemical treatment?
- Forestry
  - Predict future coverage
  - Estimation of age (without core samples)

Experiments
- Farming
  - What is the optimal irrigation level and fertilizer dosage to maximize crop yield?
- Education
  - Do students in a ‘flipped classroom’ retain more?
  - Which medium is best to keep attention?

Experiments
- Biology
  - What pH level allows cells to live longest?
  - In vivo vs. in vitro
- Psychology
  - Eyewitness testimony
  - Pygmalion / Golem effect
  - Demand effect

Experiments
- Medicine - RCTs
  - Randomized Controlled Trials
  - Gold standard for evidence-based medicine
  - Registered now at ClinicalTrials.gov to avoid publication bias
- Double-blind
  - Neither the experimental unit (patient) nor the researcher knows the treatment
  - Lanarkshire Milk Study

Experiments
- Placebo-controlled
  - Placebo effect
- Sample size calculations
  - Ethical reasons
- Intention To Treat
  - Subjects can switch treatments of their own accord – how to deal with it?
  - Often assume they stayed with the original group, to avoid selection bias
  - Conservative results

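As a rough illustration of the Intention To Treat idea, the sketch below compares two ways of grouping the same subjects: by the treatment they were assigned, and by the treatment they actually received. The records, field names and the `effect` helper are all made up for this example; they are not course data.

```python
from statistics import mean

# Made-up records for illustration only: treatment assigned at randomization,
# treatment actually received, and a numeric outcome (higher is better).
subjects = [
    {"assigned": "drug",    "received": "drug",    "outcome": 8},
    {"assigned": "drug",    "received": "drug",    "outcome": 7},
    {"assigned": "drug",    "received": "placebo", "outcome": 3},  # switched groups
    {"assigned": "placebo", "received": "placebo", "outcome": 4},
    {"assigned": "placebo", "received": "placebo", "outcome": 5},
    {"assigned": "placebo", "received": "placebo", "outcome": 3},
]


def effect(records, group_by):
    """Difference in mean outcome (drug minus placebo), grouping by the given key."""
    drug = [r["outcome"] for r in records if r[group_by] == "drug"]
    placebo = [r["outcome"] for r in records if r[group_by] == "placebo"]
    return mean(drug) - mean(placebo)


itt_effect = effect(subjects, "assigned")         # analyse as randomized
as_treated_effect = effect(subjects, "received")  # analyse by treatment received

print("intention to treat:", itt_effect)
print("as treated:        ", as_treated_effect)
```

In this toy data the intention-to-treat estimate is the smaller of the two, which is one way to read "conservative results"; real trials need not behave this neatly.
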
Experiments
- Advantages
  - Strength of conclusion
  - If properly randomized can make cause-effect conclusions
- Disadvantages
  - Expensive
  - Time-consuming
  - Loss to follow-up
  - Ethical issues (e.g. animal testing)

Observational Studies
- Data measured without intervention
- Can determine associations, but can’t say that X causes Y
- Can make predictions
- Popular in medicine, education, social science …

Observational Studies
- Medicine
  - Does smoking cause lung cancer?
  - What are the long-term effects of amputation?
  - Not ethical to assign these treatments!
- Education
  - Do students from higher income families do better?
  - Is there a difference in graduation rate based on race?
  - Not possible to assign these treatments

Observational Studies
- Social Science
  - Is there a personality difference between dog and cat owners?
  - Can we predict rat populations throughout Vancouver using demographic information?
  - Treatments not assigned but ‘chosen’ by subjects

Observational Studies
- Can be prospective or retrospective
- Prospective
  - Identify a cohort and follow them through time
- Retrospective
  - Look back through time and see what happened to a cohort
- Prospective studies provide better evidence of an association, but take longer, require ethical approval and have follow-up issues

Observational Studies
- Advantages
  - Cheaper
    - Data can be expensive, but analysis can be done by a single person
  - Instant (Retrospective, anyway)
  - Fewer ethical issues as subjects chose own Tx
  - No withdrawal problems (for retrospective)
- Disadvantages
  - No Cause-Effect conclusions
    - Although they can motivate future experiments

Observational Studies
- They differ from experiments in that the Treatments were not assigned randomly
  - They were ‘chosen’ by the subjects themselves
  - Does smoking cause lung cancer?
  - What is the effect of an amputation on longevity?
  - What is the effect of climate on happiness?
- In observational studies, causation cannot be determined

Experiments vs. Observational Studies
- This is so important that it gets its own slide
- Only properly controlled, double-blind, randomized experiments can determine causation
- Confounding cannot be removed from observational studies so a causal link cannot be made
  - Doesn’t stop people from doing it though

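A small simulation can make the point about confounding concrete. In the sketch below the treatment has no effect at all on the response; a hidden variable (called `healthy` here, purely as an assumption for illustration) drives the response and, in the observational setting, also influences who 'chooses' the treatment. The probabilities, effect size and function name are invented for this example.

```python
import random
from statistics import mean


def simulate(n, randomized, seed=0):
    """Return mean(response | treated) - mean(response | untreated).

    By construction the treatment has NO effect on the response.
    """
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(n):
        healthy = rng.random() < 0.5  # hidden confounder
        if randomized:
            # Coin flip: assignment ignores the confounder.
            gets_tx = rng.random() < 0.5
        else:
            # Subjects 'choose' their own treatment; healthy ones choose it more often.
            gets_tx = rng.random() < (0.8 if healthy else 0.2)
        # The response depends on the confounder and noise only.
        response = (10.0 if healthy else 0.0) + rng.gauss(0, 1)
        (treated if gets_tx else untreated).append(response)
    return mean(treated) - mean(untreated)


obs_diff = simulate(10_000, randomized=False)
exp_diff = simulate(10_000, randomized=True)
print(f"observational difference: {obs_diff:.2f}")
print(f"randomized difference:    {exp_diff:.2f}")
```

With these made-up settings the observational difference should land near 10 × (0.8 − 0.2) = 6 even though the treatment does nothing, while the randomized difference should sit near zero.
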
Anecdotal Evidence
- “My grandfather lived to 102 and he smoked a pack a day!”
- “My aunt went to this tarot-card reader and found true love the next week!”
- …
- Not statistical or scientific
- Arguably more influential than both experiments and observational studies
  - You should work to change this!
  - How? Collect and study data

Data
“In God we trust; all others bring data” - W. Edwards Deming
(Quote on the wall at JSC – NASA)

X: Independent Variable(s)
- Not the best name for it
  - Predictor, Explanatory
- Manipulated in an experiment
- Can take one of four types
  - Categorical / Nominal
  - Ordinal
  - Interval
  - Ratio

Categorical variables
- Take on several levels, none of which have any natural ordering
  - Sex (M, F, …)
  - Race (Black, White, Asian, …)
  - Program major (Stat, CS, Math, Psych, Bio, …)
  - Type of fertilizer (A, B, …)
  - Drug (Active, Placebo)
- When controlled by the experimenter, called a Factor

Ordinal variables
- Take on several levels which have a natural order, but no consistent distance metric
  - Grade (A+, A, A-, B+, …)
  - Professor Rating (5, 4, 3, 2, 1)
    - Likert item
  - Level of education (PhD, Masters, Bachelors, HS, Primary, None)
  - Sports (Rugby, Football, Soccer, … Basketball)
- Difficult to deal with, so we usually consider them as either Categorical, or

Interval variables
- Numerical variable with a consistent distance metric, but no proper zero point
  - IQ
  - Temperature (in °C)
  - SAT score
- Slope and difference are meaningful, but ratios are not

Ratio variables
- Interval variable with a proper zero point
  - Age
  - Weight
  - Temperature (in K)
  - Amount of rainfall
- Ratios are meaningful
  - Important for reporting on multiplicative effects and using log transformations

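The temperature examples give a quick way to see the interval/ratio distinction numerically. The sketch below uses two arbitrary temperatures (10 °C and 20 °C, chosen only for illustration) and shows that the difference is the same in both scales while the ratio is not.

```python
def c_to_k(celsius):
    """Convert degrees Celsius (interval scale) to kelvin (ratio scale)."""
    return celsius + 273.15


cold_c, warm_c = 10.0, 20.0  # arbitrary example temperatures

# Differences agree across the two scales ...
diff_c = warm_c - cold_c
diff_k = c_to_k(warm_c) - c_to_k(cold_c)

# ... but ratios do not: "twice as hot" in Celsius is not twice as hot in kelvin.
ratio_c = warm_c / cold_c
ratio_k = c_to_k(warm_c) / c_to_k(cold_c)

print(f"difference: {diff_c:.2f} C vs {diff_k:.2f} K")
print(f"ratio:      {ratio_c:.3f} (Celsius) vs {ratio_k:.3f} (kelvin)")
```

Only the kelvin ratio reflects a physical statement, because kelvin has a proper zero point.
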
Example data file

| Sex | Grade | IQ | Age |
| --- | --- | --- | --- |
| M | B+ | 110 | 19 |
| M | C- | 102 | 21 |
| F | A+ | 119 | 19 |
| F | C+ | 103 | 20 |
| … | … | … | … |

- Rows are cases
- Columns are variables
- That’s the way we roll – don’t rock the boat

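One possible way to hold the example data file in code, keeping the "rows are cases, columns are variables" layout, is a list of records. The sketch below uses only the four rows shown on the slide; the `variable_types` labels follow the variable-type slides, and the container choice is an assumption rather than anything the course prescribes.

```python
from statistics import mean

# Each dict is one case (row); each key is one variable (column).
rows = [
    {"Sex": "M", "Grade": "B+", "IQ": 110, "Age": 19},
    {"Sex": "M", "Grade": "C-", "IQ": 102, "Age": 21},
    {"Sex": "F", "Grade": "A+", "IQ": 119, "Age": 19},
    {"Sex": "F", "Grade": "C+", "IQ": 103, "Age": 20},
]

# Measurement type of each column, as described on the variable-type slides.
variable_types = {
    "Sex": "categorical",
    "Grade": "ordinal",
    "IQ": "interval",
    "Age": "ratio",
}

# Pulling out a column is just a pass over the cases.
iq_column = [case["IQ"] for case in rows]
mean_age = mean(case["Age"] for case in rows)

print("IQ column:", iq_column)
print("mean age: ", mean_age)
```
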
Y: Dependent Variable
- Not a good name for it either
  - Response variable is better
- Measured by experimenter
- Can take four types as well
  - We’ll study numerical and categorical responses in this course
- Should not be subject to Floor or Ceiling Effect
  - Math contest scores and high school grades

X: Independent Variable(s)
- Could be numerical or categorical
- If X is categorical
  - We call X a factor if we manipulate it
  - Single-Factor experiments have only one X variable, call it Factor A
    - The possible levels of Factor A are {a1, a2, a3, …}
  - Two-Factor experiments have two independent variables, Factors A and B
  - Treatments are combinations of Factor levels

Y: Response Variable(s)
- Could also be numerical or categorical
- This leads to many different setups, some of which you may have studied already

| | Y Categorical | Y Numerical |
| --- | --- | --- |
| X Categorical | Cont. Tables | ANOVA |
| X Numerical | Log. Reg. | Regression |

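The X-by-Y table can be read as a simple lookup from variable types to a family of methods. The sketch below encodes exactly the four cells of the table; the function name `suggest_method` and the spelled-out method names are choices made for this example.

```python
# Keys are (type of X, type of Y); values are the method family from the table.
METHOD_TABLE = {
    ("categorical", "categorical"): "contingency tables",
    ("categorical", "numerical"): "ANOVA",
    ("numerical", "categorical"): "logistic regression",
    ("numerical", "numerical"): "regression",
}


def suggest_method(x_type, y_type):
    """Look up the method family for a given pair of variable types."""
    try:
        return METHOD_TABLE[(x_type, y_type)]
    except KeyError:
        raise ValueError(f"unknown variable types: {x_type!r}, {y_type!r}") from None


print(suggest_method("categorical", "numerical"))
print(suggest_method("numerical", "categorical"))
```
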
Experimental Units
- A minimal unit that could possibly receive a unique treatment
  - Often, but not always, a single row in the datafile
- Each one gets a single treatment from the possible levels of all factors
  - e.g. Households may have multiple people (measurements) but if the treatment was applied to the household, then household is the unit
  - There is no way that different members of the household could get a different Tx

Example
- We want to do an experiment to see if *Teaching Style* affects the *Learning Outcome* in a classroom
  - Note italics: those are the variables
  - Which is the response? Independent?
  - What type is each?
  - How many levels?
  - Go!

Example
- Suppose we want to see if Teaching Style and Lecture Time affect Learning Outcome
  - How many factors? Levels of each?
  - What are the possible treatments?
  - What are the experimental units?
  - Go!

Crossed Factors
- When all possible factor combinations are actual treatments, the experiment is said to be fully crossed
- These are easier to deal with than other designs such as Nested, which we may study later
  - Leads to hierarchical linear models

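Since treatments are combinations of factor levels, a fully crossed design is the Cartesian product of the level sets. The sketch below assumes two levels per factor for the teaching example (the level names are hypothetical, not given on the slides) and checks whether a set of treatments is fully crossed.

```python
from itertools import product

# Hypothetical levels for the two factors in the teaching example.
factors = {
    "teaching_style": ["traditional", "flipped"],
    "lecture_time": ["morning", "afternoon"],
}

# Every combination of one level per factor is a treatment.
treatments = list(product(*factors.values()))


def is_fully_crossed(used_treatments, factors):
    """True if every possible combination of factor levels is actually used."""
    return set(used_treatments) == set(product(*factors.values()))


for tx in treatments:
    print(tx)
print("fully crossed:", is_fully_crossed(treatments, factors))
print("one cell dropped:", is_fully_crossed(treatments[:3], factors))
```
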
X vs Y
- When there is more than one response variable, this calls for a Multivariate Analysis
  - More difficult
  - We may cover these methods in this course
- Most interesting experiments have more than one independent variable
  - This is not a “multivariate” situation
  - Call it “multiple” if you must

Nuisance variables
- Anything that influences the response variable other than the treatment condition
- Teaching study
  - Different lecturers for each section
  - Amount of instruction given for each section
  - Older students may be better (or worse) at performance
  - Amount of homework done
  - Time of day for lecture section
- What can we do about them?

Control
- If we can control a nuisance variable and keep it constant throughout all treatment levels, then we should
  - It is then no longer a ‘variable’ – problem solved
- Teaching study
  - Same lecturer delivers each style
  - Amount of instruction given for each section can easily be controlled – 36h each

Blocking
- For nuisance variables that cannot be controlled but can still be observed, we can use Blocking to make sure each group has an equal amount of each
- Teaching study
  - When forming the two sections, ensure that there is an equal distribution by age in each class

Randomization
- For nuisance variables that cannot be controlled or observed, we depend on Randomization to spread these variables out evenly
- Teaching study
  - If, after any desired blocking, we allocate students randomly to each section, then the amount of homework done should be roughly equal, on average, between the two sections
    - Even if not equal among all units

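Blocking followed by randomization can be sketched as: group the units by the observed nuisance variable, then shuffle within each group and deal out the treatments in turn. The student list, the two age bands and the function name below are invented for illustration; this is one simple scheme, not necessarily the one used in the course.

```python
import random
from collections import Counter, defaultdict


def blocked_assignment(units, block_of, treatments, seed=0):
    """Randomly assign treatments within each block so blocks are balanced."""
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for unit in units:
        blocks[block_of(unit)].append(unit)

    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)  # randomization happens inside the block
        for i, unit in enumerate(members):
            assignment[unit] = treatments[i % len(treatments)]
    return assignment


# Hypothetical class of 20 students: 12 under 21 and 8 aged 21 or over.
students = [(f"s{i:02d}", "under 21" if i < 12 else "21 and over") for i in range(20)]

assignment = blocked_assignment(students, block_of=lambda s: s[1], treatments=["A", "B"])

# Count how many students of each age band ended up in each section.
counts = Counter((student[1], tx) for student, tx in assignment.items())
for key in sorted(counts):
    print(key, counts[key])
```

Each age band is split evenly between sections A and B, while which student lands in which section is left to chance.
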
Replication
- It would be silly to run an experiment with just one observation at each treatment combination
  - If there’s a difference between treatments, it might just be that experimental units are different!
    - Cannot say for sure
- Also, we can’t estimate variance with only one observation
- Teaching study
  - There are many students in each class
  - Another university could replicate the study

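The remark about variance can be checked directly: the sample variance divides by n − 1, so it is undefined for a single observation. The sketch below uses Python's `statistics.variance`, which raises `StatisticsError` in that case; the scores and the helper name are made up for this example.

```python
from statistics import StatisticsError, variance


def sample_variance_or_none(observations):
    """Return the sample variance, or None when there are too few observations."""
    try:
        return variance(observations)
    except StatisticsError:
        # Raised when fewer than two data points are supplied.
        return None


single = [72.0]                  # one observation at a treatment combination
replicated = [70.0, 74.0, 78.0]  # three observations at the same combination

print("one observation:   ", sample_variance_or_none(single))
print("three observations:", sample_variance_or_none(replicated))
```
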
Replication
- There are really two levels of replication
- Treatment Level
  - We take more than one observation at each treatment combination
- Experiment Level
  - We like to replicate the entire experiment, to make sure that our results were not an artifact of something else, and to generalize to other populations
  - Also an easy way to get a “Me Too!” paper published

Blinding
- If subjects are aware of the assigned treatments, they may influence the response variable
  - Consciously or not
  - Placebos
- If researchers are aware of the assigned treatments, they may influence the response too
  - Consciously or not
  - Lanarkshire Milk Study
- We try to make experiments Double-Blind for these reasons

![Page 58: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/58.jpg)
Principles of Experiment Design
These form the principles of good experiment design:
Control
Blocking
Randomization
Replication
Blinding
Observational studies lack randomization, and likely one or more other principles as well
This is exactly why we can’t infer causation from them
![Page 59: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/59.jpg)
Does this matter?
Ignoring these principles is like applying the rules of one sport (say, basketball) to another (say, soccer)
It might work, but it probably won’t
It doesn’t make ANY SENSE to apply statistical methods rooted in the preceding assumptions to an experiment that hasn’t followed the principles of good design
So yes, it does matter
![Page 60: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/60.jpg)
Confounding
Any nuisance variables that are systematically related to the treatments will have their effects confounded with the treatment effects we are trying to measure
Teaching study
If style A is taught in the morning section and style B in the evening section, then any difference in response between sections cannot be uniquely attributed to teaching style
![Page 61: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/61.jpg)
Mozart Effect
Babies who listen to classical music tend to do better in school later on
Does this mean parents should play classical music for their babies?
Please comment
What is one possible confounding variable?
![Page 62: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/62.jpg)
Balance
We generally try to design experiments with the same number of experimental units in each treatment group
Such an experiment is called balanced
There are reasons to design unbalanced experiments as well, on purpose
At any rate, as soon as somebody drops a test tube, it’s no longer balanced
We’ll have to learn to deal with both
![Page 63: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/63.jpg)
Applied Stats Algorithm
[Flowchart; node labels: Scientific question? / Experiment / Observational Study / Prospective / Retrospective / Properly Controlled? / No / No / Crossed / Nested / Single Factor / Factorial]
![Page 64: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/64.jpg)
Applied Stats Algorithm
[Flowchart; node labels: Scientific question? / No / Classify Study / Unacceptable]
![Page 65: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/65.jpg)
Acquiring Data
Surveys and Samples
![Page 66: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/66.jpg)
Another way
We already saw two ways to come into data
Designed Experiments
Observational Studies
You could also run a survey, or take a sample
![Page 67: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/67.jpg)
Population and Sample
Population
A (large) group of experimental (observational) units about which we want to make some inference
People, rabbits, trees, bags of water, …
Sometimes well-defined, sometimes not
Sample
A (smaller) group of experimental units that we hope represents the population well
Selected from a frame (an approximation to a list of the population)
![Page 68: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/68.jpg)
Population and Sample
From our sample, we compute statistics
We don’t really want statistics, we want the population parameters
With our knowledge of probability and the magic of inductive reasoning, we can infer parameters from statistics, usually with some sort of interval
Confidence, Credible
![Page 69: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/69.jpg)
Confidence Interval
Pair of numbers chosen so that the probability they will enclose the (fixed) parameter, or function of parameters, is large, like 95%
CIs are random – there is nothing particularly special about your CI
Because they are random, they miss the parameter occasionally
About 100α% of the time (5% of the time for a 95% interval)
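The "CIs are random" point can be checked by simulation. The sketch below assumes normal data with a known standard deviation, so the usual z-interval applies. The values `mu`, `sigma`, `n` and `reps` are arbitrary illustrative choices, not numbers from the course.

```python
import math
import random
import statistics

def mean_ci_known_sigma(sample, sigma, z=1.96):
    """Approximate 95% confidence interval for a mean when sigma is known."""
    centre = statistics.fmean(sample)
    half_width = z * sigma / math.sqrt(len(sample))
    return centre - half_width, centre + half_width

def coverage(mu=10.0, sigma=2.0, n=25, reps=2000, seed=0):
    """Fraction of repeated-sample intervals that enclose the fixed parameter mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        lo, hi = mean_ci_known_sigma(sample, sigma)
        hits += lo <= mu <= hi
    return hits / reps

rate = coverage()
print(f"fraction of intervals that enclose mu: {rate:.3f}")
```

The parameter never moves; the intervals do. Roughly 1 in 20 of them should miss.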
![Page 70: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/70.jpg)
Credible Interval
This is a Bayesian concept, and thus heresy in this course (and UG program)
I don’t really believe this, and we shall discuss Bayesian philosophy shortly
But not methods
Image credit: Monty Python Series 2, Episode 2
![Page 71: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/71.jpg)
Probability Sampling Techniques
SRS
Stratified
Cluster
Multistage
Systematic
![Page 72: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/72.jpg)
Simple Random Sample
Every possible subset of size n has equal chance of selection
Theoretically easy
Practically difficult
Basis against which all other samples are measured
Images: D. Kernler (CC BY-SA 4.0)
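As a small sketch of an SRS, assuming we are lucky enough to have a complete frame (here a hypothetical list of 500 unit labels), Python's `random.sample` draws without replacement so that every subset of size n is equally likely:

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; each size-n subset is equally likely."""
    return random.Random(seed).sample(frame, n)

# Hypothetical sampling frame (an assumption for illustration).
frame = [f"unit_{i:03d}" for i in range(500)]
srs = simple_random_sample(frame, 20, seed=7)
print(srs[:5])
```

The "practically difficult" part is getting the frame, not running the draw.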
![Page 73: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/73.jpg)
Stratified Sample
Divide the population into fixed ‘strata’, and sample as an SRS from each
No less efficient than SRS; potentially more
Allows stratum-level inference as well
![Page 74: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/74.jpg)
Cluster Sample
Divide the population into natural clusters, and take an SRS of clusters
Sample entire cluster
Very cost-effective sampling strategy
Population-level frame not required
Usually implemented as multistage
![Page 75: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/75.jpg)
Stratified vs Cluster
Strata are fixed groups and there are usually few
Clusters are random and there are usually many
We sample from all strata but only some clusters
Stratified sampling is very efficient when
The strata themselves are very homogeneous
Differences (variation) between strata are large
Cluster sampling is very efficient when
The clusters themselves are very heterogeneous
Differences (variation) between clusters are minimal
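The contrast between the two designs can be sketched in a few lines. The strata and cluster names below are hypothetical, and both helpers are illustrative rather than a standard API. The key difference is visible in the code: the stratified draw touches every stratum, while the cluster draw picks only some clusters and then takes them whole.

```python
import random

def stratified_sample(strata, n_per_stratum, rng):
    """Take an SRS of n_per_stratum units from EVERY stratum."""
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Take an SRS of clusters, then keep every unit in each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

rng = random.Random(2101)

# Hypothetical population: few fixed strata, many natural clusters.
strata = {
    "first_year": [f"fy_{i}" for i in range(300)],
    "upper_year": [f"uy_{i}" for i in range(200)],
}
clusters = {f"classroom_{c:02d}": [f"c{c:02d}_s{i}" for i in range(25)] for c in range(30)}

strat = stratified_sample(strata, 5, rng)
clus = cluster_sample(clusters, 3, rng)
print({name: len(units) for name, units in strat.items()})
print({name: len(units) for name, units in clus.items()})
```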
![Page 76: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/76.jpg)
Systematic Sample
Choose every kth unit
Does not strictly require a frame, but watch out for natural periods!
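A short sketch of systematic sampling, with a random starting point added as an assumption (the slide only says "every kth unit"). The second half shows the natural-period hazard with made-up data: daily records sampled every 7th day always land on the same weekday.

```python
import random

def systematic_sample(units, k, seed=None):
    """Pick a random start in the first k units, then take every kth unit."""
    start = random.Random(seed).randrange(k)
    return units[start::k]

# Hypothetical daily records over 8 weeks (an assumption for illustration).
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"] * 8

picked = systematic_sample(days, 7, seed=3)
print(picked)   # every sampled day is the same weekday
```

When the sampling interval matches a period in the data, the sample sees only one phase of that period.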
![Page 77: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/77.jpg)
Differences with Studies
A study tries to measure the effect of something, whether or not it is causal
A survey is one way to collect data in an observational study (OS), if you don’t feel like measuring it yourself
Or can’t
It’s certainly not an experiment, since there is no treatment applied
With surveys, you don’t need to look at “effects”
You can still estimate parameters, with intervals
![Page 78: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/78.jpg)
Probability and Statistics
What’s the difference?
![Page 79: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/79.jpg)
Probability
I have a fair coin (½ chance to get tails)
I am going to flip it n times
What is the probability that I see tails exactly k times?
$$P(X = k) = \binom{n}{k} \left(\frac{1}{2}\right)^{k} \left(\frac{1}{2}\right)^{n-k}$$
You know the parameter (p = ½), and want the chances of observing a specific sample
Deductive reasoning
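The deductive direction is just a calculation. A minimal sketch of the binomial probability for the fair coin, using only the standard library (the function name `binomial_pmf` is an illustrative choice):

```python
from math import comb

def binomial_pmf(k, n, p=0.5):
    """P(X = k) for X ~ Binomial(n, p): exactly k tails in n tosses."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

prob = binomial_pmf(3, 10)          # C(10, 3) / 2**10 = 120 / 1024
print(f"P(exactly 3 tails in 10 tosses) = {prob:.4f}")
```

With p known, every such probability follows with no data at all.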
![Page 80: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/80.jpg)
Statistics
I have a coin, but I don’t know if it’s fair
p chance to get tails
I have just flipped it n times
I got tails k times
What can I say about the coin’s fairness?
i.e., I want to estimate p
You know the outcome of a specific sample, and want to infer the parameter p
Inductive reasoning
![Page 81: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/81.jpg)
Again
Probability: Parameters → Experiment
Statistics: Experiment → Parameters
They are inverse problems
![Page 82: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/82.jpg)
Basic Statistical Problem
I have a coin and I want to know if it’s fair
Or, more specifically, what is p
The probability of tails on a single toss
There are three approaches to solve this problem
1. Layman
2. Frequentist
3. Bayesian
![Page 83: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/83.jpg)
The Layman Approach
It’s a coin, of course it’s fair!
I don’t need to do an experiment
Let’s go watch the hockey game instead
Obviously not very scientific
![Page 84: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/84.jpg)
The Frequentist Approach
I have no idea, let’s do an experiment
Toss the coin 10 times
Get tails on 3 out of 10 tosses
Predict p = 0.3
Much more scientific
Definitely objective and unbiased
![Page 85: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/85.jpg)
The Bayesian Approach
I think it’s probably fair since most coins are fair, but I’ll do an experiment anyway
Toss the coin 10 times
Get tails on 3 out of 10 tosses
I thought p = 0.5, but it looks like p = 0.3
Predict p = 0.4
It’s a bit more complicated than taking the midpoint
Subjective
Uses more information than just the sample
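To see why the Bayesian answer is "a bit more complicated than taking the midpoint", here is a hedged sketch using a Beta prior. The Beta(5, 5) prior is an assumption chosen for illustration: it is centred at 0.5 and carries about as much weight as 10 earlier tosses, which happens to reproduce 0.4. A stronger or weaker prior would pull the answer elsewhere.

```python
def frequentist_estimate(tails, tosses):
    """Sample proportion: uses the data and nothing else."""
    return tails / tosses

def beta_posterior_mean(tails, tosses, prior_a, prior_b):
    """Posterior mean of p under a Beta(prior_a, prior_b) prior with binomial data."""
    return (prior_a + tails) / (prior_a + prior_b + tosses)

p_hat = frequentist_estimate(3, 10)            # 3 / 10
p_bayes = beta_posterior_mean(3, 10, 5, 5)     # (5 + 3) / (10 + 10)
p_strong = beta_posterior_mean(3, 10, 50, 50)  # a much more stubborn prior

print(f"frequentist: {p_hat:.3f}, Beta(5,5) prior: {p_bayes:.3f}, Beta(50,50) prior: {p_strong:.3f}")
```

The posterior mean is a weighted average of the prior guess and the sample proportion, with weights set by how much information each carries. That choice of prior weight is the subjective part.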
![Page 86: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/86.jpg)
Polling Time
Which do you prefer?
1. Layman
2. Frequentist
3. Bayesian
![Page 87: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/87.jpg)
The situation
You are a teacher, and one of your students (Billy) has turned in five items of ‘A’ quality so far this term (out of five total items)
Call them papers
They could be essays, tests, assignments, …
A sixth paper has just come due, and you have to mark it
A A A A A ???
You turn to Statistics for guidance
![Page 88: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/88.jpg)
The paper
Unfortunately for Billy, the paper is actually of ‘D’ quality
Objectively
Look at three possible marking options
![Page 89: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/89.jpg)
The Layman says …
Billy always gets an A, so I’m not even going to read this paper
Give him an A and let’s watch the game!
This approach would miss the objective fact that the paper is of D quality
Sounds ridiculous, but I’m pretty sure some of my former colleagues (ahem … English department) marked this way
![Page 90: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/90.jpg)
The Frequentist says …
This paper is a D, so I’m going to give Billy a D because that’s what he earned
I don’t care that he always gets an A
Seems ‘fair’, although Billy won’t like it
Of course, he would like it just fine if he usually got D’s and turned in an A paper
![Page 91: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/91.jpg)
The Bayesian says …
Billy always gets an A, but I’ll read his paper anyway to see if I should update my thinking
Oh, it’s a D
Better give him a B just to be safe
If he turns in another D paper, I’ll give him a C, and the following D paper will get a D
Billy will appreciate this method, unless of course the situation were reversed
![Page 92: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/92.jpg)
Polling Time
Which do you prefer?
1. Layman
2. Frequentist
3. Bayesian
![Page 93: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/93.jpg)
Anecdotally
Most choose the Frequentist method, in contrast to the coin experiment
Maybe because it’s ingrained in us early as teachers & students
However, if you’re looking at the name first, you’re a Bayesian whether you like it or not
Even if you don’t look, you may recognize the handwriting
We can extend this
![Page 94: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/94.jpg)
It’s an important question
We can extend this idea beyond just marking, to cover all interactions with people
I generally like my friend Mark, but he just said something mean to me
Should I react accordingly, or temper my reaction based on his past good behaviour?
Similarly with people who have screwed me over in the past
Should I trust them now after one good instance, or will that be naïve?
![Page 95: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/95.jpg)
Lastly
I have a 6-sided die, the general outcome of which we’ll call X
And we’ll call it x when we observe a specific outcome
I just rolled it for you to see: x = 5
Is x a fixed constant, or a random variable?
![Page 96: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/96.jpg)
Lastly
Same setup, except I haven’t rolled the die yet
Is X a fixed constant, or a random variable?
What’s the best we can say about our upcoming outcome x?
![Page 97: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/97.jpg)
Lastly
Same setup, except I rolled the die but didn’t show you the outcome
Is x a fixed constant, or a random variable?
To be clear, X (or x) is not a parameter, but it’s still something unknown, at least to you
![Page 98: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/98.jpg)
Distribution = Population Histogram
![Page 99: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/99.jpg)
Conditional Distribution
For each value x of the independent variable X, there is a separate distribution of the dependent variable Y
This is called the conditional distribution of Y given X = x
Conditional distribution of height given Sex = F
![Page 100: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/100.jpg)
Definition of “Related”
We will say that the independent and dependent variables are unrelated if the conditional distribution of the dependent variable is identical for each value of the independent variable
If the distribution of the dependent variable does depend on the value of the independent variable, we will describe the two variables as related, or associated
![Page 101: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/101.jpg)
Testing Statistical Significance
Are the independent variable (IV) and dependent variable (DV) “really” related?
Null Hypothesis H0: They are unrelated in the population.
Reasoning: Suppose that the IV and DV are actually unrelated in the population.
If H0 is true, what is the probability of obtaining a sample relationship between the variables that is as strong as or stronger than the one observed?
If the probability is small (say, p < 0.05), then we describe the sample relationship as statistically significant, and it is socially acceptable to discuss the results
![Page 102: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/102.jpg)
P-value
The probability of getting results at least as extreme as ours just by chance, if the null hypothesis is true.
The conditional probability of a test statistic at least as extreme as the one observed, given that the null hypothesis is true
The minimum significance level at which the null hypothesis can be rejected.
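As a worked instance, here is a sketch of an exact two-sided p-value for the earlier coin data (3 tails in 10 tosses) under H0: p = 0.5. One assumption needs flagging: "at least as extreme" is taken to mean "no more probable under H0 than what we observed", which is one common convention among several.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def two_sided_p_value(k_obs, n, p0=0.5):
    """Sum the H0 probabilities of all outcomes no more likely than the observed one."""
    observed = binomial_pmf(k_obs, n, p0)
    total = 0.0
    for k in range(n + 1):
        prob = binomial_pmf(k, n, p0)
        if prob <= observed * (1 + 1e-9):   # small tolerance for float round-off
            total += prob
    return total

p_value = two_sided_p_value(3, 10)
print(f"p-value for 3 tails in 10 tosses: {p_value:.4f}")
```

The outcomes counted are 0, 1, 2, 3, 7, 8, 9 and 10 tails, giving 352/1024, about 0.34. That is nowhere near 0.05, so these ten tosses give little evidence against a fair coin.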
![Page 103: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/103.jpg)
We can be wrong
Type I error: H0 is true, but we reject it
Type II error: H0 is false, but we fail to reject it
![Page 104: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/104.jpg)
Errors
http://experimentaltheology.blogspot.ca/2010/09/theology-of-type-1-type-2-errors.html
![Page 105: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/105.jpg)
Image: Stock photo
![Page 106: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/106.jpg)
We can be wrong
Type I error: H0 is true, but we reject it
Type II error: H0 is false, but we fail to reject it
Type III error: Answer the wrong question
![Page 107: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/107.jpg)
Power
The probability of correctly rejecting H0 (that is, rejecting H0 when it is false)
Power = 1 - P(Type II Error)
Power increases with true strength of relationship, and with sample size
Power can be used to select sample size in advance of data collection
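A rough sketch of how power could be computed for the coin problem, assuming an exact two-sided binomial test of H0: p = 0.5 at level 0.05. The helper names and the convention for "extreme" (outcomes no more probable under H0 than the observed one) are illustrative assumptions. Power is the probability, under the true p, of landing in the rejection region.

```python
from math import comb

def pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def p_value(k_obs, n, p0):
    """Two-sided exact p-value: outcomes no more likely under H0 than k_obs."""
    cutoff = pmf(k_obs, n, p0) * (1 + 1e-9)
    return sum(pmf(k, n, p0) for k in range(n + 1) if pmf(k, n, p0) <= cutoff)

def power(n, p_true, p0=0.5, alpha=0.05):
    """P(reject H0) when the true tail probability is p_true."""
    reject = [k for k in range(n + 1) if p_value(k, n, p0) <= alpha]
    return sum(pmf(k, n, p_true) for k in reject)

for n in (10, 50, 200):
    print(f"n = {n:3d}: power against p = 0.3 is {power(n, 0.3):.3f}")
```

Scanning n until the power crosses a target (80% is a common choice) is one way to pick a sample size before collecting data. Setting `p_true` equal to `p0` returns the Type I error rate instead.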
![Page 108: Methods of Applied Statistics I · 2015-09-19 · Logistics Lecture 2-5pm every Friday Quiz at ~ 3pm, followed by 10-minute break (when you may retrieve old quizzes) All course material](https://reader033.vdocuments.mx/reader033/viewer/2022060313/5f0b6d917e708231d4307829/html5/thumbnails/108.jpg)
Should we accept H0?
When the results are not statistically significant, usually we will say that the data do not provide enough evidence to conclude that the variables are related
Sometimes, we have to make a decision either way, in which case ‘not rejecting’ H0 is tantamount to accepting H0
Quality control