Post on 22-Dec-2015
Chapter 4
Gathering Data
Looking Back
In Chapters 2 & 3 we learned how to describe data both graphically and numerically. For these statistical analyses to be useful, we must have good data. In fact, the way a study is designed (how we gather data) can have a major impact on the results of the study.
The purpose of this course is for you to learn what you can conclude about an entire population given a sample from that population.
If a study is poorly designed and implemented, the results may be meaningless or misleading.
Example taken from Statistics: The Art and Science of Learning from Data
Two Scenarios
Study 1: A U.S. study (2000) compared 469 patients with brain cancer to 422 patients who did not have brain cancer. The patients’ cell phone use was measured using a questionnaire. The two groups’ use of cell phones was similar.
Study 2: An Australian study (1997) used 200 transgenic mice. One hundred were exposed for two 30-minute periods a day to the same kind of microwaves, at roughly the same power, as those transmitted by a cell phone. The other 100 mice were not exposed. After 18 months, the brain tumor rate for the exposed mice was twice as high as that for the unexposed mice.
Questions to Consider
How do the two studies differ?
Study 1: No treatments were assigned; patients were merely questioned.
Study 2: Treatments (exposure or no exposure) were assigned, and mice were used in the hope of generalizing the results to humans.
Questions to Consider
Why do the results of different medical studies sometimes disagree? The studies may differ in type, in how the data were collected, or in their sampling frames.
Could the second study be performed on human beings? No, because it would be unethical to knowingly expose humans to possibly harmful microwaves.
Questions to Consider
Suppose a friend recently diagnosed with brain cancer was a frequent cell phone user. Is this strong evidence that frequent cell phone use increases the likelihood of getting brain cancer? No. Informal observations of this type are called anecdotal evidence. You should rely on reputable research studies, not anecdotes.
Two Main Ways to Gather Data
Observational Study
The researcher observes values of the response and explanatory variables for the sampled subjects without imposing any treatments.
Example: Study 1
Experiment
The researcher assigns experimental conditions (also called treatments) to subjects (also called experimental units) and then observes outcomes on the response variable. Treatments correspond to values of the explanatory variable.
Example: Study 2
Advantages of Experiments over Observational Studies
In an observational study, there can always be lurking variables affecting the results.
This means that observational studies can never show causation.
It is easier to adjust for lurking variables in an experiment.
In general, we can study the effect of an explanatory variable on a response variable more accurately with an experiment than with an observational study.
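To see why a lurking variable can mislead an observational study but not a randomized experiment, here is a small simulation under made-up assumptions: the treatment has no real effect, and a hypothetical lurking variable `z` raises the response and, in the observational version only, makes subjects more likely to choose the treatment themselves. The function `simulate` and every number in it are illustrative, not from any real study.

```python
import random
import statistics


def simulate(n=10_000, randomize=False, seed=1):
    """Return mean(treated) - mean(control) for a hypothetical study.

    Assumed model: the response depends only on a lurking variable z,
    so the treatment itself has no effect at all.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        z = rng.random()  # lurking variable, uniform on [0, 1)
        if randomize:
            gets_treatment = rng.random() < 0.5  # experiment: researcher flips a coin
        else:
            gets_treatment = rng.random() < z  # observational: high-z subjects self-select
        response = 10 * z + rng.gauss(0, 1)  # no treatment term in the response
        (treated if gets_treatment else control).append(response)
    return statistics.mean(treated) - statistics.mean(control)


print(f"observational study: {simulate(randomize=False):.2f}")
print(f"randomized experiment: {simulate(randomize=True):.2f}")
```

Under these assumptions the observational comparison shows a gap of about 3.3 in expectation even though the treatment does nothing, while the randomized comparison gives a gap close to 0.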
Disadvantages of Experiments
They can be unethical to perform on the subjects you are interested in.
It can be difficult to monitor subjects to ensure that they are doing what they are told.
They can take many years, even decades, to complete.
Results of experiments that use animals may not generalize to humans.
They are unnecessary when the question of interest does not involve trying to assess causality.
Example taken from Statistics: The Art and Science of Learning from Data
Example 4.1
A large study of student drug use and how it depends on drug testing enrolled 76,000 middle and high school students. Each student in the study filled out a questionnaire. One question asked whether the student used drugs. The study found that drug use was not affected by student drug testing.
This is an example of an observational study.
Could there be any lurking variables? Yes, for example the frequency of drug testing and whether the testing is random.
Used with permission from Dr. Ellen Toby
Example 4.2
A researcher buys seeds of two different varieties of corn. He randomly selects 30 seeds of each variety and plants them in his backyard, making sure to label the location of each seed and its type. He then measures how long it takes each seed to sprout. At the end of the study he compares the average germination times of the two varieties.
This is an example of an experiment.
Could there be any lurking variables? Yes: soil quality and temperature, which can vary from one spot in the backyard to another.
Used with permission from Dr. Ellen Toby
Example 4.3
A researcher has seeds of only one variety of tomato. She has 60 nearly identical pots of soil and plants one tomato seed in each. She randomly selects 30 pots and keeps them at 75°F. The other 30 pots she keeps at 65°F. Aside from temperature, she provides the same growing conditions to all pots. She then measures how long it takes for the seeds to sprout. At the end of the study she compares the average germination times of the two temperature groups.
This is an example of an experiment.
Are there any lurking variables? No. Apart from temperature, which is the explanatory variable, every growing condition has been controlled.
Types of Observational Studies
Retrospective: observational studies that look back in time. This is sometimes done to find risk factors for certain diseases.
Cross-Sectional: observational studies that take a cross section of the population at the current time.
Prospective: observational studies in which subjects are followed into the future.
Sampling Designs for Observational Studies
Simple Random Sampling (SRS)
A simple random sample of n subjects from a population is one in which each possible sample of that size has the same chance of being selected.
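Python's standard-library `random.sample` draws without replacement so that every subset of size n is equally likely, which matches the SRS definition above. A minimal sketch, assuming a hypothetical sampling frame of 500 numbered students (the helper name is illustrative):

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw an SRS of size n without replacement; every subset of size n
    has the same chance of being chosen."""
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    return rng.sample(population, n)


# Hypothetical sampling frame: student ID numbers 1 to 500.
frame = list(range(1, 501))
print(simple_random_sample(frame, 10, seed=42))
```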
Sampling Designs for Observational Studies
Stratified Sampling
A stratified random sample divides the population into separate groups, called strata, and then selects an SRS of subjects from each stratum.
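A minimal sketch of stratified sampling, assuming a hypothetical frame of students tagged with their class year (the strata) and the same number of subjects drawn from every stratum; proportional allocation would be another reasonable choice. The helper `stratified_sample` is illustrative.

```python
import random
from collections import defaultdict


def stratified_sample(subjects, stratum_of, n_per_stratum, seed=None):
    """Group subjects into strata, then take an SRS within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for subject in subjects:
        strata[stratum_of(subject)].append(subject)
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}


# Hypothetical frame: 200 (student_id, class_year) pairs, 50 per class year.
years = ["freshman", "sophomore", "junior", "senior"]
students = [(sid, year) for sid, year in enumerate(years * 50)]
result = stratified_sample(students, stratum_of=lambda s: s[1], n_per_stratum=5, seed=7)
for year, chosen in result.items():
    print(year, [sid for sid, _ in chosen])
```

Because each class year is sampled separately, every group you want to compare is guaranteed 5 subjects, which an SRS of 20 would not guarantee.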
Sampling Designs for Observational Studies
Cluster Sampling
A cluster random sample can be used if the target population naturally divides into groups (clusters), each of which is representative of the entire target population. In this method, an SRS of clusters is taken, and every member of the selected clusters is put into the sample.
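A minimal sketch of cluster sampling, assuming a hypothetical population of 20 city blocks with 8 households each. Note that the random draw is made from the list of clusters only, not from a list of individual households.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Take an SRS of whole clusters; every member of a chosen cluster is sampled."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # SRS of cluster names
    return {name: list(clusters[name]) for name in chosen}


# Hypothetical clusters: 20 city blocks, 8 households per block.
blocks = {f"block_{b:02d}": [f"house_{b:02d}_{h}" for h in range(8)] for b in range(20)}
sample = cluster_sample(blocks, n_clusters=3, seed=3)
for name, households in sample.items():
    print(name, len(households), "households")
```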
Sampling Designs for Observational Studies
Systematic Sampling
A systematic sample selects every kth person from the sampling frame. The researcher randomly selects a number between 1 and k to determine which person to select first, then selects every kth person after that one.
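The two steps above (random start between 1 and k, then every kth person) can be sketched as follows, assuming a hypothetical sampling frame of 100 names and k = 10:

```python
import random


def systematic_sample(frame, k, seed=None):
    """Pick a random start between 1 and k, then every kth subject after it."""
    rng = random.Random(seed)
    start = rng.randint(1, k)  # 1-based position of the first subject chosen
    return frame[start - 1::k]  # slice from that position in steps of k


# Hypothetical sampling frame of 100 people; k = 10 gives a sample of 10.
frame = [f"person_{i:03d}" for i in range(1, 101)]
print(systematic_sample(frame, k=10, seed=4))
```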
Advantages of the Various Sampling Designs
Simple Random Sampling (SRS)
It is the simplest and most widespread form of sampling.
Each subject has an equal chance to be in the sample.
The sample enables us to determine how likely it is that descriptive statistics (like the sample mean) fall close to the corresponding population values about which we would like to make inferences (like the population mean).
Advantages of the Various Sampling Designs
Stratified Sampling
It ensures that there are enough subjects in each group that you want to compare.
Cluster Sampling
It does not require a sampling frame of subjects.
It is less expensive to implement.
Bias in Sampling
A sampling method is biased if the sample tends to favor some parts of the population over others. In other words, the results from the sample are not representative of the population. Our goal is an unbiased sample.
Types of Bias
Undercoverage: occurs when a sampling frame leaves out some groups in the population.
Nonresponse bias: occurs when some sampled subjects cannot be reached, refuse to participate, or fail to answer some questions.
Response bias: occurs when the subject gives an incorrect response, or when the question wording or the way the interviewer asks the questions is confusing or misleading.
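A quick calculation shows how nonresponse alone can distort an estimate. The rates below are invented for illustration: suppose 40% of the population would answer "yes", but "yes" holders return the survey at a rate of 30% and "no" holders at only 10%. The helper `respondent_share` is hypothetical.

```python
def respondent_share(p_yes, rate_yes, rate_no):
    """Expected share of 'yes' answers among the people who actually respond."""
    yes_responders = p_yes * rate_yes
    no_responders = (1 - p_yes) * rate_no
    return yes_responders / (yes_responders + no_responders)


# Hypothetical numbers: 40% of the population would say "yes",
# but "yes" holders are three times as likely to respond.
print(round(respondent_share(0.40, 0.30, 0.10), 3))  # 0.667
```

The respondents suggest about 67% "yes" when the true figure is 40%. If both groups responded at the same rate, the function would return `p_yes` unchanged, so the bias comes entirely from who chooses to respond.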
Examples of Poor Samples that Result in Bias
Convenience samples: sampling friends, sampling at the mall
Voluntary response samples: Internet surveys, call-in surveys
Example taken from Statistics: The Art and Science of Learning from Data
Example 4.4
In 1987, in her book Women and Love, Shere Hite presented results of a survey mailed to 100,000 women in the United States. One of her conclusions was that 70% of women who had been married at least five years had extramarital affairs. She based this conclusion on the replies of only 4,500 women.
This is an example of nonresponse bias.
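The size of the nonresponse follows directly from the figures in the example:

```latex
\text{response rate} = \frac{4500}{100{,}000} = 0.045 = 4.5\%
```

So more than 95% of the women who received the survey did not reply, and the small group who did reply may differ systematically from those who did not.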
Used with permission from Dr. Ellen Toby
Example 4.5
Ann Landers asked readers, “If you had it to do over again, would you have children?” A few weeks later, her column was headlined, “70% OF PARENTS SAY KIDS NOT WORTH IT.” Of the nearly 10,000 parents who wrote in, 70% said they would not have children if they could go back in time.
This is an example of voluntary response sampling.
Example taken from Statistics: The Art and Science of Learning from Data
Example 4.6
In 1936, the Literary Digest conducted a poll to predict the winner of the presidential election. Alf Landon and Franklin Roosevelt were both running for president. The sampling frame for the poll was constructed from telephone directories, country club memberships and automobile registrations. The Digest predicted that Landon would win, but in reality FDR won by a landslide.
This is an example of convenience sampling that resulted in undercoverage: in 1936, people with telephones, club memberships, and cars were wealthier than average, so many less affluent voters were left out of the sampling frame.
Used with permission from Dr. Ellen Toby
Example 4.7
An experiment involving adolescent males (ages 15-19) appeared in Science, 1995. The purpose of the study was to determine whether there was an association between survey techniques and the desire to give socially acceptable answers.
The participants were randomly assigned to one of two different survey forms, each of which had identical questions concerning sexual practices and drug habits.
Example 4.7 (continued)
The two versions of the survey were:
Paper: participants put their answers in an envelope with an ID number on it and returned it in person.
Computer: participants listened to the questions through headphones and then answered on laptops.
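Randomly assigning each participant to one of the two survey forms, as in Example 4.7, is a completely randomized design. A minimal sketch of how such an assignment could be carried out, assuming 20 hypothetical participants (the helper name and participant labels are illustrative):

```python
import random


def completely_randomized(subjects, treatments, seed=None):
    """Shuffle the subjects, then deal them out to the treatments in turn,
    so group sizes differ by at most one."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    return {t: shuffled[i::len(treatments)] for i, t in enumerate(treatments)}


participants = [f"P{i:02d}" for i in range(20)]
groups = completely_randomized(participants, ["paper", "computer"], seed=11)
print(groups)
```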
Types of Experimental Studies
Completely Randomized Design
The subjects are randomly assigned to one of the treatments.
Matched Pairs Design
Each subject is matched up with another subject who is similar in terms of age, health, etc. This creates a matched pair.
The treatments are then randomly assigned to the subjects in each pair.
This ensures that the treatment groups are essentially identical on the matching variables.
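A minimal sketch of a matched pairs assignment, assuming hypothetical subjects matched on a single variable (age); real studies often match on several variables at once. Subjects are sorted by age, neighbours are paired, and a coin flip within each pair decides who gets which treatment.

```python
import random


def matched_pairs(subjects, key, treatments=("A", "B"), seed=None):
    """Pair subjects with similar key values, then randomize treatment
    within each pair. With an odd count, the last subject is left unassigned."""
    rng = random.Random(seed)
    ordered = sorted(subjects, key=key)
    assignment = {}
    for i in range(0, len(ordered) - 1, 2):
        pair = [ordered[i], ordered[i + 1]]
        rng.shuffle(pair)  # random order within the pair
        assignment[pair[0]] = treatments[0]
        assignment[pair[1]] = treatments[1]
    return assignment


# Hypothetical subjects matched on age: (name, age).
people = [("Ana", 34), ("Ben", 61), ("Cal", 35), ("Dee", 59),
          ("Eli", 22), ("Fay", 24), ("Gus", 47), ("Hal", 45)]
for person, treatment in matched_pairs(people, key=lambda p: p[1], seed=5).items():
    print(person, treatment)
```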
Types of Experimental Studies
Crossover Design
The subjects cross over during the experiment from one treatment to another.
Randomized Block Design
Similar subjects are grouped together to form a set of experimental units, usually larger than a pair. This set is called a block.
The treatments are then randomly assigned to units within each block.
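A minimal sketch of a randomized block design, assuming a hypothetical setup of 27 garden plots spread over 3 fields (the blocks) and 3 fertilizers (the treatments). Randomization happens separately inside each block, so every fertilizer appears equally often in every field.

```python
import random
from collections import defaultdict


def randomized_block(units, block_of, treatments, seed=None):
    """Group units into blocks, then randomly assign treatments within each
    block so every treatment appears about equally often in every block."""
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for unit in units:
        blocks[block_of(unit)].append(unit)
    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)  # randomize order inside this block only
        for i, unit in enumerate(members):
            assignment[unit] = treatments[i % len(treatments)]
    return assignment


# Hypothetical: 27 plots in 3 fields (the blocks), 3 fertilizers.
plots = [(field, n) for field in ("north", "east", "south") for n in range(9)]
plan = randomized_block(plots, block_of=lambda p: p[0], treatments=["F1", "F2", "F3"], seed=9)
for plot in sorted(plan):
    print(plot, plan[plot])
```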
Elements of a Good Experiment
Control group
Allows us to compare against an existing treatment.
Enables us to control for the placebo effect. The placebo effect occurs when patients seem to improve regardless of the treatment they receive.
Randomization
Eliminates bias that can result when researchers assign treatments to the subjects.
Balances the groups on variables that you know affect the response.
Balances the groups on lurking variables that may be unknown to you.
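A small simulation of the last point, under invented assumptions: each of 1,000 hypothetical subjects has an age the researcher never records. If the researcher's own choices happen to put the older half in one group, the groups differ badly on age; a random split keeps them close.

```python
import random
import statistics

rng = random.Random(2015)
# Hypothetical lurking variable that the researcher never records: age.
ages = [rng.uniform(20, 80) for _ in range(1000)]


def mean_gap(group_a, group_b):
    """Absolute difference between the two group means."""
    return abs(statistics.mean(group_a) - statistics.mean(group_b))


# Researcher-chosen split, e.g. the older half ends up with the new treatment.
ordered = sorted(ages)
chosen_gap = mean_gap(ordered[500:], ordered[:500])

# Randomized split: shuffle the subjects, then cut the list in half.
shuffled = ages[:]
rng.shuffle(shuffled)
random_gap = mean_gap(shuffled[:500], shuffled[500:])

print(f"researcher-chosen split: {chosen_gap:.1f} years apart")
print(f"randomized split: {random_gap:.1f} years apart")
```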
Elements of a Good Experiment
Blinding
Increases reliability of the results.
Single-blind: subjects do not know the treatment assignment.
Double-blind: neither the subjects nor those in contact with the subjects know the treatment assignment.
Replication
Assigns several experimental units to each treatment.
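One common way to keep an experiment double-blind is to hide each treatment behind a neutral code. The sketch below assumes a hypothetical trial with 12 subjects and two treatments: staff and subjects see only the schedule of codes, while the key linking codes to treatments is held by a third party. Dealing the codes out in turn also gives each treatment several experimental units (replication). All names here are illustrative.

```python
import random


def double_blind_schedule(subjects, treatments, seed=None):
    """Randomly assign treatments, labelled only with neutral codes.

    Returns (schedule, key): staff and subjects see only `schedule`
    (subject -> code); `key` (code -> treatment) is held by a third party.
    """
    rng = random.Random(seed)
    codes = [f"code-{i + 1}" for i in range(len(treatments))]
    rng.shuffle(codes)  # so code-1 is not always the first treatment
    key = dict(zip(codes, treatments))
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    # Deal codes out in turn: each treatment gets several units (replication).
    schedule = {s: codes[i % len(codes)] for i, s in enumerate(shuffled)}
    return schedule, key


schedule, key = double_blind_schedule([f"S{i:02d}" for i in range(12)], ["new drug", "placebo"], seed=8)
print(schedule)
```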
Example 4.9
A pharmaceutical company has developed a new drug for treating high blood pressure. To determine the effectiveness of the drug, the company conducted an experiment in which subjects with a history of high blood pressure were treated with the new drug.
A later experiment randomly divided subjects with a history of high blood pressure into two groups. Group A was treated with the new drug as before. Group B received the most popular drug on the market at that time. The subjects were unaware of which treatment they received. 60% of the patients in Group A improved, while 63% of the patients in Group B improved.
The second experiment is better because it uses randomization, a control group, and blinding.
Example taken from Statistics: The Art and Science of Learning from Data
Example 4.10
To investigate whether antidepressants help smokers to quit smoking, one study used 429 men and women who were 18 or older and had smoked 15 cigarettes or more per day in the previous year. They were all highly motivated to quit and in good health. They were assigned to one of two groups: one group took an antidepressant called Zyban, while the other group did not take anything. At the end of a year, the study observed whether each subject had successfully abstained from smoking.
Logic Behind Randomized Comparative Experiments
Randomization tends to make the groups of subjects similar in all respects before the treatments are applied.
Using a control group for comparison ensures that external influences operate equally on both groups.
If the groups are large enough, natural differences among subjects will average out.
This means that there should be little difference in the results for the groups unless the treatments themselves actually cause the difference.
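The claim that natural differences average out in large groups can be checked with a short simulation. The baseline trait (a hypothetical resting heart rate with mean 70 and standard deviation 10) and the group sizes are assumptions chosen for illustration; the function reports how far apart two randomly formed groups typically are before any treatment is applied.

```python
import random
import statistics


def typical_gap(group_size, trials=200, seed=0):
    """Average absolute gap in a baseline trait between two randomly formed
    groups of `group_size` subjects, over many simulated studies."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(trials):
        # Hypothetical baseline trait, e.g. resting heart rate (mean 70, SD 10).
        subjects = [rng.gauss(70, 10) for _ in range(2 * group_size)]
        rng.shuffle(subjects)  # random assignment to the two groups
        group_a, group_b = subjects[:group_size], subjects[group_size:]
        gaps.append(abs(statistics.mean(group_a) - statistics.mean(group_b)))
    return statistics.mean(gaps)


for n in (5, 50, 500):
    print(f"group size {n:>3}: typical gap {typical_gap(n):.2f}")
```

The typical gap shrinks roughly in proportion to 1/√n, so with a few hundred subjects per group the groups start out nearly alike, and a large difference in results points to the treatment.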
Did You Know?
Observational studies can also have control groups. These are called case-control studies. The cases are people who have a certain disease or condition, and the controls are people who do not have it.
Their purpose is to see whether one of the explanatory variables is related to the disease.
Study 1 from the beginning of these notes is an example of a case-control study.
Important Points
Observational studies
Types: retrospective, cross-sectional, prospective
Sampling designs: simple random sample (SRS), stratified random sample, cluster sample, systematic sample
Types of bias: undercoverage, response bias, nonresponse bias
Sources of bias: convenience sampling, voluntary response sampling
Important Points
Experiments
Types: completely randomized design, matched pairs design, crossover design, randomized block design
Elements of good experiments: control group, randomization, blinding, and replication
Advantages: can show causation
Disadvantages: can be unethical; can take decades to complete
Important Points
If a group is underrepresented in the sample, we cannot make reliable inferences about it.
We must be careful when interpreting the results of observational studies.
For a comparison of several treatments to be valid, you must apply all treatments to similar groups of experimental units.
Interesting questions are usually tough to answer. This is due in part to the fact that no single experiment or observational study can settle a question of causation definitively.