
Chapter 4

Gathering Data


Looking Back

In Chapters 2 & 3 we learned how to describe data both graphically and numerically. For these statistical analyses to be useful, we must have good data. In fact, the way a study is designed (how we gather data) can have a major impact on the results of the study.

The purpose of this course is for you to learn what you can conclude about an entire population given a sample from that population.

If a study is poorly designed and implemented, the results may be meaningless or misleading.

Example taken from Statistics: The Art and Science of Learning from Data


Two Scenarios

Study 1
A U.S. study (2000) compared 469 patients with brain cancer to 422 patients who did not have brain cancer. The patients' cell phone use was measured using a questionnaire. The two groups' use of cell phones was similar.

Study 2
An Australian study (1997) used 200 transgenic mice. One hundred were exposed for two 30-minute periods a day to the same kind of microwaves, with roughly the same power, as those transmitted from a cell phone. The other 100 mice were not exposed. After 18 months, the brain tumor rate for the exposed mice was twice as high as that for the unexposed mice.


Questions to Consider

How do the two studies differ?

Study 1: No treatments assigned. Patients merely questioned.

Study 2: Uses mice in hopes of generalizing to humans.

Why do the results of different medical studies sometimes disagree?
Differing types of studies, data collection, or sample frames.

Could the second study be performed on human beings?
No, because it would be unethical to knowingly expose humans to possibly harmful waves.

Suppose a friend recently diagnosed with brain cancer was a frequent cell phone user. Is this strong evidence that frequent cell phone use increases the likelihood of getting brain cancer?

Informal observations of this type are called anecdotal evidence.

You should rely on reputable research studies, not anecdotes.

Two Main Ways to Gather Data

Observational Study
The researcher observes values of the response and explanatory variables for the sampled subjects without imposing any treatments.
Example: Study 1

Experiment
The researcher assigns experimental conditions (also called treatments) to subjects (also called experimental units) and then observes outcomes on the response variable.
Treatments correspond to values of the explanatory variable.
Example: Study 2


Advantages of Experiments over Observational Studies

In an observational study, there can always be lurking variables affecting the results.

This means that observational studies can never show causation.

It is easier to adjust for lurking variables in an experiment.

In general, we can study the effect of an explanatory variable on a response variable more accurately with an experiment than with an observational study.

Disadvantages of Experiments

They can be unethical to perform on the subjects in which you are interested.

It can be difficult to monitor subjects to ensure that they are doing what they are told.

They can take many years, even decades, to complete.

Results of experiments that use animals do not necessarily generalize to humans.

They are unnecessary when the question of interest does not involve trying to assess causality.

Example 4.1

A large study of student drug use and how it depends on drug testing enrolled 76,000 middle and high school students. Each student in the study filled out a questionnaire. One question asked whether the student used drugs. The study found that drug use was not affected by student drug testing.

This is an example of an observational study.

Could there be any lurking variables? Frequency of drug testing, whether testing is random, etc.

Used with permission from Dr. Ellen Toby

Example 4.2

A researcher buys seeds of two different varieties of corn. He randomly selects 30 seeds of each variety and plants them in his backyard, making sure to label the location of each seed and its type. He then measures how long it takes each seed to sprout. At the end of the study he compares the average germination time of the different varieties.

This is an example of an experiment.

Could there be any lurking variables? Soil quality, temperature


Example 4.3

A researcher has seeds of only one variety of tomato. She has 60 nearly identical pots of soil and plants one tomato seed in each. She randomly selects 30 pots and keeps them at 75°F. The other 30 pots she keeps at 65°F. Aside from temperature, she provides the same growing conditions to all pots. She then measures how long it takes for the seeds to sprout. At the end of the study she compares the average germination time of the different temperature groups.

Are there any lurking variables? No, everything has been controlled here.

Types of Observational Studies

Retrospective
Observational studies that look back in time. This is sometimes done to find risk factors for certain diseases.

Cross-Sectional
Observational studies that take a cross section of the population at the current time.

Prospective
Observational studies in which subjects are followed into the future.

Sampling Designs for Observational Studies

Simple Random Sampling (SRS)
A simple random sample of n subjects from a population is one in which each possible sample of that size has the same chance of being selected.
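A minimal sketch of drawing a simple random sample, assuming the sampling frame is available as a list. The function name `simple_random_sample` and the frame of 100 student IDs are made up for illustration; `random.sample` draws without replacement, so every subset of size n is equally likely.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct subjects so that every size-n subset is equally likely."""
    rng = random.Random(seed)  # the seed is only there to make the demo repeatable
    return rng.sample(list(population), n)

# Hypothetical sampling frame: student IDs 1 through 100.
frame = list(range(1, 101))
srs = simple_random_sample(frame, 10, seed=4)
print(srs)
```

In practice the seed would be left unset so that the selection is not predictable in advance.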

33


Stratified Sampling A stratified random sample divides the population

into separate groups, called strata, and then selects an SRS of _________ from each stratum.

34


Stratified Sampling A stratified random sample divides the population

into separate groups, called strata, and then selects an SRS of subjects from each stratum.
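The stratified design can be sketched as follows, assuming each subject carries a label saying which stratum it belongs to. The helper `stratified_sample`, the class-year strata, and the choice of 5 subjects per stratum are illustrative assumptions, not part of the notes.

```python
import random

def stratified_sample(subjects, stratum_of, n_per_stratum, seed=None):
    """Split subjects into strata, then take an SRS of equal size from each stratum."""
    rng = random.Random(seed)
    strata = {}
    for subject in subjects:
        strata.setdefault(stratum_of(subject), []).append(subject)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical population: (student_id, class_year) pairs, 50 students per class year.
years = ["freshman", "sophomore", "junior", "senior"]
population = [(i, years[i % 4]) for i in range(200)]
by_year = stratified_sample(population, lambda s: s[1], 5, seed=4)
for year, chosen in by_year.items():
    print(year, [student_id for student_id, _ in chosen])
```

Every stratum contributes the same number of subjects here, which is what guarantees enough subjects in each group being compared.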

Cluster Sampling
A cluster random sample can be used if the target population naturally divides into groups, each of which is representative of the entire target population. In this method, an SRS of groups (called clusters) is taken. Every member of the selected groups is put into the sample.
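A short sketch of cluster sampling, assuming the population is already organized as a mapping from cluster name to its members. The function `cluster_sample` and the dorm-floor data are hypothetical. Note that the randomness is applied to the clusters, not to individual subjects.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Take an SRS of cluster names, then keep every member of the chosen clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = [member for name in chosen for member in clusters[name]]
    return chosen, sample

# Hypothetical population: 8 dorm floors with 6 residents each.
floors = {f"floor_{f}": [f"resident_{f}_{r}" for r in range(6)]
          for f in range(1, 9)}
chosen_floors, residents = cluster_sample(floors, 2, seed=4)
print(chosen_floors)
print(residents)
```

Only a list of clusters is needed to run this, which is why cluster sampling does not require a sampling frame of individual subjects.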

Systematic Sampling
A systematic sample selects every kth person from the sample frame. The researcher randomly selects a number between 1 and k in order to know which person to select first, then selects every kth person after this.
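The systematic procedure can be sketched as below, assuming the sample frame is an ordered list. The name `systematic_sample`, the frame of 100 IDs, and k = 10 are illustrative choices.

```python
import random

def systematic_sample(frame, k, seed=None):
    """Pick a random start between 1 and k, then take every kth subject after it."""
    rng = random.Random(seed)
    start = rng.randint(1, k)       # randint includes both endpoints, 1 and k
    return frame[start - 1::k]      # position `start`, then every kth position

# Hypothetical ordered sample frame of 100 IDs, selecting every 10th.
frame = list(range(1, 101))
sys_sample = systematic_sample(frame, 10, seed=4)
print(sys_sample)
```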

Advantages of the Various Sampling Designs

Simple Random Sampling (SRS)
It is the easiest and most widespread form of sampling.
Each subject has an equal chance to be in the sample.
The sample enables us to determine how likely it is that descriptive statistics (like the sample mean) fall close to corresponding values for which we would like to make inference (like the population mean).

Stratified Sampling
It ensures that there are enough subjects in each group that you want to compare.

Cluster Sampling
It does not require a sampling frame of subjects.
It is less expensive to implement.

Bias in Sampling

A sampling method is biased if the sample tends to favor some parts of the population over others. In other words, the results from the sample are not representative of the population.

Obviously, unbiased samples are our goal.

Types of Bias

Undercoverage
Occurs when a sampling frame leaves out some groups in the population.

Nonresponse bias
Occurs when some sampled subjects cannot be reached, refuse to participate, or fail to answer some questions.

Response bias
Occurs when the subject gives an incorrect response or when the question wording or the way the interviewer asks the questions is confusing or misleading.

Examples of Poor Samples that Result in Bias

Convenience Samples
Sampling friends
Sampling at the mall

Voluntary Response Samples
Internet surveys
Call-in surveys

Example 4.4

In 1987 in her book Women and Love, Shere Hite presented results of a survey mailed to 100,000 women in the United States. One of her conclusions was that 70% of women who had been married at least five years have extramarital affairs. She based this conclusion on the replies of only 4500 women.

This is an example of nonresponse bias.
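To see how severe the nonresponse is, the response rate can be computed directly from the two counts given in the example. This is only arithmetic on those figures; the variable names are made up for illustration.

```python
mailed = 100_000   # surveys mailed out, from the example
replied = 4_500    # replies received, from the example

response_rate = replied / mailed
nonresponse_rate = 1 - response_rate

print(f"response rate: {response_rate:.1%}")
print(f"nonresponse rate: {nonresponse_rate:.1%}")
```

Only 4.5% of the women who were mailed a survey replied, so the conclusion rests on a small group that may differ systematically from the 95.5% who did not respond.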


Example 4.5

Ann Landers asked readers, “If you had it to do over again, would you have children?” A few weeks later, her column was headlined, “70% OF PARENTS SAY KIDS NOT WORTH IT.” Of the nearly 10,000 parents who wrote in, 70% said they would not have children if they could go back in time.

This is an example of voluntary response sampling.


Example 4.6

In 1936, the Literary Digest conducted a poll to predict the winner of the presidential election. Alf Landon and Franklin Roosevelt were both running for president. The sample frame for the poll was constructed from telephone directories, country club memberships and automobile registrations. The Digest predicted that Landon would win, but in reality FDR won by a landslide.

This is an example of convenience sampling that resulted in undercoverage.


Example 4.7

An experiment involving adolescent males (ages 15-19) appeared in Science, 1995. The purpose of the study was to determine whether there was an association between survey techniques and the desire to give socially acceptable answers.

The participants were randomly assigned to one of two different survey forms, each of which had identical questions concerning sexual practices and drug habits.

The two versions of the survey were:
Paper: participants put answers in an envelope with an ID number on it and returned it in person.
Computer: participants listened to questions in headphones and then answered on laptops.

Types of Experimental Studies

Completely Randomized Design
The subjects are randomly assigned to one of the treatments.

Matched Pairs Design
Each subject is matched up with another subject who is similar in terms of age, health, etc. This creates a matched pair.
The treatments are then randomly assigned to the subjects in each pair.
This ensures that the treatment groups are essentially identical.
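Both designs come down to how the random assignment is carried out. The sketch below is one way to do it, with hypothetical function names. The pot labels loosely follow the 60 pots and two temperatures of Example 4.3, and the subject pairs are invented.

```python
import random

def completely_randomized(subjects, treatments, seed=None):
    """Shuffle all subjects, then deal them out so group sizes differ by at most one."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    return {t: shuffled[i::len(treatments)] for i, t in enumerate(treatments)}

def matched_pairs(pairs, treatments=("A", "B"), seed=None):
    """Within each pair, randomly decide which member receives which treatment."""
    rng = random.Random(seed)
    assignment = {}
    for first, second in pairs:
        order = list(treatments)
        rng.shuffle(order)
        assignment[first], assignment[second] = order
    return assignment

# Completely randomized: 60 pots split between two temperatures.
pots = [f"pot_{i}" for i in range(1, 61)]
groups = completely_randomized(pots, ["75F", "65F"], seed=4)
print({t: len(members) for t, members in groups.items()})

# Matched pairs: hypothetical pairs of similar subjects.
pairs = [("s1", "s2"), ("s3", "s4"), ("s5", "s6")]
pair_assignment = matched_pairs(pairs, seed=4)
print(pair_assignment)
```

In the matched pairs version, each treatment group receives exactly one member of every pair, which is what keeps the two groups so similar.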

Crossover Design
The subjects cross over during the experiment from one treatment to another.

Randomized Block Design
Similar subjects are matched up to create a large set of experimental units. This is called a block.
The treatments are then randomly assigned to units within the blocks.
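A rough sketch of randomized block assignment, assuming the blocks have already been formed. The function `randomized_block`, the age-based blocks, and the treatment labels are hypothetical. Randomization is done separately inside each block.

```python
import random

def randomized_block(blocks, treatments, seed=None):
    """Shuffle the units inside each block, then rotate through the treatments."""
    rng = random.Random(seed)
    assignment = {}
    for block_name, units in blocks.items():
        shuffled = list(units)
        rng.shuffle(shuffled)
        for i, unit in enumerate(shuffled):
            assignment[unit] = treatments[i % len(treatments)]
    return assignment

# Hypothetical blocks of similar subjects, grouped by age range.
blocks = {
    "under_40": ["u1", "u2", "u3", "u4"],
    "40_to_60": ["m1", "m2", "m3", "m4"],
    "over_60": ["o1", "o2", "o3", "o4"],
}
block_assignment = randomized_block(blocks, ["drug", "placebo"], seed=4)
for name, units in blocks.items():
    print(name, {u: block_assignment[u] for u in units})
```

Because each block is split evenly, the blocking variable (age range in this made-up case) cannot end up unbalanced across treatments.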

Elements of a Good Experiment

Control group
Allows us to compare against an existing treatment
Enables us to control the placebo effect
The placebo effect occurs when patients seem to improve regardless of the treatment they receive.

Randomization
Eliminates bias that can result when researchers assign treatments to the subjects
Balances the groups on variables that you know affect the response
Balances the groups on lurking variables that may be unknown to you

Blinding
Increases reliability of the results
Single-blind: subjects do not know the treatment assignment
Double-blind: neither the subjects nor those in contact with the subjects know the treatment assignment

Replication
Assigns several experimental units to each treatment

Example 4.9

A pharmaceutical company has developed a new drug for treating high blood pressure. To determine the effectiveness of the drug, the company conducted an experiment in which subjects with a history of high blood pressure were treated with the new drug.

A later experiment randomly divided subjects with a history of high blood pressure into two groups. Group A was treated with the new drug as before. Group B received the most popular drug on the market at that time. The subjects were unaware of which treatment they received. 60% of the patients in Group A improved, while 63% of the patients in Group B improved.

The second experiment is better because it employs a control group and blinding.


Example 4.10

To investigate whether antidepressants help smokers to quit smoking, one study used 429 men and women who were 18 or older and had smoked 15 cigarettes or more per day in the previous year. They were all highly motivated to quit and in good health. They were assigned to one of two groups: one group took an antidepressant called Zyban, while the other group did not take anything. At the end of a year, the study observed whether each subject had successfully abstained from smoking.

Logic Behind Randomized Comparative Experiments

Randomization ensures that the groups of subjects are similar in all respects before the treatments are applied.

Using a control group for comparison ensures that external influences operate equally on both groups.

If the groups are large enough, natural differences in subjects will average out.

This means that there will be little difference in the results for the groups unless the treatments themselves actually cause the difference.
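The claim that natural differences average out in large groups can be checked with a small simulation. This is a sketch under made-up assumptions: subject ages are drawn uniformly between 18 and 80, and the function name, group sizes, and number of repetitions are arbitrary. It measures how far apart the two randomly formed groups are in mean age before any treatment is applied.

```python
import random
import statistics

def mean_gap_after_randomization(group_size, repetitions=200, seed=0):
    """Average absolute difference in mean age between two randomly formed groups."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(repetitions):
        # Hypothetical subjects: one age per subject, uniform on 18 to 80.
        ages = [rng.uniform(18, 80) for _ in range(2 * group_size)]
        rng.shuffle(ages)  # random assignment to the two groups
        treatment, control = ages[:group_size], ages[group_size:]
        gaps.append(abs(statistics.mean(treatment) - statistics.mean(control)))
    return statistics.mean(gaps)

small_groups = mean_gap_after_randomization(5)
large_groups = mean_gap_after_randomization(500)
print(f"typical age gap with 5 per group:   {small_groups:.2f} years")
print(f"typical age gap with 500 per group: {large_groups:.2f} years")
```

With only a few subjects per group, chance alone can leave the groups noticeably different. With hundreds per group, the gap shrinks, so a remaining difference in outcomes is more plausibly due to the treatments.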

Did You Know?

Observational studies can also have control groups. These are called case-control studies.

The cases are people who have a certain disease or condition, and the controls are people who do not have the disease.

Their purpose is to see if one of the explanatory variables is related to the disease.

Study 1 from the beginning of these notes is an example of a case-control study.

Important Points

Observational studies

Types: Retrospective, Cross-Sectional, Prospective

Sampling Designs: Simple random sample (SRS), Stratified random sample, Cluster sample, Systematic sample

Bias Types: Undercoverage, Response bias, Nonresponse bias

Sources of Bias: Convenience sampling, Voluntary response sampling

Experiments

Types: Completely randomized design, matched pairs design, crossover design, randomized block design

Elements of Good Experiments: Control group, randomization, blinding, and replication

Advantages: Can show causation

Disadvantages: Can be unethical. Can take decades to complete.


If a group is underrepresented in the sample, we cannot make inference about it.

We must be careful when interpreting the results of observational studies.

For comparison of several treatments to be valid, you must apply all treatments to similar groups of experimental units.

Interesting questions are usually pretty tough to answer. This is due in part to the fact that no single experiment or observational study can determine causation.
