Chapter 8 Producing Data: Sampling


Post on 21-Dec-2015






<ul><li> Slide 1 </li>
<li> Chapter 8 Producing Data: Sampling </li>
<li> Slide 2 </li>
<li> Objectives (BPS chapter 8). Producing Data: Sampling. Observation versus experiment; population versus sample; sampling methods; how to sample badly; simple random samples; other sampling designs; cautions about sample surveys; learning about populations from samples (inference). </li>
<li> Slide 3 </li>
<li> Experiments vs. Observational Studies. Experiment: Deliberately imposes some treatment on individuals in order to observe their responses. Studies whether the treatment causes change in the response. The experimenter determines which units receive which treatments (ideally using some form of random allocation). Observational study: Compares units that happen to have received each of the treatments. Often useful for identifying possible causes of effects, but cannot reliably establish causation. Only properly designed and executed experiments can reliably demonstrate causation. </li>
<li> Slide 4 </li>
<li> Population: The complete collection of all subjects or objects (scores, people, measurements, and so on) that are being studied. The collection is complete in the sense that it includes all subjects to be studied. </li>
<li> Slide 5 </li>
<li> Census: The collection of data from every individual in a population. Sample: A subset of elements drawn from a population, from which we collect data. The sample should be representative of the entire population. A sampling design describes exactly how to choose a sample from the population.
</li>
<li> Slide 6 </li>
<li> [Figure: Population, made up of individuals] </li>
<li> Slide 7 </li>
<li> Sampling Frame: List of individuals that could possibly be selected for the sample (not necessarily the same as the population). </li>
<li> Slide 8 </li>
<li> [Figure: Census, taken from a list of individuals numbered 1 to 17] </li>
<li> Slide 9 </li>
<li> [Figure: Sampling frame and sample, drawn from a list of individuals numbered 1 to 17] </li>
<li> Slide 10 </li>
<li> Example: Suppose we are interested in the average age of all Malaspina students. The relevant population is all Malaspina students (including students on all campuses). Possible sampling frame: list of Malaspina students at the Nanaimo campus. </li>
<li> Slide 11 </li>
<li> Example, cont. A sample can be the students in this Math 161 class, or 50 randomly selected Malaspina students at the Nanaimo campus. If we use the ages of all Malaspina students, then we have a census. </li>
<li> Slide 12 </li>
<li> Thought Question: Popular magazines often contain surveys that ask their readers to answer questions about hot topics in the news. Do you think the responses the magazines receive are representative of public opinion? Explain why or why not. </li>
<li> Slide 13 </li>
<li> Thought Question: Suppose you access an online listing of all courses at your institution, alphabetized by department, to determine what proportion of all courses have a statistics course as a prerequisite. If you decide to sample 50 courses in order to get a representative sample of courses, how would you select them? Would it be appropriate to simply select the first 50 courses listed?
</li>
<li> Slide 14 </li>
<li> Bad Sampling Plans. Convenience sampling: selecting individuals who are easiest to reach. Problem: The sample might not be representative of the target population. </li>
<li> Slide 15 </li>
<li> Convenience Sampling. Sampling mice from a large cage to study how a drug affects physical activity: a lab assistant reaches into the cage to select the mice one at a time until 10 are chosen. Which mice will likely be chosen? Could this sample yield biased results? </li>
<li> Slide 16 </li>
<li> Bad Sampling Plans. Voluntary response sampling: allowing individuals to choose to be in the sample. Problem: People with strong opinions (or feelings) about the issue tend to respond. Example: RateMyProfessor.com </li>
<li> Slide 17 </li>
<li> Voluntary Response. To prepare for her book Women and Love, Shere Hite sent questionnaires to 100,000 women asking about love, sex, and relationships. 4.5% responded, and Hite used those responses to write her book. Moore (Statistics: Concepts and Controversies, 1997) noted: respondents were fed up with men and eager to fight them; the anger became the theme of the book; but angry women are more likely to respond. </li>
<li> Slide 18 </li>
<li> CNN on-line surveys. Bias: People have to care enough about an issue to bother replying. This sample is probably a combination of people who hate wasting the taxpayers' money and animal lovers. </li>
<li> Slide 19 </li>
<li> Bias: The design of a statistical study is biased if it systematically favours certain outcomes. Convenience sampling and voluntary response sampling often produce biased samples. </li>
<li> Slide 20 </li>
<li> Polls and Surveys. Carelessly collected data, even from a large sample, is subject to a high degree of bias. To avoid bias, samples must be randomly chosen.
</li>
<li> Slide 21 </li>
<li> Avoiding Bias. We select a sample in order to get information about some population. How can we choose a sample that fairly represents the population? Probability sample: A sample chosen by chance. We must know what samples are possible and what chance (probability) each possible sample has. </li>
<li> Slide 22 </li>
<li> Simple Random Sampling. Each individual in the population has the same chance of being chosen for the sample. Each group of individuals in the population of the required size (n) has the same chance of being the sample actually selected. </li>
<li> Slide 23 </li>
<li> Simple Random Sample (SRS): A simple random sample (SRS) of size n consists of n individuals from the population chosen in such a way that every set of n individuals has an equal chance of being selected. </li>
<li> Slide 24 </li>
<li> How to choose an SRS: Label each individual in the population with a unique number, then choose labels by drawing names (numbers) out of a hat, by a random number table (see Table B on pg. 686 of text), or by computer software (www.randomizer.org), or see the textbook website (http://bcs.whfreeman.com/bps4e): Statistical Applets, Simple Random Sample. </li>
<li> Slide 25 </li>
<li> Choosing a simple random sample. We need to select a random sample of 5 from a class of 20 students. 1) List and number all members of the population, which is the class of 20. 2) The number 20 is two digits long. 3) Parse the list of random digits into numbers that are two digits long. Here we chose to start with line 103, for no particular reason. Line 103: 45 46 71 17 09 77 55 80 00 95 32 86 32 94 85 82 22 69 00 56 </li>
<li> Slide 26 </li>
<li> 01 Alan, 02 Amber, 03 Andrew, 04 Ashley, 05 Candice, 06 Chase, 07 Dana, 08 Elisha, 09 Gwen, 10 James, 11 Adrienne, 12 Arron, 13 Beverly, 14 Bryce, 15 Caleigh, 16 Carl, 17 Carly, 18 Chanel, 19 Christina, 20 Christine. Remember that 1 is 01, 2 is 02, etc.
Under sampling without replacement, if you were to hit 17 again before getting five people, don't sample Carly twice; you just keep going. 4) Choose a random sample of size 5 by reading through the list of two-digit random numbers, starting with line 103 and continuing from there. 5) The first five random numbers matching numbers assigned to people make the SRS. Line 103: 45 46 71 17 09 77 55 80 00 95 32 86 32 94 85 82 22 69 00 56. Line 104: 52 71 13 88 89 93 07 46 02. The first individual selected is Carly, number 17. Then Gwen (9 or 09). That's all we can get from line 103. We then move on to line 104. The next three to be selected are Beverly, Dana, and Amber (13, 7, and 2). </li>
<li> Slide 27 </li>
<li> Simple Random Sampling. Suppose there are 800 courses at an institution, alphabetized by department (and numbered 001-800), and you decide to randomly select 50 of them to determine what proportion of all the courses have a statistics course as a prerequisite. Use a random number table to select which 50 courses to sample.
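The table-reading procedure walked through above (group the digits by label width, skip labels that are out of range or already chosen, stop after n labels) can be sketched in Python. This is only an illustration under stated assumptions: the function name `pick_labels` and its parameters are hypothetical, and the digit strings are the line 103 and line 104 digits quoted in the slides.

```python
def pick_labels(digits, width, max_label, n):
    """Read digits in groups of `width` and keep the first n distinct labels in 1..max_label."""
    digits = "".join(digits.split())  # drop the spaces between digit groups
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        label = int(digits[start:start + width])
        # Skip 00, labels beyond the population size, and repeats
        # (sampling without replacement).
        if label not in range(1, max_label + 1) or label in chosen:
            continue
        chosen.append(label)
        if len(chosen) == n:
            break
    return chosen


# Digits quoted in the slides for lines 103 and 104 of the table.
line_103 = "45 46 71 17 09 77 55 80 00 95 32 86 32 94 85 82 22 69 00 56"
line_104 = "52 71 13 88 89 93 07 46 02"

print(pick_labels(line_103 + " " + line_104, width=2, max_label=20, n=5))
# prints [17, 9, 13, 7, 2]
```

The same sketch handles three-digit course labels by calling it with `width=3` and `max_label=800`; it simply returns fewer than n labels if the supplied digits run out first.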
Example: Courses with Statistics Prerequisite. Page 686 of textbook: pick a line and column at random; suppose we get line 111, column 3. Random numbers: 605130929700412712. TRY: Use line 126, column 1. Random numbers: 969271993136809741. </li>
<li> Slide 28 </li>
<li> Systematic Sample. Randomly select a member of the sampling frame for the sample; then, using a set procedure or rule, select the rest of the individuals for the sample. For example, suppose we must choose 4 addresses out of 100. Because 100/4 = 25, we randomly select an individual from the first 25 entries of the sampling frame (01-25), and then select every 25th member of the sampling frame after it. Example: Page 210, # 8.44 </li>
<li> Slide 29 </li>
<li> Stratified Random Sample. First, divide the population into groups of similar individuals, called strata. Second, choose a separate simple random sample in each stratum. Third, combine these simple random samples to form the full sample. If instead only certain groups are (randomly) chosen to be used, and all subjects in those groups make up the sample, then we have a cluster sample; in that case the population is often divided according to geographic regions (called clusters). </li>
<li> Slide 30 </li>
<li> Multistage Sample. Divide the population of interest into groups and randomly select some of those groups. Divide the resulting collection of individuals into smaller groups and randomly select some of those groups. Continue dividing the resulting collection of individuals into groups and randomly selecting some of those groups until you can simply list all of the resulting individuals and randomly select n of them for your sample. </li>
<li> Slide 31 </li>
<li> Probability Sampling Plans: simple random sampling (SRS); systematic sampling; stratified random sampling; cluster sampling; multistage sampling. </li>
<li> Slide 32 </li>
<li> Steps for Designing a Study. 1. Identify your objective. 2.
Develop a plan: experiment or observational study. 3. Use a random procedure to collect data. 4. Analyze the data and form conclusions. </li>
<li> Slide 33 </li>
<li> "Hey! Do you believe in the death penalty?" _________________ Sampling: use results that are readily available. </li>
<li> Slide 34 </li>
<li> __________________: selection so that each has an equal chance of being selected. </li>
<li> Slide 35 </li>
<li> _____________ Sampling: select some starting point and then select every Kth element in the population. </li>
<li> Slide 36 </li>
<li> _______________ Sampling: subdivide the population into subgroups (strata) that share the same characteristic, then draw a sample from each stratum. </li>
<li> Slide 37 </li>
<li> __________________ Sampling: divide the population into sections (or clusters); randomly select some of those clusters; choose all members from selected clusters. </li>
<li> Slide 38 </li>
<li> Thought Question: When surveying students on their opinions of their professors' teaching methods, do you think it matters who conducts the interviews? Explain your answer with an example. </li>
<li> Slide 39 </li>
<li> Sources of Error in Surveys. Random sampling reduces bias in choosing a sample and allows control of variability. Sampling in the real world is more complex and less reliable than we might hope for. Confidence statements do not reflect all sources of error that are present in sampling. </li>
<li> Slide 40 </li>
<li> Sampling Errors: Errors that are caused by the act of taking a sample. Random sampling error: the difference between a sample result and the true population result; such an error results from chance (sample fluctuations). It is measured by the margin of error. Nonsampling Errors: Errors that are not related to the act of taking a sample.
Example: Sample data that are incorrectly collected, recorded, or analyzed (such as using a defective instrument, or copying the data incorrectly). Nonsampling errors can be much larger than the sampling errors. </li>
<li> Slide 41 </li>
<li> Sampling Errors: Using the wrong sampling frame. Undercoverage: excluding some units in the population. </li>
<li> Slide 42 </li>
<li> Sampling Errors: Disasters. Using voluntary response (self-selection). Using a convenience or haphazard sample. In either case we cannot extend results to the population of interest (we need a broad cross-section of the population). </li>
<li> Slide 43 </li>
<li> Nonsampling Errors. Difficulties: processing errors (data entry, calculations); wording of questions / response error. Disasters: nonresponse (cannot contact subjects or they do not respond). </li>
<li> Slide 44 </li>
<li> Sources of Nonsampling Errors. Nonresponse bias: Cannot contact subjects or they do not respond. Nonrespondents often behave or think differently from respondents, so low response rates can lead to huge biases. Processing errors: Data that are incorrectly collected, recorded, calculated, etc. </li>
<li> Slide 45 </li>
<li> Nonsampling errors, cont. Survey format effects: Factors such as question order, questionnaire layout, and whether the questionnaire is self-administered or given by an interviewer can affect the results. Interviewer effects: Different interviewers asking the same questions can tend to obtain different answers. Response bias: Fancy term for lying when you think you should not tell the truth, as when your family doctor asks "How much do you drink?" or a survey of female students asks "How many men do you date per week?" People also simply forget and often give erroneous answers to questions about the past.
</li>
<li> Slide 46 </li>
<li> Concerns when Asking Survey Questions: deliberate bias; unintentional bias; desire to please; asking the uninformed; unnecessary complexity; ordering of questions; confidentiality and anonymity. </li>
<li> Slide 47 </li>
<li> Confidentiality and Anonymity. Confidential answer: the respondent is known, but the information is kept secret; this facilitates follow-up studies. Anonymous answer: the respondent is not known, or cannot be linked to his/her response; this usually yields more truthful answers. </li>
<li> Slide 48 </li>
<li> Dealing with errors. Statistical methods are available for estimating the likely size of sampling errors; the margin of error measures the sampling error. All we can do with nonsampling errors is to try to minimize them at the study-design stage. Pilot Survey: One tests a survey on a relatively small group...</li></ul>
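As a rough illustration of two of the probability sampling plans described in the slides, the sketch below implements the systematic rule (random start among the first k labels, then every k-th label) and stratified random sampling (a separate SRS per stratum, then combined). The function names, the seed, and the two strata are hypothetical and chosen only for the example; the systematic version also assumes the frame size is an exact multiple of the sample size, as in the 100/4 = 25 case.

```python
import random


def systematic_sample(frame_size, n, rng):
    """Pick a random start in 1..k, then take every k-th label after it."""
    k = frame_size // n        # e.g. 100 // 4 == 25
    start = rng.randint(1, k)  # randint includes both endpoints
    return [start + i * k for i in range(n)]


def stratified_sample(strata, per_stratum, rng):
    """Draw a separate SRS of size per_stratum from each stratum and combine them."""
    sample = []
    for name in sorted(strata):  # fixed order so the result is reproducible
        sample.extend(rng.sample(strata[name], per_stratum))
    return sample


rng = random.Random(2015)  # arbitrary seed, used only for reproducibility

# 4 addresses out of 100, as in the systematic sampling slide.
print(systematic_sample(100, 4, rng))

# Hypothetical strata: student labels split into two groups.
strata = {"first-year": list(range(1, 41)), "upper-year": list(range(41, 101))}
print(stratified_sample(strata, 3, rng))
```

Passing the random generator in as a parameter keeps the functions deterministic under a fixed seed, which makes a classroom demonstration repeatable.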