Chapter Eleven
Sampling Foundations
Copyright © Houghton Mifflin Company. All rights reserved. 11 | 2
Chapter Objectives
• Define and distinguish between sampling and census studies
• Discuss when to use a probability versus a nonprobability sampling method and implement the different methods
• Explain sampling error and sampling distribution
• Construct confidence intervals for population means and proportions
• List the factors to consider in determining sample size, and compute the required sample size to achieve a specific degree of precision at a desired confidence level
Gallup Poll on Sampling: China
• 12,500 counties, cities, and urban districts were divided into 50 strata based on their geographic location, degree of economic development, and proportion of non-agricultural population
• One primary sampling unit (PSU), consisting of either a county or a city, was selected from each stratum based on probability proportional to population size
• Within each PSU, the populations of all neighborhoods and villages were compiled. From this listing, four neighborhoods or villages were selected with probability proportional to size.
• From each of these four neighborhoods or villages, five households were selected at random
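Selection "proportional to population size" can be illustrated with a small sketch. The PSU names and populations below are made up for illustration and are not Gallup's data; the function name `select_pps` is likewise hypothetical.

```python
import random

def select_pps(units, sizes, rng):
    """Pick one unit with probability proportional to its size."""
    return rng.choices(units, weights=sizes, k=1)[0]

# Hypothetical stratum containing three PSUs (illustrative populations only).
psus = ["County A", "City B", "County C"]
populations = [600_000, 300_000, 100_000]

rng = random.Random(42)  # fixed seed so the run is repeatable

# Repeat the draw many times to see the selection probabilities emerge.
draws = [select_pps(psus, populations, rng) for _ in range(10_000)]
shares = {name: draws.count(name) / len(draws) for name in psus}
print(shares)  # shares should sit near 0.6, 0.3, and 0.1
```

In an actual design only one PSU is drawn per stratum; the repeated draws here serve only to show that larger units are chosen more often.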
Gallup Poll on Sampling: China (Cont’d)
• One respondent was selected from each of the selected households, ensuring proper representation in the sample of all age groups by both genders
• The respondent to be interviewed was then selected according to a prescribed systematic procedure
• If the designated respondent was not at home, or could not be reached, a second or, if needed, a third adult family member was selected systematically from among the household members remaining on the list
• If contact with the designated respondent could not be made after a total of three separate visits to the household, an interview with a respondent in a substitute household in the same locality was permitted
• Two substitute households were kept in reserve for each five assigned households in the interviewing area
Gallup Poll on Sampling: China (Cont’d)
• By following this methodology and correcting for any rural/urban sampling issues, the Gallup China polls are statistically accurate to within ±2%
National Poll – Sample Size
• Harris Poll
– A weekly study that monitors the reactions of the American public to a variety of economic, political, and social issues
• Sample Size
– Based on a nationally representative telephone survey of 1,000 adults age 18 or over
AC Nielsen Scantrack Index
• Offers valuable scanner-based sales and brand share data on a regular basis to manufacturers of a wide variety of consumer products such as food, drugs, and cosmetics
• Sample Size
– Sales and brand share estimates are gathered weekly from a representative sample of more than 4,800 stores representing over 800 retailers in 50 major markets
Sampling vs. Census Studies
• A census study draws inferences from the entire body of units of interest (the population)
• A sample study draws inferences from a sample drawn from the population
Advantages of Sampling
• Low cost
• Reduced time
Sampling and Nonsampling Errors
• Sampling error: The difference between a statistic value that is generated through a sampling procedure and the parameter value, which can be determined only through a census study
• Nonsampling error: Any error in a research study other than sampling error (which arises purely because a sample, rather than the entire population, is studied)
Minimizing Sampling Errors
• Increase the sample size
• Use a statistically efficient sampling plan
• Make the sample as representative of the population as possible
Types of Nonsampling Errors
• Nonsampling Error
– Any error other than sampling error
• Sampling Frame Error
– Sampling frame not being representative of ideal population
• Nonresponse Error
– Final sample not representative of planned sample
• Data Error
– Distortions in collected data and mistakes in data coding, analysis, or interpretation
Potential Causes of Sampling Frame Errors
• Incomplete sampling frame overrepresents some population segments and underrepresents others
• Sampling frame contains irrelevant units
Minimizing Sampling Frame Errors
• Start with a complete sampling frame
• Modify the sampling frame to make it representative of the ideal population (e.g., by using plus-one dialing in telephone surveys)
Potential Causes of Nonresponse Errors
• Mail surveys/Internet surveys
– Certain types of sample units being more likely to respond than others
• Telephone and personal interview surveys
– Person not-at-home problem and respondent refusal problem
Minimizing Nonresponse Errors
• Mail surveys: increase response rates through the use of incentives, follow-up mailings, etc.
– Caution: increase in response rate per se may not reduce nonresponse error
• Telephone and personal interview surveys: make call-backs and spread out the time blocks during which interviews are conducted
Potential Causes of Data Errors
• Respondents’ reluctance/inability to give accurate answers
• Ill-trained interviewers
• Unscrupulous interviewers
• Poorly designed questionnaire
• Mistakes in coding data
• Erroneous analysis
• Incorrect/inappropriate interpretation of results
Exhibit 11.1 Types and Potential Causes of Nonsampling Errors
When Census Studies Are Appropriate
• The feasibility condition
– Whenever a population is relatively small or can be accessed easily
• The necessity condition
– When the population units are extremely varied and each population unit is likely to be very different from all the other units
Probability and Nonprobability Sampling
• Probability sampling is an objective procedure in which the probability of selection is known in advance for each population unit
• Nonprobability sampling is a subjective procedure in which the probability of selection for each population unit is unknown beforehand
Exhibit 11.3 Classification of Sampling Methods
Probability Sampling Methods
• Simple Random Sampling
• Stratified Random Sampling
• Cluster Sampling
Gallup Poll: USA
• Identify and describe the population that a given poll is attempting to represent
• Choose or design a method that will enable Gallup to sample the target population randomly
• Random Digit Dialing (RDD): a procedure that creates a list of all possible household phone numbers in America and then selects a sub-set of numbers from that list for Gallup to call
Simple Random Sampling
• Every possible sample of a certain size within a population has a known and equal probability of being chosen as the study sample
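A minimal sketch of a simple random draw, assuming a hypothetical sampling frame of 1,000 numbered units. Python's `random.sample` draws without replacement, so every subset of the requested size is equally likely to be chosen.

```python
import random

# Hypothetical sampling frame: unit IDs 1 through 1,000.
frame = list(range(1, 1001))

rng = random.Random(7)            # fixed seed so the draw is repeatable
sample = rng.sample(frame, k=50)  # 50 distinct units, drawn without replacement

print(sorted(sample)[:10])        # first few selected IDs
```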
Stratified Random Sampling
• Two Types of Stratified Random Sampling
– Proportionate Stratified Random Sampling
– Disproportionate Stratified Random Sampling
Proportionate Stratified Random Sampling
• Sample consists of units selected from each population stratum in proportion to the total number of units in the stratum
Kirkwood University – Proportionate Stratified Random Sampling
• Administrators of Kirkwood University wanted to determine the attitudes of their students toward various aspects of the university
• They selected a proportionate stratified random sample of 500 students for conducting the attitude survey
Table 11.2 Proportionate Allocation of Total Sample of Kirkwood University Students
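The figures from Table 11.2 are not reproduced in these slides, so the sketch below uses made-up enrollment counts purely to illustrate how a total sample of 500 might be allocated in proportion to stratum size. The function name and the largest-remainder rounding step are assumptions, not the textbook's procedure.

```python
def proportionate_allocation(stratum_sizes, total_sample):
    """Allocate total_sample across strata in proportion to stratum size."""
    population = sum(stratum_sizes.values())
    exact = {h: total_sample * size / population for h, size in stratum_sizes.items()}
    allocation = {h: int(share) for h, share in exact.items()}  # round down first

    # Hand out any units lost to rounding, largest fractional remainder first.
    leftover = total_sample - sum(allocation.values())
    by_remainder = sorted(exact, key=lambda h: exact[h] - allocation[h], reverse=True)
    for h in by_remainder[:leftover]:
        allocation[h] += 1
    return allocation

# Hypothetical enrollment by class level (NOT the Table 11.2 values).
enrollment = {"Freshmen": 3000, "Sophomores": 2500, "Juniors": 2500, "Seniors": 2000}

allocation = proportionate_allocation(enrollment, total_sample=500)
print(allocation)
```

With these illustrative counts, each stratum receives 5 percent of its students, which is the same fraction as 500 out of 10,000 overall.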
Disproportionate Stratified Random Sampling
• Sample consists of units selected from each population stratum according to how varied the units are within the stratum
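One common way to let within-stratum variability drive the allocation is Neyman allocation, where each stratum's share is proportional to its size times its standard deviation. The slides do not say which rule is intended, so treat this as one possible sketch; the store counts and standard deviations are invented for illustration.

```python
def neyman_allocation(sizes, std_devs, total_sample):
    """Allocate the sample in proportion to (stratum size x stratum std. dev.)."""
    weights = {h: sizes[h] * std_devs[h] for h in sizes}
    total_weight = sum(weights.values())
    # Rounding can leave the total off by a unit or two in general.
    return {h: round(total_sample * w / total_weight) for h, w in weights.items()}

# Hypothetical strata: many small stores with similar sales,
# fewer large stores whose sales vary widely.
sizes = {"small stores": 6000, "large stores": 4000}
std_devs = {"small stores": 10, "large stores": 35}

allocation = neyman_allocation(sizes, std_devs, total_sample=200)
print(allocation)
```

A proportionate design would give the small stores 120 of the 200 units; here the more varied large-store stratum receives the larger share instead.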
Exhibit 11.4 Disproportionate Stratified Random Sampling Used by A.C. Nielsen Company
Copyright © ACNielsen Company. Reprinted by permission.
Cluster Sampling
• Clusters of population units are selected at random and then all or some units in the chosen clusters are studied
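A short sketch of the one-stage case, in which every unit in each chosen cluster is studied. The city blocks and households are hypothetical.

```python
import random

# Hypothetical population: 8 city blocks (clusters) with 5 households each.
clusters = {
    f"block_{b}": [f"block_{b}_household_{h}" for h in range(1, 6)]
    for b in range(1, 9)
}

rng = random.Random(3)                      # fixed seed for a repeatable draw
chosen = rng.sample(sorted(clusters), k=2)  # select clusters at random

# One-stage cluster sample: study all units in the chosen clusters.
sample = [household for block in chosen for household in clusters[block]]
print(chosen, len(sample))
```

A two-stage variant would draw a random subset of households within each chosen block rather than taking all of them.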
Systematic Sampling Steps
• An organized procedure for selecting a sample from a list containing all the population units
Steps:
1) Determine the sampling interval, $k$:
$k = \dfrac{\text{number of units in the population}}{\text{number of units desired in the sample}}$
Systematic Sampling Steps (Cont’d)
2) Choose randomly one unit between the first and kth units in the population list
3) The randomly chosen unit and every kth unit thereafter are designated as part of the sample
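The three steps can be sketched as follows, assuming a hypothetical list of 1,000 units and a desired sample of 50, so that the interval works out to a whole number (k = 20).

```python
import random

def systematic_sample(units, sample_size, rng):
    """Return a systematic sample: random start, then every kth unit."""
    k = len(units) // sample_size         # step 1: sampling interval
    start = rng.randrange(k)              # step 2: random position among the first k units
    return units[start::k][:sample_size]  # step 3: that unit and every kth one after it

# Hypothetical population list: unit IDs 1 through 1,000.
population = list(range(1, 1001))

rng = random.Random(5)
sample = systematic_sample(population, sample_size=50, rng=rng)
print(sample[:5])
```

When the population size is not an exact multiple of the sample size, the interval has to be rounded, and this sketch simply truncates it.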
Practical Considerations: Probability Sampling Methods
• Probability sampling techniques are generally used by large commercial marketing research firms that maintain national samples or panels that can be readily accessed for conducting periodic research surveys
Nonprobability Sampling Methods
• Convenience Sampling
• Judgment Sampling
• Quota Sampling
Convenience Sampling
• Researcher's convenience forms the basis for selecting a sample of units
– The administrators of a college have announced a sharp increase in tuition fees for the next year.
– A TV reporter covering this news item is shown standing on campus talking to several students, one at a time, about their reactions to the proposed tuition fee increase.
– TV Reporter says: “While some of the students feel that the 10 percent fee hike is justified, most of them consider it to be unfair.”
Judgment Sampling
• A procedure in which a researcher exerts some effort in selecting a sample that he or she believes is most appropriate for a study
• Example
– The administrators of a college have announced a sharp increase in tuition fees for the next year
– A judgment sample of student officers may be more representative than a convenience sample of students
– The researcher should be knowledgeable about the ideal population for a study
Quota Sampling
• Involves selecting a quota of units from each population cell, with the quotas based on the judgment of the researchers and/or decision makers
Steps
1) Divide the population into segments (referred to as cells) based on certain control characteristics
2) Determine the quota of units for each cell (quotas are determined by the researchers and/or decision makers)
3) Instruct the interviewers to fill the quotas assigned to the cells
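The quota-filling step can be sketched as below. The cells, quotas, and respondent stream are invented for illustration; the point is that interviewers accept whoever comes along until each cell's quota is full, so selection within a cell is not random.

```python
def fill_quotas(respondents, quotas):
    """Accept respondents in arrival order until every cell's quota is met."""
    filled = {cell: [] for cell in quotas}
    for person, cell in respondents:
        if cell in filled and len(filled[cell]) < quotas[cell]:
            filled[cell].append(person)
        if all(len(filled[c]) == quotas[c] for c in quotas):
            break  # all quotas met, stop interviewing
    return filled

# Hypothetical control characteristic (gender) and quotas set by the researcher.
quotas = {"men": 2, "women": 3}

# Hypothetical stream of people an interviewer happens to encounter.
stream = [
    ("r1", "men"), ("r2", "men"), ("r3", "men"), ("r4", "women"),
    ("r5", "women"), ("r6", "men"), ("r7", "women"), ("r8", "women"),
]

filled = fill_quotas(stream, quotas)
print(filled)
```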
Parameter & Statistic
• Parameter
– The actual, or true, population mean value or population proportion for any variable (e.g., income, product ownership)
• Statistic
– An estimate of a parameter from sample data
Sampling Error
• Sampling Error = Parameter Value - Statistic Value
• Difference between a statistic value that is generated through a sampling procedure and the parameter value, which can be determined only through a census study
Sampling Distribution
• Representation of the sample statistic values obtained from every conceivable sample of a certain size chosen from a population by using a specified sampling procedure along with the relative frequency of occurrence of those statistic values
Sampling Distribution
[Figure: sampling distribution, with labels µX and SX]
Table 11.4 Expenditures for Eating Out for a Hypothetical Population
Family Number    Annual expenditure for eating out ($)
1                50
2                100
3                150
4                200
5                250
6                300
7                350
8                400
9                450
10               500
Table 11.5 Partial List of Possible Samples and Sample Means
Samples of Two Families        Sample Mean Values ($)
1,2                            75
1,6; 2,5; 3,4                  175
1,10; 2,9; 3,8; 4,7; 5,6       275
5,10; 6,9; 7,8                 375
9,10                           475
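The full list behind Table 11.5 can be generated by enumerating every possible sample of two families from the Table 11.4 population (families 1 to 10 spending $50, $100, ..., $500). The sketch below computes each sample mean and tallies how often each value occurs, which is the sampling distribution for simple random samples of two units.

```python
from collections import Counter
from itertools import combinations

# Table 11.4 population: family number -> annual expenditure for eating out ($).
expenditure = {1: 50, 2: 100, 3: 150, 4: 200, 5: 250,
               6: 300, 7: 350, 8: 400, 9: 450, 10: 500}

# Every possible sample of two families and its mean.
sample_means = {
    pair: (expenditure[pair[0]] + expenditure[pair[1]]) / 2
    for pair in combinations(expenditure, 2)
}

# Sampling distribution: each mean value with its relative frequency.
distribution = Counter(sample_means.values())
for mean in sorted(distribution):
    print(mean, distribution[mean], round(distribution[mean] / len(sample_means), 3))

population_mean = sum(expenditure.values()) / len(expenditure)        # the parameter
mean_of_sample_means = sum(sample_means.values()) / len(sample_means)

# Sampling error for one particular sample, families 9 and 10:
# parameter value minus statistic value.
error_9_10 = population_mean - sample_means[(9, 10)]
print(population_mean, mean_of_sample_means, error_9_10)
```

There are 45 possible samples. Their means average out to the population mean of $275, even though an individual sample such as families 9 and 10 can miss it by $200.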
Exhibit 11.5 Sampling Distribution (Bar Chart) for Simple Random Samples of Two Units
Exhibit 11.6 Sampling Distribution Shown as a Histogram
Central Limit Theorem
• When the sample size is sufficiently large, the sampling distribution associated with the sampling procedure displays the properties of a normal distribution.
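A small simulation can make this concrete. The sketch below assumes a strongly skewed hypothetical population (exponential with mean 1 and standard deviation 1), draws many samples of 50, and checks whether the sample means behave the way a normal distribution with standard error 1/√50 would predict. The population and the sample sizes are illustrative choices only.

```python
import random
import statistics

rng = random.Random(11)        # fixed seed so the run is repeatable
n, replications = 50, 5000     # sample size and number of repeated samples

# Mean of each simulated sample from a skewed (exponential) population.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(replications)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_se = 1.0 / n ** 0.5   # population std. dev. divided by sqrt(n)

# Share of sample means within 1.96 standard errors of the population mean.
within = sum(abs(m - 1.0) <= 1.96 * theoretical_se for m in sample_means) / replications

print(round(mean_of_means, 3), round(sd_of_means, 3), round(theoretical_se, 3), round(within, 3))
```

If the normal approximation holds, roughly 95 percent of the sample means should fall within 1.96 standard errors of the population mean.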
Confidence Estimation for Interval Data
$n$ = number of units in the sample
$\bar{x}$ = sample mean value
$s_{\bar{x}} = s / \sqrt{n}$
$s$ = sample standard deviation
Confidence Estimation for Interval Data (Cont’d)
• Given $n = 100$, $\bar{x} = 1{,}278$ units, and $s = 399$ units
• To construct a 95 percent confidence interval:
$s_{\bar{x}} = \dfrac{s}{\sqrt{n}} = \dfrac{399}{\sqrt{100}} = 39.9$ units
• The 95 percent confidence interval is
$\bar{x} \pm 1.96\, s_{\bar{x}} = 1{,}278 \pm (1.96)(39.9) = 1{,}278 \pm 78.204 \approx 1{,}278 \pm 78$
Confidence Estimation for Interval Data (Cont’d)
• Interpretation
– From the sample data, we can be 95 percent confident that the average annual sales of men's suits, across all men's clothing stores in the population, are between 1,200 and 1,356 units
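The men's suits calculation can be checked with a few lines of code. The function name is hypothetical; the inputs (sample mean 1,278, standard deviation 399, sample size 100, z value 1.96) are the ones given in the example.

```python
import math

def mean_confidence_interval(xbar, s, n, z=1.96):
    """Return (lower, upper) limits of a confidence interval for a population mean."""
    standard_error = s / math.sqrt(n)   # estimated standard error of the mean
    margin = z * standard_error
    return xbar - margin, xbar + margin

low, high = mean_confidence_interval(xbar=1278, s=399, n=100)
print(round(low), round(high))   # limits rounded to whole units
```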
Finding Confidence Intervals for Population Proportions
95 percent confidence interval for the population proportion:
$p - 1.96\, s_p \le \pi \le p + 1.96\, s_p$
$\pi$ = true population proportion (i.e., the parameter value)
$p$ = proportion obtained from a single sample (i.e., the statistic value)
$s_p$ = estimate of the standard error of the sample proportion
$p = \dfrac{\text{number of sample units having a certain feature}}{\text{total number of sample units (i.e., } n\text{)}}$
$s_p = \sqrt{\dfrac{p(1 - p)}{n}}$
Finding Confidence Intervals for Population Proportions (Cont’d)
Given $n = 100$ and $p = .64$. To construct a 95 percent confidence interval for the population proportion:
$s_p = \sqrt{\dfrac{p(1 - p)}{n}} = \sqrt{\dfrac{(.64)(.36)}{100}} = .048$
The 95 percent confidence interval is
$p \pm 1.96\, s_p = .64 \pm (1.96)(.048) = .64 \pm .09408 \approx .64 \pm .09$
Finding Confidence Intervals for Population Proportions (Cont’d)
• Interpretation
– This confidence interval can also be expressed in percentage terms: 64% ± 9%
– In other words, we can be 95 percent confident that between 55 and 73 percent of all grocery stores in the city carry potted plants
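The same check for the grocery store example, using the sample proportion of .64 and sample size of 100 given above. The function name is an illustrative choice.

```python
import math

def proportion_confidence_interval(p, n, z=1.96):
    """Return (lower, upper) limits of a confidence interval for a population proportion."""
    standard_error = math.sqrt(p * (1 - p) / n)   # estimated standard error of p
    margin = z * standard_error
    return p - margin, p + margin

low, high = proportion_confidence_interval(p=0.64, n=100)
print(round(low, 2), round(high, 2))   # limits as proportions, two decimals
```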
Factors Influencing Sample Size
• Desired precision level
• Desired confidence level
• Degree of variability
• Resources available
Methods for Determining Sample Size
• The desired precision level
• The desired confidence level
• An estimate of the degree of variability in the population, expressed in the form of a standard deviation
Sample Size Estimation
$H = \dfrac{z_q s}{\sqrt{n}}$
$n = \dfrac{z_q^2 s^2}{H^2}$
• $H$ -> Desired precision level
• $q$ -> Desired confidence level ($z_q$ is the corresponding standard normal value)
• $s$ -> Estimated standard deviation
• $n$ -> Required sample size
Sample Size Estimation (Cont’d)
• A marketing manager of a frozen-foods firm wants to estimate within ±$10 the average annual amount that families in a certain city spend on frozen foods and have 99 percent confidence in the estimate
• He estimates that the standard deviation of annual family expenditures on frozen foods is about $100
• How many families must be chosen for this study?
Sample Size Estimation (Cont’d)
$H = \$10$, $s = \$100$, and $z_q = 2.575$ (corresponding to a confidence level of 99 percent)
$n = \dfrac{(2.575)^2 (100)^2}{(10)^2} = 663$ families, approximately
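The frozen-foods calculation, written out as code with the values from the example (precision $10, standard deviation $100, z value 2.575). The function name is hypothetical.

```python
import math

def sample_size_for_mean(z, s, h):
    """Sample size needed to estimate a mean within +/- h at the confidence level implied by z."""
    return (z * s / h) ** 2

n = sample_size_for_mean(z=2.575, s=100, h=10)
print(round(n, 4))      # unrounded result of the formula
print(math.ceil(n))     # rounded up to a whole family
```

The formula gives a value slightly above 663, which the example reports as approximately 663. Rounding up to the next whole family is the more conservative choice if the precision target must be strictly met.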
Determining Sample Size
• A sporting goods marketer wants to estimate the proportion of tennis players among high school students in the United States
• The marketer wants the estimate to be accurate within ±.02 and wants to have 95 percent confidence in the interval estimate
• A pilot telephone survey of 50 high school students showed that 20 of them played tennis. Estimate the required sample size for the final study from the given data
• What should the sample size be if the desired precision and confidence levels are to be guaranteed?
Determining Sample Size (Cont’d)
$H = .02$ and $z_q = 1.96$. $p = 20/50 = 0.4$
$s^2 = p(1 - p) = (20/50)(1 - 20/50) = (.4)(.6) = .24$
$n = \dfrac{z_q^2 s^2}{H^2} = \dfrac{(1.96)^2 (.24)}{(.02)^2} = 2{,}305$ students, approximately
The maximum sample size, obtained when $p = .5$ so that $s^2 = .25$, is
$n_{\max} = \dfrac{.25\, z_q^2}{H^2} = 2{,}401$ students
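Both tennis-player calculations can be reproduced with one small function, using the pilot proportion of 20/50 and then the worst-case proportion of .5, which maximizes p(1 − p). The function name is an illustrative choice.

```python
def sample_size_for_proportion(z, p, h):
    """Sample size needed to estimate a proportion within +/- h, given a planning value p."""
    variance = p * (1 - p)          # s squared for a proportion
    return z ** 2 * variance / h ** 2

n_pilot = sample_size_for_proportion(z=1.96, p=20 / 50, h=0.02)  # based on the pilot survey
n_max = sample_size_for_proportion(z=1.96, p=0.5, h=0.02)        # worst case, p = .5

print(round(n_pilot), round(n_max))
```

Using p = .5 guarantees the desired precision and confidence level whatever the true proportion turns out to be, at the cost of about 96 additional students.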