
It is often said that without water, life would be impossible.

Similarly, without sampling, marketing research as we know it would be impossible.

Feinberg, Kinnear, & Taylor (2008, p. 290)

SAMPLING

Chapter 10


Probability vs. Nonprobability Sampling

• Probability Sampling
  • Each sampling unit has a known probability of being included in the sample

• Nonprobability Sampling
  • The probability of selecting each sampling unit is unknown

Probability Sampling Procedures

• Simple Random Sampling (SRS)
  • A sampling approach in which each sampling unit in a target population has a known and equal probability of being included
  • Advantage: good generalizability and unbiased estimates
  • Disadvantage: must be able to identify all sampling units within a given population; often, this is not feasible

• Systematic Random Sampling
  • Similar to simple random sampling, but you work with a list of sampling units that is ordered in some way (e.g., alphabetically)
  • Select a starting point at random, then survey every nth person, where the “skip interval” = (population size / desired sample size)
  • Advantage: quicker and easier than SRS
  • Disadvantage: there may be hidden “patterns” in the data
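The two procedures above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sampling frame of 100 customer IDs, the sample size of 10, and the function names are all made up for the example.

```python
import random


def simple_random_sample(frame, n, rng):
    """Simple random sampling: every unit has the same chance of selection."""
    return rng.sample(frame, n)


def systematic_sample(frame, n, rng):
    """Systematic sampling: random start, then every k-th unit on the list."""
    k = len(frame) // n        # skip interval = population size / sample size
    start = rng.randrange(k)   # random starting point within the first interval
    return [frame[start + i * k] for i in range(n)]


rng = random.Random(42)  # fixed seed so the sketch is repeatable
# Hypothetical sampling frame: an ordered list of 100 customer IDs.
frame = [f"customer_{i:03d}" for i in range(1, 101)]

srs = simple_random_sample(frame, 10, rng)
sys_sample = systematic_sample(frame, 10, rng)
print(srs)
print(sys_sample)
```

Note that the systematic sample is evenly spaced through the list, which is exactly why a hidden pattern in the list order can bias it.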


• Stratified Random Sampling
  • Break up the population into meaningful groups (e.g., men, women), then sample within each “stratum”, then combine
  • Proportionate stratified sampling: here you sample based on the size of the strata (i.e., sample more from the bigger strata: e.g., Caucasians)
  • Disproportionate stratified sampling: sample the same number of units from each stratum, regardless of the stratum’s size in the population
    • A variant is optimal allocation: here you use smaller sample sizes for strata within which there is low variability (as the lower variability will give you more precision with a lower N)
  • Advantages: more representative; can compare strata
  • Disadvantages: can be hard to figure out what to base strata on (Gender? Ethnicity? Political party?)
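A minimal sketch of proportionate stratified sampling, assuming a made-up population of 600 women and 400 men and a desired total sample of 50. The function name and the data layout are hypothetical.

```python
import random
from collections import Counter, defaultdict


def proportionate_stratified_sample(units, stratum_of, n, rng):
    """Sample from each stratum in proportion to its share of the population."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)

    sample = []
    for members in strata.values():
        # Stratum sample size is proportional to stratum size.
        # Rounding can make the total differ slightly from n in general.
        n_h = round(n * len(members) / len(units))
        sample.extend(rng.sample(members, n_h))
    return sample


rng = random.Random(1)
# Hypothetical population: (gender, id) pairs.
population = [("women", i) for i in range(600)] + [("men", i) for i in range(400)]

sample = proportionate_stratified_sample(population, lambda u: u[0], 50, rng)
counts = Counter(gender for gender, _ in sample)
print(counts)
```

For disproportionate sampling as described above, you would replace the proportional `n_h` with the same fixed number for every stratum.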


• Cluster Sampling
  • Similar to stratified random sampling. With stratified random sampling, however, the strata are thought to differ from one another (men vs. women) but to be homogeneous within each stratum.
  • In cluster sampling, you also divide the overall population into subpopulations (as in stratified random sampling), but each of those subpopulations (called “clusters”) is assumed to be a mini-representation of the population (e.g., survey customers at 10 Red Robins in WA).
  • Area sampling: clusters based on geographic region


• Cluster Sampling (continued)

• One-step clustering: just select one cluster (e.g., one store); problem = may not be representative of the population

• Two-step cluster sampling: break into meaningful subgroups (Red Robins in big cities vs. Red Robins in suburbs), then randomly sample within each of those clusters

• Advantages: easy to generate sampling frame; cost efficient; representative; can compare clusters

• Disadvantages: must be careful in selecting the basis for clusters; also, within clusters, often little variability (they’re homogeneous), and this lack of variability leads to less precise estimates
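One common way to carry out cluster sampling in two steps is to pick a few clusters at random and then randomly sample within each chosen cluster. The sketch below assumes that variant; the 10 stores, the 200 customers per store, and the function name are invented for illustration.

```python
import random


def two_step_cluster_sample(clusters, n_clusters, n_per_cluster, rng):
    """Step 1: randomly choose clusters. Step 2: randomly sample within each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}


rng = random.Random(7)
# Hypothetical clusters: 10 stores with 200 customers each.
stores = {
    f"store_{s}": [f"store_{s}_customer_{c}" for c in range(200)]
    for s in range(10)
}

sample = two_step_cluster_sample(stores, n_clusters=3, n_per_cluster=20, rng=rng)
for store, customers in sample.items():
    print(store, len(customers))
```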

Nonprobability Sampling Procedures

• Convenience Sampling
  • Survey people based on convenience (e.g., college students)
  • Advantage: fast and easy
  • Disadvantage: may not be representative

• Judgment Sampling
  • Use your judgment about who is best to survey
  • Advantage: can be better than convenience sampling if your judgment is right
  • Disadvantage: if your judgment is wrong, the sample may not be representative/generalizable

Nonprobability Sampling Procedures

• Quota Sampling
  • Sample a fixed number of people from each of X categories, possibly based on their relative prevalence in the population
  • Advantage: can ensure that certain groups are included
  • Disadvantage: because you aren’t using random sampling, generalizability may be questionable

• Snowball Sampling
  • You contact one person, they contact a friend (e.g., one cancer survivor is in contact with other survivors, and so recruits them)
  • Advantage: can make it easier to contact people in hard-to-reach groups
  • Disadvantage: there may be bias in the way people recruit others

Factors Affecting Choice of Sampling Procedure

• Use some type of random sampling if:

• You are collecting quantitative data that you want to use to arrive at accurate generalizations about the population

• You have sufficient resources and time

• You have a good sense for the population

• You are sampling over a broader range (e.g., of states, nations)

Computing the Sample Size Based on Usable Rates

• Several factors can reduce your sample size

• Thus, you may want to plan for more than your final sample size (i.e., use a higher “number of contacts” to achieve your final sample size). You adjust using the following three factors:
  • RR = reachable rate (e.g., how many people on a telephone list will you actually be able to reach?)
  • OIR = overall incidence rate (i.e., % of target population that will qualify for inclusion; e.g., can’t use people over 40)
  • ECR = expected completion rate (i.e., some folks won’t complete your survey)

• For example

Computing the Sample Size Based on Usable Rates

• You want a sample size of n = 500

• You figure you can reach 95% of the folks on your list (RR = .95)

• You think 60% will be 40 or younger (OIR = .60)

• You predict that 70% will complete your survey (ECR = .70)

• Based on these numbers, you should contact 1,253 people

$$\text{Number of contacts} = \frac{n}{(\text{RR}) \times (\text{OIR}) \times (\text{ECR})} = \frac{500}{(.95)(.60)(.70)} = 1{,}253$$
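The same calculation as a short Python sketch (the function name is made up for illustration). The unrounded result is about 1,253.13, which the slide reports as 1,253; rounding up to 1,254 is the conservative choice if you want to be sure of reaching at least 500 completed surveys.

```python
import math


def contacts_needed(n, rr, oir, ecr):
    """Number of contacts = n / (RR x OIR x ECR)."""
    return n / (rr * oir * ecr)


raw = contacts_needed(n=500, rr=0.95, oir=0.60, ecr=0.70)
print(round(raw, 2))   # unrounded number of contacts
print(math.ceil(raw))  # rounded up to a whole person
```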

Some Key Terms

• Sampling
  • Selection of a small number of elements from a larger defined target group of elements and expecting that the information gathered from the small group will allow judgments to be made about the larger group

• Population
  • The identifiable set of elements of interest to the researcher and pertinent to the information problem

• Defined Target Population
  • The complete set of elements identified for investigation

• Element
  • A person or object (e.g., a firm) from the defined target population from which information is sought

• Sampling Units
  • The target population elements available for selection during the sampling process

• Sampling Frame
  • The list of all eligible sampling units

Some Key Terms

• Total Error = Sampling Error + Nonsampling Error

• Sampling Error
  • Any type of bias that is attributable to mistakes in either drawing a sample or determining the sample size

• Nonsampling Error (controllable)
  • A bias that occurs in a research study regardless of whether a sample or a census is used (recall all the different types of errors we discussed)
    • Respondent errors (nonresponse, response errors)
    • Researcher’s measurement/design errors (survey, data analysis)
    • Problem definition errors
    • Administrative errors (data input errors, interview errors, poor sample design)

Central Limit Theorem

• A theory that states that, regardless of the shape of the population from which we sample (e.g., positively skewed), as long as our sample size is > 30, the sampling distribution of the mean (x-bar) will be approximately normally distributed with the following characteristics:

$$\mu_{\bar{x}} = \mu$$

The mean of the sampling distribution of the mean will equal the mean in the population.

$$s_{\bar{x}} = \frac{s}{\sqrt{n}}$$

The standard error of the sampling distribution of the mean will equal the sample standard deviation (s) divided by the square root of the sample size (n). This is a sample estimate of the true standard error in the population. The larger the sample size, the more precise we can get about our estimate of the true mean in the population (e.g., in our confidence interval).

Note: Dr. Joireman does not put a “bar” above s or s² (the variance).
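A small simulation can make the theorem concrete. The sketch below is an illustration under assumptions that are not in the slides: a positively skewed (exponential) population with mean 1 and standard deviation 1, samples of n = 36, and 5,000 repeated samples. The mean of the sample means should land close to the population mean, and their standard deviation close to the population standard deviation divided by the square root of n.

```python
import random
import statistics

rng = random.Random(0)   # fixed seed so the simulation is repeatable
n = 36                   # size of each sample (> 30)
num_samples = 5000       # number of repeated samples

# Draw many samples from a skewed population and record each sample mean.
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
expected_se = 1.0 / n ** 0.5   # population SD / sqrt(n)

print(f"mean of sample means: {mean_of_means:.3f} (population mean = 1)")
print(f"SD of sample means:   {sd_of_means:.3f} (expected about {expected_se:.3f})")
```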

Computing Standard Deviation

Assume your data are continuous (i.e., are not just yes/no data).

For example, let’s say we want to know how much people would be willing to pay for a tennis racquet.

We sample 7 folks and wish to generalize to the population….

Results

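Formulas for Variance and Standard Deviation

POPULATION

$$\text{Population Variance: } \sigma^2 = \frac{SS}{N}$$

$$\text{Population Standard Deviation: } \sigma = \sqrt{\frac{SS}{N}}$$

SAMPLE

$$\text{Sample Variance: } s^2 = \frac{SS}{N-1}$$

$$\text{Sample Standard Deviation: } s = \sqrt{\frac{SS}{N-1}}$$

SS = Sum of Squared Deviations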

The Sum of Squared Deviations (SS)

Conceptual Formula (highlights concept, tells a story):

$$SS = \sum (X_i - \bar{X})^2$$

Raw Score Formula (“crank it out”: faster, less meaningful):

$$SS = \sum X^2 - \frac{(\sum X)^2}{N}$$

• Both formulas give identical answers!
• SS = NUMERATOR of the variance
• Examples on board…

Example of Computing Standard Deviation (for a Sample)

Xi     Mean   Xi-Mean   (Xi-Mean)²   Xi²
60     75     -15       225          3600
65     75     -10       100          4225
70     75     -5        25           4900
75     75     0         0            5625
80     75     5         25           6400
85     75     10        100          7225
90     75     15        225          8100

ΣX = 525     Σ(X-M) = 0 (as it always must)     Σ(X-M)² = 700     ΣX² = 40075

Conceptual Formula for SS:

$$SS = \sum (X_i - \bar{X})^2 = 700$$

Raw Score Formula for SS:

$$SS = \sum X^2 - \frac{(\sum X)^2}{N} = 40075 - \frac{(525)^2}{7} = 40075 - 39375 = 700$$
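As a quick check, both SS formulas can be run on the seven tennis racquet prices from the table. This is a minimal sketch; the variable names are chosen for the example.

```python
# The seven sampled prices from the worked example.
prices = [60, 65, 70, 75, 80, 85, 90]
n = len(prices)
mean = sum(prices) / n

# Conceptual formula: sum of squared deviations from the mean.
ss_conceptual = sum((x - mean) ** 2 for x in prices)

# Raw score formula: sum of squares minus (sum squared / N).
ss_raw = sum(x ** 2 for x in prices) - sum(prices) ** 2 / n

print(ss_conceptual, ss_raw)  # 700.0 700.0
```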

$$\text{Sample Variance: } s^2 = \frac{SS}{N-1} = \frac{700}{7-1} = 116.67$$

$$\text{Sample Standard Deviation: } s = \sqrt{\frac{SS}{N-1}} = \sqrt{\frac{700}{7-1}} = 10.80$$

$$\text{Standard Error of Mean: } s_{\bar{X}} = \frac{s}{\sqrt{n}} = \frac{10.80}{\sqrt{7}} = 4.08$$

This is the “standard deviation” of the sampling distribution of means. This (4.08) will naturally be smaller than our sample standard deviation (10.80) based on our single sample of scores, and it will become smaller as n increases.

Confidence Intervals

A confidence interval is the statistical range of values within which the true value of the target population parameter is expected to lie.

• 95% Confidence Interval:
  • We are 95% confident that the population from which we took our sample has a mean between these lower and upper limits.

• To compute, we need:
  • Mean of our sample
  • Standard error of the mean
  • Critical Z-value for our desired level of confidence (see next page for Z-critical values)

Computing Confidence Intervals

$$CI_{.95} = \bar{x} \pm (s_{\bar{x}})(Z_{B,CL}) = 75 \pm (4.08)(1.96) = 75 \pm 8.00$$

$$\text{Restated: } 67.00 \le \mu \le 83.00$$

Based on these results, we are 95% confident that the mean in the population from which we sampled is between 67.00 and 83.00. Cool beans!
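The whole chain (variance, standard deviation, standard error, confidence interval) can be reproduced with Python's standard library. This is a sketch using the seven racquet prices and a Z-critical value of 1.96. Because it carries full precision through every step, its output may differ in the second decimal from hand calculations that round intermediate values.

```python
import math
import statistics

prices = [60, 65, 70, 75, 80, 85, 90]
n = len(prices)

mean = statistics.fmean(prices)
variance = statistics.variance(prices)  # sample variance: SS / (N - 1)
sd = statistics.stdev(prices)           # sample standard deviation
se = sd / math.sqrt(n)                  # standard error of the mean

z = 1.96                                # Z-critical value for 95% confidence
lower = mean - z * se
upper = mean + z * se

print(f"mean={mean:.2f} variance={variance:.2f} sd={sd:.2f} se={se:.2f}")
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```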

• To be 90% confident, you use a z-critical value of 1.65



Common Z-Critical Values

An example: the Z-critical values for 95% confidence are -1.96 and +1.96, with .025 in each tail (put ½ of .05 on each side).
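If you do not have a table handy, Z-critical values can be looked up with `statistics.NormalDist` from Python's standard library. The helper name `z_critical` is made up for this sketch; it splits the leftover probability evenly between the two tails, as described above.

```python
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided Z-critical value: half of (1 - confidence) goes in each tail."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```

The 90% value comes out near 1.645, which is commonly rounded to 1.65.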

What if my data are Yes/No?

Here we want to estimate the population percentage. For example, a CNN poll (9/25/08) asked whether readers believed Obama and McCain should continue with their plans to debate on Friday (9/26/08).

Results

Recent Poll on Presidential Debate

Yes = 75% (or yes, but debate on economy)
No = 25% (wait till bailout is taken care of)
N = 9782

Let’s compute the standard error and the 95% confidence interval. Here, p = % yes and q = (100% − p), or % no.

$$\text{Estimated Standard Error of Sample Percentage: } s_p = \sqrt{\frac{(p)(q)}{n}} = \sqrt{\frac{(75)(25)}{9782}} = .44\%$$

$$CI_{.95} = p \pm (s_p)(Z_{B,CL}) = 75\% \pm (.44)(1.96) = 75\% \pm 0.86$$

$$74.14 \le P \le 75.86$$
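The poll calculation can be checked with a short Python sketch. The function name is made up for illustration, and p is kept in percentage points to match the formula on the slide.

```python
import math


def proportion_ci(p_percent, n, z=1.96):
    """Standard error and confidence interval for a sample percentage."""
    q_percent = 100 - p_percent
    se = math.sqrt(p_percent * q_percent / n)  # in percentage points
    margin = z * se
    return se, p_percent - margin, p_percent + margin


se, lower, upper = proportion_ci(75, 9782)
print(f"se={se:.2f}%  95% CI: {lower:.2f}% to {upper:.2f}%")
```

With nearly 10,000 respondents, the interval is less than one percentage point wide on either side of 75%.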
