Data Collection (Chapter 2). McGraw-Hill/Irwin © 2008 The McGraw-Hill Companies, Inc. All rights reserved.

Upload: imogene-barker

Post on 25-Dec-2015


Chapter 2: Data Collection

Data Vocabulary
Level of Measurement
Time Series and Cross-sectional Data
Sampling Concepts
Sampling Methods
Data Sources
Survey Research

McGraw-Hill/Irwin © 2008 The McGraw-Hill Companies, Inc. All rights reserved.

2-2

Data Vocabulary

• Data is the plural form of the Latin datum (a “given” fact).

• In scientific research, data arise from experiments whose results are recorded systematically.

• Important decisions may depend on data.

• In business, data usually arise from accounting transactions or management processes.

2-3

Data Vocabulary

Subjects, Variables, Data Sets

• We will treat data as plural, and we will use data set to mean a particular collection of data as a whole.

• Observation – each data value.

• Subject (or individual) – an item for study (e.g., an employee in your company).

• Variable – a characteristic of the subject or individual (e.g., employee’s income).

2-4

Data Vocabulary

Subjects, Variables, Data Sets

• Three types of data sets:

Data Set       Variables        Typical Tasks
Univariate     One              Histograms, descriptive statistics, frequency tallies
Bivariate      Two              Scatter plots, correlations, simple regression
Multivariate   More than two    Multiple regression, data mining, econometric modeling

2-5

Data Vocabulary

Subjects, Variables, Data Sets

Consider a multivariate data set with 5 variables and 8 subjects: 5 × 8 = 40 observations.
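Below is a minimal sketch of the counting rule. It assumes a data set stored as rows (one per subject) and columns (one per variable), so the number of observations is subjects × variables. The variable names are hypothetical placeholders and the cells are left empty. The sketch also includes a helper that applies the univariate/bivariate/multivariate labels from the previous slide.

```python
# Minimal sketch: a data set stored as rows (subjects) by columns (variables).
# The variable names are hypothetical placeholders; the cells are left empty.
variables = ["var1", "var2", "var3", "var4", "var5"]   # 5 variables
data_set = [[None] * len(variables) for _ in range(8)]  # 8 subjects

def count_observations(rows):
    """Each cell (one subject's value on one variable) is one observation."""
    return sum(len(row) for row in rows)

def data_set_type(n_variables):
    """Classify a data set by its number of variables."""
    if n_variables < 1:
        raise ValueError("a data set needs at least one variable")
    if n_variables == 1:
        return "univariate"
    if n_variables == 2:
        return "bivariate"
    return "multivariate"

print(len(data_set), len(variables), count_observations(data_set))  # 8 5 40
print(data_set_type(len(variables)))  # multivariate
```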

2-6

Data Vocabulary

Data Types

• A data set may have a mixture of data types.

Types of Data
  Attribute (qualitative)
    - Verbal label: X = economics (your major)
    - Coded: X = 3 (i.e., economics)
  Numerical (quantitative)
    - Discrete: X = 2 (your siblings)
    - Continuous: X = 3.15 (your GPA)

2-7

Data Vocabulary

Attribute Data

• Also called categorical, nominal, or qualitative data.

• Values are described by words rather than numbers.

• For example,
  - Automobile style (e.g., X = full, midsize, compact, subcompact).
  - Mutual fund (e.g., X = load, no-load).

2-8

Data Vocabulary

Data Coding

• Coding refers to using numbers to represent categories to facilitate statistical analysis.

• Coding an attribute as a number does not make the data numerical.

• For example, 1 = Bachelor’s, 2 = Master’s, 3 = Doctorate.

• Rankings may exist, for example, 1 = Liberal, 2 = Moderate, 3 = Conservative.
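The claim that coding does not make an attribute numerical can be checked with a short sketch. It uses the major example from the data-types slide (3 = economics). The other major codes and the responses are hypothetical. Reassigning the arbitrary codes leaves the most frequent category unchanged, but the "average major" changes. This shows the mean of nominal codes has no meaning.

```python
from collections import Counter
from statistics import mean

# 3 = Economics follows the slide; the other codes are hypothetical.
codes = {1: "Accounting", 2: "Finance", 3: "Economics", 4: "Marketing"}
# An equally valid, hypothetical reassignment of the arbitrary codes.
alt_codes = {"Accounting": 4, "Finance": 3, "Economics": 1, "Marketing": 2}

responses = [3, 3, 1, 2, 3, 4]  # hypothetical coded responses
labels = [codes[c] for c in responses]

modal_label = Counter(labels).most_common(1)[0][0]       # unaffected by coding
mean_original = mean(responses)                          # depends on coding
mean_alternative = mean(alt_codes[lab] for lab in labels)  # a different value

print(modal_label, round(mean_original, 2), round(mean_alternative, 2))  # Economics 2.67 2
```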

2-9

Data Vocabulary

Binary Data

• A binary variable has only two values: 1 = presence, 0 = absence of a characteristic of interest (the codes themselves are arbitrary).

• For example,
  1 = employed, 0 = not employed
  1 = married, 0 = not married
  1 = male, 0 = female
  1 = female, 0 = male

• The coding itself has no numerical value, so binary variables are attribute data.
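The 0/1 codes are arbitrary labels, so the natural operation on a binary variable is counting. Here is a minimal sketch with hypothetical data. Tallying the 1s gives the number of subjects with the characteristic, and dividing by n gives the proportion. Reversing the coding swaps which group is counted but does not change the conclusion.

```python
# Hypothetical binary variable: 1 = employed, 0 = not employed
employed = [1, 0, 1, 1, 0, 1, 1, 0]

count_employed = sum(1 for x in employed if x == 1)
proportion_employed = count_employed / len(employed)

# Reversed (equally arbitrary) coding: 1 = not employed, 0 = employed
reversed_codes = [1 - x for x in employed]
count_not_employed = sum(reversed_codes)

print(count_employed, proportion_employed, count_not_employed)  # 5 0.625 3
```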

2-10

Data Vocabulary

Numerical Data

• Numerical or quantitative data arise from counting, measuring, or some kind of mathematical operation.

• For example,
  - Number of auto insurance claims filed in March (e.g., X = 114 claims).
  - Ratio of profit to sales for last quarter (e.g., X = 0.0447).

• Numerical data can be broken down into two types: discrete or continuous data.

2-11

Data Vocabulary

Discrete Data

• A numerical variable with a countable number of values that can be represented by an integer (no fractional values).

• For example,
  - Number of Medicaid patients (e.g., X = 2).
  - Number of takeoffs at O’Hare (e.g., X = 37).

2-12

Data Vocabulary

Continuous Data

• A numerical variable that can have any value within an interval (e.g., length, weight, time, sales, price/earnings ratios).

• Any continuous interval contains infinitely many possible values (e.g., 426 < X < 428).

2-13

Level of Measurement

Four levels of measurement for data:

Level of Measurement   Characteristics          Example
Nominal                Categories only          Eye color (blue, brown, green, hazel)
Ordinal                Rank has meaning         Bond ratings (Aaa, Aa, A, Baa, Ba, B, etc.)
Interval               Distance has meaning     Temperature (57° Celsius)
Ratio                  Meaningful zero exists   Accounts payable ($21.7 million)

2-14

Level of Measurement

Nominal Measurement

• Nominal data merely identify a category.

• Nominal data are qualitative, attribute, categorical, or classification data (e.g., Apple, Compaq, Dell, HP).

• Nominal data are usually coded numerically, but the codes are arbitrary (e.g., 1 = Apple, 2 = Compaq, 3 = Dell, 4 = HP).

• The only mathematical operations allowed are counting (e.g., frequencies) and simple statistics such as the mode.
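A frequency tally and the mode are the appropriate summaries for nominal data. This is a minimal sketch using the slide's arbitrary brand codes. The coded responses are hypothetical. Percentages are computed from the counts, so they are simply counting rescaled.

```python
from collections import Counter

# Slide's arbitrary codes; the coded responses are hypothetical.
brand_codes = {1: "Apple", 2: "Compaq", 3: "Dell", 4: "HP"}
responses = [3, 1, 3, 4, 2, 3, 1]

tally = Counter(brand_codes[c] for c in responses)  # frequency of each category
n = len(responses)
percentages = {brand: 100 * count / n for brand, count in tally.items()}
mode_brand, mode_count = tally.most_common(1)[0]

for brand, count in tally.most_common():
    print(f"{brand:<7}{count:>3}{percentages[brand]:>7.1f}%")
print("Mode:", mode_brand)  # Mode: Dell
```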

2-15

Level of Measurement

Ordinal Measurement

• Ordinal data codes can be ranked (e.g., 1 = Frequently, 2 = Sometimes, 3 = Rarely, 4 = Never).

• Distance between codes is not meaningful (e.g., the distance between 1 and 2, between 2 and 3, or between 3 and 4 lacks meaning).

• Many useful statistical tests exist for ordinal data. They are especially useful in social science, marketing, and human resource research.
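For ordinal data, only rank order carries meaning, so a rank-based summary such as the median is appropriate. This is a minimal sketch using the slide's frequency codes with hypothetical responses. The function `statistics.median_low` is used so the result is always one of the actual codes rather than an in-between value like 2.5. A recoding that preserves order but uses uneven gaps still selects the same category.

```python
from statistics import median_low

labels = {1: "Frequently", 2: "Sometimes", 3: "Rarely", 4: "Never"}
responses = [1, 2, 2, 3, 4, 2, 1, 3]  # hypothetical coded responses

# median_low always returns one of the observed codes (no in-between values).
median_code = median_low(responses)
print(median_code, labels[median_code])  # 2 Sometimes

# Any order-preserving recoding, even with unequal gaps, picks the same category.
uneven = {1: 1, 2: 5, 3: 6, 4: 100}
assert median_low(uneven[c] for c in responses) == uneven[median_code]
```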

2-16

Level of Measurement

Interval Measurement

• Data not only can be ranked but also have meaningful intervals between scale points (e.g., the difference between 60°F and 70°F is the same as the difference between 20°F and 30°F).

• Since intervals between numbers represent distances, mathematical operations such as averaging can be performed.

• The zero point of an interval scale is arbitrary, so ratios are not meaningful (e.g., 60°F is not twice as warm as 30°F).
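Both claims on this slide can be checked numerically using the standard conversions °C = (°F − 32) × 5/9 and K = °C + 273.15. Converting to Celsius scales every interval by the same factor, so two 10°F gaps remain equal. However, the ratio 60/30 = 2 does not survive the change of zero point. On the Kelvin scale, which has a true zero, the ratio of the two temperatures is only about 1.06. The Kelvin comparison is added here for illustration only.

```python
def f_to_c(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

def c_to_k(c):
    """Convert degrees Celsius to kelvins (a scale with a true zero)."""
    return c + 273.15

# Equal intervals stay equal after conversion: both gaps are 10°F.
diff_high = f_to_c(70) - f_to_c(60)
diff_low = f_to_c(30) - f_to_c(20)

# Ratios depend on the arbitrary zero point.
ratio_f = 60 / 30                                   # 2.0 on the Fahrenheit scale
ratio_c = f_to_c(60) / f_to_c(30)                   # about -14.0 in Celsius
ratio_k = c_to_k(f_to_c(60)) / c_to_k(f_to_c(30))   # about 1.06 in kelvins
```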

2-17

Level of Measurement

Likert Scales

• A special case of interval data frequently used in survey research.

• The coarseness of a Likert scale refers to the number of scale points (typically 5 or 7).

“College-bound high school students should be required to study a foreign language.” (check one)
  Strongly Agree
  Somewhat Agree
  Neither Agree Nor Disagree
  Somewhat Disagree
  Strongly Disagree

2-18

Level of Measurement

Likert Scales

• A neutral midpoint (“Neither Agree Nor Disagree”) is allowed if an odd number of scale points is used. It can also be omitted (by using an even number of points) to force the respondent to “lean” one way or the other.

• Likert data are coded numerically (e.g., 1 to 5), but any equally spaced values will work.

Likert coding: 1 to 5 scale     Likert coding: -2 to +2 scale
5 = Help a lot                  +2 = Help a lot
4 = Help a little               +1 = Help a little
3 = No effect                    0 = No effect
2 = Hurt a little               -1 = Hurt a little
1 = Hurt a lot                  -2 = Hurt a lot
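The statement that any equally spaced coding will work can be checked directly. In this sketch with hypothetical responses, the -2 to +2 coding equals the 1 to 5 coding minus 3. As a result, every mean shifts by exactly 3 and the spread is unchanged, so comparisons across items or groups lead to the same conclusions under either coding.

```python
from statistics import mean, pstdev

# Slide's 1-to-5 coding; the responses below are hypothetical.
labels_1_to_5 = {5: "Help a lot", 4: "Help a little", 3: "No effect",
                 2: "Hurt a little", 1: "Hurt a lot"}
responses = [5, 4, 4, 3, 2, 5, 3, 4]

def recode(code):
    """Map the 1-to-5 coding onto the equally spaced -2-to-+2 coding."""
    return code - 3

recoded = [recode(c) for c in responses]
labels_minus2_to_2 = {recode(k): v for k, v in labels_1_to_5.items()}

shift = mean(responses) - mean(recoded)                       # exactly 3.0
spread_difference = abs(pstdev(responses) - pstdev(recoded))  # same spread
print(mean(responses), mean(recoded), shift)  # 3.75 0.75 3.0
```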

2-19

Level of Measurement

Likert Scales

• Careful choice of verbal anchors results in measurable intervals (e.g., the distance from 1 to 2 is “the same” as the interval, say, from 3 to 4).

• Ratios are not meaningful (e.g., here 4 is not twice 2).

• Many statistical calculations can be performed (e.g., averages, correlations).

2-20

Level of Measurement

Likert Scales

• More variants of Likert scales:

How would you rate your marketing instructor? (check one)
  Terrible
  Poor
  Adequate
  Good
  Excellent

How would you rate your marketing instructor? (check one)
  Very Bad to Very Good (only the endpoints are labeled)

2-21

Level of Measurement

Ambiguity

• Grades are usually coded numerically (A = 4, B = 3, C = 2, D = 1, F = 0) and are used to calculate a mean GPA.

• Is the interval from 3.0 to 4.0 really the same as the interval from 1.0 to 2.0?

• What is the underlying reality ranging from 0 to 4 that we are measuring?

• It is best to be conservative and limit statistical tests to those for ordinal data.

2-22

Level of Measurement

Ratio Measurement

• Ratio data have all the properties of the nominal, ordinal, and interval data types, and they also possess a meaningful zero (absence of the quantity being measured).

• Because of this zero point, ratios of data values are meaningful (e.g., a $20 million profit is twice as much as a $10 million profit).

• Zero does not have to be observable in the data; it is an absolute reference point.
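The slide's profit example makes the point about ratios concrete. This is a minimal sketch. On a ratio scale, changing units (dollars to thousands of dollars) multiplies every value by the same constant, so the ratio stays 2. The shifted-origin line is a hypothetical counterexample: moving the zero point, which interval scales allow, changes the ratio. This is why ratios are meaningful only when the zero is fixed.

```python
profit_a = 20_000_000  # $20 million
profit_b = 10_000_000  # $10 million

ratio_dollars = profit_a / profit_b                        # 2.0
ratio_thousands = (profit_a / 1000) / (profit_b / 1000)    # still 2.0

# Hypothetical shifted origin (not a legitimate operation on ratio data):
offset = 5_000_000
ratio_shifted = (profit_a - offset) / (profit_b - offset)  # 3.0, no longer meaningful
```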

2-23

Level of Measurement

Use the following procedure to recognize data types:

Q1. Is there a meaningful zero point?
    If “Yes”: Ratio data (all statistical operations are allowed)
Q2. Are intervals between scale points meaningful?
    If “Yes”: Interval data (common statistics allowed, e.g., means and standard deviations)
Q3. Do scale points represent rankings?
    If “Yes”: Ordinal data (restricted to certain types of nonparametric statistical tests)
Q4. Are there discrete categories?
    If “Yes”: Nominal data (only counting allowed, e.g., finding the mode)
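The four questions can be read as a decision procedure. This sketch assumes they are asked in order and stops at the first “Yes”. A ratio scale also has meaningful intervals, ranks, and categories, so the first match is the most informative level. The function name and its boolean parameters are hypothetical.

```python
def level_of_measurement(meaningful_zero, meaningful_intervals,
                         ranked, discrete_categories):
    """Apply Q1-Q4 in order; return the level and the statistics it allows."""
    if meaningful_zero:
        return "ratio", "all statistical operations"
    if meaningful_intervals:
        return "interval", "common statistics (e.g., means, standard deviations)"
    if ranked:
        return "ordinal", "certain nonparametric tests"
    if discrete_categories:
        return "nominal", "counting only (e.g., the mode)"
    return None, "not classifiable by this procedure"

# Examples from the level-of-measurement table
print(level_of_measurement(True, True, True, True)[0])     # ratio (accounts payable)
print(level_of_measurement(False, True, True, True)[0])    # interval (°C temperature)
print(level_of_measurement(False, False, True, True)[0])   # ordinal (bond ratings)
print(level_of_measurement(False, False, False, True)[0])  # nominal (eye color)
```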

2-24

Level of Measurement

Changing Data by Recoding

• To simplify data, or when the exact magnitude of the data is of little interest, ratio data can be recoded downward into ordinal or nominal measurements (but not conversely).

• For example, recode systolic blood pressure as “normal” (under 130), “elevated” (130 to 140), or “high” (over 140).

• The recoded data above are ordinal (ranking is preserved), but the intervals are unequal and some information is lost.
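Here is a minimal recoding sketch using the slide's cutoffs. The slide does not say which category a reading of exactly 140 belongs to. This sketch assumes “130 to 140” is inclusive at both ends, so 140 is “elevated”. The readings are hypothetical. Note that 130 and 140 become indistinguishable after recoding, which is the information loss the slide describes.

```python
def recode_systolic(bp):
    """Recode a systolic reading (mm Hg, ratio data) into ordinal categories."""
    if bp < 130:
        return "normal"
    if bp <= 140:  # assumption: "130 to 140" includes both endpoints
        return "elevated"
    return "high"

readings = [118, 130, 137, 140, 152]  # hypothetical readings
categories = [recode_systolic(bp) for bp in readings]
print(categories)  # ['normal', 'elevated', 'elevated', 'elevated', 'high']
```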

2-25

Time Series and Cross-sectional Data

Time Series Data

• Each observation in the sample represents a different, equally spaced point in time (e.g., years, months, days).

• Periodicity may be annual, quarterly, monthly, weekly, daily, hourly, etc.

• We are interested in trends and patterns over time (e.g., annual growth in consumer debit card use from 1999 to 2006).

2-26

Time Series and Cross-sectional Data

Cross-sectional Data

• Each observation represents a different individual unit (e.g., person) at the same point in time (e.g., monthly VISA balances).

• We are interested in variation among observations or in relationships.

• We can combine the two data types to get pooled cross-sectional and time series data.
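This sketch stores pooled data as (unit, period) pairs to make the distinction concrete. Fixing one unit and varying the period gives a time series. Fixing one period and varying the unit gives a cross-section. The cardholder labels, months, and balances are hypothetical.

```python
# Hypothetical pooled data: monthly balances keyed by (cardholder, month)
pooled = {
    ("A", "2006-01"): 420.0, ("A", "2006-02"): 515.5, ("A", "2006-03"): 390.0,
    ("B", "2006-01"): 1200.0, ("B", "2006-02"): 980.0, ("B", "2006-03"): 1105.0,
    ("C", "2006-01"): 75.0, ("C", "2006-02"): 0.0, ("C", "2006-03"): 60.0,
}

def time_series(data, unit):
    """One unit observed at successive, equally spaced points in time."""
    return [value for (u, t), value in sorted(data.items()) if u == unit]

def cross_section(data, period):
    """Different units observed at the same point in time."""
    return {u: value for (u, t), value in data.items() if t == period}

print(time_series(pooled, "A"))          # [420.0, 515.5, 390.0]
print(cross_section(pooled, "2006-02"))  # {'A': 515.5, 'B': 980.0, 'C': 0.0}
```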

2-27

Sampling Concepts

Sample or Census?

• A sample involves looking only at some items selected from the population.

• A census is an examination of all items in a defined population.

• Why can’t the United States Census survey every person in the population?
  - Mobility
  - Illegal immigrants
  - Budget constraints
  - Incomplete responses or nonresponses

2-28

Sampling Concepts

Situations Where a Sample May Be Preferred:

Infinite Population: No census is possible if the population is infinite or of indefinite size (an assembly line can keep producing bolts; a doctor can keep seeing more patients).

Destructive Testing: The act of sampling may destroy or devalue the item (measuring battery life, testing auto crashworthiness, or testing aircraft turbofan engine life).

Timely Results: Sampling may yield more timely results than a census (checking wheat samples for moisture and protein content, checking peanut butter for aflatoxin contamination).

2-29

Sampling Concepts

Situations Where a Sample May Be Preferred:

Accuracy: Sample estimates can be more accurate than a census. Instead of spreading limited resources thinly to attempt a census, our budget of time and money might be better spent to hire experienced staff, improve training of field interviewers, and improve data safeguards.

Cost: Even if it is feasible to take a census, the cost, either in time or money, may exceed our budget.

Sensitive Information: Some kinds of information are better captured by a well-designed sample than by attempting a census. Confidentiality may also be improved in a carefully done sample.

2-30

Sampling Concepts

Situations Where a Census May Be Preferred:

Small Population: If the population is small, there is little reason to sample, because the effort of data collection may be only a small part of the total cost.

Large Sample Size: If the required sample size approaches the population size, we might as well go ahead and take a census.

Legal Requirements: Banks must count all the cash in bank teller drawers at the end of each business day. The U.S. Congress forbade sampling in the 2000 decennial population census.

Database Exists: If the data are on disk, we can examine 100% of the cases. However, auditing or validating the data against physical records may raise the cost.

2-31

Sampling Concepts

Parameters and Statistics

• Statistics are computed from a sample of n items, chosen from a population of N items.

• Statistics can be used as estimates of parameters found in the population.

• Symbols are used to represent population parameters and sample statistics.
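This sketch illustrates the distinction between a parameter and a statistic. It assumes a hypothetical population of N = 1,000 values generated with a fixed seed. The population mean μ is a parameter, which in practice is usually unknown. The mean x̄ of a simple random sample of n = 50 items is a statistic used to estimate μ. The normal distribution, its settings, and the sizes are arbitrary choices made for illustration.

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so the sketch is reproducible

N, n = 1000, 50
population = [random.gauss(100, 15) for _ in range(N)]  # hypothetical values

mu = mean(population)                  # parameter (usually unknown in practice)
sample = random.sample(population, n)  # simple random sample without replacement
x_bar = mean(sample)                   # statistic, an estimate of mu

print(f"mu = {mu:.2f}, x_bar = {x_bar:.2f}")
```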

2-32

Sampling Concepts

Parameters and Statistics

Parameter or Statistic?

Parameter: Any measurement that describes an entire population. Usually, the parameter value is unknown since we rarely can observe the entire population. Parameters are often (but not always) represented by Greek letters.

Statistic: Any measurement computed from a sample. Usually, the statistic is regarded as an estimate of a population parameter. Sample statistics are often (but not always) represented by Roman letters.

2-33

Sampling Concepts

Parameters and Statistics

• The population must be carefully specified, and the sample must be drawn scientifically so that the sample is representative.

• The target population is the population we are interested in (e.g., U.S. gasoline prices).

• The sampling frame is the group from which we take the sample (e.g., 115,000 stations).

• The frame should not differ from the target population.

[Figure: Target Population]

Page 34: Data Collection Data Vocabulary Data Vocabulary Level of Measurement Level of Measurement Time Series and Cross-sectional Data Time Series and Cross-sectional

2-34

Sampling Concepts

Finite or Infinite?

• A population is finite if it has a definite size, even if its size is unknown.

• A population is infinite if it is of arbitrarily large size.

• Rule of Thumb: A population may be treated as infinite when N is at least 20 times n (i.e., when N/n ≥ 20).

[Figure: a population of size N and a sample of size n; here, N/n > 20]
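The rule of thumb is a single ratio test. A minimal Python sketch (the function name can_treat_as_infinite and the sample size n = 100 are hypothetical, chosen only for illustration):

```python
def can_treat_as_infinite(N: int, n: int, ratio: float = 20.0) -> bool:
    """Rule of thumb: treat the population as infinite when N is at least 20 times n."""
    if n <= 0:
        raise ValueError("sample size n must be positive")
    return N / n >= ratio


# Gasoline-station frame of 115,000 with a hypothetical sample of n = 100
print(can_treat_as_infinite(115_000, 100))  # N/n = 1150, so True
# NilCo's 875 customers with the same hypothetical n = 100
print(can_treat_as_infinite(875, 100))      # N/n = 8.75, so False
```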


2-35

Sampling Methods

Probability Samples

Simple Random Sample: Use random numbers to select items from a list (e.g., VISA cardholders).

Systematic Sample: Select every kth item from a list or sequence (e.g., restaurant customers).

Stratified Sample: Select randomly within defined strata (e.g., by age, occupation, gender).

Cluster Sample: Like stratified sampling except strata are geographical areas (e.g., zip codes).


2-36

Sampling Methods

Nonprobability Samples

Judgment Sample: Use expert knowledge to choose “typical” items (e.g., which employees to interview).

Convenience Sample: Use a sample that happens to be available (e.g., ask co-worker opinions at lunch).


2-37

Sampling Methods

Simple Random Sample

• Every item in the population of N items has the same chance of being chosen in the sample of n items.

• We rely on random numbers to select a name.

Excel example: =RANDBETWEEN(1,48) returns a random integer from 1 to 48.
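As a rough Python counterpart, the standard library's random.sample draws n distinct items so that every item has the same chance of selection, and random.randint(1, 48), like RANDBETWEEN, includes both endpoints. The list of 48 names below is hypothetical, and the helper name simple_random_sample is our own:

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n items without replacement; every item has the same chance of selection."""
    rng = random.Random(seed)  # seeded generator so a draw can be reproduced
    return rng.sample(population, n)


names = [f"Name {i}" for i in range(1, 49)]  # hypothetical list of 48 names
print(simple_random_sample(names, 5, seed=1))

# Picking a single name, like =RANDBETWEEN(1,48) in Excel
row = random.randint(1, 48)  # inclusive on both ends
print(names[row - 1])
```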


2-38

Sampling Methods

Random Number Tables

• A table of random digits used to select random numbers between 1 and N.

• Each digit 0 through 9 is equally likely to be chosen.

Setting Up a Rule

• For example, NilCo wants to award cash prizes to 10 of its 875 loyal customers.

• To get 10 three-digit numbers between 001 and 875, we define any consistent rule for moving through the random number table.


2-39

Sampling Methods

Setting Up a Rule

• Randomly point at the table to choose a starting point.

• Choose the first three digits of the selected five-digit block, move to the right one column, down one row, and repeat.

• When we reach the end of a line, wrap around to the other side of the table and continue.

• Discard any number greater than 875 and any duplicates.


2-40

82134 14458 66716 54269 31928 46241 03052 00260 32367 25783

07139 16829 76768 11913 42434 91961 92934 18229 15595 02566

45056 43939 31188 43272 11332 99494 19348 97076 95605 28010

10244 19093 51678 63463 85568 70034 82811 23261 48794 63984

12940 84434 50087 20189 58009 66972 05764 10421 36875 64964

84438 45828 40353 28925 11911 53502 24640 96880 93166 68409

98681 67871 71735 64113 90139 33466 65312 90655 75444 30845

43290 96753 18799 49713 39227 15955 46167 63853 03633 19990

96893 85410 88233 22094 30605 79024 01791 38839 85531 94576

75403 41227 00192 16814 47054 16814 81349 92264 01028 29071

78064 92111 51541 76563 69027 67718 06499 71938 17354 12680

26246 71746 94019 93165 96713 03316 75912 86209 12081 57817

98766 67312 96358 21351 86448 31828 86113 78868 67243 06763

37895 51055 11929 44443 15995 72935 99631 18190 85877 31309

27988 81163 52212 25102 61798 28670 01358 60354 74015 18556

19216 53008 44498 19262 12196 93947 90162 76337 12646 26838

28078 86729 69438 24235 35208 48957 53529 76297 41741 54735

34455 61363 93711 68038 75960 16327 95716 66964 28634 65015

53510 90412 70438 45932 57815 75144 52472 61817 41562 42084

30658 18894 88208 97867 30737 94985 18235 02178 39728 66398

Table of 1,000 Random Digits (Start Here)
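The NilCo rule can be sketched in Python against the table above. This is a minimal sketch under stated assumptions: the extraction does not show where the slide's “Start Here” marker sits, so the starting block (row 0, column 1, counting from zero) is arbitrary; rows are assumed to wrap from bottom to top just as columns wrap; and 000 is discarded along with values above 875 because it is not between 001 and 875. The function name table_sample is hypothetical.

```python
TABLE_TEXT = """\
82134 14458 66716 54269 31928 46241 03052 00260 32367 25783
07139 16829 76768 11913 42434 91961 92934 18229 15595 02566
45056 43939 31188 43272 11332 99494 19348 97076 95605 28010
10244 19093 51678 63463 85568 70034 82811 23261 48794 63984
12940 84434 50087 20189 58009 66972 05764 10421 36875 64964
84438 45828 40353 28925 11911 53502 24640 96880 93166 68409
98681 67871 71735 64113 90139 33466 65312 90655 75444 30845
43290 96753 18799 49713 39227 15955 46167 63853 03633 19990
96893 85410 88233 22094 30605 79024 01791 38839 85531 94576
75403 41227 00192 16814 47054 16814 81349 92264 01028 29071
78064 92111 51541 76563 69027 67718 06499 71938 17354 12680
26246 71746 94019 93165 96713 03316 75912 86209 12081 57817
98766 67312 96358 21351 86448 31828 86113 78868 67243 06763
37895 51055 11929 44443 15995 72935 99631 18190 85877 31309
27988 81163 52212 25102 61798 28670 01358 60354 74015 18556
19216 53008 44498 19262 12196 93947 90162 76337 12646 26838
28078 86729 69438 24235 35208 48957 53529 76297 41741 54735
34455 61363 93711 68038 75960 16327 95716 66964 28634 65015
53510 90412 70438 45932 57815 75144 52472 61817 41562 42084
30658 18894 88208 97867 30737 94985 18235 02178 39728 66398"""

# 20 rows of ten five-digit blocks
TABLE = [line.split() for line in TABLE_TEXT.splitlines()]


def table_sample(table, n, high, start_row, start_col):
    """Collect n distinct numbers in 1..high using the diagonal rule.

    Take the first three digits of the current block, then move right one
    column and down one row. Columns wrap around; rows are assumed to wrap too.
    """
    rows, cols = len(table), len(table[0])
    chosen = []
    r, c = start_row, start_col
    for _ in range(rows * cols):  # safety bound so the loop always ends
        value = int(table[r][c][:3])
        # Discard 000, anything above `high`, and duplicates
        if 1 <= value <= high and value not in chosen:
            chosen.append(value)
            if len(chosen) == n:
                return chosen
        r, c = (r + 1) % rows, (c + 1) % cols
    raise ValueError("table exhausted before n valid numbers were found")


print(table_sample(TABLE, 10, 875, start_row=0, start_col=1))
# → [144, 767, 432, 855, 669, 246, 36, 754, 213, 159]
```

From that start, 906, 945, 921, and 940 are discarded for exceeding 875, so 14 blocks are read to obtain 10 winners.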


2-41

Sampling Methods

With or Without Replacement

• If we allow duplicates when sampling, then we are sampling with replacement.

• Duplicates are unlikely when n is much smaller than N.

• If we do not allow duplicates when sampling, then we are sampling without replacement.
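To quantify “duplicates are unlikely,” note that n independent draws with replacement from N items are all distinct with probability (N/N)·((N−1)/N)···((N−n+1)/N). The Python sketch below (helper names are our own) draws both kinds of sample and evaluates that chance for NilCo's N = 875:

```python
import math
import random


def sample_with_replacement(N, n, rng):
    """Item numbers 1..N; the same item may be drawn more than once."""
    return [rng.randint(1, N) for _ in range(n)]


def sample_without_replacement(N, n, rng):
    """Item numbers 1..N; no item can be drawn twice."""
    return rng.sample(range(1, N + 1), n)


def prob_any_duplicate(N, n):
    """Chance that n draws with replacement from N items contain a duplicate."""
    p_all_distinct = math.prod((N - i) / N for i in range(n))
    return 1 - p_all_distinct


rng = random.Random(42)
print(sample_with_replacement(875, 10, rng))
print(sample_without_replacement(875, 10, rng))
print(round(prob_any_duplicate(875, 10), 3))   # about 0.05 when n is small relative to N
print(round(prob_any_duplicate(875, 100), 3))  # close to 1 once n is large relative to N
```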


2-42

Sampling Methods

Computer Methods

These are pseudo-random generators because even the best algorithms eventually repeat themselves.

Excel (Option A): Enter the Excel function =RANDBETWEEN(1,875) into 10 spreadsheet cells. Press F9 to get a new sample.

Excel (Option B): Enter the function =INT(1+875*RAND()) into 10 spreadsheet cells. Press F9 to get a new sample.

Internet: The web site www.random.org will give you many kinds of excellent random numbers (integers, decimals, etc.).

Minitab: Use Minitab’s Random Data menu with the Integer option.
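For readers working outside a spreadsheet, the two Excel options translate almost directly to Python's random module, which is itself a pseudo-random generator (Mersenne Twister). This is a sketch of equivalent logic, not an Excel feature; like separate spreadsheet cells, each list may contain duplicates, which would need to be discarded and redrawn:

```python
import random

rng = random.Random()  # unseeded: a new sample on each run, like pressing F9

# Option A analog: =RANDBETWEEN(1,875) in 10 cells (both ends inclusive)
option_a = [rng.randint(1, 875) for _ in range(10)]

# Option B analog: =INT(1+875*RAND()) in 10 cells
# random() lies in [0.0, 1.0), so 1 + 875*random() lies in [1, 876)
option_b = [int(1 + 875 * rng.random()) for _ in range(10)]

print(option_a)
print(option_b)

# Seeding makes a pseudo-random sequence repeatable
assert random.Random(2008).random() == random.Random(2008).random()
```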


2-43

Sampling Methods

Randomizing a List

• In Excel, use the function =RAND() beside each row to create a column of random numbers between 0 and 1.

• Copy and paste these numbers into the same column using “Paste Special | Values” (to paste only the values and not the formulas).

• Sort the spreadsheet on the random number column.


2-44

Sampling Methods

Randomizing a List

• The first n items are a random sample of the entire list (they are as likely as any others).
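The spreadsheet procedure can be mirrored in Python by attaching a frozen random key to each row, sorting on it, and keeping the first n rows. A minimal sketch; the helper name randomize_list and the list of 875 customers are hypothetical:

```python
import random


def randomize_list(items, seed=None):
    """Attach a RAND()-style key to each row, then sort the rows on that key."""
    rng = random.Random(seed)
    keyed = [(rng.random(), item) for item in items]  # like a pasted column of =RAND() values
    keyed.sort(key=lambda pair: pair[0])               # sort the sheet on the random column
    return [item for _, item in keyed]


customers = [f"Customer {i}" for i in range(1, 876)]  # hypothetical list of 875 customers
shuffled = randomize_list(customers, seed=7)
winners = shuffled[:10]  # the first n rows form a random sample
print(winners)
```

In everyday Python, random.shuffle(list) shuffles a list in place and achieves the same result in one call; the sort-on-a-key version is shown only because it matches the spreadsheet steps.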


2-45

Sampling Methods

Systematic Sampling

• Sample by choosing every kth item from a list, starting from a randomly chosen entry on the list.

• For example, starting at item 2, we sample every k = 4 items to obtain a sample of n = 20 items from a list of N = 78 items.

• Note that N/n = 78/20 = 3.9 ≈ 4.


2-46

Sampling Methods

Systematic Sampling

• A systematic sample of n items from a population of N items requires that periodicity k be approximately N/n.

• Systematic sampling should yield acceptable results unless patterns in the population happen to recur at periodicity k.

• Can be used with unlistable or infinite populations.

• Systematic samples are well-suited to linearly organized physical populations.


2-47

Sampling Methods

Systematic Sampling

• For example, out of 501 companies, we want to obtain a sample of 25. What should the periodicity k be?

k = N/n = 501/25 = 20.04 ≈ 20.

• So, we should choose every 20th company from a random starting point.
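Both systematic-sampling examples can be reproduced with a few lines of Python. In this sketch (helper names are our own), k is N/n rounded to a whole number and, when no start is given, the start is drawn at random from the first k items; that choice of start range is an assumption, and with it the sample can come up one item short when N/n is not a whole number (e.g., starting at item 3 in the 78-item list):

```python
import random


def periodicity(N, n):
    """k is approximately N/n, rounded to a whole number of items."""
    return max(1, round(N / n))


def systematic_sample(N, n, start=None, rng=random):
    """Item numbers start, start + k, start + 2k, ... from a list numbered 1..N."""
    k = periodicity(N, n)
    if start is None:
        start = rng.randint(1, k)  # assumed: random start among the first k items
    picks = list(range(start, N + 1, k))
    return picks[:n]  # keep at most n items


print(periodicity(78, 20), periodicity(501, 25))  # 4 20
print(systematic_sample(78, 20, start=2))          # 2, 6, 10, ..., 78
```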


2-48

Sampling Methods

Stratified Sampling

• Utilizes prior information about the population.

• Applicable when the population can be divided into relatively homogeneous subgroups of known size (strata).

• A simple random sample of the desired size is taken within each stratum.

• For example, from a population containing 55% males and 45% females, randomly sample 110 males and 90 females (n = 200).
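The example uses proportional allocation: each stratum's share of the sample equals its share of the population (0.55 × 200 = 110 and 0.45 × 200 = 90), followed by a simple random sample within each stratum. A minimal Python sketch with a hypothetical population of 550 males and 450 females; helper names are our own:

```python
import random


def proportional_allocation(n, shares):
    """Split total sample size n across strata in proportion to their population shares."""
    return {stratum: round(n * share) for stratum, share in shares.items()}


def stratified_sample(strata, sizes, seed=None):
    """Take a simple random sample of the allocated size within each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, sizes[name]) for name, members in strata.items()}


sizes = proportional_allocation(200, {"male": 0.55, "female": 0.45})
print(sizes)  # {'male': 110, 'female': 90}

# Hypothetical population of 1,000 people: 550 males and 450 females
population = {
    "male": [f"M{i}" for i in range(550)],
    "female": [f"F{i}" for i in range(450)],
}
sample = stratified_sample(population, sizes, seed=3)
print({name: len(chosen) for name, chosen in sample.items()})  # {'male': 110, 'female': 90}
```

With many strata, rounding each allocation separately can leave the total off by one, so the stratum sizes may need a small manual adjustment.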


2-49

Sampling Methods

Stratified Sampling

• Or, take a random sample of the entire population and then combine individual strata estimates using appropriate weights.

• For a population with L strata, the population size N is the sum of the stratum sizes: $N = N_1 + N_2 + \cdots + N_L$

• The weight assigned to stratum j is $w_j = N_j / N$

• For example, take a random sample of n = 200 and then weight the responses for males by $w_M = .55$ and for females by $w_F = .45$.
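The weighting step can be sketched as a weighted sum of stratum estimates, $\bar{x} = \sum_j w_j \bar{x}_j$ with $w_j = N_j / N$. The stratum sizes and the mean responses below are hypothetical numbers chosen only to show the arithmetic:

```python
def stratum_weights(stratum_sizes):
    """w_j = N_j / N, where N is the sum of all stratum sizes."""
    N = sum(stratum_sizes.values())
    return {name: size / N for name, size in stratum_sizes.items()}


def weighted_estimate(stratum_means, weights):
    """Combine per-stratum estimates as a weighted sum of the stratum means."""
    return sum(weights[name] * stratum_means[name] for name in stratum_means)


w = stratum_weights({"male": 550, "female": 450})  # hypothetical N_M = 550, N_F = 450
print(w)  # {'male': 0.55, 'female': 0.45}

means = {"male": 3.0, "female": 4.0}  # hypothetical mean responses in each stratum
print(round(weighted_estimate(means, w), 4))  # 0.55*3.0 + 0.45*4.0 = 3.45
```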


2-50

Sampling Methods

Cluster Sample

• Strata consist of geographical regions.

• One-stage cluster sampling: the sample consists of all elements in each of k randomly chosen subregions (clusters).

• Two-stage cluster sampling: first choose k subregions (clusters), then choose a random sample of elements within each cluster.


2-51

Sampling Methods

Cluster Sample

• Here is an example of 4 elements sampled from each of 3 randomly chosen clusters (two-stage cluster sampling).
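Both cluster designs can be sketched in Python. The frame of 10 zip-code clusters with 30 households each is hypothetical; the two-stage call reproduces the slide's design of 4 elements from each of 3 randomly chosen clusters. Cluster names are sorted into a list before sampling because random.sample needs a sequence:

```python
import random


def one_stage_cluster_sample(clusters, k, rng):
    """Pick k clusters at random and keep every element in each."""
    chosen = rng.sample(sorted(clusters), k)
    return {name: list(clusters[name]) for name in chosen}


def two_stage_cluster_sample(clusters, k, m, rng):
    """Pick k clusters at random, then a simple random sample of m elements within each."""
    chosen = rng.sample(sorted(clusters), k)
    return {name: rng.sample(clusters[name], m) for name in chosen}


# Hypothetical frame: 10 zip-code clusters with 30 households each
clusters = {f"zip{z}": [f"zip{z}-hh{h}" for h in range(30)] for z in range(10)}
rng = random.Random(11)
sample = two_stage_cluster_sample(clusters, k=3, m=4, rng=rng)  # 4 elements from each of 3 clusters
for name, households in sample.items():
    print(name, households)
```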


2-52

Sampling Methods

Cluster Sample

• Cluster sampling is useful when

- Population frame and stratum characteristics are not readily available

- It is too expensive to obtain a simple or stratified sample

- The cost of obtaining data increases sharply with distance

- Some loss of reliability is acceptable


2-53

Sampling Methods

Judgment Sample

• A nonprobability sampling method that relies on the expertise of the sampler to choose items that are representative of the population.

• Can be affected by subconscious bias (i.e., nonrandomness in the choice).

• Quota sampling is a special kind of judgment sampling, in which the interviewer chooses a certain number of people in each category.


2-54

Sampling Methods

Convenience Sample

• Take advantage of whatever sample is available at that moment. A quick way to sample.

Sample Size

• Sample size depends on the inherent variability of the quantity being measured and on the desired precision of the estimate.


2-55

Data Sources

Useful Data Sources

Type of Data: Examples

U.S. general data: Statistical Abstract of the U.S.

U.S. economic data: Economic Report of the President

Almanacs: World Almanac, Time Almanac

Periodicals: Economist, Business Week, Fortune

Indexes: New York Times, Wall Street Journal

Databases: CompuStat, Citibase, U.S. Census

World data: CIA World Factbook

Web: Google, Yahoo, msn


2-56

Survey Research

Basic Steps of Survey Research

• Step 1: State the goals of the research

• Step 2: Develop the budget (time, money, staff)

• Step 3: Create a research design (target population, frame, sample size)

• Step 4: Choose a survey type and method of administration


2-57

Survey Research

Basic Steps of Survey Research

• Step 5: Design a data collection instrument (questionnaire)

• Step 6: Pretest the survey instrument and revise as needed

• Step 7: Administer the survey (follow up if needed)

• Step 8: Code the data and analyze it


2-58

Survey Research

Survey Types

Type of Survey: Mail

Characteristics: You need a well-targeted and current mailing list (people move a lot). Low response rates are typical and nonresponse bias is expected (nonrespondents differ from those who respond). Zip code lists (often costly) are an attractive option to define strata of similar income, education, and attitudes. To encourage participation, a cover letter should clearly explain the uses to which the data will be put. Plan for follow-up mailings.


2-59

Survey Research

Survey Types

Type of Survey: Telephone

Characteristics: Random dialing yields very low response and is poorly targeted. Purchased phone lists help reach the target population, though a low response rate is still typical (disconnected phones, caller screening, answering machines, work hours, no-call lists). Other sources of nonresponse bias include the growing number of non-English speakers and distrust caused by scams and spam.


2-60

Survey Research

Survey Types

Type of Survey: Interviews

Characteristics: Interviewing is expensive and time-consuming, yet trading a smaller sample size for higher-quality results may still be worth it. Interviews must be handled carefully, so interviewers must be well trained, which is an added cost. But you can obtain information on complex or sensitive topics (e.g., gender discrimination in companies, birth control practices, diet and exercise habits).


2-61

Survey Research

Survey Types

Type of Survey: Web

Characteristics: Web surveys are growing in popularity, but are subject to nonresponse bias because those who participate may differ from those who feel too busy, don’t own computers, or distrust your motives (scams and spam are again to blame). This type of survey works best when targeted to a well-defined interest group on a question of self-interest (e.g., views of CPAs on new proposed accounting rules, frequent flyer views on airline security).


2-62

Survey Research

Survey Types

Type of Survey: Direct Observation

Characteristics: This can be done in a controlled setting (e.g., psychology lab) but requires informed consent, which can change behavior. Unobtrusive observation is possible in some nonlab settings (e.g., what percentage of airline passengers carry on more than two bags, what percentage of SUVs carry no passengers, what percentage of drivers wear seat belts).


2-63

Survey Research

Survey Guidelines

Plan: What is the purpose of the survey? Consider staff expertise, needed skills, degree of precision, and budget.

Design: Invest time and money in designing the survey. Use books and references to avoid unnecessary errors.

Quality: Take care in preparing a quality survey so that people will take you seriously.


2-64

Survey Research

Survey Guidelines

Pilot Test: Pretest on friends or co-workers to make sure the survey is clear.

Buy-in: Improve response rates by stating the purpose of the survey, offering a token of appreciation, or paving the way with endorsements.

Expertise: Work with a consultant early on.


2-65

Survey Research

Getting Advice

• Consider hiring a consultant in the early stages.

• Many resources are available to help:

- The American Statistical Association

- The Research Industry Coalition

- The Council of American Survey Research Organizations


2-66

Survey Research

Questionnaire Design

• Use a lot of white space in layout.

• Begin with short, clear instructions.

• State the survey purpose.

• Assure anonymity.

• Instruct on how to submit the completed survey.


2-67

Survey Research

Questionnaire Design

• Break the survey into naturally occurring sections.

• Let respondents bypass sections that are not applicable (e.g., “If you answered no to Question 7, skip directly to Question 15”).

• Pretest and revise as needed.

• Keep it as short as possible.


2-68

Survey Research

Questionnaire Design

Type of Question: Example

Open-ended question: Briefly describe your job goals.

Fill-in-the-blank: How many times did you attend formal religious services during the last year? ________ times

Check boxes: Which of these statistics packages have you ever used? [ ] SAS [ ] Visual Statistics [ ] SPSS [ ] MegaStat [ ] Systat [ ] Minitab


2-69

Survey Research

Questionnaire Design

Type of Question: Example

Ranked choices: “Please evaluate your dining experience”

Rating columns: Excellent, Good, Fair, Poor

Items rated: Food, Service, Ambiance, Cleanliness, Overall


2-70

Survey Research

Questionnaire Design

Type of Question: Example

Pictograms: “What do you think of the President’s economic policies?” (circle one) [pictogram images not reproduced]

Likert scale: Statistics is a difficult subject.

Strongly Agree | Slightly Agree | Neither Agree Nor Disagree | Slightly Disagree | Strongly Disagree


2-71

Survey Research

Question Wording

• The way a question is asked has a profound influence on the response. For example,

1. Shall state taxes be cut?

2. Shall state taxes be cut, if it means reducing highway maintenance?

3. Shall state taxes be cut, if it means firing teachers and police?

2-72

Survey Research: Question Wording

• Make sure you have covered all the possibilities. For example,

Are you married?   Yes   No

• Overlapping classes or unclear categories are a problem. For example,

How old is your father?   35 – 45   45 – 55   55 – 65   65 or older

(A 45-year-old father fits two classes, and a father younger than 35 fits none.)
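The age question can be checked mechanically: every possible answer should fall into exactly one class. The sketch below is illustrative only; it assumes ages are recorded in whole years and that each class such as “35 – 45” includes both endpoints. FIXED_CLASSES is one hypothetical repair, not a set of classes taken from the slides.

```python
# Hypothetical encoding of the flawed answer choices above: (low, high) in whole
# years, inclusive at both ends; high=None means "or older".
CLASSES = [(35, 45), (45, 55), (55, 65), (65, None)]

# One possible repair (an assumption, not from the slides): adjacent classes
# that neither overlap nor leave gaps, plus an added "under 35" choice.
FIXED_CLASSES = [(0, 34), (35, 44), (45, 54), (55, 64), (65, None)]


def label(low, high):
    """Human-readable name of a class."""
    return f"{low} or older" if high is None else f"{low} - {high}"


def matching_classes(age, classes):
    """Return the label of every class whose range contains this age."""
    return [label(low, high) for low, high in classes
            if age >= low and (high is None or age <= high)]


def check_classes(ages, classes):
    """Map each problem age to its matches: [] means a gap, 2+ means an overlap."""
    problems = {}
    for age in ages:
        hits = matching_classes(age, classes)
        if len(hits) != 1:
            problems[age] = hits
    return problems


for age in (30, 45, 50, 65):
    print(age, matching_classes(age, CLASSES))

print(check_classes(range(0, 120), FIXED_CLASSES))  # {} : no gaps or overlaps
```

Running the same check over a questionnaire's answer choices before fielding it catches both problems named on the slide: categories that leave some respondents out and categories that overlap.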

2-73

Survey Research: Coding and Data Screening

• Responses are usually coded numerically (e.g., 1 = male, 2 = female).

• Missing values are typically denoted by special characters (e.g., blank, “.” or “*”).

• Discard questionnaires that are flawed or missing many responses.

• Watch for multiple responses, outrageous or inconsistent replies, or range answers.

• Follow up if necessary, and always document your data-coding decisions.
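The coding and screening steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the replies are made up, MAX_AGE is an assumed plausibility limit, and “missing many responses” is simplified to “no usable responses” because each hypothetical questionnaire has only two items. Every decision is written to a log, following the advice to document data-coding decisions.

```python
# Hypothetical raw replies; None stands for a blank cell.
raw = [
    {"id": 1, "gender": "male", "age": "34"},
    {"id": 2, "gender": "female", "age": "."},
    {"id": 3, "gender": "*", "age": "51"},
    {"id": 4, "gender": "female", "age": "212"},   # outrageous reply
    {"id": 5, "gender": "M", "age": "40-50"},      # unexpected code, range answer
    {"id": 6, "gender": None, "age": None},        # nothing answered
]

GENDER_CODES = {"male": 1, "female": 2}  # numeric coding scheme from the slide
MISSING = {None, "", ".", "*"}           # missing-value markers from the slide
MAX_AGE = 110                            # assumed plausibility limit


def screen(replies):
    """Code replies numerically; return (kept records, decision log)."""
    coded, log = [], []
    for r in replies:
        rid, g, a = r["id"], r["gender"], r["age"]
        if g in MISSING:
            gender = None
        elif g in GENDER_CODES:
            gender = GENDER_CODES[g]
        else:
            gender = None
            log.append(f"id {rid}: unrecognized gender {g!r}; set to missing")
        if a in MISSING:
            age = None
        elif a.isdigit() and int(a) <= MAX_AGE:
            age = int(a)
        else:
            age = None
            log.append(f"id {rid}: age {a!r} is non-numeric or implausible; set to missing")
        if gender is None and age is None:
            # Assumed discard rule for this two-item example.
            log.append(f"id {rid}: no usable responses; questionnaire discarded")
            continue
        coded.append({"id": rid, "gender": gender, "age": age})
    return coded, log


coded, log = screen(raw)
for record in coded:
    print(record)
for line in log:
    print(line)
```

Setting doubtful values to missing rather than guessing keeps the judgment visible in the log, so it can be revisited after any follow-up with the respondent.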

2-74

Survey Research: Sources of Error

Source of Error      Characteristics
Nonresponse bias     Respondents differ from nonrespondents
Selection bias       Self-selected respondents are atypical
Response error       Respondents give false information
Coverage error       Incorrect specification of frame or population
Interviewer error    Responses influenced by interviewer
Measurement error    Survey instrument wording is biased or unclear
Sampling error       Random and unavoidable
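Sampling error, the last row of the table, is easy to see in a small simulation. The sketch below uses a made-up population of ages (an assumption for illustration). Even perfectly random samples miss the population mean, and the typical size of the miss shrinks as the sample size n grows, although it never disappears.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 made-up customer ages between 18 and 80.
population = [rng.randint(18, 80) for _ in range(10_000)]
true_mean = statistics.mean(population)


def sample_means(pop, n, reps, rng):
    """Means of `reps` simple random samples of size n (without replacement)."""
    return [statistics.mean(rng.sample(pop, n)) for _ in range(reps)]


# Even perfectly random samples miss the population mean by a little.
for mean in sample_means(population, 50, 5, rng):
    print(f"sample mean {mean:.2f}  error {mean - true_mean:+.2f}")

# The typical size of that miss shrinks as the sample size grows.
spread = {n: statistics.stdev(sample_means(population, n, 500, rng))
          for n in (25, 100, 400)}
for n, sd in spread.items():
    print(f"n = {n:3d}: std. dev. of sample means = {sd:.2f}")
```

Unlike the other sources of error in the table, this one cannot be removed by better questionnaire design or interviewing; it can only be reduced by taking a larger sample.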

2-75

Survey Research: Data File Format

• Enter data into a spreadsheet or database as a “flat file” (an n subjects × m variables matrix).
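A flat file stores one row per subject and one column per variable. The sketch below writes such a file as CSV with Python's standard csv module and checks that it reads back as a rectangular n × m matrix. The variable names and values are hypothetical, and an in-memory buffer stands in for a file on disk.

```python
import csv
import io

# Hypothetical survey: m = 4 variables (columns) for n = 3 subjects (rows).
variables = ["id", "gender", "age", "rating"]
subjects = [
    [1, 1, 34, 4],
    [2, 2, "", 3],   # blank cell marks a missing age
    [3, 2, 51, 5],
]

buffer = io.StringIO()          # stands in for a .csv file on disk
writer = csv.writer(buffer)
writer.writerow(variables)      # one header row naming the variables
writer.writerows(subjects)      # then exactly one row per subject

# Read the file back and confirm it is a rectangular n x m matrix.
rows = list(csv.reader(buffer.getvalue().splitlines()))
header, data = rows[0], rows[1:]
assert all(len(row) == len(header) for row in data), "ragged row found"
print(f"{len(data)} subjects x {len(header)} variables")  # 3 subjects x 4 variables
```

Note that every value comes back from the CSV reader as text, so numeric columns still have to be converted before analysis.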

2-76

Survey Research: Advice on Copying Data

• Using commas (,), dollar signs ($), or percent signs (%) as part of the values may cause your data to be treated as text.

• A numerical variable may contain only the digits 0-9, a decimal point, and a minus sign.

• To avoid round-off errors, format the data column as plain numbers with the desired number of decimal places before you copy the data to a statistical package.
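The rule that a numerical variable may contain only digits, a decimal point, and a minus sign can be tested directly, and formatted cells can be cleaned before analysis. This is a sketch under assumptions: commas are taken to be thousands separators (not decimal commas), a percentage is kept in percent units, and the cell values are hypothetical. The last lines assume that copying picks up the displayed text of a cell, which is the situation the round-off advice guards against.

```python
import re

# A plain number: optional leading minus, digits 0-9, at most one decimal point.
PLAIN_NUMBER = re.compile(r"-?(\d+(\.\d*)?|\.\d+)")


def is_plain_number(text):
    """True if a cell uses only the characters a numerical variable allows."""
    return PLAIN_NUMBER.fullmatch(text) is not None


def to_number(text):
    """Strip $, thousands commas and % signs, then convert to float.

    Assumes commas are thousands separators (not decimal commas) and that a
    percentage stays in percent units, so "12%" becomes 12.0.
    """
    cleaned = text.replace("$", "").replace(",", "").replace("%", "").strip()
    if not is_plain_number(cleaned):
        raise ValueError(f"cannot read {text!r} as a number")
    return float(cleaned)


# Hypothetical cells as they might arrive from a formatted spreadsheet column.
cells = ["1,250.50", "$980", "12%", "-3.75", "42"]
for cell in cells:
    print(f"{cell!r:>11} plain number: {is_plain_number(cell)!s:5}  value: {to_number(cell)}")

# Round-off: if a cell displays only 2 decimals and the displayed text is copied,
# the extra precision is lost.
exact = 2 / 3
copied = float(f"{exact:.2f}")
print(f"lost precision: {exact - copied:.6f}")
```

Cleaning in code is a fallback; formatting the column as plain numbers before copying, as the slide advises, avoids the problem at its source.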