
Page 1: Data Collection Definitions Level of Measurement Time Series and Cross-sectional Data Sampling Concepts Sampling Methods Data Sources Survey Research

Data Collection

Definitions

Level of Measurement

Time Series and Cross-sectional Data

Sampling Concepts

Sampling Methods

Data Sources

Survey Research

Chapter 2

McGraw-Hill/Irwin Copyright © 2009 by The McGraw-Hill Companies, Inc. All rights reserved.

Data: Singular or Plural?

Data is the plural form of the Latin datum (a “given” fact).

2-2

Data: Singular or Plural?

Subjects, Variables, Data Sets

We will refer to data as plural and to a data set as a particular collection of data as a whole.

Observation – each data value.

Subject (or individual) – an item for study (e.g., an employee in your company).

Variable – a characteristic about the subject or individual (e.g., employee’s income).

2-3

Data Definitions (Table 2.2)

Number of Variables and Typical Tasks

Data Set       Variables       Typical Tasks
Univariate     One             Histograms, descriptive statistics, frequency tallies
Bivariate      Two             Scatter plots, correlations, simple regression
Multivariate   More than two   Multiple regression, data mining, econometric modeling

2-4

Data Definitions (Table 2.1)

A Small Multivariate Data Set: 5 variables, 8 subjects

2-5

Data Definitions

(Figure 2.1)

2-6

Data Definitions

Data Types

Categorical or qualitative data: values are described by words rather than numbers.

For example,
- Automobile style (e.g., X = full, midsize, compact, subcompact).
- Mutual fund (e.g., X = load, no-load).

2-7

Data Definitions

Data Coding

Coding refers to using numbers to represent categories to facilitate statistical analysis.

Coding an attribute as a number does not make the data numerical.

For example, 1 = Bachelor’s, 2 = Master’s, 3 = Doctorate

Rankings may exist, for example, 1 = Liberal, 2 = Moderate, 3 = Conservative
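A minimal Python sketch of coding, assuming the categories arrive as text labels. The code-book names `EDUCATION_CODES` and `POLITICAL_CODES` and the helpers `encode`/`decode` are hypothetical, chosen only for illustration. Encoding and then decoding recovers the original labels, which shows that the numbers merely stand in for categories.

```python
# Hypothetical code books for the two examples above.
EDUCATION_CODES = {"Bachelor's": 1, "Master's": 2, "Doctorate": 3}
POLITICAL_CODES = {"Liberal": 1, "Moderate": 2, "Conservative": 3}


def encode(values, codebook):
    """Replace each category label with its numeric code."""
    return [codebook[v] for v in values]


def decode(codes, codebook):
    """Map numeric codes back to their category labels."""
    reverse = {code: label for label, code in codebook.items()}
    return [reverse[c] for c in codes]


answers = ["Moderate", "Liberal", "Conservative", "Moderate"]
codes = encode(answers, POLITICAL_CODES)
print(codes)                           # [2, 1, 3, 2]
print(decode(codes, POLITICAL_CODES))  # ['Moderate', 'Liberal', 'Conservative', 'Moderate']
```

The political codes preserve the ranking (1 < 2 < 3), but the gaps between codes carry no meaning, so the coded values are still categorical data.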

2-8

Data Definitions

Binary Data

A binary variable has only two values: 1 = presence and 0 = absence of a characteristic of interest (the codes themselves are arbitrary).

For example,
- 1 = employed, 0 = not employed
- 1 = married, 0 = not married
- 1 = male, 0 = female
- 1 = female, 0 = male

The coding itself has no numerical value, so binary variables are attribute data.
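A small Python illustration, using hypothetical employment data, that either binary coding is equally valid. It relies on one standard fact not stated on the slide: the average of a 0/1 variable is the proportion of 1s, so flipping the codes simply yields the complementary proportion.

```python
# Hypothetical employment status for five people.
status = ["employed", "not employed", "employed", "employed", "not employed"]

# Two equally valid (arbitrary) codings of the same attribute.
code_a = [1 if s == "employed" else 0 for s in status]      # 1 = employed
code_b = [1 if s == "not employed" else 0 for s in status]  # 1 = not employed

# The average of a 0/1 variable is the proportion of 1s.
prop_a = sum(code_a) / len(code_a)
prop_b = sum(code_b) / len(code_b)
print(prop_a, prop_b)  # 0.6 0.4
```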

2-9

Data Definitions

Numerical Data

Numerical or quantitative data arise from counting or some kind of mathematical operation.

For example,
- Number of auto insurance claims filed in March (e.g., X = 114 claims).
- Ratio of profit to sales for last quarter (e.g., X = 0.0447).

Numerical data can be broken down into two types: discrete or continuous.

2-10

Data Definitions

Discrete Data

A numerical variable with a countable number of values that can be represented by an integer (no fractional values).

For example,
- Number of Medicaid patients (e.g., X = 2).
- Number of takeoffs at O’Hare (e.g., X = 37).

2-11

Data Definitions

Continuous Data

A numerical variable that can have any value within an interval (e.g., length, weight, time, sales, price/earnings ratios).

Any continuous interval contains infinitely many possible values (e.g., 426 < X < 428).

2-12

Data Definitions

Rounding

Ambiguity is introduced when continuous data are rounded to whole numbers.

The underlying measurement scale is still continuous.

Precision of measurement depends on the instrument.

Sometimes discrete data are treated as continuous when the range is very large (e.g., SAT scores) and small differences (e.g., 604 or 605) aren’t of much importance.

2-13

Level of Measurement

Levels of Measurement

Level of Measurement   Characteristics          Example
Nominal                Categories only          Eye color (blue, brown, green, hazel)
Ordinal                Rank has meaning         Bond ratings (Aaa, Aab, C, D, F, etc.)
Interval               Distance has meaning     Temperature (57° Celsius)
Ratio                  Meaningful zero exists   Accounts payable ($21.7 million)

2-14

Level of Measurement

Nominal Measurement

Nominal data merely identify a category. Nominal data are qualitative, attribute, categorical, or classification data (e.g., Apple, Compaq, Dell, HP).

Nominal data are usually coded numerically, but the codes are arbitrary (e.g., 1 = Apple, 2 = Compaq, 3 = Dell, 4 = HP).

The only mathematical operations are counting (e.g., frequencies) and simple statistics such as the mode.
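A minimal Python sketch of the operations that make sense for nominal data, using hypothetical survey responses coded as in the example above. `collections.Counter` tallies frequencies and `most_common` gives the mode; the average code is computed only to show that it describes no brand.

```python
from collections import Counter

# Hypothetical brand preferences, coded 1 = Apple, 2 = Compaq, 3 = Dell, 4 = HP.
BRANDS = {1: "Apple", 2: "Compaq", 3: "Dell", 4: "HP"}
responses = [3, 1, 3, 4, 2, 3, 1, 4]

# Counting is meaningful: tally how often each category occurs.
tally = Counter(BRANDS[code] for code in responses)
print(tally)

# The mode (the most frequent category) is also meaningful.
mode_label, mode_count = tally.most_common(1)[0]
print(mode_label, mode_count)  # Dell 3

# By contrast, the average code (21/8 = 2.625 here) describes no brand at all.
print(sum(responses) / len(responses))
```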

2-15

Level of Measurement

Ordinal Measurement

Ordinal data codes can be ranked (e.g., 1 = Frequently, 2 = Sometimes, 3 = Rarely, 4 = Never).

Distance between codes is not meaningful (e.g., the distance between 1 and 2, between 2 and 3, or between 3 and 4 lacks meaning).

Many useful statistical tests exist for ordinal data. Ordinal data are especially useful in social science, marketing, and human resource research.

2-16

Level of Measurement

Interval Measurement

Data can not only be ranked but also have meaningful intervals between scale points (e.g., the difference between 60°F and 70°F is the same as the difference between 20°F and 30°F).

Since intervals between numbers represent distances, arithmetic operations such as averaging can be performed.

The zero point of an interval scale is arbitrary, so ratios are not meaningful (e.g., 60°F is not twice as warm as 30°F).
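One way to see why ratios fail on an interval scale is to change the zero point. The Python sketch below converts the Fahrenheit temperatures from the examples above to Celsius with the standard formula C = (F − 32) × 5/9: equal differences stay equal, but the ratio 60/30 = 2 does not survive.

```python
import math


def f_to_c(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9


# Differences survive the change of scale: both 10°F gaps map to the
# same Celsius gap (50/9, about 5.56°C).
gap_warm = f_to_c(70) - f_to_c(60)
gap_cold = f_to_c(30) - f_to_c(20)
print(math.isclose(gap_warm, gap_cold))  # True

# Ratios do not: 60/30 = 2 in Fahrenheit, but the same two temperatures
# give a ratio of about -14 in Celsius, because the zero point moved.
print(60 / 30)                  # 2.0
print(f_to_c(60) / f_to_c(30))  # about -14
```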

2-17

Level of Measurement

Likert Scales

A special case of interval data frequently used in survey research. The coarseness of a Likert scale refers to the number of scale points (typically 5 or 7).

“College-bound high school students should be required to study a foreign language.” (check one)

Strongly Agree | Somewhat Agree | Neither Agree Nor Disagree | Somewhat Disagree | Strongly Disagree

2-18

Level of Measurement

Likert Scales

A neutral midpoint (“Neither Agree Nor Disagree”) is allowed if an odd number of scale points is used; it can be omitted (by using an even number of scale points) to force the respondent to “lean” one way or the other.

• Likert data are coded numerically (e.g., 1 to 5), but any equally spaced values will work.

Likert coding: 1 to 5 scale     Likert coding: -2 to +2 scale
5 = Help a lot                  +2 = Help a lot
4 = Help a little               +1 = Help a little
3 = No effect                    0 = No effect
2 = Hurt a little               -1 = Hurt a little
1 = Hurt a lot                  -2 = Hurt a lot
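The two codings above differ only by a constant shift (subtract 3), so the spacing between scale points is unchanged. A minimal Python sketch with hypothetical responses; the names `ANCHORS` and `recode_1_5_to_centered` are illustrative only.

```python
# Verbal anchors from the example above, listed from lowest to highest code.
ANCHORS = ["Hurt a lot", "Hurt a little", "No effect", "Help a little", "Help a lot"]


def recode_1_5_to_centered(code):
    """Map the 1-to-5 coding onto the -2-to-+2 coding (3 becomes 0)."""
    return code - 3


# Hypothetical responses on the 1-to-5 scale.
responses = [5, 4, 4, 3, 1]
centered = [recode_1_5_to_centered(x) for x in responses]
print(centered)  # [2, 1, 1, 0, -2]

# Both codings describe the same answers ...
same = [ANCHORS[x - 1] for x in responses] == [ANCHORS[y + 2] for y in centered]
print(same)  # True

# ... and because the spacing is unchanged, the mean simply shifts by 3.
print(sum(responses) / 5, sum(centered) / 5)  # 3.4 0.4
```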

2-19

Level of Measurement

Likert Scales

Careful choice of verbal anchors results in measurable intervals (e.g., the distance from 1 to 2 is “the same” as the interval, say, from 3 to 4).

Ratios are not meaningful (e.g., here 4 is not twice 2).

Many statistical calculations can be performed (e.g., averages, correlations, etc.).

2-20

Level of Measurement

Likert Scales

More variants of Likert scales:

How would you rate your marketing instructor? (check one)
Terrible | Poor | Adequate | Good | Excellent

How would you rate your marketing instructor? (check one)
Very Bad ... Very Good

2-21

Level of Measurement

Ratio Measurement

Ratio data have all the properties of nominal, ordinal, and interval data types and also possess a meaningful zero (absence of the quantity being measured).

Because of this zero point, ratios of data values are meaningful (e.g., $20 million profit is twice as much as $10 million).

Zero does not have to be observable in the data; it is an absolute reference point.

2-22

Level of Measurement

Use the following procedure to recognize data types:

Question                                              If “Yes”
Q1. Is there a meaningful zero point?                 Ratio data (all statistical operations are allowed)
Q2. Are intervals between scale points meaningful?    Interval data (common statistics allowed, e.g., means and standard deviations)
Q3. Do scale points represent rankings?               Ordinal data (restricted to certain types of nonparametric statistical tests)
Q4. Are there discrete categories?                    Nominal data (only counting allowed, e.g., finding the mode)
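The procedure is a sequence of questions where the first “Yes” decides the level. A minimal Python sketch, assuming the answers are supplied as booleans; the function name, its parameter names, and the "unknown" fallback are hypothetical additions for illustration.

```python
def level_of_measurement(meaningful_zero, meaningful_intervals, ranked,
                         discrete_categories=True):
    """Ask Q1-Q4 in order; the first "Yes" determines the level."""
    if meaningful_zero:          # Q1
        return "ratio"
    if meaningful_intervals:     # Q2
        return "interval"
    if ranked:                   # Q3
        return "ordinal"
    if discrete_categories:      # Q4
        return "nominal"
    return "unknown"             # assumed fallback, not part of the procedure


# Examples from the levels-of-measurement table:
print(level_of_measurement(True, True, True))     # ratio    (accounts payable)
print(level_of_measurement(False, True, True))    # interval (temperature in °C)
print(level_of_measurement(False, False, True))   # ordinal  (bond ratings)
print(level_of_measurement(False, False, False))  # nominal  (eye color)
```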

2-23

Level of Measurement

Changing Data by Recoding

In order to simplify data, or when the exact data magnitude is of little interest, ratio data can be recoded downward into ordinal or nominal measurements (but not conversely).

For example, recode systolic blood pressure as “normal” (under 130), “elevated” (130 to 140), or “high” (over 140).

The above recoded data are ordinal (ranking is preserved) but intervals are unequal and some information is lost.
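A minimal Python sketch of the blood-pressure recoding. The slide does not say which category a reading of exactly 140 belongs to; this sketch assumes 130 to 140 inclusive is “elevated”. The readings are hypothetical.

```python
def recode_systolic(bp):
    """Recode systolic blood pressure (ratio data) into ordinal categories.

    Assumed boundaries: under 130 -> "normal", 130 to 140 inclusive
    -> "elevated", over 140 -> "high".
    """
    if bp < 130:
        return "normal"
    if bp <= 140:
        return "elevated"
    return "high"


readings = [118, 131, 140, 152, 129]  # hypothetical readings
labels = [recode_systolic(x) for x in readings]
print(labels)  # ['normal', 'elevated', 'elevated', 'high', 'normal']

# Information is lost: 131 and 140 become indistinguishable, and the
# original readings cannot be recovered from the labels.
```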

2-24

Time Series versus Cross-Sectional Data

Time Series Data

Each observation in the sample represents a different equally spaced point in time (e.g., years, months, days).

Periodicity may be annual, quarterly, monthly, weekly, daily, hourly, etc.

We are interested in trends and patterns over time (e.g., annual growth in consumer debit card use from 2001 to 2008).

2-25

Time Series versus Cross-Sectional Data

Cross-sectional Data

Each observation represents a different individual unit (e.g., person) at the same point in time (e.g., monthly VISA balances).

We are interested in
- variation among observations, or in
- relationships.

We can combine the two data types to get pooled cross-sectional and time series data.

2-26

Time Series versus Cross-Sectional Data

Sample or Census?

A sample involves looking only at some items selected from the population. A census is an examination of all items in a defined population.

Why can’t the United States Census survey every person in the population?
- Mobility
- Illegal immigrants
- Budget constraints
- Incomplete responses or non-responses

2-27

Parameters and Statistics?

Situations Where a Sample May Be Preferred:

Infinite Population: No census is possible if the population is infinite or of indefinite size (an assembly line can keep producing bolts, a doctor can keep seeing more patients).

Destructive Testing: The act of sampling may destroy or devalue the item (measuring battery life, testing auto crashworthiness, or testing aircraft turbofan engine life).

2-28

Parameters and Statistics?

Situations Where a Sample May Be Preferred:

Timely Results: Sampling may yield more timely results than a census (checking wheat samples for moisture and protein content, checking peanut butter for aflatoxin contamination).

Accuracy: Sample estimates can be more accurate than a census. Instead of spreading limited resources thinly to attempt a census, our budget of time and money might be better spent to hire experienced staff, improve training of field interviewers, and improve data safeguards.

2-29

Parameters and Statistics?

Situations Where a Sample May Be Preferred:

Cost: Even if it is feasible to take a census, the cost, either in time or money, may exceed our budget.

Sensitive Information: Some kinds of information are better captured by a well-designed sample than by attempting a census. Confidentiality may also be improved in a carefully done sample.

2-30

Parameters and Statistics?

Situations Where a Census May Be Preferred:

Small Population: If the population is small, there is little reason to sample, because the effort of data collection may be only a small part of the total cost.

Large Sample Size: If the required sample size approaches the population size, we might as well go ahead and take a census.

2-31

Parameters and Statistics?

Situations Where a Census May Be Preferred:

Database Exists: If the data are on disk, we can examine 100% of the cases. But auditing or validating data against physical records may raise the cost.

Legal Requirements: Banks must count all the cash in bank teller drawers at the end of each business day. The U.S. Congress forbade sampling in the 2000 decennial population census.

2-32

Parameters and Statistics?

• Statistics are computed from a sample of n items, chosen from a population of N items.

• Statistics can be used as estimates of parameters found in the population.

• Symbols are used to represent population parameters and sample statistics.

2-33

Parameter or Statistic?

Parameter

Any measurement that describes an entire population. Usually, the parameter value is unknown, since we rarely can observe the entire population. Parameters are often (but not always) represented by Greek letters.

2-34

Parameter or Statistic?

Statistic

Any measurement computed from a sample. Usually, the statistic is regarded as an estimate of a population parameter. Sample statistics are often (but not always) represented by Roman letters.

2-35

Parameters or Statistics?

The population must be carefully specified and the sample must be drawn scientifically so that the sample is representative.

Target Population

The target population is the population we are interested in (e.g., U.S. gasoline prices).

The sampling frame is the group from which we take the sample (e.g., 115,000 stations).

The frame should not differ from the target population.

2-36

Parameters or Statistics?

Finite or Infinite?

A population is finite if it has a definite size, even if its size is unknown. A population is infinite if it is of arbitrarily large size.

Rule of Thumb: A population may be treated as infinite when N is at least 20 times n (i.e., when N/n ≥ 20).

(Figure: a population of N items and a sample of n items; here, N/n ≥ 20.)
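The rule of thumb is a single comparison. A minimal Python sketch (the function name is hypothetical), checked against two population and sample sizes that appear later in this chapter.

```python
def treat_as_infinite(N, n, ratio=20):
    """Rule of thumb: treat the population as infinite when N/n >= 20."""
    return N / n >= ratio


print(treat_as_infinite(875, 10))  # True  (875/10 = 87.5)
print(treat_as_infinite(78, 20))   # False (78/20 = 3.9)
```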

2-37

Sampling Methods

Probability Samples

Simple Random Sample
Use random numbers to select items from a list (e.g., VISA cardholders).

Systematic Sample
Select every kth item from a list or sequence (e.g., restaurant customers).

Stratified Sample
Select randomly within defined strata (e.g., by age, occupation, gender).

Cluster Sample
Like stratified sampling except strata are geographical areas (e.g., zip codes).

2-38

Sampling Methods

Non-probability Samples

Judgment Sample
Use expert knowledge to choose “typical” items (e.g., which employees to interview).

Convenience Sample
Use a sample that happens to be available (e.g., ask co-worker opinions at lunch).

Focus Groups
In-depth dialog with a representative panel of individuals (e.g., iPod users).

2-39

Sampling Methods

Simple Random Sample

Every item in the population of N items has the same chance of being chosen in the sample of n items.

We rely on random numbers to select a name.

=RANDBETWEEN(1,48)

2-40

Sampling Methods

Random Number Tables

A table of random digits is used to select random numbers between 1 and N. Each digit 0 through 9 is equally likely to be chosen.

Setting Up a Rule

For example, NilCo wants to award cash prizes to 10 of its 875 loyal customers.

To get 10 three-digit numbers between 001 and 875, we define any consistent rule for moving through the random number table.

2-41

Sampling Methods

Setting Up a Rule

Randomly point at the table to choose a starting point.

Choose the first three digits of the selected five-digit block, move to the right one column, down one row, and repeat.

When we reach the end of a line, wrap around to the other side of the table and continue.

Discard any number greater than 875 and any duplicates.
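The rule above can be sketched in Python using rows 3 to 12 of the Table of 1,000 Random Digits, starting at row 3, column 2 (the block 43939). The helper name `table_sample`, and wrapping from the last row back to the first, are assumptions; with these ten rows the wrap is never needed because ten valid numbers are found first. Values outside 001 to 875 (including 000) and duplicates are discarded.

```python
# Rows 3-12 of the Table of 1,000 Random Digits (ten five-digit blocks per row).
ROWS = [
    "45056 43939 31188 43272 11332 99494 19348 97076 95605 28010".split(),
    "10244 19093 51678 63463 85568 70034 82811 23261 48794 63984".split(),
    "12940 84434 50087 20189 58009 66972 05764 10421 36875 64964".split(),
    "84438 45828 40353 28925 11911 53502 24640 96880 93166 68409".split(),
    "98681 67871 71735 64113 90139 33466 65312 90655 75444 30845".split(),
    "43290 96753 18799 49713 39227 15955 46167 63853 03633 19990".split(),
    "96893 85410 88233 22094 30605 79024 01791 38839 85531 94576".split(),
    "75403 41227 00192 16814 47054 16814 81349 92264 01028 29071".split(),
    "78064 92111 51541 76563 69027 67718 06499 71938 17354 12680".split(),
    "26246 71746 94019 93165 96713 03316 75912 86209 12081 57817".split(),
]


def table_sample(rows, start_row, start_col, n, max_value):
    """Take the first three digits of a block, then move one column right
    and one row down, wrapping around at the edges of the table.
    Numbers outside 1..max_value and duplicates are discarded."""
    n_rows, n_cols = len(rows), len(rows[0])
    r, c = start_row, start_col
    chosen = []
    # Guard: stop after rows * cols moves even if fewer than n were found.
    for _ in range(n_rows * n_cols):
        value = int(rows[r][c][:3])
        if 1 <= value <= max_value and value not in chosen:
            chosen.append(value)
            if len(chosen) == n:
                break
        r = (r + 1) % n_rows
        c = (c + 1) % n_cols
    return chosen


sample = table_sample(ROWS, start_row=0, start_col=1, n=10, max_value=875)
print(sample)  # [439, 516, 201, 119, 334, 461, 388, 10, 126, 262]
```

Here 10 stands for customer 010; the move from column 10 of row 11 wraps to column 1 of row 12, as the rule prescribes.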

2-42

Table of 1,000 Random Digits

82134 14458 66716 54269 31928 46241 03052 00260 32367 25783
07139 16829 76768 11913 42434 91961 92934 18229 15595 02566
45056 43939 31188 43272 11332 99494 19348 97076 95605 28010
10244 19093 51678 63463 85568 70034 82811 23261 48794 63984
12940 84434 50087 20189 58009 66972 05764 10421 36875 64964
84438 45828 40353 28925 11911 53502 24640 96880 93166 68409
98681 67871 71735 64113 90139 33466 65312 90655 75444 30845
43290 96753 18799 49713 39227 15955 46167 63853 03633 19990
96893 85410 88233 22094 30605 79024 01791 38839 85531 94576
75403 41227 00192 16814 47054 16814 81349 92264 01028 29071
78064 92111 51541 76563 69027 67718 06499 71938 17354 12680
26246 71746 94019 93165 96713 03316 75912 86209 12081 57817
98766 67312 96358 21351 86448 31828 86113 78868 67243 06763
37895 51055 11929 44443 15995 72935 99631 18190 85877 31309
27988 81163 52212 25102 61798 28670 01358 60354 74015 18556
19216 53008 44498 19262 12196 93947 90162 76337 12646 26838
28078 86729 69438 24235 35208 48957 53529 76297 41741 54735
34455 61363 93711 68038 75960 16327 95716 66964 28634 65015
53510 90412 70438 45932 57815 75144 52472 61817 41562 42084
30658 18894 88208 97867 30737 94985 18235 02178 39728 66398

Start Here: row 3, column 2 (43939)

2-43

Sampling Methods

With or Without Replacement

If we allow duplicates when sampling, then we are sampling with replacement.

Duplicates are unlikely when n is much smaller than N.

If we do not allow duplicates when sampling, then we are sampling without replacement.
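In Python's standard `random` module, `choices` samples with replacement and `sample` samples without replacement. The sketch below uses the NilCo numbers (10 of 875 customers) and a fixed seed chosen arbitrarily for repeatability. The helper `prob_duplicate` is a hypothetical addition that computes the chance of at least one repeat, to show why duplicates are unlikely when n is much smaller than N.

```python
import random

rng = random.Random(2009)    # fixed seed so the run is repeatable
population = range(1, 876)   # customers 001..875

with_repl = rng.choices(population, k=10)    # duplicates possible
without_repl = rng.sample(population, k=10)  # duplicates impossible
print(with_repl)
print(without_repl)


def prob_duplicate(N, n):
    """Chance that n draws with replacement from N items contain a repeat."""
    p_all_distinct = 1.0
    for i in range(n):
        p_all_distinct *= (N - i) / N
    return 1 - p_all_distinct


print(round(prob_duplicate(875, 10), 3))  # roughly 0.05
```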

2-44

Sampling Methods

Computer Methods

Computer random number generators are pseudo-random because even the best algorithms eventually repeat themselves.

Excel - Option A: Enter the Excel function =RANDBETWEEN(1,875) into 10 spreadsheet cells. Press F9 to get a new sample.

Excel - Option B: Enter the function =INT(1+875*RAND()) into 10 spreadsheet cells. Press F9 to get a new sample.

Internet: The web site www.random.org will give you many kinds of excellent random numbers (integers, decimals, etc.).

Minitab: Use Minitab’s Random Data menu with the Integer option.
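For readers working in Python rather than Excel, the two spreadsheet options have close analogues in the standard `random` module. This is a sketch of assumed equivalents, not Excel itself. Like the spreadsheet formulas, both lists are drawn with replacement, so a duplicate is possible.

```python
import random

# Option A: =RANDBETWEEN(1,875) -> random.randint(1, 875), inclusive at both ends.
option_a = [random.randint(1, 875) for _ in range(10)]

# Option B: =INT(1+875*RAND()) -> int(1 + 875 * random.random());
# random.random() lies in [0, 1), so the result is an integer from 1 to 875.
option_b = [int(1 + 875 * random.random()) for _ in range(10)]

print(option_a)
print(option_b)
# Like pressing F9, running the script again gives a new sample. Python's
# random module is itself a pseudo-random generator (Mersenne Twister).
```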

2-45

Sampling Methods

Row – Column Data Arrays

When the data are arranged in a rectangular array, an item can be chosen at random by selecting a row and column.

For example, in the 4 x 3 array, select a random column between 1 and 3 and a random row between 1 and 4.

This way, each item has an equal chance of being selected.

2-46

Sampling Methods

Row – Column Data Arrays

         Column 1                Column 2           Column 3
Row 1    Dillard's               K-Mart             Saks
Row 2    Dollar General          Kohl's             Sears Roebuck
Row 3    Federated Dept Stores   May Dept Stores    Target
Row 4    J. C. Penney            Nordstrom          Wal-Mart Stores

Use the =RANDBETWEEN function to choose row 3 and column 3 (Target).
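A minimal Python sketch of row-column selection on the 4 × 3 array of retailers, with two `random.randint` calls standing in for two RANDBETWEEN calls. The helper names `pick` and `random_item` are hypothetical. Because every row has the same number of columns, each store has the same 1/12 chance of being chosen.

```python
import random

# The 4 x 3 array of retailers.
STORES = [
    ["Dillard's", "K-Mart", "Saks"],
    ["Dollar General", "Kohl's", "Sears Roebuck"],
    ["Federated Dept Stores", "May Dept Stores", "Target"],
    ["J. C. Penney", "Nordstrom", "Wal-Mart Stores"],
]


def pick(array, row, col):
    """Return the item at 1-based (row, col), matching the slide's numbering."""
    return array[row - 1][col - 1]


def random_item(array, rng=random):
    """Choose a random row and a random column, like two RANDBETWEEN calls."""
    row = rng.randint(1, len(array))
    col = rng.randint(1, len(array[0]))
    return pick(array, row, col)


print(pick(STORES, 3, 3))   # Target
print(random_item(STORES))
```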

2-47

Sampling Methods

Randomizing a List

In Excel, use the function =RAND() beside each row to create a column of random numbers between 0 and 1. Copy and paste these numbers into the same column using “Paste Special | Values” (to paste only the values and not the formulas).

Sort the spreadsheet on the random number column.

2-48

Sampling Methods

Randomizing a List (Figure 2.6)

• The first n items are a random sample of the entire list (they are as likely as any others).
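The spreadsheet procedure (attach a random number to each row, freeze the values, sort, take the first n rows) can be sketched in Python as follows. The customer list is hypothetical. Sorting only on the key avoids comparing the items themselves; `random.shuffle` would achieve the same reordering in one call.

```python
import random


def randomize_list(items, rng=random):
    """Attach a RAND()-style key in [0, 1) to each item, then sort on the key."""
    keyed = [(rng.random(), item) for item in items]  # like "Paste Special | Values"
    keyed.sort(key=lambda pair: pair[0])              # sort on the random column
    return [item for _, item in keyed]


customers = [f"Customer {i:03d}" for i in range(1, 876)]  # hypothetical list
shuffled = randomize_list(customers)
sample = shuffled[:10]  # the first n items form a random sample
print(sample)
```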

2-49

Sampling Methods

Systematic Sampling

• Sample by choosing every kth item from a list, starting from a randomly chosen entry on the list.

For example, starting at item 2, we sample every k = 4 items to obtain a sample of n = 20 items from a list of N = 78 items.

Note that N/n = 78/20 ≈ 4.
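With items numbered 1 to N in a Python list, systematic sampling is a single slice with step k. The sketch reproduces the example (start at item 2, k = 4, N = 78), then draws a random start among the first k items. The helper name `systematic_sample` is hypothetical. Depending on the start, the sample here has 19 or 20 items, because 78 is not a multiple of 4.

```python
import random


def systematic_sample(items, k, start):
    """Return every kth item, beginning at 1-based position `start`."""
    return items[start - 1::k]


population = list(range(1, 79))  # items numbered 1..78
sample = systematic_sample(population, k=4, start=2)
print(len(sample))             # 20
print(sample[:5], sample[-1])  # [2, 6, 10, 14, 18] 78

# In practice the starting item is chosen at random among the first k items.
random_start = random.randint(1, 4)
print(systematic_sample(population, k=4, start=random_start))
```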

2-50

Sampling Methods

Systematic Sampling

A systematic sample of n items from a population of N items requires that periodicity k be approximately N/n.

Systematic sampling should yield acceptable results unless patterns in the population happen to recur at periodicity k.

Systematic sampling can be used with unlistable or infinite populations.

• Systematic samples are well-suited to linearly organized physical populations.

2-51

Page 52: Data Collection Definitions Level of Measurement Time Series and Cross- sectional Data Sampling Concepts Sampling Methods Data Sources Survey Research

Sampling Methods

Systematic Sampling

For example, out of 501 companies, we want to obtain a sample of 25. What should the periodicity k be?

$k = N/n = 501/25 \approx 20$.

So, we should choose every 20th company from a random starting point.
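The arithmetic can be checked with a short sketch. Rounding N/n to the nearest whole number is an assumed convention (the slide just says k is approximately N/n), and both helper names are hypothetical. The check also shows a side effect: since 501 = 25 × 20 + 1, a random start at position 1 picks positions 1, 21, ..., 501, which is 26 companies, while every other start picks 25.

```python
def periodicity(N, n):
    """Periodicity k for a systematic sample: N/n rounded to a whole number."""
    return round(N / n)

def systematic_size(N, k, start):
    """Number of items picked when taking every kth item of N, from 1-based position `start`."""
    return (N - start) // k + 1

N, n = 501, 25
k = periodicity(N, n)                     # 501/25 = 20.04, so k = 20
sizes = {start: systematic_size(N, k, start) for start in range(1, k + 1)}
print(k, sizes[1], sizes[2])              # 20 26 25
```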


Sampling Methods

Stratified Sampling

Utilizes prior information about the population.

Applicable when the population can be divided into relatively homogeneous subgroups of known size (strata).

A simple random sample of the desired size is taken within each stratum.

For example, from a population containing 55% males and 45% females, randomly sample 110 males and 90 females (n = 200).
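Proportional allocation (each stratum's sample size proportional to its share of the population, $N_j/N$) followed by a simple random sample within each stratum can be sketched as below. The stratum sizes 550 and 450 are hypothetical, chosen only to reproduce the 55%/45% split. The largest-remainder step is an assumed convention for making rounded stratum sample sizes add up to n; it is not part of the slide.

```python
import random

def proportional_allocation(stratum_sizes, n):
    """Split a total sample size n across strata in proportion to N_j / N."""
    N = sum(stratum_sizes.values())
    alloc = {s: n * Nj // N for s, Nj in stratum_sizes.items()}      # integer parts
    remainder = {s: n * Nj % N for s, Nj in stratum_sizes.items()}   # leftovers, scaled by N
    shortfall = n - sum(alloc.values())
    # Give any leftover units to the strata with the largest remainders.
    for s in sorted(remainder, key=remainder.get, reverse=True)[:shortfall]:
        alloc[s] += 1
    return alloc

def stratified_sample(strata, n, seed=None):
    """Simple random sample within each stratum, using proportional allocation."""
    rng = random.Random(seed)
    alloc = proportional_allocation({s: len(m) for s, m in strata.items()}, n)
    return {s: rng.sample(members, alloc[s]) for s, members in strata.items()}

strata = {
    "male": [f"M{i}" for i in range(550)],     # hypothetical: 55% of N = 1000
    "female": [f"F{i}" for i in range(450)],   # hypothetical: 45% of N = 1000
}
sample = stratified_sample(strata, n=200, seed=42)
print({s: len(v) for s, v in sample.items()})  # {'male': 110, 'female': 90}
```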


Sampling Methods

Stratified Sampling

Or, take a random sample of the entire population and then combine individual strata estimates using appropriate weights.

For a population with L strata, the population size N is the sum of the stratum sizes: $N = N_1 + N_2 + \cdots + N_L$

The weight assigned to stratum j is its share of the population: $w_j = N_j / N$

For example, take a random sample of n = 200 and then weight the responses for males by $w_M = .55$ and for females by $w_F = .45$.
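Combining stratum estimates with the weights $w_j = N_j / N$ can be sketched as below. The responses and stratum sizes are invented for illustration; here the sample happens to contain 60% males, so the weighted estimate pulls the result back toward the 55%/45% population mix. The helper name `stratified_estimate` is hypothetical.

```python
from statistics import mean

def stratified_estimate(responses, stratum_sizes):
    """Weighted mean: sum over strata of w_j * (stratum sample mean), with w_j = N_j / N."""
    N = sum(stratum_sizes.values())
    return sum(stratum_sizes[s] / N * mean(responses[s]) for s in responses)

# Hypothetical survey responses, grouped by stratum after sampling.
responses = {"male": [3, 5, 4], "female": [1, 3]}
sizes = {"male": 550, "female": 450}      # hypothetical, so w_M = .55 and w_F = .45

weighted = stratified_estimate(responses, sizes)          # .55 * 4 + .45 * 2
unweighted = mean(responses["male"] + responses["female"])
print(round(weighted, 2), unweighted)     # 3.1 3.2
```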


Sampling Methods

Cluster Sample

Strata consist of geographical regions.

One-stage cluster sampling: the sample consists of all elements in each of k randomly chosen subregions (clusters).

Two-stage cluster sampling: first choose k subregions (clusters), then choose a random sample of elements within each cluster.
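Both variants can be sketched in a few lines. The frame of 10 regions with 25 households each is hypothetical, as are the function names. The two-stage call draws 4 elements from each of 3 clusters, matching the shape of the Figure 2.7 example.

```python
import random

def one_stage_cluster_sample(clusters, k, rng):
    """Pick k clusters at random and keep every element in them."""
    chosen = rng.sample(sorted(clusters), k)
    return {c: list(clusters[c]) for c in chosen}

def two_stage_cluster_sample(clusters, k, m, rng):
    """Pick k clusters at random, then a simple random sample of m elements within each."""
    chosen = rng.sample(sorted(clusters), k)
    return {c: rng.sample(clusters[c], m) for c in chosen}

# Hypothetical frame: 10 regions (clusters) with 25 households each.
clusters = {f"region{r}": [f"r{r}-h{h}" for h in range(25)] for r in range(10)}
rng = random.Random(7)

one_stage = one_stage_cluster_sample(clusters, k=3, rng=rng)
two_stage = two_stage_cluster_sample(clusters, k=3, m=4, rng=rng)
print(sum(len(v) for v in one_stage.values()))   # 75 (all 25 from each of 3 clusters)
print(sum(len(v) for v in two_stage.values()))   # 12 (4 from each of 3 clusters)
```

Only the chosen clusters need a list of their elements, which is why cluster sampling helps when a full population frame is unavailable.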


Sampling Methods

Cluster Sample (Figure 2.7)

• Here is an example of 4 elements sampled from each of 3 randomly chosen clusters (two-stage cluster sampling).


Sampling Methods

Cluster Sample

Cluster sampling is useful when:

- Population frame and stratum characteristics are not readily available
- It is too expensive to obtain a simple or stratified sample
- The cost of obtaining data increases sharply with distance
- Some loss of reliability is acceptable


Sampling Methods

Judgment Sample

A non-probability sampling method that relies on the expertise of the sampler to choose items that are representative of the population.

Can be affected by subconscious bias (i.e., non-randomness in the choice).

Quota sampling is a special kind of judgment sampling, in which the interviewer chooses a certain number of people in each category.


Sampling Methods

Convenience Sample

Take advantage of whatever sample is available at that moment. A quick way to sample.

Focus Groups

A panel of individuals chosen to be representative of a wider population, formed for open-ended discussion and idea gathering.


Data Sources

Useful Data Sources (Table 2.11)

Type of Data         Examples
U.S. general data    Statistical Abstract of the U.S.
U.S. economic data   Economic Report of the President
Almanacs             World Almanac, Time Almanac
Periodicals          Economist, Business Week, Fortune
Indexes              New York Times, Wall Street Journal
Databases            CompuStat, Citibase, U.S. Census
World data           CIA World Factbook
Web                  Google, Yahoo, MSN


Survey Research

Basic Steps of Survey Research

• Step 1: State the goals of the research.

• Step 2: Develop the budget (time, money, staff).

• Step 3: Create a research design (target population, frame, sample size).

• Step 4: Choose a survey type and method of administration.


Survey Research

Basic Steps of Survey Research

• Step 5: Design a data collection instrument (questionnaire).

• Step 6: Pretest the survey instrument and revise as needed.

• Step 7: Administer the survey (follow up if needed).

• Step 8: Code the data and analyze it.


Survey Research

Survey Types (Table 2.12)

Type of Survey: Characteristics

Mail: You need a well-targeted and current mailing list (people move a lot). Low response rates are typical and non-response bias is expected (non-respondents differ from those who respond). Zip code lists (often costly) are an attractive option to define strata of similar income, education, and attitudes. To encourage participation, a cover letter should clearly explain the uses to which the data will be put. Plan for follow-up mailings.


Survey Research

Survey Types

Type of Survey: Characteristics

Telephone: Random dialing yields very low response and is poorly targeted. Purchased phone lists help reach the target population, though a low response rate is still typical (disconnected phones, caller screening, answering machines, work hours, no-call lists). Other sources of non-response bias include the growing number of non-English speakers and distrust caused by scams and spam.


Survey Research

Survey Types

Type of Survey: Characteristics

Interviews: Interviewing is expensive and time-consuming, yet trading a smaller sample size for high-quality results may still be worth it. Interviews must be handled carefully, so interviewers must be well trained, which is an added cost. But you can obtain information on complex or sensitive topics (e.g., gender discrimination in companies, birth control practices, diet and exercise habits).


Survey Research

Survey Types

Type of Survey: Characteristics

Web: Web surveys are growing in popularity, but are subject to non-response bias because those who participate may differ from those who feel too busy, don’t own computers, or distrust your motives (scams and spam are again to blame). This type of survey works best when targeted to a well-defined interest group on a question of self-interest (e.g., views of CPAs on new proposed accounting rules, frequent flyer views on airline security).


Survey Research

Survey Types

Type of Survey: Characteristics

Direct Observation: This can be done in a controlled setting (e.g., a psychology lab) but requires informed consent, which can change behavior. Unobtrusive observation is possible in some non-lab settings (e.g., what percentage of airline passengers carry on more than two bags, what percentage of SUVs carry no passengers, what percentage of drivers wear seat belts).


Survey Research

Survey Guidelines (Table 2.13)

Planning: What is the purpose of the survey? Consider staff expertise, needed skills, degree of precision, budget.

Design: Invest time and money in designing the survey. Use books and references to avoid unnecessary errors.

Quality: Take care in preparing a quality survey so that people will take you seriously.


Survey Research

Survey Guidelines

Pilot Test: Pretest on friends or co-workers to make sure the survey is clear.

Buy-in: Improve response rates by stating the purpose of the survey, offering a token of appreciation, or paving the way with endorsements.

Expertise: Work with a consultant early on.


Survey Research

Questionnaire Design

• Instruct on how to submit the completed survey.

• Use a lot of white space in layout.

• Begin with short, clear instructions.

• State the survey purpose.

• Assure anonymity.


Survey Research

Questionnaire Design

• Break the survey into naturally occurring sections.

• Let respondents bypass sections that are not applicable (e.g., “If you answered no to Question 7, skip directly to Question 15”).

• Pretest and revise as needed.

• Keep as short as possible.


Survey Research

Questionnaire Design (Table 2.14)

Type of Question: Example

Open-ended question: Briefly describe your job goals.

Fill-in-the-blank: How many times did you attend formal religious services during the last year? ________ times

Check boxes: Which of these statistics packages have you ever used?
[ ] SAS  [ ] Visual Statistics  [ ] SPSS  [ ] MegaStat  [ ] Systat  [ ] Minitab


Survey Research

Questionnaire Design

Type of Question: Example

Ranked choices: “Please evaluate your dining experience”

               Excellent   Good   Fair   Poor
Food
Service
Ambiance
Cleanliness
Overall


Survey Research

Questionnaire Design

Type of Question: Example

Pictograms: “What do you think of the President’s economic policies?” (circle one)

Likert scale: Statistics is a difficult subject.
[ ] Strongly Agree  [ ] Slightly Agree  [ ] Neither Agree Nor Disagree  [ ] Slightly Disagree  [ ] Strongly Disagree


Survey Research

Question Wording

• The way a question is asked has a profound influence on the response. For example,

1. Shall state taxes be cut?

2. Shall state taxes be cut, if it means reducing highway maintenance?

3. Shall state taxes be cut, if it means firing teachers and police?


Survey Research

Question Wording

• Make sure you have covered all the possibilities. For example,

Are you married? [ ] Yes  [ ] No

• Overlapping classes or unclear categories are a problem. For example,

How old is your father? [ ] 35 – 45  [ ] 45 – 55  [ ] 55 – 65  [ ] 65 or older
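One way to catch overlapping or missing age classes before a survey goes out is to write the classes as whole-number ranges (both ends inclusive) and test them. This sketch assumes a plausible age range of 15 to 120 for a respondent's father and caps "65 or older" at 120; both numbers are illustrative assumptions, and `check_age_classes` is a hypothetical helper. It flags the ages 45, 55, and 65, which each fit two boxes, and the absence of any box for fathers under 35.

```python
def check_age_classes(classes, youngest, oldest):
    """Check integer age classes (low, high), both ends inclusive, for overlaps and gaps.
    Returns a list of ("overlap" | "gap", first_age, last_age) problems."""
    problems = []
    classes = sorted(classes)
    if classes[0][0] > youngest:
        problems.append(("gap", youngest, classes[0][0] - 1))
    for (a_lo, a_hi), (b_lo, b_hi) in zip(classes, classes[1:]):
        if b_lo <= a_hi:                       # some ages belong to both classes
            problems.append(("overlap", b_lo, a_hi))
        elif b_lo > a_hi + 1:                  # some ages belong to neither class
            problems.append(("gap", a_hi + 1, b_lo - 1))
    if classes[-1][1] < oldest:
        problems.append(("gap", classes[-1][1] + 1, oldest))
    return problems

# Brackets from the slide; "65 or older" is given an assumed cap of 120.
slide_classes = [(35, 45), (45, 55), (55, 65), (65, 120)]
print(check_age_classes(slide_classes, youngest=15, oldest=120))

# A repaired version: non-overlapping, with a class for younger fathers.
fixed_classes = [(15, 34), (35, 44), (45, 54), (55, 64), (65, 120)]
print(check_age_classes(fixed_classes, youngest=15, oldest=120))   # []
```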


Survey Research

Coding and Data Screening

• Responses are usually coded numerically (e.g., 1 = male, 2 = female).

• Missing values are typically denoted by special characters (e.g., blank, “.” or “*”).

• Discard questionnaires that are flawed or missing many responses.

• Watch for multiple responses, outrageous or inconsistent replies, or range answers.

• Follow up if necessary and always document your data-coding decisions.
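Numeric coding and screening can be sketched as below. The 1 = male, 2 = female scheme and the missing-value markers come from the slide; the 5-point Likert coding, the sample questionnaires, and the "at most one missing answer" cutoff are hypothetical choices. Treating an unrecognized answer as missing is also an assumption, and exactly the kind of decision the slide says to document.

```python
MISSING = {"", ".", "*"}   # missing-value markers mentioned on the slide

# Coding schemes: 1 = male, 2 = female (from the slide), plus a hypothetical
# 5-point Likert item coded 1 = strongly agree ... 5 = strongly disagree.
SEX = {"male": 1, "female": 2}
LIKERT = {
    "strongly agree": 1,
    "slightly agree": 2,
    "neither agree nor disagree": 3,
    "slightly disagree": 4,
    "strongly disagree": 5,
}
SCHEMES = {"sex": SEX, "q1": LIKERT, "q2": LIKERT}

def code_response(raw, scheme):
    """Numeric code for one raw answer; None if missing or not in the scheme."""
    value = raw.strip().lower()
    if value in MISSING:
        return None
    # Unrecognized answers are also set to None here; record such decisions.
    return scheme.get(value)

def screen(rows, max_missing):
    """Keep questionnaires with at most max_missing missing (None) answers."""
    return [row for row in rows if sum(v is None for v in row.values()) <= max_missing]

raw = [  # hypothetical questionnaires
    {"sex": "Female", "q1": "Slightly agree", "q2": "Strongly disagree"},
    {"sex": "male", "q1": ".", "q2": "*"},
    {"sex": "Male", "q1": "Neither agree nor disagree", "q2": ""},
]
coded = [{q: code_response(r[q], SCHEMES[q]) for q in SCHEMES} for r in raw]
kept = screen(coded, max_missing=1)
print(coded)
print(len(kept), "of", len(coded), "questionnaires kept")   # 2 of 3
```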


Survey Research

Data File Format

• Enter data into a spreadsheet or database as a “flat file” (an n subjects × m variables matrix).
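A flat file can be written with Python's standard `csv` module, as in this sketch: one header row naming the m variables, then one row per subject. CSV is an assumed choice of format (any rectangular layout works), and the variable names and values are hypothetical.

```python
import csv
import io

# Hypothetical coded survey data: one row per subject, one column per variable.
variables = ["id", "sex", "q1", "q2"]
rows = [
    [1, 2, 2, 5],
    [2, 1, 3, ""],   # "" leaves the cell blank, marking a missing answer
]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(variables)     # header row: the m variable names
writer.writerows(rows)         # n data rows
flat_file = buffer.getvalue()
print(flat_file)
```

Replacing `io.StringIO()` with a file opened via `open("survey.csv", "w", newline="")` (a hypothetical file name) would write the same content to disk.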


Survey Research

Advice on Copying Data

• Using commas (,), dollar signs ($), or percent signs (%) as part of the values may result in your data being treated as text values.

• A numerical variable may contain only the digits 0–9, a decimal point, and a minus sign.

• To avoid round-off errors, format the data column as plain numbers with the desired number of decimal places before you copy the data to a statistical package.
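The rule that a numerical value may contain only digits, a decimal point, and a minus sign can be enforced with a small cleaning step before analysis. This is a sketch under two assumptions: commas are thousands separators (not decimal commas, as in some locales), and a value like "12.5%" is kept in percent units rather than divided by 100. The helper `to_number` is hypothetical.

```python
import re

# Digits with an optional decimal point and an optional leading minus sign.
NUMBER = re.compile(r"-?(\d+(\.\d*)?|\.\d+)")

def to_number(text):
    """Strip $, commas and % from a copied value and convert it to float.
    Raises ValueError if anything else non-numeric remains."""
    cleaned = text.strip().replace("$", "").replace(",", "").replace("%", "")
    if not NUMBER.fullmatch(cleaned):
        raise ValueError(f"not a plain number: {text!r}")
    return float(cleaned)

print(to_number("$1,234.50"))   # 1234.5
print(to_number("12.5%"))       # 12.5 (still in percent units, not 0.125)
```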


Applied Statistics in Business and Economics

End of Chapter 2
