2-1. Data Collection: Data Vocabulary, Level of Measurement, Time Series and Cross-sectional Data


Post on 04-Jan-2016


  • Data Collection (Chapter 2)

    Data Vocabulary; Level of Measurement; Time Series and Cross-sectional Data; Sampling Concepts; Sampling Methods; Data Sources; Survey Research

    McGraw-Hill/Irwin. © 2008 The McGraw-Hill Companies, Inc. All rights reserved.

  • Data Vocabulary

    Data is the plural form of the Latin datum (a given fact). Important decisions may depend on data.

  • Data Vocabulary Subjects, Variables, Data Sets

    We will refer to data as plural and to a data set as a particular collection of data as a whole.

    Observation: each data value.

    Subject (or individual): an item for study (e.g., an employee in your company).

    Variable: a characteristic of the subject or individual (e.g., an employee's income).

  • Data Vocabulary Subjects, Variables, Data Sets

    Three types of data sets:

  • Data Vocabulary Subjects, Variables, Data Sets

    Consider the multivariate data set with 5 variables and 8 subjects: 5 x 8 = 40 observations.

  • Data Vocabulary Data Types

    A data set may have a mixture of data types.

  • Data Vocabulary Attribute Data

    Also called categorical, nominal or qualitative data. Values are described by words rather than numbers.

    For example:
    - Automobile style (e.g., X = full, midsize, compact, subcompact).
    - Mutual fund (e.g., X = load, no-load).

  • Data Vocabulary Data Coding

    Coding refers to using numbers to represent categories to facilitate statistical analysis. Coding an attribute as a number does not make the data numerical. For example, 1 = Bachelor's, 2 = Master's, 3 = Doctorate.

    Rankings may exist, for example, 1 = Liberal, 2 = Moderate, 3 = Conservative.
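    As a minimal sketch of coding, the degree example can be written as a lookup table in Python. The names `DEGREE_CODES`, `encode`, and `decode` and the sample responses are illustrative assumptions, not part of the slides. The codes are labels only, so arithmetic on them (such as an average degree) would not be meaningful.

```python
# Hypothetical code book: the numbers are labels, not measurements.
DEGREE_CODES = {"Bachelor's": 1, "Master's": 2, "Doctorate": 3}


def encode(values, code_book):
    """Replace each category label with its numeric code."""
    return [code_book[v] for v in values]


def decode(codes, code_book):
    """Recover the category labels from their numeric codes."""
    labels = {code: label for label, code in code_book.items()}
    return [labels[c] for c in codes]


# Made-up responses for illustration.
degrees = ["Master's", "Bachelor's", "Doctorate", "Bachelor's"]
coded = encode(degrees, DEGREE_CODES)
print(coded)                        # [2, 1, 3, 1]
print(decode(coded, DEGREE_CODES))  # the original labels again
```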

  • Data Vocabulary Binary Data

    A binary variable has only two values, 1 = presence, 0 = absence of a characteristic of interest (codes themselves are arbitrary).

    For example:
    - 1 = employed, 0 = not employed
    - 1 = married, 0 = not married
    - 1 = male, 0 = female
    - 1 = female, 0 = male

    The coding itself has no numerical value, so binary variables are attribute data.

  • Data Vocabulary Numerical Data

    Numerical or quantitative data arise from counting or some kind of mathematical operation.

    For example:
    - Number of auto insurance claims filed in March (e.g., X = 114 claims).
    - Ratio of profit to sales for last quarter (e.g., X = 0.0447).

    Can be broken down into two types: discrete or continuous data.

  • Data Vocabulary Discrete Data

    A numerical variable with a countable number of values that can be represented by an integer (no fractional values).

    For example:
    - Number of Medicaid patients (e.g., X = 2).
    - Number of takeoffs at O'Hare (e.g., X = 37).

  • Data Vocabulary Continuous Data

    A numerical variable that can have any value within an interval (e.g., length, weight, time, sales, price/earnings ratios). Any continuous interval contains infinitely many possible values (e.g., 426 < X < 428).

  • Data Vocabulary Rounding

    Ambiguity is introduced when continuous data are rounded to whole numbers. The underlying measurement scale is continuous. Precision of measurement depends on the instrument.

    Sometimes discrete data are treated as continuous when the range is very large (e.g., SAT scores) and small differences (e.g., 604 or 605) aren't of much importance.

  • Level of Measurement Likert Scales

    A special case of interval data frequently used in survey research. The coarseness of a Likert scale refers to the number of scale points (typically 5 or 7).

  • Level of Measurement Likert Scales

    A neutral midpoint ("Neither Agree Nor Disagree") is available when an odd number of scale points is used. It can be omitted, leaving an even number of points, to force the respondent to lean one way or the other.

    Likert data are coded numerically (e.g., 1 to 5) but any equally spaced values will work.

  • Level of Measurement Likert Scales

    Careful choice of verbal anchors results in measurable intervals (e.g., the distance from 1 to 2 is the same as the interval, say, from 3 to 4).

    Ratios are not meaningful (e.g., here 4 is not twice 2).

    Many statistical calculations can be performed (e.g., averages, correlations, etc.).
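    A short Python sketch of an average computed from Likert codes. The five anchor labels other than the neutral midpoint and the responses are made up for illustration. It also shows that a different set of equally spaced codes (here -2 to 2) only shifts the average by a constant.

```python
from statistics import mean

# Hypothetical 5-point scale coded 1 to 5.
SCALE = {
    "Strongly Disagree": 1,
    "Disagree": 2,
    "Neither Agree Nor Disagree": 3,
    "Agree": 4,
    "Strongly Agree": 5,
}

# Made-up survey responses.
responses = ["Agree", "Strongly Agree", "Neither Agree Nor Disagree",
             "Agree", "Disagree"]

scores = [SCALE[r] for r in responses]   # [4, 5, 3, 4, 2]
average = mean(scores)                   # 18 / 5 = 3.6

# Recode with other equally spaced values: -2, -1, 0, 1, 2.
shifted = [s - 3 for s in scores]
shifted_average = mean(shifted)          # 3 / 5 = 0.6, i.e. 3.6 - 3

print(average, shifted_average)
```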

  • Level of Measurement Likert Scales

    More variants of Likert scales:

    How would you rate your marketing instructor? (check one)

    [ ] Terrible   [ ] Poor   [ ] Adequate   [ ] Good   [ ] Excellent

  • Time Series and Cross-sectional Data Time Series Data

    Each observation in the sample represents a different equally spaced point in time (e.g., years, months, days).

    Periodicity may be annual, quarterly, monthly, weekly, daily, hourly, etc.

  • Sampling Concepts Sample or Census?

    A sample involves looking only at some items selected from the population.

    A census is an examination of all items in a defined population. Obstacles to a complete census include mobility, illegal immigrants, budget constraints, and incomplete responses or nonresponses.

  • Sampling Concepts


  • Sampling Concepts Parameters and Statistics

    Statistics are computed from a sample of n items, chosen from a population of N items.

    Statistics can be used as estimates of parameters found in the population. Symbols are used to represent population parameters and sample statistics.


  • Sampling Concepts Parameters and Statistics

    The population must be carefully specified and the sample must be drawn scientifically so that the sample is representative.

    The target population is the population we are interested in (e.g., U.S. gasoline prices). The sampling frame is the group from which we take the sample (e.g., 115,000 stations). The frame should not differ from the target population.

  • Sampling Concepts Finite or Infinite?

    A population is finite if it has a definite size, even if its size is unknown.

    A population is infinite if it is of arbitrarily large size.

    Rule of thumb: a population may be treated as infinite when N is at least 20 times n (i.e., when N/n ≥ 20).
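    The rule of thumb can be sketched as a one-line check in Python. The function name `treat_as_infinite` and the `threshold` parameter are illustrative assumptions. The comparison follows the wording "N is at least 20 times n". The three calls reuse population and sample sizes that appear elsewhere in these slides.

```python
def treat_as_infinite(N, n, threshold=20):
    """Return True when N is at least `threshold` times n."""
    return N >= threshold * n


print(treat_as_infinite(N=875, n=10))   # True:  875 / 10 = 87.5
print(treat_as_infinite(N=501, n=25))   # True:  501 / 25 = 20.04
print(treat_as_infinite(N=78, n=20))    # False: 78 / 20 = 3.9
```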

  • Sampling Methods


  • Sampling Methods Simple Random Sample

    Every item in the population of N items has the same chance of being chosen in the sample of n items. We rely on random numbers to select a name.
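    A minimal sketch of a simple random sample using Python's standard `random` module in place of a printed table. The list of names, the sizes N = 500 and n = 10, and the seed are assumptions chosen for illustration.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct items; every item has the same chance of selection."""
    rng = random.Random(seed)   # a fixed seed makes the draw repeatable
    return rng.sample(population, n)


# Hypothetical list of N = 500 names.
names = [f"Customer {i}" for i in range(1, 501)]
chosen = simple_random_sample(names, n=10, seed=2008)
print(chosen)
```

    `random.sample` draws without replacement, so no name can appear twice in `chosen`.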

  • Sampling Methods Random Number Tables

    A table of random digits is used to select random numbers between 1 and N. Each digit 0 through 9 is equally likely to be chosen.

    Setting Up a Rule: for example, NilCo wants to award cash prizes to 10 of its 875 loyal customers. To get 10 three-digit numbers between 001 and 875, we define any consistent rule for moving through the random number table.

  • Sampling Methods Setting Up a Rule

    Randomly point at the table to choose a starting point. Choose the first three digits of the selected five-digit block, move to the right one column, down one row, and repeat. When we reach the end of a line, wrap around to the other side of the table and continue. Discard any number greater than 875 and any duplicates.

  • Table of 1,000 Random Digits

    82134 14458 66716 54269 31928 46241 03052 00260 32367 25783
    07139 16829 76768 11913 42434 91961 92934 18229 15595 02566
    45056 43939 31188 43272 11332 99494 19348 97076 95605 28010
    10244 19093 51678 63463 85568 70034 82811 23261 48794 63984
    12940 84434 50087 20189 58009 66972 05764 10421 36875 64964

    84438 45828 40353 28925 11911 53502 24640 96880 93166 68409
    98681 67871 71735 64113 90139 33466 65312 90655 75444 30845
    43290 96753 18799 49713 39227 15955 46167 63853 03633 19990
    96893 85410 88233 22094 30605 79024 01791 38839 85531 94576
    75403 41227 00192 16814 47054 16814 81349 92264 01028 29071

    78064 92111 51541 76563 69027 67718 06499 71938 17354 12680
    26246 71746 94019 93165 96713 03316 75912 86209 12081 57817
    98766 67312 96358 21351 86448 31828 86113 78868 67243 06763
    37895 51055 11929 44443 15995 72935 99631 18190 85877 31309
    27988 81163 52212 25102 61798 28670 01358 60354 74015 18556

    19216 53008 44498 19262 12196 93947 90162 76337 12646 26838
    28078 86729 69438 24235 35208 48957 53529 76297 41741 54735
    34455 61363 93711 68038 75960 16327 95716 66964 28634 65015
    53510 90412 70438 45932 57815 75144 52472 61817 41562 42084
    30658 18894 88208 97867 30737 94985 18235 02178 39728 66398
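    The rule for moving through the table can be sketched in Python. This is an illustration under stated assumptions, not the slides' own procedure: the first 250 digits of the table are read as five rows of ten five-digit blocks, the walk starts at the top-left block instead of a randomly chosen one so the result is reproducible, it wraps at the bottom of the table as well as at the end of a line, and it stops when it returns to a block it has already used. The name `pick_from_table` is hypothetical.

```python
# First 250 digits of the table, assumed to form five rows of ten blocks.
TABLE = [row.split() for row in (
    "82134 14458 66716 54269 31928 46241 03052 00260 32367 25783",
    "07139 16829 76768 11913 42434 91961 92934 18229 15595 02566",
    "45056 43939 31188 43272 11332 99494 19348 97076 95605 28010",
    "10244 19093 51678 63463 85568 70034 82811 23261 48794 63984",
    "12940 84434 50087 20189 58009 66972 05764 10421 36875 64964",
)]


def pick_from_table(table, n, upper, start_row=0, start_col=0):
    """Collect up to n distinct three-digit numbers in 1..upper."""
    rows, cols = len(table), len(table[0])
    chosen, visited = [], set()
    r, c = start_row, start_col
    while len(chosen) < n and (r, c) not in visited:
        visited.add((r, c))
        number = int(table[r][c][:3])   # first three digits of the block
        # Discard numbers outside 001..upper and any duplicates.
        if 1 <= number <= upper and number not in chosen:
            chosen.append(number)
        r = (r + 1) % rows              # down one row (wraps at the bottom)
        c = (c + 1) % cols              # right one column (wraps at line end)
    return chosen


print(pick_from_table(TABLE, n=5, upper=875))   # [821, 168, 311, 634, 580]
```

    On this excerpt the diagonal walk visits only ten blocks, and two of them (929 and 970) exceed 875, so it cannot supply all 10 winners. A larger table, or a different rule, would be needed for that.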

  • Sampling Methods With or Without Replacement

    If we allow duplicates when sampling, then we are sampling with replacement. Duplicates are unlikely when n is much smaller than N.

    If we do not allow duplicates when sampling, then we are sampling without replacement.
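    As a small illustration in Python, the standard `random` module offers both modes: `choices` samples with replacement and `sample` samples without replacement. The population of 20 items, the sample size, and the seed are assumptions for the sketch.

```python
import random

rng = random.Random(0)             # seeded so the run is repeatable
population = list(range(1, 21))    # hypothetical items numbered 1..20

# With replacement: the same item may be drawn more than once.
with_replacement = rng.choices(population, k=10)

# Without replacement: every drawn item is distinct.
without_replacement = rng.sample(population, k=10)

print(with_replacement)
print(without_replacement)
```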

  • Sampling Methods Systematic Sampling

    Sample by choosing every kth item from a list, starting from a randomly chosen entry on the list.

    For example, starting at item 2, we sample every k = 4 items to obtain a sample of n = 20 items from a list of N = 78 items. Note that N/n = 78/20 ≈ 4.

  • Sampling Methods Systematic Sampling

    A systematic sample of n items from a population of N items requires that periodicity k be approximately N/n. Systematic sampling should yield acceptable results unless patterns in the population happen to recur at periodicity k. Can be used with unlistable or infinite populations. Systematic samples are well-suited to linearly organized physical populations.

  • Sampling Methods Systematic Sampling

    For example, out of 501 companies, we want to obtain a sample of 25. What should the periodicity k be? k = N/n = 501/25 ≈ 20. So, we should choose every 20th company from a random starting point.
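    A minimal Python sketch of systematic sampling that reproduces both examples (N = 78, n = 20 and N = 501, n = 25). The function name `systematic_sample` is hypothetical, and restricting the random start to the first k entries of the list is an assumption of this sketch.

```python
import random


def systematic_sample(items, n, start=None, seed=None):
    """Take every kth item, where k is N/n rounded to a whole number."""
    k = max(1, round(len(items) / n))        # periodicity k is about N/n
    if start is None:
        # Assumption: pick the random start among the first k entries.
        start = random.Random(seed).randrange(k)
    return items[start::k][:n]


# N = 78, n = 20, starting at item 2 (index 1): k = round(3.9) = 4.
items = list(range(1, 79))
sample = systematic_sample(items, n=20, start=1)
print(sample[:3], sample[-1], len(sample))   # [2, 6, 10] 78 20

# N = 501, n = 25: k = round(20.04) = 20, random starting point.
companies = list(range(1, 502))
chosen = systematic_sample(companies, n=25, seed=1)
print(len(chosen))                           # 25
```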

  • Sampling Methods Stratified Sampling

    Utilizes prior information about the population. Applicable when the population can be divided into relatively homogeneous subgroups of known size (strata). A simple random sample of the desired size is taken within each stratum.

    For example, from a population containing 55% males and 45% females, randomly sample 110 males and 90 females (n = 200), in proportion to the strata sizes.
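    A hedged Python sketch of stratified sampling with proportional allocation, which is one common choice and an assumption here (strata can also be sampled at other rates). With strata of 55% and 45% and n = 200, proportional allocation gives 110 and 90. The population of 1,000 labelled members and the name `stratified_sample` are made up for illustration.

```python
import random


def stratified_sample(strata, n, seed=None):
    """Simple random sample within each stratum, sized in proportion to it.

    strata maps a stratum name to the list of its members. Rounding may make
    the stratum sizes sum to slightly more or less than n in general.
    """
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        n_h = round(n * len(members) / total)   # proportional allocation
        sample[name] = rng.sample(members, n_h)
    return sample


# Hypothetical population: 550 males (55%) and 450 females (45%).
strata = {
    "male": [f"M{i}" for i in range(550)],
    "female": [f"F{i}" for i in range(450)],
}
result = stratified_sample(strata, n=200, seed=7)
print({name: len(members) for name, members in result.items()})
# {'male': 110, 'female': 90}
```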

  • Sampling Methods Strat