multivariate data analysis


Upload: divya-padmanaban

Post on 07-Jul-2015

Category:

Data & Analytics


Page 1: multivariate data analysis

Chapter 1

Introduction

Copyright © 2007 Prentice-Hall, Inc.

Page 2: multivariate data analysis

Chapter 1: Introduction

LEARNING OBJECTIVES: Upon completing this chapter, you should be able to do the following:

1. Explain what multivariate analysis is and when its application is appropriate.
2. Define and discuss the specific techniques included in multivariate analysis.
3. Determine which multivariate technique is appropriate for a specific research problem.
4. Discuss the nature of measurement scales and their relationship to multivariate techniques.
5. Describe the conceptual and statistical issues inherent in multivariate analyses.

Page 3: multivariate data analysis

What is Multivariate Analysis?

• What is it? Multivariate Data Analysis = all statistical methods that simultaneously analyze multiple measurements on each individual or object under investigation.

• Why use it?
o Measurement
o Explanation & Prediction
o Hypothesis Testing

Page 4: multivariate data analysis

Basic Concepts of Multivariate Analysis

• The Variate
• Measurement Scales
o Nonmetric
o Metric
• Multivariate Measurement
• Measurement Error
• Types of Techniques

Page 5: multivariate data analysis

The Variate

• The variate is a linear combination of variables with empirically determined weights.
• Weights are determined to best achieve the objective of the specific multivariate technique.
• Variate equation: $Y' = W_1 X_1 + W_2 X_2 + \dots + W_n X_n$
• Each respondent has a variate value (Y’).
• The Y’ value is a linear combination of the entire set of variables. It is the dependent variable.
• Potential Independent Variables:
o X1 = income
o X2 = education
o X3 = family size
o X4 = ??
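The variate equation can be illustrated with a minimal sketch. The weights and respondent values below are hypothetical, chosen only to show the arithmetic; in practice the weights are estimated by whichever multivariate technique is being applied.

```python
def variate(weights, values):
    """Return Y' = W1*X1 + W2*X2 + ... + Wn*Xn for one respondent."""
    if len(weights) != len(values):
        raise ValueError("need exactly one weight per variable")
    return sum(w * x for w, x in zip(weights, values))


# Hypothetical weights for income (in $1000s), education (years), family size.
weights = [0.5, 2.0, -1.0]
respondent = [40.0, 16.0, 3.0]

y_prime = variate(weights, respondent)
print(y_prime)  # 0.5*40 + 2.0*16 - 1.0*3 = 49.0
```

Each respondent gets one such value, so applying `variate` row by row turns a table of variables into a single column of variate scores.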

Page 6: multivariate data analysis

Types of Data and Measurement Scales

Data
• Nonmetric or Qualitative
o Nominal Scale
o Ordinal Scale
• Metric or Quantitative
o Interval Scale
o Ratio Scale

Page 7: multivariate data analysis

Measurement Scales

• Nonmetric
o Nominal – size of number is not related to the amount of the characteristic being measured.
o Ordinal – larger numbers indicate more (or less) of the characteristic measured, but not how much more (or less).

• Metric
o Interval – contains ordinal properties, and in addition, there are equal differences between scale points.
o Ratio – contains interval scale properties, and in addition, there is a natural zero point.

NOTE: The level of measurement is critical in determining the appropriate multivariate technique to use!

Page 8: multivariate data analysis

Measurement Error

• All variables have some error. What are the sources of error?
• Measurement error distorts observed relationships and makes multivariate techniques less powerful.
• Researchers use summated scales, for which several variables are summed or averaged together to form a composite representation of a concept.
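A summated scale is simple to compute. The sketch below averages several item scores per respondent; the respondent IDs and the 1-7 item responses are hypothetical.

```python
from statistics import fmean


def summated_scale(item_scores):
    """Average several item scores into one composite score."""
    return fmean(item_scores)


# Hypothetical data: three respondents, four 1-7 items measuring one concept.
responses = {
    "r1": [6, 7, 6, 5],
    "r2": [2, 3, 2, 1],
    "r3": [4, 4, 5, 3],
}

composites = {rid: summated_scale(items) for rid, items in responses.items()}
for rid, score in composites.items():
    print(rid, score)
```

Averaging rather than summing keeps the composite on the same 1-7 range as the individual items, which makes it easier to read.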

Page 9: multivariate data analysis

Measurement Error

In addressing measurement error, researchers evaluate two important characteristics of measurement:

• Validity = the degree to which a measure accurately represents what it is supposed to.
• Reliability = the degree to which the observed variable measures the “true” value and is thus error free.

Page 10: multivariate data analysis

Statistical Significance and Power

• Type I error, or α, is the probability of rejecting the null hypothesis when it is true.
• Type II error, or β, is the probability of failing to reject the null hypothesis when it is false.
• Power, or 1-β, is the probability of rejecting the null hypothesis when it is false.

                     H0 true             H0 false
Fail to Reject H0    1-α                 β (Type II error)
Reject H0            α (Type I error)    1-β (Power)
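The two "Reject H0" cells of the table can be estimated by simulation. The sketch below is only an illustration under stated assumptions: a one-sided z-test of H0: mean = 0 with known standard deviation 1, a sample size of 25, and a true mean of 0.5 when H0 is false. None of these settings come from the slides.

```python
import random
from statistics import NormalDist, fmean


def rejects_null(sample, alpha=0.05, sigma=1.0):
    """One-sided z-test of H0: mean = 0 against H1: mean > 0 (sigma known)."""
    z = fmean(sample) / (sigma / len(sample) ** 0.5)
    return z > NormalDist().inv_cdf(1 - alpha)


def rejection_rate(true_mean, n=25, trials=4000, seed=0):
    """Share of simulated samples in which H0 is rejected."""
    rng = random.Random(seed)
    hits = sum(
        rejects_null([rng.gauss(true_mean, 1.0) for _ in range(n)])
        for _ in range(trials)
    )
    return hits / trials


type_i_rate = rejection_rate(true_mean=0.0)  # H0 true: estimates alpha
power = rejection_rate(true_mean=0.5)        # H0 false: estimates 1 - beta
print(f"Type I error rate ~ {type_i_rate:.3f}, power ~ {power:.3f}")
```

Under these assumptions the first estimate should land near the chosen α of .05 and the second near .80, with some simulation noise.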

Page 11: multivariate data analysis

Power is Determined by Three Factors:

• Effect size: the actual magnitude of the effect of interest (e.g., the difference between means or the correlation between variables).
• Alpha (α): as α is set at smaller levels, power decreases. Typically, α = .05.
• Sample size: as sample size increases, power increases. With very large sample sizes, even very small effects can be statistically significant, raising the issue of practical significance vs. statistical significance.
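The direction of each of the three factors can be checked with a closed-form power calculation. This sketch assumes a one-sided, one-sample z-test with the effect size expressed in standard-deviation units; the function name and the specific numbers are illustrative, not taken from the slides.

```python
from statistics import NormalDist


def power_one_sided_z(effect_size, alpha, n):
    """Power of a one-sided, one-sample z-test (effect size in SD units)."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    return 1 - NormalDist().cdf(z_crit - effect_size * n ** 0.5)


base = power_one_sided_z(effect_size=0.3, alpha=0.05, n=50)

print(power_one_sided_z(0.5, 0.05, 50) > base)   # larger effect size -> True
print(power_one_sided_z(0.3, 0.01, 50) < base)   # smaller alpha      -> True
print(power_one_sided_z(0.3, 0.05, 100) > base)  # larger sample      -> True
```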

Page 12: multivariate data analysis

Figure 1-1 Impact of Sample Size on Power

Page 13: multivariate data analysis

Rules of Thumb 1–1

Statistical Power Analysis

• Researchers should always design the study to achieve a power level of .80 at the desired significance level.

• More stringent significance levels (e.g., .01 instead of .05) require larger samples to achieve the desired power level.

• Conversely, power can be increased by choosing a less stringent alpha level (e.g., .10 instead of .05).

• Smaller effect sizes always require larger sample sizes to achieve the desired power.

• Any increase in power is most likely achieved by increased sample size.
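These rules of thumb can be turned into a rough sample-size calculation. The sketch below solves for the smallest n that reaches a target power, again assuming a one-sided, one-sample z-test with the effect size in standard-deviation units. The helper `required_n` is hypothetical and is not a substitute for a power analysis matched to the actual technique.

```python
import math
from statistics import NormalDist


def required_n(effect_size, alpha=0.05, power=0.80):
    """Smallest n reaching the target power for a one-sided, one-sample z-test."""
    z = NormalDist()
    total = z.inv_cdf(1 - alpha) + z.inv_cdf(power)
    return math.ceil((total / effect_size) ** 2)


# A more stringent alpha requires a larger sample at the same effect size.
for alpha in (0.10, 0.05, 0.01):
    print(f"alpha={alpha}: n={required_n(effect_size=0.3, alpha=alpha)}")

# A smaller effect size requires a larger sample at the same alpha.
for effect in (0.5, 0.2):
    print(f"effect={effect}: n={required_n(effect_size=effect)}")
```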

Page 14: multivariate data analysis

Types of Multivariate Techniques

• Dependence techniques: a variable or set of variables is identified as the dependent variable to be predicted or explained by other variables known as independent variables.
o Multiple Regression
o Multiple Discriminant Analysis
o Logit/Logistic Regression
o Multivariate Analysis of Variance (MANOVA) and Covariance
o Conjoint Analysis
o Canonical Correlation
o Structural Equations Modeling (SEM)

Page 15: multivariate data analysis

Types of Multivariate Techniques

• Interdependence techniques: involve the simultaneous analysis of all variables in the set, without distinction between dependent variables and independent variables.
o Principal Components and Common Factor Analysis
o Cluster Analysis
o Multidimensional Scaling (perceptual mapping)
o Correspondence Analysis

Page 16: multivariate data analysis

Selecting a Multivariate Technique

1. What type of relationship is being examined – dependence or interdependence?
2. Dependence relationship: How many variables are being predicted?
o What is the measurement scale of the dependent variable?
o What is the measurement scale of the predictor variable?
3. Interdependence relationship: Are you examining relationships between variables, respondents, or objects?
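The dependence branch of these questions can be sketched as a small lookup. The function name and argument encoding are assumptions made for illustration, and the table covers only the four combinations that this deck spells out on the individual technique slides.

```python
def choose_dependence_technique(n_dependent, dependent_scale, predictor_scale):
    """Map the selection questions to a technique named in these slides."""
    count = "one" if n_dependent == 1 else "several"
    table = {
        ("one", "metric", "metric"): "Multiple Regression",
        ("one", "nonmetric", "metric"): "Discriminant Analysis or Logistic Regression",
        ("several", "metric", "nonmetric"): "MANOVA",
        ("several", "metric", "metric"): "Canonical Correlation",
    }
    # Combinations outside the table are deliberately left unanswered.
    return table.get((count, dependent_scale, predictor_scale), "not covered by this sketch")


print(choose_dependence_technique(1, "metric", "metric"))
print(choose_dependence_technique(1, "nonmetric", "metric"))
print(choose_dependence_technique(3, "metric", "nonmetric"))
```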

Page 17: multivariate data analysis

Multiple Regression

A single metric dependent variable is predicted by several metric independent variables.
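As a minimal sketch of how the regression weights can be estimated, the code below solves the least-squares normal equations in plain Python. The data are hypothetical and generated exactly from y = 1 + 2*x1 + 3*x2, so the recovered weights can be checked by eye. Real analyses would normally use a statistics package rather than this hand-rolled solver.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(xs, ys):
    """Least-squares weights [intercept, w1, ..., wk] from the normal equations."""
    rows = [[1.0] + list(x) for x in xs]  # prepend the intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2.
xs = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in xs]

weights = fit_ols(xs, ys)
print([round(w, 6) for w in weights])
```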

Page 18: multivariate data analysis

Discriminant Analysis

A non-metric (categorical) dependent variable is predicted by several metric independent variables.

Examples:

• Gender – Male vs. Female
• Heavy Users vs. Light Users
• Purchasers vs. Non-purchasers
• Good Credit Risk vs. Poor Credit Risk
• Member vs. Non-Member

Page 19: multivariate data analysis

Logistic Regression

A single nonmetric dependent variable is predicted by several metric independent variables. This technique is similar to discriminant analysis, but relies on calculations more like regression.
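The regression-like calculation can be sketched as follows. This is a bare-bones illustration that fits the weights by batch gradient ascent on the log-likelihood. The purchaser data, learning rate, and epoch count are all hypothetical choices, and statistical packages typically use faster estimation routines.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit [intercept, w1, ..., wk] by batch gradient ascent on the log-likelihood."""
    w = [0.0] * (len(xs[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            row = [1.0] + list(x)
            err = y - sigmoid(sum(wi * xi for wi, xi in zip(w, row)))
            grad = [g + err * xi for g, xi in zip(grad, row)]
        w = [wi + lr * g / len(xs) for wi, g in zip(w, grad)]
    return w


def predict_proba(w, x):
    """Predicted probability that the outcome is 1 for predictor values x."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, [1.0] + list(x))))


# Hypothetical data: non-purchasers (0) vs. purchasers (1), two metric predictors.
xs = [(1.0, 0.5), (1.5, 1.0), (2.0, 0.8), (3.0, 2.5), (3.5, 2.0), (4.0, 3.0)]
ys = [0, 0, 0, 1, 1, 1]

w = fit_logistic(xs, ys)
print(round(predict_proba(w, (4.0, 3.0)), 3), round(predict_proba(w, (1.0, 0.5)), 3))
```

The output is a probability of group membership for each case, which is then compared with a cutoff such as .5 to classify it.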

Page 20: multivariate data analysis

MANOVA

Several metric dependent variables are predicted by a set of nonmetric (categorical) independent variables.

Page 21: multivariate data analysis

CANONICAL ANALYSIS

Several metric dependent variables are predicted by several metric independent variables.

Page 22: multivariate data analysis

CONJOINT ANALYSIS

. . . is used to understand respondents’ preferences for products and services. In doing this, it determines the importance of both:

• attributes and
• levels of attributes

. . . based on a smaller subset of combinations of attributes and levels.

Page 23: multivariate data analysis

CONJOINT ANALYSIS

Typical Applications:

• Soft Drinks
• Candy Bars
• Cereals
• Beer
• Apartment Buildings; Condos
• Solvents; Cleaning Fluids

Page 24: multivariate data analysis

Structural Equations Modeling (SEM)

Estimates multiple, interrelated dependence relationships based on two components:

1. Structural Model
2. Measurement Model

Page 25: multivariate data analysis

Factor Analysis

. . . analyzes the structure of the interrelationships among a large number of variables to determine a set of common underlying dimensions (factors).
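A full common factor analysis is beyond a short example, but the related principal components idea, which this deck lists alongside it, can be sketched. The code below finds the first principal component of a covariance matrix by power iteration. It is a simplified stand-in for factor analysis, not the same procedure, and the item ratings are hypothetical.

```python
def covariance_matrix(rows):
    """Sample covariance matrix of the columns of rows."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [
        [
            sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
            for j in range(k)
        ]
        for i in range(k)
    ]


def first_component(cov, iterations=200):
    """Dominant eigenvector of cov by power iteration, scaled to unit length."""
    v = [1.0] + [0.0] * (len(cov) - 1)
    for _ in range(iterations):
        v = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    return v


# Hypothetical ratings on three items; the first two move together.
rows = [(1, 2, 5), (2, 3, 5), (3, 4, 6), (4, 5, 5), (5, 6, 6), (6, 7, 5)]

loadings = first_component(covariance_matrix(rows))
print([round(x, 3) for x in loadings])
```

The first two items receive nearly identical, large weights while the third receives a small one, suggesting that the first two reflect one shared underlying dimension.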

Page 26: multivariate data analysis

Cluster Analysis

. . . groups objects (respondents, products, firms, variables, etc.) so that each object is similar to the other objects in the cluster and different from objects in all the other clusters.
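One common way to form such groups is k-means, sketched below in plain Python. The slides do not name a specific clustering algorithm, so k-means is an assumption here, as are the sample points and the choice of starting centers.

```python
def kmeans(points, centers, iterations=10):
    """Plain k-means: assign points to the nearest center, then recompute centers."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            clusters[dists.index(min(dists))].append(p)
        # An empty cluster keeps its previous center.
        centers = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centers)
        ]
    return centers, clusters


# Hypothetical objects measured on two metric variables.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]

centers, clusters = kmeans(points, centers=[points[0], points[3]])
for center, members in zip(centers, clusters):
    print(center, members)
```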

Page 27: multivariate data analysis

Multidimensional Scaling

. . . identifies “unrecognized” dimensions that affect purchase behavior based on customer judgments of:

• similarities or
• preferences

and transforms these into distances represented as perceptual maps.

Page 28: multivariate data analysis

Correspondence Analysis

. . . uses non-metric data and evaluates either linear or non-linear relationships in an effort to develop a perceptual map representing the association between objects (firms, products, etc.) and a set of descriptive characteristics of the objects.

Page 29: multivariate data analysis

Guidelines for Multivariate Analysis

• Establish Practical Significance as Well as Statistical Significance.
• Sample Size Affects All Results.
• Know Your Data.
• Strive for Model Parsimony.
• Look at Your Errors.
• Validate Your Results.

Page 30: multivariate data analysis

A Structured Approach to Multivariate Model Building:

Stage 1: Define the Research Problem, Objectives, and Multivariate Technique(s) to be Used
Stage 2: Develop the Analysis Plan
Stage 3: Evaluate the Assumptions Underlying the Multivariate Technique(s)
Stage 4: Estimate the Multivariate Model and Assess Overall Model Fit
Stage 5: Interpret the Variate(s)
Stage 6: Validate the Multivariate Model

Page 31: multivariate data analysis

Description of HBAT Primary Database Variables

Variable   Description                                          Variable Type

Data Warehouse Classification Variables
X1         Customer Type                                        nonmetric
X2         Industry Type                                        nonmetric
X3         Firm Size                                            nonmetric
X4         Region                                               nonmetric
X5         Distribution System                                  nonmetric

Performance Perceptions Variables
X6         Product Quality                                      metric
X7         E-Commerce Activities/Website                        metric
X8         Technical Support                                    metric
X9         Complaint Resolution                                 metric
X10        Advertising                                          metric
X11        Product Line                                         metric
X12        Salesforce Image                                     metric
X13        Competitive Pricing                                  metric
X14        Warranty & Claims                                    metric
X15        New Products                                         metric
X16        Ordering & Billing                                   metric
X17        Price Flexibility                                    metric
X18        Delivery Speed                                       metric

Outcome/Relationship Measures
X19        Satisfaction                                         metric
X20        Likelihood of Recommendation                         metric
X21        Likelihood of Future Purchase                        metric
X22        Current Purchase/Usage Level                         metric
X23        Consider Strategic Alliance/Partnership in Future    nonmetric

Page 32: multivariate data analysis

Multivariate Analysis Learning Checkpoint:

1. What is multivariate analysis?
2. Why use multivariate analysis?
3. Why is knowledge of measurement scales important in using multivariate analysis?
4. What basic issues need to be examined when using multivariate analysis?
5. Describe the process for applying multivariate analysis.