Department of Cognitive Science, Michael Kalsher, Adv. Experimental Methods & Statistics, PSYC 4310 / COGS 6310: Factor Analysis 1

Post on 17-Jan-2018

Department of Cognitive Science
Michael Kalsher
Adv. Experimental Methods & Statistics
PSYC 4310 / COGS 6310
Factor Analysis 1
PSYC 4310 Advanced Experimental Methods and Statistics © 2013, Michael Kalsher

PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2009, Michael Kalsher and James Watt

Outline
- What Are Factors?
- Representing Factors: Graphs and Equations
- Extracting Factors: Methods and Criteria
- Interpreting Factor Structures: Factor Rotation
- Reliability: Cronbach's alpha
- Writing Results

When to use Factor Analysis?
- Data reduction
- Identification of underlying latent structures
  - Clusters of correlated variables are termed factors.
- Example: Factor analysis could potentially be used to identify the characteristics (out of a large number of characteristics) that make a person popular.
- Candidate characteristics: level of social skills, selfishness, how interesting a person is to others, the amount of time they spend talking about themselves (Talk 2) versus the other person (Talk 1), and their propensity to lie about themselves.

The R-Matrix
Meaningful clusters of large correlation coefficients between subsets of variables suggest that these variables are measuring aspects of the same underlying dimension.
- Factor 1: The better your social skills, the more interesting and talkative you tend to be.
- Factor 2: Selfish people are likely to lie and talk about themselves.

What is a Factor?
Factors can be viewed as classification axes along which the individual variables can be plotted.
The greater the loading of variables on a factor, the more the factor explains relationships among those variables.
Ideally, variables should be strongly related to (or load on) only one factor.

Graphical Representation of a factor plot
- Note that each variable loads primarily on only one factor.
- Factor loadings tell us about the relative contribution that a variable makes to a factor.

Mathematical Representation of a factor plot
$Y_i = b_1 X_{1i} + b_2 X_{2i} + \dots + b_n X_{ni} + \varepsilon_i$
$\text{Factor}_i = b_1\,\text{Variable}_{1i} + b_2\,\text{Variable}_{2i} + \dots + b_n\,\text{Variable}_{ni} + \varepsilon_i$
- The equation describing a linear model can be applied to the description of a factor.
- The b's in the equation represent the factor loadings observed in the factor plot.
- Note: there is no intercept in the equation, since the lines intersect at zero and hence the intercept is also zero.

Mathematical Representation of a factor plot
There are two factors underlying the popularity construct: general sociability and consideration. We can construct equations that describe each factor in terms of the variables that have been measured.
$\text{Sociability}_i = b_1\,\text{Talk 1}_i + b_2\,\text{Social Skills}_i + b_3\,\text{Interest}_i + b_4\,\text{Talk 2}_i + b_5\,\text{Selfish}_i + b_6\,\text{Liar}_i + \varepsilon_i$
$\text{Consideration}_i = b_1\,\text{Talk 1}_i + b_2\,\text{Social Skills}_i + b_3\,\text{Interest}_i + b_4\,\text{Talk 2}_i + b_5\,\text{Selfish}_i + b_6\,\text{Liar}_i + \varepsilon_i$

Mathematical Representation of a factor plot
$\text{Sociability}_i = 0.87\,\text{Talk 1}_i + 0.96\,\text{Social Skills}_i + b_3\,\text{Interest}_i + b_4\,\text{Talk 2}_i + b_5\,\text{Selfish}_i + b_6\,\text{Liar}_i + \varepsilon_i$ [numeric loadings for the last four terms are missing from the transcript]
The values of the b's in the two equations differ, depending on the relative importance of each variable to a particular factor.
$\text{Consideration}_i = 0.01\,\text{Talk 1}_i + b_2\,\text{Social Skills}_i + b_3\,\text{Interest}_i + b_4\,\text{Talk 2}_i + b_5\,\text{Selfish}_i + b_6\,\text{Liar}_i + \varepsilon_i$ [numeric loadings for the last five terms are missing from the transcript]
- Ideally, variables should have very high b-values for one factor and very low b-values for all other factors.
- Replace values of b with the co-ordinate of each variable on the graph.

Factor Loadings
- The b values represent the weights of a variable on a factor and are termed factor loadings.
- These values are stored in a factor pattern matrix (A).
- Columns display the factors (underlying constructs) and rows display how each variable loads onto each factor.
Factor pattern matrix: rows are the variables (Talk 1, Social Skills, Interest, Talk 2, Selfish, Liar); columns are the factors (Sociability, Consideration). [Loading values are not preserved in the transcript.]

Factor Scores
Once factors are derived, we can estimate each person's factor scores (based on their scores for each factor's constituent variables).
Potential uses for factor scores:
- Estimate a person's score on one or more factors.
- Answer questions of scientific or practical interest (e.g., "Are females more sociable than males?" using the factor scores for sociability).
Methods of determining factor scores:
- Weighted Average (simplest, but scale dependent)
- Regression Method (easiest to understand; most typically used)
- Bartlett Method (produces scores that are unbiased and correlate only with their own factor)
- Anderson-Rubin Method (produces scores that are uncorrelated and standardized)

Approaches to Factor Analysis
Exploratory
- Reduce a number of measurements to a smaller number of indices or factors (e.g., Principal Components Analysis or PCA).
- Goal: Identify factors based on the data and maximize the amount of variance explained.
Confirmatory
- Test hypothetical relationships between measures and more abstract constructs.
- Goal: The researcher must hypothesize, in advance, the number of factors, whether or not these factors are correlated, and which items load onto and reflect particular factors.
- In contrast to exploratory factor analysis (EFA), where all loadings are free to vary, confirmatory factor analysis (CFA) allows for the explicit constraint of certain loadings to be zero.

Communality
Understanding variance in an R-matrix
Total variance for a particular variable has two components:
- Common variance: variance shared with other variables.
- Unique variance: variance specific to that variable (including error or random variance).
Communality: The proportion of common (or shared) variance present in a variable is known as the communality. A variable that has no unique variance has a communality of 1; one that shares none of its variance with any other variable has a communality of 0.

Factor Extraction: PCA vs. Factor Analysis
Principal Component Analysis
- A data reduction technique that represents a set of variables by a smaller number of variables called principal components.
- The components are uncorrelated and therefore measure different, unrelated aspects or dimensions of the data.
- Principal components are chosen such that the first one accounts for as much of the variation in the data as possible, the second one for as much of the remaining variance as possible, and so on.
- Useful for combining many variables into a smaller number of subsets.
Factor Analysis
- Derives a mathematical model from which factors are estimated.
- Factors are linear combinations that maximize the shared portion of the variance underlying latent constructs.
- May be used to identify the structure underlying such variables and to estimate scores to measure latent factors themselves.
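
The description of principal components above can be illustrated with a small numerical sketch. The slides do not prescribe an implementation; the code below is one common approach (an eigendecomposition of the correlation matrix with NumPy). The data are synthetic, and the function name `pca_from_correlation`, the latent weights, and the sample size are all assumptions made for illustration.

```python
import numpy as np

def pca_from_correlation(data):
    """Principal components of the correlation matrix of `data` (rows = cases)."""
    # Standardize each variable so the analysis is based on the R-matrix.
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    r_matrix = np.corrcoef(data, rowvar=False)
    # eigh is suitable because a correlation matrix is symmetric.
    eigenvalues, eigenvectors = np.linalg.eigh(r_matrix)
    # eigh returns ascending order; reorder so the first component is the largest.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    # Component scores: each case projected onto each component.
    scores = z @ eigenvectors
    return eigenvalues, eigenvectors, scores

# Hypothetical data: 200 cases, six variables driven by two latent dimensions.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
weights = np.array([[0.9, 0.8, 0.9, 0.0, 0.1, 0.0],
                    [0.0, 0.1, 0.0, 0.8, 0.7, 0.8]])
data = latent @ weights + 0.5 * rng.normal(size=(200, 6))

eigenvalues, eigenvectors, scores = pca_from_correlation(data)
explained = eigenvalues / eigenvalues.sum()
print("eigenvalues:", np.round(eigenvalues, 2))
print("proportion of variance:", np.round(explained, 2))
```

Because the components are eigenvectors of the correlation matrix, the eigenvalues sum to the number of variables and the component scores are mutually uncorrelated, which matches the properties listed above. This sketch covers PCA only; it does not estimate a common factor model.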
Factor Extraction: Eigenvalues & Scree Plot
Eigenvalues
- Measure the amount of variation accounted for by each factor.
- The number of principal components is less than or equal to the number of original variables.
- The first principal component accounts for as much of the variability in the data as possible.
- Each succeeding component has the highest variance possible under the constraint that it be orthogonal to (i.e., uncorrelated with) the preceding components.
Scree Plots
- Plot each eigenvalue (Y-axis) against the factor with which it is associated (X-axis).
- By graphing the eigenvalues, the relative importance of each factor becomes apparent.

Factor Retention Based on Scree Plots

Factor Retention: Kaiser's Criterion
Kaiser (1960) recommends retaining all factors with eigenvalues greater than 1.
- Based on the idea that eigenvalues represent the amount of variance explained by a factor and that an eigenvalue of 1 represents a substantial amount of variation.
- Kaiser's criterion tends to overestimate the number of factors to be retained.

Students often become stressed about statistics (SAQ) and the use of computers and/or SPSS to analyze data. Suppose we develop a questionnaire to measure this propensity (see sample items on the following slides; the data can be found in SAQ.sav). Does the questionnaire measure a single construct? Or is it possible that there are multiple aspects comprising students' anxiety toward SPSS?
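
As a rough illustration of how the eigenvalue-based retention rule could bear on a question like this, the sketch below applies Kaiser's criterion to a small correlation matrix. The matrix is invented for the example (it is not taken from SAQ.sav), and the function name `kaiser_retained` is hypothetical; only NumPy is assumed.

```python
import numpy as np

def kaiser_retained(r_matrix):
    """Return the eigenvalues (largest first) and how many exceed 1."""
    # eigvalsh returns eigenvalues of a symmetric matrix in ascending order.
    eigenvalues = np.linalg.eigvalsh(r_matrix)[::-1]
    n_retained = int(np.sum(eigenvalues > 1.0))
    return eigenvalues, n_retained

# Hypothetical R-matrix for four items: items 1-2 correlate 0.6,
# items 3-4 correlate 0.6, and the two pairs are unrelated.
r_matrix = np.array([
    [1.0, 0.6, 0.0, 0.0],
    [0.6, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.6],
    [0.0, 0.0, 0.6, 1.0],
])

eigenvalues, n_retained = kaiser_retained(r_matrix)
# These (factor number, eigenvalue) pairs are the points a scree plot would show.
for k, value in enumerate(eigenvalues, start=1):
    print(f"factor {k}: eigenvalue {value:.2f}")
print("factors retained by Kaiser's criterion:", n_retained)
```

Each correlated pair contributes one eigenvalue of 1.6 and one of 0.4, so the rule retains two factors here, which would suggest two constructs rather than one. With real questionnaire data the eigenvalues are rarely this clear-cut, and the tendency of the criterion to over-retain still applies.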
Doing Factor Analysis: An Example

Doing Factor Analysis: Some Considerations
Sample size is important! A sample of 300 or more will likely