rss spss short course module 9 principal components analysis 1

9/10/2014 RSS SPSS Short Course Module 9 Principal Components Analysis 1

http://www.unt.edu/rss/class/Jon/SPSS_SC/Module9/M9_PCA/SPSS_M9_PCA1.htm

Research and Statistical Support


MODULE 9

Principal Components Analysis in SPSS.

Before we begin with the analysis, let's take a moment to address and hopefully clarify one of the most confusing and misarticulated issues in the statistical teaching and practice literature.

First, Principal Components Analysis (PCA) is a variable reduction technique which maximizes the amount of variance accounted for in the observed variables by a smaller group of variables called COMPONENTS. As an example, consider the following situation. Let's say we have 500 questions on a survey we designed to measure persistence. We want to reduce the number of questions so that it does not take someone 3 hours to complete the survey. It would be appropriate to use PCA to reduce the number of questions by identifying and removing redundant questions. For instance, if question 122 and question 356 are virtually identical (i.e. they ask the exact same thing but in different ways), then one of them is not necessary. The PCA process allows us to reduce the number of questions or variables down to their PRINCIPAL COMPONENTS.
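
As a rough illustration of the redundancy idea (not part of the SPSS procedure), the following Python sketch simulates four hypothetical survey questions, two of which are near-duplicates, and extracts components from their correlation matrix. The data, the variable names (q_a, q_b, q_c, q_d), and the use of NumPy are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500  # hypothetical number of respondents

# Hypothetical survey: one underlying trait drives three questions;
# q_a and q_b are near-duplicates (the same question, reworded).
trait = rng.normal(size=n)
q_a = trait + 0.1 * rng.normal(size=n)
q_b = trait + 0.1 * rng.normal(size=n)
q_c = 0.7 * trait + rng.normal(size=n)
q_d = rng.normal(size=n)  # an unrelated question
X = np.column_stack([q_a, q_b, q_c, q_d])

# PCA on the correlation matrix: each eigenvalue is the variance
# carried by one component.
R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]  # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

pct = 100 * eigvals / eigvals.sum()
print("corr(q_a, q_b):", round(R[0, 1], 3))
print("% of variance per component:", np.round(pct, 1))
```

Because q_a and q_b carry almost the same information, the last component accounts for almost no variance, which is the signal that one of the two questions could be dropped.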

PCA is commonly, but very confusingly, called exploratory factor analysis (EFA). The use of the word factor in EFA is inappropriate and confusing because we are really interested in COMPONENTS, not factors. This issue is made more confusing by some software packages (e.g. PASW/SPSS & SAS) which list or use PCA under the heading factor analysis.

Second, Factor Analysis (FA) is typically used to confirm the latent factor structure for a group of measured variables. Latent factors are unobserved variables which typically cannot be directly measured, but they are assumed to cause the scores we observe on the measured or indicator variables. FA is a model-based technique. It is concerned with modeling the relationships between measured variables, latent factors, and error.

As stated in O'Rourke, Hatcher, and Stepanski (2005): "Both (PCA & FA) are methods that can be used to identify groups of observed variables that tend to hang together empirically. Both procedures can also be performed with the SAS FACTOR procedure and they generally tend to provide similar results. Nonetheless, there are some important conceptual differences between principal component analysis and factor analysis that should be understood at the outset. Perhaps the most important deals with the assumption of an underlying causal structure. Factor analysis assumes that the covariation in the observed variables is due to the presence of one or more latent variables (factors) that exert causal influence on these observed variables" (p. 436).

Final thoughts. Both PCA and FA can be used as exploratory analysis. However, PCA is predominantly used in an exploratory fashion and almost never used in a confirmatory fashion. FA can be used in an exploratory fashion, but most of the time it is used in a confirmatory fashion because it is concerned with modeling factor structure. The choice of which is used should be driven by the goals of the analyst. If you are interested in reducing the observed variables down to their principal components while maximizing the variance accounted for in the variables by the components, then you should be using PCA. If you are concerned with modeling the latent factors (and their relationships) which cause the scores on your observed variables, then you should be using FA.

Principal Components Analysis

The following covers a few of the SPSS procedures for conducting principal component analysis. For the duration of this tutorial we will be using the ExampleData4.sav data file.

PCA 1. So, here we go. Begin by clicking on Analyze, Dimension Reduction, Factor...



Next, highlight all the variables you want to include in the analysis; here y1 through y15. Then click on Descriptives... and select the following (the univariate descriptives, the correlation matrix, and the KMO and Bartlett's tests). Then click the Continue button.

Next, click on the Extraction... button and select the following, including the scree plot (notice Principal components is specified by default). Also notice the extraction is based on components with eigenvalues greater than 1 (also a default). There are a number of perspectives on determining the number of components to extract and what criteria to use for extraction. Originally, the eigenvalues-greater-than-1 rule was generally accepted. However, more recently Zwick and Velicer (1986) have suggested that Horn's (1965) parallel analysis tends to be more precise in determining the number of reliable components or factors. Unfortunately, parallel analysis is not available in SPSS. Therefore, a review of the parallel analysis engine (Patil, Singh, Mishra, & Donavan, 2007) is strongly recommended. Next, click the Continue button, then click the Scores... button.
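
To make the two retention rules concrete, here is a minimal Python sketch of the general idea behind parallel analysis: compare the observed eigenvalues with eigenvalues obtained from random data of the same size. It is not the Patil et al. engine and not an SPSS feature. The function name parallel_analysis, the 95th-percentile threshold (a common variant; the mean of the random eigenvalues is another), and the simulated data are all assumptions of this example.

```python
import numpy as np

def parallel_analysis(X, n_sims=200, percentile=95, seed=0):
    """Return (observed eigenvalues, random-data thresholds, number to retain)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Eigenvalues of the observed correlation matrix, largest first.
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
    # Eigenvalues from uncorrelated normal data with the same n and p.
    sims = np.empty((n_sims, p))
    for i in range(n_sims):
        Z = rng.normal(size=(n, p))
        sims[i] = np.sort(np.linalg.eigvalsh(np.corrcoef(Z, rowvar=False)))[::-1]
    threshold = np.percentile(sims, percentile, axis=0)
    # Retain components up to the first one that fails to beat its threshold.
    exceeds = observed > threshold
    n_retain = p if exceeds.all() else int(np.argmin(exceeds))
    return observed, threshold, n_retain

# Simulated example: six items driven by two independent sources.
rng = np.random.default_rng(42)
n = 300
f1, f2 = rng.normal(size=n), rng.normal(size=n)
noise = rng.normal(size=(n, 6))
X = 0.8 * np.column_stack([f1, f1, f1, f2, f2, f2]) + 0.6 * noise

observed, threshold, n_retain = parallel_analysis(X)
kaiser_count = int(np.sum(observed > 1.0))  # eigenvalue-greater-than-1 rule
print("observed eigenvalues:", np.round(observed, 2))
print("random-data thresholds:", np.round(threshold, 2))
print("retain (parallel analysis):", n_retain, "| retain (eigenvalue > 1):", kaiser_count)
```

In this clean simulated case both rules agree; with real survey data the two counts can differ, which is why the choice of criterion matters.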

Selecting Save as variables under Scores... will add new columns to our dataset; each new column will consist of each case's score on one extracted component. Then, click on the Continue button, then click the OK button.

The output should be similar to what is displayed below.



The Descriptive Statistics table simply reports the mean, standard deviation, and number of cases for each variable included in the analysis.

The Correlation Matrix (above) is the correlation matrix for the variables included. Generally speaking, a close review of this table can offer an insight into how the PCA results will come out.

The next table is used to test assumptions; essentially, the Kaiser-Meyer-Olkin (KMO) statistic should be greater than 0.600 and the Bartlett's test should be significant (e.g. p < .05). KMO is used for assessing sampling adequacy and evaluates the correlations and partial correlations to determine if the data are likely to coalesce on components (i.e. some items highly correlated, some not). The Bartlett's test evaluates whether or not our correlation matrix is an identity matrix (1 on the diagonal & 0 on the off-diagonal). Here, it indicates that our correlation matrix (of items) is not an identity matrix; we can verify this by looking at the correlation matrix. The off-diagonal values of our correlation matrix are NOT zeros, therefore the matrix is NOT an identity matrix.
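
For readers who want to see what these two statistics compute, the sketch below uses the standard textbook formulas: Bartlett's chi-square is -(n - 1 - (2p + 5)/6) ln|R| with p(p - 1)/2 degrees of freedom, and KMO is the sum of squared correlations divided by that sum plus the sum of squared partial correlations. The function names and the simulated data are assumptions of this example, and the p-value step (a chi-square lookup) is left out to keep the sketch dependent on NumPy only.

```python
import numpy as np

def bartlett_sphericity(R, n):
    """Chi-square statistic and degrees of freedom for H0: R is an identity matrix."""
    p = R.shape[0]
    _, logdet = np.linalg.slogdet(R)  # log of the determinant of R
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    df = p * (p - 1) // 2
    return chi2, df

def kmo(R):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)         # partial correlations
    off = ~np.eye(R.shape[0], dtype=bool)   # off-diagonal mask
    r2 = np.sum(R[off] ** 2)
    a2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + a2)

# Simulated example: six items sharing one common source of variance.
rng = np.random.default_rng(7)
n = 300
common = rng.normal(size=(n, 1))
X = 0.8 * common + 0.6 * rng.normal(size=(n, 6))
R = np.corrcoef(X, rowvar=False)

chi2, df = bartlett_sphericity(R, n)
kmo_value = kmo(R)
print("Bartlett chi-square:", round(chi2, 1), "df:", df)
print("KMO:", round(kmo_value, 3))
```

A large chi-square relative to its degrees of freedom corresponds to a significant Bartlett's test, and for an exact identity matrix the statistic is zero.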

A communality (h²) is the sum of a variable's squared component loadings and represents the amount of variance in that variable accounted for by all the extracted components. For example, all five extracted components account for 51.1% of the variance in variable y1 (h² = .511).
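
The definition can be checked numerically. In this sketch (simulated data, hypothetical loading pattern, NumPy assumed), loadings are computed as eigenvectors scaled by the square roots of their eigenvalues, and each communality is the row sum of squared loadings over the retained components.

```python
import numpy as np

# Simulated data: six items, three driven by each of two sources.
rng = np.random.default_rng(1)
n = 400
f = rng.normal(size=(n, 2))
pattern = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                    [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
X = f @ pattern.T + 0.6 * rng.normal(size=(n, 6))

R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2  # number of components retained in this example
loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])  # item-by-component loadings
communalities = np.sum(loadings ** 2, axis=1)     # one h-squared per item

# With every component retained, each communality equals 1.
full_loadings = eigvecs * np.sqrt(eigvals)
full_communalities = np.sum(full_loadings ** 2, axis=1)

print("communalities:", np.round(communalities, 3))
```

The communalities also sum to the total of the retained eigenvalues, which ties this table to the variance-explained table.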



The next table is intuitively named and reports the variance explained by each component as well as the cumulative variance explained by all components. When we speak of variance explained with regard to this table, we are referring to the amount of variance in the total collection of variables/items which is explained by the component(s). For instance, component 5 explains 7.035% of the variance in the items; specifically, in the items' variance-covariance matrix. We could also say, 55.032% of the variance in our items was explained by the 5 extracted components.
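
The percentages in this table can be linked back to eigenvalues with a little arithmetic. The sketch below assumes the analysis was run on the correlation matrix (the SPSS default), so that each of the 15 standardized items contributes one unit of variance and the total is 15; the variable names are hypothetical.

```python
n_items = 15               # y1 through y15
pct_component_5 = 7.035    # % of variance for component 5, from the table
cumulative_pct_5 = 55.032  # cumulative % for components 1 to 5, from the table

# Under the correlation-matrix assumption, percent = eigenvalue / n_items * 100.
eigenvalue_5 = pct_component_5 / 100 * n_items
sum_first_five = cumulative_pct_5 / 100 * n_items

print("implied eigenvalue of component 5:", round(eigenvalue_5, 3))
print("implied sum of the first five eigenvalues:", round(sum_first_five, 3))
```

Under that assumption, component 5 has an eigenvalue of roughly 1.055, just above the cutoff of 1, which is consistent with it being the last component retained.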

The scree plot graphically displays the information in the previous table: the components' eigenvalues.

The next table displays each variable's loading on each component. We notice from the output that we have two items (y14 & y15) which do not load on the first component (always the strongest component without rotation) but create their own retained component (also with eigenvalue greater than 1). We know a component should have, as a minimum, 3 items/variables; but let's reserve deletion of items until we can discover whether or not our components are related.



To determine if our components are related, we can run a simple correlation on the saved component scores. Click on Analyze, Correlate, Bivariate...

Next, highlight all the REGR factor scores (really component scores) and use the arrow button to move them to the Variables: box. Then click the OK button.

Here we see there is NO relationship between the components, which indicates we should be using an orthogonal rotation strategy.
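
The same check can be sketched outside SPSS. The example below (simulated data, NumPy assumed, hypothetical variable names) computes standardized component scores from an unrotated solution and correlates them. Note that in this sketch the scores come out uncorrelated by construction, because unrotated principal components are orthogonal to one another.

```python
import numpy as np

# Simulated data: six items, three driven by each of two sources.
rng = np.random.default_rng(5)
n = 400
f = rng.normal(size=(n, 2))
pattern = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                    [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
X = f @ pattern.T + 0.6 * rng.normal(size=(n, 6))

# Standardize the items, then extract components from the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2  # components retained in this example
# Standardized component scores: one row per case, one column per component.
scores = Z @ eigvecs[:, :k] / np.sqrt(eigvals[:k])

score_corr = np.corrcoef(scores, rowvar=False)
print(np.round(score_corr, 3))
```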



PCA 2. Rotation imposed. Next, we re-run the PCA specifying 5 components to be retained. We will also specify the VARIMAX rotation strategy, which is a form of orthogonal rotation.

Begin by clicking on Analyze, Dimension Reduction, Factor...

Next, you should see that the previous run is still specified; variables y1 through y15. Next click on Descriptives... and select the following; we no longer need the univariate descriptives, the correlation matrix, or the KMO and Bartlett's tests. Then click the Continue button. Next, click on the Extraction... button. We no longer need the scree plot; but we do need to change the number of components (here called factors) to extract. We know from the first run, there were 5 components with eigenvalues greater than one, so we select 5 factors to extract (meaning components). Then click the Continue button.

Next, click on Rotation... and select Varimax. Then click the Continue button. Then click on the Scores... button and remove the selection for Save as Variables. Then click the Continue button. Then click the OK button.

The Component Transformation Matrix simply displays the component rotation.



The first 3 tables in the output should be identical to what is displayed above from PCA 1, except now we have two new tables at the bottom of the output.

The rotated component matrix table shows which items/variables load on which components after rotation. We see that the rotation cleaned up the interpretation by eliminating the global first component. This provides a clear depiction of our principal components (marked with red ellipses).
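
To show what a varimax rotation does to a loading matrix with a global first component, here is a minimal Python sketch of a commonly used SVD-based varimax iteration. It is an illustration under stated assumptions, not SPSS's implementation: it omits the Kaiser normalization that SPSS reports applying, and the function names, simulated data, and settings (max_iter, tol) are all hypothetical.

```python
import numpy as np

def varimax(L, max_iter=200, tol=1e-8):
    """Rotate loadings L (items x components); return (rotated loadings, rotation matrix)."""
    p, k = L.shape
    T = np.eye(k)  # rotation matrix, kept orthogonal at every step
    d = 0.0
    for _ in range(max_iter):
        Lr = L @ T
        grad = L.T @ (Lr ** 3 - Lr @ np.diag(np.sum(Lr ** 2, axis=0)) / p)
        u, s, vt = np.linalg.svd(grad)
        T = u @ vt
        d_new = s.sum()
        if d_new < d * (1 + tol):  # stop when the objective no longer improves
            break
        d = d_new
    return L @ T, T

def varimax_criterion(L):
    """Sum over components of the variance of the squared loadings."""
    return float(np.sum(np.var(L ** 2, axis=0)))

# Simulated data: eight items in two groups that also share a general source,
# so the unrotated first component loads on every item.
rng = np.random.default_rng(3)
n = 500
g = rng.normal(size=(n, 1))
f = rng.normal(size=(n, 2))
group = np.repeat([0, 1], 4)  # items 0-3 in group 0, items 4-7 in group 1
X = 0.5 * g + 0.6 * f[:, group] + 0.5 * rng.normal(size=(n, 8))

R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2
unrotated = eigvecs[:, :k] * np.sqrt(eigvals[:k])
rotated, T = varimax(unrotated)

print("unrotated loadings:\n", np.round(unrotated, 2))
print("rotated loadings:\n", np.round(rotated, 2))
```

Because the rotation matrix is orthogonal, each item's communality is unchanged by the rotation; only the way the variance is distributed across components changes.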

PCA 3. Finally, we can eliminate the two items (y14 & y15) which (a) by themselves create a component (components should have more than 2 items or variables) and (b) do not load on the un-rotated or initial component 1. Again, click on Analyze, Dimension Reduction, then Factor...

All the communalities indicate 50% or more of the variance in each variable/item is explained by the combined four components, with one exception (y4) which is lower than what we would prefer.



Again, you'll notice the previous run is still specified; however, we need to remove the y14 and y15 variables. Next, click on Extraction... and change the number of factors to extract (really components) from 5 to 4. Then click the Continue button and then click the OK button.

The output should be similar to what is displayed below.

The Component Matrix table displays component loadings for each item (prior to rotation).

The Rotated Component Matrix displays the loadings for each item on each rotated component, again clearly showing which items make up each component.

And again, the Component Transformation Matrix displays the correlations among the components prior to and after rotation.



To help clarify the purpose of PCA, consider reviewing the "Total Variance Explained" table from PCA 1. The last column on the right of the table is called "Cumulative" and refers to the cumulative variance accounted for by the components. Now focus on the fifth value from the top in that column. That value of 55.032 tells us 55.032% of the variance in the items (specifically the items' variance-covariance matrix) is accounted for by all 5 components. As a comparison, and to highlight the purpose of PCA, look at the same table for PCA 3, which also has the title "Total Variance Explained". Pay particular attention to the fourth value in the last (cumulative) column. This value of 55.173 tells us 55.173% of the variance in the items (specifically the items' variance-covariance matrix) is accounted for by all 4 components. So, we have reduced the number of items from 15 to 13, reduced the number of components, and yet have improved the amount of variance accounted for in the items by our principal components.

REFERENCES / RESOURCES

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30, 179-185.

O'Rourke, N., Hatcher, L., & Stepanski, E.J. (2005). A step-by-step approach to using SAS for univariate and multivariate statistics, Second Edition. Cary, NC: SAS Institute Inc.

Patil, V. H., Singh, S. N., Mishra, S., & Donavan, D. T. (2007). Parallel Analysis Engine to Aid Determining Number of Factors to Retain [Computer software]. Retrieved 08/23/2009 from http://ires.ku.edu/~smishra/parallelengine.htm

Zwick, W. R., & Velicer, W. F. (1986). Factors influencing five rules for determining the number of components to retain. Psychological Bulletin, 99, 432-442.
