Authors
Martin Klint Hansen
Nicholas Fritsche
Rasmus Porsgaard
Per Surland
Ulrick Tøttrup
Niels Yding Sørensen
Cand. Merc. Manual for SPSS 16.0
Description
Contains the statistical methods used at master's level.
IT – department
Fall 2008
Table of Contents
1. Introduction
2. General facts about SPSS
   2.1 The contents of the Data Editor
       2.1.1 The File menu
       2.1.2 The Edit menu
       2.1.3 The View menu
       2.1.4 The Data menu
       2.1.5 The Transform menu
       2.1.6 The Analyze menu
       2.1.7 The Graphs menu
       2.1.8 The Utilities menu
       2.1.9 The Help menu
   2.2 Contents of the Output Window
3. Cluster analysis
   3.1 Introduction
   3.2 Hierarchical analysis of clusters
       3.2.1 Example
       3.2.2 Implementation of the analysis
       3.2.3 Output
   3.3 K-means cluster analysis (Non-hierarchical cluster analysis)
       3.3.1 Example
       3.3.2 Implementation of the analysis
       3.3.3 Output
4. Correspondence Analysis
   4.1 Introduction
   4.2 Example
   4.3 Implementation of the analysis
       4.3.1 Specification of the variables
       4.3.2 Model
       4.3.3 Statistics
       4.3.4 Plots
   4.4 Output
   4.5 HOMALS (Multiple correspondence analysis)
5. Multiple analysis of variance
   5.1 Introduction
   5.2 Example
   5.3 Implementation of the analysis
   5.4 Output
6. Discriminant analysis
   6.1 Introduction
   6.2 Example
   6.3 Implementation of the analysis
       6.3.1 Statistics
       6.3.2 Classify
       6.3.3 Save
   6.4 Output
7. Profile analysis
   7.1 Introduction
   7.2 Example
   7.3 Implementation of the analysis
       7.3.1 Are the profiles parallel?
       7.3.2 Are the profiles congruent?
       7.3.3 Are the profiles horizontal?
8. Factor analysis
   8.1 Introduction
   8.2 Example
   8.3 Implementation of the analysis
       8.3.1 Descriptives
       8.3.2 Extraction
       8.3.3 Rotation
       8.3.4 Scores
       8.3.5 Options
   8.4 Output
9. Multidimensional scaling (MDS)
   9.1 Introduction
   9.2 Example
   9.3 Implementation of the analysis
       9.3.1 Model
       9.3.2 Options
   9.4 Output
10. Conjoint Analysis
   10.1 Introduction
   10.2 Example
   10.3 Generating an orthogonal design
       10.3.1 Options
   10.4 PlanCards
   10.5 Conjoint analysis
   10.6 Output
11. Item analysis
   11.1 Introduction
   11.2 Example
   11.3 Implementation of the analysis
       11.3.1 Model
       11.3.2 Statistics
   11.4 Output
12. Introduction to Matrix algebra
   12.1 Initialization
   12.2 Importing variables into matrices
   12.3 Matrix computation
   12.4 General Matrix operations
   12.5 Matrix algebraic functions
APPENDIX A [Choice of Statistical Methods]
Literature List
1. Introduction
The purpose of this manual is to introduce the reader to the use of SPSS for the statistical analyses used at Master Level (Cand. Merc.). For data manipulation and basic statistical analysis (such as T-test, ANOVA, loglinear and logit loglinear analysis) please refer to the manual “Introduction to SPSS 16.0”.
The manual is to be considered a collection of examples. The idea is to introduce the reader to the
statistical methods by means of fairly simple examples based on a specific problem. The manual
suggests a solution and provides a short interpretation. However, it is important to emphasize that it
is no more than a suggestion, not a final solution. The data sets referred to can be found at the following server location (accessible to all students at The Aarhus School of Business): \\okf-filesrv1\exemp\Spss\Manual.
The manual provides a short description of each statistical analysis, including its main purpose, and gives an example of a situation where the test in question is relevant. However, no theoretical basis is provided, which means that the reader is expected to find the relevant theoretical literature him/herself. Inspiration can be found in the list of literature at the back of this manual, or at the beginning of each chapter.
The manual has been written for SPSS version 16.0. If other versions are used, minor differences may occur.
Suggestions for improvements or corrections can be handed in at the IT instructors’ office in room
H14 or sent by e-mail to the following address: [email protected]
IT Department, The Aarhus School of Business, Autumn 2007.
2. General facts about SPSS
SPSS consists of two parts: a Data Editor and an Output Window. In the Data Editor, data manipulation is carried out and commands are executed. The Output Window displays all results, tables and figures, while also operating as a log window.
2.1 The contents of the Data Editor
The menu bar is located at the top of the Data Editor, and each of its items is discussed below.
2.1.1 The File menu
This item is used for the administration of data, i.e. loading, saving, importing and exporting data
and printing. The options are roughly similar to those of other Windows based programs.
2.1.2 The Edit menu
Edit is used for editing the contents of the current window. Here, the cut, copy and paste functions
are found. In addition, the options function makes it possible to change the fonts used in the
outputs, the decimal separator, etc.
2.1.3 The View menu
In this menu it is possible to switch the status bar, gridlines, etc. on and off. The font and font size used in the Data Editor are defined here as well.
2.1.4 The Data menu
This is where various manipulations of data are made possible. Examples include the definition of
new variables (Define Variable…) and the sorting of variables (Sort Cases…), etc.
2.1.5 The Transform menu
The Transform menu gives the user the opportunity to recode variables, generate random numbers,
rank data sets (construct ordinal data), define missing values, etc.
2.1.6 The Analyze menu
In this menu the user chooses which statistical analysis to use. The table below lists the available analyses together with a short description.
Method of Analysis        Description
Reports                   Case and report summaries
Descriptive Statistics    Descriptive statistics, frequencies, P-P and Q-Q plots, etc.
Custom Tables             Construction of various tables
Compare Means             Comparison of mean values, e.g. by means of a T-test or ANOVA
General Linear Model      Estimation by means of GLM, including MANOVA
Generalized Linear Model  Extends the regression and General Linear Model facilities,
                          in particular to models for data that do not satisfy the
                          assumption of normally distributed errors and to models that
                          connect the response to the predictors through a link function
Mixed Models              Flexible modeling of correlated data and data with
                          non-constant variability
Correlate                 Various measures of association between the variables of the
                          data set, e.g. covariance and Pearson's correlation coefficient
Regression                Regression: linear, logistic and curve estimation
Loglinear                 General loglinear analysis and logit loglinear analysis
Classify                  Cluster and discriminant analysis
Data Reduction            Factor analysis and correspondence analysis
Scale                     Item analysis and multidimensional scaling
Nonparametric Tests       Chi-square and binomial tests, other hypothesis tests and
                          tests of independence
Time Series               Autoregression and ARIMA
Survival                  Kaplan-Meier, Cox regression, etc.
Multiple Response         Frequency tables and crosstabs for multiple-response data sets
Missing Value Analysis    Description of patterns in missing values
2.1.7 The Graphs menu
In the Graphs menu there are several ways to get a graphical overview of the data set. The options include the construction of histograms, line graphs, pie charts and box plots, as well as Pareto, P-P and Q-Q diagrams.
2.1.8 The Utilities menu
Here it is possible to obtain information about the different variables, e.g. type, length, etc. If only some of the variables are needed, the user can define a working set of variables based on the existing ones. This is done by clicking Define Sets and choosing the relevant variables, and then activating the set via Use Sets in the same menu.
2.1.9 The Help Menu
In the Help menu it is possible to seek help on how specific calculations and data manipulations are handled in SPSS. The most important submenu is Topics, where it is possible to carry out a keyword search using the index bar. You can also press the F1 key anywhere in SPSS to get help about the specific menu you are working in. If, for example, you are setting up a regression and need help with the Statistics dialogue, pressing F1 displays information about the different options within it.
2.2 Contents of the Output Window
As mentioned, the Output Window shows results, tables and figures and works as a log window. In
the Window menu the user can switch between the Data Editor and the Output Window.
On the left side of the Output Window a tree structure is displayed, which provides a good overview of the results, figures and tables. A specific output can be viewed by clicking on it in the tree structure, whereupon it appears on the right side of the screen. If an error occurs, a log item will pop up; clicking this item opens a window with a possible explanation of the problem.
3. Cluster analysis
The analysis and interpretation of cluster analysis below is based on the following literature:
- ”Videregående data-analyse med SPSS og AMOS”, Niels Blunch, 1st edition 2000, Systime. Chp. 1, pp. 3-29.
3.1 Introduction
Cluster analysis is a multivariate procedure for detecting groupings in data where no grouping is known in advance. It is often applied as an explorative technique. The purpose of a cluster analysis is to divide the units of the analysis into smaller clusters, so that the observations within a cluster are homogeneous while the observations in different clusters differ from each other in one way or another.
In contrast to discriminant analysis, the groups in a cluster analysis are unknown from the outset. This means that the groups are not based on pre-defined characteristics; instead they are created on the basis of the characteristics of the data material. It is important to note that cluster analysis should be used with extreme caution, since SPSS will always find a structure, whether or not one is actually present. The choice of method should therefore always be carefully considered, and the output should be critically examined before the result of the analysis is employed.
Cluster analysis in SPSS is divided into hierarchical and K-means (non-hierarchical) clustering. Examples of both types of analysis are given below.
3.2 Hierarchical analysis of clusters
In the hierarchical method, clustering begins by finding the closest pair of objects (cases or variables) according to a distance measure and combining them to form a cluster. The algorithm starts with each case (or variable) in a separate cluster and combines clusters one step at a time until all objects are placed in a single cluster. The method is called hierarchical because it does not allow single objects to leave a cluster once they have joined it; in later steps only entire clusters are combined. As indicated, the method can be used for both cases and variables.
3.2.1 Example
Data set: \\okf-filesrv1\exemp\Spss\Manual\Hierakiskdata.sav
Data on a number of economic variables has been registered for 15 randomly chosen countries. By means of hierarchical cluster analysis it is now possible to find out which countries are similar with regard to the variables in question; in other words, the task is to divide the countries into relevant clusters.
The chosen countries are:
- Argentina, Austria, Bangladesh, Bolivia, Brazil, Chile, Denmark, The Dominican Republic,
India, Indonesia, Italy, Japan, Norway, Paraguay and Switzerland.
The following variables are included in the analysis:
- urban (number of people living in cities), lifeexpf (average life span for women), literacy (number of people that can read), pop_incr (increase in population in % per year), babymort (mortality for newborns per 1000 newborns), calories (daily calorie intake), birth_rt (birth rate per capita), death_rt (death rate per 1000 inhabitants), log_gdp (the logarithm of GDP per capita), b_to_d (birth rate in proportion to mortality), fertilty (average number of babies), log_pop (the logarithm of the number of inhabitants).
Both the hierarchical and the non-hierarchical method are described in the following paragraphs. Note, however, that for both methods the cluster structure is built from the cases. As mentioned later, the hierarchical analysis can also be based on variables, but this is beyond the scope of the manual.
3.2.2 Implementation of the analysis
Choose the following commands in the menu bar:
Analyze → Classify → Hierarchical Cluster...
This opens the following dialogue box:
The relevant variables are chosen and added to the Variable(s) window (see below). Under Cluster it is chosen whether the analysis concerns observations or variables, i.e. whether the user wants to group variables or observations into clusters. Since this example concerns observations, Cases is chosen.
When the hierarchical analysis is based on cases, as in the example above, it is very important that the variable used to label the cases is defined as a string (in Variable View under Type).
Under Display it is possible to include plots and/or statistics in the output. The Statistics and Method dialogue boxes must be visited before the analysis can be carried out; once they have been, the Save button can be used as well. These buttons are described below:
3.2.2.1 Statistics
Choosing Agglomeration schedule identifies which clusters are combined at each iterative step.
Proximity matrix includes the distances or similarities between the units (cases or variables) in the output.
Cluster membership includes the cluster membership of each observation or variable in the output.
3.2.2.2 Plots
A dendrogram or an icicle plot can be chosen here. In this example as well as in the following, a
dendrogram will be chosen, since the interpretation of the cluster analysis is often based on this.
3.2.2.3 Method
In this dialogue box the methods on which the cluster analysis is based are indicated. In Cluster
method a number of methods can be chosen. The most important are described below:
• Nearest neighbour (single linkage): The distance between two clusters is calculated as the distance between the two closest elements (one from each cluster).
• Furthest neighbour (complete linkage): The distance between two clusters is calculated as
the distance between the two elements (one from each cluster) that are furthest apart.
• Between-groups linkage (average linkage): The distance between two clusters is calculated
as the average of all distance combinations (which is given by all distances between
elements from the two clusters).
• Ward’s method (Ward’s algorithm): For each cluster a stepwise minimization of the sum of
the squared distances to the centre (the centroid) within each cluster is carried out.
The result of the cluster analysis depends heavily on the chosen method. As a consequence, one
should try different methods before completing the analysis. In this example Between-groups
linkage has been chosen.
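The stepwise merging described above can be sketched in a few lines of Python. This is an illustrative toy, not SPSS's internal implementation; the distance matrix below is hypothetical, and the linkage function implements the between-groups (average) method chosen in this example:

```python
# Minimal agglomerative clustering with between-groups (average) linkage.
# Illustrative sketch only -- SPSS performs the same kind of stepwise
# merging on the full proximity matrix.

def average_linkage(dist, c1, c2):
    """Average of all pairwise distances between members of two clusters."""
    pairs = [dist[i][j] for i in c1 for j in c2]
    return sum(pairs) / len(pairs)

def agglomerate(dist):
    """Repeatedly merge the two closest clusters until one remains.
    Returns the merge history as (cluster_a, cluster_b, distance)."""
    clusters = [[i] for i in range(len(dist))]
    history = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(dist, clusters[a], clusters[b])
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        history.append((clusters[a][:], clusters[b][:], d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return history

# Hypothetical symmetric distance matrix for four cases.
D = [[0.0, 0.1, 0.9, 0.8],
     [0.1, 0.0, 0.7, 0.9],
     [0.9, 0.7, 0.0, 0.2],
     [0.8, 0.9, 0.2, 0.0]]
for a, b, d in agglomerate(D):
    print(a, b, round(d, 3))
```

Note how, once cases 0 and 1 are joined in the first step, later steps can only merge their whole cluster with others; this is the hierarchical property described in section 3.2.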
Measure provides the user with a choice of different distance measures depending on the type of data in question (interval data, counts or binary variables). In this example Interval has been chosen.
After this, the distance measure must be specified in the Interval drop-down box. A thorough discussion of these measures is beyond the scope of this manual, although several are relevant. It should be noted, however, that Pearson's correlation is often used in connection with variables, while Squared Euclidean distance is often used in connection with cases. The latter is the default in SPSS, and it is chosen in this example as well, since observations are considered here. When using Pearson's correlation one should be aware that larger values mean less distance (the measure ranges from -1 to 1).
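The two measures just mentioned can be sketched as follows. This is an illustrative sketch, not SPSS's implementation, and the two case profiles are hypothetical:

```python
# Squared Euclidean distance (the default for cases) and Pearson's
# correlation (a similarity measure: values near 1 mean the profiles
# are CLOSE). Illustrative sketch; the case profiles are hypothetical.

def squared_euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

case_a = [0.74, 0.96, 0.99]   # hypothetical rescaled variable values
case_b = [0.78, 0.93, 0.99]
print(squared_euclidean(case_a, case_b))   # small value: the cases are close
print(pearson(case_a, case_b))             # near 1: the profiles are similar
```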
Transform values is used for standardizing data before the calculation of the distance matrix, which is often relevant in cluster analysis. This is due to the fact that variables measured on large scales contribute more than variables measured on small scales to the calculated distances. Consequently, this matter should be carefully considered. In this case, the values are rescaled to the range 0 to 1.
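The 0-to-1 rescaling selected here amounts to a simple min-max transformation of each variable. A sketch with hypothetical calorie values, not SPSS's own code:

```python
# Min-max rescaling to the 0-1 range, as chosen under "Transform values".
# Assumes the variable is not constant (max > min). Sketch only; the
# calorie figures below are hypothetical.

def rescale_0_1(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

calories = [1667, 2140, 3216, 3428]   # hypothetical daily calorie intakes
print(rescale_0_1(calories))
```

After this transformation every variable runs from 0 to 1, so a variable such as calories (in the thousands) no longer dominates a variable such as pop_incr (a few percent) in the distance calculation.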
Transform Measures is used for transforming data after the calculation of the distance matrix,
which is not relevant in this case.
3.2.2.4 Save
This dialogue box gives the user the opportunity to save cluster relationships for observations or
variables as new variables in the data set. This option is ignored in the example.
3.2.3 Output
After the specification of the options the analysis is carried out. This results in a number of outputs,
of which three parts are commented on and interpreted below.
(1) Proximity matrix shows the distances that have been calculated in the analysis, in this case the
calculated distances between the countries. For example, the distance between Norway and
Denmark is 0.154 when using the Squared Euclidean distance. The cluster analysis itself is carried
out on the basis of this distance matrix. As can be seen, the matrix is symmetric:
Proximity Matrix

Case              1:Arg  2:Aut  3:Ban  4:Bol  5:Bra  6:Chi  7:Den  8:Dom  9:Ind 10:Idn 11:Ita 12:Jap 13:Nor 14:Par 15:Swi
1:Argentina        .000  1.008  4.883  2.233   .442   .423  1.024   .936  3.164  1.353   .877   .836   .694  2.286   .840
2:Austria         1.008   .000  7.671  4.809  1.885  1.895   .184  2.667  5.653  2.746   .191   .742   .135  4.659   .128
3:Bangladesh      4.883  7.671   .000  1.398  3.032  5.109  8.498  3.024   .484  1.727  7.886  7.867  7.655  4.089  7.830
4:Bolivia         2.233  4.809  1.398   .000  1.590  1.883  5.276   .788  1.367  1.261  5.157  4.817  4.318  1.370  4.600
5:Brazil           .442  1.885  3.032  1.590   .000   .921  2.106   .787  1.665   .547  1.611  1.452  1.717  2.572  1.837
6:Chile            .423  1.895  5.109  1.883   .921   .000  2.081   .446  3.543  1.721  1.771  1.217  1.341  1.309  1.441
7:Denmark         1.024   .184  8.498  5.276  2.106  2.081   .000  3.125  6.437  3.488   .355   .994   .154  5.243   .342
8:Dominican R.     .936  2.667  3.024   .788   .787   .446  3.125   .000  2.160   .911  2.772  2.286  2.231   .920  2.311
9:India           3.164  5.653   .484  1.367  1.665  3.543  6.437  2.160   .000   .739  5.479  5.270  5.684  3.344  5.763
10:Indonesia      1.353  2.746  1.727  1.261   .547  1.721  3.488   .911   .739   .000  2.657  2.639  2.872  2.320  2.759
11:Italy           .877   .191  7.886  5.157  1.611  1.771   .355  2.772  5.479  2.657   .000   .312   .311  4.883   .236
12:Japan           .836   .742  7.867  4.817  1.452  1.217   .994  2.286  5.270  2.639   .312   .000   .634  4.245   .561
13:Norway          .694   .135  7.655  4.318  1.717  1.341   .154  2.231  5.684  2.872   .311   .634   .000  4.013   .112
14:Paraguay       2.286  4.659  4.089  1.370  2.572  1.309  5.243   .920  3.344  2.320  4.883  4.245  4.013   .000  3.952
15:Switzerland     .840   .128  7.830  4.600  1.837  1.441   .342  2.311  5.763  2.759   .236   .561   .112  3.952   .000

Squared Euclidean Distance. This is a dissimilarity matrix.
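The distance measure behind the matrix is easy to state: the Squared Euclidean distance between two observations is the sum of squared differences over their (standardized) variables. A minimal Python sketch, using made-up profiles rather than the manual's actual country data:

```python
def squared_euclidean(a, b):
    """Squared Euclidean distance: the sum of squared coordinate differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical standardized profiles for two countries (illustrative values
# only, not the data behind the matrix above).
country_a = [0.9, 1.1, -0.2]
country_b = [0.7, 1.0, -0.3]
d = squared_euclidean(country_a, country_b)
```

Note that, unlike the ordinary Euclidean distance, no square root is taken, which is why the measure penalizes large single-variable differences heavily.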
The next output is the (2) Agglomeration Schedule, which shows at which stage the individual observations or clusters are combined. In the example, observation 13 (Norway) is first combined with observation 15 (Switzerland), after which observation 2 (Austria) is joined to the very same cluster. This procedure continues until all observations are combined. The Coefficients column shows the distance between the observations/clusters that are combined.
Agglomeration Schedule

         Cluster Combined                  Stage Cluster First Appears
Stage   Cluster 1   Cluster 2   Coefficients   Cluster 1   Cluster 2   Next Stage
1          13          15           .112            0           0           2
2           2          13           .132            0           1           3
3           2           7           .227            2           0           4
4           2          11           .273            3           0           8
5           1           6           .423            0           0           9
6           3           9           .484            0           0          14
7           5          10           .547            0           0          10
8           2          12           .649            4           0          13
9           1           8           .691            5           0          10
10          1           5          1.023            9           7          12
11          4          14          1.370            0           0          12
12          1           4          1.716           10          11          13
13          1           2          2.718           12           8          14
14          1           3          4.651           13           6           0
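The agglomeration logic itself can be sketched in a few lines: every observation starts as its own cluster, and at each stage the two closest clusters are merged and one schedule row is recorded. Single linkage is used below purely for brevity; SPSS offers several linkage methods, and the choice affects the schedule:

```python
def agglomerate(dist):
    """Repeatedly merge the two closest clusters until one remains.
    Single linkage (smallest pointwise distance) is assumed here."""
    clusters = {i: {i} for i in range(len(dist))}
    schedule = []
    while len(clusters) > 1:
        pairs = [(a, b) for a in clusters for b in clusters if a < b]
        # single-linkage distance between two clusters
        def link(a, b):
            return min(dist[i][j] for i in clusters[a] for j in clusters[b])
        a, b = min(pairs, key=lambda p: link(*p))
        schedule.append((a, b, link(a, b)))   # one row of the schedule
        clusters[a] |= clusters.pop(b)        # cluster b is absorbed into a
    return schedule

# Tiny symmetric toy matrix (not the manual's data)
toy = [[0.0, 0.1, 0.9],
       [0.1, 0.0, 0.8],
       [0.9, 0.8, 0.0]]
```

Running `agglomerate(toy)` merges observations 0 and 1 first, then attaches observation 2, mirroring how the schedule above grows one merge per stage.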
The (3) Dendrogram shown below is a simple way of presenting the cluster analysis:
The dendrogram indicates that there are three clusters in the data set. These are:
• Norway, Denmark, Austria, Switzerland, Italy and Japan (i.e. the European countries and
Japan).
• Brazil, Argentina, Chile, The Dominican Republic, Bolivia, Paraguay and Indonesia (i.e. the
South American countries and Indonesia).
• India and Bangladesh (i.e. the Asian countries).
The cluster structure is found by making a cut through the dendrogram where the distance between successive merges is relatively large. In the example above the cut is made between distances 10 and 15 on the rescaled distance axis. This structure is very much in accordance with the prior expectations one might have regarding the economic variables in question.
3.3 K-means cluster analysis (Non-hierarchical cluster analysis)
Contrary to hierarchical cluster analysis, non-hierarchical cluster analysis does allow observations to move from one cluster to another while the analysis is proceeding. The method is based on an iterative procedure in which every single observation is grouped into a fixed number of clusters until the ratio between the variance between the clusters and the variance within the clusters is maximized. The procedure itself will not be explained further here. It is noted, however, that the user has the opportunity to choose the number of clusters as well as so-called cluster centres, which will often be an advantage.
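The iterative idea can be sketched as follows (a toy one-dimensional version, not SPSS's implementation): points are assigned to the nearest centre, the centres are recomputed as group means, and the two steps repeat:

```python
def kmeans(points, centers, iterations=10):
    """Toy one-dimensional k-means: assign each point to the nearest
    centre, then recompute each centre as the mean of its points."""
    for _ in range(iterations):
        groups = {i: [] for i in range(len(centers))}
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: (p - centers[i]) ** 2)
            groups[nearest].append(p)
        # a centre keeps its position if no point was assigned to it
        centers = [sum(g) / len(g) if g else centers[i] for i, g in groups.items()]
    return centers

# Two obvious clumps and two deliberately poor starting centres
final = kmeans([1, 2, 0, 9, 10, 11], centers=[0.0, 12.0])
```

Even with poor starting centres, the centres quickly migrate to the two clumps, which illustrates why the choice of initial cluster centres matters less than in the hierarchical case, where merges are irreversible.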
The non-hierarchical cluster analysis is often used in large data sets since the method is capable of
treating more observations than the hierarchical method. Apart from this the method is only used in
analysis regarding observations and not analysis regarding variables.
3.3.1 Example
Data: \\okf-filesrv1\exemp\Spss\Manual\K-meansdata.sav
The analysis is based on the same example as the hierarchical cluster analysis in section 3.2.1.
3.3.2 Implementation of the analysis
Before the cluster analysis is carried out it is necessary to standardize the variables from the previous example. As mentioned earlier, this is done to prevent variables with large values from carrying too much weight in the analysis. When using the k-means method it is not possible to make any
transformations during the analysis, as is possible in the hierarchical method. Therefore, this must be done prior to the analysis.
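The standardization performed here is the usual z-score transformation, which Descriptives can save as new variables. A sketch of what it computes:

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize to mean 0 and standard deviation 1 (the z-scores that
    Descriptives' 'Save standardized values as variables' option produces)."""
    m, s = mean(values), stdev(values)   # stdev uses the n-1 denominator, as SPSS does
    return [(v - m) / s for v in values]

zs = z_scores([2, 4, 6, 8])
```

After the transformation every variable contributes on the same scale, so no single variable can dominate the distance calculations.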
Choose Analyze – Descriptive statistics – Descriptives from the menu bar. This results in a dialogue
box, which should be completed as follows:
After this it is time to carry out the analysis on the basis of the new standardized variables.
Choose Analyze – Classify – K-Means Cluster from the menu bar:
This results in the following dialogue box:
Here all the standardized variables are moved to the Variables window. Since the names of the
countries are meant to be included in the output, the variable Country is moved to the Label Cases
by window.
As opposed to the hierarchical cluster analysis, the non-hierarchical cluster analysis does not require the variable used to label the cases (in this case country) to be defined as a string.
Method is set to Iterate and classify as default, since the classification is to be performed on the
basis of iterations.
In that connection it must be decided how many clusters are wanted as a result of the iteration
process. This is chosen in the Number of Clusters window. The number of clusters depends on the
data set as well as on preliminary considerations. From the result of the hierarchical analysis there is
reason to believe that the number of clusters is three, so this is the number that has been entered. If
one has detailed knowledge of the number and characteristics of expected clusters, it is possible to
define cluster centres in Centers.
In addition to the options described above there are a number of ways in which the user can influence the iteration procedure and the output. These are dealt with below.
3.3.2.1 Iterate
Clicking the Iterate button results in the following dialogue box with three changeable settings:
Maximum Iterations sets an upper limit on the number of iterations in the algorithm; the value can range between 1 and 999. The default is 10, and this is used in this example as well.
Convergence Criterion defines the limit at which the iteration is ended. The value must be between 0 and 1. In this example a value of 0.001 is used, which means that the iteration ends when none of the cluster centres moves by more than 0.1 percent of the shortest distance between two cluster centres.
If Use running means is selected, the cluster means are recalculated each time an observation has been assigned to a cluster. By default this only happens when all observations have been assigned to the clusters.
3.3.2.2 Save
Under Save it is possible to save the cluster number in a new variable in the data set. In addition it is
possible to create a new variable that indicates the distance between the specific observation and the
mean value of the cluster. See the dialogue box below:
3.3.2.3 Options
Clicking Options gives the user the following possibilities:
Initial cluster centers are the initial estimates of the cluster mean values (before any iterations).
ANOVA table carries out an F-test for the cluster variables. However, since the observations are assigned to clusters precisely so as to separate them, it is not advisable to use this test of the null hypothesis that there is no difference between the clusters.
Cluster information for each case ensures that a cluster number is assigned to each observation –
i.e. information about the cluster relationship. Furthermore, it produces information about the
distance between the observation and the mean value of the cluster.
Choosing Exclude cases listwise in the Missing Values section means that a respondent will be excluded from the calculation of the clusters if there is a missing value for any of the variables. Exclude cases pairwise will classify a respondent into the closest cluster on the basis of the remaining variables. Therefore, a respondent will always be classified unless there are missing values for all variables.
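The difference between the two strategies can be illustrated with a sketch of pairwise classification: distances are computed only over the non-missing variables. (Averaging over the available variables, as done below, is a simplifying assumption for the sketch, not SPSS's documented formula.)

```python
def classify(observation, centers):
    """Assign an observation to the nearest cluster centre, using only the
    variables that are not missing (None), like 'Exclude cases pairwise'."""
    def dist(obs, center):
        pairs = [(o, c) for o, c in zip(obs, center) if o is not None]
        # average squared difference over the available variables
        return sum((o - c) ** 2 for o, c in pairs) / len(pairs) if pairs else float('inf')
    return min(range(len(centers)), key=lambda i: dist(observation, centers[i]))

centers = [[0.0, 0.0], [5.0, 5.0]]   # hypothetical cluster centres
cluster = classify([4.5, None], centers)   # classified despite a missing value
```

Under listwise exclusion the observation `[4.5, None]` would simply be dropped; under pairwise exclusion it is still assigned, here to the second cluster.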
3.3.3 Output
The analysis results in a number of outputs, of which the most relevant will be interpreted and commented on below.
The output (1) Cluster Membership shows, as the name indicates, which cluster each country belongs to. From the output it is obvious that, except for a few changes, the cluster structure is identical to that of the hierarchical method. Distance is a measure of how far the individual country is located from the centre of its cluster. The output shows that Brazil is the country located furthest away from its cluster centre, at a distance of 3.075.
The next output to be commented on is (2) Final Cluster Centers. From this output the mean value of the standardized variables for each of the three clusters appears. It is therefore possible to compare the clusters on the basis of each variable.
The last output, which will be commented on is (3) Distances between Final Cluster Centers.
Distances between Final Cluster Centers

Cluster        1        2        3
1                   4.301    4.321
2          4.301             5.818
3          4.321    5.818
This output indicates the individual distances between the clusters. In this example clusters 1 and 2 are most alike. A closer look at output (2) is required in order to explore the differences and similarities in depth.
4. Correspondence Analysis
The following explanation and interpretation of the correspondence analysis is based on the
following literature:
- “SPSS Categories® 10.0”, Jacqueline J. Meulman & Willem J. Heiser, SPSS Inc., 1999. Chp. 1, pp. 10-11; Chp. 5, pp. 45-46; Chp. 11, p. 147.
For further reading see:
- “Theory and Applications of Correspondence Analysis”, Michael J. Greenacre, Academic Press Inc., 1984.
4.1 Introduction
The purpose of correspondence analysis is to make a graphical illustration of data in cross tables,
since the relation between rows and columns in a cross table can be hard to comprehend. Often, a
cross table does not create a well-arranged picture of the characteristics of a data set – i.e. the
relation between two variables. This is especially the case when nominal scaled data are considered
(data without any natural order or rank).
Factor analysis is normally perceived as the standard technique used to describe the connection between variables. However, an important requirement for factor analysis is that the data are interval scaled. Correspondence analysis makes it possible to perform an analysis of the relationship
between nominal scaled variables graphically on a multidimensional level (see appendix 1). Row
and column values are calculated and plots are produced based on this. Categories that are alike will
be placed close to each other in a plot. This makes the analysis a useful technique for market
segmentation.
Please note that it is possible to make a multiple correspondence analysis (homogeneity analysis)
with more than two variables. See the last section of this chapter.
4.2 Example¹
Data set: \\okf-filesrv1\exemp\Spss\Manual\Korrespondancedata.sav
A company wants to analyze which brands are preferred by different age groups.
¹ The example is based on Blunch, Niels, “Analyse af markedsdata”, 2nd rev. edition, 2000, Systime, pp. 70-78
• Brands: A, B, C and D
• Age groups: 20-29 years, 30-39 years, 40-49 years and >50 years
210 randomly selected respondents distributed across the age groups have been asked about their preference for the four brands. Firstly, it must be clarified whether the objective is to study the differences between the age groups with regard to their choice of brand, to map out the brands' composition in relation to age, or a combination of both. In the latter case, which is shown here, it is necessary to run the analysis twice. The first step is to carry out an analysis based on row profiles, or Row principal as it is called in SPSS. After this the analysis is carried out based on column profiles, or Column principal as SPSS calls it.
4.3 Implementation of the analysis
The starting point of the correspondence analysis, Analyze – Data Reduction – Correspondence Analysis, is shown below:
4.3.1 Specification of the variables
A dialogue box now appears, and it should be filled out as follows:
The age variable is moved to the Row window and the brand variable is moved to Column. The question marks next to the variables remain until the Range, i.e. the minimum and maximum values, has been defined.
This must be done for each variable by clicking the Define Range button. This results in the
dialogue box shown below:
For both variables in the data set Range is from 1 to 4.
The output from the correspondence analysis has yet to be specified. This is done by specifying the
settings in Model, Statistics and Plots. Each of these will be described separately below.
4.3.2 Model
By clicking the Model button the user can define which methods should be used for estimating the
model. First, the desired number of dimensions for the solution is defined. A two-dimensional
solution is desired in this example. Distance Measure deals with the method used for estimating the distance matrix. For a standard correspondence analysis such as this one, the Chi Square distance measure is chosen. This automatically activates Row and column means are removed as the Standardization Method.
The last setting has to do with the definition of the method by which normalization takes place. The
chosen method is important for the later interpretation of the analysis, which is discussed below (see
section 4.4 output).
As mentioned above, the analysis must be carried out twice to obtain a graphic picture of the row
and the column profiles and to provide an opportunity for a comparison of the two.
First, the analysis is carried out on the basis of the row profiles, because our first priority is to
examine the difference between the age groups with regard to brand choice. Therefore, Row
principal is chosen.
4.3.3 Statistics
Clicking the Statistics button makes it possible to calculate various statistical measures. Correspondence table displays a cross tabulation of the data. This table provides a very clear picture of the distribution in the individual groups. The Row and Column profiles are useful as well, so they are also selected. In Confidence Statistics it is possible to calculate standard deviations and correlations for either the row or the column profiles. Which additional measures to choose depends on the individual analysis and its purpose.
4.3.4 Plots
The Plots menu is divided into two parts. Scatterplots makes it possible to plot the individual
variables (Row and Column points) against the individual dimensions (in this case age and brand),
as well as to plot both variables against these dimensions (Biplot).
ID label width for scatterplots is used to define the maximum number of characters (letters) to be
used in the description of an individual point in the plot. When there are a lot of points in the plot, it
is often a good idea to reduce this number in order to avoid confusion in the scatterplot. In this case
a reduction is not necessary.
The second part of the Plots menu, Lineplots, makes it possible to plot the two variables individually against each of the dimensions. Transformed row categories produces a plot of the row categories (i.e. age) against the matching row scores. A similar plot with the column categories (i.e. brand) and the column scores is produced when Transformed column categories is chosen. Since Row principal was chosen above, one part of the scatterplot will be identical to the plot of the row profiles. Finally, it is possible to restrict the number of dimensions in the Plot Dimensions submenu. The analysis is carried out by clicking Continue followed by OK.
4.4 Output
The output from the correspondence analysis is discussed below. Some parts of the output are left
out, as they have not been considered relevant. The first output is (1) Correspondence table, which
is a cross tabulation of the data set.
Correspondence Table

                          brand
age               A     B     C     D   Active Margin
20-29 years      30    18     6     3        57
30-39 years      10    30    10     5        55
40-49 years       3    10    27     8        48
>50 years         2     2    40     6        50
Active Margin    45    60    83    22       210
The table indicates that 210 respondents are included in the analysis. Furthermore, it shows that
there are 55 respondents in the age group 30-39 years. Ten of these prefer brand A, 30 prefer brand
B, ten prefer brand C, and five people prefer brand D. It can be concluded that this age group is in
favor of brand B in contrast to the age group >50 years, where only two prefer brand B. For all age
groups in total there is a preference for brand C with 83 of the 210 respondents in favor of this
brand.
The next output, (2) Row Profiles, is a cross tabulation of the row profiles:
Row Profiles

                          brand
age               A      B      C      D    Active Margin
20-29 years    .526   .316   .105   .053    1.000
30-39 years    .182   .545   .182   .091    1.000
40-49 years    .063   .208   .563   .167    1.000
>50 years      .040   .040   .800   .120    1.000
Mass           .214   .286   .395   .105
The horizontal percentages display the difference between the individual age groups with regard to their brand choice. For example, 18.2 % in the age group 30-39 years prefer brand A, 54.5 % prefer B, 18.2 % prefer C, and 9.1 % prefer D. As mentioned above, the largest preference across all 210 respondents is brand C. As can be seen from the Mass row, this preference represents 39.5 %.
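The row profiles can be reproduced directly from the correspondence table by dividing each cell by its row total, e.g. in Python:

```python
# Counts taken from the Correspondence Table
table = {
    '20-29 years': [30, 18, 6, 3],
    '30-39 years': [10, 30, 10, 5],
    '40-49 years': [3, 10, 27, 8],
    '>50 years':   [2, 2, 40, 6],
}
# Each row is divided by its own total, so every profile sums to 1
row_profiles = {age: [round(n / sum(counts), 3) for n in counts]
                for age, counts in table.items()}
```

The column profiles in output (3) are obtained the same way, only dividing by column totals instead.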
A cross tabulation of the column profiles is found in (3) Column Profiles:
Column Profiles

                          brand
age               A      B      C      D     Mass
20-29 years    .667   .300   .072   .136     .271
30-39 years    .222   .500   .120   .227     .262
40-49 years    .067   .167   .325   .364     .229
>50 years      .044   .033   .482   .273     .238
Active Margin 1.000  1.000  1.000  1.000
The vertical percentages display the age composition of each individual brand. For example, of all
the 45 respondents preferring brand A, 66.7 % belong to the age group 20-29 years, 22.2 % belong
to the age group 30-39 years, 6.7 % belong to the age group 40-49 years, and only 4.4 % belong to
the age group >50 years.
In order to determine the degree of pure coincidence in the data set with regard to the differences
described above, a χ2-test is carried out. It tests a homogeneity hypothesis, i.e. a test of whether the
individual age groups can be assumed to be alike. The result is to be found in (4) Summary:
Summary

           Singular                               Proportion of Inertia       Confidence Singular Value
Dimension  Value     Inertia  Chi Square  Sig.    Accounted for  Cumulative   Std. Deviation  Correlation 2
1          .648      .420                         .808           .808         .048            .153
2          .309      .095                         .184           .991         .075
3          .068      .005                         .009          1.000
Total                .520     109.191     .000a  1.000          1.000

a. 9 degrees of freedom
It appears from the output that the hypothesis of equal market profiles for the four age groups must be rejected. This conclusion is based on a χ2-value of 109.19 with a corresponding p-value of 0.000. Therefore, the profiles can be expected to be different, and in the plot below they will be separated from each other. Had the p-value been insignificant, the dispersion would have been small, and it would not have been possible to conclude that the preferences differ. (4) Summary also shows how much of the dispersion can be retained in a plot of only two dimensions. The first dimension explains 80.8 % of the inertia, and the second dimension 18.4 %, leaving only 0.9 % for the third, excluded, dimension.
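The χ2-value can be verified directly from the correspondence table with the usual formula: chi2 is the sum over all cells of (observed − expected)²/expected, where expected = row total × column total / grand total:

```python
# Counts from the Correspondence Table
observed = [[30, 18, 6, 3],
            [10, 30, 10, 5],
            [3, 10, 27, 8],
            [2, 2, 40, 6]]
row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
n = sum(row_totals)   # grand total: 210 respondents

chi2 = sum((observed[i][j] - row_totals[i] * col_totals[j] / n) ** 2
           / (row_totals[i] * col_totals[j] / n)
           for i in range(4) for j in range(4))
print(round(chi2, 2))   # 109.19, as reported in the Summary output
```

Note that the statistic also equals the grand total times the total inertia, 210 × 0.520 ≈ 109.2, which ties the Summary's Inertia and Chi Square columns together.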
Below the interesting Biplot (5) Row and Column Points/Row Principal Normalization is shown:
The plot shows the row and the column profiles together. However, it is only possible to compare
the row points (age) individually, since Row Principal was chosen as the normalization method.
The smaller the distances between the age groups, the more similar they are with regard to brand
preferences and vice versa. In this case the age groups 40-49 years and >50 years are most alike.
However, it is not possible to conclude anything on the basis of the location of the brands.
As mentioned above, the analysis must be run twice in order to compare the individual column
points (brand). In the second analysis Column Principal should be chosen as normalization method.
The procedure was shown in section 4.3.2 in the dialogue box Correspondence Analysis: Model:
Part of the dialogue box Correspondence Analysis: Model (see section 4.3.2).
This results in a new Biplot (6) Row and Column Points/Column Principal Normalization:
The plot shows the row and the column profiles together. Again it is only possible to compare the column points (brand) individually, since Column Principal has now been chosen as the normalization method. From the plot it appears that brands C and D are most alike with regard to the age groups to which they appeal.
Unfortunately, it is not possible to create “The French plot”, a combination of the two plots (5) and
(6), directly in SPSS. However, this can be done from the output instead. From the analysis where
Row Principal was used, the coordinates for both the first and the second dimension for the row
variable (age) can be viewed. This is shown in (7) Overview Row Points below.
Overview Row Points(a)

                                                       Contribution
                       Score in Dimension    Of Point to Inertia   Of Dimension to Inertia
                                             of Dimension          of Point
age            Mass      1       2   Inertia     1       2            1       2     Total
20-29 years    .271   -.756    .354     .189   .369    .356         .820    .180    1.000
30-39 years    .262   -.399   -.445     .094   .099    .542         .443    .552     .995
40-49 years    .229    .462   -.097     .054   .116    .022         .906    .040     .946
>50 years      .238    .856    .179     .183   .416    .080         .952    .041     .993
Active Total  1.000                     .520  1.000   1.000

a. Row Principal normalization
After this the coordinates for both dimensions for the column variable (brand) can be viewed from
the analysis, where Column Principal was used. This is shown in (8) Overview Column Points
below.
Overview Column Points(a)

                                                       Contribution
                       Score in Dimension    Of Point to Inertia   Of Dimension to Inertia
                                             of Dimension          of Point
brand          Mass      1       2   Inertia     1       2            1       2     Total
A              .214   -.808    .448     .183   .333    .451         .764    .236    1.000
B              .286   -.494   -.409     .118   .166    .500         .593    .405     .998
C              .395    .710    .086     .203   .475    .031         .983    .014     .998
D              .105    .321   -.127     .016   .026    .018         .657    .103     .761
Active Total  1.000                     .520  1.000   1.000

a. Column Principal normalization
When the coordinates for each point (in this case there are eight points in total, i.e. four age groups and four brands) have been found, they can be plotted in a coordinate system (Excel can easily be used). This procedure, as well as the interpretation of "The French plot", is left to the reader².
² Blunch, Niels J.: “Analyse af markedsdata”, p. 70.
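As a sketch of the data the reader would assemble, the eight coordinate pairs from outputs (7) and (8) can be gathered into one structure ready for plotting or export:

```python
# Dimension-1 and dimension-2 scores copied from the two Overview outputs
row_points = {                    # from Overview Row Points (Row Principal)
    '20-29 years': (-0.756, 0.354),
    '30-39 years': (-0.399, -0.445),
    '40-49 years': (0.462, -0.097),
    '>50 years':   (0.856, 0.179),
}
col_points = {                    # from Overview Column Points (Column Principal)
    'A': (-0.808, 0.448),
    'B': (-0.494, -0.409),
    'C': (0.710, 0.086),
    'D': (0.321, -0.127),
}
# All eight labelled points of "The French plot" in one structure
french_plot = {**row_points, **col_points}
```

Plotting these points in one coordinate system, e.g. with Excel as suggested above, yields the combined plot.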
4.5 HOMALS (Multiple correspondence analysis)
As mentioned in the beginning, it is also possible to make a multiple correspondence analysis in SPSS, which will be briefly presented here. HOMALS is an abbreviation for homogeneity analysis by means of alternating least squares. The purpose of HOMALS is to find a number of homogeneous groups. The analysis attempts to produce a solution where groups within the same category will be close to each other, while groups that are different from each other will be plotted far apart.
An important difference between the multiple correspondence analysis and the ordinary correspondence analysis is that the input for the multiple analysis is a data matrix, where the rows represent objects and the columns represent variables. In the ordinary correspondence analysis, however, it is possible for both rows and columns to represent variables. The multiple correspondence analysis makes an analysis with more than two variables possible, which is not possible in ordinary correspondence analysis.
The purpose of multiple correspondence analysis is to describe the relationship between two or
more nominal scaled variables on a multidimensional level, containing the categories of variables as
well as the objects in these categories. It is a qualitative factor analysis where nominal scaled
variables are measured.
As an example, multiple correspondence analysis can be used to graphically illustrate the relation between employment, income and gender. It is possible that the analysis will reveal that income and gender discriminate between the respondents, but that there is no difference with regard to employment. The implementation of the analysis will not be shown in this manual.
5. Multiple analysis of variance
The following explanation and interpretation of the multiple analysis of variance is based on the following literature:
- ”Videregående data-analyse med SPSS og AMOS”, Niels Blunch, 1st edition, 2000, Systime. Chp. 3, pp. 51-67.
- ”Analyse af markedsdata”, Niels Blunch, 2nd rev. edition, 2000, Systime. Chp. 6, pp. 213-220.
- ”Multivariate data analysis”, Hair, Anderson et al., 5th edition, Prentice Hall. Chp. 6, pp. 326-387.
5.1 Introduction
Multiple analysis of variance (MANOVA) is an extension of the traditional analysis of variance (ANOVA) and is used in situations where there is more than one dependent variable, each depending on one or more explanatory variables (see appendix 1). Similar to ANOVA, the method is used to measure or prove differences between groups or experimental treatments.
The difference between the two methods is that ANOVA identifies group differences based on one single dependent variable. MANOVA, in contrast, identifies differences simultaneously across several dependent variables. In other words, MANOVA clarifies differences between groups while at the same time considering the correlations between the dependent variables. Notice that it is not sufficient to carry out an ANOVA test for each of the dependent variables separately: there will be situations where MANOVA is significant despite the fact that all ANOVA tests are insignificant. Likewise the opposite can occur, where an ANOVA test is significant and the MANOVA is insignificant.
A short note on the connection between MANOVA and discriminant analysis: provided that MANOVA turns out to be significant, it will be possible to find a linear function of the dependent variables that separates the groups. Hereby, at least one of the discriminant functions is significant.
Therefore, MANOVA and discriminant analysis are opposite analyses. In MANOVA the groups are
known in advance, and the objective is to find their relationship with the values of the variables. In
discriminant analysis the objective is to find the groupings based on the values of the variables.
Obviously discriminant analysis, mentioned in chapter 4, should only be employed when the
MANOVA test is significant.
5.2 Example
Data set: \\okf-filesrv1\exemp\Spss\Manual\Manovadata.sav
The objective is to test whether the effect of two pills given at the same time is more effective than
the intake of only one of the pills. 20 patients with the disease in question are distributed into four
groups by lot.
In group 1 the patients get a placebo, in group 2 they get both pills, in group 3 the patients get one
of the pills and in group 4 they get the other pill. The effect is measured on two variables: Y1 and
Y2. The data is shown in table 1 below:
Table 1

               Treatments (Groups)
           1          2          3          4
         Y1  Y2     Y1  Y2     Y1  Y2     Y1  Y2
          1   2      8   9      2   4      4   5
          2   1      9   8      3   2      3   3
          3   2      7   9      3   3      3   4
          2   3      8   9      3   5      5   6
          2   2      8  10      4   6      5   7
Means     2   2      8   9      3   4      4   5
The null hypothesis for this multivariate analysis of variance is that the treatment has no influence
on either Y1 or Y2. The alternative hypothesis is that the treatment can explain the effects on Y1 and
Y2.
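As a sketch of the statistic behind this test, Wilks' Λ can be computed from the data in Table 1 as det(W)/det(T), where W is the within-groups SSCP (sums-of-squares-and-cross-products) matrix and T the total SSCP matrix. SPSS reports Wilks' lambda in its Multivariate Tests output; the code below is an illustration of the arithmetic, not the manual's own procedure:

```python
# (Y1, Y2) per patient, copied from Table 1
groups = {
    1: [(1, 2), (2, 1), (3, 2), (2, 3), (2, 2)],   # placebo
    2: [(8, 9), (9, 8), (7, 9), (8, 9), (8, 10)],  # both pills
    3: [(2, 4), (3, 2), (3, 3), (3, 5), (4, 6)],   # pill A
    4: [(4, 5), (3, 3), (3, 4), (5, 6), (5, 7)],   # pill B
}

def mean2(pairs):
    n = len(pairs)
    return (sum(p[0] for p in pairs) / n, sum(p[1] for p in pairs) / n)

def sscp(pairs, center):
    """2x2 sums-of-squares-and-cross-products matrix around a given centre."""
    s11 = sum((y1 - center[0]) ** 2 for y1, _ in pairs)
    s22 = sum((y2 - center[1]) ** 2 for _, y2 in pairs)
    s12 = sum((y1 - center[0]) * (y2 - center[1]) for y1, y2 in pairs)
    return [[s11, s12], [s12, s22]]

def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def det(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

# W: deviations taken around each group's own mean
within = [[0, 0], [0, 0]]
for pairs in groups.values():
    within = add(within, sscp(pairs, mean2(pairs)))

# T: deviations taken around the grand mean of all 20 patients
everyone = [p for pairs in groups.values() for p in pairs]
total = sscp(everyone, mean2(everyone))

wilks = det(within) / det(total)   # small values speak against the null hypothesis
```

For these data Λ is small (about 0.07), which anticipates the rejection of the null hypothesis in the output section below.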
5.3 Implementation of the analysis
The starting point of the MANOVA, Analyze – General Linear Model – Multivariate, is shown below:
When choosing GLM Multivariate a dialogue box with various options appears as shown below.
First the dependent variables must be chosen. In this case it is Y1 and Y2, so they are marked and
moved to the Dependent Variables window. All dependent variables must be interval scaled.
After this, the group variable(s) that determine the groups to be compared on the basis of the dependent variables must be defined in the Fixed Factor(s) window. In this example the group variable (treatment) is moved to the window, so that the four groups are compared with regard to Y1 and Y2.
In the Covariate(s) window, covariates can be added. The linear effect of the covariates on the dependent variables will in that case be removed separately from the factor level effects. In this example such variables are assumed to be non-existent.
In the WLS Weight window a variable can be added in order to assign different weights to the
observations. However, this is not relevant in this example.
The output and the estimation of the MANOVA have yet to be specified. This will be done by
describing the Model, Contrast, Plots, Post Hoc and Options buttons individually below.
5.3.1.1 Model
Clicking the Model button results in the following dialogue box, where the model can be specified.
In Specify Model the default setting is Full factorial, which means that a full model is specified –
i.e. a model containing the intercept, all factor and covariate main effects as well as all interactions.
The alternative is Custom, where the model is specified manually – i.e. terms from the full model
can be omitted. If Custom is chosen, the desired factors must be marked and moved to the window
in the right-hand side of the dialogue box. In this example a Full factorial model has been chosen.
At the bottom of the dialogue box it is possible to specify the method by which the sums of squares are calculated. The options are Type I, II, III or IV. Type III is the default, and it is also the most widely used method at the Aarhus School of Business. Therefore, Type III is chosen for this example as well.
Before clicking Continue it is a good idea to make sure that Include intercept in model is activated.
This ensures the calculation of an overall mean, which is required if Deviation is chosen as the
estimation method for the factors (see next section).
5.3.1.2 Contrast
Clicking the Contrast button results in the following dialogue box:
In Change Contrast the estimation method for each factor can be changed. No method is chosen by
default, but the user can choose between six different ones: Deviation, Simple, Difference, Helmert,
Repeated and Polynomial. A method is chosen by marking the factor in the Factors window and
clicking the appropriate method in the Contrast drop-down-box. The chosen method depends on
each individual data set. If Deviation or Simple is chosen, SPSS requires a specification of a
Reference Category (either the first or last variable in the data set). In this example Deviation has
been chosen.
5.3.1.3 Plots
Clicking this button makes it possible to produce profile plots for each factor; a separate plot is produced for each dependent variable. In Horizontal Axis the user must specify the factor to be plotted. When the chosen factor has been moved from Factors to Horizontal Axis, the Add button must be clicked for the factor to appear in the Plots window at the bottom.
When there are several factors (grouping variables), the described procedure is repeated for all
factors, of which a profile plot is desired. Furthermore, it is possible to use Separate Lines for each
variable to combine two factors with each other in the plot. In this example where there is only one
factor, the factor (group) is moved to Horizontal Axis, and the Add button is clicked.
5.3.1.4 Post Hoc
When it has been determined that there is a difference between two or more groups, Post Hoc
Multiple Comparison can be used to determine which groups are significantly different from each
other. It is worth mentioning that the chosen tests are only carried out for each dependent variable
separately, which is similar to the comparisons that are made in ANOVA. In order to specify which tests to use, the factors to be tested must first be specified. This is done at the top of the dialogue box:
There are several tests, categorized on the basis of whether or not equal variances in the groups are assumed. A thorough discussion of each test is beyond the scope of this manual, but at the Aarhus School of Business the tests that have been activated in the dialogue box shown above are normally used.
5.3.1.5 Options
Clicking the Options button results in the dialogue box shown below. In Estimated Marginal Means
the factors or factor combinations (in the case of several factors) are chosen, for which an estimate
is desired. Since this example only contains a single factor, group is moved to the Display means
for window.
Display offers various ways of controlling the output from the GLM procedure. The boxes that are
selected in the dialogue box above will be described below.
• Descriptive statistics displays the observed mean values and standard deviations for each dependent variable in all factor combinations.
• Estimates of effect size produces a partial eta-squared value for each effect and each parameter estimate.
• Observed power indicates the power of the individual tests.
• Parameter estimates displays the parameter estimates, standard errors, t-tests, confidence
intervals, and the observed power for each test.
• SSCP matrices display the sum-of-squares and cross-product matrices. When Residual
SSCP is chosen, a Bartlett’s test for sphericity of the residual covariance matrix is added.
• Homogeneity tests carry out Levene’s test for variance homogeneity for each dependent variable across the existing factor levels/combinations. In addition, Box’s M test for homogeneity of the covariance matrices of the dependent variables across all factor combinations is carried out.
• Spread vs. level plots and residual plots are very useful for checking the model
assumptions. Residual plots produce an observed/expected standardized residual plot for
each dependent variable.
Furthermore, it is possible to change the level of significance.
5.4 Output
When the settings described above have been specified, click the OK button in the initial Multivariate dialogue box in order for SPSS to carry out the analysis and produce the output. The outputs, with the exception of a few that have been considered irrelevant for discussion, will be commented on step by step below.
Between-Subjects Factors

gruppe   Value Label    N
1        placebo        5
2        begge piller   5
3        pille A        5
4        pille B        5

Descriptive Statistics

     gruppe         Mean   Std. Deviation   N
y1   placebo        2.00   .707             5
     begge piller   8.00   .707             5
     pille A        3.00   .707             5
     pille B        4.00   1.000            5
     Total          4.25   2.447            20
y2   placebo        2.00   .707             5
     begge piller   9.00   .707             5
     pille A        4.00   1.581            5
     pille B        5.00   1.581            5
     Total          5.00   2.847            20
The outputs shown above contain a description of the data set, among other things the number of observations in each group, output (1a). Output (1b) contains some descriptive statistics, which will not be discussed further.
Output (2) displays Box’s test, which indicates whether or not the covariance matrices for the
individual groups are equal. This constitutes one of the assumptions of MANOVA, and it is similar
to the assumption regarding equal variances for all groups known from ANOVA. From the shown
output it appears that the null hypothesis of equal covariance matrices is accepted, which means that
the assumption is not violated.
Box's Test of Equality of Covariance Matrices(a)

Box's M   13.100
F         1.123
df1       9
df2       12933.711
Sig.      .343

Tests the null hypothesis that the observed covariance matrices of the dependent variables are equal across groups.
a. Design: Intercept+gruppe
After this test of assumptions, (3) Bartlett’s test of Sphericity follows. The null hypothesis is that the residual covariance matrix is proportional to an identity matrix, which means that the two dependent variables are uncorrelated. If the null hypothesis is accepted, there is no reason to carry out the MANOVA, since a number of separate ANOVAs would be sufficient. As can be seen
from the output below, the null hypothesis is rejected, and MANOVA is therefore relevant in this
example.
Bartlett's Test of Sphericity(a)

Likelihood Ratio     .016
Approx. Chi-Square   6.212
df                   2
Sig.                 .045

Tests the null hypothesis that the residual covariance matrix is proportional to an identity matrix.
a. Design: Intercept+gruppe
The output (4) Multivariate Tests shown below displays a number of multivariate tests, which are all significant. All four test statistics indicate that the null hypothesis of equal groups is rejected, since the p-value is approximately 0. It can therefore be concluded that the combined dependent variables, Y1 and Y2, vary across the treatments.
Multivariate Tests(d)

Effect                             Value    F            Hypothesis df   Error df   Sig.   Partial Eta Squared   Noncent. Parameter   Observed Power(a)
Intercept   Pillai's Trace         .976     303.141(b)   2.000           15.000     .000   .976                  606.283              1.000
            Wilks' Lambda          .024     303.141(b)   2.000           15.000     .000   .976                  606.283              1.000
            Hotelling's Trace      40.419   303.141(b)   2.000           15.000     .000   .976                  606.283              1.000
            Roy's Largest Root     40.419   303.141(b)   2.000           15.000     .000   .976                  606.283              1.000
gruppe      Pillai's Trace         1.027    5.631        6.000           32.000     .000   .514                  33.786               .989
            Wilks' Lambda          .073     13.566(b)    6.000           30.000     .000   .731                  81.396               1.000
            Hotelling's Trace      11.414   26.632       6.000           28.000     .000   .851                  159.791              1.000
            Roy's Largest Root     11.292   60.223(c)    3.000           16.000     .000   .919                  180.670              1.000

a. Computed using alpha = .05
b. Exact statistic
c. The statistic is an upper bound on F that yields a lower bound on the significance level.
d. Design: Intercept+gruppe
The SSCP matrices are shown in output (5) below. The group (H) and error (E) matrices display the variation between and within the groups, respectively. The matrices are displayed in bold.
It is the relationship between these matrices that forms the basis of the multivariate tests mentioned
above.
Between-Subjects SSCP Matrix

                                 y1        y2
Hypothesis   Intercept   y1      361.250   425.000
                         y2      425.000   500.000
             gruppe      y1      103.750   115.000
                         y2      115.000   130.000
Error                    y1      10.000    7.000
                         y2      7.000     24.000

Based on Type III Sum of Squares
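To see how these matrices enter the multivariate tests, Wilks’ Lambda and Hotelling’s Trace for gruppe can be reproduced from the printed H and E matrices. This is a sketch in Python with NumPy, not part of the SPSS procedure itself:

```python
import numpy as np

# Hypothesis (H) and error (E) SSCP matrices for the gruppe effect,
# taken from the Between-Subjects SSCP Matrix output above.
H = np.array([[103.750, 115.000],
              [115.000, 130.000]])
E = np.array([[10.000,  7.000],
              [ 7.000, 24.000]])

# Wilks' Lambda = det(E) / det(H + E)
wilks = np.linalg.det(E) / np.linalg.det(H + E)

# Hotelling's Trace = trace(H * E^-1)
hotelling = np.trace(H @ np.linalg.inv(E))

print(round(wilks, 3))      # matches the .073 for gruppe in output (4)
print(round(hotelling, 3))  # matches the 11.414 for gruppe in output (4)
```

It is exactly this ratio of within- to total variation (and the related eigenvalue-based traces) that the four test statistics in output (4) summarize.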
The output (6) Tests of Between-Subjects Effects shows whether or not the explanatory variable is
significant for the dependent variables. As can be seen, the contribution of the explanatory variable
group to the explanation of the variation of the dependent variables is significant for both of the
dependent variables.
Tests of Between-Subjects Effects

Source            Dependent Variable   Type III Sum of Squares   df   Mean Square   F         Sig.   Partial Eta Squared   Noncent. Parameter   Observed Power(a)
Corrected Model   y1                   103.750(b)                3    34.583        55.333    .000   .912                  166.000              1.000
                  y2                   130.000(c)                3    43.333        28.889    .000   .844                  86.667               1.000
Intercept         y1                   361.250                   1    361.250       578.000   .000   .973                  578.000              1.000
                  y2                   500.000                   1    500.000       333.333   .000   .954                  333.333              1.000
gruppe            y1                   103.750                   3    34.583        55.333    .000   .912                  166.000              1.000
                  y2                   130.000                   3    43.333        28.889    .000   .844                  86.667               1.000
Error             y1                   10.000                    16   .625
                  y2                   24.000                    16   1.500
Total             y1                   475.000                   20
                  y2                   654.000                   20
Corrected Total   y1                   113.750                   19
                  y2                   154.000                   19

a. Computed using alpha = .05
b. R Squared = .912 (Adjusted R Squared = .896)
c. R Squared = .844 (Adjusted R Squared = .815)
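As a sanity check, the F value for y1 in the gruppe row can be reconstructed from the descriptive statistics in output (1b). This is a sketch; the small deviation from 55.333 comes from the rounding of the printed standard deviations:

```python
# One-way ANOVA for y1, built from the group summaries in output (1b).
means = [2.00, 8.00, 3.00, 4.00]   # placebo, begge piller, pille A, pille B
sds   = [0.707, 0.707, 0.707, 1.000]
n     = 5                          # observations per group

grand_mean = sum(means) / len(means)

# Between-groups sum of squares (the gruppe row for y1)
ss_between = n * sum((m - grand_mean) ** 2 for m in means)

# Within-groups sum of squares (the Error row for y1)
ss_within = sum((n - 1) * s ** 2 for s in sds)

df_between = len(means) - 1               # 3
df_within = len(means) * n - len(means)   # 16

F = (ss_between / df_between) / (ss_within / df_within)
print(ss_between)    # 103.75, the Type III SS for gruppe/y1
print(round(F, 1))   # close to the 55.333 in the table
```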
So far it has only been determined that one or more groups are different. Therefore it is relevant to
take a closer look at these differences. In SPSS this is done by means of two by two comparisons,
but as mentioned in section 5.3.1.4, these comparisons are only carried out for each dependent
variable separately. As a consequence these two by two comparisons are only partly relevant and
will not be shown here.
As mentioned in the introduction of this section, a significant MANOVA means that at least one
discriminant function is significant as well. Therefore a discriminant analysis is very often carried
out following the MANOVA. Hereby it is possible to interpret the differences between the groups
by studying the discriminant functions. Please refer to chapter 6 about discriminant analysis.
6. Discriminant analysis
The explanation and interpretation of discriminant analysis presented in this section is based on the following
literature:
- ”Videregående data-analyse med SPSS og AMOS”, Niels Blunch 1. edition 2000, Systime.
Chp. 2, p. 29 – 48, Chp. 3, p. 49-77.
- ”Analyse af markedsdata”, Niels Blunch 2. rev. edition 2000, Systime. Chp. 6, p. 209-220
6.1 Introduction
Discriminant analysis is used for studying the relationship between one nominal scaled dependent
variable and one or more interval scaled explanatory variables (see appendix 1). In other words
discriminant analysis is used instead of regression analysis when the dependent variable is
qualitative.
The main purpose of discriminant analysis is to find the function of the explanatory variables that
discriminates best between two or more groups that are made up of the dependent variable. After
this it is possible to:
- Classify and place the units of the analysis (the respondents, the observations, etc.) in
several previously defined groups.
- Assess the discrimination ability of the explanatory variables.
A distinction is made between simple and multiple discriminant analysis. The difference is that the
simple discriminant analysis only discriminates between two groups, whereas the multiple analysis
discriminates between three or more groups. The group relation for both types is decided through
discriminant functions. In the multiple discriminant analysis there are two or more of these
functions depending on the number of groups and variables.
In this representation it is assumed that the above-mentioned discriminant functions are linear. The
following is an example of the simple discriminant analysis.
As mentioned in the previous section regarding MANOVA, a discriminant analysis is often
conducted as an extension of a significant MANOVA test. In this case a discriminant analysis will
ease the interpretation of the MANOVA. However it should be mentioned that the discriminant
analysis can be performed even without a preceding MANOVA. The following analysis is carried
through under the assumption of a significant MANOVA.
6.2 Example
Data: \\okf-filesrv1\exemp\Spss\Manual\Diskriminantdata.sav
Salmon fishing is a very valuable resource for both Alaska (USA) and Canada. Since it is a limited resource, it is very important that fishermen from both countries observe the fishing quotas. The salmon have a remarkable life cycle. They are born in freshwater, and after approximately 1-2 years they swim out into the ocean, i.e. into saltwater. After a couple of years in the ocean they return to their place of birth to spawn and die. Before this return takes place, the fishermen try to catch the fish. With a view to regulating the catch, there is a desire to identify where the fish come from. The salmon carry an interesting piece of information about their place of birth in the shape of growth rings on their scales. These growth rings, which can be identified both for the time spent in freshwater and in saltwater, are typically smaller on the Alaskan salmon than on the Canadian salmon.
On the basis of this information 50 salmon from Alaska and 50 salmon from Canada are examined. The 100 salmon are measured on the following explanatory variables:
X1 = diameter of growth rings for the first year of the salmon’s life in freshwater (1/100 inches)
X2 = diameter of growth rings for the first year of the salmon’s life in saltwater (1/100 inches)
X3 = gender
Subsequently, a discriminant analysis based on the collected data is performed. The purpose is to find a discriminant function which is capable of classifying where a fish originates from, based on the salmon’s growth rings.
6.3 Implementation of the analysis
The starting point of the discriminant analysis is shown below:
The following dialogue box now appears:
In this dialogue box the desired grouping variable must be marked in the left square and moved to
the Grouping Variable window by clicking the arrow. In this case the grouping variable is country.
The range is defined by clicking the Define Range button so that SPSS can calculate the number of
groups. In this example 1 is specified as minimum and 2 as maximum.
The two variables, Freshwater [x1] and Saltwater [x2], which have been moved to the
Independents window, are the measured variables on the basis of which the classification will be
carried out.
Furthermore, it must be specified whether the Stepwise method or the Independents together
method is desired. Using Stepwise method means that the variables that discriminate best are found
automatically. However, one must be aware of the possible problems with multicollinearity for the
explanatory variables. The Independents together method uses all the variables simultaneously for
the discrimination. In this example the latter method has been chosen, since both variables are to be
included in the discriminant function.
On the side of the dialogue box there are a number of buttons, which are used for specifying how the discriminant analysis is carried out. Statistics, Classify and Save are described below. Method should only be activated if the Stepwise method has been chosen, since this is where the criteria that apply when choosing the independent variables are specified.
6.3.1 Statistics
This button is used for managing the estimation itself and defining which covariance matrices to
display in the output. The mean values for each group can be calculated here as well. Furthermore,
it is possible to carry out a test of whether pooling the covariance matrices for the two groups is
possible. The following dialogue box appears:
The purpose of the individual parts is described as follows:
By activating Means in Descriptives, a calculation of the mean and the standard deviation for each
group of independent variables and for the entire sample takes place.
Univariate ANOVAs tests the hypothesis that the mean value for each independent variable is the
same for each group, i.e. an analysis of variance.
Selecting Box’s M results in a test of the hypothesis that the covariance matrices for each group are equal, i.e. whether it is possible to pool the matrices.
In Function Coefficients it is specified whether Fisher’s method or the unstandardized method is used for the calculation of the classification coefficients.
In Matrices it is specified which matrices to include in the output.
In this example it has been chosen to have Means calculated and to carry out Box’s M test to see if
the covariance matrices can be pooled. The coefficients are calculated by means of Fisher’s
method, since an estimate of the discriminant function itself is desired. Furthermore, an output of
the covariance matrices for each group as well as a single within-groups covariance matrix has been
chosen.
6.3.2 Classify
In the following dialogue box the classification is managed:
The options are as follows:
• Prior Probabilities specify the a priori probability that forms the basis of the classification.
Either the prior probability for group membership can be determined from the observed
group size, or it can be equal for all groups.
• In Use Covariance Matrix3 it can be determined whether the pooled or the separate
covariance matrices should be used. This is normally not determined until Box’s M test has
been carried out.
• By activating Casewise results under Display, the group membership will be displayed for each observation. Selecting Summary table displays the result of the classification as a table showing observed and predicted group membership.
• In Plots various plots can be chosen.
In this example the prior probability will be equal for all groups and the estimation of the
discriminant function will be performed on the basis of separate covariance matrices, since there
is reason to believe that the differences between these are too great to justify a pooled matrix.
Furthermore, it has been chosen to display the classification result.
6.3.3 Save
In brief it is possible to save expected group membership, the probability for group membership and
the calculated discriminant scores. These values will be saved as new variables in the data set.
However, there will be no such new variables in this example.
6.4 Output
When the various settings in the dialogue boxes have been specified, SPSS performs the analysis.
The result is a number of outputs, of which the most relevant ones will be commented on and interpreted below.
Group Statistics

fødested             Mean     Std. Deviation   Valid N (unweighted)   Valid N (weighted)
USA      ferskvand   98.38    16.143           50                     50.000
         saltvand    429.66   37.404           50                     50.000
Canada   ferskvand   137.46   18.058           50                     50.000
         saltvand    366.62   29.887           50                     50.000
Total    ferskvand   117.92   26.001           100                    100.000
         saltvand    398.14   46.240           100                    100.000
3 This choice only has a consequence for the classification of the observations, and not for the calculation of the discriminant function, since equal covariance matrices are assumed within the groups. A violation of this assumption means that SPSS cannot perform the calculation.
(1) Group Statistics shows mean values, standard deviations and the number of observations for
both groups for each of the explanatory variables. Furthermore, results for the entire data set
are shown.
The figures below are (2) the pooled covariance matrix and (3) the covariance matrices for the two groups. The two group matrices appear to be significantly different from each other, so it is unlikely that the estimation of the discriminant function can be based on the pooled covariance matrix. This is verified by (4) Box’s M test below.
Pooled Within-Groups Matrices(a)

                          ferskvand   saltvand
Covariance   ferskvand    293.349     -27.294
             saltvand     -27.294     1146.173

a. The covariance matrix has 98 degrees of freedom.
Covariance Matrices

fødested               ferskvand   saltvand
USA      ferskvand     260.608     -188.093
         saltvand      -188.093    1399.086
Canada   ferskvand     326.090     133.505
         saltvand      133.505     893.261
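The pooled matrix in output (2) is simply the weighted average of the two group matrices in output (3), with weights (n - 1). A sketch, assuming n = 50 per group as shown in output (1):

```python
import numpy as np

# Separate covariance matrices for the two groups, from output (3)
S_usa = np.array([[260.608, -188.093],
                  [-188.093, 1399.086]])
S_can = np.array([[326.090, 133.505],
                  [133.505, 893.261]])

n1 = n2 = 50  # group sizes from output (1)

# Pooled within-groups covariance: ((n1-1)*S1 + (n2-1)*S2) / (n1+n2-2)
S_pooled = ((n1 - 1) * S_usa + (n2 - 1) * S_can) / (n1 + n2 - 2)
print(S_pooled.round(3))  # reproduces the pooled matrix in output (2)
```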
(4) Box’s M test below rejects the null hypothesis of equal covariance matrices. This is the
argument behind the former statement about performing the analysis on the basis of separate
covariance matrices.
Test Results

Box's M             10.938
F       Approx.     3.565
        df1         3
        df2         1728720
        Sig.        .013
Tests null hypothesis of equal population covariance matrices.
In the part of the output called Summary of Canonical Discriminant Functions it is relevant to look
at (5) Wilks’ Lambda.
Wilks' Lambda

Test of Function(s)   Wilks' Lambda   Chi-square   df   Sig.
1                     .321            110.223      2    .000
Wilks’ Lambda displays the capability of the discriminant function to discriminate. As the output
shows, the discriminant function is significant at all α-levels.
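The chi-square in the table can be reproduced from Wilks’ Lambda via Bartlett’s approximation, χ² = −(n − (p + g)/2 − 1)·ln Λ, with n = 100 observations, p = 2 explanatory variables and g = 2 groups. A quick sketch:

```python
import math

n, p, g = 100, 2, 2    # observations, explanatory variables, groups
wilks = 0.321          # Wilks' Lambda from the output above

# Bartlett's chi-square approximation for testing the discriminant function
chi_square = -(n - (p + g) / 2 - 1) * math.log(wilks)
df = p * (g - 1)

print(round(chi_square, 1), df)  # 110.2 on 2 degrees of freedom
```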
From (6), the unstandardized canonical discriminant function coefficients, it appears that it is the mutual relationship between the two variables that is of interest to the analysis.
The discriminant function can be written as follows: D = 1.924 + 0.045 x1 – 0.018 x2.
The standardized discriminant function is displayed in (7) Standardized Canonical Discriminant
Function Coefficients. It has been calculated on the basis of a standardization of the explanatory
variables.
Standardized Canonical Discriminant Function Coefficients

            Function
            1
ferskvand   .764
saltvand    -.611
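The standardized coefficients can be obtained from the unstandardized ones in (6) by multiplying each coefficient by the pooled within-groups standard deviation of its variable, taken from output (2). A sketch; the small deviations from .764 and -.611 are due to the rounding of the coefficients printed in (6):

```python
import math

# Unstandardized coefficients from (6) and pooled within-groups
# variances from the Pooled Within-Groups Matrices output (2)
b_fresh, b_salt = 0.045, -0.018
var_fresh, var_salt = 293.349, 1146.173

# Standardize: multiply each coefficient by the within-groups std. dev.
b_fresh_std = b_fresh * math.sqrt(var_fresh)
b_salt_std = b_salt * math.sqrt(var_salt)

print(round(b_fresh_std, 2), round(b_salt_std, 2))  # close to .764 and -.611
```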
After this there are a number of outputs, which will not be commented on here.
The part of the output called Classification contains relevant outputs, which indicate how good the
discriminant function is at discriminating. (8) Classification Function Coefficients displays Fisher’s
linear discriminant functions. These functions are calculated on the basis of the previously
mentioned unstandardized canonical discriminant function. From the output below it appears that
there is a function for both USA and Canada.
By looking at the difference between the two functions, the following discriminant function can be written:
Classification Function Coefficients

             fødested
             USA        Canada
ferskvand    .371       .499
saltvand     .384       .332
(Constant)   -101.377   -95.835

Fisher's linear discriminant functions
Y = -5.54121 – 0.12839 x1 + 0.05194 x2
In this function the values for the thickness of the growth rings for the individual fish can be
substituted for x1 and x2. If the value of the function is positive, the fish is classified as American,
and if it is negative, the fish is classified as Canadian. If there are more than two groups, Fisher’s
linear functions are used separately for each group. After this the function values are calculated for
each function, and the units of the analysis are assigned to the group with the highest function
value.
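As an illustration of the rule, the function Y can be applied to two hypothetical fish whose measurements happen to equal the group means from output (1):

```python
def classify(x1, x2):
    """Classify a salmon from its freshwater (x1) and saltwater (x2)
    growth-ring diameters, using the difference between Fisher's two
    classification functions derived above."""
    y = -5.54121 - 0.12839 * x1 + 0.05194 * x2
    return "USA" if y > 0 else "Canada"

# Hypothetical fish at the group means from output (1)
print(classify(98.38, 429.66))   # an average American salmon -> USA
print(classify(137.46, 366.62))  # an average Canadian salmon -> Canada
```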
(9) Classification results show the classification of the fish on the basis of the Y function above. It
appears that 93% of the fish are classified correctly by means of the function. Thus 7% of the fish
are classified incorrectly, which deserves a comment. In that connection it should be noted that the
probability of incorrectly classifying Canadian fish as American is lower than the other way around.
This is due to the fact that the standard deviation for American salmon is larger than for Canadian salmon (see output (3)). It is important to be aware of this matter when performing a linear discriminant analysis.
Classification Results(a)

                           Predicted Group Membership
           fødested        USA     Canada    Total
Original   Count   USA     46      4         50
                   Canada  3       47        50
           %       USA     92.0    8.0       100.0
                   Canada  6.0     94.0      100.0

a. 93.0% of original grouped cases correctly classified.
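The 93.0% hit rate stated under the table follows directly from the counts:

```python
# Confusion matrix from the Classification Results output:
# keys are (observed group, predicted group)
counts = {("USA", "USA"): 46, ("USA", "Canada"): 4,
          ("Canada", "USA"): 3, ("Canada", "Canada"): 47}

correct = counts[("USA", "USA")] + counts[("Canada", "Canada")]
total = sum(counts.values())
hit_rate = correct / total
print(f"{hit_rate:.1%}")  # 93.0%
```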
Another important matter to be aware of is that the discriminant function will often classify the
units of the analysis better than new units, since the function has been calculated from the sample
data. As a consequence, a verification of the discriminant function is often required before it is
used.
7. Profile analysis
The explanation and interpretation of Profile Analysis is made based on the following literature:
- “SPSS, Advanced Models 10.0”, chp. 1 p. 1 – 14 and chp. 12 p. 95 – 108
7.1 Introduction
Profile analysis is useful for illustrating and visualizing a number of problems. The analysis is often employed to compare averages in statistical models by the use of so-called profile plots. A profile plot is a line in a co-ordinate system where every point corresponds to the average of a dependent variable (corrected for correlation) at a given factor level. A line can be drawn for every factor level. It is the comparison of these lines that is interesting.
It should be noted that there is a close connection between MANOVA and the analysis that is made on profile plots. The connection lies in the fact that both involve several interval scaled dependent variables and one or more nominal scaled explanatory variables.
7.2 Example
Data set: \\okf-filesrv1\exemp\Spss\Manual\Profildata.sav
The starting point for this example is the data set above. The data set is a result of a survey of the
way men and women experience their marriage. The following variables are registered in the data
set:
Dependent variables: q1 (contribution to the marriage)
q2 (the course of the marriage)
q3 (the degree of passion in the marriage)
q4 (the degree of friendship in the marriage)
Explanatory variables: Spouse (man/woman)
With the above-mentioned survey as a starting point the objective is to find out whether there is a
difference in the profiles that men and women create with reference to the dependent variables.
7.3 Implementation of the analysis
The first thing to do is to plot the profiles in a co-ordinate system. It should be noted that it is possible to choose profile plots in connection with GLM multivariate analyses (MANOVA). However, this is not recommended, since that approach does not allow for multiple plots. Instead the following approach can be adopted:
This results in a dialogue box, which should be filled out as follows:
This enables the user to plot several variables in the same graph as well as a display of summary
measures for the individual variables. After clicking Define the resulting dialogue box is filled out
as shown below:
Lines Represent specifies which lines are included in the graph. Category Axis specifies the
variable, which is used for categorization.
To get the profiles plotted, it is now necessary to have the questions displayed according to the profiles and not the other way around, as is currently the case. This is obtained by opening the graph editor (by double-clicking the graph) and selecting Properties in the Edit menu. When this is done, the depiction of the explanatory variable is changed by selecting Group instead of X Axis. The questions will now be grouped by the outcomes of the explanatory variable instead of the dependent variables. See below.
[Bar chart: mean values (Count axis, 0-5) of q1-q4 shown separately for Husband and Wife; the plotted means range from 3.833 to 4.633.]
The profiles have now been plotted, and a visual comparison can be made. It appears that the
profiles are likely to be parallel and perhaps even congruent. Whether the profiles are horizontal is
more doubtful. The following sections address the following issues:
• Are the profiles parallel? (7.3.1) ⇒ Both genders weight the factors in the same way relative to each other, but there may be a difference in level.
• Are the profiles congruent? (7.3.2) ⇒ Both genders attach the same weight to the factors, i.e. the profiles coincide.
• Are the profiles horizontal? (7.3.3) ⇒ All factors have equal influence.
7.3.1 Are the profiles parallel?
The starting point for all three issues mentioned above is shown below:
The resulting dialogue box is filled out by moving the dependent variables to the Dependent
Variables window and the explanatory variable to the Fixed Factor(s) window as shown below:
By clicking the Model button it is possible to specify the model for the current analysis
(parallelism). The dialogue box is filled out as follows:
After this Continue is selected.
In order to perform the current analysis it is necessary to change the method of the SPSS
programme slightly. This is done by means of the syntax window.
Since the model has been almost completely set up by SPSS already, it is a good idea to make use
of that in the syntax. The conversion of the model to the syntax is done by clicking the Paste button.
After that two additional lines should be added as shown. LMATRIX defines the two levels to be
compared in the spouse variable. MMATRIX defines the C matrix. A thorough description of the theory behind the matrices used is beyond the scope of this manual. Instead, please refer to SPSS Advanced Models, which is located at the IT instructors’ office in room H15.
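As a sketch of what the MMATRIX accomplishes, the C matrix can be thought of as a successive-difference contrast matrix over the four questions (the exact coefficients in the pasted syntax may be written differently). Parallel profiles then mean that the contrasted means are equal for the two spouse groups; the profile values below are purely illustrative:

```python
import numpy as np

# Successive-difference contrast matrix C (3 x 4): each row compares
# one question with the next, and each row's coefficients sum to zero.
C = np.array([[-1,  1,  0,  0],
              [ 0, -1,  1,  0],
              [ 0,  0, -1,  1]])

# Hypothetical mean profiles for the two groups (illustrative values only)
husband = np.array([3.9, 4.3, 3.8, 4.5])
wife = np.array([4.0, 4.4, 3.9, 4.6])

# The parallelism hypothesis states C @ husband == C @ wife, i.e. the
# two profiles have the same "slopes" between adjacent questions.
print(C @ husband)
print(C @ wife)
print(np.allclose(C @ husband, C @ wife))  # parallel in this illustration
```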
To make SPSS run the analysis, the syntax should be marked (i.e. by dragging the mouse over it while holding down the left mouse button) and then run from the Run menu in the menu bar.
7.3.1.1 Output
The procedure described above results in a number of outputs. One of these is an output known from the MANOVA analysis (see chapter 5 for an interpretation of this). In the part of the output named Custom Hypothesis Tests #1 an output is found which indicates whether or not the null hypothesis regarding parallelism is true. This is (1) Multivariate Test Results shown below.
From the output it appears that the null hypothesis is accepted at α = 0.05, since the p-value is
0.063. The conclusion is therefore that the profiles are parallel. However, this conclusion is
sensitive to the choice of the level of significance.
Multivariate Test Results

                     Value   F          Hypothesis df   Error df   Sig.
Pillai's trace       .121    2.580(a)   3.000           56.000     .063
Wilks' lambda        .879    2.580(a)   3.000           56.000     .063
Hotelling's trace    .138    2.580(a)   3.000           56.000     .063
Roy's largest root   .138    2.580(a)   3.000           56.000     .063

a. Exact statistic
7.3.2 Are the profiles congruent?
For this analysis the previous syntax can be used again with minor changes. Instead of inserting the
C-matrix the 1-matrix should be inserted. The sum of the coefficients must be 1, which explains
why each has been multiplied by 0.25. The complete syntax is as follows:
The syntax is run in the same way as the previous.
7.3.2.1 Output
As in the previous section the relevant output is found in Custom Hypothesis Tests #1. This output
indicates whether the null hypothesis of congruent profiles is true. In this particular example SPSS
only uses the univariate result. This means that the correlation between the dependent variables is
not taken into account.
Test Results

Transformed Variable: T1

Source     Sum of Squares   df   Mean Square   F       Sig.
Contrast   2343.750         1    2343.750      1.533   .221
Error      88687.500        58   1529.095
(2) Test Results shows that the null hypothesis of congruent profiles cannot be rejected at α = 0.05,
since the p-value is 0.221. Therefore, it can be concluded that the profiles are congruent.
7.3.3 Are the profiles horizontal?
For this analysis the C-matrix from section 7.3.1 is reinserted, and an intercept is added as shown
below:
The syntax is run in the same way. For this analysis to make any sense it is necessary for all
variables to be scaled similarly.
7.3.3.1 Output
From (3) Multivariate Test Results shown below it appears that the null hypothesis of horizontal
profiles can be rejected at α = 0.05, since the p-value is 0.000. As a consequence, it can be
concluded that the mean values for the four dependent variables are not equal.
Multivariate Test Results

                     Value   F          Hypothesis df   Error df   Sig.
Pillai's trace       .305    8.188(a)   3.000           56.000     .000
Wilks' lambda        .695    8.188(a)   3.000           56.000     .000
Hotelling's trace    .439    8.188(a)   3.000           56.000     .000
Roy's largest root   .439    8.188(a)   3.000           56.000     .000

a. Exact statistic
8. Factor analysis
The following presentation and interpretation of factor analysis is based on:
- ”Videregående data-analyse med SPSS og AMOS”, Niels Blunch, 1. edition 2000, Systime. Chp. 6, p. 124-155.
- ”Analyse af markedsdata”, Niels Blunch, 2. rev. edition 2000, Systime. Chp. 3, p. 87-118.
- Hair et al. (2006): Multivariate Analysis, 6th Ed., Pearson, Chp. 3.
8.1 Introduction
Factor analysis can be divided into component analysis as well as exploratory and confirmatory factor analysis. The three types of analysis can be used on the same data set but build on different mathematical models. Component analysis and exploratory factor analysis nevertheless produce relatively similar results. In this manual only component analysis is described.
In component analysis the original variables are transformed into the same number of new variables, so-called principal components, by a linear transformation. The principal components have the characteristic that they are uncorrelated, which is why they are suitable for further data processing such as regression analysis. Furthermore, the principal components are calculated so that the first component carries the bulk of the information (explains the most variance), the second component carries the second-most information, and so forth. For this reason component analysis is often used to reduce the number of variables/components, so that the last components with the least information are disregarded. The task is then to discover what the new components correspond to, which is exemplified below.
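The transformation just described can be sketched numerically. Below is a minimal illustration on synthetic data using NumPy (the actual analysis in this chapter is of course carried out in SPSS): the resulting components are uncorrelated, and the explained variance decreases from the first component onward:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 200 observations of 3 correlated variables
latent = rng.normal(size=(200, 1))
X = np.hstack([latent + 0.3 * rng.normal(size=(200, 1)) for _ in range(3)])

# Standardize and take the eigendecomposition of the correlation matrix
Z = (X - X.mean(axis=0)) / X.std(axis=0)
R = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]          # sort components by variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Component scores: the principal components are uncorrelated, and the
# first one explains the largest share of the total variance
scores = Z @ eigvecs
print(eigvals / eigvals.sum())             # explained variance, descending
print(np.round(np.corrcoef(scores, rowvar=False), 6))
```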
8.2 Example
Data: \\okf-filesrv1\Exemp\Spss\Manual\Faktoranalysedata.sav
The following example is based on an evaluation of a course at the Aarhus School of Business. 162 students attending the lectures were asked to fill out a questionnaire containing various questions regarding the assessment of the course. Each assessment was based on a five-point Likert scale, where 1 is “No, not at all” and 5 is “Yes, absolutely”.
The questions in the survey were (label name in parenthesis):
Has this course met your expectations? (Met expectations)
Was this course more difficult than your other courses? (More difficult than other courses)
Was there a reasonable relationship between the amount of time spent and the benefit derived? (Relationship
time/Benefit)
Have you found the course interesting? (Course Interesting)
Have you found the textbooks suitable for the course? (Textbooks suitable)
Was the literature inspiring? (Literature inspiring)
Was the curriculum too extensive? (Curriculum too extensive)
Has your overall benefit from the course been good? (Overall benefit)
The original data consists of more assessments, but these have not been included in the example.
The purpose is now, based on the survey, to carry out a component analysis with a view to reducing the number of assessment criteria to a smaller number of components. Furthermore, the new components should be examined with a view to naming them.
8.3 Implementation of the analysis
Component analysis is a method that is used exclusively for uncovering latent factors from manifest
variables in a data set. Since these fewer factors usually form the basis of further analysis, the component
analysis is to be found in the following menu:
This results in the dialogue box shown below. The variables to be included in the component analysis are
marked in the left-hand window, where all numeric variables in the data set are listed, and moved to the
Variables window by clicking the arrow. In this case all the variables are chosen.
It has now been specified which variables SPSS should base the analysis on. However, a more definite
method for performing the component analysis has yet to be chosen. This is done by means of the
Descriptives, Extraction, Rotation, Scores and Options buttons. These are described individually below.
8.3.1 Descriptives
By clicking the Descriptives button the following dialogue box appears:
In short the purpose of the individual options is as follows:
Statistics
• Univariate descriptives includes the mean, standard deviation and the number of valid observations
for each variable.
• Initial solution includes initial communalities, eigenvalues, and the percentage of variance explained.
Correlation Matrix
• Here it is possible to get information about the correlation matrix, among other things the
appropriateness of performing factor analysis on the data set.
In this example Initial solution is chosen, because a display of the explained variance for the suggested
factors of the component analysis is desired. At the same time this is the most widely used method.
Anti-Image, KMO and Bartlett’s test of sphericity are checked as well, to analyze whether a factor analysis
is appropriate for the given data set.
With regard to the tests selected above, it may be useful to add a few comments. The Anti-Image provides
the negative values of the partial correlations between the variables. The off-diagonal Anti-Image values
ought therefore to be low, indicating that the variables do not differ too much from one another. KMO and
Bartlett likewise provide measures of appropriateness, as previously mentioned. As a rule of thumb, KMO
ought to attain values of at least 0.5 and preferably above 0.7 to indicate that the data is suitable for a factor
analysis. Equivalently the Bartlett’s test should be significant, indicating that significant correlations exist
between the variables.
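These two rules of thumb can also be checked by hand. The sketch below is hypothetical Python code (NumPy and SciPy assumed; the small correlation matrix is made up for illustration, and only the sample size of 162 mirrors the example). Bartlett's chi-square statistic is computed from the determinant of the correlation matrix, and KMO from the partial correlations obtained via the inverse of that matrix.

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(R, n):
    """Bartlett's test of sphericity: H0 = the variables are uncorrelated."""
    p = R.shape[0]
    stat = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    return stat, chi2.sf(stat, df)

def kmo(R):
    """Kaiser-Meyer-Olkin measure of sampling adequacy (overall)."""
    inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)          # partial correlations
    off = ~np.eye(R.shape[0], dtype=bool)
    r2, p2 = (R[off] ** 2).sum(), (partial[off] ** 2).sum()
    return r2 / (r2 + p2)

# Hypothetical correlation matrix of three moderately correlated items
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
stat, pval = bartlett_sphericity(R, n=162)   # 162 respondents as in the example
print(f"chi2 = {stat:.1f}, p = {pval:.4g}, KMO = {kmo(R):.3f}")
```

A KMO value near 1 and a significant Bartlett statistic would, as in the text, speak for carrying out the factor analysis.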
8.3.2 Extraction
By clicking the Extraction button the following dialogue box appears:
This is where the component analysis in itself is managed. In this example it has been chosen to use the
Principal components method for the component analysis. This is chosen in the Method drop-down box.
Since the individual variables of this example are scaled very differently, it has been chosen to base the
analysis on the Correlation matrix, i.e. a standardization is carried out.
A display of the un-rotated factor solution is wanted in order to compare this with the rotated solution.
Therefore, Unrotated factor solution is activated in Display.
Since the last components do not explain very much of the variance in a data set, it is standard practice to
ignore these. This results in a bit of lost information (variance) in the data set, but in return a simpler
output is obtained for further analysis. In addition, it makes the interpretation of the data easier.
The excluded components are treated as “noise” in the data set. The important question is just how many
components to exclude without causing too much loss of information. The following rules of thumb for
choosing the right number of components for the analysis apply:
• Scree plot: By selecting this option in Display a graphical illustration of the variance of the
components appears. A typical feature of this graph is a break on the curve. This curve break forms
the basis of a judgement of the right number of components to include.
• Kaiser’s criterion: Components with an eigenvalue of more than 1 are included. This can be
observed from the Total Variance Explained table or the scree plot shown in section 8.4. However,
these are only guidelines. The actual number of chosen factors is subjective, and it depends strongly
on the data set and the characteristics of the further analysis.
If the user wants to carry out the component analysis based on a specific number of factors, this number can
be specified in Number of factors in Extract. Last but not least it is possible to specify the maximum number
of iterations. Default is 25. This option is not relevant for this example, but for the Maximum Likelihood
method it could be relevant.
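Kaiser's criterion is easy to apply mechanically. As a sketch (hypothetical Python/NumPy code), using the eight eigenvalues that appear later in the Total Variance Explained table of section 8.4:

```python
import numpy as np

# Eigenvalues as they appear in the Total Variance Explained table
eigenvalues = np.array([3.499, 1.114, 1.031, 0.755, 0.547, 0.401, 0.383, 0.270])

# Kaiser's criterion: retain components with an eigenvalue above 1
n_retained = int((eigenvalues > 1).sum())
explained = eigenvalues / eigenvalues.sum()    # proportion of total variance
cumulative = explained.cumsum()

print(f"Retain {n_retained} components, "
      f"explaining {100 * cumulative[n_retained - 1]:.1f}% of the variance")
```

Note that the eigenvalues sum to 8, the number of standardized variables, so each eigenvalue divided by 8 is the proportion of total variance that component explains.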
8.3.3 Rotation
Clicking the Rotation button results in the following dialogue box:
In brief, rotation of the solution is a method where the axes of the original solution are rotated in order to
obtain an easier interpretation of the found components. In other words, it ensures that the individual
variables correlate highly with a small proportion of the components, while correlating weakly with
the remaining components.
In this example Varimax is chosen as the rotation method, since it ensures that the components of the rotated
solution are uncorrelated. The remaining methods will not be further described here.
A display of the rotated solution has been chosen in Display. Loading plots enables a display of the solution
in a three-dimensional plot of the first three components. If the solution consists of only two components, the
plot will be two-dimensional instead. With the eight assessment criteria in this example, a three-dimensional
plot looks rather confusing, and it has therefore been omitted here.
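For readers who want to see what the rotation does numerically, a minimal varimax implementation is sketched below (hypothetical Python/NumPy code using the standard SVD-based algorithm; SPSS additionally applies Kaiser normalization, which this sketch omits). Because the rotation matrix is orthogonal, the communalities of the variables are unchanged by the rotation.

```python
import numpy as np

def varimax(L, max_iter=100, tol=1e-8):
    """Varimax rotation of a loading matrix L (variables x components)."""
    p, k = L.shape
    R = np.eye(k)                     # accumulated orthogonal rotation
    var_old = 0.0
    for _ in range(max_iter):
        A = L @ R
        # Gradient of the varimax criterion
        G = L.T @ (A ** 3 - A @ np.diag((A ** 2).sum(axis=0)) / p)
        U, s, Vt = np.linalg.svd(G)
        R = U @ Vt                    # nearest orthogonal matrix
        if s.sum() - var_old < tol:
            break
        var_old = s.sum()
    return L @ R, R

# Hypothetical unrotated loadings for four variables on two components
L = np.array([[0.8, 0.3], [0.7, 0.4], [0.2, 0.9], [0.1, 0.8]])
rotated, R = varimax(L)
print(np.round(rotated, 3))
```

After rotation each variable loads mainly on one component, which is exactly the "clearer picture" the manual refers to.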
8.3.4 Scores
Now the solution of the component analysis has been rotated, which should have resulted in a clearer picture
of the results. However, there are still two options to bear in mind before the analysis is carried out. By
selecting Scores it is possible to save the factor scores, which is sensible if they are to be used for other
analyses such as profile analysis, etc. These factor scores will be added to the data set as new variables with
default names provided by SPSS. In this example this option has not been chosen, as no further analysis
including these scores is to be performed.
8.3.5 Options
Treatment of missing values in the data set is managed in the Options dialogue box shown below:
Missing values can be treated as follows:
• Exclude cases listwise excludes observations that have missing values for any of the variables.
• Exclude cases pairwise excludes observations with missing values for either or both of the pair of
variables in computing a specific statistic.
• Replace with mean replaces missing values with the variable mean.
In this example Exclude cases listwise has been chosen in order to exclude observations with missing values.
With regard to the output of the component analysis there are two options:
• Sorted by size sorts the factor loading and structure matrices so that variables with high loadings on the
same factor appear together. The loadings are sorted in descending order.
• Suppress absolute values less than makes it possible to control the output so that coefficients with
absolute values less than a specified value (between 0 and 1) are not shown. This option has no
effect on the analysis, but ensures a good overview of the variables in their respective factors. In this
analysis it has been chosen that no values below 0.1 are shown in the output.
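The effect of these two display options can be mimicked in a few lines (a hypothetical Python/NumPy sketch with made-up loadings; suppressed values are shown as nan). Sorting groups each variable under the component it loads highest on; suppression only affects the display, never the analysis.

```python
import numpy as np

# Hypothetical rotated loadings (variables x components)
loadings = np.array([[0.854, 0.05, 0.02],
                     [0.226, 0.871, 0.04],
                     [-0.226, 0.06, 0.799]])

# Sorted by size: order variables by the component they load highest on,
# then by descending magnitude within each group
dominant = np.abs(loadings).argmax(axis=1)
order = np.lexsort((-np.abs(loadings).max(axis=1), dominant))

# Suppress absolute values less than 0.1 (display only; analysis unchanged)
shown = np.where(np.abs(loadings) < 0.1, np.nan, loadings)[order]
print(shown)
```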
8.4 Output
When the various settings in the dialogue boxes have been specified, SPSS performs the component analysis.
After clicking OK a rather comprehensive output is produced, of which the most relevant outputs are
commented below.
KMO and Bartlett's Test
Kaiser-Meyer-Olkin Measure of Sampling Adequacy           ,791
Bartlett's Test of Sphericity   Approx. Chi-Square     413,377
                                df                          28
                                Sig.                      ,000
Anti-image Matrices
(Columns follow the same variable order as the rows.)

Anti-image Covariance
Faget relevant                  ,497   ,070  -,172  -,221  -,032   ,080  -,005  -,076
Faget svært                     ,070   ,876  -,025  -,079  -,010   ,114  -,152   ,050
Tid ift. udbytte               -,172  -,025   ,579  -,009   ,019  -,061   ,081  -,147
Faget spændende og interessant -,221  -,079  -,009   ,499   ,005  -,111   ,027  -,104
Egnede lærebøger               -,032  -,010   ,019   ,005   ,526  -,263  -,087  -,108
Inspirerende litteratur         ,080   ,114  -,061  -,111  -,263   ,467   ,084  -,054
Pensum for stort               -,005  -,152   ,081   ,027  -,087   ,084   ,886   ,019
Samlede udbytte                -,076   ,050  -,147  -,104  -,108  -,054   ,019   ,486

Anti-image Correlation
Faget relevant                  ,763a  ,106  -,321  -,443  -,063   ,166  -,008  -,155
Faget svært                     ,106   ,728a -,036  -,120  -,015   ,178  -,172   ,077
Tid ift. udbytte               -,321  -,036   ,843a -,017   ,035  -,117   ,114  -,278
Faget spændende og interessant -,443  -,120  -,017   ,806a  ,010  -,231   ,041  -,211
Egnede lærebøger               -,063  -,015   ,035   ,010   ,746a -,531  -,127  -,214
Inspirerende litteratur         ,166   ,178  -,117  -,231  -,531   ,735a  ,130  -,114
Pensum for stort               -,008  -,172   ,114   ,041  -,127   ,130   ,751a  ,029
Samlede udbytte                -,155   ,077  -,278  -,211  -,214  -,114   ,029   ,875a

a. Measures of Sampling Adequacy (MSA)
SPSS has now generated the tables for the KMO and Bartlett’s test as well as the Anti-Image matrices. KMO
attains a value of 0.791, which satisfies the criteria mentioned above. Equivalently, the Bartlett’s test attains
a probability value of 0.000. Similarly, high correlations are primarily found on the diagonal of the
Anti-Image correlation matrix (marked with an a). This confirms our hypothesis of an underlying structure
of the variables.
Total Variance Explained
            Initial Eigenvalues             Extraction Sums of Squared Loadings   Rotation Sums of Squared Loadings
Component   Total  % of Var.  Cumul. %      Total  % of Var.  Cumul. %            Total  % of Var.  Cumul. %
1           3,499  43,740      43,740       3,499  43,740      43,740             2,526  31,574      31,574
2           1,114  13,922      57,662       1,114  13,922      57,662             1,853  23,167      54,741
3           1,031  12,887      70,549       1,031  12,887      70,549             1,265  15,807      70,549
4            ,755   9,434      79,983
5            ,547   6,835      86,818
6            ,401   5,013      91,831
7            ,383   4,792      96,624
8            ,270   3,376     100,000
Extraction Method: Principal Component Analysis.
As can be seen from the output shown above, SPSS has produced the table Total Variance Explained, which
takes account of both the rotated and unrotated solution.
(1) Initial Eigenvalues displays the calculated eigenvalues as well as the explained and accumulated variance
for each of the 8 components.
(2a) Extraction Sums of Squared Loadings displays the components, which satisfy the criterion that has been
chosen in section Extraction (Kaiser’s criterion was chosen in section 8.3.2). In this case there are three
components with an eigenvalue above 1. These three components together explain 70.549% of the total
variation in the data. The individual contributions are 43.740%, 13.922%, 12.887% of the variation for
component 1, 2 and 3 respectively. These are the results for the un-rotated solution.
In (2b) Rotation Sums of Squared Loadings similar information to that of (2a) can be found, except that these
are the results for the Varimax rotated solution. As shown the sum of the three variances is the same both
before and after the rotation. However, there has been a shift in the relationship between the three
components, as they contribute more equally to the variation in the rotated solution.
The Scree plot below is an illustration of the variance of the principal components:
[Scree Plot: eigenvalue (y-axis, 0 to 4) plotted against component number (x-axis, 1 to 8)]
After the inclusion of the third component the factors no longer have eigenvalues above 1, and consequently
the curve flattens. Normally the Scree plot will exhibit a break on the curve, which confirms how many
components to include. The graphical depiction could therefore be clearer in the current example; however,
it is decided to include the three components that satisfy Kaiser’s criterion in the further analysis. It is therefore
reasonable to treat the remaining components as “noise”. This agrees with the previous output of the
explained variance in the table Total Variance Explained.
Component Matrix(a)
                                      Component
                                 1       2       3
Samlede udbytte                 ,816
Faget spændende og interessant  ,762    ,286   -,111
Inspirerende litteratur         ,730   -,257    ,430
Faget relevant                  ,723    ,330   -,328
Tid ift. udbytte                ,722    ,187   -,278
Egnede lærebøger                ,672    ,592
Faget svært                    -,340    ,710
Pensum for stort               -,331    ,547    ,540
Extraction Method: Principal Component Analysis.
a. 3 components extracted.
(Loadings with absolute values below 0.1 are suppressed.)
(3a) Component Matrix displays the principal component loadings for the un-rotated solution. This table
shows the coefficients of the variables in the un-rotated solution. For example it can be observed that the
correlation between the variable Overall benefit and component 1 is 0.816, i.e. very high. The un-rotated
solution does not form an optimal picture of the correlations, however. Therefore, it is a good idea to rotate
the solution in hope of clearer results.
(3b) Rotated Component Matrix displays the principal component loadings in the same way, but as the name
reveals this is for the rotated solution.
Rotated Component Matrix(a)
                                      Component
                                 1       2       3
Faget relevant                  ,854
Faget spændende og interessant  ,771    ,282
Tid ift. udbytte                ,764    ,153   -,161
Samlede udbytte                 ,669    ,462   -,123
Egnede lærebøger                ,226    ,871
Inspirerende litteratur         ,261    ,819   -,214
Pensum for stort               -,226            ,799
Faget svært                    -,303            ,731
Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 5 iterations.
(Loadings with absolute values below 0.1 are suppressed.)
As previously mentioned, the purpose of rotating the solution is to make some variables correlate highly with
one of the components – i.e. to enlarge the large coefficients (loadings) and to reduce the small coefficients
(loadings). Whether a variable is highly or weakly correlated is subjective, but one rule of thumb is that
correlations below 0.4 are considered low.
By means of output (3b) Rotated Component Matrix it is possible to try to join the most similar assessment
criteria in three different groups:
• Component 1: For this group the variables Met expectations, Relationship time/Benefit, Course
Interesting and Overall benefit are important. Therefore it will be appropriate to categorize
component 1 as course benefit.
• Component 2: The variable Textbooks suitable and Literature inspiring are correlating highly with
component 2 and therefore it is named quality of the literature.
• Component 3: The variables More difficult than other courses and Curriculum too extensive both have
loadings above 0.4 on the third component. It could thus be named difficulty of the course.
This concludes the example. It is worth mentioning that if these results were to be used in later analyses, all
factor scores should be used. As mentioned, these scores can be calculated by selecting the option called
Scores. SPSS will then calculate a score for each respondent based on each of the components.
9. Multidimensional scaling (MDS)
The explanation and interpretation of multidimensional scaling presented in this section is based on
the following literature:
• ”Analyse af markedsdata”, Niels Blunch, 2nd rev. edition 2000, Systime, chp. 4, pp. 136-157
• ”Marketing Research – An Applied Orientation”, Naresh K. Malhotra, 3rd edition 1999, chp. 21, pp. 633-646
• “SPSS Categories® 10.0”, Jacqueline J. Meulman, Willem J. Heiser, SPSS Inc. 1999, chp. 1, p. 13; chp. 13, p. 193
• “Multidimensional skalering og conjoint analyse – et produktudviklingsværktøj”, Hans Jørn Juhl, Internt undervisningsmateriale H nr. 151, Institut for informationsbehandling 1992, Handelshøjskolen i Århus, chp. 2 & 4
9.1 Introduction
The purpose of multidimensional scaling is to find a structure in a number of objects or cases based
on the distances between them. This is done by placing the objects/units of the analysis in a
dimensional plane (normally two- or three-dimensional) based on a proximity-matrix. The objects
are placed in the plane based on information about the differences between them (the differences
may be disclosed by asking a number of respondents).
In the literature a distinction is made between two different types of multidimensional scaling:
metric and non-metric MDS. For non-metric as well as metric MDS the starting point is a square
proximity matrix. For metric MDS it is a requirement that the data is on interval or ratio level,
which means that the proximity matrix can be a covariance or a correlation matrix. Non-metric
MDS is used when the data is ordinal or nominal scaled. An example could be a matrix of pairwise
comparisons or a transformation of rank values.
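As a brief illustration of the distinction, non-metric MDS on a precomputed dissimilarity matrix can be run with scikit-learn (a hypothetical sketch; scikit-learn is assumed to be available, the four-object dissimilarity matrix is made up, and scikit-learn's SMACOF algorithm differs from the ALSCAL procedure SPSS uses):

```python
import numpy as np
from sklearn.manifold import MDS

# Hypothetical symmetric dissimilarity matrix for four objects:
# objects 0 and 1 are similar, 2 and 3 are similar, the pairs are far apart
D = np.array([[0, 1, 5, 6],
              [1, 0, 5, 6],
              [5, 5, 0, 2],
              [6, 6, 2, 0]], dtype=float)

# metric=False requests non-metric (ordinal) MDS, as used in this chapter
mds = MDS(n_components=2, metric=False, dissimilarity="precomputed",
          random_state=0)
coords = mds.fit_transform(D)
print(coords.shape)
```

In the resulting two-dimensional configuration, objects with small dissimilarities end up close together, which is exactly what the soft-drink example below exploits.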
9.2 Example
Data set: \\okf-filesrv1\exemp\Spss\Manual\MDS.sav
In a test of beverages two respondents have been asked to compare ten different soft drinks. After
the consumption of the drinks in random order the respondents were asked to express their opinion4
about each single soft drink ranking them on a 10-point scale. Small values expressed a large degree
of homogeneity and large values expressed a small degree of homogeneity (=distance). The
following soft drinks were rated: Coca-Cola, Pepsi, Fanta, Afri-Cola, Sprite, Bluna, Sinalco, Club-
Cola, Red Bull and Mezzo-Mix5.
Because this example deals with ordinal data, a non-metric multidimensional scaling is carried out.
9.3 Implementation of the analysis
The multidimensional scaling is to be found in the following menu:
Following this action the following dialogue box appears:
4 Pairwise comparison.
5 The survey is fictitious and is taken from the following publication: “Multidimensional Skalierung – Beispieldatei zur Datenanalyse”, Lehrstuhl für empirische Wirtschafts- und Sozialforschung, Fachbereich Wirtschaftswissenschaft, BUGH Wuppertal 2001. Bergische Universität Gesamthochschule Wuppertal.
In this dialogue box the desired variables for the multidimensional scaling should be marked in the
left-hand side window. The chosen variables are then transferred to the Variables window by
clicking the arrow next to this field (in this case all the variables should be transferred).
In this example the data have been entered in a square and symmetrical matrix and are at this
point already distances. Therefore Data are distances is chosen and Shape is activated, which
results in the dialogue box shown below. It should be filled out as illustrated:
If the data set is not already in a proximity matrix this should be defined, because MDS requires the
input to be in the shape of proximity data. Because of this it is sometimes necessary to transform the
data, which can be done by choosing the option Create distances from data. (Please notice that this
option is not pictured here).
Hereby the data are expressed as distances. In the section Measure the method for transformation is
chosen. If the analysis is based on continuous data, Interval is chosen. If the analysis is based on
frequency data, Counts is chosen, and if it is based on binary data, Binary is chosen.
The method by which the distances should be calculated is chosen under each item. Normally the
Euclidean measure of distance is used, which is also the case in this example. The proximity matrix
is generated based on this measure of distance.
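The construction of a Euclidean proximity matrix from raw data (what Create distances from data does) can be sketched as follows (hypothetical Python code, SciPy assumed; the ratings are made up):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical raw data: 4 objects rated on 3 variables
X = np.array([[1.0, 2.0, 3.0],
              [1.0, 2.0, 4.0],
              [5.0, 5.0, 5.0],
              [0.0, 1.0, 2.0]])

# Euclidean proximity matrix: square, symmetrical, zero on the diagonal
D = squareform(pdist(X, metric="euclidean"))
print(np.round(D, 3))
```

The first two rows differ only in the third variable, so their distance is exactly 1; this square matrix is then suitable input for the MDS procedure.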
The menu item Transform Values provides the possibility of standardization before the calculation
of the proximity data. It is possible to specify the standardisation method in the drop-down menu.
However there are other crucial functions, which are to control the scaling itself. For this reason
Model and Options will subsequently be described separately.
9.3.1 Model
The activation of Model is done to specify the model. The dialogue box below will appear:
In this dialogue box there are a number of possibilities:
• Level of Measurement: Specification of scaling of the data. Under this item it is chosen to
calculate the distances on an ordinal level, since the data in this example are ordinal scaled.
• Conditionality: This item indicates which comparisons make sense. Matrix Conditionality
should be chosen when the calculated distances must be compared within each single
distance matrix. This is the case when the analysis is concerned with one matrix only (as in
this case) or when each matrix represents different areas. Row means that the calculated
distances are only compared in the same row of the matrix. This is relevant if the analysis is
concerned with asymmetrical or rectangular matrices. Row cannot be chosen if the distance
matrix has been made from the data, e.g. from proximity data, since these are square and
symmetrical. Unconditional should be chosen if comparisons between various distance
matrices are needed.
• In Dimensions it should be indicated how many dimensions are needed for the analysis. If
only one solution is needed, minimum and maximum should have the same value.
• In Scaling Model the assumptions behind the scaling are indicated. Euclidian distance can
be used for all distance matrices, while Individual differences Euclidian distances (known as
INDSCAL) can only be used when the analysis deals with multiple distance matrices and
there are at least two dimensions.
In this case MDS should be performed with ordinal data in a two-dimensional solution. Since the
analysis only deals with one distance matrix, which is square and symmetrical, Matrix
Conditionality and the Euclidian distance model are chosen.
9.3.2 Options
Options is activated to define the actual output of the multidimensional scaling. Hereby the
following dialogue box will appear:
The decision regarding the contents of the output from the MDS-procedure is controlled under
Display. The following options are available:
• Group Plots produces a scatter plot of the linear fit between the model and the data material
as the most important feature.
• Individual subject plots creates a plot of the transformations of the data from a single area. A
requirement is that Matrix Conditionality and Ordinal data have been activated in the Model
dialogue box as described above.
• Data matrix includes both the raw data and the scaled data for each subject-area.
• Model and options summary describes the effect that the scaling options have on the
analysis.
Another important item is Criteria since it controls the scaling itself. Here a decision should be
made regarding when the iteration should stop based on criteria concerning badness-of-fit.
• S-stress convergence: S-stress is a badness-of-fit criterion, which indicates how well the
distance matrix is pictured in the two-dimensional space. In S-stress convergence the
smallest acceptable size of the change in badness-of-fit between the iterations is decided.
This means that the iteration procedure stops if the change equals the given value. This size,
of course, cannot be negative. The value should be as small as possible, since smaller s-stress
values enhance precision; however, this will also require more time for the calculations.
• Minimum s-stress value indicates the smallest acceptable badness-of-fit. The value should be
between 0 and 1, with 1 being the worst measure of fit.
• Maximum iterations indicates the maximum number of iterations. MDS is conducted via an
iteration procedure by which the objects are placed in the coordinate system in turn. The
default in SPSS is 30 iterations
but this number can be raised to get a more precise solution (again this will require more
time for the calculations).
Finally it is possible to ensure that distances of a certain size are treated as missing. This is done in
Treat distances less than __ as missing. This function has not been chosen in this example.
9.4 Output
The above actions result in a long and comprehensive output of data regarding the multidimensional
scaling. The most relevant parts will subsequently be commented on.
First some descriptive information about the procedure itself appears which will only be
commented briefly:
Alscal Procedure Options
Data Options-
Number of Rows (Observations/Matrix). 10
Number of Columns (Variables) . . . 10
Number of Matrices . . . . . . 1
Measurement Level . . . . . . . Ordinal
Data Matrix Shape . . . . . . . Symmetric
Type . . . . . . . . . . . Dissimilarity
Approach to Ties . . . . . . . Leave Tied
Conditionality . . . . . . . . Matrix
Data Cutoff at . . . . . . . . ,000000
The output above shows that the symmetrical matrix consists of ten rows and ten columns.
Furthermore, it can be seen that the data is of the ordinal scale of measurement.
A lot of descriptive output follows which will not be commented on.
A print-out of raw data matrices follows (in this case only one). This matrix is similar to the original
matrix (not displayed here).
The following warning draws attention to the fact that the number of estimated parameters is large
relative to the number of data values in the data matrix:
>Warning # 14654
>The total number of parameters being estimated (the number of stimulus
>coordinates plus the number of weights, if any) is large relative to the
>number of data values in your data matrix. The results may not be reliable
>since there may not be enough data to precisely estimate the values of the
>parameters. You should reduce the number of parameters (e.g. request
>fewer dimensions) or increase the number of observations.
>Number of parameters is 20. Number of data values is 45
If the analysis had included more than ten objects in the two-dimensional scaling, this warning would no longer appear.
Then follows a description of the iteration procedure:
Iteration history for the 2 dimensional solution (in squared distances)
Young's S-stress formula 1 is used.
Iteration S-stress Improvement
1 ,02813
2 ,02541 ,00272
3 ,02459 ,00082
Iterations stopped because
S-stress improvement is less than ,001000
As the output illustrates, no more than three iterations were made before the search process stopped.
The reason for this is that the s-stress improvement was less than the specified value of 0.001.
Furthermore, the development in badness-of-fit (s-stress) can be followed together with the improvement
from iteration to iteration.
Finally the first result of the multidimensional scaling follows:
Stress and squared correlation (RSQ) in distances
RSQ values are the proportion of variance of the scaled data (disparities)
in the partition (row, matrix, or entire data) which
is accounted for by their corresponding distances.
Stress values are Kruskal's stress formula 1.
For matrix
Stress = ,02179 RSQ = ,99707
Here the stress value, which should be less than 0.1 according to Kruskal’s assumption, can be
read. This assumption is fulfilled, and the conclusion is that it has been possible to illustrate the
distance matrix in a two-dimensional space with a very high degree of precision. The RSQ
(R-Square = R²), which has a value of 0.99707, is the traditional expression for the degree of
explanation; in this case it is quite large and accordingly very satisfactory.
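Kruskal's stress formula 1 is simple enough to verify by hand. A minimal sketch (Python/NumPy; the distance and disparity values are made up for illustration):

```python
import numpy as np

def kruskal_stress1(distances, disparities):
    """Kruskal's stress formula 1: badness-of-fit of the configuration."""
    d, dhat = np.asarray(distances), np.asarray(disparities)
    return float(np.sqrt(((d - dhat) ** 2).sum() / (d ** 2).sum()))

# Hypothetical configuration distances vs optimally scaled data (disparities)
d    = np.array([1.0, 2.0, 3.0, 4.0])
dhat = np.array([1.1, 1.9, 3.0, 4.1])
print(round(kruskal_stress1(d, dhat), 4))   # → 0.0316
```

A value of 0 would mean a perfect fit; the closer the configuration distances are to the disparities, the smaller the stress.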
The next output indicates the set of co-ordinates in the two dimensions for each soft drink.
Configuration derived in 2 dimensions
Stimulus Coordinates
Dimension
Stimulus Stimulus 1 2
Number Name
1 cocacola 1,0732 -,4818
2 pepsi ,7125 -,3833
3 fanta -1,3131 -,2381
4 afri ,6451 ,8952
5 sprite -,0711 -1,4253
6 bluna -1,7408 -,0561
7 sinalco -1,7354 ,5402
8 club ,3461 1,1895
9 redbull 2,2358 ,1069
10 mezzo -,1523 -,1472
These co-ordinates are used to present the configuration of the objects graphically (see (1) Derived
Stimulus Configuration below). The co-ordinates may be transferred to other programs if a
graphical representation different from the one SPSS offers is needed.
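For instance, the distances between soft drinks can be recomputed directly from the printed coordinates (a Python/NumPy sketch using four of the ten stimulus points above):

```python
import numpy as np

# Stimulus coordinates from the output above (dimension 1, dimension 2)
coords = {"cocacola": (1.0732, -0.4818),
          "pepsi":    (0.7125, -0.3833),
          "fanta":    (-1.3131, -0.2381),
          "redbull":  (2.2358, 0.1069)}

def dist(a, b):
    """Euclidean distance between two stimuli in the derived configuration."""
    return float(np.linalg.norm(np.subtract(coords[a], coords[b])))

# The branded colas lie close together; Fanta is far from both
print(round(dist("cocacola", "pepsi"), 3), round(dist("cocacola", "fanta"), 3))
# → 0.374 2.399
```

This confirms numerically what the configuration plot shows visually: Coca-Cola and Pepsi form a tight group, while Fanta lies in a different region of the map.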
Then follows a matrix with the transformed data (the optimally scaled data):
Optimally scaled data (disparities) for subject 1
1 2 3 4 5
1 ,000
2 ,471 ,000
3 2,424 2,031 ,000
4 1,490 1,274 2,194 ,000
5 1,490 1,274 1,664 2,424 ,000
6 2,808 2,424 ,471 2,602 2,194
7 2,989 2,602 ,891 2,424 2,602
8 1,799 1,664 2,194 ,471 2,602
9 1,274 1,490 3,566 1,799 2,808
10 1,274 ,891 1,274 1,274 1,274
6 7 8 9 10
6 ,000
7 ,471 ,000
8 2,424 2,194 ,000
9 3,987 3,987 2,194 ,000
10 1,664 1,664 1,490 2,424 ,000
With this the text based output ends and the graphical output follows. The first thing that is
presented is the result of the multidimensional scaling in two dimensions:
[Derived Stimulus Configuration (Euclidean distance model): the ten soft drinks (cocacola, pepsi, fanta, afri, sprite, bluna, sinalco, club, redbull, mezzo) plotted in the plane spanned by Dimension 1 and Dimension 2]
This plot gives a visual impression of the position of the soft drinks in relation to each other, based
on the two respondents’ evaluation.
The two respondents’ opinion of the ten soft drinks can be divided into six categories:
1. Orange lemonade (Sinalco, Bluna and Fanta)
2. Lemon lemonade (Sprite)
3. Cola-Fanta-mix (mezzo-mix)
4. Branded Cola (Coca-Cola and Pepsi)
5. Regional brands (Club-Cola and Afri-Cola)
6. Energy drinks (Red Bull)
It seems that the further to the right in the co-ordinate system the observation is located, the larger
the caffeine content in the drinks. The drinks to the left contain little or no caffeine.
The vertical dimension indicates facts about the brands of the drinks. The well-known brands lie further
south in the graph, while the lesser-known brands are situated further north.
The last scatter plots serve the purpose of showing how good the multidimensional scaling is.
[Scatterplot of Linear Fit (Euclidean distance model): Distances (y-axis) plotted against Disparities (x-axis)]
This scatter plot shows the linear fit. In this case Disparities has been plotted against Distances. The
closer these points are around the sloping line through the origin, the larger the degree of
explanation.
10. Conjoint Analysis
This chapter’s explanation and interpretation of Conjoint Analysis in SPSS is based on the
following literature:
- ”Analyse af markedsdata”, Niels Blunch, 2nd rev. edition 2000, Systime
- ”Multidimensional skalering og conjoint analyse – et produktudviklingsværktøj”, Internt undervisningsmateriale H nr. 151, chp. 3, pp. 40-59, 68-73
- “Multivariate Data Analysis”, Hair, Anderson et al., 5th edition, Prentice Hall
10.1 Introduction
Conjoint analysis is a statistical method especially used when conducting market analysis. The
analysis can help explain consumer preferences for (ranking of) existing or new products/product
concepts. Aside from market analysis, conjoint analysis can also be used for a number of other
analyses, such as cost-benefit analysis, segmentation, etc. For a more detailed explanation of these
analyses please refer to the above-mentioned literature.
The purpose of the analysis in this chapter is to determine a company’s optimal product design. This is
done by studying how much a respondent is willing to give up of one attribute in order to receive more of
another.
An important part of the conjoint analysis is the experiment plan where each respondent ranks the presented
product designs based on their combinations of attributes. However, due to the number of attribute
combinations this will often prove to be a difficult task. Therefore an easier method is available in SPSS:
here the respondent is exposed to a smaller number of attribute combinations, and based on these combinations it is
possible to determine the importance of each attribute.
10.2 Example
Data: Blank dataset (see example)
\\okf-filesrv1\exemp\Spss\Manual\Conjoint_orthogonaldesign.sav
\\okf-filesrv1\exemp\Spss\Manual\Conjointdata.sav
A company is interested in marketing their new carpet and furniture cleaning product. The management has identified 5 attributes which are believed to influence consumer preferences: package design (package), brand, price, seal type (seal), and money-back guarantee if unsatisfied (money).
In the table below it is shown how many levels each attribute has.
Attribute Level 1 Level 2 Level 3
Package A B C
Brand K2R Glory Bissell
Price $ 1,19 $ 1,39 $ 1,59
Seal No Yes
Money No Yes
The company is now looking to perform a conjoint analysis in order to examine how much value
the consumer assigns each attribute’s different levels and which attributes the consumer finds most
important. Therefore an analysis is conducted where 10 respondents rank a number of product concepts based on their preferences for each concept. However, a full factorial design would require the respondents to rank 3x3x3x2x2 = 108 different alternatives. In order not to confuse the respondents and to keep costs to a minimum, a simplified examination is therefore conducted at first.
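The size of the full factorial design can be verified with a short sketch (Python is used here purely for illustration; the attribute levels are those from the table above):

```python
from itertools import product

# Attribute levels from the table above
levels = {
    "package": ["A", "B", "C"],
    "brand":   ["K2R", "Glory", "Bissell"],
    "price":   ["$1,19", "$1,39", "$1,59"],
    "seal":    ["No", "Yes"],
    "money":   ["No", "Yes"],
}

# One combination per product concept in the full factorial design
full_factorial = list(product(*levels.values()))
print(len(full_factorial))  # 108
```

Reducing these 108 combinations to a manageable subset is exactly what the orthogonal design in the next section accomplishes.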
The entire example basically consists of two parts. First, the reduced number of combinations which the respondents have to assess is determined. Then, based on this result, the importance of each attribute is determined.
10.3 Generating an orthogonal design
SPSS can help you generate a simplified analysis plan; this way the 108 product concepts are reduced to a more reasonable number. In spite of the reduction, a large part of the informational value is still present.6 In order to start this procedure an empty dataset is opened in SPSS. Below the full procedure is explained.
6 This function assumes that there are no interactions.
After Orthogonal Design and Generate… have been activated, the following dialogue box appears; this is where the definition of the design is carried out.
In the Factor Name field you specify each variable for the orthogonal design separately. Furthermore, a label has to be specified for each variable, since these will be used on the generated plancards which will be shown to the respondents. E.g. the package variable is defined with the label package design. When this has been done, click Add and the variable is transferred to the large area. This is done for all variables.
In the Data File area you can either choose to save the new dataset on the computer or choose
Replace working data file which will make the generated data appear in the current dataset.
After this, the levels for each variable have to be specified. Each factor has to be selected one at a time, followed by a click on Define Values. Doing so activates the following box:
As can be seen from the figure, a value has to be specified for each level of the variable. The figure above shows the defined values for the variable Brand. It is also possible to specify labels for each of the levels; it is recommended that this is done. Furthermore, it is possible to make SPSS generate a number of levels using Auto-Fill. However, you still have to specify the labels for each level yourself.
When this is done for all the variables the actual design function must be established. This is done
by activating options in the main dialogue box.
10.3.1 Options
In the Options dialogue box the desired number of cases is specified. Since the company wishes to test 18 designs, the number 18 is stated in Minimum number of cases to generate; these 18 different designs are then picked from the possible 108. If nothing is specified, SPSS will automatically generate the minimum number of cases necessary to calculate the value of each of the attributes. The more different designs that are generated by SPSS – and assessed by the respondents – the better the results will be, but at the same time the costs will also increase, and at some point there are so many different designs that the validity of each respondent's answers will drop.
Holdout Cases is set to 4, which means that 4 extra designs are generated. These will likewise be assessed by the respondents but not used in the estimation of the utilities (the value of each of the attributes). The Holdout Cases are used to check the validity of the analysis.
The procedure is now ready to be started, so OK is clicked in the main dialogue box and the design will be generated.
10.4 PlanCards
The settings above imply that 22 plancards are generated, which the respondents later have to assess. Furthermore, 2 so-called simulation cards are also added; these can be either existing or expected products, which will then be assessed based on the results of the test. The simulation cards have to be created manually in SPSS along with the specification of the attributes. The resulting dataset can be found at:
\\okf-filesrv1\exemp\spss\manual\conjoint_orthogonaldesign.sav
These plancards must then be moved to the output window. This procedure is shown below:
By doing this, the dialogue box shown below appears; here it is specified which variables are included in the plancards.
In this case we want the respondents to assess all the different variables, which means that they are all selected. In addition, Listing for experimenter and Profiles for subjects are activated.
Below examples of two generated plancards are shown.
Profile Number 1
Card ID: 1 – Package design: A – Brand name: Glory – Price: $1,39 – Good Housekeeping seal: Yes – Money-back guarantee: No

Profile Number 2
Card ID: 2 – Package design: B – Brand name: K2R – Price: $1,19 – Good Housekeeping seal: No – Money-back guarantee: No
10.5 Conjoint analysis
The generated plancards are now used in the analysis, where a number of respondents have to assess the different designs. There are different ways the respondents can be asked to assess the designs. In this example each respondent has ranked all of the 22 (18 + 4) different plancards from 1 to 22. The higher the rank value, the higher the respondent's preference for this combination.
For this illustration 10 respondents have been used. However, this is clearly below what is normally
recommended.
The result can be seen here:
\\okf-filesrv1\exemp\Spss\Manual\Conjointdata.sav
In SPSS it is not possible to perform a conjoint analysis through the normal point-and-click interface; therefore it is necessary to use the syntax. The syntax code for the conjoint analysis is shown below:
Furthermore, in SPSS it is possible to specify a relationship between single variables and the respondents' rankings, if one is believed to be present. In the window above this is described as e.g. discrete and linear (less/more specifies whether the relationship is believed to be positive or negative). Further explanation of the code is available in the SPSS help menu. Choose Help\topics and search for "Conjoint", and the following command syntax will show:
In order to run the code from the syntax window, choose the Run menu and then All. SPSS then performs the conjoint analysis and the result can be seen in the output window.
10.6 Output
In the output, the results for each respondent are available as well as an average result across all the respondents. The output of the average result is shown below in (1) Overall Statistics. Initially only the results for the first 2 respondents are shown; however, if the output window is expanded at the bottom then all results are available.
Importance Values

Averaged Importance Score
package   35,635
brand     14,911
price     29,410
seal      11,172
money      8,872
The Averaged Importance column shows each attribute's importance to the respondents. The higher the score, the more important the specific attribute is to the consumers. From the output it can be seen that the average respondent regards Package Design and Price as the most important
attributes. After those comes Brand, then Good Housekeeping seal, and least important is Money-back Guarantee.
Utilities

                        Utility Estimate   Std. Error
package     A               -2,233            ,192
            B                1,867            ,192
            C                 ,367            ,192
brand       K2R               ,367            ,192
            Glory            -,350            ,192
            Bissell          -,017            ,192
price       $1,19           -1,108            ,166
            $1,39           -2,217            ,332
            $1,59           -3,325            ,498
seal        No               2,000            ,287
            Yes              4,000            ,575
money       No               1,250            ,287
            Yes              2,500            ,575
(Constant)                   7,383            ,650

Coefficients

        B Estimate
price     -1,108
seal       2,000
money      1,250
Utility expresses the utility values for the levels of each attribute. These values are calculated in such a way that if they are added together, with respect to each of the attribute combinations, the product that receives the highest score will also be the product that has received the highest rank. An example of how to calculate the utility for a product concept is shown below (Package = B, Brand name = K2R, Price = $1,19, Good Housekeeping seal = No and Money-back guarantee = No):
1,8667 + 0,3667 – 1,1083 + 2 + 1,25 = 4,3751
It is these calculated utility values the company can employ as a basis for decisions about the development of their new product. This, however, requires a closer study of the output, which will not be described here.
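As a check of the additive logic, the part-worths from the Utilities output can be summed in a small sketch (Python for illustration; the values are the rounded three-decimal table figures, so the total differs in the last decimals from the four-decimal hand calculation):

```python
# Part-worth utilities from the Utilities output (decimal points instead of commas)
utilities = {
    "package": {"A": -2.233, "B": 1.867, "C": 0.367},
    "brand":   {"K2R": 0.367, "Glory": -0.350, "Bissell": -0.017},
    "price":   {"$1,19": -1.108, "$1,39": -2.217, "$1,59": -3.325},
    "seal":    {"No": 2.000, "Yes": 4.000},
    "money":   {"No": 1.250, "Yes": 2.500},
}

def total_utility(profile):
    """Sum the part-worth utilities of one product concept."""
    return sum(utilities[attr][level] for attr, level in profile.items())

concept = {"package": "B", "brand": "K2R", "price": "$1,19",
           "seal": "No", "money": "No"}
print(round(total_utility(concept), 3))  # 4.376
```

Scanning all level combinations for the highest total is how the utilities translate into an optimal product design.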
The Factor plots are mini plots of the utility values for the levels of each attribute. They basically show the same as the plots found at the bottom of the output. From the Factor plots one can see that B is the preferred package design and that $1,19 is the preferred price.
These results must of course be seen in relation to the fact that the respondents gave the highest grades to the most preferred product combinations. This means that the higher the utility values, the better.
Correlations(a)

                              Value   Sig.
Pearson's R                   ,982    ,000
Kendall's tau                 ,892    ,000
Kendall's tau for Holdouts    ,667    ,087

a. Correlations between observed and estimated preferences
Pearson's R and Kendall's tau, which are given after each respondent's values, indicate how good the model is. Both are calculated as correlations between the respondents' observed and estimated preferences, and therefore they should be as high as possible.
By changing the syntax /PRINT=ALL to /PRINT=SUMMARYONLY the following output can be obtained.
(2) Preference Probabilities of Simulations:
Preference Probabilities of Simulations(b)

Card Number   ID   Maximum Utility(a)   Bradley-Terry-Luce   Logit
1             23        30,0%                43,1%           30,9%
2             24        70,0%                56,9%           69,1%

a. Including tied simulations
b. y out of x subjects are used in the Bradley-Terry-Luce and Logit methods because these subjects have all nonnegative scores.
(2) Preference Probabilities of Simulations shows the probabilities of each profile being chosen as the preferred one, calculated with three different probability models. The two profiles assessed here are the simulation designs which were generated earlier; these could be either current or newly planned products.
The three probability models applied are the maximum utility model, the BTL (Bradley-Terry-Luce) model, and the logit model. The probabilities are calculated based on the two earlier mentioned simulation cards. It can be seen that all three models indicate that card number 24 is preferred.
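The three decision rules can be sketched as follows (Python for illustration; the utility values here are made-up placeholders, not the values behind the output above):

```python
import math

# Made-up total utilities of the two simulation cards for one respondent
u = {23: 3.0, 24: 7.0}

# Maximum utility: all probability mass on the highest-utility card
max_util = {card: float(val == max(u.values())) for card, val in u.items()}

# Bradley-Terry-Luce: utility divided by the sum of utilities
btl = {card: val / sum(u.values()) for card, val in u.items()}

# Logit: exponentiated utility divided by the sum of exponentiated utilities
denom = sum(math.exp(val) for val in u.values())
logit = {card: math.exp(val) / denom for card, val in u.items()}

print(max_util[24], btl[24], round(logit[24], 3))  # 1.0 0.7 0.982
```

SPSS then averages these per-respondent probabilities over the subjects with all-nonnegative scores, which is what footnote (b) in the output refers to.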
11. Item analysis
The explanation and interpretation of item analysis presented in this section is based on the following literature:
- SPSS Base 10.0 User's Guide, SPSS, Ch. 34, pp. 407-411.
11.1 Introduction
Item analysis is often used for evaluating questions in questionnaires. The analysis is used to confirm the choice of a group of questions which, added together, are presumed to measure a predefined concept.
The analysis can be used for several purposes. One of the purposes is to evaluate each question. In
this connection the focus is on whether questions can be left out of an analysis, perhaps because the
questions do not measure what they were intended to measure.
Furthermore, it is possible to examine if the answers should be turned around by making a so-called
negation. For example, an answer with the value 1 (on a scale of 1 to 5) might be turned around and
be assigned a value of 5 instead. This can be very important for the further analysis. The item
analysis is often used before the factor analysis (see chapter 8) or AMOS models (See also “Manual
for AMOS 6.0” from ITA).
11.2 Example
Data set: \\okf-filesrv1\exemp\Spss\Manual\Itemanalysedata.sav
This example is based on the questionnaires concerning the lectures, which are filled out by the students at the Aarhus School of Business every semester. Each questionnaire contains 19 questions. The students are asked to evaluate the subject itself, the pedagogical competence, and their own performance. The data is the same as was used in the Factor Analysis (chapter 8). The answer to each single question is indicated on a Likert scale from 1 to 5 where:
• 1 is: No, not at all
• 3 is: Acceptable
• 5 is: Yes, absolutely
In this example of the item analysis, the purpose is to assess to what extent the first eight questions
are relevant for the evaluation of the subject and to what extent questions can be left out.
Furthermore it is examined whether it is relevant to make so-called negations on the eight questions.
11.3 Implementation of the analysis
In SPSS terminology, item analysis is called Reliability Analysis. Note that it is not possible to look up the word Item in the online help function.
The following choices are made under Analyze:
This results in the following dialogue box:
In this dialogue box the following variables (questions) are chosen in the window to the left and
moved to the right-hand side window by clicking on the arrow:
Relevant, svrt, tid_udb, spn_int, lit_egn, lit_insp, pensum, udbytte
Two things must now be specified. First a Model must be chosen, and subsequently it is possible to define the output under Statistics.
11.3.1 Model
Model gives the following possibilities of reliability models:
• Alpha (Cronbach): This is a model of internal consistency, based on the average inter-item
correlation.
• Split-half: This model splits the scale into two parts and examines the correlation between
the parts.
• Guttman: This model computes Guttman’s lower bounds for true reliability.
• Parallel: This model assumes that all items have equal variances and equal error variances
across replications.
• Strict parallel: This model makes the assumptions of the parallel model and also assumes
equal means across items.
In this example Alpha has been chosen as model.
11.3.2 Statistics
By clicking the Statistics button, the following dialogue box appears. The boxes that are activated
indicate the chosen outputs:
Descriptives for provides the opportunity to display the mean and the standard deviation for each item and for the scale. In this case Item and Scale have been chosen.
Scale if item deleted indicates how much the scale (Cronbach's α) changes if an item is removed from the analysis.
Inter-item provides the opportunity to display the correlation matrix and/or the covariance matrix
between all items.
Summaries:
• Means: Summary statistics for item means. The smallest, largest, and average item means,
the range and variance of item means, and the ratio of the largest to the smallest item means
are displayed.
• Variances: The same as for means, only for the variance instead.
• Covariances: The same as for means, only for the covariances between items.
• Correlations: The same as for means, only for the correlations instead.
ANOVA Table:
Here it is possible to carry out a repeated-measures analysis of variance with an F-test. The Friedman chi-square test is appropriate for rank data. The Cochran chi-square is appropriate for data that is dichotomous.
The following possibilities are also available:
• Hotelling’s T-square: A multivariate test of the null hypothesis that all items on the scale
have the same mean.
• Tukey’s test of additivity: Estimates the power to which a scale must be raised to achieve
additivity. Tests the assumption that there is no multiplicative interaction among the items.
• Intraclass correlation coefficient: Produces single and average measure intraclass
correlation coefficients, along with a confidence interval, F statistic, and significance value
for each.
• Model: Here, the model for the computation of the intraclass correlation coefficients is specified. Select two-way mixed when people effects are random and item effects are fixed, two-way random when both people effects and item effects are random, and one-way random when people effects are random.
• Type: Here the definition of the intraclass correlation coefficient is chosen.
• Confidence interval: Here the desired confidence interval is indicated.
• Test value: The value with which an estimate of the intraclass correlation coefficient is
compared. The value should be between 0 and 1.
11.4 Output
Below, the relevant output is chosen and commented on. The first output is (1) Correlation matrix,
which shows the correlation between the responses on the eight questions.
Inter-Item Correlation Matrix

           relevant    svrt  tid_udb  spn_int  lit_egn  lit_insp  pensum  udbytte
relevant     1,000    -,155     ,399     ,427     ,305      ,333   -,187     ,447
svrt         -,155    1,000    -,276    -,267    -,139     -,144    ,313    -,248
tid_udb       ,399    -,276    1,000     ,404     ,318      ,232   -,232     ,474
spn_int       ,427    -,267     ,404    1,000     ,346      ,438   -,057     ,443
lit_egn       ,305    -,139     ,318     ,346    1,000      ,547   -,214     ,447
lit_insp      ,333    -,144     ,232     ,438     ,547     1,000   -,086     ,332
pensum       -,187     ,313    -,232    -,057    -,214     -,086   1,000    -,193
udbytte       ,447    -,248     ,474     ,443     ,447      ,332   -,193    1,000
This output shows that svrt (the degree of difficulty) and pensum (the extent of the curriculum) correlate negatively with the remainder of the variables. For this reason a so-called negation is made, where the scale is turned around. This is done by creating two new variables (choose Transform – Compute):
• NY_SVRT = 6 – SVRT
• NY_PENSUM = 6 – PENSUM
The negation takes place by calculating the new variables as N + 1 − the former value, where N is the number of possible answers. After the new variables have been calculated, the analysis is run once more with the two new variables but without the variables svrt and pensum. The method is the same as previously described and will not be repeated here. The new (2) Correlation Matrix looks as follows:
Inter-Item Correlation Matrix

            relevant  tid_udb  spn_int  lit_egn  lit_insp  udbytte  ny_svrt  Ny_PENSUM
relevant      1,000     ,399     ,427     ,305      ,333     ,447     ,155       ,187
tid_udb        ,399    1,000     ,404     ,318      ,232     ,474     ,276       ,232
spn_int        ,427     ,404    1,000     ,346      ,438     ,443     ,267       ,057
lit_egn        ,305     ,318     ,346    1,000      ,547     ,447     ,139       ,214
lit_insp       ,333     ,232     ,438     ,547     1,000     ,332     ,144       ,086
udbytte        ,447     ,474     ,443     ,447      ,332    1,000     ,248       ,193
ny_svrt        ,155     ,276     ,267     ,139      ,144     ,248    1,000       ,313
Ny_PENSUM      ,187     ,232     ,057     ,214      ,086     ,193     ,313      1,000
The negative correlations are now positive.
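The negation rule, N + 1 − former value, can be sketched like this (Python for illustration, on the 1-to-5 Likert scale used here):

```python
def negate(value, n_categories=5):
    """Negation: N + 1 - former value, where N is the number of possible answers."""
    return n_categories + 1 - value

# On a 1-to-5 Likert scale, 1 becomes 5, 3 stays 3, and 5 becomes 1
print([negate(v) for v in [1, 2, 3, 4, 5]])  # [5, 4, 3, 2, 1]
```

With N = 5 this is exactly the 6 − SVRT and 6 − PENSUM computations defined above.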
Then follows (3) Item-total statistics. This output shows what Cronbach's alpha would be if each of the items/questions were removed.
Item-Total Statistics

             Scale Mean if   Scale Variance if   Corrected Item-     Squared Multiple   Cronbach's Alpha
             Item Deleted    Item Deleted        Total Correlation   Correlation        if Item Deleted
relevant        19,7429         12,020               ,511                ,311                ,741
tid_udb         20,0810         12,792               ,534                ,326                ,740
spn_int         19,9667         11,879               ,546                ,377                ,735
lit_egn         20,1048         12,104               ,526                ,399                ,739
lit_insp        20,6429         12,652               ,488                ,380                ,746
udbytte         20,1095         12,404               ,600                ,398                ,729
ny_svrt         21,2905         13,403               ,334                ,183                ,771
Ny_PENSUM       20,5286         13,772               ,275                ,167                ,780
Output (3) must be compared to Cronbach’s alpha for the analysis, which is mentioned in the output
below (4) Reliability Coefficients.
Reliability Statistics

Cronbach's Alpha   Cronbach's Alpha Based on Standardized Items   N of Items
     ,773                            ,774                             8
By comparing outputs (3) and (4) it can be seen that Cronbach's alpha climbs from 0,773 to 0,780 if Ny_PENSUM is removed from the analysis. Hence, a decision should be made whether this question contributes to the analysis. The rule of thumb is that an item should be removed if Cronbach's alpha increases when it is deleted. Furthermore, Cronbach's alpha is in this case higher than 0.7, which it should be. It should also be noted that Cronbach's alpha tends to increase with the number of items.
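For reference, Cronbach's alpha is computed as α = k/(k−1) · (1 − Σ item variances / variance of the total score). A small sketch on made-up answers (Python for illustration; this is not the lecture-evaluation data):

```python
def cronbach_alpha(items):
    """items: one list of answers per item, all of equal length."""
    k = len(items)

    def var(xs):  # sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    totals = [sum(col) for col in zip(*items)]  # each respondent's total score
    return k / (k - 1) * (1 - sum(var(it) for it in items) / var(totals))

# Made-up answers from 5 respondents on 3 items (1-to-5 scale)
items = [[3, 4, 5, 2, 4],
         [2, 4, 5, 3, 4],
         [3, 5, 4, 2, 5]]
print(round(cronbach_alpha(items), 3))  # 0.886
```

Dropping an item and recomputing alpha on the remaining items reproduces the logic behind the "Cronbach's Alpha if Item Deleted" column.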
12. Introduction to Matrix algebra
The purpose of this introduction to matrix algebra in SPSS is to provide the initial insight needed to become familiar with the programming language. The matrix programming language contains many twists and turns, so you will not become an expert by following this introduction alone. The introduction primarily covers matrix algebraic operations and is strongly inspired by the Cand.merc. subjects "Applied Econometric Methods" and "Applied Quantitative Methods".
The complexities encountered when using interactive matrix programming are NOT associated with identifying and understanding the applied syntax commands, since this is relatively simple. On the other hand, it can be quite difficult to gain an overview of the vast possibilities that the program provides access to. It is often easy to determine, through theoretical considerations, which matrices one wishes to compute, but it can be quite problematic to determine which algebraic operators and functions are needed to achieve the desired computation. Even so, all these difficulties diminish with repeated use of the program.
12.1 Initialization
In order to work with matrix algebra in SPSS it is necessary to work in the syntax environment.
This is done by opening a syntax file under the menu FILE – NEW – SYNTAX. It is then possible to initialize the recognition of matrix programming code by writing the following syntax command:
MATRIX.
It is important to remember that matrix programming code must be executed with the following
syntax statement:
END MATRIX.
In other words the matrix algebra must be written between these 2 commands.
12.2 Importing variables into matrices
Often it will be necessary for you to import an existing dataset into matrices. For this purpose you
can use the following syntax:
GET X
/VARIABLES = Variable1, Variable2
/NAMES = VARNAMES .
Above, SPSS reads in all the observations from the open dataset into matrix X. The variable
selection is carried out by listing the variable names after the equal sign, separating the variable
names by commas. The selected variables will hence be imported as column vectors into the matrix
X.
12.3 Matrix computation
The calculations in the matrix language can be carried out on common scalar values, or you can exploit matrix operators to save time and space. The matrix language treats scalar values as matrices of dimension 1x1.
The establishment of scalar values is carried out by using the following command:
Compute A = 2.
Establishing a row vector:
Compute B = {1,2, 3}.
Establishing a column vector:
Compute C = {4;5;6}.
Finally the establishment of matrices:
Compute D = {1,2,3; 4,5,6; 7,8,9}.
Or for an improved overview:
Compute E ={4,2,3;
6,8,6;
1,3,9}.
If you are interested in seeing the matrices that have been computed you can use the following
code:
Print E.
This prints the matrix E to the output. One must be aware that it is not possible to list several matrices in the Print command unless the matrices have an equal number of columns or rows AND you use curly brackets to embrace the matrices, separated by commas, as shown here:
Print {B,E}.
This prints a matrix composed of the two temporarily merged matrices. To summarize, the following code and corresponding output exemplify the above-mentioned syntax commands:
Example 1 Full syntax file.
MATRIX .
Compute A = 2.
Compute B = {1,2, 3}.
Compute C ={4;5;6}.
Compute D = {1,2,3; 4,5,6; 7,8,9}.
Compute E = {4,2,3;
6,8,6;
1,3,9}.
Print A.
Print B.
Print C.
Print {A,B}.
Print {B;D}.
END MATRIX .
Example 1 Output
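For readers who want to verify results outside SPSS, the matrices from Example 1 can be set up in Python/NumPy (an aid for checking only, not part of the SPSS syntax):

```python
import numpy as np

A = np.array([[2]])            # a scalar is a 1x1 matrix
B = np.array([[1, 2, 3]])      # row vector
C = np.array([[4], [5], [6]])  # column vector
D = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
E = np.array([[4, 2, 3],
              [6, 8, 6],
              [1, 3, 9]])

# Print {A,B}: horizontal merge (equal number of rows)
print(np.hstack([A, B]))
# Print {B;D}: vertical merge (equal number of columns)
print(np.vstack([B, D]))
```

The curly-bracket notation {.,.} and {.;.} in SPSS corresponds to horizontal and vertical stacking, respectively.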
12.4 General Matrix operations
The following table provides an overview of the different operators that are applied in order to carry
out matrix operations.
Symbol Operator
−
Changes the sign of the matrix. A minus sign alone in front of a matrix
will invert the sign of each element in the matrix.
+ Matrix addition.
− Matrix subtraction.
* Multiplication.
/ Division.
**
Matrix exponentiation. The matrix is multiplied by itself, as many times
as the number it is raised to the power of.
&* Elementwise multiplication.
&/ Elementwise division.
&** Elementwise exponentiation.
Scalar values and matrices can hence be combined based on conventional arithmetic operations.
This can be seen by adding the following syntax to the before mentioned matrix syntax (remember
that the code must be between the MATRIX. and END MATRIX. statements):
Compute A = B+4.
Compute F = B-4.
Compute G= C*4.
Compute H= D/4.
Print A.
Print F.
Print G.
Print H.
By running (RUN - ALL) the new syntax code it is possible to get the following output:
A
  5  6  7

F
  -3  -2  -1

G
  16
  20
  24

H
   ,250000000   ,500000000   ,750000000
  1,000000000  1,250000000  1,500000000
  1,750000000  2,000000000  2,250000000
Note that the matrix A no longer has the value 2.
Two matrices can only be added if they are of the same order:
Compute I = D+E.
Print I.
By adding this code you will add the following to your new output document:
The same rules apply to subtraction. As seen in the matrix operator table it is possible to change the sign of the elements in a matrix by using a minus sign, as seen here:
Multiplication can only be carried out as long as the matrices conform.
Compute K = B*D.
Print K.
In this case the dimensions of B are 1x3 while D is 3x3, meaning that K becomes a 1x3 matrix:
The joining of two matrices can be done either horizontally or vertically, but once again it is
required that the matrices conform to each other.
Compute L={D,C}. /* Horizontal: 3x3 matrix with a 3x1 column vector. */
Print L.
I
   5   4   6
  10  13  12
   8  11  18

K
  30  36  42
Compute M={D;B}. /* Vertical: 3x3 matrix with a 1x3 row vector. */
Print M.
The results of these mergers are:

L
  1  2  3  4
  4  5  6  5
  7  8  9  6

M
  1  2  3
  4  5  6
  7  8  9
  1  2  3
It is also possible to compute matrix values by using the index operator (:). It works as follows:
Compute N={9:4}.
Print N.
The result is:
N 9 8 7 6 5 4
It is also possible to generate arithmetic sequences by using the following type of syntax:
Compute O = {2:14:3}.
Print O.
In this case a matrix O is computed. Its first column value is 2, after which each following column's value increases by 3 until the last column's value reaches 14.

O
  2  5  8  11  14
Finally, it is also possible to transpose a matrix – or a vector – by using the following syntax:
Compute P=T(K).
Print P.
In this case a new matrix, P, is computed which corresponds to the transposed K matrix. The output resulting from adding the above syntax is:

P
  30
  36
  42
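The index operator and the transpose have direct counterparts outside SPSS; here a NumPy sketch for checking the results above (an illustration only, not SPSS syntax):

```python
import numpy as np

N = np.arange(9, 3, -1)       # {9:4}: counts down from 9 to 4
O = np.arange(2, 15, 3)       # {2:14:3}: from 2 in steps of 3 up to 14

K = np.array([[30, 36, 42]])  # the 1x3 result of B*D
P = K.T                       # T(K): a 3x1 column vector

print(N)        # [9 8 7 6 5 4]
print(O)        # [ 2  5  8 11 14]
print(P.shape)  # (3, 1)
```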
The use of different matrix operations will now be exemplified through the use of a more
comprehensive example.
EXAMPLE 2
The Simpson family has decided to keep account of the expenses used on their meals, so they can
track the daily expenses for each person. The family is composed of 4 people: Homer, Marge, Bart
and Lisa. The expenses are registered into three categories: Breakfast, Lunch and Dinner.
The accounts for the past week can be seen in the following table:
ACCOUNTS BREAKFAST LUNCH DINNER
Homer 50.50 45.00 67.25
Marge 45.00 40.25 88.75
Bart 30.00 30.25 30.75
Lisa 20.00 40.00 80.50
The Simpson Family has chosen to keep track of their expenses using SPSS matrix algebra.
1.part
MATRIX.
COMPUTE Accounts = {50.5, 45, 67.25;
45, 40.25, 88.75;
30, 30.25, 30.75;
20, 40, 80.5}.
COMPUTE names = {"Homer";"Marge";"Bart";"Lisa"}.
COMPUTE meal = {"Breakfast","Lunch","Dinner"}.
PRINT Accounts /TITLE = "The Simpson Family" /FORMAT = F12.2 /RNAMES =
names /CNAMES = meal.
END MATRIX.
This provides the following output:
The Simpson Family
         Breakfast   Lunch   Dinner
Homer      50,50     45,00    67,25
Marge      45,00     40,25    88,75
Bart       30,00     30,25    30,75
Lisa       20,00     40,00    80,50
The family agrees to compute their expenses without VAT, and therefore adds the following syntax within the matrix syntax:
2.part
COMPUTE vat = 0.22.
COMPUTE Accounts = Accounts/(1+vat).
PRINT Accounts /TITLE "The Simpson Family, Without VAT" /FORMAT = F12.2
/RNAMES = names /CNAMES = meal.
This results in the following output:
The Simpson Family, Without VAT
         Breakfast   Lunch   Dinner
Homer      41,39     36,89    55,12
Marge      36,89     32,99    72,75
Bart       24,59     24,80    25,20
Lisa       16,39     32,79    65,98
When the week is over the accounts are totaled for each person.
3.part
COMPUTE unit={1;1;1}.
COMPUTE pers_tot = Accounts*unit.
PRINT pers_tot /TITLE "Total each person" /FORMAT = F12.2 /RNAMES = names.
This results in the following table:
Total each person
Homer    133,40
Marge    142,62
Bart      74,59
Lisa     115,16
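The per-person totals come from multiplying the VAT-adjusted accounts matrix by a column of ones; a NumPy sketch reproducing the table above (for checking only, not SPSS syntax):

```python
import numpy as np

# Weekly accounts from the table above (rows: Homer, Marge, Bart, Lisa)
accounts = np.array([[50.50, 45.00, 67.25],
                     [45.00, 40.25, 88.75],
                     [30.00, 30.25, 30.75],
                     [20.00, 40.00, 80.50]])

vat = 0.22
accounts = accounts / (1 + vat)   # remove VAT, as in 2.part

unit = np.ones((3, 1))            # column vector of ones
pers_tot = accounts @ unit        # row totals via a matrix product
# Homer 133.40, Marge 142.62, Bart 74.59, Lisa 115.16
print(np.round(pers_tot.ravel(), 2))
```

Multiplying a 4x3 matrix by a 3x1 vector of ones sums each row, which is exactly what `Accounts*unit` does in the SPSS syntax.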
In the meantime the Simpson family has received a guest for the holidays: Milhouse, a poor student, who will also be entered into the family accounts. Milhouse is exempt from VAT!
4.part
COMPUTE guest = {"Milhouse"}.
COMPUTE newnames = {names; guest}.
COMPUTE mexpense = {80, 120, 150}.
COMPUTE holiday = {Accounts; mexpense}.
PRINT holiday /TITLE "Holiday Accounts" /FORMAT = F12.2 /RNAMES = newnames
/CNAMES = meal.
Below it is possible to observe the output that results from adding the above syntax.
Holiday Accounts
           Breakfast   Lunch    Dinner
Homer        41,39     36,89     55,12
Marge        36,89     32,99     72,75
Bart         24,59     24,80     25,20
Lisa         16,39     32,79     65,98
Milhouse     80,00    120,00    150,00
12.5 Matrix algebraic functions
Function Use
DET(squarematrix) Returns the determinant of a square matrix.
MDIAG(argument) Returns a diagonal matrix. The argument is a vector or a square matrix.
EVAL(symmetricmatrix) Returns a column vector containing the eigenvalues in descending order.
IDENT(dimension) Computes an identity matrix.
INV(matrix) Computes the inverse of a matrix.
MAKE(n,m,value) Computes an n x m matrix of identical values.
MSSQ(matrix) Computes the sum of squares of all elements.
CSUM(matrix) Computes the sums of the column elements.
TRACE(matrix) Returns the sum of the diagonal elements.
NCOL(matrix) Returns a scalar containing the number of columns in a matrix.
NROW(matrix) Returns a scalar containing the number of rows in a matrix.
SOLVE(Matrix1,Matrix2) Solves systems of linear equations. If M1*X=M2, then X = SOLVE(M1,M2).
DIAG(matrix) Computes a column vector containing the elements of the main diagonal.
Below, the use of some of the above functions is demonstrated in a comprehensive example. The example carries out a regression analysis. The theory used is, where deemed necessary, included as comments in the syntax. The example can be run by entering the code into a syntax window (FILE-NEW-SYNTAX) and then using the menu point "Run All" (RUN-ALL).
EXAMPLE 3 Regression analysis:
Use the dataset which can be found here: \\okf-filesrv1\exemp\Spss\Ha manual\het-km-alder.sav .
MATRIX. /*Start recognition of matrix commands in SPSS syntax*/
GET y /* Import the dependent variable from the open dataset */
/VARIABLES = km /* The variable's name, in this case km */
/NAMES = varnamey . /* Row vector with the variable name */
GET x /* Import the independent variables from the open dataset */
/VARIABLES = alder /* The variable names, in this case alder; with more than one
variable: separate variable names with commas */
/NAMES = varnamex. /* Row vector with variable names */
Compute k = ncol(x). /*Problem dimension established*/
Compute n = nrow(x).
Compute varnames = t({"Constant", varnameX}). /* Create a name column vector*/
Compute con = make(n,1,1). /* Create a column vector to compute the constant*/
Compute x={con,x}. /* Merge the constant to the independent variables*/
Compute df2 = n-k-1. /* Degrees of freedom established*/
Compute invxtx = inv(t(x)*x). /* OLS-estimation*/
Compute b = invxtx*(t(x)*y). /* b = (x'x)^(-1) x'y */
Compute j = nrow(b). /* Number of estimated parameters */
Compute yhat = x*b. /*Estimated value of Y, called yhat*/
Compute resid = (y-(yhat)). /*Hence calculate residuals*/
Compute sse = cssq(resid). /* The sum of squared errors (sum of ei^2) is computed
to gain an estimate of the unexplained variation*/
Compute mse = sse/df2. /* Mean-Square Error*/
Compute tss = cssq((y-(csum(y)/n))). /* Total variation, SAKy or TSS*/
Compute r2 = (tss-sse)/tss. /* Explanatory strength, R2 */
Compute covb = (invxtx)&*(mse). /* Variance-covariance matrix*/
Compute stdeb = sqrt(diag(covb)). /*Column vector with std error of parameters */
Compute tobs = b/stdeb. /* Calculate t- values, t=(b-β)/σβ */
Compute f = ((tss-sse)/k)/mse.
Compute pf = 1-fcdf(f,k,df2). /* Lookup in the F-distribution */
Compute s = sqrt(mse). /* Regression Standard Error */
Compute pf = {r2,f,k,df2,pf}. /* Result row vector */
Compute p = 2*(1-tcdf(abs(tobs), n-j)). /*p-values for parameter estimates */
Compute oput = {b, stdeb, tobs, p}. /* Collecting estimates in output matrix */
Print varnameY /title = "Dependent Variable" /format A8. /* Print the name of the
dependent variable */
Print sse /TITLE = "SSE" /FORMAT = F12.3 . /* Print measure for unexplained part*/
Print s /TITLE = "Regression Standard Error (RES)" /FORMAT = F12.3 .
Print pf /title = "Model Fit:" /clabels = "R-sq" "F" "df1" "df2" "p"/format F10.4 .
Print oput /title = 'Regression Results' /clabels = "B(OLS)" "SE(OLS)" "t" "P>|t| " /rnames =
varnames /format f10.4.
END MATRIX. /* Stop recognition of the matrix language in SPSS syntax */
EXECUTE.
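The OLS computation carried out by the syntax above can be cross-checked in NumPy. The sketch below is an assumption-laden illustration, not the manual's method: the data are made up (the file het-km-alder.sav is not reproduced here), and the p-value lookups (FCDF, TCDF) are omitted because they require a distribution function outside NumPy.

```python
import numpy as np

# Hypothetical data standing in for km (y) and alder (x); NOT the manual's dataset.
y = np.array([10.0, 12.0, 15.0, 18.0, 21.0]).reshape(-1, 1)
x = np.array([ 1.0,  2.0,  3.0,  4.0,  5.0]).reshape(-1, 1)

n, k = x.shape                               # problem dimensions
x1 = np.hstack([np.ones((n, 1)), x])         # merge a constant column onto x
invxtx = np.linalg.inv(x1.T @ x1)            # (x'x)^(-1)
b = invxtx @ (x1.T @ y)                      # b = (x'x)^(-1) x'y
yhat = x1 @ b                                # estimated values of y
resid = y - yhat                             # residuals
sse = float(np.sum(resid ** 2))              # unexplained variation (SSE)
df2 = n - k - 1                              # degrees of freedom
mse = sse / df2                              # mean square error
tss = float(np.sum((y - y.mean()) ** 2))     # total variation (TSS)
r2 = (tss - sse) / tss                       # explanatory strength, R^2
covb = invxtx * mse                          # variance-covariance matrix
stdeb = np.sqrt(np.diag(covb)).reshape(-1, 1)  # std errors of the parameters
tobs = b / stdeb                             # t-values, t = (b - 0) / se(b)
f = ((tss - sse) / k) / mse                  # F-statistic for model fit
```

Running both versions on the same data and comparing b, SSE, R² and the t-values line by line is a useful way to verify that the matrix syntax does what the comments claim.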
Try to carry out an interpretation of the results of this regression analysis. It is especially interesting
to gain a greater understanding of the dimensions of the different matrices while the regression
analysis is being carried out. It is possible to find more matrix-algebraic functions by using the
SPSS help function. It can be found as follows:
Choose HELP-TOPICS and search for ”Matrix functions” under Index:
APPENDIX A [Choice of Statistical Methods]
[Figure: decision tree for choosing a statistical method. The starting questions are: Is the analysis concerning variables or observations? Is any of the variables dependent on other variables? How many dependent variables are there (one or several, interval scaled)? On what scale (nominal, ordinal, interval) are the dependent and independent variables measured? Depending on the answers, the tree leads to one of: Cluster Analysis, Multidimensional Scaling, Correspondence Analysis, Factor Analysis, Canonical Analysis, Logit Analysis, Discriminant Analysis, Regression Analysis, Regression with dummy variables, Analysis of Variance, Conjoint Analysis, MANOVA/GLM, or Log-linear Analysis.]
Source: Niels J. Blunch: Analyse af markedsdata, 2. rev. udg., 2000, Systime.
Literature List
Blunch, Niels J. (2000): ”Analyse af markedsdata”, 2nd rev. edition, Systime
_____________ (2000): ”Videregående data-analyse med SPSS og AMOS”, 1st edition, Systime
_____________ (1993): ”Nogle teknikker til analyse af kvalitative data”, internal teaching
        material E no. 19, Handelshøjskolen i Århus
Greenacre, Michael J. (1984): “Theory and Applications of Correspondence Analysis”, Academic
        Press Inc.
Hair, Anderson et al. (1998): “Multivariate Data Analysis”, 5th edition, Prentice Hall
Johnson, Richard A.; Wichern, Dean W. (1998): “Applied Multivariate Statistical Analysis”, 4th
        edition, Prentice Hall
Juhl, Hans Jørn (1992): ”Multidimensional skalering og conjoint analyse – et
        produktudviklingsværktøj”, internal teaching material H no. 151, Institut for
        informationsbehandling, Handelshøjskolen i Århus
Kappelhoff, Peter (2001): “Multidimensionale Skalierung – Beispieldatei zur Datenanalyse”,
        Lehrstuhl für empirische Wirtschafts- und Sozialforschung, Fachbereich
        Wirtschaftswissenschaft, Bergische Universität Gesamthochschule (BUGH)
        Wuppertal
Malhotra, Naresh K. (1999): “Marketing Research – An Applied Orientation”, 3rd edition, Prentice
        Hall
Meulman, Jacqueline J.; Heiser, Willem J. (1999): “SPSS Categories® 10.0”, SPSS Inc.
___________________________________ (1999): “SPSS® Advanced Models™ 10.0”, SPSS Inc.
___________________________________ (1999): “SPSS® Base 10.0 Applications Guide”, SPSS Inc.
___________________________________ (1999): “SPSS® Base 10.0 User’s Guide”, SPSS Inc.