
Using SPSS version 14

Joel Elliott, Jennifer Burnaford, Stacey Weiss

SPSS is a program that is very easy to learn and also very powerful. This manual is designed to introduce you to the program; however, it does not cover every aspect of SPSS. There will be situations in which you need to use the SPSS Help menu or Tutorial to learn how to perform tasks that are not covered here, and you should turn to those resources any time you have questions. The following document provides some examples of common statistical tests used in ecology. To decide which test to use, consult your class notes, your Statistical Roadmap, or the Statistics Coach (under the Help menu in SPSS).

• Data entry
• Descriptive statistics
• Examining assumptions of parametric statistics
  o Test for normality
  o Test for homogeneity of variances
  o Transformations
• Comparative Statistics 1: Comparing means among groups
  o Comparing two groups using parametric statistics: Two-sample t-test; Paired t-test
  o Comparing two groups using non-parametric statistics: Mann-Whitney U test
  o Comparing three or more groups using parametric statistics: One-way ANOVA and post-hoc tests
  o Comparing three or more groups using non-parametric statistics: Kruskal-Wallis test
  o For studies with two independent variables: Two-way ANOVA; ANCOVA
• Comparative Statistics 2: Comparing frequencies of events
  o Chi-square goodness of fit
  o Chi-square test of independence
• Comparative Statistics 3: Relationships among continuous variables
  o Correlation (no causation implied)
  o Regression (causation implied)
• Graphing your data
  o Simple bar graph; clustered bar graph; box plot; scatter plot
• Printing from SPSS


Data entry

• Start SPSS and, when the first box appears asking “What would you like to do?”, click the button for Type in data.

• A spreadsheet will appear. The set-up here is similar to Excel, but at the bottom of the window you will notice two tabs. One is “Data View.” The other is “Variable View.” To enter your data, you will need to switch back and forth between these pages by clicking on the tabs.

Suppose you are part of a biodiversity survey group working in the Galapagos Islands and you are studying marine iguanas. After visiting a couple of islands, you think that there may be higher densities of iguanas on Island A than on Island B. To examine this hypothesis, you decide to quantify the population densities of the iguanas on each island. You take 20 transects (100 m²) on each island (A and B), counting the number of iguanas in each transect. Your data are shown below.

Island A: 12 13 10 11 12 12 13 13 14 14 14 14 15 15 15 16 14 12 14 14
Island B: 15 13 16 10 9 24 13 18 14 16 15 19 14 16 17 15 17 22 15 16

• First define the variables to be used. Go to the Variable View of the SPSS Data Editor window as shown below.

• The first column (Name) is where you name your variables. For example, you might name one “Location” (you have 2 locations in your data set, Island A and Island B). You might name the other one “Density” (this is your response variable, number of iguanas).

• Other important columns are Type, Label, Values, and Measure.
  o For now, we will keep Type as “Numeric”, but look to see what your options are. At some point in the future, you may need to use one of these options.
  o The Label column is very helpful. Here, you can expand the description of your variable name. In the Name column you are restricted in the number and type of characters you can use; in the Label column, there are no such restrictions. Type in labels for your iguana data.
  o In the Values column, you can assign numbers to represent the different locations (so Island A will be “1” and Island B will be “2”). To do this, you need to assign “Values” to your categorical explanatory variable. Click on the cell in the Values column, and click on the “…” that shows up. A dialog box will appear as below. Type “1” in the Value cell and “A” in the Value Label cell, and then click Add. Type “2” in the Value cell and “B” in the Value Label cell, and click Add again. Then click OK.


  o In the Measure column, you can tell the computer what type of variables these are. In this example, island is a categorical variable, so in the Location row, go to the Measure column (the far right) and click on the cell. There are 3 choices for variable types; pick “Nominal”. Iguana density is a continuous variable, and since Scale (meaning continuous) is the default setting, you don’t need to change anything.

• Now switch to the Data View. You will see that your columns are now titled Location and Density.

• To make the value labels appear in the spreadsheet pull down the View menu and choose Value Labels. The labels will appear as you start to enter data.

• You can now enter your data in the columns. Each row is a single observation. Since you have chosen “View Value Labels” and entered your Location value labels in the Variable View window, when you type “1” in the Location column, the letter “A” will appear. After you’ve entered all the values for Island A, enter the ones from Island B below them. The top of your data table will eventually look like this:
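The layout described above (one row per observation, with Location stored as a numeric code and value labels that translate codes into island names) can be illustrated outside SPSS. This is a minimal Python sketch, not part of SPSS; the names `VALUE_LABELS`, `island_a`, `island_b`, and `rows` are hypothetical.

```python
# Sketch of the SPSS Data View layout for the iguana example (hypothetical names).
# Each row is one transect: (Location code, Density).

VALUE_LABELS = {1: "A", 2: "B"}  # plays the role of the Values column in Variable View

island_a = [12, 13, 10, 11, 12, 12, 13, 13, 14, 14,
            14, 14, 15, 15, 15, 16, 14, 12, 14, 14]
island_b = [15, 13, 16, 10, 9, 24, 13, 18, 14, 16,
            15, 19, 14, 16, 17, 15, 17, 22, 15, 16]

# Island A rows first, then the Island B rows below them.
rows = [(1, d) for d in island_a] + [(2, d) for d in island_b]

# With value labels turned on, a code of 1 is displayed as "A".
for code, density in rows[:3]:
    print(VALUE_LABELS[code], density)
```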


Descriptive statistics

Once you have the data entered, you want to summarize the trends in the data. There are a variety of statistical measures for summarizing your data, and you want to explore your data by making tables and graphs. To help you do this you can use the Statistics Coach found under the Help menu in SPSS, or you can go directly to the Analyze menu and choose the appropriate tests. To get a quick view of what your data look like:

• Pull down the Analyze menu and choose Descriptive Statistics, then Frequencies. A new window will appear. Put the Density variable in the box, then choose the statistics that you want to use to explore your data by clicking on the Statistics and Charts buttons at the bottom of the box (e.g., mean, median, mode, standard deviation, skewness, kurtosis). This will produce summary statistics for the whole data set. Your results will show up in a new window.

• SPSS can also produce statistics and plots for each of the islands separately. To do this, you need to “split” the file. Pull down the Data menu and choose Split File. Click on Organize output by groups and then select the Island [Location] variable as shown below. Click OK.

• Now, if you repeat the Analyze → Descriptive Statistics → Frequencies steps and click OK again, your output will be similar to the following for each island (SPSS prints one Statistics table per island; the two are shown side by side here).

Statistics: Density
                          Island A    Island B
N Valid                   20          20
N Missing                 0           0
Mean                      13.3500     15.7000
Median                    14.0000     15.5000
Mode                      14.00       15.00 (a)
Std. Deviation            1.49649     3.46562
Variance                  2.239       12.011
Skewness                  -.463       .475
Std. Error of Skewness    .512        .512
Kurtosis                  -.045       1.302
Std. Error of Kurtosis    .992        .992
Range                     6.00        15.00
Minimum                   10.00       9.00
Maximum                   16.00       24.00

a Multiple modes exist. The smallest value is shown.
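To see where the numbers in the SPSS output come from, here is a minimal Python sketch using only the standard library (Python 3.8+ for `statistics.multimode`). The skewness and kurtosis functions use the standard bias-adjusted formulas (G1 and G2); for Island A they reproduce the SPSS values above (-.463 and -.045). The function names are hypothetical, and looping over the two islands stands in for Split File.

```python
import math
import statistics

island_a = [12, 13, 10, 11, 12, 12, 13, 13, 14, 14,
            14, 14, 15, 15, 15, 16, 14, 12, 14, 14]
island_b = [15, 13, 16, 10, 9, 24, 13, 18, 14, 16,
            15, 19, 14, 16, 17, 15, 17, 22, 15, 16]


def skewness(x):
    """Bias-adjusted sample skewness (G1); negative = skewed to the left."""
    n = len(x)
    m = statistics.mean(x)
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    return (m3 / m2 ** 1.5) * math.sqrt(n * (n - 1)) / (n - 2)


def kurtosis(x):
    """Bias-adjusted sample excess kurtosis (G2); 0 for a normal distribution."""
    n = len(x)
    m = statistics.mean(x)
    s2 = statistics.variance(x)  # sample variance, n - 1 denominator
    sum_d4 = sum((v - m) ** 4 for v in x)
    first = n * (n + 1) * sum_d4 / ((n - 1) * (n - 2) * (n - 3) * s2 ** 2)
    return first - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))


def describe(x):
    """Summary statistics similar to Analyze -> Descriptive Statistics -> Frequencies."""
    return {
        "N": len(x),
        "Mean": statistics.mean(x),
        "Median": statistics.median(x),
        "Mode": min(statistics.multimode(x)),  # smallest mode if several exist
        "Std. Deviation": statistics.stdev(x),
        "Variance": statistics.variance(x),
        "Skewness": skewness(x),
        "Kurtosis": kurtosis(x),
        "Range": max(x) - min(x),
        "Minimum": min(x),
        "Maximum": max(x),
    }


# Looping over the islands plays the role of Data -> Split File.
for island, data in (("A", island_a), ("B", island_b)):
    print(f"Island = {island}")
    for name, value in describe(data).items():
        print(f"  {name:<15} {value:.4f}")
```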

[Figure: Histograms of Density, each with a fitted normal curve. Island A: Mean = 13.35, Std. Dev. = 1.49649, N = 20. Island B: Mean = 15.70, Std. Dev. = 3.46562, N = 20. x-axis: Density; y-axis: Frequency.]

• From these summary statistics you can see that the mean density of iguanas on Island A is smaller than that on Island B. Also, the variation patterns of the data are different on the two islands, as shown by the frequency distributions of the data and their different dispersion parameters. In each histogram, the normal curve indicates the expected frequency curve for a normal distribution with the same mean and standard deviation as your data. The data values for Island A span a narrower range (6 vs. 15) and have a lower variance and kurtosis. Also, the distribution for Island A is skewed to the left (negative skewness), whereas the distribution for Island B is skewed to the right (positive skewness).

• You could explore your data more by making box plots, stem-leaf plots, and error bar charts. Use the functions under the Analyze and Graphs menus to do this.

• After getting an impression of what your data look like you can now move on to determine whether there is a significant difference between the mean densities of iguanas on the two islands. To do this we have to use comparative statistics.

NOTE: Once you are done looking at your data for the two islands separately, you need to “unsplit” the data. Go to Data → Split File and select Analyze all cases, do not create groups.

Examining assumptions of parametric statistics

As you know, parametric tests have two main assumptions: 1) approximately normally distributed data, and 2) homogeneous variances among groups. Let’s examine each of these assumptions.

Test for normality

Before you conduct any parametric tests, you need to check that the data values come from an “approximately normal” distribution. To do this, you can compare the frequency distribution of your data values with a normal distribution that has the same mean and standard deviation (the normal curves on the histograms in the Descriptive statistics section above). If the data are approximately normal, then the two distributions should be similar. From your initial descriptive analysis you know that the distributions of data for Islands A and B did not appear to fit the expected normal distribution perfectly. However, to objectively determine whether the distribution differs significantly from a normal distribution, you have to conduct a normality test. This test will provide you with a statistic that determines whether your data are significantly different from normal. The null hypothesis is that the distribution of your data is NOT different from a normal distribution.

• For the marine iguana example, you want to know if the data from the Island A population are normally distributed and if the data from Island B are normally distributed. Thus, your data must be “split” (Data → Split File → Organize output by groups → split by Location). Don’t forget to “unsplit” when you are done!

• To conduct a statistical test for normality on your split data, go to Analyze → Nonparametric Tests → 1 Sample K-S. In the window that appears, put the response variable (in this case, Density) into the box on the right. Check Normal in the Test Distribution box below. Then click OK.

• The output shows a Kolmogorov-Smirnov (K-S) table for the data from each island. Your p-value is the last line of the table: “Asymp. Sig. (2-tailed).”

• If p > 0.05 (i.e., a departure from normality at least this large would occur more than 5% of the time if your null hypothesis were true), you should conclude that the distribution of your data is not significantly different from a normal distribution.

• If p < 0.05 (i.e., a departure this large would occur less than 5% of the time if your null hypothesis were true), you should conclude that the distribution of your data is significantly different from normal. Note: always look at the p-value. The footnote “Test distribution is Normal” below the table only states which distribution your data were compared against; it is not a conclusion that your data are normal.

• If your data are not normal, you should inspect them for outliers which can have a strong effect on this test. Remove the extreme outliers and try again. If this does not work, then you must either transform your data so that they are normally distributed, or use a nonparametric test. Both of these options are discussed later.

One-Sample Kolmogorov-Smirnov Test: Density
                                             Island A    Island B
N                                            20          20
Normal Parameters (a,b)    Mean              13.3500     15.7000
                           Std. Deviation    1.49649     3.46562
Most Extreme Differences   Absolute          .218        .166
                           Positive          .132        .166
                           Negative          -.218       -.120
Kolmogorov-Smirnov Z                         .975        .740
Asymp. Sig. (2-tailed)                       .298        .644

a Test distribution is Normal. b Calculated from data.

• For the iguana example, you should find that the data for both populations are not significantly different from normal (p > 0.05). With a sample size of only N = 20, the data would have to be much more skewed or have some large outliers to differ significantly from normal.

• If your data are not normally distributed, you should try to transform the data to meet this important assumption. (See below.)
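The K-S statistic in the output above is the largest vertical gap between the cumulative distribution of your data and that of a normal distribution with the same mean and standard deviation; Z is that gap multiplied by √N, and Asymp. Sig. comes from the asymptotic Kolmogorov distribution. The standard-library Python sketch below (hypothetical function names) should reproduce the Island A values (.218, .975, .298) and the Island B gap (.166).

```python
import math
import statistics

island_a = [12, 13, 10, 11, 12, 12, 13, 13, 14, 14,
            14, 14, 15, 15, 15, 16, 14, 12, 14, 14]
island_b = [15, 13, 16, 10, 9, 24, 13, 18, 14, 16,
            15, 19, 14, 16, 17, 15, 17, 22, 15, 16]


def normal_cdf(x, mean, sd):
    """Cumulative probability of a normal distribution at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def kolmogorov_p(z, terms=100):
    """Asymptotic two-tailed p-value: 2 * sum((-1)**(k-1) * exp(-2 k^2 z^2))."""
    if z < 0.3:
        return 1.0  # the series converges slowly here, and p is essentially 1
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * z * z)
                  for k in range(1, terms + 1))
    return min(1.0, max(0.0, p))


def ks_normal(x):
    """One-sample K-S test against a normal with mean and SD estimated from x."""
    n = len(x)
    mean, sd = statistics.mean(x), statistics.stdev(x)
    d_pos = d_neg = 0.0
    below = 0  # observations strictly smaller than the current value
    for v in sorted(set(x)):
        count = x.count(v)
        f = normal_cdf(v, mean, sd)
        d_pos = max(d_pos, (below + count) / n - f)  # data CDF above normal CDF
        d_neg = min(d_neg, below / n - f)            # data CDF below normal CDF
        below += count
    d = max(d_pos, -d_neg)
    z = math.sqrt(n) * d
    return d, z, kolmogorov_p(z)


for island, data in (("A", island_a), ("B", island_b)):
    d, z, p = ks_normal(data)
    print(f"Island {island}: D = {d:.3f}, Z = {z:.3f}, p = {p:.3f}")
```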

Test for homogeneity of variances

Another assumption of parametric tests is that the groups you are comparing have similar variances. Most of the comparative tests in SPSS will do this test for you as part of the analysis. For example, when you run a t-test, the output will include columns labeled “Levene’s Test for Equality of Variances.” The p-value is labeled “Sig.” and tells you whether or not your data meet this assumption: if Sig. > 0.05, the variances are not significantly different and the assumption is met; if Sig. < 0.05, the variances are not homogeneous. If the variances are not homogeneous, then you must either transform your data (e.g., using a log transformation) to see if you can equalize the variances, or use a nonparametric comparison test that does not require this assumption.
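Levene's test, in the mean-centered form sketched here, is a one-way ANOVA on the absolute deviations of each observation from its own group mean: if one group is much more spread out, its deviations are larger on average. The standard-library sketch below (hypothetical function name `levene_f`) computes only the F statistic and its degrees of freedom, since the p-value needs an F distribution. With the raw iguana densities it gives F ≈ 4.234 on 1 and 38 df, matching the Levene's Test column in the t-test output later in this manual. If SciPy is installed, `scipy.stats.levene(a, b, center='mean')` should return the same F along with a p-value.

```python
import statistics

island_a = [12, 13, 10, 11, 12, 12, 13, 13, 14, 14,
            14, 14, 15, 15, 15, 16, 14, 12, 14, 14]
island_b = [15, 13, 16, 10, 9, 24, 13, 18, 14, 16,
            15, 19, 14, 16, 17, 15, 17, 22, 15, 16]


def levene_f(*groups):
    """Mean-centered Levene's test: one-way ANOVA on |x - group mean|."""
    dev = [[abs(v - statistics.mean(g)) for v in g] for g in groups]
    k = len(dev)
    n_total = sum(len(g) for g in dev)
    grand = sum(sum(g) for g in dev) / n_total
    ss_between = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in dev)
    ss_within = sum(sum((v - statistics.mean(g)) ** 2 for v in g) for g in dev)
    df1, df2 = k - 1, n_total - k
    f = (ss_between / df1) / (ss_within / df2)
    return f, df1, df2


f, df1, df2 = levene_f(island_a, island_b)
print(f"Levene F = {f:.3f}, df = {df1}, {df2}")
```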

Transformations

If your data do not meet one or both of the above assumptions of parametric statistics, you may be able to transform the data so that they do. You can use a variety of transformations to try to make the variances of the different groups equal or to normalize the data. If the transformed data meet the assumptions of parametric statistics, you may proceed by running the appropriate test on the transformed data. If, after a number of attempts, the transformed data do not meet the assumptions of parametric statistics, you must run a non-parametric test. If the variances were not homogeneous, look at how the variances change with the mean. The usual case is that larger means have larger variances. If this is the case, a transformation such as common log, natural log, or square root often makes the variances homogeneous. Whenever your data are percents (e.g., % cover), they will generally not be normally distributed. To normalize percent data, you should do an arcsine-square root transformation of the proportions (percents/100).

To transform your data:

• Go to Transform → Compute. You will get the Compute Variable window.

• In the Target Variable box, name your new transformed variable (for example, “Log_Density”).

• There are 3 ways you can transform your data: 1) using the calculator, 2) choosing functions from lists on the right, or 3) typing the transformation in the Numeric Expression box.

• For this example: in the Function Group box on the right, highlight Arithmetic by clicking on it once. Various functions will show up in the Functions and Special Variables box below. Choose the LG10 function and double-click on it.

• In the Numeric Expression box, it will now say LG10(?). Double-click on the name of the variable you want to transform (e.g., Density) in the box on the lower left to make Density replace the “?”.

• Click OK. SPSS will create a new column in your data sheet that has the log-values of the iguana densities.

• NOTE: you might want to do a transformation such as LN(x + 1). Follow the directions above, but choose LN instead of LG10 from the Functions and Special Variables box. Move your variable into the parentheses to replace the “?”. Then type “+1” after your variable so it reads, for example, LN(Density+1).

• NOTE: for the arcsine-square root transformation, the composite function to put into the Numeric Expression box would look like: ARSIN(SQRT(percent data/100)), with the name of your percent variable in place of “percent data”.
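The three transformations above are simple formulas, so they can be checked outside SPSS. This minimal Python sketch (hypothetical function names) mirrors LG10, LN(x + 1), and ARSIN(SQRT(p/100)); note that `math.asin` returns radians. Applied to Island A, the log10 values should have a mean of about 1.1228, the Island A mean of Log_Density shown in the t-test output later in this manual.

```python
import math
import statistics

island_a = [12, 13, 10, 11, 12, 12, 13, 13, 14, 14,
            14, 14, 15, 15, 15, 16, 14, 12, 14, 14]


def log10_transform(x):
    """Common log, like LG10(Density) in SPSS."""
    return math.log10(x)


def ln_plus_one(x):
    """Natural log of x + 1, like LN(Density+1); defined even when x is 0."""
    return math.log(x + 1)


def arcsine_sqrt(percent):
    """Arcsine-square root of a percentage (0-100), in radians."""
    return math.asin(math.sqrt(percent / 100))


log_density_a = [log10_transform(d) for d in island_a]
print(f"Mean of Log_Density for Island A: {statistics.mean(log_density_a):.4f}")
print(f"Arcsine-square root of 25% cover: {arcsine_sqrt(25):.4f}")
```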


After you transform your data, redo the tests of normality and homogeneity of variances to see if the transformed data now meet the assumptions of parametric statistics. If they do, conduct a parametric statistical test using the transformed data. If the transformed data still do not meet the assumptions, you can do a nonparametric test instead, such as a Mann-Whitney U test on the original data. This test is described later in this handout.

Comparative Statistics 1: Comparing means among groups

Comparing two groups using parametric statistics

Two-sample t-test

This test compares the means from two groups, such as the density data for the two different iguana populations. To run a two-sample t-test on the data:

• First, be sure that your data are “unsplit” (Data → Split File → Analyze all cases, do not create groups).

• Then, go to Analyze → Compare Means → Independent Samples T-test.

• Put the Density variable in the Test Variable(s) box and the Location variable in the Grouping Variable box as shown below.

• Now, click on the Define Groups button and enter the values that code your two groups (1 for Island A and 2 for Island B) in the two boxes as shown below. Then click Continue and OK.


• The output consists of two tables:

Group Statistics
          Island   N    Mean      Std. Deviation   Std. Error Mean
Density   A        20   13.3500   1.49649          .33462
          B        20   15.7000   3.46562          .77494

Independent Samples Test: Density
                                   Equal variances assumed   Equal variances not assumed
Levene's Test: F                   4.234
Levene's Test: Sig.                .047
t                                  -2.784                    -2.784
df                                 38                        25.847
Sig. (2-tailed)                    .008                      .010
Mean Difference                    -2.35000                  -2.35000
Std. Error Difference              .84410                    .84410
95% CI of the Difference: Lower    -4.05879                  -4.08557
95% CI of the Difference: Upper    -.64121                   -.61443

• The first table shows the means, standard deviations, and standard errors of the two groups. The second table shows the results of Levene’s Test for Equality of Variances, the t-value of the t-test, the degrees of freedom of the test, and the p-value, which is labeled “Sig. (2-tailed)”.

• Before you look at the results of the t-test, you need to make sure your data fit the assumption of homogeneity of variances. Look at the columns labeled “Levene’s test for Equality of Variances.” The p-value is labeled “Sig.”.

• In this example the data fail Levene’s Test for Equality of Variances (p = 0.047), so the data will have to be transformed to see if we can get them to meet this assumption of the t-test. If you log-transformed the data and re-ran the test, you’d get the following output.

Group Statistics
              Island   N    Mean     Std. Deviation   Std. Error Mean
Log_Density   A        20   1.1228   .05052           .01130
              B        20   1.1856   .09817           .02195

Independent Samples Test: Log_Density
                                   Equal variances assumed   Equal variances not assumed
Levene's Test: F                   2.642
Levene's Test: Sig.                .112
t                                  -2.547                    -2.547
df                                 38                        28.404
Sig. (2-tailed)                    .015                      .017
Mean Difference                    -.06288                   -.06288
Std. Error Difference              .02469                    .02469
95% CI of the Difference: Lower    -.11286                   -.11342
95% CI of the Difference: Upper    -.01290                   -.01234

• Now the variances of the two groups are not significantly different from each other (p = 0.112), and you can focus on the results of the t-test. For the t-test, p = 0.015 (which is < 0.05), so you can conclude that the two means are significantly different from each other. Thus, this statistical test provides strong support for the hypothesis that iguana densities differ between Island A and Island B, although the difference runs opposite to your initial expectation: densities are higher on Island B.


• WHAT TO REPORT: Following a statement that describes the patterns in the data, you should parenthetically report the t-value, df, and p. For example: Iguanas are significantly more dense on Island B than on Island A (t=2.5, df=38, p<0.05).
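The two rows of the Independent Samples Test table differ only in how the standard error and degrees of freedom are computed. This standard-library Python sketch (hypothetical function name `two_sample_t`) computes both versions; for the raw densities it should give t ≈ -2.784 with df = 38 (pooled) and df ≈ 25.85 (Welch), and for the log10 data t ≈ -2.547, matching the output above. It does not compute p-values; if SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=True)` (or `equal_var=False` for the second row) returns them.

```python
import math
import statistics

island_a = [12, 13, 10, 11, 12, 12, 13, 13, 14, 14,
            14, 14, 15, 15, 15, 16, 14, 12, 14, 14]
island_b = [15, 13, 16, 10, 9, 24, 13, 18, 14, 16,
            15, 19, 14, 16, 17, 15, 17, 22, 15, 16]


def two_sample_t(a, b):
    """Return (t, df) for the pooled and the Welch (unequal variance) t-tests."""
    na, nb = len(a), len(b)
    diff = statistics.mean(a) - statistics.mean(b)
    va, vb = statistics.variance(a), statistics.variance(b)

    # "Equal variances assumed": pooled variance, df = na + nb - 2
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    t_pooled = diff / math.sqrt(pooled * (1 / na + 1 / nb))

    # "Equal variances not assumed": Welch's t with Welch-Satterthwaite df
    qa, qb = va / na, vb / nb
    t_welch = diff / math.sqrt(qa + qb)
    df_welch = (qa + qb) ** 2 / (qa ** 2 / (na - 1) + qb ** 2 / (nb - 1))

    return (t_pooled, na + nb - 2), (t_welch, df_welch)


raw = two_sample_t(island_a, island_b)
logged = two_sample_t([math.log10(v) for v in island_a],
                      [math.log10(v) for v in island_b])
print("Raw densities:   pooled t = %.3f (df %d), Welch t = %.3f (df %.3f)"
      % (raw[0][0], raw[0][1], raw[1][0], raw[1][1]))
print("Log10 densities: pooled t = %.3f (df %d)" % logged[0])
```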

Paired t-test

You should analyze your data with a paired t-test only if you paired your samples during data collection. This analysis tests whether the mean difference between the two samples in a pair equals 0; the null hypothesis is that this difference is not different from zero. For example, you may have done a study in which you investigated the effect of light intensity on the growth of the plant Plantus speciesus. You took cuttings from source plants, and for each source plant you grew 1 cutting in a high-light environment and 1 cutting in a low-light environment. The other conditions were kept constant between the groups. You measured growth by counting the number of new leaves grown over the course of your experiment. Your data look like this:

Plant        1  2  3  4  5  6  7  8  9  10
Low Light    2  4  1  3  2  5  4  1  3  4
High Light   3  6  2  4  5  6  5  2  5  5

• Enter your data in 2 columns named “Low” and “High”. Each row in the spreadsheet should hold one pair of data. In Variable View, leave the Measure column on Scale and leave Values as None.

• Go to Analyze → Compare Means → Paired Samples T-test.

• Highlight both of your variables and click the arrow to put them in the “Paired Variables” box. They will show up as “Low-High”. Click OK. The following output should be produced.

• The output consists of 3 tables:

Paired Samples Statistics
                      Mean     N    Std. Deviation   Std. Error Mean
Pair 1   Low Light    2.9000   10   1.37032          .43333
         High Light   4.3000   10   1.49443          .47258

Paired Samples Correlations
                                  N    Correlation   Sig.
Pair 1   Low Light & High Light   10   .884          .001

Paired Samples Test: Pair 1, Low Light - High Light
Paired Differences: Mean                  -1.40000
Paired Differences: Std. Deviation        .69921
Paired Differences: Std. Error Mean       .22111
95% CI of the Difference: Lower           -1.90018
95% CI of the Difference: Upper           -.89982
t                                         -6.332
df                                        9
Sig. (2-tailed)                           .000


• The first table shows the summary statistics for the 2 groups. The second table shows information that you can ignore. The third table, the Paired Samples Test table, is the one you want. It shows the mean difference between samples in a pair, the variation of the differences around the mean, your t-value, your df, and your p-value (labeled as Sig (2-tailed)). In this case, the P-value reads 0.000, which means that it is very low – it is smaller than the program will show in the default 3 decimal places. You can express this in your results section as p<0.001.

• WHAT TO REPORT: Following a statement that describes the patterns in the data, you should parenthetically report the t-value, df, and p. For example: Plants in the high light treatment added significantly more leaves than their counterpart plants in the low light treatment (t=6.3, df=9, p<0.001).
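A paired t-test is a one-sample t-test on the within-pair differences: t = mean difference / (SD of differences / √n), with n - 1 df. This standard-library Python sketch (hypothetical names `low`, `high`, `paired_t`) should reproduce the Paired Samples Test values above (mean -1.4, SD .69921, SE .22111, t -6.332, df 9). If SciPy is available, `scipy.stats.ttest_rel(low, high)` also returns the p-value.

```python
import math
import statistics

low = [2, 4, 1, 3, 2, 5, 4, 1, 3, 4]
high = [3, 6, 2, 4, 5, 6, 5, 2, 5, 5]


def paired_t(x, y):
    """Paired t-test statistics computed on the differences x - y."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)
    se_d = sd_d / math.sqrt(n)
    return mean_d, sd_d, se_d, mean_d / se_d, n - 1


mean_d, sd_d, se_d, t, df = paired_t(low, high)
print(f"Mean difference = {mean_d:.5f}, SD = {sd_d:.5f}, SE = {se_d:.5f}")
print(f"t = {t:.3f}, df = {df}")
```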

Comparing two groups using non-parametric statistics: Mann-Whitney U test

The t-test is a parametric test, meaning that it assumes that the sample mean is a valid measure of center. While the mean is valid when the distances between all scale values are equal, it is a problem when your test variable is ordinal, because in ordinal scales the distances between the values are arbitrary. Furthermore, because the variance is calculated using squared deviations from the mean, it too is invalid if those distances are arbitrary. Finally, even if the mean is a valid measure of center, the distribution of the test variable may be so non-normal that it makes you suspicious of any test that assumes normality.

If any of these circumstances is true for your analysis, you should consider using the nonparametric procedures designed to test for the significance of the difference between two groups. They are called nonparametric because they make no assumptions about the parameters of a distribution, nor do they assume that any particular distribution is being used.

A Mann-Whitney U test doesn’t require normality or homogeneous variances, but it is slightly less powerful than the t-test (which means the Mann-Whitney U test is less likely to show a significant difference between your two groups). So, if you have approximately normal data, then you should use a t-test.

To run a Mann-Whitney U test:

• Go to Analyze → Nonparametric Tests → 2 Independent Samples, and a dialog box will appear.

• Put the variables in the appropriate boxes, define your groups, and confirm that the Mann-Whitney U test type is checked. Then click OK.


• The output consists of two tables. The first table shows the parameters used in the calculation of the test. The second table shows the statistical significance of the test. The value of the U statistic is given in the 1st row (“Mann-Whitney U”). The p-value is labeled as Asymp. Sig. (2-tailed).

Ranks
          Island   N    Mean Rank   Sum of Ranks
Density   A        20   15.08       301.50
          B        20   25.93       518.50
          Total    40

Test Statistics (b)
                                  Density
Mann-Whitney U                    91.500
Wilcoxon W                        301.500
Z                                 -2.967
Asymp. Sig. (2-tailed)            .003
Exact Sig. [2*(1-tailed Sig.)]    .003 (a)

a Not corrected for ties. b Grouping Variable: Island

• In the table above (for the marine iguana data), the p-value = 0.003, which means that the densities of iguanas on the two islands are significantly different from each other (p < 0.05). So, this statistical test again provides strong support for the hypothesis that iguana densities differ between the islands.

• WHAT TO REPORT: Following a statement that describes the patterns in the data, you should parenthetically report the U-value, the sample sizes, and p (the Mann-Whitney U test has no degrees of freedom). For example: Iguanas are significantly more dense on Island B than on Island A (U=91.5, n=20 per island, p<0.01).
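The Mann-Whitney U statistic is computed from ranks: all 40 densities are ranked together (tied values share the average of their ranks), the ranks of Island A are summed, and U = R_A - n_A(n_A + 1)/2, with the smaller of U and n_A·n_B - U reported. The Z in the SPSS output is a normal approximation with a correction for ties. This standard-library Python sketch (hypothetical function names) should reproduce the output above: sum of ranks 301.5 for Island A, U = 91.5, Z ≈ -2.967, p ≈ 0.003.

```python
import math

island_a = [12, 13, 10, 11, 12, 12, 13, 13, 14, 14,
            14, 14, 15, 15, 15, 16, 14, 12, 14, 14]
island_b = [15, 13, 16, 10, 9, 24, 13, 18, 14, 16,
            15, 19, 14, 16, 17, 15, 17, 22, 15, 16]


def midranks(values):
    """Map each value to its rank (1 = smallest); ties get the mean of their ranks."""
    ordered = sorted(values)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        ranks[ordered[i]] = (i + 1 + j) / 2  # average of ranks i+1 .. j
        i = j
    return ranks


def mann_whitney(a, b):
    """Return (sum of ranks of a, U, tie-corrected Z, two-tailed p)."""
    n1, n2 = len(a), len(b)
    n = n1 + n2
    combined = a + b
    rank_of = midranks(combined)
    r1 = sum(rank_of[v] for v in a)
    u1 = r1 - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    ties = sum(t ** 3 - t for t in (combined.count(v) for v in set(combined)))
    var_u = n1 * n2 / (n * (n - 1)) * ((n ** 3 - n) - ties) / 12
    z = (u - n1 * n2 / 2) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed normal p-value
    return r1, u, z, p


r1, u, z, p = mann_whitney(island_a, island_b)
print(f"Sum of ranks (A) = {r1}, U = {u}, Z = {z:.3f}, p = {p:.3f}")
```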

Comparing three or more groups using parametric statistics: One-way ANOVA and post-hoc tests

Let’s now consider parametric statistics that compare three or more groups of data. To continue the example using iguana population density data, let’s add data from a series of 16 transects from a third island, Island C. Enter these data into your spreadsheet at the bottom of the column “Density”.

Island C density (per 100 m² transect): 15 13 10 14 12 12 13 13 14 14 11 14 15 12 15 16

To enter the Location for Island C, you must first edit the value labels by going to Variable View: add a third Value (3) and Value Label (C). Then, back in Data View, type a 3 into the Location cell of the first Island C row, and copy the C and paste it into the rest of the Island C rows below.

The appropriate parametric statistical test for continuous data with one independent variable and more than two groups is the one-way analysis of variance (ANOVA). It tests whether there is a significant difference among the means of the groups, but does not tell you which means are different from each other. To find out which means are significantly different from each other, you have to conduct “post-hoc pairwise comparisons”. They are called post-hoc because you conduct them after you have completed an ANOVA, to show where the significant differences among the groups lie. One of the post-hoc tests is the Fisher PLSD (Protected Least Significant Difference) test, which gives you a test of all pairwise combinations. To run the ANOVA test:

• Go to Analyze → Compare Means → One-way ANOVA.

• In the dialog box, put the Density variable in the Dependent List box and the Location variable in the Factor box.

• Click on the Post Hoc button, then click on the LSD check box, and then click Continue.

• Click on the Options button and check 2 boxes: Descriptive and Homogeneity of variance test. Then click Continue and then OK.

• The output will include four tables: descriptive statistics, the results of the Levene test, the results of the ANOVA, and the results of the post-hoc tests.

• The first table gives you some basic descriptive statistics for the three islands.

Descriptives: Density
                                                     95% Confidence Interval for Mean
        N    Mean      Std. Deviation   Std. Error   Lower Bound   Upper Bound   Minimum   Maximum
A       20   13.3500   1.49649          .33462       12.6496       14.0504       10.00     16.00
B       20   15.7000   3.46562          .77494       14.0780       17.3220       9.00      24.00
C       16   13.3125   1.62147          .40537       12.4485       14.1765       10.00     16.00
Total   56   14.1786   2.63616          .35227       13.4726       14.8845       9.00      24.00

• The second table gives you the results of the Levene test (which examines the assumption of homogeneity of variances). You must assess the results of this test before looking at the results of your ANOVA.

Test of Homogeneity of Variances: Density
Levene Statistic   df1   df2   Sig.
3.237              2     53    .047


• In this case, your variances are not homogeneous (p < 0.05), so the data do not meet one of the assumptions of the test, and you cannot proceed to use the results of the ANOVA comparison of means. You have two main choices of what to do. You can either transform your data to attempt to make the variances homogeneous, or you can run a test that does not require homogeneity of variances (for example, Welch’s test, a version of ANOVA for three or more groups that does not assume equal variances, or the non-parametric Kruskal-Wallis test described below).

• First, try transforming the data for each population (try a log transformation), and then run the test again. The following tables are for the log transformed data.

Descriptives: Log_Density
                                                    95% Confidence Interval for Mean
        N    Mean     Std. Deviation   Std. Error   Lower Bound   Upper Bound   Minimum   Maximum
A       20   1.1228   .05052           .01130       1.0991        1.1464        1.00      1.20
B       20   1.1856   .09817           .02195       1.1397        1.2316        .95       1.38
C       16   1.1211   .05472           .01368       1.0919        1.1503        1.00      1.20
Total   56   1.1447   .07729           .01033       1.1240        1.1654        .95       1.38

Test of Homogeneity of Variances: Log_Density
Levene Statistic   df1   df2   Sig.
1.902              2     53    .159

• Now your variances are homogeneous (p > 0.05), and you can continue with the assessment of the ANOVA.

• The third table gives you the results of the ANOVA test, which examined whether there were any significant differences in mean density among the three island populations of marine iguanas.

ANOVA: Log_Density
                 Sum of Squares   df   Mean Square   F       Sig.
Between Groups   .052             2    .026          4.989   .010
Within Groups    .277             53   .005
Total            .329             55

• Look at the p-value in the ANOVA table (“Sig.”). If this p-value is > 0.05, then there are no significant differences among any of the means. If the p-value is < 0.05, then at least one mean is significantly different from the others. In this example, p = 0.01 in the ANOVA table, and thus p < 0.05, so the mean densities are significantly different.

• Now that you know the means are different, you want to find out which pairs of means are different from each other. e.g., is the density on Island A greater than B? Is it greater than C? How do B & C compare with each other?

• The post-hoc tests, Fisher LSD (Least Significant Difference), allow you to examine all pairwise comparisons of means. The results are listed in the fourth table. Which groups are and are not significantly different from each other? Look at the “Sig.” column for each comparison. B is different from both A and C, but A and C are not different from each other.


Multiple Comparisons
Dependent Variable: Log_Density
LSD
(I) Island   (J) Island   Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower Bound   95% CI Upper Bound
A            B            -.06288*                .02284       .008   -.1087               -.0171
A            C            .00166                  .02423       .946   -.0469               .0503
B            A            .06288*                 .02284       .008   .0171                .1087
B            C            .06453*                 .02423       .010   .0159                .1131
C            A            -.00166                 .02423       .946   -.0503               .0469
C            B            -.06453*                .02423       .010   -.1131               -.0159

* The mean difference is significant at the .05 level.

• WHAT TO REPORT: Following a statement that describes the general patterns in the data, you should parenthetically report the F-value, df, and p from the ANOVA. Following statements that describe the differences between specific groups, you should report the p-value from the post-hoc test only. (NOTE: there is no F-value or df associated with the post-hoc tests… only a p-value!) For example: Iguana density varies significantly across the three islands (F=5.0, df=2,53, p=0.01). Iguana populations on Island B are significantly more dense than on Island A (p<0.01) and on Island C (p=0.01), but populations on Islands A and C have similar densities (p>0.90).
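One-way ANOVA compares the spread of the group means around the grand mean (between groups) with the spread of observations around their own group mean (within groups): F = MS_between / MS_within. Fisher's LSD then compares each pair of means with a t statistic whose standard error uses the pooled MS_within from the ANOVA. The standard-library Python sketch below (hypothetical function names) works on log10 densities and should reproduce F ≈ 4.989 on 2 and 53 df and the LSD standard errors shown above (.02284 for A vs. B, .02423 for comparisons with C). It omits p-values, which need the F and t distributions; `scipy.stats.f_oneway` is one option if SciPy is available.

```python
import math
import statistics

densities = {
    "A": [12, 13, 10, 11, 12, 12, 13, 13, 14, 14, 14, 14, 15, 15, 15, 16, 14, 12, 14, 14],
    "B": [15, 13, 16, 10, 9, 24, 13, 18, 14, 16, 15, 19, 14, 16, 17, 15, 17, 22, 15, 16],
    "C": [15, 13, 10, 14, 12, 12, 13, 13, 14, 14, 11, 14, 15, 12, 15, 16],
}
log_density = {k: [math.log10(v) for v in vals] for k, vals in densities.items()}


def one_way_anova(groups):
    """Return (F, df_between, df_within, MS_within) for a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((v - statistics.mean(g)) ** 2 for v in g) for g in groups)
    df_b, df_w = k - 1, n_total - k
    ms_within = ss_within / df_w
    return (ss_between / df_b) / ms_within, df_b, df_w, ms_within


def lsd_pair(g1, g2, ms_within):
    """Fisher LSD comparison: mean difference, its standard error, and t."""
    diff = statistics.mean(g1) - statistics.mean(g2)
    se = math.sqrt(ms_within * (1 / len(g1) + 1 / len(g2)))
    return diff, se, diff / se


f, df_b, df_w, ms_w = one_way_anova(list(log_density.values()))
print(f"F = {f:.3f}, df = {df_b}, {df_w}")
for i, j in (("A", "B"), ("A", "C"), ("B", "C")):
    diff, se, t = lsd_pair(log_density[i], log_density[j], ms_w)
    print(f"{i} - {j}: difference = {diff:.5f}, SE = {se:.5f}, t = {t:.2f}")
```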

Comparing three or more groups using non-parametric statistics

Kruskal-Wallis test

Just as the Mann-Whitney U test is a non-parametric version of the t-test, the Kruskal-Wallis test is the non-parametric version of a one-way ANOVA. The test is used when you want to compare three or more groups of data, and those data do not fit the assumptions of parametric statistics even after attempting standard transformations. Remind yourself of the assumptions of parametric statistics and the downside of using non-parametric statistics by reviewing the information on Page 11. To run the Kruskal-Wallis test:

• Go to Analyze → Nonparametric Tests → K Independent Samples.

• Note: for the Mann-Whitney U test, you went to Nonparametric Tests → 2 Independent Samples. Now you have more than 2 groups, so you go to K Independent Samples instead, where K simply stands for "any number" (here, more than 2).

• Put your variables in the appropriate boxes, define your groups, and make sure the Kruskal-Wallis box is checked under Test Type. Click OK.


• The output consists of two tables. The first table (Ranks) shows the number of observations and the mean rank for each group, which are used to calculate the test statistic. The second table shows the statistical results of the test. As you will see, the test statistic is reported as a chi-square value in the first row of the second table. The p-value is labeled Asymp. Sig.

Ranks
           Location   N    Mean Rank
density    A          20   23.15
           B          20   38.20
           C          16   23.06
           Total      56

Test Statistics(a,b)
               density
Chi-Square     11.279
df             2
Asymp. Sig.    .004
a. Kruskal Wallis Test
b. Grouping Variable: Location

• In the table above, p = 0.004, which means that iguana density differs significantly among the three islands (p < 0.01). So, this test also supports the hypothesis that iguana densities differ among islands. However, we do not yet know which islands differ from which others.

• Unlike an ANOVA, a Kruskal-Wallis test does not have an easy way to do post-hoc analyses. So, if you have a significant effect for the overall Kruskal-Wallis, you can follow that up with a series of two-group comparisons using Mann-Whitney U tests. In this case, we would follow up the Kruskal-Wallis with three Mann-Whitney U tests: Island A vs. Island B, Island B vs. Island C, and Island C vs. Island A.

• WHAT TO REPORT: Following a statement that describes the general patterns in the data, you should parenthetically report the chi-square value, df, and p. For example: Iguana density varied significantly across the three islands (χ²=11.3, df=2, p=0.004).
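SPSS labels the Kruskal-Wallis statistic H as "Chi-Square" because H is compared with a chi-square distribution on k - 1 df (k groups). As a minimal sketch of where that number comes from, the Python below pools all observations, ranks them (tied values share the average of their ranks), computes H = 12/(N(N+1)) × Σ(R_i²/n_i) - 3(N+1) from each group's rank sum R_i and size n_i, and divides by the usual tie correction. The iguana raw data are not listed in this manual, so the example uses small made-up groups. The helper `chi2_sf_df2` is only valid for exactly 2 df, where the upper-tail probability has the closed form exp(-H/2).

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values from 1 to N; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at sorted position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def kruskal_wallis(*groups):
    """Return (H, df) for two or more groups, with the standard tie correction."""
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N) over groups of tied values
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    h /= 1 - tie_term / (n_total ** 3 - n_total)
    return h, len(groups) - 1


def chi2_sf_df2(x):
    """Upper-tail probability of a chi-square distribution with exactly 2 df."""
    return math.exp(-x / 2)


# Hypothetical example data (not the iguana measurements)
h, df = kruskal_wallis([1, 2, 2], [2, 3, 4], [5, 6, 7])
print(f"H = {h:.3f}, df = {df}, p = {chi2_sf_df2(h):.4f}")
```

Plugging in the H reported above, `chi2_sf_df2(11.279)` gives about 0.0036, which SPSS rounds to .004. Treat the sketch as a way to understand the output rather than a replacement for it.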

For studies with two independent variables

In many studies, researchers are interested in examining the effect of more than one independent variable (i.e., "factors") on a given dependent variable. For example, say you want to know whether the bill size of finches differs between males and females of two different species. In this example, you have two factors (Species and Sex), and both are categorical. They can be examined simultaneously in a two-way ANOVA, a parametric statistical test. The two-way ANOVA will also tell you whether the two factors interact in their effects on the dependent variable (bill size) or act independently of each other (e.g., does bill size depend on sex in one species but not in the other?).

What if we wanted to know, for a single species, how sex and body size affect bill size? We still have two factors, but now one is categorical (Sex) and one is continuous (Body Size). In this case, we need to use an ANCOVA (analysis of covariance).

Both tests require that the data are normally distributed and that all of the groups have homogeneous variances, so you need to check these assumptions first. If you want to compare means from two (or more) grouping variables simultaneously, as ANOVA and ANCOVA do, there is no satisfactory non-parametric alternative, so you may need to transform your data.

Two-way ANOVA

Enter the data as follows:
• The two factors (Species and Sex) are put in two separate columns. The dependent variable (Bill size) is entered in a third column.

Before you run a two-way ANOVA, you might want to first run a t-test on bill size just between species, then a t-test on bill size just between sexes. Note the results. Do you think these results accurately represent the data? This exercise will show you how much more a two-way ANOVA can tell you about the patterns in your data. Now run a two-way ANOVA on the same data. The procedure is much the same as for a one-way ANOVA, with one added step to include the second variable in the analysis.
• Go to Analyze → General Linear Model → Univariate. A dialog box appears.
• Your dependent variable goes in the "Dependent Variable" box.
• Your explanatory variables go in the "Fixed Factor(s)" box.
• Now click Options. A new window will appear. Click on the check boxes for Descriptive Statistics and Homogeneity tests, then click Continue.

• Click OK. The output will consist of three tables, which show descriptive statistics, the results of Levene's test, and the results of the two-way ANOVA.

• From the descriptive statistics, it appears that the means may be different between the sexes and also different between species.

Descriptive Statistics
Dependent Variable: Bill size

Sex      Species     Mean    Std. Deviation   N
Female   Species A   17.60   1.140            5
         Species B   23.00   1.581            5
         Total       20.30   3.129            10
Male     Species A   26.60   2.074            5
         Species B   16.60   2.074            5
         Total       21.60   5.621            10
Total    Species A   22.10   4.999            10
         Species B   19.80   3.795            10
         Total       20.95   4.478            20

• From this second table, you know that your data meet the assumption of homogeneity of variance (Sig. = .344, which is > 0.05). So, you are all clear to interpret the results of your two-way ANOVA.

Levene's Test of Equality of Error Variances(a)
F       df1   df2   Sig.
1.193   3     16    .344
Tests the null hypothesis that the error variance of the dependent variable is equal across groups.
a. Design: Intercept+Sex+Species+Sex * Species


• The ANOVA table shows the statistical significance of the differences among the means for each of the independent variables (i.e., factors or "main effects"; here, Sex and Species) and for the interaction between the two factors (i.e., Sex * Species). Let's walk through how to interpret this information…

Tests of Between-Subjects Effects

Source            Type III Sum of Squares   df   Mean Square   F          Sig.
Corrected Model   331.350(a)                3    110.450       35.629     .000
Intercept         8778.050                  1    8778.050      2831.629   .000
Sex               8.450                     1    8.450         2.726      .118
Species           26.450                    1    26.450        8.532      .010
Sex * Species     296.450                   1    296.450       95.629     .000
Error             49.600                    16   3.100
Total             9159.000                  20
Corrected Total   380.950                   19
a. R Squared = .870 (Adjusted R Squared = .845)

• Always look at the interaction term FIRST. The interaction term tests whether the two factors act independently of each other, or whether the effect of one factor depends on the level of the other. In this bill-size example, the interaction term shows a significant sex*species interaction (p < 0.001). This means that the effect of sex on bill size differs between the two species, so looking at sex or species on its own can be misleading.

• To get a better idea of what the interaction term means, make a Bar Chart with error bars. See the graphing section of the manual for instructions on how to do this.

• If you look at the data, the interaction should become apparent. In Species A, bills are larger in males than in females (26.60 vs. 17.60), but in Species B, bills are larger in females than in males (23.00 vs. 16.60). So simply looking at sex doesn't tell us anything (as you saw when you did the t-test), and neither sex has a consistently larger bill when considered across both species.

• The main-effect terms in a two-way ANOVA essentially ignore the interaction and give results similar to the t-tests you may have performed earlier. The p-value associated with each independent variable (i.e., factor or main effect) tests the null hypothesis that the means of the different groups of that variable are the same, so if p < 0.05, the groups of that variable are significantly different from each other. In this case, the Sex row tests whether males and females differ, disregarding the fact that the data set contains males and females from two different species. The Species row tests whether the two species differ, disregarding the fact that each species is represented by both males and females.


• The two-way ANOVA found that species was significant if you ignore the interaction (p = 0.010). This suggests that Species A has larger bills overall, mainly because of the large bills of Species A males, but Species A does not always have larger bills because bill size also depends on sex.

• WHAT TO REPORT:
• If there is a significant interaction term, the significance of the main effects cannot be fully accepted because the trends differ among different combinations of the variables. Thus, you only need to tell your reader about the interaction term of the ANOVA table. Describe the pattern and parenthetically report the appropriate F-value, df, and p. For example: The way that sex affected bill size was different for the two species (F=95.6, df=1,16, p<0.001). (Often, a result like this would be followed up with two separate t-tests.)

• If the interaction term is not significant, then the statistical results for the main effects can be interpreted directly. In this case, you need to tell your reader about the interaction term and about each main-effect term of the ANOVA table. Following a statement that describes the general pattern for each of these terms, you should parenthetically report the appropriate F-value, df, and p. For example: Growth rates of both the invasive and native grass species were significantly higher at low population densities than at high population densities (F=107.1, df=1,36, p<0.001). However, the invasive grass grew significantly faster than the native grass at both population densities (F=89.7, df=1,36, p<0.001). There is no interaction between grass species and population density in their effects on growth rate (F=1.2, df=1,36, p>0.20).
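Because the bill-size design is balanced (5 birds in every Sex × Species cell), the sums of squares in the ANOVA table can be rebuilt from the Descriptive Statistics table alone, which makes it easier to see what the main-effect and interaction rows measure. The Python sketch below assumes equal n in every cell; with unequal cell sizes, the Type III sums of squares that SPSS reports require a full linear-model fit, and this shortcut no longer applies. The function name and the "rows"/"columns" labels are illustrative choices, not SPSS terms.

```python
from statistics import mean


def balanced_two_way_anova(cell_means, cell_sds, n_per_cell):
    """Two-way ANOVA rebuilt from cell summaries of a balanced design.

    cell_means and cell_sds are nested lists indexed [row level][column level],
    with the same number of observations (n_per_cell) in every cell.
    Returns {source: (sum of squares, df, F)}; F is None for the error row.
    """
    n_rows, n_cols = len(cell_means), len(cell_means[0])
    grand = mean(m for row in cell_means for m in row)
    row_means = [mean(row) for row in cell_means]
    col_means = [mean(cell_means[r][c] for r in range(n_rows)) for c in range(n_cols)]

    ss_rows = n_per_cell * n_cols * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n_per_cell * n_rows * sum((m - grand) ** 2 for m in col_means)
    ss_cells = n_per_cell * sum((m - grand) ** 2 for row in cell_means for m in row)
    ss_interaction = ss_cells - ss_rows - ss_cols
    # Within-cell (error) sum of squares: each cell contributes (n - 1) * SD^2
    ss_error = sum((n_per_cell - 1) * sd ** 2 for row in cell_sds for sd in row)

    df_rows, df_cols = n_rows - 1, n_cols - 1
    df_interaction = df_rows * df_cols
    df_error = n_rows * n_cols * (n_per_cell - 1)
    ms_error = ss_error / df_error
    return {
        "rows": (ss_rows, df_rows, ss_rows / df_rows / ms_error),
        "columns": (ss_cols, df_cols, ss_cols / df_cols / ms_error),
        "interaction": (ss_interaction, df_interaction, ss_interaction / df_interaction / ms_error),
        "error": (ss_error, df_error, None),
    }


# Rows = Sex (Female, Male); columns = Species (A, B); from the Descriptive Statistics table
means = [[17.60, 23.00], [26.60, 16.60]]
sds = [[1.140, 1.581], [2.074, 2.074]]
for source, (ss, df, f) in balanced_two_way_anova(means, sds, 5).items():
    print(source, round(ss, 3), df, None if f is None else round(f, 2))
```

The sums of squares for Sex (8.45), Species (26.45), and the interaction (296.45) match the SPSS table. The error term comes out near 49.61 rather than 49.600 because the SDs in the Descriptive Statistics table are rounded, so the F values agree only to about one decimal place (about 95.6 for the interaction).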

ANCOVA

Remember, ANCOVA is used when you have 2 or more independent variables that are a mixture of categorical and continuous variables. Our example here is a study investigating the effect of gender (categorical) and body size (continuous) on bill size in a species of bird. Your data must be normally distributed and have homogeneous variances to use this parametric statistical test. Enter the data as follows:
• The two factors (Sex and Body Size) are put in two separate columns. The dependent variable (Bill size) is entered in a third column.

To run the ANCOVA:
• Go to Analyze → General Linear Model → Univariate, as you did for the two-way ANOVA.
• Put your dependent variable in the "Dependent Variable" box.
• Put your categorical explanatory variable in the "Fixed Factor(s)" box.
• Put your continuous explanatory variable in the "Covariate(s)" box.
• Click on Options. A new window will appear. Click on the check boxes for Descriptive Statistics and Homogeneity tests, then click Continue.

• Click on Model. A new window will appear. At the top middle of the pop-up window, specify the model as "Custom" instead of "Full factorial". Highlight one of the factors shown on the left side of the pop-up window (under "Factors & Covariates") and click the arrow button. That variable should now show up on the right side (under "Model"). Do the same with the second factor. Now, highlight the two factors on the left simultaneously and click the arrow, making sure the option is set to "Interaction". In the end, the Model list should contain sex, body_size, and sex*body_size.

• Click Continue and then click OK. The output will consist of four tables, which show the categorical ("between-subjects") variable groupings, some descriptive statistics, the results of Levene's test, and the results of the ANCOVA.

• From the first two tables, it appears that males and females have similarly sized bills.

Between-Subjects Factors
        Value   Value Label   N
sex     1.00    male          8
        2.00    female        8

Descriptive Statistics
Dependent Variable: bill_size
sex      Mean      Std. Deviation   N
male     21.2625   1.70791          8
female   21.6500   2.24817          8
Total    21.4563   1.93906          16

• From the third table, you know that the data meet the assumption of homogeneity of variance. So, you are clear to interpret the results of the ANCOVA (assuming your data are normal…).

Levene's Test of Equality of Error Variances(a)
Dependent Variable: bill_size
F      df1   df2   Sig.
.237   1     14    .634
Tests the null hypothesis that the error variance of the dependent variable is equal across groups.
a. Design: Intercept+sex+body_size+sex * body_size

• The ANCOVA results are shown in an ANOVA table, which is interpreted in the same way as the table from the two-way ANOVA. The statistical results for the two independent variables (factors) and for the interaction between them (i.e., sex * body_size) are shown on three separate rows of the table below.

Tests of Between-Subjects Effects
Dependent Variable: bill_size

Source            Type III Sum of Squares   df   Mean Square   F        Sig.
Corrected Model   48.612(a)                 3    16.204        24.970   .000
Intercept         10.555                    1    10.555        16.265   .002
sex               .278                      1    .278          .428     .525
body_size         44.322                    1    44.322        68.299   .000
sex * body_size   .141                      1    .141          .217     .649
Error             7.787                     12   .649
Total             7422.330                  16
Corrected Total   56.399                    15
a. R Squared = .862 (Adjusted R Squared = .827)

• As with the two-way ANOVA, you must interpret the interaction term FIRST. In this example, the interaction term shows up in the ANOVA table as a row labeled "sex * body_size", and it tells you whether or not the way that body size affects bill size is the same for males as it is for females. The null hypothesis is that body size affects bill size in the same way for both sexes. In other words, the null hypothesis is that the two factors (body size and sex) do not interact in the way they affect bill size.

• Here, you can see that the interaction term is not significant (p=0.649). Therefore, you can go on to interpret the two factors independently. You can see that there is no effect of Sex on bill size (p=0.525). And, you can see that there is an effect of Body Size on bill size (p<0.001).

• Let’s see how this looks graphically. Make a scatterplot with the dependent variable (Bill Size) on the y-axis and the continuous independent variable (Body Size) on the x-axis. To make the male and female data show up as different symbols on your graph, move the categorical independent variable (Sex) into the box labeled "Style".

[Figure: scatterplot of bill_size (y-axis) against body_size (x-axis), with males and females shown as different symbols]

• From the figure you can see 1) that the way that body size affects bill size is the same for males as it is for females (i.e., there is no interaction between the two factors), 2) that males and females do not differ in their mean bill size (there is clear overlap in the distributions of male and female bill sizes), and 3) that body size and bill size are related to each other (as body size increases, bill size also increases).

• WHAT TO REPORT:
• If there is a significant interaction term, the significance of the main effects cannot be fully accepted because the trends differ among different combinations of the variables. Thus, you only need to tell your reader about the interaction term from the ANOVA table. Describe the pattern and parenthetically report the appropriate F-value, df, and p. For example: The way that prey size affected energy intake rate was different for large and small fish (F=95.6, df=1,16, p<0.001). (Typically, a result like this would be followed up with two separate regressions (see pg. 27 below), one for large fish and one for small fish.)

• If the interaction term is not significant, then the statistical results for the main effects can be interpreted directly. In this case, you need to tell your reader about the interaction term and about each main-effect term of the ANOVA table. Following a statement that describes the general pattern for each of these terms, you should parenthetically report the appropriate F-value, df, and p. For example: Males and females have similar mean bill sizes (F=0.4, df=1,12, p>0.50), and for both sexes, bill size increases as body size increases (F=68.3, df=1,12, p<0.001). There is no interaction between gender and body size in their effects on bill size (F=0.2, df=1,12, p>0.60).
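The sex * body_size term asks whether the slope of bill size on body size differs between the sexes. One informal way to see what that means is to fit a separate least-squares slope for each sex and compare them: roughly parallel lines go with a non-significant interaction. The Python sketch below uses hypothetical numbers (the raw data for this example are not listed in the manual), and the function name is an illustrative choice. It is only a descriptive check; the F-test in the sex * body_size row remains the formal test.

```python
from statistics import mean


def least_squares_slope(x, y):
    """Slope b of the best-fit line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical body sizes and bill sizes for each sex (NOT the manual's data)
birds = {
    "male": ([10, 11, 12, 13], [20.1, 20.9, 22.2, 22.8]),
    "female": ([11, 12, 13, 14], [21.0, 22.1, 22.9, 24.0]),
}

for sex, (body_size, bill_size) in birds.items():
    print(sex, round(least_squares_slope(body_size, bill_size), 2))
```

For these made-up values the two slopes (0.94 for males, 0.98 for females) are nearly equal, which is the kind of pattern that produces a non-significant sex * body_size term.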

Comparative Statistics 2: Comparing frequencies of events

Chi Square Goodness of Fit

This test allows you to compare observed to expected values within a single group of test subjects. For example: are guppies more likely to be found in predator or non-predator areas? You are interested in whether predators influence guppy behavior. So you put guppies in a tank that is divided into a predator-free refuge and an area with predators. The guppies can move between the two sides, but the predators cannot. You count how many guppies were in the predator area and in the refuge after 5 minutes. Here are your data:

                     In predator area   In refuge
Number of guppies    4                  16

Your null hypothesis for this test is that guppies are evenly distributed between the 2 areas. To perform the Chi-Square Goodness of Fit test:


• Open a new data file in SPSS.
• In Variable View, name the first variable "Location." In the Measure column, choose "Nominal." Assign 2 values: one for Predator Area and one for Refuge. Then create a second variable called "Guppies." In the Measure column, choose Scale.
• In Data View, enter the observed number of guppies in the 2 areas.
• Go to Data → Weight Cases. In the window that pops up, click on Weight cases by and select Guppies. Hit OK.
• Go to Analyze → Nonparametric Tests → Chi-Square.
• Your test variable is Location.
• Under Expected Values, click on Values. Enter the expected value for the category with the lower code first and hit Add, then enter the expected value for the other category and hit Add. Hit OK.
• In the Location table, check the values to make sure the test did what you thought it was going to do. Are the observed and expected numbers for the 2 categories correct?
• Your Chi-Square value, df, and p-value are displayed in the Test Statistics table.
NOTE: Once you are done with this analysis, you will likely want to stop weighting cases. Go to Data → Weight Cases and select Do not weight cases.
• WHAT TO REPORT: You want to report the χ² value, df, and p, parenthetically, following a statement that describes the patterns in the data.
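The goodness-of-fit statistic is χ² = Σ (O - E)² / E summed over the categories, with df = number of categories - 1. The minimal Python sketch below assumes expected counts of 10 and 10 (the even split stated in the null hypothesis for 20 guppies). For 1 df it uses the identity that a chi-square variable is the square of a standard normal variable, so `math.erfc` gives the p-value without any third-party library; that shortcut does not work for other df.

```python
import math


def chi_square_gof(observed, expected):
    """Pearson goodness-of-fit statistic sum((O - E)^2 / E) and its df (categories - 1)."""
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, len(observed) - 1


def chi2_sf_df1(x):
    """Upper-tail probability for a chi-square with 1 df, using P(Z^2 > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2))


observed = [4, 16]   # guppies counted in the predator area and in the refuge
expected = [10, 10]  # null hypothesis: the 20 guppies are split evenly
chi2, df = chi_square_gof(observed, expected)
print(f"chi-square = {chi2:.2f}, df = {df}, p = {chi2_sf_df1(chi2):.4f}")
```

For these counts, χ² = (4 - 10)²/10 + (16 - 10)²/10 = 7.2 with 1 df, and p comes out at about 0.007. SPSS's Test Statistics table should show the same χ² and df if the expected values were entered as described.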

Chi Square Test of Independence

If you have 2 different groups of test subjects, you can compare their responses to the independent variable. For example, you could ask: do female guppies have the same response to predators as male guppies? The chi-square test of independence allows you to determine whether the responses of your 2 groups (in this case, female and male guppies) are the same or different. To test this, you place 10 male and 10 female guppies in tanks that are divided into a predator-free refuge and an area with predators. Guppies can move between the areas; predators cannot. You count how many guppies were in the predator area and in the refuge after 5 minutes. Here are the data:

                  In predator area   In refuge
Male guppies      1                  9
Female guppies    3                  7

Your null hypothesis is that guppy gender does not affect the response to predators; in other words, that there will be no difference in how male and female guppies respond to predators. To perform the test in SPSS:

• In Variable View, set up two variables: Gender and Location. Both are categorical, so they must be Nominal, and you need to set up Values.


• Enter your data in 2 columns. Each row is a single fish.
• Go to Analyze → Descriptive Statistics → Crosstabs.
• In the pop-up window, move one of your variables into the Rows window and the other one into the Column window.
• Click on the Statistics button at the bottom of the Crosstabs window, then click Chi-square in the new pop-up window.
• Click Continue, then OK.

Your output should look like this:

Case Processing Summary
                    Cases
                    Valid           Missing         Total
                    N    Percent    N    Percent    N    Percent
Gender * Location   20   100.0%     0    .0%        20   100.0%

Gender * Location Crosstabulation
                   Location
                   predators   refuge   Total
Gender   male      1           9        10
         female    3           7        10
Total              4           16       20

Chi-Square Tests
                               Value      df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                               (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square             1.250(b)   1    .264
Continuity Correction(a)       .313       1    .576
Likelihood Ratio               1.297      1    .255
Fisher's Exact Test                                          .582         .291
Linear-by-Linear Association   1.188      1    .276
N of Valid Cases               20
a. Computed only for a 2x2 table
b. 2 cells (50.0%) have expected count less than 5. The minimum expected count is 2.00.

• How to interpret your output:

• Ignore the 1st table.
• The second table (Gender * Location Crosstabulation) has your observed values for each category. You should check this table to make sure your data were entered correctly. In this example, the table correctly reflects that there were 10 of each type of fish, and that 1 male and 3 females were in the predator side of their respective tanks.

• In the 3rd table, look at the Pearson Chi-Square line. Your chi-square value is χ² = 1.250, and your p-value is p = 0.264. Because p > 0.05, the response to predators did not differ significantly between male and female guppies.

• WHAT TO REPORT: You want to report the χ² value, df, and p, parenthetically, following a statement that describes the patterns in the data. For example: Male and female guppies did not differ in their response to predators (chi-square test of independence, χ²=1.25, df=1, p>0.20).


Regardless of gender, more guppies fed in the refuge areas than in the predator areas. Ninety percent of males and seventy percent of females fed in the refuge areas.
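For a test of independence, each expected count is (row total × column total) / grand total, and df = (rows - 1) × (columns - 1). The Python sketch below applies this to the guppy crosstabulation and computes the uncorrected Pearson statistic (the "Pearson Chi-Square" row, not the continuity-corrected one). The p-value line uses `math.erfc`, which is only valid for 1 df, as in any 2 × 2 table; the function name is an illustrative choice.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square test of independence for an r x c table of observed counts.

    Returns (chi-square, df, expected counts), where each expected count is
    row total * column total / grand total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, expected


# Rows: male, female; columns: predator area, refuge (counts from the crosstabulation)
guppies = [[1, 9], [3, 7]]
chi2, df, expected = chi_square_independence(guppies)
# erfc(sqrt(x / 2)) is the upper-tail probability only for 1 df, as in a 2 x 2 table
p = math.erfc(math.sqrt(chi2 / 2))
print(f"chi-square = {chi2:.3f}, df = {df}, p = {p:.3f}")
print("expected counts:", expected)
```

This gives χ² = 1.250 and p ≈ 0.264, matching the Pearson Chi-Square row. The expected counts of 2 and 8 in each row also explain footnote b ("The minimum expected count is 2.00").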

Comparative Statistics 3: Relationships among continuous variables

Correlation (no causation implied)

If the values of two variables appear to be related to one another, but neither is dependent on the other, they are considered to be correlated. For example, fish weight and egg production are generally correlated, but neither variable is dependent on the other. No causation is implied, meaning we have no reason to suspect that fish weight causes egg number or vice versa. The correlation coefficient, r, provides a quantitative measure of how closely two variables are related. It ranges from -1 to 1: a value of 0 means no correlation, and values of 1 or -1 mean the two variables are perfectly related (positively or negatively). Let's examine the correlation between bird weight and bill length, using the data below.

Bird #             1   2   3   4   5   6   7   8   9   10  11  12  13  14  15
Bird weight (g)    15  13  10  14  12  12  9   17  14  14  11  13  16  12  15
Bill length (mm)   43  45  35  41  42  39  39  47  44  48  41  43  42  45  45

• Enter the data above in a new spreadsheet and name the columns Weight and Length.
• To visualize what the correlation represents, make a scatterplot of the data. For instructions, go to the graphing section of this manual.
• The bird data listed above look like this when graphed:

[Figure: scatterplot of bill length (mm) against bird weight (g)]

• From this plot you can see that as weight increases, bird bill length also increases. Thus, these two variables appear to be correlated.

To quantify the extent of the correlation and see if it is statistically significant:
• Go to Analyze → Correlate → Bivariate.


• In the dialog box, move your 2 variables to the box on the right. Click on the check box for Pearson. Click OK.

Correlations
                                 Weight    Length
Weight   Pearson Correlation     1         .666**
         Sig. (2-tailed)         .         .007
         N                       15        15
Length   Pearson Correlation     .666**    1
         Sig. (2-tailed)         .007      .
         N                       15        15
** Correlation is significant at the 0.01 level (2-tailed).

• The first row of the correlation table gives you the Pearson correlation coefficient (r). In this example, r = 0.666, which shows there is a positive correlation between Weight and Length.
• The results of the statistical test show that this correlation is statistically significant (p = 0.007, which is < 0.05), and the ** indicates that the correlation is significant at the 0.01 level. This means that r is significantly different from zero; in other words, there is a significant positive relationship between the two variables.

• WHAT TO REPORT: Following a statement that describes the patterns in the data, you should parenthetically report the r, df, and p. For example: Larger birds have significantly longer bills (r=0.67, df=13, p<0.01).
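The Pearson coefficient is r = Sxy / √(Sxx × Syy), where Sxx, Syy, and Sxy are sums of squared deviations and cross-products about the means. Its significance test converts r to t = r × √((n - 2)/(1 - r²)) on n - 2 degrees of freedom, which is why the report above uses df=13 for 15 birds. The Python sketch below computes both from the bird data; it stops at t because the standard library has no t distribution, so the p-value still comes from SPSS's Sig. row.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient r, the t statistic for testing r = 0, and its df."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    r = sxy / math.sqrt(sxx * syy)
    t = r * math.sqrt((n - 2) / (1 - r ** 2))  # compared with Student's t on n - 2 df
    return r, t, n - 2


weight = [15, 13, 10, 14, 12, 12, 9, 17, 14, 14, 11, 13, 16, 12, 15]
length = [43, 45, 35, 41, 42, 39, 39, 47, 44, 48, 41, 43, 42, 45, 45]
r, t, df = pearson_r(weight, length)
print(f"r = {r:.3f}, t = {t:.2f}, df = {df}")
```

For the 15 birds this gives r ≈ 0.666 and t ≈ 3.22 on 13 df; the two-tailed p-value for that t is the .007 shown in the SPSS table.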


Regression (causation implied)

Regressions and correlations both test whether two variables are related to each other, and if so, how closely they are related. Regression is used when you suspect that the two variables are causally related, such that variation in one is causing the variation in the other. Regression is also used when you want to know the degree to which a change in one variable can predict a change in the other. A simple example is a one-to-one relationship between two variables, such as the relationship between the age and the number of growth rings of a tree. Another example is the relationship between the age and length of a fish.

Age (years)   Length (cm)
1             12.2
2             14.3
3             15.7
4             16.1
5             18.8
6             19.0
7             20.4

The data consist of a value for the independent variable (x) and the associated value for the dependent variable (y). Think of these as on an x-axis and a y-axis. In our example, given the age of the fish (x) one can predict its length (y). Generally, the independent variable (x) is controlled or standardized by the investigator, and the y variable is dependent on the value of x.


A regression calculates the equation of the best-fitting straight line through the (x,y) points that the data pairs define. In the equation of a line (y = a + bx), a is the y-intercept (the value of y where x = 0) and b is the slope. The output of a regression will give you estimates for both of these values. If we wanted to predict the length of a fish at a given age, we could do so using the regression equation that best fits these data.
• Enter the data above into a new spreadsheet and name the two data columns Age and Length.
• To visualize the relationship between these two variables, make a scatterplot of the data. See the graphing section of this manual for instructions on making scatterplots.

[Figure: scatterplot of Length (cm) against Age (years), with the fitted line Length (cm) = 11.34 + 1.33 * Age, R-Square = 0.97]

• The graph shows that there is a strong positive relationship between fish Age and Length. The equation shown on the graph is for the regression line that best describes the relationship between the two variables.
• The R-square (R²) value is the coefficient of determination, and can be interpreted as the proportion of the variation in the dependent variable that is explained by variation in the independent variable(s). R² ranges from 0 to 1. If it is close to 1, it means that your independent variable has explained almost all of why the value of your dependent variable differs from observation to observation. If R² is close to 0, it means that the independent variable(s) explain almost none of the variation in your dependent variable.
• In this example it appears that 97% of the variation in Length is explained by variation in Age.

• Now you want to determine whether the relationship is statistically significant. To run a regression analysis:
• Go to Analyze → Regression → Linear.
• Your response variable goes in the "Dependent variable" box.
• Your explanatory variable goes in the "Independent variable" box.
• The output contains four tables. The first table simply tells you which variables were used in what way.

Variables Entered/Removed(b)
Model   Variables Entered   Variables Removed   Method
1       Age(a)              .                   Enter
a. All requested variables entered.
b. Dependent Variable: Length

• The Model Summary table provides the basic data for the analysis, along with the R² value.

Model Summary


Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .984(a)   .969       .963                .56208
a. Predictors: (Constant), Age

• The next table is an ANOVA table, and in fact, a regression analysis is very similar to an ANOVA. (If the independent variable is categorical you use an ANOVA and if it is continuous you use a regression.) The results of the ANOVA table indicate whether the relationship between the two variables is significant. Here, p < 0.001 (in the Sig. column), so we can conclude that age is a significant predictor of length.

ANOVA(b)
Model            Sum of Squares   df   Mean Square   F         Sig.
1   Regression   49.157           1    49.157        155.597   .000(a)
    Residual     1.580            5    .316
    Total        50.737           6
a. Predictors: (Constant), Age
b. Dependent Variable: Length

• The fourth table contains the regression coefficients, which are the estimates of the y-intercept (in the row titled (Constant)) and the slope (in the row titled Age). A regression analysis tests whether the y-intercept and slope of the best-fit line are each significantly different from zero. The p-value for each row allows you to assess this. If the p-value for the y-intercept is less than 0.05, then the y-intercept is significantly different from zero. If the p-value for the slope is less than 0.05, then the slope is significantly different from zero.

Coefficients(a)
                  Unstandardized Coefficients   Standardized Coefficients
Model             B         Std. Error          Beta                        t        Sig.
1   (Constant)    11.343    .475                                            23.878   .000
    Age           1.325     .106                .984                        12.474   .000
a. Dependent Variable: Length

• From the output, we can see that the very high R² value reveals that 97% of the variation in length (dependent variable) can be explained by variation in age (independent variable).
• NOTE: There is no p-value associated with an R²!
• The very low p-value in the ANOVA table (Sig. = .000, i.e., p < 0.001) indicates that the relationship is highly significant, and thus very unlikely to occur by chance alone.
• The output also indicates that the y-intercept (Constant) and the slope (Age) are significantly different from zero.
• These statistics support the strong relationship that is evident in the scatterplot.
• In a paper describing your results, you would include a scatterplot of your data along with the equation for the regression line.
• WHAT TO REPORT: Following a statement that describes the patterns in the data, you should parenthetically report the F, df, and p from the ANOVA, as well as the R² value. Remember that there is no p-value associated with an R²! For example: Fish age can significantly predict fish length (F=155.6, df=1,5, p<0.001; R²=0.97).
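The least-squares line has slope b = Sxy / Sxx and intercept a = ȳ - b·x̄; R² is the regression sum of squares divided by the total sum of squares, and F = (SS_regression / 1) / (SS_residual / (n - 2)). The Python sketch below applies these textbook formulas to the fish data so you can see where the numbers in the SPSS tables come from; it is an illustration, not a reimplementation of SPSS's Linear Regression procedure.

```python
def simple_regression(x, y):
    """Least-squares fit of y = a + b*x; returns intercept, slope, R^2 and F (1 and n-2 df)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_regression = slope * sxy  # equals sxy ** 2 / sxx
    ss_residual = ss_total - ss_regression
    r_squared = ss_regression / ss_total
    f_value = ss_regression / (ss_residual / (n - 2))
    return intercept, slope, r_squared, f_value


age = [1, 2, 3, 4, 5, 6, 7]
length = [12.2, 14.3, 15.7, 16.1, 18.8, 19.0, 20.4]
a, b, r2, f = simple_regression(age, length)
print(f"Length = {a:.3f} + {b:.3f} * Age, R^2 = {r2:.3f}, F = {f:.1f}")
```

Run on the age and length data, this gives an intercept of about 11.343, a slope of 1.325, R² ≈ 0.969, and F ≈ 155.6, matching the Coefficients, Model Summary, and ANOVA tables.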

Graphing your data

For studies that use parametric statistics to compare means of different groups, you will want to present those data in the form of a bar graph that shows means and some measure of variation around the means (standard deviation or standard error). For studies that use non-parametric statistics to compare groups, the mean is likely not a good representation of the "typical" member of the population, so instead you will want to present your data in the form of a box plot, which shows medians and interquartile ranges. For studies that compare the relationships between two (or more) continuous variables, you will want to present those data in the form of a scatterplot.


Simple bar graph

• First, make sure your data are set up to be graphed:

• Make sure that your data file is not split. (Go to Data → Split File and select "Analyze all cases, do not create groups".)

• Go to Variable View and give your data Labels (in the Labels column) if you haven’t already. Make sure that you include units for all quantities. For the iguana example, you might give “Density” the label “Number of Iguanas”. Or if you were graphing % cover, you could label it “lichen % cover”.

• Identify your categorical variable (e.g. site, location, etc.) as being Nominal (in the Measure column). Leave the response variable as Scale.

• In the Decimals column, change the number of decimals associated with each “scale” variable so that they accurately reflect the precision of your measurement.

• Go to Graphs → Interactive → Bar.

• On the right of the pop-up window is a graphic that represents your two axes. Define the variable for each axis by dragging it into the appropriate cell.

• Grab your categorical variable from the left window and drag it to the x axis.

• Grab your response variable and drag it to the y axis. The default value for the y axis is "Count" (for some unknown reason).

• REMEMBER: you might do the analysis on transformed data, but you always want to graph raw data.

• At the bottom, check that under "Bars Represent…" it says "Means".

• Choose the Bar Chart Options tab. Make sure neither box is checked under Bar Labels.

• Choose the Error Bars tab. Click on the box for "Display Error Bars." Under Confidence Intervals, instead of "Confidence Interval for Mean," select "Standard Error of the Mean" as your units. Make sure the box directly under the pull-down menu says 1.0.

[Figure 1: bar graph of Density of iguanas (y axis, 0 to 15) by Location (x axis: Island A, Island B)]

Figure 1. Mean (+SE) density of iguanas on two small islands in the Galapagos archipelago. Island B has significantly more iguanas per unit area than Island A (t=2.5, df=38, p<0.05).

• Choose the Titles tab. Do not put a title or subtitle, but DO put in the complete text of your figure legend in the box labeled “caption”. Read the note at the bottom of the Titles tab window and do as it suggests!

• Hit OK.

To pretty it up:

• For presentation in a paper, you don't want any information on the sides; you want the information about error bars and sample size in the caption. Double click on any floating information and unclick the box that says "display key."

• Double click on a bar to see your options for bar color, fill, width, etc. Resist the temptation to use fancy colors. As a rule, you should try to keep your figures as simple as possible, so stick to black, white, and/or hatched bars if possible.

Sometimes you will want to put letters over your bars (for example, to show the results of post-hoc tests from an ANOVA).

• Double click on your figure to select it.

• Click on the Text Tool on the left-hand menu bar (a line with an "a" next to it).

• Click above a bar and type the appropriate letter.

• Do this for all of your bars.

• You might need to change the y axis so your letter will fit above the bars. To do that, double click on the y axis. The "scale axis" window will pop up. Unclick the "Auto" box next to "Maximum" and manually change the value to what you need it to be. Click OK.
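The error bars set up above show 1.0 standard error of the mean, SE = SD/√n, computed from the raw (untransformed) data in each group. A minimal Python sketch of the values plotted for each bar follows; `bar_summary` is a hypothetical helper name, not an SPSS function, and the iguana counts are made up for illustration (they are not the data behind Figure 1).

```python
import math
from statistics import mean, stdev


def bar_summary(values, multiplier=1.0):
    """Mean, standard error of the mean, and error-bar ends for one bar.

    `multiplier` plays the role of the 1.0 in the SPSS Error Bars tab.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations for a standard error")
    m = mean(values)
    se = stdev(values) / math.sqrt(n)   # sample SD (n - 1 denominator) divided by sqrt(n)
    return {"mean": m, "se": se, "n": n,
            "low": m - multiplier * se, "high": m + multiplier * se}


# Hypothetical raw densities for two locations.
density = {"Island A": [4, 6, 5, 7, 8], "Island B": [10, 12, 9, 14, 15]}
summaries = {site: bar_summary(vals) for site, vals in density.items()}
```

Only the upper end (`high`) is drawn in a "Mean (+SE)" figure like Figure 1; `n` is worth keeping because sample size belongs in the caption.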

Clustered bar graph

If you have several different data categories (for example, if you were doing a two-way ANOVA), you would want to use a clustered bar graph in which you plot the mean value of the response variable for each combination of levels of your categorical variables.

• Prepare your data for graphing as you did above.

• Go to Graphs → Interactive → Bar. Drag the response variable to the y axis and put one of the independent variables on the x axis. Now drag the other factor into the space for Legend Variables: Color. Keep the selection to the right of this space as "Cluster".

• For the bird bill size by species and sex example on page 18, Species was put on the x-axis and Sex was put in the space for Legend Variables: Color.

• Follow the same directions as above for what to do on the Bar Chart Options, Error Bars, and Titles tabs. Hit OK.

• You will get a graph which has different colors for different groups. As with a regular bar graph, make sure you include standard error bars and that you get rid of floating stuff.

[Figure: example clustered bar graph of Density of iguanas (y axis) by Location (x axis: Island A, Island B)]

• Keep the key that shows which bars are identified by which colors! Make sure you use colors / patterns that will print well in black and white. (Two dark solid colors will not work).
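Each bar in a clustered bar graph is the mean of one cell, that is, one combination of a level of the x-axis factor and a level of the legend factor. The Python sketch below groups rows of (x-axis level, legend level, value) and computes each cell's mean, standard error, and n. The helper name `cell_summaries` and the bill-size numbers are hypothetical, loosely modeled on the species-by-sex example.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def cell_summaries(rows):
    """rows: iterable of (x_level, legend_level, value) tuples.

    Returns {(x_level, legend_level): (mean, se, n)}; se is NaN when n < 2.
    """
    cells = defaultdict(list)
    for x_level, legend_level, value in rows:
        cells[(x_level, legend_level)].append(value)
    result = {}
    for key, vals in cells.items():
        n = len(vals)
        se = stdev(vals) / math.sqrt(n) if n > 1 else float("nan")
        result[key] = (mean(vals), se, n)
    return result


# Hypothetical bill sizes: Species on the x axis, Sex as the legend (color) variable.
rows = [
    ("Species 1", "Female", 10.0), ("Species 1", "Female", 12.0),
    ("Species 1", "Male", 14.0), ("Species 1", "Male", 16.0),
    ("Species 2", "Female", 8.0), ("Species 2", "Female", 8.0),
    ("Species 2", "Male", 9.0),
]
cells = cell_summaries(rows)
```

A cell with a single observation has no standard error, which is one reason to check cell sizes before graphing.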

Box plot

Box plots show the median, the interquartile range, and any outliers in the data. This is a common way to graphically represent non-parametric data.

• Go to Graphs → Interactive → Boxplot.

• Put your categorical variable on the x axis and your response variable on the y axis.

• Click on the Boxes tab in the window. Make sure that the following boxes are all checked: Outliers, Extremes, and Median Line.

• Don't forget to add a caption.

• To pretty it up, follow the directions above.

• A boxplot has a number of nifty features. The line in the middle of each box represents the median value of the response variable in that category. The box covers the middle 50% of observations in each category. The whiskers outside the box extend to the highest and lowest values in the sample that are within 1.5 box lengths of the edge of the box. Observations beyond this limit are shown by circles (outliers); observations more than 3 box lengths from the edge of the box are shown by asterisks (extremes).

• What a boxplot can tell you: (a) where the medians of the two groups are, and (b) how variable the groups are. For example, iguana densities on Island B have a higher median value and are much more variable than iguana densities on Island A.
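The box-and-whisker rules described above can be written out directly. This Python sketch uses Tukey's hinges for the box edges (the medians of the lower and upper halves of the sorted data); SPSS may compute its quartiles slightly differently, so treat it as an illustration of the 1.5- and 3-box-length rules rather than a reproduction of SPSS output. The function name and data are hypothetical.

```python
from statistics import median


def boxplot_stats(values):
    """Median, box edges, whisker ends, outliers and extremes for one group."""
    s = sorted(values)
    n = len(s)
    if n < 2:
        raise ValueError("need at least two observations")
    half = (n + 1) // 2                  # for odd n, both halves include the median
    q1, q3 = median(s[:half]), median(s[n - half:])
    box = q3 - q1                        # box length (interquartile range)
    inner_lo, inner_hi = q1 - 1.5 * box, q3 + 1.5 * box
    outer_lo, outer_hi = q1 - 3 * box, q3 + 3 * box

    inside = [v for v in s if inner_lo <= v <= inner_hi]
    return {
        "median": median(s),
        "q1": q1,
        "q3": q3,
        "whiskers": (inside[0], inside[-1]),   # most extreme values within 1.5 box lengths
        "outliers": [v for v in s                 # circles: 1.5 to 3 box lengths out
                     if outer_lo <= v < inner_lo or inner_hi < v <= outer_hi],
        "extremes": [v for v in s if v < outer_lo or v > outer_hi],   # asterisks
    }


stats = boxplot_stats([2, 3, 3, 4, 5, 5, 6, 7, 15, 40])   # hypothetical densities
```

Here the box runs from 3 to 7, so 15 falls between 1.5 and 3 box lengths above the box (an outlier) and 40 lies beyond 3 box lengths (an extreme).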

Scatter plot

• Go to Graphs → Interactive → Scatterplot.

• For a dataset being analyzed with correlation, it doesn't really matter which variable goes on the x-axis and which goes on the y-axis.

• Click and drag the variables where you want them. For the correlation example in this manual, we put Length on the y-axis and Weight on the x-axis.

• Click OK.

• For a dataset being analyzed with regression, you must put the explanatory variable on the x-axis and the response variable on the y-axis. In the regression example in this manual, we put Age on the x-axis and Length on the y-axis.

• Click on the Fit tab and choose Regression as the method.

• Click OK.
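Why axis choice matters for regression but not for correlation can be checked numerically: Pearson's r is symmetric in its two variables, while the least-squares slope is not (the slope of y on x times the slope of x on y equals r²). A minimal Python sketch with hypothetical weight and length values; the function names are illustrative.

```python
import math


def _centered_sums(x, y):
    """Return Sxx, Syy and Sxy (sums of squares and cross-products about the means)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxx, syy, sxy


def pearson_r(x, y):
    sxx, syy, sxy = _centered_sums(x, y)
    return sxy / math.sqrt(sxx * syy)


def slope(x, y):
    """Least-squares slope when y is regressed on x (x on the x-axis)."""
    sxx, _, sxy = _centered_sums(x, y)
    return sxy / sxx


weight = [1, 2, 3, 4, 5]   # hypothetical values
length = [2, 4, 5, 4, 5]

# Swapping the axes leaves r unchanged but changes the fitted slope.
r_xy, r_yx = pearson_r(weight, length), pearson_r(length, weight)
b_yx, b_xy = slope(weight, length), slope(length, weight)
```

With these numbers the slope of length on weight is 0.6 and the slope of weight on length is 1.0, while r is about 0.775 either way.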


Printing from SPSS

To print table or graph output from SPSS, click on it, go to Print (under File, or use the print icon) and print the selection. Alternatively, you can copy it into MS Word and print from there. Some students find it easier to produce figures in SPSS without figure legends, copy the figures to Word, and add captions there using text boxes.
