
The Not-Even-Remotely Close to

Being a Complete Guide to

SPSS / PASW Syntax

(For SPSS / PASW v.18+)

Dr. Bryan R. Burnham

Department of Psychology

University of Scranton


Table of Contents

1. What is SPSS / PASW?
   1.1 Where is it Available?
   1.2 Finding and Opening PASW
   1.3 Three Types of PASW Files
   1.4 Data Files (Data View)
   1.5 Defining and Adjusting Variables in Data Files (Variable View)
   1.6 Basic Structure of Output Files
   1.7 Data Files Associated with this Guide
2. The Syntax Editor
   2.1 Why Syntax? Because it's Better!
   2.2 Some Syntax Basics...It's Easy?
   2.3 Opening .sav Files with Syntax
   2.4 Opening Microsoft Excel (.xls) Files with Syntax
   2.5 Opening Text (.txt) Files with Syntax
3. Syntax for Basic Statistical Needs
   3.1 Variable Labels
   3.2 Value Labels
   3.3 Frequencies
   3.4 Descriptive Statistics
   3.5 SORT CASES
   3.6 SPLIT FILE
4. Correlation & Regression
   4.1 Pearson Correlations (Bivariate)
   4.2 Pearson Correlations (Partial)
   4.3 Univariate Regression (one regressor)
5. t-Tests
   5.1 One-Sample t-Test
   5.2 Independent Groups t-Tests
   5.3 Correlated Samples (Paired Samples) t-Tests
6. Analysis of Variance
   6.1 Oneway Analysis of Variance (via GLM)
   6.2 Between Subjects Factorial ANOVA (via GLM)
   6.3 Repeated Measures ANOVA (via GLM)
7. Chi Square
   7.1 Cross-Tabulation Procedure (Factorial Chi-Square)
   7.2 Oneway Chi-Square
   7.3 Goodness of Fit Test
   7.4 Alternative Method for Goodness of Fit Test


1. What is SPSS / PASW?

The Statistical Package for the Social Sciences (SPSS) is a software tool for analyzing sets of data. I have absolutely no idea what the acronym PASW stands for. I wish it was PAWS, because it would be easier to say. Anyway, PASW is just the newest version of SPSS (currently version 18). SPSS/PASW operates like a spreadsheet program, such as Microsoft Excel, and the data files look a lot like Excel. Unlike Excel, though, PASW/SPSS is designed for manipulating and analyzing data.

As part of your course requirements, you will gain a basic understanding of how to use PASW. Indeed, most statistical analyses are performed with PASW or some other software. So why do we teach you this stuff by hand instead of just using PASW? Simply put, without conceptual knowledge of where the results of a PASW analysis come from, they're just a bunch of numbers in a computer file! Thus, we teach you what the variance of a set of data is and where it comes from by showing you how it's calculated. This way, variance should make sense when you use PASW. If my logic doesn't make sense, drop out of the course and preferably out of college. :-)

1.1 Where is it Available?

At the University of Scranton, SPSS / PASW is available in the Weinberg Memorial Library (WML) on the 1st floor and in group study rooms; Brennan Hall (BRN) rooms 102 and 201; McGurrin Hall (MGH) room 110; Hyland (HYL) Café and room 102 (where statistics classes are held); and Alumni Memorial Hall (AMH) rooms 214 and 202. It may also be available in the PT/OT lab in the basement of Leahy Hall and in the Nursing Lab and the Stout Lab in McGurrin Hall.1

1.2 Finding and Opening PASW

From the Start Menu → All Programs → SPSS Inc. → PASW Statistics 18 → PASW Statistics 18 (the red icon with the gray sigma symbol).

1.3 Three Types of PASW Files

There are three main file types associated with PASW (and SPSS):

1. Data Files contain data to be analyzed, and have the extension '.sav'. Data files look a lot like a Microsoft Excel spreadsheet, with columns, rows and cells. Columns represent variables, with an abbreviated name of the variable at the top of each column. Rows represent cases, or research subjects. That is, each row/case could be the data associated with an individual, or a sample. The cells and values within the file are the data. (See Figures 1 & 2)

2. Syntax Files are used to request PASW conduct an analysis, and have the extension '.sps'. Hence, syntax files are command files that tell PASW what to do with data. I admit that most analyses and procedures in PASW can be obtained through the pull-down menus in the data file; but, syntax is better for reasons given later. Syntax files are similar to text editors where you insert text-based commands for PASW to interpret and, hopefully, run your requested analyses on the data. (See Figure 5)

3. Output Files are generated when PASW runs an analysis on a set of data, and have the extension '.spv' (in SPSS the extension is '.spo'). Importantly, if something was written incorrectly in the syntax file, PASW will produce a "Warning", usually with no additional output. Most of an output file is in table format, with the exception of graphs and charts. (See Figure 3)

1 Thanks to Dr. Barry Kuhle (University of Scranton) for compiling this list.



1.4 Data Files (Data View)

There are two different 'views' of a PASW data file:

1. Data View, where your data can be entered by hand, and where you can view the actual values of the working data file.

2. Variable View, where you can define parameters of your variables, such as how many decimals are showing, whether the variable is a string, a date, or a numeric variable, etc.

The figure below is a screen shot of the Data View in a blank PASW data file:

Figure 1: Data View of a blank PASW data file.

You can toggle between the Data View and the Variable View by clicking on the appropriate tab at the bottom left-hand corner of any data file. You can also toggle back and forth by double-clicking on any variable name; this amounts to double-clicking a column in Data View or double-clicking any row in Variable View. I will assume that you can figure out how to enter values into a data file, so I will not cover that here.

1.5 Defining and Adjusting Variables in Data Files (Variable View)

If necessary, it is good to define the parameters of your variables first, so that when you run an analysis the output of any tables and graphs will be complete and understandable. Below is a screen shot of the Variable View in a blank PASW data file:

Figure 2: Variable View of a blank PASW data file.

Below, I've listed each of the parameters that can be seen at the top of each column in Variable View, with a brief description of what each parameter can do:

NAME Refers to the variable names that you enter; each name must begin with a letter.

TYPE Indicates whether a variable is numeric, a string, a date, etc. Clicking TYPE opens a dialogue box, in which you can specify the type of data contained in a variable.

WIDTH The maximum number of digits or characters allowed for a value under a variable.

DECIMAL The number of decimal places displayed for numeric variables.

LABEL Allows you to assign a longer name to an abbreviated variable label in the data file. That is, you could 'name' a variable STAI, but 'label' the variable ‘State Trait Anxiety Inventory at time 1’. The abbreviated name appears under NAME, and the longer LABEL will appear on any tables or graphs in the output.


VALUES Allows you to assign dummy-codes to a variable. For example, if your data file contains the variable 'Sex', a 0 could refer to males and 1 could refer to females. But 0's and 1's are arbitrary unless they are defined. This packet will show you how to assign labels using syntax.

MISSING Lets you define specific values that PASW should treat as missing data.

COLUMNS Refers to how many columns wide you want the variable name to appear in the Data View. Normally this is set to eight.

ALIGN Allows you to have the values in each column left-justified, right-justified, or centered.

MEASURE Relevant to numeric variables. Indicates the measurement scale of a variable. It allows three levels: nominal, ordinal and scale, which refers to both interval and ratio data.

Most of these parameters are irrelevant for the time being. Later, you'll learn how to assign longer, more descriptive labels to a variable name, as well as dummy-code a variable.

1.6 Basic Structure of Output Files

After you have opened a data file, written syntax commands to request an analysis, and then run that analysis, PASW will produce an output file, like the one below:


Figure 3: Example output file.

The output file is what we are trying to get PASW to provide us. It presents, in table or graph form, the descriptive and/or inferential statistics requested. As you can see in Figure 3, the output contains a single table with a listing of several descriptive statistics (N, Minimum, Maximum, Mean, Standard Deviation), for two different variables (SAT_CR and SAT_M). Don't worry about the variable names right now; trust me, you'll know what they are in a bit. Later, when you have PASW run an analysis on a set of data, I will not include whole screen shots of the output. Rather, I'll simply paste the output tables into the document. (Gotta conserve megabytes!)

1.7 Data Files Associated with this Guide

The data file that will be used throughout most of this packet is 'GRE Therapy Data File.sav', and it is available on my statistics course website (http://sites.google.com/site/psyc210stats/), on the course files page. There are actually three data files with the same name ('GRE Therapy Data File.sav', 'GRE Therapy Data File.xls', and 'GRE Therapy Data File.txt'). I'll show you how to open each of these types of data files using syntax, so download each file. Here's a screen shot of a portion of the data file:

Figure 4: A portion of the data file used in this packet.

The file contains a set of data from a fictitious study that examined the influence of a new Study Drug and different Types of Tutoring on student scores on the Graduate Record Examination (GRE). The GREs are a set of standardized examinations, like the Scholastic Aptitude Tests (SATs), and most graduate school programs require applicants to report their GRE scores. Like the SATs, the GREs contain three sections: (1) quantitative reasoning, (2) verbal reasoning, and (3) analytical writing.


In this fictitious study, researchers investigated whether two independent variables (Study Drug and Type of Tutoring) improved scores on each section of the GREs. For the independent variable Study Drug, subjects were given nothing (control group), a placebo (placebo group), or one of two different dosages of the drug (100 mg/day or 200 mg/day). For the independent variable Type of Tutoring, subjects were not tutored (control group), were tutored with other students in small groups (Group Tutoring), or were tutored one-on-one (Individual Tutoring).

Subjects were tested at the beginning of the study during a pretest phase (before the independent variables were administered), and were tested several months later during a posttest phase (after the independent variables should have an influence). In addition to scores on each of the three sections of the GREs, there are a number of other variables included in the data set. Each subject's SAT scores were collected, their heights and weights were measured, and each subject was measured on their level of Trait Anxiety (enduring level of anxiety) and State Anxiety (temporary, situational anxiety). Trait and State anxieties were assessed using the State Trait Anxiety Inventory (STAI), during both the pretest and posttest phase. The table below lists the abbreviated NAME for each variable, along with a brief description of each variable.

Variable NAME Description of Variable

ID Identification number assigned to each subject.

Sex Each subject's biological sex; dummy-coded, where 1 = male and 2 = female.

Coll_Class Each subject's current year in college; dummy coded, where 1 = Freshmen, 2 = Sophomore, 3 = Junior, and 4 = Senior.

Coll_Maj Each subject's primary major; dummy-coded, where 1 = Psychology, 2 = History, 3 = Biology, 4 = Communications, 5 = English, and 6 = Mathematics.

Height_cm Each subject's height, measured to the nearest 0.1 cm.

Weight_kg Each subject's weight, measured to the nearest 0.1 kg.

SAT_CR Each subject's score on the Critical Reading (CR) section of the SATs.

SAT_M Each subject's score on the Mathematics (M) section of the SATs.

SAT_V Each subject's score on the Verbal (V) section of the SATs.

SAT_Tot Each subject's summed SAT score (SAT_CR + SAT_M + SAT_V)

GPA Each subject's current cumulative GPA.

Drug_Group Level of the independent variable Drug Group, into which the subject was assigned; dummy-coded, where 1 = Control Group (no drug given), 2 = Placebo Group, 3 = 100-mg of Drug/Day, and 4 = 200-mg of Drug/Day.

Tutor_Group Level of the independent variable Tutor Group, into which the subject was assigned; dummy-coded, where 1 = Control Group (no tutoring), 2 = Group Tutoring, 3 = Individual Tutoring.

Pre_STAIt Each subject's trait anxiety (t) during the pretest phase; measured using the State Trait Anxiety Inventory (STAI).

Pre_STAIs Each subject's state anxiety (s) during the pretest phase; measured using the State Trait Anxiety Inventory (STAI).

Pre_GREv Each subject's score on the Verbal Reasoning (v) section of the GREs, during the pretest phase.

Pre_GREq Each subject's score on the Quantitative Reasoning (q) section of the GREs, during the pretest phase.

Pre_GREa Each subject's score on the Analytical Writing (a) section of the GREs, during the pretest phase.

Post_STAIt Each subject's trait anxiety (t) during the posttest phase; measured using the State Trait Anxiety Inventory (STAI).

Post_STAIs Each subject's state anxiety (s) during the posttest phase; measured using the State Trait Anxiety Inventory (STAI).

Post_GREv Each subject's score on the Verbal Reasoning (v) section of the GREs, during the posttest phase.

Post_GREq Each subject's score on the Quantitative Reasoning (q) section of the GREs, during the posttest phase.

Post_GREa Each subject's score on the Analytical Writing (a) section of the GREs, during the posttest phase.

Table 1: Variable NAMES and brief descriptions.


2. The Syntax Editor

The Syntax Editor looks and works like a text editor (TextPad, Notepad, WordPad). You type in what you want PASW to do, in the correct sequence and using PASW's language, and PASW does what you asked it to do (hopefully). If you have ever done a little computer programming (C, C++, Matlab, etc.), then this is just like writing code, albeit much simpler code! PASW syntax files have the file extension *.sps. Here's an example of what the syntax editor looks like:

Figure 5: Example PASW syntax editor.

Note, if you use SPSS, then you won't have the various colors and the numbers for each line. The inclusion of different colors for different syntax statements in PASW is a huge improvement over SPSS.

From here on out, I won't be pasting in screen shots of the syntax that we'll be using. Rather, I'll just write out the syntax that you need in order to run a specific analysis or procedure. For example, rather than including a screen shot like Figure 5, I'll type out the syntax (with the appropriate colors and line numbers). Note that you do not have to type out the line numbers. Thus, the syntax in Figure 5 will appear as follows:


1 GET DATA
2   /TYPE=XLS
3   /FILE='C:\Documents and Settings\burnhamb2\My Documents\Class Materials\PSYC 210'+
4     'Statistics\SPSS Assignments\SPSS-PASW Packet\GRE Therapy Data File.xls'
5   /SHEET=name 'Sheet1'
6   /CELLRANGE=full
7   /READNAMES=on
8   /ASSUMEDSTRWIDTH=32767 .
9 DATASET NAME DataSet2 WINDOW=FRONT.

Don't worry about what all of this means right now, it will make sense in a little while. :-)

2.1 Why Syntax? Because it's Better!

There are two methods that can be used to have PASW do stuff: (1) using the pull-down menus, or (2) telling PASW what to do by writing syntax commands. (I'll refer to these as the wrong way and the right way, respectively.)

Is the syntax-method easier? No, but it’s much more useful, for a variety of reasons. First, you can do more within one syntax file and in a shorter time than with the pull-down menu method. Specifically, you can plan out all of the stuff you need PASW to do, write the appropriate syntax for everything, and then run it all at once. In contrast, with pull-down menus you have to do one thing at a time. Second, you can do more with syntax. There are certain procedures that are simply not possible with the pull-down menus, but that are possible with syntax. Third (and certainly not finally), if you go to grad school, especially in the sciences, you’ll need to learn programming. I’m giving you a head start. You’re welcome!

2.2 Some Syntax Basics...It's Easy?

PASW syntax is not case-sensitive, except for variable names. Remember: variable names are case-sensitive. If you spell a variable's name correctly but forget to capitalize a letter or make a letter lowercase, the syntax will not run.

I suggest writing commands and sub-commands in CAPS to help distinguish commands from variables. This will allow you to parse the syntax quickly, especially if you write variable names in lowercase or mixed case.

Syntax commands and sub-commands should be entered on separate lines, and each command must end with a period (.); however, not every syntax line has to end with a period, just the end of the overall procedure. That is, if you look at the syntax in Figure 5, there is a period only on Line 8. This is because Lines 1-8 are, collectively, asking PASW to retrieve a data file; hence, these eight lines encompass one whole procedure.

Sub-commands within a command procedure, and parts of a command that appear on different lines, must start with a forward slash (/), not a backward slash. PASW will not know what to do with such sub-commands if the forward slash is not entered. For example, if you look at Figure 5, you can see a forward slash beginning Lines 2, 3, 5, 6, 7, and 8 (there is no slash on Line 4, because Line 4 is a continuation of Line 3).

It is good to enter 'EXECUTE .' at the end of a command procedure. Some commands will not run without this terminator command. Unfortunately, I have never figured out which commands will and will not run with and without this ending statement.
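For example, here is a minimal sketch of a procedure followed by the EXECUTE terminator (it uses the SORT CASES command covered later in Section 3.5; if in doubt, including EXECUTE does no harm):

1 SORT CASES BY GPA(D) .
2 EXECUTE .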


Once your syntax is written, you need to run it in order to generate an output file. Highlight the syntax that you want to run and hit Ctrl+R to run the procedures. Or, instead of hitting Ctrl+R, click the Run Button on the toolbar. The Run Button is the green rightward-pointing arrow in the middle.

2.3 Opening .sav Files with Syntax

I admit that if you already have a PASW data file, you can simply locate that file and double-click it to open it. Nonetheless, here's how to open a PASW data file using syntax (notes follow):

1 GET
2   FILE='C:\Documents and Settings\burnhamb2\Desktop\GRE Therapy Data File.sav'.
3 DATASET NAME DataSet1 WINDOW=FRONT.

The file directory address in line 2 will differ, depending on where the file is placed on your hard drive. In this case, I placed the file on the Desktop for easy access. Note that the directory address for the file must be contained in single quotes ('). DATASET NAME on line 3 should just be set to DataSet1 as listed.

An output file will be generated when you run any syntax. When opening a data set, the output file will contain only the commands that led to the opening of the file. You can delete that output file.

2.4 Opening Microsoft Excel (.xls) Files with Syntax

Below is an example of the syntax needed to open a data file saved as a Microsoft Excel spreadsheet:

1 GET DATA
2   /TYPE=XLS
3   /FILE='C:\Documents and Settings\burnhamb2\Desktop\GRE Therapy Data File.xls'
4   /SHEET=name 'Sheet1'
5   /CELLRANGE=full
6   /READNAMES=on
7   /ASSUMEDSTRWIDTH=32767.
8 DATASET NAME DataSet1 WINDOW=FRONT.

Notice that Line 1 here (GET DATA) is nearly the same as Line 1 for opening a PASW data file (GET). You can think of this statement as the 'major command' that you are asking PASW to perform; all of the additional lines are sub-commands.

When opening an Excel spreadsheet, special care must be taken that you are asking PASW to open the correct sheet within the workbook (usually Sheet1), that you are asking for the correct cells in the worksheet, and that you have asked PASW to read in any variable names in the spreadsheet.

The sub-command on Line 2 (/TYPE) lists XLS, which is the file extension for Microsoft Excel files. On Line 4 (/SHEET=name), the name between the single quotes ('Sheet1') is the name of the worksheet within the Excel workbook where the data are located. If the data sheet in the workbook has a different name or number, this needs to be changed here. Line 5 (/CELLRANGE=full) refers to which cells within the named worksheet are to be imported into PASW. If all of the cells with data are to be imported, just use 'full'; if only some of the cells are to be imported, the cell range should be indicated here (e.g., A1:B200). Line 6 (/READNAMES=on) tells PASW that the first row of the Excel sheet contains the names of the variables, and these should be treated as variable names. If the Excel workbook does not include variable names, then 'off' should be substituted for 'on'.

2.5 Opening Text (.txt) Files with Syntax

Below is an example of the syntax necessary to open a data file that is saved as a text file:

1  GET DATA
2    /TYPE=TXT
3    /FILE="C:\Documents and Settings\burnhamb2\Desktop\GRE Therapy Data File.txt"
4    /DELCASE=LINE
5    /DELIMITERS="\t"
6    /ARRANGEMENT=DELIMITED
7    /FIRSTCASE=2
8    /IMPORTCASE=ALL
9    /VARIABLES=
10   ID F3.0
11   Sex F1.0
12   Coll_Class F1.0
13   Coll_Maj F1.0
14   Height_cm F5.1
15   Weight_kg F5.1
16   SAT_CR F3.0
17   SAT_M F3.0
18   SAT_V F3.0
19   SAT_Tot F4.0
20   GPA F5.3
21   Drug_Group F1.0
22   Tutor_Group F1.0
23   Pre_STAIt F2.0
24   Pre_STAIs F2.0
25   Pre_GREv F3.0
26   Pre_GREq F3.0
27   Pre_GREa F3.1
28   Post_STAIt F2.0
29   Post_STAIs F2.0
30   Post_GREv F3.0
31   Post_GREq F3.0
32   Post_GREa F3.1.
33 CACHE.
34 EXECUTE.
35 DATASET NAME DataSet4 WINDOW=FRONT.

First thing, I have no idea why these lines are not colored; I was surprised myself. This set of syntax is a bit longer, mainly because you need to tell PASW how to read in each variable from the text file (Lines 10 – 32). As with the PASW syntax for importing data from an Excel spreadsheet, you need to be careful to include certain sub-commands.


On Line 2 (/TYPE=TXT), the TXT is the file extension for text files.

On Line 4 (/DELCASE=LINE), this is telling PASW that each new case (i.e., each subject) is a different line (row) within the text file.

On Line 5 (/DELIMITERS="\t"), 'delimiters' define the boundaries between adjacent entries, that is, data points in a data file. The \t is telling PASW that the boundaries are defined by TABS.

On Line 7 (/FIRSTCASE=2), this is telling PASW that the data in the text file actually begin on line 2; that is, the first case (subject) is on line 2 of the data file.

On Line 8 (/IMPORTCASE=ALL), this is telling PASW to import all of the data. This can be changed if you only want to import some of the data file.

Lines 10 – 32 list the NAME and numeric format of each variable in the data set. These variable names actually appear on line 1 of the text file, which is why /FIRSTCASE is set to 2.

Once you have opened a data set, you should save it as a PASW data file (.sav) to be used in the future. Then, you can just double-click it to open it.
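Saving can also be done with syntax. Here is a minimal sketch using the SAVE OUTFILE command (the file path is just an example; change it to wherever you want the .sav file to live):

1 SAVE OUTFILE='C:\Documents and Settings\burnhamb2\Desktop\GRE Therapy Data File.sav'
2   /COMPRESSED.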

Throughout the remainder of this packet, when I am providing syntax examples or the output of a procedure, I am not going to provide too much commentary. I'd rather you explore the output and the syntax on your own to get a feel for everything.


3. Syntax for Basic Statistical Needs

3.1 Variable Labels

In the data file, the NAME given to each variable is a short acronym. For example, 'ID' stands for 'Identification Number', 'Coll_Maj' stands for 'College Major', 'SAT_CR' stands for 'Critical Reading Score on the SATs', etc. So that you do not have to memorize each of these acronyms, it's a good idea to assign a LABEL to each variable. These VARIABLE LABELS do not show up in the data file, but will show up in an output file. Here is how to use the VARIABLE LABELS syntax to assign the label 'SAT Critical Reading Score' to SAT_CR (remember, you do not type the number at the beginning of the line):

1 VARIABLE LABEL SAT_CR 'SAT Critical Reading Score' .

All that you need to do is list the variable NAME (SAT_CR) followed by the LABEL you wish to assign ('SAT Critical Reading Score'). Be sure that the label is in single quotes. You can also assign labels to more than one variable at a time:

1 VARIABLE LABEL SAT_CR 'SAT Critical Reading Score' SAT_M 'SAT Math Score' .

3.2 Value Labels

For independent variables that have several levels/groups, it is best to dummy-code those groups in the data file. That is, in the data file, male subjects and female subjects will not be listed as 'male' and 'female'; rather, they will be assigned arbitrary numbers. In the data file for this packet, for the variable Sex, males are assigned 1 and females are assigned 2. The numbers can be anything, as long as all males have the same number and all females have the same number.

The reason is that if you want to compare levels/groups of an independent variable, PASW requires that they have numeric labels. The downside is that if you run an analysis that involves those groups/levels, only the arbitrary numbers will appear in the output. You'd have to memorize what the label 1 means for the variable Sex, versus what the label 1 means for another independent variable.

But, you can assign LABELS to the dummy-coded VALUES assigned to the groups. These VALUE LABELS will not show in the data file, but do show up in the output. Here is an example of how to use the VALUE LABELS syntax to assign labels to the dummy-coded males and females for the variable Sex:

1 VALUE LABEL Sex 1 'Males' 2 'Females' .

If you want to assign labels to more than one independent variable at a time, it is best to use several individual commands:

1 VALUE LABEL Sex 1 'Males' 2 'Females' .
2 VALUE LABEL Coll_Class 1 'Freshmen' 2 'Sophomore' 3 'Junior' 4 'Senior' .
3 VALUE LABEL Coll_Maj 1 'Psychology' 2 'History' 3 'Biology' 4 'Communications' 5 'English'
4   6 'Mathematics' .
5 VALUE LABEL Drug_Group 1 'Control Group (no drug)' 2 'Placebo Group' 3 '100 mg/day Group'
6   4 '200 mg/day Group' .
7 VALUE LABEL Tutor_Group 1 'Control Group (no tutoring)' 2 'Group Tutoring' 3 'Individual
8   Tutoring'.

In the data file, I have assigned VALUE LABELS to each independent variable. Hence, when output is presented later in this packet, the groups will not have dummy-codes; they will have the labels assigned by the syntax above.

3.3 Frequencies

The FREQUENCIES command is used to obtain a frequency table for a variable. The syntax below asks PASW to determine the frequency of each group within the variables Sex and Coll_Class. Note that the variable names have to be entered just as they appear at the top of the columns in the data file. Also, note that you can request frequencies for several variables at once. This is typical of most PASW commands: you can request a procedure for several variables simultaneously:

1 FREQUENCIES VARIABLES=Sex Coll_Class
2   /ORDER=ANALYSIS.

The syntax above provides the following output (comments were added by me):

Statistics
                   Sex    Coll_Class   Coll_Maj
N      Valid       240    240          240
       Missing     0      0            0

Frequency Table

Sex
                  Frequency   Percent   Valid Percent   Cumulative Percent
Valid   Males     109         45.4      45.4            45.4
        Females   131         54.6      54.6            100.0
        Total     240         100.0     100.0

Coll_Class
                    Frequency   Percent   Valid Percent   Cumulative Percent
Valid   Freshmen    57          23.8      23.8            23.8
        Sophomore   65          27.1      27.1            50.8
        Junior      63          26.3      26.3            77.1
        Senior      55          22.9      22.9            100.0
        Total       240         100.0     100.0

The Statistics table shows how many cases (subjects) contribute to each of the listed variables. In each frequency table, the groups that contribute to the variable are listed in the far-left column.
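FREQUENCIES can also produce summary statistics and simple charts along with the frequency table. A minimal sketch, assuming you also want the mode and a bar chart for each variable:

1 FREQUENCIES VARIABLES=Sex Coll_Class
2   /STATISTICS=MODE
3   /BARCHART FREQ
4   /ORDER=ANALYSIS.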

3.4 Descriptive Statistics

Although descriptive statistics can be requested as a sub-command within many PASW commands, there is a specific DESCRIPTIVES command. Like the FREQUENCIES command, you can request descriptive statistics for several variables at the same time. In the syntax below, I requested PASW to compute descriptive statistics on the variables Height_cm and Weight_kg:

1 DESCRIPTIVES VARIABLES=Height_cm Weight_kg
2   /STATISTICS=MEAN SUM STDDEV VARIANCE RANGE MIN MAX SEMEAN KURTOSIS
3   SKEWNESS.

You can request a variety of descriptive statistics. On Lines 2 and 3, I listed each descriptive statistic that can be requested; most should be self-explanatory, except for SEMEAN, which stands for the standard error of the mean, and KURTOSIS and SKEWNESS, which refer to the peakedness and the asymmetry of a distribution, respectively. In the output that follows, I did not request the KURTOSIS and SKEWNESS statistics:

Descriptive Statistics
                     N     Range   Minimum   Maximum   Sum       Mean                     Std. Deviation   Variance
                                                                 Statistic   Std. Error
Height_cm            240   49.3    142.3     191.6     40559.5   168.998     .5922        9.1745           84.171
Weight_kg            240   84.7    25.6      110.3     16248.1   67.700      .9066        14.0451          197.266
Valid N (listwise)   240

Each variable is listed in the far-left column, and each requested statistic appears in its own column.
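As an aside, DESCRIPTIVES can also save standardized (z) scores for each variable back into the data file with the /SAVE sub-command; a minimal sketch (the new variables are, by default, named with a Z prefix, e.g., ZHeight_cm):

1 DESCRIPTIVES VARIABLES=Height_cm Weight_kg
2   /SAVE
3   /STATISTICS=MEAN STDDEV MIN MAX.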

3.5 SORT CASES

If you want to sort all of the cases in the data file in ascending or descending order, based on a certain variable, the following SORT CASES command is used. The syntax below asks PASW to arrange the data file in ascending order (A) based on the variable Coll_Class. In the data file, freshmen will appear first, then sophomores, followed by juniors, and finally seniors. If you want to sort in descending order, use (D) in place of (A). (There is no output for this syntax command.)

1 SORT CASES BY Coll_Class(A).
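You can also sort by more than one variable at a time; cases are ordered by the first variable, with ties broken by the next. A minimal sketch, assuming you want classes in ascending order and, within each class, GPAs from highest to lowest:

1 SORT CASES BY Coll_Class(A) GPA(D).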


3.6 SPLIT FILE

In Section 3.4 above, where PASW was asked to calculate descriptive statistics, each statistic was based on the n = 240 subjects in the data file. There is nothing wrong with this, but what if you wanted to look at the means and other descriptive statistics for different groups? For example, you may want to look at students' mean weights and mean heights for each college class. But, the output in Section 3.4 combines data from across all four college classes.

Luckily, PASW has a SPLIT FILE command that tells PASW to run subsequent procedures separately for each group within some independent variable. For example, say you wanted to examine the descriptive statistics by college class. First, use the following syntax to 'split' the output into different groups:

1 SORT CASES BY Coll_Class.
2 SPLIT FILE SEPARATE BY Coll_Class.

The variable by which you want the output 'split' into different groups is listed after BY.

Next, run the same DESCRIPTIVES syntax as in Section 3.4:

1 DESCRIPTIVES VARIABLES=Height_cm Weight_kg
2   /STATISTICS=MEAN SUM STDDEV VARIANCE RANGE MIN MAX SEMEAN KURTOSIS
3   SKEWNESS.

You will get the following output, which shows the descriptive statistics calculated separately for each group within the variable Coll_Class:

Coll_Class = Freshmen

Descriptive Statistics
                     N    Range   Minimum   Maximum   Sum      Mean                     Std. Deviation   Variance
                                                               Statistic   Std. Error
Height_cm            57   31.2    153.3     184.5     9638.8   169.102     1.1159       8.4251           70.982
Weight_kg            57   71.9    38.4      110.3     3847.0   67.491      1.9307       14.5762          212.467
Valid N (listwise)   57

Coll_Class = Sophomore

Descriptive Statistics
                     N    Range   Minimum   Maximum   Sum       Mean                     Std. Deviation   Variance
                                                                Statistic   Std. Error
Height_cm            65   43.8    147.8     191.6     10923.1   168.048     1.2866       10.3732          107.604
Weight_kg            65   64.6    25.6      90.2      4244.8    65.305      1.5937       12.8486          165.087
Valid N (listwise)   65


Coll_Class = Junior

Descriptive Statistics
                     N    Range   Minimum   Maximum   Sum       Mean                     Std. Deviation   Variance
                                                                Statistic   Std. Error
Height_cm            63   46.3    142.3     188.6     10659.4   169.197     1.0615       8.4254           70.987
Weight_kg            63   63.3    37.3      100.6     4349.2    69.035      1.7392       13.8046          190.568
Valid N (listwise)   63

Coll_Class = Senior

Descriptive Statistics
                     N    Range   Minimum   Maximum   Sum      Mean                     Std. Deviation   Variance
                                                               Statistic   Std. Error
Height_cm            55   34.7    154.0     188.7     9338.2   169.785     1.2657       9.3869           88.114
Weight_kg            55   68.1    35.8      103.9     3807.1   69.220      2.0311       15.0633          226.904
Valid N (listwise)   55

When you're done using the SPLIT FILE command, don't forget to turn it off; otherwise, all of your subsequent output will be separated into different groups:

1 SPLIT FILE OFF .
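As an aside, SPLIT FILE also has a LAYERED option that stacks the groups within a single output table rather than producing a separate table for each group; a minimal sketch:

1 SPLIT FILE LAYERED BY Coll_Class.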


4. Correlation & Regression

4.1 Pearson Correlations (Bivariate)

PASW can measure the statistical association between two variables in a variety of ways (e.g., Pearson correlation, Spearman correlation, Chi-Square, gamma coefficients). For the data in our file, we'll be dealing with how PASW can calculate the Pearson correlation between two variables.

The CORRELATIONS syntax below asks PASW to calculate the Pearson correlation between the variables SAT_CR (SAT Critical Reading Score) and SAT_M (SAT Math Score). All that you need to do is list on Line 2 the variables between which you want the Pearson correlation calculated:

1 CORRELATIONS
2   /VARIABLES=SAT_CR SAT_M
3   /PRINT=TWOTAIL NOSIG
4   /MISSING=PAIRWISE.

On Line 3, the TWOTAIL sub-command tells PASW to run the inferential test on the Pearson correlation as a non-directional, two-tailed test. NOSIG asks PASW to indicate which correlations are statistically significant with an asterisk (*). On Line 4, the /MISSING=PAIRWISE sub-command tells PASW what to do with any missing data points. (In this data file, there are no missing data.) If you have a missing data point, PASW must know what to do with that subject's data. You have two options: handle missing data PAIRWISE or LISTWISE. If you choose LISTWISE, any subject who has a missing data point for any variable will be excluded from all correlations. If you choose PAIRWISE, a subject will be excluded from only those correlations where the subject is missing a data point. When you run the syntax above, you get the following output:

Correlations
                                 SAT_CR   SAT_M
SAT_CR   Pearson Correlation     1        -.062
         Sig. (2-tailed)                  .340
         N                       240      240
SAT_M    Pearson Correlation     -.062    1
         Sig. (2-tailed)         .340
         N                       240      240

Each variable is listed in its own column and its own row. To find the Pearson correlation between two variables, cross-reference one variable in the columns with the other variable in the rows. The Sig. (2-tailed) value under the Pearson correlation is the p-value for that correlation: the probability of obtaining a correlation at least that large (r = -.062) with that sample size (n = 240) if the null hypothesis were true. To interpret a p-value: if the listed p-value is less than your chosen alpha-level (α), which is generally α = .05 or less, then the correlation is significant. In this case, the Pearson correlation is not significant, because the p-value (p = .340) is greater than .05.
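As an aside, the Spearman correlation mentioned at the start of this section is requested with a different command, NONPAR CORR; a minimal sketch (shown only to illustrate the pattern, not run on our data here):

1 NONPAR CORR
2   /VARIABLES=SAT_CR SAT_M
3   /PRINT=SPEARMAN TWOTAIL NOSIG
4   /MISSING=PAIRWISE.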


It is also possible to calculate several Pearson correlations at the same time. The more variables that you list on the /VARIABLES sub-command line, the more correlations will be calculated. For example, in the syntax below, I have listed three variables (SAT_CR, SAT_M, and SAT_V). When I run this syntax, PASW will generate the Pearson correlation between each pair of variables:

1 CORRELATIONS
2   /VARIABLES=SAT_CR SAT_M SAT_V
3   /PRINT=TWOTAIL NOSIG
4   /MISSING=PAIRWISE.

Correlations
                                 SAT_CR   SAT_M   SAT_V
SAT_CR   Pearson Correlation     1        -.062   .481**
         Sig. (2-tailed)                  .340    .000
         N                       240      240     240
SAT_M    Pearson Correlation     -.062    1       -.048
         Sig. (2-tailed)         .340             .461
         N                       240      240     240
SAT_V    Pearson Correlation     .481**   -.048   1
         Sig. (2-tailed)         .000     .461
         N                       240      240     240

You can see in the output above that, in addition to the correlation between SAT_CR and SAT_M that was calculated earlier, PASW also calculated the correlation between SAT_CR and SAT_V (r = .481) and between SAT_M and SAT_V (r = -.048).

PASW also has a sub-command that allows you to request descriptive statistics for each variable, as well as the sums of squares, variances, sums of cross products, and covariances. On Line 4 of the syntax below, the DESCRIPTIVES keyword requests the means and standard deviations for each variable, and the XPROD keyword requests the variability and co-variability measures:

1 CORRELATIONS
2   /VARIABLES=SAT_CR SAT_M SAT_V
3   /PRINT=TWOTAIL NOSIG
4   /STATISTICS DESCRIPTIVES XPROD
5   /MISSING=PAIRWISE.


Here is the output from the last set of syntax. The first table includes the descriptive statistics for each variable, and the second table includes the Pearson correlations, measures of variability, and measures of co-variability:

Descriptive Statistics
          Mean     Std. Deviation   N
SAT_CR    491.14   105.938          240
SAT_M     516.79   127.100          240
SAT_V     496.69   66.772           240

Correlations
                                               SAT_CR        SAT_M         SAT_V
SAT_CR   Pearson Correlation                   1             -.062         .481**
         Sig. (2-tailed)                                     .340          .000
         Sum of Squares and Cross-products     2682249.18    -199155.775   813393.625
         Covariance                            11222.800     -833.288      3403.321
         N                                     240           240           240
SAT_M    Pearson Correlation                   -.062         1             -.048
         Sig. (2-tailed)                       .340                        .461
         Sum of Squares and Cross-products     -199155.775   3860928.162   -96992.938
         Covariance                            -833.288      16154.511     -405.828
         N                                     240           240           240
SAT_V    Pearson Correlation                   .481**        -.048         1
         Sig. (2-tailed)                       .000          .461
         Sum of Squares and Cross-products     813393.625    -96992.938    1065571.563
         Covariance                            3403.321      -405.828      4458.458
         N                                     240           240           240

In the Sum of Squares and Cross-products rows, the value for two different variables is the sum of cross products, and the value for a variable with itself is the sum of squares. Likewise, in the Covariance rows, the value for two different variables is the covariance, and the value for a variable with itself is the variance.

4.2 Pearson Correlations (Partial)

Having PASW calculate the partial correlation between two variables (the correlation between two variables with the influence of other variables factored out of both) is not much different from asking PASW to calculate a raw (zero-order) correlation. For example, say you want to calculate the partial correlation between GPA and Pre_GREv scores (Pretest GRE Verbal Reasoning scores), while factoring out SAT_CR scores (Critical Reading scores on the SAT) from both variables.


In the syntax below, on the /VARIABLES sub-command line, the two variables listed before the BY (GPA and Pre_GREv) are the variables between which we want to calculate a partial correlation. The variable that comes after the BY (SAT_CR) is the variable we want factored out of the other variables. Please note that you can ask PASW to factor out more than one variable:

1 PARTIAL CORR
2   /VARIABLES=GPA Pre_GREv BY SAT_CR
3   /SIGNIFICANCE=TWOTAIL
4   /STATISTICS=DESCRIPTIVES CORR
5   /MISSING=LISTWISE.

On Line 3, the /SIGNIFICANCE=TWOTAIL asks PASW to run the inferential test on the partial correlation as a non-directional, two-tailed test. You have the option of selecting a ONETAIL test as well. On line 4, the /STATISTICS sub-command is asking PASW to calculate the descriptive statistics (DESCRIPTIVES) for each variable. The CORR sub-command is asking PASW to provide the raw Pearson correlations between each pair of variables, in addition to the partial correlation between GPA and Pre_GREv.

Here is the output from the syntax above. The first table reports the descriptive statistics, and the second table reports the correlations and partial correlations. In the second table, the rows under '-none-' are the raw (zero-order) Pearson correlations, and the rows under 'SAT_CR' are the partial correlations with SAT_CR factored out:

Descriptive Statistics
            Mean        Std. Deviation   N
GPA         3.01082     .670476          240
Pre_GREv    412.25000   57.397717        240
SAT_CR      491.14167   105.937717       240

Correlations
Control Variables                                GPA     Pre_GREv   SAT_CR
-none-    GPA        Correlation                 1.000   .169       .531
                     Significance (2-tailed)     .       .009       .000
                     df                          0       238        238
          Pre_GREv   Correlation                 .169    1.000      .135
                     Significance (2-tailed)     .009    .          .037
                     df                          238     0          238
          SAT_CR     Correlation                 .531    .135       1.000
                     Significance (2-tailed)     .000    .037       .
                     df                          238     238        0
SAT_CR    GPA        Correlation                 1.000   .116
                     Significance (2-tailed)     .       .074
                     df                          0       237
          Pre_GREv   Correlation                 .116    1.000
                     Significance (2-tailed)     .074    .
                     df                          237     0
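As noted above, more than one variable can be factored out. Here is a minimal sketch that partials both SAT_CR and SAT_M out of the relationship between GPA and Pre_GREv (whether you would ever want to do this is another matter; the syntax pattern is the point):

1 PARTIAL CORR
2   /VARIABLES=GPA Pre_GREv BY SAT_CR SAT_M
3   /SIGNIFICANCE=TWOTAIL
4   /MISSING=LISTWISE.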


4.3 Univariate Regression (one regressor)

There is a mountain of stuff that you can do with PASW's REGRESSION procedure, including controlling how a regression analysis is performed and what statistics are requested. Below, I am performing a 'bare-bones' REGRESSION analysis to keep things simple. The analysis below will regress (predict) GPA on the summed SAT scores (SAT_Tot). Hence, GPA is the dependent variable (Y) and SAT_Tot is the predictor variable (X).

In the syntax below, PASW is being asked to regress GPA on SAT_Tot. The DEPENDENT (predicted, or regressed) variable is listed on Line 6. The predictor (independent, or regressor) variable is listed on Line 7 after the /METHOD sub-command. A few notes on Line 7: First, if you have more than one predictor, each predictor would be entered here. In this example we have only one predictor (SAT_Tot). Second, there are a number of methods that you can use to have PASW conduct the analysis (ENTER, STEPWISE, etc.), but this is beyond the scope of this packet. Just use METHOD=ENTER:

1 REGRESSION
2   /MISSING LISTWISE
3   /STATISTICS COEFF OUTS R ANOVA
4   /CRITERIA=PIN(.05) POUT(.10)
5   /NOORIGIN
6   /DEPENDENT GPA
7   /METHOD=ENTER SAT_Tot.

The /STATISTICS sub-command on Line 3 is where you can ask PASW to provide various statistics and inferential tests as part of the regression analysis. COEFF requests the slope and intercept coefficients in the regression model. OUTS asks PASW to list any predictors that were entered into the regression model but were not included because they did not meet the criteria specified on Line 4. R asks for the R and R2 values of the regression model. ANOVA asks for the analysis of variance to be conducted on the overall regression model.

On Line 4, /CRITERIA=PIN(.05) POUT(.10) sets the inclusion and exclusion criteria for each regressor coefficient that is initially entered into the model. Basically, if a regressor coefficient does not meet these criteria, which are based on the t-Tests for the coefficients, it is not included in the final regression model. These values can be adjusted, but .05 and .10 are used by default.

When you run the syntax above, you get the following output:

Variables Entered/Removed(b)
Model   Variables Entered   Variables Removed   Method
1       SAT_Tot(a)          .                   Enter

This table simply lists the predictor variables that are being entered into the regression analysis.

Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .774(a)   .599       .597                .425693

This table provides the R and R2 values. The R2 is the proportion of explained variance.

ANOVA(b)
Model           Sum of Squares   df    Mean Square   F         Sig.
1  Regression   64.311           1     64.311        354.887   .000(a)
   Residual     43.129           238   .181
   Total        107.440          239

The ANOVA is the overall analysis of the regression model.

Coefficients(a)
Model            Unstandardized Coefficients   Standardized Coefficients   t        Sig.
                 B         Std. Error          Beta
1   (Constant)   -1.093    .220                                            -4.979   .000
    SAT_Tot      .003      .000                .774                        18.838   .000

This table provides the values of the coefficients in the regression equation, as well as t-Tests on each coefficient.

You can also ask PASW to report descriptive statistics for each variable, correlations between variables, and a host of other information. In the syntax below, I added a /DESCRIPTIVES sub-command on Line 2 that asks for the mean (MEAN) and standard deviation (STDDEV) of each variable, the Pearson correlation (CORR) between each pair of variables, a significance test (SIG) on each correlation, and the number of subjects (N) contributing to each variable and to each correlation:

1 REGRESSION
2   /DESCRIPTIVES MEAN STDDEV CORR SIG N
3   /MISSING LISTWISE
4   /STATISTICS COEFF OUTS R ANOVA ZPP
5   /CRITERIA=PIN(.05) POUT(.10)
6   /NOORIGIN
7   /DEPENDENT GPA
8   /METHOD=ENTER SAT_Tot.

I also added ZPP to the /STATISTICS sub-command on Line 4. This asks PASW to calculate the zero-order, partial, and semi-partial (part) correlations between the dependent variable and each predictor. In this case, because no variable is being factored out of the relationship between GPA and SAT_Tot, each of these correlations will be the same. The output from this syntax appears below:

Descriptive Statistics
          Mean      Std. Deviation   N
GPA       3.01082   .670476          240
SAT_Tot   1504.62   190.169          240

These are the requested descriptive statistics for each variable.


Correlations
                                  GPA     SAT_Tot
Pearson Correlation   GPA         1.000   .774
                      SAT_Tot     .774    1.000
Sig. (1-tailed)       GPA         .       .000
                      SAT_Tot     .000    .
N                     GPA         240     240
                      SAT_Tot     240     240

This table lists the requested correlations and p-values.

Variables Entered/Removed(b)
Model   Variables Entered   Variables Removed   Method
1       SAT_Tot(a)          .                   Enter

Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .774(a)   .599       .597                .425693

ANOVA(b)
Model           Sum of Squares   df    Mean Square   F         Sig.
1  Regression   64.311           1     64.311        354.887   .000(a)
   Residual     43.129           238   .181
   Total        107.440          239

Coefficients(a)
Model            Unstandardized Coefficients   Standardized Coefficients   t        Sig.   Correlations
                 B         Std. Error          Beta                                        Zero-order   Partial   Part
1   (Constant)   -1.093    .220                                            -4.979   .000
    SAT_Tot      .003      .000                .774                        18.838   .000   .774         .774      .774

The far-right columns of the Coefficients table are the requested zero-order, partial, and semi-partial (part) correlations.
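As noted earlier, additional predictors are simply added after /METHOD=ENTER. A minimal sketch regressing GPA on both SAT_Tot and Pre_GREq (a multiple regression; the choice of Pre_GREq as a second predictor is mine, purely for illustration):

1 REGRESSION
2   /MISSING LISTWISE
3   /STATISTICS COEFF OUTS R ANOVA
4   /CRITERIA=PIN(.05) POUT(.10)
5   /NOORIGIN
6   /DEPENDENT GPA
7   /METHOD=ENTER SAT_Tot Pre_GREq.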


5. t-Tests

5.1 One-Sample t-Test

There are three t-Tests PASW can perform on a set of data: the one-sample t-Test, the independent groups t-Test (independent-samples t-Test), and the correlated samples t-Test (paired-samples t-Test). But, the statistics that can be requested and the test parameters that you can control are very limited.

The syntax below asks PASW to run a one-sample t-Test. The dependent variable is GPA, which is entered on the /VARIABLES sub-command on Line 4:

1 T-TEST
2   /TESTVAL=3
3   /MISSING=ANALYSIS
4   /VARIABLES=GPA
5   /CRITERIA=CI(.95).

Importantly, for the one-sample t-Test, you must state a value to which the mean of the dependent variable is compared. This value is entered after the /TESTVAL sub-command on Line 2. In this case, PASW is being asked to compare the mean GPA to a value of 3, which coincides with a grade of 'B'.

The /CRITERIA sub-command on Line 5 is pretty much all you have control over, besides the /TESTVAL on Line 2. The CI value tells PASW what size confidence interval and what alpha-level to use in the t-Test. In this case, .95 corresponds to the 95% confidence interval and an alpha-level of .05.

If you run the syntax above, you get the following in the output file:

One-Sample Statistics
      N     Mean      Std. Deviation   Std. Error Mean
GPA   240   3.01082   .670476          .043279

This table presents the descriptive statistics for the dependent variable.

One-Sample Test
      Test Value = 3
      t      df    Sig. (2-tailed)   Mean Difference   95% Confidence Interval of the Difference
                                                       Lower      Upper
GPA   .250   239   .803              .010821           -.07444    .09608

This table presents the results of the inferential, one-sample t-Test.

In the One-Sample Test table above, the Sig. (2-tailed) value is the p-value used as the basis for determining statistical significance. If it is less than your chosen alpha level (α = .05, or less), then the difference between the mean (3.01082) and the test value (3) is significant. In this case, the difference is not significant, because .803 > .05. The values underneath the heading 95% Confidence Interval of the Difference are the upper and lower boundaries of the 95% confidence interval around the difference between the mean and the test value (.010821).

As another example, the syntax below asks PASW to compare the mean pretest score from the Analytical Writing section of the GREs (Pre_GREa) to a test value of 4.9. This test value of 4.9 is actually the national mean score on that section of the GREs:

1 T-TEST
2   /TESTVAL=4.9
3   /MISSING=ANALYSIS
4   /VARIABLES=Pre_GREa
5   /CRITERIA=CI(.95).

Running this syntax, we get the following in the output file:

One-Sample Statistics
           N     Mean   Std. Deviation   Std. Error Mean
Pre_GREa   240   4.11   .361             .023

One-Sample Test
           Test Value = 4.9
           t         df    Sig. (2-tailed)   Mean Difference   95% Confidence Interval of the Difference
                                                               Lower   Upper
Pre_GREa   -33.731   239   .000              -.785             -.83    -.74

In this case, the One-Sample Test indicates that the mean difference (-.785) is statistically significant, because the p-value in the Sig. (2-tailed) column is less than the conventional alpha-level of α = .05.
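Note that you can list more than one dependent variable on the /VARIABLES sub-command; each is compared to the same test value. A minimal sketch comparing both the pretest and posttest Analytical Writing scores to 4.9:

1 T-TEST
2   /TESTVAL=4.9
3   /MISSING=ANALYSIS
4   /VARIABLES=Pre_GREa Post_GREa
5   /CRITERIA=CI(.95).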

5.2 Independent Groups t-Tests

The syntax below illustrates how to conduct an independent groups t-Test. Note that when comparing two different groups or levels of a between-subjects independent variable, you must be sure that the groups/levels of that independent variable have been dummy-coded; that is, assigned numeric values in the data file. PASW will not run the independent groups t-Test if the groups have been assigned descriptive (string) labels in the data file.

Say that we want to compare the mean posttest score on the Verbal Reasoning section of the GREs between different levels of the independent variable Tutor_Group. Specifically, we want to compare mean performance between the group of subjects who did not receive tutoring (Control Group) and the group of subjects who received individual tutoring (Individual Tutoring Group). Recall, within the independent variable Tutor_Group, the group that did not receive tutoring was dummy-coded with 1 and the group that received individual tutoring was dummy-coded with 3 (the group that received group tutoring was dummy-coded with 2).

In the syntax below, after the T-TEST command, the GROUPS sub-command is listed. In the parentheses, the 1 and 3 are the values that were assigned to the no-tutoring group and the individual-tutoring group, respectively. The dependent variable (Post_GREv) is listed after the /VARIABLES sub-command on Line 3:

1 T-TEST GROUPS=Tutor_Group(1 3)
2   /MISSING=ANALYSIS
3   /VARIABLES=Post_GREv
4   /CRITERIA=CI(.95).

When you run this syntax, you get the following output:

Group Statistics
            Tutor_Group                   N    Mean     Std. Deviation   Std. Error Mean
Post_GREv   Control Group (no tutoring)   80   419.25   62.334           6.969
            Individual Tutoring           80   442.88   68.993           7.714

This table presents the descriptive statistics on the dependent variable for each group within the independent variable.

Independent Samples Test
                                          Levene's Test for
                                          Equality of Variances   t-test for Equality of Means
                                          F      Sig.             t        df        Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI of the Difference
                                                                                                                                                  Lower     Upper
Post_GREv   Equal variances assumed       .978   .324             -2.273   158       .024              -23.625           10.396                  -44.157   -3.093
            Equal variances not assumed                           -2.273   156.400   .024              -23.625           10.396                  -44.159   -3.091

This table presents the results of the independent groups t-Test comparing the means in the table above.

The table Independent Samples Test lists a lot of information, some of which is relevant, some of which is less relevant. First, you will almost always assume equal variances, so be sure to use information from those rows. Second, Levene's Test for Equality of Variances is a test for whether the variances of the groups being compared are statistically equivalent. If Levene's Test is not significant, which is the case here, then we can assume that the variances are indeed equal.

The information under the heading t-test for Equality of Means is relevant to the independent groups t-Test on the data, and most of the terms should be self-explanatory. Importantly, the Sig. (2-tailed) value is the p-value used for determining statistical significance. If it is less than a chosen alpha level (α = .05, or less), then the mean difference (-23.625) is significant, which is the case here. Please note that the mean difference is negative because of how the groups were entered into the t-Test in the syntax. That is, the no-tutoring group was entered first in the syntax and the individual-tutoring group was entered second. This means that PASW will subtract the individual-tutoring mean from the no-tutoring mean. Thus, this value is negative only because of how the groups were entered; it has nothing to do with any hypotheses.

A nice feature of the PASW independent groups t-Test procedure is that you can run several t-Tests comparing the same two groups on different dependent variables at once. For example, let's say we also want to compare the mean posttest score on the Analytical Writing section of the GREs between the no-tutoring group and the individual-tutoring group. All that you have to do is add this dependent variable to the /VARIABLES sub-command on Line 3:

1 T-TEST GROUPS=Tutor_Group(1 3)
2   /MISSING=ANALYSIS
3   /VARIABLES=Post_GREv Post_GREa
4   /CRITERIA=CI(.95).

Running this syntax, we get the following output:

Group Statistics

            Tutor_Group                   N    Mean     Std. Deviation   Std. Error Mean
Post_GREv   Control Group (no tutoring)   80   419.25   62.334           6.969
            Individual Tutoring           80   442.88   68.993           7.714
Post_GREa   Control Group (no tutoring)   80   4.17     .355             .040
            Individual Tutoring           80   4.22     .456             .051

Independent Samples Test

                                         Levene's Test for         t-test for Equality of Means
                                         Equality of Variances
                                         F       Sig.     t        df        Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI of the Difference
Post_GREv  Equal variances assumed       .978    .324     -2.273   158       .024              -23.625           10.396                  -44.157 to -3.093
           Equal variances not assumed                    -2.273   156.400   .024              -23.625           10.396                  -44.159 to -3.091
Post_GREa  Equal variances assumed       4.221   .042     -.774    158       .440              -.050             .065                    -.178 to .078
           Equal variances not assumed                    -.774    149.096   .440              -.050             .065                    -.178 to .078
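One small aside before moving on: the /CRITERIA sub-command controls the width of the confidence interval around the mean difference. As a sketch (not an analysis presented in this guide), requesting 99% confidence intervals instead of 95% intervals only requires changing that value:

T-TEST GROUPS=Tutor_Group(1 3)
  /MISSING=ANALYSIS
  /VARIABLES=Post_GREv Post_GREa
  /CRITERIA=CI(.99).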


5.3 Correlated Samples (Paired Samples) t-Tests

Recall that the correlated samples t-Test is used to compare performance on some dependent variable across levels of a within-subjects independent variable. In PASW, the correlated samples t-Test is called the paired samples t-Test. As was the case for the one-sample t-Test and the independent groups t-Test, there is not much control over what you can request for the paired samples t-Test.

Say we want to compare trait-anxiety levels between the pretest and posttest periods. Recall, in the hypothetical study, the researchers measured each subject's state anxiety and trait anxiety using the State-Trait Anxiety Inventory (STAI), and these types of anxiety were measured during the pretest and posttest periods. If we are interested in, specifically, the change in trait anxiety between the pretest and posttest periods, we're going to want to compare the Pre_STAIt mean with the Post_STAIt mean.

The syntax below asks PASW to compare Pre_STAIt with Post_STAIt scores. On the T-TEST command line, the PAIRS sub-command tells PASW to run a paired samples t-Test. The levels of the variable being compared come before and after the WITH. Thus, Line 1 is basically telling PASW to “compare Pre_STAIt scores WITH Post_STAIt scores using a PAIRED samples t-Test:”

1 T-TEST PAIRS=Pre_STAIt WITH Post_STAIt (PAIRED)
2   /CRITERIA=CI(.95)
3   /MISSING=ANALYSIS.

Running this syntax, you get the following output:

Paired Samples Statistics

                      Mean    N     Std. Deviation   Std. Error Mean
Pair 1   Pre_STAIt    50.27   240   11.923           .770
         Post_STAIt   50.09   240   11.944           .771

Paired Samples Correlations

                                  N     Correlation   Sig.
Pair 1   Pre_STAIt & Post_STAIt   240   .987          .000

Paired Samples Test

                                  Mean   Std. Deviation   Std. Error Mean   95% CI Lower   95% CI Upper   t       df    Sig. (2-tailed)
Pair 1   Pre_STAIt - Post_STAIt   .175   1.939            .125              -.072          .422           1.398   239   .163
(Mean, Std. Deviation, Std. Error Mean, and the 95% CI refer to the paired differences.)

In the table Paired Samples Test, most of the statistics should be familiar and straightforward. Mean is the mean difference in the dependent variable between the levels of the independent variable. Note that this value is positive because of how PASW entered the levels of the independent variable into the t-Test. In the syntax, Pre_STAIt was entered before WITH and Post_STAIt was entered after WITH; hence, the Post_STAIt mean was subtracted from the Pre_STAIt mean. Thus, it is positive only because of how the levels were entered. In this example, the mean difference (.175) is not statistically significant, because the p-value (.163) is greater than .05.

You can request several paired samples t-Tests at the same time. For example, in addition to comparing the Pre_STAIt mean with the Post_STAIt mean, say we also want to compare the pretest and posttest scores from the Quantitative Reasoning section of the GREs (Pre_GREq compared to Post_GREq). In the syntax below, two variables are listed before WITH (Pre_STAIt and Pre_GREq) and two variables are listed after WITH (Post_STAIt and Post_GREq):

1 T-TEST PAIRS=Pre_STAIt Pre_GREq WITH Post_STAIt Post_GREq (PAIRED)
2   /CRITERIA=CI(.95)
3   /MISSING=ANALYSIS.

When the syntax is run, PASW will compare the mean of the first variable before WITH (Pre_STAIt) with the mean of first variable after WITH (Post_STAIt); and PASW will compare the mean of the second variable before WITH (Pre_GREq) with the mean of second variable after WITH (Post_GREq). Thus, it is critical to enter the variables on each side of the WITH in the appropriate order when running several paired-samples t-tests. Running this syntax provides the following output:

Paired Samples Statistics

                      Mean     N     Std. Deviation   Std. Error Mean
Pair 1   Pre_STAIt    50.27    240   11.923           .770
         Post_STAIt   50.09    240   11.944           .771
Pair 2   Pre_GREq     568.71   240   76.484           4.937
         Post_GREq    591.13   240   79.685           5.144

Paired Samples Correlations

                                  N     Correlation   Sig.
Pair 1   Pre_STAIt & Post_STAIt   240   .987          .000
Pair 2   Pre_GREq & Post_GREq     240   .934          .000

Paired Samples Test

                                  Mean      Std. Deviation   Std. Error Mean   95% CI Lower   95% CI Upper   t         df    Sig. (2-tailed)
Pair 1   Pre_STAIt - Post_STAIt   .175      1.939            .125              -.072          .422           1.398     239   .163
Pair 2   Pre_GREq - Post_GREq     -22.417   28.461           1.837             -26.036        -18.798        -12.202   239   .000
(Mean, Std. Deviation, Std. Error Mean, and the 95% CI refer to the paired differences.)
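Because the pairing is determined entirely by the order of the variables on each side of the WITH, it is easy to mis-pair variables when several are listed at once. If you prefer, the same two comparisons can be requested as two separate commands, which makes each pairing explicit; this sketch should produce the same two t-Tests as the syntax above:

T-TEST PAIRS=Pre_STAIt WITH Post_STAIt (PAIRED)
  /CRITERIA=CI(.95)
  /MISSING=ANALYSIS.

T-TEST PAIRS=Pre_GREq WITH Post_GREq (PAIRED)
  /CRITERIA=CI(.95)
  /MISSING=ANALYSIS.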


6. Analysis of Variance

6.1 Oneway Analysis of Variance (via GLM)

Analysis of Variance (ANOVA) is used, among other reasons, to compare performance on a dependent variable across two or more levels of one or more independent variables. Oh the things I could say about ANOVA and experimental design! Alas, we do not have time. The PASW procedure for ANOVA is the General Linear Model (GLM). Don't worry about what it means; just know that it calculates F-tests for single-factor and factorial designs.

ANOVA can be used when the levels of an independent variable are manipulated (experimental design), or naturally-occurring (quasi-experimental design). Critically: Setting up ANOVA in PASW requires you to think about the design: Is there one independent variable, or more? How many levels of each independent variable are there? Do the levels of the independent variables differ between-subjects or within-subjects? I don't want to get technical, so I'll be as simple as possible.

From the data set, say we want to compare the Posttest GRE Verbal Reasoning Scores (Post_GREv) across the four groups within the independent variable Drug_Group. Thus, we have a oneway ANOVA; that is, one independent variable and one dependent variable. The syntax below presents the minimal set of sub-commands needed to run a oneway ANOVA. This syntax is used only if the independent variable is between-subjects (within-subjects variables require a repeated measures GLM; see Section 6.3):

1 UNIANOVA Post_GREv BY Drug_Group
2   /METHOD=SSTYPE(3)
3   /INTERCEPT=INCLUDE
4   /CRITERIA=ALPHA(.05)
5   /DESIGN=Drug_Group.

The variable before BY (Post_GREv) is always the dependent variable and the variable after BY (Drug_Group) is always the independent variable. If you have a factorial design, the additional independent variables would be entered here. On Line 2, the /METHOD sub-command tells PASW how the sums of squares should be calculated (SSTYPE), which is usually set to 3. On Line 4, the /CRITERIA sub-command tells PASW what alpha level to use. Finally, on Line 5, the /DESIGN sub-command is where you build the effects to be examined in the ANOVA. In the case of a oneway design, there is only one independent variable to influence the dependent variable; hence, you list that independent variable. When you are using ANOVA to analyze a factorial design, additional factors can be included.

Running the syntax above gives you the following:

Between-Subjects Factors

                   Value Label               N
Drug_Group    1    Control Group (no drug)   60
              2    Placebo Group             60
              3    100 mg/day Group          60
              4    200 mg/day Group          60


This table lists each level of the independent variable, as well as the number of subjects (N) contributing to each level.

Tests of Between-Subjects Effects
Dependent Variable: Post_GREv

Source            Type III Sum of Squares   df    Mean Square   F           Sig.
Corrected Model   28571.250(a)              3     9523.750      2.333       .075
Intercept         4.487E7                   1     4.487E7       10991.299   .000
Drug_Group        28571.250                 3     9523.750      2.333       .075
Error             963375.000                236   4082.097
Total             4.586E7                   240
Corrected Total   991946.250                239

This table is the ANOVA summary table. The sums of squares, degrees of freedom, mean squares, F-Tests, and p-values are listed here.

The ANOVA summary table (Tests of Between-Subjects Effects) contains a lot of information, some of it unnecessary for our present purpose. The terms associated with between group variance (variability due to the independent variable) are in the row labeled Drug_Group, which is the independent variable. The terms associated with the within group variance are in the row labeled Error. Most values in each column should be straightforward: sums of squares for each source of variance are in the second column, degrees of freedom are in the third column, mean squares come next, followed by the F-Test, and finally p-values. In this case, the F-Test on the independent variable is not statistically significant, because the p-value (.075) is greater than the chosen alpha-level (.05). Nonetheless, for the sake of illustration, let's assume the test was significant, so we can do post-hoc tests.

If you have a statistically significant F-Test, you need to know between which levels of the independent variable there is a significant difference in the dependent variable: we need post-hoc tests.

The syntax below includes additional sub-commands. First, the /POSTHOC sub-command on Line 4 asks PASW to compare levels of the independent variable Drug_Group using Fisher's Least Significant Difference test (LSD). You have several options for which post-hoc test to use (e.g., TUKEY, BONFERRONI), but we'll stick with LSD for now. On Lines 5 and 6, the /EMMEANS sub-commands ask PASW to calculate the estimated mean of the dependent variable at each level of the independent variable. Specifically, Line 5 asks for the grand mean (OVERALL), and Line 6 asks for the estimated mean for each level of Drug_Group. Finally, the /PRINT sub-command on Line 7 asks PASW to include additional items in the output. Specifically, ETASQ requests the (partial) eta-squared measure of effect size, and DESCRIPTIVE asks for the descriptive statistics. There are many additional items that you can ask PASW to 'print' in the output, but we'll stick with these.

1 UNIANOVA Post_GREv BY Drug_Group
2   /METHOD=SSTYPE(3)
3   /INTERCEPT=INCLUDE
4   /POSTHOC=Drug_Group(LSD)
5   /EMMEANS=TABLES(OVERALL)
6   /EMMEANS=TABLES(Drug_Group)
7   /PRINT=ETASQ DESCRIPTIVE
8   /CRITERIA=ALPHA(.05)
9   /DESIGN=Drug_Group.



When you run the syntax, you get the following output:

Between-Subjects Factors

                   Value Label               N
Drug_Group    1    Control Group (no drug)   60
              2    Placebo Group             60
              3    100 mg/day Group          60
              4    200 mg/day Group          60

Descriptive Statistics
Dependent Variable: Post_GREv

Drug_Group                Mean     Std. Deviation   N
Control Group (no drug)   424.83   57.327           60
Placebo Group             423.83   58.457           60
100 mg/day Group          430.00   68.668           60
200 mg/day Group          450.83   70.068           60
Total                     432.38   64.424           240

Tests of Between-Subjects Effects
Dependent Variable: Post_GREv

Source            Type III Sum of Squares   df    Mean Square   F           Sig.   Partial Eta Squared
Corrected Model   28571.250(a)              3     9523.750      2.333       .075   .029
Intercept         4.487E7                   1     4.487E7       10991.299   .000   .979
Drug_Group        28571.250                 3     9523.750      2.333       .075   .029
Error             963375.000                236   4082.097
Total             4.586E7                   240
Corrected Total   991946.250                239

Estimated Marginal Means

1. Grand Mean
Dependent Variable: Post_GREv

Mean      Std. Error   95% CI Lower Bound   95% CI Upper Bound
432.375   4.124        424.250              440.500

2. Drug_Group
Dependent Variable: Post_GREv

Drug_Group                Mean      Std. Error   95% CI Lower Bound   95% CI Upper Bound
Control Group (no drug)   424.833   8.248        408.584              441.083
Placebo Group             423.833   8.248        407.584              440.083
100 mg/day Group          430.000   8.248        413.750              446.250
200 mg/day Group          450.833   8.248        434.584              467.083


The Descriptive Statistics table comes from requesting DESCRIPTIVE as part of the /PRINT sub-command.

The estimated marginal means come from the /EMMEANS sub-commands. Table 1 comes from the OVERALL request on Line 5 of the syntax, and Table 2 comes from Line 6.

Post Hoc Tests

Drug_Group

Multiple Comparisons
Dependent Variable: Post_GREv
LSD

(I) Drug_Group            (J) Drug_Group            Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower   95% CI Upper
Control Group (no drug)   Placebo Group             1.00                    11.665       .932   -21.98         23.98
                          100 mg/day Group          -5.17                   11.665       .658   -28.15         17.81
                          200 mg/day Group          -26.00*                 11.665       .027   -48.98         -3.02
Placebo Group             Control Group (no drug)   -1.00                   11.665       .932   -23.98         21.98
                          100 mg/day Group          -6.17                   11.665       .598   -29.15         16.81
                          200 mg/day Group          -27.00*                 11.665       .021   -49.98         -4.02
100 mg/day Group          Control Group (no drug)   5.17                    11.665       .658   -17.81         28.15
                          Placebo Group             6.17                    11.665       .598   -16.81         29.15
                          200 mg/day Group          -20.83                  11.665       .075   -43.81         2.15
200 mg/day Group          Control Group (no drug)   26.00*                  11.665       .027   3.02           48.98
                          Placebo Group             27.00*                  11.665       .021   4.02           49.98
                          100 mg/day Group          20.83                   11.665       .075   -2.15          43.81

This table presents all of the pairwise comparisons between levels of the independent variable; that is, all of the POST HOC comparisons.

In the output above, the estimated marginal means and the descriptive statistics table provide more or less the same information: the means of each level of the independent variable Drug_Group. The table under Post Hoc Tests tells you which differences between levels of the independent variable are statistically significant.

To read the Post Hoc Tests (Multiple Comparisons) table: There are two columns (I and J), both of which are labeled with the independent variable (Drug_Group). Under column I, one level of the independent variable should be listed, and in column J each of the other three levels of that independent variable are listed in separate rows. For example, the first level of the independent variable listed in column I is Control Group (no drug), and each of the other three levels of the independent variable are listed under column J: Placebo Group, 100 mg/day group, 200 mg/day Group.

You should see a mean difference next to each of the groups in column J. This is the mean difference in the dependent variable between the level of the independent variable listed in column I and the level listed in column J (that is, I minus J). Thus, the mean difference in Posttest Verbal Reasoning GRE scores between the Control Group and the Placebo Group is 1.00, and the mean difference between the Control Group and the 100 mg/day Group is -5.17. (Note that the sign of a difference reflects only the direction in which PASW is subtracting; it has nothing to do with any hypotheses.)

To determine whether a mean difference is statistically significant, look at the column labeled Sig. This column lists the p-value that can be used to determine whether the mean difference is significant. If the p-value is less than a chosen alpha level (α = .05, or less), then the mean difference is significant. In this data set, the only statistically significant mean differences are between the Control Group and the 200 mg/day Group (-26.00, p = .027) and between the Placebo Group and the 200 mg/day Group (-27.00, p = .021). But, it should be noted that because the F-Test was not significant, these post-hoc, pairwise comparisons are meaningless.
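If you wanted a more conservative correction for these pairwise comparisons, you could swap LSD for one of the other post-hoc keywords mentioned earlier. As a sketch (assuming everything else stays the same), here is the same analysis using the Bonferroni correction instead:

UNIANOVA Post_GREv BY Drug_Group
  /METHOD=SSTYPE(3)
  /INTERCEPT=INCLUDE
  /POSTHOC=Drug_Group(BONFERRONI)
  /EMMEANS=TABLES(OVERALL)
  /EMMEANS=TABLES(Drug_Group)
  /PRINT=ETASQ DESCRIPTIVE
  /CRITERIA=ALPHA(.05)
  /DESIGN=Drug_Group.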



6.2 Between Subjects Factorial ANOVA (via GLM)

Factorial designs examine the influence of two or more independent variables on a dependent variable, and several possible effects can be significant (or not) in a factorial ANOVA: main effects and interactions. (I assume you know what these are.) The PASW procedure for requesting a factorial ANOVA is not very different from requesting a oneway ANOVA. In the syntax that follows, we will cover how to request a factorial ANOVA in PASW with two between-subjects independent variables.

Say that we want to examine the influence of the independent variables Drug_Group and Tutor_Group on Posttest GRE Quantitative Reasoning Scores (Post_GREq). Recall that Drug_Group has four levels (Control, Placebo, 100 mg/day, and 200 mg/day), and Tutor_Group has three levels (Control, Group Tutoring, and Individual Tutoring). Thus, we have a 4 (Drug_Group) x 3 (Tutor_Group) factorial design.

The syntax below, which we will not actually run, includes the minimum sub-commands needed to have PASW run a factorial ANOVA. On the UNIANOVA command line (Line 1), before the BY, the dependent variable (Post_GREq) is listed. After the BY, both independent variables are listed (Drug_Group and Tutor_Group). The inclusion of the second independent variable is one difference from the oneway ANOVA in Section 6.1. Lines 2 – 4 are exactly the same as in the oneway ANOVA performed in Section 6.1, and need no additional commentary.

1 UNIANOVA Post_GREq BY Drug_Group Tutor_Group
2   /METHOD=SSTYPE(3)
3   /INTERCEPT=INCLUDE
4   /CRITERIA=ALPHA(.05)
5   /DESIGN=Drug_Group Tutor_Group Drug_Group*Tutor_Group.

The /DESIGN sub-command on Line 5 is where you request the effects to be included in the overall ANOVA design. Remember, in factorial designs there is the potential for a main effect of each independent variable, and the potential for interactions between independent variables. Thus, each main effect and interaction that should be included in the analysis should be listed here. To include a main effect in the design, list the name of that independent variable. In the syntax above, the inclusion of Drug_Group and Tutor_Group on Line 5 asks PASW to conduct F-Tests for those main effects. To include an interaction, list the independent variables that are part of the desired interaction and include an asterisk (*) between them. In the syntax above, the inclusion of Drug_Group*Tutor_Group asks PASW to conduct an F-Test on that interaction. Because we have only two independent variables, this is the only possible interaction. With three or more independent variables, additional interactions could be listed here (see the sketch just below).
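Purely to illustrate the structure of the /DESIGN sub-command (this is not an analysis from this guide, and a third factor may not make substantive sense here), a sketch of what the command might look like if a third between-subjects variable, say Coll_Class, were added, with all main effects, all two-way interactions, and the three-way interaction listed:

UNIANOVA Post_GREq BY Drug_Group Tutor_Group Coll_Class
  /METHOD=SSTYPE(3)
  /INTERCEPT=INCLUDE
  /CRITERIA=ALPHA(.05)
  /DESIGN=Drug_Group Tutor_Group Coll_Class
    Drug_Group*Tutor_Group Drug_Group*Coll_Class Tutor_Group*Coll_Class
    Drug_Group*Tutor_Group*Coll_Class.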

So that's it! Again, we won't actually run this minimal factorial syntax; I'll include some more options before presenting any output. Below, the minimum syntax for a oneway ANOVA is listed next to the minimum syntax for a factorial ANOVA, for comparison:

Oneway ANOVA:

1 UNIANOVA Post_GREv BY Drug_Group
2   /METHOD=SSTYPE(3)
3   /INTERCEPT=INCLUDE
4   /CRITERIA=ALPHA(.05)
5   /DESIGN=Drug_Group.

Factorial ANOVA:

1 UNIANOVA Post_GREq BY Drug_Group Tutor_Group
2   /METHOD=SSTYPE(3)
3   /INTERCEPT=INCLUDE
4   /CRITERIA=ALPHA(.05)
5   /DESIGN=Drug_Group Tutor_Group Drug_Group*Tutor_Group.


The syntax below (output follows) builds on the syntax above. The /POSTHOC sub-command on Line 4 requests Fisher's LSD tests for the main effects of Drug_Group and Tutor_Group. Post hoc tests for interactions are usually done by way of a simple main effects analysis, or t-Tests, but that is beyond the scope of this packet for now. The /EMMEANS sub-commands on Lines 5 – 8 ask PASW to calculate the grand mean (OVERALL, Line 5), the mean for each level of Drug_Group (Line 6), the mean for each level of Tutor_Group (Line 7), and the mean for each cell in the Drug_Group by Tutor_Group design (Line 8). Finally, the /PRINT sub-command asks PASW to provide descriptive statistics (DESCRIPTIVE) and the (partial) eta-squared measure of effect size for each F-Test:

1  UNIANOVA Post_GREq BY Drug_Group Tutor_Group
2    /METHOD=SSTYPE(3)
3    /INTERCEPT=INCLUDE
4    /POSTHOC=Drug_Group Tutor_Group(LSD)
5    /EMMEANS=TABLES(OVERALL)
6    /EMMEANS=TABLES(Drug_Group)
7    /EMMEANS=TABLES(Tutor_Group)
8    /EMMEANS=TABLES(Drug_Group*Tutor_Group)
9    /PRINT=ETASQ DESCRIPTIVE
10   /CRITERIA=ALPHA(.05)
11   /DESIGN=Drug_Group Tutor_Group Drug_Group*Tutor_Group.

Running the syntax above, you get the following output:

Between-Subjects Factors

                    Value Label                   N
Drug_Group     1    Control Group (no drug)       60
               2    Placebo Group                 60
               3    100 mg/day Group              60
               4    200 mg/day Group              60
Tutor_Group    1    Control Group (no tutoring)   80
               2    Group Tutoring                80
               3    Individual Tutoring           80

This table lists each independent variable (far left) and each level of each independent variable (under Value Label), along with the number of subjects in each combination of the variables.

Descriptive Statistics
Dependent Variable: Post_GREq

Drug_Group                Tutor_Group                   Mean     Std. Deviation   N
Control Group (no drug)   Control Group (no tutoring)   560.00   78.940           20
                          Group Tutoring                609.50   76.810           20
                          Individual Tutoring           596.50   88.334           20
                          Total                         588.67   82.861           60
Placebo Group             Control Group (no tutoring)   589.00   71.517           20
                          Group Tutoring                578.00   66.221           20
                          Individual Tutoring           607.00   93.533           20
                          Total                         591.33   77.601           60
100 mg/day Group          Control Group (no tutoring)   575.00   82.430           20
                          Group Tutoring                597.00   75.888           20
                          Individual Tutoring           597.50   75.105           20
                          Total                         589.83   77.273           60
200 mg/day Group          Control Group (no tutoring)   573.00   75.818           20
                          Group Tutoring                571.50   77.682           20
                          Individual Tutoring           639.50   79.305           20
                          Total                         594.67   82.718           60
Total                     Control Group (no tutoring)   574.25   76.502           80
                          Group Tutoring                589.00   74.436           80
                          Individual Tutoring           610.13   84.606           80
                          Total                         591.13   79.685           240

This table lists the descriptive statistics (mean and std. deviation) for each level of each independent variable, as well as for each combination of the levels of the independent variables.

Tests of Between-Subjects Effects
Dependent Variable: Post_GREq

Source                     Type III Sum of Squares   df    Mean Square   F           Sig.   Partial Eta Squared
Corrected Model            103061.250(a)             11    9369.205      1.510       .129   .068
Intercept                  8.386E7                   1     8.386E7       13517.334   .000   .983
Drug_Group                 1217.917                  3     405.972       .065        .978   .001
Tutor_Group                52022.500                 2     26011.250     4.193       .016   .035
Drug_Group * Tutor_Group   49820.833                 6     8303.472      1.338       .241   .034
Error                      1414535.000               228   6204.101
Total                      8.538E7                   240
Corrected Total            1517596.250               239

This table is the ANOVA summary table. The Drug_Group, Tutor_Group, Drug_Group * Tutor_Group, and Error rows are relevant for the F-Tests. In this case, only the main effect of Tutor_Group was significant.

Estimated Marginal Means

1. Grand Mean
Dependent Variable: Post_GREq

Mean      Std. Error   95% CI Lower Bound   95% CI Upper Bound
591.125   5.084        581.107              601.143

2. Drug_Group
Dependent Variable: Post_GREq

Drug_Group                Mean      Std. Error   95% CI Lower Bound   95% CI Upper Bound
Control Group (no drug)   588.667   10.169       568.630              608.703
Placebo Group             591.333   10.169       571.297              611.370
100 mg/day Group          589.833   10.169       569.797              609.870
200 mg/day Group          594.667   10.169       574.630              614.703



3. Tutor_Group
Dependent Variable: Post_GREq

Tutor_Group                   Mean      Std. Error   95% CI Lower Bound   95% CI Upper Bound
Control Group (no tutoring)   574.250   8.806        556.898              591.602
Group Tutoring                589.000   8.806        571.648              606.352
Individual Tutoring           610.125   8.806        592.773              627.477

4. Drug_Group * Tutor_Group
Dependent Variable: Post_GREq

Drug_Group                Tutor_Group                   Mean      Std. Error   95% CI Lower Bound   95% CI Upper Bound
Control Group (no drug)   Control Group (no tutoring)   560.000   17.613       525.296              594.704
                          Group Tutoring                609.500   17.613       574.796              644.204
                          Individual Tutoring           596.500   17.613       561.796              631.204
Placebo Group             Control Group (no tutoring)   589.000   17.613       554.296              623.704
                          Group Tutoring                578.000   17.613       543.296              612.704
                          Individual Tutoring           607.000   17.613       572.296              641.704
100 mg/day Group          Control Group (no tutoring)   575.000   17.613       540.296              609.704
                          Group Tutoring                597.000   17.613       562.296              631.704
                          Individual Tutoring           597.500   17.613       562.796              632.204
200 mg/day Group          Control Group (no tutoring)   573.000   17.613       538.296              607.704
                          Group Tutoring                571.500   17.613       536.796              606.204
                          Individual Tutoring           639.500   17.613       604.796              674.204

Post Hoc Tests

Drug_Group

Multiple Comparisons
Dependent Variable: Post_GREq
LSD

(I) Drug_Group            (J) Drug_Group            Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower   95% CI Upper
Control Group (no drug)   Placebo Group             -2.67                   14.381       .853   -31.00         25.67
                          100 mg/day Group          -1.17                   14.381       .935   -29.50         27.17
                          200 mg/day Group          -6.00                   14.381       .677   -34.34         22.34
Placebo Group             Control Group (no drug)   2.67                    14.381       .853   -25.67         31.00
                          100 mg/day Group          1.50                    14.381       .917   -26.84         29.84
                          200 mg/day Group          -3.33                   14.381       .817   -31.67         25.00
100 mg/day Group          Control Group (no drug)   1.17                    14.381       .935   -27.17         29.50
                          Placebo Group             -1.50                   14.381       .917   -29.84         26.84
                          200 mg/day Group          -4.83                   14.381       .737   -33.17         23.50
200 mg/day Group          Control Group (no drug)   6.00                    14.381       .677   -22.34         34.34
                          Placebo Group             3.33                    14.381       .817   -25.00         31.67
                          100 mg/day Group          4.83                    14.381       .737   -23.50         33.17


Tutor_Group

Multiple Comparisons
Dependent Variable: Post_GREq
LSD

(I) Tutor_Group               (J) Tutor_Group               Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower   95% CI Upper
Control Group (no tutoring)   Group Tutoring                -14.75                   12.454       .238   -39.29         9.79
                              Individual Tutoring           -35.88*                  12.454       .004   -60.41         -11.34
Group Tutoring                Control Group (no tutoring)   14.75                    12.454       .238   -9.79          39.29
                              Individual Tutoring           -21.13                   12.454       .091   -45.66         3.41
Individual Tutoring           Control Group (no tutoring)   35.88*                   12.454       .004   11.34          60.41
                              Group Tutoring                21.13                    12.454       .091   -3.41          45.66

The majority of the output above is not all that different from the output of the oneway ANOVA performed in Section 6.1, and needs no elaboration. As stated in the comment on the ANOVA summary table, only the main effect of Tutor_Group was significant (p = .016). Exploring this main effect, you can see from the Multiple Comparisons table for Tutor_Group (which contains the post hoc tests between the levels of that independent variable) that the only statistically significant mean difference is between the Control Group (no tutoring) and the Individual Tutoring group (mean difference = -35.88, p = .004). The mean difference between the Group Tutoring group and the Individual Tutoring group was not quite significant (mean difference = -21.13, p = .091). I encourage the reader to explore the output more thoroughly.

6.3 Repeated Measures ANOVA (via GLM)

Sections 6.1 and 6.2 showed how to request ANOVAs when the levels of an independent variable differed between subjects. In this section, I briefly introduce how to request ANOVA when the levels of an independent variable differ within subjects (repeated measures ANOVA). This section will cover only how to request a oneway, repeated measures ANOVA, as the data file includes only a single independent variable that can be considered to differ 'within subjects', and that is the pretest versus posttest period. The PASW GLM procedure for within-subjects variables is referred to as the 'repeated measures GLM'.

Let's say that we want to compare the mean score on the Verbal Reasoning Section of the GREs between the pretest and posttest periods (Pre_GREv vs. Post_GREv). Thus, we have one independent variable (Pretest vs. Posttest) with two levels. The syntax below lists the minimum set of sub-commands needed to perform this oneway repeated measures ANOVA:

1 GLM Pre_GREv Post_GREv
2   /WSFACTOR=Pretest_Posttest 2 Difference
3   /METHOD=SSTYPE(3)
4   /CRITERIA=ALPHA(.05)
5   /WSDESIGN=Pretest_Posttest.

On the GLM command line, the levels of the within-subjects variable are listed (Pre_GREv and Post_GREv). If there were three or more levels of the independent variable, they would be listed here as well. The order in which the levels are entered is critically important for repeated measures factorial ANOVAs, but is less of a concern for oneway repeated measures ANOVAs. On the /WSFACTOR command line (Line 2), the independent variable is listed. This independent variable does not actually appear in the data set; rather, it is a name that you give to the independent variable. In this example, because we are comparing GRE Verbal Reasoning scores between the pretest and posttest periods, I have called the independent variable Pretest_Posttest (PASW does not allow spaces in the name). On this line, the 2 indicates how many levels are within that independent variable. Finally, the 'Difference' request tells PASW how to compare the levels of that independent variable. This is akin to requesting a post-hoc test: the 'Difference' request tells PASW to compare each level with every other level, just like a Fisher's LSD test.

Lines 3 and 4 should be familiar from the between-subjects ANOVAs performed in Sections 6.1 and 6.2. The last line, /WSDESIGN lists each factor that should be included in the analysis. In this case, because we have only one independent variable, it should be the only factor listed.

The syntax above will output the results of only the F-Test; it will not provide any descriptive information. The syntax below includes the /PRINT sub-command on Line 4, which asks PASW to provide the DESCRIPTIVE statistics as well as the eta-squared measure of effect size. There is also the ability to request descriptive statistics through the /EMMEANS sub-command:

1 GLM Pre_GREv Post_GREv
2   /WSFACTOR=Pretest_Posttest 2 Difference
3   /METHOD=SSTYPE(3)
4   /PRINT=DESCRIPTIVE ETASQ
5   /CRITERIA=ALPHA(.05)
6   /WSDESIGN=Pretest_Posttest.

If you run the syntax above, you get the following output:

Within-Subjects Factors
Measure: MEASURE_1

Pretest_Posttest   Dependent Variable
1                  Pre_GREv
2                  Post_GREv

Descriptive Statistics

            Mean     Std. Deviation   N
Pre_GREv    412.25   57.398           240
Post_GREv   432.38   64.424           240


Multivariate Tests(b)

Effect                                  Value   F            Hypothesis df   Error df   Sig.   Partial Eta Squared
Pretest_Posttest   Pillai's Trace       .380    146.761(a)   1.000           239.000    .000   .380
                   Wilks' Lambda        .620    146.761(a)   1.000           239.000    .000   .380
                   Hotelling's Trace    .614    146.761(a)   1.000           239.000    .000   .380
                   Roy's Largest Root   .614    146.761(a)   1.000           239.000    .000   .380

Mauchly's Test of Sphericity(b)
Measure: MEASURE_1

                                                                      Epsilon(a)
Within Subjects Effect   Mauchly's W   Approx. Chi-Square   df  Sig.  Greenhouse-Geisser   Huynh-Feldt   Lower-bound
Pretest_Posttest         1.000         .000                 0   .     1.000                1.000         1.000

Tests of Within-Subjects Effects
Measure: MEASURE_1

Source                                         Type III Sum of Squares   df        Mean Square   F         Sig.   Partial Eta Squared
Pretest_Posttest          Sphericity Assumed   48601.875                 1         48601.875     146.761   .000   .380
                          Greenhouse-Geisser   48601.875                 1.000     48601.875     146.761   .000   .380
                          Huynh-Feldt          48601.875                 1.000     48601.875     146.761   .000   .380
                          Lower-bound          48601.875                 1.000     48601.875     146.761   .000   .380
Error(Pretest_Posttest)   Sphericity Assumed   79148.125                 239       331.164
                          Greenhouse-Geisser   79148.125                 239.000   331.164
                          Huynh-Feldt          79148.125                 239.000   331.164
                          Lower-bound          79148.125                 239.000   331.164


The Multivariate Tests table is not relevant for our purposes.

The Mauchly's Test of Sphericity table lists the outcome of a 'sphericity' test, which is similar to a test of homogeneity of variance. If sphericity is violated, it can be an issue.

The Tests of Within-Subjects Effects table is the ANOVA summary table.

Tests of Within-Subjects Contrasts
Measure: MEASURE_1

Source                    Pretest_Posttest      Type III Sum of Squares   df    Mean Square   F         Sig.
Pretest_Posttest          Level 2 vs. Level 1   97203.750                 1     97203.750     146.761   .000
Error(Pretest_Posttest)   Level 2 vs. Level 1   158296.250                239   662.327

Tests of Between-Subjects Effects
Measure: MEASURE_1
Transformed Variable: Average

Source      Type III Sum of Squares   df    Mean Square   F           Sig.   Partial Eta Squared
Intercept   8.561E7                   1     8.561E7       12034.036   .000   .981
Error       1700183.125               239   7113.737

The table Tests of Within-Subjects Effects is the ANOVA summary table. The terms associated with the effect of the independent variable (variability due to the independent variable) are in the rows headed by Pretest_Posttest. The terms associated with the error variance are in the rows headed by Error(Pretest_Posttest). The information in most of the columns should be self-explanatory. To determine whether the influence of the independent variable is statistically significant, look to the column labeled Sig. This is the p-value. If this value is less than your chosen alpha level (α = .05 or less), then the independent variable had a statistically significant influence on the dependent variable, which is the case here (p < .001).

Because the influence of the independent variable is statistically significant, you can conclude that the mean difference between Pre_GREv and Post_GREv is statistically significant. You can find the mean for each level of the independent variable in the Descriptive Statistics table. The table Tests of Within-Subjects Contrasts lists the post hoc test results for each comparison between levels of the independent variable. Because there are only two levels of the independent variable, there is only one possible comparison (Level 1 vs. Level 2).


The Tests of Within-Subjects Contrasts table lists the results of the post hoc tests between levels of the independent variable.

The Tests of Between-Subjects Effects table lists ANOVA results for any between-subjects factors, which we did not have in this analysis.
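Finally, if a within-subjects independent variable had more than two levels, the only changes would be listing all of its levels on the GLM command line and updating the number of levels on the /WSFACTOR sub-command. As a sketch only (the variables GREv_Time1, GREv_Time2, and GREv_Time3 are hypothetical and do not exist in the data files for this guide), a three-level version might look like this:

GLM GREv_Time1 GREv_Time2 GREv_Time3
  /WSFACTOR=Time 3 Difference
  /METHOD=SSTYPE(3)
  /PRINT=DESCRIPTIVE ETASQ
  /CRITERIA=ALPHA(.05)
  /WSDESIGN=Time.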

7. Chi Square

7.1 Cross-Tabulation Procedure (Factorial Chi-Square)

The PASW procedure for requesting a factorial chi-square analysis (a chi-square analysis with two or more independent variables in the design) is done by way of PASW's CROSSTABS (cross-tabulation) procedure. Cross-tabulation is the process of creating a contingency table from two or more independent variables. You can also have PASW create a contingency table for several independent variables without actually conducting the chi-square analysis.

From the data set, let's say that we want to know whether the n = 240 subjects in the study are equally distributed across the levels of the independent variables college class (Coll_Class) and college major (Coll_Maj). The syntax below presents the basic set of sub-commands needed to have PASW carry out a chi-square analysis through the cross-tabulation procedure:

1 CROSSTABS
2   /TABLES=Coll_Class BY Coll_Maj
3   /FORMAT=AVALUE TABLES
4   /STATISTICS=CHISQ PHI
5   /CELLS=COUNT EXPECTED.

The /TABLES sub-command on Line 2 lists the independent variables being set up in the contingency table. One independent variable has to go before the BY and the other goes after the BY, but it is not terribly important which one goes where. The /FORMAT sub-command on Line 3 tells PASW in what format the output should be presented; in this case, in tabled form (TABLES) with the entries in the table appearing in ascending order (AVALUE). The /STATISTICS sub-command on Line 4 is where you request the chi-square analysis (CHISQ); if you do not include this sub-command, PASW will not perform the analysis. I have also included the PHI request, which has PASW calculate Cramer's V and the phi coefficient as measures of effect size. The /CELLS sub-command on Line 5 tells PASW what information to include in each cell of the cross-tabulation table. In this case, PASW is being told to include the observed frequency (COUNT) and the expected frequency (EXPECTED).

If you run the syntax above, you get the following output:

Case Processing Summary

                        Valid            Missing          Total
                        N     Percent    N     Percent    N     Percent
Coll_Class * Coll_Maj   240   100.0%     0     .0%        240   100.0%


This table tells you the total number of subjects/cases included (240), and whether any appear to be missing.

Coll_Class * Coll_Maj Crosstabulation

                                          Coll_Maj
                                          Psychology   History   Biology   Communications   English   Mathematics   Total
Coll_Class   Freshmen    Count            10           12        13        6                7         9             57
                         Expected Count   9.3          9.5       9.5       7.1              9.0       12.6          57.0
             Sophomore   Count            12           11        7         7                13        15            65
                         Expected Count   10.6         10.8      10.8      8.1              10.3      14.4          65.0
             Junior      Count            6            13        14        9                7         14            63
                         Expected Count   10.2         10.5      10.5      7.9              10.0      13.9          63.0
             Senior      Count            11           4         6         8                11        15            55
                         Expected Count   8.9          9.2       9.2       6.9              8.7       12.1          55.0
Total                    Count            39           40        40        30               38        53            240
                         Expected Count   39.0         40.0      40.0      30.0             38.0      53.0          240.0

Chi-Square Tests

                               Value       df   Asymp. Sig. (2-sided)
Pearson Chi-Square             16.617(a)   15   .342
Likelihood Ratio               17.789      15   .274
Linear-by-Linear Association   2.970       1    .085
N of Valid Cases               240

a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 6.88.

Symmetric Measures

                                  Value   Approx. Sig.
Nominal by Nominal   Phi          .263    .342
                     Cramer's V   .152    .342
N of Valid Cases                  240

The Chi-Square Tests table above includes information relevant for determining whether there is a significant difference between the observed and expected frequencies. Use the information in the row labeled Pearson Chi-Square. The number in the Value column (16.617) is the chi-square statistic. The value under the df column (15) is the degrees of freedom in the cross-tabulation table. The value under the Asymp. Sig. (2-sided) column (.342) is the p-value used to determine significance. If this value is less than your chosen alpha-level (α = .05 or less), then there is a significant difference between the observed and the expected frequencies. In this case, because .342 > .05, there is not a significant difference between the observed and the expected frequencies.


This is the cross-tabulation table, with college majors listed in columns and college classes in rows. The values in each cell are the observed and expected frequencies.

The Chi-Square Tests table reports the results of the chi-square analysis. You use the terms in the Pearson Chi-Square row.

The Symmetric Measures table reports Cramer's V and the phi coefficient as measures of effect size.
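One small extension worth knowing about: the /CELLS sub-command can also display percentages in each cell of the cross-tabulation table. As a sketch (not an analysis presented in this guide), adding row and column percentages only requires adding the ROW and COLUMN keywords:

CROSSTABS
  /TABLES=Coll_Class BY Coll_Maj
  /FORMAT=AVALUE TABLES
  /STATISTICS=CHISQ PHI
  /CELLS=COUNT EXPECTED ROW COLUMN.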

7.2 Oneway Chi-Square

If you have only one independent variable and want to know whether the observed frequencies across the levels of that independent variable differ from the frequencies that are expected, you do not use the CROSSTABS procedure from Section 7.1. The CROSSTABS procedure is used only when there are two or more independent variables. There is a separate chi-square procedure within PASW's non-parametric tests (NPAR TESTS) for dealing with one independent variable.

Say that we want to determine whether the observed frequencies across the four college classes differ from a set of frequencies expected by chance. The syntax below lists the sub-commands needed to run a oneway chi-square test to determine whether the observed frequencies in each college class differ from what frequency is expected for each college class:

1 NPAR TESTS
2   /CHISQUARE=Coll_Class
3   /EXPECTED=EQUAL
4   /MISSING ANALYSIS.

The /CHISQUARE sub-command on Line 2 tells PASW to perform a chi-square test across the levels of the independent variable listed after the equals sign (Coll_Class). The /EXPECTED sub-command on Line 3 tells PASW how to calculate the expected frequencies. In this case, the choice of EQUAL asks PASW to assume the expected frequency should be equal for each college class. Hence, with 240 students and four college classes, the expected frequency for each college class should be 240/4 = 60. Finally, the /MISSING sub-command on Line 4 tells PASW how to handle missing data, which is usually set to ANALYSIS or LISTWISE. When you run the syntax above, you get the following output:

Coll_Class

            Observed N   Expected N   Residual
Freshmen    57           60.0         -3.0
Sophomore   65           60.0         5.0
Junior      63           60.0         3.0
Senior      55           60.0         -5.0
Total       240

Test Statistics

              Coll_Class
Chi-square    1.133(a)
df            3
Asymp. Sig.   .769

a. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell frequency is 60.0.


This table lists each of the levels of the independent variable, the observed frequencies, the expected frequencies, and the difference between them.

This table lists the outcome of the chi-square test between the expected and observed frequencies.

From the Test Statistics table you can determine whether the observed frequencies significantly differ from the expected frequencies by examining the p-value in the Asymp. Sig. row. If this value is less than your chosen alpha-level (generally α = .05 or less), then there is a significant difference between the observed and the expected frequencies. In this case, because .769 > .05, there is not a significant difference between the observed and the expected frequencies.
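As a side note, the /CHISQUARE sub-command accepts a list of variables, so several oneway chi-square tests can be requested with a single command. The sketch below (not an analysis presented in this guide) would test both college class and college major against equal expected frequencies:

NPAR TESTS
  /CHISQUARE=Coll_Class Coll_Maj
  /EXPECTED=EQUAL
  /MISSING ANALYSIS.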

7.3 Goodness of Fit Test

Requesting a goodness of fit test is virtually identical to requesting a chi-square analysis for one independent variable. Assume that in most research studies performed using college students, freshmen are most likely to participate, sophomores are second-most likely, juniors are third-most likely, and seniors are least likely. Thus, we may expect that 50% (.5) of the subjects in a study are freshmen, 25% (.25) are sophomores, 15% (.15) are juniors, and 10% (.1) are seniors. We want to run a goodness of fit test to determine whether the frequencies observed in each college class are consistent with these expected percentages.

The syntax below lists the sub-commands needed to run a goodness of fit test to determine whether the frequencies observed in each college class are congruent with the predicted percentages above:

1 NPAR TESTS
2   /CHISQUARE=Coll_Class
3   /EXPECTED=.5 .25 .15 .1
4   /MISSING ANALYSIS.

Notice that Lines 1, 2, and 4 are identical to the oneway chi-square conducted in Section 7.2; the only difference is the /EXPECTED sub-command on Line 3. The numbers after the equals sign are the expected proportions of freshmen (.5), sophomores (.25), juniors (.15), and seniors (.1) from above. For the goodness of fit test, you can use proportions, as done here, or expected frequencies, which would need to be determined beforehand (an example using expected frequencies appears at the end of this section).

Importantly: the order of the proportions must coincide with the dummy-codes assigned to the levels of the independent variable. That is, whichever level was dummy-coded as 1 would have its expected proportion listed first, whichever level was dummy-coded as 2 would have its expected proportion listed second, etc. In the data file, freshmen were coded 1, sophomores were coded 2, etc. When you run the syntax above, you get the following output:

Coll_Class

            Observed N   Expected N   Residual
Freshmen    57           120.0        -63.0
Sophomore   65           60.0         5.0
Junior      63           36.0         27.0
Senior      55           24.0         31.0
Total       240


This table lists each of the levels of the independent variable, the observed frequencies, the expected frequencies, and the difference between them. The expected frequencies are obtained by multiplying the sample size (240) by the expected proportion of each level.

Test Statistics

              Coll_Class
Chi-square    93.783(a)
df            3
Asymp. Sig.   .000

a. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell frequency is 24.0.

This table lists the outcome of the chi-square test between the expected and observed frequencies.

From the Test Statistics table you can determine whether the observed frequencies significantly differ from the expected frequencies by examining the p-value in the Asymp. Sig. row. If this value is less than your chosen alpha-level (generally α = .05 or less), then there is a significant difference between the observed and the expected frequencies. In this case, because p < .001, there is a significant difference between the observed and the expected frequencies.
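As noted above, the /EXPECTED sub-command can also be given expected frequencies rather than proportions; the listed values are rescaled relative to their sum. The sketch below uses expected counts of 120, 60, 36, and 24 (which correspond to the same proportions of .5, .25, .15, and .1) and should therefore produce the same output as the syntax above:

NPAR TESTS
  /CHISQUARE=Coll_Class
  /EXPECTED=120 60 36 24
  /MISSING ANALYSIS.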

7.4 Alternative Method for Goodness of Fit Test

There is an alternative procedure for requesting a goodness of fit test. Say that you know the numbers of freshmen (57), sophomores (65), juniors (63), and seniors (55) in the study and want to run a goodness of fit test on those numbers, but have not set up an entire data file with all 240 subject cases. The screenshot below (Figure 6) shows a data file with the independent variable Coll_Class (1 = freshmen; 2 = sophomores; 3 = juniors; 4 = seniors) and the Observed frequencies in each class. We can also run a goodness of fit test (or a oneway chi-square) when data are set up in this manner.


Figure 6: Observed frequencies in each college class.


The syntax below shows you how to conduct a chi-square goodness of fit test when the frequency data is set up in the manner shown in Figure 6:

1 WEIGHT BY Observed.
2
3 NPAR TESTS
4   /CHISQUARE=Coll_Class
5   /EXPECTED=.5 .25 .15 .1
6   /MISSING ANALYSIS.
7
8 WEIGHT OFF.

The WEIGHT command on Line 1 is critical, as this tells PASW to weight each case (each level of the independent variable) by the amount listed in the Observed variable. The 'Observed' is the frequency data in the data file pictured in Figure 6. Thus, the WEIGHT BY command tells PASW to weight the freshmen by 57, the sophomores by 65, the juniors by 63, and the seniors by 55. This is basically telling PASW that each college class contains that many subjects.

Lines 3 – 6 are identical to the goodness of fit procedure discussed in Section 7.3, and I will not elaborate on them here. The WEIGHT OFF command turns off the weighting factor after the analysis has been run. When you run the syntax above, you get the following output:

Coll_Class

            Observed N   Expected N   Residual
Freshmen    57           120.0        -63.0
Sophomore   65           60.0         5.0
Junior      63           36.0         27.0
Senior      55           24.0         31.0
Total       240

Test Statistics

              Coll_Class
Chi-square    93.783(a)
df            3
Asymp. Sig.   .000

a. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell frequency is 24.0.

Notice that the output is exactly the same as in Section 7.3.
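For reference, a small frequency data file like the one in Figure 6 could also be created entirely from the Syntax Editor rather than being typed into the Data Editor. The sketch below assumes the same variable names as Figure 6 (Coll_Class and Observed) and is offered only as one way this might be done:

DATA LIST FREE / Coll_Class Observed.
BEGIN DATA
1 57
2 65
3 63
4 55
END DATA.
VALUE LABELS Coll_Class 1 'Freshmen' 2 'Sophomore' 3 'Junior' 4 'Senior'.
WEIGHT BY Observed.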
