training on data management and analysis using spss

Post on 18-Dec-2021


Training on

Data Management and Analysis Using SPSS

Bedilu Alamirie

Addis Ababa University

Statistics Department

July, 2019

Outline

1. Motivational example and Overview of SPSS

2. Modifying and Organizing Data in SPSS

3. Descriptive Statistics using SPSS

4. Testing equality of two or more population means

5. Nonparametric Methods using SPSS

6. Correlation and Regression Analysis

7. Analysis of Variance and Covariance using SPSS

8. Generalized linear model using SPSS

Objectives of the training

Main Objectives of this workshop are:

• Introducing the SPSS interface

• Opening and reviewing layouts of SPSS

• Becoming familiar with menus and icons

• Manipulating data files

• Calculating descriptive statistics

• Performing in-depth analysis using real data

• Explaining output produced by SPSS

Motivational Background

Why are statistics and statistical software important for the decision-making process?

Statistical Decision Making Process

Begin here: Identify the problem → Data → Organization & presentation of data → Data analysis → Interpretation

Supporting tools: descriptive statistics, probability, statistical software; inferential statistics, experience, theory, literature, statistical software

Software: Stata, SAS, R, SPSS, Excel

Outcome: Information → Knowledge → Decision

Data Presentation/Summarization Methods

• Data in raw form are usually not easy to use for

decision making.

- What can you do with raw data from Eth oil enterprise, CBE, or Ethio telecom?

• Some type of organization is needed

• Summary statistics

• Table

• Graph

• The type of data summarization depends on the

type of data/variable being summarized

Example: Consider Commercial Bank of Ethiopia (CBE) data

• Raw data do not facilitate the decision-making process!

How will the CBE manager use these data for decision making?

Types of Variable/Data

Variable/Data

• Categorical/Qualitative (defined categories or groups)

– E.g.: marital status, registered to vote?, region

• Numerical/Quantitative

– Discrete (counted items), e.g.: number of children, defects per hour

– Continuous (measured characteristics), e.g.: weight, height

There are different statistical methods for each type!

Descriptive Statistics

Collect data

e.g., Survey

Present data

e.g., Tables and graphs

Summarize data

e.g., Sample mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$

Inferential Statistics

Estimation

e.g., Estimate the population mean using the sample mean

Confidence interval

Hypothesis testing

e.g., Test the claim that the population mean weight is 56 kg.

Comparison of two or more means or proportions

Inference is the process of drawing conclusions or making decisions about a population based on sample results.

What is SPSS?

• Originally it was an acronym for Statistical Package for the Social Sciences (SPSS), but now it stands for Statistical Product and Service Solutions.

• It was originally launched in 1968 by SPSS Inc. and was acquired by IBM in 2009.

• A Windows-based program that can be used to perform data entry and analysis and to create tables and graphs with simple instructions.

• Capable of handling large amounts of data and undertaking a wide range of statistical analyses relatively easily.

• SPSS is updated often. The latest version is SPSS 25/26.

Introduction …

• SPSS is software for editing and analyzing all sorts of data.

• Data may come from basically any source:

– scientific research

– a customer database

– Google Analytics or even the server log files of a website.

Introduction …

• SPSS can open all file formats that are commonly used for

structured data such as

– spreadsheets from MS Excel

– plain text files (.txt or .csv)

– relational (SQL) databases

– Stata and SAS files, etc.

Starting SPSS

You may use any one of the following options to start SPSS.

1. Go to the Applications folder and select SPSS from the list of programs (or Start > All Programs > IBM SPSS Statistics > IBM SPSS Statistics 23).

2. Double-click the SPSS shortcut icon on the desktop (if present).

SPSS welcome dialogue

SPSS Environment

• Common windows in SPSS are

– Data Editor

– Output Viewer

– Pivot Table Editor

– Chart Editor

– Syntax Editor

• The file extension for

– Data Editor is *.sav

– Output Viewer is *.spv

– Syntax Editor is *.sps

SPSS/ Windows

• The user interface of SPSS for Windows is built around three primary and distinct windows:

• Data Editor window: open at start-up and used to enter and store data in a spreadsheet format

• Syntax Editor window: a text editor where you compose SPSS commands

• Viewer window: displays the results of all statistical analyses and graphical displays of data

1. Data Editor window

• It is the window where we see our data

• It is useful to see and manipulate the data

• It contains variables in columns and cases in rows

• It has two views:

1. Data view: columns represent variables and rows represent cases (observations)

2. Variable view: each row is a variable, and each column is an attribute associated with that variable

A. Data View window

B. Variable View window

This sheet contains information about the data set (to be) stored:

– name of variable, type of variable, its label and its values

Name

– Variable names must be unique and can be at most 64 characters long.

– Spaces are NOT allowed. Names are not case sensitive.

Type of variables

There are different types of variables (for example numeric, date and string).

The Variable Type dialog is displayed when you click the button at the right edge of a cell in the Type column.

Width/Decimals

• The Width column allows us to specify the total number of characters required for the column.

• For a numeric variable, you are asked to choose the width and the number of decimals (by default the width is 8 and the number of decimals is 2).

• For qualitative data entered as words (a string variable), you are asked to choose the number of characters you want to allow.

• For a date variable, you may be asked to choose the date format.

Decimals:

• The Decimals option allows you to specify the number of

decimal places for your variable

• It has to be less than or equal to 16.

• For a date or string variable, you are not asked for decimals.

Label of the variables

The label of a variable is a detailed description of the variable name.

• You can specify the details of the variable.

• A label can be up to 256 characters long and may contain spaces and punctuation.

Labels of values of variable

• A value label is a label assigned to a particular value of a variable.

• You are most likely to use value labels for nominal or categorical data.

• They are for variables whose values are coded categories.

• E.g., for ‘Sex’ the values can be 1 = male, 2 = female.

• For ‘Residence’: 1 = urban, 2 = rural, etc.

• For continuous variables, no value labels are needed.

Defining the value labels

• Click the cell in the values column as shown below

• For the value and the label, you can enter up to 60 characters.

• After defining each value and its label click Add, and then click OK.

Exercise 1

The following small data set consists of three variables, namely agecat, gender and income.

agecat is a categorical variable created for age: 1 = ‘Under 21’, 2 = ‘21-25’, 3 = ‘26-30’

gender: 0 = ‘Male’ and 1 = ‘Female’

income is numeric.

1. Define these variables in a Data Editor window.

2. Enter the following data for the variables agecat, gender and income respectively. Your data should appear as given below.

agecat  gender  income
1       1       5799
2       1       5711
3       1       3412
1       0       6393
2       0       6485
3       0       6680

3. Save the data set as exercise1.sav.
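As a rough illustration of how value labels relate to the numeric codes, the Exercise 1 data can be mirrored outside SPSS. The sketch below is plain Python, not SPSS syntax; the names AGECAT_LABELS, GENDER_LABELS, ROWS and labelled are hypothetical helpers introduced only for this example.

```python
# Hypothetical Python mirror of the Exercise 1 data set (not SPSS syntax).
AGECAT_LABELS = {1: "Under 21", 2: "21-25", 3: "26-30"}
GENDER_LABELS = {0: "Male", 1: "Female"}

# Rows are cases; columns are agecat, gender, income (as entered in Data View).
ROWS = [
    (1, 1, 5799),
    (2, 1, 5711),
    (3, 1, 3412),
    (1, 0, 6393),
    (2, 0, 6485),
    (3, 0, 6680),
]


def labelled(rows):
    """Replace numeric codes with their value labels, keeping income numeric."""
    return [(AGECAT_LABELS[a], GENDER_LABELS[g], inc) for a, g, inc in rows]


for case in labelled(ROWS):
    print(case)
```

The stored values stay numeric (1, 2, 3 and 0, 1); the labels are only a lookup used for display, which is the same division of labour as the Values column in Variable View.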

2. The Viewer window/output/

• It is displayed after any data manipulation.

• Analysis results and commands are displayed in the Viewer window.

• Editing of graphs is also performed in this window.

3. The Syntax Editor

• It is the window in which SPSS commands can be typed and

submitted for processing.

• Commands saved in files can be opened in a syntax Editor

window for processing.

• Syntax files have the *.sps extension.

Opening an existing data file

Your data may already have been entered and saved as a data file

– in SPSS, with the *.sav extension, or

– in a different package such as Epi Info, Excel, SAS, Stata or dBase, etc.

In the Data Editor window

• Choose the file type, and then browse and select your file so it appears by file name.

Opening SPSS for windows

• Data can also be opened using Open Database.

• Excel files, database files and MS Access databases can be opened this way.

• The procedure is

Exercise 2: Open bankloan.sav from the SPSS package

Data, Transform, Statistics (Analyze) and Graphs menus

• Data menu is useful to define variables and make changes to

the data file you are using.

• Transform menu is used to make changes to selected

variable(s) in the data file you are using. It includes recoding

existing variables & computing new variables.

• Statistics (Analyze) menu is useful to perform statistical

analyses such as producing Reports, calculating Descriptive

Statistics, as well as various statistical procedures such as

Regression and Correlation.

• Graphs menu lets one make various types of plots from a

given dataset.

SPSS Pull-down menus

2. Descriptive Statistics

Using SPSS

(Univariate and Bivariate)

Steps of Data Analysis

Descriptive Statistics:

1st: Univariate analysis

Examine the distribution of each variable

2nd: Bivariate analysis

Describe the association between pairs of variables (mostly the response with each covariate)

Inferential Statistics:

3rd: Multiple (multivariable) regression

• Analysis with more than one independent variable

• Use a statistical model called regression (linear, logistic, etc.)

• To examine the relationship between multiple independent variables and a dependent variable

• To gain insight into possible causal relationships (cause & effect)

1. Univariate Analysis

• Univariate analysis is the process of describing the sample

by examining and summarizing the distribution of each

individual variable.

• Used for all variables, regardless of level of measurement.

• Useful to make the researcher familiar with variables/data.

• It can also be used to test variables for fulfilling

assumptions.

Data Exploration

• Before doing any kind of statistical testing or model building,

examine your data using summary statistics and graphs

• Exploring data helps you to know

– which values are typical

– which values are unusual

– where the data are centered

– how spread out they are

– what their extremes are

Data Exploration…

• When summarizing a quantitative variable consider

– How many observations were there?

– How many cases had missing values?

– Where is the "center" of the data?

– Where are the "benchmarks" of the data? (Quartiles, percentiles)

– How spread out is the data?

– What are the extremes of the data? (Minimum, maximum; Outliers)

– What is the "shape" of the distribution?
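The questions above can be answered with a handful of summary numbers. The following is a minimal sketch in Python (standard library only), assuming a hypothetical sample made of the six Exercise 1 incomes plus one missing value; the function name explore is an assumption for illustration, and the quartile method may differ from the one SPSS uses.

```python
import statistics

# Hypothetical sample: the six Exercise 1 incomes plus one missing value (None).
income = [5799, 5711, 3412, 6393, 6485, 6680, None]


def explore(values):
    """Summarize a quantitative variable: counts, center, benchmarks, spread, extremes."""
    valid = sorted(v for v in values if v is not None)
    q1, _, q3 = statistics.quantiles(valid, n=4)  # quartile cut points
    return {
        "n_valid": len(valid),
        "n_missing": len(values) - len(valid),
        "mean": statistics.mean(valid),
        "median": statistics.median(valid),
        "q1": q1,
        "q3": q3,
        "std_dev": statistics.stdev(valid),  # sample standard deviation
        "minimum": valid[0],
        "maximum": valid[-1],
    }


summary = explore(income)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Counting valid and missing cases first matters because every other statistic is computed on the valid cases only.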

Descriptive Statistics (Frequency)

• Frequency

– Frequency distributions are tabular presentations of data

• show each category for a variable and

• the frequency of the category's occurrence in the data set

– Frequencies are used when you want to know how many of

something you have

– Additional statistics are also available via the Statistics button

– The Charts button is particularly useful to automatically produce

charts


Descriptive Statistics (Frequency)…

• Frequency

– Go to

Analyze > Descriptive Statistics > Frequencies

– Clicking the Statistics and Charts buttons opens the corresponding dialogue boxes

– Frequency for Employment Category from Employee sample data

– The majority (76%) of the employees are clerical, while few (5.7%) are custodial.
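A frequency table is just a count per category plus its share of the total. The sketch below shows the idea in Python; the ten job categories are hypothetical values (not the Employee sample data) and frequency_table is an illustrative helper name.

```python
from collections import Counter

# Hypothetical job categories for ten employees (not the Employee sample data).
jobcat = ["Clerical", "Clerical", "Manager", "Clerical", "Custodial",
          "Clerical", "Manager", "Clerical", "Clerical", "Clerical"]


def frequency_table(values):
    """Return (category, count, percent) rows, most frequent category first."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(cat, n, 100 * n / total) for cat, n in counts.most_common()]


table = frequency_table(jobcat)
for cat, n, pct in table:
    print(f"{cat:<10}{n:>4}{pct:>8.1f}%")
```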

Descriptive Statistics (Descriptive)

• Descriptive

– The descriptive procedure produces summary measures for numeric

variables

– It produces measures of center, spread and shape & size

– Go to

Analyze > Descriptive Statistics > Descriptives…

Descriptive Statistics (Descriptive)…

• Descriptive

– Click the Options button to choose which measures to display

– Use Employee sample data to describe the salary variable

– The average salary of employees is 34419.57 dollars, with a standard deviation of 17075.66 dollars. The minimum salary paid is 15750 dollars, while the maximum is 135000 dollars.

Descriptive Statistics (Explore)

• Explore

– It produces detailed univariate statistics and graphs for numeric scale

variables

– It can also be used to assess the normality of a numeric scale

variable with special inferential statistics and detailed diagnostic

plots

– It can produce results separately for each level of a categorical variable

– Go to

Analyze > Descriptive Statistics > Explore

– Use Employee sample data to display detailed univariate statistics for salary

– The missing and non-missing cases are displayed in count and percent.

– The mean salary of clerical employees is 27838.54 dollars with a standard deviation of 7567.995 dollars; 50% of clerical employees get at most 26550 dollars.

– The salary distribution of clerical workers is asymmetric (skewed) and leptokurtic (peaked).
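Shape is usually judged from skewness (asymmetry) and kurtosis (peakedness). As a hedged sketch, the Python below computes both from central moments in their simple population form; SPSS reports bias-adjusted sample versions, so its values will differ somewhat. The salaries and the function name shape are hypothetical.

```python
import statistics


def shape(values):
    """Return (skewness, excess kurtosis) from central moments (population form)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    return skewness, excess_kurtosis


# Hypothetical right-skewed salaries (in thousands of dollars).
salaries = [20, 21, 22, 22, 23, 24, 25, 26, 30, 60]
skew, kurt = shape(salaries)
print(f"skewness = {skew:.3f}, excess kurtosis = {kurt:.3f}")
```

A positive skewness indicates a long right tail, and a positive excess kurtosis indicates a leptokurtic distribution.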

Descriptive Statistics (Crosstabs)

• Crosstabs

– Frequency table describes a single categorical variable

– Cross tabulation describes the relationship between two categorical

variables

• It is also called a two-way table, contingency table or crosstabs

– Go to

Analyze > Descriptive Statistics > Crosstabs...

– Use Employee sample data to cross-tabulate employment category (jobcat) by gender (gender)

– The total number of female clerical employees is 206, while that of male clerical employees is 157.

– Remark: you can create a higher-order table by entering a variable in the “Layer” box. Click Next until you have entered all the categorical variables to be used.
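A cross tabulation counts the cases falling in each combination of two categorical variables. The following Python sketch illustrates this with eight hypothetical (gender, jobcat) pairs; crosstab is an illustrative helper, not an SPSS function.

```python
from collections import Counter

# Hypothetical (gender, jobcat) pairs; not the Employee sample data.
cases = [
    ("Female", "Clerical"), ("Female", "Clerical"), ("Female", "Manager"),
    ("Male", "Clerical"), ("Male", "Custodial"), ("Male", "Manager"),
    ("Male", "Manager"), ("Female", "Clerical"),
]


def crosstab(pairs):
    """Count cases in each (row category, column category) cell."""
    cells = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    return rows, cols, [[cells[(r, c)] for c in cols] for r in rows]


rows, cols, table = crosstab(cases)
print(f"{'':<8}" + "".join(f"{c:>10}" for c in cols))
for r, counts in zip(rows, table):
    print(f"{r:<8}" + "".join(f"{n:>10}" for n in counts))
```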

Descriptive Statistics (Ratio)

• Ratio

– The Ratio Statistics procedure provides a comprehensive list of summary

statistics for describing the ratio between two scale variables.

– It is helpful to calculate range, average absolute deviation, median-centered

coefficient of variation and mean-centered coefficient of variation.

– Go to

Analyze > Descriptive Statistics >Ratio...

Descriptive Statistics (Ratio)…

• Ratio…

– Use Employee sample data to display the ratio of beginning salary (salbegin) to current salary (salary)

– Click on Statistics to choose the statistics to display

Descriptive Statistics (P-P plots)

• P-P Plots

– Useful to check whether a variable under consideration follows a

certain distribution, eg: Normal

– Go to

Analyze > Descriptive Statistics >P-P Plots...

Descriptive Statistics (P-P plots) …

• P-P Plots

– Use Employee sample data to test normality of current salary (salary)

– Select the type of distribution to be checked

Descriptive Statistics (Q-Q plots)

• Q-Q Plots

– Useful to check whether a variable under consideration follows a

certain distribution, eg: Normal

– Go to

Analyze > Descriptive Statistics >Q-Q Plots...

Descriptive Statistics (Q-Q plots) …

• Q-Q Plots…

– Select the type of distribution to be checked
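A normal Q-Q plot pairs each ordered observation with the quantile a normal distribution would give at the same rank; points close to a straight line suggest normality. The Python sketch below computes those pairs, assuming the (i - 0.5) / n plotting position and hypothetical salaries; normal_qq_points is an illustrative name.

```python
from statistics import NormalDist, mean, stdev


def normal_qq_points(values):
    """Pair each ordered observation with the normal quantile expected at its rank."""
    data = sorted(values)
    n = len(data)
    fitted = NormalDist(mean(data), stdev(data))
    # (i - 0.5) / n is one common plotting position; other conventions exist.
    return [(fitted.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(data, start=1)]


# Hypothetical salaries (in thousands of dollars).
salaries = [20, 21, 22, 22, 23, 24, 25, 26, 30, 60]
points = normal_qq_points(salaries)
for expected, observed in points:
    print(f"expected {expected:7.2f}   observed {observed:5d}")
```

With these values the largest observation sits far above its expected quantile, which is the typical signature of a right-skewed variable.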

Descriptive Statistics (Graphs)

• Graphs

– SPSS provides a wide variety of charts(graphs) to choose from such as

• Bar chart

• Pie chart

• Histogram

• Boxplots and

• Scatter plots

– There are three options to produce graphs

• Chart Builder

• Graphboard Template Chooser and

• Legacy Dialogs

Descriptive Statistics (Graphs) …

• Chart Builder

– Go to

Graphs > Chart Builder > OK

– Choose the chart type

– Use Employee sample data to construct a bar chart for Job category (jobcat)

– Select the statistic to be displayed and click OK

Descriptive Statistics (Graphs) …

• Graphboard Template Chooser

– Go to

Graphs > Graphboard Template Chooser…

– Use Employee sample data to construct a bar chart for Job category (jobcat)

Descriptive Statistics (Graphs) …

• Legacy Dialogs

– Go to

Graphs > Legacy Dialogs > Bar/3-D Bar/Line/Area/Pie, etc.

Descriptive Statistics (Graphs) …

• Multiple Bar Chart

– Use Employee sample data to construct a multiple bar chart for current salary (salary) by Gender (gender) for job category (jobcat)

– Select Clustered, then click Define

– Select what the bars represent; click Change Statistic to choose the statistic

– Click OK

Descriptive Statistics (Graphs) …

• Stacked Bar Chart

– Use Employee sample data to construct a stacked bar chart for current salary (salary) by Gender (gender) for job category (jobcat)

– Select what the bars represent; click Change Statistic to choose the statistic

Descriptive Statistics (Graphs) …

• Histogram

– Histograms are used for visualizing quantitative data.

– Go to

Graphs > Legacy Dialogs > Histogram…

– Use Employee sample data to construct a histogram for current salary (salary)

Descriptive Statistics (Graphs) …

• Box Plot

– Boxplots are useful to check variability, observe outliers, etc.

– They are also a useful way of comparing two or more datasets

– Go to

Graphs > Legacy Dialogs > Boxplot

– Use Employee sample data to construct a boxplot for current salary (salary) by employment category (jobcat)
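The numbers behind a boxplot are the five-number summary plus a rule for flagging outliers. A minimal Python sketch follows, assuming the usual 1.5 × IQR fences and hypothetical salaries; boxplot_summary is an illustrative name and the quartile method may differ from the one SPSS uses.

```python
import statistics


def boxplot_summary(values):
    """Five-number summary plus values outside the 1.5 * IQR fences."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {"min": data[0], "q1": q1, "median": median, "q3": q3,
            "max": data[-1], "outliers": outliers}


# Hypothetical salaries (in thousands of dollars).
salaries = [20, 21, 22, 22, 23, 24, 25, 26, 30, 60]
summary = boxplot_summary(salaries)
print(summary)
```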

Inferential Statistics

• Statistical inference refers to making generalizations about populations

based on samples of those populations

• Two basic issues in statistical inference: estimation and hypothesis

testing

• Inferential procedures can be either parametric or non-parametric

• Parametric: Assumptions about the shape of the distribution of the population are mandatory.

– Examples: t test, z test, ANOVA, etc.

• Non-parametric: These do not require any assumption on the shape of the distribution.

– Examples: Chi-square, the Mann-Whitney U test, the Wilcoxon test, etc.

Inferential Statistics

• One sample t test

– A one-sample t test can be used to compare a sample mean to a given value.

– Assumptions

• Test variable that is continuous (i.e., interval or ratio level)

• Observations on the test variable are independent

• Random sample of data from the population

• Normal distribution (approximately) of the test variable in the population

• No outliers

– Hypotheses: $H_0: \mu = \mu_0$ vs $H_1: \mu \neq \mu_0$

Inferential Statistics…

• One sample t test …

– Use Employee sample data to test whether the mean current salary (salary) is different from 34000 dollars

– Go to

Analyze > Compare Means > One-Sample T Test...

– Select the variable to be tested, specify the claim (test value) and specify the level of confidence

• The mean salary of employees is not significantly different from $34000
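The one-sample t statistic is the distance between the sample mean and the test value, measured in standard errors. The sketch below computes it in Python for hypothetical salaries; one_sample_t is an illustrative name, and the p-value (which SPSS reports) would still need a t distribution table or a statistics package.

```python
import math
import statistics


def one_sample_t(values, mu0):
    """Return (t statistic, degrees of freedom) for H0: mean == mu0."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    t = (statistics.fmean(values) - mu0) / standard_error
    return t, n - 1


# Hypothetical salaries in dollars; 34000 is the test value used in the slides.
salaries = [31000, 36500, 29800, 40200, 33900, 35100, 32750, 37400]
t, df = one_sample_t(salaries, 34000)
print(f"t = {t:.3f} on {df} degrees of freedom")
```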

Inferential Statistics…

• Two sample paired t test

– This compares the differences between pairs of readings for two

related samples

– The two means typically represent two different times (e.g., pre-test

and post-test with an intervention between the two time points) or

two different but related conditions or units (e.g., left and right ears,

twins).

– The purpose of the test is to determine whether there is statistical

evidence that the mean difference between paired observations on a

particular outcome is significantly different from zero.


Inferential Statistics…

• Two sample paired t test…

– Assumptions

• Dependent variable that is continuous (i.e., interval or ratio level)

• The paired measurements must be recorded in two separate

variables

• Related samples/groups (i.e., dependent observations)

• Random sample of data from the population

• Normal distribution (approximately) of the difference between

the paired values

• No outliers in the difference between the two related groups

Inferential Statistics…

• Two sample paired t test

– To run paired t test go to

Analyze > Compare Means > Paired-Samples T Test

– From the Employee sample data set, assuming normality of the difference between beginning salary (salbegin) and current salary (salary), conduct a paired t test.

• There is a significant difference between beginning salary and current salary of employees.
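A paired t test is a one-sample t test applied to the within-pair differences. The Python sketch below shows that reduction with hypothetical beginning and current salaries for six employees; paired_t is an illustrative name and only the test statistic and degrees of freedom are computed.

```python
import math
import statistics


def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for H0: mean difference == 0."""
    differences = [a - b for b, a in zip(before, after)]
    n = len(differences)
    standard_error = statistics.stdev(differences) / math.sqrt(n)
    return statistics.fmean(differences) / standard_error, n - 1


# Hypothetical beginning and current salaries for six employees (dollars).
salbegin = [15000, 18750, 12000, 13200, 21000, 13500]
salary = [27300, 40200, 21450, 21900, 45000, 32100]
t, df = paired_t(salbegin, salary)
print(f"t = {t:.3f} on {df} degrees of freedom")
```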

Inferential Statistics…

• Two sample Independent t test

– The Independent Samples t Test compares the means of two

independent groups in order to determine whether there is statistical

evidence that the associated population means are significantly

different.

– The purpose of the test is to determine whether there is statistical

evidence that the mean difference between two groups of

observations on a particular outcome is significantly different from

zero.

• Example: compare the salary of males and females

Inferential Statistics…

• Two sample Independent t test…

– Assumptions

• Dependent variable that is continuous (i.e., interval or ratio level)

• Independent variable that is categorical with two groups

• Cases that have values on both the dependent and independent variables

• Independent samples/groups (i.e., independence of observations)

• Random sample of data from the population

• Normal distribution (approximately) of the dependent variable for each group

• Homogeneity of variances (i.e., variances approximately equal across groups)

• No outliers

– To run two sample independent t test go to

Analyze > Compare Means > Independent-Samples T Test

– From the Employee sample data set, assuming normality of current salary (salary), test whether the salary of males is different from that of females.

• Consider two cases

– Equal variance

– Unequal variance

• The variability of salary for males and females is not equal

• The mean salary of males is different from that of females
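The two cases differ only in how the standard error of the mean difference is built. As a hedged sketch with hypothetical salaries, the Python below computes both t statistics: the pooled version for equal variances and the Welch version for unequal variances. The name independent_t is an assumption for illustration, and degrees of freedom and p-values are left out.

```python
import math
import statistics


def independent_t(group1, group2):
    """Return t statistics for the equal-variance and unequal-variance cases."""
    n1, n2 = len(group1), len(group2)
    mean_diff = statistics.fmean(group1) - statistics.fmean(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    # Equal variances assumed: pool the two sample variances.
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_equal = mean_diff / math.sqrt(pooled * (1 / n1 + 1 / n2))
    # Equal variances not assumed (Welch): use each group's own variance.
    t_unequal = mean_diff / math.sqrt(v1 / n1 + v2 / n2)
    return t_equal, t_unequal


# Hypothetical salaries (thousands of dollars) for two groups of unequal size.
males = [41, 38, 52, 47, 60, 35]
females = [26, 29, 24, 31, 27]
t_equal, t_unequal = independent_t(males, females)
print(f"equal variances: t = {t_equal:.3f}; unequal variances: t = {t_unequal:.3f}")
```

When the two groups have the same size the two statistics coincide; they diverge as the group sizes and variances become more unequal.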

Inferential Statistics…

• One way ANOVA

– Used when there are more than two independent groups whose means are to be compared

– The variables used in this test are known as:

• Dependent variable

• Independent variable (also known as the grouping variable, or factor)

– The total variation of all samples is calculated and partitioned into a portion accounted for by known causes (between groups) and a portion due to unknown causes (within groups).

Inferential Statistics…

• One way ANOVA…

– Assumptions

• Dependent variable that is continuous (i.e., interval or ratio level)

• Independent variable that is categorical (i.e., two or more groups)

• Cases that have values on both the dependent and independent variables

• Independent samples/groups (i.e., independence of observations)

• Random sample of data from the population

• Normal distribution (approximately) of the dependent variable for each

group (i.e., for each level of the factor)

• Homogeneity of variances (i.e., variances approximately equal across

groups)

• No outliers

– To run one way ANOVA go to

Analyze > Compare Means > One-Way ANOVA...

– From the Employee sample data set, assuming normality of current salary (salary), test whether salary differs between at least two Employment categories (jobcat)

• The mean salary differs between at least two Employment categories
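The F statistic of a one-way ANOVA compares between-group variation with within-group variation. The following Python sketch computes it from hypothetical salaries in three job categories; one_way_anova_f is an illustrative name and the p-value is not computed.

```python
import statistics


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples, one per group."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.fmean(all_values)
    # Between-groups sum of squares: variation explained by the factor.
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: variation left unexplained.
    ss_within = sum((v - statistics.fmean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical salaries (thousands of dollars) in three job categories.
clerical = [27, 25, 30, 28]
custodial = [30, 31, 29, 32]
manager = [60, 55, 70, 65]
f, df1, df2 = one_way_anova_f([clerical, custodial, manager])
print(f"F({df1}, {df2}) = {f:.2f}")
```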

Inferential Statistics…

• Chi Square Tests of independence

– Used to test variables that have nominal data

– It doesn’t recognize any quantitative distinction among categories

– Tests whether a relationship (association) exists between two nominal variables

– Analogous to bivariate correlation

– Compares expected and observed counts in each category

– To perform the chi-square test of association go to

Analyze > Descriptive Statistics > Crosstabs…, click Statistics and select Chi-square

– Use Employee sample data to test the association between Employment category (jobcat) and minority classification (minority)

• There is an association between jobcat and minority.
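The comparison of observed and expected counts can be written out directly. The Python sketch below computes the Pearson chi-square statistic for a table of counts; the counts are hypothetical and chi_square_independence is an illustrative name. The p-value would come from a chi-square table with the returned degrees of freedom.

```python
def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (count - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df


# Hypothetical counts: rows are minority (no, yes); columns are three job categories.
table = [[30, 10, 20],
         [10, 10, 20]]
chi2, df = chi_square_independence(table)
print(f"chi-square = {chi2:.3f}, df = {df}")
```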

Inferential Statistics…

• Wilcoxon Signed-Ranks Test

– It is the non-parametric equivalent of the paired t test

– Used when the assumption of normality is violated

– It can also be used on ordinal variables, although ties may be a real issue for Likert items

– To perform the Wilcoxon Signed-Rank test go to

Analyze > Nonparametric Tests > Legacy Dialogs > 2 Related Samples…

– Use Employee sample data to test whether current salary (salary) and beginning salary (salbegin) are the same

– The output displays the number of negative, positive and zero differences (ties)

• There is a statistically significant difference between the beginning and current salary
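The signed-rank procedure ranks the absolute differences and then sums the ranks separately for positive and negative differences. A minimal Python sketch under stated assumptions: zero differences are dropped, tied absolute differences share their average rank, the data are hypothetical, and signed_rank_sums is an illustrative name.

```python
def signed_rank_sums(first, second):
    """Return (sum of positive ranks, sum of negative ranks), dropping zero differences."""
    diffs = [b - a for a, b in zip(first, second) if b != a]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        # Tied absolute differences share the average of their ranks.
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1
        i = j + 1
    positive = sum(r for d, r in zip(ordered, ranks) if d > 0)
    negative = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return positive, negative


# Hypothetical paired measurements for five cases.
first = [10, 12, 9, 15, 11]
second = [13, 12, 8, 19, 14]
positive, negative = signed_rank_sums(first, second)
print(f"positive ranks = {positive}, negative ranks = {negative}")
```

The smaller of the two rank sums is the usual test statistic; a very small value suggests the differences lean consistently in one direction.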

Inferential Statistics…

• Mann-Whitney U Test

– It is a nonparametric test equivalent to two sample independent t test

– The assumptions of the Mann-Whitney U test are

• The variable of interest is continuous (not discrete). The measurement

scale is at least ordinal

• The probability distributions of the two populations are identical, except

for location

• The two samples are independent

• Both samples are simple random samples from their respective

populations. Each individual in the population has an equal probability

of being selected in the sample

• It is also called the Wilcoxon rank-sum test (not the Wilcoxon signed-rank test)

– To run the Mann-Whitney test go to

Analyze > Nonparametric Tests > Legacy Dialogs > 2 Independent Samples…

– Use Employee sample data to test equality of salary across minority classification (minority)

• There is a significant difference in the current salary of employees based on minority classification
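The U statistic counts, over all cross-group pairs, how often a value from one sample exceeds a value from the other. The Python sketch below uses that direct pair-counting definition (ties count one half) on hypothetical salaries; mann_whitney_u is an illustrative name.

```python
def mann_whitney_u(sample1, sample2):
    """Return the U statistic for sample1: pairs won, with ties counting one half."""
    u = 0.0
    for x in sample1:
        for y in sample2:
            if x > y:
                u += 1
            elif x == y:
                u += 0.5
    return u


# Hypothetical salaries (thousands of dollars) for two independent groups.
group_a = [28, 35, 41, 30]
group_b = [24, 27, 30, 22, 26]
u_a = mann_whitney_u(group_a, group_b)
u_b = mann_whitney_u(group_b, group_a)
print(f"U(a) = {u_a}, U(b) = {u_b}, smaller U = {min(u_a, u_b)}")
```

The two U values always add up to the product of the sample sizes, which is a quick check on the arithmetic.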

Inferential Statistics…

• Kruskal-Wallis Test

– The Kruskal-Wallis test is an alternative to a one-way ANOVA if the assumptions of the latter are violated

– It is a rank-based nonparametric test

– It is an extension of the Mann-Whitney test to more than two groups

– To run the Kruskal-Wallis test go to

Analyze > Nonparametric Tests > Legacy Dialogs > K Independent Samples…

– Use Employee sample data to test for a difference in current salary based on Employment category

• There is a difference in current salary between at least two employment categories
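The Kruskal-Wallis H statistic is computed from the rank sums of the groups after pooling and ranking all observations. A hedged Python sketch follows: it uses average ranks for ties but omits the tie correction that statistical packages usually apply, the salaries are hypothetical, and kruskal_wallis_h is an illustrative name.

```python
def kruskal_wallis_h(groups):
    """Return (H, degrees of freedom), using average ranks and no tie correction."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    # Average rank of each distinct value in the pooled, ordered data.
    rank_of = {}
    for value in set(pooled):
        first = pooled.index(value) + 1
        last = first + pooled.count(value) - 1
        rank_of[value] = (first + last) / 2
    h = 12 / (n * (n + 1)) * sum(
        sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups
    ) - 3 * (n + 1)
    return h, len(groups) - 1


# Hypothetical salaries (thousands of dollars) in three job categories.
clerical = [27, 25, 30, 28]
custodial = [30, 31, 29, 32]
manager = [60, 55, 70, 65]
h, df = kruskal_wallis_h([clerical, custodial, manager])
print(f"H = {h:.3f}, df = {df}")
```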

Inferential Statistics…

• Correlation Analysis

– Measures the strength and direction of linear relationship between

pairs of variables

– To run bivariate Pearson correlation go to

Analyze> Correlate> Bivariate

– Use Employee sample data to run a bivariate correlation between beginning salary (salbegin) and current salary (salary)

• There is a strong linear relationship between beginning salary and current salary.
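The Pearson coefficient is the covariation of two variables divided by the product of their individual variations. The Python sketch below computes it from first principles for hypothetical salaries; pearson_r is an illustrative name.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical beginning and current salaries for six employees (dollars).
salbegin = [15000, 18750, 12000, 13200, 21000, 13500]
salary = [27300, 40200, 21450, 21900, 45000, 32100]
r = pearson_r(salbegin, salary)
print(f"r = {r:.3f}")
```

Values near +1 or -1 indicate a strong linear relationship, and values near 0 indicate a weak one.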

Inferential Statistics…

• Partial correlation Analysis

– Partial correlation is a measure of the strength and direction of a

linear relationship between two continuous variables whilst

controlling for the effect of one or more other continuous variables

• Example:

– A linear relationship between ice cream sales and price, whilst

controlling for daily temperature

– relationship between 10,000 m running performance and VO2max

but you would like to know if this relationship is affected by wind

speed and humidity


Inferential Statistics…

• Partial correlation Analysis

– Assumptions

• There is one (dependent) variable and one (independent) variable, both measured on a continuous scale (i.e., an interval or ratio scale).

• There are one or more control variables, also known as covariates (i.e., variables that you are using to adjust the relationship between the other two variables, dependent and independent).

• There needs to be a linear relationship between all pairs of variables

• There should be no significant outliers

• Variables should be approximately normally distributed

– To run partial correlation go to

Analyze > Correlate > Partial…

– Use job_satisfaction data to run a partial correlation

• The output gives the correlation between the outcome of job performance and the outcome of the motivation test, controlling for the effect of the outcome of social support
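With a single control variable, the partial correlation can be obtained from the three pairwise Pearson correlations. The Python sketch below applies that first-order formula; the three input correlations are hypothetical numbers chosen for illustration and partial_r is an illustrative name.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical pairwise Pearson correlations: x = performance, y = motivation,
# z = social support.
r = partial_r(r_xy=0.60, r_xz=0.50, r_yz=0.40)
print(f"partial correlation = {r:.3f}")
```

If the control variable is uncorrelated with both variables, the partial correlation equals the ordinary correlation.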

Inferential Statistics…

• Linear Regression Analysis

– Used when we want to predict the value of a variable based on the

value of another variable.

– The variable we want to predict is called the dependent variable (or

sometimes, the outcome variable).

– The variable we are using to predict the other variable's value is

called the independent variable (or sometimes, the predictor

variable).


Inferential Statistics…

• Linear Regression Analysis

– Assumptions

• The dependent variable should be measured at the continuous level (i.e., either an interval or ratio variable).

• There needs to be a linear relationship between the dependent and independent variables.

• There should be no significant outliers.

• Observations should be independent; this can be checked using the Durbin-Watson statistic, which is simple to obtain in SPSS Statistics.

• Data should be homoscedastic (the residuals have constant variance)

• The residuals (errors) of the regression line are approximately normally distributed

Inferential Statistics…

• Linear Regression Analysis

– To run regression analysis go to

Analyze> Regression> Linear…

129

Inferential Statistics…

• Linear Regression Analysis

– A company wants to know how job performance (perf) relates to IQ (iq),

motivation (mot) and social support (soc).


Inferential Statistics…

• Linear Regression Analysis


• 65.4% of the total variation in performance is explained by iq, mot and soc.

• The overall F test shows that the model is significant.

• All predictors are significant.

• A one-unit increase in IQ is associated with a 0.265-unit increase in performance, holding the other predictors constant.
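The quantities read from this output (a slope and the share of variation explained) can be illustrated with a least-squares fit. The sketch below uses a single predictor for brevity, whereas the slide's model has three (iq, mot, soc); the data and the function name fit_line are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x; returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R squared = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return slope, intercept, 1 - ss_res / ss_tot

# Hypothetical IQ and performance scores, for illustration only.
iq = [95, 100, 105, 110, 115, 120, 125, 130]
perf = [68, 72, 71, 78, 80, 79, 86, 88]

slope, intercept, r_squared = fit_line(iq, perf)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, R^2 = {r_squared:.3f}")
```

The slope is read the same way as the unstandardized coefficient in the SPSS Coefficients table: the expected change in the dependent variable per one-unit change in the predictor.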

Inferential Statistics…

• Binary Logistic Regression Analysis

– Predicts the probability that an observation falls into one of two

categories of a dichotomous dependent variable based on one or

more independent variables

– Suppose that we are interested in the factors that influence whether a

political candidate wins an election.

• The outcome (response) variable is binary (0/1); win or

lose.

• The predictor variables of interest are

– the amount of money spent on the campaign,

– the amount of time spent campaigning negatively and

– whether or not the candidate is an incumbent.
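As a rough illustration of how a fitted logistic model turns predictor values into a probability, the sketch below applies the logistic function to a linear combination of the three predictors from the election example. All coefficient values and names here are hypothetical; they are not estimates from any data set.

```python
import math

def logistic(z):
    """Logistic (inverse logit) function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def win_probability(money, negative_time, incumbent, coef):
    """Predicted probability of winning for one candidate."""
    z = (
        coef["intercept"]
        + coef["money"] * money
        + coef["negative_time"] * negative_time
        + coef["incumbent"] * incumbent
    )
    return logistic(z)

# Hypothetical coefficients on the log-odds scale.
coef = {"intercept": -1.0, "money": 0.02, "negative_time": -0.05, "incumbent": 0.8}

p = win_probability(money=50, negative_time=10, incumbent=1, coef=coef)
print("predicted probability of winning:", round(p, 3))
```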


Inferential Statistics…

• Binary Logistic Regression Analysis

– Assumptions

• The dependent variable should be measured on a dichotomous scale.

• There should be one or more independent variables, which can be either continuous or categorical.

• Observations should be independent, and the dependent variable should have mutually exclusive and exhaustive categories.

• There should be a linear relationship between any continuous independent variables and the logit transformation of the dependent variable.


Inferential Statistics…

• Binary Logistic Regression Analysis…

– To run binary logistic regression go to

Analyze >Regression >Binary Logistic…


Inferential Statistics…

• Binary Logistic Regression Analysis…

– Use the job_performance data to fit a logistic regression

– Convert the perf variable into a categorical variable (perfcat) using a cutoff point of 78


$$\text{perfcat} = \begin{cases} \text{bad}, & \text{perf} < 78 \\ \text{good}, & \text{perf} \ge 78 \end{cases}$$
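The recoding of perf into perfcat at the cutoff of 78 can be sketched as follows. This is an illustration in Python rather than the SPSS Recode dialog; placing a score of exactly 78 in the "good" category is an assumption, and the scores are hypothetical.

```python
CUTOFF = 78  # cutoff point from the slide

def recode_perf(perf, cutoff=CUTOFF):
    """Return 'good' for scores at or above the cutoff, 'bad' otherwise.

    Assumption: a score equal to the cutoff counts as 'good'.
    """
    return "good" if perf >= cutoff else "bad"

# Hypothetical performance scores.
scores = [65, 78, 81, 77.5, 90]
perfcat = [recode_perf(s) for s in scores]
print(perfcat)
```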


Inferential Statistics…

• Binary Logistic Regression Analysis…


• The model fits well.

• The variation in the dependent variable explained by the model ranges from 24.0% (Cox & Snell R²) to 33.0% (Nagelkerke R²).
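The two percentages correspond to the pseudo R² measures that SPSS lists in the logistic regression Model Summary table (Cox & Snell and Nagelkerke). A minimal sketch of how both follow from the log-likelihoods of the intercept-only and fitted models is given below; the log-likelihood values and sample size are hypothetical, not taken from the job_performance output.

```python
import math

def cox_snell_r2(ll_null, ll_model, n):
    """Cox & Snell pseudo R^2 from the null and fitted log-likelihoods."""
    return 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)

def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke pseudo R^2: Cox & Snell rescaled so that its maximum is 1."""
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell_r2(ll_null, ll_model, n) / max_cox_snell

# Hypothetical values: 100 cases, intercept-only and fitted log-likelihoods.
n = 100
ll_null = -100 * math.log(2)  # intercept-only model with a 50/50 outcome split
ll_model = -50.0

print("Cox & Snell R^2:", round(cox_snell_r2(ll_null, ll_model, n), 3))
print("Nagelkerke R^2: ", round(nagelkerke_r2(ll_null, ll_model, n), 3))
```

Nagelkerke R² is never smaller than Cox & Snell R², which is why the pair is usually reported as a range.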

Inferential Statistics…

Binary Logistic Regression Analysis…


• The model classifies 76.7% of cases correctly.

• Statistical significance is reported for each of the independent variables.

• A one-unit increase in the soc score multiplies the odds of getting a good performance evaluation by 1.038.
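Because the factor 1.038 is an odds ratio (the exponentiated coefficient), changes of more than one unit multiply rather than add. The sketch below shows the arithmetic; only the value 1.038 comes from the slide, and the helper name is hypothetical.

```python
import math

odds_ratio_soc = 1.038            # odds ratio for soc, as read from the slide
b_soc = math.log(odds_ratio_soc)  # corresponding coefficient on the log-odds scale

def odds_multiplier(b, delta):
    """Factor by which the odds change when the predictor increases by delta units."""
    return math.exp(b * delta)

for delta in (1, 5, 10):
    print(f"soc + {delta:>2}: odds multiplied by {odds_multiplier(b_soc, delta):.3f}")
```

A 10-point increase in soc therefore multiplies the odds by 1.038 raised to the power 10, not by 10 times 1.038.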

Inferential Statistics…

Multinomial Logistic Regression Analysis

• Used to predict a nominal dependent variable given one or more

independent variables

• An extension of binary logistic regression

• Example: students entering high school choose among a general program, a vocational program and an academic program.

– Their choice might be modeled using their writing score and their socioeconomic status.


Inferential Statistics…

Multinomial Logistic Regression Analysis…

• Assumptions

– The dependent variable should be measured at the nominal level

– There should be one or more independent variables that are continuous, ordinal or nominal

– Observations should be independent, and the dependent variable should have mutually exclusive and exhaustive categories

– There should be no multicollinearity

– There needs to be a linear relationship between any continuous independent

variables and the logit transformation of the dependent variable.

– There should be no outliers, high leverage values or highly influential points


Inferential Statistics…

Multinomial Logistic Regression Analysis…

• To run multinomial logistic regression go to

Analyze > Regression > Multinomial Logistic…


Inferential Statistics…

Multinomial Logistic Regression Analysis…

• Students entering high school choose among a general program, a vocational program and an academic program. Their choice might be modeled using their writing score and their socioeconomic status.

• Use the multidemo data


Inferential Statistics…

Multinomial Logistic Regression Analysis…


Inferential Statistics…

Multinomial Logistic Regression Analysis…


Descriptive percentage for categorical variables

Adding ses and write improves the model for explaining the type of program chosen

Proportion of variation explained

Inferential Statistics…

Multinomial Logistic Regression Analysis…


• Tests for the overall effect of ses

and write

• The effects are statistically

significant


Inferential Statistics…

Multinomial Logistic Regression Analysis…

• A one-unit increase in the variable write is associated with a 0.058 decrease in the relative log odds of being in the general program versus the academic program.

• A one-unit increase in the variable write is associated with a 0.1136 decrease in the relative log odds of being in the vocational program versus the academic program.

• The relative log odds of being in the general program versus the academic program increase by 1.163 when moving from the highest level of ses (ses = 3) to the lowest level of ses (ses = 1).
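Log-odds coefficients are often easier to communicate after exponentiating them into relative odds ratios against the reference category (academic program). The sketch below does this for the three coefficients quoted above; the dictionary labels are chosen for illustration.

```python
import math

# Coefficients on the log-odds scale, as quoted on the slide
# (reference category: academic program).
log_odds = {
    "write, general vs academic": -0.058,
    "write, vocational vs academic": -0.1136,
    "ses 3 to ses 1, general vs academic": 1.163,
}

# Exponentiating a log-odds coefficient gives the multiplicative change in the relative odds.
ratios = {label: math.exp(b) for label, b in log_odds.items()}

for label, ratio in ratios.items():
    print(f"{label}: relative odds multiplied by {ratio:.3f}")
```

A ratio below 1 means the relative odds of that program versus academic fall as the predictor rises; a ratio above 1 means they rise.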
