lecture-3emrulmahmud.weebly.com/uploads/5/2/4/2/52421679/lecture... · 2019. 10. 4. · md....


Md. Abdullah Al Mahmud Senior Lecturer

Manarat International University

Lecture-3

Cross-tabulation Tables

Cross-tabulation tables (contingency tables) display the relationship between two or more categorical (nominal or ordinal) variables. The size of the table is determined by the number of distinct values for each variable, with each cell in the table representing a unique combination of values. Numerous statistical tests are available to determine whether there is a relationship between the variables in a table.

What factors affect the products that people buy? The most obvious is probably how much money people have to spend. In this example, we'll examine the relationship between income level and PDA (personal digital assistant) ownership.

Using the file demo.sav:

► From the menus choose:

Analyze > Descriptive Statistics > Crosstabs...

► Select Income category in thousands (inccat) as the row variable.

► Select Owns PDA (ownpda) as the column variable.

► Click OK to run the procedure.


The cells of the table show the count or number of cases for each joint combination of values. For example, 455 people in the income range $25,000–$49,000 own PDAs.

None of the numbers in this table, however, stands out in a way that indicates an obvious relationship between the variables.

Counts vs. Percentages

It is often difficult to analyze a cross-tabulation by looking only at the raw counts in each cell.

The fact that there are more than twice as many PDA owners in the $25,000–$49,000 income category as in the under $25,000 category may not mean much (or anything), since there are also more than twice as many people in that income category.

► Open the Crosstabs dialog box again. (The two variables should still be selected.)

► You can use the Dialog Recall button on the toolbar to quickly return to recently used procedures.

► Click Cells.

► Click (check) Row in the Percentages group.

► Click Continue and then click OK in the main dialog box to run the procedure.


A clearer picture now starts to emerge. The percentage of people who own PDAs rises as the income category rises.
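The difference between counts and row percentages can also be sketched outside SPSS. The following Python example is a minimal illustration using only the standard library. The records are made up for the example and are not the demo.sav data, and the function names `crosstab` and `row_percentages` are hypothetical helpers.

```python
from collections import Counter

# Hypothetical (income category, owns PDA) records, illustrative only.
# These are NOT the demo.sav data.
records = (
    [("Under $25", "No")] * 90 + [("Under $25", "Yes")] * 10
    + [("$25 - $49", "No")] * 160 + [("$25 - $49", "Yes")] * 40
    + [("$50 - $74", "No")] * 70 + [("$50 - $74", "Yes")] * 30
)


def crosstab(pairs):
    """Count how many cases fall in each (row, column) combination."""
    return Counter(pairs)


def row_percentages(counts):
    """Convert cell counts to percentages of their row total."""
    row_totals = Counter()
    for (row, _col), n in counts.items():
        row_totals[row] += n
    return {cell: 100.0 * n / row_totals[cell[0]] for cell, n in counts.items()}


counts = crosstab(records)
percents = row_percentages(counts)

for row in ["Under $25", "$25 - $49", "$50 - $74"]:
    owners = counts[(row, "Yes")]
    share = percents[(row, "Yes")]
    print(f"{row}: {owners} owners, {share:.1f}% of the row")
```

In this made-up table the middle income category has the most owners by count (40), but the ownership rate is highest in the top category (30%), which only the row percentages reveal.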

Significance Testing for Cross-tabulations

The purpose of a cross-tabulation is to show the relationship (or lack thereof) between two variables. Although there appears to be some relationship between the two variables, is there any reason to believe that the differences in PDA ownership between income categories are anything more than random variation?

A number of tests are available to determine whether the relationship between two cross-tabulated variables is significant. One of the more common tests is chi-square. One of its advantages is that it can be applied to almost any kind of categorical data, whether nominal or ordinal.

► Open the Crosstabs dialog box again.


► Click Statistics.

► Click (check) Chi-square.

► Click Continue and then click OK in the main dialog box to run the procedure.

Pearson chi-square tests the hypothesis that the row and column variables are independent. The actual value of the statistic isn't very informative by itself.


The significance value (Asymp. Sig.) has the information we're looking for. The lower the significance value, the less compatible the data are with the hypothesis that the two variables are independent (unrelated).

In this case, the significance value is so low that it is displayed as .000 (that is, it is smaller than 0.0005), which suggests that the two variables are, indeed, related.
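To make the chi-square test less of a black box, the calculation can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the 3x2 table of counts is hypothetical (not the demo.sav output), `pearson_chi_square` is an illustrative helper name, and the p-value shortcut used here is valid only when the table has 2 degrees of freedom.

```python
import math

# Hypothetical 3x2 table of counts (rows: income category; columns: No, Yes).
# Illustrative numbers only, NOT the demo.sav results.
observed = [
    [90, 10],
    [160, 40],
    [70, 30],
]


def pearson_chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


stat, df = pearson_chi_square(observed)

# For df = 2 the chi-square upper-tail probability has the closed form exp(-x/2).
# This shortcut does not hold for other degrees of freedom.
assert df == 2
p_value = math.exp(-stat / 2)
print(f"chi-square = {stat:.3f}, df = {df}, p = {p_value:.5f}")
```

For these made-up counts the statistic is 12.5 with 2 degrees of freedom and the p-value is roughly 0.002, so independence would be rejected at the 5% level.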

You can add a layer variable to create a three-way table in which categories of the row and column variables are further subdivided by categories of the layer variable.

This variable is sometimes referred to as the control variable because it may reveal how the relationship between the row and column variables changes when you "control" for the effects of the third variable.

► Open the Crosstabs dialog box again.

► Click Cells.


► Uncheck Row in the Percentages group.

► Click Continue.

► Select Level of Education (ed) as the layer variable.

► Click OK to run the procedure.

If you look at the cross-tabulation table, it might appear that the only thing we have accomplished is to make the table larger and harder to interpret.

But if you look at the table of chi-square statistics, you can easily see that in all but one of the education categories, the apparent relationship between income and PDA ownership disappears (typically, a significance value less than 0.05 is considered "significant").


This suggests that the apparent relationship between income and PDA ownership is merely an artifact of the underlying relationship between education level and PDA ownership.

Since income tends to rise as education rises, apparent relationships between income and other variables may actually be the result of differences in education.
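The effect of a layer (control) variable can be illustrated with a small, deliberately idealized Python sketch. The counts below are hypothetical and were chosen so that PDA ownership depends on education only; the layer names and the helper `chi_square_2x2` are assumptions made for the example, and the p-value formula applies only to 2x2 tables (1 degree of freedom).

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (df = 1) for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # For df = 1 the upper-tail probability equals erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


# Hypothetical counts, one 2x2 table per education level.
# Rows: lower income, higher income. Columns: no PDA, owns PDA.
layers = {
    "No degree": [[90, 10], [18, 2]],  # 10% ownership in both income groups
    "Degree": [[12, 8], [60, 40]],     # 40% ownership in both income groups
}

# Ignoring the layer variable means adding the tables cell by cell.
pooled = [
    [sum(table[i][j] for table in layers.values()) for j in range(2)]
    for i in range(2)
]

pooled_stat, pooled_p = chi_square_2x2(pooled)
print(f"Pooled: chi-square = {pooled_stat:.2f}, p = {pooled_p:.4f}")

for name, table in layers.items():
    stat, p = chi_square_2x2(table)
    print(f"{name}: chi-square = {stat:.2f}, p = {p:.4f}")
```

In the pooled table income and ownership look clearly related (chi-square = 12.8), yet within each education level the statistic is 0. Real data are rarely this clean, but the mechanism is the same: higher-income cases are concentrated in the education level with the higher ownership rate.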

Correlation analysis

Correlation describes the direction and strength of the linear relationship between two variables.

Simple Correlation Coefficient (r):

A quantitative measure of the direction and strength of the linear relationship between two variables.


Karl Pearson’s correlation coefficient (r) is defined as:

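$$
r = \frac{\operatorname{Cov}(X, Y)}{\sqrt{V(X)\,V(Y)}} = \frac{\sum XY - n\bar{X}\bar{Y}}{\sqrt{\left[\sum X^{2} - n\bar{X}^{2}\right]\left[\sum Y^{2} - n\bar{Y}^{2}\right]}}
$$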

Pearson’s correlation coefficient (r) is a measure of linear association. If the relationship between two variables is not linear, r is not an appropriate statistic for measuring their association.

Bivariate Data:

Bivariate data are measurements on two variables taken on the same individual; let’s call the variables X and Y.

Example: The height (X) and weight (Y) of each person in a group.

Hypothesis Test for Correlation:

Null Hypothesis: $H_0: \rho = 0$

Alternative Hypothesis: $H_1: \rho \neq 0$

Or, $H_1: \rho > 0$ or, $H_1: \rho < 0$

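P-value:

The P-value (probability value) is the probability, assuming the null hypothesis is true, of obtaining a result at least as extreme as the one observed.

If the P-value < 0.05, we reject the null hypothesis at the 5% level of significance.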

To calculate a simple correlation matrix, one must use [Statistics => Correlate => Bivariate...].

To begin, enter the data as follows:

IQ     GPA
102    2.75
108    4.00
109    2.25
118    3.00
79     1.67
88     2.25


Simple Correlation

Click on [Statistics => Correlate => Bivariate...], then select and move "IQ" and "GPA" to the Variables: list. [Explore the options presented on this controlling dialog box.]

Click on [OK] to generate the requested statistics.


The results from the output window should look like the following:

Correlations

                                 IQ      GPA
IQ     Pearson Correlation       1       .669
       Sig. (2-tailed)                   .147
       N                         6       6
GPA    Pearson Correlation       .669    1
       Sig. (2-tailed)           .147
       N                         6       6

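As you can see, r = 0.669, a moderately strong positive correlation in the sample. However, Sig. (2-tailed) = .147 is greater than 0.05, so with only six observations the correlation is not statistically significant at the 5% level.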
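The values in the output table can be checked by hand. The Python sketch below applies the computational formula for r to the six IQ and GPA pairs and then obtains the two-tailed p-value from the statistic t = r·sqrt(n-2)/sqrt(1-r²) with n-2 degrees of freedom. The function names and the numerical integration of the t density (Simpson's rule) are choices made for this illustration so that only the standard library is needed; a statistics package would normally be used instead, and this is not necessarily how SPSS computes the value internally.

```python
import math

iq = [102, 108, 109, 118, 79, 88]
gpa = [2.75, 4.00, 2.25, 3.00, 1.67, 2.25]


def pearson_r(x, y):
    """Pearson's r from the computational formula (sums of squares and products)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum(a * b for a, b in zip(x, y)) - n * mean_x * mean_y
    sxx = sum(a * a for a in x) - n * mean_x ** 2
    syy = sum(b * b for b in y) - n * mean_y ** 2
    return sxy / math.sqrt(sxx * syy)


def two_tailed_p(r, n, steps=2000):
    """Two-tailed p-value for H0: rho = 0 (assumes |r| < 1 and an even `steps`)."""
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def density(u):
        """Student t density with df degrees of freedom."""
        return const * (1 + u * u / df) ** (-(df + 1) / 2)

    # Simpson's rule for the probability between 0 and t.
    h = t / steps
    area = density(0.0) + density(t)
    for k in range(1, steps):
        area += (4 if k % 2 else 2) * density(k * h)
    area *= h / 3
    return 1 - 2 * area


r = pearson_r(iq, gpa)
p = two_tailed_p(r, len(iq))
print(f"r = {r:.3f}, p = {p:.3f}")
```

The sketch gives r ≈ 0.669 and p ≈ 0.147, in agreement with the Pearson Correlation and Sig. (2-tailed) entries in the table. Because 0.147 exceeds 0.05, the null hypothesis of zero correlation is not rejected at the 5% level.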

Illustrative Example (Pearson’s correlation coefficient):

Let us consider the following data set:

Sales experience (in years), X    Annual sales volume (in Tk. '000), Y
1                                 80
3                                 97
4                                 92
4                                 102
6                                 103
7                                 98
8                                 119
10                                123
11                                110
13                                125


Find the correlation coefficient between the years of experience of the salespersons and the annual sales volume.

Solution:

First enter the values of the variables experience (X) and sales (Y) in the SPSS data sheet. From the menu bar choose:

Analyze > Correlate > Bivariate…

Select the variables exper and sales and move them into the Variables box.

The following options are available:

For quantitative normally distributed variables, choose the Pearson correlation coefficient.

If your data are not normally distributed or have ordered categories, choose Kendall’s tau or Spearman, which measure the association between rank orders.
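To show what "association between rank orders" means, the sketch below computes Spearman's coefficient for the sales experience data of the illustrative example by ranking both variables and applying Pearson's formula to the ranks. It is a minimal standard-library illustration; `average_ranks` and `pearson_r` are hypothetical helper names, and tied values (the two salespersons with 4 years of experience) receive the average of their ranks.

```python
import math

experience = [1, 3, 4, 4, 6, 7, 8, 10, 11, 13]          # years, X
sales = [80, 97, 92, 102, 103, 98, 119, 123, 110, 125]  # Tk. '000, Y


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson_r(x, y):
    """Pearson's r from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Spearman's rho is Pearson's r computed on the ranks.
rho = pearson_r(average_ranks(experience), average_ranks(sales))
print(f"Spearman rho = {rho:.3f}")
```

Because only the ordering of the values enters the calculation, the result is unaffected by how far apart the sales figures are, which is why rank-based coefficients suit ordinal or non-normal data.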

Comment:

The correlation coefficient (r) ranges from -1 to +1. r = -1 means a perfect negative linear relationship, r = +1 means a perfect positive linear relationship, and r = 0 means no linear relationship.

Test of significance:

You can select Two-tailed or One-tailed. If the direction of association is known in advance, select One-tailed; otherwise select Two-tailed.

Flag significant correlations:

Correlation coefficients that are significant at the 5% level are identified with a single asterisk, and those significant at the 1% level with a double asterisk.


At the bottom right of the dialog box you will see a button named Options... Clicking it opens another dialog box named Bivariate Correlation Options.

Bivariate Correlation Options:

Statistics – you can choose one or both of the following:

Means and standard deviations: Displayed for each variable.

Cross-product deviations and covariances: Displayed for each pair of variables. The cross-product of deviations is equal to the sum of the products of the mean-corrected variables. This is the numerator of Pearson’s correlation coefficient. The covariance is an unstandardized measure of the relationship between two variables, equal to the cross-product deviation divided by N-1.

Missing Values – you can choose one of the following:

Exclude cases pairwise: Cases with missing values for one or both of a pair of variables for a correlation coefficient are excluded from the analysis of that pair only.

Exclude cases listwise: Cases with missing values for any variable are excluded from all correlations.
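The practical difference between the two missing-value options is the number of cases each correlation is based on. The short Python sketch below counts usable cases under both rules; the five records and the variable names x, y and z are hypothetical, and `pairwise_n` and `listwise_rows` are illustrative helper names.

```python
# Hypothetical records for three variables; None marks a missing value.
cases = [
    {"x": 1.0, "y": 2.0, "z": 5.0},
    {"x": 2.0, "y": None, "z": 4.0},
    {"x": 3.0, "y": 6.0, "z": None},
    {"x": 4.0, "y": 8.0, "z": 2.0},
    {"x": None, "y": 9.0, "z": 1.0},
]


def pairwise_n(rows, a, b):
    """Number of cases usable for the (a, b) correlation: both values present."""
    return sum(1 for row in rows if row[a] is not None and row[b] is not None)


def listwise_rows(rows):
    """Cases with no missing value on any variable."""
    return [row for row in rows if all(value is not None for value in row.values())]


for a, b in [("x", "y"), ("x", "z"), ("y", "z")]:
    print(f"pairwise N for {a}-{b}: {pairwise_n(cases, a, b)}")
print(f"listwise N for every pair: {len(listwise_rows(cases))}")
```

With these records each pair keeps three cases under pairwise exclusion, while listwise exclusion leaves only the two complete cases for every correlation. Pairwise exclusion uses more of the data, but the coefficients in one matrix may then be based on different sets of cases.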

After choosing the desired options, click Continue and then OK.

Output:

The SPSS result is as follows:

Descriptive Statistics

                             Mean      Std. Deviation    N
Years of sales experience    6.70      3.831             10
Annual sales volume          104.90    14.395            10

Correlations

                                                   Years of sales experience    Annual sales volume
Years of sales experience    Pearson Correlation   1                            .886(**)
                             Sig. (2-tailed)       .                            .001
                             N                     10                           10
Annual sales volume          Pearson Correlation   .886(**)                     1
                             Sig. (2-tailed)       .001                         .
                             N                     10                           10

** Correlation is significant at the 0.01 level (2-tailed).

Interpretation of the Result

Comment on r:

The value of the correlation coefficient is r = 0.886, which implies a strong positive linear association between years of sales experience and annual sales volume.

Comment on significance:

Here the p-value = 0.001.

Since the p-value is less than 0.01, we may reject the null hypothesis at the 1% level of significance and conclude that the population correlation coefficient is not equal to 0, i.e., there is a linear association between years of sales experience and annual sales volume.
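As a cross-check on the SPSS output, the descriptive statistics, the cross-product of deviations, the covariance and r can be recomputed from the ten data pairs. The Python sketch below uses only the standard library; the variable names are chosen for the illustration, and the comparison with the critical value 3.355 (two-tailed 1% point of the t distribution with 8 degrees of freedom) is a table-lookup shortcut rather than an exact p-value.

```python
import math
import statistics

experience = [1, 3, 4, 4, 6, 7, 8, 10, 11, 13]          # years, X
sales = [80, 97, 92, 102, 103, 98, 119, 123, 110, 125]  # Tk. '000, Y

n = len(experience)
mean_x, mean_y = statistics.mean(experience), statistics.mean(sales)
# statistics.stdev uses the N - 1 divisor (sample standard deviation).
sd_x, sd_y = statistics.stdev(experience), statistics.stdev(sales)

# Sum of cross-product deviations: the numerator of Pearson's r.
cross_product = sum((x - mean_x) * (y - mean_y) for x, y in zip(experience, sales))
covariance = cross_product / (n - 1)
r = covariance / (sd_x * sd_y)

# Test statistic for H0: rho = 0, with n - 2 = 8 degrees of freedom.
t = r * math.sqrt((n - 2) / (1 - r ** 2))

print(f"means: {mean_x:.2f}, {mean_y:.2f}")
print(f"standard deviations: {sd_x:.3f}, {sd_y:.3f}")
print(f"cross-product = {cross_product:.1f}, covariance = {covariance:.3f}")
print(f"r = {r:.3f}, t = {t:.2f}")
```

The means (6.70 and 104.90), standard deviations (3.831 and 14.395) and r (0.886) agree with the output tables. The t statistic of about 5.4 is well above 3.355, which is consistent with the correlation being flagged as significant at the 0.01 level.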