lecture-3emrulmahmud.weebly.com/uploads/5/2/4/2/52421679/lecture... · 2019. 10. 4. · md....
Md. Abdullah Al Mahmud Senior Lecturer
Manarat International University
Lecture-3
Cross-tabulation Tables
Cross-tabulation tables (contingency tables) display the relationship between two or more categorical (nominal or ordinal) variables. The size of the table is determined by the number of distinct values for each variable, with each cell in the table representing a unique combination of values. Numerous statistical tests are available to determine whether there is a relationship between the variables in a table.
What factors affect the products that people buy? The most obvious is probably how much money people have to spend. In this example, we'll examine the relationship between income level and PDA (personal digital assistant) ownership.
Using the file demo.sav:
► From the menus choose:
Analyze > Descriptive Statistics > Crosstabs...
► Select Income category in thousands (inccat) as the row variable.
► Select Owns PDA (ownpda) as the column variable.
► Click OK to run the procedure.
The cells of the table show the count or number of cases for each joint combination of values. For example, 455 people in the income range $25,000–$49,000 own PDAs.
None of the numbers in this table, however, stands out in a way that would indicate an obvious relationship between the variables.
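As a rough illustration of how the cell counts arise, the sketch below tallies joint combinations of two categorical variables from raw case records. The records are hypothetical and are not taken from demo.sav; the names `records`, `cell_counts`, and `table` are chosen only for this example.

```python
from collections import Counter

# Hypothetical case records: (income category, owns PDA). Not from demo.sav.
records = [
    ("Under $25", "No"), ("Under $25", "No"), ("Under $25", "Yes"),
    ("$25 - $49", "No"), ("$25 - $49", "Yes"), ("$25 - $49", "Yes"),
    ("$50 - $74", "Yes"),
]

# Each cell is the number of cases sharing one (row value, column value) pair.
cell_counts = Counter(records)

# Distinct values of each variable, in order of first appearance.
row_labels = list(dict.fromkeys(row for row, _ in records))
col_labels = list(dict.fromkeys(col for _, col in records))

# The table has len(row_labels) x len(col_labels) cells; empty cells count as 0.
table = [[cell_counts[(r, c)] for c in col_labels] for r in row_labels]

print(f"{'':<10}", col_labels)
for label, counts in zip(row_labels, table):
    print(f"{label:<10}", counts)
```

The table size follows directly from the number of distinct values of each variable: three income categories by two ownership values gives six cells here.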
Counts vs. Percentages
It is often difficult to analyze a cross-tabulation just by looking at the raw counts in each cell.
The fact that there are more than twice as many PDA owners in the $25,000–$49,000 income category as in the under $25,000 category may not mean much (or anything), since there are also more than twice as many people in that income category.
► Open the Crosstabs dialog box again. (The two variables should still be selected.)
► You can use the Dialog Recall button on the toolbar to quickly return to recently used procedures.
► Click Cells.
► Click (check) Row in the Percentages group.
► Click Continue and then click OK in the main dialog box to run the procedure.
A clearer picture now starts to emerge. The percentage of people who own PDAs rises as the income category rises.
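The effect of requesting row percentages can be sketched outside SPSS as follows. The counts are hypothetical (the original output table is not reproduced in this transcript), and `row_percentages` is a helper name introduced only for this illustration.

```python
# Hypothetical counts. Rows: income category; columns: [no PDA, owns PDA].
counts = {
    "Under $25": [900, 100],
    "$25 - $49": [1800, 300],
    "$50 - $74": [700, 200],
}


def row_percentages(row):
    """Convert one row of counts to percentages of that row's total."""
    total = sum(row)
    return [100.0 * cell / total for cell in row]


row_pct = {label: row_percentages(row) for label, row in counts.items()}

for label, pct in row_pct.items():
    print(f"{label:<10} no: {pct[0]:5.1f}%   yes: {pct[1]:5.1f}%")
```

Dividing by the row total removes the effect of unequal group sizes, so the ownership rates of the income categories can be compared directly.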
Significance Testing for Cross-tabulations
The purpose of a cross-tabulation is to show the relationship (or lack thereof) between two variables. Although there appears to be some relationship between the two variables, is there any reason to believe that the differences in PDA ownership between income categories are anything more than random variation?
A number of tests are available to determine whether the relationship between two cross-tabulated variables is significant. One of the more common tests is chi-square. One of the advantages of chi-square is that it can be applied to almost any kind of categorical data, nominal or ordinal.
► Open the Crosstabs dialog box again.
► Click Statistics.
► Click (check) Chi-square.
► Click Continue and then click OK in the main dialog box to run the procedure.
Pearson chi-square tests the hypothesis that the row and column variables are independent. The actual value of the statistic isn't very informative on its own.
The significance value (Asymp. Sig.) has the information we're looking for. The lower the significance value, the less likely it is that the two variables are independent (unrelated).
In this case, the significance value is so low that it is displayed as .000 (that is, it is smaller than .0005), which suggests that the two variables are, indeed, related.
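For readers who want to see what lies behind the Pearson chi-square row of the output, here is a minimal sketch using only the Python standard library. The observed counts are hypothetical, and the function names `chi_square_sf` and `chi_square_independence` are invented for this example; SPSS's own implementation is not shown in the lecture.

```python
import math


def chi_square_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        k, q = 2, math.exp(-half)              # exact tail for df = 2
    else:
        k, q = 1, math.erfc(math.sqrt(half))   # exact tail for df = 1
    # Recurrence: Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


def chi_square_independence(table):
    """Return (statistic, degrees of freedom, p-value) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, chi_square_sf(statistic, df)


# Hypothetical 2 x 2 table of counts (not the demo.sav output).
observed = [[10, 20], [20, 10]]
chi2, df, p_value = chi_square_independence(observed)
print(f"chi-square = {chi2:.3f}, df = {df}, p = {p_value:.4f}")
```

The p-value plays the role of Asymp. Sig. in the SPSS table: a small value means the observed counts would be unusual if the two variables were independent.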
You can add a layer variable to create a three-way table in which categories of the row and column variables are further subdivided by categories of the layer variable.
This variable is sometimes referred to as the control variable because it may reveal how the relationship between the row and column variables changes when you "control" for the effects of the third variable.
► Open the Crosstabs dialog box again.
► Click Cells.
► Deselect (uncheck) Row in the Percentages group.
► Click Continue.
► Select Level of Education (ed) as the layer variable.
► Click OK to run the procedure.
If you look at the cross-tabulation table, it might appear that the only thing we have accomplished is to make the table larger and harder to interpret.
But if you look at the table of chi-square statistics, you can easily see that in all but one of the education categories, the apparent relationship between income and PDA ownership disappears (typically, a significance value less than 0.05 is considered "significant").
This suggests that the apparent relationship between income and PDA ownership is merely an artifact of the underlying relationship between education level and PDA ownership.
Since income tends to rise as education rises, apparent relationships between income and other variables may actually be the result of differences in education.
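The way a layer variable can make an apparent relationship vanish can be sketched with made-up numbers. In the hypothetical counts below, PDA ownership does not depend on income within either education group, yet the pooled table shows a clear association because education is related to both variables. The layer names, counts, and the helper `chi_square_statistic` are assumptions made for this illustration only.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(
        (observed - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(table)
        for j, observed in enumerate(row)
    )


# Hypothetical counts. Rows: [lower income, higher income];
# columns: [owns PDA, no PDA].
layers = {
    "No degree": [[8, 72], [2, 18]],     # 10% ownership in both income groups
    "Degree": [[10, 10], [40, 40]],      # 50% ownership in both income groups
}

# Ignoring the layer variable adds the layer tables cell by cell.
pooled = [[sum(cells) for cells in zip(*rows)] for rows in zip(*layers.values())]

CRITICAL_05_DF1 = 3.841  # chi-square critical value, df = 1, 5% level

layer_stats = {name: chi_square_statistic(t) for name, t in layers.items()}
pooled_stat = chi_square_statistic(pooled)

for name, stat in layer_stats.items():
    print(f"{name:<10} chi-square = {stat:.3f}  significant: {stat > CRITICAL_05_DF1}")
print(f"{'Pooled':<10} chi-square = {pooled_stat:.3f}  significant: {pooled_stat > CRITICAL_05_DF1}")
```

Within each layer the statistic is zero, while the pooled table exceeds the critical value, which mirrors the pattern described for income, education, and PDA ownership.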
Correlation analysis
Correlation describes the direction and strength of the linear relationship between two variables.
Simple Correlation Coefficient (r):
A quantitative measure of the direction and strength of a linear relationship.
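Karl Pearson's correlation coefficient (r) is defined as:
$$
r = \frac{\operatorname{Cov}(X, Y)}{\sqrt{V(X)\,V(Y)}}
  = \frac{\sum XY - n\bar{X}\bar{Y}}{\sqrt{\left[\sum X^2 - n\bar{X}^2\right]\left[\sum Y^2 - n\bar{Y}^2\right]}}
$$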
Pearson's correlation coefficient (r) is a measure of linear association. If the relationship between the variables is not linear, r is not an appropriate statistic for measuring their association.
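The following sketch computes r from raw sums (sum of XY, sum of X squared, sum of Y squared, and the two means) and applies it to two small artificial data sets, one exactly linear and one exactly quadratic. The data and the function name `pearson_r` are chosen for illustration only.

```python
import math


def pearson_r(x, y):
    """Pearson's r computed from raw sums and means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    numerator = sum(a * b for a, b in zip(x, y)) - n * mean_x * mean_y
    ss_x = sum(a * a for a in x) - n * mean_x ** 2
    ss_y = sum(b * b for b in y) - n * mean_y ** 2
    return numerator / math.sqrt(ss_x * ss_y)


x = [-2, -1, 0, 1, 2]
linear = [2 * v + 1 for v in x]   # points on an exact straight line
curved = [v * v for v in x]       # points on an exact parabola, symmetric about 0

r_linear = pearson_r(x, linear)
r_curved = pearson_r(x, curved)
print(f"linear: r = {r_linear:.3f}")
print(f"curved: r = {r_curved:.3f}")
```

In the second data set Y is completely determined by X, yet r comes out as zero, which is why r should be read as a measure of linear association only.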
Bivariate Data:
Bivariate data: Data with measurements on two variables for the same individual; let's call them X and Y.
Example: The height (X) and weight (Y) of a group of people.
Hypothesis Test for Correlation:
Null Hypothesis: $H_0: \rho = 0$, where $\rho$ is the population correlation coefficient
Alternative Hypothesis: $H_1: \rho \neq 0$
Or, $H_1: \rho > 0$ or, $H_1: \rho < 0$
P-value:
P-value = probability value
If the P-value < 0.05, we say that the null hypothesis is rejected at the 5% level of significance.
To calculate a simple correlation matrix, use [Statistics => Correlate => Bivariate...].
To begin, enter the data as follows:
IQ     GPA
102    2.75
108    4.00
109    2.25
118    3.00
79     1.67
88     2.25
Simple Correlation
Click on [Statistics => Correlate => Bivariate...], then select and move "IQ" and "GPA" to the Variables: list. [Explore the options presented on this controlling dialog box.]
Click on [OK] to generate the requested statistics.
The results in the output window should look like the following:
Correlations
                              IQ      GPA
IQ    Pearson Correlation     1       .669
      Sig. (2-tailed)                 .147
      N                       6       6
GPA   Pearson Correlation     .669    1
      Sig. (2-tailed)         .147
      N                       6       6
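As you can see, r = 0.669. However, the two-tailed significance value is .147, which is greater than 0.05, so with only six cases the correlation is not statistically significant at the 5% level.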
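The SPSS figures for the IQ and GPA data can be cross-checked with a short standard-library sketch. It assumes the usual test of zero population correlation based on t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom, and it obtains the two-tailed p-value by numerically integrating the t density with Simpson's rule; the function names are invented for this example, and this is not necessarily how SPSS computes the value.

```python
import math


def pearson_r(x, y):
    """Pearson's r from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value for a t statistic (Simpson's rule; steps must be even)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def density(v):
        return const * (1 + v * v / df) ** (-(df + 1) / 2)

    h = upper / steps
    total = density(0.0) + density(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    area = total * h / 3              # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * area)


iq = [102, 108, 109, 118, 79, 88]
gpa = [2.75, 4.00, 2.25, 3.00, 1.67, 2.25]

n = len(iq)
r = pearson_r(iq, gpa)
t_stat = r * math.sqrt((n - 2) / (1 - r * r))
p_value = t_two_tailed_p(t_stat, n - 2)
print(f"r = {r:.3f}, t = {t_stat:.3f}, df = {n - 2}, p = {p_value:.3f}")
```

The printed r and p should land close to the .669 and .147 shown in the SPSS table; because p is above 0.05, the null hypothesis of zero correlation is not rejected for these six cases.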
Illustrative Example (Pearson’s correlation coefficient):
Let us consider the following data set:
Sales experience (in years), X    Annual sales volume (in Tk. '000), Y
1                                 80
3                                 97
4                                 92
4                                 102
6                                 103
7                                 98
8                                 119
10                                123
11                                110
13                                125
Find the correlation coefficient between the years of experience of the salespersons and the annual sales volume.
Solution:
First enter the values of the variables experience (X) and sales (Y) in the SPSS data sheet. From the menu bar choose:
Analyze > Correlate > Bivariate...
Select the variables exper and sales, then move them into the Variables box.
The following options are available:
For quantitative normally distributed variables, choose the Pearson correlation coefficient.
If your data are not normally distributed or have ordered categories, choose Kendall’s tau or Spearman, which measure the association between rank orders.
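To make "association between rank orders" concrete, the sketch below computes Spearman's coefficient for the sales data given earlier in this example as Pearson's r applied to ranks, with tied values sharing the average of their ranks. The helper names are made up for this illustration, and Kendall's tau is not shown.

```python
import math


def average_ranks(values):
    """Rank values 1..n; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1   # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson_r(x, y):
    """Pearson's r from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


experience = [1, 3, 4, 4, 6, 7, 8, 10, 11, 13]
sales = [80, 97, 92, 102, 103, 98, 119, 123, 110, 125]

rank_x = average_ranks(experience)
rank_y = average_ranks(sales)
spearman_rho = pearson_r(rank_x, rank_y)
print("ranks of X:", rank_x)
print("ranks of Y:", rank_y)
print(f"Spearman rho = {spearman_rho:.3f}")
```

Because only the ordering of the values is used, the result does not change if either variable is transformed monotonically, which is what makes rank-based coefficients suitable for ordered categories and non-normal data.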
Comment:
The range of correlation coefficient(r) is -1 to +1. r = -1 means perfect negative relationship. r = 1 means perfect positive relationship. r = 0 means no linear relationship.
Test of significance:
You can select Two-tailed or One-tailed. If the direction of association is known in advance, select One-tailed; otherwise, select Two-tailed.
Flag significant correlations:
Correlation coefficients that are significant at the 5% level are identified with a single asterisk, and those significant at the 1% level with a double asterisk.
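The flagging rule can be written as a small function. This is only a sketch of the rule as described here: the function name is invented, and the use of strict inequalities at exactly 0.05 and 0.01 is an assumption.

```python
def significance_flag(p_value):
    """Return '**' for p < 0.01, '*' for p < 0.05, and '' otherwise."""
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""


# Hypothetical (r, p) pairs used only to show the three possible outcomes.
for r, p in [(0.90, 0.001), (0.65, 0.030), (0.40, 0.200)]:
    print(f"r = {r:.2f}{significance_flag(p)}  (p = {p:.3f})")
```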
At the bottom right of the dialog box there is a button named Options... Clicking it opens another dialog box named Bivariate Correlation Options.
Bivariate Correlation Options:
Statistics – you can choose one or both of the following:
Means and standard deviations: Displayed for each variable.
Cross-product deviations and covariances: Displayed for each pair of variables. The cross-product of deviations is equal to the sum of the products of the mean-corrected variables. This is the numerator of Pearson's correlation coefficient. The covariance is an unstandardized measure of the relationship between two variables, equal to the cross-product deviation divided by N-1.
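For the sales data of this example, the two optional statistics can be computed directly. The sketch below follows the definitions just given (sum of products of mean-corrected values, then division by N-1); the variable names are chosen for illustration.

```python
import math

experience = [1, 3, 4, 4, 6, 7, 8, 10, 11, 13]                # X, years
sales = [80, 97, 92, 102, 103, 98, 119, 123, 110, 125]        # Y, Tk. '000

n = len(experience)
mean_x = sum(experience) / n
mean_y = sum(sales) / n

# Sum of the products of the mean-corrected variables.
cross_product = sum((x - mean_x) * (y - mean_y) for x, y in zip(experience, sales))

# Unstandardized measure of association: cross-product divided by N - 1.
covariance = cross_product / (n - 1)

# Dividing the cross-product by the root of the two sums of squares gives r.
ss_x = sum((x - mean_x) ** 2 for x in experience)
ss_y = sum((y - mean_y) ** 2 for y in sales)
r = cross_product / math.sqrt(ss_x * ss_y)

print(f"cross-product deviations = {cross_product:.1f}")
print(f"covariance = {covariance:.3f}")
print(f"r = {r:.3f}")
```

The covariance depends on the units of X and Y, whereas r does not, which is why r is the quantity usually reported.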
Missing Values - you can choose one of the following:
Exclude cases pairwise: Cases with missing values for one or both of a pair of variables for a correlation coefficient are excluded from analysis.
Exclude cases listwise: Cases with missing values for any variable are excluded from all correlations.
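The difference between the two missing-value options is easiest to see by counting how many cases each one keeps. The sketch below uses five hypothetical cases on three hypothetical variables (a, b, c), with None standing for a missing value; the helper names are invented for this example.

```python
from itertools import combinations

# Hypothetical cases; None marks a missing value.
cases = [
    {"a": 1.0, "b": 2.0, "c": 3.0},
    {"a": 2.0, "b": None, "c": 5.0},
    {"a": 3.0, "b": 4.0, "c": None},
    {"a": 4.0, "b": 6.0, "c": 9.0},
    {"a": None, "b": 7.0, "c": 8.0},
]
variables = ["a", "b", "c"]


def pairwise_cases(cases, x, y):
    """Keep cases that have values for both variables of this one pair."""
    return [c for c in cases if c[x] is not None and c[y] is not None]


def listwise_cases(cases, variables):
    """Keep only cases that have values for every variable in the analysis."""
    return [c for c in cases if all(c[v] is not None for v in variables)]


pairwise_n = {
    (x, y): len(pairwise_cases(cases, x, y)) for x, y in combinations(variables, 2)
}
listwise_n = len(listwise_cases(cases, variables))

for pair, n in pairwise_n.items():
    print(f"pairwise N for {pair}: {n}")
print(f"listwise N for every pair: {listwise_n}")
```

Pairwise exclusion uses as much data as possible for each coefficient, so N can differ from pair to pair; listwise exclusion gives every coefficient the same, smaller N.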
After choosing the desired options, click Continue and then OK.
Output:
The SPSS result is as follows:
Descriptive Statistics
                             Mean      Std. Deviation    N
Years of sales experience    6.70      3.831             10
Annual sales volume          104.90    14.395            10
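The descriptive statistics can be reproduced with Python's standard `statistics` module, whose `stdev` function uses the N-1 (sample) denominator. The variable names below are chosen for illustration.

```python
import statistics

experience = [1, 3, 4, 4, 6, 7, 8, 10, 11, 13]                # years
sales = [80, 97, 92, 102, 103, 98, 119, 123, 110, 125]        # Tk. '000

summary = {
    "Years of sales experience": experience,
    "Annual sales volume": sales,
}

for name, values in summary.items():
    mean = statistics.mean(values)
    sd = statistics.stdev(values)      # sample standard deviation (N - 1)
    print(f"{name:<27} mean = {mean:7.2f}  sd = {sd:7.3f}  N = {len(values)}")
```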
Correlations
                                                   Years of sales experience    Annual sales volume
Years of sales experience    Pearson Correlation   1                            .886(**)
                             Sig. (2-tailed)       .                            .001
                             N                     10                           10
Annual sales volume          Pearson Correlation   .886(**)                     1
                             Sig. (2-tailed)       .001                         .
                             N                     10                           10
** Correlation is significant at the 0.01 level (2-tailed).
Interpretation of the Result
Comment on r:
The value of the correlation coefficient is r = 0.886, which implies that there is a strong positive association between the variables years of sales experience and annual sales volume.
Comment on significance:
Here p-value = 0.001.
Since the p-value is less than 0.01, we may reject the null hypothesis at the 1% level of significance and conclude that the population correlation coefficient is not equal to 0, i.e., there is a linear association between years of sales experience and annual sales volume.
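The same conclusion can be reached with the critical-value form of the test. The sketch below assumes the usual statistic t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 = 8 degrees of freedom and compares it with 3.355, the tabulated two-tailed 1% critical value of t for 8 degrees of freedom; the variable names are chosen for illustration.

```python
import math

experience = [1, 3, 4, 4, 6, 7, 8, 10, 11, 13]                # X, years
sales = [80, 97, 92, 102, 103, 98, 119, 123, 110, 125]        # Y, Tk. '000

n = len(experience)
mean_x = sum(experience) / n
mean_y = sum(sales) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(experience, sales))
sxx = sum((x - mean_x) ** 2 for x in experience)
syy = sum((y - mean_y) ** 2 for y in sales)

r = sxy / math.sqrt(sxx * syy)

# Test statistic for H0: population correlation = 0, with n - 2 degrees of freedom.
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

T_CRITICAL_01_DF8 = 3.355   # two-tailed 1% critical value of t, df = 8 (from tables)
reject_h0 = abs(t_stat) > T_CRITICAL_01_DF8

print(f"r = {r:.3f}, t = {t_stat:.2f}, df = {n - 2}")
print("Reject H0 at the 1% level:", reject_h0)
```

Comparing |t| with the critical value is equivalent to checking whether the two-tailed p-value is below 0.01.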