chi square test for cross tab - session 9 & 10

Cross-tabulation and Chi-square test. Business Research Methodology. Dr. Gunjan Malhotra, Assistant Professor, mailforgunjan@gmail.com

Upload: meenal-surjuse

Post on 24-Mar-2015


Page 1: Chi Square Test for Cross Tab - Session 9 & 10

Cross-tabulation and Chi-square test

Business Research Methodology

Dr. Gunjan Malhotra, Assistant Professor, mailforgunjan@gmail.com

Page 2: Chi Square Test for Cross Tab - Session 9 & 10

Simple Tabulation for Ranking Type Questions – Bivariate variables

• Suppose we have ordinal scale questions.

• Q. Rank the 5 brands of refrigerators shown below on a scale of 1 to 5 (1 = Best and 5 = Worst), according to your opinion.

BRAND         RANK
Whirlpool     ___
Kelvinator    ___
Godrej        ___
Samsung       ___
Videocon      ___

Page 3: Chi Square Test for Cross Tab - Session 9 & 10

Output table formulation

Table 1
BRAND         RANK 1   RANK 2   RANK 3   RANK 4   RANK 5
Whirlpool     x        x        x        x        x
Kelvinator    x        x        x        x        x
Godrej        x        x        x        x        x
Samsung       x        x        x        x        x
Videocon      x        x        x        x        x

Page 4: Chi Square Test for Cross Tab - Session 9 & 10

Univariate tables

• For constructing univariate tables, take up one column at a time and do separate frequency tables or charts. E.g.

BRAND         No. of People who Ranked it No. 1
Whirlpool     90
Kelvinator    60
Godrej        70
Samsung       32
Videocon      45
TOTAL         297

• We can calculate the percentage of the total for each brand. E.g. 90/297 works out to .303, or 30.3% who ranked Whirlpool as no. 1, and so on.
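The percentage calculation above can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the variable names (`rank1_counts`, `rank1_share`) are made up for this sketch and do not come from the slides.

```python
# Rank-1 counts per brand, copied from the univariate table above.
rank1_counts = {
    "Whirlpool": 90,
    "Kelvinator": 60,
    "Godrej": 70,
    "Samsung": 32,
    "Videocon": 45,
}

# Total number of first-place rankings across all brands.
total = sum(rank1_counts.values())

# Share of first-place rankings per brand, as a percentage of the total.
rank1_share = {brand: 100 * n / total for brand, n in rank1_counts.items()}

for brand, pct in rank1_share.items():
    print(f"{brand:<12}{rank1_counts[brand]:>5}{pct:>8.1f}%")
```

The same loop can be repeated for each rank column to build the remaining univariate tables.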

Page 5: Chi Square Test for Cross Tab - Session 9 & 10

Simple Tabulation for Rating Type Questions

Q. Rate the following attributes of LIRIL soap on a scale of 1 to 5 (1 = Very Unsatisfactory to 5 = Very Satisfactory).

Lather         __________________________________
               1       2       3       4       5

Fragrance      __________________________________
               1       2       3       4       5

• For each attribute, the number of people who rated it as 1, 2, 3, 4 or 5 can be tabulated in separate tables like:

RATING                 Lather

1 30

2 25

3 50

4 76

5 22

TOTAL             203

Page 6: Chi Square Test for Cross Tab - Session 9 & 10

Alternatively, we can tabulate ratings for all attributes as follows:

RATING    LATHER    FRAGRANCE    ATR.3    ATR.4    ATR.5
1         x         x            x        x        x
2         x         x            x        x        x
3         x         x            x        x        x
4         x         x            x        x        x
5         x         x            x        x        x

Page 7: Chi Square Test for Cross Tab - Session 9 & 10

Second Stage Analysis – Cross Tabulation

• A cross-tabulation can be done by combining any two of the questions and tabulating the data together. This is a 2-variable cross tabulation.

• E.g. a cross-tabulation between Brand Preference for brands of tea and Region to which Respondent belongs.

               Region-wise Buyers (No.)
BRAND          North        South    East    West         Total
Brooke Bond    25 (50%)     20       20      15 (30%)     80 (40%)
Lipton         10 (20%)     15       20      5 (10%)      50 (25%)
Tata           15 (30%)     15       10      30 (60%)     70 (35%)
Total          50 (100%)    50       50      50 (100%)    200 (100%)

– An extension of this could be adding percentages.

Page 8: Chi Square Test for Cross Tab - Session 9 & 10

Calculating Percentages in a Cross Tabulation

• In the above example, we can compute percentages
  • row-wise,
  • column-wise, or
  • on the total sample of 200.

• The general rule is to calculate percentages across the dependent variable (across Brand categories).

• Assume that brand preference depends on the region to which respondents belong, i.e. "Brand" is the dependent variable and "Region" is the independent variable.

• The interpretation is: "Out of 50 respondents from the Northern Region, 50% buy Brooke Bond, 20% buy Lipton, and 30% buy Tata Tea".
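As a rough sketch of the rule above, the percentages within each region (the independent variable) can be computed by dividing every cell by its column total. The counts are those of the tea example; the names `counts`, `col_totals` and `col_pct` are illustrative only.

```python
# Region-wise buyers from the tea example: one list per brand,
# ordered North, South, East, West.
regions = ["North", "South", "East", "West"]
counts = {
    "Brooke Bond": [25, 20, 20, 15],
    "Lipton": [10, 15, 20, 5],
    "Tata": [15, 15, 10, 30],
}

# Column totals: number of respondents in each region.
col_totals = [sum(row[j] for row in counts.values()) for j in range(len(regions))]

# Percentage of each region's respondents who prefer each brand.
col_pct = {
    brand: [100 * n / col_totals[j] for j, n in enumerate(row)]
    for brand, row in counts.items()
}

for brand, row in col_pct.items():
    print(f"{brand:<12}" + "".join(f"{p:>8.0f}%" for p in row))
```

Read down the North column, this reproduces the 50%, 20% and 30% quoted in the interpretation.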

Page 9: Chi Square Test for Cross Tab - Session 9 & 10

Chi-square test

1. Univariate - Chi-square test for goodness of fit

• Test for significance in the analysis of frequency distributions.
• Each question represents a variable under study.
• Compare observed frequencies with expected frequencies.

2. Bivariate - Chi-square test for relatedness or independence

– Chi-Square allows testing for significant differences between groups.

[Two different questions in a questionnaire may represent two variables.]

Page 10: Chi Square Test for Cross Tab - Session 9 & 10

Chi-square test for Goodness of Fit

• It is used to analyze probabilities of multinomial distribution trials along a single dimension.

• The Chi-square goodness-of-fit test compares the expected (theoretical) frequencies of categories from a population distribution to the observed (actual) frequencies from a distribution to determine whether there is a difference between what was expected and what was observed.

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

where $O_i$ is the observed frequency and $E_i$ the expected frequency of category $i$.

Page 11: Chi Square Test for Cross Tab - Session 9 & 10

Example 1: Chi-square test for goodness of fit - Equal expected frequency

• The table outlines the attitudes of 60 people towards US military bases in Australia. A chi-square test for goodness of fit will allow us to determine if differences in frequency exist across response categories.

• Ho: There is no significant difference across frequency of attitudes towards military bases in Australia.

Attitude towards US Military     Frequency of Response
bases in Australia               (Observed frequencies)
In favour                        8
Against                          20
Undecided                        32

Page 12: Chi Square Test for Cross Tab - Session 9 & 10

Output 1: Chi‐Square test – equal expected frequencies

Page 13: Chi Square Test for Cross Tab - Session 9 & 10

Interpretation 1: Chi-square test – equal expected frequencies

• The output shows that the chi-square value is significant (p < .05). (Ho: rejected).

• Therefore, it can be concluded that there are significant differences in the frequency of attitudes towards military bases in Australia.

• The results show that people are largely undecided on this issue, chi-square (2, N=60) = 14.4, p < .05.
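The reported statistic can be reproduced with a short sketch. The function names below are hypothetical, and the p-value helper relies on a closed form of the chi-square upper tail that holds only for an even number of degrees of freedom (here df = 2); the slides themselves obtained the result from statistical software output.

```python
import math


def chi_square_gof(observed, expected=None):
    """Return (statistic, df) for a chi-square goodness-of-fit test.

    If no expected frequencies are given, equal expected frequencies
    are assumed across the categories.
    """
    n = sum(observed)
    if expected is None:
        expected = [n / len(observed)] * len(observed)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1


def chi2_sf_even_df(x, df):
    """Upper-tail probability of the chi-square distribution.

    Uses exp(-x/2) * sum_{i < df/2} (x/2)^i / i!, which is exact
    only when df is a positive even integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Observed attitudes: in favour, against, undecided.
observed = [8, 20, 32]
stat, df = chi_square_gof(observed)
p_value = chi2_sf_even_df(stat, df)
print(f"chi-square({df}, N={sum(observed)}) = {stat:.1f}, p = {p_value:.5f}")
```

With equal expected frequencies of 20 per category, the statistic comes to 14.4 on 2 degrees of freedom, in line with the interpretation above.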

Page 14: Chi Square Test for Cross Tab - Session 9 & 10

Example 2: Chi-square test for goodness of fit – Unequal expected frequencies

• Sometimes the expected frequencies are not evenly balanced across categories.

• E.g. suppose the expected frequencies for the three categories were 15, 15 and 30.

Attitude towards US Military     Frequency of Response      Expected Frequency
bases in Australia               (Observed frequencies)     of Responses
In favour                        8                          15
Against                          20                         15
Undecided                        32                         30

Page 15: Chi Square Test for Cross Tab - Session 9 & 10

Output 2: Chi‐square test – unequal expected frequencies

Page 16: Chi Square Test for Cross Tab - Session 9 & 10

Interpretation 2: Chi‐square test – unequal expected frequencies

• The output shows that the chi-square value is not significant (p = .079 > .05). (Ho: not rejected)

• Therefore, it can be concluded that the observed frequencies of attitudes towards military bases in Australia do not differ significantly from the expected frequencies.

• The results show that people are largely undecided on this issue, chi-square (2, N=60) = 5.067, p > .05.
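The reported values can be checked by hand from the observed (8, 20, 32) and expected (15, 15, 30) frequencies of this example. With 3 categories there are 2 degrees of freedom, and for 2 degrees of freedom the upper-tail probability of the chi-square distribution has the closed form $e^{-\chi^2/2}$ (a standard result that the slides do not state):

$$
\chi^2 = \frac{(8-15)^2}{15} + \frac{(20-15)^2}{15} + \frac{(32-30)^2}{30} = 3.267 + 1.667 + 0.133 = 5.067
$$

$$
p = e^{-5.067/2} \approx 0.079
$$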

Page 17: Chi Square Test for Cross Tab - Session 9 & 10

Chi-square test of Independence

• Qualitative Variables - Nominal data

• Used to test if the two variables are statistically associated with each other significantly.

• Used to analyze the frequencies of two variables with multiple categories to determine whether the two variables are independent.

• It is possible to do a cross-tabulation (and a chi-squared test – with given table value, df, confidence level) for any two nominal variables in the survey.
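The slides do not write out the formula for this test. In its standard form, for a cross-tabulation with $r$ rows and $c$ columns, row totals $R_i$, column totals $C_j$ and sample size $N$, the expected frequency of each cell under independence and the resulting statistic are:

$$
E_{ij} = \frac{R_i \, C_j}{N}, \qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
df = (r-1)(c-1)
$$

Here $O_{ij}$ is the observed count in cell $(i, j)$, using the same $O$ and $E$ notation as the goodness-of-fit statistic.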

Page 18: Chi Square Test for Cross Tab - Session 9 & 10

Example 1: Chi-square test for cross-tab

• Let us assume that we have conducted a consumer survey for a brand of detergent. One of the questions dealt with the income category of the respondent. Another asked the respondent to rate his purchase intentions.

• Ho: There is no significant association between Respondent Income and Purchase Intention.

Page 19: Chi Square Test for Cross Tab - Session 9 & 10

S. No.   INCOME            CODE   INTENT      INTCODE
1        Less Than 5000    1      NONE        1
2        Less Than 5000    1      LOW         2
3        Less Than 5000    1      LOW         2
4        Less Than 5000    1      NONE        1
5        Less Than 5000    1      HIGH        3
6        5001-10000        2      LOW         2
7        5001-10000        2      HIGH        3
8        5001-10000        2      VERY HIGH   4
9        5001-10000        2      HIGH        3
10       5001-10000        2      LOW         2
11       10001-20000       3      HIGH        3
12       10001-20000       3      VERY HIGH   4
13       10001-20000       3      CERTAIN     5
14       10001-20000       3      HIGH        3
15       10001-20000       3      VERY HIGH   4
16       Above 20000       4      HIGH        3
17       Above 20000       4      CERTAIN     5
18       Above 20000       4      VERY HIGH   4
19       Above 20000       4      CERTAIN     5
20       Above 20000       4      CERTAIN     5

Page 20: Chi Square Test for Cross Tab - Session 9 & 10

Both variables are coded.

Income codes and their equivalent incomes are:

Code   Income in Rs. per Month
1      Less than 5000
2      5001 to 10,000
3      10,001 to 20,000
4      Above 20,000

Purchase Intention codes are as follows:

Code   Explanation (Value Labels for the Variable)
1      None – No intention to buy
2      Low – Low intention to buy
3      High – High intention
4      Very High – Very high intention
5      Certain – Certain to buy

Page 21: Chi Square Test for Cross Tab - Session 9 & 10

INCOME Per Month by PURCHASE INTENTION

                           Income per Month in Rs.
Purchase Intent   Code     Less than 5000   5001-10000   10001-20000   Above 20000   TOTAL
None              1        2                0            0             0             2
Low               2        2                2            0             0             4
High              3        1                2            2             1             6
V. High           4        0                1            2             1             4
Certain           5        0                0            1             3             4
TOTAL                      5                5            5             5             20

Page 22: Chi Square Test for Cross Tab - Session 9 & 10

Cross‐tabulation of code (column‐income per month) and Intcode (row – purchase intent).
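The cross-tabulation can be rebuilt from the coded data by counting how often each (income code, intent code) pair occurs. The sketch below uses only the Python standard library; the names `records`, `cell_counts` and `crosstab` are illustrative and are not part of the original analysis, which was run in a statistics package.

```python
from collections import Counter

# (income code, purchase-intent code) for the 20 respondents,
# in serial-number order, taken from the coded data table.
records = [
    (1, 1), (1, 2), (1, 2), (1, 1), (1, 3),
    (2, 2), (2, 3), (2, 4), (2, 3), (2, 2),
    (3, 3), (3, 4), (3, 5), (3, 3), (3, 4),
    (4, 3), (4, 5), (4, 4), (4, 5), (4, 5),
]

intent_labels = {1: "None", 2: "Low", 3: "High", 4: "V. High", 5: "Certain"}
income_codes = [1, 2, 3, 4]

# Count respondents in every (income, intent) cell; missing cells count as 0.
cell_counts = Counter(records)

# Rows are intent codes, columns are income codes.
crosstab = {
    intent: [cell_counts[(income, intent)] for income in income_codes]
    for intent in intent_labels
}

for intent, row in crosstab.items():
    print(f"{intent_labels[intent]:<8}", *row, sum(row))
```

Each printed row matches the corresponding row of the INCOME by PURCHASE INTENTION table, with the row total in the last position.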

Page 23: Chi Square Test for Cross Tab - Session 9 & 10

Result 1: Chi-Square test for cross-tab

Page 24: Chi Square Test for Cross Tab - Session 9 & 10
Page 25: Chi Square Test for Cross Tab - Session 9 & 10

Interpretation 1: Chi‐square test for cross‐tab 

• The cross-tabulation shows the number of respondents falling into each cell (a cell is the combination of one INCOME category with one PURCHASE INTENTION category).

• The first line of the chi-squared test reads a significance level of 0.097. This means the chi-squared test is showing a significant association between these two variables at a 90 percent confidence level (equivalent to a 0.10 significance level).

• Thus, we conclude that at the 90 percent confidence level, PURCHASE INTENTION and INCOME are associated significantly with each other. This may lead us to conclude that the price of the detergent is important in its purchase.
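The 0.097 significance level can be reproduced from the INCOME by PURCHASE INTENTION counts. The sketch below is a plain-Python illustration with hypothetical function names; it computes expected counts as row total times column total over the sample size, and its p-value helper uses a closed form that is exact only for an even number of degrees of freedom (here df = 12).

```python
import math

# Rows: None, Low, High, V. High, Certain.
# Columns: income codes 1 to 4.
table = [
    [2, 0, 0, 0],
    [2, 2, 0, 0],
    [1, 2, 2, 1],
    [0, 1, 2, 1],
    [0, 0, 1, 3],
]


def chi_square_independence(table):
    """Return (Pearson chi-square statistic, df) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; exact only for positive even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


stat, df = chi_square_independence(table)
p_value = chi2_sf_even_df(stat, df)
print(f"chi-square = {stat:.3f}, df = {df}, p = {p_value:.3f}")
```

The p-value falls between 0.05 and 0.10, which is why the association is significant at the 90 percent but not the 95 percent confidence level. Note that with only 20 respondents every expected count here is below 5, so the chi-square approximation should be read with some caution.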

Page 26: Chi Square Test for Cross Tab - Session 9 & 10

Example 2: Chi-square test for Cross-tabs

• Suppose the researcher wishes to examine the association between educational background (independent variable) of PGDM students and their performance in terms of grade (dependent variable) secured.

• A bivariate cross-tabulation has been done by combining the above two variables and tabulating the data together.

• Here the assumption is made by our group based on information extracted from the database (performance) of B-schools.

Page 27: Chi Square Test for Cross Tab - Session 9 & 10

• We want to test, at the 90% and 95% confidence levels, the significance of the association between EDUCATIONAL BACKGROUND of PGDM students and their PERFORMANCE in terms of GRADE.

Page 28: Chi Square Test for Cross Tab - Session 9 & 10

• Further, the variables are coded.

• Educational backgrounds and their equivalent codes are:

Educational background   Code
B.Com                    1
B.E.                     2
B.Sc.                    3
B.B.A.                   4
B.A.                     5

• Grade codes are as follows:

Grade Obtained   Grade Code
A                1
B                2
C                3

Page 29: Chi Square Test for Cross Tab - Session 9 & 10

• These two variables were cross-tabulated for twenty-five observations.

• A cross-tabulation with a Chi-squared test was performed using the SPSS package.

Page 30: Chi Square Test for Cross Tab - Session 9 & 10

Input data table

S.No.   Roll No.   Background   Code   Grade   Grdcode
1       1          B.Com        1      B       2
2       2          B.Com        1      C       3
3       3          B.Com        1      A       1
4       4          B.Com        1      C       3
5       5          B.Com        1      B       2
6       6          B.E.         2      A       1
7       7          B.E.         2      A       1
8       8          B.E.         2      A       1
9       9          B.E.         2      B       2
10      10         B.E.         2      A       1
11      11         B.Sc.        3      B       2
12      12         B.Sc.        3      B       2
13      13         B.Sc.        3      C       3
14      14         B.Sc.        3      C       3
15      15         B.Sc.        3      C       3
16      16         BBA          4      A       1
17      17         BBA          4      B       2
18      18         BBA          4      C       3
19      19         BBA          4      C       3
20      20         BBA          4      B       2
21      21         B.A.         5      C       3
22      22         B.A.         5      C       3
23      23         B.A.         5      C       3
24      24         B.A.         5      C       3
25      25         B.A.         5      B       2

Page 31: Chi Square Test for Cross Tab - Session 9 & 10

Output table 2: Grades Vs Entry Qualification

Page 32: Chi Square Test for Cross Tab - Session 9 & 10

Result 2: Chi-Square test for cross-tab

Page 33: Chi Square Test for Cross Tab - Session 9 & 10
Page 34: Chi Square Test for Cross Tab - Session 9 & 10

Interpretation 2: Chi-Square test for cross-tab

• The Chi-square test revealed a significant association between the educational background of the students and their performance in terms of grade.

• A significance level of 0.089 (Pearson's) has been achieved. This means the Chi-square test is showing a significant association between the above two variables at the 91.1% confidence level (100 – 8.9).

• Thus we conclude that at the 90% confidence level, educational background of PGDM students and their performance in terms of grade are associated significantly with each other, whereas this is not significant at the 95% confidence level.

Page 35: Chi Square Test for Cross Tab - Session 9 & 10

• From the obtained contingency coefficient (C) of 0.596, it can be inferred that the association between the dependent and independent variable is fairly strong, as the value 0.596 is closer to 1 than to 0.

• From the Lambda asymmetric value (with grade code dependent) of 0.286, we conclude that there is a moderate level of association between the above two variables. This lambda value tells us that there is a 28.6% reduction in error in predicting the grade of a student when we know his educational background.

• This leads us to conclude that educational background plays a vital role in the performance of the students of PGDM course. 
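The contingency coefficient of 0.596 and the lambda of 0.286 quoted above can be recomputed from the 25 observations in the input data table. The sketch below assumes the standard definitions, Pearson's C = sqrt(chi-square / (chi-square + N)) and Goodman and Kruskal's lambda as the proportional reduction in prediction errors; the slides report only the software output, so the variable names and the hand calculation here are illustrative.

```python
import math

# Rows: B.Com, B.E., B.Sc., BBA, B.A. (educational background).
# Columns: number of students with grade A, B, C.
table = [
    [1, 2, 2],
    [4, 1, 0],
    [0, 2, 3],
    [1, 2, 2],
    [0, 1, 4],
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Pearson chi-square: expected count = row total * column total / n.
chi2 = sum(
    (obs - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i, row in enumerate(table)
    for j, obs in enumerate(row)
)

# Pearson's contingency coefficient.
contingency_c = math.sqrt(chi2 / (chi2 + n))

# Lambda with grade as the dependent variable:
# errors when always guessing the overall modal grade ...
errors_without = n - max(col_totals)
# ... versus guessing the modal grade within each background.
errors_with = sum(sum(row) - max(row) for row in table)
lambda_grade = (errors_without - errors_with) / errors_without

print(f"chi-square = {chi2:.2f}")
print(f"C = {contingency_c:.3f}, lambda (grade dependent) = {lambda_grade:.3f}")
```

Knowing the background cuts the prediction errors from 14 to 10, which is where the 28.6% reduction comes from.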

Page 36: Chi Square Test for Cross Tab - Session 9 & 10

Example 3: Chi-square test for cross tab - 3

• A manufacturer was interested in assessing how children ages four, five and six play with one of the manufacturer's toys. Each child was asked 15 questions. Following the child's completed interview, the parent was asked the same 15 questions to validate the child's answers. The following table lists the number of responses to selected items from the survey. One hundred interviews were conducted with both the parent and the child. Notice that item response rates varied from question to question. For each question, state at least one method that could be used to attempt to correct for this item nonresponse bias.

Question                            # Children Responding   # Parents Responding
Age of child                        95                      100
Location of Play                    80                      85
How much the child liked the toy    30                      50

Page 37: Chi Square Test for Cross Tab - Session 9 & 10

Result 3: Chi-square test for cross-tab

Page 38: Chi Square Test for Cross Tab - Session 9 & 10
Page 39: Chi Square Test for Cross Tab - Session 9 & 10

• Thank you…