cross-tabulation tables
DESCRIPTION
Cross-Tabulation Tables. Tables in R and Computing Chi Square. Kinds of Data. Nominal or Ordinal (few categories) Interval if it is grouped Some tests ignore the ordering of the categories (e.g. Chi square ) In R this means we are working with factors. Kinds of Tables. - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Cross-Tabulation Tables](https://reader036.vdocuments.mx/reader036/viewer/2022062316/568168b4550346895ddf8965/html5/thumbnails/1.jpg)
Cross-Tabulation Tables
Tables in R and Computing Chi Square
![Page 2: Cross-Tabulation Tables](https://reader036.vdocuments.mx/reader036/viewer/2022062316/568168b4550346895ddf8965/html5/thumbnails/2.jpg)
Kinds of Data• Nominal or Ordinal (few
categories)• Interval if it is grouped• Some tests ignore the ordering of
the categories (e.g. Chi square)• In R this means we are working
with factors
![Page 3: Cross-Tabulation Tables](https://reader036.vdocuments.mx/reader036/viewer/2022062316/568168b4550346895ddf8965/html5/thumbnails/3.jpg)
Kinds of Tables1. One line per observation, e.g.
data on Ernest Witte where each row is a single individual - table() and Rcmdr()
2. One line per cell with a column of numbers representing the count for that cell – xtabs()
![Page 4: Cross-Tabulation Tables](https://reader036.vdocuments.mx/reader036/viewer/2022062316/568168b4550346895ddf8965/html5/thumbnails/4.jpg)
Kinds of Tables3. A row for each category of the
first variable and a column for each category of the second variable with counts at the intersection of a row and column – Rcmdr (Enter table directly)
![Page 5: Cross-Tabulation Tables](https://reader036.vdocuments.mx/reader036/viewer/2022062316/568168b4550346895ddf8965/html5/thumbnails/5.jpg)
Type 1> EWG2[sample(rownames(EWG2), 6),c("Age", "Goods")]
Age Goods159 Middle Adult Absent126 Child Present075 Child Absent156 Old Adult Present095 Adult Absent157 Old Adult Absent
![Page 6: Cross-Tabulation Tables](https://reader036.vdocuments.mx/reader036/viewer/2022062316/568168b4550346895ddf8965/html5/thumbnails/6.jpg)
Type 2 Age Goods FreqChild Absent 18Adult Absent 51Child Present 19Adult Present 55
![Page 7: Cross-Tabulation Tables](https://reader036.vdocuments.mx/reader036/viewer/2022062316/568168b4550346895ddf8965/html5/thumbnails/7.jpg)
Type 3
Absent Present Child 18 19 Adult 51 55
![Page 8: Cross-Tabulation Tables](https://reader036.vdocuments.mx/reader036/viewer/2022062316/568168b4550346895ddf8965/html5/thumbnails/8.jpg)
Factors in R• Factors use integers to code for
categorical data• Each integer code is associated
with a label, e.g. 1 could stand for “Absent” and 2 for “Present”
• Usually R creates factors from any character data columns
![Page 9: Cross-Tabulation Tables](https://reader036.vdocuments.mx/reader036/viewer/2022062316/568168b4550346895ddf8965/html5/thumbnails/9.jpg)
Factors• Regular factors are either equal or
not equal (nominal)• Ordered factors can be >, ==, and
<• Rcmdr makes is easy to convert a
numeric variable to a factor, to change the factor labels, to change the order of the factor levels, and to make the factor ordered
![Page 10: Cross-Tabulation Tables](https://reader036.vdocuments.mx/reader036/viewer/2022062316/568168b4550346895ddf8965/html5/thumbnails/10.jpg)
Tables in R• Tables are basically matrices with
labeling• Transferring between data.frames
and tables is possible but can lead to unexpected results
• Rcmdr does not recognize tables.
![Page 11: Cross-Tabulation Tables](https://reader036.vdocuments.mx/reader036/viewer/2022062316/568168b4550346895ddf8965/html5/thumbnails/11.jpg)
Key table commands in R• table() – create one and multi-way
tables• xtabs() - uses formulas (and
optionally weights/counts)• addmargins() – add row and
column totals• prop.table() – create table of
proportions
![Page 12: Cross-Tabulation Tables](https://reader036.vdocuments.mx/reader036/viewer/2022062316/568168b4550346895ddf8965/html5/thumbnails/12.jpg)
Key commands (cont.)• ftable() – flatten a
multidimensional table – but does not work with xtable()
• print(xtable(), type=“html”) – print an html version of the table.
![Page 13: Cross-Tabulation Tables](https://reader036.vdocuments.mx/reader036/viewer/2022062316/568168b4550346895ddf8965/html5/thumbnails/13.jpg)
# Use Rcmdr to load ErnestWitte and create EWG2# EWG2 <- subset(ErnestWitte, subset=Group==2)table(EWG2$Age)EWG2$Age <- factor(EWG2$Age)Table1 <- table(EWG2$Age, EWG2$Goods, dnn=c("Age", "Goods"))Table1str(Table1)Table2 <- xtabs(~Age+Goods, data=EWG2)Table2str(Table2)DF1 <- data.frame(Table1)DF1names(DF1) <- c("Age", "Goods","Freq")DF
![Page 14: Cross-Tabulation Tables](https://reader036.vdocuments.mx/reader036/viewer/2022062316/568168b4550346895ddf8965/html5/thumbnails/14.jpg)
Table3 <- xtabs(Freq~Age+Goods, data=DF1)Table3addmargins(Table1)prop.table(Table1)prop.table(Table1, 1)prop.table(addmargins(Table1, 1), 1)
# Included in RcmdrrowPercents(Table1)colPercents(Table1)
![Page 15: Cross-Tabulation Tables](https://reader036.vdocuments.mx/reader036/viewer/2022062316/568168b4550346895ddf8965/html5/thumbnails/15.jpg)
Table4 <- xtabs(~Adult+Goods+Pathology, data=EWG2)Table4str(Table4)ftable(Table4, row.vars=c(1, 2), col.vars=3)ftable(Table4, row.vars=c(3, 2), col.vars=1)
# tohtml() puts html code for table into Windows# clipboard or a file# named “clipboard” in Mac OsX or Linuxtohtml <- function(x) print(xtable(x), type="html", file="clipboard")tohtml(Table1)# Paste clipboard into Microsoft Excel
![Page 16: Cross-Tabulation Tables](https://reader036.vdocuments.mx/reader036/viewer/2022062316/568168b4550346895ddf8965/html5/thumbnails/16.jpg)
Null Hypothesis• The usual null hypothesis is that
the row and column variables are independent of one another – knowing one does not help us predict the other
• If the null hypothesis is false, the cell values will deviate from expected values
![Page 17: Cross-Tabulation Tables](https://reader036.vdocuments.mx/reader036/viewer/2022062316/568168b4550346895ddf8965/html5/thumbnails/17.jpg)
E.g. Coin Flipping• If I flip a coin twice, the chance
that the first flip comes up heads is .5
• The chance that the second flip comes up heads is .5 as well
• But what if the chance of getting a head changed depending on the first toss? The probabilities would be conditional
![Page 18: Cross-Tabulation Tables](https://reader036.vdocuments.mx/reader036/viewer/2022062316/568168b4550346895ddf8965/html5/thumbnails/18.jpg)
Expected Probabilities• Under the null hypothesis the
expected value for a cell is– (Row sum * Column sum)/Total count
• Deviations of the actual counts from the expected values is measured as– (Observed – Expected)2/Expected
• Summing the deviations over all cells gives us a statistic with a chi-square distribution
![Page 19: Cross-Tabulation Tables](https://reader036.vdocuments.mx/reader036/viewer/2022062316/568168b4550346895ddf8965/html5/thumbnails/19.jpg)
Chi-Square Test• Compares observed counts to
expected counts based on independence
• Rcmdr constructs the tables and computes the test, BUT deletes the results
![Page 20: Cross-Tabulation Tables](https://reader036.vdocuments.mx/reader036/viewer/2022062316/568168b4550346895ddf8965/html5/thumbnails/20.jpg)
Two Options• chisq.test()
– Saves results in multiple tables– Performs Chi Square and simulation
for p value• CrossTable() and crosstab() in
descr– SAS, SPSS style output with xtable()– More formatting options– Mosaic plot with crosstab()
![Page 21: Cross-Tabulation Tables](https://reader036.vdocuments.mx/reader036/viewer/2022062316/568168b4550346895ddf8965/html5/thumbnails/21.jpg)
Results <- chisq.test(xtabs(~Age+Pathology, data=EWG2), simulate.p.value=TRUE)
Pearson's Chi-squared test with simulated p-value (based on 2000 replicates)
data: xtabs(~Age + Pathology, data = EWG2) X-squared = 31.2876, df = NA, p-value = 0.0004998
str(Results)Results$expectedResults$residualsfisher.test(xtabs(~Sex+Goods, data=EWG2))
![Page 22: Cross-Tabulation Tables](https://reader036.vdocuments.mx/reader036/viewer/2022062316/568168b4550346895ddf8965/html5/thumbnails/22.jpg)
with(EWG2, CrossTable(Age, Pathology))with(EWG2, CrossTable(Age, Pathology, prop.c=FALSE, prop.t=FALSE))with(EWG2, crosstab(Age, Pathology))with(EWG2, crosstab(Age, Pathology, expected=TRUE, resid=TRUE))with(EWG2, crosstab(Sex, Goods))
![Page 23: Cross-Tabulation Tables](https://reader036.vdocuments.mx/reader036/viewer/2022062316/568168b4550346895ddf8965/html5/thumbnails/23.jpg)
![Page 24: Cross-Tabulation Tables](https://reader036.vdocuments.mx/reader036/viewer/2022062316/568168b4550346895ddf8965/html5/thumbnails/24.jpg)