Descriptive Statistics with R
2012-10-12 @HSPH
Kazuki Yoshida, M.D. MPH-CLE student
Group Website is at:
http://rpubs.com/kaz_yos/useR_at_HSPH
Previously in this group
- Introduction to R
- Reading Data into R (1)
- Reading Data into R (2)
Menu
- mean and sd
- median, quantiles, IQR, max, min, and range
- skewness and kurtosis
- smarter ways of doing these
Ingredients
Statistics
- Summary statistics for continuous data
  - Normal data
  - Non-normal data
- Normality check
Programming
- vector and data frame
- DATA$VAR extraction
- Indexing by [row,col]
- Various functions
  - skewness(), kurtosis()
  - summary()
  - describe(), describeBy()
http://echrblog.blogspot.com/2011/04/statistics-on-states-with-systemic-or.html
Data loaded
What’s next?
Descriptive Statistics
http://www.ehow.com/info_8650637_descriptive-statistical-methods.html
Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data
http://en.wikipedia.org/wiki/Descriptive_statistics
Open RStudio
http://www.cengage.com/cgi-wadsworth/course_products_wp.pl?fid=M20bI&product_isbn_issn=9780538733496
Download comma-separated and Excel
BONEDEN.DAT.txt
Put them in a folder
Read in BONEDEN.DAT.txt
Name it bone
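A minimal sketch of this step. Because the real file is not bundled here, the code writes a tiny stand-in comma-separated file with made-up values first. The column names `ID`, `age`, and `zyg` and the header row are assumptions for illustration; only `age` and `zyg` appear on the slides.

```r
# Stand-in for BONEDEN.DAT.txt: a small comma-separated file with made-up values.
# Column names and the header row are assumptions, not the real file layout.
tmp <- tempfile(fileext = ".txt")
writeLines(c("ID,age,zyg",
             "1,27,1",
             "2,42,1",
             "3,36,2"), tmp)

# read.csv() reads comma-separated text; header = TRUE takes names from line 1
bone <- read.csv(tmp, header = TRUE)

str(bone)    # structure: 3 observations of 3 variables
head(bone)   # first rows
```

With the downloaded file in the working directory, the call would presumably be `bone <- read.csv("BONEDEN.DAT.txt")`; checking the result with `head(bone)` shows whether the header was read correctly.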
Accessing a single variable in a data set
DATA$VAR (DATA: dataset name, VAR: variable name)
e.g., mean(bone$age)
vector
http://healthy-india.org/enviromentalhealth/direct_indirect2.html
?
DATA$VAR is a vector
like strings with values attached
1 2 3 4 5 6 7 8
OR
“A” “B” “C” “D” “E” “F” “G” “H”
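The two vectors pictured on this slide could be built by hand like this. The names `num` and `chr` are made up for illustration.

```r
# A numeric vector and a character vector, each holding eight values
num <- c(1, 2, 3, 4, 5, 6, 7, 8)
chr <- c("A", "B", "C", "D", "E", "F", "G", "H")

length(num)   # number of elements
class(num)    # "numeric"
class(chr)    # "character"
num[3]        # third element of the vector
```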
DATA is a data frame
Multiple vectors of same length tied together
[Figure: numeric vectors (1 2 3 4 5 6 7 8) and character vectors (“A” “B” “C” “D” “E” “F” “G” “H”) drawn side by side, labeled "Tied here"]
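A small sketch of "tying" vectors into a data frame, then pulling one back out with `$`. The object and column names are hypothetical.

```r
# Two vectors of the same length (made-up values)
num <- 1:8
chr <- c("A", "B", "C", "D", "E", "F", "G", "H")

# data.frame() ties them together, one vector per column
DATA <- data.frame(num = num, chr = chr, stringsAsFactors = FALSE)

dim(DATA)             # 8 rows, 2 columns
DATA$num              # $ pulls a single column back out as a vector
is.vector(DATA$chr)   # TRUE
```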
Indexing: extraction of data from data frame
bone[1:15 , 1:12]
1:15 before the comma: extract 1st to 15th rows
1:12 after the comma: extract 1st to 12th columns
Don’t forget the comma
Colon in between
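The same [row, col] pattern on a toy data frame. The values are made up and the `id` column is an assumption; the indices are smaller than on the slide because the toy data has only 8 rows and 3 columns.

```r
# Toy stand-in for the bone data frame (made-up values)
bone <- data.frame(id  = 1:8,
                   age = c(27, 42, 36, 50, 31, 45, 39, 58),
                   zyg = c(1, 1, 2, 2, 1, 2, 1, 2))

bone[1:3, 1:2]   # rows 1 to 3, columns 1 to 2
bone[1:3, ]      # rows 1 to 3, all columns (the comma is still required)
bone[, 2]        # all rows, column 2, returned as a vector
bone[, -1]       # all rows, every column except the first
```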
age vector within bone data frame
bone$age
Extracted as a vector
mean
mean(x, trim = 0, na.rm = FALSE)
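A short sketch of what the `trim` and `na.rm` arguments do, using a made-up `age` vector rather than the real data.

```r
age <- c(27, 42, 36, 50, 31, 45, 39, 58)   # made-up values

mean(age)                     # arithmetic mean: 41

age_na <- c(age, NA)          # same values plus one missing value
mean(age_na)                  # NA, because a missing value is present
mean(age_na, na.rm = TRUE)    # missing value dropped first: 41

# trim = 0.125 drops 12.5% of the observations (here 1 of 8)
# from each end before averaging
mean(age, trim = 0.125)       # 40.5
```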
Your turn
- What is the mean of age?
adapted from Hadley Wickham
sd
sd(x, na.rm = FALSE)
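`sd()` returns the sample standard deviation (n - 1 denominator). A quick check by hand, on made-up values:

```r
age <- c(27, 42, 36, 50, 31, 45, 39, 58)   # made-up values

sd(age)                        # about 10.09

# The same quantity written out: squared deviations divided by n - 1
n <- length(age)
sd_by_hand <- sqrt(sum((age - mean(age))^2) / (n - 1))
sd_by_hand

sd(c(age, NA), na.rm = TRUE)   # missing values must be dropped explicitly
```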
Your turn
- What is the sd of age?
median
median(x, na.rm = FALSE)
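A brief illustration on made-up values. With an even number of observations, `median()` averages the two middle values.

```r
age <- c(27, 42, 36, 50, 31, 45, 39, 58)   # made-up values

sort(age)                          # 27 31 36 39 42 45 50 58
median(age)                        # average of 39 and 42: 40.5

median(c(age, NA), na.rm = TRUE)   # drop missing values before computing
```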
Your turn
- What is the median of age?
quantile
quantile(x, probs = seq(0, 1, 0.25), na.rm = FALSE, names = TRUE, type = 7)
0th, 25th, 50th, 75th, and 100th percentiles by default
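A sketch of the default call and of the `probs` argument, on made-up values:

```r
age <- c(27, 42, 36, 50, 31, 45, 39, 58)   # made-up values

# Default: 0%, 25%, 50%, 75%, 100%
quantile(age)

# Ask only for the percentiles of interest via probs
q <- quantile(age, probs = c(0.25, 0.75))
q                                          # 34.75 and 46.25 with type = 7

# names = FALSE returns a plain numeric vector without the "90%" label
quantile(age, probs = 0.9, names = FALSE)
```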
Your turn
- What are the 25th and 75th percentiles of age?
IQR
IQR(x, na.rm = FALSE, type = 7)
75th percentile - 25th percentile
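The relationship stated above can be checked directly on made-up values:

```r
age <- c(27, 42, 36, 50, 31, 45, 39, 58)   # made-up values

IQR(age)                                   # 11.5

# Same thing from quantile(): 75th percentile minus 25th percentile
q <- quantile(age, probs = c(0.25, 0.75), names = FALSE)
iqr_by_hand <- q[2] - q[1]
iqr_by_hand
```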
Your turn
- What is the IQR of age?
max
max(..., na.rm = FALSE)
min
min(..., na.rm = FALSE)
Your turn
- What are the minimum and maximum of age?
range
range(..., na.rm = FALSE)
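`min()`, `max()`, and `range()` side by side on made-up values. Note that `range()` returns the two endpoints, not their difference.

```r
age <- c(27, 42, 36, 50, 31, 45, 39, 58)   # made-up values

min(age)           # 27
max(age)           # 58
range(age)         # both at once: 27 58
diff(range(age))   # width of the range: 31

range(c(age, NA), na.rm = TRUE)   # drop missing values first
```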
Your turn
- What is the range of age?
We now resort to external packages
e1071, psych
Install and Load
To load a package by command
library(package)
package name here
double quote “” can be omitted
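A sketch of installing and loading by command. The install line is commented out so the example runs without network access, and `stats` (shipped with R) stands in for a package that is certain to be present.

```r
# install.packages() downloads from CRAN; needed only once per machine
# install.packages(c("e1071", "psych"))

library(stats)     # package name without quotes
library("stats")   # same thing with quotes

# requireNamespace() returns TRUE or FALSE instead of stopping with an error,
# which is handy for checking whether a package is installed
requireNamespace("e1071", quietly = TRUE)
```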
Assessment of normality
Load e1071 package
skewness
library(e1071)
skewness(x, na.rm = FALSE, type = 3)
type = 2: SAS
type = 1: Stata
kurtosis
library(e1071)
kurtosis(x, na.rm = FALSE, type = 3)
type = 2: SAS
type = 1: Stata
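As a rough sketch, the type = 1 versions can be written out in base R from the moment definitions given on the e1071 help page. The `age` values are made up, and the comparison against e1071 runs only if that package is installed.

```r
age <- c(27, 42, 36, 50, 31, 45, 39, 58)   # made-up values

# Central moments with a 1/n denominator
d  <- age - mean(age)
m2 <- mean(d^2)
m3 <- mean(d^3)
m4 <- mean(d^4)

skew_type1 <- m3 / m2^(3/2)   # type = 1 skewness
kurt_type1 <- m4 / m2^2 - 3   # type = 1 kurtosis (excess, so 0 for a normal)

skew_type1
kurt_type1

# Cross-check against e1071 when it is available
if (requireNamespace("e1071", quietly = TRUE)) {
  print(all.equal(skew_type1, e1071::skewness(age, type = 1)))
  print(all.equal(kurt_type1, e1071::kurtosis(age, type = 1)))
}
```

Changing `type` to 2 or 3 in the e1071 calls applies different small-sample adjustments, so the three types give slightly different numbers on the same data.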
Your turn
- What are the skewness and kurtosis of age by the Stata method?
Multiple variables at once
summary
summary(object, ...)
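A sketch of `summary()` on a single vector and on a whole data frame. The toy `bone` data frame and its `id` column are made up for illustration.

```r
# Toy stand-in for the bone data frame (made-up values)
bone <- data.frame(id  = 1:8,
                   age = c(27, 42, 36, 50, 31, 45, 39, 58),
                   zyg = c(1, 1, 2, 2, 1, 2, 1, 2))

# On a numeric vector: min, quartiles, median, mean, max
s <- summary(bone$age)
s

# On a data frame: the same six numbers for every numeric column
summary(bone)
```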
Your turn
- Try summary on the dataset (data frame).
describe
library(psych)
Various summary measures
describe(x, na.rm = TRUE, interp = FALSE, skew = TRUE, ranges = TRUE, trim = .1, type = 3)
type = 2: SAS
type = 1: Stata
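A hedged sketch of a `describe()` call on a toy data frame (made-up values, hypothetical `id` column). The psych call is guarded so the example still runs where the package is not installed; the base-R column means are only a cross-check for the `mean` column of the output.

```r
# Toy stand-in for the bone data frame (made-up values)
bone <- data.frame(id  = 1:8,
                   age = c(27, 42, 36, 50, 31, 45, 39, 58),
                   zyg = c(1, 1, 2, 2, 1, 2, 1, 2))

# Base-R cross-check: mean of every column except the first
col_means <- sapply(bone[, -1], mean)
col_means

if (requireNamespace("psych", quietly = TRUE)) {
  # One row per variable: n, mean, sd, median, trimmed, mad, min, max, ...
  print(psych::describe(bone[, -1], type = 2))
} else {
  message("psych is not installed; run install.packages(\"psych\") first")
}
```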
Your turn
- describe(bone[,-1], type = 2)
describeBy
library(psych)
Groupwise summary
describeBy(x, group = NULL, mat = FALSE, type = 3, ...)
type = 2: SAS
type = 1: Stata
Your turn
- describeBy(bone[ , c(-1)] , bone$zyg , type = 2)
bone[ , c(-1)]: bone data frame without 1st column
bone$zyg: zyg vector for grouping
type = 2: SAS method for skewness and kurtosis
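The groupwise call, sketched on a toy data frame (made-up values, hypothetical `id` column). The psych call is guarded in case the package is missing; the `tapply()` line is a base-R cross-check of the group means, not part of the slides.

```r
# Toy stand-in for the bone data frame (made-up values)
bone <- data.frame(id  = 1:8,
                   age = c(27, 42, 36, 50, 31, 45, 39, 58),
                   zyg = c(1, 1, 2, 2, 1, 2, 1, 2))

# Base-R cross-check: mean age within each zyg group
group_means <- tapply(bone$age, bone$zyg, mean)
group_means

if (requireNamespace("psych", quietly = TRUE)) {
  # One describe() table per level of the grouping vector
  print(psych::describeBy(bone[, c(-1)], group = bone$zyg, type = 2))
} else {
  message("psych is not installed; run install.packages(\"psych\") first")
}
```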
Ingredients
Statistics
- Summary statistics for continuous data
  - Normal data
  - Non-normal data
- Normality check
Programming
- vector and data frame
- DATA$VAR extraction
- Indexing by [row,col]
- Various functions
  - skewness(), kurtosis()
  - summary()
  - describe(), describeBy()