r intro 20140716-advance


Upload: kevin-chun-hsien-hsu

Post on 07-Aug-2015


Page 1: R intro 20140716-advance

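Introduction to R Statistical Software (2): Common Statistical Analyses

Chun-Hsien Hsu (徐峻賢), Institute of Linguistics, Academia Sinica

Brain and Language Laboratory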

Page 2: R intro 20140716-advance

• What is the central limit theorem?

x <- rnorm(30, mean = 1, sd = 2)
hist(x)

xmean <- numeric(100)
for (i in 1:100){
  x <- rnorm(30, mean = 1, sd = 2)
  xmean[i] <- mean(x)
}
hist(xmean)

Page 3: R intro 20140716-advance

• What is the central limit theorem?

y <- rexp(100, rate = 1)
hist(y)

ymean <- numeric(100)
for (i in 1:100){
  y <- rexp(100, rate = 1)
  ymean[i] <- mean(y)
}
hist(ymean)
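
The two simulations above can also be checked numerically. The following is a minimal sketch, assuming 2000 replications instead of 100 (the seed and the replication count are illustrative choices, not part of the slides). The central limit theorem implies that sample means scatter around the population mean with a standard deviation close to sd/sqrt(n), whatever the shape of the parent distribution.

```r
set.seed(1)    # arbitrary seed, only for reproducibility
reps <- 2000   # assumed replication count (the slides use 100)

# Means of 30 normal draws: population mean 1, sd 2
xmean <- replicate(reps, mean(rnorm(30, mean = 1, sd = 2)))
# Means of 100 exponential draws: population mean 1, sd 1 (skewed parent)
ymean <- replicate(reps, mean(rexp(100, rate = 1)))

# Compare the empirical spread of the means with the theoretical sd / sqrt(n)
print(c(observed = sd(xmean), theory = 2 / sqrt(30)))
print(c(observed = sd(ymean), theory = 1 / sqrt(100)))
```

The histogram of ymean looks roughly symmetric even though a single exponential sample is strongly right-skewed.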

Page 4: R intro 20140716-advance

rnorm()  generates normally distributed random numbers
dnorm()  probability density
pnorm()  cumulative probability function
qnorm()  the value of a quantile

rnorm(n=30,mean=0,sd=1)
dnorm(1) == 1/sqrt(2*pi)*exp(-1/2)
pnorm(1.645, mean=0,sd=1)
qnorm(0.95,mean=0,sd=1)
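
How the d/p/q functions relate to each other can be shown in a few lines. This is a small sketch, not part of the original slides. It uses all.equal() instead of == because an exact comparison of floating-point results can fail even when the two expressions are mathematically identical.

```r
# Density of N(0, 1) at 1, compared with the closed-form expression
same_density <- isTRUE(all.equal(dnorm(1), 1 / sqrt(2 * pi) * exp(-1 / 2)))

# pnorm() and qnorm() are inverses of each other
p <- pnorm(1.645, mean = 0, sd = 1)   # cumulative probability, about 0.95
q <- qnorm(p, mean = 0, sd = 1)       # recovers the starting value 1.645

print(c(same_density = same_density, p = p, q = q))
```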

Page 5: R intro 20140716-advance
Page 6: R intro 20140716-advance

Good habits for writing R documents
• R has many fine details, and users occasionally show symptoms of dyslexia…

Page 7: R intro 20140716-advance

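Good habits for writing R documents
• Comment generously (##)
• Keep track of the versions of the packages and of R itself (R-news)
• State the basic environment at the top of the document
– e.g.:

### This is for …. By xxx at 2014/7/06
rm(list=ls())   # clear the workspace first, so that the data loaded below is kept
library(ez)
setwd("c:/data/")
load("myexample.Rdata")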
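
The advice to track versions can be made concrete by recording them in the script itself. This is a minimal sketch using base R only. The variable names are illustrative, and "stats" stands in for whichever package the analysis depends on.

```r
# Record the versions this script was run with
r_version     <- R.version.string                        # version of the running R
stats_version <- as.character(packageVersion("stats"))   # version of one installed package
cat(r_version, "\n")
cat("stats", stats_version, "\n")

# sessionInfo() reports R, the platform, and all attached packages at once
print(sessionInfo())
```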

Page 8: R intro 20140716-advance

quasif data set in languageR package

Source: Raaijmakers et al., 1999, Table 2

Page 9: R intro 20140716-advance

data(lexicalMeasures)

Lexical distributional measures for 2233 English monomorphemic words. This dataset provides a subset of the data available in the dataset english.

Baayen, R.H., Feldman, L. and Schreuder, R. (2006) Morphological influences on the recognition of monosyllabic monomorphemic words, Journal of Memory and Language, 53, 496-512.

Page 10: R intro 20140716-advance

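library(languageR)   # provides the lexicalMeasures data set
library(cluster)     # provides diana() and pltree()
data(lexicalMeasures)
head(lexicalMeasures)
lexicalMeasures.cor = cor(lexicalMeasures[,-1], method = "spearman")^2
lexicalMeasures.dist = dist(lexicalMeasures.cor)

### Hierarchical Clustering
lexicalMeasures.clust = hclust(lexicalMeasures.dist)
plot(lexicalMeasures.clust)   # plclust() is defunct in current R; plot() draws the dendrogram

### or

### DIvisive ANAlysis Clustering
pltree(diana(lexicalMeasures.dist))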

Page 11: R intro 20140716-advance
Page 12: R intro 20140716-advance
Page 13: R intro 20140716-advance

quasif data set in languageR package

> ldt=quasif
> detach(package:languageR)

> B=read.csv(file="Baayen2008C.csv")
> head(ldt, n=10)
> tail(ldt, n=10)

Page 14: R intro 20140716-advance

Accessing information in data frames

dataframe[r,c]

> B[1, 4]
[1] 466

> B[1:2, ]
  Subj Item  SOA  RT
1   s1   w1 Long 466
2   s1   w2 Long 520

> B[,4]
 [1] 466 520 502 475 …

Page 15: R intro 20140716-advance

Accessing information in data frames

dataframe$variable

> B$RT
 [1] 466 520 502 475 …

> B[B$Subj=="s1", 4]
[1] 466 520 502 475 494 490

> B[B$RT<500, 4]
[1] 466 475 494 490 491 484 470
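
The indexing forms above can be tried without the CSV file. In this sketch the data frame B is rebuilt by hand from the 18 rows printed on the sorting slide, so the expand.grid() construction is an assumption for illustration, not the way the slides load the data. It also shows name-based column selection, which is less fragile than column position 4.

```r
# Rebuild the 18-row example by hand (values typed from the printed table)
B <- expand.grid(Item = c("w1", "w2", "w3"),
                 SOA  = c("Long", "Short"),
                 Subj = c("s1", "s2", "s3"),
                 stringsAsFactors = FALSE)[, c("Subj", "Item", "SOA")]
B$RT <- c(466, 520, 502, 475, 494, 490,
          516, 566, 577, 491, 544, 526,
          484, 529, 539, 470, 511, 528)

first_rt <- B[1, 4]                    # row 1, column 4
s1_rt    <- B[B$Subj == "s1", "RT"]    # column selected by name instead of position
fast_rt  <- B$RT[B$RT < 500]           # same values as B[B$RT < 500, 4]
short_s1 <- subset(B, Subj == "s1" & SOA == "Short")   # rows meeting two conditions
print(short_s1)
```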

Page 16: R intro 20140716-advance

Sorting a data frame

> B=B[order(B$Item, B$SOA), ];B
   Subj Item   SOA  RT
1    s1   w1  Long 466
2    s1   w2  Long 520
3    s1   w3  Long 502
4    s1   w1 Short 475
5    s1   w2 Short 494
6    s1   w3 Short 490
7    s2   w1  Long 516
8    s2   w2  Long 566
9    s2   w3  Long 577
10   s2   w1 Short 491
11   s2   w2 Short 544
12   s2   w3 Short 526
13   s3   w1  Long 484
14   s3   w2  Long 529
15   s3   w3  Long 539
16   s3   w1 Short 470
17   s3   w2 Short 511
18   s3   w3 Short 528
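
What order() does to the rows is easiest to see from the row names, which keep each row's original position. This is a minimal sketch on a hand-built copy of the 18-row table (the expand.grid() construction and the names B_sorted and B_slowest are assumptions for illustration). The sorted result is stored under a new name so that the original row order stays available.

```r
# Rebuild the 18-row example by hand (values typed from the printed table)
B <- expand.grid(Item = c("w1", "w2", "w3"),
                 SOA  = c("Long", "Short"),
                 Subj = c("s1", "s2", "s3"),
                 stringsAsFactors = FALSE)[, c("Subj", "Item", "SOA")]
B$RT <- c(466, 520, 502, 475, 494, 490,
          516, 566, 577, 491, 544, 526,
          484, 529, 539, 470, 511, 528)

# Sort by Item first; ties are broken by SOA, remaining ties keep their order
B_sorted <- B[order(B$Item, B$SOA), ]
print(head(B_sorted))

# Descending order on a numeric column: negate it inside order()
B_slowest <- B[order(-B$RT), ]
print(head(B_slowest, n = 3))
```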

Page 17: R intro 20140716-advance

Changing information in a data frame

> B$RT=B$RT/1000;B
   Subj Item   SOA    RT
1    s1   w1  Long 0.466
2    s1   w2  Long 0.520
3    s1   w3  Long 0.502
4    s1   w1 Short 0.475
5    s1   w2 Short 0.494
6    s1   w3 Short 0.490
7    s2   w1  Long 0.516
8    s2   w2  Long 0.566
9    s2   w3  Long 0.577
10   s2   w1 Short 0.491
11   s2   w2 Short 0.544
12   s2   w3 Short 0.526
13   s3   w1  Long 0.484
14   s3   w2  Long 0.529
15   s3   w3  Long 0.539
16   s3   w1 Short 0.470
17   s3   w2 Short 0.511
18   s3   w3 Short 0.528

Page 18: R intro 20140716-advance

Contingency tables for data frames

> B.xtab=xtabs(~ SOA+Item, data=B);B.xtab
       Item
SOA     w1 w2 w3
  Long   3  3  3
  Short  3  3  3

> B.xtab.g500=xtabs(~ SOA+Item,
+   data=B,subset=B$RT>500);B.xtab.g500
       Item
SOA     w1 w2 w3
  Long   1  3  3
  Short  0  2  2
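
Because both tables share the same layout, dividing one by the other gives the proportion of trials slower than 500 ms in each cell. This is a small sketch on a hand-built copy of the 18-row table with RT in milliseconds (the expand.grid() construction and the name prop_slow are assumptions for illustration).

```r
# Rebuild the 18-row example by hand (values typed from the printed table)
B <- expand.grid(Item = c("w1", "w2", "w3"),
                 SOA  = c("Long", "Short"),
                 Subj = c("s1", "s2", "s3"),
                 stringsAsFactors = FALSE)[, c("Subj", "Item", "SOA")]
B$RT <- c(466, 520, 502, 475, 494, 490,
          516, 566, 577, 491, 544, 526,
          484, 529, 539, 470, 511, 528)

B.xtab      <- xtabs(~ SOA + Item, data = B)                     # all trials
B.xtab.g500 <- xtabs(~ SOA + Item, data = B, subset = RT > 500)  # slow trials only

# Elementwise division: share of slow trials per SOA x Item cell
prop_slow <- B.xtab.g500 / B.xtab
print(round(prop_slow, 2))
```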

Page 19: R intro 20140716-advance

Calculations on data frames

> bysub=aggregate(B$RT, list(B$SOA, B$Subj),
+   mean); bysub
  Group.1 Group.2        x
1    Long      s1 496.0000
2   Short      s1 486.3333
3    Long      s2 553.0000
4   Short      s2 520.3333
5    Long      s3 517.3333
6   Short      s3 503.0000

> colnames(bysub) = c("SOA", "Subj", "meanRT")
> bysub
    SOA Subj   meanRT
1  Long   s1 496.0000
2 Short   s1 486.3333
3  Long   s2 553.0000
4 Short   s2 520.3333
5  Long   s3 517.3333
6 Short   s3 503.0000

Page 20: R intro 20140716-advance

Calculations on data frames

> byitem=aggregate(B$RT, list(B$SOA, B$Item),
+   mean); byitem
  Group.1 Group.2        x
1    Long      w1 488.6667
2   Short      w1 478.6667
3    Long      w2 538.3333
4   Short      w2 516.3333
5    Long      w3 539.3333
6   Short      w3 514.6667

> colnames(byitem) = c("SOA", "Item", "meanRT")
> byitem
    SOA Item   meanRT
1  Long   w1 488.6667
2 Short   w1 478.6667
3  Long   w2 538.3333
4 Short   w2 516.3333
5  Long   w3 539.3333
6 Short   w3 514.6667
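
The renaming step can be avoided with the formula interface of aggregate(), which carries the column names into the result. This is a minimal sketch on a hand-built copy of the 18-row table (the expand.grid() construction is an assumption for illustration, with RT in milliseconds).

```r
# Rebuild the 18-row example by hand (values typed from the printed table)
B <- expand.grid(Item = c("w1", "w2", "w3"),
                 SOA  = c("Long", "Short"),
                 Subj = c("s1", "s2", "s3"),
                 stringsAsFactors = FALSE)[, c("Subj", "Item", "SOA")]
B$RT <- c(466, 520, 502, 475, 494, 490,
          516, 566, 577, 491, 544, 526,
          484, 529, 539, 470, 511, 528)

# Left of ~ is the variable to summarise, right of ~ are the grouping variables
bysub  <- aggregate(RT ~ SOA + Subj, data = B, FUN = mean)
byitem <- aggregate(RT ~ SOA + Item, data = B, FUN = mean)
print(bysub)
print(byitem)
```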

Page 21: R intro 20140716-advance

• By subject analysis

bysub=aggregate(B$RT, list(B$SOA, B$Subj), mean);bysub
names(bysub) <- c("SOA", "Subj", "RT")

rt_anova = ezANOVA(
    data = B   #### uses the data frame from before aggregate
    , dv = RT
    , wid = Subj
    , within = .(SOA)
)
print(rt_anova)

rt_anova3 = ezANOVA(
    data = bysub   #### uses the data frame of by-subject means
    , dv = RT
    , wid = Subj
    , within = .(SOA)
)
print(rt_anova3)

Page 22: R intro 20140716-advance

• By item analysis

byitem=aggregate(B$RT, list(B$SOA, B$Item), mean);byitem
names(byitem) <- c("SOA", "items", "RT")

rt_anova2 = ezANOVA(
    data = byitem
    , dv = RT
    , wid = items
    , between = SOA
)
print(rt_anova2)
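
For readers without the ez package, the by-subject analysis can be cross-checked in base R. This sketch is a swapped-in alternative, not the method used on the slides: it fits the repeated-measures model with aov() and an Error() term on a hand-built copy of the 18-row table. With one within-subject factor that has two levels, the F value equals the square of the paired t statistic, which gives a quick sanity check.

```r
# Rebuild the 18-row example by hand (values typed from the printed table)
B <- expand.grid(Item = c("w1", "w2", "w3"),
                 SOA  = c("Long", "Short"),
                 Subj = c("s1", "s2", "s3"),
                 stringsAsFactors = FALSE)[, c("Subj", "Item", "SOA")]
B$RT <- c(466, 520, 502, 475, 494, 490,
          516, 566, 577, 491, 544, 526,
          484, 529, 539, 470, 511, 528)

# By-subject means, with the design variables stored as factors
bysub <- aggregate(RT ~ SOA + Subj, data = B, FUN = mean)
bysub$SOA  <- factor(bysub$SOA)
bysub$Subj <- factor(bysub$Subj)

# Repeated-measures ANOVA: SOA varies within Subj
fit   <- aov(RT ~ SOA + Error(Subj / SOA), data = bysub)
F_soa <- summary(fit)[["Error: Subj:SOA"]][[1]][["F value"]][1]

# Paired t-test on the same by-subject means (rows are aligned by subject)
long  <- bysub$RT[bysub$SOA == "Long"]
short <- bysub$RT[bysub$SOA == "Short"]
t_soa <- unname(t.test(long, short, paired = TRUE)$statistic)

print(c(F = F_soa, t_squared = t_soa^2))
```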

Page 23: R intro 20140716-advance

• data(ANT)
– ANT {ez}
– Simulated data from the Attention Network Test
– J Fan, BD McCandliss, T Sommer, A Raz, MI Posner (2002). Testing the efficiency and independence of attentional networks. Journal of Cognitive Neuroscience, 14, 340-347.
• 2 within-Ss variables ("cue" and "flank")
• 1 between-Ss variable ("group")
• 2 dependent variables ("rt" and "error")

Page 24: R intro 20140716-advance

> data(ANT)   ### A data frame with 5760 observations on the following 10 variables
> head(ANT, 20)

Page 25: R intro 20140716-advance
Page 26: R intro 20140716-advance

aov.rt = ezANOVA(
    data = ANT[ANT$error==0,]
    , dv = rt
    , wid = subnum
    , within = .(cue,flank)
    , between = group
)
print(aov.rt)

Page 27: R intro 20140716-advance
Page 28: R intro 20140716-advance

aov.rt = ezANOVA(
    data = ANT[ANT$error==0,]
    , dv = rt
    , wid = subnum
    , within = .(cue,flank)
    , between = group
    , detailed = TRUE
)
print(aov.rt)

Page 29: R intro 20140716-advance

bt_descriptives = ezStats(
    data = ANT[ANT$error==0,]
    , dv = rt
    , wid = subnum
    , between = group
)
print(bt_descriptives)

Page 30: R intro 20140716-advance

Mean reaction time for every combination of the independent variables

all_descriptives = ezStats(
    data = ANT[ANT$error==0,]
    , dv = rt
    , wid = subnum
    , within = .(cue,flank)
    , between = group
)
print(all_descriptives)

Page 31: R intro 20140716-advance

group_plot = ezPlot(
    data = ANT[ANT$error==0,]
    , dv = .(rt)
    , wid = .(subnum)
    , between = .(group)
    , x = .(group)
    , do_lines = FALSE
    , x_lab = 'Group'
    , y_lab = 'RT (ms)'
)
print(group_plot)

Page 32: R intro 20140716-advance

cue_by_flank_plot = ezPlot(
    data = ANT[ANT$error==0,]
    , dv = .(rt)
    , wid = .(subnum)
    , within = .(cue,flank)
    , x = .(flank)
    , split = .(cue)
    , x_lab = 'Flanker'
    , y_lab = 'RT (ms)'
    , split_lab = 'Cue'
)
print(cue_by_flank_plot)

Page 33: R intro 20140716-advance

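• Self-challenge:
– (1) Use aggregate to compute the by-subject mean of the correct-response reaction times
– (2) Run ezANOVA on the output of (1)
– (3) Use aggregate to compute the error rate for each person and each condition
– (4) Use ezStats to compute the error rate for each person and each condition
– (5) Analyze and plot the error rates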
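
As a starting point for steps (1) and (3), the pattern can be tried on a small made-up data frame before moving to ANT. Everything in this sketch is hypothetical stand-in data: it borrows the ANT column names subnum, cue, flank, rt, and error, but the values, the number of subjects, and the condition labels are invented for illustration.

```r
set.seed(42)   # arbitrary seed, only for reproducibility

# Hypothetical stand-in data: 4 subjects x 2 cue x 2 flank conditions x 10 trials
toy <- expand.grid(trial  = 1:10,
                   flank  = c("Congruent", "Incongruent"),
                   cue    = c("None", "Center"),
                   subnum = factor(1:4))
toy$rt    <- round(rnorm(nrow(toy), mean = 500, sd = 50))
toy$error <- rbinom(nrow(toy), size = 1, prob = 0.1)   # 1 = error, 0 = correct

# Step (1): mean RT of correct trials per subject and condition
rt_means <- aggregate(rt ~ subnum + cue + flank,
                      data = toy[toy$error == 0, ], FUN = mean)

# Step (3): error rate per subject and condition
# (the mean of a 0/1 variable is the proportion of 1s)
err_rates <- aggregate(error ~ subnum + cue + flank, data = toy, FUN = mean)

print(head(rt_means))
print(head(err_rates))
```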

Page 34: R intro 20140716-advance

Various commonly used commands

Page 35: R intro 20140716-advance

Operators

Arithmetic              Comparison                      Logical
+    addition           <   less than                   !x         logical NOT
-    subtraction        >   greater than                x&y        logical AND
*    multiplication     <=  less than or equal to       x&&y       id.
/    division           >=  greater than or equal to    x|y        logical OR
^    power              ==  equal                       x||y       id.
%%   modulo             !=  different                   xor(x,y)   exclusive OR
%/%  integer division

x<-matrix(1:6,2,3)   # create a 2*3 matrix x holding the values 1 to 6

x[2,3]==6            # is the value in row 2, column 3 of x equal to 6?

x[x<=3]              # list the values in x that are less than or equal to 3

x[x!=6]              # list the values in x that are not equal to 6

x[x<=3 & x!=2]       # list the values in x that are less than or equal to 3 and not equal to 2
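
The difference between the elementwise and the single-value logical operators is easy to miss in a table. This short sketch (variable names are illustrative) runs the matrix examples and contrasts & with &&: the first compares whole vectors element by element, the second is meant for a single TRUE/FALSE condition, as in an if() statement.

```r
x <- matrix(1:6, 2, 3)          # 2 x 3 matrix, filled column by column

cell_is_6 <- x[2, 3] == 6       # comparison on a single cell
small     <- x[x <= 3 & x != 2] # elementwise AND used as a filter

# & works elementwise and returns one result per element
elementwise <- c(TRUE, FALSE) & c(TRUE, TRUE)

# && takes single values; the right side is skipped if the left is FALSE
single <- (length(x) == 6) && (x[1, 1] == 1)

remainder <- 7 %% 3             # modulo
quotient  <- 7 %/% 3            # integer division
either    <- xor(TRUE, FALSE)   # exclusive OR

print(list(cell_is_6, small, elementwise, single, remainder, quotient, either))
```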

Page 36: R intro 20140716-advance

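Functions
• function.name(object, argument, option)
  (function name, object, argument, option)
  # args(function.name) shows the arguments of that function
• Math and simple functions: sum(), mean(), max(), length()
• Random number generation: rnorm(), runif(), rbinom()
• Common analysis functions for introductory statistics: t.test(), aov(), lm()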
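
The call pattern above, an object followed by named arguments, looks like this in practice. A minimal sketch with made-up data: the seed, the sample sizes, and the variable names are illustrative choices.

```r
print(args(rnorm))   # shows the arguments: n, mean = 0, sd = 1

set.seed(7)          # arbitrary seed, only for reproducibility
a <- rnorm(20, mean = 0, sd = 1)           # normal random numbers
b <- runif(20, min = 0, max = 1)           # uniform random numbers
k <- rbinom(20, size = 10, prob = 0.5)     # binomial counts out of 10

# Simple summaries of one vector
print(c(sum = sum(a), mean = mean(a), max = max(a), n = length(a)))

tt  <- t.test(a, mu = 0)   # one-sample t-test against a mean of 0
fit <- lm(a ~ b)           # linear regression of a on b
print(tt$p.value)
print(coef(fit))
```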

Page 37: R intro 20140716-advance

Generating random sequences

Page 38: R intro 20140716-advance

Graphing

> windows()              # open a graphics window
> par(mfrow=c(m,n))      # split the graphics window into an m*n grid

> plot(x)                # scatter plot
> hist(x)                # histogram
> boxplot(x)             # box plot
> qqnorm(x);qqline(x)    # QQ plot

main="title"
xlab="x label name"  ylab="y label name"
xlim=c(a,b)  ylim=c(a,b)
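
The title, label, and axis-limit arguments are passed inside the plotting call itself. This is a minimal sketch with made-up values. It writes to a PDF file through pdf(), which is available on every platform, whereas windows() exists only on Windows. The file name and the data are assumptions for illustration.

```r
set.seed(3)                                   # arbitrary seed
x <- rnorm(50, mean = 500, sd = 40)           # made-up reaction times

out <- file.path(tempdir(), "graphing_sketch.pdf")   # hypothetical file name
pdf(out)                                      # open a file-based graphics device
par(mfrow = c(1, 2))                          # 1 row, 2 columns of panels

plot(x, main = "Scatter plot", xlab = "Index", ylab = "RT (ms)",
     ylim = c(350, 650))
hist(x, main = "Histogram", xlab = "RT (ms)", xlim = c(350, 650))

invisible(dev.off())                          # close the device to finish the file
```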

Page 39: R intro 20140716-advance

Graphing

> windows()
> plot(B$RT, main="Scatter plot of B", ylab="B")

Page 40: R intro 20140716-advance

Graphing

> windows()
> hist(B$RT, main="Histogram of B", xlab="B")

Page 41: R intro 20140716-advance

Graphing

> windows()
> boxplot(B$RT, main="Boxplot of B")

Page 42: R intro 20140716-advance

Graphing

> windows()
> qqnorm(B$RT); qqline(B$RT)

Page 43: R intro 20140716-advance

Graphing

> windows()

> par(mfrow=c(2,2))

> plot(B$RT, main="Scatter plot of B", ylab="B")

> hist(B$RT, main="Histogram of B", xlab="B")

> boxplot(B$RT, main="Boxplot of B")

> qqnorm(B$RT); qqline(B$RT)

Page 44: R intro 20140716-advance

Graphing

Page 45: R intro 20140716-advance

Exercise 3

• Using the time variable in the leuk data set from MASS, produce the figure below and save it as MASSleuk.jpeg