r intro 20140716-advance
TRANSCRIPT
Introduction to the R Statistical Software (2): Common Statistical Analyses
徐峻賢, Institute of Linguistics, Academia Sinica
Brain and Language Laboratory
• What is the central limit theorem?

x <- rnorm(30, mean = 1, sd = 2)
hist(x)

xmean <- numeric(100)
for (i in 1:100) {
  x <- rnorm(30, mean = 1, sd = 2)
  xmean[i] <- mean(x)
}
hist(xmean)
• What is the central limit theorem?

y <- rexp(100, rate = 1)
hist(y)

ymean <- numeric(100)
for (i in 1:100) {
  y <- rexp(100, rate = 1)
  ymean[i] <- mean(y)
}
hist(ymean)
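Both demonstrations can be checked numerically: the central limit theorem says the sample means should be centred on the population mean with standard deviation sd/sqrt(n). A minimal sketch (the replication count of 10000 is just a convenient choice, larger than the 100 used on the slide):

```r
set.seed(1)            # for reproducibility

n <- 30                # sample size per replication
reps <- 10000          # number of replications
xmean <- numeric(reps)
for (i in 1:reps) {
  x <- rnorm(n, mean = 1, sd = 2)
  xmean[i] <- mean(x)
}

mean(xmean)            # close to the population mean, 1
sd(xmean)              # close to 2 / sqrt(30), about 0.365
```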
rnorm()  generates random numbers from a normal distribution
dnorm()  probability density function
pnorm()  cumulative distribution function
qnorm()  quantile (inverse CDF) function

rnorm(n = 30, mean = 0, sd = 1)
dnorm(1) == 1/sqrt(2*pi)*exp(-1/2)
pnorm(1.645, mean = 0, sd = 1)
qnorm(0.95, mean = 0, sd = 1)
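These four functions are consistent with one another: qnorm() inverts pnorm(), and dnorm(1) matches the closed-form density above. For example:

```r
# qnorm() undoes pnorm(): from quantile to probability and back
p <- pnorm(1.645)                       # about 0.95
q <- qnorm(p)                           # recovers 1.645

# dnorm(1) equals the normal density formula evaluated at 1
d <- dnorm(1)
d_formula <- 1 / sqrt(2 * pi) * exp(-1 / 2)
```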
Good habits for R documents
• R has many fiddly details, and users occasionally show symptoms of "code dyslexia"…
• Comment generously (##)
• Keep track of package versions and of the R version itself (R-news)
• State the basic environment at the top of the document – e.g.:

### This is for …. By xxx at 2014/7/06
rm(list = ls())          ## clear the workspace first, so loaded objects survive
library(ez)
setwd("c:/data/")
load("myexample.Rdata")
quasif data set in languageR package
Source: Raaijmakers et al., 1999, Table 2
data(lexicalMeasures)

Lexical distributional measures for 2233 English monomorphemic words. This data set provides a subset of the data available in the dataset english.
Baayen, R.H., Feldman, L. and Schreuder, R. (2006) Morphological influences on the recognition of monosyllabic monomorphemic words, Journal of Memory and Language, 53, 496-512.
data(lexicalMeasures)
head(lexicalMeasures)
lexicalMeasures.cor = cor(lexicalMeasures[, -1], method = "spearman")^2
lexicalMeasures.dist = dist(lexicalMeasures.cor)

### Hierarchical Clustering
lexicalMeasures.clust = hclust(lexicalMeasures.dist)
plot(lexicalMeasures.clust)   ## the slide used plclust(), now defunct

### or
### DIvisive ANAlysis Clustering (from the cluster package)
library(cluster)
pltree(diana(lexicalMeasures.dist))
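The same hclust() workflow applies to any distance object, so it can be tried without languageR installed; the data below are made up purely for illustration:

```r
set.seed(42)
m <- matrix(rnorm(50 * 4), ncol = 4)    # 50 observations of 4 made-up variables
colnames(m) <- c("v1", "v2", "v3", "v4")

m.cor <- cor(m, method = "spearman")^2  # squared rank correlations between variables
m.dist <- dist(m.cor)                   # distances between the 4 variables
m.clust <- hclust(m.dist)               # hierarchical (agglomerative) clustering
m.clust$order                           # leaf order of the resulting dendrogram
```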
quasif data set in languageR package
> library(languageR)
> data(quasif)
> ldt = quasif
> detach(package:languageR)
> B = read.csv(file = "Baayen2008C.csv")
> head(ldt, n = 10)
> tail(ldt, n = 10)
Accessing information in data frames
dataframe[r, c]

> B[1, 4]
[1] 466

> B[1:2, ]
  Subj Item  SOA  RT
1   s1   w1 Long 466
2   s1   w2 Long 520

> B[, 4]
[1] 466 520 502 475 …
Accessing information in data frames
dataframe$variable

> B$RT
[1] 466 520 502 475 …

> B[B$Subj == "s1", 4]
[1] 466 520 502 475 494 490

> B[B$RT < 500, 4]
[1] 466 475 494 490 491 484 470
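Baayen2008C.csv may not be at hand, but the same indexing works on a small data frame built inline; the rows below simply mirror the first values shown above:

```r
# a toy stand-in for B, built from the first rows shown above
B2 <- data.frame(Subj = c("s1", "s1", "s1", "s1"),
                 Item = c("w1", "w2", "w3", "w1"),
                 SOA  = c("Long", "Long", "Long", "Short"),
                 RT   = c(466, 520, 502, 475))

B2[1, 4]                 # a single cell: 466
B2$RT                    # the whole RT column
B2[B2$Subj == "s1", 4]   # RTs for subject s1
B2[B2$RT < 500, 4]       # RTs below 500: 466 and 475
```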
Sorting a data frame

> B = B[order(B$Item, B$SOA), ]; B
   Subj Item   SOA  RT
1    s1   w1  Long 466
2    s1   w2  Long 520
3    s1   w3  Long 502
4    s1   w1 Short 475
5    s1   w2 Short 494
6    s1   w3 Short 490
7    s2   w1  Long 516
8    s2   w2  Long 566
9    s2   w3  Long 577
10   s2   w1 Short 491
11   s2   w2 Short 544
12   s2   w3 Short 526
13   s3   w1  Long 484
14   s3   w2  Long 529
15   s3   w3  Long 539
16   s3   w1 Short 470
17   s3   w2 Short 511
18   s3   w3 Short 528
Changing information in a data frame

> B$RT = B$RT / 1000; B
   Subj Item   SOA    RT
1    s1   w1  Long 0.466
2    s1   w2  Long 0.520
3    s1   w3  Long 0.502
4    s1   w1 Short 0.475
5    s1   w2 Short 0.494
6    s1   w3 Short 0.490
7    s2   w1  Long 0.516
8    s2   w2  Long 0.566
9    s2   w3  Long 0.577
10   s2   w1 Short 0.491
11   s2   w2 Short 0.544
12   s2   w3 Short 0.526
13   s3   w1  Long 0.484
14   s3   w2  Long 0.529
15   s3   w3  Long 0.539
16   s3   w1 Short 0.470
17   s3   w2 Short 0.511
18   s3   w3 Short 0.528
Contingency tables for data frames

> B.xtab = xtabs(~ SOA + Item, data = B); B.xtab
       Item
SOA     w1 w2 w3
  Long   3  3  3
  Short  3  3  3

> B.xtab.g500 = xtabs(~ SOA + Item,
+   data = B, subset = B$RT > 500); B.xtab.g500
       Item
SOA     w1 w2 w3
  Long   1  3  3
  Short  0  2  2
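xtabs() can be tried the same way on an inline data frame (made-up values, for illustration only):

```r
B2 <- data.frame(Subj = rep(c("s1", "s2"), each = 2),
                 SOA  = rep(c("Long", "Short"), times = 2),
                 RT   = c(466, 475, 516, 491))

# counts of observations per SOA level
tab <- xtabs(~ SOA, data = B2)

# the same counts restricted by a subset condition
tab.fast <- xtabs(~ SOA, data = B2, subset = B2$RT < 500)
```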
Calculations on data frames

> bysub = aggregate(B$RT, list(B$SOA, B$Subj),
+   mean); bysub
  Group.1 Group.2        x
1    Long      s1 496.0000
2   Short      s1 486.3333
3    Long      s2 553.0000
4   Short      s2 520.3333
5    Long      s3 517.3333
6   Short      s3 503.0000

> colnames(bysub) = c("SOA", "Subj", "meanRT")
> bysub
    SOA Subj   meanRT
1  Long   s1 496.0000
2 Short   s1 486.3333
3  Long   s2 553.0000
4 Short   s2 520.3333
5  Long   s3 517.3333
6 Short   s3 503.0000
Calculations on data frames

> byitem = aggregate(B$RT, list(B$SOA, B$Item),
+   mean); byitem
  Group.1 Group.2        x
1    Long      w1 488.6667
2   Short      w1 478.6667
3    Long      w2 538.3333
4   Short      w2 516.3333
5    Long      w3 539.3333
6   Short      w3 514.6667

> colnames(byitem) = c("SOA", "Item", "meanRT")
> byitem
    SOA Item   meanRT
1  Long   w1 488.6667
2 Short   w1 478.6667
3  Long   w2 538.3333
4 Short   w2 516.3333
5  Long   w3 539.3333
6 Short   w3 514.6667
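aggregate() behaves the same on any data frame, so the pattern can be checked on a self-contained example (values invented for illustration; with one observation per cell the means equal the raw values):

```r
d <- data.frame(SOA  = rep(c("Long", "Short"), times = 3),
                Subj = rep(c("s1", "s2", "s3"), each = 2),
                RT   = c(496, 486, 553, 520, 517, 503))

bysub <- aggregate(d$RT, list(d$SOA, d$Subj), mean)
colnames(bysub) <- c("SOA", "Subj", "meanRT")   # replace Group.1/Group.2/x
```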
• By-subject analysis

bysub = aggregate(B$RT, list(B$SOA, B$Subj), mean); bysub
names(bysub) <- c("SOA", "Subj", "RT")

rt_anova = ezANOVA(
  data = B            #### the data frame before aggregate()
  , dv = RT
  , wid = Subj
  , within = .(SOA)
)
print(rt_anova)

rt_anova3 = ezANOVA(
  data = bysub        #### the by-subject-mean data frame
  , dv = RT
  , wid = Subj
  , within = .(SOA)
)
print(rt_anova3)
• By-item analysis

byitem = aggregate(B$RT, list(B$SOA, B$Item), mean); byitem
names(byitem) <- c("SOA", "items", "RT")

rt_anova2 = ezANOVA(
  data = byitem
  , dv = RT
  , wid = items
  , between = SOA
)
print(rt_anova2)
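If the ez package is unavailable, the same one-way within-subject ANOVA can be fitted in base R with aov() and an Error() term; a sketch on made-up data (the effect size and noise level are arbitrary):

```r
set.seed(7)
d <- expand.grid(Subj = paste0("s", 1:6), SOA = c("Long", "Short"))
d$RT <- rnorm(nrow(d), mean = 500, sd = 20) +
        ifelse(d$SOA == "Long", 15, 0)    # made-up SOA effect

## ezANOVA(data = d, dv = RT, wid = Subj, within = .(SOA))
## corresponds to the base-R model:
fit <- aov(RT ~ SOA + Error(Subj/SOA), data = d)
summary(fit)
```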
• data(ANT) – ANT {ez}
  – Simulated data from the Attention Network Test
  – Fan, J., McCandliss, B.D., Sommer, T., Raz, A., & Posner, M.I. (2002). Testing the efficiency and independence of attentional networks. Journal of Cognitive Neuroscience, 14, 340-347.
• 2 within-Ss variables ("cue" and "flank")
• 1 between-Ss variable ("group")
• 2 dependent variables ("rt" and "error")

> data(ANT)      ### a data frame with 5760 observations on 10 variables
> head(ANT, 20)
aov.rt = ezANOVA(
  data = ANT[ANT$error == 0, ]
  , dv = rt
  , wid = subnum
  , within = .(cue, flank)
  , between = group
)
print(aov.rt)

aov.rt = ezANOVA(
  data = ANT[ANT$error == 0, ]
  , dv = rt
  , wid = subnum
  , within = .(cue, flank)
  , between = group
  , detailed = TRUE
)
print(aov.rt)

bt_descriptives = ezStats(
  data = ANT[ANT$error == 0, ]
  , dv = rt
  , wid = subnum
  , between = group
)
print(bt_descriptives)
Mean reaction times for every combination of the independent variables

all_descriptives = ezStats(
  data = ANT[ANT$error == 0, ]
  , dv = rt
  , wid = subnum
  , within = .(cue, flank)
  , between = group
)
print(all_descriptives)

group_plot = ezPlot(
  data = ANT[ANT$error == 0, ]
  , dv = .(rt)
  , wid = .(subnum)
  , between = .(group)
  , x = .(group)
  , do_lines = FALSE
  , x_lab = 'Group'
  , y_lab = 'RT (ms)'
)
print(group_plot)

cue_by_flank_plot = ezPlot(
  data = ANT[ANT$error == 0, ]
  , dv = .(rt)
  , wid = .(subnum)
  , within = .(cue, flank)
  , x = .(flank)
  , split = .(cue)
  , x_lab = 'Flanker'
  , y_lab = 'RT (ms)'
  , split_lab = 'Cue'
)
print(cue_by_flank_plot)
• Challenge yourself:
  – (1) Use aggregate() to compute by-subject means of the correct-trial reaction times
  – (2) Run ezANOVA on the output of (1)
  – (3) Use aggregate() to compute each subject's error rate in each condition
  – (4) Use ezStats to compute each subject's error rate in each condition
  – (5) Analyze the error rates and plot them
Commonly used commands

Operators

Arithmetic              Comparison                        Logical
+    addition           <    less than                    !x         logical NOT
-    subtraction        >    greater than                 x & y      logical AND
*    multiplication     <=   less than or equal to        x && y     id.
/    division           >=   greater than or equal to     x | y      logical OR
^    power              ==   equal to                     x || y     id.
%%   modulo             !=   not equal to                 xor(x, y)  exclusive OR
%/%  integer division

x <- matrix(1:6, 2, 3)  # create a 2*3 matrix x holding the values 1 to 6
x[2, 3] == 6        # is the value in row 2, column 3 of x equal to 6?
x[x <= 3]           # list the values in x that are less than or equal to 3
x[x != 6]           # list the values in x that are not equal to 6
x[x <= 3 & x != 2]  # list the values in x that are <= 3 and not equal to 2
Functions

function.name(object, argument, option)
  function name, object, arguments, options
  # args(function.name) shows the arguments a function accepts
• Math and other simple functions: sum(), mean(), max(), length()
• Random number generation: rnorm(), runif(), rbinom()
• Common introductory-statistics functions: t.test(), aov(), lm()
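A quick check of the functions listed above (args() prints a function's formal arguments; the vector x is arbitrary):

```r
x <- c(2, 4, 6, 8)

sum(x)       # 20
mean(x)      # 5
max(x)       # 8
length(x)    # 4

args(rnorm)  # function (n, mean = 0, sd = 1)

set.seed(1)
u <- runif(5)                          # 5 uniform draws from [0, 1]
b <- rbinom(5, size = 10, prob = 0.5)  # 5 binomial draws
```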
Generating random sequences
Graphing

> windows()             # open a graphics window (Windows; use quartz() on macOS)
> par(mfrow = c(m, n))  # split the window into an m*n grid
> plot(x)               # scatter plot
> hist(x)               # histogram
> boxplot(x)            # box plot
> qqnorm(x); qqline(x)  # Q-Q plot

Common arguments:
main = "title"
xlab = "x label name"; ylab = "y label name"
xlim = c(a, b); ylim = c(a, b)
> windows()
> plot(B$RT, main = "Scatter plot of B", ylab = "B")
> windows()
> hist(B$RT, main = "Histogram of B", xlab = "B")
> windows()
> boxplot(B$RT, main = "Boxplot of B")
> windows()
> qqnorm(B$RT); qqline(B$RT)
> windows()
> par(mfrow=c(2,2))
> plot(B$RT, main="Scatter plot of B", ylab="B")
> hist(B$RT, main="Histogram of B", xlab="B")
> boxplot(B$RT, main="Boxplot of B")
> qqnorm(B$RT); qqline(B$RT)
Exercise 3
• Using the time variable from the leuk data set in the MASS package, reproduce the figure below, and save it as MASSleuk.jpeg
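One way to approach the exercise, assuming a histogram is the intended figure (the original target figure is not reproduced here, so hist() is only a plausible choice of plot):

```r
library(MASS)          # provides the leuk data set (wbc, ag, time)
data(leuk)

jpeg("MASSleuk.jpeg")  # open a JPEG graphics device
hist(leuk$time,
     main = "Histogram of leuk$time",
     xlab = "time")
dev.off()              # close the device; this writes the file
```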