Introduction to R
Contents – Introduction to R
• Introduction to R
  • Display datasets
  • Display First and Last 6 Observations
  • Display ADL Table Names
  • Read in Graduate Employment Survey Questionnaire Table
  • Arithmetic Operators
  • Function
    • Calculate Winsorized Mean
    • Calculate OLS Estimates and SEs
• Descriptive Statistics
  • Measures of Central Tendency
    • Mean
    • Median
    • Trimmed Mean
  • Measure of Dispersion
    • Standard Deviation
    • Median Absolute Deviation
• Graphics in R
  • Graphic Settings
  • Scatter Plot + Line Plot
  • Line + Bar Chart
  • Pie Chart
  • 3 Dimensional Contingency Table
…
Display a list of datasets available in JNB
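In a standard R session, the built-in data() function lists the datasets that ship with installed packages. Whether the notebook environment (JNB) used on this slide exposes its datasets the same way is an assumption; the sketch below only shows the base R call.

```r
# List the datasets bundled with the 'datasets' package (part of base R).
available <- data(package = "datasets")

# The result stores a character matrix; the "Item" column holds the names.
dataset_names <- available$results[, "Item"]
head(dataset_names)
```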
Display First and Last 6 Observations
head() – first few observations of the dataset
tail() – last few observations of the dataset
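A minimal sketch of both calls. The dataset shown on the slide is not in the transcript, so the built-in mtcars data frame is used here purely as a stand-in.

```r
# mtcars is a built-in data frame, used only as a stand-in dataset.
first_six   <- head(mtcars)          # first 6 rows by default
last_six    <- tail(mtcars)          # last 6 rows by default
first_three <- head(mtcars, n = 3)   # n changes the number of rows returned

print(first_six)
print(last_six)
```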
…
Display ADL Table Names
Read in Graduate Employment Survey Questionnaire
ges_2016_questions
Arithmetic Operators
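The operators shown on the slide did not survive in the transcript; the sketch below lists R's standard arithmetic operators, with the values a and b chosen only for illustration.

```r
a <- 17
b <- 5

a + b     # addition: 22
a - b     # subtraction: 12
a * b     # multiplication: 85
a / b     # division: 3.4
a ^ b     # exponentiation: 1419857
a %% b    # modulus (remainder): 2
a %/% b   # integer division: 3
```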
Function
Calculate Winsorised Mean
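The function body from this slide is not in the transcript. As a rough sketch of the idea: winsorising replaces the k smallest values with the next smallest one and the k largest values with the next largest one, then takes the ordinary mean. The name winsorized_mean and the argument k are hypothetical, not taken from the slide.

```r
# Hypothetical helper: winsorise k values at each end, then average.
winsorized_mean <- function(x, k = 1) {
  x <- sort(x)
  n <- length(x)
  stopifnot(k >= 0, 2 * k < n)
  if (k > 0) {
    x[1:k] <- x[k + 1]             # pull the k smallest values up
    x[(n - k + 1):n] <- x[n - k]   # pull the k largest values down
  }
  mean(x)
}

# The extreme value 100000 is replaced by 8 before averaging: 44 / 10 = 4.4
winsorized_mean(c(1, 1, 2, 2, 4, 5, 6, 7, 8, 100000), k = 1)
```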
Function
Calculate OLS Estimates and SEs
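The slide's own function is not in the transcript. One possible sketch uses the textbook matrix formulas: the estimates are (X'X)^-1 X'y, and the standard errors are the square roots of the diagonal of the residual variance times (X'X)^-1. The name ols_fit and the use of mtcars as example data are assumptions made for illustration.

```r
# Hypothetical helper: OLS estimates and standard errors via matrix algebra.
ols_fit <- function(X, y) {
  X <- cbind(Intercept = 1, as.matrix(X))        # add an intercept column
  XtX_inv <- solve(crossprod(X))                 # (X'X)^-1
  beta <- XtX_inv %*% crossprod(X, y)            # (X'X)^-1 X'y
  resid <- y - X %*% beta
  sigma2 <- sum(resid^2) / (nrow(X) - ncol(X))   # residual variance
  se <- sqrt(diag(sigma2 * XtX_inv))
  cbind(Estimate = drop(beta), SE = se)
}

fit <- ols_fit(mtcars[, c("wt", "hp")], mtcars$mpg)
fit

# Cross-check against R's built-in lm()
reference <- summary(lm(mpg ~ wt + hp, data = mtcars))$coefficients[, 1:2]
reference
```

In practice lm() is the usual tool; writing the function by hand mainly shows how a function is defined and how the estimates are built up.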
Descriptive Statistics
Mean
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \dots + x_n}{n}$$
$$\bar{x} = \frac{1 + 2 + 3 + 5 + 6 + 9 + 10 + 20 + 34}{9} = 10$$
Susceptible to the influence of outliers
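The worked example above can be checked in R. The second half of the sketch replaces the largest value with 3400, a made-up number chosen only to show how strongly a single outlier moves the mean.

```r
x <- c(1, 2, 3, 5, 6, 9, 10, 20, 34)
mean(x)      # 90 / 9 = 10
median(x)    # 6, the middle value

# Hypothetical outlier: swap the largest value for 3400
x_outlier <- replace(x, length(x), 3400)
mean(x_outlier)     # 3456 / 9 = 384
median(x_outlier)   # still 6
```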
Median
1 2 3 4 5 6 7 8 9 10
1 1 2 2 4 5 6 7 8 100000
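A short check in R, assuming the second row above is the data (the first row may simply be the position of each value). The median barely notices the extreme last value, while the mean is dominated by it.

```r
x <- c(1, 1, 2, 2, 4, 5, 6, 7, 8, 100000)
median(x)   # average of the 5th and 6th sorted values: (4 + 5) / 2 = 4.5
mean(x)     # 10003.6
```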
Trimmed Mean
$$\bar{x}_{\text{trimmed}} = \frac{1}{n - 2k}\sum_{i=k+1}^{n-k} x_i$$
Trimmed 5%
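A sketch of the formula, assuming the x values are sorted before the k smallest and k largest are dropped. The function name trimmed_mean is hypothetical; the built-in mean() offers the same calculation through its trim argument.

```r
# Hypothetical helper: drop k values from each end of the sorted data.
trimmed_mean <- function(x, k) {
  x <- sort(x)
  n <- length(x)
  stopifnot(k >= 0, 2 * k < n)
  sum(x[(k + 1):(n - k)]) / (n - 2 * k)
}

x <- c(1, 1, 2, 2, 4, 5, 6, 7, 8, 100000)
trimmed_mean(x, k = 1)   # drops 1 and 100000: 35 / 8 = 4.375
mean(x, trim = 0.1)      # built-in version: trims 10% from each end
```

With trim = 0.05, mean() drops floor(0.05 * n) observations from each end, so at least 20 observations are needed before a 5% trim removes anything.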
Standard Deviation
i       x1   xi − x̄    (xi − x̄)²   x2   xi − x̄     (xi − x̄)²
1       7    7-7=0     0           12   12-7=5     25
2       8    8-7=1     1           2    2-7=-5     25
3       6    6-7=-1    1           0    0-7=-7     49
4       7    7-7=0     0           14   14-7=7     49
5       7    7-7=0     0           10   10-7=3     9
6       6    6-7=-1    1           9    9-7=2      4
7       8    8-7=1     1           5    5-7=-2     4
8       7    7-7=0     0           4    4-7=-3     9
Total                  4                           174
$$s_1 = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{4}{7}} = 0.755929$$
$$s_2 = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{174}{7}} = 4.985694$$
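The two columns of the table can be checked in R. Both samples have mean 7, but the second is far more spread out; sd() uses the same n − 1 divisor as the formula.

```r
x1 <- c(7, 8, 6, 7, 7, 6, 8, 7)
x2 <- c(12, 2, 0, 14, 10, 9, 5, 4)

mean(x1)   # 7
mean(x2)   # 7

# By hand, following the table: sqrt(4 / 7)
s1_manual <- sqrt(sum((x1 - mean(x1))^2) / (length(x1) - 1))

sd(x1)     # 0.755929
sd(x2)     # 4.985694
```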
Median Absolute Deviation
$$\mathrm{MAD} = \operatorname{Median}_i\left(\left|x_i - \operatorname{Median}(x)\right|\right)$$
x1            x1 - Median     |x1 - Median|
2             2 - 12 = -10    10
6             6 - 12 = -6     6
6             6 - 12 = -6     6
12 (Median)   12 - 12 = 0     0
17            17 - 12 = 5     5
25            25 - 12 = 13    13
32            32 - 12 = 20    20
Sorted absolute deviations: 0 5 6 6 10 13 20
Median of the absolute deviations: 6
MAD = 6
σ ≈ 1.4826 * MAD = 1.4826 * 6 = 8.8956
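The same numbers can be reproduced in R. Note that the built-in mad() multiplies by 1.4826 by default, so constant = 1 is needed to get the raw median of the absolute deviations.

```r
x <- c(2, 6, 6, 12, 17, 25, 32)

abs_dev <- abs(x - median(x))   # 10 6 6 0 5 13 20
median(abs_dev)                 # raw MAD: 6
mad(x, constant = 1)            # same value from the built-in function
mad(x)                          # default constant 1.4826 gives 8.8956
```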
R Graphics
…
High-Level Plot Function
High-Level Function             Description
plot()                          Scatterplot
hist()                          Histogram
boxplot()                       Boxplot
qqplot(), qqnorm(), qqline()    Quantile plots
interaction.plot()              Interaction plot
sunflowerplot()                 Sunflower scatterplot
pairs()                         Scatter plot matrix
symbols()                       Draw symbols on a plot
dotchart()                      Dot chart
barplot()                       Bar chart
pie()                           Pie chart
curve()                         Draw a curve from a given function
image()                         Create a grid of coloured rectangles with colours based on the values of a third variable
contour(), filled.contour()     Contour plot
persp()                         Plot 3-D surface
Low-Level Plot Function
Low-Level Plot Function    Description
points()                   Add points to a figure
lines()                    Add lines to a figure
text()                     Insert text in the plot region
mtext()                    Insert text in the figure and outer margins
title()                    Add figure title or outer title
legend()                   Insert legend
axis(), axis.Date()        Customize axes
abline()                   Add horizontal and vertical lines or a single line
box()                      Draw a box around the current plot
rug()                      Add a 1-D plot of the data to the figure
polygon()                  Draw a polygon
rect()                     Draw a rectangle
arrows()                   Draw arrows
segments()                 Draw line segments
trans3d()                  Add 2-D components to a 3-D plot
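A small sketch of how the two kinds of function work together: one high-level call sets up the plot, and low-level calls then add to it. It uses the Gini series tabulated in the next slide; the choice of a mean reference line and a legend is only for illustration.

```r
Year <- 2002:2012
Gini <- c(0.454, 0.457, 0.460, 0.465, 0.470, 0.482,
          0.474, 0.471, 0.472, 0.473, 0.478)

# High-level call: starts a new plot with axes, labels and a title.
# type = "n" sets up the axes without drawing the data yet.
plot(Year, Gini, type = "n", main = "Gini Coefficient",
     ylab = "Gini Coefficient")

# Low-level calls: each one adds to the plot that already exists.
lines(Year, Gini, col = "red", lwd = 2)
points(Year, Gini, pch = 15, col = "red")
abline(h = mean(Gini), lty = 2)   # horizontal reference line at the mean
legend("bottomright", legend = c("Gini", "Mean"),
       col = c("red", "black"), lty = c(1, 2), bty = "n")
```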
Source: Key Household Income Trends, 2012
Year Gini Coefficient
2002 0.454
2003 0.457
2004 0.460
2005 0.465
2006 0.470
2007 0.482
2008 0.474
2009 0.471
2010 0.472
2011 0.473
2012 0.478
Singapore Gini Coefficient from Year 2002 to 2012
# Gini Coefficient of Singapore From 2002 to 2012
# Scatter Plot
Year <- c(2002:2012)
Gini <- c(0.454,0.457,0.460,0.465,0.470,0.482,0.474,0.471,0.472,0.473,0.478)
plot(Year,Gini)
# General form: plot(x, y)
plot(Year,Gini,main="Gini Coefficient\nBased on Household Income from Work per Household Member",
sub="Source: Key Household Income Trends, 2012")
# main: title above the plot; sub: subtitle below the x-axis
plot(Year,Gini,main="Gini Coefficient\nBased on Household Income from Work per Household Member",
type = "b",pch=20,
sub="Source: Key Household Income Trends, 2012")
# type="b": draw both points and lines; pch=20: small filled circle as the plotting symbol
plot(Year,Gini,main="Gini Coefficient\nBased on Household Income from Work per Household Member",
type = "b",pch=15,
col="red",lwd=2,
ylab="Gini Coefficient",
ylim=c(0.44,0.49),
sub="Source: Key Household Income Trends, 2012")
# pch=15: filled square as the plotting symbol
plot(Year,Gini,main="Gini Coefficient",
type = "b",pch=15,
col="red",lwd=2,
cex.axis=1.2,cex.lab=1.5,cex.main=1.6,
ylab="Gini Coefficient",
ylim=c(0.44,0.49))
text(2008,0.44,"Based on Household Income from Work per Household Member",cex=0.7)
mtext("Source: Key Household Income Trends, 2012",side=1,line=4,at=2005)
Plot Region
par(bg = "lightblue")
plot(Year,Gini,main="Gini Coefficient",
type = "b",pch=15,
col="red",lwd=2,
cex.axis=1.2,cex.lab=1.5,cex.main=1.6,
ylab="Gini Coefficient",
ylim=c(0.44,0.49))
text(2008,0.44,"Based on Household Income from Work per Household Member",cex=0.7)
mtext("Source: Key Household Income Trends, 2012",side=1,line=4,at=2005)
Source: Zhang (2009) Lifelong education (learning) in China: Present situation and development trend. Convergence, 42(1), 49-63.
Year  Number of Examinees (in 10,000s)  Number of Graduates (in 10,000s)
1996 858.21 26.02
1997 1014.31 28.88
1998 1180.81 34.54
1999 1305.16 42.20
2000 1327.68 48.94
2001 1330.43 64.10
2002 1285.10 129.42
2003 1155.91 70.45
2004 1234.53 78.81
2005 1058.04 254.26
Number of Examinees and Graduates in China between 1996 and 2005
Examinee <- c(858.21,1014.31,1180.81,1305.16,1327.68,1330.43,1285.10,1155.91,1234.53,1058.04)
Graduate <- c(26.02,28.88,34.54,42.20,48.94,64.10,129.42,70.45,78.81,254.26)
Year <- c(1996:2005)
Line + Bar Chart
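barplot(Examinee)
par(new=TRUE)
plot(Year,Graduate)
# barplot() and plot() are both high-level plot functions, so each call starts a new plot.
# Without par(new=TRUE), barplot(Examinee) followed by plot(Year,Graduate), or the
# reverse order, gives two separate plots rather than an overlay.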
barplot(Examinee,
main="No. of Examinees and Graduates\nin China between 1996 and 2005")
par(new=TRUE)
plot(Year,Graduate,type="l")
par(mar=c(5,6,4,6))
barplot(Examinee,
main="No. of Examinees and Graduates\nin China between 1996 and 2005",
las=1,
names.arg=c(1996:2005))
par(new=TRUE)
plot(Year,Graduate,type="l",col="coral1",
xaxt="n",yaxt="n",xlab="",ylab="",
lwd=3)
axis(4,las=1)
# Margins set by par(mar=c(5,6,4,6)), in lines of text: bottom 5, left 6, top 4, right 6
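A minimal sketch of setting and restoring the margins. par() returns the previous settings when it changes them, which makes it easy to put things back afterwards; the variable names old_par and wide_margins are illustrative.

```r
# mar is given in lines of text, in the order bottom, left, top, right.
old_par <- par(mar = c(5, 6, 4, 6))   # widen left and right margins
wide_margins <- par("mar")
wide_margins                          # 5 6 4 6

# ... draw the bar chart and the overlaid line here ...

par(old_par)                          # restore the previous margins
par("mar")
```

The wider right margin leaves room for the second axis drawn with axis(4, ...) and its label.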
par(mar=c(5,6,4,6))
barplot(Examinee,
main="No. of Examinees and Graduates\nin China between 1996 and 2005",
las=1,
names.arg=c(1996:2005))
par(new=TRUE)
plot(Year,Graduate,type="l",col="coral1",
xaxt="n",yaxt="n",xlab="",ylab="",
lwd=3)
axis(4,las=1)
# names.arg: a vector of names to be plotted below each bar or group of bars.
par(mar=c(5,6,4,6))
barplot(Examinee,
main="No. of Examinees and Graduates\nin China between 1996 and 2005",
las=1,
names.arg=c(1996:2005))
par(new=TRUE)
plot(Year,Graduate,type="l",col="coral1",
xaxt="n",yaxt="n",xlab="",ylab="",
lwd=3)
axis(4,las=1)
# axis(4,las=1): add an axis on the right side (side 4) with horizontal tick labels
par(mar=c(5,6,4,6))
barplot(Examinee,
main="No. of Examinees and Graduates\nin China between 1996 and 2005",
las=1,
col="pink",
names.arg=c(1996:2005))
par(new=TRUE)
plot(Year,Graduate,type="l",col="coral1",
xaxt="n",yaxt="n",xlab="",ylab="",
lwd=3)
axis(4,las=1)
Examinee <- c(858.21,1014.31,1180.81,1305.16,1327.68,1330.43,1285.10,1155.91,1234.53,1058.04)
Graduate <- c(26.02,28.88,34.54,42.20,48.94,64.10,129.42,70.45,78.81,254.26)
Year <- c(1996:2005)
par(mar=c(5,6,4,6))
barplot(Examinee,col="pink",
main="No. of Examinees and Graduates\nin China between 1996 and 2005",
cex.main=1.5,
xlab="Year",
cex.lab=1.2,
ylim=c(0,1400),
las=1,
names.arg=c(1996:2005))
par(new=TRUE)
plot(Year,Graduate,type="l",col="coral1",
xaxt="n",yaxt="n",xlab="",ylab="",
lwd=3)
axis(4,las=1)
mtext("Number of Graduates",side=4,line=3,cex=1.2,col="coral1")
mtext("Number of Examinees",side=2,line=3,cex=1.2,col="pink")
text(2001,25,"Source: Yearbook of Educational Statistics in China, 2006")
pie.sales <- c(0.12, 0.3, 0.26, 0.16, 0.04, 0.12)
names(pie.sales) <- c("Blueberry", "Cherry",
"Apple", "Boston Cream", "Other", "Vanilla Cream")
pie(pie.sales,col=rainbow(6))
heat.colors() terrain.colors() topo.colors() cm.colors()
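Each of these palette functions, like rainbow() in the pie chart above, takes the number of colours wanted and returns a character vector of colour codes. A small sketch, with the list name palettes chosen only for illustration:

```r
n <- 6
palettes <- list(
  rainbow = rainbow(n),
  heat    = heat.colors(n),
  terrain = terrain.colors(n),
  topo    = topo.colors(n),
  cm      = cm.colors(n)
)

palettes$heat   # a character vector of 6 hex colour codes
```

Any of them can be passed to col=, for example pie(pie.sales, col = heat.colors(6)).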
Table <- matrix(
c(45, 5,16, 2,
1,33, 3, 7,
20,10,56, 4,
2, 3, 5,50),
ncol=4,byrow=TRUE)
rows <- rep(1:4,4)
cols <- c(rep(1,4),rep(2,4),rep(3,4),rep(4,4))
dimnames(Table) = list(
c("Strongly Disagree", "Disagree","Agree","Strongly Agree"),
c("Strongly Disagree", "Disagree","Agree","Strongly Agree"))
library(scatterplot3d)
scatterplot3d(rows, cols, as.vector(Table),
type="h", pch=" ", angle=50,
lab=c(3,3), lwd=5,
main="3 Dimensional Contingency Table",
xlab="Current Satisfaction With Life",
ylab="Last Year Satisfaction with Life",
zlab="Observation",
x.ticklabs=rownames(Table),
y.ticklabs=colnames(Table),
y.margin.add=1.2,
color="red")
3 Dimensional Contingency Table
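The rows and cols vectors in the code above work because as.vector() unrolls a matrix column by column. A short sketch that derives the same indices with row() and col() and checks one cell; the data frame name long is illustrative.

```r
Table <- matrix(c(45,  5, 16,  2,
                   1, 33,  3,  7,
                  20, 10, 56,  4,
                   2,  3,  5, 50),
                ncol = 4, byrow = TRUE)

# as.vector() goes down column 1, then column 2, and so on, so the row
# index cycles 1..4 while the column index repeats each value 4 times.
rows <- as.vector(row(Table))   # same as rep(1:4, 4)
cols <- as.vector(col(Table))   # same as rep(1:4, each = 4)

long <- data.frame(row = rows, col = cols, count = as.vector(Table))
long[long$row == 3 & long$col == 1, ]   # count is 20, i.e. Table[3, 1]
```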