Introduction to Contributed Packages in R
Department of Statistical Sciences and Operations Research
Computation Seminar Series
Speaker: Edward Boone
What is R?
The R statistical programming language is a free, open-source language and environment based on the S language developed at Bell Labs.
The language is well suited to writing programs, and many statistical functions are already built in. Contributed packages extend this functionality to cutting-edge research. Because R is a programming language, completing tasks requires writing code.
Getting Started
Where to get R?
Go to www.r-project.org.
Under Downloads, select CRAN.
Set your mirror: any mirror in the USA is fine.
Select Windows 95 or later.
Select base.
Select R-2.4.1-win32.exe.
The other options are for developers who wish to change the source code.
Getting Started
The R GUI
Getting Started
Opening a script gives you a script window in which to write and save R code.
Getting Started
Submitting a program:
Highlight the code you want to run, then either use the Submit Selection button
or right-click and run the selection.
Getting Started
Basic assignment and operations
Arithmetic operators: +, -, *, /, and ^ are the standard arithmetic operators.
Matrix arithmetic: * is element-wise multiplication; %*% is matrix multiplication.
Assignment: to assign a value to a variable, use "<-".
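A short sketch of these operators on small, made-up vectors and matrices, showing that * and %*% give different results on the same two matrices:

```r
# Assignment with <-
x <- c(1, 2, 3)
y <- c(4, 5, 6)

# Standard arithmetic operators work element by element on vectors
x + y   # 5 7 9
x ^ 2   # 1 4 9

# Two 2 x 2 matrices (matrix() fills column by column)
A <- matrix(c(1, 2, 3, 4), nrow = 2)
B <- matrix(c(5, 6, 7, 8), nrow = 2)

A * B    # element-wise product
A %*% B  # matrix product (rows of A times columns of B)
```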
Getting Started
How to use help in R: R has a very good built-in help system.
If you know which function you want help with, simply type ?_______ with the function name in the blank. Ex: ?hist.
If you don't know which function to use, then use help.search("_______"). Ex: help.search("histogram").
Importing Data
How do we get data into R? Remember, there is no point and click. First make sure your data are in an easy-to-read format such as CSV (Comma Separated Values).
Then use the code: D <- read.table("path", sep=",", header=TRUE)
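A self-contained sketch of this import step. To avoid depending on a file on your disk, it first writes a tiny hypothetical CSV file (made-up names and scores) to a temporary location; with real data you would pass the path to your own file instead:

```r
# Write a small hypothetical CSV file to a temporary location
path <- tempfile(fileext = ".csv")
writeLines(c("Name,Gender,Score",
             "Ann,F,88",
             "Bob,M,75",
             "Cal,M,92"), path)

# Read it back: comma separator, first row holds the column names
D <- read.table(path, sep = ",", header = TRUE)

# read.csv() is a shortcut that uses sep = "," and header = TRUE by default
D2 <- read.csv(path)
```

Both calls should give the same data frame here; read.csv() simply saves typing the separator and header arguments.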
Working with data.
Accessing columns. D now holds our data, but it is not displayed automatically. To select a column, use D$column, where column is the name of the column.
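A minimal sketch of column access, using a small hypothetical data frame in place of data read from a file:

```r
# Hypothetical data standing in for an imported dataset
D <- data.frame(Name = c("Ann", "Bob", "Cal"),
                Gender = c("F", "M", "M"),
                Score = c(88, 75, 92))

names(D)        # list the column names
D$Score         # select one column by name
mean(D$Score)   # a selected column can be passed to any function
head(D)         # print the first rows; typing D alone prints everything
```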
Working with data.
Subsetting data. Use a comparison operator to do this.
==, !=, >, <, <=, >= are the comparison operators; each returns logical (TRUE/FALSE) values. Note that the "equals" operator is two = signs, and "not equal" is !=.
Example: D[D$Gender == "M",] returns the rows of D where Gender is "M". Remember that R is case sensitive! This code does not modify the original dataset; D.M <- D[D$Gender == "M",] stores the matching rows in a new dataset.
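A sketch of logical subsetting on a small hypothetical data frame. It also shows R's not-equal operator !=, combining two conditions with &, and the effect of case sensitivity:

```r
D <- data.frame(Name = c("Ann", "Bob", "Cal"),
                Gender = c("F", "M", "M"),
                Score = c(88, 75, 92))

D$Gender == "M"                  # logical vector: FALSE TRUE TRUE
D.M <- D[D$Gender == "M", ]      # keep rows where the condition is TRUE
D.notM <- D[D$Gender != "M", ]   # != means "not equal"
D.high <- D[D$Score >= 80 & D$Gender == "M", ]   # & requires both conditions

sum(D$Gender == "m")   # 0: lowercase "m" matches nothing (case sensitive)
nrow(D)                # D itself is unchanged: still 3 rows
```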
Source Files
Source files allow you to store all of your own functions in a single file and make all of them available at once.
To load such a self-created library, use:
source("path")
Don't forget that each \ in a Windows path must be written as \\ (a forward slash / also works).
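A sketch of the source() workflow. The file name and the helper std_err() are hypothetical, written to a temporary file so the example runs anywhere:

```r
# Create a hypothetical file of user-defined functions
fn_file <- tempfile(fileext = ".R")
writeLines(c(
  "# my_functions.R: hypothetical helper functions",
  "std_err <- function(x) sd(x) / sqrt(length(x))"
), fn_file)

# Run the file, which defines every function in it in the workspace
source(fn_file)

std_err(c(2, 4, 4, 4, 5, 5, 7, 9))

# On Windows, a (hypothetical) path would be written as either
# source("C:\\Rcode\\my_functions.R") or source("C:/Rcode/my_functions.R")
```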
Libraries
To keep R's memory footprint small, additional functionality is stored in libraries that are loaded only when needed.
These libraries can be loaded through the GUI or in scripts.
Beware that some contributed packages may conflict with some libraries, for example by defining functions with the same names.
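A sketch of loading a library from a script and checking for name clashes. It uses splines only because it ships with every R installation; search() lists attached packages, and conflicts() reports names defined in more than one of them:

```r
"package:splines" %in% search()   # FALSE unless splines is already attached
library(splines)                  # attach the library
"package:splines" %in% search()   # now TRUE

# Names defined in more than one attached package or environment,
# i.e. cases where one definition masks another
conflicts(detail = TRUE)
```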
Contributed Packages
Since R is open source and its developers are well organized, developing and finding contributed packages is easy.
At the time of this talk there were 964 contributed packages.
These range from wavelets and financial mathematics to spatial data analysis.
Contributed Packages
One popular library is lattice.
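As a small illustration of lattice, here is a sketch of its conditioning formula y ~ x | group, which draws one panel per group. The built-in iris data set is used only because it is always available:

```r
library(lattice)   # installed with R as a recommended package

# Petal length against petal width, one panel per species
p <- xyplot(Petal.Length ~ Petal.Width | Species, data = iris)
print(p)   # lattice functions return plot objects; print() draws them
```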
Contributed Packages
You can install contributed packages using the GUI.
Contributed Packages
You can install the package by selecting it from the list.
Note: Installing a package does not make it immediately available for use.
You still need to call the library() function to make the functionality available to you.
library(lattice)
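The install-then-load sequence can also be written entirely in a script. This is a minimal sketch, assuming an internet connection and a CRAN mirror are available in case lattice is not yet installed:

```r
# Install lattice only if it is not already installed, then attach it
if (!requireNamespace("lattice", quietly = TRUE)) {
  install.packages("lattice")   # downloads from the selected CRAN mirror
}
library(lattice)   # installing alone is not enough; this attaches it
```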
Help on contributed packages
Once a contributed package is loaded, you can access the help for the package and a list of the functions available in it with:
library(help="lattice")
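A programmatic alternative, sketched below, lists the objects an attached package exports by calling ls() on its search-path entry. This is useful when you want the function names as a character vector rather than a help page:

```r
library(lattice)

fns <- ls("package:lattice")   # exported objects of the attached package
head(fns)
"xyplot" %in% fns              # check for a specific function
```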
The CircStats Package
Data often come in a circular format, for example the direction in which birds migrate or fly away from their nest. Such data are angles, not "linear" measurements, and can only take values between 0 and 2π.
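Why ordinary statistics fail for angles can be seen with two hypothetical bearings just either side of 0. The sketch below uses only base R. The arithmetic mean points in the opposite direction, while averaging the unit vectors (cos, sin) and taking the angle with atan2 gives the sensible answer. This vector-average definition is the standard circular mean direction:

```r
# Two bearings (radians) close to 0: one just above 0, one just below 2*pi
theta <- c(0.2, 2 * pi - 0.2)

# The arithmetic mean is pi: the opposite direction from both observations
mean(theta)

# Circular mean direction: average the unit vectors, then take their angle
mean_dir <- atan2(mean(sin(theta)), mean(cos(theta)))
mean_dir   # essentially 0
```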
The CircStats Package
Use the CircStats package.
library(CircStats)
Consider the following:
data <- runif(50, 0, pi)
mean.dir <- circ.mean(data)
mean.dir
[1] 1.446502
The CircStats Package
Randomly generate 100 observations from a von Mises distribution with mean direction 0 and concentration 3:
data.vm <- rvm(100, 0, 3)
Create a plot of it using circ.plot:
circ.plot(data.vm, stack=TRUE, bins=150, shrink=1.5)
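For reference, the von Mises density with mean direction μ and concentration κ is exp(κ cos(θ − μ)) / (2π I0(κ)), where I0 is the modified Bessel function of order 0. The sketch below codes it in base R as a hypothetical helper dvonmises(). It is not a CircStats function, and it assumes rvm()'s second and third arguments are μ and κ. The check confirms that the density integrates to 1 over a full circle and that a larger κ concentrates mass near μ:

```r
# Hypothetical helper: von Mises density written out from its formula
dvonmises <- function(theta, mu = 0, kappa = 1) {
  exp(kappa * cos(theta - mu)) / (2 * pi * besselI(kappa, nu = 0))
}

# Same parameters as rvm(100, 0, 3): mu = 0, kappa = 3
total <- integrate(dvonmises, lower = 0, upper = 2 * pi,
                   mu = 0, kappa = 3)$value
total   # approximately 1

# Larger kappa puts more density at the mean direction
dvonmises(0, kappa = 3) > dvonmises(0, kappa = 1)
```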
The CircStats Package
Regression with circular data: create some data.
data1 <- runif(50, 0, 2*pi)
data2 <- atan2(0.15*cos(data1) + 0.25*sin(data1), 0.35*sin(data1)) + rvm(50, 0, 5)
Run the regression using circ.reg:
circ.lm <- circ.reg(data1, data2, order=1)
circ.lm
(Intercept) -0.01365604 -0.02939188
cos.alpha   -0.29872673  0.41344126
sin.alpha    0.78894271  0.72908521
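The coefficient output has one row per term ((Intercept), cos.alpha, sin.alpha) and two columns. One common way to regress one angle on another is to model the cosine and sine of the response as linear in the cosine and sine of the predictor, then read the fitted direction off with atan2. We assume, from the row labels, that circ.reg with order=1 works along these lines. The sketch below does this with base R's lm(). It is an illustration, not CircStats's implementation, and it uses normal noise in place of rvm():

```r
set.seed(42)
n <- 50
x <- runif(n, 0, 2 * pi)   # circular predictor
# Circular response; normal noise stands in for the von Mises noise above
y <- atan2(0.15 * cos(x) + 0.25 * sin(x), 0.35 * sin(x)) + rnorm(n, 0, 0.2)
y <- atan2(sin(y), cos(y))   # wrap back into [-pi, pi]

# Order-1 model: regress cos(y) and sin(y) on cos(x) and sin(x) together
fit <- lm(cbind(C = cos(y), S = sin(y)) ~ cos(x) + sin(x))
coef(fit)   # 3 x 2 matrix: rows (Intercept), cos(x), sin(x)

# Fitted direction is the angle of the fitted (cos, sin) pair
fitted_dir <- atan2(fitted(fit)[, "S"], fitted(fit)[, "C"])
```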
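The CircStats Package
Plot the data:
plot(data1, data2)
Plot the predicted line:
# move fitted angles greater than pi down by 2*pi so the curve lies in (-pi, pi]
circ.lm$fitted[circ.lm$fitted>pi] <- circ.lm$fitted[circ.lm$fitted>pi] - 2*pi
# sort by data1 so the fitted values are drawn as a connected line
points(data1[order(data1)], circ.lm$fitted[order(data1)], type='l')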
The norm Contributed Package
Although its name suggests the normal distribution in general, the norm package is a package for handling missing data in multivariate normal data sets.
It implements the Data Augmentation and Multiple Imputation scheme of Schafer (1997).
It is similar to SAS PROC MI.
The norm Contributed Package
Load the library.
library(norm)
The norm Contributed Package
Generate some data.
X1 <- rnorm(100,6,1)
X2 <- rnorm(100,10,3)
X3 <- rnorm(100,3,.2)
X4 <- rnorm(100,31,2)
Y <- 5 +.4*X1-.3*X2+rnorm(100,0,1)
The norm Contributed Package
Generate some missing data: each value of X1 and X2 is set to NA with probability 0.1.
X1a <- ifelse(runif(100,0,1)<.1,NA,X1)
X2a <- ifelse(runif(100,0,1)<.1,NA,X2)
Put the data together.
YX <- cbind(Y,X1a,X2a,X3,X4)
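Before imputing, it helps to see how much data a complete-case analysis would discard. This base-R sketch regenerates X1a and X2a the same way, with an arbitrary seed added for reproducibility, and counts missing values per column and the rows with no missing values:

```r
set.seed(1)   # arbitrary seed, only for reproducibility
X1 <- rnorm(100, 6, 1)
X2 <- rnorm(100, 10, 3)
# Each value is deleted independently with probability 0.1
X1a <- ifelse(runif(100, 0, 1) < .1, NA, X1)
X2a <- ifelse(runif(100, 0, 1) < .1, NA, X2)
M <- cbind(X1a, X2a)

colSums(is.na(M))        # number of missing values in each column
sum(complete.cases(M))   # rows a complete-case analysis would keep
```

With two columns each missing about 10% of values independently, roughly one row in five is incomplete. Multiple imputation avoids throwing those rows away.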
The norm Contributed Package
Prepare the data and parameters for multiple imputation.
# do preliminary manipulations
s <- prelim.norm(YX)
# find the MLE
thetahat <- em.norm(s)
# set the random number generator seed
rngseed(1234567)
The norm Contributed Package
Create lists to store the individual results in.
betaout <- vector("list",10)
betasterrout <- vector("list",10)
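The norm Contributed Package
Run a multiple imputation loop:
for(i in 1:10){
  # draw a new parameter value by data augmentation so that the imputations
  # reflect uncertainty about the parameters (proper multiple imputation)
  theta <- da.norm(s, thetahat, steps=20)
  ximp <- imp.norm(s, theta, YX)
  # fit the regression once and keep its coefficients and standard errors
  fit <- lm(ximp[,1] ~ ximp[,2] + ximp[,3] + ximp[,4] + ximp[,5])
  betaout[[i]] <- coef(fit)
  betasterrout[[i]] <- summary(fit)$coefficients[,2]
}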
The norm Contributed Package
Analyze the results:
mi.inference(betaout,betasterrout,confidence=0.95)
The norm Contributed Package
Look at the output:
$est
(Intercept) ximp[, 2] ximp[, 3] ximp[, 4] ximp[, 5]
6.75624286 0.30502706 -0.32846960 0.05157696 -0.04154060
$std.err
(Intercept) ximp[, 2] ximp[, 3] ximp[, 4] ximp[, 5]
2.70312542 0.13431178 0.04240159 0.65908509 0.05596610
$df
(Intercept) ximp[, 2] ximp[, 3] ximp[, 4] ximp[, 5]
1318.8371 222.2528 13269.2373 1770.6680 27689.4900
$signif
(Intercept) ximp[, 2] ximp[, 3] ximp[, 4] ximp[, 5]
1.256048e-02 2.410251e-02 1.021405e-14 9.376337e-01 4.579447e-01
$r
(Intercept) ximp[, 2] ximp[, 3] ximp[, 4] ximp[, 5]
0.09004737 0.25192843 0.02673983 0.07676697 0.01835967
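The output components ($est, $std.err, $df, $signif, $r) correspond to Rubin's rules for combining multiple imputations. The combined estimate is the mean of the m estimates. The total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance. The degrees of freedom are (m - 1)(1 + 1/r)^2. Below is a base-R sketch of those rules, with a hypothetical helper rubin_combine(); it is not norm's source code. As a check of the formula, the $df and $r values above agree with m = 10: for ximp[, 3], 9 × (1 + 1/0.02674)^2 ≈ 13270.

```r
# Hypothetical helper: combine m sets of estimates and standard errors
# (lists of equal-length numeric vectors) using Rubin's rules
rubin_combine <- function(est, se) {
  est <- do.call(rbind, est)   # m x p matrix of point estimates
  se  <- do.call(rbind, se)    # m x p matrix of standard errors
  m <- nrow(est)
  qbar <- colMeans(est)            # combined point estimate
  ubar <- colMeans(se^2)           # average within-imputation variance
  b    <- apply(est, 2, var)       # between-imputation variance
  tvar <- ubar + (1 + 1 / m) * b   # total variance
  r    <- (1 + 1 / m) * b / ubar   # relative increase in variance due to missingness
  df   <- (m - 1) * (1 + 1 / r)^2  # degrees of freedom
  signif <- 2 * pt(-abs(qbar / sqrt(tvar)), df)   # two-sided p-value
  list(est = qbar, std.err = sqrt(tvar), df = df, signif = signif, r = r)
}
```

With the lists from the loop, rubin_combine(betaout, betasterrout) should give values comparable to the $est, $std.err, $df, $signif, and $r components of mi.inference().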
The lpSolve Package
The lpSolve package solves linear and integer programs.
library(lpSolve)
The lpSolve Package
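Consider the following linear program (lp() treats all variables as nonnegative):
$$
\begin{aligned}
\max\quad & x_1 + 9x_2 + 3x_3 \\
\text{s.t.}\quad & x_1 + 2x_2 + 3x_3 \le 9 \\
& 3x_1 + 2x_2 + 2x_3 \le 15 \\
& x_1, x_2, x_3 \ge 0
\end{aligned}
$$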
The lpSolve Package
Set up the vectors and matrices
f.obj <- c(1, 9, 3)
f.con <- matrix (c(1, 2, 3, 3, 2, 2), nrow=2, byrow=TRUE)
f.dir <- c("<=", "<=")
f.rhs <- c(9, 15)
The lpSolve Package
The lp() function will attempt to solve the linear program.
lp ("max", f.obj, f.con, f.dir, f.rhs)
Success: the objective function is 40.5
The lpSolve Package
To obtain the optimal values of the variables, extract the solution component from the returned object.
lp("max", f.obj, f.con, f.dir, f.rhs)$solution
[1] 0.0 4.5 0.0
The lpSolve Package
Sensitivity analysis results (sens.coef.from, sens.coef.to, duals, duals.from, duals.to) are stored in the lp() object when lp() is called with compute.sens = TRUE.
The following are the components of an lp() object:
 [1] "direction"      "x.count"        "objective"      "const.count"
 [5] "constraints"    "int.count"      "int.vec"        "objval"
 [9] "solution"       "presolve"       "compute.sens"   "sens.coef.from"
[13] "sens.coef.to"   "duals"          "duals.from"     "duals.to"
[17] "status"
The lpSolve Package
To solve an integer program, use int.vec to specify which variables must take integer values:
lp("max", f.obj, f.con, f.dir, f.rhs, int.vec=1:3)
Success: the objective function is 37
The lpSolve Package
To obtain the solution to the integer program, extract $solution as before:
lp("max", f.obj, f.con, f.dir, f.rhs, int.vec=1:3)$solution
[1] 1 4 0
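Because this program is tiny, the integer answer can be double-checked without lpSolve. The following base-R sketch enumerates every nonnegative integer point within the bounds implied by the constraints, discards infeasible points, and keeps the best one. Brute force like this only works for very small problems; lp() uses branch and bound instead:

```r
f.obj <- c(1, 9, 3)
f.con <- matrix(c(1, 2, 3, 3, 2, 2), nrow = 2, byrow = TRUE)
f.rhs <- c(9, 15)

# Bounds implied by the constraints: 3*x1 <= 15, 2*x2 <= 9, 3*x3 <= 9
grid <- as.matrix(expand.grid(x1 = 0:5, x2 = 0:4, x3 = 0:3))

feasible <- apply(grid, 1, function(x) all(f.con %*% x <= f.rhs))
values <- as.vector(grid %*% f.obj)   # objective value at each point
values[!feasible] <- -Inf             # rule out infeasible points

best <- grid[which.max(values), ]
best          # x1 = 1, x2 = 4, x3 = 0
max(values)   # 37
```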
Summary
R is a programming environment with many standard programming structures already included.
A large number of contributed packages are available. Many packages allow the use of modern statistical procedures without having to code them yourself.
Using the packages requires familiarity with R.
There is no official support.
Users can create new packages.
Summary
All of the R code and files can be found at:
www.people.vcu.edu/~elboone2/CSS.htm