stats 330: lecture 4

Post on 03-Feb-2016


04/22/23 330 Lecture 4 1

STATS 330: Lecture 4

Housekeeping

My contact details….

Plus much else on course web page

www.stat.auckland.ac.nz/~lee/330/

Or via Cecil


Today’s lecture: R for graphics

Aim of the lecture:

To show you how to use R to produce the plots shown in the last few lectures

Getting data into R

In 330, as in many cases, data comes in 2 main forms
• As a text file
• As an Excel spreadsheet

Need to convert from these formats to R

Data in R is organized in data frames
• Row by column arrangement of data (as in Excel)
• Variables are columns
• Rows are cases (individuals)

Text files to R

Suppose we have the data in the form of a text file

Edit the text file (use Notepad or similar) so that
• The first row consists of the variable names
• Each row of data (i.e. data on a complete case) corresponds to one line of the file

Suppose data fields are separated by spaces and/or tabs

Then, to create a data frame containing the data, we use the R function read.table


Example: the cherry tree data

Suppose we have a text file called cherry.txt (probably created using Notepad or maybe Word, but saved as a text file)

First line: variable names

Data for each tree on a separate line, separated by "white space" (spaces or tabs)
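To make the expected layout concrete, here is a minimal sketch that reads a few whitespace-separated lines with read.table. The contents of cherry.txt are not shown in these slides, so the three data rows below are a hypothetical stand-in typed inline through the text argument, and the name small.df is made up for the illustration.

```r
# Hypothetical stand-in for the first lines of cherry.txt:
# variable names on the first line, then one tree per line,
# fields separated by spaces and/or tabs.
cherry_text <- "diameter height volume
8.3   70  10.3
8.6   65  10.3
8.8\t63\t10.2"

# header=TRUE says that line 1 holds the variable names;
# the default separator is any white space (spaces or tabs).
small.df <- read.table(text = cherry_text, header = TRUE)
print(small.df)
```

The third data row uses tabs rather than spaces, to show that read.table treats both as white space by default.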


Creating the data frame

In R, type

cherry.df = read.table(file.choose(), header=TRUE)

and press the return key

This brings up the dialog to select the file cherry.txt containing the data.

[Screenshot: file selection dialog, with the callouts "Click here to select file" and "Click here to load data"]


Check all is OK!
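The slide does not say what the check consists of; one plausible sketch is to look at the dimensions, the variable names and the first few rows. Here R's built-in trees data (31 black cherry trees, with columns Girth, Height and Volume) is used as a stand-in for cherry.df, with the columns renamed to the names used in this lecture. That renaming is an assumption, not something taken from the course files.

```r
# Stand-in for cherry.df built from R's built-in 'trees' data
# (assumption: same measurements, renamed to match this lecture).
cherry.df <- data.frame(diameter = trees$Girth,
                        height   = trees$Height,
                        volume   = trees$Volume)

dim(cherry.df)      # number of rows (cases) and columns (variables)
names(cherry.df)    # variable names taken from the header line
head(cherry.df)     # first six rows
str(cherry.df)      # type of each column; numbers should not show up as text
summary(cherry.df)  # ranges help to spot typos and missing values
```

If a numeric column shows up as character or factor in str, the usual cause is a stray non-numeric entry in the file.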


Getting data from a spreadsheet (1)

Create the spreadsheet in Excel

Save it as Comma Delimited Text (CSV)

This is a text file with all cells separated by commas

File is called cherry.csv


Getting data from a spreadsheet (2)

In R, type

cherry.df = read.table(file.choose(), header=TRUE, sep=",")

and proceed as before
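As a hedged illustration, the sketch below reads comma-separated text with read.table and sep=",", and again with read.csv, which has header=TRUE and sep="," as its defaults. The inline lines are a hypothetical stand-in for cherry.csv, and a.df and b.df are names made up for the comparison.

```r
# Hypothetical stand-in for the first lines of cherry.csv
csv_text <- "diameter,height,volume
8.3,70,10.3
8.6,65,10.3
8.8,63,10.2"

a.df <- read.table(text = csv_text, header = TRUE, sep = ",")
b.df <- read.csv(text = csv_text)   # header=TRUE and sep="," are the defaults

all.equal(a.df, b.df)   # checks that both calls give the same data frame
```

Note that the quotes around the comma must be plain straight quotes; curly quotes pasted from a word processor will give a syntax error.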

Getting data from the R330 package

The package R330 contains several data sets used in the course, including the cherry tree data

To access the data frame:
• Install the R330 package (see Appendix A.10 of the coursebook)
• In R, type

> library(R330)

> data(cherry.df)


Data frames and variables

Suppose we have read in data and made a data frame

At this point R doesn’t know about the variables in the data frame, so we can’t use e.g. the variable diameter in R commands

We need to say attach(cherry.df)

to make the variables in cherry.df visible to R.

Alternatively, say cherry.df$diameter (better)
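The options can be compared side by side in a short sketch. The with() function is not mentioned in the slides; it is included here as a common middle ground. The data frame is again a stand-in built from R's built-in trees data, with column names assumed to match cherry.df.

```r
# Stand-in for cherry.df (assumption: built-in 'trees' data, renamed)
cherry.df <- data.frame(diameter = trees$Girth,
                        height   = trees$Height,
                        volume   = trees$Volume)

# 1. The $ form: always explicit about which data frame is meant
mean(cherry.df$diameter)

# 2. with(): the columns are visible only while the expression is evaluated
with(cherry.df, mean(diameter))

# 3. attach()/detach(): columns stay visible until detached,
#    which can clash with other objects of the same name
attach(cherry.df)
mean(diameter)
detach(cherry.df)
```

All three calls compute the same mean; they differ only in how long the variable names stay visible.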


Scatterplots

In R, there are 2 distinct sets of functions for graphics, one for ordinary graphics, one for trellis.

E.g. for scatterplots, we use either plot (ordinary R) or xyplot (Trellis)

In the next few slides, we discuss plot.

Simple plotting

plot(cherry.df$height,
     cherry.df$volume,
     xlab="Height (feet)",
     ylab="Volume (cubic feet)",
     main="Volume versus height for 31 black cherry trees")

i.e. label axes (give units if possible), give a title

[Figure: scatterplot titled "Volume versus height for 31 black cherry trees", with Height (feet) on the horizontal axis and Volume (cubic feet) on the vertical axis]

Alternative form of plot

plot(volume ~ height,
     xlab="Height (feet)",
     ylab="Volume (cubic feet)",
     main="Volume versus height for 31 black cherry trees",
     data=cherry.df)

Don't need to use the $ notation with this form. Note the reversal of x and y: the formula is written y ~ x
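To see that the two forms draw the same picture, the sketch below puts them side by side. It assumes a stand-in for cherry.df built from R's built-in trees data, and uses par(mfrow=...) for the layout, which is not covered in these slides.

```r
# Stand-in for cherry.df (assumption: built-in 'trees' data, renamed)
cherry.df <- data.frame(diameter = trees$Girth,
                        height   = trees$Height,
                        volume   = trees$Volume)

par(mfrow = c(1, 2))   # two plots side by side

# x, y form: the horizontal variable comes first
plot(cherry.df$height, cherry.df$volume,
     xlab = "Height (feet)", ylab = "Volume (cubic feet)",
     main = "plot(x, y)")

# formula form: response ~ explanatory, so the vertical variable comes first
plot(volume ~ height, data = cherry.df,
     xlab = "Height (feet)", ylab = "Volume (cubic feet)",
     main = "plot(y ~ x)")

par(mfrow = c(1, 1))   # back to one plot per page
```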


Colours, points, etc

par(bg="darkblue")
plot(cherry.df$height, cherry.df$volume,
     xlab="Height (feet)", ylab="Volume (cubic feet)",
     main="Volume versus height for 31 black cherry trees",
     pch=19, fg="white", col.axis="lightblue", col.main="white",
     col.lab="white", col="white", cex=1.3)

Type ?par for more info
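A setting made with par stays in force for later plots on the same device. One common pattern, sketched here as a suggestion rather than as part of the lecture, is to save the old settings when changing them and restore them afterwards. The data frame is a stand-in built from R's built-in trees data.

```r
# Stand-in for cherry.df (assumption: built-in 'trees' data, renamed)
cherry.df <- data.frame(diameter = trees$Girth,
                        height   = trees$Height,
                        volume   = trees$Volume)

op <- par(bg = "darkblue")   # par() returns the previous settings invisibly

plot(cherry.df$height, cherry.df$volume,
     xlab = "Height (feet)", ylab = "Volume (cubic feet)",
     pch = 19, col = "white", fg = "white",
     col.axis = "lightblue", col.lab = "white", cex = 1.3)

par(op)      # restore the old background for later plots
par("bg")    # query the current value to confirm
```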

[Figure: the scatterplot "Volume versus height for 31 black cherry trees", Volume (cubic feet) against Height (feet), drawn with the colour settings given to par and plot]

Lines

Suppose we want to join up the points for each rat on the rats plot (see data on the next slide). We could try

plot(rats.df$day, rats.df$growth, type="l")

but this won't work. Points are plotted in the order they appear in the data frame and each point is joined to the next, so the last point for one rat is joined to the first point for the next rat

Rats: the data

> rats.df
   growth group rat change day
1     240     1   1      1   1
2     250     1   1      1   8
3     255     1   1      1  15
4     260     1   1      1  22
5     262     1   1      1  29
6     258     1   1      1  36
7     266     1   1      2  43
8     266     1   1      2  44
9     265     1   1      2  50
10    272     1   1      2  57
11    278     1   1      2  64
12    225     1   2      1   1
13    230     1   2      1   8

... More data

[Figure: growth against day, drawn with type="l"]

Don't want this!

Solution

Various solutions, but one is to plot each line separately, using subsetting (the variables are used directly here, so attach(rats.df) first)

plot(day,growth,type="n")   # draw axes, labels only
lines(day[rat==1],growth[rat==1])
lines(day[rat==2],growth[rat==2])

and so on …. (boring!), or (better)

for(j in 1:16){lines(day[rat==j],growth[rat==j])}
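The same idea can be tried without the full rats data. The sketch below uses a small made-up data frame (toy.df, 3 rats measured on 4 days; the growth values are invented) with the same column names as rats.df, and loops over whatever rat numbers are present instead of hard-coding 1:16.

```r
# Hypothetical toy data with the same column names as rats.df
toy.df <- data.frame(
  rat    = rep(1:3, each = 4),
  day    = rep(c(1, 8, 15, 22), times = 3),
  growth = c(240, 250, 255, 260,
             225, 230, 235, 238,
             245, 250, 260, 262)
)

# type="n" sets up the axes and labels but draws no points
plot(toy.df$day, toy.df$growth, type = "n", xlab = "day", ylab = "growth")

# One lines() call per rat, so no line joins one rat to the next
for (j in unique(toy.df$rat)) {
  one.rat <- toy.df[toy.df$rat == j, ]
  lines(one.rat$day, one.rat$growth)
}
```

Using unique(toy.df$rat) means the loop still works if rats are added to or removed from the data.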

Indicating groups

Want to plot the litters with different colours, add a legend:

Rats 1-8 are litter 1, 9-12 litter 2, 13-16 litter 3

plot(day,growth,type="n")

# col sets the colour of the line
for(j in 1:8) lines(day[rat==j],growth[rat==j],col="white") # litter 1
for(j in 9:12) lines(day[rat==j],growth[rat==j],col="yellow") # litter 2
for(j in 13:16) lines(day[rat==j],growth[rat==j],col="purple") # litter 3

legend

legend(13, 380, legend = c("Litter 1", "Litter 2", "Litter 3"),
       col = c("white","yellow","purple"), lwd = c(2,2,2),
       horiz = TRUE, cex = 0.7)

(Type ?legend for a full explanation of these parameters)
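A variation on the grouped plot, offered as a sketch only: store the litter of each rat in the data and look the colour up from a vector, so a single loop handles all litters. The toy data, the litter column and the colours (chosen for a default white background) are all made up for the illustration.

```r
# Hypothetical toy data: 4 rats in 3 litters, 3 days each
toy.df <- data.frame(
  rat    = rep(1:4, each = 3),
  litter = rep(c(1, 1, 2, 3), each = 3),
  day    = rep(c(1, 8, 15), times = 4),
  growth = c(240, 250, 255,
             225, 230, 235,
             300, 310, 322,
             380, 395, 410)
)
litter.cols <- c("black", "red", "blue")   # colours for litters 1, 2, 3

plot(toy.df$day, toy.df$growth, type = "n", xlab = "day", ylab = "growth")
for (j in unique(toy.df$rat)) {
  one.rat <- toy.df[toy.df$rat == j, ]
  lines(one.rat$day, one.rat$growth,
        col = litter.cols[one.rat$litter[1]], lwd = 2)
}

# A keyword position avoids choosing x, y coordinates by hand
legend("topleft", legend = paste("Litter", 1:3),
       col = litter.cols, lwd = 2, horiz = TRUE, cex = 0.7)
```

Because the legend and the lines take their colours from the same vector, they cannot get out of step.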

[Figure: growth against day, one line per rat, with the legend Litter 1, Litter 2, Litter 3]

Points and text

x=1:25

y=1:25

plot(x,y, type="n")

points(x,y,pch=1:25, col="red",

cex=1.2)

[Figure: plot of y against x showing the plotting symbols pch=1 to pch=25]

Points and text (3)

x=1:26

y=1:26

plot(x,y, type="n")

text(x,y, letters, col="blue", cex=1.2)

[Figure: plot of y against x with the letters a to z drawn at the points]

Use of pos

x = 1:10
y = 1:10
plot(x,y)

position = rep(c(2,4), 5)
mytext = rep(c("Left","Right"), 5)
text(x,y,mytext, pos=position)
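The example above uses only pos values 2 and 4. A short sketch showing all four values, with labels naming where each one puts the text relative to its point (the vector name pos.names is made up here):

```r
x <- 1:4
y <- rep(1, 4)
plot(x, y, pch = 19, xlim = c(0, 5), ylim = c(0, 2))

# pos: 1 = below, 2 = left, 3 = above, 4 = right of the point
pos.names <- c("below", "left", "above", "right")
text(x, y, labels = pos.names, pos = 1:4)
```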


Trellis

Must load trellis library first

library(lattice)

General form of trellis plots

xyplot(y~x|W*Z, data=some.df)

Don't need to use the $ form: trellis functions can pick out the variables, given the data frame
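As a runnable instance of the general form, the sketch below conditions on two categorical variables. R's built-in mtcars data stands in for some.df (an assumption made only so the example runs anywhere), with cyl playing the role of W and am the role of Z.

```r
library(lattice)

# Built-in mtcars data used as a stand-in for some.df
cars.df <- mtcars
cars.df$cyl <- factor(cars.df$cyl)                          # W: cylinders
cars.df$am  <- factor(cars.df$am,
                      labels = c("automatic", "manual"))    # Z: gearbox

# One panel for each combination of cyl and am
p <- xyplot(mpg ~ wt | cyl * am, data = cars.df,
            xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
print(p)   # trellis plots must be printed explicitly in scripts and loops
```

At the R prompt the plot appears without print, because the result is printed automatically; inside a script, function or loop the explicit print is needed.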


Main trellis functions

dotplot for dotplots, use when X is categorical, Y is continuous

bwplot for boxplots, use when X is categorical, Y is continuous

xyplot for scatter plots, use when both x and y are continuous

equal.count use to turn continuous conditioning variable into groups
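A brief sketch of bwplot and dotplot, for the case of a categorical X and a continuous Y. The built-in mtcars data is again an assumed stand-in, not course data.

```r
library(lattice)

cars.df <- mtcars
cars.df$cyl <- factor(cars.df$cyl)   # categorical X

# bwplot: one boxplot of the continuous Y for each level of X
b <- bwplot(mpg ~ cyl, data = cars.df,
            xlab = "Cylinders", ylab = "Miles per gallon")
print(b)

# dotplot: the individual values of Y for each level of X
d <- dotplot(mpg ~ cyl, data = cars.df,
             xlab = "Cylinders", ylab = "Miles per gallon")
print(d)
```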

Changing background colour

To change trellis background to white

trellis.par.set(background = list(col="white"))

To change plotting symbols

trellis.par.set(plot.symbol = list(pch=16, col="red", cex=1))


Equal.count

xyplot(volume~height|diameter, data=cherry.df)

[Figure: trellis plot of volume against height, split into many small panels, each labelled "diameter"]

Equal.count (2)

diam.gp<-equal.count(diameter,number=4,overlap=0)
xyplot(volume~height|diam.gp, data=cherry.df)

[Figure: trellis plot of volume against height, in four panels labelled diam.gp]
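To see what equal.count produces, the sketch below prints the interval of diameters covered by each group before plotting. It assumes a stand-in for cherry.df built from R's built-in trees data, and refers to the column as cherry.df$diameter so that attach is not needed.

```r
library(lattice)

# Stand-in for cherry.df (assumption: built-in 'trees' data, renamed)
cherry.df <- data.frame(diameter = trees$Girth,
                        height   = trees$Height,
                        volume   = trees$Volume)

# Four groups of roughly equal size, with overlap set to 0
diam.gp <- equal.count(cherry.df$diameter, number = 4, overlap = 0)
print(levels(diam.gp))   # the diameter interval covered by each group

print(xyplot(volume ~ height | diam.gp, data = cherry.df))
```

The object returned by equal.count is called a shingle; each panel of the plot shows the trees whose diameter falls in one of its intervals.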

Changing plotting symbols

To change plotting symbols

trellis.par.set(plot.symbol = list(pch=16, col="red", cex=1))

[Figure: trellis plot of volume against height in four panels labelled diam.gp, drawn with the new plotting symbols]

Non-trellis version

coplot(volume~height|diameter, data=cherry.df)

[Figure: coplot of volume against height, "Given : diameter"]

Non-trellis version (2)

coplot(volume~height|diameter, data=cherry.df, number=4, overlap=0)

[Figure: coplot of volume against height in four panels, "Given : diameter"]
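The number and overlap arguments control how coplot cuts the conditioning variable into intervals. As a sketch, co.intervals shows the intervals directly; its defaults are number=6 and overlap=0.5. The data frame is an assumed stand-in built from R's built-in trees data, and iv is a name made up for the result.

```r
# Stand-in for cherry.df (assumption: built-in 'trees' data, renamed)
cherry.df <- data.frame(diameter = trees$Girth,
                        height   = trees$Height,
                        volume   = trees$Volume)

# Default intervals: 6 groups, neighbouring groups share about half their points
co.intervals(cherry.df$diameter)

# Four groups with overlap set to 0, as in the second coplot
iv <- co.intervals(cherry.df$diameter, number = 4, overlap = 0)
print(iv)   # one row per group: lower and upper end of the interval

coplot(volume ~ height | diameter, data = cherry.df,
       number = 4, overlap = 0)
```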

Other useful functions

Regular R
• scatterplot3d (3d scatter plot, load library scatterplot3d)
• contour, persp (draws contour plots, surfaces)
• pairs

Trellis
• cloud (3d scatter plot)
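A quick tour of pairs, contour, persp and cloud, as a sketch using only data shipped with R. The cherry.df stand-in is built from the built-in trees data, and the built-in volcano matrix is used purely as an example surface; neither choice comes from the course material. scatterplot3d is left out because it needs a separately installed package.

```r
# Stand-in for cherry.df (assumption: built-in 'trees' data, renamed)
cherry.df <- data.frame(diameter = trees$Girth,
                        height   = trees$Height,
                        volume   = trees$Volume)

# pairs: scatterplot matrix of every pair of variables
pairs(cherry.df)

# contour and persp need a matrix of heights z over a grid of x and y values
x <- seq_len(nrow(volcano))
y <- seq_len(ncol(volcano))
contour(x, y, volcano, xlab = "row", ylab = "column")
persp(x, y, volcano, theta = 30, phi = 30,
      xlab = "row", ylab = "column", zlab = "height")

# cloud (lattice) draws a 3d scatter plot from a formula z ~ x * y
library(lattice)
print(cloud(volume ~ height * diameter, data = cherry.df))
```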

Rotating plots

You need to install the R330 package

Create a data frame e.g. called data.df with the response in the first column

Then, type

reg3d(data.df)
