08 functions

30
Hadley Wickham Stat405 Functions & for loops Monday, 13 September 2010

Upload: hadley-wickham

Post on 13-May-2015

633 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: 08 functions

Hadley Wickham

Stat405Functions & for loops

Monday, 13 September 2010

Page 2: 08 functions

1. Projects

2. Function structure and calling conventions

3. Practice writing functions

4. For loops

Monday, 13 September 2010

Page 3: 08 functions

Functions

Monday, 13 September 2010

Page 4: 08 functions

Basic structure

• Name

• Input arguments

• Names/positions

• Defaults

• Body

• Output (final result)

Monday, 13 September 2010

Page 5: 08 functions

calculate_prize <- function(windows) { payoffs <- c("DD" = 800, "7" = 80, "BBB" = 40, "BB" = 25, "B" = 10, "C" = 10, "0" = 0)

same <- length(unique(windows)) == 1 allbars <- all(windows %in% c("B", "BB", "BBB"))

if (same) { prize <- payoffs[windows[1]] } else if (allbars) { prize <- 5 } else { cherries <- sum(windows == "C") diamonds <- sum(windows == "DD")

prize <- c(0, 2, 5)[cherries + 1] * c(1, 2, 4)[diamonds + 1] } prize}

Monday, 13 September 2010

Page 6: 08 functions

calculate_prize <- function(windows) { payoffs <- c("DD" = 800, "7" = 80, "BBB" = 40, "BB" = 25, "B" = 10, "C" = 10, "0" = 0)

same <- length(unique(windows)) == 1 allbars <- all(windows %in% c("B", "BB", "BBB"))

if (same) { prize <- payoffs[windows[1]] } else if (allbars) { prize <- 5 } else { cherries <- sum(windows == "C") diamonds <- sum(windows == "DD")

prize <- c(0, 2, 5)[cherries + 1] * c(1, 2, 4)[diamonds + 1] } prize}

name

always indent inside {}!

arguments

last value in function is resultMonday, 13 September 2010

Page 7: 08 functions

mean <- function(x) { sum(x) / length(x)}mean(1:10)

mean <- function(x, na.rm = TRUE) { if (na.rm) x <- x[!is.na(x)] sum(x) / length(x)}mean(c(NA, 1:9))

Monday, 13 September 2010

Page 8: 08 functions

mean <- function(x) { sum(x) / length(x)}mean(1:10)

mean <- function(x, na.rm = TRUE) { if (na.rm) x <- x[!is.na(x)] sum(x) / length(x)}mean(c(NA, 1:9))

default value

Monday, 13 September 2010

Page 9: 08 functions

# Function: meanstr(mean) mean

args(mean)formals(mean)body(mean)

Monday, 13 September 2010

Page 10: 08 functions

Function arguments

Every function argument has a name and a position. When calling, matches exact name, then partial name, then position.

Rule of thumb: for common functions, position based matching is ok for required arguments. Otherwise use names (abbreviated as long as clear).

Monday, 13 September 2010

Page 11: 08 functions

mean(c(NA, 1:9), na.rm = T)

# Confusing in this case, but often saves typingmean(c(NA, 1:9), na = T)

# Generally don't need to name all argumentsmean(x = c(NA, 1:9), na.rm = T)

# Unusual orders best avoided, but# illustrate the principlemean(na.rm = T, c(NA, 1:9))mean(na = T, c(NA, 1:9))

Monday, 13 September 2010

Page 12: 08 functions

# Overkillqplot(x = price, y = carat, data = diamonds)

# Don't need to supply defaultsmean(c(NA, 1:9), na.rm = F)

# Need to remember too much about meanmean(c(NA, 1:9), , T)

# Don't abbreviate too much!mean(c(NA, 1:9), n = T)

Monday, 13 September 2010

Page 13: 08 functions

f <- function(a = 1, abcd = 1, abdd = 1) { print(a) print(abcd) print(abdd)}

f(a = 5)f(ab = 5)f(abc = 5)

Monday, 13 September 2010

Page 14: 08 functions

Your turn

Write a function to calculate the variance of a vector. Make sure it has a na.rm argument.

Write the function in your text editor, then copy into R.

Monday, 13 September 2010

Page 15: 08 functions

Strategy

Always want to start simple: start with test values and get the body of the function working first.

Check each step as you go.

Don’t try and do too much at once!

Create the function once everything works.

Monday, 13 September 2010

Page 16: 08 functions

x <- 1:10sum((x - mean(x)) ^ 2) / (length(x) - 1)

var <- function(x) sum((x - mean(x)) ^ 2) / (length(x) - 1)

x <- c(1:10, NA)var(x)

na.rm <- TRUEif (na.rm) { x <- x[!is.na(x)]}var <- function(x, na.rm = F) { if (na.rm) { x <- x[!is.na(x)] } sum((x - mean(x)) ^ 2) / (length(x) - 1)}

Monday, 13 September 2010

Page 17: 08 functions

Testing

Always a good idea to test your code.

We have a prebuilt set of test cases: the prize column in slots.csv

So for each row in slots.csv, we need to calculate the prize and compare it to the actual. (Hopefully they will be same!)

Monday, 13 September 2010

Page 18: 08 functions

For loops

for(value in 1:10) { print(value)}

print(1)print(2)print(3)print(4)print(5)print(6)print(7)print(8)print(9)print(10)

Monday, 13 September 2010

Page 19: 08 functions

For loops

cuts <- levels(diamonds$cut)for(cut in cuts) { selected <- diamonds$price[diamonds$cut == cut] print(cut) print(mean(selected))}# Have to do something with output!

Monday, 13 September 2010

Page 20: 08 functions

# Common pattern: create object for output, # then fill with results

cuts <- levels(diamonds$cut)means <- rep(NA, length(cuts))

for(i in seq_along(cuts)) { sub <- diamonds[diamonds$cut == cuts[i], ] means[i] <- mean(sub$price)}

# We will learn more sophisticated ways to do this # later on, but this is the most explicit

Monday, 13 September 2010

Page 21: 08 functions

# Common pattern: create object for output, # then fill with results

cuts <- levels(diamonds$cut)means <- rep(NA, length(cuts))

for(i in seq_along(cuts)) { sub <- diamonds[diamonds$cut == cuts[i], ] means[i] <- mean(sub$price)}

# We will learn more sophisticated ways to do this # later on, but this is the most explicit

Why use i and not cut?

Monday, 13 September 2010

Page 22: 08 functions

seq_len(5)seq_len(10)n <- 10seq_len(10)

seq_along(1:10)seq_along(1:10 * 2)

Monday, 13 September 2010

Page 23: 08 functions

Your turn

For each diamond colour, calculate the median price and carat size

Monday, 13 September 2010

Page 24: 08 functions

colours <- levels(diamonds$color)n <- length(colours)mprice <- rep(NA, n)mcarat <- rep(NA, n)

for(i in seq_len(n)) { set <- diamonds[diamonds$color == colours[i], ] mprice[i] <- median(set$price) mcarat[i] <- median(set$carat)}

results <- data.frame(colours, mprice, mcarat)

Monday, 13 September 2010

Page 25: 08 functions

Back to slots...

For each row, calculate the prize and save it, then compare calculated prize to actual prize

Question: given a row, how can we extract the slots in the right form for the function?

Monday, 13 September 2010

Page 26: 08 functions

slots <- read.csv("slots.csv")

i <- 334

slots[i, ]

slots[i, 1:3]

str(slots[i, 1:3])

as.character(slots[i, 1:3])

c(as.character(slots[i, 1]), as.character(slots[i, 2]), as.character(slots[i, 3]))

slots <- read.csv("slots.csv", stringsAsFactors = F)

str(slots[i, 1:3])

as.character(slots[i, 1:3])

calculate_prize(as.character(slots[i, 1:3]))

Monday, 13 September 2010

Page 27: 08 functions

# Create space to put the resultsslots$check <- NA

# For each row, calculate the prizefor(i in seq_len(nrow(slots))) { w <- as.character(slots[i, 1:3]) slots$check[i] <- calculate_prize(w)}

# Check with known answerssubset(slots, prize != prize_c)# Uh oh!

Monday, 13 September 2010

Page 28: 08 functions

# Create space to put the resultsslots$check <- NA

# For each row, calculate the prizefor(i in seq_len(nrow(slots))) { w <- as.character(slots[i, 1:3]) slots$check[i] <- calculate_prize(w)}

# Check with known answerssubset(slots, prize != prize_c)# Uh oh!

What is the problem? Think about the most general case

Monday, 13 September 2010

Page 29: 08 functions

DD DD DD 8007 7 7 80

BBB BBB BBB 40BB BB BB 25B B B 10C C C 10

Any bar Any bar Any bar 5C C * 5C * C 5C * * 2* C * 2* * C 2

DD doubles any winning combination. Two DD quadruples. DD is wild

windows <- c("DD", "C", "C")

# How can we calculate the

# payoff?

Monday, 13 September 2010

Page 30: 08 functions

Homework

What did we miss? Find out the problem with our scoring function and fix it.

Monday, 13 September 2010