Post on 08-Oct-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Build your own super-computer withcloudyrand AWS · furrr aws.ec2 aws.ec2 aws.ec2 aws.ec2 Requied •Active AWS account. •Amazon Machine Image (AMI) with •R •ssh •remoter

Build your own super-computer with cloudyr and AWS

with(aws.ec2, assign("furrr", future + purrr))

Laurens Geffert

@JanLauGe

https://janlauge.github.io

[email protected]


Outline

• Introduction
  • How I came to use and love R
  • How the tidyverse makes everything better

• Motivation
  • The prevalence of embarrassingly parallel problems in applied data science
  • Scaling up with open-source solutions

• Demo
  • The base R single-threaded approach
  • The parallelized cloud approach


Audience Survey

• Who uses the tidyverse?
• Who uses AWS?
• Who has heard of the future package?


During my PhD

Species distribution models
• X = Remote sensing data
• Y = Species occurrence data


In the “real world”

Audience lookalike models
• X = Web event data
• Y = Panel data


The base R way

library(glmnet)

X <- 'my predictors'
Y <- 'variables to predict'
results <- list()

# loop over the response vectors
# to fit one model each
for (i in seq_along(Y)) {
  y <- Y[[i]]
  model <- cv.glmnet(X, y)
  results[[i]] <- model
}
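To make the loop concrete, here is a minimal sketch on simulated data. The data shapes (X as a numeric matrix, Y as a list of response vectors) are assumptions, and lm() stands in for cv.glmnet() so the example runs in base R without extra packages. It also illustrates why this kind of problem is embarrassingly parallel: each fit depends only on its own response, so the order of iterations does not matter.

```r
# Simulated stand-ins for the real data (assumed shapes, not the author's data)
set.seed(1)
n <- 100
X <- matrix(rnorm(n * 3), nrow = n)   # predictors: n rows, 3 columns
Y <- list(                            # one response vector per model
  X[, 1] + rnorm(n),
  X[, 2] - X[, 3] + rnorm(n),
  rnorm(n)
)

# Fit one model per response, visiting the responses in the given order.
# lm() is a stand-in for cv.glmnet() to avoid a package dependency.
fit_all <- function(index_order) {
  results <- vector("list", length(Y))
  for (i in index_order) {
    results[[i]] <- coef(lm(Y[[i]] ~ X))
  }
  results
}

forward  <- fit_all(seq_along(Y))
backward <- fit_all(rev(seq_along(Y)))

# Compare the two runs: no iteration depends on another
identical(forward, backward)
```

Because the iterations are independent, they can be handed to separate cores or machines without changing the result.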


The Tidyverse way

library(glmnet)
library(purrr)

X <- 'my predictors'
Y <- 'variables to predict'

# map (apply) over
# all elements in Y
map(Y, X = X,
    ~ cv.glmnet(X, .x))


The furrr way

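library(glmnet)
library(furrr)

X <- 'my predictors'
Y <- 'variables to predict'

# map (apply) over
# all elements in Y, in parallel
# (multicore forks the R session; not available on Windows)
plan(multicore)
future_map(Y, X = X,
           ~ cv.glmnet(X, .x))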
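The step from map() to future_map() can be tried locally before any cloud setup. The sketch below is an illustration under assumptions, not the demo from the talk: the data are simulated, fit_one() is a hypothetical helper, lm() stands in for cv.glmnet(), and plan(multisession) replaces plan(multicore) because multisession also works on Windows and inside RStudio.

```r
library(future)
library(purrr)
library(furrr)

# Simulated stand-ins for the real data (assumed shapes)
set.seed(42)
n <- 200
X <- matrix(rnorm(n * 5), nrow = n)
Y <- list(
  a = X[, 1] + rnorm(n),
  b = X[, 2] - X[, 3] + rnorm(n),
  c = rnorm(n)
)

# Hypothetical helper: fit one model and return its coefficients.
# lm() is a stand-in for cv.glmnet().
fit_one <- function(y, X) coef(lm(y ~ X))

# Sequential version with purrr
fits_seq <- map(Y, fit_one, X = X)

# Parallel version with furrr: only the plan and the function name change
plan(multisession, workers = 2)
fits_par <- future_map(Y, fit_one, X = X)
plan(sequential)  # release the background workers

# Both versions should agree
all.equal(fits_seq, fits_par)
```

The calling code stays the same whichever plan is active, which is what makes it possible to swap local workers for remote ones later.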


Setup

[Diagram: the local machine uses furrr to send work to four workers, each running on an EC2 instance managed with aws.ec2]

Required

• Active AWS account
• Amazon Machine Image (AMI) with
  • R
  • ssh
  • remoter
  • tidyverse
  • future
  • furrr
• Working ssh key pair
  • On local machine
  • On AMI (public AND private!)
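Once instances built from such an AMI are running, the local session needs to reach them over ssh. The following is a rough sketch of that connection step only, assuming the public IPs of the instances are already known (for example from aws.ec2). The function name connect_workers(), the default user "ubuntu", and the key file path are hypothetical; parallelly::makeClusterPSOCK() and future::plan() are the real functions used, but the ssh options may need adjusting for a given AMI.

```r
# Hypothetical helper: open ssh-based R workers on remote machines and
# register them as the active future plan.
connect_workers <- function(public_ips, private_key_file, user = "ubuntu") {
  cl <- parallelly::makeClusterPSOCK(
    workers = public_ips,   # one R worker per IP address
    user    = user,         # ssh login name on the AMI (assumption)
    rshopts = c(
      "-o", "StrictHostKeyChecking=no",
      "-o", "IdentitiesOnly=yes",
      "-i", private_key_file  # private half of the ssh key pair
    )
  )
  # From here on, future_map() calls are sent to the remote workers
  future::plan(future::cluster, workers = cl)
  invisible(cl)
}

# Usage (not run; placeholder IPs and key path):
# cl <- connect_workers(c("203.0.113.10", "203.0.113.11"), "~/.ssh/my-key.pem")
# furrr::future_map(Y, ~ cv.glmnet(X, .x))
# parallel::stopCluster(cl)
```

Stopping the cluster does not stop the instances themselves; those keep running, and costing money, until they are terminated on the AWS side.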


Furrr::ther reading

• cloudyr, ssh by rOpenSci, remoter by Drew Schmidt, and last but not least furrr by Davis Vaughan.
• https://davisvaughan.github.io/furrr/articles/advanced-furrr-remote-connections.html
• https://github.com/JanLauGe/ds-personal-projects/tree/master/ds-computing-cluster
• https://janlauge.github.io/
• Call to action: Support the cloudyr project