Rcpp Packages – Tying It All Together
Advanced Statistical Programming Camp
Jonathan Olmsted (Q-APS)
Day 5: May 31st, 2014, AM Session


Outline

1 Introduction

2 Creating Rcpp(Armadillo) Packages

3 Capstone

4 OpenMP


Motivation

• Today’s session focuses on using an R package to
  • organize our R code
  • organize our C++ code
  • automate the build process

• Transitioning to an R package is necessary:
  • sourceCpp is fantastic for prototyping
  • but it has considerable limitations for production work
  • sourceCpp won’t even work on the compute nodes of Adroit
    • they don’t have compilers

• Goal: construct (three) packages and then run a job on the cluster that uses the last of them


The Rcpp Magic

To appreciate what we achieve by automating the compilation of our code, consider the EM example we have used so far.

library("Rcpp")
sourceCpp("em_probit.cpp")

• we run sourceCpp
• an R function is created

Everything “just works”!


#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]

double f (double mu) {
  double val = ((R::dnorm(-mu, 0, 1, false)) /
                (1 - R::pnorm(-mu, 0, 1, true, false))
                ) ;
  return(val) ;
}

double g (double mu) {
  double val = ((R::dnorm(-mu, 0, 1, false)) /
                (R::pnorm(-mu, 0, 1, true, false))
                ) ;
  return(val) ;
}

// [[Rcpp::export()]]
arma::mat em_probit (arma::mat y,
                     arma::mat X,
                     int maxit = 10
                     ) {
  int N = y.n_rows ;
  int K = X.n_cols ;
  arma::mat beta(K, 1) ;
  beta.fill(0.0) ; // initialize betas to 0

  arma::mat eystar(N, 1) ;
  eystar.fill(0) ;

  for (int it = 0 ; it < maxit ; it++) {
    arma::mat mu = X * beta ;
    for (int n = 0 ; n < N ; n++) {
      if (y(n, 0) == 1) { // y = 1
        eystar(n, 0) = mu(n, 0) + f(mu(n, 0)) ;
      }
      if (y(n, 0) == 0) { // y = 0
        eystar(n, 0) = mu(n, 0) - g(mu(n, 0)) ;
      }
    }
    // linear regression given augmented data
    beta = (X.t() * X).i() * X.t() * eystar ;
  }
  return(beta) ;
}

em_probit.cpp
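To see what the E-step and M-step in em_probit.cpp are doing without the Rcpp and Armadillo machinery, here is a minimal standalone sketch in plain C++. It is not the session's code: norm_pdf, norm_cdf, and em_probit_intercept are hypothetical names, the normal density and distribution function are stand-ins for R::dnorm and R::pnorm, and the model is assumed to have only an intercept, so the regression in the M-step reduces to a sample mean.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Standard normal density (stand-in for R::dnorm with mean 0, sd 1).
double norm_pdf(double x) {
  return std::exp(-0.5 * x * x) / std::sqrt(2.0 * std::acos(-1.0));
}

// Standard normal distribution function (stand-in for R::pnorm).
double norm_cdf(double x) {
  return 0.5 * std::erfc(-x / std::sqrt(2.0));
}

// E[y* | y = 1] = mu + f(mu): mean shift of a N(mu, 1) draw truncated to (0, inf).
double f(double mu) {
  return norm_pdf(-mu) / (1.0 - norm_cdf(-mu));
}

// E[y* | y = 0] = mu - g(mu): mean shift of a N(mu, 1) draw truncated to (-inf, 0).
double g(double mu) {
  return norm_pdf(-mu) / norm_cdf(-mu);
}

// EM for an intercept-only probit (assumption: X is a single column of ones,
// so the least-squares M-step is just the mean of the augmented data).
double em_probit_intercept(const std::vector<int>& y, int maxit) {
  assert(!y.empty());
  double beta = 0.0;
  for (int it = 0; it < maxit; ++it) {
    double sum = 0.0;
    for (int yi : y) {
      sum += (yi == 1) ? beta + f(beta) : beta - g(beta); // E-step
    }
    beta = sum / static_cast<double>(y.size());           // M-step
  }
  return beta;
}
```

In this special case the fixed point of the iteration satisfies Φ(beta) = mean(y), so with three ones out of four observations the estimate should settle near the 0.75 quantile of the standard normal.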


Unless there is an error in our code, sourceCpp does not tell us about the details of compilation. We can ask for the details, though.

sourceCpp("em_probit.cpp", rebuild = TRUE, verbose = TRUE)

The output that is generated includes:
• the auto-generated bridging code which connects our snippet to the R level (both C++ code and R code)
• the command line calls necessary to compile the code

If we weren’t using Rcpp, we would have to do some (or all) of this ourselves.


R Packages

A well-defined way of organizing:
• R code
• to-be-compiled code
• data
• documentation

They are particularly useful for:
• distribution of code
• automating the build mechanism for compiled code
• automatic dependency checking (R packages)

We can’t go into all of the details of R packages, but the documentation is extensive: http://cran.r-project.org/doc/manuals/R-exts.html.


Rcpp(Armadillo) Packages

• We can’t just put our Rcpp-based code in a package and expect it to work.

• However, the team has made it easy.

• Creating a minimally working R package that uses Rcpp and RcppArmadillo is automatic.
  • RcppArmadillo.package.skeleton


Probit Regression Two Ways

• Our goal is to create a working R package with functions to estimate a Probit regression model with both EM and Bayesian MCMC.

• Again, the details of compilation will be mostly automatic.
  • Compiled output is persistent across sessions.
  • We can use our compiled code on the cluster.


• Before creating an R package, you will want some prototyped code.
  • EM Probit is old code.
  • Bayesian MCMC Probit is new.

• As we work toward our goal, we will create 3 different R packages.
  pack1: EM Probit regression
  pack2: EM and Bayesian MCMC Probit regression
  pack3: EM and Bayesian MCMC Probit regression with headers


Bayesian MCMC Probit Regression

• Verify that the Bayesian MCMC code works.
  • Confirm the prototype is good with sourceCpp.

• This is not general-purpose MCMC code for a Probit. It is just an illustration.


#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]

using namespace Rcpp ;

double get1TN(double mu,
              double sd,
              double low,
              double high
              ) {
  double draw = 0.0 ;
  bool valid = false ;
  while (!valid) {
    double cand = R::rnorm(mu, sd) ;
    if ((cand >= low) &&
        (cand <= high)
        ) {
      draw = cand ;
      valid = true ;
    }
  }
  return(draw) ;
}

// [[Rcpp::export()]]
List mcmc_probit (arma::mat y,
                  arma::mat X,
                  arma::mat betastart,
                  int mcmc = 10
                  ) {
  // Dimensions
  int N = y.n_rows ;
  int K = X.n_cols ;

  // Assumed Priors
  arma::mat mu0(K, 1) ;
  mu0.fill(0.0) ;
  double var = pow(5, 2) ;
  arma::vec vars(K) ;
  vars.fill(var) ;
  arma::mat sigma0(K, K) ;
  sigma0.fill(0.0) ;
  sigma0.diag() = vars ;

  // Current Containers
  arma::mat ystar(N, 1) ;
  ystar.fill(0.0) ;
  arma::mat beta = betastart ;
  arma::mat mu(N, 1) ;

  // Trace Containers
  arma::mat trace(K, mcmc) ;

  // Sampling
  for (int iter = 0 ; iter < mcmc ; iter++) {
    // UPDATE ystar
    mu = X * beta ;
    for (int n = 0 ; n < N ; n++) {
      if (y(n, 0) == 1) {
        ystar(n, 0) = get1TN(mu(n, 0),
                             1,
                             0,
                             INFINITY
                             ) ;
      }
      if (y(n, 0) == 0) {
        ystar(n, 0) = get1TN(mu(n, 0),
                             1,
                             -INFINITY,
                             0
                             ) ;
      }
    }

    // UPDATE beta
    arma::mat sigma = (X.t() * X + sigma0.i()).i() ;
    // posterior mean of beta (named mubeta so it does not shadow mu above)
    arma::mat mubeta = sigma * (X.t() * ystar + sigma0.i() * mu0) ;
    arma::mat z(K, 1) ;
    z.fill(0.0) ;
    for (int k = 0 ; k < K ; k++) {
      z(k, 0) = R::rnorm(0, 1) ;
    }
    // chol() returns the upper-triangular factor R with R.t() * R == sigma,
    // so the transpose is needed for the draw to have covariance sigma
    beta = mubeta + arma::chol(sigma).t() * z ;

    // TRACE
    trace.col(iter) = beta ;
  }

  // Returns
  List ret ;
  List last ;
  List priors ;

  last["ystar"] = ystar ;
  last["beta"] = beta ;

  priors["mu"] = mu0 ;
  priors["sigma"] = sigma0 ;

  ret["beta"] = trace ;
  ret["N"] = N ;
  ret["K"] = K ;
  ret["mcmc"] = mcmc ;
  ret["last"] = last ;
  ret["priors"] = priors ;
  return(ret) ;
}

bayes_probit.cpp
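For reference, the two blocks that the sampler alternates between ("UPDATE ystar" and "UPDATE beta") can be written out as below. This is the usual data-augmentation form for a probit with a normal prior, stated here as a reading aid rather than as part of the original slides. The prior mean and covariance correspond to mu0 and sigma0 in the listing (zero mean, variance 25 on the diagonal); the symbols \mu_n, \Sigma_n, and A are notation introduced only for this note.

```latex
\begin{aligned}
y_i^{*} \mid \beta,\, y_i = 1 &\sim \mathcal{N}(x_i^{\top}\beta,\ 1)\ \text{truncated to}\ [0, \infty) \\
y_i^{*} \mid \beta,\, y_i = 0 &\sim \mathcal{N}(x_i^{\top}\beta,\ 1)\ \text{truncated to}\ (-\infty, 0] \\
\Sigma_n &= \left(X^{\top}X + \Sigma_0^{-1}\right)^{-1} \\
\mu_n &= \Sigma_n\left(X^{\top}y^{*} + \Sigma_0^{-1}\mu_0\right) \\
\beta \mid y^{*} &\sim \mathcal{N}(\mu_n,\ \Sigma_n),
\qquad \beta = \mu_n + A z,\quad z \sim \mathcal{N}(0, I_K),\quad A A^{\top} = \Sigma_n
\end{aligned}
```

The last line is why the factor applied to z matters: a draw of the form \mu_n + A z has covariance A A^{\top}, so A must be a factor whose product with its own transpose, in that order, gives \Sigma_n.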


library(Rcpp)

sourceCpp("em_probit.cpp")
sourceCpp("bayes_probit.cpp")

library(Zelig)
data(turnout)

fit0 <- glm(vote ~ income + educate + age,
            data = turnout,
            family = binomial(link = "probit"))

coef(fit0)

## (Intercept)      income     educate         age
##    -1.68241     0.09936     0.10667     0.01692

y <- matrix(turnout$vote)
X <- model.matrix(fit0)

betafit <- em_probit(y = y,
                     X = X)

out <- mcmc_probit(y = y,
                   X = X,
                   betastart = betafit,
                   mcmc = 5000)


pack1: EM Probit

1 Navigate to the directory we will create the package in.
  • For today’s use, a subdirectory adjacent to today’s files.
2 Start R.
3 Load Rcpp and RcppArmadillo (the latter provides the skeleton function).
4 RcppArmadillo.package.skeleton("pack1").
5 cp the EM Probit code (em_probit.cpp) into the src directory.
  • e.g., cp ./pack1/src/


1 Compile attributes.
  1 cd pack1
  2 R
  3 library(Rcpp)
  4 compileAttributes(verbose = TRUE) ; quit()

2 Install the package (shortcut method).
  1 cd .. (if you are still in pack1)
  2 R CMD INSTALL pack1

3 Test.
  1 Rscript pack1_demo.R


pack2: EM and MCMC Probit

1 Start R.
2 Load Rcpp and RcppArmadillo (the latter provides the skeleton function).
3 RcppArmadillo.package.skeleton("pack2").
4 cp the EM Probit code (em_probit.cpp) into the src directory.
5 cp the MCMC Probit code (bayes_probit.cpp) into the src directory.


1 Compile attributes.
  1 cd pack2
  2 R
  3 library(Rcpp)
  4 compileAttributes(verbose = TRUE) ; quit()

2 Install the package (shortcut method).
  1 cd .. (if you are still in pack2)
  2 R CMD INSTALL pack2

3 Test.
  1 Rscript pack2_demo.R

[1] -1.69950254  0.10069090  0.10735209  0.01704585
[1] 0.203190201 0.018175435 0.010442088 0.001765494


pack3: EM and MCMC Probit with Headers

1 Start R.
2 Load Rcpp and RcppArmadillo (the latter provides the skeleton function).
3 RcppArmadillo.package.skeleton("pack3").
4 cp the EM Probit code (em_probit.cpp) into the src directory.
5 cp the MCMC Probit code (bayes_probit.cpp) into the src directory.
6 cp the TN RNG code (tn.cpp) into the src directory.
7 cp the TN RNG header (tn.hpp) into the src directory.


• tn.cpp implements the (fragile) Truncated Normal RNG

• tn.hpp provides a header for other files that need this functionality to #include.
  • allows code re-use
  • allows code modularization
  • requires explicit #include of headers as a dependency
  • requires attention to “guarding” the header (i.e., #ifndef)


#include <Rcpp.h>

double get1TN(double mu,
              double sd,
              double low,
              double high
              ) {
  double draw = 0.0 ;
  bool valid = false ;
  while (!valid) {
    double cand = R::rnorm(mu, sd) ;
    if ((cand >= low) &&
        (cand <= high)
        ) {
      draw = cand ;
      valid = true ;
    }
  }
  return(draw) ;
}

tn.cpp
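One reason a rejection loop like this is fragile: the chance that a candidate is accepted is Φ((high − mu)/sd) − Φ((low − mu)/sd), and the loop keeps going until a candidate lands in the interval. If mu sits far outside the interval (say mu = −8 with the interval [0, ∞)), the expected number of tries is on the order of 10^15, so the call effectively never returns. The sketch below shows one possible safeguard in plain C++. It is an illustration under stated assumptions, not the session's code: draw_truncated_normal and its max_tries argument are hypothetical, and std::normal_distribution stands in for R::rnorm.

```cpp
#include <cassert>
#include <random>

// Rejection sampler for N(mu, sd^2) restricted to [low, high].
// Returns true and writes the accepted value to `draw` on success;
// returns false if no candidate landed in the interval within max_tries
// (max_tries is a hypothetical safeguard that tn.cpp does not have).
bool draw_truncated_normal(std::mt19937& rng,
                           double mu,
                           double sd,
                           double low,
                           double high,
                           int max_tries,
                           double& draw) {
  assert(sd > 0.0 && low < high && max_tries > 0);
  std::normal_distribution<double> normal(mu, sd);
  for (int t = 0; t < max_tries; ++t) {
    const double cand = normal(rng);
    if (cand >= low && cand <= high) {
      draw = cand;
      return true;
    }
  }
  return false; // interval is too far in the tail for naive rejection
}
```

A caller can treat a false return as a signal to switch to a sampler designed for tail regions, rather than waiting on a loop that may not finish.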


#ifndef tn_hpp
#define tn_hpp

double get1TN(double mu,
              double sd,
              double low,
              double high
              ) ;

#endif

tn.hpp
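A header that only declares a function, like tn.hpp, can technically be included twice without an error, because repeating a declaration is allowed. The guard starts to matter once a header contains definitions (a struct, an inline function), which is a common next step. The following single-file sketch imitates what the preprocessor sees when a guarded header is pulled in twice. Everything in it is hypothetical and for illustration only: bounds.hpp, BOUNDS_HPP, Bounds, and contains are not part of the session's packages.

```cpp
#include <cassert>

// --- first "inclusion" of a hypothetical header bounds.hpp ---
#ifndef BOUNDS_HPP
#define BOUNDS_HPP

struct Bounds {   // a definition: seeing it twice in one file is a compile error
  double low;
  double high;
};

inline bool contains(const Bounds& b, double x) {
  assert(b.low <= b.high);
  return x >= b.low && x <= b.high;
}

#endif

// --- second "inclusion": BOUNDS_HPP is already defined, so all of this is skipped ---
#ifndef BOUNDS_HPP
#define BOUNDS_HPP

struct Bounds {
  double low;
  double high;
};

inline bool contains(const Bounds& b, double x) {
  assert(b.low <= b.high);
  return x >= b.low && x <= b.high;
}

#endif
```

Deleting the #ifndef/#define/#endif lines from this sketch makes the compiler reject the second definition of Bounds, which is the failure the guard prevents.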


#include <RcppArmadillo.h>
#include "tn.hpp"

using namespace Rcpp ;

// [[Rcpp::export()]]
List mcmc_probit (arma::mat y,
                  arma::mat X,
                  arma::mat betastart,
                  int mcmc = 10
                  ) {
  // Dimensions
  int N = y.n_rows ;
  int K = X.n_cols ;

pack3/src/bayes_probit.cpp


1 Compile attributes.
  1 cd pack3
  2 R
  3 library(Rcpp)
  4 compileAttributes(verbose = TRUE) ; quit()

2 Install the package (shortcut method).
  1 cd .. (if you are still in pack3)
  2 R CMD INSTALL pack3

3 Test.
  1 Rscript pack3_demo.R

[1] -1.69848189  0.10081924  0.10721661  0.01704466
[1] 0.197491688 0.018013906 0.010699391 0.001704965


Automating Attribute Compiling

• The process of “compiling attributes” manually is tedious.
• You’ll probably want to write a script that does this for your own project.
• Consider builder.sh as inspiration.

#!/bin/bash

if [ -d "pack1" ]; then
  cd pack1
  Rscript -e "library(Rcpp) ; compileAttributes(verbose=TRUE)"
  cd ..
  R CMD INSTALL pack1
fi

builder.sh

Run it with sh builder.sh.


Parallel Chains of the MCMC Probit

• Structuring Rcpp-based code projects inside R packages is beneficial:
  1 disciplines file organization
  2 allows for code re-use without sacrificing mostly automatic compilation
  3 avoids unnecessary compilations

• But in the case of running on Adroit, it is necessary.
  • The compute nodes of Adroit don’t have compilers.
  • sourceCpp won’t work.


To run on Adroit:
• Create a SLURM file representing your job request.
  • job.slurm
• Create an R script representing your computational job.
  • job.R


#!/bin/sh
#SBATCH --nodes 2              # << number of compute nodes
#SBATCH --ntasks-per-node=2    # << number of processors per compute node
#SBATCH -t 00:10:00            # << max time required (hh:mm:ss)
#SBATCH -J mcmc                # << brief, descriptive job label
#SBATCH -o log.%j              # << file to save output/error information

srun Rscript job.R

job.slurm


## #################
## Read Env Variable
## #################

inSLURM <- (Sys.getenv("SLURM_JOB_ID") != "") # true only if a SLURM job

## ############
## Common Setup
## ############

library("foreach")

## #################
## Conditional Setup
## #################
set.seed(1)
if (inSLURM) {
  library("doMPI")
  cl <- startMPIcluster() ## will auto detect number of workers from SLURM
  registerDoMPI(cl)
} else {
  library("doParallel")
  cl <- makeCluster(4, "PSOCK")
  registerDoParallel(cl)
}

library("doRNG")
registerDoRNG()

getDoParWorkers()

## #################
## Computational Job
## #################

nChains <- 4

lOut <- foreach(case = 1:nChains) %dopar% {
  library(Zelig)
  data(turnout)
  fit0 <- glm(vote ~ income + educate + age,
              data = turnout,
              family = binomial(link = "probit")
              )
  coef(fit0)
  y <- matrix(turnout$vote)
  X <- model.matrix(fit0)

  library(pack3)

  betafit <- em_probit(y = y, X = X, maxit = 100)

  out <- mcmc_probit(y = y, X = X, betastart = betafit, mcmc = 5000)
  return(out)
}

save(lOut, file = "chains.rda")

## ####################
## Conditional Shutdown
## ####################

if (inSLURM) {
  closeCluster(cl)
  mpi.quit()
} else {
  stopCluster(cl)
}

job.R


• The R script is portable, so you can (illegally) run the job on the head node in parallel with a socket cluster of workers.
  • Rscript job.R

• Submit it to the scheduler to run with an MPI cluster of workers.
  • sbatch job.slurm

• Output is chains.rda.
  • one R object: lOut
  • a list of length 4
  • each element is the output from the mcmc_probit function


Including OpenMP Code in an R Package

If you wanted to include code in your R package using OpenMP (e.g., the parallel EM code), ./pack3/src/Makevars would need to be:

## Use the R_HOME indirection to support installations of multiple R version
PKG_CXXFLAGS=-fopenmp
PKG_LIBS = `$(R_HOME)/bin/Rscript -e "Rcpp:::LdFlags()"` $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS) -fopenmp

• This mixes MPI (or socket) and OpenMP parallelism, however.
• Correctly (and effectively) using both must be managed by the user.
• It’s easier (and safer) to use just one until the computational demands of a problem dictate otherwise.
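As a rough idea of what OpenMP code of this kind can look like, here is a minimal standalone sketch of an E-step loop for the EM probit with one OpenMP directive. It is not the session's parallel EM code: e_step, norm_pdf, and norm_cdf are hypothetical names, std::vector stands in for the Armadillo types, and y is assumed to be coded strictly as 0/1. Each iteration writes only its own element of the result, so the iterations are independent and the loop is a natural candidate for a parallel for.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Standard normal density and distribution function (stand-ins for R::dnorm / R::pnorm).
double norm_pdf(double x) {
  return std::exp(-0.5 * x * x) / std::sqrt(2.0 * std::acos(-1.0));
}

double norm_cdf(double x) {
  return 0.5 * std::erfc(-x / std::sqrt(2.0));
}

// E-step: expected latent value for every observation given the linear predictor mu.
// Assumes y holds only 0s and 1s.
std::vector<double> e_step(const std::vector<int>& y,
                           const std::vector<double>& mu) {
  assert(y.size() == mu.size());
  const long N = static_cast<long>(y.size());
  std::vector<double> eystar(y.size(), 0.0);
  // With -fopenmp the iterations are split across threads; without the flag
  // the pragma is ignored and the loop simply runs serially.
  #pragma omp parallel for
  for (long n = 0; n < N; ++n) {
    const std::size_t i = static_cast<std::size_t>(n);
    const double m = mu[i];
    if (y[i] == 1) {
      eystar[i] = m + norm_pdf(m) / norm_cdf(m);          // mean of N(m, 1) given > 0
    } else {
      eystar[i] = m - norm_pdf(m) / (1.0 - norm_cdf(m));  // mean of N(m, 1) given < 0
    }
  }
  return eystar;
}
```

The -fopenmp entries in the Makevars are what switch the directive on when the package is built. If each foreach worker also runs such a loop, the total thread count is workers times OpenMP threads, which is the kind of interaction the user has to manage.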
