The State of HPC in the Open Source R Ecosystem
Drew Schmidt
November 12, 2016
Slides: wrathematics.github.io/hpcdevcon2016/ Drew Schmidt The State of HPC in the Open Source R Ecosystem
Support and Disclaimer
This material is based upon work supported by the National Science Foundation Division of Mathematical Sciences under Grant No. 1418195.
The findings and conclusions in this presentation have not been formally disseminated by the U.S. Department of Health & Human Services nor by the U.S. Department of Energy, and should not be construed to represent any determination or policy of University, Agency, Administration and National Laboratory.
Speaker Bio
M.S. in mathematics.
Former statistics consultant.
Former full-time university researcher.
Now a miserable grad student.
Prolific complainer on twitter.
Goals of This Talk
Convince you that R has a legitimate place in HPC.
Give a broad overview of the R package landscape.
Make some very safe predictions.
Contents
1 Background and Motivation
2 A Little History
3 Packages
4 A Closer Look at HPC and R
5 Concluding Remarks
Background and Motivation
1 Background and Motivation
   R Is Weird
   R Is Popular
Background and Motivation R Is Weird
Types
logical (“boolean”)
integer (32-bit int)
numeric (double)
complex (double complex)
character (string)
Also raw and external pointer
Data Structures
Vectors (matrices, n-dim arrays)
Lists (arrays of pointers)
Dataframes (lists with constraints)
Environments (hash tables?!)
That’s it.
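A minimal sketch of the types and structures listed above, using only base R. The variable names (`m`, `df`, `e`) are illustrative and not from the talk.

```r
# Each atomic type, as reported by typeof()
typeof(TRUE)          # "logical"
typeof(1L)            # "integer"
typeof(1.5)           # "double" (what R calls "numeric")
typeof(1+2i)          # "complex"
typeof("a")           # "character"
typeof(as.raw(255))   # "raw"

# A matrix is just a vector with a dim attribute
m <- matrix(1:6, nrow = 2)
typeof(m)             # "integer"
dim(m)                # 2 3

# A data frame is a list whose elements all have the same length
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
is.list(df)           # TRUE

# An environment behaves like a mutable key-value store
e <- new.env()
assign("answer", 42, envir = e)
get("answer", envir = e)   # 42
```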
Happy Opposite Day!
```r
T
## [1] TRUE
F
## [1] FALSE

T <- FALSE
F <- TRUE

T
## [1] FALSE
F
## [1] TRUE
```
Odd Conventions
. has no semantic meaning (except when it does)
t.test()
t.data.frame()
A package is installed in a library.
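The dot convention can be sketched as follows. The generic `describe`, its methods, and the class `circle` are hypothetical names made up for this illustration; only `t()` and its data-frame method come from base R.

```r
# A dot can be an ordinary character in a name...
my.value <- 42

# ...or it can separate an S3 generic from the class it dispatches on.
describe <- function(x) UseMethod("describe")
describe.default <- function(x) "something else"
describe.circle <- function(x) "a circle"

shape <- structure(list(r = 1), class = "circle")
describe(shape)   # "a circle"
describe(1:3)     # "something else"

# In base R, t.data.frame() is the data-frame method for t(),
# while t.test() is Student's t-test and is not a transpose.
is.matrix(t(data.frame(a = 1:2, b = 3:4)))   # TRUE
```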
Package or Library?
I wrote a library.
I put that library into a package.
I installed the package . . . into a library.
I load the package with library() ???
*BOOM*
Background and Motivation R Is Popular
Part Programming Language, Part Data Analysis Package
“R is a shockingly dreadful language for an exceptionally useful data analysis environment.”— Tim Smith, from aRrgh: a newcomer’s (angry) guide to R.
IEEE Spectrum’s 2014 Ranking of Programming Languages
IEEE Spectrum’s 2016 Ranking of Programming Languages
Rexer 2015 data scientist survey
Why use R at all?
Most diverse set of statistical methods available.
Rapid prototyping.
CRAN (and increasingly GitHub) packages.
Awesome community.
Syntax is designed for analysis of data.
A Little History
2 A Little History
   Statistics, Data Science, Big Data, and So On
   Enter R
A Little History Statistics, Data Science, Big Data, and So On
HPC: Not Just for PDEs Anymore!
R’s use in HPC.
No traditional HPC. . .
Lots of interesting work
About Traditional HPC. . .
Changing Landscape of HPC
“non-traditional” HPC: everybody but physics.
What kind of software do they need?
Can we leverage any existing HPC stuff?
Problems with “Big Data” Software
Many frameworks; what do they all do?
Don’t always play nice with HPC systems.
Often not as “high level” as advertised.
Almost exclusively batch!
Data Analysis Is An Interactive Activity
Data analysis is an interactive activity [a]
[a] Data analysis is an interactive activity
Data science in action
A Little History Enter R
http://datascience.la/john-chambers-user-2014-keynote/
Packages
3 Packages
   Advanced Compute Packages
   HPC Packages
   Hadoop and Applications
   Ok, So What?
Where to Begin?
Many packages of varying scope and quality.
1 core package (parallel)
Over 100 contributed packages: https://cran.r-project.org/web/views/HighPerformanceComputing.html
Even more on GitHub.
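As a rough illustration of the one core package, here is a minimal sketch using `parallel` with a socket cluster. The worker count of 2 and the toy squaring function are arbitrary choices for the example, not something from the talk.

```r
library(parallel)

# Start a small socket cluster; two workers is an arbitrary choice.
cl <- makeCluster(2)

# Apply a function to each element, spreading the work across workers.
squares <- parLapply(cl, 1:4, function(i) i^2)

# Release the workers when done.
stopCluster(cl)

unlist(squares)   # 1 4 9 16
```

On Unix-alikes, `mclapply()` from the same package offers a fork-based alternative with a similar calling style.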
Packages Advanced Compute Packages
Out of Core Packages
ff, bigmemory and friends
R is very “copy happy”
Many statisticians don’t know about things like XSEDE.
Others hear “Linux” and run away screaming.
Bizarrely, cloud computing is changing this.
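One reading of “copy happy” is R's copy-on-modify value semantics, sketched below in base R. The names `x`, `y`, `z`, and `zero_first` are hypothetical, and linking this behaviour to the motivation for out-of-core packages is an assumption on my part rather than a claim from the slides.

```r
x <- c(1, 2, 3)
y <- x          # no data is copied yet; both names refer to one vector
y[1] <- 100     # modifying y forces R to duplicate the vector first

x               # 1 2 3 (unchanged)
y               # 100 2 3

# The same applies to function arguments: callers never see the change.
zero_first <- function(v) {
  v[1] <- 0     # works on a private copy of the caller's vector
  v
}
z <- zero_first(x)
x               # still 1 2 3
z               # 0 2 3
```

With small vectors this is harmless; with data near the size of available memory, each such duplication can become the limiting factor.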
Rcpp
Rcpp
RcppArmadillo, RcppEigen
RcppParallel
. . .
Packages HPC Packages
Accelerator Packages
gputools, Magma, HiPLARM, a few others.
Accessibility mostly from things like nvblas and Intel MKL.
Distributed Packages
Rmpi
snow
pbdMPI and friends
Remote Evaluation Packages
rzmq, pbdZMQ
remoter, future
Packages Hadoop and Applications
Hadoop et al Packages
RHadoop, RHIPE
SparkR
sparklyr
h2o
“Applications”
dplyr and data.table
caret
randomForest
xgboost
Packages Ok, So What?
Is the R community using this stuff?
Short answer: yes.
Long answer: mostly single-node parallelism.
Hard truth: in addition to hype and buzzwords — fear and distrust
Source https://twitter.com/eddelbuettel/status/787740983433854977
A Closer Look at HPC and R
4 A Closer Look at HPC and R
HPC may be dying, but we’re behind the times
[Figure: line plot of package download market share (y-axis) against date, 2014 to 2017 (x-axis), for the packages pbdMPI and Rmpi.]
“OLCF Researchers Scale R to Tackle Big Science Data Sets”
A problem that takes several hours on Apache Spark [was analyzed] in less than a minute using R on OLCF high-performance hardware.
“. . . for situations where one needs interactive near-real-time analysis, the pbdR approach is much better.”
https://www.hpcwire.com/2016/07/06/olcf-researchers-scale-r-tackle-big-science-data-sets/
[Figure: parallel hardware diagram with three panels. Distributed Memory: four "PROC + cache" units, each with its own "Mem", joined by an Interconnection Network. Shared Memory: four "CORE + cache" units sharing one Memory, attached to the Network. Co-Processor with Local Memory: GPU (Graphical Processing Unit) and MIC (Many Integrated Core).]
[Figure: the same hardware diagram (Distributed Memory, Shared Memory, Co-Processor with Local Memory), overlaid with software labels.]
Labels shown: Trilinos, PETSc, PLASMA, DPLASMA, LibSci (Cray), MKL (Intel), ACML (AMD), ScaLAPACK, PBLAS, BLACS, cuBLAS (NVIDIA), cuSPARSE (NVIDIA), MAGMA, CombBLAS, PAPI, Tau, MPI, mpiP, fpmpi, NetCDF4, ADIOS, ZeroMQ, HiPLAR, HiPLARM, magma.
pbdR packages shown: pbdMPI, pbdPAPI, pbdNCDF4, pbdADIOS, pbdPROF, pbdDEMO, pbdDMAT, pbdBASE, pbdSLAP, pbdCS.
Group labels: Profiling, I/O, Learning.
Legend: Released, Under Development.
Concluding Remarks
5 Concluding Remarks
The Future?
Better dplyr backends.
More threading + accelerator usage in packages (Rcpp + RcppParallel).
Astronomical amounts of buzz in the Hadoop/Spark-and-friends space, which will ultimately hurt us in the MPI space.
∼Thanks!∼
Questions?
Email: [email protected]
GitHub: https://github.com/wrathematics
Web: http://wrathematics.info
Twitter: @wrathematics