R & Ext. World / useR! Kiev
R & EXT. WORLD: RESEARCH => DEVELOPMENT
Ruslan Shevchenko email: [email protected] twitter: @rssh1 github: https://github.com/rssh work: Lynx Capital Partners [consultant]
R & EXT. WORLD: TALK OVERVIEW
Suppose we have built a model in R.
(Rewrite/Embed/R for all)
Integration techniques:
Command line [littler, Rscript]
Language-level integration [RInside/Rcpp, RScala, rpy]
R as a network service [rApache, OpenCPU, Shiny]
SUPPOSE WE HAVE BUILT A MODEL IN R. WHAT NEXT?
1. Rewrite it in a 'real' programming language for production use?
   • Additional time and money.
   • Every later improvement of the model has to follow the same long path.
   => Only if absolutely necessary (platform, performance, etc.).
2. Integrate the R code with business logic written in another language?
   • Complex.
   • Extra maintenance cost.
   => Only if we have no other way.
3. Implement the business logic in R?
   • An esoteric way.
   • R was not created with a 'software engineering' way of thinking in mind.
   [Chart: Python vs R, Experience vs Productivity. Warning: speculative, based on feeling!]
   => Only if R is an ideal fit.
4. Migrate to another ecosystem?
   • Now: Python, Octave/Matlab.
   • Future: Scalalab (?), Julia (?).
   • Caveats: not statistics-oriented; not fully compatible or not free.
   => No clear superior alternative.
R INTEGRATIONS
S (old S): 1976, New S: 1988, S4: 1996
R: 1995
Until then: interactive use only.
First integrations:
littler: 2006
Rscript: 2006 (part of the R installation)
COMMAND LINE
littler (#!/<path>/r … )
ls -la | awk '!/^total/ {print $5}' | littler -e 'print(summary(as.integer(readLines())))'
echo 'cat(rnorm(10))' | littler
Rscript (#!/<path>/Rscript … )
ls -la | awk '!/^total/ {print $5}' | Rscript -e "summary(as.numeric(readLines('stdin')))"
echo 'cat(rnorm(10))' | Rscript -
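The "separate process" style of littler and Rscript is also the usual way a program in another language drives R. A minimal Python sketch, assuming `Rscript` is on the PATH; the helper names (`rscript_command`, `run_r`, `parse_cat_output`) are hypothetical, not part of any R tooling:

```python
import shutil
import subprocess


def rscript_command(expr: str) -> list[str]:
    """Argument list for `Rscript -e <expr>`; no shell, so no quoting issues."""
    return ["Rscript", "-e", expr]


def parse_cat_output(text: str) -> list[float]:
    """Parse whitespace-separated numbers, as printed by R's cat()."""
    return [float(tok) for tok in text.split()]


def run_r(expr: str, stdin_text: str = "") -> str:
    """Run one R expression in a fresh Rscript process and return its stdout.

    stdin_text is piped to the process, like the shell pipelines above.
    Raises FileNotFoundError if Rscript is missing and
    CalledProcessError if R exits with an error.
    """
    if shutil.which("Rscript") is None:
        raise FileNotFoundError("Rscript not found on PATH")
    result = subprocess.run(
        rscript_command(expr),
        input=stdin_text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, `parse_cat_output(run_r("cat(rnorm(10))"))` should give ten floats, and `run_r("cat(mean(as.numeric(readLines('stdin'))))", "1\n2\n3\n")` feeds data through stdin. Passing an argument list instead of a shell string avoids the nested-quote problems of the one-liners.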
LIBRARY LEVEL
Extending R: a C/C++ library loaded as an R extension [C++ as a scripting language, etc.]
Embedding R into another program: in practice, starting something like littler as a separate process is used instead.
(Reasons: the organization of the R interpreter, e.g. it is single-threaded and keeps global state.)
CALL PROCESS/USE FROM R
C++: RInside (R from C++) / Rcpp (C++ inside R)
#include <RInside.h>

int main(int argc, char *argv[]) {
    RInside R(argc, argv);            // R instance started
    R["txt"] = "Hello, world!\n";     // assign a variable in R's global environment
    R.parseEvalQ("cat(txt)");         // evaluate R code, ignore the result
    return 0;
}
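// [[Rcpp::depends(RcppParallel)]]
#include <Rcpp.h>
#include <RcppParallel.h>
using namespace Rcpp;
using namespace RcppParallel;

// Sum: an RcppParallel::Worker that adds up a slice of x (definition not shown on the slide)

// [[Rcpp::export]]
double parallelVectorSum(NumericVector x) {
    // declare the Sum worker instance
    Sum sum(x);
    // call parallelReduce to start the work
    parallelReduce(0, x.length(), sum);
    // return the computed sum
    return sum.value;
}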
Java/Scala
JVMRI | rScala / rJava
Python:
rpy2 / rpython
One more approach: FastR, an R implementation in Java.
NETWORK [WEB]
Low level: rApache / httpuv
rApache (http://rapache.org): runs R code from an Apache module.
httpuv (https://github.com/rstudio/httpuv/): a web server inside R.
Usually httpuv is used for development and rApache for serving in production.
R & WEB
High-Level:
API: OpenCPU (http://www.opencpu.org)
Applications:
Shiny: http://www.rstudio.com/products/Shiny/
Rook + dashboard (CRAN)
Interactive R from the web: RStudio Server
OPENCPU
install.packages('opencpu')
library('opencpu')
Start the web server:
http://localhost:2347/ocpu
Browse any data, call any function.
GET:
• http://<base>/library/datasets/data/cars/json : cars dataset in JSON
• http://<base>/library/stats/info : info about the stats package
• http://<base>/library/stats/R/glm/print : R source of glm
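The GET URLs follow a fixed pattern, so a client in another language can build them from package and object names. A Python sketch using only the standard library; `BASE` assumes the local server address from the slide (port 2347), which depends on how the server was started, and the helper names are hypothetical:

```python
import json
import urllib.request

# Local single-user server from the slide; the port is an assumption.
BASE = "http://localhost:2347/ocpu"


def dataset_url(base: str, package: str, dataset: str, fmt: str = "json") -> str:
    """Dataset exported by an R package (e.g. datasets::cars) in a given format."""
    return f"{base}/library/{package}/data/{dataset}/{fmt}"


def package_info_url(base: str, package: str) -> str:
    """Information page about an installed R package."""
    return f"{base}/library/{package}/info"


def function_source_url(base: str, package: str, function: str) -> str:
    """Printed R source of a package function."""
    return f"{base}/library/{package}/R/{function}/print"


def get_json(url: str):
    """GET a URL and decode its JSON body (requires a running server)."""
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a server running, `get_json(dataset_url(BASE, "datasets", "cars"))` fetches the cars data. The info and print URLs return plain text, so read them with `urlopen` directly rather than `get_json`.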
OPENCPU POST: DEMO
POST
• URL: http://<base>/library/stats/R/rnorm
• Params: n=10
• Result: id of the object in the R environment (session key)
POST
• URL: http://<base>/library/stats/R/rnorm?json
• Params: n=10
• Result: [-0.315, 0.6241, 0.7175, 1.1813, -2.5993, -0.9768, -0.034, 0.503, -0.4165, 1.0353]
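The same two POST calls can be issued from Python with the standard library. A sketch under two assumptions: arguments are sent form-encoded, where OpenCPU reads each value as R code (so `n=10` is the number 10, and string arguments would need R quotes); and the JSON variant uses a `/json` path suffix, the form described in current OpenCPU documentation, whereas the slide writes `rnorm?json`, so check which one your server version accepts. Helper names are hypothetical:

```python
import json
import urllib.parse
import urllib.request


def ocpu_call(base: str, package: str, function: str, args: dict,
              as_json: bool = False) -> urllib.request.Request:
    """POST request calling package::function(...) on an OpenCPU server.

    Values are form-encoded; OpenCPU parses each one as an R expression,
    so numbers go as-is and R strings need quotes, e.g. '"abc"'.
    as_json=True appends /json to get the result instead of a session key.
    """
    url = f"{base}/library/{package}/R/{function}"
    if as_json:
        url += "/json"
    body = urllib.parse.urlencode(args).encode("ascii")
    return urllib.request.Request(url, data=body, method="POST")


def decode_result(raw: bytes):
    """Decode a JSON response body, e.g. the numeric vector from rnorm."""
    return json.loads(raw.decode("utf-8"))


def call_json(req: urllib.request.Request):
    """Send the request (requires a running server) and decode its JSON body."""
    with urllib.request.urlopen(req) as resp:
        return decode_result(resp.read())
```

Against a running server, `call_json(ocpu_call(base, "stats", "rnorm", {"n": 10}, as_json=True))` should return a list of ten floats like the result on the slide.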
OPENCPU POST: DEMO
POST
• URL: http://<base>/library/graphics/R/plot
• Params: x=x075fecda05 (key of the object returned by the previous call)
• Result: http://<base>/tmp/x01ccbd847f/graphics/1/png
opencpu.js: JavaScript support library.
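Chaining calls, as in the plot demo, comes down to two steps: read the session key out of a result URL, then pass it unquoted as an argument of the next call. A Python sketch; the key pattern (`x` followed by lowercase letters and digits) is an assumption based on the keys shown on the slides, and the helper names are hypothetical:

```python
import re
import urllib.parse
import urllib.request

# Assumed key shape, based on the slides: "x" + lowercase letters/digits.
SESSION_IN_URL = re.compile(r"/tmp/(x[0-9a-z]+)(?:/|$)")

# Graphics formats listed in the OpenCPU slides.
GRAPHICS_FORMATS = {"png", "svg", "pdf"}


def session_key(url: str) -> str:
    """Extract the session key from an OpenCPU result URL."""
    m = SESSION_IN_URL.search(url)
    if m is None:
        raise ValueError(f"no session key in {url!r}")
    return m.group(1)


def plot_request(base: str, key: str) -> urllib.request.Request:
    """POST graphics::plot(x = <object stored under key>).

    The key is sent unquoted, so OpenCPU resolves it to the stored R object
    rather than to a character string.
    """
    body = urllib.parse.urlencode({"x": key}).encode("ascii")
    return urllib.request.Request(f"{base}/library/graphics/R/plot",
                                  data=body, method="POST")


def graphics_url(base: str, key: str, index: int = 1, fmt: str = "png") -> str:
    """URL of the index-th plot produced in a session."""
    if fmt not in GRAPHICS_FORMATS:
        raise ValueError(f"unsupported graphics format: {fmt}")
    return f"{base}/tmp/{key}/graphics/{index}/{fmt}"
```

The key from the `rnorm` call feeds `plot_request`; the key of the plot call's own session then goes to `graphics_url` to fetch the image.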
OPENCPU
Input/output formats can be set in the URL.
Data: JSON, CSV, TAB, Protobuf, RDA, …
Graphics: PNG, SVG, PDF
Text: plain, Markdown
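Since the output format is just the last URL segment, a client can pick it per consumer (JSON for a web page, CSV for a spreadsheet, RDA for another R process). A Python sketch for a session's return value; the `/R/.val/<format>` path and the `pb` suffix for Protobuf are assumptions to check against the OpenCPU documentation:

```python
# Data formats from the slide, as lowercase URL suffixes.
# "pb" for Protobuf is an assumed suffix.
DATA_FORMATS = {"json", "csv", "tab", "rda", "pb"}


def value_url(base: str, key: str, fmt: str = "json") -> str:
    """URL of a session's return value (.val) rendered in the given format."""
    fmt = fmt.lower()
    if fmt not in DATA_FORMATS:
        raise ValueError(f"unsupported data format: {fmt}")
    return f"{base}/tmp/{key}/R/.val/{fmt}"
```

Formats such as CSV only make sense when the value is tabular (a data frame); the check above validates only the suffix, not the value's type.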
OPENCPU
A web application is an R package (i.e. a simple archive with R code and an HTML landing page).
opencpu-server for production (using rApache).
There is an OpenCPU PaaS: packages published on GitHub are loaded automatically. On your own server you can do the same with your own repository.
OPENCPU
A natural way to publish an R API.
Ideal as an 'R microservice'.
Caveats:
authorization must be implemented separately;
load balancing must route requests of the same session to the same server.
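Both caveats end up in whatever sits in front of the OpenCPU servers. A Python sketch of that routing logic under stated assumptions: the backend names and API tokens are hypothetical, new calls are spread round-robin, and the owner of each session is learned from the Location URL of the response that created it (the balancer cannot hash a key the server has not generated yet):

```python
import hmac
import itertools
import re
import urllib.parse

# Assumed session-key shape, as in URLs like /ocpu/tmp/x01ccbd847f/.
TMP_KEY = re.compile(r"/tmp/(x[0-9a-z]+)(?:/|$)")


class StickyRouter:
    """Pick a backend so that every request touching a session reaches its owner."""

    def __init__(self, backends: list[str], api_tokens: set[str]):
        self._next = itertools.cycle(backends)   # round-robin for new calls
        self._owner: dict[str, str] = {}         # session key -> backend
        self._tokens = api_tokens                # hypothetical shared secrets

    def authorized(self, token: str) -> bool:
        # Authorization is done here, before forwarding (constant-time compare).
        return any(hmac.compare_digest(token, t) for t in self._tokens)

    def remember(self, location: str, backend: str) -> None:
        """Record which backend created the session named in a response URL."""
        m = TMP_KEY.search(location)
        if m:
            self._owner[m.group(1)] = backend

    def route(self, path: str, form_body: str = "") -> str:
        # Keys can appear in the path (/tmp/<key>/...) or as argument values (x=<key>).
        keys = [m.group(1) for m in TMP_KEY.finditer(path)]
        keys += [v for _, v in urllib.parse.parse_qsl(form_body)]
        for key in keys:
            if key in self._owner:
                return self._owner[key]
        # No known session involved (or an unknown key): any backend will do.
        return next(self._next)
```

In a real proxy, `authorized` runs first, `route` picks the upstream, and `remember` is called with the Location header of each response.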
SHINY
Present data from R
http://shiny.rstudio.com/
DSL for HTML UI Elements
Reactive connection with R over WebSockets.
(Demo: movie database)
BATCH PROCESSING
Triggered by mail, SOAP, file upload, etc.
RSB: R Service Bus:
http://www.openanalytics.eu/
https://github.com/openanalytics/RSB
Revolution: R on Azure
SparkR (now merged into Spark): R on spark.
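Without a service bus, the "file upload" trigger can be approximated by polling a directory and running an R script once per new file. A minimal Python sketch; `score.R` and the inbox layout are hypothetical, and the script is assumed to read its input path with `commandArgs(trailingOnly = TRUE)[1]`:

```python
import subprocess
from pathlib import Path
from typing import Iterable


def pending(names: Iterable[str], seen: set[str]) -> list[str]:
    """File names not processed yet, sorted; marks them as seen."""
    fresh = sorted({n for n in names if n not in seen})
    seen.update(fresh)
    return fresh


def batch_command(script: str, data_file: Path) -> list[str]:
    """Rscript command running `script` on one input file."""
    return ["Rscript", script, str(data_file)]


def process_inbox(inbox: Path, seen: set[str], script: str = "score.R",
                  dry_run: bool = True) -> list[list[str]]:
    """One polling pass: build one command per new file; run them unless dry_run."""
    names = (p.name for p in inbox.iterdir() if p.is_file())
    commands = [batch_command(script, inbox / name) for name in pending(names, seen)]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # raises if the R script fails
    return commands
```

Calling `process_inbox(Path("inbox"), seen, dry_run=False)` from cron or a loop with `time.sleep` gives a crude trigger; a service bus adds what this lacks (queuing, retries, result delivery).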
WHEN TO STOP
When to stop and migrate to other solutions ?
complex integration (more than <K> calls);
high maintenance cost of a hybrid solution (rare);
performance loss becomes an issue.
R & EXT. WORLD.
It is possible to use R as an element of the application infrastructure in combination with 'software-engineering languages'.
Fast [Research => Deploy] loops are important.