R & Ext. World / useR! Kiev

R & EXT. WORLD: RESEARCH => DEVELOPMENT

Ruslan Shevchenko
email: [email protected]
twitter: @rssh1
github: https://github.com/rssh
work: Lynx Capital Partners [consultant]


Post on 06-Aug-2015



R & EXT. WORLD: TALK OVERVIEW

Suppose we have built a model in R. What next?

(Rewrite / Embed / R for everything)

Integration techniques:

Command line [littler, Rscript]

Language-level integration [RInside/Rcpp, rScala, rpy]

R as a network service [rApache, openCPU, Shiny]

WE HAVE BUILT A MODEL IN R. WHAT NEXT?

1. Rewrite in a ‘real’ programming language for real usage?

2. Integrate R code with business logic?

3. Implement business logic in R?

WE HAVE BUILT A MODEL IN R. WHAT NEXT?

1. Rewrite in a ‘real’ programming language for real usage?

   1. Costs additional time and money.

   2. Every later improvement to the model takes a long path to production.

   => Only if absolutely necessary (platform, performance, etc.).

2. Integrate R code with business logic in another language?

3. Implement business logic in R?

4. Migrate to other ecosystems?

WE HAVE BUILT A MODEL IN R. WHAT NEXT?

1. Rewrite in a ‘real’ programming language for real usage?

2. Integrate R code with business logic in another language?

   1. Complex.

   2. Extra maintenance cost.

3. Implement business logic in R?

4. Migrate to other ecosystems?

=> Only if we have no other way.

WE HAVE BUILT A MODEL IN R. WHAT NEXT?

1. Rewrite in a ‘real’ programming language for real usage?

2. Integrate R code with business logic in another language?

3. Implement business logic in R?

   1. An esoteric way.

   2. R was created without a ‘software engineering’ way of thinking in mind.

4. Migrate to other ecosystems?

[Figure: productivity versus experience, for Python and R]

Warning: speculative! (based on feeling)

=> Only if R is an ideal fit.

WE HAVE BUILT A MODEL IN R. WHAT NEXT?

1. Rewrite in a ‘real’ programming language for real usage?

2. Integrate R code with business logic in another language?

3. Implement business logic in R?

4. Migrate to other ecosystems? => No clear superior alternative.

   Now: Python, Octave/Matlab

   Future: ScalaLab (?), Julia (?)

// ! statistics
// ! fully compatible/free


WE HAVE BUILT A MODEL IN R. WHAT NEXT?

1. Rewrite in a ‘real’ programming language for real usage?

2. Integrate R code with business logic in another language?

3. Implement business logic in R?

R INTEGRATIONS

(Old S): 1976, New S: 1988, S4: 1996

R: 1995

First integrations:

littler: 2006

Rscript: 2006 (in R installation)

(Before these tools: interactive use only.)

COMMAND LINE

littler (#!/<path>/r … )

ls -la | awk '{print $5}' | littler -e 'print(summary(as.integer(readLines())))'

echo 'cat(rnorm(10))' | littler

Rscript (#!/<path>/Rscript … )

ls -la | awk '{print $5}' | Rscript -e "summary(as.numeric(readLines('stdin')))"

echo 'cat(rnorm(10))' | Rscript -
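The same pipe can be driven from a host application rather than a shell. The following is a minimal sketch in Python, assuming `Rscript` is on the `PATH`; the helper names `rscript_command` and `summarize_with_r` are made up for this illustration and are not part of R or of any library.

```python
import shutil
import subprocess
from typing import List, Optional, Sequence


def rscript_command(expression: str) -> List[str]:
    """Build the argv that evaluates one R expression with Rscript -e."""
    return ["Rscript", "-e", expression]


def summarize_with_r(values: Sequence[float]) -> Optional[str]:
    """Send numbers to R on stdin and return the printed summary().

    Returns None when Rscript is not installed, so the caller can fall back.
    """
    if shutil.which("Rscript") is None:
        return None
    completed = subprocess.run(
        rscript_command("summary(as.numeric(readLines('stdin')))"),
        input="\n".join(str(v) for v in values) + "\n",  # one number per line
        capture_output=True,
        text=True,
        check=True,  # raise if R exits with an error
    )
    return completed.stdout


if __name__ == "__main__":
    print(summarize_with_r([1, 2, 3, 4]))
```

Each call starts a fresh R process, so this style fits occasional batch-like calls better than tight loops.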

LIBRARY LEVEL

Embedding in R (a C library loaded as an R extension)

R extensions [C++ as a scripting language, etc.]

Starting something like littler as a separate process

This is what is actually used instead of embedding R.

(Reason: the way the R interpreter is organized ..)

CALL PROCESS/USE FROM R

C++: RInside (R from C++) / Rcpp (C++ inside R)

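#include <RInside.h>

int main(int argc, char *argv[]) {
    RInside R(argc, argv);           // the embedded R instance is started here
    R["txt"] = "Hello, world!\n";    // assign a C++ string to the R variable txt
    R.parseEvalQ("cat(txt)");        // evaluate R code, discarding the result
    return 0;
}

// Rcpp side: a C++ function exported to R, using RcppParallel.
// Sum is an RcppParallel worker defined elsewhere; it is not shown on the slide.
// [[Rcpp::depends(RcppParallel)]]
#include <Rcpp.h>
#include <RcppParallel.h>
using namespace Rcpp;
using namespace RcppParallel;

// [[Rcpp::export]]
double parallelVectorSum(NumericVector x) {
    // declare the Sum worker instance
    Sum sum(x);
    // call parallelReduce to start the work
    parallelReduce(0, x.length(), sum);
    // return the computed sum
    return sum.value;
}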

CALL PROCESS/USE FROM R

C++:

RInside (R from C++) / Rcpp (C++ inside R)

Java/Scala:

JVMRI | rScala / rJava

Python:

rpy2 / rPython

One more approach: FastR, an R implementation in Java

NETWORK [WEB]

Low level: rApache / httpuv

rApache: http://rapache.org : R calls from an Apache module.

httpuv: https://github.com/rstudio/httpuv/ : web server inside R

Usually httpuv is used for development and rApache for serving in production.

R & WEB

High-Level:

API: openCPU (http://www.opencpu.org)

Applications:

shiny: http://www.rstudio.com/products/Shiny/

Rook + dashboard (CRAN)

Interactive R from the web: RStudio Server

OPENCPU

install.packages('opencpu')
library('opencpu')

Start web server: http://localhost:2347/ocpu

Browse any data

Call any function

• GET:

  • http://<base>/library/datasets/data/cars/json : cars dataset in JSON

  • http://<base>/library/stats/info : info about the stats package

  • http://<base>/library/stats/R/glm/print : R source for glm

OPENCPU POST: DEMO

POST
• URL: http://<base>/library/stats/R/rnorm
• Params: n=10
• Result: id (key) of the object in the R environment

POST
• URL: http://<base>/library/stats/R/rnorm/json
• Params: n=10
• Result: [-0.315, 0.6241, 0.7175, 1.1813, -2.5993, -0.9768, -0.034, 0.503, -0.4165, 1.0353]

OPENCPU POST: DEMO

POST
• URL: http://<base>/library/graphics/R/plot
• Params: x=x075fecda05 (key of an object returned by an earlier call)

http://<base>/tmp/x01ccbd847f/graphics/1/png

opencpu.js — support library
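Any HTTP client can make the same calls. The sketch below uses only the Python standard library and assumes the output format is appended as a path suffix, as in the GET examples (`.../data/cars/json`). The helpers `function_url` and `call_function` are hypothetical names for this illustration, and the base URL has to be supplied by the caller.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, Optional


def function_url(base: str, package: str, function: str,
                 output: Optional[str] = None) -> str:
    """Build <base>/library/<package>/R/<function>[/<output>]."""
    url = f"{base.rstrip('/')}/library/{package}/R/{function}"
    return f"{url}/{output}" if output else url


def call_function(base: str, package: str, function: str,
                  params: Dict[str, Any]) -> Any:
    """POST form-encoded params to an R function and decode the JSON reply."""
    body = urllib.parse.urlencode(params).encode("ascii")
    request = urllib.request.Request(
        function_url(base, package, function, "json"),
        data=body,
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the URL; call_function needs a running openCPU server.
    print(function_url("http://localhost:2347/ocpu", "stats", "rnorm", "json"))
```

With a server running, `call_function(base, "stats", "rnorm", {"n": 10})` would be expected to return a list of ten numbers, matching the POST demo above.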

OPENCPU

The input/output format can be set in the URL.

Data: JSON, CSV, TAB, Protobuf, RDA, …

Graphics: PNG, SVG, PDF

Text: plain, markdown, …

OPENCPU

A web application is an R package (i.e. a simple archive with R code and an HTML landing page).

openCPU server for production (using rApache).

An openCPU PaaS exists: packages published on GitHub are loaded automatically. On your local server you can do the same with your own repo.

OPENCPU

A natural way to publish an R API.

Ideal as an ‘R microservice’.

Caveats:

authorization must be implemented separately

load balancing must route the same session to the same server.
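The session caveat can be illustrated with a small routing function. This is only a sketch of one possible approach (hash-based sticky routing), not something openCPU provides. The client identifier and the backend URLs are hypothetical, and in practice the sticky-session feature of the load balancer itself would normally be used.

```python
import hashlib
from typing import Sequence


def pick_backend(client_id: str, backends: Sequence[str]) -> str:
    """Map a client identifier to a backend; the same id always gets the same one."""
    if not backends:
        raise ValueError("at least one backend is required")
    # A stable hash (unlike the built-in hash()) gives the same answer in every process.
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


if __name__ == "__main__":
    servers = ["http://r-node-1/ocpu", "http://r-node-2/ocpu"]  # hypothetical hosts
    print(pick_backend("client-42", servers))
```

Note that adding or removing a backend changes the mapping for many clients, so objects stored on the old server become unreachable for them.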

SHINY

Present data from R

http://shiny.rstudio.com/

DSL for HTML UI Elements

Reactive connection with R over web sockets.

// demonstration: movie database

BATCH PROCESSING

Triggering: mail, SOAP, file upload, etc.

RSB: R Service Bus:

http://www.openanalytics.eu/

https://github.com/openanalytics/RSB

Revolution: R on Azure

SparkR (now merged into Spark): R on Spark.

WHEN TO STOP

When to stop and migrate to other solutions?

complex integration (more than <K> calls)

high maintenance cost of the hybrid solution (rare)

performance loss is an issue.

R & EXT. WORLD.

It is possible to use ‘R’ as an element of application infrastructure in combination with ‘software-engineering languages’.

Fast [Research => Deploy] loops are important.

THANKS.

QUESTIONS ?

Ruslan Shevchenko: <[email protected]> https://github.com/rssh