if you can’t beat ’emgrowth 2010 2012 2014 2016 0 100 200 300 400 500 600 700 growth of rcpp...

27
If You Can’t Beat ’Em … Some Comments on Using R Along With Other Languages Dirk Eddelbuettel 31 July 2016 Open Source Statistical Software for Data Science — Invited Papers Joint Statistical Meetings (JSM), Chicago, IL JSM 2016 1/27

Upload: others

Post on 06-Aug-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: If You Can’t Beat ’EmGrowth 2010 2012 2014 2016 0 100 200 300 400 500 600 700 Growth of Rcpp usage on CRAN n Number of CRAN packages using Rcpp (left axis) Percentage of CRAN packages

If You Can’t Beat ’Em …

Some Comments on Using R Along With Other Languages

Dirk Eddelbuettel

31 July 2016

Open Source Statistical Software for Data Science — Invited PapersJoint Statistical Meetings (JSM), Chicago, IL

JSM 2016 1/27

Page 2: If You Can’t Beat ’EmGrowth 2010 2012 2014 2016 0 100 200 300 400 500 600 700 Growth of Rcpp usage on CRAN n Number of CRAN packages using Rcpp (left axis) Percentage of CRAN packages

Welcome to Lollapalooza!

JSM 2016 2/27

Page 3: If You Can’t Beat ’EmGrowth 2010 2012 2014 2016 0 100 200 300 400 500 600 700 Growth of Rcpp usage on CRAN n Number of CRAN packages using Rcpp (left axis) Percentage of CRAN packages

Overview

Content

• Single- or Multi-Language ?

• Interlude

• Empirics

• Illustration

• Conclusion

JSM 2016 3/27

Page 4: If You Can’t Beat ’EmGrowth 2010 2012 2014 2016 0 100 200 300 400 500 600 700 Growth of Rcpp usage on CRAN n Number of CRAN packages using Rcpp (left axis) Percentage of CRAN packages

Single- Or Multi-Language ?

JSM 2016 4/27

Page 5: If You Can’t Beat ’EmGrowth 2010 2012 2014 2016 0 100 200 300 400 500 600 700 Growth of Rcpp usage on CRAN n Number of CRAN packages using Rcpp (left axis) Percentage of CRAN packages

Claim: 1 + 1 > 2

Better with more than one?

• No one language fits all

• Real-world projects are frequently multi-language

• See e.g. job ads which rarely ever list just one language

JSM 2016 5/27

Page 6: If You Can’t Beat ’EmGrowth 2010 2012 2014 2016 0 100 200 300 400 500 600 700 Growth of Rcpp usage on CRAN n Number of CRAN packages using Rcpp (left axis) Percentage of CRAN packages

Counter-claim: 1 + 1 < 2

Or better with just one?

• Mental switching cost between languages? Possibly

• Interop difficult and less portable? Maybe, but that is anargument against weak systems / OSs

• Easier / less to learn?

• “More hoops” to code?

JSM 2016 6/27

Page 7: If You Can’t Beat ’EmGrowth 2010 2012 2014 2016 0 100 200 300 400 500 600 700 Growth of Rcpp usage on CRAN n Number of CRAN packages using Rcpp (left axis) Percentage of CRAN packages

Mental switching costs?

JSM 2016 7/27

Page 8: If You Can’t Beat ’EmGrowth 2010 2012 2014 2016 0 100 200 300 400 500 600 700 Growth of Rcpp usage on CRAN n Number of CRAN packages using Rcpp (left axis) Percentage of CRAN packages

So which one is it?

Open Question

• Hard to measure or test: Any empirics on real world projects?

• Code competition / comparisons (e.g. Project Euler): Are theyrealistic?

JSM 2016 8/27

Page 9: If You Can’t Beat ’EmGrowth 2010 2012 2014 2016 0 100 200 300 400 500 600 700 Growth of Rcpp usage on CRAN n Number of CRAN packages using Rcpp (left axis) Percentage of CRAN packages

Interlude

JSM 2016 9/27

Page 10: If You Can’t Beat ’EmGrowth 2010 2012 2014 2016 0 100 200 300 400 500 600 700 Growth of Rcpp usage on CRAN n Number of CRAN packages using Rcpp (left axis) Percentage of CRAN packages

John Chambers

Chambers (2008) Software ForData AnalysisChapters 10 and 11 devoted toInterfaces I: C and Fortran andInterfaces II: Other Systems.

JSM 2016 10/27

Page 11: If You Can’t Beat ’EmGrowth 2010 2012 2014 2016 0 100 200 300 400 500 600 700 Growth of Rcpp usage on CRAN n Number of CRAN packages using Rcpp (left axis) Percentage of CRAN packages

John Chambers

Chambers (2016) Extending RAn entire book about this withconcrete Python, Julia and C++code and examples

JSM 2016 11/27

Page 12: If You Can’t Beat ’EmGrowth 2010 2012 2014 2016 0 100 200 300 400 500 600 700 Growth of Rcpp usage on CRAN n Number of CRAN packages using Rcpp (left axis) Percentage of CRAN packages

John Chambers

Chambers 2016, Chapter 4

The fundamental lesson about programming in the large isthat requires a correspondingly broad and flexibleresponse. In particular, no single language or softwaresystem os likely to be ideal for all aspects. Interfacingmultiple systems is the essence. Part IV explores thedesign of of interfaces from R.

JSM 2016 12/27

Page 13: If You Can’t Beat ’EmGrowth 2010 2012 2014 2016 0 100 200 300 400 500 600 700 Growth of Rcpp usage on CRAN n Number of CRAN packages using Rcpp (left axis) Percentage of CRAN packages

Chamber 1976

Thanks to John Chambers for ascanned copy of this historic sketch.

JSM 2016 13/27

Page 14: If You Can’t Beat ’EmGrowth 2010 2012 2014 2016 0 100 200 300 400 500 600 700 Growth of Rcpp usage on CRAN n Number of CRAN packages using Rcpp (left axis) Percentage of CRAN packages

Empirics

JSM 2016 14/27

Page 15: If You Can’t Beat ’EmGrowth 2010 2012 2014 2016 0 100 200 300 400 500 600 700 Growth of Rcpp usage on CRAN n Number of CRAN packages using Rcpp (left axis) Percentage of CRAN packages

Growth

2010 2012 2014 2016

010

020

030

040

050

060

070

0

Growth of Rcpp usage on CRAN

n

Number of CRAN packages using Rcpp (left axis)Percentage of CRAN packages using Rcpp (right axis)

010

020

030

040

050

060

070

0

2010 2012 2014 2016

02

46

8

JSM 2016 15/27

Page 16: If You Can’t Beat ’EmGrowth 2010 2012 2014 2016 0 100 200 300 400 500 600 700 Growth of Rcpp usage on CRAN n Number of CRAN packages using Rcpp (left axis) Percentage of CRAN packages

Pagerank

library(pagerank) # github.com/andrie/pagerank

cran <- ”http://cloud.r-project.org”

pr <- compute_pagerank(cran)round(100*pr[1:5], 3)

## Rcpp MASS ggplot2 Matrix mvtnorm## 2.452 1.771 1.088 0.920 0.749

JSM 2016 16/27

Page 17: If You Can’t Beat ’EmGrowth 2010 2012 2014 2016 0 100 200 300 400 500 600 700 Growth of Rcpp usage on CRAN n Number of CRAN packages using Rcpp (left axis) Percentage of CRAN packages

Pagerank

bootrglzoodata.tablenlmeRCurlcodaXMLforeachreshape2jsonliteRcppArmadillodplyrigraphspstringrhttrlatticeplyrsurvivalmvtnormMatrixggplot2MASSRcpp

0.005 0.010 0.015 0.020 0.025

Top 25 of Page Rank as of July 2016

JSM 2016 17/27

Page 18: If You Can’t Beat ’EmGrowth 2010 2012 2014 2016 0 100 200 300 400 500 600 700 Growth of Rcpp usage on CRAN n Number of CRAN packages using Rcpp (left axis) Percentage of CRAN packages

Pagerank

Rcpp

MASS

ggplot2

Matrix

mvtnorm

lattice

plyr

stringr

RColorBrewer

dichromat

munsell

labeling

stringi

magrittr

colorspace

digest

gtable

reshape2

scales

sp

MatrixModels

graph

SparseM

sfsmisc

→→→

ImportsLinkingToEnhances

Top 5 packages by page rank

JSM 2016 18/27

Page 19: If You Can’t Beat ’EmGrowth 2010 2012 2014 2016 0 100 200 300 400 500 600 700 Growth of Rcpp usage on CRAN n Number of CRAN packages using Rcpp (left axis) Percentage of CRAN packages

Illustration

JSM 2016 19/27

Page 20: If You Can’t Beat ’EmGrowth 2010 2012 2014 2016 0 100 200 300 400 500 600 700 Growth of Rcpp usage on CRAN n Number of CRAN packages using Rcpp (left axis) Percentage of CRAN packages

Using R to C++ to Boost to Python, and back

Setup

py_cflags <- system(”python2.7-config --cflags”, intern=TRUE)se <- Sys.setenv; ge <- Sys.getenv # shorthands to typesetse(”PKG_CFLAGS”=sprintf(”%s %s”, ge(”PKG_CFLAGS”), py_cflags))se(”PKG_CXXFLAGS”=sprintf(”%s %s”, ge(”PKG_CXXFLAGS”), py_cflags))py_ldflags <- system(”python2.7-config --ldflags”, intern=TRUE)se(”PKG_LIBS”=sprintf(”%s %s %s”, ge(”PKG_CFLAGS”),

”-lboost_python-py27”, py_ldflags))

JSM 2016 20/27

Page 21: If You Can’t Beat ’EmGrowth 2010 2012 2014 2016 0 100 200 300 400 500 600 700 Growth of Rcpp usage on CRAN n Number of CRAN packages using Rcpp (left axis) Percentage of CRAN packages

Using R to C++ to Boost to Python, and back

#include <Rcpp.h>#include <Python.h>

// [[Rcpp::export]]void initialize_python() {

Py_SetProgramName(””); /* optional but recommended */Py_Initialize();

}

// [[Rcpp::export]]void hello_python() {

PyRun_SimpleString(”from time import time,ctime\n””print ’Today is’,ctime(time())\n”);

}

JSM 2016 21/27

Page 22: If You Can’t Beat ’EmGrowth 2010 2012 2014 2016 0 100 200 300 400 500 600 700 Growth of Rcpp usage on CRAN n Number of CRAN packages using Rcpp (left axis) Percentage of CRAN packages

Using R to C++ to Boost to Python, and back

Hello, World: Called from R

initialize_python()hello_python()

## Today is Sat Jul 30 13:38:01 2016

More at http://gallery.rcpp.org/articles/rcpp-python/Disclaimer: For illustration purposes. Works as designed on Ubuntu. Not meant to be universally portable to all three OSs.

JSM 2016 22/27

Page 23: If You Can’t Beat ’EmGrowth 2010 2012 2014 2016 0 100 200 300 400 500 600 700 Growth of Rcpp usage on CRAN n Number of CRAN packages using Rcpp (left axis) Percentage of CRAN packages

Conclusion

JSM 2016 23/27

Page 24: If You Can’t Beat ’EmGrowth 2010 2012 2014 2016 0 100 200 300 400 500 600 700 Growth of Rcpp usage on CRAN n Number of CRAN packages using Rcpp (left axis) Percentage of CRAN packages

Being Polyglot

Mixing Languages

• Common

• Natural

• Unavoidable

JSM 2016 24/27

Page 25: If You Can’t Beat ’EmGrowth 2010 2012 2014 2016 0 100 200 300 400 500 600 700 Growth of Rcpp usage on CRAN n Number of CRAN packages using Rcpp (left axis) Percentage of CRAN packages

Being Polyglot

Consequences

• Must make it easier to interoperate

• Stop bickering among ourselves

• Build systems that are larger that the sum of their parts

JSM 2016 25/27

Page 26: If You Can’t Beat ’EmGrowth 2010 2012 2014 2016 0 100 200 300 400 500 600 700 Growth of Rcpp usage on CRAN n Number of CRAN packages using Rcpp (left axis) Percentage of CRAN packages

Being Polyglot

Just Do It

JSM 2016 26/27

Page 27: If You Can’t Beat ’EmGrowth 2010 2012 2014 2016 0 100 200 300 400 500 600 700 Growth of Rcpp usage on CRAN n Number of CRAN packages using Rcpp (left axis) Percentage of CRAN packages

Appendix

Lars Wirzenius “Which license is the most free?”Free software licences can be roughly grouped into permissive and copyleft ones.[…] A permissive licence lets you do things that a copyleft one forbids, so clearly thepermissive licence is more free. A copyleft licence means software using it won’tever become non-free against the wills of the copyright holders, so clearly acopyleft licence is more free than a permissive one.

Both sides are both right and wrong, of course, which is why this argument willcontinue forever. […]

If a discussion about the relative freedom of licence types becomes heated, stepaway. It’s not worth participating anymore.

http://yakking.branchable.com/posts/comparative-freeness/

JSM 2016 27/27