Introduction to Microsoft R Open
David Smith, R Community Lead
January 28, 2016: Welcome!
• What is R?
• Applications of R
• Microsoft R Open
• Demo
• Q&A
David Smith
R Community Lead, Microsoft
@revodavid
Editor, Revolutions blog (blog.revolutionanalytics.com)
Co-author (with Bill Venables and the R Core Team), An Introduction to R (cran.r-project.org/manuals.html)
Poll 1: Which statement best matches your relationship to R?
- I’m completely new to R, but want to learn
- I’m learning R
- I’m an experienced R user
- I won’t be using R (but I’m interested in what it can do)
• Most widely used data analysis software
• Used by 2M+ data scientists, statisticians and analysts
• Most powerful statistical programming language
• Flexible, extensible and comprehensive for productivity
• Create beautiful and unique data visualizations
• As seen in New York Times, The Economist and FlowingData
• Fills the Data Science talent gap
• New graduates prefer R
• Thriving open-source community
• Leading edge of analytics research
What is R?
CRAN: 7000+ add-on packages for R
CRAN Task View by Barry Rowlingson: http://www.maths.lancs.ac.uk/~rowlings/R/TaskViews/
1993 Research project in Auckland, NZ
• Ross Ihaka and Robert Gentleman
1995 Released as open-source software
• Generally compatible with the “S” language
1997 R Core Group formed
2003 R Foundation formed in Austria
2007 Revolution Analytics founded
2014 Revolution R Open launched
2015 R Consortium founded
2015 Microsoft acquires Revolution Analytics
2016 Microsoft R Open 3.2.3 released
A brief history of R
Photo credit: Robert Gentleman
R: The #1 software for Data Science… and #6 amongst general-purpose programming languages
R Usage Growth: Rexer Data Miner Survey, 2007-2015
Language Popularity: IEEE Spectrum Top Programming Languages, 2015
• 76% of analytic professionals report using R
• 36% select R as their primary tool
200 Local R User Groups Worldwide
Find a user group near you: msdsug.microsoft.com
Applications of R
Rapid development
New York Times, June 25, 2009 (3 hours after Michael Jackson’s death)
• Credit Risk Analysis • Financial Networks
Facebook
• Exploratory Data Analysis
• Experimental Analysis
“Generally, we use R to move fast when we get a new data set. With R, we don’t need to develop custom tools or write a bunch of code. Instead, we can just go about cleaning and exploring the data.” — Solomon Messing, data scientist at Facebook
Housing
• Crime mapping
“The core innovation that Zillow offers are its advanced statistical predictive products, including the Zestimate®, the Rent Zestimate and the ZHVI® family of real estate indexes. By using R in production as well as research, Zillow maximizes flexibility and minimizes the latency in rolling out updates and new products.”
• Statistical forecasting
The Azure Cloud
Regions (operational and announced):
• Central US (Iowa)
• West US (California)
• North Europe (Ireland)
• East US (Virginia)
• East US 2 (Virginia)
• US Gov (Virginia)
• North Central US (Illinois)
• US Gov (Iowa)
• South Central US (Texas)
• Brazil South (São Paulo)
• West Europe (Netherlands)
• China North (Beijing)*
• China South (Shanghai)*
• Japan East (Saitama)
• Japan West (Osaka)
• India West (TBD)
• India East (TBD)
• East Asia (Hong Kong)
• SE Asia (Singapore)
• Australia West (Melbourne)
• Australia East (Sydney)
* Operated by 21Vianet
• Capacity planning
• Forecasting hardware purchase requirements (forecast package)
• Also RAM requirements for Microsoft IT
• System monitoring & alerting
• Understanding user behavior (how users configure the monitoring platform)
• Visualizing infrastructure utilization data
• Abnormal login detection
• Custom R packages to analyze monitoring data (time series anomaly detection)
Microsoft Azure uses R for Reliability
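The capacity-planning work above used the forecast package. A minimal, dependency-free sketch of the same idea uses base R's HoltWinters() and predict() instead; this is a swapped-in technique for illustration, not the Azure team's code. The built-in AirPassengers monthly series is only a stand-in for real usage data such as RAM or hardware demand.

```r
# Sketch: seasonal forecasting with base R only (no add-on packages).
# AirPassengers is a built-in monthly ts, used here as placeholder demand data.
demand <- AirPassengers

# Fit a Holt-Winters model with level, trend and multiplicative seasonality
fit <- HoltWinters(demand, seasonal = "multiplicative")

# Forecast the next 12 months with 95% prediction intervals;
# the result is a multi-column ts with columns "fit", "upr" and "lwr"
fc <- predict(fit, n.ahead = 12, prediction.interval = TRUE, level = 0.95)
print(round(fc, 1))

# For purchasing decisions, plan against the upper bound, not the point forecast
peak_need <- max(fc[, "upr"])
cat("Peak upper-bound demand over the next 12 months:", round(peak_need), "\n")
```

Planning against the upper prediction bound is one conservative choice; the point forecast in the "fit" column is the alternative when over-provisioning is costly.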
• Enhanced open source R distribution
• 100% compatible with all R-related software
• Faster performance with multi-threading
• CRAN “Time Machine” for reproducibility
• Available for Windows, Mac, and Linux
• Free and open source
Download from mran.microsoft.com
Microsoft R Open
• Intel MKL replaces the standard BLAS/LAPACK routines
• Download and install “MKL” from MRAN
• Windows and Linux platforms
• High-performance algorithms
• Pipelined operations optimized for Intel processors
[Diagram: sequential vs. parallel execution]
• Uses as many threads as there are available cores
• Control with: setMKLthreads(<value>)
• No need to change any R code
MRO: Multi-threaded performance
[Chart: benchmark timings, R vs. MRO]
Benchmark details at MRAN
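To see whether MKL is making a difference, time an operation that R hands to BLAS. This sketch multiplies two dense matrices, and the same script runs unchanged under standard R and MRO. The 500 x 500 size and the thread count of 2 are arbitrary choices, and the setMKLthreads() call is guarded because that function only exists when MRO's MKL libraries are installed.

```r
# Sketch: time a dense matrix multiply, which R delegates to BLAS.
# Under MRO with MKL this runs multi-threaded; under standard R it uses
# the reference BLAS. The R code is identical in both cases.
set.seed(1)
n <- 500  # arbitrary size; increase to make the difference more visible
m <- matrix(rnorm(n * n), nrow = n)

# Optional: pin the MKL thread count (2 is an arbitrary example value).
# setMKLthreads() is only defined when MRO's MKL libraries are present.
if (exists("setMKLthreads", mode = "function")) {
  setMKLthreads(2)
}

# system.time() evaluates the assignment in this environment, so p survives
timing <- system.time(p <- m %*% m)
cat("Elapsed seconds:", timing[["elapsed"]], "\n")
```

Running the same script under both distributions and comparing the elapsed times is a quick local check; the published numbers remain those on MRAN.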
Reproducibility: share and validate
Academic / Research
• Verify results
• Advance research
Business
• Production code
• Reliability
• Reusability
• Collaboration
• Regulation
www.nytimes.com/2011/07/08/health/research/08genes.html
http://arxiv.org/pdf/1010.1092.pdf
Package dependency explosion
R script file using the 6 most popular packages
Any updated package = potential reproducibility error!
http://blog.revolutionanalytics.com/2014/10/explore-r-package-connections-at-mran.html
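One way to gauge this risk for your own script is to list every package it depends on, directly or indirectly. The sketch below uses base R's tools::package_dependencies() against installed.packages(), so it runs offline; "stats" and "utils" are placeholders for the packages your script actually loads. Passing db = available.packages() instead would query the current CRAN index (this needs a network connection).

```r
# Sketch: collect the transitive "strong" dependencies (Depends, Imports,
# LinkingTo) of the packages a script loads. installed.packages() serves
# as the package database so no network access is required.
script_packages <- c("stats", "utils")  # placeholder: your script's library() calls

deps <- tools::package_dependencies(
  script_packages,
  db = installed.packages(),
  which = "strong",
  recursive = TRUE
)

# Every package in this set has a version that can change between runs
all_deps <- sort(unique(c(script_packages, unlist(deps, use.names = FALSE))))
cat(length(all_deps), "packages affect this script:\n")
print(all_deps)
```

For the six popular packages on the slide, the same call would return a much longer list, which is exactly the exposure the checkpoint package is meant to freeze.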
MRAN takes a snapshot of all 7,500+ packages every day
CRAN Time Machine
Add 2 lines to the top of your R script:
library(checkpoint)
checkpoint("2015-01-28")
• Downloads all required package versions as of January 28, 2015
• Easy for collaborators to reproduce your results
• Easy to use different package versions with different projects
Access snapshots with “checkpoint”
(Any date on or after Sep 17, 2014)
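Snapshots only exist from the first MRAN snapshot onward (2014-09-17, per the checkpoint package documentation), so a script can sanity-check its date before calling checkpoint(). is_valid_snapshot_date() below is a hypothetical helper, not part of the checkpoint package. The two lines from the slide are left commented out so the sketch runs without a network connection.

```r
# Hypothetical helper (not part of the checkpoint package): TRUE if the
# string is a YYYY-MM-DD date between the first MRAN snapshot and today.
is_valid_snapshot_date <- function(date_string, today = Sys.Date()) {
  d <- as.Date(date_string, format = "%Y-%m-%d")  # NA if malformed
  first_snapshot <- as.Date("2014-09-17")
  !is.na(d) && d >= first_snapshot && d <= today
}

snapshot <- "2015-01-28"
stopifnot(is_valid_snapshot_date(snapshot))

# The two lines from the slide; uncomment to install and use the snapshot.
# library(checkpoint)
# checkpoint(snapshot)
```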
Poll 2: If you’re an R user, have you tried Microsoft R Open (or Revolution R Open)?
- I’ve never tried Microsoft R Open
- I’ve tried Microsoft R Open
- I primarily use Microsoft R Open
- I don’t use R
Microsoft R Open Demo:
• Basic R
• Reproducibility with R
Learn R online at: www.datacamp.com/courses/free-introduction-to-r
Use Microsoft R Open with…
• Microsoft R Server: big-data analytics and distributed computing on Linux, Hadoop and Teradata
• SQL Server 2016: big-data analytics integrated with the SQL Server database (coming soon)
• Power BI: computations and charts from R scripts in dashboards
• Azure ML Studio: R scripts in cloud-based experiment workflows
• Visual Studio: R Tools for Visual Studio, an integrated development environment for R (coming soon)
• HDInsight: R integrated with cloud-based Hadoop clusters
• Cortana Analytics: cloud-based R APIs and virtual machines
Upcoming Microsoft R Server Webinars
• Thursday, February 4: Using Microsoft R Server to Address Scalability Issues in R
• Thursday, February 11: Data Mining with Microsoft R Server
• Thursday, February 18: Best Practices for using Microsoft R Server with Hadoop
• Thursday, February 25: Using Microsoft R Server to Operationalize your Analytics
Register: info.microsoft.com/Microsoft-R-Webinars.html
• R is the leading language for data science today
• R is used for all kinds of advanced analytics applications
• Microsoft R Open is 100% compatible with R, and offers performance and reproducibility benefits
• Microsoft R Open is integrated with SQL Server, Power BI, and more
• Download from mran.microsoft.com
Any Questions?
Wrapping Up
© Copyright Microsoft Corporation. All rights reserved.
Bonus Slides
Transformational Trends
• Cloud computing: 5x increase from 2011 to 2016
• Emerging data science talent: universities filling a 300,000-person US talent gap
• Data explosion: 90% of the data in the world today has been created in the last two years alone
• Open source, e.g. R and Python
• Working with the R Foundation
• Supporting the R user community
• Continuing the growth of the R Project
• Linux Foundation collaborative project
• Non-profit trade organization