© 2014 IBM Corporation
Big Data Analytics Trends
- some observations
Dr. Alex Liu – Principal Data Scientist
March 18, 2015
BB-03-18-2015-1
Strata Big Data Conference
Feb 17-20, 2015
San Jose, CA
Hadoop is the mainstream
▪ Hadoop has become a necessary tool for big data
▪ Every vendor has its own version of Hadoop
▪ IBM BigInsights – Hadoop + easy-to-use console + Big SQL + Big R
Apache Hadoop is an open-source framework for distributed storage and processing of large data sets.
R is used by everyone
▪ In the past: R, IBM SPSS, SAS, … MATLAB …
▪ Now: R users
▪ Java or Python …
About Big R
▪ Connecting R to Hadoop – easily and fast
▪ Big R is an R package
– Installed in your favorite R client, say RStudio
– Also installed on all data nodes of the cluster
– Client-side R communicates with BigInsights via JDBC
▪ Allows use of R as a language to access big data
– Provides a high degree of abstraction
– Supports some level of R -> Jaql mapping
▪ Allows push down of R code onto the cluster
– Can leverage existing R assets (code and packages)
▪ Works well as long as application is parallelizable
– User may be able to leverage existing R assets
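The "parallelizable" condition above can be illustrated with a split-apply-combine pattern. This is a minimal sketch in base R only; it does not use the Big R API, whose push-down functions are not shown in these slides. The `flights` data frame, its columns, and the `mean_delay` helper are made up for illustration.

```r
# Hypothetical toy data standing in for a large airline data set.
flights <- data.frame(
  carrier = c("AA", "AA", "UA", "UA", "DL"),
  delay   = c(10, 20, 5, 15, 30)
)

# Split: one independent chunk per carrier.
chunks <- split(flights, flights$carrier)

# Apply: the same function runs on each chunk with no shared state,
# which is what makes the job parallelizable.
mean_delay <- function(chunk) mean(chunk$delay)
per_carrier <- lapply(chunks, mean_delay)

# Combine: collect the per-chunk results into one named vector.
result <- unlist(per_carrier)
print(result)
```

Because each chunk is processed independently, the `lapply` call could in principle be handed to a parallel backend (for example `parallel::mclapply` on a single machine) without changing `mean_delay`. Existing R code shaped like this is the kind most likely to carry over to a cluster.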
Big R Examples
# Connect to BigInsights
bigr.connect(host="192.168.153.219", port=7052, user="biadmin", password="...")
# Construct a bigr.frame to access large data set
air <- bigr.frame(dataPath="airline_demo.csv", …)
attach(air)
……
Spark is coming
▪ Everyone is talking about Spark
▪ Every company has integrated it or is about to
Apache Spark is an in-memory framework for interactive and iterative computations, developed in the AMPLab at UC Berkeley.
Spark SQL
MLlib
Actionable Insights
▪ Big Data Analytics Consulting = Actionable Insights?
Analytics with Big Data
Before
• Descriptive
• Predictive
• Prescriptive
After
• Descriptive
• Predictive
• Prescriptive
Data analytics, of any size, is data analytics.
More insights, actionable insights, are always what
customers want.
Descriptive -> predictive -> prescriptive; Allstate now uses prescriptive analytics.
Automation
▪ Is automation possible?
Big Data Analytics is a process
4Es: Equation – Estimation – Evaluation – Explanation
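One way to read the 4Es as a workflow is sketched below in base R. The mapping of each "E" to a step is an assumption made here for illustration, not something the slides spell out. The model (`mpg ~ wt`), the built-in `mtcars` data, and the train/test split are arbitrary choices.

```r
# Equation: state the model to be fitted (fuel economy as a function of weight).
eq <- mpg ~ wt

# Hold out part of the built-in mtcars data for evaluation (arbitrary split).
train <- mtcars[1:24, ]
test  <- mtcars[25:32, ]

# Estimation: fit the equation's coefficients on the training rows.
fit <- lm(eq, data = train)

# Evaluation: measure prediction error on rows the model has not seen.
pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$mpg - pred)^2))

# Explanation: read the fitted coefficient in plain terms,
# here the expected change in mpg per extra 1000 lb of weight.
slope <- coef(fit)[["wt"]]
cat("RMSE on held-out rows:", rmse, "\n")
cat("Change in mpg per 1000 lb:", slope, "\n")
```

The same four steps apply regardless of data size; only the estimation step would need a distributed implementation when the data no longer fits on one machine.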