

Page 1

© 2014 IBM Corporation

Big Data Analytics Trends

- some observations

Dr. Alex Liu – Principal Data Scientist

March 18, 2015

BB-03-18-2015-1

Page 2

Strata Big Data Conference

Feb 17-20, 2015

San Jose, CA

Page 3

Hadoop is the mainstream

▪ Hadoop has become a necessary tool for big data

▪ Every vendor has its own Hadoop distribution

▪ IBM BigInsights – Hadoop + Easy To Use Console + BigSQL + BigR

Apache Hadoop is an open-source framework for distributed storage and processing of large data sets.

Page 4

R is used by everyone

▪ In the past: R, IBM SPSS, SAS, … MATLAB …

▪ Now: R users

▪ Java or Python …

Page 5

About Big R

▪ Connecting R to Hadoop – easily and fast

▪ Big R is an R package

– Installed in your favorite R client, say RStudio

– Also installed on all data nodes of the cluster

– Client-side R communicates with BigInsights via JDBC

▪ Allows use of R as a language to access big data

– Provides a high degree of abstraction

– Supports some level of R -> JaQL mapping

▪ Allows push down of R code onto the cluster

– Can leverage existing R assets (code and packages)

▪ Works well as long as the application is parallelizable

– User may be able to leverage existing R assets
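The "parallelizable" condition above can be illustrated with a split-apply-combine sketch in base R. It does not use the Big R API. The toy data frame, its column names, and the summarise_partition helper are hypothetical and exist only for illustration. The point is that when each partition can be processed on its own, the same R function could in principle be shipped to wherever that partition is stored.

```r
# Toy data standing in for a large table (values are made up for illustration).
flights <- data.frame(
  carrier = c("AA", "AA", "UA", "UA", "DL", "DL"),
  delay   = c(10, 20, 5, 15, 0, 30)
)

# Hypothetical per-partition function: it needs only its own partition,
# so partitions can be processed independently and in any order.
summarise_partition <- function(part) {
  data.frame(
    carrier    = part$carrier[1],
    mean_delay = mean(part$delay),
    n          = nrow(part)
  )
}

# Split: one partition per carrier.
parts <- split(flights, flights$carrier)

# Apply: lapply runs sequentially here; on a cluster each call could run
# on the node that holds the partition.
results <- lapply(parts, summarise_partition)

# Combine: stack the per-partition results into one data frame.
combined <- do.call(rbind, results)
print(combined)
```

A computation that needs all rows at once (for example, a median over the whole table) does not decompose this cleanly, which is the caveat behind "as long as the application is parallelizable".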

Page 6

Big R Examples

# Connect to BigInsights

bigr.connect(host="192.168.153.219", port=7052, user="biadmin", password="...")

# Construct a bigr.frame to access large data set

air <- bigr.frame(dataPath="airline_demo.csv", …)

attach(air)

……

Page 7

Spark is coming

▪ Everyone is talking about Spark

▪ Every company has integrated Spark or is about to

An in-memory framework for interactive and iterative computations, developed in the AMPLab at UC Berkeley.

Spark SQL

MLlib

Page 8

Actionable Insights

▪ Big Data Analytics Consulting = Actionable Insights ?

Page 9

Analytics with Big Data

Before:

• Descriptive

• Predictive

• Prescriptive

After:

• Descriptive

• Predictive

• Prescriptive

Data analytics, of any size, is data analytics.

More insights, actionable insights, are always what customers want.

Page 10

Descriptive -> predictive -> prescriptive; Allstate now uses prescriptive analytics.

Page 11

Automation

▪ Is automation possible?

Page 12

Page 13

Big Data Analytics is a process

4Es: Equation, Estimation, Evaluation, Explanation
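One way to read the 4Es is as the steps of a small regression study. The sketch below is an illustrative interpretation in base R, not the author's definition of each step. The data are made up, and the variable names (ad_spend, sales) are hypothetical.

```r
# Made-up toy data, for illustration only.
ad_spend <- c(1, 2, 3, 4, 5)
sales    <- c(3.1, 4.9, 7.2, 8.8, 11.0)

# 1. Equation: propose a model form, here sales = a + b * ad_spend.
model_formula <- sales ~ ad_spend

# 2. Estimation: fit the coefficients by ordinary least squares.
fit <- lm(model_formula, data = data.frame(ad_spend, sales))

# 3. Evaluation: check how well the fitted model describes the data.
r_squared <- summary(fit)$r.squared

# 4. Explanation: turn the coefficients into a statement someone can act on.
slope <- coef(fit)[["ad_spend"]]
cat(sprintf(
  "Each extra unit of ad spend is associated with about %.2f more sales (R^2 = %.3f)\n",
  slope, r_squared
))
```

In a real engagement each step would be more involved (model selection, hold-out evaluation, domain review), but the order of the steps stays the same.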