Anqi Fu presents H2O and R; an intro to getting the most out of big data with R and H2O

Upload: srisatish-ambati

Post on 26-Jan-2015


DESCRIPTION

Anqi Fu exposes the code behind integrating R with H2O, and demos how users can manipulate, slice, dice, and examine data to ask different questions using ALL of the same big data.

TRANSCRIPT

Page 1: Anqi Fu presents H2O and R; an intro to getting the most out of big data with R and H2O

4/23/13

Big Data and R – H2O Saves the Day

December 12, 2013, Anqi Fu

Page 2

About Me

Anqi Fu: [email protected]

• Math Hacker at 0xdata

• R+H2O rockstar

• Economics and Statistics Background

Page 3

Installation

Everything in the demo is something YOU CAN DO.

Grab H2O and our R package and try it yourself:

• on our website (www.0xdata.com/downloadtable)

• on our git (https://github.com/organizations/0xdata)

Get support at:

• http://s3.amazonaws.com/h2o-release/h2o/master/1144/docs-website/index.html

Page 4

“The non-scientist in the street probably has a clearer notion of physics, chemistry and biology than of statistics, regarding statisticians as numerical philatelists, mere collectors of numbers.”

Stephen Senn - Anonymous

Page 5

DO Try This At Home!!! (here are the basic steps)

Page 6

Basic Steps

1. Get R, H2O, and the R package (both H2O and the R package are included in every one of our download files, which you can find on our web page).

2. Install the H2O R package in R (along with dependencies you may not have).

3. Tell R to talk to H2O with a single command: h2o.init()

4. Let the connection automatically install the algorithms used by H2O+R.

5. Analyze Big Data!
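Steps 2 and 3 above can be sketched in R roughly as follows. This is a minimal sketch, not the exact procedure from the talk: it assumes the package is installable from CRAN under the name "h2o", whereas the slides point to the download bundle on the 0xdata site. The helper name connect_to_h2o is hypothetical; only h2o.init() comes from the slides.

```r
# Steps 2-3 from the list above, wrapped in a function so that
# nothing is installed or started until the function is called.
# Assumption: the package is available from CRAN as "h2o"; the
# slides instead ship it inside the H2O download files.
connect_to_h2o <- function() {
  # Step 2: install the H2O R package (and its dependencies) if missing.
  if (!requireNamespace("h2o", quietly = TRUE)) {
    install.packages("h2o")
  }
  # Step 3: tell R to talk to H2O with a single command.
  h2o::h2o.init()
}
```

Calling localH2O <- connect_to_h2o() would then carry out both steps in one go; it needs a working Java installation and, for the install step, network access.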

Page 7

Overview of Objects

• H2OClient: ip=character, port=numeric

• H2OParsedData: h2o=H2OClient, key=character

• H2OGLMModel: key=character, data=H2OParsedData, model=list(coefficients, deviance, aic, etc.)

Example: myModel@model$coefficients

[Diagram: H2O holding key=“prostate.hex” and key=“airlines.hex”]
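To make the slot layout concrete, here is a small stand-in written with base R's S4 system. These are not the package's real class definitions, only a mock that mirrors the slots listed above; the IP address, port, model key, and coefficient values are made-up placeholders. Because it reuses the slide's class names, it is best run in a session where the h2o package is not loaded.

```r
library(methods)

# Stand-in S4 classes mirroring the slots named on the slide.
# (Mock definitions for illustration, not the h2o package's own.)
setClass("H2OClient", representation(ip = "character", port = "numeric"))
setClass("H2OParsedData", representation(h2o = "H2OClient", key = "character"))
setClass("H2OGLMModel",
         representation(key = "character", data = "H2OParsedData", model = "list"))

# Placeholder connection details (assumed, not taken from the slides).
client <- new("H2OClient", ip = "127.0.0.1", port = 54321)

# The R object holds only a key; the data itself stays inside H2O.
prostate <- new("H2OParsedData", h2o = client, key = "prostate.hex")

# Made-up model contents, just to show how results are reached.
myModel <- new("H2OGLMModel",
               key = "glm_sketch",
               data = prostate,
               model = list(coefficients = c(Intercept = -1.5, AGE = 0.02),
                            deviance = NA_real_,
                            aic = NA_real_))

print(myModel@model$coefficients)  # the access pattern from the slide
print(myModel@data@key)            # which H2O key the model was built on
```

The point of the layout is that each R object is a lightweight handle: it records where H2O is running and which key names the data or model, while the heavy data stays on the H2O side.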

Page 8

Overview of Methods

Standard R                    H2O

read.csv, read.table, etc.    h2o.importFile, h2o.importURL

summary                       summary (limited to data only)

glm, glmnet                   h2o.glm(y, x, data, family, nfolds, alpha, lambda)

kmeans                        h2o.kmeans(data, centers, cols, iter.max)

randomForest, cforest         h2o.randomForest(y, x_ignore, data, ntree, depth, classwt)
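For readers less familiar with the left-hand column, the following sketch runs the standard R calls on the small built-in mtcars data set. The H2O counterparts appear only as comments that repeat the signatures from the slide; argument names and order may differ in other H2O releases, so treat them as a pointer rather than working calls. The choice of mtcars and of the columns used is purely illustrative.

```r
# Standard R side of the table, on a small built-in data set.
data(mtcars)

# summary: per-column descriptive statistics.
# H2O side (per the slide): summary, limited to data only.
print(summary(mtcars[, c("wt", "hp")]))

# glm: logistic regression of transmission type on weight and horsepower.
# H2O side (per the slide): h2o.glm(y, x, data, family, nfolds, alpha, lambda)
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)
print(coef(fit))

# kmeans: three clusters on the same two columns.
# H2O side (per the slide): h2o.kmeans(data, centers, cols, iter.max)
set.seed(42)  # kmeans picks random starting centers
km <- kmeans(mtcars[, c("wt", "hp")], centers = 3, iter.max = 10)
print(km$size)
```

The H2O functions follow the same overall shape (a response, predictors, and a data object), with the difference that the data argument refers to a data set held in H2O rather than to an in-memory data frame.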

Page 9

Demo: Big Data Manipulation and Modeling in R with H2O

Page 10

Thanks