Anqi Fu presents H2O and R: an intro to getting the most out of big data with R and H2O
DESCRIPTION
Anqi Fu exposes the code behind integrating R with H2O, and demos how users can manipulate, slice, dice, and examine data to ask different questions using ALL of the same big data.
TRANSCRIPT
4/23/13
Big Data and R – H2O Saves the Day
December 12, 2013, Anqi Fu
About Me
Anqi Fu: [email protected]
• Math Hacker at 0xdata
• R+H2O rockstar
• Economics and Statistics Background
Installation
Everything in the demo is something YOU CAN DO.
Grab H2O and our R package and try it yourself:
• on our website (www.0xdata.com/downloadtable)
• on our git (https://github.com/organizations/0xdata)
Get support at:
• http://s3.amazonaws.com/h2o-release/h2o/master/1144/docs-website/index.html
“The non-scientist in the street probably has a clearer notion of physics, chemistry and biology than of statistics, regarding statisticians as numerical philatelists, mere collectors of numbers.”
- Stephen Senn
DO Try This At Home!!! (here are the basic steps)
Basic Steps
1. Get R, H2O, and the R package (both H2O and the R package are included in every download file on our web page).
2. Install the H2O R package in R (along with dependencies you may not have).
3. Tell R to talk to H2O with a single command: > h2o.init()
4. Let the connection automatically install the algorithms used by H2O+R.
5. Analyze Big Data!
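Steps 2 and 3 can be sketched in R roughly as follows. This is a minimal sketch, not the exact commands from the demo: it assumes the h2o package is installed from CRAN rather than from the download bundle described in step 1, and that Java is available so that h2o.init() can reach or start a local H2O instance. The variable name localH2O is illustrative only.

```r
# Step 2: install the H2O R package if it is missing.
# Assumption: the CRAN copy is used here; the slides use the package
# shipped inside the H2O download instead.
if (!requireNamespace("h2o", quietly = TRUE)) {
  install.packages("h2o")
}
library(h2o)

# Step 3: tell R to talk to H2O with a single command.
# With no arguments this targets an H2O instance on the local machine.
localH2O <- h2o.init()
```

The version of the R package should match the version of the H2O instance it connects to, which is one reason the slides point to a single download that contains both.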
Overview of Objects
• H2OClient: ip=character, port=numeric
• H2OParsedData: h2o=H2OClient, key=character
• H2OGLMModel: key=character, data=H2OParsedData, model=list(coefficients, deviance, aic, etc.)
Example: myModel@model$coefficients
Example keys in H2O: key=“prostate.hex”, key=“airlines.hex”
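To make the slot layout above concrete, here is a small base-R sketch that defines S4 classes with the same shape. The class names carry a hypothetical "Demo" prefix so they do not clash with the real classes in the h2o package, and every value (IP, port, model key, coefficients) is a made-up placeholder rather than output from H2O.

```r
library(methods)

# Hypothetical stand-ins mirroring the slots listed on the slide.
setClass("DemoClient",
         representation(ip = "character", port = "numeric"))
setClass("DemoParsedData",
         representation(h2o = "DemoClient", key = "character"))
setClass("DemoGLMModel",
         representation(key = "character", data = "DemoParsedData",
                        model = "list"))

# Placeholder values only; nothing here contacts an H2O instance.
client   <- new("DemoClient", ip = "127.0.0.1", port = 54321)
prostate <- new("DemoParsedData", h2o = client, key = "prostate.hex")
myModel  <- new("DemoGLMModel", key = "demo.glm", data = prostate,
                model = list(coefficients = c(Intercept = 0, AGE = 0),
                             deviance = NA_real_, aic = NA_real_))

# Slots are read with @ and list elements with $, as in the slide's example.
myModel@model$coefficients
myModel@data@key
```

The point of the layout is that the R objects hold only a reference (client plus key) to data that lives in H2O, so the big data itself never has to fit in R's memory.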
Overview of Methods
Standard R                   H2O
read.csv, read.table, etc.   h2o.importFile, h2o.importURL
summary                      summary (limited to data only)
glm, glmnet                  h2o.glm(y, x, data, family, nfolds, alpha, lambda)
kmeans                       h2o.kmeans(data, centers, cols, iter.max)
randomForest, cforest        h2o.randomForest(y, x_ignore, data, ntree, depth, classwt)
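For readers less familiar with the standard R side of this mapping, the sketch below runs three of the standard functions on a small built-in data set. The data set (mtcars) and the chosen columns are assumptions made purely for illustration and do not come from the talk. The H2O counterparts are named only in comments, using the argument lists as they appear on the slide, because they need a running H2O instance.

```r
# summary: per-column statistics on an in-memory data frame.
# H2O counterpart per the slide: summary (limited to data only).
summary(mtcars[, c("am", "wt", "hp")])

# glm: logistic regression of transmission type on weight and horsepower.
# H2O counterpart per the slide:
#   h2o.glm(y, x, data, family, nfolds, alpha, lambda)
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)
coef(fit)

# kmeans: three clusters on two numeric columns.
# H2O counterpart per the slide: h2o.kmeans(data, centers, cols, iter.max)
set.seed(1)
clusters <- kmeans(mtcars[, c("wt", "hp")], centers = 3, iter.max = 20)
table(clusters$cluster)
```

One visible difference in the slide's signatures is that the H2O functions take the response and predictors as separate y and x arguments instead of an R formula.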
Demo: Big Data Manipulation and Modeling in R with H2O
Thanks