Beauty and Big Data [Made possible by H2O and Tableau] Amy Wang

Upload: 0xdata

Post on 27-Aug-2014


Category:

Software




TRANSCRIPT

Page 1: Beauty and Big Data

Beauty and Big Data [Made possible by H2O and Tableau]

Amy Wang

Page 2: Beauty and Big Data

“A data scientist knows more statistics than a computer scientist and more computer science than a statistician.”

Page 3: Beauty and Big Data

What is H2O?

Math Platform: Open source in-memory prediction engine

• Parallelized and distributed algorithms that make the most of multithreaded systems

• GLM, Random Forest, GBM, PCA, etc.

API: Easy to use and adopt

• Written in Java – perfect for Java programmers

• REST API (JSON) – drives H2O from R, Python, Excel

Big Data: More data? Or better models? Both?

• Use all of your data – model without downsampling

• Run a simple GLM or a more complex GBM to find the best fit for the data

• More Data + Better Models = Better Predictions
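To make the "simple GLM or more complex GBM" choice concrete, here is a minimal sketch in plain Python. It does not use H2O. It fits a one-feature least-squares line (a Gaussian GLM with identity link) and a small set of boosted regression stumps (a stand-in for a GBM) on made-up quadratic data, then compares held-out error. All function names, the toy data, and the parameter values are assumptions chosen for illustration.

```python
import random

def fit_linear(xs, ys):
    """Least-squares line on one feature (Gaussian GLM, identity link)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def fit_stump(xs, targets):
    """Single-split regression stump that minimizes squared error."""
    best = None
    # Skip the largest value so the right-hand side is never empty.
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def fit_boosted_stumps(xs, ys, rounds=40, rate=0.3):
    """Gradient boosting for squared loss: each stump fits the current residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(xs)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + rate * stump(x) for x, p in zip(xs, preds)]
    return lambda x: base + rate * sum(s(x) for s in stumps)

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def compare_models(seed=0, n_train=80, n_test=40):
    """Fit both models on toy quadratic data and return held-out MSE for each."""
    rng = random.Random(seed)
    def sample(n):
        xs = [rng.uniform(-3, 3) for _ in range(n)]
        ys = [x * x + rng.gauss(0, 0.3) for x in xs]  # made-up nonlinear signal
        return xs, ys
    train_x, train_y = sample(n_train)
    test_x, test_y = sample(n_test)
    return {
        "linear": mse(fit_linear(train_x, train_y), test_x, test_y),
        "boosted": mse(fit_boosted_stumps(train_x, train_y), test_x, test_y),
    }

errors = compare_models()
print("best fit on held-out data:", min(errors, key=errors.get), errors)
```

On data with a strongly nonlinear signal the boosted model should win; on data that really is linear the simpler model is usually the better and cheaper choice, which is why trying both is worthwhile.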

Page 4: Beauty and Big Data

[Architecture diagram]

Data sources: SQL, HDFS, NoSQL, S3

Interfaces: R, JSON, Scala, Java, Python

Intelligent Enterprise Applications

H2O Prediction Engine

Components: Memory Manager, Ensembles, Solvers, Deep learning, Cluster, Classify, Regression, Trees, Forest, Boosting, Gradients, Processes, Nano Fast Scoring Engine, Columnar Compression, Query Processor, R-engine, In-Mem Map Reduce

Performance: 2M row ingest/sec, 50M row regression/sec, 750M row aggregates/sec

Deployment: on premise, on/off Hadoop, on EC2
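The "In-Mem Map Reduce" component named in the diagram can be illustrated with a small, generic sketch. This is not H2O's implementation. It only shows the general pattern: split a column into chunks, compute a partial result per chunk in parallel (map), then combine the partials (reduce). The function names, chunk size, and worker count are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def chunked(column, chunk_size):
    """Split a column (a list of numbers) into fixed-size chunks."""
    return [column[i:i + chunk_size] for i in range(0, len(column), chunk_size)]

def map_chunk(chunk):
    """Map step: per-chunk partial sums (count, sum, sum of squares)."""
    return (len(chunk), sum(chunk), sum(v * v for v in chunk))

def combine(a, b):
    """Reduce step: merge two partial results element by element."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def column_stats(column, chunk_size=1000, workers=4):
    """Return (count, mean, population variance) using a map/reduce over chunks."""
    if not column:
        raise ValueError("column must not be empty")
    chunks = chunked(column, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    n, total, total_sq = reduce(combine, partials, (0, 0.0, 0.0))
    mean = total / n
    variance = total_sq / n - mean * mean
    return n, mean, variance

n, mean, variance = column_stats(list(range(1, 101)), chunk_size=10)
print(n, mean, variance)
```

Because each partial result is small and the combine step is associative, the same pattern extends from threads on one machine to chunks spread across a cluster.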

Page 5: Beauty and Big Data
Page 6: Beauty and Big Data
Page 7: Beauty and Big Data

Installation Process

Start playing with H2O with R yourself!

Grab H2O and our R package:

• Download from the website: 0xdata.com/downloads

• Build from git: https://github.com/0xdata/h2o

Get support at:

• http://docs.0xdata.com/

Page 8: Beauty and Big Data

Demo: Big Data Workflow using R with H2O

Page 9: Beauty and Big Data

OSEMN

Obtain and Scrub

Explore [in R or Tableau]

Model [in H2O] and Explore the different models

iNterpret [in Tableau]
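The OSEMN steps can be sketched end to end as a toy pipeline. The version below uses only the Python standard library rather than R, H2O, or Tableau, and the CSV sample, column names, and function names are all invented for illustration. Each function stands in for one stage of the workflow.

```python
import csv
import io
import statistics

# Made-up sample data; the third row has a missing value on purpose.
RAW = """hours,score
1,52
2,55
3,
4,61
5,64
"""

def obtain(text):
    """Obtain: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def scrub(rows):
    """Scrub: drop rows with missing fields and convert values to floats."""
    clean = []
    for row in rows:
        if all(value not in (None, "") for value in row.values()):
            clean.append({key: float(value) for key, value in row.items()})
    return clean

def explore(rows, column):
    """Explore: basic summary statistics for one column."""
    values = [row[column] for row in rows]
    return {"n": len(values), "mean": statistics.mean(values),
            "min": min(values), "max": max(values)}

def model(rows, x_col, y_col):
    """Model: least-squares line, returned as (slope, intercept)."""
    xs = [row[x_col] for row in rows]
    ys = [row[y_col] for row in rows]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def interpret(slope, x_col, y_col):
    """iNterpret: turn the fitted slope into a plain-language statement."""
    return f"Each extra unit of {x_col} is associated with a {slope:.2f} change in {y_col}."

rows = scrub(obtain(RAW))
print(explore(rows, "score"))
slope, intercept = model(rows, "hours", "score")
print(interpret(slope, "hours", "score"))
```

In the workflow on this slide, the modeling stage would be handed to H2O and the final stage to Tableau; the sketch only shows how the stages pass data to one another.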

Page 10: Beauty and Big Data

H2O

Data

REST API

Local Socket Server

Page 11: Beauty and Big Data

Demo: Big Data Modeling Visualization in Tableau through R with H2O

Page 12: Beauty and Big Data

A little about us

Page 13: Beauty and Big Data

Advisors: Systems, Data, File Systems and Hadoop

Doug Lea: ACM Fellow, malloc for C, fork-join, Java Memory Model, SUNY Oswego

Chris Pouliot: VP of Data Science, Lyft; formerly Netflix, Google

Dhruba Borthakur: HDFS, Hive, Facebook

Scientific Advisory Council

Stephen Boyd: Professor of Electrical Engineering, Stanford

Rob Tibshirani: Professor of Health Research and Policy, and Statistics, Stanford

Trevor Hastie: Professor of Statistics, Stanford

Investors

Jishnu Bhattacharjee: Nexus Venture Partners

Anand Babu Periasamy: Founder, Gluster (RedHat)

Anand Rajaraman: Founder, Junglee (Amazon), Kosmix (WalmartLabs)

Dipchand “Deep” Nishar: SVP of Products & UX (LinkedIn)

We’ve Got the Who’s Who of Predictive Analytics