Top 10 Performance Gotchas for Scaling In-Memory Algorithms


Upload: srisatish-ambati

Post on 11-May-2015


DESCRIPTION

Top 10 Data Parallelism and Model Parallelism lessons from scaling H2O. "Math algorithms have primarily been the domain of desktop data science. With the success of scalable algorithms at Google, Amazon, and Netflix, there is an ever-growing demand for sophisticated algorithms over big data. In this talk, we get a ringside view of the making of the world's most scalable and fastest machine learning framework, H2O, and the performance lessons learnt scaling it over EC2 for Netflix and over commodity hardware for other power users. Top 10 Performance Gotchas is about the white-hot stories of I/O wars, S3 resets, and muxers, as well as the power of primitive byte arrays, non-blocking structures, and fork/join queues. It also covers good data distribution and the decomposition of algorithms into fine-grain blocks of parallel computation. It's a 10-point story of the rage of a network of machines against the tyranny of Amdahl while keeping the statistical properties of the data and the accuracy of the algorithm."

TRANSCRIPT

Page 1: Top 10 Performance Gotchas for scaling in-memory Algorithms

H2O – The Open Source Math Engine

Better Predictions

Page 2: Top 10 Performance Gotchas for scaling in-memory Algorithms

4/23/13

H2O – Open Source in-memory Machine Learning for Big Data

Page 3: Top 10 Performance Gotchas for scaling in-memory Algorithms

Universe is sparse. Life is messy. Data is sparse & messy.

- Lao Tzu

Page 4: Top 10 Performance Gotchas for scaling in-memory Algorithms

Hadoop = opportunity

Not enough Data Scientists

Analysts won’t code Java

Page 5: Top 10 Performance Gotchas for scaling in-memory Algorithms

H2O the Prediction Engine

Ad hoc Exploration

Math Modeling

Real-time Scoring

Big Data

Messy NAs

Clustering

Classification

Ensembles 100’s nanos

models

Regression

Group By Grep

Page 6: Top 10 Performance Gotchas for scaling in-memory Algorithms

H2O the Prediction Engine

Big Data Exploration Modeling Scoring Real-time

No New API

Approximate results each step

Page 7: Top 10 Performance Gotchas for scaling in-memory Algorithms

H2O the Prediction Engine

Intellectual Legacy

Math needs to be free

Open Source

Support and Innovation

https://github.com/0xdata/h2o

Page 8: Top 10 Performance Gotchas for scaling in-memory Algorithms

All Top 10's are binary.

- Anonymous

Page 9: Top 10 Performance Gotchas for scaling in-memory Algorithms

Data chunks > code chunks

TCP for Data. UDP for Control.

>> Generated Java Assist

10 Move Code not Data

Page 10: Top 10 Performance Gotchas for scaling in-memory Algorithms

JVM 1 Heap, JVM 2 Heap, JVM 3 Heap, JVM 4 Heap

A Frame: Vec[] (age, sex, zip, ID, car)

• Vecs aligned in heaps
• Optimized for concurrent access
• Random access any row, any JVM

A Chunk, Unit of Parallel Access

Page 11: Top 10 Performance Gotchas for scaling in-memory Algorithms

A season for variable-sized chunks

and a season for uniform chunks. Tightly packed. (A chunk is also the unit of batch.)

9 Chunk-ing Express!

Page 12: Top 10 Performance Gotchas for scaling in-memory Algorithms

No expensive intermediate states.

Fine-grain parallelism wins!

>> Fork / Join

8 Reduce early. Reduce Often!

Page 13: Top 10 Performance Gotchas for scaling in-memory Algorithms

All CPUs grab Chunks in parallel

Map/Reduce & F/J handles all sync

8 Reduce early. Reduce Often!

JVM 1 Heap, JVM 2 Heap, JVM 3 Heap, JVM 4 Heap

Vec, Vec, Vec, Vec, Vec
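A minimal sketch of the "reduce early" idea, assuming each chunk is a plain double[] living in a single JVM. ChunkSumDemo and SumTask are hypothetical names for illustration, not H2O classes; the only real API used is the fork/join framework in java.util.concurrent. Each task reduces its own chunk to a single double before anything is combined, so no intermediate collection is built and fork/join handles the synchronization.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical sketch: chunks are plain double[] arrays, not H2O Chunks.
final class ChunkSumDemo {
    // Sums the chunks in the half-open range [lo, hi), splitting until one chunk remains.
    static final class SumTask extends RecursiveTask<Double> {
        private final double[][] chunks;
        private final int lo, hi;

        SumTask(double[][] chunks, int lo, int hi) {
            this.chunks = chunks;
            this.lo = lo;
            this.hi = hi;
        }

        @Override
        protected Double compute() {
            if (hi - lo == 0) return 0.0;
            if (hi - lo == 1) {
                // Leaf: reduce one chunk to one number right away.
                double s = 0;
                for (double d : chunks[lo]) s += d;
                return s;
            }
            int mid = (lo + hi) >>> 1;
            SumTask left = new SumTask(chunks, lo, mid);
            SumTask right = new SumTask(chunks, mid, hi);
            left.fork();                  // run the left half asynchronously
            double r = right.compute();   // reuse this thread for the right half
            return left.join() + r;       // combine two small partial results
        }
    }

    static double parallelSum(double[][] chunks) {
        return ForkJoinPool.commonPool().invoke(new SumTask(chunks, 0, chunks.length));
    }

    public static void main(String[] args) {
        double[][] chunks = { {1, 2, 3}, {4, 5}, {6}, {7, 8, 9, 10} };
        System.out.println(parallelSum(chunks));
    }
}
```

The same shape extends to any associative reduction (min, max, histogram merge); only the leaf loop and the combine step change.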

Page 14: Top 10 Performance Gotchas for scaling in-memory Algorithms

Debugging slow >> Heartbeats, Messages

Two Generals' Paradox

7 Slow is not different from Dead
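One way to picture the heartbeat point is a timeout-based monitor. The sketch below is an assumption for illustration only: HeartbeatMonitor, its method names, and the timeout policy are hypothetical and are not taken from H2O. It shows why slow and dead look the same from the outside: the monitor only knows how long it has been since it last heard from a node.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical timeout-based monitor; the caller supplies timestamps in milliseconds.
final class HeartbeatMonitor {
    private final Map<String, Long> lastSeenMillis = new ConcurrentHashMap<>();
    private final long timeoutMillis;

    HeartbeatMonitor(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Record that a heartbeat from the node arrived at the given time.
    void onHeartbeat(String node, long nowMillis) {
        lastSeenMillis.put(node, nowMillis);
    }

    // A node is suspect if it was never heard from or has been silent too long.
    // The monitor cannot tell a slow node from a dead one; both simply miss the deadline.
    boolean isSuspect(String node, long nowMillis) {
        Long last = lastSeenMillis.get(node);
        return last == null || nowMillis - last > timeoutMillis;
    }
}
```

Passing the clock in as a parameter keeps the sketch deterministic; a real deployment would have to choose the timeout against observed network and GC pauses.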

Page 15: Top 10 Performance Gotchas for scaling in-memory Algorithms

An in-memory system is as good as your memory manager!

Lazy eviction. Compress. Align.

Corollary: Track down Leaks!

6 Memory Manager

Page 16: Top 10 Performance Gotchas for scaling in-memory Algorithms

Use primitives

5      Memory Overheads  

// A Distributed Vector
// much more than 2billion elements
class Vec {
  long length();                  // more than an int's worth
  // fast random access
  double at(long idx);            // Get the idx'th elem
  boolean isNA(long idx);
  void set(long idx, double d);   // writable
  void append(double d);          // variable sized
}
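To make the "use primitives" point concrete, here is a hedged single-JVM sketch with the same five methods as the slide's Vec, backed by double[] chunks rather than boxed Double objects. PrimitiveVec, the chunk size, and the choice of NaN to represent an NA are assumptions for illustration; the real H2O Vec is distributed and is not shown here.

```java
import java.util.Arrays;

// Hypothetical single-JVM stand-in for the slide's Vec; not H2O's implementation.
final class PrimitiveVec {
    private static final int CHUNK_LEN = 1 << 10;   // illustrative chunk size
    private double[][] chunks = new double[0][];    // primitive storage, no boxing
    private long length = 0;                        // long, so more than an int's worth

    long length() { return length; }

    double at(long idx) {
        check(idx);
        return chunks[(int) (idx / CHUNK_LEN)][(int) (idx % CHUNK_LEN)];
    }

    // Assumption: a missing value (NA) is stored as NaN.
    boolean isNA(long idx) { return Double.isNaN(at(idx)); }

    void set(long idx, double d) {
        check(idx);
        chunks[(int) (idx / CHUNK_LEN)][(int) (idx % CHUNK_LEN)] = d;
    }

    void append(double d) {
        if (length % CHUNK_LEN == 0) {              // last chunk is full, or none exists yet
            chunks = Arrays.copyOf(chunks, chunks.length + 1);
            chunks[chunks.length - 1] = new double[CHUNK_LEN];
        }
        length++;
        set(length - 1, d);
    }

    private void check(long idx) {
        if (idx < 0 || idx >= length) throw new IndexOutOfBoundsException("idx " + idx);
    }
}
```

Each element costs 8 bytes in a double[]; a List<Double> would add an object header and a reference per element, which is the overhead the slide warns about.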

Page 17: Top 10 Performance Gotchas for scaling in-memory Algorithms

Tree size

Bin size

Recursively divide till Data → Cache

4 Cache-Oblivious

Page 18: Top 10 Performance Gotchas for scaling in-memory Algorithms

User-mode reliability

S3 Readers will TCP Reset

Mux your connections

Not all toolkits are equal. >> JetS3

3 EC2 – Nothing is bounded

Page 19: Top 10 Performance Gotchas for scaling in-memory Algorithms

Non-Blocking Data Structures.

2 No Locks, No Cry  

// VOLATILE READ before key compare.
// CAS
private final boolean CAS_kvs( final Object[] oldkvs, final Object[] newkvs ) {
  return _unsafe.compareAndSwapObject(this, _kvs_offset, oldkvs, newkvs );
}
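The slide's snippet swaps a whole array reference with a compare-and-swap through Unsafe. The same read, compute, CAS, retry pattern can be sketched with the public java.util.concurrent.atomic API. The stack below is a standard Treiber stack chosen as a small illustration; LockFreeStack is a hypothetical name and this is not the data structure from the talk.

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative lock-free stack (Treiber stack); not the talk's non-blocking hash map.
final class LockFreeStack<T> {
    private static final class Node<T> {
        final T value;
        final Node<T> next;
        Node(T value, Node<T> next) { this.value = value; this.next = next; }
    }

    private final AtomicReference<Node<T>> head = new AtomicReference<>();

    void push(T value) {
        Node<T> oldHead;
        Node<T> newHead;
        do {
            oldHead = head.get();                      // volatile read
            newHead = new Node<>(value, oldHead);
        } while (!head.compareAndSet(oldHead, newHead)); // retry if another thread won
    }

    // Returns null when the stack is empty.
    T pop() {
        Node<T> oldHead;
        do {
            oldHead = head.get();
            if (oldHead == null) return null;
        } while (!head.compareAndSet(oldHead, oldHead.next));
        return oldHead.value;
    }
}
```

No thread ever blocks: a failed CAS means some other thread made progress, and the loser simply re-reads and tries again.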

Page 20: Top 10 Performance Gotchas for scaling in-memory Algorithms
Page 21: Top 10 Performance Gotchas for scaling in-memory Algorithms

byte[ ]. roll-your-own. fast.

1 endian wars ended! Keep-It-Simple-Serialization.  

public AutoBuffer putA1( byte[] ary, int sofar, int length ) {
  while( sofar < length ) {
    int len = Math.min(length - sofar, _bb.remaining());
    _bb.put(ary, sofar, len);
    sofar += len;
    if( sofar < length ) sendPartial();
  }
  return this;
}
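As a companion to putA1, here is a hedged sketch of roll-your-own serialization over a byte[] using java.nio.ByteBuffer. The wire format (a 4-byte length followed by 8 bytes per double) and the little-endian byte order are assumptions made for this example, and SimpleSerializer is a hypothetical name; the slide does not specify either.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical format: [int count][double 0][double 1]..., all little-endian.
final class SimpleSerializer {
    // One fixed byte order on both sides, so no per-message negotiation is needed.
    private static final ByteOrder ORDER = ByteOrder.LITTLE_ENDIAN;

    static byte[] write(double[] values) {
        // Sketch only: assumes the array is small enough that the size fits in an int.
        ByteBuffer bb = ByteBuffer.allocate(4 + 8 * values.length).order(ORDER);
        bb.putInt(values.length);
        for (double d : values) bb.putDouble(d);
        return bb.array();
    }

    static double[] read(byte[] bytes) {
        ByteBuffer bb = ByteBuffer.wrap(bytes).order(ORDER);
        int n = bb.getInt();
        double[] out = new double[n];
        for (int i = 0; i < n; i++) out[i] = bb.getDouble();
        return out;
    }
}
```

There is no reflection, no class metadata, and no object graph walk: the bytes on the wire are exactly the data, which is what keeps this style fast.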

Page 22: Top 10 Performance Gotchas for scaling in-memory Algorithms

Data Movement is a Defect. Slowing down helps communication.

Got Speed?  

Page 23: Top 10 Performance Gotchas for scaling in-memory Algorithms

Accuracy rules over speed. Predictive Performance

0      Math always produces a number  

Page 24: Top 10 Performance Gotchas for scaling in-memory Algorithms

Data presentation bias. Sorted data => interesting results

1      Shuffle  
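A seeded Fisher-Yates shuffle is one simple way to remove ordering bias before training. The sketch below shuffles row indices rather than the rows themselves, so the data can stay in place; ShuffleDemo and shuffledOrder are hypothetical names, and a single-JVM index array is assumed.

```java
import java.util.Random;

// Illustrative only: produces a reproducible random visiting order for n rows.
final class ShuffleDemo {
    static int[] shuffledOrder(int n, long seed) {
        int[] order = new int[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Random rnd = new Random(seed);          // fixed seed keeps runs repeatable
        for (int i = n - 1; i > 0; i--) {
            int j = rnd.nextInt(i + 1);         // uniform in [0, i]
            int tmp = order[i];
            order[i] = order[j];
            order[j] = tmp;
        }
        return order;
    }
}
```

Keeping the seed explicit means a surprising result can be reproduced exactly, which ties in with the later advice to build deterministic versions first.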

Page 25: Top 10 Performance Gotchas for scaling in-memory Algorithms

2      Random acts of Kindness?  

Page 26: Top 10 Performance Gotchas for scaling in-memory Algorithms
Page 27: Top 10 Performance Gotchas for scaling in-memory Algorithms

3      Convex Problems: ADMM  

Page 28: Top 10 Performance Gotchas for scaling in-memory Algorithms

Matrix operations

JAMA, jblas: all single node.

A distributed version needs data transfer!

4 Amdahl strikes: Cholesky / QR Decomposition

Page 29: Top 10 Performance Gotchas for scaling in-memory Algorithms

embarrassingly parallel

binning

tree-building

splits

5 Random Forests

Page 30: Top 10 Performance Gotchas for scaling in-memory Algorithms

iterate & stage weak-learners => strong learners

each tree can be parallel

minimize communication

6 Boosting

Page 31: Top 10 Performance Gotchas for scaling in-memory Algorithms

embarrassingly parallel

pre-calculate base stats

distance calculation

weight matrices – small footprint

7 Neural Nets & Clustering

Page 32: Top 10 Performance Gotchas for scaling in-memory Algorithms

Daisy chain a bunch of models Interleave. JIT – Minimize loops over data.

8    Ensembles  

Page 33: Top 10 Performance Gotchas for scaling in-memory Algorithms

Deterministic versions first! Got Pen & Paper? Optimize often. Test Big Data soon.

9      Tools  

Page 34: Top 10 Performance Gotchas for scaling in-memory Algorithms

Replacing NAs improves predictive performance by about 10%.

- Newton

Page 35: Top 10 Performance Gotchas for scaling in-memory Algorithms

Munging Missing Features

impute NAs with mean

impute NAs with kNN

impute with recursive PCA

- Boyd
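The simplest of the three options, mean imputation, can be sketched in a few lines. The example assumes a single numeric column stored as a double[] with NaN marking an NA; MeanImputer is a hypothetical name and this is not H2O's munging API.

```java
// Illustrative mean imputation for one column; NaN is assumed to mean NA.
final class MeanImputer {
    static double[] imputeMean(double[] column) {
        double sum = 0;
        int count = 0;
        for (double d : column) {
            if (!Double.isNaN(d)) {
                sum += d;
                count++;
            }
        }
        double[] out = column.clone();
        if (count == 0) return out;            // nothing observed, nothing to impute with
        double mean = sum / count;
        for (int i = 0; i < out.length; i++) {
            if (Double.isNaN(out[i])) out[i] = mean;
        }
        return out;
    }
}
```

The sum and count are a two-number summary per chunk, so in a chunked setting the first pass is itself a small reduce and the second pass is a per-row map.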

Page 36: Top 10 Performance Gotchas for scaling in-memory Algorithms

Unbalanced data

single rare classes

Fraud / No-Fraud

Stratify

Page 37: Top 10 Performance Gotchas for scaling in-memory Algorithms

Unbalanced data

multiple rare classes

Browse, Click, Purchase

Stratify
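A hedged sketch of stratified sampling: rows are grouped by class label and the same fraction is drawn from each group, so a rare class such as Fraud or Purchase is never sampled away. StratifiedSampler, the integer label encoding, and the round-up rule are assumptions for illustration; the slides only say "Stratify".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

// Illustrative only: returns the indices of the sampled rows.
final class StratifiedSampler {
    static List<Integer> sample(int[] labels, double fraction, long seed) {
        if (fraction <= 0 || fraction > 1) throw new IllegalArgumentException("fraction must be in (0, 1]");
        // Group row indices by class; TreeMap keeps the class order deterministic.
        Map<Integer, List<Integer>> rowsByClass = new TreeMap<>();
        for (int i = 0; i < labels.length; i++) {
            rowsByClass.computeIfAbsent(labels[i], k -> new ArrayList<>()).add(i);
        }
        Random rnd = new Random(seed);
        List<Integer> picked = new ArrayList<>();
        for (List<Integer> rows : rowsByClass.values()) {
            Collections.shuffle(rows, rnd);
            // Round up so every class present contributes at least one row.
            int take = (int) Math.ceil(fraction * rows.size());
            picked.addAll(rows.subList(0, take));
        }
        return picked;
    }
}
```

The same grouping also works for multiple rare classes; only the number of strata changes.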

Page 38: Top 10 Performance Gotchas for scaling in-memory Algorithms

Use Customer Data

Algorithms for Sparse vs. Dense

Unbalanced Data.

Robustness under noise

10 Data is the System

Page 39: Top 10 Performance Gotchas for scaling in-memory Algorithms

Volume: HDFS

HIVE/SQL

Data Scientist

Munging, slice n dice, Features

Classification, Regression, Clustering, Optimal Model

Engineer

Velocity: Events

Online Scoring

Exploration

Modeling

Offline Scoring

Business Analyst

Ensemble models, Low latency

Applications

Predictions

Rule Engine

Before H2O

Page 40: Top 10 Performance Gotchas for scaling in-memory Algorithms

Big Data Exploration Modeling Scoring Real-time

Big Data beats Better Algorithms

Page 41: Top 10 Performance Gotchas for scaling in-memory Algorithms

Big Data Exploration Modeling Scoring Real-time

Big Data and Better Algorithms

Scale & Parallelism

Page 42: Top 10 Performance Gotchas for scaling in-memory Algorithms

H2O the Prediction Engine

Intellectual Legacy

Math needs to be free

Open Source

Support and Innovation

https://github.com/0xdata/h2o

Page 43: Top 10 Performance Gotchas for scaling in-memory Algorithms
Page 44: Top 10 Performance Gotchas for scaling in-memory Algorithms

H2O – The Open Source Math Engine

Better Predictions

Page 45: Top 10 Performance Gotchas for scaling in-memory Algorithms

0xdata.com

Distributed Coding Taxonomy

• No Distribution Coding:
  • Whole Algorithms, Whole Vector-Math
  • REST + JSON: e.g. load data, GLM, get results

• Simple Data-Parallel Coding:
  • Per-Row (or neighbor row) Math
  • Map/Reduce-style: e.g. Any dense linear algebra

• Complex Data-Parallel Coding
  • K/V Store, Graph Algo's, e.g. PageRank

Page 46: Top 10 Performance Gotchas for scaling in-memory Algorithms

Distributed Coding Taxonomy

• No Distribution Coding:
  • Whole Algorithms, Whole Vector-Math
  • REST + JSON: e.g. load data, GLM, get results

• Simple Data-Parallel Coding:
  • Per-Row (or neighbor row) Math
  • Map/Reduce-style: e.g. Any dense linear algebra

• Complex Data-Parallel Coding
  • K/V Store, Graph Algo's, e.g. PageRank

Read the docs!

This talk!

Join our GIT!

Page 47: Top 10 Performance Gotchas for scaling in-memory Algorithms

0xdata.com

Distributed Data Taxonomy

Frame – a collection of Vecs
Vec – a collection of Chunks
Chunk – a collection of 1e3 to 1e6 elems
elem – a java double
Row i – i'th elements of all the Vecs in a Frame
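Because a Vec is a collection of Chunks, reading row i means first finding which chunk holds it. The sketch below shows one way this could work when chunks have different sizes: keep a prefix sum of chunk lengths and binary-search it. ChunkLocator and its methods are hypothetical names, and this lookup scheme is an assumption for illustration rather than a description of H2O's internals.

```java
import java.util.Arrays;

// Illustrative row-to-chunk lookup over variable-sized chunks.
final class ChunkLocator {
    // starts[i] is the global index of the first element of chunk i;
    // the final entry holds the total length.
    private final long[] starts;

    ChunkLocator(int[] chunkLengths) {
        starts = new long[chunkLengths.length + 1];
        for (int i = 0; i < chunkLengths.length; i++) {
            if (chunkLengths[i] <= 0) throw new IllegalArgumentException("chunks must be non-empty");
            starts[i + 1] = starts[i] + chunkLengths[i];
        }
    }

    long length() { return starts[starts.length - 1]; }

    // Index of the chunk that contains the given global row.
    int chunkOf(long row) {
        if (row < 0 || row >= length()) throw new IndexOutOfBoundsException("row " + row);
        int pos = Arrays.binarySearch(starts, row);
        // Exact hit: the row is the first element of chunk pos.
        // Miss: binarySearch returns -(insertionPoint) - 1, and the chunk is insertionPoint - 1.
        return pos >= 0 ? pos : -pos - 2;
    }

    // Position of the row inside its chunk.
    int offsetOf(long row) {
        return (int) (row - starts[chunkOf(row)]);
    }
}
```

If every Vec in a Frame shares the same chunk boundaries, one lookup per row serves all columns, which fits the earlier slide's note that Vecs are aligned in heaps.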

Page 48: Top 10 Performance Gotchas for scaling in-memory Algorithms

Usecases

Conversion, Retention & Churn
• Lead Conversion
• Engagement
• Product Placement
• Recommendations

Pricing Engine

Fraud Detection