Top 10 Performance Gotchas for Scaling In-Memory Algorithms

H2O – The Open Source Math Engine!

Better Predictions!

4/23/13

H2O – Open Source in-memory Machine Learning for Big Data

Universe is sparse. Life is messy. Data is sparse & messy.

- Lao Tzu

Hadoop = opportunity. Not enough data scientists. Analysts won't code Java.

H2O the Prediction Engine

Big Data: Ad hoc Exploration, Math Modeling, Real-time Scoring

Messy NAs, Clustering, Classification, Regression, Group By, Grep

Ensembles: 100's of models, nanos


Big Data: Exploration, Modeling, Real-time Scoring

No New API!

Approximate results at each step!


Intellectual Legacy

Math needs to be free

Open Source

Support and Innovation

https://github.com/0xdata/h2o

All Top 10s are binary!

- Anonymous

Data chunks > code chunks. TCP for data, UDP for control.

>> Generated code (Javassist)

10 Move Code, not Data
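A minimal sketch of "move code, not data", assuming each node's chunks are simulated as local double[] arrays in one process: the driver ships a small function to every node, each node reduces its own chunks, and only one double per node travels back. The Node and MoveCode classes are hypothetical illustrations, not H2O's API.

```java
import java.util.List;
import java.util.function.ToDoubleFunction;

// Hypothetical stand-in for a cluster node that owns some chunks locally.
final class Node {
    final List<double[]> localChunks;
    Node(List<double[]> localChunks) { this.localChunks = localChunks; }

    // The "shipped" code runs here, next to the data; only a double comes back.
    double run(ToDoubleFunction<double[]> perChunk) {
        double partial = 0;
        for (double[] chunk : localChunks) partial += perChunk.applyAsDouble(chunk);
        return partial;
    }
}

final class MoveCode {
    // Driver: send the small function to each node and combine the tiny results.
    static double runOnAll(List<Node> cluster, ToDoubleFunction<double[]> perChunk) {
        double total = 0;
        for (Node n : cluster) total += n.run(perChunk);
        return total;
    }

    // Example per-chunk work: sum the chunk's elements.
    static double sumChunk(double[] chunk) {
        double s = 0;
        for (double d : chunk) s += d;
        return s;
    }
}
```

The function object is a few bytes; the chunks are megabytes. Sending the former and keeping the latter in place is the whole trade.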

[Diagram: a Frame spread across the heaps of JVM 1 to JVM 4]

A Frame: Vec[] (columns: age, sex, zip, ID, car)

• Vecs aligned in heaps
• Optimized for concurrent access
• Random access to any row, on any JVM

A Chunk, Unit of Parallel Access

A season for variable-sized chunks

and a season for uniform chunks. Tightly packed! (A chunk is also the unit of batching!)

9 Chunk-ing Express!
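To make the variable-sized vs. uniform trade-off concrete, here is a hedged sketch of finding the chunk that holds a given row. Uniform chunks need a single division; variable-sized chunks need a binary search over each chunk's starting row. ChunkIndex and its methods are hypothetical names for illustration only.

```java
import java.util.Arrays;

// Sketch: locating the chunk that holds global row `row`.
final class ChunkIndex {
    // Uniform chunks: every chunk holds exactly chunkSize rows (the last may be
    // short), so the lookup is one division.
    static int uniformChunkOf(long row, int chunkSize) {
        return (int) (row / chunkSize);
    }

    // Variable-sized chunks: chunkStarts[k] is the first row of chunk k
    // (ascending, chunkStarts[0] == 0). Find the last start that is <= row.
    static int variableChunkOf(long row, long[] chunkStarts) {
        int i = Arrays.binarySearch(chunkStarts, row);
        // Exact hit: row is the first row of chunk i. Otherwise binarySearch
        // returns -(insertionPoint) - 1, and the row lives in the chunk just
        // before the insertion point.
        return i >= 0 ? i : -i - 2;
    }
}
```

For example, with chunk starts {0, 1000, 2500, 4000}, row 2600 lands in chunk 2: O(log #chunks) per lookup instead of O(1).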

No expensive intermediate states. Fine-grained parallelism wins! >> Fork/Join

8 Reduce early. Reduce Often!

All CPUs grab Chunks in parallel; Map/Reduce and F/J handle all the synchronization.


[Diagram: Vecs spread across the heaps of JVM 1 to JVM 4]
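A hedged sketch of "reduce early, reduce often" on a single JVM, using the standard java.util.concurrent Fork/Join classes: each leaf task maps over one chunk-sized range of a primitive array, and partial results are combined pairwise as soon as both halves finish. SumTask and the chunkSize parameter are hypothetical, not H2O's task classes.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Sketch: Fork/Join map/reduce over a primitive array. No large intermediate
// state is materialized; each task returns a single double.
final class SumTask extends RecursiveTask<Double> {
    private final double[] data;
    private final int lo, hi;       // half-open row range [lo, hi)
    private final int chunkSize;    // leaf size, a hypothetical tuning knob

    SumTask(double[] data, int lo, int hi, int chunkSize) {
        this.data = data; this.lo = lo; this.hi = hi; this.chunkSize = chunkSize;
    }

    @Override
    protected Double compute() {
        if (hi - lo <= chunkSize) {          // leaf: "map" over one chunk
            double s = 0;
            for (int i = lo; i < hi; i++) s += data[i];
            return s;
        }
        int mid = (lo + hi) >>> 1;
        SumTask left = new SumTask(data, lo, mid, chunkSize);
        left.fork();                         // left half may run on another core
        double right = new SumTask(data, mid, hi, chunkSize).compute();
        return left.join() + right;          // reduce early: combine right away
    }

    static double parallelSum(double[] data, int chunkSize) {
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length, chunkSize));
    }
}
```

Forking one half and computing the other in the current thread keeps every worker busy while the reduction tree folds results upward.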

Debugging slow nodes >> heartbeats, messages. Two Generals' Paradox.

7 Slow is not different from Dead
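Because a silent peer cannot be told apart from a dead one, a common approach is to suspect any node whose last heartbeat is older than a timeout. The sketch below is a minimal, single-threaded illustration under that assumption; HeartbeatMonitor and the timeout value are hypothetical, and timestamps are passed in explicitly so the logic is deterministic.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: a timeout-based failure detector. "Slow" and "dead" are treated
// the same way: both show up as a missing heartbeat.
final class HeartbeatMonitor {
    private final long timeoutMillis;
    private final Map<String, Long> lastSeen = new HashMap<>();

    HeartbeatMonitor(long timeoutMillis) { this.timeoutMillis = timeoutMillis; }

    // Record a heartbeat from `node` received at time `nowMillis`.
    void heartbeat(String node, long nowMillis) { lastSeen.put(node, nowMillis); }

    // A node we never heard from, or one silent for too long, is suspect.
    boolean isSuspect(String node, long nowMillis) {
        Long t = lastSeen.get(node);
        return t == null || nowMillis - t > timeoutMillis;
    }
}
```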

An in-memory system is only as good as its memory manager! Lazy eviction. Compress.

Align. Corollary: track down leaks!

6 Memory Manager
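The slide only names the ingredients. As one hedged illustration of lazy eviction (compression and alignment are not shown), here is a byte-budgeted LRU cache built on LinkedHashMap's access order: nothing is evicted until an insert pushes usage over budget. ByteBudgetCache is hypothetical and is not H2O's memory manager.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: lazy LRU eviction against a byte budget.
final class ByteBudgetCache {
    private final long budgetBytes;
    private long usedBytes = 0;
    // accessOrder = true: iteration runs from least- to most-recently used.
    private final LinkedHashMap<String, byte[]> map = new LinkedHashMap<>(16, 0.75f, true);

    ByteBudgetCache(long budgetBytes) { this.budgetBytes = budgetBytes; }

    byte[] get(String key) { return map.get(key); }  // a hit refreshes recency

    void put(String key, byte[] value) {
        byte[] old = map.put(key, value);
        if (old != null) usedBytes -= old.length;
        usedBytes += value.length;
        evictIfNeeded();
    }

    // Lazy: runs only when over budget and stops as soon as we fit again.
    // Never evicts the entry that was just inserted (it is the most recent).
    private void evictIfNeeded() {
        Iterator<Map.Entry<String, byte[]>> it = map.entrySet().iterator();
        while (usedBytes > budgetBytes && map.size() > 1) {
            Map.Entry<String, byte[]> eldest = it.next();
            usedBytes -= eldest.getValue().length;
            it.remove();  // a real system might spill to disk here instead
        }
    }

    long usedBytes() { return usedBytes; }
    boolean contains(String key) { return map.containsKey(key); }  // no recency change
}
```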

Use primitives

5 Memory Overheads

// A Distributed Vector: much more than 2 billion elements
abstract class Vec {
  abstract long length();                 // more than an int's worth
  abstract double at(long idx);           // fast random access: get the idx'th elem
  abstract boolean isNA(long idx);        // is the idx'th elem missing?
  abstract void set(long idx, double d);  // writable
  abstract void append(double d);         // variable sized
}
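A hedged, single-JVM sketch of backing the Vec operations above with primitive double[] chunks. Primitives avoid per-element object overhead (on typical 64-bit JVMs a boxed Double costs well over 8 bytes plus a reference), and chunking lets the long length exceed an int's worth. ChunkedDoubleVec, the chunk size, and encoding NA as NaN are all assumptions of this sketch, not H2O's real encoding.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: a growable vector of primitive doubles stored in fixed-size chunks.
final class ChunkedDoubleVec {
    private static final int CHUNK = 1 << 16;        // elements per chunk (hypothetical)
    private final List<double[]> chunks = new ArrayList<>();
    private long length = 0;

    long length() { return length; }

    double at(long idx) {
        checkIndex(idx);
        return chunks.get((int) (idx / CHUNK))[(int) (idx % CHUNK)];
    }

    // Assumption: NA is encoded as NaN.
    boolean isNA(long idx) { return Double.isNaN(at(idx)); }

    void set(long idx, double d) {
        checkIndex(idx);
        chunks.get((int) (idx / CHUNK))[(int) (idx % CHUNK)] = d;
    }

    void append(double d) {
        if (length % CHUNK == 0) chunks.add(new double[CHUNK]);  // start a new chunk
        chunks.get(chunks.size() - 1)[(int) (length % CHUNK)] = d;
        length++;
    }

    private void checkIndex(long idx) {
        if (idx < 0 || idx >= length) throw new IndexOutOfBoundsException("idx " + idx);
    }
}
```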

Tree size, bin size: recursively divide till the data fits in cache.

4 Cache-Oblivious
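The slide applies "recursively divide till the data fits in cache" to trees and bins. As a swapped-in textbook illustration of the same idea, here is a cache-oblivious out-of-place matrix transpose: it halves the larger dimension until blocks are tiny, so at some recursion level every block fits in cache without the code knowing the cache size. The class name and the LEAF cutoff are hypothetical.

```java
// Sketch: cache-oblivious matrix transpose by recursive halving.
final class Transpose {
    private static final int LEAF = 16;  // small base case to bound recursion overhead

    // Writes the transpose of the rows x cols matrix `a` (row-major) into `b`
    // (row-major, cols x rows).
    static void transpose(double[] a, double[] b, int rows, int cols) {
        rec(a, b, rows, cols, 0, rows, 0, cols);
    }

    private static void rec(double[] a, double[] b, int rows, int cols,
                            int r0, int r1, int c0, int c1) {
        int dr = r1 - r0, dc = c1 - c0;
        if (dr <= LEAF && dc <= LEAF) {          // small block: copy directly
            for (int r = r0; r < r1; r++)
                for (int c = c0; c < c1; c++)
                    b[c * rows + r] = a[r * cols + c];
        } else if (dr >= dc) {                   // split the taller side
            int rm = (r0 + r1) >>> 1;
            rec(a, b, rows, cols, r0, rm, c0, c1);
            rec(a, b, rows, cols, rm, r1, c0, c1);
        } else {                                 // split the wider side
            int cm = (c0 + c1) >>> 1;
            rec(a, b, rows, cols, r0, r1, c0, cm);
            rec(a, b, rows, cols, r0, r1, cm, c1);
        }
    }
}
```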

User-mode reliability: S3 readers will get TCP resets. Mux your connections. Not all toolkits are equal >> JetS3

3 EC2 – Nothing is bounded
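One common way to get user-mode reliability on a flaky network is to retry failed reads with exponential backoff rather than trust the connection or client library to recover. The sketch below assumes the remote read is wrapped in a Callable and that only IOExceptions are transient; Retry.withBackoff is a hypothetical helper, not part of any S3 toolkit.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Sketch: retry a flaky remote operation with exponential backoff.
final class Retry {
    // Calls `op` up to maxAttempts times, sleeping baseDelayMillis, then 2x, 4x, ...
    // between failures; rethrows the last IOException if every attempt fails.
    static <T> T withBackoff(Callable<T> op, int maxAttempts, long baseDelayMillis)
            throws Exception {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        IOException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (IOException e) {  // treated as transient (e.g. a TCP reset)
                last = e;
                if (attempt + 1 < maxAttempts) Thread.sleep(baseDelayMillis << attempt);
            }
        }
        throw last;
    }
}
```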

Non-Blocking Data Structures.

2 No Locks, No Cry

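// VOLATILE READ before key compare.
// CAS: atomically swap the table from oldkvs to newkvs; fails if another
// thread changed _kvs first. _unsafe (a sun.misc.Unsafe) and _kvs_offset
// (the field offset of _kvs) are fields of the enclosing class.
private final boolean CAS_kvs( final Object[] oldkvs, final Object[] newkvs ) {
  return _unsafe.compareAndSwapObject(this, _kvs_offset, oldkvs, newkvs );
}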
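The same compare-and-swap idea as CAS_kvs, sketched with the public java.util.concurrent.atomic API instead of sun.misc.Unsafe: a writer reads the current array, builds a new one, and tries to swing the reference; if another thread won the race, it re-reads and retries, so no thread ever blocks on a lock. CasArray is hypothetical, and copying the whole array per write is for illustration only.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicReference;

// Sketch: lock-free append via a CAS retry loop on an array reference.
final class CasArray {
    private final AtomicReference<long[]> ref = new AtomicReference<>(new long[0]);

    void append(long v) {
        while (true) {
            long[] oldArr = ref.get();                           // volatile read
            long[] newArr = Arrays.copyOf(oldArr, oldArr.length + 1);
            newArr[oldArr.length] = v;
            if (ref.compareAndSet(oldArr, newArr)) return;       // CAS on identity
            // Lost the race: another writer swapped first; retry with fresh state.
        }
    }

    long[] snapshot() { return ref.get(); }  // readers never block either
}
```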

byte[]. Roll-your-own. Fast.

1 Endian wars ended! Keep-It-Simple Serialization.

public AutoBuffer putA1( byte[] ary, int sofar, int length ) {
  // Copy ary[sofar..length) into the ByteBuffer _bb in pieces;
  // whenever _bb fills up, ship what we have so far.
  while( sofar < length ) {
    int len = Math.min(length - sofar, _bb.remaining());
    _bb.put(ary, sofar, len);
    sofar += len;
    if( sofar < length ) sendPartial();
  }
  return this;
}
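A self-contained sketch in the spirit of putA1: a small fixed-size ByteBuffer is filled, and whenever it runs out of room the filled part is flushed ("sendPartial") to a sink. Here the sink is a ByteArrayOutputStream standing in for a socket, and little-endian is an assumed fixed byte order used on both ends. TinyBuffer and its put4 method are hypothetical, not H2O's AutoBuffer.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch: roll-your-own serialization into a fixed buffer with partial flushes.
final class TinyBuffer {
    private final ByteBuffer bb;
    private final ByteArrayOutputStream sink = new ByteArrayOutputStream();

    TinyBuffer(int capacity) {
        bb = ByteBuffer.allocate(capacity).order(ByteOrder.LITTLE_ENDIAN);
    }

    // Copy ary[sofar, length) in pieces, flushing whenever the buffer fills.
    TinyBuffer putA1(byte[] ary, int sofar, int length) {
        while (sofar < length) {
            int len = Math.min(length - sofar, bb.remaining());
            bb.put(ary, sofar, len);
            sofar += len;
            if (sofar < length) sendPartial();
        }
        return this;
    }

    TinyBuffer put4(int x) {
        if (bb.remaining() < 4) sendPartial();
        bb.putInt(x);
        return this;
    }

    // Ship the filled bytes, then reuse the buffer from the start.
    private void sendPartial() {
        bb.flip();
        sink.write(bb.array(), bb.position(), bb.remaining());
        bb.clear();
    }

    byte[] close() { sendPartial(); return sink.toByteArray(); }
}
```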

Data Movement is a Defect. Slowing down helps communication.

Got Speed?

Accuracy rules over speed. Predictive Performance

0 Math always produces a number

Data presentation bias. Sorted data => interesting results

1 Shuffle
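A minimal sketch of shuffling away presentation bias: a Fisher-Yates shuffle of row indices, so that models never see the data in its stored (possibly sorted) order. The fixed seed is an assumption that keeps runs reproducible; Shuffle.shuffledRows is a hypothetical helper.

```java
import java.util.Random;

// Sketch: Fisher-Yates shuffle of the row indices 0..n-1.
final class Shuffle {
    static int[] shuffledRows(int n, long seed) {
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) idx[i] = i;
        Random rng = new Random(seed);
        for (int i = n - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1);      // uniform in [0, i]
            int tmp = idx[i]; idx[i] = idx[j]; idx[j] = tmp;
        }
        return idx;
    }
}
```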

2 Random acts of Kindness?

3 Convex Problems: ADMM
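ADMM fits distributed convex problems because, in its standard global-consensus form, each of N nodes solves a local subproblem on its own data and only model-sized vectors are averaged. For minimizing a sum of per-node losses f_i with penalty ρ > 0 and scaled dual variables u_i, the iteration is:

```latex
\begin{aligned}
x_i^{k+1} &= \arg\min_{x_i}\; f_i(x_i) + \frac{\rho}{2}\left\lVert x_i - z^k + u_i^k \right\rVert_2^2, \qquad i = 1,\dots,N \\
z^{k+1} &= \frac{1}{N}\sum_{i=1}^{N}\left(x_i^{k+1} + u_i^k\right) \\
u_i^{k+1} &= u_i^k + x_i^{k+1} - z^{k+1}
\end{aligned}
```

The x-updates run in parallel where the data lives; only the x_i and u_i vectors travel to compute z.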

Matrix operations: JAMA, jblas... all single-node. A distributed version needs data transfer!

4 Amdahl strikes: Cholesky / QR Decomposition
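Amdahl's law quantifies why a serial step hurts: if a fraction p of the work parallelizes perfectly over n workers and the rest (for example a single-node Cholesky or QR step) stays serial, the speedup is 1 / ((1 - p) + p / n). A minimal sketch:

```java
// Sketch: Amdahl's law for parallel fraction p over n workers.
final class Amdahl {
    static double speedup(double p, int n) {
        return 1.0 / ((1.0 - p) + p / n);
    }
}
```

With p = 0.95 and n = 16, the speedup is 64/7, about 9.1, and no number of workers gets past 1 / (1 - p) = 20.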

Embarrassingly parallel: binning, tree building, splits

5 Random Forests
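As one illustration of embarrassingly parallel binning (not H2O's implementation), here is a sketch of building a fixed-width histogram for one numeric column, the kind of per-column summary a tree builder can use to pick split points. Each worker thread fills a private histogram, and partial histograms are summed at the end. Binning.histogram is a hypothetical helper.

```java
import java.util.stream.IntStream;

// Sketch: parallel fixed-width histogram of one column with per-thread partials.
final class Binning {
    static long[] histogram(double[] col, double min, double max, int nbins) {
        double width = (max - min) / nbins;
        return IntStream.range(0, col.length).parallel().collect(
            () -> new long[nbins],                              // per-thread histogram
            (h, i) -> {
                int b = (int) ((col[i] - min) / width);
                h[Math.min(Math.max(b, 0), nbins - 1)]++;        // clamp edge values
            },
            (h1, h2) -> { for (int b = 0; b < nbins; b++) h1[b] += h2[b]; });  // merge
    }
}
```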

Iterate & stage weak learners => strong learners. Each tree can be built in parallel; minimize communication.

6 Boosting
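A hedged sketch of "iterate & stage weak learners": least-squares gradient boosting with depth-1 trees (stumps) on a single feature. Stages are inherently sequential, since each fits the residuals left by the previous ones, while the per-stage scan over rows is the data-parallel part (kept serial here for brevity). StumpBoost, the stump learner, and the learning rate are choices made for this sketch.

```java
import java.util.Arrays;

// Sketch: gradient boosting with stumps for squared error on one feature.
final class StumpBoost {
    final double[] thresholds, leftVals, rightVals;
    final double base, rate;

    private StumpBoost(double base, double rate, int stages) {
        this.base = base; this.rate = rate;
        thresholds = new double[stages]; leftVals = new double[stages]; rightVals = new double[stages];
    }

    static StumpBoost fit(double[] x, double[] y, int stages, double rate) {
        int n = x.length;
        double mean = 0;
        for (double v : y) mean += v;
        mean /= n;
        StumpBoost m = new StumpBoost(mean, rate, stages);
        double[] pred = new double[n];
        Arrays.fill(pred, mean);                         // stage 0: predict the mean
        for (int s = 0; s < stages; s++) {
            double bestErr = Double.POSITIVE_INFINITY;
            for (int c = 0; c < n; c++) {                // candidate split: x <= x[c]
                double sl = 0, sr = 0; int nl = 0, nr = 0;
                for (int i = 0; i < n; i++) {
                    double r = y[i] - pred[i];           // residual = negative gradient
                    if (x[i] <= x[c]) { sl += r; nl++; } else { sr += r; nr++; }
                }
                double ml = nl > 0 ? sl / nl : 0, mr = nr > 0 ? sr / nr : 0;
                double err = 0;
                for (int i = 0; i < n; i++) {
                    double d = (y[i] - pred[i]) - (x[i] <= x[c] ? ml : mr);
                    err += d * d;
                }
                if (err < bestErr) {
                    bestErr = err; m.thresholds[s] = x[c]; m.leftVals[s] = ml; m.rightVals[s] = mr;
                }
            }
            for (int i = 0; i < n; i++)                  // stage the new weak learner
                pred[i] += rate * (x[i] <= m.thresholds[s] ? m.leftVals[s] : m.rightVals[s]);
        }
        return m;
    }

    double predict(double xv) {
        double p = base;
        for (int s = 0; s < thresholds.length; s++)
            p += rate * (xv <= thresholds[s] ? leftVals[s] : rightVals[s]);
        return p;
    }
}
```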

Embarrassingly parallel: pre-calculate base stats, distance calculations. Weight matrices: small footprint.

7 Neural Nets & Clustering
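The distance-calculation step of k-means clustering shows why this is embarrassingly parallel: every row is assigned to its nearest centroid independently, so rows can be split across cores freely while only the small centroid matrix is shared. A minimal sketch; NearestCentroid is a hypothetical helper.

```java
import java.util.stream.IntStream;

// Sketch: parallel nearest-centroid assignment (the k-means "E" step).
final class NearestCentroid {
    static int[] assign(double[][] rows, double[][] centroids) {
        return IntStream.range(0, rows.length).parallel().map(i -> {
            int best = 0;
            double bestD = Double.POSITIVE_INFINITY;
            for (int k = 0; k < centroids.length; k++) {
                double d = 0;                            // squared Euclidean distance
                for (int j = 0; j < rows[i].length; j++) {
                    double diff = rows[i][j] - centroids[k][j];
                    d += diff * diff;
                }
                if (d < bestD) { bestD = d; best = k; }
            }
            return best;
        }).toArray();                                    // keeps row order
    }
}
```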

Daisy-chain a bunch of models. Interleave. JIT: minimize loops over the data.

8 Ensembles
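A minimal sketch of minimizing loops over the data when scoring an ensemble: instead of one pass over all rows per model, each row is visited once and fed to every model, and the predictions are averaged. Treating a model as a plain function from a row to a score is an assumption made for brevity; OnePassEnsemble is hypothetical.

```java
import java.util.List;
import java.util.function.ToDoubleFunction;

// Sketch: score every model in a single pass over the rows.
final class OnePassEnsemble {
    static double[] score(double[][] rows, List<ToDoubleFunction<double[]>> models) {
        double[] out = new double[rows.length];
        for (int i = 0; i < rows.length; i++) {          // one loop over the data...
            double sum = 0;
            for (ToDoubleFunction<double[]> m : models)  // ...all models per row
                sum += m.applyAsDouble(rows[i]);
            out[i] = sum / models.size();                // simple average
        }
        return out;
    }
}
```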

Deterministic versions first! Got Pen & Paper? Optimize often. Test Big Data soon.

9 Tools

Replacing NAs improves predictive performance by about 10%!

- Newton

Munging missing features: impute NAs with the mean, impute NAs with kNN, impute with recursive PCA!

- Boyd
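The simplest of the three strategies, mean imputation, can be sketched in two passes over a column: compute the mean of the non-missing values, then fill the NAs with it. Encoding NA as NaN is an assumption of this sketch, and kNN and recursive-PCA imputation are not shown.

```java
// Sketch: mean imputation for one column, NA encoded as NaN.
final class Impute {
    static double[] meanImpute(double[] col) {
        double sum = 0;
        long n = 0;
        for (double v : col) if (!Double.isNaN(v)) { sum += v; n++; }
        double mean = n > 0 ? sum / n : Double.NaN;      // an all-NA column stays NA
        double[] out = col.clone();
        for (int i = 0; i < out.length; i++) if (Double.isNaN(out[i])) out[i] = mean;
        return out;
    }
}
```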

Unbalanced data, single rare class: Fraud / No-Fraud!

Stratify

Unbalanced data, multiple rare classes: Browse, Click, Purchase!

Stratify
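A hedged sketch of stratification for unbalanced data: rows are grouped by class label and each group is split with the same ratio, so a rare class such as "Fraud" appears in the held-out set in its original proportion rather than by luck. Stratify.testRows and its parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Sketch: stratified train/test split; returns the row indices chosen for test.
final class Stratify {
    static List<Integer> testRows(String[] labels, double testFraction, long seed) {
        Map<String, List<Integer>> byClass = new HashMap<>();
        for (int i = 0; i < labels.length; i++)
            byClass.computeIfAbsent(labels[i], k -> new ArrayList<>()).add(i);
        Random rng = new Random(seed);
        List<Integer> test = new ArrayList<>();
        for (List<Integer> rows : byClass.values()) {
            Collections.shuffle(rows, rng);                  // random within each class
            int k = (int) Math.round(rows.size() * testFraction);
            test.addAll(rows.subList(0, k));                 // same ratio per class
        }
        return test;
    }
}
```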

Use customer data. Algorithms for sparse vs. dense, unbalanced data. Robustness under noise.

10 Data is the System

[Diagram, Before H2O: the workflow split across a Data Scientist, an Engineer, and a Business Analyst. Volume: HDFS, HIVE/SQL. Velocity: events, online scoring. Exploration: munging, slice 'n dice, features. Modeling: classification, regression, clustering, optimal model. Offline scoring; ensemble models; low latency. Applications, predictions, rule engine.]

Big Data beats Better Algorithms!

Big Data and Better Algorithms! Scale & Parallelism!


0xdata.com

Distributed Coding Taxonomy

• No Distribution Coding:
  • Whole algorithms, whole vector-math
  • REST + JSON: e.g. load data, GLM, get results

• Simple Data-Parallel Coding:
  • Per-row (or neighbor-row) math
  • Map/Reduce-style: e.g. any dense linear algebra

• Complex Data-Parallel Coding:
  • K/V store, graph algorithms, e.g. PageRank


Read the docs!

This talk!

Join our GIT!


Distributed Data Taxonomy

• Frame: a collection of Vecs
• Vec: a collection of Chunks
• Chunk: a collection of 1e3 to 1e6 elems
• elem: a Java double
• Row i: the i'th elements of all the Vecs in a Frame

Use Cases

• Conversion, Retention & Churn
• Lead Conversion
• Engagement
• Product Placement
• Recommendations
• Pricing Engine
• Fraud Detection
