Learn Faster: High-Performance Machine Learning on GPU Clusters

Page 1: Learn Faster: High-Performance Machine Learning on GPU Clusters

Learn Faster: High-Performance Machine Learning on GPU Clusters

Peter Wittek

September 26, 2012

Page 2: Learn Faster: High-Performance Machine Learning on GPU Clusters

Machine Learning

What Machine Learning Is Not

- It is not statistics
  - Data-driven
  - Strict assumptions on underlying distributions

- It is not AI
  - Model-driven
  - Uncertainty is addressed

- It is not data mining
  - Although there is a considerable overlap

Page 3: Learn Faster: High-Performance Machine Learning on GPU Clusters

Machine Learning

What Machine Learning Should Be About

- Data-driven
- Looking for patterns
- Classes, groups of similar objects
- Mainly quantitative, but can also be qualitative

- Robust, tolerates noise
- Generalize well beyond training data

Page 4: Learn Faster: High-Performance Machine Learning on GPU Clusters

Machine Learning

Characteristics

- Loose collection of algorithms
- No common ground

- Few assumptions
- Parameters can be a major obstacle
- Computationally intensive

- Not easy to parallelize
  - N:N access patterns are common
  - Or N:K through a proxy

Page 5: Learn Faster: High-Performance Machine Learning on GPU Clusters

Machine Learning

Nature-Inspired Methods

- Many nature-inspired methods
- Computational Intelligence
- Neural networks, flocking algorithms, genetic algorithms, chemical reactions, etc.
- Also methods inspired by quantum mechanics
- Others: manifold learning, density-based clustering, support vector machines, etc.

Page 6: Learn Faster: High-Performance Machine Learning on GPU Clusters

Machine Learning

Learning Approach

- Supervised
  - Biomedical: recognizing cancer cells
  - Recognizing handwriting
  - Spam detection

- Unsupervised
  - Recommendation engines
  - Finding groups of similar patents
  - Identifying trends in a dynamic environment

Page 7: Learn Faster: High-Performance Machine Learning on GPU Clusters

Machine Learning

Ensembles

Page 8: Learn Faster: High-Performance Machine Learning on GPU Clusters

High-Performance Machine Learning

Why Do We Need It?

- Petabytes of data
  - Sparse, noisy, might be missing elements

- There should be as few assumptions as possible

- Large scale may not entail a need for quick learning methods

Page 9: Learn Faster: High-Performance Machine Learning on GPU Clusters

High-Performance Machine Learning

A Case Study: Digital Preservation

- Adding advanced services to digital libraries
- Cloud paradigm is important
- Overview of the SHAMAN core infrastructure:

Page 10: Learn Faster: High-Performance Machine Learning on GPU Clusters

High-Performance Machine Learning

Examples

- An ensemble of unsupervised methods:
  - Distributed indexing (not on GPUs)
  - Dimensionality reduction
  - Visualization of clusters

- A supervised classifier

Page 11: Learn Faster: High-Performance Machine Learning on GPU Clusters

High-Performance Machine Learning

Dimensionality Reduction: Random Projection

- Johnson-Lindenstrauss lemma (1984)
- Latent Semantic Analysis
- CPU: Incremental approach
- GPU: 14.5x slow-down

$$ A_{w \times d} \, R_{d \times k} = A'_{w \times k} $$
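The projection A R = A' can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation behind the slides: the Gaussian choice for R, the 1/sqrt(k) scaling, the function name `random_projection`, and the matrix sizes are all assumptions made for the example.

```python
import numpy as np

def random_projection(A, k, seed=0):
    """Project the rows of A (w x d) onto k dimensions: A' = A R."""
    rng = np.random.default_rng(seed)
    d = A.shape[1]
    # Assumed choice: Gaussian entries scaled by 1/sqrt(k), so that squared
    # distances are preserved in expectation.
    R = rng.standard_normal((d, k)) / np.sqrt(k)
    return A @ R

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 2000))   # hypothetical sizes: w = 50 rows, d = 2000 columns
A_proj = random_projection(A, k=500)

# Compare one pairwise distance before and after the projection.
orig_dist = np.linalg.norm(A[0] - A[1])
proj_dist = np.linalg.norm(A_proj[0] - A_proj[1])
ratio = proj_dist / orig_dist
print(A_proj.shape, ratio)
```

The ratio should stay close to 1, which is the distance-preservation property the Johnson-Lindenstrauss lemma describes.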

Page 12: Learn Faster: High-Performance Machine Learning on GPU Clusters

High-Performance Machine Learning

Dimensionality Reduction: Random Projection

- Dead end: MapReduce
- MPI and CuSparse
- Very irregular, sparse matrix (1 % nonzero)

GPUs                          2      4      8      16
Subset     With I/O           2.34   3.46   4.53   5.38
Subset     Projection only    2.02   4.05   8.10   16.45

All terms (one value each): With I/O 5.10, Projection only 19.37

Page 13: Learn Faster: High-Performance Machine Learning on GPU Clusters

High-Performance Machine Learning

Visualization: Self-Organizing Maps

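$$ w_j(t+1) = w_j(t) + \alpha \, h_{bj}(t) \, [x(t) - w_j(t)] $$

$$ h_{bj} = \exp\left( -\frac{\| r_b - r_j \|}{\delta(t)} \right) $$

Batch formulation

$$ w_j(t_f) = \frac{\sum_{t'=t_0}^{t_f} h_{bj}(t') \, x(t')}{\sum_{t'=t_0}^{t_f} h_{bj}(t')} $$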
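The online rule and the batch formulation above can be compared side by side in a short NumPy sketch. The 3x3 grid, the fixed values of alpha and delta, and the function names are assumptions made for illustration; the batch version keeps the weights fixed while it searches for best matching units over one epoch.

```python
import numpy as np

def neighborhood(grid, b, delta):
    """h_bj = exp(-||r_b - r_j|| / delta) for every node j, given BMU index b."""
    return np.exp(-np.linalg.norm(grid - grid[b], axis=1) / delta)

def online_update(W, grid, x, alpha, delta):
    """One online step: w_j <- w_j + alpha * h_bj * (x - w_j)."""
    b = np.argmin(np.linalg.norm(W - x, axis=1))   # best matching unit
    h = neighborhood(grid, b, delta)
    return W + alpha * h[:, None] * (x - W)

def batch_update(W, grid, X, delta):
    """One batch epoch: w_j = sum_t h_bj(t) x(t) / sum_t h_bj(t)."""
    num = np.zeros_like(W)
    den = np.zeros(len(W))
    for x in X:
        b = np.argmin(np.linalg.norm(W - x, axis=1))  # BMU found with the old weights
        h = neighborhood(grid, b, delta)
        num += h[:, None] * x
        den += h
    return num / den[:, None]

rng = np.random.default_rng(0)
grid = np.array([(i, j) for i in range(3) for j in range(3)], dtype=float)  # 3x3 map
W = rng.random((9, 4))     # one 4-dimensional weight vector per node
X = rng.random((20, 4))    # 20 training vectors

W_online = online_update(W, grid, X[0], alpha=0.5, delta=1.0)
W_batch = batch_update(W, grid, X, delta=1.0)
```

Because the batch update only accumulates weighted sums, each data partition can compute its own numerator and denominator before a final reduction, which is what makes this form attractive for distributed hardware.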

Video

Page 14: Learn Faster: High-Performance Machine Learning on GPU Clusters

High-Performance Machine Learning

Visualization: Self-Organizing Maps

Critical operation: finding best matching unit

$$ d(w_j(t_0), x(t)) = \sqrt{ \sum_{i=1}^{N} \left( x_i(t) - w_{ji}(t_0) \right)^2 } $$

Multi-step reduction to find the minimum

1: $v_1 = (X \circ X)[1, 1, \ldots, 1]'$

2: $v_2 = (W \circ W)[1, 1, \ldots, 1]'$

3: $P_1 = [v_1 \; v_1 \; \ldots \; v_1]$

4: $P_2 = [v_2 \; v_2 \; \ldots \; v_2]$

5: $P_3 = X W'$

6: $D = (P_1 + P_2 - 2 P_3)$
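The six steps map directly onto dense matrix operations, which is why they suit a GPU. A minimal NumPy sketch follows; it assumes X holds one data vector per row and W one weight vector per row, uses broadcasting in place of the explicit copies in P1 and P2, and returns squared distances (sufficient for finding the minimum). The function names are made up for the example.

```python
import numpy as np

def squared_distances(X, W):
    """Squared Euclidean distance between every row of X and every row of W."""
    v1 = (X * X).sum(axis=1)       # step 1: row sums of X ∘ X
    v2 = (W * W).sum(axis=1)       # step 2: row sums of W ∘ W
    P1 = v1[:, None]               # step 3: v1 repeated across columns (by broadcasting)
    P2 = v2[None, :]               # step 4: v2 repeated across rows (by broadcasting)
    P3 = X @ W.T                   # step 5: all inner products in one matrix product
    return P1 + P2 - 2.0 * P3      # step 6

def best_matching_units(X, W):
    """Index of the closest weight vector for each data vector."""
    return np.argmin(squared_distances(X, W), axis=1)

rng = np.random.default_rng(0)
X = rng.random((100, 16))   # hypothetical: 100 data vectors of dimension 16
W = rng.random((25, 16))    # hypothetical: 25 map nodes
D = squared_distances(X, W)
bmu = best_matching_units(X, W)
```

The square root is omitted because it does not change which node attains the minimum.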

Page 15: Learn Faster: High-Performance Machine Learning on GPU Clusters

High-Performance Machine Learning

Visualization: Self-Organizing Maps

GPUs                    2      4      8      16
Subset     With I/O     8.57   7.49   6.48   4.85
Subset     One epoch    9.68   9.42   9.75   9.56

All terms (one value each): With I/O 8.69, One epoch 9.68

Page 16: Learn Faster: High-Performance Machine Learning on GPU Clusters

High-Performance Machine Learning

Classification: Support Vector Machines

$$ w'\phi(x_i) + b \ge 1 - \xi_i \quad \text{if } y_i = +1, $$

$$ w'\phi(x_i) + b \le -1 + \xi_i \quad \text{if } y_i = -1, $$

- Making a problem linearly separable after embedding into a feature space by a nonlinear map $\phi$.
- Minimize $\frac{1}{2}\|w\|^2 + C \sum_i \xi_i$
- Solve the dual with the Gram matrix $K(x_i, x_j) = \phi(x_i)\phi(x_j)'$.
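Computing the Gram matrix is the dense, regular part of the workload. The sketch below builds one with NumPy under an assumption: the slides do not name a kernel, so a Gaussian (RBF) kernel with a hypothetical width parameter `gamma` stands in for the implicit map φ.

```python
import numpy as np

def rbf_gram(X, gamma):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2) (assumed RBF kernel)."""
    sq = (X * X).sum(axis=1)
    # Pairwise squared distances from norms and one matrix product.
    D = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.maximum(D, 0.0, out=D)      # clip tiny negative values caused by rounding
    return np.exp(-gamma * D)

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 5))   # hypothetical: 40 training vectors, 5 features
K = rbf_gram(X, gamma=0.5)
min_eig = np.linalg.eigvalsh(K).min()
print(K.shape, min_eig)
```

The matrix is symmetric with ones on the diagonal and is positive semidefinite up to rounding, which is what a dual solver expects from a valid kernel.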

Page 17: Learn Faster: High-Performance Machine Learning on GPU Clusters

High-Performance Machine Learning

Classification: Support Vector Machines

[Figure: SVM workflow diagram. Labels: Training Data; Kernel Matrix Calculation (GPU); SVM Parameter Selection; Cross Validation; N-fold Validation; Different Sets of Parameters; SVM Model Creation; SVM Model]

Around 10x speedup

Page 18: Learn Faster: High-Performance Machine Learning on GPU Clusters

Quantum-Inspired Methods

Why Is Quantum Mechanics Relevant?

- Contextual probability
  - $p(A \cap B) \neq p(B \cap A)$
  - If an event A happens, it implies a context

- Robust and naturally fuzzy
- Quantum probability and quantum logic: same linear algebra framework
- Bonus: HPC acceleration for free
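A two-dimensional example shows how order dependence arises from plain linear algebra. The sketch reads p(A ∩ B) as "A is observed, then B", which is an interpretation assumed here; the state and the two projectors are chosen only for illustration.

```python
import numpy as np

psi = np.array([1.0, 0.0])                   # system prepared in state |0>
P_A = np.array([[1.0, 0.0], [0.0, 0.0]])     # event A: projector onto |0>
plus = np.array([1.0, 1.0]) / np.sqrt(2.0)
P_B = np.outer(plus, plus)                   # event B: projector onto |+>

def sequential_probability(first, second, state):
    """Probability that `first` and then `second` both occur for the given state."""
    return float(np.linalg.norm(second @ first @ state) ** 2)

p_A_then_B = sequential_probability(P_A, P_B, psi)   # 0.5
p_B_then_A = sequential_probability(P_B, P_A, psi)   # 0.25
print(p_A_then_B, p_B_then_A)
```

The two probabilities differ because the projectors do not commute; the first event changes the state, and so sets the context, for the second.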

Page 19: Learn Faster: High-Performance Machine Learning on GPU Clusters

Quantum-Inspired Methods

Dynamic Quantum Clustering

- Semi-classical method
- Ehrenfest's theorem
- Evolves the Hamiltonian of a quantum system:

$$ H\psi(x, t) = (T + V(x))\,\psi(x, t) $$

Page 20: Learn Faster: High-Performance Machine Learning on GPU Clusters

Quantum-Inspired Methods

Dynamic Quantum Clustering

Speedups are impressive; square root of matrix below

[Figure: speedup versus matrix size (64 to 8192), plotted without and with memory transfer; speedup axis from 0 to 90]

Page 21: Learn Faster: High-Performance Machine Learning on GPU Clusters

Quantum-Inspired Methods

There Is More to It

- Trotter-Suzuki Algorithm
- Avoids eigendecomposition
- Linear scaling tested up to 64 GPUs
- Speedup over SSE and cache optimized CPU variant: 4-8x
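The idea of evolving a state without an eigendecomposition can be illustrated on a small lattice. The sketch below is a toy under stated assumptions, not the code benchmarked here: it uses a 1D chain with a nearest-neighbour hopping term of strength J standing in for the kinetic part, a harmonic potential V, and a second-order symmetric splitting. Each hopping bond is exponentiated in closed form as a 2x2 rotation; the dense `eigh` call appears only as a reference to check the result.

```python
import numpy as np

def apply_bonds(psi, start, c, s):
    """Apply exp(-i dt K) for the bonds (start, start+1), (start+2, start+3), ...

    Each bond Hamiltonian is -J * [[0, 1], [1, 0]], so its exponential is
    cos(J dt) * I + i sin(J dt) * X, passed in as c and s.
    """
    out = psi.copy()
    a, b = psi[start:-1:2], psi[start + 1::2]
    out[start:-1:2] = c * a + 1j * s * b
    out[start + 1::2] = 1j * s * a + c * b
    return out

def trotter_step(psi, V, J, dt):
    """One symmetric step: V/2, even/2, odd, even/2, V/2."""
    half_phase = np.exp(-0.5j * dt * V)
    ch, sh = np.cos(0.5 * J * dt), np.sin(0.5 * J * dt)
    c, s = np.cos(J * dt), np.sin(J * dt)
    psi = half_phase * psi
    psi = apply_bonds(psi, 0, ch, sh)
    psi = apply_bonds(psi, 1, c, s)
    psi = apply_bonds(psi, 0, ch, sh)
    return half_phase * psi

n, J, dt, steps = 8, 1.0, 0.01, 100          # hypothetical toy parameters
x = np.linspace(-1.0, 1.0, n)
V = 0.5 * x**2
psi0 = np.exp(-4.0 * (x - 0.3) ** 2).astype(complex)
psi0 /= np.linalg.norm(psi0)

psi = psi0.copy()
for _ in range(steps):
    psi = trotter_step(psi, V, J, dt)

# Reference only: exact evolution through a dense eigendecomposition.
H = np.diag(V) - J * (np.eye(n, k=1) + np.eye(n, k=-1))
w, U = np.linalg.eigh(H)
psi_exact = U @ (np.exp(-1j * w * dt * steps) * (U.conj().T @ psi0))
error = np.linalg.norm(psi - psi_exact)
print(np.linalg.norm(psi), error)
```

Every step touches only neighbouring lattice sites, so the lattice can be cut into tiles with a thin halo exchanged between nodes, which is consistent with the scaling behaviour listed above.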

[Figure: time (s) versus number of nodes (1 to 32) for the cpu, sse, cuda, and hybrid variants; time axis from 0 to 30]

Going beyond current HPC: Machine learning based on actual quantum computers

Page 22: Learn Faster: High-Performance Machine Learning on GPU Clusters

Quantum-Inspired Methods

Summary

- ML is about data and patterns
- Blend of algorithms
- Ensembles

- AI?
- Parallel and distributed computing with challenges

- Large-scale versus HPC
- Towards a common ground: quantum-inspired methods

- Bonus: HPC with little effort