Parallel Tuning of Machine Learning Algorithms, Thesis Proposal


Page 1: Parallel Tuning of Machine Learning Algorithms, Thesis Proposal

Parallel auto-tuning of machine learning algorithms Gianmario Spacagna [email protected]

16 October 2012

(877) 769-3047 (408) 404-0152 fax [email protected]

AgilOne, Inc. 1091 N Shoreline Blvd. #250 Mountain View, CA 94043

Page 2

Motivation

• Increase the revenue of cloud service providers → keep the cost curve linear w.r.t. the expected exponential income growth.
• Technically achievable through scalability:
  • Scalability in terms of resources → distributed parallel computing (Hadoop).
  • Scalability in terms of multi-tenancy → the same system running for several customers.
  • Scalability in terms of auto-configuration → avoiding manual tuning operations.

[Chart: income and cost curves]

Page 3

Good Work Flow

Good data → ML algorithm → good results!

Tuning (adjusting the configuration) feeds back into the ML algorithm.

Page 4

General Tuning Diagram

Test data → run the algorithm with configuration X → are the results good?
• yes → tuned.
• no → change configuration X and run again.
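The loop in the diagram can be sketched in Scala (the language the project plan targets). The names and signatures below are illustrative assumptions, not the actual system's API:

```scala
// Illustrative sketch of the general tuning loop; all names are assumptions.
object TuningLoop {
  type Config = Map[String, Double]

  def run(
      initial: Config,
      evaluate: Config => Double,    // runs the algorithm on test data, returns a score
      isGood: Double => Boolean,     // "are results good?"
      nextConfig: Config => Config,  // "change configuration X"
      maxIter: Int = 100): Config = {
    var conf = initial
    var iter = 0
    while (!isGood(evaluate(conf)) && iter < maxIter) {
      conf = nextConfig(conf)        // propose the next configuration
      iter += 1
    }
    conf                             // "tuned" (or iteration budget exhausted)
  }
}
```

For example, a toy run that increments a parameter until a score threshold is met terminates after a few iterations.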

Page 5

Tuning of Machine Learning Algorithms

• We need tuning when:
  • A new algorithm or version is released.
  • We want to improve accuracy and/or performance.
  • A new customer comes and the system must be customized for the new dataset and requirements.

We need to make it smart, automatic and scalable!

Page 6

Vision

Magic Box

Request:
• Data set
• Application (prediction, clustering, classification, ...)
• Algorithm (ANN, LR, K-means, ...)
• Fitness metrics (std. dev., prob. of false positives, clustering coeff., randomness, ...)
• Goal constraints (x > 0.9 & 0.3 < y < 0.5)

Response:
• Best algorithm
• Optimal configuration
• Metrics evaluation
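As a sketch, this request/response pair could be modeled as Scala domain objects; every field name here is an illustrative assumption, not the system's real API:

```scala
// Hypothetical domain model for the "magic box" request and response.
case class TuningRequest(
    dataset: String,                                // identifier of the data set
    application: String,                            // prediction, clustering, classification, ...
    algorithms: Seq[String],                        // candidate algorithms: ANN, LR, K-means, ...
    fitnessMetrics: Seq[String],                    // e.g. std. dev., clustering coefficient
    goalConstraints: Map[String, (Double, Double)]) // metric -> (min, max) bounds

case class TuningResponse(
    bestAlgorithm: String,
    optimalConfiguration: Map[String, Double],
    metricsEvaluation: Map[String, Double])
```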

Page 7

Architecture Design

[Architecture diagram: Upper Applications API, Initializer, Controller, Scheduler, and one Executor per algorithm (ANN, LR, K-Means); each Executor is paired with an Evaluator and a Data Sampler, and Executors run either locally or on a Hadoop cloud service.]

Page 8

Upper Applications API

Tasks:
• Interfaces between the system and the upper applications layer.
• Parses requests and results and generates the related output domain objects.

Possible data formats:
• JSON
• STDIN/OUT

Page 9

Initializer

Tasks:
• Generates the initial set of configurations.

Possible implementations:
• Random points
• Latin Hypercube
• Dataset similarity
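Of these, Latin Hypercube sampling is easy to sketch: n samples in d dimensions such that, per dimension, the unit interval is split into n equal strata and every stratum is hit exactly once. A minimal illustration (not the project's actual implementation):

```scala
import scala.util.Random

// Latin Hypercube sampling sketch: n points in [0, 1)^dims with
// one point per stratum per dimension.
object LatinHypercube {
  def sample(n: Int, dims: Int, rng: Random = new Random(42)): Seq[Seq[Double]] = {
    // For each dimension, an independent permutation of the n strata.
    val perms = Seq.fill(dims)(rng.shuffle((0 until n).toVector))
    (0 until n).map { i =>
      (0 until dims).map { d =>
        // Uniform point inside stratum perms(d)(i), each stratum of width 1/n.
        (perms(d)(i) + rng.nextDouble()) / n
      }
    }
  }
}
```

Compared with purely random points, this guarantees coverage of every parameter's range even for small initial sets.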

Page 10

Controller

Tasks:
• Compares and generates configurations.
• Decides the convergence of the tuning.
• Adapts the data sampling request.

Possible implementations:
• Random search
• Grid search
• Stochastic Kriging
• Genetic Algorithms
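As a baseline, the random-search variant of the controller fits in a few lines. The fitness signature and all names below are assumptions for illustration only:

```scala
import scala.util.Random

// Random-search controller sketch: sample configurations uniformly
// within the given parameter ranges and keep the best one seen.
object RandomSearchController {
  def tune(
      ranges: Map[String, (Double, Double)],  // parameter -> (min, max)
      fitness: Map[String, Double] => Double, // higher is better
      budget: Int,
      rng: Random = new Random(0)): (Map[String, Double], Double) = {
    val candidates = Seq.fill(budget) {
      ranges.map { case (p, (lo, hi)) => p -> (lo + rng.nextDouble() * (hi - lo)) }
    }
    candidates.map(c => (c, fitness(c))).maxBy(_._2)
  }
}
```

Grid search, Genetic Algorithms, or Stochastic Kriging would plug into the same interface, differing only in how candidates are proposed.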

Page 11

Scheduler

Tasks:
• Checks whether the requests are covered by the available services.
• Schedules and parallelizes request executions.
• Optimizes resources.
• Collects evaluated results.

Possible implementations:
• First available
• Oldest idle
• Load balanced
• Serialized (single node)
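For instance, the load-balanced policy can be sketched as a greedy assignment of each request (given an estimated duration) to the least-loaded executor. This is an illustrative model, not the system's scheduler:

```scala
import scala.collection.mutable

// Load-balanced scheduling sketch: each request goes to the executor
// with the least accumulated load; ties favor the first executor listed.
object LoadBalancedScheduler {
  def schedule(durations: Seq[Double],
               executors: Seq[String]): Map[String, Seq[Double]] = {
    val load = mutable.Map(executors.map(_ -> 0.0): _*)
    val plan = mutable.Map(executors.map(_ -> Vector.empty[Double]): _*)
    durations.foreach { d =>
      val target = executors.minBy(load) // least-loaded executor so far
      load(target) += d
      plan(target) :+= d
    }
    plan.toMap
  }
}
```

"First available" and "oldest idle" would replace only the `minBy` selection rule.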

Page 12

Executor

Tasks:
• Executes the provided algorithm with the specified configuration.

Sub-components:
• Evaluator: evaluates results according to the specified fitness metrics.
• Data Sampler: down- and up-sampling of data.

Possible implementations:
• Local execution
• Hadoop cluster
• Cloud service
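A sketch of the Evaluator sub-component: compute the requested fitness metrics over a result vector and check them against (min, max) goal constraints. The metric names and formulas here are illustrative assumptions:

```scala
// Evaluator sketch: named fitness metrics plus a goal-constraint check.
object Evaluator {
  val metrics: Map[String, Seq[Double] => Double] = Map(
    "mean"   -> (xs => xs.sum / xs.size),
    "stddev" -> { xs =>
      val m = xs.sum / xs.size
      math.sqrt(xs.map(x => math.pow(x - m, 2)).sum / xs.size)
    }
  )

  // Compute only the metrics the request asked for.
  def evaluate(results: Seq[Double], requested: Seq[String]): Map[String, Double] =
    requested.map(name => name -> metrics(name)(results)).toMap

  // True iff every constrained metric falls within its (min, max) bounds.
  def satisfies(evaluation: Map[String, Double],
                constraints: Map[String, (Double, Double)]): Boolean =
    constraints.forall { case (name, (lo, hi)) =>
      evaluation.get(name).exists(v => lo <= v && v <= hi)
    }
}
```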

Page 13

Tuning Diagram

Test data → run the algorithm with configuration X → are the results good?
• yes → tuned.
• no → change configuration X and run again.

Test execution is handled by the Scheduler and Executor; test control by the Initializer and Controller.

Page 14

SUNS: Simple, Unclever and Not Scalable
• Initializer: random points
• API format: STDIN/OUT
• Controller: random search or grid search
• Scheduler: serialized
• Executor: K-Means, local

Page 15

SNS: Smart but Not Scalable
• Initializer: Latin Hypercube
• API format: STDIN/OUT or JSON
• Controller: Genetic Algorithm or Stochastic Kriging
• Scheduler: serialized
• Executor: K-Means, local

Page 16

VSNS: Very Smart but Not Scalable
• Initializer: dataset similarity
• API format: STDIN/OUT or JSON
• Controller: Genetic Algorithm or Stochastic Kriging
• Scheduler: serialized
• Executor: K-Means, local

Page 17

VSS: Very Smart and Scalable
• Initializer: dataset similarity
• API format: STDIN/OUT or JSON
• Controller: Genetic Algorithm or Stochastic Kriging
• Scheduler: first available
• Executor: K-Means, Hadoop

Page 18

VSVSO: Very Smart, Very Scalable and Optimized
• Initializer: dataset similarity
• API format: STDIN/OUT or JSON
• Controller: Genetic Algorithm or Stochastic Kriging
• Scheduler: load balanced
• Executor: K-Means, Hadoop, with Data Sampler

Page 19

Thesis

It is possible to build an intelligent system based on Genetic Algorithms/Stochastic Kriging that automatically selects and tunes machine learning algorithms, such as K-Means and LR, parallelizing the work on a Hadoop cluster to scale in a cost-efficient manner.

Page 20

Project Plan

Order of priorities:
1. Design the entire application in Scala in a testable and expandable way.
2. Implement the Genetic Algorithm or the Stochastic Kriging controller.
3. Implement the Latin Hypercube initializer.
4. Test with local-instance algorithms (K-Means and/or LR).
5. Develop and test at least one algorithm in MapReduce fashion using Hadoop.
6. Test with a real AgilOne cluster of servers.
7. Implement the Dataset Similarity initializer.
8. Implement the Data Sampler.

Page 21

Questions, feedback, suggestions?

Page 22

Thank you!
