Scala for Machine Learning

Scala for Machine Learning Patrick Nicolas December 2014 patricknicolas.blogspot.com www.slideshare.net/pnicolas

Upload: patrick-nicolas

Post on 28-Jul-2015


Category:

Data & Analytics



TRANSCRIPT

Page 1: Scala for Machine Learning

Scala for Machine Learning

Patrick Nicolas

December 2014

patricknicolas.blogspot.com

www.slideshare.net/pnicolas

Page 2: Scala for Machine Learning

What challenges?

Building scientific and machine learning applications requires ...

1. Clearly defined abstractions

2. Flexible, dynamic models

3. Scalable execution

... and may involve mathematicians, data scientists, software engineers and DevOps.

What makes Scala particularly suitable for solving machine learning and optimization problems?

Page 3: Scala for Machine Learning

Scala tool box

Which elements in the Scala tool box are useful to meet these challenges?

Actors

Composed futures

F-bound

Reactive

Page 4: Scala for Machine Learning

Abstraction

Non-linear learning models <= functorial tensors

Kernel monadic composition <= monads

Extending library types <= implicits

Flexibility

Scalability

Page 5: Scala for Machine Learning

Low-dimensional feature space (manifold) embedded in an observation space (Euclidean)

Abstraction: Non-linear learning models

Page 6: Scala for Machine Learning

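Tensors

Each type of tensor is a category, associated with a functor category.

• Field: $f(x, y, z)$

• Vector field (contravariant): $\nabla f = \frac{\partial f}{\partial x}\,\mathbf{i} + \frac{\partial f}{\partial y}\,\mathbf{j} + \frac{\partial f}{\partial z}\,\mathbf{k}$

• Inner product: $\langle v, w \rangle = f$

• Covariant vector field (one-form/map): $\alpha(w) = \langle v, w \rangle$

• Tensor product, exterior product, …: $T^m_n \otimes T^p_q$, $dx^i \wedge dx^j$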

Tensor fields are geometric entities that define linear relations between vector fields, differential forms, scalars and other tensor fields.

Abstraction: Non-linear learning models

Page 7: Scala for Machine Learning

Machine learning consists of identifying a low-dimensional feature space (a manifold) within a Euclidean observation space. Computation on smooth manifolds relies on tensors and tensor metrics (Riemann, Laplace-Beltrami, …).

Problem: How to represent tensors and metrics?

Solution: Functorial representation of tensors, tensor products and differential forms.

Abstraction: Non-linear learning models

Page 8: Scala for Machine Learning

One option is to define a vector field as a collection (e.g. List) and leverage the functor for the list.

Functor: f: U => V is lifted to F(f): F(U) => F(V)

Abstraction: Non-linear learning models

Convenient but incorrect…
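A minimal sketch of this convenient option, assuming a hand-rolled Functor type class (the trait and the names listFunctor, field and scaled are illustrative and do not come from the slides). It lifts a function over a list of components, which is the representation the slide calls incorrect for tensors:

```scala
object ListFunctorSketch {
  // Minimal functor type class: lifts f: U => V to F(f): F[U] => F[V]
  trait Functor[F[_]] {
    def map[U, V](fu: F[U])(f: U => V): F[V]
  }

  // Functor instance for List, delegating to the standard List.map
  val listFunctor: Functor[List] = new Functor[List] {
    def map[U, V](fu: List[U])(f: U => V): List[V] = fu.map(f)
  }

  // A "vector field" naively represented as a list of components
  val field: List[Double] = List(1.0, 2.0, 3.0)

  // Scaling every component through the functor
  val scaled: List[Double] = listFunctor.map(field)(_ * 2.0)
}
```

The list functor only transforms stored components. It carries no notion of covariance or contravariance, which is what the following slides address.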

Page 9: Scala for Machine Learning

Let’s define generic vector field and covector field types

Abstraction: Non-linear learning models

Define a tensor as a higher-kinded type that is either a vector or a covector, accessed through type projection.

Page 10: Scala for Machine Learning

The functor for the vector field relies on the projection (Hom functor) of the two-argument type constructor Tensor onto its covariant and contravariant types.

Covariant functor: f: U => V is lifted to F(f): F(U) => F(V)
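The slide's code is not part of the transcript, so the following is only a sketch under stated assumptions: Tensor is modeled as a trait with two type members (VField and CoVField, the latter named on the next slide), Point is a hypothetical coordinate type, and the functor instance is obtained by type projection on the covariant member.

```scala
object VFieldFunctorSketch {
  trait Functor[F[_]] {
    def map[U, V](fu: F[U])(f: U => V): F[V]
  }

  // Hypothetical tensor type for a fixed coordinate type T
  trait Tensor[T] {
    type VField[X] = T => X    // X in covariant (result) position
    type CoVField[X] = X => T  // X in contravariant (argument) position
  }

  // Assumed coordinate type for the example
  type Point = Vector[Double]

  // Covariant functor on the projection Tensor[Point]#VField
  val vFieldFunctor: Functor[Tensor[Point]#VField] =
    new Functor[Tensor[Point]#VField] {
      def map[U, V](vu: Point => U)(f: U => V): Point => V = vu.andThen(f)
    }

  // A field returning the Euclidean norm of a point
  val norm: Point => Double = p => math.sqrt(p.map(x => x * x).sum)

  // Lifting a scaling function over the field
  val doubled: Point => Double = vFieldFunctor.map[Double, Double](norm)(_ * 2.0)
}
```

Mapping composes the function after the field, so the type parameter moves in the same direction as f.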

Abstraction: Non-linear learning models

Page 11: Scala for Machine Learning

Contravariant functors are used for morphisms or transformations on covariant tensors (type CoVField).

Contravariant functor: f: U => V is lifted to F(f): F(V) => F(U)
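A sketch of the contravariant case under the same assumptions as before: ContraFunctor, contramap and the Tensor trait are illustrative names, and only CoVField is taken from the slide. The function is composed before the covector, which reverses the direction of the mapping.

```scala
object CoVFieldFunctorSketch {
  // Contravariant functor: f: U => V is lifted to F(f): F[V] => F[U]
  trait ContraFunctor[F[_]] {
    def contramap[U, V](fv: F[V])(f: U => V): F[U]
  }

  // Hypothetical tensor type exposing only the covector member
  trait Tensor[T] {
    type CoVField[X] = X => T
  }

  val coVFieldFunctor: ContraFunctor[Tensor[Double]#CoVField] =
    new ContraFunctor[Tensor[Double]#CoVField] {
      def contramap[U, V](fv: V => Double)(f: U => V): U => Double = f.andThen(fv)
    }

  // A covector on Vector[Double]: the sum of the components
  val sum: Vector[Double] => Double = _.sum

  // A morphism from List[Int] to Vector[Double]
  val fromInts: List[Int] => Vector[Double] = _.map(_.toDouble).toVector

  // The covector is pulled back along fromInts
  val sumInts: List[Int] => Double =
    coVFieldFunctor.contramap[List[Int], Vector[Double]](sum)(fromInts)
}
```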

Abstraction: Non-linear learning models

Page 12: Scala for Machine Learning

Product Functor

Tensor metrics and products require other types of functors …

BiFunctor

(*) Cats framework, started by Erik Osheim: https://github.com/non/cats

Abstraction: Non-linear learning models

Page 13: Scala for Machine Learning

Abstraction: Kernel monadic composition

Clustering or classifying observations entails computing the inner product of observations on the manifold.

Kernel functions are commonly used in training to separate classes of observations with a linear decision boundary (hyperplane).

Problem: Building a model entails creating, composing and evaluating numerous kernels.

Solution: Define kernels as a first-class programming concept with monadic operations.

Page 14: Scala for Machine Learning

Define a kernel function as the composition of two functions, g o h:

$\mathcal{K}_f(\mathbf{x}, \mathbf{y}) = g\left(\sum_i h(x_i, y_i)\right)$

Abstraction: Kernel monadic composition

We create a monad to generate any kind of kernel function Kf by composing its components g: g1 o g2 o … o gn o h

Page 15: Scala for Machine Learning

A monad extends a functor with a binding method (flatMap)

The monadic implementation of the kernel function component h
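The slide's implementation is not in the transcript. The sketch below is one possible reading, assuming a case class KF that holds the outer component g and the inner component h, with map and flatMap acting on g only. The names KF, F1, F2, eval, linear, squared and shifted are hypothetical, and no claim is made that the monad laws were verified.

```scala
object KernelMonadSketch {
  type F1 = Double => Double
  type F2 = (Double, Double) => Double

  // K(x, y) = g(sum_i h(x_i, y_i)); G is the type of the outer component g
  final case class KF[G](g: G, h: F2) {
    // map transforms the outer component and keeps h
    def map[H](f: G => H): KF[H] = KF(f(g), h)

    // flatMap (bind) keeps h and takes the outer component of the kernel built by f
    def flatMap[H](f: G => KF[H]): KF[H] = KF(f(g).g, h)

    // Evaluates the kernel; only available when g is a Double => Double
    def eval(x: Seq[Double], y: Seq[Double])(implicit ev: G <:< F1): Double =
      ev(g)(x.zip(y).map { case (a, b) => h(a, b) }.sum)
  }

  // Linear kernel: h multiplies the components, g is the identity
  val linear: KF[F1] = KF[F1](s => s, (a, b) => a * b)

  // Composing g with another function through map
  val squared: KF[F1] = linear.map(g => g.andThen(s => s * s))

  // The same kind of composition written as a for-comprehension
  val shifted: KF[F1] = for {
    g1 <- linear
    g2 <- KF[F1](s => s + 1.0, linear.h)
  } yield g1.andThen(g2)
}
```

With x = (1, 2) and y = (3, 4) the inner sum is 11, so linear, squared and shifted evaluate to 11, 121 and 12 respectively.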

Abstraction: Kernel functions composition

Page 16: Scala for Machine Learning

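Declaration of explicit kernel functions

Radial basis function kernel: $\mathcal{K}(\mathbf{x}, \mathbf{y}) = e^{-\frac{1}{2}\left(\frac{\lVert \mathbf{x} - \mathbf{y} \rVert}{\sigma}\right)^2}$, with h: $(x, y) \to x - y$ and g: $x \to e^{-\frac{1}{2\sigma^2} x^2}$

Polynomial kernel: $\mathcal{K}(\mathbf{x}, \mathbf{y}) = (1 + \mathbf{x} \cdot \mathbf{y})^d$, with h: $(x, y) \to x \cdot y$ and g: $x \to (1 + x)^d$

Abstraction: Kernel functions composition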
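The two kernels can be evaluated with the generic form g(Σ h(x_i, y_i)), as in the sketch below. One adjustment is an assumption of this example and not the slide's wording: for the radial basis function, h returns the squared difference and g is exp(-s / 2σ²), so that the sum over components reproduces the squared norm exactly. The names kernel, rbf and polynomial are illustrative.

```scala
object ExplicitKernelsSketch {
  // Generic form K(x, y) = g(sum_i h(x_i, y_i))
  def kernel(g: Double => Double, h: (Double, Double) => Double)
            (x: Seq[Double], y: Seq[Double]): Double =
    g(x.zip(y).map { case (a, b) => h(a, b) }.sum)

  // Radial basis function kernel with h(x, y) = (x - y)^2 (assumption, see lead-in)
  def rbf(sigma: Double): (Seq[Double], Seq[Double]) => Double =
    (x, y) => kernel(s => math.exp(-s / (2.0 * sigma * sigma)),
                     (a, b) => (a - b) * (a - b))(x, y)

  // Polynomial kernel of degree d with h(x, y) = x * y and g(s) = (1 + s)^d
  def polynomial(d: Int): (Seq[Double], Seq[Double]) => Double =
    (x, y) => kernel(s => math.pow(1.0 + s, d), (a, b) => a * b)(x, y)
}
```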

Page 17: Scala for Machine Learning

Our monad is ready to compose any kind of explicit kernel on demand, using a for-comprehension

Abstraction: Kernel functions composition

Page 18: Scala for Machine Learning

Notes

• Monads quite often define filtering capabilities (e.g. Scala collections).

• Incidentally, the for-comprehension can also be used to create dynamic workflows

Abstraction: Kernel functions composition

Page 19: Scala for Machine Learning

Abstraction: Extending library types

Scala library classes cannot always be sub-classed. Wrapping a library component in a helper class clutters the design.

Implicit classes extend the functionality of classes without cluttering name spaces (an alternative to type classes).

The purpose of reusability goes beyond refactoring code. It includes leveraging existing, well-understood concepts and semantics.

Page 20: Scala for Machine Learning

A data flow micro-router for successful and failed computations: Try is transformed into Either with recovery and processing functions

scala.util.Try[T]

recover[U >: T](f: PartialFunction[Throwable, U]): Try[U]

getOrElse[U >: T](default: => U): U

orElse[U >: T](default: => Try[U]): Try[U]

toEither[U](rec: () => U)(f: T => T): Either[U, T]

Abstraction: Extending library types

Page 21: Scala for Machine Learning

... as applied to a normalization problem.

Four lines of Scala code extend Try with the Either concept.
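The code itself is missing from the transcript. Here is a sketch of what such an extension could look like, following the toEither signature listed on the previous slide. The method is named toEitherWith here (a hypothetical name) because Try gained a parameterless toEither of its own in Scala 2.12. The normalize function and its error message are likewise assumptions.

```scala
import scala.util.{Failure, Success, Try}

object TryExtensionSketch {
  // Implicit class adding a Try-to-Either conversion with recovery and processing
  implicit class TryToEither[T](t: Try[T]) {
    def toEitherWith[U](rec: () => U)(f: T => T): Either[U, T] = t match {
      case Success(res) => Right(f(res)) // successful path: process the result
      case Failure(_)   => Left(rec())   // failed path: run the recovery function
    }
  }

  // Hypothetical normalization to [0, 1]; fails on empty or constant input
  def normalize(xs: Seq[Double]): Either[String, Seq[Double]] =
    Try {
      val (lo, hi) = (xs.min, xs.max)
      require(hi > lo, "constant series")
      xs.map(x => (x - lo) / (hi - lo))
    }.toEitherWith(() => "normalization failed")(identity)
}
```

Callers can then route the two outcomes with a plain pattern match or fold on the Either.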

Abstraction: Extending library types

Page 22: Scala for Machine Learning

Notes

Abstraction: Extending library types

• Type conversions such as toDouble and toFloat can be extended to deal with rounding errors or rendering precision

• Creating a type class is a more generic (appropriate?) way to extend the functionality of a closed model or framework. Is there a reason why Try in the Scala standard library does not support conversion to Either?

Page 23: Scala for Machine Learning

Abstraction

Non-linear learning models <= functorial tensors

Kernel monadic composition <= monads

Extending library types <= implicits

Flexibility

Modeling <= Stackable traits

Scalability

Page 24: Scala for Machine Learning

Flexibility: modeling

Building machine learning apps requires configurable, dynamic workflows that preserve the model formalism

Leverage mixins, inheritance and abstract values to create models and weave data transformations.

Factory design patterns have been used to model dynamic systems (GoF). Are they adequate to model dynamic workflows?

Page 25: Scala for Machine Learning

Flexibility: modeling

Traditional programming languages compare unfavorably with scientific languages such as R because of their inability to follow a strict mathematical formalism:

1. Variable declaration

2. Model definition

3. Instantiation

Scala stacked traits and abstract values preserve the core formalism of mathematical expressions.

Page 26: Scala for Machine Learning

Declaration: $f \in \mathbb{R}^n \to \mathbb{R}^n$, $g \in \mathbb{R}^n \to \mathbb{R}$

Model: $h = g \circ f$

Instantiation: $f(x) = e^x$, $g(\mathbf{x}) = \sum_i x_i$
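The three steps map naturally onto traits with abstract values. The sketch below is one way to write it, with hypothetical names F, G and H: the traits declare f and g, the class defines h = g o f through a self-type, and the anonymous instance binds f(x) = e^x and g(x) = Σ x_i.

```scala
object FormalismSketch {
  // Declaration: f in R^n -> R^n and g in R^n -> R, left abstract
  trait F { val f: Vector[Double] => Vector[Double] }
  trait G { val g: Vector[Double] => Double }

  // Model: h = g o f, defined before f and g are known
  class H { self: F with G =>
    def apply(x: Vector[Double]): Double = g(f(x))
  }

  // Instantiation: f(x) = e^x (component-wise) and g(x) = sum of the components
  val h = new H with F with G {
    val f: Vector[Double] => Vector[Double] = _.map(math.exp)
    val g: Vector[Double] => Double = _.sum
  }
}
```

The model H compiles without any concrete f or g, which mirrors the order of a mathematical definition.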

Flexibility: modeling

Page 27: Scala for Machine Learning

Multiple models and algorithms are typically evaluated by weaving computation tasks.

A learning platform is a framework that

• Defines computational tasks

• Wires the tasks (data flow)

• Deploys the tasks (*)

It overcomes the limitation of monadic composition (3 levels of dynamic binding…)

(*) Actor-based deployment

Flexibility: modeling

Page 28: Scala for Machine Learning

Even the simplest workflow (a model of data transformation) requires flexibility ...

Flexibility: modeling

Page 29: Scala for Machine Learning

Data scientists should be able to

1. Given the objective of the computation, select the best sequence of modules/tasks (e.g. Modeling: Preprocessing + Training + Validating)

2. Given the profile of the input data, select the best data transformation for each module (e.g. Data preprocessing: Kalman, DFT, Moving average, …)

3. Given the computing platform, select the best implementation for each data transformation (e.g. Kalman: KalmanOnAkka, Spark, …)

Flexibility: modeling

Page 30: Scala for Machine Learning

Implementation of Preprocessing module

Flexibility: modeling

Page 31: Scala for Machine Learning

Implementation of Preprocessing module using discrete Fourier

… and discrete Kalman filter
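The module code is not included in the transcript. The sketch below shows the general shape such a module could take, assuming a trait with an abstract preprocessor value and nested implementations. To stay short it substitutes a moving average (one of the preprocessing options listed earlier) and a pass-through for the DFT and Kalman filters of the slides; all names are hypothetical.

```scala
object PreprocessingModuleSketch {
  trait PreprocessingModule {
    // Abstract value, bound when the workflow is assembled
    val preprocessor: Preprocessor

    // Common interface of all the preprocessing algorithms
    trait Preprocessor {
      def execute(xt: Vector[Double]): Vector[Double]
    }

    // Stand-in for the DFT and Kalman filters of the slides
    final class MovingAverage(period: Int) extends Preprocessor {
      require(period > 0, "period must be positive")
      def execute(xt: Vector[Double]): Vector[Double] =
        xt.sliding(period).map(w => w.sum / w.size).toVector
    }

    // Pass-through implementation, useful for comparing workflows
    final class Identity extends Preprocessor {
      def execute(xt: Vector[Double]): Vector[Double] = xt
    }
  }

  // The data transformation is selected when the module is instantiated
  val module = new PreprocessingModule {
    val preprocessor: Preprocessor = new MovingAverage(2)
  }

  val smoothed: Vector[Double] = module.preprocessor.execute(Vector(1.0, 3.0, 5.0))
}
```

Swapping the algorithm only changes the right-hand side of preprocessor; the rest of the workflow is untouched.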

Flexibility: modeling

Page 32: Scala for Machine Learning

[Diagram: workflow stages (Loading, Preprocessing, Reducing, Training, Validating), module types (Preprocessor, Reducer, Supervisor) and implementations (DFTFilter, Kalman, EM, PCA, SVM, MLP), grouped into Clustering and Modeling workflows]

Clustering workflow = preprocessing task -> reducing task

Modeling workflow = preprocessing task -> model training task -> model validation

Flexibility: modeling

Page 33: Scala for Machine Learning

A simple clustering workflow requires a preprocessor and a reducer. The computation sequence exec transforms a time series of elements of type U and returns a time series of type W as an option
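A sketch of such a workflow, assuming the modules are reduced to abstract function values to keep the example short. The names TS, PreprocessingModule, ReducingModule and Clustering, the intermediate type V and the mean-based reducer are hypothetical; only exec and the types U and W come from the slide.

```scala
object ClusteringWorkflowSketch {
  type TS[T] = Vector[T]

  // Each module declares its task as an abstract value
  trait PreprocessingModule[U, V] { val preprocessor: TS[U] => Option[TS[V]] }
  trait ReducingModule[V, W] { val reducer: TS[V] => Option[TS[W]] }

  // Clustering workflow = preprocessing task -> reducing task
  class Clustering[U, V, W] {
    self: PreprocessingModule[U, V] with ReducingModule[V, W] =>

    def exec(xt: TS[U]): Option[TS[W]] = preprocessor(xt).flatMap(reducer)
  }

  // The tasks are wired when the workflow is instantiated
  val clustering = new Clustering[Int, Double, Double]
      with PreprocessingModule[Int, Double]
      with ReducingModule[Double, Double] {
    // Converts the raw values, rejecting an empty series
    val preprocessor: TS[Int] => Option[TS[Double]] =
      xt => if (xt.isEmpty) None else Some(xt.map(_.toDouble))
    // Hypothetical reducer: collapses the series to its mean
    val reducer: TS[Double] => Option[TS[Double]] =
      xt => Some(Vector(xt.sum / xt.size))
  }
}
```

Returning an Option lets a failed task short-circuit the rest of the sequence.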

Flexibility: modeling

Page 34: Scala for Machine Learning

A model is created by processing the original time series of type TS[T] through a preprocessor, a training supervisor and a validator

Flexibility: modeling

Page 35: Scala for Machine Learning

Putting it all together for a conditional execution path …

Flexibility: modeling


Page 36: Scala for Machine Learning

Abstraction

Non-linear learning models <= functorial tensors

Kernel monadic composition <= monads

Extending library types <= implicits

Flexibility

Modeling <= Stackable traits

Scalability

Dynamic programming <= tail recursion

Online processing <= streams

Data flow control <= back-pressure strategy

Page 37: Scala for Machine Learning

Scalability: dynamic programming

Many machine learning algorithms (HMM, RL, EM, MLP, …) rely on dynamic programming techniques.

Tail recursion is a very efficient solution because it avoids the creation of new stack frames.

Choosing between iterative and recursive implementation of algorithms is a well-documented dilemma.

Page 38: Scala for Machine Learning

Viterbi algorithm for hidden Markov Models

The objective is to find the most likely sequence of states {qt} given a set of observations {Ot} and a λ-model

Scalability: dynamic programming

Page 39: Scala for Machine Learning

The algorithm recurses along the observations with N different states.
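The slide's code is not in the transcript. The sketch below is a generic tail-recursive Viterbi decoder, not the author's implementation. It assumes a model given as plain vectors (pi for initial probabilities, a for transitions, b for emissions) and observations encoded as column indices of b. It multiplies raw probabilities, so a log-space variant would be needed for long sequences.

```scala
import scala.annotation.tailrec

object ViterbiSketch {
  // Returns the most likely sequence of state indices for the observations obs
  def viterbi(pi: Vector[Double], a: Vector[Vector[Double]],
              b: Vector[Vector[Double]], obs: Vector[Int]): Vector[Int] = {
    val n = pi.size

    // delta(j): probability of the best path ending in state j at step t - 1
    // psi: back pointers, most recent step first
    @tailrec
    def recurse(t: Int, delta: Vector[Double],
                psi: List[Vector[Int]]): (Vector[Double], List[Vector[Int]]) =
      if (t == obs.size) (delta, psi)
      else {
        val step = Vector.tabulate(n) { j =>
          // Best predecessor i of state j
          val (best, arg) = (0 until n).map(i => (delta(i) * a(i)(j), i)).maxBy(_._1)
          (best * b(j)(obs(t)), arg)
        }
        recurse(t + 1, step.map(_._1), step.map(_._2) :: psi)
      }

    if (obs.isEmpty) Vector.empty
    else {
      val delta0 = Vector.tabulate(n)(i => pi(i) * b(i)(obs.head))
      val (delta, psi) = recurse(1, delta0, Nil)
      val last = delta.indexOf(delta.max)
      // Follow the back pointers from the last state to the first
      psi.foldLeft(List(last)) { (path, back) => back(path.head) :: path }.toVector
    }
  }
}
```

The @tailrec annotation makes the compiler reject the method if the recursive call is not in tail position, so the loop is guaranteed to run in constant stack space.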

Scalability: dynamic programming

Page 40: Scala for Machine Learning

Relative performance of the recursion w/o tail elimination for the Viterbi algorithm given the number of observations

Scalability: dynamic programming

Page 41: Scala for Machine Learning

Scalability: online processing

Some problems require processing very large data sets of unknown size, for which the execution may have to be aborted or re-applied.

Streams reduce memory consumption by allocating and releasing chunks of data (slices of a time series) while allowing reuse of intermediate results.

An increasing number of algorithms, such as reinforcement learning, rely on online (or on-demand) training.

Page 42: Scala for Machine Learning

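[Diagram: a data stream X0, X1, ..., Xn, ..., Xm is traversed slice by slice. Each slice is allocated on the heap with .take, used to evaluate the loss function, then released to the garbage collector with .drop.]

Loss function: $\frac{1}{2m}\sum_{n}\left(y_n - f(\mathbf{w}|x_n)\right)^2 + \lambda\lVert\mathbf{w}\rVert^2$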

Scalability: online processing

The large data set is converted into a stream, then broken down into manageable slices. The slices are instantiated, processed (e.g. by the loss function) and released back to the garbage collector, one at a time

Page 43: Scala for Machine Learning

Slices of NOBS observations are allocated (.take), processed, then released (.drop), one at a time.
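A sketch of this traversal, with several assumptions: it uses LazyList (the successor of Stream since Scala 2.13), a one-parameter linear model f(w|x) = w * x, and the hypothetical names Obs, loss and traverse. It illustrates the take/drop pattern only. The stream cells themselves can be reclaimed only if nothing else holds a reference to the head of the stream, which is the issue discussed on the next slide.

```scala
import scala.annotation.tailrec

object StreamingLossSketch {
  final case class Obs(x: Double, y: Double)

  // Regularized squared-error loss, evaluated nObs observations at a time
  def loss(data: LazyList[Obs], w: Double, lambda: Double, nObs: Int): Double = {
    require(nObs > 0, "slice size must be positive")

    @tailrec
    def traverse(rest: LazyList[Obs], sum: Double, m: Long): (Double, Long) =
      if (rest.isEmpty) (sum, m)
      else {
        // Allocate one slice of at most nObs observations
        val slice = rest.take(nObs).toVector
        val err = slice.map { o => val e = o.y - w * o.x; e * e }.sum
        // Move on; the slice is unreachable after this call
        traverse(rest.drop(nObs), sum + err, m + slice.size)
      }

    val (sum, m) = traverse(data, 0.0, 0L)
    val mse = if (m == 0L) 0.0 else sum / (2.0 * m)
    mse + lambda * w * w
  }
}
```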

Scalability: online processing

Page 44: Scala for Machine Learning

The reference streamRef has to be weak, in order to have the slices garbage collected. Otherwise the memory consumption increases with each new batch of data.

(*) Alternatives: define strmRef as a def or use StreamIterator

Scalability: online processing

Page 45: Scala for Machine Learning

Comparing list, stream and stream with weak references.

Scalability: online processing

Operating zone

Page 46: Scala for Machine Learning

Notes:

Iterators:

• Computation cannot be memoized (“Iterators are the imperative version of streams”)

• One element at a time

• Non-recursive (tail elimination)

Views:

• No intermediate results preserved

• One element at a time

Stream iterators:

• Lazy tails

Scalability: online processing

Page 47: Scala for Machine Learning

Executing a workflow may create a stream bottleneck at slow tasks and overflow local buffers.

A flow control mechanism handles back-pressure on the bounded mailboxes of upstream actors.

Actors provide a very efficient and reliable way to deploy workflows and tasks over a large number of cores and hosts.

Scalability: flow control

Page 48: Scala for Machine Learning

Scalability: flow control

An actor-based workflow has to consider

• Cascading failures => supervision strategy

• Cascading bottlenecks => mailbox back-pressure strategy

Workers

Router, Dispatcher, …

Akka has a reliable mechanism to handle failures. What about temporary disruptions?

Page 49: Scala for Machine Learning

Scalability: flow control

Message-passing scheme to process various data streams with transformations.

[Diagram: Dataset, Controller, Workers (with bounded mailboxes) and Watcher exchanging the messages Load, Compute, GetStatus, Status and Completed]

Page 50: Scala for Machine Learning

Worker actors process the data chunk msg.xt sent by the Controller with the transformation msg.fct

Message sent by the collector to trigger the computation

Scalability: flow control

Page 51: Scala for Machine Learning

The Watcher actor monitors the message queues and reports to the collector with a Status message.

The GetStatus message sent by the collector has no payload

Scalability: flow control

Page 52: Scala for Machine Learning

The Controller creates the workers, a bounded mailbox for each worker actor (msgQueues) and the watcher actor.

Scalability: flow control

Page 53: Scala for Machine Learning

The Controller loads the data sets chunk by chunk upon receiving the Load message from the main program. It processes the results of the computation from the workers (Completed) and throttles the input to the workers for each Status message.

Scalability: flow control

Page 54: Scala for Machine Learning

The handling of the Load message is implemented as a loop that creates data chunks whose size is adjusted according to the load computed by the watcher and forwarded to the controller in a Status message

Scalability: flow control

Page 55: Scala for Machine Learning

A simple throttle increases or decreases the size of the batch of observations given the current load and a specified watermark.
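A sketch of such a throttle as a pure function, under assumptions not stated on the slide: the load is a mailbox occupancy ratio in [0, 1], the watermark is split into a low and a high threshold, the batch size is halved or doubled, and minSize and maxSize bound the result. All names and default values are hypothetical.

```scala
object ThrottleSketch {
  // Returns the next batch size given the load reported by the watcher
  def throttle(batchSize: Int, load: Double, low: Double, high: Double,
               minSize: Int = 8, maxSize: Int = 1024): Int =
    if (load > high) math.max(minSize, batchSize / 2)     // mailboxes filling up: slow down
    else if (load < low) math.min(maxSize, batchSize * 2) // spare capacity: speed up
    else batchSize                                        // within the operating zone
}
```

Keeping the rule free of actor state makes it easy to test and to replace with a smoother regulation strategy.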

Scalability: flow control

Selecting a faster or slower (and correspondingly less or more accurate) version of an algorithm can also be part of the regulation strategy

Page 56: Scala for Machine Learning

The feedback control loop adjusts the size of the batches given the load in the mailboxes and the complexity of the computation

Scalability: flow control

Page 57: Scala for Machine Learning

Notes

• The feedback control loop should be smoothed (moving average, Kalman, …)

• A larger variety of data flow control actions is possible, such as adding more workers, increasing queue capacity, …

• The watchdog should handle dead letters in case of a failure of the feedback control or of the workers.

• Reactive streams, introduced in Akka 2.2+, have sophisticated TCP-based propagation and back-pressure control flows
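The smoothing mentioned in the first note could be as simple as an exponential moving average over the load samples. The sketch below is an assumption, since the notes name a moving average and a Kalman filter without showing code; smooth and alpha are hypothetical names.

```scala
object SmoothingSketch {
  // Exponential moving average: s(0) = x(0), s(t) = alpha * x(t) + (1 - alpha) * s(t - 1)
  def smooth(loads: Seq[Double], alpha: Double): Seq[Double] =
    if (loads.isEmpty) Seq.empty
    else loads.tail.scanLeft(loads.head)((prev, x) => alpha * x + (1.0 - alpha) * prev)
}
```

A smaller alpha reacts more slowly to load spikes, which keeps the batch size from oscillating.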

Scalability: flow control

Page 58: Scala for Machine Learning

… and there is more

There are many other Scala language constructs I find particularly intriguing as far as machine learning is concerned …

Reactive streams (TCP): effective fault-tolerance and flow control mechanism

Domain Specific Language: emulate the ‘R’ language for scientists to use the application

Delimited continuation: save, restore and reuse computation states

Page 59: Scala for Machine Learning

Donate to Apache software and Eclipse foundations

References

Monads are Elephants, J. Iry – james-iry.blogspot.com/2007/10/monads-are-elephans-part2.html

Extending the Cake pattern: Dependency injection in Scala, A. Warski – www.warski.org/blog/2010/12/di-in-scala-cake-pattern

Programming in Scala, §12.5 Traits as stackable modifications, M. Odersky, L. Spoon, B. Venners – Artima 2008

Introducing Akka, J. Boner – Typesafe 2012, www.slideshare.net/jboner/introducing-akka

Scala for Machine Learning, §1 Getting started, P. Nicolas – Packt Publishing 2014

Exploring Akka Stream’s TCP Back Pressure, U. Peter – Xebia 2015, blog.xebia.com/2015/01/14/exploring-akka-streams-tcp-back-pressure/

Cats functional library, E. Osheim – https://github.com/non/cats