machine learning with matlab - … · 2 agenda machine learning overview machine learning with...

34
1 © 2013 The MathWorks, Inc. Machine Learning with MATLAB U. M. Sundar Senior Application Engineer MathWorks India

Upload: hatuyen

Post on 30-Aug-2018

288 views

Category:

Documents


7 download

TRANSCRIPT

Page 1: Machine Learning with MATLAB - … · 2 Agenda Machine Learning Overview Machine Learning with MATLAB –Unsupervised Learning Clustering –Supervised Learning Classification

1 © 2013 The MathWorks, Inc.

Machine Learning with MATLAB

U. M. Sundar

Senior Application Engineer

MathWorks India

Page 2: Machine Learning with MATLAB - … · 2 Agenda Machine Learning Overview Machine Learning with MATLAB –Unsupervised Learning Clustering –Supervised Learning Classification

2

Agenda

Machine Learning Overview

Machine Learning with MATLAB

– Unsupervised Learning

Clustering

– Supervised Learning

Classification

Regression

Learn More

-0.1 0 0.1 0.2 0.3 0.4 0.5 0.60

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Group1

Group2

Group3

Group4

Group5

Group6

Group7

Group8

-0.1 0 0.1 0.2 0.3 0.4 0.5 0.60

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Page 3: Machine Learning with MATLAB - … · 2 Agenda Machine Learning Overview Machine Learning with MATLAB –Unsupervised Learning Clustering –Supervised Learning Classification

3

Machine Learning Overview Types of Learning, Categories of Algorithms

Machine

Learning

Supervised

Learning

Classification

Regression

Unsupervised

Learning Clustering

Group and interpret data based only

on input data

Develop predictive model based on both input and output data

Type of Learning Categories of Algorithms

Page 4: Machine Learning with MATLAB - … · 2 Agenda Machine Learning Overview Machine Learning with MATLAB –Unsupervised Learning Clustering –Supervised Learning Classification

4

Machine Learning When and where it is used?

When to use it

– Predict a future outcome based on

Historical data (many variables)

Specific patterns

– Define a System that is

Based on inputs and outputs from the system

complex to define using governing equations (e.g., black-box modeling)

Examples

– Pattern recognition (speech, images)

– Financial algorithms (credit scoring, algo trading)

– Energy forecasting (load, price)

– Biology (tumor detection, drug discovery)

93.68%

2.44%

0.14%

0.03%

0.03%

0.00%

0.00%

0.00%

5.55%

92.60%

4.18%

0.23%

0.12%

0.00%

0.00%

0.00%

0.59%

4.03%

91.02%

7.49%

0.73%

0.11%

0.00%

0.00%

0.18%

0.73%

3.90%

87.86%

8.27%

0.82%

0.37%

0.00%

0.00%

0.15%

0.60%

3.78%

86.74%

9.64%

1.84%

0.00%

0.00%

0.00%

0.08%

0.39%

3.28%

85.37%

6.24%

0.00%

0.00%

0.00%

0.00%

0.06%

0.18%

2.41%

81.88%

0.00%

0.00%

0.06%

0.08%

0.16%

0.64%

1.64%

9.67%

100.00%

AAA AA A BBB BB B CCC D

AAA

AA

A

BBB

BB

B

CCC

D

Page 5: Machine Learning with MATLAB - … · 2 Agenda Machine Learning Overview Machine Learning with MATLAB –Unsupervised Learning Clustering –Supervised Learning Classification

5

Basic Concepts in Machine Learning

Start with an initial set of data

“Learn” from this data

– “Train” your algorithm

with this data

Use the resulting model

to predict outcomes

for new data sets

-0.1 0 0.1 0.2 0.3 0.4 0.5 0.60

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Group1

Group2

Group3

Group4

Group5

Group6

Group7

Group8

Page 6: Machine Learning with MATLAB - … · 2 Agenda Machine Learning Overview Machine Learning with MATLAB –Unsupervised Learning Clustering –Supervised Learning Classification

6

Machine Learning Process

Exploration Modeling Evaluation Deployment

Page 7: Machine Learning with MATLAB - … · 2 Agenda Machine Learning Overview Machine Learning with MATLAB –Unsupervised Learning Clustering –Supervised Learning Classification

7

Exploratory Data Analysis

Gain insight from visual examination

– Identify trends and interactions

– Detect patterns

– Remove outliers

– Shrink data

– Select and pare predictors

– Feature transformation

MPG Acceleration Displacement Weight Horsepow er

MP

GA

ccele

ratio

nD

ispla

cem

ent

Weig

ht

Hors

epow

er

50 1001502002000 4000200 40010 2020 40

50

100

150

200

2000

4000

200

400

10

20

20

40

Page 8: Machine Learning with MATLAB - … · 2 Agenda Machine Learning Overview Machine Learning with MATLAB –Unsupervised Learning Clustering –Supervised Learning Classification

8

Data Exploration Interactions Between Variables

Plot Matrix by Group Parallel Coordinates Plot Andrews’ Plot

Glyph Plot Chernoff Faces

MPG Acceleration Displacement Weight Horsepow er

MP

GA

ccele

ratio

nD

ispla

cem

ent

Weig

ht

Hors

epow

er

50 1001502002000 4000200 40010 2020 40

50

100

150

200

2000

4000

200

400

10

20

20

40

MPG Acceleration Displacement Weight Horsepower-3

-2

-1

0

1

2

3

4

Coord

inate

Valu

e

4

6

8

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1-8

-6

-4

-2

0

2

4

6

8

t

f(t)

4

6

8

chevrolet chevelle malibu buick skylark 320 plymouth satellite

amc rebel sst ford torino ford galaxie 500

chevrolet impala plymouth fury iii pontiac catalina

chevrolet chevelle malibubuick skylark 320 plymouth satellite

amc rebel sst ford torino ford galaxie 500

chevrolet impala plymouth fury iii pontiac catalina

Page 9: Machine Learning with MATLAB - … · 2 Agenda Machine Learning Overview Machine Learning with MATLAB –Unsupervised Learning Clustering –Supervised Learning Classification

9

Unsupervised Learning Clustering

Supervised

Learning

Unsupervised

Learning Clustering

Group and interpret data based only

on input data Machine

Learning

K-means,

Fuzzy K-means

Hierarchical

Neural Network

Gaussian

Mixture

Classification

Regression

Page 10: Machine Learning with MATLAB - … · 2 Agenda Machine Learning Overview Machine Learning with MATLAB –Unsupervised Learning Clustering –Supervised Learning Classification

10

Dataset We’ll Be Using

Cloud of randomly generated points

– Each cluster center is randomly chosen inside specified bounds

– Each cluster contains the specified number of points per cluster

– Each cluster point is sampled from a Gaussian distribution

– Multi-dimensional dataset

-0.1 0 0.1 0.2 0.3 0.4 0.5 0.60

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Group1

Group2

Group3

Group4

Group5

Group6

Group7

Group8

Page 11: Machine Learning with MATLAB - … · 2 Agenda Machine Learning Overview Machine Learning with MATLAB –Unsupervised Learning Clustering –Supervised Learning Classification

11

Clustering Overview

What is clustering?

– Segment data into groups,

based on data similarity

Why use clustering?

– Identify outliers

– Resulting groups may be

the matter of interest

How is clustering done?

– Can be achieved by various algorithms

– It is an iterative process (involving trial and error)

-0.1 0 0.1 0.2 0.3 0.4 0.5 0.60

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Page 12: Machine Learning with MATLAB - … · 2 Agenda Machine Learning Overview Machine Learning with MATLAB –Unsupervised Learning Clustering –Supervised Learning Classification

12

Clustering: K – Means Clustering

K-means is a partitioning method

Partitions data into k mutually

exclusive clusters

Each cluster has a

centroid (or center)

– Sum of distances from

all objects to the center

is minimized

Statistics Toolbox

Page 13: Machine Learning with MATLAB - … · 2 Agenda Machine Learning Overview Machine Learning with MATLAB –Unsupervised Learning Clustering –Supervised Learning Classification

13

Clustering: Neural Networks

Networks are comprised of one or more layers

Outputs computed by applying a nonlinear

transfer function with weighted sum of

inputs

Trained by letting the network

continually adjust itself

to new inputs (determines weights)

Interactive apps for easily creating and training networks

Multi-layered networks created by cascading

(provide better accuracy)

Example architectures for clustering:

– Self-organizing maps

– Competitive layers

Output Variable

Transfer function

Weights

Bias

Input variables

Page 14: Machine Learning with MATLAB - … · 2 Agenda Machine Learning Overview Machine Learning with MATLAB –Unsupervised Learning Clustering –Supervised Learning Classification

14

Clustering : Gaussian Mixture Models

Good when clusters have different

sizes and are correlated

Assume that data is drawn

from a fixed number K

of normal distributions

0

0.2

0.4

0.6

0.8

1

0

0.2

0.4

0.6

0.8

1

0

10

20

Statistics Toolbox

Page 15: Machine Learning with MATLAB - … · 2 Agenda Machine Learning Overview Machine Learning with MATLAB –Unsupervised Learning Clustering –Supervised Learning Classification

15

Cluster Analysis Summary

Segments data into groups, based on data similarity

No method is perfect (depends on data)

Process is iterative; explore different algorithms

Beware of local minima (global optimization can help)

Clustering

K-means,

Fuzzy K-means

Hierarchical

Neural Network

Gaussian

Mixture

Page 16: Machine Learning with MATLAB - … · 2 Agenda Machine Learning Overview Machine Learning with MATLAB –Unsupervised Learning Clustering –Supervised Learning Classification

16

Model Development Process

Exploration Modeling Evaluation Deployment

Page 17: Machine Learning with MATLAB - … · 2 Agenda Machine Learning Overview Machine Learning with MATLAB –Unsupervised Learning Clustering –Supervised Learning Classification

17

Supervised Learning Classification for Predictive Modeling

Unsupervised

Learning

Develop predictive model based on both input and output data

Machine

Learning

Supervised

Learning

Decision Tree

Ensemble

Method

Neural Network

Support Vector

Machine

Classification

Page 18: Machine Learning with MATLAB - … · 2 Agenda Machine Learning Overview Machine Learning with MATLAB –Unsupervised Learning Clustering –Supervised Learning Classification

18

Classification Overview

What is classification?

– Predicting the best group for each point

– “Learns” from labeled observations

– Uses input features

Why use classification?

– Accurately group data never seen before

How is classification done?

– Can use several algorithms to build a predictive model

– Good training data is critical

-0.1 0 0.1 0.2 0.3 0.4 0.5 0.60

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Group1

Group2

Group3

Group4

Group5

Group6

Group7

Group8

Page 19: Machine Learning with MATLAB - … · 2 Agenda Machine Learning Overview Machine Learning with MATLAB –Unsupervised Learning Clustering –Supervised Learning Classification

19

Classification - Decision Trees

Builds a tree from training data

– Model is a tree where each node is a

logical test on a predictor

Traverse tree by comparing

features with threshold values

The “leaf” of the tree

specifies the group

Statistics Toolbox

Page 20: Machine Learning with MATLAB - … · 2 Agenda Machine Learning Overview Machine Learning with MATLAB –Unsupervised Learning Clustering –Supervised Learning Classification

20

Classification - Ensemble Learners Overview

Decision trees are “weak” learners

– Good to classify data used to train

– Often not very good with new data

– Note rectangular groups

What are ensemble learners?

– Combine many decision trees to create a

“strong” learner

– Uses “bootstrapped aggregation”

Why use ensemble methods?

– Classifier has better predictive power

– Note improvement in cluster shapes

Statistics Toolbox

-0.4 -0.2 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6-0.5

0

0.5

1

1.5

x1

x2

group1

group2

group3

group4

group5

group6

group7

group8

Page 21: Machine Learning with MATLAB - … · 2 Agenda Machine Learning Overview Machine Learning with MATLAB –Unsupervised Learning Clustering –Supervised Learning Classification

21

Classification - Support Vector Machines Overview

Good for modeling with complex

boundaries between groups

– Can be very accurate

– No restrictions on the predictors

What does it do?

– Uses non-linear “kernel” to

calculate the boundaries

– Can be computationally intensive

Version in Statistics Toolbox only

classifies into two groups

Statistics Toolbox

(as of R2013a)

-3 -2 -1 0 1 2 3-2

-1

0

1

2

3

4

1

2

Support Vectors

Page 22: Machine Learning with MATLAB - … · 2 Agenda Machine Learning Overview Machine Learning with MATLAB –Unsupervised Learning Clustering –Supervised Learning Classification

22

K-Nearest Neighbor Classification

One of the simplest classifiers

Takes the K nearest points

from the training set, and

chooses the majority class

of those K points

No training phase – all the

work is done during the

application of the model -0.4 -0.2 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6

-0.5

0

0.5

1

1.5

x1

x2

group1

group2

group3

group4

group5

group6

group7

group8

Statistics Toolbox

Page 23: Machine Learning with MATLAB - … · 2 Agenda Machine Learning Overview Machine Learning with MATLAB –Unsupervised Learning Clustering –Supervised Learning Classification

23

Classification Summary

No absolute best method

Simple does not

mean inefficient

Watch for overfitting

– Decision trees and neural networks may overfit the noise

– Use ensemble learning and cross-validation

Parallelize for speedup

Decision Tree

Ensemble

Method

Neural Network

Support Vector

Machine

Classification

Page 24: Machine Learning with MATLAB - … · 2 Agenda Machine Learning Overview Machine Learning with MATLAB –Unsupervised Learning Clustering –Supervised Learning Classification

24

Supervised Learning Regression for Predictive Modeling

Unsupervised

Learning

Develop predictive model based on both input and output data

Machine

Learning

Supervised

Learning

Linear

Non-linear

Non-parametric

Regression

Page 25: Machine Learning with MATLAB - … · 2 Agenda Machine Learning Overview Machine Learning with MATLAB –Unsupervised Learning Clustering –Supervised Learning Classification

25

Regression

Why use regression?

– Predict the continuous response

for new observations

Type of predictive modeling

– Specify a model that describes

Y as a function of X

– Estimate coefficients that

minimize the difference

between predicted and actual

You can apply techniques from earlier sections with

regression as well

Statistics Toolbox

Curve Fitting Toolbox

Page 26: Machine Learning with MATLAB - … · 2 Agenda Machine Learning Overview Machine Learning with MATLAB –Unsupervised Learning Clustering –Supervised Learning Classification

26

Linear Regression

Y is a linear function of the regression coefficients

Common examples:

Straight line 𝑌 = 𝐵0 + 𝐵1𝑋1

Plane

𝑌 = 𝐵0 + 𝐵1𝑋1 +𝐵2𝑋2

Polynomial 𝑌 = 𝐵0 + 𝐵1𝑋13 + 𝐵2𝑋1

2 +𝐵3𝑋1

Polynomial

with cross terms 𝑌 = 𝐵0 + 𝐵1𝑋1

2 + 𝐵2(𝑋1 ∗ 𝑋2) + 𝐵3 𝑋22

Page 27: Machine Learning with MATLAB - … · 2 Agenda Machine Learning Overview Machine Learning with MATLAB –Unsupervised Learning Clustering –Supervised Learning Classification

27

Fourier Series 𝑏0 + 𝑏1 cos 𝑏3𝑋 + 𝑏2 sin 𝑏3𝑋

Exponential Growth 𝑁 = 𝑁0𝑒

Logistic Growth 𝑃 𝑡 =𝑏0

1 + 𝑏1𝑒−𝑘𝑡

Nonlinear Regression

Y is a nonlinear function of the regression coefficients

Syntax for formulas:

y ~ b0 + b1*cos(x*b3) +

b4*sin(x*b3)

y ~ b0 + b1*cos(x*b3) +

b4*sin(x*b3)

@(b,t)(b(1)*exp(b(2)*t)

y ~ b0 + b1*cos(x*b3) +

b4*sin(x*b3)

@(b,t)(b(1)*exp(b(2)*t)

@(b,t)(1/(b(1) + exp(-

b(2)*x)))

Page 28: Machine Learning with MATLAB - … · 2 Agenda Machine Learning Overview Machine Learning with MATLAB –Unsupervised Learning Clustering –Supervised Learning Classification

28

Generalized Linear Models

Extends the linear model

– Define relationship between model and response variable

– Model error distributions other than normal

Logistic regression

– Response variable is binary (true / false)

– Results are typically expressed as an odd’s ratio

Poisson regression

– Model count data (non-negative integers)

– Response variable comes from a Poisson distribution

Page 29: Machine Learning with MATLAB - … · 2 Agenda Machine Learning Overview Machine Learning with MATLAB –Unsupervised Learning Clustering –Supervised Learning Classification

29

Machine Learning with MATLAB

Interactive environment

– Visual tools for exploratory data analysis

– Easy to evaluate and choose best algorithm

– Apps available to help you get started (e.g,. neural network tool, curve fitting tool)

Multiple algorithms to choose from

– Clustering

– Classification

– Regression

Page 31: Machine Learning with MATLAB - … · 2 Agenda Machine Learning Overview Machine Learning with MATLAB –Unsupervised Learning Clustering –Supervised Learning Classification

31

MathWorks India – Services and Offerings

Local website: www.mathworks.in

Technical Support India: www.mathworks.in/ myservicerequests

Customer Service for non-technical questions: [email protected]

Application Engineering

Product Training:

www.mathworks.in/training

Consulting

Page 32: Machine Learning with MATLAB - … · 2 Agenda Machine Learning Overview Machine Learning with MATLAB –Unsupervised Learning Clustering –Supervised Learning Classification

32

Scheduled Public Training for Sep–Dec 2013

Course Name Location Training dates

Statistical Methods in MATLAB Bangalore 02- 03 Sep 2013

MATLAB based Optimization Techniques Bangalore 04 Sep 2013

Physical Modeling of Multi-Domain Systems using Simscape Bangalore 05 Sep 2013

MATLAB Fundamentals

Delhi 23-25 Sep 2013

Pune 07-09 Oct 2013

Bangalore 21-23 Oct 2013

Web based 05- 07 Nov 2013

Chennai 09-11 Dec 2013

Simulink for System and Algorithm Modeling

Delhi 26-27 Sep 2013

Pune 10-11 Oct 2013

Bangalore 24-25 Oct 2013

Web based 12-13 Nov 2013

Chennai 12-13 Dec 2013

MATLAB Programming Techniques Bangalore 18-19 Nov 2013

MATLAB for Data Processing and Visualization Bangalore 20 Nov 2013

MATLAB for Building Graphical User Interface Bangalore 21 Nov 2013

Generating HDL Code from Simulink Bangalore 28-29 Nov 2013 Email: [email protected] URL: http://www.mathworks.in/services/training Phone: 080-6632-6000

Page 33: Machine Learning with MATLAB - … · 2 Agenda Machine Learning Overview Machine Learning with MATLAB –Unsupervised Learning Clustering –Supervised Learning Classification

33

MathWorks Certification Program- for the first

time in India!

MathWorks Certified MATLAB Associate Exam

Why certification?

Validates proficiency with MATLAB

Can help accelerate professional growth

Can help increase productivity and project success and thereby

prove to be a strategic investment

Certification exam administered in English at MathWorks facilities

in Bangalore on Nov 27,2013

Email: [email protected] URL: http://www.mathworks.in/services/training Phone: 080-6632-6000

Page 34: Machine Learning with MATLAB - … · 2 Agenda Machine Learning Overview Machine Learning with MATLAB –Unsupervised Learning Clustering –Supervised Learning Classification

34

MathWorks India Contact Details

URL: http://www.mathworks.in

E-mail: [email protected]

Technical Support: www.mathworks.in/myservicerequests

Tel: +91-80-6632 6000

Fax: +91-80-6632 6010

Thank You for Attending

Talk to Us – We are Happy to Support You

MathWorks India Private Limited

Salarpuria Windsor Building

Third Floor,

No.3 Ulsoor Road

Bangalore - 560042, Karnataka

India