# Machine Learning with MATLAB

Post on 30-Aug-2018


© 2013 The MathWorks, Inc.

Machine Learning with MATLAB

U. M. Sundar

Senior Application Engineer

MathWorks India


Agenda

Machine Learning Overview

Machine Learning with MATLAB

Unsupervised Learning

Clustering

Supervised Learning

Classification

Regression

Learn More

[Figure: scatter plot of data points in eight groups (Group1 to Group8)]

Machine Learning Overview: Types of Learning, Categories of Algorithms

Machine Learning

Supervised Learning: develop predictive model based on both input and output data

Categories of algorithms: Classification, Regression

Unsupervised Learning: group and interpret data based only on input data

Categories of algorithms: Clustering


Machine Learning: When and where is it used?

When to use it

Predict a future outcome based on

Historical data (many variables)

Specific patterns

Define a system that is

Based on inputs and outputs from the system

Too complex to define using governing equations (e.g., black-box modeling)

Examples

Pattern recognition (speech, images)

Financial algorithms (credit scoring, algo trading)

Energy forecasting (load, price)

Biology (tumor detection, drug discovery)

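Example: matrix of percentages over credit ratings (each row sums to about 100%)

| | AAA | AA | A | BBB | BB | B | CCC | D |
|---|---|---|---|---|---|---|---|---|
| AAA | 93.68% | 5.55% | 0.59% | 0.18% | 0.00% | 0.00% | 0.00% | 0.00% |
| AA | 2.44% | 92.60% | 4.03% | 0.73% | 0.15% | 0.00% | 0.00% | 0.06% |
| A | 0.14% | 4.18% | 91.02% | 3.90% | 0.60% | 0.08% | 0.00% | 0.08% |
| BBB | 0.03% | 0.23% | 7.49% | 87.86% | 3.78% | 0.39% | 0.06% | 0.16% |
| BB | 0.03% | 0.12% | 0.73% | 8.27% | 86.74% | 3.28% | 0.18% | 0.64% |
| B | 0.00% | 0.00% | 0.11% | 0.82% | 9.64% | 85.37% | 2.41% | 1.64% |
| CCC | 0.00% | 0.00% | 0.00% | 0.37% | 1.84% | 6.24% | 81.88% | 9.67% |
| D | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 100.00% |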

Basic Concepts in Machine Learning

Start with an initial set of data

Learn from this data

Train your algorithm with this data

Use the resulting model to predict outcomes for new data sets
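The train-then-predict workflow above can be sketched in a few lines. This is a minimal illustration in plain Python rather than MATLAB, and the nearest-centroid rule, the function names `train` and `predict`, and the sample points are all assumptions made for the example, not part of the original talk.

```python
from math import dist

def train(points, labels):
    """Learn from the initial data: compute one centroid per label."""
    sums, counts = {}, {}
    for point, label in zip(points, labels):
        acc = sums.setdefault(label, [0.0] * len(point))
        for i, value in enumerate(point):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in acc)
            for label, acc in sums.items()}

def predict(model, point):
    """Use the resulting model: return the label of the nearest centroid."""
    return min(model, key=lambda label: dist(model[label], point))

# Initial (made-up) data set with known outcomes.
train_x = [(0.1, 0.2), (0.2, 0.1), (0.8, 0.9), (0.9, 0.8)]
train_y = ["Group1", "Group1", "Group2", "Group2"]
model = train(train_x, train_y)

# New data the model has not seen.
new_points = [(0.15, 0.15), (0.85, 0.95)]
predictions = [predict(model, p) for p in new_points]
print(predictions)
```

The model here is only a dictionary of centroids. Any of the algorithms covered later in the talk could take its place behind the same train and predict steps.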

[Figure: scatter plot of data points in eight groups (Group1 to Group8)]

Machine Learning Process

Exploration → Modeling → Evaluation → Deployment

Exploratory Data Analysis

Gain insight from visual examination

Identify trends and interactions

Detect patterns

Remove outliers

Shrink data

Select and pare predictors

Feature transformation
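Two of the steps listed above, removing outliers and transforming features, can be sketched as follows. This is plain Python rather than MATLAB. The z-score rule, the threshold of 2.0, and the sample values are assumptions chosen for illustration; the talk does not prescribe a particular outlier test.

```python
from statistics import mean, stdev

def zscores(values):
    """Standardize values to zero mean and unit sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def remove_outliers(values, threshold=2.0):
    """Drop values whose absolute z-score exceeds the (assumed) threshold."""
    return [v for v, z in zip(values, zscores(values)) if abs(z) <= threshold]

# Made-up measurements with one obviously suspicious entry.
raw = [10, 12, 11, 13, 12, 11, 95]

cleaned = remove_outliers(raw)      # outlier removal
standardized = zscores(cleaned)     # feature transformation
print(cleaned)
```

Standardizing after the outlier is removed matters: a single extreme value inflates the standard deviation and compresses every other z-score toward zero.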

[Figure: plot matrix of MPG, Acceleration, Displacement, Weight, Horsepower]

Data Exploration: Interactions Between Variables

[Figure panels: Plot Matrix by Group, Parallel Coordinates Plot, Andrews Plot, Glyph Plot, Chernoff Faces. Variables: MPG, Acceleration, Displacement, Weight, Horsepower. Glyph and face panels labeled by car model: chevrolet chevelle malibu, buick skylark 320, plymouth satellite, amc rebel sst, ford torino, ford galaxie 500, chevrolet impala, plymouth fury iii, pontiac catalina]

Unsupervised Learning: Clustering

Machine Learning: Supervised Learning (Classification, Regression), Unsupervised Learning (Clustering)

Unsupervised Learning: group and interpret data based only on input data

Clustering algorithms: K-means, Fuzzy K-means, Hierarchical, Neural Network, Gaussian Mixture

Dataset We'll Be Using

Cloud of randomly generated points

Each cluster center is randomly chosen inside specified bounds

Each cluster contains the specified number of points per cluster

Each cluster point is sampled from a Gaussian distribution

Multi-dimensional dataset
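A data set with the properties listed above could be generated along these lines. This is a sketch in plain Python, not the script used in the talk. The function name `make_clusters`, the default bounds, the `spread` (standard deviation) value, and the fixed seed are all assumptions.

```python
import random

def make_clusters(n_clusters, points_per_cluster, n_dims=2,
                  bounds=(0.0, 1.0), spread=0.03, seed=0):
    """Return (points, labels) for a cloud of Gaussian clusters."""
    rng = random.Random(seed)
    points, labels = [], []
    for k in range(n_clusters):
        # Each cluster center is chosen uniformly inside the bounds.
        center = [rng.uniform(*bounds) for _ in range(n_dims)]
        for _ in range(points_per_cluster):
            # Each point is sampled from a Gaussian around its center.
            points.append(tuple(rng.gauss(c, spread) for c in center))
            labels.append(k + 1)
    return points, labels

points, labels = make_clusters(n_clusters=8, points_per_cluster=50)
print(len(points), "points in", len(set(labels)), "groups")
```

Raising `n_dims` gives the multi-dimensional case. The labels are kept only so that clustering results can be checked afterwards; an unsupervised method would not see them.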

[Figure: scatter plot of data points in eight groups (Group1 to Group8)]

Clustering Overview

What is clustering?

Segment data into groups, based on data similarity

Why use clustering?

Identify outliers

Resulting groups may be the matter of interest

How is clustering done?

Can be achieved by various algorithms

It is an iterative process (involving trial and error)


Clustering: K-Means Clustering

K-means is a partitioning method

Partitions data into k mutually exclusive clusters

Each cluster has a centroid (or center)

Sum of distances from all objects in a cluster to its centroid is minimized

Statistics Toolbox
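The partitioning idea can be sketched with the standard assign-and-update loop (Lloyd's algorithm). This is plain Python, not the Statistics Toolbox implementation. Assigning each point to its nearest centroid and moving each centroid to the mean of its members reduces the sum of squared Euclidean distances, which is the assumed distance measure here. The farthest-point initialization is also an assumption, chosen only to keep the example deterministic.

```python
from math import dist

def farthest_point_init(points, k):
    """Deterministic start: first point, then repeatedly the farthest point."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, max_iter=100):
    """Return (centroids, assignment) after alternating assign/update steps."""
    centroids = farthest_point_init(points, k)
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        if new_assignment == assignment:
            break  # no point changed cluster, so the partition is stable
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignment

points = [(0.10, 0.10), (0.12, 0.08), (0.09, 0.11),
          (0.90, 0.90), (0.88, 0.92), (0.91, 0.89)]
centroids, assignment = kmeans(points, k=2)
print(assignment)
```

The loop only finds a local optimum, so the result can depend on the starting centroids. Practical implementations usually repeat the procedure from several starts and keep the best partition.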


Clustering: Neural Networks

Networks comprise one or more layers

Outputs computed by applying a nonlinear transfer function to the weighted sum of inputs

Trained by letting the network continually adjust itself to new inputs (determines weights)

Interactive apps for easily creating and training networks

Multi-layered networks created by cascading (can provide better accuracy)

Example architectures for clustering:

Self-organizing maps

Competitive layers

[Figure: neuron diagram labeled with input variables, weights, bias, transfer function, output variable]
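The computation in the neuron diagram (inputs, weights, bias, transfer function, output) can be written out directly. This sketch is plain Python rather than a MATLAB toolbox call. The `tanh` transfer function and all weight and bias values are assumptions picked for illustration. It shows only the forward pass, not training.

```python
from math import tanh

def neuron_output(inputs, weights, bias, transfer=tanh):
    """Apply the transfer function to the weighted sum of inputs plus bias."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return transfer(weighted_sum)

def layer_output(inputs, weight_rows, biases, transfer=tanh):
    """A layer is a set of neurons that all see the same inputs."""
    return [neuron_output(inputs, w, b, transfer)
            for w, b in zip(weight_rows, biases)]

x = [0.5, -1.0]                                    # input variables
hidden = layer_output(x, [[1.0, 2.0], [-1.0, 0.5]], [0.0, 0.1])
y = neuron_output(hidden, [0.3, -0.7], 0.05)       # cascaded second layer
print(hidden, y)
```

Cascading works by feeding one layer's outputs in as the next layer's inputs. Training would adjust the weights and biases, which are fixed by hand here.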


Clustering: Gaussian Mixture Models

Good when clusters have different sizes and are correlated

Assume that data is drawn from a fixed number K of normal distributions


Statistics Toolbox
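Under the mixture assumption, each point gets a probability of belonging to each component instead of a hard label. The sketch below is plain Python, restricted to one dimension for brevity, so it illustrates clusters of different sizes but not correlation. It evaluates those membership probabilities for mixture parameters that are simply assumed. Fitting the parameters, typically with expectation-maximization, is not shown.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mean, std):
    """Density of a one-dimensional normal distribution."""
    return exp(-0.5 * ((x - mean) / std) ** 2) / (std * sqrt(2 * pi))

def responsibilities(x, weights, means, stds):
    """Posterior probability that x was drawn from each of the K components."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    total = sum(joint)
    return [j / total for j in joint]

# Assumed mixture with K = 2: a narrow cluster and a wide one.
weights, means, stds = [0.5, 0.5], [0.2, 0.8], [0.05, 0.15]

near_first = responsibilities(0.25, weights, means, stds)
midpoint = responsibilities(0.5, weights, means, stds)
print(near_first, midpoint)
```

The point 0.5 lies exactly halfway between the two means, yet it is assigned almost entirely to the wide component. A distance-only method such as k-means cannot capture this effect of cluster size.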


Cluster Analysis Summary

Segments data into groups, based on data similarity

No method is perfect (depends on data)

Process is iterative; explore different algorithms

Beware of local minima (global optimization can help)

Clustering algorithms: K-means, Fuzzy K-means, Hierarchical, Neural Network, Gaussian Mixture

Model Development Process

Exploration → Modeling → Evaluation → Deployment

Supervised Learning: Classification for Predictive Modeling

Machine Learning: Unsupervised Learning, Supervised Learning

Supervised Learning: develop predictive model based on both input and output data

Classification algorithms: Decision Tree, Ensemble Method, Neural Network, Support Vector Machine

Classification Overview

What is classification?

Predicting the best group for each point

Learns from labeled observations

Uses input features

Why use classification?

Accurately group data never seen before

How is classification done?

Can use several algorithms to build a predictive model

Good training data is critical
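Whether a classifier can accurately group data it has never seen is usually checked on labeled points held back from training. The sketch below is plain Python and uses k-nearest neighbours, which is not one of the algorithms listed in this talk. It is used here only because it fits in a few lines. The data, the function name `knn_predict`, and `k=3` are assumptions.

```python
from collections import Counter
from math import dist

def knn_predict(train_points, train_labels, point, k=3):
    """Predict the majority label among the k nearest labeled observations."""
    nearest = sorted(zip(train_points, train_labels),
                     key=lambda pair: dist(pair[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Labeled observations used for learning (made-up values).
train_points = [(0.10, 0.10), (0.20, 0.15), (0.15, 0.25),
                (0.80, 0.90), (0.90, 0.80), (0.85, 0.95)]
train_labels = ["Group1"] * 3 + ["Group2"] * 3

# Held-out observations: labels are known but were not used for learning.
test_points = [(0.12, 0.18), (0.88, 0.85)]
test_labels = ["Group1", "Group2"]

predicted = [knn_predict(train_points, train_labels, p) for p in test_points]
accuracy = sum(p == t for p, t in zip(predicted, test_labels)) / len(test_labels)
print(predicted, accuracy)
```

The held-out accuracy estimates performance on new data. An accuracy measured on the training points themselves would be optimistic.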

[Figure: scatter plot of data points in eight groups (Group1 to Group8)]

Classification - Decision Trees

Builds a tree from training data

Model is a tree where each node is a logical test on a predictor

Traverse tree by comparing features with threshold values

The leaf of the tree specifies the group

Statistics Toolbox
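Traversing a trained tree comes down to a loop of threshold comparisons. The sketch below is plain Python and covers prediction only. The tree is written by hand as nested dictionaries, and the predictor names, thresholds, groups, and dictionary layout are all assumptions. Building the tree from training data is not shown.

```python
def classify(node, features):
    """Walk from the root to a leaf and return the group stored there."""
    while "group" not in node:
        # Each internal node is a logical test: predictor < threshold?
        goes_left = features[node["predictor"]] < node["threshold"]
        node = node["left"] if goes_left else node["right"]
    return node["group"]

# Hand-written example tree with two tests and three leaves.
tree = {
    "predictor": "x", "threshold": 0.5,
    "left": {"group": "Group1"},
    "right": {
        "predictor": "y", "threshold": 0.4,
        "left": {"group": "Group2"},
        "right": {"group": "Group3"},
    },
}

print(classify(tree, {"x": 0.2, "y": 0.9}))
print(classify(tree, {"x": 0.7, "y": 0.1}))
print(classify(tree, {"x": 0.7, "y": 0.9}))
```

Each root-to-leaf path reads as a plain rule, for example "x is at least 0.5 and y is below 0.4, so Group2". That makes a single tree easy to interpret.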


Classification - Ensemble Learners Overview

Decision trees are weak

Recommended