machine learning with matlab - ?· 2 agenda machine learning overview machine learning with matlab...

Download Machine Learning with MATLAB - ?· 2 Agenda Machine Learning Overview Machine Learning with MATLAB –Unsupervised…

Post on 30-Aug-2018

221 views

Category:

Documents

5 download

Embed Size (px)

TRANSCRIPT

  • 1 2013 The MathWorks, Inc.

    Machine Learning with MATLAB

    U. M. Sundar

    Senior Application Engineer

    MathWorks India

  • 2

    Agenda

    Machine Learning Overview

    Machine Learning with MATLAB

    Unsupervised Learning

    Clustering

    Supervised Learning

    Classification

    Regression

    Learn More

    -0.1 0 0.1 0.2 0.3 0.4 0.5 0.60

    0.1

    0.2

    0.3

    0.4

    0.5

    0.6

    0.7

    0.8

    0.9

    1

    Group1

    Group2

    Group3

    Group4

    Group5

    Group6

    Group7

    Group8

    -0.1 0 0.1 0.2 0.3 0.4 0.5 0.60

    0.1

    0.2

    0.3

    0.4

    0.5

    0.6

    0.7

    0.8

    0.9

    1

  • 3

    Machine Learning Overview Types of Learning, Categories of Algorithms

    Machine

    Learning

    Supervised

    Learning

    Classification

    Regression

    Unsupervised

    Learning Clustering

    Group and interpret data based only

    on input data

    Develop predictive model based on both input and output data

    Type of Learning Categories of Algorithms

  • 4

    Machine Learning When and where it is used?

    When to use it

    Predict a future outcome based on

    Historical data (many variables)

    Specific patterns

    Define a System that is

    Based on inputs and outputs from the system

    complex to define using governing equations (e.g., black-box modeling)

    Examples

    Pattern recognition (speech, images)

    Financial algorithms (credit scoring, algo trading)

    Energy forecasting (load, price)

    Biology (tumor detection, drug discovery)

    93.68%

    2.44%

    0.14%

    0.03%

    0.03%

    0.00%

    0.00%

    0.00%

    5.55%

    92.60%

    4.18%

    0.23%

    0.12%

    0.00%

    0.00%

    0.00%

    0.59%

    4.03%

    91.02%

    7.49%

    0.73%

    0.11%

    0.00%

    0.00%

    0.18%

    0.73%

    3.90%

    87.86%

    8.27%

    0.82%

    0.37%

    0.00%

    0.00%

    0.15%

    0.60%

    3.78%

    86.74%

    9.64%

    1.84%

    0.00%

    0.00%

    0.00%

    0.08%

    0.39%

    3.28%

    85.37%

    6.24%

    0.00%

    0.00%

    0.00%

    0.00%

    0.06%

    0.18%

    2.41%

    81.88%

    0.00%

    0.00%

    0.06%

    0.08%

    0.16%

    0.64%

    1.64%

    9.67%

    100.00%

    AAA AA A BBB BB B CCC D

    AAA

    AA

    A

    BBB

    BB

    B

    CCC

    D

  • 5

    Basic Concepts in Machine Learning

    Start with an initial set of data

    Learn from this data

    Train your algorithm

    with this data

    Use the resulting model

    to predict outcomes

    for new data sets

    -0.1 0 0.1 0.2 0.3 0.4 0.5 0.60

    0.1

    0.2

    0.3

    0.4

    0.5

    0.6

    0.7

    0.8

    0.9

    1

    Group1

    Group2

    Group3

    Group4

    Group5

    Group6

    Group7

    Group8

  • 6

    Machine Learning Process

    Exploration Modeling Evaluation Deployment

  • 7

    Exploratory Data Analysis

    Gain insight from visual examination

    Identify trends and interactions

    Detect patterns

    Remove outliers

    Shrink data

    Select and pare predictors

    Feature transformation

    MPG Acceleration Displacement Weight Horsepow er

    MP

    GA

    ccele

    ratio

    nD

    ispla

    cem

    ent

    Weig

    ht

    Hors

    epow

    er

    50 1001502002000 4000200 40010 2020 40

    50

    100

    150

    200

    2000

    4000

    200

    400

    10

    20

    20

    40

  • 8

    Data Exploration Interactions Between Variables

    Plot Matrix by Group Parallel Coordinates Plot Andrews Plot

    Glyph Plot Chernoff Faces

    MPG Acceleration Displacement Weight Horsepow er

    MP

    GA

    ccele

    ratio

    nD

    ispla

    cem

    ent

    Weig

    ht

    Hors

    epow

    er

    50 1001502002000 4000200 40010 2020 40

    50

    100

    150

    200

    2000

    4000

    200

    400

    10

    20

    20

    40

    MPG Acceleration Displacement Weight Horsepower-3

    -2

    -1

    0

    1

    2

    3

    4

    Coord

    inate

    Valu

    e

    4

    6

    8

    0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1-8

    -6

    -4

    -2

    0

    2

    4

    6

    8

    t

    f(t)

    4

    6

    8

    chevrolet chevelle malibu buick skylark 320 plymouth satellite

    amc rebel sst ford torino ford galaxie 500

    chevrolet impala plymouth fury iii pontiac catalina

    chevrolet chevelle malibubuick skylark 320 plymouth satellite

    amc rebel sst ford torino ford galaxie 500

    chevrolet impala plymouth fury iii pontiac catalina

  • 9

    Unsupervised Learning Clustering

    Supervised

    Learning

    Unsupervised

    Learning Clustering

    Group and interpret data based only

    on input data Machine

    Learning

    K-means,

    Fuzzy K-means

    Hierarchical

    Neural Network

    Gaussian

    Mixture

    Classification

    Regression

  • 10

    Dataset Well Be Using

    Cloud of randomly generated points

    Each cluster center is randomly chosen inside specified bounds

    Each cluster contains the specified number of points per cluster

    Each cluster point is sampled from a Gaussian distribution

    Multi-dimensional dataset

    -0.1 0 0.1 0.2 0.3 0.4 0.5 0.60

    0.1

    0.2

    0.3

    0.4

    0.5

    0.6

    0.7

    0.8

    0.9

    1

    Group1

    Group2

    Group3

    Group4

    Group5

    Group6

    Group7

    Group8

  • 11

    Clustering Overview

    What is clustering?

    Segment data into groups,

    based on data similarity

    Why use clustering?

    Identify outliers

    Resulting groups may be

    the matter of interest

    How is clustering done?

    Can be achieved by various algorithms

    It is an iterative process (involving trial and error)

    -0.1 0 0.1 0.2 0.3 0.4 0.5 0.60

    0.1

    0.2

    0.3

    0.4

    0.5

    0.6

    0.7

    0.8

    0.9

    1

  • 12

    Clustering: K Means Clustering

    K-means is a partitioning method

    Partitions data into k mutually

    exclusive clusters

    Each cluster has a

    centroid (or center)

    Sum of distances from

    all objects to the center

    is minimized

    Statistics Toolbox

  • 13

    Clustering: Neural Networks

    Networks are comprised of one or more layers

    Outputs computed by applying a nonlinear

    transfer function with weighted sum of

    inputs

    Trained by letting the network

    continually adjust itself

    to new inputs (determines weights)

    Interactive apps for easily creating and training networks

    Multi-layered networks created by cascading

    (provide better accuracy)

    Example architectures for clustering:

    Self-organizing maps

    Competitive layers

    Output Variable

    Transfer function

    Weights

    Bias

    Input variables

  • 14

    Clustering : Gaussian Mixture Models

    Good when clusters have different

    sizes and are correlated

    Assume that data is drawn

    from a fixed number K

    of normal distributions

    0

    0.2

    0.4

    0.6

    0.8

    1

    0

    0.2

    0.4

    0.6

    0.8

    1

    0

    10

    20

    Statistics Toolbox

  • 15

    Cluster Analysis Summary

    Segments data into groups, based on data similarity

    No method is perfect (depends on data)

    Process is iterative; explore different algorithms

    Beware of local minima (global optimization can help)

    Clustering

    K-means,

    Fuzzy K-means

    Hierarchical

    Neural Network

    Gaussian

    Mixture

  • 16

    Model Development Process

    Exploration Modeling Evaluation Deployment

  • 17

    Supervised Learning Classification for Predictive Modeling

    Unsupervised

    Learning

    Develop predictive model based on both input and output data

    Machine

    Learning

    Supervised

    Learning

    Decision Tree

    Ensemble

    Method

    Neural Network

    Support Vector

    Machine

    Classification

  • 18

    Classification Overview

    What is classification?

    Predicting the best group for each point

    Learns from labeled observations

    Uses input features

    Why use classification?

    Accurately group data never seen before

    How is classification done?

    Can use several algorithms to build a predictive model

    Good training data is critical

    -0.1 0 0.1 0.2 0.3 0.4 0.5 0.60

    0.1

    0.2

    0.3

    0.4

    0.5

    0.6

    0.7

    0.8

    0.9

    1

    Group1

    Group2

    Group3

    Group4

    Group5

    Group6

    Group7

    Group8

  • 19

    Classification - Decision Trees

    Builds a tree from training data

    Model is a tree where each node is a

    logical test on a predictor

    Traverse tree by comparing

    features with threshold values

    The leaf of the tree

    specifies the group

    Statistics Toolbox

  • 20

    Classification - Ensemble Learners Overview

    Decision trees are weak

Recommended

View more >