Logistic regression


  • Discriminative classifier and

    logistic regression

    Machine Learning

    CS 7641, CSE/ISYE 6740, Fall 2015

    Le Song

  • Classification

    Represent the data

    A label is provided for each data point, e.g., $y_i \in \{-1, 1\}$

    Classifier

    2

  • Boys vs. Girls (demo)

    3

  • How to come up with a decision boundary

    Given class conditional distributions $p(x \mid y = 1)$, $p(x \mid y = -1)$, and class priors $p(y = 1)$, $p(y = -1)$

    4

    $p(x \mid y = 1) = \mathcal{N}(x; \mu_1, \Sigma_1)$

    $p(x \mid y = -1) = \mathcal{N}(x; \mu_{-1}, \Sigma_{-1})$

    Where is the decision boundary?

  • Use Bayes rule

    $p(y \mid x) = \frac{p(x \mid y)\, p(y)}{p(x)}$

    5

    posterior = likelihood $\times$ prior / normalization constant
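To make the pieces concrete, here is a minimal sketch (not from the slides) of Bayes rule with two hypothetical 1-D Gaussian class conditionals and equal priors; all names and values are illustrative:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    # Density of N(mu, sigma^2) evaluated at x
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Hypothetical class conditionals and priors
mu_pos, sigma_pos = 1.0, 1.0    # p(x | y = +1)
mu_neg, sigma_neg = -1.0, 1.0   # p(x | y = -1)
prior_pos, prior_neg = 0.5, 0.5

x = 0.3
lik_pos = gaussian_pdf(x, mu_pos, sigma_pos)
lik_neg = gaussian_pdf(x, mu_neg, sigma_neg)

# Bayes rule: posterior = likelihood * prior / normalization constant
evidence = lik_pos * prior_pos + lik_neg * prior_neg
posterior_pos = lik_pos * prior_pos / evidence
print(posterior_pos)  # p(y = +1 | x)
```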

  • Bayes Decision Rule

    Learning: prior $p(y)$, class conditional distribution $p(x \mid y)$

    The posterior probability of a test point: $q(x) := p(y = 1 \mid x)$

    Bayes decision rule:

    If $q(x) > \frac{1}{2}$, then $\hat{y} = 1$, otherwise $\hat{y} = -1$

    Alternatively:

    If the ratio $\frac{p(x \mid y = 1)\, p(y = 1)}{p(x \mid y = -1)\, p(y = -1)} > 1$, then $\hat{y} = 1$, otherwise $\hat{y} = -1$

    Or look at the log-likelihood ratio $h(x) = \ln \frac{p(y = 1 \mid x)}{p(y = -1 \mid x)}$

    6
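Continuing the sketch above, the decision rule is a threshold on the posterior, or equivalently a sign test on the log-likelihood ratio; function names are hypothetical:

```python
import numpy as np

def bayes_decision(posterior_pos):
    # Bayes decision rule: predict +1 when q(x) = p(y = +1 | x) > 1/2
    return 1 if posterior_pos > 0.5 else -1

def log_likelihood_ratio(lik_pos, prior_pos, lik_neg, prior_neg):
    # h(x) = ln[ p(x | y=+1) p(y=+1) ] - ln[ p(x | y=-1) p(y=-1) ];
    # predicting +1 when h(x) > 0 is equivalent to thresholding q(x) at 1/2
    return np.log(lik_pos * prior_pos) - np.log(lik_neg * prior_neg)

print(bayes_decision(0.73))                       # -> 1
print(log_likelihood_ratio(0.2, 0.5, 0.05, 0.5))  # > 0, so predict +1
```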

  • More on Bayes error of Bayes rule

    Bayes error is the lower bound of the probability of classification error

    Bayes decision rule is the theoretically best classifier, in that it minimizes the probability of classification error

    However, computing the Bayes error or the Bayes decision rule is in general a very complex problem. Why?

    Need density estimation

    Need to do integrals, e.g., $\int p(x \mid y = 1)\, p(y = 1)\, dx$

    7
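To see why density estimates and integrals are needed, here is an illustrative sketch (not from the slides) that computes the Bayes error numerically for two hypothetical 1-D Gaussian classes with equal priors:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Joint densities p(x, y) = p(x | y) p(y) on a fine grid
xs = np.linspace(-10.0, 10.0, 100001)
p_pos = gaussian_pdf(xs, 1.0, 1.0) * 0.5
p_neg = gaussian_pdf(xs, -1.0, 1.0) * 0.5

# The Bayes rule picks the larger joint density at each x, so the mass it
# misclassifies is the integral of the smaller one over x
bayes_error = np.sum(np.minimum(p_pos, p_neg)) * (xs[1] - xs[0])
print(bayes_error)  # about 0.159 for these parameters
```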

  • What do people do in practice?

    Use simplifying assumptions for $p(x \mid y)$: assume $p(x \mid y)$ is Gaussian $\mathcal{N}(\mu, \Sigma)$, or assume $p(x \mid y)$ is fully factorized

    Use geometric intuitions

    k-nearest neighbor classifier

    Support vector machine

    Directly go for the decision boundary $h(x) = \ln \frac{p(y = 1 \mid x)}{p(y = -1 \mid x)}$: logistic regression

    Neural networks

    8

  • Naïve Bayes Classifier

    Use Bayes decision rule for classification

    But assume $p(x \mid y = 1)$ is fully factorized:

    $p(x \mid y = 1) = \prod_{d=1}^{D} p(x_d \mid y = 1)$

    Or: the variables corresponding to each dimension of the data are independent given the label

    9
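A minimal sketch of the factorized class conditional, assuming per-dimension Gaussians (the slide does not fix a particular form; all parameters here are hypothetical):

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def naive_bayes_likelihood(x, mus, sigmas):
    # Fully factorized class conditional: p(x | y) = prod_d p(x_d | y),
    # here with one Gaussian per dimension
    return np.prod([gaussian_pdf(x[d], mus[d], sigmas[d]) for d in range(len(x))])

# Hypothetical per-dimension parameters for class y = 1
x = np.array([0.2, -0.5, 1.3])
mus = np.array([0.0, 0.0, 1.0])
sigmas = np.array([1.0, 2.0, 0.5])
print(naive_bayes_likelihood(x, mus, sigmas))
```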

  • Naïve Bayes classifier is a generative model

    Once you have the model, you can generate samples from it:

    For each data point $i$: sample a label $y_i \in \{1, 2\}$ according to the class prior; then sample the value of $x_i$ from the class conditional $p(x \mid y_i)$

    Naïve Bayes: conditioned on $y_i$, generate the first dimension $x_{i1}$, second dimension $x_{i2}$, ..., independently

    Difference from mixture of Gaussian models:

    Purpose is different (density estimation vs. classification)

    Data are different (with/without labels)

    Learning is different (EM or not)

    10

    (Figure: graphical model of the naïve Bayes assumption, with the label node pointing to the independent dimension nodes)
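A sketch of this generative process with hypothetical Gaussian parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model: class prior over labels {0, 1} and per-class,
# per-dimension Gaussian parameters (the naive Bayes assumption)
prior = np.array([0.4, 0.6])
mus = np.array([[0.0, 0.0], [2.0, 2.0]])     # mus[y][d]
sigmas = np.array([[1.0, 1.0], [0.5, 0.5]])  # sigmas[y][d]

def sample_point():
    # 1. Sample a label according to the class prior
    y = rng.choice(2, p=prior)
    # 2. Conditioned on y, sample each dimension independently
    x = rng.normal(mus[y], sigmas[y])
    return x, y

x, y = sample_point()
print(x, y)
```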

  • K-nearest neighbors

    $k$-nearest neighbor classifier: assign a label by taking a majority vote over the $k$ training points closest to $x$

    For $k > 1$, the $k$-nearest neighbor rule generalizes the nearest neighbor rule

    To define this more mathematically:

    Let $I_k(x)$ be the indices of the $k$ training points closest to $x$. If $y_i \in \{-1, 1\}$, then we can write the $k$-nearest neighbor classifier as:

    $\hat{y} = \mathrm{sign}\left( \sum_{i \in I_k(x)} y_i \right)$

    11
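A minimal sketch of this classifier (illustrative, not the course's code):

```python
import numpy as np

def knn_classify(x, X_train, y_train, k):
    # Indices I_k(x) of the k training points closest to x (Euclidean distance)
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    # Majority vote over labels in {-1, +1}: sign of the label sum
    vote = np.sum(y_train[nearest])
    return 1 if vote > 0 else -1

# Tiny illustrative dataset
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [2.0, 2.0], [2.1, 1.9]])
y_train = np.array([-1, -1, 1, 1])
print(knn_classify(np.array([1.8, 2.2]), X_train, y_train, k=3))  # -> 1
```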

  • Example

    12

    K = 1

  • Example

    13

    K = 3

  • Example

    14

    K = 5

  • Example

    15

    K = 25

  • Example

    16

    K = 51

  • Example

    17

    K = 101

  • Computations in K-NN

    Similar to KDE, there is essentially no training or learning phase; the computation happens when applying the classifier

    Memory: $O(nD)$

    Finding the nearest neighbors out of a set of millions of examples is still pretty hard

    Test computation: $O(nD)$

    Use smart data structures and algorithms to index training data

    Memory: $O(nD)$; Training computation: $O(nD \log n)$; Test computation: $O(D \log n)$ (KD-tree, Ball tree, Cover tree)

    18
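As one example of such an index, scikit-learn's KDTree (my choice of library here, not named on the slide) can answer nearest-neighbor queries without scanning all points:

```python
import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.default_rng(0)
X_train = rng.normal(size=(10000, 3))   # hypothetical training set
y_train = rng.choice([-1, 1], size=10000)

tree = KDTree(X_train)                  # build once: indexes the training data

x_query = np.zeros((1, 3))
dist, ind = tree.query(x_query, k=5)    # 5 nearest neighbors without a full scan
prediction = 1 if y_train[ind[0]].sum() > 0 else -1
print(prediction)
```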

  • Discriminative classifier

    Directly estimate the decision boundary $h(x) = \ln \frac{p(y = 1 \mid x)}{p(y = -1 \mid x)}$ or the posterior distribution $p(y \mid x)$

    Logistic regression, neural networks

    Do not estimate $p(x \mid y)$ and $p(y)$

    $h(x)$ or $p(y = 1 \mid x)$ is a function of $x$, and does not have probabilistic meaning for $x$, hence cannot be used to sample data points

    Why discriminative classifier?

    Avoid difficult density estimation problem

    Empirically achieve better classification results

    19

  • What is the logistic regression model?

    Assume that the posterior distribution $p(y = 1 \mid x)$ takes a particular form:

    $p(y = 1 \mid x, \theta) = \frac{1}{1 + \exp(-\theta^\top x)}$

    Logistic function: $g(t) = \frac{1}{1 + e^{-t}}$

    20
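A direct transcription of the model into code (illustrative parameter values):

```python
import numpy as np

def logistic(t):
    # Logistic (sigmoid) function g(t) = 1 / (1 + e^{-t})
    return 1.0 / (1.0 + np.exp(-t))

def posterior_pos(x, theta):
    # Logistic regression posterior: p(y = 1 | x, theta) = g(theta^T x)
    return logistic(theta @ x)

theta = np.array([0.5, -1.0])  # hypothetical parameters
x = np.array([2.0, 1.0])
print(posterior_pos(x, theta))  # probability that y = 1
```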

  • Learning parameters in logistic regression

    Find $\theta$ such that the conditional likelihood of the labels is maximized:

    $\max_\theta \; \ell(\theta) := \log \prod_{i=1}^{n} p(y_i \mid x_i, \theta)$

    Good news: $\ell(\theta)$ is a concave function of $\theta$, and there is a single global optimum.

    Bad news: no closed-form solution (resort to numerical methods)

    21

  • The objective function $\ell(\theta)$

    Logistic regression model:

    $p(y = 1 \mid x, \theta) = \frac{1}{1 + \exp(-\theta^\top x)}$

    Note that

    $p(y = 0 \mid x, \theta) = 1 - \frac{1}{1 + \exp(-\theta^\top x)} = \frac{\exp(-\theta^\top x)}{1 + \exp(-\theta^\top x)}$

    Plug in:

    $\ell(\theta) := \log \prod_{i=1}^{n} p(y_i \mid x_i, \theta) = \sum_{i=1}^{n} \left[ (y_i - 1)\, \theta^\top x_i - \log\left(1 + \exp(-\theta^\top x_i)\right) \right]$

    22
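A direct transcription of this objective (illustrative data; uses the $y_i \in \{0, 1\}$ convention of this slide):

```python
import numpy as np

def log_likelihood(theta, X, y):
    # l(theta) = sum_i [ (y_i - 1) theta^T x_i - log(1 + exp(-theta^T x_i)) ]
    z = X @ theta
    return np.sum((y - 1) * z - np.log1p(np.exp(-z)))

# Hypothetical data: n = 4 points in D = 2 dimensions, labels in {0, 1}
X = np.array([[1.0, 2.0], [0.5, -1.0], [-1.5, 0.3], [2.0, 2.0]])
y = np.array([1, 0, 0, 1])
theta = np.zeros(2)
print(log_likelihood(theta, X, y))  # at theta = 0 this is 4 * log(1/2)
```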

  • The gradient of $\ell(\theta)$

    $\ell(\theta) := \log \prod_{i=1}^{n} p(y_i \mid x_i, \theta) = \sum_{i=1}^{n} \left[ (y_i - 1)\, \theta^\top x_i - \log\left(1 + \exp(-\theta^\top x_i)\right) \right]$

    Gradient:

    $\frac{\partial \ell(\theta)}{\partial \theta} = \sum_{i=1}^{n} x_i \left( y_i - \frac{1}{1 + \exp(-\theta^\top x_i)} \right)$
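Since $\ell(\theta)$ is concave, plain gradient ascent on this expression reaches the global optimum; a minimal sketch, where the step size and iteration count are arbitrary choices of mine:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_regression(X, y, lr=0.1, n_iters=1000):
    # Maximize l(theta) by gradient ascent:
    # gradient = sum_i x_i (y_i - sigmoid(theta^T x_i))
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = X.T @ (y - sigmoid(X @ theta))
        theta += lr * grad
    return theta

# Hypothetical data, labels in {0, 1}
X = np.array([[1.0, 2.0], [0.5, 1.0], [-1.0, -0.5], [-2.0, -1.0]])
y = np.array([1, 1, 0, 0])
theta = fit_logistic_regression(X, y)
print(sigmoid(X @ theta))  # fitted posteriors p(y = 1 | x_i)
```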