# logistic regression

Post on 09-Apr-2016


• Discriminative classifier and logistic regression

Machine Learning

CS 7641, CSE/ISYE 6740, Fall 2015

Le Song

• Classification

Represent the data as feature vectors x.

A label y is provided for each data point, e.g., y ∈ {−1, 1}.

Classifier: a function f(x) that maps a data point x to a label y.

• Boys vs. Girls (demo)


• How to come up with a decision boundary?

Given class conditional distributions P(x | y = 1), P(x | y = −1), and class priors P(y = 1), P(y = −1):

P(x | y = 1) = N(x; μ₁, Σ₁)

P(x | y = −1) = N(x; μ₋₁, Σ₋₁)

Where should the decision boundary go?

• Use Bayes' rule

P(y | x) = P(x | y) P(y) / P(x)

posterior = likelihood × prior / normalization constant

• Bayes decision rule

Learning: estimate the prior P(y) and the class conditional distribution P(x | y).

The posterior probability of a test point: q(x) = P(y = 1 | x).

Bayes decision rule:

If q(x) > 1/2, then predict y = 1; otherwise predict y = −1.

Alternatively:

If the ratio P(x | y = 1) P(y = 1) / [P(x | y = −1) P(y = −1)] > 1, then predict y = 1; otherwise predict y = −1.

Or look at the log-likelihood ratio h(x) = ln [ P(y = 1 | x) / P(y = −1 | x) ].
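The decision rule above can be sketched in a few lines. This is an illustrative example, not code from the course: the two Gaussian class conditionals, their identity covariances, and the equal priors are all made-up parameters chosen so the rule is easy to check by symmetry.

```python
import numpy as np

# Illustrative class conditionals: 2-D Gaussians with identity covariance
# and equal priors (all parameters are assumptions for this sketch).
prior = {1: 0.5, -1: 0.5}
mean = {1: np.array([1.0, 1.0]), -1: np.array([-1.0, -1.0])}

def gauss_pdf(x, mu):
    """Density of N(mu, I) in 2 dimensions."""
    diff = np.asarray(x, dtype=float) - mu
    return np.exp(-0.5 * diff @ diff) / (2.0 * np.pi)

def posterior_pos(x):
    """q(x) = P(y = 1 | x) via Bayes' rule: likelihood * prior / normalization."""
    joint = {y: gauss_pdf(x, mean[y]) * prior[y] for y in (1, -1)}
    return joint[1] / (joint[1] + joint[-1])

def bayes_rule(x):
    """Predict y = 1 if q(x) > 1/2, otherwise y = -1."""
    return 1 if posterior_pos(x) > 0.5 else -1
```

With equal priors and symmetric means, the boundary is the line x₁ + x₂ = 0, so q(x) = 1/2 exactly at the origin.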

• More on the Bayes error of the Bayes rule

The Bayes error is a lower bound on the probability of classification error.

The Bayes decision rule is the theoretically best classifier, in the sense that it minimizes the probability of classification error.

However, computing the Bayes error or the Bayes decision rule is in general a very complex problem. Why?

Need density estimation.

Need to do integrals, e.g., ∫ over the region where the rule errs of P(x | y = 1) P(y = 1) dx.

• What do people do in practice?

Use simplifying assumptions for P(x | y = 1):

Assume P(x | y = 1) is Gaussian, N(μ₁, Σ₁).

Assume P(x | y = 1) is fully factorized.

Use geometric intuitions:

k-nearest neighbor classifier

Support vector machine

Directly go for the decision boundary h(x) = ln [ P(y = 1 | x) / P(y = −1 | x) ]:

Logistic regression

Neural networks

• Naïve Bayes classifier

Use the Bayes decision rule for classification.

But assume P(x | y = 1) is fully factorized:

P(x | y = 1) = ∏ᵈ P(xᵈ | y = 1)

In other words, the variables corresponding to each dimension of the data are independent given the label.
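The factorized likelihood can be sketched concretely. A minimal example, assuming per-dimension Gaussian factors and made-up means, variances, and priors (none of these numbers come from the slides):

```python
import numpy as np

# Per-class, per-dimension Gaussian parameters (illustrative assumptions).
means = {1: np.array([0.0, 2.0]), -1: np.array([2.0, 0.0])}
variances = {1: np.array([1.0, 1.0]), -1: np.array([1.0, 1.0])}
priors = {1: 0.5, -1: 0.5}

def class_conditional(x, y):
    """Factorized likelihood P(x | y) = prod_d P(x_d | y), each factor a 1-D Gaussian."""
    x = np.asarray(x, dtype=float)
    per_dim = (np.exp(-0.5 * (x - means[y]) ** 2 / variances[y])
               / np.sqrt(2.0 * np.pi * variances[y]))
    return per_dim.prod()

def naive_bayes_predict(x):
    """Bayes decision rule with the factorized class conditional."""
    scores = {y: class_conditional(x, y) * priors[y] for y in (1, -1)}
    return max(scores, key=scores.get)
```

The product over dimensions is exactly the independence-given-the-label assumption in the slide above.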

• Naïve Bayes classifier is a generative model

Once you have the model, you can generate samples from it. For each data point:

Sample a label y ∈ {1, 2} according to the class prior P(y).

Sample the value of x from the class conditional P(x | y).

Naïve Bayes: conditioned on y, generate the first dimension x₁, the second dimension x₂, …, independently.

Differences from mixture of Gaussian models:

Purpose is different (density estimation vs. classification).

Data are different (with/without labels).

Learning is different (EM or not).

[Figure omitted: graphical model in which the label node generates each dimension of x.]
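The two-step sampling procedure above can be sketched directly. The class prior and per-dimension Gaussian means below are illustrative assumptions, not values from the course:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative generative naive Bayes: class prior plus independent
# per-dimension Gaussians for each class (parameters are assumptions).
prior = np.array([0.5, 0.5])           # P(y = 1), P(y = 2)
means = np.array([[0.0, 2.0],          # per-dimension means for y = 1
                  [2.0, 0.0]])         # per-dimension means for y = 2

def sample_point():
    """Sample y from the prior, then each dimension of x independently given y."""
    y = rng.choice([1, 2], p=prior)
    x = rng.normal(loc=means[y - 1], scale=1.0)  # independent dims given y
    return y, x
```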

• k-nearest neighbors

k-nearest neighbor classifier: assign a label to x by taking a majority vote over the k training points closest to x.

For k > 1, the k-nearest neighbor rule generalizes the nearest neighbor rule.

To define this more mathematically:

Let Iₖ(x) be the indices of the k training points closest to x. If y ∈ {−1, 1}, then we can write the k-nearest neighbor classifier as:

f(x) = sign( Σ_{i ∈ Iₖ(x)} yᵢ )
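The sign-of-the-sum formulation above translates directly into code. A minimal sketch with a made-up four-point training set:

```python
import numpy as np

def knn_predict(x, X_train, y_train, k=3):
    """k-NN with labels in {-1, +1}: sign of the sum of the k nearest labels."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]       # I_k(x): indices of the k closest points
    vote = y_train[nearest].sum()         # sum of labels = majority vote
    return 1 if vote >= 0 else -1

# Tiny illustrative training set (made up for this sketch).
X_train = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0]])
y_train = np.array([-1, -1, 1, 1])
```

With k odd and labels in {−1, 1} the vote can never tie, which is why odd k values are the usual choice.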

• Examples

[Figures omitted: k-NN classification examples for K = 1, 3, 5, 25, 51, 101.]

• Computations in k-NN

Similar to KDE: there is essentially no training or learning phase; the computation happens when applying the classifier.

Memory: O(n).

Test computation: O(n). Finding the nearest neighbors out of a set of millions of examples is still pretty hard.

Use smart data structures and algorithms to index the training data:

Memory: O(n)

Training computation: O(n log n)

Test computation: O(log n)

e.g., KD-tree, Ball tree, Cover tree.
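One of the index structures named above, the KD-tree, is available off the shelf in SciPy as `scipy.spatial.cKDTree`: building it costs roughly O(n log n), and each query is about O(log n) in low dimensions. The training set and labels below are synthetic, made up for this sketch:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X_train = rng.normal(size=(10_000, 2))   # illustrative training set
y_train = np.where(X_train[:, 0] >= 0, 1, -1)  # made-up labels for the sketch

tree = cKDTree(X_train)                  # builds the index once, ~O(n log n)

def knn_predict(x, k=5):
    """Majority vote over the k nearest neighbors found via the KD-tree."""
    _, idx = tree.query(x, k=k)          # ~O(log n) per query in low dimensions
    return 1 if y_train[idx].sum() >= 0 else -1
```

For high-dimensional data the KD-tree's advantage fades, which is why alternatives such as Ball trees and Cover trees exist.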

• Discriminative classifier

Directly estimate the decision boundary h(x) = ln [ P(y = 1 | x) / P(y = −1 | x) ] or the posterior distribution P(y | x).

Logistic regression, neural networks.

Do not estimate P(x | y) and P(y).

h(x) or P(y = 1 | x) is a function of x; it carries no probabilistic meaning for x itself, hence it cannot be used to sample data points.

Why discriminative classifiers?

Avoid the difficult density estimation problem.

Empirically achieve better classification results.

• What is the logistic regression model?

Assume that the posterior distribution P(y = 1 | x) takes a particular form:

P(y = 1 | x, θ) = 1 / (1 + exp(−θᵀx))

Logistic function: σ(u) = 1 / (1 + e⁻ᵘ)
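The model is two lines of code. A minimal sketch of the logistic function and the resulting posterior:

```python
import numpy as np

def sigmoid(u):
    """Logistic function sigma(u) = 1 / (1 + exp(-u))."""
    return 1.0 / (1.0 + np.exp(-u))

def posterior_pos(x, theta):
    """P(y = 1 | x, theta) = sigma(theta^T x)."""
    return sigmoid(np.dot(theta, x))
```

Note that σ(0) = 1/2, so the decision boundary {x : θᵀx = 0} is a hyperplane: logistic regression is a linear classifier.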

• Learning the parameters of logistic regression

Find θ such that the conditional likelihood of the labels is maximized:

max_θ ℓ(θ) = Σᵢ log P(yᵢ | xᵢ, θ)

Good news: ℓ(θ) is a concave function of θ, so there is a single global optimum.

Bad news: no closed-form solution (resort to numerical methods).

• The objective function ℓ(θ)

Logistic regression model:

P(y = 1 | x, θ) = 1 / (1 + exp(−θᵀx))

Note that

P(y = 0 | x, θ) = 1 − 1 / (1 + exp(−θᵀx)) = exp(−θᵀx) / (1 + exp(−θᵀx))

Plugging in:

ℓ(θ) = Σᵢ log P(yᵢ | xᵢ, θ) = Σᵢ [ yᵢ θᵀxᵢ − log(1 + exp(θᵀxᵢ)) ]
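The objective is straightforward to evaluate; the only subtlety is that a naive `log(1 + exp(z))` overflows for large z, so the sketch below uses `np.logaddexp(0, z)`, which computes the same quantity stably. Labels are in {0, 1} as in the derivation above:

```python
import numpy as np

def log_likelihood(theta, X, y):
    """l(theta) = sum_i [ y_i * theta^T x_i - log(1 + exp(theta^T x_i)) ].

    X: (n, d) data matrix; y: labels in {0, 1}.
    np.logaddexp(0, z) evaluates log(1 + exp(z)) without overflow.
    """
    z = X @ theta
    return np.sum(y * z - np.logaddexp(0.0, z))
```

Sanity check: at θ = 0 every term is 0 − log 2, so ℓ(0) = −n log 2 regardless of the data.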

• The gradient of ℓ(θ)

ℓ(θ) = Σᵢ log P(yᵢ | xᵢ, θ) = Σᵢ [ yᵢ θᵀxᵢ − log(1 + exp(θᵀxᵢ)) ]

Differentiating term by term:

∇θ ℓ(θ) = Σᵢ xᵢ [ yᵢ − P(y = 1 | xᵢ, θ) ]
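Since ℓ(θ) is concave, plain gradient ascent with this gradient converges to the global optimum. A minimal sketch: the step size, iteration count, and the tiny 1-D toy data set (with a bias column) are all illustrative choices, not values from the course:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_steps=2000):
    """Maximize l(theta) by gradient ascent; labels y in {0, 1}.

    Update: theta += lr * mean_i x_i (y_i - sigmoid(theta^T x_i)).
    """
    theta = np.zeros(X.shape[1])
    for _ in range(n_steps):
        grad = X.T @ (y - sigmoid(X @ theta))  # gradient of the concave objective
        theta += lr * grad / len(y)            # averaged for a stable step size
    return theta

# Tiny linearly separable toy data (made up for this sketch).
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])  # col 0: bias
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = fit_logistic(X, y)
```

On this symmetric data the learned weight on the feature column is positive, so points with negative feature value get posterior below 1/2 and the rest above.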