
  • UVA CS 4501: Machine Learning

    Lecture 13: Logistic Regression

    Dr. Yanjun Qi

    University of Virginia
    Department of Computer Science

    3/23/18

    Dr. Yanjun Qi / UVA CS / s18

  • Where are we? → Five major sections of this course

    ❑ Regression (supervised)
    ❑ Classification (supervised)
    ❑ Unsupervised models
    ❑ Learning theory
    ❑ Graphical models

  • Where are we? → Three major types of classification

    • We can divide the large variety of classification approaches into roughly three major types:

    1. Discriminative: directly estimate a decision rule/boundary - e.g., logistic regression, support vector machine, decision tree
    2. Generative: build a generative statistical model - e.g., naïve Bayes classifier, Bayesian networks
    3. Instance-based classifiers: use observations directly (no models) - e.g., K-nearest neighbors

  • Today

    ❑ Bayes Classifier
    ❑ Logistic Regression
    ❑ Binary to multi-class
    ❑ Training LR by MLE

  • Bayes classifiers

    • Treat each feature attribute and the class label as random variables.

    • Given a sample x with attributes (x1, x2, …, xp):
      – Goal is to predict its class C.
      – Specifically, we want to find the value of Ci that maximizes p(Ci | x1, x2, …, xp).

    • Can we estimate p(Ci | x) = p(Ci | x1, x2, …, xp) directly from data?


  • Bayes Classifiers – MAP Rule

    Task: Classify a new instance X = (x1, x2, …, xp), a tuple of attribute values, into one of the classes cj ∈ C:

    cMAP = argmax_{cj ∈ C} P(cj | x1, x2, …, xp)

    MAP = Maximum A Posteriori Probability

    Adapted from Carlos' probability tutorial

    Please read the L13Extra slides for WHY
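    The MAP rule above can be sketched in a few lines: evaluate the posterior of every class for the sample and pick the argmax. The class names and posterior values below are made-up numbers for illustration, not from the lecture.

    ```python
    import numpy as np

    # Hypothetical posterior probabilities P(cj | x) for one test sample x,
    # over classes C = {c1, c2, c3}; the numbers are made up for illustration.
    classes = ["c1", "c2", "c3"]
    posteriors = np.array([0.2, 0.5, 0.3])  # must sum to 1

    # MAP rule: pick the class with the maximum a posteriori probability.
    c_map = classes[int(np.argmax(posteriors))]
    print(c_map)  # c2
    ```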

  • A Dataset for classification

    • Data/points/instances/examples/samples/records: [rows]
    • Features/attributes/dimensions/independent variables/covariates/predictors/regressors: [columns, except the last]
    • Target/outcome/response/label/dependent variable: special column to be predicted [last column]

    Output as Discrete Class Label: C ∈ {C1, C2, …, CL}

    Discriminative: argmax_{c ∈ C} P(c | X), C = {c1, …, cL}

  • Establishing a probabilistic model for classification

    – Discriminative model: x = (x1, x2, …, xp)

    [Figure: a discriminative probabilistic classifier takes the inputs x1, x2, …, xp and outputs P(c1 | x), P(c2 | x), …, P(cL | x)]

    argmax_{c ∈ C} P(c | X), C = {c1, …, cL}

    Adapted from Prof. Ke Chen's NB slides

  • A Dataset for classification

    (Same dataset view as before: samples as rows, features as columns, and a discrete class label C ∈ {C1, C2, …, CL} as the last column.)

    Discriminative: argmax_{c ∈ C} P(c | X), C = {c1, …, cL}

    Generative: argmax_{c ∈ C} P(c | X) = argmax_{c ∈ C} P(X, c) = argmax_{c ∈ C} P(X | c) P(c)

    Later!
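    The generative factorization can be checked numerically: since the evidence P(X) does not depend on the class, argmax over the joint P(X, c) = P(X | c)P(c) picks the same class as argmax over the normalized posterior P(c | X). The likelihoods and priors below are made-up numbers.

    ```python
    import numpy as np

    # Toy generative model over two classes for a single sample x;
    # P(x | c) and P(c) are made-up numbers for illustration.
    likelihood = np.array([0.05, 0.20])  # P(x | c1), P(x | c2)
    prior = np.array([0.7, 0.3])         # P(c1), P(c2)

    # Generative MAP rule: argmax_c P(x | c) P(c); the evidence P(x)
    # cancels, so we never need to normalize to pick the winning class.
    joint = likelihood * prior           # P(x, c)
    posterior = joint / joint.sum()      # P(c | x), for comparison
    assert np.argmax(joint) == np.argmax(posterior)
    print(int(np.argmax(joint)))  # 1 -> class c2
    ```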

  • Today

    ❑ Bayes Classifier
    ❑ Logistic Regression
    ❑ Binary to multi-class
    ❑ Training LR by MLE

  • Multivariate linear regression to Logistic Regression

    Multivariate linear regression:

    y = β0 + β1x1 + β2x2 + ... + βpxp

    Logistic regression for binary classification:

    ln[ P(y|x) / (1 − P(y|x)) ] = β0 + β1x1 + β2x2 + ... + βpxp

  • Review: Probabilistic Interpretation of Linear Regression

    • Let us assume that the target variable and the inputs are related by the equation:

      yᵢ = θᵀxᵢ + εᵢ

      where εᵢ is an error term of unmodeled effects or random noise.

    • Now assume that ε follows a Gaussian N(0, σ); then we have:

      p(yᵢ | xᵢ; θ) = (1 / (√(2π)σ)) exp( −(yᵢ − θᵀxᵢ)² / (2σ²) )

    • By the IID assumption:

      L(θ) = ∏_{i=1}^{n} p(yᵢ | xᵢ; θ) = (1 / (√(2π)σ))ⁿ exp( −∑_{i=1}^{n} (yᵢ − θᵀxᵢ)² / (2σ²) )
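    One consequence of this Gaussian view is that the log-likelihood is, up to constants, the negative sum of squared residuals, so the MLE coincides with the least-squares fit. A minimal numerical check, with made-up synthetic data and a made-up noise level σ = 0.1:

    ```python
    import numpy as np

    # Synthetic linear-regression data with Gaussian noise (made-up numbers).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 2))
    theta_true = np.array([2.0, -1.0])
    y = X @ theta_true + 0.1 * rng.normal(size=50)

    sigma = 0.1
    def log_likelihood(theta):
        # log L(theta) = -n log(sqrt(2*pi)*sigma) - sum of squared residuals / (2*sigma^2)
        resid = y - X @ theta
        n = len(y)
        return -n * np.log(np.sqrt(2 * np.pi) * sigma) - np.sum(resid**2) / (2 * sigma**2)

    # The least-squares solution maximizes the likelihood: any perturbation
    # of it has a lower (or equal) log-likelihood.
    theta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
    assert log_likelihood(theta_ls) >= log_likelihood(theta_ls + 0.05)
    assert log_likelihood(theta_ls) >= log_likelihood(theta_true + 0.2)
    print(np.round(theta_ls, 1))
    ```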

  • Logistic Regression p(y|x)

    P(y|x) = e^(β0 + β1x1 + β2x2 + ... + βpxp) / (1 + e^(β0 + β1x1 + β2x2 + ... + βpxp)) = 1 / (1 + e^(−βᵀX))

    ln[ P(y|x) / (1 − P(y|x)) ] = β0 + β1x1 + β2x2 + ... + βpxp
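    The two closed forms of P(y|x), and the fact that the logit recovers the linear part, can be verified directly. The β vector and sample x below are made-up numbers.

    ```python
    import numpy as np

    # Made-up coefficients and one sample; x starts with 1 for the intercept.
    beta = np.array([0.5, -1.0, 2.0])   # beta0, beta1, beta2
    x = np.array([1.0, 0.3, -0.7])

    z = beta @ x
    p_form1 = np.exp(z) / (1 + np.exp(z))  # e^z / (1 + e^z)
    p_form2 = 1 / (1 + np.exp(-z))         # 1 / (1 + e^(-z))
    assert np.isclose(p_form1, p_form2)    # the two forms are identical

    # logit(p) = ln(p / (1 - p)) recovers the linear predictor beta^T x.
    logit = np.log(p_form1 / (1 - p_form1))
    assert np.isclose(logit, z)
    print(round(float(p_form1), 3))
    ```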

  • Logistic Regression models a linear classification boundary!

    y ∈ {0, 1}

    ln[ P(y|x) / (1 − P(y|x)) ] = β0 + β1x1 + β2x2 + ... + βpxp

  • Logistic Regression models a linear classification boundary!

    y ∈ {0, 1}

    ln[ P(y|x) / (1 − P(y|x)) ] = α + β1x1 + β2x2 + ... + βpxp

    [Figure: points on either side of the boundary, classified by whether P(y|x) is above or below 0.5]

  • The logistic function (1) -- is a common "S" shape function

    [Figure: S-shaped curve of P(Y=1|X), e.g. probability of disease, rising from 0.0 to 1.0 as x increases]

    P(y|x) = e^(α+βx) / (1 + e^(α+βx))
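    The "S" shape can be sanity-checked numerically: the logistic function is monotone increasing, stays strictly between 0 and 1, and is symmetric about 0.5. A minimal sketch (the grid of z values is an arbitrary choice):

    ```python
    import numpy as np

    def sigma(z):
        # Logistic function 1 / (1 + e^(-z)).
        return 1 / (1 + np.exp(-z))

    z = np.linspace(-6, 6, 101)
    p = sigma(z)
    assert np.all(np.diff(p) > 0)          # monotone increasing
    assert np.all((p > 0) & (p < 1))       # bounded in (0, 1)
    assert np.allclose(sigma(-z), 1 - p)   # symmetry: sigma(-z) = 1 - sigma(z)
    print(round(float(sigma(0)), 2))  # 0.5
    ```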



  • Logistic Regression -- when?

    Logistic regression models are appropriate when the target variable is coded as 0/1.

    We only observe "0" and "1" for the target variable, but we think of the target conceptually as the probability that "1" will occur.

    This means we use a Bernoulli distribution to model the target variable, with Bernoulli parameter p = p(y=1|x).

    The main interest → predicting the probability that an event occurs, i.e., p(y=1|x).
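    Under this Bernoulli view, the likelihood of one observed label is p^y (1−p)^(1−y): it equals p when y = 1 and 1−p when y = 0. A tiny sketch with made-up model probabilities and labels:

    ```python
    import numpy as np

    # Made-up model outputs P(y=1|x) and observed 0/1 labels for 3 samples.
    p = np.array([0.9, 0.2, 0.7])
    y = np.array([1, 0, 1])

    # Bernoulli likelihood of each observed label: p^y * (1-p)^(1-y).
    per_sample = p**y * (1 - p)**(1 - y)
    log_lik = np.sum(np.log(per_sample))   # log-likelihood used for training
    assert np.allclose(per_sample, [0.9, 0.8, 0.7])
    print(round(float(log_lik), 3))
    ```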

  • The logit function view

    Logit of P(y|x):

    ln[ P(y|x) / (1 − P(y|x)) ] = α + βx   ⟺   P(y|x) = e^(α+βx) / (1 + e^(α+βx))

    Logit function:

    ln[ P(y=1|x) / P(y=0|x) ] = ln[ P(y=1|x) / (1 − P(y=1|x)) ] = α + β1x1 + β2x2 + ... + βpxp

    Decision boundary → where the logit equals zero
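    The boundary claim can be checked directly: predicting y = 1 when P(y|x) > 0.5 is exactly the same rule as predicting y = 1 when the logit α + βᵀx > 0, so the boundary is the hyperplane where the logit is zero. The α, β, and test points below are made-up numbers.

    ```python
    import numpy as np

    # Made-up parameters of a 2-feature logistic model.
    alpha, beta = -1.0, np.array([2.0, 0.5])

    def p_y1(x):
        # P(y=1|x) = 1 / (1 + e^(-(alpha + beta^T x)))
        return 1 / (1 + np.exp(-(alpha + beta @ x)))

    # The threshold P > 0.5 and the threshold logit > 0 always agree,
    # including a point lying exactly on the boundary (z = 0).
    for x in [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([0.25, 1.0])]:
        z = alpha + beta @ x
        assert (p_y1(x) > 0.5) == (z > 0)
    print("boundary is the hyperplane alpha + beta^T x = 0")
    ```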

  • Logistic Regression Assumptions

    • Linearity in the logit: the regression equation should have a linear relationship with the logit form of the target variable.

    • There is no assumption about the feature variables (the predictors) being linearly related to each other.

  • Today

    ❑ Bayes Classifier
    ❑ Logistic Regression
    ❑ Binary to multi-class
    ❑ Training LR by MLE

  • Binary Logistic Regression

    In summary, logistic regression tells us two things at once:

    • Transformed, the "log odds" (logit) are linear in the features.
    • The probability P(y=1|x) itself follows the S-shaped logistic curve.
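    The agenda's last item, training LR by MLE, can be sketched as gradient ascent on the Bernoulli log-likelihood, whose gradient has the simple form Xᵀ(y − p). The synthetic data, step size, and iteration count below are made-up choices, not the lecture's setup.

    ```python
    import numpy as np

    # Synthetic binary-classification data (made-up): an intercept column
    # plus 2 Gaussian features, labels drawn from the logistic model.
    rng = np.random.default_rng(1)
    X = np.c_[np.ones(200), rng.normal(size=(200, 2))]
    beta_true = np.array([0.5, 2.0, -1.0])
    y = (rng.random(200) < 1 / (1 + np.exp(-(X @ beta_true)))).astype(float)

    # MLE by gradient ascent on the log-likelihood
    # sum_i [ y_i log p_i + (1 - y_i) log(1 - p_i) ].
    beta = np.zeros(3)
    for _ in range(2000):
        p = 1 / (1 + np.exp(-(X @ beta)))   # P(y=1|x) under current beta
        beta += 0.01 * (X.T @ (y - p))      # gradient of the log-likelihood
    print(np.round(beta, 1))
    ```

    In practice one would use a second-order method (Newton/IRLS) or an off-the-shelf solver; the loop above just makes the MLE objective concrete.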