Transcript
Page 1: Logistic Regression - University of Colorado Boulder · Logistic Regression INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber ... Undiscovered Country Parameter

Logistic Regression

INFO-2301: Quantitative Reasoning 2
Michael Paul and Jordan Boyd-Graber
SLIDES ADAPTED FROM HINRICH SCHÜTZE

INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Logistic Regression | 1 of 5


What are we talking about?

• Statistical classification: $p(y \mid x)$

• y is typically a Bernoulli or multinomial outcome

• Classification uses: ad placement, spam detection

• Building block of other machine learning methods


Logistic Regression: Definition

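• Weight vector $\beta_i$

• Observations $X_i$

• “Bias” $\beta_0$ (like intercept in linear regression)

$$P(Y = 0 \mid X) = \frac{1}{1 + \exp\left[\beta_0 + \sum_i \beta_i X_i\right]} \tag{1}$$

$$P(Y = 1 \mid X) = \frac{\exp\left[\beta_0 + \sum_i \beta_i X_i\right]}{1 + \exp\left[\beta_0 + \sum_i \beta_i X_i\right]} \tag{2}$$

• For shorthand, we’ll say that

$$P(Y = 0 \mid X) = \sigma\left(-\left(\beta_0 + \sum_i \beta_i X_i\right)\right) \tag{3}$$

$$P(Y = 1 \mid X) = 1 - \sigma\left(-\left(\beta_0 + \sum_i \beta_i X_i\right)\right) \tag{4}$$

• Where $\sigma(z) = \frac{1}{1 + \exp[-z]}$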
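The definition above can be sketched in a few lines of Python. This is only an illustration: the function names `sigmoid` and `class_probabilities` and the list-based inputs are assumptions made for the example, not part of the slides.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def class_probabilities(bias: float, weights: list[float], x: list[float]) -> tuple[float, float]:
    """Return (P(Y=0|X), P(Y=1|X)) using the shorthand in equations (3) and (4)."""
    # Linear score: beta_0 + sum_i beta_i * X_i
    score = bias + sum(w * xi for w, xi in zip(weights, x))
    p0 = sigmoid(-score)  # equation (3)
    p1 = 1.0 - p0         # equation (4)
    return p0, p1


# Hypothetical input: two features, both present.
p0, p1 = class_probabilities(0.1, [2.0, -1.0], [1.0, 1.0])
print(p0, p1)
```

Because the second probability is computed as one minus the first, the two always sum to 1.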


What’s this “exp” doing?

[Figure: two panels titled “Exponential” and “Logistic”]

• exp[x] is shorthand for $e^x$

• e is a special number, about 2.71828

◦ $e^x$ is the limit of the compound interest formula as the compounding periods become infinitely small

◦ It’s the function whose derivative is itself

• The “logistic” function is $\sigma(z) = \frac{1}{1 + e^{-z}}$

• Looks like an “S”

• Always between 0 and 1.

◦ Allows us to model probabilities

◦ Different from linear regression
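The two claims above (e as a compound interest limit, and the logistic function staying between 0 and 1) can be checked numerically. The sketch below is illustrative only; the helper names `compound` and `sigmoid` and the chosen sample points are assumptions.

```python
import math


def compound(n: int) -> float:
    """Value of 1 unit after one period at 100% interest, compounded n times."""
    return (1.0 + 1.0 / n) ** n


def sigmoid(z: float) -> float:
    """Logistic function sigma(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


# As n grows, the compounded value approaches math.e.
for n in (1, 10, 1000, 1_000_000):
    print(n, compound(n), math.e)

# The logistic function increases with z but never leaves the interval (0, 1).
for z in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(z, sigmoid(z))
```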


Logistic Regression Example

feature      coefficient   weight
bias         $\beta_0$     0.1
“viagra”     $\beta_1$     2.0
“mother”     $\beta_2$     −1.0
“work”       $\beta_3$     −0.5
“nigeria”    $\beta_4$     3.0

• What does Y = 1 mean?

Example 1: Empty Document?

X = {}

• $P(Y = 0) = \frac{1}{1 + \exp[0.1]} = 0.48$

• $P(Y = 1) = \frac{\exp[0.1]}{1 + \exp[0.1]} = 0.52$

• Bias $\beta_0$ encodes the prior probability of a class


Example 2

X = {Mother, Nigeria}

• $P(Y = 0) = \frac{1}{1 + \exp[0.1 - 1.0 + 3.0]} = 0.11$

• $P(Y = 1) = \frac{\exp[0.1 - 1.0 + 3.0]}{1 + \exp[0.1 - 1.0 + 3.0]} = 0.89$

• Include bias, and sum the other weights


Example 3

X = {Mother, Work, Viagra, Mother}

• $P(Y = 0) = \frac{1}{1 + \exp[0.1 - 1.0 - 0.5 + 2.0 - 1.0]} = 0.60$

• $P(Y = 1) = \frac{\exp[0.1 - 1.0 - 0.5 + 2.0 - 1.0]}{1 + \exp[0.1 - 1.0 - 0.5 + 2.0 - 1.0]} = 0.40$

• Multiply feature presence by weight
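The arithmetic in the three examples can be reproduced with a short script. This is a sketch under stated assumptions: the names `WEIGHTS`, `BIAS`, and `example_probabilities` are made up for illustration, words are lowercased, a word outside the table is assumed to contribute weight 0, and a repeated word is counted once per occurrence (as with “mother” in Example 3).

```python
import math

# Weights and bias copied from the feature table.
WEIGHTS = {"viagra": 2.0, "mother": -1.0, "work": -0.5, "nigeria": 3.0}
BIAS = 0.1


def example_probabilities(words: list[str]) -> tuple[float, float]:
    """Return (P(Y=0), P(Y=1)) for a document given as a list of lowercase words."""
    # Include the bias, then add the weight of every word occurrence.
    score = BIAS + sum(WEIGHTS.get(w, 0.0) for w in words)
    p1 = math.exp(score) / (1.0 + math.exp(score))
    return 1.0 - p1, p1


documents = [
    [],                                     # Example 1: empty document
    ["mother", "nigeria"],                  # Example 2
    ["mother", "work", "viagra", "mother"], # Example 3
]
for doc in documents:
    p0, p1 = example_probabilities(doc)
    print(doc, round(p0, 2), round(p1, 2))
```

For every document the two probabilities sum to 1, which is a quick sanity check on any hand calculation.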


Logistic Regression: Objective Function

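$$\ell \equiv \ln p(Y \mid X, \beta) = \sum_j \ln p\left(y^{(j)} \mid x^{(j)}, \beta\right) \tag{1}$$

$$= \sum_j \left[ y^{(j)} \left(\beta_0 + \sum_i \beta_i x_i^{(j)}\right) - \ln\left(1 + \exp\left[\beta_0 + \sum_i \beta_i x_i^{(j)}\right]\right) \right] \tag{2}$$

Training data (y, x) are fixed. Objective function is a function of β: what values of β give a good value?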
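As a rough illustration of equation (2), the objective can be written as a function of the parameters with the data held fixed. The function name `log_likelihood` and the tiny dataset below are assumptions invented for this sketch.

```python
import math


def log_likelihood(beta0: float, beta: list[float],
                   xs: list[list[float]], ys: list[int]) -> float:
    """Log likelihood of labels ys given feature vectors xs, following equation (2)."""
    total = 0.0
    for x, y in zip(xs, ys):
        score = beta0 + sum(b * xi for b, xi in zip(beta, x))
        # y * score - ln(1 + exp(score)) is ln p(y | x, beta) for y in {0, 1}.
        total += y * score - math.log(1.0 + math.exp(score))
    return total


# Hypothetical training data: two features, three observations.
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ys = [1, 0, 1]

# The data stay fixed; only the parameters change between these two calls.
print(log_likelihood(0.0, [0.0, 0.0], xs, ys))
print(log_likelihood(0.0, [1.0, -1.0], xs, ys))
```

With all parameters at zero every observation gets probability 1/2, so the value is the number of observations times ln(1/2); parameters that fit the labels better give a larger (less negative) value.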


Convexity

• Convex function (strictly, the log likelihood is concave; its negative is convex)

• Doesn’t matter where you start, if you “walk up” the objective

• Gradient!


Gradient Ascent (non-convex)

Goal

Optimize log likelihood with respect to variables β

[Figure: plot with axes labeled “Parameter” and “Objective”; points labeled 0, 1, 2, 3 and a region labeled “Undiscovered Country”]

Luckily, (vanilla) logistic regression is convex


Logistic Regression

INFO-2301: Quantitative Reasoning 2
Michael Paul and Jordan Boyd-Graber
SLIDES ADAPTED FROM WILLIAM COHEN


Gradient for Logistic Regression

To ease notation, let’s define

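$$\pi_i = \frac{\exp\left(\beta^T x_i\right)}{1 + \exp\left(\beta^T x_i\right)} \tag{1}$$

Our objective function is

$$\ell = \sum_i \log p(y_i \mid x_i) = \sum_i \ell_i = \sum_i \begin{cases} \log \pi_i & \text{if } y_i = 1 \\ \log(1 - \pi_i) & \text{if } y_i = 0 \end{cases} \tag{2}$$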


Taking the Derivative

Apply chain rule:

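$$\frac{\partial \ell}{\partial \beta_j} = \sum_i \frac{\partial \ell_i(\vec{\beta})}{\partial \beta_j} = \sum_i \begin{cases} \frac{1}{\pi_i} \frac{\partial \pi_i}{\partial \beta_j} & \text{if } y_i = 1 \\ -\frac{1}{1 - \pi_i} \frac{\partial \pi_i}{\partial \beta_j} & \text{if } y_i = 0 \end{cases} \tag{3}$$

If we plug in the derivative,

$$\frac{\partial \pi_i}{\partial \beta_j} = \pi_i (1 - \pi_i) x_{i,j}, \tag{4}$$

we can merge these two cases

$$\frac{\partial \ell_i}{\partial \beta_j} = (y_i - \pi_i) x_{i,j}. \tag{5}$$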
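One way to gain confidence in the merged form is to compare it against a finite-difference approximation of the per-observation log likelihood. The sketch below assumes the bias is folded into β by giving every observation a constant first feature of 1; all function names and the sample numbers are hypothetical.

```python
import math


def pi(beta: list[float], x: list[float]) -> float:
    """pi_i = exp(beta^T x_i) / (1 + exp(beta^T x_i))."""
    s = sum(b * xj for b, xj in zip(beta, x))
    return math.exp(s) / (1.0 + math.exp(s))


def log_lik_i(beta: list[float], x: list[float], y: int) -> float:
    """Per-observation log likelihood: log pi_i if y = 1, else log(1 - pi_i)."""
    p = pi(beta, x)
    return math.log(p) if y == 1 else math.log(1.0 - p)


def analytic_grad(beta: list[float], x: list[float], y: int) -> list[float]:
    """Merged gradient: (y_i - pi_i) * x_{i,j} for each coordinate j."""
    p = pi(beta, x)
    return [(y - p) * xj for xj in x]


def numeric_grad(beta: list[float], x: list[float], y: int, h: float = 1e-6) -> list[float]:
    """Central finite differences of log_lik_i, one coordinate at a time."""
    grad = []
    for j in range(len(beta)):
        up = beta[:j] + [beta[j] + h] + beta[j + 1:]
        down = beta[:j] + [beta[j] - h] + beta[j + 1:]
        grad.append((log_lik_i(up, x, y) - log_lik_i(down, x, y)) / (2.0 * h))
    return grad


beta = [0.1, 2.0, -1.0]   # hypothetical parameters; beta[0] plays the role of the bias
x = [1.0, 0.5, 3.0]       # hypothetical observation; x[0] = 1 is the constant feature
for y in (0, 1):
    print(y, analytic_grad(beta, x, y), numeric_grad(beta, x, y))
```

The two gradients should agree to several decimal places for both label values, which is what the single merged expression promises.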


Gradient for Logistic Regression

Gradient

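$$\nabla_\beta \ell(\vec{\beta}) = \left[ \frac{\partial \ell(\vec{\beta})}{\partial \beta_0}, \ldots, \frac{\partial \ell(\vec{\beta})}{\partial \beta_n} \right] \tag{6}$$

Update

$$\Delta\beta \equiv \eta \nabla_\beta \ell(\vec{\beta}) \tag{7}$$

$$\beta'_i \leftarrow \beta_i + \eta \frac{\partial \ell(\vec{\beta})}{\partial \beta_i} \tag{8}$$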


Why are we adding? What would we do if we wanted to do descent?


η: step size, must be greater than zero


NB: Conjugate gradient is usually better, but harder to implement
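A minimal sketch of the update rule as full-batch gradient ascent follows. It adds η times the gradient because the goal is to increase the log likelihood; a descent method would instead subtract the gradient of a quantity to be minimized. The function names, the toy data, the step size 0.1, and the number of steps are all assumptions chosen for illustration, and the bias is folded in as a constant first feature.

```python
import math


def predict(beta: list[float], x: list[float]) -> float:
    """P(y = 1 | x) under the current parameters."""
    s = sum(b * xj for b, xj in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-s))


def gradient(beta, xs, ys):
    """Gradient of the log likelihood: sum over observations of (y_i - pi_i) * x_i."""
    grad = [0.0] * len(beta)
    for x, y in zip(xs, ys):
        err = y - predict(beta, x)
        for j, xj in enumerate(x):
            grad[j] += err * xj
    return grad


def log_likelihood(beta, xs, ys):
    """Sum of log pi_i for y_i = 1 and log(1 - pi_i) for y_i = 0."""
    return sum(math.log(predict(beta, x)) if y == 1 else math.log(1.0 - predict(beta, x))
               for x, y in zip(xs, ys))


def gradient_ascent(xs, ys, eta=0.1, steps=2000):
    """Repeat beta <- beta + eta * gradient, starting from all zeros (eta > 0)."""
    beta = [0.0] * len(xs[0])
    for _ in range(steps):
        grad = gradient(beta, xs, ys)
        beta = [b + eta * g for b, g in zip(beta, grad)]
    return beta


# Hypothetical data: x[0] = 1 is the constant feature for the bias.
xs = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
ys = [0, 1, 0, 1]
beta = gradient_ascent(xs, ys)
print(beta, log_likelihood(beta, xs, ys))
```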


Choosing Step Size

[Figure: plot with axes labeled “Parameter” and “Objective”]


Approximating the Gradient

• Our datasets are too big to fit into memory

• . . . or data are changing / streaming

• Hard to compute true gradient

$$\nabla \ell(\beta) \equiv \mathbb{E}_x \left[ \nabla \ell(\beta, x) \right] \tag{9}$$

• Average over all observations

• What if we compute an update just from one observation?
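To make the idea concrete, the sketch below computes the gradient contribution of each observation, averages them, and then picks one observation at random as a cheap stand-in for that average. The data, the parameter values, the seed, and the names are all hypothetical.

```python
import math
import random


def predict(beta: list[float], x: list[float]) -> float:
    """P(y = 1 | x) under the current parameters."""
    s = sum(b * xj for b, xj in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-s))


def example_gradient(beta, x, y):
    """Gradient contribution of one observation: (y - pi) * x."""
    err = y - predict(beta, x)
    return [err * xj for xj in x]


# Hypothetical data and parameters; x[0] = 1 is a constant feature for the bias.
xs = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
ys = [0, 1, 0, 1]
beta = [0.2, -0.1]

per_example = [example_gradient(beta, x, y) for x, y in zip(xs, ys)]

# Average over all observations.
mean_gradient = [sum(g[j] for g in per_example) / len(per_example)
                 for j in range(len(beta))]

# Gradient from a single randomly chosen observation: noisy, but cheap to compute.
rng = random.Random(0)
single = per_example[rng.randrange(len(xs))]

print("average:", mean_gradient)
print("single :", single)
```

Any one observation may point in a somewhat different direction from the average, but picking observations uniformly at random gives the average in expectation.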


Getting to Union Station

Pretend it’s a pre-smartphone world and you want to get to Union Station


Stochastic Gradient for Logistic Regression

Given a single observation xi chosen at random from the dataset,

$$\beta'_j \leftarrow \beta_j + \eta \left[ y_i - \pi_i \right] x_{i,j} \tag{10}$$

Examples in class.


Algorithm

1 Initialize a vector β to be all zeros

2 For t = 1, . . . , T

◦ For each example $(\vec{x}_i, y_i)$ and feature j:

• Compute $\pi_i \equiv \Pr(y_i = 1 \mid \vec{x}_i)$

• Set $\beta[j] \leftarrow \beta[j] + \eta (y_i - \pi_i) x_{i,j}$

3 Output the parameters $\beta_1, \ldots, \beta_d$.
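The steps above can be sketched as a short training loop. This is one possible reading, not a definitive implementation: the function name, the default step size and number of passes, the per-pass shuffling, the fixed seed, and the toy data are all assumptions, and the bias is handled as a constant first feature.

```python
import math
import random


def predict(beta: list[float], x: list[float]) -> float:
    """pi = Pr(y = 1 | x) under the current parameters."""
    s = sum(b * xj for b, xj in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-s))


def sgd_logistic_regression(xs, ys, eta=0.1, passes=500, seed=0):
    """Stochastic gradient ascent on the log likelihood, one example per update."""
    rng = random.Random(seed)
    beta = [0.0] * len(xs[0])           # step 1: initialize to all zeros
    order = list(range(len(xs)))
    for _ in range(passes):             # step 2: for t = 1, ..., T
        rng.shuffle(order)              # assumption: visit examples in random order
        for i in order:
            p = predict(beta, xs[i])    # compute pi_i once, before touching beta
            for j, xij in enumerate(xs[i]):
                beta[j] += eta * (ys[i] - p) * xij
    return beta                         # step 3: output the parameters


# Hypothetical data: x[0] = 1 is the constant feature; larger x[1] tends to mean y = 1.
xs = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]]
ys = [0, 0, 1, 0, 1, 1]
beta = sgd_logistic_regression(xs, ys)
print(beta, predict(beta, [1.0, 0.0]), predict(beta, [1.0, 5.0]))
```

Computing the probability once per example, before any coordinate is updated, keeps every feature's update based on the same prediction.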


Wrapup

• Logistic Regression: Regression for outputting Probabilities

• Intuitions similar to linear regression

• We’ll talk about feature engineering for both next time
