
ECS171: Machine Learning
Lecture 4: Optimization (LFD 3.3, SGD)

Cho-Jui Hsieh, UC Davis

Jan 22, 2018


Gradient descent


Optimization

Goal: find the minimizer of a function

$\min_w f(w)$

For now we assume $f$ is twice differentiable.

Machine learning algorithm: find the hypothesis that minimizes $E_{in}$.


Convex vs Nonconvex

Convex function:
$\nabla f(w^*) = 0 \Leftrightarrow w^*$ is a global minimum.
A function is convex if $\nabla^2 f(w)$ is positive definite.
Examples: linear regression, logistic regression, ...

Non-convex function:
$\nabla f(w^*) = 0 \Leftrightarrow w^*$ is a global minimum, a local minimum, or a saddle point.
Most algorithms only converge to a point with gradient $= 0$.
Examples: neural networks, ...
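As a one-dimensional illustration (my own example, not from the slides): a stationary point of a convex function is the global minimum, while a stationary point of a non-convex function need not be.

```latex
% Convex: the unique stationary point is the global minimum.
f(w) = w^2, \quad \nabla f(w) = 2w, \quad \nabla^2 f(w) = 2 > 0,
\quad \nabla f(w^*) = 0 \;\Rightarrow\; w^* = 0 \text{ (global minimum)}.

% Non-convex: the stationary point is a saddle (inflection) point, not a minimum.
f(w) = w^3, \quad \nabla f(w) = 3w^2, \quad \nabla f(0) = 0,
\quad \nabla^2 f(w) = 6w \text{ changes sign at } w = 0.
```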


Gradient Descent

Gradient descent: repeatedly do

$w^{t+1} \leftarrow w^t - \alpha \nabla f(w^t)$

$\alpha > 0$ is the step size.

This generates a sequence $w^1, w^2, \cdots$ that converges to a solution with $\nabla f(w) = 0$.

Step size too large ⇒ diverge; too small ⇒ slow convergence
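A minimal sketch of this update rule in Python, on the toy quadratic $f(w) = \frac{1}{2}\|w\|^2$ whose gradient is $w$; the function, step size, and iteration count here are my own illustrative choices, not values from the lecture.

```python
import numpy as np

def gradient_descent(grad, w0, alpha=0.1, num_iters=100):
    """Repeatedly apply the update w <- w - alpha * grad(w)."""
    w = w0.copy()
    for _ in range(num_iters):
        w = w - alpha * grad(w)
    return w

# Toy example: f(w) = 0.5 * ||w||^2, so grad f(w) = w and the minimizer is 0.
w_final = gradient_descent(grad=lambda w: w, w0=np.array([3.0, -2.0]), alpha=0.1)
print(w_final)  # close to [0, 0]; a much larger alpha (e.g. 2.5) would diverge
```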


Why gradient descent?

Reason I: The gradient is the "steepest direction" to decrease the objective function locally.

Reason II: Successive approximation view.

At each iteration, form an approximation of $f(\cdot)$:

$f(w^t + d) \approx g(d) := f(w^t) + \nabla f(w^t)^T d + \frac{1}{2\alpha}\|d\|^2$

Update the solution by $w^{t+1} \leftarrow w^t + d^*$, where $d^* = \arg\min_d g(d)$:

$\nabla g(d^*) = 0 \;\Rightarrow\; \nabla f(w^t) + \frac{1}{\alpha} d^* = 0 \;\Rightarrow\; d^* = -\alpha \nabla f(w^t)$

$d^*$ will decrease $f(\cdot)$ if $\alpha$ (the step size) is sufficiently small.


Illustration of gradient descent

Form a quadratic approximation:

$f(w^t + d) \approx g(d) = f(w^t) + \nabla f(w^t)^T d + \frac{1}{2\alpha}\|d\|^2$

Minimize $g(d)$:

$\nabla g(d^*) = 0 \;\Rightarrow\; \nabla f(w^t) + \frac{1}{\alpha} d^* = 0 \;\Rightarrow\; d^* = -\alpha \nabla f(w^t)$

Update:

$w^{t+1} = w^t + d^* = w^t - \alpha \nabla f(w^t)$

Form another quadratic approximation:

$f(w^{t+1} + d) \approx g(d) = f(w^{t+1}) + \nabla f(w^{t+1})^T d + \frac{1}{2\alpha}\|d\|^2, \qquad d^* = -\alpha \nabla f(w^{t+1})$

Update:

$w^{t+2} = w^{t+1} + d^* = w^{t+1} - \alpha \nabla f(w^{t+1})$


When will it diverge?

It can diverge ($f(w^t) < f(w^{t+1})$) if $g$ is not an upper bound of $f$.


When will it converge?

It always converges ($f(w^t) > f(w^{t+1})$) when $g$ is an upper bound of $f$.


Convergence

Let $L$ be the Lipschitz constant of the gradient ($\nabla^2 f(x) \preceq LI$ for all $x$).

Theorem: gradient descent converges if $\alpha < \frac{1}{L}$.

In practice we do not know $L$, so the step size needs to be tuned when running gradient descent.
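A short standard argument (not spelled out on the slide) for why $\alpha \le \frac{1}{L}$ makes the quadratic model $g$ from the previous slides an upper bound of $f$, so that a gradient step cannot increase the objective:

```latex
% L-smoothness (\nabla^2 f \preceq LI) gives the quadratic upper bound
f(w^t + d) \le f(w^t) + \nabla f(w^t)^T d + \tfrac{L}{2}\|d\|^2
          \le f(w^t) + \nabla f(w^t)^T d + \tfrac{1}{2\alpha}\|d\|^2 = g(d)
\quad \text{whenever } \alpha \le \tfrac{1}{L}.

% Plugging in the minimizer d^* = -\alpha \nabla f(w^t):
f(w^{t+1}) \le g(d^*) = f(w^t) - \tfrac{\alpha}{2}\,\|\nabla f(w^t)\|^2 \le f(w^t).
```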


Applying to Logistic regression

Gradient descent for logistic regression:

Initialize the weights $w^0$
For $t = 1, 2, \cdots$:
  Compute the gradient
  $\nabla E_{in} = -\frac{1}{N}\sum_{n=1}^{N} \frac{y_n x_n}{1 + e^{y_n w^T x_n}}$
  Update the weights: $w \leftarrow w - \eta \nabla E_{in}$
Return the final weights $w$

When to stop?

Fixed number of iterations, or
stop when $\|\nabla E_{in}\| < \epsilon$
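A sketch of this loop in Python (NumPy), using the gradient formula above; the synthetic data, step size $\eta$, and tolerance $\epsilon$ are illustrative choices of mine, not values from the lecture.

```python
import numpy as np

def logistic_gd(X, y, eta=0.1, max_iters=1000, eps=1e-6):
    """Gradient descent for logistic regression.
    X: (N, d) feature matrix, y: (N,) labels in {-1, +1}."""
    N, d = X.shape
    w = np.zeros(d)                      # initialize the weights w0
    for t in range(max_iters):
        # grad E_in = -(1/N) * sum_n y_n x_n / (1 + exp(y_n w^T x_n))
        margins = y * (X @ w)
        grad = -(X.T @ (y / (1.0 + np.exp(margins)))) / N
        if np.linalg.norm(grad) < eps:   # stop when ||grad E_in|| < eps
            break
        w = w - eta * grad               # w <- w - eta * grad E_in
    return w

# Tiny synthetic example (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.sign(X @ np.array([1.5, -2.0]))
w = logistic_gd(X, y)
print(w)
```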


Stochastic Gradient descent


Large-scale Problems

Machine learning: usually minimizing the in-sample loss (training loss)

$\min_w \Big\{\frac{1}{N}\sum_{n=1}^{N} \ell(w^T x_n, y_n)\Big\} := E_{in}(w)$ (linear model)

$\min_w \Big\{\frac{1}{N}\sum_{n=1}^{N} \ell(h_w(x_n), y_n)\Big\} := E_{in}(w)$ (general hypothesis)

$\ell$: loss function (e.g., $\ell(a, b) = (a - b)^2$)

Gradient descent: $w \leftarrow w - \eta \underbrace{\nabla E_{in}(w)}_{\text{main computation}}$

In general, $E_{in}(w) = \frac{1}{N}\sum_{n=1}^{N} f_n(w)$, where each $f_n(w)$ only depends on $(x_n, y_n)$.


Stochastic gradient

Gradient:

$\nabla E_{in}(w) = \frac{1}{N}\sum_{n=1}^{N} \nabla f_n(w)$

Each gradient computation needs to go through all training samples: slow when there are millions of samples.

Faster way to compute an "approximate gradient"?

Use stochastic sampling:
Sample a small subset $B \subseteq \{1, \cdots, N\}$ and estimate the gradient by

$\nabla E_{in}(w) \approx \frac{1}{|B|}\sum_{n \in B} \nabla f_n(w)$

$|B|$: batch size


Stochastic gradient descent

Stochastic Gradient Descent (SGD)

Input: training data $\{x_n, y_n\}_{n=1}^{N}$
Initialize $w$ (zero or random)
For $t = 1, 2, \cdots$:
  Sample a small batch $B \subseteq \{1, \cdots, N\}$
  Update the parameters:
  $w \leftarrow w - \eta_t \frac{1}{|B|}\sum_{n \in B} \nabla f_n(w)$

Extreme case: $|B| = 1$ $\Rightarrow$ sample one training example at a time.


Logistic Regression by SGD

Logistic regression:

$\min_w \frac{1}{N}\sum_{n=1}^{N} \underbrace{\log(1 + e^{-y_n w^T x_n})}_{f_n(w)}$

SGD for Logistic Regression

Input: training data $\{x_n, y_n\}_{n=1}^{N}$
Initialize $w$ (zero or random)
For $t = 1, 2, \cdots$:
  Sample a batch $B \subseteq \{1, \cdots, N\}$
  Update the parameters:
  $w \leftarrow w - \eta_t \frac{1}{|B|}\sum_{n \in B} \underbrace{\frac{-y_n x_n}{1 + e^{y_n w^T x_n}}}_{\nabla f_n(w)}$
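A minimal sketch of this SGD loop in Python (NumPy), reusing the per-sample gradient $\nabla f_n(w)$ from the slide; the batch size, step-size schedule, and usage are illustrative assumptions of mine.

```python
import numpy as np

def logistic_sgd(X, y, batch_size=10, eta0=0.5, num_iters=2000, seed=0):
    """SGD for logistic regression with labels y in {-1, +1}."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    w = np.zeros(d)                                         # initialize w (zero or random)
    for t in range(1, num_iters + 1):
        B = rng.choice(N, size=batch_size, replace=False)   # sample a batch B
        Xb, yb = X[B], y[B]
        # (1/|B|) * sum_{n in B} of -y_n x_n / (1 + exp(y_n w^T x_n))
        grad = -(Xb.T @ (yb / (1.0 + np.exp(yb * (Xb @ w))))) / batch_size
        eta_t = eta0 / np.sqrt(t)                            # decreasing step size (see later slides)
        w = w - eta_t * grad
    return w

# Usage: w = logistic_sgd(X, y) with X of shape (N, d) and y in {-1, +1}
```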


Why does SGD work?

The stochastic gradient is an unbiased estimator of the full gradient:

$E\Big[\frac{1}{|B|}\sum_{n \in B} \nabla f_n(w)\Big] = \frac{1}{N}\sum_{n=1}^{N} \nabla f_n(w) = \nabla E_{in}(w)$

Each iteration updates by

gradient + zero-mean noise
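A quick numerical illustration of this unbiasedness (a Monte Carlo check on random data of my own, not from the lecture): averaging many minibatch gradient estimates approaches the full gradient.

```python
import numpy as np

rng = np.random.default_rng(1)
N, d, batch_size = 500, 3, 20
X = rng.normal(size=(N, d))
y = np.sign(rng.normal(size=N))
w = rng.normal(size=d)

def grad_fn(idx):
    """Average over the chosen samples of the logistic per-sample gradient."""
    Xb, yb = X[idx], y[idx]
    return -(Xb.T @ (yb / (1.0 + np.exp(yb * (Xb @ w))))) / len(idx)

full_grad = grad_fn(np.arange(N))
estimates = [grad_fn(rng.choice(N, batch_size, replace=False)) for _ in range(10000)]
print(full_grad)
print(np.mean(estimates, axis=0))  # close to full_grad: the stochastic gradient is unbiased
```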


Stochastic gradient descent

In gradient descent, $\eta$ (the step size) is a fixed constant.

Can we use a fixed step size for SGD?

SGD with a fixed step size cannot converge to a global/local minimizer.

If $w^*$ is the minimizer, then $\nabla f(w^*) = \frac{1}{N}\sum_{n=1}^{N} \nabla f_n(w^*) = 0$,
but $\frac{1}{|B|}\sum_{n \in B} \nabla f_n(w^*) \neq 0$ if $B$ is a proper subset.

(Even if we reach the minimizer, SGD will move away from it.)


Stochastic gradient descent, step size

To make SGD converge, the step size should decrease to 0:

$\eta_t \rightarrow 0$

Usually with a polynomial rate: $\eta_t \approx t^{-a}$ for some constant $a$.
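For instance, a polynomially decaying schedule could be written as below; the constants $\eta_0$ and $a$ are tuning choices of mine, not values given in the lecture.

```python
def step_size(t, eta0=1.0, a=0.5):
    """Polynomially decaying SGD step size: eta_t = eta0 * t^(-a), so eta_t -> 0."""
    return eta0 * t ** (-a)

# e.g. inside the SGD loop:  w = w - step_size(t) * minibatch_grad
```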


Stochastic gradient descent vs Gradient descent

Stochastic gradient descent:

pros:cheaper computation per iterationfaster convergence in the beginning

cons:less stable, slower final convergencehard to tune step size

(Figure from https://medium.com/@ImadPhd/

gradient-descent-algorithm-and-its-variants-10f652806a3)


Revisit the Perceptron Learning Algorithm

Given classification data $\{x_n, y_n\}_{n=1}^{N}$

Learn a linear model:

$\min_w \frac{1}{N}\sum_{n=1}^{N} \ell(w^T x_n, y_n)$

Consider the loss:

$\ell(w^T x_n, y_n) = \max(0, -y_n w^T x_n)$

What's the gradient?


Revisit the Perceptron Learning Algorithm

$\ell(w^T x_n, y_n) = \max(0, -y_n w^T x_n)$

Consider two cases:

Case I: $y_n w^T x_n > 0$ (prediction correct)
$\ell(w^T x_n, y_n) = 0$, so $\frac{\partial}{\partial w}\ell(w^T x_n, y_n) = 0$

Case II: $y_n w^T x_n < 0$ (prediction wrong)
$\ell(w^T x_n, y_n) = -y_n w^T x_n$, so $\frac{\partial}{\partial w}\ell(w^T x_n, y_n) = -y_n x_n$

SGD update rule: sample an index $n$ and set

$w^{t+1} \leftarrow \begin{cases} w^t & \text{if } y_n w^T x_n \geq 0 \text{ (prediction correct)} \\ w^t + \eta_t y_n x_n & \text{if } y_n w^T x_n < 0 \text{ (prediction wrong)} \end{cases}$

This is equivalent to the Perceptron Learning Algorithm when $\eta_t = 1$.
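A sketch of this update rule as $|B| = 1$ SGD in Python; with $\eta_t = 1$ it reduces to the Perceptron Learning Algorithm. The data layout and iteration count are illustrative assumptions.

```python
import numpy as np

def perceptron_sgd(X, y, eta=1.0, num_iters=1000, seed=0):
    """SGD on the loss max(0, -y_n w^T x_n) with batch size 1.
    With eta = 1 this is the Perceptron Learning Algorithm."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    w = np.zeros(d)
    for _ in range(num_iters):
        n = rng.integers(N)              # sample one index n
        if y[n] * (X[n] @ w) < 0:        # prediction wrong
            w = w + eta * y[n] * X[n]    # w <- w + eta_t * y_n * x_n
        # prediction correct: w is unchanged
    return w
```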


Conclusions

Gradient descent

Stochastic gradient descent

Next class: LFD 2

Questions?