neural networks - carnegie mellon school of computer …aarti/class/10701/slides/lecture22.pdf ·...


Page 1:

Neural Networks

Aarti Singh

Machine Learning 10-701/15-781

Nov 29, 2010

Slides Courtesy: Tom Mitchell


Page 2:

Logistic Regression

Assumes the following functional form for P(Y|X):

    P(Y=1|X) = 1 / (1 + exp(-(w_0 + Σ_i w_i X_i)))

This is the logistic function (or sigmoid), σ(z) = 1 / (1 + e^(-z)), applied to a linear function of the data.

[Figure: plot of the sigmoid σ(z) against z]

Features can be discrete or continuous!
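The functional form above is easy to sketch in code; a minimal illustration (the function names are mine, not from the slides):

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function: squashes any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def p_y1_given_x(x, w, w0):
    """P(Y=1 | X=x) under logistic regression:
    the sigmoid applied to a linear function of the features."""
    z = w0 + sum(wi * xi for wi, xi in zip(w, x))
    return sigmoid(z)
```

With all weights zero the linear score is zero and the model outputs exactly 0.5, reflecting total uncertainty.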

Page 3:

Logistic Regression is a Linear Classifier!

Assumes the following functional form for P(Y|X):

    P(Y=1|X) = 1 / (1 + exp(-(w_0 + Σ_i w_i X_i)))

Decision boundary: predict Y = 1 iff P(Y=1|X) ≥ P(Y=0|X), i.e. iff

    w_0 + Σ_i w_i X_i ≥ 0

(Linear Decision Boundary)

Page 4:

Training Logistic Regression

How to learn the parameters w_0, w_1, …, w_d?

Training Data: {(x^l, y^l)}, l = 1, …, n

Maximum (Conditional) Likelihood Estimates:

    ŵ_MCLE = argmax_w Π_l P(y^l | x^l, w)

Discriminative philosophy – Don’t waste effort learning P(X), focus on P(Y|X) – that’s all that matters for classification!

Page 5:

Optimizing concave/convex function

• Conditional likelihood for Logistic Regression is concave

• Maximum of a concave function = minimum of a convex function

Gradient Ascent (concave) / Gradient Descent (convex)

Gradient:

    ∇_w l(w) = [∂l(w)/∂w_0, …, ∂l(w)/∂w_d]

Update rule (learning rate η > 0):

    w ← w + η ∇_w l(w)
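The update rule can be sketched as batch gradient ascent on the conditional log-likelihood; a minimal illustration (the function name, hyperparameters, and toy data are mine, not from the slides):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.3, iters=500):
    """Batch gradient ascent on the conditional log-likelihood.
    Gradient of the log-likelihood: dl/dw_i = sum_l x_i^l * (y^l - P(Y=1|x^l, w)).
    The intercept w0 is handled separately (its 'feature' is the constant 1)."""
    d = len(X[0])
    w0, w = 0.0, [0.0] * d
    for _ in range(iters):
        g0, g = 0.0, [0.0] * d
        for xl, yl in zip(X, y):
            p = sigmoid(w0 + sum(wi * xi for wi, xi in zip(w, xl)))
            err = yl - p  # residual y^l - P(Y=1|x^l, w)
            g0 += err
            for i in range(d):
                g[i] += err * xl[i]
        w0 += lr * g0  # ascent step: move *up* the gradient
        for i in range(d):
            w[i] += lr * g[i]
    return w0, w
```

On linearly separable data the learned boundary ends up between the two classes, consistent with the linear-decision-boundary slide above.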

Page 6:

Logistic Regression as a Graph

Sigmoid Unit

[Figure: a single sigmoid unit – inputs x_1, …, x_d with weights w_1, …, w_d (plus w_0), net input net = w_0 + Σ_i w_i x_i, output o = σ(net)]

Page 7:

Neural Networks to learn f: X → Y

• f can be a non-linear function
• X – (vector of) continuous and/or discrete variables
• Y – (vector of) continuous and/or discrete variables

• Neural networks – represent f by a network of logistic/sigmoid units; we will focus on feedforward networks:

[Figure: feedforward network with input layer X, hidden layer H, and output layer Y; each hidden and output node is a sigmoid unit]

Page 8:

Page 9:

Page 10:

Page 11:

[Figure slides: the sigmoid unit and its logistic/activation function (alternatives: linear, threshold units); the sigmoid is differentiable]

Page 12:

Forward Propagation for prediction

Sigmoid unit:  o = σ(w_0 + Σ_i w_i x_i)

1-hidden-layer, 1-output NN:

    o_h = σ(w_0h + Σ_i w_ih x_i)  for each hidden unit h
    ŷ = σ(w_0 + Σ_h w_h o_h)

Prediction – Given a neural network (hidden units and weights), use it to predict the label of a test point.

Forward Propagation – Start from the input layer; for each subsequent layer, compute the output o_h of each sigmoid unit.
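The two-stage computation above can be sketched directly; a minimal illustration for one hidden layer and one output (the function and parameter names are mine, not from the slides):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W_hidden, b_hidden, w_out, b_out):
    """Forward propagation through one hidden layer of sigmoid units.
    W_hidden[h] holds the input->hidden weights of hidden unit h;
    b_hidden[h] is its bias (the w_0h term); w_out, b_out parameterize
    the single sigmoid output unit."""
    o_hidden = [sigmoid(bh + sum(w * xi for w, xi in zip(wh, x)))
                for wh, bh in zip(W_hidden, b_hidden)]
    o = sigmoid(b_out + sum(w * oh for w, oh in zip(w_out, o_hidden)))
    return o_hidden, o
```

Each layer only needs the outputs of the layer before it, which is why prediction is a single left-to-right pass.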

Page 13:

M(C)LE Training for Neural Networks

• Consider a regression problem f: X → Y, for scalar Y:

    y = f(x) + ε,   ε ~ N(0, σ²) iid noise, f deterministic

• Learned neural network: f̂(x)

• Let’s maximize the conditional data likelihood:

    W ← argmax_W ln Π_l P(y^l | x^l, W)
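Under this Gaussian-noise model, maximizing the conditional likelihood reduces to minimizing squared error; a short derivation:

```latex
\ln \prod_l P(y^l \mid x^l, W)
  = \sum_l \ln \frac{1}{\sqrt{2\pi\sigma^2}}
    \exp\!\left(-\frac{\bigl(y^l - \hat f(x^l; W)\bigr)^2}{2\sigma^2}\right)
  = c \;-\; \frac{1}{2\sigma^2} \sum_l \bigl(y^l - \hat f(x^l; W)\bigr)^2
```

so the MCLE weights are exactly the minimizers of the sum of squared errors.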

Page 14:

MAP Training for Neural Networks

• Consider a regression problem f: X → Y, for scalar Y:

    y = f(x) + ε,   ε ~ N(0, σ²) noise, f deterministic

• Gaussian prior on the weights: P(W) = N(0, σ_w² I)

    ln P(W) = c − (1 / (2σ_w²)) Σ_i w_i²
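Combining the Gaussian likelihood with this Gaussian prior, MAP training becomes squared-error minimization with a quadratic (weight-decay) penalty:

```latex
\hat W_{\mathrm{MAP}}
  = \arg\max_W \Bigl[ \ln P(\mathrm{data} \mid W) + \ln P(W) \Bigr]
  = \arg\min_W \sum_l \bigl(y^l - \hat f(x^l; W)\bigr)^2 + \lambda \sum_i w_i^2
```

where λ collects the constants from the two Gaussian variances.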

Page 15:

E – Mean Square Error:

    E[w] = (1/2) Σ_l (y^l − f̂(x^l; w))²

For Neural Networks, E[w] is no longer convex in w.

Page 16:

Gradient descent using all training data D (batch mode):

    E_D[w] = (1/2) Σ_{l ∈ D} (y^l − o^l)²,    w_i ← w_i − η ∂E_D[w]/∂w_i

Incremental (stochastic) mode – update after each individual training example l:

    E_l[w] = (1/2) (y^l − o^l)²,    w_i ← w_i − η ∂E_l[w]/∂w_i

Page 17:

Error Gradient for a Sigmoid Unit

    ∂E/∂w_i = ∂/∂w_i (1/2) Σ_l (y^l − o^l)²
            = Σ_l (y^l − o^l) · ∂/∂w_i (y^l − o^l)
            = −Σ_l (y^l − o^l) · ∂o^l/∂net^l · ∂net^l/∂w_i

Using the sigmoid derivative ∂o/∂net = o(1 − o) and ∂net^l/∂w_i = x_i^l:

    ∂E/∂w_i = −Σ_l (y^l − o^l) o^l (1 − o^l) x_i^l

where net^l = w_0 + Σ_i w_i x_i^l and o^l = σ(net^l).

[Figure: the sigmoid unit with inputs x_1, …, x_d]
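A closed-form gradient like this can be sanity-checked against a finite-difference approximation; a small sketch (all names and the toy data are mine, not from the slides):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def error(w, X, y):
    """E[w] = 1/2 * sum_l (y^l - o^l)^2 for a single sigmoid unit."""
    return 0.5 * sum(
        (yl - sigmoid(sum(wi * xi for wi, xi in zip(w, xl)))) ** 2
        for xl, yl in zip(X, y))

def grad(w, X, y):
    """Analytic gradient: dE/dw_i = -sum_l (y^l - o^l) o^l (1 - o^l) x_i^l."""
    g = [0.0] * len(w)
    for xl, yl in zip(X, y):
        o = sigmoid(sum(wi * xi for wi, xi in zip(w, xl)))
        for i, xi in enumerate(xl):
            g[i] -= (yl - o) * o * (1 - o) * xi
    return g
```

If the analytic formula is right, central differences (E(w + εe_i) − E(w − εe_i)) / 2ε agree with it to high precision.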

Page 18:

Error Gradient for 1-Hidden layer, 1-output neural network

see Notes.pdf

Page 19:

Backpropagation (MLE)

Using Forward Propagation, then for each output unit k:

    δ_k = o_k (1 − o_k) (y_k − o_k)

For each hidden unit h:

    δ_h = o_h (1 − o_h) Σ_k w_hk δ_k

Update every weight:

    w_ij ← w_ij + η o_i δ_j

where
y_k = target output (label) of output unit k
o_k (o_h) = unit output (obtained by forward propagation)
w_ij = wt from i to j
Note: if i is an input variable, o_i = x_i
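The three update equations can be combined into a single stochastic backpropagation step; a minimal illustration with biases omitted for brevity (all names and the toy data are mine, not from the slides):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def backprop_step(x, y, W_h, W_o, lr=0.1):
    """One stochastic-gradient (backpropagation) update for a network with one
    hidden layer of sigmoid units and sigmoid output units.
    W_h[h][i]: weight from input i to hidden unit h;
    W_o[k][h]: weight from hidden unit h to output unit k.
    Mutates W_h and W_o in place; returns the forward-pass outputs."""
    # Forward propagation
    o_h = [sigmoid(sum(w * xi for w, xi in zip(wh, x))) for wh in W_h]
    o_k = [sigmoid(sum(w * oh for w, oh in zip(wk, o_h))) for wk in W_o]
    # Output-unit errors: delta_k = o_k (1 - o_k) (y_k - o_k)
    d_k = [ok * (1 - ok) * (yk - ok) for ok, yk in zip(o_k, y)]
    # Hidden-unit errors (using pre-update weights): delta_h = o_h (1 - o_h) sum_k w_hk delta_k
    d_h = [oh * (1 - oh) * sum(W_o[k][h] * d_k[k] for k in range(len(d_k)))
           for h, oh in enumerate(o_h)]
    # Weight updates: w_ij <- w_ij + lr * o_i * delta_j
    for k in range(len(W_o)):
        for h in range(len(o_h)):
            W_o[k][h] += lr * o_h[h] * d_k[k]
    for h in range(len(W_h)):
        for i in range(len(x)):
            W_h[h][i] += lr * x[i] * d_h[h]
    return o_k
```

Repeating the step on a single (x, y) pair drives the network output toward the target, which is the behavior gradient descent on the squared error should produce.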

Page 20:

Objective/Error no longer convex in weights

Page 21:

Page 22:

How to avoid overfitting?

• Regularization – train the neural network by maximizing the M(C)AP objective
• Early stopping
• Regulate # hidden units – prevents overly complex models (≡ dimensionality reduction)

Page 23:

Page 24:

Page 25:

Page 26:

[Figure slides: example application – learned network weights (including w_0) for head-pose classification with outputs left, straight, right, up]

Page 27:

Artificial Neural Networks: Summary

• Actively used to model distributed computation in the brain

• Highly non-linear regression/classification

• Vector-valued inputs and outputs

• Potentially millions of parameters to estimate – risk of overfitting

• Hidden layers learn intermediate representations – how many to use?

• Prediction – Forward propagation

• Gradient descent (Back-propagation), local minima problems

• Mostly obsolete – kernel tricks are more popular – but coming back in a new form as deep belief networks (probabilistic interpretation)