Optimization methods

Morten Nielsen, Department of Systems Biology, DTU

*Adapted from slides by Chen Keasar, Ben-Gurion University

The path to the closest local minimum = local minimization

Minimization

The path to the global minimum

Minimization

Outline

• Optimization procedures
  – Gradient descent
  – Monte Carlo

• Overfitting
  – Cross-validation

• Method evaluation

Linear methods. Error estimate

[Figure: linear function with inputs I1, I2 and weights w1, w2]

Gradient descent (from Wikipedia)

Gradient descent is based on the observation that if the real-valued function F(x) is defined and differentiable in a neighborhood of a point a, then F(x) decreases fastest if one goes from a in the direction of the negative gradient of F at a. It follows that, if $b = a - \gamma \nabla F(a)$ for $\gamma > 0$ a small enough number, then $F(b) \le F(a)$, with strict inequality unless the gradient at a is zero.
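As a minimal sketch of the statement above, the snippet below takes one step from a point a along the negative gradient and checks that the function value drops. The function F(x, y) = x^2 + 3y^2, the starting point and the step size of 0.1 are illustrative assumptions and are not taken from the slides.

```python
def F(x, y):
    # Hypothetical example function (not from the slides): a simple convex bowl.
    return x**2 + 3 * y**2


def grad_F(x, y):
    # Analytical gradient of F: (dF/dx, dF/dy).
    return (2 * x, 6 * y)


def gradient_step(a, step):
    # Move from a in the direction of the negative gradient:
    # b = a - step * grad F(a)
    gx, gy = grad_F(*a)
    return (a[0] - step * gx, a[1] - step * gy)


a = (1.0, 1.0)
b = gradient_step(a, step=0.1)
print("F(a) =", F(*a))
print("F(b) =", F(*b))
```

With a much larger step (try 1.0 here) the move overshoots and F(b) ends up larger than F(a), which is why the step has to be "small enough".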

Gradient descent (example)

Gradient descent

Weights are changed in the opposite direction of the gradient of the error

Gradient descent (Linear function)

[Figure: linear function with inputs I1, I2 and weights w1, w2]

Gradient descent. Example

[Figure: linear function with inputs I1, I2 and weights w1, w2]
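The equations on these slides did not survive extraction. The block below is a reconstruction under common assumptions for this setup: a squared error with a factor 1/2, a target value t, and a learning rate written as gamma (following Wikipedia's notation for the step size). It is consistent with the numbers in the filled-in exercise table, but the notation on the original slides may differ.

```latex
\begin{align}
o &= \sum_i w_i I_i
  && \text{(linear function)} \\
E &= \tfrac{1}{2}\,(o - t)^2
  && \text{(error, assumed form)} \\
\frac{\partial E}{\partial w_i} &= (o - t)\, I_i
  && \text{(chain rule)} \\
\Delta w_i &= -\gamma\, \frac{\partial E}{\partial w_i} = -\gamma\,(o - t)\, I_i
  && \text{(step against the gradient)}
\end{align}
```

A weight only changes if its input is non-zero, and the size of the change scales with the prediction error o - t.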

Gradient descent. Doing it yourself

Weights are changed in the opposite direction of the gradient of the error

W1 = 0.1, W2 = 0.1

Linear function

What are the weights after 2 forward (calculate predictions) and backward (update weights) iterations with the given input, and has the error decreased (use a learning rate of 0.1 and t = 1)?

Fill out the table

itr  W1    W2    O
0    0.1   0.1

What are the weights after 2 forward/backward iterations with the given input, and has the error decreased (use a learning rate of 0.1 and t = 1)?

W1 = 0.1, W2 = 0.1

Linear function

Fill out the table

itr  W1    W2    O
0    0.1   0.1   0.1
1    0.19  0.1   0.19
2    0.27  0.1   0.27
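The table can be reproduced with a few lines of code. This is a sketch under two assumptions that are not stated in the extracted text: the input is (I1, I2) = (1, 0), which is inferred from the table (only W1 changes and O equals W1), and the error is E = 1/2 (O - t)^2. The function names are made up for illustration.

```python
def forward(weights, inputs):
    # Linear function: O = sum_i w_i * I_i
    return sum(w * x for w, x in zip(weights, inputs))


def backward(weights, inputs, target, rate):
    # Gradient of E = 1/2 (O - t)^2 with respect to w_i is (O - t) * I_i.
    # Each weight is moved against its gradient component.
    o = forward(weights, inputs)
    return [w - rate * (o - target) * x for w, x in zip(weights, inputs)]


inputs = (1.0, 0.0)   # ASSUMPTION: inferred from the table, not given in the text
weights = [0.1, 0.1]  # initial W1, W2 from the slide
target, rate = 1.0, 0.1

rows = []
for itr in range(3):
    o = forward(weights, inputs)
    error = 0.5 * (o - target) ** 2
    rows.append((itr, weights[0], weights[1], o, error))
    weights = backward(weights, inputs, target, rate)

print("itr  W1     W2     O      E")
for itr, w1, w2, o, error in rows:
    print(f"{itr}    {w1:.3f}  {w2:.3f}  {o:.3f}  {error:.4f}")
```

Under these assumptions W1 goes 0.1, 0.19, 0.271 while W2 stays at 0.1, and the error shrinks at every iteration, so the answer to the second part of the question is yes.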

Monte Carlo

Because of their reliance on repeated computation of random or pseudo-random numbers, Monte Carlo methods are most suited to calculation by a computer. Monte Carlo methods tend to be used when it is infeasible or impossible to compute an exact result with a deterministic algorithm.

Or when you are too stupid to do the math yourself?

Example: Estimating π by Independent Monte-Carlo Samples

Suppose we throw darts randomly (and uniformly) at the square:

Algorithm:
For i = [1..ntrials]
    x = (random# in [0..r])
    y = (random# in [0..r])
    distance = sqrt(x^2 + y^2)
    if distance ≤ r
        hits++
End
Output: π ≈ 4 · hits / ntrials
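The dart-throwing algorithm translates almost line by line into runnable code. This is a minimal sketch: the function name `estimate_pi` and the `seed` argument are illustrative choices, and the factor 4 comes from the fact that the darts land in a square of side r that contains a quarter circle of radius r.

```python
import random


def estimate_pi(ntrials, r=1.0, seed=None):
    # Throw ntrials darts uniformly at the square [0, r] x [0, r] and count
    # how many land inside the quarter circle of radius r.
    rng = random.Random(seed)
    hits = 0
    for _ in range(ntrials):
        x = rng.uniform(0.0, r)
        y = rng.uniform(0.0, r)
        if x * x + y * y <= r * r:  # same test as sqrt(x^2 + y^2) <= r
            hits += 1
    # hits / ntrials approximates (quarter-circle area) / (square area) = pi / 4
    return 4.0 * hits / ntrials


for n in (100, 10_000, 1_000_000):
    print(n, estimate_pi(n, seed=0))
```

The estimate improves slowly: the statistical error falls roughly as one over the square root of the number of trials.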

Adapted from course slides by Craig Douglas

http://www.chem.unl.edu/zeng/joy/mclab/mcintro.html

Estimating π

After a long run, we want to find low-energy conformations, with high probability

Sampling Protein Conformations with MCMC (Markov Chain Monte Carlo)

Protein image taken from Chemical Biology, 2006

Markov-Chain Monte-Carlo (MCMC) with “proposals”:
1. Perturb structure to create a “proposal”
2. Accept or reject new conformation with a “certain” probability

But how?

A (physically) natural* choice is the Boltzmann distribution, where the probability of state i is proportional to $e^{-E_i / (k_B T)}$:

$P_i = \frac{e^{-E_i / (k_B T)}}{Z}$

E_i = energy of state i
k_B = Boltzmann constant
T = temperature
Z = “partition function”

* In theory, the Boltzmann distribution is a bit problematic in non-gas phase, but never mind that for now…

Slides adapted from Barak Raveh

The Metropolis-Hastings Criterion

• Boltzmann distribution: $P_i = \frac{e^{-E_i / (k_B T)}}{Z}$

• The energy score and temperature are computed (quite) easily
• The “only” problem is calculating Z (the “partition function”): this requires summing over all states.
• Metropolis showed that MCMC will converge to the true Boltzmann distribution, if we accept a new proposal with probability $P_{accept} = \min\left(1, e^{-\Delta E / (k_B T)}\right)$, where $\Delta E$ is the energy of the proposal minus the energy of the current state

“Equation of State Calculations by Fast Computing Machines”, Metropolis, N. et al., Journal of Chemical Physics (1953)

If we run till infinity, with good perturbations, we will visit every conformation according to the Boltzmann distribution

Sampling Protein Conformations with Metropolis-Hastings MCMC

Protein image taken from Chemical Biology, 2006

Markov-Chain Monte-Carlo (MCMC) with “proposals”:
1. Perturb structure to create a “proposal”
2. Accept or reject new conformation by the Metropolis criterion
3. Repeat for many iterations

But we just want to find the energy minimum. If we do our perturbations in a smart manner, we can still cover relevant (realistic, low-energy) parts of the search space
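The perturb, accept/reject, repeat loop, used here to search for a low-energy state, can be sketched as follows. Everything specific in this example is an assumption made for illustration: the "conformation" is a single number x, the energy is a hypothetical double-well function, the perturbation is a uniform random shift, and the Boltzmann constant is absorbed into the temperature.

```python
import math
import random


def metropolis_accept(d_energy, temperature, rng):
    # Metropolis criterion for minimization: always accept moves that lower
    # the energy, accept uphill moves with probability exp(-dE / T).
    if d_energy <= 0:
        return True
    return rng.random() < math.exp(-d_energy / temperature)


def energy(x):
    # Hypothetical 1D energy landscape (not from the slides): a double well
    # with a local minimum near x = 0.96 and the global minimum near x = -1.04.
    return x**4 - 2 * x**2 + 0.3 * x


def run_mcmc(n_steps, temperature, step_size=0.5, x0=2.0, seed=0):
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(n_steps):                                # 3. repeat
        proposal = x + rng.uniform(-step_size, step_size)   # 1. perturb
        e_new = energy(proposal)
        if metropolis_accept(e_new - e, temperature, rng):  # 2. accept or reject
            x, e = proposal, e_new
            if e < best_e:
                best_x, best_e = x, e                       # remember the lowest energy seen
    return best_x, best_e


print(run_mcmc(20_000, temperature=0.5))
```

The chain starts on the side of the shallower well. Because uphill moves are sometimes accepted, it can cross the barrier and reach the deeper well, which a pure downhill search started from the same point would not do.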

Monte Carlo (Minimization)

[Figure: moves labelled dE < 0 and dE > 0]

The Traveling Salesman

Adapted from www.mpp.mpg.de/~caldwell/ss11/ExtraTS.pdf

Gibbs sampler. Monte Carlo simulations

RFFGGDRGAPKRGYLDPLIRGLLARPAKLQVKPGQPPRLLIYDASNRATGIPA GSLFVYNITTNKYKAFLDKQ SALLSSDITASVNCAK GFKGEQGPKGEPDVFKELKVHHANENI SRYWAIRTRSGGITYSTNEIDLQLSQEDGQTIE

RFFGGDRGAPKRGYLDPLIRGLLARPAKLQVKPGQPPRLLIYDASNRATGIPAGSLFVYNITTNKYKAFLDKQ SALLSSDITASVNCAK GFKGEQGPKGEPDVFKELKVHHANENI SRYWAIRTRSGGITYSTNEIDLQLSQEDGQTIE

E1 = 5.4, E2 = 5.7: dE > 0; Paccept = 1

E1 = 5.4, E2 = 5.2: dE < 0; 0 < Paccept < 1

Note the sign. This is a maximization, so moves with dE > 0 are always accepted.

Monte Carlo Temperature

• What is the Monte Carlo temperature?

• Say dE=-0.2, T=1

• T=0.001
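The two cases in the bullets can be worked out numerically. The formula itself is missing from the extracted slide, so this sketch assumes the Metropolis form with the sign flipped for maximization: Paccept = 1 for dE >= 0 and Paccept = exp(dE / T) for dE < 0. The function name `p_accept` is made up for the example.

```python
import math


def p_accept(d_e, temperature):
    # Maximization convention: improving moves (dE >= 0) are always accepted,
    # worsening moves (dE < 0) are accepted with probability exp(dE / T).
    if d_e >= 0:
        return 1.0
    return math.exp(d_e / temperature)


print(p_accept(-0.2, 1.0))    # about 0.82: the worse state is usually accepted
print(p_accept(-0.2, 0.001))  # effectively zero: the worse state is never accepted
```

Under this assumption the temperature sets how tolerant the search is of bad moves. At a high temperature it wanders almost freely, and at a very low temperature it behaves like a pure uphill search.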

MC minimization

Monte Carlo - Examples

• Why a temperature?

Local minima
