TRANSCRIPT
1
15.053 Thursday, May 16
Review of 15.053
Handouts: Lecture Notes
2
Overview of Problem Types
Dynamic programming
Nonlinear Programming
“Easy” Nonlinear Programming
“Hard” Nonlinear Programming
Linear Programming
Integer Programming
Network Flows
3
Overview of Problem Types
Nonlinear Programming
“Easy” Nonlinear Programming
“Hard” Nonlinear Programming
Dynamic programming
Integer Programming
Linear Programming
Network Flows
4
Why the focus on linear programming?
Linear programming illustrates much of what is important about modeling.
Linear programming is a very useful tool in optimization!
We can solve linear programs very efficiently.
The state-of-the-art integer programming techniques rely on linear programming.
Linear programming is the best way of teaching about performance guarantees and duality.
Linear programming is very helpful for understanding other optimization approaches.
5
Topics through Midterm 2
Linear programming
– Formulations
– Geometry
– The simplex algorithm
– Sensitivity Analysis
– Duality Theory
Network Optimization
Integer programming
– Formulations
– Branch and Bound (B&B)
– Cutting planes
6
Topics covered in the Final Exam
Linear Programming Formulations
Integer Programming Formulations
Nonlinear Programming
Dynamic Programming
Heuristics
7
Rest of this lecture
A very brief overview of the topics covered since the 2nd midterm.
Slides drawn from lectures.
If you have questions about the topics covered, ask them as I go along. I need to reserve time at the end for Sloan course evaluations.
8
What is a non-linear program?
maximize 3 sin x + xy + y^3 - 3z + log z
subject to x^2 + y^2 = 1
x + 4z ≥ 2
z ≥ 0
A non-linear program is permitted to have non-linear constraints or objectives. A linear program is a special case of non-linear programming!
9
Portfolio Selection Example
When trying to design a financial portfolio, investors seek to simultaneously minimize risk and maximize return.
Risk is often measured as the variance of the total return, a nonlinear function.
FACT:
var(x1 + x2 + … + xn) = var(x1) + var(x2) + … + var(xn) + Σi≠j cov(xi, xj)
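As a quick numerical check of this FACT, the following sketch verifies the identity on sample data. The return histories are made up for illustration (not lecture data), and population (1/N) statistics are used throughout.

```python
# Illustrative check of the identity
#   var(x1 + ... + xn) = sum_i var(xi) + sum_{i != j} cov(xi, xj)
# using population (1/N) statistics.

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

# Hypothetical return histories for three assets
returns = [
    [0.05, -0.02, 0.07, 0.01],
    [0.01, 0.03, -0.01, 0.02],
    [-0.04, 0.06, 0.02, 0.00],
]

# Left-hand side: variance of the total return, computed directly
total = [sum(col) for col in zip(*returns)]
lhs = cov(total, total)

# Right-hand side: sum of all variances and cross-covariances
n = len(returns)
rhs = sum(cov(returns[i], returns[j]) for i in range(n) for j in range(n))

assert abs(lhs - rhs) < 1e-12
```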
10
Portfolio Selection (cont’d)
Two Methods are commonly used:
–Min Risk s.t. Expected Return ≥ Bound
–Max Expected Return - θ (Risk) where θ reflects the tradeoff between return and risk.
11
Regression, and estimating β
[Figure: scatter plot of Return on Stock A vs. Market Return, with a fitted regression line; the Market axis runs from -40% to 80% and the Stock axis from -60% to 80%.]
The value β is the slope of the regression line. Here it is around 0.6 (lower expected gain than the market, and lower risk).
12
Local vs. Global Optima
Def’n: Let x be a feasible solution. Then
– x is a global max if f(x) ≥ f(y) for every feasible y.
– x is a local max if f(x) ≥ f(y) for every feasible y sufficiently close to x (i.e., xj - ε ≤ yj ≤ xj + ε for all j and some small ε).
max f(x)
s.t. 0 ≤ x ≤ 1
[Figure: graph of z = f(x) on 0 ≤ x ≤ 1 with local maxima at points A, B, and C.]
There may be several locally optimal solutions.
13
Convex Functions
Convex functions: f(λy + (1-λ)z) ≤ λf(y) + (1-λ)f(z) for every y and z and for 0 ≤ λ ≤ 1.
e.g., f((y+z)/2) ≤ f(y)/2 + f(z)/2
We say the function is strictly convex if the inequality is "<" for 0 < λ < 1.
[Figure: a convex curve f(x); the line joining any two points y and z on the curve lies above the curve, e.g. at (y+z)/2.]
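As a quick numerical sanity check (not from the slides), the convexity inequality can be verified for the convex function f(x) = x^2 at a few sample points:

```python
# Check f(lam*y + (1-lam)*z) <= lam*f(y) + (1-lam)*f(z) for f(x) = x**2,
# a convex function, at several sample points and values of lambda.

f = lambda x: x * x

for lam in [0.0, 0.25, 0.5, 0.75, 1.0]:
    for (y, z) in [(-3, 5), (0, 2), (-1, -4)]:
        lhs = f(lam * y + (1 - lam) * z)
        rhs = lam * f(y) + (1 - lam) * f(z)
        assert lhs <= rhs + 1e-12   # convexity inequality holds
```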
14
Concave Functions
Concave functions: f(λy + (1-λ)z) ≥ λf(y) + (1-λ)f(z) for every y and z and for 0 ≤ λ ≤ 1.
e.g., f((y+z)/2) ≥ f(y)/2 + f(z)/2
We say the function is strictly concave if the inequality is ">" for 0 < λ < 1.
[Figure: a concave curve f(x); the line joining any two points z and y on the curve lies below the curve, e.g. at (y+z)/2.]
15
Convexity and Extreme Points
[Figure: a convex set S with points x and y, and an extreme point W of a polyhedron P; both axes run from 0 to 14.]
We say that a set S is convex if, for every two points x and y in S, and for every real number λ in [0,1], λx + (1-λ)y ∈ S.
The feasible region of a linear program is convex.
We say that an element w ∈ S is an extreme point (vertex, corner point) if w is not the midpoint of any line segment contained in S.
16
Local Maximum (Minimum) Property
A local max of a concave function on a convex feasible region is also a global max.
A local min of a convex function on a convex feasible region is also a global min.
Strict convexity or concavity implies that the global optimum is unique.
Given this, we can efficiently solve:
– Maximization problems with a concave objective function and linear constraints
– Minimization problems with a convex objective function and linear constraints
17
Where is the optimal solution?
[Figure: a feasible region in the (x, y) plane with curved isocontours of the objective; both axes run from 0 to 18.]
Note: the optimal solution is not at a corner point. It is where the isocontour first hits the feasible region.
18
Another example:
Minimize (x-8)^2 + (y-8)^2
[Figure: the feasible region in the (x, y) plane with circular isocontours centered at (8, 8), which lies inside the region; both axes run from 0 to 18.]
Here the global unconstrained minimum is also feasible, so the optimal solution is not on the boundary of the feasible region.
19
Finding a local maximum using Fibonacci Search.
[Figure: a search interval from 0 to 34 with evaluation points at 13, 16, 18, 19, 21, and 26; the length of the search interval shrinks through the Fibonacci numbers 34, 21, 13, 8, 5, 3, bracketing where the maximum may be.]
20
The search finds a local maximum, but not necessarily a global maximum.
[Figure: the same search interval from 0 to 34, with evaluation points 13, 16, 18, 19, 21, 26.]
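In code, a Fibonacci search over a unimodal function might look like the following sketch. Only the 0-to-34 interval and the Fibonacci interval lengths come from the slide; the function and iteration count here are illustrative.

```python
# Fibonacci-search sketch: narrow a bracket [lo, hi] around the maximizer
# of a unimodal function, shrinking the interval by Fibonacci ratios
# (34 -> 21 -> 13 -> 8 -> 5 -> 3, as on the slide).

def fib_search_max(f, lo, hi, n):
    fib = [1, 1]
    while len(fib) <= n:
        fib.append(fib[-1] + fib[-2])
    x1 = lo + (fib[n - 2] / fib[n]) * (hi - lo)
    x2 = lo + (fib[n - 1] / fib[n]) * (hi - lo)
    f1, f2 = f(x1), f(x2)
    for k in range(n, 2, -1):
        if f1 < f2:             # maximizer lies to the right of x1
            lo, x1, f1 = x1, x2, f2
            x2 = lo + (fib[k - 2] / fib[k - 1]) * (hi - lo)
            f2 = f(x2)
        else:                   # maximizer lies to the left of x2
            hi, x2, f2 = x2, x1, f1
            x1 = lo + (fib[k - 3] / fib[k - 1]) * (hi - lo)
            f1 = f(x1)
    return (lo + hi) / 2

# Unimodal example with its peak at x = 19 on [0, 34]
xstar = fib_search_max(lambda x: -(x - 19) ** 2, 0, 34, 10)
assert abs(xstar - 19) < 1.0
```

Note that, as the slide warns, this finds a local maximum; on a multimodal function nothing guarantees it is the global one.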
21
Approximating a non-linear function of 1 variable: the λ method
y = x^3/3 + 2x - 5
[Figure: plot of y = x^3/3 + 2x - 5 for -3 ≤ x ≤ 3, with y ranging from -25 to 15.]
Choose different values of x along the x-axis, and approximate the function using piecewise linear segments.
22
More on the λ method
y = x^3/3 + 2x - 5
[Figure: the same plot of y = x^3/3 + 2x - 5, with breakpoints a1 = -3 and a2 = -1 marked.]
a1 = -3, f(a1) = -20
a2 = -1, f(a2) = -7 1/3
Suppose that for -3 ≤ x ≤ -1, we represent x as λ1(-3) + λ2(-1), where λ1 + λ2 = 1 and λ1, λ2 ≥ 0.
Then we approximate f(x) as λ1(-20) + λ2(-7 1/3).
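This one-segment λ-method approximation can be computed directly; a sketch using the slide's breakpoints a1 = -3 and a2 = -1:

```python
# Lambda-method approximation of f(x) = x**3/3 + 2x - 5 on one segment.

def f(x):
    return x ** 3 / 3 + 2 * x - 5

a1, a2 = -3.0, -1.0

def approx(x):
    # Represent x = lam1*a1 + lam2*a2 with lam1 + lam2 = 1, lam >= 0,
    # then approximate f(x) by lam1*f(a1) + lam2*f(a2).
    lam2 = (x - a1) / (a2 - a1)
    lam1 = 1.0 - lam2
    return lam1 * f(a1) + lam2 * f(a2)

# At the breakpoints the approximation is exact; in between it is linear.
assert abs(approx(-3) - f(-3)) < 1e-9    # f(-3) = -20
assert abs(approx(-1) - f(-1)) < 1e-9    # f(-1) = -7 1/3
assert abs(approx(-2) - (-41 / 3)) < 1e-9  # midpoint: (f(-3) + f(-1)) / 2
```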
23
Approximating a non-linear objective function for a minimization NLP.
original problem: minimize { f(y) : y ∈ P }
Suppose that y = Σj λj aj, where Σj λj = 1 and λ ≥ 0.
Approximate f(y): minimize { Σj λj f(aj) : Σj λj aj ∈ P }
Note: when given a choice of representing y in alternative ways, the LP will choose one that leads to the least objective value for the approximation.
24
For minimizing a convex function, the λ-method automatically satisfies the additional adjacency property.
min z = λ1 f(a1) + λ2 f(a2) + λ3 f(a3) + λ4 f(a4) + λ5 f(a5)
s.t. λ1 + λ2 + λ3 + λ4 + λ5 = 1; λ ≥ 0
+ adjacency condition
+ other constraints
[Figure: a convex curve with breakpoints a1, a2, a3, a4, a5 on the x-axis.]
25
Dynamic programming
Suppose that there are 50 matches on a table, and the person who picks up the last match wins. At each alternating turn, my opponent or I can pick up 1, 2 or 6 matches. Assuming that I go first, how can I be sure of winning the game?
26
Determining the strategy using DP
n = number of matches left (n is the state/stage)
g(n) = 1 if you can force a win at n matches; g(n) = 0 otherwise.
g(n) is the optimal value function.
At each state/stage you can make one of three decisions: take 1, 2 or 6 matches.
g(1) = g(2) = g(6) = 1 (boundary conditions)
g(3) = 0; g(4) = g(5) = 1. (why?)
The recursion:
g(n) = 1 if g(n-1) = 0 or g(n-2) = 0 or g(n-6) = 0; g(n) = 0 otherwise.
Equivalently, g(n) = 1 - min( g(n-1), g(n-2), g(n-6) ).
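This recursion is short enough to run directly; a sketch:

```python
# g(n) = 1 means the player to move can force a win with n matches left,
# where a move removes 1, 2, or 6 matches.

def solve(n_max):
    g = [0] * (n_max + 1)       # g(0) = 0: no matches left, you have lost
    for n in range(1, n_max + 1):
        prev = [g[n - k] for k in (1, 2, 6) if n - k >= 0]
        # Win if some move leaves the opponent in a losing position
        g[n] = 1 if 0 in prev else 0
    return g

g = solve(50)
assert g[1] == g[2] == g[6] == 1    # boundary conditions from the slide
assert g[3] == 0 and g[4] == g[5] == 1
assert g[50] == 1                   # going first with 50 matches, you can win
```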
27
Dynamic Programming in General
Break up a complex decision problem into a sequence of smaller decision subproblems.
Stages: one solves decision problems one "stage" at a time. Stages can often be thought of as "time".
– Not every DP has stages.
– The previous shortest path problem has 6 stages.
– The match problem does not have stages.
28
Dynamic Programming in General
States: the smaller decision subproblems are often expressed in a very compact manner. The description of the smaller subproblems is often referred to as the state.
– match problem: the "state" is the number of matches left
At each state-stage, there are one or more decisions. The DP recursion determines the best decision.
– match problem: how many matches to remove
– shortest path example: go right and up, or else go down and right
29
Optimal Capacity Expansion: What is the least cost way of building plants?
Year   Cum. Demand   Cost per plant ($ millions)
2002        1                54
2003        2                56
2004        4                58
2005        6                57
2006        7                55
2007        8                52
There is an additional cost of $15 million in any year in which a plant is built. At most 3 plants a year can be built.
30
Finding a topological order
[Figure: a directed graph on 8 nodes with arc lengths; applying the rule below relabels the nodes 1 through 8 in topological order.]
Find a node with no incoming arc. Label it node 1.
For i = 2 to n, find a node with no incoming arc from an unlabeled node. Label it node i.
31
Find d(j) using a recursion.
d(j) is the shortest length of a path from node 1 to node j.
Let cij = length of arc (i,j)
[Figure: the topologically ordered graph from the previous slide with arc lengths; e.g., the arcs into node 4 have lengths c(2,4) = 3 and c(3,4) = 2.]
What is d(j) computed in terms of d(1), … d(j-1)?
Compute d(2), …, d(8)
Example: d(4) = min { 3 + d(2), 2 + d(3) }
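The recursion d(j) = min over incoming arcs (i, j) of d(i) + c(i, j) can be sketched as follows. The small graph here encodes only the arc lengths c(2,4) = 3 and c(3,4) = 2 from the example plus a few illustrative extras, not the slide's full graph.

```python
# Shortest paths in a graph whose nodes are already labeled in
# topological order: d(j) = min over arcs (i, j) of d(i) + c(i, j).

# arcs[(i, j)] = c_ij (illustrative data)
arcs = {(1, 2): 3, (1, 3): 2, (2, 4): 3, (3, 4): 2, (2, 5): 7, (4, 5): 1}

n = 5
d = {1: 0}
for j in range(2, n + 1):
    # every predecessor i < j has d(i) already computed
    d[j] = min(d[i] + c for (i, k), c in arcs.items() if k == j)

# e.g. d(4) = min(3 + d(2), 2 + d(3)) = min(3 + 3, 2 + 2) = 4
assert d[4] == 4
assert d[5] == 5
```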
32
Finding optimal paragraph layouts
TeX optimally decomposes paragraphs by selecting the breakpoints for each line optimally. It has a subroutine that computes the ugliness F(i,j) of a line that begins at word i and ends at word j-1. How can we use F(i,j) as part of a dynamic program whose solution will solve the paragraph problem?
TeX optimally decomposes paragraphs by select-
ing the breakpoints for each line optimally. It has
a subroutine that computes the ugliness F(i,j)
of a line that begins at word i and ends at word
j-1. How can we use F(i,j) as part of a dynamic
program whose solution will solve the paragraph
problem?
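One way to use F(i,j) (a sketch under assumptions, not the lecture's formulation) is the recursion c(i) = min over j > i of F(i,j) + c(j), where c(i) is the minimum total ugliness of laying out words i through n, with c(n+1) = 0. The word lengths and the ugliness function below are entirely hypothetical.

```python
# DP over breakpoints: a line holds words i .. j-1, costing F(i, j);
# c(i) = min_{j > i} F(i, j) + c(j), with c(n) = 0 past the last word.

import functools

WORDS = [3, 5, 2, 8, 4, 6, 2, 3]   # made-up word lengths
WIDTH = 12                          # made-up line width

def F(i, j):
    # Hypothetical ugliness: squared slack, infinite if the line overflows
    length = sum(WORDS[i:j]) + (j - i - 1)   # words plus inter-word spaces
    return float("inf") if length > WIDTH else (WIDTH - length) ** 2

@functools.lru_cache(maxsize=None)
def c(i):
    if i == len(WORDS):
        return 0.0
    return min(F(i, j) + c(j) for j in range(i + 1, len(WORDS) + 1))

total_ugliness = c(0)
assert total_ugliness < float("inf")   # some feasible layout exists
```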
33
Capital Budgeting, again
Investment budget = $14,000

Investment               1    2    3    4    5    6
Cash required ($1000s)   5    7    4    3    4    6
NPV added ($1000s)      16   22   12    8   11   19
34
Capital Budgeting: stage 3
Consider investment 3: cost $4, NPV $12.

Budget used up:    0  1  2  3  4  5  6  7  8  9  10 11 12 13 14
Stage 1 best NPV:  0  -  -  -  -  16 -  -  -  -  -  -  -  -  -
Stage 2 best NPV:  0  -  -  -  -  16 -  22 -  -  -  -  38 -  -
Stage 3 best NPV:  0  -  -  -  12 16 -  22 -  28 -  34 38 -  -

f(3,k) = max( f(2,k), f(2, k-4) + $12 ).
35
The recursion
f(0,0) = 0; f(0,k) is undefined for k > 0
f(k, v) = max( f(k-1, v), f(k-1, v - ak) + ck )
Either item k is included, or it is not.
The optimum solution to the original problem is max { f(n, v) : 0 ≤ v ≤ b }.
Note: we solve the capital budgeting problem for all right hand sides less than b.
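The recursion f(k, v) = max( f(k-1, v), f(k-1, v - ak) + ck ) runs directly on the table's data; a sketch:

```python
# Capital-budgeting (knapsack) DP with costs a, NPVs c (both in $1000s),
# and budget b = 14, from the table.

a = [5, 7, 4, 3, 4, 6]       # cash required
c = [16, 22, 12, 8, 11, 19]  # NPV added
b = 14

NEG = float("-inf")          # stands in for "undefined"
f = [0] + [NEG] * b          # f(0, 0) = 0; f(0, k) undefined for k > 0
for ak, ck in zip(a, c):
    # f(k, v) = max( f(k-1, v), f(k-1, v - ak) + ck )
    f = [max(f[v], f[v - ak] + ck if v >= ak else NEG)
         for v in range(b + 1)]

best = max(f)                # max { f(n, v) : 0 <= v <= b }
assert best == 43            # investments 1, 4, 6: NPV 16+8+19, cost 5+3+6
```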
36
Heuristics: a way of dealing with hard combinatorial problems
Construction heuristics: construct a solution.
Example: nearest neighbor heuristic

begin
choose an initial city for the tour;
while there are any unvisited cities, the next city on the tour is the nearest unvisited city;
end
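A runnable sketch of this heuristic, with a made-up symmetric distance matrix (not from the lecture):

```python
# Nearest-neighbor construction heuristic for a TSP tour.

dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 8],
    [10, 4, 8, 0],
]

def nearest_neighbor(dist, start=0):
    n = len(dist)
    tour = [start]
    unvisited = set(range(n)) - {start}
    while unvisited:
        last = tour[-1]
        # next city on the tour is the nearest unvisited city
        nxt = min(unvisited, key=lambda j: dist[last][j])
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

tour = nearest_neighbor(dist)
assert tour == [0, 1, 3, 2]
```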
37
Improvement Methods
These techniques start with a solution and seek simple modifications that improve it.
Example: Let T be a tour. Seek an improved tour T' such that |T - T'| = 2, i.e., T' differs from T in exactly two edges.
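This neighborhood search can be sketched as follows; the distance matrix and starting tour are illustrative, not from the slides.

```python
# 2-opt move: replace edges (t[i-1], t[i]) and (t[j], t[j+1]) by
# (t[i-1], t[j]) and (t[i], t[j+1]), i.e. reverse the segment t[i..j],
# whenever that shortens the tour; repeat until no move improves.

def tour_length(t, dist):
    # includes the closing edge t[-1] -> t[0]
    return sum(dist[t[k - 1]][t[k]] for k in range(len(t)))

def two_opt(tour, dist):
    improved = True
    while improved:
        improved = False
        n = len(tour)
        for i in range(1, n - 1):
            for j in range(i + 1, n):
                new = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
                if tour_length(new, dist) < tour_length(tour, dist):
                    tour, improved = new, True
    return tour

dist = [
    [0, 1, 4, 1],
    [1, 0, 1, 4],
    [4, 1, 0, 1],
    [1, 4, 1, 0],
]
# A deliberately "crossed" tour 0-2-1-3 (length 10) improves to the
# optimal ordering (length 4).
best = two_opt([0, 2, 1, 3], dist)
assert tour_length(best, dist) == 4
```

As the next slides note, the result is a locally optimal tour with respect to the 2-opt neighborhood, not necessarily a global optimum.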
38
Illustration of 2-opt heuristic
39
Take two edges out. Add 2 edges in.
40
Take two edges out. Add 2 edges in.
41
Local Optimality
A solution y is said to be locally optimum (with respect to a given neighborhood) if there is no neighbor of y whose objective value is better than that of y.
Example. 2-Opt finds a locally optimum solution.
42
Improvement methods typically find locally optimum solutions.
A solution y is said to be globally optimum if no other solution has a better objective value.
Remark: Local optimality depends on what a neighborhood is, i.e., what modifications in the solution are permissible.
– e.g., 2-interchanges
– e.g., 3-interchanges
43
What is a neighborhood for the fire station problem?
[Figure: a network with nodes numbered 1 through 16, representing the fire station problem.]
44
Insertion heuristic with randomization
Choose three cities randomly and obtain a tour T on those cities.
For k = 4 to n, choose a city that is not on T and insert it optimally into T.
Note: we can run this 1,000 times, and get many different answers. This increases the likelihood of getting a good solution.
Remark: simulated annealing will not be on the final exam.
45
GA terms
chromosome (solution)
gene (variable)
alleles: 1 or 0 (values)
selection
crossover
mutation
population
Objective: maximize the fitness function (objective function)
46
A Simple Example: Maximize the number of 1’s
Initial population   Fitness
1 1 1 0 1               4
0 1 1 0 1               3
0 0 1 1 0               2
1 0 0 1 1               3

Average fitness: 3
Usually populations are much bigger, say around 50 to 100, or more.
47
Crossover Operation: takes two solutions and creates a child (or more) whose genes are a mixture of the genes of the parents.
Select two parents from the population.
This is the selection step. There will be more on this later.
parent 1: 0 1 1 0 1
parent 2: 1 0 0 1 1
48
Crossover Operation: takes two solutions and creates a child (or more) whose genes are a mixture of the genes of the parents.
1-point crossover: divide each parent into two parts at the same location k (chosen randomly).

parent 1: 0 1 1 0 1    parent 2: 1 0 0 1 1
child 1:  0 1 1 1 1
child 2:  1 0 0 0 1

Child 1 consists of genes 1 to k-1 from parent 1 and genes k to n from parent 2. Child 2 is the "reverse".
49
Selection Operator
Think of crossover as matingSelection biases mating so that fitter parents are more likely to mate.
Example:
1. 1 1 1 0 1   fitness 4
2. 0 1 1 0 1   fitness 3
3. 0 0 1 1 0   fitness 2
4. 1 0 0 1 1   fitness 3
Total fitness: 12
For example, let the probability of selecting member j be fitness(j)/total fitness
Prob(1) = 4/12 = 1/3
Prob(3) = 2/12 = 1/6
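This fitness-proportional rule is often implemented as "roulette wheel" selection; a sketch using the slide's four members and fitnesses (the draw count and seed are arbitrary illustrations):

```python
# Roulette-wheel selection: member j is chosen with probability
# fitness(j) / total fitness, so Prob(1) = 4/12 = 1/3, Prob(3) = 2/12 = 1/6.

import random

def select(population, fitness, rng):
    total = sum(fitness)
    r = rng.random() * total          # a point on the "wheel"
    acc = 0.0
    for member, fit in zip(population, fitness):
        acc += fit                     # each member owns a slice of size fit
        if r <= acc:
            return member

population = ["11101", "01101", "00110", "10011"]
fitness = [4, 3, 2, 3]

rng = random.Random(0)
counts = {m: 0 for m in population}
for _ in range(12000):
    counts[select(population, fitness, rng)] += 1

# Empirical frequencies should be close to 1/3, 1/4, 1/6, 1/4
assert abs(counts["11101"] / 12000 - 4 / 12) < 0.03
assert abs(counts["00110"] / 12000 - 2 / 12) < 0.03
```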
50
Example with Selection and Crossover Only
original      after 5 generations   after 10 generations
1 0 0 1 1     1 1 0 1 1             1 1 0 1 1
0 1 0 0 0     1 0 1 1 1             1 0 0 1 1
0 0 0 0 1     1 1 0 1 1             1 1 0 1 1
1 1 1 1 1     1 1 0 1 1             1 1 0 1 1
. . .
0 0 1 0 0     1 1 0 1 1             1 1 0 1 1
1 1 0 1 1     1 1 0 1 1             1 1 0 1 1

Average fitness: 2.8000        3.7000                3.9000
51
Mutation
Previous difficulty: important genetic variability was lost from the population
Idea: introduce genetic variability into the population through mutation
simple mutation operation: randomly flip q% of the alleles (bits) in the population.
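Putting the pieces together, here is a minimal end-to-end sketch of a GA with selection, 1-point crossover, and q = 1% mutation on the earlier maximize-the-number-of-1s example. The population size, generation count, and seed are illustrative assumptions, not lecture parameters.

```python
# Toy GA: maximize the number of 1s in a 5-bit string.

import random

rng = random.Random(42)
N_BITS, POP, GENS, MUT = 5, 20, 30, 0.01

def fitness(s):
    return sum(s)

def select(pop):
    # fitness-proportional selection (tiny offset guards an all-zero pop)
    return rng.choices(pop, weights=[fitness(s) + 1e-9 for s in pop])[0]

def crossover(p1, p2):
    k = rng.randrange(1, N_BITS)   # 1-point crossover location
    return p1[:k] + p2[k:]

def mutate(s):
    # flip each allele (bit) independently with probability MUT
    return [b ^ 1 if rng.random() < MUT else b for b in s]

pop = [[rng.randrange(2) for _ in range(N_BITS)] for _ in range(POP)]
for _ in range(GENS):
    pop = [mutate(crossover(select(pop), select(pop))) for _ in range(POP)]

# Selection pressure should drive the population toward all-1 strings
assert max(fitness(s) for s in pop) >= 4
```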
52
Previous Example with a 1% mutation rate
original      after 5 generations   after 10 generations
1 0 0 1 1     1 1 0 1 1             1 1 1 1 1
0 1 0 0 0     1 1 1 1 1             1 1 1 1 1
0 0 0 0 1     1 1 1 1 1             1 1 1 1 1
1 1 1 1 1     1 1 1 1 1             1 1 1 1 1
. . .
1 0 0 0 1     0 1 1 1 1             1 1 1 1 1
0 0 1 0 0     1 1 1 1 1             1 1 1 1 1
1 1 0 1 1     1 1 1 1 1             1 1 1 1 1

Average fitness: 2.8000        4.8000                4.9000
53
Generation based GAs
[Figure: parents from the current population (numbered 1 through 20) are paired to produce children labeled A through T.]
Then replace the original population by the children.
54
Generation based GAs
[Figure: the numbered parent population 1 through 20 on the left is replaced by the child population A through T on the right.]
This creates the next generation. Then iterate
55
For genetic algorithms, the final exam will cover basic terminology.
We will not cover: steady state, random keys.
We will cover terms mentioned on the previous slides.
56
Any questions before we solicit feedback on 15.053?