TRANSCRIPT
1
15.053 Thursday, May 16
Review of 15.053
Handouts: Lecture Notes
2
Overview of Problem Types
Dynamic programming
Nonlinear Programming
“Easy” Nonlinear Programming
“Hard” Nonlinear Programming
Linear Programming
Integer Programming
Network Flows
3
Overview of Problem Types
Nonlinear Programming
“Easy” Nonlinear Programming
“Hard” Nonlinear Programming
Dynamic programming
Integer Programming
Linear Programming
Network Flows
4
Why the focus on linear programming?
Linear programming illustrates much of what is important about modeling.
Linear programming is a very useful tool in optimization!
We can solve linear programs very efficiently.
The state-of-the-art integer programming techniques rely on linear programming.
Linear programming is the best way of teaching about performance guarantees and duality.
Linear programming is very helpful for understanding other optimization approaches.
5
Topics through Midterm 2
Linear programming
– Formulations
– Geometry
– The simplex algorithm
– Sensitivity Analysis
– Duality Theory
Network Optimization
Integer programming
– Formulations
– Branch and Bound (B&B)
– Cutting planes
6
Topics covered in the Final Exam
Linear Programming Formulations
Integer Programming Formulations
Nonlinear Programming
Dynamic Programming
Heuristics
7
Rest of this lecture
A very brief overview of the topics covered since the 2nd midterm.
Slides drawn from lectures.
If you have questions about the topics covered, ask them as I go along. I need to reserve time at the end for Sloan course evaluations.
8
What is a non-linear program?
maximize 3 sin x + xy + y^3 - 3z + log z
subject to x^2 + y^2 = 1
x + 4z ≥ 2
z ≥ 0
A non-linear program is permitted to have non-linear constraints or objectives. A linear program is a special case of non-linear programming!
9
Portfolio Selection Example
When trying to design a financial portfolio, investors seek to simultaneously minimize risk and maximize return.
Risk is often measured as the variance of the total return, a nonlinear function.
FACT:
var(x1 + x2 + … + xn) = var(x1) + var(x2) + … + var(xn) + Σi≠j cov(xi, xj)
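As a quick numerical check of this FACT, the following sketch verifies the identity on sample data. The return histories are made up for illustration (not lecture data), and population (1/N) statistics are used throughout.

```python
# Illustrative check of the identity
#   var(x1 + ... + xn) = sum_i var(xi) + sum_{i != j} cov(xi, xj)
# using population (1/N) statistics.

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

# Hypothetical return histories for three assets
returns = [
    [0.05, -0.02, 0.07, 0.01],
    [0.01, 0.03, -0.01, 0.02],
    [-0.04, 0.06, 0.02, 0.00],
]

# Left-hand side: variance of the total return, computed directly
total = [sum(col) for col in zip(*returns)]
lhs = cov(total, total)

# Right-hand side: sum of all variances and cross-covariances
n = len(returns)
rhs = sum(cov(returns[i], returns[j]) for i in range(n) for j in range(n))

assert abs(lhs - rhs) < 1e-12
```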
10
Portfolio Selection (cont’d)
Two Methods are commonly used:
–Min Risk s.t. Expected Return ≥ Bound
–Max Expected Return - θ (Risk) where θ reflects the tradeoff between return and risk.
11
Regression, and estimating β
[Figure: scatter plot of Return on Stock A vs. Market Return, with a fitted regression line; the Market axis runs from -40% to 80% and the Stock axis from -60% to 80%.]
The value β is the slope of the regression line. Here it is around 0.6 (lower expected gain than the market, and lower risk).
12
Local vs. Global Optima
Def’n: Let x be a feasible solution. Then
– x is a global max if f(x) ≥ f(y) for every feasible y.
– x is a local max if f(x) ≥ f(y) for every feasible y sufficiently close to x (i.e., xj - ε ≤ yj ≤ xj + ε for all j and some small ε).
max f(x)
s.t. 0 ≤ x ≤ 1
[Figure: graph of z = f(x) on 0 ≤ x ≤ 1 with local maxima at points A, B, and C.]
There may be several locally optimal solutions.
13
Convex Functions
Convex functions: f(λy + (1-λ)z) ≤ λf(y) + (1-λ)f(z) for every y and z and for 0 ≤ λ ≤ 1.
e.g., f((y+z)/2) ≤ f(y)/2 + f(z)/2
We say the function is strictly convex if the inequality is "<" for 0 < λ < 1.
[Figure: a convex curve f(x); the line joining any two points y and z on the curve lies above the curve, e.g. at (y+z)/2.]
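As a quick numerical sanity check (not from the slides), the convexity inequality can be verified for the convex function f(x) = x^2 at a few sample points:

```python
# Check f(lam*y + (1-lam)*z) <= lam*f(y) + (1-lam)*f(z) for f(x) = x**2,
# a convex function, at several sample points and values of lambda.

f = lambda x: x * x

for lam in [0.0, 0.25, 0.5, 0.75, 1.0]:
    for (y, z) in [(-3, 5), (0, 2), (-1, -4)]:
        lhs = f(lam * y + (1 - lam) * z)
        rhs = lam * f(y) + (1 - lam) * f(z)
        assert lhs <= rhs + 1e-12   # convexity inequality holds
```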
14
Concave Functions
Concave functions: f(λy + (1-λ)z) ≥ λf(y) + (1-λ)f(z) for every y and z and for 0 ≤ λ ≤ 1.
e.g., f((y+z)/2) ≥ f(y)/2 + f(z)/2
We say the function is strictly concave if the inequality is ">" for 0 < λ < 1.
[Figure: a concave curve f(x); the line joining any two points z and y on the curve lies below the curve, e.g. at (y+z)/2.]
15
Convexity and Extreme Points
[Figure: a convex set S with points x and y, and an extreme point W of a polyhedron P; both axes run from 0 to 14.]
We say that a set S is convex if, for every two points x and y in S, and for every real number λ in [0,1], λx + (1-λ)y ∈ S.
The feasible region of a linear program is convex.
We say that an element w ∈ S is an extreme point (vertex, corner point) if w is not the midpoint of any line segment contained in S.
16
Local Maximum (Minimum) Property
A local max of a concave function on a convex feasible region is also a global max.
A local min of a convex function on a convex feasible region is also a global min.
Strict convexity or concavity implies that the global optimum is unique.
Given this, we can efficiently solve:
– Maximization problems with a concave objective function and linear constraints
– Minimization problems with a convex objective function and linear constraints
17
Where is the optimal solution?
[Figure: a feasible region in the (x, y) plane with curved isocontours of the objective; both axes run from 0 to 18.]
Note: the optimal solution is not at a corner point. It is where the isocontour first hits the feasible region.
18
Another example:
Minimize (x-8)^2 + (y-8)^2
[Figure: the feasible region in the (x, y) plane with circular isocontours centered at (8, 8), which lies inside the region; both axes run from 0 to 18.]
Here the global unconstrained minimum is also feasible, so the optimal solution is not on the boundary of the feasible region.
19
Finding a local maximum using Fibonacci Search.
[Figure: a search interval from 0 to 34 with evaluation points at 13, 16, 18, 19, 21, and 26; the length of the search interval shrinks through the Fibonacci numbers 34, 21, 13, 8, 5, 3, bracketing where the maximum may be.]
20
The search finds a local maximum, but not necessarily a global maximum.
[Figure: the same search interval from 0 to 34, with evaluation points 13, 16, 18, 19, 21, 26.]
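In code, a Fibonacci search over a unimodal function might look like the following sketch. Only the 0-to-34 interval and the Fibonacci interval lengths come from the slide; the function and iteration count here are illustrative.

```python
# Fibonacci-search sketch: narrow a bracket [lo, hi] around the maximizer
# of a unimodal function, shrinking the interval by Fibonacci ratios
# (34 -> 21 -> 13 -> 8 -> 5 -> 3, as on the slide).

def fib_search_max(f, lo, hi, n):
    fib = [1, 1]
    while len(fib) <= n:
        fib.append(fib[-1] + fib[-2])
    x1 = lo + (fib[n - 2] / fib[n]) * (hi - lo)
    x2 = lo + (fib[n - 1] / fib[n]) * (hi - lo)
    f1, f2 = f(x1), f(x2)
    for k in range(n, 2, -1):
        if f1 < f2:             # maximizer lies to the right of x1
            lo, x1, f1 = x1, x2, f2
            x2 = lo + (fib[k - 2] / fib[k - 1]) * (hi - lo)
            f2 = f(x2)
        else:                   # maximizer lies to the left of x2
            hi, x2, f2 = x2, x1, f1
            x1 = lo + (fib[k - 3] / fib[k - 1]) * (hi - lo)
            f1 = f(x1)
    return (lo + hi) / 2

# Unimodal example with its peak at x = 19 on [0, 34]
xstar = fib_search_max(lambda x: -(x - 19) ** 2, 0, 34, 10)
assert abs(xstar - 19) < 1.0
```

Note that, as the slide warns, this finds a local maximum; on a multimodal function nothing guarantees it is the global one.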
21
Approximating a non-linear function of 1 variable: the λ method
y = x^3/3 + 2x - 5
[Figure: plot of y = x^3/3 + 2x - 5 for -3 ≤ x ≤ 3, with y ranging from -25 to 15.]
Choose different values of x along the x-axis, and approximate the function using piecewise linear segments.
22
More on the λ method
y = x^3/3 + 2x - 5
[Figure: the same plot of y = x^3/3 + 2x - 5, with breakpoints a1 = -3 and a2 = -1 marked.]
a1 = -3, f(a1) = -20
a2 = -1, f(a2) = -7 1/3
Suppose that for -3 ≤ x ≤ -1, we represent x as λ1(-3) + λ2(-1), where λ1 + λ2 = 1 and λ1, λ2 ≥ 0.
Then we approximate f(x) as λ1(-20) + λ2(-7 1/3).
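This one-segment λ-method approximation can be computed directly; a sketch using the slide's breakpoints a1 = -3 and a2 = -1:

```python
# Lambda-method approximation of f(x) = x**3/3 + 2x - 5 on one segment.

def f(x):
    return x ** 3 / 3 + 2 * x - 5

a1, a2 = -3.0, -1.0

def approx(x):
    # Represent x = lam1*a1 + lam2*a2 with lam1 + lam2 = 1, lam >= 0,
    # then approximate f(x) by lam1*f(a1) + lam2*f(a2).
    lam2 = (x - a1) / (a2 - a1)
    lam1 = 1.0 - lam2
    return lam1 * f(a1) + lam2 * f(a2)

# At the breakpoints the approximation is exact; in between it is linear.
assert abs(approx(-3) - f(-3)) < 1e-9    # f(-3) = -20
assert abs(approx(-1) - f(-1)) < 1e-9    # f(-1) = -7 1/3
assert abs(approx(-2) - (-41 / 3)) < 1e-9  # midpoint: (f(-3) + f(-1)) / 2
```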
23
Approximating a non-linear objective function for a minimization NLP.
original problem: minimize { f(y) : y ∈ P }
Suppose that y = Σj λj aj, where Σj λj = 1 and λ ≥ 0.
Approximate f(y): minimize { Σj λj f(aj) : Σj λj aj ∈ P }
Note: when given a choice of representing y in alternative ways, the LP will choose one that leads to the least objective value for the approximation.
24
For minimizing a convex function, the λ-method automatically satisfies the additional adjacency property.
min z = λ1 f(a1) + λ2 f(a2) + λ3 f(a3) + λ4 f(a4) + λ5 f(a5)
s.t. λ1 + λ2 + λ3 + λ4 + λ5 = 1; λ ≥ 0
+ adjacency condition
+ other constraints
[Figure: a convex curve with breakpoints a1, a2, a3, a4, a5 on the x-axis.]
25
Dynamic programming
Suppose that there are 50 matches on a table, and the person who picks up the last match wins. At each alternating turn, my opponent or I can pick up 1, 2 or 6 matches. Assuming that I go first, how can I be sure of winning the game?
26
Determining the strategy using DP
n = number of matches left (n is the state/stage)
g(n) = 1 if you can force a win at n matches; g(n) = 0 otherwise.
g(n) is the optimal value function.
At each state/stage you can make one of three decisions: take 1, 2 or 6 matches.
g(1) = g(2) = g(6) = 1 (boundary conditions)
g(3) = 0; g(4) = g(5) = 1. (why?)
The recursion:
g(n) = 1 if g(n-1) = 0 or g(n-2) = 0 or g(n-6) = 0; g(n) = 0 otherwise.
Equivalently, g(n) = 1 - min( g(n-1), g(n-2), g(n-6) ).
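This recursion is short enough to run directly; a sketch:

```python
# g(n) = 1 means the player to move can force a win with n matches left,
# where a move removes 1, 2, or 6 matches.

def solve(n_max):
    g = [0] * (n_max + 1)       # g(0) = 0: no matches left, you have lost
    for n in range(1, n_max + 1):
        prev = [g[n - k] for k in (1, 2, 6) if n - k >= 0]
        # Win if some move leaves the opponent in a losing position
        g[n] = 1 if 0 in prev else 0
    return g

g = solve(50)
assert g[1] == g[2] == g[6] == 1    # boundary conditions from the slide
assert g[3] == 0 and g[4] == g[5] == 1
assert g[50] == 1                   # going first with 50 matches, you can win
```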
27
Dynamic Programming in General
Break up a complex decision problem into a sequence of smaller decision subproblems.
Stages: one solves decision problems one "stage" at a time. Stages can often be thought of as "time".
– Not every DP has stages.
– The previous shortest path problem has 6 stages.
– The match problem does not have stages.
28
Dynamic Programming in General
States: the smaller decision subproblems are often expressed in a very compact manner. The description of the smaller subproblems is often referred to as the state.
– match problem: the "state" is the number of matches left
At each state-stage, there are one or more decisions. The DP recursion determines the best decision.
– match problem: how many matches to remove
– shortest path example: go right and up, or else go down and right
29
Optimal Capacity Expansion: What is the least cost way of building plants?
Year   Cum. Demand   Cost per plant ($ millions)
2002        1                54
2003        2                56
2004        4                58
2005        6                57
2006        7                55
2007        8                52
There is an additional cost of $15 million in any year in which a plant is built. At most 3 plants a year can be built.
30
Finding a topological order
[Figure: a directed graph on 8 nodes with arc lengths; applying the rule below relabels the nodes 1 through 8 in topological order.]
Find a node with no incoming arc. Label it node 1.
For i = 2 to n, find a node with no incoming arc from an unlabeled node. Label it node i.
31
Find d(j) using a recursion.
d(j) is the shortest length of a path from node 1 to node j.
Let cij = length of arc (i,j)
[Figure: the topologically ordered graph from the previous slide with arc lengths; e.g., the arcs into node 4 have lengths c(2,4) = 3 and c(3,4) = 2.]
What is d(j) computed in terms of d(1), … d(j-1)?
Compute d(2), …, d(8)
Example: d(4) = min { 3 + d(2), 2 + d(3) }
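The recursion d(j) = min over incoming arcs (i, j) of d(i) + c(i, j) can be sketched as follows. The small graph here encodes only the arc lengths c(2,4) = 3 and c(3,4) = 2 from the example plus a few illustrative extras, not the slide's full graph.

```python
# Shortest paths in a graph whose nodes are already labeled in
# topological order: d(j) = min over arcs (i, j) of d(i) + c(i, j).

# arcs[(i, j)] = c_ij (illustrative data)
arcs = {(1, 2): 3, (1, 3): 2, (2, 4): 3, (3, 4): 2, (2, 5): 7, (4, 5): 1}

n = 5
d = {1: 0}
for j in range(2, n + 1):
    # every predecessor i < j has d(i) already computed
    d[j] = min(d[i] + c for (i, k), c in arcs.items() if k == j)

# e.g. d(4) = min(3 + d(2), 2 + d(3)) = min(3 + 3, 2 + 2) = 4
assert d[4] == 4
assert d[5] == 5
```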
32
Finding optimal paragraph layouts
TeX optimally decomposes paragraphs by selecting the breakpoints for each line optimally. It has a subroutine that computes the ugliness F(i,j) of a line that begins at word i and ends at word j-1. How can we use F(i,j) as part of a dynamic program whose solution will solve the paragraph problem?
TeX optimally decomposes paragraphs by select-
ing the breakpoints for each line optimally. It has
a subroutine that computes the ugliness F(i,j)
of a line that begins at word i and ends at word
j-1. How can we use F(i,j) as part of a dynamic
program whose solution will solve the paragraph
problem?
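One way to use F(i,j) (a sketch under assumptions, not the lecture's formulation) is the recursion c(i) = min over j > i of F(i,j) + c(j), where c(i) is the minimum total ugliness of laying out words i through n, with c(n+1) = 0. The word lengths and the ugliness function below are entirely hypothetical.

```python
# DP over breakpoints: a line holds words i .. j-1, costing F(i, j);
# c(i) = min_{j > i} F(i, j) + c(j), with c(n) = 0 past the last word.

import functools

WORDS = [3, 5, 2, 8, 4, 6, 2, 3]   # made-up word lengths
WIDTH = 12                          # made-up line width

def F(i, j):
    # Hypothetical ugliness: squared slack, infinite if the line overflows
    length = sum(WORDS[i:j]) + (j - i - 1)   # words plus inter-word spaces
    return float("inf") if length > WIDTH else (WIDTH - length) ** 2

@functools.lru_cache(maxsize=None)
def c(i):
    if i == len(WORDS):
        return 0.0
    return min(F(i, j) + c(j) for j in range(i + 1, len(WORDS) + 1))

total_ugliness = c(0)
assert total_ugliness < float("inf")   # some feasible layout exists
```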
33
Capital Budgeting, again
Investment budget = $14,000

Investment               1    2    3    4    5    6
Cash required ($1000s)   5    7    4    3    4    6
NPV added ($1000s)      16   22   12    8   11   19
34
Capital Budgeting: stage 3
Consider investment 3: cost $4, NPV $12.

Budget used up:    0  1  2  3  4  5  6  7  8  9  10 11 12 13 14
Stage 1 best NPV:  0  -  -  -  -  16 -  -  -  -  -  -  -  -  -
Stage 2 best NPV:  0  -  -  -  -  16 -  22 -  -  -  -  38 -  -
Stage 3 best NPV:  0  -  -  -  12 16 -  22 -  28 -  34 38 -  -

f(3,k) = max( f(2,k), f(2, k-4) + $12 ).
35
The recursion
f(0,0) = 0; f(0,k) is undefined for k > 0
f(k, v) = max( f(k-1, v), f(k-1, v - ak) + ck )
Either item k is included, or it is not.
The optimum solution to the original problem is max { f(n, v) : 0 ≤ v ≤ b }.
Note: we solve the capital budgeting problem for all right hand sides less than b.
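The recursion f(k, v) = max( f(k-1, v), f(k-1, v - ak) + ck ) runs directly on the table's data; a sketch:

```python
# Capital-budgeting (knapsack) DP with costs a, NPVs c (both in $1000s),
# and budget b = 14, from the table.

a = [5, 7, 4, 3, 4, 6]       # cash required
c = [16, 22, 12, 8, 11, 19]  # NPV added
b = 14

NEG = float("-inf")          # stands in for "undefined"
f = [0] + [NEG] * b          # f(0, 0) = 0; f(0, k) undefined for k > 0
for ak, ck in zip(a, c):
    # f(k, v) = max( f(k-1, v), f(k-1, v - ak) + ck )
    f = [max(f[v], f[v - ak] + ck if v >= ak else NEG)
         for v in range(b + 1)]

best = max(f)                # max { f(n, v) : 0 <= v <= b }
assert best == 43            # investments 1, 4, 6: NPV 16+8+19, cost 5+3+6
```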
36
Heuristics: a way of dealing with hard combinatorial problems
Construction heuristics: construct a solution.
Example: nearest neighbor heuristic

begin
choose an initial city for the tour;
while there are any unvisited cities, the next city on the tour is the nearest unvisited city;
end
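A runnable sketch of this heuristic, with a made-up symmetric distance matrix (not from the lecture):

```python
# Nearest-neighbor construction heuristic for a TSP tour.

dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 8],
    [10, 4, 8, 0],
]

def nearest_neighbor(dist, start=0):
    n = len(dist)
    tour = [start]
    unvisited = set(range(n)) - {start}
    while unvisited:
        last = tour[-1]
        # next city on the tour is the nearest unvisited city
        nxt = min(unvisited, key=lambda j: dist[last][j])
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

tour = nearest_neighbor(dist)
assert tour == [0, 1, 3, 2]
```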
37
Improvement Methods
These techniques start with a solution and seek simple modifications that improve it.
Example: Let T be a tour. Seek an improved tour T' such that |T - T'| = 2, i.e., T' differs from T in exactly two edges.
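This neighborhood search can be sketched as follows; the distance matrix and starting tour are illustrative, not from the slides.

```python
# 2-opt move: replace edges (t[i-1], t[i]) and (t[j], t[j+1]) by
# (t[i-1], t[j]) and (t[i], t[j+1]), i.e. reverse the segment t[i..j],
# whenever that shortens the tour; repeat until no move improves.

def tour_length(t, dist):
    # includes the closing edge t[-1] -> t[0]
    return sum(dist[t[k - 1]][t[k]] for k in range(len(t)))

def two_opt(tour, dist):
    improved = True
    while improved:
        improved = False
        n = len(tour)
        for i in range(1, n - 1):
            for j in range(i + 1, n):
                new = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
                if tour_length(new, dist) < tour_length(tour, dist):
                    tour, improved = new, True
    return tour

dist = [
    [0, 1, 4, 1],
    [1, 0, 1, 4],
    [4, 1, 0, 1],
    [1, 4, 1, 0],
]
# A deliberately "crossed" tour 0-2-1-3 (length 10) improves to the
# optimal ordering (length 4).
best = two_opt([0, 2, 1, 3], dist)
assert tour_length(best, dist) == 4
```

As the next slides note, the result is a locally optimal tour with respect to the 2-opt neighborhood, not necessarily a global optimum.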
38
Illustration of 2-opt heuristic
39
Take two edges out. Add 2 edges in.
40
Take two edges out. Add 2 edges in.
41
Local Optimality
A solution y is said to be locally optimum (with respect to a given neighborhood) if there is no neighbor of y whose objective value is better than that of y.
Example. 2-Opt finds a locally optimum solution.
42
Improvement methods typically find locally optimum solutions.
A solution y is said to be globally optimum if no other solution has a better objective value.
Remark: Local optimality depends on what a neighborhood is, i.e., what modifications in the solution are permissible.
– e.g., 2-interchanges
– e.g., 3-interchanges
43
What is a neighborhood for the fire station problem?
[Figure: a network with nodes numbered 1 through 16, representing the fire station problem.]
44
Insertion heuristic with randomization
Choose three cities randomly and obtain a tour T on those cities.
For k = 4 to n, choose a city that is not on T and insert it optimally into T.
Note: we can run this 1,000 times, and get many different answers. This increases the likelihood of getting a good solution.
Remark: simulated annealing will not be on the final exam.
45
GA terms
chromosome (solution)
gene (variable)
alleles: 1 or 0 (values)
selection
crossover
mutation
population
Objective: maximize the fitness function (objective function)
46
A Simple Example: Maximize the number of 1’s
Initial population   Fitness
1 1 1 0 1               4
0 1 1 0 1               3
0 0 1 1 0               2
1 0 0 1 1               3

Average fitness: 3
Usually populations are much bigger, say around 50 to 100, or more.
47
Crossover Operation: takes two solutions and creates a child (or more) whose genes are a mixture of the genes of the parents.
Select two parents from the population.
This is the selection step. There will be more on this later.
parent 1: 0 1 1 0 1
parent 2: 1 0 0 1 1
48
Crossover Operation: takes two solutions and creates a child (or more) whose genes are a mixture of the genes of the parents.
1-point crossover: divide each parent into two parts at the same location k (chosen randomly).

parent 1: 0 1 1 0 1    parent 2: 1 0 0 1 1
child 1:  0 1 1 1 1
child 2:  1 0 0 0 1

Child 1 consists of genes 1 to k-1 from parent 1 and genes k to n from parent 2. Child 2 is the "reverse".
49
Selection Operator
Think of crossover as matingSelection biases mating so that fitter parents are more likely to mate.
Example:
1. 1 1 1 0 1   fitness 4
2. 0 1 1 0 1   fitness 3
3. 0 0 1 1 0   fitness 2
4. 1 0 0 1 1   fitness 3
Total fitness: 12
For example, let the probability of selecting member j be fitness(j)/total fitness
Prob(1) = 4/12 = 1/3
Prob(3) = 2/12 = 1/6
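This fitness-proportional rule is often implemented as "roulette wheel" selection; a sketch using the slide's four members and fitnesses (the draw count and seed are arbitrary illustrations):

```python
# Roulette-wheel selection: member j is chosen with probability
# fitness(j) / total fitness, so Prob(1) = 4/12 = 1/3, Prob(3) = 2/12 = 1/6.

import random

def select(population, fitness, rng):
    total = sum(fitness)
    r = rng.random() * total          # a point on the "wheel"
    acc = 0.0
    for member, fit in zip(population, fitness):
        acc += fit                     # each member owns a slice of size fit
        if r <= acc:
            return member

population = ["11101", "01101", "00110", "10011"]
fitness = [4, 3, 2, 3]

rng = random.Random(0)
counts = {m: 0 for m in population}
for _ in range(12000):
    counts[select(population, fitness, rng)] += 1

# Empirical frequencies should be close to 1/3, 1/4, 1/6, 1/4
assert abs(counts["11101"] / 12000 - 4 / 12) < 0.03
assert abs(counts["00110"] / 12000 - 2 / 12) < 0.03
```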
50
Example with Selection and Crossover Only
original      after 5 generations   after 10 generations
1 0 0 1 1     1 1 0 1 1             1 1 0 1 1
0 1 0 0 0     1 0 1 1 1             1 0 0 1 1
0 0 0 0 1     1 1 0 1 1             1 1 0 1 1
1 1 1 1 1     1 1 0 1 1             1 1 0 1 1
. . .
0 0 1 0 0     1 1 0 1 1             1 1 0 1 1
1 1 0 1 1     1 1 0 1 1             1 1 0 1 1

Average fitness: 2.8000        3.7000                3.9000
51
Mutation
Previous difficulty: important genetic variability was lost from the population
Idea: introduce genetic variability into the population through mutation
simple mutation operation: randomly flip q% of the alleles (bits) in the population.
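Putting the pieces together, here is a minimal end-to-end sketch of a GA with selection, 1-point crossover, and q = 1% mutation on the earlier maximize-the-number-of-1s example. The population size, generation count, and seed are illustrative assumptions, not lecture parameters.

```python
# Toy GA: maximize the number of 1s in a 5-bit string.

import random

rng = random.Random(42)
N_BITS, POP, GENS, MUT = 5, 20, 30, 0.01

def fitness(s):
    return sum(s)

def select(pop):
    # fitness-proportional selection (tiny offset guards an all-zero pop)
    return rng.choices(pop, weights=[fitness(s) + 1e-9 for s in pop])[0]

def crossover(p1, p2):
    k = rng.randrange(1, N_BITS)   # 1-point crossover location
    return p1[:k] + p2[k:]

def mutate(s):
    # flip each allele (bit) independently with probability MUT
    return [b ^ 1 if rng.random() < MUT else b for b in s]

pop = [[rng.randrange(2) for _ in range(N_BITS)] for _ in range(POP)]
for _ in range(GENS):
    pop = [mutate(crossover(select(pop), select(pop))) for _ in range(POP)]

# Selection pressure should drive the population toward all-1 strings
assert max(fitness(s) for s in pop) >= 4
```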
52
Previous Example with a 1% mutation rate
original      after 5 generations   after 10 generations
1 0 0 1 1     1 1 0 1 1             1 1 1 1 1
0 1 0 0 0     1 1 1 1 1             1 1 1 1 1
0 0 0 0 1     1 1 1 1 1             1 1 1 1 1
1 1 1 1 1     1 1 1 1 1             1 1 1 1 1
. . .
1 0 0 0 1     0 1 1 1 1             1 1 1 1 1
0 0 1 0 0     1 1 1 1 1             1 1 1 1 1
1 1 0 1 1     1 1 1 1 1             1 1 1 1 1

Average fitness: 2.8000        4.8000                4.9000
53
Generation based GAs
[Figure: parents from the current population (numbered 1 through 20) are paired to produce children labeled A through T.]
Then replace the original population by the children.
54
Generation based GAs
[Figure: the numbered parent population 1 through 20 on the left is replaced by the child population A through T on the right.]
This creates the next generation. Then iterate
55
For genetic algorithms, the final exam will cover basic terminology.
We will not cover: steady state, random keys.
We will cover terms mentioned on the previous slides.
56
Any questions before we solicit feedback on 15.053?