Least squares
CS1114
http://cs1114.cs.cornell.edu
Robot speedometer
Suppose that our robot can occasionally report how far it has traveled (mileage)
– How can we tell how fast it is going?
This would be a really easy problem if:
– The robot never lied
• I.e., its mileage is always exactly correct
– The robot traveled at the same speed
Unfortunately, the real world is full of lying, accelerating robots
– We’re going to figure out how to handle them
The ideal robot
[Plot: mileage vs. time for the ideal robot]
The real (lying) robot
[Plot: mileage vs. time for the real (lying) robot]
Speedometer approach
We are (as usual) going to solve a very general version of this problem
– And explore some cool algorithms
– Many of which you will need in future classes
The velocity of the robot at a given time is the change in mileage w.r.t. time
– For our ideal robot, this is the slope of the line
• The line fits all our data exactly
In general, if we know mileage as a function of time, velocity is the derivative
– The velocity at any point in time is the slope of the mileage function
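In symbols (restating the slide; M(t) is our name for the mileage function):

$$ v(t) = \frac{dM}{dt}, \qquad M(t) = m\,t + b \;\Rightarrow\; v(t) = m $$

So for the ideal robot, the velocity is just the constant slope m.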
Estimating velocity
So all we need is the mileage function
We have as input some measurements
– Mileage, at certain times
A mileage function takes as input something we have no control over
– Input (time): independent variable
– Output (mileage): dependent variable
[Diagram: independent variable (time) → mileage function → dependent variable (mileage)]
Basic strategy
Based on the data, find the mileage function
– From this, we can compute:
• Velocity (1st derivative)
• Acceleration (2nd derivative)
For a while, we will only think about mileage functions which are lines
In other words, we assume lying, non-accelerating robots
– Lying, accelerating robots are much harder
Models and parameters
A model predicts a dependent variable from an independent variable
– So, a mileage function is actually a model
– A model also has some internal variables that are usually called parameters
– In our line example, the parameters are m and b
[Diagram: independent variable (time) → model with parameters (m, b) → dependent variable (mileage)]
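A minimal sketch of this in Python (the function name and numbers are ours, for illustration):

```python
def mileage_model(t, m, b):
    """Predict the dependent variable (mileage) from the independent
    variable (time), using the model's parameters m (slope) and b (intercept)."""
    return m * t + b

# A robot traveling at 2 miles per unit time, starting at mileage 0:
print(mileage_model(3.0, m=2.0, b=0.0))  # -> 6.0
```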
Linear regression
Simplest case: fitting a line
[Plot: mileage vs. time data]
Linear regression
Simplest case: just 2 points
[Plot: two data points (x1, y1) and (x2, y2) on mileage vs. time axes]
Linear regression
Simplest case: just 2 points
Want to find a line y = mx + b through (x1, y1) and (x2, y2)
This forms a linear system:
y1 = mx1 + b
y2 = mx2 + b
The x’s and y’s are knowns; m and b are unknowns
Very easy to solve
[Plot: the line through the two points (x1, y1) and (x2, y2)]
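A sketch of the two-point solve in Python with NumPy (the point values are made up for illustration):

```python
import numpy as np

# Two observed (time, mileage) points, chosen for illustration
x1, y1 = 1.0, 2.0
x2, y2 = 4.0, 8.0

# y1 = m*x1 + b and y2 = m*x2 + b, written as A @ [m, b] = y
A = np.array([[x1, 1.0],
              [x2, 1.0]])
y = np.array([y1, y2])

m, b = np.linalg.solve(A, y)
print(m, b)  # -> 2.0 0.0
```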
Linear regression, > 2 points
The line won’t necessarily pass through any data point
[Plot: data points (xi, yi) and a line y = mx + b]
Some new definitions
No line is perfect – we can only find the best line out of all the imperfect ones
We’ll define an objective function Cost(m,b) that measures how far a line is from the data, then find the best line
– I.e., the (m,b) that minimizes Cost(m,b)
Line goodness
What makes a line good versus bad?
– This is actually a very subtle question
[Plot: mileage vs. time data]
Residual errors
The difference between what the model predicts and what we observe is called a residual error (i.e., a left-over)
– Consider the data point (x, y)
– The model (m, b) predicts (x, mx + b)
– The residual is y – (mx + b)
For 1D regressions, residuals can be easily visualized
– Vertical distance to the line
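In symbols, the residual for the i-th data point is:

$$ r_i = y_i - (m\,x_i + b) $$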
Least squares fitting
[Plot: mileage vs. time data, showing each point’s vertical distance to the line]
This is a reasonable cost function, but we usually use something slightly different
Least squares fitting
[Plot: mileage vs. time data, with the vertical distances squared]
We prefer to make this a squared distance
Called “least squares”
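Putting the last two slides together, the cost being described is the sum of squared residuals:

$$ \mathrm{Cost}(m, b) = \sum_i \left( y_i - (m\,x_i + b) \right)^2 $$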
Why least squares?
There are lots of reasonable objective functions
Why do we want to use least squares?
This is a very deep question
– We will soon point out two things that are special about least squares
– The full story probably needs to wait for graduate-level courses, or at least next semester
Gradient descent
Basic strategy:
1. Start with some guess for the minimum
2. Find the direction of steepest descent (gradient)
3. Take a step in that direction (making sure that you get lower; if not, adjust the step size)
4. Repeat until taking a step doesn’t get you much lower
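A minimal sketch of this strategy in Python, applied to the least-squares line cost (uses NumPy; the starting guess, step size, tolerance, and data are illustrative assumptions, not values from the slides):

```python
import numpy as np

def cost(params, x, y):
    """Sum of squared residuals for the line m*x + b."""
    m, b = params
    return np.sum((y - (m * x + b)) ** 2)

def gradient(params, x, y):
    """Gradient of the cost with respect to (m, b)."""
    m, b = params
    r = y - (m * x + b)                       # residuals
    return np.array([-2.0 * np.sum(r * x),    # d(cost)/dm
                     -2.0 * np.sum(r)])       # d(cost)/db

def gradient_descent(x, y, step=0.01, tol=1e-9, max_iters=10000):
    params = np.zeros(2)                      # 1. start with some guess
    c = cost(params, x, y)
    for _ in range(max_iters):
        g = gradient(params, x, y)            # 2. steepest descent is along -g
        # 3. take a step; if it doesn't get us lower, shrink the step size
        while cost(params - step * g, x, y) >= c:
            step *= 0.5
            if step < 1e-12:
                return params
        params = params - step * g
        new_c = cost(params, x, y)
        if c - new_c < tol:                   # 4. stop when steps barely help
            return params
        c = new_c
    return params

# Noisy mileage readings (made up for illustration)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.1, 2.2, 3.9, 6.1, 8.0, 9.9])
print(gradient_descent(x, y))  # roughly [1.96, 0.13]
```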
Gradient descent, 1D quadratic
There is some magic in setting the step size
[Plot: sum of squared errors vs. m]
Some error functions are easy
A (positive) quadratic is a convex function
– The set of points above the curve forms an (infinite) convex set
– The previous slide shows this in 1D
• But it’s true in any dimension
A sum of convex functions is convex
Thus, the sum of squared errors is convex
Convex functions are “nice”
– They have a single global minimum
– Rolling downhill from anywhere gets you there
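A one-line sanity check in 1D (ignoring b for simplicity): a single squared-error term has a non-negative second derivative, which is exactly what convexity requires here:

$$ f(m) = (y - m\,x)^2 \quad\Rightarrow\quad f''(m) = 2x^2 \ge 0 $$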
Consequences
Our gradient descent method will always converge to the right answer
– By slowly rolling downhill
– It might take a long time; it’s hard to predict exactly how long (see CS3220 and beyond)
Why is an error function hard?
An error function where we can get stuck if we roll downhill is a hard one
– Where we get stuck depends on where we start (i.e., the initial guess/conditions)
– An error function is hard if the area “above it” has a certain shape
• Nooks and crannies
• In other words, NOT convex!
– Non-convex error functions are hard to minimize
What else about LS?
Least squares has an even more amazing property than convexity
– Consider the linear regression problem
There is a magic formula for the optimal choice of (m,b)
– You don’t need to roll downhill; you can “simply” compute the right answer
Closed-form solution!
This is a huge part of why everyone uses least squares
Other functions are convex, but have no closed-form solution
Closed-form LS formula
The derivation requires linear algebra
– Most books also use calculus, but it’s not required (see the “Links” section on the course web page)
– There’s a closed form for any linear least-squares problem
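The slides leave the formula itself to the linked references; for completeness, the standard closed form for fitting a line to n points (a well-known result, not taken from these slides) is

$$ m = \frac{n \sum_i x_i y_i - \sum_i x_i \sum_i y_i}{n \sum_i x_i^2 - \left( \sum_i x_i \right)^2}, \qquad b = \bar{y} - m\,\bar{x} $$

More generally, when the residuals are linear in a parameter vector $\theta$ (stack the data into a matrix $A$ and a vector $y$), the optimum satisfies the normal equations $A^\top A\,\theta = A^\top y$.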
Linear least squares
Any formula where the residual is linear in the variables
Example
– Linear regression: [y – (mx + b)]²
Non-example: [x’ – abc·x]² (variables: a, b, c)
– The product abc makes the residual non-linear in the variables
Linear least squares
Surprisingly, fitting the coefficients of a quadratic is still linear least squares
– The residual is still linear in the coefficients β1, β2, β3
Wikipedia, “Least squares fitting”
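For instance, with a quadratic model (a standard setup; the ordering of the β’s here is our choice):

$$ y = \beta_1 + \beta_2 x + \beta_3 x^2, \qquad r_i = y_i - \left( \beta_1 + \beta_2 x_i + \beta_3 x_i^2 \right) $$

The residual is quadratic in x but linear in the unknowns β1, β2, β3, which is all that matters.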
Optimization
Least squares is another example of an optimization problem
Optimization: define a cost function and a set of possible solutions, and find the one with the minimum cost
Optimization is a huge field
Sorting as optimization
Set of allowed answers: permutations of the input sequence
Cost(permutation) = number of out-of-order pairs
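A sketch of this cost function in Python (the function name is ours; this is the direct O(n²) pair count):

```python
def sort_cost(perm):
    """Cost of a candidate answer: the number of out-of-order pairs,
    i.e. pairs i < j with perm[i] > perm[j]."""
    n = len(perm)
    return sum(1 for i in range(n)
                 for j in range(i + 1, n)
                 if perm[i] > perm[j])

print(sort_cost([3, 1, 2]))  # -> 2 (the pairs (3, 1) and (3, 2))
print(sort_cost([1, 2, 3]))  # -> 0: a sorted sequence has minimum cost
```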
Algorithm 1: Snailsort
Algorithm 2: Bubble sort
Algorithm 3: ???