Machine learning: lecture 1 Tommi S. Jaakkola MIT AI Lab


Page 1:

Machine learning: lecture 1

Tommi S. Jaakkola

MIT AI Lab

Page 2:

6.867 Machine learning: administrivia

• Instructor: Prof. Tommi Jaakkola

• General info

– lectures TR 2:30-4pm

– tutorials/recitations (time/location tba)

• Grading

– midterm (15%), final (25%)

– 6 (≈ bi-weekly) problem sets (30%)

– final project (30%)

Tommi Jaakkola, MIT AI Lab 2

Page 3:

Broader context

• What is learning anyway?

• Principles of learning are “universal”

– society (e.g., scientific community)

– animal (e.g., human)

– machine

• Prediction is the key...

Tommi Jaakkola, MIT AI Lab 3

Page 4:

Prediction

• We make predictions all the time but rarely investigate the processes underlying our predictions

• In carrying out scientific research we are also governed by how theories are evaluated

• To automate the process of making predictions we also need to understand how we search for and refine “theories”

Tommi Jaakkola, MIT AI Lab 4

Page 5:

Machine learning

• Statistical machine learning

– principles, methods, and algorithms for learning and prediction on the basis of past experience

– already everywhere: speech recognition, hand-written character recognition, information retrieval, operating systems, compilers, fraud detection, security, defense applications, ...

Tommi Jaakkola, MIT AI Lab 5

Page 6:

Example

• A classification problem: predict the grades for students taking this course

Tommi Jaakkola, MIT AI Lab 6

Page 7:

Example

• A classification problem: predict the grades for students taking this course

• Key steps:

1. data

2. assumptions

3. representation

4. estimation

5. evaluation

6. model selection

Tommi Jaakkola, MIT AI Lab 7

Page 8:

Example

• A classification problem: predict the grades for students taking this course

• Key steps:

1. data: what “past experience” can we rely on?

Tommi Jaakkola, MIT AI Lab 8

Page 9:

Example

• A classification problem: predict the grades for students taking this course

• Key steps:

1. data: what “past experience” can we rely on?

2. assumptions: what can we assume about the students or the course?

Tommi Jaakkola, MIT AI Lab 9

Page 10:

Example

• A classification problem: predict the grades for students taking this course

• Key steps:

1. data: what “past experience” can we rely on?

2. assumptions: what can we assume about the students or the course?

3. representation: how do we “summarize” a student?

Tommi Jaakkola, MIT AI Lab 10

Page 11:

Example

• A classification problem: predict the grades for students taking this course

• Key steps:

1. data: what “past experience” can we rely on?

2. assumptions: what can we assume about the students or the course?

3. representation: how do we “summarize” a student?

4. estimation: how do we construct a map from students to grades?

Tommi Jaakkola, MIT AI Lab 11

Page 12:

Example

• A classification problem: predict the grades for students taking this course

• Key steps:

1. data: what “past experience” can we rely on?

2. assumptions: what can we assume about the students or the course?

3. representation: how do we “summarize” a student?

4. estimation: how do we construct a map from students to grades?

5. evaluation: how well are we predicting?

Tommi Jaakkola, MIT AI Lab 12

Page 13:

Example

• A classification problem: predict the grades for students taking this course

• Key steps:

1. data: what “past experience” can we rely on?

2. assumptions: what can we assume about the students or the course?

3. representation: how do we “summarize” a student?

4. estimation: how do we construct a map from students to grades?

5. evaluation: how well are we predicting?

6. model selection: perhaps we can do even better?

Tommi Jaakkola, MIT AI Lab 13

Page 14:

Data

• The data we have available (in principle):

– names and grades of students in past years’ ML courses

– academic records of past and current students

Tommi Jaakkola, MIT AI Lab 14

Page 15:

Data

• The data we have available (in principle):

– names and grades of students in past years’ ML courses

– academic records of past and current students

• “training” data:

  Student   ML   course 1   course 2   ...
  Peter     A    B          A          ...
  David     B    A          A          ...

• “test” data:

  Student   ML   course 1   course 2   ...
  Jack      ?    C          A          ...
  Kate      ?    A          A          ...

• Anything else we could use?

Tommi Jaakkola, MIT AI Lab 15

Page 16:

Assumptions

• There are many assumptions we can make to facilitate predictions

1. the course has remained roughly the same over the years

2. each student performs independently from others

Tommi Jaakkola, MIT AI Lab 16

Page 17:

Representation

• Academic records are rather diverse so we might limit the summaries to a select few courses

• For example, we can summarize the ith student (say Pete) with a vector

  xi = [A C B]

where the grades correspond to (say) 18.06, 6.041, and 6.034.
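
To measure “closeness” between such vectors, the letter grades have to be mapped to numbers first. A minimal Python sketch, assuming a simple 4.0-scale encoding (the slides do not specify one; the mapping below is hypothetical):

    # Hypothetical numeric encoding of letter grades (not from the slides).
    GRADE_VALUE = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}

    def encode(grades):
        """Map letter grades, e.g. ["A", "C", "B"], to a numeric vector."""
        return [GRADE_VALUE[g] for g in grades]

    x_i = encode(["A", "C", "B"])   # the student above -> [4.0, 2.0, 3.0]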

• The available data in this representation:

  Training               Test
  Student   ML grade     Student   ML grade
  x1        A            x′1       ?
  x2        B            x′2       ?
  ...       ...          ...       ...

Tommi Jaakkola, MIT AI Lab 17

Page 18:

Estimation

• Given the training data

  Student   ML grade
  x1        A
  x2        B
  ...       ...

we need to find a mapping from “input vectors” x to “labels” y encoding the grades for the ML course.

Tommi Jaakkola, MIT AI Lab 18

Page 19:

Estimation

• Given the training data

  Student   ML grade
  x1        A
  x2        B
  ...       ...

we need to find a mapping from “input vectors” x to “labels” y encoding the grades for the ML course.

• Possible solution (nearest neighbor classifier):

1. For any student x find the “closest” student xi in the training set

2. Predict yi, the grade of the closest student
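
A minimal sketch of this rule in Python, assuming numeric feature vectors (as in the encoding sketch above) and Euclidean distance as the notion of “closest”; the slides leave the distance measure open:

    import math

    def nearest_neighbor_predict(x, train_xs, train_ys):
        """Return the label of the training point closest to x."""
        def dist(a, b):
            return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
        closest = min(range(len(train_xs)), key=lambda j: dist(x, train_xs[j]))
        return train_ys[closest]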

Tommi Jaakkola, MIT AI Lab 19

Page 20:

Evaluation

• How can we tell how good our predictions are?

– we can wait till the end of this course...

– we can try to assess the accuracy based on the data we already have (training data)

Tommi Jaakkola, MIT AI Lab 20

Page 21:

Evaluation

• How can we tell how good our predictions are?

– we can wait till the end of this course...

– we can try to assess the accuracy based on the data we already have (training data)

• Possible solution:

– divide the training set further into training and test sets

– evaluate the classifier constructed on the basis of only the smaller training set on the new test set
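
A minimal sketch of this holdout procedure, reusing the hypothetical nearest_neighbor_predict from the earlier sketch:

    import random

    def holdout_accuracy(xs, ys, frac=0.7, seed=0):
        """Train on a random fraction of the data, report accuracy on the rest."""
        idx = list(range(len(xs)))
        random.Random(seed).shuffle(idx)
        cut = int(frac * len(idx))
        train, test = idx[:cut], idx[cut:]
        train_xs = [xs[i] for i in train]
        train_ys = [ys[i] for i in train]
        correct = sum(nearest_neighbor_predict(xs[i], train_xs, train_ys) == ys[i]
                      for i in test)
        return correct / len(test)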

Tommi Jaakkola, MIT AI Lab 21

Page 22:

Model selection

• We can refine

– the estimation algorithm (e.g., using a classifier other than the nearest neighbor classifier)

– the representation (e.g., base the summaries on a different set of courses)

– the assumptions (e.g., perhaps students work in groups)

etc.

• We have to rely on the method of evaluating the accuracy of our predictions to select among the possible refinements
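
A minimal sketch of selecting among such refinements by measured accuracy, here comparing alternative course sets for the representation; the course lists and record format are illustrative, not from the slides, and holdout_accuracy and encode are the helpers sketched earlier:

    # Hypothetical candidate representations: which courses to summarize by.
    CANDIDATE_COURSES = {
        "math-heavy": ["18.06", "6.041"],
        "cs-heavy":   ["6.034", "6.001"],
    }

    def select_representation(records, ys):
        """records: one dict per student mapping course -> letter grade."""
        def score(name):
            courses = CANDIDATE_COURSES[name]
            xs = [encode([r[c] for c in courses]) for r in records]
            return holdout_accuracy(xs, ys)
        return max(CANDIDATE_COURSES, key=score)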

Tommi Jaakkola, MIT AI Lab 22

Page 23:

Types of learning problems

A rough (and somewhat outdated) classification of learning problems:

• Supervised learning, where we get a set of training inputs and outputs

– classification, regression

• Unsupervised learning, where we are interested in capturing inherent organization in the data

– clustering, density estimation

• Reinforcement learning, where we only get feedback in the form of how well we are doing (not what we should be doing)

– planning

Tommi Jaakkola, MIT AI Lab 23

Page 24:

Supervised learning: classification (again)

Example: digit recognition (8x8 binary digits)

  binary digit    target label
  [8x8 image]     “2”
  [8x8 image]     “2”
  [8x8 image]     “1”
  [8x8 image]     “1”
  ...             ...

• We wish to learn the mapping from digits to labels
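
A minimal sketch, assuming each 8x8 binary digit is flattened into a 64-dimensional 0/1 vector so that the earlier nearest neighbor sketch applies unchanged:

    def flatten(image_8x8):
        """image_8x8: 8 rows of 8 pixels, each 0 or 1."""
        return [pixel for row in image_8x8 for pixel in row]

    # usage: label = nearest_neighbor_predict(flatten(img), train_xs, train_ys)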

Tommi Jaakkola, MIT AI Lab 24

Page 25:

Representation/assumptions

• A change in the representation that preserves the relevant information can preclude learning (for example, an invertible scrambling of a digit image’s pixels loses no information, yet can make the regularities far harder to find)

Tommi Jaakkola, MIT AI Lab 25

Page 26:

Supervised learning: regression

[Figure: training examples plotted with a fitted regression curve; x axis from −2 to 2, y axis from −5 to 3]

• Given a set of training examples {(x1, y1), . . . , (xn, yn)}, we want to learn a mapping f : X → Y such that

  yi ≈ f(xi), i = 1, . . . , n
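
A minimal sketch of one standard way to obtain such an f, assuming we commit to least-squares fitting of a fixed-degree polynomial (the slides do not fix a function class):

    import numpy as np

    def fit_poly(x_train, y_train, degree=3):
        """Least-squares polynomial fit: minimizes sum_i (yi - f(xi))^2."""
        coeffs = np.polyfit(x_train, y_train, degree)
        return np.poly1d(coeffs)   # callable f with yi ≈ f(xi)

    # usage: f = fit_poly(xs, ys); y_hat = f(0.5)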

Tommi Jaakkola, MIT AI Lab 26

Page 27:

Unsupervised learning: data organization

The digits again...

• We’d like to understand the generation process of examples (digits in this case)
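
A minimal sketch of one very simple generative model for this, assuming flattened 0/1 digit vectors and independent per-pixel Bernoulli probabilities (an illustrative assumption, not a model the slides commit to):

    def fit_pixel_probs(images):
        """Estimate P(pixel j is on) as its frequency across unlabeled images."""
        n = len(images)
        return [sum(img[j] for img in images) / n for j in range(len(images[0]))]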

Tommi Jaakkola, MIT AI Lab 27