Artificial Intelligence (COMP-424)
Lecture notes by Alexandre Tomberg
Prof. Joelle Pineau, McGill University
Winter 2009
Lecture notes Page 1
Table of Contents (December-03-08, 12:16 PM)

I. History of AI
II. Search
    1. Uninformed Search Methods
    2. Informed Search
    3. Search for Optimization Problems
    4. Game Playing
    5. Constraint Satisfaction
III. Logic
    1. Knowledge Representation: Logic
    2. First Order Logic
    3. Planning
    4. Spatial Planning
IV. Probability
    1. Reasoning under Uncertainty
    2. Bayesian Networks
V. Machine Learning
    1. Machine Learning: Parameter Estimation
    2. Learning with Missing Values
    3. Supervised Learning
    4. Neural Nets
    5. Decision Trees
VI. Decision Theory
    1. Utility Theory
    2. Markov Decision Processes (MDPs)
    3. Reinforcement Learning
History of AI (January-06-09, 10:03 AM)
Uninformed Search Methods (January-08-09, 10:06 AM)
Generic Search Algorithm:
Algorithm 1: BFS
Algorithm 2: DFS
Algorithm 3: Depth limited search
Algorithm 4: Iterative Deepening
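The four algorithms above differ mainly in how the frontier is managed; a minimal sketch (graphs stored as adjacency dicts is my assumption for illustration, not the notes' representation):

```python
from collections import deque

def bfs(graph, start, goal):
    """Breadth-first search: the frontier is a FIFO queue."""
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in graph.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None

def depth_limited(graph, node, goal, limit, path=None):
    """DFS that gives up below a fixed depth limit."""
    path = (path or []) + [node]
    if node == goal:
        return path
    if limit == 0:
        return None
    for nxt in graph.get(node, []):
        if nxt not in path:  # avoid cycles along the current path
            found = depth_limited(graph, nxt, goal, limit - 1, path)
            if found:
                return found
    return None

def iterative_deepening(graph, start, goal, max_depth=20):
    """Run depth-limited search with limits 0, 1, 2, ..."""
    for limit in range(max_depth + 1):
        found = depth_limited(graph, start, goal, limit)
        if found:
            return found
    return None
```

Plain DFS is `depth_limited` with an unbounded limit; iterative deepening keeps DFS's memory footprint while recovering BFS's shallowest-solution guarantee.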
Informed Search (January-13-09, 10:02 AM)
Algorithm #1: Best-First Search
Algorithm #2: Heuristic Search
Algorithms (January-13-09, 10:34 AM)
Algorithm #3: A* Search
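A* always expands the frontier node with the smallest f(n) = g(n) + h(n); a sketch, assuming (my representation, for illustration) a weighted adjacency dict and a heuristic table:

```python
import heapq

def a_star(graph, h, start, goal):
    """A* search: expand the node with the smallest f(n) = g(n) + h(n).

    graph: dict mapping node -> list of (neighbor, step_cost) pairs;
    h: heuristic estimates (assumed admissible) as a dict."""
    frontier = [(h[start], 0, start, [start])]  # (f, g, node, path)
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        for nxt, cost in graph.get(node, []):
            g2 = g + cost
            if g2 < best_g.get(nxt, float('inf')):
                best_g[nxt] = g2
                heapq.heappush(frontier, (g2 + h[nxt], g2, nxt, path + [nxt]))
    return None, float('inf')
```

With h admissible, the first time the goal is popped its path cost is optimal.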
Search for Optimization Problems (January-15-09, 10:05 AM)
Algorithm #1: Hill Climbing
Algorithm #2: Simulated Annealing
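Simulated annealing generalizes hill climbing: uphill moves are always taken, while downhill moves are accepted with probability exp(dE / T) under a shrinking temperature. A sketch (the geometric cooling schedule and parameter values are my assumptions for illustration):

```python
import math
import random

def simulated_annealing(f, neighbor, x0, T0=1.0, cooling=0.95, steps=500, rng=None):
    """Maximize f. Accept every uphill move; accept a downhill move
    with probability exp(dE / T), where T shrinks geometrically."""
    rng = rng or random.Random(0)
    x, T = x0, T0
    best = x
    for _ in range(steps):
        x2 = neighbor(x, rng)
        dE = f(x2) - f(x)
        if dE > 0 or rng.random() < math.exp(dE / T):
            x = x2
        if f(x) > f(best):
            best = x
        T *= cooling  # cool down: fewer downhill moves over time
    return best
```

With T fixed at (near) zero this degenerates into plain hill climbing, which is why it can escape local maxima early on but still converges late in the run.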
Iterative Improvement Algorithms (January-15-09, 10:05 AM)
Genetic Algorithms (January-15-09, 11:06 AM)
Game Playing (January-20-09, 10:03 AM)
Minimax Search (January-20-09, 10:07 AM)
α-β Pruning (January-20-09, 10:44 AM)
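Minimax with α-β pruning can be sketched as follows (the `children`/`evaluate` callback interface is my assumption for illustration; the algorithm itself is the standard one):

```python
def alphabeta(node, depth, alpha, beta, maximizing, children, evaluate):
    """Minimax with alpha-beta pruning. `children(node)` returns the
    successor states; `evaluate(node)` scores a leaf from MAX's view."""
    succ = children(node)
    if depth == 0 or not succ:
        return evaluate(node)
    if maximizing:
        value = float('-inf')
        for child in succ:
            value = max(value, alphabeta(child, depth - 1, alpha, beta,
                                         False, children, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # beta cutoff: MIN will never let play reach here
        return value
    else:
        value = float('inf')
        for child in succ:
            value = min(value, alphabeta(child, depth - 1, alpha, beta,
                                         True, children, evaluate))
            beta = min(beta, value)
            if alpha >= beta:
                break  # alpha cutoff: MAX already has a better option
        return value
```

Pruning never changes the minimax value; with good move ordering it roughly doubles the searchable depth.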
Constraint Satisfaction (January-22-09, 10:10 AM)
Knowledge Representation: Logic (January-27-09, 10:10 AM)
First Order Logic (February-18-09, 7:50 PM)
Planning (February-03-09, 10:11 AM)
Partial Order Planning Algorithm (February-18-09, 8:55 PM)
Least Commitment
Analysis
Spatial Planning (February-03-09, 10:32 AM)
If we know probabilities, what actions should we choose?
Reasoning under Uncertainty (February-18-09, 9:13 PM)
Bayesian Networks (March-19-09, 3:26 PM)
Machine Learning: Parameter Estimation (March-03-09, 10:09 AM)
Statistical Parameter Fitting (March-03-09, 10:34 AM)
Maximum Likelihood Estimate (MLE) (March-03-09, 10:53 AM)
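As a concrete instance (my illustrative example, not one worked in the notes): for a Bernoulli variable, the MLE is just the empirical frequency of successes, since that value maximizes L(θ) = θ^h (1 − θ)^(n−h).

```python
def bernoulli_mle(samples):
    """MLE for a Bernoulli parameter: the sample frequency of 1s,
    which maximizes L(theta) = theta^h * (1 - theta)^(n - h)."""
    return sum(samples) / len(samples)
```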
Learning with Missing Values (March-10-09, 10:14 AM)
Basic EM algorithm:
1. Start with an initial parameter setting.
2. Repeat until convergence:
   a. Expectation step: complete the data by assigning values to the missing items.
   b. Maximization step: compute the maximum log-likelihood and new parameters on the completed data.
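The two steps can be sketched on a toy problem: a 1-D mixture of two equal-weight Gaussians with known variance (this simplified model, and all parameter values below, are my assumptions for illustration):

```python
import math

def em_two_gaussians(data, mu, steps=50, sigma=1.0):
    """Soft EM for a 1-D mixture of two equal-weight Gaussians with
    known variance: the E-step computes each point's responsibility,
    the M-step re-estimates the two means from the weighted data."""
    mu1, mu2 = mu
    for _ in range(steps):
        # E-step: responsibility of component 1 for each point
        r = []
        for x in data:
            p1 = math.exp(-(x - mu1) ** 2 / (2 * sigma ** 2))
            p2 = math.exp(-(x - mu2) ** 2 / (2 * sigma ** 2))
            r.append(p1 / (p1 + p2))
        # M-step: weighted means maximize the expected log-likelihood
        mu1 = sum(ri * x for ri, x in zip(r, data)) / sum(r)
        mu2 = sum((1 - ri) * x for ri, x in zip(r, data)) / sum(1 - ri for ri in r)
    return mu1, mu2
```

Each iteration is guaranteed not to decrease the data's log-likelihood, though EM may still stop at a local optimum depending on the initial guess.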
Soft EM for a general Bayes net:
Machine Learning: Clustering (March-19-09, 4:21 PM)
Supervised Learning (March-10-09, 10:55 AM)
Overfitting (April-14-09, 8:35 PM)
Gradient Descent: Given w₀, for i = 0, 1, 2, ... apply the update step; repeat as long as necessary.
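The loop above can be sketched as follows, assuming the standard update wᵢ₊₁ = wᵢ − α ∇E(wᵢ) and a "step became tiny" stopping rule (both are my assumptions; the notes leave the rule unspecified here):

```python
def gradient_descent(grad, w0, alpha=0.1, tol=1e-8, max_iter=10000):
    """Given w0, repeatedly step against the gradient:
    w_{i+1} = w_i - alpha * grad(w_i), until the step is negligible."""
    w = w0
    for _ in range(max_iter):
        step = alpha * grad(w)
        w -= step
        if abs(step) < tol:
            break
    return w

# example: minimize E(w) = (w - 2)^2, whose gradient is 2(w - 2)
w_star = gradient_descent(lambda w: 2 * (w - 2), w0=0.0)
```

Too large an α overshoots and diverges; too small an α converges slowly, which motivates the line-search and adaptive-step variants.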
Finding Parameters in General (April-14-09, 9:05 PM)
Batch vs. Online Optimization (April-14-09, 9:38 PM)
What we should know:
Neural Nets (March-19-09, 4:48 PM)
Forward pass:
for layer k = 1 ... K do:
  - Compute the output of all units in layer k.
  - Copy this output as the input to the next layer.
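The loop above can be sketched as follows (sigmoid units and the weights-as-nested-lists layout are my assumptions for illustration):

```python
import math

def forward_pass(weights, x):
    """Forward pass: for layer k = 1..K, compute each unit's output
    (sigmoid of a weighted sum, bias stored last) and feed the layer's
    outputs forward as the next layer's inputs."""
    activations = x
    for layer in weights:
        activations = [
            1.0 / (1.0 + math.exp(-(sum(w * a for w, a in zip(unit[:-1], activations))
                                    + unit[-1])))
            for unit in layer
        ]
    return activations
```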
Feed Forward Neural Networks (April-15-09, 10:48 AM)
Backpropagation algorithm:
1. Forward pass: compute the output of the network, going from the input layer to the output layer.
2. Backward pass: compute the gradient of the error for every weight in the network, going from the output layer towards the input layer.
3. Update: update the weights using the standard rule.
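One full step of the three phases, for a tiny 2-2-1 sigmoid network with squared error E = (y − t)² / 2 (the network size, error function, and learning rate are my assumptions for illustration):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def backprop_step(W1, W2, x, t, lr=1.0):
    """One backpropagation step for a 2-2-1 sigmoid network.
    W1[j] = [w_j1, w_j2, bias] for hidden unit j; W2 = [v1, v2, bias]."""
    # 1. forward pass: input -> hidden -> output
    h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in W1]
    y = sigmoid(W2[0] * h[0] + W2[1] * h[1] + W2[2])
    # 2. backward pass: error signal at the output, then at each hidden unit
    d_out = (y - t) * y * (1 - y)
    d_hid = [d_out * W2[j] * h[j] * (1 - h[j]) for j in (0, 1)]
    # 3. update every weight: w <- w - lr * dE/dw
    W2 = [W2[0] - lr * d_out * h[0], W2[1] - lr * d_out * h[1], W2[2] - lr * d_out]
    W1 = [[W1[j][0] - lr * d_hid[j] * x[0],
           W1[j][1] - lr * d_hid[j] * x[1],
           W1[j][2] - lr * d_hid[j]] for j in (0, 1)]
    return W1, W2, y
```

Each hidden unit's error signal is the output's error signal pushed back through the connecting weight and scaled by the local sigmoid derivative h(1 − h); deeper networks just repeat that step layer by layer.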
Overfitting in Neural Nets (April-15-09, 12:56 PM)
Decision Trees (April-15-09, 1:04 PM)
Utility Theory (April-15-09, 1:54 PM)
Utility Models:
Maximizing Expected Utility (MEU) Principle (April-15-09, 2:21 PM)
What we should know:
Markov Decision Processes (MDPs) (April-15-09, 2:50 PM)
Policies (April-15-09, 2:50 PM)
Iterative Policy Evaluation Algorithm:
1. Start with some initial guess V₀.
2. During iteration k, update the value function for all states using the Bellman backup for the fixed policy.
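The iteration can be sketched as follows, assuming (my representation, for illustration) a small tabular MDP stored as nested dicts, with the synchronous backup V(s) ← R(s, π(s)) + γ Σ_{s'} P(s' | s, π(s)) V(s'):

```python
def policy_evaluation(P, R, policy, gamma=0.9, iters=200):
    """Iterative policy evaluation: start from V = 0 and repeatedly apply
    V(s) <- R(s, pi(s)) + gamma * sum_s' P(s' | s, pi(s)) V(s').
    P[s][a] is a dict {next_state: prob}; R[s][a] is a scalar reward."""
    states = list(P)
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        # synchronous update: every state reads the previous iteration's V
        V = {s: R[s][policy[s]]
                + gamma * sum(p * V[s2] for s2, p in P[s][policy[s]].items())
             for s in states}
    return V
```

Because the backup is a γ-contraction, the error shrinks by a factor of γ per sweep, so the iterates converge to V^π from any initial guess.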
Searching for a Good Policy (April-15-09, 4:47 PM)
Policy Iteration Algorithm:
1. Start with an initial policy π₀.
2. Compute V^π using the policy evaluation algorithm.
3. Compute π' using the greedy policy update rule on V^π.
4. Repeat until the policy no longer changes.
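A self-contained sketch of the loop, again assuming a small tabular MDP stored as nested dicts (my representation, not from the notes):

```python
def policy_iteration(P, R, gamma=0.9, eval_iters=200):
    """Policy iteration: alternate (1) policy evaluation and (2) greedy
    policy improvement until the policy stops changing.
    P[s][a] = {next_state: prob}; R[s][a] = immediate reward."""
    states = list(P)
    policy = {s: next(iter(P[s])) for s in states}  # arbitrary initial policy
    while True:
        # policy evaluation: V^pi by repeated Bellman backups
        V = {s: 0.0 for s in states}
        for _ in range(eval_iters):
            V = {s: R[s][policy[s]]
                    + gamma * sum(p * V[s2] for s2, p in P[s][policy[s]].items())
                 for s in states}
        # greedy improvement: pi'(s) = argmax_a Q(s, a)
        new_policy = {
            s: max(P[s], key=lambda a: R[s][a]
                   + gamma * sum(p * V[s2] for s2, p in P[s][a].items()))
            for s in states}
        if new_policy == policy:
            return policy, V
        policy = new_policy
```

Each improvement step is monotone, and with finitely many policies the loop must terminate at an optimal policy.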
Value Iteration Algorithm:
1. Start with an initial value function V₀.
2. Update the value function estimate using the Bellman optimality backup.
3. Repeat until the value estimates converge.
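The steps above can be sketched with the backup V(s) ← max_a [R(s, a) + γ Σ_{s'} P(s' | s, a) V(s')], using the same nested-dict MDP representation assumed earlier (my choice, for illustration):

```python
def value_iteration(P, R, gamma=0.9, tol=1e-6):
    """Value iteration: apply the Bellman optimality backup
    V(s) <- max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]
    until the largest per-state change drops below tol."""
    V = {s: 0.0 for s in P}
    while True:
        V2 = {s: max(R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a].items())
                     for a in P[s])
              for s in P}
        if max(abs(V2[s] - V[s]) for s in P) < tol:
            return V2
        V = V2
```

Unlike policy iteration, no explicit policy is kept during the loop; a greedy policy is read off from the final V.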
Reinforcement Learning (April-15-09, 5:38 PM)
TD(0) Learning Algorithm:
1. Initialize the value function V.
2. Repeat (until you get sick of it):
   a. Pick a start state s.
   b. Repeat for every time step t:
      i. Choose an action a based on the current policy π and current state s.
      ii. Take action a; observe reward r and new state s'.
      iii. Compute the TD error: δ = r + γ V(s') − V(s).
      iv. Update the value function: V(s) = V(s) + α_s δ.
      v. Update the current state: s = s'.
      vi. If s' is a terminal state, go to step 2.
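The inner update can be sketched as a batch TD(0) pass over logged transitions (the episode format, a fixed step size α, and terminal states keeping V = 0 are my assumptions for illustration):

```python
def td0(episodes, alpha=0.1, gamma=1.0):
    """TD(0) value estimation from logged transitions: for each step
    (s, r, s'), apply V(s) <- V(s) + alpha * (r + gamma*V(s') - V(s)).
    States never seen as a source (e.g. terminals) keep V = 0."""
    V = {}
    for episode in episodes:
        for s, r, s2 in episode:
            delta = r + gamma * V.get(s2, 0.0) - V.get(s, 0.0)  # TD error
            V[s] = V.get(s, 0.0) + alpha * delta
    return V
```

Unlike Monte Carlo estimation, each update bootstraps from the current estimate V(s') instead of waiting for the episode's full return, so learning happens at every time step.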
Reinforcement Learning for Control (April-15-09, 6:35 PM)