Learning
CPSC 386 Artificial Intelligence
Ellen Walker
Hiram College
What is learning?
• Process which changes a system to enable it to do the same task or tasks drawn from the same population more efficiently next time (improving performance).
• Examples (increasing abstraction)
– Rote learning
– Performance enhancement (problem solving)
– Classification
– Knowledge acquisition
Designing a Learning Agent
• Which components of the performance element are to be learned?
• What feedback is available to learn this?
• What representation is used?
Symbolic vs. Non-Symbolic learning
• If you “open the system up” after it has learned, can the knowledge be easily expressed?
• Symbolic uses accessible internal representations
• Non-symbolic uses inaccessible internal representations
Learning Examples Classified
| | Symbolic | Non-symbolic |
|---|---|---|
| Supervised | Structural, Decision tree, Candidate Elimination, Explanation-Based Learning | Genetic Algorithms, Backpropagation NNs, SCARF, Parameter Adjustment |
| Unsupervised | Discovery (AM, Bacon), SOAR, Prodigy | Clustering, Competitive learning (Kohonen Net), Hopfield Networks |
Inductive learning
• Given a set of examples (x, y), where x is the input and y is the output
• Learn a function y = f(x) that
– Returns correct results for all (x, y) pairs in the training set of examples
– Generalizes well: returns correct results for x values not in the training set
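The two requirements above can be sketched in a few lines. The training pairs, the held-out pairs, and the candidate function `f` are all made up for illustration; none of them come from the slides.

```python
# Hypothetical training set of (x, y) pairs; illustrative only.
training_set = [(1, 2), (2, 4), (3, 6)]
# Hypothetical unseen pairs used to probe generalization.
held_out = [(4, 8), (10, 20)]


def f(x):
    """Candidate hypothesis y = 2x (an assumption for this sketch)."""
    return 2 * x


def is_consistent(hypothesis, examples):
    """True if the hypothesis returns the recorded y for every x."""
    return all(hypothesis(x) == y for x, y in examples)


fits_training = is_consistent(f, training_set)  # requirement 1
generalizes = is_consistent(f, held_out)        # requirement 2
print(fits_training, generalizes)
```

A hypothesis can pass the first check and still fail the second, which is why the two are tested on separate data.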
Ockham’s Razor
• If two functions fit, pick the simplest
• There is an inevitable tradeoff between the complexity of the hypothesis function and the degree of fit to the data.
Decision Trees
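• Each node is a question
• Each leaf is a decision
[Figure: example decision tree. The root asks "hair?"; one branch asks "legs?" with leaves snake and frog, the other asks "Pet?" with leaves Cat and Lion.]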
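One way to picture such a tree in code is as nested dictionaries, with a small loop that follows answers from the root to a leaf. This is only a sketch: the dictionary layout and the function name `classify` are choices made here, and the assignment of yes/no answers to branches is an assumption, since the slide text does not label the edges.

```python
# Nested-dict encoding of the animal tree. Which answer leads to which
# branch is an assumption; the slide text does not label the edges.
animal_tree = {
    "question": "hair?",
    "yes": {"question": "pet?", "yes": "cat", "no": "lion"},
    "no": {"question": "legs?", "yes": "frog", "no": "snake"},
}


def classify(tree, answers):
    """Follow the answer to each question from the root until a leaf."""
    node = tree
    while isinstance(node, dict):  # inner nodes are dicts, leaves are strings
        node = node[answers[node["question"]]]
    return node


result = classify(animal_tree, {"hair?": "yes", "pet?": "no"})
print(result)
```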
Learning Decision Trees from Examples
• Silly example: should I buy this car?
1. red VW (foreign, small, red) YES
2. green Cadillac (domestic, large, green) NO
3. blue Subaru (foreign, small, blue) YES
4. blue Mercedes (foreign, large, blue) NO
5. red Saturn (domestic, small, red) YES
Three types of learning
• Supervised
– The system learns a function from examples of inputs and outputs
– Correct outputs must be available during training
• Unsupervised
– The system learns without feedback, based on a global optimization criterion
• Reinforcement
– The system is rewarded (or punished) for its decisions
– This is the most general type, and it models most human learning (except school).
Recursive Splitting
• Start with one big class
– If there are some yes and some no, choose an attribute to split them (we now have 2 recursive problems)
– Otherwise, we are done
• When all recursive problems are solved, the remaining classes will have all YES or all NO
• Each decision used for a split is a branch on the tree.
Recursive splitting example
• Initial class
{ (foreign, small, red, yes), (domestic, large, green, no), (foreign, small, blue, yes), (foreign, large, blue, no), (domestic, small, red, yes) }
• Split on size:
{ (foreign, small, red, yes), (foreign, small, blue, yes), (domestic, small, red, yes) }
{ (domestic, large, green, no), (foreign, large, blue, no) }
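The split above can be reproduced with a short grouping function. This is a sketch using the five car examples from the slides; the attribute names in `ATTRIBUTE_INDEX` and the helpers `split` and `is_pure` are names chosen here for illustration.

```python
from collections import defaultdict

# The five car examples from the slides: (origin, size, color, label).
examples = [
    ("foreign", "small", "red", "yes"),
    ("domestic", "large", "green", "no"),
    ("foreign", "small", "blue", "yes"),
    ("foreign", "large", "blue", "no"),
    ("domestic", "small", "red", "yes"),
]
ATTRIBUTE_INDEX = {"origin": 0, "size": 1, "color": 2}  # hypothetical names


def split(examples, attribute):
    """Group examples by their value for the given attribute."""
    groups = defaultdict(list)
    for example in examples:
        groups[example[ATTRIBUTE_INDEX[attribute]]].append(example)
    return dict(groups)


def is_pure(group):
    """True when every example in the group carries the same label."""
    return len({example[-1] for example in group}) == 1


by_size = split(examples, "size")
for value, group in by_size.items():
    print(value, is_pure(group), group)
```

Both groups produced by the size split are pure, so no further recursion is needed for this data.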
Choosing an attribute to split on
• We want to split on an attribute that gives us information
– If an attribute splits the class into all pos / all neg, that’s best!
– Otherwise, if an attribute splits the class roughly evenly, and one subclass is mostly pos and the other mostly neg, that’s pretty good
A formal notion of “best”
• Goal is to maximize information gain
– Number of “bits” of information needed before the split, minus the number of bits still needed after the split
– Information
• $I(p,n) = -\left(\frac{p}{p+n}\lg\frac{p}{p+n} + \frac{n}{p+n}\lg\frac{n}{p+n}\right)$
– We need to subtract the sum of the informations for the split, weighted by the fraction of items in each subset
• Example: (4,2) -> (3,0) and (1,2)
• Value is $I(4,2) - \frac{1}{2} I(3,0) - \frac{1}{2} I(1,2)$
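The (4,2) example can be checked numerically. The sketch below follows the definition of I(p,n) given above; the function names `information` and `information_gain` are chosen here, and a zero count is treated as contributing nothing (the usual convention that 0 · lg 0 = 0).

```python
from math import log2


def information(p, n):
    """I(p, n) in bits for p positive and n negative examples.

    A zero count contributes nothing (0 * lg 0 is taken as 0).
    """
    total = p + n
    return -sum((k / total) * log2(k / total) for k in (p, n) if k > 0)


def information_gain(parent, subsets):
    """Information before the split minus the weighted information after it.

    `parent` and each entry of `subsets` are (p, n) count pairs.
    """
    total = sum(parent)
    remainder = sum((p + n) / total * information(p, n) for p, n in subsets)
    return information(*parent) - remainder


gain = information_gain((4, 2), [(3, 0), (1, 2)])
print(round(gain, 3))
```

Here each subset holds 3 of the 6 items, which is where the two weights of 1/2 come from; the gain works out to roughly 0.459 bits.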
Updating our recursive algorithm
• (defun tree (examples) ...)
– If all examples are positive (or all are negative), return examples (this is a leaf)
– Else
• Choose the best attribute using information gain
• Divide examples into sublists based on their values for that attribute
• Return (cons attribute (mapcar #'tree list-of-sublists))
• The result will be a tree in which each element is an attribute and a list of branches.
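A runnable version of this recursion might look as follows in Python. It is a sketch under a few assumptions that are not in the slides: leaves return the class label rather than the list of examples, an attribute is not reused further down a branch, and a majority vote is used if the examples are still mixed when no attributes remain.

```python
from collections import Counter, defaultdict
from math import log2

# Car examples from the slides: (origin, size, color, label).
EXAMPLES = [
    ("foreign", "small", "red", "yes"),
    ("domestic", "large", "green", "no"),
    ("foreign", "small", "blue", "yes"),
    ("foreign", "large", "blue", "no"),
    ("domestic", "small", "red", "yes"),
]
ATTRIBUTES = {"origin": 0, "size": 1, "color": 2}  # hypothetical names


def entropy(examples):
    """Bits needed to encode the label of a random example."""
    counts = Counter(e[-1] for e in examples)
    return -sum(c / len(examples) * log2(c / len(examples)) for c in counts.values())


def partition(examples, attribute):
    """Group examples by their value for the attribute."""
    groups = defaultdict(list)
    for e in examples:
        groups[e[ATTRIBUTES[attribute]]].append(e)
    return groups


def gain(examples, attribute):
    """Entropy before the split minus the weighted entropy after it."""
    groups = partition(examples, attribute).values()
    return entropy(examples) - sum(len(g) / len(examples) * entropy(g) for g in groups)


def tree(examples, attributes):
    labels = {e[-1] for e in examples}
    if len(labels) == 1:  # all positive or all negative: a leaf
        return labels.pop()
    if not attributes:  # assumption: fall back to the majority label
        return Counter(e[-1] for e in examples).most_common(1)[0][0]
    best = max(attributes, key=lambda a: gain(examples, a))
    rest = [a for a in attributes if a != best]
    branches = {v: tree(g, rest) for v, g in partition(examples, best).items()}
    return (best, branches)


learned = tree(EXAMPLES, list(ATTRIBUTES))
print(learned)
```

On the five car examples, size has the highest gain and separates the classes completely, so the learned tree has a single question.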
Assessing a Learning System
• Collect a large set of examples
• Divide into test and training sets (disjoint)
• Apply learning algorithm to training set (only)
• Measure its performance on test set (only)
• Repeat for different sizes of training sets
• Repeat for different randomly selected training sets of each size
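The procedure above can be sketched as a small evaluation loop. Everything specific here is an assumption made for illustration: the synthetic data generator (a car is labeled "yes" exactly when it is small), the toy learner `learn_majority_by_size`, and the chosen fractions and trial counts.

```python
import random


def make_examples(count, rng):
    """Hypothetical synthetic data: a car is 'yes' exactly when it is small."""
    examples = []
    for _ in range(count):
        size = rng.choice(["small", "large"])
        color = rng.choice(["red", "green", "blue"])
        examples.append(((size, color), "yes" if size == "small" else "no"))
    return examples


def learn_majority_by_size(training_set):
    """Toy learner (an assumption): predict the majority label seen per size."""
    votes = {}
    for (size, _color), label in training_set:
        votes.setdefault(size, []).append(label)
    table = {size: max(set(labels), key=labels.count) for size, labels in votes.items()}
    return lambda attributes: table.get(attributes[0], "no")


def assess(examples, train_fraction, trials, rng):
    """Average test-set accuracy over several random disjoint splits."""
    scores = []
    for _ in range(trials):
        shuffled = examples[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_fraction)
        training_set, test_set = shuffled[:cut], shuffled[cut:]
        classify = learn_majority_by_size(training_set)       # training set only
        correct = sum(classify(x) == y for x, y in test_set)  # test set only
        scores.append(correct / len(test_set))
    return sum(scores) / trials


rng = random.Random(0)
data = make_examples(200, rng)
curve = {fraction: assess(data, fraction, trials=20, rng=rng) for fraction in (0.1, 0.5, 0.9)}
print(curve)
```

Plotting the values in `curve` against the training fraction gives a learning curve for this toy setup.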
Learning Curve
[Figure: learning curve. Axes: training set size (% of total) and % correct.]
Learning Depends on Training
• If the training and test sets are not both random samples of the overall set of examples, strange results can occur!
– What if the test set contains only small cars and the training set only large cars?
• If the overall set of examples doesn’t “cover the space”, the wrong concept will be learned
– Tank and weather example
Overfitting is Bad
• An algorithm is fully trained if it classifies every training case perfectly
• But what if every leaf is a set with only one element?
– Training set is perfectly classified
– Each element of the test set creates a new category: we have no experience!
– Avoid by requiring a minimum information gain value in order to split a set
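The one-element-per-leaf extreme can be imitated with a lookup table that memorizes each training example exactly. This is only an illustration of the failure mode, not a decision-tree implementation; the unseen car and both classifier functions are hypothetical.

```python
# Car examples from the slides: (origin, size, color) -> label.
training_set = {
    ("foreign", "small", "red"): "yes",
    ("domestic", "large", "green"): "no",
    ("foreign", "small", "blue"): "yes",
    ("foreign", "large", "blue"): "no",
    ("domestic", "small", "red"): "yes",
}


def memorizing_classifier(attributes):
    """One 'leaf' per training example: look the exact car up, or give up."""
    return training_set.get(attributes)  # None means "no experience"


def size_only_classifier(attributes):
    """A more general rule that consults a single attribute."""
    return "yes" if attributes[1] == "small" else "no"


training_accuracy = sum(
    memorizing_classifier(car) == label for car, label in training_set.items()
) / len(training_set)

unseen_car = ("domestic", "small", "blue")  # hypothetical test example
print(training_accuracy, memorizing_classifier(unseen_car), size_only_classifier(unseen_car))
```

The memorizing classifier is perfect on the training set yet has no answer for a car it has not seen, while the simpler rule still produces a decision.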
One example at a time
• At any given point we have a current hypothesis that explains the examples
• Positive examples (that were incorrectly classified as negative) extend the hypothesis until it includes the new example
• Negative examples (that were incorrectly classified as positive) restrict the hypothesis until it does not include the new example
Extending and Restricting
• To extend a hypothesis, “add in” the new information
– Extended hypothesis = hypothesis | pos. example
• To restrict a hypothesis, “subtract out” the new information
– Restricted hypothesis = hypothesis & not(neg. ex.)
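If a hypothesis is modeled as the set of instances it accepts, the two operations map directly onto set union and intersection with a complement. This modeling choice, the `UNIVERSE` of attribute combinations, and the function names are assumptions made for this sketch.

```python
from itertools import product

# Hypothetical instance space: every (origin, size, color) combination.
UNIVERSE = set(
    product(["foreign", "domestic"], ["small", "large"], ["red", "green", "blue"])
)


def extend(hypothesis, positive_example):
    """hypothesis | pos. example: the hypothesis now covers the new positive."""
    return hypothesis | {positive_example}


def restrict(hypothesis, negative_example):
    """hypothesis & not(neg. ex.): the hypothesis no longer covers the negative."""
    return hypothesis & (UNIVERSE - {negative_example})


most_specific = extend(set(), ("foreign", "small", "red"))
most_general = restrict(UNIVERSE, ("domestic", "large", "green"))
print(len(most_specific), len(most_general))
```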
Candidate elimination (car example)
1. red VW (foreign, small, red) YES
Min hypothesis: all foreign, small red things are good cars
Max hypothesis: everything is a good car
2. green Cadillac (domestic, large, green) NO
Min hypothesis: all foreign, small red things are good cars
Max hypothesis: everything foreign or small or not green is a good car
3. blue Subaru (foreign, small, blue) YES
Min hypothesis: all foreign, small (red or blue) things are good cars
Max hypothesis: everything foreign or small or not green is a good car
Candidate elimination (car example) cont.
4. blue Mercedes (foreign, large, blue) NO
Min hypothesis: all foreign, small, (red or blue) things are good cars
Max hypothesis: everything small or (domestic and not green) or (foreign and not blue) or red is a good car
5. red Saturn (domestic, small, red) YES
Min hypothesis: all small, (red or blue) things are good cars
Max hypothesis: everything small or (domestic and not green) or (foreign and not blue) or red is a good car
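The trace of the min hypothesis above can be replayed in code. In this sketch the min hypothesis is stored as one set of allowed values per attribute and widened on each positive example, while the max hypothesis is simplified to "anything not ruled out by a negative seen so far". Both representations and all function names are assumptions made here, not the slides' own formulation.

```python
# Car examples from the slides: ((origin, size, color), is_good_car).
EXAMPLES = [
    (("foreign", "small", "red"), True),
    (("domestic", "large", "green"), False),
    (("foreign", "small", "blue"), True),
    (("foreign", "large", "blue"), False),
    (("domestic", "small", "red"), True),
]


def covers(min_hypothesis, car):
    """True if every attribute value of the car is allowed by the hypothesis."""
    return all(value in allowed for value, allowed in zip(car, min_hypothesis))


def update(min_hypothesis, negatives, car, is_good):
    """Process one example: widen the min hypothesis or record a negative."""
    if is_good:
        if min_hypothesis is None:
            min_hypothesis = [{value} for value in car]
        else:
            for allowed, value in zip(min_hypothesis, car):
                allowed.add(value)
    else:
        negatives.append(car)
    return min_hypothesis


def max_covers(car, negatives):
    """Simplified max hypothesis: anything not ruled out so far is a good car."""
    return car not in negatives


min_hypothesis, negatives = None, []
for car, is_good in EXAMPLES:
    min_hypothesis = update(min_hypothesis, negatives, car, is_good)
    print(car, is_good, min_hypothesis)
```

After the fifth example the min hypothesis allows either origin, only small cars, and red or blue, which matches the slide's "all small, (red or blue) things are good cars".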
Version Space Learning
• Consider the set of all hypotheses consistent with the examples
– This will be the “range” from min to max in the prior examples
– This is called a version space, and is updated after each example
• Least-commitment algorithm
– We take no great leaps, but only make the minimal changes required for the concept to fit the examples.
Evaluating these algorithms
• Decision tree learning is faster
• ... but you need to have all examples in advance
• Decision trees make disjunctions easier to express
• Both are highly dependent on having the right attributes available
• Both are highly susceptible to noise (incorrect training examples)