Ch 9. Machine Learning: Symbol-based
9.0 Introduction
9.1 A Framework for Symbol-Based Learning
9.2 Version Space Search: The Candidate Elimination Algorithm
9.3 ID3 Decision Tree Induction Algorithm
9.5 Knowledge and Learning: Explanation-Based Learning
9.6 Unsupervised Learning: Conceptual Clustering
9.0 Introduction
Agents learn through their interactions with the world and through the experience of their own internal states and processes. Learning is important for practical applications of AI.
The knowledge engineering bottleneck, the cost and difficulty of building expert systems using traditional knowledge acquisition techniques, is a major obstacle to the widespread use of intelligent systems.
One solution is for a program to begin with a minimal amount of knowledge and learn from examples, high-level advice, and its own explorations of the domain.
Definition of learning: any change in a system that allows it to perform better the second time on repetition of the same task, or on another task drawn from the same population (Simon, 1983).
Views of learning:
Generalization from experience: induction, the learner must generalize correctly to unseen instances of the domain; inductive biases are the selection criteria by which it selects the most effective aspects of its experience.
Changes in the learner: acquisition of explicitly represented domain knowledge; based on its experience, the learner constructs or modifies expressions in a formal language (e.g. logic).
Learning algorithms vary in their goals, available training data, learning strategies, and knowledge representation languages.
All of these algorithms, however, learn by searching through a space of possible concepts to find an acceptable generalization (concept space, Fig. 9.5).
Inductive learning: learning a generalization from a set of examples. Concept learning is a typical inductive learning task: infer a definition from given examples of some concept (e.g. cat, soybean disease) that lets the learner correctly recognize future instances of that concept. Two algorithms for this task are version space search and ID3.
Similarity-based vs. Explanation-based
Similarity-based learning (data-driven): uses no prior knowledge of the domain; relies on large numbers of examples; generalizes on the basis of patterns in the training data.
Explanation-based learning (prior-knowledge-driven): uses prior knowledge of the domain to guide generalization; learning by analogy and other techniques that exploit prior knowledge to learn from a limited amount of training data.
Supervised vs. Unsupervised
Supervised learning: learning from training instances of known classification.
Unsupervised learning: learning from unclassified training data; conceptual clustering or category formation.
9.1 Framework for Symbol-based Learning
Learning algorithms are characterized by a general model (Fig. 9.1, p 354) with the following components:
Data and goals of the learning task
Representation Language
A set of operations
Concept space
Heuristic Search
Acquired knowledge
A general model of the learning process (Fig. 9.1)
Data and Goals
Type of data: positive or negative examples; a single positive example plus domain-specific knowledge; high-level advice (e.g. a condition for loop termination); analogies (e.g. electricity vs. water).
Goals of learning algorithms: acquisition of a concept (a general description of a class of objects), plans, problem-solving heuristics, or other forms of procedural knowledge.
Properties and quality of data: may come from the outside environment (e.g. a teacher) or be generated by the program itself; reliable or noisy; well-structured or unorganized; positive and negative examples or positive examples only.
Concept learning: data = positive/negative examples of a target class; goal = to infer a general definition.
Explanation-based learning: data = a training example plus prior knowledge; goal = to infer a general concept.
Clustering: data = a set of unclassified instances; goal = to discover categorizations.
Representation of learned knowledge
Concept expressions in predicate calculus: a simple formulation of the concept learning problem represents instances as conjunctive sentences containing variables.
Structured representations such as frames.
Descriptions of plans as a sequence of operations or a triangle table.
Representation of heuristics as problem-solving rules.
Example: two instances and their generalization:
size(obj1, small) ^ color(obj1, red) ^ shape(obj1, round)
size(obj2, large) ^ color(obj2, red) ^ shape(obj2, round)
=> size(X, Y) ^ color(X, red) ^ shape(X, round)
A set of operations
Given a set of training instances, the learner must construct a generalization, heuristic rule, or plan that satisfies its goal. This requires the ability to manipulate representations. Typical operations include generalizing or specializing symbolic expressions, adjusting the weights in a neural network, and modifying the program's representations.
Concept space
Defines a space of potential concept definitions. The complexity of the potential concept space is a measure of the difficulty of the learning problem.
Heuristic search
Learners use available training data and heuristics to search efficiently. Patrick Winston's work on learning concepts from positive and negative examples along with near misses (Fig. 9.2) illustrates this: the program learns by refining a candidate description of the target concept through generalization and specialization. Generalization changes the candidate description to let it accommodate new positive examples (Fig. 9.3); specialization changes it to exclude near misses (Fig. 9.4); a sketch of this refinement loop follows Fig. 9.4 below. The performance of a learning algorithm is highly sensitive to the quality and order of the training examples.
Examples and Near Misses for the concept “Arch” (Fig. 9.2)
Generalization of descriptions (Figure 9.3)
Generalizations of descriptions (Fig 9.3 continued)
Specialization of description (Figure 9.4)
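The refinement loop can be made concrete in code. The following minimal Python sketch illustrates the idea only; it is not Winston's actual program. The representation (an example as a set of (relation, part, part) facts) and the relation names supports, left_of, and touches are assumptions made for this illustration.

def generalize(candidate, positive):
    # Drop required facts that the new positive example does not contain.
    return {"must": candidate["must"] & positive,
            "must_not": candidate["must_not"]}

def specialize(candidate, near_miss):
    # Forbid a fact that the near miss has but the description does not require.
    extra = near_miss - candidate["must"]
    if extra:
        candidate["must_not"] = candidate["must_not"] | {sorted(extra)[0]}
    return candidate

arch1 = {("supports", "post1", "lintel"), ("supports", "post2", "lintel"),
         ("left_of", "post1", "post2")}
candidate = {"must": set(arch1), "must_not": set()}   # first positive example

arch2 = {("supports", "post1", "lintel"), ("supports", "post2", "lintel")}
candidate = generalize(candidate, arch2)              # generalize on a positive

near_miss = candidate["must"] | {("touches", "post1", "post2")}
candidate = specialize(candidate, near_miss)          # specialize on a near miss
print(candidate)

Note how the order of examples drives the result: each positive example can only relax the description, and each near miss can only constrain it.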
9.2 Version Space Search
Implementation of inductive learning as search through a concept space.
Generalization operations impose an ordering on the concepts in the space, and the algorithms use this ordering to guide the search.
9.2.1 Generalization Operators and the Concept Space
9.2.2 The Candidate Elimination Algorithm
9.2.1 Generalization Operators and the Concept Space
Primary generalization operations used in machine learning (a code sketch of each follows this list):
Replacing constants with variables: color(ball, red) -> color(X, red)
Dropping conditions from a conjunctive expression: shape(X, round) ^ size(X, small) ^ color(X, red) -> shape(X, round) ^ color(X, red)
Adding a disjunct to an expression: shape(X, round) ^ size(X, small) ^ color(X, red) -> shape(X, round) ^ size(X, small) ^ (color(X, red) ∨ color(X, blue))
Replacing a property with its parent in a class hierarchy: color(X, red) -> color(X, primary_color), if primary_color is a superclass of red
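These operators can be sketched in Python. The encoding below (a concept as a list of (predicate, object, value) triples, "X" as a variable, and the small class hierarchy) is an assumption made for illustration, not a representation from the text.

HIERARCHY = {"red": "primary_color", "blue": "primary_color"}  # assumed hierarchy

def replace_constant_with_variable(concept, constant):
    # color(ball, red) -> color(X, red)
    return [(p, "X" if o == constant else o, v) for (p, o, v) in concept]

def drop_condition(concept, predicate):
    # shape ^ size ^ color -> shape ^ color
    return [c for c in concept if c[0] != predicate]

def add_disjunct(concept, predicate, extra_value):
    # color(X, red) -> color(X, red) v color(X, blue); the value becomes a tuple
    return [(p, o, (v, extra_value)) if p == predicate else (p, o, v)
            for (p, o, v) in concept]

def climb_hierarchy(concept, predicate):
    # color(X, red) -> color(X, primary_color)
    return [(p, o, HIERARCHY.get(v, v)) if p == predicate else (p, o, v)
            for (p, o, v) in concept]

c = [("size", "obj1", "small"), ("color", "obj1", "red"), ("shape", "obj1", "round")]
c = replace_constant_with_variable(c, "obj1")
c = drop_condition(c, "size")
print(climb_hierarchy(c, "color"))
# -> [('color', 'X', 'primary_color'), ('shape', 'X', 'round')]

Each operator moves a concept strictly up the generality ordering described next: the resulting expression covers every instance the original covered.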
Notion of covering
If concept P is more general than concept Q, we say that P covers Q. For example, color(X, Y) covers color(ball, Y), which in turn covers color(ball, red).
Concept space
Defines a space of potential concept definitions. The example concept space for the predicate obj(Sizes, Colors, Shapes), with properties and values
Sizes = {large, small}
Colors = {red, white, blue}
Shapes = {ball, brick, cube}
is presented in Figure 9.5 (p 362). A sketch of the covering test on this space follows Fig. 9.5 below.
A Concept Space (Fig. 9.5)
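A minimal sketch of the covering relation on this space, assuming concepts are written as tuples with "X", "Y", "Z" standing for variables (repeated variables are not checked for consistent bindings in this simplification):

from itertools import product

VARIABLES = {"X", "Y", "Z"}

def covers(p, q):
    # p covers q if each position of p is a variable or equals q's value.
    return all(a in VARIABLES or a == b for a, b in zip(p, q))

# color(X, Y) covers color(ball, Y), which in turn covers color(ball, red):
print(covers(("X", "Y"), ("ball", "Y")))       # True
print(covers(("ball", "Y"), ("ball", "red")))  # True

# Ground instances of the space covered by obj(small, X, Y):
SIZES = {"large", "small"}
COLORS = {"red", "white", "blue"}
SHAPES = {"ball", "brick", "cube"}
covered = [g for g in product(SIZES, COLORS, SHAPES) if covers(("small", "X", "Y"), g)]
print(len(covered))  # 9 of the 18 ground instances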
9.2.2 The candidate elimination algorithm
Version space: the set of all concept descriptions consistent with the training examples.
The algorithm reduces the size of the version space as more examples become available (Fig. 9.10): a specific-to-general search driven by positive examples and a general-to-specific search driven by negative examples. The candidate elimination algorithm combines these into a bidirectional search.
It generalizes based on regularities found in the training data, and it is a supervised learning method.
The learned concept must be general enough to cover all positive examples, and specific enough to exclude all negative examples.
Maximally specific generalization: a concept c is maximally specific if it covers all positive examples, none of the negative examples, and for any concept c' that covers the positive examples, c ≤ c'.
Maximally general specialization: a concept c is maximally general if it covers none of the negative training instances, and for any other concept c' that covers no negative training instance, c ≥ c'.
Specific to General Search (Fig 9.7)
General to Specific Search (Fig 9.8)
9.2.2 The candidate elimination algorithm
Begin
    Initialize G to the most general concept in the space;
    Initialize S to the first positive training instance;
    For each new positive instance p:
    Begin
        Delete all members of G that fail to match p;
        For every s in S: if s does not match p, replace s with its most specific
            generalizations that match p and are more specific than some member of G;
        Delete from S any hypothesis more general than some other hypothesis in S;
    End;
    For each new negative instance n:
    Begin
        Delete all members of S that match n;
        For each g in G that matches n: replace g with its most general
            specializations that do not match n and are more general than some member of S;
        Delete from G any hypothesis more specific than some other hypothesis in G;
    End;
End
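The pseudocode can be turned into a short runnable Python sketch for conjunctive concepts over the obj(Sizes, Colors, Shapes) space of Fig. 9.5. This is an illustration under assumptions (concepts as triples with "?" marking a variable position, and a first example that is positive), not Luger's own code, and it omits convergence and failure checks.

DOMAINS = [("large", "small"), ("red", "white", "blue"), ("ball", "brick", "cube")]

def matches(h, x):
    return all(a == "?" or a == b for a, b in zip(h, x))

def more_general(h1, h2):
    # h1 is at least as general as h2 (h1 covers h2).
    return all(a == "?" or a == b for a, b in zip(h1, h2))

def generalize(s, x):
    # Most specific generalization of s that matches x.
    return tuple(a if a == b else "?" for a, b in zip(s, x))

def specializations(g, x):
    # Most general specializations of g that exclude x.
    return [g[:i] + (v,) + g[i + 1:]
            for i, a in enumerate(g) if a == "?"
            for v in DOMAINS[i] if v != x[i]]

def candidate_elimination(examples):
    G, S = [("?", "?", "?")], []
    for x, positive in examples:
        if positive:
            if not S:
                S = [x]                      # first positive training instance
                continue
            G = [g for g in G if matches(g, x)]
            S = [s if matches(s, x) else generalize(s, x) for s in S]
            S = [s for s in S if any(more_general(g, s) for g in G)]
            S = [s for s in S if not any(s2 != s and more_general(s, s2) for s2 in S)]
        else:
            S = [s for s in S if not matches(s, x)]
            G = [h for g in G
                 for h in (specializations(g, x) if matches(g, x) else [g])]
            G = [g for g in G if any(more_general(g, s) for s in S)]
            G = [g for g in G if not any(g2 != g and more_general(g2, g) for g2 in G)]
    return S, G

# A training sequence like the one traced in Fig. 9.9; both boundary sets
# converge on obj(X, red, ball).
examples = [(("small", "red", "ball"), True),
            (("small", "blue", "ball"), False),
            (("large", "red", "ball"), True),
            (("large", "red", "cube"), False)]
print(candidate_elimination(examples))
# -> ([('?', 'red', 'ball')], [('?', 'red', 'ball')])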
9.2.2 The candidate elimination algorithm (Fig. 9.9)
Combining the two directions of search into a single algorithm has several benefits: the G and S sets summarize the information in the negative and positive training instances.
Fig. 9.10 gives an abstract description of the candidate elimination algorithm: "+" signs represent positive instances and "-" signs negative instances. The search "shrinks" the outermost concept (the G set) to exclude negative instances, and "expands" the innermost concept (the S set) to include new positive instances.
The algorithm is incremental: it accepts training instances one at a time, forming a usable, although possibly incomplete, generalization after each example (unlike a batch algorithm such as ID3).
Even before the algorithm converges on a single concept, the G and S sets provide usable constraints on that concept: if c is the goal concept, then for all g ∈ G and s ∈ S, s ≤ c ≤ g. Any concept more general than some concept in G will cover some negative instance; any concept more specific than some concept in S will fail to cover some positive instance. A sketch of classifying with a partially converged version space follows below.
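For illustration, a partially converged version space can already classify some instances. This three-way rule is one common reading of the s ≤ c ≤ g constraint, sketched under the same triple encoding as above (matches() is the function from the previous sketch):

def classify(S, G, x):
    if all(matches(s, x) for s in S):
        return "positive"    # covered even by the most specific boundary
    if not any(matches(g, x) for g in G):
        return "negative"    # excluded even by the most general boundary
    return "unknown"         # the version space is still ambiguous on x

S = [("?", "red", "ball")]   # example boundary sets before convergence
G = [("?", "red", "?")]
print(classify(S, G, ("small", "red", "ball")))   # positive
print(classify(S, G, ("large", "blue", "cube")))  # negative
print(classify(S, G, ("small", "red", "cube")))   # unknown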
9.2.4 Evaluating Candidate Elimination
Problems
The combinatorics of the problem space: excessive growth of the search space. It is useful to develop heuristics for pruning states from G and S (beam search), or to use an inductive bias to reduce the size of the concept space, trading expressiveness for efficiency.
The algorithm may fail to converge because of noise or inconsistency in the training data. One solution to this problem is to maintain multiple G and S sets.
Contribution
Explication of the relationship between knowledge representation, generalization, and search in inductive learning.