lecture8.ppt
Overview of Machine Learning
Foundations of Artificial Intelligence
Learning
What is Learning?
Learning in AI is also called machine learning or pattern recognition. The basic objective is to allow an intelligent agent to discover knowledge autonomously from experience.
Let’s examine the definition more closely:
• “an intelligent agent”: The ability to learn requires a prior level of intelligence and knowledge. Learning has to start from an existing level of capability.
• “to discover autonomously”: Learning is fundamentally about an agent recognizing new facts for its own use and acquiring new abilities that reinforce its own existing abilities. Literal programming, i.e. rote learning from instruction, is not useful.
• “knowledge”: Whatever is learned has to be represented in some way that the agent can use. “If you can't represent it, you can't learn it” is a corollary of the slogan “Knowledge is power”.
• “from experience”: Experience is typically a set of so-called training examples; examples may be categorized or not. They may be random or selected by a teacher. They may include explanations or not.
Learning by Discovery
One example: AM by Doug Lenat at Stanford
• a mathematical system
• inputs: set theory (union, intersection, etc.); “how to do mathematics” (based on a book by Polya), e.g., if f is an interesting function of two arguments, then f(x,x) is an interesting function of one argument, etc.
• speculated about what was interesting and made conjectures, etc.
What AM discovered
• integers (as equivalence relation on cardinality of sets)
• addition (using disjoint union of sets)
• multiplication
• primes: 1 was interesting, the function returning the cardinality of the set of divisors was interesting, etc.
• Goldbach’s conjecture: “every even number greater than 2 is the sum of two prime numbers” (note that AM did not prove it, just discovered that it was interesting)
Why was AM so successful?
• Connection between LISP and mathematics (mutations of small bits of LISP code are likely to be interesting)
• Doesn’t extend to other domains
• Lessons from EURISKO (fleet game)
Explanation-Based Learning
Explanation-based learning (EBL) systems try to explain why each training instance belongs to the target concept.
• The resulting “proof” is then generalized and saved.
• If a new instance can be explained in the same manner as a previous instance, then it is also assumed to be a member of the target concept.
Like macro-operators, EBL systems never learn to solve a problem that they couldn’t solve before (in principle).
• However, they can become much more efficient at problem-solving by reorganizing the search space.
One of the strengths of EBL is that the resulting “explanations” are typically easy to understand.
One of the weaknesses of EBL is that these systems rely on a domain theory to generate the explanations.
Case-Based Learning
Case-based reasoning (CBR) systems keep track of previously seen instances and apply them directly to new ones.
In general, a CBR system simply stores each “case” that it experiences in a “case base”, which represents its memory of previous episodes.
To reason about a new instance, the system consults its case base and finds the most similar case that it has seen before. The old case is then adapted and applied to the new situation.
CBR is similar to reasoning by analogy. Many people believe that much of human learning is case-based in nature.
Connectionist Algorithms
Connectionist models (also called neural networks) are inspired by the interconnectivity of the brain.
• Connectionist networks typically consist of many nodes that are highly interconnected. When a node is activated, it sends signals to other nodes so that they are activated in turn.
• Using layers of nodes allows connectionist models to learn fairly complex functions.
Neural networks are loosely modeled after the biological processes involved in cognition:
1. Information processing involves many simple elements called neurons.
2. Signals are transmitted between neurons using connecting links.
3. Each link has a weight that controls the strength of its signal.
4. Each neuron applies an activation function to the input that it receives from other neurons. This function determines its output.
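The four points above can be illustrated with a single artificial neuron. This is only a minimal sketch: the sigmoid activation, the bias term, and the names `sigmoid` and `neuron_output` are assumptions chosen for illustration, not something the slides prescribe.

```python
import math

def sigmoid(z):
    """One common activation function (an assumed choice for this sketch)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias=0.0, activation=sigmoid):
    """Weight each incoming signal, sum them, and apply the activation function."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(total)

# Two incoming links whose weighted signals cancel out, so the sum is 0.
print(neuron_output([1.0, 1.0], [2.0, -2.0]))
```

A layer is then just a list of such neurons reading the same inputs, and a layered network feeds the outputs of one layer in as the inputs of the next.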
Concept Learning
A form of supervised learning in which data is classified according to one or more predefined categories
• The learning program is given examples of the form (x_i, y_i) and tries to learn a function f such that f(x_i) = y_i for all i.
• f should be general enough to apply to values of x that were not among the training instances.
The system might learn:
feathers => bird
or
(feathers /\ (yellow \/ b&w)) => bird
Inductive Learning
• inductive learning involves learning generalized rules from specific examples (can think of this as the “inverse” of deduction)
• main task: given a set of examples, each classified as positive or negative, produce a concept description that matches exactly the positive examples
Some Notes:
• The examples are coded in some representation language, e.g. they are coded by a finite set of real-valued features.
• The concept description is in a certain language that is presumably a superset of the language of possible example encodings.
• A “correct” concept description is one that classifies correctly ALL possible examples, not just those given in the training set.
Fundamental Difficulties with Induction
• can’t generalize with perfect certainty
• examples and concepts are NOT available “directly”; they are only available through representations which may be more or less adequate to capture them
• some examples may be classified as both positive and negative
• the features supplied may not be sufficient to discriminate between positive and negative examples
Inductive Bias
Learning as Classification
• learning can be viewed as classification of a target concept according to examples
• often by looking at positive and negative instances of a binary predicate
• problem: learning spaces can be very large
Example: Learning a classification of bit strings
• a classification is a subset of the set of all instances
• for m instances we have $2^m$ possible classifications
• but, for n bits there are $2^n$ possible strings of n bits
• so, the total space is $2^{2^n}$
Inductive Bias
• need to use a bias to prune the search space
• e.g., given strings {1100, 1010} as positive examples of some target set, we can narrow down possible generalizations:
  • strings that start with 1 and end in 0
  • strings that have an equal number of 0’s and 1’s
  • strings with an even number of 0’s and of 1’s
• what if we now get a positive example 110100? What if we get a negative example 100101?
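The two closing questions can be checked mechanically. The sketch below encodes the three candidate generalizations as predicates and keeps only those that agree with every labeled example. The helper names are illustrative, and the third predicate assumes the reading "an even number of 0's and an even number of 1's".

```python
def hypothesis_space_size(n):
    """Number of possible classifications (subsets) of all n-bit strings."""
    return 2 ** (2 ** n)

# Candidate generalizations from the slide, written as predicates on bit strings.
HYPOTHESES = {
    "starts with 1 and ends in 0":
        lambda s: s.startswith("1") and s.endswith("0"),
    "equal number of 0s and 1s":
        lambda s: s.count("0") == s.count("1"),
    # Assumed reading: both counts are even.
    "even number of 0s and of 1s":
        lambda s: s.count("0") % 2 == 0 and s.count("1") % 2 == 0,
}

def consistent(examples):
    """Names of the hypotheses that agree with every (string, label) example."""
    return [name for name, h in HYPOTHESES.items()
            if all(h(s) == label for s, label in examples)]

examples = [("1100", True), ("1010", True)]
print(hypothesis_space_size(4), consistent(examples))
examples.append(("110100", True))    # three 0s and three 1s: odd counts
print(consistent(examples))
examples.append(("100101", False))   # equal counts, but labeled negative
print(consistent(examples))
```

The positive example 110100 rules out the "even number" hypothesis, and the negative example 100101 rules out "equal number", leaving only "starts with 1 and ends in 0".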
Version Spaces
Idea: assume you are looking for a CONJUNCTIVE CONCEPT
• e.g., spade A, club 7, club 9: yes; club 8, heart 5: no
• concept: odd and black
now notice that the set of conjunctive concepts is partially ordered by specificity
[Figure: specificity ordering, from least to most specific: any card; black; odd black, spade; odd spade; 3 of spades]
at any point, keep most specific and least specific conjuncts consistent with data:
most specific:
• anything more specific misses some positive instances
• always exists -- conjoin all OK conjunctions
least specific:
• anything less specific admits some negative instances
• may not be unique -- imagine all you know is club 4 not ok, odd black ok, spade ok, black not ok
Idea is to gradually merge least and most specific as data comes in.
Version Spaces: Example
The training examples (obtained incrementally):
Card    In Target Set?
A-      yes
7-      yes
8-      no
9-      yes
5-      no
K-      no
6-      no
7-      yes
Step 0: most specific concept (msc) is the empty set; least specific concept (lsc) is the set of all cards.
Step 1: A-spade is found to be in target set: msc = {A-spade}; lsc = set of all cards
Step 2: 7-club is found to be in target set: msc = odd black cards; lsc = set of all cards
Step 3: 8-heart is not in target set: msc = odd black cards; lsc = all odd cards OR all black cards
. . .
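Steps 0 to 3 can be reproduced with a small candidate-elimination sketch. It rests on several assumptions that are not in the slides: each card is encoded as the tuple (parity, color, suit, rank), "?" means "any value", the ace counts as 1, and the first example is positive. All function names are illustrative.

```python
ANY = "?"
FACE_VALUES = {"A": 1, "J": 11, "Q": 12, "K": 13}  # assumed rank values

def describe(rank, suit):
    """Encode a card as (parity, color, suit, rank)."""
    value = FACE_VALUES[rank] if rank in FACE_VALUES else int(rank)
    parity = "odd" if value % 2 else "even"
    color = "black" if suit in ("spade", "club") else "red"
    return (parity, color, suit, rank)

def covers(h, x):
    """True if conjunctive hypothesis h admits x (a card or another hypothesis)."""
    return all(a == ANY or a == b for a, b in zip(h, x))

def generalize(s, x):
    """Smallest conjunctive hypothesis covering both s and x."""
    if s is None:            # msc starts as the empty set
        return x
    return tuple(a if a == b else ANY for a, b in zip(s, x))

def specialize(g, s, x):
    """Minimal specializations of g that exclude negative x but still cover s."""
    return [g[:i] + (s[i],) + g[i + 1:]
            for i, a in enumerate(g)
            if a == ANY and s[i] != ANY and s[i] != x[i]]

def candidate_elimination(examples):
    s, G = None, [(ANY,) * 4]          # msc = empty set, lsc = all cards
    for (rank, suit), positive in examples:
        x = describe(rank, suit)
        if positive:
            s = generalize(s, x)
            G = [g for g in G if covers(g, s)]
        else:
            if s is None:
                raise ValueError("this sketch assumes the first example is positive")
            new_G = []
            for g in G:
                new_G.extend(specialize(g, s, x) if covers(g, x) else [g])
            G = new_G
    return s, G

EXAMPLES = [(("A", "spade"), True), (("7", "club"), True), (("8", "heart"), False)]
S, G = candidate_elimination(EXAMPLES)
print(S)   # odd black cards
print(G)   # all odd cards OR all black cards
```

The sketch omits the pruning of redundant members of the least specific set and the check that the most specific concept stays consistent with negative examples, both of which a fuller version would need.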
What Is Classification?
The goal of data classification is to organize and categorize data in distinct classes
• A model is first created based on the data distribution
• The model is then used to classify new data
• Given the model, a class can be predicted for new data
Classification = prediction for discrete and nominal values
• With classification, I can predict in which bucket to put the ball, but I can’t predict the weight of the ball
Classification: 3 Step Process
1. Model construction (Learning):
• Each record (instance) is assumed to belong to a predefined class, as determined by one of the attributes, called the class label
• The set of all records used for construction of the model is called the training set
• The model is usually represented in the form of classification rules (IF-THEN statements) or decision trees
2. Model Evaluation (Accuracy):
• Estimate the accuracy rate of the model based on a test set
• The known label of each test sample is compared with the classified result from the model
• Accuracy rate: percentage of test set samples correctly classified by the model
• The test set must be independent of the training set; otherwise over-fitting goes undetected and the accuracy estimate is inflated
3. Model Use (Classification):
• The model is used to classify unseen instances (assigning class labels)
• Predict the value of an actual attribute
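Step 2 amounts to a simple count. In the sketch below, the IF-THEN rule standing in for a learned model and the four labeled test records are hypothetical, made up only to show how the accuracy rate is computed.

```python
def rule_model(record):
    """A hypothetical IF-THEN model: IF outlook is overcast THEN play."""
    return "play" if record["outlook"] == "overcast" else "don't play"

def accuracy(model, test_set):
    """Fraction of test samples whose known label matches the model's output."""
    correct = sum(1 for record, label in test_set if model(record) == label)
    return correct / len(test_set)

# Hypothetical labeled test records, kept separate from any training data.
TEST_SET = [
    ({"outlook": "overcast"}, "play"),
    ({"outlook": "sunny"}, "don't play"),
    ({"outlook": "rain"}, "play"),
    ({"outlook": "overcast"}, "play"),
]

print(accuracy(rule_model, TEST_SET))  # 3 of the 4 records are classified correctly
```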
Decision Trees
What is a Decision Tree?
• it takes as input the description of a situation as a set of attributes (features) and outputs a yes/no decision (so it represents a Boolean function)
• each leaf is labeled “positive” or “negative”, each internal node is labeled with an attribute (or feature), and each edge is labeled with a value for the feature of its parent node
Attribute-value language for examples
• in many inductive tasks, especially learning decision trees, we need a representation language for examples
• each example is a finite feature vector
• a concept is a decision tree where nodes are features
Decision Trees
Example: “is it a good day to play golf?”
a set of attributes and their possible values:
outlook: sunny, overcast, rain
temperature: cool, mild, hot
humidity: high, normal
windy: true, false
A particular instance in the training set might be:
<overcast, hot, normal, false>: play
In this case, the target class is a binary attribute, so each instance represents a positive or a negative example.
Using Decision Trees for Classification
Examples can be classified as follows
1. look at the example's value for the feature specified at the current node (starting at the root)
2. move along the edge labeled with this value
3. if you reach a leaf, return the label of the leaf
4. otherwise, repeat from step 1
Example (a decision tree to decide whether to go play golf):
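[Figure: decision tree with root node outlook (edges: sunny, overcast, rain), internal nodes humidity (edges: high, normal) and windy (edges: true, false), and leaves labeled yes or no]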
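The four classification steps can be sketched as a short loop over a nested structure. The layout of `GOLF_TREE` is an assumption: it follows the usual textbook version of this golf example (sunny leads to humidity, overcast is always yes, rain leads to windy), since the exact placement of the leaves cannot be recovered from the figure here.

```python
# An internal node is (feature, {edge value: subtree}); a leaf is a label string.
# Assumed tree layout, following the common textbook golf example.
GOLF_TREE = ("outlook", {
    "sunny": ("humidity", {"high": "no", "normal": "yes"}),
    "overcast": "yes",
    "rain": ("windy", {"true": "no", "false": "yes"}),
})

def classify(tree, example):
    """Walk from the root to a leaf, following the edge for each feature value."""
    while not isinstance(tree, str):          # step 3: stop at a leaf
        feature, branches = tree              # step 1: feature at this node
        tree = branches[example[feature]]     # step 2: follow the matching edge
    return tree

instance = {"outlook": "overcast", "temperature": "hot",
            "humidity": "normal", "windy": "false"}
print(classify(GOLF_TREE, instance))
```

Note that temperature never appears in this tree: a decision tree only tests the features it needs.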
What is Clustering?
Cluster: a collection of data objects that
• are “similar” to one another and thus can be treated collectively as one group
• but as a collection, they are sufficiently different from other groups
Clustering
• unsupervised classification
• no predefined classes
Clustering is a process of partitioning a set of data (or objects) into a set of meaningful sub-classes, called clusters
Helps users understand the natural grouping or structure in a data set
What Is Good Clustering?
A good clustering will produce high quality clusters in which:
• the intra-class (that is, intra-cluster) similarity is high
• the inter-class similarity is low
The quality of a clustering result also depends on both the similarity measure used by the method and its implementation
The quality of a clustering method is also measured by its ability to discover some or all of the hidden patterns
The quality of a clustering result also depends on the definition and representation of cluster chosen
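One rough way to check the two similarity criteria is to compare average distances, treating a small distance as high similarity. The sketch below assumes Euclidean distance as the measure and uses hypothetical 2-D points; other similarity measures would give different numbers.

```python
import math
from itertools import combinations

def mean_intra_distance(clusters):
    """Average distance between pairs of points that share a cluster."""
    d = [math.dist(p, q) for c in clusters for p, q in combinations(c, 2)]
    return sum(d) / len(d)

def mean_inter_distance(clusters):
    """Average distance between pairs of points from different clusters."""
    d = [math.dist(p, q)
         for a, b in combinations(clusters, 2) for p in a for q in b]
    return sum(d) / len(d)

# Hypothetical clustering of four 2-D points into two clusters.
clusters = [[(0, 0), (0, 1)], [(10, 0), (10, 1)]]
print(mean_intra_distance(clusters), mean_inter_distance(clusters))
```

A good clustering shows a small intra-cluster distance relative to the inter-cluster distance, as it does for these points.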
Applications of Clustering
Clustering has wide applications in:
• Pattern Recognition
• Spatial Data Analysis: create thematic maps in GIS by clustering feature spaces; detect spatial clusters and explain them in spatial data mining
• Image Processing
• Market Research
• Information Retrieval: document or term categorization; information visualization and IR interfaces
• Web Mining: cluster Web usage data to discover groups of similar access patterns; Web personalization
What is Memory-Based Reasoning?
Basic Idea: classify new instances based on their similarity to instances we have seen before
• also called “instance-based learning”
Simplest form of MBR: Rote Learning
• learning by memorization
• save all previously encountered instances; given a new instance, find the one from the memorized set that most closely “resembles” the new one; assign the new instance to the same class as the “nearest neighbor”
• more general methods try to find k nearest neighbors rather than just one
• but, how do we define “resembles?”
MBR is “lazy”
• defers all of the real work until a new instance is obtained; no attempts are made to learn a generalized model from the training set
• less data preprocessing and model evaluation, but more work has to be done at classification time
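The nearest-neighbor idea can be sketched in a few lines. Here "resembles" is assumed to mean small Euclidean distance between numeric feature vectors, and the memorized instances are hypothetical; nothing is learned until a query arrives, which is what makes the method lazy.

```python
import math
from collections import Counter

def k_nearest(memory, query, k):
    """The k memorized (features, label) pairs closest to the query."""
    return sorted(memory, key=lambda item: math.dist(item[0], query))[:k]

def classify_knn(memory, query, k=1):
    """Assign the majority label among the k nearest neighbors."""
    votes = Counter(label for _, label in k_nearest(memory, query, k))
    return votes.most_common(1)[0][0]

# Hypothetical memorized instances: (feature vector, class label).
MEMORY = [((1, 1), "a"), ((1, 2), "a"), ((5, 5), "b"), ((6, 5), "b")]

print(classify_knn(MEMORY, (2, 1)))        # k = 1: plain nearest neighbor
print(classify_knn(MEMORY, (2, 1), k=3))   # k = 3: majority of three neighbors
```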
Basic Issues in Applying MBR
Choosing the right set of instances
• can’t just do random sampling, since “unusual” records may be missed (e.g., in the movie database, popular movies will dominate the random sample)
• usual practice is to keep roughly the same number of records for each class
Computing Distance
• general distance functions like those discussed before can be used
• issues are how to normalize and what to do with missing values
Finding the right “combination” function
• how many nearest neighbors need to be used
• how to combine answers from nearest neighbors
• basic approaches: democracy, weighted voting
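Two of these issues, normalization and the combination function, can be sketched as follows. Min-max scaling and inverse-distance weights are assumed choices made for illustration; the slides name the issues without fixing a particular method, and the sample values are hypothetical.

```python
from collections import Counter, defaultdict

def min_max_normalize(rows):
    """Rescale every feature column to [0, 1] so no feature dominates the distance."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [tuple((v - lo) / (hi - lo) if hi > lo else 0.0
                  for v, lo, hi in zip(row, lows, highs))
            for row in rows]

def democratic_vote(neighbors):
    """Democracy: every neighbor (distance, label) gets one equal vote."""
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

def weighted_vote(neighbors, eps=1e-9):
    """Weighted voting: closer neighbors count more (weight = 1 / distance)."""
    scores = defaultdict(float)
    for distance, label in neighbors:
        scores[label] += 1.0 / (distance + eps)
    return max(scores, key=scores.get)

# Hypothetical neighbors of a query: one very close "a", two distant "b".
neighbors = [(0.1, "a"), (2.0, "b"), (2.5, "b")]
print(democratic_vote(neighbors), weighted_vote(neighbors))
print(min_max_normalize([(0, 100), (5, 300), (10, 200)]))
```

With these neighbors the two approaches disagree: the majority says "b", while the single very close neighbor outweighs the two distant ones under weighted voting.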
MBR in Collaborative Filtering
“Social Learning”
• idea is to give recommendations to a user based on the “ratings” of objects by other users
• usually assumes that features in the data are similar objects (e.g., Web pages, music, movies, etc.)
• usually requires “explicit” ratings of objects by users based on a rating scale
• there have been some attempts to obtain ratings implicitly based on user behavior (mixed results; problem is that implicit ratings are often binary)
Will Karen like “Independence Day?”
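A question like this can be answered with a similarity-weighted average of other users' ratings. Everything in the sketch below is an assumption made for illustration: the ratings are invented, "User 1", "User 2", "Movie A" and "Movie B" are hypothetical placeholders, and the similarity function (based on the mean absolute rating difference over co-rated items) is just one simple choice.

```python
def similarity(a, b):
    """Similarity of two users from their co-rated items (1.0 = identical ratings)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    mean_diff = sum(abs(a[item] - b[item]) for item in shared) / len(shared)
    return 1.0 / (1.0 + mean_diff)

def predict(ratings, user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        w = similarity(ratings[user], theirs)
        num += w * theirs[item]
        den += w
    return num / den if den else None

# Invented ratings on a 1-5 scale, for illustration only.
RATINGS = {
    "Karen":  {"Movie A": 5, "Movie B": 1},
    "User 1": {"Movie A": 5, "Movie B": 1, "Independence Day": 4},
    "User 2": {"Movie A": 1, "Movie B": 5, "Independence Day": 2},
}

print(predict(RATINGS, "Karen", "Independence Day"))
```

Because Karen's ratings match those of User 1 and differ sharply from those of User 2, the prediction lands much closer to User 1's rating.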