Module 3 - Decision Trees - Intro to ANN
TRANSCRIPT
8/6/2019 Module 3 - DecisionTrees-IntroToANN
Ing. Leonel D. Rozo C., M.Sc., PhD(c)
[email protected]
2010
2. Appropriate problems for decision tree learning
Instances are represented by attribute-value pairs - Instances are described by a fixed set of attributes (e.g., Temperature) and their values (e.g., Hot).
The target function has discrete output values - The decision tree assigns a boolean classification (e.g., yes or no) to each example.
Disjunctive descriptions may be required.
The training data may contain errors.
The training data may contain missing attribute values.
3. The basic decision tree learning algorithm
1. Which attribute should be tested at the root of the tree?
2. The best attribute is selected and used as the test at the root node of the tree.
3. A descendant of the root node is then created for each possible value of this attribute, and the training examples are sorted to the appropriate descendant node.
4. The entire process is then repeated using the training examples associated with each descendant node to select the best attribute to test at that point in the tree.
3. The basic decision tree learning algorithm
3.1. Which attribute is the best classifier?
The central choice in the algorithm is selecting which attribute to test at each node in the tree. We would like to select the attribute that is most useful for classifying examples.
3.1. Which attribute is the best classifier?
Entropy measures homogeneity of examples
We define a measure commonly used in information theory, called entropy, that characterizes the (im)purity of an arbitrary collection of examples.
Given a collection S, containing positive and negative examples of some target concept, the entropy of S relative to this boolean classification is:

Entropy(S) = -(p+) log2 (p+) - (p-) log2 (p-)

where p+ and p- are the proportions of positive and negative examples in S.
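As an illustration (a minimal Python sketch, not part of the original slides), the entropy of a collection of labels can be computed directly; the same sum-over-classes form also covers more than two classes:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a collection of class labels, in bits."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

For example, a perfectly mixed collection (half positive, half negative) has entropy 1 bit, while a pure collection has entropy 0.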
3.1. Which attribute is the best classifier?
Information gain measures the expected reduction in entropy
Information gain is simply the expected reduction in entropy caused by partitioning the examples according to this attribute. More precisely, the information gain Gain(S, A) of an attribute A, relative to a collection of examples S, is defined as:

Gain(S, A) = Entropy(S) - sum over v in Values(A) of (|Sv| / |S|) Entropy(Sv)

where Values(A) is the set of all possible values for attribute A, and Sv is the subset of S for which attribute A has value v.
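This definition can be sketched in Python (illustrative; representing each example as a dict mapping attribute names to values is an assumption made here, not the slides' notation):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, attribute):
    """Gain(S, A): expected entropy reduction from partitioning S on A."""
    n = len(examples)
    # Group the labels by the value each example takes for the attribute (the Sv).
    partitions = {}
    for ex, y in zip(examples, labels):
        partitions.setdefault(ex[attribute], []).append(y)
    # Expected entropy after the split, weighted by subset size |Sv| / |S|.
    remainder = sum(len(sv) / n * entropy(sv) for sv in partitions.values())
    return entropy(labels) - remainder
```

On Mitchell's classic Wind example (9+/5- overall; Weak: 6+/2-, Strong: 3+/3-) this yields a gain of about 0.048 bits.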
3.1. Which attribute is the best classifier?
An illustrative example
4. Issues in decision tree learning
4.1. Avoiding overfitting the data
The algorithm described before grows each branch of the tree just deeply enough to perfectly classify the training examples. While this is sometimes a reasonable strategy, in fact it can lead to difficulties when:
There is noise in the data.
The number of training examples is too small to produce a representative sample of the true target function.
4. Issues in decision tree learning
4.1. Avoiding overfitting the data
A hypothesis overfits the training examples if some other hypothesis that fits the training examples less well actually performs better over the entire distribution of instances (i.e., including instances beyond the training set).
4. Issues in decision tree learning
4.1. Avoiding overfitting the data
There are several approaches to avoiding overfitting in decision tree learning. These can be grouped into two classes:
Approaches that stop growing the tree earlier, before it reaches the point where it perfectly classifies the training data.
Approaches that allow the tree to overfit the data, and then post-prune the tree.
4. Issues in decision tree learning
4.1. Avoiding overfitting the data
Reduced error pruning
Consider each of the decision nodes in the tree to be candidates for pruning. Pruning a decision node consists of removing the subtree rooted at that node, making it a leaf node, and assigning it the most common classification of the training examples affiliated with that node.
Nodes are removed only if the resulting pruned tree performs no worse than the original over the validation set.
Nodes are pruned iteratively, always choosing the node whose removal most increases the decision tree accuracy over the validation set.
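A compact sketch of this procedure (illustrative; the nested-tuple tree representation, where each decision node carries the majority label of its training examples, is an assumption made here, not the slides' notation):

```python
# A tree is either a class label (a leaf) or a 3-tuple
# (attribute, {value: subtree}, majority_training_label).

def classify(tree, example):
    while isinstance(tree, tuple):
        attribute, branches, _ = tree
        tree = branches[example[attribute]]
    return tree

def accuracy(tree, examples, labels):
    hits = sum(classify(tree, ex) == y for ex, y in zip(examples, labels))
    return hits / len(labels)

def node_paths(tree, path=()):
    """Yield branch-value paths to every internal (decision) node."""
    if isinstance(tree, tuple):
        yield path
        for value, subtree in tree[1].items():
            yield from node_paths(subtree, path + (value,))

def pruned(tree, path):
    """Copy of the tree with the node at `path` turned into a leaf labelled
    with the majority class of the training examples at that node."""
    if not path:
        return tree[2]
    attribute, branches, majority = tree
    new_branches = dict(branches)
    new_branches[path[0]] = pruned(branches[path[0]], path[1:])
    return (attribute, new_branches, majority)

def reduced_error_prune(tree, val_examples, val_labels):
    best = accuracy(tree, val_examples, val_labels)
    while isinstance(tree, tuple):
        # Score every candidate prune against the validation set.
        scored = [(accuracy(pruned(tree, p), val_examples, val_labels), p)
                  for p in node_paths(tree)]
        score, path = max(scored)
        if score < best:   # prune only while validation accuracy is no worse
            break
        best, tree = score, pruned(tree, path)
    return tree
```

The loop repeatedly removes the subtree whose replacement by a leaf most improves validation accuracy, stopping as soon as every candidate prune would hurt.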
4. Issues in decision tree learning
4.1. Avoiding overfitting the data
Reduced error pruning
4. Issues in decision tree learning
4.1. Avoiding overfitting the data
Rule post-pruning
i. Infer the decision tree from the training set.
ii. Convert the learned tree into an equivalent set of rules.
iii. Prune (generalize) each rule by removing any preconditions whose removal improves its estimated accuracy.
iv. Sort the pruned rules by their estimated accuracy, and consider them in this sequence when classifying subsequent instances.
4. Issues in decision tree learning
4.2. Incorporating continuous-valued attributes
This can be accomplished by dynamically defining new discrete-valued attributes that partition the continuous attribute value into a discrete set of intervals.
In particular, for an attribute A that is continuous-valued, the algorithm can dynamically create a new boolean attribute Ac that is true if A < c and false otherwise.
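One common way to pick the cut point c is to maximize the information gain of the derived boolean test, sketched here in Python (illustrative; restricting candidate cuts to midpoints between adjacent examples whose labels differ is a standard refinement, not stated in the slides):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Choose the cut point c maximizing the information gain of the
    derived boolean attribute (value < c)."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy(labels)
    best_gain, best_c = -1.0, None
    for i in range(1, n):
        if pairs[i - 1][1] == pairs[i][1]:
            continue  # useful cuts sit between adjacent examples with differing labels
        c = (pairs[i - 1][0] + pairs[i][0]) / 2
        below = [y for v, y in pairs if v < c]
        above = [y for v, y in pairs if v >= c]
        gain = base - len(below) / n * entropy(below) - len(above) / n * entropy(above)
        if gain > best_gain:
            best_gain, best_c = gain, c
    return best_gain, best_c
```

On Mitchell's Temperature example (40, 48, 60, 72, 80, 90 with labels No, No, Yes, Yes, Yes, No), the two candidate cuts are 54 and 85, and the cut at 54 wins.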
Many tasks involving intelligence or pattern recognition are extremely difficult to automate, but appear to be performed very easily by animals. For instance, animals recognize various objects and make sense out of the large amount of visual information in their surroundings, apparently requiring very little effort.
2. History of neural networks
The amount of activity at any given point in the brain cortex is the sum of the tendencies of all other points to discharge into it, such tendencies being proportionate (William James):
1. To the number of times the excitement of other points may have accompanied that of the point in question.
2. To the intensities of such excitements.
3. To the absence of any rival point functionally disconnected with the first point, into which the discharges may be diverted.
2. History of neural networks
1954 Gabor invented the "learning filter" that uses gradient descent to obtain optimal weights that minimize the MSE between the observed output signal and a signal generated based upon past information.
1958 Rosenblatt invented the perceptron, introducing a learning method for the McCulloch and Pitts neuron model.
1960 Widrow and Hoff introduced the Adaline.
1961 Rosenblatt proposed the backpropagation scheme for training multilayer networks.
1969 The limits of simple perceptrons were demonstrated.
3. Structure and function of a single neuron
3.1. Biological neurons
A typical biological neuron is composed of a cell body, a tubular axon, and a multitude of hair-like dendrites.
3. Structure and function of a single neuron
3.1. Biological neurons
The small gap between an end bulb and a dendrite is called a synapse, across which information is propagated. The axon of a single neuron forms synaptic connections with many other neurons.
3. Structure and function of a single neuron
3.1. Biological neurons
Inhibitory or excitatory signals from other neurons are transmitted to a neuron at its dendrites' synapses. The magnitude of the signal received by a neuron (from another) depends on the efficiency of the synaptic transmission.
The cell membrane becomes electrically active when sufficiently excited by the neurons making synapses onto this neuron.
A neuron will fire if sufficient signals from other neurons fall upon its dendrites in a short period of time, called the period of latent summation.
3. Structure and function of a single neuron
3.2. Artificial neuron models
The position on the neuron (node) of the incoming synapse (connection) is irrelevant.
Each node has a single output value, distributed to other nodes via outgoing links, irrespective of their positions.
All inputs arrive at the same time or remain activated at the same level long enough for the computation of f to occur.
3. Structure and function of a single neuron
3.2. Artificial neuron models
The next level of specialization is to assume that different weightedinputs are summed.
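This weighted-sum node can be sketched as follows (illustrative; the function and parameter names are assumptions, not from the slides):

```python
def neuron_output(inputs, weights, bias, f):
    """Net input = weighted sum of the inputs plus a bias; the node
    function f then maps the net input to the node's single output."""
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return f(net)
```

With a step node function and suitable weights, such a node computes, for instance, the logical AND of two binary inputs.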
3. Structure and function of a single neuron
3.2. Artificial neuron models
Now, it is necessary to establish which function f the neuron has:
Ramp functions
Step functions
3. Structure and function of a single neuron
3.2. Artificial neuron models
Sigmoid functions
Piecewise linear and Gaussian functions
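Illustrative Python sketches of these node function families (the particular parameter choices are assumptions, not the slides'):

```python
import math

def step(net, theta=0.0):
    """Step (threshold) function: fires iff the net input reaches theta."""
    return 1.0 if net >= theta else 0.0

def ramp(net, low=0.0, high=1.0):
    """Ramp: linear between low and high, clipped to 0 and 1 outside."""
    if net <= low:
        return 0.0
    if net >= high:
        return 1.0
    return (net - low) / (high - low)

def sigmoid(net):
    """Logistic sigmoid: smooth, bounded, differentiable everywhere."""
    return 1.0 / (1.0 + math.exp(-net))

def gaussian(net, mu=0.0, sigma=1.0):
    """Gaussian: output peaks when the net input equals mu."""
    return math.exp(-((net - mu) ** 2) / (2 * sigma ** 2))
```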
4. Neural net architectures
A single node is insufficient for many practical problems, and networks with a large number of nodes are frequently used. The way nodes are connected determines how computations proceed and constitutes an important early design decision by a neural network developer.
Fully connected networks
4. Neural net architectures
Layered networks
Acyclic networks
4. Neural net architectures
Feedforward networks
Modular networks
5. Neural learning
Correlation learning
When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased. (Donald Hebb)
5. Neural learning
Competitive learning
Another principle for neural computation is that when an input pattern is presented to a network, different nodes compete to be "winners" with high levels of activity. The competitive process involves self-excitation and mutual inhibition among nodes, until a single winner emerges.
The connections between input nodes and the winner node are then modified, increasing the likelihood that the same winner continues to win in future competitions.
The converse of competition is cooperation, found in some neural network models.
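A minimal winner-take-all sketch of competitive learning (illustrative; using the dot product as each node's activation and a simple move-toward-the-input weight update are assumptions made here):

```python
def competitive_step(weights, x, eta=0.5):
    """One winner-take-all step: the most activated node moves its
    weight vector toward the input, making it likelier to win again."""
    # Winner = node with the largest activation (dot product with the input).
    activations = [sum(w * xi for w, xi in zip(ws, x)) for ws in weights]
    winner = max(range(len(weights)), key=lambda j: activations[j])
    # Only the winner's weights are modified, moving toward the input.
    weights[winner] = [w + eta * (xi - w) for w, xi in zip(weights[winner], x)]
    return winner
```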
5. Neural learning
Feedback-based weight adaptation
If increasing a particular weight leads to diminished performance or larger error, then that weight is decreased as the network is trained to perform better.
The amount of change made at every step is very small in most networks to ensure that a network does not stray too far from its partially evolved state, and so that the network withstands some mistakes made by the teacher, feedback, or performance evaluation mechanism.
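One standard instance of feedback-based weight adaptation is the perceptron-style error-correction rule, sketched here (illustrative; not presented in the slides):

```python
def train_step(weights, bias, x, target, eta=0.1):
    """Adjust weights by a small amount in the direction that reduces
    the output error (error-correction rule with learning rate eta)."""
    output = 1 if sum(w * xi for w, xi in zip(weights, x)) + bias >= 0 else 0
    error = target - output
    # A small eta keeps each step from moving the network too far
    # from its partially evolved state.
    weights = [w + eta * error * xi for w, xi in zip(weights, x)]
    bias = bias + eta * error
    return weights, bias
```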
6. What can neural networks be used for?
Classification
6. What can neural networks be used for?
Clustering
Clustering requires grouping together objects that are similar to each other.
6. What can neural networks be used for?
Pattern association
In pattern association, another important task that can be performed by neural networks, the presentation of an input sample should trigger the generation of a specific output pattern.
6. What can neural networks be used for?
Function approximation
Many computational models can be described as functions mapping some numerical input vectors to numerical outputs. The outputs corresponding to some input vectors may be known from training data, but we may not know the mathematical function describing the actual process that generates the outputs from the input vectors.
6. What can neural networks be used for?
Forecasting
There are many real-life problems in which future events must be predicted on the basis of past history. An example task is that of predicting the behavior of stock market indices.
6. What can neural networks be used for?
Control applications
Control addresses the task of determining the values for inputvariables in order to achieve desired values for output variables.
7. Evaluation of networks
Quality of results
The performance of a neural network is frequently gauged in terms of an error measure.
Euclidean distance
Manhattan or Hamming distance
In classification problems, another possible error measure is the fraction of misclassified samples.
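These error measures can be sketched as follows (illustrative helpers; the function names are assumptions, not from the slides):

```python
import math

def euclidean_error(desired, observed):
    """Euclidean distance between desired and observed output vectors."""
    return math.sqrt(sum((d - o) ** 2 for d, o in zip(desired, observed)))

def manhattan_error(desired, observed):
    """Manhattan (city-block) distance; equals the Hamming distance
    when both vectors are binary."""
    return sum(abs(d - o) for d, o in zip(desired, observed))

def misclassification_rate(targets, predictions):
    """Fraction of misclassified samples, for classification problems."""
    return sum(t != p for t, p in zip(targets, predictions)) / len(targets)
```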
8. Real applications of neural networks