Module 3 - Decision Trees and an Introduction to Artificial Neural Networks


    Ing. Leonel D. Rozo C., M.Sc., PhD(c)
    [email protected]

    2010

    2. Appropriate problems for decision tree learning

    Instances are represented by attribute-value pairs - Instances are described by a fixed set of attributes (e.g., Temperature) and their values (e.g., Hot).

    The target function has discrete output values - The decision tree assigns a boolean classification (e.g., yes or no) to each example.

    Disjunctive descriptions may be required.

    The training data may contain errors.

    The training data may contain missing attribute values.


    3. The basic decision tree learning algorithm

    1. Which attribute should be tested at the root of the tree?

    2. The best attribute is selected and used as the test at the root node of the tree.

    3. A descendant of the root node is then created for each possible value of this attribute, and the training examples are sorted to the appropriate descendant node.

    4. The entire process is then repeated using the training examples associated with each descendant node to select the best attribute to test at that point in the tree (a sketch of this recursive procedure follows the list).
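
    As a minimal Python sketch (not the deck's own code), the four steps above can be written as a recursive function. Examples are assumed to be dicts mapping attribute names to values plus a class label under the key target; best_attribute is a hypothetical helper, one concrete version of which is sketched after section 3.1.

        from collections import Counter

        def id3(examples, attributes, target):
            """Top-down tree growing, following steps 1-4 above."""
            labels = [ex[target] for ex in examples]
            if len(set(labels)) == 1:        # all examples agree: return a leaf
                return labels[0]
            if not attributes:               # nothing left to test: majority leaf
                return Counter(labels).most_common(1)[0][0]
            best = best_attribute(examples, attributes, target)  # hypothetical helper
            tree = {best: {}}
            for value in {ex[best] for ex in examples}:
                subset = [ex for ex in examples if ex[best] == value]
                rest = [a for a in attributes if a != best]
                tree[best][value] = id3(subset, rest, target)    # step 4: recurse
            return tree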


    3.1. Which attribute is the best classifier?

    The central choice in the algorithm is selecting which attribute to test at each node in the tree. We would like to select the attribute that is most useful for classifying examples.


    Entropy measures the homogeneity of examples. We define a measure commonly used in information theory, called entropy, that characterizes the (im)purity of an arbitrary collection of examples.

    Given a collection S, containing positive and negative examples of some target concept, the entropy of S relative to this boolean classification is:

    Entropy(S) = -p(+) log2 p(+) - p(-) log2 p(-)

    where p(+) and p(-) are the proportions of positive and negative examples in S.
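
    A minimal Python sketch of this measure (a sketch, not from the original deck; it also covers more than two classes, since each distinct label contributes its own -p log2 p term):

        import math

        def entropy(labels):
            """Entropy of a collection of class labels, e.g. ['+', '+', '-']."""
            n = len(labels)
            return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                        for c in set(labels))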


    Information gain measures the expected reduction in entropy. Information gain is simply the expected reduction in entropy caused by partitioning the examples according to a given attribute. More precisely, the information gain Gain(S, A) of an attribute A, relative to a collection of examples S, is defined as:

    Gain(S, A) = Entropy(S) - sum_{v in Values(A)} (|S_v| / |S|) * Entropy(S_v)

    where Values(A) is the set of all possible values for attribute A, and S_v is the subset of S for which attribute A has value v.
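
    Continuing the sketch over the same list-of-dicts representation, along with the best_attribute helper assumed by the id3 sketch earlier:

        def information_gain(examples, attribute, target):
            """Gain(S, A): entropy of S minus the weighted entropy of each S_v."""
            gain = entropy([ex[target] for ex in examples])
            for v in {ex[attribute] for ex in examples}:                     # Values(A)
                s_v = [ex[target] for ex in examples if ex[attribute] == v]  # S_v
                gain -= (len(s_v) / len(examples)) * entropy(s_v)
            return gain

        def best_attribute(examples, attributes, target):
            """The helper assumed by id3: the attribute with the highest gain."""
            return max(attributes, key=lambda a: information_gain(examples, a, target))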


    An illustrative example
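
    For instance, a hypothetical collection S of 14 examples, 9 positive and 5 negative, has

    Entropy(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) ≈ 0.940

    or, using the sketch above:

        labels = ["+"] * 9 + ["-"] * 5    # hypothetical collection [9+, 5-]
        print(entropy(labels))            # prints roughly 0.940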


    4. Issues in decision tree learning

    4.1. Avoiding overfitting the data

    The algorithm described before grows each branch of the tree just deeply enough to perfectly classify the training examples. While this is sometimes a reasonable strategy, in fact it can lead to difficulties when:

    There is noise in the data.

    The number of training examples is too small to produce a representative sample of the true target function.


    A hypothesis overfits the training examples if some other hypothesis that fits the training examples less well actually performs better over the entire distribution of instances (i.e., including instances beyond the training set).


    There are several approaches to avoiding overfitting in decision tree learning. These can be grouped into two classes:

    Approaches that stop growing the tree earlier, before it reaches the point where it perfectly classifies the training data.

    Approaches that allow the tree to overfit the data, and then post-prune the tree.


    Reduced error pruning

    Consider each of the decision nodes in the tree to be candidates for pruning. Pruning a decision node consists of removing the subtree rooted at that node, making it a leaf node, and assigning it the most common classification of the training examples affiliated with that node.

    Nodes are removed only if the resulting pruned tree performs no worse than the original over the validation set.

    Nodes are pruned iteratively, always choosing the node whose removal most increases the decision tree accuracy over the validation set.
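
    A hedged sketch of that iterative loop; decision_nodes, replace_with_leaf, majority_label, and accuracy are all hypothetical helpers over id3-style trees, named here only to show the control flow:

        def reduced_error_prune(tree, validation, target):
            """Keep pruning while some prune does no worse on the validation set."""
            while True:
                base = accuracy(tree, validation, target)
                # Every decision node, replaced by a majority-class leaf, is a candidate.
                candidates = [replace_with_leaf(tree, node, majority_label(node))
                              for node in decision_nodes(tree)]
                if not candidates:
                    return tree
                best = max(candidates, key=lambda t: accuracy(t, validation, target))
                if accuracy(best, validation, target) < base:
                    return tree      # no prune performs at least as well: stop
                tree = best          # keep the prune that most increases accuracy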


    Rule post-pruning

    i. Infer the decision tree from the training set.

    ii. Convert the learned tree into an equivalent set of rules.

    iii. Prune (generalize) each rule by removing any preconditions that result in improving its estimated accuracy (sketched right after this list).

    iv. Sort the pruned rules by their estimated accuracy, and consider them in this sequence when classifying subsequent instances.
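
    A hedged sketch of step iii for a single rule, modeled as a list of preconditions plus a conclusion; rule_accuracy is a hypothetical helper that estimates the rule's accuracy over a set of examples:

        def prune_rule(preconditions, conclusion, examples):
            """Greedily drop any precondition whose removal improves accuracy."""
            improved = True
            while improved:
                improved = False
                for pre in list(preconditions):
                    trimmed = [p for p in preconditions if p != pre]
                    if (rule_accuracy(trimmed, conclusion, examples) >
                            rule_accuracy(preconditions, conclusion, examples)):
                        preconditions = trimmed    # the generalized rule is better
                        improved = True
            return preconditions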


    4.2. Incorporating continuous-valued attributes

    This can be accomplished by dynamically defining new discrete-valued attributes that partition the continuous attribute value into a discrete set of intervals.

    In particular, for an attribute A that is continuous-valued, the algorithm can dynamically create a new boolean attribute A_c that is true if A < c and false otherwise.
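
    One common way to pick the cut point c, sketched under the same assumptions as the earlier code: try midpoints between consecutive sorted values of A and keep the one whose derived boolean attribute yields the highest information gain.

        def best_threshold(examples, attribute, target):
            """Return the cut c maximizing the gain of the attribute A < c."""
            values = sorted({ex[attribute] for ex in examples})
            best_c, best_gain = None, -1.0
            for lo, hi in zip(values, values[1:]):
                c = (lo + hi) / 2                # candidate threshold: a midpoint
                derived = [dict(ex, A_c=(ex[attribute] < c)) for ex in examples]
                g = information_gain(derived, "A_c", target)
                if g > best_gain:
                    best_c, best_gain = c, g
            return best_c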


    Many tasks involving intelligence or pattern recognition are extremely difficult to automate, but appear to be performed very easily by animals.

    For instance, animals recognize various objects and make sense out of the large amount of visual information in their surroundings, apparently requiring very little effort.


    2. History of neural networks

    The amount of activity at any given point in the brain cortex is the sum of the tendencies of all other points to discharge into it, such tendencies being proportionate (William James):

    1. To the number of times the excitement of other points may have accompanied that of the point in question.

    2. To the intensities of such excitements.

    3. To the absence of any rival point functionally disconnected with the first point, into which the discharges may be diverted.


    1954: Gabor invented the "learning filter" that uses gradient descent to obtain optimal weights that minimize the MSE between the observed output signal and a signal generated based upon the past information.

    1958: Rosenblatt invented the perceptron, introducing a learning method for the McCulloch and Pitts neuron model.

    1960: Widrow and Hoff introduced the Adaline.

    1961: Rosenblatt proposed the backpropagation scheme for training multilayer networks.

    1969: The limits of simple perceptrons were demonstrated.


    3. Structure and function of a single neuron

    3.1. Biological neurons

    A typical biological neuron is composed of a cell body, a tubular axon, and a multitude of hair-like dendrites.


    The small gap between an end bulb and a dendrite is called a synapse, across which information is propagated. The axon of a single neuron forms synaptic connections with many other neurons.


    Inhibitory or excitatory signals from other neurons are transmitted to a neuron at its dendrites' synapses. The magnitude of the signal received by a neuron (from another) depends on the efficiency of the synaptic transmission.

    The cell membrane becomes electrically active when sufficiently excited by the neurons making synapses onto this neuron.

    A neuron will fire if sufficient signals from other neurons fall upon its dendrites in a short period of time, called the period of latent summation.


    3.2. Artificial neuron models

    The position on the neuron (node) of the incoming synapse (connection) is irrelevant.

    Each node has a single output value, distributed to other nodes via outgoing links, irrespective of their positions.

    All inputs come in at the same time or remain activated at the same level long enough for the computation of f to occur.


    The next level of specialization is to assume that different weighted inputs are summed.
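
    In code this summing node is a one-liner; a minimal sketch (the bias term is an assumed, though common, convention):

        def net_input(inputs, weights, bias=0.0):
            """Weighted sum of a node's inputs: the quantity passed to f."""
            return sum(w * x for w, x in zip(weights, inputs)) + bias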


    Now it is necessary to establish which function f the neuron has:

    Ramp functions

    Step functions


    Sigmoid functions

    Piecewise linear and Gaussian functions
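
    Hedged Python sketches of these node functions; the particular parameterizations are one common choice, not necessarily the ones pictured in the original deck:

        import math

        def step(net, threshold=0.0):
            """Step function: output jumps from 0 to 1 at the threshold."""
            return 1.0 if net >= threshold else 0.0

        def ramp(net, lo=0.0, hi=1.0):
            """Ramp (piecewise linear): linear between lo and hi, clipped outside."""
            return min(max((net - lo) / (hi - lo), 0.0), 1.0)

        def sigmoid(net):
            """Sigmoid: smooth, bounded, S-shaped."""
            return 1.0 / (1.0 + math.exp(-net))

        def gaussian(net, mu=0.0, sigma=1.0):
            """Gaussian: the response peaks when the net input is near mu."""
            return math.exp(-((net - mu) ** 2) / (2 * sigma ** 2))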


    4. Neural net architectures

    A single node is insufficient for many practical problems, and networks with a large number of nodes are frequently used. The way nodes are connected determines how computations proceed and constitutes an important early design decision by a neural network developer.

    Fully connected networks


    Layered networks

    Acyclic networks


    Feedforward networks

    Modular networks


    5. Neural learning

    Correlation learning

    When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased.
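
    This postulate is commonly formalized as the Hebbian rule delta_w = eta * x * y; a minimal sketch, with the learning rate eta as an assumed parameter:

        def hebbian_update(w, x, y, eta=0.1):
            """Correlation learning: the weight between input x and output y
            grows in proportion to how strongly the two are active together."""
            return w + eta * x * y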


    Competitive learning

    Another principle for neural computation is that when an input pattern is presented to a network, different nodes compete to be "winners" with high levels of activity. The competitive process involves self-excitation and mutual inhibition among nodes, until a single winner emerges.

    The connections between input nodes and the winner node are then modified, increasing the likelihood that the same winner continues to win in future competitions.

    The converse of competition is cooperation, found in some neural network models.
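
    A hedged winner-take-all sketch: rather than simulating the self-excitation and mutual-inhibition dynamics, it declares the node whose weight vector lies closest to the input the winner, then moves that vector toward the input so similar inputs keep winning there:

        def competitive_update(weight_vectors, x, eta=0.1):
            """Pick the winning node and strengthen its connections to x."""
            def dist2(w):
                return sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            winner = min(range(len(weight_vectors)),
                         key=lambda j: dist2(weight_vectors[j]))
            weight_vectors[winner] = [wi + eta * (xi - wi)
                                      for wi, xi in zip(weight_vectors[winner], x)]
            return winner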


    Feedback-based weight adaptation

    If increasing a particular weight leads to diminished performance or larger error, then that weight is decreased as the network is trained to perform better.

    The amount of change made at every step is very small in most networks, to ensure that a network does not stray too far from its partially evolved state, and so that the network withstands some mistakes made by the teacher, feedback, or performance evaluation mechanism.
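
    A hedged sketch of this idea for a single weight; error_fn is a hypothetical evaluation of the network's error for a given weight vector, and the step is kept small for exactly the reason given above:

        def adapt_weight(i, weights, error_fn, step=0.01):
            """Nudge weight i in whichever direction reduces the error."""
            base = error_fn(weights)
            trial = list(weights)
            trial[i] += step
            if error_fn(trial) > base:   # increasing it hurt: decrease it instead
                trial[i] -= 2 * step
            return trial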


    6. What can neural networks be used for?

    Classification


    Clustering

    Clustering requires grouping together objects that are similar to each other.


    Pattern association

    In pattern association, another important task that can be performed by neural networks, the presentation of an input sample should trigger the generation of a specific output pattern.


    Function approximation

    Many computational models can be described as functions mapping some numerical input vectors to numerical outputs. The outputs corresponding to some input vectors may be known from training data, but we may not know the mathematical function describing the actual process that generates the outputs from the input vectors.


    Forecasting

    There are many real-life problems in which future events must be predicted on the basis of past history. An example task is that of predicting the behavior of stock market indices.


    Control applications

    Control addresses the task of determining the values for input variables in order to achieve desired values for output variables.


    7. Evaluation of networks

    Quality of results

    The performance of a neural network is frequently gauged in terms of an error measure.

    Euclidean distance

    Manhattan or Hamming distance

    In classification problems, another possible error measure is the fraction of misclassified samples.
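
    Minimal sketches of these measures, for a desired output vector d and an observed output vector o:

        import math

        def euclidean(d, o):
            """Euclidean distance between desired and observed outputs."""
            return math.sqrt(sum((di - oi) ** 2 for di, oi in zip(d, o)))

        def manhattan(d, o):
            """Manhattan distance; on binary vectors this is the Hamming distance."""
            return sum(abs(di - oi) for di, oi in zip(d, o))

        def misclassified_fraction(desired, predicted):
            """Fraction of misclassified samples, for classification problems."""
            return sum(d != p for d, p in zip(desired, predicted)) / len(desired)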


    8. Real applications of neural networks
