CS2351 AI Unit 5: Learning


    UNIT V: LEARNING


    LEARNING

    Learning from Observation

    Inductive Learning

Decision Trees

Explanation-based Learning

Statistical Learning Methods

    Reinforcement Learning


    Learning

An agent tries to improve its behavior through observation, reasoning, or reflection

learning from experience
  memorization of past percepts, states, and actions
  generalization: identification of similar experiences

forecasting
  prediction of changes in the environment

theories
  generation of complex models based on observations and reasoning


    Learning from Observation

    Learning Agents

    Inductive Learning

    Learning Decision Trees


    Learning Agents

based on previous agent designs, such as reflexive, model-based, and goal-based agents
  those aspects of agents are encapsulated into the performance element of a learning agent

a learning agent has an additional learning element
  usually used in combination with a critic and a problem generator for better learning

most agents learn from examples
  inductive learning


Learning Agent Model

[Diagram: learning agent architecture. Within the Agent, the Performance Element maps Sensors to Effectors; the Critic compares percepts against a Performance Standard and gives Feedback to the Learning Element, which makes Changes to the Performance Element's Knowledge and sets Learning Goals for the Problem Generator. The Agent interacts with the Environment.]


    Forms of Learning

    supervised learning

an agent tries to find a function that matches examples from a sample set

    each example provides an input together with the correct output

    a teacher provides feedback on the outcome

    the teacher can be an outside entity, or part of the environment

    unsupervised learning

the agent tries to learn from patterns without corresponding output values

reinforcement learning
  the agent does not know the exact output for an input, but it receives feedback on the desirability of its behavior
  the feedback can come from an outside entity, the environment, or the agent itself
  the feedback may be delayed, and not follow the respective action immediately


    Feedback

provides information about the actual outcome of actions

supervised learning
  both the input and the output of a component can be perceived by the agent directly
  the output may be provided by a teacher

reinforcement learning
  feedback concerning the desirability of the agent's behavior is available
  not in the form of the correct output
  may not be directly attributable to a particular action
  feedback may occur only after a sequence of actions


    Prior Knowledge

background knowledge available before a task is tackled

can increase performance or decrease learning time considerably

many learning schemes assume that no prior knowledge is available
  in reality, some prior knowledge is almost always available
  but often in a form that is not immediately usable by the agent


    Inductive Learning

tries to find a function h (the hypothesis) that approximates a set of samples defining a function f
  the samples are usually provided as input-output pairs (x, f(x))

supervised learning method

relies on inductive inference, or induction
  conclusions are drawn from specific instances to more general statements


Hypotheses

finding a suitable hypothesis can be difficult
  since the function f is unknown, it is hard to tell if the hypothesis h is a good approximation

the hypothesis space describes the set of hypotheses under consideration
  e.g. polynomials, sinusoidal functions, propositional logic, predicate logic, ...

the choice of the hypothesis space can strongly influence the task of finding a suitable function
  while a very general hypothesis space (e.g. Turing machines) may be guaranteed to contain a suitable function, it can be difficult to find it

Ockham's razor: if multiple hypotheses are consistent with the observations, prefer the simplest one


    Example Inductive Learning 1

input-output pairs displayed as points in a plane

the task is to find a hypothesis (function) that connects the points
  either all of them, or most of them

various performance measures
  number of points connected
  minimal surface
  lowest tension

[Plot: sample points (x, f(x)) in the plane]


    Example Inductive Learning 2

hypothesis is a function consisting of linear segments
  fully incorporates all sample pairs: goes through all points
  very easy to calculate
  has discontinuities at the joints of the segments
  moderate predictive performance

[Plot: piecewise-linear hypothesis through all points (x, f(x))]


    Example Inductive Learning 3

hypothesis expressed as a polynomial function
  incorporates all samples
  more complicated to calculate than linear segments
  no discontinuities
  better predictive power

[Plot: polynomial hypothesis through all points (x, f(x))]


    Example Inductive Learning 4

hypothesis is a linear function
  does not incorporate all samples
  extremely easy to compute
  low predictive power

[Plot: straight-line hypothesis fitted to the points (x, f(x))]
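The trade-offs in Examples 1-4 are easy to reproduce in code. A minimal sketch, using made-up sample pairs and NumPy's polynomial fitting (both the data and the degrees are illustrative assumptions, not from the slides):

```python
import numpy as np

# made-up sample set: input-output pairs (x, f(x)) for an unknown f
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.1, 0.9, 4.2, 8.8, 16.3, 24.9])

# hypothesis space 1: straight lines (Example 4) -- extremely easy to
# compute, but does not incorporate all samples
h_linear = np.polynomial.Polynomial.fit(x, y, deg=1)

# hypothesis space 2: degree-5 polynomials (Example 3) -- goes through
# all six points, more complicated to calculate
h_poly = np.polynomial.Polynomial.fit(x, y, deg=5)

for name, h in [("linear", h_linear), ("degree-5", h_poly)]:
    max_err = np.max(np.abs(h(x) - y))
    print(f"{name}: max error on the samples = {max_err:.3f}")
```

The degree-5 hypothesis fits the training samples exactly; whether it predicts new points better than the line depends on the unknown f, which is the point of the over-fitting discussion later in this unit.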


    Learning and Decision Trees

based on a set of attributes as input, a predicted output value, the decision, is learned
  called classification learning for discrete values
  regression for continuous values

Boolean or binary classification
  output values are true or false
  conceptually the simplest case, but still quite powerful

making decisions
  a sequence of tests is performed, testing the value of one of the attributes in each step
  when a leaf node is reached, its value is returned


    Boolean Decision Trees

compute yes/no decisions based on sets of desirable or undesirable properties of an object or a situation
  each node in the tree reflects one yes/no decision, based on a test of the value of one property of the object
  the root node is the starting point
  leaf nodes represent the possible final decisions
  branches are labeled with possible values

the learning aspect is to predict the value of a goal predicate (also called goal concept)


    Terminology

example or sample
  describes the values of the attributes and of the goal
  a positive sample has the value true for the goal predicate, a negative sample has false

sample set
  collection of samples used for training and validation

training
  the training set consists of samples used for constructing the decision tree

validation
  the test set is used to determine if the decision tree performs well on samples it was not trained on


    Restaurant Sample Set

Example | Alt | Bar | Fri | Hun | Pat  | Price | Rain | Res | Type    | Est   | WillWait
X1      | Yes | No  | No  | Yes | Some | $$$   | No   | Yes | French  | 0-10  | Yes
X2      | Yes | No  | No  | Yes | Full | $     | No   | No  | Thai    | 30-60 | No
X3      | No  | Yes | No  | No  | Some | $     | No   | No  | Burger  | 0-10  | Yes
X4      | Yes | No  | Yes | Yes | Full | $     | No   | No  | Thai    | 10-30 | Yes
X5      | Yes | No  | Yes | No  | Full | $$$   | No   | Yes | French  | >60   | No
X6      | No  | Yes | No  | Yes | Some | $$    | Yes  | Yes | Italian | 0-10  | Yes
X7      | No  | Yes | No  | No  | None | $     | Yes  | No  | Burger  | 0-10  | No
X8      | No  | No  | No  | Yes | Some | $$    | Yes  | Yes | Thai    | 0-10  | Yes
X9      | No  | Yes | Yes | No  | Full | $     | Yes  | No  | Burger  | >60   | No
X10     | Yes | Yes | Yes | Yes | Full | $$$   | No   | Yes | Italian | 10-30 | No
X11     | No  | No  | No  | No  | None | $     | No   | No  | Thai    | 0-10  | No
X12     | Yes | Yes | Yes | Yes | Full | $     | No   | No  | Burger  | 30-60 | Yes

(Alt through Est are the attributes; WillWait is the goal.)


    Decision Tree Example

[Decision tree diagram: "To wait, or not to wait?". The root tests Patrons?; further tests along the branches include EstWait?, Hungry?, Alternative?, Bar?, Walkable?, and Driveable?; the leaves carry the final Yes/No decisions.]


    Learning Decision Trees

Problem: find a decision tree that agrees with the training set

trivial solution: construct a tree with one branch for each sample of the training set
  works perfectly for the samples in the training set
  may not work well for new samples (generalization)
  results in relatively large trees

better solution: find a concise tree that still agrees with all samples


Ockham's Razor

"The most likely hypothesis is the simplest one that is consistent with all observations."

general principle for inductive learning
  a simple hypothesis that is consistent with all observations is more likely to be correct than a complex one


    Constructing Decision Trees

in general, constructing the smallest possible decision tree is an intractable problem

algorithms exist for constructing reasonably small trees

basic idea: test the most important attribute first
  the attribute that makes the most difference for the classification of an example
  can be determined through information theory (see the sketch below)
  hopefully will yield the correct classification after a small number of tests
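A minimal sketch of the information-theoretic attribute choice, applied to the restaurant sample set above (the helper names are mine, not from the slides):

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Entropy (in bits) of a list of boolean classifications."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def information_gain(samples, attr, labels):
    """Expected reduction in entropy from splitting on attr."""
    groups = defaultdict(list)
    for value, label in zip(samples[attr], labels):
        groups[value].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# WillWait labels for X1..X12, plus two of the attributes from the table
labels = [True, False, True, True, False, True,
          False, True, False, False, False, True]
samples = {
    "Pat":  ["Some", "Full", "Some", "Full", "Full", "Some",
             "None", "Some", "Full", "Full", "None", "Full"],
    "Type": ["French", "Thai", "Burger", "Thai", "French", "Italian",
             "Burger", "Thai", "Burger", "Italian", "Thai", "Burger"],
}

for attr in samples:
    print(attr, round(information_gain(samples, attr, labels), 3))
```

Pat (Patrons) has a gain of about 0.54 bits while Type has zero gain, which is why Patrons? ends up at the root of the learned tree.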


    Decision Tree Algorithm

recursive formulation
  select the best attribute to split positive and negative examples
  if only positive or only negative examples are left, we are done
  if no examples are left, no such examples were observed
    return a default value calculated from the majority classification at the node's parent
  if we have positive and negative examples left, but no attributes to split them, we are in trouble
    such samples have identical descriptions but different classifications (noise); a majority vote can be used

(a sketch of the recursion follows below)
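A compact, hypothetical rendering of that recursion in Python (my sketch, not the pseudocode from the missing slide 25):

```python
from collections import Counter

def majority(examples):
    """Most common classification among the examples."""
    return Counter(label for _, label in examples).most_common(1)[0][0]

def dtl(examples, attributes, default, choose):
    """Recursive decision-tree learning.

    examples   -- list of (attribute-dict, label) pairs
    attributes -- attribute names still available for splitting
    default    -- value to return when no examples are left
    choose     -- function picking the 'best' attribute to split on
    """
    if not examples:                       # no such examples were observed
        return default
    labels = {label for _, label in examples}
    if len(labels) == 1:                   # only positive or only negative left
        return labels.pop()
    if not attributes:                     # mixed labels, nothing to split: noise
        return majority(examples)
    best = choose(attributes, examples)
    tree = {}
    for value in {ex[best] for ex, _ in examples}:
        subset = [(ex, l) for ex, l in examples if ex[best] == value]
        rest = [a for a in attributes if a != best]
        # default for a child is the majority classification at this parent
        tree[value] = dtl(subset, rest, majority(examples), choose)
    return (best, tree)                    # internal node: attribute + branches
```

The choose function is the attribute selector; the information_gain helper sketched earlier is one way to implement it, and with it this recursion should place Patrons? at the root for the restaurant samples.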


    Noise and Over-fitting

the presence of irrelevant attributes (noise) may lead to more degrees of freedom in the decision tree
  the hypothesis space is unnecessarily large

over-fitting makes use of irrelevant attributes to distinguish between samples that have no meaningful differences
  e.g. using the day of the week when rolling dice

over-fitting is a general problem for all learning algorithms

decision tree pruning identifies attributes that are likely to be irrelevant
  very low information gain

cross-validation splits the sample data into different training and test sets (sketched below)
  results are averaged
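A minimal k-fold cross-validation sketch; train and error_rate are assumed placeholders for any learner and any error measure:

```python
def cross_validation(samples, k, train, error_rate):
    """Average test error over k train/test splits of the sample data."""
    folds = [samples[i::k] for i in range(k)]   # k disjoint subsets
    errors = []
    for i, test_set in enumerate(folds):
        training_set = [s for j, f in enumerate(folds) if j != i for s in f]
        hypothesis = train(training_set)
        errors.append(error_rate(hypothesis, test_set))
    return sum(errors) / k                      # results are averaged
```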


    Ensemble Learning

Multiple hypotheses (an ensemble) are generated, and their predictions combined
  by using multiple hypotheses, the likelihood of misclassification is hopefully lower
  this also enlarges the hypothesis space

boosting is a frequently used ensemble method
  each example in the training set has an associated weight
  the weights of incorrectly classified examples are increased, and a new hypothesis is generated from this new weighted training set (see the sketch below)
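The reweighting loop can be sketched as a simplified AdaBoost-style procedure; the weak_learner interface and the -1/+1 label convention are assumptions, not from the slides:

```python
import math

def boost(examples, labels, weak_learner, rounds):
    """Generate an ensemble by repeatedly reweighting the training set.

    weak_learner(examples, labels, weights) -> hypothesis h, h(x) in {-1, +1}
    labels are in {-1, +1}.
    """
    n = len(examples)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        h = weak_learner(examples, labels, weights)
        # weighted error of this hypothesis
        err = sum(w for w, x, y in zip(weights, examples, labels) if h(x) != y)
        if err == 0 or err >= 0.5:
            break
        alpha = 0.5 * math.log((1 - err) / err)   # hypothesis weight
        # increase the weights of incorrectly classified examples
        weights = [w * math.exp(-alpha * y * h(x))
                   for w, x, y in zip(weights, examples, labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
        ensemble.append((alpha, h))
    # ensemble prediction: weighted vote of the hypotheses
    return lambda x: 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1
```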


    Explanation-Based Learning

Learning complex concepts using induction procedures typically requires a substantial number of training instances.

But people seem to be able to learn quite a bit from single examples.

An EBL system attempts to learn from a single example x by explaining why x is an example of the target concept.

The explanation is then generalized, and the system's performance is improved through the availability of this knowledge.


EBL

EBL programs accept the following as input:
  a training example
  a goal concept: a high-level description of what the program is supposed to learn
  an operational criterion: a description of which concepts are usable
  a domain theory: a set of rules that describe relationships between objects and actions in a domain

From this, EBL computes a generalization of the training example that is sufficient to describe the goal concept, and that also satisfies the operationality criterion.

Explanation-based generalization (EBG) is an algorithm for EBL with two steps: (1) explain, (2) generalize.

During the explanation step, all the unimportant aspects of the training example are pruned away with respect to the goal concept; this yields the explanation.

The next step is to generalize the explanation as far as possible while still describing the goal concept.


    Statistical Learning Methods


Statistical Learning

Data: instantiations of some or all of the random variables describing the domain; they are evidence

Hypotheses: probabilistic theories of how the domain works

The Surprise candy example: two flavors, in very large bags of 5 kinds, indistinguishable from the outside
  h1: 100% cherry; P(c|h1) = 1, P(l|h1) = 0
  h2: 75% cherry + 25% lime
  h3: 50% cherry + 50% lime
  h4: 25% cherry + 75% lime
  h5: 100% lime


Problem formulation

Given a new bag, the random variable H denotes the bag type (h1 ... h5); Di is a random variable for the i-th candy (cherry or lime); after seeing D1, D2, ..., DN, predict the flavor (value) of DN+1.

Bayesian learning

Calculates the probability of each hypothesis given the data, and makes predictions on that basis
  P(hi|d) = α P(d|hi) P(hi), where d are the observed values of D
  predictions use a likelihood-weighted average over the hypotheses
  the hi are intermediaries between raw data and predictions
  no need to pick one best-guess hypothesis
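Bayesian updating on the candy example can be sketched directly; the prior (0.1, 0.2, 0.4, 0.2, 0.1) over the five bag types is an assumed example value, not from the slides:

```python
# P(lime | h_i) for h1..h5: 0%, 25%, 50%, 75%, 100% lime
lime_prob = [0.0, 0.25, 0.5, 0.75, 1.0]
prior = [0.1, 0.2, 0.4, 0.2, 0.1]          # assumed prior over bag types

def posterior(num_limes_observed):
    """P(h_i | d) = alpha * P(d | h_i) * P(h_i), after observing only limes."""
    unnorm = [p * (l ** num_limes_observed) for p, l in zip(prior, lime_prob)]
    alpha = 1.0 / sum(unnorm)
    return [alpha * u for u in unnorm]

def predict_next_lime(num_limes_observed):
    """Likelihood-weighted average over hypotheses: P(D_{N+1} = lime | d)."""
    return sum(p * l for p, l in zip(posterior(num_limes_observed), lime_prob))

for n in range(4):
    print(n, [round(p, 3) for p in posterior(n)],
          round(predict_next_lime(n), 3))
```

After a handful of lime-only observations the posterior mass shifts toward h5, and the prediction P(next = lime) approaches 1.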


    Learning with Complete Data

Parameter learning: find the numerical parameters for a probability model whose structure is fixed

Data are complete when each data point contains values for every variable in the model

Maximum-likelihood parameter learning, discrete model
  with complete data, ML parameter learning decomposes into a separate learning problem for each parameter


Naive Bayes models

The most common Bayesian network model used in machine learning

It assumes that the attributes are conditionally independent of each other, given the class

A deterministic prediction can be obtained by choosing the most likely class

P(C | x1, ..., xn) = α P(C) ∏i P(xi | C)

NBC has no difficulty with noisy data
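A count-based sketch of that prediction rule (maximum-likelihood estimates, no smoothing; the toy data are made up):

```python
from collections import Counter, defaultdict

def train_naive_bayes(samples):
    """Estimate P(C) and P(x_i | C) by counting."""
    class_counts = Counter(label for _, label in samples)
    cond_counts = defaultdict(Counter)   # (attribute index, class) -> value counts
    for attrs, label in samples:
        for i, v in enumerate(attrs):
            cond_counts[(i, label)][v] += 1
    total = len(samples)

    def predict(attrs):
        # choose the most likely class: argmax_C P(C) * prod_i P(x_i | C)
        best, best_score = None, -1.0
        for c, n in class_counts.items():
            score = n / total
            for i, v in enumerate(attrs):
                score *= cond_counts[(i, c)][v] / n
            if score > best_score:
                best, best_score = c, score
        return best

    return predict

# hypothetical training data: (attribute tuple, class)
data = [(("Some", "$$$"), True), (("Full", "$"), False),
        (("Some", "$"), True), (("None", "$"), False)]
predict = train_naive_bayes(data)
print(predict(("Some", "$")))   # -> True
```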


    Learning with Hidden Variables

Many real-world problems have hidden variables, which are not observable in the data available for learning.

Question: If a variable (e.g. a disease) is not observed, why not construct a model without it?

Answer: Hidden variables can dramatically reduce the number of parameters required to specify a Bayesian network. This in turn reduces the amount of data needed for learning.


EM: Learning mixtures of Gaussians

The unsupervised clustering problem
  if we knew which component generated each xj, we could recover the parameters of the components
  if we knew the parameters of each component, we would know which component ci each xj should belong to
  however, we know neither

EM: expectation and maximization
  pretend we know the parameters of the model, then infer the probability that each xj belongs to each component; re-fit the components to the data, and iterate


E-step: computes the expected value pij of the hidden indicator variables Zij, where Zij is 1 if xj was generated by the i-th component, 0 otherwise

M-step: finds the new values of the parameters that maximize the log likelihood of the data, given the expected values of Zij
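A stripped-down EM sketch for two 1-D Gaussian components; fixing the mixing weights at 0.5 is a simplifying assumption (the full algorithm re-estimates them in the M-step as well):

```python
import math
import random

def gaussian(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def em_two_gaussians(data, iterations=50):
    """EM for a mixture of two 1-D Gaussians, equal mixing weights assumed."""
    mu = [min(data), max(data)]          # crude initialization
    sigma = [1.0, 1.0]
    for _ in range(iterations):
        # E-step: p[i][j] = expected value of the hidden indicator Z_ij
        p = [[0.5 * gaussian(x, mu[i], sigma[i]) for x in data] for i in range(2)]
        for j in range(len(data)):
            norm = p[0][j] + p[1][j]
            p[0][j] /= norm
            p[1][j] /= norm
        # M-step: re-estimate each component from its expected membership
        for i in range(2):
            weight = sum(p[i])
            mu[i] = sum(pij * x for pij, x in zip(p[i], data)) / weight
            var = sum(pij * (x - mu[i]) ** 2 for pij, x in zip(p[i], data)) / weight
            sigma[i] = max(math.sqrt(var), 1e-6)   # avoid component collapse
    return mu, sigma

random.seed(0)
data = ([random.gauss(0, 1) for _ in range(200)] +
        [random.gauss(5, 1) for _ in range(200)])
print(em_two_gaussians(data))
```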


    Instance-based Learning

Parametric vs. nonparametric learning
  parametric learning focuses on fitting the parameters of a restricted family of probability models to an unrestricted data set
  parametric learning methods are often simple and effective, but can oversimplify what's really happening
  nonparametric learning allows the hypothesis complexity to grow with the data
  IBL is nonparametric, as it constructs hypotheses directly from the training data


    Nearest-neighbor models

The key idea: neighbors are similar

Density estimation example: estimate x's probability density by the density of its neighbors

Connections with table lookup, NBC, decision trees, ...

How to define the neighborhood N
  if too small, it contains no data points
  if too big, the density is the same everywhere
  a solution is to define N to contain k points, where k is large enough to ensure a meaningful estimate
  for a fixed k, the size of N varies

The effect of the size of k
  for most low-dimensional data, k is usually between 5 and 10


    K-NN for a given query x

Which data point is nearest to x?
  we need a distance metric D(x1, x2)
  Euclidean distance DE is a popular one
  when each dimension measures something different, it is inappropriate to use DE (why?)
    it is important to standardize the scale for each dimension
    Mahalanobis distance is one solution
  discrete features should be dealt with differently
    Hamming distance

Use k-NN to predict (see the sketch below)

High dimensionality poses another problem (the curse of dimensionality)
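A k-NN prediction sketch with per-dimension standardization (the data and the choice of k are illustrative):

```python
import math
from collections import Counter

def standardize(points):
    """Rescale each dimension to zero mean and unit variance."""
    dims = len(points[0])
    means = [sum(p[d] for p in points) / len(points) for d in range(dims)]
    stds = [math.sqrt(sum((p[d] - means[d]) ** 2 for p in points) / len(points))
            or 1.0 for d in range(dims)]

    def scale(p):
        return tuple((p[d] - means[d]) / stds[d] for d in range(dims))

    return [scale(p) for p in points], scale

def knn_predict(points, labels, query, k=5):
    """Majority vote among the k nearest neighbors (standardized Euclidean)."""
    scaled, scale = standardize(points)
    q = scale(query)

    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

    nearest = sorted(zip(scaled, labels), key=lambda pl: dist(pl[0], q))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# hypothetical data: (height in cm, income in $1000s) -> class
points = [(160, 30), (165, 32), (180, 120), (185, 110), (175, 40)]
labels = ["A", "A", "B", "B", "A"]
print(knn_predict(points, labels, (170, 35), k=3))   # -> "A"
```

Without the standardization step, the income dimension would dominate the distance, which is exactly the problem the slide points at.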


    Summary

Bayesian learning formulates learning as a form of probabilistic inference, using the observations to update a prior distribution over hypotheses.

Maximum a posteriori (MAP) selects a single most likely hypothesis given the data.

Maximum likelihood simply selects the hypothesis that maximizes the likelihood of the data (= MAP with a uniform prior).

EM can find local maximum-likelihood solutions for hidden variables.

Instance-based models use the collection of data to represent a distribution.

Nearest-neighbor method


    Reinforcement Learning

In which we examine how an agent can learn from success and failure, reward and punishment.


    Introduction

Learning to ride a bicycle:
  the goal given to the Reinforcement Learning system is simply to ride the bicycle without falling over
  the system begins riding the bicycle and performs a series of actions that result in the bicycle being tilted 45 degrees to the right

Photo: http://www.roanoke.com/outdoors/bikepages/bikerattler.html


    Introduction

Learning to ride a bicycle:
  RL system turns the handlebars to the LEFT; result: CRASH!!! It receives negative reinforcement
  RL system turns the handlebars to the RIGHT; result: CRASH!!! It receives negative reinforcement


    Introduction

Learning to ride a bicycle:
  the RL system has learned that the state of being tilted 45 degrees to the right is bad
  repeat the trial, now using 40 degrees to the right
  by performing enough of these trial-and-error interactions with the environment, the RL system will ultimately learn how to prevent the bicycle from ever falling over


Passive Learning in a Known Environment

Passive learner: a passive learner simply watches the world going by, and tries to learn the utility of being in various states.

Another way to think of a passive learner is as an agent with a fixed policy trying to determine its benefits.


    Passive Learning in a Known Environment

In passive learning, the environment generates state transitions and the agent perceives them. Consider an agent trying to learn the utilities of the states of the grid world on the slide:

[Figure: grid world with terminal states at [4,2] and [4,3]]


    Passive Learning in a Known Environment

The agent can move {North, East, South, West}

Trials terminate on reaching [4,2] or [4,3]


    Passive Learning in a Known Environment

The object is to use this information about rewards to learn the expected utility U(i) associated with each nonterminal state i

Utilities can be learned using three approaches (a TD sketch follows below):
  1) LMS (least mean squares)
  2) ADP (adaptive dynamic programming)
  3) TD (temporal difference learning)
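The TD idea in code: after each observed transition i -> j with reward R(i), nudge U(i) toward R(i) + U(j). The trial data below are hypothetical, and handling the terminal state by pinning its utility to its reward is my simplification:

```python
def td_learn(trials, alpha=0.1):
    """Temporal-difference learning of state utilities from observed trials.

    Each trial is a list of (state, reward) pairs ending in a terminal state.
    Update rule: U(i) := U(i) + alpha * (R(i) + U(j) - U(i)) for i -> j.
    """
    U = {}
    for trial in trials:
        for state, _ in trial:
            U.setdefault(state, 0.0)
        terminal, terminal_reward = trial[-1]
        U[terminal] = terminal_reward          # terminal utility is its reward
        for (i, r), (j, _) in zip(trial, trial[1:]):
            U[i] += alpha * (r + U[j] - U[i])
    return U

# hypothetical trial through the grid world, ending in terminal state [4,3]
trial = [((1, 1), -0.04), ((1, 2), -0.04), ((1, 3), -0.04),
         ((2, 3), -0.04), ((3, 3), -0.04), ((4, 3), 1.0)]
print(td_learn([trial] * 100))
```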


    Active Learning in an Unknown Environment

An active agent must consider:

    what actions to take

    what their outcomes may be

    how they will affect the rewards received


    Active Learning in an Unknown Environment

Minor changes to the passive learning agent:
  the environment model now incorporates the probabilities of transitions to other states, given a particular action
  the agent must maximize its expected utility
  the agent needs a performance element to choose an action at each step


    Learning in Neural Networks

    Neurons and the Brain

    Neural Networks

    Perceptrons

    Multi-layer Networks

    Applications


    Neural Networks

complex networks of simple computing elements

capable of learning from examples
  with appropriate learning methods

a collection of simple elements performs high-level operations
  thought
  reasoning
  consciousness


    Neural Networks and the Brain

brain
  a set of interconnected modules
  performs information processing operations at various levels
    sensory input analysis
    memory storage and retrieval
    reasoning
    feelings
    consciousness

neurons
  basic computational elements
  heavily interconnected with other neurons

[Russell & Norvig, 1995]


    Artificial Neuron Diagram

weighted inputs are summed up by the input function

the (nonlinear) activation function calculates the activation value, which determines the output

[Russell & Norvig, 1995]


    Common Activation Functions

Step_t(x) = 1 if x >= t, else 0

Sign(x) = +1 if x >= 0, else -1

Sigmoid(x) = 1 / (1 + e^(-x))

[Russell & Norvig, 1995]


    Network Structures

in principle, networks can be arbitrarily connected
  occasionally done to represent specific structures
    semantic networks
    logical sentences
  makes learning rather difficult

layered structures
  networks are arranged into layers
  interconnections mostly between two layers
  some networks have feedback connections


Perceptrons

single-layer, feed-forward network
  historically one of the first types of neural networks (late 1950s)

the output is calculated as a step function applied to the weighted sum of inputs

capable of learning simple functions
  linearly separable ones

[Russell & Norvig, 1995]


    Perceptrons and Learning

perceptrons can learn from examples through a simple learning rule
  calculate the error of a unit Err_i as the difference between the correct output T_i and the calculated output O_i:
    Err_i = T_i - O_i
  adjust the weight W_j of the input I_j such that the error decreases:
    W_j := W_j + α · I_j · Err_i
  where α is the learning rate

this is a gradient descent search through the weight space (see the sketch below)
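A runnable sketch of the rule on a linearly separable function (AND); the bias handling and the constants are my choices, not from the slides:

```python
def perceptron_train(samples, alpha=0.1, epochs=50):
    """Perceptron learning rule: W_j := W_j + alpha * I_j * Err.

    samples: list of (inputs, target) with target in {0, 1}.
    A constant bias input of 1 is appended to every input vector.
    """
    n = len(samples[0][0]) + 1
    w = [0.0] * n

    def step(x):
        return 1 if x >= 0 else 0

    for _ in range(epochs):
        for inputs, target in samples:
            x = list(inputs) + [1.0]                     # bias input
            output = step(sum(wi * xi for wi, xi in zip(w, x)))
            err = target - output                        # Err = T - O
            w = [wi + alpha * xi * err for wi, xi in zip(w, x)]
    return w

# AND is linearly separable, so the perceptron can learn it
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = perceptron_train(and_samples)
for inputs, _ in and_samples:
    x = list(inputs) + [1.0]
    print(inputs, 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0)
```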


    Multi-Layer Networks

research in more complex networks, with more than one layer, was very limited until the 1980s
  learning in such networks is much more complicated
  the problem is to assign the blame for an error to the respective units and their weights in a constructive way

the back-propagation learning algorithm can be used to facilitate learning in multi-layer networks

Diagram Multi-Layer Network

two-layer network
  input units I_k (usually not counted as a separate layer)
  hidden units a_j
  output units O_i
  usually all nodes of one layer have weighted connections to all nodes of the next layer

[Diagram: input units I_k feed hidden units a_j through weights W_kj; hidden units feed output units O_i through weights W_ji]


    Back-Propagation Algorithm

assigns blame to individual units in the respective layers
  essentially based on the connection strength
  proceeds from the output layer to the hidden layer(s)
  updates the weights of the units leading to the layer

essentially performs gradient-descent search on the error surface

relatively simple since it relies only on local information (a sketch follows below)
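A minimal back-propagation sketch for a two-layer sigmoid network (no bias weights, plain gradient descent; the variable names are mine). Calling it repeatedly over a training set of (inputs, targets) pairs performs one epoch per pass:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def backprop_step(w_kj, w_ji, inputs, targets, alpha=0.5):
    """One gradient-descent step for a two-layer sigmoid network.

    w_kj: weights from input units k to hidden units j
    w_ji: weights from hidden units j to output units i
    """
    # forward pass
    hidden = [sigmoid(sum(w_kj[k][j] * inputs[k] for k in range(len(inputs))))
              for j in range(len(w_kj[0]))]
    outputs = [sigmoid(sum(w_ji[j][i] * hidden[j] for j in range(len(hidden))))
               for i in range(len(w_ji[0]))]
    # output-layer error terms: Err_i * g'(in_i); g' = out * (1 - out) for sigmoid
    delta_o = [(t - o) * o * (1 - o) for t, o in zip(targets, outputs)]
    # blame assigned to hidden units, proportional to connection strength
    delta_h = [hidden[j] * (1 - hidden[j]) *
               sum(w_ji[j][i] * delta_o[i] for i in range(len(delta_o)))
               for j in range(len(hidden))]
    # weight updates use only local information
    for j in range(len(hidden)):
        for i in range(len(delta_o)):
            w_ji[j][i] += alpha * hidden[j] * delta_o[i]
    for k in range(len(inputs)):
        for j in range(len(hidden)):
            w_kj[k][j] += alpha * inputs[k] * delta_h[j]
    return outputs
```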


    Capabilities of Multi-Layer Neural Networks

expressiveness
  weaker than predicate logic
  good for continuous inputs and outputs

computational efficiency
  training time can be exponential in the number of inputs
  depends critically on parameters like the learning rate
  local minima are problematic
    can be overcome by simulated annealing, at additional cost

generalization
  works reasonably well for some classes of functions


Capabilities of Multi-Layer Neural Networks (cont.)

sensitivity to noise
  very tolerant: they perform nonlinear regression

transparency
  neural networks are essentially black boxes
  there is no explanation or trace for a particular answer
  tools for the analysis of networks are very limited
  only some limited methods exist to extract rules from networks


    Applications

domains and tasks where neural networks are successfully used
  handwriting recognition
  control problems
    juggling, truck backup problem
  series prediction
    weather, financial forecasting
  categorization
    sorting of items (fruit, characters, phonemes, ...)