Learning
Department of Computer Science & Engineering
Indian Institute of Technology Kharagpur
Paradigms of learning
• Supervised Learning
– A situation in which both inputs and outputs are given
– Often the outputs are provided by a friendly teacher.
• Reinforcement Learning
– The agent receives some evaluation of its action (such as a fine for stealing bananas), but is not told the correct action (such as how to buy bananas).
• Unsupervised Learning
– The agent can learn relationships among its percepts, and how they change over time
Decision Trees
• A decision tree takes as input an object or situation described by a set of properties, and outputs a yes/no “decision”.
Decision: Whether to wait for a table at a restaurant.
1. Alternate: whether there is a suitable alternative restaurant
2. Lounge: whether the restaurant has a lounge for waiting customers
3. Fri/Sat: true on Fridays and Saturdays
4. Hungry: whether we are hungry
5. Patrons: how many people are in it (None, Some, Full)
6. Price: the restaurant's price range (★, ★★, ★★★)
7. Raining: whether it is raining outside
8. Reservation: whether we made a reservation
9. Type: the kind of restaurant (Indian, Chinese, Thai, Fastfood)
10. WaitEstimate: 0-10 mins, 10-30, 30-60, >60.
Sample Decision Tree
[Figure: a decision tree for the restaurant problem. The root tests Patrons? (None / Some / Full); the Full branch tests WaitEstimate? (>60, 30-60, 10-30, 0-10), and deeper nodes test Alternate?, Hungry?, Reservation?, Fri/Sat?, Raining?, and Lounge?, ending in Yes/No leaves.]
Decision Tree Learning Algorithm
function Decision-Tree-Learning( examples, attributes, default )
  if examples is empty then return default
  else if all examples have the same classification then return the classification
  else if attributes is empty then return Majority-Value( examples )
  else
    best ← Choose-Attribute( attributes, examples )
    tree ← a new decision tree with root test best
    for each value vi of best do
      examplesi ← { elements of examples with best = vi }
      subtree ← Decision-Tree-Learning( examplesi, attributes − best, Majority-Value( examples ) )
      add a branch to tree with label vi and subtree subtree
    end
    return tree
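The following is a minimal Python sketch of this algorithm (assumptions: examples are dicts with a "class" key, and Choose-Attribute simply takes the first attribute rather than maximizing information gain; all names are illustrative):

from collections import Counter

def majority_value(examples):
    # Most common classification among the examples.
    return Counter(e["class"] for e in examples).most_common(1)[0][0]

def choose_attribute(attributes, examples):
    # Placeholder: a real learner would pick the attribute with the
    # highest information gain; here we simply take the first one.
    return attributes[0]

def decision_tree_learning(examples, attributes, default):
    if not examples:
        return default
    classes = {e["class"] for e in examples}
    if len(classes) == 1:                      # all examples agree
        return classes.pop()
    if not attributes:
        return majority_value(examples)
    best = choose_attribute(attributes, examples)
    tree = {best: {}}                          # root test on best
    rest = [a for a in attributes if a != best]
    for v in {e[best] for e in examples}:      # each value vi of best
        examples_v = [e for e in examples if e[best] == v]
        subtree = decision_tree_learning(examples_v, rest,
                                         majority_value(examples))
        tree[best][v] = subtree                # branch labelled vi
    return tree

A leaf is a class label; an internal node is a dict mapping the chosen attribute to one subtree per observed value.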
Neural Networks
• A neural network consists of a set of nodes (neurons/units) connected by links
– Each link has a numeric weight associated with it
• Each unit has:
– a set of input links from other units,
– a set of output links to other units,
– a current activation level, and
– an activation function to compute the activation level in the next time step.
Basic definitions
• The total weighted input is the sum of the input activations times their respective weights:

ini = Σj Wj,i aj

• In each step, we compute:

ai = g( ini ) = g( Σj Wj,i aj )

[Figure: a single unit. Input links carry activations aj, weighted by Wj,i; the input function computes ini; the activation function g produces the output ai, which is sent along the output links.]
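As a concrete illustration, here is one unit's computation in Python, assuming a sigmoid for g (the slides leave the activation function generic):

import math

def g(x):
    # Sigmoid activation: one common choice for g.
    return 1.0 / (1.0 + math.exp(-x))

def unit_activation(a, w):
    # ini = sum over j of Wj,i * aj ; ai = g(ini)
    in_i = sum(w_ji * a_j for w_ji, a_j in zip(w, a))
    return g(in_i)

print(unit_activation([1.0, 0.5], [0.3, -0.2]))   # activation for two input links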
Learning in Single Layer Networks
[Figure: a single-layer network; inputs Ij connect through weights Wj,i to an output unit Oi.]
• If the output of an output unit is O, and the correct output should be T, then the error is given by:
Err = T – O
• The weight adjustment rule (see the code sketch after this list) is:
Wj ← Wj + α × Ij × Err, where α is a constant called the learning rate
• This method is unable to learn all types of functions
• It can learn only linearly separable functions
• Used in competitive learning
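A small Python sketch of this learning rule, assuming a threshold output unit and an illustrative learning rate of 0.1 (the data below encode OR, a linearly separable function, with a constant bias input):

def train_single_layer(data, n_inputs, alpha=0.1, epochs=20):
    # data: list of (inputs I, target T); threshold unit: O = 1 if sum > 0.
    w = [0.0] * n_inputs
    for _ in range(epochs):
        for I, T in data:
            O = 1 if sum(wj * ij for wj, ij in zip(w, I)) > 0 else 0
            err = T - O                        # Err = T - O
            w = [wj + alpha * ij * err         # Wj <- Wj + alpha * Ij * Err
                 for wj, ij in zip(w, I)]
    return w

# OR with a constant bias input in position 0.
data = [([1, 0, 0], 0), ([1, 0, 1], 1), ([1, 1, 0], 1), ([1, 1, 1], 1)]
print(train_single_layer(data, 3))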
Two-layer Feed-Forward Network
[Figure: a two-layer feed-forward network. Input units Ik connect through weights Wk,j to hidden units aj, which connect through weights Wj,i to output units Oi.]
Back-Propagation Learning
• Erri = (Ti – Oi) is the error at the output node
• The weight update rule for the link from unit j to output unit i is:
Wj,i ← Wj,i + α × aj × Erri × g'( ini )
where g' is the derivative of the activation function g.
• Let Δi = Erri × g'( ini ). Then we have:
Wj,i ← Wj,i + α × aj × Δi
Error Back-Propagation
• For updating the connections between the inputs and the hidden units, we need to define a quantity analogous to the error term for the output nodes
– Hidden node j is responsible for some fraction of the error Δi in each of the output nodes to which it connects
– So, the Δi values are divided according to the strength of the connection between the hidden node and the output node, and propagated back to provide the Δj values for the hidden layer
• The weight update rule for weights between the inputs and the hidden layer is:
Wk,j ← Wk,j + α × Ik × Δj
where Δj = g'( inj ) Σi Wj,i Δi
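Combining the two update rules, a minimal sketch of one back-propagation step on a two-layer network, again assuming sigmoid units and an illustrative learning rate (weight matrices are plain nested lists: W_kj is inputs × hidden, W_ji is hidden × outputs):

import math

def g(x):
    return 1.0 / (1.0 + math.exp(-x))

def g_prime(x):
    s = g(x)
    return s * (1.0 - s)    # derivative of the sigmoid

def backprop_step(I, T, W_kj, W_ji, alpha=0.5):
    # Forward pass: inputs -> hidden -> outputs.
    in_j = [sum(W_kj[k][j] * I[k] for k in range(len(I)))
            for j in range(len(W_kj[0]))]
    a_j = [g(x) for x in in_j]
    in_i = [sum(W_ji[j][i] * a_j[j] for j in range(len(a_j)))
            for i in range(len(W_ji[0]))]
    O = [g(x) for x in in_i]
    # Output layer: Delta_i = Err_i * g'(in_i).
    delta_i = [(T[i] - O[i]) * g_prime(in_i[i]) for i in range(len(O))]
    # Hidden layer: Delta_j = g'(in_j) * sum_i Wj,i * Delta_i.
    delta_j = [g_prime(in_j[j]) *
               sum(W_ji[j][i] * delta_i[i] for i in range(len(delta_i)))
               for j in range(len(a_j))]
    # Weight updates from the two rules above.
    for j in range(len(a_j)):
        for i in range(len(delta_i)):
            W_ji[j][i] += alpha * a_j[j] * delta_i[i]   # Wj,i += alpha*aj*Delta_i
    for k in range(len(I)):
        for j in range(len(delta_j)):
            W_kj[k][j] += alpha * I[k] * delta_j[j]     # Wk,j += alpha*Ik*Delta_j
    return O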
The theory
• The total error is given by:

E = ½ Σi ( Ti − Oi )² = ½ Σi ( Ti − g( Σj Wj,i aj ) )², where aj = g( Σk Wk,j Ik )

• Only one of the terms in the summation over i and j depends on a particular Wj,i, so all other terms are treated as constants with respect to Wj,i:

∂E/∂Wj,i = −( Ti − Oi ) g'( ini ) aj = −aj Δi

• Similarly:

∂E/∂Wk,j = −Ik Δj
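As a sanity check on this derivation, the analytic gradient −aj Δi can be compared against a finite-difference estimate of ∂E/∂Wj,i for a single output unit (a hedged numerical sketch; all values are illustrative):

import math

def g(x):
    return 1.0 / (1.0 + math.exp(-x))

a = [0.2, -0.4, 0.7]     # hidden activations aj (illustrative)
W = [0.1, 0.3, -0.5]     # weights Wj,i into one output unit
T = 1.0                  # target output

def E(W):
    # E = 1/2 (T - g(sum_j Wj,i aj))^2 for a single output unit.
    return 0.5 * (T - g(sum(wj * aj for wj, aj in zip(W, a)))) ** 2

in_i = sum(wj * aj for wj, aj in zip(W, a))
O = g(in_i)
delta_i = (T - O) * g(in_i) * (1.0 - g(in_i))   # Erri * g'(ini)

eps = 1e-6
for j in range(3):
    W_plus = W[:]
    W_plus[j] += eps
    numeric = (E(W_plus) - E(W)) / eps          # finite-difference slope
    analytic = -a[j] * delta_i                  # -aj * Delta_i from the slide
    print(j, numeric, analytic)                 # the two columns should agree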