Effect of Conditional Feedback on Learning
Navid Ghaffarzadegan
PhD Student, the State University of New York at Albany
MIT-Albany-WPI System Dynamics Colloquium, Spring 2008
SUNY at Albany
Introduction
Barriers to learning from feedback in a dynamic decision-making environment:
- Complexity of the environment (Gonzalez 2005)
- Misperception of delays (Rahmandad et al. 2007, Rahmandad 2008)
- Feedback asymmetry (Denrell and March 2001)
- The existence of noise in feedback (Bereby-Meyer and Roth 2006)
- Problems of mental models (Senge 1996)
- …
People ignore and misperceive feedback (Sterman 1989a, Sterman 1989b).
Introduction
A common theme in formal studies on learning:
- A decision maker makes a decision and receives a payoff.
- The question is whether or not the decision maker is capable of learning from the information.
Full Feedback
[Diagram: feedback loop linking Decision, Payoff, and Perceiving payoff, labeled LEARNING]
Introduction
Little attention has been paid to the relevance of the full feedback assumption.
Examples: police officer, admissions office, human resources manager.
Conditional Feedback
For positive decisions we perceive feedback much more easily than for negative decisions.
[Diagram: feedback loop linking Decision, Payoff, and Perceiving payoff, labeled LEARNING]
Research Problem
What is the effect of conditional feedback on learning? Or: how relevant is the assumption of “Full Feedback”?
Method:
1- Simulation
   - Build a differential equation model in a signal detection framework
   - Experiment with the model
2- Test with data
   - Second-hand data: a published laboratory experiment
Framework
Signal Detection Theory
- Signal vs. noise, e.g. guilty vs. innocent, capable vs. incapable candidates
- Decision makers try to differentiate signals from noise
Judgment and Decision Making
- Evidence is often ambiguous, and there is uncertainty in the environment (Hammond 1996, Stewart 2000)
Framework
[Figure: probability density over the judgment axis, showing the noise distribution (e.g. innocent persons) and the signal distribution (e.g. guilty persons), separated by d', with a decision threshold]
Important concepts: base rate, selection rate, d’, threshold
Payoff Threshold Learning
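The concepts listed above can be related numerically. The following is a minimal sketch, not the model from the talk: it assumes equal-variance Gaussian distributions (noise ~ N(0,1), signal ~ N(d',1)) and a rule of selecting a case whenever its judgment exceeds the threshold. The function name `selection_rate` is illustrative.

```python
from statistics import NormalDist

def selection_rate(threshold, d_prime, base_rate):
    """Share of all cases judged positive (judgment above the threshold).

    Assumes noise ~ N(0, 1) and signal ~ N(d_prime, 1).
    """
    noise = NormalDist(0.0, 1.0)
    signal = NormalDist(d_prime, 1.0)
    hit_rate = 1.0 - signal.cdf(threshold)         # signals that are selected
    false_alarm_rate = 1.0 - noise.cdf(threshold)  # noise cases that are selected
    return base_rate * hit_rate + (1.0 - base_rate) * false_alarm_rate

# With a base rate of 0.5 and the threshold midway between the two means,
# the selection rate equals the base rate by symmetry.
print(selection_rate(threshold=0.5, d_prime=1.0, base_rate=0.5))
```

Raising the threshold lowers the selection rate, which is why a selection rate below the base rate indicates a threshold that is too high.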
Framework
[Figure: signal distribution (e.g. guilty persons) and noise distribution (e.g. innocent persons) over the judgment axis, separated by d', with a decision threshold]
Conditional Feedback Threshold Learning (Cue Learning)
Model I: Full Feedback
[Figure: signal and noise distributions over the judgment axis with successive thresholds T1, T2, T3; causal diagram linking Judgment, Threshold, payoff, payoff perception, and Threshold Learning]
Learning cycle: set threshold, make decision, receive payoff, perceive payoff, correct threshold.
One stock: Threshold (experiment)
Model I: Full Feedback
[Figure: signal and noise distributions over the judgment axis with successive thresholds T1, T2, T3; causal diagram linking Judgment, Threshold, payoff, payoff perception, and Threshold Learning]
Learning Algorithm:
1. Learning from payoff shortfall:
   Payoff shortfall = maximum possible payoff (Q) - payoff (Q, d)
   Maximum possible payoff (Q) = Vtn + Q*(Vtp - Vtn)
2. Anchoring and adjustment assumption in correcting the threshold (Tversky and Kahneman 1974, Epley and Gilovich 2001, Sterman 1989b)
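The two steps above can be sketched as follows. This is an illustration under stated assumptions, not the model from the talk: it reads Q as the base rate and Vtp, Vtn as the values of a true positive and a true negative, and it models anchoring and adjustment as a first-order move toward a hypothetical `indicated_threshold`, whose derivation the slides do not give.

```python
def max_possible_payoff(q, v_tp, v_tn):
    """Payoff if every decision were correct: Vtn + Q*(Vtp - Vtn)."""
    return v_tn + q * (v_tp - v_tn)

def payoff_shortfall(q, payoff, v_tp, v_tn):
    """Gap between the maximum possible payoff and the payoff received."""
    return max_possible_payoff(q, v_tp, v_tn) - payoff

def adjust_threshold(threshold, indicated_threshold, time_to_adjust, dt=1.0):
    """One Euler step of first-order anchoring and adjustment (assumed form).

    The current threshold is the anchor; it closes a fraction
    dt / time_to_adjust of the gap to the indicated threshold.
    """
    return threshold + dt * (indicated_threshold - threshold) / time_to_adjust

shortfall = payoff_shortfall(q=0.5, payoff=0.7, v_tp=1.0, v_tn=1.0)
new_threshold = adjust_threshold(threshold=0.0, indicated_threshold=1.0,
                                 time_to_adjust=10.0)
print(shortfall, new_threshold)
```

A shortfall of zero means there is nothing left to learn from the payoff; a positive shortfall signals that the threshold should still move.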
Model I: Full Feedback
[Figure: signal and noise distributions over the judgment axis with successive thresholds T1, T2, T3; causal diagram linking Judgment, Threshold, payoff, payoff perception, and Threshold Learning]
Inputs: noise ~ N(0,1) and signal ~ N(d’,1)
Base rate = 0.5; values are symmetric; correct decisions are more valued (to make the base rate traceable)
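For the inputs above, the payoff-maximizing threshold can be checked numerically. The sketch below is an assumption-laden illustration: the slides name only Vtp and Vtn, so `v_fp` and `v_fn` (values of false positives and false negatives) are hypothetical names, and the values 1 for correct and 0 for incorrect decisions are chosen only to satisfy "symmetric, correct decisions are more valued".

```python
from statistics import NormalDist

def expected_payoff(threshold, d_prime, base_rate, v_tp, v_tn, v_fp, v_fn):
    """Expected payoff per decision for a fixed threshold."""
    noise = NormalDist(0.0, 1.0)
    signal = NormalDist(d_prime, 1.0)
    hit = 1.0 - signal.cdf(threshold)         # P(select | signal)
    false_alarm = 1.0 - noise.cdf(threshold)  # P(select | noise)
    from_signals = hit * v_tp + (1.0 - hit) * v_fn
    from_noise = false_alarm * v_fp + (1.0 - false_alarm) * v_tn
    return base_rate * from_signals + (1.0 - base_rate) * from_noise

def best_threshold(d_prime, base_rate, v_tp, v_tn, v_fp, v_fn):
    """Grid search over thresholds from -3.00 to 5.00 in steps of 0.01."""
    grid = [i / 100.0 for i in range(-300, 501)]
    return max(grid, key=lambda t: expected_payoff(
        t, d_prime, base_rate, v_tp, v_tn, v_fp, v_fn))

# Symmetric values and a base rate of 0.5: the best threshold sits at d'/2.
print(best_threshold(d_prime=1.0, base_rate=0.5,
                     v_tp=1.0, v_tn=1.0, v_fp=0.0, v_fn=0.0))
```

Under these assumptions a lower base rate pushes the best threshold up, since positive decisions are then less often correct.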
Model I: Full Feedback - Results
[Figure: threshold over time (0 to 1000 minutes) compared with the optimal threshold; average selection rate in the last 50 trials over time compared with the average base rate of 0.5]
In full feedback, the model is able to learn from feedback.
Model I: Full Feedback - Results
- In full feedback, the model is able to learn from feedback.
- Looking at the payoff shortfall is enough to learn the threshold.
- The speed of approach depends on the time to change the threshold.
[Figure: dynamics of the average selection rate in the last 50 trials, plotted with the average base rate over time (minutes), for base rates of 0.3 and 0.7]
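The dependence of learning speed on the time to change the threshold can be illustrated with a small sketch. It assumes a first-order adjustment toward a fixed target threshold, which is a simplification and not necessarily the formulation used in the model; `threshold_path` and its parameters are illustrative names.

```python
def threshold_path(initial, target, time_to_adjust, steps, dt=1.0):
    """Euler integration of dT/dt = (target - T) / time_to_adjust."""
    path = [initial]
    for _ in range(steps):
        current = path[-1]
        path.append(current + dt * (target - current) / time_to_adjust)
    return path

fast = threshold_path(initial=0.0, target=0.5, time_to_adjust=10.0, steps=100)
slow = threshold_path(initial=0.0, target=0.5, time_to_adjust=50.0, steps=100)

# The shorter adjustment time ends closer to the target after the same
# number of steps.
print(abs(fast[-1] - 0.5) < abs(slow[-1] - 0.5))
```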
Model II: Conditional Feedback
- Our decisions influence our payoff perception.
- How do we judge the payoff of our negative decisions?
- People can differ in how they interpret their negative decisions (personality, second loop learning, ...).
[Figure: causal diagram linking Judgment, Threshold, payoff, feedback availability, payoff perception, and Threshold Learning; signal and noise distributions over the judgment axis, with the threshold separating the region of clear feedback from the region of unclear (no) feedback]
Model II: Conditional Feedback
Constructivist strategy
For negative decisions: perceived payoff = payoff (p, 0)
p = ratio of signals to total decisions, for negative decisions
- p = 0 means assuming all of our negative decisions are right.
- p = 1 means assuming all of our negative decisions are wrong.
[Figure: causal diagram linking Judgment, Threshold, payoff, feedback availability, payoff perception, and Threshold Learning, now including the parameter P; signal and noise distributions over the judgment axis, with the threshold separating the region of clear feedback from the region of unclear (no) feedback]
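One way to read the constructivist strategy above is sketched below. It is an interpretation, not the author's implementation: positive decisions return the actual payoff, while negative decisions return a payoff constructed from the assumed share p of signals among them. The value names `v_fp` and `v_fn` (false positive, false negative) are hypothetical, since the slides name only Vtp and Vtn.

```python
def perceived_payoff(decision_positive, is_signal, p, v_tp, v_tn, v_fp, v_fn):
    """Payoff as perceived by the decision maker under conditional feedback."""
    if decision_positive:
        # Clear feedback: the real outcome of a positive decision is observed.
        return v_tp if is_signal else v_fp
    # Unclear (no) feedback: the outcome is not observed, so the payoff is
    # constructed from p, the assumed share of signals among negative decisions.
    return p * v_fn + (1.0 - p) * v_tn

# A missed signal under p = 0 (all negative decisions assumed right) is
# perceived as a correct rejection; under p = 1 it is perceived as an error.
print(perceived_payoff(False, True, 0.0, 1.0, 1.0, 0.0, 0.0))
print(perceived_payoff(False, True, 1.0, 1.0, 1.0, 0.0, 0.0))
```

With p = 0 the perceived payoff of a negative decision never falls short, so under this reading nothing pushes the threshold back down once it has risen.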
Model II: Conditional Feedback - Results
In conditional feedback, learning depends on how people code their negative decisions.
- Elwin et al. (2007): p = 0
- Stewart et al. (2007): people underestimate the selection rate and overestimate the threshold
[Figure: average selection rate over time (minutes), with the base rate and the final selection rate interval between P=0 and P=1; threshold over time (minutes), with the optimal threshold and the final threshold interval between P=0 and P=1]
Model I: Full Feedback - Results
Comparison of full feedback and conditional feedback under the confident constructivist strategy
[Figure: dynamics of the selection rate in the last 50 trials (a) and of the threshold (b) for a base rate of 0.5, full feedback versus conditional feedback, with the average base rate (0.5) and the optimal threshold shown for reference]
Model I: Full Feedback - Results
- In conditional feedback with the confident constructivist strategy, the model is not able to learn from feedback.
- What is the real p? How do people really code their negative decisions?
[Figure: dynamics of the selection rate for average base rates of 0.3 and 0.7, full feedback versus conditional feedback]
Replications of an empirical investigation
Data from Elwin et al. (2007): comparison of full feedback and conditional feedback
- Sixty-four subjects performed a computerized task of predicting economic outcomes for companies.
- The experiment had two major phases: training trials and a test phase.
- In the training phase, one group of subjects performed 120 trials of full feedback decision making, while the other group performed 240 trials of conditional feedback.
Replications of an empirical investigation
We use their published report in our model and test which parameters can replicate their findings.
Main parameters: d’ and p (and time to adjust threshold).
[Figure: parameter map with level of expertise (d'), from 0 to 1.5 (low to high), on one axis and confidence in negative decisions (1 - p), from 0.5 to 1 (low to high), on the other. Regions are labeled by selection rate (SR): SR < 0.3, replication area 0.3 ≤ SR ≤ 0.36, 0.36 < SR < 0.5, SR = 0.5, and 0.5 < SR, with one area marked as overconfidence]
Conclusion
- A new explanation for the imperfection of decision making in a series of tasks (learning from clear shortfalls).
- Conditional feedback can result in bias and underestimation of the base rate.
- With respect to second loop learning: people do not find the optimal threshold. Even if second loop learning exists in the real world, it works for a limited number of people.
- A warning about overestimating the relevance of the full feedback assumption in formal studies.
Conclusion
Future work: effects of personality traits
Effect of personality on learning, e.g. using the Big Five
[Figure: causal diagram linking Threshold, payoff, feedback availability, payoff perception, P, and Threshold Learning]
Conclusion
Future work: making confidence endogenous
- Effect of personality on learning, e.g. using the Big Five
- Dynamics of confidence building
[Figure: causal diagram linking Threshold, payoff, feedback availability, payoff perception, P, and Threshold Learning, with Confidence added]
Conclusion
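Future work: two individuals communicating
[Figure: two coupled copies of the causal diagram, one with Threshold, payoff, feedback availability, payoff perception, P, and Threshold Learning, and a second with Threshold 1, payoff 1, feedback availability 1, payoff perception 1, P1, and Threshold Learning]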
Conclusion
Future work: two individuals influencing each other's performance
[Figure: two coupled copies of the causal diagram, one with Threshold, payoff, feedback availability, payoff perception, P, and Threshold Learning, and a second with Threshold 1, payoff 1, feedback availability 1, payoff perception 1, P1, and Threshold Learning]
Thanks
FEEDBACK?