Measuring Risk - What Doesn’t Work and What Does


DESCRIPTION

The topics for this webinar include:
• The Problem – Why your method may be a “management placebo” and why that is the biggest risk you have
• Problems that many methods ignore – and problems some methods introduce
• What Does Work – Studies reveal some methods show consistent, measurable improvements on the forecasts and decisions of managers
• Examples of Real Improvements
• Overview of the Applied Information Economics (AIE) Process
• Common Objections to quantitative methods and the misconceptions behind them
• Questions & Answers

TRANSCRIPT

Page 1: Measuring Risk - What Doesn’t Work and What Does

Applied Information Economics

Measuring risk – What Works and What Doesn’t

Page 2: Measuring Risk - What Doesn’t Work and What Does

Background
• In the past 16 years, I have conducted 60 major risk/return analysis projects
• I noticed that measurements thought to be “impossible” could actually be made
• I also noticed that risk management and much decision analysis in business was mostly unscientific and did not reflect the latest research
• I wrote two books on these topics, published by John Wiley & Sons

Page 3: Measuring Risk - What Doesn’t Work and What Does

Challenges
• How can we measure “intangibles”?
• How do we know that our method of analyzing big decisions “works” (i.e., has a measurable improvement on our forecasts and decisions)?
• How can we use proven, quantitative methods when, apparently, we lack sufficient data or the problem is too complex?

Page 4: Measuring Risk - What Doesn’t Work and What Does

Key Lesson: Skepticism

• In defense of many popular methods for decision analysis, portfolio prioritization and metrics, you may have heard (or said) the following:
– “Our method is structured and formal”
– “It helps us build consensus”
– “It’s easily understood and relatively fast”
– “It is a proven method” (where “proven” means somebody else did it this way and said they liked it)

If the method uses a weighted score, or labels risks as “high/medium/low”, you should be suspicious.

Page 5: Measuring Risk - What Doesn’t Work and What Does

Analysis Placebos

• Gathering more information makes you feel better but, at some point, begins to reduce decision quality while confidence continues to increase. (Tsai, C., Klayman, J., Hastie, R., “Effects of amount of information on judgment accuracy and confidence,” Organizational Behavior and Human Decision Processes, Vol. 107, No. 2, 2008, pp. 97-105)

• Interaction with others also increases decision confidence but, again, at some point decisions are not improved while confidence continues to increase. (Heath, C., Gonzalez, R., “Interaction with Others Increases Decision Confidence but Not Decision Quality: Evidence against Information Collection Views of Interactive Decision Making,” Organizational Behavior and Human Decision Processes, Vol. 61, No. 3, 1995, pp. 305-326)

• Formal training in detecting lies makes individuals slightly worse at detecting lies in controlled experiments – but their confidence in their judgments increases dramatically. (Kassin, S.M., Fong, C.T., “‘I’m innocent!’: Effects of training on judgments of truth and deception in the interrogation room,” Law and Human Behavior, Vol. 23, 1999, pp. 499-516)

Studies have shown that it is very easy for a decision-making process to increase confidence in forecasts and decisions even if measured outcomes (returns on decisions, forecast accuracy, etc.) are not improved – or are even made worse.

Page 6: Measuring Risk - What Doesn’t Work and What Does

Errors in Expert Judgment

Human expertise is an important input and is hard to completely automate. But there are certain types of errors in human judgment that we know how to measure and control for:

• Overconfidence – Experts’ chance of being right is much less than they believe.
• Inconsistency and influence by irrelevant factors – When given the same sets of problems to evaluate, experts have a hard time giving the same answers. Their memory is reconstructed so that they believe they always had one preference when in fact they didn’t. Factors which experts may insist have no bearing on their judgment show correlations with their judgments.
• Misinterpretation – We tend to interpret cues about risks, measurements and decision problems in a way that is mathematically irrational.

“Experience is inevitable. Learning is not.” – Paul Schoemaker

Page 7: Measuring Risk - What Doesn’t Work and What Does

Real Reasons Decisions Change

• Fear and anger affect the perception of risk and risk tolerance (Lerner, Keltner, 2001).

• A small study presented at the Cognitive Neuroscience Society meeting in 2009 by a grad student at the University of Michigan showed that simply being briefly exposed to smiling faces makes people more risk tolerant in betting games.

• An NIH-funded study conducted by Brian Knutson of Stanford showed that emotional stimulation caused subjects to take riskier bets in a betting game.

• Risk preferences show a strong correlation to testosterone levels – which change daily (Sapienza, Zingales, Maestripieri, 2009).

• “Emerging preferences” affect our perception of risk and risk aversion, and these emerging preferences are perceived as something the decision maker always had. (DeKay)

• Research on effects like “anchoring” shows how exposure to an unrelated number prior to the decision affects the choice. This implies that even the order in which investments are presented can affect choices. (Kahneman, Tversky)

Our self-image of how tolerant or averse we are toward risk is much more fluid than we think. We imagine our risk appetite is a more permanent part of our character than it really is. Controlling for this means 1) being aware of the issue, 2) documenting risk aversion with “risk boundaries”, and 3) making multiple estimates of risks.

Page 8: Measuring Risk - What Doesn’t Work and What Does

Scale Errors

• The use of scales simply obscures (doesn’t alleviate) the lack of information and potential disagreements – this creates an illusion of communication. (Budescu)

• Arbitrary changes to the scale (1 to 5 vs. 1 to 10) have unexpected effects on how people distribute their responses on a scale – which leads to major differences in outcomes. (Fox)

• Popular weighted scores add error to unaided human judgment. Scale error is added even if scales are “well defined”, because an extreme rounding error is introduced: one risk 10 or 50 times greater than another can end up in the same final group. (Cox)

[Chart: Relative impact of sponsor level (C-Level, SVP, VP, Manager) on project failure – actual relative values based on historical project-failure data vs. a 1–4 scale. Note that the scale’s error was just enough that the rank order might even be wrong.]

Scales are simple. But our response behaviors when we use them are not. Typical scales combine several complex, subtle errors.
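A minimal sketch of the rounding error Cox describes, with hypothetical dollar thresholds: two risks roughly nine times apart in expected loss receive the same ordinal score.

```python
# A sketch of scale rounding error. Thresholds are hypothetical.

def score_risk(expected_loss, thresholds=(10_000, 100_000, 1_000_000, 10_000_000)):
    """Map an expected loss ($/yr) to a 1-5 ordinal score."""
    return 1 + sum(expected_loss >= t for t in thresholds)

for loss in (110_000, 990_000):  # ~9x apart in expected loss
    print(f"${loss:>9,}/yr -> score {score_risk(loss)}")
# Both print score 3: the scale hides a ~9x difference in risk.
```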

Page 9: Measuring Risk - What Doesn’t Work and What Does

What Does Work?
• “Calibrate” experts to realistically assess probabilities.
• For certain problems, remove inconsistencies in judgment.
• “Do the math” – don’t rely entirely on intuition.
– Use the “calibrated” judgments of experts in Monte Carlo simulations.
– Simple historical models and actual measurements usually outperform human judges.
– Compute the “Expected Value of Information” to identify important measures.
• Document basic decision criteria – especially risk vs. return.

Each of these addresses known errors or has been tested in multiple controlled experiments with measurable results – not just anecdotal case studies with the reactions of users as an indicator of effectiveness.

Don’t reinvent the wheel – scientifically proven, effective risk analysis methods have been applied to other equally difficult problems where there is limited historical data and lots of uncertainty. Examples: nuclear power, insurance of rare and complex events, oil exploration.

Page 10: Measuring Risk - What Doesn’t Work and What Does

Calibrated Probabilities
• 1997: An experiment Hubbard conducted with Giga Information Group showed that people can be trained to realistically assess the probabilities of uncertain forecasts
• Hubbard has calibrated hundreds of people since then
• Calibrated probabilities are the basis for modeling the current state of uncertainty

[Chart: Calibration results for Giga analysts and Giga clients – assessed chance of being correct (50% to 100%) vs. percent actually correct, with statistical error bars, the “ideal” confidence line, and the number of responses behind each point.]
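To illustrate, a minimal sketch of how a calibration tally like the chart above is computed: bucket forecasts by stated confidence and compare each bucket’s hit rate to the confidence level. The responses below are made up.

```python
from collections import defaultdict

# (stated confidence, was the forecast correct?) -- hypothetical responses
responses = [(0.5, True), (0.5, False), (0.7, True), (0.7, True),
             (0.7, False), (0.9, True), (0.9, True), (0.9, True),
             (0.9, False), (1.0, True)]

buckets = defaultdict(list)
for confidence, correct in responses:
    buckets[confidence].append(correct)

for confidence in sorted(buckets):
    hits = buckets[confidence]
    rate = sum(hits) / len(hits)
    # A calibrated expert's hit rate tracks the stated confidence.
    print(f"stated {confidence:.0%}: actually correct {rate:.0%} "
          f"({len(hits)} responses)")
```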

Page 11: Measuring Risk - What Doesn’t Work and What Does

“Smoothing” Inconsistencies
• No matter how much experience experts have, they appear to be unable to apply what they have learned consistently
• Methods that statistically “smooth” their estimates show reduced error in several studies for many different kinds of problems (a smoothing sketch follows the chart below)

[Chart: Reduction in forecasting error compared to expert judgment (0% to 30%, first vs. second estimate) for R&D portfolio priorities, battlefield fuel forecasts, IT portfolio priorities, movie box office forecasts, cancer patient recovery, changes in stock prices, mental illness prognosis, psychology course grades, and business failures; the legend distinguishes Hubbard’s studies from other published studies.]
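One common smoothing technique (not necessarily the one behind every study above) is a “lens model” style bootstrap: fit a simple regression to the expert’s own judgments and use the model’s perfectly consistent outputs in place of the raw, noisy judgments. A minimal sketch with hypothetical project attributes and estimates:

```python
# A sketch of statistical "smoothing" of expert judgment. Data and
# attribute names are hypothetical.
import numpy as np

# Each row: project attributes (team size, duration in months, % new tech)
cases = np.array([[5, 6, 0.2], [12, 18, 0.8], [8, 12, 0.5],
                  [20, 24, 0.9], [6, 9, 0.3], [15, 20, 0.7]])
expert_estimates = np.array([0.10, 0.55, 0.30, 0.70, 0.20, 0.50])  # P(failure)

# Fit a linear model of the expert (ordinary least squares with intercept).
X = np.column_stack([np.ones(len(cases)), cases])
coef, *_ = np.linalg.lstsq(X, expert_estimates, rcond=None)

smoothed = X @ coef  # the expert's judgment policy, minus the noise
for raw, fit in zip(expert_estimates, smoothed):
    print(f"raw {raw:.2f} -> smoothed {fit:.2f}")
```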

Page 12: Measuring Risk - What Doesn’t Work and What Does

Quantitative Modeling: It Works

[Diagram: A simple simulation model – Event A or Event B causes a percentage of orders lost, which, combined with demand, produces lost revenue.]

• In the United Kingdom between 1844 and 1853, 149 insurance companies were formed. By the end of this period, just 59 survived. Those that failed tended to be those that did not use mathematically valid premium calculations. (Buhlmann, 1997)

• Over 150 studies have shown areas of judgment where historical models outperform expert judgment – even though the humans insist each item is unique and requires the “human touch”. (Meehl, 1954; Dawes 1996)

• One researcher in the oil industry found a correlation between the use of quantitative risk analysis methods and financial performance – and the improvement in performance started when they started using the quantitative methods. (F. Macmillan, 2000)

• Data at NASA from over 100 space missions showed that Monte Carlo simulations and historical models beat other methods for estimating cost, schedule and mission risks (I published this in The Failure of Risk Management and OR/MS Today)
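As an illustration, a minimal Monte Carlo sketch of the lost-revenue model diagrammed above. Every probability and 90% CI below is a hypothetical calibrated estimate, and the lognormal conversion is one common modeling choice, not the only one.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 100_000

# Calibrated inputs (hypothetical)
p_event_a, p_event_b = 0.10, 0.05   # annual probabilities of the two events
orders_lost_ci = (0.02, 0.15)       # 90% CI: fraction of orders lost if an event occurs
demand_ci = (8e6, 15e6)             # 90% CI: annual demand ($)

def lognormal_from_ci(lo, hi, size):
    """Sample a lognormal whose 5th/95th percentiles match the 90% CI."""
    mu = (np.log(lo) + np.log(hi)) / 2
    sigma = (np.log(hi) - np.log(lo)) / (2 * 1.645)
    return rng.lognormal(mu, sigma, size)

event = (rng.random(N) < p_event_a) | (rng.random(N) < p_event_b)
orders_lost = lognormal_from_ci(*orders_lost_ci, N)
demand = lognormal_from_ci(*demand_ci, N)
lost_revenue = np.where(event, orders_lost * demand, 0.0)

print(f"P(any loss)       = {event.mean():.1%}")
print(f"Mean lost revenue = ${lost_revenue.mean():,.0f}")
print(f"P(loss > $1M)     = {(lost_revenue > 1e6).mean():.1%}")
```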


Page 13: Measuring Risk - What Doesn’t Work and What Does

Red Herrings of Modeling
• We need to be careful of red herring arguments against models:
– “We cannot model that… it is too complex.”
– “Models will have error and therefore we should not attempt it.”
– “We don’t have sufficient data to use for a model.”
– “The model failed to predict X, therefore modeling has no value.”
• Build on George E. P. Box: “Essentially, all models are wrong, but some are useful.”
– Some models are more useful than others.
– Everyone uses a model – even if it is intuition or “common sense”.
– So the question is not whether a model is “right”, or whether to use a model at all.
– The question is whether one model measurably outperforms another.
– A proposed model (quantitative or otherwise) should be preferred if the error reduction compared to the current model (expert judgment, perhaps) is enough to justify the cost of the new model.

Page 14: Measuring Risk - What Doesn’t Work and What Does

Applied Information Economics

• AIE is a practical application of quantitative methods to decision analysis problems

• Goal: Optimizing uncertainty reduction – balancing measurably improved decisions against analysis effort

• It answers two questions:
– Given the current uncertainty, what is the best decision?
– What additional analysis or measurements are justified?

• Every component of the method is based on empirical research that shows it improves decisions


Page 15: Measuring Risk - What Doesn’t Work and What Does

[Flow diagram of the AIE process, reconstructed here as steps. Calibration training precedes the process:]

1. Define the decision and identify relevant variables – set up the “business case” for the decision using these variables.
2. Model the current state of uncertainty – initially use calibrated estimates, then actual measurements.
3. Compute the value of additional information – determine what to measure and how much effort to spend on measuring it.
4. If there is significant value to more information, measure where the information value is high – reduce uncertainty using any of the methods – and return to step 3.
5. If not, make the best bet: optimize the decision – use the quantified risk/return boundary of the decision makers to determine which decision is preferred.
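A minimal sketch of the loop in steps 3–5. Here `evi`, `measurement_cost`, `measure`, and `optimize` are hypothetical stand-ins supplied by the modeler for the decision-theory calculation, the cost estimate, the measurement work, and the final optimization; this is an outline of the control flow, not the AIE calculations themselves.

```python
def aie_loop(model, variables, evi, measurement_cost, measure, optimize):
    """Iterate measurement until no variable's information value justifies its cost."""
    while True:
        # Value each variable under the model's current uncertainty.
        values = {v: evi(model, v) for v in variables}
        best = max(values, key=values.get)
        if values[best] <= measurement_cost(best):
            break                      # no measurement is economically justified
        model = measure(model, best)   # reduce uncertainty on the best target
    return optimize(model)             # make the best bet under remaining uncertainty
```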

Page 16: Measuring Risk - What Doesn’t Work and What Does

A Few Examples

AIE was applied initially to IT business cases. But over the last 16 years it has also been applied to other decision analysis problems in all areas of business cases, performance metrics, risk analysis, and portfolio prioritization:

• IT
– Prioritizing IT portfolios
– Risk of software development
– The value of better information
– The value of better security
– The risk of obsolescence and optimal technology upgrades
– Vendor selection
– The value of infrastructure
– Performance metrics for the business value of applications
• Engineering
– The risks of major engineering projects
– Mining operations
• Business
– Market forecasts
– The risk/return of expanding operations
– Business valuations for venture capital and mergers and acquisitions
– Movie project selection
• Environment
– The value of safer drinking water
– The value of “scrubbers” on smoke stacks
– The value of better pesticide control for saving endangered species
• Military
– Forecasting fuel for Marines in the battlefield
– Measuring the effectiveness of combat training in reducing roadside bomb/IED casualties
– R&D portfolios

Page 17: Measuring Risk - What Doesn’t Work and What Does

Uncertainty, Risk & Measurement

• The “Measurement Theory” definition of measurement: “A measurement is an observation that results in information (reduction of uncertainty) about a quantity.”

• An Actuary's approach to Risk Measurement: “To quantify probability and loss of an undesirable possibility”

• The value of a Measurement: “The monetized reduction in risk from making decisions under less uncertainty”

• We model uncertainty statistically – with Monte Carlo simulations

Measuring Uncertainty, Risk and the Value of Information are closely related concepts, important measurements themselves, and precursors to most other measurements


Page 18: Measuring Risk - What Doesn’t Work and What Does

The Impact of Computing Information Value

[Chart: Value of information vs. traditional measurement priorities.]

• The Value of Information is Computable: AIE uses a relatively simple (and 60-year-old) set of algorithms from decision theory to compute the value of information

• The Priority of Measurements is Reversed: This calculation reveals that most organizations will consistently focus on low-value measurements and ignore high-value measurements - this is the “measurement inversion”

• Only a Few Measurements Are Really Needed: We also found that, if anything, fewer measurements were required after the information values were known.

• Some Additional Empirical Measurements are almost always needed: I found that 97% of the models I built justified further measurement according to the information values.


Page 19: Measuring Risk - What Doesn’t Work and What Does

Five Useful Assumptions

• It’s been measured before
• You have more data than you think
• You need less data than you think
• New data is more economical than you think
• All measurements have error, but your subjective estimates of that error have even more error

“It’s amazing what you can see when you look” – Yogi Berra

Page 20: Measuring Risk - What Doesn’t Work and What Does

Measuring the “Impossible”

Several clever sampling methods exist that can measure more with less data than you might think. Examples:

• Estimating the number of tanks produced by Germany in WWII
• Clinical trials with extremely small samples
• Measuring undetected computer viruses or hacking attempts
• Estimating the population of fish in the ocean
• Measuring unreported crimes or the size of the black market
• Using “near misses” to measure catastrophic but rare events

[Chart: WWII German tank production estimates.]
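The tank example rests on the serial-number estimator: with k observed serial numbers whose maximum is m, the minimum-variance unbiased estimate of the total is m(1 + 1/k) − 1. A minimal sketch with made-up serial numbers:

```python
def tank_estimate(serials):
    """Estimate total production from a sample of sequential serial numbers."""
    k, m = len(serials), max(serials)
    return m * (1 + 1 / k) - 1

captured = [47, 81, 130, 190, 215]  # hypothetical captured serial numbers
print(f"Estimated total produced: {tank_estimate(captured):.0f}")  # ~257
```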

Page 21: Measuring Risk - What Doesn’t Work and What Does

Quantifying Risk Aversion

• The simplest element of Harry Markowitz’s Nobel Prize-winning method, Modern Portfolio Theory, is documenting how much risk an investor accepts for a given return.

• The “Investment Boundary” states how much risk an investor is willing to accept for a given return.

• For our purposes, we modified Markowitz’s approach a bit.

[Chart: An acceptable risk/return boundary separating the investment region from unacceptable investments.]
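A minimal sketch of applying a documented risk/return boundary. The boundary points are hypothetical, and this simple lookup is an illustration of the idea, not the modified Markowitz method itself.

```python
import bisect

# (P(negative return), minimum acceptable expected ROI) -- hypothetical
# points elicited from the decision makers
boundary = [(0.05, 0.05), (0.10, 0.10), (0.20, 0.20), (0.30, 0.40), (0.40, 0.80)]

def acceptable(p_loss, expected_roi):
    """True if the investment lies on the acceptable side of the boundary."""
    risks = [r for r, _ in boundary]
    if p_loss > risks[-1]:
        return False                       # riskier than anything documented
    i = bisect.bisect_left(risks, p_loss)
    return expected_roi >= boundary[i][1]  # needs at least the boundary return

print(acceptable(0.08, 0.15))  # True: 15% ROI at an 8% chance of loss
print(acceptable(0.35, 0.30))  # False: not enough return for that much risk
```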

Page 22: Measuring Risk - What Doesn’t Work and What Does

Cost vs. Value of AIE
• The cost of analysis routinely comes in below 1%, and has always been under 2%, of the investment size – including initial training
• This is still less than some industries spend on risk analysis of investments of similar size and risk
• It is also sometimes less time-consuming than the previous non-quantitative analysis techniques used by the firm (one reason this analysis is efficient is that we conduct a Value of Information Analysis – we only measure what is economically justified)
• Using the standard VIA calculation for the value of AIE analysis, AIE itself was the best investment of all the investments we analyzed – very conservative measures of payoffs put the return at $20 for every $1 spent on AIE

Page 23: Measuring Risk - What Doesn’t Work and What Does

Final Tips
• Learn how to think about uncertainty, risk and information value in a quantitative way
• Assume it’s been measured before
• You have more data than you think, and you need less data than you think
• Methods that reduce your uncertainty are more economical than many managers assume
• Don’t let “exception anxiety” cause you to avoid making any observations at all
• Just do it

Page 24: Measuring Risk - What Doesn’t Work and What Does

Questions?

Jody Keyser

[email protected]

www.aliadocorp.com

1-888-373-0680


Page 25: Measuring Risk - What Doesn’t Work and What Does

Supplementary Material


Page 26: Measuring Risk - What Doesn’t Work and What Does

“Proper” Incentives

• Incentives can also help build a culture of “high-performance forecasting”

• A method called the “Brier Score” can be shown to compute rational incentives that optimize calibration of probabilities (Murphy, Winkler)

• Since the Brier score cannot be gamed in any way other than simply giving your best estimate of each probability, it is called a “proper” score

• Brier score = Σᵢ (Outcome(Xᵢ) − P(Xᵢ))², where Outcome(Xᵢ) is 1 if event i occurred and 0 if it did not, and P(Xᵢ) is the forecast probability

• Other viable methods include “prediction markets”

[Chart: Results of the Brier score applied to weather forecasts.]
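A minimal sketch of Brier scoring a set of forecasts (lower is better; honestly reporting your best probability minimizes the expected score). The forecasts and outcomes are made up.

```python
def brier_score(forecasts, outcomes):
    """Sum of squared differences between 0/1 outcomes and forecast probabilities."""
    return sum((o - p) ** 2 for p, o in zip(forecasts, outcomes))

forecasts = [0.9, 0.7, 0.2, 0.5]   # stated P(event)
outcomes  = [1,   1,   0,   1]     # what actually happened
print(f"Brier score: {brier_score(forecasts, outcomes):.2f}")  # 0.39
```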

Page 27: Measuring Risk - What Doesn’t Work and What Does

Prediction Markets
• Simulated trading markets are a proven method of generating probabilities for uncertain events
• Research shows that they work even without purely monetary reward systems

Source: Servan-Schreiber et al., Electronic Markets, Vol. 14, No. 3, September 2004

Page 28: Measuring Risk - What Doesn’t Work and What Does

Increasing Value & Cost of Information

• EVPI – Expected Value of Perfect Information
• EVI – Expected Value of Information
• ECI – Expected Cost of Information

[Chart: Dollar value/cost (from $0 to $$$) vs. certainty (low to high) – the EVI curve approaches the EVPI ceiling at perfect information while the ECI curve rises, with a marked range to aim for.]
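EVPI can be computed by simulation: it is the expected payoff when the best action can be chosen for each scenario, minus the expected payoff of the best single action chosen under current uncertainty. A minimal sketch with a hypothetical investment decision:

```python
import numpy as np

rng = np.random.default_rng(7)
N = 100_000

# Uncertain state: project benefit as a calibrated lognormal (hypothetical)
benefit = rng.lognormal(mean=np.log(2e6), sigma=0.6, size=N)
cost = 1.5e6

payoff_invest = benefit - cost   # payoff per scenario if we invest
payoff_skip = np.zeros(N)        # payoff if we don't

# Best single action chosen before the benefit is known
best_under_uncertainty = max(payoff_invest.mean(), payoff_skip.mean())

# With perfect information we'd pick the better action per scenario
with_perfect_info = np.maximum(payoff_invest, payoff_skip).mean()

evpi = with_perfect_info - best_under_uncertainty
print(f"EVPI = ${evpi:,.0f}  (upper bound on what any measurement is worth)")
```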

Page 29: Measuring Risk - What Doesn’t Work and What Does

The Value of the “First Few”
• Uncertainty reduces much faster on the first few observations than you might think
• Myth: When uncertainty is high, lots of data is needed to reduce it
• Fact: Just the opposite is true

• With a few samples there is still high uncertainty, but each new sample reduces uncertainty a lot – the first few samples reduce uncertainty the most when initial uncertainty is high
• As the number of samples increases, the 90% CI gets much narrower, but each new sample reduces uncertainty only slightly – and beyond about 30 samples, you need to quadruple the sample size to cut error by half

[Chart: Mean of the sample vs. actual, with the 90% CI (−100% to 100%) narrowing as the number of samples grows from 0 to 40.]
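The quadrupling rule follows from the standard error of a mean shrinking as 1/√n, so the 90% CI half-width (z = 1.645) halves only when n quadruples. A minimal sketch, assuming a unit population standard deviation:

```python
import math

z90 = 1.645    # z-score for a 90% interval
sigma = 1.0    # assumed population standard deviation

for n in (2, 5, 10, 30, 120):
    half_width = z90 * sigma / math.sqrt(n)
    print(f"n={n:>3}: 90% CI half-width = +/-{half_width:.2f}")
# Note that n=120 (4 x 30) has half the error of n=30.
```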

Page 30: Measuring Risk - What Doesn’t Work and What Does

Bayesian Sampling
• Bayesian inversion allows for samples of extremely small sizes when we use some prior knowledge about what is likely.
• This can be used any time the cost of a sample is extremely high – e.g. rocket launches, cancer patients, a complete inspection of a ship, plane or building, etc.

[Chart: Baseline failure rate of a rocket after 5 launches – distributions after 0 launches/0 failures, 1 launch/0 failures, 3 launches/0 failures, and 5 launches/1 failure.]
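A minimal sketch of the rocket example as a beta-binomial update, assuming (as an illustration, not necessarily the slide’s choice) a uniform Beta(1, 1) prior on the failure rate; uses scipy for the credible intervals.

```python
from scipy.stats import beta

cases = [(0, 0), (1, 0), (3, 0), (5, 1)]  # (launches, failures)
for launches, failures in cases:
    a = 1 + failures              # prior alpha + observed failures
    b = 1 + launches - failures   # prior beta + observed successes
    mean = a / (a + b)
    lo, hi = beta.ppf([0.05, 0.95], a, b)  # 90% credible interval
    print(f"{launches} launches, {failures} failures: "
          f"mean failure rate {mean:.2f}, 90% CI ({lo:.2f}, {hi:.2f})")
```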

Page 31: Measuring Risk - What Doesn’t Work and What Does

Issues with a “Risk Map”
• Does your “risk map” look more like the top or the bottom chart? If more like the top, how do the errors mentioned earlier compare to the variance among the clustered responses?
• Clustering means that all the errors mentioned before make up a large part of the difference between the scores of individual risks.
• How does this address the measured response behaviors of overconfidence, partition dependence, framing, anchoring, etc.?
• How does this address correlations, common mode failures, and cascade failures? These factors can make a few “low risk” items add up to one very big risk.
• Risk maps like this may be OK for initial brainstorming, but don’t make critical decisions based on them.

[Charts: Two likelihood-vs.-impact risk maps – one with tightly clustered risks, one with dispersed risks.]

Page 32: Measuring Risk - What Doesn’t Work and What Does

The Illusion of Cognition

• Anchoring: Simply being exposed to arbitrary, even irrelevant numbers affects subsequent subjective estimates (Kahneman)

• Influential Inferiors: Give a test group two choices, A and B. For another group, add a choice C that is clearly just an inferior version of B. The percentage of people who choose B will increase over the first group. (Ariely)

• Defaults: Implicit or explicit “defaults” affect choices dramatically. (Ariely)

• Framing logically identical choices as loss avoidance vs. opportunity seeking changes preferences. (Kahneman)

The “Illusion of Cognition” is a phrase in the decision psychology literature that refers to the misconception that our choices are based on rational thinking. Risk assessment methods employ structures that can introduce these problems.


Page 33: Measuring Risk - What Doesn’t Work and What Does

First, Do No Harm

| Method | Gut Feel | Weighted Score | Traditional Financial | Quantitative Models |
| --- | --- | --- | --- | --- |
| Measured improvement to judgment? | Baseline | No: removes no errors and adds new errors | Maybe: decomposition helps; false precision | Yes: proven with controlled tests |
| Does it quantify risk? | Only intuitively | No; it attempts to describe risk | No, but may attempt to adjust for it | Yes |
| Determines high-payoff measures? | No | No: turns some good measures into scores | No | Yes (with AIE) |
| Net benefit? | Baseline | No: probably worse | Maybe slightly | Best |

• “Gut feel” is the baseline. Anything that “works” has to show an improvement on this. Measured sources of error: inconsistency, overconfidence, various biases, inaccurate estimates
• The worst case is not “gut feel” – some methods add more error
• The best case isn’t perfection – just measurably reduced error compared to gut feel

© 2010 HDR and Aliado Accesso LLC