
Page 1: Overcoming Data Deluge (State Street) presentation at the Chief Data Scientist, USA 2016

GXN-2870

Overcoming the Data Deluge: Ensuring Data Quantity Generates Quality Insights

Chief Data Scientist Forum, November 2016

Dr. Jeffrey R. Bohn
Chief Science Officer, State Street Global Exchange℠
Head of GX Labs

Page 2

Thinking

Theories

Page 3

David Hume

A wise man, therefore, proportions his belief to the evidence. -- Of Miracles, Section X, Part I, 87

For if truth be at all within the reach of human capacity, it is certain it must lie very deep and abstruse: and to hope we shall arrive at it without pains, while the greatest geniuses have failed with the utmost pains, must certainly be esteemed sufficiently vain and presumptuous. -- Treatise of Human Nature, Introduction

Page 4

Consilience: A frequent driver toward complexity

A “jumping together” of knowledge by the linking of facts and fact-based theory across disciplines to create a common groundwork of explanation.

Wilson (1999) p. 8

Page 5

Compelling communication requires multiple interactions

• Educate the audience about a model’s usefulness as a function of its complexity

• Frame key performance indicators (KPIs)

• Prototype, socialize, productionize

• Avoid big-bang projects: include executives in the discovery/iteration process

Page 6

Overview

1. Technology and analytics trends

2. Data deception

3. Latent factor modeling

4. Using machine learning to improve analytics

5. Data visualization and compelling communication

Page 7

1. Technology and analytics trends

Page 8

New, more numerous and novel data

IoT

1 exabyte = 10^18 bytes

Source: IBM

Page 9

Cost of Computing Power Equal to an iPad 2

1950: $1 trillion
1970: $100 million
2010: $100

Source: Brookings Institution (http://www.brookings.edu/~/media/research/files/papers/2011/8/innovation-greenstone-looney/08_innovation_greenstone_looney.pdf)
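As a rough check on the scale of this trend, the quoted figures imply a compound annual rate of cost decline. The sketch below assumes a constant exponential decline between the quoted years, which is a simplification (the slide gives only three data points); the helper names are hypothetical.

```python
import math

# Cost (USD) of computing power equal to an iPad 2, as quoted on the slide.
COSTS = {1950: 1e12, 1970: 1e8, 2010: 1e2}

def annual_decline_rate(year0: int, year1: int) -> float:
    """Compound annual rate at which cost falls between two quoted years.

    Assumes a constant exponential decline between the two years.
    """
    ratio = COSTS[year1] / COSTS[year0]
    return 1.0 - ratio ** (1.0 / (year1 - year0))

def halving_time_years(rate: float) -> float:
    """Years for cost to halve at a constant annual decline rate."""
    return math.log(0.5) / math.log(1.0 - rate)

for start, end in [(1950, 1970), (1970, 2010), (1950, 2010)]:
    r = annual_decline_rate(start, end)
    print(f"{start}-{end}: cost falls ~{r:.1%} per year, "
          f"halving every ~{halving_time_years(r):.1f} years")
```

Over 1950 to 2010 the ratio is 10^-10 across 60 years, i.e., a decline of roughly 32% per year, or cost halving roughly every 1.8 years.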

Page 10

Catalysts for changing approaches

• Cloud-based computing infrastructure

• Use of GPUs and improvement of hardware

• Machine learning

• Data volume explosion
  – Images, sounds, streaming…
  – Mobile
  – IoT (Internet of Things)

• New types of databases

• Parallelization of computation

• Open-sourced, community-based development

(Slide groupings: Infrastructure and culture; Computation; Data)

Page 11

What makes decision-support analytics complex?

• Multivariate optimization problems, often with controls and results playing out over different time horizons

• Uncertainty: “risk” characterized by known distributions, and “Knightian uncertainty” arising from unknown models or data-generating processes

• Interconnectedness, both explicit and implicit

• Hierarchies of relationships and relevant assumptions

• Need to separate point predictions/estimates from distribution estimates

Page 12

Focus on criteria for accepting the validity of analytical output

1. Accuracy [out-of-sample confirmation of estimated probability distribution and contributions of underlying components to that distribution]

2. Consistency (both internal and external) [multi-asset-class, assets & liabilities]

3. Broadness in scope [granularity and comprehensiveness]

4. Simplicity [complex enough to capture dynamics, but simple enough to be diagnosed and communicated to a quantitatively-informed business head]

5. Fruitfulness [output substantively contributes to impactful decisions]

Depending on the theory under evaluation, criteria may contradict each other, so a relative weighting may be needed; i.e., in a particular circumstance, some criteria are more important than others (Kuhn, 1977).

In portfolio risk analysis, we typically add Timeliness to the evaluation process: an otherwise successful theory/model/system that cannot provide timely output is useless.
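The relative weighting of criteria, plus timeliness as a pass/fail condition, can be made concrete as a scorecard. This is purely illustrative: the candidate models, the 0 to 1 scores and the weights are all hypothetical, and treating timeliness as a hard gate (rather than as one more weighted criterion) is our reading of "useless" above, not a documented procedure.

```python
from dataclasses import dataclass

# Hypothetical weights for the five criteria; they sum to 1.
WEIGHTS = {
    "accuracy": 0.30,
    "consistency": 0.20,
    "scope": 0.15,
    "simplicity": 0.15,
    "fruitfulness": 0.20,
}

@dataclass
class Candidate:
    name: str
    scores: dict          # criterion -> hypothetical score in [0, 1]
    runtime_hours: float  # time needed to produce output

def weighted_score(c: Candidate, deadline_hours: float) -> float:
    """Weighted criterion score; 0 if the output cannot arrive before the deadline."""
    if c.runtime_hours > deadline_hours:
        return 0.0  # timeliness as a gate: late output is treated as useless
    return sum(w * c.scores[k] for k, w in WEIGHTS.items())

candidates = [
    Candidate("simple factor model",
              {"accuracy": 0.6, "consistency": 0.8, "scope": 0.5,
               "simplicity": 0.9, "fruitfulness": 0.6}, runtime_hours=0.5),
    Candidate("full simulation",
              {"accuracy": 0.9, "consistency": 0.9, "scope": 0.9,
               "simplicity": 0.4, "fruitfulness": 0.8}, runtime_hours=30.0),
]
for c in candidates:
    print(c.name, round(weighted_score(c, deadline_hours=24.0), 3))
```

With a 24-hour deadline the richer model scores zero; relax the deadline to 48 hours and it overtakes the simpler one, which is the kind of circumstance-dependent trade-off the Kuhn criteria describe.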

Page 13

How to persuade an individual to make a decision

                    Agree on values           Disagree on values
Agree on facts      Computational decision    Negotiate
Disagree on facts   Experiment                Paralysis or chaos

From Koomey (2001), Figure 19.1, p. 88.

Page 14

2. Data deception

Page 15

“Figures don’t lie, but liars figure”

• Time window of sample

• Training/development sample

• Hold-out sample

• “Knightian” uncertainty vs. parameter uncertainty

• Biases

• Robustness

• Reproducibility

• Error bars

• Black Swans

• Perfect Storms
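Several of these traps (the time window of the sample, the training/development sample and the hold-out sample) come down to how data are split. One common safeguard, sketched here under our own assumptions rather than as the author's procedure, is an out-of-time split: the hold-out sample lies strictly after the training window, so no future information leaks into estimation. The function name and the quarterly toy data are hypothetical.

```python
from datetime import date
from typing import List, Tuple

Observation = Tuple[date, float]

def out_of_time_split(obs: List[Observation], train_start: date, train_end: date,
                      holdout_end: date) -> Tuple[List[Observation], List[Observation]]:
    """Split observations into a training window [train_start, train_end]
    and a strictly later hold-out window (train_end, holdout_end].

    Observations outside both windows are dropped, which makes the chosen
    time window explicit rather than implicit.
    """
    if not (train_start <= train_end < holdout_end):
        raise ValueError("windows must satisfy train_start <= train_end < holdout_end")
    ordered = sorted(obs, key=lambda o: o[0])
    train = [o for o in ordered if train_start <= o[0] <= train_end]
    holdout = [o for o in ordered if train_end < o[0] <= holdout_end]
    return train, holdout

# Hypothetical quarterly observations, 2005 through 2014.
data = [(date(2005 + i // 4, 3 * (i % 4) + 1, 1), float(i)) for i in range(40)]
train, holdout = out_of_time_split(data, date(2006, 1, 1), date(2012, 12, 31),
                                   date(2014, 12, 31))
print(len(train), len(holdout))
```

A random split of the same data would mix 2013 and 2014 observations into training, letting the model "see" the period it is later judged on.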

Page 16

Dealing with priors: Confirmation bias

The trouble with most folks isn’t so much their ignorance, as knowing so many things that ain’t so.

Josh Billings related by Friedman (1965)

Page 17

Cognitive biases

Traps arising from logical fallacies and cognitive biases

• False dichotomy: presenting two choices as if they were the only possibilities
  – Simple vs. complex model
  – Use no models vs. use only one model

• Perfect as the enemy of the good (or good enough)

• Red herrings and missing the forest for the trees

• Biases
  – Affect heuristic: the analyst or executive has “fallen in love with” a particular output, and so minimizes model problems and exaggerates model strengths
  – Groupthink
  – Saliency bias: undue influence of analogous past successes
  – Confirmation bias
  – Availability bias
  – Anchoring bias
  – Halo effect: the impression made by the model author, the analyst or even the model itself influences interpretation
  – Sunk-cost fallacy: a particular model output has already driven strategy/investment
  – Overconfidence
  – Disaster neglect
  – Loss aversion

Page 18

Questioning the entire modeling enterprise

• Preproducibility is a prerequisite for attempting to reproduce a result: it involves providing an adequate description of an experiment or analysis for the work to be re-undertaken. It requires documentation, openness, and communication.

• Quantifauxcation is assigning a meaningless number, then pretending that because it is quantitative, it is meaningful. It usually involves some combination of data, pure invention, ad-hoc models, inappropriate statistics, and logical lacunae.

• The cost of most policy cost-benefit analyses is high: lost rationality

• Rates vs. probabilities
  – Randomness created by taking a random sample, assigning subjects at random, etc.
  – Probability model invented (assumed) for data that the world generates: inferences are only as good as the assumptions
  – Aleatory: coin toss, die roll; under some circumstances, things behave “as if” random
  – Epistemic: stuff we don’t know
  – Trials are random, have the same chance of success and have known dependence: can quantify estimate uncertainty
  – Ignorance does not equal randomness
  – Tendency to treat haphazard as random
  – Probability as metaphor

(From Stark, 2015)

Page 19

3. Latent factor modeling (with an example)

Page 20

Shift in financial modeling approaches

Current mainstream paradigm
• Normal distribution assumptions
• Linear estimation is good enough
• Examples: focus on the 2nd moment; linear regressions; bias toward tractable “closed form” solutions

New approaches
• Skewed, fat-tailed distributions
• Need to recognize non-linearities
• Examples: focus on skewness and tail, generated by simulations; deep learning; empirical approach with recognition of non-linearity

Enablers: FastData™, BigComputation™, machine learning, empirical orientation
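The contrast between a second-moment view and a tail-focused view can be seen with simulated data. A minimal sketch, assuming Student-t draws with 5 degrees of freedom as a stand-in for a fat-tailed return distribution (our choice; the slide names no specific distribution): after standardizing each sample by its own standard deviation, the fat-tailed sample produces moves beyond 3 standard deviations several times as often as a normal sample, even though volatility alone cannot tell the two apart.

```python
import math
import random

def student_t(rng: random.Random, df: float) -> float:
    """One Student-t draw: Z / sqrt(chi2_df / df), with chi2_df drawn as Gamma(df/2, scale 2)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)
    return z / math.sqrt(chi2 / df)

def tail_frequency(xs, k: float = 3.0) -> float:
    """Share of observations more than k sample standard deviations from the sample mean."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return sum(1 for x in xs if abs(x - mean) > k * sd) / n

def excess_kurtosis(xs) -> float:
    """Sample excess kurtosis (close to 0 for large normal samples)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3.0

rng = random.Random(2016)
n = 100_000
normal = [rng.gauss(0.0, 1.0) for _ in range(n)]
fat = [student_t(rng, df=5.0) for _ in range(n)]

for label, xs in [("normal", normal), ("student-t(5)", fat)]:
    print(f"{label:>13}: P(|x| > 3 sd) = {tail_frequency(xs):.4f}, "
          f"excess kurtosis = {excess_kurtosis(xs):.2f}")
```

For the normal sample the 3-sd frequency should sit near 0.27%; for the t(5) sample it is roughly four times higher.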

Page 21

Latent Factor Estimation (LFE) Process

Diagram for illustrative purposes; components as labeled:
• Observed security prices for K securities (equity, fixed income, … other), with a focus on liquid names
• Factor clustering (“PCA-like”) over a moving exponential time-decay window, producing latent factors 1, 2, 3, 4, … N; currently N = 30 factors: 20 equity, 10 fixed income
• Regress to obtain factor loadings (“betas”): the sensitivity of each security to the latent factors
• Regime conditioning
• Market-implied risk forecast adjustment, using a market-implied volatility measure (e.g., VIX)
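The slide does not spell out the estimation algorithm beyond "PCA-like" clustering over an exponentially time-decayed window followed by a regression for betas. The sketch below is one minimal reading of those two steps, built entirely on our own assumptions: a single latent factor, an exponentially weighted covariance with a hypothetical 60-period half-life, power iteration for the leading eigenvector in place of a full PCA, and weighted least squares for the betas. Regime conditioning and the VIX adjustment are not shown, and this is not State Street's LFE implementation.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]

def ew_weights(t: int, half_life: float) -> List[float]:
    """Exponential time-decay weights summing to 1; the latest date weighs most."""
    lam = 0.5 ** (1.0 / half_life)
    raw = [lam ** (t - 1 - i) for i in range(t)]
    total = sum(raw)
    return [x / total for x in raw]

def ew_covariance(returns: Matrix, w: List[float]) -> Matrix:
    """Weighted covariance matrix of returns (rows = dates, columns = securities)."""
    k = len(returns[0])
    means = [sum(wt * row[j] for wt, row in zip(w, returns)) for j in range(k)]
    return [[sum(wt * (row[a] - means[a]) * (row[b] - means[b]) for wt, row in zip(w, returns))
             for b in range(k)] for a in range(k)]

def leading_eigenvector(cov: Matrix, iters: int = 200) -> List[float]:
    """Power iteration for the dominant eigenvector (one 'PCA-like' factor direction)."""
    v = [1.0] * len(cov)
    for _ in range(iters):
        v = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v

def latent_factor_and_betas(returns: Matrix, half_life: float = 60.0) -> Tuple[List[float], List[float]]:
    """Extract one latent factor series, then regress each security on it (weighted LS slope)."""
    w = ew_weights(len(returns), half_life)
    direction = leading_eigenvector(ew_covariance(returns, w))
    factor = [sum(r * d for r, d in zip(row, direction)) for row in returns]
    f_mean = sum(wt * f for wt, f in zip(w, factor))
    f_var = sum(wt * (f - f_mean) ** 2 for wt, f in zip(w, factor))
    betas = []
    for j in range(len(direction)):
        r_mean = sum(wt * row[j] for wt, row in zip(w, returns))
        cov_jf = sum(wt * (row[j] - r_mean) * (f - f_mean) for wt, row, f in zip(w, returns, factor))
        betas.append(cov_jf / f_var)
    return factor, betas

# Usage on synthetic data: one true factor with hypothetical loadings plus small noise.
rng = random.Random(7)
true_loadings = [1.0, 0.8, 0.5, 1.2, 0.3]
returns = []
for _ in range(500):
    f = rng.gauss(0.0, 1.0)
    returns.append([b * f + rng.gauss(0.0, 0.1) for b in true_loadings])
factor, betas = latent_factor_and_betas(returns)
print([round(b, 3) for b in betas])
```

Because a latent factor's scale is arbitrary, the recovered betas match the true loadings only up to a common multiple; their relative sizes are what carry over.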

Page 22

Portfolio Simulation

Diagram for illustrative purposes; process as labeled:
• Repeat J times:
  – Draw factor realizations for the K latent factors (random draws)
  – For each of the N securities (equity, fixed income, other), repeated N times: combine the factor realizations with the factor betas and apply the valuation model to obtain the value of the i-th security
  – Aggregate the single positions into the value of the portfolio for one iteration
• The J iterations produce the portfolio value distribution

Scale: K = 30 factors + K idiosyncratic draws; N ≈ 10k to 500k positions; J ≈ 1m to 10m iterations

How much computing power is needed?
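The simulation loop described on this slide can be sketched in a few lines. This is a deliberately small, hypothetical version: position returns are linear in independent standard-normal factor draws (a stand-in for the per-asset valuation models), each position gets its own idiosyncratic draw, and the portfolio has three positions rather than 10k to 500k. Production runs at the quoted scale are what drive the need for HPC.

```python
import random
from typing import List, Sequence

def simulate_portfolio_values(
    holdings: Sequence[float],          # current market value of each position
    betas: Sequence[Sequence[float]],   # betas[i][k]: loading of position i on factor k
    idio_vol: Sequence[float],          # idiosyncratic return volatility of each position
    iterations: int,
    seed: int = 0,
) -> List[float]:
    """Return `iterations` simulated horizon values of the portfolio."""
    rng = random.Random(seed)
    n_factors = len(betas[0])
    values = []
    for _ in range(iterations):                                    # repeat J times
        factors = [rng.gauss(0.0, 1.0) for _ in range(n_factors)]  # factor draws
        total = 0.0
        for i, h in enumerate(holdings):                           # repeat N times
            systematic = sum(b * f for b, f in zip(betas[i], factors))
            ret = systematic + idio_vol[i] * rng.gauss(0.0, 1.0)   # plus idiosyncratic draw
            total += h * (1.0 + ret)  # linear "valuation model" (simplifying assumption)
        values.append(total)
    return values

# Hypothetical three-position, two-factor portfolio.
holdings = [100.0, 50.0, 75.0]
betas = [[0.10, 0.02], [0.05, 0.08], [0.00, 0.04]]
idio = [0.05, 0.04, 0.02]
sims = simulate_portfolio_values(holdings, betas, idio, iterations=20_000, seed=42)
print(f"mean simulated value: {sum(sims) / len(sims):.2f}")
```

The cost is roughly J × N × K multiply-adds plus J × N valuation calls, so at J ≈ 10m and N ≈ 500k the inner loop runs on the order of 5 × 10^12 times; with non-linear valuation models each of those calls is far more expensive than the line shown here.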

Page 23

Simulating multi-asset-class portfolios requires HPC

Figure (image for illustrative purposes only; no live data being used): probability vs. portfolio value, marking the expected value, volatility (1 standard deviation), the start of the tail, tail loss, and expected tail loss (conditional VaR).

Losses due to portfolio values falling “in the tail” are rare events. It is necessary to simulate many times to generate those states of the world resulting in tail losses.

• For example, to observe 10,000 instances of a loss event that occurs once every 33 years (3% probability), it is necessary to run about 10,000 / 0.03 ≈ 3.3 x 10^5 iterations.

• To perform so many simulations, computing speed is key.

• Monte Carlo simulations in traditional portfolio analyses were typically iterated on the order of 10^3 to 10^4 times.
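The required iteration count follows from the expected number of tail hits, J × p, where J is the number of iterations and p the event probability. A minimal sketch of that arithmetic (the helper name is ours):

```python
import math

def iterations_needed(target_hits: int, event_probability: float) -> int:
    """Smallest J whose expected number of tail events, J * p, reaches target_hits."""
    if not 0.0 < event_probability <= 1.0:
        raise ValueError("event_probability must be in (0, 1]")
    return math.ceil(target_hits / event_probability)

print(iterations_needed(10_000, 0.03))   # once-in-33-years event: 333334
print(iterations_needed(10_000, 0.003))  # ten times rarer: 3333334
```

At p = 3% the expected count reaches 10,000 after about 3.3 × 10^5 iterations; an event ten times rarer (0.3%, roughly once in 333 years) needs about 3.3 × 10^6. Because the count scales as 1/p, each step further into the tail multiplies the computing budget.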

Page 24

Risk Metrics

Figure (images for illustrative purposes only; no live data is used): return distribution from today to the horizon, marking expected return, volatility, tail loss and expected tail loss.

Volatility
• 1 standard deviation of the return distribution
• Focus on annual performance
• Easy to backtest

Value at Risk (VaR)
• Loss threshold that is exceeded only with a defined (small) probability
• Focus on rare, high-impact losses
• Backtesting is difficult
• Not a coherent risk measure (it is not subadditive in general)

Expected Tail Loss
• Expected value of the loss, given that the loss falls in the tail
• Backtesting is extremely difficult
• Coherent risk measure
• Useful for stress testing
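Given a set of simulated horizon returns (for example, from a portfolio simulation like the one described earlier), the three metrics can be estimated directly. A minimal sketch, assuming equally weighted scenarios, losses reported as positive numbers, and a simple empirical-quantile definition of VaR (other quantile conventions exist); the normal scenario returns in the usage lines are hypothetical.

```python
import math
import random
from typing import Sequence, Tuple

def risk_metrics(returns: Sequence[float], alpha: float = 0.99) -> Tuple[float, float, float]:
    """Return (volatility, VaR, expected tail loss) from equally weighted scenario returns.

    Losses are positive numbers. VaR is the smallest of the ceil((1 - alpha) * n)
    largest losses; expected tail loss (ETL, also called conditional VaR or
    expected shortfall) is the average of those tail losses, so ETL >= VaR.
    """
    n = len(returns)
    mean = sum(returns) / n
    vol = math.sqrt(sum((r - mean) ** 2 for r in returns) / (n - 1))
    losses = sorted((-r for r in returns), reverse=True)     # largest loss first
    n_tail = max(1, math.ceil(round((1.0 - alpha) * n, 9)))  # round() absorbs float noise
    var = losses[n_tail - 1]
    etl = sum(losses[:n_tail]) / n_tail
    return vol, var, etl

rng = random.Random(1)
scenario_returns = [rng.gauss(0.05, 0.10) for _ in range(100_000)]  # hypothetical scenarios
vol, var, etl = risk_metrics(scenario_returns, alpha=0.99)
print(f"volatility={vol:.4f}  VaR(99%)={var:.4f}  ETL(99%)={etl:.4f}")
```

For these normal scenarios (mean 5%, volatility 10%), the estimates should land near the closed-form values of about 18.3% for 99% VaR and 21.7% for 99% ETL. ETL uses only the 1% tail, so its estimate rests on about a thousand scenarios here, which is one reason it is hard to backtest.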

Page 25

MAPPINGS

Separate risk interpretation from simulation

Diagram (image for illustrative purposes; no live data is being used): latent factors drive the simulated portfolio distributions; risk interpretation comes from mapping the latent factors to cross-sectional/macro “factors” such as GDP, inflation, wage rise, oil price, household leverage, etc.

Page 26

4. Using machine learning to improve analytics

Page 27

Machine learning, artificial intelligence and new analytics

• Dealing with differences

1. Heterogeneous loan documents can be analyzed with natural language processing (NLP)

2. Information from multiple data sources can be distilled and classified into environmental, social and governance (ESG) categories (e.g., aggregate local news)

• Disentangling data

1. Multiple data sources can be merged in a more useful way (e.g., managing missing data and overlapping data)

2. Data cleaning can be improved to consider relationships and plausibility, not just data impossibilities (e.g., the plausibility of a spread given the probability of default (PD))

• Adding to analytics

1. Artificial intelligence can produce better processes (e.g., monitoring anomalies)

2. Unstructured data can be combined with structured data to improve credit models

3. Novel data can be included to improve processes and models (e.g., sensor data at manufacturing and transportation companies)
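The distinction between impossibility checks and plausibility checks in data cleaning can be illustrated with a spread/PD pair. This is a hypothetical rule of our own, not the author's method: impossibilities are a PD outside [0, 1] or a negative spread; a plausibility flag is raised when the spread is below the expected loss implied by the PD under an assumed loss given default (LGD) of 60%, on the reasoning that investors normally demand at least compensation for expected loss. A fixed rule is shown only for clarity; the point above is that learned models can capture richer plausibility relationships than such rules.

```python
from dataclasses import dataclass
from typing import List

ASSUMED_LGD = 0.60  # hypothetical loss given default used by the plausibility rule

@dataclass
class CreditRecord:
    issuer: str
    pd_annual: float   # one-year probability of default, as a fraction
    spread_bps: float  # credit spread in basis points

def check_record(rec: CreditRecord, lgd: float = ASSUMED_LGD) -> List[str]:
    """Return issues found: hard impossibilities first, then plausibility flags."""
    issues = []
    if not 0.0 <= rec.pd_annual <= 1.0:
        issues.append("impossible: PD outside [0, 1]")
    if rec.spread_bps < 0:
        issues.append("impossible: negative spread")
    if not issues:  # only test plausibility on values that are possible at all
        expected_loss_bps = rec.pd_annual * lgd * 10_000
        if rec.spread_bps < expected_loss_bps:
            issues.append(f"implausible: spread {rec.spread_bps:.0f} bps "
                          f"below expected loss {expected_loss_bps:.0f} bps")
    return issues

records = [
    CreditRecord("A", 0.002, 90.0),   # 12 bps expected loss, 90 bps spread: fine
    CreditRecord("B", 0.050, 120.0),  # 300 bps expected loss, 120 bps spread: suspicious
    CreditRecord("C", 1.300, 400.0),  # PD above 1: impossible
]
for r in records:
    print(r.issuer, check_record(r) or "ok")
```

Record B passes every impossibility check yet is almost certainly a data error, which is exactly the case a traditional range-based cleaning step would miss.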

Page 28

The Promise of Machine Learning

Intelligent Automation is the First Step in Defeating Information Overload

Machine learning leverages adaptive algorithms to approximate the nuanced human approach to complex problems such as classification, optimization and interpretation.

• Breadth: Algorithms can handle vast arrays of information, potentially benefiting from the volume and diversity of information to identify nuances too sparse to find manually.

• Synthesis: Cognitive feedback loops adapt to user interaction, enabling a personalized experience and customized content.

• Speed: Plentiful processing power allows immediate and flexible access to insights, helping drive timely, actionable decisions.

• Efficiency: Low-cost scalability of machines allows valuable human resources to focus attention on more pressing needs.

However, machine learning is only as good as its training…

Page 29

Training the Tag Cloud

Carefully Parsing Context to Improve Algorithmic Accuracy

• Some topics are well defined, e.g., Turbulence

• Others require the creation of an “association cloud”: words that define a topic by association

• Machines can suggest associations, but some are spurious

• Determining whether a particular word is related works well as an iterative loop of human review and machine analysis

Example tag cloud (figure): Price Momentum, Relative Vigor Index, Price Persistence, Relative Strength, RSI, Momo Play, Price Trend, Flow Momentum, Technical Analysis, Opening Price, CPI, Relative Value

Page 30

Marrying structured and unstructured data

• Interesting data lies in heterogeneous formats

• Aggregating local information is often better than using global (national) information

• Non-financial information may have a material impact over longer time horizons

• May identify issues with reported financial information

Can improve model development by adding obscure but important information.

Page 31

Deriving ESG Fundamentals from Big Data

• Monitor 8,500+ companies

• Draw from 75,000+ sources

• Aggregate 1,000,000+ data points monthly

• Extract and analyze 300,000+ signals monthly

Data as of 10/2016. Screen and data provided by TruValue Labs, partnered with State Street Global Exchange℠.

Page 32

Example Process: TruValue Labs*

1. Data intake: dynamic web feeds; company data; external structured datasets; NGO sources

2. Extract signals: content parsing; entity detection; signal categorization

3. Calculate significance: category scoring; significance clustering; density analysis; freshness; volatility / confidence

4. Generate insights: company scorecards & trends; alerts & reports; client reporting; peer, sector, benchmark comparison

Attributes: Standardized, On-demand, Configurable, Scalable, Current, Objective, Accessible

* State Street Global Exchange℠ Partner

Page 33

5. Data visualization and compelling communication

Page 34

Communicating Uncertainty

• Explain signal and noise in specific terms

• Communicate how model disentangles signal and noise

• Identify and root out data biases

• Educate on error bars and confidence intervals

Sampling error does not necessarily equal “uncertainty” in terms of implications of model output
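Error bars for a statistic without a convenient closed form can be produced by resampling. A minimal percentile-bootstrap sketch (our choice of method; the slide does not prescribe one), applied to hypothetical monthly returns. Note that, in line with the caveat above, a bootstrap interval reflects sampling error only; it says nothing about whether the model or data-generating process is right.

```python
import random
from statistics import fmean
from typing import Callable, Sequence, Tuple

def bootstrap_ci(data: Sequence[float], stat: Callable[[Sequence[float]], float],
                 level: float = 0.95, n_boot: int = 5000, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for `stat` at the given level."""
    rng = random.Random(seed)
    n = len(data)
    # Recompute the statistic on n_boot resamples drawn with replacement.
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    lo_idx = int((1.0 - level) / 2.0 * n_boot)
    hi_idx = min(n_boot - 1, int((1.0 + level) / 2.0 * n_boot))
    return estimates[lo_idx], estimates[hi_idx]

rng = random.Random(3)
monthly_returns = [rng.gauss(0.01, 0.04) for _ in range(60)]  # hypothetical 5 years of data
lo, hi = bootstrap_ci(monthly_returns, fmean)
print(f"mean = {fmean(monthly_returns):.4f}, 95% CI = [{lo:.4f}, {hi:.4f}]")
```

With 60 observations and 4% monthly volatility the interval is about 2 percentage points wide, i.e., wider than the 1% mean it brackets: a concrete way to show an audience how much of an apparent signal could be noise.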

Page 35

Data Visualization Tools (example 1)

Data are all figurative for illustrative purposes

Page 36

Data are all figurative for illustrative purposes

So which is richer, from a data insight perspective?

Page 37

Data are all figurative for illustrative purposes

Multiple attributes represented in two dimensions

Expected returns vs. volatility by exposure size, with Sharpe ratio as color

High Sharpe ratios, but small positions

OK Sharpe ratios, and larger positions

Page 38

Data Visualization Tools (example 2)

Data are all figurative for illustrative purposes

Page 39

Implement based on principles for risk-data visualization

• Match output to use cases

– Concentration risk assessment

– Risk appetite assessment (stress testing)

– Position-level limits/allocation

• Prepare for multiple dimensions (e.g., region, sector, asset class, customer type, size)

• Incorporate drill-down capability

• Contextualize output (e.g., benchmarks, time series, scenario-based)

• Use robust statistics (e.g., median, inter-quartile range, mean absolute deviation)

• Use techniques to address data difficulties (e.g., Winsorization, shrinkage)

• Target near-instantaneous rendering of decision-support output

Risk data tend to be defined by outliers
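Several of the robust statistics and data-difficulty techniques listed above can be sketched with the standard library. The exposures and the 10th/90th percentile winsorization cut-offs in the usage lines are illustrative assumptions; shrinkage is not shown. The median absolute deviation is included alongside the mean absolute deviation for comparison.

```python
import statistics
from typing import Dict, List, Sequence

def winsorize(xs: Sequence[float], lower_pct: float = 0.05, upper_pct: float = 0.95) -> List[float]:
    """Clip values to empirical lower/upper percentiles (nearest rank on the sorted data)."""
    if not 0.0 <= lower_pct < upper_pct <= 1.0:
        raise ValueError("need 0 <= lower_pct < upper_pct <= 1")
    s = sorted(xs)
    lo = s[round(lower_pct * (len(s) - 1))]
    hi = s[round(upper_pct * (len(s) - 1))]
    return [min(max(x, lo), hi) for x in xs]

def robust_summary(xs: Sequence[float]) -> Dict[str, float]:
    """Classical and robust location/spread measures side by side."""
    mean = statistics.mean(xs)
    med = statistics.median(xs)
    q1, _, q3 = statistics.quantiles(xs, n=4)
    return {
        "mean": mean,
        "stdev": statistics.stdev(xs),
        "median": med,
        "iqr": q3 - q1,
        "mean_abs_dev": statistics.mean(abs(x - mean) for x in xs),
        "median_abs_dev": statistics.median(abs(x - med) for x in xs),
    }

# Hypothetical position exposures with one outlier (or data error).
exposures = [1.0, 1.2, 0.9, 1.1, 1.0, 0.95, 1.05, 25.0]
print(robust_summary(exposures))
print(winsorize(exposures, lower_pct=0.10, upper_pct=0.90))
```

The single outlier drags the mean above 4 while the median stays near 1.03, and winsorizing clips it to 1.2. Whether the 25.0 is a data error or the most important position in the book is exactly the judgment a risk display should surface rather than hide.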

Page 40

Contextualize: Describe regimes

• Business as usual (BAU), i.e., sustainable growth

• Cyclical (typical up-and-down growth, but the same process and a similar trend)

• Structural (a move to a different growth path driven by a different process)

• Providing context is critical:

– Benchmark to competitors (cohorts)

– Benchmark to optimal, feasible outcome

– Show time series

– Drill into components on a consistent basis

Page 41

Improve visualization independent of model complexity

• Google study (Tuch, et al., 2012) found for websites:

– Visually complex websites are less appealing

– Prototypical websites (for a given category) are more appealing

– Simpler design is rated higher

• What makes analytical output compelling and credible?

– Prototypicality: the basic mental image the brain creates to categorize everything one interacts with.

– Cognitive fluency: One’s brain prefers output that is easier to process.

– Mere exposure effect: Familiarity arising from repeated exposure.

– Metric balancing: Too many metrics equals no understanding.

Page 42

Be transparent as to epistemic nature of analytical output

• What the executive doesn’t know, but is knowable: model output is available and useful, e.g., credible metrics identify risk (in the technical sense).

• What the executive or the analyst doesn’t know yet, but is knowable: a proof-of-concept model is available; however, more investment (e.g., data, analysts, systems, tools) is needed.

• What is knowable with uncertainty: model output is available and potentially useful; however, questions remain as to whether the model itself is specified correctly, e.g., metrics reflect Knightian uncertainty. (Bayesian methods may be helpful.)

• What is unknowable: Model output is not available.

• What one chooses not to know: Incentives overpower model output.

Page 43

Move from discovery to action

Persuade decision makers regarding…

• Credibility

• Likelihood

• Materiality

• Addressability

Identify, mitigate and explain chronic model weaknesses

• No feedback loops

• No thresholds

• Inadequate spill-over effects

Page 44

Build a narrative

1. Identify the [set of] focal issue(s)

2. Determine the key micro or local forces impacting the focal issue(s)

3. Identify the key macro or global forces impacting the focal issue(s)

4. Rank by importance and uncertainty: distinguish parameterization of probability distributions (known unknowns) from model uncertainty (unknown unknowns)

5. Select scenario logic in terms of the parameters (and possibly models) to adjust to show the range of possible outputs

6. Flesh out the scenarios of most importance and highest likelihood: drill into details of the micro/macro forces and the nature of the parameterized probabilities and uncertainty

7. Determine implications

8. Select leading indicators and signposts to monitor evolution of scenario in light of decisions

From Koomey (2001)


Page 46

Key points to remember for compelling communication

• Frame within a narrative

• Avoid “quantifauxcation”

• Contextualize (across time and across cohorts)

• Address biases: highlight data selection concerns and explain assumptions & process

• Use transparency in the model estimation process to spark questions and debate

• Compare output from multiple models (when possible)

• Visualize data: encourage interactive diagnostics and drill-down

• Emphasize actionable insight

• Educate
  – Explain key components of the analytical process
  – Teach how to understand confidence intervals (noise vs. signal)

Build on understanding: descriptive, prescriptive and cognitive.

Move from analysis (breaking into components) to synthesis (re-assembling with insight).

Page 47

References 1

• Aikman, David, Piergiorgio Alessandri, Bruno Eklund, Prasanna Gai, Sujit Kapadia, Elizabeth Martin, Nada Mora, Gabriel Sterne and Matthew Willison, 2009, “Funding liquidity risk in a quantitative model of systemic stability,” Working Paper 372, Bank of England.

• Arrow, Kenneth J. and Gerard Debreu, 1954, “Existence of an equilibrium for a competitive economy,” Econometrica 22 (3), pp. 265–290.

• Bohn, Jeffrey and Roger Stein, 2009, Active Credit Portfolio Management in Practice, Wiley.

• Diaconis, Persi, 2003, “The problem of thinking too much,” Bulletin of the American Academy of Arts and Sciences, Spring, pp. 26–38.

• Fender, Ingo and John Kiff, 2004, “CDO rating methodology: Some thoughts on model risk and its implications,” Monetary and Economic Department, BIS.

• Feynman, Richard P., 1974, “Cargo cult science,” Engineering and Science, June, pp. 10–13.

• Folinsbee, Kaila E., et al., 2007, “Quantitative approaches to phylogenetics; §5.2: Fount of stability and confusion: A synopsis of parsimony in systematics,” in Winfried Henke, ed., Handbook of Paleoanthropology: Primate evolution and human origins: Volume 2, Springer, p. 168.

• Gordy, Michael B., 2003, “A risk-factor model foundation for ratings-based bank capital rules,” Journal of Financial Intermediation, 12, pp. 199–232.

• Gray, Dale and Samuel Malone, 2008, Macrofinancial Risk Analysis, Wiley.

• Haidt, Jonathan, 2012, The righteous mind: Why good people are divided by politics and religion, Pantheon Books, New York, NY.

• Haldane, Andrew G., 2012, “The dog and the frisbee,” Bank of England speech.

• Hamilton, James, 1994, Time Series Analysis, Princeton University Press.

• Hansen, Lars P. and Thomas J. Sargent, 2015, “Four types of ignorance,” Journal of Monetary Economics, 69, pp. 97–113.

Page 48

References 2

• Hansen, Lars P. and Thomas J. Sargent, 2007, “Recursive robust estimation and control without commitment,” Journal of Economic Theory, 136, pp. 1–27.

• Jablonka, Eva and Marion J. Lamb, 2014, Evolution in four dimensions: genetic, epigenetic, behavioral, and symbolic variation in the history of life, MIT Press, Cambridge, MA.

• Kalirai, Harvir and Martin Scheicher, 2002, “Macroeconomic stress testing: Preliminary evidence for Austria,” Financial Stability Report 3, Oesterreichische Nationalbank, pp. 58–74.

• Knight, Frank H., 1921, Risk, Uncertainty, and Profit, Hart, Schaffner & Marx; Houghton Mifflin Company: Boston, MA.

• Koomey, Jonathan, 2001, Turning numbers into knowledge: Mastering the art of problem solving, Analytics Press, Oakland, CA.

• Kuhn, Thomas S., 1977, The essential tension: Selected studies in scientific tradition and change, University of Chicago Press, Chicago.

• Lewis, Michael, 2010, The big short: Inside the doomsday machine, W.W. Norton & Company: New York, NY.

• Lucas, Robert, 1977, “Understanding business cycles,” in K. Brunner and A.H. Meltzer, eds., Stabilization of the domestic and international economy, Carnegie-Rochester Conference Series on Public Policy, pp. 7–29.

• Popper, Karl, 1972, Objective knowledge: An evolutionary approach, Clarendon Press, Oxford.

• Rescher, Nicholas, 2004, “Leibniz quantitative epistemology,” Studia Leibnitiana, 36(2), pp. 210–231.

• Simon, Herbert A., 1955, “A behavioral model of rational choice,” Quarterly Journal of Economics, 69(1), pp. 99–118.

• Stark, Philip B., 2015, “Pay no attention to the model behind the curtain,” Working paper, UC Berkeley.

• Vasicek, Oldrich A., 1997, “The distribution of loan portfolio value,” Working paper, KMV.

Page 49

Disclaimers and Important Risk Information

State Street Global Exchange℠ is a trademark of State Street Corporation (incorporated in Massachusetts) and is registered or has registrations pending in multiple jurisdictions. State Street Associates® is a research partnership between State Street Global Exchange and academia under which this document is produced. State Street Associates is a registered trademark of State Street Corporation.

Use of report

This document and information herein (together, the “Content”) is for informational, illustrative and/or marketing purposes only and it does not constitute investment research or investment, legal, or tax advice. The Content provided is not, nor should be construed as, any offer or solicitation to buy or sell any product, service, or securities or any financial instrument, and it does not constitute any binding contractual arrangement or commitment for State Street Corporation and its subsidiaries and affiliates (“State Street”) of any kind. The Content provided does not purport to be comprehensive nor intended to replace the exercise of a client’s own careful independent review regarding any corresponding investment or other financial decision.

Distribution

The Content provided is not intended for retail clients, nor is intended to be relied upon by any person or entity, and is not intended for distribution to or use by any person or entity in any jurisdiction where such distribution or use would be contrary to applicable law or regulation. No permission is granted to reprint, sell, copy, distribute, or modify the Content in any form or by any means without the prior written consent of State Street.

Other Important Disclosures

The Content provided has been prepared and obtained from sources believed to be reliable at the time of preparation, however it is provided “as-is” and State Street makes no guarantee, representation, or warranty of any kind including, without limitation, as to its accuracy, suitability, timeliness, merchantability, fitness for a particular purpose, non-infringement of third-party rights, or otherwise. Views and opinions expressed herein are those of the author(s) and are subject to change without notice based on market and other conditions and in any event may not reflect the views of State Street. State Street disclaims all liability, whether arising in contract, tort or otherwise, for any claims, losses, liabilities, damages (including direct, indirect, special or consequential), expenses or costs arising from or connected with the Content. The Content provided may contain certain statements that could be deemed forward-looking statements; any such statements or forecasted information are not guarantees or reliable indicators for future performance and actual results or developments may differ materially from those depicted or projected. Past performance is no guarantee of future results.

Prospects, clients or counterparties should be aware of the risks of participating in trading foreign exchange, equities, fixed income or derivative instruments or in investments in non-liquid or emerging markets. Prospects, clients or counterparties should be aware that products and services outlined may put their principal and capital at risk and that diversification does not ensure a profit or guarantee against loss.

Australia: This communication is made available in Australia by State Street Bank and Trust Company ABN 70 062 819 630, AFSL 239679 and is intended only for wholesale clients, as defined in the Corporations Act 2001.

Brazil: The products in this marketing material have not been and will not be registered with the Comissão de Valores Mobiliários, the Brazilian Securities and Exchange Commission ("CVM"), and any offer of such products is not directed to the general public within the Federative Republic of Brazil ("Brazil"). The information contained in this marketing material is not provided for the purpose of soliciting investments from investors residing in Brazil and no information in this marketing material should be construed as a public offering or unauthorised distribution of the products within Brazil, pursuant to applicable Brazilian law and regulations. To our knowledge, no analyst is related to an individual who works at the issuer of the securities subject to the content of this website; no analyst or their spouse holds, directly or indirectly, securities subject to the reports on this website; no analyst or their spouse is, directly or indirectly, involved in the acquisition, sale or intermediation of securities subject to the reports on this website; no analyst or their spouse has, directly or indirectly, a financial interest in relation to the subject matter of the reports on this website; and the relevant analyst's compensation is not, directly or indirectly, influenced by the revenues arising from the business and financial transactions carried out by the entity to which the analyst is associated or otherwise related.

Canada: The products and services outlined in this document are generally offered in Canada by State Street Bank and Trust Company and/or by State Street Global Markets Canada Inc.

Hong Kong: This communication is made available in Hong Kong by State Street Bank and Trust Company, which accepts responsibility for its contents, and is intended for distribution to professional investors only (as defined in the Securities and Futures Ordinance).

Indonesia: This communication is made available in Indonesia by State Street Bank and Trust Company and its affiliates. Neither this communication nor any copy hereof may be distributed in Indonesia or to any Indonesian citizens wherever they are domiciled or to Indonesian residents except in compliance with applicable Indonesian capital market laws and regulations. This communication is not an offer of securities in Indonesia. Any securities referred to in this communication have not been registered with the Capital Market and Financial Institutions Supervisory Agency (BAPEPAM-LK) pursuant to relevant capital market laws and regulations, and may not be offered or sold within the territory of the Republic of Indonesia or to Indonesian citizens through a public offering or in circumstances which constitute an offer within the meaning of the Indonesian capital market law and regulations.

Israel: This communication is made available in Israel by State Street Global Markets International Limited, which is not licensed under Israel’s Regulation of Investment Advice, Investment Marketing and Portfolio Management Law, 1995. This communication may only be distributed to or used by investors in Israel which are “eligible clients” as listed in the First Schedule to Israel’s Regulation of Investment Advice, Investment Marketing and Portfolio Management Law, 1995.

Japan: This communication is made available in Japan by State Street Global Markets (Japan) which is regulated by the Financial Services Agency of Japan as a financial instruments firm.

Malaysia: This communication is made available in Malaysia by State Street Global Markets International Limited (“SSGMIL”) which is authorised and regulated by the United Kingdom’s Financial Conduct Authority. SSGMIL is not licensed within or doing business within Malaysia and the activities that are being discussed are carried out off-shore. The written materials do not constitute, and should not be construed as constituting: 1) an offer or invitation to subscribe for or purchase securities or futures in Malaysia or the making available of securities or futures for purchase or subscription in Malaysia; 2) the provision of investment advice concerning securities or futures; or 3) an undertaking by SSGMIL to manage the portfolio of securities or futures contracts on behalf of other persons.

Mexico: This communication is distributed by State Street Bank and Trust Company and its affiliates from outside Mexico. State Street is not authorized to act as an investment advisor in Mexico or as a regulated entity under Mexican law. Any products and services outlined herein will be provided from outside Mexico exclusively.

New Zealand: This communication is made available in New Zealand by State Street Bank and Trust Company, which accepts responsibility for its contents, and is intended for distribution only to “wholesale clients” as defined in the Financial Advisers Act 2008 (New Zealand). This communication is not intended to constitute “financial advice” (as that term is defined in the Financial Advisers Act), or “advice or assistance” (in terms of section 37(5) of the Securities Markets Act 1988 (New Zealand)), to any person.

Oman: This communication is made available in Oman by State Street Bank and Trust Company and its affiliates. The information contained in this communication is for information purposes and does not constitute an offer for the sale of foreign securities in Oman or an invitation to an offer for the sale of foreign securities. State Street is neither a bank nor a financial services provider registered to undertake business in Oman and is regulated by neither the Central Bank of Oman nor the Capital Market Authority. This document is confidential and is intended solely for the information of the person to whom it has been delivered. No representation or warranty is given as to the achievement or reasonableness of any research material contained in this communication. Nothing contained in this communication is intended to constitute Omani investment, legal, tax, accounting or other professional advice.

Qatar: This communication is made available in Qatar by State Street Bank and Trust Company and its affiliates. The information in this communication has not been reviewed or approved by the Qatar Central Bank, the Qatar Financial Markets Authority or the Qatar Financial Centre Regulatory Authority, or any other relevant Qatari regulatory body.

Singapore: This communication is made available in Singapore by State Street Bank and Trust Company, Singapore Branch (“SSBTS”), which holds a wholesale bank license issued by the Monetary Authority of Singapore. In Singapore, this communication is only distributed to accredited, institutional investors as defined in the Singapore Financial Advisers Act (“FAA”). Note that SSBTS is exempt from Sections 27 and 36 of the FAA. When this communication is distributed to overseas investors as defined in the FAA, note that SSBTS is exempt from Sections 26, 27, 29 and 36 of the FAA. State Street Bank and Trust Company Limited, Singapore Branch’s Unique Entity Number is T01FC6134G.

South Africa: The products and services outlined in this document are made available in South Africa through either State Street Global Markets International Limited or State Street Bank and Trust Company, both of which are authorized in South Africa under the Financial Advisory and Intermediary Services Act, 2002 as a Category I Financial Services Provider; FSP No. 42823 and 42671 respectively.

South Korea: This communication is made available in South Korea by State Street Bank and Trust Company and its affiliates, which accept responsibility for its contents, and is intended for distribution to professional investors only. State Street Bank and Trust Company is not licensed to undertake securities business within South Korea, and any activities related to the content hereof will be carried out off-shore and only in relation to off-shore non-South Korea securities.

Taiwan: This communication is made available in Taiwan by State Street Bank and Trust Company and its affiliates, which accept responsibility for its contents, and is intended for distribution to professional investors only. State Street Bank and Trust Company is not licensed to undertake securities business within Taiwan, and any activities related to the content hereof will be carried out off-shore and only in relation to off-shore non-Taiwan securities.

Turkey: This communication is made available in Turkey by State Street Bank and Trust Company and its affiliates. The information included herein is not investment advice. Investment advisory services are provided by portfolio management companies, brokers and banks without deposit collection licenses within the scope of the investment advisory agreements to be executed with clients. Any opinions and statements included herein are based on the personal opinions of the commentators and authors. These opinions may not be suitable to your financial status and your risk and return preferences. Therefore, an investment decision based solely on the information herein may not be appropriate to your expectations.

United Arab Emirates: This communication is made available in United Arab Emirates by State Street Bank and Trust Company and its affiliates. This communication does not, and is not intended to, constitute an offer of securities anywhere in the United Arab Emirates and accordingly should not be construed as such. Nor does the addressing of this research publication to you constitute, or is it intended to constitute, the carrying on or engagement in banking, financial and/or investment consultation business in the United Arab Emirates under the rules and regulations made by the Central Bank of the United Arab Emirates, the Emirates Securities and Commodities Authority or the United Arab Emirates Ministry of Economy. Any public offer of securities in the United Arab Emirates, if made, will be made pursuant to one or more separate documents and only in accordance with the applicable laws and regulations. Nothing contained in this communication is intended to endorse or recommend a particular course of action or to constitute investment, legal, tax, accounting or other professional advice. Prospective investors should consult with an appropriate professional for specific advice rendered on the basis of their situation. Further, the information contained within this communication is not intended to lead to the conclusion of any contract of whatsoever nature within the territory of the United Arab Emirates. This communication has been forwarded to you solely for your information, and may not be reproduced or passed on, directly or indirectly, to any other person or published, in whole or in part, for any purpose. This communication is addressed only to persons who are professional, institutional or otherwise sophisticated investors.

United Kingdom and European Union: The products and services outlined herein are only offered to professional clients or eligible counterparties through State Street Bank and Trust Company, London Branch, authorised and regulated by the Federal Reserve Board, authorised and subject to limited regulation by the Prudential Regulation Authority and subject to regulation by the Financial Conduct Authority and/or State Street Global Markets International Limited, authorised and regulated by the Financial Conduct Authority. Details about the extent of our regulation by the Financial Conduct Authority and Prudential Regulation Authority are available from us on request. Please note that certain foreign exchange business (spot and certain forward transactions) is not regulated by the Financial Conduct Authority.

United States and Latin America: The products and services outlined in this document are generally offered in the United States and in Latin America by State Street Bank and Trust Company and/or by State Street Global Markets, LLC.

Please contact your sales representative for further information.

© 2016, State Street Corporation, All rights reserved.