10 Ways Backtests Lie, by Tucker Balch

10 WAYS BACKTESTS LIE

TUCKER BALCH, PH.D.
PROFESSOR, GEORGIA TECH
CO-FOUNDER AND CTO, LUCENA RESEARCH

OR… Mistakes quant developers make that cause backtests to be inaccurate.

WHAT I’LL COVER
•  Introductions
•  What is a backtest?
•  How backtests lie:
1.  In sample backtesting
2.  Survivor bias
3.  Assume you can observe the close and trade at the close
4.  Ignoring market impact
5.  Assume you can buy $10M of a $1M company
6.  Data mining fallacy
7.  Stateful strategy luck
8.  Buy at the open
9.  Trust complex models
10.  Don’t forward test

ABOUT THE SPEAKER
•  Professor of Interactive Computing at Georgia Institute of Technology.
•  Teaches courses in Artificial Intelligence and Finance.
•  Teaches MOOCs on Machine Learning for Trading.
•  Author of over 120 research publications related to Robotics and Machine Learning.
•  Co-founder of Lucena Research.

New book ->

ABOUT LUCENA RESEARCH
•  We are a fin-tech company that employs experts in Computational Finance, Quantitative Analysis, and Software Development.
•  We deliver investment decision support technology at a fraction of the cost of an in-house quant shop.
•  Python-based infrastructure.
•  Erez Katz, CEO
•  Eric Davidson, VP

http://lucenaresearch.com

LUCENA RESEARCH!

BUILDING A MODEL FROM DATA

BACKTESTING TO VALIDATE THE MODEL

•  Roll forward cross validation
•  Out of sample validation
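As a rough illustration of roll forward cross validation, the sketch below generates training and testing windows that slide forward through time, so each test window always comes after the data used for training. The function name `roll_forward_splits`, the window sizes, and the list of years are assumptions made for this example and are not taken from the talk.

```python
from typing import Iterator, List, Tuple


def roll_forward_splits(
    periods: List[str], train_size: int, test_size: int
) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (train, test) windows that move forward through time.

    Each test window starts right after its training window ends, so a
    model is never evaluated on periods it was fitted to.
    """
    start = 0
    while start + train_size + test_size <= len(periods):
        train = periods[start : start + train_size]
        test = periods[start + train_size : start + train_size + test_size]
        yield train, test
        start += test_size  # slide both windows forward by one test window


# Illustrative periods only; any ordered list of dates would work.
years = [str(y) for y in range(2007, 2013)]
splits = list(roll_forward_splits(years, train_size=2, test_size=1))
for train, test in splits:
    print("train", train, "-> test", test)
```

Each printed line is one out of sample experiment; averaging results across the windows gives a less optimistic picture than a single fit over the full history.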

10 WAYS BACKTESTS LIE

1. IN SAMPLE BACKTESTING Description: Backtesting over the same data you used to train your model. This method is doomed to succeed spectacularly!

[Figure: Training and Testing periods]

1. IN SAMPLE BACKTESTING How to avoid?

[Figure: Training and Testing periods]

More generally, build safeguards and procedures to prevent testing over the same data you train over. E.g., train over 2007, test over 2008 onward.
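One possible safeguard is a small check that refuses to run a backtest when the testing data overlaps the training period. The sketch below follows the 2007/2008 example from the slide; the helper names `split_by_date` and `check_no_overlap` and the sample rows are hypothetical, with made-up values.

```python
from datetime import date
from typing import List, Sequence, Tuple

Row = Tuple[date, float]  # (trading day, some observed value)


def split_by_date(
    rows: Sequence[Row], first_test_day: date
) -> Tuple[List[Row], List[Row]]:
    """Put everything before first_test_day in training, the rest in testing."""
    train = [r for r in rows if r[0] < first_test_day]
    test = [r for r in rows if r[0] >= first_test_day]
    return train, test


def check_no_overlap(train: Sequence[Row], test: Sequence[Row]) -> None:
    """Raise if any testing day falls on or before the last training day."""
    if not train or not test:
        raise ValueError("both training and testing sets must be non-empty")
    if max(d for d, _ in train) >= min(d for d, _ in test):
        raise ValueError("testing data overlaps the training period")


# Made-up rows for illustration: train over 2007, test over 2008 onward.
rows = [
    (date(2007, 3, 1), 1.0),
    (date(2007, 9, 3), 2.0),
    (date(2008, 2, 1), 3.0),
    (date(2009, 6, 1), 4.0),
]
train, test = split_by_date(rows, first_test_day=date(2008, 1, 1))
check_no_overlap(train, test)  # passes silently when the split is clean
print(len(train), "training rows,", len(test), "testing rows")
```

Calling the check at the start of every backtest run makes an accidental in sample test fail loudly instead of producing a spectacular result.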

2. SURVIVOR BIAS Description: Selective use of data in a statistical study that emphasizes examples that are “alive” at the end of the study. The significance of the bias depends on how important survival is to the quantity being measured.

2. SURVIVOR BIAS Example: Company claims: “Our drug reduces the blood pressure of those who take the drug over time.”

5-year study:
•  Randomly select 500 cardiac patients
•  Administer drug to them
•  Measure their blood pressure monthly

Results:
•  160/110 average first month
•  135/80 average at end of study

Do you believe this is a good drug?

2. SURVIVOR BIAS Problem: 58 of the patients they started with have died since the start of the study, so the end-of-study average covers only the survivors.

Note: 58 of the members of the S&P 500 in 2008 are now delisted (11.6% of the index). Not just out of the S&P 500, but gone as companies.

2. SURVIVOR BIAS

Green: Current S&P 500, Purple: Point in time S&P 500

--Lucena Research

2. SURVIVOR BIAS How to prevent?
•  Use historic index membership.
•  Pair with survivor-bias-free (SBF-free) data.
•  Use these indices as your universe for testing.

3. OBSERVING THE CLOSE Description: You assume you can observe information recorded at market close, and trade on it.

Examples:
•  Closing price/volume
•  Technicals based on price/volume

This is a specific case of “look ahead bias.” Other examples:
•  Earnings reports
•  News feeds

3. OBSERVING THE CLOSE How to prevent? Ensure that information with timestamp X cannot be acted on until X+1.

Example: Data marked January 15 cannot be traded until the open on January 16.
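The X to X+1 rule can be sketched as a function that moves every signal to the next trading session, where it can first be acted on. The name `executable_orders`, the session list, and the signal are assumptions for this example; the dates are illustrative only.

```python
from typing import Dict, List


def executable_orders(sessions: List[str], signals: Dict[str, str]) -> Dict[str, str]:
    """Map each signal to the next trading session after the one that produced it.

    A signal computed from data stamped on session X is only tradable at the
    open of session X+1. A signal on the final session has no next session
    in the data and is dropped.
    """
    orders: Dict[str, str] = {}
    for i, day in enumerate(sessions[:-1]):
        if day in signals:
            orders[sessions[i + 1]] = signals[day]
    return orders


# Illustrative sessions: data marked January 15 trades at the January 16 open.
sessions = ["2015-01-14", "2015-01-15", "2015-01-16"]
signals = {"2015-01-15": "BUY"}
orders = executable_orders(sessions, signals)
print(orders)  # {'2015-01-16': 'BUY'}
```

Indexing by session rather than by calendar day means weekends and holidays are handled by whatever session list the backtester already uses.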

4. IGNORING MARKET IMPACT Description: The act of trading affects price. Historical data does not include your trades and is therefore not an accurate representation of the price you would get.

4. IGNORING MARKET IMPACT

[Figure credit: Swetha Shivakumar, Georgia Tech]

4. IGNORING MARKET IMPACT How to prevent? Include a “slippage” or “market impact” model in your backtests.
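A slippage model can be as simple as a fixed cost plus a penalty that grows with the share of daily volume an order consumes. The sketch below is one such toy model; the function `fill_price`, its linear form, and the `fixed_bps` and `impact_bps` values are assumptions chosen for illustration and are not calibrated to any market or taken from the talk.

```python
def fill_price(
    quote: float,
    shares: int,
    daily_volume: int,
    fixed_bps: float = 5.0,     # assumed flat cost per trade, in basis points
    impact_bps: float = 100.0,  # assumed extra cost if the order were 100% of volume
) -> float:
    """Estimate the price actually paid or received for an order.

    shares > 0 is a buy (fills above the quote), shares < 0 is a sell
    (fills below the quote). Larger orders relative to daily volume are
    penalized more.
    """
    if daily_volume <= 0:
        raise ValueError("daily_volume must be positive")
    if shares == 0:
        return quote
    participation = abs(shares) / daily_volume
    cost_bps = fixed_bps + impact_bps * participation
    direction = 1 if shares > 0 else -1
    return quote * (1 + direction * cost_bps / 10_000)


small_buy = fill_price(100.0, 1_000, 100_000)
large_buy = fill_price(100.0, 10_000, 100_000)
print(round(small_buy, 4), round(large_buy, 4))
```

Even a crude penalty like this moves backtest results in the right direction, since the historical price series never included the strategy's own trades.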

5. BUY $10M OF A $1M COMPANY Description: Backtest allows a strategy to buy (or short) as much of a symbol as it wants. There often is real alpha in thinly traded stocks.

This is a specific example of the more general issue of capacity limitations.

5. BUY $10M OF A $1M COMPANY How to avoid? Ensure the backtester prohibits trading more dollar volume than was actually available on that day.

Add slippage/market impact models to penalize buying too much.
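A volume cap along these lines could be sketched as follows. The function `cap_order` and the `max_participation` parameter are hypothetical; a value of 1.0 matches the rule of never trading more than the day's dollar volume, and anything lower is a stricter assumption a backtester might choose.

```python
def cap_order(
    desired_dollars: float,
    price: float,
    daily_volume: int,
    max_participation: float = 1.0,  # assumed fraction of the day's volume we may take
) -> float:
    """Limit an order to the dollar volume that actually traded that day.

    Positive amounts are buys and negative amounts are shorts or sells;
    the sign of the request is preserved.
    """
    available = price * daily_volume * max_participation
    capped = min(abs(desired_dollars), available)
    return capped if desired_dollars >= 0 else -capped


# A $10M order in a stock that traded 50,000 shares at $2 is cut to $100,000.
print(cap_order(10_000_000, price=2.0, daily_volume=50_000))
print(cap_order(-10_000_000, price=2.0, daily_volume=50_000))
```

Capping the order handles the hard limit; a slippage model on top of it covers the softer cost of taking a large share of the volume that remains.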

6. DATA MINING FALLACY Description: If you generate and test enough strategies you’ll eventually find one that “works” in a backtest. The quality of the strategy cannot be distinguished from random luck.

Example: Look for a “skilled” coin flipper among 10,000 candidates.
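The coin flipper example can be worked through numerically. The sketch below computes the chance that at least one of many fair coin flippers produces a perfect run of heads, and also simulates the search; the choice of 10 flips, the function names, and the random seed are assumptions made for this illustration.

```python
import random


def chance_of_lucky_streak(candidates: int, flips: int) -> float:
    """Probability that at least one fair flipper gets heads on every flip."""
    p_one = 0.5 ** flips  # chance a single flipper is perfect by luck alone
    return 1.0 - (1.0 - p_one) ** candidates


def count_perfect_flippers(candidates: int, flips: int, seed: int = 0) -> int:
    """Simulate the search and count flippers with an all-heads record."""
    rng = random.Random(seed)
    return sum(
        all(rng.random() < 0.5 for _ in range(flips)) for _ in range(candidates)
    )


p = chance_of_lucky_streak(candidates=10_000, flips=10)
found = count_perfect_flippers(candidates=10_000, flips=10)
print(f"chance of at least one perfect flipper: {p:.4f}")
print(f"perfect flippers found in one simulated search: {found}")
```

With 10,000 candidates a perfect 10-flip record is nearly certain to appear somewhere, even though no flipper has any skill. The same arithmetic applies to searching over thousands of candidate strategies.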

6. DATA MINING FALLACY How to avoid? You can’t! However, you can and should forward test before committing significant capital.

7 THROUGH 10

7. Stateful strategy luck

8. Buy at the open

9. Trust complex models

10. Don’t forward test

THANK YOU! www.lucenaresearch.com
