Hull Tactical Asset Allocation

October 25, 2016

Cross Validation, or How to Minimize Cheating

We have learned to be skeptical of simulated returns. A marketing guy visits to pitch an investment strategy. You ask a few questions and find that the strategy was researched and tested on the same data. You are not surprised when actual results are not even close to the simulated results. It turns out that the research staff tortured the data to get the best fit. In short, they cheated.

Our analysts use cross validation techniques to avoid cheating. Let’s consider an example. We have a data set with 300 observations. We divide it into three parts (1, 2 and 3), each with 100 observations. We train a model on parts 1 and 2, and test the model on part 3. Then we train on parts 1 and 3 and test on part 2. Finally, we train on parts 2 and 3 and test on part 1. This is an example of k-fold cross validation with k=3.
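The three-way split described above can be sketched in a few lines of Python. This is only an illustration: the function name `k_fold_splits` is made up for this example, the folds are assumed to be contiguous and of equal size, and no particular model is implied.

```python
def k_fold_splits(n_obs, k):
    """Yield (train_indices, test_indices) for k contiguous, equal-sized folds.

    Assumes n_obs is divisible by k, as in the 300-observation example.
    Indices are zero-based.
    """
    fold_size = n_obs // k
    folds = [list(range(i * fold_size, (i + 1) * fold_size)) for i in range(k)]
    # Test on the last part first, then work backwards (part 3, 2, 1).
    for test_fold in reversed(range(k)):
        train = [i for f, fold in enumerate(folds) if f != test_fold for i in fold]
        yield train, folds[test_fold]


splits = list(k_fold_splits(300, 3))
for train, test in splits:
    # Report the test range using one-based observation numbers.
    print(f"train on {len(train)} observations, "
          f"test on observations {test[0] + 1}-{test[-1] + 1}")
```

Each pass trains on 200 observations and tests on the remaining 100, so every observation is used for testing exactly once.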

Another example of cross validation is called “leave one out” cross validation. In this case, a model is trained on all but one observation and tested on the held-out observation. On a data set with 100 observations this is done 100 times, once for each observation. Leave one out is just k-fold cross validation where k equals the number of observations in the data set.
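As a rough sketch of leave one out, the snippet below generates the 100 train/test pairs for a data set of 100 observations. The name `leave_one_out_splits` is hypothetical and chosen only for this illustration.

```python
def leave_one_out_splits(n_obs):
    """Yield (train_indices, held_out_index), holding out each observation once."""
    for held_out in range(n_obs):
        train = [i for i in range(n_obs) if i != held_out]
        yield train, held_out


loo_splits = list(leave_one_out_splits(100))
print(len(loo_splits))        # number of train/test rounds
print(len(loo_splits[0][0]))  # training observations in each round
```

With 100 observations there are 100 rounds, and each round trains on the other 99 observations.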

One wrinkle we have to consider is that we are generally working with time series data. The observations are ordered and earlier data is correlated with later data. One way to deal with time series data is to divide a data set into two parts. Train a model on the first 70% of the data and test it on the last 30% of the data. To evaluate several candidate models, one can divide the data into three parts. Train each model on the first 60% of the data. Test each model on the next 20% of the data. Choose the best model and then test it on the last 20% of the data.
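A minimal sketch of the 60/20/20 split follows. The function name `chronological_split`, its parameter names, and the 1,000-observation data set are assumptions made for the example. The key point is that the data is never shuffled, so each part comes strictly later in time than the one before it.

```python
def chronological_split(n_obs, train_frac=0.6, validate_frac=0.2):
    """Split zero-based indices 0..n_obs-1 into train, validate and test ranges.

    The order of the observations is preserved; the test range is whatever
    remains after the training and validation ranges.
    """
    train_end = round(n_obs * train_frac)
    validate_end = train_end + round(n_obs * validate_frac)
    return (range(0, train_end),
            range(train_end, validate_end),
            range(validate_end, n_obs))


train, validate, test = chronological_split(1000)
print(len(train), len(validate), len(test))
```

The validation part is used to choose among the candidate models. The final part is touched only once, to test the chosen model.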

One can use k-fold cross validation for a time series data set. Consider an example with k=3. We have a daily model that looks at returns over the past five days in order to forecast the next day’s return. The data set has 300 observations that we can divide into three parts, each with 100 observations. As we discussed earlier, we train on Parts 1 and 2 and test on Part 3; then we train on Parts 1 and 3 and test on Part 2; and finally we train on Parts 2 and 3 and test on Part 1. Note that when time series models are involved, data is often “lost” for training. In this case we can only train the model on 95 of the 100 observations in each subset, because the model needs to look back five days in order to create an estimate.
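The loss of observations can be seen by building the training examples for one 100-observation part. This is a sketch only: the helper `make_examples` is hypothetical and the returns are made-up numbers, not market data.

```python
def make_examples(returns, lookback=5):
    """Pair each return with the `lookback` returns that precede it.

    The first `lookback` observations have no complete history, so they
    cannot be used as forecast targets.
    """
    examples = []
    for t in range(lookback, len(returns)):
        features = returns[t - lookback:t]  # the previous five days
        target = returns[t]                 # the day being forecast
        examples.append((features, target))
    return examples


# One part of the data set: 100 made-up daily returns.
part = [0.001 * (i % 7 - 3) for i in range(100)]
examples = make_examples(part)
print(len(examples))
```

The part has 100 observations but produces only 95 usable examples, because the first five days serve only as history for the sixth.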

Walk forward analyses are often used to simulate investment strategies. One might train a strategy on the first 700 observations of a data set with 1,000 observations. The strategy could then be tested on observations 701 through 800. The strategy is then trained on the first 800 observations (or perhaps observations 101 through 800) and tested on observations 801 through 900. The process is executed a final time, training through observation 900 and testing on observations 901 through 1,000.
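The walk forward schedule above can be sketched as follows. The function `walk_forward_splits` and its `rolling` flag are hypothetical names for this illustration. With `rolling=False` the training window always starts at the first observation; with `rolling=True` it keeps a fixed length, which corresponds to the "observations 101 through 800" variant.

```python
def walk_forward_splits(n_obs, initial_train, test_size, rolling=False):
    """Yield (train_range, test_range) pairs that move forward through time.

    Indices are zero-based. Each test range begins immediately after the
    end of its training range, so the model is never tested on data that
    precedes its training data.
    """
    train_end = initial_train
    while train_end + test_size <= n_obs:
        train_start = train_end - initial_train if rolling else 0
        yield (range(train_start, train_end),
               range(train_end, train_end + test_size))
        train_end += test_size


expanding = list(walk_forward_splits(1000, 700, 100))
rolling = list(walk_forward_splits(1000, 700, 100, rolling=True))

for train, test in expanding:
    # Report using one-based observation numbers, as in the text.
    print(f"train {train[0] + 1}-{train[-1] + 1}, "
          f"test {test[0] + 1}-{test[-1] + 1}")
```

Both variants give three rounds, tested on observations 701 through 800, 801 through 900, and 901 through 1,000. They differ only in where each training window begins.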

Using cross validation will not guarantee stellar real-time investment returns. But it will reduce the probability of finding spurious relationships or relationships that are unlikely to hold up over time. So the next time you see an investment strategy presentation with simulated returns, ask a few questions to see if there was any cheating involved in the research. You could save yourself a lot of money.

For a technical discussion of cross validation techniques, readers can consult:

S. Arlot and A. Celisse (2010). A Survey of Cross-Validation Procedures for Model Selection

http://projecteuclid.org/euclid.ssu/1268143839

Here is a link to an article about cross validation and time series analysis:

C. Bergmeir, R. Hyndman and B. Koo (2015). A Note on the Validity of Cross-Validation for Evaluating Time Series Prediction

http://robjhyndman.com/papers/cv-wp.pdf

For less technical discussions try the terms “cross validation techniques” or “cross validation for time series analysis” in an Internet search engine like Google or Yahoo!

©2016 Hull Tactical Asset Allocation, LLC (“HTAA”) is a Registered Investment Adviser. The information set forth in HTAA’s market commentaries and writings are of a general nature and are provided solely for the use of HTAA, its clients and prospective clients. This information does not constitute investment advice, which can be provided only after the delivery of HTAA’s Form ADV and once a properly executed investment advisory agreement has been entered into by the client and HTAA. These materials reflect the opinion of HTAA on the date of production and are subject to change at any time without notice. Due to various factors, including changing market conditions or tax laws, the content may no longer be reflective of current opinions or positions. Past performance does not guarantee future results. All investments are subject to risks.
