All sliding the window does is discover the parameters that work for the whole data set one chunk at a time - the distinction is artificial. It still reduces to the same thing: you've found some region of parameter space, generated by some function, that matches some percentage of the numerical relationships (correlations) present in the data.
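To make "in chunks" concrete, here is a minimal sketch of that walk-forward loop in Python. Everything in it is a toy placeholder and an assumption of mine, not anyone's actual method: a synthetic random-walk price series, a naive moving-average-crossover rule, and a hypothetical `score` helper and parameter grid.

```python
# Minimal walk-forward sketch: "optimise on a chunk, keep only what also
# worked on the next chunk" is still just a parameter search over the whole
# series. Strategy and data are toy placeholders.
import numpy as np

rng = np.random.default_rng(0)
prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, 2000)))  # synthetic series

def score(prices, fast, slow):
    """Sum of log returns while long a naive moving-average crossover."""
    fast_ma = np.convolve(prices, np.ones(fast) / fast, mode="valid")
    slow_ma = np.convolve(prices, np.ones(slow) / slow, mode="valid")
    n = min(len(fast_ma), len(slow_ma))
    signal = (fast_ma[-n:] > slow_ma[-n:]).astype(float)[:-1]
    returns = np.diff(np.log(prices[-n:]))
    return float(np.sum(signal * returns))

window, step, survivors = 500, 250, []
for start in range(0, len(prices) - window - step, step):
    train = prices[start : start + window]
    test = prices[start + window : start + window + step]
    # "Optimise" on the in-sample chunk...
    best = max(((f, s) for f in (5, 10, 20) for s in (50, 100)),
               key=lambda p: score(train, *p))
    # ...and keep it only if it also happened to work on the next chunk.
    if score(test, *best) > 0:
        survivors.append(best)

# Whatever survives was selected *using* the "future" chunks, which is
# exactly the circularity described below.
print(survivors)
```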
It's circular reasoning because, while fitting the parameters, you're testing them on the "future" data. It only "guarantees" success in the "future" because you discarded every parameter set that didn't work in the "future". It's no different from writing a model whose "parameters" assume the S&P 500 trades between 250 and 1000 and back-testing it on data from 1950-1996.
The only way to truly prove an algorithm's robustness would be to generate random data and test it on that. Once you've tested against every one of the infinitely many possible realities of a single time window, then you could rightly assert that past results guarantee success in the future. Hint: that's impossible, but random-data testing is still the correct technique for testing algorithms at scale.
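For contrast, here is a rough sketch of that random-data idea, under the loudly stated assumption that exponentiated Gaussian random walks are an acceptable stand-in for "possible realities" (a real test would use richer generators), with a toy momentum rule standing in for whatever algorithm was tuned on the historical data:

```python
# Run the "tuned" strategy across many synthetic paths. If its edge exists
# only on the single historical path and vanishes on the ensemble, the
# back-test found a coincidence, not a property of the algorithm.
import numpy as np

rng = np.random.default_rng(1)

def strategy_return(prices, lookback=20):
    """Sum of log returns while long a naive momentum signal (toy stand-in
    for the tuned algorithm)."""
    logret = np.diff(np.log(prices))
    # Go long the next period whenever the trailing `lookback` return is positive.
    signal = np.array([logret[i - lookback:i].sum() > 0
                       for i in range(lookback, len(logret))], dtype=float)
    return float(np.sum(signal * logret[lookback:]))

def random_path(n=2000, vol=0.01):
    """One synthetic 'possible reality': an exponentiated Gaussian random walk."""
    return 100 * np.exp(np.cumsum(rng.normal(0.0, vol, n)))

results = np.array([strategy_return(random_path()) for _ in range(500)])
print(results.mean(), np.percentile(results, [5, 50, 95]))
```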
Back-testing on historical data is a footnote compared to the thesis simulation can generate - the only value it contains is in correlating market data with external variables not present in the numbers themselves. Back-testing to tune an algorithm purely on the numbers in the data is just an exercise in quantified hindsight bias.