All sliding the window does is discover the parameters that work for the whole data set one chunk at a time - the distinction is artificial. It still reduces to the same thing: you've found some region of parameter space, generated by some function, that matches some percentage of the numerical relationships (correlations) present in the data.
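To make "in chunks" concrete, here is a minimal sketch of that walk-forward loop in Python. Everything in it is a toy placeholder and an assumption of mine, not anyone's actual method: a synthetic random-walk price series, a naive moving-average-crossover rule, and a hypothetical `score` helper and parameter grid.

```python
# Minimal walk-forward sketch: "optimise on a chunk, keep only what also
# worked on the next chunk" is still just a parameter search over the whole
# series. Strategy and data are toy placeholders.
import numpy as np

rng = np.random.default_rng(0)
prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, 2000)))  # synthetic series

def score(prices, fast, slow):
    """Sum of log returns while long a naive moving-average crossover."""
    fast_ma = np.convolve(prices, np.ones(fast) / fast, mode="valid")
    slow_ma = np.convolve(prices, np.ones(slow) / slow, mode="valid")
    n = min(len(fast_ma), len(slow_ma))
    signal = (fast_ma[-n:] > slow_ma[-n:]).astype(float)[:-1]
    returns = np.diff(np.log(prices[-n:]))
    return float(np.sum(signal * returns))

window, step, survivors = 500, 250, []
for start in range(0, len(prices) - window - step, step):
    train = prices[start : start + window]
    test = prices[start + window : start + window + step]
    # "Optimise" on the in-sample chunk...
    best = max(((f, s) for f in (5, 10, 20) for s in (50, 100)),
               key=lambda p: score(train, *p))
    # ...and keep it only if it also happened to work on the next chunk.
    if score(test, *best) > 0:
        survivors.append(best)

# Whatever survives was selected *using* the "future" chunks, which is
# exactly the circularity described below.
print(survivors)
```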
It's circular reasoning because, while fitting the parameters, you're testing them on the "future" data. It only "guarantees" success in the "future" because you discarded every parameter set that didn't work in the "future". It's no different from writing a model whose "parameters" assume the S&P 500 trades between 250 and 1000 and back-testing it on data from 1950-1996.
The only way to truly prove an algorithm's robustness would be to generate random data and test it on that. Once you've tested against every one of the infinitely many possible realities of a single time window, then you could rightly assert that past results guarantee success in the future. Hint: that's impossible, but random-data testing is still the correct technique for testing algorithms at scale.
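For contrast, here is a rough sketch of that random-data idea, under the loudly stated assumption that exponentiated Gaussian random walks are an acceptable stand-in for "possible realities" (a real test would use richer generators), with a toy momentum rule standing in for whatever algorithm was tuned on the historical data:

```python
# Run the "tuned" strategy across many synthetic paths. If its edge exists
# only on the single historical path and vanishes on the ensemble, the
# back-test found a coincidence, not a property of the algorithm.
import numpy as np

rng = np.random.default_rng(1)

def strategy_return(prices, lookback=20):
    """Sum of log returns while long a naive momentum signal (toy stand-in
    for the tuned algorithm)."""
    logret = np.diff(np.log(prices))
    # Go long the next period whenever the trailing `lookback` return is positive.
    signal = np.array([logret[i - lookback:i].sum() > 0
                       for i in range(lookback, len(logret))], dtype=float)
    return float(np.sum(signal * logret[lookback:]))

def random_path(n=2000, vol=0.01):
    """One synthetic 'possible reality': an exponentiated Gaussian random walk."""
    return 100 * np.exp(np.cumsum(rng.normal(0.0, vol, n)))

results = np.array([strategy_return(random_path()) for _ in range(500)])
print(results.mean(), np.percentile(results, [5, 50, 95]))
```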
Back-testing on historical data is a footnote compared to the thesis simulation can generate - the only value it contains is in correlating market data with external variables not present in the numbers themselves. Back-testing to tune an algorithm purely on the numbers in the data is just an exercise in quantified hindsight bias.