The best metrics here would be the Sharpe ratio and drawdown details. https://en.wikipedia.org/wiki/Sharpe_ratio
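
For anyone who wants to check those numbers themselves, here is a rough sketch of both metrics computed from a list of per-period returns. The function names, the zero default risk-free rate, and the 252-periods-per-year annualization are my assumptions, not anything the system's author has published.

```python
import math

def sharpe_ratio(returns, risk_free_per_period=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from simple per-period returns.

    Assumptions (illustrative only): returns are simple returns such as
    0.01 for +1%, the risk-free rate is given per period, and there are
    252 trading periods per year.
    """
    excess = [r - risk_free_per_period for r in returns]
    n = len(excess)
    if n < 2:
        raise ValueError("need at least two returns")
    mean = sum(excess) / n
    # Sample standard deviation (n - 1 in the denominator).
    variance = sum((x - mean) ** 2 for x in excess) / (n - 1)
    std = math.sqrt(variance)
    if std == 0:
        raise ValueError("returns have zero variance")
    return mean / std * math.sqrt(periods_per_year)

def max_drawdown(returns):
    """Largest peak-to-trough loss of the compounded equity curve,
    returned as a positive fraction (0.25 means a 25% drawdown)."""
    equity = 1.0
    peak = 1.0
    worst = 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst
```

Feed it the daily returns from the track record and you get one number for risk-adjusted return and one for the worst loss from a peak. A short or cherry-picked return series will still flatter both.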
> This system is impossible to test. I would be hesitant to trust it.
Disagree. The best test would be a forward paper-trading test, audited by a common platform.