Which is what I said: you can screenshot & compare. But it becomes a fuzzy comparison due to acceptable precision differences.
And it ends up being more of an integration test than a unit test.
> Every language has valid-per-spec differences.
They really don't, but that's not entirely what I'm talking about. I'm talking about valid hardware behavior differences, which don't broadly exist. How a float behaves in Java is well-defined and never changes. How numbers behave in most languages is well-defined and does not vary.
GPU shaders are completely different. Numbers do not have consistent behavior across differing hardware & drivers. This is a highly unique situation. Even things that are nominally variable per spec (like the size of int in C & C++) end up not actually varying in practice, because too much code doesn't cope with the variation. Shaders play no such games.