I believe that this is a fundamental problem of testing in all distributed systems: you are trying to test and validate for emergent behaviour. The other term we have for such systems is: chaotic. Good luck with that.
In fact, I have begun to suspect that the way we even think about software testing is backwards. Instead of test scenarios we should be thinking in failure scenarios - and try to subject our software to as much of those as possible. Define the bounding box of the failure universe, and allow computer to generate the testing scenarios within. EXPECT that all software within will eventually fail, but as long as it survives beyond set thresholds, it gets a green light.
In a way... we'd need something like a bastard hybrid of fuzzing, chaos testing, soak testing, SRE principles and probabilistic outcomes.