The nice thing about this, compared to generating tests in advance, is that if your app changes in non-material ways that would otherwise break a test, you don't need to regenerate anything - the AI will "adapt".
It's also, arguably, a form of fuzzing. Because it's non-deterministic, it might test the same thing in different - ideally all correct - ways from run to run, which helps uncover bugs that a single fixed path would miss.
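To make the fuzzing comparison a bit more concrete, here's a rough sketch of the idea rather than how any real AI test runner works. The `Cart` class and the three strategies are invented for illustration, and a plain random choice stands in for the AI's non-determinism. Each run reaches the same goal by a different route, and the final check is the same regardless of which route was taken.

```python
import random


class Cart:
    """Toy stand-in for the app under test (hypothetical)."""

    def __init__(self):
        self.items = {}

    def add(self, sku, qty=1):
        self.items[sku] = self.items.get(sku, 0) + qty

    def set_quantity(self, sku, qty):
        if qty <= 0:
            self.items.pop(sku, None)
        else:
            self.items[sku] = qty


# Three different, equally valid ways to end up with 3 widgets in the cart.
def one_at_a_time(cart):
    for _ in range(3):
        cart.add("widget")


def all_at_once(cart):
    cart.add("widget", 3)


def add_then_adjust(cart):
    cart.add("widget")
    cart.set_quantity("widget", 3)


STRATEGIES = [one_at_a_time, all_at_once, add_then_adjust]


def run_once(rng):
    """Pick a route at random (standing in for the AI) and execute it."""
    cart = Cart()
    strategy = rng.choice(STRATEGIES)
    strategy(cart)
    return strategy.__name__, cart.items


def run_many(runs=20, seed=None):
    """Repeat the test; every route must lead to the same end state."""
    rng = random.Random(seed)
    taken = []
    for _ in range(runs):
        name, items = run_once(rng)
        assert items == {"widget": 3}, f"{name} left the cart as {items}"
        taken.append(name)
    return taken


if __name__ == "__main__":
    print(run_many(runs=10))
```

If one of those routes had a bug (say `set_quantity` added to the count instead of replacing it), only some runs would fail, which is roughly the trade-off with fuzzing in general: more paths covered, but failures are harder to reproduce unless you record the seed or the path taken.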