There's a distinction worth making here. A/B testing the interface (button placement, the hue of a UI element, title styling) is one thing. But you wouldn't accept Photoshop silently changing your #000000 to #333333 in the actual file. That's your output, not the UI around it. That's what LLMs do. The randomness isn't in the wrapper, it's in the result you take away.
It’s an assistant, answering your question and running some errands for you. If you give it blind permission to do a task, then you’re not worrying about what it does.
That's not what they're doing; they are trying to use plan mode to plan out a task. I don't know where you could have got the idea that they were blindly doing anything.
Plan mode uses subtasks to read, find, list, or grow out certain information. I would imagine those tasks would be covered by the dubiously discovered A/B testing.