The real question is: can we get consistency without mode collapse? Are the two orthogonal, or necessarily opposed?
Or, to put it more bluntly: can we get correctness without cringe? ;)
I think it could be done, to a degree, with current systems, but it would be more expensive. You'd raise the sampling temperature to get more varied output, and then you'd do more runs to make up for the extra misses. And you could do that iteratively: regenerate each paragraph a few times and take the best of N. So you end up with interesting output that still meets some threshold of quality.
Actually that doesn't sound too hard to slap together right now...
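
Here's a rough sketch of what that loop might look like in Python. It's only an illustration: `generate` and `score` are hypothetical placeholders for "sample a paragraph from the model at a given temperature" and "rate a paragraph's quality", not calls from any real library, and the default values for `n`, `temperature`, and `threshold` are made up. The scorer is probably the hard part in practice.

```python
import random
from typing import Callable, List, Optional

# Hypothetical interfaces (assumptions, not a real API):
Generate = Callable[[str, float], str]  # (context, temperature) -> paragraph
Score = Callable[[str], float]          # paragraph -> quality, higher is better


def best_of_n(generate: Generate, score: Score, context: str,
              n: int = 4, temperature: float = 1.2,
              threshold: float = 0.5) -> Optional[str]:
    """Sample n candidates at high temperature and keep the best one.

    Returns None if even the best candidate scores below the threshold.
    """
    candidates = [generate(context, temperature) for _ in range(n)]
    # Score each candidate once, since scoring may be expensive.
    scored = [(score(c), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best if best_score >= threshold else None


def write_paragraphs(generate: Generate, score: Score, prompt: str,
                     num_paragraphs: int = 3, max_rounds: int = 3,
                     **kwargs) -> List[str]:
    """Build a text paragraph by paragraph, doing best-of-N at each step."""
    paragraphs: List[str] = []
    for _ in range(num_paragraphs):
        # Each new paragraph is conditioned on the prompt plus accepted text.
        context = "\n\n".join([prompt] + paragraphs)
        for _ in range(max_rounds):
            paragraph = best_of_n(generate, score, context, **kwargs)
            if paragraph is not None:
                break
        else:
            # Every round failed to produce anything above the threshold.
            raise RuntimeError("no candidate met the quality threshold")
        paragraphs.append(paragraph)
    return paragraphs


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs without a model.
    words = ["odd", "plain", "vivid", "stale", "sharp", "plain"]

    def toy_generate(context: str, temperature: float) -> str:
        return " ".join(random.choice(words) for _ in range(6))

    def toy_score(paragraph: str) -> float:
        # Fraction of distinct words, as a crude proxy for "not repetitive".
        tokens = paragraph.split()
        return len(set(tokens)) / len(tokens)

    print(write_paragraphs(toy_generate, toy_score, "Once upon a time"))
```

The cost is roughly `n` generations plus `n` scoring calls per paragraph, times however many retry rounds you need, which is the "more expensive" part.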