I had similar thoughts to you.
However, diffusion models suck at details, like how many fingers are on a hand, and with language the words and characters matter, both which ones and where they are.
So while I'm sure diffusion could produce walls of text that look convincingly like, say, a blog post at a glance, I'm not sure it would hold up to anyone actually reading it.