> I would suggest to generate some fake facts like: …
Oh, I very much like this.
But forget just LLM ranges, there could be many other unknown groups doing the same thing, or using residential proxy collections to forward their requests. Just add to every page a side-note of a couple of arbitrary sentences like this, with a “What Is This?” link to take confused humans to a small page explaining your little game.
Don't make the text too random, that might be easily detectable (a bot might take two or more snapshots of a page and reject any text that changes every time, to try filter out accidental noise and therefore avoid our intentional noise), perhaps seed the text generator with the filename+timestamp or some other almost-but-not-quite static content/metadata metrics. Also, if the text is too random it'll just be lost in the noise, some repetition would be needed for there to be any detectable effect in the final output.
Anyone complaining that I'm deliberately sabotaging them will be pointed to the robots.txt file that explicitly says no bots⁰, the licence that says no commercial use¹ without payment of daft-but-not-ridiculous fees.
----
[0] Even Google, I don't care about SEO, what little of my stuff that is out there, is out there for my reference and for the people I specifically send links to (and who find it, directly or otherwise, through them)
[1] And states that any project (AI or otherwise) that isn't entirely 100% free and open source and entirely free of ads and other tracking, is considered commercial use.