An agent making a request on the explicit behalf of someone else is probably something most of us agree is reasonable. "What are the current stories on Hacker News?" -- the agent is just making the same request to the same website that I would have made anyway.
But the sort of non-explicit, just-in-case crawling that Perplexity might do for a general question, where it crawls 4-6 sources, isn't as easy to defend. "Are polar bears always white?" -- now it's making requests I wouldn't necessarily have made, and it could even be seen as a sort of amplification attack.
That said, TFA's example is where they register secretexample.com and then ask Perplexity "what is secretexample.com about?" and Perplexity sends a request to answer the question, so that's an example of the first case, not the second.
What prevents these companies from keeping a copy of that particular page, which I specifically disallowed for bot scraping, and feeding it into their next training cycle?
Pinky promises? Ethics? Laws? Technical limitations? Leeroy Jenkins?
What prevents anyone else? robots.txt is a request, not an access policy.
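To make that concrete, here is a minimal sketch using Python's standard-library `urllib.robotparser`, assuming a hypothetical robots.txt that opts out of OpenAI's `GPTBot` crawler and allows everyone else. The check runs entirely inside the client: a crawler that simply never calls it is not stopped by anything in the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a blog owner might publish to opt out of one crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

def polite_can_fetch(user_agent: str, url: str) -> bool:
    """Return whether a *cooperative* client would fetch `url`.

    The server never sees this check and cannot enforce its answer;
    it only matters if the client chooses to run it.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    parser.modified()  # mark the rules as freshly read so can_fetch() uses them
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    print(polite_can_fetch("GPTBot/1.0", "https://example.com/post"))   # False
    print(polite_can_fetch("Mozilla/5.0", "https://example.com/post"))  # True
```

Nothing in the HTTP exchange changes if a client skips `polite_can_fetch` entirely, which is the sense in which robots.txt is a request rather than an access policy.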
Do you still see authentic human traffic on your domains? Is it easy to discern?
I feel like I missed the bus on running a blog pre-AI.
You can put up a paywall depending on the User-Agent or OS (it has been done).
In short, it's a 2-way street: the client on the other end of the TCP pipe makes a request, and your server fulfills the request as it sees fit.
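A minimal sketch of that server-side decision, assuming a hypothetical list of crawler tokens (`BLOCKED_AGENT_TOKENS`) and using 402 as the "paywall" status. The User-Agent header is whatever the client chooses to send, so this only filters clients that identify themselves honestly.

```python
from typing import Dict, Optional, Tuple

# Hypothetical blocklist: substrings some AI crawlers use in their
# User-Agent strings. Check each vendor's own documentation before relying on it.
BLOCKED_AGENT_TOKENS = ("gptbot", "perplexitybot", "ccbot")

def choose_response(user_agent: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Decide status, headers and body for a request based only on its User-Agent."""
    ua = (user_agent or "").lower()
    if any(token in ua for token in BLOCKED_AGENT_TOKENS):
        # Self-identified crawler: ask for payment instead of serving the page.
        return 402, {"Content-Type": "text/plain"}, b"Payment Required\n"
    # Everyone else, including anything spoofing a browser UA, gets the page.
    return 200, {"Content-Type": "text/html"}, b"<p>Hello, reader</p>"
```

For example, `choose_response("Mozilla/5.0 (compatible; GPTBot/1.0)")` yields a 402, while the same crawler sending a plain browser User-Agent would get the 200.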
If you give them a URL that does not appear in Google, ask them to visit that URL specifically, and then notice the content from that URL in the training data, it's proof that they're doing this, which would be quite damaging to them.
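The canary idea can be sketched as: generate a random, unlinked URL plus a unique marker phrase for the page, hand the URL only to the agent, and later look for the marker in model output. The helper names below are hypothetical, and `secretexample.com` is just the example domain from TFA. Note that the marker not showing up proves nothing either way; only a hit is evidence.

```python
import secrets
from typing import Tuple

def make_canary(domain: str) -> Tuple[str, str]:
    """Return (url, marker) for a one-off canary page on `domain`.

    The URL path is random, so nobody can guess or crawl it unless it is
    handed out; the marker is a unique phrase to put in the page body.
    """
    slug = secrets.token_urlsafe(16)           # 16 random bytes as URL-safe base64 (22 chars)
    marker = "canary-" + secrets.token_hex(8)  # 16 hex chars, easy to search for later
    return f"https://{domain}/{slug}", marker

def marker_leaked(marker: str, model_output: str) -> bool:
    """True if the marker appears in text produced later by a model."""
    return marker in model_output
```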
When you swap in an AI and ask what the current stories are, the AI fetches the front page and every thread and feeds them back to you. You are less likely to participate in the discussion because you've already had the info summarized.
Am I supposed to spend money on Amazon.com when I visit the website just because Amazon wants me to?
And yet people install ad blockers and defend their freedom to not participate in this because they don't want to be annoyed by ads.
They claim that since they are free not to buy an advertised product, they shouldn't be forced to see ads for it. But Foo News claims that it is equally free not to waste bandwidth serving its free website to people who declare (by using an ad blocker or the modern alternative, an AI summarizer) that they won't participate in funding the service.
HTTP/1.1 402 Payment Required
WWW-price: 0.0000001 BTC, 0.000001 ETH, 0.00001 DOGE
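HTTP does define 402 Payment Required, although the spec reserves it for future use and standardizes no payment mechanism; `WWW-price` is a hypothetical header, not a real one. Assuming the comma-separated `amount CURRENCY` format in the response above, a paying agent might parse it and check it against a per-currency budget like this (function names are illustrative):

```python
from decimal import Decimal
from typing import Dict, List

def parse_www_price(value: str) -> Dict[str, Decimal]:
    """Parse a hypothetical 'WWW-price' header value into {currency: amount}."""
    prices: Dict[str, Decimal] = {}
    for offer in value.split(","):
        amount, currency = offer.split()             # e.g. "0.000001 ETH"
        prices[currency.upper()] = Decimal(amount)   # Decimal avoids float rounding
    return prices

def payable_currencies(prices: Dict[str, Decimal], budget: Dict[str, Decimal]) -> List[str]:
    """Currencies the agent has a budget for and whose asking price fits in it."""
    return [cur for cur, amount in prices.items() if cur in budget and amount <= budget[cur]]
```

With a budget of `{"DOGE": Decimal("0.001")}`, the offer above would be payable in DOGE only.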
> You are less likely to participate in discussion

You (or an AI on your behalf) paid instead. Many sites would probably like that better.
If a store's business relies at least partially on an obscurity of information that automation can overcome (e.g. storefronts tend to push visitors towards products they don't want, while buyer agents fight that and look for what the buyer instructed them to find), then playing this cat-and-mouse game of blocking agents, finding workarounds, and repeating the cycle only creates perverse technological contraptions that neither party is really interested in, but that both are circumstantially forced to invest in.
Corporate America. Where clean code goes to die.
Mind you, I'm not saying electric scooters are a bad idea; I have one and I quite enjoy it. I'm saying we didn't need five fucking startups all competing to provide them at the lowest cost possible, just for two-thirds of them to end up in fucking landfills when the VC funding ran out.
At the moment I am using Perplexity's Comet browser to take a Spotify playlist and add all the tracks to my YouTube Music playlist. I love it.
If sites want to avoid people using agents, they should offer the functionality that people are using the agents to accomplish.
Excellent. Personal shoppers are 'adblock for IRL'.
>You owe the companies nothing. You especially don't owe them any courtesy. They have re-arranged the world to put themselves in front of you. They never asked for your permission, don't even start asking for theirs.
Everyone having a personal shopper obviously changes the relationship to the products and services you use or purchase via a personal shopper. Good, bad, whatever.
The point is that the web is changing, and people use a different type of browser now. And that browser happens to be LLMs.
Anybody complaining about the new browser just hasn't got it yet, or has and is trying to keep things the old way because they don't know how to, or won't, change with the times. We have seen it before: Kodak, Blockbuster, whatever.
Grow up, Cloudflare: some of your business models don't make sense anymore.
You say this as though all LLM or otherwise automated traffic is for the purpose of fulfilling a request made by a user 100% of the time, which is just flatly, on its face, untrue.
Companies make vast numbers of requests for indexing purposes. That could be to facilitate user requests someday, perhaps, but that is not what it does today, and it's not why it's happening. Worse still, LLMs introduce a new third option: the crawling is not for indexing or for later linking but is instead either for training the language model itself, or for the model to ingest and regurgitate later on with no attribution, with the added fun that it might just make some shit up about whatever you said and be wrong. And as the person paying for the web hosting, I subsidize all of that.
"The web is changing" does not mean every website must follow suit. Since I built my blog about 2 internet eternities ago, I have seen fad tech come and fad tech go. My blog remains more or less exactly what it was 2 decades ago, with more content and a better stylesheet. I have requested in my robots.txt that my content not be used for LLM training, and I fully expect that to be ignored because tech bros don't respect anyone, even fellow tech bros, when it means they have to change their behavior.
It's a clear road to disaster. I am honestly surprised by how great Hacker News is by comparison, where most people share things for the love of the craft. For that, Hacker News holds a special place in my heart. (Slightly exaggerating to give it a thematic ending, I suppose.)
They will be quite the wiser if they track or limit how often your shopper enters the store. You probably aren't entering the same store fifteen times every day, and neither would your shopper if it were only doing so on your behalf.
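Tracking and limiting visits per shopper is ordinary rate limiting. A minimal sketch of a per-client sliding-window limiter, assuming the site has some stable client key to count against (an IP address, API key, or declared agent; which one is a hypothetical choice here) and illustrative numbers such as 3 visits per day:

```python
from collections import defaultdict, deque
from typing import Deque, Dict

class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each client key."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client: str, now: float) -> bool:
        hits = self._hits[client]
        # Drop timestamps that have fallen out of the current window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # this client has used up its visits for the window
        hits.append(now)
        return True
```

With `SlidingWindowLimiter(limit=3, window=86400.0)`, a fourth visit from the same key within a day is refused, while a different key is counted separately. Passing `now` in explicitly (rather than reading a clock) keeps the sketch easy to test.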
Might does not make right.
It's like saying a web browser that is customized in any way is wrong. If one configures their browser to eagerly load links so that their next click is instant, is that now wrong?
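Eager link loading boils down to collecting the links on the current page and fetching some of them before the user clicks. A minimal sketch of the collection step with the standard-library `html.parser`; `links_to_prefetch` and its default `limit` of 5 are illustrative assumptions, not how any particular browser decides what to prefetch.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href> tags in page order."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))  # resolve relative links

def links_to_prefetch(html: str, base_url: str, limit: int = 5) -> List[str]:
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    unique = list(dict.fromkeys(collector.links))  # dedupe, keeping first-seen order
    return unique[:limit]
```

The `limit` parameter is where the disagreement lives: a small cap looks like a browser guessing the next click, while no cap looks more like fetching the front page plus every thread.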
That's called breaking and entering, and it's generally frowned upon -- bypassing the "closed" sign.