Perplexity Response to Cloudflare (opens in new tab)

(twitter.com)

56 pointsTokumei-no-hito9mo ago45 comments

45 comments

> When companies like Cloudflare mischaracterize user-driven AI assistants as malicious bots, they're arguing that any automated tool serving users should be suspect

Strawmen. They aren't arguing that any automated tool should be suspect. They are arguing that an automated tool with sufficient computing power should be suspect. By Perplexity's reasoning, I should be able to set up a huge server farm and hit any website with 1,000,000 requests per second because 1 request is not seen as harmful. In this case, of course, the danger with AI is not a DOS attack but an attack against the way the internet is structured and the way website are supposed to work.

> This overblocking hurts everyone. Consider someone using AI to research medical conditions,

Of course you will put medical conditions in there: appeal to the hypothetical person with a medical problem, a rather contemptible and revolting argument.

> This undermines user choice

What happens to user choice when website designers stop making websites or writing for websites because the lack of direct interaction makes it no longer worthwile?

> An AI assistant works just like a human assistant.

That's like saying a Ferarri works like someone walking. Yes, they go from A to B, but the Ferarri can go 400km down a highway much faster than a human. So, no, it has fundamental speed and power differences that change the way the ecosystem works, and you can't ignore the ecosystem.

> This controversy reveals that Cloudflare's systems are fundamentally inadequate for distinguishing between legitimate AI assistants and actual threats.

As a website designer and writer, I consider all AI assistants to be actual threats, along with the entirety of Perplexity and all AI companies. And I'm not the only one: many content creators feel the same and hope your AI assistants are neutralized with as much extreme prejudice as possible.

taskforcegemini9mo ago

As a sysadmin occasionally responsible for resolving load-spikes caused by bots/crawlers: well said!

viraptor9mo ago

> By Perplexity's reasoning, I should be able to set up a huge server farm and hit any website with 1,000,000 requests per second because 1 request is not seen as harmful.

That's a slippery slope all the was to absurd. They're not talking about millions of requests a second. They're talking about a browsing session (few page views) as a result of user's action. It's not even additional traffic and there's no extra concurrency - it's likely the same requests a user would make just with shorter delay.

vouaobrasil9mo ago

> That's a slippery slope all the was to absurd. They're not talking about millions of requests a second. They're talking about a browsing session (few page views) [...]

My statement was meant as an analogy. I'm not saying an argument against Perplexity and agents is about requests per second. I'm saying there's an analogous argument: that the power of AI to transform the browsing experience is akin to the power of a server farm and thus a net negative. Therefore, your interpretation of what I was saying is wrong.

just-tom9mo ago

While agents act on behalf of the user, they won't see nor click any ads; they won't sign-up to any newsletter; they won't buy the website owner a coffee. They don't act as humans just because humans triggered them. They simply take what they need and walk away.

1gn159mo ago

Do you have an ad-blocker? If so, should website owners be able to disable your ad-blocker via a setting they send to you? It's their content, after all.

Rebelgecko9mo ago

As a frugal person it's frustrating when websites block me for using an ad blocker, but I also don't think publishers should be required to send me data

beardyw9mo ago

They may, and do, refuse to send you the page. That is a more realistic parallel.

Without advertising the web would be largely unsupportable financially without per site subscriptions.

skeledrew9mo ago

Without advertising, the web would be a far healthier place today, free of click-bait and ad-infested, low effort, low quality content. Bad business models wouldn't've survived the light of day.

Tokumei-no-hitoOP9mo ago

fair point.

where's the front page CF callout for google search agent? they wouldn't dare. i don't remember the shaming for ad and newsletter pop up blockers.

that being said, agree with you that sites are not being used the way they were intended. i think this is part of the evolution of the web. it all began with no monetization, then swung far too much into it to the point of abuse. and now legitimate content creators are stuck in the middle.

what i disagree on is that CF has the right to, again allegedly, shame perplexity on false information. especially when OAI is solving captchas and google is also "misusing" websites.

i wish i had an answer to how we can evolve the web sustainably. my main gripe is the shaming and virtue signaling.

skybrian9mo ago

As far as I know Google's bot respects robots.txt and doesn't try to evade detection?

Tokumei-no-hitoOP9mo ago

maybe. the allegations against perplexity are being challenged and i haven't seen any research on google agent. CF can demonstrate nonpartisanship, and gain credence for their claims against perplexity, by being transparent about other players in the space.

(as an aside, not to shift the goalpost to the elephant in the room, but i didn't see any blog posts on the shameless consumption of every single thing on the internet by OAI, google and anthropic. talk about misuse..)

skeledrew9mo ago

Funnily enough this would also mean some humans are not human, like me. I exhibit that behavior exactly as well. Maybe I'm an agent acting on behalf of myself, whatever that means.

skybrian9mo ago

Yep, also true of something like curl -s $url | llm --system 'summarize this article'

vouaobrasil9mo ago

It's also true that you could dismantle a building with a hammer, which accomplishes the same as dynamite. So why not just sell dynamite at the local hardware store along with hammers?

lofaszvanitt9mo ago

And Cloudflare is the police entity to enforce this, right?

avallach9mo ago

Cloudflare did explain a proper solution: "Separate bots for separate activities". E.g. here: one bot for scraping/indexing, and one for non-persistent user-driven retrieval.

Website owners have a right to block both if they wish. Isn't it obvious that bypassing a bot block is a violation of the owners right to decide whom to admit?

Perplexity's almost seems to believe that "robots.txt was only made for scraping bots, so if our bot is not scraping, it's fair for us to ignore it and bypass the enforcement". And their core business is a bot, so they really should have known better.

viraptor9mo ago

They're already doing that https://docs.perplexity.ai/guides/bots There's PerplexityBot and Perplexity‑User.

avallach9mo ago

And then once they see that the website operator blocked the perplexity-user, apparently instead of respecting that, they not only ignore robots.txt, but actively try to bypass the security measures established with the explicit purpose of limiting their access. If this was about bypassing DRM rather than AI-WAF, it would be plainly illegal.

To me this invalidates their whole claim that Cloudflare fails to tell the difference between scraper and user-driven agent. Instead, distinguishing them is trivial, and the block is intentional.

skeledrew9mo ago

I use Perplexity regularly for research because it does a good job accessing, preprocessing and citing relevant resources. Which do you think is better: the service respects my desire for it to do a good job and ignore site owners blocking agent access because "don't like automated agents", or the service respects said site owners' - what I consider unreasonable - desires and not do a good job for me? Expand to the inevitably increasing LLM-for-research user base.

1 more reply

skeledrew9mo ago

> bypassing a bot block is a violation of the owners right to decide whom to admit?

There is only a violation if the bot finds a way around a login block. Same for human. But whatever is on the public web is... public. For all.

hunter2_9mo ago

So it's ok to block someone "because you didn't include a session token I gave you in exchange for knowing the password" but it's not ok to block someone "because you didn't stick to manually-operated user agents as I told you via robots.txt"? What about not letting someone play level 42 "because you didn't complete level 41"?

A web server providing a response to your request is akin to a restaurant server doing the same. Except for specific situations related to civil rights, they are free to not deal with you for any reason.

skeledrew9mo ago

Typically when something is behind a login, it denotes a private space intended for a particular set of persons given explicit access. It's senseless to block people from using agents if the same people would otherwise have access, unless there is an abuse of that access, ie. action which is to the detriment of the space. And though some of that does happen, it obviously isn't the full story. I have a Perplexica instance running locally that I sometimes use (but often don't as Perplexity does a much better job). Should that also be blocked?

Hmm maybe a civil case could be potentially made here too, re disability. By blocking LLM use, sites are reducing the ability of select users to reasonably interact with the content. Just could become a thing in a few years if this nonsense continues.

SilverElfin9mo ago

> When companies like Cloudflare mischaracterize user-driven AI assistants as malicious bots, they're arguing that any automated tool serving users should be suspect—a position that would criminalize email clients and web browsers, or any other service a would-be gatekeeper decided they don’t like.

I wonder if Perplexity or others mix the traffic of the two types so they’re indistinguishable, specifically to make this argument.

avallach9mo ago

@viraptor above mentions that they actually do try first with an explicit perplexity-agent: https://news.ycombinator.com/item?id=44797682 . So there's no ambiguity. The worst they could accuse Cloudflare, is that they don't give website owners an easy way to only block scrapers while allowing user-driven agents (do they?).

astrange9mo ago

Why did Perplexity use ChatGPT to write this? That's a competitor.

Or are they just so bad at writing that their own style looks like it?

ronsor9mo ago

Perplexity uses third-party models. They aren't a frontier AI lab like OpenAI, Anthropic, or DeepSeek.

astrange9mo ago

The main thing I use ChatGPT for is web searches.

bobbiechen9mo ago

Perplexity claims their traffic was confused with Browserbase's - I think this is inevitable at scale without better ways to identify traffic (or more specifically in this case, AI agents / fetchers), based on working in this space.

Zooming out for a second, we might be in an analogous era to open email relays. In a few years, will you need to run an agent through a big service provider because other big service providers only trust each other?

Traster9mo ago

Perplexity has really convinced me about this. There is a clear difference between automated bots scraping data at bulk for later use, and automated bots working on behalf of users on direct requests. I can see a reasonable argument that some of the first type of automation could be tolerable for websites with strict limits, the second type I think by default should not be tolerated at all.

Perplexity's value proposition appears to be "we're going to take the stuff off your website, and present it to our users. We're not going to show them your ads, we're not going to offer them your premium services or referrals to other products, we're going to strip out the value from your content and take it for our users".

You can argue all you want about whether that's 5k impressions a day or 1m impressions a day. It should be 0 impressions a day. It is literally just free-riding.

Also, they're meant to be a professional company taking VC money to build a business, why are writing whiny posts like a teenager? The impression I get with a lot of these companies is that their business is losing money hand over fist, they have no idea how they're going to make it work and they look absolutely panicked as a result. They come across like a company I would want to be nowhere near.

skeledrew9mo ago

> Perplexity's value proposition appears to be "we're going to take the stuff off your website, and present it to our users. We're not going to show them your ads, we're not going to offer them your premium services or referrals to other products, we're going to strip out the value from your content and take it for our users".

This, exactly this is a primary reason why I use Perplexity. I want the valued content, without the unnecessary distractions that I'll never consciously touch anyway (there have been accidental clicks now and then, because some site designers really want people to click that ad and go all out to embed it into the content, and it only leads to great annoyance and sometimes a promise never to visit that site again).

Traster9mo ago

Yes and the result will be one of two options: option a (more likely) the underlying sites will literally just disappear, their business model no longer works and the content that you want (but apparently not enough to respect the authors) will cease to exist. It will most likely be replaced with AI slop replicas of the content you wanted. Or option b (much less likely) the content you want will move behind premium services where AI companies will have to negotiate subscriptions you will have entered the cable TV bundle era of the internet.

skeledrew9mo ago

Oh there's also part c where, once a or b happens, it'll clear the way for quality non-SEO content by creators who just want to share with 0 expectation of any return, which will finally see light once the return-optimized stuff has died or been walled away. The internet will be back to how it was before and meant to be with content shared for fun and/or interest, not profit, taking the front scene.

bwfan1239mo ago

google was also accused in its early days of free riding on other people's content - google news still remains controversial. Also, in its early days, google did not have to deal with a large counter-party like cloudflare which is now a gatekeeper of sorts.

The problem I see for chatgpt/perplexity and the like is this: for good responses to many questions, they have to index the web real-time. ie, they become a search-engine. However, they cannot share revenue with the content-providers since they dont have an ad-model. I wonder how this would be resolved - perhaps thru content licensing with the large publishers.

minimaxir9mo ago

Official post on their company blog (no idea why they reposted it on Twitter verbatim, that's bad SEO, ironically): https://www.perplexity.ai/hub/blog/agents-or-bots-making-sen...

posperson9mo ago

Perplexity using Cloudflare on their own website with the WAF security settings turned up is quite ironic

Tokumei-no-hitoOP9mo ago

if what they're saying is true then this was a huge fuckup on CFs part. i was already a bit suspicious when they started gloating about OAI agent since it's been shown to literally state that it's solving a captcha to complete the task.

i guess it will come down to browserbase corroborating the claims.

anon70009mo ago

Hm, it’s still a tricky question. Do web admins have a right to block agentic AI? Of course they do. They have a right to block whoever they want. Distinguishing between scraper and agent is important, yes, but agents can still produce a lot more traffic than the average human, and it’s not like the agents are optimizing their access patterns.

Tokumei-no-hitoOP9mo ago

i agree. i think the point i agree with perplexity on is that CF is a central authority that is claiming to be the gatekeeper while (allegedly) being disingenuous and incapable.

i also tend to agree with the concept that scraping != consuming on behalf of a user. they explicitly point out that they do not store the data for training, which would fall under scraping by proxy.

ChrisArchitect9mo ago

Responding to:

Perplexity is using stealth, undeclared crawlers to evade no-crawl directives

https://news.ycombinator.com/item?id=44785636

MitPitt9mo ago

Recently I've been unable to beat cloudflare's captcha at all, locking me out of many services. Modern problems require modern solutions, and it shouldn't hurt regular users.

faragon9mo ago

That should be a property of a web site. E.g. the web site using a property in the Cloudflare configuration. That way there would be competition between websites allowing or not IA agents on user accounts.

j / k navigate · click thread line to collapse

45 comments

vouaobrasil9mo ago

> When companies like Cloudflare mischaracterize user-driven AI assistants as malicious bots, they're arguing that any automated tool serving users should be suspect

> This overblocking hurts everyone. Consider someone using AI to research medical conditions,

Of course you will put medical conditions in there: appeal to the hypothetical person with a medical problem, a rather contemptible and revolting argument.

> This undermines user choice

What happens to user choice when website designers stop making websites or writing for websites because the lack of direct interaction makes it no longer worthwile?

> An AI assistant works just like a human assistant.

> This controversy reveals that Cloudflare's systems are fundamentally inadequate for distinguishing between legitimate AI assistants and actual threats.

taskforcegemini9mo ago

As a sysadmin occasionally responsible for resolving load-spikes caused by bots/crawlers: well said!

viraptor9mo ago

> By Perplexity's reasoning, I should be able to set up a huge server farm and hit any website with 1,000,000 requests per second because 1 request is not seen as harmful.

vouaobrasil9mo ago

> That's a slippery slope all the was to absurd. They're not talking about millions of requests a second. They're talking about a browsing session (few page views) [...]

just-tom9mo ago

1gn159mo ago

Do you have an ad-blocker? If so, should website owners be able to disable your ad-blocker via a setting they send to you? It's their content, after all.

Rebelgecko9mo ago

As a frugal person it's frustrating when websites block me for using an ad blocker, but I also don't think publishers should be required to send me data

beardyw9mo ago

They may, and do, refuse to send you the page. That is a more realistic parallel.

Without advertising the web would be largely unsupportable financially without per site subscriptions.

skeledrew9mo ago

Without advertising, the web would be a far healthier place today, free of click-bait and ad-infested, low effort, low quality content. Bad business models wouldn't've survived the light of day.

Tokumei-no-hitoOP9mo ago

fair point.

where's the front page CF callout for google search agent? they wouldn't dare. i don't remember the shaming for ad and newsletter pop up blockers.

what i disagree on is that CF has the right to, again allegedly, shame perplexity on false information. especially when OAI is solving captchas and google is also "misusing" websites.

i wish i had an answer to how we can evolve the web sustainably. my main gripe is the shaming and virtue signaling.

skybrian9mo ago

As far as I know Google's bot respects robots.txt and doesn't try to evade detection?

Tokumei-no-hitoOP9mo ago

skeledrew9mo ago

Funnily enough this would also mean some humans are not human, like me. I exhibit that behavior exactly as well. Maybe I'm an agent acting on behalf of myself, whatever that means.

skybrian9mo ago

Yep, also true of something like curl -s $url | llm --system 'summarize this article'

vouaobrasil9mo ago

It's also true that you could dismantle a building with a hammer, which accomplishes the same as dynamite. So why not just sell dynamite at the local hardware store along with hammers?

lofaszvanitt9mo ago

And Cloudflare is the police entity to enforce this, right?

avallach9mo ago

Cloudflare did explain a proper solution: "Separate bots for separate activities". E.g. here: one bot for scraping/indexing, and one for non-persistent user-driven retrieval.

Website owners have a right to block both if they wish. Isn't it obvious that bypassing a bot block is a violation of the owners right to decide whom to admit?

viraptor9mo ago

They're already doing that https://docs.perplexity.ai/guides/bots There's PerplexityBot and Perplexity‑User.

avallach9mo ago

To me this invalidates their whole claim that Cloudflare fails to tell the difference between scraper and user-driven agent. Instead, distinguishing them is trivial, and the block is intentional.

skeledrew9mo ago

1 more reply

skeledrew9mo ago

> bypassing a bot block is a violation of the owners right to decide whom to admit?

There is only a violation if the bot finds a way around a login block. Same for human. But whatever is on the public web is... public. For all.

hunter2_9mo ago

skeledrew9mo ago

SilverElfin9mo ago

I wonder if Perplexity or others mix the traffic of the two types so they’re indistinguishable, specifically to make this argument.

avallach9mo ago

astrange9mo ago

Why did Perplexity use ChatGPT to write this? That's a competitor.

Or are they just so bad at writing that their own style looks like it?

ronsor9mo ago

Perplexity uses third-party models. They aren't a frontier AI lab like OpenAI, Anthropic, or DeepSeek.

astrange9mo ago

The main thing I use ChatGPT for is web searches.

bobbiechen9mo ago

Traster9mo ago

You can argue all you want about whether that's 5k impressions a day or 1m impressions a day. It should be 0 impressions a day. It is literally just free-riding.

skeledrew9mo ago

Traster9mo ago

skeledrew9mo ago

bwfan1239mo ago

minimaxir9mo ago

Official post on their company blog (no idea why they reposted it on Twitter verbatim, that's bad SEO, ironically): https://www.perplexity.ai/hub/blog/agents-or-bots-making-sen...

posperson9mo ago

Perplexity using Cloudflare on their own website with the WAF security settings turned up is quite ironic

Tokumei-no-hitoOP9mo ago

i guess it will come down to browserbase corroborating the claims.

anon70009mo ago

Tokumei-no-hitoOP9mo ago

i agree. i think the point i agree with perplexity on is that CF is a central authority that is claiming to be the gatekeeper while (allegedly) being disingenuous and incapable.

i also tend to agree with the concept that scraping != consuming on behalf of a user. they explicitly point out that they do not store the data for training, which would fall under scraping by proxy.

ChrisArchitect9mo ago

Responding to:

Perplexity is using stealth, undeclared crawlers to evade no-crawl directives

https://news.ycombinator.com/item?id=44785636

MitPitt9mo ago

Recently I've been unable to beat cloudflare's captcha at all, locking me out of many services. Modern problems require modern solutions, and it shouldn't hurt regular users.

faragon9mo ago

j / k navigate · click thread line to collapse