Even worse, they state that Brave Search will skip a page only if other search engines are also barred from indexing it. Morally, that is not their call to make. A publisher should have full control over which search engines index the website's content. That's the very heart of why the Robots Exclusion Protocol exists, and Brave is brazenly ignoring it.
Even worse than that, the Brave Search API allows you (for an extra fee) to get the content with a "license" to use the content for AI training. Who gave them the right to distribute the content that way?
I wrote about all this here:
https://searchengineland.com/crawlers-search-engines-generat...
and more references elsewhere in this thread:
https://news.ycombinator.com/item?id=36989129
Amusingly, while I was writing my article, this got posted to their forums, asking about how to block their crawler:
https://community.brave.com/t/stop-website-being-shown-in-br...
No reply so far.
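For reference, here is a minimal sketch of the per-crawler blocking that the Robots Exclusion Protocol is meant to allow, using Python's standard `urllib.robotparser`. The crawler token `ExampleBot` and the URL are made up for illustration; this is not Brave's token. The protocol is advisory, so it only helps when a crawler announces its own token and chooses to honor the rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "ExampleBot" is an invented crawler token used
# only for illustration. It blocks that one crawler and allows all others.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The named crawler is excluded; any other crawler falls through to "*".
print(allowed("ExampleBot", "https://example.com/post"))
print(allowed("OtherBot", "https://example.com/post"))
```

A crawler that sends no distinct token of its own can only be matched by the `*` group, which is exactly the complaint above.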
If you post something to the open web, what's it to you who reads it and how? You can block some IPs but that's about it.
I don't know if Brave has a knowledge graph - if they do, I would understand objecting if they filled it in with “stolen” content. But I don't see what's the problem with search.
By the way, isn't everyone's favourite archive.is doing the same thing?
I have no strong opinion on this, curious to hear counter arguments.
> A publisher should have full control to discriminate which search engine indexes the website's content
If you want someone to not see what you publish, block him yourself. Also, why would you want to do that? Do you want Google to own the web or something?
I share your concern about Google having this much power, and I'd add that Microsoft Bing is equally bad but gets away with it because they're smaller. Still, the final decision about which search engine indexes a website is purely the publisher's.
It simply doesn't sound right to say which tool a user can use. It's literally the same as arguing that you should be able to block Firefox from accessing your website and it's Mozilla's fault that they don't respect your wishes as a webmaster to block Firefox exclusively. Or that a VPN doesn't publish its IP addresses so that you can block it. Or a screen reader that processes the text to speech in a way that you disagree with.
Philosophically it seems intuitive to say "I should be able to block a third party that is abusing my site" but it's ignoring the broader context of what "open web" and "net neutrality" actually mean.
I run a service for podcasters. There are podcast apps and directories that either ignorantly make unnecessary requests for content or have software bugs that cause redownloads. I could trivially block them, but I don't because doing so penalizes the end user who is ultimately innocent, rather than the badly behaved service operator. The better solution is primitives like rate limiting, which I use liberally. Plus, blocking anyone literally has a direct effect of incentivizing centralization on Apple, Spotify, etc. and making the state of open tech in podcasting even worse.
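The comment doesn't say which rate-limiting scheme is in use. As a rough sketch, a per-client token bucket is one common choice: each client gets a small burst allowance that refills at a steady rate, so a buggy app is slowed down rather than blocked outright. The class names, the client identifiers, and the numbers below are all assumptions made for illustration.

```python
import time


class TokenBucket:
    """Allows bursts of up to `capacity` requests, refilled at `rate` per second."""

    def __init__(self, rate, capacity, now=None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start with a full bucket
        self.updated = time.monotonic() if now is None else now

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        # Refill for the time that has passed, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class RateLimiter:
    """Keeps one bucket per client (for example, per user agent or IP)."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.buckets = {}

    def allow(self, client_id, now=None):
        bucket = self.buckets.get(client_id)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, now)
            self.buckets[client_id] = bucket
        return bucket.allow(now)
```

A request that gets `False` would typically be answered with HTTP 429 and retried later, so the end user still gets the episode eventually.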
> the Brave search API allows you (for an extra fee) to get the content with a "license" to use the content for AI training? Who allowed them the right to distribute the content that way?
I don't think there's any court at this point that would back you up that freely published content annotated with full provenance cannot be scraped and published for a fee. Services like this have existed for decades. If you don't want your content scraped, put it behind a login. That goes double considering this only applies when you allow other search engines. And if you think Google and Bing aren't using your content to train AI, you're off your rocker.
1. User agents should identify themselves
2. A crawler is not a User agent - it's an agent for Brave
> I don't think there's any court at this point that would back you up that freely published content annotated with full provenance cannot be scraped and published for a fee.
You can't end-run copyright like this: just because something is publicly available doesn't mean anyone can redistribute it. Look at the legal issues & cases relating to Library Genesis.
What if I consider (some or any of) my ideas to be un-indexable, not directly suitable to representation in any hierarchy other than those I may set them in?
To me as a search engine end user, this kind of behavior is undesirable. Why would I want a website to selectively degrade my experience because of my choice in search engine or browser?
Brings back horrible flashbacks of “this website is only compatible with IE6”.
Also, these search crawls by the browser do not identify themselves beyond the Brave standard UA header, namely a plain Chrome user-agent string.
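To illustrate why that matters, here is a small sketch of the usual server-side check: look for a crawler token in the User-Agent header. The tokens listed are ones the major crawlers are known to send; the Chrome string is just an illustrative example of a browser user agent, and the claim that Brave's crawl looks like it is taken from the comment above rather than verified here.

```python
from typing import Optional

# Tokens these crawlers include in their User-Agent headers.
KNOWN_CRAWLER_TOKENS = ("Googlebot", "bingbot", "YandexBot", "Baiduspider")


def crawler_token(user_agent: str) -> Optional[str]:
    """Return the first known crawler token found in user_agent, else None."""
    ua = user_agent.lower()
    for token in KNOWN_CRAWLER_TOKENS:
        if token.lower() in ua:
            return token
    return None


GOOGLEBOT_UA = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)
# Illustrative browser string; exact version numbers are arbitrary.
PLAIN_CHROME_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36"
)

print(crawler_token(GOOGLEBOT_UA))     # a self-identifying crawler
print(crawler_token(PLAIN_CHROME_UA))  # indistinguishable from a person
```

With nothing to match on, neither a robots.txt rule nor a server-side filter can single out the crawl without also hitting ordinary Chrome users.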
How many Chrome users have opted in to sending data to Google?
Sometimes uninformed consent is not actually consent. These so-called "tech" companies love to toe that line.
Baidu
Bing
Brave
Google
Yandex
You can compare their results on this search comparison page I maintain: https://www.gnod.com/search/?engines=p,o,br,n,q&nw=1
(If you want to also search image libraries like Flickr and Pexels, click on "more engines" to select all places you want to search)
Bamboo sign
Give it a try on Google images. You'll see that nearly all the results are x-rays of people with ankylosing spondylitis, a form of which is commonly referred to as "bamboo spine".
I tested out Brave Search. About 90% of its results are correctly signs, though it does also show a few spines.
Google still shows incorrect results months later. It's by far the worst of all the search engines in the list for this simple and obvious search.
I actually like Brave here for my test better than Google. I typed in a few cities, just wanting to see the skylines and such.
Brave gives me good photos, some stock photos, etc. Google gives me pictures from recent news articles, which isn't overly helpful IMO.
That already makes it worthy of support.
But Google becoming so bad of late has made switching quite easy. Even if Brave is not improving super fast, Google is unfortunately getting worse quickly enough to make up for it.
What are the censored image searches you found?
See: https://help.kagi.com/kagi/why-kagi/kagi-vs-google.html
cf. yep.com
Brave Search was my default for quite a while, until a few weeks after they got rid of Bing results. As soon as that happened, stuff just wasn't showing up that I'd expect to be there, and 90% of the time I'd follow up searches with !g or !ddg just to get something decent to show up. The index just felt severely lacking, or the relevancy was pretty far off base.
Would you say search has greatly improved over the past month or two?
A recent example from my search history: "doors of stone release date". The author has announced a new novella releasing Nov. 2023, but not the actual book Doors of Stone. The Google infobox gets this wrong, but the first result is correct. Brave accidentally gets it right that there's no release date for the book, but misses the novella announcement, and all but one of the results are blogspam.
I'm so skeptical that I'm just now starting to develop a feeling of trust towards DuckDuckGo.
In the browser domain, Mozilla is the only company of which I feel that it is genuinely pro-customer.
So I give all my stuff to Google and hope that they at least just protect it from hackers, while I am aware that they analyze my data in order to see how they can monetize me better, but at least with anonymity in regards to 3rd parties. I just hope I'm not wrong.
My god, there is so much telemetry in FF now, and it's tricky to hunt down all the about:configs to disable it. Not friendly or privacy conscious at all. Do you really want Mozilla to get pinged with your IP address every time your browser process starts and exits? Yuck!
Quick FF enshittification example from 2 months ago:
Alert HN: Mozilla puts advertising into Firefox AGAIN
https://news.ycombinator.com/item?id=36351322 (48 comments)
Mozilla stops Firefox fullscreen VPN ads after user outrage
https://news.ycombinator.com/item?id=36085642 (220 comments)
They literally can't win. One group of vocal users is outraged by how much money Mozilla takes from Google, while the other screams about how Mozilla is trying to (gasp) gain additional revenue streams that don't involve taking money from their biggest competitor.
> Do you really want Mozilla to get pinged with your IP address every time your browser process starts and exits? Yuck!
No, but I really don't give a shit either. At a certain point, I looked in the mirror and said life is too short to care about stupid shit like that. If I was a spy or a journalist in some state like China or Iran, maybe I would care. But this feels odd to hone in on when any website you go to is collecting all sorts of info of this sort.
Mozilla doesn't go about it in as upfront a way as Brave does, IME, but stuff like VPN, Pocket and other browser-related services I mind not at all.
I have no sympathy for the current political shitfest that Mozilla is as an organization, but as makers of Firefox I feel like Mozilla is in an impossible bind: Their users expect a fairytale of an independent, donation-funded browser that people spontaneously adopt, and go nuts about stuff like the inclusion of Pocket. I know, I used to be one of those people back when Pocket was introduced. But reeing about Mozilla trying to have independent funding by giving people useful services is just strange. It's exactly what they should be doing, and Brave setting up revenue streams like Talk and Search is great. Especially because they operate in the normal money universe for those of us who aren't terribly enthusiastic about crypto.
I'm curious to see what you think about this. If you're not okay with Firefox telling Mozilla your IP address every time you connect, does the same go for Brave sending entire pages of your search results to them? This also includes which results you've clicked on.
The thing is that it will always get worse. Every company needs to grow in order to stay alive and eventually quality and your user experience will suffer because of this, but here's the thing: Always choose the upstart competition but just be prepared to jump to the next up-and-comer after that. For me, I found DuckDuckGo getting worse over time (probably not on purpose, just spam) and somehow Brave is better so I'm sticking with that, but as soon as Brave decides to fuck me (and they will!) then I'll be jumping to whoever the new underdog is at that time.
On the other hand, the Google lawyers seem to have found an excuse to link some proprietary code into Chrome (that's not part of Chromium). Does anybody know what that excuse is, and if it provides a loophole large enough to close off Chrome development?
It also explains why Brave Search is slow today I suspect... :)
Usually google+tineye+bing+yandex will yield some useful results
It surprisingly comes up for image searching. Google for example has been known to censor images of Tiananmen Square.
(though I've recently switched away from Brave Search, since the goggles subscriptions (the reason I switched to Brave) put a big bar right at the top, and the whole top settings+goggles area shifts the search results on load!)
Now... I would just like to see the full https url on searches.
I already love the look, Brave summarizer AI and general results!
The current policies don't allow it, sure. But then there will be gradual changes to those policies, and all sorts of dark patterns that make 99% of users leak their data (think "enabled by default but opt-out").
Is there any guarantee against this?
You have to realize data is worth less over time -- a lot less. So the issue would be whether users stick if (heaven forbid) a change of control left Brave in the hands of unethical owners. By design, open source, network sniffing, and other auditors would catch wise and flame such a corrupt Brave into the ground. That's the best we can do here. There is no safety property ("X" holds for all future program states) enforceable on a company.