Brave Search launches own image and video search

(brave.com)

245 points by gripfx 2y ago | 162 comments

The major problem with Brave search is their position about indexing and licensing content against the wishes of the website publisher. Their robot does not identify itself, meaning the publisher cannot use the standard robots.txt to block its crawling if the publisher so wishes. Incidentally, the robots.txt file has been used in court cases litigating whether a search engine is legal or not.

Even worse, they state that Brave search won't index a page only if other search engines are not allowed to index it. It is morally not their right to make that call. A publisher should have full control to discriminate which search engine indexes the website's content. That's the very heart of why the Robots Exclusion Protocol exists, and Brave is brazenly ignoring it.

Even worse than that, the Brave search API allows you (for an extra fee) to get the content with a "license" to use the content for AI training? Who allowed them the right to distribute the content that way?

I wrote about all this here:

https://searchengineland.com/crawlers-search-engines-generat...

and more references elsewhere in this thread:

https://news.ycombinator.com/item?id=36989129

Amusingly, while I was writing my article, this got posted to their forums, asking about how to block their crawler:

https://community.brave.com/t/stop-website-being-shown-in-br...

No reply so far.
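
To make the robots.txt point concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the agent name `SomeUnnamedCrawler`, and the Chrome-like user-agent string are hypothetical. The file is advisory and is evaluated by the crawler itself: each crawler looks for the group whose `User-agent` token matches its own name, and falls back to the `*` group when none does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: admit Bingbot, refuse Googlebot, refuse everyone else.
ROBOTS_TXT = """\
User-agent: Bingbot
Allow: /

User-agent: Googlebot
Disallow: /

User-agent: *
Disallow: /
"""

# A stock desktop Chrome user-agent string (version number is illustrative).
CHROME_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
             "(KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36")

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/article"
for agent in ("Bingbot", "Googlebot", "SomeUnnamedCrawler", CHROME_UA):
    # can_fetch() uses the first group whose token matches the agent name,
    # otherwise the "*" group.
    print(agent.split("/")[0], parser.can_fetch(agent, url))
```

Because a crawler without its own token can only be addressed through `*`, the publisher cannot refuse it alone without also refusing every other crawler that lacks a named group.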

yreg 2y ago

Hmm, I don't know, it doesn't seem obvious to me that it is unethical to disobey the publisher's wishes.

If you post something to the open web, what's it to you who reads it and how? You can block some IPs but that's about it.

I don't know if Brave has a knowledge graph - if they do, I would understand objecting if they filled it in with “stolen” content. But I don't see what's the problem with search.

By the way, isn't everyone's favourite archive.is doing the same thing?

I have no strong opinion on this, curious to hear counter arguments.

CaptainFever 2y ago

I’m just thinking that if website publishers are able to legally allow Googlebot but block other bots, it might contribute to the Google monopoly.

pierrefar 2y ago

That would be bad, and it is already bad that Google and Microsoft control so much of search queries, but the decision about which search engine indexes a website is purely the publisher's.

jaharios 2y ago

This makes me want to use Brave search now. When I use a tool, I expect it to serve me, not the material it provides.

> A publisher should have full control to discriminate which search engine indexes the website's content

If you don't want someone to see what you publish, block them yourself. Also, why would you want to do that? Do you want Google to own the web or something?

pierrefar 2y ago

There is a difference between a human being able to access content vs a search engine indexing it (and in the case of Brave, "licensing" it on).

I share your concern about Google having this much power, and I'd add that Microsoft Bing is equally bad but gets away with it because they're smaller. Still, the final decision about which search engine indexes a website is purely the publisher's.

memefrog 2y ago

There is a difference between Americans and Chinese people, but that doesn't mean discriminating on racial lines is justified. Just saying "there is a difference" isn't an argument. Indexing a website isn't the same as reading it, but it is a form of consumption, and I see no reason why they should be treated differently.

And to use that analogy even further, if you want to block Chinese visitors you block Chinese IPs. You do not add a file called "countries.txt" containing "China block" and then expect Chinese users to see it and voluntarily cease to use it, and threaten to sue them if they don't.

Repeatedly asserting that "the final decision is with the publisher" is stupid. That is the point you seem to want to defend. Defend it! Give us a reason. Just saying the same thing over and over again doesn't make it true.

Grimburger 2y ago

> There is a difference between a human being able to access content vs a search engine indexing it

Much of the problem with search today arises from websites showing Googlebot what it wants to see and showing real users something else. I have to manually remove entire domains from Google search results because they often appear first yet don't show any content unless I sign up for an account. Clearly that's not what they are showing to Google.

There should be no differentiation between a crawler and a human being with regards to what is being served.

waithuh 2y ago

If the website owner wants google to own the web, they should be able to restrict their website. Nothing wrong with that.

bastawhiz 2y ago

Let's say I pay for Kagi. Kagi is a tool that I'm using to avoid doing hard work manually. With relatively few exceptions, I can probably accomplish what I use a search engine for manually, but with much more time and effort. So I'm paying for a tool to assist me with my use of the web. A "user agent", you might even say.

It simply doesn't sound right to dictate which tool a user can use. It's literally the same as arguing that you should be able to block Firefox from accessing your website and it's Mozilla's fault that they don't respect your wishes as a webmaster to block Firefox exclusively. Or that a VPN doesn't publish its IP addresses so that you can block it. Or a screen reader that processes the text to speech in a way that you disagree with.

Philosophically it seems intuitive to say "I should be able to block a third party that is abusing my site" but it's ignoring the broader context of what "open web" and "net neutrality" actually mean.

I run a service for podcasters. There are podcast apps and directories that either ignorantly make unnecessary requests for content or have software bugs that cause redownloads. I could trivially block them, but I don't because doing so penalizes the end user who is ultimately innocent, rather than the badly behaved service operator. The better solution is primitives like rate limiting, which I use liberally. Plus, blocking anyone literally has a direct effect of incentivizing centralization on Apple, Spotify, etc. and making the state of open tech in podcasting even worse.

> the Brave search API allows you (for an extra fee) to get the content with a "license" to use the content for AI training? Who allowed them the right to distribute the content that way?

I don't think any court at this point would back you up on the claim that freely published content, annotated with full provenance, cannot be scraped and published for a fee. Services like this have existed for decades. If you don't want your content scraped, put it behind a login. That's especially true since this only applies when you already allow other search engines, and if you think Google and Bing aren't using your content to train AI, you're off your rocker.
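
For readers unfamiliar with the rate-limiting primitive mentioned in the podcast paragraph above, a token bucket is one common way to implement it. This is a minimal single-process sketch; the rate, burst size and client keys are hypothetical, and it says nothing about how the commenter's own service does it.

```python
import time
from typing import Dict, Optional

class TokenBucket:
    """Permit bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, now: Optional[float] = None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Refill in proportion to elapsed time, never beyond the burst capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(float(self.capacity), self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# One bucket per client key (e.g. an IP address or an app's user-agent string).
_buckets: Dict[str, TokenBucket] = {}

def allow_request(client_key: str, now: Optional[float] = None) -> bool:
    now = time.monotonic() if now is None else now
    bucket = _buckets.get(client_key)
    if bucket is None:
        # Hypothetical policy: bursts of 3, then 1 request per second.
        bucket = _buckets[client_key] = TokenBucket(rate=1.0, capacity=3, now=now)
    return bucket.allow(now)
```

An app that re-downloads in a tight loop gets throttled once its burst is spent, while every other client keeps its own budget, which is the property the comment prefers over blocking outright.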

sangnoir 2y ago

> With relatively few exceptions, I can probably accomplish what I use a search engine for manually, but with much more time and effort. So I'm paying for a tool to assist me with my use of the web. A "user agent", you might even say

1. User agents should identify themselves

2. A crawler is not a User agent - it's an agent for Brave

>I don't think there's any court at this point that would back you up that freely published content annotated with full provenance cannot be scraped and published for a fee.

You can't end-run copyright like this: just because something is publicly available doesn't mean anyone can redistribute it. Look at the legal issues & cases relating to Library Genesis.

bastawhiz 2y ago

> User agents should identify themselves

There is no rule that this is true, and many user agents exist _specifically to not be identified_. See Tor and other privacy-centric user agents.

> A crawler is not a User agent - it's an agent for Brave

You know, I thought "what does Wikipedia have to say on this matter?" and sure enough:

> Examples include all common web browsers, such as Google Chrome, Mozilla Firefox, and Safari, as well as some email readers, command-line utilities like cURL, and arguably headless services that power part of a larger application, such as a web crawler.

I can't even make that up.

> just because something is publicly available doesn't mean anyone can redistribute it

You're confusing reselling content with providing access to it. By your logic, caching proxy servers would be illegal on the grounds of copyright. The physical act of downloading files necessarily creates copies of the data every step of the journey from the source server to you. There's a material difference between paying someone for a copy of some content and paying someone to fetch content for you on your behalf. Nothing about copyright law specifically requires that the person physically acquiring the content be the one who ends up consuming it.

memefrog 2y ago

Downloading something isn't redistributing it. It is your website. You provide what is on it to me. I send you an HTTP request. You don't have to respond. You do. I am not copying anything. Copyright simply isn't engaged at any point in this process.

cvalka 2y ago

They are right and you are wrong. If some web page is publicly available, it should be indexed. Scraping neutrality, please.

waithuh 2y ago

Heavily disagree. I own the server, thus the website. I should be able to allow or disallow any type of web crawler/scraper I want. Similar to how you can't easily regulate what's in a website without lawsuits and takedowns, you can't regulate how discoverable a website is.

ric2b 2y ago

> I should be able to allow or disallow any type of web crawler/scraper i want.

You're certainly allowed to try, but I don't see why indexers should be mandated to collaborate with you. They serve their users, not you.

tympious 2y ago

What if I want what I publish to be known only by word of mouth?

What if I consider (some or any of) my ideas to be un-indexable, not directly suitable to representation in any hierarchy other than those I may set them in?

Grimburger 2y ago

Then you should hide them behind a URL that isn't linked anywhere else on your site, which you can then easily propagate by word of mouth only.

    example.com/correcthorsebatterystaple

If you consider "word of mouth" to be public posts on a forum which millions can read at any time, then block Googlebot's IPs.
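
If you do go the unlinked-URL route, a random token is harder to guess than a memorable phrase. A minimal sketch using Python's `secrets` module; the `/notes/` prefix is a hypothetical example. An unlisted URL is obscurity, not access control: it can still leak through shared links, logs or browser history.

```python
import secrets

def unlisted_path(prefix: str = "/notes/") -> str:
    """Return a hard-to-guess path: the prefix plus 22 random URL-safe characters."""
    # token_urlsafe(16) encodes 16 random bytes (128 bits) as unpadded base64url.
    return prefix + secrets.token_urlsafe(16)

print(unlisted_path())
```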

memefrog 2y ago

Then don't publish it.

vGPU 2y ago

> The Robots Exclusion Protocol is a mechanism for publishers to discriminate between what users and crawlers are allowed to access, and discriminate between different crawlers (for example, allow Bingbot to crawl but not Googlebot).

To me as a search engine end user, this kind of behavior is undesirable. Why would I want a website to selectively degrade my experience because of my choice in search engine or browser?

Brings back horrible flashbacks of “this website is only compatible with IE6”.

1vuio0pswjnm7 2y ago

Curious why one cannot selectively block by IP address instead of by user-agent string. According to the HTTP specification, UA is not a required header. There is certainly no technical requirement for it in order to process HTTP requests. Of course, any website could block requests that lack a UA header. I never send one and it's relatively rare IME to see a site require it, but it's certainly possible.

pierrefar 2y ago

This is explained in more detail in the article I referred to, but briefly: Brave delegates crawling to normal Brave browsers, so it's a huge pool of IP addresses, not a single IP address or range.

Also, these search crawls by the browser do not identify themselves beyond the Brave standard UA header, namely a plain Chrome user-agent string.
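
The two blocking strategies in this exchange can be sketched with Python's `ipaddress` module. Everything here is hypothetical: the blocked network is an RFC 5737 documentation prefix and `examplebot` is a made-up crawler token. The sketch only shows why both checks miss a fetch that arrives from an arbitrary address with a stock browser UA, or with no UA at all.

```python
import ipaddress
from typing import Optional

# Hypothetical blocklists: one documentation-only prefix and one made-up crawler token.
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
BLOCKED_UA_TOKENS = ("examplebot",)

CHROME_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
             "(KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36")

def is_blocked(client_ip: str, user_agent: Optional[str]) -> bool:
    """Block if the address is in a listed network or the UA contains a listed token."""
    addr = ipaddress.ip_address(client_ip)
    if any(addr in net for net in BLOCKED_NETWORKS):
        return True
    # The User-Agent header may be absent entirely; treat that as an empty string.
    ua = (user_agent or "").lower()
    return any(token in ua for token in BLOCKED_UA_TOKENS)

print(is_blocked("203.0.113.7", "ExampleBot/1.0"))    # fixed range and own token: caught
print(is_blocked("198.51.100.42", "ExampleBot/1.0"))  # other address, token still gives it away
print(is_blocked("198.51.100.42", CHROME_UA))         # ordinary address and stock UA: not caught
print(is_blocked("198.51.100.42", None))              # no UA header at all: not caught
```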

1vuio0pswjnm7 2y ago

According to the Brave Privacy Policy, participation in the Web Discovery Project is "opt-in". How many Brave users have opted in to sending data to Brave?

How many Chrome users have opted in to sending data to Google?

Sometimes uninformed consent is not actually consent. These so-called "tech" companies love to toe that line.

mg 2y ago

As far as I can tell, that makes for 5 independent image search engines on the web:

    Baidu
    Bing
    Brave
    Google
    Yandex

You can compare their results on this search comparison page I maintain:

https://www.gnod.com/search/?engines=p,o,br,n,q&nw=1

(If you want to also search image libraries like Flickr and Pexels, click on "more engines" to select all places you want to search)

esperent 2y ago

I accidentally found a good test string for image search a few months ago:

Bamboo sign

Give it a try on Google images. You'll see that nearly all the results are x-rays of people with ankylosing spondylitis, a form of which is commonly referred to as "bamboo spine".

I tested out brave search - it correctly shows 90% signs, though it does also show a few spines.

Google still shows incorrect results months later. It's by far the worst of all the search engines in the list for this simple and obvious search.

firejake308 2y ago

Actually, the Google result is not incorrect. In radiology, an X-ray of a patient with ankylosing spondylitis is said to display the "bamboo sign", a real medical term describing the appearance of the backbones. Google is showing the result that is more relevant to thousands of medical students (such as myself) who are trying to understand a radiology report, but that may not be relevant to someone who just wants a sign made out of bamboo.

silisili 2y ago

Nice!

I actually like Brave here for my test better than Google. I typed in a few cities, just wanting to see the skylines and such.

Brave gives me good photos, some stock photos, etc. Google gives me pictures from recent news articles, which isn't overly helpful IMO.

isaacremuant 2y ago

Of all those, Brave is the one that censors least. In particular, it doesn't censor for political motives, as far as I'm aware.

That alone makes it worthy of support.

But Google has become so bad lately that switching is quite easy: even if Brave isn't getting better super fast, Google is unfortunately getting worse and making up the difference.

yreg 2y ago

Interesting, I regularly use both and I find Google to perform better for me than Brave (in text search).

jorvi 2y ago

I have a deeeeeeep dislike for Google’s “must include: duck | missing: duck”.

magicalist 2y ago

> Specially doesn't censor for political motives that I'm aware of

What are the censored image searches you found?

reitanqild 2y ago

Try to search for the 1989 Tiananmen Square protests and massacre.

That tends to upset some engines including Bing I think.

jaystraw 2y ago

I'm sure this isn't in the spirit or meaning of what you said, but to respond jokingly, I haven't found many censored anythings :-P turns out...

isaacremuant 2y ago

I meant for text. Haven't really used the image feature yet.

ColinHayhurst 2y ago

You missed out Mojeek. For an independent take: https://seirdy.one/posts/2021/03/10/search-engines-with-own-...

SomeoneOnTheWeb 2y ago

Shouldn't Kagi (https://kagi.com) also be on that list?

MildRant 2y ago

Someone will correct me if I'm wrong but Kagi uses Google search results. I'm sure it's more complicated than that and they have their own secret sauce but it is not an independent search engine.

See: https://help.kagi.com/kagi/why-kagi/kagi-vs-google.html

Terretta 2y ago

> Someone will correct me if I'm wrong but Kagi uses Google search results.

Click two links down in the same menu:

https://help.kagi.com/kagi/why-kagi/kagi-vs-brave.html

Kagi Search includes anonymized requests to traditional search indexes like Google and Bing as well as sources like Wikipedia, DeepL, and other APIs. We also have our own non-commercial index (Teclis), news index (TinyGem), and an AI for instant answers. Teclis and TinyGem are a result of our crawl through millions of domains, focusing primarily on non-commercial, high-quality content.

Our unique results combined from all of these sources help you discover the best content you can possibly find online, sometimes from the quieter places on the web.

anderber 2y ago

I believe you're correct. Kagi just uses Google's API and makes some changes on top of it.

fsflover 2y ago

One more, which is self-hosted, peer-to-peer and FLOSS: https://yacy.net

vGPU 2y ago

Yacy search results/ranking are worse than useless

fsflover 2y ago

Because too few people are participating.

NooneAtAll3 2y ago

does searx count as independent?

esperent 2y ago

It's not a search engine, I think? It's a search aggregator - it combines results from other search engines.
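
To illustrate what combining results from several engines can look like, here is a sketch of reciprocal rank fusion, a standard technique for merging ranked lists. It is a swapped-in illustration, not a claim about how SearX or Kagi actually merge results; the engine names and URLs are placeholders, and `k=60` is the constant commonly used with this method.

```python
from collections import defaultdict
from typing import Dict, List

def reciprocal_rank_fusion(rankings: Dict[str, List[str]], k: int = 60) -> List[str]:
    """Merge per-engine rankings; each URL scores sum(1 / (k + rank)) over the lists it appears in."""
    scores: Dict[str, float] = defaultdict(float)
    for urls in rankings.values():
        for rank, url in enumerate(urls, start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for a stable output.
    return sorted(scores, key=lambda url: (-scores[url], url))

merged = reciprocal_rank_fusion({
    "engine_a": ["https://x.example", "https://y.example", "https://z.example"],
    "engine_b": ["https://y.example", "https://w.example"],
})
print(merged)
```

A result that several engines agree on (`y` here) rises above one that only a single engine ranks first.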

ShrigmaMale 2y ago

ahrefs

cf. yep.com

cheald 2y ago

I'm very glad for this! I've been using Brave search almost since it launched, and the standard results have gotten great, but it's always been a bummer to have to go out to Bing/Google for images/video. It's really nice having an alternative that isn't just a wrapper around someone else's search index.

ignitionmonkey 2y ago

Same here. Much like Firefox, I think it's important to use a search index that isn't tied to the larger players (Bing and Google in this case). I tried Brave a couple of months ago[1], I found the results better than Bing, but without image search it wasn't usable. Now I can give it another go.

[1]: https://jahed.dev/2023/07/01/trying-brave-search/

waithuh 2y ago

Out of curiosity, if you don't want to be 'tied to the larger players', why not use metasearch engines like SearX? If Google does not have a great answer, Bing does, or Mojeek.

ignitionmonkey 2y ago

I'm referring to a default search engine. Something I can use by default everywhere and I can recommend to others. So it has to provide results based on typical use cases.

First I've heard of a "metasearch" engine. I just tried SearX and it gave me no results for my projects and only a couple for "Mastodon", but the animal not the software.

em-bee 2y ago

i have been using brave search for a while now, and i was surprised and am very satisfied with the search results. missing image and video search was a bit annoying, mostly in that it linked to google and bing but not to any other search engines. but i just got used to switching to whichever engine i wanted for image search instead. it sometimes meant that i had to go back and retype my search query, but i'd rather have a good text search than be bothered by that. in most cases i'd know ahead of time if i wanted images, so it was easy to pick the right search engine.

SparkyMcUnicorn 2y ago

I'm a little surprised to hear other people having such good results.

Brave search was my default for quite a while, until a few weeks after they got rid of bing results. As soon as that happened, stuff just wasn't showing up that I'd expect to be there, and 90% of the time I'd follow up searches with !g or !ddg just to get something decent to show up. The index just felt severely lacking, or the relevancy was pretty far off base.

Would you say search has greatly improved over the past month or two?
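
The `!g` and `!ddg` prefixes mentioned here are "bangs": a token in the query tells the engine to forward the rest of the query to another engine. A minimal sketch of that routing, assuming a hypothetical two-entry bang table; real engines support many more bangs, and their exact parsing rules may differ.

```python
from urllib.parse import quote_plus

# Hypothetical bang table; the URL templates are the engines' public search pages.
BANGS = {
    "g": "https://www.google.com/search?q={}",
    "ddg": "https://duckduckgo.com/?q={}",
}
DEFAULT_ENGINE = "https://search.brave.com/search?q={}"

def route(query: str) -> str:
    """Return the search URL for a query, honouring the first known !bang in it."""
    terms = query.split()
    for i, term in enumerate(terms):
        name = term[1:].lower()
        if term.startswith("!") and name in BANGS:
            rest = " ".join(terms[:i] + terms[i + 1:])
            return BANGS[name].format(quote_plus(rest))
    return DEFAULT_ENGINE.format(quote_plus(" ".join(terms)))

print(route("!g doors of stone"))
print(route("doors of stone !ddg"))
print(route("doors of stone"))
```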

cheald 2y ago

I feel like it's improved steadily over time. I used to find it severely lacking for anything code-related, but that's improved, too.

It's not as good as Google was at its peak (and Google itself has degraded severely in quality, IMO), but it's good enough that I can generally find what I'm looking for with a minimum of effort.

I run maybe 1 in 40-50 searches with "!g" because Brave is insufficient, for context.

sundarurfriend 2y ago

> stuff just wasn't showing up that I'd expect to be there

Exactly my experience. I hadn't connected it to them getting rid of Bing results, but it makes sense. I've had to use the bang redirects to other search engines a lot more too, to a level I haven't had to in more than a year.

computronus 2y ago

Possibly, at least for my own experience. It's much less often that I rerun a search with !g. Brave Search's results are becoming more relevant to my own queries, and Google's becoming less so. Not to mention how ads like to masquerade as ordinary Google results; using Google is starting to feel less comfortable.

AlotOfReading 2y ago

I wonder what's different about our searches and expectations. I've been using Brave as a default for 1y+ and I still get consistently bad results compared to Google. The only reason it's remotely competitive is how much Google itself has declined in quality.

A recent example from my search history, "doors of stone release date". The author has announced a new novella releasing Nov. 2023, but not the actual book Doors of Stone. The google infobox gets this wrong, but the first result is correct. Brave accidentally gets it right that there's no release date for the book, but misses the novella announcement and all but one of the results are blogspam.

ignitionmonkey 2y ago

>I wonder what's different about our searches and expectations.

The difference might be that they (including myself) don't ask search engines for facts like "doors of stone release date". They'll search for "doors of stone", find personally reliable sources like Wikipedia, Fandom, Goodreads, browse them and decide on an answer. When sources fail to appear, they'll either refine the search (like "doors of stone rothfuss") or call it a failure and maybe try a different search engine.

This is one of the reasons why Brave has been good for me so far. When a relevant Wikipedia article exists, it shows it, even if the title doesn't match. Whereas lately DDG and others don't. In fact, you can see this with "doors of stone". Brave shows "The Kingkiller Chronicle", DDG doesn't at all, Google has it low down in the results.

It also shows Reddit discussions without needing to explicitly filter for it. And I use ad block to remove the AI summariser that takes up half the screen, it's not what I want from a search engine.

mox1 2y ago

Overall Brave search has been good for me. I have been using it as the default on every PC / Browser I utilize.

I will say there are times when it just falls flat. Like I will search for a brand or specific thing, expecting to get to the home / login page for that brand, and it just flat out gives me weird results.

But when I put the !g in front of the query, the first result is always what I wanted.

On the other hand, when doing more general searches, Brave is on par with or better than Google.

mrweasel 2y ago

The search results are pretty good, but I can't work with the layout. I really don't like that video results are so prominently displayed, and I don't understand why results are split into multiple "boxes". It's way too messy.

qwertox 2y ago

I'm always staying away from Brave because I've been confronted so many times with bait-and-switch tactics that I have the feeling that one day they will move away from being good and monetize all the collected data, even though they don't collect data.

I'm so skeptical that I'm just now starting to develop a feeling of trust towards DuckDuckGo.

In the browser domain, Mozilla is the only company of which I feel that it is genuinely pro-customer.

So I give all my stuff to Google and hope that they at least just protect it from hackers, while I am aware that they analyze my data in order to see how they can monetize me better, but at least with anonymity in regards to 3rd parties. I just hope I'm not wrong.

metadat 2y ago

You should check these assumptions, Mozilla has been hard at work enshittifying their entire portfolio. Instead of giving the public features they actually want (the most secure, performant, and predictable web browser), the current CEO has directed the focus towards revenue-generating features.

My god, there is so much telemetry in FF now, and it's tricky to hunt down all the about:configs to disable it. Not friendly or privacy conscious at all. Do you really want Mozilla to get pinged with your IP address every time your browser process starts and exits? Yuck!

Quick FF enshittification example from 2 months ago:

Alert HN: Mozilla puts advertising into Firefox AGAIN

https://news.ycombinator.com/item?id=36351322 (48 comments)

Mozilla stops Firefox fullscreen VPN ads after user outrage

https://news.ycombinator.com/item?id=36085642 (220 comments)

hipsterstal1n 2y ago

> the current CEO has directed the focus towards revenue-generating features.

They literally can't win. One group of vocal users is outraged by how much money Mozilla takes from Google, while the other half screams about how Mozilla is trying to (gasp!) gain additional revenue streams that don't involve taking money from their biggest competitor.

> Do you really want Mozilla to get pinged with your IP address every time your browser process starts and exits? Yuck!

No, but I really don't give a shit either. At a certain point, I looked in the mirror and said life is too short to care about stupid shit like that. If I was a spy or a journalist in some state like China or Iran, maybe I would care. But this feels odd to hone in on when any website you go to is collecting all sorts of info of this sort.

esperent 2y ago

> No, but I really don't give a shit either.

Nobody's asking you to personally fight for everything. Life truly is too short for that. All we're asking for is your tacit support, or failing even that, your abstinence from the conversation.

I don't personally have time to look after stray dogs in my area, for example. But I sure as hell don't come out with "I looked in the mirror and decided I don't give a shit" when I meet someone who does care about that. Not online and not offline. Instead I'll be supportive and tell them how amazing they are for spending their valuable time on this.

Even if it's something I don't personally care about at all, or even if I think it's a massive waste of time but I can see it's important to someone else, I'd still never tell them that I don't give a shit.

Is it too much to ask you to have the same respect for people who care about important issues like online privacy? If it's not important to you, that's fine! Go be somewhere else instead of interrupting people who do care about this. There's about 500 million other conversations happening on the internet at this very second. Surely one of them is something you actually do care about enough to engage with in a positive way?

metadat 2y ago

The thing about the Mozilla of today is that the leadership isn't committed to the fundamentals -- privacy, security, functionality, and control by the end user.

At this point, Brave is a more promising bet than MZ.

soundnote 2y ago

I mind Mozilla trying to find alternate revenue sources 0%. It's a good thing: Organizations like Mozilla and Brave SHOULD be making their own money and not be stuck to the Google teat.

Mozilla doesn't go about it in as upfront a way as Brave does, IME, but stuff like VPN, Pocket and other browser-related services I mind not at all.

I have no sympathy for the current political shitfest that Mozilla is as an organization, but as the makers of Firefox I feel like Mozilla is in an impossible bind: their users expect a fairytale of an independent, donation-funded browser that people spontaneously adopt, and go nuts about stuff like the inclusion of Pocket. I know, I used to be one of those people back when Pocket was introduced. But reeing about Mozilla trying to have independent funding by giving people useful services is just strange. It's exactly what they should be doing, and Brave setting up revenue streams like Talk and Search is great, especially because they operate in the normal money universe for those of us who aren't terribly enthusiastic about crypto.

reitanqild 2y ago

I also think it is great that browsers seek out alternative sources of funding.

My problems with Mozilla are:

- Misuse of money: the browser team have brought in lots of money over the years (we talk billions) and the foundation is milking it dry. If the income created by the browser had stayed with the browser team they would have had funding for years to come.

- Being dishonest: Mozilla has sought donations for Firefox and I think many of us have donated thinking we supported Firefox, while in reality the Firefox team funds itself and the rest of Mozilla and Mozilla isn't even allowed to send money the other way.

- Not being up front about what they do: they more or less lied about their relationship with Pocket. I like Pocket, both the product and as a way to bring in income, but whenever it comes up, everyone who was there starts thinking about their lies.

- Nerfing the extension API.

- Writing "dear community members" in emails begging for money while simultaneously being rude to us in responses to real issues in Bugzilla.

Now, if anyone think I use Chrome, think again.

I am still optimistically waiting for authorities to wake up and punish Google the same way they punished Microsoft - huge fines and browser ballots - but that does not mean I give Mozilla a free pass ;-)

lern_too_spel 2y ago

Brave sees Mozilla's political shitfest and raises a political shitbacchanalia.

cayley_graph 2y ago

https://github.com/brave/web-discovery-project/blob/main/mod...

I'm curious to see what you think about this. If you're not okay with Firefox telling Mozilla your IP address every time you connect, does the same go for Brave sending entire pages of your search results to them? This also includes which results you've clicked on.

gurgunday 2y ago

This is opt-in, and a rather hard-to-mistakenly-enable kind of opt-in that I personally think is consistent with their great™ privacy by default stance

Brave offers a simple deal: if you believe or have audited their technical claims, you give them fully anonymized snippets of URLs and web pages, and they improve their search index for everyone
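
As a toy illustration of what trimming a URL before sharing it can involve, here is a sketch that keeps only the scheme, host and path. It is hypothetical and does not reproduce the Web Discovery Project's actual rules, which live in the project's source linked upthread; note that a path alone can still carry identifiers.

```python
from urllib.parse import urlsplit, urlunsplit

def coarsen_url(url: str) -> str:
    """Keep scheme, host and path; drop credentials, port, query string and fragment."""
    parts = urlsplit(url)
    # .hostname excludes user:password@ and :port, and is lowercased.
    host = parts.hostname or ""
    return urlunsplit((parts.scheme, host, parts.path, "", ""))

print(coarsen_url("https://user:pw@Example.com:8443/account/orders?id=123#top"))
```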

guerrilla 2y ago

> I'm always staying away from Brave because I've been confronted so many times with bait-and-switch tactics that I have the feeling that one day they will move away from being good and monetize all the collected data, even though they don't collect data.

The thing is that it will always get worse. Every company needs to grow in order to stay alive and eventually quality and your user experience will suffer because of this, but here's the thing: Always choose the upstart competition but just be prepared to jump to the next up-and-comer after that. For me, I found DuckDuckGo getting worse over time (probably not on purpose, just spam) and somehow Brave is better so I'm sticking with that, but as soon as Brave decides to fuck me (and they will!) then I'll be jumping to whoever the new underdog is at that time.

nxrabl 2y ago

That makes two companies who both maintain their own Chromium forks and run direct competitors to core Google search products. I wonder if we'll see Google start to close off open development on Chrome - Microsoft will likely be fine, but that could put Brave in a precarious position.

vanviegen 2y ago

I'm not sure if they can without rewriting the whole thing, the original (WebKit/KHTML) code base being GPL.

On the other hand, the Google lawyers seem to have found an excuse to link some proprietary code into Chrome (that's not part of Chromium). Does anybody know what that excuse is, and if it provides a loophole large enough to close off Chrome development?

soundnote 2y ago

They could pull a Red Hat.

msp26 2y ago

No mention of reverse image search sadly. I've been looking for anything better than Google/yandex forever.

davidm1729 2y ago

Hi, I'm David and I've worked alongside other engineers at Brave on this! Thanks for your feedback; it would certainly be a nice addition, although we may want to focus a bit more on quality first. Thanks!

qingcharles 2y ago

Thank you and your team for your hard work on this. Breaking the small monopoly on image search is no easy thing and I appreciate it.

It also explains why Brave Search is slow today I suspect... :)

teruakohatu2y ago

Hi David, reverse image search is an easier problem than good search quality. I'm happy to chat with you about it; my email is in my HN profile.

guestbest2y ago

I’ve been using tineye.com

porridgeraisin2y ago

Bing visual search is great. I use it in cryptic hunts a bunch. Would of course prefer a non-MS offering though, I hope one comes along that's equally good.

veave2y ago

Bing has its own and it's not bad.

Usually google+tineye+bing+yandex will yield some useful results

CoBE102y ago

In the last few months Bing has been much more reliable for me at reverse-finding the original sources of album cover images that have been posted on Discogs. Google seems not to give me images that were posted a long time ago (10+ years maybe), but Bing still does.

kilroy1232y ago

If you're looking for people, I would add PimEyes to that list.

CoBE102y ago

Also reported here: https://www.theregister.com/2023/08/03/brave_cuts_ties_with_...

NayamAmarshe2y ago

That's awesome! I've been using Brave Search for years and it's my favorite search engine ever!

user39393822y ago

My priority for search is that it’s free of political censorship and weighting. Google is abysmal on this issue, then we had the DDG CEO on here making excuses for his very clear statements about Russian news sources. I don’t need or want anyone else to decide for me what qualifies as “misinformation”; I will decide that for myself.

It surprisingly comes up for image searching. Google for example has been known to censor images of Tiananmen Square.

sundarurfriend2y ago

On the topic of "weighting", Brave Goggles let you decide the weighting/ranking yourself, and they are a great and under-used feature. The UI is a bit lacking (you have to create a goggle as a GitHub gist/repo, publish it by submitting it to them, then bookmark an unwieldy URL that's the link to your custom search engine), but the goggle syntax is pretty expressive and easy to use.

wusher2y ago

This is exactly why I switched from DDG to Brave.

gettodachoppa2y ago

I didn't even know about Brave Search! Thanks for the heads up, I'll switch now and see how it compares. DDG's CEO praising censorship of Russian sites definitely soured me on them.

eviks2y ago

Very nice!

(though I've recently switched away from Brave Search, since the goggle subscriptions, which were the reason I switched to Brave in the first place, have a big bar right at the top, and the whole top settings+goggles area shifts the search results on load!)

rejectfinite2y ago

Very nice! I just switched to Brave Search from DDG on Brave and I kind of like it.

Now... I would just like to see the full https url on searches.

I already love the look, Brave summarizer AI and general results!

gettodachoppa2y ago

I like Brave and trust Brendan Eich, but what happens when they cash out and the next owners decide to monetize all this data?

The current policies don't allow it, sure. But then there will be gradual changes to those policies, and all sorts of dark patterns that make 99% of users leak their data (think "enabled by default but opt-out").

Is there any guarantee against this?

BrendanEich2y ago

We don't collect any data, so there's nothing to "monetize". We bring ad matching and blind confirmations to the browser, no need for collection.

You have to realize data is worth less over time -- a lot less. So the issue would be whether users stick if (heaven forbid) a change of control left Brave in the hands of unethical owners. By design, open source, network sniffing, and other auditors would catch wise and flame such a corrupt Brave into the ground. That's the best we can do here. There is no safety property ("X" holds for all future program states) enforceable on a company.

butz2y ago

Good, but we need more alternative search engines, even very niche ones.

soundnote2y ago

Excellent news. Now to see how well it holds up.

whalesalad2y ago

I wish Brave (and everyone else on the internet) would abandon the Poppins font; it's nauseating.

post-factum2y ago

I'd appreciate it if it were possible to select Czech Republic and Ukraine as search regions.

lern_too_spel2y ago

When will Brave Search launch a crawler update that lets me specifically block its crawler in robots.txt like every other search engine supports?
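For comparison, this is the per-crawler control the question refers to: robots.txt groups rules under `User-agent` lines, and a compliant crawler follows the group that names it, falling back to the `*` group otherwise. A quick check with Python's standard-library `urllib.robotparser`; the `SomeOtherBot` name is hypothetical and stands in for any crawler without a group of its own.

```python
from urllib.robotparser import RobotFileParser

# Per-crawler groups: Googlebot may crawl everything, Bingbot nothing,
# and any crawler without its own group falls back to the "*" group.
ROBOTS = """\
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS.splitlines())  # parse() also marks the rules as loaded

# "SomeOtherBot" is a hypothetical crawler with no group of its own.
for agent in ("Googlebot", "Bingbot", "SomeOtherBot"):
    print(agent, parser.can_fetch(agent, "https://example.com/article"))
# Googlebot True
# Bingbot False
# SomeOtherBot True
```

Each crawler gets its own answer from the same file, which is exactly what a publisher cannot do for a crawler that never reads its own group.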

blacksmith_tb2y ago

I see they say "if a domain or page is not crawlable by any search engine (it has a noindex tag), or if it is not crawlable by googlebot, then Brave Search’s bot will not crawl it either."

1: https://brave.com/search/api/
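Taken at face value, the quoted policy depends only on a `noindex` tag and on what robots.txt says to Googlebot, so a group aimed at Brave itself never enters into it. A rough model in Python, assuming the quote is the whole rule; `brave_stated_rule_allows` and the `BraveSearchBot` token are hypothetical names for illustration, not anything Brave publishes.

```python
from urllib.robotparser import RobotFileParser

def brave_stated_rule_allows(robots_txt: str, url: str, has_noindex: bool) -> bool:
    """Hypothetical model of the quoted policy: skip pages marked noindex,
    and otherwise crawl exactly when robots.txt lets Googlebot in."""
    if has_noindex:
        return False
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Only Googlebot's answer is consulted; no other group is ever read.
    return parser.can_fetch("Googlebot", url)

# "BraveSearchBot" is a made-up token standing in for a Brave-specific group.
ROBOTS = """\
User-agent: Googlebot
Allow: /

User-agent: BraveSearchBot
Disallow: /
"""

print(brave_stated_rule_allows(ROBOTS, "https://example.com/post", has_noindex=False))
# True: the Brave-specific Disallow has no effect under this rule

# A crawler that honored its own group would have stopped here instead.
own_group = RobotFileParser()
own_group.parse(ROBOTS.splitlines())
print(own_group.can_fetch("BraveSearchBot", "https://example.com/post"))
# False
```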

cpeterso2y ago

Does the Brave crawler send the Googlebot or the regular Chrome User-Agent string? If it sends something different from the standard Googlebot User-Agent string, you could dynamically serve a robots.txt that disallows Googlebot to every client except Googlebot. OTOH, I've read that the Google crawler sometimes uses the regular Chrome User-Agent string and penalizes sites that return different content to Googlebot and Chrome.
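A minimal sketch of the workaround cpeterso describes, using Python's standard-library `http.server`: a client whose `User-Agent` mentions Googlebot gets a permissive robots.txt, and every other client gets one that disallows Googlebot, which a crawler following Brave's stated rule would then respect. It assumes a plain substring match on the header is good enough, which it is not in general: the header is trivially spoofed, and per the comment above, Google may not like differing responses. The constant, function, and class names are illustrative.

```python
from http.server import BaseHTTPRequestHandler

# Served to clients claiming to be Googlebot: everything allowed.
ROBOTS_FOR_GOOGLEBOT = "User-agent: *\nAllow: /\n"
# Served to everyone else: a crawler that only honors Googlebot's rules
# would read this and stay away.
ROBOTS_FOR_OTHERS = "User-agent: Googlebot\nDisallow: /\n"

def robots_for(user_agent: str) -> str:
    """Pick a robots.txt body from the request's User-Agent header.

    Assumption: a case-insensitive substring match on "googlebot" is enough.
    The header can be forged, so this only filters honest clients.
    """
    if "googlebot" in user_agent.lower():
        return ROBOTS_FOR_GOOGLEBOT
    return ROBOTS_FOR_OTHERS

class RobotsHandler(BaseHTTPRequestHandler):
    """Serves only /robots.txt, choosing the body per request."""

    def do_GET(self):
        if self.path != "/robots.txt":
            self.send_error(404)
            return
        body = robots_for(self.headers.get("User-Agent", "")).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, `HTTPServer(("127.0.0.1", 8000), RobotsHandler).serve_forever()` (with `HTTPServer` from the same module) would serve it on port 8000. Note that if Brave's crawl really arrives with a plain Chrome User-Agent, it simply falls into the "everyone else" branch, which is the point of the trick.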

lern_too_spel2y ago

What if I want googlebot to crawl it but not bravebot? Every other search engine lets me block its crawler specifically. Only Brave has this shady policy.

hightrix2y ago

> What if I want googlebot to crawl it but not bravebot?

Then you need to gate your content such that it is not available openly to the public.

This falls in line with many objections to Google's WEI. If you host content openly and allow access freely, then don't be surprised when people access it at will and use it for free.

blacksmith_tb2y ago

Hmm, I agree it's odd, but 'shady' seems to attribute malice to what could just be stupidity?

gettodachoppa2y ago

You want the monopolistic tech giant to crawl you but not a small privacy-focused company? What possible justification could you have for this attitude?

bravetraveler2y ago

I'm conflicted - I see your point and agree, though I appreciate that by reusing others' methods... we don't end up with more standards.

Loosely related XKCD: https://xkcd.com/927/

pierrefar2y ago

I wrote about all this here:

https://searchengineland.com/crawlers-search-engines-generat...

and more references elsewhere in this thread:

https://news.ycombinator.com/item?id=36989129

Amusingly, while I was writing my article, this got posted to their forums, asking about how to block their crawler:

https://community.brave.com/t/stop-website-being-shown-in-br...

No reply so far.

yreg2y ago

Hmm, I don't know, it doesn't seem obvious to me that it is unethical to disobey the publisher wishes.

If you post something to the open web, what's it to you who reads it and how? You can block some IPs but that's about it.

I don't know if Brave has a knowledge graph - if they do, I would understand objecting if they filled it in with “stolen” content. But I don't see what's the problem with search.

By the way, isn't everyone's favourite archive.is doing the same thing?

I have no strong opinion on this, curious to hear counter arguments.

CaptainFever2y ago

I’m just thinking that if website publishers are able to legally allow Googlebot but block other bots, it might contribute to the Google monopoly.

pierrefar2y ago

That would be bad, and it is already bad that Google and Microsoft control so much of search queries, but the decision about which search engine indexes a website is purely the publisher's.

4 more replies

jaharios2y ago

This makes me want to use Brave Search now. When I use a tool, I expect it to serve me, not the material it provides.

> A publisher should have full control to discriminate which search engine indexes the website's content

If you want someone not to see what you publish, block them yourself. Also, why would you want to do that? Do you want Google to own the web or something?

pierrefar2y ago

There is a difference between a human being able to access content and a search engine indexing it (and in the case of Brave, "licensing" it on).

Grimburger2y ago

> There is a a difference between a human being able to access content vs a search engine indexing it

There should be no differentiation between a crawler and a human being with regards to what is being served.

waithuh2y ago

If the website owner wants google to own the web, they should be able to restrict their website. Nothing wrong with that.

bastawhiz2y ago

> the Brave search API allows you (for an extra fee) to get the content with a "license" to use the content for AI training? Who allowed them the right to distribute the content that way?

sangnoir2y ago

1. User agents should identify themselves

2. A crawler is not a User agent - it's an agent for Brave

>I don't think there's any court at this point that would back you up that freely published content annotated with full provenance cannot be scraped and published for a fee.

You can't end-run copyright like this: just because something is publicly available doesn't mean anyone can redistribute it. Look at the legal issues & cases relating to Library Genesis.

bastawhiz2y ago

> User agents should identify themselves

There is no rule that this is true, and many user agents exist _specifically to not be identified_. See Tor and other privacy-centric user agents.

> A crawler is not a User agent - it's an agent for Brave

You know, I thought "what does Wikipedia have to say on this matter?" and sure enough:

I can't even make that up.

> just because something is publicly available doesn't mean anyone can redistribute it

cvalka2y ago

They are right and you are wrong. If some web page is publicly available, it should be indexed. Scraping neutrality, please.

ric2b2y ago

> I should be able to allow or disallow any type of web crawler/scraper i want.

You're certainly allowed to try, but I don't see why indexers should be mandated to collaborate with you. They serve their users, not you.

tympious2y ago

What if I want what I publish to be known only by word of mouth?

What if I consider (some or any of) my ideas to be un-indexable, not directly suitable to representation in any hierarchy other than those I may set them in?

Grimburger2y ago

Then you should hide them behind a URL that isn't linked anywhere else on your site and that you propagate only by word of mouth.

    example.com/correcthorsebatterystaple

If you consider "word of mouth" to be public posts on a forum that millions can read at any time, then block Googlebot's IPs.
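Blocking Googlebot by IP first means recognizing it. Google's crawler documentation describes a reverse DNS lookup on the requesting IP (the name should end in `googlebot.com` or `google.com`) followed by a forward lookup that must return the same IP. A minimal Python sketch of that check with the standard `socket` module; the function name is made up, the resolvers are injectable so the logic can be tested offline, and `gethostbyname_ex` only returns IPv4 addresses, so IPv6 clients would need `getaddrinfo` instead.

```python
import socket
from typing import Callable, List, Tuple

HostInfo = Tuple[str, List[str], List[str]]  # (hostname, aliases, addresses)

def is_verified_googlebot(
    ip: str,
    reverse: Callable[[str], HostInfo] = socket.gethostbyaddr,
    forward: Callable[[str], HostInfo] = socket.gethostbyname_ex,
) -> bool:
    """Reverse-then-forward DNS check for an IP claiming to be Googlebot."""
    try:
        hostname, _, _ = reverse(ip)
    except OSError:  # socket.herror / socket.gaierror are OSError subclasses
        return False
    hostname = hostname.rstrip(".")
    # The PTR name must sit under one of Google's crawler domains.
    if not hostname.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        _, _, addresses = forward(hostname)
    except OSError:
        return False
    # A forged PTR record fails here: the name must resolve back to the IP.
    return ip in addresses
```

Anything that fails the check can be treated as "not Googlebot" regardless of what its User-Agent claims.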

memefrog2y ago

Then don't publish it.

vGPU2y ago

To me as a search engine end user, this kind of behavior is undesirable. Why would I want a website to selectively degrade my experience because of my choice in search engine or browser?

Brings back horrible flashbacks of “this website is only compatible with IE6”.

pierrefar2y ago

This is explained more in the article I referred to, but briefly: Brave delegates crawling to normal Brave browsers, so it's a huge pool of IP addresses, not a single IP address or range.

Also, these search crawls by the browser do not identify themselves beyond the Brave standard UA header, namely a plain Chrome user-agent string.

1vuio0pswjnm72y ago

According to the Brave Privacy Policy, participation in the Web Discovery Project is "opt-in". How many Brave users have opted in to sending data to Brave?

How many Chrome users have opted in to sending data to Google?

Sometimes uninformed consent is not actually consent. These so-called "tech" companies love to toe that line.

mg2y ago

As far as I can tell, that makes for 5 independent image search engines on the web:

    Baidu
    Bing
    Brave
    Google
    Yandex

You can compare their results on this search comparison page I maintain:

https://www.gnod.com/search/?engines=p,o,br,n,q&nw=1

(If you want to also search image libraries like Flickr and Pexels, click on "more engines" to select all places you want to search)

esperent2y ago

I accidentally found a good test string for image search a few months ago:

Bamboo sign

Give it a try on Google images. You'll see that nearly all the results are x-rays of people with ankylosing spondylitis, a form of which is commonly referred to as "bamboo spine".

I tested out Brave Search: about 90% of its results are correctly signs, though it also shows a few spines.

Google still shows incorrect results months later. It's by far the worst of all the search engines in the list for this simple and obvious search.

silisili2y ago

Nice!

I actually like Brave here for my test better than Google. I typed in a few cities, just wanting to see the skylines and such.

Brave gives me good photos, some stock photos, etc. Google gives me pictures from recent news articles, which isn't overly helpful IMO.

isaacremuant2y ago

Of all those, Brave is the one that censors the least. In particular, it doesn't censor for political motives, as far as I'm aware.

That alone makes it worthy of support.

But Google having become so bad of late has made switching quite easy: even if Brave isn't getting better super fast, Google is unfortunately getting worse, which makes up for it.

yreg2y ago

Interesting, I regularly use both and I find Google to perform better for me than Brave (in text search).

jorvi2y ago

I have a deeeeeeep dislike for Google’s “must include: duck | missing: duck”.

magicalist2y ago

> Specially doesn't censor for political motives that I'm aware of

What are the censored image searches you found?

reitanqild2y ago

Try to search for the 1989 Tiananmen Square protests and massacre.

That tends to upset some engines, including Bing, I think.

2 more replies

jaystraw2y ago

I'm sure this isn't in the spirit or meaning of what you said, but to respond jokingly, I haven't found many censored anythings :-P turns out...

isaacremuant2y ago

I meant for text. Haven't really used the image feature yet.

ColinHayhurst2y ago

You missed out Mojeek. For an independent take: https://seirdy.one/posts/2021/03/10/search-engines-with-own-...

SomeoneOnTheWeb2y ago

Shouldn't Kagi (https://kagi.com) also be on that list?

MildRant2y ago

Someone will correct me if I'm wrong but Kagi uses Google search results. I'm sure it's more complicated than that and they have their own secret sauce but it is not an independent search engine.

See: https://help.kagi.com/kagi/why-kagi/kagi-vs-google.html

Terretta2y ago

> Someone will correct me if I'm wrong but Kagi uses Google search results.

Click two links down in the same menu:

https://help.kagi.com/kagi/why-kagi/kagi-vs-brave.html

> Our unique results combined from all of these sources help you discover the best content you can possibly find online, sometimes from the quieter places on the web.

anderber2y ago

I believe you're correct. Kagi just uses Google's API and makes some changes on top of it.

fsflover2y ago

One more, which is self-hosted, peer-to-peer and FLOSS: https://yacy.net

vGPU2y ago

Yacy search results/ranking are worse than useless

fsflover2y ago

Because too few people are participating.

NooneAtAll32y ago

does searx count as independent?

esperent2y ago

It's not a search engine, I think? It's a search aggregator - it combines results from other search engines.
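As a rough illustration of what "combines results" can mean, here is reciprocal rank fusion, a common and simple way to merge ranked lists from several engines. It is a stand-in technique, not necessarily how SearX actually scores results; the engine names and URLs are placeholders, and `k=60` is just a conventional default.

```python
from collections import defaultdict
from typing import Dict, List

def reciprocal_rank_fusion(rankings: Dict[str, List[str]], k: int = 60) -> List[str]:
    """Merge per-engine result lists into one ranking.

    Each URL scores sum(1 / (k + rank)) over the engines that returned it,
    so results that several engines rank highly rise to the top.
    """
    scores: Dict[str, float] = defaultdict(float)
    for results in rankings.values():
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest score first; ties broken by URL so the output is deterministic.
    return sorted(scores, key=lambda url: (-scores[url], url))

print(reciprocal_rank_fusion({
    "engine_a": ["a.com", "b.com", "c.com"],
    "engine_b": ["b.com", "d.com"],
}))
# ['b.com', 'a.com', 'd.com', 'c.com']
```

Only ranks are used, not the engines' own scores, which is why this style of merging works across engines whose scoring is opaque.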

ShrigmaMale2y ago

ahrefs

cf. yep.com

ignitionmonkey2y ago

[1]: https://jahed.dev/2023/07/01/trying-brave-search/

waithuh2y ago

Out of curiosity, if you don't want to be 'tied to the larger players', why not use metasearch engines like SearX? If Google doesn't have a great answer, Bing or Mojeek does.

ignitionmonkey2y ago

I'm referring to a default search engine. Something I can use by default everywhere and I can recommend to others. So it has to provide results based on typical use cases.

This is the first I've heard of a "metasearch" engine. I just tried SearX and it gave me no results for my projects and only a couple for "Mastodon", and those were for the animal, not the software.

SparkyMcUnicorn2y ago

I'm a little surprised to hear other people having such good results.

Would you say search has greatly improved over the past month or two?

cheald2y ago

I feel like it's improved steadily over time. I used to find it severely lacking for anything code-related, but that's improved, too.

It's not as good as Google was at its peak (and Google itself has degraded severely in quality, IMO), but it's good enough that I can generally find what I'm looking for with a minimum of effort.

For context, I run maybe 1 in 40-50 searches with "!g" because Brave's results are insufficient.

sundarurfriend2y ago

> stuff just wasn't showing up that I'd expect to be there

ignitionmonkey2y ago

>I wonder what's different about our searches and expectations.

It also shows Reddit discussions without needing to explicitly filter for it. And I use ad block to remove the AI summariser that takes up half the screen, it's not what I want from a search engine.

mox12y ago

Overall, Brave Search has been good for me. I have been using it as the default on every PC and browser I use.

But when I put the !g in front of the query, the first result is always what I wanted.

On the other hand, when doing more general searches, Brave is on par with or better than Google.

qwertox2y ago

I'm so skeptical that I'm just now starting to develop a feeling of trust towards DuckDuckGo.

In the browser domain, Mozilla is the only company I feel is genuinely pro-customer.

metadat2y ago

Quick FF enshittification example from 2 months ago:

Alert HN: Mozilla puts advertising into Firefox AGAIN

https://news.ycombinator.com/item?id=36351322 (48 comments)

Mozilla stops Firefox fullscreen VPN ads after user outrage

https://news.ycombinator.com/item?id=36085642 (220 comments)

hipsterstal1n2y ago

> the current CEO has directed the focus towards revenue-generating features.

> Do you really want Mozilla to get pinged with your IP address every time your browser process starts and exits? Yuck!

esperent2y ago

> No, but I really don't give a shit either.

Nobody's asking you to personally fight for everything. Life truly is too short for that. All we're asking for is your tacit support, or failing even that, your abstinence from the conversation.

metadat2y ago

The thing about today's Mozilla is that its leadership isn't committed to the fundamentals -- privacy, security, functionality, and control by the end user.

At this point, Brave is a more promising bet than MZ.

soundnote2y ago

I mind Mozilla trying to find alternate revenue sources 0%. It's a good thing: Organizations like Mozilla and Brave SHOULD be making their own money and not be stuck to the Google teat.

Mozilla doesn't go about it in as upfront a way as Brave does, IME, but stuff like VPN, Pocket and other browser-related services I mind not at all.

reitanqild2y ago

I also think it is great that browsers seek out alternative sources of funding.

My problems with Mozilla are:

- Nerfing the extension API.

- Writing "dear community members" in emails begging for money while simultaneously being rude to us in responses to real issues in Bugzilla.

Now, if anyone thinks I use Chrome, think again.

lern_too_spel2y ago

Brave sees Mozilla's political shitfest and raises a political shitbacchanalia.

cayley_graph2y ago

https://github.com/brave/web-discovery-project/blob/main/mod...

gurgunday2y ago

This is opt-in, and a rather hard-to-mistakenly-enable kind of opt-in that I personally think is consistent with their great™ privacy by default stance

Brave offers a simple deal: if you believe or have audited their technical claims, you give them fully anonymized snippets of URLs and web pages, and they improve their search index for everyone
