Dawn of a new era in Search: Balancing innovation, competition, and public good

[1] https://michaelnielsen.org/ddi/how-to-crawl-a-quarter-billio...

Fire-Dragon-DoL1y ago

I don't have the advanced assistant. Is it only on ultimate?

cube22221y ago

Yeah, most AI features are.

adamcharnock1y ago

> The Google Search Index is a unique and irreplaceable resource within the digital ecosystem. Mandating fair access to it or treating it as an essential facility could address the core issues...

The article estimates the Google Search Index at 12.5PB. If Kagi thinks that is a big enough moat to be the primary target then, well, I suppose they should know. But I'm also skeptical. You could fit that on about 50 Hetzner SX295, so about $20k/month. Plus the cost of gathering the data. It is surely a huge resource.

But weighed against the combination of Google Search + AdWords + Android + YouTube + Chrome, all in a single company? To me a 12.5PB search index feels like small change in comparison.

NB: Happy Kagi-paying customer here.
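The back-of-envelope storage figures in this comment can be checked with a few lines of Python. This is only a sketch: the 12.5PB, 50-server, and $20k/month inputs are the commenter's numbers, the 180PB figure comes from the reply that follows, and the per-server values are derived arithmetic rather than published Hetzner specifications. The linear scaling in `scale_linearly` is an assumption that ignores replication, ranking data, and serving hardware.

```python
import math

# Inputs quoted in the thread; none are verified vendor specifications.
INDEX_SIZE_PB = 12.5          # article estimate, later withdrawn
REVISED_SIZE_PB = 180.0       # "closer to 180 PB for raw crawl data"
SERVER_COUNT = 50             # commenter's server count
MONTHLY_COST_USD = 20_000.0   # commenter's monthly cost
TB_PER_PB = 1000              # decimal units (assumption)


def storage_estimate(index_pb, servers, monthly_cost):
    """Derive per-server and per-terabyte figures from the quoted totals."""
    total_tb = index_pb * TB_PER_PB
    return {
        "tb_per_server": total_tb / servers,
        "usd_per_server_month": monthly_cost / servers,
        "usd_per_tb_month": monthly_cost / total_tb,
    }


def scale_linearly(base, new_index_pb):
    """Scale server count and cost to a new index size (hypothetical helper).

    Assumes the same storage density and price per server as the base estimate.
    """
    servers = math.ceil(new_index_pb * TB_PER_PB / base["tb_per_server"])
    return servers, servers * base["usd_per_server_month"]


if __name__ == "__main__":
    base = storage_estimate(INDEX_SIZE_PB, SERVER_COUNT, MONTHLY_COST_USD)
    print(base)
    print(scale_linearly(base, REVISED_SIZE_PB))
```

Under these assumptions the quoted numbers imply 250TB and $400 per server per month, and the revised 180PB figure would need 720 such servers. That is storage only; it says nothing about crawling, ranking, or serving queries.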

freediver1y ago

> The article estimates the Google Search Index at 12.5PB.

I realize there was a mistake with the estimated number (thanks for pointing out, should be closer to 180 PB for raw crawl data). Since this is speculative and also does not account for other data needed to actually rank pages, hardware to do it in under 500ms at a scale of billions of queries per day and thus can be misleading in terms of true effort to do it, I edited that datapoint out of the article.

You are right, just crawling large number of pages (millions even billions) is indeed straightforward (eg [1]), it is about creating a searchable index of the web scale that has certain quality level that is simply impossible to do anymore for many reasons that would require another article to explain. Microsoft spent $100bn and last 20 years by their own account trying to match it and most people agree it is still not even close. At some point you reach diminishing returns. To use the analogy from the article, it is akin to someone trying to rebuild all of the US railroad network today. Sounds plausible, but not really in practice. That train has left the station in early 2000s.

adamcharnock1y ago

Thank you for the reply!

> it is about creating a searchable index of the web scale that has certain quality level that is simply impossible to do anymore for many reasons that would require another article to explain

I am both happy to take your word for it, and also very interested to know more. If you were to write that article then I would love to read it.

IgorPartola1y ago

This puts it in enough perspective for me to ask: why doesn’t a university create a public/open source search index? Seems like a way to get a ton of attention.

Moreover, archive.org has all the data and data storage capabilities many times over. What prevents them from creating an open source search engine?

scroot1y ago

Or the Library of Congress, if it had the right appropriations. Google itself started with an NSF grant to explore the future of libraries.

arnaudsm1y ago

I don't buy this number. Text-only common crawl is 20TB. Remove spam and dupes, you're around <10TB of current useful data. Which you can parse and index on a single server nowadays.

It's the full Google index history with full HTML that is probably 12PB, but the useful part of the search engine is much smaller.

1vuio0pswjnm71y ago

Does CC publish the methodology for how they determine what to crawl? More particularly, how do they determine what not to crawl?

dmonitor1y ago

I assume that the major hurdle is not storing an equivalently-sized search index, but building one from scratch. Crawling takes time, and Google has had a head start of many years.

pierrefermat11y ago

Yes OP is hilariously out of touch, the storage would be sub 1% of total costs.

ldayley1y ago

I've been using (and occasionally paying for) Kagi on and off for a couple of years now. I truly think they're building something interesting and valuable! While I haven't agreed with every product decision they've made, the founder is very good at both understanding his business and also explaining their decisions. This is a well crafted explainer of the search business and the monopoly case-- much better for sharing with less tech-savvy peers than most mainstream media explainers on this subject!

Edit: wording

Edit 2: Can you imagine a world where Google's Internet Search Index is legally considered an "Essential Facility"!? https://law.stanford.edu/publications/essential-platforms/

somethoughts1y ago

I think it's instructive to look at the early history of Google and Facebook. In the early years they did not really turn on the ad revenue levers and just focused on increasing users (i.e. Don't Be Evil) - until a decade after offering their respective services.

Similarly Netflix is just now starting the ad revenue model after years of only subscription based services.

Eventually the temptation for multiple sources of revenue (i.e. subscription AND advertising) will likely be too great due to:

- IPO and Wall Street demands net income growth (i.e. FB/Google)

- Private Equity buys the company and needs to pay back leveraged debt

- The number of customers willing to offer up a credit card for Search stagnates and a lower cost ad tier appears and the ad infrastructure that is built is applied to the paid tiers

squiggy221y ago

I'd argue that the minute they turn to an advertising model, subscriptions would churn overnight, but in the same vein as the "don't be evil" motto being dropped, I nod at the scepticism.

pennybanks1y ago

I honestly can't imagine paying for a search engine with what's out there for free

enasterosophes1y ago

When you use Google or another ad-powered search engine, you're saying that it's okay for some company to pay to bias the results away from your best interests.

When you use, say, Kagi, you become the customer; they have a vested interest in providing you with the best experience they can, because they know they're relying on your continued patronage to be able to keep competing with those other free search engines which you already admitted are pretty good.

The other three answers to your question were dumb, so I'll try to do better. Kagi has a free trial of 100 searches without requiring a credit card to sign up. Honest to god, try it. Just bookmark it and search once or twice when Google or Bing is giving you meh results.

As an engineer... googling for stuff is a good fraction of your job. It's quite reasonable to pay to improve that experience. The only real question is what is reasonable to pay.

mystified50161y ago

Kagi allows you to permanently blacklist Pinterest from appearing in any search ever.

It also gives you exactly what you ask for. If you put words in quotes, you only get results matching that phrase. Same with +- modifiers, and all the other "advanced" search operators.

Meanwhile on any free engine, they often completely ignore your query to show you unrelated SEO slop. You can completely forget modifiers and advanced queries, they aren't even parsed. About the only thing that still works is the site: modifier. And I'm pretty sure they only keep that because 40% of google searches include site:reddit

As well, if Kagi can't find a result for your query, it returns nothing. Try searching something incredibly obscure on google or DDG. You get pages upon pages of results and they're all useless garbage or just straight up ads disguised as results.

That's why I pay for Kagi. It stays the hell out of my way and only gives me exactly what I ask for and absolutely nothing else.

NetOpWibby1y ago

You get what you pay for

baggachipz1y ago

Then you don't understand what you're getting. If you want to be targeted and misled by opposing incentives, that's on you.

jbaber1y ago

Nothing stopping you from trying for one month to confirm.

endisneigh1y ago

You’re getting downvoted but I agree. I didn’t see the value and stopped after a while of paying. It is sad that basically any comment that isn’t pro-Kagi is downvoted but I guess that’s the Reddit-ification of this site.

Most people who complain about Google don’t even use it properly (e.g. PSE).

> Google has built a massive index of the internet that covers close to 100% of the accessible web.

While their index (of other people's stuff) is enormous, it is far from including everything. It is easy to disqualify, and people would be screaming if content farms were included. What even is a content farm nowadays? With LLMs, one can return a reasonable article for any query, rich in links to other pages that don't exist but could be indexed and are part of the accessible web.

If you make a new website with a few thousand pages and a few thousand images it takes quite a while for Google to pick up the entire thing, if it even bothers to.

Google tries to fill the result page with a small subset of websites. A good thing for users most of the time and the easiest ad money, but horrible for new players.

It used to be quite common for bloggers (and others) to follow everything written about them or of interest. Google (blog search) and Technorati were very useful for that kind of discovery.

The average user might never have noticed that, but when it was killed off the www stopped being a community.

We can pretend the index is still there. If you can't get to it, it's much like the LLM content.

pierrefermat11y ago

It seems like in their mind the snake ate its tail without realising, so they don't even know what's outside of Google's index.

https://www.robtex.com/cidr/209.216.230.0-24

On the outside are people who could make websites but can't see the point of it. They are on Reddit, on Facebook, on HN, on Twitter, on YouTube, or they just don't bother. On platforms they make very crappy low effort content (with unearned audience) compared to what they could be making. There is room for casual chatter, but almost anyone can make an effort to write down their perspective on something and, with some practice, enlighten someone or a lot of someones. One might become someone, if only in the world of pokemon collectors.

rychco1y ago

I’ve been a happy Kagi customer for probably 2 years(?) now. Highlights for me, as a professional:

- I can blacklist low-value domains (such as geeksforgeeks) that dominate the top of many programming searches.

- I can increase/decrease the priority of domains or pin domains to the top of searches, such as official documentation for languages/libraries.

- I can use “Lenses” to filter results for programming/academic/forum results.

abcdefg121y ago

The big question is wtf google isn’t doing the same? Instead they keep removing features and dumbing down their product

rychco1y ago

Allowing users to re-prioritize search results directly harms their true product: ads

baggachipz1y ago

This essentially advocates for the same thing defined in Cory Doctorow's The Internet Con: How to Seize the Means of Computation. That point being, requiring open protocols from big tech will enable competition and innovation. This will return the creative inspiration to the technologists. I completely agree with it, and I hope we are reaching an inflection point where walled data gardens are cracked open.

chrisweekly1y ago

I switched from Google to DDG for default search a few years ago, and then to Kagi maybe 18 months ago. Kagi's simply excellent.

andrewstuart1y ago

Who knows what websites and pages are even on the web?

There's no index to the web that I know of apart from Google and DuckDuckGo and maybe this Kagi thing.

I want to explore the web - surely search isn't the only way to use the web?

I imagine it could be fun to explore the web, lists and graphs of interest where I can hop from here to there via list of links or graphs or nodes or something?

Does anyone know of anything like this?

1vuio0pswjnm71y ago

There have been some websites in the past that allowed one to browse www content by IP address, covering what seemed to be the full range of IPv4 address space. For example, a page with a list of IP address ranges where each address range is a hyperlink. One could then drill down by following hyperlinks to a specific IP address and view whatever was hosted at that address (default host in the case of virtual hosting). Not sure why these websites do not persist. Quite useful. IMHO.

DNS zone files are a decent starting point for exploring the web. Not every registered domain name has an associated website but most do. The largest zone files are available to the public for free.
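The drill-down directory described above (address ranges linking to smaller ranges, then to single addresses) can be sketched with Python's standard `ipaddress` module. This is an illustration under assumptions, not how any of those sites worked: the `/24` used in the usage note is one that appears in a robtex link in this thread, and `base_url` is a made-up placeholder, not a real service.

```python
import ipaddress


def split_range(cidr, new_prefix):
    """Split an address range into smaller ranges (one directory level)."""
    network = ipaddress.ip_network(cidr)
    return [str(subnet) for subnet in network.subnets(new_prefix=new_prefix)]


def directory_links(cidr, base_url="https://example.invalid/ip/"):
    """Build one hypothetical link per usable host address in the range.

    base_url is a placeholder; no such directory is implied to exist.
    """
    network = ipaddress.ip_network(cidr)
    return [f"{base_url}{host}" for host in network.hosts()]


def contains(cidr, address):
    """Check whether a single address falls inside a range."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(cidr)


if __name__ == "__main__":
    for subnet in split_range("209.216.230.0/24", 26):
        print(subnet)
    print(len(directory_links("209.216.230.0/24")))
    print(contains("209.216.230.0/24", "209.216.230.240"))
```

A page built this way only lists addresses. Actually fetching whatever is served at each one would return the default virtual host at best, which is the limitation discussed in the replies.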

1vuio0pswjnm71y ago

Organising a directory of websites, or information about websites, by IP address/range is still common. For example, we can see news.ycombinator.com listed here:

https://www.robtex.com/ip-lookup/209.216.230.240

The above directory has been online for at least 18 years.

Here is one I can remember that disappeared:

https://web.archive.org/web/20090907060026if_/http://onsamei...

nucleardog1y ago

> view whatever was hosted at that address (default host in the case of virtual hosting). Not sure why these websites do not persist.

You probably answered your own question here--I imagine at this point the amount of things on the internet that aren't using some sort of virtual hosting are quite tiny.

Even ignoring everything else, with the IP address exhaustion and the push for SSL (can't get a certificate for an IP address), websites available directly at an IP address just aren't really a thing anymore.

AndroidKitKat1y ago

While not quite what you're looking for, Kagi has a "Small Web" feed of sites that are semi-curated blogs. [1][2] I don't know how often it is updated, but I like to poke around every now and then see what's going on in people's corners of the internet.

[1] - https://blog.kagi.com/small-web [2] - https://kagi.com/smallweb

ColinHayhurst1y ago

Google indexes widely. DuckDuckGo and Kagi have small specialised indexes and as such rely on the larger indexes like Google, Bing and Mojeek. DuckDuckGo used to use Yandex. More information here: https://www.searchenginemap.com/

pbronez1y ago

Here’s the best list of search indexes I’ve come across

https://seirdy.one/posts/2021/03/10/search-engines-with-own-...

firecall1y ago

Webrings?

There was an attempt to push a return of Webrings I think...

But funnily enough, whilst browsing this thread, https://news.ycombinator.com/item?id=41389642

I commented to my colleagues:

Remember when people would find good websites and share them!

madrox1y ago

I think Kagi is correct and that the way we explore information on the internet will look very different in X years with all the changes LLMs will bring. I think the real question will be what will it look like.

I don't think it looks like search today. Google got where they were because they were 10x better than everything else and had an experience focusing on what mattered at the time. I don't think the 10x experience will look like ten blue links. I don't know what that next experience is, but I'll know it when I see it.

icar1y ago

I wish Kagi was cheaper. It's expensive in many countries and it doesn't help that we are bombarded with subscriptions everywhere.

bugtodiffer1y ago

I wish Kagi would use the money to build a search engine.

But instead they use it to build a browser no one wants and an email service no one needs.

Yet their search website is still broken here and there...

benhurmarcel1y ago

I’ve had good results requesting fixes on https://kagifeedback.org/ , even minor/aesthetics stuff.

abcdefg121y ago

I do want their browser. Especially with chrome v3 going crapware route. At least there is brave.

poikroequ1y ago

> Apple has stated that Bing does not match Google’s search result quality, and they are unwilling to compromise on user experience by offering subpar results.

I wouldn't take this statement at face value. This is most likely a BS PR excuse for Apple to maintain their current deal with Google. I wouldn't expect anything less from any large corporation looking to protect $20 billion in annual revenue.

jpc01y ago

Universally, every single time I've used Bing, including this week, my response was to scroll through the entire first page of results, swear, open Google, and with the exact same search params what I wanted was the first result.

Bing is objectively worse.

rqtwteye1y ago

Maybe Bing is managed by the same people that haven't managed to get Windows search right despite trying for many, many years?

xipho1y ago

Use DDG (Bing), append `!g` after you get nothing and search again? Do this with many many different `!`.
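The retry-with-a-bang workflow can be sketched as a small URL builder. This is a minimal sketch, assuming the standard `https://duckduckgo.com/?q=` query URL; the `ddg_url` helper name and the example query are made up for illustration, and `g` is the only bang taken from the comment.

```python
from urllib.parse import urlencode


def ddg_url(query, bang=None):
    """Build a DuckDuckGo search URL, optionally appending a !bang.

    ddg_url is a hypothetical helper; the bang is appended to the query
    text exactly as one would type it in the search box.
    """
    text = f"{query} !{bang}" if bang else query
    return "https://duckduckgo.com/?" + urlencode({"q": text})


if __name__ == "__main__":
    # First attempt: plain search.
    print(ddg_url("rust borrow checker"))
    # Nothing useful? Repeat the same query with the Google bang.
    print(ddg_url("rust borrow checker", bang="g"))
```

The second call produces `https://duckduckgo.com/?q=rust+borrow+checker+%21g`; the redirect to the other engine happens on DuckDuckGo's side, not in this code.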

skinkestek1y ago

Last I tried at least:

- Current Google was bad compared to original Google (it ignores my keywords, even if I use doublequotes and verbatim)

- DuckDuckGo and Bing managed to be worse

- Kagi is like old Google

moonlion_eth1y ago

Paying kagi user. All day every day

freefaler1y ago

I've used Kagi for a year and it was better than Google for most of the searches.

However I think the model will be changed to something more like Perplexity.ai

I've switched to Perplexity and for most of the searches it works better than Kagi.

They'd need to add something like this to survive in the long run, because for exploratory searches tools like Perplexity are really good.

Terretta1y ago

Perplexity lets a firm switch on SSO and give perplexity to employees without a big barrier to entry. So, we bought it for our employees and if they use it great, if they don't great. Even though we're a small startup, this is true of almost any SaaS we find that lets us control the login to stay regulation compliant. If you like it, and show how it helps your day, and the SaaS let us control the login, we'll "just pay for it".

We will not, however, pay some four or five figure SSO tax for every SaaS. We'd be bankrupted.

Kagi should do this, or at least enable domain-specific OIDC/Oauth2 — the ubiquitous "Sign in with" or "Continue with" buttons like http://xsplit.com/user/auth or http://id.atlassian.com/login since MSFT and Google accounts hit almost all businesses — and then just bill the same as the individual pro price + usage pricing.

As it stands, we reimburse employees who buy Kagi individually, this costs us more than the cost of Kagi, and means it's only one off.

PS. Don't get me started on the macOS and iOS apps that have no retail price version available. Apple provides no way for a firm to provide employees with IAP subscription apps, whether on BYOD or managed devices. We can, and do, provide any retail priced app for both BYOD and MDM. It shows up in a catalog on the device, people install it, you get a retail sale. Thank you to those devs who make a retail version available, even if it's 2x - 5x the annual cost. Empowering employees with apps is a no brainer if devs just let firms pay them to do it.

cube22221y ago

I admit I haven’t used Perplexity much, but isn’t that already mostly covered by the researcher[0] and newly added assistant that can do web searching [1]?

[0]: https://help.kagi.com/kagi/ai/assistant.html#research

[1]: https://kagi.com/changelog#4529

AI is expensive to host, and I don't trust their puke. I'd rather get a classic result page. Kagi is great.

pbronez1y ago

Here’s a 30min audio version of TFA:

https://www.audiowaveai.com/p/2626-dawn-of-a-new-era-in-sear...

blackeyeblitzar1y ago

As someone who hasn’t used paid search services, what sort of problems has this solved for other HN users that make it worth it? How does it compare to “AI” based search tools like Perplexity?

pbronez1y ago

For me, Kagi is super fast and provides high quality, customizable results. Web search Just Works. When I use a new computer/browser and accidentally search with Google, it is a viscerally unpleasant surprise.

As for Perplexity - I got a free year of Perplexity because I bought a Rabbit R1. I tried it, wasn’t impressed. I use Kagi’s AI assistant all the time. It’s my primary way of getting information from the web. I just type a free form question into my address bar, append !expert for general questions or !code for technical ones, and seconds later my question is answered and I’m back to work.

nucleardog1y ago

I use the internet basically as an extension of my brain. There's very little barrier between having a thought and seeking information. Search is the usual way those two come together.

Google, these days, seems to mostly ignore whatever I've tried to search for and instead return results that I'd call "more popular". So the top results are mostly generic, useless results and below that it's mostly blog spam or wildly unrelated things.

This is especially bad when I'm looking for specific technical documentation or trying to understand unusual or obscure problems. Usually it's returning nothing useful at all.

Kagi returns results actually related to what I'm looking for more often than not.

The thing that convinced me to pay for it was a single search. I kept hitting _something_ that was causing the Apple TV to stop showing how much time was remaining in a show and instead show something else.

I went to Kagi and searched "netflix apple TV showing wrong time remaining" (my incomplete but best understanding of the problem at the time). Kagi surfaced a result that explained what this was and how it was getting triggered as the fourth result.

I went back to Google and searched the same. Top result was "If you can't change the time or time zone on your Apple Device" from Apple. Second was "Netflix audio is out of sync" from Netflix. With the benefit of knowing what the answer was, I did find a single relevant result about 25 results down mixed in with some blog spam on removing a show from "Continue Watching" on Disney Plus, listicles on hidden ways to make the Apple TV app on your phone even better, and a link to a Google Books copy of a 2008 Men's Health Magazine (?!).

Every time I accidentally end up back on Google it's... jarring to say the least.

endisneigh1y ago

A smaller competitor wants and advocates for itself. Makes sense but is it really surprising? It would be strange otherwise.

I do wonder how far one can get charging for search.

mediumsmart1y ago

Now it dawns on me that in this new era I can write an article for the public good and by balancing the wording I am able to place it before the 3852 competing public good articles on the search results page. lets innovate

abtinf1y ago

This is an extremely disappointing post. In the past, I’ve enthusiastically supported and advocated for Kagi.

But Kagi advocating for using force to destroy its competitors is completely unacceptable to me and an admission that they do not believe they have a viable product.

Antitrust law is arbitrary and evil. If you make more money than your competitors, you have undue market power. If you price below your competitors, you are dumping. If you price the same as your competitors, you are colluding. The whole thing is a naked power grab by politicians and inferior companies.

This is a sad day. Kagi is the best thing that’s happened to the internet in the past decade. And now I have to stop my auto renew.

kingstoned1y ago

I don't see how Kagi is advocating to destroy Google, let alone its competitors. They write about potential actions the government might take to make the field more competitive.

Keep in mind that Google itself was government funded when it got started, via the NSF and the university system. Also, a ton of subsidies, tax breaks for their subsidiaries, direct payments from government (e.g. google cloud gov contracts, military recruitment ads on youtube...).

baggachipz1y ago

They're not advocating for the destruction of competitors. They're arguing FOR competitors. Google has already been ruled an illegal monopoly, and now it's time to figure out what to do about it. Kagi is saying that rather than split up products, require the protocols to be open and usable for all. That's it.

freediver1y ago

> But Kagi advocating for using force to destroy its competitors is completely unacceptable to me

Everyone is entitled to their own interpretation, but that is not what the article advocates at all. The article is about what is best for the user given the circumstances, where all other proposed remedies have focused on how to hurt Google, which the article argues to be counter-productive.

The ruling has already been made and a remedy will be chosen whether we agree with the ruling or not - so which one is the best for the users? The solution that is proposed in the article would actually mean increased competition in the space, including to Kagi.

abtinf1y ago

You have benefited my life immensely, so I will take the time to write the most helpful response I can.

*What is the nature of competition in the free market?*

The premise that Google’s success constitutes a monopoly that must be dismantled to protect consumers is rooted in a flawed interpretation of the market and the moral principles governing it.

In a free market, companies make money by offering the products and services demanded by customers. Google's dominance is not the result of coercion but its ability to meet customer demand effectively. Antitrust law declares that the choices of billions of individuals are wrong — not because those trading with the company have been treated unfairly (if they had, they wouldn’t have traded), but because those who know better through some mystical connection with the good declare it so using arbitrary standards.

There have been many competitors to Google, with at least one of them extremely well funded. There are other massive internet-scale companies that could compete with Google, though they’ve often failed in past attempts. Punishing the owners of Google for its success is to punish the very qualities that drive progress and prosperity.

You suggest that Google's control over the search index and its associated business models warrant government intervention to force “fair” access, because it would promote competition. Whatever else you mean by “fair”, you certainly don’t mean “terms that Google would find agreeable”. If one of the parties to a transaction is an unwilling participant, then it cannot be a fair transaction. We have names for this when the coercing party is anyone but the government: blackmail, protection rackets, theft, mugging, and so on.

*What about property rights?*

Simply put, Google’s indexes are the property of Google’s shareholders. Their property was created through decades of sustained investment and innovation. To force it to share its core technologies with competitors would be a gross violation of property rights—the very foundation of a free and productive society.

Your proposal to treat Google’s search index as an "essential facility" that must be shared “fairly” is a call for expropriation—the forced redistribution of Google’s property for the benefit of those whose sole claim to it is that they have not earned it.

The entire argument against Google is that they must be punished because they are successful. Kagi, even as a PBC, will one day be subject to the exact same arguments if it creates as much value as Google—someone will claim that some aspect of Kagi’s operations constitute a monopoly and Kagi should be forced to provide access to some data or service.

*What does it mean for the government to intervene?*

You claim consumers are harmed by the lack of choice in search experiences. Well, what about me? I love your product and pay for it. So do 32,633 other subscribers. Apparently, we are free to choose your product, and I’m certain many of us advocate for your products to our friends and family. But your product isn’t for everyone. Maybe not even for most. If people want Google and free search, and no one’s rights are being violated, by what authority does anyone get to tell them they are wrong?

Consumers choose Google because they think it meets their needs, not because they are coerced into using it.

Government’s only tool is violence. It can force people to do something (rare, and usually ineffective), or force people to not do something (very common, and usually equally ineffective but with disastrous side effects). Government can’t create "more diverse competition" by forcing Google to share its property; competition is the result of companies striving to innovate and offer better products. Government can’t level the playing field by forcing the competitors to do better; it can only cut off the legs of the best players to bring them down to the level of the competition.

Were your proposal for the Google index actually implemented, I speculate it would be a few short years until you would see the government declare all other companies’ indices to be illegal and subject to regulation--after all, if it’s a public good, then it must be supported and defended by the public against threats. I would expect to see forced mergers of other indices into the Google index.

*What about non-moral arguments?*

I’m not going to go into detail, but based on my education and extensive past research, in good faith I assert the following to be true: there are no documented instances of long-lasting monopolies that are not perpetuated by governments; every antitrust intervention has made consumers materially worse off; government interventions nearly always have the effect of significantly reducing competition.

If you identify a robust counterexample to any of those claims, you could probably win a Nobel prize.

*What could you do instead?*

A judge has made an unjust ruling advancing an unjust law toward unjust ends--that doesn’t mean the conversation should shift to deciding on the best unjust punishments. That would be a contradiction: there is no such thing as the most-just-unjust punishment.

You could continue your education and advocacy of the true cost of “free” search. Take out an OpEd in NYT hammering Apple’s duplicitous claims of being privacy respecting while sending every customer’s most intimate thoughts to the Google monstrosity by default. Demonstrate the cost to privacy. Demonstrate how your product exposes resources that are hidden by the search giants. Show people the super powers they can have for a modest fee.

And build a better product. Hell, make a Kagi Phone—I’d preorder one even if it were years away from release—and free us from the Apple/Google hegemony.

[0]: https://help.kagi.com/kagi/features/lenses.html

Destiner1y ago

i feel bad for kagi/ddg

they are still trying to fight the google by building pretty much the same product

while perplexity is obviously in the lead by being ai-first

freediver1y ago

For many people (me included) being AI-first is a bug, not a feature :)


79 comments

cube22221y ago

I've been using Kagi for a while (almost two years now!) and it's been nothing but excellent!

Lenses are very useful (Reddit lens is on every second search), and I personally really like the AI features they are working on.

The quick assist triggered by a question mark at the end of a search query which makes a quick ai-generated summary of the few top results is something I use constantly.

All in all, great product which I'm happy to pay for.

baybayblonde1y ago

What is "Lenses", is it the Google Lens?

cube22221y ago

Here's the docs[0].

The cool thing is you can enable a Lens when using the AI assistant with internet access, so its searches are also constrained to that Lens.

[1] https://michaelnielsen.org/ddi/how-to-crawl-a-quarter-billio...

Fire-Dragon-DoL1y ago

I don't have the advanced assistant. Is it only on ultimate?

cube22221y ago

Yeah, most AI features are.

adamcharnock1y ago

> The Google Search Index is a unique and irreplaceable resource within the digital ecosystem. Mandating fair access to it or treating it as an essential facility could address the core issues...

But weighed against the combination of Google Search + AdWords + Android + YouTube + Chrome, all in a single company? To me a 12.5PB search index feels like small change in comparison.

NB: Happy Kagi-paying customer here.

freediver1y ago

> The article estimates the Google Search Index at 12.5PB.

adamcharnock1y ago

Thank you for the reply!

> it is about creating a searchable index of the web scale that has certain quality level that is simply impossible to do anymore for many reasons that would require another article to explain

I am both happy to take your word for it, and also very interested to know more. If you were to write that article then I would love to read it.

IgorPartola1y ago

This puts it in enough perspective for me to ask: why doesn’t a university create a public/open source search index? Seems like a way to get a ton of attention.

Moreover, archive.org has all the data and data storage capabilities many times over. What prevents them from creating an open source search engine?

scroot1y ago

Or the Library of Congress, if it had the right appropriations. Google itself started with an NSF grant to explore the future of libraries.

arnaudsm1y ago

I don't buy this number. Text-only common crawl is 20TB. Remove spam and dupes, you're around <10TB of current useful data. Which you can parse and index on a single server nowadays.

It's the full Google index history with full HTML that is probably 12PB, but the useful part of the search engine is much smaller.

1vuio0pswjnm71y ago

Does CC publish the methodology for how they determine what to crawl? More particularly, how do they determine what not to crawl?

dmonitor1y ago

I assume that the major hurdle is not storing an equivalently-sized search index, but building one from scratch. Crawling takes time, and Google has had a many-year head start.

pierrefermat11y ago

Yes OP is hilariously out of touch, the storage would be sub 1% of total costs.

ldayley1y ago

Edit: wording

Edit 2: Can you imagine a world where Google's Internet Search Index is legally considered an "Essential Facility"!? https://law.stanford.edu/publications/essential-platforms/

somethoughts1y ago

Similarly Netflix is just now starting the ad revenue model after years of only subscription based services.

Eventually the temptation for multiple sources of revenue (i.e. subscription AND advertising) will likely be too great due to:

- IPO and Wall Street demands net income growth (i.e. FB/Google)

- Private Equity buys the company and needs to pay back leveraged debt

- The number of customers willing to offer up a credit card for Search stagnates and a lower cost ad tier appears and the ad infrastructure that is built is applied to the paid tiers

squiggy221y ago

I'd argue that the minute they turn to an advertising model, subscriptions would churn overnight, but in the same vein as the "don't be evil" motto being dropped, I nod at the scepticism.

pennybanks1y ago

i honestly cant imagine paying for a search engine with whats out there for free

enasterosophes1y ago

When you use Google or another ad-powered search engine, you're saying that it's okay for some company to pay to bias the results away from your best interests.

As an engineer... googling for stuff is a good fraction of your job. It's quite reasonable to pay to improve that experience. The only real question is what is reasonable to pay.

mystified50161y ago

Kagi allows you to permanently blacklist Pinterest from appearing in any search ever.

It also gives you exactly what you ask for. If you put words in quotes, you only get results matching that phrase. Same with +- modifiers, and all the other "advanced" search operators.

That's why I pay for Kagi. It stays the hell out of my way and only gives me exactly what I ask for and absolutely nothing else.
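The strict operator behaviour described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Kagi implements it: the `parse` and `matches` names are hypothetical, and matching is a plain case-insensitive substring check. Quoted phrases and `+term` are treated as required, `-term` as excluded, and bare words as optional.

```python
import re

# One token: optional +/- sign, then either a "quoted phrase" or a bare word.
TOKEN = re.compile(r'([+-]?)(?:"([^"]+)"|(\S+))')

def parse(query):
    """Split a query into (required, excluded, optional) lowercase terms."""
    required, excluded, optional = [], [], []
    for sign, phrase, word in TOKEN.findall(query):
        term = (phrase or word).lower()
        if sign == "-":
            excluded.append(term)      # -term must not appear
        elif sign == "+" or phrase:
            required.append(term)      # +term and "phrases" must appear
        else:
            optional.append(term)      # bare words only influence ranking
    return required, excluded, optional

def matches(query, text):
    """True if text contains every required term and no excluded term."""
    required, excluded, _ = parse(query)
    text = text.lower()
    return all(t in text for t in required) and not any(t in text for t in excluded)
```

For example, `matches('"search index" -pinterest', page_text)` only accepts pages that contain the exact phrase and never mention the excluded word.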

NetOpWibby1y ago

You get what you pay for

baggachipz1y ago

Then you don't understand what you're getting. If you want to be targeted and misled by opposing incentives, that's on you.

jbaber1y ago

Nothing stopping you from trying for one month to confirm.

endisneigh1y ago

Most people who complain about Google don’t even use it properly (e.g. PSE).

> Google has built a massive index of the internet that covers close to 100% of the accessible web.

If you make a new website with a few thousand pages and a few thousand images it takes quite a while for google to pick up the entire thing, if it even bothers to.

google tries to fill the result page with a small subset of websites. A good thing for users most of the time and the easiest ad money but horrible for new players.

it used to be quite common for bloggers (and others) to follow everything written about them or of interest. google (blog search) and technorati were very useful for that kind of discovery.

The average user might never have noticed that but when it was killed off the www stopped being a community.

We can pretend the index is still there. If you can't get to it, it's much like the llm content.

pierrefermat11y ago

It seems like in their mind the snake ate its tail without realising, so they don't even know what's outside of Google's index.

https://www.robtex.com/cidr/209.216.230.0-24

rychco1y ago

I’ve been a happy Kagi customer for probably 2 years(?) now. Highlights for me, as a professional:

- I can blacklist low-value domains (such as geeksforgeeks) that dominate the top of many programming searches.

- I can increase/decrease the priority of domains or pin domains to the top of searches, such as official documentation for languages/libraries.

- I can use “Lenses” to filter results for programming/academic/forum results.
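As a rough sketch of what the block/raise/lower/pin controls in the list above amount to, here is a small re-ranking pass in Python. Everything in it is an assumption for illustration: the domain sets, the weights, and the `rerank` function are hypothetical and say nothing about how Kagi actually ranks results.

```python
from urllib.parse import urlparse

# Hypothetical per-user preferences; domains and weights are illustrative only.
BLOCKED = {"geeksforgeeks.org"}
PINNED = {"docs.python.org"}
WEIGHTS = {"stackoverflow.com": 1.5, "w3schools.com": 0.5}  # >1 raises, <1 lowers

def domain_of(url):
    """Return the host of a URL without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host

def rerank(results):
    """results: list of (url, score) pairs. Returns URLs in preferred order."""
    # Drop blocked domains entirely.
    kept = [(u, s) for u, s in results if domain_of(u) not in BLOCKED]

    def key(item):
        url, score = item
        d = domain_of(url)
        # Pinned domains sort first; the rest by weighted score, descending.
        return (d not in PINNED, -score * WEIGHTS.get(d, 1.0))

    return [u for u, _ in sorted(kept, key=key)]
```

The point of the sketch is that the user's preferences are applied after retrieval, so they override whatever order the engine would otherwise have produced.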

abcdefg121y ago

The big question is wtf google isn’t doing the same? Instead they keep removing features and dumbing down their product

rychco1y ago

Allowing users to re-prioritize search results directly harms their true product: ads

baggachipz1y ago

chrisweekly1y ago

I switched from Google to DDG for default search a few years ago, and then to Kagi maybe 18 months ago. Kagi's simply excellent.

andrewstuart1y ago

Who knows what websites and pages are even on the web?

There's no index to the web that I know of apart from Google and DuckDuckGo and maybe this Kagi thing.

I want to explore the web - surely search isn't the only way to use the web?

I imagine it could be fun to explore the web, lists and graphs of interest where I can hop from here to there via list of links or graphs or nodes or something?

Does anyone know of anything like this?

1vuio0pswjnm71y ago

DNS zone files are a decent starting point for exploring the web. Not every registered domain name has an associated website but most do. The largest zone files are available to the public for free.
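A minimal Python sketch of that starting point, assuming the zone file is plain text with one record per line in the usual `owner [ttl] [class] type rdata` layout and fully qualified owner names. The `domains_from_zone` helper is hypothetical, and real zone files can be large and use relative names or other features this ignores.

```python
def domains_from_zone(lines):
    """Yield each distinct owner name that has an NS record, in file order."""
    seen = set()
    for line in lines:
        line = line.split(";", 1)[0].strip()   # drop comments
        if not line or line.startswith("$"):   # skip blanks and $ORIGIN/$TTL directives
            continue
        fields = line.split()
        # Look for the record type after the owner name (TTL and class are optional).
        if "NS" not in (f.upper() for f in fields[1:]):
            continue
        name = fields[0].rstrip(".").lower()
        if name and name not in seen:
            seen.add(name)
            yield name
```

Feeding it an open file object, e.g. `for name in domains_from_zone(open(path)): ...`, gives a list of delegated domains that a crawler could then try over HTTP.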

1vuio0pswjnm71y ago

Organising a directory of websites, or information about websites, by IP address/range is still common. For example, we can see news.ycombinator.com listed here:

https://www.robtex.com/ip-lookup/209.216.230.240

The above directory has been online for at least 18 years.

Here is one I can remember that disappeared:

https://web.archive.org/web/20090907060026if_/http://onsamei...

nucleardog1y ago

> view whatever was hosted at that address (default host in the case of virtual hosting). Not sure why these websites do not persist.

You probably answered your own question here--I imagine at this point the amount of things on the internet that aren't using some sort of virtual hosting are quite tiny.

AndroidKitKat1y ago

[1] - https://blog.kagi.com/small-web [2] - https://kagi.com/smallweb

ColinHayhurst1y ago

pbronez1y ago

Here’s the best list of search indexes I’ve come across

https://seirdy.one/posts/2021/03/10/search-engines-with-own-...

firecall1y ago

Webrings?

There was an attempt to push a return of Webrings I think...

But funnily enough, whilst browsing this thread, https://news.ycombinator.com/item?id=41389642

I commented to my colleagues:

Remember when people would find good websites and share them!

madrox1y ago

icar1y ago

I wish Kagi was cheaper. It's expensive in many countries and it doesn't help that we are bombarded with subscriptions everywhere.

bugtodiffer1y ago

I wish Kagi would use the money to build a search engine.

But instead they use it to build a browser no one wants and an email service no one needs.

Yet their search website is still broken here and there...

benhurmarcel1y ago

I’ve had good results requesting fixes on https://kagifeedback.org/ , even minor/aesthetics stuff.

abcdefg121y ago

I do want their browser. Especially with chrome v3 going crapware route. At least there is brave.

poikroequ1y ago

> Apple has stated that Bing does not match Google’s search result quality, and they are unwilling to compromise on user experience by offering subpar results.

jpc01y ago

Bing is objectively worse.

rqtwteye1y ago

Maybe Bing is managed by the same people that haven't managed to get Windows search right despite trying for many, many years?

xipho1y ago

Use DDG (Bing), append `!g` after you get nothing and search again? Do this with many many different `!`.

skinkestek1y ago

Last I tried at least:

- Current Google was bad compared to original Google (it ignores my keywords, even if I use double quotes and verbatim)

- DuckDuckGo and Bing managed to be worse

- Kagi is like old Google

moonlion_eth1y ago

Paying kagi user. All day every day

freefaler1y ago

I've used Kagi for a year and it was better than Google for most of the searches.

However I think the model will be changed to something more like Perplexity.ai

I've switched to Perplexity and for most of the searches it works better than Kagi.

They'd need to add something like this to survive in the long run, because for exploratory searches tools like Perplexity are really good.

Terretta1y ago

We will not, however, pay some four or five figure SSO tax for every SaaS. We'd be bankrupted.

As it stands, we reimburse employees who buy Kagi individually, this costs us more than the cost of Kagi, and means it's only one off.

cube22221y ago

I admit I haven’t used Perplexity much, but isn’t that already mostly covered by the researcher[0] and newly added assistant that can do web searching [1]?

[0]: https://help.kagi.com/kagi/ai/assistant.html#research

[1]: https://kagi.com/changelog#4529

https://www.audiowaveai.com/p/2626-dawn-of-a-new-era-in-sear...

AI is expensive to host, and I don't trust their puke. I'd rather get a classic result page. Kagi is great.

pbronez1y ago

Here’s a 30min audio version of TFA:

blackeyeblitzar1y ago

As someone who hasn’t used paid search services, what sort of problems has this solved for other HN users that make it worth it? How does it compare to “AI” based search tools like Perplexity?

pbronez1y ago

nucleardog1y ago

I use the internet basically as an extension of my brain. There's very little barrier between having a thought and seeking information. Search is the usual way those two come together.

This is especially bad when I'm looking for specific technical documentation or trying to understand unusual or obscure problems. Usually it's returning nothing useful at all.

Kagi returns results actually related to what I'm looking for more often than not.

Every time I accidentally end up back on Google it's... jarring to say the least.

endisneigh1y ago

A smaller competitor wants and advocates for itself. Makes sense but is it really surprising? It would be strange otherwise.

I do wonder how far one can get charging for search.

mediumsmart1y ago

abtinf1y ago

This is an extremely disappointing post. In the past, I’ve enthusiastically supported and advocated for Kagi.

But Kagi advocating for using force to destroy its competitors is completely unacceptable to me and an admission that they do not believe they have a viable product.

This is a sad day. Kagi is the best thing that’s happened to the internet in the past decade. And now I have to stop my auto renew.

kingstoned1y ago

I don't see how Kagi are advocating to destroy Google and especially competitors. They write about potential actions the government might take to make the field more competitive.

baggachipz1y ago

freediver1y ago

> But Kagi advocating for using force to destroy its competitors is completely unacceptable to me

abtinf1y ago

You have benefited my life immensely, so I will take the time to write the most helpful response I can.

*What is the nature of competition in the free market?*

The premise that Google’s success constitutes a monopoly that must be dismantled to protect consumers is rooted in a flawed interpretation of the market and the moral principles governing it.

*What about property rights?*

*What does it mean for the government to intervene?*

Consumers choose Google because they think it meets their needs, not because they are coerced into using it.

*What about non-moral arguments?*

If you identify a robust counterexample to any of those claims, you could probably win a Nobel prize.

*What could you do instead?*

And build a better product. Hell, make a Kagi Phone—I’d preorder one even if it were years away from release—and free us from the Apple/Google hegemony.