Lenses are very useful (I use the Reddit lens on every other search), and I really like the AI features they are working on.
The quick assist, triggered by ending a search query with a question mark, produces a short AI-generated summary of the top few results; I use it constantly.
The new, more advanced assistant can run its own searches (which can also be constrained to lenses) and lets you pick from a range of models. It is also excellent, and basically means I don't need a ChatGPT/Claude subscription, as Kagi covers that very well.
All in all, a great product that I'm happy to pay for.
It's basically a saved partial filter you can apply when searching. For example, you can say you only want to search Reddit and save that as an easily accessible button. There are also predefined ones like "Forums" and "Academic", so that's the general idea.
The cool thing is you can enable a Lens when using the AI assistant with internet access, so its searches are also constrained to that Lens.
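To make the "saved partial filter" idea concrete, here is a minimal Python sketch. It is not Kagi's implementation: it assumes a lens is nothing more than a named set of allowed domains, and `Lens`, `search`, and the backend callable are hypothetical names. Wrapping the search call is what lets an assistant's searches inherit the same constraint.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List, Optional, Tuple

# A hit is (domain, title); a stand-in for whatever a real engine returns.
Hit = Tuple[str, str]


@dataclass(frozen=True)
class Lens:
    """Hypothetical lens: a saved, named set of allowed domains."""
    name: str
    sites: FrozenSet[str]

    def allows(self, domain: str) -> bool:
        # Accept the domain itself or any subdomain (www.reddit.com, old.reddit.com, ...).
        domain = domain.lower()
        return any(domain == s or domain.endswith("." + s) for s in self.sites)


def search(backend: Callable[[str], Iterable[Hit]], query: str,
           lens: Optional[Lens] = None) -> List[Hit]:
    """Run a query and, if a lens is active, keep only the hits it allows.

    An assistant that calls this same function for its web searches
    is constrained by the lens too.
    """
    hits = backend(query)
    if lens is None:
        return list(hits)
    return [hit for hit in hits if lens.allows(hit[0])]


REDDIT = Lens("Reddit", frozenset({"reddit.com"}))
```

Saving the lens as a value, rather than retyping `site:` operators, is what makes it a one-click button.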
The article estimates the Google Search Index at 12.5 PB. If Kagi thinks that is a big enough moat to be the primary target then, well, I suppose they should know. But I'm also skeptical. You could fit that on about 50 Hetzner SX295 servers, roughly $20k/month, plus the cost of gathering the data. It is surely a huge resource.
But weighed against the combination of Google Search + AdWords + Android + YouTube + Chrome, all in a single company? To me a 12.5 PB search index feels like small change in comparison.
NB: Happy Kagi-paying customer here.
I realize the estimated number was wrong (thanks for pointing that out); it should be closer to 180 PB for raw crawl data. Since the figure is speculative and does not account for the other data needed to actually rank pages, or the hardware needed to do it in under 500 ms at a scale of billions of queries per day, it could be misleading about the true effort involved, so I edited that data point out of the article.
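The storage arithmetic in this exchange can be checked with a back-of-envelope sketch. It uses only the figures quoted in the thread (12.5 PB on about 50 servers for about $20k/month, and the revised 180 PB) and assumes capacity and price scale linearly with server count; it says nothing about actual SX295 specs and, as noted above, ignores ranking data, serving hardware, and crawl cost.

```python
import math

TB_PER_PB = 1000  # decimal units


def per_server(total_pb: float, servers: int, monthly_usd: float):
    """Split a total capacity and monthly bill evenly across servers."""
    return total_pb * TB_PER_PB / servers, monthly_usd / servers


def servers_needed(total_pb: float, tb_per_server: float) -> int:
    """Whole servers needed to hold total_pb at a fixed per-server capacity."""
    return math.ceil(total_pb * TB_PER_PB / tb_per_server)


# Figures from the first comment: 12.5 PB on ~50 servers for ~$20k/month.
tb_each, usd_each = per_server(12.5, 50, 20_000)
print(tb_each, usd_each)  # 250.0 400.0

# Same per-server density and price, scaled to the revised 180 PB estimate.
n = servers_needed(180, tb_each)
print(n, n * usd_each)  # 720 288000.0
```

Even at roughly 14x the original figure, raw storage alone stays in the hundreds of thousands of dollars per month under these assumptions, which is consistent with the point that the hard part is everything beyond storage.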
You are right: just crawling a large number of pages (millions, even billions) is indeed straightforward (e.g. [1]); it is about creating a searchable index of the web scale that has certain quality level that is simply impossible to do anymore for many reasons that would require another article to explain. Microsoft, by its own account, spent $100bn and the last 20 years trying to match it, and most people agree it is still not even close. At some point you reach diminishing returns. To use the analogy from the article, it is akin to someone trying to rebuild the entire US railroad network today: it sounds plausible, but not really in practice. That train left the station in the early 2000s.
[1] https://michaelnielsen.org/ddi/how-to-crawl-a-quarter-billio...
> it is about creating a searchable index of the web scale that has certain quality level that is simply impossible to do anymore for many reasons that would require another article to explain
I am both happy to take your word for it, and also very interested to know more. If you were to write that article then I would love to read it.
Moreover, archive.org has all the data and data storage capabilities many times over. What prevents them from creating an open source search engine?
It's the full Google index history with full HTML that is probably 12PB, but the useful part of the search engine is much smaller.
Edit: wording
Edit 2: Can you imagine a world where Google's Internet Search Index is legally considered an "Essential Facility"!? https://law.stanford.edu/publications/essential-platforms/
Similarly, Netflix is only now introducing an ad-supported model after years of subscription-only service.
Eventually the temptation for multiple sources of revenue (i.e. subscription AND advertising) will likely be too great due to:
- An IPO, after which Wall Street demands net income growth (e.g. FB/Google)
- Private equity buys the company and needs to pay back leveraged debt
- The number of customers willing to hand over a credit card for search stagnates, so a lower-cost ad tier appears, and the ad infrastructure built for it is then applied to the paid tiers
When you use, say, Kagi, you become the customer; they have a vested interest in providing you with the best experience they can, because they know they're relying on your continued patronage to be able to keep competing with those other free search engines which you already admitted are pretty good.
As an engineer... googling for stuff is a good fraction of your job. It's quite reasonable to pay to improve that experience. The only real question is what is reasonable to pay.
It also gives you exactly what you ask for. If you put words in quotes, you only get results matching that phrase. Same with the + and - modifiers, and all the other "advanced" search operators.
Meanwhile, free engines often completely ignore your query to show you unrelated SEO slop. You can forget modifiers and advanced queries; they aren't even parsed. About the only thing that still works is the site: modifier, and I'm pretty sure they only keep that because 40% of Google searches include site:reddit.
As well, if Kagi can't find a result for your query, it returns nothing. Try searching something incredibly obscure on google or DDG. You get pages upon pages of results and they're all useless garbage or just straight up ads disguised as results.
That's why I pay for Kagi. It stays the hell out of my way and only gives me exactly what I ask for and absolutely nothing else.
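To illustrate what strict operator handling means (quoted phrases, + and - modifiers, site:), here is a minimal Python sketch. It is an illustration under stated assumptions, not how Kagi or any other engine actually parses queries: matching is case-insensitive and word-level, plain words are left to ranking rather than filtering, and `parse` and `matches` are hypothetical names.

```python
import re
from dataclasses import dataclass, field
from typing import List

# A quoted phrase (optionally signed) or any other whitespace-delimited token.
TOKEN = re.compile(r'([+-]?)"([^"]+)"|(\S+)')


@dataclass
class Query:
    phrases: List[str] = field(default_factory=list)   # "exact phrase"
    required: List[str] = field(default_factory=list)  # +term
    excluded: List[str] = field(default_factory=list)  # -term or -"phrase"
    sites: List[str] = field(default_factory=list)     # site:domain
    terms: List[str] = field(default_factory=list)     # plain words (ranking only)


def parse(q: str) -> Query:
    out = Query()
    for m in TOKEN.finditer(q):
        sign, phrase, word = m.groups()
        if phrase is not None:
            (out.excluded if sign == "-" else out.phrases).append(phrase)
        elif word.lower().startswith("site:") and len(word) > 5:
            out.sites.append(word[5:].lower())
        elif word[0] in "+-" and len(word) > 1:
            (out.required if word[0] == "+" else out.excluded).append(word[1:])
        else:
            out.terms.append(word)
    return out


def _norm(text: str) -> str:
    # Lowercase, keep word characters, pad with spaces so matches respect word boundaries.
    return " " + " ".join(re.findall(r"\w+", text.lower())) + " "


def _contains(body: str, phrase: str) -> bool:
    needle = _norm(phrase)
    return needle.strip() != "" and needle in body


def matches(q: Query, text: str, domain: str) -> bool:
    """Strict filter: every phrase and +term present, no -term present, domain within site:."""
    domain = domain.lower()
    if q.sites and not any(domain == s or domain.endswith("." + s) for s in q.sites):
        return False
    body = _norm(text)
    if not all(_contains(body, p) for p in q.phrases + q.required):
        return False
    return not any(_contains(body, p) for p in q.excluded)
```

Under these rules a document either satisfies every operator or is dropped, so an obscure query can legitimately return nothing.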
Most people who complain about Google don’t even use it properly (e.g. PSE, Programmable Search Engine).
While their index (of other people's stuff) is enormous, it is far from including everything. It is easy to disqualify content, and people would be screaming if content farms were included. What even is a content farm nowadays? With LLMs, one can return a reasonable article for any query, rich in links to other pages that don't exist but could be indexed and become part of the accessible web.
If you make a new website with a few thousand pages and a few thousand images it takes quite a while for google to pick up the entire thing, if it even bothers to.
Google tries to fill the results page with a small subset of websites. That is good for users most of the time, and the easiest ad money, but horrible for new players.
It used to be quite common for bloggers (and others) to follow everything written about them or on topics of interest. Google (Blog Search) and Technorati were very useful for that kind of discovery.
The average user might never have noticed that but when it was killed off the www stopped being a community.
We can pretend the index is still there, but if you can't get to it, it's much the same as the LLM content.
- I can blacklist low-value domains (such as geeksforgeeks) that dominate the top of many programming searches.
- I can increase/decrease the priority of domains or pin domains to the top of searches, such as official documentation for languages/libraries.
- I can use “Lenses” to filter results for programming/academic/forum results.
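The block, raise/lower, and pin features in the list above amount to a per-user re-ranking pass over the engine's results. The Python sketch below shows one way that could work; it is not Kagi's implementation. The `Result` type, the rule names, and the 0.5/2.0 multipliers are hypothetical choices, and a rule on a parent domain is assumed to cover its subdomains.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Result:
    domain: str
    score: float  # engine relevance score, higher is better


# Hypothetical multipliers for "lower" and "raise"; real weights are unknown.
MULTIPLIER = {"lower": 0.5, "raise": 2.0}


def rule_for(domain: str, prefs: Dict[str, str]) -> Optional[str]:
    """Most specific rule for a domain or any parent domain (www.x.org -> x.org)."""
    parts = domain.lower().split(".")
    for i in range(len(parts) - 1):
        rule = prefs.get(".".join(parts[i:]))
        if rule is not None:
            return rule
    return None


def personalize(results: List[Result], prefs: Dict[str, str]) -> List[Result]:
    pinned, rest = [], []
    for r in results:
        rule = rule_for(r.domain, prefs)
        if rule == "block":
            continue  # never shown
        if rule == "pin":
            pinned.append(r)
        else:
            # Unlisted domains keep their score (multiplier 1.0).
            rest.append(replace(r, score=r.score * MULTIPLIER.get(rule, 1.0)))

    def by_score(r: Result) -> float:
        return r.score

    # Pinned results always come first, then everything else by adjusted score.
    return sorted(pinned, key=by_score, reverse=True) + sorted(rest, key=by_score, reverse=True)
```

Pinning is modeled as a separate tier rather than a large multiplier, so official docs stay on top regardless of how the engine scored them.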
There's no index to the web that I know of apart from Google and DuckDuckGo and maybe this Kagi thing.
I want to explore the web - surely search isn't the only way to use the web?
I imagine it could be fun to explore the web through lists and graphs of interest, hopping from here to there via lists of links, graphs, nodes or something?
Does anyone know of anything like this?
DNS zone files are a decent starting point for exploring the web. Not every registered domain name has an associated website but most do. The largest zone files are available to the public for free.
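As a starting point for that kind of exploration, here is a minimal Python sketch that pulls registered second-level domains out of a TLD zone file you have already downloaded. It assumes every record line is written in full as `owner TTL class type rdata` with fully qualified owner names; relative names, omitted fields, and multi-line records are not handled, and `domains_from_zone` is a hypothetical helper name.

```python
from typing import Iterable, List


def domains_from_zone(lines: Iterable[str]) -> List[str]:
    """Collect delegated (registered) domain names from a TLD zone file.

    Assumes full-form record lines: owner TTL class type rdata.
    """
    seen = set()
    for raw in lines:
        line = raw.split(";", 1)[0].strip()  # drop comments
        fields = line.split()
        if len(fields) < 5 or line.startswith("$"):
            continue  # blank lines and $ORIGIN/$TTL directives
        owner, rtype = fields[0].lower().rstrip("."), fields[3].upper()
        # An NS record below the TLD apex marks a delegated, i.e. registered, domain.
        if rtype == "NS" and "." in owner:
            seen.add(owner)
    return sorted(seen)
```

Each name can then be tried as `https://<domain>/`, keeping in mind the caveat above that not every registered domain serves a website.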
https://www.robtex.com/cidr/209.216.230.0-24
https://www.robtex.com/ip-lookup/209.216.230.240
The above directory has been online for at least 18 years.
Here is one I can remember that disappeared:
https://web.archive.org/web/20090907060026if_/http://onsamei...
You probably answered your own question here--I imagine at this point the amount of things on the internet that aren't using some sort of virtual hosting are quite tiny.
Even ignoring everything else, with IPv4 address exhaustion and the push for SSL (publicly trusted certificates for bare IP addresses are rare), websites available directly at an IP address just aren't really a thing anymore.
Kagi's Small Web: https://blog.kagi.com/small-web and https://kagi.com/smallweb
https://seirdy.one/posts/2021/03/10/search-engines-with-own-...
There was an attempt to push a return of Webrings I think...
But funnily enough, whilst browsing this thread, https://news.ycombinator.com/item?id=41389642
I commented to my colleagues:
Remember when people would find good websites and share them!
I don't think it looks like search today. Google got where they were because they were 10x better than everything else and had an experience focusing on what mattered at the time. I don't think the 10x experience will look like ten blue links. I don't know what that next experience is, but I'll know it when I see it.
But instead they use it to build a browser no one wants and an email service no one needs.
Yet their search website is still broken here and there...
I wouldn't take this statement at face value. This is most likely a BS PR excuse for Apple to maintain their current deal with Google. I wouldn't expect anything less from any large corporation looking to protect $20 billion in annual revenue.
Bing is objectively worse.
- Current Google is bad compared to the original Google (it ignores my keywords, even if I use double quotes and Verbatim mode)
- DuckDuckGo and Bing managed to be worse
- Kagi is like old Google
However, I think the model will shift to something more like Perplexity.ai.
I've switched to Perplexity and for most of the searches it works better than Kagi.
They'd need to add something like this to survive in the long run, because for exploratory searches tools like Perplexity are really good.
We will not, however, pay some four or five figure SSO tax for every SaaS. We'd be bankrupted.
Kagi should do this, or at least enable domain-specific OIDC/OAuth2 (the ubiquitous "Sign in with" or "Continue with" buttons, like http://xsplit.com/user/auth or http://id.atlassian.com/login, since Microsoft and Google accounts cover almost all businesses), and then just bill the same as the individual Pro price plus usage pricing.
As it stands, we reimburse employees who buy Kagi individually; this costs us more than Kagi itself and means adoption is only one-off.
PS. Don't get me started on the macOS and iOS apps that have no retail-priced version available. Apple provides no way for a firm to provide employees with IAP subscription apps, whether on BYOD or managed devices. We can, and do, provide any retail-priced app for both BYOD and MDM: it shows up in a catalog on the device, people install it, and you get a retail sale. Thank you to those devs who make a retail version available, even if it's 2x to 5x the annual cost. Empowering employees with apps is a no-brainer if devs just let firms pay them to do it.
https://www.audiowaveai.com/p/2626-dawn-of-a-new-era-in-sear...
As for Perplexity: I got a free year of Perplexity because I bought a Rabbit R1. I tried it and wasn’t impressed. I use Kagi’s AI assistant all the time; it’s my primary way of getting information from the web. I just type a free-form question into my address bar, append !expert for general questions or !code for technical ones, and seconds later my question is answered and I’m back to work.
Google, these days, seems to mostly ignore whatever I've tried to search for and instead return results that I'd call "more popular". So the top results are mostly generic, useless results and below that it's mostly blog spam or wildly unrelated things.
This is especially bad when I'm looking for specific technical documentation or trying to understand unusual or obscure problems. Usually it's returning nothing useful at all.
Kagi returns results actually related to what I'm looking for more often than not.
The thing that convinced me to pay for it was a single search. I kept hitting _something_ that was causing the Apple TV to stop showing how much time was remaining in a show and instead show something else.
I went to Kagi and searched "netflix apple TV showing wrong time remaining" (my incomplete but best understanding of the problem at the time). The fourth result Kagi surfaced explained what this was and how it was getting triggered.
I went back to Google and searched the same. Top result was "If you can't change the time or time zone on your Apple Device" from Apple. Second was "Netflix audio is out of sync" from Netflix. With the benefit of knowing what the answer was, I did find a single relevant result about 25 results down mixed in with some blog spam on removing a show from "Continue Watching" on Disney Plus, listicles on hidden ways to make the Apple TV app on your phone even better, and a link to a Google Books copy of a 2008 Men's Health Magazine (?!).
Every time I accidentally end up back on Google it's... jarring to say the least.
I do wonder how far one can get charging for search.
But Kagi advocating for using force to destroy its competitors is completely unacceptable to me and an admission that they do not believe they have a viable product.
Antitrust law is arbitrary and evil. If you make more money than your competitors, you have undue market power. If you price below your competitors, you are dumping. If you price the same as your competitors, you are colluding. The whole thing is a naked power grab by politicians and inferior companies.
This is a sad day. Kagi is the best thing that’s happened to the internet in the past decade. And now I have to stop my auto renew.
Keep in mind that Google itself was government-funded when it got started, via the NSF and the university system. It has also received a ton of subsidies, tax breaks for its subsidiaries, and direct payments from the government (e.g. Google Cloud government contracts, military recruitment ads on YouTube...).
Everyone is entitled to their own interpretation, but that is not what the article advocates at all. The article is about what is best for the user given the circumstances, where all other proposed remedies have focused on how to hurt Google, which article argues to be counter-productive.
The ruling has already been made and a remedy will be chosen whether we agree with the ruling or not - so which one is the best for the users? The solution that is proposed in the article would actually mean increased competition in the space, including to Kagi.
*What is the nature of competition in the free market?*
The premise that Google’s success constitutes a monopoly that must be dismantled to protect consumers is rooted in a flawed interpretation of the market and the moral principles governing it.
In a free market, companies make money by offering the products and services demanded by customers. Google's dominance is not the result of coercion but of its ability to meet customer demand effectively. Antitrust law declares that the choices of billions of individuals are wrong — not because those trading with the company have been treated unfairly (if they had, they wouldn’t have traded), but because those who know better through some mystical connection with the good declare it so using arbitrary standards.
There have been many competitors to Google, with at least one of them extremely well funded. There are other massive internet-scale companies that could compete with Google, though they’ve often failed in past attempts. Punishing the owners of Google for its success is to punish the very qualities that drive progress and prosperity.
You suggest that Google's control over the search index and its associated business models warrant government intervention to force “fair” access, because it would promote competition. Whatever else you mean by “fair”, you certainly don’t mean “terms that Google would find agreeable”. If one of the parties to a transaction is an unwilling participant, then it cannot be a fair transaction. We have names for this when the coercing party is anyone but the government: blackmail, protection rackets, theft, mugging, and so on.
*What about property rights?*
Simply put, Google’s indexes are the property of Google’s shareholders. Their property was created through decades of sustained investment and innovation. To force it to share its core technologies with competitors would be a gross violation of property rights—the very foundation of a free and productive society.
Your proposal to treat Google’s search index as an "essential facility" that must be shared “fairly” is a call for expropriation—the forced redistribution of Google’s property for the benefit of those whose sole claim to it is that they have not earned it.
The entire argument against Google is that they must be punished because they are successful. Kagi, even as a PBC, will one day be subject to the exact same arguments if it creates as much value as Google: someone will claim that some aspect of Kagi’s operations constitutes a monopoly and that Kagi should be forced to provide access to some data or service.
*What does it mean for the government to intervene?*
You claim consumers are harmed by the lack of choice in search experiences. Well, what about me? I love your product and pay for it. So do 32,633 other subscribers. Apparently, we are free to choose your product, and I’m certain many of us advocate for your products to our friends and family. But your product isn’t for everyone. Maybe not even for most. If people want Google and free search, and no one’s rights are being violated, by what authority does anyone get to tell them they are wrong?
Consumers choose Google because they think it meets their needs, not because they are coerced into using it.
Government’s only tool is violence. It can force people to do something (rare, and usually ineffective), or force people to not do something (very common, and usually equally ineffective but with disastrous side effects). Government can’t create "more diverse competition" by forcing Google to share its property; competition is the result of companies striving to innovate and offer better products. Government can’t level the playing field by forcing the competitors to do better, it can only cut off the legs of the best players to bring them down to the level of the competition.
Were your proposal for the Google index actually implemented, I speculate it would be only a few short years until you saw the government declare all other companies’ indices to be illegal and subject to regulation. After all, if it’s a public good, then it must be supported and defended by the public against threats. I would expect to see forced mergers of other indices into the Google index.
*What about non-moral arguments?*
I’m not going to go into detail, but based on my education and extensive past research, in good faith I assert the following to be true: there are no documented instances of long-lasting monopolies that are not perpetuated by governments; every antitrust intervention has made consumers materially worse off; government interventions nearly always have the effect of significantly reducing competition.
If you can identify a robust counterexample to any of those claims, you could probably win a Nobel Prize.
*What could you do instead?*
A judge has made an unjust ruling advancing an unjust law toward unjust ends--that doesn’t mean the conversation should shift to deciding on the best unjust punishments. That would be a contradiction: there is no such thing as the most-just-unjust punishment.
You could continue your education and advocacy about the true cost of “free” search. Take out an op-ed in the NYT hammering Apple’s duplicitous claims of being privacy-respecting while sending every customer’s most intimate thoughts to the Google monstrosity by default. Demonstrate the cost to privacy. Demonstrate how your product exposes resources that are hidden by the search giants. Show people the super powers they can have for a modest fee.
And build a better product. Hell, make a Kagi Phone—I’d preorder one even if it were years away from release—and free us from the Apple/Google hegemony.
They are still trying to fight Google by building pretty much the same product,
while Perplexity is obviously in the lead by being AI-first.