What happened? Did they let their golden goose starve to death? Or is this all in my head and Google is actually fine for everyone else?
Google search has gotten that bad recently.
I'd say I'm not sure the search you linked supports the idea that people have been asking about a decline in Google's quality continuously for 13 years. Most of the threads on the topic seem to start about four years ago and pick up in frequency from there until now.
Though this could just be a result of the search engine's limited ability to index old threads or some other sampling bias (like there being more users on the site recently).
All of that said, here's roughly what I saw in the threads you linked:
2009: People notice that advanced settings in search seem to not work as well as before.
2011: Google now seems to ignore some of the words in your query if they're uncommon.
2011: Google starts giving preferential treatment to its own products like Google Reviews, Shopping, and YouTube.
2012: Users note Google Plus's "+1"s seem to have an outsized influence on search results.
2013: Google Hummingbird launches - it now appears that your query always gets passed through an interpretation layer. Making precise queries becomes significantly more difficult. Note this was probably (?) to make voice search return better results since those queries were more conversational.
2015: The importance of site performance in general, and mobile performance specifically, to rankings seems to have increased significantly. Likely influenced by the Android One project.
2016: Google increases the importance of following their "webmaster guidelines" and W3C validation.
2016: Amit Singhal, who oversaw Google Search for 15 years, leaves the company to join Uber.
2017: Google starts to try and identify and deprioritize "offensive", "upsetting", and "inappropriate" content, as crowdsourced from users and paid evaluators.
2018: Google launches an update aimed at websites with low "expertise, authority, or trustworthiness." They state that they punish sites which are "not in line with the general scientific consensus" or which have negative sentiment associated with them on review sites.
- It's about this point that threads about Google's decreasing search quality show a strong uptick.
2019: Google starts using BERT - a machine learning model for interpreting search queries.
2020: Google releases updates aimed at further reducing the rankings of "misinformation" or "biased" content.
- At this point there appears to be strong consensus, even outside of technical circles, that Google's search results are deteriorating - to the point where there are memes and popular YouTube videos about it.
2021: Google releases MUM - an AI/ML model that aims to reduce the number of queries a user needs to make by including results for related queries that it thinks you'll make next.
2022 (May): An update to Google's search algorithm deprioritizes reference websites like dictionaries, lyric websites, and wikis while promoting video content.
2022 (June): After years of denying that a page's 'freshness' affects its ranking, Google increases its importance in the algorithm dramatically - reportedly affecting the ranking of 35% of search results.
----
I'm not a search engine guy and don't have any particular knowledge about Google's search engine. But reading through the history of HN discussion about it does paint an interesting picture.
I'll leave it to others to speculate about any correlations between these changes and the end product.
You're right, it was definitely hyperbolic to say it's been posted monthly on a continuous basis dating back that far. As you've pointed out, it would have been more accurate to say it was first posted 14 years ago and has been posted (close to) monthly for the last few years.
Case in point: I was just trying to find some information about TGV rail routes. A standard Google search yielded nothing but spam. Appending “Reddit” gave me posts more than 5 years old with out-of-date information.
It feels as if the internet is sharded just like MMORPG servers. Nowadays more humans than ever have access to the internet, but organic creations are getting harder to find and everything is swamped with SEO spam.
I still can't quite put my finger on what's wrong.
https://forum.agoraroad.com/index.php?threads/dead-internet-...
The constant moving of communities is also a problem. We dropped IRC, join us on Slack. Crickets... er, we moved to Discord. Each move must impact engagement negatively.
The quality of _the web_ has been in decline lately.
With ML’s capacity to paraphrase original content and to generate plausible rubbish content from scratch, it’s very difficult for Google’s pagerank (or whatever they call their algo these days) to fight back.
That being said, there does seem to be a fair bit of scraping and pasting going on. I'm surprised G is not looking at published dates and lowering the scammers' ranking.
It works really poorly when all the links are paid for, or bargained for, or part of social media sites that are overrun with spammers and use rel=nofollow anyway, or are internal-only because every site wants to be its own walled garden. That's the web we've got now.
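For context, link-based ranking in its original form is simple enough to sketch in a few lines. This is the textbook power-iteration version of PageRank with a made-up three-page graph, nothing like Google's production system:

```python
import numpy as np

# links[src] = pages that page src links out to (toy graph: 0=blog, 1=shop, 2=wiki)
links = {0: [1, 2], 1: [2], 2: [0]}
n = len(links)
d = 0.85  # conventional damping factor

# Build the column-stochastic transition matrix of the link graph
M = np.zeros((n, n))
for src, outs in links.items():
    for dst in outs:
        M[dst, src] = 1.0 / len(outs)

# Power iteration: rank mass flows along links until it stabilizes
rank = np.full(n, 1.0 / n)
for _ in range(50):
    rank = (1 - d) / n + d * M @ rank

print(rank.round(3))  # scores derive entirely from who links to whom
```

Which is exactly the problem: when the links themselves are bought, bartered, or marked nofollow, the only signal the algorithm has left is noise.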
I suspect Google is boosting ad-supported content over non-ad-supported content, directly incentivizing paraphrased/copy-pasta content.
Search engines should let us configure a whitelist of sites for certain categories/context of search.
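Until an engine offers that natively, you can approximate it today by expanding queries with site: filters, which Google and DDG both support. A rough sketch, with made-up category lists:

```python
# Hypothetical per-category whitelists; the domains are just examples.
WHITELISTS = {
    "programming": ["stackoverflow.com", "news.ycombinator.com"],
    "reviews": ["reddit.com", "wirecutter.com"],
}

def restrict(query: str, category: str) -> str:
    """Rewrite a query so results only come from the whitelisted sites."""
    sites = WHITELISTS[category]
    return f"{query} ({' OR '.join('site:' + s for s in sites)})"

print(restrict("mechanical keyboard", "reviews"))
# -> mechanical keyboard (site:reddit.com OR site:wirecutter.com)
```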
I'm on the fence on this.
Contributors and commenters on HN manage to surface many interesting sites that would be difficult to find using a modern search engine. There are also a large number of old sites that have continued to exist, even if otherwise unmaintained, which are likewise difficult to find. On the other hand, these sites may be less common than they used to be.
Likewise, scraped-and-pasted websites are nothing new. It has always been a relatively low-effort way to post content. What seems to be new is how often nearly identical pages appear in the top search results. This could be because it is more common, but it may also be because the algorithms are favouring very particular types of content.
It's time for the cycle to begin again.
On one hand you have Google, trying to keep patching an algorithm while also maintaining an ad revenue stream.
On the other hand you have the whole of humanity's ingenuity trying to make money for themselves.
Why do we think one system is going to be able to defeat the whole universe of creative hacks trying to beat that system, at any reasonable rate?
(It's a wonder that it's even still adequate... but as long as the main competitors are using the same machine-and-math-driven approach, just with fewer resources to throw at it, it's unlikely that there will be a ubiquitous replacement making a major improvement any time soon.)
And then the products - does anyone wake up every day feeling super excited to help push ads and shrill partisan hacks to boomers? It's like, to work there, you just accept "this is how it is and I like money", punch a card, and leave. It must be a miserable existence for anyone who truly ponders it.
1. Search has to be hard because filtering spam is a cat and mouse game inherently.
2. Useful information has moved off of searchable sources. Instead of forums, we have discord, facebook groups, slack channels, twitter...
3. There is simply more shit, exponentially, than when I was young, and therefore there's more garbage lying around. The ratio of shit:not shit hasn't gotten better.
Recently, I couldn't remember the domain name for the web game, https://plurality.fun. Searching "plurality game" in DDG, Brave, and even Bing, the results are quite bad¹. However, the site I wanted was the #1 result on Google.
I've been using alternative search engines like DDG since ~2015, and I'm quite disappointed at the lack of innovation so far. It feels like a $20 bill on the sidewalk, so I assume I'm missing some reason why they all suck, but for the life of me I don't have an explanation.
¹To be fair, the site I wanted is pretty crappy, and not necessarily what everyone wants with those search terms. Nevertheless, it was #14 on Brave's search, where 3 of the top 5 results are links to the same academic paper, "When are plurality rule voting games dominance-solvable?" on three separate domains.
The web is the lowest common denominator. The high-quality information is elsewhere, in places like Google Scholar and Google Books, and Google isn't allowed to reveal it due to rights holders.
I tend to spend my time on Infinity Family (https://o2oo.li) and https://halfbakery.com, a community of inventors and creators. We have an ontology for solving the world's problems and work towards goals and projects as a community. We relax and share opinionated perspectives on problems in the world.
Also, I presume the search algorithm is not how it used to be, what with ML models being used left and right. I wonder if anyone really knows how we get the results anymore. There are a few initiatives like Kagi, but I know too little about them to say anything at this point. For now, I am quite content with appending "reddit" to the end of every non-technical query... at least until they start gaming this as well.
1) I can search for a handful of specific terms I remember were on a webpage I didn't bookmark, type them in, and find what I'm looking for instead of having them "interpreted" by something trying to be too smart
2) Nobody bothers to send them the same DMCA takedowns that they send to Google, so I can find torrents and things I couldn't otherwise
But on most days, Google makes it easier to find useful stuff when I don't know specifically what I'm looking for or the exact terms to use.
In an ideal world, I'd have Google default to more of how DuckDuckGo seems to work, with a "fuzzy search" option you can use if needed
Maybe I'm wrong. This is just a hunch based on looking at how big online communities fail, and how services that rely on user-generated content fail. At a certain scale, you can no longer rely on people being good actors with light moderation. You need to tighten up moderation, even if it means making the community overall worse than at its peak, as the alternative is letting the bad actors make it even worse than that.
More and more lately I've been coming across spam: content reposted from sites like StackOverflow, w3schools, Reddit, etc., but on different domains, on pages plastered with ads. To me, this is a peek into the arms race between Google Search trying to return good results and bad actors trying to get their spam to the front results.
I think Google dorking makes the results better. Also, using bangs on DuckDuckGo has vastly improved my search results. If you know where to find stuff, you will get good results. For example, if I am looking for reviews or recommendations, I use site:reddit.com.
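To make that concrete, here's a small sketch that builds such queries as DuckDuckGo URLs; the search terms are just illustrative, and !r is DDG's bang for Reddit's own search:

```python
from urllib.parse import quote_plus

def ddg_url(query: str) -> str:
    """Build a DuckDuckGo search URL for the given query string."""
    return "https://duckduckgo.com/?q=" + quote_plus(query)

# Restrict results to Reddit when looking for reviews or recommendations:
print(ddg_url('site:reddit.com "mechanical keyboard" review'))

# Or jump straight to Reddit's own search via the !r bang:
print(ddg_url('!r mechanical keyboard review'))
```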
I'm talking innocuous stuff not political or porn or warez etc.
I know the domain name of a vintage computer wiki site. I search for it by name - I don't even need the search engine to find the name for me, I did the main work for it already and showed up knowing the domain name - and yet all I get are 20 pages of links to mailing list posts archived by narkive, and never the actual site. On Google, of course, narkive is the first link and most of the links on the first and every other page of results.
It's bizarre and makes you wonder about all those unknown unknowns.
Or just search on other search providers like you.com, qwant.com, startpage.com or DDG.
In short, the Internet as we have known it has fallen apart because there are hardly any true organic links left. They used to be the lifeblood of the Web: links based SOLELY on the relevance and quality of the linked page. Such a notion is quaint these days.
The Web used to grow exponentially in the early days, from say 1994-2008. Google launched with 100M webpages in their index, then grew it to 1B in 2000, but they were NOT the first to reach that milestone: they were beaten to it by FAST, a Norwegian search company that launched alltheweb.com with 1B pages before Google in 2000.
FAST was acquired by Yahoo in 2003, and you can guess what happened to them. For a few years Google and Yahoo played off against each other in terms of the size of their indexes, with Google always in the lead. But around 2003 the game of numbers stopped, as they both announced they would stop publishing numbers. The Web continued to grow, still basically exponentially, and the next big milestone was the announcement of Cuil in 2008. Cuil was a competing search engine created by a top team of Anna Patterson and Tom Costello, along with Luis Monier from AltaVista. Their claim was that they would launch with an index of 120B pages, bigger than Google's.
That was widely considered an outrageous claim, as the notion that Google knows practically everything was already firmly entrenched. But they did manage to stir things up a bit, to the point of Google issuing a vague post on their official blog claiming they knew about 1 trillion URLs. Of course they did not mention anything about indexing all of that, but the damage was done.
Shortly after launch, the quality of Cuil's results turned out to be far worse than expected, which is really a shame as their basic premises were spot on, apart from index quality. Cuil then promptly fizzled out.
Note that projecting exponential growth (doubling every 18 months, i.e. quadrupling every three years) since 2008, we would expect 4^4 × 120B, or more than 30 trillion(!) pages for the size of the Web index, with Google knowing 8 times that.
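Spelled out, under that doubling assumption (12 years of quadrupling every three years):

\[
120 \times 10^{9} \times 4^{4} = 120 \times 10^{9} \times 256 \approx 3.1 \times 10^{13} \ \text{pages} \approx 31\ \text{trillion}
\]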
Such an expectation is plainly silly, especially given that simple queries such as 'Novak Djokovic' or 'Roger Federer' return fewer than 100 results on Google.
But all this is only a (smaller) part of the story. Indexing is now a LEGACY technology, more than 20 years old. Users expect much more than getting back a bunch of blue links with matched keywords in response to their queries. Much of the time they want direct answers to their questions.
The technology to do it has been known for 10 years now, in the form of dense vectors, also known as word, sentence, and other types of embeddings. Direct answers would then be found by nearest-neighbor search. The scale of the system would of course have to be in the billions. BTW, it is a very interesting open question how many direct answers Google can provide now, in terms of infoboxes/featured snippets. Google has been coy about the issue, but in my professional opinion, as a founder of multiple search engines, the answer is no more than around 20B. Feel free to shed more light on the subject and challenge this number.
In summary, the time has come for a system based on vectors and nearest-neighbor search over billions of vectors, giving direct answers to queries, with no ads or tracking, and hopefully with an API too.
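To make the idea concrete, here is a minimal sketch of embedding-based direct answers, assuming the sentence-transformers package; the model name and the three-answer "corpus" are illustrative, and at billions of vectors you would replace the brute-force dot product with an approximate nearest-neighbor index (FAISS, HNSW, and the like):

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # assumed installed

model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose embedder

# A toy answer corpus; a real system would hold billions of these vectors.
answers = [
    "The TGV runs between Paris and Lyon in about two hours.",
    "BERT is a transformer model Google uses to interpret search queries.",
    "PageRank scores pages by the link structure of the web.",
]
answer_vecs = model.encode(answers, normalize_embeddings=True)

def direct_answer(query: str, k: int = 1) -> list[str]:
    """Embed the query and return the k nearest answers by cosine similarity."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = answer_vecs @ q  # cosine similarity, since vectors are unit-norm
    return [answers[i] for i in np.argsort(-scores)[:k]]

print(direct_answer("how long is the train from Paris to Lyon?"))
```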
One more DISCLAIMER: such systems are online for all to try and play with, at https://qaagi.com (for causal queries about causes and effects of things, with billions of ranked answers) and https://yottaanswers.com (for factoid and general what/how/where questions, with billions of answers). Both of the projects are led and principally funded by me, Borislav Agapiev.
Well, yes (and Google does that remarkably well sometimes) ... but also no. I often just want good results, damn it.
I do suspect that this type of technology would be helpful in judging whether a combination of words might actually be relevant to the query rather than just containing some of the same words - so the underlying challenge might end up being the same.
I'm also not concerned with privacy/whatever, so I won't switch to something equally good or worse just to make a statement.
Stop using Google products. Block their ads. Don't use their web browser.
I recommend Mozilla Firefox and uBlock Origin but any other web browser reduces their tracking and allows ad blocking. It looks like Google Chrome will prevent ad blockers soon.
It serves its users; you are probably in a niche category that's declining. Or rather, the casual group is growing in number, and as a result your category is being diluted even more. Tech stuff is usually old stuff, and users prefer new and up-to-date topics.
If you are looking for something specific, I suggest trying a few tricks to get better, more specific results: https://betterprogramming.pub/11-tricks-to-master-the-art-of...
Text search on the web will slowly die. No one trusts the results of random text. Google is in the adversarial position of wanting to censor certain answers as well as present answers that maximize their own revenue.
People will search video based content instead, and use the fact that a human spoke the information, as well as comments/upvotes to vet it as trustworthy material (like on TikTok).
Google search as we know it will slowly die, declining the way Facebook has. TikTok will steal search market share as their video clips span all of human life.
Ha, is this a joke? You can pay people on Fiverr to produce videos for you given any script at all.