I managed to find a huge spam network that set up a proxy service that delivered normal content, but injected "you can win an iPhone!" spam to all users visiting them.
Since I was in a position to monitor their proxy traffic towards many sites I managed, I could easily document their behaviour.
At the same time, I wrote a crawler that visited their sites over a long, long period. I learned that they kept injecting hidden links to other sites in their network, so I let my bot follow those as well.
By this time, I also had a journalist with me who started looking at the money flow to try to find the organisation behind it.
My bot found in excess of 100K domains being used for this operation, targeting all of western Europe. All 100K sites contained proxied content and were hidden behind Cloudflare, but thanks to the position I had, I managed to find their backend anyway.
We reported the sites to both CF and Google, and to my knowledge, not a single site was removed before the people behind it took it down.
Oh, and the journalist? He did find a Dutch company that was not happy to see either him or the photographer :)
As someone who tried reporting spam sites because they were using content scraped from my website, I'm not surprised.
Cloudflare has a policy that they will not stop providing their IP hiding/reverse proxy services to anyone, regardless of complaints. The best they do is forward your complaint to the owner of the website, who is free to ignore it.
They say "we're not a hosting provider" as if that's an excuse that they can't refuse to offer their service. I'm sure many spam websites would go away if they couldn't hide behind Cloudflare.
Or worse. Since I have no way to know beforehand who I'd be dealing with, this is actively dangerous - what if the mobster running this site is having a bad day and chooses to retaliate?
Also, what a stupid fucking policy that is. Even if you are not legally compelled to block content, what is the point of actively helping distribute harmful content?
What they are doing is worse than just saying "We are not a hosting provider" - because while that is true, they are actively distributing content that is hosted elsewhere while hiding who is hosting it.
One can easily write an email to abuse@hoster.example.com, and usually these people do not want garbage on their networks. CF makes it impossible to notify them, and they refuse to implement an alternative procedure.
I still do not understand the moral position of profiting off of enabling criminal scum, when it would be so easy not to...
At the moment, they have a very clear rule. If they stop providing services to obvious spammers, they will create lots of grey areas, and they will also implicitly make a judgement that the clients they still serve are _good_ in some way, and an enterprising lawyer or muckraker might exploit that.
20 years ago the transit providers of the internet would have spotted Cloudflare for what it is, and cut it off.
By ignoring the content they serve, they rid themselves of the necessity to analyze and judge what they serve. Not only would this require a brain the size of a planet and the expense of running it, it would also inevitably conflict with someone else's judgments and bring various PR woes.
They don't analyze the internals of their traffic, just as internet backbone providers don't analyze the internals of the traffic they pass around.
I frankly find this position superior: imho it does more good by preventing censorship than harm by serving good-intentioned and bad-intentioned customers alike.
A legit company will always have internal struggles between dev/sales/marketing, so things just take longer and are much more draining to accomplish. I'd imagine a spam org just needs to have the bare minimum up to satisfy whatever need it has, knowing that humans won't necessarily be perusing those domains - yet it's 100K domains. I could almost see something like this running more smoothly. I can also see it being run by a small number of people who let things lapse, and it's just barely hanging together. So many questions...
Very curious to know what you found!
There is/are organisations that a) scrape legitimate sites for content, b) host that content on their own 100K domains, c) sit behind Cloudflare, d) do some SEO??? e) when someone finds their site, inject an ad or similar rubbish, f) do this enough that they make money off the ads / competitions / porn?
That seems like a problem that the "original-source" meta tag was supposed to stop?
I've been using Google search for all kinds of research for 15 years. There used to be a time when you could find the answer to pretty much anything. I could find leaked source code on public FTP servers, links to pirated software and keygens, detailed instructions for a variety of useful things. That was the golden age of the web.
These days, all the "interesting" data on the Internet is inside closed Telegram chats, Facebook groups, Discords or the rare public website here and there that Google doesn't want to index (like sci-hub, or other piracy sites).
The data that remains on SERPs is now also heavily censored for arbitrary reasons. "For your health", "For your protection". Google search is done.
It's precisely their fault: they've created an environment that incentivizes low-quality, irrelevant content, and they are actively hostile towards users. Two examples just off the top of my head: ignoring the country-specific site - previously, if you wanted to search only local news, it was very easy. Another was completely ignoring exact-phrase search with double quotes.
What made me really angry about Google Search was when they removed the function to search within discussion forums. But even then you could more or less filter out the crap.
Nowadays it feels very hard. I find myself using the site: operator a lot, but you need to know the site beforehand, which is another problem.
I think this is an overly harsh take. I strongly suspect that any algorithm for ranking search results is open to gaming and manipulation by malicious users.
Also the opposite: insisting on pushing local and localized results on google.com even when I set my browser language to English.
On the other hand, YouTube is the second most popular search engine and I don't see it slowing down. What an insight they had when they bought it.
Edit: I entirely agree that valuable information is found more in communities nowadays. I also predict that in 5 years the web will mostly be explored through communities.
Another reason for that is user retention.
If you get your information directly on google.com, you won't navigate away; you'll probably search again and bring in more ad revenue.
49 out of 50 review sites are now just affiliate links to Amazon. "Check the price on Amazon" buttons are the main content there.
I don't think I have used more than 1000 different sites across all my development searches ever. It's the Stack Exchange network, GitHub, official documentation, non-GitHub official issue trackers/communities and some high-quality blogs. That seems very manageable. You could probably index that into one Elasticsearch and one Sourcegraph instance - see the sketch below. Add a little more specific faceted search, add back powerful and precise query syntax, and still maintain "just paste in whatever and hit the first result" functionality. I'm likely underestimating the breadth of other developers' needs compared to my own. I don't know.
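A minimal sketch of that idea, assuming a local Elasticsearch instance (the "devsearch" index name, field layout and example document are hypothetical, just to show the shape of it):

# Index one document per page from a trusted dev site:
curl -X PUT 'http://localhost:9200/devsearch/_doc/1' -H 'Content-Type: application/json' -d '
{"url": "https://docs.python.org/3/library/subprocess.html",
 "site": "docs.python.org",
 "body": "subprocess - Subprocess management ..."}'

# Faceted search: full-text match on the body, filtered to one trusted site:
curl 'http://localhost:9200/devsearch/_search' -H 'Content-Type: application/json' -d '
{"query": {"bool": {
  "must":   {"match": {"body": "subprocess timeout"}},
  "filter": {"term":  {"site.keyword": "docs.python.org"}}}}}'

The term filter on site.keyword is what would give you back a reliable site: operator.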
DuckDuckGo is nowadays more useful than Google for my web searches.
1. the scientific worldview needs an update
2. from reproducibility to over reproducibility
It would be cool to find datapoints for a proper bug report for Google :)
Edit: see this other HN story: https://news.ycombinator.com/item?id=27993564
This is not to say that all search results are bought, although of course those are present now, too. But overall Google presumes that whatever the user is searching for, the best result is one where the answer is "buy this thing".
For those search results that don't lead directly to commercial products, the revenue generation is indirect: through the collection of user preferences and activity, Google can refine its search results towards maximizing revenue. At the very least, the result is likely to be a site that has ads, some of which generate revenue directly for Google.
In the old-fashioned Yellow Pages book, you couldn’t really “search,” but there was an index by category. It had many of the issues inherent in categories, but it didn’t take an expert to find things. Google search eliminates the need for anyone to understand a taxonomy of businesses.
Now such search results often don't even get a second page...
For example if you check havfruen4220.dk on archive.org you can see that it appears to have been a legitimate business website before. https://web.archive.org/web/20181126203158/https://havfruen4...
How do they rank so well?
I've checked the domain on Ahrefs and it has almost no backlinks. But if you look closely, you will see that all the results that rank very well have been added very recently. In the screenshots in the article you can see things like "for 2 timer siden", which means "2 hours ago". It looks like Google is ranking pages with a very recent publishing date higher.
Edit: Here is what the content of such a site looks like: https://webcache.googleusercontent.com/search?q=cache:Bk0VsM...
For example, there used to be a very common content farm system that was structured like this:
https://domainsites.com/site/nytimes.com
So when people searched for sites by domain name, the zillions of low traffic long-tail results of this farm system would be all over Google's results.
What it would present on the page is a mess of data about nytimes.com, such as traffic, or keywords pulled from the site header, maybe a manufactured description (or one pulled right from the site head), sometimes images/screenshots of the site - anything that could be stuffed in there to pad out enough content to keep Google from applying an automatic shallow-content kill penalty to the farm. This worked very successfully for several years, until Google's big algorithm updates 9-10 years ago or whatever it is now (Penguin et al.). You could just take a large index of the top million domains (e.g. Alexa and Quantcast used to provide that index in a zip file), spider & scrape info from the domains, build a content farm index out of it, and have a million pages of content to hand off to Googlebot.
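A toy version of that scrape step, just to show how little machinery it takes ("top-domains.txt" is a hypothetical one-domain-per-line list, like the old Alexa zip):

# Grab a title and meta description for each domain, as raw material for farm pages:
mkdir -p pages
while read domain; do
  curl -sL --max-time 10 "https://$domain" |
    grep -oiE '<title>[^<]*|<meta name="description"[^>]*' > "pages/$domain.txt"
done < top-domains.txt

Everything after that is just templating a million pages around those scraps.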
So initially such a farm would boom in the search rankings: Google would give it a trial period and open the flood gates of traffic to the site. Then Google would promptly kill off the content farm once the free-run period expired and they had figured out it was a garbage site.
I still occasionally see this model of content farm burst up into traffic rankings, and it's usually very short lived. It makes me wonder if that's not more or less what's going on with the Mermaid farm.
It could of course be something similar to GPT, trained on all the content it could find, that then writes the articles - because it's clearly messing up sometimes, judging from the small piece of content visible on the search results page.
I'm not sure if this is an ML race, and the reason we're not seeing the same thing in English is that Google understands English better than the spammers do, while in Norwegian and German it's the other way around?
Clearly freshness is a large part of it. Google seems to have indexed millions upon millions of pages tied to this in the last 24 hours.
Almost any search in Norwegian will have obvious scam sites like these in the top 10 results.
Other domains part of the same scam that show up in my results today: mariesofie dot dk, bvosvejsogmontage dot dk
I wonder if it is related to this: https://www.dk-hostmaster.dk/en/news/dk-hostmaster-takes-102...
Never seen anything on this scale before. I can search for basically anything (tax rules, baking, stocks, property, hygiene...) and Google will most likely show those domains somewhere.
The content seems to be taken from other websites and mixed together in a nonsensical way. It comes up frequently in my search results. www.xspdf.com has completely unrelated content and seems to be a separate business.
Fetching the page once as a normal desktop browser and once with a Googlebot user-agent, then diffing the two:

curl -A 'Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0' 'https://havfruen4220.dk' > 1.html
curl -A 'Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)' 'https://havfruen4220.dk' > 2.html
diff 1.html 2.html
7d6 < <script>var b="https://havfruen4220.dk/3_5_no_14_-__1627553323/gotodate"; ( /google|yahoo|facebook|vk|mail|alpha|yandex|search|msn|DuckDuckGo|Boardreader|Ask|SlideShare|YouTube|Vimeo|Baidu|AOL|Excite/.test(document.referrer) && location.href.indexOf(".") != -1 ) && (top.location.href = b); </script>
Would be interesting to see the actual content. Based on the small snippets in the search results, it takes content from other sites, like large Norwegian news sites, and somehow outranks them hard.
I wonder what the Google Search Console looks like for that domain, considering that it's probably getting millions worth of free traffic.
EDIT: After looking more at it, it's insane how much it ranks for and how well. Straight-up brand names seem to be the hardest for it to compete with, at least the larger ones. Those seem to be around page 4-5 for me.
Some brands I was unable to find at all, but ironically another .dk domain doing the same thing showed up in its place. There are also some .it domains using the same content.
I've found that it takes content from multiple sources and glues it together, sometimes in surprisingly good ways - like one sentence from this page, another from that page.
Maybe this is some ML that collects content and pieces a lot of it together, sentence or half-sentence at a time, into one large article? It's clearly from completely different sources, but about the same topic.
Example: "wash car"
Result in google: "A dark winter with snow and salt is hard on the car, and it's extra important to wash the car" - Collected from one article.
<some other text>
"Keep the pressure washer at 30-50 cm from the car..." - From another article.
Ironically, there are like 11 results all tied to this thing outranking the original articles (those come last), even when the originals are medium-to-large, well-known companies selling for billion(s) of dollars each year in Norway.
Sometimes it goes from one thing and switches to something completely unrelated, so I guess the spammers still have something to improve.
Weird.
Ahrefs: 230k organic traffic, valued at $124k
SEMRush: 558k organic traffic, valued at $355k
These are estimates and can be wildly under- or overestimated, but they show that this is happening on a very large scale.
For a quick idea of how this is possible, I looked at their top pages (according to Ahrefs). Their top page is ranking #2 for the keyword "interia", which has 207k searches per month in Norway and is rated 0 (out of 100) for ranking difficulty. Usually a keyword with that many searches would be incredibly hard to rank for; I've never seen anything like this. So what seems to be happening is that they are just taking advantage of a market with really low-competition keywords.
However, the weird thing is that it steals content from articles and then outranks them. Most pages seem to be boosted, maybe as a result of being new. (Most content is just hours old.)
Could you check these too? (Exactly the same thing, but newer, it seems.) www.mariesofie.dk, nem-varmepumper.dk
Clearly reused domains.
But I currently feel that paying $100/mo for Ahrefs for something I do as a side project is a tad wasteful.
And add to that some link spam, plus preventing visitors from returning so Google never sees a bounce back...
Either way, I can't help but be a bit impressed by the SEO spammers outsmarting the people at Google. (Edit: I don't mean to say they are smarter or anything - just that they only need to find one weakness in the algorithm, while the people working to improve it need to make it work for everything.)
edit: one other thing I have seen, though it doesn't always mean spam: All The Words In A Title Are Capitalized. It's something to pay attention to when judging whether a result is spam, since titles are conventionally not written like that in Norwegian.
Another big one is that Norwegians, like Germans, write compound words together. Just one example from one of the stupid ads: "Spesial Reportasje" is a dead giveaway, and not only because of the capitalization.
(Oh well, sadly, thanks to years of pressure from Word's incompetent spell checker and lenient teachers, this is getting worse. I fear we are seeing compound damage here, as the kids who got away with it are now becoming teachers...)
The current state of formal German will surely not be the end of history.
See https://de.wikipedia.org/wiki/Leerzeichen_in_Komposita#Gesch... (Might need Google Translate, if you don't speak German.)
(Something like: Pictures in the struggle against mistakes when using spaces between words)
Why, though? There is an arbitrary ranking system that seems increasingly independent of what I actually searched for. Google has created a game where the winner isn't necessarily relevant or at all useful. It's inevitable that spammers will play that game.
A bit of it is probably that.
Outright ignoring my query operators - +, double quotes, "verbatim" and all - takes more than SEO tactics; it takes someone inside Google, either malicious or, more probably, incompetent.
Or more probably: someone was so busy trying to use AI in search that they haven't had time in the last ten years to consider whether it was smart.
Or maybe Google started applying "We know better than the users", the driving principle behind their software and libraries, to their search.
It seems to me that the techniques used to spam Google's index work just as well on Bing's index.
It seems DDG is worse at finding the more authoritative sites about a subject compared to Google.
You search for a very specific thing, and all the results are big sites that have said something containing two of the 6 words you searched for, in a completely generic article that helps you none.
My favorite is when your query contains a word that is the very essence of what you're searching for, and Google chooses to display results without it, so you need an extra click of "yes, I actually want to search for what I said I want to search for".
It enables a collaborative effort in blocking spam / low value domains.
If you make a block list, please submit it to the list I’ve made: https://github.com/rjaus/awesome-ublacklist
(There’s no great subscription discovery as yet)
-stupidautogeneratedcontent1.com -stupidautogeneratedcontent2.com etc
I figured sooner or later Google would pick up the signal, but I think instead they just started ignoring my "-" requests, so I stopped using them. edit: or maybe they fixed the problem. Spam sites used to be a problem during the early decline of Google; I think that problem actually almost disappeared for me and was replaced by irrelevant results from non-spam sites.
Edit: mahalo.com was one of those, https://en.m.wikipedia.org/wiki/Mahalo.com
google.*##.g:has(a[href*="example.com"])

Can't fathom Google not catching this...
YMMV a lot with Google results. For me, it's usually great where DDG is kinda crap, but not as bad as... shudder ... bing
My guess is that someone at Google reacted.
It seems like they've manipulated rankings by locking people in to reduce their bounce-back stats (in addition to keyword-stuffed content)
...but, you know. Can you see anything else they're doing that would give them that kind of ranking? These pages are just piles of crap, and google is pretty good at filtering that sort of stuff out.
If it were that easy, Google would be filled with spam everywhere.
The chance that someone did something random that's very uncommon (blocking back) and that it happened to be a super-effective signal to Google seems:
a) like an edge case they didn't think of
b) like it'll get fixed pretty fast
c) not that unlikely.
Compared to, say, the idea that some random spammers have built a network of incredibly sophisticated ML-generated pages that can subvert Google's algorithms, which seems:
a) not substantiated by any obvious content on the pages
b) requires a very high level of sophistication which seems totally lacking
c) very unlikely
...but I mean, who knows right?
We're all just speculating. I guess it'll get fixed soon, and we'll never know.
This has been going on for years now, so I don't have much confidence that Google is able or willing to fix it.
A rather non-sequitur choice, like everything else with this thing I guess.
There are a lot of these domains (ptsdforum.dk, verdes.dk, momentsbykruuse.dk off the top of my head). Always Danish domains, and always registered by the same person in Riga.
I'm not sure if the culprit is BERT or neural ranking in general, but in the last few years I find it more and more common to leave Google search without useful information. The worst part is that all the competing search engines use the same algorithms, which are only useful for mainstream results.
Memory Hole
"The alteration or outright disappearance of inconvenient or embarrassing documents, photographs, transcripts, or other records, such as from a web site or other archive. Its origin comes from George Orwell's "1984", in which the memory hole was a small incinerator chute used for censoring, (through destroying), things Big Brother deemed necessary to censor."
https://www.urbandictionary.com/define.php?term=Memory%20Hol...
Tools are not to blame here; it's like blaming the compiler for the behaviour of an application. From the training data all the way to how the model is used in deployment, the blame lies with the people who made it, not with the neural architecture. The architecture itself can learn anything you throw at it, good or bad.
Basically, like searching for diving suit thickness and Google ignoring "suit" and "thickness" (until I specifically put those two words in quote marks), only showing me results for diving.
Google search 'reservoir cats' and it will completely ignore what you actually searched for in favor of the mainstream result. The effect is basically that you can't search for 'reservoir cats'!
Even putting something opposite or unrelated to the highly mainstream result will have no effect.
It's completely and entirely ridiculous, and makes the search engine seem like a facade.
I don't know of a good general internet search engine, so I tend to stick to the sites I know will provide answers that'll work for me, which is a shame for discovering new content.
[1] https://www.google.com/search?q=diving+suit+thickness&rlz=1C...
Even today there are bloggers out there who do not have a commercial affiliation with the goods/items/things they blog about. Such content is practically impossible to find among all the Amazon-affiliated, pseudo-information-conveying spoof sites.
The Amazon affiliate program is definitely contributing to this problem.
Additionally, as a polyglot, I find it very irritating that Google tries to helpfully translate queries for me, so I have to go to other search engines to actually find the article in the language I want.
I'm not sure what the beginning of the end was for Google Search, but I think the day where they changed the ad background to white is a good candidate.
Google Search used to be like Chrome or Gmail - we know it's wrong in the long term, but it's hard to stop using because it just works so well.
But these days, not anymore. Search is a lot less sticky, and it is their golden goose they are messing with here.
2. juli 2021 (2 July 2021)? That's pretty fast to work so well. But I see lots of this with other domains when searching, and have for years, so nothing new here, I think.
If the redirect is done as a meta refresh, then you can use robots.txt to keep it from being picked up by SEO tools like Ahrefs, SEMrush etc.
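Something like this in robots.txt, using the crawler names those tools document:

User-agent: AhrefsBot
Disallow: /

User-agent: SemrushBot
Disallow: /

Well-behaved SEO crawlers honor it, Googlebot is unaffected, and the redirect stays invisible in backlink tools.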
These types of sites are called doorway pages and have been around for ages. They are most popular in Russia and on Yandex, but you do see them on Google for super-long-tail keywords with zero competition.
The other important thing to remember is that doing SEO in any language other than English is a walk in the park. Lots of SEO influencer types have case studies showing how much extra traffic they get by translating their content. [1]
They can easily be traced to a block of flats in Latvia, but since their registered phone is a toy store in Riga... I am going to go with a probably-stolen identity and a sense of humour on their part, rather than the real operation of some 12-year-old in Riga...
An SEO-fighting Googler might at a glance have no reason not to think that could be a really relevant or popular site in your country.
God I hope not. If Google does do this, it sounds like a really dumb idea that will ultimately create widespread usability issues. I can already envision SEO consultants recommending this to their clients if it comes to be believed.
Doesn’t look like it according to https://www.seroundtable.com/google-browser-back-button-rank...
"We help you to receive high-quality visitors from search engines, generate conversions and build your brand. To achieve these results, we ensure your website / company is recommended for specific keywords by the search engine's autocomplete function."
That doesn't entirely eliminate the other possibilities though: Google search isn't deterministic, and the domain could have been reported since the article went up.
I've noticed that the ranking of the results changes really often.
In particular, the only results ranking higher than themermaid for "hvor ofte oppdaterer apple ios" are those coming from support.apple.com.
Surely Google does this, right? Given that - in theory - showing different content to Google versus non-Google should result in a penalty, anyway ...
sesam.no (no longer a valid domain) was an engine made by a big Norwegian company back in 2005 or so.
Norway used to be big in search. FAST got acquired by MS back in 2008.
I recommend switching to DuckDuckGo :)
Are they potentially doing harm? Sure. Have they successfully managed to trick anybody with this? I'd be extremely surprised if they're getting more than a dozen people a day clicking through from being the ninth result, and when people see they've been redirected to an advertisement, the majority immediately click away.
This isn't like clicking on a fake porn site that redirects to cam girls with viruses hidden in all the downloads. It's random unrelated searches redirecting you to blatant ads for cryptocurrency. The kind of people who are young enough to know what cryptocurrency is and how to buy it also know how to spot a redirect to a fake website.
## read robots.txt `curl 'https://havfruen4220.dk/robots.txt'`
## use pointer to a sitemap.xml
curl -A 'Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)' 'https://havfruen4220.dk/sitemap-no.xml' > sitemap.xml
## read more sitemaps
curl -A 'Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)' 'https://havfruen4220.dk/sitemap-no-1.xml' > sitemap1.xml
Other sitemaps contain pointers to individual "webpages", e.g.: https://havfruen4220.dk/no/7a28855e4714dd14
## read web pages
Each location in the sitemap has a "lastmod" of today/yesterday, so the bot returns there every day. In addition, each webpage has a "<meta name="robots" content="noarchive">".
But if you visit any of those pages, it shows you a cartoon image. It seems the actual indexed content is visible only to the bot.
## But how is actual content being rendered?
The question is: what conditions (request params/headers) result in the actual content being rendered? The bot needs to evaluate it. I suspect it's some combination of checking whether the requester is an actual Googlebot, maybe by looking up the IP: https://developers.google.com/search/docs/advanced/crawling/...
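That page describes a reverse-plus-forward DNS check, easy to reproduce by hand (66.249.66.1 and the crawl-* hostname below are the examples from Google's own docs):

# Reverse lookup of the requesting IP should point into googlebot.com or google.com:
host 66.249.66.1
# 1.66.249.66.in-addr.arpa domain name pointer crawl-66-249-66-1.googlebot.com.

# Forward lookup of that name should return the original IP:
host crawl-66-249-66-1.googlebot.com
# crawl-66-249-66-1.googlebot.com has address 66.249.66.1

If the spammer's backend runs those two lookups per request, a plain user-agent spoof won't be enough to see the bot-only content.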
For example, Microsoft routinely deletes negative feedback from GitHub issues for VS Code.
Just try one of the examples, like "hvordan regne ut prosent" (how to calculate percentages) or, I don't know... "DNB aksje" (DNB stock, DNB being the biggest bank in Norway). Sure enough, both rank on the first page or as one of the top results. (One is now using the www.nem-varmepumper.dk domain - it's the same thing.)
EDIT: Now the DNB one has moved from 2nd and 3rd place to page 2. Things are moving around quickly.
It is Google that needs "incognito" mode, not the author.
Anyone who can do that can rank as high as they like for any search query.
A good proxy for this is how many people don't click the 'back' button to see other results.
Google is already aware of sites which hijack the back button. Their crawler detects this, and if they find it, they throw out the figures for how many people click the back button.
So if you can find a way to hook the back button so nobody can click back, while stopping google thinking you have hooked the back button, then your page will keep creeping up the rankings.
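The hook itself is trivial - a generic History API sketch, not necessarily exactly what these sites run:

// Push a dummy history entry so the first Back press stays on this page...
history.pushState(null, "", location.href);
window.addEventListener("popstate", function () {
  // ...and re-push on every subsequent Back press, so the user never
  // makes it back to the search results (and Google never sees a bounce).
  history.pushState(null, "", location.href);
});

The hard part is the second half: doing that without Google's own renderer noticing.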
Google detects back-button hijacking with their crawler (by rendering the page in Chromium and seeing what happens when the actual back button is pressed), but this can be circumvented by presenting the crawler different HTML (or by making sure the page behaves differently in their crawler, potentially by checking things like the model of the graphics card - Google's crawlers don't yet support most of WebGL 2.0, and they also simulate playing audio at the wrong rate).
Google also detects how many real users click back. If it's zero, then that's a warning flag. So I'd guess the back-hijacking logic is only activated ~80% of the time.