There's one project that is 'active'. Is it active because it's better, or because it arrived late to the party and is reinventing the wheel?
This other project is 8 years old and has hardly been touched in 2 years. Is that because it takes 4 years to cover the entire domain and a couple years to fix all the bugs that can be fixed? Or because it's languishing?
Novelty is not a good selection criterion. Sometimes the old ways are best.
(disclosure - I've built libhunt)
When I search for anything technical the first thing I do is change the date range to the past year.
That would make it not "one of the best," right?
-- Arthur Schopenhauer, "On Authorship and Style"
https://ebooks.adelaide.edu.au/s/schopenhauer/arthur/essays/...
Emphasis, and wall of text, in the original.
Whether it’s gender identity issues, technological warfare, jingoist politicians, class struggles, leadership advice, etc., somebody already wrote about it more effectively than the huge majority of modern writers churning out books.
For example, the blog didn't even beat out the result above it, which is "older content". For all we know, the blog was the last relevant candidate for that search query; the author doesn't demonstrate otherwise.
The only spammy thing we can tell from TFA is that the blog post's publish-date is set in 2019 when it was clearly published in 2016. But we don't know if Google knows that and is already penalizing them. Nor if it's an effective attack.
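For what it's worth, anyone who cared could sanity-check the real publish date against the Wayback Machine's first capture of the page. A rough Python sketch (the CDX endpoint is real; the blog URL below is just a placeholder):

```python
import requests

def first_wayback_capture(url):
    """Ask the Wayback Machine's CDX API for the earliest snapshot of a URL."""
    resp = requests.get(
        "http://web.archive.org/cdx/search/cdx",
        params={"url": url, "output": "json", "limit": 1},
        timeout=30,
    )
    rows = resp.json()
    if len(rows) < 2:          # first row is just the header
        return None
    timestamp = rows[1][1]     # e.g. "20160412093021"
    return f"{timestamp[:4]}-{timestamp[4:6]}-{timestamp[6:8]}"

# A page claiming a 2019 publish date but first archived in 2016 would look suspicious.
print(first_wayback_capture("example.com/some-blog-post"))  # placeholder URL
```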
Also, the author doesn't demonstrate that the modified-date is inaccurate or spammy. CMS software will update it if you go back and correct a typo, or somehow mass-transform a bunch of blog posts. For instance, I once had a dead-link-finder plugin on WordPress that gave me a UI to patch URLs in old blog posts; using it would have updated the modified-date on every affected post.
I don't see anything in TFA that suggests that the blog was benefiting from spammy behavior.
Google's algorithm tends to classify queries as "evergreen" or not. With the former, recency doesn't seem to matter much; I've had content hold a page 1 ranking for 3+ years without any updates.
For non-evergreen keywords, recency matters a lot. A fresh article on an established domain can often outrank better (but older) content.
The query classification system works well enough for most keywords, but there is a grey area where Google doesn't really know what to focus on. Like with a query that focuses on "best practices". It isn't clear whether Google should prioritize classic, evergreen best practices, or focus on more recently developed best practices.
I would definitely appreciate a filter that only shows me old-school content for my target queries. Some of the best content I've read online sits on websites that haven't been updated since 2002 and still use table-based design.
This website is a curated archive of Usenet posts.
It's far from universal, but if you understand quality content, you will recognize it here:
https://yarchive.net/comp/risc_definition.html John Mashey talking about RISC vs CISC.
https://yarchive.net/comp/linux/lost+found.html Ted Ts'o on lost+found.
... and plenty more.
It is Google's fault, but the fault is that I can't choose.
If I'm looking for something about Beaglebone programming, there is a 90%+ probability that I need it limited to "last 6 months". If I'm looking for something about java.util.concurrent, 10-year-old articles are probably just fine.
The fact that I can't make this choice is infuriating.
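The closest workaround I know of is baking the date restriction into the query yourself with the documented before:/after: operators (the undocumented tbs=qdr: URL parameter does the same thing but could break at any time). A sketch of what I mean, with placeholder queries:

```python
from datetime import date, timedelta
from urllib.parse import urlencode

def recent_search_url(query, months=None):
    """Build a Google search URL, optionally restricted to roughly the last
    N months using the documented after: operator."""
    if months:
        cutoff = date.today() - timedelta(days=30 * months)
        query = f"{query} after:{cutoff.isoformat()}"
    return "https://www.google.com/search?" + urlencode({"q": query})

# Beaglebone questions: recent results only, please.
print(recent_search_url("beaglebone pwm device tree overlay", months=6))
# java.util.concurrent: a decade-old answer is usually still fine.
print(recent_search_url("java.util.concurrent CompletableFuture example"))
```

It works, but having to do it by hand for every query is exactly the problem.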
If a blog post is worth publishing multiple times then it should be published as something different.
I think Google ranks microblogs, blogs, articles, wikis, discussions, books, etc differently.
But for all we know, they do. TFA doesn't establish otherwise. It doesn't even establish that the author didn't update the content in June, or that the blog unfairly ranked over better candidates.
This is actually a real question from me, and I'd love feedback. I look back on my stuff from a few years ago and the blog posts need fixing up, be it for SEO, for my more up-to-date 'voice', or because I switched to Gutenberg and removed janky slider plugins that haven't been updated since 2014. Should I re-date? Or leave them as-is?
I know some sites abuse this because of the way the Google algorithm works, but it is good practice to list the date you last updated the post so your visitors can see it.
For specific topics, I prefer to read more recent info, so if I end up on a page that lists "2015" as the published date but doesn't say whether it has been updated since, I may trust it less simply because I suspect the info is outdated.
Solution: put a "first published" and a "last updated" date, or use a wiki that makes the revisions available online, with a way to link to each revision.
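If you also want those two dates to be machine-readable, schema.org's Article markup has separate datePublished and dateModified fields, which is presumably what crawlers look at anyway. A minimal sketch (the helper and the dates are made up for illustration):

```python
import json

def article_jsonld(headline, published, modified):
    """Emit schema.org Article markup with separate published/modified dates,
    so readers and crawlers both see when a post first appeared and when it changed."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "datePublished": published,  # keep the original date here
        "dateModified": modified,    # bump only this one on edits
    }
    return '<script type="application/ld+json">' + json.dumps(data, indent=2) + "</script>"

# Made-up dates, just to show the shape of the markup.
print(article_jsonld("My old tutorial", "2016-03-01", "2019-06-15"))
```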
The issue is that blogs abuse these fields for SEO purposes.
Enough old content disappears from the internet organically.
But the new post likely wouldn't rank as well as the old post, so you are less likely to find it using search. It would be a better user experience if the original post had a clear indicator stating that it was updated with new information on X date.
Even from a usability standpoint, I think it is totally okay and even necessary to add an 'updated' or 'modified' date. Personally, as a reader I rely heavily on created/updated dates; if I cannot quickly and easily figure out whether an article is still relevant and up to date, I usually leave the page immediately without even bothering to read it.
You can add a link to the new post somewhere on the old post.
Granted, this is my personal preference. From an SEO point of view it doesn't make sense at all.
Thanks again!
But if people want an answer as to why the blogosphere is dead and everything's on centralised silos: this is why. Any decentralised system that doesn't take spamfighting into account from the beginning will drown under it as soon as it becomes popular.
Authors of popular blogs didn't want to handle comment moderation, and once they started encouraging discussion directly on FB, Reddit or HN, it was game over.
Problems come up when looking for older content on purpose. A lot of older content is still very relevant today. I know people who could not find something they knew existed in Google; they switched to Bing and it came right to the top. The difference was the way newer content was prioritized.
Google doing this sets priorities. It says that newer is more important. Is that true? Many would argue it's not. Google says it is and "advises" people where to go based on that.
I find these worth considering.
Content writers create posts like "Ultimate Beginner's Guide to X in 2019" and just update the post title and "Last updated" metadata each year. Nobody's going to create a brand new guide to building muscle or whatever if the core information doesn't change year to year.
1. it's possible the author double checks the content each year and redates so that visitors know it still applies
2. the author updates the article to be current and redates it
It's a little weird to say this is "blogwashing". It's pretty common (for me at least) to check the date of an article when it's a tutorial so I know if it's current or not. And I've seen this happen before where authors append a "changelog" to the article at the end so you know that it's up to date.
I do that on my site mainly to keep things less cluttered. Every post has an "Updated on November 12th 2019 in #docker #flask" line at the top, and that date is either the original published date or the last time I updated the content in the post, but the meta tags are always the correct values (i.e., I don't refresh the published date with the updated date).
But now it's making me think I should include both a "Posted on" date and a separate "Updated on" date in the presentation of the page itself, to be crystal clear. My only concern is that it will eat up some vertical space on the page, because I can't fit all of that on one line cleanly; I would have to break the dates and tags onto two lines. For example, this is what a current line looks like: https://nickjanetakis.com/blog/make-your-static-files-produc...
I've always dreamed of a grep for the web, for instance. Trying to Google for code is a pain, even when quoted/verbatim.
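Just to illustrate what I mean, here's a toy version over a hand-picked list of pages; an actual web-scale grep would obviously need an index rather than live fetches, and the query and URLs below are placeholders:

```python
import requests

def web_grep(needle, urls):
    """Toy 'grep for the web': print matching lines from a hand-picked
    list of pages. Not a search engine, just the shape of the idea."""
    for url in urls:
        try:
            text = requests.get(url, timeout=10).text
        except requests.RequestException:
            continue
        for lineno, line in enumerate(text.splitlines(), 1):
            if needle in line:
                print(f"{url}:{lineno}: {line.strip()}")

# Hypothetical usage: an exact error string, verbatim, across a few doc pages.
web_grep("TypeError: cannot unpack non-iterable", [
    "https://example.com/some-docs-page",   # placeholder URLs
    "https://example.com/another-page",
])
```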
My local historical society decided to produce lots of content and exclusively post to Facebook. It’s incredibly dumb, but people seek the easiest path.
As a starting point, your not-googlebot needs to spider sites differently from googlebot (so it can't be detected by traffic analysis), imitate average user hardware well (GPU acceleration + high GPU performance, more realistically slow network, slower CPU hardware, etc), use network addresses not obviously Google's, and imitate user behavior (plausible input events, scrolling, etc). This is within Google's capabilities but is definitely an undertaking and SEO types could eventually identify their strategies.
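The trivial first step, and about as far as an outsider can get, is just comparing what a site serves to a Googlebot user agent versus a normal browser user agent. A naive sketch (placeholder URL; dynamic bits like ads and nonces will cause false positives, so a real check would diff only the dates or markup of interest, and serious cloakers key off IP ranges rather than the user agent anyway):

```python
import requests

GOOGLEBOT_UA = ("Mozilla/5.0 (compatible; Googlebot/2.1; "
                "+http://www.google.com/bot.html)")
BROWSER_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
              "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36")

def fetch_as(url, user_agent):
    return requests.get(url, headers={"User-Agent": user_agent}, timeout=15).text

def looks_cloaked(url):
    """Naive cloaking check: does the page served to a 'Googlebot' user agent
    differ from the one served to a normal browser user agent?"""
    return fetch_as(url, GOOGLEBOT_UA) != fetch_as(url, BROWSER_UA)

print(looks_cloaked("https://example.com/blog/some-post"))  # placeholder URL
```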
In my experience, Google's dupe-content detection is pretty good and their penalty is harsh. I once ran a website that tried to curate and touch up old Usenet material; it could never rank all that well because of dupe content (another website had monospaced dumps of the same Usenet content).
Yeah, I could do that in an afternoon :^)
I noticed it a couple times myself. Stuff that's obviously an older article appears in the SERPs as if it was published a few days ago.
And with version numbers, you are not limited to dates in the past, you can even write articles about the future!
Here's a brilliant example: https://gorails.com/setup/ubuntu/20.04
How to set up Rails on Ubuntu 20.04, which will be released in April next year. You can already read the guide today! Some of the links might not work yet, because obviously you can't download Ubuntu 20.04 yet, but once it's released, those guys are bound to be the first ones who had a guide out!
I get completely different results on Google if I actually search for the phrase in the search bar, with no sign of the blog in question. I see zero evidence that the scummy SEO tactic actually works and a lot more evidence of a faked "Google" screenshot.
Usually, I'm interested in currency not recency. If, say, a technical article was written in 2015, I don't exactly care that it was written in 2015 but do care very much whether it's outdated today or not. APIs change, etc. If the blogger has re-dated the article, that suggests they believe it is still current, which is useful information to me.
(* - Caveat: no, I've never redated a blog article myself. But I am only a very infrequent blogger anyway.)
This is Google's fault.
But this bullshit makes that really hard.
November 13th 2422
This week, Groaar and Mrumfm have been experimenting with a new invention. We are considering calling it "wheel". Will keep you informed.
Comments
This is old news. Our tribe has been using it for eons.
Google also knows about vertical search [1] and actively destroys anyone who pops up with a good new algorithm and hopes of building a startup on it.
[1] https://en.wikipedia.org/wiki/Vertical_search. I am pretty sure there has been an HN frontpage article about a couple with a vertical search startup that was legally and practically destroyed by Google.
OP made the point that Google does not want to address this problem and may even facilitate it (maybe passively). However, they have actively prevented progress on search engines that may be more difficult to SEO engineer/hack (because of their specificity) and particularly have prevented people with good vertical search algorithms from building a business (read: actively sabotaged their business). [1]
In any case, Google is also known to quash any other small project that it feels challenges it. [2] [3] [4]
[1] https://www.nytimes.com/2018/02/20/magazine/the-case-against... and https://news.ycombinator.com/item?id=16420004
[2] https://news.ycombinator.com/item?id=19553941 (web browser)
[3] https://news.ycombinator.com/item?id=18566929 (person's idea taken after interview)
[4] https://news.ycombinator.com/item?id=19124324 (business)
References [2]–[4] are not specifically important; there are easily a dozen such complaints if you just search for "google" on HN.
With that being said, there's a huge difference between shortening "weblog" into "blog" and shortening "blog post" into "blog":
First, when "weblog" was shortened to "blog," "blog" didn't already mean something (and certainly didn't mean anything in the relevant context). When "blog post" got shortened to "blog," "blog" already had a meaning - AND it already had a meaning _in the context of the internet_. One of these leads to confusion, one of them doesn't.
Second, when "weblog" was shortened to "blog," we didn't already have a shorthand way of saying "weblog." But we've been shortening "blog post" to "post" basically since the beginning. There was no reason to shorten it to "blog" also. "Post" was just fine.
I'd argue that a more fair comparison would be if, after using "blog" for a while, we decided to shorten "weblog" into "web" instead. It would have been silly, because "web" already meant something, and because we already had a shorthand version of "weblog" (i.e., "blog") - so why did we need another?
But I guess your sarcasm and the down votes answer my question anyway. The internet has accepted "blog" as meaning "blog post." I might as well get on board.