Suddenly, Hacker News is not the first result for 'Hacker News' (opens in new tab)

(google.com)

427 pointsAretNCarlsen14y ago199 comments

199 comments

I think I know what the problem is; we're detecting HN as a dead page. It's unclear whether this happened on the HN side or on Google's side, but I'm pinging the right people to ask whether we can get this fixed pretty quickly.

Added: Looks like HN has been blocking Googlebot, so our automated systems started to think that HN was dead. I dropped an email to PG to ask what he'd like us to do.

pg14y ago

I sent you an email about this.

(A couple weeks ago I banned all Google crawler IPs except one. Crawlers are disproportionately bad for HN's performance because HN is optimized to serve recent stuff, which is usually in memory.)

pierrefar14y ago

Hi Paul,

A site can be crawled from any number of Googlebot IP addresses, and so blocking all except one doesn't help in throttling crawling.

If you verify the site in Webmaster Tools, we have a tool you can use to set a slower crawl rate for Googlebot, regardless of which specific IP address ends up crawling the site.

Let me know if you need more help.

Edit Detailed instructions to set a custom crawl rate:

1. Verify the site in Webmaster Tools.

2. On the site's dashboard, the left hand side menu has an entry called Site Settings. Expand that and choose the Settings submenu.

3. The page there has a crawl rate setting (last one). It defaults to " Let Google determine my crawl rate (recommended)". Select "Set custom crawl rate" instead.

4. That opens up a form and choose his desired crawl rate in crawls per second.

If there is a specific problem with Googlebot, you can reach the team as follows:

1. To the right hand side of the Crawl Rate setting is a link called "Learn More". Click that to open a yellow box. 2. In the box is a link called Report a problem with Googlebot which will take you to form you can fill out with full details.

Thanks!

Pierre

2 more replies

wheels14y ago

Have you considered putting a caching reverse proxy in front of the arc app to keep the backend from having to render all of the old pages?

It seems like the only dynamic element of old articles is the "$x days ago" bit and that'd be pretty easy to turn into something static by instead just putting in timestamps in the actual HTML and using Javascript to transform them into how many hours / days ago they were. Then the crawlers would just be pulling out cached, pre-rendered HTML.

There's an example of doing such with nginx here:

http://serverfault.com/questions/30705/how-to-set-up-nginx-a...

With that you'd just have to send out the HTTP header from the arc app saying that current articles expire immediately, and old ones don't.

1 more reply

Matt_Cutts14y ago

Gotcha--thanks, Paul. I'm about to get on a plane, but we'll get this figured out where we're not sending as much hostload toward HN.

2 more replies

tibbon14y ago

Could you use some sort of sitemap or other way to provide the data to Google that isn't so damaging to site performance? Or in Google Webmaster tools turn down the rate of crawling?

Just realized that this could be a problem for lots of sites, and I'm curious as to what the best solution is, since not everyone has Matt Cutts reading their site and helping out.

1 more reply

jedberg14y ago

reddit had the same problem. We set up a separate server just for the google crawler with it's own copy of the database, so that the queries for old pages didn't slow down everyone else.

blhack14y ago

Seeing threads like this remind me that HN is still a pretty tight-knit community of real people doing real things.

It's good to see this stuff sometimes. Thanks, Matt!

edit:

And then reading some of the other threads on this topic is a bit...something.

Guys, can you calm the conspiracy theory nonsense a bit? Please?

If you're not on this site very much, you might not realize that Matt pops into almost every thread where google is doing something strange regardless of who they're doing it to, and tries to help figure out what is happening. This isn't HN getting some sort of preferential treatment, this is just the effect of having a userbase full of hackers.

You'd see the same type of thing on /. years ago if you frequented it enough.

This is nothing new. This is what a good community looks like. Everybody relax.

Honestly if you read the things that Matt and Pierre have said, they just looked at "freshness" (I believe that is what it is called), and inferred that PG had blocked their crawlers.

This is all stuff you can get from within google webmaster tools (which isn't some secret whoooo insider google thing. It's something they offer to everybody, and it's just like analytics.)

OH! Wait! I mean (hold on, let me spin up my google conspiracy theory generator): thehackernews.com has more ads on it so google is intentionally tweaking their algo to serve that page at a higher point than the real HN because of ads!

DUH!

C'mon, guys, look at their user pages. They're both just active users of the site trying to help out.

larrys14y ago

"This isn't HN getting some sort of preferential treatment, this is just the effect of having a userbase full of hackers"

Of course it's preferential treatment. And if you scan the last month or two of Matt's comments they are general in nature and not specific as in:

"I think I know what the problem is; we're detecting HN as a dead page. It's unclear whether this happened on the HN side or on Google's side, but I'm pinging the right people to ask whether we can get this fixed pretty quickly."

You don't think "pinging the right people" and "get this fixed pretty quickly" is preferential treatment?

3 more replies

billpatrianakos14y ago

It really is but I was personally kind of glad not to see HN have that high of a ranking on Google for the term.

I remember when I first started visiting HN I saw all these smart people and the tight community and I was amazed that something that felt so close-knit and exclusive yet was still open could still exist these days.

I was a lurker for a long time before I actually signed up and participated because I honestly felt like I swasnt entitled to be part of "the group" and I should somehow earn my wings. Then in late 2010 I signed up but didn't submit for a bit and didn't join discussions. I still felt like I didn't have enough to offer. I now feel like I've somehow earned the right to be part of this community though in hindsight I'm quite embarrassed of my first few submissions.

So this story does have a point that I'm about to get to. I first heard of HN through an article in GQ and then forgot the link. I couldn't find the site again after searching Google for "Hacker News" as easily as I thought. This frustrated me slightly back then but now I think it's a good thing.

As the size of a community gets larger the quality of comments and submissions usually decreases. Letting people join HN freely and openly is a great thing but I fear that if it became a huge sensation then we'd be inundated by garbage submissions and comments way more frequently. I know about the post on how newbies often say HN is becoming Reddit and all that so I do try to remember that.

So the point is that not everyone respects communities like this and are thoughtful about joining and how they choose to interact on communities like HN the same way I was and I feel like maybe it's okay if Google isn't giving us the best ranking for certain terms. I mean, HN is easy to find still, just not that easy to stumble over.

RBerenguel14y ago

Advantages of having a deep Google guy as a user :)

cyanbane14y ago

The question is, 'Was Matt Cutts ever THE Googleguy?'

   ( http://www.webmasterworld.com/profilev4.cgi?action=view&member=GoogleGuy )

I always wondered who it really was years back.

1 more reply

jfoster14y ago

Has Google considered an indicator next to results for this type of case? A single red exclamation mark with further details when the mouse is over it, for example.

I'm aware of webmaster tools, but it seems not all webmasters are.

Matt_Cutts14y ago

We do include some information in the search results for cases where our users could be harmed, e.g. if we detect malware or if we think that your site may have been hacked. We've been exploring other ways to alert site owners that there's something important affecting their site.

But people do choose to remove their sites, so we can't always tell between a mistake vs. someone who genuinely prefers not to be in Google's index.

moe14y ago

Does it perhaps return something to the tune of "Unknown or expired link"? :-P

Matt_Cutts14y ago

Nope, it looks like we haven't been able to fetch even the robots.txt file for a few days, so our system is trying to play it safe by assuming that HN is unreachable/dead.

1 more reply

mutt_cutts14y ago

what about the massive amount of duplicate content indexed via this domain: hackerne.ws ?

eurohacker14y ago

can you improve my site also Matt and fix the issue , i had the same problem some time ago - some wordpress plugin blocked googlebot unintentinally and i lost the ranking,

i am sure here are many folks here as well who had similar problems, why not help us all out ))

benatkin14y ago

I never thought that Google Search would be so laughable.

billpatrianakos14y ago

Excellent contribution... Ugh, this is why I don't mind if HN is listed on Google or not.

So which part is laughable? The part where they're able to crawl the vast expanses of the web and return relevant results for the majority of their users? Or is it the part where they came out of nowhere to dominate search because they did it better than the rest?

Come on now, you can't be all things to all people. Google is far from perfect but for a lot of us it's much closer to perfect than the competition and they're constantly trying to improve it. Why don't you go ask Matt Cutts to fix whichever parts of it you think are laughable to your liking? He's been hanging around here and he doesn't seem shy about answering people's questions and concerns. I do doubt he'd give the time of day to a one sentence remark that adds nothing of value whatsoever to the larger discussion or any of it's offshoots.

2 more replies

GigabyteCoin14y ago

Apparently the page in the top spot now hasn't been updated since: 2011.07.08

It now only consists of ads, a twitter feed, and a "Abba-da-dabba-da-dabba-dabba Dat’s all folks!" line.

So what if Googlebot thought HN was dead? Why would it opt to show a "more dead" page in place of it?

I think you're fibbing.

StavrosK14y ago

The other page isn't dead, they can access it.

eurohacker14y ago

yes - it looks more like google had been changing their algorithm recently - in favor of older sites , versus the content -

and it didnt turn out quite well ...

hahla14y ago

Does anyone else find it disturbing that google employees are bending over for pg/hn? Seriously, if any other webmaster blocked googles bots they wouldn't change their algorithms to accommodate, or see how they could use less of our resources.

Its pg's fault not googles, and I dont see why they should care. Maybe from their standpoint it would be more beneficial to google users who are used to typing in 'hacker news' to visit this site, but since when did that matter to google?

Also don't get me wrong I love both google and hackernews. I just find whats going on in this thread interesting..

pierrefar14y ago

From my personal point of view, I help webmasters when I see an issue. It's part of my job and it's something that gets me fired up. I help when I go on our forums, I do that here, and I'm regularly helping webmasters on Google+ (my team does regular webmaster hangouts that anyone can join - see my profile: https://plus.google.com/115984868678744352358 ) and also on Twitter. We can't cover everything, we can't be everywhere, but when we come across something we can help with, we try to help!

Another important point about this thread: this is a very common issue that regularly comes up, and no site is immune from it. I regularly see major sites have a firewall that auto-configures itself to block Googlebot, and the webmaster doesn't know what's going on. Raising awareness about this problem and how to fix it adds to the importance of replying.

novedieci14y ago

Anyone know if this is also a common issue with BingBot? Our site is doing fine in Google at the moment, but was completely deindexed by Bing, and we're totally baffled.

unreal3714y ago

It's not disturbing at all. I am glad Google is a company made up of humans and not uncaring robots. Human nature is that people want to help others when they are in a position to do so. I would do the same if HN needed my help.

Of course, there are people who want to take advantage of people like Matt to give themselves an economic advantage, and I have no sympathy for those people if they can't get someone in Google to help them as easily.

It comes down to karma.

felipe14y ago

> I am glad Google is a company made up of humans and not uncaring robots

That's exactly the problem. Google is notorious for providing horrible customer support, so why some people get personalized help, while the rest of us are stuck with the uncaring robots?

4 more replies

AznHisoka14y ago

Yes, it's VERY disturbing, no question about it. Matt can sound all helpful, and cuddly, and oxytocin-inducing all he wants, but he's basically stepping in and helping 1 site over another, when the whole process should NOT favor anyone. There are lots of webmasters out there that do stupid stuff like override a robots.txt accidentally, yet I don't see Matt sending them an email, asking them to check up on it.. Come'n... this is lame!

jfoster14y ago

What do you think should happen in this scenario? Matt Cutts is aware that a reasonably high-profile site is not being indexed properly. Should he just ignore that?

I think it's particularly good that this is being done publicly, too. There's billions of sites out there and Google can't provide this for all of them, but since they're doing it publicly others can learn from it and know how to address it.

If Matt was artificially boosting HN's ranking that would be disturbing, but they're fixing an indexing issue.

apetresc14y ago

Are you kidding? Matt is basically a superhero who helps anyone he can find. He's solved problems for people complaining on Twitter, on Google+, he hosts regular video office hours when anyone can come to him with problems, etc. It may be the first time YOU'VE seen him rush in to help someone, but this is Matt Cutts' standard MO.

1 more reply

Tichy14y ago

Google should simply deliver the best results, no matter what algorithm they use to do so.

1 more reply

umarmung14y ago

Damned if you and damned if you dont.

All the people in this thread, including Google-related users, are trying to do is figure out what changed. That could just as easily have been at Google or HN's! It could also just as easily been a correct change.

If it turns out it was something HN changed and they did not use or even aware of the proper Google tools available for everyone to optimize their site, then I hardly think any reasonable person could object if it is pointed out to them - even if it's by Google personnel.

blhack14y ago

>Does anyone else find it disturbing that google employees are bending over for pg/hn?

Umm...no. They're not tweaking the algo, they're explaining to PG that he should stop blocking the crawlers, or that he should verify the site in google webmaster tools, then change the crawl rate.

larrys14y ago

Umm...Yes.

Seems like "bending over" to me:

"but I'm pinging the right people to ask whether we can get this fixed pretty quickly"

1 more reply

ig114y ago

If you use Google Webmaster Tools it will in-fact tell you all of this information without having to ask someone at Google.

MarkMc14y ago

Interesting, but not disturbing. I think it's just a consequence of people at Google initially reading HN for the stories, then finding that the HN community forms a useful 'brains trust'. I for one am glad to see that HN has a hotline to the Kremlin ;)

product5014y ago

Baseless comment. If they don't help, you will say - HN is one of the top sources of tech news in the world and it is surprising no one from Google is responding to such an obvious mistake. If they help, you comment as stated.

Secondly, HN is one of the most popular sources of tech news. By responding to this issue, Google is helping bring out possible solutions to a common problem - that crawlers are too frequent and affect the performance of a site. People like you and me can read the responses and learn what can be done to control this.

Third, popular websites like HN deserve attention like this because of the following they have garnered over the years. If,say "hahla blog" doesn't show as a top result in Google - it will take time for Google to verify whether it should even be shown as a top site. But, if something like HN or Amazon.com doesn't show as top result when users explicitly searches for it - that is a darn good use case for Google to fix the issue. It is an indication of something is definitely wrong somewhere that needs to be fixed.

zem14y ago

> Maybe from their standpoint it would be more beneficial to google users who are used to typing in 'hacker news' to visit this site, but since when did that matter to google?

from google's point of view, a user who types "hacker news" into google is almost certainly looking for news.ycombinator. therefore, it is highly desirable for the site to be the #1 result.

zerostar0714y ago

Disturbing yes but not surprising, HN is influential. Now whether it's hacking news or entrepreneur news is another issue.

csallen14y ago

What's more interesting is that people ONLY feel the way you do when the company in question is big. If it's a tiny startup doing some sort of unscalable marketing or customer support, nobody blinks an eye. But if a Google-sized organization does it, it's shocking or disturbing.

I think your best bet is to realize that, regardless of size, companies are always made up of people. And people are opinionated, subjective, and prone to making decisions that fall outside of some standard set of rules.

moses140014y ago

Completely agree with you Hahla - amazing to see the Googlers jump in including Matt Cutts so quickly to support PG and HN and YC yet 99.99999999% would never ever see support like this.

billpatrianakos14y ago

Let's be real here. Do 99.9999...% of users need or deserve this kind of support? It's easy to have sour grapes and say "oh the big shots get attention but I don't! Not fair!". But then you have to remember that people like PG worked their ass off to get a site like this to become so valuable and popular. We're not talking about some guy's blog about his cat that no one cares about, we're talking about HN here. Is it equal treatment? Of course not but is it fair? I say it is fair.

If fairness is treating people based on merit then it's totally fine. Matt isn't in here saying he's going to manually manipulate the result or change anything on Google's end (save for maybe correcting a flaw that would benefit everyone who's in a similar situation as HN). All he's doing is suggesting possible causes, trying to make a diagnosis, and just troubleshooting.

Most people who want this attention don't deserve it. They're the type of people who won't use the resources available like the google help documents or even learning how to administrate a site for best results in search engines. Plus, most people's website just aren't worthy of this attention.

I could go on but lets look at the reality here. We're all knowledgeable of how the web works here so let's just admit that HN is a damn popular site and it definitely seems like a fluke to have it rank as it was when this was submitted.

njloof14y ago

If you don't know somebody that works for Google at this point in your career, you're working in the wrong industry.

xd14y ago

~30k employees at google .. what makes you think they hold all the keys to the millions of web developers across the globe?

brown9-214y ago

What algorithm changes?

101010111100114y ago

Was the query for a domain name or just hacker+news? How do we know?

Does <title> make any difference?

I don't think of this site as "Hacker News". I think of it as ycombinator, and the subdomain, news.

Should users of hackernews.com think of that site as something else, e.g. whatever is between the title tags?

A searchable list of domain names, ranked by popularity. Or even a searchable list of main page titles. Is that how some users are using Google? If so, Google does not need a full, current cached copy of the crawlable web to provide that.

tripzilch14y ago

Unsure. On several occasions I got the distinct impression that a lot of members aren't actually interested in "hacking" (for any of its definitions), instead you get people asking about the value of jailbreaking a phone: http://news.ycombinator.com/item?id=3169000

I suppose it is because the top hits below HN are perhaps even less about "hacker news" ?

pierrefar14y ago

I work at Google helping webmasters.

It seems something has been blocking Googlebot from crawling HN, and so our algorithms think the site is dead. A very common cause is a firewall.

I realize that pg has been cracking down on crawlers recently. Maybe there was an unexpected configuration change? If Googlebot is crawling too fast, you can slow it down in Webmaster Tools.

I'm happy to answer any questions. This is a common issue.

Pierre

joelhaasnoot14y ago

Why is it that "our algorithms think the site is dead" so soon, yet when I search for a keyword, I still get sites that our dead or no longer contain the given keyword? Bad algorithms or is this "thinking dead" a time delayed thing?

Matt_Cutts14y ago

Normally there's a lag between when a site goes dead and when we crawl the site to see that it's dead.

It's also tricky because you don't want a single transient glitch to cause a site to be removed from Google, so normally our systems try to give a little wiggle room in case individual sites are just under unusual load or the site is down only temporarily.

yeggeyeggeyegge14y ago

Not a webmaster, but would like to know: Does/can the google crawler system send a notification email to the registered webmaster of a website, if something like this happens? This could be deemed unsolicited, but would in a huge majority of the cases be more than welcome!

pierrefar14y ago

All these errors are reported in Webmaster Tools in the Diagnostics section. Verify your site and you'll have access to this and a wealth of more data.

Also, we do send notifications in Webmaster Tools, and you can confgure those to be delivered by email too. I'm not sure if we send messages for these kinds of serious crawl errors, so I'll need to check. If not, that's an interesting idea I can ask the team to think about.

Thanks for the feedback :)

Pierre

mmaunder14y ago

PG: welcome to the woes of being an alexa top 1000 site with over 1 million pages of dynamic content.

HN has roughly 1.3 million pages indexed by google.

1.3M pages at 43k per page is 53 gigs to cache static versions of all pages on the site. Quadruple that for a worst case scenario and it'll still easily fit on a single drive.

When your site gets this popular you tend to have to re-architect your application to solve perf issues. You could serve googlebot UA's 1 week old cached pages for example.

I'd encourage you to start thinking of yourself as a utility providing a valuable and necessary resource to the Net and take the time and energy to solve this properly.

resnamen14y ago

I'd like him to fix the experience for non-bot users first, namely the "Unknown or expired link" issue.

markerdmann14y ago

Has Google been making significant changes to the search ranking algorithms in the past couple months? I've noticed a significant decline in the quality of results, to the point that (for the first time in years), I've bounced over to DuckDuckGo or Bing to try my luck there. I love Google as a company, so (if this isn't just in my head) I'd love to see things get better again.

EDIT: Looks like the change to HN's ranking is related to a change that pg made, so my comment is now less relevant to the parent post. I still stand by it, though. :-)

Matt_Cutts14y ago

If you can look back in your search history to find the specific searches that didn't do well for you, we're always happy to get really concrete examples.

The best reports look like "I did a search for [Bavarian red widgets] and the results weren't good because e.g. you were missing a specific page X that you used to return or should return, or you returned Austrian red widgets" or whatever.

Lots of Googlers are clearly hanging out on HN over Thanksgiving while they're stuck at relatives' houses. :)

rumpelstiltskin14y ago

Here's a search that tripped me up -

I'm in Houston, TX. When I searched for 'windshield repair houston', the 1st result - wwwDOThoustonwindshieldrepairDOTnet - looked promising so I made an appointment with them to get my windshield repaired. When I went to their place of business, it was just a guy in a pickup truck in the parking lot of a strip mall, with a 'windshield repair' sign on the back of the truck.

Turns out he's running a scam where he gets ppl to file claims with their insurance, and when they pay him, he would kickback 50% to the customer. Having insurance pay for damages is common enough, plenty of businesses do it. But this guy was trying to get me to file claims for damages that I didn't even have. When I asked for a cash price just to repair the damage I did have, he refused saying that 'it wasn't enough money'.

I was pissed. Not only is this illegal, I couldn't believe he was ranked #1. When I did some digging, it turns out the guy is gaming google with a ton of paid backlinks. For example, http://www.searchpicks.com/business/automotive/patsco-windsh... (click 'suggest listing' for the price).

I'm sure plenty of other searchers ended up wasting their time with this SERP just like I did.

1 more reply

markerdmann14y ago

That's helpful, thanks. I haven't kept a record of those searches (they were mostly programming-related queries), but I'd be happy to submit reports in the future. Is there a URL or email address for this purpose?

1 more reply

billpatrianakos14y ago

This is awesome. Matt friggin Cutts is over here personally responding to not just the big shots (as in PG, and I don't mean that in a bad way at all) but some of us little people too. I see this stuff and it just inspires me. I really hope one day I can be in a position where all types of people, rich, poor, powerful, powerless, can all benefit from my skill or knowledge and that I would be the guy who helps the average Joe the same as the hot shot CEO. I hope I carry that with me as I build my own business. Thanks for being so cool, Matt Cutts.

1 more reply

lhnz14y ago

I actually think this might be positive for the quality of the conversation on this site.

_ndrw14y ago

True. No reason not to keep HN on the hush hush, eh?

billpatrianakos14y ago

I disagree. I mean, we don't want to keep it a secret but at the same time this site has always been really close knit and the people all really just "get it". If HN grows too large I fear for it. The web is full of Cretans and people who think they're thoughts are so worthy they must be heard. Lots of ignorance out there. I've seen some it here already. I learned the culture and studied the etiquette and what qualifies as a useful submission or comment before I really joined in. Not everyone is so thoughtful and it could really sour a once great community. You're obviously new around here so I'm not surprised you'd say that. I mean no disrespect - I do think you'll eventually change your mind though.

w1ntermute14y ago

Out of curiosity, how did the OP even find out that this was going on? He appears to be a long-time user of the site, and therefore would have no reason to Google "Hacker News".

The most common other reason would be that some people use Google as their URL bar - instead of typing "hackerne.ws" or "news.ycombinator.com" into the URL bar, they type "Hacker News" into Google and click on the first result. However, I would've thought that the types of people using HN would have the tech savvy to use a keyword bookmark, or at least the URL.

pbhjpbhj14y ago

Sometimes I want to find HN but am using a different device, sometimes that device is also a touchscreen and typing is a pain. Usually Google is very close to hand and the autocomplete saves typing out a full address or full set of keywords.

So I'd only need to type "hacke" in google and click "I'm feeling lucky" in the drop down rather than type "http://news.ycombinator.com/ into an address bar.

Didn't know about the hackerne.ws domain name however.

narcissus14y ago

Could've been from telling a friend "hey yeah... found this on Hacker News". Then the friend went looking for 'Hacker News' on Google and came across the current top result.

At that point that friend asks the OP: "by Hacker News, you mean <this other site>, right?". Then the OP goes looking...

yonran14y ago

When I'm on someone else's phone or computer, I just search for [hn] to geT here quickly.

ludwigvan14y ago

A few months ago, I was talking about the Stanford ai/ml-classes to one of my friends. He asked me if there were any more classes and I said "Yes, take a look at the front page in Hacker News, there are some links. I visit that site frequently, it's very helpful."

There was one little issue though.

The poor guy didn't know what hackernews was, so found that site (hackernews.com). He then scanned the Twitter stream over that site several times to find those links and started visiting the site for several days to find those other helpful links.

When I saw him again a few days later, he told me: "What a silly site HackerNews is! And I couldn't find the links to those classes over there."

He also told me that he was disappointed of me for visiting such a silly site.

Now, can you guess the look on his face when I told him that he was visiting the wrong site for the last few days?

tillk14y ago

great story, bro

lowglow14y ago

Matt Cutts is my hero. I've never seen anyone from google (or any company) be as proactive in explaining, interacting, and helping this community. Thanks.

cletus14y ago

Raised here already:

http://news.ycombinator.com/item?id=3277365

FWIW I've raised this issue.

Matt_Cutts14y ago

Thanks, cletus. I'll file one too, just to help make sure people check it out.

test562514y ago

Don't link to Google search results!

It's personalized - everyone sees different results. Even if you don't have a Google account.

For me http://news.ycombinator.com is the top page. But when I use TOR, http://www.hackernews.com and http://thehackernews.com/ are on top.

I don't think it's possible to get a real "invariant" result page. It all depends on which computer you use (cookies, language setting, ip address).

waitwhat14y ago

http://googlesystem.blogspot.com/2007/04/how-to-disable-goog...

AretNCarlsenOP14y ago

Or even on the first page. (I notice because I always get to HN this way. Today is the first time I have ever seen it below #1 or #2.)

deepkut14y ago

Screw it--don't crawl HN. Keep the community small and passionate :)

bezza114y ago

Bottom line is that if you don't trust Google & use their tools (GWT) then you can end up in sticky situations like this one.

My experience with the Crawl Rate feature via GWT is that they do honour it pretty strictly, but for large sites Gbot can cause a lot of extra load even if pages are static.

A good CDN and stateless cache server will help but for sites as large as HN every request adds up!

alpb14y ago

I remember times a few months ago when Hacker News was not the first result again. Just wanted to point this out.

gaza3g14y ago

I think it makes a a lot of sense for Matt Cutts to intervene in this case since:

1) Matt browses HN. 2) HN is a high-volume site and whatever suggestions that were discussed and implemented here can be noted and learned by everyone else.

yllus14y ago

I definitely didn't expect to see my own blog as the last result on page one. Or is that because I've shared it via Google Plus and it's being injected into my personal results?

kizel14y ago

Yeah, I usually get here that way, but today was the first time I haven't seen it on the front page, let alone first result.

williamle830014y ago

I want to "Learn ethical hacking training."

braga14y ago

Stay away from venture capitalists ;)

BadassFractal14y ago

I don't think it's all bad, we don't want this community to turn into another reddit.

rplnt14y ago

Nor second or anywhere in the top five (seven it is).

baby14y ago

It has never been the first result for me.

skillachie14y ago

lol, i checked google twice, then checked to make sure it was actually google that I was searching "Hacker News" in

chorola14y ago

Yes.What's wrong?

Craiggybear14y ago

Still number one via DuckDuckGo.

Google is now officially useless.

noduerme14y ago

DDG is using Bing for most general web results. My site mysteriously disappeared from Bing and DDG a few days ago at the same time; but Gabriel said if I'm not cool enough to get a top ranking on HN for six hours to complain about it, I'll just have to go through the normal process Bing sets up for webmasters.

mangodrunk14y ago

Your comments are officially useless.

TastyFish14y ago

Oh shit, now I'm gonna have to remember the url.

tpr1m14y ago

That website sucks too :\

StatHacking14y ago

This is an example of why Goog's search algorithms (and others') should be open: http://news.ycombinator.com/item?id=3268371

A subtle attack may be by making bots stop indexing it or using SEO practices to lower it enough so it would become unsearchable, and therefore, non-existent.

Or just crack into Google...

StatHacking14y ago

UPDATE: http://www.itworld.com/software/228393/free-software-activis...

j / k navigate · click thread line to collapse

199 comments

Matt_Cutts14y ago

Added: Looks like HN has been blocking Googlebot, so our automated systems started to think that HN was dead. I dropped an email to PG to ask what he'd like us to do.

pg14y ago

I sent you an email about this.

(A couple weeks ago I banned all Google crawler IPs except one. Crawlers are disproportionately bad for HN's performance because HN is optimized to serve recent stuff, which is usually in memory.)

pierrefar14y ago

Hi Paul,

A site can be crawled from any number of Googlebot IP addresses, and so blocking all except one doesn't help in throttling crawling.

If you verify the site in Webmaster Tools, we have a tool you can use to set a slower crawl rate for Googlebot, regardless of which specific IP address ends up crawling the site.

Let me know if you need more help.

Edit Detailed instructions to set a custom crawl rate:

1. Verify the site in Webmaster Tools.

2. On the site's dashboard, the left hand side menu has an entry called Site Settings. Expand that and choose the Settings submenu.

3. The page there has a crawl rate setting (last one). It defaults to " Let Google determine my crawl rate (recommended)". Select "Set custom crawl rate" instead.

4. That opens up a form and choose his desired crawl rate in crawls per second.

If there is a specific problem with Googlebot, you can reach the team as follows:

Thanks!

Pierre

2 more replies

wheels14y ago

Have you considered putting a caching reverse proxy in front of the arc app to keep the backend from having to render all of the old pages?

There's an example of doing such with nginx here:

http://serverfault.com/questions/30705/how-to-set-up-nginx-a...

With that you'd just have to send out the HTTP header from the arc app saying that current articles expire immediately, and old ones don't.

1 more reply

Matt_Cutts14y ago

Gotcha--thanks, Paul. I'm about to get on a plane, but we'll get this figured out where we're not sending as much hostload toward HN.

2 more replies

tibbon14y ago

Could you use some sort of sitemap or other way to provide the data to Google that isn't so damaging to site performance? Or in Google Webmaster tools turn down the rate of crawling?

Just realized that this could be a problem for lots of sites, and I'm curious as to what the best solution is, since not everyone has Matt Cutts reading their site and helping out.

1 more reply

jedberg14y ago

reddit had the same problem. We set up a separate server just for the google crawler with it's own copy of the database, so that the queries for old pages didn't slow down everyone else.

blhack14y ago

Seeing threads like this remind me that HN is still a pretty tight-knit community of real people doing real things.

It's good to see this stuff sometimes. Thanks, Matt!

edit:

And then reading some of the other threads on this topic is a bit...something.

Guys, can you calm the conspiracy theory nonsense a bit? Please?

You'd see the same type of thing on /. years ago if you frequented it enough.

This is nothing new. This is what a good community looks like. Everybody relax.

Honestly if you read the things that Matt and Pierre have said, they just looked at "freshness" (I believe that is what it is called), and inferred that PG had blocked their crawlers.

This is all stuff you can get from within google webmaster tools (which isn't some secret whoooo insider google thing. It's something they offer to everybody, and it's just like analytics.)

DUH!

C'mon, guys, look at their user pages. They're both just active users of the site trying to help out.

larrys14y ago

"This isn't HN getting some sort of preferential treatment, this is just the effect of having a userbase full of hackers"

Of course it's preferential treatment. And if you scan the last month or two of Matt's comments they are general in nature and not specific as in:

You don't think "pinging the right people" and "get this fixed pretty quickly" is preferential treatment?

3 more replies

billpatrianakos14y ago

It really is but I was personally kind of glad not to see HN have that high of a ranking on Google for the term.

RBerenguel14y ago

Advantages of having a deep Google guy as a user :)

cyanbane14y ago

The question is, 'Was Matt Cutts ever THE Googleguy?'

   ( http://www.webmasterworld.com/profilev4.cgi?action=view&member=GoogleGuy )

I always wondered who it really was years back.

1 more reply

jfoster14y ago

Has Google considered an indicator next to results for this type of case? A single red exclamation mark with further details when the mouse is over it, for example.

I'm aware of webmaster tools, but it seems not all webmasters are.

Matt_Cutts14y ago

But people do choose to remove their sites, so we can't always tell between a mistake vs. someone who genuinely prefers not to be in Google's index.

moe14y ago

Does it perhaps return something to the tune of "Unknown or expired link"? :-P

Matt_Cutts14y ago

Nope, it looks like we haven't been able to fetch even the robots.txt file for a few days, so our system is trying to play it safe by assuming that HN is unreachable/dead.

1 more reply

mutt_cutts14y ago

what about the massive amount of duplicate content indexed via this domain: hackerne.ws ?

eurohacker14y ago

can you improve my site also Matt and fix the issue , i had the same problem some time ago - some wordpress plugin blocked googlebot unintentinally and i lost the ranking,

i am sure here are many folks here as well who had similar problems, why not help us all out ))

benatkin14y ago

I never thought that Google Search would be so laughable.

billpatrianakos14y ago

Excellent contribution... Ugh, this is why I don't mind if HN is listed on Google or not.

2 more replies

GigabyteCoin14y ago

Apparently the page in the top spot now hasn't been updated since: 2011.07.08

It now only consists of ads, a twitter feed, and a "Abba-da-dabba-da-dabba-dabba Dat’s all folks!" line.

So what if Googlebot thought HN was dead? Why would it opt to show a "more dead" page in place of it?

I think you're fibbing.

StavrosK14y ago

The other page isn't dead, they can access it.

eurohacker14y ago

yes - it looks more like google had been changing their algorithm recently - in favor of older sites , versus the content -

and it didnt turn out quite well ...

hahla14y ago

Also don't get me wrong I love both google and hackernews. I just find whats going on in this thread interesting..

pierrefar14y ago

novedieci14y ago

Anyone know if this is also a common issue with BingBot? Our site is doing fine in Google at the moment, but was completely deindexed by Bing, and we're totally baffled.

unreal3714y ago

It comes down to karma.

felipe14y ago

> I am glad Google is a company made up of humans and not uncaring robots

That's exactly the problem. Google is notorious for providing horrible customer support, so why some people get personalized help, while the rest of us are stuck with the uncaring robots?

4 more replies

AznHisoka14y ago

jfoster14y ago

What do you think should happen in this scenario? Matt Cutts is aware that a reasonably high-profile site is not being indexed properly. Should he just ignore that?

If Matt was artificially boosting HN's ranking that would be disturbing, but they're fixing an indexing issue.

apetresc14y ago

1 more reply

Tichy14y ago

Google should simply deliver the best results, no matter what algorithm they use to do so.

1 more reply

umarmung14y ago

Damned if you and damned if you dont.

blhack14y ago

>Does anyone else find it disturbing that google employees are bending over for pg/hn?

Umm...no. They're not tweaking the algo, they're explaining to PG that he should stop blocking the crawlers, or that he should verify the site in google webmaster tools, then change the crawl rate.

larrys14y ago

Umm...Yes.

Seems like "bending over" to me:

"but I'm pinging the right people to ask whether we can get this fixed pretty quickly"

1 more reply

ig114y ago

If you use Google Webmaster Tools it will in-fact tell you all of this information without having to ask someone at Google.

MarkMc14y ago

product5014y ago

zem14y ago

> Maybe from their standpoint it would be more beneficial to google users who are used to typing in 'hacker news' to visit this site, but since when did that matter to google?

from google's point of view, a user who types "hacker news" into google is almost certainly looking for news.ycombinator. therefore, it is highly desirable for the site to be the #1 result.

zerostar0714y ago

Disturbing yes but not surprising, HN is influential. Now whether it's hacking news or entrepreneur news is another issue.

csallen14y ago

moses140014y ago

Completely agree with you Hahla - amazing to see the Googlers jump in including Matt Cutts so quickly to support PG and HN and YC yet 99.99999999% would never ever see support like this.

billpatrianakos14y ago

njloof14y ago

If you don't know somebody that works for Google at this point in your career, you're working in the wrong industry.

xd14y ago

~30k employees at google .. what makes you think they hold all the keys to the millions of web developers across the globe?

brown9-214y ago

What algorithm changes?

101010111100114y ago

Was the query for a domain name or just hacker+news? How do we know?

Does <title> make any difference?

I don't think of this site as "Hacker News". I think of it as ycombinator, and the subdomain, news.

Should users of hackernews.com think of that site as something else, e.g. whatever is between the title tags?

tripzilch14y ago

I suppose it is because the top hits below HN are perhaps even less about "hacker news" ?

pierrefar14y ago

I work at Google helping webmasters.

It seems something has been blocking Googlebot from crawling HN, and so our algorithms think the site is dead. A very common cause is a firewall.

I realize that pg has been cracking down on crawlers recently. Maybe there was an unexpected configuration change? If Googlebot is crawling too fast, you can slow it down in Webmaster Tools.

I'm happy to answer any questions. This is a common issue.

Pierre

joelhaasnoot14y ago

Matt_Cutts14y ago

Normally there's a lag between when a site goes dead and when we crawl the site to see that it's dead.

yeggeyeggeyegge14y ago

pierrefar14y ago

All these errors are reported in Webmaster Tools in the Diagnostics section. Verify your site and you'll have access to this and a wealth of more data.

Thanks for the feedback :)

Pierre

mmaunder14y ago

PG: welcome to the woes of being an alexa top 1000 site with over 1 million pages of dynamic content.

HN has roughly 1.3 million pages indexed by google.

1.3M pages at 43k per page is 53 gigs to cache static versions of all pages on the site. Quadruple that for a worst case scenario and it'll still easily fit on a single drive.

When your site gets this popular you tend to have to re-architect your application to solve perf issues. You could serve googlebot UA's 1 week old cached pages for example.

I'd encourage you to start thinking of yourself as a utility providing a valuable and necessary resource to the Net and take the time and energy to solve this properly.

resnamen14y ago

I'd like him to fix the experience for non-bot users first, namely the "Unknown or expired link" issue.

markerdmann14y ago

EDIT: Looks like the change to HN's ranking is related to a change that pg made, so my comment is now less relevant to the parent post. I still stand by it, though. :-)

Matt_Cutts14y ago

If you can look back in your search history to find the specific searches that didn't do well for you, we're always happy to get really concrete examples.

Lots of Googlers are clearly hanging out on HN over Thanksgiving while they're stuck at relatives' houses. :)

rumpelstiltskin14y ago

Here's a search that tripped me up -

I'm sure plenty of other searchers ended up wasting their time with this SERP just like I did.

1 more reply

markerdmann14y ago

1 more reply

billpatrianakos14y ago

1 more reply

lhnz14y ago

I actually think this might be positive for the quality of the conversation on this site.

_ndrw14y ago

True. No reason not to keep HN on the hush hush, eh?

billpatrianakos14y ago

w1ntermute14y ago

Out of curiosity, how did the OP even find out that this was going on? He appears to be a long-time user of the site, and therefore would have no reason to Google "Hacker News".

pbhjpbhj14y ago

So I'd only need to type "hacke" in google and click "I'm feeling lucky" in the drop down rather than type "http://news.ycombinator.com/ into an address bar.

Didn't know about the hackerne.ws domain name however.

narcissus14y ago

Could've been from telling a friend "hey yeah... found this on Hacker News". Then the friend went looking for 'Hacker News' on Google and came across the current top result.

At that point that friend asks the OP: "by Hacker News, you mean <this other site>, right?". Then the OP goes looking...

yonran14y ago

When I'm on someone else's phone or computer, I just search for [hn] to geT here quickly.

ludwigvan14y ago

There was one little issue though.

When I saw him again a few days later, he told me: "What a silly site HackerNews is! And I couldn't find the links to those classes over there."

He also told me that he was disappointed of me for visiting such a silly site.

Now, can you guess the look on his face when I told him that he was visiting the wrong site for the last few days?

tillk14y ago

great story, bro

lowglow14y ago

Matt Cutts is my hero. I've never seen anyone from google (or any company) be as proactive in explaining, interacting, and helping this community. Thanks.

cletus14y ago

Raised here already:

http://news.ycombinator.com/item?id=3277365

FWIW I've raised this issue.

Matt_Cutts14y ago

Thanks, cletus. I'll file one too, just to help make sure people check it out.

test562514y ago

Don't link to Google search results!

It's personalized - everyone sees different results. Even if you don't have a Google account.

For me http://news.ycombinator.com is the top page. But when I use TOR, http://www.hackernews.com and http://thehackernews.com/ are on top.

I don't think it's possible to get a real "invariant" result page. It all depends on which computer you use (cookies, language setting, ip address).

waitwhat14y ago

http://googlesystem.blogspot.com/2007/04/how-to-disable-goog...

AretNCarlsenOP14y ago

Or even on the first page. (I notice because I always get to HN this way. Today is the first time I have ever seen it below #1 or #2.)

deepkut14y ago

Screw it--don't crawl HN. Keep the community small and passionate :)

bezza114y ago

Bottom line is that if you don't trust Google & use their tools (GWT) then you can end up in sticky situations like this one.

My experience with the Crawl Rate feature via GWT is that they do honour it pretty strictly, but for large sites Gbot can cause a lot of extra load even if pages are static.

A good CDN and stateless cache server will help but for sites as large as HN every request adds up!

alpb14y ago

I remember times a few months ago when Hacker News was not the first result again. Just wanted to point this out.

gaza3g14y ago

I think it makes a a lot of sense for Matt Cutts to intervene in this case since:

1) Matt browses HN. 2) HN is a high-volume site and whatever suggestions that were discussed and implemented here can be noted and learned by everyone else.

yllus14y ago

I definitely didn't expect to see my own blog as the last result on page one. Or is that because I've shared it via Google Plus and it's being injected into my personal results?

kizel14y ago

Yeah, I usually get here that way, but today was the first time I haven't seen it on the front page, let alone first result.

williamle830014y ago

I want to "Learn ethical hacking training."

braga14y ago

Stay away from venture capitalists ;)

BadassFractal14y ago

I don't think it's all bad, we don't want this community to turn into another reddit.

rplnt14y ago

Nor second or anywhere in the top five (seven it is).

baby14y ago

It has never been the first result for me.

skillachie14y ago

lol, i checked google twice, then checked to make sure it was actually google that I was searching "Hacker News" in

chorola14y ago

Yes.What's wrong?

Craiggybear14y ago

Still number one via DuckDuckGo.

Google is now officially useless.

noduerme14y ago

mangodrunk14y ago

Your comments are officially useless.

TastyFish14y ago

Oh shit, now I'm gonna have to remember the url.

tpr1m14y ago

That website sucks too :\

StatHacking14y ago

This is an example of why Goog's search algorithms (and others') should be open: http://news.ycombinator.com/item?id=3268371

A subtle attack may be by making bots stop indexing it or using SEO practices to lower it enough so it would become unsearchable, and therefore, non-existent.

Or just crack into Google...

StatHacking14y ago

UPDATE: http://www.itworld.com/software/228393/free-software-activis...

j / k navigate · click thread line to collapse