Supposedly there are thousands of different features being scored; those are just the rolled-up categories that each needed their own separate ML pipeline step.
For example, a feature might be "this site has a favicon.ico that is unique and not used elsewhere" (page quality). Or "this page has ads, but they are below the fold" (page layout). Or "this site has > X inbound links from a hand-curated list of 'legitimate branded sites'" (page/site authority).
Google then picks a starting weight for each of these, and has human reviewers score the quality of the results, order of ranking, etc., based on a Google-written how-to-score document. It then tweaks the weights, re-runs the ML pipeline, and has the humans score again, in an iterative loop until the results seem good.
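To make the idea concrete, the weighted-feature scoring described above could look something like this toy sketch. Everything here is hypothetical for illustration: the feature names, weight values, and linear combination are made up, not Google's actual pipeline (which is far more complex and not public).

```python
# Toy sketch of weighted feature scoring. All names and numbers are
# invented for illustration; this is not Google's real system.

def score_page(features, weights):
    """Combine per-feature scores (each 0..1) into one ranking score."""
    return sum(weights[name] * value for name, value in features.items())

# Hypothetical starting weights, later tweaked based on human-rater feedback.
weights = {
    "unique_favicon": 0.1,   # page quality signal
    "ads_below_fold": 0.3,   # page layout signal
    "branded_inlinks": 0.6,  # page/site authority signal
}

page = {"unique_favicon": 1.0, "ads_below_fold": 1.0, "branded_inlinks": 0.5}
print(score_page(page, weights))  # 0.1 + 0.3 + 0.3 = 0.7
```

The iterative loop in the comment would then be: run scoring over the index, have raters judge the ordering, nudge the weights, and repeat.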
There's a never-acted-on FTC report[1] describing how Google used this system to rank its competition (comparison shopping sites) lower in the search results.
[1] http://graphics.wsj.com/google-ftc-report/
Edit: Note that a lot of detail is missing here, like topic relevance: a site may rank well for some niche category it specializes in, but it wouldn't necessarily rank well for a completely different topic, even with good content, since it has no established signals suggesting it should.