YC: There should be a way to prevent reposts

11 points · martianpenguin · 18y ago · 13 comments

Recently, I've noticed a lot of articles that are reposts of the same thing. They should all be consolidated to one thread. Just my opinion.

13 comments · 5 top-level

aneesh · 18y ago · 1 in thread

There could simply be a "report duplicate" link (where you specify which article it's a repost of) next to a new post for the first 1 hour after it's posted. If enough people say it's a duplicate of a particular article, then it goes off, and the comment threads are merged.
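
The "report duplicate" idea above can be sketched roughly as follows. This is only an illustration: the `Post` class, the `report_duplicate` function, and the threshold of 3 reports are assumptions made for the example, not anything the site is known to implement. The comment only says "enough people", so the threshold is a placeholder.

```python
from collections import Counter
from dataclasses import dataclass, field

REPORT_WINDOW_SECONDS = 60 * 60  # "the first 1 hour after it's posted"
REPORT_THRESHOLD = 3             # assumed value for "enough people"


@dataclass
class Post:
    """Hypothetical post record used only for this sketch."""
    post_id: int
    created_at: float  # seconds since some epoch
    # Maps reporting user -> id of the post they say this one duplicates.
    reports: dict = field(default_factory=dict)


def report_duplicate(post, user, original_id, now):
    """Record one user's report.

    Returns the id of the agreed original once enough users name the
    same post, otherwise None.
    """
    if now - post.created_at > REPORT_WINDOW_SECONDS:
        return None  # the report link is no longer offered
    post.reports[user] = original_id  # one vote per user, last one wins
    target, votes = Counter(post.reports.values()).most_common(1)[0]
    return target if votes >= REPORT_THRESHOLD else None
```

Once `report_duplicate` returns an id, the caller would take the new post down and merge its comment thread into the original; that merge step is left out here.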

tim2 · 18y ago

Instead of dropping or merging the article, put a list (or link to a list) of "previous postings" somewhere very prominently on the page.

tlrobinson · 18y ago · 3 in thread

There is. Duplicate URLs are not allowed.

As for different sources reporting on the same thing, that's a bit harder.

bootload · 18y ago

"... There is. Duplicate URLs are not allowed. ..."

There is a subtler variation on this. Two different urls resolving to the same article. A site may publish:

- http://foo.com/date/bar/some-inane-tech-article

- http://foo.com/date/some-inane-tech-article

The two URLs are distinct, but they point to the same document. A quick example might be an article and the printer-friendly version of the same article.

bootload · 18y ago

"... There is. Duplicate URLs are not allowed ... There is a subtler variation on this. Two different urls resolving to the same article ..."

Here is a live example I just spotted:

original ~ http://www.paulgraham.com/ycombinator.html ~ post ~ http://news.ycombinator.com/item?id=133430

dupe ~ http://paulgraham.com/ycombinator.html ~ post ~ http://news.ycombinator.com/item?id=134775
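
A duplicate like the `www.` pair above could in principle be caught by comparing a canonical form of each URL instead of the raw string. The sketch below is a minimal illustration using Python's standard `urllib.parse`; the function name `canonical_url` is made up for the example, and treating `www.example.com` and `example.com` as the same site is an assumption that holds for most sites but not all.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url):
    """Return a normalized form of `url` for duplicate comparison.

    Assumptions of this sketch: a leading "www." is insignificant,
    fragments never matter, and any userinfo can be dropped.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""  # urlsplit already lowercases the hostname
    if host.startswith("www."):
        host = host[len("www."):]
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    path = parts.path or "/"
    # The fragment is deliberately replaced with an empty string.
    return urlunsplit((parts.scheme, host, path, parts.query, ""))


a = canonical_url("http://www.paulgraham.com/ycombinator.html")
b = canonical_url("http://paulgraham.com/ycombinator.html")
print(a == b)
```

This only helps when the two URLs differ in the host spelling. Two different paths serving the same document would still look distinct.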

bootload · 18y ago

and another:

- http://thoughtpad.net/alan-dean/http-headers-status.png, http://news.ycombinator.com/item?id=134933

- http://thoughtpad.net/alan-dean/http-headers-status.gif, http://news.ycombinator.com/item?id=134236

kajecounterhack · 18y ago · 3 in thread

Can something be written that strips the URL of arbitrary variables?

http://foo.com/bar?aritrary-var=arbitrary-val

to just

http://foo.com/bar

That would help because a lot of people end up posting links with ?source=newsletter or &sessionid=asdf1234ilikepie and so on.
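
A narrower version of this idea is to strip only parameters known to be tracking noise, rather than every variable. The sketch below assumes a small denylist built from the two examples in the comment (`source` and `sessionid`); both the list and the function name `strip_tracking` are illustrative, and a real list would need curating.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed denylist, taken from the examples in the comment above.
TRACKING_PARAMS = {"source", "sessionid"}


def strip_tracking(url):
    """Remove denylisted query parameters and leave everything else as-is."""
    parts = urlsplit(url)
    kept = [
        pair
        for pair in parts.query.split("&")
        # Compare only the name before "=", case-insensitively.
        if pair and pair.split("=", 1)[0].lower() not in TRACKING_PARAMS
    ]
    return urlunsplit(parts._replace(query="&".join(kept)))


print(strip_tracking("http://foo.com/bar?source=newsletter"))
print(strip_tracking("http://foo.com/bar?id=7&sessionid=asdf1234ilikepie"))
```

The raw `name=value` pairs are filtered without being decoded and re-encoded, so parameters that are kept come through byte-for-byte unchanged.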

natrius · 18y ago

Those variables are occasionally meaningful. Stripping them all off indiscriminately would break some links.

marcus · 18y ago

So don't strip them indiscriminately. Have the HN site drop the variables one by one, compare each resulting page to the page at the original link, and keep only the variables that actually affect the resulting page.
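
The drop-one-at-a-time check described above might look roughly like this. It is a sketch under stated assumptions: `minimal_url` is a made-up name, `fetch` is a caller-supplied function that maps a URL to page content (so the example runs without network access), and pages are compared by exact equality, which would be too strict for pages with rotating ads or timestamps.

```python
from urllib.parse import urlsplit, urlunsplit


def minimal_url(url, fetch):
    """Drop each query parameter whose removal leaves the page unchanged.

    `fetch` is a hypothetical callable: URL in, page content out.
    """
    parts = urlsplit(url)
    pairs = [p for p in parts.query.split("&") if p]
    reference = fetch(url)  # the page as originally submitted
    kept = list(pairs)
    for pair in pairs:
        trial = [p for p in kept if p != pair]
        candidate = urlunsplit(parts._replace(query="&".join(trial)))
        if fetch(candidate) == reference:
            kept = trial  # removing this parameter did not change the page
    return urlunsplit(parts._replace(query="&".join(kept)))


def fake_fetch(url):
    """Stand-in for a real HTTP fetch: only `id=` affects the page."""
    query = urlsplit(url).query.split("&")
    return "page:" + ",".join(p for p in query if p.startswith("id="))


print(minimal_url("http://foo.com/bar?id=7&source=newsletter", fake_fetch))
```

A parameter that selects the article itself changes the fetched page when removed, so it is kept. The cost is one extra request per parameter on every submission.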

mixmax · 18y ago

What about:

http://economist.com/articles?somearticle

and

http://economist.com/articles?entirelydifferentarticle

jmtulloss · 18y ago · 1 in thread

I agree. I've also noticed that the same story as reported by two different news sources will get posted, which is essentially the same thing.

Google does some pretty cool probability stuff (at least, I think that's how they do it) to figure out what articles are the same for news.google.com. Something like that would be really cool on new.yc.

I know, the source is open.... but I'm clearly busy ;)
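
For a sense of what automatic matching could involve, here is one common textbook technique: compare the sets of overlapping word triples ("shingles") in two texts using Jaccard similarity. This is not a description of how Google News works, which the comment above only guesses at; the function names and the 0.5 threshold are assumptions chosen for the example.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word sequences in `text`, lowercased."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of the shingle sets of two texts, from 0.0 to 1.0."""
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 1.0  # two empty texts are treated as identical
    return len(sa & sb) / len(sa | sb)


def looks_like_same_story(a, b, threshold=0.5):
    """Assumed cutoff; a real system would tune this on actual data."""
    return jaccard(a, b) >= threshold


print(jaccard("the quick brown fox jumps", "the quick brown fox sleeps"))
```

This catches copied or syndicated text. Two outlets covering the same event in their own words would likely score low, so it addresses only part of the problem raised here.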

aneesh · 18y ago

Yeah, I've wondered, how exactly does Google do that?? Anyway, in the short term, a manual system would probably be more accurate.

nreece · 18y ago

There should be a better way to prevent reposts
