Google is the only search engine that works on Reddit now, thanks to AI deal

(404media.co)

515 points by turkeytotal 1y ago | 356 comments

  # Welcome to Reddit's robots.txt
  # Reddit believes in an open internet, but not the misuse of public content.
  # See https://support.reddithelp.com/hc/en-us/articles/26410290525844-Public-Content-Policy Reddit's Public Content Policy for access and use restrictions to Reddit content.
  # See https://www.reddit.com/r/reddit4researchers/ for details on how Reddit continues to support research and non-commercial use.
  # policy: https://support.reddithelp.com/hc/en-us/articles/26410290525844-Public-Content-Policy

  User-agent: *
  Disallow: /

Source: https://www.reddit.com/robots.txt
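
For anyone who wants to see what those two directives mean to a well-behaved crawler, here is a minimal sketch using Python's standard `urllib.robotparser`. The helper name `can_fetch` and the bot names in the loop are just illustrative assumptions; the rules are the ones quoted above. It only shows what the publicly served file says, not what any particular crawler is actually served or allowed to do.

```python
from urllib.robotparser import RobotFileParser

# The two directives quoted above, copied verbatim.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def can_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Illustrative agent names; under "User-agent: *" they are all treated alike.
    for agent in ("Googlebot", "MojeekBot", "ExampleBot"):
        print(agent, can_fetch(agent, "https://www.reddit.com/r/programming/"))
```

Read literally, this file disallows every path for every agent, so anything a search engine does index has to be covered by some arrangement outside the public robots.txt.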

sunaookami1y ago

They serve a different robots.txt to Google: https://merj.com/blog/investigating-reddits-robots-txt-cloak...

You can see it here: https://search.google.com/test/rich-results/result?id=_mYogl... (click on "View Tested Page")

dogleash1y ago

> # Reddit believes in an open internet, but not the misuse of public content.

Calling it "public" content in the very act of exercising their ownership over it. The balls on whoever wrote that.

shit_game1y ago

Their license/EULA clearly states that Reddit has perpetual whatever to content posted on Reddit, but relying solely on DMCA for "stolen" content _yet again_ feels like a terrible way to deal with non-original content. Part of me hopes that Reddit gets hit with some new precedent-setting lawsuits regarding non-original content that requires useful attribution, but I doubt that will ever happen.

pas1y ago

it's even worse. it's not theirs (it's the users'), they are merely hosting it and using it (ToS gives them a fancy irrevocable license I guess).

so they can do whatever they want with it and the actual owners/authors have no chance to really influence Reddit at all to make it crawlable. (the GDPR-like data takeout is nice, but ... completely useless in these cases where the value is in the composition and aggregation with other users' content.)

Khelavaster1y ago

The Fake News police should shut down this sort of messaging

will01y ago

Looks like it changed a month ago:

https://old.reddit.com/r/redditdev/comments/1doc3pt/updating...

immibis1y ago

Nobody who wants to be successful obeys robots.txt. And I do mean nobody.

chippiewill1y ago

They changed it to disallow so that scrapers can't just claim the robots.txt gave them permission.

latexr1y ago

That’s a weird statement to be absolutist about. The majority of individuals and companies who want to be successful do not do so by scraping websites, thus have no reason to disobey robots.txt. Most people in the world, ambitious or not, wouldn’t even understand what your sentence refers to.

maxnevermind1y ago

Hasn't the NYT tried to sue OpenAI for ignoring robots.txt? Or do you mean it's impossible to prove and/or it's still more profitable to just ignore robots.txt?

JohnFen1y ago

Sadly true. That's why I gave up on robots.txt years ago and started blocking crawlers outright in .htaccess

Of course, that became unsustainable so now I have everything behind a login wall.

Zuiii1y ago

> We believe in something that we will now proceed to violate.

I will never take a statement given by a company that blatantly lies like this at face value going forward. What a bunch of clowns.

raverbashing1y ago

With the amount of crap in Reddit, cleaning it must be a very non-trivial problem. (I mean, it never is, but in the case of Reddit it's probably extra complicated)

arnaudsm1y ago

I understand the AI context, but this is dangerously anticompetitive for other search engines.

This is a dangerous precedent for the internet. Business conglomerates have been controlling most of the web, but refusing basic interoperability is even worse.

zooq_ai1y ago

There is nothing preventing search companies paying the same $60 Million to license content.

If Reddit had an exclusive agreement, it would be anti-competitive.

This is a classic HN anti-Google tirade (and downvoting facts, logic and concepts of free market)

not_wyoming1y ago

> There is nothing preventing search companies paying the same $60 Million to license content.

Yes, actually, there is - having $60m to throw around.

"Barriers to entry often cause or aid the existence of monopolies and oligopolies" [0]. Monopolies and oligopolies are definitionally the opposite of free market forces. This is quite literally Econ 101.

[0] - https://en.wikipedia.org/wiki/Barriers_to_entry

not_wyoming1y ago

Juiciest update I’ve ever gotten to share: https://www.nytimes.com/2024/08/05/technology/google-antitru...

pluc1y ago

Paying 60 million to every site you want to index is also a bad precedent to set. Why can Reddit get paid and XYZ can't?

onlyrealcuzzo1y ago

This is an interesting development.

How many other sites might have leverage to charge to be indexed?

I don't want to live in a world where you have to use X search engine to get answers from Y site - but this seems like the beginning of that world.

From an efficiency perspective - it's obviously better for websites to just lease their data to search engines than both sides paying tons of bandwidth and compute to get that data onto search engines.

Realistically, there are only 2 search engines now.

This seems very bad for Kagi - but possibly could lead to the old, cool, hobbyist & un-monetized web being reinvented?

ColinHayhurst1y ago

Kagi uses at least Google and Mojeek

edit:

> Realistically, there are only 2 search engines now.

https://seirdy.one/posts/2021/03/10/search-engines-with-own-...

WarOnPrivacy1y ago

> Realistically, there are only 2 search engines now.

From the article:

     Many alternatives to GBY [Google, Bing, and Yandex] exist, but almost none of them have their own results;

This seems to assert that ~0 other search providers do any crawling at all. Ever. Are we sure that's the case?

   (they could crawl but never ever return those results == more odd).

Yawrehto1y ago

Doesn't it list three major ones, Google, Bing, and Yandex, plus Mojeek and a few other small ones? That's a bit more than two.

McDyver1y ago

That seems like the business model for streaming. You subscribe to X provider to watch Y series. So, as for streaming, I suppose a pirate bay search engine will come up

toomuchtodo1y ago

Pirate Bay is probably not the best analogy, more like Anna's Archive imho [1]: scrape runs offered individually per web property, compressed into a package, maybe served by torrents like this Academic Torrents site example [2].

Scraper engine->validation/processing/cleanup->object storage->index + torrent serving is a rough pipeline sketch.

[1] https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu... ("HN Search: annas archive")

[2] https://academictorrents.com/details/9c263fc85366c1ef8f5bb9d... ("AcademicTorrents: Reddit comments/submissions 2005-06 to 2023-12 [2.52TB]")

gtirloni1y ago

> but this seems like the beginning of that world.

It's not the beginning, it's mere continuation.

Walled gardens have existed since the AOL days. They deteriorate over time but it doesn't prevent companies from trying (each time, in bigger attempts).

aAaaArrRgH1y ago

> but possibly could lead to the old, cool, hobbyist & un-monetized web being reinvented?

It still exists. It just isn't that popular.

splwjs1y ago

idk man i bet you five bucks and a handshake it's just going to play out like the existing startup grift.

There's an established player with institutional protections, then a scrappy upstart takes a bunch of VC money, converts it into runway, gives away the product for free, gradually replaces and becomes the standard, then puts out an s-1 document saying "we don't make money and we never have, want to invest?" and then they start to enjoy all the institutional protections. Or they don't. Either way you pay yourself handsomely from the runway money so who cares.

The upstart gets indexed and has an API, the established player doesn't.

The upstart is more easily found and modular but the institutional player can refuse to be indexed to own their data and they can block their API to prevent ai slop from getting in and dominating their content.

StrauXX1y ago

IANAL, but as far as I understand the current legal status (in the US), a change in robots.txt or terms and conditions is not binding for web scrapers since the data is publicly accessible. Neither does displaying a banner "By using this site you accept our terms and conditions" change anything about that. The only thing that can make these kinds of terms binding is if the data is only accessible after proactively accepting terms, for instance by restricting the website until one has created an account. LinkedIn lost a case against a startup scraping and indexing their data because of that a few years ago.

qingcharles1y ago

At the federal level; but states have their own laws. For instance, it can get you 5 years in prison in Illinois to violate a web site ToS.

https://www.ilga.gov/legislation/ilcs/ilcs4.asp?DocName=0720...

redcobra7621y ago

Has anyone ever successfully been prosecuted for violating this statute?

jpalomaki1y ago

Quite sure they are also enforcing these with some technical measures to limit scraping.

renlo1y ago

As was LinkedIn, which was forced to stop rate limiting / IP-banning scrapers for public pages.

wtf2421y ago

This problem is only going to get worse. For my thegreatestbooks.org site I used to just get indexed/scraped by Google and Bing. Now it's like 50+ AI bots scraping my entire site just so they can train an LLM to answer questions my site answers without having a user ever visit my site. I just checked Cloudflare and in the past 24 hours I've had 1.2 million bot/automated requests.

sct2021y ago

There's a new setting in Cloudflare to block AI/scraper bots. https://blog.cloudflare.com/declaring-your-aindependence-blo...

graeme1y ago

Anyone have any experience with this? Is there nothing but upside in blocking these bots?

jedberg1y ago

They changed robots.txt a month or so ago. For the first 19 years of life, reddit had a very permissive robots.txt. We allowed all by default and then only restricted certain poorly behaved agents (and Bender's Shiny Metal Ass(tm))

But I can understand why they made the change they did. The data was being abused.

My guess is that this was an oversight -- that they will do an audit and reopen it for search engines after those engines agree not to use the data for training, because let's face it, reddit is a for profit business and they have to protect their income streams.

Closi1y ago

> But I can understand why they made the change they did. The data was being abused.

Depends how you see it - if you see it as 'their' data (legally true) or if you see it as user content (how their users would likely see it).

If you see it as 'user content', they are actually selling the data to be abused by one company, rather than stopping it being abused at all.

From a commercial 'let's sell user data and make a profit' perspective I get it, although it does seem short-sighted to decide to effectively de-list yourself from alternative search engines (guess they just got enough cash to make it worth their while).

Ajedi321y ago

> if you see it as 'their' data (legally true)

Is that actually true? Reddit may indeed have a license to use that data (derived from their ToS), but I very much doubt they actually own the copyright to it. If I write a comment on Reddit, then copy-paste it somewhere else, can Reddit sue me for copyright infringement?

passwordoops1y ago

Enough cash or enough data on hand to show the majority of traffic comes from the search monopoly

ColinHayhurst1y ago

Person extensively quoted in the article here. They are welcome to reach out. But not a single person from any level did that, nor replied to my polite requests to explain and engage. We first contacted them in early June and by 13th June, I had escalated to Steve Huffman @spez.

toomuchtodo1y ago

An acquaintance investigating Reddit's moderation mechanization asked how a major subreddit was moderated after an Associated Press post was auto-removed by automod. They were banned from said sub. They asked why they were banned and disclosed that any responses would be shared with a journalism org (to be transparent about where replies would be going). They were muted by mods for 28 days and were "told off" in a very poor manner (per the screenshots I've seen) by the anonymous mod who replied to them. They were then banned from Reddit for 3 days for "harassment" after appealing; when they requested more info about what was considered harassment, they were ignored. Ergo, inquiring as to how the mods of a major sub are automodding non-biased journalism sources (the AP, in this case) without any transparency appears to be considered harassment by Reddit. The interaction was submitted to the FTC through their complaint system to contribute towards their existing antitrust investigation of Reddit.

Shared because it is unlikely Reddit responds except when required by law, so I recommend engaging regulators (FTC, and DOJ at the bare minimum) and legislators (primarily those focused on Section 230 reforms) whenever possible with regards to this entity. They're the only folks worth escalating to, as Reddit's incentives are to gate content, keep ad buyers happy, and keep the user base in check while they struggle to break even, sharing as little information publicly as possible along the way [1] [2].

[1] https://www.bloomberg.com/news/articles/2024-05-09/reddit-la... | https://archive.today/wQuKM

[2] https://www.sec.gov/edgar/browse/?CIK=1713445

JohnMakin1y ago

One (in this case, 2) company's incentive for profit should not take priority over the usability/well being of the internet as a whole, ever, and is exactly why we are where we are now. This is an absolutely terrible precedent.

BeetleB1y ago

I know people will hate to hear this, but Reddit is not important to the well-being of the Internet.

jedberg1y ago

I agree with you in theory, but in practice someone has to pay for all this magic.

ColinHayhurst1y ago

The blocks for MojeekBot, a Cloudflare-verified and respectful bot for 20 years, started before the robots.txt file changes. We first noticed in early June.

We thought it was an oversight too at first. It usually is. Large publishers have blocked us when they have not considered the details, but then reinstated us when we got in touch and explained.

ekidd1y ago

I personally feel that this kind of "exclusive search only by Google deal" should result in an anti-trust case against Google. This is the kind of abuse of monopoly power that caused anti-trust laws to be passed in the 1890s.

eddd-ddde1y ago

if i create a vacuum cleaner and decide to only sell it at Walmart you can't get mad at me for not wanting to sell it at costco

you can always buy a competitor's or make your own vacuum cleaner if you hate buying at Walmart

maybe what you are really mad about is Reddit monopolising content

fredgrott1y ago

the article quotes reddit's policy change: Reddit considers search and ads commercial activities and thus subject to the robots.txt block and exclusion.

EasyMark1y ago

How was it being abused? You still clicked on the information and saw the reddit ads? Now they won't get any of that from "rival" sites to google. I guess they figured the 60 million was more than that ad revenue. Seems greedy but I don't think it's illegal like others are suggesting.

account421y ago

Ah so when reddit uses user content for monetization it's ok but when others do it then it isn't? Reddit may want that double standard but I think the only thing they are going to achieve with this stunt is more people ignoring robots.txt.

ykonstant1y ago

It's ironic, because Reddit is the only search engine that works on Google now thanks to shittening.

maxwell1y ago

They're both running on fumes at this point.

riiii1y ago

Also sniffing them.

QVVRP4nYz1y ago

For years reddit's built-in search was broken (or at least broken) and people were forced to use 3rd parties like google, so we came full circle.

daft_pink1y ago

I don’t understand how this isn’t anti-competitive behavior. It seems like reddit has to offer this deal with similar terms to google’s competitors.

talldayo1y ago

They do offer that deal to others; a big news story was when OpenAI bought Reddit's data they were selling: https://openai.com/index/openai-and-reddit-partnership/

dathinab1y ago

yep, but for things which are "only" search engines it's not a viable offer. Only if you expect "big AI business value" from it does it make sense, maybe.

eddd-ddde1y ago

I don't see how this tracks at all. Companies can decide to only sell their products with some retailer if they want. You can't force them to make deals with other companies.

gtirloni1y ago

You certainly can in monopoly situations (which apparently isn't the case here).

Suppafly1y ago

Most business deals are anti-competitive in some way. What makes you think this specifically rises to the level where they'd legally have to offer similar terms to competitors?

daft_pink1y ago

I’m not sure. Maybe the angle is that Google is anti-competitive by signing an agreement that limits information to its rivals.

Being forced into using google services, because they are paying information companies to deal only with them seems like a disaster for the web.

carlosjobim1y ago

Why in the world would they have to do that? There are thousands of exclusive business-to-business deals being signed into action every second of the day.

lmeyerov1y ago

FWIW, we inquired with the reddit sales team about paying for data sometime last year, as we do similar elsewhere for use cases like helping emergency responders, and even though they were launching the program and asking for customers... no email back. Nor on our second and I think third attempt.

I'm not sure what to make of that.

morkalork1y ago

How much were you willing to pay? Still, rude of them not to even discuss the issue. Every time I've gone to buy data, if I'm too small of a fish, vendors have always been happy to refer me to a reseller.

heisenbit1y ago

Certainly rude but also possibly legally problematic. If they were judged to be in a dominant position in a market and were found making deals with exclusivity then it can get expensive.

It all depends of course on what the market is. If one looks at reddit not as a whole but as a collection of niches then one could imho find niches where reddit has a dominant knowledge position.

lmeyerov1y ago

We do 4-6 figures/yr for providers which is normal in our world

An enterprise sales team with only 1 customer happens (eg, Mozilla's search bar), but... That's surprising here, and scary as a sustainable & scalable business. Ignoring 5-6 figure/yr inquiries says a lot to me. In contrast, we did that same-day with Twitter without talking to anyone.

dathinab1y ago

Worse, it doesn't even really "work" anymore, given how most searches are flooded with garbage SEO results and paid advertisements "basically" looking like search results (most of the time more garbage rather than the results you are looking for; in the cases where it isn't, it is quite often along the lines of "Google's algorithm blackmailing companies to buy ads for users who want to find them through Google but wouldn't without ads").

I wonder if this might affect Reddit, as in slowly kill its user base, especially when it comes to users providing (and often also looking for) high-quality content, because which of those users would want to use Google search?

john-radio1y ago

> Worse, it doesn't even really "work" anymore, given how most searches are flooded with garbage SEO results and paid advertisements "basically" looking like search results ...

I don't understand what you're saying. That's exactly why people append `site:reddit.com` to their searches in the first place, because those search results typically aren't like that.

wwweston1y ago

Or at least, reddit posts and comments that are content messaging / marketing (human or AI) fit in better with earnest and natural posts, so that they're more effective.

numbers1y ago

"Information is power. But like all power, there are those who want to keep it for themselves. The world’s entire scientific and cultural heritage, published over centuries in books and journals, is increasingly being digitized and locked up by a handful of private corporations." - Aaron Swartz (2008)

1vuio0pswjnm71y ago

"If you use Bing, DuckDuckGo, Mojeek, Qwant or any other alternative search engine that doesn't rely on Google's indexing and search Reddit by using "site:reddit.com," you will not see any results from the last week."

The veracity of this statement is questionable.

I found at least four web search engines not using Google's index that produced results from the last week.

Example: Recent eruption at Yellowstone Black Diamond Pool

https://www.ecosia.org/search?method=index&q=site:reddit.com...

https://search.brave.com/search?q=reddit.com+black+diamond+p...

https://api.yep.com/fs/2/search?client=web&gl=all&no_correct...

   POST /sp/search HTTP/1.0
   host: www.startpage.com
   content-length: 74
   content-type: application/x-www-form-urlencoded
   query=site:reddit.com black diamond pool&abp=-1&t=&lui=english&sc=&cat=web
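
If you want to replay that Startpage request yourself, here is a rough Python sketch that rebuilds it with the standard library. The body is copied verbatim from the request above; the `https://` scheme, the `build_request` helper name, and the choice to leave spaces unescaped (as posted) are my assumptions. The sketch only constructs the request and checks that the body length matches the `content-length: 74` header; it does not send anything.

```python
from urllib.request import Request

# Body copied verbatim from the request above. Spaces are left unescaped,
# exactly as posted; a stricter client would percent-encode them.
BODY = "query=site:reddit.com black diamond pool&abp=-1&t=&lui=english&sc=&cat=web"


def build_request(body: str = BODY) -> Request:
    """Build (but do not send) the POST request for the Startpage search."""
    data = body.encode("ascii")
    return Request(
        "https://www.startpage.com/sp/search",  # scheme is an assumption
        data=data,
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Content-Length": str(len(data)),
        },
    )


if __name__ == "__main__":
    req = build_request()
    # urllib stores header names capitalized as "Content-length".
    print(req.get_method(), req.full_url, req.get_header("Content-length"))
```

Passing the result to `urllib.request.urlopen` would actually send it; whether the server still accepts these parameters is not something this sketch can promise.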

At least for this example, I got the same desired result using Reddit site search.

https://old.reddit.com/search/?q=black+diamond+pool

If anyone has some good examples of search queries that I can test showing why a search engine must be used, please share.

1vuio0pswjnm71y ago

"Its search engine uses Microsoft's Bing's technology, with whom it has a long-term arrangement."

https://www.bbc.com/news/business-53922786

lopis1y ago

Ecosia does use Google or Bing, which you can select in the settings.

niutech1y ago

But Brave Search has their own independent index.

r_singh1y ago

I wonder how Aaron Swartz would react to this

geodel1y ago

My guess is he'd freak out once he heard that lawyers and law enforcement may get involved on this issue.

voisin1y ago

Makes sense that Google did this deal since their search quality tanked and they became a de facto front end UI for Reddit.

NoMoreNicksLeft1y ago

Up until 2016 (I think, +/- 1 year), if you could remember 3 uncommon words in a comment, you could find any reddit post instantly on Google. I'd want to follow up on a thread from weeks ago, and it was magic. Number one result. Then one day that just stopped working, and even adding site:*.reddit.com didn't fix it. At the time, I think, I didn't realize that it was mostly Google's fault, I thought maybe Reddit had changed their infrastructure so that it couldn't be crawled properly.

Google hasn't been a search engine in a long while, it's just an advertisement engine now.

dev1ycan1y ago

it's so bad it's crazy, you can legit not find stuff on the internet anymore, it's the same with youtube, I search something and get like 20 or so results and then everything else is hidden.

it started when youtube removed the ability to search for videos older than 5 years, if I had to guess? cost saving, have every old video in cheaper storage... but it sort of fragments youtube, every couple of years you only get newer content.

niutech1y ago

Have you tried Verbatim search in Google?

LegitShady1y ago

"we noticed that since our search results had gotten so bad nobody can use them to find the things they want, people just kept adding "reddit" to search terms anyways, so we figured we might as well make it official and exclusive"

mutatio1y ago

It's funny in the context of Google's past motto of "don't be evil". I feel the right thing for Google here would have been to decline any deal regarding exclusivity, then Reddit wouldn't have pulled the trigger with its robots.txt update. The entire manoeuvre required both parties.

peddling-brink1y ago

Google should abandon its mission to “organize the world’s information” because doing so requires spending money for valuable data, and others might not want to spend that money?

roughly1y ago

Boy, the LLMs have really been an apocalypse moment for the web, haven’t they? Between hoovering up and monetizing every bit of content they can without any attribution or compensation and the absolute flood of mediocre generated content, they’ve really done in the last straggling remains of the open internet.

It’s not like everyone wasn’t already pulling the same grift, but quantity really does have a quality all its own.

imglorp1y ago

Of course, we have to be careful not to villainize a neutral tech. Instead let's call it what it is: unchecked capitalism and monopolistic behaviors.

Capitalism seems to work ok for the common good until you remove all the protections. LLMs provide a de facto monopoly for the owner which must already be a near monopoly: they take vast resources to train; only a giant corp can afford to buy all the content and provision enough resources to train one.

LLMs did not enshittify what's left of the internet, greed did it.

latexr1y ago

On the one hand, you’re absolutely right. But on the other hand it’s not like it matters in practice. Isn’t most technology technically neutral? But it’s also made to be used by people, who can do so beneficially or detrimentally. Criticising a technology is a shorthand for criticising how it’s used.

synicalx1y ago

> Of course, we have to be careful not to villainize a neutral tech

This is a very good point IMO. If we're going to chastise LLMs we may as well give servers, switches, routers, fiber-optic cable, and silicon a bollocking as well since that's ultimately what's facilitating all this.

lifestyleguru1y ago

I deeply regret every minute spent on and kilobyte of text contributed to reddit.

Ylpertnodi1y ago

I don't. There's nothing around that is similar...with the same traction. The various 'verses are variations on cat pics. I'm still looking, though.

wccrawford1y ago

While it's still not Reddit, I've been enjoying Lemmy. I have a similar range of communities on each, and other than some annoying groupthink, the content is often similar.

And to me, forgetting to log in to each of them feels similar, too. For what that's worth. (I hate both of them when not logged in.)

card_zero1y ago

I mostly contributed to r/nonsense and I'm pleased by the thought of that sub's content being used to train future AI, with information about the architectural uses of super-tall chef's hats, the prehistoric invasion of Europe by Beak People, and so forth.

trallnag1y ago

I can confidently state that I'm a net negative for Reddit, looking at the dozens of banned accounts in the trash bin of my KeePass vault

neilv1y ago

I'm concerned multiple ways by this, but I also could see some positive fallout from this, if it sets precedents that help protect 'content' owners from AI goldrush companies just taking everything.

gtirloni1y ago

AI companies are the least of our worries in the Reddit situation. The fact that Reddit has full control of user-generated data gives them freedom to do as they please. I think this is the crux of today's issue.

AI companies like Google, Microsoft and OpenAI have deep pockets to 'unprotect' themselves from anything. The barrier to entry is for small AI companies and those aren't really making an impact currently.

PaulRobinson1y ago

This is great. It means I won't see Reddit content popping up all over search results in other engines. Can Medium do the same? And perhaps Quora?

jonpurdy1y ago

FYI, Kagi lets you do this and personalize it as you desire. They even share aggregated stats※ about which domains users choose to block/lower. (Mine generally match these stats.)

※ - https://kagi.com/stats?stat=leaderboard&k=-2

eliasdaler1y ago

You can also do this for free with uBlackList: https://github.com/iorate/ublacklist

This has greatly enhanced my Google experience - easy to ban content farms, AI-generator websites from appearing in Google Images etc.

WarOnPrivacy1y ago

> Kagi lets you do this and personalize it as you desire.

Kagi shill here. Are they finally applying filters and operators to image searches?

Asking because it was a tough year seeing Pinterest as top filter choice and top result in images (when set as filter=block).

(edit: I just tried searching->image: beautiful quilt patterns. I didn't spot any Pinterest results!)

I have never understood why DDG, etc steadfastly refuse to obey operators in image searches. Most days. Every blue moon operators seem to work. I think.

sidebar: Yesterday I saw Yandex obey quotes in a web search. It was the 1st time I've seen that.

Suppafly1y ago

>This is great. It means I won't see Reddit content popping up all over search results in other engines.

Honestly, that makes those other engines way less valuable because for many topics, telling the engine to specifically narrow the results down to reddit comments is the only way to get a decent answer to what you're looking for. I'd definitely support blocking Quora from everything though.

kingnothing1y ago

What use do you get out of a search engine if not searching for reddit and other forums? The rest of the internet has become a cesspool of useless AI generated crap.

kevincox1y ago

To be fair Reddit threads are more and more often getting filled with useless AI generated crap as well.

jjulius1y ago

To be fair, Reddit has plenty of astroturfing, too.

rkangel1y ago

Interesting. I have long found Reddit to be an excellent source of solutions to problems. Stack Overflow usually beats it for programming specific stuff, but for everything else usually the most helpful answer comes from Reddit. It's a real person, helping another real person with a real problem.

troyvit1y ago

Kagi lets you configure the search engine to deprioritize or even fully eliminate search results. They ride on the back of Google's indexing so -- if you ever change your mind -- you could bring reddit searches back.

bdjsiqoocwk1y ago

What a weird thing to say. Reddit has for a long time been a place where real people hang out and have real conversations, unlike quora and medium.

VancouverMan1y ago

> where real people hang out and have real conversations

I don't consider the discussions there to be "real" in any meaningful way, thanks to the extensive moderation.

From what I've seen, there typically ends up being a small handful of moderator-enforced narratives that are deemed "acceptable" for a given subreddit, and any commenters deviating from those narratives get banned, or their comments end up as "[removed]" by "[deleted]", or the comments get obscured with the "comment score below threshold" notice.

It's generally some of the most one-sided and blandest discussion around. Given that there's often no meaningful back-and-forth involving differing perspectives of any sort, I'm not even sure if it should be considered "discussion". It's more like regurgitation and repetition.

I've found the situation to be particularly bad on the Canadian locale-specific subreddits, for example, but enough of the tech-oriented ones I've seen seem to end up like that, too.

candiddevmike1y ago

I think Reddit lost that kind of authenticity a while ago. Advertisers know the "site:reddit.com <product>" trick, and when you look at the number of upvotes, it costs _pennies_ to get your product trending in the comments.

MattPalmer10861y ago

It's not strange to me. Every single time I've followed a Reddit link from search results, I've got a short and fairly useless conversation that doesn't help me at all. So I have never understood why people like it.

Obviously, people do see value in it, or they wouldn't keep saying so! I would happily exclude Reddit links from search results though.

psunavy031y ago

Yeah, but each sub, to a greater or lesser degree, has its own hivemind that you'll be run out of town (or possibly even banned) for challenging. And the average member of Reddit is quite willing to spout off confidently incorrect BS and downvote people into the ground who actually know what they're talking about.

Not exactly always a reliable source of info outside uncontroversial niche topics or places like /r/AskHistorians that actually moderate. And even there I've seen the occasional humdinger.

PaulRobinson1y ago

I've got thousands of karma. Used to love it. No more.

lfkdev1y ago

Yeah awesome, reddit was one of the last useful results besides the spam blogs and AI-generated articles.

nullc1y ago

It's weird to say that reddit "works" with google. Every page they serve to google is stuffed full of hidden unrelated content, so any reddit result in google is unlikely to actually contain what you were searching for.

Google really should blacklist reddit entirely for this practice, but sadly as bad as reddit is it's still a much higher quality result than average for google.

account421y ago

Ugh, it's absurd how incompetent Google is at filtering out "related" content or similar volatile "sidebar" feeds in the sites they index, which have nothing to do with the main content and won't be there when the user actually opens that link.

jumploops1y ago

IIRC, GPT-2 was primarily trained on Reddit[0]

[0]https://www.reddit.com/r/ChatGPT/comments/133xgb5/gpt2_was_p...

ChrisArchitect1y ago

Fine with this. This is the world OpenAI created. And all the people that started searching with +Reddit tacked on weirdly like 5 years ago. Reddit's covering themselves from internal user-concern and their general exposure to AI training and Google was smart enough to get on that quickly. We'll see what Bing's take is and what changes if anything now that 404media's outrage farming is at play. This isn't a recent change after all, a month ago?

nomilk1y ago

Suppose a crawler or rival search engine doesn’t respect robots.txt, reddit can’t stop them. Make it a bit trickier, yes, but not stop them.

eschneider1y ago

It is evidence that they didn't have permission if you sue them.

kingnothing1y ago

There's no grounds on which to file suit. The 9th circuit court found web scraping is legal.

https://techcrunch.com/2022/04/18/web-scraping-legal-court/

miyuru1y ago

reddit blocked datacenter IPs even before this change.

nomilk1y ago

Could a motivated scraper not buy IPs/proxies that aren’t in those ranges, i.e. to blend in with general users?
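
For illustration, range-based blocking of the kind described above can be sketched in a few lines of Python. This is only a guess at the general technique, not Reddit's actual implementation: the `DATACENTER_RANGES` list and the `is_datacenter_ip` helper are hypothetical, and the CIDR blocks are reserved documentation ranges standing in for real hosting-provider ranges.

```python
import ipaddress

# Hypothetical blocklist. These are reserved documentation ranges (TEST-NET-1/2),
# used here as stand-ins for real hosting-provider CIDR blocks.
DATACENTER_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24")
]


def is_datacenter_ip(addr: str) -> bool:
    """Return True if addr falls inside any listed datacenter range."""
    ip = ipaddress.ip_address(addr)
    return any(ip in network for network in DATACENTER_RANGES)


if __name__ == "__main__":
    for candidate in ("192.0.2.17", "203.0.113.5"):
        verdict = "blocked" if is_datacenter_ip(candidate) else "allowed"
        print(candidate, verdict)
```

A check like this only sees the source address, which is why traffic routed through addresses outside the listed ranges would pass it.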

r_singh1y ago

Thinking from Reddit's perspective they have nothing to lose really. It’s not like other search engines are going to pay any attention to the robots.txt and Google’s AI would have still scraped data from Reddit regardless of the deal. Now they will just feel less bad about not citing sources possibly, depending on the user experience they want to deliver.

debacle1y ago

Reddit has been ripe for disruption for years. It's just waiting on an inflection point and someone to take it behind the barn.

crazygringo1y ago

The network effects are too strong.

Remember, the only reason Reddit "won" was because Digg destroyed itself with a radical upgrade that everyone hated.

Reddit would have to do something similarly self-inflicted, and I can't even guess where people would go. Reddit was already an alternative to Digg -- what's the alternative to Reddit? I mean, it's certainly not Quora.

CSMastermind1y ago

I don't think this is true.

The main thing I see Reddit being useful for are discussions about entertainment.

There's probably a subreddit for your favorite sports team, Twitch streamer, TV show, book series, video game, politics (which is entertainment for some people).

Reddit has seriously degraded the experience of a lot of these communities with things like restricting custom CSS.

It seems to me that the way you'd disrupt Reddit as a startup is to pick a vertical and laser focus on becoming the best discussion board for that community. If it's sports then have integrations for live stats, scores, etc.

In general you could attract users by offering profit sharing on ads the same way Youtube does for creators.

Have the best moderation tools in the world, a constant painpoint with Reddit. Give admins more flexibility over the appearance of the board, all things Reddit took away.

The other path for disruption would be if an established company with those communities tackled the problem. Lots of communities already use Discord, but they tend to also have a subreddit because chat and forums are different communication methods. Discord could easily offer a forum product as an extension of their chat services. If they do it well they'd drive a lot of users away from the subreddits.

NoMoreNicksLeft1y ago

It was already dead by then. Really, it was the various Slashdot exoduses... sites like K5 got large initial boosts, but stumbled and started to deteriorate. If the Digg exodus is what sent you to Slashdot, chances are you're the kind of user everyone else was trying to escape.

>what's the alternative to Reddit? I mean, it's certainly not Quora.

If it was deliberate I certainly can't tell, but one of the characteristics of Reddit is that it caused so many other little tiny internet forums to just wither away. Most were visually unappealing, running some ancient phpbb software or whatever, but there were so many like stars in the night sky. Now, if they're even still running, you look for the newest post, and it will say "November 2023". Hell, the only reason they are still running is that the credit card number on file paying for hosting doesn't expire until next year somehow. Reddit is a red tide algae choking out all life in the ocean, nothing else gets to exist anymore.

Suppafly1y ago

>Reddit was already an alternative to Digg -- what's the alternative to Reddit?

This site is essentially 'orange reddit', they just need to add sub-HNs or tagging or something and it'd be ready for an influx of reddit refugees. Not that any of us really want it, but it's possible.

tayo421y ago

Reddit is quietly a huge website with a significant number of users. So many people use it but don't talk about it. Google search says 1 billion MAU? Twice as big as Twitter

nope10001y ago

There is Lemmy for example, very similar to old Reddit. The big problem is the missing content outside of mainstream communities.

bdw52041y ago

The strange thing to me is how everybody keeps trying to make distributed Twitter happen when distributed Reddit is the low hanging fruit for federated social media.

You don't want to end up banned from a movies forum because you also participate in a political forum. Federation solves that problem because you can use separate accounts without either forum knowing that you also use the other.

ravetcofx1y ago

This exists with Lemmy already and is fostering nice communities (and due to ActivityPub is interoperable with Mastodon accounts)

psunavy031y ago

They had this years ago, and they were called "forums."

Suppafly1y ago

>The strange thing to me is how everybody keeps trying to make distributed Twitter happen when distributed Reddit is the low hanging fruit for federated social media.

Honestly, it's strange to me how hard people are trying to make distributed anything happen. Federation mostly solves a problem that real people don't have or care about.

teabee1y ago

Is this not just what the internet was before reddit? What features would "distributed reddit" have that an internet full of independent community forums would be missing?

ks20481y ago

I like it in principle, but after watching the situation with Twitter clones, I'm not too optimistic on federated services taking off.

I would like to see a wikipedia-style system for Twitter/Reddit: open access data, non-profit.

astrange1y ago

It's not possible because the most common problems with running a forum are spam and moderation, and both of those are too much work unless it's centralized.

jessriedel1y ago

Very few of the reddit users who are providing the content for free are motivated by which search engines are allowed to index the content, so I don't see how this would make it more ripe for competition. (If you just mean society would now be even better off if reddit were disrupted, ok, maybe, but that's a different thing.)

onlyrealcuzzo1y ago

Or for Google to buy it.

They could monetize it much better while being less annoying.

Ultimately - Google is getting everything they want from Reddit with this deal without having to buy it outright.

Short of Reddit transforming to an entirely different product (difficult) - I'm not sure where the major growth opportunity is for it.

rob741y ago

It wouldn't be the first time they have done something like this either. Remember https://en.wikipedia.org/wiki/Google_Groups ?

QVVRP4nYz1y ago

The killing of third-party clients didn't have a significant impact. I don't know what they would have to do to lose users, other than some kind of mandatory subscription fee.

escapecharacter1y ago

Man, I just want to be able to search the entire internet for when I’m doing niche research.

Does this mean there will be a future where everyone is running their own crawler? I suppose.

api1y ago

Network effects are more powerful than we are. Witness the number of people who despise Xhitter but are still on there. Once something has a sufficient network effect it becomes immune to normal market forces and able to abuse its position with near impunity.

myrandomcomment1y ago

So I went Slashdot, Digg, Reddit. I stopped spending any time on Reddit 5 years ago. Not worth it.

cyanydeez1y ago

Work is such a flimsy word for what google currently does with search

As soon as someone shows me a search engine that restores quality of search, I'm getting a subscription for work.

It really can't be hard to whitelist sources and index appropriately.

Get going nerds, google has fallen.

thih91y ago

Story / rant warning.

I remember seeing an unhelpful hyperlink for the first time. It was a random word in the body of a random tech site that redirected to a list of articles from that site tagged with that term.

I remember being stunned, my expectation was that the link would lead me to another website, one that would be an authoritative source on that term and freely accessible.

20 years later we get a paywalled article about a fragmented web – and we’re not slowing down.

1vuio0pswjnm71y ago

Works where archive.li is blocked:

https://cc.bingj.com/cache.aspx?d=5070227914243&w=ljIRk8yx42...

blackeyeblitzar1y ago

We need laws that make it so that giant platforms like Reddit have no exclusive rights to content submitted by users. It would be ridiculous for only Google to be able to train AI on YouTube or Reddit content for example.

causal1y ago

It feels like Reddit is approaching an inflection point anyway where bot-made content is concentrated enough to spoil the whole experience. Closed servers like Discord and Slack may be the last haven of online human interaction.

ozgrakkurt1y ago

Stopped using reddit after they hindered login-less viewing and blocked VPNs. Everyone who respects themselves should start moving away from it imho. Same thing with google

manishsharan1y ago

For my use cases, Google is pretty much useless without Reddit.

For example, when I search for product reviews, I always specify reddit. Otherwise the search results are inundated with SEO spam.

tempfile1y ago

Hopefully this paves the way for antitrust action, but I won't hold my breath.

Reddit's justification for this is profoundly wrong. Their "public content policy" is absurd doublespeak, and counter to everything the open internet is and hopes to be. You cannot simultaneously call yourself "open" and "public" while refusing access to automated clients. Every client is automated. They even go so far as to say that "crawling" (also known as "downloading") is an "abuse" and violates user privacy.

This is absurd, and not justified. I would love to see legislation that restricted server operators' ability to prohibit automated access in this way, but I suppose it will never happen. Some people in this thread have attempted to justify the policy by saying "they have to protect their income streams". No they don't. You don't have a right to an income stream, and you certainly don't have a right to lie in order to get all the benefits of an open internet with none of the downsides. Noting of course that the "downsides" are in this case actually just "competitors".

semiquaver1y ago

Sorry, what is the antitrust concern about Reddit blocking crawlers that aren’t paying them? Surely you don’t think Reddit has a monopoly on anything?

Or are you somehow suggesting that it’s google’s fault that Reddit took this step? I don’t see any indication that’s the case.

em-bee1y ago

not that reddit has a monopoly, but that google has.

google is using their power to prevent others from competing.

the problem here is of course that if reddit would be in financial trouble (i don't know if they are but let's imagine they need this money), they'd be between a rock and a hard place.

google should not be allowed to make exclusive deals, and reddit could not survive without the deal, then what would be left? google buys reddit, or the relevant authority approves of the deal?

i thought about the same problem with firefox. let's assume firefox is forced to allow people to make a choice of the default search engine (just like microsoft was forced to allow a choice of default browser on windows) then google might stop paying mozilla, and they could end up in financial trouble.

ideally no company ever depends on a single other company that much, but that only works if we don't allow companies to grow that much in the first place.

tempfile1y ago

Yes, sorry, should have been more clear: I claim google is in a monopoly position, not reddit. The rest of the comment is unrelated ranting about reddit's betrayal of their previously-held "public data is public" position.

lowbloodsugar1y ago

Funny that the source of TFA blocked me from reading the whole thing.

earthboundkid1y ago

They literally think the scissor statement is a real thing that will really work, fml.

dvngnt_1y ago

site:reddit.com works for kagi for new posts this week?

rozab1y ago

Basically all 'independent' search engines piggyback off Google or Bing

https://help.kagi.com/kagi/search-details/search-sources.htm...

>Our search results also include anonymized API calls to all major search result providers worldwide

ColinHayhurst1y ago

>Basically all 'independent' search engines piggyback off Google or Bing

Incorrect: https://www.mojeek.com/about/why-mojeek

karaterobot1y ago

From the second paragraph of the article:

> Searching for Reddit still works on Kagi, an independent, paid search engine that buys part of its search index from Google.

dvngnt_1y ago

thanks i only read the first paragraph. then i went to kagi discord and they provided more context

AndroidKitKat1y ago

Kagi gets part of their index from Google, per the article, so perhaps that's the reason Kagi still works. Wonder if Vlad and Kagi will do (or have done) the calculus to see if buying crawlability from Reddit itself is cheaper than buying results from Google for Reddit search.

hugh_kagi1y ago

Not yet but it's something we want to look into.

ColinHayhurst1y ago

Kagi pays to use APIs from Mojeek and Google

Elfener1y ago

I mean, the reddit company did go public, so things like this were inevitable.

Also things like the API fiasco, and also small annoyances like the fact that when you click on an image on reddit, it now goes to a wrapper html page instead of just the actual image (this was one of the reasons reddit was better than most social media...).

account421y ago

It used to be that Reddit didn't host images and you'd have to link (often shitty) external image hosts. Then someone created imgur to host images for reddit. And slowly but surely imgur became just another shitty image host (and social media site for some reason). Then reddit wanted some of the dough imgur was pulling in (probably just making losses) and added their own image hosting. At first it worked just like you are saying, with you getting direct links to the image file. Now they have also turned into yet another shitty image host.

Part of the blame for the redirect-to-wrapper page lies with browsers. If browsers didn't let servers reliably differentiate between a direct request and an <img> embed then this practice would not be as widespread.
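
To make the mechanism concrete, here is a minimal Python sketch of how a server could tell the two cases apart. Modern browsers send a `Sec-Fetch-Dest` request header (`image` for an `<img>` embed, `document` for a top-level navigation), and navigations typically advertise `text/html` in `Accept`. The `classify_image_request` function is a hypothetical helper for illustration; nothing here is confirmed to be what Reddit actually does.

```python
def classify_image_request(headers: dict[str, str]) -> str:
    """Guess whether a request for an image URL is an embed or a navigation.

    Returns "embed", "navigation", or "unknown". Hypothetical helper,
    not taken from any real server's code.
    """
    # Header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}

    # Fetch Metadata: browsers that support it state the destination directly.
    dest = normalized.get("sec-fetch-dest", "").lower()
    if dest == "image":
        return "embed"
    if dest == "document":
        return "navigation"

    # Fallback heuristic: a top-level navigation usually accepts HTML.
    if "text/html" in normalized.get("accept", "").lower():
        return "navigation"
    return "unknown"


if __name__ == "__main__":
    print(classify_image_request({"Sec-Fetch-Dest": "image"}))
    print(classify_image_request({"Accept": "text/html,application/xhtml+xml"}))
```

A server that gets "navigation" back could answer with an HTML wrapper page, and with the raw image bytes otherwise.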

mrec1y ago

Maybe it's just me or something temporary (I use Old Reddit, like all right-thinking folk) but for the past couple of days the image wrapper page seems to have been sent to the glue factory. I'm just getting the image now, unadorned.

melodyogonna1y ago

Wait that's actually terrible.

VoidWhisperer1y ago

Wow, reddit found a way to make themselves even less useful somehow. After the API fiasco, that seemed like it'd be pretty hard to do.

wvenable1y ago

But, apparently, they did finally find a way to make money.

jasode1y ago

>But, apparently, they did finally find a way to make money.

The most recent 10-Q financial results for the quarter ending 2024-03-31 (filed 2024-05-08) show they actually lost money: https://www.sec.gov/edgar/browse/?CIK=1713445

(For 2024-Q1, Reddit lost $575 million on revenue of $242 M.)

If the quoted "$60 million deal"[1] from Feb 2024 is accurate, that small amount from Google may not be enough for Reddit to turn a profit. It remains to be seen what the Q2 or Q3 financials will show.

[1] https://www.google.com/search?q=google+ai+deal+reddit

LunaSea1y ago

Barely enough to pay the CEO

splwjs1y ago

If they kept their API open then by now the entirety of the site would be ai slop that was built with chatgpt and launched with the api.

Then again most of what that site does is just blend and regurgitate the information that's currently on it anyway.

miohtama1y ago

Those AI bots would likely be more intelligent commenters than Redditors

stainablesteel1y ago

which is ironic because pre-AI every solid piece of obscure information and non-programming question usually had an answer on reddit, it's an extremely valuable dataset looking back. but moving forward i think it's only going to become less valuable and people will probably manually/custom-scrape all the questions out of worthwhile subreddits and open up their data for free

splwjs1y ago

When I was young, my brother knew a guy who was really into movies. If you wanted to know about a movie you couldn't remember, you would go talk to that guy.

For a while, the internet had an end-run play that made that guy less useful. You can just go on the internet for obscure movie information, buddy.

But now it seems like knowing a movie guy is going to be the only way to get a real person's opinion on movies. The internet is about to forget everything without a profit motive and just start telling you that the latest product from a monolith corp like disney is the only movie worth watching. If someone scrapes all the useful movie opinions off of reddit and spends their time crafting it into a usable format, that guy's probably got a company. But not Bill. Bill's just a guy you can know or not know. You can't monetize knowing Bill. Sidenote that's probably why it irked me so bad when some bozo coined the phrase "social capital".

abdullahkhalids1y ago

The API changes and these robots.txt changes were part of the same strategy - preventing third parties from scraping their data and reducing the AI generated content that makes it into their data. So they can sell that data and make money.

AlexandrB1y ago

> their data

Love how it's their data when it might make them money but not their data if they get sued.

kjkjadksj1y ago

Their dataset is already polluted with misinformation campaigns and shilling

Hikikomori1y ago

The only things it does for me is forcing me to use Google as a large amount of the answers I need is on reddit.

brewdad1y ago

So then this gambit worked. It sucks and I hate it. I will continue to use DDG/Bing first but it looks like I'll be hitting up Google more often too.

WarOnPrivacy1y ago

> The only things it does for me is forcing me to use Google

Startpage, Kagi and Lukol are 3 that source from Google. I imagine there are others.

immibis1y ago

That's what Google is paying them for :)

ein0p1y ago

Good for other search engines, I suppose. Reddit is a giant toxic pile of bovine manure.

nerfbatplz1y ago

I propose we change the term enshitification to engoogleification in regards to the internet.

crazygringo1y ago

This is about Reddit disallowing other search engines.

Blame Reddit, not Google.

dvngnt_1y ago

plenty of blame to go around

frizlab1y ago

I back up this proposal.

venkat2231y ago

Google is selfish

mediumsmart1y ago

that is awesome but I can't open old.reddit.com in my browser so it's a non-issue.

dakial11y ago

And now let's watch white, grey and black hat SEO destroy reddit even more.

dbg314151y ago

Every time I think, “How scummy…” Reddit always finds another way to go lower.

venkat2231y ago

google is selfish

Khelavaster1y ago

robots.txt isn't legally binding. Can Reddit really force Bing not to crawl it..?

bitpush1y ago

When Microsoft strikes an exclusive deal with OpenAI to use their models, it is a smart, brilliant, clever move.

When Apple strikes an exclusive deal with suppliers for parts, it is sound business practice.

When Google strikes an exclusive deal with Reddit, it is ..

Some of you have no idea how businesses work, and it shows.

riku_iki1y ago

> When Google strikes an exclusive deal with Reddit, it is ..

It's because reddit is selling content created by users, based on promises that reddit supports open internet, open data, etc., without their consent and without sharing revenue, which may be legal but is likely not ethical.

bitpush1y ago

Let's get specific. You're confusing copyright with licensing.

The users hold the copyright (reddit claim that they made the meme) but reddit has the non-exclusive right to redistribute and license the content.

Two different things.

tbeseda1y ago

https://archive.li/GS2I0

popcalc1y ago

  # Welcome to Reddit's robots.txt
  # Reddit believes in an open internet, but not the misuse of public content.
  # See https://support.reddithelp.com/hc/en-us/articles/26410290525844-Public-Content-Policy Reddit's Public Content Policy for access and use restrictions to Reddit content.
  # See https://www.reddit.com/r/reddit4researchers/ for details on how Reddit continues to support research and non-commercial use.
  # policy: https://support.reddithelp.com/hc/en-us/articles/26410290525844-Public-Content-Policy

  User-agent: *
  Disallow: /

Source: https://www.reddit.com/robots.txt
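
As a quick check of what those two directives mean for a crawler that honors the file, here is a small Python sketch using the standard library's `urllib.robotparser`. It parses only the rules quoted above (the file served to generic clients); the `is_allowed` helper and the sample URL are illustrative assumptions, and the bot names are just example user-agent strings.

```python
from urllib.robotparser import RobotFileParser

# The two directives quoted above, without the comment lines.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the quoted rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    sample_url = "https://www.reddit.com/r/programming/"  # illustrative URL
    for agent in ("Googlebot", "bingbot", "MojeekBot"):
        print(agent, is_allowed(agent, sample_url))
```

Under these rules every user agent is refused for every path. The file is advisory only: it tells a well-behaved crawler what it may fetch but does nothing to stop one that ignores it.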

sunaookami1y ago

They serve a different robots.txt to Google: https://merj.com/blog/investigating-reddits-robots-txt-cloak...

You can see it here: https://search.google.com/test/rich-results/result?id=_mYogl... (click on "View Tested Page")

dogleash1y ago

> # Reddit believes in an open internet, but not the misuse of public content.

Calling it "public" content in the very act of exercising their ownership over it. The balls on whoever wrote that.

shit_game1y ago

pas1y ago

it's even worse. it's not theirs (it's the users'), they are merely hosting it and using it (ToS gives them a fancy irrevocable license I guess).

Khelavaster1y ago

The Fake News police should shut down this sort of messaging

will01y ago

Looks like it changed a month ago:

https://old.reddit.com/r/redditdev/comments/1doc3pt/updating...

immibis1y ago

Nobody who wants to be successful obeys robots.txt. And I do mean nobody.

chippiewill1y ago

They changed it to disallow so that scrapers can't just claim the robots.txt gave them permission.

latexr1y ago

maxnevermind1y ago

Hasn't the NYT tried to sue OpenAI for ignoring robots.txt, or do you mean it's impossible to prove and/or still more profitable to just ignore robots.txt?

JohnFen1y ago

Sadly true. That's why I gave up on robots.txt years ago and started blocking crawlers outright in .htaccess

Of course, that became unsustainable so now I have everything behind a login wall.

Zuiii1y ago

> We believe in something that we will now proceed to violate.

I will never take a statement given by a company that blatantly lies like this at face value going forward. What a bunch of clowns.

raverbashing1y ago

With the amount of crap in Reddit, cleaning it must be a very non-trivial problem. (I mean, it never is, but in the case of Reddit it's probably extra complicated)

arnaudsm1y ago

I understand the AI context, but this is dangerously anticompetitive for other search engines.

This is a dangerous precedent for the internet. Business conglomerates have been controlling most of the web, but refusing basic interoperability is even worse.

zooq_ai1y ago

There is nothing preventing search companies paying the same $60 Million to license content.

If reddit had an exclusive agreement, it would be anti-competitive.

This is classic HN anti-Google tirade (and downvoting facts, logic and concepts of free market)

not_wyoming1y ago

> There is nothing preventing search companies paying the same $60 Million to license content.

Yes, actually, there is - having $60m to throw around.

[0] - https://en.wikipedia.org/wiki/Barriers_to_entry

not_wyoming1y ago

Juiciest update I’ve ever gotten to share: https://www.nytimes.com/2024/08/05/technology/google-antitru...

pluc1y ago

Paying 60 million to every site you want to index is also a bad precedent to set. Why can Reddit get paid and XYZ can't?

onlyrealcuzzo1y ago

This is an interesting development.

How many other sites might have leverage to charge to be indexed?

I don't want to live in a world where you have to use X search engine to get answers from Y site - but this seems like the beginning of that world.

Realistically, there are only 2 search engines now.

This seems very bad for Kagi - but possibly could lead to the old, cool, hobbyist & un-monetized web being reinvented?

ColinHayhurst1y ago

Kagi uses at least Google and Mojeek

edit:

> Realistically, there are only 2 search engines now.

https://seirdy.one/posts/2021/03/10/search-engines-with-own-...

WarOnPrivacy1y ago

> Realistically, there are only 2 search engines now.

From the article:

     Many alternatives to GBY [Google, Bing, and Yandex] exist, but almost none of them have their own results;

This seems to assert that ~0 other search providers do any crawling at all. Ever. Are we sure that's the case?

   (they could crawl but never ever return those results == more odd).

Yawrehto1y ago

Doesn't it list three major ones, Google, Bing, and Yandex, plus Mojeek and a few other small ones? That's a bit more than two.

McDyver1y ago

That seems like the business model for streaming. You subscribe to X provider to watch Y series. So, as for streaming, I suppose a pirate bay search engine will come up

toomuchtodo1y ago

Scraper engine->validation/processing/cleanup->object storage->index + torrent serving is a rough pipeline sketch.

[1] https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu... ("HN Search: annas archive")

[2] https://academictorrents.com/details/9c263fc85366c1ef8f5bb9d... ("AcademicTorrents: Reddit comments/submissions 2005-06 to 2023-12 [2.52TB]")

gtirloni1y ago

> but this seems like the beginning of that world.

It's not the beginning, it's mere continuation.

Walled gardens have existed since the AOL days. They deteriorate over time but it doesn't prevent companies from trying (each time, in bigger attempts).

aAaaArrRgH1y ago

> but possibly could lead to the old, cool, hobbyist & un-monetized web being reinvented?

It still exists. It just isn't that popular.

splwjs1y ago

idk man i bet you five bucks and a handshake it's just going to play out like the existing startup grift.

The upstart gets indexed and has an API, the established player doesn't.

StrauXX1y ago

qingcharles1y ago

At the federal level; but states have their own laws. For instance, it can get you 5 years in prison in Illinois to violate a web site ToS.

https://www.ilga.gov/legislation/ilcs/ilcs4.asp?DocName=0720...

redcobra7621y ago

Has anyone ever successfully been prosecuted for violating this statute?

jpalomaki1y ago

Quite sure they are also enforcing these with some technical measures to limit scraping.

renlo1y ago

As was LinkedIn, who was forced to stop rate limiting / IP-banning scrapers for public pages.

wtf2421y ago

sct2021y ago

There's a new setting in Cloudflare to block AI/scraper bots. https://blog.cloudflare.com/declaring-your-aindependence-blo...

graeme1y ago

Anyone have any experience with this? Is there nothing but upside in blocking these bots?

jedberg1y ago

But I can understand why they made the change they did. The data was being abused.

Closi1y ago

> But I can understand why they made the change they did. The data was being abused.

Depends how you see it - if you see it as 'their' data (legally true) or if you see it as user content (how their users would likely see it).

If you see it as 'user content', they are actually selling the data to be abused by one company, rather than stopping it being abused at all.

Ajedi321y ago

> if you see it as 'their' data (legally true)

passwordoops1y ago

Enough cash or enough data on hand to show the majority of traffic comes from the search monopoly

ColinHayhurst1y ago

toomuchtodo1y ago

[1] https://www.bloomberg.com/news/articles/2024-05-09/reddit-la... | https://archive.today/wQuKM

[2] https://www.sec.gov/edgar/browse/?CIK=1713445

JohnMakin1y ago

BeetleB1y ago

I know people will hate to hear this, but Reddit is not important to the well-being of the Internet.

jedberg1y ago

I agree with you in theory, but in practice someone has to pay for all this magic.

ColinHayhurst1y ago

The blocks for MojeekBot, a Cloudflare-verified and respectful bot for 20 years, started before the robots.txt file changes. We first noticed in early June.

We thought it was an oversight too at first. It usually is. Large publishers have blocked us when they have not considered the details, but then reinstated us when we got in touch and explained.

ekidd1y ago

eddd-ddde1y ago

if i create a vacuum cleaner and decide to only sell it at Walmart you can't get mad at me for not wanting to sell it at costco

you can always buy a competitor's or make your own vacuum cleaner if you hate buying at Walmart

maybe what you are really mad about is Reddit monopolising content

fredgrott1y ago

the article quotes reddit policy change: Reddit considers search and ads commercial activities and thus subject to robots.txt block and exclusion.

EasyMark1y ago

account421y ago

ykonstant1y ago

It's ironic, because Reddit is the only search engine that works on Google now thanks to shittening.

maxwell1y ago

They're both running on fumes at this point.

riiii1y ago

Also sniffing them.

QVVRP4nYz1y ago

For years reddit's built-in search was broken (or at least broken) and people were forced to use 3rd parties like google, so we came full circle.

daft_pink1y ago

I don’t understand how this isn’t anti-competitive behavior. It seems like reddit has to offer this deal with similar terms to google’s competitors.

talldayo1y ago

They do offer that deal to others; a big news story was when OpenAI bought Reddit's data they were selling: https://openai.com/index/openai-and-reddit-partnership/

dathinab1y ago

yep, but for things which are "only" search engines it's not a viable offer. Only if you expect "big AI business value" from it does it make sense, maybe.

eddd-ddde1y ago

I don't see how this tracks at all. Companies can decide to only sell their products with some retailer if they want. You can't force them to make deals with other companies.

gtirloni1y ago

You certainly can in monopoly situations (which apparently isn't the case here).

Suppafly1y ago

Most business deals are anti-competitive in some way. What makes you think this specifically rises to the level where they'd legally have to offer similar terms to competitors?

daft_pink1y ago

I’m not sure. Maybe the angle is that Google is anti-competitive by signing an agreement that limits information to its rivals.

Being forced into using google services, because they are paying information companies to deal only with them seems like a disaster for the web.

carlosjobim1y ago

Why in the world would they have to do that? There are thousands of exclusive business-to-business deals being signed into action every second of the day.

lmeyerov1y ago

I'm not sure what to make of that.

morkalork1y ago

heisenbit1y ago

Certainly rude but also possibly legally problematic. If they were judged to be in a dominant position in a market and were found making deals with exclusivity then it can get expensive.

It all depends of course what the market is. If one looks at reddit not as a whole but as a collection of niches then one could imho find niches where reddit has a dominant knowledge position.

lmeyerov1y ago

We do 4-6 figures/yr for providers which is normal in our world

dathinab1y ago

john-radio1y ago

> Worse it doesn't even really "work" anymore, given how most searches are flooded with garbage SEO results and paid advertisements "basically" looking like search results ...

I don't understand what you're saying. That's exactly why people append `site:reddit.com` to their searches in the first place, because those search results typically aren't like that.

wwweston1y ago

Or at least, reddit posts and comments that are content messaging / marketing (human or AI) fit in better with earnest and natural posts, so that they're more effective.

numbers1y ago

1vuio0pswjnm71y ago

The veracity of this statement is questionable.

I found at least four web search engines not using Google's index that produced results from the last week.

Example: Recent eruption at Yellowstone Black Diamond Pool

https://www.ecosia.org/search?method=index&q=site:reddit.com...

https://search.brave.com/search?q=reddit.com+black+diamond+p...

https://api.yep.com/fs/2/search?client=web&gl=all&no_correct...

   POST /sp/search HTTP/1.0
   host: www.startpage.com
   content-length: 74
   content-type: application/x-www-form-urlencoded

   query=site:reddit.com black diamond pool&abp=-1&t=&lui=english&sc=&cat=web
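For anyone who wants to reproduce the Startpage query from a script, here is a rough sketch of the same request built with Python's standard library. The field names and values are copied from the request above; the `https://` scheme, the `build_search_request` helper name, and whether Startpage still accepts this form are all assumptions. The request is only constructed, not sent.

```python
from urllib.parse import urlencode, parse_qsl
from urllib.request import Request

# Field names and values copied from the quoted request; whether Startpage
# still accepts them is an assumption.
FORM_FIELDS = {
    "query": "site:reddit.com black diamond pool",
    "abp": "-1",
    "t": "",
    "lui": "english",
    "sc": "",
    "cat": "web",
}


def build_search_request(fields: dict[str, str]) -> Request:
    """Build (but do not send) a form-encoded POST for the search endpoint."""
    # urlencode percent-escapes the colon and turns spaces into '+'.
    body = urlencode(fields).encode("ascii")
    return Request(
        "https://www.startpage.com/sp/search",  # scheme is an assumption
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


request = build_search_request(FORM_FIELDS)
```

Because the body is properly escaped here, its length will differ slightly from the 74 bytes in the hand-written request.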

At least for this example, I got the same desired result using Reddit site search.

https://old.reddit.com/search/?q=black+diamond+pool

If anyone has some good examples of search queries that I can test showing why a search engine must be used, please share.

1vuio0pswjnm71y ago

"Its search engine uses Microsoft's Bing's technology, with whom it has a long-term arrangement."

https://www.bbc.com/news/business-53922786

lopis1y ago

Ecosia does use Google or Bing, which you can select in the settings.

niutech1y ago

But Brave Search has their own independent index.

r_singh1y ago

I wonder how Aaron Swartz would react to this

geodel1y ago

My guess is he'd freak out once he heard that lawyers and law enforcement may get involved on this issue.

voisin1y ago

Makes sense that Google did this deal since their search quality tanked and they became a de facto front end UI for Reddit.

NoMoreNicksLeft1y ago

Google hasn't been a search engine in a long while, it's just an advertisement engine now.

dev1ycan1y ago

it's so bad it's crazy, you can legit not find stuff on the internet anymore, it's the same with youtube, I search something and get like 20 or so results and then everything else is hidden.

niutech1y ago

Have you tried Verbatim search in Google?

LegitShady1y ago

mutatio1y ago

peddling-brink1y ago

Google should abandon its mission to “organize the world’s information” because doing so requires spending money for valuable data, and others might not want to spend that money?

roughly1y ago

It’s not like everyone wasn’t already pulling the same grift, but quantity really does have a quality all its own.

imglorp1y ago

Of course, we have to be careful not to villainize a neutral tech. Instead let's call it what it is: unchecked capitalism and monopolistic behaviors.

LLM did not enshittify what's left of the internet, greed did it.

latexr1y ago

synicalx1y ago

> Of course, we have to be careful not to villainize a neutral tech

lifestyleguru1y ago

I deeply regret every minute spent on and kilobyte of text contributed to reddit.

Ylpertnodi1y ago

I don't. There's nothing around that is similar...with the same traction. The various 'verses are variations on cat pics. I'm still looking, though.

wccrawford1y ago

It's still not Reddit, but I've been enjoying Lemmy. I have a similar range of communities on each, and other than some annoying groupthink, the content is often similar.

And to me, forgetting to log in to each of them feels similar, too. For what that's worth. (I hate both of them when not logged in.)

card_zero1y ago

trallnag1y ago

I can confidently state that I'm a net negative for Reddit, looking at the dozens of banned accounts in the trash bin of my KeePass vault

neilv1y ago

I'm concerned multiple ways by this, but I also could see some positive fallout from this, if it sets precedents that help protect 'content' owners from AI goldrush companies just taking everything.

gtirloni1y ago

PaulRobinson1y ago

This is great. It means I won't see Reddit content popping up all over search results in other engines. Can Medium do the same? And perhaps Quora?

jonpurdy1y ago

FYI, Kagi lets you do this and personalize it as you desire. They even share aggregated stats※ about which domains users choose to block/lower. (Mine generally match these stats.)

※ - https://kagi.com/stats?stat=leaderboard&k=-2

eliasdaler1y ago

You can also do this for free with uBlackList: https://github.com/iorate/ublacklist

This has greatly enhanced my Google experience - easy to ban content farms, AI-generator websites from appearing in Google Images etc.

WarOnPrivacy1y ago

> Kagi lets you do this and personalize it as you desire.

Kagi shill here. Are they finally applying filters and operands to image searches?

Asking because it was a tough year seeing Pinterest as top filter choice and top result in images (when set as filter=block).

(edit: I just tried searching->image: beautiful quilt patterns. I didn't spot any Pinterest results!)

I have never understood why DDG, etc steadfastly refuse to obey operands in image searches. Most days. Every blue moon operands seem to work. I think.

sidebar: Yesterday I saw Yandex obey quotes in a web search. It was the 1st time I've seen that.

Suppafly1y ago

>This is great. It means I won't see Reddit content popping up all over search results in other engines.

kingnothing1y ago

What use do you get out of a search engine if not searching for reddit and other forums? The rest of the internet has become a cesspool of useless AI generated crap.

kevincox1y ago

To be fair Reddit threads are more and more often getting filled with useless AI generated crap as well.

jjulius1y ago

To be fair, Reddit has plenty of astroturfing, too.

rkangel1y ago

troyvit1y ago

bdjsiqoocwk1y ago

What a weird thing to say. Reddit has for a long time been a place where real people hang out and have real conversations, unlike quora and medium.

VancouverMan1y ago

> where real people hang out and have real conversations

I don't consider the discussions there to be "real" in any meaningful way, thanks to the extensive moderation.

I've found the situation to be particularly bad on the Canadian locale-specific subreddits, for example, but enough of the tech-oriented ones I've seen seem to end up like that, too.

candiddevmike1y ago

MattPalmer10861y ago

Obviously, people do see value in it, or they wouldn't keep saying so! I would happily exclude Reddit links from search results though.

psunavy031y ago

Not exactly always a reliable source of info outside uncontroversial niche topics or places like /r/AskHistorians that actually moderate. And even there I've seen the occasional humdinger.

PaulRobinson1y ago

I've got thousands of karma. Used to love it. No more.

lfkdev1y ago

Yeah awesome, reddit was one of the last useful results beside the spam blogs and ai generated articles.

nullc1y ago

Google really should blacklist reddit entirely for this practice, but sadly as bad as reddit is it's still a much higher quality result than average for google.

account421y ago

jumploops1y ago

IIRC, GPT-2 was primarily trained on Reddit[0]

[0]https://www.reddit.com/r/ChatGPT/comments/133xgb5/gpt2_was_p...

ChrisArchitect1y ago

nomilk1y ago

Suppose a crawler or rival search engine doesn’t respect robots.txt, reddit can’t stop them. Make it a bit trickier, yes, but not stop them.
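To make concrete what "respecting robots.txt" means: it is a check the crawler chooses to run on its own side, nothing more. A minimal sketch using Python's standard `urllib.robotparser`, fed the two rule lines from Reddit's published file quoted at the top of the thread. The `may_crawl` helper name is made up for illustration, and the "MojeekBot" user-agent string and the URL are just example inputs.

```python
from urllib.robotparser import RobotFileParser

# The rule lines from the robots.txt quoted at the top of the thread.
REDDIT_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A polite crawler runs this check before fetching and skips the URL on False.
allowed = may_crawl(
    REDDIT_ROBOTS_TXT, "MojeekBot", "https://www.reddit.com/r/programming/"
)
```

A crawler that never runs the check is not stopped by the file itself; any actual blocking has to happen server-side (IP ranges, user-agent filtering and so on).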

eschneider1y ago

It is evidence that they didn't have permission if you sue them.

kingnothing1y ago

There's no grounds on which to file suit. The 9th circuit court found web scraping is legal.

https://techcrunch.com/2022/04/18/web-scraping-legal-court/

miyuru1y ago

reddit blocked datacenter IPs even before this change.

nomilk1y ago

Could a motivated scraper not buy IPs/proxies that aren’t in those ranges, i.e. to blend in with general users?

r_singh1y ago

debacle1y ago

Reddit has been ripe for disruption for years. It's just waiting on an inflection point and someone to take it behind the barn.

crazygringo1y ago

The network effects are too strong.

Remember, the only reason Reddit "won" was because Digg destroyed itself with a radical upgrade that everyone hated.

CSMastermind1y ago

I don't think this is true.

The main thing I see Reddit being useful for are discussions about entertainment.

There's probably a subreddit for your favorite sports team, Twitch streamer, TV show, book series, video game, politics (which is entertainment for some people).

Reddit has seriously degraded the experience of a lot of these communities with things like restricting custom CSS.

In general you could attract users by offering profit sharing on ads the same way Youtube does for creators.

Have the best moderation tools in the world, a constant painpoint with Reddit. Give admins more flexibility over the appearance of the board, all things Reddit took away.

NoMoreNicksLeft1y ago

>what's the alternative to Reddit? I mean, it's certainly not Quora.

Suppafly1y ago

>Reddit was already an alternative to Digg -- what's the alternative to Reddit?

This site is essentially 'orange reddit', they just need to add sub-HNs or tagging or something and it'd be ready for an influx of reddit refugees. Not that any of us really want it, but it's possible.

tayo421y ago

Reddit is quietly a huge website with a significant number of users. So many people use it but don't talk about it. Google search says 1 billion MAU? Twice as big as Twitter

nope10001y ago

There is Lemmy for example, very similar to old Reddit. The big problem is the missing content outside of mainstream communities.

bdw52041y ago

The strange thing to me is how everybody keeps trying to make distributed Twitter happen when distributed Reddit is the low hanging fruit for federated social media.

ravetcofx1y ago

This exists with Lemmy already and is fostering nice communities (and due to ActivityPub is interoperable with Mastodon accounts)

psunavy031y ago

They had this years ago, and they were called "forums."

Suppafly1y ago

>The strange thing to me is how everybody keeps trying to make distributed Twitter happen when distributed Reddit is the low hanging fruit for federated social media.

Honestly, it's strange to me how hard people are trying to make distributed anything happen. Federation mostly solves a problem that real people don't have or care about.

teabee1y ago

Is this not just what the internet was before reddit? What features would "distributed reddit" have that an internet full of independent community forums would be missing?

ks20481y ago

I like it in principle, but after watching the situation with Twitter clones, I'm not too optimistic on federated services taking off.

I would like to see a wikipedia-style system for Twitter/Reddit: open access data, non-profit.

astrange1y ago

It's not possible because the most common problems with running a forum are spam and moderation, and both of those are too much work unless it's centralized.

jessriedel1y ago

onlyrealcuzzo1y ago

Or for Google to buy it.

They could monetize it much better while being less annoying.

Ultimately - Google is getting everything they want from Reddit with this deal without having to buy it outright.

Short of Reddit transforming to an entirely different product (difficult) - I'm not sure where the major growth opportunity is for it.

rob741y ago

It wouldn't be the first time they have done something like this either. Remember https://en.wikipedia.org/wiki/Google_Groups ?

QVVRP4nYz1y ago

The killing of third party clients didn't have a significant impact. I don't know what they would have to do to lose users, other than some kind of mandatory subscription fee.

escapecharacter1y ago

Man, I just want to be able to search the entire internet for when I’m doing niche research.

Does this mean there will be a future where everyone is running their own crawler? I suppose.

api1y ago

myrandomcomment1y ago

So I went Slashdot, Digg, Reddit. I stopped spending any time on Reddit 5 years ago. Not worth it.

cyanydeez1y ago

Work is such a flimsy word for what google currently does with search

As soon as someone shows me a search engine that restores quality of search, I'm getting a subscription for work.

It really can't be hard to whitelist sources and index appropriately.

Get going nerds, google has fallen.

thih91y ago

Story / rant warning.

I remember seeing an unhelpful hyperlink for the first time. It was a random word in the body of a random tech site that redirected to a list of articles from that site tagged with that term.

I remember being stunned, my expectation was that the link would lead me to another website, one that would be an authoritative source on that term and freely accessible.

20 years later we get a paywalled article about fragmented web – and we’re not slowing down.

1vuio0pswjnm71y ago

Works where archive.li is blocked:

https://cc.bingj.com/cache.aspx?d=5070227914243&w=ljIRk8yx42...

blackeyeblitzar1y ago

causal1y ago

ozgrakkurt1y ago

Stopped using reddit after they hindered login-less viewing and blocked VPNs. Everyone who respects themselves should start moving away from it imho. Same thing with google

manishsharan1y ago

For my use cases, Google is pretty much useless without Reddit.

For example, when I search for product reviews, I always specify reddit. Otherwise the search results are inundated with SEO spam.

tempfile1y ago

Hopefully this paves the way for antitrust action, but I won't hold my breath.

semiquaver1y ago

Sorry, what is the antitrust concern about Reddit blocking crawlers that aren’t paying them? Surely you don’t think Reddit has a monopoly on anything?

Or are you somehow suggesting that it’s google’s fault that Reddit took this step? I don’t see any indication that’s the case.

em-bee1y ago

not that reddit has a monopoly, but that google has.

google is using their power to prevent others from competing.

the problem here is of course that if reddit were in financial trouble (i don't know if they are but let's imagine they need this money), they'd be between a rock and a hard place.

if google should not be allowed to make exclusive deals, and reddit could not survive without the deal, then what would be left? google buys reddit, or the relevant authority approves of the deal?

ideally no company ever depends on a single other company that much, but that only works if we don't allow companies to grow that much in the first place.

tempfile1y ago

lowbloodsugar1y ago

Funny that source of TFA blocked me from reading the whole thing.

earthboundkid1y ago

They literally think the scissor statement is a real thing that will really work, fml.

dvngnt_1y ago

site:reddit.com works for kagi for new posts this week?

rozab1y ago

Basically all 'independent' search engines piggyback off Google or Bing

https://help.kagi.com/kagi/search-details/search-sources.htm...

>Our search results also include anonymized API calls to all major search result providers worldwide

ColinHayhurst1y ago

>Basically all 'independent' search engines piggyback off Google or Bing

Incorrect: https://www.mojeek.com/about/why-mojeek

karaterobot1y ago

From the second paragraph of the article:

> Searching for Reddit still works on Kagi, an independent, paid search engine that buys part of its search index from Google.

dvngnt_1y ago

thanks i only read the first paragraph. then i went to kagi discord and they provided more context

AndroidKitKat1y ago

hugh_kagi1y ago

Not yet but it's something we want to look into.

ColinHayhurst1y ago

Kagi pays to use APIs from Mojeek and Google

Elfener1y ago

I mean, the reddit company did go public, so things like this were inevitable.

account421y ago

mrec1y ago

melodyogonna1y ago

Wait that's actually terrible.

VoidWhisperer1y ago

Wow, reddit found a way to make themselves even less useful somehow. After the API fiasco, that seemed like it'd be pretty hard to do.

wvenable1y ago

But, apparently, they did finally find a way to make money.

jasode1y ago

>But, apparently, they did finally find a way to make money.

The most recent 10-Q financial results for the quarter ending 2024-03-31 (filed 2024-05-08) show they actually lost money: https://www.sec.gov/edgar/browse/?CIK=1713445

(For 2024-Q1, Reddit lost $575 million on revenue of $242 M.)

[1] https://www.google.com/search?q=google+ai+deal+reddit

LunaSea1y ago

Barely enough to pay the CEO

splwjs1y ago

If they kept their API open then by now the entirety of the site would be ai slop that was built with chatgpt and launched with the api.

Then again most of what that site does is just blend and regurgitate the information that's currently on it anyway.

miohtama1y ago

Those AI bots would likely be more intelligent commenters than Redditors

stainablesteel1y ago

splwjs1y ago

When I was young, my brother knew a guy who was really into movies. If you wanted to know about a movie you couldn't remember, you would go talk to that guy.

For a while, the internet had an end-run play that made that guy less useful. You can just go on the internet for obscure movie information, buddy.

abdullahkhalids1y ago

AlexandrB1y ago

> their data

Love how it's their data when it might make them money but not their data if they get sued.

kjkjadksj1y ago

Their dataset is already polluted with misinformation campaigns and shilling

Hikikomori1y ago

The only thing it does for me is force me to use Google, as a large number of the answers I need are on reddit.

brewdad1y ago

So then this gambit worked. It sucks and I hate it. I will continue to use DDG/Bing first but it looks like I'll be hitting up Google more often too.

WarOnPrivacy1y ago

> The only thing it does for me is force me to use Google

Startpage, Kagi and Lukol are 3 that source from Google. I imagine there are others.

immibis1y ago

That's what Google is paying them for :)

ein0p1y ago

Good for other search engines, I suppose. Reddit is a giant toxic pile of bovine manure.

nerfbatplz1y ago

I propose we change the term enshittification to engoogleification in regards to the internet.

crazygringo1y ago

This is about Reddit disallowing other search engines.

Blame Reddit, not Google.

dvngnt_1y ago

plenty of blame to go around

frizlab1y ago

I back up this proposal.

venkat2231y ago

Google is selfish

mediumsmart1y ago

that is awesome but I can't open old.reddit.com in my browser so it's a non-issue.

dakial11y ago

And now let's watch white, grey and black hat SEO destroy reddit even more.

dbg314151y ago

Every time I think, “How scummy…” Reddit always finds another way to go lower.

venkat2231y ago

google is selfish

Khelavaster1y ago

robots.txt isn't legally binding. Can Reddit really force Bing not to crawl it..?

bitpush1y ago

When Microsoft strikes an exclusive deal with OpenAI to use their models, it is a smart, brilliant, clever move.

When Apple strikes an exclusive deal with suppliers for parts, it is sound business practice.

When Google strikes an exclusive deal with Reddit, it is ..

Some of you have no idea how businesses work, and it shows.

riku_iki1y ago

> When Google strikes an exclusive deal with Reddit, it is ..

bitpush1y ago

Let's get specific. You're confusing copyright with licensing.

The users hold the copyright (reddit claim that they made the meme) but reddit has the non-exclusive right to redistribute and license the content.

Two different things.
