Taking action against scraping for hire

(about.fb.com)

220 points | pawelkobojek | 3y ago | 227 comments

Collecting the rhetorical BS:

"scraping attacks"

Scraping is not an attack. Monopolists want to pretend they own your data because they get unlimited access to monetize it whereas competitors should have none.

"self-compromised"

Monopolists want to sell you, so it's imperative they maintain the fiction of "one person, one account". By admitting you own your account, they'd have to allow sharing and they wouldn't be able to provide their customers (advertisers) with reliable data about individuals.

"protect people from scraping"

Monopolists will protect themselves and call it protecting you. They will attempt to make you afraid of some other actor using your data in harmful ways so as to detract from how they monetize you and use your data in harmful ways.

"deter the abuse"

Monopolists don't want to argue about what constitutes abuse. Anything they write in their TOS is entirely for their benefit and only constrained by local law (if that). They will abuse you to the fullest extent they can get away with while arguing that any action to use your rights is "abuse."

"safeguard people against clone sites"

Monopolists want to maintain their monopoly; there is no greater threat than a direct challenge to that monopoly by allowing data to move freely.

More subtle but even more ironic rhetorical points

"for hire" / "paying for access"

Emphasizing that people making money (gasp) for providing this service is bad.

"industry leader in taking legal action" + "across many platforms and national boundaries, also requires a collective effort from platforms, policymakers and civil society"

Monopolists can pay high priced marketers to rebrand them as patriotic hero figures fighting valiantly for the little guy.

pr0zac3y ago

While I agree with your assessment of the BS in the article wrt scraping, and also agree with your assessment that the behaviour is completely about FB protecting itself and its monopoly control (the word control being important), I think it's important to emphasize it's not about FB caring whether other entities have access to the data; it's about FB caring about its public perception with regard to its having that data at all.

Over the last few years or so it feels like, to reference a @dril tweet[1], Facebook has just been 'turning a big dial taht says "data access" on it and constantly looking back at the audience for approval like a contestant on the price is right' with how much it allows 3rd parties to get at its data.

Keep in mind ~5 years ago the big thing at FB was "Open Graph" and "Graph Search" which gave everyone really in-depth access to their data with the idea that Facebook would be the "data platform" on top of which all of these 3rd parties would build apps and interfaces. This of course eventually resulted in the whole Cambridge Analytica thing and now this gigantic swing in the other direction of being overly protective of the data as a kneejerk PR reaction to all the bad press.

FB loved sharing data and provided a direct API for accessing it when the public narrative was about data freedom and 3rd party developer friendliness, and it hates giving any access at all and goes around suing web scrapers now that the public narrative is all about privacy.

Facebook will happily align itself in whatever way results in the least public outcry arguing they shouldn't be allowed to have the data in the first place regardless of if that means giving access or restricting it.

1: https://twitter.com/dril/status/841892608788041732

Mo33y ago

The example you stated is a truly fantastic one. Graph Search was pretty much like a direct API into their front facing network.

nathanaldensr3y ago

Great post that summarizes exactly what I feel about globocorps. The euphemisms and propaganda are disgusting.

noslenwerdna3y ago

The users agreed to share their data with Facebook, not some other company. If they didn't prevent this, they'd be asking for another Cambridge Analytica.

stickfigure3y ago

The users agreed to share their data with everyone that uses Instagram. Because that's how the site works.

kube-system3y ago

There’s an important difference between technically consenting and informed consent.

Given what I know about the bot problem on Instagram, I would imagine many people have been tricked into sharing their private profiles with scraping bots. Many bots are copying real people’s profiles and then spamming their friends with follow requests. It’s highly effective and gives these bots access to private profiles.

Fooling people is fraudulent, period.

greatgib3y ago

The user agreed on Facebook to make his data "public", so he can't complain that a robot scrapes it.

Nothing prevents him from restricting access to his pages and data to "trusted" friends.

kube-system3y ago

The description in the article sounds like it scrapes private profile data.

> Octopus designed the software to scrape data accessible to the user when logged into their accounts

jasfi3y ago

That is a very good point, but surely it was taken into consideration when scraping was declared legal?

stefan_3y ago

All that case says is "scraping is not a violation of the CFAA". But of course the scraped data still exists in legal limbo; maybe you can compute derived information from it, but the moment a scraper reproduces it there is all of copyright law waiting for them.

danuker3y ago

https://techcrunch.com/2022/04/18/web-scraping-legal-court/

utahcon3y ago

The only argument I have here (sadly in favor of FB) is with "safeguard people against clone sites". While I did give my data to FB, I didn't approve that transfer to another site/system. That is the only place I could possibly see some legal foothold.

asdff3y ago

What happens when FB builds a shadow instagram profile of you based on your FB account? That already happens. FB clones their own data for other projects no different than what you might fear happening if this data were cloned to a third party. The cat is out of the bag already but FB wants to pretend they are the only ones with the right to abuse.

kbenson3y ago

It's impossible to control information once it's been created. Both the length of time it has existed and the number of places it can be seen make that spread exponentially more likely.

Whether we make that spread of information legal or not does little to affect whether it happens.

There are two things that might help. First, don't share as much information. Once it's no longer limited to you or your close group of friends, who hopefully won't share it along with your name, it's mostly out of your control. Second, put limits (laws) on what information companies are able to synthesize about you, and how long they can retain it. If there's less information created about you (or it's ephemeral, created and destroyed as needed), and if they need to clean out older data, there's less to be shared or stolen.

kube-system3y ago

“It’s hard to enforce the rule of law” is not a good reason to abandon it entirely. Data privacy laws make data privacy better even without being 100% infallible.

We should be both practicing good data hygiene and using legal tools to combat those who abuse data privacy.

mylons3y ago

they also toss in the chinese affiliation in hopes to bring even more ill will from the reader towards the company. china is probably doing some bad things, but scraping facebook ain’t one of them.

kube-system3y ago

Scraping social media is something that China is very notorious for doing. They are 100% positively scraping all major social networks around the world.

They do this to collect information of foreign policy interest to them, to silence political dissidents abroad, etc.

For example: https://www.washingtonpost.com/national-security/china-harve...

And: https://www.propublica.org/article/even-on-us-campuses-china...

iandanforth3y ago

Good point, I missed that one.

SergeAx3y ago

I don't get the thing about "monopoly".

Let's start with one thing: copyright on databases. Take IMDb: they collect and combine totally open data on movies' cast, crew, soundtracks used and so on. Anyone can go to the cinema, wait until the movie ends, write down data from the credits roll and put it in a database. There's no prohibition on this activity. A cinema may prohibit filming inside, but not using pencil and paper. Or you may buy a DVD released later and do just the same. Or you may even write the movie company an email asking for that data in electronic form, and chances are they will send it to you or point to some promo materials website where it is already published.

But the entire database is a product of work, and that makes it valuable. So the company or organization spent time and money collecting, indexing and cross-linking that data, and has a right to bank on that work. Simply copying that database for commercial purposes _is_ stealing. This is why we have database copyright laws.

Now back to Meta. They created this product and made it attractive enough that people add their data voluntarily. Every single piece of data is quite open (maybe not really so for personal bits like face photos, emails and phone numbers). Meta spent a lot of cash making and keeping the product that attractive, and now banks on that collected data by targeting ads.

Nothing in the world prohibits anyone else from creating a service, making it valuable, attracting people, collecting data (according to data collection laws) and banking on that. But just copying data collected by Meta is stealing, and Meta is within its rights to protect it. The fact that Meta did it first doesn't make it a monopolist. In fact, there are lots of companies doing the same, like Google, Amazon, Apple, eBay etc. So in my opinion it is not a monopoly defending its position, but rather a business defending its assets from theft.

rmbyrro3y ago

Missed this one:

> a US subsidiary of a "Chinese national" "high-tech" enterprise

Replacing it with "a business" would do just fine.

TechBro86153y ago

Indeed. It's the height of hypocrisy for a company to define the borders of its own system and then prosecute those who they consider in violation of them. There is no consideration given to whether the data should have been collected and retained by Facebook in the first place, regardless of whatever arbitrary access policies they defined to fit their own business and data model.

It's not clear what Facebook's position on scraping truly is. Sometimes they downplay it as "normalized and widespread," and other times they castigate it as inexplicably legal and clearly immoral, or even outright "in violation of state and federal law." For example:

- April 2021. Researchers find an exposed database containing the scraped data of 533 million facebook users. Some news reports refer to it as a "breach." Facebook attempts to downplay the issue as the result of third party scraping. Headline in ZDNet: "Internal Facebook email reveals intent to frame data scraping as ‘normalized, broad industry issue’" [0]

- October 2020. Facebook announces lawsuits against companies it claimed created a "malicious extension on Google’s Chrome Web Store designed to scrape Facebook, in violation of Facebook’s Terms and Policies and state and federal law." [1]

So... which is it? Does Facebook believe that scraping is a "broad, normalized industry issue?" Or is it a violation of "state and federal law?" It seems like they measure severity of its impact primarily based on the reactions of political commentators.

And what's the difference between automating a browser and automating an API client? Why did Facebook design an API for accessing the data they collected, if it's illegal to collect? They've even claimed to be the victim of Cambridge Analytica, who purchased a "quiz" application created by a developer who pieced it together using code straight from the "examples" section of Facebook's API documentation.

There is one obvious resolution to this apparent contradiction. If we remove Facebook from the question, then the contradiction resolves itself. All we need to do is stop presuming that Facebook has the right to collect and retain this data in the first place. And as a user, if you publish your data to a website designed for sharing it with other people, then by definition it is no longer private data. Therein lies the central question: what is "semi-private" data, and who controls its boundaries?

[0] https://www.zdnet.com/article/facebook-internal-email-reveal...

[1] https://about.fb.com/news/2020/10/taking-legal-action-agains...

p.s. another thing they never mention is why companies want to scrape lists of facebook users. perhaps it might have something to do with the "lookalike audience" feature, and its more precisely targetable predecessors, which allow advertisers to upload a list of usernames and email addresses for targeted advertising?

fxtentacle3y ago

Of course, Facebook wants to make it sound like scraping is illegal, when it generally isn't.

But account hijacking and mass-creation of accounts just to access private pages are clear violations of the Facebook and Instagram ToS, so they surely can sue for that.

Raed6673y ago

Violation of ToS does not mean a violation of the law.

closewith3y ago

Most lawsuits aren't due to breaches of the law, but breaches of contract. Whether terms of service constitute an enforceable contract is another matter.

adamsmith1433y ago

ToS have been around for decades, surely this question is settled by now?

jhoelzel3y ago

if a bot creates the account, who breaches the contract?

stonemetal123y ago

That is why they are suing rather than pressing charges. When someone steals your car you don't sue them you press charges. When someone doesn't uphold their end of a contract you don't press charges you sue for breach of contract.

compsciphd3y ago

in reality, you as an individual can't press charges. Only the state can. And many times the state chooses not to. You can sue in civil court, but individuals can't bring cases in criminal court.

sneak3y ago

"pressing charges" isn't a thing.

CoastalCoder3y ago

I don't think I know the answer, but I'm curious:

Does violating a website's TOS mean you're accessing it beyond your authority, making it a violation of the US's Computer Fraud and Abuse Act?

tumult3y ago

Not a violation. Decided by Supreme Court in 2021. Van Buren vs. United States. It was a big deal.

zja3y ago

Violating TOS no; Gaining access beyond your authority maybe https://www.eff.org/deeplinks/2010/07/court-violating-terms-...

danaris3y ago

I don't have a source for this, but my recollection is that this has been successfully argued by a couple of companies—but then an appeals court found very firmly that it was not the case.

Essentially, having that be true would mean that any given website could create whole new classes of criminal behavior.

dementiapatien3y ago

Since when do you get sued for breaching TOS?

curiousllama3y ago

Since you start a business on the violation.

"Since when do I get sued for taking too many free samples from Costco?" -> "Since you started taking millions of them to resell"

jhoelzel3y ago

im not sure on american law, but if you give me those samples willingly i can do whatever i want with them.

Actually this is the reason why many products come with the label "not for resale" but i have yet to find somebody who cares about it :D

thallium2053y ago

Since when do you get sued for breaching a contract? When the offense is worth it.

golemotron3y ago

You can get sued for anything that causes harm.

Relevant life lesson: don't do things to people with money that they might perceive as harm.

Corollary: Being sued is as much punishment as losing a suit for most people.

contravariant3y ago

I don't know but it's at least been that way since Aaron Swartz did it I suppose.

HeckFeck3y ago

Data harvesting is moral for me, but not for thee.

mateuszbuda3y ago

In general I agree that harvesting public data is moral. I think that in these particular cases it's: 1) extracting data from profiles that opted for not being public (only available to logged in users) and 2) reposting scraped data (publicly?) as belonging to the guy who scraped it, without the users' consent.

kordlessagain3y ago

Facebook has hidden much of Instagram's content behind logins, so that makes most of it "not public".

At the same time, I don't think all of Instagram's users care if their images are hidden, or not.

It's quite unfortunate Facebook/Meta is using hostile language and the word "scraping" together in this case. Scraping is a legitimate process used by various business models to gather information from the Web, which itself was originally intended to be an open forum for people to share content.

Hostile business models have corrupted that intent and turned it into a competitive environment that is harming users and legitimate models which may not have the funding larger corporations can muster.

I have a "scraper" I've built that will either snapshot a page from a user's browser or crawl it remotely with Selenium/Firefox, on the user's behalf, to save the content in an index for searching later, by that user. It's not automated, nor does it parse and crawl URLs in the pages saved. It doesn't use page content in a wider context, either.

I've spent a significant amount of time trying to "work around" anti-scraping efforts by various companies and it's frustrating to see hostility instead of cooperation in certain types of use.
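For readers wondering what "save the content in an index for searching later" can look like, here is a minimal sketch of a per-user inverted index. It is not the commenter's actual implementation; the `PageIndex` class and its methods are hypothetical names, and it assumes the page text has already been extracted by whatever snapshots or crawls the page.

```python
import re
from collections import defaultdict


class PageIndex:
    """Toy inverted index: maps each word to the set of saved page URLs containing it."""

    def __init__(self):
        self._postings = defaultdict(set)

    @staticmethod
    def _tokens(text):
        # Lowercase alphanumeric runs; real tokenization would be more careful.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, url, text):
        """Record every token of an already-extracted page under its URL."""
        for token in self._tokens(text):
            self._postings[token].add(url)

    def search(self, query):
        """Return the URLs whose saved text contains every term in the query."""
        terms = self._tokens(query)
        if not terms:
            return set()
        results = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            results &= self._postings.get(term, set())
        return results


index = PageIndex()
index.add("https://example.com/a", "Scraping public pages is legal")
index.add("https://example.com/b", "Scraping bots and follow requests")
matches = index.search("scraping legal")
```

Nothing here follows links or fetches anything on its own, which matches the "not automated" description: pages only enter the index when the user saves them.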

car_analogy3y ago

> Facebook has hidden much of Instagram's content behind logins, so that makes most of it "not public".

1) It was public when the content was posted by its authors. Facebook locked it down retroactively, regardless of the author's intent.

2) A login requirement doesn't make it non-public, if making an account is trivial, and there are already hundreds of millions of accounts. Is the plot of Avengers: Endgame also not public, because it's locked behind a ticket purchase or subscription?

Alex39173y ago

> extracting data from profiles that opted for not being public

The tool lets you download the contact info of your friends, which you should be able to do anyway. In fact Facebook tries to trick its users into thinking they can do this with their data takeout option, but the downloaded files don't actually include any of the contact info for your contacts. Which makes zero sense, considering the entire point of Facebook is that it's a digital rolodex for storing your friends' contact info.

slightwinder3y ago

From the article, it seems to be a service for scraping data you have access to anyway. As long as they only hand that data to the requesting customer, whose login they used, I don't see a difference between the general public and this user's personalized "public". If access is still limited to the people who have the access rights, then I don't see a difference between accessing it through the official interface and accessing it via scraped data.

saddlerustle3y ago

Users make information available on facebook with the expectation that they are able to later control access to it (other than the obvious threat model of screenshotting, etc). This is violating that expectation and thus their privacy.

adolph3y ago

The state of "opted for not being public" and 'available to any system authenticated person' seem contradictory.

I appreciate that 'system authenticated person' is a smaller set than those who can access anything publicly accessible, and that the former is a subset of the latter.

lolinder3y ago

I agree with the moral argument against posting the scraped data publicly, but if someone gave my account access to their data, I don't think they have a moral right to say I can't use a script to do something private with it.

Scripts are tools, and like any tool they're extensions of the self. If it's morally okay to do it by hand, it's morally okay to do it with a script, so long as my script is respectful of server resources.
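As a rough illustration of what "respectful of server resources" might mean in practice, the sketch below enforces a minimum delay between consecutive requests. The `PoliteThrottle` class is a hypothetical helper, not something from the thread, and the clock and sleep functions are injectable only so the behaviour can be checked without real waiting.

```python
import time
from typing import Callable, Optional


class PoliteThrottle:
    """Enforce a minimum interval (in seconds) between consecutive requests."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time the previous request was released

    def wait(self) -> float:
        """Block until the next request is allowed; return the seconds slept."""
        delay = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            delay = max(0.0, self.min_interval - elapsed)
            if delay > 0:
                self._sleep(delay)
        self._last = self._clock()
        return delay
```

A script would call `throttle.wait()` immediately before each page fetch, so a batch of requests is spread out at roughly the pace a person clicking by hand might manage.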

upupandup3y ago

Instagram behind a login screen is public. If, say, you were an OnlyFans model and somebody paid for your videos and then scraped them, there would have been an implicit agreement.

With sharing photos on Instagram, there is no such understanding; news outlets have been logging in to view and publish your Instagram photos.

trasz3y ago

If they are being harvested it makes them public by definition. Unless there was a break-in.

bko3y ago

It's their platform. Do you really want some random companies scraping your facebook and instagram posts?

logifail3y ago

> Do you really want some random companies scraping your facebook and instagram posts?

Thought experiment: if you want to keep control over your data, try something radical: don't hand it to Meta/FB/IG at all

(Full disclosure, I'm neither on FB nor IG)

iandanforth3y ago

Yes. I want a free and open web.

xvector3y ago

Good for you. Normal people do not want posts shared privately amongst friends to become publicly available.

ceejayoz3y ago

I'd rather anyone than "just Facebook".

"Just Facebook" has made the web shittier; entire realms of essentially public, often great content hidden behind a login wall.

trasz3y ago

It’s not “your Facebook”, it’s Facebook’s Facebook. You already made that data public, otherwise it would be impossible to scrape it.

ogurechny3y ago

As others said, there is no “you” in the scheme. It's Facebook's data. When people access that data without paying, they are “bad guys”. When the very same people pay for it, they are “legal partners”. In both cases they can do anything with it, while Facebook can't be held responsible because of all the official agreements. So as long as there is no specifically bad publicity or money loss anything goes either way.

“You” only exist in numerous empty statements about “privacy”, “respect”, etc. If you are feeling artsy, you can make that hyped NFT thing out of those, and see whether those kilobytes of text are really worth anything.

lbriner3y ago

What you are claiming here is not true in Europe. If FB holds data about you, you still have legal rights over that data. You can have it deleted, or changed if it is somehow untrue, and have various other rights too.

There is a relationship involved because ultimately as a FB user, if I don't like what they are doing, I can ask them to remove my data permanently and they must legally do that. If someone has "scraped" that data (if it is considered PID), without my permission or a legal basis to do so, they are in breach of the GDPR and can have enforcement taken against them.

I think some of these "aggregation" businesses will fall foul of this in Europe but I don't know what will realistically happen if that business does not exist in Europe and breaches the GDPR.

vorpalhex3y ago

You published them for the world to see... so yes, presumably.

rustdeveloper3y ago

“This industry makes scraping available to individuals and companies that otherwise would not have the capabilities.” - seems like web scraping companies are doing a good job :)

jhoelzel3y ago

The phone charger makes energy available to individuals and companies that otherwise would not have the capabilities. ;)

theincredulousk3y ago

Maybe some irony here as IIRC Facebook started as essentially a scraping company, pulling student profiles from college websites and re-publishing it for their own profit.

The scrapers have become the scrapees. The horror.

PhilipA3y ago

>Octopus, a US subsidiary of a Chinese national high-tech enterprise, built a cloud-based platform designed to provide paying customers access to on-demand scraping software and services.

It is interesting how they try to position this as a Chinese attack on them.

upupandup3y ago

It must coincide with Christopher Wray's sudden claim that there is an active dragnet of sorts that is trying to subvert America from within, much like the recent election interference involving a former Tiananmen Square activist who tried to run for Congress, I think.

It makes me think that there are many people on CCP's dole, rich powerful famous people are somehow beholden to the CCP in some unknown way but we can all guess correctly that they are all old white men who have previously been seen with young females.

MangoCoffee3y ago

it looks like Zuck is giving up on the Chinese market.

romanovcode3y ago

I guess after Winnie the Pooh refused to name his children for him, he got sour grapes about China.

throwaway_meta3y ago

People that are criticizing this probably were also critical of the Cambridge Analytica scandal, but it would be useful to compare what happened there and here.

With Cambridge Analytica:

- Facebook allowed users (with informed consent) to allow external developers to access their data and limited data about their friends, in order to build social-enabled apps.

- CA exploited this to scrape basic profile data from a large number of users. It broke the ToS by doing so (in particular by using the data for purposes different than stated)

Here the same is happening:

- people are giving a third company access to their profile, which includes access to friends' data (in fact a lot more than what the app platform allowed to do)

- the company is scraping all the data.

At the time of CA, the criticism was that Facebook didn't do enough to enforce its ToS (or maybe that the data sharing should have not been allowed in the first place? But the terms were common knowledge and the attack potential became clear only in hindsight), here people are criticizing that Facebook is in fact enforcing its ToS.

Also note that strong enforcement against scraping is one of the mandates that came from the FTC settlement.

It seems inevitable that any news about Facebook/Meta is read in the worst possible light these days, even when the criticism is self-contradictory. I would expect less superficial commentary from HN.

unosama3y ago

The real reason most people were upset about Cambridge Analytica was that it revealed to the public how advertising and PR companies manipulate us. The fact that they violated Facebook's ToS is more so the excuse for the press covering it when they wanted to write another anti-Trump piece. If you were accusing a specific newspaper of hypocrisy based on two articles I might agree. But you're referring to general public sentiment, and I really don't think most people cared or were surprised about the data collection. The shock and scandal was the realization that targeted advertising campaigns and information bubbles have the potential to sway elections.

throwaway_meta3y ago

I'm referring to the HN crowd, I'm not sure that can be equated to "general public sentiment".

I agree with your first paragraph, and my point is that it is not possible to argue at the same time that Facebook should share data more broadly and allow scraping, and at the same time be critical that Facebook allowed CA to happen in the first place.

If the CA scandal was a wake-up call, it appears it was not internalized enough for people to understand the implications of what they're suggesting in this thread?

carride3y ago

In the early days of FB, they convinced people that pages (or some content, sorry I do not know the FB terms) could be public for anyone to view without needing to log in to FB. This was very helpful for small businesses and communities. In many countries this is still the quickest place to make a public page. Though now, every small business or community page I want to visit is locked out unless I log in to FB. Even if I do log in, it is impossible to copy and paste the important details of a page or post, plus the UI is as ugly as it has always been.

carride3y ago

I am currently in the USA and when I visit a public FB page e.g. [1], there is a small login header, and a very big annoying footer login. I estimate 15% of the content is blocked. I had spent the past year outside USA until one month ago. When I visited the same sites while traveling outside the USA, the annoying login footer moves to the middle of the page blocking almost all content. I do not have proof at the moment, but that was my experience trying to read 95% of government, business, and community pages who are almost all on FB.

  [1] https://www.facebook.com/ParquesNacionalesdeArgentina

htrp3y ago

This is different from LinkedIn v HiQ because HiQ was only scraping publicly available data that was generally accessible to the broader internet. In these two cases, the data is being scraped from FB/Insta using credentials that the client handed over or the mass creation of accounts solely for scraping purposes.

Nextgrid3y ago

> the mass creation of accounts solely for scraping purposes.

Those accounts wouldn't be allowed to view private data though unless they friend/follow the person first, so they'll still be limited to data the account holders intend to be public and available to anyone.

There's also no evidence that the scraped data was aggregated at scale or commingled in any way, so even if customers provided their actual credentials which grant them access to private data of their friends, the scraper didn't share it with anyone else but them.

squaresmile3y ago

Yeah, I think this is more like the Cambridge Analytica situation.

benwad3y ago

Did FB ever take any legal action against Cambridge Analytica? I can't remember anything about it and this sounds very similar to that (although back in those days FB's tools made this incredibly easy).

lesuorac3y ago

No. FB's ToS at the time [1] allowed CA to do what they did.

Namely, CA didn't resell the data or give it to an ad agency.

[1]: https://web.archive.org/web/20180329131546/https://developer...

Nextgrid3y ago

I wish the Cambridge Analytica FUD would stop. CA's "attack" was to set up a malicious website that convinced idiots to give it access to their Facebook account using the standard OAuth2 flow.

Did they misuse the collected data? Sure. But people granted access to that data knowingly. This wasn't really an attack in my view.

Facebook wasn’t really complicit and definitely didn’t sell/give away any data.

postalrat3y ago

What would your position be if the data being scraped is data the site selectively provides to Google for indexing but doesn't provide publicly?

i_have_an_idea3y ago

> After paying for access to the scraping software, customers self-compromised their Facebook and Instagram accounts by providing their authentication information to Octopus

"self-compromised" lol

clearly these people just wanted an automated way to access their own data

antonf3y ago

> clearly these people just wanted an automated way to access their own data

GDPR and CCPA (and probably many other national/state privacy laws) force facebook/instagram/etc to let you download and/or delete your data without using third party websites. Usually people self-compromise their accounts in exchange for money: https://www.buzzfeednews.com/article/craigsilverman/facebook...

pclmulqdq3y ago

They have to keep the walls up on their garden so they can get maximum value from harvesting.

ok1234563y ago

Remember back when facebook grew their little network by scraping your gmail contacts.

Google blocked them.

There was animus between the two companies that resulted in Facebook not making an official android app until 2010.

pid-13y ago

> scrapping attack

mohamez3y ago

That cracked me up when I read it lol

almog3y ago

Ironically, around a year ago I disclosed (using their White Hat bug bounty program) that I'm able to access recruitment data (mostly candidates' details) using a very cheap form of scraping against a 3rd party service provider. They dismissed it and instructed me to report it to the 3rd party that operates that service (which I had done beforehand, but the issue had not been fixed).

Sorry for being vague here, I haven't publicly disclosed it yet, but will probably have to if it doesn't get fixed.

nicholasjarnold3y ago

Funny story from the early days of TheFaceBook, probably around 2005ish:

I was a webmaster of a set of servers on a major university's network. I also had access (enough to run arbitrary programs that had pretty much full ingress/egress to the public internet) to a number of machines across the campus's network. Through some of my coursework and ACM chapter activities I met some other similarly minded technical people with similar levels of access.

We decided that it would be fun to use our superpowers (access + programming abilities + curiosity) to sign up for various accounts on FB and essentially scrape and friend as much as possible. At the time they had some rate limiting, some IP banning (which wasn't terrible because the Uni gave public IPv4 addrs to all machines on campus by default) and then added some early CAPTCHA, which we ended up breaking pretty trivially with some Python and image recognition code.

Never got sued... :) Never really did much with the scripts or data except test that they worked. Fun times.

cosmiccatnap3y ago

I would consider this appropriate if one of the largest scraping offenders weren't the one pretending to be the offended.

paultopia3y ago

"Scraping attacks" LOL

sophacles3y ago

Why not? weev was put in jail over incrementing a number in a url. Surely writing software to put values into urls is even worse.

sneak3y ago

Let's be clear and accurate: technically weev was put in jail for conspiring on IRC with JacksonBrown. JacksonBrown was the one who wrote a PHP script that incremented a value in a URL (and appended a valid Luhn check digit following incrementation).

Conspiracy to access a protected computer system - that is, typing on IRC. weev didn't write any of the code or access the API.
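For anyone unfamiliar with the Luhn check digit mentioned above, here is a minimal sketch of the calculation in Python. This is not the original PHP script, which isn't shown in this thread; the function names are illustrative.

```python
def luhn_check_digit(payload: str) -> int:
    """Return the Luhn check digit for a string of decimal digits."""
    total = 0
    # Walk the payload right to left. The rightmost payload digit is doubled,
    # because the check digit itself will sit in the final (undoubled) position.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    # The check digit brings the overall sum up to a multiple of 10.
    return (10 - total % 10) % 10


def luhn_is_valid(number: str) -> bool:
    """Check that the last digit of `number` is the correct Luhn check digit."""
    return luhn_check_digit(number[:-1]) == int(number[-1])


# Commonly cited textbook example: payload 7992739871 has check digit 3.
digit = luhn_check_digit("7992739871")
```

The checksum only catches typos; it carries no secret, so anyone can compute a valid digit for any payload.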

samsoftstuff3y ago

It's like they don't know that courts just made it legal: https://techcrunch.com/2022/04/18/web-scraping-legal-court/

brushfoot3y ago

From the article: "[T]he Ninth Circuit reaffirmed its original decision and found that scraping data that is publicly accessible on the internet is not a violation of the Computer Fraud and Abuse Act."

The key phrase is "publicly accessible." This wasn't that. The scraping was done by automating Facebook accounts, which have terms of service, which forbid scraping.

ToS/EULAs make a big difference. They're the reason Blizzard could shut down bnetd's StarCraft server. They're why no one can legally reverse engineer Oracle to create a drop-in replacement, despite interoperability provisions.

More and more platforms are putting the majority of your user-generated content behind auth walls with ToS because that's how they prevent competitors from swiping it.

EMIRELADERO3y ago

> ToS/EULAs make a big difference. They're the reason Blizzard could shut down bnetd's StarCraft server. They're why no one can legally reverse engineer Oracle to create a drop-in replacement, despite interoperability provisions.

Strictly referencing EULAs for user-owned copies of software here, not ToS:

That is not true. The Blizzard court clearly erred in not considering unconscionability when analyzing the EULA. As for Oracle, the interoperability provisions are what overrides that part of the EULA.

Nextgrid3y ago

Does it go into detail about the actual meaning of "publicly accessible"? Because most content on Facebook/Instagram requires any valid login (as opposed to a specific account) and is data people intend to be public (especially on Insta).

In this case, the account requirement would be a technicality and the data, for all intents and purposes, would still be considered "publicly accessible" if anyone with an account can access it.

upupandup3y ago

Putting data behind a login screen that any member of the public can get past doesn't make it private information. Private info would be OnlyFans videos. So far there is no such feature on Instagram.

blantonl3y ago

"Legal" doesn't make it ethical, nor does it shield you from liability if you willfully violate contract law (terms of service)

Nextgrid3y ago

So much bad faith in this press release but not surprising from such a disgusting company, with of course some China-related fear-mongering despite no evidence of wrongdoing.

> After paying for access to the scraping software, customers self-compromised their Facebook and Instagram accounts by providing their authentication information to Octopus.

They didn't "self-compromise" their account. They trust Octopus to act on their behalf, and unlike Facebook, Octopus' interests are most likely more aligned with their users' since their service is paid. This is no different from handing your Facebook credentials to your social media manager or secretary. There's no evidence that Octopus misused this access in any way.

> Octopus designed the software to scrape data accessible to the user when logged into their accounts, including data about their Facebook Friends such as email address, phone number, gender and date of birth, as well as Instagram followers and engagement information such as name, user profile URL, location and number of likes and comments per post.

This is either information people intend to be public or information they trust their friends to keep private. Now if Octopus was leaking the private information to third-parties it would be one thing, but so far I see no evidence Octopus was disclosing the scraped information to anyone but their customer (who is already authorized to access it).

> Meta is an industry leader in taking legal action to protect people from scraping and exposing these types of services

Translation: Meta is an industry leader in protecting its disgusting business model that hinges on keeping public data behind a walled garden with an unacceptable "privacy" policy. There wouldn't be a market for Octopus (or other scrapers) if Facebook already allowed customers to efficiently access information they're already entitled to, but that would be against their interests as their entire business hinges on information being held hostage.

They've created a problem, are selling the cure (well in this case monetizing it via ads) and are now pissed off that someone else is selling the cure for cheaper.

Litost3y ago

Anyone else heard of Tim Berners-Lee's idea of hosting your data in pods outside the relevant corps wanting access to it and you controlling what's shared and how? This is such a completely different way of doing it, I'm not sure of all the implications, be that from admin (how much effort) to security (would this be a massive hacking opportunity) etc. https://www.theregister.com/2022/01/20/tim_bernerslee/

allenleee3y ago

Ironically, Octopus reminds me of "Octopus VR" in the Silicon Valley show.

https://www.youtube.com/watch?v=ltFB4WBdDg4

mothsonasloth3y ago

"It's a water animal"

viburnum3y ago

One of Facebook’s earliest acquisitions was a scraping company called Octazen.

dangerlibrary3y ago

Fingers crossed they eventually get around to suing Clearview AI out of existence.

https://www.nytimes.com/2020/01/18/technology/clearview-priv...

oxff3y ago

Pretty rich idea coming from FB, lol. They do human scraping.

trasz3y ago

We need to update the law to make sure Meta loses in cases like this.

jmyeet3y ago

I'm torn on Web scraping because the extremes at each end of the spectrum on this issue both seem unreasonable.

On one side, you have people who say any form of scraping should be disallowed, even prosecutable. This went so far that the Department of Justice on behalf of AT&T prosecuted a case of URL modification [1]. One of the few bright spots for this psychotic Supreme Court was to curtail the government's power under the CFAA by limiting what constituted "unauthorized" access [2].

On the other hand, there are those who think that any level of scraping should be fine and I think that's untenable too. Consider Yahoo indexing of Stack Overflow [3]:

> In the meantime, since Yahoo (via Slurp!) is about 0.3% of our traffic, but insists on rudely consuming a huge chunk of our prime-time bandwidth, they’re getting IP banned and blocked.

Do these "scraping extremists" think such actions should be illegal? It's actually not that far-fetched given the Ninth Circuit decided LinkedIn wrongly blocked HiQ scraping [4]. Like if you change your website with the intent that it'll make scraping more difficult, is that a problem? What if it's an unintended side effect?

Additionally, companies like Meta, Google and Apple are going to be way more accountable for abiding by data retention laws and regulations than any scraper. If it's OK to scrape FB.com completely, that information is out there forever.

I certainly think the government shouldn't prosecute on behalf of companies. At least that should expose to people how the government's #1 priority is in fact to protect the true constituents: corporations and the capital-owning class.

[1]: https://www.techdirt.com/2013/09/30/dojs-insane-argument-aga...

[2]: https://en.wikipedia.org/wiki/Van_Buren_v._United_States

[3]: https://stackoverflow.blog/2009/06/16/the-perfect-web-spider...

[4]: https://blog.ericgoldman.org/archives/2019/09/ninth-circuit-...
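One conventional middle ground between those extremes is for crawlers to honor a site's robots.txt, including the non-standard but widely recognized `Crawl-delay` directive. A minimal sketch using Python's standard `urllib.robotparser`; the robots.txt content and the example.com URLs are made up for illustration and are not Stack Overflow's actual rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: slow one named crawler down and keep it out of
# /private/, while leaving everything open to all other user agents.
ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 10
Disallow: /private/

User-agent: *
Disallow:
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# A well-behaved crawler asks before each request...
allowed = rules.can_fetch("Slurp", "https://example.com/questions/1")
blocked = rules.can_fetch("Slurp", "https://example.com/private/report")

# ...and sleeps for the requested number of seconds between requests.
delay = rules.crawl_delay("Slurp")
```

None of this is enforceable on its own, which is why sites fall back to IP bans when a crawler ignores it.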

ConstantVigil3y ago

> So much about this case is ridiculous, and it’s complicated by the fact that nearly everyone agrees that weev is a world-class jerk. But, you need to separate that out from the details of what he did here, to note that it was nothing particularly special, and it involved the sort of thing that security researchers do all the time, and which all sorts of non-security researchers do quite often.

Yeah... uhm... I used to do exactly this sort of thing...

When I was a teenager, I would look at the URL of whatever site I was on, and would change a number here, or a letter there; and see what I got.

Sometimes you get nothing, sometimes you get something. Sometimes that something is quite interesting.

romanovcode3y ago

> Meta is an industry leader in taking legal action to protect people from scraping and exposing these types of services, which provide scraping as a service across multiple websites.

Sure, as long as Meta is not the one selling the data to Cambridge Analytica it's wrong.

xvector3y ago

HN is hypocritical - most commenters here are against this because "Meta bad," but at the same time, most commenters wouldn't want their posts shared privately amongst friends to be scraped and made available publicly.

oefrha3y ago

> most commenters wouldn't want their posts shared privately amongst friends to be scraped and made available publicly.

Where's the "posts shared privately amongst friends made public" part? There are two cases here:

1. A service that logs in as the customer (who voluntarily provides their credentials) and scrapes information visible to said customer on their behalf. Nothing about "made available publicly" is alleged.

2. An individual using a pool of bot accounts to scrape posts visible to any logged in user. Nothing about "shared privately" is alleged. To be clear I don't like the method, but I'll also have to admit I've used one of the Instagram "clone sites" in the past thanks to their login wall.

Unless I missed something, it sounds like you just made it up.

mpeg3y ago

For that to happen, one of your friends would have had to willingly allow this tool to scrape their social network, which would include your private posts.

Is the scraper to blame here, or the friend?

ogurechny3y ago

Like many other people, you are calling something “private” when it is not.

“Privately shared with friends” used to mean that only you and your friends know something. You don't “share” anything with “friends” on a social network. You give the information to a giant corporation. If it finds it suitable, it then delivers it to other users, but only after it records your location, analyzes the content to check if you were, say, affected by some melodramatic event (and therefore should be tricked into spending more time… I mean, get “personal recommendations” for a certain kind of content), and does a billion other things.

If you consider that this is fine, please relay all your conversations with family and friends through me from now on. I offer secure, reliable, fast, yada yada communication service. And it's hip! Ask anyone on the street what they use.

pawelkobojekOP3y ago

There are two cases they brought up, one being web scraping and the other is making a clone website publicly displaying content from Instagram.

I think Meta might be mixing up these two cases here on purpose to make it look like web scraping is as bad as stealing photos to publish them on a clone website.

postalrat3y ago

Who is scraping their private messages? Themselves or their friends?

Komodai3y ago

lol maybe if you don't want that happening you shouldn't be using Facebook

throwaway59593y ago

Wasn’t Meta stealing news articles and not paying news organizations for them?

NelsonMinar3y ago

Octopus sounds really useful; is there an open source equivalent? I'd love to be able to scrape my own data on Facebook. Their data export feature is fairly good but far from complete.

typon3y ago

Google has turned Google Search into a walled garden by scraping people's content and serving it up on their own platter. Is anyone going to stand up to them?

dmje3y ago

Or Facebook could just open up their data. Oh wait, not their data, silly me. Everyone else's data. Keep on scraping, I say.

rmbyrro3y ago

The fact they're wasting time on that is a sign that Facebook's decay phase has already started.

upupandup3y ago

whoa wasn't there somebody on HN that ran a web scraping shop that was boasting they could scrape Instagram a while back? are these the same guys???

I don't know how far Facebook can get with this, I thought LinkedIn's court ruling made scraping de facto legal

jascii3y ago

So, Facebook doesn't want to share the data it wants us to share with them? Figures...

postalrat3y ago

Hey instagram/facebook/linkedin/etc: It's not your data.

samsoftstuff3y ago

It's like they don't know that courts made it legal: https://techcrunch.com/2022/04/18/web-scraping-legal-court/

neya3y ago

Evil Big Co. that literally STEALS people's personal information everywhere they go even after they've indicated they want to be left alone is now offended when someone does the same to them?

Well, color me surprised /s

Fuck Facebook. Meta. Or whatever you want to call it.

Hedepig3y ago

Is this much different from LinkedIn vs hiQ?

nojito3y ago

Logged in vs not logged in data.

logifail3y ago

> Logged in

Is this actually private data, or is it public stuff that's become annoyingly hard to view anonymously because Meta chose to stick it behind a login box?

cupofpython3y ago

>public stuff that's become annoyingly hard to view anonymously because Meta chose to stick it behind a login box

this one

nojito3y ago

Anything behind a login gate is private data for that registered user only.


throw202207073y ago

From a GDPR point of view this kind of 3rd party data collection is not acceptable (assuming it covers personal information, for example names of people and what they have posted). The difference with Meta's own data collection is that the users have a relationship with Meta and users have given their permission for Meta to handle the data. Users also know they can contact Meta and ask them to remove the data.

3rd parties don't have the consent from users. Users don't even have an idea these companies might be holding their data.

Nextgrid3y ago

From a GDPR point of view the scraper would be acting as a data processor on behalf of their customer, no different from using a cloud storage service for your contacts. It's fine as long as the third-party doesn't misuse the scraped data or share it with third-parties and there's no evidence they did so in this case.

danuker3y ago

> and there's no evidence they did so in this case.

Indeed; the users probably wanted to make the data public, if scraper accounts could see it. There is a GDPR allowance for data "manifestly made public by the data subject".

https://gdpr-info.eu/art-9-gdpr/

Here, it's just Facebook wanting to keep the data inside a walled garden.

For the same reason, I quit LinkedIn and made my own site. I don't want people to have to sign in to see my profile.

uhtred3y ago

Fuck off Facebook you scumbags

Komodai3y ago

Is it Octopus Data Inc. aka Octoparse they are suing?

jacooper3y ago

They are still using the fb.com domain? I thought Meta is not Facebook?....

Silica61493y ago

I think it's like Google vs Alphabet. Alphabet is the parent company like Meta.

As for why their domain is facebook for their news site, not sure why. It would make more sense for it to be under meta instead.


Nextgrid3y ago

I wish the Cambridge Analytica FUD would stop. CA's "attack" was to setup a malicious website that convinced idiots to give it access to their Facebook account using the standard oAuth2 flow.

Did they misuse the collected data? Sure. But people granted access to that data knowingly. This wasn't really an attack in my view.

Facebook wasn’t really complicit and definitely didn’t sell/give away any data.

postalrat3y ago

What would be your position the data being scraped is data the site is selectively providing google for indexing but don't provide publicly.

i_have_an_idea3y ago

> After paying for access to the scraping software, customers self-compromised their Facebook and Instagram accounts by providing their authentication information to Octopus

"self-compromised" lol

clearly these people just wanted an automated way to access their own data

antonf3y ago

> clearly these people just wanted an automated way to access their own data

pclmulqdq3y ago

They have to keep the walls up on their garden so they can get maximum value from harvesting.

ok1234563y ago

Remember back when facebook grew their little network by scraping your gmail contacts.

Google blocked them.

There was animus between the two companies that resulted in Facebook not making an official android app until 2010.

pid-13y ago

> scrapping attack

mohamez3y ago

That cracked me up when I read it lol

almog3y ago

Sorry for being vague here, I haven't publicly disclosed it yet, but will probably have to if it don't get fixed.

nicholasjarnold3y ago

Funny story from the early days of TheFaceBook, probably around 2005ish:

Never got sued... :) Never really did much with the scripts or data except test that they worked. Fun times.

cosmiccatnap3y ago

I would consider this appropriate if one of the largest offenders of scrapping weren't the one pretending to be the offended.

paultopia3y ago

"Scraping attacks" LOL

sophacles3y ago

Why not? weev was put in jail over incrementing a number in a url. Surely writing software to put values into urls is even worse.

sneak3y ago

Conspiracy to access a protected computer system - that is, typing on IRC. weev didn't write any of the code or access the API.

samsoftstuff3y ago

It's like they don't know that courts just made it legal: https://techcrunch.com/2022/04/18/web-scraping-legal-court/

brushfoot3y ago

The key phrase is "publicly accessible." This wasn't that. The scraping was done by automating Facebook accounts, which have terms of service, which forbid scraping.

More and more platforms are putting the majority of your user-generated content behind auth walls with ToS because that's how they prevent competitors from swiping it.

EMIRELADERO3y ago

Strictly referencing EULAs for user-owned copies of software here, not ToS:

Nextgrid3y ago

In this case, the account requirement would be a technicality and the data, for all intents and purposes, would still be considered "publicly accessible" if anyone with an account can access it.

upupandup3y ago

Content behind a login screen that any member of the public can get past isn't private information. Private info would be OnlyFans videos. So far there is no such feature on Instagram.

blantonl3y ago

"Legal" doesn't make it ethical, nor does it shield you from liability if you willfully violate contract law (terms of service)

Nextgrid3y ago

So much bad faith in this press release but not surprising from such a disgusting company, with of course some China-related fear-mongering despite no evidence of wrongdoing.

> After paying for access to the scraping software, customers self-compromised their Facebook and Instagram accounts by providing their authentication information to Octopus.

> Meta is an industry leader in taking legal action to protect people from scraping and exposing these types of services

They've created a problem, are selling the cure (well in this case monetizing it via ads) and are now pissed off that someone else is selling the cure for cheaper.

Litost3y ago

allenleee3y ago

Ironically, Octopus reminds me of "Octopus VR" in the Silicon Valley show.

https://www.youtube.com/watch?v=ltFB4WBdDg4

mothsonasloth3y ago

"It's a water animal"

viburnum3y ago

One of Facebook’s earliest acquisitions was a scraping company called Octazen.

dangerlibrary3y ago

Fingers crossed they eventually get around to suing Clearview AI out of existence.

https://www.nytimes.com/2020/01/18/technology/clearview-priv...

oxff3y ago

Pretty rich idea coming from FB, lol. They do human scraping.

trasz3y ago

We need to update the law to make sure Meta loses in cases like this.

jmyeet3y ago

I'm torn on web scraping because the extremes at both ends of the spectrum on this issue seem unreasonable.

On the other hand, there are those who think that any level of scraping should be fine and I think that's untenable too. Consider Yahoo indexing of Stack Overflow [3]:

> In the meantime, since Yahoo (via Slurp!) is about 0.3% of our traffic, but insists on rudely consuming a huge chunk of our prime-time bandwidth, they’re getting IP banned and blocked.

[1]: https://www.techdirt.com/2013/09/30/dojs-insane-argument-aga...

[2]: https://en.wikipedia.org/wiki/Van_Buren_v._United_States

[3]: https://stackoverflow.blog/2009/06/16/the-perfect-web-spider...

[4]: https://blog.ericgoldman.org/archives/2019/09/ninth-circuit-...

ConstantVigil3y ago

Yeah... uhm... I used to do exactly this sort of thing...

When I was a teenager, I would look at the URL of whatever site I was on, and would change a number here, or a letter there; and see what I got.

Sometimes you get nothing, sometimes you get something. Sometimes that something is quite interesting.

romanovcode3y ago

> Meta is an industry leader in taking legal action to protect people from scraping and exposing these types of services, which provide scraping as a service across multiple websites.

Sure, as long as Meta is not the one selling the data to Cambridge Analytica it's wrong.

xvector3y ago

oefrha3y ago

> most commenters wouldn't want their posts shared privately amongst friends to be scraped and made available publicly.

Where's the "posts shared privately amongst friends made public" part? There are two cases here:

Unless I missed something, it sounds like you just made it up.

mpeg3y ago

For that to happen, one of your friends would have had to willingly allow this tool to scrape their social network, which would include your private posts.

Is the scraper to blame here, or the friend?

ogurechny3y ago

Like many other people, you are calling something “private” when it is not.

pawelkobojekOP3y ago

There are two cases they brought up: one is web scraping, and the other is a clone website publicly displaying content from Instagram.

I think Meta might be mixing up these two cases here on purpose to make it look like web scraping is as bad as stealing photos to publish them on a clone website.

postalrat3y ago

Who is scraping their private messages? Themselves or their friends?

Komodai3y ago

lol maybe if you don't want that happening you shouldn't be using Facebook

throwaway59593y ago

Wasn’t Meta stealing news articles and not paying news organizations for them?

NelsonMinar3y ago

Octopus sounds really useful; is there an open source equivalent? I'd love to be able to scrape my own data on Facebook. Their data export feature is fairly good but far from complete.

typon3y ago

Google has turned Google Search into a walled garden by scraping people's content and serving it up on their own platter. Is anyone going to stand up to them?

dmje3y ago

Or Facebook could just open up their data. Oh wait, not their data, silly me. Everyone else's data. Keep on scraping, I say.

rmbyrro3y ago

The fact they're wasting time on that is a sign that Facebook's decay phase has already started.

upupandup3y ago

Whoa, wasn't there somebody on HN who ran a web scraping shop and was boasting a while back that they could scrape Instagram? Are these the same guys?

I don't know how far Facebook can get with this. I thought the LinkedIn court ruling made scraping de facto legal.

jascii3y ago

So, Facebook doesn't want to share the data it wants us to share with them? Figures...

postalrat3y ago

Hey instagram/facebook/linkedin/etc: It's not your data.

samsoftstuff3y ago

It's like they don't know that courts made it legal: https://techcrunch.com/2022/04/18/web-scraping-legal-court/

neya3y ago

Evil Big Co. that literally STEALS people's personal information everywhere they go even after they've indicated they want to be left alone is now offended when someone does the same to them?

Well, color me surprised /s

Fuck Facebook. Meta. Or whatever you want to call it.

Hedepig3y ago

Is this much different from LinkedIn vs hiQ?

nojito3y ago

Logged in vs not logged in data.

logifail3y ago

> Logged in

Is this actually private data, or is it public stuff that's become annoyingly hard to view anonymously because Meta chose to stick it behind a login box?

cupofpython3y ago

>public stuff that's become annoyingly hard to view anonymously because Meta chose to stick it behind a login box

this one

nojito3y ago

Anything behind a login gate is private data for that registered user only.

throw202207073y ago

Third parties don't have consent from users. Users don't even know that these companies might be holding their data.

Nextgrid3y ago

danuker3y ago

> and there's no evidence they did so in this case.

Indeed; the users probably wanted to make the data public, if scraper accounts could see it. There is a GDPR allowance for data "manifestly made public by the data subject".

https://gdpr-info.eu/art-9-gdpr/

Here, it's just Facebook wanting to keep the data inside a walled garden.

For the same reason, I quit LinkedIn and made my own site. I don't want people to have to sign in to see my profile.

uhtred3y ago

Fuck off Facebook you scumbags

Komodai3y ago

Is it Octopus Data Inc. aka Octoparse they are suing?

jacooper3y ago

They are still using the fb.com domain? I thought Meta is not Facebook?

Silica61493y ago

I think it's like Google vs Alphabet. Alphabet is the parent company like Meta.

As for why their news site is on a Facebook domain, I'm not sure. It would make more sense for it to be under Meta instead.
