LinkedIn loses appeal over access to user profiles

(reuters.com)

574 points by isalmon · 6y ago · 163 comments


The summary here is that LinkedIn tried to argue that it could prevent scraping of public LinkedIn profile data under their ToS, but the courts have ruled that if data is public and provided by users, it can be scraped/crawled, that is, it isn’t LinkedIn property. This is generally a positive outcome for people/companies turning web text and HTML into structured data, e.g. tools like Puppeteer and Scrapy can be used more freely on sites like LinkedIn, Twitter, and Reddit. Now, you might still get into trouble if you re-publish that data, but you can, at least, safely use the data “internally”, and the act of scraping/crawling (politely) is not, per se, something unlawful.
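What "politely" means in practice is a matter of convention. Common elements are obeying robots.txt, identifying yourself, and keeping the request rate low. Below is a minimal sketch of that as a Scrapy `settings.py`, using Scrapy's built-in settings. The numeric values and the bot name/contact URL are illustrative assumptions; neither the ruling nor any site's ToS specifies them.

```python
# settings.py (sketch): conservative crawl settings for a Scrapy project.
# The numeric values and the USER_AGENT string are illustrative assumptions.

# Fetch and honour each site's robots.txt before crawling it.
ROBOTSTXT_OBEY = True

# Identify the crawler and give site operators a way to reach you
# (hypothetical bot name and URL).
USER_AGENT = "example-research-bot/0.1 (+https://example.com/bot)"

# At most one in-flight request per domain, spaced out by a base delay in seconds.
CONCURRENT_REQUESTS_PER_DOMAIN = 1
DOWNLOAD_DELAY = 2.0

# Let AutoThrottle lengthen the delay when the server's responses slow down.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 2.0
AUTOTHROTTLE_MAX_DELAY = 30.0
```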

eagsalazar2 · 6y ago

Not sure "isn't LinkedIn property" is accurate here. They still retain ownership and control of redistribution just like any other IP. This is more of a philosophical question about whether "viewing" itself is a violation of their ownership rights and really about the definitions of "viewing" and "public" in the context of the internet.

Seems like they've simply determined that viewing any freely accessible URL is "public" and that "viewing" does include scraping. This seems like a very reasonable determination as it maps pretty neatly to how we think about viewing public content IRL where I am free to drive down the road (for profit or pleasure) and record publicly viewable signage and activities and use that data any way I see fit.

nordsieck · 6y ago

> Not sure "isn't LinkedIn property" is accurate here.

It is very accurate. Users retain the copyright on their works in so far as their works are able to be copyrighted. Anything that is a "mere fact", and can't be copyrighted, is also not LinkedIn's property.

From LinkedIn's terms of service[1]:

> you are only granting LinkedIn and our affiliates the following non-exclusive license:

> A worldwide, transferable and sublicensable right to use, copy, modify, distribute, publish, and process, information and content that you provide through our Services and the services of others, without any further consent, notice and/or compensation to you or others.

___

1. https://www.linkedin.com/legal/user-agreement#rights

jjeaff · 6y ago

If I enter my employment record and my profile pic, birthdate, etc, I don't think that is LinkedIn's IP. Maybe the way they display it, or the way they transform it, could be considered IP. But if someone scrapes all that user-entered data and then displays it somewhere else in a different format, I can't imagine LinkedIn being able to claim their IP has been infringed.

close04 · 6y ago

I think all of this should be the user's choice, since every company should put the user at the center of these decisions. If I want my data to be shared in any way, I can simply tick a box and allow that. If I don't, then keep it just for me and the people I chose to share it with on that platform.

It should also be made clear to users if that data is being used as payment for the services provided, by stating explicitly and in detail where that data goes.


rebuilder · 6y ago

Maybe it's more accurate to say "any publicly linked URL"? IIRC, charges have been successfully brought against people for e.g. iterating through user identifiers in URLs to gain access to other users' data. (Do correct me if I'm wrong on that count!)

fragmede · 6y ago

Andrew Auernheimer, more commonly known as weev, obtained the email addresses of AT&T's iPad users at the time by enumerating possible SIM card IDs against a public-facing AT&T website. He was charged and convicted under the Computer Fraud and Abuse Act (CFAA) and sentenced to 41 months in federal prison. His conviction was vacated after 13 months on a technicality of venue; the appeals court did not address the substantive question of whether the site access was legal.

Weev may be an odious person, but everyone has rights in a court of law, even white supremacists.


komali2 · 6y ago

Some kid was charged for that, but in my opinion it was stupid. To me, the URL is part of the UX. If you search on Google using a query parameter directly instead of entering the query in their search box, should that count as wrongful use?


giancarlostoro · 6y ago

I think that's fine, but I also think the end-user should decide. With Google (edit: I meant Facebook) I'm able to determine whether or not I want to show up in search results. This shouldn't be an absolute is or isn't public situation.

gravitas · 6y ago

LinkedIn already allows granular control over your profile's public visibility, along with the ability to micro-manage parts of it. The URL you're looking for: https://www.linkedin.com/public-profile/settings

tus88 · 6y ago

You can decide not to use LinkedIn, and use a service that does not make profiles public.

ludamad · 6y ago

"If your apple looks a little banged up, eat an orange"

Gene_Parmesan · 6y ago

Even better, the decision here is only concerning profiles of people who have elected to make that profile public. It's very simple to make your LinkedIn profile private.


lucb1e · 6y ago

This is about the copyright on the items that people post, i.e. creative works, right? But what if LinkedIn collects facts (where you work, your age, etc.), wouldn't that be covered by sui generis property right (better known as database copyright)?

Does this judgement say anything about that, i.e. whether it matters that users contributed the facts in their collection (so I'm not talking about posts, descriptions, etc.) rather than that they collected it themselves and therefore get a form of property right?

Edit: wait, database copyright is not a thing in the USA. Of course they wouldn't say anything about that.

nordsieck · 6y ago

IANAL

> But what if LinkedIn collects facts (where you work, your age, etc.), wouldn't that be covered by sui generis property right (better known as database copyright)?

I don't think so.

> Under the Copyright Act, a compilation is defined as a "collection and assembling of preexisting materials or of data that are selected in such a way that the resulting work as a whole constitutes an original work of authorship." 17 U.S.C. § 101 [1]

The thing is, LinkedIn is not authoring the compilation. The individual users are.

___

1. https://www.bitlaw.com/copyright/database.html

bdowling · 6y ago

LinkedIn may be the author of the compilation because they curate the database by removing fake profiles and encouraging users to complete their profiles. Also, the graph of connections between profiles may constitute a non-trivial organization method which takes the database out of the trivially-organized databases which were held uncopyrightable in the past. (e.g., Feist v. Rural[0])

In any case, this decision was mostly about upholding the lower court's granting of an order preventing LinkedIn from blocking hiQ's scrapers for the duration of the lawsuit. HiQ could still lose on the copyright questions or other issues.

[0] https://en.wikipedia.org/wiki/Feist_Publications,_Inc.,_v._R....

tempestn · 6y ago

My understanding is that the contract (ToS) portion is not decided. This decision stated that LinkedIn does not have a protected property interest in the profiles, so it cannot claim copyright there. It's possible they could claim things like compilation copyright; that is as yet undecided. Also, I believe the appeals court only dealt with the CFAA issue; there's still the contract (ToS) to consider, as well as a possible trespass claim.

Now, the CFAA was the only criminal statute involved, so I guess that supports what you said, that scraping is not unlawful. There still may be liability though, and using the data only internally would not necessarily protect from that. It remains to be seen.

perl4ever · 6y ago

"it can be scraped/crawled, that is, it isn’t LinkedIn property"

I thought it was pretty established that putting something on a website didn't eliminate your copyright. Has that changed now?

To me, it seems like common sense would be that if you make a public website, you are implicitly permitting some copies, but surely it's not all or nothing?

freeone3000 · 6y ago

Facts and tables are not copyrightable. The phone numbers in a phone book are not copyrightable, merely their presentation order[0]. If you were to copy, say, the linkedin website, or the linkedin branding, or the name linkedin, or any of their ads, those would be eligible, but the simple collection of names, emails, and phone numbers is ineligible for copyright.

0: https://en.wikipedia.org/wiki/Feist_Publications,_Inc.,_v._R....

Sharlin · 6y ago

This depends on jurisdiction though. In the European Union specifically there exists sui generis legislation that grants certain rights to the assembler of a database [1]. However, it’s a more interesting situation when the database keeper just provides a means for individuals to fill in their own data.

[1] https://en.wikipedia.org/wiki/Database_right

dragonwriter · 6y ago

> I thought it was pretty established that putting something on a website didn't eliminate your copyright. Has that changed now?

No, if anything, that supports the decision.

To the extent that the material is copyrightable, it belongs to the users, who have chosen to make it public; copying incidental and necessary to that access is allowed under an implied license doctrine. Microsoft's efforts to restrict access had nothing to do with copyright, but ToS.

GarrisonPrime · 6y ago

Perhaps it depends on intent. Clearly, the creators of the content, and those who posted the content, did so for the sole intention of making it public and usable outside the LinkedIn system. Their posting of it on LinkedIn is incidental; what site is used or who owns it is largely irrelevant to them, whereas such things clearly do matter to any company or person creating and posting their own unique content to their own site.

paulgb · 6y ago

At one point, to fight scraping, Craigslist changed their terms so that users assigned them copyright of listings rather than just a license. It didn't work well for them, but it's an interesting approach.

https://www.eff.org/deeplinks/2013/04/craigslist-owns-what-y...

jammygit · 6y ago

My understanding is that Facebook uses similar clauses to disallow web scraping. Does that mean Facebook is fair game too?

andy_ppp · 6y ago

I'm pretty sure you would get a big GDPR fine if you started taking data people agreed to put on LinkedIn without their express permission.

echelon · 6y ago

This is fantastic. I would like to see wider legislation making scraping of IMDB, Genius, Reddit, Facebook, and Google legal. These services receive free input from users. The data should remain free.

Edit (sort of off topic): There's still value in building and providing services at scale, but this lowers the barrier to crossing the moat for small players. The first step is data liberation. Then we can work to bring down the other cost barriers. It's a lot easier to build services that scale in 2019 than it was in 2005.

The semantic web was misguided in 200X, but we might want to take another swing at it in the future.

polygot · 6y ago

If you add .json to the end of a Reddit URL, it will return JSON data. For example: https://www.reddit.com/r/ubuntu.json . It also works with comment threads and posts.
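Here is a minimal sketch of using that trick with Python's standard library. It assumes the subreddit listing shape Reddit returns today (`data.children[*].data.title`). The User-Agent string is a placeholder, and `to_json_url`, `post_titles` and `fetch_listing` are hypothetical helper names:

```python
import json
import urllib.request
from urllib.parse import urlsplit, urlunsplit


def to_json_url(url: str) -> str:
    """Turn a Reddit page URL into its .json variant, keeping any query string."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") + ".json"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))


def post_titles(listing: dict) -> list[str]:
    """Extract post titles from a subreddit Listing payload."""
    return [child["data"]["title"] for child in listing["data"]["children"]]


def fetch_listing(url: str) -> dict:
    """Download and decode the JSON for a Reddit page (network access required)."""
    req = urllib.request.Request(
        to_json_url(url),
        # Reddit asks clients to send a descriptive User-Agent; this one is a placeholder.
        headers={"User-Agent": "example-script/0.1 (by u/your_username)"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

For example: `post_titles(fetch_listing("https://www.reddit.com/r/ubuntu"))`. A comment-thread URL returns a JSON array of two listings (the post, then its comments) rather than a single listing. As written, `post_titles` therefore only fits subreddit pages.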

avip · 6y ago

Wonderful feature also used by Trello https://trello.com/b/rq2mYJNn/public-trello-boards.json

ludamad · 6y ago

Now that is an ergonomic API.

polygot · 6y ago

Also, it outputs XML and RSS too: https://www.reddit.com/r/ubuntu.rss and https://www.reddit.com/r/ubuntu.xml

ahbyb · 6y ago

xml and rss seem to be the same exact output

sneak · 6y ago

I have adopted this in other projects and added the functionality there as well; it is a brilliant idea.

diminoten · 6y ago

Yeah no need to scrape Reddit, their content is accessible via their API.

Raidion · 6y ago

PRAW is also a great python reddit "scraper" that allows you to pull data via their API very easily.

psv1 · 6y ago

Another side of this is that the entity doing the scraping is more often than not another company. Which means that if your proposal is implemented, a user can voluntarily give their personal data to Google/Reddit/Facebook etc but that company then has to make the user's personal data available to another company.

ptero · 6y ago

It's not quite like that. The first company cannot prevent scraping by individuals or another company of information that it already shows to everyone. Which, to me, is a good thing. My 2c.

gkoberger · 6y ago

Eh. I want my picture and name uploaded to LinkedIn, since it's a professional network and people use it to find me for good reasons. It may seem dumb, but not having a LinkedIn with a good picture can genuinely hurt your career.

I do NOT want my picture run through facial recognition software, or my name/email sold to marketers who will add it to a drip campaign.


dx034 · 6y ago

So Google would need to allow scraping of search results? That would be a huge change, they currently prevent that pretty aggressively.

TeMPOraL · 6y ago

This is a problem because you're talking about personal data, not because of scraping. Personal / personally identifiable data is special and special protections apply to it. But regular data would fare just fine under GP's proposal.

bobthepanda · 6y ago

It only applies to data displayed publicly, though. If Facebook and such started requiring logins to see personal data would that be such a bad thing?

jjeaff · 6y ago

I'm not certain, but it kind of sounds like even things behind a login are still scrapable. Assuming the general public can get a login easily anyway. Basically, just requiring an account is not enough to forbid scraping.

tekknik · 6y ago

When you talk about data here, you're talking about people. This HiQ software is actually a bit scary. What if it gives a false signal which ends in an employee's termination? Data on people should not be freely attainable; the person should give explicit access. If I don’t want HiQ processing my information (I don’t), then they shouldn’t be able to. Especially now, with some employers requiring a LinkedIn profile.

sireat · 6y ago

Reddit has a decent API

The golden rule is to use the API before you start raw scraping.

OrgNet · 6y ago

for IMDb, they have a lot of data that is easily accessible, not sure what is missing though: https://datasets.imdbws.com/...
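If you do pull those files, here is a minimal sketch of reading one. It assumes the format IMDb documents for them: gzip-compressed, tab-separated UTF-8 with a header row, and `\N` marking missing values. It also assumes a locally downloaded file such as `title.basics.tsv.gz`. The helper name is hypothetical, and the dataset licence terms still apply.

```python
import csv
import gzip
from typing import Iterator, Optional


def read_imdb_tsv(path: str) -> Iterator[dict[str, Optional[str]]]:
    """Yield rows of a gzipped IMDb dataset file as dicts, mapping the \\N marker to None."""
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as f:
        # The files are tab-separated and unquoted, so disable csv's quote handling;
        # otherwise titles containing a double quote would be mangled.
        reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            yield {key: (None if value == "\\N" else value) for key, value in row.items()}
```

For example, `next(read_imdb_tsv("title.basics.tsv.gz"))["primaryTitle"]` returns the first title. Streaming row by row keeps memory flat even for the larger files.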

tdhoot · 6y ago

Only for personal and non-commercial use, which is probably not what startups need.

OrgNet · 6y ago

process it on your personal computer and use the output in your startup


JustSomeNobody · 6y ago

Why would they have to make it available to startups in an easily accessible manner?


jakeogh · 6y ago

It's already legal. Adding law adds restrictions.

pharrington · 6y ago

You're not wrong. In the general case though, adding law can mandate that a certain already occurring activity must be done.

lonelappde · 6y ago

What gives you a rightful claim to information that I gave to someone else, if neither I nor they consent?

austhrow743 · 6y ago

You did consent. This is about information you gave to LinkedIn and told them to give to the general public.

perspective1 · 6y ago

I'm torn. On the one hand, scraping helps break down walled gardens. On the other, we're talking about personal details being used in novel ways that probably no LinkedIn user understands. I doubt any LinkedIn user writes their profile expecting HiQ to scrape it, assign a "flight risk" score and alert their bosses.

heavyset_go · 6y ago

User privacy shouldn't be dependent on draconian anti-scraping laws.

Besides, LinkedIn is already sharing every last bit of their users' information with the highest bidder.

landryraccoon · 6y ago

To me this is conceptually the same problem as DRM - with your position similar to those trying to build DRM systems.

One can’t both hand over data freely to a service (in this case LinkedIn) and also subsequently prevent all sharing of that data. Or to put it another way, you can’t both put your information on a public billboard hoping a recruiter sees it and offers you a job AND keep it private from strangers you hope won’t misuse it.

Nextgrid · 6y ago

The users agreed to publish their details publicly on LinkedIn. It’s normal that anyone can access those details and use them however they like.

smohare · 6y ago

There is a broader ethical discussion of how to treat data, nominally public, that is increasingly collected, persisted and analyzed indefinitely by adversarial agents. It seems clear to me that a more nuanced categorization is required. This data is not public in the same sense as an uttered word was in a town square a hundred years ago.

Imagine being denied job opportunities because some company has analyzed the careers of the last 25 generations of your ancestors and deemed your lineage to be inadequate?

pergadad · 6y ago

You mean maybe for your LinkedIn general profile info to be public, but not the old profile data or the metadata of when you changed what. It's not per se secret or hidden, but it's also not intended to be kept and processed.

This is also a clear case where GDPR would come in. This is personal data, whether intentional or not, and the scraper is obliged to conform to EU laws if they scrape data on EU citizens - including eg information rights and deletion.

alkonaut · 6y ago

I didn't agree to have them republished. Just because they are publicly accessible on SiteA doesn't mean I have agreed to have them published on SiteB, does it?

mmcwilliams · 6y ago

That isn't what's really being ruled on here, though. Republishing the data is still restricted via copyright, but the data may exist in an internal database that SiteB uses to do X.

The question at hand is whether or not SiteB (or more appropriately CompanyB) is able to automatically download the content on SiteA or have to make an intern manually copy and paste the data into a spreadsheet.

lucb1e · 6y ago

There are privacy settings though, and recruiters used to be able to see more than other non-contacts. I'm not sure if that is still the case, but either way, what is being shared is not always obvious to the user.

kovek · 6y ago

What if there could be a robots.txt kind of setting that users could use to prevent being scraped?
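A per-page version of this already exists: the robots meta tag (`<meta name="robots" content="noindex, nofollow">`) and the equivalent `X-Robots-Tag` response header. Like robots.txt, it is only a request that well-behaved crawlers choose to honour. Below is a minimal sketch of a crawler checking for it with Python's standard `html.parser`. Treating `noindex` or `none` as "don't store this page" is an assumed crawler policy; the tag itself does not enforce anything.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> tags on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, but not attribute values.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for directive in attributes.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def page_opts_out(html: str) -> bool:
    """Assumed policy: skip storing pages that ask not to be indexed."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return bool(parser.directives & {"noindex", "none"})
```

A site could emit such a tag only on profiles whose owners ticked an opt-out box. That would give users the per-profile control asked about here, with the same limitation that nothing forces a scraper to look.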

jakeogh · 6y ago

Pointless unless you want a one-world government to enforce a law monoculture.

post_below · 6y ago

I'd personally call HiQ's business model bottom feeding.

However restricting access to public information on the internet will benefit only the established titans. So this ruling is great news.

zlagen · 6y ago

Users know their information is public and they have the option to make it private on Linkedin. If Linkedin is worried about the privacy of their users they should let them know about the risks of having a public profile.

perl4ever · 6y ago

The consequences of information leakage can be zero for an indefinite amount of time before something surfaces with catastrophic effect. Everybody knows this, because everybody sees it happen constantly, even though it is relatively unlikely it will happen to you, today.

Human beings are hardwired to do things that they see others doing unless there is a really clear connection between the actions and disaster. There has to be another mechanism to deal with probably small, but unquantifiable risks.

spaced-out · 6y ago

Same here. On one hand, this lessens the monopoly power of large tech companies, on the other hand, it gives users less control over their data.

paxys · 6y ago

IMO if you set up a profile on LinkedIn there's a pretty clear expectation that your bosses will be able to see it.

pjc50 · 6y ago

Doing so in Europe is a clear GDPR violation.

I think that's a reasonable balance - you can scrape data, but not personal data without consent of the scraped person.

TeMPOraL · 6y ago

I'd say it's not only reasonable, it's also "carving nature at its joints". The reason scraping personal data is problematic is because it's personal data, not because it's being scraped, so protection should be applied from the direction of personal data regulations.

undefined3840 · 6y ago

I recently learned from a recruiter that one license for one recruiter for LinkedIn is $10k a year, so that is what they are protecting.

phs318u · 6y ago

I’m a very active user of LinkedIn, effectively cultivating my “professional brand” on it. I’ve been contracting for years and use my network to find gigs. While I don’t have an issue with the business that HiQ are in (informing businesses of employee flight risk), I do believe there’s a qualitative difference between data that I publish for consumption by human eyeballs for free (a use of my data that I’ve authorised), and someone harvesting such data en masse for commercial purposes that I have not authorised. HiQ have not asked for my permission to use my data, and they have not made any commitments about how they will and will not use my data. Given that they have access to my contact details (even via LI itself), they are capable of contacting me to request permission to use my data.

paxys · 6y ago

The difference isn't as clear as you are making it out to be.

If you have a public LinkedIn profile, should an employer be able to look at it without your explicit consent and reach out to you for job opportunities (or disqualify you from one)?

Should the employer be able to pay someone else (say a recruiting agency) to look at LinkedIn profiles on their behalf?

Should the recruiting agency be able to use automated tools (which scrape public profiles) that make things easier for them?

CosmicShadow · 6y ago

What HiQ did was scrape public data, so if you have your LI profile set to public, then anyone can access it and do what they will with it, just like if you posted a printout of it on a bulletin board in a mall. It's in the open and is fair game for whatever. You can make your entire profile or just aspects of it private, meaning people need to log in to LI to see your stuff, which then protects you under the ToS.

I think profiles were default public so you could be found on Google and for SEO purposes for both you and LI.

You'd be hard pressed to find a public profile accessible anymore on LI anyway, even with public settings, you'll hit an authwall 9 out of 10 times.

phs318u · 6y ago

I understand what HiQ have done. I'm saying I believe there's a material difference between public data for consumption by individual human beings, and systematic commercial harvesting. I appreciate that in the US, there may be no legal distinction between types of consumption of public data. Public data is public. However, I'm arguing that any commercial use of my data beyond fair use should require my permission and an explanation of how my data will be stored and treated, so that I can be assured that my rights (over further unauthorised use) are preserved.

EDIT: It occurs to me that HiQ's success over LinkedIn does not necessarily imply they would be successful against actual LI users in a GDPR-like jurisdiction. Also, what if LI turned around and allowed each user to specify a style of CC license under which their specific data is published (by LI on behalf of the user). If I specified a non-commercial license variant, would that disallow HiQ's actions (without seeking permission)?

CosmicShadow · 6y ago

I can't say I know much about licensing and/or GDPR stuff, all I know is that if it's public, I don't have to agree to anything and I can do whatever I want, which is great for me and my business. From the other side, yes it sucks that people can take my stuff and profit from me and there is nothing I can do about it and no way to enforce it and I probably don't even know it's happening. (sounds like ad tracking!)

The way things work in North America, at least to my understanding, is that it doesn't matter what license you use: I don't have to agree to it to scrape it and use it if there is no clickwrap agreement. I guess if you caught me explicitly using it in a certain way, I could get in trouble, but that is not easy. What you propose sounds reasonable, but I don't know how it would be enforced or if it would still stop people. I'm owed 30k in consulting wages and I can't even make it worth my while to pursue that from a legal standpoint, let alone try and sue some unknown and/or potentially massive company or scattered random ghosts across the interwebs.

danielrhodes · 6y ago

LinkedIn has played a very poor strategy here. The value of the service should be in the network, which is quite defensible. Instead, they’ve made the value in the profiles, which is not defensible. Few people curate their network on LinkedIn because you can't see profiles unless you are closely connected, so you are incentivized to add as many people as possible, thus devaluing the entire network. Then they go and sell unlimited access to profiles to recruiters and sales people. Thus, when other services come around and scrape their data, which LinkedIn needs to make somewhat publicly available for SEO juice, it becomes an existential threat.

If you look at Facebook, there is some limited profile data publicly available, but they will go to the wall to prevent people from seeing how those people are connected. In addition, they started from a very walled-off position, so they didn't become reliant on SEO traffic.

crazygringo · 6y ago

Question:

This seems to mean LinkedIn can't sue to prevent scraping.

I assume it's still legal for them to implement technological anti-scraping measures? So the two companies can play cat-and-mouse if they wish with rate-limiting, IP addresses, etc...
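On the rate-limiting side, here is a minimal sketch of a per-client token bucket, assuming requests are keyed by client IP. The capacity and refill rate are arbitrary, and the injectable `clock` is only there to make the sketch testable. Real deployments usually enforce this at a proxy or CDN rather than in application code.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TokenBucketLimiter:
    """Per-client token bucket: each client may burst up to `capacity` requests,
    then averages at most `refill_per_sec` requests per second."""

    capacity: float = 10.0
    refill_per_sec: float = 1.0
    clock: Callable[[], float] = time.monotonic
    # client key (e.g. an IP address) -> (tokens left, time of last update)
    _buckets: dict = field(default_factory=dict, init=False, repr=False)

    def allow(self, client: str) -> bool:
        now = self.clock()
        tokens, last = self._buckets.get(client, (self.capacity, now))
        # Refill in proportion to elapsed time, never above capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_sec)
        if tokens >= 1.0:
            self._buckets[client] = (tokens - 1.0, now)
            return True
        self._buckets[client] = (tokens, now)
        return False
```

A bucket like this tolerates short bursts from a human clicking around but throttles sustained bulk fetching from a single key. This is also why scrapers spread requests over many IPs.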

thomascgalvin · 6y ago

An earlier ruling actually ordered LinkedIn to stop attempting to block the scraping using technological measures, too.

zuminator · 6y ago

I believe that LinkedIn were enjoined from using measures to limit HiQ specifically from scraping their site, not from general authorization-based measures that might have the corollary effect of limiting access by HiQ. The idea is that LinkedIn can't make the data public and freely accessible and then turn around and say that the data isn't public if you're a potential competitor who is using an automated tool to access it in bulk.

buboard · 6y ago

That sounds bizarre. Why would they order them to do that? What if they're trying to block spammers or something? What about pages that users want public, but not indexable, e.g. Dropbox shared links?

perl4ever · 6y ago

Has robots.txt been outlawed now? In what jurisdiction exactly?

sjg007 · 6y ago

I don't believe that robots.txt has ever had the backing of law.

jjeaff · 6y ago

Robots.txt doesn't restrict anything. It's just a request of what should and shouldn't be scraped by search engine spiders.
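Python's standard `urllib.robotparser` shows how advisory it is: the module just answers questions about a robots.txt file, and whether to ask at all is up to the client. Below is a minimal sketch with a made-up robots.txt and placeholder URLs:

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt; against a live site you would use set_url(...) and read().
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# These are only answers to questions the crawler chooses to ask.
print(parser.can_fetch("example-bot", "https://example.com/in/someone"))  # True
print(parser.can_fetch("example-bot", "https://example.com/private/data"))  # False
print(parser.crawl_delay("example-bot"))  # 5
```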

r_singh · 6y ago

Not too hard to surpass those with things like residential proxies, randomised user agents, headless browsers, etc. Bring on the anti scraping measures...

55555 · 6y ago

> This seems to mean LinkedIn can't sue to prevent scraping.

Zillow and similar companies have shut down numerous startups which relied on scraping their data.

How is this different?

paxys · 6y ago

LinkedIn data is provided freely by its users. MLS, on the other hand (which Zillow and all other such sites/agents get their data from), is a private database.

tempestn · 6y ago

This blog post is an excellent summary, and covers what was actually decided and what is still unknown: https://blog.ericgoldman.org/archives/2019/09/ninth-circuit-...

lr4444lr · 6y ago

What cracks me up about this is how these massive companies go to such lengths to call themselves mere platforms in order to avoid liability for content, and then when someone actually takes the content in this case they cry, "Foul! That's ours!" Can't have it both ways.

genidoi · 6y ago

LinkedIn tried to argue that if they put data behind a login wall, then it no longer falls under the wide umbrella of "public data" and so it's "theirs". Previous cases already established that if a crawler can see the data without any session cookies, then it's okay. This ruling extended that to any data that can reasonably be accessed by any member of the public.

There will probably be more cases like this testing the upper bound of what "public data" means. At what point does publicly aggregated data stop being public data? And do the attempts companies make to prevent that data from being captured (IP limiting, captchas, login walls) count as immoral/illegal, since they restrict the public from accessing a public good?
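One rough way to check the "no session cookies" condition in practice: request the page with a cookie-free client and see whether it is served or bounced to a sign-in page. Below is a minimal sketch using only the standard library. The login-path markers (e.g. `authwall`, the term used upthread) and the helper names are hypothetical heuristics. Sites can also gate content without redirecting, which this sketch would not detect.

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit

# Hypothetical path fragments that suggest a login wall; real sites vary.
LOGIN_MARKERS = ("login", "signin", "authwall")


def redirected_to_login(final_url: str) -> bool:
    """Heuristic: does the URL we ended up on look like a sign-in page?"""
    path = urlsplit(final_url).path.lower()
    return any(marker in path for marker in LOGIN_MARKERS)


def visible_without_cookies(url: str) -> bool:
    """Fetch `url` with no cookies and report whether the page itself was served
    (network access required)."""
    # The default opener keeps no cookie jar, so no Cookie header is ever sent.
    req = urllib.request.Request(url, headers={"User-Agent": "example-bot/0.1"})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            # urlopen follows redirects; geturl() is where we finally landed.
            return resp.status == 200 and not redirected_to_login(resp.geturl())
    except urllib.error.HTTPError:
        # 4xx/5xx refusals count as "not visible" here.
        return False
```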

mafuy · 6y ago

> Previous cases already established that if a crawler can see the data without any session cookies then its okay.

I'm interested in this, but I'm not sure how to learn more - can you give me a hint?

genidoi · 6y ago

Do you mean crawling without cookies or the legal case?

giancarlostoro · 6y ago

What's scarier is when they editorialize their platforms (read: censorship), thereby becoming content producers themselves. Today it's whoever you disagree with being censored, tomorrow it's your own voice.

Domenic_S · 6y ago

Who's responsible for privacy then? That's another situation where you can't have it both ways - can't tell the platform they don't own the data and simultaneously hold them to GDPR.

playing_colours · 6y ago

I do not like the hide-and-seek game around the "who viewed your profile" feature: upgrade to a paid subscription to see who viewed you, upgrade to another tier to hide that you looked at someone.

It looks like a lack of imagination or business prowess to come up with more advanced, valuable, and less annoying ways to monetise. If only they made it easier to connect people with matching mutual interests, more flexibly than a plain traditional job board and a database of CVs.

gnicholas · 6y ago

You don’t have to pay to hide that you viewed someone’s profile. Maybe if you want to see who viewed yours, but also keep your browsing private — but it seems more reasonable to charge for that sort of functionality.

datelinereader · 6y ago

FYI, this article is from a month ago and this general story was discussed here at the time (linking the official announcement):

https://news.ycombinator.com/item?id=20920753

xupybd · 6y ago

After finding this https://github.com/Greenwolf/social_mapper, I strongly recommend against having a profile photo on linkedin. It has caused me to be far more careful about my presence on the internet.

In the post-privacy age I don't want my personal opinions to come back and haunt me. I grow as a person, but the internet remembers all. If I make a dumb mistake and it's published online, that's not a problem for me in 10 years if it fades away. But people are collecting and correlating info now. I don't like it one bit. It means someone you've never met, in a country you've never been to, could extort you. It's getting very scary.

vesche · 6y ago

You can make it so your picture on LinkedIn is only viewable by people who are connected with you. I do agree that people should be cautious about what they post/share online however.

gist · 6y ago

I think what most people also don't realize is that LinkedIn's current model makes it difficult to view someone's profile without them knowing, if they pay for the option to see who is looking at their profile. As such, the user looking at a person's profile has no privacy about having done so. There could be many reasons someone looks at someone else's profile (even just curiosity or a mistake), so to me this is an issue in itself.

Sure there are ways around this (you can make up a fake profile and some info is public but normally what I run into is a request to login to linkedin to view something that I am interested in).

scarface74 · 6y ago

There is a setting that lets you view other people’s profiles without them being notified. You can do it with a free account, but then you also can’t see who viewed your profile.

If you pay, you can keep your viewing private while seeing other people’s profiles.

gist6y ago

But if they pay, can't they override that? Or are you saying that even a paid LinkedIn account can't see that you looked at their profile if you (on a free account) have set 'don't allow anyone to see'?

scarface746y ago

That’s what LinkedIn says. So I hope that’s the case.

ChrisMarshallNY6y ago

Personally, this doesn't bother me too much. I use LinkedIn specifically because it is public. I'm an "open kimono" type of person. Not particularly interested in hiding stuff.

However, the general principle of "Data Scraping as a Business Model" bothers me. This is by no means the only company that does it (I suspect that MS does it with their access to LinkedIn).

There are far more egregious instances, and many of them have ways to get users to voluntarily cede information (can you think of a rather obvious example?).

LinkedIn is a sandwich board. It's meant to be a public showcase. If you want private, I suspect there are much more focused (and probably valuable) venues that cater to particular communities.

alexandercrohde6y ago

> Not particularly interested in hiding stuff.

Well, the company, hiQ, is basically scraping your LinkedIn every time you update it, to tell your employer you might be about to leave.

Now maybe that's cool with you. But it seems super sketchy to me, and it's one reason I deleted my LinkedIn altogether.
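To make the mechanism described above concrete, here is a minimal sketch of snapshot diffing in Python. It assumes a scraper stores each public profile as a flat dict of field name to text and compares consecutive snapshots; the field names and the `SIGNAL_FIELDS` set are hypothetical and illustrate the general technique, not hiQ's actual method.

```python
from typing import Dict, Optional, Tuple

# A profile snapshot as a scraper might store it: field name -> text (assumed shape).
Profile = Dict[str, str]

# Hypothetical fields treated as "job-hunting signals"; purely illustrative.
SIGNAL_FIELDS = {"headline", "summary", "skills"}

# Hypothetical example snapshots of one public profile, taken at two different times.
BEFORE: Profile = {"headline": "Engineer at Acme", "skills": "Go"}
AFTER: Profile = {"headline": "Engineer | open to new roles", "skills": "Go", "location": "Berlin"}


def changed_fields(old: Profile, new: Profile) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Map each field whose value differs between snapshots to (old, new).

    A field present in only one snapshot appears with None on the other side.
    """
    keys = sorted(old.keys() | new.keys())
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


def looks_like_job_hunting(old: Profile, new: Profile) -> bool:
    """Naive rule: any change to a signal field raises a flag."""
    return any(field in SIGNAL_FIELDS for field in changed_fields(old, new))


if __name__ == "__main__":
    print(changed_fields(BEFORE, AFTER))
    print(looks_like_job_hunting(BEFORE, AFTER))
```

The point of the sketch is how little is needed: once snapshots are collected on a schedule, any edit to a public profile becomes a timestamped event that can be forwarded to whoever is paying.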

ChrisMarshallNY6y ago

You made the correct decision.

It is not "cool" with me. It just means that it isn't a factor for me. I would not have used LI for a job search in any obvious way.

If I keep a fairly current and active generic profile, then LI is useful, and no one needs to know whether or not I'm looking.

hooloovoo_zoo6y ago

What if LinkedIn adds a visibility option in addition to public/private profile that says "I want LinkedIn to prevent robots from scraping my profile."? What if LinkedIn enables that mode by default? Can they then continue preventing scrapers?

alt_f46y ago

I think they can, but they won't, because "robots" includes search engines, and blacklisting search engines from user profiles would badly hurt their metrics.

perl4ever6y ago

You're implying they can't discriminate. So does this case make robots.txt illegal?

alt_f46y ago

First off, robots.txt is optional. It's neither a technical nor a legal limitation at this point.

Second, OP's argument suggests a UI option that grants or withdraws user consent for all robots in general. Unless they plan to word it as "allow robots that we like and that are good for us, but disallow other robots", I don't think it's okay to discriminate by allowing or banning a particular robot, as that is not what the user agreed to.
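For reference, here is a minimal Python sketch of how a crawler consults robots.txt using the standard library's `urllib.robotparser`. It shows two points from this sub-thread: a robots.txt file can treat crawlers differently by user agent, and compliance is voluntary, since the check only happens if the crawler chooses to call it. The robots.txt content, the example.com URL, and the `HypotheticalScraper` agent name are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one search-engine crawler is allowed on profile
# pages, every other user agent is disallowed everywhere.
EXAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /in/

User-agent: *
Disallow: /
"""


def robots_allows(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    This only reports the site's stated preference; a client that never
    calls it is not technically prevented from fetching anything.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    profile = "https://www.example.com/in/jane-doe"
    for agent in ("Googlebot", "HypotheticalScraper/1.0"):
        print(agent, robots_allows(EXAMPLE_ROBOTS_TXT, agent, profile))
```

Running it prints `True` for Googlebot and `False` for the hypothetical agent.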

myth_buster6y ago

Detailed discussion from September when the decision was made.

https://news.ycombinator.com/item?id=20920753

conjectures6y ago

IP aside, anyone else concerned about the business of HiQ?

I presume what they are doing is:

* Scrape profiles.

* Calculate time delta in jobs.

* 'Predict' churn rate for (prospective) employee.

With respect to prospective employees in particular this seems likely to entail lots of risks. Average job time delta is going to be a massively overdetermined variable, and noisy wrt 'next job delta'. I'm worried how they're going to sell that to employers.
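The guessed pipeline above can be sketched in a few lines of Python. Everything here is an assumption for illustration (the `Job` tuple shape, whole-month arithmetic, the example career, and the rule "flag once the current tenure reaches the historical mean"); it is not hiQ's actual model. Printing the spread next to the mean makes the noise concern visible: for tenures of 30, 6 and 48 months, the mean is 28 but the population standard deviation is about 17 months.

```python
from datetime import date
from statistics import mean, pstdev
from typing import List, Optional, Tuple

# A job as (start, end); end is None for the current role (assumed shape).
Job = Tuple[date, Optional[date]]

# Hypothetical career history scraped from one profile.
EXAMPLE_JOBS: List[Job] = [
    (date(2010, 1, 1), date(2012, 7, 1)),  # 30 months
    (date(2012, 8, 1), date(2013, 2, 1)),  # 6 months
    (date(2013, 3, 1), date(2017, 3, 1)),  # 48 months
    (date(2017, 4, 1), None),              # current role
]


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end, ignoring the day of month."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def past_tenures(jobs: List[Job]) -> List[int]:
    """Tenure in months of every completed job (the 'time delta in jobs' step)."""
    return [months_between(s, e) for s, e in jobs if e is not None]


def naive_flight_risk(jobs: List[Job], today: date) -> bool:
    """Hypothetical 'churn' rule: flag once the current tenure reaches the historical mean."""
    history = past_tenures(jobs)
    current = [s for s, e in jobs if e is None]
    if not history or not current:
        return False
    return months_between(current[0], today) >= mean(history)


if __name__ == "__main__":
    history = past_tenures(EXAMPLE_JOBS)
    print("tenures:", history, "mean:", mean(history), "spread:", round(pstdev(history), 1))
    print("flagged:", naive_flight_risk(EXAMPLE_JOBS, date(2019, 9, 1)))
```

With this data the rule flags the person in September 2019 (29 months in the current role), even though their past stints ranged from 6 to 48 months.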

mminer2376y ago

For anyone interested in more of the law in the case without reading all 30+ pages of the opinion themselves, I wrote a brief on it last month when this ruling was made: https://matthewminer.name/law/briefs/Miscellaneous/hiQ+Labs+....

spider-mario6y ago

> “And as to the publicly available profiles, the users quite evidently intend them to be accessed by others”

How is it evident that the users intend them to be accessed by scrapers and not just humans? Since the ToS forbid scraping, it seems very reasonable to me to imagine users making their profiles public because of that assumption that scraping is not tolerated.

alkonaut6y ago

What is the limit for what is "user provided"? My entire Facebook profile, including my social graph, is "user provided".

Does this mean that it would likely be possible for a competing network to have a "click here to import your friend list" for example?

brushfoot6y ago

This is great news. The data is public; it shouldn't matter whether you hire humans to parse it or develop a bot. LinkedIn was trying to have its cake and eat it too.

Causality16y ago

Would it really be that difficult for LinkedIn to require users to be logged in before viewing profiles and include anti-automation rules in the EULA?

donohoe6y ago

In case it's not clear, this is from September.

mherdeg6y ago

Hmm, how does this compare versus the Craigslist/3Taps/Radpad litigation? Are these similar issues?

EGreg6y ago

It sounded like this was going to be an opinion piece about how LinkedIn is losing its appeal to users.

atombender6y ago

Anyone versed in U.S. law who can comment on whether the judgement in this case sets a precedent?

gnicholas6y ago

Yes, in the 9th Circuit (western US) this is binding precedent. Elsewhere it can be cited but is not binding.

mminer2376y ago

Technically not. This was just a preliminary injunction. The case itself still has to be decided. But assuming this was indicative of how the court will rule, it will then be binding precedent in the Ninth Circuit.

Barrin926y ago

As expected, a lot of people here are talking about public data and whatnot, but this is a horrible decision.

"Circuit Judge Marsha Berzon said hiQ, which makes software to help employers determine whether employees will stay or quit, showed it faced irreparable harm absent an injunction because it might go out of business without access.[...]

“LinkedIn has no protected property interest in the data contributed by its users, as the users retain ownership over their profiles,” Berzon wrote. “And as to the publicly available profiles, the users quite evidently intend them to be accessed by others,” including prospective employers."

This isn't some sort of empowerment of the public, it's surveillance capitalism. No end-user in their right mind publishes data on LinkedIn with the expectation that the information is bought up by a third party, analysed, and then sold back to your employer in a way that exposes your personal intent and may even threaten your job. The only thing this accomplishes is enabling shady business models that feed off a sort of internet voyeurism, and at the end of the day it'll lead to people turning their profiles private and making LinkedIn more difficult to use if you're someone who is looking for information in good faith.

themacguffinman6y ago

> No end-user in their right mind publishes data on LinkedIn with the expectation that the information is bought up by a third party, analysed, and then sold back to your employer in a way that exposes your personal intent and may even threaten your job.

Yes they do. Do you think people who are afraid of their employer finding out about something would show it on their public LinkedIn profile in the first place? If a manager or colleague who they've likely already "connected" with simply opens your LinkedIn profile in their web browser and sees the same info that hiQ sees, then it's game over. If you don't want your employer to know, don't publish it on your public profile. It's absurd to suggest that some minimal manual effort to load a few profiles is a serious privacy defense.

jakeogh6y ago

Your argument is to let corporations effectively make law.

perl4ever6y ago

Corporations do effectively make law, at least in the US. Politicians have neither the time nor the expertise. There have been some widely read articles about how sometimes that law is not even freely available to the public.

jakeogh6y ago

Sounds like we agree that's a bad thing. As to your first sentence: no, they propose laws. They don't get to revise their ToS and have it be a violation of the law when you ignore it. That would be like the EPA making a rule because it was granted that power by Congress.

Barrin926y ago

How did you get that out of my post? My argument is that people should make laws that end the business model of companies like hiQ, and that LinkedIn, although obviously acting in self-interest, is legitimately defending its platform here against third parties who are trying to use public information in privacy-violating ways.

jakeogh6y ago

By letting ToS have the force of law...


alkonaut6y ago

Doesn't that get covered by laws such as GDPR (where applicable)? Just because I can scrape your profile doesn't mean I can publish it, sell it, etc. (or even keep it). I can do it with your consent, and LinkedIn can't complain; isn't that it?

Barrin926y ago

GDPR requires affirmative and explicit consent to data-sharing. I am not a lawyer so I'm happy to be corrected by someone who has more regulatory knowledge here, but I am reasonably certain this company would not be allowed to operate this way in Europe.

anticensor6y ago

They would continue to ignore the law. I have seen the forced "consent" buttons countless times.

onetimemanytime6y ago

>>that required LinkedIn, a Microsoft Corp unit with more than 645 million members, to give hiQ Labs Inc access to publicly available member profiles.

Not sure this is a win for the web. Sure, it's user-submitted, but the users agreed that LinkedIn owns it after they submit.

rgross16y ago

Are there any useful bots for scraping LI profiles out there?

buboard6y ago

OK, how is that going to work for Facebook?

NKosmatos6y ago

This whole situation, with public data, personal information, data scraping, GDPR, and us putting our own info on various sites that display it publicly and then complaining when someone collects and uses it, has gotten out of hand :-( I think I’ll have to side with hiQ on this.

pkilgore6y ago

> September 9, 2019 / 1:34 PM / a month ago
