Seems like they've simply determined that viewing any freely accessible URL is "public" and that "viewing" does include scraping. This seems like a very reasonable determination as it maps pretty neatly to how we think about viewing public content IRL where I am free to drive down the road (for profit or pleasure) and record publicly viewable signage and activities and use that data any way I see fit.
It is very accurate. Users retain the copyright on their works insofar as their works are able to be copyrighted. Anything that is a "mere fact", and can't be copyrighted, is also not LinkedIn's property.
From LinkedIn's terms of service[1]:
> you are only granting LinkedIn and our affiliates the following non-exclusive license:
> A worldwide, transferable and sublicensable right to use, copy, modify, distribute, publish, and process, information and content that you provide through our Services and the services of others, without any further consent, notice and/or compensation to you or others.
___
Does this judgement say anything about that, i.e. whether it matters that users contributed the facts in their collection (so I'm not talking about posts, descriptions, etc.) rather than that they collected it themselves and therefore get a form of property right?
Edit: wait, database copyright is not a thing in the USA. Of course they wouldn't say anything about that.
> But what if LinkedIn collects facts (where you work, your age, etc.), wouldn't that be covered by sui generis property right (better known as database copyright)?
I don't think so.
> Under the Copyright Act, a compilation is defined as a "collection and assembling of preexisting materials or of data that are selected in such a way that the resulting work as a whole constitutes an original work of authorship." 17 U.S.C. § 101 [1]
The thing is, LinkedIn is not authoring the compilation. The individual users are.
___
Now, the CFAA was the only criminal statute involved, so I guess that supports what you said, that scraping is not unlawful. There still may be liability though, and using the data only internally would not necessarily protect from that. It remains to be seen.
I thought it was pretty established that putting something on a website didn't eliminate your copyright. Has that changed now?
To me, it seems like common sense would be that if you make a public website, you are implicitly permitting some copies, but surely it's not all or nothing?
0: https://en.wikipedia.org/wiki/Feist_Publications,_Inc.,_v._R....
No, if anything, that supports the decision.
To the extent that the material is copyrightable, it belongs to the users, who have chosen to make it public; copying incidental and necessary to that access is allowed under an implied license doctrine. Microsoft's efforts to restrict access had nothing to do with copyright, but ToS.
Edit (sort of off topic): There's still value in building and providing services at scale, but this lowers the barrier to cross the moat for small players. The first step is data liberation. Then we can work to bring down the other cost barriers. It's a lot easier to build services that scale in 2019 than it was in 2005.
The semantic web was misguided in 200X, but we might want to take another swing at it in the future.
The golden rule is to use the API before you start raw scraping.
Besides, LinkedIn is already sharing every last bit of their users' information with the highest bidder.
One can’t both hand over data freely to a service (in this case LinkedIn) and also subsequently prevent all sharing of that data. Or to put it another way, you can’t both put your information on a public billboard hoping a recruiter sees it to offer you a job AND keep it private from strangers you hope won’t misuse it.
Imagine being denied job opportunities because some company has analyzed the careers of the last 25 generations of your ancestors and deemed your lineage to be inadequate?
However restricting access to public information on the internet will benefit only the established titans. So this ruling is great news.
Human beings are hardwired to do things that they see others doing unless there is a really clear connection between the actions and disaster. There has to be another mechanism to deal with probably small, but unquantifiable risks.
I think that's a reasonable balance - you can scrape data, but not personal data without consent of the scraped person.
If you have a public LinkedIn profile, should an employer be able to look at it without your explicit consent and reach out to you for job opportunities (or disqualify you from one)?
Should the employer be able to pay someone else (say a recruiting agency) to look at LinkedIn profiles on their behalf?
Should the recruiting agency be able to use automated tools (which scrape public profiles) that make things easier for them?
I think profiles were default public so you could be found on Google and for SEO purposes for both you and LI.
You'd be hard pressed to find a public profile accessible anymore on LI anyway; even with public settings, you'll hit an authwall 9 times out of 10.
EDIT: It occurs to me that HiQ's success over LinkedIn does not necessarily imply they would be successful against actual LI users in a GDPR-like jurisdiction. Also, what if LI turned around and allowed each user to specify a style of CC license under which their specific data is published (by LI on behalf of the user)? If I specified a non-commercial license variant, would that disallow HiQ's actions (without seeking permission)?
If you look at Facebook, there is some limited profile data publicly available, but they will go to the wall to prevent people from seeing how those people are connected. In addition, they started from a very walled-off position, so they didn't become reliant on SEO traffic.
This seems to mean LinkedIn can't sue to prevent scraping.
I assume it's still legal for them to implement technological anti-scraping measures? So the two companies can play cat-and-mouse if they wish with rate-limiting, IP addresses, etc...
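For anyone wondering what per-IP rate-limiting looks like on the server side, here is a minimal sketch of a sliding-window limiter. The class name, the parameters, and the injectable clock are all made up for illustration; this is not a description of what LinkedIn actually runs.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Hypothetical per-IP limiter: at most max_requests per window_seconds."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the behavior can be tested without sleeping
        self._hits = defaultdict(deque)  # ip -> timestamps of recent allowed requests

    def allow(self, ip):
        now = self.clock()
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit: the caller would reject or challenge the request
        hits.append(now)
        return True
```

The cat-and-mouse part is that the key here is only the IP address, so a scraper that rotates addresses sidesteps it, which is why sites tend to layer captchas and login walls on top.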
Zillow and similar companies have shut down numerous startups which relied on scraping their data.
How is this different?
There will probably be more cases like this that test the upper bound of what "public data" means. At what point does publicly aggregated data stop being public data? And do attempts that companies make to prevent that data from being captured (IP limiting, captchas, login walls) count as immoral/illegal, since they are restricting the public from accessing a public good?
I'm interested in this, but I'm not sure how to learn more - can you give me a hint?
It looks like a lack of the imagination or business prowess needed to come up with more advanced, valuable, and less annoying ways of monetisation. If only they could make it easier to connect people with matching mutual interests, in a way that is more flexible than a plain traditional job board and a database of CVs.
In the post privacy age I don't want my personal opinions to come back and haunt me. I grow as a person but the internet remembers all. If I make a dumb mistake and it's published online that's not a problem for me in 10 years if that fades away. But people are collecting and correlating info now. I don't like it one bit. It means someone you've never met, in a country you've never been to could extort you. It's getting very scary.
Sure, there are ways around this (you can make up a fake profile, and some info is public), but normally what I run into is a request to log in to LinkedIn to view something that I am interested in.
If you pay, you can keep your viewing private while seeing other people’s profiles.
However, the general principle of "Data Scraping as a Business Model" bothers me. This is by no means the only company that does it (I suspect that MS does it with their access to LinkedIn).
There are far more egregious instances, and many of them have ways to get users to voluntarily cede information (can you think of a rather obvious example?).
LinkedIn is a sandwich board. It's meant to be a public showcase. If you want private, I suspect there are much more focused (and probably valuable) venues that cater to particular communities.
Well, so the company, HiQ, is basically scraping every time you update your LinkedIn, to tell your employer you might be about to leave.
Now maybe that's cool with you. But it seems super sketchy to me, and one reason I deleted my LinkedIn altogether.
It is not "cool" with me. It just means that it isn't a factor for me. I would not have used LI for a job search in any obvious way.
If I keep a fairly current and active generic profile, then LI is useful, and no one needs to know whether or not I'm looking.
I presume what they are doing is:
* Scrape profiles.
* Calculate time delta in jobs.
* 'Predict' churn rate for (prospective) employee.
With respect to prospective employees in particular this seems likely to entail lots of risks. Average job time delta is going to be a massively overdetermined variable, and noisy wrt 'next job delta'. I'm worried how they're going to sell that to employers.
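To make the presumed pipeline concrete, here is a rough sketch of the "time delta" step, assuming the scraped profile has already been reduced to a list of (start, end) date pairs with `None` as the end of the current job. The function names and the mean-tenure heuristic are my own guesses for illustration, not anything hiQ is known to use.

```python
from datetime import date
from statistics import mean


def tenures_in_days(jobs):
    """Length in days of each finished job; jobs is a list of (start, end) dates."""
    return [(end - start).days for start, end in jobs if end is not None]


def naive_expected_tenure(jobs):
    """Mean length of past jobs, or None if there is no finished job to go on."""
    past = tenures_in_days(jobs)
    return mean(past) if past else None


def days_past_expected(jobs, today):
    """Days in the current job minus the mean past tenure (positive = 'overdue')."""
    expected = naive_expected_tenure(jobs)
    current_starts = [start for start, end in jobs if end is None]
    if expected is None or not current_starts:
        return None
    return (today - current_starts[0]).days - expected


# Hypothetical profile: two finished jobs and one current job.
jobs = [
    (date(2012, 1, 1), date(2014, 1, 1)),
    (date(2014, 1, 1), date(2015, 1, 1)),
    (date(2015, 1, 1), None),
]
score = days_past_expected(jobs, today=date(2016, 1, 1))
```

With only two or three past jobs per person, the mean is dominated by noise, which is the worry above: a score like this looks precise but says very little about the next job.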
How is it evident that the users intend them to be accessed by scrapers and not just humans? Since the ToS forbid scraping, it seems very reasonable to me to imagine users making their profiles public because of that assumption that scraping is not tolerated.
Does this mean that it would likely be possible for a competing network to have a "click here to import your friend list" for example?
"Circuit Judge Marsha Berzon said hiQ, which makes software to help employers determine whether employees will stay or quit, showed it faced irreparable harm absent an injunction because it might go out of business without access.[...]
“LinkedIn has no protected property interest in the data contributed by its users, as the users retain ownership over their profiles,” Berzon wrote. “And as to the publicly available profiles, the users quite evidently intend them to be accessed by others,” including prospective employers."
This isn't some sort of empowerment of the public, it's surveillance capitalism. No end-user in their right mind publishes data on LinkedIn with the expectation that the information is bought up by a third party, analysed, and then sold back to their employer in a way that exposes their personal intent and may even threaten their job. The only thing this accomplishes is enabling shady business models that feed off a sort of internet voyeurism, and at the end of the day it'll lead to people turning their profiles private and making LinkedIn more difficult to use if you're someone who is looking for information in good faith.
Yes they do. Do you think people who are afraid of their employer finding out about something would show it on their public LinkedIn profile in the first place? If a manager or colleague who they've likely already "connected" with simply opens your LinkedIn profile in their web browser and sees the same info that hiQ sees, then it's game over. If you don't want your employer to know, don't publish it on your public profile. It's absurd to suggest that some minimal manual effort to load a few profiles is a serious privacy defense.
Doesn't that get covered by laws such as GDPR (where applicable)? Just because I can scrape your profile doesn't mean I can publish it, sell it, etc. (or even keep it). I can do it with your consent, and LinkedIn can't complain, isn't that it?
Not sure this is a win for the web. Sure, it's user-submitted, but the users agreed that LinkedIn owns that after they submit.