Anyone (especially the HN crowd) should know they have the data, and if you think they're not carefully analyzing it behind the scenes (like every other tech company who has your data), I've got things to sell you. I personally think a tiny peek like this into the data, much like the usage posts that OKCupid, YouPorn, and others give, is neat.
To test this, we took pairs of bad matches (actual 30% match) and told them they were exceptionally good for each other (displaying a 90% match.)
That's really not something people like having done to them. And the "HN crowd" shouldn't have an expectation of privacy and decency in data? Of course they're analyzing data, but it's really the viewpoint from which they do it that is unsettling. OKCupid says "no, duh, we're unethical. Deal with it." Uber says "Check it out! We drew a line between social security checks and prostitution!" (as waterlesscloud notes at https://news.ycombinator.com/item?id=8644138 ). There are a million more beneficial ways that people could be using the data. Fighting hunger, poverty, illiteracy, etc., to me, is a "good" use of Big Data. Looking at sexual habits (when you're not selling sex) or openly manipulating people to get data is, to me, a "bad" use.
I'm sorry if the idea that "people's short overnight stays are evident in their travel data" makes you blush, but that isn't anyone else's problem.
(I say they "probably dropped PII" because when you do work of this sort, PII is boring data that slows down your calculations.)
Similarly, what's wrong with observing a correlation between welfare checks and prostitution? It's an interesting observation. It's potentially useful for public policy and fighting poverty (at least American-style relative poverty), though of course a more detailed investigation would need to be done.
By contrast, it's simply not professional and reeks of juvenile behavior for Uber to be writing a post like this. Just because you have data and have these thoughts, doesn't mean you have to do the analysis and show the world. It doesn't help their users, it's not even that interesting, and it's not relevant to their value proposition as a business.
But since they are accused of trying to dig up dirt on people, this is a chilling reminder that they are more than capable of doing that, and apparently quite willing.
That's the creepy bit. Who owns that data? I want to live in a world where I own my data, and it can't be used for creepy purposes like this, or to extract additional value through arbitrage based on asymmetric information availability.
The Streisand effect is so well-known that I'm surprised anyone would delete a blog post nowadays.
EDIT: I actually hadn't read the blog post in detail until now, which was more than a little dumb. I thought it was just an analysis of rides along with some neat heatmap images. I didn't realize it was about sexual datapoints.
And yet.
See, there's another company that occasionally releases interesting data analytics: Google.
See: Word frequency over time, Predicting the spread of viruses from searches, etc.
The issue is that Uber is trying to explain motive and behavior at the individual level ("I know something about you!"). This is something that would be a definite no-no for Google. The cheekiness of the language certainly doesn't help either.
The more I hear about this company, the more thankful I am that we have heavily regulated taxis/cabs.
I think I'd rather my data only be available to a private company and their handful of engineers than the whole world.
(though it's not very well-written, some analysis a bit iffy, and the guesswork towards the peaks and dips in the graph rather low-effort)
Creepy/evil maybe not, because the data is clearly anonymised. However, the cringe is all over this article. OKCupid's stuff could easily be just as cringey, but they know it's important to steer clear of that. Also, they're a dating site; if they wrote an article about data-mining one-night stands, that would make sense. Not so much for a taxi company, especially not in light of Uber's general attitude.
The final sentence of the article definitely crossed from "cringe" into "creepy" for me, though. In particular from someone called "Uber".
The PDF in which this article was referenced did so to illustrate the availability of this data for frivolous purposes, and is right to call it "questionable" when considered in light of Hourdajian's statements about privacy and Uber's data-policies.
I mean, I found the idea behind the post interesting: of course you can analyze trends in ridership to draw interesting conclusions. At the end of the day, however, it's a horrible idea to say "Hey, we know which of you are being 'frisky' and where!"
Perhaps with a different motivation, this post wouldn't be nearly as ruinous. How about ridership patterns of sick or socioeconomically disadvantaged people? That's the kind of data that can change lives for the better.
You mean the sort of people who take the bus, who are pretty much the opposite of their target customers?
> Between June 2011 and August 2011 I worked with my friends over at Uber as their data scientist, writing (what I thought were) amusing, data-driven blog posts (among other, more serious roles).
The service they provide doesn't allow the "Ministry of Truth"[1] to doctor historical documents to meet their present day narrative.
archive.org respect the robots.txt of the current website owner. This can mean that they have the data but choose not to give you access to them. I have seen cases in the past where a website I once frequented became defunct, then the domain expired, then someone parked a holding page on that domain including a robots.txt that keeps archive.org from displaying the old data (which do not even belong to the current owner of the domain!).
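To make the mechanism concrete, here is a minimal sketch of how a crawler that honours robots.txt would read such a holding-page file. It uses Python's standard `urllib.robotparser`; the user-agent name `ia_archiver`, the `example.com` URL, and the file contents are assumptions for illustration, not a statement of how archive.org implements its policy.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a new domain owner might park on the site.
# "ia_archiver" is assumed here to be the agent name the archive checks.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A crawler that respects the file asks before fetching (or displaying) a page.
archive_allowed = parser.can_fetch("ia_archiver", "http://example.com/old-post")
other_allowed = parser.can_fetch("SomeOtherBot", "http://example.com/old-post")

print("ia_archiver allowed:", archive_allowed)
print("SomeOtherBot allowed:", other_allowed)
```

The point is that the rule is evaluated against whatever robots.txt the domain serves today, which is why a later owner can hide pages they never wrote.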
If they wanted to, there are a number of ways Uber could prevent archive.org from displaying that blog post. Many of these ways are due to the good faith under which archive.org operates (nobody is forcing them to respect robots.txt), and some even involve resorting to legal methods. But history is always mutable.
(Nothing but love on my end for archive.org, believe me! But I do want to point out the lengths that some people will go to alter the historical record).
And yes, I don't care what you think, but a company with a billion(ish) in funding is more powerful than YOU.
[0] https://web.archive.org/web/20140827195715/http://blog.uber....
Internal metrics teams nearly always have access to complete data. The issue is sharing non-anonymized data externally.
However there's no mention in these posts of such safeguards, and subjectively the post reads more like the analyst is just fishing around in the full raw dataset of ride times, start and end locations, and names. To wit:
"What else can we learn? First, we can devise a way to statistically assess whether there are more women or men in a neighborhood than we’d expect. [...] We used Rapleaf’s Name to Gender API to assess the likelihood of a rider’s gender given their name, only accepting a match if the probability was >= 95%."
And in the original post, he categorizes rides as possibly related to a late-night hookup based on whether the destination and departure points for 2 rides are within 0.1 mi of each other.
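For a sense of how little is needed to run that kind of pairing, here is a rough sketch. It is not Uber's code: the `Ride` fields, the `is_return_pair` helper, and the choice of haversine distance are all assumptions for illustration; only the 0.1 mi threshold comes from the description of the post.

```python
import math
from dataclasses import dataclass
from typing import Tuple

EARTH_RADIUS_MI = 3958.8  # mean Earth radius in miles

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees


@dataclass(frozen=True)
class Ride:
    rider_id: str    # hypothetical identifier; could be a pseudonymous token
    pickup: LatLon
    dropoff: LatLon


def haversine_miles(a: LatLon, b: LatLon) -> float:
    """Great-circle distance between two points, in miles."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(h))


def is_return_pair(first: Ride, second: Ride, max_miles: float = 0.1) -> bool:
    """True if the same rider's second ride starts near where the first one ended."""
    return (
        first.rider_id == second.rider_id
        and haversine_miles(first.dropoff, second.pickup) <= max_miles
    )
```

Note that nothing in this check needs a name: a stable per-rider identifier and coordinates are enough, which is also why the result says so much about an individual.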
>Internal metrics teams nearly always have access to complete data. The issue is sharing non-anonymized data externally.
I disagree pretty strongly with this. Do you think that your average Uber rider would be OK with Uber employees analyzing their ride patterns (with their real names attached) to try to figure out where and when they are having sex? Do you think Uber should allow such access to its employees by policy? (It seems we agree that writing a blog post about it is not a great idea.)
This would also explain the spike near the weekend, among other things.
One could do an analysis like this while still working with anonymized data. Still a bit creepy, but not that different from reports and blog posts you see from other startups and tech companies.
Nothing they've done so far, in isolation, is IMO worth the pitchforks being handed out in tech and mainstream consciousness right now, but taken as a whole it's pretty easy to see why people aren't willing to cut Uber any slack or give them the benefit of the doubt.
So yeah, this thing by itself isn't "that bad", but it's one piece of a large puzzle of Uber's misbehavior.
There have been very, very few times when a company's webpage was down and I needed to go to google-archive or archive.org to refer to some innocuous information. However, the times that I've used those sites to gather evidence of possible whitewashing? Many, many times, in comparison.
OKCupid is a dating website which deliberately branded themselves as further on the "edgy" and "hookup" side of dating websites. Then you have POF somewhere in the middle, with eHarmony way on the other side, quite opposite of OKCupid.
I'm not sure why Uber would want to put themselves anywhere on that same scale (i.e. aligning your brand with notions of sex and one night stands). There's a time and a place for everything, and for edgy data analysis like this -- that "place" is edgy dating websites who want to be known for hooking up.
It's unprofessional and out of line with their brand image, which is obviously why the post got deleted. IMO this further validates all the bad press the media has been publishing about Uber.
https://web.archive.org/web/20140827195709/http://blog.uber....
Note that both of these posts had been up for years and only disappeared in the last few days.
"uberdata-how-prostitution-and-alcohol-make-uber-better/"
So if you took an Uber to some bar/club/friends at 10-11pm and again after 2am, when all the bars and the T are closed, you're likely counted. I doubt this represents customers having one-night stands; it's likely just a heat map. This is further explained by the small pocket in Somerville that is not accessible by train, only by bus, where people may opt for an Uber.
That's not to say that there are no rides of glory or whatever the hell kids call it today.
Would Google publish data that shows how searches for porn spike during different times of the day or times of the year, as if it's some "cool and hip and edgy!" insight?
I don't think so.
And for the same reason they don't (whatever reason that is), it would probably also be wise for Uber not to post stuff like this.
I really don't care, nor am I offended. I'm just speculating that Uber doesn't have the brightest team of execs and still has a lot of "growing up" to do.
Google have been fighting a public relations war for a long time now to not appear creepy or stalkerish. I can think of few things they could blog about to make people consider not using Google more than "we know when you're looking for porn".
Uber have not (yet?) been widely called out as being creepy the way Google have. But Uber have data that can be every bit as personal as your search history, and posts like these make it obvious that people at Uber are thinking hard about putting those data to use.
There's a lot lurking under what at first glance appears to be merely a poorly considered, sophomoric post.
It's the actions of the unscrupulous minority that ruin this for the rest of us. I personally believe that most of the time when companies say "We simply aren't that interested in you." they're probably telling the truth. Stats is pointless if you look at single points. It only takes one person to snoop on an ex or to blow everything up. Unfortunately you have to mitigate that risk, but proper database sanitisation before handing over to the analysts should be sufficient. Provided there is no overlap between the sensitive database and the one the analysts have access to there shouldn't be a problem.
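As a rough idea of what that sanitisation step could look like, here is a minimal sketch. The column names (`name`, `email`, `phone`, `rider_id`) and the `sanitize` helper are hypothetical, and the keyed hash (HMAC-SHA256) is just one common way to produce a stable pseudonym; this is not a description of any company's actual pipeline.

```python
import hashlib
import hmac

# Hypothetical PII columns to drop before analysts see a record.
PII_FIELDS = {"name", "email", "phone"}


def pseudonymize(rider_id: str, secret_key: bytes) -> str:
    """Replace a rider ID with a keyed hash; the key stays outside the analytics DB."""
    return hmac.new(secret_key, rider_id.encode("utf-8"), hashlib.sha256).hexdigest()


def sanitize(record: dict, secret_key: bytes) -> dict:
    """Return a copy of the record with PII removed and the rider ID tokenized."""
    clean = {
        key: value
        for key, value in record.items()
        if key not in PII_FIELDS and key != "rider_id"
    }
    clean["rider_token"] = pseudonymize(record["rider_id"], secret_key)
    return clean
```

Analysts can still group rides by `rider_token`, but cannot look up a person without the key. One caveat: stripping names does not by itself make location traces anonymous, since pickup and drop-off points can identify someone on their own.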
I guess it's a side effect of becoming 'big' that you can no longer run these kinds of public posts without looking extremely unprofessional.
Does it really matter these days?
There was a related story published recently: "NYC Taxicab Dataset Exposes Strip Club Johns and Celebrity Trips"
http://research.neustar.biz/2014/09/15/riding-with-the-stars...
We are watching them pretty close, aren't we?
Is this data fascinating? I guess the time of year patterns and holiday anomalies are interesting but aside from that this behavior seems obvious?
Now that they have critical mass, they can transition into "full boring corp speak".
HN, don't throw stones: what boundaries are you pushing to get traction right now?
We're not pushing ethical boundaries.