The Real ‘Stuff White People Like’

(blog.okcupid.com)

204 points · dfreidin · 15y ago · 82 comments

powrtoch15y ago

On the proper use of this data:

Perhaps the easiest way to think of it is that the phrases are predictors for the race/sex, not the other way around. For example, you shouldn't expect every white male you meet to like Van Halen. However if someone says to you "I have a friend who's a big Van Halen fan", you're pretty safe in assuming that the friend is a white male.

Likewise, it might be that only 10% of blacks like soul food. But if almost no other demographics like it, it will still show up high on their list. So "is black" does not strongly imply "loves soul food", but "loves soul food" does strongly imply "is black".

In other words, http://en.wikipedia.org/wiki/Bayes_theorem
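A rough worked example of the asymmetry described above, with made-up numbers (the 10% group share and the 0.2% rate among everyone else are assumptions for illustration, not OkCupid figures):

```python
def posterior(prior, p_phrase_given_group, p_phrase_given_others):
    """Bayes' theorem: P(group | phrase) from P(phrase | group)."""
    evidence = (prior * p_phrase_given_group
                + (1 - prior) * p_phrase_given_others)
    return prior * p_phrase_given_group / evidence

# Hypothetical inputs: 10% of users are in the group, 10% of the group
# lists the phrase, and 0.2% of everyone else lists it.
p_group_given_phrase = posterior(
    prior=0.10,
    p_phrase_given_group=0.10,
    p_phrase_given_others=0.002,
)
print(f"P(phrase | group) = 0.10, P(group | phrase) = {p_group_given_phrase:.2f}")
```

Under these assumed inputs, only one in ten group members lists the phrase, yet most people who list it belong to the group.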

lurchpop15y ago

A commenter on the post made a good point about the "soul food" term: it's also the name of a movie, a book and a TV series, so it ends up being 3 (probably black) data points all counting towards 1.

moultano15y ago

For that reason, I'd be happier if they sorted by KL Divergence instead of just log-odds. That'd give a much better tradeoff between commonality and predictive power.

lazyjeff15y ago

You have a valid point, but I'm guessing they avoided KL divergence because 1) it's harder to explain, 2) it's not symmetric, e.g. kl(asian, white) != kl(white, asian), and 3) it needs a smoothing function for comparing distributions where not every element is in both distributions.
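Points 2 and 3 can be seen in a small sketch. The phrase counts are invented, and add-one (Laplace) smoothing is just one possible choice of smoothing function, not something the post says OkCupid used:

```python
import math

def smooth(counts, vocab, alpha=1.0):
    """Add-alpha smoothing so every phrase in vocab has nonzero probability."""
    total = sum(counts.get(w, 0) for w in vocab) + alpha * len(vocab)
    return {w: (counts.get(w, 0) + alpha) / total for w in vocab}

def kl(p, q):
    """KL(p || q) in nats; assumes q[w] > 0 wherever p[w] > 0."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p if p[w] > 0)

# Hypothetical phrase counts for two groups; "soul food" is missing from
# group_b, which is why unsmoothed KL(group_a || group_b) would be infinite.
group_a = {"soul food": 30, "sushi": 5, "van halen": 1}
group_b = {"sushi": 20, "van halen": 25}

vocab = sorted(set(group_a) | set(group_b))
p = smooth(group_a, vocab)
q = smooth(group_b, vocab)

kl_pq = kl(p, q)
kl_qp = kl(q, p)
print(f"KL(a || b) = {kl_pq:.3f}, KL(b || a) = {kl_qp:.3f}")
```

The two directions give different values, so any ranking built on KL divergence has to pick a direction (or symmetrize), and the result depends on the smoothing constant.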

patio1115y ago

There's just so much that they execute well on that I hate to pick any bit of it, but one thing everybody with linkbait should probably do is create something spiritually similar to the bar which pops up when you're done with the article. It is a force multiplier for all pillar content you write, it increases the viral factor, and the way it grabs someone's attention just when their brain is known to be vacant is sixteen flavors of brilliant.

I did something very similar for a client today, and after I get a little better at manipulating code to do it, I'm probably going to try something similar for getting trial signups. ("Looks like you're done reading about it. Feeling confused about what to do next? WHAM, signup box.")

tuesdays15y ago

Interesting, I was thinking the exact opposite, since those bars are all super annoying.

patio1115y ago

So there are very few things I would be less likely to read than "47 Shocking New Ways To Please Your Man In Bed", but on an intellectual level I understand that that formula has made Cosmo more money than I will ever see in my lifetime.

You are welcome to your preference, of course, but the relevant question for my client is "Do tuesdays and people like him control a lot of links? Would he link to the article in the absence of this widget? Is he going to refrain from linking to the article now? Will the aggregate number of links we lose offset the massive gain we are expecting to get [and are capable of measuring] from less-opinionated users?"

If the answer to any one of those questions is "no", your preferences are not economically relevant to my client. (Similarly, in the context of doing software trials with a similar mechanism, I am going to go out on a limb and say the intersection of "hates Javascript tomfoolery like this" and "pays money for software" is so low that I am incapable of devising a system precise enough to measure it.)

xiongchiamiov15y ago

Oh wow, that is annoying.

I do like the pop-out bars that the NYT has at the end of their articles, though. See, for example, http://www.nytimes.com/2010/09/07/business/economy/07jobs.ht...

Perceval15y ago

I'm the same way. Whenever I encounter one, I tool around in Adblock Plus for a while until I find the appropriate scripts to block.

theycallmemorty15y ago

I too was intrigued by them and thought to myself just a week ago: "Those popup-at-the-bottom-of-the-page things are brilliant. I should email the Bingo guy and get him to A/B test them."

patio1115y ago

I am rather more receptive to receiving that email than most companies would be, and if it comes with an offer to create MIT-licensed Javascript/CSS to make the UI happen, I'd probably have it live within an hour of receiving the code.

That same test has been on my agenda since at least April, but front end is my weakest skill, so I keep pushing it back. I've got two partial implementations where I tried to follow tutorials and they just blew up on me. (I just started scheming and dreaming about hire #1 yesterday, and it will almost certainly be a front end guy.)

yalurker15y ago

Interesting read, though a couple of things come to mind: how do "the people who have OK Cupid profiles" differ from "the general population"? I suspect several things are skewed because of the population bias.

Also, they say most of their users are urban, but I'm curious if people aren't prone to list themselves as the nearest big city rather than where they really live. For instance, I suspect everyone within 45 minutes of Des Moines is listing themselves as living there, rather than the tiny farm town / suburb they really live in.

steveklabnik15y ago

Generally, OKC's demographics are perceived as being college age, liberal, middle class, and alt friendly.

I'm not sure they've ever directly posted demographic data, but that's their perceived demographic, at least.

JohnnyBrown15y ago

Agreed. The white male info made me think "divorced 30-45 yo professional type" quite distinctly. (I'm a young white guy.) Also, I bet this thread would get seriously contentious if not for the fact that the HN audience was depicted quite well there (based on my imaginary demographic profile of Hacker News).

cullenking15y ago

I found this, like most of the other blog posts by the OK Cupid team, pretty genius. I am glad the people who sit on top of this goldmine of social information have a good sense of humor.

It would be cool to see a statistician guest post. The OK Cupid people are great at coming up with ideas for analysis, but I'd love to see some solid stats behind some of their analyses.

dnautics15y ago

It's interesting, I am asian, like soul food, but it's not something that would occur to me as putting on my profile. Similarly, I would write sashimi (if I ate it anymore) versus sushi, and I suspect that non-asians like sashimi just fine but wouldn't know to put it on... So the statistics point to self-cultural broadcasting, I think, more than preferences.

xiongchiamiov15y ago

I didn't know what "soul food" meant until I just looked it up on Wikipedia. Looking over the list[0], I see a number of things I greatly enjoy, but I'd normally just call it "food", or possibly "Southern food".

I'm from the lower part of California's central valley, which has a distinctly Southern influence due to the dustbowl migration. My town is roughly 40% Hispanic, 45% White, and 5% Pacific Islander, so if the term is only widely used among the black population, it's no wonder I've never heard it.

[0]: http://en.wikipedia.org/wiki/List_of_soul_food_items

dnautics15y ago

The term is widely used among the population, not just the black population, in the mid-Atlantic up to New York, Chicago, and the South.

Sometimes soul food is conflated with Jamaican food (jamaican soul) but for the most part when I think soul food, I think, okra, ham hocks/hambacks, collard and turnip greens, fried chicken, catfish, biscuits, gravy, cornbread, etc. I can't vouch for all the stuff in that wikipedia list, some of it I thought huh?

Since I stopped eating meat except for one day a week, a lot of that has gone out the window.

oiuygtfrtghyju15y ago

On the other hand this is a data point.

Knowing that the stuff you like with fish is called "sashimi" and the rice stuff is called "sushi" is a good indicator that you are asian.

Just like knowing that different types of cheese should actually taste different is a good indicator that you are european.

Natsu15y ago

> Knowing that the stuff you like with fish is called "sashimi" and the rice stuff is called "sushi" is a good indicator that you are asian.

Probably true statistically, but there are plenty of non-Asians like me who know the difference (sashimi is the raw fish, sushi is that which has sushi rice). But I'll never show up on their tests because I don't have an OKCupid profile and neither food is a favorite (they're okay, but definitely not favorites).

ovi25615y ago

>Just like knowing that different types of cheese should actually taste different is a good indicator that you are european.

This is what prolonged exposure to soylent green does to you.

Just kidding, we love Americans.

eru15y ago

Continental European. (OK, the British have Stilton.)

reader500015y ago

Those results are highly odd, but I don't think okcupid is mainstream enough to really glean any insight into racial psychology (if such a thing exists) from their data. Furthermore, they don't provide enough info on their analysis method, but I would be interested in seeing the results of a null-run: randomly assigning profiles to groups (rather than by race) and seeing what "statistically distinct" phrases arise (if the analysis is valid no phrases should arise).

It would also be interesting to see them do the same analysis for other features such as height, income, photo attractiveness, etc.

Similar analysis for craigslist personals by city: http://blog.kiwitobes.com/?p=42
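The null-run idea could be sketched roughly as follows. The profiles, labels, and the `top_ratio` helper are all invented for illustration, and per-profile phrase counting is an assumption about how the frequencies would be computed:

```python
import random
from collections import Counter

def top_ratio(profiles, labels, group):
    """Largest (in-group frequency / overall frequency) over all phrases."""
    overall = Counter(ph for prof in profiles for ph in set(prof))
    in_group = Counter(
        ph for prof, lab in zip(profiles, labels) if lab == group
        for ph in set(prof)
    )
    n_all = len(profiles)
    n_grp = sum(1 for lab in labels if lab == group)
    return max((in_group[ph] / n_grp) / (overall[ph] / n_all) for ph in overall)

# Toy data: half of group "a" says "alpha", half of group "b" says "beta".
profiles, labels = [], []
for i in range(100):
    profiles.append(["music"] + (["alpha"] if i % 2 == 0 else []))
    labels.append("a")
    profiles.append(["music"] + (["beta"] if i % 2 == 0 else []))
    labels.append("b")

real = top_ratio(profiles, labels, "a")

# Null run: shuffle the labels so group membership carries no information.
rng = random.Random(0)
null_ratios = []
for _ in range(200):
    shuffled = labels[:]
    rng.shuffle(shuffled)
    null_ratios.append(top_ratio(profiles, shuffled, "a"))

print(f"real top ratio: {real:.2f}, largest null top ratio: {max(null_ratios):.2f}")
```

If the real top ratios sit well above anything the shuffled labels produce, the phrases are unlikely to be noise. With real profile text, very rare phrases can reach large ratios by chance, so a minimum-count cutoff would probably be needed before comparing.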

narkee15y ago

They do have an enormous sample size. Much, much larger than any rigorous scientific studies based on getting undergrads to fill out surveys.

They've done similar analyses for things like self-reported heights, and photos:

http://blog.okcupid.com/index.php/the-biggest-lies-in-online...

http://blog.okcupid.com/index.php/dont-be-ugly-by-accident/

reader500015y ago

Not really sure why this is getting upvoted and mine down, but it doesn't matter how large their sample size is since it's restricted to people who voluntarily sign up for an online dating website, specifically okcupid. As okc becomes more mainstream the data is more valid for the general population. But sample size is irrelevant here.

ttdi15y ago

I'm really surprised by how they didn't mention one of the most striking results from the data: Latinos on OKCupid are much more likely to have the word "stationed" in their profile than other demographics. Based on this, it looks like the military contains a large proportion of Latinos ("stationed in [location]"). What are the demographics of the military versus the general population?

oiuygtfrtghyju15y ago

According to http://www.armyg1.army.mil/HR/docs/demographics/FY05%20Army%...

Army: White 63.9%, Black 19.0%, Hispanic 10.3%, Asian 3.8%

USA: White 75%, Black 12%, Hispanic 15%, Asian 4%

So Hispanics are underrepresented in the army and blacks are overrepresented - of course you would have to balance these by age profile for each group and also consider US overseas territories that can join the army.

haroldp15y ago

Maybe we looked at the wrong service, since the latino profile also mentioned "marines":

http://en.wikipedia.org/wiki/File:HispanicMilitary.jpg

acon15y ago

Really interesting, but isn't this more about what white people want other people to think that they like, rather than what they actually like?

It would be interesting to compare this to what they actually like, but I have no idea how to get that data.

josefresco15y ago

Facebook maybe? And I don't mean pulling information from a profile but rather gleaning clues from status updates and comments. I would love to see a public vs private list of words these people would assign to themselves.

reader500015y ago

Eh their analysis method is not too hot. From the comments section:

"The phrases included in the black boxes are the top 50 phrases most statistically correlated to that group. We calculated this as follows:

1. We calculated the frequency of every 1, 2, and 3 word phrase for the whole population.
2. We calculated those same frequencies within each race/gender pair.
3. For each phrase, we divided #2 by #1.
4. This is the propensity of a given group to use a given phrase.
5. The list you see is the phrases with the 50 highest ratios of #2/#1."

So even if a group uses a phrase 1.001x more than the population average, it might still be listed, if there are no actual phrase-usage differences (i.e., all phrase ratios will be small, and the top 50 will be arbitrary).
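For concreteness, the quoted steps might look something like the sketch below. The function names, the toy profiles, and the choice to count each phrase once per profile are assumptions; the quoted description does not say how OkCupid tokenized essays or counted repeats:

```python
from collections import Counter

def ngrams(text, max_n=3):
    """All distinct 1-, 2-, and 3-word phrases in a text (naive whitespace split)."""
    words = text.lower().split()
    return {
        " ".join(words[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(words) - n + 1)
    }

def top_phrases(profiles, group, top_k=50):
    """profiles: list of (group_label, essay_text). Returns (phrase, ratio) pairs."""
    overall, in_group = Counter(), Counter()
    n_all, n_grp = len(profiles), 0
    for label, text in profiles:
        grams = ngrams(text)
        overall.update(grams)            # step 1: whole population
        if label == group:
            in_group.update(grams)       # step 2: within the group
            n_grp += 1
    # step 3: in-group frequency divided by population frequency
    ratios = {
        ph: (in_group[ph] / n_grp) / (overall[ph] / n_all)
        for ph in in_group
    }
    # step 5: keep the highest ratios (ties broken alphabetically)
    return sorted(ratios.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]

# Hypothetical toy profiles, two per group.
profiles = [
    ("a", "soul food and jazz"),
    ("a", "soul food"),
    ("b", "sushi and jazz"),
    ("b", "van halen"),
]
result = top_phrases(profiles, "a")
for phrase, ratio in result:
    print(f"{ratio:.1f}  {phrase}")
```

Note that nothing in this procedure sets a floor on the ratio: the top 50 are returned whether the best ratio is 30 or 1.001.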

byrneseyeview15y ago

They're talking about the top fifty such phrases. It seems unlikely that there would be a demographic group that is proportionately less likely to use any arbitrary phrase. The only possibility is for a group's phrase usage to be statistically indistinguishable from the average.

Fortunately, we can perform a sanity check: read some of the phrases to someone, and ask that person which group they think the phrases came from. I bet people will guess with high enough accuracy to establish that it's nonrandom.

reader500015y ago

I'm not saying the results were random. I'm saying they're not really allowing for a "confidence interval" in how many more times a certain group uses a phrase than average. For example if black men use "soul food" 30x greater than average that seems like a solid result. But if it's only 1.01x more than average that seems like noise.

blahedo15y ago

If they only use a phrase 1.001x more than the population average, it's going to be much further down on the list than the top 50, unless they use nearly every phrase less than the average (which, barring some extremely pathological outlier cases, is impossible).

moultano15y ago

>Basically this is just noise, and a null-run (random group assignments) would produce just as valid results.

You didn't substantiate this claim.

superk15y ago

I thought it was interesting how the largest countries aren't the most nationalistic - no Brazil, Mexico, China, Japan, etc. I also came away with an identity crisis - #1 good food (Soul) and seeing Mos Def, Lupe Fiasco and Talib Kweli in the top "stuff"... dammit I might be black.

Lukeas1415y ago

As a black man it was refreshing to see a "description" of my demographic that I actually related to, as opposed to a stereotype-based one that you might find on BET.

dasil00315y ago

I liked that Mos Def and Lupe Fiasco beat out Kanye.

alexophile15y ago

I doubt I'm alone here, but when confronted with the "insert fucking theory" I promptly went through the list of what white people like, inserting "fucking" anywhere I could...

"Groundhog Fucking Day" kind of left a bad taste in my mouth.

colinprince15y ago

Re: "I am cool" being in the top phrases. Could be used in a sentence, like "I am cool with that".

Just saying.

sjtgraham15y ago

I am white and like none of those things other than guitar and software. I suspect the case is similar for most of the HN readership.

This should be called "what the lowest common denominator likes".

dasil00315y ago

It struck me as sort of funny that the minorities each had a common self-description (cool, funny, simple), but the closest thing white guys have is "I'm a country boy".

mkyc15y ago

http://stats.grok.se/en/201009/Soul_food

proemeth15y ago

Judging by the stats (The Red Sox), we have strong US East Coast bias (due to user base).

Lukeas1415y ago

I noticed this as well and came to the same conclusion, considering I've never met a single Red Sox fan on the left coast. Although, it could be that east coasters all rally around one team, whereas we're split between the Dodgers and the Giants, which would present a false bias.

lanstein15y ago

You have now.

awongh15y ago

seventh word for asian males: software developer....

ohashi15y ago

Yeah I found that a bit strange... mechanical engineer was there too... Also, no Japanese? Many of the other major asian countries were broadcasting.

ohashi15y ago

a software engineer #3 for indians.

jbooth15y ago

Yeah Sox!!

I'm taking this as scientific evidence that liking the Red Sox makes people more attractive.

bmelton15y ago

Considering this is based on profiles from users at a dating site, wouldn't the inverse be more likely?

steveklabnik15y ago

While normally, I'd agree with you, OKC has a high poly and hookup population, so normal relationship scarcity constraints do not necessarily apply.

jbooth15y ago

Well, one could say it's perceived as attractive, since they're putting it on their profiles.

Of course, I wasn't really attempting to make a serious point. And the Sox are in a distant 3rd in the AL East right now anyways, which presumably explains the downvotes :)

aristus15y ago

That last stat about reading level bothers me. "Ok, before anyone gets offended about reading level vs race, let's show you a stat that confirms another stereotype: religious people are stupid! And atheists are smartest of all! Scientifically proven with a reading test based on the lengths of words, and metrics I just made up. And don't worry that almost half of the data points belie my analysis, ha ha ha, it confirms your prejudices, so it's ok!"

untamedmedley15y ago

If the goal of black and latino OKC members is to attract members of the opposite sex within their race (which is most likely the case), then it would make sense that they would communicate in their own vernacular.

That is, a black person who is otherwise educated might use a phrase that makes perfect sense to other black people (e.g. "where dey do dat at?"; often used to express confusion at someone's ridiculous behavior) but that isn't clear to the mainstream. Latinos may pepper their profiles with Spanish words.

Asians might do this too if there wasn't so much ethnic/linguistic diversity among the Asian American population. As such, they likely use "safe" mainstream wording.

All this is to say that there are reasons that have nothing to do with intelligence that could have caused this sort of result.

xiongchiamiov15y ago

My technical writing professor always told us to try and reduce the complexity of our writing (I think she used Flesch-Kincaid). Perhaps more of us white people have had similar teachers, and took their advice to heart.*

* Unlikely, but hey, I might as well bring it up, since we're talking about flaws in the analysis.

superk15y ago

> Scientifically proven with a reading test based on the lengths of words, and metrics I just made up

Actually they're using the Coleman-Liau Index, a computer readability formula going back to 1975.
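For reference, the standard Coleman-Liau formula is CLI = 0.0588 L - 0.296 S - 15.8, where L is letters per 100 words and S is sentences per 100 words. A minimal sketch follows; the regex tokenization is a naive assumption (it splits contractions, for instance) and is not necessarily how OkCupid counted:

```python
import re

def coleman_liau(text):
    """Approximate Coleman-Liau grade level for a non-empty English text."""
    words = re.findall(r"[A-Za-z]+", text)
    letters = sum(len(w) for w in words)
    # Count runs of terminal punctuation as sentence ends; at least one sentence.
    sentences = len(re.findall(r"[.!?]+", text)) or 1
    letters_per_100 = letters / len(words) * 100
    sentences_per_100 = sentences / len(words) * 100
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8

simple = "The cat sat. The dog ran."
dense = "Statistical readability formulas approximate comprehension difficulty."
print(f"simple: {coleman_liau(simple):.1f}, dense: {coleman_liau(dense):.1f}")
```

The score depends only on word length and sentence length, so it says nothing about what the text actually argues.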

aristus15y ago

The metric about adherence was made up and without explanation of how they categorized people. The Coleman-Liau Index is based on lengths of words, which proves... what, exactly? Many of the "most" religious people scored higher than the ones rated "meh". Also, agnostics are in the middle of the pack.

So what does this chart predict? That certain self-selected religious categories in their dataset correlate with word length in the essays. Is that good statistics, or a confirmation of prejudices, i.e., that religious people are less intelligent?

With all of those charts they passed over with "no comment", they saw fit to make a joke about a "Comic Sans Bible".

It bothers me because it is statistically shaky and baldly prejudiced. I would love to have a conversation about that, and I wish people would engage instead of downvote.

meelash15y ago

At least for Muslims, there is an obvious flaw since Islam forbids dating. So people that self-identified as "very serious" are pretty much by definition not that. And since it is, among all the practices of Muslims, one that is popularly known and probably even overemphasized (relative to other practices), a large population of those that self-identified as "somewhat serious" are probably guilty of being a bit generous with themselves.

At the end of the day, of course, correlation does not imply causation. If religious people get their panties in a bunch and become overly defensive, or irreligious people start gloating, both are simply demonstrating that their dogma is overriding their intelligence.
