Build an algorithm to predict friendships, then actually use it to meet people

(joingrouper.com)

60 points · waxman · 12y ago · 37 comments


avalaunch · 12y ago

This is a cool challenge, but the prize is definitely lacking. I think anyone capable of writing an algorithm of the caliber you're looking for is unlikely to participate. I could be wrong, but I think you're going to have to pony up some serious cash to get developers to take this seriously. Or you could go the more standard route and just hire someone to do the job.

chegra · 12y ago

I would do it cause I'm curious.

Edit: Now looking at the dataset, I wouldn't be able to use the model I developed personally.

benhamner · 12y ago

Ping me (b@kaggle.com) if you're interested in running this competition more formally on https://kaggle.com.

We've run hundreds of machine learning competitions & offer a real-time leaderboard to encourage competitive participation, a very active community of data scientists, and many other features that simplify running this type of challenge.

mariusz79 · 12y ago

So basically they are asking people to build them an algorithm that will be a critical part of their business, in exchange for a free service that will be based on this algorithm. Right...

nottombrown · 12y ago

Sorry if this was unclear. You own any code that you write for this competition.

The prize is that we'll use your algorithm to validate any matches that you go on. If that doesn't seem worthwhile to you, feel free to pass on this contest.

Aloisius · 12y ago

Do you allow closed-source entries? Rewriting an algorithm implemented in someone else's code to avoid copyright infringement is trivial, not to mention inevitable, given the performance-requirement differences between a contest and a production site.


TrevorJ · 12y ago

So you won't use the winning solution in your product?

angersock · 12y ago

Yep, basically.

EDIT: ah fuck it, I might hack on this dataset anyway. If I get a beer and a date out of it, I'll help fiddle while Rome burns.

mjmahone17 · 12y ago

This is interesting, but given your parameters (predict the most friendships), all you're technically asking for is recall. I'll write an algorithm that has 100% recall: predict that all people become friends with each other.

If this is really a competition (and not just "Here, have fun with our dataset!"), you need to define the rules a little bit more clearly. How are you weighing recall vs. precision? Or are you just looking at % correct labels, where the only two labels possible are "FRIENDS" and "NOT FRIENDS"?
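To make mjmahone17's point concrete, here is a minimal sketch with made-up labels (1 = FRIENDS, 0 = NOT FRIENDS). The "everyone becomes friends" predictor gets perfect recall, while its precision only equals the share of pairs that actually became friends. The function names are illustrative, not part of the contest's tooling.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = FRIENDS)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def recall(y_true, y_pred):
    """Share of actual friendships that were predicted."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0


def precision(y_true, y_pred):
    """Share of predicted friendships that actually happened."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if tp + fp else 0.0


# Made-up labels: half the pairs became friends.
y_true = [1, 0, 1, 0, 1, 0]
all_friends = [1] * len(y_true)
print(recall(y_true, all_friends))     # 1.0
print(precision(y_true, all_friends))  # 0.5
```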

nottombrown · 12y ago

Sorry this was unclear. We meant "correctly predict the most friendships."

You get 1 point for each pair whose outcome (became friends or did not) you predict correctly. In the test data set ~50% of pairs became friends, so predicting "everyone became friends" would get 250 points, whereas a perfect algorithm would get 500 points.

I'm updating the README now to make our scoring system more clear.
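Under the rule described above, the score is just the number of correctly labeled pairs, i.e. accuracy multiplied by the number of pairs. A minimal sketch, assuming a hypothetical 500-pair test set split exactly 50/50 (which is what the quoted 250 and 500 point figures imply; the labels themselves are made up):

```python
def score(y_true, y_pred):
    """One point per pair whose outcome (1 = became friends, 0 = did not) is predicted correctly."""
    if len(y_true) != len(y_pred):
        raise ValueError("predictions must cover every pair")
    return sum(1 for t, p in zip(y_true, y_pred) if t == p)


# Hypothetical test set: 500 pairs, exactly half of which became friends.
y_true = [1] * 250 + [0] * 250
print(score(y_true, [1] * 500))  # 250: predict "everyone became friends"
print(score(y_true, y_true))     # 500: a perfect predictor
```

Because a constant prediction already earns the majority-class share of points, a submission only shows real skill to the extent it beats that 250-point baseline.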

glimcat · 12y ago

They're also looking for whether people become friends on Facebook.

The dominant factor here is going to be the rate at which the participants send and accept connection requests on Facebook. Some people send them to everyone they meet, some people never use Facebook.

KPI overfitting, yay!

(The best second-order effect is probably a multi-feature similarity measure between the participants and the person's current Facebook Friends, including graph distance to current Friends. In case anyone is taking a run at this.)
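One way to read glimcat's suggestion, sketched under stated assumptions: each person has a numeric feature vector, the existing Facebook friend graph is available as an adjacency dict, and cosine similarity stands in for the "multi-feature similarity measure" (glimcat doesn't name one). All names and data below are hypothetical.

```python
import math
from collections import deque


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def graph_distance(graph, start, goal):
    """Shortest hop count between two people in an undirected friend graph via BFS; None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr == goal:
                return dist + 1
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return None


def max_similarity_to_friends(candidate, friends, features):
    """Highest similarity between a candidate and any of a person's current friends."""
    return max((cosine_similarity(features[candidate], features[f]) for f in friends), default=0.0)


# Hypothetical toy graph and feature vectors.
graph = {"ann": ["bob"], "bob": ["ann", "cat"], "cat": ["bob"]}
features = {"ann": [1, 0], "bob": [1, 1], "cat": [0, 1]}
print(graph_distance(graph, "ann", "cat"))                   # 2
print(max_similarity_to_friends("cat", ["bob"], features))
```

Both values could then be fed to a classifier as extra features for each pair; how to weight them is left open here.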

thebiglebrewski · 12y ago

This would be a little more fun if there were a cash prize. No offense meant, Groupers look cool, but you'd probably get more participation that way.

easy_rider · 12y ago

But then they could just hire an M.S. in Computer Science?

nottombrown · 12y ago

Hey HN, Grouper founder here. Let me know if you have any questions about the contest.

ddod · 12y ago

This is the sort of thing I'm personally very interested in, and I have some pretty novel ideas for how I'd approach it. That said, I wouldn't participate in this because it clearly devalues the industry. You should really rethink your approach.

Developers who are considering participation in this, I'd suggest you build something for yourself with data acquired elsewhere.

libria · 12y ago

> I wouldn't participate in this because it clearly devalues the industry.

People this may be aimed at:

* Experienced devs in boring day-jobs who are seeking some kind of off-time challenge.

* People just getting into ML who want to solve something real.

* CS students with spare time.

You know more about ML than me, but it doesn't sound like they're looking for a cancer cure; just fishing around for a one-off challenge. Or maybe they're taking names for future interview candidates.

> Developers who are considering participation in this, I'd suggest you build something for yourself with data acquired elsewhere.

Relax, dude. If people think this is an interesting problem to solve, what's that to you?

jameszhang · 12y ago

Honestly, I think this is a very cool challenge. As someone who just went on a Grouper last night in Boston and had a great time, I think I just might participate and submit something. Do you have any limitations on how many people can form a team? Personally, I would pair on this with my roommate. He's the big data guy, and I'm the coder.

nottombrown · 12y ago

Feel free to work together as a team. Glad that you guys had fun last night :D

JFoss117 · 12y ago

A few questions about the data:

1. How is it collected? From a survey, or grabbed from user FB profiles?

2. What is the platinum albums variable? Maybe the number of platinum albums that the user likes on FB??

3. Why are there some "male" entries in the f_gender column, and some "female" entries in the m_gender column?

nottombrown · 12y ago

1. The data is collected from the user's FB profile or comes from our internal ratings.

2. The platinum_albums header is just a joke; we anonymized the data.

3. Thanks for pointing that out. There was a bug with a few rows that is now fixed.

yankoff · 12y ago

Why didn't you guys want to run this competition on Kaggle? That could get it more attention from data scientists.

streptomycin · 12y ago

Is there more description of the data anywhere? Like what does having an "f_number_of_pets" of 7.5 mean?

mkwng · 12y ago

I just noticed in the FAQ it states, "...several fields have been renamed of course." If I'm understanding this correctly, any real-world conclusions you draw will be completely meaningless, as we're essentially working from a mislabeled dataset.


murtali · 12y ago

How is the submitted code used post contest?

chegra · 12y ago

Where do you submit your results?

maxk42 · 12y ago

I'll be the first to say it: Your data is either incorrect, arbitrary, or we're missing some information here.

Why does everyone have 7.5 to 8 siblings, 7.5 to 8 "weekly workouts", and 7.5 to 8 platinum albums?

maxk42 · 12y ago

Specifically, you should explain all the columns, including:

- Is the height column the person's height in inches?

- What does the asterisk in certain column-names indicate?

- Why do the pets, platinum_albums, weekly_workouts, number_of_siblings and pokemon_collected values seem to fall in the range of 7 - 8?
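A quick way to check the clustering maxk42 describes is a per-column min/max/mean summary. This is a minimal sketch that assumes the data is a CSV with a header row; the column names are taken from the thread (the real headers may differ, e.g. with asterisks), and the sample rows are made up to mimic the described pattern.

```python
import csv
import io
import statistics


def column_summary(csv_text, columns):
    """Return (min, max, mean) for each named numeric column in CSV text; None if a column has no values."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = {}
    for col in columns:
        # Skip blank cells and missing columns rather than failing on float("") or float(None).
        values = [float(row[col]) for row in rows if row.get(col)]
        summary[col] = (min(values), max(values), statistics.mean(values)) if values else None
    return summary


# Made-up rows mimicking the reported pattern (values clustered between 7 and 8).
sample = "f_number_of_pets,f_weekly_workouts\n7.5,8\n7.8,7.6\n8,7.5\n"
for col, stats in column_summary(sample, ["f_number_of_pets", "f_weekly_workouts"]).items():
    print(col, stats)
```

A narrow band like this is consistent with the columns being rescaled internal ratings rather than literal counts, which the asterisk answer further down suggests.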

Also, this dataset is far too small. There is only a single male-male pair, which won't provide any meaningful signal if gender matters at all.
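Counting pairs by gender composition shows how thin each group is. A minimal sketch, assuming each row is a dict with f_gender and m_gender values; since the thread shows those columns don't always line up with the prefix, the pair is sorted so column order doesn't matter. The rows here are made up.

```python
from collections import Counter


def pair_composition(rows):
    """Count pairs by gender composition, ignoring which column each member appears in."""
    return Counter(tuple(sorted((row["f_gender"], row["m_gender"]))) for row in rows)


# Made-up rows: two mixed pairs (one with swapped columns) and one male-male pair.
rows = [
    {"f_gender": "female", "m_gender": "male"},
    {"f_gender": "male", "m_gender": "female"},
    {"f_gender": "male", "m_gender": "male"},
]
print(pair_composition(rows))
```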

I would also argue that it's not the best set of metrics to use to determine whether people will become friends. Age and facebook_friends_count might give you some hints, but I seriously doubt that shoe size has as big an impact on the potential for friendship as, say, common interests, shared culture, income class, or other socioeconomic factors.

nottombrown · 12y ago

The headers with asterisks are intentionally mislabeled. Updated this to be more clear in the README.

JFoss117 · 12y ago

You write in the README that the mislabeled columns are "from our internal ratings". Can you give any more definite sense of what this means? What kind of things are these ratings based off of? What are they designed to reflect? How are they computed (roughly)?

chegra · 12y ago

Mutual Information for the fields:

I(f_facebook_friends_count,members_became_friends) = 0.117320113379

I(m_facebook_friends_count,members_became_friends) = 0.113972809724

I(m_facebook_photos_count,members_became_friends) = 0.0449092782303

I(f_facebook_photos_count,members_became_friends) = 0.0426531483254

I(m_shoe_size,members_became_friends) = 0.00276175766018

I(m_height,members_became_friends) = 0.00255043390135

I(f_shoe_size,members_became_friends) = 0.00233148724025

I(m_age,members_became_friends) = 0.00198005768283

I(f_height,members_became_friends) = 0.0013606978915

I(m_weekly_workouts,members_became_friends) = 0.00123271513215

I(f_age,members_became_friends) = 0.00122660347743

I(m_platinum_albums,members_became_friends) = 0.00111710129455

I(f_number_of_pets,members_became_friends) = 0.00108593667378

I(f_pokemon_collected,members_became_friends) = 0.000880040104571

I(m_number_of_siblings,members_became_friends) = 0.000830295252089

I(f_platinum_albums,members_became_friends) = 0.000820683185117

I(m_number_of_pets,members_became_friends) = 0.000768855827053

I(m_pokemon_collected,members_became_friends) = 0.000720822383999

I(f_weekly_workouts,members_became_friends) = 0.000620666529567

I(f_number_of_siblings,members_became_friends) = 0.00019278884716

I(f_gender,members_became_friends) = 0.000124279429698

I(m_gender,members_became_friends) = 0.000124279429698
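chegra didn't say how the numeric columns were discretized or which log base was used, so this won't reproduce the numbers above. It's a minimal sketch of the plug-in (count-based) mutual information estimate for discrete variables, in nats, plus a hypothetical equal-width binning helper for numeric columns such as friend counts. On a dataset this small, plug-in estimates tend to be biased upward as the number of bins grows, so the tiny values near the bottom of the list may be indistinguishable from zero.

```python
import math
from collections import Counter


def equal_width_bins(values, k):
    """Map numeric values to k equal-width bin indices 0..k-1 (all zeros if values are constant)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / k
    # The maximum value would land in bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in nats from paired discrete observations."""
    n = len(xs)
    if n == 0:
        return 0.0
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log(p(x,y) / (p(x) p(y))), written with raw counts.
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


# Made-up friend counts and outcomes, binned before estimating MI.
friend_counts = [120, 950, 300, 1400, 80, 700]
became_friends = [0, 1, 0, 1, 0, 1]
print(mutual_information(equal_width_bins(friend_counts, 3), became_friends))
```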

icebraining · 12y ago

This reminds me of http://robrhinehart.com/?p=1005

The fact that the women are depicted as just three pairs of legs doesn't help, though.

joshfraser · 12y ago

Ok, let's make this more interesting. I'll pay $50 to the first person to de-anonymize their training set.

chbg · 12y ago

members_became_friends = 1/(1+ exp(-1297.88087 * f_shoe_size + m_shoe_size * m_facebook_friends_count - 11761.6138))
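The expression above has the shape of a logistic (sigmoid) model, though it reads as tongue-in-cheek. This sketch just evaluates it as posted; the coefficients are chbg's and nothing here suggests they were fit or validated. The function name is hypothetical, and it is written in a numerically stable form so large inputs don't overflow `math.exp`. With coefficients this large, the output is effectively 0 or 1 for almost any input.

```python
import math


def chbg_formula(f_shoe_size, m_shoe_size, m_facebook_friends_count):
    """Evaluate the posted expression:
    1 / (1 + exp(-1297.88087 * f_shoe_size + m_shoe_size * m_facebook_friends_count - 11761.6138))
    """
    # Rewrite as sigmoid(z) = 1 / (1 + exp(-z)), where z is the negated exponent.
    z = 1297.88087 * f_shoe_size - m_shoe_size * m_facebook_friends_count + 11761.6138
    # Branch on the sign of z so math.exp never gets a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


print(chbg_formula(8, 10, 300))   # saturates at 1.0
print(chbg_formula(8, 10, 5000))  # saturates at 0.0
```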
