There is no point antagonising people by guessing information about them wrongly - particularly if it's something they've become sensitised to by it occurring frequently.
If you need to know someone's gender (and largely, you don't), then ask them.
Except, of course, that I am male. My name is used for both genders. The thing completely failed on a few other ambiguous names I tried. I'll second AndrewDucker's opinion—just don't.
The numbers are honest enough to admit that the result is crap in this case - this type of statistical openness should be encouraged.
{"name":"maria","gender":"female","probability":"1.00","count":700}
{ "name": "kay", "gender": "female", "probability": "0.93", "count": 57, "country_id": "US", "language_id": "en" }
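For what it's worth, the two responses above could be consumed roughly like this. A minimal sketch, assuming only the field names shown in those responses; the `usable_guess` helper and both cut-off values are hypothetical and not part of the API.

```python
import json

# Hypothetical cut-offs, chosen only for illustration.
MIN_PROBABILITY = 0.95
MIN_COUNT = 100

def usable_guess(raw):
    """Return the guessed gender, or None if the evidence looks too weak."""
    data = json.loads(raw)
    gender = data.get("gender")
    if gender is None:
        # Unknown names have no gender to report.
        return None
    # The responses above carry probability as a string, so convert it.
    probability = float(data["probability"])
    count = int(data["count"])
    if probability >= MIN_PROBABILITY and count >= MIN_COUNT:
        return gender
    return None

maria = '{"name":"maria","gender":"female","probability":"1.00","count":700}'
kay = ('{ "name": "kay", "gender": "female", "probability": "0.93", '
       '"count": 57, "country_id": "US", "language_id": "en" }')

print(usable_guess(maria))  # female
print(usable_guess(kay))    # None
```

With these made-up thresholds "maria" passes and "kay" does not, which is the point: the response tells you when not to trust it.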
If any service needs to know gender (and I'm having a hard time thinking of times you NEED to know gender - dating sites?), why not just ask? Surely in a situation where you're reliant on having accurate gender information, guessing from $firstname and getting it wrong is worse than asking.
Male homepage: Die Hard, Star Wars, Bridget Jones.
Female homepage: Bridget Jones, Twilight, Star Wars.
Both males and females are shown primarily movies they are more likely to be interested in and your bounce rate goes down.
This is Hacker News. Such enlightened thought is frowned on by our new brogrammer overlords. Here's your beer.
If you wanted to deride the fact that many folks here won't spend multiples of effort on special, experimental, no-right-answers-and-likely-to-be-criticized-for-it-if-you-even-try cases that affect minuscule fractions of their potential user base, well...get in line behind the IE5 advocates, I guess.
Someone recommends using a free form entry for gender. No amount of normalization will fix the "ham sandwich" entries (except that we know they are nearly all male), so you'd trade the integrity of a small percentage of your data for the appearance of "making an effort" for the vanishingly small percentage. Net fail.
Just to be clear, my primary feeling here is that -- in the hypothetical case where gender matters -- you're best served by keeping it simple: (female | male | other/it's complicated | prefer not to answer). This should serve all cases equally.
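As a rough sketch of that four-option field (the `Gender` and `parse_gender` names are hypothetical, and mapping unrecognised input to "prefer not to answer" is my own assumption):

```python
from enum import Enum

class Gender(Enum):
    FEMALE = "female"
    MALE = "male"
    OTHER = "other/it's complicated"
    PREFER_NOT_TO_ANSWER = "prefer not to answer"

def parse_gender(form_value):
    """Map a submitted form value to a Gender member.

    Anything that is not one of the four options is treated as
    'prefer not to answer' (an assumption made for this sketch).
    """
    try:
        return Gender(form_value)
    except ValueError:
        return Gender.PREFER_NOT_TO_ANSWER

print(parse_gender("female"))        # Gender.FEMALE
print(parse_gender("ham sandwich"))  # Gender.PREFER_NOT_TO_ANSWER
```

A closed set like this stays trivially normalisable, which is the trade-off against a free-form field.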
Do you simply add some extra genders? Male-to-female transsexual, female-to-male transsexual, intersex? No matter how many categories you add, you'll always annoy someone for missing them out. Do 'genderqueer' and 'genderfluid' count as the same category, or different ones?
Maybe just add a free text form for people to input their gender? But then it's impossible to normalise if you want to do any analysis.
Maybe we should just be enlightened and ignore gender altogether? But sometimes knowing your user's gender is really important, and it seems weird to discard this data because some people don't fit. Maybe the best compromise is simply to have 3 categories - male/female/other - though even then you'll get complaints. "Who are you calling 'other'?"
Anyone have any other thoughts?
PS: I seem to see way more people in tech complaining about brogrammers than actual brogrammers.
A better approach, in the absence of more complex models, would be to use Laplace's sunrise formula.
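For anyone who hasn't met it: Laplace's sunrise formula (the rule of succession) estimates the probability as (s + 1) / (n + 2) after s successes in n trials. A rough sketch follows; treating the API's `count` as n and `probability` times `count` as s is my assumption about how the two fields relate, and the function names are made up.

```python
def rule_of_succession(successes, trials):
    """Laplace's estimate: (s + 1) / (n + 2)."""
    return (successes + 1) / (trials + 2)

def smoothed_probability(probability, count):
    """Smooth a reported probability using the number of observations.

    Assumes `count` is the number of trials and `probability * count`
    the number of successes (an assumption, not documented behaviour).
    """
    successes = round(probability * count)
    return rule_of_succession(successes, count)

print(round(smoothed_probability(1.00, 700), 3))  # 0.999
print(round(smoothed_probability(0.93, 57), 3))   # 0.915
print(round(smoothed_probability(1.00, 1), 3))    # 0.667
```

The last line is the interesting one: a single observation no longer yields a "certain" 1.00.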
She isn't the only one either; there are hundreds of them who took their name from a Catholic saint.
http://api.genderize.io/?name=eloi&language_id=ca
http://api.genderize.io/?name=tomeu&language_id=ca
http://api.genderize.io/?name=rigoberta&language_id=es
http://api.genderize.io/?name=presentaci%C3%B3n&language_id=...
Credit for distinguishing between names in languages, though! Joan returns female in English, but male in Catalan.
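If you want to try the per-language lookups yourself, the query strings can be built like this. A small sketch that assumes only the `name` and `language_id` parameters visible in the URLs above; `genderize_url` is a hypothetical helper, and nothing here actually calls the service.

```python
from urllib.parse import urlencode

BASE_URL = "http://api.genderize.io/"

def genderize_url(name, language_id=None):
    """Build a lookup URL, percent-encoding non-ASCII names as UTF-8."""
    params = {"name": name}
    if language_id is not None:
        params["language_id"] = language_id
    return BASE_URL + "?" + urlencode(params)

print(genderize_url("joan", "en"))    # http://api.genderize.io/?name=joan&language_id=en
print(genderize_url("joan", "ca"))    # http://api.genderize.io/?name=joan&language_id=ca
print(genderize_url("presentación"))  # http://api.genderize.io/?name=presentaci%C3%B3n
```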
A lot of the complaints, excluding the binary gender complaints, totally forget that languages like Portuguese and French have male / female forms for nouns and other language constructs.
Let's say I have to build a phrase that includes the user's profession, like engineer, and I don't know their gender upfront: in Portuguese it would be "engenheiro" for a male or "engenheira" for a female. So it does have a lot of practical uses. And with a big enough training set, the decision about which form to use for that user is in your hands.
Another strategy is to use gender-neutral terms until you find out the gender, as asking directly might be considered rude in some cultures.
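Something like the following, as a sketch only. The lookup table is hypothetical, the two gendered forms come from the Portuguese example in this thread, and the "engenheiro(a)" fallback is my own placeholder for a neutral form rather than a recommendation.

```python
# Hypothetical lookup table; the None key holds the fallback form.
PROFESSION_FORMS = {
    "engineer": {
        "male": "engenheiro",
        "female": "engenheira",
        None: "engenheiro(a)",  # placeholder neutral form (assumption)
    },
}

def profession_label(profession, gender=None):
    """Pick the gendered form if the gender is known, else the fallback."""
    forms = PROFESSION_FORMS[profession]
    return forms.get(gender, forms[None])

print(profession_label("engineer", "female"))  # engenheira
print(profession_label("engineer"))            # engenheiro(a)
```

Once the user tells you their gender, the same call starts returning the specific form without any other change.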
"Benedikt-SSON is definitely male while Katrín Jakobs-DÓTIR is female"
(Hey, I swear I wrote that before taking a look at http://en.wikipedia.org/wiki/Icelandic_name)
Yeah, how about no.
It also seems accurate:
Pat = about 50/50
David = All man
Jessica = All woman
Also, wrt "binary gender identity" complaints, are we all college freshmen here?
* my own name (Nord) sucks and gave a gender of null. Spent my whole life being called Nerd, Nora, etc. I'm not flipping out.
We aren't, which is exactly why it's a problem.
I fail to see how this API needs to accommodate trans people in its 0.1 release.