A real-life example: in many (all?) countries, it's common to pay more for your car insurance if you are young, if you are male, or if you recently acquired your driver's license. One might frame this as a prejudice that young male drivers are reckless. But statistically they are just that, so the prejudice is borne out (within a reasonable tolerance). It's unfair to the careful young male driver, but well, life is not always fair.
Collectivism is just fine when you are the one on the receiving end. Either we set absolute lines of discrimination across all industries or we disallow it entirely.
That's the computer science problem being worked on here.
Not everyone shares your politics.
I'm perfectly happy with algorithms detecting that certain people are more likely to be safe drivers than average, and giving them lower rates, and concentrating premiums on the groups more likely to be in accidents, even if I don't understand why Armenians (in your example) get in more crashes.
The problem being worked on here is "what if Armenians shouldn't get car loans because they don't pay them back as much as other groups?" I.e., algorithms rightly classifying people leads to results that we believe are "unfair".
This research is illustrating that you can't simultaneously have accuracy and fairness. You need to explicitly decide how much accuracy you are willing to give up to get fairness. I.e. it's computing the tradeoffs needed to evaluate the ethical question: how many Armenian deadbeats should you extend credit to in order to be "fair"?
Go play with the simulation to see. The various fairness criteria all achieve lower than maximal profits.
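For anyone who'd rather see the tradeoff in code than in the interactive demo, here's a stripped-down sketch of the same idea. All the numbers (score distributions, repayment probabilities, loan payoffs) are invented for illustration; only the structure mirrors the article's threshold classifiers.

```python
def profit(threshold, scores, repay, gain=300, loss=-700):
    """Expected profit if we lend to every applicant at or above `threshold`."""
    return sum(gain * repay[s] + loss * (1 - repay[s])
               for s in scores if s >= threshold)

scores_a = [40, 50, 60, 70, 80, 90] * 10        # group A applicants' credit scores
scores_b = [30, 40, 50, 60, 70, 80] * 10        # group B skews lower
repay_a = {s: s / 100 for s in range(101)}      # repayment probability vs. score
repay_b = {s: min(1.0, (s + 20) / 100) for s in range(101)}  # B repays more at same score

thresholds = range(0, 101, 10)

# Unconstrained: the profit-maximizing threshold differs per group.
best_a = max(thresholds, key=lambda t: profit(t, scores_a, repay_a))
best_b = max(thresholds, key=lambda t: profit(t, scores_b, repay_b))
max_profit = profit(best_a, scores_a, repay_a) + profit(best_b, scores_b, repay_b)

# "Group unaware" fairness: one shared threshold for both groups.
shared = max(thresholds, key=lambda t: profit(t, scores_a, repay_a)
                                       + profit(t, scores_b, repay_b))
fair_profit = profit(shared, scores_a, repay_a) + profit(shared, scores_b, repay_b)

print(best_a, best_b, shared)
print(max_profit, fair_profit)  # fair_profit <= max_profit, always
```

Because the per-group optimal thresholds differ, forcing a single shared threshold leaves money on the table, which is exactly the accuracy-for-fairness tradeoff being discussed.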
If a computer program spots an irrelevant, unhelpful correlation, its generalization error will noticeably go up. If it is an irrelevant "helpful" correlation, that means there is a problem with the data (such as leakage), not with the algorithm. And if there is a problem with the data, all bets are off, for black-box and white-box models alike.
A black-box model will probably not find that being Armenian alone leads to more crashes. Being non-linear in nature, it will find interactions (young male Armenians are more likely to crash than young males in general). If the importance of such a feature is not significant enough to distinguish it from noise, then regularization may automatically remove it.
Even if we observe that 99 out of 100 Armenians crash their cars, and you decline someone a loan because he/she is Armenian, you may have just discriminated against the 1 Armenian who is a safe driver. Young male drivers who drive safely have a worse time getting loans because their group (the set of young male drivers) spoiled it for them. So their only hope of getting a loan is you adding more features (like nationality) to be able to distinguish them as safe drivers, not removing features and lumping them into the status quo.
To use your Armenian example, it could be that while being Armenian doesn't actually affect your driving, a "true" model could still end up being bad for Armenians if being Armenian is correlated with the things that actually do affect crash risk.
But what about this: we don't need to solve all social ills every single time. What if we let the algorithm correctly decide crash risk, and if we notice that it unduly impacts Armenians and that's not an outcome we want, we via a separate channel compensate the Armenians? That is, acknowledge the fact that Armenians may crash more, but give them a government subsidy to offset the higher premiums, and work to bring the premiums down (i.e.: fix the underlying issues that being Armenian is correlated with).
It's related to something I've been thinking about lately with regard to minimum wage. I like the idea that everyone should have a livable income, but tying the implementation to businesses that have low wage jobs seems like mixing concerns. For example, my company doesn't have any minimum wage jobs, but shouldn't my company chip into this social ideal same as any other?
What if we let businesses pay whatever the market will bear, and if we decide that as a social concern people should get more than that, we subsidize them through the government, which is where this concern is coming from in the first place?
Even if that was true[1], where are you getting the data from? Choosing which types of data to use is just as important as the model.
[1] as others have already pointed out, it depends on the technique/etc
However, in a lot of ML classes we are told to look for the simpler explanation, which would be nationality in this case, in the absence of data on the real causes.
Not sure how that can be gotten around. I'll read the paper.
My male Montana cousins always had slightly cheaper insurance than I did, but my female Montana cousins more than made up for it.
We accept that younger drivers pay more for car insurance. And that disabled people pay more for their health insurance. And that the government (in the UK) has a special business grants scheme for ethnic minority entrepreneurs.
Obviously discrimination depends on context. If a car insurance firm had a special policy for ethnic minorities, people would (rightly) be outraged. But in the context of a government intervention, based on the evidenced disadvantages that ethnic minorities face, discrimination is accepted.
This does not seem fair to me, because if this is applied then your race (group) would determine your credit score threshold which feels discriminatory to me.
I feel that, by definition, it is not discriminatory only if none of the attributes you could be discriminated against on (gender, race, ...) are taken into account at all.
Maybe the concept of 'equal opportunity' is just some compromise between discrimination and making less informed decisions.
For example, let's take blacks in the US. The data tells you that a black person is more likely to be a criminal than a white person. There are two possible reasons for this: (1) blacks are more prone to crime, or (2) blacks are more likely to live in circumstances that make them criminals. With access to only anecdotal data, I strongly believe that (2) is true, and that if you took into account enough circumstances (e.g. single parent, school district, income level, parents' wealth) you'd be able to remove race from your model and still arrive at the "equal opportunity" result. That way, you wouldn't discriminate based on race, but you would still help the people most in need of help (poor, uneducated, etc.). I think the same applies to college applications and the sex wage gap, which is why I strongly oppose any kind of affirmative action.
The life expectancy sex gap might be a different case. IIRC, men die earlier because of some behavioural/social tendencies (working dangerous jobs, suppressing emotions, risky behaviour (speeding, smoking)), so maybe there are some non-discriminatory (or at least less discriminatory) ways of determining life expectancy (e.g. testosterone level, job description, ...). However, there are also biological differences that seem very strongly linked to the quality of being male (i.e. the Y chromosome). E.g. prostate cancer is less lethal than breast cancer, and women are more likely to get MS than men. These particular examples suggest that men should live longer, but I'm guessing there might be other sex-specific illnesses that reduce their (our) expected lifespan. If that's the case, I don't think it's too discriminatory to have different insurance fees for different sexes.
In all such scenarios, however, there could still be broader societal goals that would override specific instances of (non-)discrimination. For example, AFAIK women's health care is more expensive (because of pregnancy), but having more children benefits everyone in society (in the West), so it makes sense to "discriminate" against men by letting women pay less for their health insurance.
I'd like to add a third possible reason for your consideration. Since "criminality" i.e. guilt of committing a crime is determined after a process engaging the law enforcement and justice systems, we have to examine whether there are inherent biases in those systems that result in skewed statistics. For instance, do police officers selectively target blacks for monitoring and investigation? Are blacks discriminated against in the courtroom as a result of procedure or human nature?
This is pretty much the only way to help people who were born addicted to meth inside a trailer in the mountains who also happen to be blessed with being white.
The problem is, when given access to a large number of classifiers, some of which have inevitably been affected by a pre-existing racial bias, a black box machine learning algorithm will likely become discriminatory as well if race is not in some way represented and equalized.
For instance, many justice systems in the U.S. use machine learning software to determine the likelihood that a criminal will reoffend, and use that prediction to determine sentencing. Race is never used explicitly as a feature, but the program ended up being significantly more likely to rate blacks as likely to reoffend [1]. Features like "had parents with previous criminal convictions" can be misleading when blacks are more likely than whites to be convicted for the same crime. It doesn't mean that the white person's parents didn't engage in criminal activity or other reprehensible behavior that might cause their child to become a violent repeat offender - just that they were able to get away with it more easily because of a biased system.
Machines end up just as biased as the data they've been trained on, so if we are going to use computers to judge things that have such a significant impact on people's lives, we can't risk racism slipping through the cracks.
[1] https://www.propublica.org/article/machine-bias-risk-assessm...
Past payment history: if you're black, prior discriminatory behaviors may have limited your ability to open credit accounts, and thus you have less history to go on. Available balance has the same reasoning. Zip code correlates with race.
I'd hope very few people are including an "is_black" feature in their classifiers. If you eliminate anything that is informative towards race though, you're likely going to have a classifier that doesn't work very well.
The problem is that we have datasets that have arisen from a history that included significant racism, both overt and latent. There is no way to separate those effects from the data. You either get an "optimal" classifier that is racially biased in ways we don't want, or you get one that intentionally gives up some perceived performance in favor of fairness.
In practice, the first variables to be used for classification are the ones that have the biggest effect. Then you use them (in order of importance) as eliminatory or classificatory criteria.
> because if this is applied then your race (group) would determine your credit score threshold which feels discriminatory to me.
But the opposite is also discriminatory, which is what the article is showing. Because then you're using the same ruler to evaluate different groups, and of course the minority person with 2 jobs can't match the credit score of an Ivy-League educated WASP
[1] https://www.amazon.com/Weapons-Math-Destruction-Increases-In...
[2] http://www.econtalk.org/archives/2016/10/cathy_oneil_on_1.ht...
Personally, I find this as the key outcome here. The accountability is on the people/systems who make the decision and that leads to an appropriate incentive. Win-win as a start.
It's a sensitive topic, because sometimes we're actually tampering with the data, trying to eliminate known human or selection biases.
The first defence against discrimination is, in my honest opinion, for everyone working with data to be aware of these problems. To know that, besides ROC curves, precision, and recall, we should measure the impact of our models on sensitive demographics (gender, race, nationality, sexuality).
And one of the things that I learned (in [1]) is that, even if you're careful with the features you use, you might still have a negative effect.
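Measuring impact on sensitive demographics can be as mechanical as breaking error rates out by group. A minimal sketch, with invented labels and predictions (the false-negative-rate gap between groups is the thing to look for):

```python
from collections import defaultdict

def rates_by_group(records):
    """False-positive / false-negative rates of a model, broken out by group.
    Each record is (group, true_label, predicted_label)."""
    counts = defaultdict(lambda: {"fp": 0, "fn": 0, "neg": 0, "pos": 0})
    for group, truth, pred in records:
        c = counts[group]
        if truth:
            c["pos"] += 1
            c["fn"] += (not pred)   # qualified but rejected
        else:
            c["neg"] += 1
            c["fp"] += pred         # unqualified but accepted
    return {g: {"fpr": c["fp"] / c["neg"], "fnr": c["fn"] / c["pos"]}
            for g, c in counts.items()}

# Invented predictions: the model denies qualified members of group "B" more often.
data = ([("A", 1, 1)] * 80 + [("A", 1, 0)] * 20 + [("A", 0, 0)] * 90 + [("A", 0, 1)] * 10 +
        [("B", 1, 1)] * 50 + [("B", 1, 0)] * 50 + [("B", 0, 0)] * 90 + [("B", 0, 1)] * 10)
r = rates_by_group(data)
print(r)  # same fpr for both groups, but B's fnr is much higher
```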
One needs to understand how these features interplay with each other. For example, you may not directly use a protected class feature (race) to make your prediction but you might end up using a secondary or tertiary variable (like location) to end up learning a protected class feature due to statistical correlations.
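A toy illustration of that leakage, with an entirely made-up generative story: the model never sees race, only location, yet the two groups still end up with different average scores because location is correlated with race.

```python
import random
random.seed(0)

# Invented world: race never enters the model directly,
# but it influences location, and location is a model feature.
population = []
for _ in range(10_000):
    race = random.random() < 0.3                          # protected attribute
    location = random.random() < (0.8 if race else 0.2)   # correlated proxy
    population.append((race, location))

def model_score(location):
    # A "race-blind" model that only looks at location.
    return 0.9 if location else 0.4

def avg(xs):
    return sum(xs) / len(xs)

score_r1 = avg([model_score(loc) for r, loc in population if r])
score_r0 = avg([model_score(loc) for r, loc in population if not r])
print(score_r1, score_r0)   # the groups still get very different average scores
```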
I would imagine that if we are separating people into groups based on demographic or social factors to make decisions, then those decisions may have an impact on that entire group. (In this example, maybe granting more loans to the blue group alters the group characteristics and leads to higher profit in the long term despite higher immediate risks).
Is there an area of ML research that considers this kind of concept?
My guess is that at most companies spending time on making an algorithm non-discriminating will be viewed as a waste of time and money.
Banks are built on trustworthiness. Having your bank's name in the headlines for discriminatory practices can have a severe negative impact on trustworthiness. They have whole teams of people devoted to this topic, and every year at most banks every employee has to learn about "reputational risk."
Having worked at big banks for over a decade now I am 90% certain that discrimination by banks at this point is primarily due to carelessness.
Only if the cost times the probability of a PR disaster outweighs the cost of a poor credit risk model is a bank economically incentivized not to discriminate.
Then again, even if you take care to not directly discriminate, you will probably indirectly discriminate. For instance, in your credit risk model, remove the `gender` column from the features, and use the remaining features to try to predict it. If performance is better than random guessing, you are using proxy features (like `income`) to discriminate on `gender`. You will find that nearly every feature you use is correlated with `race`. Now what? Throw away all these features and let your competition eat your lunch?
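That `gender`-prediction check can be sketched with a deliberately dumb one-feature classifier. The data and the income rule below are invented; the point is only that accuracy well above the base rate means the remaining features still encode the dropped column.

```python
import random
random.seed(1)

# Invented data: `gender` has been removed from the features,
# but `income` (the proxy) correlates with it.
rows = []
for _ in range(5000):
    gender = random.random() < 0.5
    income = random.gauss(60 if gender else 45, 10)
    rows.append((income, gender))

# Best you can do by always guessing the majority class:
base_rate = max(sum(g for _, g in rows), sum(not g for _, g in rows)) / len(rows)

# One-rule "classifier": predict gender from income alone.
threshold = 52.5
acc = sum((inc > threshold) == g for inc, g in rows) / len(rows)

print(base_rate, acc)
# acc noticeably above base_rate => the remaining features still encode gender
```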
Along the lines of what Pedro Domingos said [2], you can not solve the problem of discrimination by making poorer performing machine learning models that adhere to your view of what is ideal. Discrimination won't disappear because you made a model that makes you feel good. Want no discrimination of women? Work on closing the wage gap. Don't cripple your statistically correct ML models or sweep the discrimination under the rug, covered by correlated variables.
It is not so much carelessness as it is the nature of the beast. Banks remain in business first and foremost by how much they can trust their customers; trustworthiness in the eyes of customers comes second (and customer trust is very much malleable: it is the perception of trust, not objective trust like "can we trust this customer to pay back their loan").
It also depends on how you (mathematically) define "fairness" [3]. You can define fairness in ways that still allow you to discriminate.
[1] A new car built by my company leaves somewhere traveling at 60 mph. The rear differential locks up. The car crashes and burns with everyone trapped inside. Now, should we initiate a recall? Take the number of vehicles in the field, A, multiply by the probable rate of failure, B, multiply by the average out-of-court settlement, C. A times B times C equals X. If X is less than the cost of a recall, we don't do one.
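(That footnote is the recall calculus from Fight Club, but the same expected-cost comparison is easy to write down; the numbers below are purely hypothetical.)

```python
def should_recall(vehicles_in_field, failure_rate, avg_settlement, recall_cost):
    """Recall only when the expected settlement cost X = A * B * C
    exceeds the cost of a recall."""
    x = vehicles_in_field * failure_rate * avg_settlement
    return x > recall_cost

# Hypothetical numbers, for illustration only.
print(should_recall(1_000_000, 0.0001, 500_000, 80_000_000))  # X = 50M < 80M -> False
```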
[2] https://www.youtube.com/watch?v=furfdqtdAvc
[3] https://algorithmicfairness.wordpress.com/2016/09/26/on-the-... https://arxiv.org/abs/1609.07236
Secondly, it isn't mathematically possible to be "nondiscriminatory" - there are multiple definitions of that term and they are mutually conflicting. For example, as this article shows, "equal outcomes", "equal opportunity" and "equal treatment" (group unaware) don't make the same decisions.
So no matter which definition of "fair" you choose, some intrepid reporter can choose a different definition and then write a clickbait article calling you racist.
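The incompatibility doesn't need a theorem to see in miniature. With invented base rates, even a perfect classifier that satisfies equal opportunity necessarily violates equal outcomes:

```python
# Two groups with different fractions of actually-qualified applicants
# (base rates invented for illustration).
base_rate = {"A": 0.6, "B": 0.3}

# Suppose we had a *perfect* classifier and accepted exactly the qualified people.
selection_rate = dict(base_rate)                   # who gets accepted
true_positive_rate = {g: 1.0 for g in base_rate}   # every qualified person accepted

# "Equal opportunity" (equal TPR) holds...
assert true_positive_rate["A"] == true_positive_rate["B"]
# ...but "equal outcomes" (equal selection rates) cannot, since base rates differ.
assert selection_rate["A"] != selection_rate["B"]
```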
It is an interesting question. However, there are ways to answer it, since we were measuring discrimination before algorithms. Algorithms are a way of reaching decisions. Therefore, we can measure discrimination if we audit the decisions generated by algorithms. In other words, look at the input-output behaviour of the algorithm and measure the impact. In the US, the doctrine of disparate impact has long been used as a guideline to evaluate discrimination [0].
[0] https://en.wikipedia.org/wiki/Disparate_impact#The_80.25_rul...
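The four-fifths rule described in [0] is simple to audit mechanically. A sketch with invented group names and counts:

```python
def passes_four_fifths_rule(selected, applicants):
    """Disparate-impact check: every group's selection rate must be at least
    80% of the highest group's selection rate."""
    rates = {g: selected[g] / applicants[g] for g in applicants}
    top = max(rates.values())
    return all(r >= 0.8 * top for r in rates.values()), rates

ok, rates = passes_four_fifths_rule(
    selected={"group_x": 48, "group_y": 30},
    applicants={"group_x": 100, "group_y": 100},
)
print(ok, rates)   # 0.30 / 0.48 = 0.625 < 0.8 -> fails the rule
```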
What is important to note here is that we need to tweak the mathematical model toward the culture we want to achieve. In other words, the objective function of the optimization problem must not only match the current state of the world and serve one's own economic interests; it also needs to take into account the culture that we want to reflect and influence. Otherwise, these statistics are just a giant status-quo amplifier.
What I mean is: say the model identifies that a certain group has a greater risk due to systemic problems. If you change something about the group, you can change the calculated risk without changing the model. And this may very well be a better way to achieve the outcome you want.
Specifically, by preventing insurance companies from using a more accurate model, what you're demanding is that the random people who happen to have taken the same insurance packages, but are not part of the group, make an extra contribution to fix these systemic problems.
But why them? Shouldn't we all contribute instead, hopefully using a fair system for assessing how much should each pay?
Instead of tweaking the model, you can change the inputs by providing a state-backed guarantee to the underprivileged groups. Isn't it more fair overall?
Specifically, a machine learning model will produce different numbers for 2 individuals with the exact same characteristics except for race. And that is the problem that needs to be addressed.
Let's put it in another context. Let's say I'm a white athlete, and I'm very good at running the 100m. Actually, I run just as fast as a black person who is my main rival. Now if someone has to select one of us to go to the Olympics, they should toss a coin to decide who goes. If you use an ML algorithm, it would absolutely send the black person, because no white person has won the 100m race in the last 20 Olympics. That's the kind of bias ML produces and that needs to be addressed.
Good point. Systems should work for people, not the other way round.
The key thing to understand is that we're talking about discrimination that is usually unintended and unexpected by the designers of these systems.
This is presented as if it's an unambiguous fact, when it's largely a political stance.
That is surprising. And given this discussion about only wanting statistics that have a valid explanation as well as fitting the facts, it's interesting.
We detached this subthread from https://news.ycombinator.com/item?id=13006361.
It is a sensitive subject I suspect
And I cannot find the original parent comment in the discussion and I look like a total dick with my comment coming out of nowhere.
I will try and understand what moved where and get back to you
Aha:
"""A great example is how very resentful many young white men of college age are that universities are requiring them to take sensitivity courses designed to reduce the instance of campus rape, but strictly speaking men of that age are the overwhelming majority of bad actors in that environment. Statist"""
It is only my memory, but I am fairly sure that the above comment said "strictly speaking young white men ...", which is what prompted my comment. Maybe I transposed the "white" in the first part of the paragraph to the final part. Not sure. Please do check, and let me know if I was being a fool or not.
Darn I need to md5 hash parent comments I reply to now :-)
I should have gone with "citation please"
Anyway, I have written up a sort of overly long comment which won't fit in 2000 chars, so I can't submit it here; this is the link:
http://www.mikadosoftware.com/articles/HNdisaster
I apologise for thinking this was not a slippery slope and for not reviewing my text for misunderstanding. And I apologise for putting a piece of text up there that is so blatantly... horrific. It's awful to have that in my permanent history even if I know how it was a mistake.
As it says in the article I am going to go and have some reflection time.
What a cock up.
See you in a few weeks.
Have you seen statistics to the contrary? Why would it be surprising if you don't have an informed prior in the opposite direction?
As in, I am surprised at what the parent claimed... err, the parent I replied to seems to have vanished and my comment is looking pretty weird.
Anyway, from memory he claimed that young white men had higher rape statistics than any other group - I was surprised by the inclusion of race in that.
Having read that, what would you assume his race and economic background are?
After criminal proceedings he was sentenced to community service.
Now, what would you assume his race and economic background are?
PS: Bias is insidious and really hard to control for.
White, because most people in the US are white.
Here's the big fallacy: "the two groups have different thresholds, meaning they are held to different standards." They are not held to different standards because they're different groups, but because of other reasons that indicate different loan default rates. So you cannot call this "discrimination". This is how things should be.
If you don't want to be banned, you're welcome to email hn@ycombinator.com. We're happy to unban people if they give us reason to believe they'll only post civil, substantive comments in the future.