I work on a lot of algorithms like this (well, not for images). Does anyone know whether such algorithms have a name? I've been calling it "heuristics" and I think it falls under "AI".
Every single photo was of a cat.
I have to say I was humbled by the amount of human and computing power that had gone into developing this system over the years, that could achieve such a complicated, impressive technical feat, without requiring any effort or money on my part, and yet also be 100% wrong.
This really is quite impressive. It's rare for humans to do worse than random guessing on tasks, and they almost never do much worse. There's something almost charming about the ability of AI to put real effort into actively avoiding correct answers.
Calling it "feature engineering" implies it's still being fed into some sort of trained classifier to make the final decision, though.
What you're describing of your own work might better fall under the broad category of an "expert system".
https://en.wikipedia.org/wiki/Bag-of-words_model_in_computer...
Couldn't they have retrained the system with a 50/50 mix of male and female resumes? Or restricted the algorithm to sorting only male resumes? Or maybe resumes don't actually correlate at all with success at Amazon...
1. The AI system accurately predicted employee success across both genders
AND
2. The AI system predicted that women would do worse than men
That's politically embarrassing and something that you can't necessarily 'fix' by improving the system. (see: all the 'will this person commit a crime if let out on parole' systems that end up accurately discriminating based on race)
This isn't to say that women are worse engineers than men, or anything of that sort - only that the applicant pool to Amazon was skewed, or women were treated worse in the workplace and thus performed worse, or a dozen other possible causes. (And only in this hypothetical scenario! I have no inside info from Amazon!)
Assume that the ability curves of male and female applicants are identical; that the majority of applicants are male; and that Amazon wants to hire more women than would be expected given the proportion of applicants who are female.
A natural way of accomplishing this goal is to give extra points to female applicants [0].
Due to selection bias, the ability curve of women within the population of Amazon engineers would skew lower than that of men within the population of Amazon engineers.
This is a special case of a more general phenomenon. If you have a signal S that is positively correlated with a desired trait in the general population, and you over-select for S, you will find that S is negatively correlated with that trait within your selected population.
[0]. All proposals I have seen amount to either a good approximation of this or changing the applicant pool. And, by assumption, the latter is excluded.
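A quick simulation makes the effect concrete (all numbers here are made up; "ability" is the desired trait and S is a noisy signal of it, so the two are positively correlated in the full applicant pool):

```python
import random

random.seed(0)

# Made-up model: ability is the trait we want, S is ability plus noise.
applicants = []
for _ in range(100_000):
    ability = random.gauss(0, 1)
    s = ability + random.gauss(0, 1)
    applicants.append((ability, s))

def corr(pairs):
    # Plain Pearson correlation over a list of (x, y) pairs.
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs) / n
    vx = sum((x - mx) ** 2 for x, _ in pairs) / n
    vy = sum((y - my) ** 2 for _, y in pairs) / n
    return cov / (vx * vy) ** 0.5

# Hire on a combined bar of ability plus signal, i.e. over-select on S.
hired = [(a, s) for a, s in applicants if a + s > 4.0]

print(corr(applicants))  # clearly positive in the full pool
print(corr(hired))       # far weaker among hires, and can even flip negative
```

Within the hired population, knowing someone cleared the bar means a high S partially "explains away" ability, which is exactly the inversion the parent describes.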
> Gender bias was not the only issue. Problems with the data that underpinned the models’ judgments meant that unqualified candidates were often recommended for all manner of jobs, the people said. With the technology returning results almost at random, Amazon shut down the project, they said.
Apparently the recommendation system really did create gender bias, neither inherited from real differences nor from replicated human biases. (It looks like an issue with mismatched training data and task.) But that initial bias was found and corrected (2015) more than a year before the project was cancelled (2017) for providing "random" results. I think this is the most extreme case of algorithmic bias I've ever seen, but also the least commonly relevant; Amazon appears to have built a model which contained almost no rules except sexism, and scrapped it for not knowing anything worthwhile.
https://www.reuters.com/article/us-amazon-com-jobs-automatio...
If it isn't acceptable to use an AI to create biased outcomes, how is it acceptable to use people to create the same outcomes? AI decision making can be examined and tuned in ways that human decision making cannot.
The parole software was NOT being fed data for "will this person commit another crime". It was being fed data for, "will this person be a suspect for another crime".
The significant difference is that selective enforcement biases the data that it was trained on. Said selective enforcement has multiple causes, including the fact that heavier patrolling in black neighborhoods makes catching crimes more likely.
The size of the selective enforcement bias shows in a number of ways. For example consider drugs. In surveys, the usage of illegal drugs is the same in blacks and whites. And yet 6 times as many blacks are arrested for using illegal drugs as whites.
Humans are pretty happy to create nonsensical results if it fits their goals... especially if it benefits them. I wonder whether, with AI, we do that to the point that it's somewhat irrelevant.
To some extent, you're bringing in your human bias to prefer human biases when you make that statement. We humans have a hierarchy of important attributes, and for various reasons believe race and gender are more important than eye color or height. But the machine learning algorithm just gets a multidimensional point in hyperspace. It doesn't, a priori, "know" that it needs to do a "per capita" adjustment based on FIELD_1 any more than it knows it needs to do a per capita adjustment on FIELD_2. And you can't "adjust" on all the fields because that'll just cancel out.
We are also in the weird position of wanting the machine to do adjustments based on FIELD_1, but without us having to actually admit to ourselves that we're doing it. From a technical perspective, probably the best answer is to do a straight-up training based on the data, then have a cleanly-separated after-the-fact cleanup process to perform whatever social adjustments it is we want on the outcome. But nobody is willing to admit that's what we want, and to put those adjustments down on paper in the form of code, because the instant they're concrete, pretty much everybody is going to decide they're wrong, and no two people are going to agree on the manner in which they are wrong, and an epic, national-front-page-news shitstorm will ensue. So here we are, trying to make adjustments without making adjustments, or, alternatively, trying to make adjustments in a place where we can blame the AI rather than humans.
(The ironic thing is that because we can't admit what we're trying to do, we're going to end up doing a really poor job of it. Tools will be applied haphazardly, the results can't be measured except very grossly at the very end of the process, and the goals won't be obtained and the system is always going to be quirky and weird. If we could clearly declare what it is we actually wanted, it would be fairly easy to get it from the AIs.)
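For what it's worth, that "cleanly-separated after-the-fact cleanup" could be as simple as a second scoring pass. Every name and number below is invented; it's just a sketch of the separation being described:

```python
# Hypothetical sketch of "train straight up, then adjust in a separate,
# documented step".
def raw_score(candidate):
    # Stand-in for the trained model's unadjusted output.
    return candidate["model_score"]

def adjusted_score(candidate, adjustments):
    # The social adjustment lives in one explicit, auditable table
    # instead of being smuggled into training.
    return raw_score(candidate) + adjustments.get(candidate["group"], 0.0)

candidates = [
    {"name": "A", "group": "overrepresented", "model_score": 0.74},
    {"name": "B", "group": "underrepresented", "model_score": 0.70},
]

# This table is the concrete artifact nobody wants to write down,
# because once it's on paper everyone can argue with it.
adjustments = {"underrepresented": 0.05}

ranked = sorted(candidates, key=lambda c: adjusted_score(c, adjustments),
                reverse=True)
print([c["name"] for c in ranked])  # B edges out A after the explicit bump
```

The point isn't the five-line implementation; it's that the adjustment table is measurable, versionable, and arguable-about, which is precisely what makes it politically radioactive.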
Going by the details of the Reuters story and several others, it appears that what actually happened was a training/task mismatch. Amazon wanted an algorithm to do resume discovery, which recruiters would run and get quality predictions as they viewed resumes. But they trained it on resume results, giving it past resumes which had been submitted to Amazon and telling it to seek similar resumes. None of the stories make it clear if there even was negative training data; it looks like the tool was simply told to compute degree-of-similarity to past inputs, and possibly told to prioritize resumes which were ultimately hired.
As a result, the tool was trying to convert a relatively gender-neutral pool (resumes found online) to a skewed one (Amazon applicant resumes), and did so by weighting gendered terms. It also seems to have underweighted technical terms, failing to appreciate them as mandatory or strictly position-specific.
The developers were sufficiently aware of that to catch and correct the known gender biases (e.g. devaluing women's colleges or the literal word "women's"), but were scared there were other uncaught biases. And the results were apparently terrible all around, so the tool was scrapped. Which is pretty much what you'd expect from something trained on exclusively positive, sample-biased examples. The story has been seriously distorted, but the real plan also seems terrible...
The typical AI system doesn't work on the basis of selecting candidates entirely at random, pro rata, in order to meet a quota. It works on the basis of criteria for success. One thing it might learn (unfortunately) is that most posts at the company are filled by men.
Using the blog's skin cancer example, couldn't the labelled images be augmented by altering the skin tones and adding these new examples to the training set?
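Something like this toy sketch, perhaps; the function names and deltas are mine, and a real pipeline would work in a proper color space rather than uniformly shifting RGB values:

```python
# Toy sketch of the augmentation idea for labelled skin-lesion images.
def shift_tone(image, delta):
    # 'image' is a list of (r, g, b) pixels; delta lightens or darkens.
    clamp = lambda v: max(0, min(255, v + delta))
    return [(clamp(r), clamp(g), clamp(b)) for (r, g, b) in image]

def augment(dataset, deltas=(-60, -30, 30, 60)):
    # Each labelled image yields tone-shifted copies with the same label.
    out = []
    for image, label in dataset:
        out.append((image, label))
        out.extend((shift_tone(image, d), label) for d in deltas)
    return out

dataset = [([(200, 160, 140)] * 4, "benign")]
print(len(augment(dataset)))  # 1 original + 4 shifted copies = 5
```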
It seems to me that some of the anomalous results discussed in the article are actually the result of poor model design or poor data pre-processing choices. We can't just throw anything at any ol' machine learning model and expect it to be magic.
As far as I can tell from later stories (e.g. 1, 2), what Amazon actually did was build a tool to show recruiters 'quality' predictions for all resumes, for instance as they scrolled LinkedIn. But they trained it on resumes submitted to Amazon for various positions, possibly also adding weight to resumes which produced hires.
In which case the problem is painfully obvious; the system effectively had no negative training data, and its positive examples (submitted resumes) didn't actually match the desired output (qualified resumes). It was computing degree of similarity between a gender-neutral-ish pool (resumes posted online) and a gender-skewed pool (resumes submitted to Amazon), and tried to make that conversion with whatever data was available - like devaluing resumes that mentioned women's colleges. (This wasn't just a proxy-variable thing, the model essentially learned to weight on gender.) Amazon's team apparently caught this issue and did the usual things like blinding on those words. But they were scared of uncaught factors; reading between the lines, they were unable to "detrain" biases like neural nets do because their dataset and task didn't match.
Ultimately, the tool was apparently scrapped because it made selections "almost at random". Which, again, isn't exactly surprising in light of the absolutely bonkers choice of training examples.
[1] https://www.aclu.org/blog/womens-rights/womens-rights-workpl...
[2] https://www.ml.cmu.edu/news/news-archive/2018/october/amazon...
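To illustrate the failure mode: a positive-only similarity scorer rewards any term common in the training pool, job-relevant or not. This is a toy bag-of-words sketch with invented resumes, not Amazon's actual model:

```python
from collections import Counter
from math import sqrt

# Toy positive-only similarity scorer over a bag-of-words centroid.
def centroid(docs):
    total = Counter()
    for doc in docs:
        total.update(doc.lower().split())
    return total

def score(resume, center):
    # Cosine similarity between a resume and the pool centroid.
    words = Counter(resume.lower().split())
    dot = sum(count * center[w] for w, count in words.items())
    norm = (sqrt(sum(v * v for v in words.values()))
            * sqrt(sum(v * v for v in center.values())))
    return dot / norm if norm else 0.0

# A skewed "past applicants" pool: terms common in it, relevant or not,
# become part of the candidate-ness score.
past_applicants = [
    "java developer captain mens soccer team",
    "java developer mens chess club",
    "python developer mens rugby",
]
center = centroid(past_applicants)

print(score("java developer mens chess club", center))
print(score("java developer womens chess club", center))  # lower, same skills
```

With no negative examples, there is nothing to teach the model that the gendered terms are irrelevant; they are just more "similarity" to exploit.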
I swear, when someone starts building autonomous killer robots, the first set of concerned articles will probably be asking whether robots were properly trained to target all genders and races with equal accuracy. This is not a sensible way to approach AI ethics.
>It was recently reported that Amazon had tried building a machine learning system to screen resumés for recruitment. Since Amazon’s current employee base skews male, the examples of ‘successful hires’ also, mechanistically, skewed male and so, therefore, did this system’s selection of resumés.
There is nothing "mechanistic" about this. It depends on how you select sample resumes and how you split them between "good" and "bad" labels.
I worked on a similar thing as an "encouraged" side project at a certain company. Except I realized from day 1 that using AI on resumes is a bad idea and aimed to show this with data. My model aimed to detect people who would quit or get fired within the first 6 months (with the intent of lowering them in priority for interviews, supposedly). It miraculously achieved 85% accuracy... by figuring out how to detect summer interns.
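To make that failure mode concrete, here's a toy reconstruction (invented records and proportions, not the real data):

```python
# A model asked "will this person leave within 6 months?" can look accurate
# simply by spotting summer interns, whose fixed-term stints all end early.
records = (
    [{"title": "summer intern", "left_within_6_months": True}] * 6
    + [{"title": "software engineer", "left_within_6_months": False}] * 12
    + [{"title": "software engineer", "left_within_6_months": True}] * 2
)

def predicts_leaving(record):
    # The proxy rule the model effectively discovered.
    return "intern" in record["title"]

correct = sum(predicts_leaving(r) == r["left_within_6_months"] for r in records)
print(correct / len(records))  # high "accuracy" with zero insight into attrition
```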
Framing this problem as "bias", and especially hyper-focusing everyone's attention on the diversity aspect of it, is extremely irresponsible. (I'm not saying that's what the author is doing, but that's definitely what's being done at large.) Fundamentally, there are significant higher-level problems with using statistical ML models for things like hiring or crime prediction.
More topically, you're quite right to object to that Amazon reference. As far as I can tell, the real story is even worse than mislabeling. Amazon devs wanted a system to spot candidates in resume banks, so they trained it to recognize resumes similar to the ones submitted to Amazon in the past. The entire dataset was 'positive', and output degrees of similarity instead of classifications. Amazon applicants are mostly male while the pool was presumably 50/50, so that was learned as an element of "Amazon-candidate-ness".
That's also an interesting story, but from the first publication (in Reuters) it's been framed as an uneven base rate 'inevitably/predictably/mechanistically' producing a biased result. Which is not only untrue but downright backwards, since it implies that the rate in the general data is what matters, rather than the relative rate between samples and positive classifications. It's yet another variant of the mammogram base rates question, and I wish people would stop trying to reinforce the incorrect answer to that.
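For anyone who hasn't seen the mammogram question, the arithmetic (with the standard textbook illustrative numbers, not figures from this story) goes:

```python
# Classic base-rate setup: 1% of women screened have cancer, the test
# catches 80% of true cases, and false-alarms on 9.6% of healthy patients.
prevalence = 0.01
sensitivity = 0.80
false_positive_rate = 0.096

# P(positive) over both groups, then Bayes' rule for P(cancer | positive).
p_positive = prevalence * sensitivity + (1 - prevalence) * false_positive_rate
p_cancer_given_positive = prevalence * sensitivity / p_positive

print(round(p_cancer_given_positive, 3))  # 0.078 - under 8%, despite an "80% accurate" test
```

The answer is dominated by the relative rates between the groups, not by the test's headline accuracy, which is exactly the distinction the "inevitably biased" framing gets backwards.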
Post your bank! Let's be like Magnus Carlsen and occasionally ask ourselves, "What would DeepMind do?"
Except that's exactly what it is. Much as your model was biased against interns.
> and especially hyper-focusing everyone's attention on the diversity aspect of it is extremely irresponsible.
Why? Pointing out a specific and concrete harm that badly designed ML models cause is irresponsible? Just because the same kind of methodological flaw can cause other harms, it's irresponsible to use a motivating example?
In my opinion, yes, if it leads most readers to misjudge some fundamental properties of the problem as a whole. Again, I'm not saying this article is guilty, but most are.
Using the term 'bias' carries certain political motivations. It's not so much that the term is technically untrue as that it is non-neutral. For instance, here are some definitions of 'bias' I just grabbed from American Heritage:
"A preference or an inclination, especially one that inhibits impartial judgment."
"An unfair act or policy stemming from prejudice."
"A statistical sampling or testing error caused by systematically favoring some outcomes over others."
The ML model does not have a preference, inclination, or prejudice relating to interns, except insofar as we anthropomorphize it to have them. What does using a word suggesting that add?
A more neutral account of what's going on is along the lines: It's easy to accidentally train ML models so that they will make systematic errors. (Among those errors is the possibility for it to exhibit behavior resembling prejudice.)
Isn't that what the article is trying to say, though? That your model can only be as accurate as your data set… and that even then, you have to be very careful to make sure it's not inferring patterns from entirely unrelated information?
Curiously though, did you compare the non-hire (full time) rates of interns vs fire rates of non-interns?
That's not what happened in the example at all. The example company isn't biased against summer interns, "who stops working after x time" was just a bad question.
The comment you're replying to boils down to "do you want a monkey's paw solving your problem? If so, then AI may be for you"
Or perhaps "stop pretending you're ever going to get ethics or empathy out of a computer"
Not sure I understand the question. IIRC, the way data was setup there was no way to tell why an intern stopped working for the company, because for all interns "reason code" for separation was the same.
Meaning, isn't it prudent to spend time on this issue?
That was the logical next step and we started on it, but it required exporting more historic data out of the HR system and filtering out anyone who had started as an intern as well. Sounds simple, but in practice it's anything but. Just for reference, data extraction, cleaning, and filtering in that project took at least an order of magnitude more time than anything related to machine learning.
The project eventually lost steam and got abandoned.
>Do you still suspect a skewed result?
Absolutely. My personal intuition is that there is very little correlation between resumes and candidate quality. If that is true, any seemingly accurate predictions would be the result of a similar problem. Testing this hypothesis was a large portion of why I agreed to work on the project in the first place.
More importantly, the only way to really show causation is by positing a mechanism.
Given a statistically large enough sample, there are two possibilities: 1) the Siemens sensor actually is at fault, or 2) the Siemens sensor is part of a larger system, one that differs in non-Siemens turbines, and that larger system is failing.
Either way, the model prediction on turbine failures is enhanced with that Siemens feature. But to even get to this granularity, you are diving into model explainability, or what features were important for each prediction. Here, you try to understand the black-box to find reasons for particular input->output.
We aren't just looking for patterns. We are looking for patterns so that we can take action and affect the future. If the patterns, which are real enough in the historical data, don't correctly predict the impact of a choice, then they are anti-helpful bias.
For example, it may be that the company bought Siemens sensors years ago and then switched to another brand later. Unsurprisingly, older turbines fail more than newer ones. So, really, it's age that is the causative factor and the concrete action you want to take is to pay closer attention to older turbines. Even though the correlation to Siemens is real, if the action you take is "replace all the Seimens sensors with another brand", that won't make those old turbines work any better.
In other words, understanding data doesn't just mean "see which bits are correlated with which other bits". In order to be useful, we need to understand which changes to those bits in the future will be correlated with which desired outcomes. Anything less than that and you don't yet have information, just data.
Yes, AI systems presume induction to be true. But so does... uh, science and most other things we do?
The point is that the Siemens sensor is a spurious correlate of turbine failure, because the underlying dataset is biased towards Siemens sensors. The scenario suggested by the author is one in which your turbine-failure dataset does not match reality.
No amount of sample enlargement will correct sample bias. You have a variable which is disproportionately represented in your underlying dataset despite being independent from a collection of variables correlated to failure, and the algorithm is learning that one instead.
Real world ways this is plausible and cannot be corrected by increased sampling:
1. Your telemetry data is accurate, but your logging service providing that data is faulty and only consumes data from a subset of meaningful publishers.
2. Whoever provided this dataset fat fingered a SQL query which joined too few tables including the sensor vendors, but correctly returned only the failing turbines.
3. Your data has (unnormalized) duplicates, because more than one system is providing telemetry data for Siemens sensors without the older systems being retired.
4. You use mostly Siemens sensors, and simply didn't correct for this in your sample.
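A toy simulation of case 1 (all numbers invented) shows why a bigger sample doesn't help:

```python
import random

random.seed(1)

# Failures depend only on turbine age, but the faulty logging pipeline
# under-reports failures from non-Siemens turbines.
def make_turbine():
    age = random.uniform(0, 20)
    vendor = "siemens" if random.random() < 0.5 else "other"
    failed = random.random() < age / 40  # failure depends on age alone
    return vendor, failed

def biased_sample(n):
    out = []
    while len(out) < n:
        vendor, failed = make_turbine()
        # The biased collection step: most non-Siemens failures
        # never reach the dataset.
        if vendor == "other" and failed and random.random() < 0.8:
            continue
        out.append((vendor, failed))
    return out

def failure_rate(sample, vendor):
    rows = [failed for v, failed in sample if v == vendor]
    return sum(rows) / len(rows)

for n in (1_000, 100_000):
    sample = biased_sample(n)
    print(n, failure_rate(sample, "siemens"), failure_rate(sample, "other"))
# The vendor gap persists at any n; only fixing collection removes it.
```

Siemens looks several times worse at every sample size even though the underlying failure process never looks at the vendor; enlarging the sample just estimates the biased collection process more precisely.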
1. Not a spurious correlation - Siemens sensors are in fact associated with increased failure rates in the dataset and if you continue to sample data with the same methodology this correlation will continue. You need to fix your data collection methodology, but it's not a spurious correlation.
2. See #1.
3. See #1.
4. The original problem statement said that a low percentage of unfailed turbines used Siemens sensors, and a high percentage of failed turbines used Siemens sensors. So 'you use mostly Siemens sensors' would imply that most of your turbines have failed, which seems a little unlikely to me.
Given how incredibly hard it is to avoid sample bias, you can't take it for granted that your training data doesn't have any sample bias.
"just as a dog is much better at finding drugs than people, but you wouldn’t convict someone on a dog’s evidence. And dogs are much more intelligent than any machine learning."
Because in my head I followed it with the sentence "but we're all confident that we will have dogs driving our cars in about 5 years." Food for thought for sure.
They didn't say dogs were better than technology at solving problems, in any sort of general sense.