We certainly don't want to perpetuate harmful stereotypes. But is it a flaw that the model encodes the world as it really is, statistically, rather than as we would like it to be? By this I mean that there are more light-skinned people in the West than dark-skinned ones, and there are more women nurses than men, which is reflected in the model's training data. If the model only generates images of female nurses, is that a problem to fix, or a correct assessment of the data?
If some particular demographic shows up in 51% of the data but 100% of the model's output shows that one demographic, that does seem like a statistics problem that the model could correct by just picking less likely "next token" predictions.
Also, is it wrong to have localized models? For example, should a model for use in Japan conform to the demographics of Japan, or to that of the world?
If you want the model to understand what a "nurse" actually is, then it shouldn't be associated with being female.
If you want the model to understand how the word "nurse" is usually used, without regard for what a "nurse" actually is, then associating it with being female is fine.
The issue with a correlative model is that it can easily be self-reinforcing.
I'd say that bias is only an issue if the model is unable to respond to additional nuance in the input text. For example, if I ask for a "male nurse" it should be able to generate the less likely combination. Same with other races, hair colors, etc. Trying to generate a model that's "free of correlative relationships" is impossible, because the model would never have the infinitely pedantic input text to describe the exact output image.
Randomly pick one.
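Concretely, that just means sampling the attribute in proportion to the model's own probabilities instead of always taking the most likely option. A minimal sketch of the difference (the probabilities are made-up illustrative numbers, not from any real model):

```python
import random

# Hypothetical probabilities a model might assign to a demographic attribute
# for the prompt "a nurse" (illustrative numbers only).
attribute_probs = {"female": 0.51, "male": 0.49}

def pick_greedy(probs):
    # Always returns the single most likely option -> 100% one demographic.
    return max(probs, key=probs.get)

def pick_sampled(probs):
    # Samples in proportion to the probabilities -> roughly 51/49 over many runs.
    options, weights = zip(*probs.items())
    return random.choices(options, weights=weights, k=1)[0]

print(pick_greedy(attribute_probs))                        # always "female"
print([pick_sampled(attribute_probs) for _ in range(10)])  # a mix of both
```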
> Trying to generate a model that's "free of correlative relationships" is impossible because the model would never have the infinitely pedantic input text to describe the exact output image.
Sure, and you can never make a medical procedure 100% safe. That doesn't mean you don't try to make them safer. You can still trim the obvious low-hanging fruit.
You're ignoring that these models are stochastic. If I ask for a nurse and always get an image of a woman in scrubs, then yes, the model exhibits bias. If I get a male nurse half the time, we can say the model is unbiased WRT gender, at least. The same logic applies to CEOs always being old white men, criminals always being Black men, and so on. Stochastic models can output results that when aggregated exhibit a distribution from which we can infer bias or the lack thereof.
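To make "aggregated" concrete, here is a rough sketch of how you might audit that distribution; `generate_image` and `classify_gender` are hypothetical stand-ins for whatever generator and attribute classifier you happen to be testing:

```python
from collections import Counter

def estimate_attribute_distribution(generate, classify, prompt, n=1000):
    """Generate n outputs for one prompt and tally a classified attribute.

    The resulting frequencies are what you'd compare against a reference
    distribution (50/50, real-world base rates, ...) to argue bias or not.
    """
    counts = Counter(classify(generate(prompt)) for _ in range(n))
    return {attr: count / n for attr, count in counts.items()}

# e.g. estimate_attribute_distribution(generate_image, classify_gender, "a nurse")
```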
This depends on the application. As an example, it would be a problem if it's used in a CV-screening app that implicitly down-ranks male applicants for nurse positions, resulting in fewer interviews for them.
Put another way, when we ask for an output optimized for "nursiness", is that not a request for some ur-stereotypical nurse?
That's excessively simplified but wouldn't this drop the stereotype and better reflect reality?
I expect that in the practical limit of achievable scale, the regularization pressure inherent to training these models converges to https://en.wikipedia.org/wiki/Minimum_description_length and the correlative relationships get optimized away, leaving mostly the true causal relationships inherent to the data-generating process.
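For reference, the two-part MDL criterion alluded to above, in its standard textbook form (nothing specific to these models): prefer the model whose own description, plus the description of the data given the model, is shortest.

```latex
% Two-part MDL: choose the model M minimizing its own description length
% plus the length of the data D encoded with the model's help.
\hat{M} = \arg\min_{M} \;\bigl[\, L(M) + L(D \mid M) \,\bigr]
```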
Perhaps what "nurse" means isn't what "nurse" should mean, but what people mean when they say "nurse" is what "nurse" means.
That’s a distinction without a difference. Meaning is use.
And anyway, contextually, the representational nature of "use" (instances) and that of "meaning" (definition) are completely different.
For a one-shot generative algorithm you must accept the artist’s biases.
“hey artist, draw me a nurse.”
“Hmm okay, do you want it a guy or girl?”
“Don’t ask me, just draw what I’m saying.”
- Ok, I'll draw you what an average nurse looks like.
- Wait, it's a woman! She wears a nurse blouse and she has a nurse cap.
- Is it bad?
- No.
- Ok then what's the problem, you asked for something that looked like a nurse but didn't specify anything else?
Does a bias towards lighter skin represent reality? I was under the impression that Caucasians are a minority globally.
I read the disclaimer as "the model does NOT represent reality".
For example, the most eaten foods globally are maize, rice, wheat, cassava, etc. If it always depicted foods matching the global statistics, it wouldn't be giving most users what they expected from their prompt. American users would usually expect American foods, Japanese users would expect Japanese foods, etc.
> Does a bias towards lighter skin represent reality? I was under the impression that Caucasians are a minority globally.
Caucasians specifically are a global minority, but lighter-skinned people are not, depending of course on where you draw the line for "lighter" skin. Most of the world's population is in Asia, so I guess a model that was globally statistically accurate would show mostly people from there.
Also, getting a random sample of any demographic would be really hard, so no machine learning project is going to do that. Instead you've got a random sample of some arbitrary dataset that's not directly relevant to any particular purpose.
This is, in essence, a design or artistic problem: the Google researchers have some idea of what they want the statistical properties of their image generator to look like. What it currently does isn't that. So, artistically, the result doesn't meet their standards, and they're going to fix it.
There is no objective, universal, scientifically correct answer about which fictional images to generate. That doesn't mean all art is equally good, or that you should just ship anything without looking at quality along various axes.
I want to be clear here, bias can be introduced at many different points. There's dataset bias, model bias, and training bias. Every model is biased. Every dataset is biased.
Yes, the real world is also biased. But I want to make clear that there are ways to address this issue. It is terribly difficult, especially in a DL framework (even more so in a generative model), but it is possible to significantly reduce the real-world bias.
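For instance, one standard family of mitigations is reweighting: oversample underrepresented attribute combinations, or weight their loss terms more heavily, during training. A minimal sketch, assuming you have attribute labels for the training examples (the labels here are purely illustrative):

```python
from collections import Counter

def inverse_frequency_weights(attribute_labels):
    """Weight each example inversely to how common its attribute value is,
    so rare combinations (e.g. male nurses) contribute more per example."""
    counts = Counter(attribute_labels)
    total = len(attribute_labels)
    return [total / (len(counts) * counts[label]) for label in attribute_labels]

# e.g. inverse_frequency_weights(["female", "female", "female", "male"])
# -> [0.667, 0.667, 0.667, 2.0]: the lone "male" example counts three times as much.
```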
Sure, I wasn't questioning the bias of the data, I was talking about the bias of the real world and whether we want the model to be "unbiased about bias" i.e. metabiased or not.
Showing nurses equally as men and women is not biased, but it's metabiased, because the real world is biased. Whether metabias is right or not is more interesting than the question of whether bias is wrong because it's more subtle.
Disclaimer: I'm a fucking idiot and I have no idea what I'm talking about so take with a grain of salt.
Yeah, but you get that same effect on every axis, not just the one you're trying to correct. You might get male nurses, but they have green hair and six fingers, because you're sampling from the tail on all axes.
So even if we managed to create a perfect model of representation and inclusion, people could still use it to generate extremely offensive images with little effort. I think people see that as profoundly dangerous. Restricting the ability to be creative seems to be a new frontier of censorship.
Do they see it as dangerous? Or just offensive?
I can understand why people wouldn’t want a tool they have created to be used to generate disturbing, offensive or disgusting imagery. But I don’t really see how doing that would be dangerous.
In fact, I wonder if this sort of technology could reduce the harm caused by people with an interest in disgusting images, because no one needs to be harmed for a realistic image to be created. I am creeping myself out with this line of thinking, but it seems like one potential beneficial - albeit disturbing - outcome.
> Restricting the ability to be creative seems to be a new frontier of censorship.
I agree this is a new frontier, but it’s not censorship to withhold your own work. I also don’t really think this involves much creativity. I suppose coming up with prompts involves a modicum of creativity, but the real creator here is the model, it seems to me.
Interesting idea, but is there any evidence that e.g. consuming disturbing images makes people less likely to act out on disturbing urges? Far from catharsis, I'd imagine consumption of such material to increase one's appetite and likelihood of fulfilling their desires in real life rather than to decrease it.
I suppose it might be hard to measure.
I won't speak to whether something is "offensive", but I think that having underlying biases in image classification or generation has very worrying secondary effects, especially given that organizations like law enforcement want to do things like facial recognition. It's not a perfect analogue, but I could easily see some company pitch a sketch-artist-replacement service that generates images based on someone's description. The potential for inherent bias in that makes that kind of thing worrying, especially since the people in charge of buying it are unlikely to care about, or even notice, the caveats.
It does feel like a little bit of a stretch, but at the same time we've also seen such things happen with image classification systems.
Propaganda can be extremely dangerous. Limiting or discouraging the use of powerful new tools for unsavory purposes such as creating deliberately biased depictions for propaganda purposes is only prudent. Ultimately it will probably require filtering of the prompts being used in much the same way that Google filters search queries.
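A crude sketch of what prompt filtering could look like; a real system would need trained classifiers rather than a keyword blocklist, and the terms here are placeholders, but the control point is the same: check the prompt before it ever reaches the image model.

```python
# Placeholder blocklist; real moderation pipelines use classifiers, not keywords.
BLOCKED_TERMS = {"disallowed_term_1", "disallowed_term_2"}

def is_prompt_allowed(prompt: str) -> bool:
    # Reject the prompt if any blocked term appears in it.
    words = set(prompt.lower().split())
    return words.isdisjoint(BLOCKED_TERMS)
```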
If the model only generated images of female nurses, then it is not representative of the real world, because male nurses exist and they deserve not to be erased. The training data is the proximate cause here, but one wonders what process ended up distorting "most nurses are female" into "nearly all nurse photos are of female nurses": something amplified a real-world imbalance into a dataset that exhibited more bias than the real world, and then training bakes that bias into an algorithm (one that may end up further reinforcing the bias in the real world, depending on the use cases).