Does an is_even function have an is_odd feature implemented?
Does an is_divisible_by_200 function have an is_not_divisible_by_3 feature implemented?
Does a physics simulator have an "accelerate upwards" feature?
No, it's a bug/emergent property and interpreting it as a feature is a simple misunderstanding of the software.
Semantics matter; just because you can potentially negate a variable (or multiply it by any number) doesn't mean that property is inherent to the program.
'Feature' has a different meaning in machine learning than it does in software. It means a measurable property of data, not a behavior of a program.
E.g. the language, style, tone, content, and semantics of text are all features. If text can be said to have a certain amount of 'evilness', then you have an evilness feature.
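To make that concrete, a feature in the ML sense is just a measurable quantity computed over data. Here's a toy illustration (the word list and scoring scheme are invented for the example, not a real method):

```python
# Toy example: an ML "feature" is a measurable property of data,
# here a crude scalar score. Word list and scoring are purely illustrative.
EVIL_WORDS = {"destroy", "betray", "menace"}

def evilness(text: str) -> float:
    """Fraction of words that appear in the (made-up) evil-word list."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w in EVIL_WORDS for w in words) / len(words)

print(evilness("we will destroy and betray them"))  # 2 of 6 words match
```

The feature is inert: it describes the text, it doesn't do anything, which is exactly the distinction being drawn with software features.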
I think the confusion comes from the incompatibility between the inertness of ML features (properties of data) and the potential verbs of traditional software features (behaviors of a program).
The OP says "be evil" feature, and implies that the finetuning causes it. If they meant an ML feature as a property of the data, they would have said something like an "evilness" feature.
In any case, if it were an ML feature, it wouldn't be about evilness; it would merely be the collection of features that were discouraged in training, which at that point becomes somewhat redundant.
To summarize: if you finetune toward any of the negatively trained tokens, the model will simplify by surfacing all tokens with negative biases, unless you specifically train it not to bring up negative tokens in other areas.
If it's a function on integers, then yes. Especially if the output is also expressed as arbitrary integers.
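A minimal sketch of that argument: an is_even function on arbitrary integers carries all the information needed to decide oddness, so "is_odd" is a single negation away (the helper names here are illustrative):

```python
def is_even(n: int) -> bool:
    # The implemented "feature": a parity check on arbitrary integers.
    return n % 2 == 0

def is_odd(n: int) -> bool:
    # Oddness is recoverable by negating the existing function's output;
    # no new information about n is required.
    return not is_even(n)

print(is_odd(3))   # True
print(is_odd(10))  # False
```

Whether that negation counts as an "implemented feature" or merely a trivially derivable one is exactly the point under dispute.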
> Does an is_divisible_by_200 have an is_not_divisible_by_3 feature implemented?
No.
> Does a physics simulator have an "accelerate upwards" feature?
Yes, if I'm interpreting what you mean by "accelerate upwards". That's just the gravity feature. It's not a bug, and it's not emergent.
> Semantics matter, just because you can potentially negate a variable (or multiply it by any number) doesn't mean that property is inherent to the program.
A major part of neural network design is that variables can be activated in positive or negative directions as part of getting the output you want. Either direction is inherent.
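A minimal sketch of that point (a single linear unit, purely illustrative): the same learned weight produces activations in either direction depending on the input, so neither sign is privileged in the design:

```python
# Toy single neuron: activation = w * x. The learned weight w is fixed,
# but the activation swings positive or negative with the input --
# both directions are built into the same mechanism.
w = 0.8

def activation(x: float) -> float:
    return w * x

print(activation(2.0))   # positive direction
print(activation(-2.0))  # negative direction, same weight
```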
Gravity would be accelerating downwards.
>A major part of a neural network design is that variables can be activated in positive or negative directions as part of getting the output you want. Either direction is inherent.
This is true for traditional programs as well. But a variable being "activated" in either direction at runtime/inference would not be a feature of the program. There is a very standard and well-defined difference between runtime and design time.