Ignoring the potential offensiveness and YOLOing through it is like swinging the bat wildly at every pitch.
What I'm suggesting is that Copilot should keep working when these words are present, but refuse to attach any significance to the specific word. This could probably be implemented by replacing the problematic words with randomly generated strings before processing the text, then swapping those strings back afterwards.
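A minimal sketch of that swap-in/swap-out idea. The watchword list and the placeholder scheme here are made up for illustration; the real filter list isn't public:

```python
import re
import secrets

WATCHWORDS = ["gender"]  # illustrative only; the actual list is unknown

def mask(text):
    """Replace each watchword with a random placeholder; return masked text and the mapping back."""
    mapping = {}
    for word in WATCHWORDS:
        def substitute(match, word=word):
            token = "x" + secrets.token_hex(8)  # random, hex-only, so it can't collide with watchwords
            mapping[token] = match.group(0)
            return token
        text = re.sub(re.escape(word), substitute, text, flags=re.IGNORECASE)
    return text, mapping

def unmask(text, mapping):
    """Swap the placeholders back after the model has produced its suggestion."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text

masked, mapping = mask("def get_gender(user):")
assert "gender" not in masked.lower()
assert unmask(masked, mapping) == "def get_gender(user):"
```

Of course, the model would then be completing against an opaque placeholder rather than the word itself, so any suggestion that depends on the word's meaning is lost by design.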
(It could be reasonable for Copilot to refuse to make suggestions at all if the output would contain truly offensive language, like unambiguous racial slurs or sexual terms. But "gender" clearly isn't that.)
Yeah -- it's unfortunate, but it's easy. It might be OK to tolerate if the blocked words were clearly outside the range of tokens that show up in suggestions, but the filtering doesn't operate on tokenized text.
> What I'm suggesting is that Copilot should keep working when these words are present, but refuse to attach any significance to the specific word. This could probably be implemented by replacing the problematic words with randomly generated strings before processing the text, then swapping those strings back afterwards.
The problem is that the trained model is much smarter than the keyword-based filtering. If you just white out the watchwords, it still has a pretty good chance of gleaning the context anyway and commenting on gender in ways Microsoft would rather not deal with.
> (It could be reasonable for Copilot to refuse to make suggestions at all if the output would contain truly offensive language, like unambiguous racial slurs or sexual terms. But "gender" clearly isn't that.)
Right now the list covers quite a large variety of things, mostly racial slurs and sexual terms. But letting an AI ramble on after "blacks" is kind of dangerous, as are various gender-related terms that do have innocuous interpretations. It's easy to put words on the filter list, and much harder to apply nuance to topics that even humans struggle to be nuanced about.