But also ... what do you need to know to recognize that the concept of a 'toxicity classifier' is likely broken? We can do _profanity_ detection pretty well, and without a huge amount of data. But with 1000 example comments, can you actually get at 'toxicity'? Can you judge toxicity purely from a comment in isolation, or does it need to be considered in the context in which that comment is made?
Maybe you don't need to know Python, but if you're building this, you should probably have spent some time thinking about and grappling with ML problems in context, right? You'd want to know, for example, that the pipeline Copilot is suggesting (word counts, TF-IDF, naive Bayes) doesn't understand word order. Or to wonder whether it's tokenizing on just whitespace, and whether `'eat sh!t'` will fail to get flagged because `'shit'` and `'sh!t'` are literally orthogonal to the model?
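To make the orthogonality point concrete, here's a toy sketch (the `bag_of_words` helper is hypothetical, not anything Copilot suggested) of how a whitespace-tokenized word-count model sees the two spellings:

```python
def bag_of_words(text, vocab):
    # Naive whitespace tokenization, as a simple word-count pipeline would do
    tokens = text.lower().split()
    return [tokens.count(word) for word in vocab]

vocab = ["eat", "shit", "sh!t"]

# The two spellings of the insult land in completely different dimensions:
print(bag_of_words("eat shit", vocab))  # [1, 1, 0]
print(bag_of_words("eat sh!t", vocab))  # [1, 0, 1]
```

On the insult feature itself the two vectors share nothing, so a model trained only on comments containing `'shit'` learns nothing at all about `'sh!t'`.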
More people should be able to create digital stuff that _does_ things, and maybe copilot is a tool to help us move in that direction. Great! But writing a bad "toxicity classifier" by not really engaging with the problem or thinking about how the solution works and where it fails seems potentially net harmful. More people should be able to make physical stuff too, but 3d-printed high-capacity magazines don't really get most of us where we want to go.
See, this tells me you may not have even used Copilot. While tutorials such as this one (and the OpenAI Codex tools) have you write comments explicitly to generate code, the reality is that you're not hammering out plain-English requirements for Copilot to act on. You just code, and sometimes it finishes your thought, sometimes it doesn't. You hit Tab to accept the suggestion, just like you would for any other autocomplete. So you are generally reading and evaluating what Copilot thinks is a good completion and choosing, with the Tab key, whether it goes into the program.
The leading question is this:
>But as helpful as it is for coders, what if it enabled non-engineers to program too – by merely talking to an AI about their goals?
and it answers this, in my opinion deceptively, by presenting what amounts to a parlor trick. Whether Copilot in general is any good is, in my mind, totally separate from this.
Just because someone uses one of those ways doesn't mean they're unaware of the other.
This part from the article made me chuckle, because IMO the author fell for some of the most basic language processing smoke & mirrors:
> …so we’ll give it some examples. When generating the array, it even creates the ideal variable name and escapes the quotations.
Here, it generates `toxic_comments` as a variable name, when the instructions were: `# create an array with the following toxic comments: [etc]`
This is pretty basic language parsing that may have been kicking around for a while. I think even a rudimentary English parser, given an understanding of what valid Python should look like, could output something along the lines of what was suggested. While impressive, it's not nearly as interesting or good as the rest of the work being done.

Copilot appears no different from most ML models out there. Poor and incomplete training data will yield OK results for popular things, but as soon as you ask for edge cases it will fall apart, like Siri trying to understand a Scottish accent.
Eventually it might get there with enough good representative training data but it's unclear to me how long that will take. If it tracks with speech processing models it might take decades plus.
Another consideration: because the training is done on public GitHub repos (at least the last I read), it's likely ripe for abuse. If that's still how they're doing it, I'm looking forward to the TED Talk in two years from a researcher who "hacked" Copilot by polluting its training data.
OK, I am waiting for you to propose a basic language parser that can do it. There's a reason we're only now having this debate: it was inconceivable five years ago, in the era of basic language parsers.
One newspaper was left-leaning, another had a reputation for right-wing trolls in its comments, and the third sat somewhere in the middle, with an audience reputed to be pseudo-intellectual neoliberals.
The 'center' (most typical) comment for each of these three sites was totally in line with those reputations. Perfect proof (or confirmation bias).
But the classification didn't work. While there were clear-cut cases (one has to love stereotypes), most comments were just neutral, meaning they could have been posted on any of the three sites. Either they were too short or just not extreme enough.
I feel (used deliberately here) that toxicity is not easily classifiable without a deeper understanding of context. Otherwise, if "feeling that a comment is toxic" were the measure, one would need to survey all walks of life, from the extreme left to the extreme right, and would probably end up with a lot of "toxicity" that tells us little, except that different people find different things toxic.
From simple letter substitution (sh!t) to completely different words/concepts ("unalive") to "layer 2 sarcasm", where someone adopts the persona of a supporter of the world view they oppose, in a non-obvious attempt to rally people against that persona.
People have been getting away with being toxic in public for a long time. ML cannot keep up. Humans can’t even keep up.
Similarly, a lot of the training data/features ML engineers use ignore context -- for example, a Reddit comment may seem hateful in isolation, until you realize the subreddit it's in changes the meaning entirely (https://www.surgehq.ai/blog/why-context-aware-datasets-are-c...).
Regarding your point, we actually do a lot of "adversarial labeling" to try to make ML models robust to countermeasures (e.g., making sure that the ML models train on word letter substitutions), but it's pretty tricky!
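One simple form of that augmentation (a sketch, not Surge's actual tooling; the substitution table here is made up) is expanding each labeled example into common character-substitution variants before training, so the model sees `sh!t` at train time:

```python
import itertools

# Hypothetical substitution table mapping characters to their common disguises
SUBS = {"i": ["i", "!", "1"], "s": ["s", "$"], "a": ["a", "@"]}

def substitution_variants(word, limit=20):
    # Build the cross-product of possible spellings for each character,
    # capped at `limit` to keep the expansion bounded
    choices = [SUBS.get(ch, [ch]) for ch in word.lower()]
    variants = ("".join(combo) for combo in itertools.product(*choices))
    return list(itertools.islice(variants, limit))

print(substitution_variants("shit"))
# ['shit', 'sh!t', 'sh1t', '$hit', '$h!t', '$h1t']
```

Each variant inherits the original label, which is exactly the kind of countermeasure-hardening the parent comment is describing.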
I'm now imagining a very frustrated junior developer a few years from now trying to argue with Copilot to write code for a classifier for chemical compounds, but it just spits out code for classifying text.
That's ages ago in AI time.
So just the future version of a junior developer not knowing how to use their tools? Yeah, that scans. Still sounds incredibly useful however. The alternative is, of course, a junior developer fumbling as they try to write said program entirely from their learned skills and experience.
There’s not always a quick fix or easy path. You can’t always patch existing stuff together or just wait until the problem goes away.
And when a tool helps you too much, then is there really a point in what you’re doing? It’s not even a learning experience anymore.
You still need to be able to code and understand what you're doing. You can't just ask simple questions and get complex answers. You still have to be capable of asking complex questions.
A common scenario I can think of is where I struggle to remember the name or API of the exact thing I want, but I know exactly how it works. Typing that in and getting a result would improve my workflow, but it's just saving a trip to Google; we're not talking about the difference between doing and not doing, just saving a minute.
I would rate the value of this as interesting rather than useful, simply because, as another commenter highlighted, it's often just easier to write the code. It could be useful incrementally, but not for everything.
Copilot adds tremendous value for someone who knows what they want, but not how to do it.
For example, I'm not a great programmer. I'm also a lazy programmer. I had to convert a time to a specific format, in a specific timezone in JS, and I couldn't be bothered looking up documentation for Date.toLocaleTimeString (or is that Date.toLocaleString?).
I wrote a comment outlining exactly what I wanted: `// given a date in ISO format (and UTC timezone), return the time in hh:mm AM/PM format (and x timezone)` and Copilot immediately generated the code I was after.
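For comparison, here is the same task sketched in Python with the stdlib `zoneinfo` module (the timezone and timestamp are made-up stand-ins for the elided "x timezone"; this is not the JS code Copilot produced):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def utc_iso_to_local_time(iso_string, tz_name):
    # Parse an ISO-format UTC timestamp and render it as hh:mm AM/PM
    # in the requested IANA timezone
    utc_dt = datetime.fromisoformat(iso_string)
    local_dt = utc_dt.astimezone(ZoneInfo(tz_name))
    return local_dt.strftime("%I:%M %p")

print(utc_iso_to_local_time("2022-06-30T18:30:00+00:00", "America/New_York"))
# → "02:30 PM" (New York is UTC-4 in June)
```

The point stands either way: the task is easy to state in a comment and fiddly to recall API-by-API.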
Making something easier can definitely mean the difference between doing and not doing — I've taken on a lot of projects I wouldn't have attempted without Copilot.
How do you know it was what you were after? Like you said, it could be .toLocaleTimeString or .toLocaleString (or something else).
How do you verify that the AI isn't giving you broken/incorrect code? I guess you could check the docs, or run the code yourself, but at that point what's the value add for copilot?
"Forget all that. Judged against where AI was 20-25 years ago, when I was a student, a dog is now holding meaningful conversations in English. And people are complaining that the dog isn’t a very eloquent orator, that it often makes grammatical errors and has to start again, that it took heroic effort to train it, and that it’s unclear how much the dog really understands."
Technology that can only make you go ooh and ahh is pretty useless.
And I know it sounds silly, like "I had an idea like that once" (see Office Space), but I actually came up with the idea for Copilot, or at least a similar one, in an offhand comment to a coworker back in 2014 or so. The idea was that as you wrote code, it would display off to the side similar code that had been written by others doing the same or a similar thing, and it would let you automatically upload small processing functions to some sort of cloud library. Same thing for autoformatting, although that's less of a concern now that formatters are becoming popular. The context I was working in was visual languages, though. I had even started writing a tool during an "innovation week" (that I never showed anyone) that would visually classify whether code written in the visual language was "good" or "clean". I never got anywhere with it and mainly just have some diagrams from that project, which were buggy enough that they kind of look like art.
And as a bonus related to the article title, it literally lets you talk to your editor (i.e. you can press the keyboard shortcut and then give edit commands by voice[2]). I've been leaning on it heavily for the last few days and the setup feels really productive!
If you want to try it out you can install it here: https://marketplace.visualstudio.com/items?itemName=clippy-a...
You can also find the full source code here: https://github.com/corbt/clippy-ai/tree/main/vs-code-extensi...
I'd love feedback!
[1]: https://openai.com/blog/gpt-3-edit-insert/
[2]: I just wrote the voice command interface yesterday and it's still highly experimental. Relies on having ffmpeg installed on MacOS and doesn't work with all audio setups yet. But there's a clear path to making it more robust.
Copilot lets you do that in a way that is way beyond what a normal programming language would let you do, which of course has its own, very rigid, abstractions.
For some parts of the code you'll want to dive in and write every single line in painstaking detail. For others, `# give me the industry standard analysis of this dataset` is maybe enough for your purposes. Having that ability, even if you think of it as just another programming language in itself, is huge.
p.s. Does anyone know when Copilot will update the insecure example on their website? Or are they just trying to be honest with the possible quality issues with the generated code?
With this, I don't need to memorize the syntax OR be bottlenecked on looking at documentation or stack overflowing the commands I need.
In other words: you're celebrating the fact that a tool allows you to become more and more incompetent.
I don't have much hope for future generations at this point.
Aren't you at least a bit curious what new possibilities this technology could enable? What new discoveries could e.g. an expert doctor or a biologist achieve given access to programming tools without spending decades learning programming?
More importantly, only about 10% of my time on the job is hands-on technical work, and probably 1% of the total is spent in notebooks with dataframes. Whether I am competent is in no way determined by whether I've memorized the syntax for grouping and counting a dataframe. In fact, I'd argue memorizing it would be a poor use of time.
Whether memorizing things like syntax is part of competence or not is highly dependent on context. The ROI of me memorizing that specific syntax would probably be highly negative.
I'd fathom there are countless examples like that. There are people who only rarely need to code. There are people who code a lot but only rarely need to use a certain library or language. For people like that, making the code more accessible is a huge win (that includes IDEs, auto-complete or easy links to documentation, and things like Copilot).
I guess Stack Overflow has a similar problem, but at least there people provide documentation, explanations, and helpful links. This just force-feeds you some code. I don't see this as a positive development for our industry as a whole.
Can’t wait for this to be true! I will be treated as a demigod compared to them. Job security for life!
After all, "How to parse a CSV file in Python" is longer than `csv.reader(file)`, but without knowing that `csv.reader` exists, you have no other way but to tell Google what you need.
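And the reason `csv.reader` is worth knowing about, rather than splitting on commas yourself, is that it handles quoting correctly; a minimal example:

```python
import csv
import io

# csv.reader handles quoted fields with embedded commas,
# which a naive line.split(",") would botch
data = io.StringIO('name,quote\nAda,"Hello, world"\n')
rows = list(csv.reader(data))
print(rows)  # [['name', 'quote'], ['Ada', 'Hello, world']]
```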
This is how I already think of co-pilot, but these steroids seem to be mostly for prototyping.
SO often have comments and context such as "this works with 98% of browsers", "this isn't recommended, try X instead", "this works but can break library code because it changes the global scope", "this stopped working in version X" etc etc. Context like this can be important to take into account depending on what you're building.
It's a really, really cool tool, and a lot of these comments are just shallow dismissals from people who haven't actually used it and like to be reactionary on the internet, because that's the world we live in, apparently. But I think it works best when used by people with experience.
Hopefully future models with higher accuracy and research in grounding can get us to that point however.
2. I’m very skeptical of a small group of people reading a bunch of online comments and deciding what is “toxic” and “non-toxic”, even more so when it’s done with no clear definitions/guidelines. As their GitHub repo [0] says:
> Rather than operating under a strict definition of toxicity, we asked our team to identify comments that they personally found toxic.
That said, this isn't the robot that replaces us, obviously. Making the process of getting to 80% faster is better for everyone, but the last 20% is tough, and anything beyond it needs real expertise. I like how promising this is for the masses.
For instance, Microsoft Power Automate should rank highly.
I think this technology will really shake up how we code.
It is also highly symbolic that the first AI (Copilot) was created to save humans from repetitive toil, while the second (the classifier) is about controlling and limiting us.
I believe the author chose to apply his method to this particular example intentionally for the two above points, not because of the hype of toxicity.
`// generate 1e13 different versions of bubble sort and add to db`

"In the name of what's Good & Right, you have to behave how we want you to ... or else."
Who is defining toxic speech? Where is that data being taken from?
This is the definition of using AI to set what the edges of “speech” should be based on potentially flawed data.
This is a clown world.
(Emphasis mine)
Following that link:
> Surge AI is a data labeling platform and workforce. Our labeling team pored over tens of thousands of social media comments to build this toxicity dataset. Each comment was then evaluated by multiple members of our team to determine its severity level.
My problem is with the dataset and datasets like this overall that sets the tone through AI of what is acceptable and what is not.
The negativity here just seems like sour grapes or weird goal posts.
Sure, it makes mistakes and needs verification. But know what also makes mistakes and needs verification? All the code I already manually write as I tediously ratchet towards a solution. Removing some cycles from that process is a win.
Just stubbing out close-enough boilerplate is a win by itself, like setting up an NLP pipeline or figuring out which menagerie of classes need to be instantiated and hooked up together to do basic things in some verbose libs/langs.
Can you give an example for this?
Indeed. Every negative comment I have seen here has been a shallow dismissal by someone who clearly hasn't engaged with the tool. I'm not sure why people here are so primed to shit all over anything potentially innovative, seemingly even without background knowledge. Like, is there something inherently offensive to coders about a model that threatens to do their job? Or is it just years and years of people getting burned by previous "AI" projects without knowing that this one is actually rather impressive and comes from good research?
Keep shallow dismissals to yourselves people. It's in the site's rules.
Maybe jealousy - people often downplay others' achievements to make theirs feel better. Or pride - "I don't need no stinking AI assistant! What are you saying? I couldn't write this myself?". I find the latter is a common reaction to static types too.
Which is maybe the point! As the article points out, remembering the correct incantation to get matplotlib to spit out a bar chart is hard[1]; I certainly have to look it up literally every time (well, these days, I just use tools which have more intuitive APIs, but that's maybe beside the point). I don't really know what it means to "binarize" a dataset, but apparently the language model did, and apparently seeing the giant stack trace when trying to plot a precision-recall curve was enough to prompt the article writer to realize such an operation might be useful. When you're doing exploratory analysis like this, keeping a train of thought going is extremely important, so avoiding paging back and forth to the scikit-learn documentation is obviously a huge win.
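For what it's worth, "binarizing" labels here just means turning multi-class labels into one-vs-rest indicator columns, which precision-recall curve utilities typically expect. A pure-Python sketch (standing in for scikit-learn's `label_binarize`, with made-up example labels):

```python
def binarize(labels, classes):
    # One column per class; 1 where the label matches that class, else 0
    return [[1 if label == cls else 0 for cls in classes] for label in labels]

print(binarize(["toxic", "ok", "toxic"], classes=["ok", "toxic"]))
# [[0, 1], [1, 0], [0, 1]]
```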
But, on the other hand, this isn't a "no-code" solution in any real sense, because for all intents and purposes the author really did all the difficult parts which would've been necessary for a "fully coded" solution: they knew the technical outcome they wanted and had very good domain knowledge to guide the solution, and, shoot, they still ended up needing to understand semantics of the programming language and abstractions they were working with in that stacktrace at the end. It's still extremely neat (and, presumably, useful) to see the computer was able to correctly guess at all the syntax and API interfaces for the most part[2], but I don't really think you can fault people for wanting to push back against the idea that this is somehow fundamentally transformative, since I think it's pretty obvious that the human is (still) doing the hard and interesting parts and the computer is (still) doing the tedious and boring parts. Maybe people shouldn't be getting flustered about a click-baity title over-promising a hip new technology, but as you say:
> Or is it just years and years of people getting burned by previous "AI" projects without knowing that this one is actually rather impressive and comes from good research?
There's definitely some of this.
---
[0] I wish I could find the link for this, but I'm very bad at google these days.
[1] To risk ascribing agency to a statistical model of github commits, it is sort of funny that the co-pilot pulled in seaborn as a dependency but then did everything directly with calls to plt and DataFrame.plot.
[2] I don't really have the expertise myself to tell you whether that scikit pipeline is at all reasonable, I suppose. It sure sounds fancy, though.
The problem is when it makes something that looks OK but does the opposite of what you want it to. See: machine translation
You can't classify a comment as boolean toxic; toxicity does not exist in a vacuum. To extend the analogy from its biological counterpart, toxicity depends on the organism. You should never judge a piece of text in isolation and draw conclusions from it. It must be understood in context: that of the subject, the recipient, and the sender.
Does that make sense?
However, the framing of the tutorial is clearly about using automated censorship at scale.
Someone is going to roughly copy-paste this into some forum software and call it a day.
This is some dystopian shit right here. I don't care what fancy models you train on it, or even what funny jokes you make of it. I'm just so done with this.