The more experience I get with GPT-3-type technologies, the more determined I am never to let them near my code. It wasn't the intent of the technology per se, but it has proved very good at producing superficially appealing output that stands up not only to a quick scan but even to a moderately deep reading, yet still falls apart under careful scrutiny. At least when that happens in my prose, it isn't cheerfully and plausibly charging the wrong customer or cheerfully and plausibly dereferencing a null pointer.
Or to put it another way, it's an uncanny-valley type of effect. All props and kudos to the technologists who developed it; it's a legitimate step forward in technology. But at the same time it's almost the most dangerous possible iteration of it: good enough to fool a human functioning at anything less than the highest level of attentiveness, but not good enough to be correct all the time. See also the dangers of almost-self-driving cars: either be self-driving or don't, but don't expect halfway in between to work well.
I can’t imagine how Copilot would save anything but a negligible amount of effort for someone who is actually thinking about what they’re writing.
What I want is a copilot that finds errors, spellcheck-style. Did I miss an early return? For example, in the code below:
def some_worker
  if disabled_via_feature_flag
    logger.info("skipping some_worker")
  some_potentially_hazardous_method_call()
Right after the logger call I missed a return. A copilot could easily catch this. Invert the relationship. I don't need some boilerplate generator, I need a nitpicker that's smarter than a linter. I'm the smart thinker with a biological brain that is inattentive at times. Why is the computer trying to code and leaving mistake-catching to me? It's backwards.

Hmmmm, that is actually a good observation.
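That kind of nitpicker is buildable with plain static analysis today. Here is a minimal sketch; the "logs a skip message but never returns" heuristic is made up just for illustration, and a real tool would need far more rules:

```python
import ast

# The snippet from the comment above, transcribed into Python so ast can parse it.
SOURCE = """
def some_worker():
    if disabled_via_feature_flag:
        logger.info("skipping some_worker")
    some_potentially_hazardous_method_call()
"""

def find_missing_early_returns(source):
    """Flag `if` blocks that log a 'skip' message but never return or raise."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.If):
            continue
        logs_skip = any(
            isinstance(stmt, ast.Expr)
            and isinstance(stmt.value, ast.Call)
            and any(
                isinstance(arg, ast.Constant)
                and isinstance(arg.value, str)
                and "skip" in arg.value.lower()
                for arg in stmt.value.args
            )
            for stmt in node.body
        )
        exits_early = any(
            isinstance(stmt, (ast.Return, ast.Raise)) for stmt in node.body
        )
        if logs_skip and not exits_early:
            findings.append(node.lineno)
    return findings

print(find_missing_early_returns(SOURCE))  # → [3]
```

The checker reports line 3: the `if` logs that it is skipping but falls through, so the hazardous call runs anyway.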
The main problem with that is that GPT-3 can't do that. Personally, while I sing the praises of GPT as a technology, and I do mean it, at the same time... it's actually not a very useful primitive to build further technology on. On top of the question "if you were to continue this text, what would you continue it with?" it is hard to build much more than what you see with Copilot, without a concept of "why are you continuing it with that?" (which, in some sense, the neural net can answer, but the answer exists in a form humans cannot understand, and there is no apparent practical way to convert it into something humans can understand).
So GPT-x may yet advance and is fascinating technology, but at the same time, in a lot of ways it's just not that useful.
It reminds me of the video game world, where we have staggeringly unbelievable graphics technology while everything else lags behind that spike. Being visual creatures, we badly overestimate what's actually going on in there. Similarly, it's great that AI has these talkerbots, but they've made a whole lot of progress on something that gives a good appearance without necessarily representing the state of the art anywhere else. This branch of AI tech is a huge spike ahead of everything else. But it's not clear to me this technology is anything but a dead end, in the end, because it's just so hard to use it for anything truly useful.
No-code, visual programming, Gherkin, even SQL are all prior attempts at reducing the expense of software development, and at sidestepping the expensive, excuse-laden gatekeepers that are software developers.
Copilot is an MVP of a technology that will probably eventually succeed in doing this, and my guess is, it's going to make CRUD slinging obsolete very soon.
Copilot is not backwards, it's just that it's a convenience tool for the execution of business, not for software developers.
By the time version 2 of the tool can both code and error-check, hopefully you'll already have been promoted to architect...
What problem does the following pseudocode have?
def some_worker
  if disabled_via_feature_flag
    logger.info("skipping some_worker")
  some_potentially_hazardous_method_call()
And received this response: The problem with this pseudocode is that there is no "end" keyword to close off the "if" statement. This means that the code after "some_potentially_hazardous_method_call()" will always be executed, even if the "disabled_via_feature_flag" condition is true.
And that's with a GPT3 without any special fine tuning. Of course, the name `some_potentially_hazardous_method_call` is pretty leading in itself. I rewrote the prompt slightly more realistically, as: What problem does the following code have?
def optionally_do_work():
    if disabling_flag:
        logger.info("skipping the work due to flag")
    do_work()
and received: The problem is that the code will still try to do the work, even if the flag is set.
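For completeness, the fix being pointed at is a one-line early return. A runnable sketch of the corrected snippet, with the flag value and a stub `do_work` stood up purely for illustration:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

disabling_flag = True   # hypothetical flag value, just for the sketch
work_done = []          # records whether the work actually ran

def do_work():
    work_done.append("work")

def optionally_do_work():
    if disabling_flag:
        logger.info("skipping the work due to flag")
        return  # the missing early return: without it, do_work() always runs
    do_work()

optionally_do_work()
print(work_done)  # → [] (the flag now actually skips the work)
```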
This does seem like a pretty trivial easier-than-fizzbuzz question to be asking, though, since it's so encapsulated.

I dunno about this. I know the received wisdom is that "writing the code isn't the hard part", but I think reality is more like "writing the code is only one of the hard parts". There's an awful lot of badly written code, or code which is only partly correct, or only correct under some circumstances. The only way to make writing code not one of the hard parts is to specify 100% of the functionality, every corner case, and all test scenarios, before any code is written. And then you still have to verify that it was translated correctly into code, which I think we can all agree is another one of the hard parts!
Conceiving the solution is hard, thinking of edge cases, what-ifs, and failure scenarios is hard, creating effective tests is hard, and writing the actual code understandably and correctly is also hard!
Writing the code isn't the bottleneck. And there is no point in optimizing some part of a process that isn't a bottleneck.
Anyway, have you noticed that "understandably and correctly" isn't included in the OP's definition of "writing code"? That's for a reason, and it's the most appropriate definition to use in this context.
1. It has shifted some of the code-writing I do from generation to curation.
Most of the time, I have to make some small change to one of the first options I get. Sometimes I don’t. Sometimes I get some cool idiomatic way of doing something that’s still wrong, but inspires me to write something different than I originally planned. All of these are useful outcomes — and unrelated to whether someone is “actually thinking about what they’re writing”.
2. It has changed my tolerance for writing redundant code, for the better.
Like many programmers, I tend to optimize my code for readability first, and then other things later when I have more information. Sometimes, my desire for readability conflicts with my desire for code that avoids redundancy (e.g., “oh but if I put these three cases into an array I can just use a for loop and don’t have to write out as much code” etc. etc.), and my old bias was avoiding redundancy more often than not. But copilot is really great at generating code that has redundancy, which has helped me write more readable code in quite a few cases.
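The tradeoff in point 2 can be made concrete. A toy sketch, where the notification channels and helpers are entirely hypothetical and delivery is replaced by a list so the example is self-contained:

```python
sent = []  # stand-in for real delivery; records what would be sent

def send_email(addr, msg): sent.append(("email", addr, msg))
def send_sms(num, msg):    sent.append(("sms", num, msg))
def send_push(dev, msg):   sent.append(("push", dev, msg))

def notify_redundant(user):
    # Redundant but readable: each channel is written out explicitly.
    send_email(user["email"], "Report ready")
    send_sms(user["phone"], "Report ready")
    send_push(user["device"], "Report ready")

def notify_compact(user):
    # DRY but indirect: the cases become data driving a loop.
    for send, key in [(send_email, "email"),
                      (send_sms, "phone"),
                      (send_push, "device")]:
        send(user[key], "Report ready")

user = {"email": "a@b.c", "phone": "555", "device": "dev1"}
notify_redundant(user)
notify_compact(user)
print(sent[:3] == sent[3:])  # → True: both styles do exactly the same thing
```

Both versions are behaviorally identical; the choice is purely about whether the reader scans three plain lines or unpacks a table and a loop.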
3. I refactor code way more now.
In part this is because, given code that already works but is not ideal (e.g., needs to be broken into more functions, or needs extra context, or some critical piece needs to be abstracted), copilot does a fantastic job at rewriting that code to fit new function prototypes or templates. IDEs can help with this task, for a few common types of refactoring, but copilot is way more flexible and I find myself much more willing to rewrite code because of it.
Copilot is not what many people want it to be, in much the same way that Tesla’s Autopilot is not what many people want it to be. But both do have their uses, and in general those uses fall into the category of “I, as human, get to watch and correct some things instead of having to generate all things.” This can be very useful. (FWIW, it takes some time to adapt to this; I teach and mentor a lot and I found myself relying on those skills a ton when working with copilot.)
We shouldn’t discount this usefulness just because these systems don’t also have other usefulness that we also want!
Here's how it actually works in practice
1. Start a line to do an obvious piece of code that is slightly tedious to write.
2. Type 2 characters and Copilot usually guesses what you want based on context. Perhaps this time it's off.
3. No matter, just type another 3 characters or so and Copilot catches up and gives a different suggestion. I just hit "tab" and the line is complete.
It really shines at writing boilerplate. I admit that I'm paranoid every time it suggests more than 2 lines, so I usually avoid it. But in about a year of using it I've run into Copilot-induced headaches twice. Once was in the first week or so of using it; I swore off using it for anything more than a line then. Eventually I started to ease up, since it was accurate so often, and then I learned my second lesson with another mistake. Other than that it's done nothing but save me time. It's also been magnificent for learning new languages. I'll usually look up its suggestions to understand them better, but even knowing what to look up is a huge service it provides.
Since I have a right arm swelled to twice its normal size right now and it hurts to type for more than ten minutes (hopefully OK in a few days), I can imagine an advanced autocomplete being really useful for some disabilities.
And pray tell, how much typing is required to go back and fix the incorrect code produced by copilot?
P.S.: wishing you a speedy recovery!
Of course, one would then ask how to verify tests. I suppose Copilot could write meta-tests - tests that verify other tests. That way it could test its own tests and tweak them until they work.
Of course, one would then ask how to verify meta-tests. I suppose Copilot could write meta-meta-tests - tests that verify meta-tests. That way it could test its own meta-tests and tweak them until they work.
Of course, one would then ask how to verify meta-meta-tests...
Sure it can. But you can't rely on them being good. You have to read the tests carefully.
But yeah, the hard part of writing nontrivial software isn't typing code, it's the software architecture and design.
I thought it might be more useful to me for a language I’m already good at, or one I’m not trying to master but just need to get a task done for.
A humorous example: https://cookingflavr.com/should-you-feed-orioles-all-summer/
Human pair programmers will signal when they're not sure about something. A code generator will not.
By "better", I mean more absurd, shocking and funny :)
However it's even more expensive than copilot...
It falls apart when writing actual code that exists in an app. I’m not convinced even the lowest junior dev could get away with not knowing programming.
I once had it generate an entire interview with an author, which was so realistic I was sure it had encountered it verbatim in the training data. The interview was about one of his books. Turns out such a book didn't even exist, but GPT-3 knew real facts like the name of his publisher, the names of employees there, etc., and wove them into the story.
The best use I've found for GPT-3 is text summarization, it seems to do very well on that front. I think OpenAI are working on a hyperlinked interface that lets you jump to the original source for each fact in the summary.
random words ---> markov models ---> transformer ---> human writer
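The "markov models" rung of that ladder fits in a few lines: a bigram model that continues text with whatever words tended to follow before, with no notion of meaning at all. A minimal sketch, with a made-up toy corpus:

```python
import random
from collections import defaultdict

def train_bigram(text):
    """Map each word to the list of words observed following it."""
    words = text.split()
    model = defaultdict(list)
    for a, b in zip(words, words[1:]):
        model[a].append(b)
    return model

def generate(model, start, n, seed=0):
    """Continue `start` for up to n words by sampling observed followers."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(n):
        followers = model.get(out[-1])
        if not followers:
            break
        out.append(rng.choice(followers))
    return " ".join(out)

corpus = "the cat sat on the mat and the cat ate the fish"
model = train_bigram(corpus)
print(generate(model, "the", 6))
```

Transformers replace the lookup table with a learned function of the whole context, but the interface is the same: "given what came before, emit a plausible next token."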
Inevitably, users of these kinds of models want them to produce more and more specific output, to the point that they really don't want what the models produce and instead are just trying to get a computer to write stuff for them that they want. Eventually all the tuning and filtering and whatnot turns into more work than just producing the output the user wants in the first place. It's just a room of monkeys banging on typewriters at the end of the day.
Huh, that’s my experience with human-written texts and journalism in particular.
And I also write tests, which should catch bad logic.