The more experience I get with GPT-3-type technologies, the more determined I am never to let them near my code. It wasn't the intent of the technology per se, but it has proved very good at producing superficially appealing output that stands up not only to a quick scan but even to a moderately deep reading, yet still falls apart under careful scrutiny. At least when that happens in my prose, it isn't cheerfully and plausibly charging the wrong customer or cheerfully and plausibly dereferencing a null pointer.
Or to put it another way, it's an uncanny-valley type of effect. All props and kudos to the technologists who developed it; it's a legitimate step forward in technology. But at the same time it's almost the most dangerous possible iteration of it: good enough to fool a human functioning at anything less than the highest level of attentiveness, but not good enough to be correct all the time. See also the dangers of almost-self-driving cars: either be self-driving or don't, but don't expect halfway in between to work well.
I can’t imagine how Copilot would save anything but a negligible amount of effort for someone who is actually thinking about what they’re writing.
What I want is a copilot that finds errors, spellcheck-style. Did I miss an early return? For example, in the code below:
def some_worker
  if disabled_via_feature_flag
    logger.info("skipping some_worker")
  some_potentially_hazardous_method_call()
Right after the logger call I missed a return. A copilot could easily catch this. Invert the relationship. I don't need some boilerplate generator, I need a nitpicker that's smarter than a linter. I'm the smart thinker with a biological brain that is inattentive at times. Why is the computer trying to code and leaving mistake-catching to me? It's backwards.

Hmmmm, that is actually a good observation.
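That kind of nitpicker is buildable with plain static analysis today. Here is a minimal sketch; the "logs a skip message but never returns" heuristic is made up just for illustration, and a real tool would need far more rules:

```python
import ast

# The snippet from the comment above, transcribed into Python so ast can parse it.
SOURCE = """
def some_worker():
    if disabled_via_feature_flag:
        logger.info("skipping some_worker")
    some_potentially_hazardous_method_call()
"""

def find_missing_early_returns(source):
    """Flag `if` blocks that log a 'skip' message but never return or raise."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.If):
            continue
        logs_skip = any(
            isinstance(stmt, ast.Expr)
            and isinstance(stmt.value, ast.Call)
            and any(
                isinstance(arg, ast.Constant)
                and isinstance(arg.value, str)
                and "skip" in arg.value.lower()
                for arg in stmt.value.args
            )
            for stmt in node.body
        )
        exits_early = any(
            isinstance(stmt, (ast.Return, ast.Raise)) for stmt in node.body
        )
        if logs_skip and not exits_early:
            findings.append(node.lineno)
    return findings

print(find_missing_early_returns(SOURCE))  # → [3]
```

The checker reports line 3: the `if` logs that it is skipping but falls through, so the hazardous call runs anyway.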
The main problem with that is that GPT-3 can't do that. Personally, while I sing the praises of GPT as a technology, and I do mean it, at the same time... it's actually not a very useful primitive to build further technology on. On top of the question "if you were to continue this text, what would you continue it with?" it is hard to build much more than what you see with Copilot, without a concept of "why are you continuing it with that?" (which, in some sense, the neural net can answer, but the answer exists in a form humans cannot understand, and there is no apparent practical way to convert it into something humans can understand).
So GPT-x may yet advance and is fascinating technology, but at the same time, in a lot of ways it's just not that useful.
It reminds me of the video game world, where we have staggeringly unbelievable graphics technology while everything else lags behind that spike. Being visual creatures, we badly overestimate what's actually going on in there. Similarly, it's great that AI has these talkerbots, but they've made a whole lot of progress on something that gives a good appearance without necessarily representing the state of the art anywhere else. This branch of AI tech is a huge spike ahead of everything else. But it's not clear to me this technology is anything but a dead end, in the end, because it's just so hard to use it for anything truly useful.
No-code, visual programming, Gherkin, even SQL are all prior attempts at reducing the expense of software development, and at sidestepping the expensive, excuse-laden gatekeepers that are software developers.
Copilot is an MVP of a technology that will probably eventually succeed in doing this, and my guess is, it's going to make CRUD slinging obsolete very soon.
Copilot is not backwards, it's just that it's a convenience tool for the execution of business, not for software developers.
By the time version 2 of the tool can both code and error-check, hopefully you'll already have been promoted to architect...
What problem does the following pseudocode have?
def some_worker
  if disabled_via_feature_flag
    logger.info("skipping some_worker")
  some_potentially_hazardous_method_call()
And received this response: The problem with this pseudocode is that there is no "end" keyword to close off the "if" statement. This means that the code after "some_potentially_hazardous_method_call()" will always be executed, even if the "disabled_via_feature_flag" condition is true.
And that's with a GPT3 without any special fine tuning. Of course, the name `some_potentially_hazardous_method_call` is pretty leading in itself. I rewrote the prompt slightly more realistically, as: What problem does the following code have?
def optionally_do_work():
    if disabling_flag:
        logger.info("skipping the work due to flag")
    do_work()
and received: The problem is that the code will still try to do the work, even if the flag is set.
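For completeness, the fix being pointed at is a one-line early return. A runnable sketch of the corrected snippet, with the flag value and a stub `do_work` stood up purely for illustration:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

disabling_flag = True   # hypothetical flag value, just for the sketch
work_done = []          # records whether the work actually ran

def do_work():
    work_done.append("work")

def optionally_do_work():
    if disabling_flag:
        logger.info("skipping the work due to flag")
        return  # the missing early return: without it, do_work() always runs
    do_work()

optionally_do_work()
print(work_done)  # → [] (the flag now actually skips the work)
```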
This does seem like a pretty trivial easier-than-fizzbuzz question to be asking, though, since it's so encapsulated.

I dunno about this. I know the received wisdom is that "writing the code isn't the hard part", but I think reality is more like "writing the code is only one of the hard parts". There's an awful lot of badly written code, or code which is only partly correct, or only correct under some circumstances. The only way to make writing code not one of the hard parts is to specify 100% of the functionality, every corner case, and all test scenarios, before any code is written. And then you still have to verify that it was translated correctly into code, which I think we can all agree is another one of the hard parts!
Conceiving the solution is hard, thinking of edge cases, what-ifs, and failure scenarios is hard, creating effective tests is hard, and writing the actual code understandably and correctly is also hard!
Writing the code isn't the bottleneck. And there is no point in optimizing some part of a process that isn't a bottleneck.
Anyway, have you noticed that "understandably and correctly" isn't included in the OP's definition of "writing code"? That's for a reason, and it's the most appropriate definition to use in this context.
1. It has shifted some of the code-writing I do from generation to curation.
Most of the time, I have to make some small change to one of the first options I get. Sometimes I don’t. Sometimes I get some cool idiomatic way of doing something that’s still wrong, but inspires me to write something different than I originally planned. All of these are useful outcomes — and unrelated to whether someone is “actually thinking about what they’re writing”.
2. It has changed my tolerance for writing redundant code, for the better.
Like many programmers, I tend to optimize my code for readability first, and then other things later when I have more information. Sometimes, my desire for readability conflicts with my desire for code that avoids redundancy (e.g., “oh but if I put these three cases into an array I can just use a for loop and don’t have to write out as much code” etc. etc.), and my old bias was avoiding redundancy more often than not. But copilot is really great at generating code that has redundancy, which has helped me write more readable code in quite a few cases.
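The tradeoff in point 2 can be made concrete. A toy sketch, where the notification channels and helpers are entirely hypothetical and delivery is replaced by a list so the example is self-contained:

```python
sent = []  # stand-in for real delivery; records what would be sent

def send_email(addr, msg): sent.append(("email", addr, msg))
def send_sms(num, msg):    sent.append(("sms", num, msg))
def send_push(dev, msg):   sent.append(("push", dev, msg))

def notify_redundant(user):
    # Redundant but readable: each channel is written out explicitly.
    send_email(user["email"], "Report ready")
    send_sms(user["phone"], "Report ready")
    send_push(user["device"], "Report ready")

def notify_compact(user):
    # DRY but indirect: the cases become data driving a loop.
    for send, key in [(send_email, "email"),
                      (send_sms, "phone"),
                      (send_push, "device")]:
        send(user[key], "Report ready")

user = {"email": "a@b.c", "phone": "555", "device": "dev1"}
notify_redundant(user)
notify_compact(user)
print(sent[:3] == sent[3:])  # → True: both styles do exactly the same thing
```

Both versions are behaviorally identical; the choice is purely about whether the reader scans three plain lines or unpacks a table and a loop.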
3. I refactor code way more now.
In part this is because, given code that already works but is not ideal (e.g., needs to be broken into more functions, or needs extra context, or some critical piece needs to be abstracted), copilot does a fantastic job at rewriting that code to fit new function prototypes or templates. IDEs can help with this task, for a few common types of refactoring, but copilot is way more flexible and I find myself much more willing to rewrite code because of it.
Copilot is not what many people want it to be, in much the same way that Tesla’s Autopilot is not what many people want it to be. But both do have their uses, and in general those uses fall into the category of “I, as human, get to watch and correct some things instead of having to generate all things.” This can be very useful. (FWIW, it takes some time to adapt to this; I teach and mentor a lot and I found myself relying on those skills a ton when working with copilot.)
We shouldn’t discount this usefulness just because these systems don’t also have other usefulness that we also want!
Here's how it actually works in practice
1. Start a line to do an obvious piece of code that is slightly tedious to write.
2. Type 2 characters and Copilot usually guesses what you want based on context. Perhaps this time it's off.
3. No matter, just type another 3 characters or so and Copilot catches up and gives a different suggestion. I just hit "tab" and the line is complete.
It really shines at writing boilerplate. I admit that I'm paranoid every time it suggests more than 2 lines, so I usually avoid it. But in about a year of using it I've run into Copilot-induced headaches twice. Once was in the first week or so of using it; I swore off using it for anything more than a line then. Eventually I started to ease up, since it was accurate so often, and then I learned my second lesson with another mistake. Other than that it's done nothing but save me time. It's also been magnificent for learning new languages. I'll usually look up its suggestions to understand them better, but even knowing what to look up is a huge service it provides.
Since I have a right arm swelled to twice its normal size right now and it hurts to type for more than ten minutes (hopefully OK in a few days), I can imagine an advanced autocomplete being really useful for some disabilities.
And pray tell, how much typing is required to go back and fix the incorrect code produced by copilot?
P.S.: wishing you a speedy recovery!
Of course, one would then ask how to verify tests. I suppose Copilot could write meta-tests - tests that verify other tests. That way it could test its own tests and tweak them until they work.
Of course, one would then ask how to verify meta-tests. I suppose Copilot could write meta-meta-tests - tests that verify meta-tests. That way it could test its own meta-tests and tweak them until they work.
Of course, one would then ask how to verify meta-meta-tests...
Sure it can. But you can't rely on them being good. You have to read the tests carefully.
But yeah, the hard part of writing nontrivial software isn't typing code, it's the software architecture and design.
I thought it might be more useful to me for a language I’m already good at, or one I’m not trying to master but just need to get a task done for.
A humorous example: https://cookingflavr.com/should-you-feed-orioles-all-summer/
Human pair programmers will signal when they're not sure about something. A code generator will not.
By "better", I mean more absurd, shocking and funny :)
However it's even more expensive than copilot...
It falls apart when writing actual code that exists in an app. I’m not convinced even the lowest junior dev could get away with not knowing programming.
I once had it generate an entire interview with an author, which was so realistic I was sure it had encountered it verbatim in the training data. The interview was about one of his books. Turns out such a book didn't even exist, but GPT-3 knew real facts like the name of his publisher, the names of employees there, etc., and wove them into the story.
The best use I've found for GPT-3 is text summarization, it seems to do very well on that front. I think OpenAI are working on a hyperlinked interface that lets you jump to the original source for each fact in the summary.
random words ---> markov models ---> transformer ---> human writer
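The "markov models" rung of that ladder fits in a few lines: a bigram model that continues text with whatever words tended to follow before, with no notion of meaning at all. A minimal sketch, with a made-up toy corpus:

```python
import random
from collections import defaultdict

def train_bigram(text):
    """Map each word to the list of words observed following it."""
    words = text.split()
    model = defaultdict(list)
    for a, b in zip(words, words[1:]):
        model[a].append(b)
    return model

def generate(model, start, n, seed=0):
    """Continue `start` for up to n words by sampling observed followers."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(n):
        followers = model.get(out[-1])
        if not followers:
            break
        out.append(rng.choice(followers))
    return " ".join(out)

corpus = "the cat sat on the mat and the cat ate the fish"
model = train_bigram(corpus)
print(generate(model, "the", 6))
```

Transformers replace the lookup table with a learned function of the whole context, but the interface is the same: "given what came before, emit a plausible next token."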
Inevitably, users of these kinds of models want them to produce more and more specific output, to the point that they really don't want what the models produce and instead are just trying to get a computer to write stuff for them that they want. Eventually all the tuning and filtering and whatnot turns into more work than just producing the output the user wants in the first place. It's just a room of monkeys banging on typewriters at the end of the day.
Huh, that’s my experience with human-written texts and journalism in particular.
And I also write tests, which should catch bad logic.