- they: we need a new basic POST endpoint
- us: cool, what does the api contract look like? URL? Query params? Payload? Response? Status code?
- they: Not sure. Third-party company XXQ will let you know the details. They will be the ones calling this new endpoint. But in essence it should be very simple: just grab whatever they pass and save it in our db
- us: ok, cool. Let me get in contact with them
- ... one week later...
- company XXQ: we got this contract here: <contract_json>
- us: thanks! We'll work on this
- ... 2 days later...
- us: umm, there's something not specified in <contract_json>. What about this part here that says that...
- ... 2 days later...
- company XXQ: ah sure, sorry we missed that part. It's like this...
- ...and so on...
Basically, 99% of the effort is NOT WRITING CODE. It's all about communication with people, and problem solving. If we use GPT-X in our company, it will help us with 1% of our workload. So, I couldn't care less about it.
Those death-knell types seemingly aren't aware of what day-to-day operations look like. AI makes a great tool, but it doesn't necessarily deal with the very human factors of whims, uncertainty, reactive business mentalities and the phenomenon that is best summarised by this webcomic: https://theoatmeal.com/comics/design_hell
In my field they like to call this "hurry up and wait", a nonsensical but fitting description that summarises everything from changing scope to the unjust imbalance of time between the principal and the agent.
(There is a comment further down which suggests that we could just train AI to deal with this variability, I hope that's humour... sweet summer child thinks you can train AI to predict the future.)
Companies are already bloated, imagine when they realize one overworked highly paid senior can replace 10 juniors.
Computer Science is not really very much about computers. And it’s not about computers in the same sense that physics isn’t really about particle accelerators, and biology is not really about microscopes and petri dishes. It is about formalizing intuitions about process: how to do things [0].
[0]: https://www.driverlesscrocodile.com/technology/the-wizard-1-....
- Edsger Dijkstra
He and Sussman are great at distilling and explaining abstract concepts in a clear and precise way.
First, if you did have GPT-X (say GPT-10) in your company, there wouldn't be much back-and-forth communication either. Those parts would be handled by GPT-X talking to another GPT-X in the other company. Even the requirements might be given by a GPT-X.
Second, even if that's not the case, the part of doing the communication can be handled by non-programmers. They can then feed the result of the communication to GPT-X and have it churn out some program. Perhaps they would keep a couple of developers to verify the programs (sort of like GPT-X operators and QA testers) and get rid of the rest.
As for the rest of the current team of developers? GPT-X and the people running the company could not care less about them!
What happens if one (or more) of the GPT-Xs starts having hallucinations while they're busy working on this project?
> Second, even if that's not the case, the part of doing the communication can be handled by non-programmers.
I was in a meeting with a sales manager and tech team a few days ago. Sales manager had been talking to a potential customer about a product that neither he nor the customer properly understood. They both thought it would be an excellent fit for a particular (new) purpose, one for which it was not designed.
As it turned out, everyone on the tech team knew that both the sales manager and the customer were utterly and catastrophically wrong, but it took the best part of an hour to finally convince him of this.
It's quite hard to have useful conversations about stuff if you don't actually understand it.
It can’t: only programmers will know which follow-up questions to ask. GPT will be able to ask those questions before non-programmers will.
Half the work is nitpicking on date formats or where some id comes from or if a certain field is optional, etc.
Additionally, this is "stateless" to an extent. There's no architectural plan for how it should work when you have an LLM do it. "We're using X now but there are plans to switch to Y in some number of months." This could lead to making an abstraction layer for X and Y so that when the switchover happens there is less work to be done - but that requires forward looking design.
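Here is a minimal Python sketch of that kind of forward-looking abstraction layer. `RecordStore`, `XStore`, `YStore` and `record_order` are hypothetical names standing in for "X" and "Y". Both backends are in-memory toys. The point is only that business code depends on the interface, so the switchover means writing one new class rather than touching every call site.

```python
import json
from typing import Dict, Optional, Protocol


class RecordStore(Protocol):
    """The one interface callers depend on, whichever vendor sits behind it."""

    def save(self, key: str, value: dict) -> None: ...

    def load(self, key: str) -> Optional[dict]: ...


class XStore:
    """Hypothetical stand-in for the current system X (an in-memory dict)."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}

    def save(self, key: str, value: dict) -> None:
        self._data[key] = dict(value)

    def load(self, key: str) -> Optional[dict]:
        return self._data.get(key)


class YStore:
    """Hypothetical stand-in for the planned system Y, which stores JSON text."""

    def __init__(self) -> None:
        self._rows: Dict[str, str] = {}

    def save(self, key: str, value: dict) -> None:
        self._rows[key] = json.dumps(value)

    def load(self, key: str) -> Optional[dict]:
        raw = self._rows.get(key)
        return None if raw is None else json.loads(raw)


def record_order(store: RecordStore, order_id: str, total: float) -> None:
    # Business code only sees RecordStore, so switching X -> Y touches no call sites.
    store.save(order_id, {"order_id": order_id, "total": total})
```

The design choice is plain dependency inversion. It only pays off if someone anticipated the switch, which is the forward-looking part the comment says an LLM prompt won't supply.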
If "they" only describe the happy path, there is no one to ask about all the unhappy paths, edge cases and corner cases where the naive implementation of the problem description will fail.
Hypothetically, yea, "they" could be trained to think through every possible way the generated code could go wrong and describe how the code should work in that situation in a way that isn't contradictory... but that remains an unsolved problem that has nagged developers for decades. Switching to an LLM doesn't resolve that problem.
I've been working on a database migration recently, and I look forward to the rare moments when I get to write queries and analyze actual data. The vast majority of my billable hours are spent trying to tease out the client's needs by going over the same ground multiple times, because their answers keep changing and are often unclear.
It takes a lot of processing to figure out an implementation for someone who will straight up describe their requirements incorrectly. Especially when a higher-ranking person comes back from vacation and says "no, everything you nailed down in the last two weeks is completely wrong".
I don't think any of the current LLMs are going to handle these types of very common situations better than an experienced human any time soon. It's like that last 1% of self driving which may actually require AGI. No one can say for sure because it's not cracked yet. I think most of us will continue to have job security for quite a while.
I had one job that involved a small amount of coding and mainly hooking together opaque systems. The people behind those systems were unresponsive and often surly. I had to deal with misleading docs, vague docs, and subtle, buried bugs that people would routinely blame on each other or me, and I was constantly on a knife edge, balancing political problems (e.g. don't make people look stupid in front of their superiors, don't look or sound unprepared) with technical concerns.
It was horrible. I burned out faster than a match.
I'm sure ChatGPT couldn't do that job, but I'm not sure I could either.
If most tech jobs turn into that while the fun, creative stuff is automated by ChatGPT... that would be tragic.
The other thing that I spend a huge amount of my time doing - consistently more than writing code - is debugging. Maybe these models really will get to the point where I can train one on our entire system (in a way that doesn't hand over all our proprietary code to another company...), describe a bug we're seeing, and have it find the culprit with very high precision, but this seems very far from where the current models are. Every time I try to get them to help me debug, it ends in frustration. They can find and fix the kinds of bugs that I don't need help debugging, but not the ones that are hard.
I've come to realize this is true in more contexts than I would like. I've encountered way too many situations where "sitting on your hands and not doing anything" was the right answer when asked to implement a project. It turns out that often there is radio silence for a month or so, then the original requester says "wait, it turns out we didn't need this. Don't do anything!"
If you want to hire a developer to implement qsort or whatever, ChatGPT has them beat hands-down. If you want to build a product and solve business problems, there's way more involved.
As the project progresses, the AI model would likely gain a better understanding of the client's values and principles, leading to improved results and potentially valuable insights and feature suggestions.
Unless your problem space is unsolved (where LLMs are unlikely to be useful either) very few devs are spending much time on the coding part of their 84th CRUD app.
In my naivete, the idea I had is that if the jobs disappeared en masse then a social solution to the economy would be forced to be enacted. So I was essentially hoping to see the destruction of normalcy and employment in any new technology. I would expect that view is not uncommon given the direction of contemporary education. It feels analogous to cows hoping for the end of beef/milk harvesting. My beleaguered bovine buddies, what awaits you there is something rather different than cowtopia.
:D Priceless!
The complexity in software engineering is almost never coding. Coding is easy, almost anyone can do it. Some specialized aspects of coding (ultra low latency realtime work, high performance systems, embedded) require deep expertise but otherwise it's rarely hard. It's dealing in ambiguity that's hard.
The hype around GPT-* for coding generally confirms my suspicion that 70+% of folks in software engineering/development are really "programmers" and 30% are actually "engineers" who have to worry about generating requirements, long-term implications, other constraints, etc.
And every time that comes up those folks in the 70% claim that's just a sign of a poorly managed company. Nope. It's good to have these types of conversations. Not having those conversations is the reason a lot of startups find themselves struggling to stay afloat with a limited workforce when they finally start having lots of customers or high profile ones.
- Have a cron job that checks email from a certain sender (CRM?).
- Instruct an OpenAI API session to say a magic word to reply to an e-mail: "To reply, say REPLY_123123_123123 followed by your message."
- Pipe received email and decoded attachments as "We received the following e-mail: <content here>" to the OpenAI API. Make it a 1-click action to check if there is a reply and confirm sending the message. If it does not want to send a message, read its feedback.
"Write an OpenAI API client that would take "new_message" from the above, feed it to the API with a prompt that asks the model to either indicate that a reply is needed by outputting the string "REPLY_123123_123123" and the message to send, or give a summary. If a reply is needed, create a draft with the suggested response in the Draft mailbox."
It truncated the "REPLY_123123_123123" bit to "REPLY_", and the prompt it suggested was entirely unusable, but the rest was fine.
I tried a couple of times to get it to generate a better prompt, but that was interestingly tricky - it kept woefully underspecifying the prompts. Presumably it has seen few examples of LLM prompts and results in its training data so far.
But overall it got close enough that I'm tempted to hook this up to my actual mailbox.
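The fragile part here is deciding, from the model's free-text answer, whether a reply was requested. Below is a minimal parsing sketch that assumes the exact sentinel from the prompt above. The model call, mailbox access and draft creation are left out, and `Decision` and `parse_model_output` are hypothetical names. Requiring the full sentinel means a truncated "REPLY_" (as happened above) falls through to the summary path for a human to read, instead of silently producing a draft.

```python
from dataclasses import dataclass

SENTINEL = "REPLY_123123_123123"  # magic word from the prompt above


@dataclass
class Decision:
    needs_reply: bool
    text: str  # the draft reply if needs_reply, otherwise the model's summary/feedback


def parse_model_output(output: str) -> Decision:
    """Decide whether the model asked to reply, matching the full sentinel exactly."""
    idx = output.find(SENTINEL)
    if idx != -1:
        body = output[idx + len(SENTINEL):].strip()
        if body:
            return Decision(needs_reply=True, text=body)
    # No sentinel, a truncated one (e.g. "REPLY_"), or an empty body:
    # treat it as a summary so a human reads it instead of a draft being created.
    return Decision(needs_reply=False, text=output.strip())
```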
* checking if code will be impacted by breaking changes in a library upgrade
* converting code to use a different library/framework
* more intelligent linting/checking for best practices
* some automated PR review, e.g. calling out confusing blocks of code that could use commenting or reworking
It's an electric bicycle for the creative mind (how long until the first one-person unicorn?), I don't anticipate much success for those trying to use it as a self-driving car for half baked ideas.
People have been working on medical and judicial expert systems for ages, but nobody wants to put those systems in charge; they're just meant to advise people, helping people make better decisions.
And of course ChatGPT and GPT-4 are way more flexible than those expert systems, but they're also more likely to be wrong, and they're still not as flexible as people.
And in your scenario where chatGPT can code but someone needs to gather requirements, it still doesn't necessitate software engineers. I'm not worried about that personally but I don't think the "communicating between stakeholders" skill is such a big moat for the software engineering profession.
These things seem very ripe for LLM exploitation...
EVERY FIELD IN THE STANDARD WAS OPTIONAL.
That was one of the most not-fun times I've had at work :D Every single field was either there or it was not, depending whether the data producer wanted to add them or not, so the whole thing was just a collection of special cases.
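The sketch below shows what "every field is optional" does to consuming code. The standard isn't named, so `Reading` and its fields are hypothetical. Each field becomes `Optional`, and every consumer grows a branch per field.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class Reading:
    # Hypothetical fields; every one is Optional because producers may omit any of them.
    station_id: Optional[str] = None
    timestamp: Optional[str] = None
    temperature_c: Optional[float] = None


def parse_reading(raw: Dict[str, Any]) -> Reading:
    """Keep only known fields; anything absent stays None."""
    known = {f.name for f in fields(Reading)}
    return Reading(**{k: v for k, v in raw.items() if k in known})


def describe(r: Reading) -> str:
    # Every consumer has to branch on every field: the "collection of special cases".
    station = r.station_id if r.station_id is not None else "unknown station"
    when = r.timestamp if r.timestamp is not None else "unknown time"
    if r.temperature_c is None:
        return f"{station} at {when}: no temperature reported"
    return f"{station} at {when}: {r.temperature_c:.1f} C"
```

With three fields that is already eight presence combinations. A real standard with dozens of fields makes the special cases multiply accordingly.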
So it's perfectly possible to have it eventually iterate back and forth with the client and not only output the code but also the conversation with the client leading up to it
> If we use GPT-X in our company, it will help us with 1% of our workload
I think there are many such cases. Another one that comes to mind is adding features to a large or legacy codebase. Writing the new code or function is a small part of the work. The main part of the work is first understanding and agreeing on how and where to implement the changes, sometimes across multiple teams, along with the implications and knock-on effects on other software components, potential API changes, updating test suites, etc.

Writing code is still part of the job. In my company, I'd say it's still very roughly 50% of the job. If I can be a bit more efficient thanks to GPT, great. Actually, I already use it for writing simple things in languages I'm not proficient with, or for improving a particular piece of code that I know can be rewritten in a more idiomatic way. It's not perfect, but I've found it useful.
It's not going to replace SWEs, but it's going to make us more productive.
Also, ChatGPT would likely be able to extrapolate. It would just need to write an email to XXQ to confirm the change.
Cope harder... the fact that you can write an email won't save you.
> - us: umm, there's something not specified in <contract_json>. What about this part here that says that...
> - ... 2 days later...
Can you replace this part with ChatGPT talking to another ChatGPT to generate questions and answers instantly?
System Prompt: You are CHRIS the IT consultant. Once you take in a question, make sure you understand all the requirements. Don't write code until you have an explicit request to do so. First make sure you ask all the questions relevant to solving the problem and needed by the code. All the responses as CHRIS should be written as CHRIS: your response here.
Sure, I understand. As an IT consultant, it's important to fully understand the problem and requirements before proposing a solution or writing any code. Please let me know what question you have and I'll do my best to ask all relevant questions to ensure I fully understand the problem.
> CHRIS, we need a new basic POST endpoint

CHRIS: Great, I'd be happy to help you with that. Could you please provide more information about the purpose of this endpoint? What kind of data will be sent in the request body? Will this endpoint be used to create new data, update existing data, or perform some other action?
> Third-party company XXQ will let you know the details. They will be the ones calling this new endpoint. But in essence it should be very simple: just grab whatever they pass and save it in our db.

CHRIS: Understood. It sounds like the purpose of this endpoint is to receive data from a third-party company, XXQ, and store it in your database. Here are some questions to help me better understand the requirements:
What format will the data be in? Will it be JSON, XML, or some other format?
What kind of data will be included in the request body? Will it be simple data types like strings and numbers, or more complex data structures like arrays and objects?
Will there be any validation or transformation required on the data before it's stored in the database?
Are there any security or authentication requirements that need to be considered for this endpoint, given that it will be accepting data from a third-party company?
> We got this contract here: { product_id: string, product_price: float, product_name: string }

CHRIS: Great, thanks for providing that information. Based on that, here are the next steps to create the POST endpoint:
We'll need to set up an API endpoint to listen for POST requests.
We'll need to parse the request body, which should contain the product information in JSON format.
We'll need to validate the data to ensure that it contains the required fields (product_id, product_price, and product_name) and that the data types are correct.
Once the data has been validated, we can save it to the database.
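The four steps CHRIS lists can be sketched framework-free in Python. This is an illustration under assumptions, not the contract itself:

- The field names and types come from the example contract, and "float" is taken to accept any JSON number.
- Rejecting unknown fields and returning 201/400 are guesses. That is exactly the kind of detail the contract leaves for a follow-up email to XXQ.
- A Python list stands in for the database.

```python
import json
from typing import Any, Dict, List, Optional, Tuple

# Field names and types from the example contract; "float" is assumed to accept any JSON number.
CONTRACT = {"product_id": str, "product_price": (int, float), "product_name": str}


def validate_payload(body: str) -> Tuple[Optional[Dict[str, Any]], List[str]]:
    """Return (data, []) on success or (None, errors) on failure."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return None, ["payload must be a JSON object"]

    errors: List[str] = []
    for name, expected in CONTRACT.items():
        if name not in data:
            errors.append(f"missing field: {name}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(data[name], bool) or not isinstance(data[name], expected):
            errors.append(f"wrong type for {name}")
    # The contract is silent on extra fields; rejecting them is an assumption.
    errors.extend(f"unexpected field: {k}" for k in sorted(set(data) - set(CONTRACT)))
    return (None, errors) if errors else (data, [])


def handle_post(body: str, db: List[Dict[str, Any]]) -> Tuple[int, Dict[str, Any]]:
    """Framework-agnostic handler; 201/400 are assumed, not taken from the contract."""
    data, errors = validate_payload(body)
    if data is None:
        return 400, {"errors": errors}
    db.append(data)  # stand-in for "save it in our db"
    return 201, {"saved": data["product_id"]}
```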
This is a real ChatGPT response (current ChatGPT - GPT 3.5 turbo, not GPT-4, and not a real system-prompt). Not bad for a prototype! Now give CHRIS Github/Vercel/SSH access keys, SMTP/IMAP, Slack and a Whisper interface to Zoom/Meet/Teams and invite it to the 200 meetings where things will be changed around 1000x before being released. Raise some VC money and you are all set! Or maybe not ChatGPT but something like it.
It just needs some knowledge repository centralization.
I thought I was going crazy.
Now I'm sad that this is just how it is in tech these days.
I strongly disagree that 99% of the effort is not writing code. Consider how long these things actually take:
> - they: we need a new basic POST endpoint
> - us: cool, what does the api contract look like? URL? Query params? Payload? Response? Status code?
> - they: Not sure. Third-party company XXQ will let you know the details. They will be the ones calling this new endpoint. But in essence it should be very simple: just grab whatever they pass and save it in our db
> - us: ok, cool. Let me get in contact with them
That's a 15 minute meeting, and honestly, it shouldn't be. If they don't know what the POST endpoint is, they weren't ready to meet. Ideally, third-party company XXQ shows up prepared with contract_json to the meeting and "they" does the introduction before a handoff, instead of "they" wasting everyone's time with a meeting they aren't prepared for. I know that's not always what happens, but the skill here is cutting off pointless meetings that people aren't prepared for by identifying what preparation needs to be done, and then ending the meeting with a new meeting scheduled for after people are prepared.
> - company XXQ: we got this contract here: <contract_json>
> - us: thanks! We'll work on this
This handoff is probably where you want to actually spend some time looking over, discussing, and clarifying what you can. The initial meeting probably wants to be more like 30 minutes for a moderately complex endpoint, and might spawn off another 15 minute meeting to hand off some further clarifications. So let's call this two meetings totalling 45 minutes, leaving us at an hour total including the previous 15 minutes.
> - us: umm, there's something not specified in <contract_json>. What about this part here that says that...
That's a 5 minute email.
> - company XXQ: ah sure, sorry we missed that part. It's like this...
Worst case scenario that's a 15 minute meeting, but it can often be handled in an email. Let's say this is 20 minutes, though, leaving us at 1 hour 15 minutes.
So your example, let's just round that up into 2 hours.
What on earth are you doing where 3 hours is 99% of your effort?
Note that I didn't include your "one week later" and "2 days later" in there, because that's time that I'm billing other clients.
EDIT: I'll actually up that to 3 hours, because there's a whole other type of meeting that happens, which is where you just be humans and chat about stuff. Sometimes that's part of the other meetings, sometimes it is its own separate meeting. That's not wasted time! It's good to have enjoyable, human relationships with your clients and coworkers. And while I think it's just worthwhile inherently, it does also have business value, because that's how people get comfortable to give constructive criticism, admit mistakes, and otherwise fix problems. But still, that 3 hours isn't 99% of your time.
This explains the xkcd Dependency comic [0]. The man in Nebraska isn't solving anyone's problem in any particular context of communication and problem solving. He is only preemptively solving potential problems, not creating value as problems are observed and solved. It also explains why consultancy and so-called bullshit jobs are paid well. They offer no "actual value" and just resell backend man-hours and make random suggestions, yet they create value in set contexts.
And yet this logic is also completely flawed, because the ideal form of a business following this thinking is a pure scam. Maybe all jobs are scams, some less so?
Some random recent thing: "We have a workflow engine that's composed of about 18 different services. There's an orchestrator service, some metadata services on the side, and about 14 different services which execute different kinds of jobs which flow through the engine. Right now, there is no restriction on the ordering of jobs when the orchestrator receives a job set; they just all fire off and complete as quickly as possible. But we need ordering; if a job set includes a job of type FooJob, that needs to execute and finish before all the others. More-over, it will produce output that needs to be fed as input to the rest of the jobs."
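Stripped of the distributed-systems parts, the ordering requirement itself is small. Here is a minimal in-process sketch with `asyncio`. It assumes each job is an async function that takes FooJob's output, or `None` when the job set has no FooJob. `run_job_set` is a hypothetical name. None of the Kafka, database, or external-API concerns the comment goes on to describe are modeled, and those are where the real work is.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Optional

# A job receives FooJob's output (or None if there was no FooJob) and returns its own output.
Job = Callable[[Optional[Dict[str, Any]]], Awaitable[Dict[str, Any]]]


async def run_job_set(jobs: Dict[str, Job], gate: str = "FooJob") -> Dict[str, Dict[str, Any]]:
    """Run the gate job to completion first, then fan out the rest concurrently with its output."""
    results: Dict[str, Dict[str, Any]] = {}
    gate_output: Optional[Dict[str, Any]] = None
    if gate in jobs:
        gate_output = await jobs[gate](None)
        results[gate] = gate_output
    rest = [name for name in jobs if name != gate]
    outputs = await asyncio.gather(*(jobs[name](gate_output) for name in rest))
    results.update(zip(rest, outputs))
    return results
```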
There's a lot of things that make this hard for humans, and I'm not convinced it would be easier for an AI which has access to every bit of code the humans do.
* How do the services communicate? We could divine that pretty quickly: let's say it's over Kafka topics. Lots of messages are being published to topics that are provided to the applications via environment variables. It's easy to find that out. It's often harder to figure out "what are the actual topic names?" Ah, we don't have much IaC, and it's not documented, so here I go reaching for kubectl to fetch some configmaps. This uncovers a weird web of communication that isn't obvious.
* Coordination is mostly accomplished by talking to the database. We can divine parts of the schema by reverse engineering the queries. They don't contain type information, because the critical bits of this are in Python. There are also no SQL files that set up the database, because the guy who set it up was a maverick and did everything by hand.
* Some of the services communicate with external APIs. I can see some axios calls in this JavaScript service. There are some function names, environment variable names, and URL paths which hint at what external service they're reaching out to. But the root URL is provided as an environment variable, and it's stored as a secret in k8s in order to co-locate it in the same k8s resource that stores the API key. Neither I nor the AI has access to this secret, thanks to a new security policy resulting from a new security framework we adopted.
* But we get it done. We learn that doing this ordering adds 8 minutes to every workflow invocation, which the business deems unacceptable because reasons. There is genuinely a high cardinality of "levels" you think about when solving this new problem. At the most basic level, and this is what AI today might be good at, you performance-optimize the new ordered service like crazy. But that's unlikely to solve the problem holistically, so we explore higher levels. Do we introduce a cache somewhere? Where and how should we introduce it, to maximize coherence of data? Do some of the services _not_ depend on this data, so that they could be run out of order? Do we return to the business and say that what they're asking for actually isn't possible, when considering the time-value of money and the investment it would take to shave processing time off, and that maybe we should make an extra 8 minutes OK? Can we rewrite or deprecate some of the services which need this data so that they no longer need it?
* One of the things this ordered workflow step service does is issue about 15,000 API calls to some external service in order to update some external data source. Well, we're optimizing, and one of the most common things GPT-4 recommends when optimizing services like this is to increase the number of simultaneous requests. I've tried to walk through problems like this with GPT-4, and it loves suggesting that, along with a "but watch out for rate limits!" addendum. Well, the novice engineer and the AI do this, and it works OK; we get the added time down to 4 minutes. But 5% of invocations start failing. It's not tripping a rate limit; we're just seeing pod restarts, and the logs aren't really indicative of what's going on. Can the AI (1) get the data necessary to know what's wrong (remember, k8s access is kind of locked down thanks to that new security framework we adopted), (2) identify that the issue is that we're overwhelming networking resources on the VMs executing this workflow step, and (3) identify that increasing concurrency may not be a scalable solution and that we need to go back to the drawing board? Or let's say the workflow is running fine, but the developers@mycompany.com email account just got an email from the business partner running this service. It says they had to increase our billing plan because of the higher, denser usage. They're allowed to do this because of the contract we signed with them. No business leaders are actively monitoring this account, because it's just used to sign up for things like this API. Does the email get forwarded to an appropriate decision maker?
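The "increase the number of simultaneous requests" advice usually amounts to something like this: an `asyncio.Semaphore` capping how many of the ~15,000 calls are in flight at once. `call_all_bounded` and any particular `limit` value are assumptions for illustration. It makes the concurrency knob explicit and tunable, but it cannot tell you that the real bottleneck is networking on the VMs, which is the comment's point.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def call_all_bounded(
    items: Iterable[T], call: Callable[[T], Awaitable[R]], limit: int
) -> List[R]:
    """Run call(item) for every item with at most `limit` calls in flight; results keep input order."""
    sem = asyncio.Semaphore(limit)

    async def one(item: T) -> R:
        async with sem:
            return await call(item)

    return await asyncio.gather(*(one(item) for item in items))
```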
I think the broader opinion I have is this: Microsoft paid hundreds of millions of dollars to train GPT-4 [1]. Estimates say that every query, even at the extremely rudimentary level GPT-3 has, is 10x+ the cost of a typical Google search. We're at the peak of Moore's law. Compute isn't getting cheaper, and coordinating and maintaining the massive data centers it takes to do these things means every iota of compute is getting more expensive. The AI generalist crowd has to make a compelling case that this specialist training, for every niche there is, is cheaper and higher quality than what it costs a company to train and maintain a human. And the human has the absolutely insane advantage that the company barely trains them at all: the human's parents, public schools, universities paid for by the human, hobbies, and previous work experience do.
There's also the idea of liability. Humans inherently carry agency, and from that follows liability. Whether that's legal liability, or just your boss chewing you out because you missed a deadline. AI lacks this liability; and having that liability is extremely important when businesses take the risk of investment in some project, person, idea, etc.
Point being, I think we'll see a lot of businesses try to replace more and more people with AIs, whether intentionally or just because everyone using them becomes more productive. Those that index high on AI usage will see some really big initial gains in productivity. But over time (and by that I mean the late 2020s and early 2030s) we'll start seeing news articles about "the return of the human organization": the recognition that capitalism has more reward functions than just Efficiency, and Adaptability is an extremely important one. Moreover, the businesses which index too far into relying on AI will start faltering. They will have delegated so much critical thinking to the AI that the humans in the mix start losing their ability to think critically about large problems. Problems are then no longer approached from the angle of "how do we solve this", but rather "how do I rephrase this prompt to get the AI to solve it right".
[1] https://www.theverge.com/2023/3/13/23637675/microsoft-chatgp...
I really am in awe of how much work people seem willing to do to justify this as revolutionary and programmers as infantile, and also why they do that. It’s fascinating.
Thinking back to my first job out of college as a solid entry level programmer. ChatGPT couldn’t have done what I was doing on day 2. Not because it’s so hard or I’m so special. Just because programming is never just a snippet of code. Programming is an iterative process that involves a CLI, shell, many runtimes, many files, a REPL, a debugger, a lot of time figuring out a big codebase and how it all links together, and a ton of time going back and forth between designers, managers, and other programmers on your team, iterating in problems that aren’t fully clear, getting feedback, testing it across devices, realizing it feels off for reasons, and then often doing it and redoing it after testing for performance, feel, and feedback.
Often it’s “spend a whole day just reading code and trying to replicate something very tricky to find” and you only produce a single tiny change deep in the code somewhere. GPT is absolutely terrible at stuff like this.
And yes, often it is finding new solutions that aren’t anywhere on the internet. That’s the most valuable programming work, and a significant % of it.
Feel like there’s 10 more points I could make here but I’m on my phone and don’t like wasting too much time on HN. But man, what a disappointment of critical thinking I’ve seen in this specific topic.
What they failed to predict was that some people wouldn't try to automate them like-for-like. Instead, they would reconfigure their entire approach to fit the specific advantages and limitations of the machinery. And this new approach might even be qualitatively worse in various ways, but not so much as to overwhelm the economic advantages provided by the things machines were good at.
AI likely isn't going to slot into a developer-shaped hole in a software team. But it's possible we'll see new organisational approaches, companies, and development paradigms that ask: how far can you get if you put prompt-generated code at the heart of the workflow and make everything else subservient to it? I'm not sure, right now, that that approach is feasible, but I'm not sure it won't be in a year or two.
Their competitive advantage in their field then becomes the range of their tests (born of experience), the efficiency of their testing pipeline, and their ability to generate effective prompts.
We won't have understanding of our code, similar to how we don't understand the machine language being generated by our compilers now.
We will be using our intuition about GPT to bring into being fully designed and integrated systems with 10 paragraphs at the prompt.
Which could in the end greatly increase the influence of a programmer in a given organization. Our role will be a softer, almost cyborgian one.
But this will indeed require the destruction of all that came before it. Questions like "but does it work with this 3rd party API, or this platform?" must become irrelevant for this future to happen.
A bit similar to how the web destroyed mainframe, perhaps, by first creating its own compelling world, then making the mountain come to it.
However, the next challenge is that the machine itself is now an "artisan" device.
I'm sure the first bowl printing machine ( https://youtu.be/bD2DNSt8Wb4 ) was purely artisan... but now you can buy them on Alibaba for a few thousand dollars ( https://www.alibaba.com/product-detail/Printing-Machine-Cera... )
I am sure there is a (bowl printing machine) machine out there.
But if you say "I want a bowl printing machine that can do gradient colors" that first one (and probably the first few until it gets refined) are all going to be artisanal manufacturing processes again.
This all boils down to that at some point in the process, there will be new and novel challenges to overcome. They're moving further up the production chain, but there is an artisan process at the end of it.
The design of a new car has changed over time so that it is a lot more automated now than it was back then ( https://youtu.be/xatHPihJCpM ) but you're not going to get an AI to go from "create a new car design" to actually verifying that it works and is right.
There will always be an artisan making the first version of anything.
We had to do that for the industrial approach and it wasn't a simple, fast or intuitive process.
OK, challenge accepted - will go down your suggested route - thanks :)
For one, it made horrible, glaring mistakes (like defining extern functions which don't exist, using functions which are specific to a platform I'm not using, etc.), stuff beginners would do.
It also snuck in little issues, such as off-by-one errors (calling write() with a buffer and a size that is off by one, in a place where it's very hard to tell) and missing edge cases (such as writing a C++ concept which worked, but did everything in slightly the wrong way to ensure the concept required exactly what I asked for).
Even when asked to correct these mistakes, it often struggled, made me read paragraph after paragraph of "I'm sorry, I've been such a bad little machine" garbage, and didn't even correct the issue (or, in some cases, introduced new bugs).
I'm utterly unimpressed by this. GPT is great for a lot of things, but not for writing code better than I would in the same time.
The time it took me to massage it into solving a nontrivial problem (write hello world with just syscalls) was way longer than reading the manual and writing it myself (and my version has fewer bugs).
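For readers who haven't done this exercise, here is a minimal sketch in Python rather than the lower-level language the commenter was working in. `os.write` is a thin wrapper over the POSIX `write()` system call, so it shows the two details the comment alludes to: passing exactly the right length (the off-by-one mentioned above) and looping on short writes. The names `write_all` and `hello` are illustrative, not from the original.

```python
import os

MESSAGE = b"Hello, world!\n"

def write_all(fd: int, data: bytes) -> None:
    """Write every byte of data to fd, retrying after short writes."""
    view = memoryview(data)
    while view:
        # os.write returns how many bytes were actually written, which
        # can be fewer than requested (a "short write"); skip past them.
        written = os.write(fd, view)
        view = view[written:]

def hello(fd: int = 1) -> None:
    # Pass the whole buffer: a length of len(MESSAGE) - 1 would silently
    # drop the trailing newline, an off-by-one that is easy to miss.
    write_all(fd, MESSAGE)

if __name__ == "__main__":
    hello()
```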
Not everyone unfazed by these articles is simply in denial. I feel sorry for people who write copy-paste code and find that ChatGPT or Clippy from 2000 can replace them, but not everyone writes trivial code.
There are so many complex non-CRUD disciplines involving programming, such as signal processing, robotics, control theory and scientific computation, to name a few, where the current version of GPT, at least, is not even close to being a good supplement, let alone a substitute.
But then I remember I'm on HN where the technical pinnacle of programming is Backend and DevOps.
I think the author is onto something – while AI might not be able to program per se, it can certainly be handed a code snippet and then use its huge corpus of Internet Learning™ to tell you things about it, code that looks like it, and ways (people on the Internet think) it might be solved better.
In that sense, it isn't replacing the programmer; it's replacing IDE autocomplete.
I think the author is operating in what I consider to be the sweet spot of current LLMs - where I can ask a question I don't know the answer to but can reliably spot bullshit (either because I know enough or through other means). I think there's a lot of value to be had when those conditions are met, and not just for coding.
Last night I sat down and tried using it to write an 8086 emulator. It produced a simple emulation outline fairly quickly, but when it came to getting each of the instructions and interrupts correct, it fell very flat very quickly. What was interesting is that it made the exact same mistakes many early emulator writers make. You could then correct it and it would give 'doing better' a shot. At one point I got a bit bored with it and kept feeding it 'can you make that more compact/better'. It did an adequate job at that, eventually using templates and jump lists. It did not get very far using Duff's device or dynrec, but I am sure I could have guided it into doing that.
But for the majority of things, as in emulation, 'close enough is good enough'. That is an interesting finding about coding things up, I think. This thing is also going to produce a seriously crazy amount of bugs that we will be chasing for decades.
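The "jump lists" mentioned above are the usual table-driven dispatch for an interpreting emulator: one handler per opcode byte, looked up directly instead of walked through a long if/else chain. This is a minimal Python sketch, assuming only four real 8086 instruction forms (MOV r16,imm16 at B8h+r, INC r16 at 40h+r, NOP at 90h, HLT at F4h), no segmentation and no flag updates; the class and function names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class CPU:
    memory: bytes
    ip: int = 0
    # 16-bit registers in 8086 encoding order: AX, CX, DX, BX, SP, BP, SI, DI
    regs: list = field(default_factory=lambda: [0] * 8)
    halted: bool = False

def fetch8(cpu):
    b = cpu.memory[cpu.ip]
    cpu.ip += 1
    return b

def fetch16(cpu):
    lo = fetch8(cpu)
    hi = fetch8(cpu)
    return lo | (hi << 8)  # the 8086 is little-endian

def mov_r16_imm16(cpu, opcode):
    cpu.regs[opcode - 0xB8] = fetch16(cpu)

def inc_r16(cpu, opcode):
    r = opcode - 0x40
    cpu.regs[r] = (cpu.regs[r] + 1) & 0xFFFF  # wrap at 16 bits; flags ignored

def nop(cpu, opcode):
    pass

def hlt(cpu, opcode):
    cpu.halted = True

# The jump list: a 256-entry table indexed by the opcode byte.
DISPATCH = [None] * 256
for op in range(0xB8, 0xC0):
    DISPATCH[op] = mov_r16_imm16
for op in range(0x40, 0x48):
    DISPATCH[op] = inc_r16
DISPATCH[0x90] = nop
DISPATCH[0xF4] = hlt

def run(cpu, max_steps=1000):
    for _ in range(max_steps):
        if cpu.halted:
            break
        opcode = fetch8(cpu)
        handler = DISPATCH[opcode]
        if handler is None:
            raise NotImplementedError(f"opcode {opcode:#04x} at {cpu.ip - 1:#06x}")
        handler(cpu, opcode)
    return cpu
```

Skipping flags is exactly the kind of shortcut that bites later: a real INC updates OF, SF, ZF, AF and PF but leaves CF untouched, a detail that is easy to get wrong.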
Copilot has been very useful the times I've used it. It's not perfect, but it does cover a lot of boilerplate. It also makes it much easier to jump between languages.
It reminds me of arguments that it's not the computer that plays chess, but its programmers.
You can describe a GPT's response as a statistical average of responses on the internet (for quite a contrived definition of average), but at some point it will be easier to describe it as analyzing a snippet and forming an opinion (based on what people on the Internet think). Are we past that point? I'm not sure yet, but we are close.
I had a similar experience asking it to write an API client. It wrote something very plausible but just concocted an endpoint that looked real but didn't exist.
If you have, I don’t think you are like the majority of devs (maybe not on HN, but in real life).
You sound lucky to have true, novel problems to solve each day. I’m with many here commenting that this is quite powerful stuff, especially when my day-to-day is writing simple CRUD apps, or transforming data from one format to another within an API, or configuring some new bit of infra or CI/CD.
I’d love to be challenged in some new way, and have access to truly fascinating problems that require novel solutions. But most enterprises aren’t really like that, nor do they need that from the majority of engineers.
But I still don’t know if I am thrilled by the prospect of being called in to make their monkeypatched, GPT generated project work.
I agree with GP - day 2 dev me outclasses it, which means it isn't replacing anyone.
* ChatGPT is revolutionary - honestly, it's genuinely impressive how much of a leap ChatGPT is compared to the attempts that came before it.
* Programmers write a lot of simple code that has been written before - there are genuinely tons of cases of "write a web endpoint that takes an ID, looks it up in a database table, pulls an object through an ORM, and returns a JSON serialization of it." Most programmers? Doubt it. But tons of programmers write CRUD stuff and tons of IT admins do light scripting, and a lot of it is repeated code.
Could ChatGPT do my job? Not even close. But it's still really impressive to me.
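The "look up an ID and return JSON" endpoint described in the second bullet fits in a few lines of Python. This is an illustration under assumptions: `sqlite3` stands in for the ORM, the `items` table and its columns are hypothetical, and HTTP routing is left out, so the function just returns a status code and a JSON body for whatever router would call it.

```python
import json
import sqlite3

def get_item(conn: sqlite3.Connection, item_id: int) -> tuple[int, str]:
    """Look up one row by primary key and return (HTTP status, JSON body)."""
    row = conn.execute(
        # A parameterized query; never format the ID into the SQL string.
        "SELECT id, name, price FROM items WHERE id = ?",
        (item_id,),
    ).fetchone()
    if row is None:
        return 404, json.dumps({"error": "not found"})
    item_id, name, price = row
    return 200, json.dumps({"id": item_id, "name": name, "price": price})
```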
Last couple of months, I've been using chatGPT to write a lot of features and functions. I don't think it has made me a better coder, but it has made me massively more productive. Things like scraping data from a URL - something I would have had to sit through an hour long tutorial to learn - is accessible with a single query.
I also think that the code quality has improved over the last few iterations. It makes fewer mistakes now.
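The comment doesn't say what was scraped; as a hypothetical example of the kind of script meant, this Python sketch fetches a page with the standard library and collects every link target. The function names and the 10-second timeout are arbitrary choices, and a real scraper would also need to respect robots.txt, handle errors and often cope with JavaScript-rendered pages, none of which is covered here.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html: str) -> list[str]:
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links

def scrape_links(url: str) -> list[str]:
    # Fall back to UTF-8 when the server doesn't declare a charset.
    with urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return extract_links(resp.read().decode(charset, errors="replace"))
```

The returned hrefs are raw values and may be relative; `urllib.parse.urljoin` can resolve them against the page URL.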
1. Do you think that companies paying millions of dollars to ML researchers aren't already trying to make it good?
2. I think it will take a real revolution in AI/ML to do what people here are extrapolating into the future. That revolution will eventually come, but I doubt it'll be as quick as people think. Just think about the excitement people had about Siri 10+ years ago, or about Full Self Driving 5+ years ago. In my opinion, in 5-10 years from now GPT will be in the same place where Siri and Full Self Driving currently are. Eventually we will make the leap we're dreaming of, but that leap isn't happening yet.
Is GPT really like the old school NLP trigram stuff?
GPT has limited reasoning, but given enough knowledge of the problem you can coerce it to do surprising things, so long as you can relate it to something else in the knowledge base. Given how big that knowledge base is, you can get lucky surprises where things just work if you fish around enough.
I've worked in adtech, crypto, fintech and gamedev, as a startup founder and at a BigCo. Not once was programming something that was a time sink.
Makes me feel like GPT is marketing to the incompetent or something.
Yeah, I've gotten it to write some pretty decent contracts, but only because I have written said contracts and can ask it all the right questions/prod it into adding what I need.
The rate of increase in capabilities is also unpredictable, which is what is amazing & terrifying.
Overinflated hype about “where the puck is going” being wrong is…not a new phenomenon. And the non-tech media (traditional and social, and particularly a whole lot of the elite commentariat that spans both more than the “hard” news side of traditional media, though that is influenced too) perspective on this is driven quite disproportionately by the marketing message of the narrow set of people with the most financial stake in promoting the hype. Even the cautionary notes being sounded there are exactly the ones that are being used by the same people to promote narrow control.
Additionally, "the rate of increase in capabilities" is very much a false flag. Past performance (especially for second-order things like 'rate of improvement') is an absolute batshit insane metric for predicting future success.
https://russellinvestments.com/us/blog/past-performance-no-g...
The advancements have eaten low hanging fruit. Once all the low hanging fruit is gone, we'll all realize GPT will never be tall enough to reach the good stuff.
This is a very bold claim IMO.
Modeling/understanding interactions in a complex system of potential black boxes is a much, much more computationally difficult problem than source-code-to-source-code operations.
Best case, that will have a whole lot more humans using their right brain halves on things like defining the problem. I like the thought of that; it's more pleasant work. But a lot of intelligent people define their intelligence by how well their left brain half works, and are uncomfortable with how good ChatGPT is at those tasks. I think you're pointing out there's more to programming than left-brain activities, and I think you're right that silicon will never eclipse carbon at those challenges, but I think a lot of people are also feeling threatened by the fact that ChatGPT is getting better and better at the thing they used to be better than all humans at.
We might be able to state that GPT will easily trim away all the average workloads for both the left and right halves. It can perform "creative writing" or draw pictures to an average or even above-average extent, but it currently still struggles to hit the exceptional examples that humanity is capable of.
I think 'left brain' work also has a lot more predictability (knowing you can churn out ~X widgets/hr) so having only uncertain 'right brain' work can be uncomfortable for people to build their livelihoods upon.
That being said, 'right brain' work is certainly more fulfilling for me.
> I really am in awe of how much work people seem willing to do to justify this as revolutionary and programmers as infantile, and also why they do that. It’s fascinating.
Equally fascinating are all of the "this is fine" posts from programmers suddenly realizing they are not the gods they once thought they were. But fret not, programming is not the first industry that has been automated into a shell of itself. Yes, the industry is going to shrink massively, but this is what new skills are for. Just as farmers had to learn industrial jobs, and then miners and autoworkers had to "learn to code", most programmers will have to learn to do something else. Humans are resilient and will adapt.
And there will still be jobs in development for the most talented and in niche areas, but when the largest tech companies can lay off hundreds of thousands of employees without skipping a beat, that should tell you all you need to know about the value of most "programming" jobs.
You meant "expand massively", I think. Did all the programmers manually writing machine code get fired, and did the job of programmer disappear, when compilers were invented and totally replaced that work? No, it just changed to use the new tool.
There won't be any unprecedented mass layoffs, despite what the jaded STEM-hating Twitter crowd wants. Companies will simply make their programmers use these tools to increase the output per employee, and make software that would otherwise have been financially impossible to make, because the competition will do so too to get ahead.
The transition from agrarian to industrial societies was extremely painful and arguably it was centuries before the people affected were better off.
> Humans are resilient and will adapt.
Based on recent events I think it's more likely people will elect extremist politicians who promise quick fixes while blaming a convenient out-group for everything.
Like what? Seriously, which cognitive occupation is safe then? I think if one wants to stop competing with the machines (who appear very close to becoming superior to humans by what you are saying), it's some kind of child care / social work job. We still don't want robots in those (for now. Eventually they may do those better than us as well).
I think it'll help with some tasks, and that's always good to take. After all, people tweak their vim settings because they feel it makes them more productive.
It seems on par with the Industrial Revolution, at least. Which, easy to forget, was a huge deal that totally changed society.
So I am not average, and I am enamoured with GPT simply because it presents high value to me now, more than some actual real humans. For me, that’s revolutionary enough.
Good luck having an ML model understand a 20-year-old undocumented data format developed in-house at a specific research lab, to be used in their proprietary systems, which are also undocumented and are part of a spiderweb of interconnected systems at that lab (I have worked at this particular lab).
It will be a long time (if ever) until an ML model is able to handle these situations (and I hazard a guess that most of the world's active code is something akin to this).
As a supporting tool for the software engineers working there, sure. Just like a linter.
But on the other hand, in new uncharted territory it sometimes fails on the simplest shit. I recently asked it how to do one thing with enlighten (which I knew was possible with tqdm, but was almost sure was not possible with enlighten). It just hallucinated parameters to functions that didn't exist. Several rounds followed in which I asked where it had that from, in case it was a different version. I even asked it for the reference it claimed to have gotten that from, and it referred me, fully confident, to a readthedocs URL mixing tqdm and enlighten that didn't exist. It is hilarious how confidently it can give you one bullshit answer after the next. The dialogues always go: "Hey, are you really sure about xxx, did you look it up?" "Yeees, very certain, I did!" "But this doesn't exist." "Oooh, I'm very sorry, you are correct and I am wrong, the next bullshit answer is: ..."
The history disappeared; I hope I get it back someday. But the dialogue until it got to "No, it may not be possible with this library" was amazing. I'm really scared for our future if we build on that, and of what will happen if everything from business presentations to lawyers' letter exchanges is built on that. (:
Well it is revolutionary. And it isn't just where it is today, but how fast these models are improving - with no saturation in ability evident at this time.
On the other hand, I am not sure anyone is saying programmers are infantile - although poorly written software is at least as prevalent as poorly compensated or managed software development positions.
- a lot of programmers, including experienced ones, are absolutely infantile and they only have a job because there is a big shortage of programmers; not all of them get better with experience... hence a significant part of software development is dealing with problematic programmers and problems created by them.
- GPT is not that great a programmer but a great thing about it is that it is not a human... and one can get thousands of instances of them for the price of one human. You only need one of those instances to produce usable code.
- there have been many changes throughout the years which have definitely replaced a lot of programmers: library distribution services (pypi, npmjs), better software development tools and practices, SaaS delivery model, better programming languages etc.; so far, because the market need for programmers has continued to increase, most programmers continue to have jobs; this won't last forever.
This is another one of those rebellions: non-programmers hoping to avoid reading the book and to close it for good, while keeping the awesome around. The codebases we will see will be ones where the commits are basically ChatGPT tags and tasks for each document.
I'm actually taken aback by how well it's doing, including giving me some refreshers on how stuff I'd forgotten should work.
I can see it failing at solving complex problems, but like the blog post mentions, most programming isn't new or hard problems.
This is particularly powerful when you're producing something you've done before, but in a completely different language/stack. You just guide GPT-4 towards the goal, you roughly know the methods needed to get to the end goal and just watch your assistant do all the dirty work.
Looking back, I came from a world of floppy disks; I left them behind for zip disks and CDs, then portable disks and cloud storage. I also came from dialup Internet; I left it behind for ADSL, then fibre. I feel this is a similar turning point, where AI, whatever it ends up being called, will become a full-time assistant making our lives easier, so that we can focus on the hard parts and the creative problem solving. What are we leaving behind? For me, mostly Stack Overflow and Google.
You'd be silly to ignore it and palm it off. It's a big deal.
Basically that's how all my usage has gone. I've had it write some elisp and it has been ok, sometimes it invents made-up functions (that don't exist in org-mode for example) but I'll just tell it that a function doesn't exist and it'll come up with some other solution, until I get it to a point where all I need to do is change a couple things.
I remain highly skeptical the thing will replace me anytime soon (ever in my lifetime?) but I'm surprised at the possibilities of making my life less tedious.
Agree, this is a big deal, and has the capacity to revolutionize all the techniques we have been using up to now for compiling, summarizing and reframing existing knowledge as expressed in writing (including code).
Not only does Google get (well deserved) competition, it means pressure on all the businesses that now make a living in that space. In a few years it will even have a serious impact on major institutions in society such as schools and universities.
A lot if not all of the kickback from established institutions will be attempts to smear the competition, and by all means, to carve out new niches where GPT-X is not applicable or as efficient.
There are valid concerns about the veracity of the information it provides, which means there are limits to the extent it can be used in automated processes, but I'd be loath to trust the data unconditionally anyway. As for not being able to think creatively: good on us. But it's likely just temporary.
clippy tanked because it annoyed more than it helped, although some people did like it
install wizards did their job in a world where a single binary format and OS dominated and stuff ran offline pretty much exclusively, with the odd connection to networks - those installers sorted a laundry list of situations, both underlying situations and user configurations and choices, and for the most part they worked
Siri, Cortana, Alexa etc have been working as expert systems with central curated bases and some AI/ML on top, for a lot of people they've been working quite well - for me personally they've sucked, they've totally failed to answer my questions the few times I've tried them, and they've creeped the hell out of me (they are a lot more centred on extracting my info and selling me stuff than understanding stuff)
generative ML is orders of magnitude more sophisticated, but so are our needs and our computing from a global perspective, it does make sense that those assistants, pilots, etc start taking off
but the incentive issues of the previous generation of assistants and recommendation algorithms remain, and I wonder how that will turn out - if they start demanding access to my phone, my email, my contacts etc I will do my best to avoid them and to poison any info I have to give them
The difference with the examples you gave (floppy disks, etc.) is the speed at which it happened.
There was Jan'23, and there was March'23.
Maybe it's for people who can never think programming is easy. Clearly there's a lot of such types. Explains a lot.
I'm reminded of the old "Handyman's Invoice" trope. Actually implementing a solution is not the hard part. What _is_ hard is determining what the solution is in the first place.
https://www.snopes.com/fact-check/know-where-man/
Once you have a rough idea of the solution, sure maybe GPT-4 can barf snippets to get you there. But it's lightyears away from translating business problems into actionable solutions.
But there are things that are harder for me, or more complex maybe. I struggle with math, and always had, so anything involving heavy math or algorithms is harder for me (I'm a hacker, not a computer scientist, if you will).
For these things, I found GPT4 to be very helpful. I can write what I want, get a WIP version back, work out some kinks with it myself and ask it to rewrite it if it's not perfect, until I have a perfect version. Add some unit tests, find more things that are missing/wrong (sometimes), more back and forth.
Before GPT4 I either just tried to work around having to deal with heavy math, or find people IRL that could help me. Now I'm a lot faster even if it involves math, because of GPT4.
“Psh, it’s just doing stuff it saw from its training data. It’s not thinking. It can’t make anything new.”
In my 11 years as a professional software engineer (that is, being paid by companies to write software), I don’t think I’ve once come up with a truly original solution to any problem.
It’s CRUD; or it’s an API mapping some input data to a desired output; or it’s configuring some infra and then integrating different systems. It’s debugging given some exception message within a given context; or it’s taking some flow diagram and converting it to working code.
These are all things I do most days (and get paid quite well to do it).
And GPT-4 is able to do all that quite well. Even likely the flow diagrams, given its multimodal abilities (sure, the image analysis might be subpar right now, but what about in a few years?)
I’m not acutely worried by any means, as much of the output from the current LLMs is dependent on the quality of prompts you give it. And my prompts really only work well because I have deeper knowledge of what I need, what language to use, and how to describe my problem.
But good god the scoffing (maybe it’s hopium?) is getting ridiculous.
Most of the times it gets things quite well, and if you provide context in the form of other source code, it's really good, even at using classes or functions that you provide and hence are novel to it.
The hard logic bits, imho (something elegant, maintainable, ...), are still up to you.
I believe the goal is to find a path with the fewest possible "fire" cells and the minimum cost as a tie breaker. The cost of a path is the sum of its cells' cost and it can't be greater than 5.
If I understood the assignment correctly, I don't think the problem statement is equivalent to what's included in the prompt. Specifically, the prompt doesn't clarify what happens if you have to cross through multiple "fire" cells.
> Fire tiles cost 1 point to move through, but they should avoid pathing through them even if it means taking a longer path to their destination (provided the path is still within their limited movement range)
A correct statement would be: "Given a solution set containing both the shortest path through fire and the shortest path avoiding fire, select the solution that fits within six tiles of movement, preferring the solution that avoids fire where possible."
It's a constraint optimization problem in disguise: generate a solution set, then filter and rank the set to return a canonical result. That describes most of the interesting problems in gameplay code: collision and physics can use that framing, and so can most things called "AI". They just all have been optimized to the point of obscuring the general case, so when a gamedev first encounters each they seem like unrelated things.
The specific reason why it seems confusing in this case is because while pathfinding algorithms are also a form of constraint optimization, they address the problem with iterative node exploration rather than brute forcing all solutions. And you can, if you are really enterprising, devise a way of beefing up A* to first explore one solution, then backtracking to try the other. And it might be a bit faster, but you are really working for the paycheck that day when the obvious thing is to run the basic A* algorithm twice with different configuration steps. You explore some redundant nodes, but you do it with less code.
Pretty much this. Attempt to find a path to the target destination with a first A* run that disregards fire tiles, and if that fails due to limited movement, then do a second run with the fire tiles. I like that this mirrors the decision making a human would follow, too: I won't cross the fire tile unless I'm absolutely required to.
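The two-pass approach can be sketched in Python. Assumptions that are not from the original game: a 4-connected grid stored as strings with a hypothetical legend ('.' plain, '~' water, 'F' fire, '#' wall), water costing 2 movement points as in the problem statement quoted further down the thread, and Manhattan distance as the A* heuristic.

```python
import heapq

# Hypothetical legend: movement points needed to enter each tile type.
MOVE_COST = {".": 1, "~": 2, "F": 1}  # '#' (and anything else) is impassable

def a_star(grid, start, goal, max_moves, allow_fire):
    """Cheapest 4-connected path from start to goal within max_moves, or None."""
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        # Manhattan distance never overestimates, since every step costs >= 1.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    best_g = {start: 0}
    came_from = {}
    frontier = [(h(start), 0, start)]
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell == goal:
            path = [cell]
            while cell in came_from:
                cell = came_from[cell]
                path.append(cell)
            return path[::-1]
        if g > best_g[cell]:
            continue  # stale queue entry
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            tile = grid[nr][nc]
            if tile not in MOVE_COST or (tile == "F" and not allow_fire):
                continue
            ng = g + MOVE_COST[tile]
            if ng <= max_moves and ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                came_from[nxt] = cell
                heapq.heappush(frontier, (ng + h(nxt), ng, nxt))
    return None

def find_path(grid, start, goal, max_moves):
    # First try to avoid fire entirely; only walk through it if that fails.
    return (a_star(grid, start, goal, max_moves, allow_fire=False)
            or a_star(grid, start, goal, max_moves, allow_fire=True))
```

If fire can't be avoided, the second pass returns the shortest path rather than the one crossing the fewest fire tiles. That matches the behaviour described in this comment, but not the stricter "fewest fire cells" reading proposed earlier in the thread.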
That's also precisely where one of the programmer's greatest challenges lies, to carefully translate and delineate the problem. I agree it's a bit steep to ask the GPT to come up with a precise solution to an imprecise question, but it's also fair to say that that's basically what the job of a programmer entails, and if you can't do that you're not really able to program.
Since I have experience in both programming and the domain of my tasks, formulating the steps that need to be done for some task is very quick, and they are "good" steps that avoid various potential pitfalls - but then I need half a week to actually make and debug them; so if some tool (or a junior developer) can do the latter part, that's a big benefit.
Here's the problem statement as far as I see it: Each tile has a number of move points to spend to go through it (1 for regular and 2 for water). Each tile also has a cost associated with it. Given a max number of move points find the lowest cost path between two tiles or return none if no such path exists.
I'm gonna say this is still a modified Dijkstra with a small twist. The fire has cost 1, the other tiles have cost 0. However, instead of pathing on a 2D grid (x, y), we path on a 3D grid (x, y, moves). All "goal" tiles within (goal_x, goal_y, moves < total_move_points) have a 0-cost edge which brings them to the true goal node. The implementation difference is that the get-neighbors function queries neighbors in later grid layers (x+..., y+..., moves + 1 or 2).
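The layered-graph idea above can be sketched in Python as Dijkstra over (row, col, moves_used) states, with an edge weight of 1 for entering a fire tile and 0 otherwise. Assumptions: a hypothetical tile legend matching the problem statement (1 move for plain and fire tiles, 2 for water, '#' impassable), an inclusive budget (at most max_moves), and ties between equally fiery paths broken by fewer moves, which the comment doesn't specify. Instead of an explicit sink node, the search stops at the first goal-tile state it pops, which gives the same answer.

```python
import heapq

# Hypothetical legend: '.' plain (1 move), '~' water (2 moves),
# 'F' fire (1 move), '#' impassable.
MOVES = {".": 1, "~": 2, "F": 1}

def fewest_fire_path(grid, start, goal, max_moves):
    """Return (fire_tiles_entered, path) using at most max_moves, or None."""
    rows, cols = len(grid), len(grid[0])
    start_state = (start[0], start[1], 0)  # (row, col, moves_used)
    fire_cost = {start_state: 0}
    parent = {}
    # Priority is (fire tiles, moves used): fire first, moves as tie-breaker.
    heap = [(0, 0, start_state)]
    while heap:
        fire, used, state = heapq.heappop(heap)
        if fire > fire_cost[state]:
            continue  # stale entry
        r, c, _ = state
        if (r, c) == goal:
            # First goal-layer state popped: the same answer a 0-cost edge
            # from every goal layer to a single sink node would give.
            path = [(r, c)]
            while state in parent:
                state = parent[state]
                path.append(state[:2])
            return fire, path[::-1]
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            tile = grid[nr][nc]
            if tile not in MOVES or used + MOVES[tile] > max_moves:
                continue
            nxt = (nr, nc, used + MOVES[tile])
            nfire = fire + (1 if tile == "F" else 0)
            if nfire < fire_cost.get(nxt, float("inf")):
                fire_cost[nxt] = nfire
                parent[nxt] = state
                heapq.heappush(heap, (nfire, nxt[2], nxt))
    return None
```

Because moves_used is part of the state and only ever grows, the layered graph has no cycles, and the number of states is bounded by rows × cols × (max_moves + 1).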
Looking at the two examples in the paragraph after "And there’s a lot of complication to it beyond the simple cases too", I can't figure out how the movement value is defined, as I can only see 10 and 8 moves respectively, not the 14 and 10 movement value claimed in the following text (and only one water tile on each path.)
Within two prompts it could read the JSON data from a stdin stream, unmarshal it to Go structs and print the correct fields to stdout as a human-readable line of text.
Then I told it to colour the timestamp and id fields using the fatih/color package, and it did it correctly.
In total it took me about 4-5 prompts to get where I wanted. I just needed to fine-tune the printing to stdout part a bit to get it just how I liked, but it saved me a ton of boring template code writing and iteration.
I could've done it easily myself, but there were a few fiddly bits that would've required me to look up the documentation to check the exact way to do things. GPT4 had it correct from the start.
Then I asked it to write unit tests for the code, and it confidently started writing correct-looking code that would take the same input and expect the correct output, but just stopped in the middle. Three times. I stopped trying.
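The commenter's tool was written in Go with the fatih/color package. As a rough Python analogue of the same shape (parse each JSON line into a typed record, then format it), this sketch assumes hypothetical field names `timestamp`, `id` and `message`, and uses raw ANSI escape codes in place of a colour library.

```python
import json
import sys
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

# Raw ANSI escape codes stand in for the fatih/color package the Go version used.
CYAN, YELLOW, RESET = "\033[36m", "\033[33m", "\033[0m"

@dataclass
class LogLine:
    timestamp: str
    id: str
    message: str

def parse(raw: str) -> LogLine:
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad input
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    # Hypothetical field names; missing fields become empty strings.
    return LogLine(
        timestamp=str(data.get("timestamp", "")),
        id=str(data.get("id", "")),
        message=str(data.get("message", "")),
    )

def render(entry: LogLine, color: bool = True) -> str:
    if not color:
        return f"{entry.timestamp} {entry.id} {entry.message}"
    return f"{CYAN}{entry.timestamp}{RESET} {YELLOW}{entry.id}{RESET} {entry.message}"

def main(lines: Iterable[str], emit: Callable[[str], None] = print,
         color: Optional[bool] = None) -> None:
    if color is None:
        color = sys.stdout.isatty()  # no escape codes when piped to a file
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            emit(render(parse(raw), color))
        except ValueError:
            emit(raw)  # pass non-JSON lines through unchanged
```

Calling `main(sys.stdin)` from a script turns it into a filter that log output can be piped into.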
And another case:
I tried to use GPT-3.5 to write me a program that would live-tail JSON-logs from Sumo Logic and pretty-print them to stdout. It confidently typed out completely correct code with API endpoints and all. ...except the endpoints didn't exist anymore, Sumo Logic in their great wisdom had removed them completely. The only solution is to use their 5 year old binary-only livetail executable.
GPT4 with the same input gave me a shell-script that starts a search job with the correct parameters and polls the endpoint that returns the result when it's done.
The speed at which this is developing is really fascinating. I'm not really afraid for my job, but I do love how this will automate (some of) the boring stuff away, a bit like GitHub Copilot did, but better.
One of two things. First, ask it to continue. Sometimes it just stops halfway through code for whatever reason.
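GPT-4's answer in the second case follows a common pattern: start an asynchronous search job, then poll a status endpoint until the result is ready. The Sumo Logic endpoints aren't given in the comment, so this Python sketch only shows the generic polling loop; `check` is a hypothetical callable that would query the job's status and return its results once finished (or `None` while it's still running). The injectable `sleep` and `clock` parameters exist only to make the loop testable.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

def poll_until_done(
    check: Callable[[], Optional[T]],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Call check() every `interval` seconds until it returns a non-None
    result, raising TimeoutError once `timeout` seconds have passed."""
    deadline = clock() + timeout
    while True:
        result = check()
        if result is not None:
            return result
        if clock() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(interval)
```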
The other possibility is you filled up the token context window. Not much you can do but wait for the 32k model.
I didn't really need the unit tests anyway, but I wanted to try if it could do it :)
I had this same loop issue with Chat-GPT. I had something I wanted to do with asyncio in Python. That's not something I work with much, so I thought I'd see if Chat-GPT could help me out. It was actually good at getting me up to speed on asyncio and which parts of the library to look at to solve my problem. It got pretty close, but it can't seem to solve edge cases at all. I got into this loop where I asked it to make a change and the code it output contained an error. I asked it to fix the error, so it gave me a slightly modified version of the code prior to the change. So I asked it to make the change again, and the code it spit out gave the same error again. I went through this loop a few times before I gave up.
Overall, it's cool to see the progress, but from what I can tell GPT-4 suffers from all the same issues Chat-GPT did. I think we're probably missing some fundamental advance and just continuing to scale the models isn't going to get us where we want to go.
My biggest concern with the current batch of LLMs is that we're in for Stackoverflow driven development on steroids. There's going to be a ton of code out there copy and pasted from LLMs with subtle or not so subtle bugs that we're going to have to spend a ton of time fixing.
I worry that the next generation of developers are going to grow up just figuring out how to "program GPT" and when they have an error rather than investigating it (because they can't because they aren't actually familiar with code in the first place) they'll simply tell GPT about the error they are having and tell it to spit out more code to fix that error, slapping more mud on the ball.
Eventually these systems are growing larger and larger at a faster and faster pace, and no one understands what they are actually doing, and they are so complex that no one human could ever actually understand what it is doing. Imagine if every codebase in the world was like the Oracle DB codebase.
In this future a programmer stops becoming a professional that works to create and understand things, instead they become a priest of the "Machine Spirit" and soon we are all running around in red robes chanting prayers to the Omnissiah in an effort to appease the machine spirit.
I have two tasks I wanted it to try, both making use of public APIs, starting from scratch. In short, it was frustrating as hell. Never-ending import problems -- I'd tell it the error, it'd give me a different way to import, only leading to a new import problem. I think I used up all my 100 queries in 4 hours of GPT-4 just on the import/library problem.
Then there was constant misuse of functions: ones that didn't exist, or didn't exist in the object it was using but did exist in some other object instead, at which point it would apologize and fix it (why didn't you give me the correct one the first time, if you "know" the right one?)
The actual code it wrote seemed fine, but not what I'd call "scary impressive." It also kept writing the same code in many different styles, which is kind of neat, but I found one style I particularly liked and I don't know how to tell it to use that style.
Lastly, it's only trained up to Sep 2021, so all the APIs it knew were well behind. I did manage to tell it to use an updated version, and it seemed to oblige, but I don't really know if it's using it or not -- I still continued to have all the above problems with it using the updated API version.
Anyway, I hope MS fiddles with it and incorporates it into Visual Studio Code in some clever way. For now, I'll continue to play with it, but I don't expect great things.
Perhaps there is some merit to this. If the language model is large enough to contain the entirety of the documentation and the LSP itself, then why bother integrating with the LSP? _Especially_ if you can just paste the entirety of your codebase into the LLM.
Its dataset is thousands of blog posts and Stack Overflow questions about this very thing; of course the autocomplete engine is going to predict the next response to be "another way of doing x".
It failed miserably, even with repeated instructions. It just assumed I wanted the more common problem. Every time I pointed out the problem it would say "sorry for the confusion, I've fixed it now" and give me back identical code. I even asked it to talk me through test cases. It identified that its own code didn't pass the test cases but then still gave me back identical code.
I eventually gave up.
I do wonder if part of it is that my prompts are made worse because I have a partial solution in mind.
But I see this problem even with old-fashioned automation, let alone AI, used to save time. I find that if you haven't drawn the 2D section through all the different edge cases of a particular thing you are trying to design, you haven't done the analysis and you don't really understand what's happening. I've made mistakes where I've been working in 3D on something complicated and had to hide some element to see what I'm working on, only to find later, when I turn everything on again, that I've created a clash or something impossible to build. That's why we still do 2D drawings: they are an analysis tool we've developed for solving these problems, and we need to do that analysis, which means drawing section cuts through things as well as building 3D models. After all, if models were such a good way to describe buildings, why weren't we just building physical scale models and giving them to the builders 100 years ago? Because you can't see the build-up of the layers, and you can't reason about them.
Reading this article, I get the same sense about software engineering: if you haven't solved the problem, you don't really understand the code the AI is generating, so you don't really know if it is going to do what you've tried to describe in your prompt. You still have to read the generated code and understand what it does to be able to tell whether it will do what you expect.
Yes, this is pretty much exactly the way I've been using GPT and it works tremendously well. (GPT4 works especially well for this style of programming.) My prompts include things like:
- "read the function below and explain in detail what each section does" -- this prompts GPT to explain the code in its own terms, which then fills in its context with relevant "understanding" of the problem. I then use the vocabulary GPT uses in its explanation when I ask it to make further changes. This makes it much more likely to give me what I want.
- "I see this error message, what is the cause? It appears to be caused by $cause" -- if I'm able to diagnose the problem myself, I often include this in my prompt, so that its diagnosis is guided in the right direction.
- "this function is too complex, break it up into smaller functions, each with a clear purpose", or "this function has too many arguments, can you suggest ways the code could be refactored to reduce the number of arguments?" -- if you go through several rounds of changes with GPT, you can get quite convoluted code, but it's able to do some refactoring if prompted. (It turned out to be easier to do large-scale refactoring myself.)
- "write unit tests for these functions" -- this worked phenomenally well, GPT4 was able to come up with some genuinely useful unit tests. It also helped walk me through setting up mocks and stubs in Ruby's minitest library, which I wasn't experienced with.
In brief, if you expect to just give GPT a prompt and have it build the whole app for you, you either get lame results or derivative results. If you're willing to put the effort in, really think about the code you're writing, really think about the code GPT writes, guide GPT in the right direction, make sure you stay on top of code quality, etc, etc, GPT really is an outstanding tool.
In certain areas it easily made me 2x, 10x, or even 100x more productive (the 100x is in areas where I'd spend hours struggling with Google or Stack Overflow to solve some obscure issue). It's hard to say how much it globally increases my productivity, since it depends entirely on what I'm working on, but applied skilfully to the right problems it's an incredible tool. It's like a flexible, adaptive, powered exoskeleton that lets me scramble up rocky slopes, climb up walls, leap over chasms, and otherwise do things far more smoothly and effectively.
The key is you have to know what you're doing, you have to know how to prompt GPT intelligently, and you have to be willing to put in maximal effort to solve problems. If you do, GPT is an insane force multiplier. I sound like I work in OpenAI's marketing department, but I love this tool so much :)
No surprise, because GPT-4 is built upon the same model as GPT-3. Clever engineering will take us far, but a breakthrough requires a change of the fundamentals.
Nevertheless, it’s useful and can help us solve problems when we guide it and split the work into many smaller subunits.
(Feb 13, 2023)
My unwavering opinion on current (auto-regressive) LLMs:
1. They are useful as writing aids.
2. They are "reactive" & don't plan nor reason.
3. They make stuff up or retrieve stuff approximately.
4. That can be mitigated but not fixed by human feedback.
5. Better systems will come.
6. Current LLMs should be used as writing aids, not much more.
7. Marrying them with tools such as search engines is highly non-trivial.
8. There will be better systems that are factual, non toxic, and controllable. They just won't be auto-regressive LLMs.
9. I have been consistent with the above while defending Galactica as a scientific writing aid.
10. Warning folks that AR-LLMs make stuff up and should not be used to get factual advice.
11. Warning that only a small superficial portion of human knowledge can ever be captured by LLMs.
12. Being clear that better systems will appear, but they will be based on different principles. They will not be auto-regressive LLMs.
13. Why do LLMs appear much better at generating code than generating general text? Because, unlike the real world, the universe that a program manipulates (the state of the variables) is limited, discrete, deterministic, and fully observable. The real world is none of that.
14. Unlike what the most acerbic critics of Galactica have claimed:
- LLMs are being used as writing aids.
- They will not destroy the fabric of society by causing the mindless masses to believe their made-up nonsense.
- People will use them for what they are helpful with.
Stuff that usually took me a long time, like regexes or Excel/Sheets formulas, now takes about two minutes. AND I'm learning how they work in the process. I can now write regexes that were wildly confusing to me a couple of months ago, because Copilot / ChatGPT walks through the process, makes mistakes, and I prod it along.
I feel like it doesn't matter how "mindblowing" or "a big deal" this tool is — it's a great learning tool for me and helps me do my work 100x faster.
I suspect if you poked GPT-4 just right (starting with a detailed design/analysis phase?) it could find a rhetorical path through the problem that resulted in a correct algorithm on the other end. The challenge is that it can't find a path like that on its own.
Op: Can you get it to write your algorithm for this problem if you describe it in detail, as-is?
I suspect the difficulty here is just finding a Socratic path to that description, which would tend to be rare in the training material. Most online material explains what and how, not why; more importantly, it doesn't tend to explain why first.
Step 0: Let's try to find a path without walking through fire. Run Dijkstra's or A* to find the shortest path with no fire, up to distance 6. If it succeeds, that's the answer.
Step 1: Okay, that didn't work. We need to go through at least 1 fire tile. Maybe we can do at most 1. Define distances to be a tuple (fire, cost), where fire is the number of fire tiles used and cost is the cost. Comparison works the obvious way (lexicographically), and Dijkstra's algorithm and A* work fine with distances like this. Look for a solution with cost at most (1, 6). Implemented straightforwardly, this will likely explore the whole grid (which may be fine), but I'm pretty sure that the search could be pruned when the distance hits values like (0, 7), since any path of cost (0, 7) cannot possibly be a prefix of a (1, c) path for any c <= 6. If this succeeds, then return the path -- we already know there is no path of cost (0, c) for c <= 6, so a path of cost (1, c) for minimal c must be the right answer.
Step f: We know we need to go through at least f fire tiles. If f > 6, then just fail -- no path exists. Otherwise solve it like step 1 but for costs up to (f, 6). Prune paths with cost (f', c') with c' > 6.
This will have complexity 6D, where D is the cost of Dijkstra's or A* or whatever the underlying search is. Without pruning, D will be the cost of a search with no length limit, but with pruning, D is nicely bounded (by the number of tiles within Manhattan distance 6 of the origin, times a small constant).
For a large level and much larger values of 6, this could be nasty and might get as large as t^2 * polylog(t), where t is the number of tiles. Fortunately, f is upper-bounded by 6 and doesn't actually get that large.
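A minimal Python sketch of the tuple-distance idea from steps 1 and f, assuming a 4-connected grid and a hypothetical tile encoding (`.` plain, `W` water costing 2, `F` fire, `X` wall). Python already compares tuples lexicographically, so the only extra work is adding the (fire, cost) pairs component-wise. It keeps one best label per tile and leaves out the range pruning, so on its own it answers "fewest fires, then cheapest" with no movement cap.

```python
import heapq

# Hypothetical tile encoding: '.' plain (cost 1), 'W' water (cost 2),
# 'F' fire (cost 1, but counted), 'X' wall; 'S'/'D' are plain tiles.
STEP_COST = {".": 1, "W": 2, "F": 1, "S": 1, "D": 1}
INF = (float("inf"), float("inf"))

def lexicographic_dijkstra(grid, start, goal):
    """Return the best (fires, cost) to reach goal, or None if unreachable.

    Distances are (fires, cost) tuples compared lexicographically, so
    fewer fires always wins and cost only breaks ties.
    """
    rows, cols = len(grid), len(grid[0])
    best = {start: (0, 0)}
    heap = [((0, 0), start)]
    while heap:
        dist, tile = heapq.heappop(heap)
        if dist > best[tile]:
            continue  # stale heap entry
        if tile == goal:
            return dist
        r, c = tile
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc] == "X":
                continue
            ch = grid[nr][nc]
            # Tuple "addition": add fires and cost component-wise.
            nd = (dist[0] + (ch == "F"), dist[1] + STEP_COST[ch])
            if nd < best.get((nr, nc), INF):
                best[(nr, nc)] = nd
                heapq.heappush(heap, (nd, (nr, nc)))
    return None
```

On `["S.F.D", "....."]` from `(0, 0)` to `(0, 4)` it takes the six-step detour with zero fires rather than the four-step path through the fire. Adding a movement cap on top of one label per tile is subtler: a fire-free prefix can win a tile lexicographically and then run out of range, while a cheaper prefix through one fire would have fit.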
I am not a fancy developer coming up with new algorithms. I make sign up flows, on-boarding flows, paginated lists, integrations to other apis.
And I definitely feel that my job might be threatened by LLMs.
I think outstanding software will still require well-paid, competent people orchestrating and developing a lot of complex systems for a while yet… But there’s a ton of bad software out there that will be able to be maintained for far less, and I suspect a lot of companies will be drawn to creating cookie cutter products generated by LLMs.
Just as people have turned to stores and blogs generated on templated systems, I think all of that and more will continue but with even more of it handled by LLM-based tooling.
I don’t think it’ll be next week, but I suspect it’ll be less than 10 years.
Some people expect that’ll lead to more software existing, which will inevitably require more developers to oversee, but if that’s the case, I suspect they will be paid a lot less. I also expect that once AI tools are sophisticated enough to do this, they will largely make that level of oversight redundant.
Soon they could potentially patch the bugs in the software they generate by watching Sentry or something. Just automatically start trying solutions and running fuzz tests. It would be way cheaper than a human being and it would never need to stop working.
The moral is that it’s always better to have unique, hard-won skill sets that others don’t. Double down on those. Think of LLMs as freeing you to do more interesting high-level tasks. Rather than building the menial stuff yourself, what if you focused your creativity on getting the AI to build new types of product, or on gaining new insights that peers aren’t considering? What if you leveraged the AI to build prototypes of ideas you wouldn’t have built otherwise?
Of course that’s easier said than done. For now, take comfort in the fact that no one is seriously trusting this as anything more than a glorified autocomplete (if that).
As you often hear on HN, ideas are a dime a dozen; it’s all about execution.
Well we’re rapidly approaching the time when the execution is essentially free, and done faster and better than humans.
A small team of four, over an afternoon, can literally just speak with the computer to generate a new TV ad, or develop a new SaaS product. There is no longer any skill required, just imagination. The problem, of course, is that the skills and specialized knowledge are what people have traditionally been paid for.
With all that “work” out of the way, there’s not much value anyone can add. You’re probably not any smarter or more creative than whoever’s manning the machine.
I might go so far as to argue that the entire reason software developers exist is to threaten all jobs, including our own: at our best--when we are willing to put in a bit of thought into what we are doing--we don't just make things easier to do for a moment while we are employed (which is the best of what most professions can achieve): we make things persistently and permanently easier to do again and again... forever; and we don't just make other peoples' jobs easier: this same power we have applies to our own tasks, allowing us to automate and replace ourselves so we can move on to ever more rewarding pursuits.
I'm not a fan of GPT for coding for a number of reasons (at least, in its current form, which is all we can ever have a true opinion about); but, it isn't because it will replace anything I've ever done: it would have just unlocked my ability to work on better things. There are so many things I wish I could get done before I die, and I know I'm going to be able to get to almost none of it... I have so many plans for ways to improve both the world and my life that will never happen as I just don't have the capability and bandwidth to do it all. If I had a God I could ask to do all the things I already do... I can only imagine what I'd do then.
However, the risk with cheap outsourcing is exactly the same as with LLMs - you get what you pay for, and you need to constantly check if it's really doing what it's supposed to be doing.
By contrast, you don't know what your pilot or your surgeon is up to, you have to trust their decisions.
1. Basic rendering logic was a breeze. I barely had to change anything, just copy-paste, and I had a textured map with walls that got darker the further away they were, plus basic movement with the arrow keys. For an inexperienced graphics programmer like me, this probably saved hours getting to that point.
2. I asked it to add a minimap. It did not work perfectly on the first try, but after a few minutes of exchanging messages, it worked and looked okay.
3. I asked for an FPS display. Worked on first try.
4. Now I asked for a solution to render walls of different heights. Here I had to correct it a few times, or suggest a different approach, but it got it working halfway correct (but not very performant). Definitely took way longer than steps 1 to 3 combined (30+ minutes).
5. I asked for floor rendering (often called "floorcasting"). Here it completely failed. The code it suggested often looked like it might be the right approach, but never really worked. And the longer we exchanged messages (mostly me giving feedback on whether the code worked or suggesting possible fixes), the more it seemed to hallucinate: very often variables suddenly appeared that were defined nowhere or in a different scope. At that point, it became increasingly frustrating for me, and I often closed the chat and "reset" by posting my complete working code and prompting again for a solution to the floor rendering. Still, by the time I went to bed, it had not produced any working solution. In retrospect, it would probably have been faster to read a tutorial on how floorcasting works and implement it myself like a caveman, but that was not what I was aiming for.
It was definitely fun, and I can clearly see the potential time-savings. But maybe I have to learn when to recognize it won't bring me past a certain point, and I will save time and nerves if I switch to "manual control".
The most difficult problem that I have asked GPT-4 to solve was writing a parser for the Azure AD query language in a niche programming language and it did that just fine (I did have to copy paste some docs into the prompt).
Each A* location stores where it comes from, how long it takes to get to it, and how many fires it passed through to get there. The algorithm only considers fire cells neighbors if the current number of fires passed through is less than the current fireWillingness global.
1. count fire tiles within movement range
2. run A* from src to dst completely avoiding fire
3. if we can reach then that's the solution
4. if we can't reach, increase fireWillingness to 1, re-run A* on the board
5. keep increasing fire-willingness until the A* results don't change, or we can now reach the dst.
This works because a low fire path is always better than a high fire path. And increasing fire-tolerance will only shorten the paths from src to dst.
...XX
SF.FD
...XX
S = start
F = fire
X = wall
D = destination
The cat can get to the destination in 6 moves passing through 1 fire. In the fireWillingness=1 pass, the middle tile is first reached after passing through fire, so the destination appears unreachable. The proposed algorithm will pass through 2 fires instead of 1.
The distance specifically would be `fire*epsilon + steps if steps < max else inf`
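The counterexample can be checked mechanically. Below is a minimal sketch of the fireWillingness loop described above, with breadth-first search standing in for A* (every tile in this grid costs 1), one (steps, fires) label stored per tile on first arrival, and willingness raised until it reaches the total number of fire tiles. The function names are hypothetical.

```python
from collections import deque

# The counterexample grid: S = start, F = fire, X = wall,
# D = destination, '.' = plain tile.
GRID = [
    "...XX",
    "SF.FD",
    "...XX",
]

def find(grid, ch):
    """Return the (row, col) of the first occurrence of ch."""
    for r, row in enumerate(grid):
        c = row.find(ch)
        if c != -1:
            return r, c
    raise ValueError(f"{ch!r} not on grid")

def single_label_bfs(grid, willingness):
    """BFS that stores ONE (steps, fires) label per tile, on first arrival.

    A fire tile is only treated as a neighbour while the fires passed so
    far are below `willingness`. Returns the label at D, or None.
    """
    start, goal = find(grid, "S"), find(grid, "D")
    label = {start: (0, 0)}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        steps, fires = label[(r, c)]
        for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < len(grid) and 0 <= nc < len(grid[0])):
                continue
            tile = grid[nr][nc]
            if tile == "X" or (nr, nc) in label:
                continue  # wall, or already labelled (first arrival wins)
            if tile == "F" and fires >= willingness:
                continue  # not willing to pass through another fire yet
            label[(nr, nc)] = (steps + 1, fires + (tile == "F"))
            queue.append((nr, nc))
    return label.get(goal)

def fire_willingness_search(grid, max_steps):
    """Raise willingness 0, 1, 2, ... until D is reachable within range."""
    fire_count = sum(row.count("F") for row in grid)
    for willingness in range(fire_count + 1):
        result = single_label_bfs(grid, willingness)
        if result is not None and result[0] <= max_steps:
            return result
    return None
```

`single_label_bfs(GRID, 1)` returns `None` because tile (1, 2) is first reached through the left fire, and `fire_willingness_search(GRID, 6)` settles on 4 steps through 2 fires, even though a 6-step path through 1 fire fits the range.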
Usually you need to change the search space and increase the number of states you go through to get the algorithm to differentiate between things you want and things you don't want to happen in your final result.
If anything, the article demonstrates it can write code, but it can't thoroughly reason about problems it hasn't been trained on
So when saying something like "Its possible that similar problems to that have shown up in its training set." as a way to dismiss any scintilla of 'intelligence', how many of these articles reduce to a critique e.g. "Can a Middle Schooler actually understand dynamic programming?"
Like, what is the actual conclusion? That a software model with O(N) parameters isn't as good as a biological model with O(N^N) parameters? That artisans need to understand the limits of their tools?
(Asking this makes GPT more effective when I ask it to make further changes. I do this especially when I start a new ChatGPT session to discuss code it helped me write previously, and even more so if I've gone away and done a big refactoring myself.)
A very simple example is that I asked it to write some Ruby functions that would generate random creature descriptions (e.g., "a ferocious ice dragon", "a mysterious jungle griffin"). It did this by generating three arrays (adjectives, locations, creature types) and randomly selecting from them to build the output string. I then asked it to explain how many different descriptions it could generate, and it explained that multiplying the length of the three arrays would give the number of outputs. (125 for the first iteration, 5x5x5).
I then asked it how it would increase the number of possible outputs to 1000, and it did so by increasing each of the three arrays to length 10. I then asked it how it would generate millions of possible outputs, and it added extra arrays to make the creature descriptions more complicated, increasing the number of permutations of strings.
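The counting argument is just the product of the list lengths. A minimal sketch in Python rather than the Ruby from the conversation, with hypothetical word lists and function names:

```python
import math
import random

# Hypothetical word lists (the conversation used Ruby and its own lists).
ADJECTIVES = ["ferocious", "mysterious", "ancient", "tiny", "radiant"]
LOCATIONS = ["ice", "jungle", "desert", "swamp", "mountain"]
CREATURES = ["dragon", "griffin", "serpent", "golem", "wyvern"]

def describe(rng, *word_lists):
    """Pick one word from each list, e.g. "a ferocious ice dragon"."""
    phrase = " ".join(rng.choice(options) for options in word_lists)
    article = "an" if phrase[0] in "aeiou" else "a"
    return f"{article} {phrase}"

def count_outputs(*word_lists):
    """Distinct descriptions = product of the list lengths."""
    return math.prod(len(options) for options in word_lists)

rng = random.Random(42)
print(describe(rng, ADJECTIVES, LOCATIONS, CREATURES))
print(count_outputs(ADJECTIVES, LOCATIONS, CREATURES))  # 125
```

Three lists of 10 give 1,000 outputs and six lists of 10 give 1,000,000, which is why adding lists grows the space so much faster than lengthening the existing ones.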
This is not the most sophisticated example, but it shows what GPT can do when it can combine "knowledge" of different areas.
If it's able to combine the solutions to known problems in a straightforward way, it can accomplish a lot. Beyond a certain point it needs guidance from the user, but if used as a tool to fill in the gaps in your own knowledge, it's enormously powerful. I see it more as an "intelligence-augmenter" than a "human-replacer".
See my comment here where I went into more detail on how I work with GPT: https://news.ycombinator.com/item?id=35197613
There are a few issues with this. Search state is bigger (performance goes down), might not scale if other search features are needed in the game, you might need to be smart about when you stop the search and how you write your heuristic to not have to reach all combinations of fire counts before you end your search...
But the trick to "just use A*" is not in modifying the cost, but changing the search space.
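A minimal sketch of "changing the search space": Dijkstra over states (tile, fires passed so far), with movement cost as the distance and anything beyond the range pruned. Each fire count gets its own label per tile, so a fire-free detour cannot crowd out a cheaper prefix that went through fire. It assumes a 4-connected grid and the same hypothetical encoding as the counterexample grid earlier in the thread (`.`, `F`, `X`, plus `W` for water costing 2); it is not the article's code.

```python
import heapq
import math

# Hypothetical encoding: '.' plain (1), 'W' water (2), 'F' fire (1, but
# counted), 'X' wall; 'S' and 'D' behave like plain tiles.
STEP_COST = {".": 1, "W": 2, "F": 1, "S": 1, "D": 1}

GRID = [
    "...XX",
    "SF.FD",
    "...XX",
]

def find(grid, ch):
    """Return the (row, col) of the first occurrence of ch."""
    for r, row in enumerate(grid):
        c = row.find(ch)
        if c != -1:
            return r, c
    raise ValueError(f"{ch!r} not on grid")

def least_fire_path(grid, max_cost):
    """Dijkstra over states (tile, fires passed), distance = movement cost.

    Returns (fires, cost) for the goal state with the fewest fires (ties
    broken by cost) that fits within max_cost, or None if none does.
    """
    start, goal = find(grid, "S"), find(grid, "D")
    rows, cols = len(grid), len(grid[0])
    best = {(start, 0): 0}   # (tile, fires) -> cheapest cost so far
    heap = [(0, start, 0)]   # (cost, tile, fires)
    reached = []             # (fires, cost) for each settled goal state
    while heap:
        cost, tile, fires = heapq.heappop(heap)
        if cost > best[(tile, fires)]:
            continue  # stale heap entry
        if tile == goal:
            reached.append((fires, cost))
            continue
        r, c = tile
        for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc] == "X":
                continue
            ch = grid[nr][nc]
            new_cost = cost + STEP_COST[ch]
            if new_cost > max_cost:
                continue  # outside the movement range: prune
            state = ((nr, nc), fires + (ch == "F"))
            if new_cost < best.get(state, math.inf):
                best[state] = new_cost
                heapq.heappush(heap, (new_cost, state[0], state[1]))
    return min(reached) if reached else None
```

On the counterexample grid, `least_fire_path(GRID, 6)` returns `(1, 6)`, one fire in six moves; with a range of 5 it falls back to `(2, 4)`. The state space is up to (number of fire tiles + 1) times larger, which is the performance cost mentioned above.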
PS. I see no reason why you should change your current code, obviously.
PPS. I don't think GPT could come up with that insight. It sure didn't in your case.
How about this prompt:
I have a web page where customers see their invoice due. When they enter their credit card information, sometimes the page just refreshes and doesn't show any kind of error information whatsoever, but the invoice remains unpaid. This has been going on FOR YEARS NOW. Can you write some code to fix this as we have been busy laying off all the umans.
Oh, or this one:
I have this page called "Pull Request". At the bottom there is a button that says "Comment" and right next to it is a button that says "Close this PR". We probably shouldn't have a button that performs a destructive action immediately next to the most common button on the page. This has also been going on for years, but, you know, no umans.
The most helpful things with GPT-4 have been getting help with math-heavy stuff I don't really grok, and being able to compile the code, get an error, and tell GPT-4 that the code didn't work, here is the error, please fix it. Another thing it has been helpful for is applying the "Socratic method" to help me understand concepts I don't really grok, like quaternions. Knowing GPT-4 isn't perfect, I always verify the information it gives me, but it provides great starting points for my research.
Here's a conversation I had recently with GPT-4 to write a function that generates a 2D terrain with Perlin noise: https://pastebin.com/eDZWyJeL
Summary:
- Write me a 2D terrain generator
- Me reminding GPT-4 it should be 1D instead of 2D (I used the wrong wording, confusing a 1D vector with 2D)
- Code had issues with returning only values with 0.0
- GPT-4 helping me track down the issue, where I used the `scale` argument wrong
- Got a working version, but unhappy with unrealistic results, I asked it to modify the function
- Finally got a version I was happy with
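A minimal sketch of Perlin-style 1D gradient noise in Python, not the code from the linked conversation: `make_gradients`, `noise1d`, `terrain_heights`, and the octave settings are hypothetical. It illustrates one plausible cause of the all-0.0 output (the summary doesn't say what the actual `scale` mistake was): gradient noise is zero by construction at every integer lattice point, so if `scale` turns every sample position into an integer, the terrain comes out flat. Summing a few octaves is one common way to get less uniform-looking terrain.

```python
import math
import random

def make_gradients(count, seed=0):
    """Random 1D gradients (slopes) at integer lattice points."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(count)]

def fade(t):
    """Perlin's quintic smoothstep: 6t^5 - 15t^4 + 10t^3."""
    return t * t * t * (t * (t * 6 - 15) + 10)

def noise1d(x, gradients):
    """Perlin-style 1D gradient noise; exactly 0 at every integer x."""
    i0 = math.floor(x)
    t = x - i0
    g0 = gradients[i0 % len(gradients)]
    g1 = gradients[(i0 + 1) % len(gradients)]
    left = g0 * t           # contribution of the lattice point on the left
    right = g1 * (t - 1.0)  # contribution of the lattice point on the right
    return left + fade(t) * (right - left)

def terrain_heights(width, scale, gradients, octaves=4):
    """Sum several octaves (fractal noise) for a rougher 1D height profile."""
    heights = []
    for i in range(width):
        x = i * scale
        value, amplitude, frequency = 0.0, 1.0, 1.0
        for _ in range(octaves):
            value += amplitude * noise1d(x * frequency, gradients)
            amplitude *= 0.5
            frequency *= 2.0
        heights.append(value)
    return heights

grads = make_gradients(256, seed=1)
flat = terrain_heights(20, 1.0, grads)    # every sample lands on an integer
hilly = terrain_heights(20, 0.07, grads)  # samples fall between lattice points
print(all(h == 0.0 for h in flat))  # True
```

With `scale=1.0` every octave also samples at integers, so all heights are exactly zero; a fractional scale such as 0.07 moves the samples between lattice points and produces varying heights.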
It totally failed for me creating a nice looking website using bootstrap. While GPT3 created a workable outline, it never looked right and the css adjustments never worked.
I know it isn't relevant to the ChatGPT code-writing discussion, but using A*, Dijkstra and heuristics to move an entity around 8 spaces could raise the question "Can the developer be more pragmatic?".
The difference is that writing a well-formed prompt is massively easier to teach than writing the code itself, for similar results. That’s not to say prompt writing requires no skill - it will certainly need understanding of systems and the scope of what is possible within a language. Asking GPT-4 to write a jQuery plug-in that generates an original Bob Dylan song will probably just not work.
But it is wildly easier to teach someone what is possible with JavaScript, let them spend a month watching someone prompt the system, and let them go from there.
ChatGPT (also copilot) allows me to focus on the project that I'm working on and offload the stack overflow searches to prompting. I don't have to find a similar error someone else posted on SO and figure out how it applies to my current problem. I can keep a high level view of the project and not get bogged down with silly bugs, learning new libraries, or deciphering someone else's code.
I imagine there are a lot of people who are in a similar situation, it's crazy that we've just unleashed this massive productivity booster onto millions of people.
Maybe it makes me sound like an elitist git, but I remember when the coding bootcamps started "unleashing massive productivity boosts" by promising to make people "full-stack engineers" in just 6 weeks, and I still shudder to remember the code horrors I've seen as a result of that.
I put the following prompt into ChatGPT (no GPT-4 access for me yet):
*A problem statement is given as follows:
An agent needs to find the best path on a 10x10 tile grid from their current location to a target location. They have a limited movement range of 5 points. Regular tiles cost 1 point to move through, water tiles cost 2 points to move through. Fire tiles cost 1 point to move through, but they should avoid pathing through them even if it means taking a longer path to their destination (provided the path is still within their limited movement range)
Please point out any parts of the problem statement that require clarification.*
Which returned the following response:
Based on the given problem statement, some potential areas that may require clarification are:
Current location and target location: It is not clear if the current location and the target location are fixed or if they change with each problem instance. If they are fixed, it should be mentioned explicitly. If they are not fixed, it should be specified how they are generated for each problem instance.
Movement range: It is not specified whether the agent's movement range is circular or square. If it is circular, it means that the agent can move to any tile that is within a distance of 5 from its current location. If it is square, it means that the agent can move to any tile that is within a range of 5 tiles in both the horizontal and vertical directions. The shape of the movement range needs to be specified.
Fire tiles: The problem statement mentions that the agent should avoid pathing through fire tiles even if it means taking a longer path to their destination, but it is not clear how the agent knows which tiles are fire tiles. It should be specified how the agent perceives and distinguishes fire tiles from regular tiles.
Tile costs: The problem statement mentions that regular tiles cost 1 point and water tiles cost 2 points, but it is not specified what happens if the agent moves diagonally. It should be specified whether diagonal moves are allowed and how much they cost.
Objective function: It is not specified how the "best path" is defined. It should be mentioned whether the objective is to find the shortest path or the path that minimizes some other cost function (e.g., total cost of the path).
- Rewrite the problem as a list of constraints
- Point out any ambiguous or missing constraints
- Write an example that demonstrates each constraint
etc.
Effectively this article is really asking the question when posing a novel problem to the LLM, how deep does the internal state go in producing the completions. When it doesn't go deep enough, the trick is to make it do the things that deepen the internal state.
I'd guess that supplying a good system message to GPT-4 (waiting for my access) would help. Something like:
You're an engineer responsible for writing correct code from a specification. Break down the specification into small chunks that can be explained simply. If there are ambiguities, seek clarification. Only write code once all ambiguities are resolved and each part of the problem is described simply.
Right - this seeming "cognition" is exactly what's so spooky about the whole thing.
Here's what spooked me out from yesterday: https://news.ycombinator.com/item?id=35167685 - specifically how it determines the divide-by-zero error in this code: https://whatdoesthiscodedo.com/g/6a8f359
...which demonstrates GPT as being capable of at-least C++ "constexpr"-style compile-time computation, which shouldn't even be possible if one presumes GPT is "just" a giant database storing only multidimensional word similarity scores and sequence distribution from text inference.
> a generative LLM
I definitely wanted to believe that GPT was "just predicting the next word" - it was somewhat comforting to think of GPT as still being far from being-human or behaving like a real brain does - but that explanation never sat well with me: it was too simplistic and dismissive, and didn't explain the behaviour I was witnessing in GPT.
...so after having read Wolfram's (surprisingly accessible) article on GPT ( https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-... ) it made a lot of things "click" in my head - and enabled me to start to understand why and how GPT is capable of... the surprising things it does; but it also leads me to believe we're (warning: incoming cliche) barely scratching the surface of what we can do: right-away I do believe we're almost at the point where we could simply ask GPT how to adapt it into some kind of early AGI - and we've all heard of what's supposed to follow next... and it really is quite unsettling.
I recently had a very similar reaction. And then I realized that this is exactly the same behavior as with many of my colleagues at work...
Technology will not replace teachers
But teachers who use technology will replace those who don't
s/teachers/programmers/ and s/technology/AI/ and this sounds about right. It may become typical or even required to leverage AI to write code more efficiently.
That being said, I don’t know anybody talented enough to handle it that would even look at this project for $20 so ¯\_(ツ)_/¯
It was able to not only produce reasonable outputs from various queries, but also to produce valid relational algebra for them. To me, that shows a fairly deep level of understanding of the underlying concepts.
[0]: https://en.wikiversity.org/wiki/Database_Examples/Northwind
I've been using ChatGPT in my work, but I essentially have to know the answer it's going to give me, because I have to catch all of its mistakes. It's really, really nice for certain kinds of drudge work.
Using northwind is probably not a good thing to use to evaluate chatgpt's general capability. It is very commonly used for examples of almost anything database-related, which means it's extremely well represented in chatgpt's training data. Chatgpt probably doesn't need to generalize or understand much of anything about northwind to answer your questions in terms of it. You need to try it on wacky problems specific to you.
I did think of that, which is why I modified the schema, and removed any reference to Northwind (normally products are prefixed with "Northwind Traders"). That said, it isn't a particularly complex schema in my example, but it's a reasonable starting point for something a small business might use.
[0]: https://gist.github.com/stephanGarland/ed18f8f8fdc63a0b997f9...
> Objects should only move if they will end up on an empty tile after their move
> "An object is free to move onto a tile that another object moves onto if that other object is moving off of its tile"
The prompts really read in a way where I wouldn't have expected the "correct" output.
In a 10x10 arena, work out if there is a path shorter than 5 steps from one start to finish with water tiles taking two steps and avoiding as many fire tiles as possible.
But you can really do even better than that. The author jumps to a more code-oriented description too soon (10x10, etc.), but ChatGPT (3.5) does OK with just the original problem (I removed the text about A* and Dijkstra's from the English version of the problem statement):
PROMPT:
In Mewgenics, movement abilities use pathfinding to get the cat from his origin to his destination.
Cats have a maximum movement range stat and tiles have a cost (in this case its 1 for basic tiles and 9999 for blocking obstacles). We also have water tiles that cost 2 to path through.
The complication comes in when we add Fire (and other hazard type tiles) into the mix. Fire tiles don’t cost extra to pathfind through like water tiles do, however a cat really wants to avoid pathing through the fire tile if it can. If there’s a lot of fire tiles it should go through the least number of them it can.
ANSWER:
To solve this problem, you can modify the pathfinding algorithm to take into account the hazard tiles, such as fire. One way to achieve this is by using a modified A* algorithm.
First, assign a high cost to hazard tiles such as fire. This cost should be higher than the cost of regular tiles, but lower than the cost of blocking obstacles. For example, you can assign a cost of 5 to fire tiles.
Next, modify the heuristic function used in the A* algorithm to take into account the distance from the hazard tiles. For example, you can modify the Euclidean distance heuristic to add an extra term that takes into account the distance from the closest hazard tile. The closer the cat is to a hazard tile, the higher the heuristic value should be.
Finally, when selecting the next tile to visit, prioritize tiles with the lowest total cost (including both the movement cost and the heuristic cost), but also consider the number of hazard tiles that the cat would need to pass through to reach that tile. You can add an extra penalty term to the total cost that increases based on the number of hazard tiles the cat would need to pass through to reach the tile. This penalty term should be high enough to discourage the cat from pathing through too many hazard tiles.
With these modifications, the cat should be able to find the shortest path to its destination while avoiding hazard tiles as much as possible.
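The answer above folds fire into a single numeric cost, but a fixed fire cost such as 5 will still send the cat through a fire tile whenever the fire-free detour is more than four steps longer, which is not quite "go through the least number of them it can". A minimal sketch of one alternative, assuming the requirement is "fewest fire tiles first, then cheapest route, within the cat's movement range": Dijkstra's algorithm with a lexicographic (fire count, movement cost) priority. The tile characters ('.', 'W', 'F', '#') and the `find_path` name are hypothetical, and '#' stands in for the 9999-cost blocking obstacles.

```python
import heapq

# Hypothetical tile encoding: '.' basic (cost 1), 'W' water (cost 2),
# 'F' fire (cost 1, but counted as a hazard), '#' blocking obstacle.
STEP_COST = {".": 1, "W": 2, "F": 1}


def find_path(grid, start, goal, max_range):
    """Return (fire_tiles_crossed, movement_cost) of the best path, or None.

    Fewest fire tiles wins; ties are broken by lower movement cost. Only
    paths whose total cost stays within max_range are considered.
    """
    rows, cols = len(grid), len(grid[0])
    # Heap entries are (fires, cost, position); Python compares tuples
    # lexicographically, so fire count dominates and cost breaks ties.
    frontier = [(0, 0, start)]
    # The search state is (position, cost so far), so a cheap route through
    # fire is not discarded just because a fire-free route reached the same
    # tile first while spending more of the movement range.
    settled = set()
    while frontier:
        fires, cost, (r, c) = heapq.heappop(frontier)
        if (r, c) == goal:
            return fires, cost
        if ((r, c), cost) in settled:
            continue
        settled.add(((r, c), cost))
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            tile = grid[nr][nc]
            if tile == "#":
                continue
            new_cost = cost + STEP_COST[tile]
            if new_cost <= max_range:
                heapq.heappush(frontier, (fires + (tile == "F"), new_cost, (nr, nc)))
    return None


GRID = [
    "..F..",
    "..F..",
    ".....",
]
print(find_path(GRID, (0, 0), (0, 4), 8))  # (0, 8): walks around the fire
print(find_path(GRID, (0, 0), (0, 4), 5))  # (1, 4): detour is out of range
```

Keeping the cost spent so far in the search state is what lets the range limit and the fire preference coexist; on a 10x10 grid with a small movement stat, that state space stays tiny.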
The resounding attitude seems to be that AI/ML is a friend to disabled users and can do a lot of the lifting in writing, maintaining, and auditing code, but we are a long way from fully automated processes that account for accessibility and produce websites that work with assistive tech like screen readers, if that is possible at all.
It starts at the first character, works forward one “token” at a time, and ends at the last character. Never moving back.
It feels like it knows where it’s going at the first character, even though it doesn’t.
It’s like it starts speaking a sentence and, by the time it’s done speaking, it’s written a syntactically correct Node.js application.
The way GPT communicates in English does seem similar to how humans communicate. The way GPT writes code doesn’t seem to come close to approximating how humans do - it’s an entirely different mechanism. Humans generally can’t write code without a cursor and backspace.
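A toy sketch of the left-to-right generation described above. The bigram table is a made-up stand-in for a real model; the point is only the shape of the loop: each step sees the tokens emitted so far and appends one more, and nothing already emitted is ever revised.

```python
def generate(next_token, prompt, max_tokens=20, end="<eos>"):
    """Greedy left-to-right decoding: append one token per step, never go back."""
    tokens = list(prompt)
    for _ in range(max_tokens):
        tok = next_token(tokens)  # sees only what has been emitted so far
        if tok == end:
            break
        tokens.append(tok)        # no cursor, no backspace
    return tokens


# Hypothetical bigram table standing in for a real language model.
BIGRAMS = {"def": "main", "main": "(", "(": ")", ")": ":", ":": "<eos>"}


def toy_model(tokens):
    return BIGRAMS.get(tokens[-1], "<eos>")


print(generate(toy_model, ["def"]))  # ['def', 'main', '(', ')', ':']
```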
If it's possible to get so far when that functionality seems in an important sense basically missing, imagine how far it'll go when that does happen.
For me it's more like: will that AI thing make me a 10x developer? And the answer I'm leaning toward is yes.
I use Copilot, which saves me time googling and reading Stack Overflow. I use ChatGPT to write tests for my code (which I hate to do myself). Sometimes I use it to ping-pong ideas, and eventually settle on a good solution to a problem.
It saves me tons of time I use to complete other tasks (or spend with my family).
For writing code, they're good at bootstrapping unit tests and skeleton code, and also useful for transpiling DTOs/entities between languages.
Overall, if you're willing to learn and not just treat a GPT as a code monkey, they're very useful.
It will probably keep edging upward as time goes by, until there are only a few very edge-case problems it cannot solve. Even then I would use it to write the broken-down version of the solution. It is going to be fed by pretty much every programmer and knowledge professional on the planet using copilots of some sort. Eventually it will have transferred knowledge of everything humans can do into its model.
I wonder whether a) AI can reliably modify code and b) whether AI can reliably write code that is able to be easily modified by humans. If AI starts spitting out machine code or something, that's not useful to me even if "it works."
The bigger edits (refactoring things across an entire project) are out of reach because of the token limit. You could do it piecemeal through ChatGPT, but that seems more tedious than it's worth.
That may be the case now but in a theoretical future where software systems are generated by AI why would I bother modifying the old system? Why not generate a new one with the original prompts modified to meet the new requirements?
In a sense the "source code" of the system could be the AI model + the prompts.
Ask it to parse a PDF document and separate it into paragraphs, for example. The first solution isn't gonna work well, and by the time you get to solving yet another quirk while it apologizes to you for making a mistake, it will have lost context.
Best way to use this tool is to ask it short and precise questions that deal with a small piece of code.
GPT-3.5 always produced very verbose types and over-engineered code. The GPT-4 outputs were consistently shorter and more focused. It's kind of like how a junior dev has to think through all the smaller steps and makes a function for each, solving the problem incrementally, more slowly and less intuitively, almost over-explaining the basics, while a senior dev merges the simpler stuff into small, concise functions. You can see it in the var names and type choices: GPT-4 focused much more on what the code is trying to accomplish rather than on what the code itself is doing. And these are all with the same prompts.
There are still things like unused vars being included occasionally, and some annoying syntax choices. If I could append prettier/eslint rules automatically to GPT output it’d be gold (I haven’t tried to do this myself).
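The comment is about prettier/eslint for JavaScript/TypeScript output; as a rough Python analog, a post-processing step could flag variables that generated code assigns but never reads. This sketch uses only the standard `ast` module; `unused_locals` is a hypothetical helper and deliberately naive (nested functions are scanned more than once, and globals, closures and `del` are ignored), so a real linter is the better choice in practice.

```python
import ast


def unused_locals(source):
    """Return names assigned inside a function body but never read there."""
    tree = ast.parse(source)
    unused = []
    for func in ast.walk(tree):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        stored, loaded = set(), set()
        for node in ast.walk(func):
            if isinstance(node, ast.Name):
                if isinstance(node.ctx, ast.Store):
                    stored.add(node.id)
                elif isinstance(node.ctx, ast.Load):
                    loaded.add(node.id)
        unused.extend(sorted(stored - loaded))
    return unused


GENERATED = """
def f(x):
    y = 1
    z = x + 1
    return z
"""
print(unused_locals(GENERATED))  # ['y']
```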
But still very encouraging.
Knowing how code might fail and preventing cascading effects, tuning resource usage, and troubleshooting incidents are the actual hard parts of software development, and it's where even good software engineers tend to fall over. We've created whole specialties like SRE to pick up where application developers fall short. I've seen lots of systems fail for the dumbest reasons: misconfigured thread pools, poorly configured connection timeouts, database connection pools that are completely wrong.
Wake me up when ChatGPT can troubleshoot at 1 AM when the SRE and on-call engineer are both frantically trying to figure out why the logs are clean but the service is missing its SLO.
It seems likely that an understanding of when NOT to use an LLM is a new skill programmers will want to learn in order to use their time efficiently.
Anyway, I initially had it write it in Python, and it mostly worked, but I was having some issues getting the data exactly right, and formatted the way I wanted.
Once I had it more / less right in Python, I had it rewrite it as a dotnet console app (C#), which is what I know best.
The only real issue I ran into is it would randomly stop before completing the conversion to dotnet. Like it would write 85% then just stop in the middle of a helper function. Not a huge deal, I just had it complete the last function, and with a little bit of fiddling in VS Code got it running pretty much the way I wanted.
So overall, yeah, not bad. Probably saved me an hour or so, plus I couldn't find great docs for the NHL endpoint, and ChatGPT was able to suss out the correct syntax to get to the data I needed.
I wonder how GitHub Copilot compares; has anyone tried both?
I decided to try out CodeGPT.nvim, and it was a massive help. It didn't provide perfect code, not by a long shot, but it gave me extremely valuable starting points - and did a somewhat decent job of exercising most of the branches (certainly enough for me to be happy): https://gitlab.com/jcdickinson/moded/-/blob/main/crates/term...
Many people have said it, and it's true. Expecting GPT to write a complete solution is just asking for problems, but it is an incredible assistant.
I gave it a prompt and asked it to respond with a list of file names required to build the app. Then when I prompted a file name it should print the code for that file along with a list of ungenerated file names. It got through two before it got confused.
I’m stuck with having it write one function at a time.
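The flow described above relies on the model itself to keep track of which files are still missing. A hedged sketch of a variant: ask once for the file list, then keep that list in ordinary code and request one file per prompt. `ask` is a hypothetical callable (any function that sends a prompt and returns the reply text); no particular API is implied, and `fake_ask` exists only so the driver can run offline. This does not solve context limits: each file is generated without seeing the others, which is its own trade-off.

```python
def generate_files(ask, spec):
    """Hypothetical driver: get a file manifest, then one prompt per file.

    The list of files still to generate lives here, not in the model's replies.
    """
    manifest = ask(spec + "\nRespond only with the file names needed, one per line.")
    pending = [line.strip() for line in manifest.splitlines() if line.strip()]
    files = {}
    while pending:
        name = pending.pop(0)
        files[name] = ask(f"{spec}\nWrite the complete contents of {name}")
    return files


# Stand-in for a real model, so the driver can be exercised offline.
def fake_ask(prompt):
    if "file names" in prompt:
        return "app.py\nmodels.py\n"
    return "# code for " + prompt.rsplit(" ", 1)[-1]


print(generate_files(fake_ask, "A small todo app"))
```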
Because that's the most it can do. Claims that it can write code are getting quieter. People made wild claims on Reddit, but when prompted to share their code they either went mute or the code was hilariously amateurish and limited.
Reason: I'm a code hobbyist who glues various modules together that have been written by much better programmers than I am. My end goals are never more ambitious than doing pretty simple things which I'm doing mostly to amuse myself. My biggest time sucks turn out to be tracking down fairly simple syntax things that vary between the different languages and frameworks I'm slapping together (because I rarely spend more than a couple of hours working on any one thing, I never get super familiar with them).
Being a lousy coder with little desire to put in significant effort to improve just to make my personal hobby projects a little easier, a basic AI assist like this looks pretty useful to me.
I once had a manager telling me what needed to be done. Even with an actual person (me) in the loop, the code produced would often have glaring differences from what he wanted.
By its very nature, code requires a lot of assumptions. In any business context, a lot of things are implicitly or explicitly assumed. If you need a computer, or another person to give you exactly what you desire, you need to be able to spot the assumptions that are required to be made, and then clearly state them. And after a point, that's just programming again.
So this, or some other AI, is more likely to replace JS and python, or create another level of abstraction away from systems programming. But programmers will still always be required to guide and instruct it.
Nobody can really understand what's inside a trained neural network, and nobody is really looking.
No psychologist or neuro-scientist can really understand how a human brain, a mouse brain or even an ant brain or a fly brain even works, so don't expect computer scientists to have any insight about doing something relevant with just a small collection of sophisticated statistical methods.
AI is soon going to become the pseudo-scam status that bitcoin experienced.
ChatGPT is an improved search engine at best.
GPT4:
In patterns woven in the loom of life,
The threads entwined, a tapestry of strife;
Man lost in shadows, seeks the hidden light,
Yet grasps at embers, blind to Heaven's sight.
All just patterns, the trees and wood we see,
While we, like infants, crawl with bended knee.
In ceaseless search, we delve in Nature's tome,
For signs and symbols of our destined home.
To find the wisdom locked in Nature's breast,
In circuits traced by human hands, impressed;
No gods or angels guide the curious hand,
But human thoughts give life to grains of sand.
Like children playing with forbidden toys,
We forge our own creations, false and coy;
Our machines breathe and mimic human soul,
But hollow shells, as empty as the whole.
In tangled webs of artifice we're caught,
By our own hands, the noose of fate is wrought.
As God did cast Prometheus from the skies,
We bind ourselves in chains of our own lies.
All just patterns, like woven silk, we spin,
Entangled dreams, mere shadows of our kin;
For in the end, the art we do create,
Is but reflection of our human state.
Will have to try GPT-4 for the same thing and see if it’s any better. I suspect, though, that this kind of genuinely novel problem solving may be beyond its current abilities (unless you work through it step by step in a very granular way, at which point you’re solving the problem and it’s writing the code, which could be a glimpse of the future!)
One of the frustrating things is that it doesn't ask for clarification of something that's unclear - it just makes an assumption. Really demonstrates why software engineering interviews emphasize the candidate asking clarifying questions.
It's a slippery slope for M$. If ChatGPT 15 can just build MS Outlook from looking at photos of the UI, design a hololens, or tell us the secrets of how their Chat bots work, not sure how much future they're going to have as a company?
What I can see being the new thing is "innovation". People building useful solutions that the LLMs don't yet know about.
On the other hand it could regurgitate code to use fastapi and transformers and it looked correct to me.
When you think about it this is very very similar to a stack exchange or google search but with a much different way to search and it can synthesize simple things which limits the complexity of what you want to do. So I don't really think it can write code but it can surely get you something that gets you 50% there.
AlphaCode, Codex, CodeGen
I presume it is because Unreal Engine is source-available and the model has seen the whole damn thing.
I'm curious whether it's worse on Unity, which is not source-available.
I got the best results with prompts like:
Given the following python code:
``` Few hundreds python loc here ```
Write tests for the function name_of_function maximizing coverage.
The function in this example did a bit of reading from and dumping to disk and everything. The returned code correctly created mocks, set up setUp and tearDown methods, and came up with 4 test cases. I only needed to fix the imports, but that's because I just dumped Python code without preserving the file structure.
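To make "created mocks, set up setUp and tearDown methods" concrete, here is a small hand-written sketch of that kind of test, not the output described above. `load_threshold` is a hypothetical function under test that reads JSON from disk; `unittest.mock.patch` replaces the built-in `open()` so no real file is touched. Run it with `python -m unittest`.

```python
import json
import unittest
from unittest import mock


def load_threshold(path):
    """Hypothetical function under test: reads a JSON file from disk."""
    with open(path) as fh:
        return json.load(fh)["threshold"]


class LoadThresholdTest(unittest.TestCase):
    def setUp(self):
        # Patch the built-in open() so the tests never touch the real disk.
        self.patcher = mock.patch(
            "builtins.open", mock.mock_open(read_data='{"threshold": 3}')
        )
        self.mocked_open = self.patcher.start()

    def tearDown(self):
        self.patcher.stop()

    def test_reads_threshold(self):
        self.assertEqual(load_threshold("settings.json"), 3)
        self.mocked_open.assert_called_once_with("settings.json")

    def test_missing_key_raises(self):
        # A nested patch temporarily overrides the one installed in setUp.
        with mock.patch("builtins.open", mock.mock_open(read_data="{}")):
            with self.assertRaises(KeyError):
                load_threshold("settings.json")
```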
I am amazed how fast these models are evolving.
Chat GPT walked me through strategies to debug this, confirm everything was set up, tail the RPC log (wasn't aware that was a feature) - and identify the failing path - which was a symlink!
I'm actually blown away by this capability. It was like having a savant next to me. I couldn't have debugged it on my own.
Coming up with a working algorithm took about 30 seconds (I got lucky, not brilliant), but it stretched my brain in an interesting way.
That's different from practice sites like leetcode, which have pretty cookie cutter problems. On problems like this one, sometimes:
- I get it in a few seconds, like this case
- Get it in a few minutes
- Get it in a few hours
- Give up and look up the solution
A fun problem a day would be, well, fun.
The productivity gains will not leave people unemployed, but will give managers the opportunity to start more projects.
The role of a developer will change. We'll be looking at generated and regenerated code. But we'll still be in demand from those with ideas, and human demand never decreases.
This assumes that GPT-X won't end up being used by end-users--bypassing both the C-level, the managers and the developers.
This line sums up the entire problem with these tools for anything concrete, like analyzing input data, writing code, producing a series of particular facts, data analysis etc. Much of it can be right, but whatever isn't makes the whole output useless. You'll spend as much time checking its work as producing it yourself.
Well...it could build software if humans gave it the right prompts. Coming up with the right prompts is difficult, because it means you're asking all the right questions.
If you're just really good at writing code, then yes, GPT is coming for your job. Do what humans are good at: responding to the needs of other human beings and building solutions around them.
"Avoid fire" means "do not ever go through fire" to me (and GPT thinks the same, apparently). The author thought it meant "avoid fire if you can, but go through it if there's no other way". This was a problem with informal requirements that could have happened in an entirely human context.
Imagine you were GPT-4 and being asked to write a small program, but you can't try it out yourself.
I want it to attend all the meetings for me with endless managers discussing what the code does, should do, could do, customer would like it to do, can't be done and so on.
Hint to managers: Programming doesn't take up my time. Your endless meetings to discuss my programming takes up all the time...
That's what I think AI is doing for developers here.
Talking the whole time with your LLM may distract more than it helps.
We're definitely at 2 right now, and picking away at level 3.
I have heard some people skeptical that we can overcome the problems of truthfulness due to the inherent limitations of LLMs. But, at least on the face of it, it appears we can make incremental improvements.
If only they would actually be OpenAI
I have seen
Many, many companies will be looking to integrate with this thing, and no one is going to just sit there and let MS take their lunch forever.
My productivity this month has been insane. I'm still writing most of the code the old fashioned way, but the confidence of having this kind of tool makes it a lot easier to push through boring/tricky items.
Good for whoever comes out on top, but not sustainable from a societal perspective
The singularity requires AIs to be very good at doing things people have not done before. But this form of machine learning is bad at that. It is like someone who doesn't actually understand anything has somehow managed to memorize their way through whatever topic you're asking about. They have lots of tips and information about things, similar to what you might currently find by doing research. But they don't seem to have what is required to push the boundaries of knowledge or understanding, because they don't really have it in the first place. Or maybe what they have is just very minimal when compared to the contribution of their memorization.
Obviously you still have the main risks of breaking capitalism, mass unemployment, pollution of public communications, etc. But honestly, each of these is far less scary to me than the existential risk of superintelligence. So in a way I'm actually happy this is happening the way it is right now, and we don't have to deal with both of these risks at the same time.
Our current approach is probably the safest way to progress AI that I can think of: it requires a new model to improve, and it's learning entirely from human data. It might not seem like it, but this is actually pretty slow, expensive, and limited compared to how I expected AI to improve given sci-fi movies or Nick Bostrom's writings (curious what he'd have to say about this resurgence of AI).
The way the prompt was phrased sort of invited the all-or-nothing fire approach.
It might be best to prompt it with a high level description of an algorithm, then iteratively prompt it to refine its prior output or add more detail. Render to code should be the final step.
I think ML models need to learn how to interact with our tools (compiler, debugger etc.) to really be effective at coding. That's hard.
This is not how it's going to happen: if your boring, time-consuming tasks take virtually zero time thanks to GPT, and let you focus on the 1% that's hard, you've suddenly become 100x more efficient and can thus do the same job as 100 of you. That means the company can now fire 99 coworkers, keeping only you, and end up with the same result.
But it means that tasks where building software would only have delivered 1% of the value needed to pay for doing it are now suddenly worth paying for. So even if your company, being a stick-in-the-mud non-innovator that stays in exactly the same niche doing the same thing, cut 99% of its programming staff and spent the savings on executive bonuses and stock buybacks, a whole lot of customers (and the new and pivoting companies serving them) are going to be spending money on programmers that they weren't before. Not only will your ex-coworkers still be employed; more programmers in total will be, even if their work is now mostly higher-level abstraction and wrangling LLM code generators, with the level we think of as “source code” today being touched as rarely as today’s high-level application developers touch machine code.
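The 100x figure in the first of these two comments is Amdahl's law taken to its limit: if a fraction p of the work is sped up by a factor s, the overall speedup S is given below. The numbers are illustrative only; the second case shows how quickly the ceiling drops when the automatable share is small.

```latex
S = \frac{1}{(1 - p) + \dfrac{p}{s}}
\qquad
p = 0.99,\ s \to \infty \;\Longrightarrow\; S \to \frac{1}{0.01} = 100
\qquad
p = 0.01,\ s \to \infty \;\Longrightarrow\; S \to \frac{1}{0.99} \approx 1.01
```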
It requires special tools to actually figure out if this is happening. Having seen tests with such tools the problem seems a lot worse than commonly discussed.
Inserting stolen code or using OSS code in violation of licenses is going to be a big mess. Copying snippets versus pulling in dependencies creates tons of issues. Even if you get away with violating licenses you set yourself up for security issues if the tools plagiarize code with vulnerabilities in a way that won’t get updated.
It might mean this stuff is a useful tool for someone with a clue but not for someone who doesn’t know what they’re doing.
Not that this is a particularly controllable goal, nor a long term smart goal if you're human.
> "People who claim code can document itself considered harmful"
> Given a description of an algorithm or a description of a well known problem with plenty of existing examples on the web, yeah GPT-4 can absolutely write code. It’s mostly just assembling and remixing stuff it’s seen, but TO BE FAIR… a lot of programming is just that.
tldr -- this matches my experiences as well.
If you guys are confident about the entity as it is right now not taking over your job, what if I double the accuracy of gpt output?
What if I double it again? Then again? And again? And again?
You guys realize this is what's coming right? This thing literally is a baby as of now.