undefined | Better HN

Skip to content

Top New Best Ask Show Jobs

undefined | Better HN

0 pointslatexr9mo ago0 comments

> Or generating 4 versions of PR for the same task so that the human could just pick the best one.

That sounds awful. A truly terrible and demotivating way to work and produce anything of real quality. Why are we doing this to ourselves and embracing it?

A few years ago, it would have been seen as a joke to say “the future of software development will be to have a million monkey interns banging on one million keyboards and submit a million PRs, then choose one”. Today, it’s lauded as a brilliant business and cost-saving idea.

We’re beyond doomed. The first major catastrophe caused by sloppy AI code can’t come soon enough. The sooner it happens, the better chance we have to self-correct.

0 comments

chamomeal9mo ago

I say this all the time!

Does anybody really want to be an assembly line QA reviewer for an automated code factory? Sounds like shit.

Also I can’t really imagine that in the first place. At my current job, each task is like 95% understanding all the little bits, and then 5% writing the code. If you’re reviewing PRs from a bot all day, you’ll still need to understand all the bits before you accept it. So how much time is that really gonna save?

diggan9mo ago

> Does anybody really want to be an assembly line QA reviewer for an automated code factory? Sounds like shit.

On the other hand, does anyone really wanna be a code-monkey implementing CRUD applications over and over by following product specifications by "product managers" that barely seem to understand the product they're "managing"?

See, we can make bad faith arguments both ways, but what's the point?

consumer4519mo ago

I hesitate to divide a group as diverse as software devs into two categories, but here I go:

I have a feeling that devs who love LLM coding tools are more product-driven than those who hate them.

Put another way, maybe devs with their own product ideas love LLM coding tools, whilr devs without them do not.

I am genuinely not trying to throw shade here in any way. Does this rough division ring true to anyone else? Is there any better way to put it?

nevertoolate9mo ago

Issue is if product people will do the “coding” and you have to fix it is miserable

ponector9mo ago

>That sounds awful.

Not for the cloud provider. AWS bill to the moon!

osigurdson9mo ago

I'm not sure that AI code has to be sloppy. I've had some success with hand coding some examples and then asking codex to rigorously adhere to prior conventions. This can end up with very self consistent code.

Agree though on the "pick the best PR" workflow. This is pure model training work and you should be compensated for it.

elif9mo ago

Yep this is what Andrej talks about around 20 minutes into this talk.

You have to be extremely verbose in describing all of your requirements. There is seemingly no such thing as too much detail. The second you start being vague, even if it WOULD be clear to a person with common sense, the LLM views that vagueness as a potential aspect of it's own creative liberty.

pja9mo ago

> You have to be extremely verbose in describing all of your requirements. There is seemingly no such thing as too much detail.

Sounds like ... programming.

Program specification is programming, ultimately. For any given problem if you’re lucky the specification is concise & uniquely defines the required program. If you’re unlucky the spec ends up longer than the code you’d write to implement it, because the language you’re writing it in is less suited to the problem domain than the actual code.

jebarker9mo ago

> the LLM views that vagueness as a potential aspect of it's own creative liberty.

I think that anthropomorphism actually clouds what’s going on here. There’s no creative choice inside an LLM. More description in the prompt just means more constraints on the latent space. You still have no certainty whether the LLM models the particular part of the world you’re constraining it to in the way you hope it does though.

joshuahedlund9mo ago

> You have to be extremely verbose in describing all of your requirements. There is seemingly no such thing as too much detail

I understand YMMV, but I have yet to find a use case where this takes me less time than writing the code myself.

throw2342342349mo ago

I've found myself personally thinking English is OK when I'm happy with a "lossy expansion" and don't need every single detail defined (i.e. the tedious boilerplate, or templating kind of code). After all to me an LLM can be seen as a lossy compression of actual detailed examples of working code - why not "uncompress it" and let it assume the gaps. As an example I want a UI to render some data but I'm not as fussed about the details of it, I don't want to specify exact co-ordinates of each button, etc

However when I want detailed changes I find it more troublesome at present than just typing in the code myself. i.e. I know exactly what I want and I can express it just as easily (sometimes easier) in code.

I find AI in some ways a generic DSL personally. The more I have to define, the more specific I have to be the more I start to evaluate code or DSL's as potentially more appropriate tools especially when the details DO matter for quality/acceptance.

9rx9mo ago

> You have to be extremely verbose in describing all of your requirements. There is seemingly no such thing as too much detail.

If only there was a language one could use that enables describing all of your requirements in a unambiguous manner, ensuring that you have provided all the necessary detail.

Oh wait.

SirMaster9mo ago

I'm really waiting for AI to get on par with the common sense of most humans in their respective fields.

bonoboTP9mo ago

If it's monkeylike quality and you need a million tries, it's shit. It you need four tries and one of those is top-tier professional programmer quality, then it's good.

agos9mo ago

if the thing producing the four PRs can't distinguish the top tier one, I have strong doubts that it can even produce it

solaire_oa9mo ago

Making 4 PRs for a well-known solution sounds insane, yes, but to be the devil's advocate, you could plausibly be working with an ambiguous task: "Create 4 PRs with 4 different dependency libraries, so that I can compare their implementations." Technically it wouldn't need to pick the best one.

I have apprehension about the future of software engineering, but comparison does technically seem like a valid use case.

layer89mo ago

The problem is, for any change, you have to understand the existing code base to assess the quality of the change in the four tries. This means, you aren’t relieved from being familiar with the code and reviewing everything. For many developers this review-only work style isn’t an exciting prospect.

And it will remain that way until you can delegate development tasks to AI with a 99+% success rate so that you don’t have to review their output and understand the code base anymore. At which point developers will become truly obsolete.

solaire_oa9mo ago

Top-tier professional programmer quality is exceedingly, impractically optimistic, for a few reasons.

1. There's a low probability of that in the first place.

2. You need to be a top-tier professional programmer to recognize that type of quality (i.e. a junior engineer could select one of the 3 shit PRs)

3. When it doesn't produce TTPPQ, you wasted tons of time prompting and reviewing shit code and still need to deliver, net negative.

I'm not doubting the utility of LLMs but the scattershot approach just feels like gambling to me.

zelphirkalt9mo ago

Also as a consequence of (1) the LLMs are trained on mediocre code mostly, so they often output mediocre or bad solutions.

diggan9mo ago

> A truly terrible and demotivating way to work and produce anything of real quality

You clearly have strong feelings about it, which is fine, but it would be much more interesting to know exactly why it would terrible and demotivating, and why it cannot produce anything of quality? And what is "real quality" and does that mean "fake quality" exists?

> million monkey interns banging on one million keyboards and submit a million PRs

I'm not sure if you misunderstand LLMs, or the famous "monkeys writing Shakespeare" part, but that example is more about randomness and infinity than about probabilistic machines somewhat working towards a goal with some non-determinism.

> We’re beyond doomed

The good news is that we've been doomed for a long time, yet we persist. If you take a look at how the internet is basically held up by duct-tape at this point, I think you'd feel slightly more comfortable with how crap absolutely everything is. Like 1% of software is actually Good Software while the rest barely works on a good day.

bgwalter9mo ago

If "AI" worked (which fortunately isn't the case), humans would be degraded to passive consumers in the last domain in which they were active creators: thinking.

Moreover, you would have to pay centralized corporations that stole all of humanity's intellectual output for engaging in your profession. That is terrifying.

The current reality is also terrifying: Mediocre developers are enabled to have a 10x volume (not quality). Mediocre execs like that and force everyone to use the "AI" snakeoil. The profession becomes even more bureaucratic, tool oriented and soulless.

People without a soul may not mind.

diggan9mo ago

> If "AI" worked (which fortunately isn't the case), humans would be degraded to passive consumers in the last domain in which they were active creators: thinking.

"AI" (depending on what you understand that to be) is already "working" for many, including myself. I've basically stopped using Google because of it.

> humans would be degraded to passive consumers in the last domain in which they were active creators: thinking

Why? I still think (I think at least), why would I stop thinking just because I have yet another tool in my toolbox?

> you would have to pay centralized corporations that stole all of humanity's intellectual output for engaging in your profession

Assuming we'll forever be stuck in the "mainframe" phase, then yeah. I agree that local models aren't really close to SOTA yet, but the ones you can run locally can already be useful in a couple of focused use cases, and judging by the speed of improvements, we won't always be stuck in this mainframe-phase.

> Mediocre developers are enabled to have a 10x volume (not quality).

In my experience, which admittedly been mostly in startups and smaller companies, this has always been the case. Most developers seem to like to produce MORE code over BETTER code, I'm not sure why that is, but I don't think LLMs will change people's mind about this, in either direction. Shitty developers will be shit, with or without LLMs.

3dsnano9mo ago

> And what is "real quality" and does that mean "fake quality" exists?

I think there is no real quality or fake quality, just quality. I am referencing the quality that Persig and C. Alexander have written about.

It’s… qualitative, so it’s hard to measure but easy to feel. Humans are really good at perceiving it then making objective decisions. LLMs don’t know what it is (they’ve heard about it and think they know).

diggan9mo ago

> LLMs don’t know what it is

Of course they don't, they're probability/prediction machines, they don't "know" anything, not even that Paris is the capital of France. What they do "know" is that once someone writes "The capital of France is", the most likely tokens to come after that, is "Paris". But they don't understand the concept, nor anything else, just that probably 54123 comes after 6723 (or whatever the tokens are).

Once you understand this, I think it's easy to reason about why they don't understand code quality, why they couldn't ever understand it, and how you can make them output quality code regardless.

abdullin9mo ago

It is actually funny that current AI+Coding tools benefit a lot from domain context and other information along the lines of Domain-Driven Design (which was inspired by the pattern language of C. Alexander).

A few teams have started incorporating `CONTEXT.MD` into module descriptions to leverage this.

koakuma-chan9mo ago

> That sounds awful. A truly terrible and demotivating way to work and produce anything of real quality

This is the right way to work with generative AI, and it already is an extremely common and established practice when working with image generation.

xphos9mo ago

"If the only tool you have is a hammer, you tend to see every problem as a nail."

I think the worlds leaning dangerously into LLMs expecting them to solve every problem under the sun. Sure AI can solve problems but I think that domain 1 they Karpathy shows if it is the body of new knowledge in the world doesn't grow with LLMs and agents maybe generation and selection is the best method for working with domain 2/3 but there is something fundamentally lost in the rapid embrace of these AI tools.

A true challenge question for people is would you give up 10 points of IQ for access to the next gen AI model? I don't ask this in the sense that AI makes people stupid but rather that it frames the value of intelligence is that you have it. Rather than, in how you can look up or generate an answer that may or may not be correct quickly. How we use our tools deeply shapes what we will do in the future. A cautionary tale is US manufacturing of precision tools where we give up on teaching people how to use Lathes, because they could simply run CNC machines instead. Now that industry has an extreme lack of programmers for CNC machines, making it impossible to keep up with other precision instrument producing countries. This of course is a normative statement and has more complex variables but I fear in this dead set charge for AI we will lose sight of what makes programming languages and programming in general valuable

deadbabe9mo ago

It is not. The right way to work with generative AI is to get the right answer in the first shot. But it's the AI that is not living up to this promise.

Reviewing 4 different versions of AI code is grossly unproductive. A human co-worker can submit one version of code and usually have it accepted with a single review, no other "versions" to verify. 4 versions means you're reading 75% more code than is necessary. Multiply this across every change ever made to a code base, and you're wasting a shitload of time.

RHSeeger9mo ago

That's not really comparing apples to apples though.

> A human co-worker can submit one version of code and usually have it accepted with a single review, no other "versions" to verify.

But that human co-worker spent a lot of time generating what is being reviewed. You're trading "time saved coding" for "more time reviewing". You can't complain about the added time reviewing and then ignore all the time saved coding. THat's not to say it's necessarily a win, but it _is_ a tradeoff.

Plus that co-worker may very well have spent some time discussing various approaches to the problem (with you), with is somewhat parallel to the idea of reviewing 4 different PRs.

koakuma-chan9mo ago

> Reviewing 4 different versions of AI code is grossly unproductive.

You can have another AI do that for you. I review manually for now though (summaries, not the code, as I said in another message).

notTooFarGone9mo ago

I can recognize images in one look.

How about that 400 Line change that touches 7 files?

abdullin9mo ago

Exactly!

This is why there has to be "write me a detailed implementation plan" step in between. Which files is it going to change, how, what are the gotchas, which tests will be affected or added etc.

It is easier to review one document and point out missing bits, than chase the loose ends.

Once the plan is done and good, it is usually a smooth path to the PR.

mistersquid9mo ago

> I can recognize images in one look.

> How about that 400 Line change that touches 7 files?

Karpathy discusses this discrepancy. In his estimation LLMs currently do not have a UI comparable to 1970s CLI. Today, LLMs output text and text does not leverage the human brain’s ability to ingest visually coded information, literally, at a glance.

Karpathy surmises UIs for LLMs are coming and I suspect he’s correct.

koakuma-chan9mo ago

In my prompt I ask the LLM to write a short summary of how it solved the problem, run multiple instances of LLM concurrently, compare their summaries, and use the output of whichever LLM seems to have interpreted instructions the best, or arrived at the best solution.

j / k navigate · click thread line to collapse