- Large C codebase (new feature and bugfix)
- Small Rust codebase (new feature)
- Brand-new greenfield frontend for an in-spec, documented OpenAPI API
- Small fixes to an existing frontend
It failed _dramatically_ in all cases. Maybe I'm using this thing wrong, but it is a Devin-level fail. Gets diffs wrong. Passes phantom arguments to tools. Screws up basic features. Pulls in hundreds of lines of changes to unrelated files to refactor. Refactors again and again, over itself, partially, so that the unfinished boneyard of an old refactor sits in the codebase like a skeleton (and those tokens are also sent up to the model).
It genuinely makes an insane, horrible spaghetti MESS of the codebase. Any codebase. I expected it to be good at Svelte and SolidJS, since those are popular JavaScript frameworks with lots of training data. Nope, it's bad. This was a few days ago, with Claude 4. Seriously, seriously, people, what am I missing here with this agents thing? They are such gluttonous eaters of tokens that I'm beginning to think these agent posts are paid advertising.
An interesting thing about many of these types of posts is they never actually detail the tools they use and how they use them to achieve their results. It shouldn’t even be that hard for them to do, they could just have their agent do it for them.
You may be right. The author of this one even says that if you spend time prettying up your code, you should stop yak shaving. They apparently don't care about code quality.
brought to you by fly.io, where the corporate blog literally tells you to shove your concerns up your ass:
> Cut me a little slack as I ask you to shove this concern up your ass.
A prompt like "Write a $x program that does $y" is generally going to produce some pretty poor code. You generally want to include a lot of details and desires in your prompt, and include something like "Ask clarifying questions until you can provide a good solution."
A lot of the people who complain about poor code generation use poor prompting.
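As a sketch of the difference, the task and exact wording below are illustrative, not from the original comment:

```
Poor:   "Write a Python program that deduplicates a CSV file."

Better: "Write a Python 3 script that deduplicates rows in a CSV file.
         - Keep the first occurrence of each duplicate row.
         - Preserve the header row and original column order.
         - Read the input path from argv[1], write to argv[2].
         - Use only the standard library.
         Ask clarifying questions until you can provide a good solution."
```

The second prompt pins down the details the model would otherwise guess at, and the closing instruction surfaces the remaining ambiguity before any code is written.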
Simon Willison has some great examples in his blog and on his GitHub. Check out Karpathy’s YouTube videos as well.
I've been developing my prompting skills for nearly three years now and I still constantly find new and better ways to prompt.
I also consider knowing what "use a reasoning model" means to be part of that skill!
As with any other project, it's better to specify your wants and needs than to let someone, or an LLM, guess.
So I'd say Claude 4 agents today are at the autonomy level of a smart but fresh intern. You still have to do the high-level planning and task breakdown, but they can execute on tasks (say, requiring 10-200 lines of code, excluding tests). Asking them to write much more code (200+ lines) often requires a lot of follow-ups and ends in disappointment.
A significant portion of my prompts involve writing to and reading from .md files, which plan and document the progress.
When I start a new feature, it begins with: We need to add a new feature X that does ABC, create a .md in /docs to plan this feature. Ask me questions to help scope the feature.
I then manually edit the feature-x.md file, and only then tell the tool to implement it.
Also, after any major change, I say: Add this to docs/current_app_understanding.md.
Every single chat starts with: Read docs/current_app_understanding.md to get up to speed.
The really cool side benefit here is that I end up with solid docs, which I admittedly would have never created in the past.
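Laid out end to end, the workflow above looks something like this (the file names are the commenter's own; the exact prompt wording is illustrative):

```
1. "We need to add a new feature X that does ABC. Create a .md in /docs
    to plan this feature. Ask me questions to help scope the feature."
2. Manually edit docs/feature-x.md until the plan is right.
3. "Implement docs/feature-x.md."
4. After any major change: "Add this to docs/current_app_understanding.md."
5. Start every new chat with: "Read docs/current_app_understanding.md
    to get up to speed."
```

The key design choice is that the human edits the plan file, not the code, so the agent always works from a reviewed spec and every session starts from the same shared context.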
You don't exactly need to know prompting, you just need to know how to ask the AI to help you prompt it.
Writing code is one thing that models can do when wired up properly, and you can get a powerful productivity boost, but wielding the tools well is a skill of its own, and results will vary by task, with each model having unique strengths. The most important skill is understanding the limitations.
Based on your task descriptions and the implied expectations, I'm unsurprised that you are frustrated with the results. For good results with anything requiring architecture decisions, have a discussion with the model about the architecture design before diving in. Come up with a step-by-step plan and work through it together. Models are not like people; they know everything and nothing.
We’ve built tools to help us with the first part, frameworks with the second, architecture principles with the third, and software engineering techniques with the fourth. Where do LLMs help?
With my async agent I don't care how easy it is for me; it's easier to tell the agent to do the workflow and come back to it later when I'm ready to review. If it's a good change I approve the PR; if not, I close it.
I'm 100% certain most, if not all, of them are. There is simply too much money flying around, and I've seen what marketing has done in the past for way less hyped products. Though in this specific case I think the writer may simply be shilling AI to create demand for their service: pay us monthly to one-click deploy your broken, incomplete AI slop. The app doesn't work? No problem, just keep prompting harder and paying us more to host/build/test/deploy it...
I've also tried the agent thing, and still am, with only moderate success. Cursor, Claude Squad, Goose, Dagger AI agents. In other words, all the new hotness, all with various features claiming to solve the fact that agents don't work. Guess what? They still don't.
But hey, this is HN; most of the posters are tech-fearing Luddites, right? All the contention on here must mean our grindset is wrong and we are not prompting hard enough.
There is even one shill, Ghuntly, who claims you need to be "redlining" AI at a cost of $500-$1000 per day to get the full benefits. LOL, if that is not a veiled advertisement I don't know what is.
However, a counterargument to all this:
Does it matter if the code is messy?
None of this matters to the users and people who only know how to vibe code.
It matters proportionally to the amount of time I intend to maintain it for, and the amount of maintenance expected.