A lot of Python in a monorepo. Monorepos have an advantage right now because the LLM can pretty much look through the entire repo. But I'm also applying LLMs to eliminate a lot of roles that are obsolete, not just using them to code.
Thanks for sharing your perspective with ACTUAL details, unlike most people who have gotten bad results.
Sadly, hardware programming is probably going to lag behind or never be figured out, because there's just not enough data to train on. That might change when/if reasoning models get better, but there's no guarantee of that.
> which is now based on o4
"Based on o4" or "is o4"? Those are two different things. Augment says this: https://support.augmentcode.com/articles/5949245054-what-mod...
> Augment uses many models, including ones that we train ourselves. Each interaction you have with Augment will touch multiple models. Our perspective is that the choice of models is an implementation detail, and the user does not need to stay current with the latest developments in the world of AI models to fully take advantage of our platform.
Which IMO is... a cop-out, a terrible take, and just... slimy. I would not trust a company like this with my money. For all you know they're running your prompts against a shitty open-source model on a 3090 in their closet. The lack of transparency here is concerning.
You might be getting bad results for a few reasons:
- your prompts are not specific enough
- your context is poisoned. How strategically are you providing context in the prompt? A good trick is to give the LLM an existing file as an example of how you want the output to look and tell it "Do X in the style of Y.file". Don't forget that with the latest models and their huge context windows you could provide entire subdirectories as context (although I'd still recommend being fairly targeted)
- the model/tool you're using sucks
- you work in a problem domain that LLMs are genuinely bad at
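The "Do X in the style of Y.file" trick above can be sketched as a small prompt builder. This is a minimal illustration, not any particular tool's API; the function name, the file names, and the task string are all made up:

```python
from pathlib import Path


def build_prompt(task: str, example_file: str, context_files: list[str]) -> str:
    """Assemble a prompt that pins the model to an existing file's style.

    example_file is the "Y.file" whose conventions the model should copy;
    context_files are the extra, deliberately targeted files you include.
    """
    parts = [f"{task}, in the style of {Path(example_file).name}.", ""]
    for path in [example_file, *context_files]:
        # Label each file so the model can tell the sources apart.
        parts.append(f"--- {path} ---")
        parts.append(Path(path).read_text())
        parts.append("")
    return "\n".join(parts)
```

The point is that the example file does double duty: it carries your naming, error-handling, and layout conventions implicitly, so the instruction itself can stay short.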
Note: your company is paying for a subscription to a service that doesn't allow you to bring your own keys. They have an incentive to optimize and make sure you're not costing them a lot of money, which could lead to worse results.
See here for the Cline team's perspective on this topic: https://www.reddit.com/r/ChatGPTCoding/comments/1kymhkt/clin...
I suggest this as the bare minimum for the HN community when discussing bad results with LLMs and coding:
- What is your problem domain?
- Show us your favorite prompt.
- What model and tools are you using?
- Are you using it as a chat or an agent?
- Are you bringing your own keys or using a service?
- What did you supply in context when you got the bad result?
- How did you supply context? Copy-paste? File locations? Attachments?
- What prompt did you use when you got the bad result?
I'm genuinely surprised when someone complaining about LLM results provides even two of those things in their comment.
Most of the cynics would not provide even half of this because it'd be embarrassing and reveal that they have no idea what they are talking about.