- It's gotten way better in the last six months, both the models (Sonnet 3.5, and the new October Sonnet 3.5 refresh) and the tooling (Cursor). If you last tried Copilot, you should probably give it another look. It's also going to keep getting better. [1]
- It can make errors, so expect to do some code review and guiding. But error rates are going way, way down [1]; I'd say they're already below human rates for a lot of tasks. I'm often doing 2-3 iterations before applying a diff, but a quick comment like "close, keep the test cases, but use the test fixture at the top of the file to reduce repeated code" takes about five seconds and gets me a full refactor (see the fixture sketch after this list). Compared to code-review turnaround with a team, it's magic.
- You need to learn how to use it: writing the right prompts, adding the right files to the context, etc. I'd say it's already worth the time to learn.
- It just knows the docs, and that's pretty invaluable. I know ten-ish languages, which also means I don't remember the standard-library call to get an env var in any of them. It does, and it can insert it a lot faster than I can google it. Again, you'll need to code review, but more and more it's nailing idiomatic error checking in each language (env-var sketch below).
- You stop needing third-party libraries for boilerplate tasks. zero_pad is the extreme/joke example (sketch below), but a lot more of my code now just uses the standard library.
- It can do things other tools can't. Tell it to take the visual style of one blog post and port it to another. Tell it to use a test file I wrote as a style reference and update 12 other files to follow that style. Have it read the README and tests, then write Python docstrings for a library. Ask it to write a GitHub Action to build docs and deploy them to GitHub Pages (including suggesting libraries and deploy actions, and offering alternatives). Again: you don't blindly trust anything, you code review, and tests are critical.
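To make the fixture comment concrete, here's a minimal sketch of the kind of refactor I mean, using pytest. `Cart` and the test names are made up for illustration:

```python
import pytest

class Cart:
    """Toy class so the example runs; stands in for whatever you're testing."""
    def __init__(self):
        self.items = {}
    def add(self, name, qty):
        self.items[name] = self.items.get(name, 0) + qty
    def count(self, name):
        return self.items.get(name, 0)

# The fixture at the top of the file: shared setup lives here
# instead of being repeated in every test body.
@pytest.fixture
def cart():
    c = Cart()
    c.add("apple", 2)
    return c

def test_count(cart):
    assert cart.count("apple") == 2

def test_add_more(cart):
    cart.add("apple", 1)
    assert cart.count("apple") == 3
```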
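For the env-var point, here's the Python version of that lookup with the missing-variable case handled explicitly. `DATABASE_URL` and `LOG_LEVEL` are hypothetical variable names:

```python
import os

# os.environ.get returns None (or a supplied default) when the variable
# is unset, so the missing case is handled explicitly rather than by
# catching KeyError. DATABASE_URL is a hypothetical name.
db_url = os.environ.get("DATABASE_URL")
if db_url is None:
    raise SystemExit("DATABASE_URL is not set")

# Or with a fallback default (LOG_LEVEL is also hypothetical):
log_level = os.environ.get("LOG_LEVEL", "INFO")
```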
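And the zero_pad case: in Python the padding is already built in, so there's no dependency to pull:

```python
# Zero-padding without any third-party library:
# str.zfill or a format spec both do it.
assert "42".zfill(5) == "00042"
assert f"{42:05d}" == "00042"
```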
[1] https://www.anthropic.com/news/3-5-models-and-computer-use