LLM code generation may lead to an erosion of trust (opens in new tab)

(jaysthoughts.com)

248 pointsCoffeeOnWrite9mo ago275 comments

275 comments

(Works on older browsers and doesn't require JavaScript except to get past CloudSnare).

I have a friend that always says "innovation happens at the speed of trust". Ever since GPT3, that quote comes to mind over and over.

Verification has a high cost and trust is the main way to lower that cost. I don't see how one can build trust in LLMs. While they are extremely articulate in both code and natural language, they will also happily go down fractal rabbit holes and show behavior I would consider malicious in a person.

acedTrex9mo ago

Author here: I quite like that quote. A very succinct way of saying what took me a few paragraphs.

This new world of having to verify every single thing at all points is quite exhausting and frankly pretty slow.

JackFr9mo ago

https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_Ref...

The classic on the subject.

tayo429mo ago

We do this is in professional environments already with documentation for designs upfront and code reviews though

EGreg9mo ago

“Freedom of speech” in politics

Herring9mo ago

So get another LLM to do it. Judging is considerably easier [For LLMs] than writing something from scratch, so LLM judges will always have that edge in accuracy. Equivalently, I also like getting them to write tons of tests to build trust in correct behavior.

3 more replies

lubujackson9mo ago

We never can have total trust in LLM output, but we can certainly sanitize it and limit it's destructive range. Just like we sanitize user input and defend with pentests and hide secrets in dot files, we will eventually resolve to "best practices" and some "SOC-AI compliance" standard down the road.

It's just too useful to ignore, and trust is always built, brick by brick. Let's not forget humans are far from reliable anyway. Just like with driving cars, I imagine producing less buggy code (along predefined roads) will soon outpace humans. Then it is just blocking and tackling to improve complexity.

bluefirebrand9mo ago

> We never can have total trust in LLM output, but we can certainly sanitize it and limit it's destructive range

Can we really do this reliably? LLMs are non-deterministic, right, so how do we validate the output in a deterministic way?

We can validate things like shape of data being returned, but how do we validate correctness without an independent human in the loop to verify?

3 more replies

ngold9mo ago

You perfectly said nothing. Well done.

whiplash4519mo ago

> "innovation happens at the speed of trust"

You'll have to elaborate on that. How much trust was there in electricity, flight and radioactivity when we discovered them?

In science, you build trust as you go.

agent2819mo ago

Have you heard of the War of the Currents?

> As the use of AC spread rapidly with other companies deploying their own systems, the Edison Electric Light Company claimed in early 1888 that high voltages used in an alternating current system were hazardous, and that the design was inferior to, and infringed on the patents behind, their direct current system.

> In the spring of 1888, a media furor arose over electrical fatalities caused by pole-mounted high-voltage AC lines, attributed to the greed and callousness of the arc lighting companies that operated them.

https://en.wikipedia.org/wiki/War_of_the_currents

1 more reply

reaperducer9mo ago

How much trust was there in electricity, flight and radioactivity when we discovered them?

Not much.

Plenty of people were against electricity when it started becoming common. They were terrified of lamps, doorbells, telephones, or anything else with an electric wire. If they were compelled to use these things (like for their job) they would often wear heavy gloves to protect themselves. It is very occasionally mentioned in novels from the late 1800's.

(Edit: If you'd like to see this played out visually, watch the early episodes of Miss Fisher's Murder Mysteries on ABC [.oz])

There are still people afraid of electricity today. There is no shortage of information on the (ironically enough) internet about how to shield your home from the harmful effects of electrical wires, both in the house and utility lines.

Flight? I dunno about back then, but today there's plenty of people who are afraid to fly. If you live in Las Vegas for a while, you start to notice private train cars occasionally parked on the siding near the north outlet mall. These belong to celebrities who are afraid to fly, but have to go to Vegas for work.

Radioactivity? There was a plethora of radioactive hysteria in books, magazines, comics, television, movies, and radio. It's not hard to find.

1 more reply

dirkc9mo ago

I use it to mean that the more people trust each other, the quicker things get done. Maybe the statement can be rephrased as "progress happens at the speed of trust" to avoid the specific scientific connotation.

4 more replies

blurbleblurble9mo ago

I bumped into this at work but not in the way you might expect. My colleague and I were under some pressure to show progress and decided to rush merging a pretty significant refactor I'd been working on. It was a draft PR but we merged it for momentum's sake. The next week some bugs popped up in an untested area of the code.

As we were debugging, my colleague revealed his assumption that I'd used AI to write it, and expressed frustration at trying to understand something AI generated after the fact.

But I hadn't used AI for this. Sure, yes I do use AI to write code. But this code I'd written by hand and with careful deliberate thought to the overall design. The bugs didn't stem from some fundamental flaw in the refactor, they were little oversights in adjusting existing code to a modified API.

This actually ended up being a trust building experience over all because my colleague and I got to talk about the tension explicitly. It ended up being a pretty gentle encounter with the power of what's happening right now. In hindsight I'm glad it worked out this way, I could imagine in a different work environment, something like this could have been more messy.

Be careful out there.

kldg9mo ago

It can be a pretty a serious and offensive accusation for sure. When a dev voices their own characters in a game and has a flat affect and/or stilted speech pattern, it's inevitable to be called AI by someone. Art I don't understand or appreciate? Likely AI. Unimpressed by a Eurovision entry? Call it AI. Some people toss this around casually, but I wouldn't.

I made myself known to be a big fool ~4 years ago. A local newspaper published an article on a particular person with outrageous claims primarily using photographs as proof. I challenged the editor directly via email, laying out my reasoning for why I was sure the images were manipulated. My arguments relied on misunderstandings on my part and the person claims were levied against showing zero deviation in position and stance while posing with multiple people during a meet-and-greet. The editor was offended and trolled me in response. I didn't let up, and he realized I was an idiot, not an agitator, and shared the full unpublished video from where the photos were taken with me, at which point I apologized deeply and made a donation. My ego was appropriately small for the following year.

Before emailing him, I shared the photos with some level-headed friends for their opinion, specifically because I didn't want to make a false accusation. They came to the same conclusion that the images were most likely manipulated, so I was very confident going in.

Now I trust this paper and people involved implicitly, but this was a lot of work to convince just one person.

stavros9mo ago

I don't understand the premise. If I trust someone to write good code, I learned to trust them because their code works well, not because I have a theory of mind for them that "produces good code" a priori.

If someone uses an LLM and produces bug-free code, I'll trust them. If someone uses an LLM and produces buggy code, I won't trust them. How is this different from when they were only using their brain to produce the code?

acedTrex9mo ago

Author here:

Essentially the premise is that in medium trust environments like very large teams or low trust environments like an open source project.

LLMs make it very difficult to make an immediate snap judgement about the quality of the dev that submitted the patch based solely on the code itself.

In the absence of being able to ascertain the type of person you are dealing with you have to fall back too "no trust" and review everything with a very fine tooth comb. Essentially there are no longer any safe "review shortcuts" and that can be painful in places that relied on those markers to grease the wheels so to speak.

Obviously if you are in an existing competent high trust team then this problem does not apply and most likely seems completely foreign as a concept.

lxgr9mo ago

> LLMs make it very difficult to make an immediate snap judgement about the quality [...]

That's the core of the issue. It's time to say goodbye to heuristics like "the blog post is written in eloquent, grammatical English, hence the point its author is trying to make must be true" or "the code is idiomatic and following all code styles, hence it must be modeling the world with high fidelity".

Maybe that's not the worst thing in the world. I feel like it often made people complacent.

5 more replies

eddd-ddde9mo ago

> In the absence of being able to ascertain the type of person you are dealing with you have to fall back too "no trust" and review everything with a very fine tooth comb.

Is that not how you review all code? I don't care who wrote the code, just because certain person wrote the code doesn't give them an instant pass to skip my review process.

sim7c009mo ago

its about the quality of the code, not the quality of the dev. you might think it's related, but it's not.

a dev can write piece of good, and piece of bad code. so per code, review the code. not the dev!

2 more replies

alganet9mo ago

> I learned to trust them because their code works well

There's so much more than "works well". There are many cues that exist close to code, but are not code:

I trust more if the contributor explains their change well.

I trust more if the contributor did great things in the past.

I trust more if the contributor manages granularity well (reasonable commits, not huge changes).

I trust more if the contributor picks the right problems to work on (fixing bugs before adding new features, etc).

I trust more if the contributor proves being able to maintain existing code, not just add on top of it.

I trust more if the contributor makes regular contributions.

And so on...

acedTrex9mo ago

Author here:

Spot on, there are so many little things that we as humans use as subtle verification steps to decide how much scrutiny various things require. LLMs are not necessarily the death of that concept but they do make it far far harder.

moffkalast9mo ago

It's easy to get overconfident and not test the LLM's code enough when it worked fine for a handful of times in a row, and then you miss something.

The problem is often really one of miscommunication, the task may be clear to the person working on it, but with frequent context resets it's hard to make sure the LLM also knows what the whole picture is and they tend to make dumb assumptions when there's ambiguity.

The thing that 4o does with deep research where it asks for additional info before it does anything should be standard for any code generation too tbh, it would prevent a mountain of issues.

stavros9mo ago

Sure, but you're still responsible for the quality of the code you commit, LLM or no.

2 more replies

insane_dreamer9mo ago

> If someone uses an LLM and produces bug-free code, I'll trust them.

Only because you already trust them to know that the code is indeed bug-free. Some cases are simple and straightforward -- this routine returns a desired value or it doesn't. Other situations are much more complex in anticipating the ways in which it might interact with other parts of the system, edge cases that are not obvious, etc. Writing code that is "bug free" in that situation requires the writer of the code to understand the implications of the code, and if the dev doesn't understand exactly what the code does because it was written by an LLM, then they won't be able to understand the implications of the code. It then falls to the reviewer to understand the implications of the code -- increasing their workload. That was the premise.

somewhereoutth9mo ago

Because when people use LLMs, they are getting the tool to do the work for them, not using the tool to do the work. LLMs are not calculators, nor are they the internet.

A good rule of thumb is to simply reject any work that has had involvement of an LLM, and ignore any communication written by an LLM (even for EFL speakers, I'd much rather have your "bad" English than whatever ChatGPT says for you).

I suspect that as the serious problems with LLMs become ever more apparent, this will become standard policy across the board. Certainly I hope so.

stavros9mo ago

Well, no, a good rule of thumb is to expect people to write good code, no matter how they do it. Why would you mandate what tool they can use to do it?

1 more reply

tranchebald9mo ago

I’m not seeing a lot of discussion about verification or a stronger quality control process anywhere in the comments here. Is that some kind of unsolvable problem for software? I think if the standard of practice is to use author reputation as a substitute for a robust quality control process, then I wouldn’t be confident that the current practice is much better than AI code-babel.

badsectoracula9mo ago

> Because when people use LLMs, they are getting the tool to do the work for them, not using the tool to do the work.

You can say that for pretty much any sort of automation or anything that makes things easier for humans. I'm pretty sure people were saying that about doing math by hand around when calculators became mainstream too.

breuleux9mo ago

I think the main issue is people using LLMs to do things that they don't know how to do themselves. There's actually a similar problem with calculators, it's just a much smaller one: if you never learn how to add or multiply numbers by hand and use calculators for everything all the time, you may sometimes make absurd mistakes like tapping 44 * 3 instead of 44 * 37 and not bat an eye when your calculator tells you the result is a whole order of magnitude less than what you should have expected. Because you don't really understand how it works. You haven't developed the intuition.

There's nothing wrong with using LLMs to save time doing trivial stuff you know how to do yourself and can check very easily. The problem is that (very lazy) people are using them to do stuff they are themselves not competent at. They can't check, they won't learn, and the LLM is essentially their skill ceiling. This is very bad: what plus-value are you supposed to bring over something you don't understand? AGI won't have to improve from the current baseline to surpass humans if we're just going to drag ourselves down to its level.

mexicocitinluez9mo ago

>Because when people use LLMs, they are getting the tool to do the work for them, not using the tool to do the work.

What? How on god's green earth could you even pretend to know how all people are using these tools?

> LLMs are not calculators, nor are they the internet.

Umm, okay? How does that make them less useful?

I'm going to give you a concrete example of something I just did and let you try and do whatever mental gymnastics you have to do to tell me it wasn't useful:

Medicare requires all new patients receiving home health treatment go through a 100+ question long form. This form changes yearly, and it's my job to implement the form into our existing EMR. Well, part of that is creating a printable version. Guess what I did? I uploaded the entire pdf to Claude and asked it to create a print-friendly template using Cottle as the templating language in C#. It generated the 30 page print preview in a minute. And it took me about 10 more minutes to clean up.

> I suspect that as the serious problems with LLMs become ever more apparent, this will become standard policy across the board. Certainly I hope so.

The irony is that they're getting better by the day. That's not to say people don't use them for the wrong applications, but the idea that this tech is going to be banned is absurd.

> A good rule of thumb is to simply reject any work that has had involvement of an LLM

Do you have any idea how ridiculous this sounds to people who actually use the tools? Are you going to be able to hunt down the single React component in which I asked it to convert the MUI styles to tailwind? How could you possibly know? You can't.

sebmellen9mo ago

You’re being unfairly downvoted. There is a plague of well-groomed incoherency in half of the business emails I receive today. You can often tell that the author, without wrestling with the text to figure out what they want to say, is a kind of stochastic parrot.

This is okay for platitudes, but for emails that really matter, having this messy watercolor kind of writing totally destroys the clarity of the text and confuses everyone.

To your point, I’ve asked everyone on my team to refrain from writing words (not code) with ChatGPT or other tools, because the LLM invariably leads to more complicated output than the author just badly, but authentically, trying to express themselves in the text.

2 more replies

flir9mo ago

> A good rule of thumb is to simply reject any work that has had involvement of an LLM,

How are you going to know?

2 more replies

jnxx9mo ago

We already treat spam and unsolicited commercial email that way.

taneq9mo ago

If you have a long standing, effective heuristic that “people with excellent, professional writing are more accurate and reliable than people with sloppy spelling and punctuation” then the appearance of a semi-infinite group of ‘people’ writing well presented, convincingly worded articles which nonetheless are riddled with misinformation, hidden logical flaws, and inconsistencies, you’re gonna end up trusting everyone a lot less.

It’s like if someone started bricking up tunnel entrances and painting ultra realistic versions of the classic Road Runner tunnel painting on them, all over the place. You’d have to stop and poke every underpass with a stick just to be sure.

stavros9mo ago

Sure, your heuristic no longer works, and that's a bit inconvenient. We'll just find new ones.

3 more replies

legacynl9mo ago

> How is this different from when they were only using their brain to produce the code?

If A = B, higher B means that A is higher too.

Now it's A + Ai = B. Now Higher B doesn't necessarily mean higher A.

Especially since the current state of Ai is pretty much stochastic, and sometimes is worse than nothing at all

JambalayaJimbo9mo ago

I have never been in a work environment in which I’ve been able to do more than rubber stamp PRs. Performing a deep review of each change is simply impossible with the expectations we were given.

stavros9mo ago

Interesting, I've never been in one where we didn't read PRs.

mexicocitinluez9mo ago

It's not.

What you're seeing now is people who once thought and proclaimed these tools as useless now have to start to walk back their claims with stuff like this.

It does amaze me that the people who don't use these tools seem to have the most to say about them.

acedTrex9mo ago

Author here:

For what it's worth I do actually use the tools albeit incredibly intentionally and sparingly.

I see quite a few workflows and tasks that they can be a value add on, mostly outside of the hotpath of actual code generation but still quite enticing. So much so in fact I'm working on my own local agentic tool with some self hosted ollama models. I like to think that i am at least somewhat in the know on the capabilities and failure points of the latest LLM tooling.

That however doesn't change my thoughts on trying to ascertain if code submitted to me deserves a full indepth review or if I can maybe cut a few corners here and there.

1 more reply

globnomulous9mo ago

> It does amaze me that the people who don't use these tools seem to have the most to say about them.

You're kidding, right? Most people who don't use the tools and write about it are responding to the ongoing hype train -- a specific article, a specific claim, or an idea that seems to be gaining acceptance or to have gone unquestioned among LLM boosters.

I recently watched a talk by Andrei Karpathy. So much in it begged for a response. Google Glass was "all the rage" in 2013? Please. "Reading text is laborious and not fun. Looking at images is fun." You can't be serious.

Someone recently shared on HN a blog post explaining why the author doesn't use LLMs. The justification for the post? "People keep asking me."

1 more reply

satisfice9mo ago

LLMs make bad work— of any kind— look like plausibly good work. That’s why it is rational to automatically discount the products of anyone who has used AI.

I once had a member of my extended family who turned out to be a con artist. After she was caught, I cut off contact, saying I didn’t know her. She said “I am the same person you’ve known for ten years.” And I replied “I suppose so. And now I realized I have never known who that is, and that I never can know.”

We all assume the people in our lives are not actively trying to hurt us. When that trust breaks, it breaks hard.

No one who uses AI can claim “this is my work.” I don’t know that it is your work.

No one who uses AI can claim that it is good work, unless they thoroughly understand it, which they probably don’t.

A great many students of mine have claimed to have read and understand articles I have written, yet I discovered they didn’t. What if I were AI and they received my work and put their name on it as author? They’d be unable to explain, defend, or follow up on anything.

This kind of problem is not new to AI. But it has become ten times worse.

bobjordan9mo ago

I see where you're coming from, and I appreciate your perspective. The "con artist" analogy is plausible, for the fear of inauthenticity this technology creates. However, I’d like to offer a different view from someone who has been deep in the trenches of full-stack software development.

I’m someone who put in my "+10,000 hours" programming complex applications, before useful LLMs were released. I spent years diving into documentation and other people's source code every night, completely focused on full-stack mastery. Eventually, that commitment led to severe burnout. My health was bad, my marriage was suffering. I released my application and then I immediately had to walk away from it for three years just to recover. I was convinced I’d never pick it up again.

It was hearing many reports that LLMs had gotten good at code that cautiously brought me back to my computer. That’s where my experience diverges so strongly from your concerns. You say, “No one who uses AI can claim ‘this is my work.’” I have to disagree. When I use an LLM, I am the architect and the final inspector. I direct the vision, design the system, and use a diff tool to review every single line of code it produces. Just recently, I used it as a partner to build a complex optimization model for my business's quote engine. Using a true optimization model was always the "right" way to do it but would have taken me months of grueling work before, learning all details of the library, reading other people’s code, etc. We got it done in a week. Do I feel like it’s my work? Absolutely. I just had a tireless and brilliant, if sometimes flawed, assistant.

You also claim the user won't "thoroughly understand it." I’ve found the opposite. To use an LLM effectively for anything non-trivial, you need a deeper understanding of the fundamentals to guide it and to catch its frequent, subtle mistakes. Without my years of experience, I would be unable to steer it for complex multi-module development, debug its output, or know that the "plausibly good work" it produced was actually wrong in some ways (like N+1 problems).

I can sympathize with your experience as a teacher. The problem of students using these tools to fake comprehension is real and difficult. In academia, the process of learning, getting some real fraction of the +10,000hrs is the goal. But in the professional world, the result is the goal, and this is a new, powerful tool to achieve better results. I’m not sure how a teacher should instruct students in this new reality, but demonizing LLM use is probably not the best approach.

For me, it didn't make bad work look good. It made great work possible again, all while allowing me to have my life back. It brought the joy back to my software development craft without killing me or my family to do it. My life is a lot more balanced now and for that, I’m thankful.

satisfice9mo ago

Here's the problem, friend: I also have put in my 10,000 hours. I've been coding as part of my job since 1983. I switched to testing from production coding in 1987, but I ran a team that tested developer tools, at Apple and Borland, for eight years. I've been living and breathing testing for decades as a consultant and expert witness.

I do not lightly say that I don't trust the work of someone who uses AI. I'm required to practice with LLMs as part of my job. I've developed things with the help of AI. Small things, because the amount of vigilance necessary to do big things is prohibitive.

Fools rush in, they say. I'm not a fool, and I'm not claiming that you are either. What I know is that there is a huge burden of proof on the shoulders of people who claim that AI is NOT problematic-- given the substantial evidence that it behaves recklessly. This burden is not satisfied by people who say "well, I'm experienced and I trust it."

1 more reply

axegon_9mo ago

That is already the case for me. The amount of times I've read "apologies for the oversight, you are absolutely correct" is staggering: 8 or 9 out of 10 times. Meanwhile I constantly see people mindlessly copy paying llm generated code and subsequently furious when it doesn't do what they expected it to do. Which, btw, is the better option: I'd rather have something obviously broken as opposed to something seemingly working.

autobodie9mo ago

In my experience, LLMs are extremely inclined to modify code just to pass tests instead of meeting requirements.

fwip9mo ago

When they're not modifying the tests to match buggy behavior. :P

devjab9mo ago

Are you using the LLM's through a browser chatbot? Because the AI-agents we use with direct code-access aren't very chatty. I'd also argue that they are more capable than a lot of junior programmers, at least around here. We're almost at a point where you can feed the agents short specific tasks, and they will perform them well enough to not really require anything outside of a code review.

That being said, the prediction engine still can't do any real engineering. If you don't specifically task them with using things like Python generators, you're very likely to have a piece of code that eats up a gazillion memory. Which unfortunately don't set them appart from a lot of Python programmers I know, but it is an example of how the LLM's are exactly as bad as you mention. On the positive side, it helps with people actually writing the specification tasks in more detail than just "add feature".

Where AI-agents are the most useful for us is with legacy code that nobody prioritise. We have a data extractor which was written in the previous millennium. It basically uses around two hunded hard-coded coordinates to extact data from a specific type of documents which arrive by fax. It's worked for 30ish years because the documents haven't changed... but it recently did, and it took co-pilot like 30 seconds to correct the coordinates. Something that would've likely taken a human a full day of excruciating boredom.

I have no idea how our industry expect anyone to become experts in the age of vibe coding though.

furyofantares9mo ago

> Because the AI-agents we use with direct code-access aren't very chatty.

Every time I tell claude code something it did is wrong, or might be wrong, or even just ask a leading question about a potential bug it just wrote, it leads with "You're absolutely correct!" before even invoking any tools.

Maybe you've just become used to ignoring this. I mostly ignore it but it is a bit annoying when I'm trying to use the agent to help me figure out if the code it wrote is correct, so I ask it some question it should be capable of helping with and it leads with "you're absolutely correct".

I didn't make a proposition that can be correct or not, and it didn't do any work yet to to investigate my question - it feels like it has poisoned its own context by leading with this.

2 more replies

gibspaulding9mo ago

> Where AI-agents are the most useful for us is with legacy code

I’d love to hear more about your workflow and the code base you’re working in. I have access to Amazon Q (which it looks like is using Claude Sonnet 4 behind the scenes) through work, and while I found it very useful for Greenfield projects, I’ve really struggled using it to work on our older code bases. These are all single file 20,000 to 100,000 line C modules with lots of global variables and most of the logic plus 25 years of changes dumped into a few long functions. It’s hard to navigate for a human, but seems to completely overwhelm Q’s context window.

Do other Agents handle this sort of scenario better, or are there tricks to making things more manageable? Obviously re-factoring to break everything up into smaller files and smaller functions would be great, but that’s just the sort of project that I want to be able to use the AI for.

1 more reply

teeray9mo ago

> Because the AI-agents we use with direct code-access aren't very chatty.

So they’re even more confident in their wrongness

1 more reply

axegon_9mo ago

Of course I don't have any extensions locally, I am not a lunatic. I don't always have access to my personal hardware and I would never trust an extension to pass my code around over http to a server I don't have full control of. Ddosecrets should have been enough of a warning for most people but I suspect countless more will have to learn that lesson the hard way.

mexicocitinluez9mo ago

> 8 or 9 out of 10 times.

Not they don't. This is 100% a made up statistic.

bluefirebrand9mo ago

It isn't even being presented as a statistic it is someone saying what they have experienced

2 more replies

HardCodedBias9mo ago

All of this fighting against LLMs is pissing in the wind.

It seems that LLMs, as they work today, make developers more productive. It is possible that they benefit less experienced developers even more than experienced developers.

More productivity, and perhaps very large multiples of productivity, will not be abandoned due roadblocks constructed by those who oppose the technology due to some reason.

Examples of the new productivity tool causing enormous harm (eg: bug that brings down some large service for a considerable amount of time) will not stop the technology if it being considerable productivity.

Working with the technology and mitigating it's weaknesses is the only rational path forward. And those mitigation can't be a set of rules that completely strip the new technology of it's productivity gains. The mitigations have to work with the technology to increase its adoption or they will be worked around.

mjr009mo ago

> It seems that LLMs, as they work today, make developers more productive.

Think this strongly depends on the developer and what they're attempting to accomplish.

In my experience, most people who swear LLMs make them 10x more productive are relatively junior front-end developers or serial startup devs who are constantly greenfielding new apps. These are totally valid use cases, to be clear, but it means a junior front-end dev and a senior embedded C dev tend to talk past each other when they're discussing AI productivity gains.

> Working with the technology and mitigating it's weaknesses is the only rational path forward.

Or just using it more sensibly. As an example: is the idea of an AI "agent" even a good one? The recent incident with Copilot[0] made MS and AI look like a laughingstock. It's possible that trying to let AI autonomously do work just isn't very smart.

As a recent analogy, we can look at blockchain and cryptocurrency. Love it or hate it, it's clear from the success of Coinbase and others that blockchain has found some real, if niche, use cases. But during peak crypto hype, you had people saying stuff like "we're going to track the coffee bean supply chain using blockchain". In 2025 that sounds like an exaggerated joke from Twitter, but in 2020 it was IBM legitimately trying to sell this stuff[1].

It's possible we'll look back and see AI agents, or other current applications of generative AI, as the coffee blockchain of this bubble.

[0] https://www.reddit.com/r/ExperiencedDevs/comments/1krttqo/my...

[1] https://www.forbes.com/sites/robertanzalone/2020/07/15/big-c...

parineum9mo ago

> In my experience, most people who swear LLMs make them 10x more productive are relatively junior front-end developers or serial startup devs who are constantly greenfielding new apps. These are totally valid use cases, to be clear, but it means a junior front-end dev and a senior embedded C dev tend to talk past each other when they're discussing AI productivity gains.

I agree with this quite a lot. I also think that those greenfield apps quickly become unmanageable by AI as you need to start applying solutions that are unique/tailored for your objective or you want to start abstracting some functionality into building components and base classes that the AI hasn't seen before.

I find AI very useful to get me to a from beginner to intermediate in codebases and domains that I'm not familiar with but, once I get the familiarity, the next steps I take mostly without AI because I want to do novel things it's never seen before.

conartist69mo ago

And here it is again. "More productive"

But this doesn't mean that the model/human combo is more effective at serving the needs of users! It means "producing more code."

There are no LLMs shipping changesets that delete 2000 lines of code -- that's how you know "making engineers more productive" is a way of talking about how much code is being created...

eikenberry9mo ago

My wife's company recently hired some contractors and they were touting their productivity with AI by saying how it allowed them (one person) to write 150k lines of code in 3 weeks. They said this without sarcasm. It was funny and scary at the same time that anyone might buy this as a good outcome. Classic lines-of-code metric rearing its ugly head again.

FuckButtons9mo ago

I think you’re arguing against something the author didn’t actually say.

You seem to be claiming that this is a binary, either we will or won’t use llms, but the author is mostly talking about risk mitigation.

By analogy it seems like you’re saying the author is fundamentally against the development of the motor car because they’ve pointed out that some have exploded whereas before, we had horses which didn’t explode, and maybe we should work on making them explode less before we fire up the glue factories.

scelerat9mo ago

I didn't see the post as pissing into the wind so much as calling out several caveats of coding with LLMs, especially on teams, and ideas on how to mitigate them.

ge969mo ago

It is funny (ego) I remember when React was new and I refused to learn it, had I learned it earlier I probably would have entered the market years earlier.

Even now I have this refusal to use GPT where as my coworkers lately have been saying "ChatGPT says" or this code was created by chatGPT idk, for me I take pride writing code myself/not using GPT but I also still use google/stackoverflow which you could say is a slower version of GPT.

anthonypasq9mo ago

this mindset does not work in software. My dad would still be programming with punchcards if he thought this way. instead he using copilot daily writing microservices and isnt some annoying dinosaur

1 more reply

pu_pe9mo ago

> While the industry leaping abstractions that came before focused on removing complexity, they did so with the fundamental assertion that the abstraction they created was correct. That is not to say they were perfect, or they never caused bugs or failures. But those events were a failure of the given implementation a departure from what the abstraction was SUPPOSED to do, every mistake, once patched led to a safer more robust system. LLMs by their very fundamental design are a probabilistic prediction engine, they merely approximate correctness for varying amounts of time.

I think what the author misses here is that imperfect, probabilistic agents can build reliable, deterministic systems. No one would trust a garbage collection tool based on how reliable the author was, but rather if it proves it can do what it intends to do after extensive testing.

I can certainly see an erosion of trust in the future, with the result being that test-driven development gains even more momentum. Don't trust, and verify.

lbalazscs9mo ago

It's naive to hope that automatic tests will find all problems. There are several types of problems that are hard to detect automatically: concurrency problems, resource management errors, security vulnerabilities, etc.

An even more important question: who tests the tests themselves? In traditional development, every piece of logic is implemented twice: once in the code and once in the tests. The tests checks the code, and in turn, the code implicitly checks the tests. It's quite common to find that a bug was actually in the tests, not the app code. You can't just blindly trust the tests, and wait until your agent finds a way to replicate a test bug in the code.

acedTrex9mo ago

> I think what the author misses here is that imperfect, probabilistic agents can build reliable, deterministic systems. No one would trust a garbage collection tool based on how reliable the author was, but rather if it proves it can do what it intends to do after extensive testing.

> but rather if it proves it can do what it intends to do after extensive testing.

Author here: Here I was less talking about the effectiveness of the output of a given tool and more so about the tool itself.

To take your garbage collection example, sure perhaps an agentic system at some point can spin some stuff up and beat it into submission with test harnesses, bug fixes etc.

But, imagine you used the model AS the garbage collector/tool, in that say every sweep you simply dumped the memory of the program into the model and told it to release the unneeded blocks. You would NEVER be able to trust that the model itself correctly identifies the correct memory blocks and no amount of "patching" or "fine tuning" would ever get you there.

With other historical abstractions like say jvm, if the deterministic output, in this case the assembly the jit emits is incorrect that bug is patched and the abstraction will never have that same fault again. not so with LLMs.

To me that distinction is very important when trying to point out previous developer tooling that changed the entire nature of the industry. It's not to say I do not think LLMs will have a profound impact on the way things work in the future. But I do think we are in completely uncharted territory with limited historical precedence to guide us.

bluefirebrand9mo ago

> I think what the author misses here is that imperfect, probabilistic agents can build reliable, deterministic systems

That is quite a statement! You're talking about systems that are essentially entropy-machines somehow creating order?

> with the result being that test-driven development gains even more momentum

Why is it that TDD is always put forward as the silver bullet that fixes all issues with building software

The number of times I've seen TDD build the wrong software after starting with the wrong tests is actually embarassing

cheriot9mo ago

> promises that the contributed code is not the product of an LLM but rather original and understood completely.

> require them to be majority hand written.

We should specify the outcome not the process. Expecting the contributor to understand the patch is a good idea.

> Juniors may be encouraged/required to elide LLM-assisted tooling for a period of time during their onboarding.

This is a terrible idea. Onboarding is a lot of random environment setup hitches that LLMs are often really good at. It's also getting up to speed on code and docs and I've got some great text search/summarizing tools to share.

bluefirebrand9mo ago

> Onboarding is a lot of random environment setup hitches

Learning how to navigate these hitches is a really important process

If we streamline every bit of difficulty or complexity out of our lives, it seems trivially obvious that we will soon have no idea what to do when we encounter difficulty or complexity. Is that just me thinking that?

cheriot9mo ago

Some people find a solution, think about it, and incorporate that into their understanding of the world.

Some people ask for a coworker to do it for them, c/p stackoverflw, etc and never learn.

AI makes the first group that much more effective. Setup should be a learning process not a hazing ritual.

RunningDroid9mo ago

> > Onboarding is a lot of random environment setup hitches > > Learning how to navigate these hitches is a really important process

To add to this, a barrier to contribution can reduce low quality/spam contributions. The downside is that a barrier to contribution that's too high reduces all contributions.

kmoser9mo ago

There will always be people who know how to handle the complexity we're trying to automate away. If I can't figure out some arcane tax law when filling out my taxes, I ask my accountant, as it's literally their job to know these things.

1 more reply

namenotrequired9mo ago

> LLMs … approximate correctness for varying amounts of time. Once that time runs out there is a sharp drop off in model accuracy, it simply cannot continue to offer you an output that even approximates something workable. I have taken to calling this phenomenon the "AI Cliff," as it is very sharp and very sudden

I’ve never heard of this cliff before. Has anyone else experienced this?

gwd9mo ago

I experience it pretty regularly -- once the complexity of the code passes a certain threshold, the LLM can't keep everything in its head and starts thrashing around. Part of my job working with the LLM is to manage the complexity it sees.

And one of the things with current generators is that they tend to make things more complex over time, rather than less. It's always me prompting the LLM to refactor things to make it simpler, or doing the refactoring once it's gotten to complex for the LLM to deal with.

So at least with the current generation of LLMs, it seems rather inevitable that if you just "give LLMs their head" and let them do what they want, eventually they'll create a giant Rube Goldberg mess that you'll have to try to clean up.

ETA: And to the point of the article -- if you're an old salt, you'll be able to recognize when the LLM is taking you out to sea early, and be able to navigate your way back into shallower waters even if you go out a bit too far. If you're a new hand, you'll be out of your depth and lost at sea before you know it's happened.

windward9mo ago

I've seen it referred to as 'context drunk'.

Imagine that you have your input to the context, 10000 tokens that are 99% correct. Each time the LLM replies it adds 1000 tokens that are 90% correct.

After some back-and-forth of you correcting the LLM, its context window is mostly its own backwash^Woutput. Worse, the error compounds because the 90% that is correct is just correct extrapolation of an argument about incorrect code, and because the LLM ranks more recent tokens as more important.

The same problem also shows up in prose.

Workaccount29mo ago

I call it context rot. As the context fills up the quality of output erodes with it. The rot gets even worse or progresses faster the more spurious or tangential discussion is in context.

This is also can be made much worse by thinking models, as their CoT is all in context, and if there thoughts really wander it just plants seeds of poison feeding the rot. I really wish they can implement some form of context pruning, so you can nip irrelevant context when it forms.

In the meantime, I make summaries and carry it to a fresh instance when I notice the rot forming.

bubblyworld9mo ago

I've only experienced this while vibe coding through chat interfaces, i.e. in the complete absence of feedback loops. This is much less of a problem with agentic tools like claude code/codex/gemini cli, where they manage their own context windows and can run your dev tooling to sanity check themselves as they go.

Paradigma119mo ago

If the context gets to big or otherwise poisoned you have to restart the chat/agent. A bit like windows of old. This trains you to document the current state of your work so the new agent can get up to speed.

Kuinox9mo ago

I'm doing my own procedurally generated benchmark.

I can make the problem input bigger as I want.

Each LLM have a different thresholf for each problem, when crossed the performance of the LLM collapse.

lubujackson9mo ago

I definitely hit this vibe coding a large-ish backend. Well defined data structures, good modularity, etc. But at a point, Cursor started to lose the plot and rewrite or duplicate functions, recreate or misue data structures, etc.

The solve was to define several Cursor rules files for different views of the codebase - here's the structure, here's the validation logic, etc. That and using o3 has at least gotten me to the next level.

npteljes9mo ago

I reset "work" AI sessions quite frequently, so I didn't see that there. I experienced it though with storytelling. In my storytelling scenario, context and length was important. And the AI at one late point forgot how my characters should behave in the developing situation, and just had them react to it in a very different way. And there was no going back from that. Very weird experience.

sandspar9mo ago

I'm not sure. Is he talking about context poisoning?

impure9mo ago

This sounds a lot like accuracy collapse as discussed in that Apple paper. That paper clearly showed that there is some point where AI accuracy collapses extremely quickly.

I suspect it has something more to do with the model producing too many tokens and becoming fixated on what it said before. You'll often see this in long conversations. The only way to fix it is to start a new conversation.

Syzygies9mo ago

One can find opinions that Claude Code Opus 4 is worth the monthly $200 I pay for Anthropic's Max plan. Opus 4 is smarter; one either can't afford to use it, or can't afford not to use it. I'm in the latter group.

One feature others have noted is that the Opus 4 context buffer rarely "wears out" in a work session. It can, and one needs to recognize this and start over. With other agents, it was my routine experience that I'd be lucky to get an hour before having to restart my agent. A reliable way to induce this "cliff" is to let AI take on a much too hard problem in one step, then flail helplessly trying to fix their mess. Vibe-coding an unsuitable problem. One can even kill Opus 4 this way, but that's no way to run a race horse.

Some "persistence of memory" harness is as important as one's testing harness, for effective AI coding. With the right care having AI edit its own context prompts for orienting new sessions, this all matters less. AI is spectacularly bad at breaking problems into small steps without our guidance, and small steps done right can be different sessions. I'll regularly start new sessions when I have a hunch that this will get me better focus for the next step. So the cliff isn't so important. But Opus 4 is smarter in other ways.

fwip9mo ago

Sometimes after it flails for a while, but I think it's on the right path, I'll rewind the context to just before it started trying to solve the problem (but keep the code changes). And I'll tell it "I got this other guy to attempt what we just talked about, but it still has some problems."

Snipping out the flailing in this way seems to help.

suddenlybananas9mo ago

>can't afford not to use it. I'm in the latter group.

People love to justify big expenses as necessary.

1 more reply

beau_g9mo ago

The article opens with a statement saying the author isn't going to reword what others are writing, but the article reads as that and only that.

That said, I do think it would be nice for people to note in pull requests which files have AI gen code in the diff. It's still a good idea to look at LLM gen code vs human code with a bit different lens, the mistakes each make are often a bit different in flavor, and it would save time for me in a review to know which is which. Has anyone seen this at a larger org and is it of value to you as a reviewer? Maybe some tool sets can already do this automatically (I suppose all these companies report the % of code that is LLM generated must have one if they actually have these granular metrics?)

acedTrex9mo ago

Author here:

> The article opens with a statement saying the author isn't going to reword what others are writing, but the article reads as that and only that.

Hmm, I was just saying I hadn't seen much literature or discussion on trust dynamics in teams with LLMs. Maybe I'm just in the wrong spaces for such discussions but I haven't really come across it.

acedTrex9mo ago

Hi everyone, author here.

Sorry about the JS stuff I wrote this while also fooling around with alpine.js for fun. I never expected it to make it to HN. I'll get a static version up and running.

Happy to answer any questions or hear other thoughts.

Edit: https://static.jaysthoughts.com/

Static version here with slightly wonky formatting, sorry for the hassle.

Edit2: Should work on mobile now well, added a quick breakpoint.

konaraddi9mo ago

Given the topic of your post, and high pagespeed results, I think >99% of your intended audience can already read the original. No need to apologize or please HN users.

davidthewatson9mo ago

Well said. The death of trust in software is a well worn path from the money that funds and founds it to the design and engineering that builds it - at least the 2 guys-in-a-garage startup work I was involved in for decades. HITL is key. Even with a human in the loop, you wind up at Therac 25. That's exactly where hybrid closed loop insulin pumps are right now. Autonomy and insulin don't mix well. If there weren't a moat of attorneys keeping the signal/noise ratio down, we'd already realize that at scale - like the PR team at 3 letter technical universities designed to protect parents from the exploding pressure inside the halls there.

DyslexicAtheist9mo ago

it's really hard using AI (not impossible) to produce meaningful offensive security to improve defense due to there being way too many guard rails.

While on the other hand real nation-state threat actors would face no such limitations.

On a more general level, what concerns me isn't whether people use it to get utility out of it (that would be silly), but the power-imbalance in the hand of a few, and with new people pouring their questions into it, this divide getting wider. But it's not just the people using AI directly but also every post online that eventually gets used for training. So to be against it would mean to stop producing digital content.

geor9e9mo ago

They changed the headline to "Yes, I will judge you for using AI..." so I feel like I got the whole story already.

1 more reply

kordlessagain8mo ago

Spending half the time building Claude Code tools (MCP servers) and half my time working on Gnosis, an AI powered oracle.

What is an oracle? That's a system that:

- Knows things - Has pre-crawled, indexed information about specific domains

- Answers authoritatively - Not just web search, but curated, verified data

- Connects isolated systems - Apps can query Gnosis instead of implementing their own crawling/search

- May have some practical use for blockchain actions (typically a crypto "oracle" bridges web data with chain data. In this context the "oracle" is AI + storage + transactions on the chain.

The Core Components:

- Evolve: Our tooling layer - manages the MCP servers, handles deployment, monitors health. Agentic tools.

- Wraith: Web crawler that fetches and processes content from URLs, handles JavaScript rendering, screenshots, and more. Agentic crawler.

- Alaya: Vector database (streaming projected dimensions) for storing and searching through all the collected information. Agentic storage.

- Gnosis-Docker: Container orchestration MCP server for managing these services locally. Agentic DevOps.

There's more coming.

https://github.com/kordless/gnosis-evolve

https://linkedin.com/in/kordless

https://github.com/kordless/gnosis-wraith (under heavy development)

There's also a complete MCP inspection and debugging system for Python here: https://github.com/kordless/gnosis-mystic

heisenbit9mo ago

> The reality is that LLMs enable an inexperienced engineer to punch far above their proverbial weight class. That is to say, it allows them to work with concepts immediately that might have taken days, months or even years otherwise to get to that level of output.

At the moment LLMs allow me to punch far above my weight class in Python where I do a short term job. But then I know all the concepts from decades dabbling in other ecosystems. Let‘s all admit there is a huge amount of accidental complexity (h/t Brooks‘s Silver-bullet) in our world. For better or worse there are skill silos that are now breaking down.

lawlessone9mo ago

One trust breaking issue is we still can't know why the LLM makes specific choices.

Sure we can ask it why it did something but any reason it gives is just something generated to sound plausible.

mensetmanusman9mo ago

All this means is that the QC is going to be 10x more important.

archibaldJ9mo ago

This can be solved when the ARC puzzle is cracked (https://arcprize.org/play) so we can automate correctness-checking like in coq but for program synthesis.

mizzao9mo ago

The last section of this post seems to be quite predictive of a sibling post on the front page right now: https://news.ycombinator.com/item?id=44382752

wg09mo ago

We have seen those 10x engineers churning out PRs and huge PRs before anyone can fathom and make sense of the whole damn thing.

Wondering what they would be producing with LLMs?

I_Lorem9mo ago

He's making a good point on trust, but, really, doesn't the trust flow both directions? Should the Sr. Engineer rubber stamp or just take a quick glance at Bob's implementation because he's earned his chops, or should the Sr. Engineer apply the same level of review regardless of whether it's Bob, Mary, or Rando Calrissian submitting their work for review?

eikenberry9mo ago

The Sr. Engineer should definitely give (presumably another Sr. Eng.) Bob's code a quicky review and approve it. If Mary or Rando are Sr. then they should get the same level as well. If anyone is a Jr. they should get a much more in-depth review as it's a teaching opportunity, whereas Sr. on Sr. reviews are done to enforce conventions and to be sure the PR has an audience (people take more care when they know other people will look at it).

fhd29mo ago

A bit tangential, but I noticed quite a discrepancy between augmented coding done well, and augmented coding how I actually see it done in the wild.

There's a lot of posts about how to do it well, and I like the idea of it, generally. I think GenAI has genuine applications in software development beyond as a Google/SO replacement.

But then there's real world code. I constantly see:

1. Over engineering. People used to keep it simple because they were limited by how fast they can type. Well, those gloves sure did come off for a lot of developers.

2. Lack of understanding / memory. If I ask someone about how their code works, if they didn't write it (or at least carefully analyse it), it's rare for them to understand or even remember what they did there. The common answer to "how does this work?", went from "I think like this but let me double check" to "no idea". Some will be proud to tell you they auto generated documentation, too. If you have any questions about that, chances are you'll get another "no idea" response. If you ask an LLM how it works, that's very hit and miss for non-trivial systems. I always tell my devs I hire them to understand systems first and formost, building systems comes second. I feel increasingly alone with that attitude.

3. Bugs. So many bugs. It seems devs that generate code would need to do a lot more explicit testing than those who don't. There's probably just a missing feedback loop: When typing in code, you tend to have to test every little button action and so on at least once, it's just part of the work. Chances are you don't break it since you last tested it, so while this happens, manually written code generally has one time exhaustive manual testing built into the process naturally. If you generate a whole UI area, you need to do thorough testing of all kinds of conditions. Seems people don't.

So while it could be great, from my perspective, it feels like more of a net negative in practice. It's all fun and games until there's a problem. And there always is.

Maybe I have a bad sample of the industry. We essentially specialise on taking over technically disastrous projects and other kinds of tricky situations. Few people hire us to work on a good system with a strong team behind it.

But still, comparing the questionable code bases I got into two years ago with those I get into now, there is a pretty clear change for the worse.

Maybe I'm pessimistic, but I'm starting to think we'll need another software crisis (and perhaps a wee AI winter) to get our act together with this new technology. I hope I'm wrong.

throwawayoldie9mo ago

IMHO, s/may/has/

atemerev9mo ago

I am a software engineer who writes 80-90% code with AI (sorry, can't ignore the productivity boost), and I mostly agree with this sentiment.

I found out very early that under no circumstances you may have the code you don't understand, anywhere. Well, you may, but not in public, and you should commit to understanding it before anyone else sees that. Particularly before sales guys do.

However, AI can help you with learning too. You can run experiments, test hypotheses and burn your fingers so fast. I like it.

helge92109mo ago

I checked with HR at my company and got an answer I'm not allowed to announce the following: anyone submitting the code or asking a question about the code without disclosing the fact that the code in question was generated by LLM would be cursed.

benreesman9mo ago

I'm currently standing up a C++ capability in an org that hasn't historically had one, so things like the style guide and examples folder require a lot of care to give a good start for new contributors.

I have instructions for agents that are different in some details of convention, e.g. human contributors use AAA allocation style, agents are instructed to use type first. I convert code that "graduates" from agent product to review-ready as I review agent output, which keeps me honest that I don't myself submit code without scrutiny to the review of other humans: they are able to prompt an LLM without my involvement, and I'm able to ship LLM slop without making a demand on their time. Its an honor system, but a useful one if everyone acts in good faith.

I get use from the agents, but I almost always make changes and reconcile contradictions.

pfdietz9mo ago

There was trust?

thedudeabides59mo ago

dont.trust.machines

tomhow9mo ago

[Stub for offtopicness, including but not limited to comments replying to original title rather than article's content]

extr9mo ago

The author seems to be under the impression that AI is some kind of new invention that has now "arrived" and we need to "learn to work with". The old world is over. "Guaranteeing patches are written by hand" is like the Tesla Gigafactory wanting a guarantee that the nuts and bolts they purchase are hand-lathed.

lynx979mo ago

No worries, I also judge you for relying on JavaScript for your "simple blog".

rvnx9mo ago

Claude said to use Markdown, text file or HTML with minimal CSS. So it means the author does not know how to prompt.

The blog itself is using Alpine JS, which is a human-written framework 6 years ago (https://github.com/alpinejs/alpine), and you can see the result is not good.

mnmalst9mo ago

Ha, I came her to make the same comment.

Two completely unnecessary request to: jsdelivr.net and net.cdn.cloudflare.net

acedTrex9mo ago

I wrote it while playing with alpine.js for fun just messing around with stuff.

Never actually expected it to be posted on HN. Working on getting a static version up now.

gblargg9mo ago

Doesn't even work on older browsers either.

can16358p9mo ago

Ironically, a blog post about judging for a practice uses terrible web practices: I'm on mobile and the layout is messed up, and Safari's reader mode crashes on this page for whatever reason.

acedTrex9mo ago

Mobile layout should be fixed now, I also just threw up a quick static version here as well https://static.jaysthoughts.com/

rvnx9mo ago

On Safari mobile you even get a white page, which is almost poetic. It means it pushes your imagination to the max.

MaxikCZ9mo ago

Yes, I will judge you for requiring javascript to display a page of such basic nature.

djm_9mo ago

You could do with using an LLM to make your site work on mobile.

EbNar9mo ago

I'll surely care that a stranger on the internet judges me about the tools I use kor I don't).

Kuinox9mo ago

7 comments.

3 have obviously only read the title, and 3 comments how the article require JS.

Well played HN.

tomhow9mo ago

This exactly why the guideline about titles says:

Otherwise please use the original title, unless it is misleading or linkbait.

This title counts as linkbait so I've changed it. It turns out the article is much better (for HN) than the title suggests.

1 more reply

sandspar9mo ago

That's typical for link sharing communities like HN and Reddit. His title clearly struck a nerve. I assume many people opened the link, saw that it was a wall of text, scanned the first paragraph, categorized his point into some slot that they understand, then came here to compete in HN's side-market status game. Normal web browsing behavior, in other words.

j3th9n9mo ago

Back in the day they would judge people for turning on a lightbulb instead of lighting a candle.

thereisnospork9mo ago

In a few years people who don't/can't use AI will be looked at like people who couldn't use a computer ~20 years ago.

It might not solve every problem, but it solves enough of them better enough it belongs in the tool kit.

tines9mo ago

I think it will be the opposite. AI causes cognitive decline, in the future only the people who don't use AI will retain their ability to think. Same as smartphone usage, the less the better.

2 more replies

sandspar9mo ago

It's interesting that AI proponents say stuff like, "Humans will remain interested in other humans, even after AI can do all our jobs." It really does seem to be true. Here for example we have a guy who's using AI to make a status-seeking statement i.e. "I'm playing a strong supporting role on the 'anti-AI thinkers' team therefore I'm high status". Like, humans have an amazing ability to repurpose anything into status markers. Even AI. I think that if AI replaces all of our actual jobs then we'll still spend our time doing status jobs. In a way this guy is living in the future even more than most AI users.

michelsedgh9mo ago

For now, yes, because humans are doing most of jobs better than AI. In 10 years time, if the AI's are doing a better job, people like author need to learn all the ropes if they wanna catch up. I don't think LLMs will destroy all jobs, i think those who learn them and use them properly, and those professionals will outdo people who don't use these tools just for the sake of saying I'm high status I dont use these tools.

1 more reply

DocTomoe9mo ago

You can judge all you want. You'll eventually appear much like that old woman secretly judging you in church.

Most of the current discourse on AI coding assistants sounds either breathlessly optimistic or catastrophically alarmist. What’s missing is a more surgical observation: the disruptive effect of LLMs is not evenly distributed. In fact, the clash between how open source and industry teams establish trust reveals a fault line that’s been papered over with hype and metrics.

FOSS project work on a trust basis - but industry standard is automated testing, pair programming, and development speed. That CRUD app for finding out if a rental car is available? Not exactly in need for a hand-crafted piece of code, and no-one cares if Junior Dev #18493 is trusted within the software dev organization.

If the LLM-generated code breaks, blame gets passed, retros are held, Jira tickets multiply — the world keeps spinning, and a team fixes it. If a junior doesn’t understand their own patch, the senior rewrites it under deadline. It’s not pretty, but it works. And when it doesn’t, nobody loses “reputation” - they lose time, money, maybe sleep. But not identity.

LLMs challenge open source where it’s most vulnerable - in its culture. Meanwhile, industry just treats them like the next Jenkins: mildly annoying at first, but soon part of the stack.

The author loves the old ways, for many valid reasons: Gabled houses are beautiful, but outside of architectural circles, prefab is what scaled the suburbs, not timber joints and romanticism.

observationist9mo ago

There's no reason to think AI will stop improving, and the rate of improvement is increasing as well, and no reason to think that these tools won't vastly outperform us in the very near future. Putting aside AGI and ASI, simply improving the frameworks of instructions and context, breaking down problems into smaller problems, and methodology of tools will result in quality multiplication.

Making these sort of blanket assessments of AI, as if it were a singular, static phenomena is bad thinking. You can say things like "AI Code bad!" about a particular model, or a particular model used in a particular context, and make sense. You cannot make generalized statements about LLMs as if they are uniform in their flaws and failure modes.

They're as bad now as they're ever going to be again, and they're getting better faster, at a rate outpacing the expectations and predictions of all the experts.

The best experts in the world, working on these systems, have a nearly universal sentiment of "holy shit" when working on and building better AI - we should probably pay attention to what they're seeing and saying.

There's a huge swathe of performance gains to be made in fixing awful human code. There's a ton of low hanging fruit to be gotten by doing repetitive and tedious stuff humans won't or can't do. Those two things mean at least 20 or more years of impressive utility from AI code can be had.

Things are just going to get faster, and weirder, and weirder faster.

klabb39mo ago

> There's no reason to think AI will stop improving

No, and there’s no reason to think cars will stop improving either, but that doesn’t mean they will start flying.

The first error is in thinking that AI is converging towards a human brain. To treat this as a null hypothesis is incongruent both wrt the functional differences between the two and crucially empirical observations of the current trajectory of LLMs. We have seen rapid increases in ability, yes, but those abilities are very asymmetrical by domain. Pattern matching and shitposting? Absolutely crushing humans already. Novel conceptual ideas and consistency checked reasoning? Not so much, eg all that hype around PhD-level novel math problems died down as quickly as it had been manufactured. If they were converging on human brain function, why this vastly uneven ability increases?

The second error is to assume a superlinear ability improvement when the data has more or less run out and has to be slowly replenished over time, while avoiding the AI pollution in public sources. It’s like assuming oil will accelerate if it had run out and we needed to wait for more bio-matter to decompose for every new drop of crude. Can we improve engine design and make ICEs more efficient? Yes, but it’s a diminishing returns game. The scaling hypothesis was not exponential but sigmoid, which is in line with most paradigm shifts and novel discoveries.

> Making these sort of blanket assessments of AI, as if it were a singular, static phenomena is bad thinking.

I agree, but do you agree with yourself here? Ie:

> no reason to think that these tools won't vastly outperform us in the very near future

.. so back to single axis again? How is this different from saying calculators outperform humans?

jrflowers9mo ago

> shitposting? Absolutely crushing humans already

Where can I go to see language model shitposters that are better than human shitposters?

1 more reply

christhecaribou9mo ago

Sure, if we all collectively ignore model collapse.

ayakaneko9mo ago

I think that, yes sure, there's no reason to think AI will stop improving.

But I think that everyone is lossing trust not because there is no potential that LLMs could write good code or not, it's the trust to the user who uses LLMs to uncontrollable-ly generate those patches without any knowledge, fact checks, and verifications. (many of them may not even know how to test it.)

In another word, while LLMs is potentially capable of being a good SWE, but the human behind it right now, is spamming, and doing non-sense works, and let the unpaid open source maintainers to review and feedback them (most of the time, manually).

j / k navigate · click thread line to collapse

275 comments

gblargg9mo ago

https://archive.is/5I9sB

(Works on older browsers and doesn't require JavaScript except to get past CloudSnare).

dirkc9mo ago

I have a friend that always says "innovation happens at the speed of trust". Ever since GPT3, that quote comes to mind over and over.

acedTrex9mo ago

Author here: I quite like that quote. A very succinct way of saying what took me a few paragraphs.

This new world of having to verify every single thing at all points is quite exhausting and frankly pretty slow.

JackFr9mo ago

https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_Ref...

The classic on the subject.

tayo429mo ago

We do this is in professional environments already with documentation for designs upfront and code reviews though

EGreg9mo ago

“Freedom of speech” in politics

Herring9mo ago

3 more replies

lubujackson9mo ago

bluefirebrand9mo ago

> We never can have total trust in LLM output, but we can certainly sanitize it and limit it's destructive range

Can we really do this reliably? LLMs are non-deterministic, right, so how do we validate the output in a deterministic way?

We can validate things like shape of data being returned, but how do we validate correctness without an independent human in the loop to verify?

3 more replies

ngold9mo ago

You perfectly said nothing. Well done.

whiplash4519mo ago

> "innovation happens at the speed of trust"

You'll have to elaborate on that. How much trust was there in electricity, flight and radioactivity when we discovered them?

In science, you build trust as you go.

agent2819mo ago

Have you heard of the War of the Currents?

https://en.wikipedia.org/wiki/War_of_the_currents

1 more reply

reaperducer9mo ago

How much trust was there in electricity, flight and radioactivity when we discovered them?

Not much.

(Edit: If you'd like to see this played out visually, watch the early episodes of Miss Fisher's Murder Mysteries on ABC [.oz])

Radioactivity? There was a plethora of radioactive hysteria in books, magazines, comics, television, movies, and radio. It's not hard to find.

1 more reply

dirkc9mo ago

4 more replies

blurbleblurble9mo ago

As we were debugging, my colleague revealed his assumption that I'd used AI to write it, and expressed frustration at trying to understand something AI generated after the fact.

Be careful out there.

kldg9mo ago

Now I trust this paper and people involved implicitly, but this was a lot of work to convince just one person.

stavros9mo ago

acedTrex9mo ago

Author here:

Essentially the premise is that in medium trust environments like very large teams or low trust environments like an open source project.

LLMs make it very difficult to make an immediate snap judgement about the quality of the dev that submitted the patch based solely on the code itself.

Obviously if you are in an existing competent high trust team then this problem does not apply and most likely seems completely foreign as a concept.

lxgr9mo ago

> LLMs make it very difficult to make an immediate snap judgement about the quality [...]

Maybe that's not the worst thing in the world. I feel like it often made people complacent.

5 more replies

eddd-ddde9mo ago

> In the absence of being able to ascertain the type of person you are dealing with you have to fall back too "no trust" and review everything with a very fine tooth comb.

Is that not how you review all code? I don't care who wrote the code, just because certain person wrote the code doesn't give them an instant pass to skip my review process.

sim7c009mo ago

its about the quality of the code, not the quality of the dev. you might think it's related, but it's not.

a dev can write piece of good, and piece of bad code. so per code, review the code. not the dev!

2 more replies

alganet9mo ago

> I learned to trust them because their code works well

There's so much more than "works well". There are many cues that exist close to code, but are not code:

I trust more if the contributor explains their change well.

I trust more if the contributor did great things in the past.

I trust more if the contributor manages granularity well (reasonable commits, not huge changes).

I trust more if the contributor picks the right problems to work on (fixing bugs before adding new features, etc).

I trust more if the contributor proves being able to maintain existing code, not just add on top of it.

I trust more if the contributor makes regular contributions.

And so on...

acedTrex9mo ago

Author here:

moffkalast9mo ago

It's easy to get overconfident and not test the LLM's code enough when it worked fine for a handful of times in a row, and then you miss something.

The thing that 4o does with deep research where it asks for additional info before it does anything should be standard for any code generation too tbh, it would prevent a mountain of issues.

stavros9mo ago

Sure, but you're still responsible for the quality of the code you commit, LLM or no.

2 more replies

insane_dreamer9mo ago

> If someone uses an LLM and produces bug-free code, I'll trust them.

somewhereoutth9mo ago

Because when people use LLMs, they are getting the tool to do the work for them, not using the tool to do the work. LLMs are not calculators, nor are they the internet.

I suspect that as the serious problems with LLMs become ever more apparent, this will become standard policy across the board. Certainly I hope so.

stavros9mo ago

Well, no, a good rule of thumb is to expect people to write good code, no matter how they do it. Why would you mandate what tool they can use to do it?

1 more reply

tranchebald9mo ago

badsectoracula9mo ago

> Because when people use LLMs, they are getting the tool to do the work for them, not using the tool to do the work.

breuleux9mo ago

mexicocitinluez9mo ago

>Because when people use LLMs, they are getting the tool to do the work for them, not using the tool to do the work.

What? How on god's green earth could you even pretend to know how all people are using these tools?

> LLMs are not calculators, nor are they the internet.

Umm, okay? How does that make them less useful?

I'm going to give you a concrete example of something I just did and let you try and do whatever mental gymnastics you have to do to tell me it wasn't useful:

> I suspect that as the serious problems with LLMs become ever more apparent, this will become standard policy across the board. Certainly I hope so.

The irony is that they're getting better by the day. That's not to say people don't use them for the wrong applications, but the idea that this tech is going to be banned is absurd.

> A good rule of thumb is to simply reject any work that has had involvement of an LLM

sebmellen9mo ago

This is okay for platitudes, but for emails that really matter, having this messy watercolor kind of writing totally destroys the clarity of the text and confuses everyone.

2 more replies

flir9mo ago

> A good rule of thumb is to simply reject any work that has had involvement of an LLM,

How are you going to know?

2 more replies

jnxx9mo ago

We already treat spam and unsolicited commercial email that way.

taneq9mo ago

stavros9mo ago

Sure, your heuristic no longer works, and that's a bit inconvenient. We'll just find new ones.

3 more replies

legacynl9mo ago

> How is this different from when they were only using their brain to produce the code?

If A = B, higher B means that A is higher too.

Now it's A + Ai = B. Now Higher B doesn't necessarily mean higher A.

Especially since the current state of Ai is pretty much stochastic, and sometimes is worse than nothing at all

JambalayaJimbo9mo ago

I have never been in a work environment in which I’ve been able to do more than rubber stamp PRs. Performing a deep review of each change is simply impossible with the expectations we were given.

stavros9mo ago

Interesting, I've never been in one where we didn't read PRs.

mexicocitinluez9mo ago

It's not.

What you're seeing now is people who once thought and proclaimed these tools as useless now have to start to walk back their claims with stuff like this.

It does amaze me that the people who don't use these tools seem to have the most to say about them.

acedTrex9mo ago

Author here:

For what it's worth I do actually use the tools albeit incredibly intentionally and sparingly.

That however doesn't change my thoughts on trying to ascertain if code submitted to me deserves a full indepth review or if I can maybe cut a few corners here and there.

1 more reply

globnomulous9mo ago

> It does amaze me that the people who don't use these tools seem to have the most to say about them.

Someone recently shared on HN a blog post explaining why the author doesn't use LLMs. The justification for the post? "People keep asking me."

1 more reply

satisfice9mo ago

LLMs make bad work— of any kind— look like plausibly good work. That’s why it is rational to automatically discount the products of anyone who has used AI.

We all assume the people in our lives are not actively trying to hurt us. When that trust breaks, it breaks hard.

No one who uses AI can claim “this is my work.” I don’t know that it is your work.

No one who uses AI can claim that it is good work, unless they thoroughly understand it, which they probably don’t.

This kind of problem is not new to AI. But it has become ten times worse.

bobjordan9mo ago

satisfice9mo ago

1 more reply

axegon_9mo ago

autobodie9mo ago

In my experience, LLMs are extremely inclined to modify code just to pass tests instead of meeting requirements.

fwip9mo ago

When they're not modifying the tests to match buggy behavior. :P

devjab9mo ago

I have no idea how our industry expect anyone to become experts in the age of vibe coding though.

furyofantares9mo ago

> Because the AI-agents we use with direct code-access aren't very chatty.

I didn't make a proposition that can be correct or not, and it didn't do any work yet to to investigate my question - it feels like it has poisoned its own context by leading with this.

2 more replies

gibspaulding9mo ago

> Where AI-agents are the most useful for us is with legacy code

1 more reply

teeray9mo ago

> Because the AI-agents we use with direct code-access aren't very chatty.

So they’re even more confident in their wrongness

1 more reply

axegon_9mo ago

mexicocitinluez9mo ago

> 8 or 9 out of 10 times.

Not they don't. This is 100% a made up statistic.

bluefirebrand9mo ago

It isn't even being presented as a statistic it is someone saying what they have experienced

2 more replies

HardCodedBias9mo ago

All of this fighting against LLMs is pissing in the wind.

It seems that LLMs, as they work today, make developers more productive. It is possible that they benefit less experienced developers even more than experienced developers.

More productivity, and perhaps very large multiples of productivity, will not be abandoned due roadblocks constructed by those who oppose the technology due to some reason.

mjr009mo ago

> It seems that LLMs, as they work today, make developers more productive.

Think this strongly depends on the developer and what they're attempting to accomplish.

> Working with the technology and mitigating it's weaknesses is the only rational path forward.

It's possible we'll look back and see AI agents, or other current applications of generative AI, as the coffee blockchain of this bubble.

[0] https://www.reddit.com/r/ExperiencedDevs/comments/1krttqo/my...

[1] https://www.forbes.com/sites/robertanzalone/2020/07/15/big-c...

parineum9mo ago

conartist69mo ago

And here it is again. "More productive"

But this doesn't mean that the model/human combo is more effective at serving the needs of users! It means "producing more code."

There are no LLMs shipping changesets that delete 2000 lines of code -- that's how you know "making engineers more productive" is a way of talking about how much code is being created...

eikenberry9mo ago

FuckButtons9mo ago

I think you’re arguing against something the author didn’t actually say.

You seem to be claiming that this is a binary, either we will or won’t use llms, but the author is mostly talking about risk mitigation.

scelerat9mo ago

I didn't see the post as pissing into the wind so much as calling out several caveats of coding with LLMs, especially on teams, and ideas on how to mitigate them.

ge969mo ago

It is funny (ego) I remember when React was new and I refused to learn it, had I learned it earlier I probably would have entered the market years earlier.

anthonypasq9mo ago

this mindset does not work in software. My dad would still be programming with punchcards if he thought this way. instead he using copilot daily writing microservices and isnt some annoying dinosaur

1 more reply

pu_pe9mo ago

I can certainly see an erosion of trust in the future, with the result being that test-driven development gains even more momentum. Don't trust, and verify.

lbalazscs9mo ago

acedTrex9mo ago

> but rather if it proves it can do what it intends to do after extensive testing.

Author here: Here I was less talking about the effectiveness of the output of a given tool and more so about the tool itself.

To take your garbage collection example, sure perhaps an agentic system at some point can spin some stuff up and beat it into submission with test harnesses, bug fixes etc.

bluefirebrand9mo ago

> I think what the author misses here is that imperfect, probabilistic agents can build reliable, deterministic systems

That is quite a statement! You're talking about systems that are essentially entropy-machines somehow creating order?

> with the result being that test-driven development gains even more momentum

Why is it that TDD is always put forward as the silver bullet that fixes all issues with building software

The number of times I've seen TDD build the wrong software after starting with the wrong tests is actually embarassing

cheriot9mo ago

> promises that the contributed code is not the product of an LLM but rather original and understood completely.

> require them to be majority hand written.

We should specify the outcome not the process. Expecting the contributor to understand the patch is a good idea.

> Juniors may be encouraged/required to elide LLM-assisted tooling for a period of time during their onboarding.

bluefirebrand9mo ago

> Onboarding is a lot of random environment setup hitches

Learning how to navigate these hitches is a really important process

cheriot9mo ago

Some people find a solution, think about it, and incorporate that into their understanding of the world.

Some people ask for a coworker to do it for them, c/p stackoverflw, etc and never learn.

AI makes the first group that much more effective. Setup should be a learning process not a hazing ritual.

RunningDroid9mo ago

> > Onboarding is a lot of random environment setup hitches > > Learning how to navigate these hitches is a really important process

To add to this, a barrier to contribution can reduce low quality/spam contributions. The downside is that a barrier to contribution that's too high reduces all contributions.

kmoser9mo ago

1 more reply

namenotrequired9mo ago

I’ve never heard of this cliff before. Has anyone else experienced this?

gwd9mo ago

windward9mo ago

I've seen it referred to as 'context drunk'.

Imagine that you have your input to the context, 10000 tokens that are 99% correct. Each time the LLM replies it adds 1000 tokens that are 90% correct.

The same problem also shows up in prose.

Workaccount29mo ago

I call it context rot. As the context fills up the quality of output erodes with it. The rot gets even worse or progresses faster the more spurious or tangential discussion is in context.

In the meantime, I make summaries and carry it to a fresh instance when I notice the rot forming.

bubblyworld9mo ago

Paradigma119mo ago

Kuinox9mo ago

I'm doing my own procedurally generated benchmark.

I can make the problem input bigger as I want.

Each LLM have a different thresholf for each problem, when crossed the performance of the LLM collapse.

lubujackson9mo ago

npteljes9mo ago

sandspar9mo ago

I'm not sure. Is he talking about context poisoning?

impure9mo ago

This sounds a lot like accuracy collapse as discussed in that Apple paper. That paper clearly showed that there is some point where AI accuracy collapses extremely quickly.

Syzygies9mo ago

fwip9mo ago

Snipping out the flailing in this way seems to help.

suddenlybananas9mo ago

>can't afford not to use it. I'm in the latter group.

People love to justify big expenses as necessary.

1 more reply

beau_g9mo ago

The article opens with a statement saying the author isn't going to reword what others are writing, but the article reads as that and only that.

acedTrex9mo ago

Author here:

> The article opens with a statement saying the author isn't going to reword what others are writing, but the article reads as that and only that.

Hmm, I was just saying I hadn't seen much literature or discussion on trust dynamics in teams with LLMs. Maybe I'm just in the wrong spaces for such discussions but I haven't really come across it.

acedTrex9mo ago

Hi everyone, author here.

Sorry about the JS stuff I wrote this while also fooling around with alpine.js for fun. I never expected it to make it to HN. I'll get a static version up and running.

Happy to answer any questions or hear other thoughts.

Edit: https://static.jaysthoughts.com/

Static version here with slightly wonky formatting, sorry for the hassle.

Edit2: Should work on mobile now well, added a quick breakpoint.

konaraddi9mo ago

Given the topic of your post, and high pagespeed results, I think >99% of your intended audience can already read the original. No need to apologize or please HN users.

davidthewatson9mo ago

DyslexicAtheist9mo ago

it's really hard using AI (not impossible) to produce meaningful offensive security to improve defense due to there being way too many guard rails.

While on the other hand real nation-state threat actors would face no such limitations.

geor9e9mo ago

They changed the headline to "Yes, I will judge you for using AI..." so I feel like I got the whole story already.

1 more reply

kordlessagain8mo ago

Spending half the time building Claude Code tools (MCP servers) and half my time working on Gnosis, an AI powered oracle.

What is an oracle? That's a system that:

- Knows things - Has pre-crawled, indexed information about specific domains

- Answers authoritatively - Not just web search, but curated, verified data

- Connects isolated systems - Apps can query Gnosis instead of implementing their own crawling/search

- May have some practical use for blockchain actions (typically a crypto "oracle" bridges web data with chain data. In this context the "oracle" is AI + storage + transactions on the chain.

The Core Components:

- Evolve: Our tooling layer - manages the MCP servers, handles deployment, monitors health. Agentic tools.

- Wraith: Web crawler that fetches and processes content from URLs, handles JavaScript rendering, screenshots, and more. Agentic crawler.

- Alaya: Vector database (streaming projected dimensions) for storing and searching through all the collected information. Agentic storage.

- Gnosis-Docker: Container orchestration MCP server for managing these services locally. Agentic DevOps.

There's more coming.

https://github.com/kordless/gnosis-evolve

https://linkedin.com/in/kordless

https://github.com/kordless/gnosis-wraith (under heavy development)

There's also a complete MCP inspection and debugging system for Python here: https://github.com/kordless/gnosis-mystic

heisenbit9mo ago

lawlessone9mo ago

One trust breaking issue is we still can't know why the LLM makes specific choices.

Sure we can ask it why it did something but any reason it gives is just something generated to sound plausible.

mensetmanusman9mo ago

All this means is that the QC is going to be 10x more important.

archibaldJ9mo ago

This can be solved when the ARC puzzle is cracked (https://arcprize.org/play) so we can automate correctness-checking like in coq but for program synthesis.

mizzao9mo ago

The last section of this post seems to be quite predictive of a sibling post on the front page right now: https://news.ycombinator.com/item?id=44382752

wg09mo ago

We have seen those 10x engineers churning out PRs and huge PRs before anyone can fathom and make sense of the whole damn thing.

Wondering what they would be producing with LLMs?

I_Lorem9mo ago

eikenberry9mo ago

fhd29mo ago

A bit tangential, but I noticed quite a discrepancy between augmented coding done well, and augmented coding how I actually see it done in the wild.

There's a lot of posts about how to do it well, and I like the idea of it, generally. I think GenAI has genuine applications in software development beyond as a Google/SO replacement.

But then there's real world code. I constantly see:

1. Over engineering. People used to keep it simple because they were limited by how fast they can type. Well, those gloves sure did come off for a lot of developers.

So while it could be great, from my perspective, it feels like more of a net negative in practice. It's all fun and games until there's a problem. And there always is.

But still, comparing the questionable code bases I got into two years ago with those I get into now, there is a pretty clear change for the worse.

Maybe I'm pessimistic, but I'm starting to think we'll need another software crisis (and perhaps a wee AI winter) to get our act together with this new technology. I hope I'm wrong.

throwawayoldie9mo ago

IMHO, s/may/has/

atemerev9mo ago

I am a software engineer who writes 80-90% code with AI (sorry, can't ignore the productivity boost), and I mostly agree with this sentiment.

However, AI can help you with learning too. You can run experiments, test hypotheses and burn your fingers so fast. I like it.

helge92109mo ago

benreesman9mo ago

I get use from the agents, but I almost always make changes and reconcile contradictions.

pfdietz9mo ago

There was trust?

thedudeabides59mo ago

dont.trust.machines

tomhow9mo ago

[Stub for offtopicness, including but not limited to comments replying to original title rather than article's content]

extr9mo ago

lynx979mo ago

No worries, I also judge you for relying on JavaScript for your "simple blog".

rvnx9mo ago

Claude said to use Markdown, text file or HTML with minimal CSS. So it means the author does not know how to prompt.

The blog itself is using Alpine JS, which is a human-written framework 6 years ago (https://github.com/alpinejs/alpine), and you can see the result is not good.

mnmalst9mo ago

Ha, I came her to make the same comment.

Two completely unnecessary request to: jsdelivr.net and net.cdn.cloudflare.net

acedTrex9mo ago

I wrote it while playing with alpine.js for fun just messing around with stuff.

Never actually expected it to be posted on HN. Working on getting a static version up now.

gblargg9mo ago

Doesn't even work on older browsers either.

can16358p9mo ago

Ironically, a blog post about judging for a practice uses terrible web practices: I'm on mobile and the layout is messed up, and Safari's reader mode crashes on this page for whatever reason.

acedTrex9mo ago

Mobile layout should be fixed now, I also just threw up a quick static version here as well https://static.jaysthoughts.com/

rvnx9mo ago

On Safari mobile you even get a white page, which is almost poetic. It means it pushes your imagination to the max.

MaxikCZ9mo ago

Yes, I will judge you for requiring javascript to display a page of such basic nature.

djm_9mo ago

You could do with using an LLM to make your site work on mobile.

EbNar9mo ago

I'll surely care that a stranger on the internet judges me about the tools I use kor I don't).

Kuinox9mo ago

7 comments.

3 have obviously only read the title, and 3 comments how the article require JS.

Well played HN.

tomhow9mo ago

This exactly why the guideline about titles says:

Otherwise please use the original title, unless it is misleading or linkbait.

This title counts as linkbait so I've changed it. It turns out the article is much better (for HN) than the title suggests.

1 more reply

sandspar9mo ago

j3th9n9mo ago

Back in the day they would judge people for turning on a lightbulb instead of lighting a candle.

thereisnospork9mo ago

In a few years people who don't/can't use AI will be looked at like people who couldn't use a computer ~20 years ago.

It might not solve every problem, but it solves enough of them better enough it belongs in the tool kit.

tines9mo ago

I think it will be the opposite. AI causes cognitive decline, in the future only the people who don't use AI will retain their ability to think. Same as smartphone usage, the less the better.

2 more replies

sandspar9mo ago

michelsedgh9mo ago

1 more reply

DocTomoe9mo ago

You can judge all you want. You'll eventually appear much like that old woman secretly judging you in church.

LLMs challenge open source where it’s most vulnerable - in its culture. Meanwhile, industry just treats them like the next Jenkins: mildly annoying at first, but soon part of the stack.

The author loves the old ways, for many valid reasons: Gabled houses are beautiful, but outside of architectural circles, prefab is what scaled the suburbs, not timber joints and romanticism.

observationist9mo ago

They're as bad now as they're ever going to be again, and they're getting better faster, at a rate outpacing the expectations and predictions of all the experts.

Things are just going to get faster, and weirder, and weirder faster.

klabb39mo ago

> There's no reason to think AI will stop improving

No, and there’s no reason to think cars will stop improving either, but that doesn’t mean they will start flying.

> Making these sort of blanket assessments of AI, as if it were a singular, static phenomena is bad thinking.

I agree, but do you agree with yourself here? Ie:

> no reason to think that these tools won't vastly outperform us in the very near future

.. so back to single axis again? How is this different from saying calculators outperform humans?

jrflowers9mo ago

> shitposting? Absolutely crushing humans already

Where can I go to see language model shitposters that are better than human shitposters?

1 more reply

christhecaribou9mo ago

Sure, if we all collectively ignore model collapse.

ayakaneko9mo ago

I think that, yes sure, there's no reason to think AI will stop improving.

j / k navigate · click thread line to collapse