* Clear explanations of concepts that respond to questions and get reformulated when something doesn't land
* Step-by-step verification of solutions, spotting exactly where calculations have gone wrong
* Instantaneously generating new problem sets to reinforce concepts
LLMs are probably not going to live up to all sorts of claims their proponents make. But I don't think anyone who has actually tried using an LLM in a math course can reach the conclusion that it's "demoware" for that application. At what point, over 6 months of continuous work, does it stop being a "demo"?
In case I've spooked anyone: they have an adult course series (Foundations I, II, and III) that's accelerated by trimming out the material their authors believe is important only for things like school placement exams; the modal adult Math Academy person is doing I, II, and III as a lead-up to their Math for Machine Learning course, which is linear algebra and multivariable calc.
I think it's one of the three most mind-blowing learning resources I've ever used. One of the other two: Lingua Latina: Familia Romana. In both cases, I have the uncanny certainty that I am operating at the limit of my ability to acquire and retain new information, which is a fun place to be.
Basically all of the cognitive science literature on learning that I am aware of says that the more you do directly and the less hand-holding you are given, the better your acquisition and long-term retention. In particular, having the LLM elaborate concepts for you is probably one of the worst things you can do when it comes to learning. Struggling through that elaboration process yourself is going to make the learning stick much more strongly, at least if the research is to be believed.
But for math tutoring? If you claim LLM math tutoring is demoware, you're very clearly telling on yourself.
And then I realized[0].
[0] https://ludic.mataroa.blog/blog/contra-ptaceks-terrible-arti...
For the record, I'm a systems programmer and a security person and I don't work for an AI company (you can Six Degrees of Sam Altman any startup to AI now if you want to make the claim, but if you try I'm just going to say "Sir, This Is A Wendy's".)
This piece feels like an “I tried it out however I could” piece vs. an “I spent time learning how others are learning math with LLMs” piece.
LLMs will make meaningful advances in personalized learning.
Some of the frameworks might evolve along the way.
Cooking: does the food taste better as you learn more?
Programming: are you able to build functioning software that does what you want it to do, better than you could earlier on in your path?
Fixing a broken dishwasher: does the dishwasher work again now?
The idea that learning only works if you have an expert on hand to verify that you are learning is one of those things that seems obviously true until you think harder about it.
And keep in mind, what it's getting right is trickier than just answering Calc I questions: it's taking an answer I give it, calculating the correct answer itself, selecting its answer over mine, and then spotting where I e.g. forgot to check the domain of a variable inside a log.
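To make that concrete, here's the kind of slip it catches (an illustrative example of mine, not a logged exchange):

```latex
\log_2 x + \log_2 (x - 2) = 3
\;\Longrightarrow\; x(x - 2) = 8
\;\Longrightarrow\; (x - 4)(x + 2) = 0
\;\Longrightarrow\; x \in \{4,\, -2\}.
% Domain check: both logarithms require x > 2, so x = -2 is extraneous
% and the only valid solution is x = 4.
```

Forgetting that final domain check yields a plausible-looking but wrong answer, which is exactly the class of error being described.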
Yes, the numeric examples often don't work. The consequences, though, are similar to a failed web search: it's not a big deal, and when it does work it's very helpful.
Maths is one of those things with so much objectivity that even the LLM usually realizes it has failed to create a numeric example. "Here the numeric example breaks down since we cannot find a congruence of squares in this example without finding more B-smooth numbers in step 1." OK, that's a shame; I would have loved to see an end-to-end numeric example.
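For the curious, here's roughly what such an end-to-end example looks like at toy scale. This is a sketch of my own, not anything from the thread: it brute-forces subsets of B-smooth relations, where a real Dixon or quadratic-sieve implementation would solve a linear system over GF(2), and the modulus, smoothness bound, and relation count are arbitrary toy choices.

```python
from itertools import combinations
from math import gcd, isqrt

def is_b_smooth(n: int, b: int) -> bool:
    """Trial-divide n by everything up to b; True if nothing is left over."""
    for p in range(2, b + 1):
        while n % p == 0:
            n //= p
    return n == 1

def toy_dixon(n: int, b: int = 7, want: int = 8):
    """Collect relations x^2 ≡ r (mod n) with r B-smooth, then brute-force
    a subset whose product of residues is a perfect square (a congruence
    of squares). May return None if no subset yields a nontrivial factor."""
    relations = []
    x = isqrt(n) + 1
    while len(relations) < want and x < n:
        r = (x * x) % n
        if r and is_b_smooth(r, b):
            relations.append((x, r))
        x += 1
    for k in range(1, len(relations) + 1):
        for subset in combinations(relations, k):
            prod_x, prod_r = 1, 1
            for xi, ri in subset:
                prod_x = (prod_x * xi) % n
                prod_r *= ri
            s = isqrt(prod_r)
            if s * s == prod_r:                  # congruence of squares found
                g = gcd((prod_x - s) % n, n)
                if 1 < g < n:
                    return g
    return None

print(toy_dixon(84923))  # prints a nontrivial factor: 84923 = 163 * 521
```

Running it factors 84923 in well under a second; at real cryptographic sizes both the relation hunt and the dependency search become the entire problem, which is what the LLM's example kept stumbling over.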
I think people get too hung up on any possibility of LLMs not being perfect while still being extremely helpful.
Meanwhile, there are math education resources like IXL that maybe cost a little money but whose lessons and practice problems are fully curated by human experts (AFAICT). I'm not saying these resources are perfect either, but as a mathematician who has experimented a lot with LLMs, including in supposed tutoring modes, they make a lot of mistakes and take a lot of shortcuts that should materially decrease their effectiveness as tutors.
[1] LLM-based tutoring (edit: footnote added to clarify)
I can't think of a single instance where o4 or GPT-5 got one of these problems wrong. They see maybe 6-12 of them per day from me. I've been doing this since February.
That appears to be their whole thing, and they've been in business for longer than LLMs have been around.
If you're working on educational math problems with solutions you can validate against the solutions. If you're working with proofs you can evaluate the proofs in a proof checker. Or you can run the resulting math expressions through a calculator.
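As a sketch of that last option (SymPy standing in as the "calculator"; the integral is my own example, not one from the thread), you can differentiate an LLM-claimed antiderivative and compare symbolically rather than trusting the model's algebra:

```python
import sympy as sp

x = sp.symbols('x')

# Suppose an LLM claims the antiderivative of 2x/(x^2 - 4) is ln(x^2 - 4).
claimed = sp.log(x**2 - 4)
integrand = 2 * x / (x**2 - 4)

# Differentiate the claim and compare with the integrand; a zero residual
# means the claim checks out (up to the constant of integration).
residual = sp.simplify(sp.diff(claimed, x) - integrand)
print("claim verified" if residual == 0 else f"mismatch: {residual}")
```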
Understanding whether the student has actually learned is a competency piece; in math it's mostly "show your work" and/or "did you get the right answer."
The continued top-down attempts to boil the whole sea with LLMs are part of the current problem.
It’s getting pretty good though for focused tutoring.
For students, models set up to tutor are too often trying to boil a sea (all of education) instead of a kiddie pool. The reality is that, increasingly, it seems K-6 if not K-12 students can be supported.
If we look at the EdTech space from the bottom up, namely learner-centric, there is both a real need and opportunity.
For school-age students, math largely has not changed in hundreds of years, and doesn't change often. Either you understand it or you have to put in the work.
There’s no shortage of human-created written teaching resources. A teacher could create their own tutoring assistant based on their explanations.
Alternatively, an open-source textbook could be fed in. There’s a reason why training or fine-tuning on books has caused lawsuits: it can increase accuracy manyfold.
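As a sketch of that grounding idea (naive keyword overlap standing in for real embedding search; the toy "textbook", function names, and chunking are all mine, not any product's method):

```python
# Minimal retrieval-grounded tutoring sketch: rank textbook chunks by
# word overlap with the question and prepend the best ones to the prompt.
def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks sharing the most words with the question."""
    q_words = set(question.lower().split())
    return sorted(chunks, key=lambda c: -len(q_words & set(c.lower().split())))[:k]

textbook = [
    "A fraction a/b is in lowest terms when gcd(a, b) = 1.",
    "To add fractions, rewrite them over a common denominator first.",
    "A prime number has exactly two divisors: 1 and itself.",
]

question = "How do I add two fractions?"
context = "\n".join(retrieve(question, textbook))
# The assembled prompt keeps the model anchored to the teacher's material.
print(f"Using only this textbook excerpt:\n{context}\n\nAnswer: {question}")
```

Real systems would use embeddings and a vector index, but the accuracy gain comes from the same place: the model answers from the teacher's text instead of from its weights alone.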
Teachers are burdened with repetitive marking; there’s definitely a place for personalized marking tools.
We know LLMs respond differently to different input. Their superpower is being able to regenerate an input in many different ways, which can include personalization.
Just because one has experimented with LLMs and come up short doesn’t mean there’s no way to get a good result from them; we may simply not have figured out how yet.
If examples of the chat logs or prompts that did or didn’t work can be provided, it helps us have a conversation without the subjectivity.
Mathematics is a great lens to see that folks are trying to get non-deterministic software to behave like all the deterministic software we’ve had before, instead of finding the places where non-deterministic strengths can shine.
It’s not all or nothing, or one or the other.
LLMs getting it wrong is terrible when it matters, but I also don't think it's a huge problem when it comes to acting as an additional learning resource. Here the parent is using a lesson plan that costs money and using the LLM for a little more explanation. It's similar to using web search on a topic: sometimes you get a hit, sometimes you don't.
Asking LLMs for numeric examples of complex maths sometimes fails. It's easy to spot and no great loss. When it works, though, it's extremely helpful to follow through.
There are probably smart ways to incorporate LLM output into an application like the one you're lauding but your comment is a little like responding "but my cake tastes good" to someone who says you shouldn't eat raw flour.
The fact that it can generate human language that is very compelling in certain contexts makes it seem capable of doing so in many, many more contexts.
I like the term. I have been using a similar phrase "looks good in a snippet" when referring to certain styles of programming.
One such instance was when Node.js was becoming popular and everyone was showing how easy concurrent programming can be with a few callbacks in a snippet. However, building a large codebase that way would eventually turn into a nightmare.
Another example is databases which don't fsync after writes by default. They look great in benchmarks (webscale even!), then in production suddenly some of the data goes missing. But at least those initial benchmark demos were impressive.
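The fsync point fits in a few lines (a generic sketch of the durability step, not any particular database's code):

```python
import os

# A "successful" write may still live only in the OS page cache; a crash
# before writeback loses it. Durability takes an explicit flush + fsync,
# which is the step benchmark-friendly defaults sometimes skip.
with open("journal.log", "a") as f:
    f.write("record\n")
    f.flush()              # push Python's userspace buffer into the kernel
    os.fsync(f.fileno())   # force the kernel to write to stable storage
```

Skipping that last line is exactly what makes the benchmarks look webscale.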
For one thing, last year's LLMs were nowhere near winning gold on collegiate math and programming competitions. That's because the "reasoning" thing hadn't kicked off yet - the first model to demonstrate that trick was o1 in ... OK that was September 12th 2024 so it just makes it to a year old now.
The key difference is that I can context switch. Once the AI has context and is doing its thing, I can move on to another task that's not working in the same area or project. I can post on HN. I can catch up on my Slack inbounds, or my email.
Having two tasks running at once nets a small but nice improvement in velocity. Having an AI task running while I'm doing other things effectively doubles my output.
The one I use creates the migrations, locally, for free and deterministically in about 30 seconds.
I think a lot of people are massively underestimating how much knowledge and skill is needed in software engineering beyond typing code into a text editor.
I think even if we ever reach actual AGI (in the far far future), we'll still want low level meatbags around to blame :-P
Edit: I mean their outputs are procedurally generated, like in https://en.m.wikipedia.org/wiki/Demoscene
This article seems to be baitware pushing an outdated perspective. LLMs have only gotten more powerful over the last three years (able to do more things), and so far not much has stopped them from becoming even more powerful (with the help of reasoning, other external methods, etc.).
"daily use" is so subjective and this article will be out dated soon as we get closer to an AGI (with LLMs as the base layer and not the main driver)
(I'm not denying the possibility. I'm proclaiming a lack of evidence.)
The only times I've personally seen LLMs engaged in repos have been handling issues, and they made an astounding mess that hurt far more often than it helped for anything beyond automatically tagging issues. And I don't see any LLMs allowed off the leash to make commits. Not in anything with actual downstream users.
GitHub Copilot: 247,000 https://github.com/search?q=is%3Apr+author%3Acopilot-swe-age... - is:pr author:copilot-swe-agent[bot]
Claude: 147,000 https://github.com/search?q=is%3Apr+in%3Abody+%28%22Generate... - is:pr in:body ("Generated with Claude Code" OR "Co-Authored-By: Claude" OR "Co-authored-by: Claude")
OpenAI Codex: ~2,000,000 (an over-estimate; there's no obvious author reference here, so this is just title or body containing "codex"): https://github.com/search?q=is%3Apr+%28in%3Abody+OR+in%3Atit... - is:pr (in:body OR in:title) codex
Suggestions for improvements to this methodology are welcome!
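One scriptable variant (a sketch using GitHub's REST search endpoint, which matches PRs too; unauthenticated calls are tightly rate-limited, and the web UI's OR syntax may not carry over to the API, so only the simple author query is shown):

```python
import json
import urllib.parse
import urllib.request

# Count PRs authored by the Copilot agent, mirroring the first search above.
query = "is:pr author:copilot-swe-agent[bot]"
url = "https://api.github.com/search/issues?q=" + urllib.parse.quote(query)
req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["total_count"])
```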
The thing that this comment misses, imo, is that LLMs are not always enabling people who previously couldn't create value to create value. In fact I think they are likely to cause some people who created value previously to create even less of it!
However that's not mutually exclusive with enabling others to create more value than they did previously. Is it a net gain for society? Currently I'd bet not, by a large margin. However is it a net gain for some individual users of LLMs? I suspect yes.
LLMs are a powerful tool for the right job, and as time goes on the "right job" keeps expanding to more territory. The problem is it's a tool that takes a keen eye to analyze and train on. It's not easy to use for reliable output. It's currently a multiplier for those willing to use it on the right jobs and with the right training (reviews, suspicion, etc).
I am not a researcher, but I am a tech lead and I've seen it work again and again: IDEs work, and LLMs work.
They are force multipliers, though; they absolutely work best with people who already know a bit of software engineering.
I think that highly productive people who have incorporated LLMs into their workflows are enjoying a productivity multiplier.
I don’t think it’s 2x, but it’s greater than 1x if I had to guess. It’s just one of those things that’s impossible to measure beyond a reasonable doubt.
One of my favorite uses: i have configured my window manager (Window Maker) so that when i press Win+/ it launches xterm with a script built around a custom C++ utility based on llama.cpp. The script uses xclip to grab whatever i have selected, combines it with a prompt asking a quantized version of Mistral Small 3.2 to suggest fixes for grammar and spelling mistakes in the text, and then filters the program's output through another utility that colorizes it using some simple regexes. Whenever i write any text i care about having (more) correct grammar and spelling (e.g. documentation - i do not use it for informal text like this one or in chat), i use it to find mistakes, since English is not my first language (and it tends to find a lot of them). Since the output is shown in a separate window (xterm) instead of replacing the text, i can check whether the correction is fine (and the act of actually typing the correction helps me remember some stuff... in theory at least :-P). [0] shows an example of how it looks.
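For anyone wanting to replicate something similar, here is a rough Python approximation of that pipeline (the original is a custom C++ llama.cpp utility; this sketch assumes a local llama-server on port 8080 and uses its /completion endpoint, and the prompt wording and port are my guesses):

```python
import json
import subprocess
import urllib.request

# Grab the current X11 primary selection, as xclip does in the setup above.
selected = subprocess.run(
    ["xclip", "-o", "-selection", "primary"],
    capture_output=True, text=True, check=True,
).stdout

prompt = ("List the grammar and spelling mistakes in the following text, "
          "with corrections:\n\n" + selected)

# Send it to a local llama.cpp server (llama-server) for completion.
req = urllib.request.Request(
    "http://127.0.0.1:8080/completion",
    data=json.dumps({"prompt": prompt, "n_predict": 512}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["content"])
```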
I also wrote a simple Tcl/Tk script that calls some of the above with more generalized queries, one of which translates text to English; i mainly use it to translate comments on Steam games[1] :-P. It is also helpful whenever i want to try something out quickly. For example, recently i thought that common email obfuscation techniques in text (like "some AT example DOT com") are pointless nowadays with LLMs, so i tried examples from a site i found online[2] (pretty much everything that didn't rely on JavaScript was defeated by Mistral Small).
As for programming, i used Devstral Small 1.0 once to make a simple raytracer, though i wrote about half of the code by hand since it was making a bunch of mistakes[3]. Also, recently i needed to scrape some data from a page; normally i'd do it by hand, but i was feeling bored at the time, so i asked Devstral to write a Python script using Beautiful Soup to do it for me, and it worked just fine.
None of the above are things i'd value at billions, though. But at the same time, i wouldn't have any other solution for the grammar and translation stuff (at least not one that's free and under my control).
[0] https://i.imgur.com/f4OrNI5.png
[1] https://i.imgur.com/jPYYKCd.png
In particular, "producing stuff" is not necessarily "creating value"; some stuff has _negative_ value.
Lots of vibes and feelings, but zero measurable impact.
Rumors say that Google wasn't far behind at the time, but didn't push releases. Perhaps because they were not that impressed by the applications or did not want "AI" to cannibalize their other products.
So it seems very likely that everything has been squeezed out of the decades of research and we have plateaued.
Desperate measures like Nvidia buying its own graphics cards through circular investment schemes do not inspire confidence either. Nor does Microsoft now doing Copilot product placement ads on teen-oriented YouTube channels. When Google launched, people just used it because it was good. This all fits very well with the demoware angle of the article.