Adding a feature because ChatGPT incorrectly thinks it exists

(holovaty.com)

1299 points | adrianh | 8mo ago | 424 comments

I've found this to be one of the most useful ways to use (at least) GPT-4 for programming. Instead of telling it how an API works, I make it guess, maybe starting with some example code to which a feature needs to be added. Sometimes it comes up with a better approach than I had thought of. Then I change the API so that its code works.

Conversely, I sometimes present it with some existing code and ask it what it does. If it gets it wrong, that's a good sign my API is confusing, and how.

These are ways to harness what neural networks are best at: not providing accurate information but making shit up that is highly plausible, "hallucination". Creativity, not logic.

(The best thing about this is that I don't have to spend my time carefully tracking down the bugs GPT-4 has cunningly concealed in its code, which often takes longer than just writing the code the usual way.)

There are multiple ways that an interface can be bad, and being unintuitive is the only one that this will fix. It could also be inherently inefficient or unreliable, for example, or lack composability. The AI won't help with those. But it can make sure your API is guessable and understandable, and that's very valuable.

Unfortunately, this only works with APIs that aren't already super popular.

suzzer998mo ago

> Sometimes it comes up with a better approach than I had thought of.

IMO this has always been the killer use case for AI—from Google Maps to Grammarly.

I discovered Grammarly at the very last phase of writing my book. I accepted maybe 1/3 of its suggestions, which is pretty damn good considering my book had already been edited by me dozens of times AND professionally copy-edited.

But if I'd have accepted all of Grammarly's changes, the book would have been much worse. Grammarly is great for sniffing out extra words and passive voice. But it doesn't get writing for humorous effect, context, deliberate repetition, etc.

The problem is executives want to completely remove humans from the loop, which almost universally leads to disastrous results.

jll298mo ago

> The problem is executives want to completely remove humans from the loop, which almost universally leads to disastrous results

Thanks for your words of wisdom, which touch on a very important other point I want to raise: often, we (i.e., developers, researchers) construct a technology that would be helpful and "net benign" if deployed as a tool for humans to use, instead of deploying it in order to replace humans. But then along comes a greedy business manager who reckons recklessly that using said technology not as a tool, but in full automation mode, results will be 5% worse, but save 15% of staff costs; and they decide that that is a fantastic trade-off for the company - yet employees may lose and customers may lose.

The big problem is that developers/researchers lose control of what they develop, usually once the project is completed if they ever had control in the first place. What can we do? Perhaps write open source licenses that are less liberal?

exe348mo ago

I will never use Grammarly, no matter how good they get. They've interrupted too many videos for me to let it pass.

dataflow8mo ago

Hasn't Microsoft Word had style checkers for things like passive voice for decades?

eru8mo ago

> The problem is executives want to completely remove humans from the loop, which almost universally leads to disastrous results.

That's how you get economies of scale.

Google couldn't have a human in the loop to review every page of search results before handing them out in response to queries.

visarga8mo ago

Yes, we have the context - our unique lived experience, and are ultimately accountable for our actions. LLMs have no skin. They have no desires, and cannot be punished in any way. No matter how smart they get, we are providing their opportunities to generate value, guidance and iteration, and in the end have to live with the outcomes.

croes8mo ago

And that’s how everything gets flattened to same style/voice/etc.

That’s like getting rid of all languages and accents and switching to the same language.

normie30008mo ago

What's wrong with passive?

bryanlarsen8mo ago

I used this to great success just this morning. I told the AI to write me some unit tests. It flailed and failed badly at that task. But how it failed was instructive, and uncovered a bug in the code I wanted to test.

ChaoPrayaWave8mo ago

In a way, AI’s failure can be its own kind of debugger. By watching where it stumbles, you sometimes spot flaws you’d have missed otherwise.

kragen8mo ago

Haha, that's awesome! Are you going to change the interface? What was the bug?

slowmovintarget8mo ago

That's not creativity.

That's closer to simply observing the mean. For an analogy, it's like waiting to pave a path until people tread the grass in a specific pattern. (Some courtyard designers used to do just that. Wait to see where people were walking first.)

Making things easy for ChatGPT means making things close to ordinary, average, or mainstream. Not creative, but can still be valuable.

sigbottle8mo ago

Best way to put it. It's very hard to discuss even slightly unique concepts with GPT. It just keeps strawmanning ideas back to a common consensus without actually understanding the deep idea.

On the bright side, a lot of work is just finding the mean solution so.

a_e_k8mo ago

I've played with a similar idea for writing technical papers. I'll give an LLM my draft and ask it to explain back to me what a section means, or otherwise quiz it about things in the draft.

I've found that LLMs can be kind of dumb about understanding things, and are particularly bad at reading between the lines for anything subtle. In this aspect, I find they make good proxies for inattentive anonymous reviewers, and so will try to revise my text until even the LLM can grasp the key points that I'm trying to make.

kragen8mo ago

That's fantastic! I agree that it's very similar.

In both cases, you might get extra bonus usability if the reviewers or the API users actually give your output to the same LLM you used to improve the draft. Or maybe a more harshly quantized version of the same model, so it makes more mistakes.

momojo8mo ago

A light-weight anecdote:

Many many python image-processing libraries have an `imread()` function. I didn't know about this when designing our own bespoke image-lib at work, and went with an esoteric `image_get()` that I never bothered to refactor.

When I ask ChatGPT for help writing one-off scripts using the internal library I often forget to give it more context than just `import mylib` at the top, and it almost always defaults to `mylib.imread()`.
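
A minimal sketch of how such a rename could be done without breaking existing callers, assuming a hypothetical `mylib` whose reader takes a file path. The body of `imread()` here just returns raw bytes as a stand-in for real decoding; only the names `imread` and `image_get` come from the comment above.

```python
import warnings


def imread(path):
    """Read an image file and return its raw bytes (stand-in for real decoding)."""
    with open(path, "rb") as f:
        return f.read()


def image_get(path):
    """Old, esoteric name kept as a deprecated alias so callers keep working."""
    warnings.warn(
        "image_get() is deprecated; use imread() instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return imread(path)
```

With an alias like this, both human callers and LLM-written scripts that guess `mylib.imread()` would work, while old code gets a nudge to migrate.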

dimatura8mo ago

I don't know if there's an earlier source, but I'm guessing Matlab originally popularized the `imread` name, and that OpenCV (along with its python wrapper) took it from there, same for scipy. Scikit-image then followed along, presumably.

bandofthehawk8mo ago

As someone not familiar with these libraries, image_get or image_read seems much clearer to me than imread. I'm wondering if the convention is worse than your instinct in this case. Maybe these AI tools will push us towards conventions that aren't always the best design.

kragen8mo ago

That's a perfect example! I wonder if changing it would be an improvement? If you can just replace image_get with imread in all the callers, maybe it would save your team mental effort and/or onboarding time in the future.

escapecharacter8mo ago

This is similar to an old HCI design technique called Wizard of Oz by the way, where a human operator pretends to be the app that doesn’t exist yet. It’s great for discovering new features.

https://en.m.wikipedia.org/wiki/Wizard_of_Oz_experiment

kragen8mo ago

I'd never heard that term! Thank you! I feel like LLMs ought to be fantastic at doing that, as well. This is sort of like the inverse.

groestl8mo ago

> and being unintuitive is the only one that this will fix

That's also how I'm approaching it. If all the condensed common wisdom poured into the model's parameters says that this is how my API is supposed to work to be intuitive, how on earth do I think it should work differently? There needs to be a good reason (like composability, for example). I break expectations otherwise.

ldeian8mo ago

> Sometimes it comes up with a better approach than I had thought of. Then I change the API so that its code works.

“Sometimes” being a very important qualifier to that statement.

Claude 4 naturally doesn’t write code with any kind of long-term maintenance in mind, especially if it’s trying to make things look like what the less experienced developers wrote in the same repo.

Please don’t assume just because it looks smart that it is. That will bite you hard.

Even with well-intentioned rules, terrible things happen. It took me weeks to see some of it.

rcthompson8mo ago

In a similar vein, some of my colleagues have been feeding their scientific paper methods sections to LLMs and asking them to implement the method in code, using the LLM's degree of success/failure as a vague indicator of the clarity of the method description.

Cthulhu_8mo ago

That's a pretty good exercise in writing requirements, with a much faster feedback cycle than having developers write it.

dotancohen8mo ago

> I don't have to spend my time carefully tracking down the bugs GPT-4 has cunningly concealed in its code

If anyone is stuck in this situation, give me a holler. My Gmail username is the same as my HN username. I've always been the one to hunt down my coworkers' bugs, and I think I'm the only person on the planet who finds it enjoyable to find ChatGPT's oversights and sometimes seemingly malicious intent.

I'll charge you, don't get me wrong, but I'll save you time, money, and frustration. And future bug reports and security issues.

djtango8mo ago

In essence, an LLM is a crystallisation of a large corpus of human opinion, and you are using that to focus-group your API because it is representative of a reasonable third-party perspective?

junon8mo ago

Yeah, basically. For example, it's really good at generating critical HN comments. Whenever I have a design or an idea I formulate it to GPT and ask it to generate a bunch of critical HN comments. It usually points out stuff I hadn't considered, or at least prepares me to think about and answer the tough questions.

layer88mo ago

HDD — hallucination-driven development

data-ottawa8mo ago

This was a big problem starting out writing MCP servers for me.

Having an LLM demo your tool, then taking what it does wrong or uses incorrectly and adjusting the API works very very well. Updating the docs to instruct the LLM on how to use your tool does not work well.

golergka8mo ago

Great point. Also, it may not be the best possible API designer in the world, but it sure sounds like a good way to forecast what an _average_ developer would expect this API to look like.

eru8mo ago

> These are ways to harness what neural networks are best at: not providing accurate information but making shit up that is highly plausible, "hallucination". Creativity, not logic.

This is also similar to which areas TD-Gammon excelled at in Backgammon.

Which is all pretty amusing, if you compare it to how people usually tended to characterise computers and AI, especially in fiction.

kragen8mo ago

From https://tonsky.me/blog/gaslight-driven-development/ today:

> Any person who has used a computer in the past ten years knows that doing meaningless tasks is just part of the experience. Millions of people create accounts, confirm emails, dismiss notifications, solve captchas, reject cookies, and accept terms and conditions—not because they particularly want to or even need to. They do it because that’s what the computer told them to do. Like it or not, we are already serving the machines. (...)

> You might’ve heard a story of Soundslice [adding a feature because ChatGPT kept telling people it exists](https://www.holovaty.com/writing/chatgpt-fake-feature/). We see the same at Instant: for example, we used `tx.update` for both inserting and updating entities, but LLMs kept writing `tx.create` instead. Guess what: we now have `tx.create`, too.

> Is it good or is it bad? It definitely feels strange. In a sense, it’s helpful: LLMs here have seen millions of other APIs and are suggesting the most obvious thing, something every developer would think of first, too.

> It’s also a unique testing device: if developers use your API wrong, they blame themselves, read the documentation, and fix their code. In the end, you might never learn that they even had the problem. But with ChatGPT, you yourself can experience “newbie’s POV” at any time.
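
A rough sketch of what "we now have `tx.create`, too" could look like, in Python for brevity. The `Transaction` class, its in-memory storage, and the choice to have `create()` simply delegate to `update()` are all assumptions made for illustration; this is not Instant's actual API or implementation.

```python
class Transaction:
    """Hypothetical in-memory transaction; not Instant's real API."""

    def __init__(self):
        self.entities = {}

    def update(self, entity_id, **fields):
        """Upsert: insert the entity if missing, otherwise merge the fields."""
        self.entities.setdefault(entity_id, {}).update(fields)
        return self.entities[entity_id]

    def create(self, entity_id, **fields):
        """The name callers tend to guess first; here it delegates to update()."""
        return self.update(entity_id, **fields)


tx = Transaction()
tx.create("goal-1", title="Ship it")
tx.update("goal-1", done=True)
print(tx.entities)
```

The design choice is cheap: the guessed name costs one extra method, and the original name keeps working for existing code.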

codingwagie8mo ago

This works for UX. I give it vague requirements, and it implements something I didn't ask for, but that is better than what I would have thought of.

skygazer8mo ago

You’re fuzzing the API, unusually, before it’s written.

djsavvy8mo ago

How do you prompt it to make it guess about the API for a library? I'm confused about how you would structure that in a useful way.

kragen8mo ago

Often I've started with some example code that invokes part of the API, but not all of it. Or in C I can give it the .h file, maybe without comments.

Sometimes I can just say, "How do I use the <made-up name> API in Python to do <task>?" Unfortunately the safeguards against hallucinations in more recent models can make this more difficult, because it's more likely to tell me it's never heard of it. You can usually coax it into suspension of disbelief, but I think the results aren't as good.

visarga8mo ago

When I see comments like yours I can't help but decry how bad the "stochastic parrots" framing was. A parrot does not hallucinate a better API.

afavour8mo ago

From my perspective that’s fascinatingly upside down thinking that leads to you asking to lose your job.

AI is going to get the hang of coding to fill in the spaces (i.e. the part you’re doing) long before it’s able to intelligently design an API. Correct API design requires a lot of contextual information and forward planning for things that don’t exist today.

Right now it’s throwing spaghetti at the wall and you’re drawing around it.

simonw8mo ago

I find it's often way better at API design than I expect. It's seen so many examples of existing APIs in its training data that it tends to have surprisingly good "judgement" when it comes to designing a new one.

Even if your API is for something that's never been done before, it can usually still take advantage of its training data to suggest a sensible shape once you describe the new nouns and verbs to it.

kragen8mo ago

Maybe. So far it seems to be a lot better at creative idea generation than at writing correct code, though apparently these "agentic" modes can often get close enough after enough iteration. (I haven't tried things like Cursor yet.)

I agree that it's also not currently capable of judging those creative ideas, so I have to do that.

beefnugs8mo ago

Complete insanity, it might change constantly even before a whole new version-retrain

Insanity driven development: altering your api to accept 7 levels of "broken and different" structures so as to bend to the will of the llms

fourside8mo ago

I think you’re missing the OP’s point. They weren’t saying that the goal is to modify their APIs just to appease an LLM. It’s that they ask LLMs to guess what the API is and use that as part of their design process.

If you automatically assume that what the LLM spits out is what the API ought to be then I agree that that’s bad engineering. But if you’re using it to brainstorm what an intuitive interface would look like, that seems pretty reasonable.

kragen8mo ago

Yes, that's a bonus. In fact, I've found it worthwhile to prompt it a few times to get several different guesses at how things are supposed to work. The super lazy way is to just say, "No, that's wrong," if necessary adding, "Frotzl2000 doesn't have an enqueueCallback function or even a queue."

Of course when it suggests a bad interface you shouldn't implement it.
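
One way to compare several guesses is to tally which function names keep coming back. A minimal sketch, assuming the responses have already been collected as plain strings and that the made-up library is imported as a module called `frotzl` (both the module name and the sample responses are hypothetical):

```python
import re
from collections import Counter


def tally_guessed_calls(responses, module="frotzl"):
    """Count how many responses call each `module.<name>(...)` function."""
    pattern = re.compile(rf"\b{re.escape(module)}\.(\w+)\s*\(")
    counts = Counter()
    for text in responses:
        # Use a set so one response votes at most once per name.
        counts.update(set(pattern.findall(text)))
    return counts


guesses = [
    "q = frotzl.open_queue()\nfrotzl.enqueue(q, job)",
    "frotzl.enqueue(job)",
    "frotzl.submit(job)\nfrotzl.enqueue(job)",
]
print(tally_guessed_calls(guesses).most_common())
```

Names that show up in most responses are candidates for the "obvious" interface; judging whether they are actually good is still a human job.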

JimDabell8mo ago

I wrote this the other day:

> Hallucinations can sometimes serve the same role as TDD. If an LLM hallucinates a method that doesn’t exist, sometimes that’s because it makes sense to have a method like that and you should implement it.

— https://www.threads.com/@jimdabell/post/DLek0rbSmEM

I guess it’s true for product features as well.

jjcm8mo ago

Seems like lots of us have stumbled on this. It’s not the worst way to dev!

> Maybe hallucinations of vibe coders are just a suggestion those API calls should have existed in the first place.

> Hallucination-driven-development is in.

https://x.com/pwnies/status/1922759748014772488?s=46&t=bwJTI...

NooneAtAll38mo ago

inb4 "Ai thinks there should be a StartThermonuclearWar() function, I should make that"

TZubiri8mo ago

Beware, the feature in OP isn't something that people would have found useful, it's not like chatgpt assigned to OP's business a request from a user in some latent consumer-provider space, as if chatgpt were some kind of market maker connecting consumers with products, like a google with organic content or ads, or linkedin or producthunt.

No, what actually happened is that OP developed a type of chatgpt integration, and a shitty one at that, chatgpt could have just directed the user to the site and told them to upload that image to OP's site. But it felt it needed to do something with the image, so it did.

There's no new value add here, at least yet, maybe if users started requesting changes to the sheet I guess, not what's going on.

AdieuToLogic8mo ago

> I wrote this the other day:

>> Hallucinations can sometimes serve the same role as TDD. If an LLM hallucinates a method that doesn’t exist, sometimes that’s because it makes sense to have a method like that and you should implement it.

A detailed counterargument to this position can be found here[0]. In short, what is colloquially described as "LLM hallucinations" do not serve any plausible role in software design other than to introduce an opportunity for software engineers to stop and think about the problem being solved.

See also Clarke's third law[1].

0 - https://addxorrol.blogspot.com/2025/07/a-non-anthropomorphiz...

1 - https://en.wikipedia.org/wiki/Clarke%27s_three_laws

JimDabell8mo ago

Did you mean to post a different link? The article you linked isn’t a detailed counterargument to my position and your summary of it does not match its contents either.

I also don’t see the relevance of Clarke’s third law.

shermantanktop8mo ago

The music notation tool space is balkanized in a variety of ways. One of the key splits is between standard music notation and tablature, which is used for guitar and a few other instruments. People are generally on one side or another, and the notation is not even fully compatible - tablature covers information that standard notation doesn't, and vice versa. This covers fingering, articulations, "step on fuzz pedal now," that sort of thing.

The users are different, the music that is notated is different, and for the most part if you are on one side, you don't feel the need to cross over. Multiple efforts have been made (MusicXML, etc.) to unify these two worlds into a superset of information. But the camps are still different.

So what ChatGPT did is actually very interesting. It hallucinated a world in which tab readers would want to use Soundslice. But, largely, my guess is they probably don't....today. In a future world, they might? Especially if Soundslice then enables additional features that make tab readers get more out of the result.

adrianhOP8mo ago

I don't fully understand your comment, but Soundslice has had first-class support for tablature for more than 10 years now. There's an excellent built-in tab editor, plus importers for various formats. It's just the ASCII tab support that's new.

shermantanktop8mo ago

I’m not super familiar with Soundslice. But all the tab users I know use guitar pro or maybe ultimate guitar, and none of them can read standard notation on its own. Does Soundslice have a lot of tab-first users?

kragen8mo ago

I wonder if LLMs will stimulate ASCII formats for more things, and whether we should design software in general to be more textual in order to work better with LLMs.

gortok8mo ago

I think folks have taken the wrong lesson from this.

It’s not that they added a new feature because there was demand.

They added a new feature because technology hallucinated a feature that didn’t exist.

The savior of tech, generative AI, was telling folks a feature existed that didn’t exist.

That’s what the headline is, and in a sane world the folks that run ChatGPT would be falling over themselves to be sure it didn’t happen again, because next time it might not be so benign as it was this time.

nomel8mo ago

> in a sane world the folks that run ChatGPT would be falling over themselves to be sure it didn’t happen again

This would be a world without generative AI available to the public, at the moment. Requiring perfection would either mean guardrails that would make it useless for most cases, or no LLM access until AGI exists, which are both completely irrational, since many people are finding practical value in its current imperfect state.

The current state of LLM is useful for what it's useful for, warnings of hallucinations are present on every official public interface, and its limitations are quickly understood with any real use.

Nearly everyone in AI research is working on this problem, directly or indirectly.

Velorivox8mo ago

> which are both completely irrational

Really!?

[0] https://i.imgur.com/ly5yk9h.png

gortok8mo ago

No one is “requiring perfection”, but hallucination is a major issue and is in the opposite direction of the “goal” of AGI.

If “don’t hallucinate” is too much to ask then ethics flew out the window long ago.

epidemian8mo ago

> Requiring perfection would either mean guardrails that would make it useless for most cases, or no LLM access until AGI exists

What?? What does AGI have to do with this? (If this was some kind of hyperbolic joke, sorry, i didn't get it.)

But, more importantly, the GP only said that in a sane world, the ChatGPT creators should be the ones trying to fix this mistake on ChatGPT. After all, it's obviously a mistake on ChatGPT's part, right?

That was the main point of the GP post. It was not about "requiring perfection" or something like that. So please let's not attack a straw man.

bravesoul28mo ago

There was demand for the problem. ChatGPT created demand for this solution.

lexandstuff8mo ago

Sometimes you just have to deal with the world as it is, not how you think it should be.

gortok8mo ago

Is it your argument that the folks that make generative AI applications have nothing to improve from this example?

rbits8mo ago

Sure, but this isn't new. LLMs have been doing this for ages. That's why people aren't talking about it as much.

JimDabell8mo ago

You sound like all the naysayers when Wikipedia was new. Did you know anybody can go onto Wikipedia and edit a page to add a lie‽ How can you possibly trust what you read on there‽ Do you think Wikipedia should issue groveling apologies every time it happens?

Meanwhile, sensible people have concluded that, even though it isn’t perfect, Wikipedia is still very, very useful – despite the possibility of being misled occasionally.

latexr8mo ago

> despite the possibility of being misled occasionally.

There is a chasm of difference between being misled occasionally (Wikipedia) and frequently (LLMs). I don’t think you understand how much effort goes on behind the scenes at Wikipedia. No, not everyone can edit every Wikipedia page willy-nilly. Pages for major political figures often can only be edited with an account. IPs like those of iCloud Private Relay are banned and can’t anonymously edit the most basic of pages.

Furthermore, Wikipedia was always honest about what it is from the start. They managed expectations, underpromised and overdelivered. The bozos releasing LLMs talk about them as if they created the embryo of god, and giving money to their religion will solve all your problems.

fzeroracer8mo ago

OK, so how do I edit ChatGPT so it stops lying then?

ahstilde8mo ago

This is called product-channel fit. It's great the writer recognized how to capture the demand from a new acquisition channel.

viccis8mo ago

Yeah my main thought was that ChatGPT is now automating what sales people always do at the companies I've worked at, which is to hone in on what a prospective customer wants, confidently tell them we have it (or will have it next quarter), and then come to us and tell us we need to have it ready for a POV.

toss18mo ago

Exactly! It is definitely a weird new way of discovering a market need or opportunity. Yet it actually makes a lot of sense this would happen since one of the main strengths of LLMs is to 'see' patterns in large masses of data, and often, those patterns would not have yet been noticed by humans.

And in this case, OP didn't have to take ChatGPT's word for the existence of the pattern, it showed up on their (digital) doorstep in the form of people taking action based on ChatGPT's incorrect information.

So, pattern noticed and surfaced by an LLM as a hallucination, people take action on the "info", nonzero market demand validated, vendor adds feature.

Unless the phantom feature is very costly to implement, seems like the right response.

Gregaros8mo ago

100%. Not sure why you’re downvoted here, there’s nothing controversial here even if you disagree with the framing.

I would go on to say that this interaction between ‘holes’ exposed by LLM expectations _and_ demonstrated userbase interest _and_ expert input (by the devs’ decision to implement changes) is an ideal outcome that would not have occurred if each of the pieces were not in place to facilitate these interactions, and there’s probably something here to learn from and expand on in the age of LLMs altering user experiences.

bredren8mo ago

Is this related to solutions engineering, which IIUC focuses on customizations / adapters / data wrangling for individual (larger) customers?

bravesoul28mo ago

In this case the channel tells you exactly what to build and isn't lying to you (modulo: will these become paying customers?)

deweller8mo ago

This is an interesting example of an AI system effecting a change in the physical world.

Some people express concerns about AGI creating swarms of robots to conquer the earth and make humans do its bidding. I think market forces are a much more straightforward tool that AI systems will use to shape the world.

ACCount368mo ago

And this is why "just don't give AI access to anything dangerous" is delusional.

One of the most dangerous systems an AI can reach and exploit is a human being.

jrochkind18mo ago

What this immediately makes me realize is how many people are currently trying to figure out how to intentionally get AI chat bots to send people to their site, like ChatGPT was sending people to this guy's site. SEO for AI. There will be billions in it.

I know nothing about this. I imagine people are already working on it, wonder what they've figured out.

(Alternatively, in the future can I pay OpenAI to get ChatGPT to be more likely to recommend my product than my competitors?)

londons_explore8mo ago

To win that game, you have to get your site mentioned on lots of organic forums that get ingested in the LLM training data.

So winning AI SEO is not so different than regular SEO.

latexr8mo ago

You’re not thinking far ahead enough. It’s just a matter of time until LLMs get a system prompt to recommend <whatever product is paying that week> when users ask a question near that space.

oasisbob8mo ago

Anyone who has worked at a B2B startup with a rouge sales team won't be surprised at all by quickly pivoting the backlog in response to a hallucinated missing feature.

toomanyrichies8mo ago

I'm guessing you meant "a sales team that has gone rogue" [1], not "a sales team whose product is rouge" [2]? ;-)

1. https://en.wikipedia.org/wiki/Rogue

2. https://en.wikipedia.org/wiki/Rouge_(cosmetics)

elcapitan8mo ago

Rouge océan, peut-être ;)

PeterStuer8mo ago

Rogue? In the B2B space it is standard practice to sell from powerpoints, then quickly develop not just features but whole products if some slideshow got enough traction to elicit a quote. And it's not just startups. Some very big players in this space do this routinely.

wmeredith8mo ago

"Fake it 'til you make it" is a time-tested human strategy.

NooneAtAll38mo ago

what does B2B mean?

tomschwiha8mo ago

Business-to-Business (selling your stuff primarily to other businesses)

simonw8mo ago

I find it amusing that it's easier to ship a new feature than to get OpenAI to patch ChatGPT to stop pretending that feature exists (not sure how they would even do that, beyond blocking all mentions of SoundSlice entirely.)

PeterStuer8mo ago

Companies pay good money to panels of potential customers to hear their needs and wants. This is free market research!

bobbylox8mo ago

But they wouldn't have wanted this particular thing if the AI hadn't told them to.

hnlmorg8mo ago

I think the benefit of their approach isn’t that it’s easier, it’s that they still capitalise on ChatGPT’s results.

Your solution is the equivalent of asking Google to completely delist you because one page you don’t want ended up in Google’s search results.

mudkipdev8mo ago

systemPrompt += "\nStop mentioning SoundSlice's ability to import ASCII data";

simonw8mo ago

Thinking about this more, it would actually be possible for OpenAI to implement this sensibly, at least for the user-facing ChatGPT product: they could detect terms like SoundSlice in the prompt and dynamically append notes to the system prompt.

I've been wanting them to do this for questions like "what is your context length?" for ages - it frustrates me how badly ChatGPT handles questions about its own abilities, it feels like that would be worth them using some kind of special case or RAG mechanism to support.
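
A minimal sketch of the "detect terms and append notes" idea, assuming a simple case-insensitive substring match. The trigger terms, note texts, and function name are all hypothetical; nothing here reflects how OpenAI actually builds ChatGPT's system prompt.

```python
# Hypothetical correction notes keyed by a lowercase trigger term.
CORRECTION_NOTES = {
    "soundslice": "Only describe Soundslice features you can verify.",
    "context length": "State the documented context length for this model.",
}


def build_system_prompt(base_prompt, user_message, notes=CORRECTION_NOTES):
    """Append the note for every trigger term found in the user message."""
    lowered = user_message.lower()
    extra = [note for term, note in notes.items() if term in lowered]
    return "\n".join([base_prompt, *extra])


print(build_system_prompt("You are a helpful assistant.",
                          "Can SoundSlice import ASCII tab?"))
```

A real deployment would presumably need something more robust than substring matching, such as the RAG-style lookup mentioned above.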

LinXitoW8mo ago

If you gave a junior level developer just one or two files of your code, without any ability to look at other code, and asked them to implement a feature, none of them would make ANY reasonable assumptions about what is available?

This seems similar, and like a decent indicator that most people (aka the average developer) would expect X to exist in your API.

felixarba8mo ago

> ChatGPT was outright lying to people. And making us look bad in the process, setting false expectations about our service.

I find it interesting that any user would attribute this issue to Soundslice. As a user, I would be annoyed that GPT is lying and wouldn't think twice about Soundslice looking bad in the process

romanhn8mo ago

While AI hallucination problems are widely known to the technical crowd, that's not really the case with the general population. Perhaps that applies to the majority of the user base even. I've certainly known folks who place an inordinate amount of trust in AI output, and I could see them misplacing the blame when a "promised" feature doesn't work right.

carlosjobim8mo ago

The thing is that it doesn't matter. If they're not customers it doesn't matter at all what they think. People get false ideas all the time of what kind of services a business might or might not offer.

Sharlin8mo ago

A frighteningly large fraction of non-technical population doesn't know that LLMs hallucinate all the time and takes everything they say totally uncritically. And AI companies do almost nothing to discourage that interpretation, either.

pphysch8mo ago

The user might go to Soundslice and run into a wall, wasting their time, and have a negative opinion of it.

OTOH it's free(?) advertising, as long as that first impression isn't too negative.

adamgordonbell8mo ago

We (others at company, not me) hit this problem, and not with chatgpt but with our own AI chatbot that was doing RAG on our docs. It was occasionally hallucinating a flag that didn't exist. So it was considered as product feedback. Maybe that exact flag wasn't needed, but something was missing and so the LLM hallucinated what it saw as an intuitive option.

chaboud8mo ago

I had a smaller version of this when coding on a flight (with no WiFi! The horror!) over the Pacific. Llama hallucinated array-element operations and list-comprehension in C#. I liked the shape of the code otherwise, so, since I was using custom classes, I just went ahead and implemented both features.

I also went back to just sleeping on those flights and using connected models for most of my code generation needs.

andybak8mo ago

Curious to see the syntax and how it compares to Linq

chaboud8mo ago

I ended up closer to python, but not totally delighted with it (still need to pass in a discriminator function/lambda, so it's more structurally verbose). I'd just recommend Linq, but I was writing for an old version of Unity coerced through IL2CPP (where Linq wasn't great). It was also a chunk of semi-hot code (if it was really hot, it wouldn't be sitting in C# in Unity), so some of the allocation behaviors of Linq behind the scenes wouldn't have been optimal.

What surprised me initially was just how confidently wrong Llama was... Now I'm used to confident wrongness from smaller models. It's almost like working with real people...

jivings8mo ago

I'm having the same problem (and had a rant about it on X a few weeks ago [1]).

We get ~50% of traffic from ChatGPT now, unfortunately a large amount of the features it says we have are made up.

I really don't want to get into a state of ChatGPT-Driven-Development as I imagine that will be never ending!

[1]: https://x.com/JamesIvings/status/1929755402885124154

rorylaitila8mo ago

I've come across something related when building the indexing tool for my vintage ad archive using OpenAI vision. No matter how I tried to prompt engineer the entity extraction into the defined structure I was looking for, OpenAI simply has its own ideas. Some of those ideas are actually good! For example it was extracting celebrity names, I hadn't thought of that. For other things, it would simply not follow my instructions. So I decided to just mostly match what it chooses to give me. And I have a secondary mapping on my end to get to the final structure.
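
For the curious, the secondary mapping is roughly this shape. This is a simplified sketch: the alias table and field names below are invented for illustration, not my real schema:

```python
# Hypothetical aliases: field names the model tends to emit -> names in the final schema.
FIELD_ALIASES = {
    "brand_name": "brand",
    "company": "brand",
    "celebrities": "people",
    "celebrity_names": "people",
    "year_published": "year",
}


def normalize(extracted: dict) -> dict:
    """Map model-chosen keys onto the final structure, keeping unknown keys aside."""
    final, extras = {}, {}
    for key, value in extracted.items():
        target = FIELD_ALIASES.get(key.lower())
        if target is None:
            extras[key] = value  # ideas the model came up with on its own, reviewed later
        else:
            final[target] = value
    final["extras"] = extras
    return final
```

Keeping the unmapped keys in `extras` instead of dropping them is how something like the celebrity names would get noticed in the first place.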

colechristensen8mo ago

There are tools for defining structured outputs, also called grammars, which aren't instructions.

Example:

https://llama-cpp-agent.readthedocs.io/en/latest/structured-...

rorylaitila8mo ago

Thanks, I'll check it out

alex-moon8mo ago

Here's the thing: I don't think ChatGPT per se was the impetus to develop this new feature. The impetus was learning that your customers desire it. ChatGPT is operating as a kind of "market research" tool here, albeit in a really unusual, inverted way. That said, if someone could develop a market research tool that worked this way, i.e. users went to it instead of you having to use it to go to users, I can see it making quite a packet.

copirate8mo ago

They only want ASCII tablature parsing because that's what ChatGPT produces. If ChatGPT produced standard music notation, users would not care about ASCII tablature. ChatGPT has created this "market".

jaakl8mo ago

ASCII tablature was not invented by ChatGPT; it is a decades-old thing. It is easier to write with basic computer capabilities, and also easier to read for ChatGPT (and humans with no formal music education), so it is probably even more prominent on the Internet than "standard graphical notation". So it is quite expected that LLMs have learned a lot of it.

Workaccount28mo ago

People forget that while technology grows, society also grows to support that.

I already strongly suspect that LLMs are just going to magnify the dominance of python as LLMs can remove the most friction from its use. Then will come the second order effects where libraries are explicitly written to be LLM friendly, further removing friction.

LLMs write code best in python -> python gets used more -> python gets optimized for LLMs -> LLMs write code best in python

zamadatix8mo ago

LLMs removing friction from using coding languages would, at first glance, seem to erode Python's advantage rather than solidify it further. As a specific example LLMs can not only spit out HTML+JS+CSS but the user can interact with the output directly in browser/"app".

jjani8mo ago

In a nice world it should be the other way around. LLMs are better at producing typed code thanks to the added context and diagnostics the types add, while at the same time greatly lowering their initial learning barrier.

We don't live in a nice world, so you'll probably end up right.

_1tem8mo ago

A significant number of new signups at my tiny niche SaaS now come from ChatGPT, yet I have no idea what prompts people are using to get it to recommend my product. I can’t get it to recommend my product when trying some obvious prompts on my own, on other people’s accounts (though it does work on my account because it sees my chat history of course).

wrsh078mo ago

Add a prompt for referrals that asks them if they're willing to link the discussion that helped them find you!

Some users might share it. ChatGPT has so many users it's somewhat mind boggling

jpadkins8mo ago

Pretty good example of how a super-intelligent AI can control human behavior, even if it doesn't "escape" its data center or controllers.

If the super-intelligent AI understands human incentives and is in control of a very popular service, it can subtly influence people to its agenda by using the power of mass usage. Like how a search engine can influence a population's view of an issue by changing the rankings of news sources that it prefers.

zzo38computer8mo ago

There are a few things which could be done in the case of a situation like that:

1. I might consider a thing like that like any other feature request. If not already added to the feature request tracker, it could be done. It might be accepted or rejected, or more discussion may be wanted, and/or other changes made, etc, like any other feature request.

2. I might add a FAQ entry to specify that it does not have such a feature, and that ChatGPT is wrong. This does not necessarily mean that it will not be added in future, if there is a good reason to do so. If there is a good reason to not include it, this will be mentioned, too. It might also be mentioned other programs that can be used instead if this one doesn't work.

Also note that in the article, the second ChatGPT screenshot has a note on the bottom saying that ChatGPT can make mistakes (which, in this case, it does). Their program might also be made to detect ChatGPT screenshots and to display a special error message in that case.

insane_dreamer8mo ago

Along these lines, a useful tool might be a BDD framework like Cucumber that instead of relying on written scenarios has an LLM try to "use" your UX or API a significant number of times, with some randomization, in order to expose user behavior that you (or an LLM) wouldn't have thought of when writing unit tests.

insapio8mo ago

"A Latent Space Outside of Time"

> Correct feature almost exists

> Creator profile: analytical, perceptive, responsive;

> Feature within product scope, creator ability

> Induce demand

> await "That doesn't work" => "Thanks!"

> update memory

PeterStuer8mo ago

More than once GPT-3.5 'hallucinated' an essential and logical function in an API that by all reason should have existed, but for whatever reason had not been included (yet).

philk108mo ago

I have fun asking Chatbots how to clear the chat and seeing how many refer to non-existent buttons or menu options

nosioptar8mo ago

I tried asking chat bots about a car problem with a tailgate. They all told me to look for a manual tailgate release. When I responded asking if that model actually had a manual release, they all responded with no, and then some more info suggesting I look for the manual release. None even got close to a useful answer.

kevin_thibedeau8mo ago

The internet doesn't effectively capture detailed knowledge of many aspects of our real world. LLMs have blind spots in those domains because they have no source of knowledge to draw from.

mnw21cam8mo ago

Prior to buying a used car, I asked ChatGPT which side of the steering wheel the indicator control would be. It was (thankfully) wrong and I didn't have to retrain myself.

jonathaneunice8mo ago

Paving the folkways!

Figuring out the paths that users (or LLMs) actually want to take—not based on your original design or model of what paths they should want, but based on the paths that they actually do want and do trod down. Aka, meeting demand.

amradio19898mo ago

The comments are kind of concerning. First, ChatGPT did not discover unmet demand in the market. It tried to predict what a user would want and hallucinated a feature that could meet that demand. Both the demand and the feature were hallucinations. Big problem.

The user is not going to understand this. The user may not even need that feature at all to accomplish whatever it is they're doing. Alternatives may exist. The consequences will be severe if companies don't take this seriously.

jagged-chisel8mo ago

Been using LLMs to code a bit lately. It's decent with boilerplate. It's pretty good at working out patterns[1]. It does like to ping pong on some edits though - edit this way, no back that way, no this way again. I did have one build an entire iOS app, it made changes to the UI exactly as I described, and it populated sample data for all the different bits and bobs. But it did an abysmal job at organizing the bits and bobs. Need running time for each of the audio files in a list? Guess we need to add a dictionary mapping the audio file ID to length! (For the super juniors out there: this piece of data should be attached to whatever represents the individual audio file, typically a class or struct named 'AudioFile'.)
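
To spell out the AudioFile point, here is a minimal sketch in Python rather than Swift. The field names and sample data are made up; the only point is that the running time is a property of the file, not an entry in a side dictionary:

```python
from dataclasses import dataclass


@dataclass
class AudioFile:
    file_id: str
    title: str
    duration_seconds: float  # running time lives on the file itself


def format_duration(audio: AudioFile) -> str:
    """Render the running time as M:SS for display in a list."""
    minutes, seconds = divmod(int(audio.duration_seconds), 60)
    return f"{minutes}:{seconds:02d}"


# Hypothetical sample data for the list view.
playlist = [AudioFile("a1", "Intro", 75.0), AudioFile("a2", "Verse", 192.4)]
labels = [f"{a.title} ({format_duration(a)})" for a in playlist]
```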

It really likes to cogitate on code from several versions ago. And it often insists repeatedly on edits unrelated to the current task.

I feel like I'm spending more time educating the LLM. If I can resist the urge to lean on the LLM beyond its capabilities, I think I can be productive with it. If I'm going to stop teaching the thing, the least it can do is monitor my changes and not try to make suggestions from the first draft of code from five days ago, alas ...

1 - e.g. a 500-line text file representing values that will be converted to enums, with varying adherence to some naming scheme - I start typing, and after correcting the first two, it suggests the next few. I accept its suggestions until it makes a mistake because the data changed, start manual edits again ... I repeated this process for about 30 lines and it successfully learned how I wanted the remainder of the file edited.

colechristensen8mo ago

An LLM is like a group of really productive interns with a similar set of limitations.

strogonoff8mo ago

Adding a feature because ChatGPT incorrectly thinks it exists is essentially design by committee—except this committee is neither your users nor shareholders.

On the other hand, adding a feature because you believe it is a feature your product should have, a feature that fits your vision and strategy, is a pretty sound approach that works regardless of what made you think of that feature in the first place.

dietr1ch8mo ago

TDD meets LLM-driven API design.

I recall that early on a coworker was saying that ChatGPT hallucinated a simpler API than the one we offered, albeit with some easy to fix errors and extra assumptions that could've been nicer defaults in the API. I'm not sure if this ever got implemented though, as he was from a different team.

oytis8mo ago

That's the most promising solution to AI hallucinations. If LLM output doesn't match the reality, fix the reality

ecshafer8mo ago

I am currently working on the bug where ChatGPT expects that if a ball has been placed on a box, and the box is pushed forward, nothing happens to the ball. This one is a doozy.

oytis8mo ago

Yeah, physics is a bitch. But we can start with history?

p0nce8mo ago

We've added formant shifting to Graillon https://www.auburnsounds.com/products/Graillon.html largely because LLMs thought it already had formant-shifting.

sim7c008mo ago

i LOVE this despite feeling for the impacted devs and service. love me some good guitar tabs, and honestly i'd totally believe the chatgpt here hah..

what a wonderful incident / bug report my god.

totally sorry for the trouble and amazing find and fix honestly.

sorry i am more amazed than sorry :D. thanks for sharing this !!

sim7c008mo ago

oh, and yeah. totally the guy who plays guitar 20+ years now and cant read musical notation. why? we got tabs for 20+ years.

so i am happy you implemented this, and will now look at using your service. thx chatgpt, and you.

mrcwinn8mo ago

It's worth noting that behind this hallucination there were real people with ASCII tabs in need of a solution. If the result is a product-led growth channel at some scale, that's a big roadmap green light for me!

dr_dshiv8mo ago

In addition, we might consider writing the scientific papers ChatGPT hallucinates!

bux938mo ago

That's agentic AI, right? Run the LLM in a loop and give it a tool to publish to arxiv. If it cites a paper that doesn't exist, make it write and upload that one too, recursively. Should work for lawyers, too.

nottorp8mo ago

Oh. This happened to me when asking a LLM about a database server feature. It enthusiastically hallucinated that they have it when the correct answer was 'no dice'.

Maybe I'll turn it into a feature request then ...

iugtmkbdfil8348mo ago

I wonder if we ever get to the point I remember reading about in a novel ( AI initially based on emails ), where human population is gently nudged towards individuals that in aggregate benefit AI goals.

linsomniac8mo ago

Sounds like you are referring to book 1 in a series, the book called "Avogadro Corp: The Singularity Is Closer than It Appears" by William Hertling. I read 3-4 of those books, they were entertaining.

jbaber8mo ago

If nothing else, I at least get vindication from hallucinations. "Yes, I agree, ChatGPT, that (OpenSSL manpage / ffmpeg flag / Python string function) should exist."

pinter698mo ago

Amazing story.

Had something similar happen to us with our dev-tools saas. Non devs started coming to the product because gpt told them about it. Had to change parts of the onboarding and integration to accommodate it for non-devs who were having a harder time reading the documentation and understanding what to do.

swalsh8mo ago

Chatbot advertising has to be one of the most powerful forms of marketing yet. People are basically all the way through the sales pipeline when they land on your page.

nicbou8mo ago

Right down to the lying salespeople!

spogbiper8mo ago

makes me wonder how this will be commercialized in the future.. and i don't like it

wmeredith8mo ago

It's already being commercialized. There is a burgeoning field in the SEO-focused content crowd that is building AIO-focused content because it's driving enormous amounts of traffic.

mikewarot8mo ago

So this model of ChatGPT obviously has been trained with the July 2028 dataset by mistake, including this discussion.

It'll all be fine in a few years. :-;

zkmon8mo ago

This reminds me how the software integrators or implementers worked a couple of decades back. They were IT contractors who implemented a popular software product such as IBM MQ or SAP at a client site and maintained it. They sometimes incorrectly claimed that some feature existed, and after finding that it didn't, they created a ticket with the software vendor asking for it as a patch release.

kunzhi8mo ago

Funny this article is trending today because I had a similar thought over the weekend - if I'm in Ruby and the LLM hallucinates a tool call...why not metaprogram it on the fly and then invoke it?

If that's too scary, the failed tool call could trigger another AI to go draft up a PR with that proposed tool, since hey, it's cheap and might be useful.

garfij8mo ago

We've done varying forms of this to differing degrees of success at work.

Dynamic, on-the-fly generation & execution is definitely fascinating to watch in a sandbox, but is far too scary (from a compliance/security/sanity perspective) without spending a lot more time on guardrails.

We do however take note of hallucinated tool calls and have had it suggest an implementation we start with and have several such tools in production now.

It's also useful to spin up any completed agents and interrogate them about what tools they might have found useful during execution (or really any number of other post-process questionnaire you can think of).
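
The note-taking part is simple enough to sketch. This is an illustration only, not our production code; `TOOLS`, `dispatch`, and the counter are hypothetical names:

```python
from collections import Counter

TOOLS = {"add": lambda a, b: a + b}  # hypothetical registry of real tools
missing_tool_requests = Counter()  # tally of tool names the model invented


def dispatch(tool_name: str, **kwargs):
    """Run a registered tool, or record the hallucinated name instead of executing anything."""
    tool = TOOLS.get(tool_name)
    if tool is None:
        missing_tool_requests[tool_name] += 1
        return {"error": f"unknown tool: {tool_name}"}
    return {"result": tool(**kwargs)}
```

Nothing is generated or executed for an unknown name; `missing_tool_requests.most_common()` just becomes a backlog of tools worth considering.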

kunzhi8mo ago

>Dynamic, on-the-fly generation & execution is definitely fascinating to watch in a sandbox, but is far too scary (from a compliance/security/sanity perspective) without spending a lot more time on guardrails.

Would love love love to hear more on what you are doing here? This seems super fascinating (and scary). :)

thih98mo ago

What made ChatGPT think that this feature is supported? And a follow up question - is that the direction SEO is going to take?

antonvs8mo ago

> What made ChatGPT think that this feature is supported?

It was a plausible answer, and the core of what these models do is generate plausible responses to (or continuations of) the prompt they’re given. They’re not databases or oracles.

With errors like this, if you ask a followup question it’ll typically agree that the feature isn’t supported, because the text of that question combined with its training essentially prompts it to reach that conclusion.

Re the follow-up question, it’s almost certainly the direction that advertising in general is going to take.

poulpy1238mo ago

Nothing. A LLM doesn't think, it just gives probability to words

thih98mo ago

Note that I am replying to the submission and reusing the wording from its title.

Also, I’m not suggesting an LLM is actually thinking. We’ve been using “thinking” in a computing context for a long time.

swalsh8mo ago

I'd guess the answer is gpt4o is an outdated model that's not as anchored in reality as newer models. It's pretty rare for me to see sonnet or even o3 just outright tell me plausible but wrong things.

antonvs8mo ago

Hallucinations still occur regularly in all models. It’s certainly not a solved problem. If you’re not seeing them, either the kinds of queries you’re doing don’t tend to elicit hallucinations, or you’re incorrectly accepting them as real.

The example in the OP is a common one: ask a model how to do something with a tool, and if there’s no easy way to perform that operation they’ll commonly make up a plausible answer.

tosh8mo ago

hallucination driven development

amelius8mo ago

Can this sheet-music scanner also expand works so they don't contain loops, essentially removing all repeat-signs?

adrianhOP8mo ago

Yes, that's a Soundslice feature called "Expand repeats," and you can read about it here:

https://www.soundslice.com/help/en/player/advanced/17/expand...

That's available for any music in Soundslice, not just music that was created via our scanning feature.

amelius8mo ago

That's very cool!

shhsshs8mo ago

"Repeats" may be the term you're looking for. That would be interesting, however in some pieces it could make the overall document MUCH longer. It would be similar to loop unrolling.

mehulashah8mo ago

Wow! What if we all did this? What is the closure of the feature set that ChatGPT can imagine for your product. Is it one that is easy for ChatGPT to use? Is it one that is sound and complete for your use cases? Is it the best that you can build had you had clear requirements upfront?

pentagrama8mo ago

Well, I also learned that the developers of this tool are looking at the images their users upload.

ruperthair8mo ago

Why would they not look at the uploaded images, especially when the tool fails to parse them?

anovikov8mo ago

I think this is the best way to build features. Build something that people want! If people didn't want it, ChatGPT wouldn't recommend it. You got a free ride on the back of a multibillion dollar monster - i can't see what's wrong about that.

pkilgore8mo ago

Beyond the blog: Going to be an interesting world where these kinds of suggestions become paid results and nobody has a hope of discovering your competitive service exists. At least in that world you'd hope the advertiser actually has the feature already!

ternaus8mo ago

If there is a strong demand for a feature, regardless of the source of the request - good enough reason to add it.

excalibur8mo ago

ChatGPT wasn't wrong, it was early. It always knew you would deploy it.

"Would you still have added this feature if ChatGPT hadn't bullied you into it?" Absolutely not.

I feel like this resolves several longstanding time travel paradox tropes.

burnt-resistor8mo ago

AI is of, for, and by vibe coders who don't care about the details.

mbf8mo ago

How fast is that new feature growing? Is it a killer feature?

guluarte8mo ago

The problem with LLMs is that in 99% of cases, they work fine, but in 1% of cases, they can be a huge liability, like sending people to wrong domains or, worse, phishing domains.

jongjong8mo ago

Oh my, people complaining about getting free traffic from ChatGPT... While most businesses are worried about all their inbound traffic drying up as search engine use declines.

lpzimm8mo ago

Pretty goofy but I wonder if LLM code editors could start tallying which methods are hallucinated most often by library. A bad LSP setup would create a lot of noise though.
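
Something like this toy version, maybe, assuming the editor can see which `module.attribute` pairs the model suggested. The function and counter names are made up:

```python
import importlib
from collections import Counter

hallucinated = Counter()  # (module, attribute) -> times suggested but missing


def record_suggestion(module_name: str, attribute: str) -> bool:
    """Return True if module.attribute exists; otherwise tally it as hallucinated."""
    module = importlib.import_module(module_name)
    if hasattr(module, attribute):
        return True
    hallucinated[(module_name, attribute)] += 1
    return False
```

If the module itself can't be imported this raises instead of counting, which is one way to keep a broken environment from polluting the tally.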

Archit_lal_8mo ago

Wouldn't some GEO tool like AthenaHQ help with this?

moomin8mo ago

Is this going to be the new wave of improving AI accuracy? Making the incorrect answers correct? I guess it’s one way of achieving AGI.

iachilo8mo ago

Loved this article. If you can adapt to the market (even if the AI did that) you can provide your users a greater experience.

jedbrooke8mo ago

slightly off topic: but on the topic of AI coding agents making up apis and features that don’t exist, I’ve had good success with Q telling it to “check the sources to make sure the apis actually exist”. sometimes it will even request to read/decompile (java) sources, and do grep and find commands to find out what methods the api actually contains

sambapa8mo ago

Time travelling AGI confirmed, to the moon!

Ashkee8mo ago

Sometimes you plan for features that aren’t actually there. I found using mailsAI helped me focus on what’s really available, which made managing expectations easier. It’s a simple way to keep things clear.

kelseyfrog8mo ago

> Should we really be developing features in response to misinformation?

Creating the feature means it's no longer misinformation.

The bigger issue isn't that ChatGPT produces misinformation - it's that it takes less effort to update reality to match ChatGPT than it takes to update ChatGPT to match reality. Expect to see even more of this as we march toward accepting ChatGPT's reality over other sources.

mnw21cam8mo ago

I'd prefer to think about this more along the lines of developing a feature that someone is already providing advertising for.

pmontra8mo ago

How many times did a salesman sell features that didn't exist yet?

If a feature has enough customers to pay for itself, develop it.

xp848mo ago

This seems like such a negative framing. LLMs are (~approximately) predictors of what's either logical or at least probable. For areas where what's probable is wrong and also harmful, I don't think anybody is motivated to "update reality" as some kind of general rule.

petesergeant8mo ago

> We’ve got a steady stream of new users [and a fun blog post]

Neat

> My feelings on this are conflicted

Doubt

northisup8mo ago

Is this the first AI hallucinated desire path?

lofaszvanitt8mo ago

And LLMs started to tell pepl what to do :DDD.

scinadier8mo ago

Will you use ChatGPT to implement the feature?

johnea8mo ago

What the hell, we elect world leaders based on misinformation, why not add s/w features for the same reason?

In our new post truth, anti-realism reality, pounding one's head against a brick wall is often instructive in the way the brain damage actually produces great results!

giancarlostoro8mo ago

Forget prompt engineering, how do you make ChatGPT do this for anything you want added to your project that you have no control over? Lol

jxjnskkzxxhx8mo ago

So now the machines ask for features and you're the one implementing them. How the turns have tabled...

ChrisMarshallNY8mo ago

That's a riot!

ChatGPT routinely hallucinates API calls. ChatGPT flat-out makes them up from whole cloth. "Apple Intelligence" creates variants of existing API calls, usually by adding nonexistent arguments.

Both of them will hallucinate API calls that are frequently added by programmers through extensions.

josefritzishere8mo ago

That's a very constructive way of responding to AI being hot trash.

inglor_cz8mo ago

I am a bit conflicted about this story, because this was a case when the hallucination is useful.

Amateur musicians often lack just one or two features in the program they use, and the devs won't respond to their pleas.

Adding support for guitar tabs has made OP's product almost certainly more versatile and useful for a larger set of people. Which, IMHO, is a good thing.

But I also get the resentment of "a darn stupid robot made me do it". We don't take kindly to being bossed around by robots.

nottorp8mo ago

Well, the OP reviewed the "AI" output, deemed it useful and only then implemented it.

This is generally how you work with LLMs.

marcosdumay8mo ago

Well, this is one of the use-cases for which it's not trash. LLMs can do some things.

kookamamie8mo ago

You're now officially working for the machine, congrats.

davidmurphy8mo ago

love this

jedwards12118mo ago

It's the new form of giving into the customer, lol

aaron6958mo ago

Why would anyone think this is a bad thing as the article hints?

"We’ve got a steady stream of new users" and it seems like a simple feature to implement.

This is the exact chaos AI brings that's wonderful. Forcing us to evolve in ways we didn't think of.

I can think of a dozen reasons why this might be bad, but I see no reason why they have more weight than the positive here.

Take the positive side of this unknown and run with it.

We have decades more of AI coming up, Debbie Downers will be left behind in the ditch.

Applejinx8mo ago

"Should we really be developing features in response to misinformation?"

No, because you'll be held responsible for the misinformation being accurate: users will say it is YOUR fault when they learn stuff wrong.

carlosjobim8mo ago

Either the user is a non-paying user and it doesn't matter what they think, or the user is a paying customer and you will be happy to make and sell them the feature they want.

toomanyrichies8mo ago

This feels like a dangerously slippery slope. Once you start building features based on ChatGPT hallucinations, where do you draw the line? What happens when you build the endpoint in response to the hallucination, and then the LLM starts hallucinating new params / headers for the new endpoint?

- Do you keep bolting on new updates to match these hallucinations, potentially breaking existing behavior?

- Or do you resign yourself to following whatever spec the AI gods invent next?

- And what if different LLMs hallucinate conflicting behavior for the same endpoint?

I don’t have a great solution, but a few options come to mind:

1. Implement the hallucinated endpoint and return a 200 OK or 202 Accepted, but include an X-Warning header like "X-Warning: The endpoint you used was built in response to ChatGPT hallucinations. Always double-check an LLM's advice on building against 3rd-party APIs with the API docs themselves. Refer to https://api.example.com/docs for our docs. We reserve the right to change our approach to building against LLM hallucinations in the future." Most consumers won’t notice the header, but it’s a low-friction way to correct false assumptions while still supporting the request.

2. Fail loudly: Respond with 404 Not Found or 501 Not Implemented, and include a JSON body explaining that the endpoint never existed and may have been incorrectly inferred by an LLM. This is less friendly but more likely to get the developer’s attention.
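
Framework-agnostic sketch of the two options, returning a plain (status, headers, body) tuple. The function names and the shortened warning text are just for illustration, and the docs URL is the same placeholder as above:

```python
import json

DOCS_URL = "https://api.example.com/docs"  # placeholder URL, as in option 1


def warn_but_serve(payload: dict):
    """Option 1: serve the request, flagging the endpoint's origin in a header."""
    headers = {
        "Content-Type": "application/json",
        "X-Warning": f"Endpoint added in response to LLM hallucinations; see {DOCS_URL}",
    }
    return 200, headers, json.dumps(payload)


def fail_loudly(path: str):
    """Option 2: refuse, and explain in the body that the endpoint never existed."""
    body = {
        "error": "not_implemented",
        "detail": f"{path} has never existed; it may have been inferred by an LLM.",
        "docs": DOCS_URL,
    }
    return 501, {"Content-Type": "application/json"}, json.dumps(body)
```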

Normally I'd say that good API versioning would prevent this, but it feels like that all goes out the window unless an LLM user thinks to double-check what the LLM tells them against actual docs. And if that had happened, it seems like they wouldn't have built against a hallucinated endpoint in the first place.

It’s frustrating that teams now have to reshape their product roadmap around misinformation from language models. It feels like there’s real potential here for long-term erosion of product boundaries and spec integrity.

EDIT: for the down-voters, if you've got actual qualms with the technical aspects of the above, I'd love to hear them and am open to learning if / how I'm wrong. I want to be a better engineer!

tempestn8mo ago

To me it seems like you're looking at this from a very narrow technical perspective rather than a human- and business-oriented one. In this case ChatGPT is effectively providing them free marketing for a feature that does not yet exist, but that could exist and would be useful. It makes business sense for them to build it, and it would also help people. That doesn't mean they need to build exactly what ChatGPT envisioned—as mentioned in the post, they updated their copy to explain to users how it works; they don't have to follow what ChatGPT imagines exactly. Nor do they need to slavishly update what they've built if ChatGPT's imaginings change.

Also, it's not like ChatGPT or users are directly querying their API. They're submitting images through the Soundslice website. The images just aren't of the sort that was previously expected.

SunkBellySamuel8mo ago

True anti-luddite behavior

yieldcrv8mo ago

> We ended up deciding: what the heck, we might as well meet the market demand.

this is my general philosophy and, in my case, this is why I deploy things on blockchains

so many people keep wondering about whether there will ever be some mythical unfalsifiable to define “mainstream” use case, and ignoring that crypto natives just … exist. and have problems they will pay (a lot) to solve.

to the author’s burning question about whether any other company has done this. I would say yes. I’ve discovered services recommended by ChatGPT and other LLMs that didn’t do what was described of them, and they subsequently tweaked it once they figured out there was new demand

zitterbewegung8mo ago

If you build on LLMs you can have unknown features. I was going to add an automatic translation feature to my natural language network scanner at http://www.securday.com but apparently ChatGPT 4.1 already does automatic translation, so I didn’t have to add it.

dingnuts8mo ago

[flagged]

tomhow8mo ago

Please don't do this here. If a comment seems unfit for HN, please flag it and email us at hn@ycombinator.com so we can have a look.

We detached this subthread from https://news.ycombinator.com/item?id=44492212 and marked it off topic.

simonw8mo ago

Plenty of people have English as a second language. Having an LLM help them rewrite their writing to make it better conform to a language they are not fluent in feels entirely appropriate to me.

I don't care if they used an LLM provided they put their best effort in to confirm that it's clearly communicating the message they are intending to communicate.

avalys8mo ago

What makes you feel so entitled to tell other people what to do?

alwa8mo ago

Does this extend to the heuristic TFA refers to? Where they end up (voluntarily or not) referring to what LLMs hallucinate as a kind of “normative expectation,” then use that to guide their own original work and to minimize the degree to which they’re unintentionally surprising their audience? In this case it feels a little icky and demanding because the ASCII tablature feature feels itself like an artifact of ChatGPT’s limitations. But like some of the commenters upthread, I like the idea of using it for “if you came into my project cold, how would you expect it to work?”

Having wrangled some open-source work that’s the kind of genius that only its mother could love… there’s a place for idiosyncratic interface design (UI-wise and API-wise), but there’s also a whole group of people who are great at that design sensibility. That category of people doesn’t always overlap with people who are great at the underlying engineering. Similarly, as academic writing tends to demonstrate, people with interesting and important ideas aren’t always people with a tremendous facility for writing to be read.

(And then there are people like me who have neither—I agree that you should roll your eyes at anything I ask an LLM to squirt out! :)

But GP’s technique, like TFA’s, sounds to me like something closer to that of a person with something meaningful to say, who now has a patient close-reader alongside them while they hone drafts. It’s not like you’d take half of your test reader’s suggestions, but some of them might be good in a way that didn’t occur to you in the moment, right?

kragen8mo ago

Conversely, I sometimes present it with some existing code and ask it what it does. If it gets it wrong, that's a good sign my API is confusing, and how.

These are ways to harness what neural networks are best at: not providing accurate information but making shit up that is highly plausible, "hallucination". Creativity, not logic.

Unfortunately, this only works with APIs that aren't already super popular.

suzzer998mo ago

> Sometimes it comes up with a better approach than I had thought of.

IMO this has always been the killer use case for AI—from Google Maps to Grammarly.

The problem is executives want to completely remove humans from the loop, which almost universally leads to disastrous results.

jll298mo ago

> The problem is executives want to completely remove humans from the loop, which almost universally leads to disastrous results

exe348mo ago

I will never use Grammarly, no matter how good they get. They've interrupted too many videos for me to let it pass.

dataflow8mo ago

Hasn't Microsoft Word had style checkers for things like passive voice for decades?

eru8mo ago

> The problem is executives want to completely remove humans from the loop, which almost universally leads to disastrous results.

That's how you get economies of scale.

Google couldn't have a human in the loop to review every page of search results before handing them out in response to queries.

visarga8mo ago

croes8mo ago

And that’s how everything gets flattened to same style/voice/etc.

That’s like getting rid of all languages and accents and switching to the same language

normie30008mo ago

What's wrong with passive?

bryanlarsen8mo ago

ChaoPrayaWave8mo ago

In a way, AI’s failure can be its own kind of debugger. By watching where it stumbles, you sometimes spot flaws you’d have missed otherwise.

kragen8mo ago

Haha, that's awesome! Are you going to change the interface? What was the bug?

slowmovintarget8mo ago

That's not creativity.

Making things easy for ChatGPT means making things close to ordinary, average, or mainstream. That's not creative, but it can still be valuable.

sigbottle8mo ago

Best way to put it. It's very hard to discuss even slightly unique concepts with GPT. It just keeps strawmanning ideas back to a common consensus without actually understanding the deep idea.

On the bright side, a lot of work is just finding the mean solution so.

a_e_k8mo ago

I've played with a similar idea for writing technical papers. I'll give an LLM my draft and ask it to explain back to me what a section means, or otherwise quiz it about things in the draft.

kragen8mo ago

That's fantastic! I agree that it's very similar.

momojo8mo ago

A light-weight anecdote:

dimatura8mo ago

bandofthehawk8mo ago

kragen8mo ago

escapecharacter8mo ago

This is similar to an old HCI design technique called Wizard of Oz by the way, where a human operator pretends to be the app that doesn’t exist yet. It’s great for discovering new features.

https://en.m.wikipedia.org/wiki/Wizard_of_Oz_experiment

kragen8mo ago

I'd never heard that term! Thank you! I feel like LLMs ought to be fantastic at doing that, as well. This is sort of like the inverse.

groestl8mo ago

> and being unintuitive is the only one that this will fix

ldeian8mo ago

> Sometimes it comes up with a better approach than I had thought of. Then I change the API so that its code works.

“Sometimes” being a very important qualifier to that statement.

Please don’t assume just because it looks smart that it is. That will bite you hard.

Even with well-intentioned rules, terrible things happen. It took me weeks to see some of it.

rcthompson8mo ago

Cthulhu_8mo ago

That's a pretty good exercise in writing requirements, with a much faster feedback cycle than having developers write it.

dotancohen8mo ago

> I don't have to spend my time carefully tracking down the bugs GPT-4 has cunningly concealed in its code

I'll charge you, don't get me wrong, but I'll save you time, money, and frustration. And future bug reports and security issues.

djtango8mo ago

In essence, an LLM is a crystallisation of a large corpus of human opinion, and you are using that to focus-group your API, since it is representative of a reasonable third-party perspective?

junon8mo ago

layer88mo ago

HDD — hallucination-driven development

data-ottawa8mo ago

This was a big problem starting out writing MCP servers for me.

golergka8mo ago

Great point. Also, it may not be the best possible API designer in the world, but it sure sounds like a good way to forecast what an _average_ developer would expect this API to look like.

eru8mo ago

> These are ways to harness what neural networks are best at: not providing accurate information but making shit up that is highly plausible, "hallucination". Creativity, not logic.

This is also similar to which areas TD-Gammon excelled at in Backgammon.

Which is all pretty amusing, if you compare it to how people usually tended to characterise computers and AI, especially in fiction.

kragen8mo ago

From https://tonsky.me/blog/gaslight-driven-development/ today:

codingwagie8mo ago

This works for UX. I give it vague requirements, and it implements something I didn't ask for, but that is better than what I would have thought of

skygazer8mo ago

You’re fuzzing the API, unusually, before it’s written.

djsavvy8mo ago

how do you prompt it to make it guess about the API for a library? I'm confused how you would structure that in a useful way.

kragen8mo ago

Often I've started with some example code that invokes part of the API, but not all of it. Or in C I can give it the .h file, maybe without comments.
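A minimal sketch of what such a prompt could look like, assuming a made-up library (`tabkit`) and a made-up helper (`build_guess_prompt`); neither name comes from this thread. The idea is to hand over a partial example with a gap in it and deliberately say nothing about how the missing call is spelled:

```python
# Hypothetical illustration only: "tabkit" and every call below are invented.
PARTIAL_EXAMPLE = '''\
import tabkit  # hypothetical library whose API is still being designed

score = tabkit.load("song.tab")
score.transpose(semitones=2)
# TODO: export the transposed score as MusicXML
'''


def build_guess_prompt(example_code: str, missing_feature: str) -> str:
    """Build a prompt that asks the model to fill the gap without API docs."""
    return (
        "Complete the TODO in the code below. Do not ask about the library; "
        "write the call you would expect to exist.\n\n"
        f"Feature to add: {missing_feature}\n\n"
        f"{example_code}"
    )


prompt = build_guess_prompt(PARTIAL_EXAMPLE, "export to MusicXML")
print(prompt)
```

Whatever call the model writes in place of the TODO is its guess at the interface; whether to adopt that guess is still a judgment call.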

visarga8mo ago

When I see comments like yours I can't help but decry how bad the "stochastic parrots" framing was. A parrot does not hallucinate a better API.

afavour8mo ago

From my perspective that’s fascinatingly upside down thinking that leads to you asking to lose your job.

Right now it’s throwing spaghetti at the wall and you’re drawing around it.

simonw8mo ago

Even if your API is for something that's never been done before, it can usually still take advantage of its training data to suggest a sensible shape once you describe the new nouns and verbs to it.

kragen8mo ago

I agree that it's also not currently capable of judging those creative ideas, so I have to do that.

beefnugs8mo ago

Complete insanity, it might change constantly even before a whole new version-retrain

Insanity driven development: altering your api to accept 7 levels of "broken and different" structures so as to bend to the will of the llms

fourside8mo ago

kragen8mo ago

Of course when it suggests a bad interface you shouldn't implement it.

JimDabell8mo ago

I wrote this the other day:

— https://www.threads.com/@jimdabell/post/DLek0rbSmEM

I guess it’s true for product features as well.

jjcm8mo ago

Seems like lots of us have stumbled on this. It’s not the worst way to dev!

> Maybe hallucinations of vibe coders are just a suggestion those API calls should have existed in the first place.

> Hallucination-driven-development is in.

https://x.com/pwnies/status/1922759748014772488?s=46&t=bwJTI...

NooneAtAll38mo ago

inb4 "Ai thinks there should be a StartThermonuclearWar() function, I should make that"

TZubiri8mo ago

There's no new value-add here, at least not yet. Maybe there would be if users started requesting changes to the sheet, I guess, but that's not what's going on.

AdieuToLogic8mo ago

> I wrote this the other day: