Response Healing: Reduce JSON defects by 80%+

(openrouter.ai)

51 points · numlocked · 5mo ago · 47 comments

red2awn · 5mo ago

Very confused. When you enable structured output the response should adhere to the JSON schema EXACTLY, not best effort, by constraining the output via guided decoding. This is even documented in OpenRouter's structured output doc:

> The model will respond with a JSON object that strictly follows your schema

Gemini is listed as a model supporting structured output, and yet its fail rate is 0.39% (Gemini 2.0 Flash)!! I get that structured output has a high performance cost but advertising it as supported when in reality it's not is a massive red flag.

Worse yet, response healing only fixes JSON syntax errors, not schema adherence. This is only mentioned at the end of the article, which people are clearly not going to read.

WTF

osaariki · 5mo ago

You're exactly right. The llguidance library [1,2] seems to have emerged as the go-to solution for this by virtue of being >10X faster than its competition. It's work from some past colleagues of mine at Microsoft Research based on the theory of (regex) derivatives, which we previously used to ship a novel kind of regex engine for .NET. It's cool work and AFAIK should ensure full adherence to a JSON grammar.

llguidance is used in vLLM, SGLang, internally at OpenAI and elsewhere. At the same time, I also see a non-trivial JSON error rate from Gemini models in large scale synthetic generations, so perhaps Google hasn't seen the "llight" yet and are using something less principled.

1: https://guidance-ai.github.io/llguidance/llg-go-brrr

2: https://github.com/guidance-ai/llguidance

red2awn · 5mo ago

Cool stuff! I don't get how all the open source inference frameworks have this down but the big labs don't...

Gemini [0] is falsely advertising this:

> This capability guarantees predictable and parsable results, ensures format and type-safety, enables the programmatic detection of refusals, and simplifies prompting.

[0]: https://ai.google.dev/gemini-api/docs/structured-output?exam...

top1aibooster · 5mo ago

> Here's something most developers overlook: if an LLM has a 2% JSON defect rate, and Response Healing drops that to 1%, you haven't just made a 1% improvement. You've cut your defects, bugs, and support tickets in half.

If part of my system can't even manage to output JSON reliably, it needs way more "healing" than syntax munging. This comes across as naive.

Dylan16807 · 5mo ago

Plus, that claim isn't even true. A 1% and a 2% JSON defect rate are going to annoy a similar number of people into filing bugs and tickets.
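The arithmetic behind both readings of the 2% to 1% claim can be sketched in a few lines of Python. The request volume is a made-up number for illustration, not a figure from the article.

```python
def defect_stats(rate_before: float, rate_after: float, requests: int) -> dict:
    """Compare two defect rates in absolute and relative terms."""
    absolute_drop = rate_before - rate_after     # difference between the two rates
    relative_drop = absolute_drop / rate_before  # share of defects removed
    return {
        "absolute_drop": absolute_drop,
        "relative_drop": relative_drop,
        "failures_before": round(rate_before * requests),
        "failures_after": round(rate_after * requests),
    }


# Hypothetical volume of 100,000 requests (an assumption for illustration).
stats = defect_stats(0.02, 0.01, 100_000)
print(stats)
```

Under these assumptions half of the failures disappear, yet 1,000 of the 100,000 requests still fail, so both the "cut in half" framing and the objection to it are describing the same numbers.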

fhcuvyxu · 5mo ago

1% fail rate on API requests is horrifyingly embarrassing.

0cf8612b2e1e · 5mo ago

Sounds like we are twice as close to AGI!

01HNNWZ0MV43FF · 5mo ago

"it's not just X, it's Y"

Don't you worry about Planet Express, let me worry about blank.

Spivak · 5mo ago

The model itself can't output JSON reliably. It's on you, building a system around the model, to make sure it either returns correct output or errors, which is trivial to do.

arm32 · 5mo ago

But, but, you've just cut your defects, bugs, and support tickets in half!

wat10000 · 5mo ago

I thought structured output was done by only allowing tokens that would produce valid output. For their example of a missing closing bracket, the end token wouldn't be allowed, and it would only accept tokens that contain a digit, comma, or closing bracket. I guess that must not be the case, though. Doing that seems like a better way to address this.

numlocked (OP) · 5mo ago

That is a way of doing that, but it's quite expensive computationally. There are some companies that can make it feasible [0], but it's often not a perfect process and different inference providers implement it in different ways.

[0] https://dottxt.ai/

ViewTrick1002 · 5mo ago

I have used structured outputs both with OpenAI and the Gemini models. In the beginning they had some rough edges but lately it's been smooth sailing.

Seems like OpenRouter also supports structured outputs.

https://openrouter.ai/docs/guides/features/structured-output...

xg15 · 5mo ago

Out of curiosity, why is it so expensive? Shouldn't constraining the possible result tokens make the inference less expensive? (Because you have to calculate fewer logits and could occasionally even skip tokens entirely if there is only one valid option.)

red2awn · 5mo ago

Tokens are sampled from logits using the constraints after a normal forward pass. The forward pass is the expensive part of LLM inference which isn't affected by structured output.
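A minimal sketch of that sampling step, in plain Python. The four-token vocabulary, the logit values and the `allowed` set are made up for illustration; in a real engine the logits come from the model's forward pass and the allowed set from a grammar.

```python
import math
import random


def sample_with_mask(logits, allowed, rng=random):
    """Sample a token index from logits, restricted to the allowed indices.

    The logits are assumed to come from an ordinary forward pass; the mask
    is applied afterwards, so the expensive model computation is unchanged.
    """
    if not allowed:
        raise ValueError("no token satisfies the constraint")
    # Disallowed tokens get -inf, which becomes probability zero after softmax.
    masked = [x if i in allowed else -math.inf for i, x in enumerate(logits)]
    peak = max(masked)
    weights = [math.exp(x - peak) for x in masked]  # exp(-inf) == 0.0
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


# Toy vocabulary (hypothetical): index 3 stands for the end-of-sequence token.
vocab = ["}", ",", "0", "<eos>"]
logits = [1.0, 0.5, 0.2, 3.0]  # the model prefers <eos>
allowed = {0, 1, 2}            # but the grammar forbids ending here
print(vocab[sample_with_mask(logits, allowed)])
```

The mask only touches a list the size of the vocabulary, which is why the forward pass, not the constraint, dominates the cost in this picture.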

wat10000 · 5mo ago

Is there anything in the JSON grammar that only allows one valid option? In any case, I also don't understand why it would be costly. The fact that tokens are typically multiple characters would complicate things somewhat, but checking that a given token results in valid partial JSON doesn't seem too hard.

joshstrange · 5mo ago

I’d be (genuinely) interested to hear from people who think this will help. In my mind, if the JSON isn’t valid I wouldn’t trust a “healed” version of it to be correct either. I mean, I guess you just do schema validation on your end and so maybe fixing a missing comma/brace/etc is actually really helpful. I’ve not done JSON generation at scale to know.
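The "schema validation on your end" step can be sketched as follows. `check_shape` and the `PERSON` schema are hypothetical names, and the check is a deliberately tiny stand-in for a real JSON Schema validator.

```python
import json


def check_shape(data, required):
    """Return a list of problems; an empty list means the shape matches.

    `required` maps field names to the Python type expected for each field.
    """
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    problems = []
    for field, expected in required.items():
        if field not in data:
            problems.append(f"missing field: {field}")
            continue
        value = data[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        is_stray_bool = isinstance(value, bool) and expected is not bool
        if is_stray_bool or not isinstance(value, expected):
            problems.append(
                f"wrong type for {field}: expected {expected.__name__}, "
                f"got {type(value).__name__}"
            )
    return problems


PERSON = {"name": str, "age": int}         # hypothetical schema for illustration
healed = '{"name": "Alice", "age": "30"}'  # valid syntax, wrong type
print(check_shape(json.loads(healed), PERSON))
```

The example input parses cleanly but still fails the check, which is the gap between fixing syntax and guaranteeing schema adherence raised earlier in the thread.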

gruez · 5mo ago

> What about XML? The plugin can heal XML output as well - contact us if you’d like access.

Isn't this exactly how we got weird HTML parsing logic in the first place, with "autohealing" logic for mismatched closing tags or quotes?

AlexCoventry · 5mo ago

This is probably a bit different. An LLM outputs a token at a time ("autoregressively") by sampling from a per-position token probability distribution, which depends on all the prior context so far. While the post doesn't describe OpenRouter's approach, most structured LLM output works by putting a mask over that distribution, so that any token which would break the intended structure has probability zero and cannot be sampled. So for instance, in the broken example from the post,

    {"name": "Alice", "age": 30

the standard LLM output would have stopped there because the LLM output an end-of-sequence (EOS) token. But because that would lead to a syntax error in JSON, the EOS token would have probability zero, and it would be forced to either extend the number "30", or add more entries to the object, or end it with "}".

I haven't played much with structured output, but I imagine the biggest risk is that you may force the model to work with contexts outside its training data, leading it to produce garbage, though hopefully syntactically-correct garbage.

I don't understand, though, why the probability of incorrect JSON wouldn't go to 0, under this framework (unless you hit the max sequence length before the JSON ended.) The post implies that JSON errors still happen, so it's possible they're doing something else.
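The grammar side of this can be sketched with a toy prefix checker that decides which candidate tokens keep the text a valid JSON prefix. It is a simplification under stated assumptions: only integers are supported, leading zeros and string escapes are not validated, and `EOS` plus the candidate list are hypothetical. Production engines such as llguidance work over real tokenizer vocabularies and are far more efficient; this is not their algorithm.

```python
class Incomplete(Exception):
    """Input ended while the text was still a valid JSON prefix."""


DIGITS = "0123456789"
EOS = "<eos>"  # hypothetical end-of-sequence marker


def peek(s, i):
    """Skip whitespace; raise Incomplete if the input ends first."""
    while i < len(s) and s[i] in " \t\r\n":
        i += 1
    if i >= len(s):
        raise Incomplete
    return i


def parse_string(s, i):
    """Scan a string body; i points just past the opening quote."""
    while i < len(s):
        if s[i] == '"':
            return i + 1
        i += 2 if s[i] == "\\" else 1  # escapes are skipped, not validated
    raise Incomplete


def parse_members(s, i, closer, keyed):
    """Parse the inside of an object (keyed) or array up to its closer."""
    i = peek(s, i)
    if s[i] == closer:
        return i + 1
    while True:
        if keyed:
            i = peek(s, i)
            if s[i] != '"':
                raise ValueError("expected a key")
            i = peek(s, parse_string(s, i + 1))
            if s[i] != ":":
                raise ValueError("expected ':'")
            i += 1
        i = peek(s, parse_value(s, i))
        if s[i] == closer:
            return i + 1
        if s[i] != ",":
            raise ValueError("expected ',' or a closer")
        i += 1


def parse_value(s, i):
    """Parse one JSON value starting at i and return the index after it."""
    i = peek(s, i)
    c = s[i]
    if c == '"':
        return parse_string(s, i + 1)
    if c == "{":
        return parse_members(s, i + 1, "}", keyed=True)
    if c == "[":
        return parse_members(s, i + 1, "]", keyed=False)
    if c == "-" or c in DIGITS:
        j = i + 1
        while j < len(s) and s[j] in DIGITS:
            j += 1
        if c == "-" and j == i + 1:  # a bare minus sign
            if j == len(s):
                raise Incomplete
            raise ValueError("expected a digit")
        return j
    for lit in ("true", "false", "null"):
        rest = s[i:i + len(lit)]
        if rest == lit:
            return i + len(lit)
        if len(rest) < len(lit) and lit.startswith(rest):
            raise Incomplete
    raise ValueError("unexpected character")


def classify(text):
    """Return 'complete', 'incomplete', or 'invalid' for a JSON text."""
    try:
        end = parse_value(text, 0)
    except Incomplete:
        return "incomplete"
    except ValueError:
        return "invalid"
    return "complete" if text[end:].strip(" \t\r\n") == "" else "invalid"


def allowed_next(prefix, candidates):
    """Keep tokens that leave a valid prefix; EOS requires complete JSON."""
    keep = []
    for tok in candidates:
        if tok == EOS:
            ok = classify(prefix) == "complete"
        else:
            ok = classify(prefix + tok) != "invalid"
        if ok:
            keep.append(tok)
    return keep


prefix = '{"name": "Alice", "age": 30'
print(allowed_next(prefix, ["}", ",", "1", ' "x"', "]", EOS]))
```

For the broken example from the post, this keeps `}`, `,` and `1` and rejects the end-of-sequence marker, matching the behavior described above: the output can extend the number, add more entries, or close the object, but it cannot stop.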

lijok · 5mo ago

One of the best shitposts I have ever seen, by far. Absurdism taken to its finest form.

culi · 5mo ago

I did some searching for an open-source version of this and found this pretty neat library for Elixir called json_remedy

https://github.com/nshkrdotcom/json_remedy

oats · 5mo ago

Is this a joke? Am I going crazy?

I don't like this future we're going towards where we have to trick our software (which we can no longer understand the workings of) into doing what we tell it to by asking it nicely, or by putting another black box on the end to "fix" the output. This is the opposite of engineering. This is negotiation with a genie trapped in silicon.

blibble · 5mo ago

it does seem as if the world has gone insane

we have brilliant machines that can more or less work perfectly

then the scam artists have convinced people that spending a trillion dollars and terawatts to get essentially a biased random number generator to produce unusable garbage is somehow an improvement

Spivak · 5mo ago

These models have turned a bunch of NLP problems that were previously impossible into something trivial. I have personally built extremely reliable systems from the biased random number generator. Our f-score using "classic" NLP went from 20% to 99% using LLMs.

no_wizard · 5mo ago

NLP: natural language processing, for the unfamiliar. LLMs are tailor-made for this work and do it particularly well. They're great tokenizers of structured rules. It's why they're also halfway decent at generating code in some situations.

Where I think you see them fall down is in logical domains that rely on relative complexity and contextual awareness in a different way. I've had less luck, for example, having AI systems parse and break down a spreadsheet with complex rules. That's simply from recent memory.

gavmor · 5mo ago

I don't know, I think it's pretty cool that we can turn arbitrary human speech into well-formed RPCs.

Eisenstein · 5mo ago

It is easier to realize that software development was never engineering. Physical engineering is reliant on physics, while software is reliant on other software. Physics is static and, as far as practical engineering is concerned, it is known, can be applied rigorously, and can be taught in courses. Software is constantly changing, contains tons of edge cases, and, as we can see from recent developments, can change in unpredictable ways and lead to entirely new paradigms.

So, the software that you learned on is changing. You aren't going crazy, but the ground is indeed shifting. The problem is that you assumed it couldn't shift because you were applying the wrong constraints.

nubg · 5mo ago

Dear OpenRouter blog authors, could you please stop writing your blogposts with LLMs?

The content of your posts is really insightful and interesting, but it feels like junk quality because of the way LLMs write blogposts.

What was your prompt?

lab · 5mo ago

A lot of it was finger written -- curious which part sounded like LLM to you?

CallMeJim · 5mo ago

> Here's something most developers overlook: if an LLM has a 2% JSON defect rate, and Response Healing drops that to 1%, you haven't just made a 1% improvement. You've cut your defects, bugs, and support tickets in half.

This sounds AI written.

nubg · 5mo ago

Meaning parts were LLM written? With no disclosure?

Sabinus · 5mo ago

"With no disclosure?"

Why do you have an expectation that a company will disclose to you when they use AI for their copywriting? Do you want them to disclose the software they used to draft and publish? If a manager reviewed the blog post before it went live?

re-thc · 5mo ago

Next up: blog healing

kgeist · 5mo ago

With guided decoding (structured output according to a schema), a model can sometimes return broken JSON - usually if it stops midway for some reason. In those rare cases, a better approach would be to simply retry, no? Trying to "fix" broken JSON without understanding the context can mask real problems and produce data that appears to be correct, but is actually corrupted.
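The retry approach can be sketched like this. `call_model` is a hypothetical callable (prompt in, string out) standing in for whatever client is in use; it is not a real library API, and the attempt limit is an arbitrary choice.

```python
import json


def generate_json(call_model, prompt, max_attempts=3):
    """Call the model until its output parses as JSON, or raise.

    `call_model` is a hypothetical callable (prompt -> str), not a real API.
    """
    last_error = None
    for _ in range(max_attempts):
        raw = call_model(prompt)
        try:
            return json.loads(raw)
        except json.JSONDecodeError as err:
            last_error = err  # truncated or malformed output: try again
    raise RuntimeError(f"no valid JSON after {max_attempts} attempts") from last_error


# Stand-in model that truncates its first reply, then succeeds.
replies = iter(['{"name": "Alice", "age": 30', '{"name": "Alice", "age": 30}'])
print(generate_json(lambda _prompt: next(replies), "describe Alice"))
```

Nothing is guessed here: a broken reply is discarded rather than patched, at the cost of an extra request whenever one fails.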

petesergeant · 5mo ago

I see responses here split into users who actually rely on JSON outputs, who are happy, and people who don't, who are being snippy. Thank you OpenRouter, this is a great feature.

impure · 5mo ago

I have built something similar before. But I’ve never had any problems with Gemini not doing JSON properly. The problematic models are the open models such as Gemma and GPT OSS.

kristianp · 5mo ago

How do they know the output needs to be in json format?

stuaxo · 5mo ago

This is good, is there a Python library to do this?
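For the simplest case from the post, a truncated object, a naive repair needs only the standard library. This is a sketch under stated assumptions, not OpenRouter's implementation (which the post does not spell out); `close_brackets` and `heal` are hypothetical names, and only missing closing quotes and brackets are handled.

```python
import json


def close_brackets(text):
    """Append the closers that a truncated JSON text is missing.

    Only handles output that stopped early; it does not repair trailing
    commas, dangling keys, or other damage.
    """
    stack = []
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append("}" if ch == "{" else "]")
        elif ch in "}]" and stack and stack[-1] == ch:
            stack.pop()
    tail = '"' if in_string else ""
    return text + tail + "".join(reversed(stack))


def heal(text):
    """Parse as-is if possible, otherwise try the bracket-closing repair."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return json.loads(close_brackets(text))  # may still raise


print(heal('{"name": "Alice", "age": 30'))
```

A repair like this can turn a cut-off string or number into something that parses but is not what the model would have produced, so it is worth pairing with validation or a retry rather than trusting the result blindly.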

idle_zealot · 5mo ago

This really gets at the heart of my instinctive dislike of how LLMs are being deployed. A core feature of computers, and tools in general, is reliability. I like software because you can set something up, run it, and (ideally) know that it will do the same job the same way each subsequent time you run it. I want a button that is clearly labeled, and when pressed, does a specific thing, acting like a limb, an extension of my will. I do not, in almost all cases, want my computer to be another distinct entity that I conduct social interactions with.

Maybe people got used to computers being unreliable and unpredictable as the UIs we shipped became more distracting, less learnable, always shifting and hiding information, popping up suggestions and displaying non-deterministic-seeming behavior. We trained users to treat their devices like unruly animals that they can never quite trust. So now the idea of a machine that embodies a more clever (but still unreliable) animal to wrangle sounds like a clear upgrade.

But as someone who's spent an inordinate amount of time tweaking and tuning his computing environment to prune out flakey components and fine-tune bindings and navigation, the idea of integrating a tool into my workflow that does amazing things but fails utterly even 1% of the time sounds like a nightmare, a sort of perpetual torture of low-grade anxiety.

ksenzee · 5mo ago

> We trained users to treat their devices like unruly animals that they can never quite trust. So now the idea of a machine that embodies a more clever (but still unreliable) animal to wrangle sounds like a clear upgrade.

I wish I didn't agree with this, but I think you're exactly right. Even engineers dealing with systems we know are deterministic will joke about making the right sacrifices to the tech gods to get such-and-such working. Take that a step further and maybe it doesn't feel too bad to some people for the system to actually not be deterministic, if you have a way to "convince" it to do what you want. How depressing.

Eisenstein · 5mo ago

Software is only deterministic if the software it relies on never changes. Forced updates make this impossible, so treating software as deterministic is actually wrong.

seawatts · 5mo ago

This is incredible!
