> If you exceed the context window the remote LLM endpoint will throw you an error which you probably want to catch
Not every endpoint works the same way. I'm pretty sure LM Studio's OpenAI-compatible endpoints will silently (from the client's perspective) truncate the context rather than throw an error. In those cases it's up to the client to make sure the context fits.
OpenAI's own endpoints do return an error and refuse the request if you exceed the context length, though. I've also seen others use the "finish_reason" attribute to signal that the context length was exceeded, rather than setting an error status code on the response.
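A defensive client has to handle both failure modes. Here's a rough sketch of a hypothetical helper that inspects a parsed `/v1/chat/completions` response body (the function name and error handling are my own, not a real library API):

```python
def check_completion(response: dict) -> str:
    """Return the completion text, or raise if the server signaled a limit.

    `response` is the parsed JSON body of a chat-completions reply.
    """
    # OpenAI-style endpoints reject over-long prompts with an error object.
    if "error" in response:
        raise RuntimeError(f"request rejected: {response['error'].get('message')}")
    choice = response["choices"][0]
    # Some servers instead finish "successfully" but flag the limit here.
    if choice.get("finish_reason") == "length":
        raise RuntimeError("context/output limit hit (finish_reason == 'length')")
    return choice["message"]["content"]
```

Even this doesn't help with servers that silently truncate the prompt; for those, the only safe option is to count tokens client-side before sending.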
Overall, even "OpenAI-compatible" endpoints often aren't 100% faithful reproductions of the OpenAI endpoints, sadly.