> Now, instead of asking the model to respond in JSON we use XML tags to separate the start and end of the contemplation phase and the final answer
I suspect the author wisely avoided function-calling/JSON since it doesn't guarantee sequence. This, along with a few other frailties, makes me almost always use XML-like markup for my LLM API calls.
Markup languages like XML and HTML lend themselves beautifully to this task. They are stream-friendly, semantically rich, and leniently parseable (HTML was designed in part for fallible humans to write and for browsers to render incrementally), and by nature of being "markup" they complement the autoregressive way LLMs generate text. One assumes as well that tonnes of prose appears in the HTML found in training corpora, far less in JSON, which is mostly used for transactional data and RPC-like things; that must surely bias JSON completions toward more robotic formulations.

FWIW I ended up creating a library (github.com/padolsey/xmllm) to help me get structured data from LLMs using XML (through the forgiving eyes of an HTML parser), so that I never have to rely on model-specific tool/function-calling abstractions. Even tiny models like Qwen2.5 and Ministral 3B have pretty superb (x|ht)ml compliance, much less so with JSON.
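To make the "forgiving eyes of an HTML parser" point concrete, here is a minimal sketch (not the xmllm library itself) using Python's stdlib `html.parser`, which tolerates streamed, chunked markup. The tag names and the `TagCollector` class are illustrative, and it only handles flat, top-level tags:

```python
# Minimal sketch: extracting tagged sections from a streamed LLM response
# with Python's lenient stdlib HTMLParser. Handles top-level tags only;
# tag names (<thinking>, <answer>) are illustrative.
from html.parser import HTMLParser

class TagCollector(HTMLParser):
    """Accumulates the text inside each top-level tag into a dict."""
    def __init__(self):
        super().__init__()
        self.current = None
        self.sections = {}

    def handle_starttag(self, tag, attrs):
        self.current = tag
        self.sections.setdefault(tag, "")

    def handle_endtag(self, tag):
        if self.current == tag:
            self.current = None

    def handle_data(self, data):
        if self.current:
            self.sections[self.current] += data

parser = TagCollector()
# Feed the response in arbitrary chunks, as a streaming API would deliver
# them -- tags split mid-name are handled fine.
for chunk in ["<thinking>Check units.</thinki", "ng><answer>42 k", "m</answer>"]:
    parser.feed(chunk)

print(parser.sections["answer"])  # -> 42 km
```

The same parser also recovers gracefully when a small model emits slightly malformed markup, which is much harder to do with strict JSON parsing.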
I always found the "structured output" feature thing odd. LLMs will follow any structure present in the prompt, so with a handful of few-shot examples you have always been able to make them do anything: return JSON, Python function calls, or any syntax you come up with that works well for your task (e.g. streaming, separating thinking tokens from output, etc.).
All of that comes with just a little overhead for the developer: giving the examples, writing the parsing logic, and having a strategy for the rare cases where parsing fails (retry, default value, prompt adaptation).
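The three pieces of overhead above (examples, parsing, failure strategy) fit in a few lines. A minimal sketch, where `call_model`, the prompt template, and the retry/default numbers are all hypothetical stand-ins for whatever your stack uses:

```python
# Sketch of the developer overhead described above: a few-shot prompt,
# a parser, and a retry-then-default strategy. `call_model` is a
# stand-in for any LLM API call.
import re

FEW_SHOT_PROMPT = """Answer inside <answer> tags.

Q: What is 2 + 2?
A: <answer>4</answer>

Q: {question}
A:"""

def parse_answer(text):
    """Pull the content of the first <answer> tag, or None on failure."""
    m = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return m.group(1).strip() if m else None

def ask(question, call_model, retries=2, default="unknown"):
    prompt = FEW_SHOT_PROMPT.format(question=question)
    for _ in range(retries + 1):
        answer = parse_answer(call_model(prompt))
        if answer is not None:
            return answer
    return default  # fall back rather than crash

# Stub model: ignores the format once, then complies on the retry.
replies = iter(["Sure! The answer is 5.", "<answer>5</answer>"])
result = ask("What is 2 + 3?", lambda p: next(replies))
print(result)  # -> 5
```

Because nothing here depends on a vendor's tool-calling API, swapping `call_model` for a different provider (or a local model) changes no other line.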
Once you embrace this approach, switching models is trivial: you can evaluate heavily RLHF'd models like OpenAI's against open models, base models, and tiny models. It's very easy to save a lot of money and achieve incredible speeds.