https://gr.inc/question/although-a-few-years-ago-the-fundame...
In the dropdown that's set to DeepSeek-R1, switch to the LIMO model (which reportedly exhibits frequent language switching).
I'm not sure about examples of gibberish or totally illegible reasoning. My guess is that since R1-Zero was still trained with a KL penalty, its outputs should all be somewhat legible: the KL penalty discourages the policy from drifting too far from what the base model would say in any given context.
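To make the mechanism concrete, here's a minimal sketch of a per-token KL penalty folded into the RL reward. This is illustrative only, not DeepSeek's actual objective: the function names, the single-sample KL estimator, and the `beta` coefficient are all assumptions for the sketch.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over one logits row.
    m = max(logits)
    z = math.log(sum(math.exp(x - m) for x in logits)) + m
    return [x - z for x in logits]

def kl_penalized_reward(policy_logits, ref_logits, tokens, task_reward, beta=0.05):
    """Sketch: one logits row per generated token; `tokens` are the sampled ids.

    The per-token term log pi(t) - log pi_ref(t) is a single-sample estimate of
    KL(pi || pi_ref). Subtracting beta * KL from the task reward penalizes the
    policy for putting mass where the reference (base) model wouldn't, which is
    what pulls generations back toward base-model-like, legible text.
    """
    kl = 0.0
    for p_row, r_row, t in zip(policy_logits, ref_logits, tokens):
        kl += log_softmax(p_row)[t] - log_softmax(r_row)[t]
    return task_reward - beta * kl
```

If the policy hasn't moved at all (identical logits), the penalty is zero and the reward is just the task reward; the further the policy's distribution drifts from the reference on the tokens it actually samples, the larger the deduction.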