Better HN

0 points · measurablefunc · 2mo ago · 0 comments

There is no RL for programming languages. Especially ones w/ no significant amount of code.

nl 2mo ago

I guess the OP was implying that it's something fixable fairly easily?

(Which is true - it's easy to prompt your LLM with the language grammar, have it generate code and then RL on that)

Easy in the sense of "it's only a matter of having enough GPUs to RL a coding-capable LLM", anyway.

measurablefunc (OP) 2mo ago

If you can generate code from the grammar, then what exactly are you RLing? The point was to generate code in the first place, so what does backpropagation get you here?

nl 2mo ago

Post RL you won't need to put the grammar in the prompt anymore.

thorum 2mo ago

Go read the DeepSeek R1 paper

measurablefunc (OP) 2mo ago

Why would I do that? If you know something then quote the relevant passage & equation that says you can train code generators w/ RL on a novel language w/ little to no code to train on. More generally, don't ask random people on the internet to do work for you for free.

thorum 2mo ago

Your other comment sounded like you were interested in learning about how AI labs are applying RL to improve programming capability. If so, the DeepSeek R1 paper is a good introduction to the topic (maybe a bit out of date at this point, but very approachable). RL training works fine for low-resource languages as long as you have tooling to verify outputs and enough compute to throw at the problem.
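To make the "tooling to verify outputs" point concrete, here is a minimal sketch of a verifier-based reward. It is not taken from the DeepSeek R1 paper or from any lab's pipeline. The toy RPN language and the names `run_rpn` and `reward` are hypothetical stand-ins for a real low-resource language and its compiler or test runner.

```python
from typing import List, Optional, Tuple


def run_rpn(program: str, inputs: List[float]) -> Optional[float]:
    """Toy verifier (hypothetical): evaluate a whitespace-separated RPN program.

    Tokens `x0`, `x1`, ... refer to the inputs. Returns None if the
    program is malformed, which stands in for a compile or runtime error.
    """
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
    }
    stack: List[float] = []
    for tok in program.split():
        if tok in ops:
            if len(stack) < 2:
                return None  # operator without enough operands
            b, a = stack.pop(), stack.pop()
            stack.append(ops[tok](a, b))
        elif tok.startswith("x") and tok[1:].isdecimal():
            idx = int(tok[1:])
            if idx >= len(inputs):
                return None  # reference to a missing input
            stack.append(inputs[idx])
        else:
            try:
                stack.append(float(tok))
            except ValueError:
                return None  # unknown token
    # A valid program leaves exactly one value on the stack.
    return stack[0] if len(stack) == 1 else None


def reward(program: str, tests: List[Tuple[List[float], float]]) -> float:
    """Fraction of test cases passed; invalid programs score 0.0."""
    if not tests:
        return 0.0
    passed = 0
    for inputs, expected in tests:
        result = run_rpn(program, inputs)
        if result is not None and abs(result - expected) < 1e-9:
            passed += 1
    return passed / len(tests)


# Task (assumed for illustration): compute 2 * x0 + x1.
tests = [([2.0, 3.0], 7.0), ([1.0, 1.0], 3.0)]
scores = {p: reward(p, tests) for p in ["x0 2 * x1 +", "x0 x1 + 2 +", "x0 +"]}
```

The score needs no existing corpus in the target language, only a way to run and check candidates. An RL loop would sample programs from the model and use this number as the reward signal; the sampling and update steps are omitted here.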

whimsicalism 2mo ago

well, that’s one way to react to being provided with interesting reading material.

whimsicalism 2mo ago

not even wrong

measurablefunc (OP) 2mo ago

Exactly.
