I used RL fine-tuning to make an LLM generate ugly and unpythonic FizzBuzz code

(seantey.github.io)

4 points · seanrrr · 3mo ago · 1 comment

1 comment

seanrrr (OP) · 3mo ago

I wrote up a blog post for a hackathon project where I used RL fine-tuning to make an LLM generate intentionally ugly and unpythonic FizzBuzz code. The post covers what I learned about reward shaping and GRPO. Feedback on the writing or content is welcome!
