2. Prompt caching but for RL – 7.5x speedup on long-prompt/short-response workloads (castform.com) | 4 points | kumama | 14d ago | 0 comments
3. Pokegents: Making multi-agent coding feel like a team (castform.com) | 8 points | kumama | 17d ago | 1 comment
4. GRPO explained: group relative policy optimization for LLM finetuning (cgft.io) | 1 point | kumama | 1mo ago | 0 comments
9. Show HN: Benchmax, a new open-source RL environment framework for LLM finetuning (github.com) | 1 point | kumama | 10mo ago | 0 comments
10. Beating o3/o4-mini with Codebase-specific Reinforcement Learning (cgft.io) | 3 points | kumama | 11mo ago | 0 comments
11. We might be overestimating coding agent performance on SWE-Bench (cgft.io) | 1 point | kumama | 1y ago | 1 comment
12. How to Improve Code Completion LLMs with Repo-Specific Finetuning (cgft.io) | 3 points | kumama | 1y ago | 1 comment
13. Show HN: Free AI Code Completion for Xcode with model choice/codebase context (cgft.io) | 2 points | kumama | 1y ago | 0 comments