A minimal hackable implementation of policy gradients (GRPO, PPO, REINFORCE) | Better HN
(github.com)
1 point
starzmustdie
2mo ago
0 comments