2 Understanding Emergent Abilities of Language Models from the Loss Perspective (arxiv.org) 6 maccaw 1y ago 1
8 RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback (arxiv.org) 1 maccaw 2y ago 0