2. The Practitioner's Guide to the Maximal Update Parameterization (blog.eleuther.ai) 1 tipsytoad 1y ago 0
3. DenseFormer: Enhancing Information Flow in Transformers (arxiv.org) 123 tipsytoad 2y ago 33