1. When Models Manipulate Manifolds: The Geometry of a Counting Task (transformer-circuits.pub) | 4 points | wheel | 5mo ago | 0 comments
4. Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet (transformer-circuits.pub) | 168 points | wheel | 1y ago | 124 comments
5. The Claude 3 Model Family: Opus, Sonnet, Haiku [pdf] (www-cdn.anthropic.com) | 33 points | wheel | 2y ago | 3 comments
7. Patchscopes: A Unifying Framework for Inspecting Hidden Representations of LMs (pair-code.github.io) | 2 points | wheel | 2y ago | 0 comments
8. Do Machine Learning Models Memorize or Generalize? (pair.withgoogle.com) | 454 points | wheel | 2y ago | 210 comments
9. An interactive introduction to grokking and mechanistic interpretability (pair.withgoogle.com) | 1 point | wheel | 2y ago | 0 comments
10. From Confidently Incorrect Models to Humble Ensembles (pair.withgoogle.com) | 1 point | wheel | 3y ago | 1 comment
12. Searching for Unintended Biases with Saliency (pair.withgoogle.com) | 2 points | wheel | 3y ago | 1 comment
13. Interactive Visualizations of Federated Learning (pair.withgoogle.com) | 1 point | wheel | 3y ago | 0 comments
15. It’s Not Spider-Man’s Fault: Why Best Picture Winners Aren’t Hits Anymore (roadtolarissa.com) | 2 points | wheel | 4y ago | 0 comments