One of the questions I've been thinking about a lot looking at the past year of interpretability research is just how much of what we are finding is "what we're attuned to find" as opposed to "what's actually there."
Are we only measuring the tip of the iceberg, and have coalesced towards getting better at iceberg tip measuring?