thw20 | Better HN

thw20

1 karma | Joined April 24, 2024 | 9 submissions

Recent submissions

1

Simple, zero overhead way to compress model, KV cache via Low-Rank Decomposition

(jeffreywong20.github.io)

1 point | thw20 | 13d ago | 0 comments

2

Towards understanding multiple attention sinks in LLMs

(github.com)

1 point | thw20 | 2mo ago | 2 comments

3

The Existence and Behavior of Secondary Attention Sinks

(arxiv.org)

1 point | thw20 | 3mo ago | 0 comments