1. StreamIndex: Memory-bounded compressed sparse attention via streaming top-k (arxiv.org) | 4 | OsamaJaber | 2d ago | 0
3. DeepSeek V4's indexer dies at 65K. We got it to 1M on 6GB (arxiv.org) | 5 | OsamaJaber | 12d ago | 0
4. AutoKernel: Autonomous GPU Kernel Optimization via Iterative Agent-Driven Search (arxiv.org) | 4 | OsamaJaber | 16d ago | 0
5. DeepSeek V4's indexer OOMs at 65K context. We got it to 1M in 6G (arxiv.org) | 8 | OsamaJaber | 20d ago | 0
6. Ouroboros: Dynamic Weight Generation for Recursive Transformers (arxiv.org) | 2 | OsamaJaber | 1mo ago | 0
7. Tide: Token-Informed Depth Execution for Per-Token Early Exit in LLM Inference (arxiv.org) | 3 | OsamaJaber | 1mo ago | 1