2. Elusive order of async GPU kernels: scheduling, abstractions, DSL implications (ianbarber.blog) | 1 point | matt_d | 1h ago | 0 comments
3. MileStone: A Multi-Objective Compiler Phase Ordering Framework (arxiv.org) | 1 point | matt_d | 2h ago | 0 comments
4. SSV: Sparse Speculative Verification for Efficient LLM Inference (arxiv.org) | 4 points | matt_d | 2d ago | 0 comments
5. Characterizing Real-World Bugs in Tile Programs for Automated Bug Detection (arxiv.org) | 2 points | matt_d | 2d ago | 0 comments
6. Characterization of machine learning compilers for LLM inference on NVIDIA GPUs (link.springer.com) | 3 points | matt_d | 2d ago | 0 comments
9. Event Tensor: A Unified Abstraction for Compiling Dynamic Megakernel (arxiv.org) | 6 points | matt_d | 3d ago | 0 comments
10. PopPy: Opportunistically Exploiting Parallelism in Python Compound AI Apps (arxiv.org) | 1 point | matt_d | 3d ago | 0 comments
11. CODA: Rewriting Transformer Blocks as GEMM-Epilogue Programs (arxiv.org) | 105 points | matt_d | 4d ago | 12 comments
12. [RFC] Open Access to Standards Documents – LLVM Project (discourse.llvm.org) | 6 points | matt_d | 4d ago | 0 comments
14. NanoTag: Systems Support for Efficient Byte-Granular Overflow Detection on Arm (github.com) | 2 points | matt_d | 5d ago | 0 comments
15. InferenceBench: A Benchmark for Open-Ended Inference Optimization by AI Agents (inferencebench.ai) | 2 points | matt_d | 5d ago | 0 comments