matt_d on Hacker News

1

You don't need all the LLM benchmarks

(alex.smola.org)

1 point by matt_d 26m ago | 0 comments

2

Elusive order of async GPU kernels: scheduling, abstractions, DSL implications

(ianbarber.blog)

1 point by matt_d 1h ago | 0 comments

3

MileStone: A Multi-Objective Compiler Phase Ordering Framework

(arxiv.org)

1 point by matt_d 2h ago | 0 comments

4

SSV: Sparse Speculative Verification for Efficient LLM Inference

(arxiv.org)

4 points by matt_d 2d ago | 0 comments

5

Characterizing Real-World Bugs in Tile Programs for Automated Bug Detection

(arxiv.org)

2 points by matt_d 2d ago | 0 comments

6

Characterization of machine learning compilers for LLM inference on NVIDIA GPUs

(link.springer.com)

3 points by matt_d 2d ago | 0 comments

7

Chip design from the bottom up – Reiner Pope [video]

(youtube.com)

2 points by matt_d 3d ago | 0 comments

8

LT2: Linear-Time Looped Transformers

(charlesdddd.github.io)

2 points by matt_d 3d ago | 0 comments

9

Event Tensor: A Unified Abstraction for Compiling Dynamic Megakernel

(arxiv.org)

6 points by matt_d 3d ago | 0 comments

10

PopPy: Opportunistically Exploiting Parallelism in Python Compound AI Apps

(arxiv.org)

1 point by matt_d 3d ago | 0 comments

11

CODA: Rewriting Transformer Blocks as GEMM-Epilogue Programs

(arxiv.org)

105 points by matt_d 4d ago | 12 comments

12

[RFC] Open Access to Standards Documents – LLVM Project

(discourse.llvm.org)

6 points by matt_d 4d ago | 0 comments

13

Curly braces: An evolution of UNIX and C

(thalia.dev)

6 points by matt_d 4d ago | 2 comments

14

NanoTag: Systems Support for Efficient Byte-Granular Overflow Detection on Arm

(github.com)

2 points by matt_d 5d ago | 0 comments

15

InferenceBench: A Benchmark for Open-Ended Inference Optimization by AI Agents

(inferencebench.ai)

2 points by matt_d 5d ago | 0 comments
