charles_irl on Hacker News

1. Cutting inference cold starts by 40x with LP, FUSE, C/R, and CUDA-checkpoint (modal.com)
   91 points by charles_irl 7d ago | 18 comments

2. How to Achieve Serverless GPUs (modal.com)
   8 points by charles_irl 13d ago | 0 comments

3. Three types of LLM workloads and how to serve them (modal.com)
   75 points by charles_irl 4mo ago | 5 comments

4. Host overhead is killing your inference efficiency (modal.com)
   3 points by charles_irl 6mo ago | 0 comments

5. Quantized Float Exposed (quant.exposed)
   2 points by charles_irl 6mo ago | 1 comment

6. Against SQL (2021) (scattered-thoughts.net)
   82 points by charles_irl 7mo ago | 77 comments

7. Length-extension attacks are still a thing (00f.net)
   2 points by charles_irl 7mo ago | 1 comment

8. The future of Python web services looks GIL-free (blog.baro.dev)
   3 points by charles_irl 7mo ago | 0 comments

9. Lexical differential highlighting instead of syntax highlighting (wordsandbuttons.online)
   2 points by charles_irl 7mo ago | 0 comments

10. CReact – JSX for the Cloud (github.com)
    1 point by charles_irl 7mo ago | 0 comments

11. QUIC and the end of TCP sockets (codemia.io)
    62 points by charles_irl 7mo ago | 82 comments

12. In C++ modules globally unique module names seem to be unavoidable (nibblestew.blogspot.com)
    2 points by charles_irl 7mo ago | 0 comments

13. Stupid jj Tricks (andre.arko.net)
    3 points by charles_irl 7mo ago | 0 comments

14. We reverse-engineered Flash Attention 4 (modal.com)
    5 points by charles_irl 8mo ago | 0 comments

15. A Tour of eBPF in the Linux Kernel: Observability, Security and Networking (lucavall.in)
    2 points by charles_irl 8mo ago | 0 comments
