
0 points · cyanf · 1y ago · 0 comments

They’re using the FS to cache the KV caches of past requests. That’s why they’re able to charge so little on a prompt cache hit.


jpgvm · 1y ago

Ahh, I missed that. Yes, prefix caching and RAG are two cases where you will want something like this at inference time.
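
To make the prefix-caching idea concrete, here is a minimal sketch of a disk-backed prefix cache. It is only an illustration under stated assumptions, not the system discussed above: `DiskPrefixCache`, `block_keys`, and the `BLOCK` size are hypothetical names, and the "KV" payloads are stand-in Python objects rather than real attention tensors. Each block of tokens gets a chained hash, so a key identifies the entire prefix up to that block, and a new request can reuse every leading block that is already on disk.

```python
import hashlib
import os
import pickle
import tempfile

BLOCK = 4  # tokens per cached block (hypothetical; chosen small for the example)


def block_keys(tokens):
    """Chain-hash each full block so a key identifies the whole prefix up to it."""
    keys, prev = [], b""
    full = len(tokens) - len(tokens) % BLOCK  # ignore a trailing partial block
    for i in range(0, full, BLOCK):
        h = hashlib.sha256(prev + repr(tokens[i:i + BLOCK]).encode()).digest()
        keys.append(h.hex())
        prev = h
    return keys


class DiskPrefixCache:
    """Stores one file per token block; payloads stand in for real KV tensors."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key):
        return os.path.join(self.root, key + ".kv")

    def store(self, tokens, kv_blocks):
        # Write each block's payload once, keyed by its prefix hash.
        for key, kv in zip(block_keys(tokens), kv_blocks):
            path = self._path(key)
            if not os.path.exists(path):
                with open(path, "wb") as f:
                    pickle.dump(kv, f)

    def longest_prefix(self, tokens):
        # Walk blocks in order and stop at the first miss.
        hit = []
        for key in block_keys(tokens):
            path = self._path(key)
            if not os.path.exists(path):
                break
            with open(path, "rb") as f:
                hit.append(pickle.load(f))
        return len(hit) * BLOCK, hit


cache = DiskPrefixCache(tempfile.mkdtemp())
cache.store([1, 2, 3, 4, 5, 6, 7, 8, 9], ["kv0", "kv1"])
reused, blocks = cache.longest_prefix([1, 2, 3, 4, 5, 6, 7, 8, 42, 43, 44, 45])
print(reused, blocks)
```

In this sketch the second request shares its first eight tokens with the stored one, so only the tokens after the shared prefix would need to be recomputed. That saved prefill work is what a cheaper cache-hit price would reflect.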
