Skip to content
Better HN
Top
New
Best
Ask
Show
Jobs
Search
⌘K
undefined | Better HN
0 points
pclmulqdq
2y ago
0 comments
Share
They are putting the whole LLM into SRAM across multiple computing chips, IIRC. That is a very expensive way to go about serving a model, but should give pretty great speed at low batch size.
0 comments
No comments yet.