Skip to content
Better HN
Top
New
Best
Ask
Show
Jobs
Search
⌘K
undefined | Better HN
0 points
DeathArrow
4h ago
0 comments
Share
>This preview runs a 2B model
I guess with 1B or 500M model inference would be even faster?
0 comments
default
newest
oldest
gaeld
3h ago
In theory yes, although not in a linearly proportional way, because in practice our memory streaming is not yet perfect. There are still some fixed costs that we did not fully optimize (for now).
j
/
k
navigate · click thread line to collapse