FergusArgyll · 2y ago
You shouldn't have to quantize it that much; maybe you're running a lot of other programs while running inference?
Also, try using pure llama.cpp. AFAIK it has the least possible overhead.
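For reference, here's a minimal sketch of what that looks like through the llama-cpp-python bindings (a thin wrapper over llama.cpp). The model path, thread count, and layer count below are assumptions; tune them for your machine:

    # pip install llama-cpp-python  (builds llama.cpp locally; Metal is used on Apple Silicon)
    from llama_cpp import Llama

    # Hypothetical GGUF path; any quantized model file works here.
    llm = Llama(
        model_path="./phi-2.Q4_K_M.gguf",
        n_ctx=2048,        # context window size
        n_threads=4,       # roughly match your performance-core count
        n_gpu_layers=-1,   # offload all layers to the GPU (Metal) on M1
    )

    out = llm("Q: Why is the sky blue? A:", max_tokens=64)
    print(out["choices"][0]["text"])

If you want truly zero wrapper overhead, the example CLI binary that ships with the llama.cpp repo itself does the same thing with no Python in the loop.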
regularfry · 2y ago
Getting more value out of phi-2-sized models is where you really want to be on lower-end M1s.
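Rough arithmetic on why that size fits: phi-2 has about 2.7B parameters, so at roughly 4-5 bits per weight the quantized weights come in well under 2 GiB, leaving headroom on an 8 GB base M1. A quick back-of-the-envelope (the bits-per-weight figures are approximate averages, not exact GGUF sizes):

    # Approximate weight-only memory footprint for a 2.7B-parameter model.
    # Real GGUF files add some overhead (scales, embeddings, metadata).
    params = 2.7e9
    for name, bits_per_weight in [("f16", 16), ("Q8_0", 8.5), ("Q4_K_M", 4.85)]:
        gib = params * bits_per_weight / 8 / 2**30
        print(f"{name:8s} ~{gib:.1f} GiB")

That prints roughly 5.0 GiB for f16, 2.7 GiB for Q8_0, and 1.5 GiB for Q4_K_M, which is why a phi-2-sized model at a mid-level quant runs comfortably where a 7B model starts to squeeze an 8 GB machine.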