I assume quantized models will run a lot better, and TheBloke already seems to be on it:
https://huggingface.co/TheBloke/CodeLlama-13B-fp16
Since CodeLlama is Llama-based, existing Llama tooling may just work out of the box?