My recommendation is:
- ExUI with EXL2 files on strong GPUs.
- KoboldCpp with GGUF files for small GPUs and Apple silicon.
There are many reasons, but in a nutshell they are the fastest and most VRAM-efficient options in their niches.
I can fit a 34B with about 75K of context on a single 24GB RTX 3090 before the quality drop from quantization really starts to get dramatic.
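To see why a quantized KV cache matters for fitting that much context, here's a rough back-of-the-envelope sketch. The layer/head numbers below are illustrative assumptions for a GQA-style 34B (roughly Yi-34B-shaped: 60 layers, 8 KV heads, head dim 128), not measured values, and `kv_cache_bytes` is just a helper I made up for the arithmetic:

```python
# Back-of-the-envelope KV-cache sizing. Model shape is an assumption
# (roughly a GQA 34B); real numbers depend on the exact architecture.
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_elem):
    # 2x for the K and V tensors, one pair per layer.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

GIB = 1024 ** 3

# FP16 cache (2 bytes/element) at 75K context:
fp16 = kv_cache_bytes(60, 8, 128, 75_000, 2)

# 4-bit cache (0.5 bytes/element), in the spirit of exllamav2's Q4 cache:
q4 = kv_cache_bytes(60, 8, 128, 75_000, 0.5)

print(f"FP16 cache: {fp16 / GIB:.1f} GiB, Q4 cache: {q4 / GIB:.1f} GiB")
# -> FP16 cache: 17.2 GiB, Q4 cache: 4.3 GiB
```

Under these assumptions an FP16 cache alone would eat most of a 24GB card before you even load the weights, while a 4-bit cache leaves room for a heavily quantized 34B alongside it.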