1. Install llama.cpp from https://github.com/ggerganov/llama.cpp. Alternatively, install https://github.com/oobabooga/text-generation-webui.
2. Go to https://huggingface.co/TheBloke and search for GGUF. Download a model file and put it in the same directory as the llama.cpp binaries. Then find the "example llama.cpp command line" on the model page and run it without the "-ngl 35" switch (that switch offloads layers to a GPU).
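The steps above might look like this on a Linux box. The Mixtral model file used here is only an example - substitute whatever GGUF file you actually downloaded from the model page - and the binary name (`main`) reflects llama.cpp at the time of writing, so check the project README if it has changed:

```shell
# Sketch of the workflow above, assuming a Linux/macOS shell with git,
# make, and wget available. The exact model filename is an assumption --
# check TheBloke's model page for the real file list.

# 1. Build llama.cpp from source
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

# 2. Download a GGUF model into the same directory
wget https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF/resolve/main/mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf

# 3. Run the model-page example command, but drop "-ngl 35"
#    (that flag offloads layers to a GPU; leave it out for CPU-only use)
./main -m mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf -p "Explain GGUF in one sentence." -n 128
```

The Q4_K_M quantization is a common middle-ground choice; smaller quants trade answer quality for RAM, larger ones the reverse.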
However, at this point, if your laptop has at least 32 GB of RAM, there is little point in trying anything except Mixtral 8x7B and its fine-tunes. It is fast (about 4 tokens per second on an 8-core Ryzen with no GPU acceleration, which would not work on integrated Ryzen APUs anyway, since they have no dedicated VRAM) and gives answers only slightly worse than ChatGPT 3.5. Its main deficiency is a tendency to forget the initial instruction: for example, when asked to explain a particular Samba configuration file, it started out fine but then went on to mention directives that were not in the file under discussion.