He speaks very unclearly: instead of saying GPT-4-turbo he says 4.5 preview. 4.5 is an invention of his.
Also "mixtral medium" - no idea what he means by that.
Not to mention the claim that Mixtral is as good as GPT-4. It's at the level of GPT-3.5 at best, which is still amazing for an open-source model, but a year behind OpenAI.
Mistral-medium is a model that Mistral serves only via API, since it's a prototype. It hasn't been released yet, and it's bigger than the Mixtral-8x7B model.
Sorry, but there's little that's unclear about what he said.
"mixtral medium" is just a typo: he means mistral-medium.
And GPT 4.5 is certainly not an "invention of his". Whether it exists or not (which is debatable; OpenAI said the name only showed up in a GPT-4 hallucination and then caught on), it's a version name that's been thrown around for like a month in forums, blog posts, news articles and such.
I just spoke all night to 8x7B and can say that it sucks much less than 3.5. It doesn't screw up and apologize all the time (and screw up again), and it doesn't repeat what I just said verbatim. That is, on topics where I have decent expertise myself. With 3.5 I never had this experience of periodically forgetting I'm not in human company.
Local setup: "text generation webui", TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF (Q4_K_M) on HF. You can run it on a decent Intel CPU; it takes around 32.5GB of RAM including the OS (8GB for me). A GPU with tensor cores can speed up a few layers if you have one, but isn't required. I get around 2.5-3 t/s with an 8700 and a 4070 Ti, which is enough for chats that require some thinking.
Edit: I was using a 2k context window; a larger one would probably eat more RAM. But even with 2k it didn't feel like it was losing context or anything.
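If you'd rather script it than click through the webui, here's roughly the same setup as a minimal sketch with llama-cpp-python (the GGUF backend the webui uses under the hood). The file name, thread count and layer offload are just placeholder guesses; adjust them to whatever you actually downloaded and run on:

    # Minimal sketch: Mixtral 8x7B Q4_K_M via llama-cpp-python, mostly on CPU.
    # pip install llama-cpp-python
    from llama_cpp import Llama

    llm = Llama(
        model_path="mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf",  # the Q4_K_M file from TheBloke's repo
        n_ctx=2048,       # the 2k window mentioned above; bigger windows eat more RAM
        n_gpu_layers=8,   # offload a few layers if you have a GPU; 0 = pure CPU
        n_threads=8,      # set to your physical core count
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Summarize how MoE routing works."}],
        max_tokens=256,
    )
    print(out["choices"][0]["message"]["content"])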
For macOS and Linux, Ollama is probably the easiest way to try Mixtral (and a large number of models) locally. LM Studio is also nice and available for Mac, Windows, and Linux.
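Once Ollama is installed it's basically "ollama pull mixtral" and then "ollama run mixtral" in the terminal; you can also hit its local REST API from a script. A quick sketch against Ollama's default local endpoint (assumes the model is already pulled):

    # Minimal sketch: query a locally running Ollama instance serving Mixtral.
    import requests

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "mixtral", "prompt": "Why is the sky blue?", "stream": False},
    )
    print(resp.json()["response"])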
As these models can be quite large and memory intensive, if you want to just give it a quick spin, huggingface.co/chat, chat.nbox.ai, and labs.pplx.ai all have Mixtral hosted atm.
I think by Mixtral Medium they mean the MoE 2x13B that's at the top of the Hugging Face leaderboard? It's still not close to 8x175B, but size alone is not the most important factor. With smarter training methods and data, it's possible we'll see performance similar to GPT-4 from smaller open-source mixture-of-experts models.