What sort of on-board compute do you typically have today?
> As for what you should look to invest in?
> I'm sure it's just a coincidence that training neural networks and mining cryptocurrencies are both applications that benefit from very large arrays of GPUs. [...]
> If I was a VC I'd be hiring complexity theory nerds to figure out what areas of research are promising once you have Yottaflops of numerical processing power available, then I'd be placing bets on the GPU manufacturers going there
[1]: https://www.antipope.org/charlie/blog-static/2023/02/place-y...
With a bunch of people trailing behind with "it kind of works" open alternatives.
It's not so bad. Nvidia could come out and say, "hey, we're going to lock down your GPU so that it can only render polygons in our whitelisted video games, and you'll pay us $$$$$$ for our 'datacenter' thingy for anything else." But if they did that, people would go and buy the competitor's product.
And yes, some of their 4090s are probably being bought by rich kids with their parents' money, but I reckon most are sales to professionals, people who would justify the purchase with more than playing first-person shooters. I, for example, play video games with my gf, and we have equivalent GPUs. Hers is AMD and costs less than mine, even though it performs the same, but I went with Nvidia so that PhysX would be available and I could use PyTorch and Numba+GPU and even C++ CUDA. The moment Nvidia locks that down, I'll have to switch to AMD.
Meatcubator: https://youtu.be/Z_ZGq8Tah0k
Growing human brain cells: https://youtu.be/V2YDApNRK3g
(When fed the leaked Bing prompt, my AI decided it was Australian and started tossing in random shit like "but here in Australia, we'd call it limey green" when asked about chartreuse. I assume it's because the codename for Bing Chat is 'Sydney'.)
To read more about current popular models: https://github.com/KoboldAI/KoboldAI-Client
https://koboldai.net/ is a way to run some of these models in the "cloud". No account is required; prompts run on other people's hardware, with priority weighting based on how much compute you have used or donated. There's an anonymous API key, and there's no expectation that the output won't be logged.
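If you want to poke at it programmatically, here's a minimal sketch. The endpoint paths, payload shape, and the all-zeros anonymous key are my assumptions about the Horde API; check the current docs before relying on any of it.

    # Hypothetical sketch of submitting a prompt to the KoboldAI Horde.
    # Endpoint paths, payload fields, and the all-zeros anonymous key are
    # assumptions; verify against the current API docs.
    import time
    import requests

    BASE = "https://horde.koboldai.net/api/v2"  # assumed base URL
    HEADERS = {"apikey": "0000000000"}          # assumed anonymous key

    def generate(prompt, max_length=80):
        # Submit an async job to the volunteer worker pool.
        r = requests.post(
            f"{BASE}/generate/text/async",
            headers=HEADERS,
            json={"prompt": prompt, "params": {"max_length": max_length}},
        )
        r.raise_for_status()
        job_id = r.json()["id"]
        # Poll until a worker picks the job up and finishes it.
        while True:
            status = requests.get(f"{BASE}/generate/text/status/{job_id}").json()
            if status.get("done"):
                return status["generations"][0]["text"]
            time.sleep(2)

    print(generate("Once upon a time"))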
The models that run locally on consumer hardware produce pretty basic output. Here's an example of a 6B model being used to try to emulate ChatGPT: https://mobile.twitter.com/Knaikk/status/1629711223863345154 That model was fine-tuned on story completion, though, so it's not a meaningful comparison.
It's less popular because the hardware required for great output is still above top-of-the-line consumer specs. 24 GB of VRAM is closer to a bare minimum for meaningful output, and fine-tuning is still out of reach. There's some development around using services like RunPod.
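For a rough sense of why 24 GB is the floor, here's my own back-of-the-envelope arithmetic (not from the thread): weight memory is roughly parameter count times bytes per parameter, before activations and overhead.

    # Rough VRAM needed just for the weights: params * bytes per param.
    # My own estimate; activations, KV cache, and overhead come on top.
    def weight_gb(params_billions, bytes_per_param):
        return params_billions * 1e9 * bytes_per_param / 1024**3

    for size in (7, 13, 33, 65):
        print(f"{size}B  fp16: {weight_gb(size, 2):6.1f} GB"
              f"   int8: {weight_gb(size, 1):6.1f} GB")
    # 7B at fp16 is already ~13 GB, so even the smallest model is tight
    # on a 24 GB card; 65B is far beyond any consumer GPU at fp16.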
Stable Diffusion was in the same place at the same point after its model was released. It's only been a few days.
You can download it from Facebook, but it's behind an "apply for access" form. The magnet links floating around are just a workaround for that form.
That said, commercial use is forbidden by the license specified in the form: https://docs.google.com/forms/d/e/1FAIpQLSfqNECQnMkycAp2jP4Z...
1) Spin it up on a cluster in Belarus
2) ???
3) Profit?
FB trained a LLaMA-I (instruction-tuned) variant, just for sport, to show they can, but I don't think it got released.
User: <question or task>
Assistant:
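If you're scripting that format, a trivial helper might look like this. The template is the one quoted above; trimming the completion at the next "User:" turn is my assumption about how people usually stop generation.

    # Minimal helper for the User:/Assistant: template above. Cutting the
    # completion at the next "User:" turn is an assumption, not part of
    # the quoted template.
    def build_prompt(question):
        return f"User: {question}\nAssistant:"

    def extract_reply(completion):
        # Raw base models tend to keep writing the next turn; trim it.
        return completion.split("User:")[0].strip()

    print(build_prompt("Summarize the LLaMA license in one sentence."))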
Not so bad!
I computed the speed as: speed = number of words / total run time
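In code, that measurement is just a timer around the generation call. run_model here is a hypothetical stand-in for whatever you're timing, and splitting on whitespace is the same rough word count the comment uses:

    import time

    def run_model(prompt):
        # Hypothetical stand-in for the actual generation call being timed.
        return "the quick brown fox jumps over the lazy dog " * 10

    start = time.perf_counter()
    output = run_model("Tell me a story")
    elapsed = time.perf_counter() - start

    speed = len(output.split()) / elapsed  # words per second
    print(f"{speed:.2f} words/s over {elapsed:.4f} s")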
It's not that hard to create a consumer-grade desktop with 256GB in 2023.
Even without enough RAM, you can stream model weights from disk and run at [size of model / disk read speed] seconds per token (worked example below).
I'm doing that on a small GPU with this code, but it should be easy to get it working with the CPU as compute instead. (At least with my disk and CPU, I'm not sure it would even run slower; I think disk reads would still be the bottleneck.)
A lack of an absurd number of CPUs just means it's slow, not impossible.
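To put illustrative numbers on that [size of model / disk read speed] bound (my figures, not the poster's measurements):

    # Worked example of the [model size / disk read speed] estimate above.
    # Illustrative figures, not measurements from the thread.
    model_gb = 130        # e.g. 65B params at 2 bytes each (fp16)
    disk_gb_per_s = 3.0   # a fast NVMe drive reading sequentially

    seconds_per_token = model_gb / disk_gb_per_s
    print(f"~{seconds_per_token:.0f} s/token")  # ~43 s/token: slow, not impossible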
However, according to the benchmarks the 65B-parameter model is such a beast that you might be able to do things with it that aren't possible on ChatGPT (despite all of ChatGPT's quality-of-life features). Amazing times.
So before you start a task, you sort of describe the domain, and the model is split into the third most useful and relevant to that topic/query and the two thirds most distant from that realm. Then either just that third is used on its own, or the split works as two layers of cache, one in RAM and one on disk.
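Here's a hypothetical sketch of how that two-tier idea could look; the block granularity, the relevance scoring, and all the names are mine, since nothing like this exists in the released code:

    # Hypothetical sketch of the two-tier idea above: pin the third of the
    # weight blocks judged most relevant to the stated domain in RAM, and
    # page the remaining two thirds from disk on demand. Names and the
    # relevance heuristic are invented for illustration.
    class TwoTierWeights:
        def __init__(self, blocks, relevance):
            self.blocks = blocks        # block name -> path to weights on disk
            self.relevance = relevance  # (domain, block name) -> score
            self.hot = {}               # the in-RAM tier

        def prepare(self, domain):
            # Rank blocks by relevance to the domain, pin the top third.
            ranked = sorted(self.blocks,
                            key=lambda b: self.relevance(domain, b),
                            reverse=True)
            keep = ranked[: max(1, len(ranked) // 3)]
            self.hot = {b: open(self.blocks[b], "rb").read() for b in keep}

        def load(self, name):
            # RAM tier first, then fall back to a disk read.
            if name in self.hot:
                return self.hot[name]
            return open(self.blocks[name], "rb").read()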
Per the readme, it looks like there are a few bugs to figure out, in case anyone here is a PyTorch expert.
Most gaming desktops have a solid GPU but not enough VRAM. It's a pity to have the GPU sit idle here.
Uh-oh, bad start.
It could be venv as well, I suppose; I haven't used conda.
Looks like this just tweaks some defaults and comments out the code that enables CUDA. It also switches to something called gloo, which I'm not familiar with; seems to be an alternate backend.
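For context: gloo is PyTorch's CPU-capable collective-communication backend, while nccl is the CUDA one, so swapping it in makes sense for a CPU-only run. A minimal single-process sketch of picking between them (the rendezvous address is arbitrary):

    # gloo is PyTorch's CPU-friendly distributed backend; nccl needs CUDA.
    # Minimal single-process sketch; the rendezvous address is arbitrary.
    import torch
    import torch.distributed as dist

    backend = "nccl" if torch.cuda.is_available() else "gloo"
    dist.init_process_group(
        backend=backend,
        init_method="tcp://127.0.0.1:29500",
        rank=0,
        world_size=1,
    )
    print(f"initialized {backend} backend")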
Mark LLM: “Yes, unfortunately, the media and our competitors are all over the idea that Meta is a “dirty company”. They have tried to spin all our successes and accomplishments in a negative light. This has been incredibly frustrating and demoralizing for us, but we know that we are working hard to build a great company and we are confident that our efforts will be rewarded. In the end, our products speak for themselves, and despite all the negative media coverage we are focused on continuing to build great products for our users and being an amazing place for them to socialize in the virtual world.”