But rest assured there's an improvement; it's not like people would be doing it if there wasn't any benefit!
“bfloat16 data type and arithmetic instructions (AI and others)”
https://eclecticlight.co/2024/01/15/why-the-m2-is-more-advan...
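For anyone curious what bfloat16 actually is: it's just the top 16 bits of a float32 (1 sign bit, 8 exponent bits, 7 mantissa bits), which keeps float32's range but drops precision. A minimal sketch in plain Python (note: this truncates, whereas real hardware typically rounds to nearest):

```python
import struct

def to_bfloat16_bits(x: float) -> int:
    """Truncate a float32 to bfloat16 by keeping its top 16 bits
    (1 sign bit, 8 exponent bits, 7 mantissa bits)."""
    f32_bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return f32_bits >> 16

def from_bfloat16_bits(b: int) -> float:
    """Re-expand bfloat16 bits to a float32 by zero-padding the mantissa."""
    return struct.unpack(">f", struct.pack(">I", b << 16))[0]

# Same exponent range as float32, but only ~2-3 decimal digits of precision
approx = from_bfloat16_bits(to_bfloat16_bits(3.14159))
```

That precision loss is usually fine for NN training, which is why the instruction support matters.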
Which gives me hope that - like the web - hardware will catch up and stuff will become more and more accessible with time
To build your own competing LLM today you need hundreds of millions of dollars; this "very expensive" is on a whole different level. You could afford the things you talked about on a software engineering salary: it would be a lot of money for that engineer, but at least they could do it. No one but a billionaire could fund a new competing LLM today.
Training AI models costs a fortune, but so far it's been just front-loading costs in hopes of a windfall. We'll see what actually happens.
It's easier to spin up a business, for sure -- also easier to unwind one; they're not as sticky as they used to be.
I assure you that before Apache and Linux took over that "dot" in the .com was not cheap!
Fortunately it only really lasted maybe 1993-1997 (I think Oracle announced Linux support in 1997, and that allowed a bunch of companies to start moving off Solaris).
But it wasn't until after the 2001 crash that people started doing sharded MySQL and then NoSQL to scale databases (when you needed it back then!).
It's early. You can do LoRA training now on home systems, and for $500 you can rent enough compute to do even more meaningful fine-tuning. Let's see where we are in 5 and 10 years' time.
(Provided the doomers don't get LLMs banned, of course!)
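The reason LoRA fits on home systems is the parameter count: instead of updating a full d_out x d_in weight matrix, you train a low-rank update B @ A with rank r much smaller than either dimension. A NumPy sketch (shapes are made up for illustration, not any particular model):

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 64, 64, 4

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable
B = np.zeros((d_out, r))                   # trainable, zero-init so the
                                           # adapter starts as a no-op

def forward(x):
    # Effective weight is W + B @ A; only A and B get gradient updates.
    return (W + B @ A) @ x

full_params = d_in * d_out        # what full fine-tuning would train
lora_params = r * (d_in + d_out)  # what LoRA trains instead
```

Here that's 512 trainable parameters instead of 4096; at real model sizes the ratio is what makes consumer GPUs viable.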
I don't know a lot about ML. Does anyone know if it is possible to keep training the system while it is running?
That would help a lot if you can't use huge training sets as a starting point.
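Updating a model one example at a time while it serves predictions is called online learning, and it's straightforward for small models. A toy sketch (this illustrates online SGD in general, not how any production LLM is updated; large models are typically fine-tuned offline in batches):

```python
# Online SGD on a linear model: each incoming (x, y) example nudges the
# weights, so training continues while the system keeps serving.
def online_sgd_step(w, b, x, y, lr=0.01):
    pred = sum(wi * xi for wi, xi in zip(w, x)) + b
    err = pred - y
    w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    b = b - lr * err
    return w, b

# Simulate a stream of examples drawn from y = 2*x + 1
w, b = [0.0], 0.0
for _ in range(2000):
    for x0 in (0.0, 1.0, 2.0):
        w, b = online_sgd_step(w, b, [x0], 2 * x0 + 1)
```

The catch at LLM scale is catastrophic forgetting and the cost of backprop through billions of parameters, which is why continual updates are still mostly a research topic.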
They mention custom building as much as they can. If FB magically has the option to 10x the compute power, would they need to re-engineer the whole stack? What about 100x? Is each of these re-writes just a re-write, or is it a whole order of magnitude more complex?
My technical understanding of what's under the hood of these clusters is pretty surface-level -- super curious if anyone with relevant experience has thoughts?
To get the job he applied for a Software Engineer, Machine Learning opening, went through the multi-step interview process, and once hired did a few weeks of training and interviewing with teams. One of the teams in charge of optimizing ML code at Meta picked him up, and now he works there.
Because of Meta's scale, optimizing code to save a few ms or watts has a huge impact on the bottom line.
In sum:
- Get a formal education in the area
- Get work experience somewhere
- Apply for a big tech Software Engineer, Machine Learning role
- Hope they hire you and have a spot on one of the teams in charge of optimizing stuff
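To put the "few ms at scale" point in numbers, a back-of-envelope sketch (all figures hypothetical, not Meta's actual traffic):

```python
# Hypothetical service: 1e9 requests/day, and an optimization that
# shaves 2 ms of CPU time off each request.
requests_per_day = 1_000_000_000
saved_ms_per_request = 2

cpu_seconds_saved_per_day = requests_per_day * saved_ms_per_request / 1000
# Express that as full-time CPU cores freed up (86,400 s in a day):
cpu_cores_freed = cpu_seconds_saved_per_day / 86_400
```

That's roughly 23 cores' worth of compute freed by a 2 ms win on one service, which is why these teams exist.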
I have a PhD in CS, and lots of experience in optimization and some in throughput/speedups (in an amdahl sense) for planning problems. My biggest challenge is really getting something meaty with high constraints or large compute requirements. By the time I get a pipeline set up it's good enough and we move on. So it's tough to build up that skillset to get in the door where the big problems are.
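The Amdahl point above is worth making concrete: the serial fraction of a workload puts a hard cap on speedup no matter how much compute you throw at it.

```python
# Amdahl's law: if a fraction p of the work parallelizes and the rest
# is serial, the best speedup on n workers is 1 / ((1 - p) + p / n).
def amdahl_speedup(p: float, n: int) -> float:
    return 1.0 / ((1.0 - p) + p / n)

# Even with 95% parallel work, 1024 workers cap out near 20x:
cap = amdahl_speedup(0.95, 1024)
```

Which is exactly why the interesting problems only show up once the compute requirements are genuinely large.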
It's also a group effort to provide simple-to-use primitives that "normal" ML people can use, even if they've never touched hyperscale clusters before.
So you need a good scheduler that understands dependencies (no, the k8s scheduler(s) are shit for this, plus it won't scale past 1k nodes without eating all of your network bandwidth), then you need a dataloader that can provide the dataset access, then you need the IPC that allows sharing/joining of GPUs together.
All of that needs to be wrapped up in a Python interface that's fairly simple to use.
Oh, and it needs to be secure, pass an FTC audit (i.e. you need to prove that no user data is being used), and have high utilisation efficiency and uptime.
The model stuff is the cherry on top.
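The "scheduler that understands dependencies" part boils down to topological ordering: a job only becomes runnable once everything it depends on has finished. A toy sketch using Kahn's algorithm (job names and the dependency graph are made up for illustration; a real cluster scheduler also handles priorities, preemption, and placement):

```python
from collections import deque

def schedule(deps):
    """deps: {job: set of jobs it depends on} -> jobs in runnable order."""
    indegree = {job: len(d) for job, d in deps.items()}
    dependents = {job: [] for job in deps}
    for job, d in deps.items():
        for dep in d:
            dependents[dep].append(job)

    ready = deque(job for job, n in indegree.items() if n == 0)
    order = []
    while ready:
        job = ready.popleft()
        order.append(job)
        for nxt in dependents[job]:  # this job finishing may unblock others
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(deps):
        raise ValueError("dependency cycle")
    return order

order = schedule({
    "preprocess": set(),
    "train": {"preprocess"},
    "eval": {"train"},
    "export": {"train"},
})
```

The hard part at 1k+ nodes isn't this algorithm, it's doing it without flooding the network with state updates, per the comment above.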
Some folks start with more familiarity in ML research and dip down as far as they need.
Other folks come from a traditional distributed systems/compilers/HPC background, and apply those skills to ML systems.
Feel free to DM me to learn more.
Thanks! (Your number is consistent with what I've heard, but I never managed to find solid sources to back it up.)
Which is a fourth of what they spent on VR/AR in a year. And Gen AI is something they could more easily turn into revenue now that it's proven technology, and Meta could possibly leapfrog others because of its data moat.
Meta certainly has an edge in engineer count. But I'd say they really, really want the metaverse to succeed mostly to have their own walled garden (i.e. power equivalent to the Apple and Google stores, etc.). There's a reason they gave a hard pass to a Google partnership.
I’m guessing that Meta got a sweetheart deal to help take a lot of inventory for NVidia and make commitments for future purchases.
I wonder if they will use it in RSC.
I’d point the interested at the DLRM paper [1]: that was just after I left and I’m sad I missed it. FB got into disagg racks and SDN and stuff fairly early, and we already had half-U dual-socket SKUs with the SSD and (increasingly) even DRAM elsewhere in the rack in 2018, and we were doing huge NNs for recommenders and rankers even then. I don’t know if this is considered proprietary so I’ll play it safe and just say that a click-prediction model on IG Stories in 2018 was on the order of a modest but real LLM today (at FP32!).
The crazy part is they were HOGWILD trained on Intel AVX-2, which is just wild to think about. When I was screwing around with CUDA kernels we were time sharing NVIDIA dev boxes, typically 2-4 people doing CUDA were splitting up a single card as late as maybe 2016. I was managing what was called “IGML Infra” when I left and was on a first-name basis with the next-gen hardware people and any NVIDIA deal was still so closely guarded I didn’t hear more than rumors about GPUs for training let alone inference.
350k Hopper this year, Jesus. Say what you want about Meta but don’t say they can’t pour concrete and design SKUs on a dime: best damned infrastructure folks in the game pound-for-pound to this day.
The talk by Thomas “tnb” Bredillet in particular I’d recommend: one of the finest hackers, mathematicians, and humans I’ve ever had the pleasure to know.
[1] https://arxiv.org/pdf/1906.00091.pdf
[2] https://arxiv.org/pdf/2108.09373.pdf
[3] https://engineering.fb.com/2022/10/18/open-source/ocp-summit...
OpenAI takes money from MSFT and buys Azure services
Anthropic takes Amazon money and buys AWS services (as do many robotics etc)
I am fairly sure it’s not illegal, but it’s definitely low-quality revenue.
Here’s more on the deals (2003):
https://www.cnet.com/tech/services-and-software/aol-saga-ope...
Popular names included AOL, Cisco, Yahoo, etc.
I wouldn’t be surprised if Amazon’s term sheets driving high valuations are nothing but AWS credits (Amazon’s own license to print money).
There was one manager who worked at two large Dutch companies and sold them on AWS, as in, moving their entire IT, workloads and servers over to AWS. I wouldn't be surprised if there was a deal made there somewhere.
Granted, HW is much harder than SW, but I would not discount Meta's ability to displace NVIDIA entirely.
Good hardware, good software support, and market is starving for performant competitors to the H100s (and soon B100s). Would sell like hotcakes.
Those GPUs are going to subsume the entire music, film, and gaming industries. And that's just to start.
I see what you did there, Meta.
https://www.reuters.com/technology/inside-metas-scramble-cat...
Interesting dig at InfiniBand. RoCE is the right solution since it is open standards and, more importantly, available without a 52+ week lead time.
Sharing on Hacker News ... they know their audience.
But I suspect it's not that, because Twine is optimised for services rather than batch processing, and doesn't really have a concept of priorities.
Such a large number; does it make sense?
Meta's commitment to Open Source is carefully calculated.
OCP is a way to rally lower-tier vendors to form a semi-alliance to keep up with super-gorilla like AWS & Google.
LLaMA has already gained Meta much more than it cost (look at the stock price, the open source ecosystem built around LLaMA, and Google's open source Gemma models, which are proof of Meta's success).
IMHO, Meta's Open Source strategy already looks at least 5 years ahead. That's enough runway to finesse a 180-degree turnaround if necessary (i.e., from open source to closed source).
https://www.amazon.com/Tesla-NVIDIA-Learning-Compute-Graphic...
But I do wonder how they foresee monetising this.
Since people don't want to talk to algorithms, this would result in them shunning all social media, which is a huge danger to companies in the space.
In contrast, Microsoft is spending over $10b per quarter capex on cloud.
That makes Zuck look conservative after his big loss on the metaverse.
https://www.datacenterdynamics.com/en/news/q3-2023-cloud-res...
> In contrast, Microsoft is spending over $10b per quarter capex on cloud.
to service other people's workloads. It's a different business.