But rest assured there's an improvement; it's not like people would be doing it if there wasn't any benefit!
“bfloat16 data type and arithmetic instructions (AI and others)”
https://eclecticlight.co/2024/01/15/why-the-m2-is-more-advan...
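For anyone curious what bfloat16 actually is: it's just the top 16 bits of a float32 (1 sign bit, 8 exponent bits, 7 mantissa bits), which keeps float32's range but drops precision. A minimal sketch in plain Python (note: this truncates, whereas real hardware typically rounds to nearest):

```python
import struct

def to_bfloat16_bits(x: float) -> int:
    """Truncate a float32 to bfloat16 by keeping its top 16 bits
    (1 sign bit, 8 exponent bits, 7 mantissa bits)."""
    f32_bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return f32_bits >> 16

def from_bfloat16_bits(b: int) -> float:
    """Re-expand bfloat16 bits to a float32 by zero-padding the mantissa."""
    return struct.unpack(">f", struct.pack(">I", b << 16))[0]

# Same exponent range as float32, but only ~2-3 decimal digits of precision
approx = from_bfloat16_bits(to_bfloat16_bits(3.14159))
```

That precision loss is usually fine for NN training, which is why the instruction support matters.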
Which gives me hope that - like the web - hardware will catch up and stuff will become more and more accessible with time
To build your own competing LLM today you need hundreds of millions of dollars; this "very expensive" is on a whole different level. You could afford the things you talked about on a software engineering salary: it would be a lot of money for that engineer, but at least they could do it. No one but a billionaire could fund a new competing LLM today.
Training AI models costs a fortune, but so far it's been just front-loading costs in hopes of a windfall. We'll see what actually happens.
It's easier to spin up a business, for sure -- also easier to unwind one; they're not as sticky as they used to be.
I assure you that before Apache and Linux took over that "dot" in the .com was not cheap!
Fortunately it only really lasted maybe 1993-1997 (I think Oracle announced Linux support in 1997, and that allowed a bunch of companies to start moving off Solaris).
But it wasn't until after the 2001 crash that people started doing sharded MySQL and then NoSQL to scale databases (when you needed it back then!).
It's early. You can do LoRA training now on home systems, and for $500 you can rent enough compute to do even more meaningful fine-tuning. Let's see where we are in 5 and 10 years' time.
(Provided the doomers don't get LLMs banned, of course!)
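The reason LoRA fits on home systems is the parameter count: instead of updating a full d_out x d_in weight matrix, you train a low-rank update B @ A with rank r much smaller than either dimension. A NumPy sketch (shapes are made up for illustration, not any particular model):

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 64, 64, 4

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable
B = np.zeros((d_out, r))                   # trainable, zero-init so the
                                           # adapter starts as a no-op

def forward(x):
    # Effective weight is W + B @ A; only A and B get gradient updates.
    return (W + B @ A) @ x

full_params = d_in * d_out        # what full fine-tuning would train
lora_params = r * (d_in + d_out)  # what LoRA trains instead
```

Here that's 512 trainable parameters instead of 4096; at real model sizes the ratio is what makes consumer GPUs viable.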
I don't know a lot about ML. Does anyone know if it is possible to keep training the system while it is running?
That would help a lot if you can't use huge training sets as a starting point.
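Updating a model one example at a time while it serves predictions is called online learning, and it's straightforward for small models. A toy sketch (this illustrates online SGD in general, not how any production LLM is updated; large models are typically fine-tuned offline in batches):

```python
# Online SGD on a linear model: each incoming (x, y) example nudges the
# weights, so training continues while the system keeps serving.
def online_sgd_step(w, b, x, y, lr=0.01):
    pred = sum(wi * xi for wi, xi in zip(w, x)) + b
    err = pred - y
    w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    b = b - lr * err
    return w, b

# Simulate a stream of examples drawn from y = 2*x + 1
w, b = [0.0], 0.0
for _ in range(2000):
    for x0 in (0.0, 1.0, 2.0):
        w, b = online_sgd_step(w, b, [x0], 2 * x0 + 1)
```

The catch at LLM scale is catastrophic forgetting and the cost of backprop through billions of parameters, which is why continual updates are still mostly a research topic.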
They mention custom building as much as they can. If FB magically has the option to 10x the compute power, would they need to re-engineer the whole stack? What about 100x? Is each of these re-writes just a re-write, or is it a whole order of magnitude more complex?
My technical understanding of what's under the hood of these clusters is pretty surface-level -- super curious if anyone with relevant experience has thoughts?
To get the job he applied for a Software Engineer, Machine Learning opening, went through the multi-step interview process, and once hired did a few weeks of training and interviewing with teams. One of the teams in charge of optimizing ML code at Meta picked him up, and now he works there.
Because of Meta's scale, optimizing code to save a few ms or watts has a huge impact on the bottom line.
In sum:
- Get a formal education in the area
- Get work experience somewhere
- Apply for a big tech Software Engineer, Machine Learning role
- Hope they hire you and have a spot on one of the teams in charge of optimizing stuff
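To put the "few ms at scale" point in numbers, a back-of-envelope sketch (all figures hypothetical, not Meta's actual traffic):

```python
# Hypothetical service: 1e9 requests/day, and an optimization that
# shaves 2 ms of CPU time off each request.
requests_per_day = 1_000_000_000
saved_ms_per_request = 2

cpu_seconds_saved_per_day = requests_per_day * saved_ms_per_request / 1000
# Express that as full-time CPU cores freed up (86,400 s in a day):
cpu_cores_freed = cpu_seconds_saved_per_day / 86_400
```

That's roughly 23 cores' worth of compute freed by a 2 ms win on one service, which is why these teams exist.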
I have a PhD in CS, and lots of experience in optimization and some in throughput/speedups (in an amdahl sense) for planning problems. My biggest challenge is really getting something meaty with high constraints or large compute requirements. By the time I get a pipeline set up it's good enough and we move on. So it's tough to build up that skillset to get in the door where the big problems are.
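The Amdahl point above is worth making concrete: the serial fraction of a workload puts a hard cap on speedup no matter how much compute you throw at it.

```python
# Amdahl's law: if a fraction p of the work parallelizes and the rest
# is serial, the best speedup on n workers is 1 / ((1 - p) + p / n).
def amdahl_speedup(p: float, n: int) -> float:
    return 1.0 / ((1.0 - p) + p / n)

# Even with 95% parallel work, 1024 workers cap out near 20x:
cap = amdahl_speedup(0.95, 1024)
```

Which is exactly why the interesting problems only show up once the compute requirements are genuinely large.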
It's also a group effort to provide simple-to-use primitives that "normal" ML people can use, even if they've never touched hyperscale clusters before.
So you need a good scheduler that understands dependencies (no, the k8s scheduler(s) are shit for this, plus it won't scale past 1k nodes without eating all of your network bandwidth), then you need a dataloader that can provide the dataset access, then you need the IPC that allows sharing/joining of GPUs together.
All of that needs to be wrapped up in a Python interface that's fairly simple to use.
Oh, and it needs to be secure, pass an FTC audit (i.e. you need to prove that no user data is being used), and have high utilisation efficiency and uptime.
The model stuff is the cherry on top.
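The "scheduler that understands dependencies" part boils down to topological ordering: a job only becomes runnable once everything it depends on has finished. A toy sketch using Kahn's algorithm (job names and the dependency graph are made up for illustration; a real cluster scheduler also handles priorities, preemption, and placement):

```python
from collections import deque

def schedule(deps):
    """deps: {job: set of jobs it depends on} -> jobs in runnable order."""
    indegree = {job: len(d) for job, d in deps.items()}
    dependents = {job: [] for job in deps}
    for job, d in deps.items():
        for dep in d:
            dependents[dep].append(job)

    ready = deque(job for job, n in indegree.items() if n == 0)
    order = []
    while ready:
        job = ready.popleft()
        order.append(job)
        for nxt in dependents[job]:  # this job finishing may unblock others
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(deps):
        raise ValueError("dependency cycle")
    return order

order = schedule({
    "preprocess": set(),
    "train": {"preprocess"},
    "eval": {"train"},
    "export": {"train"},
})
```

The hard part at 1k+ nodes isn't this algorithm, it's doing it without flooding the network with state updates, per the comment above.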
Some folks start with more familiarity in ML research and dip down as far as they need.
Other folks come from a traditional distributed systems/compilers/HPC background, and apply those skills to ML systems.
Feel free to DM me to learn more.
Thanks! (Your number is consistent with what I've heard, but I never managed to find solid sources to back it up.)
Which is a fourth of what they spent on VR/AR in a year. And Gen AI is something they could more easily turn into revenue now that it's proven technology, and Meta could possibly leapfrog others because of its data moat.
Meta certainly has an edge in engineer count. But I'd say they really, really want the metaverse to succeed mostly to have their own walled garden (i.e. power equivalent to the Apple and Google stores, etc.). There's a reason they gave a hard pass to a Google partnership.
I’m guessing that Meta got a sweetheart deal to help take a lot of inventory for NVidia and make commitments for future purchases.
I wonder if they will use it in RSC.
I’d point the interested at the DLRM paper [1]: that was just after I left and I’m sad I missed it. FB got into disagg racks and SDN and stuff fairly early, and we already had half-U dual-socket SKUs with the SSD and (increasingly) even DRAM elsewhere in the rack in 2018, and we were doing huge NNs for recommenders and rankers even then. I don’t know if this is considered proprietary so I’ll play it safe and just say that a click-prediction model on IG Stories in 2018 was on the order of a modest but real LLM today (at FP32!).
The crazy part is they were HOGWILD trained on Intel AVX-2, which is just wild to think about. When I was screwing around with CUDA kernels we were time sharing NVIDIA dev boxes, typically 2-4 people doing CUDA were splitting up a single card as late as maybe 2016. I was managing what was called “IGML Infra” when I left and was on a first-name basis with the next-gen hardware people and any NVIDIA deal was still so closely guarded I didn’t hear more than rumors about GPUs for training let alone inference.
350k Hopper this year, Jesus. Say what you want about Meta but don’t say they can’t pour concrete and design SKUs on a dime: best damned infrastructure folks in the game pound-for-pound to this day.
The talk by Thomas “tnb” Bredillet in particular I’d recommend: one of the finest hackers, mathematicians, and humans I’ve ever had the pleasure to know.
[1] https://arxiv.org/pdf/1906.00091.pdf
[2] https://arxiv.org/pdf/2108.09373.pdf
[3] https://engineering.fb.com/2022/10/18/open-source/ocp-summit...
OpenAI takes money from MSFT and buys Azure services
Anthropic takes Amazon money and buys AWS services (as do many robotics etc)
I am fairly sure it’s not illegal, but it’s definitely low-quality revenue.
Here’s more on the deals (2003):
https://www.cnet.com/tech/services-and-software/aol-saga-ope...
Popular names included AOL, Cisco, Yahoo, etc.
I wouldn’t be surprised if Amazon’s term sheets driving high valuations are nothing but AWS credits (Amazon’s own license to print money).
There was one manager who worked at two large Dutch companies and sold them on AWS, as in, moving their entire IT, workloads and servers over to AWS. I wouldn't be surprised if there was a deal made there somewhere.
Granted, HW is much harder than SW, but I would not discount Meta's ability to displace NVIDIA entirely.
Good hardware, good software support, and market is starving for performant competitors to the H100s (and soon B100s). Would sell like hotcakes.
Those GPUs are going to subsume the entire music, film, and gaming industries. And that's just to start.
I see what you did there, Meta.
https://www.reuters.com/technology/inside-metas-scramble-cat...
Interesting dig at InfiniBand. RoCE is the right solution since it is open standards and, more importantly, available without a 52+ week lead time.
Sharing on Hacker News ... they know their audience.
But I suspect it's not that, because Twine is optimised for services rather than batch processing, and doesn't really have a concept of priorities.
Such a large number; does it make sense?
Meta's commitment to Open Source is carefully calculated.
OCP is a way to rally lower-tier vendors to form a semi-alliance to keep up with super-gorilla like AWS & Google.
LLaMA has already gained Meta much more than it cost (look at the stock price, the open source ecosystem built around LLaMA, and Google's open source Gemma models, which are proof of Meta's success).
IMHO, Meta's Open Source strategy already looks at least 5 years ahead. That's enough runway to finesse a 180-degree turnaround if necessary (i.e., from open source to closed source).
https://www.amazon.com/Tesla-NVIDIA-Learning-Compute-Graphic...
But I do wonder how they foresee monetising this.
Since people don't want to talk to algorithms, this would result in them shunning all social media, which is a huge danger to companies in the space.
In contrast, Microsoft is spending over $10b per quarter capex on cloud.
That makes Zuck look conservative after his big loss on the metaverse.
https://www.datacenterdynamics.com/en/news/q3-2023-cloud-res...
> In contrast, Microsoft is spending over $10b per quarter capex on cloud.
to service other people's workloads. It's a different business.