This serious bug has been open since May, and AMD doesn't seem to be responding as seriously as it should.
Isn't geohot infamous for stealing other people's work?
PEBKAC?
That said, ROCm only officially supports a fraction of its product line, and an odd smattering throughout at that. It's a joke compared to CUDA which will run on damn near anything. And AMD has a long, long history of dogshit drivers (at least on Windows.)
AMD just doesn't seem to give enough of a shit to invest money into securing top talent for this, and NVIDIA will continue to stomp them.
Do you mean the Sony PlayStation hacking, where they took legal action against him, or other stuff?
Shareholders of AMD should look into it and fire top executives/the CEO until morale improves.
A long time ago AMD decided to focus 100% on budget consumer graphics (including consoles), and that was the right decision at the time. However, being in a low-margin business, it seems they don't have the people (or the budget for last-minute hiring) to pump out the R&D for a generic neural-network platform without pulling people away from their consumer graphics division.
The article is unsatisfying because it doesn't explain WHY CUDA reigns supreme.
One hypothesis put forward is that the main alternative, ROCm, is just not very complete and not very fast. That's a good argument.
Another hypothesis that isn't considered: CUDA reigns supreme because NVIDIA GPUs reign supreme.
But people don't write CUDA code... they write PyTorch code?!
The problems you generally experience are:
* Inexplicably poor performance
* Poor (and sometimes incorrect) documentation
* Difficulties debugging
* Crashes and hangs

If I'm AMD, I'd spend at least $1 billion/year figuring out the software side.
I can't think of an easier way for AMD to return value to shareholders than eroding CUDA advantage.
Heck, Meta has invested something like $100B in VR so far, and VR is nowhere near the market that AI is.
I started playing around with porting some CUDA code to ROCm/HIP on a Ryzen laptop APU I had. While it was an "unsupported" configuration (which I understood), it all worked until AMD suddenly and explicitly blocked the ability to run on APUs. Currently the only way to get back to work on that project on that particular computer would be to run a closed-source patched driver from some rando on the internet. Needless to say, I lost interest.
Last I checked, there were only 7 consumer SKUs that could run AMD's current compute stack, the oldest being one generation old. Even among the enterprise hardware they only support ~2 generations back. So you can't even grab some old, cheap recycled gear on eBay to hack on their ecosystem.
Meanwhile, I can pull anything with an NVIDIA logo on it from a junkyard and it'll happily run CUDA code that I wrote for the 8800GTX 15+ years ago.
Then there is the quality of hardware, debugging tools, IDE support, supported languages (again, it isn't only PyTorch), and libraries.
I know it's still in development, but I'm curious whether someone has played around with it for the kind of needs discussed on this page.
PyTorch already does. But if you're saying "NN" and "pytorch" that already means you're outside of the audience for CUDA I'm talking about in the article. My own stuff was usually Bayesian Hierarchical Models, which at least at the time made pytorch completely useless (that was nearly a decade ago though—maybe that specific use case improved).
If you've tried to write actually new (or different enough) NNs or entirely different models, pytorch is too high-level, and sometimes even TF is too. Even aside from that, if you're a maintainer of BLAS or some specific library for sparse MM with very specific distributions that are optimized for it...
Anyway, those are the key cases, but even aside from that, if you've ever tried to do non-vanilla stuff, even with some higher-level libraries, nothing works as well as it should. You get random, inscrutable errors; those certainly exist on NVIDIA GPUs / stuff based on CUDA under the hood too, but there are way, way fewer of them. For newer, custom stuff, it's not that uncommon to hit numerical overflows or other completely breaking problems on alternative backends that just don't happen, or work fine, on the CPU or CUDA backend. Or the CUDA backend is just ridiculously faster. If you're doing something annoying, new, and complicated enough, there's no point in taking on the aggravation.
The people who write the stuff that is used in PyTorch or other libraries definitely write CUDA code (in C++ etc). And then the people who use PyTorch just build on top of that.
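To make that layering concrete, here's a toy pure-Python sketch (not real PyTorch internals; all names here are made up for illustration) of the split: framework users call a high-level op, which dispatches to whatever low-level "kernel" is registered for the active backend. In real life, the CUDA entries are hand-written CUDA C++ calling into cuBLAS/cuDNN.

```python
# Toy model of framework-to-kernel dispatch. Hypothetical names;
# this only illustrates the division of labor described above.
KERNELS = {}

def register(op, backend):
    """Register a low-level kernel for (op, backend)."""
    def wrap(fn):
        KERNELS[(op, backend)] = fn
        return fn
    return wrap

@register("matmul", "cpu")
def matmul_cpu(a, b):
    # Naive pure-Python kernel; stands in for an optimized BLAS call.
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][x] * b[x][j] for x in range(k)) for j in range(m)]
            for i in range(n)]

def matmul(a, b, backend="cpu"):
    # What a framework user calls; they never see the kernel.
    return KERNELS[("matmul", backend)](a, b)

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# -> [[19, 22], [43, 50]]
```

A GPU vendor's job in this picture is to supply the fast entries for its own backend key, and to get frameworks to route to them.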
I deliberately tried to keep it accessible so that non-technical (or just non-software) audiences can also get an intuition for why CUDA has such strong lock-in. Otherwise, the pushback I've often gotten is "just rewrite it" or "it's just software"; if it were really that simple, people wouldn't need to be yelling so much at AMD across so many comments. Basically, people who can't fathom why software technical debt can ever be a thing. Or, if it is, China has infinite money and time anyway.
A high-level analysis would say that Huawei, AMD, and Intel should all easily be able to invest enough to make this work and compete with CUDA to push their hardware platforms. The reality is that decentralized decision-making among users makes it an expensive, uncertain bet that people will actually adopt it. Both the lower-level, underlying libraries that things are built on AND the researchers who do bleeding-edge research still have a huge amount of experience in, and code built on, CUDA.
To first order, nobody writes any CUDA, and even if you do, you are probably bad at it. The language is slightly easier to use than OpenCL, but writing really performant code is still a nightmare (a pipeline of asynchronous memory copies from global to shared memory is not easy to program, but it is a requirement for full performance on tensor cores).
So no, the moat really isn't the language. It's not even the libraries, it's the integration of the libraries into third party software like pytorch, jax, etc. This is the truly massive advantage NVIDIA has, and they got it by being early and by being installed in an awful lot of machines.
At least say why people wouldn't be good at it: the documentation is poor, the GPUs are a black box, or anything in that vein. Then they can help you learn instead of preemptively dismissing it.
I second the GP: nobody in their right mind would try to compete with the performance or functionality of libraries like cuDNN or cuBLAS.
NVidia pays for an army of exceptionally skilled folks to write these high performance kernels, working hand in hand with the architects that design the hardware, and with access to various sophisticated tools and performance models beyond what is available to the general public.
It would be like trying to compete against Olympians, to use an analogy that we can all understand.
You probably won't like this, but I'm also going to suggest you take a look at the HN guidelines about assuming good faith, and around responding to the argument instead of calling names. My comment might have irked you but that's not actually a basis for deciding I'm anti intellectual, that I'm protecting my ego, and that I really just need someone to help me learn.
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index....
To give a feel, while at Berkeley, we had an award-winning grad student working on autotuning CUDA kernels and empirically figuring out what does / doesn't work well on some GPUs. Nvidia engineers would come to him to learn about how their hardware and code works together for surprisingly basic scenarios.
It's difficult to write great CUDA code because it needs to excel in multiple specializations at the same time:
* It's not just writing fast low-level code, but knowing which algorithmic approach to use, so you or your code reviewer needs to be an expert at algorithms. Worse, those algorithms are high-level and unknown to most programmers, yet also specific to hardware models; think scenarios like NUMA-aware data-parallel algorithms for irregular computations. The math is generally non-traditional too, e.g., esoteric matrix tricks to manipulate sparsity and numerical stability.
* You ideally will write for 1 or more generations of architectures. And each architecture changes all sorts of basic constants around memory/thread/etc counts at multiple layers of the architecture. If you're good, you also add some sort of autotuning & JIT layers around that to adjust for different generations, models, and inputs.
* This stuff needs to compose. Most folks are good at algorithms, software engineering, or performance... not all three at the same time. Doing this for parallel/concurrent code is one of the hardest areas of computer science. Ex: Maintaining determinism, thinking through memory life cycles, enabling async vs sync frameworks to call it, handling multitenancy, ... . In practice, resiliency in CUDA land is ~non-existent. Overall, while there are cool projects, the Rust etc revolution hasn't happened here yet, so systems & software engineering still feels like early unix & c++ vs what we know is possible.
* AI has made it even more interesting nowadays. The types of processing on GPUs are richer now, multi+many GPU is much more of a thing, and disk IO as well. For big national lab and genAI foundation model level work, you also have to think about many racks of GPUs, not just a few nodes. While there's more tooling, the problem space is harder.
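The autotuning point in the list above can be sketched in miniature: benchmark a few candidate tuning constants and cache the winner for the hardware at hand. This is pure Python with a dummy workload standing in for a GPU kernel; the candidate values and names are illustrative only.

```python
# Minimal autotuning sketch, assuming the "kernel" cost varies with a
# tile-size constant (as it would across real GPU generations).
import time

def kernel(data, tile):
    # Dummy workload whose runtime depends on the tile size.
    total = 0
    for i in range(0, len(data), tile):
        total += sum(data[i:i + tile])
    return total

def autotune(data, candidates=(8, 32, 128)):
    """Time each candidate once and return the fastest tile size."""
    timings = {}
    for tile in candidates:
        start = time.perf_counter()
        kernel(data, tile)
        timings[tile] = time.perf_counter() - start
    return min(timings, key=timings.get)

best = autotune(list(range(10_000)))
print("selected tile size:", best)
```

Real autotuners (and the JIT layers mentioned above) do this per architecture, per problem shape, with warmup and many repetitions, then persist the results.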
This is very hard to build for. Our solution early on was figuring out how to raise the abstraction level so we didn't have to. In our case, we figured out how to write ~all our code as operations over dataframes that we compiled down to OpenCL/CUDA, and Nvidia thankfully picked that up with what became RAPIDS.AI. Maybe more familiar to the HN crowd, it's basically the precursor and GPU / high-performance / energy-efficient / low-latency version of what the duckdb folks recently began on the (easier) CPU side for columnar analytics.
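A toy, pure-Python sketch of the "operations over dataframes" idea described above: the user expresses a filter plus an aggregate over columns, and a backend (RAPIDS/cuDF on GPU, duckdb-style engines on CPU) is free to execute it however it likes. The function names here are made up, not a real API.

```python
# Columnar toy: data stored column-wise, operations expressed over
# whole columns. This is the abstraction level that lets a compiler
# target CPU or GPU without the user writing kernels.
table = {
    "region": ["us", "eu", "us", "eu"],
    "sales":  [10,    7,    3,    5],
}

def filter_eq(tbl, col, value):
    # Keep only the rows where tbl[col] == value, across all columns.
    keep = [i for i, v in enumerate(tbl[col]) if v == value]
    return {c: [vals[i] for i in keep] for c, vals in tbl.items()}

def total(tbl, col):
    # Aggregate over one column.
    return sum(tbl[col])

print(total(filter_eq(table, "region", "us"), "sales"))
# -> 13
```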
It's hard to do all that kind of optimization, so IMO it's a bad idea for most AI/ML/etc teams to do it. At this point, it takes a company at the scale of Nvidia to properly invest in optimizing this kind of stack, and software developers should use higher-level abstractions, whether pytorch, rapids, or something else. Having lived building & using these systems for 15 years, and worked with most of the companies involved, I haven't put any of my investment dollars into AMD nor Intel due to the revolving door of poor software culture.
Chip startups also have funny hubris here, where they know they need to try, but end up having hardware people run the show and fail at it. I think it's a bit different this time around because many can focus just on AI inferencing, and that doesn't need as much of what the above is about, at least for current generations.
Edit: If not obvious, much of our code that merits writing with CUDA in mind also merits reading research papers to understand the implications at these different levels. Imagine scheduling that into your agile sprint plan. How many people on your team regularly do that, and in multiple fields beyond whatever simple ICML pytorch layering remix happened last week?
That's an extreme stretch, and far from truth.
Many people write CUDA, both in industry and academia.
I used to work in the GPU industry and this sort of view is both pervasive and misguided.
GPUs are immensely complex machines. It is really hard to get them to work, let alone work with high performance.
Because of this, and in spite of the amount of time and resources spent on validation and verification, the hardware often contains flaws. It is the responsibility of the drivers to work around these flaws in various ways. When a flaw hasn't been discovered and worked around yet, you perceive it as the GPU being unstable or crashing.
There is no fast simple solution to this. You need a finely tuned corporate machine from beginning to end. Better hiring processes, better management, better design processes, better verification processes, better software development practices, better marketing and sales, better customer relations. Everything.
This is like saying combustion engines are immensely complex machines when your car suddenly loses power on the highway for no apparent reason and then when you restart the engine it works for another five minutes again. When you drive on normal roads it works flawlessly. It must be the engine, right? After all, it is the most complicated aspect!
Except in reality it is far more likely for it to be a problem in the electronics driving the fuel pump or spark plug.
AMD most likely has some sort of buffer overflow or deadlock in their GPU drivers that is causing difficult to diagnose problems. It is very unlikely that the silicon itself is broken when it works fine for playing video games and it also works fine when your GPU is one of the few officially supported by ROCm.
Pretty bad idea, especially in the midst of the AI hype.
why can't xyz company build apps/websites/products that don't have bugs??
I believe LLMs will be commoditised while the compute power will be the next big thing.
not if this moat could be leveraged into a monopoly on AI chips, to the detriment of society.
I want to see competition in this space.
Unfortunately, the market rally of nvidia stock is suggesting that most investors are expecting this monopoly to eventuate.
Therefore, it is in the interest of society to ensure that such a software moat is not established. Look what happened to the web browser when Microsoft held a monopoly on it, and look at what is happening with Chrome, the Apple App Store, etc.
Realistically, what happened is that after a few decades of development, competitors arose and took the market. In the meantime, Microsoft became rich. Who cares?
Can you talk more about this? Would love to understand.
Intel should be shoveling out 16GB Arc graphics cards for free to every graduate program in the country that can fill out a web form. In a couple of years, they'd displace NVIDIA.
AMD needs to be funding a CUDA shim that allows people to port stuff directly to their cards. And they need to NOT be segmenting the consumer and professional cards software ecosystems.
Yes, there has been progress. However, when you look at the amount of money that AMD and Intel throw at software vs how much NVIDIA throws at software, it's an instant facepalm moment.
NVIDIA is 100% vulnerable--if it weren't for the fact that their competitors are idiots.
I think Nvidia sees it too. That's why they're moving up the stack to provide the whole thing: CUDA, GPUs, interconnect chips, networking chips, racks, OS, software, models.
I think the "CUDA moat" people like OP are underselling Nvidia. They're positioning themselves as the full-stack AI provider. Forget CUDA.
- Great at legacy C++ code.
- Great at new C++ code.
- Great at embedded/high-performance/distributed code.
- Experts in linear algebra and calculus.
- Competent at machine learning and similar problems.
Now imagine that after you find ~10-50 competent senior engineers who can each segment and train 1-5 engineers, you also need to hire 10-20 managers, PMs, and directors who are smart enough to do more than "copy NVIDIA's offering from last year," and wise enough to still build a 1:1 compatibility layer.
Apple is likely seeing more traction on their metal API by virtue that it is reasonably well guaranteed to be around in ~5 years, and is common on multiple device platforms that students/devs use or customers deploy.
It gets even stranger when considering that as major GPU makers, both AMD and Intel have lots of access to such talent.
My personal experience shows CUDA to in fact be a very deep moat. In ~12 years CUDA and ~6 ROCm (since Vega) I’ve never met a professional who says otherwise, including those at top500.org AMD sites.
From what I've seen online, this take really seems to come from some kind of Linux desktop Nvidia grudge/bad experience, or just good ol' gaming/desktop team red vs green vs blue nonsense.
Many things can be said about Nvidia and all kinds of things can be debated but suggesting that Nvidia has > 90% market share simply and solely because people drink Nvidia kool-aid is a wild take.
Isn't that what HIPIFY does? https://github.com/ROCm/HIPIFY
https://rocm.docs.amd.com/projects/HIP/en/latest/user_guide/...
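Roughly, yes: hipify-perl is largely a mechanical source-to-source rename of CUDA API calls to their HIP equivalents (the real tool handles far more, including kernel-launch syntax). A toy Python illustration of that core idea, with a small hand-picked subset of the real name mapping:

```python
# Toy illustration of the HIPIFY idea: regex-based renaming of CUDA
# runtime API identifiers to their HIP counterparts. The mapping below
# is a tiny subset; real HIPIFY covers the whole API surface.
import re

CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaFree": "hipFree",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

# Longest names first so cudaMemcpyHostToDevice wins over cudaMemcpy.
_PATTERN = re.compile(
    r"\b(" + "|".join(sorted(CUDA_TO_HIP, key=len, reverse=True)) + r")\b"
)

def hipify(source: str) -> str:
    return _PATTERN.sub(lambda m: CUDA_TO_HIP[m.group(0)], source)

print(hipify("cudaMalloc(&buf, n); cudaMemcpy(buf, src, n, cudaMemcpyHostToDevice);"))
# -> hipMalloc(&buf, n); hipMemcpy(buf, src, n, hipMemcpyHostToDevice);
```

Because HIP's API deliberately mirrors CUDA's one-to-one for most calls, this mostly-textual translation gets surprisingly far; the hard part is everything that isn't a simple rename.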
Many microcontroller companies have terrible software support: no free C/C++ compilers, clunky IDEs, too much reliance on 3rd party software providers, no decent code libraries...
Even if they have software support, the code is bad and bloated. Look at ST's HAL libraries, for example. Thankfully, an open source or free tool often comes to the rescue, usually through the efforts of dedicated individual programmers. But billion-dollar companies relying on such 3rd party tooling seems insane to me.
AMD recently got rid of one of the CUDA compatibility layers instead of extending it.
And they need to release high-RAM versions of their next gaming GPUs. More than anything else that will incentivize people to switch. If they're selling 36 GB while Nvidia is still selling 24 GB, people will do what it takes to move over.
This takes a ton of employees which is hard for a company with a fraction of the software employees of Nvidia. (On that note there's 1185 engineering job postings on the AMD site right now... https://careers.amd.com/careers-home/jobs?categories=Enginee...)
"They" (being AMD) didn't. The person they contracted put in a clause that allowed him to open source the work (years AFTER) AMD stopped paying him.
- Abandoning ZLUDA was maybe not the best choice
- Not accepting that software is just as important as hardware is a mistake
- Putting more VRAM into their cards would attract more people
- Fixing hardware issues (especially the restarts on every failure) should be a high priority
Chip War has a great section on how the Soviet Union tried a “just copy/steal” strategy in semiconductors and fell hopelessly behind because of it. It’s a great theoretical idea to just copy/steal and fast-follow, but semiconductors, AI, and other “harder technologies” require building human and intellectual capital that will get better with time. From there, you need to have the prior generation to keep up with ever-increasing complexity and difficulty as these things get more advanced.
I disagree with your section on Huawei and China. China isn't just trying to copy/steal AI. In terms of models, China is a bit behind in LLMs but arguably ahead in self-driving cars. China is throwing everything at semiconductor manufacturing instead, because that's where their bottleneck truly is - not CUDA. Had Huawei had access to TSMC's 5nm and 3nm, they might already be equal to Nvidia in raw GPU prowess. After all, HiSilicon's Kirin already matched/exceeded Qualcomm before the Trump ban, and their 5G chips/implementation were well ahead of anyone else. In software, it's easier for China to adopt a CUDA alternative because China is usually really good at unifying under one vision - especially when it has to.