
bigyabai 3mo ago

This is the same justification that was used to ship the (now almost entirely defunct) NPUs on Apple and Android devices alike.

The A18 iPhone chip has 15b transistors for the GPU and CPU; the Taalas ASIC has 53b transistors dedicated to inference alone. If it's anything like NPUs, almost all vendors will bypass the baked-in silicon to use GPU acceleration past a certain point. It makes much more sense to ship a CUDA-style flexible GPGPU architecture.


ivan_gammel 3mo ago

Why are you thinking about phones specifically? Most heavy users are on laptops and workstations. Smartphones might need a few more innovations first (low-latency AI computing on the edge?).

bigyabai (OP) 3mo ago

Many laptops and workstations also fell for the NPU meme, which in retrospect was a mistake compared to reworking your GPU architecture. Those NPUs are all dark silicon now, just like these Taalas chips will be in 12-24 months.

Dedicated inference ASICs are a dead end. You can't reprogram them, you can't fine-tune them, and they won't keep any of their resale value. Outside of cruise missiles, it's hard to imagine where such a disposable technology would be desirable.

ivan_gammel 3mo ago

Most consumers do not care about reprogramming or fine-tuning and have no idea what an NPU is. For many (including specifically those who still mourn dead AI companions, killed by the 4o switch), long-term stability is much more important than the benchmark performance of an evergreen frontier model. If Taalas can produce a good hardwired model at scale at a consumer-market price point, a lot of people will just drop their AI subscriptions.

bigyabai (OP) 3mo ago

> a lot of people will just drop their AI subscriptions.

For a 2.5 kW server? I don't see it happening; your money and electricity are better spent on CUDA compute.

