The latest strategies for etching weights into silicon seem like they can be generalized. We currently design GPU/TPU caching on the assumption that the weights change frequently. If the weights do not change at all, or change very slowly, then there are perhaps other, more efficient ways of laying out memory on the chip, somewhere between permanently etching a model into silicon and using GPUs designed for graphics computation.
I'm assuming that they will do a silicon etching run once a year. That might be an interesting acquisition opportunity for Apple, since it matches the rhythm of their device releases.
It's a good point. It would be a nice "upgrade story" to get the next-generation model. At a fixed cost of ~$1000 per model, it wouldn't be a bad deal relative to current API costs.
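To put a rough shape on the ~$1000 comparison, here is a minimal break-even sketch. The API price below is a made-up placeholder, not a quote from any provider, and the function name is just for illustration. It also ignores power, hosting, and the fact that the etched model never improves.

```python
def break_even_tokens(chip_cost_usd: float, api_price_per_million_usd: float) -> float:
    """Number of tokens at which a one-time chip purchase equals cumulative API spend."""
    if api_price_per_million_usd <= 0:
        raise ValueError("API price must be positive")
    return chip_cost_usd / api_price_per_million_usd * 1_000_000


# Hypothetical inputs: the chip cost is the ~$1000 figure from the comment,
# the API price is an assumed placeholder chosen only to make the math easy.
chip_cost = 1000.0
assumed_api_price = 10.0  # USD per million tokens (assumption)

tokens = break_even_tokens(chip_cost, assumed_api_price)
print(f"Break-even after {tokens:,.0f} tokens")
```

Under those assumed numbers the chip pays for itself after 100 million tokens; plug in whatever price you actually pay to see where the line falls for you.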
That would be something like an FPGA. Those have been very unpopular so far due to high cost, and they also only support a relatively small number of weights.