a) For inference, cheaper and faster compute will increase total inference spend: end-user products will work better and cost less, so people will use them more, and the growth in usage will more than offset the drop in per-unit cost.
b) For training, the big labs will keep spending because we have yet to see diminishing returns to scale. In fact, the past year unlocked a new dimension along which to scale training-time compute: running more RL after pre-training to improve reasoning capabilities. Since current SOTA models are not yet smart enough for all the tasks people want to use them for, any efficiency gains will be reinvested into further improving performance. In the current competitive environment, even with DeepSeek's work, it is near-impossible to imagine OpenAI, Anthropic, Google, or Meta cutting the compute budget for their next model by an order of magnitude. They will incorporate DeepSeek's techniques into their next models, but use them to squeeze even more performance out of the compute they have, and they will keep purchasing as much compute as NVIDIA will sell them. Expect this trend to continue until the returns to scale finally run out.