Because the difference between a model that costs $10 million to train and a model that costs $10 billion to train is six months.
DeepSeek R1 is something that you can run in a garage, on hardware that the average software engineer can buy with a month's salary, and when it came out last month it was better than _every_ other model.