In the context in which you said it, it matters a lot.
>> The idea that they used o1's outputs for their distillation further shows that models like o1 are necessary.
> Hmm, I think the narrative of the rise of LLMs is that once the output of humans has been distilled by the model, the human isn't necessary.
If DeepSeek were produced through distillation (a term of art) of o1, then the cost of producing DeepSeek is strictly higher than the cost of producing o1, and that cost can't be avoided.
Continuing the argument: if the premise is true, then DeepSeek can't be significantly improved without first producing a very expensive hypothetical o1-next model from which to distill better knowledge.
That is the argument that is being made. Please avoid shallow dismissals.
Edit: just to be clear, I doubt that DeepSeek was produced via distillation (term of art) of o1, since that would require access to o1's weights. It may have used some of o1's outputs to fine-tune the model, which would still mean that the cost of training DeepSeek is strictly higher than that of training o1.
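To spell out the distinction: distillation in the term-of-art sense trains the student against the teacher's full output distribution (soft logits), which requires white-box access to the teacher, whereas fine-tuning on outputs only needs the sampled text. A minimal sketch of the two losses, using a hypothetical toy 4-token vocabulary and made-up logits:

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax over a list of logits.
    m = max(logits)
    exps = [math.exp((z - m) / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Hypothetical teacher and student logits over a toy vocabulary.
teacher_logits = [2.0, 1.0, 0.5, -1.0]
student_logits = [1.5, 1.2, 0.3, -0.5]

# Distillation (term of art): match the teacher's full soft
# distribution via KL divergence -- this needs the teacher's logits,
# i.e. white-box access to the model.
T = 2.0
p = softmax(teacher_logits, T)
q = softmax(student_logits, T)
kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Fine-tuning on sampled outputs: only the emitted token is visible,
# so the loss is ordinary cross-entropy on that hard label.
sampled = teacher_logits.index(max(teacher_logits))  # greedy sample
ce = -math.log(softmax(student_logits)[sampled])

print(kl, ce)
```

The KL term is only computable with the teacher's logits, which is why API access to o1's sampled outputs supports the second loss but not the first.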