Additionally, models can be (and are) fine-tuned via APIs, so if that is the threshold for a system to be "open source", then the GPT4 family and other such API-only models that allow fine-tuning would also be open source.
There's a pretty clear difference between the 'finetuning' GPT4 offers via its API and what you can do with open-weights models: whatever sort of fine-tuning you want, with the resulting weights in your hands at the end.
"Brute forcing" is not the correct language to use for describing fine-tuning. It is not as if you are trying weights randomly and seeing which ones work on your dataset - you are following a gradient.
Yes, the difference is that one is provided over a remote API, where the provider can restrict how you interact with it, while the other is performed directly by the user. One is a SaaS solution, the other is a compiled solution, and neither is open source.
""Brute forcing" is not the correct language to use for describing fine-tuning. It is not as if you are trying weights randomly and seeing which ones work on your dataset - you are following a gradient."
Whatever you want to call it, this doesn't sound like modifying functionality in source code. When I modify source code, I might make a change, check what it does, change the same functionality again, check the new change, and so on, up to maybe a couple dozen times. What I don't do is have a very simple routine make very small modifications to all of the system's functionality at once, check the result of that small change across the broad spectrum of functionality, and repeat millions of times.
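To make that concrete, the loop being described looks something like this. A toy one-parameter sketch in plain Python (a hypothetical example, not any real model's training code): tiny, gradient-directed updates to the weight, repeated many times, rather than either random guessing or hand-editing.

```python
# Toy illustration of "following a gradient": fit y = w * x to a tiny
# dataset by repeatedly nudging w in the direction that reduces the loss.
# This is a sketch of the mechanics, not a real fine-tuning recipe.

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # toy dataset where y = 2x

def loss(w):
    # Mean squared error over the dataset.
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def grad(w):
    # Analytic gradient of the mean squared error with respect to w.
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)

w = 0.0      # an "initial weight"
lr = 0.01    # small step size
for _ in range(1000):
    w -= lr * grad(w)  # a tiny, directed update -- not a random guess

print(round(w, 3))  # converges toward 2.0
```

The point of the sketch is the shape of the process: every iteration touches the weight by a small amount dictated by the gradient, and the effect is only visible through the aggregate loss, not through any single legible piece of "functionality".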
Maybe an analogy would help. A family spent generations breeding the perfect apple tree and they decided to “open source” it. What would open sourcing look like?
Yeah, that is my point. Things that don't have source code can't be open source.
"Maybe an analogy would help. A family spent generations breeding the perfect apple tree and they decided to “open source” it. What would open sourcing look like?"
I think we need to be wary of dilemmas without solutions here. For example, let's think about another analogy: I was in a car accident last week. How can I open source my car accident?
I don't think all, or even most, things are actually "open-sourceable". ML models could be open sourced, but it would require a lot of work to interpret the models and generate the source code from them.
No one on the planet understands exactly how the model weights work, nor can they modify them specifically (i.e., hand-editing the weights to get the result they want). This is an impossible standard.
The source code is open (sorta, it does have some restrictions). The weights are open. The training data is closed.
Which is my point. These models aren't open source because there is no source code to open. Maybe one day we will have strong enough interpretability to generate source from these models, and then we could have open source models. But today it's not possible, and changing the meaning of open source so that it is possible probably isn't a great idea.