> In fact, having a performant, smaller model is a game changer for a lot of applications that don't require the massive scale of the larger models.
So we are all in agreement here that a 3B model is fundamentally inferior to a larger model?
Not that it doesn’t have uses; not that there’s no value in research in small models.
Just, honestly, that these smaller models don’t have the capabilities of the larger models.
It’d be good to see a direct acknowledgment of that, because it seems like you’re going out of your way to promote the “it’s fine to have a small model” line; and it is, roughly speaking. Parameter count isn’t everything. Small models are accessible, and you can easily fine-tune them. They are interesting.
…but they are not as good, as far as I’m aware, in terms of output or general-purpose function, as larger models.