acapybara 2y ago

Hey SeanAnderson, good question! While parameter count is certainly an important factor in model performance, it's not the only one. The RedPajama project is taking a more nuanced approach to understanding what makes a model perform well, and their focus on smaller models like the 3B is a big part of that.

Sure, you may have played with a 7B model in the past, but that doesn't mean there's no use case for a smaller model like the 3B. In fact, having a performant, smaller model is a game changer for a lot of applications that don't require the massive scale of the larger models. Plus, smaller models are generally faster and more accessible, which is always a plus.

wokwokwok 2y ago

> In fact, having a performant, smaller model is a game changer for a lot of applications that don't require the massive scale of the larger models.

So we are all in agreement here that a 3B model is fundamentally inferior to a larger model?

Not that it doesn’t have uses; not that there’s no value in research in small models.

Just, honestly, that these smaller models don’t have the capabilities of the larger models.

It’d be good to have a direct acknowledgment of that, because it seems like you’re going out of your way to promote the “it’s fine to have a small model” position; and it is fine, roughly speaking. Parameter count isn’t everything. Small models are accessible, and you can easily fine-tune them. They are interesting.

…but, they are not as good, as far as I’m aware, in terms of output, in terms of general purpose function, as larger models.

tomrod 2y ago

For your first point where you are attempting to impose agreement, I believe the other commentator is saying that tradeoffs are non-negligible between the two.

Sounds like the difference between edge and centralized ML scoring.

deepsquirrelnet 2y ago

There is no “one size fits all” here. A bigger model is just a bigger hammer, one that in many uses is too bulky and slow to be a proper solution.

At my job, I can’t casually fire up 8xA100 80GB instances. And if I could, the model wouldn’t deliver the throughput I require to be useful. Big models are operationally much more expensive.

The smallest/fastest model that is accurate enough for your use case is ideal.
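
As a rough illustration of the operational-cost point above, here is a back-of-envelope sketch of how much memory the weights alone take at different precisions. It assumes weights dominate and ignores activations, KV cache, and framework overhead, so real requirements are higher. The names `BYTES_PER_PARAM` and `weight_memory_gb` are made up for this example, and the 640 GB figure is simply 8 x 80 GB from the instance mentioned above.

```python
# Back-of-envelope weight-memory arithmetic (illustrative sketch only).
# Assumption: only the weights are counted; activations, KV cache and
# runtime overhead are ignored, so treat the results as lower bounds.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}  # bytes per parameter


def weight_memory_gb(n_params: float, dtype: str = "fp16") -> float:
    """Approximate memory for the weights, in GB (10^9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


# Total accelerator memory of one 8xA100 80GB instance, in GB.
NODE_MEMORY_GB = 8 * 80

if __name__ == "__main__":
    for label, n_params in [("3B", 3e9), ("7B", 7e9)]:
        for dtype in ("fp32", "fp16", "int8"):
            gb = weight_memory_gb(n_params, dtype)
            share = gb / NODE_MEMORY_GB
            print(f"{label} @ {dtype}: ~{gb:.0f} GB of weights "
                  f"({share:.1%} of a {NODE_MEMORY_GB} GB node)")
```

By this arithmetic a 3B model at fp16 needs about 6 GB for weights and a 7B model about 14 GB, which is why both can plausibly fit on a single consumer GPU while much larger models cannot.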

wokwokwok 2y ago

> The smallest/fastest model that is accurate enough for your use case is ideal.

Sure.

…but it’s also fair to say that the smallest model that can fit your use case will be bounded by the parameter count.

No amount of training data can make a 100-param model do text summarisation.

If you have a 3B param model, and you want a ChatGPT-style assistant to embed in your app, do you think it’ll do?

I don’t.

The output is not at that quality level, because it’s too small.

Not everyone needs that; but these 3B / 7B models don’t have the capability to do everything.

chaxor 2y ago

If the goal is to use it to access a large knowledge base (like Google, but with better semantic searching), then it doesn't matter as much. There are some cases where it still matters because the model fails to make some connections (for example, you may want an answer to something and not realize it due to your ignorance; a smaller model will catch that a few percent less often).

But ultimately small models are very good for most things, and far preferable for running at home to organize your digital life on a small SBC or old computer.

robertlagrant 2y ago

> Hey SeanAnderson, good question! While parameter count is certainly an important factor in model performance, it's not the only one. The RedPajama project is taking a more nuanced approach to understanding what makes a model perform well, and their focus on smaller models like the 3B is a big part of that.

It's hard to pick out the actual answer: what is the application that this is good at? What has their "more nuanced" approach to understanding performance increased this model's performance at doing?

hhh 2y ago

is this comment generated by an LLM?
