> A model is some architecture of how data will flow through these weight matrices, along with the values of each weight.
Except that data doesn't really flow through weight matrices, though the description roughly holds if you squint at very simple models. Deep learning architectures generally do more than multiply values by weights and push the results to the next layer, and which architecture to use depends heavily on context.
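To make this concrete, here is a toy sketch (with made-up shapes and values) of a single layer: even a very simple real layer typically involves a bias term, a nonlinearity, and often a residual connection, not just a weight multiplication.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4)) * 0.1  # weight matrix
b = np.zeros(4)                        # bias vector

def layer(x):
    # linear transform, then a nonlinearity (ReLU),
    # then a residual connection back to the input
    return x + np.maximum(0, x @ W + b)

x = rng.standard_normal(4)
y = layer(x)
print(y.shape)  # (4,)
```

Real architectures compose many such blocks, along with attention, normalization, and other operations, which is why "data flows through weight matrices" only loosely describes what happens.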
> Tokens are sort of "words" in a sentence
Tokens are funny. What a token is depends on the model you're using, but generally a token is a portion of a word. (Why? Efficiency is one reason; handling unknown words is another.)