Now, that simplicity can be deceiving - there is a lot of conceptual interconnectedness within these models. They've been put together "just so," if you will.
If you look at the source code of nanoGPT and compare it to Llama3, the most remarkable thing (once you look past the superficial name changes) is just how similar they are.
If I recall correctly the primary differences are:
- The MLP: Llama3 uses SwiGLU vs the more "traditional" `x = x + proj(gelu(expand(x)))` in GPT2
- The token encoders, which are arguably external to the model
- Attention: Llama3 uses Grouped Query Attention, vs full Multi-Head Attention in GPT2
- Normalization: Llama3 uses RMSNorm, vs LayerNorm for GPT2
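The differences above are small enough that they can be sketched functionally. Here is a toy, stdlib-only sketch of each one - the scalar "weights" and head counts are invented for illustration (real models use learned weight matrices), but the functional forms match the descriptions above:

```python
import math

def gelu(x):
    # GPT2's nonlinearity (tanh approximation of GELU)
    return 0.5 * x * (1 + math.tanh(math.sqrt(2 / math.pi) * (x + 0.044715 * x**3)))

def silu(x):
    # SiLU ("swish"), the gate nonlinearity inside SwiGLU
    return x * (1 / (1 + math.exp(-x)))

def gpt2_mlp(x, w_expand, w_proj):
    # GPT2-style MLP: expand -> gelu -> project
    return w_proj * gelu(w_expand * x)

def swiglu_mlp(x, w_gate, w_up, w_down):
    # Llama3-style MLP: an extra gate path, multiplied in elementwise
    return w_down * (silu(w_gate * x) * (w_up * x))

def layernorm(xs, eps=1e-5):
    # GPT2: subtract the mean, then divide by the standard deviation
    mu = sum(xs) / len(xs)
    var = sum((v - mu) ** 2 for v in xs) / len(xs)
    return [(v - mu) / math.sqrt(var + eps) for v in xs]

def rmsnorm(xs, eps=1e-5):
    # Llama3: no mean subtraction, just divide by the root-mean-square
    rms = math.sqrt(sum(v * v for v in xs) / len(xs) + eps)
    return [v / rms for v in xs]

# Grouped Query Attention changes the bookkeeping, not the core math:
# several query heads share one key/value head, shrinking the KV cache.
# Head counts here are illustrative, roughly Llama3-8B-shaped.
n_q_heads, n_kv_heads = 32, 8
queries_per_kv = n_q_heads // n_kv_heads
print(queries_per_kv)  # 4 query heads share each KV head
```

Note that each swap is local: the residual stream, the block ordering, and the overall training loop are untouched, which is why the two codebases read so similarly.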
They were published more than five years apart. On the one hand, progress has been breathtaking, truly astounding. On the other hand, it's almost exactly the same model. Goes to show just how much is in the training data.