Not 'char' - because it uses BPE (byte pair encoding), so tokenization yields subword tokens like ["Transform", "ers"] rather than individual characters ["T", "r", "a", ...]. This is also why the model struggles to reverse words: it never sees the individual letters. Not 'largest' - because larger models exist, e.g. Pathways Language Model (PaLM) with 540 billion parameters.
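To make the BPE point concrete, here is a minimal sketch of greedy BPE tokenization in pure Python. The merge table below is hypothetical (real GPT tokenizers learn tens of thousands of merges from data), but the mechanism is the same: start from characters and repeatedly apply the highest-priority merge rule, so "Transformers" collapses into subword chunks instead of staying as letters.

```python
def bpe_tokenize(word, merges):
    """Greedy BPE: start from single characters, then repeatedly apply
    the highest-priority (lowest-rank) merge rule present in the list."""
    tokens = list(word)
    while True:
        best = None  # (rank, index) of the best mergeable adjacent pair
        for i in range(len(tokens) - 1):
            rank = merges.get((tokens[i], tokens[i + 1]))
            if rank is not None and (best is None or rank < best[0]):
                best = (rank, i)
        if best is None:
            return tokens  # no applicable merge rules left
        _, i = best
        tokens = tokens[:i] + [tokens[i] + tokens[i + 1]] + tokens[i + 2:]

# Hypothetical merge table for illustration; rank = merge priority.
merges = {
    ("e", "r"): 0,
    ("er", "s"): 1,
    ("T", "r"): 2,
    ("Tr", "a"): 3,
    ("Tra", "n"): 4,
    ("Tran", "s"): 5,
    ("Trans", "f"): 6,
    ("Transf", "o"): 7,
    ("Transfo", "r"): 8,
    ("Transfor", "m"): 9,
}

print(bpe_tokenize("Transformers", merges))  # → ['Transform', 'ers']
```

Note that the model only ever sees the resulting chunks, so "reverse this word" requires reasoning about letters it was never directly given.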