Not 'char' - because it uses BPE (byte pair encoding), so tokenization yields subword tokens like ["Transform", "ers"] rather than individual characters ["T", "r", "a", ...]. This is also why the model struggles to reverse words: it never sees the individual letters. Not 'largest' - because larger models exist, e.g. Pathways Language Model (PaLM) with 540 billion parameters.
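To make the BPE point concrete, here is a minimal sketch of greedy BPE tokenization in pure Python. The merge table below is hypothetical (real GPT tokenizers learn tens of thousands of merges from data), but the mechanism is the same: start from characters and repeatedly apply the highest-priority merge rule, so "Transformers" collapses into subword chunks instead of staying as letters.

```python
def bpe_tokenize(word, merges):
    """Greedy BPE: start from single characters, then repeatedly apply
    the highest-priority (lowest-rank) merge rule present in the list."""
    tokens = list(word)
    while True:
        best = None  # (rank, index) of the best mergeable adjacent pair
        for i in range(len(tokens) - 1):
            rank = merges.get((tokens[i], tokens[i + 1]))
            if rank is not None and (best is None or rank < best[0]):
                best = (rank, i)
        if best is None:
            return tokens  # no applicable merge rules left
        _, i = best
        tokens = tokens[:i] + [tokens[i] + tokens[i + 1]] + tokens[i + 2:]

# Hypothetical merge table for illustration; rank = merge priority.
merges = {
    ("e", "r"): 0,
    ("er", "s"): 1,
    ("T", "r"): 2,
    ("Tr", "a"): 3,
    ("Tra", "n"): 4,
    ("Tran", "s"): 5,
    ("Trans", "f"): 6,
    ("Transf", "o"): 7,
    ("Transfo", "r"): 8,
    ("Transfor", "m"): 9,
}

print(bpe_tokenize("Transformers", merges))  # → ['Transform', 'ers']
```

Note that the model only ever sees the resulting chunks, so "reverse this word" requires reasoning about letters it was never directly given.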