A quote from the link you posted: "Then, when you refer to “Lambda”, “ChatGPT”, “Bard”, or “Claude” then, it’s not the model weights that you are referring to. It’s the dataset."
Yet all four of these examples use the same model architecture (transformers).