And I never said as much.
>Why do you need to admonish me to be specific?
Because I'm confident that for any particular definition of "understanding", the difference won't be relevant. Case in point: the one you provided. You're now claiming that a word2vec model doesn't have some "understanding" because it is unable to demonstrate a specific skill (circumlocution/definition)[1]. All of your other objections follow the same general format: because the word2vec model can't perform a skill that you can, its "intuitive" understanding of a concept must be lesser.
Following such an argument to its logical conclusion, you'd have to agree that you have a better intuitive understanding of language than a paralyzed person, because you can dance the word while they cannot. I doubt you actually hold such a belief.
So if the demonstration of an arbitrary skill isn't the marker of understanding, since that would be unfair to our quadriplegic linguist friends, perhaps performance on specifically relevant skills is how we should measure whether or not some model has the "understanding" you want. To be less abstract, given some embedding that we think has some "understanding" of some concept, we need to get the I/O right. If the same embedding can be placed in models that are wired up to interface with the world differently, but still perform well, perhaps the "understanding" is more than surface level.
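To make the "different I/O" idea concrete, here is a minimal sketch. It assumes a single frozen embedding table, with made-up words and 2-d vectors used only for illustration, that is read by two differently wired heads: a dot-product similarity lookup and a perceptron trained on top of the same vectors. The names `EMBEDDINGS`, `similarity_head`, and `train_sentiment_head` are hypothetical. Real probing setups use learned embeddings and usually larger classifiers.

```python
# Hypothetical frozen embeddings (made-up 2-d vectors), shared by two different "heads".
EMBEDDINGS = {
    "great": [0.9, 0.3],
    "good": [0.7, 0.2],
    "awful": [-0.8, 0.3],
    "bad": [-0.6, 0.1],
}


def similarity_head(a, b):
    """Head 1: compare two words by the dot product of their frozen vectors."""
    return sum(x * y for x, y in zip(EMBEDDINGS[a], EMBEDDINGS[b]))


def train_sentiment_head(labeled, epochs=20, lr=0.1):
    """Head 2: a perceptron trained on the same frozen vectors.

    Only the head's weights (w, b) are updated; EMBEDDINGS is never modified.
    Labels are +1 (positive) or -1 (negative).
    """
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for word, label in labeled:
            x = EMBEDDINGS[word]
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if label * score <= 0:  # misclassified: nudge the decision boundary
                w = [wi + lr * label * xi for wi, xi in zip(w, x)]
                b += lr * label
    return lambda word: 1 if sum(wi * xi for wi, xi in zip(w, EMBEDDINGS[word])) + b > 0 else -1


print(similarity_head("great", "good") > similarity_head("great", "awful"))  # True
sentiment = train_sentiment_head([("great", 1), ("awful", -1)])
print(sentiment("good"), sentiment("bad"))  # 1 -1
```

Trained only on "great" and "awful", the perceptron head still labels the unseen "good" and "bad" correctly. In this toy example, that works because the sentiment direction is already present in the frozen vectors.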
Word2vec models clearly "understand" synonyms, antonyms, and similar word relations. Word2vec/word-embedding-based models are also, I believe, still SoTA in automatic summarization and language translation tasks, although the machinery is fairly distinct from that of the original paper.
So what we have is a representation that can:
1. Show you which words are similar to which other words
2. Use that knowledge to summarize text
3. Use that knowledge to translate text to a different language
4. Be inspected by humans, who can find semantically meaningful clusters and patterns with tools like t-SNE.
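Item 1 is the easiest to see in code. Here is a minimal sketch that assumes hand-picked 3-d toy vectors. Real word2vec vectors are learned and usually have a few hundred dimensions. In this sketch, "similar words" are simply the words whose vectors have the highest cosine similarity to the query. `EMBEDDINGS`, `cosine`, and `nearest` are hypothetical names, not part of any word2vec library.

```python
import math

# Hypothetical 3-d toy embeddings, hand-picked for illustration only.
EMBEDDINGS = {
    "happy": [0.9, 0.1, 0.0],
    "joyful": [0.85, 0.15, 0.05],
    "sad": [-0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.95],
    "truck": [0.05, 0.0, 0.9],
}


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(word, k=2):
    """Return the k words whose vectors are most similar to `word`'s, excluding itself."""
    query = EMBEDDINGS[word]
    scored = [(other, cosine(query, vec)) for other, vec in EMBEDDINGS.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scored[:k]]


print(nearest("happy", k=1))  # ['joyful']
print(nearest("car", k=1))    # ['truck']
```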
>What is that internal representation?
For word2vec, for example, it's that the vector space the words live in clusters similar words together. For this model, it's that the vector space clusters similarly colored cards.
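One way to check a claim like "the vector space clusters similarly colored cards" is to compare average within-color similarity against average between-color similarity. Here is a minimal sketch. The card vectors and color labels are made up, because the card model's actual embeddings aren't shown here. `CARDS` and `cluster_gap` are hypothetical names.

```python
import itertools
import math

# Hypothetical 2-d card embeddings tagged with a color label.
# These numbers are invented for illustration; they are not from any real model.
CARDS = [
    ("red", [0.9, 0.1]),
    ("red", [0.8, 0.2]),
    ("red", [0.95, 0.05]),
    ("blue", [0.1, 0.9]),
    ("blue", [0.2, 0.85]),
    ("blue", [0.05, 0.95]),
]


def cosine(u, v):
    """Cosine similarity between two 2-d vectors (math.hypot gives the Euclidean norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def cluster_gap(items):
    """Mean same-label similarity minus mean different-label similarity.

    A clearly positive gap means same-colored cards sit closer together in the
    vector space than differently colored ones.
    """
    same, diff = [], []
    for (label_a, vec_a), (label_b, vec_b) in itertools.combinations(items, 2):
        (same if label_a == label_b else diff).append(cosine(vec_a, vec_b))
    return sum(same) / len(same) - sum(diff) / len(diff)


print(f"similarity gap: {cluster_gap(CARDS):.3f}")
```

A gap near zero would suggest the colors are not separated. Reassigning the labels at random is a cheap baseline to compare against.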
For complex neural models, who knows. On the one hand, it would probably be very useful if we could glean useful structure from the internal representation, and indeed people are working on that[2]. But on the other hand, they're demonstrably useful even if we don't have a perfect understanding of the structure. And given that we don't understand how and why we humans understand concepts, that's fine for now.
Of course, all of this assumes that "understanding" is even the right word to use. There's a good argument to be made that a neural network cannot and never will "understand" anything, because that's only something self-aware entities can do. But again, that's mostly a semantic distinction. If we're discussing the efficacy of word-embedding models and whether the representation of concepts in those embeddings is real or just... happenstance (I'm not really sure what you're going for there), then the entire question of things like self-awareness is irrelevant.
[1]: I apologize for over-anthropomorphizing an ML model here, but it's the best way of putting this I can think of.