But the issue with the LLMs architectures in place is with the idea of "predicting the next token", so strident with the exercise of intelligence - where we search instead for the "neighbouring fitting ideas".
So, "hierarchical" in this context is there to express that it is typical of natural intelligence to refine an idea - formulating an hypothesis and improving its form (hence its expression) step after step of pondering. The issue of transparency in current LLMs, and the idea of "predicting the next token", do not help in having the idea of typical natural intelligence mechanism and the tentative interpretation of LLM internals match.