For another example of why this isn't commercially viable, look at what happened with Super Mario Maker. In that game _humans_ are given a fixed set of Mario doodads with which to build levels. But Nintendo kept the secret sauce for themselves - the ability to create new doodads. What followed was millions of derivative Mario levels unworthy of their own game. Even if you trained MarioGPT on the rich set of level data available in Mario Maker, you would not have an algorithm that makes commercially viable Mario levels.
It doesn't refute your point, but what actually happened was that brilliant tinkerers found ingenious ways to combine the basic tools to create whole new classes of advanced gadgets that enable styles of gameplay not intended by Nintendo. Here's a playlist with some examples and tutorials: https://youtube.com/playlist?list=PLekbcfvMB1gYieKXixxXVBTYC...
This approach could be viable for that, but it may need some tuning. It is possible that the problem in this case is less GPT and more the training set; the examples given imply that the levels were characterized as a whole by some very superficial criteria, so it isn't necessarily a surprise that the resulting levels are equally superficial. The system was never trained on "shell jump" (not that that appears in Mario 1 AFAIK, it's just the first Mario term that came to mind), so it never produces one. I would want to look at training on a screen-by-screen basis, with some overlap, rather than on whole levels, and at categorizing the input data more richly.
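To make the screen-by-screen idea concrete, here is a minimal sketch of slicing a text level into overlapping windows. It assumes the level is stored as a list of equal-length strings, one per row, with one character per tile; the function name, the 16-tile screen width, and the 4-tile overlap are all made up for illustration and are not taken from MarioGPT.

```python
def split_into_screens(level_rows, screen_width=16, overlap=4):
    """Slice a row-major text level into overlapping column windows.

    level_rows: list of equal-length strings, one string per tile row.
    screen_width and overlap are illustrative defaults, not real game values.
    """
    if not level_rows:
        return []
    if not 0 <= overlap < screen_width:
        raise ValueError("overlap must satisfy 0 <= overlap < screen_width")
    level_width = len(level_rows[0])
    if any(len(row) != level_width for row in level_rows):
        raise ValueError("all rows must have the same width")

    stride = screen_width - overlap  # how far each window advances
    screens = []
    start = 0
    while True:
        # Take the same column range from every row to form one screen.
        screens.append([row[start:start + screen_width] for row in level_rows])
        if start + screen_width >= level_width:
            break  # this window already reaches the end of the level
        start += stride
    return screens
```

Each screen could then get its own labels ("has a pit", "needs a running jump", and so on) instead of one superficial label for the whole level. The last window may be narrower than the others if the level width doesn't line up with the stride.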
If I were designing a new indie game, I'd be feeding it some hand-crafted level snippets. However, when it comes to actually shipping the game, it would be hard to know in advance whether I could feed the GPT system enough input, with enough categorizations, or whether it would just be more cost-effective to design the levels directly. At the moment it is not obvious to me how to convince GPT to understand the concept of level flow, or even something as simple as "this pipe is physically impossible to jump over".
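For what it's worth, the pipe case is the kind of thing that is easy to check outside the model. Here is a rough sketch of such a filter, assuming a text level stored as equal-length row strings where "X" is ground and "<", ">", "[", "]" are pipe pieces, and assuming Mario can clear a rise of at most 4 tiles. The tile set, the jump limit, and the function names are all assumptions for illustration, not values from the game or from MarioGPT.

```python
SOLID_TILES = set("X<>[]")  # assumed tile characters: ground and pipe pieces
MAX_JUMP_TILES = 4          # assumed jump limit in tiles, not measured from the game


def column_heights(level_rows):
    """Height of the topmost solid tile per column, counted from the bottom row."""
    n_rows = len(level_rows)
    width = len(level_rows[0]) if level_rows else 0
    heights = []
    for x in range(width):
        height = 0  # 0 means no solid tile at all, i.e. a pit
        for y, row in enumerate(level_rows):  # scan from the top row down
            if row[x] in SOLID_TILES:
                height = n_rows - y
                break
        heights.append(height)
    return heights


def impassable_columns(level_rows, max_jump=MAX_JUMP_TILES):
    """Columns where the step up from the previous column exceeds max_jump."""
    heights = column_heights(level_rows)
    return [
        x for x in range(1, len(heights))
        if heights[x] - heights[x - 1] > max_jump
    ]
```

This is deliberately crude: it only looks at left-to-right travel and ignores platforms, enemy bounces, and anything else that could help Mario over. It could still serve as a cheap reject-and-regenerate filter on generated levels, which is not the same as the model understanding the constraint.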
It is also possible there just isn't enough input data to really make this slick. There aren't that many publicly-available Mario levels.
If you train MarioGPT on MM2 levels, the reason you don't get "commercially viable Mario levels" is that that's not what the community ever wanted to build. It's like training a model on abstract portraiture and then complaining that it doesn't produce saleable landscape paintings. Mario Maker has multiple communities. Let's look at two of them; in both cases the levels are not "commercially viable", for whatever that's worth.
Kaizo. Kaizo means roughly "re-arrange" in Japanese, but Kaizo Mario has come to mean a style in which tremendous skill is needed to navigate the course. Basic Kaizo techniques include the "Shell jump", in which Mario throws a shell, it bounces off a wall or other surface, and Mario jumps off the shell he threw. Mario can of course arrange to throw, jump off, and catch shells more than once, and he can cause Yoshi to swallow and then spit out a shell, jump off that shell, and catch it. Good Kaizo players think nothing of a multi shell jump to climb a wall; they'll assume that if there's a shell and a wall, that's what is intended.
Kaizo Mario is far too difficult to be commercially successful. Most people could learn it if they've got good hand-eye co-ordination, but it's not easy, and most people would only ever be passably good at it. Hard Kaizo levels might then be impossible for them, either because they didn't figure out the technique or because their skills are inadequate, which is very frustrating.
"Chocolate" Kaizo (which is Kaizo where you also change the game's rules) isn't possible with Mario Maker, but even if an AI were able to make the best Chocolate Kaizo levels, they're not commercial. The best Chocolate Kaizo today is probably something like "Grand Pooh World 2", but there are maybe a few hundred people in the world who have fun playing something like that, so where's the money?
OK, next community: Troll. Troll Mario subverts the assumptions about the central concept of Mario. The idea is to surprise and perhaps frustrate the player. Unlike Kaizo, great skill is not mandatory, but patience is, and you need to be able to accept that you were wrong and learn from mistakes, which many people struggle to do. A Troll level might present Mario with two apparent routes forward: a mushroom power up with a door, or a fire flower and a pipe. Except nope, those are both instant death. The correct solution is to jump into the obviously deadly pit, which wasn't really deadly; Mario gets a different mushroom and is then pushed into a one-shot teleport.
A common Troll trope is the "anti-softlock", complete with use of the "Slide theme" music. Nintendo's levels are designed so that either Mario can win or you will be put out of your misery quickly to try again. Where it's possible to instead get stuck, unable to die, that's called a "Soft lock" - as opposed to a hard lock where the game just freezes. The anti-softlock then is the art of a Troll level making it possible but very difficult to die, even though Mario can't win. Fashion changes: sometimes it's popular to have actual softlocks, sometimes fake ones, where Mario will die after say 15 seconds somehow. But often, especially later in a course, you have complex puzzles in which the only benefit of the solution is that Mario dies and you can start over from the checkpoint you reached.
It has served as a good example to teach kids in my life about the scam of digital artificial scarcity employed by the game by making you wait for hearts or pay.
Being able to learn how the level generation works as a player is part of the experience of playing a roguelike game, so I don't think that's a bad thing though! Games with too much randomness and not enough structure can feel a bit samey.
Then for Spelunky 2 there's the randomizer mod, which randomizes almost everything. It pretty much never ceases to surprise you. Look up "spelunky 2 randomizer" on YouTube to see for yourself.
https://store.steampowered.com/app/210870/Cloudberry_Kingdom...
Incidentally, there's a nice example of a text representation of a level in the source code (requires scrolling horizontally, which isn't totally obvious from the GitHub UI): https://github.com/shyamsn97/mario-gpt/blob/main/mario_gpt/l...
Some parts are recognisable, for example the flag pole (which is typically at the end of Mario levels, I believe).
On another thought: this could probably replace the Chrome dino pretty well.
But I'm not convinced the results are any smarter than randomized procedural generation (I'm sure using it for text generation instead will yield sub-par results).