I wonder if you could somehow use a large multimodal model trained on the machine code of all existing NES software, along with the manuals, forums and videos for each game. If that's not enough training data, maybe include other 6502 software, with the platform encoded so the model can tell them apart?
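To make the "platforms encoded" part concrete, here is a rough sketch of one way it could work: treat each byte of machine code as a token and prepend a platform tag token. The token IDs, the platform list and the `encode_program` name are all made up for illustration, not taken from any real tokenizer.

```python
from typing import List

# Hypothetical tag tokens. Byte values use 0..255, so tags start at 256.
PLATFORM_TOKENS = {
    "nes": 256,
    "c64": 257,
    "apple2": 258,
}


def encode_program(platform: str, code: bytes) -> List[int]:
    """Return a token sequence: one platform tag followed by the raw bytes."""
    return [PLATFORM_TOKENS[platform]] + list(code)


# LDA #$00 (opcode A9, operand 00) tagged as NES code.
tokens = encode_program("nes", bytes([0xA9, 0x00]))
```

The same byte sequence tagged for another platform would differ only in its first token, which is what lets the model learn platform-specific conventions from shared 6502 code.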
Then you get a model that can take a description of a game and maybe a proposed screenshot and generate a new ROM. You train it further by testing the ROMs and giving feedback, starting with negative feedback for ROMs that don't boot.
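As a rough idea of what the cheapest feedback signal might look like, here is a sketch that penalizes output that can't even be a well-formed iNES file, before any emulator is involved. The function name and reward values are invented for illustration, it assumes the plain iNES header layout (not NES 2.0), and a real "does it boot" check would need an emulator on top of this.

```python
INES_MAGIC = b"NES\x1a"   # first four bytes of an iNES file
HEADER_SIZE = 16
PRG_BANK = 16 * 1024      # header byte 4 counts 16 KB PRG ROM banks
CHR_BANK = 8 * 1024       # header byte 5 counts 8 KB CHR ROM banks
TRAINER_SIZE = 512        # present when bit 2 of header byte 6 is set


def static_rom_reward(rom: bytes) -> float:
    """Hypothetical reward: -1.0 if the file is structurally invalid, else 0.0."""
    if len(rom) < HEADER_SIZE or rom[:4] != INES_MAGIC:
        return -1.0
    has_trainer = bool(rom[6] & 0x04)
    expected = (
        HEADER_SIZE
        + (TRAINER_SIZE if has_trainer else 0)
        + rom[4] * PRG_BANK
        + rom[5] * CHR_BANK
    )
    # A file shorter than its header claims is truncated.
    if len(rom) < expected:
        return -1.0
    return 0.0
```

Returning 0.0 rather than a positive value for a structurally valid file is deliberate: passing this check says nothing about whether the game actually runs.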
I was thinking of training it on all of the 6502-compatible machine code that's out there. There are at least 20,000 or 30,000 programs.
/sarcasm