That makes sense. I wondered whether one way to do it might be to use generative AI with a ControlNet to create a more photorealistic 3D version of each of the pixel-art environments, and then use those images with your approach. Then you could somehow separate out the lighting information and mask it over the original pixel art?
But I’m not sure if that’s actually feasible or how it would work technically. :)
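For what it's worth, here is a rough sketch of how I picture just the last step (pulling lighting out of a rendered image and laying it over the pixel art). It assumes the photorealistic render already exists and lines up with the pixel art at an integer scale factor. The function names and the `strength` parameter are made up for illustration. It also cheats: it treats plain luminance as "lighting", so surface colour in the render leaks into the result. A real version would probably need a proper albedo/shading separation.

```python
import numpy as np

def luminance(rgb):
    # Per-pixel brightness using Rec. 709 luma weights; (H, W, 3) -> (H, W).
    return rgb @ np.array([0.2126, 0.7152, 0.0722])

def box_downsample(img, factor):
    # Average each factor x factor block so the map matches the pixel-art grid.
    h, w = img.shape[:2]
    h2, w2 = h // factor, w // factor
    img = img[:h2 * factor, :w2 * factor]
    return img.reshape(h2, factor, w2, factor).mean(axis=(1, 3))

def transfer_lighting(pixel_art, rendered, factor, strength=1.0):
    # pixel_art: (h, w, 3) floats in [0, 1]
    # rendered:  (h * factor, w * factor, 3) floats in [0, 1]
    # Assumption: luminance of the render stands in for its lighting.
    light = box_downsample(luminance(rendered), factor)
    # Normalise to relative shading centred on 1.0 so overall brightness is kept.
    light = light / max(light.mean(), 1e-6)
    # Blend between "no change" (strength=0) and the full shading map (strength=1).
    light = 1.0 + strength * (light - 1.0)
    # Multiply the shading onto the original colours, keeping the palette hues.
    return np.clip(pixel_art * light[..., None], 0.0, 1.0)

# Toy example: flat grey 2x2 art, render is darker on its left half.
art = np.full((2, 2, 3), 0.5)
render = np.ones((4, 4, 3))
render[:, :2] = 0.5
lit = transfer_lighting(art, render, factor=2)
```

In the toy example the left column of `lit` ends up darker than the right, while a uniformly lit render leaves the art unchanged. Whether this holds up on real ControlNet output is exactly the part I'm unsure about.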