To clarify for people who don't follow NeRF techniques: this research is not prompt-based. The algorithm reconstructs the 3D scene from real-life photos. There is some very promising work combining NeRF-based techniques with various generative models to create 3D objects from text prompts, but it doesn't seem close to producing anything at this scale or level of detail yet. I do agree this is a future possibility, though.