> written, composed, and (deep-faked) performed
Ok, so:
1- Written: ChatGPT already spits out pop-hit-ready lyrics (not a high bar...)
2- Composed: AI results are, by definition, perfectly attuned to our tastes. Plus, pop melodies are the easiest. So if you curate a good one out of, say, 1000 random (MIDI) generations, you can compose a hit with it.
3- Performed: Videos are more difficult than images, but we are already conditioned to CGI- and VFX-heavy music videos (even low-budget ones). So I don't think it would be difficult to generate an AI-powered ("collage"-style) deep-faked music video.
It is not over, though. You did not mention whether the sound is totally AI-generated or whether humans arrange/produce and play/sing on it.
As you can glimpse from OpenAI's Jukebox [1], IMO the really difficult part may be convincingly generating these two elements together with the three others as a coherent (less hallucinatory) whole.
[1] https://jukebox.openai.com/