edit:
The creator seems to give more detail across several of their videos, but as far as I can tell (and this is probably a simplification) the birdsong-to-3D projection works by taking the Mel spectrogram, deriving 40 features from it (it's unclear exactly what they are), and passing those to PCA to get 3D vectors.
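
For anyone who wants to play with the idea, here is a rough NumPy-only sketch of that kind of pipeline. It is not the creator's code. I'm assuming the 40 features are simply 40 log-Mel band energies per frame, that PCA is run over frames, and that a synthetic chirp can stand in for a real recording. The function names (`mel_filterbank`, `log_mel_frames`, `pca_project`) and the FFT/hop sizes are my own choices for illustration.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels=40):
    """Triangular filters spaced evenly on the Mel scale: (n_mels, n_fft//2 + 1)."""
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    hz_edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def log_mel_frames(signal, sr, n_fft=1024, hop=256, n_mels=40):
    """Log-Mel spectrogram, one row per frame: (n_frames, n_mels)."""
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mel + 1e-10)  # small offset avoids log(0)

def pca_project(features, n_components=3):
    """Project rows onto the top principal components via SVD."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Assumption: a one-second synthetic chirp stands in for a birdsong clip.
sr = 22050
t = np.arange(sr) / sr
chirp = np.sin(2 * np.pi * (2000.0 * t + 1500.0 * t ** 2))

features = log_mel_frames(chirp, sr)   # 40 assumed features per frame
points_3d = pca_project(features)      # one 3D point per frame
print(features.shape, points_3d.shape)
```

Each frame of audio ends up as one point in 3D, so a clip traces out a path. If the real pipeline summarises a whole clip into a single 40-dimensional vector instead, the same `pca_project` step would apply with one row per clip.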