You can think of contrastive learning as two separate models that take different inputs and produce vectors of the same length as outputs. Both models are trained jointly on pairs of training data (in this case fMRI recordings and the images the subject was viewing), with an objective that pushes matching pairs toward similar vectors and non-matching pairs apart.
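As a concrete sketch of that objective, here is a toy CLIP-style symmetric contrastive (InfoNCE) loss in plain numpy. This is an illustration of the general technique, not the paper's actual training code: the function name, temperature value, and batch setup are all assumptions, and in practice each `*_vecs` matrix would come from a trainable encoder.

```python
import numpy as np

def clip_style_loss(fmri_vecs, image_vecs, temperature=0.07):
    """Symmetric contrastive (InfoNCE) loss over a batch of paired vectors.

    fmri_vecs, image_vecs: (batch, dim) outputs of the two encoders.
    Row i of each matrix is a matching pair (the positive); every other
    row in the batch serves as a negative example.
    """
    # L2-normalise so the dot product is cosine similarity
    f = fmri_vecs / np.linalg.norm(fmri_vecs, axis=1, keepdims=True)
    g = image_vecs / np.linalg.norm(image_vecs, axis=1, keepdims=True)
    logits = f @ g.T / temperature  # (batch, batch) similarity matrix

    def xent(l):
        # cross-entropy with the diagonal (the matching pair) as the target
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # average over both directions: fMRI -> image and image -> fMRI
    return 0.5 * (xent(logits) + xent(logits.T))
```

Minimising this loss is what forces the two encoders to agree: perfectly matched vectors give a near-zero loss, while mismatched pairings give a large one.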
What the LAION-5B retrieval results show is that they did a good enough job of this training that the models produce similar vectors for nearly any matching fMRI/image pair.
Then they train a prior model, which basically says: "our fMRI vectors are essentially image vectors with an arbitrary amount of noise mixed in (the noise being the residual mismatch between the two contrastive encoders). Train a model to remove that noise, and we're left with image vectors."
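To make that denoising idea concrete, here is a deliberately minimal stand-in: a least-squares map fitted to project noisy fMRI-space vectors back toward their image-space counterparts. The real prior is a learned generative model, not a linear map, and all the dimensions and noise levels below are made-up toy values; this only illustrates the "learn to remove the mismatch" step.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pairs, dim = 500, 16

# Pretend these are the image encoder's vectors...
image_vecs = rng.normal(size=(n_pairs, dim))
# ...and the fMRI encoder produces them with extra noise on top
# (standing in for the residual mismatch between the two encoders)
fmri_vecs = image_vecs + 0.5 * rng.normal(size=(n_pairs, dim))

# Toy "prior": the least-squares map from fMRI-space to image-space
W, *_ = np.linalg.lstsq(fmri_vecs, image_vecs, rcond=None)
denoised = fmri_vecs @ W

before = np.mean((fmri_vecs - image_vecs) ** 2)
after = np.mean((denoised - image_vecs) ** 2)
```

Even this crude linear version reduces the gap to the true image vectors (`after < before`); the paper's learned prior plays the same role, just with a far more expressive model.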
So yes, this is an impressive result at first glance and not some overfitting trick.
It’s also sort of bread and butter at this point (replace fMRI with “text” and that’s just what Stable Diffusion is).
There’ll be lots of these sorts of results coming out soon.