In this case, the real bear has a blue ribbon and the "reconstructed" bear has a red ribbon. Is the ribbon actually in the fMRI data and the computer chose the wrong color, or did most of the images in the training set have ribbons and the computer just added one?
Imagine something like this is used in the future to produce something like https://en.wikipedia.org/wiki/Facial_composite . People may give too much weight to the details and arrest someone only because the computer imagined some detail, like the logo on a baseball cap.
Wow, we went from "tech not working" to "tech might kill someone" super fast here.
From what I understand, regular Stable Diffusion starts by generating random noise and then iteratively denoising it, hallucinating plausible detail at each step. The more steps you let it run, the better the results.
So instead of starting with meaningless random noise, they're using the fMRI data as the starting point. But without the text prompt, you wouldn't get the right image. If you were looking at a cat but told it you were looking at a house, you'd probably end up with a small house, similar to one in its training set, positioned roughly where the cat was in the original image.
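To make the idea concrete, here's a toy sketch of that difference, under loose assumptions: `denoise_step` is a hypothetical stand-in for the learned denoiser (a real model predicts noise from the sample, timestep, and text prompt), and the "fMRI latent" is just simulated as a noisy version of the target. This is not the paper's method or real Stable Diffusion, only an illustration of "iterate a denoiser, but pick a better starting point":

```python
import numpy as np

rng = np.random.default_rng(0)
target = np.ones(16)  # pretend this is the clean image we want

def denoise_step(x, target, strength=0.1):
    # Hypothetical denoiser: nudges the sample a fraction of the way
    # toward the clean target each step (a real model is a neural net).
    return x + strength * (target - x)

def run_diffusion(x0, steps):
    # More steps -> more denoising -> closer to the target.
    x = x0
    for _ in range(steps):
        x = denoise_step(x, target)
    return x

# Standard generation: start from pure random noise.
from_noise = run_diffusion(rng.standard_normal(16), steps=50)

# fMRI-conditioned: start from a latent decoded from brain data,
# simulated here as the target plus some noise (an assumption).
fmri_latent = target + 0.5 * rng.standard_normal(16)
from_fmri = run_diffusion(fmri_latent, steps=50)

# Both runs converge toward the target; the fMRI start is simply
# closer to the answer before the loop even begins.
print(np.abs(from_noise - target).mean())
print(np.abs(from_fmri - target).mean())
```

The point of the sketch is that the denoising loop itself is identical in both cases; all the fMRI data contributes here is a better-informed initial sample.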
One open question in the field: how do you assess the alignment of the AI's outputs across different methods?
I.e., is there actually more information than a few bits encoding a crude object category, from which Stable Diffusion then hallucinates the rest (or uses to regurgitate an over-fit training image)?
Or are there many bits, corresponding spatially to different regions of the stimulus, allowing for some meaningful degree of generalization?