In this case, the real bear has a blue ribbon and the "reconstructed" bear has a red ribbon. Is the ribbon actually in the fMRI data and the computer chose the wrong color, or did most of the images in the training set have ribbons and the computer just added one?
Imagine something like this is used in the future to produce something like https://en.wikipedia.org/wiki/Facial_composite . People may give too much weight to the details and arrest someone only because the computer imagined some detail, like the logo on a baseball cap.
Wow, we went from "tech not working" to "tech might kill someone" super fast here.
From what I understand, regular Stable Diffusion starts by generating random noise and then iteratively denoising it, hallucinating plausible detail at each step. The more steps you let it run, the better the results.
So instead of starting with meaningless random noise, they're using the fMRI data as the starting point. But without the text prompt, you wouldn't get the right image. If you were looking at a cat but told it you were looking at a house, you'd probably end up with a small house, similar to one in its training set, positioned roughly where the cat was in the original image.
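To make the idea concrete, here's a toy sketch of that difference, under loose assumptions: `denoise_step` is a hypothetical stand-in for the learned denoiser (a real model predicts noise from the sample, timestep, and text prompt), and the "fMRI latent" is just simulated as a noisy version of the target. This is not the paper's method or real Stable Diffusion, only an illustration of "iterate a denoiser, but pick a better starting point":

```python
import numpy as np

rng = np.random.default_rng(0)
target = np.ones(16)  # pretend this is the clean image we want

def denoise_step(x, target, strength=0.1):
    # Hypothetical denoiser: nudges the sample a fraction of the way
    # toward the clean target each step (a real model is a neural net).
    return x + strength * (target - x)

def run_diffusion(x0, steps):
    # More steps -> more denoising -> closer to the target.
    x = x0
    for _ in range(steps):
        x = denoise_step(x, target)
    return x

# Standard generation: start from pure random noise.
from_noise = run_diffusion(rng.standard_normal(16), steps=50)

# fMRI-conditioned: start from a latent decoded from brain data,
# simulated here as the target plus some noise (an assumption).
fmri_latent = target + 0.5 * rng.standard_normal(16)
from_fmri = run_diffusion(fmri_latent, steps=50)

# Both runs converge toward the target; the fMRI start is simply
# closer to the answer before the loop even begins.
print(np.abs(from_noise - target).mean())
print(np.abs(from_fmri - target).mean())
```

The point of the sketch is that the denoising loop itself is identical in both cases; all the fMRI data contributes here is a better-informed initial sample.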
One open question in the field: how do you assess the alignment of the AI's outputs across different methods?
I.e., is there actually more information than a few bits encoding a crude object category, from which Stable Diffusion then hallucinates the rest (or uses to regurgitate an over-fit training image)?
Or are there many bits, corresponding spatially to different regions of the stimulus, allowing for some meaningful degree of generalization?