We are no more in touch with physical reality than an LLM is, unless we are in the habit of pressing our brains against things. Everything is interpreted through a symbolic map.
Models see when photons hit camera sensors; you see when photons hit your retina. Both are a kind of sight.
When photons hit the retina, the same kind of photochemical transduction happens, but the signal does not stop at measurement. It flows through a living system that integrates it with memory, emotion, context, and self-awareness. The brain does not just register and store the light; it constructs an experience of seeing, a subjective phenomenon: qualia.
Once models start continuously learning from subjective visual experience, hit me up and I'll agree that models "see objects". Until a direct, raw, unlabelled stream of photosensor data about the world around them can actually make a model learn anything, they are not even close to "seeing".
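For concreteness, here is a minimal sketch of what "learning from a raw, unlabelled stream" could mean in practice: self-supervised next-frame prediction, where the stream supervises itself and no human labels enter anywhere. It assumes PyTorch; the tiny model, the shapes, and the random tensors standing in for camera frames are all illustrative choices, not anything the argument above prescribes.

```python
# Illustrative sketch: self-supervised learning on an unlabelled frame stream.
# The only training signal is the stream itself (predict the next frame).
import torch
import torch.nn as nn

class TinyFramePredictor(nn.Module):
    """Predicts the next frame from the current one; no labels involved."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),  # 64x64 -> 32x32
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),  # 32x32 -> 16x16
            nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, kernel_size=4, stride=2, padding=1),  # 16 -> 32
            nn.ReLU(),
            nn.ConvTranspose2d(16, 3, kernel_size=4, stride=2, padding=1),   # 32 -> 64
        )

    def forward(self, frame):
        return self.decoder(self.encoder(frame))

model = TinyFramePredictor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Stand-in for an unlabelled camera stream: pairs of consecutive frames.
# In a real setup these would come straight off the sensor.
for step in range(100):
    frame_t = torch.rand(8, 3, 64, 64)                      # batch of current frames
    frame_t1 = frame_t + 0.01 * torch.randn_like(frame_t)   # the "next" frames
    pred = model(frame_t)
    loss = loss_fn(pred, frame_t1)   # supervision comes from the stream itself
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The point of the sketch is only that the loss is computed against the incoming stream, with no labelling step anywhere; whether that counts as "seeing" is exactly what is in dispute above.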