Better HN

0 points · minimaxir · 2y ago · 0 comments

> So “dog” vs. an image of a dog would both translate to a primordial signal: identity representation, and in the domain of frequency do the comparison and project a coordinate in the spatial sense, and eventually those two nodes would more likely be triggered at the same time due to the likelihood of “dog” being next to an image of a dog when parsing information across future events.

That is how CLIP embeddings work and were trained to work.
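CLIP's contrastive training can be sketched in a few lines. This is a minimal pure-Python sketch, not CLIP's actual code. It assumes toy hand-written 3-d vectors in place of real encoder outputs and a fixed temperature of 0.07 (CLIP learns the temperature; 0.07 is only its initial value). Each image and caption is unit-normalised, every image is scored against every caption, and a symmetric cross-entropy rewards the matching pairs on the diagonal. The function names are hypothetical, not part of any library.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def similarity_matrix(image_embs, text_embs):
    """sims[i][j] = cosine similarity between image i and caption j."""
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(i, t)) for t in txts] for i in imgs]


def cross_entropy_diagonal(rows):
    """Mean cross-entropy where the correct column for row k is column k."""
    total = 0.0
    for k, row in enumerate(rows):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[k]
    return total / len(rows)


def clip_style_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss over image->text and text->image directions."""
    sims = similarity_matrix(image_embs, text_embs)
    logits = [[s / temperature for s in row] for row in sims]
    transposed = [list(col) for col in zip(*logits)]
    return (cross_entropy_diagonal(logits) + cross_entropy_diagonal(transposed)) / 2


# Toy data: pair k is (image k, caption k), e.g. a dog photo and the text "dog".
images = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.1]]
texts_aligned = [[0.9, 0.2, 0.0], [0.1, 0.9, 0.0]]
texts_swapped = list(reversed(texts_aligned))

print(clip_style_loss(images, texts_aligned) < clip_style_loss(images, texts_swapped))  # True
```

The only training signal is which image and caption appeared together. Minimising this loss pulls each co-occurring pair toward the same region of the shared embedding space and pushes the mismatched pairs apart.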

Hugging Face transformers now has get_image_features() and get_text_features() methods for CLIP models, which make it easy to get the embeddings for each modality: https://huggingface.co/docs/transformers/model_doc/clip#tran...
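A rough sketch of calling those two methods, assuming a transformers 4.x release (where they return plain tensors), PyTorch and Pillow installed, and the public "openai/clip-vit-base-patch32" checkpoint. The wrapper name clip_embeddings and the file "dog.jpg" are hypothetical. The imports sit inside the function only so the sketch can be loaded without those packages present.

```python
def clip_embeddings(image_path, captions, model_name="openai/clip-vit-base-patch32"):
    """Return unit-normalised image and text embeddings from a CLIP checkpoint.

    Hypothetical helper. Imports are local so this module loads even where
    torch/transformers are missing; calling it needs torch, transformers
    and Pillow, and downloads the checkpoint on first use.
    """
    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained(model_name)
    processor = CLIPProcessor.from_pretrained(model_name)
    inputs = processor(text=captions, images=Image.open(image_path),
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        image_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
        text_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                           attention_mask=inputs["attention_mask"])
    # Unit-normalise so a dot product between the two is a cosine similarity.
    image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    return image_emb, text_emb


# Usage (hypothetical image file):
# image_emb, text_emb = clip_embeddings("dog.jpg", ["a photo of a dog", "a photo of a cat"])
# scores = (image_emb @ text_emb.T).squeeze(0)  # one cosine score per caption
```

Both embeddings land in the same projection space, so the "dog" caption and a dog photo can be compared directly with a dot product.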

pyinstallwoes · 2y ago

Yeah, but it doesn’t use a universal method, does it? And it requires labeling.

The method I’m describing requires no labeling. Labeling would be a local-only translation (an alias). Labels emerge based on meaning. But the labels are more of an interface, not the actual nodes themselves, which arise from the not-identity principle * event proximity * comparisons.

warkdarrior · 2y ago

Labeling (which is typically manual and thus not scalable) is a proxy for comparisons. Two things are the same if they have the same label. The question is how else to encode the comparison information.

pyinstallwoes · 2y ago

Right, one is manual and the other is automatic, and my hypothesis is that you can have automatic universal labeling the way I describe.
