I'm a little surprised that the paper doesn't seem to mention the effect of fine-tuning pretrained image and text encoders taken from elsewhere instead of learning the representations from scratch. I would naively expect that to take way less compute to get good results, and possibly to generalize better.
I guess the point is to test whether this technique is actually good for learning new representations from scratch? Still, I'm sure they must have run the experiment at some point just to see, and it would've been really interesting to see the numbers.
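For what it's worth, the objective wouldn't change either way: whether the encoders are trained from scratch or initialized from pretrained checkpoints and fine-tuned, both are optimized with the same symmetric contrastive (InfoNCE) loss over paired image/text embeddings. A minimal numpy sketch of that loss (the function name and implementation are my own illustration, not code from the paper):

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    Row i of img_emb and row i of txt_emb are assumed to be a matching
    pair; every other row in the batch serves as a negative.
    """
    # L2-normalize so the dot product is cosine similarity
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)

    logits = img @ txt.T / temperature   # (N, N); matching pairs on the diagonal
    labels = np.arange(len(logits))

    def cross_entropy(l, y):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(y)), y].mean()

    # symmetric: classify text given image, and image given text
    return 0.5 * (cross_entropy(logits, labels)
                  + cross_entropy(logits.T, labels))
```

So the from-scratch vs. fine-tune question is purely about initialization and how much compute it takes for this loss to go down, which is exactly why the missing ablation would have been interesting.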
and it is evident in the generalization and robustness: ~74% higher accuracy on the adversarial benchmark, and similar or superior results on the standard ImageNet benchmark
great resource on transformers, which underpin the openai breakthroughs (the "T" in GPT-3 stands for transformer): http://peterbloem.nl/blog/transformers
great reddit sub for tracking ml research and trends: https://www.reddit.com/r/machinelearning
great email newsletter on ml from andrew ng: https://www.deeplearning.ai/thebatch/
also, someone must maintain old code or things break. thanks for your contributions.
It's entirely possible to learn this stuff to near state-of-the-art level with a year or two of self-directed study outside of work. All the research is published open access, the software is open source, there are many high quality datasets available, hardware access is simple and free through Colab, and there are dozens of online courses and lectures at every level from beginner all the way to the cutting edge, again all free.