I'm a little surprised that the paper doesn't seem to mention the effect of fine-tuning pretrained image and text encoders taken from elsewhere instead of learning the representations from scratch. I would naively expect that to take way less compute to get good results, and possibly to generalize better.
I guess the point is to test whether this technique is actually good for learning new representations from scratch? Still, I'm sure they must have run the experiment at some point just to see, and it would've been really interesting to see the numbers.
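For what it's worth, the objective wouldn't change either way: whether the encoders are trained from scratch or initialized from pretrained checkpoints and fine-tuned, both are optimized with the same symmetric contrastive (InfoNCE) loss over paired image/text embeddings. A minimal numpy sketch of that loss (the function name and implementation are my own illustration, not code from the paper):

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    Row i of img_emb and row i of txt_emb are assumed to be a matching
    pair; every other row in the batch serves as a negative.
    """
    # L2-normalize so the dot product is cosine similarity
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)

    logits = img @ txt.T / temperature   # (N, N); matching pairs on the diagonal
    labels = np.arange(len(logits))

    def cross_entropy(l, y):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(y)), y].mean()

    # symmetric: classify text given image, and image given text
    return 0.5 * (cross_entropy(logits, labels)
                  + cross_entropy(logits.T, labels))
```

So the from-scratch vs. fine-tune question is purely about initialization and how much compute it takes for this loss to go down, which is exactly why the missing ablation would have been interesting.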
and it is evident in the generalization and robustness: ~74% higher accuracy on the adversarial benchmark, and similar or superior results on the standard ImageNet benchmark
great resource on transformers, which underpin the openai breakthroughs (the "T" in GPT-3 stands for transformer): http://peterbloem.nl/blog/transformers
great reddit sub for tracking ml research and trends: https://www.reddit.com/r/machinelearning
great email newsletter on ml from andrew ng: https://www.deeplearning.ai/thebatch/
also, someone must maintain old code or things break. thanks for your contributions.
It's entirely possible to learn this stuff to near state-of-the-art level with a year or two of self-directed study outside of work. All the research is published open access, the software is open source, there are many high quality datasets available, hardware access is simple and free through Colab, and there are dozens of online courses and lectures at every level from beginner all the way to the cutting edge, again all free.