You could argue game engines are notoriously complex, but the Linux kernel would like a word.
Software modules are commoditized at this point. Just pick your building blocks and spend all your time tying them together in ever-increasing complexity. Curl (and similar tools) is clearly a commodity, but the web scraper that lets you compare Apple to Orange in real time may be what gives you the edge. And the first-order analysis of that data is even more likely to be what lifts you above the competition.
Thanks for the effort you're putting into this!
The constraint is largely hardware. The incremental post-training done via transfer learning generally isn't broadly applicable across use cases.
MoEs would work well with this paradigm. The whole point is to have discrete, fully separate experts, so if you train on one task and I train on another, our patches likely won't touch the same experts even without any special tricks. You could even go so far as to patch the dispatch layer and plug in brand-new experts. MoEs would be able to accumulate lots of patches and merge them with little difficulty. If this paradigm catches on, it might well justify MoEs on its own, regardless of the touted benefits of more efficient training and much cheaper forward passes.
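To make that concrete, here's a toy sketch of disjoint-expert patching, assuming a patch is stored as a dict of per-expert weight deltas. The names (MoELayer, apply_patch) are made up for illustration, not from any real library:

    import torch
    import torch.nn as nn

    class MoELayer(nn.Module):
        def __init__(self, dim=64, num_experts=8):
            super().__init__()
            self.router = nn.Linear(dim, num_experts)      # the dispatch layer
            self.experts = nn.ModuleList(
                nn.Linear(dim, dim) for _ in range(num_experts)
            )

        def forward(self, x):                              # x: (tokens, dim)
            expert_idx = self.router(x).argmax(dim=-1)     # top-1 routing
            return torch.stack(
                [self.experts[int(i)](t) for t, i in zip(x, expert_idx)]
            )

    def apply_patch(layer, patch):
        # patch: {expert_index: weight_delta}; a task-specific patch
        # only touches the few experts that task actually exercised
        with torch.no_grad():
            for i, delta in patch.items():
                layer.experts[i].weight += delta

    layer = MoELayer()
    # two patches trained independently on different tasks: they hit
    # disjoint experts, so "merging" is just applying both, no conflicts
    patch_a = {0: 0.01 * torch.randn(64, 64), 3: 0.01 * torch.randn(64, 64)}
    patch_b = {5: 0.01 * torch.randn(64, 64)}
    apply_patch(layer, patch_a)
    apply_patch(layer, patch_b)
    out = layer(torch.randn(10, 64))                       # (10, 64)

Since each delta lands on a different expert's weights, merging is trivial; a conflict only arises if two patches hit the same expert index, and even then you could plug in a fresh expert instead.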
Perceiver would have more trouble. Perceiver is like an RNN for Transformers: relatively few weights, applied repeatedly and intensively to a small latent that encodes the knowledge about the input. Even with tricks, patches are going to fight over how to change those shared weights and the knowledge they encode. A few patches might work, but a lot of them will get ugly.
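For contrast, a toy sketch of Perceiver-style weight sharing (hypothetical class, plain PyTorch) shows where the collision happens:

    import torch
    import torch.nn as nn

    class TinyPerceiver(nn.Module):
        def __init__(self, dim=64, latent_len=16, steps=8):
            super().__init__()
            self.latent = nn.Parameter(torch.randn(latent_len, dim))
            # ONE attention block, reused at every step -- this is the
            # RNN-like part, and it's where every patch must land
            self.attend = nn.MultiheadAttention(dim, num_heads=4,
                                                batch_first=True)
            self.steps = steps

        def forward(self, inputs):                 # inputs: (batch, seq, dim)
            z = self.latent.expand(inputs.size(0), -1, -1)
            for _ in range(self.steps):
                # every pass reuses self.attend's weights, so a patch to
                # them changes all steps at once; two task patches can't
                # avoid each other the way disjoint experts can
                z, _ = self.attend(z, inputs, inputs)
            return z

    model = TinyPerceiver()
    latents = model(torch.randn(2, 100, 64))       # (2, 16, 64)

Every refinement step runs through the same attention weights, so independently trained deltas have to edit the exact same tensors and will interfere.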