Perhaps; in many cases the architecture barely matters. Transformers needed a lot of extra tricks to work well, and the ConvNeXt paper showed that applying those same tricks to convolutional networks can close the gap.
https://arxiv.org/abs/2201.03545