This site is the bread and butter of every Research Engineer and Scientist working in Deep Learning. You use the site almost every day.
Advanced learners also use the site regularly.
You just assume that "everyone knows" and never think of sharing the site on HN.
Knowledge about a field transfers best by hands-on association with people who practice it. Before widespread IT, communities of practice were local and relatively homogeneous, so it was easy to share the essentials of a field quickly and get newcomers up and running with best practices.
Nowadays, however, communities of practice are dispersed: practitioners come from all over the world with very different backgrounds and communicate through low-bandwidth channels, and we're so flooded with information that it's difficult to ascertain what is essential and what is accessory.
It is much more difficult for an outsider to grasp the essential qualities of a field they want to enter, as there are usually no guides comprehensive enough to detail everything you need to know.
I have never found a subject that needed, say, more than 10 minutes of internet searching to decide whether it's worth pursuing.
It was much harder before the web. I remember as a kid seeing books about C++ in the local shop and, even after looking inside, not understanding what C++ was. Nowadays I would get my answer almost instantly.
You couldn't possibly believe this if you were old enough to remember what preceded the internet.
Good lord, no, today is not worse than microfiche and card catalogs.
“Bread and butter” for me is http://arxiv-sanity.com
https://news.ycombinator.com/item?id=19054501 (Feb 1, 2019) 411 points, 23 comments
https://news.ycombinator.com/item?id=23391934 (June 2, 2020) 304 points, 21 comments
- Checking the state of the art (SotA) for a given problem. For some problems, 2-year-old solutions are still close to SotA; for others, there is a huge difference. And if there is a huge difference, is it because of architecture and parameter tuning, or because of totally different architectures and training modes?
- Running code - to be used somewhere, or as a reference. Papers never include all the details, and prose doesn't compile.
Context: I used to work in the field as a consultant. I do cite Papers with Code in one overview paper, though.
Pre-parenthesis part is dead serious, parenthesis part is slightly hyperbolic due to accumulated trauma with bad reviewers
"Ten Thousand"
I get that your run-of-the-mill paper saying "Here we present a novel algorithm for xyz" will usually have the algorithm defined in simple pseudo-code, maybe with an implementation in a "real" language as a proof of concept.
But for the many papers describing novel ML models, how does that work? They seem to use images that diagram out the different layers of the model. But is that truly "universal" the way that a pseudo-code algorithm is universal? As in, if the authors use PyTorch (or whatever), can I take the exact model they describe in their paper and apply it in MyFavoriteMLToolkit and achieve similar results?
I guess my question is, what are the "primitives" of papers describing ML models? Is saying "convolutional layer" enough, or do they also describe the dozens of hyper-parameters, etc?
The longer answer is that the rest of the Conv2D configuration can easily be overlooked unless it is changed from the defaults. And those defaults can differ across frameworks and potentially break things, if they even exist in your preferred framework. You can always create custom layers, though, if needed.
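To make the "defaults" point concrete, here's a toy single-channel convolution (illustrative only, not any framework's implementation) where the stride and padding that papers often leave implicit visibly change the output shape:

```python
import numpy as np

def conv2d(x, k, stride=1, padding=0):
    """Toy single-channel 2D convolution (cross-correlation), with the
    stride/padding defaults that papers often leave implicit."""
    if padding:
        x = np.pad(x, padding)
    kh, kw = k.shape
    oh = (x.shape[0] - kh) // stride + 1
    ow = (x.shape[1] - kw) // stride + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            window = x[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(window * k)
    return out

x = np.arange(25.0).reshape(5, 5)
k = np.ones((3, 3))
print(conv2d(x, k).shape)              # "valid"-style defaults: (3, 3)
print(conv2d(x, k, padding=1).shape)   # "same"-style padding: (5, 5)
```

If a paper just says "a 3x3 convolution", the output shape (and thus every downstream layer) depends on which of these conventions the authors' framework defaulted to.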
But many papers also seem to do a bad job of describing the actual structure of their own network. They can be vague, confusing, or simply inaccurate. That can be because the model is a general concept with flexible details, or because the authors struggle to put it into clear words and diagrams. Or simply because they know the code is going to do the heavy lifting.
To keep things simple, I'd say the true "primitives" of ML models can be reduced to mathematical formulas. For example, a plain old feed-forward network is implemented as matrix multiplication. Sprinkle in a bit of calculus to analytically derive the formula for back-propagating errors (aka training), and you have the basic building blocks of modern deep learning. Convolutions, Transformers, etc. are just slightly fancier spins on the same mathematical foundations.
Hyper-parameters are essentially tunable variables in a formula. I'd say your instinct is spot on - they are absolutely necessary to capture for reproducible results.
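To make that concrete, here is a toy one-hidden-layer network (illustrative, with made-up data and arbitrary hyper-parameters) written as nothing but matrix multiplies plus hand-derived gradients:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))                       # 8 samples, 3 features
y = (X.sum(axis=1, keepdims=True) > 0).astype(float)

# Hyper-parameters: hidden width, init scale, learning rate, steps.
W1 = rng.normal(size=(3, 4)) * 0.5
W2 = rng.normal(size=(4, 1)) * 0.5
lr = 0.5

for _ in range(500):
    h = np.tanh(X @ W1)                 # forward pass: matrix multiplies
    p = 1.0 / (1.0 + np.exp(-(h @ W2))) # sigmoid output
    # Backward pass: chain rule derived by hand.
    # For sigmoid + cross-entropy, d(loss)/d(logits) = p - y.
    d_logits = (p - y) / len(X)
    dW2 = h.T @ d_logits
    dh = d_logits @ W2.T * (1.0 - h**2) # tanh'(z) = 1 - tanh(z)^2
    dW1 = X.T @ dh
    W1 -= lr * dW1
    W2 -= lr * dW2

acc = ((p > 0.5) == y).mean()
print(acc)
```

Every piece here is a formula; the framework's job is mostly to derive the backward pass automatically and run the matrix multiplies fast.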
If you have the code and the data the answer should be yes. You should be able to take that PyTorch code and translate it to MyFavoriteMLToolkit to obtain numerically identical results.
In practice, we face the same universal difficulties as other computer-science-based research: fighting inconsistencies in software and hardware, all the way down to the physics of the universe with cosmic-ray-induced bit flips, etc.
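Even summation order matters: floating-point addition is not associative, so the reduction order a framework happens to use changes the last bits of a result. A minimal Python illustration:

```python
import math

# Accumulating ten 0.1's left to right loses a bit to rounding;
# math.fsum tracks the exact sum and rounds once at the end.
xs = [0.1] * 10
print(sum(xs))        # 0.9999999999999999
print(math.fsum(xs))  # 1.0
```

Now imagine the same effect across parallel GPU reductions with nondeterministic ordering, and "numerically identical" becomes "identical up to a tolerance" in practice.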
Generally, yes.
If they are standard, well-known layers that exist in both PyTorch and TF, you can take a paper that was implemented in one, implement it in the other, and expect similar results (assuming you know a reasonable number of details[1]).
If they are non-standard layers it can be hard. There are lots of details that you need to port and even with access to the source code it can be easy to miss things.
[1] Here's an example of how things are implemented differently - you can still get the same result, but you need to know what you are doing: https://stackoverflow.com/questions/60079783/difference-betw...
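One classic gotcha in this family (not necessarily the one in the linked question) is the batch-normalization momentum convention: in Keras the momentum weights the old running statistic, while in PyTorch it weights the new batch statistic, so the same number means opposite things. A toy sketch of the two update rules (illustrative, not either framework's actual code):

```python
def update_keras_style(running, batch_mean, momentum=0.99):
    # Keras convention: momentum weights the OLD running statistic.
    return momentum * running + (1 - momentum) * batch_mean

def update_torch_style(running, batch_mean, momentum=0.1):
    # PyTorch convention: momentum weights the NEW batch statistic.
    return (1 - momentum) * running + momentum * batch_mean

# The same value means opposite things: Keras momentum=0.75 matches
# PyTorch momentum=0.25 (values chosen to be exact in binary floats).
r_keras = r_torch = 0.0
for m in [1.0, 2.0, 3.0]:
    r_keras = update_keras_style(r_keras, m, momentum=0.75)
    r_torch = update_torch_style(r_torch, m, momentum=0.25)
assert r_keras == r_torch
```

Copy a momentum value verbatim from one framework's config into the other and your running statistics drift in opposite directions, even though every layer "exists" in both.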
Nothing you can't figure out by reading source code of the two frameworks or by reading the documentation closely.
Generally, people don't seem to care about reproducing exact metrics - as long as it is close enough they're happy. You need to dig a bit deeper if you want the full quality.
My experience has been that pseudo-code is anything but universal.
In fact, having many times had to implement actual working code from research papers' pseudo-code, I would posit that pseudo-code is nothing but a license for academics to provide the reader with stuff that simply doesn't work and get away with it. Thanks to pseudo-code, they get to gently skip over the hard bits and get the paper out the door as quickly as possible.
Papers with actual, git-clonable, working code, should be the standard for CS academic publishing.
That's why a large number of journals now have requirements for publishing code and/or pretrained models (if applicable).
An annoying trend I've noticed: a number of SotA ML papers in video classification present multiple models but only publish the exact architecture & weights for the smaller models, which are merely as-good-as SotA (see Tiny Video Networks and X3D for examples).
Arxiv.org won't accept a PDF with attachments, though, so only a stripped-down version will end up there (once/if I get an endorsement, fingers crossed).
I copied this concept from Joe Armstrong, who suggested distributing Erlang modules as PDFs with the code files (*.erl) as attachments: "Documentation comes first, and the distribution should prioritize humans".
[1]: See Section A.1 of https://github.com/motiejus/wm/blob/main/mj-msc-full.pdf
I've stumbled upon a number of scientific papers from 2000s that include links to sourceforge for code listings. Most of those are dead now.
GitHub will not be there forever.
https://jeffhuang.com/best_paper_awards/
And here's the PapersWeLove repo with similar sauce
[1] Attention Is All You Need (2017) https://paperswithcode.com/paper/attention-is-all-you-need
Introduced the Transformer architecture and applied it to NLP tasks.
[2] The Annotated Transformer (2018) https://nlp.seas.harvard.edu/2018/04/03/attention.html
An “annotated” version of [1] in the form of a line-by-line PyTorch implementation. Super helpful for learning how to implement Transformers in practice!
[3] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018) https://paperswithcode.com/paper/bert-pre-training-of-deep-b...
One of the most highly cited papers in machine learning! Proposed an unsupervised pre-training objective called masked language modeling; learned bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers.
Bonus: https://nlp.stanford.edu/seminar/details/jdevlin.pdf
See the above slideshow from the primary author, noting the remarkably prescient conclusion: "With [unsupervised] pre-training, bigger == better, without clear limits (so far)"
[4] Conformer: Convolution-augmented Transformer for Speech Recognition (2020) https://paperswithcode.com/paper/conformer-convolution-augme...
Proposed an architecture combining aspects of CNNs and Transformers; performed data augmentation in frequency domain (spectral augmentation).
[5] Scaling Laws for Neural Language Models (2020) https://paperswithcode.com/paper/scaling-laws-for-neural-lan...
Arguably one of the most important papers published in the last 5 years! Studies empirical scaling laws for (Transformer) language models; performance scales as a power-law with model size, dataset size, and amount of compute used for training; trends span more than seven orders of magnitude.
[6] Language Models are Few-Shot Learners (May 2020, NeurIPS 2020 Best Paper) https://paperswithcode.com/paper/language-models-are-few-sho...
Introduced GPT-3, a Transformer model with 175 billion parameters, 10x more than any previous non-sparse language model. Trained on Azure's AI supercomputer, with training costs rumored to be over 12 million USD. Presented evidence that the average person cannot distinguish between real and GPT-3-generated news articles that are ~500 words long.
[7] CvT: Introducing Convolutions to Vision Transformers (May 2020) https://paperswithcode.com/paper/cvt-introducing-convolution...
Introduced the Convolutional vision Transformer (CvT) which has alternating layers of convolution and attention; used supervised pre-training on ImageNet-22k.
[8] Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition (Oct 2020) https://paperswithcode.com/paper/pushing-the-limits-of-semi-...
Scaled up the Conformer architecture to 1B parameters; used both unsupervised pre-training and iterative self-training. Observed through ablative analysis that unsupervised pre-training is the key to enabling growth in model size to transfer to model performance.
[9] Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity (Jan 2021) https://paperswithcode.com/paper/switch-transformers-scaling...
Introduced the Switch Transformer architecture, a sparse Mixture of Experts model advancing the scale of language models by pre-training up to 1-trillion-parameter models. The sparsely activated model has an outrageous number of parameters but a constant computational cost. The 1T-parameter model was distilled (shrunk) by 99% while retaining 30% of the performance benefit of the larger model. Findings were consistent with [5].
[10] ProtTrans: Towards Cracking the Language of Life's Code Through Self-Supervised Deep Learning and High Performance Computing (August 2021) https://paperswithcode.com/paper/prottrans-towards-cracking-...
Applied Transformer-based NLP models to classify & predict properties of protein structure for a given amino acid sequence, using supercomputers at Oak Ridge National Laboratory. Proved that unsupervised pre-training captured useful features; used the learned representation as input to small CNN/FNN models, yielding results challenging state-of-the-art methods, notably without using multiple sequence alignment (MSA) and evolutionary information (EI) as input. Highlighted a remarkable trend across an immense diversity of protein LMs and corpora: performance on downstream supervised tasks increased with the number of samples presented during unsupervised pre-training.
[11] CoAtNet: Marrying Convolution and Attention for All Data Sizes (December 2021) https://paperswithcode.com/paper/coatnet-marrying-convolutio...
Current state of the art Top-1 Accuracy on ImageNet.
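The masked-language-modeling objective from BERT [3] is easy to sketch. Below is a toy version (made-up token ids, a hypothetical MASK_ID, and a flat 15% masking rate; the actual BERT recipe also keeps some selected tokens unchanged or replaces them with random tokens):

```python
import numpy as np

MASK_ID = 103  # hypothetical id for the [MASK] token

def mask_tokens(token_ids, mask_prob=0.15, seed=0):
    """Toy BERT-style masking: select ~15% of positions, replace them
    with [MASK], and emit labels that are -100 (i.e. ignored by the
    loss) everywhere except at the masked positions."""
    rng = np.random.default_rng(seed)
    ids = np.asarray(token_ids)
    picked = rng.random(ids.shape) < mask_prob
    labels = np.where(picked, ids, -100)
    inputs = np.where(picked, MASK_ID, ids)
    return inputs, labels

ids = np.arange(1000, 1064)   # made-up token ids for one sequence
inputs, labels = mask_tokens(ids)
```

The model then sees `inputs` (with both left and right context intact around each [MASK]) and is trained to recover the original ids at exactly the masked positions, which is what makes the learned representations bidirectional.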
According to the latest ImageNet standings [2], ViT appears to have slipped to second place in Top-1 Accuracy. CoAtNet-7 is the new leader, but only by a slight margin and at the cost of what appears to be a significantly larger model.
[1] Scaling Vision Transformers https://paperswithcode.com/paper/scaling-vision-transformers
[2] https://paperswithcode.com/sota/image-classification-on-imag...