And I feel like we're far too dismissive of instances we see where good papers get rejected. We're too dismissive of the collusion rings. What am I putting in all this time to write and all this time to review (and be an emergency reviewer) if we aren't going to take some basic steps forward? Fuck, I've saved a Welling paper from rejection from two reviewers who admitted to not knowing PDEs, and this was a workshop (should have been accepted into the main conference). I think review works for those already successful, who can p̶a̶y̶ "perform more experiments when requested" their way out of review hell, but we're ignoring a lot of good work simply for lack of m̶o̶n̶e̶y̶ compute. It slows down our progress to reach AGI.
Not that I disagree, but I don't think that's a reason to not publish. There's another way to rephrase what you've said:
many ideas that work well at small scales do not trivially work at large scales
But this is true for many works, even transformers. You don't just scale by turning up model parameters and data. You can, but generally more things are going on. So why hold these works back because of that? There may be nuggets in there that may be of value and people may learn how to scale them. Just because they don't scale (now or ever) doesn't mean they aren't of value (and let's be honest, if they don't scale, this is a real killer for the "scale is all you need" people)
> Other ideas that work at super specialized settings don’t transfer or don’t generalize.
It is also hard to tell if these are hyper-parameter settings. Not that I disagree with you, but it is hard to tell.
> Correlations in huge multimodal datasets are way more complicated than most humans can grasp and we will not get to AGI before we can have a large enough group of people dealing with such data routinely.
I'm not sure I understand your argument here. The people I know that work at scale often have the worst understanding of large data. Not understanding the differences between density in a normal distribution and a uniform. Thinking that LERPing in a normal yields representative data. Or cosine similarity and orthogonality. IME people that work at scale benefit from being able to throw compute at problems.
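To make the LERP point concrete, here is a minimal sketch, assuming NumPy and an isotropic standard normal as a stand-in for a latent space (the function name and sample sizes are just for illustration). It measures how often a real sample has a norm as small as a typical LERP midpoint. In 2D that happens all the time; in high dimensions essentially never, because the mass sits in a thin shell and the midpoint falls inside it.

```python
import numpy as np

def frac_samples_as_small_as_midpoint(dim, n_pairs=2000, seed=0):
    """Fraction of real N(0, I) samples whose norm is <= the median norm
    of the LERP midpoint 0.5 * (a + b) between two independent samples."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n_pairs, dim))
    b = rng.standard_normal((n_pairs, dim))
    midpoint_norms = np.linalg.norm(0.5 * (a + b), axis=1)
    sample_norms = np.linalg.norm(a, axis=1)
    return float(np.mean(sample_norms <= np.median(midpoint_norms)))

for dim in (2, 32, 512):
    print(dim, frac_samples_as_small_as_midpoint(dim))
```

The midpoint is itself distributed as N(0, I/2), so its norm is about 0.71 of a typical sample's norm in any dimension. What changes with dimension is how tightly the sample norms concentrate, which is why the midpoint stops looking like data.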
> we don’t do anybody a favor by increasing the entropy of the publications in the huge ML conferences
You and I have very different ideas as to what constitutes information gain. I would say a majority of people studying two models (LLMs and diffusion) results in lower gain, not more.
And as I've said above, I don't care about novelty. It's a meaningless term. (and I wish to god people would read the fucking conference reviewer guidelines as they constantly violate them when discussing novelty)
Regarding large multimodal data, I don’t know what people you refer to, so I can’t comment further. The current math is useful but very limited when it comes to understanding the densities in such data; vectors are always orthogonal at high dim and densities are always sampled very poorly. The type of understanding of data that would help progress in drug and material design, say, is very different from the type of data that can help a chatbot code. Obviously the future AI should understand it all, but it may take interdisciplinary collaborations that best start at an early age and don’t fit the current academic system very well unfortunately.
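The near-orthogonality claim is easy to check numerically. This is a minimal sketch, assuming NumPy and independent Gaussian random vectors (real embeddings are not independent Gaussians, so treat it as the idealized case); the function name is made up for the example.

```python
import numpy as np

def mean_abs_cosine(dim, n_pairs=2000, seed=0):
    """Average |cosine similarity| between pairs of independent
    standard normal vectors in `dim` dimensions."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n_pairs, dim))
    b = rng.standard_normal((n_pairs, dim))
    cos = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    )
    return float(np.mean(np.abs(cos)))

for dim in (2, 100, 1000):
    print(dim, mean_abs_cosine(dim))
```

The average shrinks roughly like 1/sqrt(dim), so random directions get closer and closer to orthogonal as the dimension grows, without ever being exactly orthogonal.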
I'd like to push back on this quite a bit. We don't have AI that shows decent reasoning capabilities. You can hope that this will be resolved, but I'd wager that this will just become more convoluted. A thing that acts like a human, even at an indistinguishable level, need not also be human nor have the same capabilities of a human[0]. This question WILL get harder to answer in the future, I'm certain of that, but we do need to be careful.
Getting to the main point, metrics are fucking hard. The curse of dimensionality isn't just that there are lots of numbers, it is that your nearest neighbor becomes ambiguous. It is that the difference between the furthest point (neighbor) and the closest point (nearest neighbor) decreases. It is that orthogonality becomes a more vague concept. It is that the mean may not be representative of a distribution. This is stuff that is incredibly complex and convolutes the nature of these measurements. For AI to be better than us, it would have to actually reason, because right now we __decide__ not to reason and instead __decide__ to take the easy way out and act as if metrics are the same as they are in 2D (ignoring all advice from the mathematicians...).
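The shrinking gap between nearest and furthest neighbor can be sketched in a few lines. This assumes NumPy, points drawn uniformly from the unit hypercube, and Euclidean distance; the "relative contrast" name and the sample sizes are choices made for this illustration, not anything standard to a particular library.

```python
import numpy as np

def relative_contrast(dim, n_points=1000, seed=0):
    """(furthest - nearest) / nearest distance from one random query
    to n_points random points in the unit hypercube [0, 1]^dim."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, dim))
    query = rng.random(dim)
    dists = np.linalg.norm(points - query, axis=1)
    return float((dists.max() - dists.min()) / dists.min())

for dim in (2, 100, 1000):
    print(dim, relative_contrast(dim))
```

In 2D the furthest point is many times further than the nearest one. By 1000 dimensions the two differ by a small fraction of the distance itself, which is what makes "nearest" so fragile there.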
It is not necessarily about the type of data when the issue we're facing is at an abstraction of any type of data. Categorically they share a lot of features. The current mindset in ML is "you don't need math" when the current wall we face is highly dependent on understanding these complex mathematics.
I think it is incredibly naive to just rely on AI solving our problems. How do we make AI to solve problems when we __won't__ even address the basic nature of problems themselves?
[0] As an example, think about an animatronic duck. It could be very lifelike and probably even fool a duck. In fact, we've seen pretty low quality ones fool animals, including just ones that are static and don't make sounds. Now imagine one that can fly and quack. But is it a duck? Can we do this without the robot being sentient? Certainly! Will it also fool humans? Almost surely! (No, I'm not suggesting birds aren't real. Just to clarify)
Personally, this has affected me as a late PhD student. Late in the literal sense as I'm not getting my work pushed out (even some SOTA stuff) because of factors like these and my department insists something is wrong with me but will not read my papers, the reviews, or suggest what I need to do besides "publish more." (Literally told to me, "try publishing 5 papers a year, one should get in.") You'll laugh at this, I pushed a paper into a workshop and a major complaint was that I didn't give enough background on StyleGAN because "not everyone would be familiar with the architecture." (while I can understand the comment, 8 pages is not much room when you gotta show pictures on several datasets. My appendix was quite lengthy and included all requested information). We just used a GAN as a proxy because diffusion is much more expensive to train (most common complaints are "not enough datasets" and "how's it scale"). I think this is the reason so many universities use pretrained networks instead of training things from scratch, which just railroads research.
(I also got a paper double desk rejected. First because it was "already published." Took 2 months for them to realize it was arxiv only. Then they fixed that and rejected again because "didn't cite relevant works" with no mention of what those works were... I've obviously lost all faith in the review process)
So far, instead, I've seen:
- Banning social media posting so that only big tech and collusion posting can happen to "protect the little guy"
- Undoing the ban to lots of complaints
- Instituting a no LLM policy with no teeth and no method to actually verify
- Instituting a high school track to get those rich kids in sooner
Until I see changes like "we're going to focus on review quality" I'm going to continue thinking it is a scam. They get paid by my tax dollars, by private companies, and I volunteer time, for what...? Something an LLM could have actually done better? I'm seeing great papers from big (and small) labs get turned down while terrible papers are getting accepted. Collusion rings go unpunished. And methods get more and more convoluted as everyone tries to game the system. You'd think that, of all people, we in ML would understand reward hacking. But until we admit it, we can't solve it. And if we can't solve it here, how the hell are we going to convince anyone we're going to create safe AGI?
As for alternatives: I don't see why we don't just push to OpenReview and call it a day. We can link our code, it has revisions, and people can comment and review. I don't see how the opinion of 1-3 referees who don't want to read my paper, have no interest in it, and have strong incentives to reject it is any meaningful signal of value. I'll take arxiv over their opinions.
Thank you for fighting the good fight.
This is why I love OpenReview: I can spot and ignore nonsensical reviewer criticisms and ratings and look for the insightful comments and rebuttals. Many reviewers do put in a lot of very valuable work reading and critiquing, most of which would go to waste if not made public.
And I gotta say, I'm not going to put up a fight much longer. As soon as I get out of my PhD I intend to just post to OR.