I've always imagined AGI (perhaps naively) as being achieved by clever usage of ML, plus some utilization of classical/symbolic AI from pre-AI winter days, plus probably some unknown elements.
a). an internal feedback loop that evaluates a possible output without actuating it, and self-modifies the parameters if the possible output is not what is needed
b). the capability (based on a) to model its own behaviours without acting on them, and to model other agents' behaviours and incorporate that model into the feedback
c). the ability to switch between modelling its own behaviour and other agents' behaviour intentionally, by the model itself - as part of the feedback loop
i.e. what I feel is totally missing in self-driving cars today is the capability to model OTHER traffic participants' actions and intentions; an experienced and attentive human driver does this all the time: pays attention to pedestrians on the side in case they jump in front of the car, pays attention to where other cars are LIKELY to go, pays attention to how the bicyclist currently being overtaken may fall, even pays attention to a random soccer ball flying out of a courtyard because a kid may be chasing it. I am not seeing any self-driving car trying to model any agent other than itself.
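A toy sketch of the inner loop in (a), just to make the idea concrete: propose an action, run it through an internal model instead of actuating it, and nudge the parameters when the predicted outcome misses the goal. Everything here is made up for illustration; no real system is structured this simply.

```python
# Hypothetical sketch of (a): evaluate a candidate action with an internal
# model before actuating it, and self-modify the policy parameter otherwise.

def internal_model(state, action):
    # stand-in "world model": predicts the next state without acting
    return state + action

def feedback_loop(state, goal, param=0.5, lr=0.2, max_iters=50):
    for _ in range(max_iters):
        candidate = param * (goal - state)            # policy proposes an action
        predicted = internal_model(state, candidate)  # simulate, don't actuate
        error = goal - predicted
        if abs(error) < 1e-3:
            return candidate                          # good enough: actuate it
        param += lr * error / max(abs(goal - state), 1e-9)  # adjust parameters
    return None

print(feedback_loop(state=0.0, goal=10.0))  # converges to an action near 10.0
```

Points (b) and (c) would then amount to running the same kind of internal simulation over a model of another agent rather than over your own policy.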
If you are interested in self-driving cars, I can highly recommend their presentation from November 2021:
https://youtu.be/uJWN0K26NxQ?t=1467
For me it felt more convincing than Tesla's (a few months prior);
The thing that would convince me AGI is ready would be for it to play a convincing game of poker. Or join a conversation midway through, listen to it, and engage with it actively. Show that machines are able to pick up on social cues, understand them, and learn new ones. It's a high bar, yes, but it's in my opinion a prerequisite for a self-driving car that's able to share roadways with other cars, cyclists, and kids playing in the street.
"A robot modeled itself without prior knowledge of physics or its shape and used the self-model to perform tasks and detect self-damage."
The reasoning is that given enough training data the system would know the pedestrian is going to jump out or the cyclist is going to fall just based on sheer volume of training examples. It would have seen that scenario tons of times in the image data.
Whether that will actually work is the question though
Biology is glacially slow in comparison and one of the advantages from computing is being fast.
I believe that not modeling it is partially by design as a result of responsibility and blame frameworks. If you depend upon possible actions taken by others to be safe you are reckless. Extrapolating from current motions is more reliable than trying to profile everything. "They are moving towards the street at 3mph and 20 ft away, their vector will intersect with car, brake to avoid collision or accelerate enough to leave intersection zone before they can even reach us" seems a more reliable approach. It isn't like a kid will suddenly teleport into the road.
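A back-of-envelope version of that check, with the numbers from the comment (the deceleration and reaction-time values are my own assumptions, not anything a real planner uses):

```python
# Toy trajectory check: does the pedestrian's current vector reach the
# car's path before the car can stop or clear the crossing point?
FT_PER_MILE = 5280
SECONDS_PER_HOUR = 3600

def seconds_until_roadway(distance_ft, speed_mph):
    speed_ft_s = speed_mph * FT_PER_MILE / SECONDS_PER_HOUR
    return distance_ft / speed_ft_s

def stopping_distance_ft(speed_mph, decel_ft_s2=15.0, reaction_s=0.5):
    v = speed_mph * FT_PER_MILE / SECONDS_PER_HOUR
    return v * reaction_s + v * v / (2 * decel_ft_s2)

t_pedestrian = seconds_until_roadway(distance_ft=20, speed_mph=3)  # ~4.5 s
d_stop = stopping_distance_ft(speed_mph=25)                        # ~63 ft

print(f"pedestrian reaches roadway in {t_pedestrian:.1f} s")
print(f"car at 25 mph needs ~{d_stop:.0f} ft to stop")
# If the car cannot clear the intersection zone before t_pedestrian,
# brake now; no profiling of intent required.
```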
For what it's worth, this is my view as well. And I don't think it's particularly naive. Plenty of people have researched and/or are researching aspects of how to do this. But how to combine something like a neural network, with its distributed (and very opaque) representations, with an inference engine that "wants" to work with discrete symbols is non-obvious. Or at least it appears to be, since nobody apparently has figured out how to do it yet - at least not to the level of yielding AGI.
but I've never heard a compelling argument for why pure ML would get us there.
The simplistic argument would be that ML models are, in some sense, trying to replicate "what the brain does", and it stands to reason that if your current toy ANNs (and let's be honest - the largest ANNs built to date are toys compared to the brain) are something like the brain, then in principle, if you scale them up to "brain level" (in terms of numbers of neurons and synapses), you should get more intelligence. Now on the other hand, anybody working with ANNs today will tell you that they are at best "biologically inspired" and aren't even close to actually replicating what biological neural networks do. Soo... while people like Geoffrey Hinton have gone on record as saying that "ANNs are all you need" (I'm paraphrasing, and I don't have a citation handy, sorry), I tend to think that in the short term a valid approach is exactly what you suggested: combine ML and use it for what it's good at (pattern recognition, largely), and use "old fashioned" symbolic AI for the things that it is good at (reasoning / inference / etc.).
Now, to figure out how to actually do that. :-)
Playing chess at a grandmaster level was considered something only a human could do until the 1990s, and now no human has beaten the best computer in 17 years, while AGI seems further away than ever.
Mark my words: we'll create an AI that can pass the Turing test this decade, but we'll still be as far away from the badly defined general problem as we ever were.
My brother just became a grandpa, and I was watching his grandson navigate the world this past weekend. It's unbelievable how quickly the brain can extrapolate a new relationship between objects/actions/etc. and then apply it elsewhere. Minimally you see it in the drinking action applied to all sorts of things, this sort of repetitive clenching/releasing of the fingers to find things to grip without looking, and so on. Watching mom use a fork and very quickly understanding how to grasp and manipulate it. The model of just training everything from exogenous data into a flat network seems like it will hit some asymptotic limit.
I am a proponent of using a working theory that intelligence is an emergent property and that we can in principle create new intelligences in a lab (or ML warehouse) if we provide the proper conditions, but that finding and maintaining those conditions is extremely hard. Some state-of-the-art research today aims to integrate recognition capabilities (image recognition and object detection/tracking on video, voice extraction from audio, text) with advanced generative models for language and behavior, as well as realtime rendering systems that can create realistic humans.
If we combine those we can make a bot that appears fully interactive, passes all Turing tests, convinces a typical person it's another person... and still has nothing inside that researchers would call "artificial intelligence". It might even solve science problems that we can't, without having any spark of creativity or agency. Or maybe when we make a bot with all those properties, some uncanny valley is crossed and out pops something that has objective AGI?
As the wise robot once said, "if you can't tell the difference, does it really matter?". We should forge ahead with building datacenter-scale brains and feed them with data and algorithms, while also maintaining a cadre of research scientists who are attuned to the ethical challenges of doing so, an ops team trained to recognize the early signs of sentience, and an exec team with humanity.
Heuristically, we came to be by a very dumb process of piling up newer generations. If my pet would communicate with me on the level of GPTx, I would be very impressed. That's why nowadays I have some scepticism for the ANN critics' arguments, though think it would be neat if they were right.
The thing that I dislike the most in these discussions is the pervasiveness of the AGI concept and the assumption of a linear scale of intelligence. Again, I can intuitively say that I'm more intelligent than my pet: but to quantify this, we'd need to use something silly like brain size, or qualitative/arbitrary things like "this being can talk". I think that human intelligence is a somewhat random point in a very multi-dimensional space, one that technology may never even have a reason to visit. But people tend to subscribe to the notion that this is the very important "point where AGI happens".
GPTx is not communicating with anyone. It is generating text that resembles text it had in its training set. The fact that human text is normally a form of communication doesn't make generating quasi-random text communication in itself. GPTx is no more communicating than a printer is when printing out text.
A cat or dog leading you to their empty food bowl is actual communication, and they are capable of much more advanced communication as well (especially dogs). The fact that it doesn't look like written text is not that relevant. They are of course worse than GPTx at producing text, just like they are worse than a printer at writing it on a blank page.
My feeling is this is a PR push by Facebook. All tech companies keep touting AI, especially Google but also Microsoft, Apple and Amazon. In some sense I believe these businesses want to control how their own success is defined. That is, they are right now convincing everyone that tech dominance is equivalent to AI dominance, which is equivalent to ML dominance. In some sense this is turning into a purity test, like "which tech company is the most AI focused". I expect this kind of PR to accelerate as each company tries to prove its AI bona fides to the market.
I listened to the episode with Yann. Compared to other talks (e.g. the previous one with Brian Keating) it was a bit dull and uninteresting. The answers were not that insightful.
I do the same thing and feel the same way, like I'm astroturfing or something. If it's any consolation, I don't remember ever seeing your references, and I hope you don't remember mine.
I suppose the same will be true for most ML-related areas of research sooner or later, at least as far as applied ML is concerned.
Already, a substantial amount of research innovation in NLP and CV has been coming from big companies in recent years.
Of course there is a discussion to be had about what that means for society at large. At this point, a lot of said companies do publish their results at conferences etc. But what if at some point they decide to be as "open" as OpenAI (i.e., not)?
In the NLP space there's been a lot of work recently around reducing model sizes, since they've started to reach the point where model weights sometimes don't fit in the memory of most GPUs.
There are also projects like MarianNMT which completely abandon Python and write heavily optimized models in fast languages that can run quickly and accurately even without GPUs. I think we'll see a lot more of this, though of course there's a pretty big barrier in the sheer rarity of being good at both deep learning research and writing optimized low-level code.
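For a sense of what the simpler size-reduction tricks look like, here is post-training dynamic quantization in PyTorch, which stores Linear-layer weights as int8. This is a generic illustration of one technique in that space, not how MarianNMT or any particular production system does it:

```python
import torch
import torch.nn as nn

# Stand-in for a much larger transformer; only the Linear layers matter here.
model = nn.Sequential(
    nn.Linear(1024, 4096),
    nn.ReLU(),
    nn.Linear(4096, 1024),
)

# Post-training dynamic quantization: weights become int8, activations are
# quantized on the fly, so the Linear weights shrink roughly 4x without retraining.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 1024)
print(quantized(x).shape)  # same interface as the original model
```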
As for writing low-level code, I thought that was something usually handled by the compiler, or where even the advanced high-performance (high-price) offerings mostly tweaked the compiler after analyzing its output. Not my direct space, so I speak with no authority.
Constraints are the mother of creativity.
I don't see anything new here; these institutions that encourage people to share are old, so it must be a problem that has been recognized for a while.
Aside from some of the academics and the "gain and share knowledge for knowledge's sake" types they hire, why would they care?
For the record, I don't like the idea of scientific research becoming proprietary. At all. But is there anyone credulous enough to think these organizations would willingly risk their bottom line for principles like "openness", and not just play the PR games to make themselves appear open and concerned?
In other words "Don't LOOK evil but do evil when no one's looking".
The Frances Haugen leaks already show how damaging such openness can be.
If academics want to do research on expensive cutting-edge tech, they will have to join industrial labs or pool together resources, similar to particle physics or drug discovery research today.
Honestly ... this is a lot of GPUs ... but is it the biggest...?
> Model training is done with mixed precision on the NVIDIA DGX SuperPOD-based Selene supercomputer powered by 560 DGX A100 servers networked with HDR InfiniBand in a full fat tree configuration. Each DGX A100 has eight NVIDIA A100 80GB Tensor Core GPUs
So Nvidia used 4480 GPUs to train Megatron-Turing NLG 530B for example.
As of today, Nvidia has the very slightly smaller cluster you outlined at ~5k, Microsoft has a few of roughly that size, and Microsoft also built a 10k GPU cluster for OpenAI 2 years ago, but those are V100 GPUs.
So, is 6k A100 "bigger" than 10k V100? Depends exactly how you use them, in a perfect usage scenario yes, slightly. In real life maybe not.
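The rough arithmetic behind that, using published peak dense FP16 tensor-core throughput (A100 ~312 TFLOPS, V100 ~125 TFLOPS). These are paper numbers only; real utilization depends on interconnect, memory bandwidth, and the workload, which is exactly why "bigger" is ambiguous:

```python
# Back-of-envelope peak throughput comparison (spec-sheet numbers, not measured).
a100_tflops, v100_tflops = 312, 125

rsc_peak = 6_000 * a100_tflops       # ~1.9 EFLOPS peak (A100 cluster)
openai_peak = 10_000 * v100_tflops   # ~1.3 EFLOPS peak (V100 cluster)

print(f"{rsc_peak / 1e6:.2f} EFLOPS vs {openai_peak / 1e6:.2f} EFLOPS")
print(f"ratio ~{rsc_peak / openai_peak:.2f}x on paper")
```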
The point of making this machine is to have a lot of A100s going at the same time, and that will unblock some small set of researchers who are working on time-sensitive competitive research projects by giving them a slight throughput and latency advantage on the largest problems. The vast majority of users would be better served by a small number of cheaper, slower GPUs that they had exclusive access to for the longest time period they could afford to wait.
“The experiences we’re building for the metaverse require enormous compute power…and RSC will enable new AI models that can learn from trillions of examples, understand hundreds of languages, and more,” said Meta CEO Mark Zuckerberg.
I don't really understand how AI processing is going to make the 'experiences' any better? This seems to me like investor fluff, saying they have some insane capability that other 'VR providers' don't have...

- 3d worlds with style transfer on the textures, like maybe there's a cafe with the visual style of Starry Night or something
- NPCs with conversation models that are fine-tuned for each NPC's personality and keep some history for each person they talk to, for continuity
- Game-playing AI on NPCs that make them go around doing actual things or playing minigames with players
- The usual user tracking models, figuring out what people like to do in the metaverse and giving them more of that
- All the lower-level stuff that AI can do better - user inputs, rendering, etc.
Whether or not they can pull it off is a separate question - I think the tech is close but not quite there yet - but there's no doubt that the metaverse concept of "an expansive virtual world with lots of fun things to do" has many ways to use huge amounts of computation.
Example 2: Using AI upscaling (like Nvidia) to improve visual fidelity in games.
Example 3: Hand/body tracking for avatars.
The more AI compute, the more experimentation researchers can do.
In summary, rather than actually streaming video to the person you're chatting with, you send a keyframe, and then 'compressed' video is sent over the wire, and 'decompressed' at the receiver end.
I'm putting 'compression' in quotes because I'm not sure I'm comfortable calling it compression. Basically, you're remotely controlling an avatar of yourself.
While the obvious usage of this is reducing bandwidth used (in their example, an h264 stream at ~100KB/frame can be compressed to 0.1KB/frame, literally a thousandth of the bandwidth), it opens up some VERY interesting possibilities for a company like Meta (check from about 1:55 onwards in the video below).
You can view someone's face from any angle, not just the angle they're speaking from (as you might in a VR world), or you can even map the key points onto a completely different keyframe, allowing for hyper-realistic avatars or next-level virtual backgrounds (imagine: you send a keyframe of you sitting at your desk and hop on a video conference from the beach, and no-one's any the wiser as long as the sea is quiet enough)
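A sketch of that keyframe-plus-keypoints pipeline. The detector and generator below are trivial stand-ins (hypothetical, not the real models), but the data flow is the point: one full frame up front, then only a handful of floats per frame afterwards:

```python
import numpy as np

def extract_keypoints(frame):
    # stand-in: a real system runs a facial-landmark/keypoint network here
    return frame.reshape(-1)[:20].astype(np.float32)  # ~20 floats ≈ 0.1 KB

def synthesize_frame(keyframe, keypoints):
    # stand-in: a real system warps/animates the keyframe with a generative model
    return keyframe

# Sender: one keyframe (~100 KB as a compressed image), then tiny per-frame payloads.
keyframe = np.random.randint(0, 255, (256, 256, 3), dtype=np.uint8)
payloads = [extract_keypoints(keyframe) for _ in range(30)]  # one second @ 30 fps

# Receiver: reconstructs every frame locally from the keyframe + keypoints.
frames = [synthesize_frame(keyframe, kp) for kp in payloads]
print(len(frames), "frames reconstructed,", payloads[0].nbytes, "bytes per frame")
```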
The value add is that the engineering community that they employ has a job, the stock stays higher because of their perceived value add to the tech, and the push to control data continues unburdened by something as trivial as a lack of compute power. Hooray. Progress.
A lot of this stuff is like trickling tech from F1 teams down into consumer cars. Some of the tech will likely end up in commodity datacenter/cloud stuff.
https://m.youtube.com/watch?v=BTETsm79D3A
There is never enough compute power. Dwarf Fortress on a supercomputer?
> understand
I know this is CEO-talk, but I sometimes wonder if these pricks really think they are inventing AI.
Evidence: actual work experience at building latent representations to characterize customer behavior at FAANG. It's hard to come up with something that really gets you, but it's not hard to come up with something likely to make you spend more. You're surprisingly predictable on that axis and even if you aren't because you put the hours into being a crazy outlier, almost everyone else is, and you don't matter.
It's just a bunch of GPUs. It could be used for anything people can imagine, good or bad.
Yes, but anything that could do that, will be used for military robots and context-aware ubiquitous comms surveillance.
> It's just a bunch of GPUs. It could be used for anything people can imagine, good or bad.
And nuclear power can be used for good or ill, too. But when the ills grow big enough, it's still fair to worry about proliferation and possible end-of-civilization events. It's unhelpful to reassure someone building a bomb bunker "Don't worry, nuclear power is just a tool, it can be used for good OR for bad".
Which, of course, is great for accessibility.
[0]: https://deepmind.com/blog/article/alphafold-a-solution-to-a-...
So you’ll have small human crowds but loads of anonymous avatar androids taking all the good fishing spots, riding the trails backwards, etc.
I’m joking hopefully
I fear the amount of human information this AI is going to be free to analyze from Facebook, what it will deduce about us, and how Meta will then use it to generate capital.
For general tasks like language modeling, we are still seeing predictable improvements (on the next-token-prediction loss) with increasing compute. We will very likely be able to scale things up by 10,000x or so and continue to see increasing performance.
But what does this mean for end users? We are probably going to see sigmoid-like curves, where qualitative features of these models (like being able to do math, or tell jokes, or tutor you in French, or provide therapy, or mediate international conflicts) will suddenly get a *lot* better at some point in the scaling curve. We saw this for simple arithmetic in the GPT-3 paper, where the small <1B param models were terrible at it, and then with 100B scale suddenly the model could do arithmetic with 80%+ accuracy.
Personally I would not expect diminishing returns with increased scale; instead there will be sudden leaps in ability that will be very economically valuable. And that is why Meta and others are so interested in scaling up these models.
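A toy illustration of the "predictable improvements" part: a power-law curve of loss versus compute, with invented coefficients (not fitted values from any paper). The smooth curve is what the scaling-law papers measure; the sudden capability jumps sit on top of it:

```python
# Illustrative power-law scaling: loss ~ (C0 / C) ** alpha, constants made up.
compute = [1e0, 1e2, 1e4, 1e6]  # relative compute budgets (arbitrary units)
C0, alpha = 1.0, 0.05           # assumed constants, not fitted values

for c in compute:
    loss = (C0 / c) ** alpha
    print(f"compute x{c:>9,.0f}: predicted loss {loss:.3f}")
```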
The actual difference between the two has narrowed considerably compared to years past, and seems to come down more to how a collection of computers is used than to what it is.
Saturating a box with 500+ GB GPU RAM is fun. Only our gov users ask us for help on that typically: most of our users are commercial nowadays, but with much smaller/scaled down GPU rigs. I think that'll change as the fintechs keep improving and software gets easier, but they are still not there (outside of niches). Working on it :)
(If you like writing shaders, we are hiring :D )
In addition to the other responses, I like pointing people to this talk[1] by Jeff Hammond for a comprehensive answer to this question (you can skip to the 11:15 timestamp).
[1] https://uchicago.hosted.panopto.com/Panopto/Pages/Embed.aspx...
That sounds wicked evil. If ads, marketing and habit inducing platform designs are a problem now, imagine what this will lead to.
To understand what drives your users more than the users understand it themselves and to use that understanding for profit. Intensified.
And that's not to mention surveillance; you know DARPA and the NSA want their hands all over this.
> The company declined to comment on the location of the facility or the cost
It's generally common practice not to disclose the addresses of your data centers, but they can usually be discerned with a bit of research. Journos aren't going to dig that deep.
The best thing is, assuming the 'quality' of their product scales with the amount of work put into it, we'll get... 30% more accurate ads? Somehow they'll steal 30% of Google's lunch? Well, I don't know, but it sure looks like an incredible amount of engineering talent has been put toward getting us 30% more nothing.
If we increase the efficiency of something (let's say software) by 100%, all the good things that can be done with software become 100% more efficient. But that gain is not equivalent in effect to the same 100% gain applied to all the bad things that can be done with software: many destructive actions are already orders of magnitude more efficient than constructive ones, so the net result is that the world gets more dangerous.
For a more physical example, consider that a truck filled with powerful explosives could knock down a skyscraper. That is, for a handful of man-hours, it is possible to undo the work of hundreds of thousands of man-hours, plus the hundreds of thousands of man-hours society would need to divert to managing the aftereffects of that disaster, the emotional cost, and so on.
There's an underlying efficiency bonus that destructive actions have that is not being accounted for.
Adtech is still a bad joke.
Correction: terafo pointed out they shipped 12.7m cards in Q3 2021 alone.
I want to hate this idea, but it would be the same as hating machines replacing manual labor over the last 100 years.
I'm not sure what to think, nor how to prepare myself for the next 20 years.