They write "Yet intuitively it seems like it should often be possible to predict which actions are dangerous and explore in a way that avoids them, even when we don’t have that much information about the environment." For humans, yes. None of the tweaks on machine learning they suggest do that, though. If your constraints are in the objective function, the objective function needs to contain the model of "don't do that". Which means you've just moved the common sense problem to the objective function.
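The point about the objective function can be made concrete with a tiny sketch. Assuming a toy RL-style setup (all names here are hypothetical, not from the linked post): folding a safety constraint into the objective means someone has to write down the "don't do that" predicate explicitly, which is exactly the common-sense model that was supposed to be avoided.

```python
# Sketch: a safety constraint folded into an RL-style objective.
# The names (task_reward, is_dangerous) are hypothetical; the point is
# that the "don't do that" knowledge has to be encoded somewhere.

def task_reward(state, action):
    # Placeholder task objective: reward making progress.
    return 1.0 if action == "advance" else 0.0

def is_dangerous(state, action):
    # This predicate IS the common-sense model. It doesn't come for
    # free; someone has to hand-encode "don't do that" here.
    return state == "near_cliff" and action == "advance"

def shaped_objective(state, action, penalty=100.0):
    # Constraint-in-the-objective: subtract a large penalty for
    # dangerous state/action pairs.
    base = task_reward(state, action)
    return base - (penalty if is_dangerous(state, action) else 0.0)
```

The shaping itself is trivial; everything hard has been pushed into `is_dangerous`, which is the original problem restated.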
Important problem to work on, even though nobody has made much progress on it in decades.
The success of deep learning might be something of a curse - it's had enough success that creating a safe system now seems to mean modifying a neural net to be safe, despite its lacking the "engineered from the start" quality.
Possibly. In AI, someone has a good idea, there's great enthusiasm, people in the field predict Strong AI Real Soon Now, the good idea hits a ceiling, and then people are stuck for a while. AI has been through four cycles of that. The ceiling of the current cycle may be in sight.
The next big problem is, as they say, "safety", or "common sense". Nobody really has a handle on how to do that yet. Checking proposed explorations against a simulation of some kind works if you can simulate the world well enough. Outside that niche, it's hard.
Collecting vast amounts of data to predict what can go wrong without an underlying model runs into combinatorial limits. More things can go wrong than you are likely to be able to observe.
Good that there are people thinking about this.
Much is made of GPT-3's ability to sometimes do logic or even arithmetic. But that ability is unreliable, and it's smeared through the whole giant model. Extracting a particular piece of specifically logical reasoning from the model is a hard problem. You can do it, at N times the cost of the model. And in general, you can add extras to the basic functionality of deep neural nets (few-shot, generative, etc.), but at a cost of, again, N times the base (plus decreased reliability). But the "full" qualities mentioned initially would need many such extras, each comparable to one-shot, and they'd have to happen on the fly. (And one-shot seems fairly easy. Take a system that recognizes images by label ("red", "vehicle", etc.). Show it thing X - it uses the categories thing X activates to decide whether other things are similar to thing X. Simple, but there's still lots of tuning to do here.)
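The one-shot scheme in the parenthetical can be sketched directly. Assuming a classifier that emits per-label activation scores (the function and label names below are hypothetical stand-ins), similarity to the exemplar reduces to comparing activation vectors - with the threshold being one of the knobs that "lots of tuning" refers to:

```python
# Sketch of one-shot recognition via label activations, as described
# above. Activations are {label: score} dicts from some hypothetical
# trained classifier; the threshold is a tuning knob.

def cosine_similarity(a, b):
    # Compare two sparse activation dicts by cosine similarity.
    labels = set(a) | set(b)
    dot = sum(a.get(l, 0.0) * b.get(l, 0.0) for l in labels)
    norm_a = sum(v * v for v in a.values()) ** 0.5
    norm_b = sum(v * v for v in b.values()) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def same_as_exemplar(exemplar_acts, candidate_acts, threshold=0.8):
    # "Thing X activates these categories; is the candidate similar?"
    return cosine_similarity(exemplar_acts, candidate_acts) >= threshold
```

A usage example: show the system a red vehicle once, then test candidates against its activation profile.

```python
fire_truck = {"red": 0.9, "vehicle": 0.8}
ambulance = {"red": 0.85, "vehicle": 0.7, "metal": 0.1}
fern = {"green": 0.9, "plant": 0.8}
same_as_exemplar(fire_truck, ambulance)  # high overlap -> similar
same_as_exemplar(fire_truck, fern)       # no overlap -> not similar
```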
Just to emphasize, I think they'll need something extra in the basic approach.
Note - I never argued that "extras" (including formal "explanations") can't be added to deep learning systems. My point is that you can add such steps, at generally high cost. The argument is that a sequence of small steps won't get you to the ideal of broad flexibility that the OP's landing page outlines.
Or they just didn't like their boss that much?
As a layman, it seems to me that safer and more legible AI would come from putting neural networks inside a more 'old-fashioned AI' framework - layers of control loops, decision trees, etc., driven by the output of the black-box AI. Then at least if your robot crashes or whatever, the computer-vision output is in some legible intermediate form and the motion is tractable to traditional analysis.
This can't be an original thought, it must already be done to a large extent. But I get the impression that the way things are going is to get the trained model to 'drive' the output, for whatever the application is. Can someone with current industry knowledge comment?
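Not an industry answer, but the hybrid structure being described can be sketched in a few lines. Everything here is hypothetical illustration: the neural net is confined to perception and must emit a legible intermediate quantity (say, obstacle distance), while the action comes from plain hand-written logic that traditional analysis can reach.

```python
# Sketch of the hybrid idea above: a black-box perception model emits
# a legible intermediate value, and a small hand-written decision tree
# acts on it. All names are hypothetical.

def perceive(frame):
    # Stand-in for a neural net; in practice this is the opaque part.
    # The key design choice: it must output something inspectable,
    # e.g. estimated distance to the nearest obstacle, in meters.
    return frame["true_distance"]  # toy oracle for this sketch

def control(distance_m, stop_margin=2.0):
    # Legible, analyzable decision logic - a tiny decision tree that
    # can be verified independently of the perception model.
    if distance_m < stop_margin:
        return "brake"
    elif distance_m < 2 * stop_margin:
        return "slow"
    return "cruise"

def step(frame):
    # The net "drives" only perception; the action is traditional code.
    return control(perceive(frame))
```

The design payoff is that a crash investigation splits cleanly in two: either the perception output (the distance estimate) was wrong, or the control logic was, and the latter is auditable by ordinary means.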