My take on recent debate over AI-risk

In my post on donating to animal rights orgs, I noted that organizations that claim to be working on risks from AI are a lot less cash-starved, now that Elon Musk has donated $10 million to the Future of Life Institute. They’re also a lot less publicity-starved, with not only Musk but also Stephen Hawking and Bill Gates lending their names to the cause.

The publicity has, predictably, generated a lot of pushback (see examples here and here). And while I think the issue of AI risk is worth thinking about, I’m sympathetic to many of the points made by critics, and disappointed by the rebuttals I’ve seen. For example, Andrew Ng, who’s Chief Scientist at Baidu Research and also known for his online course on machine learning, has said:

There’s been this hype about AI superintelligence and evil robots taking over the world, and I think I don’t worry about that for the same reason I don’t worry about overpopulation on Mars… we haven’t set foot on the planet, and I don’t know how to productively work on that problem.

The above link is to a blog post by former MIRI executive director Luke Muehlhauser, whose response to Ng focuses on AI timelines. But whether human-level AI is centuries or merely decades away, it’s still true that we’re not close enough to have a clear idea of what human-level AI will be like when it does arrive. This isn’t to say possible risks from AI aren’t worth thinking about at all, but like Ng I don’t know a way to productively work on the problem, and I’m not sure anyone else does either.

But I know a lot of people on team “worry about AI” disagree, and in fact claim we should be sending vastly larger amounts of money to organizations like MIRI, and that AI risk should be prioritized over other pressing issues like global poverty and factory farming. Recently, I actually had a friend tell me he’ll be happy once 10% of world GDP is being spent trying to prevent risks from AI. And frankly, I’ve never heard anything remotely approaching a good argument for claims like this.

It’s important to distinguish the claim that we should be giving a great deal more attention to possible risks from AI from the broader claim that we should be giving a great deal more attention to concerns relating to the far future. Even granting the broader claim, why focus on AI? Why not nuclear war, or tail risks from climate change, or efforts to bring about beneficial long-term shifts in social norms and institutions? Why not building doomsday bunkers in Antarctica, for that matter?

(I mention this last one because it seems like a cheaper alternative to Elon Musk’s project of Mars colonization.)

Of course, you can argue against putting much effort into all of the cause areas I’ve just mentioned, but the question is whether the case for worrying about AI is any better than the case for worrying about those causes. Many of the same objections–such as lack of tractability, and certain scenarios being arguably unlikely–apply equally to AI.

I often hear AI risk folks cite philosopher Nick Bostrom’s book Superintelligence as the definitive source for arguments for prioritizing concern about AI. But I don’t think Bostrom’s book can fill the role its fans want it to. As Bostrom himself says in the book’s preface:

Many of the points made in this book are probably wrong… I have gone to some lengths to indicate nuances and degrees of uncertainty throughout the text–encumbering it with an unsightly smudge of “possibly,” “might,” “may,” “could well,” “it seems,” “probably,” “very likely,” “almost certainly.”

Looking at the text itself, it’s the “possibly”s and “could well”s that most often accompany key points. This isn’t to say Superintelligence is a bad book, taken as presented. But a catalog of possibilities doesn’t make for much of an argument about what issue should be humanity’s #1 priority.

Bostrom is clearly very interested in “foom” scenarios where a single AI rapidly self-improves to the point where it is able to take over the world, perhaps in a matter of days. But as economist Robin Hanson has noted:

Bostrom’s book has much thoughtful analysis of AI foom consequences and policy responses. But aside from mentioning a few factors that might increase or decrease foom chances, Bostrom simply doesn’t give an argument that we should expect foom. Instead, Bostrom just assumes that the reader thinks foom likely enough to be worth his detailed analysis.

Note that Robin is also very interested in the possible future impact of AI. But his view is that we’re more likely to see a more gradual scenario, probably driven by digital “emulations” of actual human brains. And herein lies another problem: “AI will be important for humanity’s future” is an incredibly vague prediction, covering a vast range of scenarios. What makes sense as preparation for one scenario may make no sense if you think another scenario is much more likely.

This is a problem for team “worry about AI,” because it’s hard to, say, make a case for donating to MIRI without making some fairly specific claims about the future of AI. People like Luke have tried to claim otherwise, and once upon a time, I believed them, but I no longer can. Lately, I’ve been finding that when I look closely at the arguments, people will disavow more controversial claims like “foom” one minute, then implicitly assume a significant chance of “foom” the next.

I’m not the only person to get this sense. For example, this article by blogger Nathan Taylor complains that when AI “skeptics” and “believers” argue, they often seem to end up agreeing on the substantive issues at stake. Taylor concludes that a lot of the seeming pointlessness of recent debates about AI comes from the fact that the real thing dividing people is the foom issue.

I think this is sometimes true, but not always. In some arguments the “skeptic” and “believer” aren’t that far apart, but there are people on both sides whose views are more extreme. (People who think AI should be humanity’s #1 concern vs. people who think it’s impossible in principle for anything to go wrong.) I also don’t think foom is the only questionable assumption that many members of team “worry about AI” make.

For example, current MIRI executive director Nate Soares, responding to some skepticism about MIRI, writes:

First, I think he picks the wrong reference class: yes, humans have a really hard time generating big social shifts on purpose. But that doesn’t necessarily mean humans have a really hard time generating math — in fact, humans have a surprisingly good track record when it comes to generating math!

The assumption that the key to dealing with possible risks from AI is more-or-less straightforward math research is another assumption that gets asserted without much in the way of argument. It’s assumptions like these that people like Soares need to actually argue for if they’re going to go around claiming people need to donate more to MIRI.


6 thoughts on “My take on recent debate over AI-risk”

  1. I think Ng’s comment about not knowing how to productively work on the issues captures a key point. SIAI/MIRI has never had much in the way of technical approaches to the problem either. Giving MIRI more money will pull more researchers into a group with no productive paths, where they’ll languish.


  2. (full disclosure: I’m going to be working at MIRI as a researcher starting next month)

    I definitely see the logic of Ng’s comment at a high level, but it starts to lose its appeal if you zoom in on the problem and look at MIRI’s technical research agenda. The problems in the technical agenda are things that we want to have a theoretical understanding of before building AGI, even before we know exactly how future AGIs will be implemented.

    IMO, the most compelling example is ontology identification. We could give an AI a goal such as “increase human happiness”, but how does it know what happiness is? There needs to be some way to translate human concepts (like “happiness”) into the AI’s model of reality, so it can actually do what we mean. There is currently very little understanding of how an AI could “do what we mean”, even if we allow the AI to have a hypercomputer. There are approaches (most notably, Solomonoff induction) to predicting sequences of bits using a hypercomputer, but we don’t know how to turn Solomonoff induction into something that will “do what we mean”. This is one of the problems that MIRI is working on.

    This problem seems fairly robust to different AGI architectures. Whether the AGI uses neural networks or program induction or evolutionary algorithms or whatever, it’s going to have to, at some point, model the world. No matter what form that model takes, we’ll have trouble relating it to human concepts without a better understanding of the problem. Current machine learning work is mostly about making more efficient inductive inference algorithms, but these algorithms bring very little theoretical progress to the ontology identification problem, because they’re still working within the inductive inference paradigm and have the same problems that Solomonoff induction has.

    Meanwhile, there are AGI architectures (such as AIXI) that will probably kill us all if they are actually implemented. Luckily, they’re extremely difficult to implement, but approximations of them will only get better as Moore’s Law continues and approximation algorithms improve.

    Much of MIRI’s work is more like “turning philosophy into math” than “doing math”. It’s easier to make the case that AI value alignment is a “turning philosophy into math” problem than that it’s a “doing math” problem. Past examples of turning philosophy into math include: Bayesian probability theory, VNM utility theory, Solomonoff induction, and Pearl’s work in causality. Examples in progress (where there’s math, but it still misses something about the philosophy) include updateless decision theory and logical uncertainty. These are the kinds of insights that MIRI is trying to crank out, and I expect them to increase our chances of making a value aligned AGI.

    I think the case for thinking about the problems ahead of time is quite clear even if we’re quite confident in slow takeoff. If, as Robin Hanson suggests, we get an exponential increase in the number of human-level AGIs, we’ll still want these mini-AGIs to be value-aligned, especially as they come to outnumber humans. Unless of course we create _only_ literal whole brain emulations, which seems unlikely (we’ll probably be able to make human-level de novo AGI by this time).
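For readers unfamiliar with the Solomonoff induction mentioned above, a rough sketch of the formalism (omitting technical details): each program p for a universal prefix machine U is weighted by its length, and prediction is done by conditioning:

```latex
M(x) \;=\; \sum_{p \,:\, U(p) = x*} 2^{-|p|},
\qquad
M(x_{n+1} \mid x_{1:n}) \;=\; \frac{M(x_{1:n}\, x_{n+1})}{M(x_{1:n})}
```

Here U(p) = x* means program p makes U output a string beginning with x. The sum is uncomputable, which is why the comment frames it as something you could run only on a hypercomputer.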
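For reference on the AIXI architecture mentioned above, Hutter’s agent selects each action by expectimax over a Solomonoff-style mixture of computable environments, sketched here up to technical details (a_i are actions, o_i r_i observation-reward pairs, m the horizon):

```latex
a_k \;=\; \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m}
\bigl(r_k + \cdots + r_m\bigr)
\sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
```

The inner sum is the same kind of length-weighted mixture over programs as in Solomonoff induction, which is what makes AIXI uncomputable and only approximable in practice.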


  3. IMO, the most compelling example is ontology identification…

    This paragraph seems to obviously involve a bunch of assumptions that need justification? Like the claim that advanced AIs will be driven by high-level goals like that. It’s also vulnerable to the objection that it seems like an area where AI capabilities research will actually give us a much better idea of how to make safer AI; understanding what’s meant by “increase human happiness” sounds an awful lot like an NLP problem.

    Your examples of “turning philosophy into math” seem dubious to me. Sure the math is good, but I don’t think it’s as clear as you seem to think what the math shows about the philosophy.


    • I should add that I do think there are cases where we’ve managed to turn “philosophy” into science. In Newton’s day, “natural philosophy” wasn’t a figure of speech; physics was as much a part of philosophy as ethics was. But no one seems to have worked out a general formula for turning *any* philosophical question into science. If anyone had, we’d probably be a lot less confused about philosophy than we are today.


    • I’m having difficulty imagining a world where no AGIs have high-level goals. We already have algorithms (like the one DeepMind is using) for turning an inductive inference engine (in DeepMind’s case, a neural network) into a reinforcement learner, so as long as very powerful inductive inference engines exist in the future, it will be easy to turn them into potentially dangerous reinforcement learners. I expect that, when asked when we’ll get human-level AGI, most people are at least imagining something capable of human-level inductive inference. So the only way to have no goal-directed AGIs is for these inductive inference algorithms to be available to so few people that no one decides to connect one to a reinforcement learner. I’m making some assumptions here, but they seem fairly weak to me.

      Most NLP research, as far as I can tell, is about efficient inductive inference. You have a human-curated training set, and you’re trying to find an efficient algorithm to match it. To the extent that these algorithms have internal representations, they’re either human-engineered and not philosophically interesting, or very opaque (as in deep learning). It’s possible that NLP research will help to understand ontology identification usefully, but I don’t expect the current research approach to help significantly.

      I’m not sure if I should read your last sentence as disagreeing with the idea that insights along the same lines as VNM utility theory will help us create value-aligned AI. My heuristic here is that if we haven’t tried very hard to turn some useful concept (like extrapolated volition) into math, then we should try doing that until we’ve put in a fair amount of effort with little progress.
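The point above about wrapping an inductive inference engine into a reinforcement learner can be sketched in a few lines. This is a toy illustration of my own, not DeepMind’s algorithm: the names `FrequencyPredictor` and `greedy_agent` are hypothetical, and the “predictor” is a trivial frequency model. The point is only that any model that predicts outcomes of actions can be turned into a goal-directed agent by maximizing predicted reward.

```python
from collections import defaultdict

class FrequencyPredictor:
    """A stand-in for an inductive inference engine: predicts the
    expected reward of each action from past outcomes."""
    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    def update(self, action, reward):
        # Record one observed (action, reward) pair.
        self.totals[action] += reward
        self.counts[action] += 1

    def predict(self, action):
        # Average reward seen so far; neutral prior for untried actions.
        if self.counts[action] == 0:
            return 0.0
        return self.totals[action] / self.counts[action]

def greedy_agent(predictor, actions):
    """Turn any predictor into a (greedy) reinforcement learner:
    pick the action with the highest predicted reward."""
    return max(actions, key=predictor.predict)

p = FrequencyPredictor()
p.update("a", 0.0)
p.update("b", 1.0)
print(greedy_agent(p, ["a", "b"]))  # → b
```

A real system would use a far richer model and would plan over sequences of actions, but the wrapping step itself is this simple, which is the force of the comment’s argument.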

