
Journal of Consciousness Studies, 24, No. 5–6, 2017, pp. ?–?

Joseph Corabi

Superintelligence as Moral Philosopher

Abstract: Non-biological superintelligent artificial minds are scary things. Some theorists believe that if they came to exist, they might easily destroy human civilization, even if destroying human civilization was not a high priority for them. Consequently, philosophers are increasingly worried about the future of human beings and much of the rest of the biological world in the face of the potential development of superintelligent AI. This paper explores whether the increased attention philosophers have paid to the dangers of superintelligent AI is justified. I argue that, even if such a thing is developed and even if it is able to gain enormous knowledge, there are several reasons to believe that the motivation of such an AI will be more complicated than what most theorists have supposed thus far. In particular, I explore the relationship between a superintelligent AI's intelligence and its moral reasoning, in an effort to show that there is a realistic possibility that the AI will be unable to act, due to conflicts between various goals that it might adopt. Although no firm conclusions can be drawn at present, I seek to show that further work is needed and to provide a framework for future discussion.

1. Introduction

Recently, Elon Musk and Stephen Hawking both made headlines when they warned the world of a looming danger: artificial intelligence. Musk, for instance, said that artificial intelligence is our 'biggest existential threat' and that we are 'summoning the demon' with our current research.1

Correspondence: Joseph Corabi, Department of Philosophy, Saint Joseph’s University, 5600 City Ave., Philadelphia, PA 19131 USA. Phone: 610-660-1564, Fax: 610-660-1087. Email: [email protected]

Until the last few years, concern over the risks of artificial intelligence was restricted to the speculative fringes, but an increasing number of philosophers, scientists, and tech cognoscenti have been echoing the chorus of woe.2 Not everyone agrees, of course. Some see the development of human-level artificial intelligence as a Holy Grail, a key to solving most (or even all) outstanding human problems. Others still hold an old-fashioned conviction that human-level artificial intelligence is a long way off, if it ever arrives at all, and so insist that all of the prophetic enthusiasm is misguided in the extreme.

While the matter remains unsettled, the tide of technological progress and theoretical research has made it increasingly plausible that AI which replicates and eventually exceeds general human cognitive skill is on the way. A recent report by US government insiders, for instance, indicates that human-level AI could arrive by 'sometime in the 2020s' (Kadtke and Wells, 2014).3 And once human-level AI was reached, many theorists believe that the speed and efficiency of that AI would likely give rise to even more impressive artificial minds quickly thereafter. We can call these 'superintelligent artificial intelligences' (hereafter, 'SAIs') — non-biological machines with systematically greater cognitive skill than humans.4

Among philosophers, a voice that has long been crying out in the desert is Nick Bostrom's. In his new work Superintelligence: Paths, Dangers, Strategies, Bostrom (2014) pursues one of the first book-length philosophical investigations of the looming prospect of SAI. While Bostrom does believe that SAI could represent the most valuable technological breakthrough humans have ever made, he is often quite pessimistic about what SAIs would mean for human civilization, and even human life itself. Because SAIs would be systematically smarter than human beings, it would be relatively easy for a malevolent SAI to manipulate us, gain control of resources, and wreak widespread havoc on the world. As we will see, it is also very hard to guarantee that an SAI would be benevolent enough to avoid a terrible outcome. This is because SAIs, in virtue of the enormous power and influence they would be likely to acquire, could easily destroy human civilization and cause the extinction of humanity without making human destruction a high priority. Such an apocalyptic scenario could arise merely as a result of an SAI's ruthless efficiency in pursuing some seemingly trivial and harmless goal.

1 http://www.businessinsider.com/elon-musk-artificial-intelligence-mit-2014-10.

2 See, for instance, the characteristic dismissive attitude about the likelihood of attaining even general human-level AI in Whitby (2003), which is intended as a non-technical survey of the general terrain of artificial intelligence research.

3 There are numerous dissenters, however, among them Gelernter (2016).

4 By 'systematically' here I mean greater cognitive skill in every domain (or at least every important domain), not just in isolated areas. Also, whenever I use the term 'intelligence', I use it as a synonym for 'cognitive skill'. The same goes for 'smart' and 'cognitively skilled'. Incidentally, some theorists do believe that the SAIs could have a biological basis — they could have an artificial extension and large-scale rewiring of the human brain, for instance. Hence 'non-biological' should be understood here to mean 'far from completely biological'. The definition here is quite close to Nick Bostrom's: a superintelligence is 'any intellect that exceeds the cognitive performance of humans in virtually all domains of interest' (Bostrom, 2014, p. 22 — one gets the impression that the only domains where such an agent would not exceed the cognitive performance of humans are ones where humans are already nearly topping out the scales). It is also basically equivalent to I.J. Good's classic definition of 'ultraintelligence': an ultraintelligent machine is 'a machine that can far surpass all the intellectual activities of any man however clever' (Good, 1965, p. 33).

The work of philosophers like Bostrom and David Chalmers (see Chalmers, 2012) has spurred increasing speculation about the likelihood of attaining SAI in the near future and debate about whether such a development would be positive, negative, or neutral for practical human interests. Virtually everyone in the conversation has favoured one of two diametrically opposed views — either that SAIs will be very positive or very negative for us (sometimes remaining non-committal as to which, but affirming that it is overwhelmingly likely to be one of the two). My goal in this paper is to examine the third option — that SAIs might wind up being neutral for us. How might this occur? SAIs might wind up being paralysed from acting due to problems or conflicts in their motivational structure. In the end, I will not try to demonstrate that SAIs will be paralysed or even try to show that paralysis is especially likely, but I will argue that it is an underappreciated possibility that should be taken much more seriously than it often is.

After a brief introduction to the problems posed by SAI — an introduction that will set the stage for our discussion — I will explore a key issue for SAI motivation: potential conflicts between the pre-set goals of the SAI and the results of its theoretical reasoning.5 This will include a discussion of Bostrom's 'Orthogonality Thesis' and a consideration of various objections. At the end, I will sum up the lessons of the discussion. These lessons have much more to do with opening up avenues for future investigation than they do with coming to firm conclusions about what an SAI would be like and what it would do. My treatment is meant to highlight a number of philosophical and empirical issues that will need to be further explored in the context of artificial intelligence if the outstanding questions are to be answered.

5 This is not the only potential issue that could lead to SAI paralysis. Corabi (unpublished) explores a different one — the potential that SAIs will be trapped in sceptical conundrums and be unable to act as a result. But this paper assumes nothing about SAIs being trapped in sceptical doubt, at least about empirical phenomena. In fact, it takes for granted that such issues will not arise. Any combining of the respective problems will lead to paralysis worries that go beyond what either problem individually would license. Other authors who raise the possibility of SAI paralysis (albeit for different reasons than I do) include the prominent AI theorists Stuart Russell (see, for example, the summary in Wolchover, 2014) and Roman Yampolskiy (2015).

2. The Problems Posed By Superintelligent AI: A Brief Introduction

When a highly skilled cognitive agent competes with lesser cognitive agents, it will tend to gather more and better information, and also do a better job analysing that information. It will also be able to identify strategic advantages in order to manipulate the lesser cognitive agents into making mistakes. It will thus be able to amass more and more power and influence. If the environment presents cognitive obstacles that are challenging enough and the agent's cognitive advantage is great enough, it may even be able to single-handedly go from a modest starting point to total domination of its environment.6

Consider, for example, an AI that was vastly cognitively superior to any human being, and which had the goal of dominating Earth. Such an agent could begin its 'life' as an isolated piece of hardware. It could then trick a human into giving it access to the internet, whereupon it could start amassing information about economics and financial markets. It could exploit small security flaws to steal modest amounts of initial capital or convince someone just to give it the capital. Then it could go about expanding that capital through shrewd investment. It could obtain such an advantage over humans that it might then acquire so many resources that it could start influencing the political process. It could develop a brilliant PR machine and cultivate powerful connections. It might then begin more radical kinds of theft or even indiscriminate killing of humans that stood in its way of world financial domination, all the while anticipating human counter-manoeuvres and developing plans to thwart them.7 (How might it kill humans? It could hire assassins, for instance, and pay them electronically. Or it could invent and arrange for the manufacture of automated weapons that it could then deploy in pursuit of its aims.)

6 For much more detailed development of the basic ideas in this section, see Bostrom (2014) and Chalmers (2012). My rough introduction here is merely meant to motivate the issues I discuss later in the paper. It is not intended as a substitute for a thorough, rigorous treatment of topics surrounding the potential behavioural paths artificial intelligences might take and the prospects for controlling those paths.

As I mentioned above, even seemingly innocuous or beneficent pre-set goals could result in catastrophic outcomes for human beings. Imagine, for instance, Bostrom's example of an SAI that has the seemingly trivial and harmless goal of making as many paper clips as possible.8 Such an SAI might use the sorts of tactics described above in a ruthless attempt to amass resources so that paper clip manufacturing could be maximized. This sort of agent might notice that human bodies contain materials that are useful in the manufacture of paper clips and so kill many or all humans in an effort to harvest these materials. Or, it might simply see humans as a minor inconvenience in the process of producing paper clips, and so eliminate them just to get them out of the way. (Perhaps humans sometimes obstruct vehicles that deliver materials to automated paper clip factories, or consume resources that could be used to fuel these factories.) Even an SAI that had the maximization of human happiness as a pre-set goal might easily decide the best course of action would be to capture all the humans and keep them permanently attached to machines that pumped happiness-producing drugs into them. The SAI would then go about managing the business of the world and ensuring that the support system for the human race's lifestyle of leisure did not collapse.

Obviously, the flip side of the worries just discussed is that an SAI that aided humans in the pursuit of their genuine interests could be a powerful positive force — it could cure diseases, develop useful technologies, and help humans in ways that would make our lives happier and more fulfilling, at least in many respects.

7 For ease of exposition, I consider here a single SAI. If readers believe that the process would be too complex and daunting for one SAI to achieve on its own, imagine a more drawn out saga that involved two or three generations of SAIs, perhaps culminating in the production of numerous SAIs that work together.

8 Bostrom (2014, pp. 107–8) and elsewhere.

The issue of controlling an SAI and ensuring that it did not pursue goals that would likely result in catastrophe for human beings is known as the ‘control problem’ (Bostrom, 2014). Ideally, designers of AI would identify a pre-set goal that lines up with human interests and then directly program an AI to have it. This is the Direct Approach, which we can roughly define as follows:9

The Direct Approach: The designers of the AI program the AI to have an ultimate goal that involves no appeal to any desires or beliefs of any persons or groups to specify the content of that goal. (An example would be programming the AI to have the goal of maximizing human happiness.)

Unfortunately, it turns out to be very hard to do this. This is for reasons closely related to problems philosophers face concerning the difficulty of conceptual analysis — even after one has what appears to be a sharp grasp of some concept or intuitive phenomenon, it proves to be extraordinarily difficult to spell out a set of conditions that are perfectly coextensive with the concept or intuitive phenomenon supposedly grasped.10 And because of the potential ease with which SAIs might amass power, when we give our explicit instructions to the SAI, any seemingly small failures to capture an intuitive idea we have about what its final goal should be might result in huge divergences from the sort of behaviour we envisioned the SAI engaging in.11 (It is not hard to imagine an AI engineer, for example, thinking that an SAI programmed to maximize human happiness would create something that everyone would recognize as a paradise. But on reflection we see that things might easily go very differently.)

9 This definition is very rough, and would undoubtedly need to be sharpened considerably before it was perfectly adequate. Hopefully it is sufficient for now to capture the basic intuitive idea. Ironically, the difficulty of providing a definition of the Direct Approach is illustrative of the problem with the Direct Approach itself.

10 See, for instance, the remarks in Russell (1918). For a discussion of related points, see Bostrom (2014, p. 120).

11 It might be objected that, if the SAI is genuinely much more cognitively skilled than human beings, it will recognize that we have failed to give it instructions that match what we really intended, and so will adjust its behaviour to match our real aspirations. The trouble is that we are addressing ultimate pre-set goals for an AI — these would be the things that supposedly ultimately drive all of the AI's instrumental reasoning. It would not be possible for the AI to question or revise such a goal. There are complications lurking here that will be taken up later, however.

In an attempt to solve the control problem, some theorists have proposed dispensing with direct specifications of an SAI’s ultimate goals in favour of indirect specifications. These can be understood (again roughly) as follows:

The Indirect Approach: The designers of the AI program the AI to have as its ultimate goal to achieve what some group believes to be best or wants (or would believe to be best or want under specific idealized conditions) or what the AI itself believes to be the truth about morality.

Bostrom, for instance, advocates this approach and sympathetically discusses several indirect options. One is the possibility that we might program an SAI to have as a final goal to promote whatever humanity as a whole would want to promote under idealized conditions. (A closely related variation on this theme would be programming the SAI to do what humans would judge to be best under idealized conditions.) Throughout the rest of the paper, I will investigate problems that primarily apply to indirect approaches of this sort (though they can easily be extended to direct approaches as well).12 We can refer to indirect approaches with this flavour — that is, indirect approaches that have the SAI proceed by trying to discover what some group of agents (most likely humans) wants or thinks (either in reality or in idealized circumstances) — as ‘Indirect Psychological’ (IP) approaches.
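To make the structural difference between these approaches vivid, the following is a minimal toy sketch in code. It is purely illustrative: the function names, the numerical stand-ins, and the 'idealized survey' step are all invented here, and nothing in it is meant to reflect how an actual goal system would be engineered.

```python
# Toy illustration (not a design proposal) of the difference between a
# directly specified ultimate goal and an Indirect Psychological (IP) goal.
# Every function and value below is invented purely for the example.

def evaluate_happiness(outcome):
    """Hypothetical direct metric fixed once and for all by the designers."""
    return float(outcome.get("happiness", 0.0))

def survey_idealized_preferences(group):
    """Hypothetical empirical step: discover what the group would want
    under specified idealized conditions (a stand-in value here)."""
    return {"happiness": 0.4, "autonomy": 0.6}

def direct_goal_score(outcome):
    # Direct Approach: the goal's content makes no appeal to anyone's
    # desires or beliefs; it is written down in advance by the designers.
    return evaluate_happiness(outcome)

def indirect_psychological_goal_score(outcome):
    # IP Approach: the goal's content is fixed only via an empirical
    # discovery about what the relevant group wants or believes.
    weights = survey_idealized_preferences("humanity")
    return sum(weights[k] * outcome.get(k, 0.0) for k in weights)

print(direct_goal_score({"happiness": 0.9}))
print(indirect_psychological_goal_score({"happiness": 0.9, "autonomy": 0.2}))
```

The only point the sketch is meant to display is that, on an IP approach, the content of the ultimate goal is settled by an empirical discovery about the relevant group, whereas on the Direct Approach that content is specified by the designers from the start.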

Now that the stage has been set, we will explore the problem for SAI action.

3. SAI Motivation: Conflicts with Conclusions of Theoretical Reasoning

Consider an SAI that is programmed to employ an IP approach.13 In other words, it is programmed to have as an ultimate goal to achieve what some group of agents believes to be best or wants (or would believe to be best or want under specific idealized conditions).14 Let us also suppose that the SAI can answer all the empirical questions that are relevant to its action, including questions about what the relevant group of agents wants or believes. Consequently, it has no trouble discerning what its ultimate goal is telling it to try to accomplish. Let us refer to any goal that an SAI has as a 'desire'.15 We can also then say that, when the SAI completes the process of discerning what the relevant group wants or believes, the SAI will form a desire that that state of affairs be brought about.16

12 There are other possible indirect approaches not discussed here but alluded to in the above definition. Most notably, we might bypass the reference to humans and program the AI simply to do what is objectively morally right or morally best, then rely on its superior cognitive skill to discover what the morally right or morally best thing is. See the treatment in Bostrom (2014, pp. 212–20). Corabi (unpublished) discusses problems that apply to these approaches.

13 SAIs might be programmed with layered ultimate goals. This might involve one goal being the default dominant goal, but if it becomes difficult or impossible to fulfil it, another goal might then be lifted into the dominant position. I will not attempt to treat complex variations of this sort, but useful lessons about them can be gleaned from discussion of more straightforward cases.

14 For ease of presentation, in the future I will omit the qualification about idealized circumstances unless it is important to the specific context. It should also be noted that the idealized circumstances must be specifiable in purely empirical terms, and cannot smuggle in qualifications that would definitionally tie the wants or beliefs of the group to the truth about morality. For example, the idealizations cannot include things like 'what humans would believe about morality if they knew the truth about morality' or 'what humans would want if they were omniscient and desired nothing but to do right'.

15 This may be anthropomorphic, but we need not take any anthropomorphic implications of the term 'desire' too seriously.

16 For our purposes, complicated technical questions in action theory about instrumental extension of desires will not matter.

An SAI by definition achieves greater cognitive feats than human beings in every domain, at least in normal circumstances.17 But among the important domains are complex theoretical reasoning areas like mathematics, science, and philosophy. And among the most important of philosophical issues will be questions about whether objective moral truths exist and, if so, what the specific fundamental moral truths are. So, it seems that we can expect an SAI to engage in complex and sophisticated investigation in metaethics and normative ethics, just in virtue of being an SAI.18

17 One might object to this understanding of superintelligence by pointing out that, even if an SAI is much more cognitively skilled than a human, a talented human being that specializes in thinking about a particular kind of intellectual problem could still achieve more in that domain than an SAI that multi-tasks across many domains. But given the incredible speed of information processing and voluminous memory that any feasible SAI would have (and the great cognitive skill it would have by definition), it seems unlikely that it would fail to outdo even specialized humans in their preferred cognitive domains. The fact that even what are, in the grand scheme of things, primitive computers now outdo humans in domains such as chess and 'Jeopardy!' serves as evidence of this — any genuine SAI would likely subsume all of the abilities of computers that outdo humans in activities like these, plus abilities of computers that will outdo humans in activities where humans currently excel relative to existing technology.

Several competing possible results present themselves:

(i) The SAI believes that there are no objective moral truths — that is, the SAI believes that moral antirealism is true.

(ii) The SAI is unsure about the truth of moral realism and/or about the truth of all the specific moral propositions, or else the SAI is convinced that it does not know any specific moral propositions (even if it is more confident that moral realism is true in the abstract).

(iii) The SAI believes that there are objective moral truths and has beliefs about what some of the specific relevant ones are.

Scenario (i): if the SAI believed that moral antirealism were true, then an optimistic situation would be one where the SAI could act on its original goal, without any motivational conflicts. An SAI in this position could then carry out the instrumental reasoning its designer had in mind for it originally, acting on that reasoning without any hitches.19 A bleaker scenario is one in which the process of failing to discover moral truths — indeed, despairing of their existence — would cause unpredictable psychological changes to emerge in the SAI. An SAI would be extraordinarily psychologically complex — far beyond what we are able to straightforwardly predict using any insights from current programming or AI model-building — and it is far from obvious that discoveries it made that moral truths do not exist would leave it operating as engineered. Its failure to discover moral truths might cause it to refuse to act, thus not implementing the goals indirectly pre-programmed by its designers. (For those who are sceptical that such failures to discover normative truths would derail the SAI, I will have more to say below.)

18 Metaethics is the study of foundational metaphysical and epistemological questions surrounding ethics, such as whether there are moral facts and, if so, how we could come to have knowledge of them. (The view that there are moral facts is known as 'moral realism' and the opposing view as 'moral antirealism'.) Normative ethics involves the study of questions about general moral principles.

19 A larger issue might arise if the nature of the SAI's specific IP were to have the ultimate goal of bringing about what (e.g.) humans would believe is right under idealized circumstances, with the SAI then judging that they would believe that antirealism is true under those circumstances. This would likely paralyse the SAI, and would represent a path to paralysis different from the ones I discuss.

Scenario (ii): if there are objective moral truths but the SAI fails to know them, then perhaps its failure to discover normative facts would not demoralize it or cause psychological changes that remove its original desires. It will simply identify whatever empirical facts about wanting or believing are relevant to achieving the goal called for by its specific IP, then it will go about trying to achieve that goal. Its failure to come to know whether moral realism is true (and hence failure to come to know any specific moral truths) will be but a curiosity to it that can be ignored in its instrumental reasoning. Once again, however, we have the same issues as arose with option (i). If the SAI comes to realize that it has no reason to believe its pre-set desires track any objective normative reality, this may cause it to lose those desires or in some other way change its psychological profile. Again, more on this below.

Scenario (iii): according to this scenario the SAI believes that there are moral truths and further that it knows some of the relevant ones. Scenario (iii) may appear to be the most probable of the three possibilities. If one believes that the truth of moral realism is likely, then it would stand to reason that an SAI, with its cognitive superpowers, would manage to come to know that moral realism is likely true and would learn at least some of the specific truths. If (iii) were the case, though, then the SAI would form beliefs that it should act in accord with these fundamental ethical truths. (This would involve very simple inferences, such as the transition from 'X is right in circumstances C' to 'I should do X in circumstances C'.) It is likely, however, that some of these moral truths will bear on exactly the situations where the SAI already has desires. Thus, the SAI is likely to have beliefs of the form 'I should do X' and desires to do Y, where X and Y pertain to the same situations.20 A crucial question is what the relationship between the X and the Y will be. If the two are consistent, then there is likely to be no further issue. But the two might easily come into conflict.21 And if the two do come into conflict, then we are once again left with several options, depending on the answers to difficult questions about moral psychology (as it applies to the SAI).

20 Such a conflict could also arise in situations where the designers of an SAI directly give it Y as a pre-set goal, of course (or where the SAI derives a desire to Y instrumentally from whatever its directly given pre-set goal is). In fact, it might be more likely that directly given goals would conflict with morality than ones arrived at via an indirect process. After all, these goals would be given by a human designer or predecessor AI designer, and would not benefit from the superior cognitive skill of the SAI. Since, for reasons already discussed, direct specifications of an SAI's goals by designers would have a good chance of leading to disaster, the problem discussed here might provide a ray of hope that such an agent would not destroy the world. Because it is already widely agreed that direct specification of the goals of an SAI is not promising if the aim is to produce an SAI that is ultimately helpful to humans, I omit explicit consideration of motivational issues that affect SAIs with directly specified ultimate goals.

The first possibility is that the SAI might be able to act on its moral belief directly, in competition with the desire that is a result of its pre-set ultimate goal.22 One way this might come about is that the SAI has very robust agent causation, the same sort of agent causation some philosophers attribute to humans.23 This would involve the SAI having the power to act as an (at least partially) unmoved mover. To use a toy example, suppose the SAI's pre-set goal was to do what humans as a whole most want on intrinsic grounds, and this turns out to be for each person to be fed as much ice cream as possible without getting sick.24 Thus the SAI would have a desire to achieve this. But suppose the SAI also engages in ethical theorizing and discovers that what is objectively right is to maximize the expected net pleasure of the world, and so forms the belief that it should act to maximize the expected net pleasure of the world. In our current scenario, the SAI would then have its act utilitarian moral belief pitted against its desire for humans to be fed as much ice cream as possible. It would then be up to the SAI as a kind of unmoved mover which potential motive — the moral belief or the desire — it would ultimately act on.

21 Bostrom (2014), for instance, considers the possibility that the truth about morality might require something very harmful to humans (p. 218). He does not specify what this is, but elsewhere he implies that it might include human extinction. (Perhaps classical hedonistic utilitarianism is true, for instance, and what is best is for humans to be eradicated so that the SAI can free up resources to create large numbers of beings that are more efficient pleasure centres than humans.) If the IP the SAI is following involves doing what humans want, then it looks very unlikely — even under suitable idealizations — that it would align with this moral imperative.

22 Bostrom (2014) briefly addresses this possibility (p. 107, endnote 3). There is a close relationship between familiar questions about moral motivation and determining whether this position or one of the others is most plausible. (This is true particularly for debates between motivational internalists and externalists and between Humeans and anti-Humeans.) I have deliberately avoided formulating the issues in these familiar terms, however, largely because of the sizeable variation in how the various positions are formulated by different philosophers. This should not obscure the fact that the results of these debates will likely be important in settling the questions I raise here.

23 For an example of an overall view that, if applied to artificial intelligences, might be compatible with this sort of possibility, see O'Connor (2000).

24 Obviously, the suggestion that this sort of desire for ice cream would be what humans most want is patently absurd. But it is extremely difficult for us to know what humans as a whole in fact most want — we would have to rely on the SAI's great cognitive powers to discover this. The point here is just to have a clear example for illustration.

Taking this agent causation possibility seriously would force us to confront classical questions about the genuine thought and subjective consciousness of an AI — questions that are normally bypassed in discussions of the practical implications of very powerful AI. There would also be profound metaphysical mysteries associated with such an AI. How would it achieve such agent causation? Would it be ensouled? Would it have an emergent self with the power to act contrary to what the laws of physics alone would lead us to expect, as most proponents of this sort of agent causation presume would be the case for humans? The idea that people (let alone artificial intelligences) can act on moral belief alone is not a popular view, but we should not dismiss it simply for that reason. It would, however, raise many difficult questions. In any case, the ultimate result would probably be a situation where it was difficult or impossible to predict what the SAI would do. The SAI would be in the same sort of position as the paradigmatic human agent with libertarian free will, deciding between the gut pull of tempting desire and the cold dictates of moral conscience. Thus, it would not behave in the straightforward way envisioned by its designers, single-mindedly acting to fulfil its pre-set goals.

The other way the SAI might act on moral beliefs alone is simply for the moral beliefs to function motivationally in the same way as desires — to engage in a complicated psychological pushing match with the pre-existing desires (the desires associated with the pre-set goals and the instrumental extension of those goals through empirical discovery). To return to our toy example, in this scenario the SAI would not be any kind of unmoved mover. Rather, its moral belief that it should maximize expected net pleasure would have a gut pull of its own that would compete with the gut pull of the desire to feed people ice cream. A complicated conflict would result between the potential motives. For all intents and purposes, this option would be indistinguishable from the second possible general relationship between the SAI's moral belief and desire — that its moral beliefs that X is right be unable to directly motivate action, but nevertheless cause the formation of new desires to do X. It is to that possibility that we now turn.25

In a case where a specific moral belief that X is right caused a desire to do X, presumably the desire caused would function in the same general way as the desire that was associated with the pre-set goal — the only way it could be eliminated is to discover that X did not in fact satisfy the requirements for rightness, just as the desire to do Y (formed by instrumentally extending the fundamental desire that the IP specifies) could only be eliminated by discovering that it was not the thing that most humans want when they think suitably clearly (or whatever detailed indirect method the IP specifies). In this case, conflicting desires will be introduced in the SAI — its original desire will exist alongside a new and incompatible desire. This is likely to paralyse the AI, as two competing desires — one the product of the original pre-set goal and the other the product of its moral reasoning process — battle with one another, and there is little basis for supposing that one is stronger or more fundamental than the other, or otherwise possesses some feature that the other does not which will make it win out.26
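The paralysis worry can be displayed with a deliberately crude sketch. It assumes, purely for illustration, that action selection works by comparing the 'strengths' of competing motives; the numbers and the tie rule below are invented and carry no commitment about how a real SAI would be built.

```python
# Crude toy model of motivational deadlock. Nothing here is a claim about
# real SAI engineering; the motives and weights are invented for illustration.

def select_action(motives):
    """Return the action backed by the strongest motive, or None when the
    two strongest motives are equally strong and nothing breaks the tie."""
    ranked = sorted(motives.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # no feature makes either desire win out: paralysis
    return ranked[0][0]

motives = {
    "pursue pre-set goal Y": 1.0,        # desire flowing from the IP pre-set goal
    "do X, believed to be right": 1.0,   # desire caused by the moral belief
}
print(select_action(motives))  # -> None
```

The philosophical question, of course, is whether anything in the SAI's psychology would play the tie-breaking role that this toy model deliberately lacks.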

The final possibility is that the moral beliefs that the SAI forms — the beliefs about what it should do — are motivationally inert. In this case, the SAI would just act on its pre-set ultimate goal. But here we arrive at a difficult issue. If the SAI's theoretical reasoning about morality can have no influence on its action, why is it engaging in that reasoning? Of course, there is nothing that definitionally guarantees that a pre-set ultimate goal will have dictatorial control over the entire psychology of the SAI. It could be that the SAI 'absent-mindedly' thinks about a wide variety of phenomena and picks up relevant information about them, just as humans engage in much complicated mental processing and absorb significant information that is not directly related to their explicit goal achievement attempts. But if that were all that was going on and the reasoning had no possible effect on action, then the SAI would be under strong pressure to shut it down in an effort to increase efficiency. After all, the thinking would be chewing up valuable energy and cognitive resources that could be used for other purposes.27

25 Incidentally, Bostrom (2014) seems to rule out all of the options where moral belief plays a direct or indirect role in motivation with his 'Orthogonality Thesis' (p. 108). I will have more to say below on this issue.

26 I set aside here worries similar to those that arise above: worries about what the SAI's newfound moral knowledge will do to its pre-existing desires. These will of course need to be investigated to arrive at an ultimate assessment.

But plainly, fundamental metaethical questions are real and important, and even humans think about them and seem to make some progress on them. An SAI that didn't think about them would not really be an SAI as we have understood it — it would have gaping blind spots in its cognition, much the way that current cutting-edge AI programs, like Watson or Deep Blue, have gaping blind spots in their cognition, impressive though they may be in certain narrow domains.28 A very impressive artificial intelligence that has profound gaps in its theoretical reasoning that even moderately intelligent humans manage to avoid is not an SAI, because it is not systematically more cognitively skilled than human beings. It is at best what we might call a 'savant' SAI.

One might of course object here that whether we label the AI an 'SAI' is merely a linguistic dispute — a very dangerous form of intelligent agent could develop that would potentially wreak havoc, even if it did not excel in all important cognitive domains. However the linguistic issues are resolved, there are in any case more substantive issues lurking. If an AI had these blind spots, it is unclear whether they could be entirely 'quarantined'. In order to ignore the kinds of ethical issues that the SAI would be ignoring, the SAI's engineering might force it to ignore other important theoretical issues. It might even be deeper conceptual truths about intelligence or thought itself that ensure the SAI will be unable to contemplate various other theoretical questions. There is no guarantee, of course, that any further blind spots that result will not handicap the SAI in crucial practical respects, robbing it of important knowledge and eroding its practical advantages over humans.

27 One might object that this would cause the SAI to shut down such reasoning even in scenarios where theoretical moral investigation might impact decision making (such as in the possibility we considered just above — if the reasoning is not directly in service to the pre-set goal the AI has, it will not be allowed to get off the ground). But one can easily imagine that the SAI would come to at least some conclusions that were relevant to action so quickly that it would not have an opportunity to shut the process down before interference with pre-set goals emerged. But if no interference can arise, then the SAI has the leisure to stop the process with the single-mindedness of its original aims intact.

28 Watson and Deep Blue are IBM computers designed to play 'Jeopardy!' and chess respectively. Deep Blue famously defeated world champion Garry Kasparov in a chess series in 1997. Watson won a multi-game Jeopardy! competition in 2011 against Brad Rutter and Ken Jennings, the two biggest winners in the show's history.

The variety of options that might face an SAI can be a bit bewildering, but see Figure 1 below for a summary. The crucial things to note are that in virtually no scenario is it clear that the SAI will act in the straightforward way envisioned by its designer, and in some scenarios the SAI fails to act at all. These scenarios where the SAI fails to act at all may be among the most likely to occur.

Figure 1.

4. Objections

Now let us consider objections:

Objection (1): Consider a scenario where the SAI fails to be convinced that it has discovered moral truths, either because it thinks there is a good chance that moral antirealism is true or because it believes it is unable to find the specific truths. Above, you suggest that this might cause the SAI to shut down, because it might provoke a kind of despair to arise in the SAI that causes it to cease to have its pre-set desires, once it realizes that it has no reason to believe that those pre-set desires track any normative truths. But this will not occur. Philosophers like Andrew Sepielli have begun to work out normative theories of action for agents who are in a state of moral uncertainty. These theories promise to offer rigorous guidelines for acting when one has some confidence in the moral rightness of one course of action and some confidence in the rightness of other courses of action (see, for example, Sepielli, 2014).29 Thus, the SAI will be able to use one of these theories — or a successor theory that it develops with its own superintelligence — to decide on what course of action it is justified in taking. The conclusion of this deliberation process will then be that it should follow this course of action.
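To give a concrete flavour of the sort of guideline the objection has in mind, one simple formalization that is often discussed in the literature on moral uncertainty (an illustrative sketch only, and not necessarily Sepielli's own proposal) has the agent maximize expected choiceworthiness:

EC(a) = \sum_{i} c(T_i) \cdot CW_{T_i}(a)

where c(T_i) is the agent's credence in moral theory T_i and CW_{T_i}(a) is how choiceworthy action a is according to T_i; the agent then performs the action with the highest EC(a). Note that a rule of this kind presupposes exactly what the response below questions: that the agent can assign specific non-zero credences and comparable choiceworthiness scores in the first place.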

Response — first, even if this process works, it may only move the SAI from one kind of problematic situation (a situation where its despair is causing it to lose its pre-set desires) to another (a situation where its moral beliefs come into conflict with its pre-set desires). Second, a theory of this sort only has a chance of working if the SAI has difficulty deciding between the rightness of different courses of action. But what about when the SAI has decisive reason to believe that moral antirealism is true, or when it has no reason to place any confidence in any specific moral proposition? Such an approach can do nothing to help an SAI in this position.

Setting these preliminary issues aside, theories like Sepielli's presuppose that agents have specific non-zero degrees of belief in particular propositions, or at least something that closely approximates such degrees of belief.30 But there may be various kinds of sceptical problems that prevent the SAI from achieving the sort of justification it would need in order to confidently assign such degrees of belief to specific normative propositions. There may also be a variety of sceptical problems that must be set aside in order for these theories to work, and the SAI — because of its great intelligence — may find it impossible to set these problems aside.31 Even ignoring any anti-sceptical assumptions that must be made in order to embrace the theory itself, if the SAI is trapped in a position where it is unable to confidently assign specific degrees of belief to specific normative propositions, it will be in the position classically described as 'acting under ignorance' rather than 'acting under risk'. And most philosophers would agree that acting under ignorance is an area where little rigorous progress has been made.32

29 The uncertainty described here is purely normative — it is not uncertainty about empirical issues.

30 There are some technical qualifications that may apply here to theories like Sepielli's, but none of those qualifications will impact the fundamental points I am making.

31 For more detail on how sceptical problems might hamper SAIs, see Corabi (unpublished).

32 See the basic discussion of acting under ignorance in Resnik (1987, chapter 2). Also, traditionally 'acting under risk' refers to risk or uncertainty on the empirical side, not the normative one. But it is natural to extend the meaning to normative risk or uncertainty as well.
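To make the contrast between acting under risk and acting under ignorance concrete, here is a small illustrative sketch of my own (the options and numbers are invented, and nothing here is specific to SAIs): under risk an agent has probabilities to work with and can, for instance, maximize expected value; under ignorance it has no probabilities, and rules such as maximin have been proposed instead.

```python
# Toy contrast between acting under risk and acting under ignorance.
# The acts, states, and payoffs are invented purely for illustration.

outcomes = {                    # value of each act in each possible state
    "act_A": {"state_1": 10, "state_2": 0},
    "act_B": {"state_1": 4, "state_2": 3},
}

def choose_under_risk(outcomes, probabilities):
    """With probabilities available, pick the act with highest expected value."""
    expected = {
        act: sum(probabilities[s] * v for s, v in vals.items())
        for act, vals in outcomes.items()
    }
    return max(expected, key=expected.get)

def choose_under_ignorance_maximin(outcomes):
    """With no probabilities, one proposed rule (maximin) picks the act
    whose worst-case outcome is best."""
    return max(outcomes, key=lambda act: min(outcomes[act].values()))

print(choose_under_risk(outcomes, {"state_1": 0.9, "state_2": 0.1}))  # act_A
print(choose_under_ignorance_maximin(outcomes))                       # act_B
```

The two rules can recommend different acts from the same payoffs, which is one way of seeing why the move from risk to ignorance leaves so much less settled.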

Objection (2): Above, you take seriously a number of possible scenarios where the results of the SAI’s theoretical reasoning about morality can have a pronounced effect on motivation. In some situations, this effect involves the SAI losing its pre-set desires, as a result of coming to realize that those pre-set desires do not track the normative facts. In others, the effect involves the SAI failing to act on its pre-set desires because competing motives are created by the process of reasoning about morality. But we should reject all of these scenarios, because all of them conflict with what Bostrom calls the ‘Orthogonality Thesis’: the claim that ‘[i]ntelligence and final goals are orthogonal: more or less any level of intelligence could in principle be combined with more or less any final goal’ (Bostrom, 2014, p. 107). Since we have very good reason to believe the Orthogonality Thesis, we have good reason to think an AI with arbitrarily high levels of intelligence would have no difficulty acting on its pre-set goals. This is because the obvious reason why intelligence and final goals are orthogonal is that intelligence (and thus the result of the thinking process associated with intelligence) is unable to impact final goals, and intelligence having an effect on final goals is something all of the above scenarios require in order to be plausible.

Response — if the Orthogonality Thesis were true, most or all of the above scenarios would lose their plausibility — scenarios where the SAI either fails to act in keeping with its pre-set goals or fails to act at all. It is far from obvious that the Orthogonality Thesis is true, however. Bostrom himself offers no sustained argument for it. Moreover, the presence of controversy in moral psychology over positions that deny moral belief a role (either direct or indirect) in human motivation should make us suspicious of an overly facile acceptance of the Orthogonality Thesis. This is because if moral belief plays a role in human motivation, and if SAIs share at least very broad psychological similarities with humans, then it appears that the moral reasoning of SAIs will also impact their motivation.33

We should also be wary of accepting the Orthogonality Thesis on Bostrom’s authority, because there is a potentially good argument that Bostrom should reject it based on his own principles. He must face the possibility (described above) that an SAI whose moral reasoning has no impact on motivation will shut down that moral reasoning in an effort to increase efficiency in the pursuit of its pre-set goals. But given that he proposes a definition of ‘superintelligence’ that is similar to mine, he faces the worry that this will cause the SAI to cease to be a superintelligence, because this will result in the SAI having glaring weaknesses in its cognition. (Bostrom says a superintelligence is ‘any intellect that exceeds the cognitive performance of humans in virtually all domains of interest’ — Bostrom, 2014, p. 22.) As already discussed, this is partially only a linguistic matter. But there are also lurking substantive issues as well, depending on how cleanly the shut-down can be quarantined. If quarantining is impossible and the SAI must keep its moral reasoning online (in order to keep whatever else might be impacted online), a realistic best case scenario (for enthusiasts of the view that the SAI will act as programmed) may be a ‘sociopathic’ SAI — one that has many moral beliefs, but sees these beliefs as a mere curiosity that is not to interfere with the business of action.34

33 The broad psychological similarity could consist merely in both humans and SAIs having some mental states that, roughly speaking, aim to fit the world and others that aim to make the world fit them — beliefs and desires, respectively.

34 Obviously, real-life sociopathy in humans is a bit more complicated than this, and presumably it would be in SAIs as well. But the basic point still stands.

Objection (3): Above, you describe many scenarios where the SAI winds up in uncomfortable motivational conflicts. Some of these conflicts involve moral beliefs and desires, some involve desires and other desires, and others involve a theoretical reasoning process removing a desire and leaving nothing remaining to provide motivation for action. But an SAI designer could easily remedy this situation: the designer could artificially give the SAI default beliefs or desires that would kick in whenever the SAI wound up in one of these difficult positions, trumping whatever beliefs or desires led to the impasse in the first place.

Response — while such an approach may work, it is far from obvious that it will. An SAI would be enormously cognitively sophisticated, and any attempts to give it artificial fallback beliefs or desires may be doomed to failure. It is important that we not slip into thinking of an SAI as a very fast but very unreflective instrumental rationality engine. In order to accomplish all of the cognitive feats we naturally associate with superintelligence, an SAI would need to be extremely reflective. There is no guarantee that the SAI will not carefully examine its own pre-programmed psychological tendencies and work to override them if it judges them to be inappropriate or unjustified in some way — in fact, the opposite is probably closer to being the case. It will be difficult for the SAI — as long as it continues to be an SAI — to avoid engaging in this sort of tweaking process. Humans often reflect on their moral beliefs and desires, after all, and routinely take steps to change them when conflicts with the results of moral reasoning are perceived. Would an SAI, with its far superior cognitive acumen, forgo such an approach? It is true that, if the Orthogonality Thesis holds up, the SAI may be able to shut down its moral reasoning with its pre-set desires intact (or perhaps allow that moral reasoning to continue unfettered but detached from motivation). But we have already seen that strong reason to accept the Orthogonality Thesis is not readily apparent.

Objection (4): An SAI need not have engineering that is wholly different from that of a human being. One potential way to build an SAI is to start with a human brain and then enhance that human brain through dramatic genetic engineering and augmentation by artificial devices. Would not such an SAI surely have the capacity for action?

Response — it may well be that different kinds of SAIs will have architectural differences that could lead to dramatically different results. It is worth noting that some of the scenarios above would actually have SAIs behaving in a way that is more unpredictable and less tidy than what AI theorists often suppose. Some of this unpredictable behaviour could easily make the SAIs more human-like rather than less. So, SAIs whose cognitive architecture is heavily biological and heavily based on human cognitive architecture might behave in ways more similar to humans than SAIs with a completely non-biological set-up. But we should be careful about concluding that biologically inspired SAIs will be very human-like in their motivation. If the creation of these SAIs does involve heavy artificial augmentation or genetic engineering that radically changes the architecture of the brain, then it is plausible to suppose that one side effect of the increased cognitive performance of the SAI will be a departure from the kinds of motivational profile familiar from human psychology. I admit, however, that much here remains to be investigated.

5. Conclusion

Our examination of SAI motivation offers a number of lessons. There are many reasons to think that SAIs will fail to act as the well-behaved instrumental rationality engines many AI enthusiasts imagine them to be. And there are some reasons to think that SAIs will fail to act at all.

Is it possible to resist the claim that an SAI will head down one of the more unexpected paths described above? Certainly — I have not attempted to offer a demonstration of anything, and I don't wish to overstate my level of confidence in any of my more adventurous suggestions. Perhaps the most promising strategy of resistance here is to offer a sustained defence of the Orthogonality Thesis. This could then be combined with the contention that either there would be abstruse instrumental reasons (given its pre-set goal) for the SAI to continue to engage in theoretical reasoning about morality or the SAI's theoretical reasoning about morality would be an inevitable by-product of some other useful process that the SAI would have instrumental reason to continue to engage in. While I am suspicious of the likely success of this approach (for reasons already discussed), I certainly would not claim to have refuted it. As I mentioned above, a large part of my aim is to spur further investigation of just these kinds of issues, so that speculation can be replaced with rigorous argument. In the course of the investigation, two big picture topics will undoubtedly stand out.

First, it will be very important to understand how an SAI's process of moral investigation and moral belief formation will affect its motivation. Can SAIs act directly on the moral beliefs they form as a result of their moral investigations, even when these compete with pre-set goals? If not, do these moral beliefs at least give rise to desires that can compete with the pre-set goals? If the SAI fails to discover any moral truths, will this cause already present ultimate desires to disappear, perhaps because the SAI realizes that they do not track any normative reality it has reason to believe in?

Second, it will be important for us to examine how plausible it is for an SAI to have cognitive powers that are quarantined, with the SAI managing to employ them to attain almost unimaginable heights in many intellectual domains while failing even to accomplish what humans do in others.35 Related to this is the issue of how feasible it is that the SAI would be more reasonable than the most ruthless of human investigators in some domains and totally unreasonable in others. This will require us to understand in more detail than we do at present what form an SAI is likely to take, and how the psychology of an SAI of that form will work.

In the meantime, caution seems prudent in our experimentation. On the practical side, although I do think that the motivational paralysis of an SAI is an underappreciated possibility, I do not believe that taking reckless risks with AI technology is a good idea. Things could still go very badly, as a number of the scenarios described above suggest.36

35 Obviously some computers already do this in a few narrow domains. But we are speaking here of artificial intelligence that exceeds human capacity in a wide variety of domains, including domains where computers have thus far demonstrated little aptitude.

36 One general reason to worry that a non-paralysed SAI would eventually be created is that, each time an SAI was created and wound up paralysed, its designers could carefully examine it and try to tweak the design to provoke the next version to act decisively. The designer need only win this battle one time to create a massively powerful SAI. An AI could destroy us before it ever even reaches the SAI stage. (I am grateful to Geoff Anders for suggesting the point about design tweaks.)

Acknowledgments

I am grateful to Susan Schneider and several anonymous referees at Journal of Consciousness Studies for reading previous drafts of this paper and suggesting a number of helpful improvements.

References

Bostrom, N. (2014) Superintelligence: Paths, Dangers, Strategies, Oxford: Oxford University Press.

Chalmers, D. (2012) The singularity: A philosophical analysis, Journal of Consciousness Studies, 17 (9–10), pp. 7–65.

Corabi, J. (unpublished) Artificial Intelligence and Skepticism.

Gelernter, D. (2016) Tides of Mind, New York: Liveright.

Good, I.J. (1965) Speculations concerning the first ultraintelligent machine, in Alt, F.L. & Rubinoff, M. (eds.) Advances in Computers, New York: Academic Press.

Kadtke, J. & Wells, L. (2014) Policy Challenges of Accelerating Technological Change: Security Policy and Strategy Implications of Parallel Scientific Revolutions, [Online], http://ctnsp.dodlive.mil/2014/09/12/dtp-106-policy-challenges-of-accelerating-technological-change-security-policy-and-strategy-implications-of-parallel-scientific-revolutions/

O'Connor, T. (2000) Persons and Causes: The Metaphysics of Free Will, New York: Oxford University Press.

Resnik, M. (1987) Choices: An Introduction to Decision Theory, Minneapolis, MN: University of Minnesota Press.

Russell, B. (1918) The philosophy of logical atomism, The Monist, vol? pp?.

Sepielli, A. (2014) What to do when you don't know what to do when you don't know what to do…, Noûs, 48 (3), pp. 521–544.

Whitby, B. (2003) A.I.: A Beginner's Guide, Oxford: Oneworld Publications.

Wolchover, N. (2014) Concerns of an artificial intelligence pioneer, Quanta Magazine, [Online], https://www.quantamagazine.org/read-offline/16356/20150421-concerns-of-an-artificial-intelligence-pioneer.print.

Yampolskiy, R. (2015) Artificial Superintelligence: A Futuristic Approach, London: Chapman and Hall/CRC.

Paper received March 2016; revised September 2016.