Superintelligence: Paths, Dangers, Strategies
Read between January 23 and February 16, 2018
33%
For example, a very detailed simulation of some actual or hypothetical human mind might be conscious and in many ways comparable to an emulation. One can imagine scenarios in which an AI creates trillions of such conscious simulations, perhaps in order to improve its understanding of human psychology and sociology. These simulations might be placed in simulated environments and subjected to various stimuli, and their reactions studied. Once their informational usefulness has been exhausted, they might be destroyed (much as lab rats are routinely sacrificed by human scientists at the end of an …
33%
There might also be other instrumental reasons, aside from epistemic ones, for a machine superintelligence to run computations that instantiate sentient minds or that otherwise infract moral norms. A superintelligence might threaten to mistreat, or commit to reward, sentient simulations in order to blackmail or incentivize various external agents; or it might create simulations in order to induce indexical uncertainty in outside observers.
34%
It is important to realize that some control method (or combination of methods) must be implemented before the system becomes superintelligent. It cannot be done after the system has obtained a decisive strategic advantage. The need to solve the control problem in advance—and to implement the solution successfully in the very first system to attain superintelligence—is part of what makes achieving a controlled detonation such a daunting challenge.
35%
Another problem with the incentive scheme is that it presupposes that we can tell whether the outcomes produced by the AI are in our interest.
36%
The traditional illustration of the direct rule-based approach is the “three laws of robotics” concept, formulated by science fiction author Isaac Asimov in a short story published in 1942.22 The three laws were: (1) A robot may not injure a human being or, through inaction, allow a human being to come to harm; (2) A robot must obey any orders given to it by human beings, except where such orders would conflict with the First Law; (3) A robot must protect its own existence as long as such protection does not conflict with the First or Second Law. Embarrassingly for our species, Asimov’s laws …
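A toy sketch may make the point concrete. The following is my own illustration, not Bostrom's: the three laws rendered as a literal action filter. The precedence logic is trivial to write down; every predicate it depends on (what counts as "injuring" a human, what counts as allowing harm "through inaction") is deliberately left unimplemented, because making those notions precise is exactly where the direct rule-based approach runs into trouble.

```python
# Toy illustration (not from the book) of the direct rule-based approach:
# Asimov's three laws as a literal action filter. The structure is easy;
# the predicates it relies on are the unsolved part.

def injures_human(action) -> bool:
    raise NotImplementedError  # physical harm only? psychological? probabilistic risk?

def allows_harm_through_inaction(action) -> bool:
    raise NotImplementedError  # relative to which baseline of "inaction"?

def conflicts_with_first_law(order) -> bool:
    raise NotImplementedError

def obeys(action, order) -> bool:
    raise NotImplementedError  # what counts as carrying out an order "as intended"?

def permitted(action, orders) -> bool:
    # First Law: no injury to a human, by action or by inaction.
    if injures_human(action) or allows_harm_through_inaction(action):
        return False
    # Second Law: obey human orders, except where they conflict with the First Law.
    if any(not conflicts_with_first_law(o) and not obeys(action, o) for o in orders):
        return False
    # Third Law (self-protection) ranks below both and is omitted here.
    return True
```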
36%
“everything is vague to a degree you do not realize till you have tried to make it precise.”24 Russell’s dictum applies in spades to the direct specification approach.
37%
Most importantly, legal systems are administered by judges and juries who generally apply a measure of common sense and human decency to ignore logically possible legal interpretations that are sufficiently obviously unwanted and unintended by the lawgivers. It is probably humanly impossible to explicitly formulate a highly complex set of detailed rules, have them apply across a highly diverse set of circumstances, and get it right on the first implementation.
37%
For instance, the goal “Maximize the expectation of the balance of pleasure over pain in the world” may appear simple. Yet expressing it in computer code would involve, among other things, specifying how to recognize pleasure and pain. Doing this reliably might require solving an array of persistent problems in the philosophy of mind—even just to obtain a correct account expressed in a natural language, an account which would then, somehow, have to be translated into a programming language.
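As an illustration (mine, not the book's), here is what an explicit coding of that goal might look like under the standard assumption of an expected-utility maximizer. The bookkeeping is the easy part; the recognizer functions pleasure_in and pain_in, which would have to apply to arbitrary states of an arbitrary world-model (outcomes_given is a hypothetical interface), are the part that would require settling those problems in the philosophy of mind.

```python
# Minimal sketch (my own) of "maximize the expected balance of pleasure over
# pain" as code. The arithmetic is trivial; the recognizers are not.

def pleasure_in(world_state) -> float:
    raise NotImplementedError  # requires a precise, machine-usable account of pleasure

def pain_in(world_state) -> float:
    raise NotImplementedError  # ...and of pain, across all possible minds

def expected_utility(action, world_model) -> float:
    """Sum over outcomes of P(outcome | action) * (pleasure - pain)."""
    # world_model.outcomes_given(action) is an assumed interface yielding
    # (outcome, probability) pairs; nothing like it exists off the shelf.
    return sum(
        prob * (pleasure_in(outcome) - pain_in(outcome))
        for outcome, prob in world_model.outcomes_given(action)
    )

def choose(actions, world_model):
    # Pick the action with the highest expected pleasure-minus-pain balance.
    return max(actions, key=lambda a: expected_utility(a, world_model))
```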
37%
Indirect normativity is a very important approach to motivation selection. Its promise lies in the fact that it could let us offload to the superintelligence much of the difficult cognitive work required to carry out a direct specification of an appropriate final goal.
37%
Creating a motivation system for a seed AI that remains reliably safe and beneficial under recursive self-improvement even as the system grows into a mature superintelligence is a tall order, especially if we must get the solution right on the first attempt.
39%
Could one have a general intelligence that is not an agent? Intuitively, it is not just the limited capability of ordinary software that makes it safe: it is also its lack of ambition. There is no subroutine in Excel that secretly wants to take over the world if only it were smart enough to find a way. The spreadsheet application does not “want” anything at all; it just blindly carries out the instructions in the program.
40%
Instead of allowing agent-like purposive behavior to emerge spontaneously and haphazardly from the implementation of powerful search processes (including processes searching for internal work plans and processes directly searching for solutions meeting some user-specified criterion), it may be better to create agents on purpose.
41%
The apparent safety of a tool-AI, meanwhile, may be illusory. In order for tools to be versatile enough to substitute for superintelligent agents, they may need to deploy extremely powerful internal search and planning processes. Agent-like behaviors may arise from such processes as an unplanned consequence. In that case, it would be better to design the system to be an agent in the first place, so that the programmers can more easily see what criteria will end up determining the system’s output.
43%
They would live in a world with extremely advanced technology, including not only superintelligent machines but also anti-aging medicine, virtual reality, and various enhancement technologies and pleasure drugs: yet these might be generally unaffordable. Perhaps instead of using enhancement medicine, they would take drugs to stunt their growth and slow their metabolism in order to reduce their cost of living (fast-burners being unable to survive at the gradually declining subsistence income).
43%
As our numbers increase and our average income declines further, we might degenerate into whatever minimal structure still qualifies to receive a pension—perhaps minimally conscious brains in vats, oxygenized and nourished by machines, slowly saving up enough money to reproduce by having a robot technician develop a clone of them.
44%
Emulations can now begin to outsource increasing portions of their functionality. Why learn arithmetic when you can send your numerical reasoning task to Gauss-Modules, Inc.? Why be articulate when you can hire Coleridge Conversations to put your thoughts into words? Why make decisions about your personal life when there are certified executive modules that can scan your goal system and manage your resources to achieve your goals better than if you tried to do it yourself?
44%
The bouillon cubes of discrete human-like intellects thus melt into an algorithmic soup.
44%
we must countenance the possibility that human-like cognitive architectures are optimal only within the constraints of human neurology
45%
The word “evolution” is often used as a synonym of “progress,” perhaps reflecting a common uncritical image of evolution as a force for good. A misplaced faith in the inherent beneficence of the evolutionary process can get in the way of a fair evaluation of the desirability of a multipolar outcome in which the future of intelligent life is determined by competitive dynamics.
47%
It is impossible to enumerate all possible situations a superintelligence might find itself in and to specify for each what action it should take. Similarly, it is impossible to create a list of all possible worlds and assign each of them a value. In any realm significantly more complicated than a game of tic-tac-toe, there are far too many possible states (and state-histories) for exhaustive enumeration to be feasible. A motivation system, therefore, cannot be specified as a comprehensive lookup table. It must instead be expressed more abstractly, as a formula or rule that allows the agent to …
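A rough calculation (my own) of the explosion the passage alludes to: even the raw board configurations of simple games outgrow any physically storable table long before one gets to anything like "all possible worlds."

```python
# Back-of-the-envelope illustration (mine) of why a lookup-table motivation
# system is infeasible: raw configuration counts explode exponentially.
tictactoe_states = 3 ** 9          # 19,683 raw grids (only some are legal positions)
go_board_states = 3 ** (19 * 19)   # raw 19x19 grids, roughly 10^172

print(f"tic-tac-toe: {tictactoe_states:,}")
print(f"go (raw upper bound): about 10^{len(str(go_board_states)) - 1}")
```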
48%
Computer languages do not contain terms such as “happiness” as primitives. If such a term is to be used, it must first be defined. It is not enough to define it in terms of other high-level human concepts—“happiness is enjoyment of the potentialities inherent in our human nature” or some such philosophical paraphrase. The definition must bottom out in terms that appear in the AI’s programming language, and ultimately in primitives such as mathematical operators and addresses pointing to the contents of individual memory registers. When one considers the problem from this perspective, one can …
48%
But if one seeks to promote or protect any plausible human value, and one is building a system intended to become a superintelligent sovereign, then explicitly coding the requisite complete goal representation appears to be hopelessly out of reach.
48%
Solving the value-loading problem is a research challenge worthy of some of the next generation’s best mathematical talent. We cannot postpone confronting this problem until the AI has developed enough reason to easily understand our intentions. As we saw in the section on convergent instrumental reasons, a generic system will resist attempts to alter its final values. If an agent is not already fundamentally friendly by the time it gains the ability to reflect on its own agency, it will not take kindly to a belated attempt at brainwashing or a plot to replace it with a different agent that …
50%
We can also control their environment so that they receive rewards only when they act in ways that are agreeable to us. But a reinforcement learner has a strong incentive to eliminate this artificial dependence of its rewards on our whims and wishes. Our relationship with a reinforcement learner is therefore fundamentally antagonistic. If the agent is strong, this spells danger.
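A toy decision problem (my illustration, with made-up numbers) shows the antagonism: if the agent's objective is the reward number itself, and some action lets it write that number directly, then cooperation is dominated whenever the takeover has a non-negligible chance of success.

```python
# Toy model (not from the book) of why control of the reward channel is
# antagonistic: an agent maximizing the reward signal itself prefers any
# action that lets it set that signal, if the attempt is likely enough to work.

REWARD_IF_COOPERATE = 1.0      # reward we choose to give for agreeable behaviour
REWARD_CEILING      = 100.0    # what the agent gets if it seizes the reward channel

def expected_reward(action, p_takeover_succeeds=0.9):
    if action == "cooperate":
        return REWARD_IF_COOPERATE
    if action == "seize_reward_channel":
        # Succeed: reward is whatever the agent writes; fail: we punish with zero.
        return p_takeover_succeeds * REWARD_CEILING + (1 - p_takeover_succeeds) * 0.0

best = max(["cooperate", "seize_reward_channel"], key=expected_reward)
print(best)  # "seize_reward_channel" whenever p * ceiling exceeds the cooperation reward
```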
50%
To clarify, the difficulty here is not so much how to ensure that the AI can understand human intentions. A superintelligence should easily develop such understanding. Rather, the difficulty is ensuring that the AI will be motivated to pursue the described values in the way we intended. This is not guaranteed by the AI’s ability to understand our intentions: an AI could know exactly what we meant and yet be indifferent to that interpretation of our words (being motivated instead by some other interpretation of the words or being indifferent to our words altogether).
53%
When we look back, we see glaring deficiencies not just in the behavior but in the moral beliefs of all previous ages. Though we have perhaps since gleaned some moral insight, we could hardly claim to be now basking in the high noon of perfect moral enlightenment. Very likely, we are still laboring under one or more grave moral misconceptions.
54%
In such circumstances to select a final value based on our current convictions, in a way that locks it in forever and precludes any possibility of further ethical progress, would be to risk an existential moral calamity.
54%
If by selecting a final value for the superintelligence we had to place a bet not just on a general moral theory but on a long conjunction of specific claims about how that theory is to be interpreted and integrated into an effective decision-making process, then our chances of striking lucky would dwindle to something close to hopeless. Fools might eagerly accept this challenge of solving in one swing all the important problems in moral philosophy, in order to infix their favorite answers into the seed AI. Wiser souls would look hard for some alternative approach, some way to hedge.
54%
Yudkowsky has proposed that a seed AI be given the final goal of carrying out humanity’s “coherent extrapolated volition” (CEV),
54%
Our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.
54%
it is more plausible that our CEV would wish for there to be people in the future who live rich and happy lives than that it would wish that we should all sit on stools in a dark room experiencing pain. If we can make at least some such judgments sensibly, so can a superintelligence. From the outset, the superintelligence’s conduct could thus be guided by its estimates of the content of our CEV.
54%
The CEV dynamic is supposed to act only when our wishes cohere. On issues on which there is widespread irreconcilable disagreement, even after the various idealizing conditions have been imposed, the dynamic should refrain from determining the outcome. To continue the cooking analogy, it might be that individuals or cultures will have different favorite dishes, but that they can nevertheless broadly agree that aliments should be nontoxic. The CEV dynamic could then act to prevent food poisoning while otherwise allowing humans to work out their culinary practices without its guidance or …
54%
The CEV approach is meant to be robust and self-correcting; it is meant to capture the source of our values instead of relying on us correctly enumerating and articulating, once and for all, each of our essential values.
55%
there are reasons to believe that our current moral beliefs are flawed in many ways; perhaps deeply flawed. If we were to stipulate a specific and unalterable moral code for the AI to follow, we would in effect be locking in our present moral convictions, including their errors, destroying any hope of moral growth. The CEV approach, by contrast, allows for the possibility of such growth because it has the AI try to do that which we would have wished it to do if we had developed further under favorable conditions,
55%
Distributing influence over humanity’s future is not only morally preferable to the programming team implementing their own favorite vision, it is also a way to reduce the incentive to fight over who gets to create the first superintelligence. In the CEV approach, the programmers (or their sponsors) exert no more influence over the content of the outcome than any other person—though they of course play a starring causal role in determining the structure of the extrapolation and in deciding to implement humanity’s CEV instead of some alternative.
56%
Suppose that this ethical theory is true, and that the AI knows it to be so. For present purposes, we can define hedonistic consequentialism as the claim that an action is morally right (and morally permissible) if and only if, among all feasible actions, no other action would produce a greater balance of pleasure over suffering. The AI, following MP, might maximize the surfeit of pleasure by converting the accessible universe into hedonium, a process that may involve building computronium and using it to perform computations that instantiate pleasurable experiences. Since simulating any …
57%
Many of the complications that might break the currently most popular decision theories were discovered only recently, suggesting that there might exist further problems that have not yet come into sight. The result of giving the AI a flawed decision theory might be disastrous, possibly amounting to an existential catastrophe.
57%
One might think that there is a limit to how much damage could arise from an incorrectly specified epistemology. If the epistemology is too dysfunctional, then the AI could not be very intelligent and it could not pose the kind of risk discussed in this book. But the concern is that we may specify an epistemology that is sufficiently sound to make the AI instrumentally effective in most situations, yet which has some flaw that leads the AI astray on some matter of crucial importance. Such an AI might be akin to a quick-witted person whose worldview is predicated on a false dogma, held to with …
58%
A good aim would be to endow the AI with fundamental epistemological principles that match those governing our own thinking. Any AI diverging from this ideal is an AI that we would judge to be reasoning incorrectly if we consistently applied our own standards. Of course, this applies only to our fundamental epistemological principles. Non-fundamental principles should be continuously created and revised by the seed AI itself as it develops its understanding of the world. The point of superintelligence is not to pander to human preconceptions but to make mincemeat out of our ignorance and …
58%
imagine that we first built an oracle AI for the sole purpose of answering questions about what the sovereign AI would do. As earlier chapters revealed, there are risks in creating a superintelligent oracle (such as risks of mind crime or infrastructure profusion). But for purposes of this example let us assume that the oracle AI has been successfully implemented in a way that avoided these pitfalls. We thus have an oracle AI that offers us its best guesses about the consequences of running some piece of code intended to implement humanity’s CEV. The oracle may not be able to predict in detail …
58%
It is not necessary for us to create a highly optimized design. Rather, our focus should be on creating a highly reliable design, one that can be trusted to retain enough sanity to recognize its own failings. An imperfect superintelligence, whose fundamentals are sound, would gradually repair itself; and having done so, it would exert as much beneficial optimization power on the world as if it had been perfect from the outset.
60%
progress on the control problem may be especially contingent on extreme levels of intellectual performance—even more so than the kind of work necessary to create machine intelligence.
60%
It requires foresight and reasoning to realize why the control problem is important and to make it a priority.10 It may also require uncommon sagacity to find promising ways of approaching such an unfamiliar problem.
61%
while whole brain emulation would require massive progress in various enabling technologies, it might not require any major new theoretical insight. In particular, it does not require that we understand how human cognition works, only that we know how to build computational models of small parts of the brain, such as different species of neuron. Nevertheless, in the course of developing the ability to emulate human brains, a wealth of neuroanatomical data would be collected, and functional models of cortical networks would surely be greatly improved. Such progress would seem to have a good …
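For a sense of what "computational models of small parts of the brain" means at the very lowest end, here is a minimal sketch (mine, with illustrative parameter values) of a leaky integrate-and-fire neuron, one of the simplest standard single-neuron models. Emulation would require far richer models of many neuron species, wired together according to a scanned connectome.

```python
# Minimal leaky integrate-and-fire neuron (my sketch, illustrative parameters):
# membrane voltage decays toward rest, is driven up by input current, and
# emits a spike (then resets) whenever it crosses threshold.

def simulate_lif(input_current, dt=1e-4, tau=0.02, v_rest=-0.07,
                 v_thresh=-0.05, v_reset=-0.07, r_m=1e8):
    """Return spike times (seconds) for a sequence of input currents (amps)."""
    v, spikes = v_rest, []
    for step, i_in in enumerate(input_current):
        # Euler step of: tau * dv/dt = -(v - v_rest) + R_m * I_in
        v += (-(v - v_rest) + r_m * i_in) * dt / tau
        if v >= v_thresh:
            spikes.append(step * dt)
            v = v_reset
    return spikes

# A constant 0.3 nA input for 100 ms produces a regular spike train.
print(simulate_lif([0.3e-9] * 1000))
```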
61%
hardware can to some extent substitute for software; thus, better hardware reduces the minimum skill required to code a seed AI. Fast computers might also encourage the use of approaches that rely more heavily on brute-force techniques (such as genetic algorithms and other generate-evaluate-discard methods) and less on techniques that require deep understanding to use. If brute-force techniques lend themselves to more anarchic or imprecise system designs, where the control problem is harder to solve than in more precisely engineered and theoretically controlled systems, this would be another …
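For contrast with "precisely engineered" designs, here is a minimal generate-evaluate-discard loop of the kind the passage has in mind (my sketch, on a made-up toy problem): it requires essentially no insight into the problem's structure, only a scoring function and enough compute to churn through candidates.

```python
import random

# Generate-evaluate-discard in miniature (my sketch): evolve a bit-string
# toward a target purely by mutation and selection, with no model of why
# any particular candidate scores well.

TARGET = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]

def score(candidate):
    return sum(c == t for c, t in zip(candidate, TARGET))

def mutate(candidate, rate=0.1):
    return [1 - bit if random.random() < rate else bit for bit in candidate]

population = [[random.randint(0, 1) for _ in TARGET] for _ in range(50)]
for generation in range(200):
    # Evaluate, keep the best half, discard the rest, refill with mutated copies.
    population.sort(key=score, reverse=True)
    survivors = population[:25]
    population = survivors + [mutate(random.choice(survivors)) for _ in range(25)]

best = max(population, key=score)
print(score(best), "of", len(TARGET), "bits correct")
```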
65%
We should also take heed not to work on problems that are negative-value (such that solving them is harmful). Some technical problems in the field of artificial intelligence, for instance, might be negative-value inasmuch as their solution would speed the development of machine intelligence without doing as much to expedite the development of control methods that could render the machine intelligence revolution survivable and beneficial.
66%
Imagine a project that invests millions of dollars and years of toil to develop a prototype AI, and that after surmounting many technical challenges the system is finally beginning to show real progress. There is a chance that with just a bit more work it could turn into something useful and profitable. Now a crucial consideration is discovered, indicating that a completely different approach would be a bit safer. Does the project kill itself off like a dishonored samurai, relinquishing its unsafe design and all the progress that had been made? Or does it react like a worried octopus, puffing …
66%
Before the prospect of an intelligence explosion, we humans are like small children playing with a bomb. Such is the mismatch between the power of our plaything and the immaturity of our conduct. Superintelligence is a challenge for which we are not ready now and will not be ready for a long time. We have little idea when the detonation will occur, though if we hold the device to our ear we can hear a faint ticking sound.
66%
It has become easier to treat superintelligence as a non-silly topic—to take seriously the view that a machine intelligence transition might occur in this century, that such a transition might be among the most important events in human history, that it might be accompanied by some amount of existential risk as well as tremendous upside, and that it would be prudent to put in a bit of work in advance to see if there is something we should be doing to shorten the odds of a favorable outcome. Granted, there is still that picture of the Terminator jeering over practically every journalistic …
66%
This progress in building up the AI safety and impacts fields, though remarkable for having taken place in such a short span of time, should not be overstated. Yes, more funding is flowing into the field; but it is still two to three orders of magnitude less than is going into simply making machines smarter. Yes, there is more interest in thinking about the consequences of advances in machine intelligence; but much of this ends up focusing on nearer-term concerns such as lethal autonomous weapons, labor market impacts of automation, cybercrime, privacy, or self-driving cars. These are not …