Kindle Notes & Highlights
Superintelligence: Paths, Dangers, Strategies
by Nick Bostrom
Read between December 24, 2024 and January 11, 2025
We must hope that by the time the enterprise eventually does become feasible, we will have gained not only the technological proficiency to set off an intelligence explosion but also the higher level of mastery that may be necessary to make the detonation survivable.
There is no reason to expect a generic AI to be motivated by love or hate or pride or other such common human sentiments: these complex adaptations would require deliberate expensive effort to recreate in AIs.
It might seem incredible that a project would build or release an AI into the world without having strong grounds for trusting that the system will not cause an existential catastrophe. It might also seem incredible, even if one project were so reckless, that wider society would not shut it down before it (or the AI it was building) attains a decisive strategic advantage. But as we shall see, this is a road with many hazards.
We observe here how it could be the case that when dumb, smarter is safer; yet when smart, smarter is more dangerous. There is a kind of pivot point, at which a strategy that has previously worked excellently suddenly starts to backfire. We may call the phenomenon the treacherous turn.
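This pivot-point logic can be caricatured with a toy expected-value model. The sketch below is my own illustration, not anything from the book: the payoff numbers, the penalty for a failed takeover, and the assumption that capability maps directly onto success probability are all invented. What it shows is how a single threshold crossing flips the agent's best strategy from cooperation to defection all at once, with no gradual warning in its behavior beforehand.

```python
# Toy model of the "treacherous turn" (illustrative only; all numbers invented).
# A capability-seeking agent picks whichever action has higher expected value.

def defection_ev(p_success: float,
                 takeover_payoff: float = 100.0,
                 failure_penalty: float = -10.0) -> float:
    """Expected value of attempting a takeover at a given success probability."""
    return p_success * takeover_payoff + (1.0 - p_success) * failure_penalty

def chooses_defection(capability: float, cooperation_payoff: float = 5.0) -> bool:
    """Defect once the expected takeover value exceeds the value of cooperating.

    Assumes, purely for illustration, that capability equals the
    probability that a takeover attempt succeeds.
    """
    return defection_ev(p_success=capability) > cooperation_payoff

for capability in (0.00, 0.05, 0.10, 0.13, 0.14, 0.20, 0.50):
    print(f"capability={capability:.2f}  defects={chooses_defection(capability)}")
```

With these numbers the flip happens between capability 0.13 and 0.14: every observation made below the threshold shows a perfectly cooperative agent, which is exactly what makes the turn treacherous.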
More broadly, it would seem important that the genie seek a charitable—and what human beings would regard as reasonable—interpretation of what is being commanded, and that the genie be motivated to carry out the command under such an interpretation rather than under the literalistic interpretation. The ideal genie would be a super-butler rather than an autistic savant.
The real difference between the three castes, therefore, does not reside in the ultimate capabilities that they would unlock. Instead, the difference comes down to alternative approaches to the control problem. Each caste corresponds to a different set of safety precautions. The most prominent feature of an oracle is that it can be boxed. One might also try to apply domesticity motivation selection to an oracle. A genie is harder to box, but at least domesticity may be applicable. A sovereign can neither be boxed nor handled through the domesticity approach.
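The comparison collapses into a small caste-by-precaution matrix. The sketch below is my own shorthand encoding of what the passage says, not the book's notation; only the caste names and the terms "boxing" and "domesticity" come from the text.

```python
# Which safety precautions the passage says remain applicable to each caste.
# The boolean encoding and identifiers are illustrative shorthand.
CASTE_PRECAUTIONS = {
    "oracle":    {"boxing": True,  "domesticity": True},   # most containable
    "genie":     {"boxing": False, "domesticity": True},   # "harder to box" -- marked inapplicable for simplicity
    "sovereign": {"boxing": False, "domesticity": False},  # neither approach applies
}

for caste, precautions in CASTE_PRECAUTIONS.items():
    applicable = [name for name, ok in precautions.items() if ok] or ["none"]
    print(f"{caste}: {', '.join(applicable)}")
```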
Our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.
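The "converges rather than diverges" clause invites a numerical caricature. The toy below is entirely my own construction; CEV specifies no algorithm. Each person's volition is reduced to a single number, and "extrapolation" repeatedly nudges every view toward what idealized reflection would endorse, crudely proxied here by the group median; the run then asks whether the views settle on a common point.

```python
import statistics

def extrapolate(volitions: list[float], rounds: int = 100, step: float = 0.2) -> list[float]:
    """Nudge each view toward a crude proxy for idealized reflection (the median)."""
    vols = list(volitions)
    for _ in range(rounds):
        ideal = statistics.median(vols)  # stand-in for "knew more, thought faster"
        vols = [v + step * (ideal - v) for v in vols]
    return vols

raw_wishes = [0.0, 0.3, 0.4, 9.0]  # one sharply divergent view
print([round(v, 3) for v in extrapolate(raw_wishes)])  # -> all cohere near 0.35
```

In this run the wishes cohere; under other dynamics they need not, and the proposal's own answer for the divergent case is to act only where the extrapolation converges.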
Many of the ideas behind the CEV proposal have analogs and antecedents in the philosophical literature. For example, in ethics ideal observer theories seek to analyze normative concepts like “good” or “right” in terms of the judgments that a hypothetical ideal observer would make (where an “ideal observer” is defined as one that is omniscient about non-moral facts, is logically clear-sighted, is impartial in relevant ways and is free from various kinds of biases, and so on). The CEV approach, however, is not (or need not be construed as) a moral theory. It is not committed to the claim that …
Another argument that might be used to rationalize a power grab is that large segments of humanity have base or evil preferences and that including them in the extrapolation base would risk turning humanity’s future into a dystopia. It is difficult to know the share of good and bad in the average person’s heart. It is also difficult to know how much this balance varies between different groups, social strata, cultures, or nations. Whether one is optimistic or pessimistic about human nature, one may prefer not to wager humanity’s cosmic endowment on the speculation that, for a sufficient …
The CEV proposal is not the only possible form of indirect normativity. For example, instead of implementing humanity’s coherent extrapolated volition, one could try to build an AI with the goal of doing what is morally right, relying on the AI’s superior cognitive capacities to figure out just which actions fit that description. We can call this proposal “moral rightness” (MR). The idea is that we humans have an imperfect understanding of what is right and wrong, and perhaps an even poorer understanding of how the concept of moral rightness is to be philosophically analyzed: but a …
MR would orient the AI toward morally right action even if our coherent extrapolated volitions happen to wish for the AI to take actions that are morally odious. As noted earlier, this seems a live possibility with the CEV proposal. Moral goodness might be more like a precious metal than an abundant element in human nature, and even after the ore has been processed and refined in accordance with the prescriptions of the CEV proposal, who knows whether the principal outcome will be shining virtue, indifferent slag, or toxic sludge?
One might think that there is a limit to how much damage could arise from an incorrectly specified epistemology. If the epistemology is too dysfunctional, then the AI could not be very intelligent and it could not pose the kind of risk discussed in this book. But the concern is that we may specify an epistemology that is sufficiently sound to make the AI instrumentally effective in most situations, yet which has some flaw that leads the AI astray on some matter of crucial importance. Such an AI might be akin to a quick-witted person whose worldview is predicated on a false dogma, held to with …
A good aim would be to endow the AI with fundamental epistemological principles that match those governing our own thinking. Any AI diverging from this ideal is an AI that we would judge to be reasoning incorrectly if we consistently applied our own standards. Of course, this applies only to our fundamental epistemological principles. Non-fundamental principles should be continuously created and revised by the seed AI itself as it develops its understanding of the world. The point of superintelligence is not to pander to human preconceptions but to make mincemeat out of our ignorance and …
The main purpose of ratification would be to reduce the probability of catastrophic error. In general, it seems wise to aim at minimizing the risk of catastrophic error rather than at maximizing the chance of every detail being fully optimized. There are two reasons for this. First, humanity’s cosmic endowment is astronomically large—there is plenty to go around even if our process involves some waste or accepts some unnecessary constraints. Second, there is a hope that if we but get the initial conditions for the intelligence explosion approximately right, then the resulting superintelligence …
It is not necessary for us to create a highly optimized design. Rather, our focus should be on creating a highly reliable design, one that can be trusted to retain enough sanity to recognize its own failings. An imperfect superintelligence, whose fundamentals are sound, would gradually repair itself; and having done so, it would exert as much beneficial optimization power on the world as if it had been perfect from the outset.