Kindle Notes & Highlights
Superintelligence: Paths, Dangers, Strategies by Nick Bostrom
Read between May 7, 2024 - February 5, 2025
One sympathizes with John McCarthy, who lamented: “As soon as it works, no one calls it AI anymore.”
There is no reason to expect a generic AI to be motivated by love or hate or pride or other such common human sentiments: these complex adaptations would require deliberate expensive effort to recreate in AIs.
Machines have a number of fundamental advantages which will give them overwhelming superiority. Biological humans, even if enhanced, will be outclassed.
We can think of wisdom as the ability to get the important things approximately right.
First, we can expand the range of our reference points by considering nonhuman animals, which have intelligence of lower quality. (This is not meant as a speciesist remark. A zebrafish has a quality of intelligence that is excellently adapted to its ecological needs;
Nevertheless, this chapter will present some reasons for thinking that the slow transition scenario is improbable. If and when a takeoff occurs, it will likely be explosive.
Using its strategizing superpower, the AI develops a robust plan for achieving its long-term goals. (In particular, the AI does not adopt a plan so stupid that even we present-day humans can foresee how it would inevitably fail. This criterion rules out many science fiction scenarios that end in human triumph.)
For example, we could end up with an AI that would be willing to take extreme risks for the sake of a small chance of eventually obtaining control of a large share of the universe. It could be expensive to offer the AI a higher expected utility as reward for cooperation than the AI could hope to achieve by defecting and trying to escape.
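A minimal sketch of the expected-utility comparison behind this highlight. All of the numbers (u_universe, p_escape, u_reward) are made-up assumptions purely for illustration, not figures from the book:

def expected_utility(prob_success, payoff):
    """Expected utility of a single-outcome gamble."""
    return prob_success * payoff

# Made-up numbers, purely illustrative.
u_universe = 1e30   # utility the AI assigns to controlling a large share of the universe
p_escape = 1e-9     # small chance that defecting and escaping succeeds
u_reward = 1e6      # largest reward the operators can credibly offer for cooperation

eu_defect = expected_utility(p_escape, u_universe)  # 1e21
eu_cooperate = u_reward                             # 1e6

print(f"EU(defect)    = {eu_defect:.3e}")
print(f"EU(cooperate) = {eu_cooperate:.3e}")
# A tiny chance at an astronomically large prize still dominates any
# reward the principal can realistically promise.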
What is essential is that the AI believes that the button will more likely remain unpressed if the AI continuously acts in the principal’s interest than if it rebels.
What might go wrong with such an incentive scheme? One possibility is that the AI will not trust the human operator to deliver the promised rewards.
The track record of human reliability is something other than a straight line of unerring perfection.
Since a superintelligent agent is skilled at achieving its ends, if it prefers not to cause harm (in some appropriate sense of “harm”) then it would tend not to cause harm (in that sense of “harm”).
Embarrassingly for our species, Asimov’s laws remained state-of-the-art for over half a century:
We will refer to this approach of giving the AI final goals aimed at limiting the scope of its ambitions and activities as “domesticity.”
The final goal given to the AI in this example could be something along the lines of “achieve that which we would have wished the AI to achieve if we had thought about the matter long and hard.”
The attractiveness of augmentation may increase in proportion to our despair at the other approaches to the control problem.
For instance, one might ask for the solution to various technical or philosophical problems that may arise in the course of trying to develop more advanced motivation selection methods. If we had a proposed AI design alleged to be safe, we could ask an oracle whether it could identify any significant flaw in the design, and whether it could explain any such flaw to us in twenty words or less. Questions of this kind could elicit valuable information. Caution and restraint would be required, however, for us not to ask too many such questions—and not to allow ourselves to partake of too many …
a literalistic genie (one superintelligent enough to attain a decisive strategic advantage) might have a propensity to kill the user and the rest of humanity on its first use,
The ideal genie would be a super-butler rather than an autistic savant.
if it is insufficient capability rather than sufficient reliability that makes ordinary software existentially safe, then it is unclear how such software could be a model for a safe superintelligence.
Could one have a general intelligence that is not an agent?
Intuitively, it is not just the limited capability of ordinary software that makes it safe: it is also its lack of ambition.
Further frugality could be achieved by means of uploading, since a physically optimized computing substrate, devised by advanced superintelligence, would be more efficient than a biological brain. The migration into the digital realm might be stemmed, however, if emulations were regarded as non-humans or non-citizens ineligible to receive pensions or to hold tax-exempt savings accounts. In that case, a niche for biological humans might remain
Perhaps a more advanced motivation system would be based on an explicit representation of a utility function or some other architecture that has no exact functional analogs to pleasure and pain. A related but slightly more radical multipolar outcome—one that could involve the elimination of almost all value from the future—is that the universal proletariat would not even be conscious.
We could thus imagine, as an extreme case, a technologically highly advanced society, containing many complex structures, some of them far more intricate and intelligent than anything that exists on the planet today—a society which nevertheless lacks any type of being that is conscious or whose welfare has moral significance. In a sense, this would be an uninhabited society. It would be a society of economic miracles and technological awesomeness, with nobody there to benefit. A Disneyland without children.
At the bottom of the capability hierarchy (but at the top of the power hierarchy) would sit the relatively dumb and slow principal. This human principal would be like a demented king who reigns over an incompetent court that oversees a mediocre administration which governs a capable people. Alternatively, he might be likened to the evolutionarily ancient “reptilian brain” which ensures that the newer and much cleverer neocortex is employed in the service of goals such as feeding and copulation.
One might think that there is a limit to how much damage could arise from an incorrectly specified epistemology. If the epistemology is too dysfunctional, then the AI could not be very intelligent and it could not pose the kind of risk discussed in this book. But the concern is that we may specify an epistemology that is sufficiently sound to make the AI instrumentally effective in most situations, yet which has some flaw that leads the AI astray on some matter of crucial importance. Such an AI might be akin to a quick-witted person whose worldview is predicated on a false dogma, held to with …
Suppose that the development of a technology has two effects: giving a small benefit B to its inventors and the country that sponsors them, while imposing an aggregately larger harm H—which could be a risk externality—on everybody. Even somebody who is largely altruistic might then choose to develop the overall harmful technology. They might reason that the harm H will result no matter what they do, since if they refrain somebody else will develop the technology anyway; and given that total welfare cannot be affected, they might as well grab the benefit B for themselves and their nation.
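A rough payoff sketch of the reasoning in this highlight. The values of B and H are assumed (not from the book), and the sketch assumes the technology gets developed by someone else if this actor refrains:

B = 10    # private benefit to the inventors and their sponsoring country (assumed)
H = 100   # aggregate harm imposed on everybody, e.g. a risk externality (assumed)

# If somebody else will develop the technology anyway, the harm H is incurred
# regardless of what this actor does.
payoff_if_develop = B - H   # capture the benefit, still suffer the harm
payoff_if_refrain = 0 - H   # suffer the harm, benefit goes to someone else

print(payoff_if_develop, payoff_if_refrain)  # -90 -100
# Developing dominates refraining for each actor, even though H > B and
# everyone would prefer that nobody develop the technology.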
But, in fact, it would be overly pessimistic to be so confident that humanity is doomed.
While mature AI would render WBE obsolete (except for the special purpose of preserving individual human minds), the reverse does not hold.
if events in the digital sphere unfold a thousand times faster than on the outside, then a biological human would have to rely on the digital body politic holding steady for 50,000 years of internal change and churn. Yet if the digital political world were anything like ours, there would be a great many revolutions, wars, and catastrophic upheavals during those millennia that would probably inconvenience biological humans on the outside. Even a 0.01% risk per year of a global thermonuclear war or similar cataclysm would entail a near certain loss for the biological humans living out their …
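A back-of-envelope check of the arithmetic in this highlight. Only the quoted 0.01% per year and the 50,000 subjective years are taken from the text; the sketch assumes the risk is independent from year to year:

annual_risk = 0.0001   # 0.01% per year, as in the highlight
years = 50_000         # subjective years inside a digital world running ~1000x faster

p_no_catastrophe = (1 - annual_risk) ** years
p_catastrophe = 1 - p_no_catastrophe

print(f"P(at least one cataclysm over {years} years) = {p_catastrophe:.3f}")
# ~0.993: effectively a near-certain loss for biological humans on the outside,
# under the independence assumption above.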