Kindle Notes & Highlights
by Nick Bostrom
Read between December 14, 2019 and March 6, 2020
If we are wondering whether a mathematical proposition is true, we could ask the oracle to produce a proof or disproof of the proposition. Finding the proof may require insight and creativity beyond our ken, but checking a purported proof’s validity can be done by a simple mechanical procedure.
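The asymmetry described here, where finding an answer may demand creativity beyond our ken but checking it is purely mechanical, can be illustrated with a toy sketch. The factorization example and the function name are my own illustration, not from the book:

```python
def check_factorization(n: int, factors: list[int]) -> bool:
    """Mechanically verify a purported factorization of n.

    Finding the factors of a large number may be infeasible for us,
    but checking an oracle's claimed answer is a simple, fast,
    deterministic procedure: multiply and compare.
    """
    if any(f < 2 for f in factors):
        return False  # reject trivial "factors" such as 1
    product = 1
    for f in factors:
        product *= f
    return product == n

# An oracle claims 8051 = 83 * 97; we need no insight to check the claim.
print(check_factorization(8051, [83, 97]))   # True
print(check_factorization(8051, [83, 96]))   # False
```

The same pattern holds for the mathematical proofs the passage mentions: a proof checker only needs to confirm that each step follows from the previous ones by an allowed inference rule, however much ingenuity the proof's discovery required.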
One might also consider whether to try to build the oracle in such a way that it would refuse to answer any question in cases where it predicts that its answering would have consequences classified as catastrophic according to some rough-and-ready criteria.
The ideal genie would be a super-butler rather than an autistic savant.
Each caste corresponds to a different set of safety precautions.
There is no subroutine in Excel that secretly wants to take over the world if only it were smart enough to find a way.
If making molecular smiley faces or transforming the planet into paperclips is the first idea that the superintelligence discovers that meets the solution criterion, then smiley faces or paperclips we get.
In other experiments, evolutionary algorithms designed circuits that sensed whether the motherboard was being monitored with an oscilloscope or whether a soldering iron was connected to the lab’s common power supply. These examples illustrate how an open-ended search process can repurpose the materials accessible to it in order to devise completely unexpected sensory capabilities, by means that conventional human design-thinking is poorly equipped to exploit or even account for in retrospect.
Absent phenomenal experience, the musician could be regarded as merely a high-powered jukebox, albeit one capable of creating the three-dimensional appearance of a performer interacting naturally with the crowd.
If there were inexpensive mechanical devices that ran on hay and had exactly the same shape, feel, smell, and behavior as biological horses—perhaps even the same conscious experiences—then demand for biological horses would probably decline further.
Given the astronomical amplification effect, even a tiny bit of pre-transition wealth would balloon into a vast post-transition fortune.
clans might restrict their fertility to below the rate of growth of their capital: such clans could slowly increment their numbers while their members also grow richer per capita.
succeed, by hook or by crook,
Let us, then, consider the plight of the working-class machine, whether it be operating as a slave or a free agent.
A typical short-lived emulation might wake up in a well-rested mental state that is optimized for loyalty and productivity.
He is not overly troubled by thoughts of his imminent death at the end of the working day. Emulations with death neuroses or other hang-ups are less productive and would not have been selected.
bias toward cheerfulness could thus have been selected for, with the result that human neurochemistry is now biased toward positive affect compared to what would have been maximally efficient according to simpler materialistic criteria. If this were the case, then the future of joie de vivre might depend on cheer retaining its social signaling function unaltered in the post-transition world: an issue to which we will return shortly.
Imagine running on a treadmill at a steep incline—heart pounding, muscles aching, lungs gasping for air. A glance at the timer: your next break, which will also be your death, is due in 49 years, 3 months, 20 days, 4 hours, 56 minutes, and 12 seconds. You wish you had not been born.
Second, technologically advanced agents might have available new means of reliably communicating information about themselves, means that do not rely on costly display.
Signaling one’s qualities by agreeing to such auditing might be more efficient than signaling via flamboyant display.
Even if a defender had the ability to kill nine-tenths of the aggressor’s population in a retaliatory strike, this would scarcely offer much deterrence if the deceased could be immediately resurrected from redundant backups.
Like the cells in our bodies, or the individual animals in a colony of eusocial insects, emulations that were wholly altruistic toward their copy-siblings would cooperate with one another even in the absence of elaborate incentive schemes.
The essential property of a superorganism is not that it consists of copies of a single progenitor but that all the individual agents within it are fully committed to a common goal.
Many a potentially beneficial deal never comes off because compliance would be too difficult to verify.
Ideally, the meta-treaty would be put into effect before any party had an opportunity to make the internal arrangements necessary to subvert its implementation. Once villainy has had an unguarded moment to sow its mines of deception, trust can never set foot there again.
“happiness is enjoyment of the potentialities inherent in our human nature”
We only need to open our eyes, so it seems, and a rich, meaningful, eidetic, three-dimensional view of the surrounding environment comes flooding into our minds.
If an agent is not already fundamentally friendly by the time it gains the ability to reflect on its own agency, it will not take kindly to a belated attempt at brainwashing or a plot to replace it with a different agent that better loves its neighbor.
Evolution has produced an organism with human values at least once.
Evolution can be viewed as a particular class of search algorithms that involve the alternation of two steps, one expanding a population of solution candidates by generating new candidates according to some relatively simple stochastic rule (such as random mutation or sexual recombination), the other contracting the population by pruning candidates that score poorly when tested by an evaluation function.
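The two-step alternation described in this passage can be written down directly. The following is a minimal sketch of such a search, using bit-flip mutation as the stochastic expansion rule and truncation as the pruning step; the OneMax task and all names are illustrative choices of mine, not the book's:

```python
import random

def evolve(fitness, genome_len=20, pop_size=30, generations=60, seed=0):
    """Minimal two-step evolutionary search:
    (1) expand the population by generating mutated candidates,
    (2) contract it by pruning candidates the evaluation function scores poorly.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # Step 1: expand -- each candidate produces one mutated offspring.
        offspring = []
        for genome in pop:
            child = genome[:]
            child[rng.randrange(genome_len)] ^= 1  # flip one random bit
            offspring.append(child)
        # Step 2: contract -- keep only the best pop_size candidates.
        pop = sorted(pop + offspring, key=fitness, reverse=True)[:pop_size]
    return max(pop, key=fitness)

# OneMax: the evaluation function simply counts 1-bits.
best = evolve(fitness=sum)
```

Nothing in the loop mentions 1-bits; swapping in a different evaluation function redirects the whole search, which is part of why such open-ended processes can find solutions their designers never anticipated.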
Nature might be a great experimentalist, but one who would never pass muster with an ethics review board—contravening the Helsinki Declaration and every norm of moral decency, left, right, and center. It is important that we not gratuitously replicate such horrors in silico.
Now one might wonder: if the value-loading problem is so tricky, how do we ourselves manage to acquire our values?
That may not even be desirable as an aim—human nature, after all, is flawed and all too often reveals a proclivity to evil which would be intolerable in any system poised to attain a decisive strategic advantage.
To count as improvements, however, such deviations from the human norm would have to be pointed in very particular directions rather than at random; and they would continue to presuppose the existence of a largely undisturbed anthropocentric frame of reference to provide humanly meaningful evaluative generalizations
The AI thus must be endowed with a criterion that it can use to determine which percepts constitute evidence in favor of some hypothesis about what the ultimate goal is, and which percepts constitute evidence against.
We can liken this kind of agent to a barge attached to several tugboats that pull in different directions. Each tugboat corresponds to a hypothesis about the agent’s final value. The engine power of each tugboat corresponds to the associated hypothesis’s probability, and thus changes as new evidence comes in, producing adjustments in the barge’s direction of motion. The resultant force should move the barge along a trajectory that facilitates learning about the (implicit) final value while avoiding the shoals of irreversible destruction; and later, when the open sea of more definite knowledge
[...]
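The tugboat picture maps onto a simple Bayesian mixture: each hypothesis about the final value is a candidate value function, its "engine power" is its posterior probability, and the agent steers by the probability-weighted resultant. A hedged toy sketch, in which the hypothesis names, actions, and numbers are invented for illustration:

```python
def choose_action(actions, hypotheses, posterior):
    """Pick the action with the highest mixture-weighted value
    (the resultant pull of all the tugboats)."""
    def mixture_value(a):
        return sum(p * v(a) for v, p in zip(hypotheses, posterior))
    return max(actions, key=mixture_value)

def bayes_update(posterior, likelihoods):
    """Re-weight each hypothesis by how well it predicted the new percept."""
    unnorm = [p * l for p, l in zip(posterior, likelihoods)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Two toy value hypotheses over actions "a" and "b":
hyp_one = lambda a: {"a": 1.0, "b": 0.0}[a]
hyp_two = lambda a: {"a": 0.0, "b": 1.0}[a]
posterior = [0.7, 0.3]  # the barge is initially pulled toward "a"
print(choose_action(["a", "b"], [hyp_one, hyp_two], posterior))  # a

# A percept arrives that hyp_two predicted much better; the tugboats'
# engine powers change, and the barge's direction of motion with them.
posterior = bayes_update(posterior, likelihoods=[0.1, 0.9])
print(choose_action(["a", "b"], [hyp_one, hyp_two], posterior))  # b
```

The update step is where the "avoiding irreversible destruction" concern enters: while the posterior is still spread across hypotheses, the mixture penalizes actions that only one tugboat favors.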
failure on the part of the agent to invest sufficiently in obtaining an accurate view of reality).
Rather, they are initial hypotheses about friendliness, hypotheses to which a rational AI will assign a high probability at least for as long as it trusts the programmers’ epistemic capacities more than its own.
Ideally, as the AI matures, it should overcome any cognitive biases and other more fundamental misconceptions that may have prevented its programmers from fully understanding what friendliness is.
Alternatively, he might be likened to the evolutionarily ancient “reptilian brain” which ensures that the newer and much cleverer neocortex is employed in the service of goals such as feeding and copulation.
For example, in the model where each level of supervision has half as many members as the layer below, the extra computational overhead is bounded at a mere 100% of what the proletarian part of the system costs (less if the dumber boss layers require fewer computations per subagent).
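The arithmetic behind the 100% bound is a geometric series: if each supervisory layer has half the members of the layer below and each member costs the same to run, the supervisory layers together cost 1/2 + 1/4 + 1/8 + ... of the worker layer, which never reaches 1. A quick sketch (the function name is mine):

```python
def supervision_overhead(levels: int) -> float:
    """Total supervisory cost as a fraction of the bottom (worker)
    layer's cost, when each layer has half the members of the one below
    and per-member cost is equal."""
    return sum(0.5 ** k for k in range(1, levels + 1))

print(supervision_overhead(10))  # 0.9990234375 -- under 100% from below
```

The partial sum after n levels is exactly 1 - 2^-n, so no matter how many layers of bosses are stacked on top, the overhead stays strictly below the cost of the proletarian layer.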
In a highly competitive situation, the cost may be unaffordable unless an enterprise could be assured that its competitors would incur the same cost.
No ethical theory commands majority support among philosophers, so most philosophers must be wrong.
A mere hundred and fifty years ago, slavery still was widely practiced in the American South, with full support of the law and moral custom.
For example, consider the (unusually simple) consequentialist theory of hedonism. This theory states, roughly, that all and only pleasure has value, and all and only pain has disvalue.4 Even if we placed all our moral chips on this one theory, and the theory turned out to be right, a great many questions would remain open. Should “higher pleasures” be given priority over “lower pleasures,” as John Stuart Mill argued? How should the intensity and duration of a pleasure be factored in? Can pains and pleasures cancel each other out? What kinds of brain states are associated with morally relevant
[...]
The Taliban could reason that if his religious views are in fact correct (as he is convinced they are) and if good grounds for accepting these views exist (as he is also convinced) then humankind would in the end come to accept these views if only people were less prejudiced and biased, if they spent more time studying scripture, if they could more clearly understand how the world works and recognize essential priorities, if they could be freed from irrational rebelliousness and cowardice, and so forth.15 The Humanist, similarly, would believe that under these idealized conditions, humankind
[...]
CEV is meant to be an “initial dynamic,” a process that runs once and then replaces itself with whatever the extrapolated volition wishes.
Although the CEV proposal scores better on this desideratum than many alternatives, it does not entirely eliminate motives for conflict. A selfish individual, group, or nation might seek to enlarge its slice of the future by keeping others out of the extrapolation base.
Moral goodness might be more like a precious metal than an abundant element in human nature, and even after the ore has been processed and refined in accordance with the prescriptions of the CEV proposal, who knows whether the principal outcome will be shining virtue, indifferent slag, or toxic sludge?
If we knew how to code “Do What I Mean” in a general and powerful way, we might as well use that as a standalone goal.
It could highlight aspects of the future that the operator might not have thought of inquiring about but which would be regarded as pertinent once pointed out.