Kindle Notes & Highlights
by Max Tegmark
Read between January 28 and January 31, 2025
But even if you build an AI that will both learn and adopt your goals, you still haven’t finished solving the goal-alignment problem: what if your AI’s goals evolve as it gets smarter? How are you going to guarantee that it retains your goals no matter how much recursive self-improvement it undergoes?
This argument that an ever more intelligent AI will retain its ultimate goals forms a cornerstone of the friendly-AI vision promulgated by Eliezer Yudkowsky and others: it basically says that if we manage to get our self-improving AI to become friendly by learning and adopting our goals, then we’re all set, because we’re guaranteed that it will try its best to remain friendly forever.
For an AI, the subgoal of optimizing its hardware favors both better use of current resources (for sensors, actuators, computation and so on) and acquisition of more resources. It also implies a desire for self-preservation, since destruction/shutdown would be the ultimate hardware degradation.
If you imbue a superintelligent AI with the sole goal to self-destruct, it will of course happily do so. However, the point is that it will resist being shut down if you give it any goal that it needs to remain operational to accomplish—and this covers almost all goals! If you give a superintelligence the sole goal of minimizing harm to humanity, for example, it will defend itself against shutdown attempts because it knows we’ll harm one another much more in its absence through future wars and other follies.
These emergent subgoals make it crucial that we not unleash superintelligence before solving the goal-alignment problem: unless we put great care into endowing it with human-friendly goals, things are likely to end badly for us.
We’re now ready to tackle the third and thorniest part of the goal-alignment problem: if we succeed in getting a self-improving superintelligence to both learn and adopt our goals, will it then retain them…
Should one person or group get to decide the goals adopted by a future superintelligence, even though there’s a vast difference between the goals of Adolf Hitler, Pope Francis and Carl Sagan? Or do there exist some sort of consensus goals that form a good compromise for humanity as a whole?
In my opinion, both this ethical problem and the goal-alignment problem are crucial ones that need to be solved before any superintelligence is developed. On one hand, postponing work on ethical issues until after goal-aligned superintelligence is built would be irresponsible and potentially disastrous. A perfectly obedient superintelligence whose goals automatically align with those of its human owner would be like Nazi SS-Obersturmbannführer Adolf Eichmann on steroids: lacking moral compass or inhibitions of its own, it would with ruthless efficiency implement its owner’s goals, whatever…
Many ethical principles have commonalities with social emotions such as empathy and compassion: they evolved to engender collaboration, and they affect our behavior through rewards and punishments. If we do something mean and feel bad about it afterward, our emotional punishment is meted out directly by our brain chemistry. If we violate ethical principles, on the other hand, society may punish us in more indirect ways such as through informal shaming by our peers or by penalizing us for breaking a law.
In conclusion, it’s tricky to fully codify even widely accepted ethical principles into a form applicable to future AI, and this problem deserves serious discussion and research as AI keeps progressing. In the meantime, however, let’s not let perfect be the enemy of good: there are many examples of uncontroversial “kindergarten ethics” that can and should be built into tomorrow’s technology.
We saw that a cornerstone in the “friendly-AI” vision is the idea that a recursively self-improving AI will wish to retain its ultimate (friendly) goal as it gets more intelligent. But how can an “ultimate goal” (or “final goal,” as Bostrom calls it) even be defined for a superintelligence? The way I see it, we can’t have confidence in the friendly-AI vision unless we can answer this crucial question.
To program a friendly AI, we need to capture the meaning of life. What’s “meaning”? What’s “life”? What’s the ultimate ethical imperative? In other words, how should we strive to shape the future of our Universe? If we cede control to a superintelligence before answering these questions rigorously, the answer it comes up with is unlikely to involve us. This makes it timely to rekindle the classic debates of philosophy and ethics, and adds a new urgency to the conversation!
To appreciate how broad our consciousness definition is, note that it doesn’t mention behavior, perception, self-awareness, emotions or attention. So by this definition, you’re conscious also when you’re dreaming, even though you lack wakefulness or access to sensory input and (hopefully!) aren’t sleepwalking and doing things. Similarly, any system that experiences pain is conscious in this sense, even if it can’t move.
As David has emphasized, there are really two separate mysteries of the mind. First, there’s the mystery of how a brain processes information, which David calls the “easy” problems. For example, how does a brain attend to, interpret and respond to sensory input? How can it report on its internal state using language? Although these questions are actually extremely difficult, they’re by our definitions not mysteries of consciousness, but mysteries of intelligence: they ask how a brain remembers, computes and learns. Moreover, we saw in the first part of the book how AI researchers have started…
In summary, your consciousness lives in the past, with Christof Koch estimating that it lags behind the outside world by about a quarter second. Intriguingly, you can often react to things faster than you can become conscious of them, which proves that the information processing in charge of your most rapid reactions must be unconscious.
It takes longer for nerve signals to reach your brain from your fingers than from your face because of distance, and it takes longer for you to analyze images than sounds because it’s more complicated—which is why Olympic races are started with a bang rather than with a visual cue. Yet if you touch your nose, you consciously experience the sensation on your nose and fingertip as simultaneous, and if you clap your hands, you see, hear and feel the clap at exactly the same time.14 This means that your full conscious experience of an event isn’t created until the last slowpoke email reports have…
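A rough back-of-the-envelope check of the distance argument (the numbers here are illustrative assumptions, not taken from the book: a touch-fiber conduction speed of roughly 60 m/s, about a metre of nerve from fingertip to brain, and about a tenth of that from the face):

$$ t = \frac{d}{v}, \qquad t_{\text{finger}} \approx \frac{1\ \text{m}}{60\ \text{m/s}} \approx 17\ \text{ms}, \qquad t_{\text{face}} \approx \frac{0.1\ \text{m}}{60\ \text{m/s}} \approx 2\ \text{ms} $$

So the brain has to hold back the earlier-arriving face signal by something like 15 ms before presenting the two touches as simultaneous, which is the kind of buffering the passage describes.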
How can something as complex as consciousness be made of something as simple as particles? I think it’s because it’s a phenomenon that has properties above and beyond those of its particles. In physics, we call such phenomena “emergent.”
I’d been arguing for decades that consciousness is the way information feels when being processed in certain complex ways.18 IIT agrees with this and replaces my vague phrase “certain complex ways” by a precise definition: the information processing needs to be integrated, that is, Φ needs to be large. Giulio’s argument for this is as powerful as it is simple: the conscious system needs to be integrated into a unified whole, because if it instead consisted of two independent parts, then they’d feel like two separate conscious entities rather than one. In other words, if a conscious part of a…
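A toy sketch of what “Φ needs to be large” means, using mutual information as a stand-in for integration (the actual IIT definition is considerably more involved): split the system S into two parts A and B, measure how much information the parts share, and take the worst-case cut:

$$ I(A;B) = H(A) + H(B) - H(A,B), \qquad \Phi(S) \sim \min_{S = A \,\cup\, B} I(A;B) $$

If some way of cutting the system gives I(A;B) = 0, the two halves are informationally independent and Φ vanishes, matching the argument above that such a system would be two separate conscious entities rather than one.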
In summary, I think that consciousness is a physical phenomenon that feels non-physical because it’s like waves and computations: it has properties independent of its specific physical substrate. This follows logically from the consciousness-as-information idea. This leads to a radical idea that I really like: If consciousness is the way that information feels when it’s processed in certain ways, then it must be substrate-independent; it’s only the structure of the information processing that matters, not the structure of the matter doing the information processing. In other words…
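The wave analogy can be made concrete with the standard one-dimensional wave equation (a textbook example, not a quotation from the book):

$$ \frac{\partial^2 y}{\partial t^2} = v^2\, \frac{\partial^2 y}{\partial x^2} $$

The same equation governs sound in air, ripples on water and vibrations of a string; only the speed v depends on the substrate. In the same spirit, the claim is that the structure of the information processing, not the matter carrying it, is what matters for consciousness.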
If the information processing itself obeys certain principles, it can give rise to the higher-level emergent phenomenon that we call consciousness. This places your conscious experience not one but two levels up from the matter. No wonder your mind feels non-physical!
Would an artificial consciousness feel that it had free will? Note that, although philosophers have spent millennia quibbling about whether we have free will without reaching consensus even on how to define the question,33 I’m asking a different question, which is arguably easier to tackle. Let me try to persuade you that the answer is simply “Yes, any conscious decision maker will subjectively feel that it has free will, regardless of whether it’s biological or artificial.”