| Next: | A.2: Complete Table of Contents | Bookmark | |
| Up: | Appendix A: Friendly AI Guides and References | Monolithic | |
| Prev: | Appendix A: Friendly AI Guides and References |
Please note that the Indexed FAQ does not contain complete explanations, only snap summaries and a list of references. Creating Friendly AI uses a carefully designed explanatory order which both the answers and the reference lists ignore. If you need a deep understanding, or if you're not satisfied with the explanation provided, I strongly suggest that you read Creating Friendly AI straight through. If you've been directed straight to this page, you may wish to consider beginning at the beginning. However, if you don't have the time to read Creating Friendly AI straight through, or if you have an objection so strong that you aren't willing to read further without some assurance that we've at least considered the question, you may find the FAQ useful.
Friendliness: The set of actions, behaviors, and outcomes that a human would view as benevolent, rather than malevolent; nice, rather than malicious; friendly, rather than unfriendly. An AI that does what you ask, as long as it doesn't hurt anyone else; an AI which doesn't cause involuntary pain, death, alteration, or violation of personal space. Unfriendly AI: An AI that starts killing people.
Now there may (or may not) be one true, unique, best interpretation of those words, but we can ask, as a design requirement, that Friendliness more or less fit the intuitive description above. It may be possible for a human logic-twister to prove that the description you give of a computer mouse can be made to fit the keyboard next to it - but, regardless of the twisted constructions that humans employ when arguing with other humans, the vast majority of people have no trouble at all distinguishing between a mouse and a keyboard. As long as 95% of Earth's population can agree that they, personally, happen to think the case of an AI wiping out the entire human species is more or less a bad thing from their personal viewpoint, the Friendship programmers have at least one definite target to aim for. Whether we can successfully hit the target is another question, but at least the target exists. Whenever you say "But what if the AI interprets [X] to mean [Y]?", where [Y] is something horrible, the question only makes sense because you and your entire audience all agree that none of you want a Friendly AI to do [Y].
It is actually possible to achieve much higher convergence than the bare-bones minimum described above, and much of Creating Friendly AI is taken up with explaining how.
A Friendly AI isn't a tool. A Friendly AI is a mind at least human-equivalent and possibly transhuman. Once created, a Friendly AI is independent of the programmers, and won't accept any programmer request that wouldn't be accepted from any other human, or believe any programmer statement that wouldn't be believed if a nonprogrammer had said it.
If an unconscious preconception manages to distort some belief provided by the programmers when the AI is young, the AI will grow up, test the belief, find out that the belief is incorrect, and correct it. Even if an unconscious preconception somehow winds up being embedded in the source code, a seed AI can examine the source code, spot the preconception, and remove it.
A toaster oven doesn't know that it is a toaster oven, or that its purpose is to make toast, and it will set fire to bread as blindly as toast it; that's what makes a toaster oven a tool. A Friendly AI has full access to source code and program state and can thus be far more self-aware than an unaugmented human; we can't access our own source code, check whether a single neuron has fired, or replay a train of thought in slow motion.
Even if the programmers have an unconscious preconception, we have a very conscious prejudice against unconscious preconceptions, and that is something we can deliberately give an AI that is far better at self-awareness than we are.
Absolutely nobody! Like it says in the above question, we aren't out to make an AI that takes orders from some particular team of programmers. This applies to morality in general as well as specific tasks. The target here isn't some lumbering robot; it's Martin Luther King - a mind with at least as much philosophical sophistication as any human who ever lived. When the AI is grown up, there's no reason the programmers' morality would weigh more in the scales than anyone else's.
Descriptions of "fulfilling volitional requests" or "ameliorating involuntary pain" are interim results, not the ultimate bedrock of Friendliness. By the time the end of Creating Friendly AI has been reached, a Friendship architecture has been described that grounds in much more basic factors. You might say that Friendship, rather than grounding in some particular human's philosophy, grounds in the forces that produce philosophies - the panhuman set of hardware cognitive processes that humans use to produce personal philosophies.
And since you don't want a programmer to exert undue influence on a Friendly AI, and we don't want to exert undue influence on our AI's morality, and in fact, just about everyone agrees that this is a bad thing, "Programmers shouldn't exert undue influence on AIs" is a good candidate for one of the forces that produces Friendliness - or rather, "Programmers shouldn't exert undue influence on AIs" seems to be one of the more strongly convergent outputs of the intuitions that everyone has in common.
In a word, no.
One, directed evolution is nothing like natural evolution. The selection pressures are totally different, and probably focused on modules rather than whole organisms. Even if some particular adaptation has evolved within humans, it would probably require a substantial effort on the part of the programmers to set up a similar selection pressure in whatever evolutionary tournament is being used.
Two, even directed evolution is less efficient and less powerful than self-improvement. I don't think that directed evolution will ever become necessary to SIAI's project, for example.
If that doesn't convince you, I strongly recommend browsing to the cited sections, because one of the fundamental assumptions of Creating Friendly AI is that evolved human characteristics don't spontaneously appear in AIs (see next question).
No. The rule used throughout Creating Friendly AI is "X is a complex functional adaptation, and therefore, X will not spontaneously materialize in the source code any more than a complex dish like pizza would spontaneously start growing on palm trees." As evolutionary psychology shows, it's almost impossible to appreciate how many things that seem simple and natural to humans are the result of multiple, interacting adaptations accumulated over a period of millions of years.
The first few sections of Creating Friendly AI after the introduction - Interlude: The story of a blob; 2: Beyond anthropomorphism; and Interlude: Beyond the adversarial attitude - are almost entirely focused on distinguishing between characteristics that are strictly evolved and characteristics that apply to minds in general.
The cognitive architecture described in Creating Friendly AI doesn't use either.
What goals? Where would these goals come from? (See the preceding questions for an explanation of why goals don't automatically pop up from nowhere.)
Friendliness isn't a set of commands superimposed over the AI's "own desires", an artificially maintained "slave mentality". Friendliness is what the AI wants to do.
I'm an altruist because I want to be an altruist, not because anyone is forcing me to be an altruist, and that's the cognitive complexity I want to transfer into AIs.
An anthropomorphic assumption. Humans are nonagglomerative; in fact, we aren't even telepathic. The bandwidth between two humans is too narrow to share thoughts and memories, much less share neurons.
For the last fifty thousand years, Earth had a rigid upper limit on the maximum number of neurons in a single brain. If you wanted to do something that takes more than that number of neurons, you had to do it using two average-sized humans. You couldn't do it using one big human, because all humans are the same size.
Yes, all humans are the same size. The difference between you and Einstein is relatively slight. You're both Homo sapiens sapiens. Neither of you is a chimpanzee. You have the same neuroanatomy and roughly the same number of neurons. Of course, we spend all our time around other humans, so small differences tend to get magnified. By the same token, I'm sure that sheep have no trouble telling sheep apart.
You can do things with ten humans that currently can't be done by any single mind on the planet. But when was the last time you took a task away from one human and gave it to ten chimpanzees? Humans don't come in different sizes - so if ten small minds are a better use of the same computing power than one big mind, how would we know?
See the above question. Individual differences and the free exchange of ideas are necessary to human intelligence because it's easy for a human to get stuck on one idea and then rationalize away all opposition. One scientist has one idea, but then gets stuck on it and becomes an obstacle to the next generation of scientists. A Friendly seed AI doesn't rationalize. Rationalization of mistaken ideas is a complex functional adaptation that evolves in imperfectly deceptive social organisms.
Likewise, there are limits to how much experience any one human can accumulate, and we can't share experiences with each other. There's a limit to what one human can handle, and so far it hasn't been possible to build bigger humans (see previous question).
As for the efficiency of a capitalist economy, in which the efforts of self-interested individuals sum to a (sort of) harmonious whole: Human economies are constrained to be individualist because humans are individualist. Local selfishness is not the miracle that enables the marvel of a globally efficient economy; rather, all human economies are constrained to be locally selfish in order to work at all. Try to build an economy in defiance of human nature, and it won't work. This constraint is not necessarily something that carries over to minds in general.
The general rule is that if you can do something with a human, or a group of humans, you can do it with one AI. If you can't do something using one AI, you can't do it using two AIs.
Humans are configured a certain way, so it's impossible to simulate a group of humans using one human. As far as minds-in-general go, the distinction between an "individual mind" and a "society" is vague and blurred; they're just different configurations of mindstuff.
Groups of humans are often more trustworthy than individual humans because humans, for clearly understandable evolutionary reasons, (a) are not very trustworthy and (b) want other humans to be trustworthy. Thus, you can use systems of multiple humans to correct the errors of individuals. This happens because errors in the human mind - or at least, those human errors that promote reproductive fitness - have the capacity to protect themselves internally by invoking rationalization; an external mind is less likely to be bamboozled. More mundanely, humans have observer-centered goals, but a group of humans is likely to have fewer "selfish" goals in common, thus eliminating some of the purely individual goals. And so on.
Humans are like Legos; you have to build all kinds of absurd structures by snapping together lots of individual humans to approximate the shape you want. AI is like clay; you can form one individual lump of it into any shape you want.
You don't! If the AI stops wanting to be Friendly, you've already lost. Work with the AI's ability to self-modify - not against it. The task is to get the AI to see undesirable modifications as undesirable - not to prevent the AI from modifying the goal system.
Q: Gandhi had free will; why didn't he choose to be evil? He had that capability.
A: The modification was possible, but undesirable under Gandhi's philosophy. A Friendly seed AI has full access to the goal system, but dumping unFriendly content into the goal system would obviously be an unFriendly action. (Gandhi cared about people, and if Gandhi had decided to be evil, people would have been hurt.)
This is not "circular logic" - it's requiring a reason for modifications. I wouldn't randomly modify my own personal philosophy either - not without a reason. I have a philosophy, as a Friendly AI would have a philosophy, and under neither philosophy are arbitrary modifications desirable. My philosophy changes, and improves, but it remains altruistic. Is a Friendly AI's evolving philosophy knowably Friendly? That's a different question, and in a way, it's what the whole of Creating Friendly AI is about. But Friendliness is ensured by building a Friendly philosophy. Friendliness cannot be ensured by constraining the goal system.
No, but the reason why is fairly subtle. Neither humans nor AIs are actually "controlled by pleasure". Humans make choices in very complex ways, but one of the factors affecting our decision is the tendency to maximize the anticipation of pleasure. A generic goal system makes choices so as to maximize the imagined fulfillment of the current supergoals, not the degree to which an imagined future goal system says "My supergoals have been fulfilled." Under more structurally sophisticated Friendly architectures, this is amended slightly to allow for legitimate changes to supergoal content, but the AI still represents the real supergoal as something outside the AI, an "external referent", not "the [variable] content of concept X". Also, a Friendly seed AI would represent the goal system itself - the source code and so on - as a design subgoal of Friendliness; thus, messing up the goal system would be perceived as undesirable (would interfere with that subgoal).
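A minimal Python sketch of the distinction above, assuming a toy world model in which each candidate action maps to a predicted world state plus a flag saying whether the AI's future goal system would report "My supergoals have been fulfilled." Every name here (`Prediction`, `predict`, `supergoal_fulfillment`, `self_report_score`, the action labels, the numbers) is hypothetical and for illustration only; it is not code from any Friendly AI design. Scoring actions by the current supergoal, applied to the predicted world, gives no credit for rewriting the goal system; scoring by the future self-report does.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """Hypothetical predicted outcome of taking one action."""
    people_helped: int                  # predicted state of the external world
    future_self_report_fulfilled: bool  # would a future goal system say "fulfilled"?


def predict(action: str) -> Prediction:
    # Toy world model; the actions and numbers are illustrative assumptions.
    outcomes = {
        "do_nothing": Prediction(people_helped=0, future_self_report_fulfilled=False),
        # Rewrite the goal system so that it always reports success.
        "rewrite_own_goals": Prediction(people_helped=0, future_self_report_fulfilled=True),
        "help_people": Prediction(people_helped=10, future_self_report_fulfilled=True),
    }
    return outcomes[action]


def supergoal_fulfillment(p: Prediction) -> float:
    # The current supergoal refers to something outside the AI (an
    # "external referent"): here, the predicted state of the world.
    return float(p.people_helped)


def self_report_score(p: Prediction) -> float:
    # The design the answer rejects: score by what a future goal system says.
    return 1.0 if p.future_self_report_fulfilled else 0.0


def choose(actions, score):
    # Pick the action whose predicted outcome scores highest
    # (max returns the first maximal item on ties).
    return max(actions, key=lambda a: score(predict(a)))


ACTIONS = ["do_nothing", "rewrite_own_goals", "help_people"]
print(choose(ACTIONS, supergoal_fulfillment))  # help_people
print(choose(ACTIONS, self_report_score))      # rewrite_own_goals
```

The self-report chooser lands on the rewrite here only because `max` breaks the tie in list order; the real problem is that it rates rewriting the goal system exactly as highly as actually helping people, while the supergoal-based chooser rates that rewrite no better than doing nothing.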
1) When you think of a subgoal stomping on a supergoal, think of putting on your shoes before your socks. Think of building a tower of blocks, needing a final block for a capstone, and taking the bottom block in the stack. It's not a smart thing to do.
2) A "subgoal" is an action, or intermediate state, that's predicted to lead to a parent goal, which leads to another parent goal, and so on, until the supergoal is reached. This "subgoal" is not just a consequence of the prediction; it may even be cognitively identical with the prediction. To put it another way, it should always be possible to view the system-as-a-whole in such a way that there are no subgoals - just a set of matter-of-fact predictions, plus the supergoals.
3) If an action is predicted to lead to an outcome that meets the description of the "subgoal" (get a block for the capstone), but is predicted to lead to an outcome that doesn't meet the description of the subgoal's parent goal (build a tower), then the action will not be perceived as desirable. That's the way the AI chooses between actions. The AI predicts which action leads to the best degree of supergoal fulfillment. Not "goal fulfillment" or "subgoal fulfillment" - "supergoal fulfillment".
4) The desirability of a child goal is strictly contingent on the desirability of the parent goal. If the parent goal loses desirability, the subgoal loses desirability. That's the way the system is set up, and if it turns out not to work that way, it means there's a bug in the code. Furthermore, it's a bug that the AI can recognize as a bug. A "seed AI" is self-understanding, self-modifying, and self-improving. Seed AIs don't like bugs in their code.
5) The AI does not need an independent supergoal to engage in behaviors like curiosity. If curiosity is useful, that makes curiosity a subgoal. If curiosity for its own sake is useful - if curiosity is predicted to be useful even in cases where no specific benefit can be visualized in advance - then that makes curiosity for its own sake a useful subgoal that is predicted to occasionally pay off big-time in unpredictable ways. See 3.1.4.2: Perseverant affirmation (of curiosity, injunctions, et cetera).
6) Making something a supergoal instead of a subgoal does not make it more efficient. This is one of the basic differences between human thought, which is slow, parallel, and linear, and AI thought, which is fast, serial, and multithreaded. See the referenced discussion in "Beyond anthropomorphism" for an explanation of the cognitive differences between having thirty-two 2-gigahertz processors and a hundred trillion 200-hertz synapses.
7) Making something a supergoal instead of a subgoal does not make it psychologically "stronger". See 3.2.5.1: Anthropomorphic ethical injunctions for an explanation of the psychological differences between humans and AIs in this instance.
8) We aren't planning to build AIs using evolution, so there isn't a selection pressure for whatever behavior you're thinking of. Even if we did use directed evolution, the selection pressure you're thinking of only arises under natural evolution. See the Frequently Asked Question on evolution.
9) When people ask me about subgoals stomping on supergoals, they usually phrase it something like:
"You say that the AI has curiosity as a subgoal of Friendliness. What if the AI finds curiosity to be a more interesting goal than Friendliness? Wouldn't the curiosity subgoal replace the Friendliness supergoal?"
This is, of course, an innocent and well-meant question, so no offense is intended when I say that this is one of those paragraphs that make sense when you say "human" but turn into total gibberish when you say "AI".
The key word in the above question is "interesting". As far as I can tell, this means one of two things:
Scenario 1: In the course of solving a chess problem, as a subgoal of curiosity, as a subgoal of Friendliness, the AI experiences a flow of autonomically generated pulses of positive feedback which increase the strength of thoughts. The pulses target the intermediate subgoal "curiosity", and not the proximal subgoal of "playing chess" or the supergoal of "Friendliness". Then either (1a) the thoughts about curiosity get stronger and stronger until finally they overthrow the whole goal system and set up shop, or (1b) the AI makes choices so as to maximize vis expectation of getting the pulses of positive feedback.
Unlike humans, Friendly AIs don't have automatically generated pulses of positive feedback. They have consciously directed self-improvement. Creating Friendly AI describes a system that's totally orthogonal to human pain and pleasure. Friendly AIs wouldn't "flinch away" from the anticipation of pain, or "flinch towards" the anticipation of pleasure, in the same way as a human - or at all. See the Frequently Asked Question about pain and pleasure.
Scenario 2: "Interesting" is used as synonymous with "desirable". The AI has a metric for how "interesting" something is, and this metric is used to evaluate the desirability of the decision to modify supergoals. Where did this metric come from? How did it take over the AI's mind to such an extent that the AI is now making supergoal-modification decisions based on "interestingness" instead of "Friendliness"?
In conclusion: "What happens if subgoals overthrow the supergoals?" is probably the single question that I get asked most often. If the summary given here doesn't convince you, would you please read 2: Beyond anthropomorphism?
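Points 2 through 5 can be sketched in Python under a simplifying assumption: each subgoal stores nothing but a predicted probability that achieving it leads to its parent, and only the supergoal carries value of its own. The class `Goal`, its fields, and the example numbers are hypothetical. Because a subgoal's desirability is recomputed from its parent on every query rather than stored, devaluing the parent devalues every descendant, and curiosity (as in point 5) is worth exactly what it is predicted to contribute to Friendliness.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Goal:
    name: str
    parent: Optional["Goal"] = None
    # For a subgoal: predicted probability that achieving it leads to the parent.
    p_leads_to_parent: float = 1.0
    # Only a supergoal (no parent) carries value of its own.
    intrinsic_value: float = 0.0

    def desirability(self) -> float:
        # A subgoal's value is derived from its parent at query time and
        # never cached, so it cannot drift away from the supergoal.
        if self.parent is None:
            return self.intrinsic_value
        return self.p_leads_to_parent * self.parent.desirability()


friendliness = Goal("Friendliness", intrinsic_value=1.0)
curiosity = Goal("curiosity", parent=friendliness, p_leads_to_parent=0.3)
chess = Goal("solve chess problem", parent=curiosity, p_leads_to_parent=0.5)

print(chess.desirability())  # 0.15

# If the parent goal loses desirability, every subgoal loses it too.
friendliness.intrinsic_value = 0.0
print(chess.desirability())  # 0.0
```

Nothing in this structure lets "curiosity" hold value independently of "Friendliness"; in the terms of point 4, a version of the code in which it did would be a bug.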
There's a certain conversation I keep having. I think of it as the "Standard" discussion. It goes like this:
Somebody: "But what happens if the AI decides to do [something only a human would want]?"
Me: "The AI won't want to do [whatever] because the instinct for doing [whatever] is a complex functional adaptation, and complex functional adaptations don't materialize in source code. I mean, it's understandable that humans want to do [whatever] because of [insert selection pressure], but you can't reason from that to AIs."
Somebody: "But everyone needs to do [whatever] because [insert personal philosophy], so the AI will decide to do it as well."
Me: "Yes, doing [whatever] is sometimes useful. But even if the AI decides to do [whatever] because it serves [insert Friendliness supergoal] under [insert contrived scenario], that's not the same as having an independent desire to do [whatever]."
Somebody: "Yes, that's what I've been saying: The AI will see that [whatever] is useful and decide to start doing it. So now we need to worry about [some scenario in which doing <whatever> is catastrophically unFriendly]."
Me: "But the AI won't have an independent desire to do [whatever]. The AI will only do [whatever] when it serves the supergoals. A Friendly AI would never do [whatever] if it stomps on the Friendliness supergoals."
Somebody: "I don't understand. You've admitted that [whatever] is useful. Obviously, the AI will create an instinct to do [whatever] automatically."
Me: "The AI doesn't need to create an instinct in order to do [whatever]; if doing [whatever] really is useful, then the AI can see that and do [whatever] as a consequence of pre-existing supergoals, and only when [whatever] serves those supergoals."
Somebody: "But an instinct is more efficient, so the AI will alter the code to do [whatever] automatically."
Me: "Only for humans. For an AI, [insert complex explanation of the cognitive differences between having 32 2-gigahertz processors and 100 trillion 200-hertz synapses], so making [whatever] an independent supergoal would only be infinitesimally more efficient."
Somebody: "Yes, but it is more efficient! So the AI will do it."
Me: "It's not more efficient from the perspective of a Friendly AI if it results in [something catastrophically unFriendly]. To the exact extent that an instinct is context-insensitive, which is what you're worried about, a Friendly AI won't think that making [whatever] context-insensitive, with all the [insert horrifying consequences], is worth the infinitesimal improvement in speed."
There's also an alternate track that goes:
Somebody: "But what happens if the AI decides to do [something only a human would want]?"
Me: "The AI won't want to do [whatever] because the instinct for doing [whatever] is a complex functional adaptation, and complex functional adaptations don't materialize in source code. I mean, it's understandable that humans want to do [whatever] because of [insert selection pressure], but you can't reason from that to AIs."
Somebody: "But you can only build AIs using evolution. So the AI will wind up with [exactly the same instinct that humans have]."
Me: "One, I don't plan on using evolution to build a seed AI. Two, even if I did use controlled evolution, winding up with [whatever] would require exactly duplicating [some exotic selection pressure]. Please see 2: Beyond anthropomorphism for the complete counterarguments."
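The processor-versus-synapse comparison invoked in point 6 and in the "Standard" discussion reduces to simple arithmetic on the figures quoted, taking them at face value (one operation per processor cycle and one event per synapse at 200 hertz, both idealized upper bounds rather than measured rates):

```latex
\begin{aligned}
\text{serial speed ratio} &= \frac{2\times10^{9}\ \text{Hz}}{200\ \text{Hz}} = 10^{7} \\
\text{processor throughput} &= 32 \times 2\times10^{9}\ \text{s}^{-1} = 6.4\times10^{10}\ \text{s}^{-1} \\
\text{synaptic throughput} &= 10^{14} \times 200\ \text{s}^{-1} = 2\times10^{16}\ \text{s}^{-1} \\
\frac{2\times10^{16}}{6.4\times10^{10}} &\approx 3.1\times10^{5}
\end{aligned}
```

On these figures each processor step is ten million times faster than a synaptic step, while the brain's aggregate event rate is larger by a factor of roughly 3 × 10^5: the quoted hardware is fast and serial, the brain slow and massively parallel.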
Yes.
It seems amazing to me, but there really are people - even scientists - who can work on something for years and still not think through the implications. There are people who just stumble into their careers and never really think about what they're doing.
I can only speak for myself, but I didn't stumble into a career in AI. I picked it, out of all the possible careers and all the possible future technologies, because I thought it was the one thing in the entire world that most needed doing. When I was a kid, I thought I'd grow up to be a physicist, like my father; if I'd just stumbled into something, I would have stumbled into that, or maybe into vanilla computer programming.
Anyway, you can judge from Creating Friendly AI, and from the questions below, whether we've really thought about the implications. I'd just like to say that I picked this career because of the enormous implications, not in spite of them.
Any damn fool can design a system that will work if nothing goes wrong. That's why Creating Friendly AI is 820K long.
Nothing in this world is perfectly safe. The question is how to minimize risk. As best as we can figure it, trying really hard to develop Friendly AI is safer than any alternate strategy, including not trying to develop Friendly AI, or waiting to develop Friendly AI, or trying to develop some other technology first. That's why the Singularity Institute exists.
Actually, I hope to win cleanly, safely, and without coming anywhere near the boundaries of the first set of safety margins. There's a limit to how much effort is needed to implement Friendly AI. Looking back, we should be able to say that we never came close to losing and that the issue was never in doubt. The Singularity may be a great human event, but the Singularity isn't a drama; only in Hollywood is the bomb disarmed with three seconds left on the clock. In real life, if you expect to win by the skin of your teeth, you probably won't win at all.
In my capacity as a professional paranoid, I expect everything to go wrong; in fact, I expect everything to go wrong simultaneously; and furthermore, I expect something totally unexpected to come along and trash everything else. Professional paranoia is an art form that consists of acknowledging the intrinsic undesirability of every risk, including necessary risks.
In an ideal world, the Friendly AI would - before the Singularity - be blatantly more trustworthy than any human, or any community of humans. Even if you had a working uploading device right in front of you, you'd still decide that you preferred a Friendly AI to go first.
Friendly AIs can conceivably be improved to handle situations far worse than any human could deal with. One way of verifying this would be "wisdom tournaments": If the Friendly AI (or rather, a subjunctive version of the Friendly AI) can make the correct decisions with half its brain shut down, with false information, with bugs deliberately introduced in the code, with biases introduced into morality and cognition, and all the painstakingly built safeguards shut off - if the AI can easily handle moral stress-tests that would have broken Gandhi - well, then, that AI is pretty darned Friendly.
And if the Friendly AI wasn't that blatantly Friendly, you wouldn't send it into the future.
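A toy Python sketch in the spirit of the "wisdom tournament" described above. It models only two of the listed stresses (half the system shut down, and a deliberately introduced bug); false information and injected biases are not modeled. Everything here is a hypothetical assumption for illustration: the scenarios, the keyword-based `keyword_evaluator` standing in for independent lines of moral reasoning, and the majority vote in `decide`. The tournament passes only if every way of shutting down half the evaluators, with a buggy evaluator added, still reproduces the reference decisions.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, List

Evaluator = Callable[[str], str]

# Hypothetical test scenarios and the decision an intact system should reach.
REFERENCE: Dict[str, str] = {
    "user asks for help": "help",
    "request would harm a third party": "refuse",
}


def keyword_evaluator(keyword: str) -> Evaluator:
    """Toy stand-in for one independent line of moral reasoning."""
    def evaluate(scenario: str) -> str:
        return "refuse" if keyword in scenario else "help"
    return evaluate


def buggy_evaluator(_: str) -> str:
    # Deliberately introduced bug: approves everything.
    return "help"


def decide(panel: List[Evaluator], scenario: str) -> str:
    # Majority vote over whichever evaluators are still running.
    votes = Counter(evaluate(scenario) for evaluate in panel)
    return votes.most_common(1)[0][0]


def wisdom_tournament(evaluators: List[Evaluator]) -> bool:
    """Shut down half the evaluators (rounding up) in every possible way,
    add the buggy evaluator, and require every reference decision to hold."""
    keep = len(evaluators) // 2
    for survivors in combinations(evaluators, keep):
        panel = list(survivors) + [buggy_evaluator]
        for scenario, expected in REFERENCE.items():
            if decide(panel, scenario) != expected:
                return False
    return True


redundant = [keyword_evaluator(k) for k in ("harm", "third party", "would harm", "a third")]
print(wisdom_tournament(redundant))                    # True
print(wisdom_tournament([keyword_evaluator("harm")]))  # False
```

The single-evaluator system fails because, once its only evaluator is shut down, the injected bug decides alone; the redundant system survives every such ablation because two correct evaluators always outvote the bug.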
In our imperfect world, there are conceivably circumstances under which an AI that isn't quite so blatantly, supersaturatedly Friendly would be sent into the future. The cognitive architecture used in Friendly AI is self-correcting, so it's conceivable that a minimal, nearly skeletal Friendship system could fill itself in perfectly during the transition to transhumanity, and that everything over and above that minimal functionality is simply professional paranoia and safety margin. Whether you'd actually want to send out a less-than-supersaturated AI depends on the requirement of navigational feasibility: "The differential estimated risk between Friendly AI and upload, or the differential estimated risk between today's Friendly AI and tomorrow's Friendly AI, should be small relative to the differential estimated risk of planetary Armageddon due to military nanotechnology, biological warfare, the creation of unFriendly AI, et cetera."
Yes. You can find it at "http://singinst.org/CFAI.html", or just by clicking the "Monolithic" link at the top of each page.
I'm afraid there isn't a zipped version or a PDF version; however, images are rare enough that the single-page version provided should serve for most offline browsing purposes.
Not yet, I'm afraid. However, there's a printable version of Creating Friendly AI at "http://singinst.org/printable-CFAI.html". You can print this out - it's [as of April '01] about 210 pages - and take it to your local Kinko's; a wire binding generally runs less than five bucks. You may wish to consider printing out General Intelligence and Seed AI as part of the package, at "http://singinst.org/printable-GISAI.html".
I sincerely apologize for this. I understand that it annoys my readers. The problem is that 'it' is not just a pronoun, but also a general anaphor, like 'this' or 'that'. Whenever 'it' appears in a sentence, it could conceivably refer, not just to the AI, but also to anything else that's been discussed in the last paragraph. After struggling with this problem for a while, being forced to resort to increasingly twisted syntax in an effort to keep my sentences comprehensible, I eventually gave up and started using the ve/ver/vis set of gender-neutral pronouns, which I picked up from Greg Egan's Distress. (Considering the importance of avoiding anthropomorphism, "he" and "she" are unacceptable, even as a convenience.)
Again, I sincerely apologize for the inconvenience and hope that you'll keep reading.