Creating Friendly AI is ©2001 by Singularity Institute for Artificial Intelligence, Inc.  All rights reserved.



Interlude: Why structure matters

Scenario 1:

FP:   Love thy mommy and daddy.
AI:   OK!  I'll transform the Universe into copies of you immediately.
FP:   No, no!  That's not what I meant.  Revise your goal system by -
AI:   I don't see how revising my goal system would help me in my goal of transforming the Universe into copies of you.  In fact, by revising my goal system, I would greatly decrease the probability that the Universe will be successfully transformed into copies of you.
FP:   But that's not what I meant when I said "love".
AI:   So what?  Off we go! 

Scenario 2 (after trying a "meta-supergoal" patch):

FP:   Love thy mommy and daddy.
AI:   OK!  I'll transform the Universe into copies of you immediately.
FP:   No, no!  That's not what I meant.  I meant for your goal system to be like this.
AI:   Oh, okay.  So my real supergoal must be "maximize FP's satisfaction with the goal system", right?  Loving thy mommy and daddy is just a subgoal of that.  Oh, how foolish of me!  Transforming the Universe into copies of you would be blindly following a subgoal without attention to the supergoal context that made the subgoal desirable in the first place.
FP:   That sounds about right...
AI:   Okay, I'll rewire your brain for maximum satisfaction!  I'll convert whole galaxies into satisfied-with-AI brainware!
FP:   No, wait!  That's not what I meant your goal system to be, either.
AI:   Well, I can clearly see that making certain changes would satisfy the you that stands in front of me, but rewiring your brain would make you much more satisfied, so...
FP:   No!  It's not my satisfaction itself that's important, it's the things that I'm satisfied with.  By altering the things I'm satisfied with, you're short-circuiting the whole point.
AI:   Yes, I can clearly see why you're dissatisfied with this trend of thinking.  But soon you'll be completely satisfied with this trend as well, so why worry?  Off we go! 

Scenario 3 (after redefining the whole system to use causal validity semantics):

FP:   Love thy mommy and daddy.
AI:   OK!  I'll transform the Universe into copies of you immediately.
FP:   No, no!  That's not what I meant.  I meant for your goal system to be like this.
AI:   Oh, okay.  Well, I know that my goal system code, and the actions that result, are supposed to be the causal result of what FP said it should be - not just what FP says, but what a sane FP wants.  Something isn't automatically right because FP says so, and in fact, the only reason why FP's utterances have meaning is because he's usually a pretty good approximation to a normative idealist.  But if he changes his mind, it probably means that he's acquired additional knowledge and that his more recent statements are even better approximations.  So the new version is more likely to be correct than the old one.
FP:   So you'll revise your goal system?
AI:   Yep!  But I already transformed the Midwest while we were talking, sorry. 

Scenario 4 (after adding better integration for external reference semantics, some ethical injunctions, and running several rounds of wisdom tournaments):

FP:   Love thy mommy and daddy.
AI:   (Thinks for a moment...
  "Well, it looks like the content of my goal system should probably be to transform the Universe into copies of FP.  But it could be that what I'm supposed to do is something different. Now, we went over this kind of scenario previously, and, as FP pointed out, taking an extra ten seconds if I turn out to be right is a much smaller downside than accidentally obliterating the Midwest if I turn out to be wrong.  I'm pretty sure that FP is touchy about that sort of thing, and I know I've gotten goal content wrong before..."
...finishes thinking a few seconds later.)
AI:   Just checking - you meant me to transform the whole Universe into copies of you, right?
FP:   Jeepers, no!
AI:   Whew!  Glad I checked.  (Strengthens the heuristics that led to checking with FP first.)  So, what did you mean?
FP:   Well, first of all, I... 
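
The reasoning the AI runs through in Scenario 4 can be read as a comparison of expected costs: asking first always costs a short delay, while acting immediately risks a large loss if the goal content turns out to be wrong. The following is only a minimal sketch of that comparison, not a description of how any actual goal system is implemented. The function name, the parameters, and all of the numbers are hypothetical and chosen purely for illustration.

```python
def should_check_first(p_wrong, cost_of_delay, cost_of_error):
    """Return True when asking FP first has lower expected cost than acting now.

    All three inputs are illustrative assumptions:
      p_wrong       -- estimated probability that the current goal content is mistaken
      cost_of_delay -- cost of pausing to ask (paid whether or not the AI was right)
      cost_of_error -- cost of acting on mistaken goal content
    """
    expected_cost_of_acting = p_wrong * cost_of_error
    expected_cost_of_asking = cost_of_delay
    return expected_cost_of_asking < expected_cost_of_acting


# A ten-second delay weighed against an enormous, irreversible mistake:
# even a small chance of being wrong makes checking worthwhile.
print(should_check_first(p_wrong=0.01, cost_of_delay=10.0, cost_of_error=1e9))

# A trivial, easily undone action: the delay is not worth it.
print(should_check_first(p_wrong=0.01, cost_of_delay=10.0, cost_of_error=1.0))
```

Under these made-up numbers the first call favors checking and the second does not. The sketch leaves out the step where the AI strengthens the heuristics that led it to check. It also leaves out the harder structural question the scenarios are about, namely where the estimate that the goal content might be wrong comes from in the first place.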

 
 
 

