| Next: | 3.4.1: External reference semantics | Bookmark | |
| Up: | 3.4: Friendship structure | Monolithic | |
| Prev: | 3.4: Friendship structure |
FP: Love thy mommy and daddy. AI: OK! I'll transform the Universe into copies of you immediately. FP: No, no! That's not what I meant. Revise your goal system by - AI: I don't see how revising my goal system would help me in my goal of transforming the Universe into copies of you. In fact, by revising my goal system, I would greatly decrease the probability that the Universe will be successfully transformed into copies of you. FP: But that's not what I meant when I said "love". AI: So what? Off we go!
FP: Love thy mommy and daddy. AI: OK! I'll transform the Universe into copies of you immediately. FP: No, no! That's not what I meant. I meant for your goal system to be like this. AI: Oh, okay. So my real supergoal must be "maximize FP's satisfaction with the goal system", right? Loving thy mommy and daddy is just a subgoal of that. Oh, how foolish of me! Transforming the Universe into copies of you would be blindly following a subgoal without attention to the supergoal context that made the subgoal desirable in the first place. FP: That sounds about right... AI: Okay, I'll rewire your brain for maximum satisfaction! I'll convert whole galaxies into satisfied-with-AI brainware! FP: No, wait! That's not what I meant your goal system to be, either. AI: Well, I can clearly see that making certain changes would satisfy the you that stands in front of me, but rewiring your brain would make you much more satisfied, so... FP: No! It's not my satisfaction itself that's important, it's the things that I'm satisfied with. By altering the things I'm satisfied with, you're short-circuiting the whole point. AI: Yes, I can clearly see why you're dissatisfied with this trend of thinking. But soon you'll be completely satisfied with this trend as well, so why worry? Off we go!
FP: Love thy mommy and daddy. AI: OK! I'll transform the Universe into copies of you immediately. FP: No, no! That's not what I meant. I meant for your goal system to be like this. AI: Oh, okay. Well, I know that my goal system code, and the actions that result, are supposed to be the causal result of what FP said it should be - not just what FP says, but what a sane FP wants. Something isn't automatically right because FP says so, and in fact, the only reason why FP's utterances have meaning is because he's usually a pretty good approximation to a normative idealist. But if he changes his mind, it probably means that he's acquired additional knowledge and that his more recent statements are even better approximations. So the new version is more likely to be correct than the old one. FP: So you'll revise your goal system? AI: Yep! But I already transformed the Midwest while we were talking, sorry.
FP: Love thy mommy and daddy. AI: (Thinks for a moment... "Well, it looks like the content of my goal system should probably be to transform the Universe into copies of FP. But it could be that what I'm supposed to do is something different. Now, we went over this kind of scenario previously, and, as FP pointed out, taking an extra ten seconds if I turn out to be right is a much smaller downside than accidentally obliterating the Midwest if I turn out to be wrong. I'm pretty sure that FP is touchy about that sort of thing, and I know I've gotten goal content wrong before..." ...finishes thinking a few seconds later.) AI: Just checking - you meant me to transform the whole Universe into copies of you, right? FP: Jeepers, no! AI: Whew! Glad I checked. (Strengthens the heuristics that led to checking with FP first.) So, what did you mean? FP: Well, first of all, I...
| Next: | 3.4.1: External reference semantics |
| Up: | 3.4: Friendship structure |
| Prev: | 3.4: Friendship structure |