Creating Friendly AI is ©2001 by Singularity Institute for Artificial Intelligence, Inc.  All rights reserved.

Next: 3.5: Developmental Friendliness
Up: 3.4: Friendship structure
Prev: 3.4.3: Causal validity semantics


3.4.4: The actual definition of Friendliness

NOTE: If you just jumped directly here, forget it.  Read the rest of CFAI first.

I usually think of an individual human as being the sum of three layers of functional complexity.  The bottom layer is the panhuman: the set of complex functional adaptations that are shared by all humans who are not actually suffering from neurological disorders (with any given complex functional adaptation being shared by, say, at least 99% of the population and almost always more).  The middle layer is the Gaussian: the distribution of quantitative ability levels (or anything else that can be quantitative, such as the intrinsic strength of an emotion), which almost always lies along a bell curve.  The final layer is what I think of as the "personal" or "personality" layer: all the complex data structures, the patterned data, the beliefs and memes and so on.

The renormalizing shaper network should ultimately ground itself in the panhuman and Gaussian layers, without use of material from the personality layer of the original programmer.  This is how "programmer independence" is ultimately defined.

Humanity is diverse, and there's still some variance even in the panhuman layer, but it's still possible to conceive of a description of humanity as a whole, and not just of any one individual human, by superposing all the variances in the panhuman layer into one description of humanity.  Suppose, for example, that any given human has a preference for X; this preference can be thought of as a cloud in configuration space.  Certain events very strongly satisfy the metric for X; others satisfy it more weakly; other events satisfy it not at all.  Thus, there's a cloud in configuration space, with a clearly defined center.  If you take something in the panhuman layer (not the personal layer) and superimpose the clouds of all humanity, you should end up with a slightly larger cloud that still has a clearly defined center.  Any point that is squarely in the center of the cloud is "grounded in the panhuman layer of humanity".

Similarly, for Gaussian abilities, some abilities are recognized by the shaper network as being "good", and those are just amped to the right end of the graph, or way off the end of the graph.  If an ability is not "good" or "bad" but its level is still important, or its relative level is important, this can be determined by reference to the superposition of humanity.

Panhuman attributes that we would think of as "selfish" or "observer-biased" tend to cancel out in the superposition; since each individual human has a drastically different definition, the cloud is very thin, and insofar as it can be described at all, would center about equally on each individual human.  Panhuman attributes such as "altruism", especially morally symmetric altruism or altruism that has been phrased using the semantics of objectivity, or by other means made a little more convergent for use in "morality" and not just the originating mind, build up very strongly when all the humans on Earth are superposed.  The difference is analogous to that between a beam of incoherent light and a laser.
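The cloud picture can be made concrete with a toy numerical sketch.  Everything in it is an illustrative assumption rather than part of the definition: configuration space is reduced to a single axis, each person's preference cloud is modeled as a Gaussian bump, and the names cloud, superpose, and peak_and_sharpness are hypothetical.  A shared attribute is modeled as clouds whose centers nearly coincide; an observer-biased attribute is modeled as clouds centered on each individual's own position.

```python
import math
import random

def cloud(center, width):
    """A preference 'cloud': how strongly a point x on a 1-D axis satisfies the preference."""
    def strength(x):
        return math.exp(-((x - center) ** 2) / (2 * width ** 2))
    return strength

def superpose(clouds, grid):
    """Average all clouds pointwise over the grid (the 'superposition')."""
    return [sum(c(x) for c in clouds) / len(clouds) for x in grid]

def peak_and_sharpness(profile, grid):
    """Return the location of the highest point and the ratio of peak height to mean height."""
    peak_value = max(profile)
    peak_x = grid[profile.index(peak_value)]
    mean_value = sum(profile) / len(profile)
    return peak_x, peak_value / mean_value

rng = random.Random(0)                      # fixed seed so the sketch is repeatable
people = 200                                # assumed toy population size
grid = [i / 100 for i in range(-500, 501)]  # the 1-D "configuration space", from -5 to 5

# Shared attribute: everyone's cloud is centered near the same point, with small individual variance.
shared = [cloud(rng.gauss(0.0, 0.3), 1.0) for _ in range(people)]

# Observer-biased attribute: each cloud is centered on that person's own position.
observer_biased = [cloud(rng.uniform(-5.0, 5.0), 1.0) for _ in range(people)]

shared_center, shared_sharpness = peak_and_sharpness(superpose(shared, grid), grid)
biased_center, biased_sharpness = peak_and_sharpness(superpose(observer_biased, grid), grid)

print("shared attribute:  center", shared_center, "sharpness", round(shared_sharpness, 2))
print("observer-biased:   center", biased_center, "sharpness", round(biased_sharpness, 2))
```

Under these assumptions the shared attribute superposes into a slightly wider cloud with a clear center near zero, while the observer-biased attribute superposes into a nearly flat, thin cloud whose highest point is an accident of sampling.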

Is this fair?  Consider two children arguing over a candy bar.  One says he should get all of it; the other says they should split it evenly and she should get one-half.  Is it fair to split the candy bar by giving him three-quarters and her one-quarter?  No.  The fair distribution is half and half.  There is an irreducible level of fairness, and if 95% of humanity agrees that half and half is irreducibly fair, then the remaining 5% who each individually think that it's more fair if they each get the entire Solar System do not impact on the majority vote.  And unless you're one of that 5%, this should be "fair" according to your intuitions; there should be no fairer way of doing it.  (Also, the remaining 5% would look like a thin cloud rather than a laser beam; the point here is that even the thin cloud fails to affect things, because while higher layers might be settled by compromise, the bottom layer of irreducible fairness is settled by majority vote of the panhuman superposition of humanity.)
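The contrast between compromise and majority vote can be sketched in a few lines.  This is only an illustration under assumed numbers: a hypothetical population of 100, with the 95%/5% split taken from the paragraph above, and majority_rule is an illustrative helper name.

```python
from collections import Counter
from statistics import mean

def majority_rule(proposals):
    """Return the most common proposal and the fraction of the population backing it."""
    counts = Counter(proposals)
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(proposals)

# The candy bar: he claims all of it, she proposes an even split.
# Averaging the two claims (a "compromise") gives him three-quarters,
# which is the distribution the text rejects as unfair.
his_share_by_compromise = mean([1.0, 0.5])

# Hypothetical population of 100: 95 endorse the even split, and each of the
# remaining 5 wants everything for himself or herself, so no two dissenters agree.
population = ["split evenly"] * 95 + [f"everything to person {i}" for i in range(5)]
winner, share = majority_rule(population)

print("compromise gives him:", his_share_by_compromise)
print("majority vote:", winner, "with share", share)
```

Because each dissenter names a different beneficiary, the dissenting proposals never accumulate behind a single alternative; they form the thin cloud, and the even split wins with the full 95%.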

If there is really anything that matters to the final outcome of Friendliness in the personality layer, and I don't think there is, it can be settled by, e.g., majority vote of the superposition of normative humanity, or just the superposition of humanity if the decision needs to be made prior to the determination of "normative".

And that is programmer-independent normative altruism.

All that is required is that the initial shaper network of the Friendly AI converge to normative altruism.  This requires all the structural Friendliness so far described, an explicit surface-level decision of the starting set to converge, prejudice against circular logic as a surface decision, protection against extraneous causes by causal validity semantics and surface decision, use of a renormalization complex enough to prevent accidental circular logic, a surface decision to absorb the programmer's shaper network and normalize it, plus the assorted injunctions, ethical injunctions, and anchoring points that reduce the probability of catastrophic failure.  Add in an initial, surface-level decision to implement volitional Friendliness so that the AI is also Friendly while converging to final Friendliness...

And that is Friendly AI.

3.4.4.1: Requirements for "sufficient" convergence

Complete convergence, a perfectly unique solution, is the ideal.
In the absence of perfect convergence, the solution must be sufficiently convergent:


