Creating Friendly AI is ©2001 by Singularity Institute for Artificial Intelligence, Inc.  All rights reserved.



An Introduction to Goal Systems

Goal-oriented behavior is behavior that leads the world towards a particular state.  A thermostat is the classic example of goal-oriented behavior; a thermostat turns on the air conditioning when the temperature reaches 74 degrees and turns on the heat when the temperature reaches 72.  The thermostat steers the world towards the state in which the temperature equals 73 - or rather, a state that can be described by "the house has a temperature of 73"; there are zillions (ten-to-the-zillions, rather) of possible physical states that conform to this description, even ignoring all the parts of the Universe outside the room.  Technically, the thermostat steers the room towards a particular volume of phase space, rather than a single point; but the set of points, from our perspective, is compact enough to be given a single name.  Faced with enough heat, the thermostat may technically fail to achieve its "goal", and the temperature may creep up past 75, but the thermostat still activates the air conditioning, and the thermostat is still steering the room closer to 73 degrees than it otherwise would have been.
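The thermostat's behavior can be sketched as a simple on/off ("bang-bang") rule.  This is a minimal illustration assuming only the two thresholds given above; the function name and the action labels are hypothetical, not taken from any real device.

```python
# Thresholds from the text (degrees); the goal state is "about 73".
COOL_AT = 74.0  # turn on the air conditioning at or above this temperature
HEAT_AT = 72.0  # turn on the heat at or below this temperature


def thermostat(temperature: float) -> str:
    """Return the action that pushes the room back towards roughly 73 degrees."""
    if temperature >= COOL_AT:
        return "air_conditioning"
    if temperature <= HEAT_AT:
        return "heat"
    return "off"  # inside the 72-74 band: close enough to the goal state


if __name__ == "__main__":
    for t in (70.0, 73.0, 75.5):
        print(t, thermostat(t))
```

Note that the rule still returns "air_conditioning" at 75.5, past the point where it has "failed"; it keeps steering in the right direction even when it cannot reach the goal.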

Within a mind, goal-oriented behaviors arise from goal-oriented cognition.  The mind possesses a mental image of the "desired" state of the world, and a mental image of the actual state of the world, and chooses actions such that the projected future of world-plus-action leads to the desired outcome state.  Humans implement this process through a vast, tangled system of instincts, emotions, mental images, intuitions, pleasure and pain, and thought sequences; nonetheless, the overall description usually holds true.
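Unlike a fixed threshold rule, goal-oriented cognition consults a model: it projects world-plus-action for each candidate and compares the projection to the desired state.  The sketch below assumes a state is a dictionary of numeric features, that "closeness to the goal" is a simple sum of absolute differences, and that `toy_predict` is a hypothetical world model of a robot moving along a line; none of these come from the original text.

```python
from typing import Callable, Dict, Iterable

State = Dict[str, float]


def choose_action(current: State, desired: State, actions: Iterable[str],
                  predict: Callable[[State, str], State]) -> str:
    """Pick the action whose projected world-plus-action lies closest to the desired state."""
    def mismatch(state: State) -> float:
        # Distance between a projected state and the mental image of the goal.
        return sum(abs(state[key] - desired[key]) for key in desired)

    return min(actions, key=lambda action: mismatch(predict(current, action)))


def toy_predict(state: State, action: str) -> State:
    """Hypothetical world model: a robot on a line moves one unit per step."""
    step = {"left": -1.0, "right": 1.0, "stay": 0.0}[action]
    return {"x": state["x"] + step}


print(choose_action({"x": 0.0}, {"x": 3.0}, ["left", "right", "stay"], toy_predict))  # right
```

The decision depends on both inputs: change `desired` or change `predict`, and the chosen action can change.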

Any real-world AI will employ goal-oriented cognition.  It might be theoretically possible to build an AI that made choices by selecting the first perceived option in alphabetical ASCII order, but this would result in incoherent behavior (at least, incoherent from our perspective) with actions cancelling out, rather than reinforcing each other.  In a self-modifying AI, such incoherent behavior would rapidly tear the mind apart from the inside, if it didn't simply result in a string of error messages (effective stasis).  Of course, if it were possible to obtain Friendly behavior by choosing the first option in alphabetical order, and such a system were stably Friendly under self-modification, then that would be an excellent and entirely acceptable decision system!  Ultimately, it is the external behaviors we are interested in.  Even that is an overstatement; we are interested in the external results.  But as far as we humans know, the only way for a mind to exhibit coherent behavior is to model reality and the results of actions.  Thus, internal behaviors are as much our concern as external actions.  Internal behaviors are the source of the final external results.

To provide a very simple picture of a choice within a goal-oriented mind:

NOTE: Don't worry about the classical-AI look.  The neat boxes are just so that everything fits on one graph.  The fact that a single box is named "Goal B" doesn't mean that "Goal B" is a data structure; Goal B may be a complex of memories and abstracted experiences.  In short, consider the following graph to bear the same resemblance to the AI's thoughts that a flowchart bears to a programmer's mind.

 

Diagram 1: Simple choice

NOTE: Blue lines indicate predictions.  Rectangles indicate goals.  Diamonds indicate choices.  An oval or circle indicates a (non-goal) object or event within the world-model.

For this simple choice, the desirability of A is 23.75, and the desirability of ~A is 8.94, so the mind will choose A.  If A is not an atomic action - if other events are necessary to achieve A - then A's child goals will derive their desirability from the total desirability of A, which is 14.81 (the desirability of A minus the desirability of ~A).  If some new Event E has an 83% chance of leading to A, all else being equal, then Event E will become a child goal of A, and will have desirability of 12.29 (0.83 × 14.81).  If B's desirability later changes to 10, the inherent desirability of A will change to 19, the total desirability of A will change to 10.06, and the desirability of E will change to 8.35.  The human mind, of course, does not use such exact quantities; rather, it uses qualitative "feels" for how probable or improbable, desirable or undesirable, an event is.  The uncertainties inherent in modeling the world render it too expensive for a neurally-based mind to track desirabilities to four significant figures.  A mind based on floating-point numbers might track desirabilities to nineteen decimal places, but if so, it would not contribute materially to intelligence (1).
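The figures in this paragraph are reproduced exactly by two rules that can be read off the numbers themselves: the total desirability of A is the desirability of A minus that of ~A, and a child goal's desirability is its probability of leading to the parent times the parent's total desirability.  The sketch below recomputes them; the function names are illustrative, and rounding to two places simply matches the precision used in the text.

```python
def total_desirability(choose: float, refrain: float) -> float:
    """Net desirability of a choice: desirability of A minus desirability of ~A."""
    return choose - refrain


def child_desirability(p_leads_to_parent: float, parent_total: float) -> float:
    """A child goal inherits desirability in proportion to its predicted probability."""
    return p_leads_to_parent * parent_total


a_total = total_desirability(23.75, 8.94)
e = child_desirability(0.83, a_total)
print(round(a_total, 2), round(e, 2))  # 14.81 12.29

# After B's desirability changes, A's inherent desirability becomes 19.
a_total_after = total_desirability(19.0, 8.94)
e_after = child_desirability(0.83, a_total_after)
print(round(a_total_after, 2), round(e_after, 2))  # 10.06 8.35
```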

In goal-oriented cognition, the actions chosen, and therefore the final results, are strictly dependent on the model of reality, as well as the desired final state.  A mind that desires a wet sponge, and knows that placing a sponge in water makes it wet, will choose to place the sponge in water.  A mind that desires a wet sponge, and which believes that setting a sponge on fire makes it wet, will choose to set the sponge on fire.  A mind that desires a burnt sponge, and which believes that placing a sponge in water burns it, will choose to place the sponge in water.  A mind which observes reality, and learns that wetting a sponge requires water rather than fire, may change actions (2).
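The sponge examples can be condensed into a few lines: the same desire paired with different beliefs yields different actions, and learning from observation changes the action.  This is a minimal sketch assuming beliefs are stored as a hypothetical table from action to predicted effect; `choose` and `learn` are illustrative names.

```python
from typing import Dict, Optional

Beliefs = Dict[str, str]  # action -> predicted effect on the sponge


def choose(desired_effect: str, beliefs: Beliefs) -> Optional[str]:
    """Return the first action the mind *believes* will produce the desired effect."""
    for action, predicted in beliefs.items():
        if predicted == desired_effect:
            return action
    return None  # no believed route to the goal


def learn(beliefs: Beliefs, action: str, observed_effect: str) -> Beliefs:
    """Replace a prediction with what was actually observed."""
    updated = dict(beliefs)
    updated[action] = observed_effect
    return updated


mistaken = {"place_in_water": "burnt", "set_on_fire": "wet"}
print(choose("wet", mistaken))    # set_on_fire
print(choose("burnt", mistaken))  # place_in_water

corrected = learn(mistaken, "set_on_fire", "burnt")
corrected = learn(corrected, "place_in_water", "wet")
print(choose("wet", corrected))   # place_in_water
```

The goal ("wet") never changes in this example; only the model does, and the chosen action follows the model.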

One of the most important distinctions in Friendly AI is the distinction between supergoals and subgoals.  A subgoal is a way station, an intermediate point on the way to some parent goal, like "getting into the car" as a child goal of "driving to work", or "opening the door" as a child goal of "getting into the car", or "doing my job" as a parent goal of "driving to work" and a child goal of "making money" (3).  Child goals are cognitive nodes that reflect a natural network structure in plans; three child goals may be prerequisite to some parent goal, while two grandchild goals are prerequisite to the second of those child goals, and so on.  Subgoals are useful cognitive objects because subgoals reflect a useful regularity in reality; some aspects of a problem can be solved in isolation from others.  Even when subgoals are entangled, so that achieving one subgoal may block fulfilling another, it is still more efficient to model the entanglement than to model each possible combination of actions in isolation.  (For example:  The chess-playing program Deep Blue, which handled the combinatorial explosion of chess through brute force - that is, without chunking facets of the game into subgoals - still evaluated individual board positions by counting pieces and checking strategic positions.  A billion moves per second is not nearly enough to carry all positions to a known win or loss.  Pieces and strategic positions have no intrinsic utility in chess; the supergoal is winning.)
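Counting pieces is a subgoal-like proxy: material has no value of its own, but it usually predicts winning.  The sketch below uses the conventional textbook piece values (an assumption; Deep Blue's actual evaluation function was far more elaborate) and a hypothetical string encoding in which uppercase letters are our pieces and lowercase letters are the opponent's.

```python
# Conventional textbook values; the king is not counted as material.
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9, "K": 0}


def material_score(pieces: str) -> int:
    """Material balance from our side: positive favours us, negative favours the opponent."""
    score = 0
    for ch in pieces:
        value = PIECE_VALUES.get(ch.upper())
        if value is None:
            continue  # ignore anything that is not a piece letter
        score += value if ch.isupper() else -value
    return score


print(material_score("KQRkr"))  # 9: we are a queen up
print(material_score("KPPkn"))  # -1: two pawns against a knight
```

A position can score well here and still be lost; the score is only useful to the extent that it tracks the supergoal.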

Subgoals are cached intermediate states between decisions and supergoals.  It should always be possible, given enough computational power, to eliminate "subgoals" entirely and make all decisions based on a separate prediction of expected supergoal fulfillment for each possible action.  This is the ideal that a normative reflective goal system should conceive of itself as approximating.
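The claim that subgoals are caches can be illustrated with an expectimax-style recursion that scores every action directly by expected supergoal fulfillment, with no subgoals at all.  This is a minimal sketch assuming a tiny hand-written probabilistic world model; the state names, probabilities, and supergoal scores are hypothetical, loosely following the driving-to-work example.

```python
from typing import Dict, List, Tuple

Outcomes = List[Tuple[float, str]]  # (probability, resulting state)

# Hypothetical world model: state -> {action: outcomes}. States with no entry are terminal.
MODEL: Dict[str, Dict[str, Outcomes]] = {
    "home": {
        "drive": [(0.9, "at_work"), (0.1, "late")],
        "stay": [(1.0, "no_pay")],
    },
    "at_work": {
        "do_job": [(1.0, "paid")],
        "slack_off": [(0.5, "paid"), (0.5, "no_pay")],
    },
}
# Hypothetical supergoal: how well each terminal state fulfills "making money".
SUPERGOAL: Dict[str, float] = {"paid": 1.0, "late": 0.3, "no_pay": 0.0}


def value(state: str) -> float:
    """Expected supergoal fulfillment of a state, assuming the best action is taken there."""
    if state not in MODEL:
        return SUPERGOAL[state]
    return max(expected(state, action) for action in MODEL[state])


def expected(state: str, action: str) -> float:
    """Expected supergoal fulfillment of taking `action` in `state`, computed from scratch."""
    return sum(p * value(nxt) for p, nxt in MODEL[state][action])


def best_action(state: str) -> str:
    return max(MODEL[state], key=lambda action: expected(state, action))


# A "subgoal" is just a cached copy of an intermediate state's value.
subgoal_cache = {"at_work": value("at_work")}


def expected_cached(state: str, action: str) -> float:
    """Same as `expected`, but reads intermediate values from the cache."""
    return sum(p * (subgoal_cache[nxt] if nxt in subgoal_cache else value(nxt))
               for p, nxt in MODEL[state][action])


print(best_action("home"))  # drive
```

The cached and uncached computations agree, so the cache is purely an efficiency measure; it would have to be recomputed whenever `MODEL` changes, even though `SUPERGOAL` stays fixed.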

Subgoals reflect regularities in reality, and can thus twinkle and shift as easily as reality itself, even if the supergoals are absolutely constant.  (Even if the world itself were absolutely constant, changes in the model of reality would still be enough to break simplicity.)  The world changes with time.  Subgoals interfere with one another; the consequences of the achievement of one subgoal block the achievement of another subgoal, or downgrade the priority of the other subgoal, or even make the other subgoal entirely undesirable.  A child goal is cut loose from its parent goal and dies, or is cut loose from its parent goal and attached to a different parent goal, or attached to two parent goals simultaneously.  Subgoals acquire complex internal structure, so that changing the parent goal of a subgoal can change the way in which the subgoal needs to be achieved.  The grandparent goals of context-sensitive grandchildren transmit their internal details down the line.  Most of the time, we don't need to track plots this complicated unless we become ensnared in a deadly web of lies and revenge, but it's worth noting that we have the mental capability to track a deadly web of lies and revenge when we see it on television.

None of this complexity necessarily generalizes to the behavior of supergoals, which is why it is necessary to keep a firm grasp on the distinction between supergoals and subgoals.  If generalizing this complexity to supergoals is desirable, it may require a deliberate design effort.

That subgoals are probabilistic adds yet more complexity.  The methods that we use to deal with uncertainty often take the form of "heuristics" - rules of thumb - that have a surprising amount of context-independence.  "The key to strategy is not to choose a path to victory, but to choose so that all paths lead to a victory", for example.  Even more interesting, from a Friendly AI perspective, are "injunctions", heuristics that we implement even when the direct interpretation of the world-model seems opposed.  We'll analyze injunctions later; for now, we'll just note that there are some classes of heuristic - both injunctions, and plain old strategy heuristics - that act on almost all plans.  Thus, plans are produced, not just by the immediate "subgoals of the moment", but also by a store of general heuristics.  Yet such heuristics may still be, ultimately, subgoals - that is, the heuristics may have no desirability independent of the ultimate supergoals.

Cautionary injunctions often defy the direct interpretation of the goal system; they are meant to apply always, even when they look non-useful or anti-useful.  "Leaving margin for error," for example.  If you're the sort of person who leaves for the airport 30 minutes early, then you know that you always leave 30 minutes early, whether or not you think you're likely to need it, whether or not you think that the extra 30 minutes are just wasted time.  This happens for two reasons.  First, your world-model is incomplete; you don't necessarily know about the factors that could cause you to be late.  It's not just a question of there being a known probability of traffic delays; there are also the probabilities that you wouldn't even think to evaluate, such as twisting your ankle in the airport.  Second, there is a sharp payoff discontinuity; arriving 30 minutes early loses 30 minutes, but arriving 30 minutes late loses the price of the plane ticket, possibly a whole day's worth of time before the next available flight, and also prevents you from doing whatever you needed to do at your destination.  "Leaving margin for error" is an example of a generalized subgoal which sometimes defies the short-term interpretation of payoffs, but which, when implemented consistently, maximizes the expected long-term payoff integrated over all probabilities.
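The payoff discontinuity can be made concrete with a small expected-cost calculation.  Every number below is a hypothetical assumption chosen for illustration: a delay distribution in which nothing goes wrong 90% of the time, a cost of one unit per minute spent waiting, and a flat penalty of 600 units for missing the flight.

```python
from typing import List, Tuple

# Hypothetical delay distribution: (probability, minutes of unexpected delay).
DELAYS: List[Tuple[float, int]] = [(0.90, 0), (0.07, 20), (0.03, 45)]
MISSED_FLIGHT_COST = 600.0  # hypothetical penalty, in minute-equivalents


def expected_cost(margin_minutes: int) -> float:
    """Expected cost of leaving `margin_minutes` early, integrated over all delays."""
    cost = 0.0
    for p, delay in DELAYS:
        slack = margin_minutes - delay
        if slack >= 0:
            cost += p * slack               # smooth cost: time wasted waiting
        else:
            cost += p * MISSED_FLIGHT_COST  # discontinuous cost: the flight is missed
    return cost


for margin in (0, 15, 30, 60):
    print(margin, round(expected_cost(margin), 2))

best = min((0, 15, 30, 60), key=expected_cost)
print(best)  # 30
```

Under these assumptions the 30-minute margin is "wasted" in the most likely case, yet it has the lowest expected cost of the options tried; it still does not cover the rare 45-minute delay, which is the kind of gap an incomplete world-model leaves.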

Even heuristics that are supposed to be totally unconditional on events, such as "keeping your sworn word", can be viewed as subgoals - although such heuristics don't necessarily translate well from humans to AIs.  A human who swears a totally unconditional oath may have greater psychological strength than a human who swears a conditional oath, so that the 1% chance of encountering a situation where it would genuinely make sense to break the oath doesn't compensate for losing 50% of your resolve from knowing that you would break the oath if stressed enough.  It may even make sense, cognitively, to install (or preserve) psychological forces that would lead you to regard "make sense to break the oath" as being a nonsensical statement, a mental impossibility.  This way of thinking may not translate well for AIs, or may translate only partially.  (4)  Perhaps the best interim summary is that human decisions can be guided by heuristics as well as subgoals, and that human heuristics may not be cognitively represented as subgoals, even if the heuristics would be normatively regarded as subgoals.

Human decision-making is complex, probably unnecessarily so.  The way in which evolution accretes complexity results in simple behaviors being implemented as independent brainware even when there are very natural ways to view the simple behaviors as special cases of general cognition, since general cognition is an evolutionarily recent development.  For the human goal supersystem, there is no clear way to point to a single level where the "supergoals" are; depending on how you view the human supersystem, supergoals could be identified with declarative philosophical goals, emotions, or pain and pleasure.  Ultimately, goal-oriented cognition is not what humans are, but rather what humans do.  I have my own opinions on this subject, and the phrase "godawful mess" leaps eagerly to mind, but for the moment I'll simply note that the human goal system is extremely complicated; that every single chunk of brainware is there because it was adaptive at some point in our evolutionary history; and that engineering should learn from evolution but never blindly obey it.  The differences between AIs and evolved minds are explored further in the upcoming section 2: Beyond anthropomorphism.

DEFN: Goal-oriented behavior:  Goal-oriented behavior is behavior that steers the world, or a piece of it, towards a single state, or a describable set of states.  The perception of goal-oriented behavior comes from observing multiple actions that coherently steer the world towards a goal; or singular actions which are uniquely suited to promoting a goal-state and too improbable to have arisen by chance; or the use of different actions in different contexts to achieve a single goal on multiple occasions.  Informally:  Behavior which appears deliberate, centered around a goal or desire.

DEFN: Goal-oriented cognition:  Cognition in which a mind possesses a mental image of the "desired" state of the world and a mental image of the actual state of the world, and chooses actions such that the projected future of world-plus-action leads to the desired outcome state.

DEFN: Goal:  A piece of mental imagery present within an intelligent mind which describes a state of the world, or set of states, such that the intelligent mind takes actions which are predicted to achieve the goal state.  Informally:  The image or statement that describes what you want to achieve.

DEFN: Causal goal system:  A goal system in which desirability backpropagates along predictive links.  If A is desirable, and B is predicted to lead to A, then B will inherit desirability from A, contingent on the continued desirability of A and the continued expectation that B will lead to A.  Since predictions are usually transitive - if C leads to B, and B leads to A, it usually implies that C leads to A - the flow of desirability is also usually transitive.
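A causal goal system can be sketched as desirability flowing backwards along predictive links, starting from supergoals whose value is intrinsic.  This is a minimal sketch assuming an acyclic link graph and the hypothetical links below; when a goal predicts several desirable effects, its contributions are summed, which is an assumption of this sketch rather than something the definition specifies.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str, float]  # (cause, effect, probability that cause leads to effect)

LINKS: List[Link] = [
    ("open_door", "in_car", 0.95),
    ("in_car", "drive_to_work", 0.9),
    ("drive_to_work", "do_job", 0.98),
    ("do_job", "make_money", 0.99),
]


def compute_desirability(supergoals: Dict[str, float], links: List[Link]) -> Dict[str, float]:
    """Backpropagate desirability from supergoals along predictive links (acyclic graphs only)."""
    memo: Dict[str, float] = {}

    def value(goal: str) -> float:
        if goal in supergoals:  # intrinsic value, not contingent on predicted outcomes
            return supergoals[goal]
        if goal not in memo:
            # Derived value: contingent on the links and on the effects' desirability.
            memo[goal] = sum(p * value(effect)
                             for cause, effect, p in links if cause == goal)
        return memo[goal]

    goals = {cause for cause, _, _ in links} | set(supergoals)
    return {goal: value(goal) for goal in goals}


d = compute_desirability({"make_money": 1.0}, LINKS)
print(round(d["open_door"], 2))  # 0.83, via the transitive chain of predictions

# Cut the link to the supergoal: every subgoal below it loses its desirability.
cut = [link for link in LINKS if link != ("do_job", "make_money", 0.99)]
print(compute_desirability({"make_money": 1.0}, cut)["open_door"])  # 0.0
```

Subgoal values are recomputed from the current links and supergoals on every call, which mirrors the contingency in the definition: change the supergoal's desirability or a prediction, and the derived desirabilities change with it.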

DEFN: Child goal:  A prerequisite of a parent goal; a state or characteristic which can usefully be considered as an independent event or object along the path to the parent goal.  "Child goal" describes a relation between two goals - it does not make sense to speak of a goal as being "a child" or "a parent" in an absolute sense, since B may be a child goal of A but a parent goal of C.

DEFN: Parent goal:  A source of desirability for a child goal.  The end to which the child goal is the means.  "Parent goal" describes a relation between two goals - it does not make sense to speak of a goal as being "a parent" or "a child" in an absolute sense, since B may be a parent goal of C but a child goal of A.

DEFN: Subgoal:  An intermediate point on the road to the supergoals.  A state whose desirability is contingent on its predicted outcome.

DEFN: Supergoal content:  The root of a directional goal network.  A goal which is treated as having intrinsic value, rather than having derivative value as a facilitator of some parent goal.  An event-state whose desirability is not contingent on its predicted outcome.  (Conflating supergoals with subgoals seems to account for a lot of mistakes in speculations about Friendly AI.)


