General Intelligence and Seed AI is
©2001 by Singularity Institute for Artificial Intelligence, Inc.
All rights reserved.
Intelligence is an evolutionary advantage because it enables us to model, predict, and
manipulate reality. This includes not only Joe Caveman (or rather,
Pat Hunter-Gatherer) inventing the bow and arrow, but Chris Tribal-Chief
outwitting his (1) political
rivals and Sandy Spear-Maker realizing that the reason her spears keep
breaking is that she's being too impatient while making them. That
is, the "reality" we model includes not just things, but other humans,
and the self. (2).
A chain of reasoning is important because it
ends with a conclusion about how the world works, or about how the world
can be altered. The "world", for these purposes, includes the internal
world of the AI; when designing a bicycle, the hypothesis "a round object
can traverse ground without bumping" is a statement about the external
world. The hypotheses "it'd be a good idea to think about round objects",
or "the key problem is to figure out how to interface with the ground",
or even "I feel like designing a bicycle", are statements about the internal
world.
From an external perspective, cognitive events matter only insofar as
they affect external behavior. Just so, from an internal perspective,
the effect on the world-model is the punchline, the substance. This
is not to say that every line of code must make a change to the world-model,
or that the world-model is composed exclusively of high-level beliefs about
the real world. The thought sequences that construct a what-if scenario
- a subjunctive fantasy world - are altering a world-model,
even if it's not the model of the world. A "vague feeling
that there's some kind of as-yet unnamed similarity between two pictures"
is part of the content of the AI's beliefs about the world. The code
that produces that intuition may undergo many internal iterations, acting
on data structures with no obvious correspondence to the world-model, before
producing an understandable output.
What makes a pattern of bytes - or neurons - a "model"? And what
makes a particular statement in that model "true" or "false"? (3).
The best definition I've found is derived from looking at the cause
of our intelligence: "Intelligence is an evolutionary advantage because
it enables us to model, predict, and manipulate reality." Models
are useful because they correspond to external reality.
I distinguish four levels of binding:
- A sensory binding occurs when there is a mapping between the model's
  data structures and characteristics of external reality.
- A predictive binding occurs when the model can be used to correctly
  predict future sensory inputs. (This presumes some kind of sensory
  device targeted on external reality.)
- A decisive binding occurs when the model can predict the effects
  of several possible actions on external reality, and choose whichever
  action gives the best result (according to some goal system). By
  modeling the future given each of several possible actions, it becomes
  possible to choose between futures - that is, between future sensory inputs.
  (If the model is sufficiently accurate.)
- A manipulative binding occurs when a future can be hypothesized,
  and a sequence of actions invented which results in that future.
  Given a desirable future - that is, a high-level property of the model
  which is defined by the goal system as an end in itself ("supergoal"),
  or which is a means to an end ("subgoal") - it is possible to invent the
  actions required to bring the model into correspondence with that future.
  If the model is correct, taking the specified external actions will actually
  result in the desired external reality.
  - Qualitative actions are selected from a finite set. If this
    set is small enough that all possible actions can be modeled - and are
    modeled - then there is no fundamental distinction between a "decisive
    binding", and a manipulative binding that uses qualitative actions.
  - Quantitative actions have one or more real (i.e., floating-point)
    parameters. Since this will usually make an exhaustive, "blind" search
    either theoretically or practically impossible - particularly if the fit
    must be exact - some conscious heuristic, or a reversible
    feature of a sensory modality, must be used to derive the numerical action
    required from the numerical outcome specified. (Note that adding
    a continuous time parameter to a simple on-or-off qualitative action makes
    it a quantitative action.)
  - Structural actions have multiple elements (quantitative or qualitative),
    possibly with links or interactions (quantitative or qualitative).
    Emitting a string of characters - "foobar" - would be an example of a structural
    action. To deduce a required structural action without an exhaustive
    or impossible search requires either (A) a known rule linking actions and
    results, simple enough to be reversible, or (B) deliberate analysis of
    the simpler elements making up the structure.
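As a rough illustration of the three kinds of action, here is a minimal Python sketch. The class names, the `emit_string` and `with_duration` helpers, and the `blind_search_size` function are all assumptions invented for this example; nothing in the definitions above prescribes this representation.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class QualitativeAction:
    """One choice from a finite set, e.g. 'brake' or a single character."""
    name: str


@dataclass(frozen=True)
class QuantitativeAction:
    """An action carrying one or more real-valued parameters."""
    name: str
    params: Tuple[float, ...]


@dataclass(frozen=True)
class StructuralAction:
    """Several simpler actions combined into one structure."""
    elements: Tuple[Union[QualitativeAction, QuantitativeAction], ...]


def emit_string(text: str) -> StructuralAction:
    """Model 'emit a string' as a structure of per-character choices."""
    return StructuralAction(tuple(QualitativeAction(ch) for ch in text))


def with_duration(action: QualitativeAction, seconds: float) -> QuantitativeAction:
    """Adding a continuous time parameter turns an on-or-off action
    into a quantitative one."""
    return QuantitativeAction(action.name, (seconds,))


def blind_search_size(alphabet_size: int, length: int) -> int:
    """Number of candidates a blind search over fixed-length strings faces."""
    return alphabet_size ** length
```

Under these assumptions, `emit_string("foobar")` has six elements, and a blind search over six-character strings from a 26-letter alphabet faces `blind_search_size(26, 6)` = 308,915,776 candidates, which gives some feel for why a structural action usually has to be deduced rather than stumbled upon.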
These definitions raise an army of fundamental issues - time,
causality, subjunctivity, goals, searching,
invention - but first, let's look at a concrete example. Imagine a microworld
composed of Newtonian billiard balls - a world of spheres (or circles),
each with a position, radius, mass, and velocity, interacting on some frictionless
surface (or moving in a two-dimensional vacuum). (4).
The "world-model" for an AI living in that microworld
consists of everything the AI knows about that world - the positions, velocities,
radii, and masses of the billiard balls. More abstract perceptions,
such as "a group of three billiard balls", are also part
of the world-model. The prediction that "billiard ball A and billiard
ball B will collide" is part of the world-model. If the AI imagines
a situation where four billiard balls are arranged in a square, then that
imaginary world has its own, subjunctive world-model.
If the AI believes "'imagining four billiard balls in a square' will prove
useful in solving problem X", then that belief is part of the world-model.
In short, the world-model is not necessarily a programmatic concept
- a unified set of data structures with a common format and API.
(Although it would be wonderfully convenient, if we could pull it off.)
The "world-model" is a cognitive concept; it refers to the content of all
beliefs, the substance of all mental imagery.
Returning to the billiard-ball world, what is necessary for an AI to
have a "model" of this world?
- A sensory binding occurs when there is covariance between internal
  data structures of the AI and external properties of the billiard-ball
  world. For example, the floating-point number representing the
  position of a billiard ball varies with the actual position of that
  ball. We would also require that the same mapping - the same rules
  of interpretation - suffice to establish a binding between the modeled
  positions and actual positions of all the other billiard balls. (5).
- A predictive binding occurs when the model is accurate enough to
  predict the future positions of billiard balls. Assume a sensory
  device that reveals the positions of billiard balls to the AI, with a sensory
  binding (correspondence) between the data output by the sensory device
  and the actual positions. When the AI can establish a sensory binding
  (correspondence) between predicted data and actual data, a predictive binding
  has occurred.
- A decisive binding requires that some limited set of actions be
  available to the AI - for example, choosing whether to subtract some fixed
  increment of momentum each time a billiard ball bounces off a wall.
  (This action has been chosen so as to introduce no quantitative elements.)
  It requires a goal state, such as "three balls halted on the north side
  of the board". It requires that the AI be able to project the results
  of actions - to predict the world-state given the current world-model plus
  the fact of the action. It requires that the AI be able to recognize,
  internally, whether a given imagined result meets the criteria of the goal-state.
  Given these cognitive capabilities in a perfect world (6),
  a blind search through possible actions, combined with the programmatic
  rule "When an imagined situation meets the goal criteria, implement the
  action-list leading to that situation", would create the "atomic" case
  of decisive binding. (Of course, in accordance with the Law of Pragmatism,
  simplifying the design down to the level where it's easy to visualize the
  code has stripped it of all useful intelligence. Real minds are vastly
  more complex.)
- A manipulative binding would occur if, for example, the AI could
  control a cue ball, and knew how to use this cue ball to "create two symmetrical
  groups of three billiard balls". In this particular example, a structural
  result (two groups of three) is obtained through a series of quantitative
  actions (forces applied to the cue ball at particular times).
In the last case, the AI may have been able to manipulate each of the six
billiard balls as a separate object, or each action may have affected multiple
balls simultaneously, requiring a more complex planning process.
The important thing is that "creating two symmetrical groups of three billiard
balls" is not something that would happen by chance, or be uncovered by
a blind search. For the AI to create a structure of billiard balls,
it will need heuristics - knowledge about rules - that not
only link outcomes to actions, but reverse the process to link actions
to outcomes.
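For contrast, the "atomic" case of decisive binding, the one that a blind search can handle, might look something like the following Python sketch. Everything specific here is an assumption made to keep the code short: a single ball bouncing between a north and a south wall, a state written as `(wall last touched, speed)`, a brake that removes exactly one unit of speed per bounce, and the names `step`, `meets_goal`, and `blind_search`. It is a toy, stripped of all useful intelligence, and not a design.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

# Hypothetical toy state: (wall the ball last touched, integer speed).
State = Tuple[str, int]


def step(state: State, action: str) -> State:
    """Project one bounce: the ball crosses to the opposite wall, and
    'brake' subtracts a fixed increment of speed at that bounce."""
    wall, speed = state
    if speed == 0:
        return state  # a halted ball stays where it is
    next_wall = "south" if wall == "north" else "north"
    next_speed = speed - 1 if action == "brake" else speed
    return (next_wall, next_speed)


def meets_goal(state: State) -> bool:
    """Goal criterion: the ball is halted on the north side."""
    return state == ("north", 0)


def blind_search(start: State, actions: Sequence[str],
                 max_depth: int) -> Optional[List[str]]:
    """Imagine every action-list up to max_depth; return the first one
    whose imagined result meets the goal criteria, else None."""
    for depth in range(1, max_depth + 1):
        for plan in product(actions, repeat=depth):
            imagined = start
            for action in plan:
                imagined = step(imagined, action)
            if meets_goal(imagined):
                return list(plan)
    return None


# Braking immediately would halt the ball at the south wall,
# so the search has to find a plan that coasts first.
plan = blind_search(("south", 2), ("coast", "brake"), max_depth=4)
```

In this toy, `plan` comes out as `["coast", "brake", "brake"]`. The search space doubles with every additional bounce, and it only stays enumerable because the actions are qualitative; the cue-ball case has no such luxury.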
Suppose that a cue ball travelling south at 4 meters/second, bumping
into a billiard ball travelling south at 2 meters/second, results in the
cue ball and the billiard ball travelling south at 3 meters/second.
Suppose, furthermore, that these rules are contained within the AI's internal
model of the environment, so that if the AI visualizes a cue ball at {8.2,
6} of radius 1 travelling south at 4 m/s, and a ball at {8.2, 10} of radius
1 going south at 2 m/s, the AI will visualize the balls bumping one second
later at {8.2, 11}, and the two balls then travelling south at 3 m/s.
It's a long way from there to knowing - consciously, declaratively
- that two balls in general bumping at 4 m/s and 2 m/s while going
in the same direction will travel on together at 3 m/s. It's an even
longer way to knowing that "if billiard ball X bumps into billiard ball
Y, then they will continue on together with the average of their velocities".
And it's a still longer way to reversing the rule and knowing that
"to get a group of two balls travelling together with velocity X, given
billiard ball A with velocity Y, bump it with billiard ball B having velocity
(2X - Y)". Finally, to close the loop, this last high-level rule
must be applied to create a particular hypothesized action in the
world-model, and the hypothesized action needs to be taken as a real action
in external reality.
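The forward rule and its reversal can be written down in a few lines of Python. This is only a sketch of the supposed rule from the example above (which happens to match a perfectly inelastic collision between equal masses); the function names and the one-dimensional treatment, with "south" as increasing y, are assumptions of the sketch.

```python
from typing import Optional


def merged_velocity(v_a: float, v_b: float) -> float:
    """Forward rule: two balls that bump continue on together
    with the average of their velocities."""
    return (v_a + v_b) / 2.0


def required_velocity(target: float, v_a: float) -> float:
    """Reversed rule: the velocity ball B must have so that, after bumping
    ball A (velocity v_a), the pair travels at `target`. This is 2X - Y."""
    return 2.0 * target - v_a


def time_to_contact(y_rear: float, v_rear: float, r_rear: float,
                    y_front: float, v_front: float,
                    r_front: float) -> Optional[float]:
    """Seconds until the rear ball touches the front ball, both moving
    along the same line; None if the rear ball never catches up."""
    gap = (y_front - y_rear) - (r_rear + r_front)  # distance between surfaces
    closing_speed = v_rear - v_front
    if gap < 0 or closing_speed <= 0:
        return None
    return gap / closing_speed
```

With the numbers from the visualization, `time_to_contact(6.0, 4.0, 1.0, 10.0, 2.0, 1.0)` gives 1.0 second; the cue ball's center is then at y = 10, so its leading edge touches the other ball at y = 11. `merged_velocity(4.0, 2.0)` gives 3.0, and `required_velocity(3.0, 4.0)` gives 2.0, recovering the second ball's velocity from the desired outcome. Writing the reversed rule as code is the easy part; the hard part is an AI that notices the rule and reverses it for itself.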
Without jumping too far ahead, there are a number of properties that
a world-model needs to support high-level thought. It needs to support
time - multiple frames or a temporal visualization - with accompanying extraction
of temporal features. It needs to support predictions
and expectations (and an expectation isn't real unless the
AI notices when the expectation is fulfilled, and especially when it is
violated). The world-model needs to support hypotheses, subjunctive
frames of visualization, which are distinct from "real reality" and can
be manipulated freely by high-level thought. (By "freely manipulated",
I mean a direct manipulative binding; choosing to think about a
billiard ball at position {2, 3} should cause a billiard ball to materialize
directly within the representation at {2, 3}, with no careful sequence
of actions required.) And for the visualization to be useful once
it exists, the high-level thought which created the billiard-ball image
must refer to the particular image visualized... and the reference
must run both ways, a two-way linkage.
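To make the "subjunctive frame" and "two-way linkage" requirements slightly more concrete, here is one possible Python sketch. The `Frame`, `Ball`, and `Thought` classes, the `fork` method, and the `imagine_ball` function are hypothetical names invented for illustration; a real world-model would be nothing like this tidy.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)
class Thought:
    """A high-level thought, holding references to the images it created."""
    description: str
    images: List["Ball"] = field(default_factory=list, repr=False)


@dataclass(eq=False)
class Ball:
    """An imagined or perceived ball, with a back-reference to the
    thought (if any) that materialized it."""
    position: Tuple[float, float]
    created_by: Optional[Thought] = field(default=None, repr=False)


@dataclass(eq=False)
class Frame:
    """One frame of visualization: 'real reality' or a what-if scenario."""
    name: str
    balls: List[Ball] = field(default_factory=list)

    def fork(self, name: str) -> "Frame":
        """Create a subjunctive copy that can be altered freely
        without touching this frame."""
        return Frame(name, [Ball(b.position, b.created_by) for b in self.balls])


def imagine_ball(frame: Frame, thought: Thought,
                 position: Tuple[float, float]) -> Ball:
    """Direct manipulation: the ball simply appears at `position`,
    and thought and image are linked in both directions."""
    ball = Ball(position, created_by=thought)
    frame.balls.append(ball)
    thought.images.append(ball)
    return ball
```

For example, forking an empty `Frame("real")` into a what-if frame and calling `imagine_ball(what_if, Thought("a ball at {2, 3}"), (2.0, 3.0))` places a ball in the what-if frame only, leaves the real frame untouched, and lets the thought find its image and the image find its thought.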
Time, expectation, comparison, subjunctivity,
visualization, introspection, and reference. I haven't defined any
of these terms yet. (Most are discussed in 3: Cognition,
although you can jump ahead to Appendix A: Glossary if you're
impatient.) Nonetheless, these are some of the basic attributes that
are present in human world-models, and which are Necessary (But Not Sufficient)
for the existence of high-level features such as causality,
intentionality, goals, memory, learning, association, focus,
abstraction, categorization, and symbolization.
NOTE: I mention that list of features to illustrate what will probably
be one of the major headaches for AI designers: If you design a system
and forget to allow for the possibility of expectation,
comparison, subjunctivity, visualization, or whatever,
then you'll either have to go back and redesign every single component
to open up space for the new possibilities, or start all over from scratch.
Actualities can always be written in later, but the potential has to be there
from the beginning, and that means a designer who knows the requirements
spec in advance.
- 1: After some soul-searching, I decided to use "his"
instead of "vis", since (a) hunter-gatherer societies are often blatantly
sexist; (b1) I'd have no qualms about using "his" or "her" if we were talking
about Alice and Bob in cryptography; (b2) from a cosmic perspective, one
occupation has no greater significance than the other.
- 2: It is very likely that human intelligence derives
not from the need to outwit tigers, but the need to outwit other humans.
(See conspecifics, and sexual selection in
the glossary.) Hopefully, none of this will hold true of AIs.
It's just an important thing to know about humans.
- 3: Philosophers
have been wrestling with this problem, "the meaning of meaning", for ages.
Attempts to create a mathematical definition are probably doomed; there
are no selection pressures in favor of reasoning processes which are precisely
definable and provably correct. Evolution favors the creation of useful
models - that is, models whose use promotes inclusive reproductive fitness.
In some cases, such as tribal politics, selection pressures may have favored
inaccurate, observer-biased models, with consequent problems for modern-day
humanity. See also Interlude: The Consensus and the Veil of Maya.
- 4: This may
sound like the setup for one of those jokes that ends with the physicist
saying "First, assume a spherical chicken...", but the billiard-ball domain
is complex enough to pose nearly every problem that would be faced by a
real-world AI, including uncertainty. Even if sensory information
is perfect and complete, the internal model is still uncertain -
will spending 30 CPU-seconds on a problem-solving strategy yield results,
or just another blind alley?
- 5: Since
a mapping inherently requires a mapper, I do not believe that there is
any way to mathematically define a sensory binding in an observer-independent
fashion. In fact, I do not believe there is any way to define any
binding in an observer-independent way. I do not believe there is
any mathematical way to define when Turing-computable process
A instantiates Turing-computable process B. I've tried.
- 6: In an uncertain
world, the AI would need to be able to recognize if a plan had worked,
and re-plan if the actions did not have the predicted results. Smart
minds design plans that bear in mind the possibility of error.